
Chat Completions

Process an AI chat completion request. Requires payment via x402.

Endpoint

POST /v1/chat/completions

Request

Headers

Content-Type: application/json

Body Parameters

Field        Type     Required  Default                      Description
messages     Array    Yes       -                            Array of message objects representing the conversation history
model        String   No        anthropic/claude-3.5-sonnet  AI model to use (see List Models)
max_tokens   Number   No        4096                         Maximum tokens to generate (1-128,000)
temperature  Number   No        0.7                          Sampling temperature (0-2)
stream       Boolean  No        false                        Enable Server-Sent Events streaming

Message Object

Field    Type    Required  Description
role     String  Yes       Role of the message sender: system, user, or assistant
content  String  Yes       Content of the message (1-100,000 characters)

Validation Rules

  • Messages array: 1-100 items
  • Message content: 1-100,000 characters
  • Model name: Alphanumeric with -, _, /, ., :
  • Max tokens: 1-128,000
  • Temperature: 0-2
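
Because every request is paid, it can be worth checking these limits client-side before sending. A minimal sketch; the helper and its error messages are illustrative, not part of the API:

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Illustrative client-side check mirroring the validation rules above.
function validateRequest(body: {
  messages: ChatMessage[];
  model?: string;
  max_tokens?: number;
  temperature?: number;
}): string[] {
  const errors: string[] = [];
  const { messages, model, max_tokens = 4096, temperature = 0.7 } = body;

  if (messages.length < 1 || messages.length > 100) {
    errors.push('messages must contain 1-100 items');
  }
  for (const m of messages) {
    if (m.content.length < 1 || m.content.length > 100_000) {
      errors.push('each message content must be 1-100,000 characters');
    }
  }
  if (model !== undefined && !/^[A-Za-z0-9._:\/-]+$/.test(model)) {
    errors.push('model may only contain alphanumerics and - _ / . :');
  }
  if (max_tokens < 1 || max_tokens > 128_000) {
    errors.push('max_tokens must be 1-128,000');
  }
  if (temperature < 0 || temperature > 2) {
    errors.push('temperature must be 0-2');
  }
  return errors;
}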

Response

Success (200 OK) - Non-Streaming

{
  id: string;                        // Unique request identifier
  object: "chat.completion";         // Response object type
  created: number;                   // Unix timestamp of response creation
  model: string;                     // The model that generated the response
  system_fingerprint?: string;       // Backend configuration fingerprint
  choices: Array<{
    index: number;                   // Choice index (usually 0)
    message: {
      role: "assistant";             // Always "assistant" for responses
      content: string | null;        // The AI-generated response text
      // Reasoning fields (for reasoning models like DeepSeek R1, Gemini Thinking)
      reasoning?: string | null;     // Chain-of-thought reasoning process
      reasoning_details?: Array<{    // Structured reasoning breakdown
        type: string;                // Type: "reasoning.text", "reasoning.summary"
        format?: string | null;      // Format identifier
        index?: number;              // Reasoning step index
        text?: string;               // Reasoning text content
        summary?: string;            // Reasoning summary
      }>;
      refusal?: string | null;       // Model refusal message (if applicable)
    };
    finish_reason: string | null;    // Reason for completion: "stop", "length", "content_filter"
  }>;
  usage?: {
    prompt_tokens: number;           // Number of tokens in the input
    completion_tokens: number;       // Number of tokens in the output
    total_tokens: number;            // Total tokens (input + output)
    // Detailed token breakdowns (when available)
    prompt_tokens_details?: {
      cached_tokens?: number;        // Tokens retrieved from cache
    };
    completion_tokens_details?: {
      reasoning_tokens?: number;     // Tokens used for reasoning (o1, R1, Gemini Thinking models)
      image_tokens?: number;         // Tokens from image processing
      audio_tokens?: number;         // Tokens from audio processing
    };
    // Timing statistics (when available)
    queue_time?: number;             // Time spent in queue (ms)
    prompt_time?: number;            // Time processing prompt (ms)
    completion_time?: number;        // Time generating completion (ms)
    total_time?: number;             // Total processing time (ms)
  };
}

Note on Reasoning Models: Models like deepseek/deepseek-r1 and google/gemini-2.0-flash-thinking-exp return their internal reasoning process in the reasoning field, while the final answer appears in content. The reasoning_details array provides structured breakdowns of the reasoning steps.

Success (200 OK) - Streaming

When stream: true, the response is delivered as Server-Sent Events (SSE):

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3.5-sonnet","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3.5-sonnet","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3.5-sonnet","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":3,"total_tokens":15}}

data: [DONE]

Streaming Event Format:

  • Each chunk is prefixed with data:
  • Chunks contain delta objects with incremental content
  • The final chunk includes a usage object with token counts
  • Usage contains: prompt_tokens, completion_tokens, total_tokens
  • Stream ends with data: [DONE]
  • Payment is required before streaming begins
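
Putting these pieces together, a streaming chunk can be modeled roughly as follows. This is a sketch inferred from the example payloads above; exact field optionality is an assumption:

interface ChatCompletionChunk {
  id: string;
  object: 'chat.completion.chunk';
  created: number;
  model: string;
  choices: Array<{
    index: number;
    delta: {
      role?: 'assistant';   // Present on the first chunk
      content?: string;     // Incremental text
    };
    finish_reason: string | null;
  }>;
  usage?: {                 // Present on the final chunk
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}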

Payment Required (402)

{ "statusCode": 402, "error": "Payment Required", "message": "Payment required for this endpoint" }

Note: x402-fetch handles this automatically by signing and submitting payment.
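
If you call the endpoint with a plain fetch (without the x402 wrapper), you can detect the 402 yourself before deciding how to pay. A minimal sketch; completing the payment still requires an x402-capable client:

const res = await fetch('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages: [{ role: 'user', content: 'Hi' }] })
});

if (res.status === 402) {
  const err = await res.json();
  console.log(err.message); // "Payment required for this endpoint"
}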

Examples

Simple Request

curl -X POST https://api.x-router.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "What is 2+2?" }
    ],
    "model": "anthropic/claude-3.5-sonnet",
    "max_tokens": 100
  }'

With TypeScript

import { wrapFetchWithPayment } from 'x402-fetch';
import { privateKeyToAccount } from 'viem/accounts';

const account = privateKeyToAccount(process.env.PRIVATE_KEY as `0x${string}`);
const fetchWithPayment = wrapFetchWithPayment(fetch, account);

const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'user', content: 'What is 2+2?' }
    ],
    model: 'anthropic/claude-3.5-sonnet',
    max_tokens: 100
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);

With System Prompt

const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'system', content: 'You are a helpful coding assistant.' },
      { role: 'user', content: 'How do I reverse a string in Python?' }
    ],
    model: 'anthropic/claude-3.5-sonnet',
    max_tokens: 200,
    temperature: 0.7
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);
console.log('Tokens used:', data.usage.total_tokens);

Multi-Turn Conversation

const messages = [
  { role: 'user', content: 'Hello!' },
  { role: 'assistant', content: 'Hi! How can I help you today?' },
  { role: 'user', content: 'What is quantum computing?' }
];

const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages,
    model: 'anthropic/claude-3.5-sonnet',
    max_tokens: 300
  })
});

const data = await response.json();

// Add assistant's response to conversation
messages.push({ role: 'assistant', content: data.choices[0].message.content });

Using Reasoning Models

Reasoning models like DeepSeek R1 and Gemini Thinking provide their internal thought process:

const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'user', content: 'Solve this math problem: If x + 5 = 12, what is x?' }
    ],
    model: 'deepseek/deepseek-r1',
    max_tokens: 1000
  })
});

const data = await response.json();
const message = data.choices[0].message;

// Access the reasoning process
if (message.reasoning) {
  console.log('Reasoning:', message.reasoning);
}

// Get the final answer
console.log('Answer:', message.content);

// Check token usage including reasoning tokens
if (data.usage?.completion_tokens_details?.reasoning_tokens) {
  console.log('Reasoning tokens:', data.usage.completion_tokens_details.reasoning_tokens);
}

Streaming Response

const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'user', content: 'Tell me a short story.' }
    ],
    model: 'anthropic/claude-3.5-sonnet',
    max_tokens: 200,
    stream: true
  })
});

const reader = response.body?.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader!.read();
  if (done) break;

  const chunk = decoder.decode(value);
  const lines = chunk.split('\n');

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = line.slice(6);
      if (data === '[DONE]') break;
      try {
        const json = JSON.parse(data);
        const content = json.choices?.[0]?.delta?.content || '';
        if (content) process.stdout.write(content);
      } catch (e) {
        // Ignore partial or non-JSON lines
      }
    }
  }
}

Pricing

Pricing is dynamic and calculated based on:

  • Number of input tokens (all messages combined)
  • Requested max_tokens for output
  • Model-specific pricing rates

Price is determined before payment and includes a buffer to ensure we never undercharge.
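
As a rough mental model only (the actual per-model rates and the size of the buffer are set server-side and are not documented here):

// Hypothetical helper: rates expressed as USD per million tokens (assumed units).
function estimatePriceUsd(
  inputTokens: number,
  maxTokens: number,
  inputRatePerMTok: number,
  outputRatePerMTok: number,
  buffer = 1.1 // Assumed safety margin, not the real value
): number {
  const inputCost = (inputTokens / 1_000_000) * inputRatePerMTok;
  const outputCost = (maxTokens / 1_000_000) * outputRatePerMTok;
  return (inputCost + outputCost) * buffer;
}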

Use the Estimate endpoint to get a cost estimate before making a paid request.

Message Patterns

Simple User Message

{ "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }

System Prompt + User Message

{ "messages": [ { "role": "system", "content": "You are a helpful math tutor." }, { "role": "user", "content": "What is 15 * 7?" } ] }

Multi-Turn Conversation

{ "messages": [ { "role": "user", "content": "Hello!" }, { "role": "assistant", "content": "Hi! How can I help you today?" }, { "role": "user", "content": "What is quantum computing?" } ] }

Complex Conversation with System Prompt

{ "messages": [ { "role": "system", "content": "You are a concise coding assistant." }, { "role": "user", "content": "How do I create a function in JavaScript?" }, { "role": "assistant", "content": "Use function myFunc() {} or const myFunc = () => {}" }, { "role": "user", "content": "Which is better for callbacks?" } ] }

Error Responses

The API returns standard HTTP error codes with JSON error responses.

Common errors (a minimal handling sketch follows the list):

  • 400 Bad Request: Invalid request format or parameters
  • 402 Payment Required: Payment needed (handled automatically by x402-fetch)
  • 429 Too Many Requests: Rate limit exceeded (see Rate Limits)
  • 500 Internal Server Error: Server error
  • 503 Service Unavailable: Backend service unavailable
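
The sketch below branches on these status codes, reusing the fetchWithPayment client from the earlier examples. Only the 402 and 429 bodies shown on this page are confirmed; how the other errors are reported is an assumption:

const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Hello' }],
    model: 'anthropic/claude-3.5-sonnet'
  })
});

if (!response.ok) {
  switch (response.status) {
    case 400:
      console.error('Invalid request:', await response.text());
      break;
    case 429:
      console.error('Rate limited, back off and retry (see Rate Limits below)');
      break;
    case 500:
    case 503:
      console.error('Server-side problem, retry later');
      break;
    default:
      console.error('Unexpected status:', response.status);
  }
} else {
  const data = await response.json();
  console.log(data.choices[0].message.content);
}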

Rate Limits

All API endpoints are rate limited to prevent abuse and ensure fair usage:

  • Limit: 100 requests per minute per IP address
  • Window: 60 seconds (rolling window)
  • Scope: Applied per IP address (forwarded IPs are honored via the X-Forwarded-For header)

Rate Limit Response

When rate limited, you’ll receive a 429 error:

{ "error": "Too many requests", "code": "RATE_LIMITED" }

Best Practices

  • Implement exponential backoff when receiving 429 errors
  • Batch operations when possible to reduce request count
  • Cache responses to avoid repeated identical requests
  • Monitor your request rate in production to stay under the limit
  • Use streaming for long responses instead of polling

Example: Retry with Backoff

async function requestWithRetry(url: string, options: RequestInit, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetchWithPayment(url, options);

      if (response.status === 429) {
        const waitTime = Math.pow(2, attempt) * 1000; // Exponential backoff
        console.log(`Rate limited. Waiting ${waitTime}ms before retry...`);
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }

      return response;
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
    }
  }
  throw new Error('Max retries exceeded');
}