
Chat Completions

Process an AI chat completion request. Requires payment via x402.

Endpoint

POST /v1/chat/completions

Request

Headers

Content-Type: application/json

Body Parameters

Field        Type     Required  Default                      Description
messages     Array    Yes       -                            Array of message objects representing the conversation history
model        String   No        anthropic/claude-3.5-sonnet  AI model to use (see List Models)
max_tokens   Number   No        4096                         Maximum tokens to generate (1-128,000)
temperature  Number   No        0.7                          Sampling temperature (0-2)
stream       Boolean  No        false                        Enable Server-Sent Events streaming

Message Object

Field    Type    Required  Description
role     String  Yes       Role of the message sender: system, user, or assistant
content  String  Yes       Content of the message (1-100,000 characters)

Validation Rules

  • Messages array: 1-100 items
  • Message content: 1-100,000 characters
  • Model name: Alphanumeric with -, _, /, ., :
  • Max tokens: 1-128,000
  • Temperature: 0-2
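
Because every request is paid, it can be worth checking these limits client-side before sending. A minimal sketch; the helper and its error messages are illustrative, not part of the API:

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Illustrative client-side check mirroring the validation rules above.
function validateRequest(body: {
  messages: ChatMessage[];
  model?: string;
  max_tokens?: number;
  temperature?: number;
}): string[] {
  const errors: string[] = [];
  const { messages, model, max_tokens = 4096, temperature = 0.7 } = body;

  if (messages.length < 1 || messages.length > 100) {
    errors.push('messages must contain 1-100 items');
  }
  for (const m of messages) {
    if (m.content.length < 1 || m.content.length > 100_000) {
      errors.push('each message content must be 1-100,000 characters');
    }
  }
  if (model !== undefined && !/^[A-Za-z0-9._:\/-]+$/.test(model)) {
    errors.push('model may only contain alphanumerics and - _ / . :');
  }
  if (max_tokens < 1 || max_tokens > 128_000) {
    errors.push('max_tokens must be 1-128,000');
  }
  if (temperature < 0 || temperature > 2) {
    errors.push('temperature must be 0-2');
  }
  return errors;
}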

Response

Success (200 OK) - Non-Streaming

{
  id: string;                        // Unique request identifier
  object: "chat.completion";         // Response object type
  created: number;                   // Unix timestamp of response creation
  model: string;                     // The model that generated the response
  system_fingerprint?: string;       // Backend configuration fingerprint
  choices: Array<{
    index: number;                   // Choice index (usually 0)
    message: {
      role: "assistant";             // Always "assistant" for responses
      content: string | null;        // The AI-generated response text
      // Reasoning fields (for reasoning models like DeepSeek R1, Gemini Thinking)
      reasoning?: string | null;     // Chain-of-thought reasoning process
      reasoning_details?: Array<{    // Structured reasoning breakdown
        type: string;                // Type: "reasoning.text", "reasoning.summary"
        format?: string | null;      // Format identifier
        index?: number;              // Reasoning step index
        text?: string;               // Reasoning text content
        summary?: string;            // Reasoning summary
      }>;
      refusal?: string | null;       // Model refusal message (if applicable)
    };
    finish_reason: string | null;    // Reason for completion: "stop", "length", "content_filter"
  }>;
  usage?: {
    prompt_tokens: number;           // Number of tokens in the input
    completion_tokens: number;       // Number of tokens in the output
    total_tokens: number;            // Total tokens (input + output)
    // Detailed token breakdowns (when available)
    prompt_tokens_details?: {
      cached_tokens?: number;        // Tokens retrieved from cache
    };
    completion_tokens_details?: {
      reasoning_tokens?: number;     // Tokens used for reasoning (o1, R1, Gemini Thinking models)
      image_tokens?: number;         // Tokens from image processing
      audio_tokens?: number;         // Tokens from audio processing
    };
    // Timing statistics (when available)
    queue_time?: number;             // Time spent in queue (ms)
    prompt_time?: number;            // Time processing prompt (ms)
    completion_time?: number;        // Time generating completion (ms)
    total_time?: number;             // Total processing time (ms)
  };
}

Note on Reasoning Models: Models like deepseek/deepseek-r1 and google/gemini-2.0-flash-thinking-exp return their internal reasoning process in the reasoning field, while the final answer appears in content. The reasoning_details array provides structured breakdowns of the reasoning steps.

Success (200 OK) - Streaming

When stream: true, the response is delivered as Server-Sent Events (SSE):

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3.5-sonnet","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3.5-sonnet","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3.5-sonnet","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":3,"total_tokens":15}}

data: [DONE]

Streaming Event Format:

  • Each chunk is prefixed with data:
  • Chunks contain delta objects with incremental content
  • The final chunk includes a usage object with token counts
  • Usage contains: prompt_tokens, completion_tokens, total_tokens
  • Stream ends with data: [DONE]
  • Payment is required before streaming begins
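
Putting these pieces together, a streaming chunk can be modeled roughly as follows. This is a sketch inferred from the example payloads above; exact field optionality is an assumption:

interface ChatCompletionChunk {
  id: string;
  object: 'chat.completion.chunk';
  created: number;
  model: string;
  choices: Array<{
    index: number;
    delta: {
      role?: 'assistant';   // Present on the first chunk
      content?: string;     // Incremental text
    };
    finish_reason: string | null;
  }>;
  usage?: {                 // Present on the final chunk
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}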

Payment Required (402)

{ "statusCode": 402, "error": "Payment Required", "message": "Payment required for this endpoint" }

Note: x402-fetch handles this automatically by signing and submitting payment.
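
If you call the endpoint with a plain fetch (without the x402 wrapper), you can detect the 402 yourself before deciding how to pay. A minimal sketch; completing the payment still requires an x402-capable client:

const res = await fetch('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages: [{ role: 'user', content: 'Hi' }] })
});

if (res.status === 402) {
  const err = await res.json();
  console.log(err.message); // "Payment required for this endpoint"
}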

Examples

Simple Request

curl -X POST https://api.x-router.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "What is 2+2?" }
    ],
    "model": "anthropic/claude-3.5-sonnet",
    "max_tokens": 100
  }'

With TypeScript

import { wrapFetchWithPayment } from 'x402-fetch';
import { privateKeyToAccount } from 'viem/accounts';

const account = privateKeyToAccount(process.env.PRIVATE_KEY as `0x${string}`);
const fetchWithPayment = wrapFetchWithPayment(fetch, account);

const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'user', content: 'What is 2+2?' }
    ],
    model: 'anthropic/claude-3.5-sonnet',
    max_tokens: 100
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);

With System Prompt

const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'system', content: 'You are a helpful coding assistant.' },
      { role: 'user', content: 'How do I reverse a string in Python?' }
    ],
    model: 'anthropic/claude-3.5-sonnet',
    max_tokens: 200,
    temperature: 0.7
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);
console.log('Tokens used:', data.usage.total_tokens);

Multi-Turn Conversation

const messages = [
  { role: 'user', content: 'Hello!' },
  { role: 'assistant', content: 'Hi! How can I help you today?' },
  { role: 'user', content: 'What is quantum computing?' }
];

const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages,
    model: 'anthropic/claude-3.5-sonnet',
    max_tokens: 300
  })
});

const data = await response.json();

// Add assistant's response to conversation
messages.push({ role: 'assistant', content: data.choices[0].message.content });

Using Reasoning Models

Reasoning models like DeepSeek R1 and Gemini Thinking provide their internal thought process:

const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'user', content: 'Solve this math problem: If x + 5 = 12, what is x?' }
    ],
    model: 'deepseek/deepseek-r1',
    max_tokens: 1000
  })
});

const data = await response.json();
const message = data.choices[0].message;

// Access the reasoning process
if (message.reasoning) {
  console.log('Reasoning:', message.reasoning);
}

// Get the final answer
console.log('Answer:', message.content);

// Check token usage including reasoning tokens
if (data.usage?.completion_tokens_details?.reasoning_tokens) {
  console.log('Reasoning tokens:', data.usage.completion_tokens_details.reasoning_tokens);
}

Streaming Response

const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'user', content: 'Tell me a short story.' }
    ],
    model: 'anthropic/claude-3.5-sonnet',
    max_tokens: 200,
    stream: true
  })
});

const reader = response.body?.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader!.read();
  if (done) break;

  const chunk = decoder.decode(value);
  const lines = chunk.split('\n');

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = line.slice(6);
      if (data === '[DONE]') break;
      try {
        const json = JSON.parse(data);
        const content = json.choices?.[0]?.delta?.content || '';
        if (content) process.stdout.write(content);
      } catch (e) {
        // Ignore partial or non-JSON lines
      }
    }
  }
}

Pricing

Pricing is dynamic and calculated based on:

  • Number of input tokens (all messages combined)
  • Requested max_tokens for output
  • Model-specific pricing rates

Price is determined before payment and includes a buffer to ensure we never undercharge.
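
As a rough mental model only (the actual per-model rates and the size of the buffer are set server-side and are not documented here):

// Hypothetical helper: rates expressed as USD per million tokens (assumed units).
function estimatePriceUsd(
  inputTokens: number,
  maxTokens: number,
  inputRatePerMTok: number,
  outputRatePerMTok: number,
  buffer = 1.1 // Assumed safety margin, not the real value
): number {
  const inputCost = (inputTokens / 1_000_000) * inputRatePerMTok;
  const outputCost = (maxTokens / 1_000_000) * outputRatePerMTok;
  return (inputCost + outputCost) * buffer;
}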

Use the Estimate endpoint to get a cost estimate before making a paid request.

Message Patterns

Simple User Message

{ "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }

System Prompt + User Message

{ "messages": [ { "role": "system", "content": "You are a helpful math tutor." }, { "role": "user", "content": "What is 15 * 7?" } ] }

Multi-Turn Conversation

{ "messages": [ { "role": "user", "content": "Hello!" }, { "role": "assistant", "content": "Hi! How can I help you today?" }, { "role": "user", "content": "What is quantum computing?" } ] }

Complex Conversation with System Prompt

{ "messages": [ { "role": "system", "content": "You are a concise coding assistant." }, { "role": "user", "content": "How do I create a function in JavaScript?" }, { "role": "assistant", "content": "Use function myFunc() {} or const myFunc = () => {}" }, { "role": "user", "content": "Which is better for callbacks?" } ] }

Error Responses

The API returns standard HTTP error codes with JSON error responses.

Common errors (a minimal handling sketch follows the list):

  • 400 Bad Request: Invalid request format or parameters
  • 402 Payment Required: Payment needed (handled automatically by x402-fetch)
  • 429 Too Many Requests: Rate limit exceeded (see Rate Limits)
  • 500 Internal Server Error: Server error
  • 503 Service Unavailable: Backend service unavailable
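
The sketch below branches on these status codes, reusing the fetchWithPayment client from the earlier examples. Only the 402 and 429 bodies shown on this page are confirmed; how the other errors are reported is an assumption:

const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Hello' }],
    model: 'anthropic/claude-3.5-sonnet'
  })
});

if (!response.ok) {
  switch (response.status) {
    case 400:
      console.error('Invalid request:', await response.text());
      break;
    case 429:
      console.error('Rate limited, back off and retry (see Rate Limits below)');
      break;
    case 500:
    case 503:
      console.error('Server-side problem, retry later');
      break;
    default:
      console.error('Unexpected status:', response.status);
  }
} else {
  const data = await response.json();
  console.log(data.choices[0].message.content);
}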

Rate Limits

All API endpoints are rate limited to prevent abuse and ensure fair usage:

  • Limit: 100 requests per minute per IP address
  • Window: 60 seconds (rolling window)
  • Scope: Applied per IP address (forwarded IPs are honored via the X-Forwarded-For header)

Rate Limit Response

When rate limited, you’ll receive a 429 error:

{ "error": "Too many requests", "code": "RATE_LIMITED" }

Best Practices

  • Implement exponential backoff when receiving 429 errors
  • Batch operations when possible to reduce request count
  • Cache responses to avoid repeated identical requests
  • Monitor your request rate in production to stay under the limit
  • Use streaming for long responses instead of polling

Example: Retry with Backoff

async function requestWithRetry(url: string, options: RequestInit, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetchWithPayment(url, options);

      if (response.status === 429) {
        const waitTime = Math.pow(2, attempt) * 1000; // Exponential backoff
        console.log(`Rate limited. Waiting ${waitTime}ms before retry...`);
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }

      return response;
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
    }
  }
  throw new Error('Max retries exceeded');
}