Chat Completions
Process an AI chat completion request. Requires payment via x402.
Endpoint
POST /v1/chat/completions
Request
Headers
Content-Type: application/json
Body Parameters
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| messages | Array | Yes | - | Array of message objects representing the conversation history |
| model | String | No | anthropic/claude-3.5-sonnet | AI model to use (see List Models) |
| max_tokens | Number | No | 4096 | Maximum tokens to generate (1-128,000) |
| temperature | Number | No | 0.7 | Sampling temperature (0-2) |
| stream | Boolean | No | false | Enable Server-Sent Events streaming |
Message Object
| Field | Type | Required | Description |
|---|---|---|---|
| role | String | Yes | Role of the message sender: system, user, or assistant |
| content | String | Yes | Content of the message (1-100,000 characters) |
Validation Rules
- Messages array: 1-100 items
- Message content: 1-100,000 characters
- Model name: Alphanumeric characters plus `-`, `_`, `/`, `.`, and `:`
- Max tokens: 1-128,000
- Temperature: 0-2
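Putting these together, a request body that sets every documented parameter might look like the following sketch (values are illustrative; defaults are listed in the table above):

```typescript
// Illustrative request body using every documented parameter.
const body = {
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain x402 in one sentence.' }
  ],
  model: 'anthropic/claude-3.5-sonnet', // default model
  max_tokens: 256,                      // 1-128,000
  temperature: 0.7,                     // 0-2
  stream: false                         // set true for SSE streaming
};
```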
Response
Success (200 OK) - Non-Streaming
{
id: string; // Unique request identifier
object: "chat.completion"; // Response object type
created: number; // Unix timestamp of response creation
model: string; // The model that generated the response
system_fingerprint?: string; // Backend configuration fingerprint
choices: Array<{
index: number; // Choice index (usually 0)
message: {
role: "assistant"; // Always "assistant" for responses
content: string | null; // The AI-generated response text
// Reasoning fields (for reasoning models like DeepSeek R1, Gemini Thinking)
reasoning?: string | null; // Chain-of-thought reasoning process
reasoning_details?: Array<{ // Structured reasoning breakdown
type: string; // Type: "reasoning.text", "reasoning.summary"
format?: string | null; // Format identifier
index?: number; // Reasoning step index
text?: string; // Reasoning text content
summary?: string; // Reasoning summary
}>;
refusal?: string | null; // Model refusal message (if applicable)
};
finish_reason: string | null; // Reason for completion: "stop", "length", "content_filter"
}>;
usage?: {
prompt_tokens: number; // Number of tokens in the input
completion_tokens: number; // Number of tokens in the output
total_tokens: number; // Total tokens (input + output)
// Detailed token breakdowns (when available)
prompt_tokens_details?: {
cached_tokens?: number; // Tokens retrieved from cache
};
completion_tokens_details?: {
reasoning_tokens?: number; // Tokens used for reasoning (o1, R1, Gemini Thinking models)
image_tokens?: number; // Tokens from image processing
audio_tokens?: number; // Tokens from audio processing
};
// Timing statistics (when available)
queue_time?: number; // Time spent in queue (ms)
prompt_time?: number; // Time processing prompt (ms)
completion_time?: number; // Time generating completion (ms)
total_time?: number; // Total processing time (ms)
};
}
Note on Reasoning Models: Models like deepseek/deepseek-r1 and google/gemini-2.0-flash-thinking-exp return their internal reasoning process in the `reasoning` field, while the final answer appears in `content`. The `reasoning_details` array provides structured breakdowns of the reasoning steps.
Success (200 OK) - Streaming
When stream: true, the response is delivered as Server-Sent Events (SSE):
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3.5-sonnet","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3.5-sonnet","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3.5-sonnet","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":3,"total_tokens":15}}
data: [DONE]
Streaming Event Format:
- Each chunk is prefixed with `data: `
- Chunks contain `delta` objects with incremental content
- The final chunk includes a `usage` object with token counts
- Usage contains: `prompt_tokens`, `completion_tokens`, `total_tokens`
- Stream ends with `data: [DONE]`
- Payment is required before streaming begins
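For typed clients, the events above suggest roughly the following TypeScript shape; treat this as a sketch inferred from the example chunks rather than an exhaustive schema:

```typescript
// Approximate shape of one SSE chunk, inferred from the example events above.
interface ChatCompletionChunk {
  id: string;
  object: 'chat.completion.chunk';
  created: number;
  model: string;
  choices: Array<{
    index: number;
    delta: { role?: 'assistant'; content?: string };
    finish_reason: string | null;
  }>;
  usage?: {              // present on the final chunk
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}
```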
Payment Required (402)
{
"statusCode": 402,
"error": "Payment Required",
"message": "Payment required for this endpoint"
}
Note: x402-fetch handles this automatically by signing and submitting payment.
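If you call the endpoint with plain fetch rather than x402-fetch, you can detect the 402 yourself; a minimal sketch (the payment signing and resubmission are exactly what wrapFetchWithPayment does for you):

```typescript
// Without a payment attached, the API responds with the 402 body shown above.
const res = await fetch('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages: [{ role: 'user', content: 'ping' }] })
});

if (res.status === 402) {
  const err = await res.json();
  console.error(err.message); // "Payment required for this endpoint"
}
```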
Examples
Simple Request
curl -X POST https://api.x-router.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "user",
"content": "What is 2+2?"
}
],
"model": "anthropic/claude-3.5-sonnet",
"max_tokens": 100
}'
With TypeScript
import { wrapFetchWithPayment } from 'x402-fetch';
import { privateKeyToAccount } from 'viem/accounts';
const account = privateKeyToAccount(process.env.PRIVATE_KEY as `0x${string}`);
const fetchWithPayment = wrapFetchWithPayment(fetch, account);
const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: [
{ role: 'user', content: 'What is 2+2?' }
],
model: 'anthropic/claude-3.5-sonnet',
max_tokens: 100
})
});
const data = await response.json();
console.log(data.choices[0].message.content);
With System Prompt
const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: [
{
role: 'system',
content: 'You are a helpful coding assistant.'
},
{
role: 'user',
content: 'How do I reverse a string in Python?'
}
],
model: 'anthropic/claude-3.5-sonnet',
max_tokens: 200,
temperature: 0.7
})
});
const data = await response.json();
console.log(data.choices[0].message.content);
console.log('Tokens used:', data.usage.total_tokens);
Multi-Turn Conversation
const messages = [
{ role: 'user', content: 'Hello!' },
{ role: 'assistant', content: 'Hi! How can I help you today?' },
{ role: 'user', content: 'What is quantum computing?' }
];
const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages,
model: 'anthropic/claude-3.5-sonnet',
max_tokens: 300
})
});
const data = await response.json();
// Add assistant's response to conversation
messages.push({
role: 'assistant',
content: data.choices[0].message.content
});
Using Reasoning Models
Reasoning models like DeepSeek R1 and Gemini Thinking provide their internal thought process:
const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: [
{ role: 'user', content: 'Solve this math problem: If x + 5 = 12, what is x?' }
],
model: 'deepseek/deepseek-r1',
max_tokens: 1000
})
});
const data = await response.json();
const message = data.choices[0].message;
// Access the reasoning process
if (message.reasoning) {
console.log('Reasoning:', message.reasoning);
}
// Get the final answer
console.log('Answer:', message.content);
// Check token usage including reasoning tokens
if (data.usage?.completion_tokens_details?.reasoning_tokens) {
console.log('Reasoning tokens:', data.usage.completion_tokens_details.reasoning_tokens);
}
Streaming Response
const response = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: [
{ role: 'user', content: 'Tell me a short story.' }
],
model: 'anthropic/claude-3.5-sonnet',
max_tokens: 200,
stream: true
})
});
const reader = response.body?.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const { done, value } = await reader!.read();
  if (done) break;
  // Accumulate text so an SSE line split across two chunks is not lost
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep the trailing, possibly incomplete line
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = line.slice(6);
      if (data === '[DONE]') break;
      try {
        const json = JSON.parse(data);
        const content = json.choices?.[0]?.delta?.content || '';
        if (content) process.stdout.write(content);
      } catch (e) {
        // Ignore keep-alive or malformed lines
      }
    }
  }
}
Pricing
Pricing is dynamic and calculated based on:
- Number of input tokens (all messages combined)
- Requested `max_tokens` for output
- Model-specific pricing rates
Price is determined before payment and includes a buffer to ensure we never undercharge.
Use the Estimate endpoint to get a cost estimate before making a paid request.
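As a rough, non-authoritative sketch of the shape of that calculation (the rates and buffer below are made-up placeholders, not x-router's actual pricing):

```typescript
// Placeholder rates and buffer for illustration only; real rates are model-specific.
function estimatePrice(inputTokens: number, maxTokens: number): number {
  const INPUT_RATE = 0.000003;   // hypothetical price per input token
  const OUTPUT_RATE = 0.000015;  // hypothetical price per output token
  const BUFFER = 1.1;            // safety margin so the quote never undercharges
  return (inputTokens * INPUT_RATE + maxTokens * OUTPUT_RATE) * BUFFER;
}
```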
Message Patterns
Simple User Message
{
"messages": [
{ "role": "user", "content": "What is the capital of France?" }
]
}
System Prompt + User Message
{
"messages": [
{ "role": "system", "content": "You are a helpful math tutor." },
{ "role": "user", "content": "What is 15 * 7?" }
]
}
Multi-Turn Conversation
{
"messages": [
{ "role": "user", "content": "Hello!" },
{ "role": "assistant", "content": "Hi! How can I help you today?" },
{ "role": "user", "content": "What is quantum computing?" }
]
}
Complex Conversation with System Prompt
{
"messages": [
{ "role": "system", "content": "You are a concise coding assistant." },
{ "role": "user", "content": "How do I create a function in JavaScript?" },
{ "role": "assistant", "content": "Use function myFunc() {} or const myFunc = () => {}" },
{ "role": "user", "content": "Which is better for callbacks?" }
]
}
Error Responses
The API returns standard HTTP error codes with JSON error responses.
Common errors:
- 400 Bad Request: Invalid request format or parameters
- 402 Payment Required: Payment needed (handled automatically by x402-fetch)
- 429 Too Many Requests: Rate limit exceeded (see Rate Limits)
- 500 Internal Server Error: Server error
- 503 Service Unavailable: Backend service unavailable
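A small client-side sketch that surfaces these errors (the JSON fields follow the 402 and 429 examples in this document; fetchWithPayment is the wrapped fetch from the earlier examples):

```typescript
// Throws with a readable message for any non-2xx response.
async function completeOrThrow(body: unknown) {
  const res = await fetchWithPayment('https://api.x-router.ai/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body)
  });
  if (!res.ok) {
    const err = await res.json().catch(() => ({}));
    throw new Error(`Request failed (${res.status}): ${err.message ?? err.error ?? 'unknown error'}`);
  }
  return res.json();
}
```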
Rate Limits
All API endpoints are rate limited to prevent abuse and ensure fair usage:
- Limit: 100 requests per minute per IP address
- Window: 60 seconds (rolling window)
- Scope: Applied per IP address (includes forwarded IPs via X-Forwarded-For header)
Rate Limit Response
When rate limited, you’ll receive a 429 error:
{
"error": "Too many requests",
"code": "RATE_LIMITED"
}
Best Practices
- Implement exponential backoff when receiving 429 errors
- Batch operations when possible to reduce request count
- Cache responses to avoid repeated identical requests
- Monitor your rate in production to stay under limits
- Use streaming for long responses instead of polling
Example: Retry with Backoff
async function requestWithRetry(url: string, options: RequestInit, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetchWithPayment(url, options);
      if (response.status === 429) {
        const waitTime = Math.pow(2, attempt) * 1000; // Exponential backoff: 1s, 2s, 4s
        console.log(`Rate limited. Waiting ${waitTime}ms before retry...`);
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }
      return response;
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
    }
  }
  throw new Error('Rate limited on every attempt');
}
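Usage mirrors the earlier examples; any request body from this page can be passed through the helper:

```typescript
const response = await requestWithRetry('https://api.x-router.ai/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Hello!' }],
    model: 'anthropic/claude-3.5-sonnet',
    max_tokens: 50
  })
});
const data = await response.json();
console.log(data.choices[0].message.content);
```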