API
Chat completions
POST /api/v1/chat/completions is OpenAI-compatible. The
request and response bodies match the OpenAI schema exactly — drop a
Pendra base URL and key into your existing OpenAI SDK and you're done.
Request
curl https://api.pendra.ai/api/v1/chat/completions \
  -H "Authorization: Bearer pdr_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6:27b",
    "messages": [{"role": "user", "content": "Explain GPUs in one sentence."}]
  }'
Required fields
- model: model ID, e.g. qwen3.6:27b, llama3.3:70b, gpt-oss:120b. Browse available models.
- messages: array of { role, content } objects.
Common optional fields
- stream: set true for server-sent events.
- temperature, top_p, max_tokens, stop: standard OpenAI sampling controls.
- tools, tool_choice: function calling, OpenAI-shaped.
- response_format: set to { "type": "json_object" } for JSON-mode output (model permitting).
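As a rough sketch of how the required and optional fields combine, the helper below assembles a request body and an unsent POST request using only the Python standard library. The helper names (build_chat_body, build_chat_request) and the allow-list are illustrative assumptions, not part of any Pendra SDK; the URL and field names are the ones listed on this page.

```python
import json
import urllib.request

API_URL = "https://api.pendra.ai/api/v1/chat/completions"

# Optional fields named on this page. Treating them as an allow-list is an
# assumption made for illustration; the API may accept more.
OPTIONAL_FIELDS = {
    "stream", "temperature", "top_p", "max_tokens", "stop",
    "tools", "tool_choice", "response_format",
}


def build_chat_body(model, messages, **options):
    """Return a request body with the two required fields plus any options."""
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    body = {"model": model, "messages": messages}
    body.update(options)
    return body


def build_chat_request(api_key, body):
    """Wrap a body in a POST request; nothing is sent until urlopen() is called."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = build_chat_body(
    "qwen3.6:27b",
    [{"role": "user", "content": "Reply with a JSON object describing a GPU."}],
    temperature=0.2,
    max_tokens=200,
    response_format={"type": "json_object"},
)
request = build_chat_request("pdr_sk_...", body)  # placeholder key
```

Passing the request to urllib.request.urlopen would send it; the sketch stops short of that so it can be read and run without a key.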
Response
Non-streaming responses come back as a single OpenAI-shaped
chat.completion object. usage is always populated;
finish_reason is "stop" when the model finished
naturally or "length" when capped by max_tokens.
{
  "id": "chatcmpl-9f2b1c8e",
  "object": "chat.completion",
  "created": 1715346123,
  "model": "qwen3.6:27b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "GPUs are specialised processors built for the kind of massively parallel arithmetic that graphics and AI workloads need."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 24,
    "total_tokens": 36
  }
}
Streaming
With stream: true, Pendra returns
Server-Sent Events
matching OpenAI's format: each event is a data: { ... } line containing a
delta chunk, terminated by data: [DONE]. Caddy is configured to flush
immediately so tokens arrive as they're generated.
curl https://api.pendra.ai/api/v1/chat/completions \
  -H "Authorization: Bearer pdr_sk_..." \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "qwen3.6:27b",
    "stream": true,
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Streaming response
Each chunk is an OpenAI-shaped chat.completion.chunk. The
final chunk before [DONE] carries usage because
Pendra always sets stream_options.include_usage = true
server-side.
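Consuming this stream without an SDK can be sketched as follows. The function folds an iterable of event lines into the full text, the final finish_reason, and the usage block from the last chunk. The name collect_stream is hypothetical, and the sketch assumes each event arrives as a single data: line, as described above.

```python
import json


def collect_stream(lines):
    """Fold an iterable of SSE lines into (text, finish_reason, usage)."""
    text_parts = []
    finish_reason = None
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            text_parts.append(delta.get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
        usage = chunk.get("usage") or usage
    return "".join(text_parts), finish_reason, usage


demo = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":8,"completion_tokens":1,"total_tokens":9}}',
    "data: [DONE]",
]
text, reason, usage = collect_stream(demo)
```

The sample events that follow are the kind of input this function expects.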
data: {"id":"chatcmpl-9f2b","object":"chat.completion.chunk","created":1715346123,"model":"qwen3.6:27b","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-9f2b","object":"chat.completion.chunk","created":1715346123,"model":"qwen3.6:27b","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-9f2b","object":"chat.completion.chunk","created":1715346123,"model":"qwen3.6:27b","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-9f2b","object":"chat.completion.chunk","created":1715346123,"model":"qwen3.6:27b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":8,"completion_tokens":2,"total_tokens":10}}
data: [DONE]
Python (using the Pendra SDK)
from pendra import Pendra

client = Pendra()  # reads PENDRA_API_KEY

for event in client.chat.completions.create(
    model="qwen3.6:27b",
    stream=True,
    messages=[{"role": "user", "content": "Write a haiku."}],
):
    print(event.choices[0].delta.content or "", end="")
Response headers
Every chat response — streaming or not — carries these headers:
| Header | Meaning |
|---|---|
| X-Request-Id | UUID. Quote this to support when reporting an issue with a request. |
| X-Worker-Id | Which GPU worker served the request. |
| X-Worker-Name | Human-readable worker name from the console. |
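One way to keep these values for logging or a support ticket is sketched below. The helper name trace_info is hypothetical; it accepts any mapping-like object with an items() method (for example the headers attribute of a urllib response) and looks the names up case-insensitively, since HTTP header names are not case-sensitive. The header values in the example are made up.

```python
def trace_info(headers):
    """Pick the three tracing headers out of a response header mapping."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "request_id": lowered.get("x-request-id"),
        "worker_id": lowered.get("x-worker-id"),
        "worker_name": lowered.get("x-worker-name"),
    }


# Made-up values, for illustration only.
example_headers = {
    "Content-Type": "application/json",
    "x-request-id": "00000000-0000-0000-0000-000000000000",
    "X-Worker-Id": "worker-example",
    "X-Worker-Name": "example-gpu-box",
}
info = trace_info(example_headers)
```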
Tool / function calling
Tools work the same as OpenAI. Pass a tools array; the model
may reply with a tool_calls array in the assistant message.
Run each requested tool on your side, append its result as a
{role: "tool", tool_call_id, content} message, and call
/chat/completions again.
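The resolve-and-append step can be sketched as below, assuming the OpenAI tool-call shape, in which each entry carries an id and a function object whose arguments field is a JSON string. The helper run_tool_calls, the get_weather tool, and its return value are all made up for illustration.

```python
import json


def run_tool_calls(assistant_message, registry):
    """Execute each requested tool and return the tool messages to append."""
    tool_messages = []
    for call in assistant_message.get("tool_calls") or []:
        fn = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"] or "{}")
        result = fn(**args)
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return tool_messages


def get_weather(city):
    """Hypothetical local tool; a real one would query a weather service."""
    return {"city": city, "temp_c": 21}


# An assistant reply of the shape described above (hand-written example).
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
    }],
}

messages = [{"role": "user", "content": "What is the weather in Oslo?"}]
messages.append(assistant_message)
messages.extend(run_tool_calls(assistant_message, {"get_weather": get_weather}))
```

After this, messages holds the user turn, the assistant turn with its tool_calls, and one tool message per call, which is the list to send in the follow-up request.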
Timeouts
Non-streaming chat connections are cut off after about 100 seconds of
idleness. Long generations should use stream: true so partial tokens keep
the socket alive.
OpenAI SDK compatibility
Point the OpenAI SDK at Pendra by setting OPENAI_BASE_URL=https://api.pendra.ai/api/v1
and OPENAI_API_KEY=pdr_sk_…. No other code changes needed.
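A minimal sketch of that setup in Python, assuming version 1.x of the openai package, whose client reads both variables when constructed without arguments. The helper names point_openai_at_pendra and ask are illustrative; the import sits inside ask so the configuration step can run even where the package is not installed.

```python
import os

PENDRA_BASE_URL = "https://api.pendra.ai/api/v1"


def point_openai_at_pendra(api_key, env=os.environ):
    """Set the two variables the OpenAI SDK reads when given no arguments."""
    env["OPENAI_BASE_URL"] = PENDRA_BASE_URL
    env["OPENAI_API_KEY"] = api_key


def ask(prompt, model="qwen3.6:27b"):
    """Send one user message through the stock OpenAI client."""
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI()  # picks up OPENAI_BASE_URL and OPENAI_API_KEY
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Call point_openai_at_pendra with your key once at start-up, then use ask (or any existing OpenAI-based code) unchanged.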