API
Errors & rate limits
Pendra returns standard HTTP status codes. Errors always come back as JSON
with a short detail message, and every response carries an
X-Request-Id header you can paste into a support ticket.
Status codes
| Code | Meaning | What to do |
|---|---|---|
| 400 | Bad request: malformed JSON, unknown model, or invalid field. | Fix the request body. The detail field describes the problem. |
| 401 | Missing or invalid API key. | Check the Authorization or x-api-key header. Rotate the key from the console if needed. |
| 403 | Authenticated but not allowed, e.g. an owner-only operation called by a member. | Use a key from a user with the right role. |
| 404 | Unknown route or resource. | Check the URL prefix (/api/v1 vs /v1). |
| 429 | Rate limited. | Back off and retry using exponential backoff with jitter. Higher plans get higher limits. |
| 500 | Unexpected server error. | Retry once; if it persists, send the X-Request-Id to support. |
| 502 | Worker error: the GPU worker that picked up your request returned an error. | The X-Worker-Id and X-Worker-Name headers identify the worker. Retry; Pendra will pick a different worker. |
| 504 | Timeout: chat idle for more than ~100s, embeddings longer than ~60s, images longer than ~600s. | Use stream: true for long generations, or batch smaller embedding requests. |
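As a rough illustration of the table above, a client can map each status code to a next step before deciding whether to retry. The sketch below is not part of any Pendra SDK; the function name `action_for` and the action labels are made up for this example, and only the code-to-advice mapping comes from the table.

```python
# Suggested next step per status code, transcribed from the table above.
# The label strings are hypothetical names chosen for this sketch.
ACTIONS = {
    400: "fix_request",            # malformed JSON, unknown model, invalid field
    401: "check_credentials",      # missing or invalid API key
    403: "check_role",             # key belongs to a user without the right role
    404: "check_url",              # wrong route or URL prefix
    429: "backoff_and_retry",      # rate limited
    500: "retry_once",             # unexpected server error
    502: "retry",                  # worker error; another worker is picked
    504: "stream_or_shrink",       # timeout; stream or send smaller batches
}


def action_for(status: int) -> str:
    """Return the suggested action label for an HTTP status code."""
    if 200 <= status < 300:
        return "ok"
    return ACTIONS.get(status, "unknown")
```

Codes that are not listed in the table fall through to "unknown", so the caller has to decide how to treat them explicitly.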
Error body shape
{
  "detail": "Invalid or expired token"
}
For OpenAI-compatible endpoints, Pendra also returns the OpenAI-shaped
error envelope when the upstream backend produces one — e.g.
{ "error": { "message": "...", "type": "..." } }.
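Because two body shapes are possible, client code may want a single helper that pulls a readable message out of either one. The following is a minimal sketch under that assumption; `error_message` is a hypothetical helper name, and the fallback strings are choices made for this example rather than anything Pendra defines.

```python
import json


def error_message(body: str) -> str:
    """Extract a human-readable message from an error response body.

    Handles the {"detail": "..."} shape and the OpenAI-style
    {"error": {"message": "..."}} envelope described above.
    """
    try:
        data = json.loads(body)
    except ValueError:
        # Not JSON at all: fall back to the raw text, if there is any.
        return body.strip() or "unknown error"

    if isinstance(data, dict):
        detail = data.get("detail")
        if isinstance(detail, str):
            return detail
        err = data.get("error")
        if isinstance(err, dict) and isinstance(err.get("message"), str):
            return err["message"]
    return "unknown error"
```

Checking `detail` first and the envelope second is an arbitrary ordering; a body is not expected to carry both.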
Helpful response headers
| Header | On which responses | Use |
|---|---|---|
| X-Request-Id | All responses | Quote this on a support ticket so we can look up the exact request. |
| X-Worker-Id | Inference routes (chat, embeddings, images, audio) | Identifies which GPU worker served the request. |
| X-Worker-Name | Inference routes | Human-readable worker name from the console. |
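One way to make these headers useful is to log them alongside every failed call, so the values are at hand when a ticket is opened. The sketch below assumes your HTTP client exposes response headers as a mapping of strings; `support_context` is a hypothetical helper, and the header values in the usage line are placeholders, not real Pendra identifiers.

```python
from typing import Dict, Mapping, Optional


def support_context(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Collect the diagnostic headers from a response.

    Header names are matched case-insensitively, since HTTP header
    names are not case-sensitive. Missing headers map to None.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "request_id": lowered.get("x-request-id"),
        "worker_id": lowered.get("x-worker-id"),
        "worker_name": lowered.get("x-worker-name"),
    }


# Placeholder values for illustration only.
context = support_context({"X-Request-Id": "req-123", "x-worker-id": "w-7"})
```

On non-inference routes the worker entries come back as None, which matches the table: only X-Request-Id is present on every response.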
Rate limits
Pendra applies per-organisation rate limits scaled to your subscription
plan. The default is generous and covers most production workloads. When
you hit the limit you'll get a 429 with a detail
message describing the bucket; back off with jitter and retry.
If you expect to sustain very high throughput, contact sales to discuss a dedicated plan.
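The backoff advice above can be sketched as exponential backoff with full jitter. Everything numeric here is an assumption for illustration: the 0.5s base, the 30s cap, and the limit of five attempts are not values Pendra documents. The `send` callable is a stand-in for whatever function performs your request and returns a status code and body.

```python
import random
import time
from typing import Callable, Tuple


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter delay: a random value below min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(send: Callable[[], Tuple[int, str]],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep
                      ) -> Tuple[int, str]:
    """Call send() and retry on 429, sleeping with jittered backoff between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        # Return on anything other than 429, or when attempts are used up.
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Randomising the whole delay, rather than adding a small random offset to a fixed one, keeps many clients that were limited at the same moment from retrying in lockstep. The `rng` and `sleep` parameters exist only so the behaviour can be tested without waiting.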
Idempotency & retries
All inference endpoints are safe to retry. Chat completions and image generation are non-deterministic, so a retry produces a fresh sample rather than a copy of the original result. Duplicate billing is not a concern, because retries only bill on success.