Errors & rate limits

Pendra returns standard HTTP status codes. Errors always come back as JSON with a short detail message, and every response carries an X-Request-Id header you can paste into a support ticket.

Status codes

Code | Meaning | What to do
400 | Bad request: malformed JSON, unknown model, or invalid field. | Fix the request body. The detail field describes the problem.
401 | Missing or invalid API key. | Check the Authorization or x-api-key header. Rotate the key from the console if needed.
403 | Authenticated but not allowed, e.g. owner-only operation called by a member. | Use a key from a user with the right role.
404 | Unknown route or resource. | Check the URL prefix (/api/v1 vs /v1).
429 | Rate limited. | Back off and retry, using exponential backoff with jitter. Higher plans get higher limits.
500 | Unexpected server error. | Retry once; if it persists, send the X-Request-Id to support.
502 | Worker error: the GPU worker that picked up your request returned an error. | The X-Worker-Id / X-Worker-Name headers identify the worker. Retry; Pendra will pick a different worker.
504 | Timeout: chat > ~100s idle, embeddings > ~60s, images > ~600s. | Use stream: true for long generations, or batch smaller embedding requests.
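
The status code table can be condensed into a small lookup that a client consults before deciding whether to retry. This is only a sketch, not part of any Pendra SDK: the names `action_for_status` and `is_retryable` and the action labels are hypothetical, and the decisions simply mirror the "What to do" guidance for each code.

```python
# Hypothetical helper: maps an HTTP status code to the action suggested
# in the status code table. Function names and labels are illustrative.

ACTIONS = {
    400: "fix_request",         # malformed JSON, unknown model, invalid field
    401: "check_api_key",       # missing or invalid key
    403: "check_role",          # authenticated but not allowed
    404: "check_url",           # unknown route or resource
    429: "retry_with_backoff",  # rate limited
    500: "retry_once",          # unexpected server error
    502: "retry",               # worker error; a different worker is picked
    504: "stream_or_batch",     # use stream: true or smaller batches
}

RETRYABLE = {"retry_with_backoff", "retry_once", "retry"}


def action_for_status(status: int) -> str:
    """Return the suggested action, or 'ok' / 'unknown' for other codes."""
    if 200 <= status < 300:
        return "ok"
    return ACTIONS.get(status, "unknown")


def is_retryable(status: int) -> bool:
    """True when the guidance is to retry the same request unchanged."""
    return action_for_status(status) in RETRYABLE
```

Treating 504 as not retryable is a design choice in this sketch: the guidance for timeouts is to change the request (stream, or send smaller batches) rather than resend it as-is.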

Error body shape

{
  "detail": "Invalid or expired token"
}

For OpenAI-compatible endpoints, Pendra also returns the OpenAI-shaped error envelope when the upstream backend produces one — e.g. { "error": { "message": "...", "type": "..." } }.
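
Because an error can arrive in either shape, client code may want one function that reads both. The following is a minimal sketch under that assumption; `extract_error_message` is a hypothetical name, and the fallback of returning the raw body for anything unrecognised is a choice made here, not documented Pendra behaviour.

```python
import json
from typing import Any


def extract_error_message(body: str) -> str:
    """Return a readable message from either error shape, else the raw body."""
    try:
        data: Any = json.loads(body)
    except json.JSONDecodeError:
        return body  # not JSON; fall back to the raw text
    if not isinstance(data, dict):
        return body

    # Pendra's own shape: {"detail": "..."}
    detail = data.get("detail")
    if isinstance(detail, str):
        return detail

    # OpenAI-shaped envelope: {"error": {"message": "...", "type": "..."}}
    error = data.get("error")
    if isinstance(error, dict) and isinstance(error.get("message"), str):
        return error["message"]

    return body
```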

Helpful response headers

Header | On which responses | Use
X-Request-Id | All requests | Quote this on a support ticket so we can look up the exact request.
X-Worker-Id | Inference routes (chat, embeddings, images, audio) | Identifies which GPU worker served the request.
X-Worker-Name | Inference routes | Human-readable worker name from the console.
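
To make these headers easy to log or attach to a ticket, a client could collect them in one place. The sketch below assumes your HTTP library exposes response headers as a string-to-string mapping; `support_context` and the keys of the returned dictionary are hypothetical names chosen for illustration.

```python
from typing import Dict, Mapping, Optional


def support_context(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Collect the diagnostic headers, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "request_id": lowered.get("x-request-id"),
        "worker_id": lowered.get("x-worker-id"),      # inference routes only
        "worker_name": lowered.get("x-worker-name"),  # inference routes only
    }
```

Header names are lowercased before lookup because HTTP header names are case-insensitive and libraries differ in how they normalise them. Missing headers come back as `None`, which is expected for the worker headers on non-inference routes.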

Rate limits

Pendra applies per-organisation rate limits scaled to your subscription plan. The default is generous and covers most production workloads. When you hit the limit you'll get a 429 with a detail message describing the bucket; back off with jitter and retry.
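
One common way to back off with jitter is the "full jitter" scheme: wait a random time between zero and an exponentially growing ceiling. The sketch below uses that scheme as an assumption, since the page does not prescribe a particular algorithm. `send` is a hypothetical stand-in for your own HTTP call, and the base delay, cap, and attempt count are illustrative values, not Pendra's limits.

```python
import random
import time
from typing import Callable, Tuple

# `send` is a hypothetical callable returning (status_code, body).
# The defaults below are illustrative, not values published by Pendra.


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random fraction of min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(send: Callable[[], Tuple[int, str]],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep
                      ) -> Tuple[int, str]:
    """Call `send`, retrying on 429 until it succeeds or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))  # wait before the next attempt
    return status, body  # still rate limited after the final attempt
```

The `sleep` and `rng` parameters exist only so the waiting and randomness can be replaced in tests; in normal use the defaults apply.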

If you expect to sustain very high throughput, contact sales to discuss a dedicated plan.

Idempotency & retries

All inference endpoints are safe to retry. Chat completions and image generation are non-deterministic, so a retry produces a fresh sample rather than a repeat of the first response. Duplicate billing is not a concern, because retries are billed only on success.