
Responses API (OpenAI Codex)

OpenAI-compatible Responses API for the Codex CLI and other Responses-format clients.

POST /v1/responses

curl https://api.pendra.ai/v1/responses \
  -H "Authorization: Bearer pdr_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6:27b",
    "input": "Write a one-line summary of UK GDPR."
  }'

from openai import OpenAI

client = OpenAI(
    api_key="pdr_sk_...",
    base_url="https://api.pendra.ai/v1",
)

response = client.responses.create(
    model="qwen3.6:27b",
    input="Write a one-line summary of UK GDPR.",
)
print(response.output_text)

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'pdr_sk_...',
  baseURL: 'https://api.pendra.ai/v1',
});

const response = await client.responses.create({
  model: 'qwen3.6:27b',
  input: 'Write a one-line summary of UK GDPR.',
});
console.log(response.output_text);

Example response (200 OK)

{
  "id": "resp_01HZ8b...",
  "object": "response",
  "created_at": 1715346400,
  "status": "completed",
  "model": "qwen3.6:27b",
  "output": [
    {
      "type": "message",
      "id": "msg_01",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "UK GDPR governs personal data of UK residents."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 18,
    "total_tokens": 30
  }
}

Pendra implements the OpenAI Responses API at /v1/responses (also aliased at /api/v1/responses) so the OpenAI Codex CLI works without modification when pointed at Pendra. The Python and JavaScript examples use the official OpenAI SDKs.

Body application/json

model string required

Model ID. Unknown OpenAI model names fall back to an available chat model — see Model mapping below.

input string | array required

Either a plain-text prompt or an array of input items holding the full conversation. This endpoint is stateless, so send the complete conversation history with every request.

store boolean

Accepted but has no effect — nothing is persisted server-side.

previous_response_id is not supported and returns a 400.
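Because the endpoint keeps no server-side state, a multi-turn client has to carry the conversation itself. The sketch below uses only the Python standard library. It assumes Pendra accepts the standard Responses message items ({"role": ..., "content": ...}) in input. The helper names build_body, record_turn and send are hypothetical, and API_KEY is a placeholder.

```python
import json
import urllib.request

# Placeholders taken from the examples above; substitute your own key.
PENDRA_URL = "https://api.pendra.ai/v1/responses"
API_KEY = "pdr_sk_..."


def build_body(model: str, history: list[dict], user_text: str) -> dict:
    """Return a request body that resends the whole conversation plus the new turn.

    The endpoint is stateless, so every request carries the full history
    instead of a previous_response_id.
    """
    messages = history + [{"role": "user", "content": user_text}]
    return {"model": model, "input": messages}


def record_turn(history: list[dict], user_text: str, assistant_text: str) -> list[dict]:
    """Return a new history list with one finished user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]


def send(body: dict) -> dict:
    """POST the body to Pendra and return the decoded JSON envelope (needs network)."""
    # Client-side guard: Pendra answers 400 if previous_response_id is present.
    if "previous_response_id" in body:
        raise ValueError("previous_response_id is not supported; resend the history instead")
    request = urllib.request.Request(
        PENDRA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

On each turn, call build_body with the accumulated history, send it, then pass the reply text to record_turn before the next request. record_turn returns a new list instead of mutating its argument, so request bodies you have already built stay unchanged.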

Response

Pendra returns the OpenAI Responses envelope (see example). output is an array of items, and each message item carries an array of content blocks. status is completed when the model finishes cleanly, or incomplete when generation is cut off before the model finishes, for example by reaching max_output_tokens.
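Clients that call the endpoint without the OpenAI SDK can recover the reply text by walking the envelope directly. The sketch below roughly mirrors the SDK's output_text property. It joins every output_text block found in message items and ignores other item types. The function names are illustrative and not part of any Pendra API.

```python
def extract_output_text(envelope: dict) -> str:
    """Concatenate the text of every output_text block in message items."""
    parts = []
    for item in envelope.get("output", []):
        if item.get("type") != "message":
            continue  # this sketch ignores non-message output items
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block.get("text", ""))
    return "".join(parts)


def finished_cleanly(envelope: dict) -> bool:
    """True only when the response status is completed (not incomplete or anything else)."""
    return envelope.get("status") == "completed"
```

For the example response above, extract_output_text returns "UK GDPR governs personal data of UK residents." and finished_cleanly returns True.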

Model mapping

Codex hard-codes OpenAI model names like gpt-5-codex and gpt-5.5. Pendra falls back to an available chat model on your worker pool when an unknown OpenAI model name is requested, so Codex works out of the box. To pin a specific Pendra model, set model in your Codex config — see Integrations → Codex.

Streaming

The Responses API uses its own streaming event taxonomy (response.output_text.delta, response.completed, and so on). Pendra translates its streamed chat completions into these events, which the Codex CLI consumes directly.

event: response.created
data: {"type":"response.created","response":{"id":"resp_01HZ8b","object":"response","status":"in_progress","model":"qwen3.6:27b"}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_01","output_index":0,"content_index":0,"delta":"Hello"}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_01HZ8b","status":"completed","usage":{"input_tokens":8,"output_tokens":2,"total_tokens":10}}}

Quick start

The full Codex setup — config file, env vars, and a model pin — lives in Integrations → Codex.