API

Text completions

The completions endpoint sends your prompt to the model exactly as written and returns the text it generates — no chat template is applied. Reach for it when chat formatting gets in the way: base (non-instruct) models, fill-in-the-middle code completion, or older clients that target the legacy OpenAI Completions API. For assistant-style conversations, use Chat completions instead.

Endpoint

POST https://api.pendra.ai/v1/completions
POST https://api.pendra.ai/api/v1/completions    # alias

Request

curl
curl https://api.pendra.ai/v1/completions \
  -H "Authorization: Bearer pdr_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b",
    "prompt": "def fibonacci(n):",
    "max_tokens": 64,
    "stop": ["\n\n"]
  }'

Fields

  • model — model ID. Browse available models.
  • prompt — the text to complete (a string, or an array of strings).
  • max_tokens, temperature, top_p, top_k, min_p, stop, seed — standard sampling controls.
  • frequency_penalty, presence_penalty, logit_bias — repetition and token-bias controls.
  • suffix — text that comes after the insertion point, for fill-in-the-middle completion (model permitting).
  • stream — set true for server-sent events.

Any other standard OpenAI Completions field is forwarded to the serving worker as-is. Whether a given field takes effect depends on the model and backend serving the request.

Response

A single OpenAI-shaped text_completion object. The generated text is in choices[0].text.

{
  "id": "cmpl-9f2b1c8e",
  "object": "text_completion",
  "created": 1715346400,
  "model": "qwen2.5-coder:7b",
  "choices": [
    {
      "text": "\n    if n < 2:\n        return n\n    return fibonacci(n - 1) + fibonacci(n - 2)",
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 6, "completion_tokens": 24, "total_tokens": 30 }
}

Streaming

With "stream": true you receive text_completion chunks as server-sent events, terminated by a data: [DONE] line — the same framing as chat completions.

Backend support

Raw completions are served by workers running Ollama, vLLM, or LM Studio. The in-process Pendra backend (llama.cpp) does not serve this endpoint and returns 415 — point the request at a worker running one of the above, or use Chat completions.