Text completions
The completions endpoint sends your prompt to
the model exactly as written and returns the text it generates — no chat
template is applied. Reach for it when chat formatting gets in the way:
base (non-instruct) models, fill-in-the-middle code completion, or older
clients that target the legacy OpenAI Completions API. For assistant-style
conversations, use Chat completions
instead.
Endpoint
POST https://api.pendra.ai/v1/completions
POST https://api.pendra.ai/api/v1/completions # alias
Request
curl https://api.pendra.ai/v1/completions \
-H "Authorization: Bearer pdr_sk_..." \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5-coder:7b",
"prompt": "def fibonacci(n):",
"max_tokens": 64,
"stop": ["\n\n"]
}'
Fields
model: model ID. Browse available models.
prompt: the text to complete (a string, or an array of strings).
max_tokens, temperature, top_p, top_k, min_p, stop, seed: standard sampling controls.
frequency_penalty, presence_penalty, logit_bias: repetition and token-bias controls.
suffix: text that comes after the insertion point, for fill-in-the-middle completion (model permitting).
stream: set true for server-sent events.
Any other standard OpenAI Completions field is forwarded to the serving worker as-is. Whether a given field takes effect depends on the model and backend serving the request.
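The curl request above can also be assembled from Python. The following is a minimal sketch using only the standard library. The helper name build_completion_request is hypothetical and not part of any Pendra SDK, and the API key is the elided placeholder from the curl example. The helper builds the request without sending it, so the JSON body can be inspected first.

```python
import json
import urllib.request

API_URL = "https://api.pendra.ai/v1/completions"


def build_completion_request(api_key, model, prompt, suffix=None, **sampling):
    """Build (but do not send) a POST request for the completions endpoint.

    Hypothetical helper for illustration, not a Pendra SDK call.
    Extra keyword arguments (max_tokens, stop, seed, ...) are copied into
    the JSON body unchanged.
    """
    body = {"model": model, "prompt": prompt}
    if suffix is not None:
        # Only takes effect on models that support fill-in-the-middle.
        body["suffix"] = suffix
    body.update(sampling)
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Mirrors the curl example; "pdr_sk_..." is a placeholder, not a real key.
request = build_completion_request(
    "pdr_sk_...",
    "qwen2.5-coder:7b",
    "def fibonacci(n):",
    max_tokens=64,
    stop=["\n\n"],
)
payload = json.loads(request.data)
```

To send the request, pass it to urllib.request.urlopen and decode the JSON body of the response. For fill-in-the-middle, pass the text before the gap as prompt and the text after it as suffix.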
Response
A single OpenAI-shaped text_completion object. The generated
text is in choices[0].text.
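As a rough sketch of reading that object in Python, the hypothetical helper below pulls out the generated text, the finish reason, and the token total. The sample body is a trimmed illustration, not real output, and treating usage as optional is an assumption of this sketch.

```python
import json


def extract_completion(raw_body):
    """Return (text, finish_reason, total_tokens) from a text_completion body."""
    data = json.loads(raw_body)
    if data.get("object") != "text_completion":
        raise ValueError(f"unexpected object type: {data.get('object')!r}")
    choice = data["choices"][0]
    # Assumption: usage may be absent, so fall back to an empty mapping.
    usage = data.get("usage") or {}
    return choice["text"], choice.get("finish_reason"), usage.get("total_tokens")


# Trimmed, made-up sample shaped like a text_completion response.
sample = json.dumps({
    "object": "text_completion",
    "choices": [{"text": "\n    return n", "index": 0, "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 6, "completion_tokens": 4, "total_tokens": 10},
})
text, finish_reason, total_tokens = extract_completion(sample)
```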
{
"id": "cmpl-9f2b1c8e",
"object": "text_completion",
"created": 1715346400,
"model": "qwen2.5-coder:7b",
"choices": [
{
"text": "\n if n < 2:\n return n\n return fibonacci(n - 1) + fibonacci(n - 2)",
"index": 0,
"finish_reason": "stop"
}
],
"usage": { "prompt_tokens": 6, "completion_tokens": 24, "total_tokens": 30 }
}
Streaming
With "stream": true you receive text_completion
chunks as server-sent events, terminated by a data: [DONE]
line — the same framing as chat completions.
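One way to consume such a stream is sketched below. The function name iter_stream_text is hypothetical, and the sketch assumes each chunk carries its text fragment in choices[0].text, as the non-streaming response does; check an actual stream before relying on that. The sample events are illustrative, not captured output.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # Assumption: fragments live in choices[0].text, like the full response.
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) == 0:
                yield choice.get("text", "")


# Illustrative events only; a real stream comes from the HTTP response.
sample_events = [
    b'data: {"object": "text_completion", "choices": [{"text": "\\n    if n < 2:", "index": 0}]}\n',
    b"\n",
    b'data: {"object": "text_completion", "choices": [{"text": "\\n        return n", "index": 0}]}\n',
    b"\n",
    b"data: [DONE]\n",
]
completion = "".join(iter_stream_text(sample_events))
```

The response object returned by urllib.request.urlopen iterates over lines of bytes, so it can be passed to this function directly in place of sample_events.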
Backend support
Raw completions are served by workers running Ollama, vLLM, or LM Studio.
The in-process Pendra backend (llama.cpp) does not serve this endpoint and
returns 415 — point the request at a worker running one of the
above, or use Chat completions.
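A client can turn that 415 into a more descriptive error. The sketch below is one possible approach, not a prescribed pattern: RawCompletionsUnsupported and send_completion are hypothetical names, and the opener parameter exists only so the function can be exercised without a network call.

```python
import urllib.error
import urllib.request


class RawCompletionsUnsupported(RuntimeError):
    """Raised when the serving backend does not offer raw completions."""


def send_completion(request, opener=urllib.request.urlopen):
    """Send a prepared request, translating HTTP 415 into a clearer error."""
    try:
        with opener(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 415:
            # Per the backend notes above: the llama.cpp backend answers 415.
            raise RawCompletionsUnsupported(
                "backend does not serve raw completions; route to an Ollama, "
                "vLLM, or LM Studio worker, or use Chat completions"
            ) from err
        raise  # any other HTTP error is passed through unchanged
```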