Python SDK
The Pendra Python SDK is a drop-in OpenAI-compatible client for sovereign UK inference. Sync and async, with streaming. Python 3.10+.
Installation
$ pip install pendra
Requires Python 3.10 or later. View on PyPI.
Quick start
from pendra import Pendra

client = Pendra(
    api_key="pdr_sk_...",  # or set PENDRA_API_KEY env var
)

response = client.chat.completions.create(
    model="qwen3.6:27b",
    messages=[{
        "role": "user",
        "content": "What is the capital of the UK?"
    }]
)
print(response.choices[0].message.content)
Streaming
Stream responses token by token using a context manager.
with client.chat.completions.create(
    model="qwen3.6:27b",
    messages=[{"role": "user", "content": "Write a poem about London."}],
    stream=True,
) as stream:
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
Async client
Use AsyncPendra for asyncio applications. It supports both
streaming and non-streaming calls.
import asyncio
from pendra import AsyncPendra

async def main():
    async with AsyncPendra(api_key="pdr_sk_...") as client:
        response = await client.chat.completions.create(
            model="qwen3.6:27b",
            messages=[{"role": "user", "content": "Hello!"}]
        )
        print(response.choices[0].message.content)

asyncio.run(main())
Image generation
Generate images from a text prompt. Returns base64-encoded PNGs by
default — decode and save to disk, or set
response_format="url" when supported by the model.
import base64

response = client.images.generations.create(
    model="x/z-image-turbo",
    prompt="A red London double-decker bus at sunset",
    size="1024x1024",
)
with open("bus.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))
Use AsyncPendra for async applications:
# Run inside an async function
async with AsyncPendra(api_key="pdr_sk_...") as client:
    response = await client.images.generations.create(
        model="x/z-image-turbo",
        prompt="A red London double-decker bus at sunset",
    )
Image generation is non-streaming: the endpoint returns a single JSON response once the worker finishes.
Embeddings
Generate vector embeddings for retrieval, search, and RAG pipelines.
OpenAI-compatible — pass a string or a list of strings and get back a
CreateEmbeddingResponse with one embedding per input.
response = client.embeddings.create(
    model="nomic-embed-text:latest",
    input=["The quick brown fox", "jumps over the lazy dog"],
)
for item in response.data:
    print(item.index, len(item.embedding), "dims")
print(response.usage.prompt_tokens)
Any embedding model in the Pendra catalogue works, for example
nomic-embed-text, mxbai-embed-large, bge-m3, qwen3-embedding, and
all-minilm. Also available on AsyncPendra via
await client.embeddings.create(...).
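For the retrieval and search use cases mentioned above, the returned vectors are usually compared by cosine similarity. The following is a minimal sketch in plain Python, not part of the SDK; the helper names cosine_similarity and rank_by_similarity are hypothetical, and the vectors would come from item.embedding on each entry of response.data.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], documents: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scores = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(documents)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical usage with an embeddings response:
#   vectors = [item.embedding for item in response.data]
#   ranking = rank_by_similarity(vectors[0], vectors[1:])
```

For large collections a vector index or NumPy would be the more practical choice; the pure-Python version is only meant to show the computation.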
Audio transcription
Transcribe audio to text using Whisper-class models. Multipart upload —
pass an open binary file, a path, or a (filename, bytes)
tuple. Files capped at 25 MB.
with open("meeting.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        file=f,
        model="whisper-large-v3-turbo",
        language="en",
    )
print(result.text)
Use AsyncPendra for async applications:
# Run inside an async function
async with AsyncPendra(api_key="pdr_sk_...") as client:
    with open("meeting.mp3", "rb") as f:
        result = await client.audio.transcriptions.create(
            file=f,
            model="whisper-large-v3-turbo",
        )
    print(result.text)
Need subtitles? Pass response_format="srt" (or
"vtt") and the call returns the subtitle body, which the SDK exposes
as result.text. Write it to disk and you have a ready-to-use caption track:
# Get SRT subtitles (use "vtt" for WebVTT) instead of plain JSON
with open("meeting.mp3", "rb") as f:
    srt = client.audio.transcriptions.create(
        file=f,
        model="whisper-large-v3-turbo",
        response_format="srt",
    )
with open("meeting.srt", "w", encoding="utf-8") as f:
    f.write(srt.text)  # the SDK wraps text/srt/vtt bodies as { text: ... }
Transcription is non-streaming — the endpoint returns a single JSON
response once the worker finishes. result.duration and
result.language are populated when the backend supplies them;
result.segments appears when
response_format="verbose_json".
List models
Query available models from your Pendra instance.
models = client.models.list()
for model in models:
    print(model.id)
Migrating from OpenAI
The Pendra SDK mirrors the OpenAI interface. Switching takes two lines: the import and the client constructor. Calls to the endpoints documented here keep the same shape.
# Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")

# After
from pendra import Pendra
client = Pendra(api_key="pdr_sk_...")
API reference
client.chat.completions.create()
Create a chat completion. Returns ChatCompletion or
Stream.
| Parameter | Type | Description |
|---|---|---|
| model | str | Model ID (e.g. "qwen3.6:27b") |
| messages | list[dict] | Chat messages with role and content |
| stream | bool | Enable streaming (default False) |
| temperature | float? | Sampling temperature (0–2) |
| max_tokens | int? | Maximum tokens to generate |
| top_p | float? | Top-p sampling value |
| stop | str \| list? | Stop sequence(s) |
client.images.generations.create()
Generate images from a text prompt. Returns ImageResponse.
Also available as await client.images.generations.create(...)
on AsyncPendra.
| Parameter | Type | Description |
|---|---|---|
| model | str | Image model ID (e.g. "x/z-image-turbo") |
| prompt | str | Text description of the image to generate |
| n | int? | Number of images, 1–4 (default 1) |
| size | str? | Dimensions as WIDTHxHEIGHT (default "1024x1024") |
| response_format | str? | "b64_json" (default) or "url" |
| num_inference_steps | int? | Diffusion steps (model-dependent) |
| seed | int? | Random seed for reproducibility |
| negative_prompt | str? | Text to avoid in the generated image |
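When n is greater than 1, response.data holds several base64 payloads. One way to write them all to disk is sketched here; save_b64_images is a hypothetical helper, not an SDK function, and the stem-index.png naming scheme is just an illustrative choice.

```python
import base64
from pathlib import Path
from typing import Iterable, List


def save_b64_images(b64_items: Iterable[str], directory: str, stem: str = "image") -> List[Path]:
    """Decode base64 payloads and write them as stem-0.png, stem-1.png, ..."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, payload in enumerate(b64_items):
        path = out_dir / f"{stem}-{index}.png"
        path.write_bytes(base64.b64decode(payload))
        paths.append(path)
    return paths


# Hypothetical usage with an image response requested with n=4:
#   save_b64_images([d.b64_json for d in response.data], "out", stem="bus")
```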
client.embeddings.create()
Create embeddings. Returns CreateEmbeddingResponse. Also
await client.embeddings.create(...) on AsyncPendra.
| Parameter | Type | Description |
|---|---|---|
| model | str | Embedding model ID (e.g. "nomic-embed-text:latest") |
| input | str \| list[str] | Text to embed. Accepts a single string or a batch. |
| encoding_format | str? | "float" (default) or "base64" |
| dimensions | int? | Output dimensionality (Matryoshka models like nomic-embed-text) |
| user | str? | Optional end-user identifier for abuse monitoring |
client.models.list()
Returns a list of Model objects available on the instance. Each model has
id, object, created, and
owned_by fields.
client.audio.transcriptions.create()
Transcribe an audio file. Returns TranscriptionResponse.
Also available as await client.audio.transcriptions.create(...)
on AsyncPendra.
| Parameter | Type | Description |
|---|---|---|
| file | IO[bytes] \| str \| tuple | Open binary file, file path, or (filename, bytes) tuple. ≤ 25 MB. |
| model | str | Transcription model ID (e.g. "whisper-large-v3-turbo") |
| language | str? | ISO-639-1 language hint, optional |
| prompt | str? | Biasing prompt (vocabulary, formatting), optional |
| response_format | str? | "json" (default), "text", "srt", "vtt", or "verbose_json" |
| temperature | float? | Sampling temperature 0.0–1.0 |
| timestamp_granularities | list[str]? | "segment" and/or "word" (verbose_json only) |
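Because uploads are capped at 25 MB, it can be cheaper to check the file size locally before sending it. This is a sketch only: check_upload_size is a hypothetical helper, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption, since the exact byte limit is not stated here.

```python
import os

# Assumption: "25 MB" is interpreted as 25 MiB; the service may count differently.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_upload_size(path: str, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes, over the {limit}-byte limit")
    return size


# Hypothetical usage before a transcription call:
#   check_upload_size("meeting.mp3")
```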
Environment variables
| Variable | Description |
|---|---|
| PENDRA_API_KEY | Your API key (pdr_sk_…). Used when no api_key is passed to the constructor. |
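An application may want to fail at startup with a clear message if no key is available, rather than on the first request. The sketch below mirrors the documented precedence (explicit argument first, then PENDRA_API_KEY); resolve_api_key is a hypothetical application-side helper and says nothing about how the SDK resolves the key internally.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the explicit key if given, else PENDRA_API_KEY, else raise."""
    env = os.environ if env is None else env
    key = explicit or env.get("PENDRA_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass api_key or set PENDRA_API_KEY")
    return key


# Hypothetical usage:
#   client = Pendra(api_key=resolve_api_key())
```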
Error handling
All exceptions inherit from pendra.APIError.
| Exception | Status | When |
|---|---|---|
| AuthenticationError | 401 | Invalid or missing API key |
| RateLimitError | 429 | Too many requests |
| APIStatusError | 4xx/5xx | Any other non-2xx response |
| APIConnectionError | — | Network or connection failure |
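Rate limits and connection failures are usually transient, so a common pattern is to retry them with exponential backoff while letting other errors propagate. The sketch below is generic and does not import the SDK: call_with_retry and its parameters are hypothetical, and the caller passes in the exception types to retry. Whether the SDK already retries internally is not stated here, and the import path of the exception classes other than pendra.APIError is an assumption.

```python
import random
import time


def call_with_retry(call, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on a retry_on exception, back off and try again.

    Delays grow as base_delay * 2**(attempt - 1) plus random jitter.
    The last failure is re-raised once max_attempts is reached.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


# Hypothetical usage (assumes the exception classes are importable from pendra):
#   call_with_retry(
#       lambda: client.chat.completions.create(model="qwen3.6:27b", messages=msgs),
#       retry_on=(pendra.RateLimitError, pendra.APIConnectionError),
#   )
```

AuthenticationError is deliberately left out of the retry set in the usage note, since a bad key will not fix itself on a second attempt.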