SDKs

Python SDK

The Pendra Python SDK is a drop-in OpenAI-compatible client for sovereign UK inference. It provides synchronous and asynchronous clients, both with streaming support, and requires Python 3.10+.

Installation

bash

$ pip install pendra

Requires Python 3.10 or later. The package is published on PyPI as pendra.

Quick start

quickstart.py

from pendra import Pendra

client = Pendra(
    api_key="pdr_sk_...",   # or set PENDRA_API_KEY env var
)

response = client.chat.completions.create(
    model="qwen3.6:27b",
    messages=[{
        "role": "user",
        "content": "What is the capital of the UK?"
    }]
)

print(response.choices[0].message.content)

Streaming

Stream responses token by token using a context manager.

stream.py

with client.chat.completions.create(
    model="qwen3.6:27b",
    messages=[{"role": "user", "content": "Write a poem about London."}],
    stream=True,
) as stream:
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
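If you need the complete reply as well as the live output, you can accumulate the deltas while printing them. The helper below is a minimal sketch: `collect_stream` is a hypothetical name, not part of the SDK, and it relies only on the `chunk.choices[0].delta.content` shape used above. The fake chunks built with `SimpleNamespace` stand in for real stream chunks so the example runs without an API key.

collect_stream.py

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Join streamed deltas into the full reply (hypothetical helper)."""
    parts: list[str] = []
    for chunk in chunks:
        # Defensive: skip chunks that carry no choices at all.
        if not chunk.choices:
            continue
        # A delta's content can be None (e.g. on the final chunk).
        text = chunk.choices[0].delta.content or ""
        if echo:
            print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)


def fake_chunk(content):
    # Mimics the attribute path chunk.choices[0].delta.content.
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=content))]
    )


fake = [fake_chunk("Fog on "), fake_chunk("the Thames"), fake_chunk(None)]
full_text = collect_stream(fake, echo=False)
print(full_text)
```

With the real client, call `full_text = collect_stream(stream)` inside the `with` block so the stream is consumed before it closes.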

Async client

Use AsyncPendra for asyncio applications. It supports both streaming and non-streaming requests.

async.py

import asyncio
from pendra import AsyncPendra

async def main():
    async with AsyncPendra(api_key="pdr_sk_...") as client:
        response = await client.chat.completions.create(
            model="qwen3.6:27b",
            messages=[{"role": "user", "content": "Hello!"}]
        )
        print(response.choices[0].message.content)

asyncio.run(main())
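To send several requests at once from one async client, one common approach is to fan out with `asyncio.gather` and cap concurrency with a semaphore. This is a sketch under assumptions: `run_bounded` and `fake_request` are hypothetical, and `fake_request` stands in for an awaited call such as `client.chat.completions.create(...)` so the example runs offline. The limit is arbitrary; pick one that stays within your rate limits.

bounded_async.py

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_bounded(
    factories: list[Callable[[], Awaitable[T]]], limit: int = 4
) -> list[T]:
    """Run coroutine factories concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory: Callable[[], Awaitable[T]]) -> T:
        # The request is only created once a semaphore slot is free.
        async with semaphore:
            return await factory()

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(f) for f in factories))


async def fake_request(prompt: str) -> str:
    # Hypothetical stand-in for an awaited SDK call.
    await asyncio.sleep(0.01)
    return prompt.upper()


async def main() -> list[str]:
    prompts = ["hello", "bonjour", "hola"]
    # Default argument binds each prompt so every lambda keeps its own value.
    factories = [lambda p=p: fake_request(p) for p in prompts]
    return await run_bounded(factories, limit=2)


results = asyncio.run(main())
print(results)
```

With AsyncPendra, each factory would be something like `lambda p=p: client.chat.completions.create(model="qwen3.6:27b", messages=[{"role": "user", "content": p}])`, with all calls sharing the one client opened in `async with`.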

Image generation

Generate images from a text prompt. By default the response contains base64-encoded PNGs, which you decode and save to disk; set response_format="url" instead if the model supports it.

images.py

import base64

response = client.images.generations.create(
    model="flux.1-schnell",
    prompt="A red London double-decker bus at sunset",
)

with open("bus.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))

Use AsyncPendra for async applications:

images_async.py

import asyncio
from pendra import AsyncPendra

async def main():
    async with AsyncPendra(api_key="pdr_sk_...") as client:
        response = await client.images.generations.create(
            model="sdxl-turbo", prompt="A red London double-decker bus at sunset"
        )

asyncio.run(main())

Image generation is non-streaming — the endpoint returns a single JSON response once the worker finishes.
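When you request several images with the n parameter, each entry in response.data carries its own b64_json string. A minimal sketch for saving them all, assuming the b64_json field shown above: `save_images` is a hypothetical helper, and the fake response data built with `SimpleNamespace` exists only so the example runs offline.

save_images.py

```python
import base64
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_images(data, directory: Path, stem: str = "image") -> list[Path]:
    """Decode each b64_json entry and write it as <stem>-<i>.png (hypothetical helper)."""
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, item in enumerate(data):
        path = directory / f"{stem}-{i}.png"
        path.write_bytes(base64.b64decode(item.b64_json))
        paths.append(path)
    return paths


# Offline stand-in for response.data: two tiny payloads that start with the PNG signature.
png_signature = b"\x89PNG\r\n\x1a\n"
fake_data = [
    SimpleNamespace(b64_json=base64.b64encode(png_signature + bytes([i])).decode())
    for i in range(2)
]

out_dir = Path(tempfile.mkdtemp())
saved = save_images(fake_data, out_dir, stem="bus")
print([p.name for p in saved])
```

With a real response you would call `save_images(response.data, Path("out"), stem="bus")`.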

Embeddings

Generate vector embeddings for retrieval, search, and RAG pipelines. OpenAI-compatible — pass a string or a list of strings and get back a CreateEmbeddingResponse with one embedding per input.

embeddings.py

response = client.embeddings.create(
    model="nomic-embed-text",
    input=["The quick brown fox", "jumps over the lazy dog"],
)

for item in response.data:
    print(item.index, len(item.embedding), "dims")

print(response.usage.prompt_tokens)

Any embedding model in the Pendra catalogue works, including nomic-embed-text, qwen3-embedding, bge-m3 and embeddinggemma. The same method is available on AsyncPendra as await client.embeddings.create(...).
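For retrieval, a common next step is ranking documents by cosine similarity to a query embedding. The sketch below uses plain Python so it runs anywhere; the three-dimensional vectors are made-up placeholders for real `item.embedding` lists, and `cosine_similarity` and `rank` are hypothetical helpers. For large corpora you would typically use a vector library or database instead.

similarity.py

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as dissimilar to everything.
    return dot / norm if norm else 0.0


def rank(query: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors; real ones come from response.data[i].embedding.
docs = {
    "fox": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.0],
    "bus": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]
for doc_id, score in rank(query, docs):
    print(f"{doc_id}: {score:.3f}")
```

Embed the query with the same model as the documents; vectors from different models are not comparable.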

Audio transcription

Transcribe audio to text using Whisper-class models. The file is sent as a multipart upload: pass an open binary file, a file path, or a (filename, bytes) tuple. Files are limited to 25 MB.

transcribe.py

with open("meeting.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        file=f,
        model="whisper-large-v3-turbo",
        language="en",
    )

print(result.text)
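Because uploads are capped at 25 MB, checking the file size locally lets you fail fast instead of waiting for the server to reject the request. A minimal sketch: `check_upload_size` is a hypothetical helper, and it assumes the conservative reading of 25 MB as 25,000,000 bytes (this page does not say whether decimal or binary megabytes are meant), so a file that passes should be accepted under either interpretation.

check_size.py

```python
import os
import tempfile

# Assumption: conservative decimal reading of "25 MB".
MAX_UPLOAD_BYTES = 25_000_000


def check_upload_size(path: str, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds `limit`."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size:,} bytes; the limit is {limit:,} bytes")
    return size


# Demo with a small temporary file standing in for meeting.mp3.
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
    tmp.write(b"\x00" * 1024)
print(check_upload_size(tmp.name))
```

Call it before opening the file, for example `check_upload_size("meeting.mp3")`, and split or re-encode longer recordings that exceed the limit.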

Use AsyncPendra for async applications:

transcribe_async.py

import asyncio
from pendra import AsyncPendra

async def main():
    async with AsyncPendra(api_key="pdr_sk_...") as client:
        with open("meeting.mp3", "rb") as f:
            result = await client.audio.transcriptions.create(
                file=f,
                model="whisper-large-v3-turbo",
            )
    print(result.text)

asyncio.run(main())

Need subtitles? Pass response_format="srt" (or "vtt" for WebVTT) and the call returns the subtitle body wrapped in an object whose text attribute holds the caption content. Write that to disk and you have a ready-to-use caption track:

subtitles.py

# Request SRT subtitles instead of JSON (use "vtt" for WebVTT)
with open("meeting.mp3", "rb") as f:
    srt = client.audio.transcriptions.create(
        file=f,
        model="whisper-large-v3-turbo",
        response_format="srt",
    )

with open("meeting.srt", "w", encoding="utf-8") as f:
    f.write(srt.text)  # text/srt/vtt bodies are wrapped in an object with a .text attribute

Transcription is non-streaming — the endpoint returns a single JSON response once the worker finishes. result.duration and result.language are populated when the model supplies them; result.segments appears when response_format="verbose_json".
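If you post-process SRT output, for example to align captions with video, you will need cue times in seconds. SRT cues are blocks of an index line, a `start --> end` line in HH:MM:SS,mmm format, and one or more text lines, separated by blank lines. A minimal parser sketch: `parse_srt` and `to_seconds` are hypothetical helpers that handle only well-formed SRT (not WebVTT), and the sample text is made up rather than real Pendra output.

parse_srt.py

```python
import re
from dataclasses import dataclass

# SRT timestamps use a comma before the milliseconds.
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float  # seconds
    text: str


def to_seconds(stamp: str) -> float:
    """Convert 'HH:MM:SS,mmm' to seconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    h, m, s, ms = (int(g) for g in match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt: str) -> list[Cue]:
    """Parse blank-line-separated cues: index, 'start --> end', then text lines."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # skip stray fragments
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append(
            Cue(int(lines[0]), to_seconds(start), to_seconds(end), "\n".join(lines[2:]))
        )
    return cues


sample = """1
00:00:00,000 --> 00:00:02,500
Good morning, everyone.

2
00:00:02,500 --> 00:00:05,120
Let's start with the budget."""

for cue in parse_srt(sample):
    print(cue.index, cue.start, cue.end, cue.text)
```

In practice you would pass `srt.text` from the subtitle call above. If you need word-level timing, response_format="verbose_json" with its segments is likely a better source than parsed SRT.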

List models

Query available models from your Pendra instance.

models.py

models = client.models.list()

for model in models:
    print(model.id)
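Model IDs can serve as a rough guide to capability, so a quick filter can help you find, say, the embedding models. This is a heuristic sketch, not an SDK feature: `find_models` is hypothetical and matches substrings of `model.id`, which only works when IDs follow names like those on this page. The `SimpleNamespace` objects stand in for the Model objects returned by client.models.list().

find_models.py

```python
from types import SimpleNamespace


def find_models(models, *keywords: str) -> list[str]:
    """Return sorted model IDs containing any of the keywords (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return sorted(m.id for m in models if any(k in m.id.lower() for k in wanted))


# Stand-ins for Model objects; real ones come from client.models.list().
catalogue = [
    SimpleNamespace(id="qwen3.6:27b"),
    SimpleNamespace(id="nomic-embed-text"),
    SimpleNamespace(id="qwen3-embedding"),
    SimpleNamespace(id="whisper-large-v3-turbo"),
]
print(find_models(catalogue, "embed"))
```

The limits of name matching show up quickly: bge-m3 is an embedding model but would not match "embed", so treat the result as a starting point rather than an authoritative list.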

Migrating from OpenAI

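The Pendra SDK mirrors the OpenAI interface, so switching usually means changing the import and the client constructor. You will also need to replace OpenAI model names with Pendra model IDs such as "qwen3.6:27b", and image generation is called as client.images.generations.create() rather than OpenAI's client.images.generate().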

migration.py

# Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")

# After
from pendra import Pendra
client = Pendra(api_key="pdr_sk_...")

API reference

`client.chat.completions.create()`

Create a chat completion. Returns a ChatCompletion, or a Stream of chunks when stream=True.

Parameter	Type	Description
`model`	`str`	Model ID (e.g. "qwen3.6:27b")
`messages`	`list[dict]`	Chat messages with role and content
`stream`	`bool`	Enable streaming (default False)
`temperature`	`float?`	Sampling temperature (0–2)
`max_tokens`	`int?`	Maximum tokens to generate
`top_p`	`float?`	Top-p sampling value
`stop`	`str | list[str]?`	Stop sequence(s)
`worker_id`	`str?`	Pin the request to a specific worker that serves this model (sent as the `X-Pendra-Worker-Id` header). Defaults to automatic routing.

`client.images.generations.create()`

Generate images from a text prompt. Returns ImageResponse. Also available as await client.images.generations.create(...) on AsyncPendra.

Parameter	Type	Description
`model`	`str`	Image model ID (e.g. "sdxl-turbo")
`prompt`	`str`	Text description of the image to generate
`n`	`int?`	Number of images, 1–4 (default 1)
`size`	`str?`	Dimensions as WIDTHxHEIGHT (defaults to the model's native resolution, e.g. 512x512 for SD 1.5, 1024x1024 for SDXL/FLUX)
`response_format`	`str?`	"b64_json" (default) or "url" (only for models that support URL output)
`num_inference_steps`	`int?`	Diffusion steps; defaults per model (~30 for standard SD/SDXL, 4 for turbo/schnell)
`seed`	`int?`	Random seed for reproducibility
`negative_prompt`	`str?`	Description of content to keep out of the generated image
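The size parameter is a WIDTHxHEIGHT string. If your application handles dimensions as numbers, a small helper can build and validate that string before the request. `format_size` and `parse_size` are hypothetical helpers; requiring positive integers is the only rule assumed here, since this page does not list which sizes each model accepts.

image_size.py

```python
import re

SIZE_RE = re.compile(r"(\d+)x(\d+)")


def format_size(width: int, height: int) -> str:
    """Build a 'WIDTHxHEIGHT' string, rejecting non-positive dimensions."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return f"{width}x{height}"


def parse_size(size: str) -> tuple[int, int]:
    """Split 'WIDTHxHEIGHT' into integers, e.g. '1024x768' -> (1024, 768)."""
    match = SIZE_RE.fullmatch(size.strip())
    if match is None:
        raise ValueError(f"expected WIDTHxHEIGHT, got {size!r}")
    width, height = int(match.group(1)), int(match.group(2))
    format_size(width, height)  # reuse the positivity check
    return width, height


print(format_size(1024, 1024))
print(parse_size("512x512"))
```

For example, `size=format_size(1024, 1024)` could be passed to client.images.generations.create() for an SDXL or FLUX model; omitting size falls back to the model's native resolution.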

`client.embeddings.create()`

Create embeddings. Returns CreateEmbeddingResponse. Also available as await client.embeddings.create(...) on AsyncPendra.

Parameter	Type	Description
`model`	`str`	Embedding model ID (e.g. "nomic-embed-text")
`input`	`str | list[str]`	Text to embed. Accepts a single string or a batch.
`encoding_format`	`str?`	"float" (default) or "base64"
`dimensions`	`int?`	Output dimensionality (Matryoshka models like nomic-embed-text)
`user`	`str?`	Optional end-user identifier for abuse monitoring
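With encoding_format="base64", each embedding arrives as a base64 string rather than a list of floats, which is more compact to transfer. In the OpenAI API that string encodes raw float32 values; this sketch assumes Pendra's OpenAI-compatible endpoint uses the same little-endian float32 layout, which this page does not confirm. `decode_embedding` is a hypothetical helper for when you receive the raw string, and the round-trip demo encodes its own vector rather than using a real response.

decode_embedding.py

```python
import base64
import struct


def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 string of little-endian float32 values (assumed layout)."""
    raw = base64.b64decode(b64)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip demo: pack a known vector the same way, then decode it.
vector = [0.5, -1.25, 3.0]
encoded = base64.b64encode(struct.pack(f"<{len(vector)}f", *vector)).decode("ascii")
print(decode_embedding(encoded))
```

If decoded values look like noise, the server is likely using a different layout; fall back to the default "float" format.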

`client.models.list()`

Returns a list of Model objects available on the instance. Each model has id, object, created, and owned_by fields.

`client.audio.transcriptions.create()`

Transcribe an audio file. Returns TranscriptionResponse. Also available as await client.audio.transcriptions.create(...) on AsyncPendra.

Parameter	Type	Description
`file`	`IO[bytes] | str | tuple`	Open binary file, file path, or (filename, bytes) tuple. ≤ 25 MB.
`model`	`str`	Transcription model ID (e.g. "whisper-large-v3-turbo")
`language`	`str?`	ISO 639-1 language code used as a hint (e.g. "en")
`prompt`	`str?`	Biasing prompt (vocabulary, formatting), optional
`response_format`	`str?`	"json" (default), "text", "srt", "vtt", or "verbose_json"
`temperature`	`float?`	Sampling temperature 0.0–1.0
`timestamp_granularities`	`list[str]?`	"segment" and/or "word" (verbose_json only)

Environment variables

Variable	Description
`PENDRA_API_KEY`	Your API key (`pdr_sk_…`). Used when no `api_key` is passed to the constructor.

Error handling

All SDK exceptions inherit from pendra.APIError.

Exception	Status	When
`AuthenticationError`	401	Invalid or missing API key
`RateLimitError`	429	Too many requests
`APIStatusError`	4xx/5xx	Any other non-2xx response
`APIConnectionError`	—	Network or connection failure
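RateLimitError and APIConnectionError usually signal transient conditions, so retrying them with exponential backoff is a common pattern. This is a generic sketch, not built-in SDK behaviour (this page does not say whether the client retries on its own): `with_retries` is hypothetical, the exception types to retry are passed in, and the delays are illustrative. The demo uses a local TransientError in place of the pendra exceptions so it runs offline.

retries.py

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`, retrying listed exceptions with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # base_delay * 1, 2, 4, ... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


class TransientError(Exception):
    """Local stand-in for pendra.RateLimitError in this demo."""


calls = {"n": 0}


def flaky() -> str:
    # Fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("try again")
    return "ok"


print(with_retries(flaky, retry_on=(TransientError,), sleep=lambda s: None))
print(calls["n"])
```

With the SDK this would look like `with_retries(lambda: client.chat.completions.create(...), retry_on=(pendra.RateLimitError, pendra.APIConnectionError))`. AuthenticationError is deliberately left out, since retrying with an invalid key will not help.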