Python SDK

The Pendra Python SDK is a drop-in, OpenAI-compatible client for sovereign UK inference. It offers sync and async clients with streaming support, and requires Python 3.10 or later.

Installation

bash
$ pip install pendra

Requires Python 3.10 or later. The package is available on PyPI.

Quick start

quickstart.py
from pendra import Pendra

client = Pendra(
    api_key="pdr_sk_...",   # or set PENDRA_API_KEY env var
)

response = client.chat.completions.create(
    model="qwen3.6:27b",
    messages=[{
        "role": "user",
        "content": "What is the capital of the UK?"
    }]
)

print(response.choices[0].message.content)

Streaming

Stream responses token by token using a context manager.

stream.py
with client.chat.completions.create(
    model="qwen3.6:27b",
    messages=[{"role": "user", "content": "Write a poem about London."}],
    stream=True,
) as stream:
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
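
When the full text is needed after streaming finishes (for logging or caching, say), the deltas can be accumulated as they arrive. The following is a minimal sketch rather than part of the SDK: collect_stream is a hypothetical helper, and it assumes chunks follow the OpenAI-style shape used above (choices[0].delta.content, which may be None). The demo uses stand-in objects so that it runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:      # assumption: some chunks may carry no choices
            continue
        delta = chunk.choices[0].delta
        if delta.content:          # content may be None or empty on some chunks
            parts.append(delta.content)
    return "".join(parts)


def _fake_chunk(text):
    """Stand-in for a streamed chunk; used only for this demo."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_chunks = [
    _fake_chunk("Lon"),
    _fake_chunk(None),
    _fake_chunk("don"),
    SimpleNamespace(choices=[]),
]
demo_text = collect_stream(demo_chunks)
print(demo_text)
```

With a real client, the stream object from the with block above would be passed in place of demo_chunks.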

Async client

Use AsyncPendra for asyncio applications. Supports both streaming and non-streaming.

async.py
import asyncio
from pendra import AsyncPendra

async def main():
    async with AsyncPendra(api_key="pdr_sk_...") as client:
        response = await client.chat.completions.create(
            model="qwen3.6:27b",
            messages=[{"role": "user", "content": "Hello!"}]
        )
        print(response.choices[0].message.content)

asyncio.run(main())
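
One reason to reach for the async client is to issue several requests concurrently. The sketch below shows one way to do that with asyncio.gather and a semaphore that caps the number of in-flight requests. ask_many and max_concurrency are hypothetical names, not SDK features, and the fake client exists only so the example runs without an API key; it assumes the client exposes chat.completions.create as shown above.

```python
import asyncio
from types import SimpleNamespace


async def ask_many(client, model, prompts, max_concurrency=4):
    """Send one chat request per prompt, at most max_concurrency at a time."""
    gate = asyncio.Semaphore(max_concurrency)

    async def ask(prompt):
        async with gate:  # wait here while too many requests are in flight
            response = await client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
        return response.choices[0].message.content

    # gather returns results in the same order as the prompts
    return await asyncio.gather(*(ask(p) for p in prompts))


class _FakeCompletions:
    """Stand-in for client.chat.completions; used only for this demo."""

    async def create(self, model, messages):
        await asyncio.sleep(0)
        text = "echo: " + messages[0]["content"]
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=_FakeCompletions()))
answers = asyncio.run(ask_many(fake_client, "qwen3.6:27b", ["a", "b"]))
print(answers)
```

In real use, an AsyncPendra instance opened with async with would replace fake_client.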

Image generation

Generate images from a text prompt. Returns base64-encoded PNGs by default — decode and save to disk, or set response_format="url" when supported by the model.

images.py
import base64

response = client.images.generations.create(
    model="x/z-image-turbo",
    prompt="A red London double-decker bus at sunset",
    size="1024x1024",
)

with open("bus.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))

Use AsyncPendra for async applications:

images_async.py
# Run inside an async function, as in async.py
async with AsyncPendra(api_key="pdr_sk_...") as client:
    response = await client.images.generations.create(
        model="x/z-image-turbo",
        prompt="A red London double-decker bus at sunset",
    )

Image generation is non-streaming — the endpoint returns a single JSON response once the worker finishes.
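
When more than one image is requested (the n parameter), each entry in response.data needs decoding. A minimal sketch, assuming every item carries a b64_json field as in the example above; save_images and its stem argument are hypothetical, and the demo uses a stand-in response with placeholder bytes instead of real PNG data.

```python
import base64
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_images(response, out_dir, stem="image"):
    """Decode every base64 image in response.data and write it as a .png file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, item in enumerate(response.data):
        path = out / f"{stem}_{i}.png"
        path.write_bytes(base64.b64decode(item.b64_json))
        paths.append(path)
    return paths


# Stand-in response; the payloads are placeholder bytes, not real images.
fake_response = SimpleNamespace(data=[
    SimpleNamespace(b64_json=base64.b64encode(b"first").decode()),
    SimpleNamespace(b64_json=base64.b64encode(b"second").decode()),
])
saved = save_images(fake_response, tempfile.mkdtemp(), stem="bus")
print([p.name for p in saved])
```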

Embeddings

Generate vector embeddings for retrieval, search, and RAG pipelines. OpenAI-compatible — pass a string or a list of strings and get back a CreateEmbeddingResponse with one embedding per input.

embeddings.py
response = client.embeddings.create(
    model="nomic-embed-text:latest",
    input=["The quick brown fox", "jumps over the lazy dog"],
)

for item in response.data:
    print(item.index, len(item.embedding), "dims")

print(response.usage.prompt_tokens)

Any embedding model in the Pendra catalogue works — nomic-embed-text, mxbai-embed-large, bge-m3, qwen3-embedding, all-minilm. Also available on AsyncPendra via await client.embeddings.create(...).
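
For retrieval, the returned vectors are typically compared by cosine similarity. The sketch below is plain Python and independent of the SDK: cosine_similarity and rank are hypothetical helpers, and the short vectors stand in for item.embedding values from response.data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return document indices ordered from most to least similar to query."""
    scores = [cosine_similarity(query, doc) for doc in documents]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)


# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
order = rank(query_vec, doc_vecs)
print(order)
```

For large collections a vector index or a NumPy-based implementation would be the usual choice; the pure-Python version is only meant to show the calculation.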

Audio transcription

Transcribe audio to text using Whisper-class models. Multipart upload — pass an open binary file, a path, or a (filename, bytes) tuple. Files capped at 25 MB.

transcribe.py
with open("meeting.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        file=f,
        model="whisper-large-v3-turbo",
        language="en",
    )

print(result.text)

Use AsyncPendra for async applications:

transcribe_async.py
# Run inside an async function, as in async.py
async with AsyncPendra(api_key="pdr_sk_...") as client:
    with open("meeting.mp3", "rb") as f:
        result = await client.audio.transcriptions.create(
            file=f,
            model="whisper-large-v3-turbo",
        )

    print(result.text)

Need subtitles? Pass response_format="srt" (or "vtt") and the call returns the subtitle text, exposed as .text on the result. Write it to disk and you have a ready-to-use caption track:

subtitles.py
# Get WebVTT/SRT subtitles instead of plain JSON
with open("meeting.mp3", "rb") as f:
    srt = client.audio.transcriptions.create(
        file=f,
        model="whisper-large-v3-turbo",
        response_format="srt",
    )

with open("meeting.srt", "w", encoding="utf-8") as f:
    f.write(srt.text)  # the SDK wraps text/srt/vtt bodies as { text: ... }

Transcription is non-streaming: the endpoint returns a single response once the worker finishes. result.duration and result.language are populated when the backend supplies them; result.segments appears when response_format="verbose_json".
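
Because uploads are capped at 25 MB, it can be useful to check a file's size locally before sending it. A minimal sketch: check_audio_size is a hypothetical helper, and treating 25 MB as 25 * 1024 * 1024 bytes is an assumption, since the exact byte limit is not stated here.

```python
import os
import tempfile

# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES = 25 * 1024 * 1024


def check_audio_size(path, limit=MAX_AUDIO_BYTES):
    """Return the file size in bytes, or raise ValueError if it exceeds limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; the limit is {limit} bytes")
    return size


# Demo with a tiny temporary file standing in for a real recording.
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
    tmp.write(b"\x00" * 10)
    demo_path = tmp.name

demo_size = check_audio_size(demo_path)
print(demo_size)
```

Larger recordings would need to be split or re-encoded before upload; how to do that is outside the scope of this page.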

List models

Query available models from your Pendra instance.

models.py
models = client.models.list()

for model in models:
    print(model.id)

Migrating from OpenAI

The Pendra SDK mirrors the OpenAI interface. Switching takes two lines: change the import and the client constructor. The rest of your calls stay as they are.

migration.py
# Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")

# After
from pendra import Pendra
client = Pendra(api_key="pdr_sk_...")

API reference

client.chat.completions.create()

Create a chat completion. Returns a ChatCompletion, or a Stream when stream=True.

Parameter    Type         Description
model        str          Model ID (e.g. "qwen3.6:27b")
messages     list[dict]   Chat messages with role and content
stream       bool         Enable streaming (default False)
temperature  float?       Sampling temperature (0–2)
max_tokens   int?         Maximum tokens to generate
top_p        float?       Top-p sampling value
stop         str | list?  Stop sequence(s)

client.images.generations.create()

Generate images from a text prompt. Returns ImageResponse. Also available as await client.images.generations.create(...) on AsyncPendra.

Parameter            Type  Description
model                str   Image model ID (e.g. "x/z-image-turbo")
prompt               str   Text description of the image to generate
n                    int?  Number of images, 1–4 (default 1)
size                 str?  Dimensions as WIDTHxHEIGHT (default "1024x1024")
response_format      str?  "b64_json" (default) or "url"
num_inference_steps  int?  Diffusion steps (model-dependent)
seed                 int?  Random seed for reproducibility
negative_prompt      str?  Text to avoid in the generated image

client.embeddings.create()

Create embeddings. Returns CreateEmbeddingResponse. Also available as await client.embeddings.create(...) on AsyncPendra.

Parameter        Type             Description
model            str              Embedding model ID (e.g. "nomic-embed-text:latest")
input            str | list[str]  Text to embed. Accepts a single string or a batch.
encoding_format  str?             "float" (default) or "base64"
dimensions       int?             Output dimensionality (Matryoshka models like nomic-embed-text)
user             str?             Optional end-user identifier for abuse monitoring

client.models.list()

Returns a list of Model objects available on the instance. Each model has id, object, created, and owned_by fields.

client.audio.transcriptions.create()

Transcribe an audio file. Returns TranscriptionResponse. Also available as await client.audio.transcriptions.create(...) on AsyncPendra.

Parameter                Type                     Description
file                     IO[bytes] | str | tuple  Open binary file, file path, or (filename, bytes) tuple. ≤ 25 MB.
model                    str                      Transcription model id (e.g. "whisper-large-v3-turbo")
language                 str?                     ISO-639-1 language hint, optional
prompt                   str?                     Biasing prompt (vocabulary, formatting), optional
response_format          str?                     "json" (default), "text", "srt", "vtt", or "verbose_json"
temperature              float?                   Sampling temperature 0.0–1.0
timestamp_granularities  list[str]?               "segment" and/or "word" (verbose_json only)

Environment variables

Variable        Description
PENDRA_API_KEY  Your API key (pdr_sk_…). Used when no api_key is passed to the constructor.
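
The SDK reads PENDRA_API_KEY itself, so no extra code is required. An application may still want to fail early with a clear message when the key is missing. A small sketch of that idea follows; resolve_api_key is a hypothetical helper, not an SDK function, and the env argument exists only to make the demo independent of the real environment.

```python
import os


def resolve_api_key(explicit=None, env=os.environ):
    """Prefer an explicitly passed key, then fall back to PENDRA_API_KEY."""
    key = explicit or env.get("PENDRA_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass api_key or set PENDRA_API_KEY")
    return key


# Demo with plain dicts standing in for the process environment.
from_arg = resolve_api_key("pdr_sk_explicit", env={})
from_env = resolve_api_key(env={"PENDRA_API_KEY": "pdr_sk_from_env"})
print(from_arg, from_env)
```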

Error handling

All exceptions inherit from pendra.APIError.

Exception            Status   When
AuthenticationError  401      Invalid or missing API key
RateLimitError       429      Too many requests
APIStatusError       4xx/5xx  Any other non-2xx response
APIConnectionError   n/a      Network or connection failure
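
Rate limits and connection failures are usually transient, so callers often retry them with exponential backoff. The sketch below is generic and not an SDK feature: with_retries, its parameters, and the delay values are all hypothetical, and whether the SDK retries on its own is not stated here. The exception classes are passed in as an argument so the example runs without the SDK installed.

```python
import time


def with_retries(call, retry_on, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run call(); on a retry_on exception, wait and try again, doubling the delay."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


class FakeRateLimitError(Exception):
    """Stand-in for a rate-limit exception; used only for this demo."""


calls = {"count": 0}
delays = []


def flaky():
    """Fail twice, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError()
    return "ok"


# delays.append replaces time.sleep so the demo finishes instantly.
result = with_retries(flaky, retry_on=FakeRateLimitError, sleep=delays.append)
print(result, delays)
```

With the SDK, retry_on would presumably be a tuple such as (RateLimitError, APIConnectionError) imported from pendra, assuming those classes are exported at the package top level. AuthenticationError is best left out, since retrying a bad key cannot succeed.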
