Now Available
Pendra Python SDK
OpenAI-compatible Python client for sovereign UK inference. Sync and async, with streaming support. Python 3.10+.
Quick Start
```python
from pendra import Pendra

client = Pendra(
    api_key="pdr_sk_...",  # or set PENDRA_API_KEY env var
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{
        "role": "user",
        "content": "What is the capital of the UK?"
    }]
)
print(response.choices[0].message.content)
```

Streaming
Stream responses token by token using a context manager.
```python
with client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Write a poem about London."}],
    stream=True,
) as stream:
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
```

Async Client
Use AsyncPendra for asyncio applications. Supports both streaming and non-streaming.
```python
import asyncio
from pendra import AsyncPendra

async def main():
    async with AsyncPendra(api_key="pdr_sk_...") as client:
        response = await client.chat.completions.create(
            model="llama3.2",
            messages=[{"role": "user", "content": "Hello!"}]
        )
        print(response.choices[0].message.content)

asyncio.run(main())
```

Image Generation
Generate images from a text prompt. Responses contain base64-encoded PNGs by default; decode and save them to disk, or set response_format="url" when the model supports it.
```python
import base64

response = client.images.generations.create(
    model="x/z-image-turbo",
    prompt="A red London double-decker bus at sunset",
    size="1024x1024",
)

with open("bus.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))
```

Use AsyncPendra for async applications:
```python
async with AsyncPendra(api_key="pdr_sk_...") as client:
    response = await client.images.generations.create(
        model="x/z-image-turbo",
        prompt="A red London double-decker bus at sunset",
    )
```

Image generation is non-streaming: the endpoint returns a single JSON response once the worker finishes.
Embeddings
Generate vector embeddings for retrieval, search, and RAG pipelines. The endpoint is OpenAI-compatible: pass a string or a list of strings and get back a CreateEmbeddingResponse with one embedding per input.
```python
response = client.embeddings.create(
    model="nomic-embed-text:latest",
    input=["The quick brown fox", "jumps over the lazy dog"],
)

for item in response.data:
    print(item.index, len(item.embedding), "dims")
print(response.usage.prompt_tokens)
```

Any embedding model in the Pendra model catalogue works: nomic-embed-text, mxbai-embed-large, bge-m3, qwen3-embedding, all-minilm. Also available on AsyncPendra via await client.embeddings.create(...).
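Once the embeddings come back, ranking documents against a query is a plain cosine-similarity computation. A minimal sketch in pure Python — nothing here calls the SDK, and the short vectors below are stand-ins for real response.data[i].embedding values:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# stand-in vectors; in real use these come from response.data[i].embedding
query = [0.1, 0.9, 0.0]
docs = {"fox": [0.2, 0.8, 0.1], "bus": [0.9, 0.1, 0.3]}

# rank documents by similarity to the query, highest first
ranked = sorted(docs, key=lambda k: cosine_similarity(query, docs[k]), reverse=True)
print(ranked[0])  # fox
```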
List Models
Query available models from your Pendra instance.
```python
models = client.models.list()
for model in models:
    print(model.id)
```

Migrating from OpenAI
The Pendra SDK mirrors the OpenAI interface. Switching is a two-line change; your existing code just works.

```python
# Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")

# After
from pendra import Pendra
client = Pendra(api_key="pdr_sk_...")
```

API Reference
client.chat.completions.create()
Create a chat completion. Returns ChatCompletion or Stream.
| Parameter | Type | Description |
|---|---|---|
| model | str | Model ID (e.g. "llama3.2") |
| messages | list[dict] | Chat messages with role and content |
| stream | bool | Enable streaming (default False) |
| temperature | float? | Sampling temperature (0–2) |
| max_tokens | int? | Maximum tokens to generate |
| top_p | float? | Top-p sampling value |
| stop | str \| list? | Stop sequence(s) |
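The messages parameter carries the whole conversation, so multi-turn chat means appending each reply before the next request. A sketch of that bookkeeping — the SDK call is commented out and the assistant reply is a stand-in:

```python
# seed the conversation with the first user turn
messages = [{"role": "user", "content": "What is the capital of the UK?"}]

# response = client.chat.completions.create(model="llama3.2", messages=messages)
reply = "London."  # stand-in for response.choices[0].message.content

# append the assistant turn, then the next user turn
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Which river runs through it?"})

roles = [m["role"] for m in messages]
print(roles)  # ['user', 'assistant', 'user']
```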
client.images.generations.create()
Generate images from a text prompt. Returns ImageResponse. Also available as await client.images.generations.create(...) on AsyncPendra.
| Parameter | Type | Description |
|---|---|---|
| model | str | Image model ID (e.g. "x/z-image-turbo") |
| prompt | str | Text description of the image to generate |
| n | int? | Number of images, 1–4 (default 1) |
| size | str? | Dimensions as WIDTHxHEIGHT (default "1024x1024") |
| response_format | str? | "b64_json" (default) or "url" |
| num_inference_steps | int? | Diffusion steps (model-dependent) |
| seed | int? | Random seed for reproducibility |
| negative_prompt | str? | Text to avoid in the generated image |
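With n > 1 the response carries one entry per image in response.data. A minimal sketch of decoding each b64_json payload to disk — save_b64_images is an illustrative helper, not part of the SDK, and the fake payload stands in for real PNG bytes:

```python
import base64
import pathlib

def save_b64_images(payloads, stem="image"):
    """Decode a list of b64_json strings (as in ImageResponse.data) to files."""
    paths = []
    for i, b64 in enumerate(payloads):
        path = pathlib.Path(f"{stem}_{i}.png")
        path.write_bytes(base64.b64decode(b64))
        paths.append(path)
    return paths

# in real use: payloads = [img.b64_json for img in response.data]
fake_png = base64.b64encode(b"\x89PNG fake bytes").decode()
saved = save_b64_images([fake_png], stem="bus")
print(saved[0].name)  # bus_0.png
```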
client.embeddings.create()
Create embeddings. Returns CreateEmbeddingResponse. Also await client.embeddings.create(...) on AsyncPendra.
| Parameter | Type | Description |
|---|---|---|
| model | str | Embedding model ID (e.g. "nomic-embed-text:latest") |
| input | str \| list[str] | Text to embed. Accepts a single string or a batch. |
| encoding_format | str? | "float" (default) or "base64" |
| dimensions | int? | Output dimensionality (Matryoshka models like nomic-embed-text) |
| user | str? | Optional end-user identifier for abuse monitoring |
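encoding_format="base64" trades readability for a smaller payload. If Pendra follows the OpenAI wire convention of packed little-endian float32 values (an assumption, not confirmed by this reference), decoding looks like:

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    # base64 string -> raw bytes -> list of little-endian float32 values
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# round-trip demo with stand-in values (all exactly representable in float32)
vec = [0.25, -1.5, 3.0]
encoded = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
print(decode_embedding(encoded))  # [0.25, -1.5, 3.0]
```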
client.models.list()
Returns a list of Model objects available on the instance. Each model has id, object, created, and owned_by fields.
Environment Variables
| Variable | Description |
|---|---|
| PENDRA_API_KEY | Your API key (pdr_sk_...). Used when no api_key is passed to the constructor. |
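The constructor argument takes precedence over the environment variable. A sketch of that lookup order (resolve_key is illustrative, not an SDK function):

```python
import os

def resolve_key(api_key=None):
    # an explicit argument wins; otherwise fall back to the environment
    key = api_key or os.environ.get("PENDRA_API_KEY")
    if key is None:
        raise RuntimeError("no API key: pass api_key= or set PENDRA_API_KEY")
    return key

os.environ["PENDRA_API_KEY"] = "pdr_sk_env"
print(resolve_key())              # pdr_sk_env
print(resolve_key("pdr_sk_arg"))  # pdr_sk_arg
```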
Error Handling
All exceptions inherit from pendra.APIError.
| Exception | Status | When |
|---|---|---|
| AuthenticationError | 401 | Invalid or missing API key |
| RateLimitError | 429 | Too many requests |
| APIStatusError | 4xx/5xx | Any other non-2xx response |
| APIConnectionError | — | Network or connection failure |
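Because everything inherits from pendra.APIError, a single except clause catches all SDK failures, while narrower handlers can single out retryable cases like RateLimitError. A minimal retry sketch with exponential backoff — the exception classes here are local stand-ins mirroring the names in the table (in real code, import them from pendra), and flaky_create stands in for client.chat.completions.create:

```python
import time

class APIError(Exception): ...          # stand-in for pendra.APIError
class RateLimitError(APIError): ...     # stand-in for pendra.RateLimitError (429)

calls = {"n": 0}

def flaky_create():
    # stand-in for an SDK call: fails twice with 429, then succeeds
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429: too many requests")
    return "ok"

def with_retries(fn, attempts=5, base_delay=0.01):
    for attempt in range(attempts):
        try:
            return fn()
        except RateLimitError:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise APIError("still rate limited after retries")

print(with_retries(flaky_create))  # ok
```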