Pricing

Start building. Scale when you're ready.

Every plan includes zero data retention, sovereignty in your jurisdiction, and OpenAI SDK compatibility. Bring your own GPUs with a free account, or let us manage everything.

Free

£0 /month

For personal use — sovereign inference on your hardware.

Personal use only
1 self-hosted Pendra Worker
OpenAI-compatible API endpoint
Zero data retention
Sovereign jurisdiction
Python and Node.js SDKs
Community documentation

Get Started

Pro

£99 /month

For commercial use — teams shipping AI products in production.

Everything in Free, plus:
Commercial use rights
Up to 5 self-hosted Pendra Workers
Enhanced usage analytics
Worker monitoring & webhook alerts
Standard DPA included
Priority support (email)
Request logging (optional)
Private inference (end-to-end encryption)

Get Started

Enterprise

Custom

Managed GPUs, advanced security, and dedicated support.

Everything in Pro, plus:
Pendra-managed GPU infrastructure
Unlimited self-hosted Workers
Automatic data masking and redaction
Granular audit logging
Custom DPA and DPIA support
Dedicated account manager
SLA with uptime guarantee
SSO and role-based access control
Onboarding and integration support

Talk to Us

Plan details

Compare plans

Free

Pro

Enterprise

Usage

Commercial use Use Pendra to power products, services, or workloads on behalf of customers or your business.

Infrastructure

Self-hosted Workers Run inference on your own GPU hardware via the Pendra orchestration layer.

Up to 5

Unlimited

OpenAI SDK compatibility Drop-in replacement — swap your base URL and use existing OpenAI client code.

SDKs Native client libraries for Python and Node.js.

Pendra-managed GPUs Dedicated GPU clusters in UK data centres, fully managed by Pendra.

Operations

Enhanced usage analytics Request volume, latency percentiles, model usage breakdowns, and per-key tracking.

Worker monitoring & webhook alerts Health dashboard with notifications for worker offline events and error rate spikes.

Optional request logging Opt-in logging of requests and responses. Off by default — you choose what to capture.

Priority support Faster response times from the Pendra engineering team.

Dedicated

Security & Compliance

Zero data retention Prompts and completions processed in RAM and never written to disc. Architectural, not policy.

Sovereign jurisdiction Run on our UK compute or your own GPUs in your country. No foreign government — including under the US CLOUD Act — can compel access.

DPA Data Processing Agreement for UK GDPR compliance. Standard template or custom-negotiated.

Standard

Custom

Private inference Prompts and completions are sealed on your client and only decrypted inside your worker — Pendra never holds the key.

Auto data masking Automatic redaction of PII and sensitive data before it reaches the model.

Audit logging Granular system-level logs of all API activity and configuration changes.

DPIA support Pendra provides input and documentation for your Data Protection Impact Assessments.

SSO & RBAC Single sign-on integration and role-based access control for your organisation.

SLA Contractual uptime guarantee with defined response and resolution times.

FAQ

Frequently asked questions

Is my data stored or logged?

No. On every plan, prompts and completions are processed in RAM and never stored. This is architectural, not a policy setting.

Can I use the Free plan for my startup or product?

The Free plan is for personal use — evaluating Pendra, running local experiments, hobby projects. Anything where Pendra powers a product, service, or workload on behalf of customers or your business needs the Pro plan or above. If you're a side project earning revenue, that's commercial use.

What's a Pendra Worker?

A Worker is the Pendra agent that runs on your own GPU hardware and serves models through the same API as Pendra-managed infrastructure. The Worker dials out to the Pendra control plane over an authenticated WebSocket — no inbound ports need to be opened in your network. Requests route through the Pendra API and into the Worker, where they're processed in RAM and never stored.

What hardware do I need to run a Pendra Worker?

Workers run on Linux, macOS, or Windows and support NVIDIA GPUs, Apple Silicon, and CPU-only fallback. Inference runs in-process — install a model from the catalogue and the worker serves it, with nothing else to set up. VRAM requirements depend on the model; a 7B model fits on 16GB, a 70B quantised model needs 48GB+. See the Workers docs for full specs.

What models can I run?

Pendra serves open-weight models — Llama, Mistral, Qwen, DeepSeek, Gemma, GLM, Phi, GPT-OSS and others — alongside image generation (Flux), audio transcription (Whisper, Parakeet) and embeddings (Nomic, BGE). We don't offer proprietary models like GPT-4, Claude or Gemini. Open-weight on sovereign infrastructure means full transparency and no vendor lock-in at the model layer.

How is Enterprise pricing structured?

Enterprise pricing is based on your GPU requirements, model selection, and throughput needs. Managed GPU plans are flat-rate — dedicated compute at a fixed monthly price, no per-token billing. Get in touch and we'll scope it with you.

Can I mix managed and self-hosted?

Yes — and this is common. You can ship your product on Pendra-managed GPUs for most traffic while routing the most sensitive customers' inference to self-hosted Workers in their own environment, all through a single API and control plane. Mixed topologies are an Enterprise feature.

What compliance certifications does Pendra hold?

Pendra is UK GDPR compliant and Cyber Essentials certified. Cyber Essentials Plus and ISO 27001 certifications are in progress. Pro customers are covered by our standard Data Processing Agreement; Enterprise customers receive a custom DPA and Data Protection Impact Assessment (DPIA) support tailored to their deployment.

Where exactly does my data live?

Prompts and completions are processed in RAM at the Worker and never written to disk. The only data that persists is operational metadata — timestamps, model IDs, token counts, latency — stored in a UK-resident Postgres instance run by Pendra AI Ltd. No content crosses a UK border at any point.

How does billing work?

Pro is billed monthly in GBP via Stripe. There's no per-token charge — the plan price is what you pay. Cancel any time from the dashboard; access continues until the end of the current billing period. Enterprise billing is invoiced annually with custom terms.

Can I upgrade or downgrade at any time?

Yes. Move between Free and Pro at any time. Enterprise transitions are handled with your account manager.

Not sure which plan fits?

Book a 15-minute call. We'll help you figure out the right setup for your workload and compliance requirements.

Talk to Us