Pricing

Start building. Scale when you're ready.

Every plan includes zero data retention, UK jurisdiction, and OpenAI SDK compatibility. Bring your own GPUs with a free account, or let us manage everything.

Free

£0 /month

For personal use — sovereign inference on your hardware.

  • Personal use only
  • 1 self-hosted Pendra Worker
  • OpenAI-compatible API endpoint
  • Zero data retention
  • UK legal jurisdiction
  • Python and Node.js SDKs
  • Community documentation
Get Started

Pro

£99 /month

For commercial use — teams shipping AI products in production.

  • Everything in Free, plus:
  • Commercial use rights
  • Up to 5 self-hosted Pendra Workers
  • Enhanced usage analytics
  • Worker monitoring & webhook alerts
  • Standard DPA included
  • Priority support (email)
  • Request logging (optional)
Get Started

Enterprise

Custom

Managed GPUs, advanced security, and dedicated support.

  • Everything in Pro, plus:
  • Pendra-managed GPU infrastructure
  • Unlimited self-hosted Workers
  • E2E encrypted worker communications
  • Automatic data masking and redaction
  • Granular audit logging
  • Custom DPA and DPIA support
  • Dedicated account manager
  • SLA with uptime guarantee
  • SSO and role-based access control
  • Onboarding and integration support
Talk to Us

Plan details

Compare plans

Free
Pro
Enterprise

Usage

Commercial use Use Pendra to power products, services, or workloads on behalf of customers or your business.

Infrastructure

Self-hosted Workers Run inference on your own GPU hardware via the Pendra orchestration layer.
1
Up to 5
Unlimited
OpenAI SDK compatibility Drop-in replacement — swap your base URL and use existing OpenAI client code.
SDKs Native client libraries for Python and Node.js.
Pendra-managed GPUs Dedicated GPU clusters in UK data centres, fully managed by Pendra.

Operations

Enhanced usage analytics Request volume, latency percentiles, model usage breakdowns, and per-key tracking.
Worker monitoring & webhook alerts Health dashboard with notifications for worker offline events and error rate spikes.
Optional request logging Opt-in logging of requests and responses. Off by default — you choose what to capture.
Priority support Faster response times from the Pendra engineering team.
Email
Dedicated

Security & Compliance

Zero data retention Prompts and completions processed in RAM and never written to disc. Architectural, not policy.
UK jurisdiction All infrastructure on UK soil, operated by a UK entity. Outside the US CLOUD Act.
DPA Data Processing Agreement for UK GDPR compliance. Standard template or custom-negotiated.
Standard
Custom
E2E encrypted worker comms End-to-end encryption between the Pendra control plane and your workers.
Auto data masking Automatic redaction of PII and sensitive data before it reaches the model.
Audit logging Granular system-level logs of all API activity and configuration changes.
DPIA support Pendra provides input and documentation for your Data Protection Impact Assessments.
SSO & RBAC Single sign-on integration and role-based access control for your organisation.
SLA Contractual uptime guarantee with defined response and resolution times.

FAQ

Frequently asked questions

Is my data stored or logged?
No. On every plan, prompts and completions are processed in RAM and never stored. This is architectural, not a policy setting.
Can I use the Free plan for my startup or product?
The Free plan is for personal use — evaluating Pendra, running local experiments, hobby projects. Anything where Pendra powers a product, service, or workload on behalf of customers or your business needs the Pro plan or above. If you're a side project earning revenue, that's commercial use.
What's a Pendra Worker?
A Worker is the Pendra agent that runs on your own GPU hardware and serves models through the same API as Pendra-managed infrastructure. The Worker dials out to the Pendra control plane over an authenticated WebSocket — no inbound ports need to be opened in your network. Requests route through the Pendra API and into the Worker, where they're processed in RAM and never stored.
What hardware do I need to run a Pendra Worker?
Workers run on Linux, macOS, or Windows and support NVIDIA GPUs, Apple Silicon, and CPU-only fallback. Backends are auto-detected — Ollama, vLLM, LM Studio, or our in-process llama.cpp runtime. VRAM requirements depend on the model; a 7B model fits on 16GB, a 70B quantised model needs 48GB+. See the Workers docs for full specs.
What models can I run?
Pendra serves open-weight models — Llama, Mistral, Qwen, DeepSeek, Gemma, GLM, Phi, GPT-OSS and others — alongside image generation (Flux), audio transcription (Whisper, Parakeet) and embeddings (Nomic, BGE). We don't offer proprietary models like GPT-4, Claude or Gemini. Open-weight on sovereign infrastructure means full transparency and no vendor lock-in at the model layer.
How is Enterprise pricing structured?
Enterprise pricing is based on your GPU requirements, model selection, and throughput needs. Managed GPU plans are flat-rate — dedicated compute at a fixed monthly price, no per-token billing. Get in touch and we'll scope it with you.
Can I mix managed and self-hosted?
Yes — and this is common. You can ship your product on Pendra-managed GPUs for most traffic while routing the most sensitive customers' inference to self-hosted Workers in their own environment, all through a single API and control plane. Mixed topologies are an Enterprise feature.
What compliance certifications does Pendra hold?
Pendra is UK GDPR compliant. Cyber Essentials Plus and ISO 27001 certifications are in progress. Pro customers are covered by our standard Data Processing Agreement; Enterprise customers receive a custom DPA and Data Protection Impact Assessment (DPIA) support tailored to their deployment.
Where exactly does my data live?
Prompts and completions are processed in RAM at the Worker and never written to disk. The only data that persists is operational metadata — timestamps, model IDs, token counts, latency — stored in a UK-resident Postgres instance run by Pendra AI Ltd. No content crosses a UK border at any point.
How does billing work?
Pro is billed monthly in GBP via Stripe. There's no per-token charge — the plan price is what you pay. Cancel any time from the dashboard; access continues until the end of the current billing period. Enterprise billing is invoiced annually with custom terms.
Can I upgrade or downgrade at any time?
Yes. Move between Free and Pro at any time. Enterprise transitions are handled with your account manager.

Not sure which plan fits?

Book a 15-minute call. We'll help you figure out the right setup for your workload and compliance requirements.

Talk to Us