Worker

Run inference on your own GPUs

Install a single binary or Docker container. Your prompts and completions stay on your hardware — routed through the same API, the same SDKs, the same console.

Self-Hosted Workers: your GPUs, our orchestration layer.
[Diagram: Your App → Pendra API (Pendra · UK) → Self-hosted workers. Worker A (8× L40S 48GB, qwen-2.5-72b); Worker B (4× A100 80GB, deepseek-r1-70b). Inference runs in your environment.]

Why self-host

Full data control

Prompts and completions are processed entirely on your hardware. Requests route through the Pendra API, but inference never leaves your environment.

Unified orchestration

Centralised load balancing across your GPU fleet. One API, one console, one set of SDKs.

Hybrid ready

Run managed and self-hosted workers side by side. Route sensitive workloads to your on-premises workers and everything else to Pendra.

Installation

Install the native package for your OS or run the worker as a Docker container. In the Pendra console, open Workers → Add worker for an OS-aware download with checksums, or download the file directly from get.pendra.ai/worker/archives/latest/. Full per-OS instructions are in Workers → Install.

install
# macOS (Apple Silicon)
# Drag Pendra.app into /Applications, then launch it
open Pendra-<v>-arm64.dmg

# Windows — run the signed installer, no UAC needed
PendraSetup-<v>.exe

# Linux — pick CPU / CUDA / Vulkan to match your GPU
sudo apt install ./pendra-cuda_<v>_linux_amd64.deb
sudo pendra setup   # writes /var/lib/pendra/config.yaml and restarts pendra.service

Each installer registers the OS service (LaunchAgent on macOS, Run key on Windows, systemd on Linux), so the worker comes back up after a reboot. macOS supports Apple Silicon only — Intel Macs are no longer supported because the in-process Metal path needs Apple Silicon. Linux ships three GPU variants (CPU baseline, CUDA, Vulkan) for both amd64 and arm64.
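
The console download comes with checksums, so a file can be checked before it is installed. The sketch below is one way to do that, assuming the published checksum is a SHA-256 hex digest (this page does not name the algorithm). The function names are illustrative and are not part of the pendra CLI.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large installers are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with the published value.

    Assumption: the published checksum is SHA-256. Surrounding whitespace
    and letter case in the expected value are ignored.
    """
    return sha256_of(path) == expected_hex.strip().lower()
```

Call matches_checksum(Path(downloaded_file), checksum_from_console) and only run the installer if it returns True.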

CLI reference

The pendra CLI manages your worker. Config is stored in ~/.pendra/config.yaml. Full env-var reference at Workers → Configuration.

Command                         Description
pendra setup                    Interactive setup wizard: enter your key, discover backends, save config
pendra models install <model>   Pull a catalogue model into the worker's inference backend
pendra run                      Start the worker and begin serving inference requests
pendra models                   List all models available on your configured backends
pendra status                   Show connection status, backend health, and active models
pendra config                   View resolved configuration (env + file + defaults)
pendra config set KEY VAL       Set a configuration value in ~/.pendra/config.yaml
pendra logs                     Tail the worker's log buffer; -f follows. Use systemctl status pendra / launchctl print to manage the OS-supervised service installed by the .deb / .rpm / .dmg / .exe package.
pendra update                   Self-update to the latest version
pendra version                  Show version, Go version, and platform
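
The "env + file + defaults" resolution shown by pendra config can be pictured as three layers merged in priority order. The sketch below is only an illustration of that idea. The precedence (environment over file over defaults), the PENDRA_ prefix, and the keys are assumptions; the real names are in Workers → Configuration.

```python
from collections import ChainMap

# Hypothetical defaults; the worker's real keys are not listed on this page.
DEFAULTS = {"log_level": "info", "backend": "pendra"}


def resolve_config(env: dict, file_values: dict, prefix: str = "PENDRA_") -> dict:
    """Merge settings with assumed precedence: env > file > defaults."""
    # Keep only variables with the (assumed) prefix, e.g. PENDRA_LOG_LEVEL -> log_level.
    from_env = {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }
    # ChainMap looks keys up in the first mapping that contains them.
    return dict(ChainMap(from_env, file_values, DEFAULTS))
```

Under these assumptions, a value set in the environment wins over the same key in the config file, and the file wins over the built-in default.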

Inference backends

Every Pendra worker ships with the Pendra backend built in — it serves catalogue chat models directly. You don't need to install anything else to start serving chat completions.

You can optionally connect external backends on the same machine for capabilities the Pendra backend doesn't cover today — image generation (Ollama) and audio transcription (Speaches). The worker auto-discovers them on startup. See the backend capability matrix for what each supports.

Requirements

Component   Requirement
OS          Linux (x86_64, arm64), macOS (Apple Silicon only), or Windows (amd64). Full matrix: system requirements.
GPU         NVIDIA recommended for production; the Pendra backend ships CUDA, Metal, and Vulkan builds. AMD ROCm via Ollama. CPU-only mode supported for testing.
Backend     The Pendra backend is built in, with no separate install. Optionally add Ollama, vLLM, LM Studio, or Speaches on the same machine.
Network     Outbound WSS to api.pendra.ai. No inbound ports needed.
Docker      Only needed if running the worker as a container. Not required for binary install.
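
Because the worker only needs outbound connectivity, a quick check from the GPU host can rule out firewall problems before setup. The sketch below tests plain TCP reachability only. Port 443 is an assumption (the usual port for WSS), and the check does not perform the TLS or WebSocket handshake, so a pass here is necessary but not sufficient.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, can_connect("api.pendra.ai") returning False suggests that an egress rule or proxy is blocking the worker's outbound connection.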

Hybrid deployments

Route traffic based on sensitivity, cost, or performance. Your application code stays the same regardless of where inference runs.

Self-hosted

Patient record summarisation, classified document analysis, privileged legal review.

Pendra-managed

Internal knowledge bases, customer support drafts, code generation, general-purpose tasks.
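
The split above amounts to a routing policy keyed on workload sensitivity. The sketch below illustrates that policy as a plain function. It is not how Pendra configures routing, which this page does not describe, and the workload labels and target names are hypothetical.

```python
from enum import Enum


class Target(Enum):
    SELF_HOSTED = "self-hosted"
    MANAGED = "pendra-managed"


# Hypothetical labels, modelled on the example workloads in this section.
SENSITIVE_WORKLOADS = {"patient-records", "classified-documents", "legal-review"}


def pick_target(workload: str) -> Target:
    """Send sensitive workloads to self-hosted workers, the rest to managed ones."""
    if workload in SENSITIVE_WORKLOADS:
        return Target.SELF_HOSTED
    return Target.MANAGED
```

A stricter policy could default unknown workloads to the self-hosted target instead, so that an unlabelled request never leaves your environment by accident.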

Want us to handle it instead?

Let us run your workers

If you'd rather not manage your own GPUs, we can run dedicated workers for you on Pendra-managed infrastructure. Same API, zero operational overhead.

Get in touch