Install the worker
Self-hosting Pendra is three steps: install the worker, paste a worker key from the console, then install a model from the catalogue. The Pendra backend is bundled with the worker, so you can serve chat completions without installing a separate inference engine.
1. Install the worker
The Pendra worker is a single Go binary. It includes the Pendra inference runtime — install it and you have a working backend.
The fastest path is the Pendra console → Workers → Add worker. It auto-detects your OS, serves the correct installer with the matching SHA-256, and walks you through the setup. The per-OS guidance below covers what each installer does in case you want to download manually.
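If you download an installer manually, you can compare its SHA-256 with the value the console shows before running it. The following is a minimal sketch, not part of the Pendra CLI: `file_sha256` and `verify_sha256` are hypothetical helper names, and the sketch assumes either `sha256sum` (GNU coreutils) or `shasum` (shipped with macOS) is on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a downloaded installer; not part of Pendra.

# Print the lowercase hex SHA-256 digest of the file given as $1.
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'      # GNU coreutils (most Linux distros)
  else
    shasum -a 256 "$1" | awk '{print $1}'  # Perl shasum (ships with macOS)
  fi
}

# Return 0 only when the digest of file $1 equals the expected hex string $2.
# The comparison is case-insensitive; an unreadable file returns 2.
verify_sha256() {
  local file=$1 expected actual
  [ -r "$file" ] || return 2
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  actual=$(file_sha256 "$file")
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

A typical call would be `verify_sha256 ./PendraSetup-<v>.exe <sha256-from-console> && echo "checksum OK"`, with both placeholders filled in from the console.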
macOS (Apple Silicon)
Download Pendra-<v>-arm64.dmg from the console (or
directly from https://get.pendra.ai/worker/archives/latest/),
open it, and drag Pendra.app into /Applications.
Launch the app and the menu-bar GUI starts the headless daemon, registers
a LaunchAgent so both start at login, and opens a browser-based setup
wizard for your worker key.
Intel Macs are no longer supported — the in-process Metal path needs Apple Silicon.
Windows
Download PendraSetup-<v>.exe from the console (or
directly from https://get.pendra.ai/worker/archives/latest/)
and run it. The signed InnoSetup installer drops a per-user install at
%LOCALAPPDATA%\Pendra (no UAC needed) and adds
HKCU\...\Run\PendraGui so the tray launches at login. The
setup wizard opens after the installer finishes.
Linux server (headless)
Download the right package for your distro and GPU from the console (or
directly from https://get.pendra.ai/worker/archives/latest/).
Three variants ship side by side: the default CPU baseline, a CUDA
variant for NVIDIA hosts (CUDA 13 runtime bundled), and a Vulkan variant
for AMD / Intel hosts. Each variant is available for both arm64 and
amd64.
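When scripting installs across several hosts, it can help to derive the package file name from the host's architecture and GPU variant. The sketch below is illustrative only: `pkg_arch` and `pkg_name` are hypothetical helpers, the name pattern is copied from the amd64 examples in this guide, and the arm64 names are assumed to follow the same pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the arm64 file names are an assumption based on the
# amd64 pattern pendra[-variant]_<v>_linux_<arch>.<ext> used in this guide.

# Map `uname -m` output ($1) to the architecture suffix used in package names.
pkg_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) return 1 ;;                 # unknown architecture
  esac
}

# Build a package file name.
# $1 = version, $2 = variant (cpu|cuda|vulkan), $3 = arch, $4 = extension (deb|rpm)
pkg_name() {
  local version=$1 variant=$2 arch=$3 ext=$4 prefix
  case "$variant" in
    cpu)    prefix=pendra ;;
    cuda)   prefix=pendra-cuda ;;
    vulkan) prefix=pendra-vulkan ;;
    *) return 1 ;;                 # unknown variant
  esac
  printf '%s_%s_linux_%s.%s\n' "$prefix" "$version" "$arch" "$ext"
}
```

For example, `pkg_name "<v>" cuda "$(pkg_arch "$(uname -m)")" deb` prints the CUDA `.deb` name for the current machine once `<v>` is replaced with a real version.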
Debian / Ubuntu (.deb)
# Debian / Ubuntu — pick the variant that matches your GPU
sudo apt install ./pendra_<v>_linux_amd64.deb # CPU baseline
sudo apt install ./pendra-cuda_<v>_linux_amd64.deb # NVIDIA (CUDA)
sudo apt install ./pendra-vulkan_<v>_linux_amd64.deb # AMD / Intel (Vulkan)
# Paste your worker key — setup writes /var/lib/pendra/config.yaml
# and restarts pendra.service for you
sudo pendra setup
Fedora / RHEL / openSUSE (.rpm)
# Fedora / RHEL / openSUSE — pick the variant that matches your GPU
sudo dnf install ./pendra_<v>_linux_amd64.rpm # CPU baseline
sudo dnf install ./pendra-cuda_<v>_linux_amd64.rpm # NVIDIA (CUDA)
sudo dnf install ./pendra-vulkan_<v>_linux_amd64.rpm # AMD / Intel (Vulkan)
# Paste your worker key — setup writes /var/lib/pendra/config.yaml
# and restarts pendra.service for you
sudo pendra setup
Both packages register a systemd service so the daemon
runs at boot. Manage it with the usual
systemctl {start,stop,restart,status} pendra.
Run setup as root with sudo pendra setup. On a
packaged install, setup detects the system service and writes
/var/lib/pendra/config.yaml directly (the path the
daemon reads), then restarts pendra.service so the
new worker key takes effect immediately. Running pendra setup
without sudo on a packaged host is refused with a
clear hint. The postinstall hook also adds you to
the pendra group so pendra status,
pendra doctor, and pendra logs work
without sudo after your next login.
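Group membership only applies to new login sessions, so it can be useful to check whether the current session already has the pendra group before dropping sudo. A minimal sketch, assuming the standard POSIX `id -nG` command; `has_group` is a hypothetical helper name, not a Pendra command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the space-separated group list in $1
# contains exactly the group named in $2 (no partial matches).
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *)        return 1 ;;
  esac
}
```

Calling `has_group "$(id -nG)" pendra || echo "log out and back in first"` checks the groups of the current session rather than the account database, which is what matters for running the commands without sudo.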
Docker
Multi-arch image at ghcr.io/pendra-cloud/pendra-worker
(linux/amd64 and linux/arm64). CPU baseline on :latest,
:cuda for NVIDIA hosts, :vulkan for AMD /
Intel. Mount your GPU and pass in a worker key:
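# Run the worker container. --gpus all requires the NVIDIA Container Toolkit;
# on NVIDIA hosts use the :cuda tag, and drop --gpus all for the CPU baseline.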
docker run -d \
--name pendra-worker \
--gpus all \
-e GPU_WORKER_PRIVATE_KEY=<base64-ed25519-key> \
-e WORKER_NAME=my-worker \
ghcr.io/pendra-cloud/pendra-worker:latest
2. Connect the worker
- Sign in to the Pendra console.
- Open Workers → Worker Keys.
- Click Generate key — copy the base64 Ed25519 private key it shows you.
- Run pendra setup (or paste it into the GUI setup wizard) and the daemon will start connecting.
Worker keys identify a specific machine to a Pendra organisation. They
are not the same as pdr_sk_ API keys —
those authenticate API clients, worker keys authenticate workers.
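A quick sanity check can catch a truncated or mis-pasted worker key before you hand it to setup. The sketch below rests on an assumption this guide does not confirm: that the key decodes to a raw Ed25519 key of either 32 bytes (seed only) or 64 bytes (seed plus public key). `b64_byte_count` and `looks_like_ed25519_key` are hypothetical helper names, and `base64 --decode` is assumed to be available (GNU coreutils and macOS both accept it).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the accepted lengths (32 or 64 bytes) are an assumption
# about the key encoding, not something the Pendra docs state.

# Decode the base64 string in $1 and print how many raw bytes it contains.
b64_byte_count() {
  printf '%s' "$1" | base64 --decode 2>/dev/null | wc -c | tr -d '[:space:]'
}

# Return 0 if $1 decodes to a plausible raw Ed25519 key length.
looks_like_ed25519_key() {
  local n
  n=$(b64_byte_count "$1")
  [ "$n" = 32 ] || [ "$n" = 64 ]
}
```

This only checks the decoded length. It cannot tell whether the key is valid for your organisation, so a passing check still needs the console to confirm the worker connected.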
3. Install a model
A connected worker with no models can't serve requests. Pendra installs vetted catalogue models into the Pendra backend (or into Ollama / LM Studio / Speaches if you've added them) with one click — no SSH, no shell.
- Open console.pendra.ai → Models.
- Browse the catalogue and click Install on a model, for example qwen3.6:27b or nomic-embed-text.
- Pick your worker as the destination. For cross-backend models, choose a target backend (Pendra by default).
- Install progress streams live in the console.
Running vLLM instead? You manage models directly on the host — Pendra picks them up automatically once the backend is serving them. See each backend's docs for specifics.
4. Verify
Check the daemon is healthy with pendra status:
$ pendra status
worker: wrk-a1b2c3d4 (tom-mac)
connection: wss://api.pendra.ai/ws/gpu (connected)
backends: pendra (built-in)
ollama @ http://localhost:11434
models: 17 served
Open the console → Workers and your machine should appear within a few seconds. Head back to the Quickstart to make your first API request.
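For unattended hosts, the same check can be scripted by looking for the connected marker in the status output. This is a sketch that assumes the output keeps the layout of the sample above, with a `connection:` line ending in `(connected)`; the layout may differ between versions, and `status_connected` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `pendra status` text on stdin and return 0 only if
# a line starting with "connection:" ends with "(connected)".
# Assumes the output format shown in the sample status output.
status_connected() {
  grep -Eq '^[[:space:]]*connection:.*\(connected\)[[:space:]]*$'
}
```

Usage would look like `pendra status | status_connected && echo "worker is connected"`.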
Optional: bring your own backend
The Pendra backend covers chat models out of the box. You can add any of these external backends on the same machine when you need extra capabilities — image generation, audio transcription, or a specific serving engine like vLLM. Install them following the upstream instructions; the Pendra worker auto-discovers each one on its default port.
Ollama
Optional extra runner with broad model coverage including image and embedding models. Curated install/uninstall.
vLLM
High-throughput serving for HuggingFace models. Continuous batching, PagedAttention.
LM Studio
Desktop GUI runner. Curated install supported; uninstall happens inside the LM Studio app.
Speaches
Whisper-family transcription with curated install. Add this when you need audio.
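Because discovery relies on each backend listening on its default port, a simple TCP probe can confirm that a backend is reachable before you look for it in the console. A minimal sketch, assuming bash (it relies on bash's `/dev/tcp` pseudo-device, which plain `sh` does not provide); `port_open` is a hypothetical helper name, and port 11434 is the Ollama address shown in the sample status output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if something accepts TCP connections on
# host $1, port $2. Uses bash's built-in /dev/tcp redirection; the subshell
# closes the connection again as soon as it exits.
port_open() {
  local host=$1 port=$2
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}
```

For example, `port_open localhost 11434 && echo "Ollama port is open"` reports whether anything is listening on that port. It does not verify that the listener is actually Ollama.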
Next steps
- Tune memory and GPU allocation: System requirements.
- Reference every env var: Configuration.
- See what each backend supports: Backend capability matrix.