Install the worker

Self-hosting Pendra is three steps: install the worker, paste a worker key from the console, then install a model from the catalogue. The Pendra backend is bundled with the worker, so you can serve chat completions without installing a separate inference engine.

Want image generation, transcription, or to bring your own runtime? Jump to the optional bring-your-own-backend section.

1. Install the worker

The Pendra worker is a single Go binary. It includes the Pendra inference runtime — install it and you have a working backend.

The fastest path is Pendra console → Workers → Add worker. It auto-detects your OS, serves the correct installer with the matching SHA-256, and walks you through the setup. The per-OS guidance below covers what each installer does in case you want to download manually.
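
If you download an installer manually, it is worth comparing it against the SHA-256 shown in the console before running it. The sketch below assumes you have copied that digest by hand; verify_sha256 is an illustrative helper, not a Pendra command.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the Pendra CLI): compare a file's SHA-256
# with an expected hex digest copied from the console.
verify_sha256() {
  local file="$1" expected="$2" actual
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | awk '{print $1}')       # Linux (coreutils)
  else
    actual=$(shasum -a 256 "$file" | awk '{print $1}')   # macOS
  fi
  # Hex digests are case-insensitive, so normalise before comparing.
  expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}
```

For example, verify_sha256 pendra_<v>_linux_amd64.deb <sha256-from-console> prints OK on a match and returns a non-zero status otherwise.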

macOS (Apple Silicon)

Download Pendra-<v>-arm64.dmg from the console (or directly from https://get.pendra.ai/worker/archives/latest/), open it, and drag Pendra.app into /Applications. Launch the app and the menu-bar GUI starts the headless daemon, registers a LaunchAgent so both start at login, and opens a browser-based setup wizard for your worker key.

Intel Macs are no longer supported — the in-process Metal path needs Apple Silicon.

Windows

Download PendraSetup-<v>.exe from the console (or directly from https://get.pendra.ai/worker/archives/latest/) and run it. The signed InnoSetup installer drops a per-user install at %LOCALAPPDATA%\Pendra (no UAC needed) and adds HKCU\...\Run\PendraGui so the tray launches at login. The setup wizard opens after the installer finishes.

Linux server (headless)

Download the right package for your distro and GPU from the console (or directly from https://get.pendra.ai/worker/archives/latest/). Three build variants ship side-by-side: the default CPU baseline, a CUDA variant for NVIDIA hosts (CUDA 13 runtime bundled), and a Vulkan variant for AMD / Intel hosts. Each is available for both arm64 and amd64.
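
Choosing a package can be scripted. This is a minimal sketch under stated assumptions: the detection heuristic (nvidia-smi means CUDA, vulkaninfo means Vulkan) is a guess rather than what the console does, the helper names are hypothetical, and arm64 file names are assumed to follow the amd64 naming pattern used in the commands below.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: guess a build variant and assemble the package file name.

# Map `uname -m` output to the architecture names used in the package files.
pkg_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Heuristic (an assumption): an NVIDIA driver tool implies CUDA, a Vulkan
# tool implies Vulkan, and anything else falls back to the CPU baseline.
guess_variant() {
  if command -v nvidia-smi >/dev/null 2>&1; then echo cuda
  elif command -v vulkaninfo >/dev/null 2>&1; then echo vulkan
  else echo cpu
  fi
}

# package_name VARIANT VERSION ARCH EXT   (EXT is deb or rpm)
package_name() {
  local prefix=pendra
  if [ "$1" != cpu ]; then prefix="pendra-$1"; fi
  echo "${prefix}_${2}_linux_${3}.${4}"
}
```

A call such as package_name "$(guess_variant)" <v> "$(pkg_arch "$(uname -m)")" deb prints the file name to look for in the archive listing.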

Debian / Ubuntu (.deb)

bash
# Debian / Ubuntu: run only the one line that matches your GPU
sudo apt install ./pendra_<v>_linux_amd64.deb          # CPU baseline
sudo apt install ./pendra-cuda_<v>_linux_amd64.deb     # NVIDIA (CUDA)
sudo apt install ./pendra-vulkan_<v>_linux_amd64.deb   # AMD / Intel (Vulkan)

# Paste your worker key — setup writes /var/lib/pendra/config.yaml
# and restarts pendra.service for you
sudo pendra setup

Fedora / RHEL / openSUSE (.rpm)

bash
# Fedora / RHEL / openSUSE: run only the one line that matches your GPU
# (openSUSE ships zypper by default; use sudo zypper install there)
sudo dnf install ./pendra_<v>_linux_amd64.rpm          # CPU baseline
sudo dnf install ./pendra-cuda_<v>_linux_amd64.rpm     # NVIDIA (CUDA)
sudo dnf install ./pendra-vulkan_<v>_linux_amd64.rpm   # AMD / Intel (Vulkan)

# Paste your worker key — setup writes /var/lib/pendra/config.yaml
# and restarts pendra.service for you
sudo pendra setup

Both packages register a systemd service so the daemon runs at boot. Manage it with the usual systemctl {start,stop,restart,status} pendra.

Just run sudo pendra setup. On a packaged install, setup detects the system service, writes /var/lib/pendra/config.yaml directly (the path the daemon reads), and restarts pendra.service so the new worker key takes effect immediately. Running pendra setup without sudo on a packaged host is refused with a clear hint. The postinstall hook also adds you to the pendra group, so pendra status, pendra doctor, and pendra logs work without sudo after your next login.
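
After setup, a quick sanity check can confirm the pieces described above. The sketch below is illustrative: in_group and check_install are hypothetical helpers, and the config check assumes your user can see into /var/lib/pendra, which may only be true once the group membership has taken effect.

```bash
#!/usr/bin/env bash
# Hypothetical post-install check for a packaged Linux host.

# Succeeds if USER is a member of GROUP.
in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Reports anything that looks off; returns non-zero if a check failed.
check_install() {
  local user="${1:-$(id -un)}" failed=0
  if [ ! -f /var/lib/pendra/config.yaml ]; then
    echo "config not found (or not visible): run sudo pendra setup"; failed=1
  fi
  if ! systemctl is-active --quiet pendra; then
    echo "pendra.service is not active"; failed=1
  fi
  if ! in_group "$user" pendra; then
    echo "$user is not in the pendra group yet (log out and back in)"; failed=1
  fi
  return "$failed"
}
```

Running check_install with no arguments checks the current user and prints one line per problem it finds.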

Docker

A multi-arch image is published at ghcr.io/pendra-cloud/pendra-worker (linux/amd64 and linux/arm64). The :latest tag is the CPU baseline, :cuda targets NVIDIA hosts, and :vulkan targets AMD / Intel hosts. Expose your GPU to the container and pass in a worker key:

bash
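# Run the worker container. --gpus all applies to NVIDIA hosts: pair it with
# the :cuda tag, or drop the flag when running the CPU baseline (:latest).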
docker run -d \
  --name pendra-worker \
  --gpus all \
  -e GPU_WORKER_PRIVATE_KEY=<base64-ed25519-key> \
  -e WORKER_NAME=my-worker \
  ghcr.io/pendra-cloud/pendra-worker:latest
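
The image tag and the GPU flag need to agree, so it can help to derive both from one variant name. This is a sketch only: the tags come from the text above, while the helper names and the --device /dev/dri flag for Vulkan hosts are assumptions (a common way to expose AMD / Intel GPUs to containers, not something confirmed for this image).

```bash
#!/usr/bin/env bash
# Hypothetical helpers: pick GPU flags and the image reference for a variant.

docker_gpu_args() {
  case "$1" in
    cpu)    echo "" ;;
    cuda)   echo "--gpus all" ;;          # needs the NVIDIA Container Toolkit
    vulkan) echo "--device /dev/dri" ;;   # assumption: expose DRI render nodes
    *) echo "unknown variant: $1" >&2; return 1 ;;
  esac
}

image_ref() {
  local tag="$1"
  if [ "$tag" = cpu ]; then tag=latest; fi   # CPU baseline lives on :latest
  echo "ghcr.io/pendra-cloud/pendra-worker:$tag"
}

# Usage (commented out so that sourcing this file has no side effects).
# pendra.env is a hypothetical file holding GPU_WORKER_PRIVATE_KEY and WORKER_NAME:
#   docker run -d --name pendra-worker $(docker_gpu_args cuda) \
#     --env-file ./pendra.env "$(image_ref cuda)"
```

Passing the key through --env-file rather than -e keeps it out of your shell history.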

2. Connect the worker

  1. Sign in to the Pendra console.
  2. Open Workers → Worker Keys.
  3. Click Generate key — copy the base64 Ed25519 private key it shows you.
  4. Run pendra setup (sudo pendra setup on a packaged Linux install) or open the GUI setup wizard, then paste the key. The daemon starts connecting as soon as the key is saved.

Worker keys identify a specific machine to a Pendra organisation. They are not the same as pdr_sk_ API keys: those authenticate API clients, while worker keys authenticate workers.
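
Because the two credentials are easy to mix up, a small pre-flight check can catch a pasted API key before setup rejects it. The sketch below is hypothetical: key_kind is not a Pendra command, and it assumes a worker key decodes to 32 bytes (an Ed25519 seed) or 64 bytes (seed plus public key), since the exact encoded length is not stated here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tell an API key from something shaped like a worker key.
key_kind() {
  local key="$1" nbytes
  case "$key" in
    pdr_sk_*) echo api-key; return 0 ;;
  esac
  # Count the decoded bytes; invalid base64 yields a short or empty result.
  nbytes=$(printf '%s' "$key" | base64 -d 2>/dev/null | wc -c | tr -d '[:space:]')
  case "$nbytes" in
    32|64) echo worker-key ;;   # assumed Ed25519 private key sizes
    *)     echo unknown ;;
  esac
}
```

The check is purely structural: it says whether a string looks like a worker key, not whether the console issued it.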

3. Install a model

A connected worker with no models can't serve requests. Pendra installs vetted catalogue models into the Pendra backend (or into Ollama / LM Studio / Speaches if you've added them) with one click — no SSH, no shell.

  1. Open console.pendra.ai → Models.
  2. Browse the catalogue and click Install on a model — for example, qwen3.6:27b or nomic-embed-text.
  3. Pick your worker as the destination. For cross-backend models, choose a target backend (Pendra by default).
  4. Install progress streams live in the console.

Running vLLM instead? You manage models directly on the host — Pendra picks them up automatically once the backend is serving them. See each backend's docs for specifics.

4. Verify

Check the daemon is healthy with pendra status:

$ pendra status

worker:       wrk-a1b2c3d4 (tom-mac)
connection:   wss://api.pendra.ai/ws/gpu  (connected)
backends:     pendra (built-in)
              ollama @ http://localhost:11434
models:       17 served

Open console → Workers and your machine should appear within a few seconds. Head back to the Quickstart to make your first API request.
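
For scripted health checks, the status output can be tested rather than read by eye. This is a sketch that assumes the output keeps the layout of the sample above, with a connection: line ending in (connected); status_connected is an illustrative helper, not a Pendra subcommand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `pendra status` output on stdin and succeed only
# if the connection line ends with "(connected)".
status_connected() {
  grep -Eq '^connection:.*\(connected\)[[:space:]]*$'
}
```

For example, pendra status | status_connected && echo "worker is connected" prints the message only when the daemon reports a live connection.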

Optional: bring your own backend

The Pendra backend covers chat models out of the box. You can add any of these external backends on the same machine when you need extra capabilities — image generation, audio transcription, or a specific serving engine like vLLM. Install them following the upstream instructions; the Pendra worker auto-discovers each one on its default port.
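
Before expecting the worker to discover a backend, you can check that the backend answers on its usual port. The sketch below is an approximation: Ollama's port matches the status sample earlier on this page, the vLLM and LM Studio ports are their upstream defaults, and whether Pendra probes these exact endpoints is an assumption. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: check which external backends answer locally.

# Model-listing endpoint for each backend on its default port.
backend_url() {
  case "$1" in
    ollama)   echo "http://localhost:11434/api/tags" ;;
    vllm)     echo "http://localhost:8000/v1/models" ;;
    lmstudio) echo "http://localhost:1234/v1/models" ;;
    *) echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

# Print one line per backend saying whether its endpoint responded.
probe_backends() {
  local name
  for name in ollama vllm lmstudio; do
    if curl -fsS --max-time 2 -o /dev/null "$(backend_url "$name")" 2>/dev/null; then
      echo "$name: up"
    else
      echo "$name: not reachable"
    fi
  done
}
```

Run probe_backends on the worker host; a backend that is reachable here but missing from pendra status is worth checking against the capability matrix.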

See the capability matrix for what each backend supports and which ones let Pendra install catalogue models for you.
