Install the worker

Self-hosting Pendra takes three steps: install the worker, paste a worker key from the console, and install a model from the catalogue. A final check then confirms the worker is connected. The inference runtime is built into the worker, so it serves chat, embeddings, image generation, and audio transcription with nothing else to install.

1. Install the worker

The Pendra worker is a single binary with the inference runtime built in — install it and it's ready to serve models.

On a Linux server, the quickest install is one command — it detects your GPU, installs the matching package, and prompts you to run pendra setup:

bash

# Linux — auto-detects your GPU and installs the matching package
curl -sSL https://get.pendra.ai/worker | sh

On macOS and Windows — or if you'd rather install graphically — open the Pendra console → Workers → Add worker. It auto-detects your OS and serves the correct installer with its matching SHA-256. The per-OS guidance below covers what each installer does.
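
If you want to check a downloaded installer against the SHA-256 shown in the console yourself, a small helper along these lines should do it. This is a sketch, not part of Pendra: verify_sha256 is a hypothetical name, and the file name and hash in the usage comment are placeholders.

```shell
# Compare a file's SHA-256 with an expected hex digest.
# Prints "OK" on a match; prints "MISMATCH" and returns 1 otherwise.
verify_sha256() {
  file="$1"
  expected="$2"
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum "$file" | awk '{print $1}')"
  else
    # macOS ships shasum rather than sha256sum
    actual="$(shasum -a 256 "$file" | awk '{print $1}')"
  fi
  # Hex digests are case-insensitive, so normalise before comparing
  expected="$(printf '%s' "$expected" | tr 'A-F' 'a-f')"
  if [ "$actual" = "$expected" ]; then
    echo "OK"
  else
    echo "MISMATCH: got $actual"
    return 1
  fi
}

# Usage (placeholders):
#   verify_sha256 ./Pendra-<v>-arm64.dmg <sha256-from-console>
```

A mismatch usually means a truncated or stale download, so fetch the file again before installing.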

macOS (Apple Silicon)

Download Pendra-<v>-arm64.dmg from the console, open it, and drag Pendra into /Applications. Launch it and the menu-bar app starts the headless daemon, registers a LaunchAgent so both start at login, and opens a browser-based setup wizard for your worker key.

To enter or change your worker key later, open the Pendra menu-bar icon and choose Set up Pendra… (or Settings… once a key is set) — it reopens the same setup flow.

Intel Macs are no longer supported — the in-process Metal path needs Apple Silicon.

Windows

Download PendraSetup-<v>.exe from the console and run it. The signed installer needs no admin rights and the setup wizard opens after it finishes so you can paste your worker key. Everything the worker needs to run is bundled — including the CUDA runtime — so there's no separate runtime to download.

NVIDIA GPU? Install the driver first. The installer bundles the CUDA runtime, but the host must already have a working NVIDIA driver — and cloud Windows images (GCP and similar) frequently ship none. Install the driver and reboot, then confirm with nvidia-smi (it should list your GPU and a CUDA version). Without a driver the worker can't see the card and runs every request CPU-only.

Need to enter or change your worker key later? Open the Pendra tray icon and choose Set up Pendra… (or Settings… once a key is set) — it reopens the same setup flow. To remove Pendra, uninstall it from Settings → Apps (or Add/Remove Programs). Your downloaded models and settings are kept in C:\ProgramData\Pendra so a re-install picks up where you left off — delete that folder by hand if you want to remove them and reclaim the disk space.

Linux server (headless)

The install command above picks the right package automatically. To choose yourself, three package variants ship side by side: the default CPU baseline, a CUDA variant for NVIDIA hosts, and a Vulkan variant for AMD / Intel hosts. Both arm64 and amd64 are available. Install via the package repository (recommended) or by downloading a single file from the console.
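
As a rough guide to choosing a variant by hand, the sketch below maps what the host reports to a package name. It is an assumption about reasonable selection logic, not the installer's actual detection code; pick_variant is a hypothetical helper, and vulkaninfo (from the vulkan-tools package) can also succeed on software-only Vulkan drivers.

```shell
# Hypothetical helper: map detected GPU tooling to a Pendra package name.
# Arguments are "yes"/"no" flags so the decision is easy to test in isolation.
pick_variant() {
  has_nvidia="$1"
  has_vulkan="$2"
  if [ "$has_nvidia" = "yes" ]; then
    echo "pendra-cuda"
  elif [ "$has_vulkan" = "yes" ]; then
    echo "pendra-vulkan"
  else
    echo "pendra"
  fi
}

# Probe the host: nvidia-smi succeeds only when an NVIDIA driver is loaded;
# vulkaninfo succeeds when some Vulkan device is usable.
nvidia=no
vulkan=no
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  nvidia=yes
fi
if command -v vulkaninfo >/dev/null 2>&1 && vulkaninfo >/dev/null 2>&1; then
  vulkan=yes
fi
echo "suggested package: $(pick_variant "$nvidia" "$vulkan")"
```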

Runtime requirements (Linux). The in-process inference runtime needs glibc 2.38 or newer — Ubuntu 24.04+, Debian 13+, or equivalent. On older releases (e.g. Ubuntu 22.04) it won't load and pendra doctor will tell you so.
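
To check the glibc requirement before installing, something like the following works on most distributions. It is a minimal sketch that relies on getconf and GNU sort -V; version_ge is a hypothetical helper name.

```shell
# version_ge A B: succeeds when version A >= version B (uses GNU sort -V)
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="2.38"
# getconf prints e.g. "glibc 2.39" on glibc systems; it is empty elsewhere
current="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"

if [ -z "$current" ]; then
  echo "could not determine the glibc version"
elif version_ge "$current" "$required"; then
  echo "glibc $current: OK (>= $required)"
else
  echo "glibc $current: too old (need >= $required)"
fi
```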

pendra-cuda needs the CUDA 13 runtime. It requires NVIDIA driver R580 or newer, plus the CUDA 13 runtime libraries, which NVIDIA's licence forbids us from bundling; install them by following NVIDIA's installation guide. The package checks for libcudart.so.13 / libcublas.so.13 at install time and refuses to install if they're missing, so you never end up with a worker that can't use the GPU. Confirm the driver is loaded with nvidia-smi: it should list your GPU and a CUDA version. A host with no driver runs the worker CPU-only.
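
You can run a similar pre-flight check yourself before installing pendra-cuda. The sketch below looks for the two libraries in the ldconfig cache and then calls nvidia-smi; it approximates the idea and is not the package's actual install-time check. lib_listed is a hypothetical helper.

```shell
# lib_listed NAME: reads an `ldconfig -p` listing on stdin and succeeds
# if NAME appears anywhere in it.
lib_listed() {
  grep -qF -- "$1"
}

# ldconfig often lives in /sbin, which may not be on a normal user's PATH
listing="$(ldconfig -p 2>/dev/null || /sbin/ldconfig -p 2>/dev/null || true)"

for lib in libcudart.so.13 libcublas.so.13; do
  if printf '%s\n' "$listing" | lib_listed "$lib"; then
    echo "$lib: found"
  else
    echo "$lib: missing from the ldconfig cache"
  fi
done

# The driver check from the text: nvidia-smi lists the GPU and a CUDA version
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi || echo "nvidia-smi failed: the driver may not be loaded"
else
  echo "nvidia-smi not found: is an NVIDIA driver installed?"
fi
```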

Don't want CUDA? pendra-vulkan runs on NVIDIA, AMD, and Intel GPUs with no CUDA dependency, and the CPU-only pendra package always works. If your CUDA runtime lives somewhere ldconfig doesn't index, set PENDRA_SKIP_CUDA_CHECK=1 to bypass the check, then confirm the GPU backend loaded with pendra doctor. If the runtime still can't see any GPU — usually a missing or not-loaded driver — pendra doctor reports a failure ("no GPU device found") instead of silently running on the CPU, and tells you how to fix it.

Package repository (recommended — auto-updates)

Add the Pendra repository once and your package manager keeps the worker current: a later apt upgrade or dnf upgrade pulls new versions like any other system package. Both arm64 and amd64 are served, and the repository is signed. Upgrading restarts the running worker onto the new binary automatically — no manual systemctl restart or reboot, and your worker key and config are untouched.

bash

# Debian / Ubuntu — add the Pendra apt repository once
sudo curl -fsSL https://get.pendra.ai/pendra-archive-keyring.gpg \
  -o /usr/share/keyrings/pendra-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/pendra-archive-keyring.gpg] https://get.pendra.ai/apt stable main" \
  | sudo tee /etc/apt/sources.list.d/pendra.list

# Install ONE variant: run only the line that matches your GPU
sudo apt update
sudo apt install pendra          # CPU baseline
sudo apt install pendra-cuda     # NVIDIA (CUDA)
sudo apt install pendra-vulkan   # AMD / Intel (Vulkan)

# Paste your worker key
sudo pendra setup

# Later: sudo apt update && sudo apt upgrade pulls new worker versions

bash

# Fedora / RHEL / openSUSE — add the Pendra yum repository once
sudo tee /etc/yum.repos.d/pendra.repo <<'EOF'
[pendra]
name=Pendra Worker
baseurl=https://get.pendra.ai/yum/$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://get.pendra.ai/RPM-GPG-KEY-pendra
EOF

# Install ONE variant: run only the line that matches your GPU
sudo dnf install pendra          # CPU baseline
sudo dnf install pendra-cuda     # NVIDIA (CUDA)
sudo dnf install pendra-vulkan   # AMD / Intel (Vulkan)

# Paste your worker key
sudo pendra setup

# Later: sudo dnf upgrade pulls new worker versions
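
If you prefer to upgrade only the worker rather than every package on the host, the standard package-manager commands can be targeted at it. The sketch below prints the relevant command and queries the installed version read-only. upgrade_command is a hypothetical helper, and the package name pendra should be swapped for pendra-cuda or pendra-vulkan as appropriate.

```shell
# upgrade_command MANAGER PACKAGE: print the command that upgrades just
# that package. Printing instead of running keeps this sketch side-effect free.
upgrade_command() {
  case "$1" in
    apt) echo "sudo apt-get install --only-upgrade $2" ;;
    dnf) echo "sudo dnf upgrade $2" ;;
    *)   echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Detect which manager this host has, then show the command and what is
# currently installed (local, read-only queries).
if command -v apt-get >/dev/null 2>&1; then
  upgrade_command apt pendra
  apt-cache policy pendra 2>/dev/null || true
elif command -v dnf >/dev/null 2>&1; then
  upgrade_command dnf pendra
  rpm -q pendra 2>/dev/null || true
else
  echo "neither apt nor dnf found"
fi
```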

Direct download (single file)

Grab the package for your distro and GPU from the console and install it by path. It registers the Pendra repository for you, so later upgrades are the usual sudo apt upgrade pendra / sudo dnf upgrade pendra — no need to download the file again. (If the installer can't register the repository — say curl is missing — add it manually with the commands in the Package repository section above.)

Debian / Ubuntu (.deb)

bash

# Debian / Ubuntu — pick the variant that matches your GPU
sudo apt install ./pendra_<v>_linux_amd64.deb          # CPU baseline
sudo apt install ./pendra-cuda_<v>_linux_amd64.deb     # NVIDIA (CUDA)
sudo apt install ./pendra-vulkan_<v>_linux_amd64.deb   # AMD / Intel (Vulkan)

# Paste your worker key — setup writes /var/lib/pendra/config.yaml
# and restarts pendra.service for you
sudo pendra setup

Fedora / RHEL / openSUSE (.rpm)

bash

# Fedora / RHEL / openSUSE — pick the variant that matches your GPU
sudo dnf install ./pendra_<v>_linux_amd64.rpm          # CPU baseline
sudo dnf install ./pendra-cuda_<v>_linux_amd64.rpm     # NVIDIA (CUDA)
sudo dnf install ./pendra-vulkan_<v>_linux_amd64.rpm   # AMD / Intel (Vulkan)

# Paste your worker key — setup writes /var/lib/pendra/config.yaml
# and restarts pendra.service for you
sudo pendra setup

Both package formats register a systemd service so the daemon runs at boot. Manage it with the usual systemctl {start,stop,restart,status} pendra.
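
For scripts or health checks, the unit's state can be read without parsing the full status output. This is a small sketch using standard systemctl subcommands; service_state is a hypothetical helper, and the unit name pendra comes from the text above.

```shell
# service_state UNIT: print systemd's view of the unit ("active",
# "inactive", "failed", ...) or "unknown" when systemd is not available.
service_state() {
  state=""
  if command -v systemctl >/dev/null 2>&1; then
    state="$(systemctl is-active "$1" 2>/dev/null || true)"
  fi
  echo "${state:-unknown}"
}

echo "pendra.service is: $(service_state pendra)"

# Typical follow-ups (these need a real install, so they stay as comments):
#   sudo systemctl restart pendra
#   systemctl is-enabled pendra      # "enabled" means it starts at boot
#   journalctl -u pendra -n 50       # last 50 log lines from the unit
```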

To enter or change your worker key later, just run sudo pendra setup. On a packaged install, setup writes /var/lib/pendra/config.yaml and restarts pendra.service so your worker key is live immediately. The install also adds you to the pendra group, so pendra status, pendra doctor, and pendra logs work without sudo after your next login.
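
Group membership only takes effect in new login sessions, which is why the commands may still ask for sudo right after installing. A quick way to see whether the current session has picked up the group is sketched below; in_group is a hypothetical helper.

```shell
# in_group NAME: succeeds if the current session already has group NAME.
in_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

if in_group pendra; then
  echo "this session is in the pendra group; sudo is not needed for status, doctor, or logs"
else
  echo "not in the pendra group yet; log out and back in (or run: newgrp pendra)"
fi
```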

Docker

A multi-arch image is published at ghcr.io/pendra-cloud/pendra-worker (linux/amd64 and linux/arm64). The :latest tag is the CPU baseline, :cuda is for NVIDIA hosts, and :vulkan is for AMD / Intel. Mount your GPU and pass in a worker key:

bash

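# Run the worker container on an NVIDIA host.
# --gpus all needs the NVIDIA Container Toolkit; for the CPU baseline,
# drop that flag and use the :latest tag instead.
docker run -d \
  --name pendra-worker \
  --gpus all \
  -e GPU_WORKER_PRIVATE_KEY=<base64-ed25519-key> \
  -e WORKER_NAME=my-worker \
  ghcr.io/pendra-cloud/pendra-worker:cuda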
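
Passing the key with -e leaves it in your shell history and in the output of docker inspect. One common alternative is Docker's --env-file option, sketched below with the same two variables. write_env_file is a hypothetical helper and ./pendra.env is a placeholder path.

```shell
# write_env_file PATH KEY NAME: store the worker settings in a file only
# the current user can read, in the KEY=value format --env-file expects.
write_env_file() {
  ( umask 077
    printf 'GPU_WORKER_PRIVATE_KEY=%s\nWORKER_NAME=%s\n' "$2" "$3" > "$1"
    # umask only affects newly created files, so fix the mode explicitly
    chmod 600 "$1" )
}

# Usage (placeholders; substitute your real key and worker name):
#   write_env_file ./pendra.env '<base64-ed25519-key>' my-worker
#   docker run -d --name pendra-worker --env-file ./pendra.env \
#     ghcr.io/pendra-cloud/pendra-worker:latest
```

Docker reads each line of the env file literally, so the key must not be wrapped in quotes inside the file.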

2. Connect the worker

Sign in to the Pendra console.
Open Workers → Worker Keys.
Click Generate key — copy the base64 Ed25519 private key it shows you.
Run pendra setup and paste the key when prompted (or paste it into the GUI setup wizard). The daemon then starts connecting.

Worker keys identify a specific machine to a Pendra organisation. They are not the same as pdr_sk_ API keys: those authenticate API clients, whereas worker keys authenticate workers.
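
A copy-paste slip is the most common reason a key is rejected, so it can help to confirm the string is intact base64 before pasting it. The sketch below only checks that the value decodes and reports its length. A raw Ed25519 private key is commonly 32 bytes (the seed) or 64 bytes (seed plus public key), but this guide does not state which encoding Pendra uses, so treat the length as a hint rather than a rule. key_byte_length is a hypothetical helper.

```shell
# key_byte_length KEY: print how many bytes KEY decodes to, or fail if it
# is not valid base64.
key_byte_length() {
  if ! printf '%s' "$1" | base64 -d >/dev/null 2>&1; then
    return 1
  fi
  printf '%s' "$1" | base64 -d | wc -c | tr -d ' '
}

# Usage: a throwaway all-zero 32-byte value stands in for a real key here.
sample="$(head -c 32 /dev/zero | base64)"
if len="$(key_byte_length "$sample")"; then
  echo "decodes cleanly to $len bytes"
else
  echo "not valid base64; re-copy the key from the console"
fi
```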

3. Install a model

A connected worker with no models can't serve requests. Pendra installs vetted catalogue models onto the worker with one click — no SSH, no shell.

Open console.pendra.ai → Models.
Browse the catalogue and click Install on a model — for example, qwen3.6:27b or nomic-embed-text.
Pick your worker as the destination.
Install progress streams live in the console.

4. Verify

Check the daemon is healthy with pendra status:

$ pendra status

worker:       wrk-a1b2c3d4 (gpu-01)
connection:   wss://api.pendra.ai/ws/gpu  (connected)
models:       17 served

Open the console → Workers and your machine should appear within a few seconds. Head back to the Quickstart to make your first API request.
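
To automate this check, for instance in a provisioning script, you can test the connection line of pendra status. The sketch below assumes the output keeps the shape of the sample above, which is not guaranteed to be stable across versions; is_connected is a hypothetical helper.

```shell
# is_connected: reads `pendra status` output on stdin and succeeds when the
# connection line ends in "(connected)", as in the sample output above.
is_connected() {
  grep -Eq '^connection:.*\(connected\)[[:space:]]*$'
}

if command -v pendra >/dev/null 2>&1; then
  if pendra status | is_connected; then
    echo "worker is connected"
  else
    echo "worker is not connected; try: pendra doctor"
  fi
else
  echo "pendra is not installed on this machine"
fi
```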

Next steps

Tune memory and GPU allocation: System requirements.
Reference every env var: Configuration.
Pick the right model variant: Choosing a model size.