Backends

Pendra (built-in)

The Pendra backend is the default inference runtime bundled with every Pendra worker. There is no separate service to install, no port to open, and no upstream project to configure — install the worker and the backend is already there.

Pendra reads GGUF model files from a local modelstore directory and serves them through the same OpenAI-compatible surface the API expects. Curated installs from the console download verified GGUFs (SHA-256 pinned) straight into that directory.

What's supported

CapabilityStatus
Chat completions
Embeddings
Image generation
Audio transcription
Model install✓ — curated download to the local modelstore
Model uninstall✓ — removes the GGUF from disk

How it connects

  • No external service — the runtime runs in-process inside the worker. Nothing to start, nothing to expose.
  • GPU acceleration — CUDA on supported NVIDIA cards, Metal on Apple Silicon, Vulkan elsewhere. The right build is picked when you install the worker.
  • Models directory — defaults to ~/.pendra/models on Unix and %ProgramData%\Pendra\models on Windows. Override with PENDRA_MODELS_DIR or models_dir in config.

Installing models

Either install from the console (Models → click Install) or from the CLI:

bash
# Install a catalogue model into the local modelstore
pendra models install qwen3.6:27b

# Show where Pendra keeps GGUF files on disk
pendra models dir

Catalogue variants with a gguf_url + gguf_sha256 pair install into the Pendra backend. The worker watches the modelstore directory, so dropping a .gguf file in by hand also works — it appears in pendra models on the next refresh.

Curated installs also write a <gguf>.meta.json sidecar next to the file so the runtime knows what the model is for (chat, embedding, image, transcription) without relying on a filename heuristic. Embedding models like nomic-embed-text-v1.5 (F16 and Q4_K_M variants in the catalogue) light up the OpenAI-compatible /v1/embeddings endpoint as soon as they finish installing.

Related