Backends
Pendra (built-in)
The Pendra backend is the default inference runtime bundled with every Pendra worker. There is no separate service to install, no port to open, and no upstream project to configure — install the worker and the backend is already there.
Pendra reads GGUF model files from a local modelstore directory and serves them through the same OpenAI-compatible surface the API expects. Curated installs from the console download verified GGUFs (SHA-256 pinned) straight into that directory.
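To illustrate what "SHA-256 pinned" means in practice, here is a minimal sketch of checking a downloaded file against a pinned digest. It is not Pendra's own implementation; the function names `sha256_of_file` and `matches_pinned_hash` are hypothetical and exist only for this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Chunked reads keep memory use flat even for multi-gigabyte GGUF files.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path: Path, pinned_sha256: str) -> bool:
    """Compare a file's digest with a pinned value (hypothetical helper).

    The pinned value is normalised for case and surrounding whitespace.
    """
    return sha256_of_file(path) == pinned_sha256.strip().lower()
```

A caller would typically discard the download when `matches_pinned_hash` returns `False` rather than leave an unverified file in the modelstore.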
What's supported
| Capability | Status |
|---|---|
| Chat completions | ✓ |
| Embeddings | ✓ |
| Image generation | — |
| Audio transcription | — |
| Model install | ✓ — curated download to the local modelstore |
| Model uninstall | ✓ — removes the GGUF from disk |
How it connects
- No external service — the runtime runs in-process inside the worker. Nothing to start, nothing to expose.
- GPU acceleration — CUDA on supported NVIDIA cards, Metal on Apple Silicon, Vulkan elsewhere. The right build is picked when you install the worker.
- Models directory — defaults to ~/.pendra/models on Unix and %ProgramData%\Pendra\models on Windows. Override with PENDRA_MODELS_DIR or models_dir in config.
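The directory lookup described in the last bullet can be sketched as below. This is an illustration under stated assumptions, not the worker's actual logic: the function name `resolve_models_dir` is hypothetical, the text does not say whether `PENDRA_MODELS_DIR` or `models_dir` wins when both are set (the sketch lets the environment variable win), and the `C:\ProgramData` fallback is assumed for the case where `%ProgramData%` is unset.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def resolve_models_dir(
    env: Mapping[str, str],
    config_models_dir: Optional[str] = None,
    platform: str = sys.platform,
) -> Path:
    """Pick the modelstore directory (hypothetical helper).

    Assumption: the environment variable takes precedence over the
    config value; the source does not state the real order.
    """
    override = env.get("PENDRA_MODELS_DIR") or config_models_dir
    if override:
        return Path(override).expanduser()
    if platform.startswith("win"):
        # Assumed fallback if %ProgramData% is not set.
        program_data = env.get("ProgramData", r"C:\ProgramData")
        return Path(program_data) / "Pendra" / "models"
    return Path.home() / ".pendra" / "models"


if __name__ == "__main__":
    print(resolve_models_dir(os.environ))
```

On a real install, `pendra models dir` remains the authoritative way to find the directory.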
Installing models
Either install from the console (Models → click Install) or from the CLI:
# Install a catalogue model into the local modelstore
pendra models install qwen3.6:27b
# Show where Pendra keeps GGUF files on disk
pendra models dir
Catalogue variants that carry both a gguf_url and a gguf_sha256 install into the Pendra backend. The worker watches the modelstore directory, so dropping a .gguf file in by hand also works; it appears in the pendra models output on the next refresh.
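To check what a manual drop would expose, a short sketch like the following lists the GGUF files sitting in a modelstore directory. It is only an approximation of what the worker might see: the helper name `list_gguf_files` is hypothetical, it looks at the top level of the directory only, and matching the extension case-insensitively is a choice of this sketch rather than documented behaviour.

```python
from pathlib import Path
from typing import List


def list_gguf_files(models_dir: Path) -> List[str]:
    """Return sorted names of .gguf files directly inside models_dir.

    Sidecar files such as *.meta.json are skipped because their final
    suffix is .json, not .gguf.
    """
    if not models_dir.is_dir():
        return []
    return sorted(
        entry.name
        for entry in models_dir.iterdir()
        if entry.is_file() and entry.suffix.lower() == ".gguf"
    )
```

Pointing it at the path printed by `pendra models dir` gives a quick view of what is on disk.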
Curated installs also write a <gguf>.meta.json sidecar next to the file so the runtime knows what the model is for (chat, embedding, image, transcription) without relying on a filename heuristic. Embedding models such as nomic-embed-text-v1.5 (F16 and Q4_K_M variants in the catalogue) become available on the OpenAI-compatible /v1/embeddings endpoint as soon as they finish installing.
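Because the endpoint is OpenAI-compatible, a request to it should follow the usual OpenAI embeddings shape: a JSON body with `model` and `input`, and a response whose `data` items each carry an `index` and an `embedding`. The sketch below builds such a request and reads vectors out of a response using only the standard library. The base URL, the helper names, and the exact model identifier the API expects are assumptions, and authentication headers are left out because the text does not describe them.

```python
import json
import urllib.request
from typing import List


def build_embeddings_request(
    base_url: str, model: str, texts: List[str]
) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/embeddings.

    base_url is a placeholder for wherever the API is reachable.
    """
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_vectors(response_body: dict) -> List[List[float]]:
    """Return embeddings ordered by their 'index' field."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and feeding the decoded JSON body to `extract_vectors`.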