Backends
Ollama
Ollama is the recommended default backend for Pendra. It has full model install and uninstall support, and it auto-discovers cleanly on every supported OS.
What's supported
| Capability | Status |
|---|---|
| Chat completions | ✓ |
| Embeddings | ✓ |
| Image generation | ✓ |
| Audio transcription | — |
| Model install | ✓ |
| Model uninstall | ✓ |
Connection
- Default port: 11434
- Auto-discovery probes:
http://localhost:11434, thenhttp://host.docker.internal:11434 - Verification: the worker calls
/api/versionand/api/tags; both must return the expected Ollama-shaped JSON before Ollama activates. - Override: set
OLLAMA_ENDPOINTin worker config.
Installing Ollama
Pendra does not install Ollama itself — install it from ollama.com/download on the same machine as the worker. Once Ollama is running and serving on port 11434, the worker discovers it on the next refresh tick.
Model discovery
The worker lists models from GET /api/tags and enriches each
one with a parallel GET /api/show call (max 5 concurrent),
capturing:
- Parameter size and quantisation level
- Format (GGUF), family
- Context length
- Capabilities array (e.g.
embedding,vision) - Disk size
Context window
Ollama defaults to a 2048-token context window per request, regardless
of what the underlying model supports. For models that report a much
larger window via /api/show (e.g.
qwen3.6:27b at 262144), this means longer conversations
get silently truncated and the model loses task context mid-session.
The worker fixes this for you: when it dispatches a chat completion
to Ollama, it injects options.num_ctx using the context
length captured during model discovery. An explicit
options.num_ctx sent by the client always wins. The
injection only applies to chat completions — embedding, image, and
transcription requests are forwarded unchanged.
Curated installs
Any catalogue model whose backend is ollama
(or both) can be installed from the console:
- console.pendra.ai → Models.
- Pick a catalogued model + variant (size / quantisation).
- Choose the destination worker. Progress streams live.
Uninstall works the same way — click Remove on an
installed model and Pendra calls DELETE /api/delete on the
worker's Ollama.