Backends

Backend capabilities

The Pendra worker ships with a built-in inference backend: install the worker and you can start serving chat completions from the curated catalogue. When you need capabilities the built-in backend doesn't cover, such as image generation or audio transcription, you can optionally connect external backends (Ollama, vLLM, LM Studio, Speaches) running on the same machine.

External backends are auto-discovered on their default ports; you can override or disable any of them in configuration.

Capability matrix

The most important columns are Model install and Model uninstall. They show whether Pendra can install a model from the curated catalogue into that backend with one click from the console, and whether it can remove the model the same way. For external backends, Pendra never installs the backend software itself: you install and run it, and Pendra then manages the model lifecycle wherever the backend supports it.

Backend	Chat	Embed	Image	Transcribe	Model install	Model uninstall
Pendra (built-in)	✓	✓	—	—	✓	✓
Ollama	✓	✓	✓	—	✓	✓
LM Studio	✓	✓	—	—	✓	via LM Studio app
Speaches	—	—	—	✓	✓	✓
vLLM	✓	✓	—	—	—	—
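If you script against worker capabilities, it can help to treat the matrix as data. The sketch below encodes the table as a Python dictionary; the backend keys, field names, and the `backends_with` helper are hypothetical and are not part of any Pendra API. LM Studio's uninstall is marked False because removal happens in the LM Studio app rather than through Pendra.

```python
from typing import List

# The capability matrix above, encoded as data. Keys and field names are
# illustrative only; they are not a Pendra API. Order follows the table.
CAPABILITIES = {
    "pendra":   {"chat": True,  "embed": True,  "image": False,
                 "transcribe": False, "install": True,  "uninstall": True},
    "ollama":   {"chat": True,  "embed": True,  "image": True,
                 "transcribe": False, "install": True,  "uninstall": True},
    # LM Studio: install via Pendra, but models are removed in the LM Studio app.
    "lmstudio": {"chat": True,  "embed": True,  "image": False,
                 "transcribe": False, "install": True,  "uninstall": False},
    "speaches": {"chat": False, "embed": False, "image": False,
                 "transcribe": True,  "install": True,  "uninstall": True},
    # vLLM: Pendra does not manage the model lifecycle.
    "vllm":     {"chat": True,  "embed": True,  "image": False,
                 "transcribe": False, "install": False, "uninstall": False},
}


def backends_with(capability: str) -> List[str]:
    """Return the backends that support `capability`, in table order."""
    return [name for name, caps in CAPABILITIES.items() if caps[capability]]
```

For example, `backends_with("uninstall")` lists the backends where the console can remove models directly: Pendra, Ollama, and Speaches.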

What "model install" means

Pendra ships with a curated catalogue of vetted open-source models (Llama, Qwen, Mistral, gpt-oss, Phi, Nomic embeddings, Whisper variants, image models, etc.). For backends in the table that support model install, you can:

1. Browse the catalogue at console.pendra.ai → Models.
2. Click Install on any catalogue model.
3. Pick a destination worker (and, for cross-backend models, a target backend: Pendra, Ollama, or LM Studio).
4. Watch install progress stream live in the console.

The same flow handles uninstall wherever the backend supports it (Pendra, Ollama, and Speaches today). LM Studio models must currently be removed in the LM Studio app; this is a limitation of LM Studio's API, not of Pendra.

The catalogue endpoint at /api/v1/catalogue is public, so anyone can browse it without an account. Use it to confirm which sizes and quantisations are catalogued for a given model.
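A minimal sketch of reading the catalogue from a script, using only Python's standard library. The /api/v1/catalogue path comes from this page; the base URL `https://console.pendra.ai` is an assumption (the API may be served from a different host), and because the response schema isn't documented here, the hypothetical `fetch_catalogue` helper returns the decoded JSON untouched instead of guessing field names.

```python
import json
import urllib.request

# Assumption: the public API lives on this host. Only the path below is
# documented; change the base URL if your deployment serves it elsewhere.
DEFAULT_BASE_URL = "https://console.pendra.ai"


def fetch_catalogue(base_url: str = DEFAULT_BASE_URL, timeout: float = 10.0):
    """GET /api/v1/catalogue (no account needed) and return the parsed JSON.

    The response schema is not described in these docs, so the result is
    returned as-is rather than mapped onto assumed field names.
    """
    url = base_url.rstrip("/") + "/api/v1/catalogue"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `print(json.dumps(fetch_catalogue(), indent=2))` dumps the whole catalogue so you can look up the sizes and quantisations listed for a model.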

Auto-discovery (external backends)

When no explicit endpoint is configured for an external backend, the worker probes http://localhost:<port> first, then http://host.docker.internal:<port>. The first endpoint that responds and passes a backend-specific Verify() wins. The Docker fallback means a worker running in a container can reach a backend on the host without extra config. The Pendra backend is in-process and doesn't need discovery.
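The probe order above can be sketched as follows. This illustrates the described behaviour and is not the worker's actual code: `discover`, `verify_ollama`, and the rule that an explicit endpoint is used as-is without probing are assumptions. The `verify` callback stands in for the backend-specific Verify(); the Ollama example checks Ollama's `/api/version` endpoint, which Pendra's real check may or may not use.

```python
import json
import urllib.request
from typing import Callable, Optional, Sequence

# Probe order from the docs: the local machine first, then the Docker host
# alias so a containerised worker can reach a backend running on the host.
PROBE_HOSTS: Sequence[str] = ("localhost", "host.docker.internal")


def discover(port: int, verify: Callable[[str], bool],
             configured: Optional[str] = None) -> Optional[str]:
    """Return the endpoint to use for one external backend, or None."""
    if configured is not None:
        return configured  # assumption: explicit endpoints are used as-is
    for host in PROBE_HOSTS:
        endpoint = f"http://{host}:{port}"
        try:
            if verify(endpoint):
                return endpoint  # first endpoint that responds and verifies wins
        except OSError:
            continue  # refused, unresolvable host, timeout: try the next host
    return None


def verify_ollama(endpoint: str) -> bool:
    """Illustrative Verify(): Ollama answers GET /api/version with JSON
    containing a "version" field. Pendra's real check may differ."""
    with urllib.request.urlopen(endpoint + "/api/version", timeout=2) as resp:
        try:
            body = json.load(resp)
        except ValueError:
            return False  # something answered, but not with Ollama's JSON
    return isinstance(body, dict) and "version" in body
```

For instance, `discover(11434, verify_ollama)` looks for Ollama on its default port 11434 and returns `http://localhost:11434`, `http://host.docker.internal:11434`, or `None`.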

Disabling a backend