Models – Open-weight LLMs on UK infrastructure

DeepSeek V4 Flash

DeepSeek

Frontier reasoning, open weights

Chat Tools Thinking

284B 1M context

Qwen 3.6

Alibaba

Qwen 3.6 is Alibaba's flagship open-weight model, built for agentic work with strong multilingual reasoning, native tool use, and image understanding. It offers a hybrid thinking mode that can reason at length or answer directly, and handles 262K-token contexts, making it a capable all-rounder for chat, coding, and document tasks.

Chat Vision Tools Thinking

27B · 35B 256K context

Qwen 3.5

Alibaba

Alibaba’s flagship multimodal family

Chat Vision Tools Thinking

0.8B · 2B · 4B · 9B · 27B · 35B · 122B · 397B 256K context

gpt-oss

OpenAI

OpenAI’s open-weight release

Chat Tools

20B · 120B 128K context

Gemma 4

Google DeepMind

Google DeepMind’s efficient multimodal family

Chat Vision Tools Thinking

E2B · E4B · 12B · 26B · 31B 256K context

Qwen 3 Coder

Alibaba

Purpose-built for code

Chat Tools

30B · 480B 256K context

Llama 3.3

Qwen 3 VL

Alibaba

Qwen 3 VL is Alibaba's vision-language model, pairing Qwen 3's reasoning and tool use with detailed image understanding. It reads charts, documents, and photos alongside text and offers an explicit thinking mode, making it well suited to multimodal analysis and agentic tasks.

Chat Vision Tools Thinking

2B · 4B · 8B · 30B · 32B · 235B 256K context

Llama 4 Scout

Phi-4

Microsoft

Phi-4 is Microsoft's small-but-capable model, trained with a heavy emphasis on high-quality and synthetic data to punch above its size on reasoning tasks. Its compact footprint makes it a practical choice for cost-sensitive and on-device deployments.

Chat Thinking

3.8B · 14B 32K context

Qwen 3 Coder Next

Alibaba

Qwen 3 Coder Next is Alibaba's next-generation coding model, built on a hybrid architecture for efficient long-context inference. It targets agentic, repository-scale software tasks and tool use across a 262K-token context.

Chat Tools

80B 256K context

GLM 4.7 Flash

Z.ai

GLM 4.7 Flash is Z.ai's fast, lightweight model in the GLM 4 line, offering a hybrid reasoning mode that trades depth for speed on demand. It's tuned for tool use and responsive agentic workflows across a roughly 200K-token context.

Chat Tools Thinking

30B 198K context

Ministral 3

Mistral AI

Ministral 3 is Mistral AI's compact instruction model family, offered in several sizes with tool use and image understanding. It's engineered for efficient, low-latency inference on modest hardware while retaining strong general capability, making it a good edge and on-device option.

Chat Vision Tools Thinking

3B · 8B · 14B 256K context

Devstral Small 2

Mistral AI

Devstral Small 2 is Mistral AI's coding-focused model, tuned for software-engineering agents that navigate and edit real codebases. It combines tool use with image understanding in a compact 24B size, targeting practical developer automation.

Chat Vision Tools

24B 256K context

GLM OCR

Z.ai

GLM OCR is Z.ai's document-understanding model, specialised for optical character recognition and structured extraction from images and scanned pages. It reads dense layouts and returns clean, machine-usable text across a 131K-token context.

Vision OCR

0.9B 128K context

Mistral Small 4

Mistral AI

Mistral Small 4 is Mistral AI's mid-size flagship, a mixture-of-experts model that activates only a small fraction of its parameters per token for efficient inference. It reads images as well as text, supports tool use, and handles a 256K-token context, making it a capable all-rounder for chat, coding, and document work.

Chat Vision Tools

119B 256K context
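The mixture-of-experts idea behind Mistral Small 4 can be illustrated with a generic top-k router. This is a toy sketch in plain Python, not Mistral's implementation: the eight experts, k=2, the random router weights, and the "experts" that merely scale their input are all illustrative assumptions. A router scores every expert for the current token, only the k highest-scoring experts actually run, and their outputs are mixed with softmax weights, so most expert parameters sit idle for any given token.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token: Vector, router: List[Vector], experts: List[Expert],
                k: int = 2) -> Tuple[Vector, List[int]]:
    """Send one token vector through the top-k experts only (toy sketch)."""
    # Router logit per expert: dot product of the token with that expert's router row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    # Keep the k best experts; the remaining experts are never called.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(token)
    for gate, i in zip(gates, chosen):
        y = experts[i](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen

# Toy usage: 8 hypothetical experts, each scales its input and records that it ran.
calls: List[int] = []

def make_expert(i: int) -> Expert:
    def expert(x: Vector) -> Vector:
        calls.append(i)
        return [(i + 1) * v for v in x]
    return expert

random.seed(0)
DIM, N_EXPERTS = 4, 8
router = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]
experts = [make_expert(i) for i in range(N_EXPERTS)]
out, chosen = moe_forward([0.5, -0.2, 0.1, 0.9], router, experts, k=2)
print(f"ran {len(calls)} of {N_EXPERTS} experts: {sorted(calls)}")
```

In a real model the experts are feed-forward blocks and routing happens for every token in every MoE layer, but the cost argument is the same: compute per token scales with k, not with the total number of experts.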

LFM 2.5

Liquid AI

LFM 2.5 is Liquid AI's efficient Liquid Foundation Model (LFM) family, built on an architecture designed for fast on-device inference. It includes a mixture-of-experts variant and supports tool use, targeting responsive assistants that run well on constrained hardware.

Chat Tools Thinking

350M · 1.2B · 8B 125K context

Devstral 2

Mistral AI

Devstral 2 is Mistral AI's large coding model, built for software-engineering agents that explore and modify real repositories rather than answer isolated questions. At 123B dense parameters with a 256K-token context and native tool use, it targets long-running agentic development work. It ships under a modified MIT licence that permits commercial use.

Chat Tools

123B 256K context

LFM 2.5 VL

Liquid AI

LFM 2.5 VL is Liquid AI's compact vision-language model, adding image understanding to the efficient LFM 2.5 line. Its small footprint suits multimodal assistants that need to run on-device or at low latency.

Chat Vision

1.6B 125K context

Nemotron 3 Nano

NVIDIA

Nemotron 3 Nano is NVIDIA's small, efficient model tuned for reasoning and tool use, offered in dense and mixture-of-experts sizes. It's built for cost-effective agentic deployments and supports a 131K-token context.

Chat Tools Thinking

4B · 30B 128K context

Granite 4.0 Micro

IBM

Granite 4.0 Micro is IBM's smallest enterprise model, a hybrid design tuned for tool use and reliable instruction following. Its compact size and permissive licensing make it a practical fit for on-device and cost-sensitive business applications.

Chat Tools

3B 128K context

Magistral Small

Mistral AI

Magistral Small is Mistral AI's reasoning model, trained to work through problems step by step before answering. It now reads images as well as text, extending that deliberate reasoning to visual input. At a compact 24B size it brings transparent reasoning and tool use within reach of single-GPU deployments.

Chat Vision Tools Thinking

24B 128K context

GLM 4.6V Flash

Z.ai

GLM 4.6V Flash is Z.ai's fast vision-language model, pairing image understanding with responsive, lightweight inference. It's suited to multimodal chat and analysis where latency matters, across a 131K-token context.

Chat Vision

9B 128K context

Mistral Large 3

Mistral AI

Mistral Large 3 is Mistral AI's frontier open-weight model, a mixture-of-experts design that activates a fraction of its parameters per token so inference cost tracks the active size rather than the full one. It reads images as well as text, uses tools natively, and handles a 256K-token context for demanding reasoning and long-document work.

Chat Vision Tools

675B 256K context
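The claim that Mistral Large 3's inference cost tracks the active size rather than the full one can be made concrete with a common rule of thumb: a transformer forward pass costs roughly 2 FLOPs per parameter used, per token, ignoring attention over the context. The 675B total comes from the listing above; the active-parameter count below is a hypothetical placeholder, not a published figure, so substitute the value from the model card.

```python
def forward_flops_per_token(params_used: float) -> float:
    # Rule of thumb: about one multiply and one add per parameter per token.
    # Attention over the context window adds more and is ignored here.
    return 2.0 * params_used

TOTAL_PARAMS = 675e9   # total size from the Mistral Large 3 listing
ACTIVE_PARAMS = 40e9   # HYPOTHETICAL active count, for illustration only

dense_flops = forward_flops_per_token(TOTAL_PARAMS)
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
print(f"if every parameter ran: {dense_flops:.3g} FLOPs/token")
print(f"active experts only:    {moe_flops:.3g} FLOPs/token")
print(f"compute saving:         {dense_flops / moe_flops:.1f}x")
```

Memory is a different matter: all 675B parameters still have to be loaded so that any expert can be selected, so a mixture-of-experts model saves compute per token but not weight storage.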

MiMo V2.5

Xiaomi

MiMo V2.5 is Xiaomi's open model tuned for reasoning, tool use, and image understanding. It offers an explicit thinking mode and targets capable multimodal assistance in an efficient package.

Chat Vision Tools Thinking

1M context

Voxtral Small

Mistral AI

Voxtral Small is Mistral AI's speech-understanding model: it takes spoken audio directly as input alongside text, so you can ask questions about a recording rather than transcribing it first. Built on the Mistral Small 24B backbone, it handles a 131K-token context and targets voice assistants and audio analysis.

Chat

24B 128K context

Ornith 1.0

DeepReinforce

Ornith 1.0 is DeepReinforce's hybrid-reasoning model, designed to combine deliberate step-by-step thinking with efficient long-context inference. It supports tool use and is offered in two sizes for different quality-versus-cost trade-offs.

Chat Tools Thinking

9B · 35B 256K context

DeepSeek OCR

DeepSeek

DeepSeek OCR is DeepSeek's document-understanding model, specialised for optical character recognition and extracting text from images and scans. It focuses on accurate, structured reading of visual documents.

Vision OCR

3B 8K context

DeepSeek R1 Distill

DeepSeek

DeepSeek R1 Distill is a family of smaller models distilled from DeepSeek R1's reasoning traces onto Qwen and Llama backbones. They bring much of R1's step-by-step reasoning to sizes that run on modest hardware.

Chat Thinking

1.5B · 7B · 8B · 14B · 32B · 70B 128K context

dots.ocr

Rednote

dots.ocr is Rednote's document-parsing model, built to read complex layouts — tables, columns, and mixed text — and return structured output. It targets accurate OCR and document understanding across a 131K-token context.

Vision OCR

1.7B 128K context

BGE Reranker v2 M3

BAAI

BGE Reranker v2 m3 is BAAI's multilingual reranking model, scoring how well each candidate document answers a query to sharpen retrieval results. It's a lightweight, effective second stage for RAG pipelines across many languages.

568M 8K context
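In a RAG pipeline a reranker such as BGE Reranker v2 M3 is the second of two stages: a fast retriever returns a shortlist, then the reranker scores each query/document pair and reorders it. The sketch below shows only that control flow. `toy_overlap_score` is a hypothetical stand-in for the model (a cross-encoder that reads query and document together); it is not how BGE scores text, and no real client API is assumed.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]

def rerank(query: str, candidates: List[str], score: Scorer,
           top_n: int = 3) -> List[Tuple[str, float]]:
    # Second stage: score each (query, document) pair, then sort best-first.
    scored = [(doc, score(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

def toy_overlap_score(query: str, doc: str) -> float:
    # HYPOTHETICAL stand-in for a reranking model:
    # fraction of query words that also appear in the document.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

# Shortlist as a first-stage retriever might return it (toy data).
candidates = [
    "the cat sat on the mat",
    "open-weight models on uk infrastructure",
    "uk weather report",
]
results = rerank("open-weight models uk", candidates, toy_overlap_score, top_n=2)
for doc, s in results:
    print(f"{s:.2f}  {doc}")
```

Because a cross-encoder must run once per candidate, it is usually applied only to the retriever's top results rather than the whole corpus.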

Whisper

OpenAI

Whisper is OpenAI's speech-to-text model, transcribing audio across many languages with strong robustness to accents and background noise. The large-v3 and faster large-v3-turbo variants suit everything from meeting notes to media captioning.

Transcription

FLUX.1 [schnell]

Black Forest Labs

FLUX.1 [schnell] is Black Forest Labs' fast text-to-image model, distilled to generate high-quality images in just a few diffusion steps. Its speed and permissive Apache-2.0 licence make it a strong default for quick drafts and interactive generation.

Image

12B

Z-Image Turbo

Alibaba

Z-Image Turbo is Alibaba's distilled text-to-image model built for speed, producing images in only a few diffusion steps. It trades some peak fidelity for near-interactive latency, making it a good default for fast drafts and previews.

Image

6B

SDXL Turbo

Stability AI

SDXL Turbo is Stability AI's real-time text-to-image model, distilled from SDXL to generate images in as little as one to four steps. It's ideal for interactive and preview use where speed matters most.

Image

3.5B

FLUX.2 [klein]

Black Forest Labs

FLUX.2 [klein] is Black Forest Labs' efficient text-to-image model, offered in 4B and 9B sizes for a choice of speed versus fidelity. It brings the FLUX.2 line's image quality to hardware that can't run the largest checkpoints. Note that the two sizes ship under different terms: the 4B is Apache-2.0, while the 9B is released under the FLUX Non-Commercial License and is not licensed for commercial use.

Image

4B · 9B

Qwen-Image 2512

Alibaba

Qwen-Image 2512 is Alibaba's high-fidelity text-to-image model, notable for rendering legible text within images. It targets detailed, prompt-faithful generation for design and content work.

Image

20B

Stable Diffusion 3.5 Large Turbo

Stability AI

Stable Diffusion 3.5 Large Turbo is Stability AI's distilled flagship text-to-image model, generating high-quality images in a handful of steps. It pairs the detail of SD 3.5 Large with fast, few-step inference.

Image

8B

nomic-embed-text-v1.5

Nomic AI

nomic-embed-text v1.5 is Nomic AI's open text-embedding model, producing high-quality vectors for search, retrieval, and RAG. It supports adjustable embedding dimensions via Matryoshka representation, letting you shorten vectors to cut storage and search cost at some loss in retrieval quality.

Embeddings

2K context
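With a Matryoshka-trained model such as nomic-embed-text-v1.5, a shorter embedding is obtained by keeping a prefix of the full vector and re-normalising it to unit length. The sketch below shows that generic step in plain Python on a toy 3-dimensional vector; real embeddings are much longer, and the model card may specify extra preprocessing before truncation, so treat this as an illustration rather than the reference procedure.

```python
import math
from typing import List

def truncate_embedding(vec: List[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    # Re-normalising keeps dot products equal to cosine similarity.
    return [x / norm for x in head] if norm else head

full = [3.0, 4.0, 12.0]   # toy vector; real embeddings have hundreds of dims
short = truncate_embedding(full, 2)
print(short)              # [0.6, 0.8]
```

Index size and dot-product cost grow roughly linearly with the number of dimensions, so halving the vector length roughly halves storage for the vectors themselves.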

Qwen 3 Embedding

Alibaba

Qwen 3 Embedding is Alibaba's text-embedding family built on Qwen 3, offered in several sizes for search and retrieval. It delivers strong multilingual embeddings for RAG and semantic-search pipelines across a 32K-token context.

Embeddings

0.6B · 4B · 8B 32K context

BGE M3

BAAI

BGE-M3 is BAAI's versatile embedding model supporting dense, sparse, and multi-vector retrieval in a single model. It handles more than 100 languages and long inputs, making it a strong multilingual default for search and RAG.

Embeddings

568M 8K context
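BGE-M3's three retrieval modes differ in how a query and a document are compared. The plain-Python sketch below uses toy vectors and token weights in place of real model outputs: a dense score is a single dot product, a sparse (lexical) score sums weight products over shared tokens, and a multi-vector score matches each query token vector to its best document token vector (ColBERT-style late interaction). The hybrid weights are illustrative assumptions, not tuned values.

```python
from typing import Dict, List, Tuple

Vector = List[float]

def dense_score(q: Vector, d: Vector) -> float:
    # One vector per text; dot product (equals cosine similarity for unit vectors).
    return sum(a * b for a, b in zip(q, d))

def sparse_score(q: Dict[str, float], d: Dict[str, float]) -> float:
    # Lexical weights per token; only tokens present in both texts contribute.
    return sum(w * d[tok] for tok, w in q.items() if tok in d)

def multi_vector_score(q: List[Vector], d: List[Vector]) -> float:
    # Late interaction: best-matching document vector for each query vector, averaged.
    return sum(max(dense_score(qv, dv) for dv in d) for qv in q) / len(q)

def hybrid_score(scores: Tuple[float, float, float],
                 weights: Tuple[float, float, float] = (0.4, 0.2, 0.4)) -> float:
    # Weighted sum of (dense, sparse, multi-vector); weights are illustrative.
    return sum(w * s for w, s in zip(weights, scores))

# Toy inputs standing in for model outputs.
dense = dense_score([0.6, 0.8], [0.8, 0.6])
sparse = sparse_score({"uk": 0.5, "models": 0.3}, {"uk": 0.4, "cloud": 0.9})
multi = multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]])
combined = hybrid_score((dense, sparse, multi))
print(f"dense={dense:.3f} sparse={sparse:.3f} multi={multi:.3f} hybrid={combined:.3f}")
```

Keeping all three signals lets a single model serve semantic matching (dense), exact-term matching (sparse), and fine-grained token alignment (multi-vector), with the weights chosen per application.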

EmbeddingGemma

Google DeepMind

EmbeddingGemma is Google DeepMind's compact text-embedding model from the Gemma family, tuned for on-device retrieval. Its small size and strong multilingual quality make it a practical embedder for local RAG and semantic search.

Embeddings

2K context

Open-weight models, sovereign UK infrastructure

New models, added as the ecosystem evolves
