#	Model	Provider	Badges	DFO Score	Tok/s	In $/1M	Out $/1M	Ctx
1	Claude Fable 5	Anthropic	TOP PICK	90.6	56.5	$10.00	$50.00	1M
2	Claude Opus 4.8	Anthropic	FRONTIER	89.3	52.4	$5.00	$25.00	1M
3	Claude Opus 5	Anthropic	TOP PICK, IN-HOUSE PICK, NEW	88.6	54.6	$5.00	$25.00	1M
4	Gemini 3.1 Pro Preview	Google	FRONTIER	86.6	126	$2.00	$12.00	1M
5	Kimi K3	MoonshotAI	TOP PICK, NEW	84.6	58.5	$3.00	$15.00	1M
6	Grok 4.5	xAI	NEW	83.8	73.8	$2.00	$6.00	500K
7	Claude Sonnet 5	Anthropic	IN-HOUSE PICK, NEW	83.8	75.6	$2.00	$10.00	1M
8	Muse Spark 1.1	Meta	NEW	83.4	107	$1.25	$4.25	1M
9	GPT-5.5	OpenAI	BEST FOR CODING	83.3	73.7	$5.00	$30.00	1.1M
10	GLM 5.2	Z.ai		82.2	167	$0.6916	$2.17	1M
11	GPT-5.3-Codex	OpenAI		80.5	95.6	$1.75	$14.00	400K
12	GPT-5.5 (high)	OpenAI	BEST FOR AGENTS	80.0	64.8	$5.00	$30.00	–
13	Gemini 3.5 Flash	Google		79.1	193	$1.50	$9.00	1M
14	Claude Opus 4.7	Anthropic		78.7	45.3	$5.00	$25.00	1M
15	Gemini 3 Flash Preview	Google		78.5	211	$0.5000	$3.00	1M
16	Qwen3.7 Max	Qwen		78.5	201	$1.48	$4.43	1M
17	GPT-5.6 Terra	OpenAI	NEW	78.1	124	$1.25	$7.50	1.1M
18	DeepSeek V4 Pro	DeepSeek		77.9	60.1	$0.4350	$0.8700	1M
19	GPT-5.5 (medium)	OpenAI		77.5	64.4	$5.00	$30.00	–
20	GPT-5.6 Sol	OpenAI	NEW	77.5	64.9	$5.00	$30.00	1.1M

Showing 1–20 of 650 · Data from OpenRouter, Artificial Analysis, Hugging Face & our own testing. Scores editorially curated.
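
The In $/1M and Out $/1M columns can be turned into a rough bill for a given workload. The sketch below is illustrative only: the function name and the workload figures (10,000 requests of 2,000 input and 500 output tokens each) are our assumptions, and it ignores caching discounts, tiered pricing and provider-specific surcharges, none of which the table describes.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price_per_1m: float, out_price_per_1m: float) -> float:
    """Cost in USD of one request, given prices quoted per 1M tokens."""
    return (input_tokens * in_price_per_1m
            + output_tokens * out_price_per_1m) / 1_000_000


# Hypothetical workload (an assumption, not a figure from the leaderboard).
REQUESTS = 10_000
IN_TOKENS, OUT_TOKENS = 2_000, 500

# (In $/1M, Out $/1M) copied from the table above.
prices = {
    "Claude Opus 5": (5.00, 25.00),
    "GLM 5.2": (0.6916, 2.17),
}

for model, (in_price, out_price) in prices.items():
    total = REQUESTS * request_cost_usd(IN_TOKENS, OUT_TOKENS, in_price, out_price)
    print(f"{model}: ${total:,.2f}")
```

For this assumed workload the two models come out at roughly $225 and $25, which shows how far the price columns can move a decision even when DFO Scores are only a few points apart.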

Updated July 29, 2026

This independent leaderboard tracks 650 large language models and ranks them by capability, price and speed. Design for Online aggregates third-party benchmarks with our own editorial testing, refreshing the data daily so the ranking reflects the models you can actually use today, not last year's headlines.

Leaderboards by use case

The overall table, re-ranked for the job you're hiring a model for.

Use case	#1	#2	#3	Models ranked
AI Agents	Claude Fable 5 (90.6)	Claude Opus 4.8 (89.3)	Claude Opus 5 (88.6)	112
Coding	Claude Fable 5 (90.6)	Claude Opus 4.8 (89.3)	Claude Opus 5 (88.6)	158
Content Writing	Claude Fable 5 (90.6)	Claude Opus 4.8 (89.3)	Claude Opus 5 (88.6)	163
General	Claude Fable 5 (90.6)	Claude Opus 4.8 (89.3)	Claude Opus 5 (88.6)	33
SEO	Gemini 3.1 Pro Preview (86.6)	Gemini 3 Flash Preview (78.5)	Gemma 4 31B (68.3)	32
Tool Use	Claude Fable 5 (90.6)	Claude Opus 4.8 (89.3)	Claude Opus 5 (88.6)	325

Leaderboard changelog

A running log of what has changed on the board and when.

30 changes in the last 7 days

PRICE DROP Today Qwen3 VL 30B A3B Instruct: output price drops 13% to $0.52 per 1M tokens.

PRICE DROP Today Qwen3 VL 30B A3B Instruct: input price drops 13% to $0.13 per 1M tokens.

PRICE DROP Today Qwen3 Coder Next: output price drops 11% to $0.80 per 1M tokens.

PRICE DROP Today Qwen3 Coder Next: input price drops 33% to $0.12 per 1M tokens.

PRICE DROP Today Gemma 4 26B A4B: output price drops 24% to $0.34 per 1M tokens.

PRICE DROP Today Gemma 4 26B A4B: input price drops 53% to $0.07 per 1M tokens.

PRICE DROP Today GLM 5.2: output price drops 10% to $2.18 per 1M tokens.

PRICE DROP Today GLM 5.2: input price drops 10% to $0.69 per 1M tokens.

PRICE DROP 2 days ago GPT-5.6 Terra Pro: output price drops 50% to $7.50 per 1M tokens.

PRICE DROP 2 days ago GPT-5.6 Terra Pro: input price drops 50% to $1.25 per 1M tokens.

PRICE DROP 2 days ago GPT-5.6 Luna Pro: output price drops 50% to $3.00 per 1M tokens.

PRICE DROP 2 days ago GPT-5.6 Luna Pro: input price drops 50% to $0.50 per 1M tokens.

PRICE DROP 2 days ago GPT-5.6 Luna: output price drops 50% to $3.00 per 1M tokens.

PRICE DROP 2 days ago GPT-5.6 Luna: input price drops 50% to $0.50 per 1M tokens.

PRICE DROP 2 days ago Nemotron 3 Ultra: output price drops 39% to $2.20 per 1M tokens.

PRICE DROP 2 days ago Nemotron 3 Ultra: input price drops 17% to $0.50 per 1M tokens.

PRICE DROP 2 days ago GPT-5.6 Terra: output price drops 50% to $7.50 per 1M tokens.

PRICE DROP 2 days ago GPT-5.6 Terra: input price drops 50% to $1.25 per 1M tokens.

PRICE DROP 3 days ago Qwen3.6 27B: output price drops 17% to $2.00 per 1M tokens.

RANK MOVE 4 days ago LFM2.5-1.2B-Instruct (free): moves ▼553 to #573 overall.

RANK MOVE 4 days ago LFM2.5-1.2B-Thinking (free): moves ▼508 to #527 overall.

RANK MOVE 4 days ago Qwen3 Coder Next: moves ▼115 to #131 overall.

RANK MOVE 4 days ago Qwen3 Max Thinking: moves ▼101 to #115 overall.

RANK MOVE 4 days ago Qwen3.5 397B A17B: moves ▼92 to #103 overall.

RANK MOVE 4 days ago Step 3.5 Flash: moves ▼82 to #99 overall.

RANK MOVE 4 days ago MiniMax M2.5: moves ▼63 to #75 overall.

RANK MOVE 4 days ago GLM 5: moves ▼48 to #61 overall.

RANK MOVE 4 days ago Kimi K2.5: moves ▼34 to #52 overall.

NEW MODEL 4 days ago Gemini 3.5 Flash-Lite: enters the leaderboard at #47.

RANK MOVE 4 days ago Claude Sonnet 4.6: moves ▼17 to #27 overall.
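
Each PRICE DROP entry gives a percentage and the new price, so the previous price can be estimated by working backwards. The helpers below are a minimal sketch with names of our own choosing. Because the changelog percentages appear to be rounded to whole numbers, the recovered price is an approximation, not the exact earlier listing.

```python
def percent_drop(old_price: float, new_price: float) -> float:
    """Percentage decrease going from old_price to new_price."""
    if old_price <= 0:
        raise ValueError("old_price must be positive")
    return (old_price - new_price) / old_price * 100


def implied_previous_price(new_price: float, drop_pct: float) -> float:
    """Estimate the price before a drop of drop_pct percent.

    Inverts new = old * (1 - drop_pct / 100).
    """
    if not 0 <= drop_pct < 100:
        raise ValueError("drop_pct must be in [0, 100)")
    return new_price / (1 - drop_pct / 100)


# GPT-5.6 Terra input: "drops 50% to $1.25 per 1M tokens"
print(implied_previous_price(1.25, 50))

# Qwen3 VL 30B A3B Instruct output: "drops 13% to $0.52 per 1M tokens"
print(round(implied_previous_price(0.52, 13), 2))
```

The first call implies an earlier input price of $2.50 per 1M tokens; the second lands near $0.60, with the usual caveat about rounding.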

How we rank AI models

The Design for Online AI Model Leaderboard scores 650 models on a single 0–100 scale built from four weighted dimensions: intelligence (reasoning and knowledge benchmarks), technical capability (coding and tool use), content quality (writing and instruction-following) and value (capability per dollar).

Underlying data is aggregated from the OpenRouter API for pricing and availability, Artificial Analysis for intelligence, coding and agentic indices, and the Hugging Face Open LLM Leaderboard for open-model benchmarks. The fourth source is our own: we deploy these models in client agents, chatbots and automations every week, and that internal testing feeds the editorial layer, so a model that benchmarks well but is impractical to deploy will not automatically top the table.

Models are grouped into tiers (Frontier, Professional, Specialist, Efficient, Emerging and Legacy) to make like-for-like comparison easier, and newly released models are flagged so you can see what has just landed.

Leaderboard FAQ

How often is the leaderboard updated?

Pricing, availability and benchmark data are synced daily from our sources, and editorial scores are reviewed whenever a significant new model is released.

How is the overall score calculated?

Each model is graded 0–10 on intelligence, technical capability, content quality and value; those dimensions are weighted and combined into the 0–100 overall score used to rank the table.
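
As a rough illustration of that calculation, here is a minimal sketch of a weighted combination. The weights are not published on this page, so the equal weights below are purely hypothetical, as are the dimension keys, the function name and the sample grades. The published scores are also editorially curated, so this formula should not be expected to reproduce them.

```python
# Hypothetical weights: the page says the dimensions are weighted but does
# not state the weights. Equal weighting is an assumption for illustration.
WEIGHTS = {
    "intelligence": 0.25,
    "technical": 0.25,
    "content": 0.25,
    "value": 0.25,
}


def overall_score(grades: dict[str, float],
                  weights: dict[str, float] = WEIGHTS) -> float:
    """Combine 0-10 dimension grades into a 0-100 overall score."""
    if set(grades) != set(weights):
        raise ValueError("grades and weights must cover the same dimensions")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for name, grade in grades.items():
        if not 0 <= grade <= 10:
            raise ValueError(f"{name} grade must be between 0 and 10")
    # The weighted mean stays on the 0-10 scale; multiply by 10 for 0-100.
    return 10 * sum(weights[name] * grades[name] for name in weights)


# Made-up grades for an imaginary model, not taken from the leaderboard.
sample = {"intelligence": 9, "technical": 8, "content": 9, "value": 6}
print(overall_score(sample))
```

Keeping the weights summing to 1 means a model graded 10 on every dimension scores exactly 100, and shifting weight towards value would favour cheaper models without changing the scale.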

Where does the data come from?

From four sources: the OpenRouter API, Artificial Analysis, the Hugging Face Open LLM Leaderboard, and internal testing from real deployments by the Design for Online team.