PickAIModel.com
Leaderboard

Top 10 AI models by quality.

The quality score combines HLE, SWE-Bench Pro, OpenRouter operational capability, and responsiveness for models in the published roster. For exact weights and coverage rules, see the Methodology page.

Frontier watchlist

New frontier releases are under review

Claude Mythos Preview and Gemini 3.5 Pro, both highly anticipated releases, are not yet included in the published roster.

Claude Mythos Preview and Gemini 3.5 Pro are on the watchlist because their releases are highly anticipated; they will remain unranked until official availability and comparable benchmark evidence are published.

Before any model is ranked, we require comparable benchmark coverage from multiple accepted public sources, not just vendor-reported scores. We cross-check results across independent evaluation platforms and verify that each benchmark was run under consistent conditions before a score enters our methodology.

These models will be added to the leaderboard once that bar is met.
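
The publication bar described above can be read as a simple eligibility check. The sketch below is illustrative only: the source names, the `BenchmarkRow` fields, the `is_rankable` helper, and the reading of "multiple" as at least two sources are all assumptions, not the site's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical accepted public sources; the page does not name them.
ACCEPTED_SOURCES = {"source_a", "source_b", "source_c"}


@dataclass(frozen=True)
class BenchmarkRow:
    benchmark: str               # e.g. "HLE"
    source: str                  # where the score was published
    vendor_reported: bool        # True if the score comes only from the vendor
    consistent_conditions: bool  # True if run conditions were verified


def is_rankable(rows, generally_available, min_sources=2):
    """Return True when a model meets the assumed publication bar."""
    if not generally_available:
        return False
    # Count distinct accepted, non-vendor sources with verified conditions.
    independent = {
        row.source
        for row in rows
        if row.source in ACCEPTED_SOURCES
        and not row.vendor_reported
        and row.consistent_conditions
    }
    return len(independent) >= min_sources


vendor_only = [BenchmarkRow("HLE", "vendor_blog", True, True)]
cross_checked = [
    BenchmarkRow("HLE", "source_a", False, True),
    BenchmarkRow("HLE", "source_b", False, True),
]
print(is_rankable(vendor_only, generally_available=True))
print(is_rankable(cross_checked, generally_available=True))
```

Under these assumptions, a model with only vendor-reported scores, or one that is not generally available, stays on the watchlist regardless of how strong its reported numbers are.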

Claude Mythos Preview, Gemini 3.5 Pro, +4 more under review

Claude Mythos Preview: release anticipated, limited access only. It is not generally available to the public, so it is excluded from the published roster.

Gemini 3.5 Pro: frontier watchlist item only. An official public release and comparable benchmark rows have not yet been verified.

Awaiting benchmarks
Publication rules

Buyer-facing table

Ranked by quality score

TTFT and ARC-AGI-2 are withheld from this primary table until accepted coverage reaches 60% of the published roster. Available rows remain on model detail pages.
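
A minimal sketch of how such a coverage gate could work, assuming "reaches 60%" means coverage greater than or equal to 0.60 and that a missing value is represented as `None`. The function names are hypothetical. The example input is the GPQA Diamond column from the ten ranked rows on this page, used only as a stand-in; the real rule applies to the full published roster, which may be larger.

```python
def coverage(values):
    """Fraction of roster entries that have an accepted value (None = missing)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def show_in_primary_table(values, threshold=0.60):
    """Assumed rule: a metric is shown once coverage reaches the threshold."""
    return coverage(values) >= threshold


# GPQA Diamond values for ranks 01-10 as listed on this page; None marks "n/a".
gpqa_diamond = [None, 92.4, 93.6, 94.3, None, 89.9, 90.1, 90.1, None, None]

print(coverage(gpqa_diamond))               # 0.6
print(show_in_primary_table(gpqa_diamond))  # True
```

With six of ten rows populated, this column sits exactly at the assumed threshold; one more missing value would drop it below.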

Rank | Score | Model | Vendor | Country | GPQA Diamond | HLE | Conversation | Tokens/chat | Context
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
01 | 100.0 | Claude Opus 4.8 | Anthropic | United States | n/a | 49.8% | ~392 chats | 3K | 1M
02 | 69.2 | Qwen3.7 Max | Qwen | China | 92.4% | 41.4% | Free tier | 3K | 1M
03 | 66.1 | GPT-5.5 | OpenAI | United States | 93.6% | 41.4% | ~333 chats | 3K | 1M
04 | 65.8 | Gemini 3.1 Pro | Google | United States | 94.3% | 44.4% | API pricing not published | 3K | 1M
05 | 58.3 | Gemini 3.5 Flash | Google | United States | n/a | 40.2% | ~278 chats | 3K | 1M
06 | 56.2 | Claude Sonnet 4.6 | Anthropic | United States | 89.9% | 33.2% | ~654 chats | 3K | 1M
07 | 44.6 |  |  |  | 90.1% | 33.5% | Free tier | 3K | 1M
08 | 35.4 | Grok 4.3 | xAI | United States | 90.1% | 35.0% | ~6,667 chats | 3K | 1M
09 | 31.3 | MiniMax M2.7 | MiniMax | China | n/a | 28.1% | Free tier | 3K | 204.8K
10 | 17.2 | Kimi K2.6 | Moonshot AI | China | n/a | 18.2% | ~333 chats | 3K | 262.1K

Editorial investigation

The Reasoning Gap: How Claude Opus 4.8 Redefined the Quality Ceiling

An investigation into the emergent structural behaviors of the latest frontier update and its impact on the latent quality scores.

Explore full analysis
"Quality is what still makes sense after the excitement fades."
PickAIModel Editorial

Quality note
