PickAIModel Weekly Brief - Issue 002
Anthropic's Opus 4.7 arrived after the April 7 snapshot closed, GPT-5.4 slipped in quality, and Qwen 3.6 Plus is the week's standout free value play.
Share this issue
Send the full newsletter link to colleagues, clients, or communities that follow AI buying decisions.
Issue 002 — April 2026. Independent weekly rankings of the top 10 AI models by quality and value.
This week, the most important thing that happened wasn't on any leaderboard yet. Anthropic shipped Claude Opus 4.7 on April 16 — nine days after the current April 7 snapshot closed — and it won't appear in the official rankings until benchmark coverage catches up. Meanwhile, the current table has a story of its own: GPT-5.4 dropped 24 points in quality, Qwen 3.6 Plus Preview is sitting at the top of the value chart for free, and the enterprise world just got a reminder that most workers still aren't using any of this.
This Week's Rankings Snapshot
Source: PickAIModel.com leaderboard, snapshot dated April 7, 2026.
Quality leader: Gemini 3.1 Pro (80.7) — Google's multimodal flagship holds the top quality position, scoring 44.4% on HLE and a strong 77.1% on Arc-AGI-2, with a 1M-token context and the fastest first-token time of the top tier at 0.28 seconds.
Value leader: Qwen 3.6 Plus Preview (87.6) — Alibaba's free-during-preview model leads the value table by a clear 7-point margin, combining frontier-capable HLE performance (46.3%, highest in the top 10) with zero API cost for now; the only catch is that your prompts are used for model improvement during the preview period.
The surprise: GPT-5.4 dropped 24.1 points in quality this snapshot, the largest single-period fall in the current table, despite being OpenAI's most recent flagship. It still holds #4 in quality (68.1), but the trend line is notable — it's the steepest decline of any model currently ranked.
The Stories That Matter
Anthropic Drops Opus 4.7 — Already Under Review Here
Anthropic released Claude Opus 4.7 on April 16, nine days after the current PickAIModel snapshot closed. Anthropic says Opus 4.7 is a notable step up from Opus 4.6 in advanced software engineering, ships with better high-resolution vision, introduces a new `xhigh` effort level, and keeps pricing at $5 per million input tokens and $25 per million output tokens. It is available across Claude, the API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry.
The more revealing detail is that Anthropic is positioning Opus 4.7 as a public bridge model rather than the absolute ceiling. In the same announcement, the company says Opus 4.7 is less broadly capable than Claude Mythos Preview, the restricted model it is keeping limited after announcing Project Glasswing. Buyers who care most about raw ceiling capability should read 4.7 as the best generally available Claude today, not as the end of Anthropic's current capability curve.
Why it matters: If the gains Anthropic is describing hold up under PickAIModel's methodology, Claude has a realistic path back to the top of the quality table. Until then, this is a launch worth watching rather than a rank we can publish early.
GPT-5.4's Quality Score Fell Off a Cliff
OpenAI launched GPT-5.4 on March 5 with native computer-use support, up to 1M tokens of context, and a claim that its individual factual assertions are 33% less likely to be false than GPT-5.2. So a 24-point quality drop on PickAIModel's scoring in this snapshot is worth paying attention to. The model still sits at #4 in quality (68.1) and #3 in value (77.2), so it hasn't fallen off the table — but the gap between launch-benchmark strength and a buyer-facing composite score is exactly the sort of divergence this site exists to surface. That doesn't prove OpenAI's launch claims are wrong. It does suggest that strong vendor evals, long context, and computer-use wins do not automatically translate into a stronger buying proposition once price and broader benchmark coverage are taken together.
Why it matters: Buyers evaluating GPT-5.4 for subscriptions or enterprise contracts should test it on their actual workflows rather than trusting benchmark rankings. The quality trend is moving in the wrong direction here.
Qwen 3.6 Plus Is Free, Frontier-Level, and Sitting at #1 for Value
Alibaba formally released Qwen 3.6 Plus on April 2, after the preview model appeared on OpenRouter on March 30. It leads the PickAIModel value table at 87.6 — 7 points clear of second place — and sits #6 in quality at 59.7, up 20.3 points this snapshot. Alibaba says the model ships with a 1-million-token context window by default, while OpenRouter is offering the preview version free.
In plain English, a context window is the amount of material a model can keep in working memory during one request: your prompt, prior chat turns, documents, code, and attached inputs all count against it. A larger context window means you can hand the model a longer report, a bigger codebase slice, or a messier conversation without chopping everything into pieces first. It does not guarantee better answers, but it does make long-document and multi-step work much more practical.
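As a rough illustration of that budgeting, the sketch below uses a common back-of-envelope heuristic of about four characters of English prose per token. The ratio and the function names are our own illustration, not any vendor's tokenizer; real counts vary by model and by content type.

```python
# Rough context-budget check. The ~4 chars/token ratio is a common
# English-prose heuristic, NOT an official tokenizer; code and
# non-English text tokenize quite differently.

CHARS_PER_TOKEN = 4  # assumed average for English prose

def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_window(prompt: str, documents: list[str],
                   window_tokens: int = 1_000_000,
                   reserve_for_output: int = 8_000) -> bool:
    """Check whether prompt + attached documents leave room for a reply.

    Everything in the request counts against the window, so we also
    reserve headroom for the model's own output tokens.
    """
    used = estimate_tokens(prompt) + sum(estimate_tokens(d) for d in documents)
    return used + reserve_for_output <= window_tokens

# A 2-million-character report is roughly 500k tokens: comfortably
# inside a 1M-token window, far outside a 128k one.
report = "x" * 2_000_000
print(fits_in_window("Summarise this report.", [report]))                         # True
print(fits_in_window("Summarise this report.", [report], window_tokens=128_000))  # False
```

The point of the reserve parameter is the part people forget: the model's reply consumes the same window, so a document that "just fits" usually doesn't.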
The catch is explicit. OpenRouter notes that prompt and completion data may be used to improve the model, which makes this a poor place for confidential material. For non-sensitive development, benchmarking, and agentic coding workflows, though, the evaluation barrier is basically gone. When the value leader is also free to try, a lot of buyers should be rerunning their shortlists.
Why it matters: If you run agentic coding workflows or long-document processing and you haven't tested Qwen 3.6 Plus yet, you're probably overpaying. The free access removes the usual evaluation barrier.
Enterprise AI Adoption Is Failing — But Consumers Are Ahead of It
Two data points from this month sit in uncomfortable tension. WalkMe's State of Digital Adoption 2026 says 54% of workers bypassed their company's AI tools in the past 30 days and completed the work manually instead; another 33% had not used AI at all. Meanwhile, Stanford HAI's 2026 AI Index says studies are finding productivity gains of 26% in software development and 14% to 15% in customer support where AI is actually used well. The gap isn't capability so much as trust, workflow fit, and whether the tool chosen maps cleanly onto the job. In other words, the bottleneck is no longer whether models can help. It's whether companies picked tools workers will actually open.
Why it matters: For individuals deciding which AI subscription to pay for, this data is genuinely useful: the people getting value from AI tools are the ones who chose and integrated them deliberately, not the ones whose companies mandated them. Your own judgment about fit matters more than your employer's vendor contract.
Breakthrough of the Week
Opus 4.7's Coding Jump Is Bigger Than the Headline Number
The SWE-bench Verified score for Claude Opus 4.7 — 87.6%, up from 80.8% in Opus 4.6 — matters less as a benchmark than as a practical threshold. SWE-bench tests a model's ability to resolve real GitHub issues from open-source repositories. At 80%, models can handle many clean, well-scoped tasks. At 87%, they start handling the messy ones: conflicting dependencies, undocumented edge cases, multi-file refactors. Early testers quoted by Amazon Web Services described Opus 4.7 catching logical faults during the planning phase rather than after execution — which changes the supervision model for developers. Less checking after the fact, more directing at the start. The new task budget system lets developers cap reasoning token spend per request, which matters for cost management on long-running agents. One concrete comparison: CursorBench hit 70% on Opus 4.7, up from 58% on Opus 4.6 — a 12-point jump on the benchmark that directly measures coding assistant performance.
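To make the budgeting point concrete, here is a small sketch of the cost ceiling a per-request token cap implies at the $5/$25-per-million pricing quoted above. The pricing constants come from the launch pricing in this issue; the function and the example cap values are our own illustration, not Anthropic's API.

```python
# Illustrative worst-case cost for a capped request. Pricing constants
# are Opus 4.7's quoted launch prices; the caps and the function itself
# are hypothetical, not any vendor's actual budget API.

INPUT_PRICE_PER_M = 5.00    # USD per million input tokens
OUTPUT_PRICE_PER_M = 25.00  # USD per million output tokens

def max_request_cost(input_cap: int, output_cap: int) -> float:
    """Worst-case USD cost of one request if both caps are fully consumed."""
    return (input_cap / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_cap / 1_000_000) * OUTPUT_PRICE_PER_M

# Capping an agent step at 50k input tokens and 20k output tokens
# (assuming reasoning tokens bill as output) bounds each step at:
print(f"${max_request_cost(50_000, 20_000):.2f} per request")  # $0.75
```

For a long-running agent taking hundreds of steps, that per-step bound is the difference between a predictable bill and an open-ended one.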
The Number That Changes the Picture
80%
Roughly eight in ten enterprise workers are either actively avoiding or simply not using the AI tools their employers are paying for — based on WalkMe's survey of 3,750 executives and employees across 14 countries, released on April 9. This number matters for buyers of AI subscriptions in a specific way: the models that people choose for themselves, and actually use, are often different from the ones their employers mandate. If you're choosing a personal subscription, the 80% figure is a reminder that tool fit beats brand name. The model you'll open every morning is worth more than the model your IT department approved.
Tool of the Week
Qwen 3.6 Plus Preview on OpenRouter
Free frontier-level AI for coding, long-document work, and agent pipelines
Best for: Developers and technical users who want to test a frontier-capable agentic coding model without spending anything during the evaluation period.
Why this week: With Qwen 3.6 Plus sitting at #1 for value on PickAIModel's leaderboard and available free via OpenRouter right now, this is one of the rare weeks where the value leader and the free tier are the same model. Access it via `qwen/qwen3.6-plus-preview:free` on OpenRouter. Keep confidential data out of it during the preview period — prompts are collected for model improvement.
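If you want to wire that slug into a script, OpenRouter exposes an OpenAI-compatible chat completions endpoint. The sketch below builds the request for the slug quoted above; treat it as a minimal illustration (the endpoint shape is OpenRouter's standard one, but verify current details in their docs), and keep the preview's data-collection caveat in mind before sending anything sensitive.

```python
# Sketch of an OpenRouter chat request for the free Qwen preview.
# Set OPENROUTER_API_KEY in your environment before actually sending,
# and remember: preview prompts may be retained for model improvement.
import os
import json

MODEL = "qwen/qwen3.6-plus-preview:free"
ENDPOINT = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str) -> tuple[dict, dict]:
    """Return (headers, payload) for a single chat completion call."""
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload

headers, payload = build_request("Refactor this function to remove the nested loops.")
print(json.dumps(payload, indent=2))

# To actually send it (requires the `requests` package and an API key):
#   import requests
#   r = requests.post(ENDPOINT, headers=headers, json=payload, timeout=60)
#   print(r.json()["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, existing OpenAI-client code generally only needs the base URL and model slug swapped to run an evaluation pass against this model.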
OpenRouter — Qwen 3.6 Plus Preview | Non-affiliate link
The Week in Accidental Honesty
Anthropic launched its new flagship model and described it as "our most powerful generally available model" — then, in the same release notes, mentioned that it's deliberately less capable than a different model Anthropic has been quietly giving to a small number of selected companies for the past month. The "most powerful generally available" framing does a lot of work in that sentence. "Most powerful of the ones you can actually get" is a different claim than "most powerful", but the press coverage mostly ran with the shorter version.
Next Week Preview
- Claude Opus 4.7 benchmark coverage: we're awaiting comparable scores from accepted public sources before ranking it — watch for the leaderboard update at pickaimodel.com.
- Whether GPT-5.4's quality trend continues to fall or stabilises in the next snapshot.
- OpenAI's freshly closed $122 billion funding round — what that capital deployment may signal for model release cadence in Q2.
If someone forwarded this to you, subscribe at pickaimodel.com/newsletter. Have a question for a future issue? Reply to this email — we read everything.
Disclosures
Editorial independence. PickAIModel.com produces independent editorial content. Model rankings, quality scores, and value scores are determined by our published methodology and are not influenced by commercial relationships with any AI vendor. No company can pay for ranking position, score changes, inclusion in rankings, or favourable treatment in our methodology outputs.
AI-assisted content disclosure. Portions of the editorial summaries and commentary in this newsletter may be drafted with the assistance of AI language models and reviewed by the PickAIModel editorial team. Benchmark scores, pricing fields stored in PickAIModel's own system, and ranking data are not invented by the model and are not editorially altered to favour advertisers or affiliates.
Affiliate disclosure. This newsletter may contain affiliate links. If you click a qualifying link and make a purchase, PickAIModel.com may earn a commission at no additional cost to you. Affiliate relationships do not influence our rankings, scores, or methodology outputs. The Tool of the Week link this issue is a non-affiliate link.
Not financial or legal advice. Nothing in this newsletter constitutes financial, investment, or legal advice. References to funding rounds, valuations, IPO timelines, pricing, contracts, or company strategy are provided for informational purposes only. Make decisions based on your own judgment and, where appropriate, qualified professional advice.
Accuracy and currency. AI model pricing, capabilities, availability, and company claims change frequently. While we aim to be accurate at the time of publication, information may become outdated or be revised after publication. Verify critical purchasing details directly with the relevant vendor before acting.
To unsubscribe, use the unsubscribe link in your email footer. Copyright © 2026 PickAIModel.com. All rights reserved. NeuroSync AI Dynamics Pty Ltd.