Claude Sonnet 4.6 Benchmarks: Beating Opus at Half the Price
On February 17, 2026, Anthropic released Claude Sonnet 4.6 and quietly disrupted its own product line. Sonnet 4.6 beats Claude Opus 4.6 on the GDPval-AA benchmark for real-world office tasks, matches it within 0.2 points on computer use, and delivers 97–99% of its coding capability, all at $3 per million input tokens versus Opus's $5. For most developers and businesses, the most expensive Claude model is no longer the best choice.
This is not marginal. Sonnet 4.6 jumped from 13.6% to 58.3% on ARC-AGI-2 (novel problem-solving). It leads all models on GDPval-AA. It scores 79.6% on SWE-bench Verified, within striking distance of Opus's 80.8%. And it is now the default model for free and Pro tier users. Here is the full breakdown.
Complete Benchmark Comparison
Every number below comes from Anthropic's published benchmarks, verified by independent outlets including ZDNet, CNET, Forbes, eWeek, and Tom's Guide. This table covers every major evaluation.
| Metric | Sonnet 4.6 | Opus 4.6 | GPT-5.2 | What It Measures |
|---|---|---|---|---|
| GDPval-AA (Elo) | 1633 | 1606 | 1462 | 220 professional tasks across 44 occupations |
| SWE-bench Verified | 79.6% | 80.8% | 80.0% | Real GitHub issue resolution |
| OSWorld-Verified | 72.5% | 72.7% | 38.2% | Operating real software autonomously |
| Finance Agent v1.1 | 63.3% | 60.1% | 59.0% | Agentic financial analysis |
| ARC-AGI-2 | 58.3% | 68.8% | — | Novel problem-solving, IQ-test style |
| Context Window | 1M tokens (beta) | 1M tokens (beta) | — | Maximum input capacity |
| Input Price | $3/M | $5/M | $1.75/M | Cost per million tokens |
| Output Price | $15/M | $25/M | $14/M | Cost per million tokens |
Three things stand out immediately:
GDPval-AA is the most important benchmark here. It evaluates 220 real professional tasks across 44 occupations — finance, legal, healthcare, engineering, and more. These are not synthetic puzzles. Sonnet 4.6's Elo 1633 makes it the #1 model on this leaderboard, beating both Opus and GPT-5.2 by substantial margins. For organizations deploying AI for knowledge work, this is the benchmark that matters most.
OSWorld near-parity is remarkable. Sonnet 4.6 scores 72.5% versus Opus's 72.7% on OSWorld-Verified, which tests AI's ability to navigate spreadsheets, fill multi-step web forms, and interact with real operating systems. Both models have reached human-baseline performance — the approximate 72% threshold where AI matches average human accuracy on routine digital tasks. For context, GPT-5.2 scored 38.2% on the same test, and Sonnet 4.5 scored 61.4%.
ARC-AGI-2 is where Opus still wins convincingly. Opus 4.6 scores 68.8% versus Sonnet 4.6's 58.3% on novel abstract reasoning. This benchmark tests genuine generalization — solving problems the model has never seen in any form. The 10.5-point gap indicates Opus still has a meaningful edge in deep, creative reasoning. But Sonnet 4.6's leap from Sonnet 4.5's 13.6% to 58.3% (a 328% improvement) signals rapid convergence.
The Pricing Equation
Claude Sonnet 4.6 costs $3 per million input tokens and $15 per million output tokens. Claude Opus 4.6 costs $5 per million input tokens and $25 per million output tokens.
The practical impact depends on workload:
For 1 million tokens of input + 100K tokens of output:
- Sonnet 4.6: $3.00 + $1.50 = $4.50
- Opus 4.6: $5.00 + $2.50 = $7.50
That is 40% cheaper per request. At enterprise scale — millions of API calls per month — the savings compound into six figures annually. And for most tasks, Sonnet 4.6's slightly lower capability is invisible in practice.
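To make the arithmetic concrete, here is a minimal sketch of the per-request cost calculation, using the published per-million rates from the table above; the token counts mirror the worked example and are otherwise illustrative.

```python
# Published per-million-token prices in USD (from the benchmark table above).
PRICES = {
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.6": {"input": 5.00, "output": 25.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the published per-million rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The worked example above: 1M input tokens + 100K output tokens.
sonnet = request_cost("sonnet-4.6", 1_000_000, 100_000)  # 4.50
opus = request_cost("opus-4.6", 1_000_000, 100_000)      # 7.50
print(f"Sonnet: ${sonnet:.2f}  Opus: ${opus:.2f}  savings: {1 - sonnet/opus:.0%}")
# Sonnet: $4.50  Opus: $7.50  savings: 40%
```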
Sonnet 4.6 is also available for free. It is now the default model for free-tier and Claude Pro users on Anthropic's platform. API access runs through Claude API, Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry.
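For API users, a minimal request sketch with the Anthropic Python SDK is shown below. The model ID string `claude-sonnet-4-6` is an assumption for illustration; check Anthropic's model documentation for the exact identifier.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# NOTE: the model ID is assumed for illustration; confirm the exact
# string in Anthropic's model list before use.
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize the attached earnings call."}],
)
print(message.content[0].text)
```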
Developer Preference Data
Anthropic released internal user preference data that reinforces the benchmarks:
- 70% of developers preferred Sonnet 4.6 over Sonnet 4.5 for coding tasks
- 59% of developers preferred Sonnet 4.6 over Claude Opus 4.5, the previous flagship
- Key reasons cited: reduced overengineering, less "laziness" (refusing to complete tasks), and better instruction following
The "reduced overengineering" point matters more than it sounds. Opus-class models sometimes produce unnecessarily complex solutions — adding abstraction layers, creating interfaces that are not needed, or refactoring code that was not asked to be refactored. Sonnet 4.6 follows instructions more literally, which most production use cases prefer.
Technical Architecture: What Changed
Sonnet 4.6 features a 1 million token context window in beta — the same capacity as Opus 4.6. This allows processing entire codebases, lengthy legal documents, or dozens of research papers in a single request. Previous Sonnet models had significantly smaller context windows.
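Because the 1M window is a beta feature, long-context requests typically have to opt in explicitly. The sketch below assumes the SDK's `extra_headers` mechanism with a beta flag; both the flag value and the model ID are assumptions, so confirm the current strings in the API documentation.

```python
import anthropic

client = anthropic.Anthropic()

# Hypothetical file: an entire codebase concatenated into one text blob.
with open("codebase_dump.txt") as f:
    codebase = f.read()

# ASSUMPTION: the beta flag and model ID below are illustrative; the
# exact values that enable the 1M-token window are in the API docs.
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    extra_headers={"anthropic-beta": "context-1m-2025-08-07"},
    messages=[{"role": "user", "content": f"{codebase}\n\nLocate the authentication bug."}],
)
print(message.content[0].text)
```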
Key architectural upgrades include improvements in:
- Coding: 79.6% SWE-bench Verified, near-parity with the best models
- Computer use: Human-baseline performance on real software interfaces
- Long-context reasoning: Processing vast inputs without degradation
- Agent planning: Multi-step task decomposition and execution
- Knowledge work: Real-world office task performance (GDPval-AA)
- Design: UI/UX generation and visual reasoning
The model's 58.3% on ARC-AGI-2 (up from 13.6% on Sonnet 4.5) points to fundamental improvements in abstract reasoning rather than benchmark optimization. A more-than-fourfold gain on a genuine generalization test suggests architectural changes, not just scaling.
When to Use Sonnet vs Opus
The decision framework is straightforward:
Use Sonnet 4.6 when:
- Building production applications where cost efficiency matters
- Deploying agentic systems for office work, financial analysis, or coding
- Requiring consistent instruction following over creative reasoning
- Processing high volumes of API requests
- Needing computer use capabilities (nearly identical to Opus)
Use Opus 4.6 when:
- Solving novel, never-before-seen problems (ARC-AGI-2 gap)
- Performing agentic terminal coding (shell commands, system administration)
- Conducting multi-disciplinary research requiring deep creative reasoning
- Running agentic search tasks with complex query decomposition
- Treating budget as secondary to peak performance
For most organizations, the answer is Sonnet 4.6 for 80–90% of workloads, with Opus 4.6 reserved for the hardest 10–20%.
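One way to operationalize that split is a thin routing layer that defaults to Sonnet and escalates only for the task types where the benchmarks show a real gap. A minimal sketch follows; the task categories and model IDs are illustrative assumptions, not confirmed API strings.

```python
# Task types where the benchmarks above give Opus a meaningful edge.
OPUS_TASKS = {"novel_reasoning", "terminal_coding", "deep_research", "agentic_search"}

def pick_model(task_type: str) -> str:
    """Default to Sonnet; escalate to Opus only where the gap is real.

    Model IDs are illustrative assumptions, not confirmed API strings.
    """
    return "claude-opus-4-6" if task_type in OPUS_TASKS else "claude-sonnet-4-6"

assert pick_model("code_review") == "claude-sonnet-4-6"
assert pick_model("novel_reasoning") == "claude-opus-4-6"
```

In practice the routing signal can be anything from a static task label to a cheap classifier; the point is that the default flips from "largest model" to "cheapest model that clears the bar".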
Competitive Landscape
Sonnet 4.6 competes directly with GPT-5.2 (OpenAI), GPT-5.3 Codex (coding-focused), and Gemini 2.0 (Google).
Against GPT-5.2, Sonnet wins on GDPval-AA (1633 vs 1462 Elo), Finance Agent (63.3% vs 59.0%), and OSWorld (72.5% vs 38.2%). GPT-5.2 is cheaper at $1.75/$14 per million tokens but scores dramatically lower on computer use.
Against GPT-5.3 Codex, the comparison is apples-to-oranges. Codex targets coding-specific workflows with 77.3% on Terminal-Bench 2.0. Sonnet 4.6 is a general-purpose model that happens to code very well.
Gemini 2.0 pushes multimodal agents with strong vision capabilities but lags on text-based reasoning benchmarks. No direct Elo comparison exists yet on GDPval-AA.
xAI's Grok 4.2 focuses on multi-agent debate for reasoning accuracy. It excels in math (ranked #1 on LMArena) but is not a coding-focused model.
What This Means for the AI Industry
Sonnet 4.6 demonstrates a trend that will reshape AI pricing: mid-tier models are closing the gap to flagship models faster than flagship models can advance. When the second-best model matches 97–99% of the best model's capability at 40% lower cost, the economic argument for the flagship becomes difficult to sustain.
This has implications for:
Anthropic's own business: Opus 4.6 exists to compete at the absolute frontier, but most revenue will flow through Sonnet usage. Anthropic may need to differentiate Opus more aggressively — perhaps through exclusive features, not just raw capability.
Enterprise AI budgets: The era of defaulting to the largest model is ending. Teams that switch from Opus to Sonnet save 40% with negligible quality loss for most applications. Budget-conscious organizations can deploy more agents, process more data, and experiment more freely.
Competitors: OpenAI and Google face a pricing war. Anthropic demonstrated that you can offer near-flagship performance at mid-tier prices. Expect matching price cuts or capability improvements across the board.
Frequently Asked Questions
What is Claude Sonnet 4.6?
Claude Sonnet 4.6 is Anthropic's latest mid-tier model, released February 17, 2026. It beats Claude Opus 4.6 on GDPval-AA (Elo 1633 vs 1606) and offers 97–99% of Opus's coding capability at $3 per million input tokens.
How does Sonnet 4.6 compare to Opus 4.6 on benchmarks?
Sonnet leads on GDPval-AA and Finance Agent. Opus leads on ARC-AGI-2 and agentic terminal coding. They are virtually tied on OSWorld (72.5% vs 72.7%) and SWE-bench Verified (79.6% vs 80.8%).
What does Claude Sonnet 4.6 cost?
$3 per million input tokens and $15 per million output tokens. Opus 4.6 costs $5/$25. Free-tier users get Sonnet 4.6 as the default model.
Is Claude Sonnet 4.6 free?
Yes, it is the default model for free and Pro plan users on Anthropic's platform. API access is billed per token.
What is the context window?
1 million tokens in beta — the same as Opus 4.6. This supports processing entire codebases or document collections in a single request.
Where is Sonnet 4.6 available?
Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and directly through claude.ai for free and paid users.
Sonnet 4.6 is the most important Claude release of 2026, not because it is the most capable, but because it proves that near-flagship performance at mid-tier pricing is achievable. Watch for an Opus 4.7 counter-release, for moves toward cross-provider benchmark standardization, and for whether the 40% price gap forces GPT and Gemini to respond.
