Claude 4 Opus: Anthropic Goes All-In on Coding Agents

Every few releases, a model ships that genuinely changes how you work. Claude 4 Opus is one of those.

The benchmark story is good — best SWE-bench Verified score at launch, beats Gemini 2.5 Pro and GPT-4.5 on software engineering tasks. But the more interesting story is the product design: Anthropic is shipping Claude 4 Opus as the core of a coding agent, not just a completion model.

What's New

Alongside the model:

Claude 4 Opus — Anthropic's new flagship. Hybrid architecture (instant + extended thinking). Best coding model at launch.
Claude 4 Sonnet — Also ships today. Cheaper, faster, almost as capable for most tasks. The model to use by default.
Claude Code GA — The terminal coding agent moves from preview to general availability. Opus 4 is the engine.

The capabilities that matter:

Coding at a new level — SWE-bench Verified scores top the leaderboard at launch. On complex, real-world software engineering tasks, Opus 4 consistently outperforms Gemini 2.5 Pro and the o-series models.
Computer use — Controlling a browser or desktop as part of an agentic workflow. Not perfect, but functional enough to automate real tasks.
Extended thinking + tool use, combined — The model can reason, use a tool, observe the result, reason again. Long chains of tool calls with preserved context.
Better instruction following — Significantly reduced refusals on legitimate tasks. The model does what you ask.

How It Compares at Launch

Model	SWE-bench	Agentic Tool Use	Extended Thinking
Claude 4 Opus	#1	★★★★★	✅
Gemini 2.5 Pro	top 3	★★★★☆	✅
GPT-4.5	moderate	★★★☆☆	❌
Claude 3.7 Sonnet	top 3	★★★★☆	✅
Llama 4 Maverick	open-weight leader	★★★☆☆	✅

Best For

Agentic coding — If you're building with Claude Code or API-based coding agents, Opus 4 is the obvious choice
Multi-step workflows with tool use and computer control
Hard coding problems where extended thinking materially improves results
Software engineering research — the SWE-bench lead here is substantial

Use Sonnet 4 Instead When

You need 80% of the quality at 20% of the cost
Response latency matters (Sonnet is significantly faster)
Your task is relatively straightforward (summarization, Q&A, generation)

Verdict

Claude 4 Opus is the best coding model as of May 22, 2025, and the GA of Claude Code puts that capability in any developer's hands. The model-plus-agent combination is the most coherent story in the industry right now. Anthropic has been building toward this for 18 months; Opus 4 is where it pays off.

Part of our Model Watch series. Next: GPT-5 →