Greg Mousseau

Claude 3.7 Sonnet: Anthropic's Extended Thinking Arrives

Claude 3.7 Sonnet dropped February 24, 2025 with extended thinking — Anthropic's answer to o1 and DeepSeek R1. It immediately set a new bar for coding benchmarks and introduced a dual-mode architecture that would define the Claude 4 generation.

Model Review · Frontier Models · AI Strategy · Anthropic

Two things about Claude 3.7 Sonnet matter: the model itself, and what it signals about where Anthropic is headed.

The model: a new state-of-the-art for coding benchmarks at launch. The signal: Anthropic's hybrid architecture — instant responses for simple tasks, extended chain-of-thought for hard ones — is now shipping to users, not just being previewed at research conferences.

What's New

  • Extended thinking — 3.7 is the first Claude with a visible, user-accessible chain of thought. Budget controls let you cap how much compute it can burn thinking before it answers (sketched in the example after this list).
  • Best coding model at launch — Leads SWE-bench Verified, arguably the most realistic benchmark of real-world coding ability, and matches or beats GPT-4o and Gemini 2.0 Pro on software engineering tasks.
  • 200K context window — Same as 3.5, but the model makes better use of long contexts due to improved retrieval and reasoning over them.
  • Claude Code (preview) — Anthropic's coding agent launches alongside 3.7. It can autonomously work through multi-step coding tasks with file access, tests, and iteration.
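
To make those budget controls concrete, here is a minimal sketch of an extended-thinking call through the Anthropic Python SDK. The model ID is the 3.7 release identifier; the prompt, token limits, and the 8,000-token thinking budget are placeholder values, not recommendations.

```python
import anthropic

# Assumes ANTHROPIC_API_KEY is set in the environment.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=16000,  # must exceed the thinking budget
    # Extended thinking: cap how many tokens the model may spend reasoning
    # before it starts writing the visible answer.
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[
        {"role": "user", "content": "Review this function for concurrency bugs: ..."}
    ],
)

# The response interleaves thinking blocks with the final text blocks.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200])
    elif block.type == "text":
        print(block.text)
```

Thinking tokens are billed as output tokens, which is why the budget knob matters as much for cost as for latency.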

How It Compares at Launch

Model               SWE-bench Verified   GPQA Diamond   Notes
Claude 3.7 Sonnet   ~70%                 ~68%           Best coding model at release
Grok 3 Thinking     n/a                  top tier       Math-first; coding is secondary
GPT-4o              ~38%                 ~53%           Still strong for general tasks
DeepSeek R1         moderate             ~65%           Open-weight reasoning model
Gemini 2.0 Pro      moderate             ~60%           Pre-2.5, still solid

SWE-bench figures are approximate; independent evaluations vary by scaffold and harness.

Best For

  • Software engineers using Claude Code or API-based coding agents
  • Any task where "think before you answer" produces better results — legal analysis, code review, complex debugging, multi-step planning
  • Teams already on Anthropic's API who want an instant upgrade

Not Yet For

  • Real-time web access — Claude 3.7 doesn't have live search; Grok 3's DeepSearch is better here
  • Math competition benchmarks — Grok 3 Thinking still leads on AIME as of this writing
  • Budget-sensitive deployments — extended thinking burns more output tokens; disable it for simple tasks
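
On that last point, the cheapest fix is to make thinking opt-in per request rather than a global default. A small routing sketch under the same SDK assumptions as above; the `ask` helper, its `hard` flag, and the 4,000-token budget are illustrative choices, not part of Anthropic's API.

```python
import anthropic

client = anthropic.Anthropic()

def ask(prompt: str, hard: bool = False) -> str:
    """Send simple prompts in instant mode; reserve extended thinking for hard ones."""
    kwargs = {}
    if hard:
        # Thinking tokens are billed as output tokens, so only pay for them when needed.
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": 4000}
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=8000,
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )
    # Keep only the final answer text, dropping any thinking blocks.
    return "".join(block.text for block in response.content if block.type == "text")
```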

Verdict

Claude 3.7 Sonnet is the best coding model available right now, and the extended thinking mode is genuinely useful — not just a benchmarking trick. The dual-mode architecture (instant + thinking) is the right design for production: you don't want a model that stops to think before every simple autocomplete. The bigger story is Claude Code, which is quietly becoming the most capable AI coding agent yet shipped to developers.

Part of our Model Watch series. Next: GPT-4.5 →