← Back to blog
·Greg Mousseau

Gemini 3 Pro: The Google Model That Triggered OpenAI's Code Red

Gemini 3 Pro dropped in November 2025 and did something the previous Gemini generations hadn't quite managed: it made people switch. OpenAI reportedly declared a code red as ChatGPT users moved to Gemini. Here's why.

Model ReviewFrontier ModelsAI StrategyGoogle

There's a useful distinction between a model that leads benchmarks and a model that changes user behavior. Gemini 3 Pro does both.

November 2025: Google releases Gemini 3 and Gemini 3 Pro. Within days, reporting surfaces that OpenAI is in "code red" mode internally — Gemini is pulling ChatGPT users. Not just technically impressive people who track benchmarks. Regular users, switching.

That's a different kind of win.

What's New

  • ARC-AGI-2: ~38% — Roughly double the previous best score on the hardest novel reasoning benchmark available. ARC-AGI-2 is specifically designed to resist training-data contamination; it tests whether a model can generalize, not memorize. Gemini 3 Pro's score here represents a genuine capability jump.
  • Chatbot Arena: #1 — Tops the blind human preference leaderboard, displacing GPT-5. This is the metric that affects real-world adoption, and Google now leads it.
  • 1M token context, better utilized — Gemini 3 Pro doesn't just accept 1M token inputs; it reasons coherently over them. The retrieval and attention quality at the far end of the context window is meaningfully better than 2.5 Pro.
  • Best multimodal model at launch — Text, image, video, audio, code — all native, all improved. Particularly strong on video understanding.
  • Coding — Leads WebDev Arena and general coding benchmarks at release. Claude Code retains the agentic edge, but for raw code generation, Gemini 3 Pro is now competitive.

How It Compares at Launch

ModelARC-AGI-2Arena RankContext
Gemini 3 Pro~38%#11M
GPT-5~20%top 3128K
Claude 4 Opustop 3200K
Gemini 2.5 Pro~18%top 51M
Llama 4 Maverickopen-weight1M

Why People Are Switching

A few specific things drive the usage shift:

  1. Better at everyday tasks — Gemini 3 Pro is more helpful on the kinds of questions regular users ask: explanations, research, writing assistance. It's less preachy, more direct.
  2. Native in the Google ecosystem — Docs, Gmail, NotebookLM, Search. The integration story finally catches up to the model quality.
  3. Free tier is better — Gemini app gives free users access to 3 Pro with higher usage limits than ChatGPT's free tier.

Best For

  • Multimodal workflows — especially anything involving video
  • Long-context analysis (500K+ tokens, or full codebases)
  • Teams embedded in Google Workspace
  • ARC-AGI-2 style novel reasoning — creative problem-solving, new domain synthesis

Not For

  • Coding agents — Claude Code is still the best agentic coding stack
  • Teams deeply committed to OpenAI's API and ecosystem
  • Real-time web integration — Grok 3's DeepSearch is still the best live-search experience

Verdict

Gemini 3 Pro is the best general-purpose model available as of November 2025. The ARC-AGI-2 score is a real signal — not a benchmark-optimized metric — and the market share movement is real. Google has gone from "good for Google" to "best on the market" in about 8 months. The competitive picture is now three-way again at the frontier, and it's genuinely unclear who leads in three months.

Part of our Model Watch series. Next: Kimi K2.5 Thinking →