Greg Mousseau

Llama 4: Meta's Biggest Open-Weight Bet Yet

Llama 4 launched April 5, 2025 with three models — Scout, Maverick, and a still-training Behemoth — and a 10M token context window that made everyone do a double-take. The open-weight frontier just got serious.

Model Review · Frontier Models · AI Strategy · Meta · Open Source

The open-source AI narrative has always been: great for experiments, not quite frontier-quality. Llama 4 is the clearest challenge to that framing yet.

Meta shipped three models on April 5, 2025. Two are available as open weights. One (Behemoth, ~2T parameters) is still training and is billed as the biggest model Meta has ever built.

The Three Models

Llama 4 Scout (109B MoE, 17B active) — The one that will run on real hardware. Efficient, fast, capable. The context window is the headline: 10 million tokens. Not a typo. You could fit thousands of documents, entire codebases with history, or hours of transcripts.
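To make that concrete, here is a rough sketch of what "fit an entire codebase" looks like in practice: walk a repository, concatenate files, and stop near the advertised budget. The 4-characters-per-token ratio and the `pack_repo` helper are illustrative assumptions, not Meta tooling; swap in the model's real tokenizer for accurate counts.

```python
# Rough sketch: pack a repository's source files into one Scout-sized prompt.
# The ~4 chars/token ratio is a crude heuristic, not Llama 4's tokenizer.
from pathlib import Path

TOKEN_BUDGET = 10_000_000   # Scout's advertised context window
CHARS_PER_TOKEN = 4         # rough heuristic for English text and code

def pack_repo(root: str, budget: int = TOKEN_BUDGET) -> str:
    """Concatenate source files until the estimated token budget is reached."""
    parts, used = [], 0
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(errors="ignore")
        est_tokens = len(text) // CHARS_PER_TOKEN + 1
        if used + est_tokens > budget:
            break
        parts.append(f"### FILE: {path}\n{text}")
        used += est_tokens
    return "\n\n".join(parts)

prompt = pack_repo("my_project") + "\n\nSummarize the architecture of this codebase."
```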

Llama 4 Maverick (400B MoE, ~17B active) — Competitive with GPT-4o and Gemini 2.0 Flash on general benchmarks. Best-in-class among open-weight models, and self-hostable if you have enough GPU capacity.

Llama 4 Behemoth (still training, ~2T params) — Preview benchmarks suggest it will compete with Claude 4 and Gemini 2.5 Pro on reasoning. TBD.
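Because the weights are open, using either released model typically means pointing a client at your own inference server. Here is a minimal sketch assuming a vLLM-style server exposing an OpenAI-compatible endpoint on localhost:8000; the port and the Hugging Face model id are assumptions, so check your deployment and the official model card before relying on them.

```python
# Sketch: query a self-hosted Llama 4 server over an OpenAI-compatible API.
# Endpoint, port, and model id are assumptions -- adjust to your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed HF model id
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize the trade-offs of MoE inference."},
    ],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```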

What's Genuinely New

  • Native multimodal from day one — Text + images, natively trained together. Not bolted on.
  • 10M context on Scout — The most practical ultra-long-context model shipped to date. Meta says Scout runs on a single H100-class GPU with quantization, so the window is usable in practice, not just a spec-sheet number.
  • MoE efficiency — Both Scout and Maverick use mixture-of-experts, activating only a fraction of their parameters per token (see the toy routing sketch after this list). High capability, lower inference cost.
  • Open weights, commercial OK — Standard Meta community license: free for most commercial use, with carve-outs for very large platforms (roughly 700M+ monthly active users) and restrictions on training competing foundation models.
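To see why MoE keeps inference cheap, here is a toy top-k routing layer, not Meta's architecture or code: a small router scores each token against a pool of experts, and only the top-k expert networks (one of sixteen in this sketch, mirroring Scout's reported expert count) actually run for that token.

```python
# Toy top-k mixture-of-experts layer: each token is routed to only k of E
# experts, so only a fraction of the layer's parameters do work per token.
# Illustrative only -- not Meta's actual Llama 4 architecture or code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=16, k=1):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.k = k

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)     # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
print(ToyMoE()(tokens).shape)   # torch.Size([8, 64]); only 1 of 16 experts ran per token
```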

How It Compares at Launch

| Model | Open Weight | Context | Best Use |
|---|---|---|---|
| Llama 4 Maverick | Yes | 1M | General, self-hosted |
| Llama 4 Scout | Yes | 10M | Long-context, efficient |
| DeepSeek R1 | Yes | 128K | Reasoning, open-weight |
| Gemini 2.5 Pro | No | 1M | Best closed model currently |
| Claude 3.7 Sonnet | No | 200K | Best coding, closed |

Best For

  • Teams that want to self-host and fine-tune on their own data
  • Extremely long-context tasks — Scout's 10M window is unmatched among open-weight models
  • Companies that can't send data to third-party APIs
  • Fine-tuning for domain-specific tasks (legal, medical, finance) — see the LoRA sketch after this list
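For the fine-tuning case, here is a minimal LoRA sketch using Hugging Face transformers and peft. The model id, model class, dataset file, and hyperparameters are assumptions; Llama 4 checkpoints are gated on Hugging Face, and Maverick-scale runs need multi-GPU sharding, so treat this as the shape of the workflow rather than a recipe.

```python
# Minimal LoRA fine-tuning sketch with transformers + peft.
# Model id, dataset, and hyperparameters are placeholders -- check the Llama 4
# model card for the exact id and model class before running.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"   # assumed HF model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16,
                                             device_map="auto")

# Attach small low-rank adapters instead of updating all base parameters.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

ds = load_dataset("json", data_files="domain_corpus.jsonl")["train"]   # your data
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
            remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments("llama4-lora-out", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=2e-4, bf16=True, logging_steps=10),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```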

Not Yet For

  • Coding agents — Claude Code and Gemini 2.5 Pro still lead
  • Teams that need vendor SLAs — run it yourself or use a hosting provider
  • Behemoth's intended use cases — it's not out yet

Verdict

Llama 4 is the most serious open-weight release since DeepSeek R1. The 10M context window on Scout is genuinely unprecedented and useful. Maverick is competitive enough to replace GPT-4o for many production workloads — at a fraction of the inference cost if you self-host. The Behemoth previews suggest Meta is serious about competing with closed-source frontier models, not just the open-weight tier.

Part of our Model Watch series. Next: Claude 4 Opus →