Llama 4: Meta's Biggest Open-Weight Bet Yet
Llama 4 launched April 5, 2025 with three models — Scout, Maverick, and a still-training Behemoth — and a 10M token context window that made everyone do a double-take. The open-weight frontier just got serious.
The open-source AI narrative has always been: great for experiments, not quite frontier-quality. Llama 4 is the clearest challenge to that framing yet.
Meta shipped three models on April 5, 2025. Two are available as open weights. One (Behemoth, ~2T parameters) is still training and promises to be the largest model Meta has ever built.
The Three Models
Llama 4 Scout (109B MoE, 17B active) — The one that will run on real hardware. Efficient, fast, capable. The context window is the headline: 10 million tokens. Not a typo. You could fit thousands of documents, entire codebases with history, or hours of transcripts.
Llama 4 Maverick (400B MoE, ~17B active) — Competitive with GPT-4o and Gemini 2.0 Flash on general benchmarks. Best-in-class among open-weight models, if you can self-host it with enough GPU memory.
Llama 4 Behemoth (still training, ~2T params) — Preview benchmarks suggest it will compete with Claude 4 and Gemini 2.5 Pro on reasoning. TBD.
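To put Scout's 10M-token window in perspective, here's a rough back-of-envelope. The ~4 characters per token figure is a common heuristic for English text and varies by tokenizer and content; the page and document sizes are illustrative assumptions, not anything Meta published:

```python
# Back-of-envelope: what fits in a 10M-token context window?
# Assumes ~4 characters per token (varies by tokenizer and content).
CHARS_PER_TOKEN = 4
CONTEXT_TOKENS = 10_000_000

total_chars = CONTEXT_TOKENS * CHARS_PER_TOKEN   # raw text volume
pages = total_chars // 3_000                     # assuming ~3,000 chars per printed page
docs = CONTEXT_TOKENS // 5_000                   # assuming ~5,000 tokens per mid-sized document

print(f"~{total_chars / 1e6:.0f} MB of raw text")   # ~40 MB
print(f"~{pages:,} printed pages")                  # ~13,333 pages
print(f"~{docs:,} mid-sized documents")             # ~2,000 documents
```

Even with generous error bars, that's "load the whole repo and its docs" territory, not "pick the three most relevant files."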
What's Genuinely New
- Native multimodal from day one — Text + images, natively trained together. Not bolted on.
- 10M context on Scout — The most practical ultra-long-context model shipped to date: it fits in memory on large cloud instances and is actually usable.
- MoE efficiency — Both Scout and Maverick use mixture-of-experts, activating only a fraction of parameters per token. High capability, lower inference cost.
- Open weights, commercial OK — Standard Meta license: free for most commercial use, restricted from training competing foundation models.
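The MoE efficiency claim rests on routing: a small learned router scores every expert per token but only runs the top few, so compute scales with active parameters rather than total parameters. Here's a minimal illustrative sketch of generic top-k MoE routing — not Meta's implementation; Llama 4's actual expert counts and routing details follow its published architecture:

```python
import numpy as np

def topk_moe_layer(x, expert_weights, router_weights, k=2):
    """Generic top-k mixture-of-experts layer (illustrative sketch).

    x:              (d,) token hidden state
    expert_weights: list of (d, d) matrices, one per expert
    router_weights: (num_experts, d) router projection

    Only k experts run per token, so compute scales with k,
    not with the total number of experts.
    """
    logits = router_weights @ x                  # one score per expert
    top = np.argsort(logits)[-k:]                # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                         # softmax over the selected experts only
    # Weighted sum of the chosen experts' outputs; the other experts stay idle.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((n_experts, d))
y = topk_moe_layer(x, experts, router, k=2)
print(y.shape)  # (8,)
```

This is why a 400B-parameter Maverick can have the inference cost profile of a ~17B dense model per token — though note that all 400B parameters still have to sit in memory.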
How It Compares at Launch
| Model | Open Weight | Context | Best Use |
|---|---|---|---|
| Llama 4 Maverick | ✅ | 1M | General, self-hosted |
| Llama 4 Scout | ✅ | 10M | Long-context, efficient |
| DeepSeek R1 | ✅ | 128K | Reasoning, open-weight |
| Gemini 2.5 Pro | ❌ | 1M | Best closed model currently |
| Claude 3.7 Sonnet | ❌ | 200K | Best coding, closed |
Best For
- Teams that want to self-host and fine-tune on their own data
- Extremely long-context tasks — Scout's 10M window is unmatched among open-weight models
- Companies that can't send data to third-party APIs
- Fine-tuning for domain-specific tasks (legal, medical, finance)
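If you're in the self-hosting camp, here's what "enough GPU" roughly means. This sketch counts weight memory only — it ignores KV cache, activations, and runtime overhead, so treat it as a lower bound. The parameter counts come from the models above; the rest is back-of-envelope:

```python
# Rough VRAM floor for self-hosting: weights only, no KV cache or overhead.
# With MoE, *all* parameters must be resident even though only ~17B are
# active per token — quantization is what makes these sizes tractable.
def weight_vram_gb(params_billion, bits):
    """Memory for model weights at a given precision, in GB."""
    return params_billion * 1e9 * bits / 8 / 1e9

for name, params in [("Scout", 109), ("Maverick", 400)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_vram_gb(params, bits):.1f} GB")
```

The takeaway: Scout at 4-bit lands in single-big-GPU territory, while Maverick realistically wants a multi-GPU node even when quantized.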
Not Yet For
- Coding agents — Claude Code and Gemini 2.5 Pro still lead
- Teams that need vendor SLAs — run it yourself or use a hosting provider
- Behemoth's intended use cases — it's not out yet
Verdict
Llama 4 is the most serious open-weight release since DeepSeek R1. The 10M context window on Scout is genuinely unprecedented and useful. Maverick is competitive enough to replace GPT-4o for many production workloads — at a fraction of the inference cost if you self-host. The Behemoth previews suggest Meta is serious about competing with closed-source frontier models, not just the open-weight tier.
Part of our Model Watch series. Next: Claude 4 Opus →
