Llama 4: Meta's Biggest Open-Weight Bet Yet
Llama 4 launched April 5, 2025 with three models — Scout, Maverick, and a still-training Behemoth — and a 10M token context window that made everyone do a double-take. The open-weight frontier just got serious.
The open-source AI narrative has always been: great for experiments, not quite frontier-quality. Llama 4 is the clearest challenge to that framing yet.
Meta shipped three models on April 5, 2025. Two are available as open weights. One (Behemoth, ~2T parameters) is still training and promises to be the largest model Meta has ever built.
The Three Models
Llama 4 Scout (109B MoE, 17B active) — The one that will run on real hardware. Efficient, fast, capable. The context window is the headline: 10 million tokens. Not a typo. You could fit thousands of documents, entire codebases with history, or hours of transcripts.
Llama 4 Maverick (400B MoE, ~17B active) — Competitive with GPT-4o and Gemini 2.0 Flash on general benchmarks. Best-in-class among open-weight models, if you can self-host it with enough GPU memory.
Llama 4 Behemoth (still training, ~2T params) — Preview benchmarks suggest it will compete with Claude 4 and Gemini 2.5 Pro on reasoning. TBD.
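To put Scout's 10M-token window in perspective, here's a rough back-of-envelope. The ~4 characters per token figure is a common heuristic for English text and varies by tokenizer and content; the page and document sizes are illustrative assumptions, not anything Meta published:

```python
# Back-of-envelope: what fits in a 10M-token context window?
# Assumes ~4 characters per token (varies by tokenizer and content).
CHARS_PER_TOKEN = 4
CONTEXT_TOKENS = 10_000_000

total_chars = CONTEXT_TOKENS * CHARS_PER_TOKEN   # raw text volume
pages = total_chars // 3_000                     # assuming ~3,000 chars per printed page
docs = CONTEXT_TOKENS // 5_000                   # assuming ~5,000 tokens per mid-sized document

print(f"~{total_chars / 1e6:.0f} MB of raw text")   # ~40 MB
print(f"~{pages:,} printed pages")                  # ~13,333 pages
print(f"~{docs:,} mid-sized documents")             # ~2,000 documents
```

Even with generous error bars, that's "load the whole repo and its docs" territory, not "pick the three most relevant files."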
What's Genuinely New
- Native multimodal from day one — Text + images, natively trained together. Not bolted on.
- 10M context on Scout — The most practical ultra-long-context model shipped to date: it fits in memory on large cloud instances and is actually usable.
- MoE efficiency — Both Scout and Maverick use mixture-of-experts, activating only a fraction of parameters per token. High capability, lower inference cost.
- Open weights, commercial OK — Standard Meta license: free for most commercial use, restricted from training competing foundation models.
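The MoE efficiency claim rests on routing: a small learned router scores every expert per token but only runs the top few, so compute scales with active parameters rather than total parameters. Here's a minimal illustrative sketch of generic top-k MoE routing — not Meta's implementation; Llama 4's actual expert counts and routing details follow its published architecture:

```python
import numpy as np

def topk_moe_layer(x, expert_weights, router_weights, k=2):
    """Generic top-k mixture-of-experts layer (illustrative sketch).

    x:              (d,) token hidden state
    expert_weights: list of (d, d) matrices, one per expert
    router_weights: (num_experts, d) router projection

    Only k experts run per token, so compute scales with k,
    not with the total number of experts.
    """
    logits = router_weights @ x                  # one score per expert
    top = np.argsort(logits)[-k:]                # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                         # softmax over the selected experts only
    # Weighted sum of the chosen experts' outputs; the other experts stay idle.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((n_experts, d))
y = topk_moe_layer(x, experts, router, k=2)
print(y.shape)  # (8,)
```

This is why a 400B-parameter Maverick can have the inference cost profile of a ~17B dense model per token — though note that all 400B parameters still have to sit in memory.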
How It Compares at Launch
| Model | Open Weight | Context | Best Use |
|---|---|---|---|
| Llama 4 Maverick | ✅ | 1M | General, self-hosted |
| Llama 4 Scout | ✅ | 10M | Long-context, efficient |
| DeepSeek R1 | ✅ | 128K | Reasoning, open-weight |
| Gemini 2.5 Pro | ❌ | 1M | Best closed model currently |
| Claude 3.7 Sonnet | ❌ | 200K | Best coding, closed |
Best For
- Teams that want to self-host and fine-tune on their own data
- Extremely long-context tasks — Scout's 10M window is unmatched among open-weight models
- Companies that can't send data to third-party APIs
- Fine-tuning for domain-specific tasks (legal, medical, finance)
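If you're in the self-hosting camp, here's what "enough GPU" roughly means. This sketch counts weight memory only — it ignores KV cache, activations, and runtime overhead, so treat it as a lower bound. The parameter counts come from the models above; the rest is back-of-envelope:

```python
# Rough VRAM floor for self-hosting: weights only, no KV cache or overhead.
# With MoE, *all* parameters must be resident even though only ~17B are
# active per token — quantization is what makes these sizes tractable.
def weight_vram_gb(params_billion, bits):
    """Memory for model weights at a given precision, in GB."""
    return params_billion * 1e9 * bits / 8 / 1e9

for name, params in [("Scout", 109), ("Maverick", 400)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_vram_gb(params, bits):.1f} GB")
```

The takeaway: Scout at 4-bit lands in single-big-GPU territory, while Maverick realistically wants a multi-GPU node even when quantized.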
Not Yet For
- Coding agents — Claude Code and Gemini 2.5 Pro still lead
- Teams that need vendor SLAs — run it yourself or use a hosting provider
- Behemoth's intended use cases — it's not out yet
Verdict
Llama 4 is the most serious open-weight release since DeepSeek R1. The 10M context window on Scout is genuinely unprecedented and useful. Maverick is competitive enough to replace GPT-4o for many production workloads — at a fraction of the inference cost if you self-host. The Behemoth previews suggest Meta is serious about competing with closed-source frontier models, not just the open-weight tier.
Part of our Model Watch series. Next: Claude 4 Opus →
