GLM-5: The Open-Weight Model That Just Dethroned Kimi
GLM-5 (Reasoning) from Z AI debuted at the top of the open-weight leaderboard in February 2026 with a Quality Index of 49.64, displacing Kimi K2.5 Thinking. China's open-source labs are now shipping at a pace that's hard to ignore.
If you've been tracking the open-weight leaderboard, February 2026 had a clear shakeup: GLM-5 (Reasoning) from Z AI (Zhipu AI) debuted at #1, displacing Kimi K2.5 Thinking, which had held the spot for about a month.
The Artificial Analysis Quality Index tells the story: GLM-5 at 49.64, Kimi K2.5 Thinking at 46.73. A gap of nearly three points is meaningful at the top of the open-weight tier.
What Z AI Shipped
GLM-5 (Reasoning) — The thinking variant. Extended chain-of-thought, strong on complex multi-step tasks. This is the model to benchmark.
GLM-5 (Base) — Faster, cheaper to run, less reasoning depth. Better for high-throughput applications where the full reasoning chain isn't needed.
Key specs: 203K context window. Open weights. Commercially usable.
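Since the weights are open, self-hosting is the obvious first experiment. Here's a minimal sketch of what that might look like with Hugging Face transformers. The model ID zai-org/GLM-5 is a placeholder guess, not a confirmed repo name, so check Z AI's actual release before running this.

```python
# Minimal self-hosting sketch using Hugging Face transformers.
# NOTE: "zai-org/GLM-5" is a hypothetical model ID used for illustration;
# substitute whatever repo name Z AI actually publishes.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "zai-org/GLM-5"  # hypothetical, not confirmed

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",      # use the checkpoint's native precision
    device_map="auto",       # shard across available GPUs
    trust_remote_code=True,  # prior GLM releases shipped custom model code
)

messages = [{"role": "user", "content": "Explain idempotency in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For production throughput you'd serve this behind vLLM or a similar engine rather than raw transformers; the Base variant swaps in the same way.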
What's Driving the #1 Ranking
Z AI hasn't published a granular benchmark breakdown at launch, which is atypical and worth noting. The #1 ranking comes instead from Artificial Analysis' independent Quality Index evaluation across multiple task types. Based on that evaluation, the areas where GLM-5 pulls ahead of Kimi K2.5:
- Instruction following — More reliable on complex, multi-part instructions
- Reasoning breadth — Better generalization across domains, not just math olympiad-style problems
- Chinese language quality — Expected given Z AI's origin; top-tier bilingual performance
How It Compares at Launch
| Model | QI Score | Context (tokens) | Open Weights |
|---|---|---|---|
| GLM-5 (Reasoning) | 49.64 | 203K | ✅ |
| Kimi K2.5 Thinking | 46.73 | — | ✅ |
| MiniMax M2.5 | 41.97 | — | ✅ |
| DeepSeek V3.2 | 41.28 | — | ✅ |
| Gemini 3.1 Pro | — | 1M | ❌ |
Quality Index from Artificial Analysis. Closed models excluded from direct QI comparison.
Best For
- Self-hosted reasoning workloads where top open-weight quality matters
- Bilingual (Chinese + English) applications
- Teams evaluating the open-weight frontier: benchmark GLM-5 alongside Kimi K2.5 and DeepSeek V3.2 (a head-to-head sketch follows this list)
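For that head-to-head, here's a minimal sketch of a side-by-side comparison over an OpenAI-compatible API, which is what vLLM and most hosted providers expose. The base_url and model IDs below are placeholders for illustration, not confirmed endpoints or names.

```python
# Minimal sketch of a side-by-side eval over an OpenAI-compatible API
# (e.g., a local vLLM server). The base_url and model IDs below are
# placeholders for illustration, not real endpoints or confirmed names.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

MODELS = ["glm-5-reasoning", "kimi-k2.5-thinking", "deepseek-v3.2"]  # hypothetical IDs

# Replace with prompts drawn from your own workload; a handful of
# representative tasks beats any public leaderboard number.
PROMPTS = [
    "Plan a zero-downtime Postgres major-version upgrade.",
    "用两句话解释一下什么是幂等性。",  # bilingual check, per the list above
]

for model in MODELS:
    for prompt in PROMPTS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=1024,
        )
        print(f"--- {model} ---\n{resp.choices[0].message.content}\n")
```

Score the outputs yourself, by hand or with a rubric; a dozen representative tasks will tell you more than any aggregate leaderboard number.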
Not For
- Specific benchmark claims — granular per-benchmark numbers aren't published at launch; wait for independent evals beyond the aggregate Quality Index
- Agentic coding — Claude Code remains the leader
- Teams that need long context beyond 203K — Gemini or Llama 4 are better here (see the token-count check after this list)
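If long context is the concern, it's worth measuring before ruling GLM-5 out. A quick sketch, again assuming the hypothetical zai-org/GLM-5 tokenizer ID and treating 203K as roughly 203,000 tokens (the exact limit depends on the release):

```python
# Quick check: do your real prompts actually exceed a 203K-token window?
# "zai-org/GLM-5" is a hypothetical ID; the 203_000 limit is approximate.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("zai-org/GLM-5", trust_remote_code=True)
text = open("my_longest_prompt.txt").read()  # your worst-case input
n = len(tok.encode(text))
print(f"{n} tokens: {'fits in' if n <= 203_000 else 'exceeds'} a 203K window")
```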
Verdict
GLM-5 is the new open-weight leader as of early February 2026 — that's real and worth paying attention to. The lack of detailed benchmark disclosure at launch is a mild yellow flag; verify on your own tasks before committing. But if you're evaluating open-weight frontier models right now, GLM-5 goes to the top of the list.
Part of our Model Watch series.
