Greg Mousseau

GLM-5: The Open-Weight Model That Just Dethroned Kimi

GLM-5 (Reasoning) from Z AI debuted at the top of the open-weight leaderboard in February 2026 with a Quality Index of 49.64, displacing Kimi K2.5 Thinking. China's open-source labs are now shipping at a pace that's hard to ignore.

Model Review · Frontier Models · AI Strategy · Z AI · Open Source

If you've been tracking the open-weight leaderboard, February 2026 had a clear shakeup: GLM-5 (Reasoning) from Z AI (Zhipu AI) debuted at #1, displacing Kimi K2.5 Thinking, which had held the spot for about a month.

The Artificial Analysis Quality Index tells the story: GLM-5 at 49.64, Kimi K2.5 at 46.73. That's a gap of nearly three points, which is meaningful at the top of the open-weight tier.

What Z AI Shipped

GLM-5 (Reasoning) — The thinking variant. Extended chain-of-thought, strong on complex multi-step tasks. This is the model to benchmark.

GLM-5 (Base) — Faster, cheaper to run, less reasoning depth. Better for high-throughput applications where the full reasoning chain isn't needed.

Key specs: 203K context window. Open weights. Commercially usable.
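If you want to kick the tires, Z AI's GLM models have historically been served through an OpenAI-compatible API, so a first call can look like the sketch below. The base URL and the `glm-5-reasoning` model id are my assumptions, not confirmed values; check Z AI's docs for the real ones.

```python
# Minimal sketch of a GLM-5 (Reasoning) call via an OpenAI-compatible API.
# The base_url and model id below are assumptions, not confirmed values.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_Z_AI_KEY",         # from your Z AI account
    base_url="https://api.z.ai/v1",  # hypothetical endpoint; see Z AI docs
)

resp = client.chat.completions.create(
    model="glm-5-reasoning",         # hypothetical model id
    messages=[{
        "role": "user",
        "content": "Walk through a zero-downtime Postgres 14 -> 16 migration.",
    }],
    max_tokens=2048,
)

print(resp.choices[0].message.content)
```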

What's Driving the #1 Ranking

Z AI hasn't published a granular benchmark breakdown at launch, which is atypical and worth noting. The #1 ranking comes from Artificial Analysis' independent Quality Index evaluation across multiple task types. Based on that evaluation, the areas where GLM-5 pulls ahead of Kimi K2.5:

  • Instruction following — More reliable on complex, multi-part instructions
  • Reasoning breadth — Better generalization across domains, not just math olympiad-style problems
  • Chinese language quality — Expected given Z AI's origin; top-tier bilingual performance

How It Compares at Launch

| Model | QI Score | Context | Open Weight |
| --- | --- | --- | --- |
| GLM-5 (Reasoning) | 49.64 | 203K | Yes |
| Kimi K2.5 Thinking | 46.73 | n/a | Yes |
| MiniMax M2.5 | 41.97 | n/a | Yes |
| DeepSeek V3.2 | 41.28 | n/a | Yes |
| Gemini 3.1 Pro | n/a | 1M | No |

Quality Index from Artificial Analysis. Closed models excluded from direct QI comparison.

Best For

  • Self-hosted reasoning workloads where top open-weight quality matters
  • Bilingual (Chinese + English) applications
  • Teams evaluating the open-weight frontier: benchmark GLM-5 alongside Kimi K2.5 and DeepSeek V3.2 (a minimal harness sketch follows this list)
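One low-lift way to run that comparison: hit each model through its OpenAI-compatible endpoint with the same prompts and score the outputs with your own task-specific checks. Every base URL and model id in this sketch is a placeholder, not a confirmed value.

```python
# Sketch of a side-by-side eval loop. All base URLs and model ids are
# placeholders -- substitute your actual provider endpoints.
from openai import OpenAI

MODELS = {
    "glm-5-reasoning":    OpenAI(api_key="...", base_url="https://api.z.ai/v1"),
    "kimi-k2.5-thinking": OpenAI(api_key="...", base_url="https://api.moonshot.ai/v1"),
    "deepseek-v3.2":      OpenAI(api_key="...", base_url="https://api.deepseek.com/v1"),
}

PROMPTS = [
    "Summarize this changelog in exactly three bullet points: ...",
    "Return valid JSON with keys 'risk' and 'mitigation' for: ...",
]

def passes(task: str, output: str) -> bool:
    """Your task-specific check, e.g. JSON validity or bullet count."""
    return bool(output.strip())  # placeholder check

for model_id, client in MODELS.items():
    score = 0
    for prompt in PROMPTS:
        resp = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": prompt}],
        )
        score += passes(prompt, resp.choices[0].message.content)
    print(f"{model_id}: {score}/{len(PROMPTS)}")
```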

Not For

  • Specific benchmark claims — granular numbers aren't published at launch; wait for task-level independent evals
  • Agentic coding — Claude Code remains the leader
  • Teams that need long context beyond 203K — Gemini or Llama 4 are better here (see the routing sketch below)
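For that last point, a cheap guard is to estimate prompt length up front and route anything that won't fit to a longer-context model. A rough sketch, with hypothetical model ids and a crude 4-characters-per-token estimate (use the model's real tokenizer for production routing):

```python
# Rough routing guard: fall back to a long-context model when the input
# likely exceeds GLM-5's 203K window. The 4-chars-per-token estimate is a
# crude heuristic, and both model ids are hypothetical.
GLM5_CONTEXT = 203_000

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough average for English text

def pick_model(prompt: str) -> str:
    if estimate_tokens(prompt) < GLM5_CONTEXT * 0.9:  # leave headroom for output
        return "glm-5-reasoning"  # hypothetical model id
    return "gemini-3.1-pro"       # long-context fallback (1M window)
```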

Verdict

GLM-5 is the new open-weight leader as of early February 2026 — that's real and worth paying attention to. The lack of detailed benchmark disclosure at launch is a mild yellow flag; verify on your own tasks before committing. But if you're evaluating open-weight frontier models right now, GLM-5 goes to the top of the list.

Part of our Model Watch series.