GLM-5.1: Zhipu's Open-Weight Agentic Coder That Topped SWE-Bench Pro
On April 7, 2026, Zhipu AI (now operating as Z.ai) released GLM-5.1, an open-weight model built for long-horizon agentic coding. The headline number is a SWE-Bench Pro score of 58.4 — above GPT-5.4 (57.7) and Claude Opus 4.6 (57.3). The weights are MIT licensed and available on Hugging Face.
This is a significant milestone: a Chinese lab, training entirely on Huawei hardware with no NVIDIA chips, now leads what is arguably the most demanding real-world software engineering benchmark.
Architecture
GLM-5.1 uses a Mixture-of-Experts (MoE) design:
| Attribute | Value |
|---|---|
| Total parameters | 754B |
| Active per forward pass | ~40B |
| Context window | 202K tokens |
| Max output tokens | 128K |
| Training hardware | Huawei Ascend 910B (100,000 chips) |
| Framework | MindSpore |
| License | MIT |
The MoE architecture means inference cost is closer to a 40B dense model than a 754B one — relevant when comparing API pricing to similarly-benchmarked models.
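A back-of-envelope sketch of why this holds: decode-time compute scales with the parameters actually activated per token, not the total stored. The ~2-FLOPs-per-parameter-per-token rule of thumb and the hypothetical 754B dense comparison below are our own assumptions, not figures from the release.

```python
# Decode FLOPs per token scale with ACTIVE parameters (~2 FLOPs per
# parameter per token), not with total stored parameters.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for one generated token."""
    return 2 * active_params

moe_active = 40e9    # GLM-5.1 active parameters per forward pass
dense_total = 754e9  # hypothetical dense model of the same total size

ratio = flops_per_token(dense_total) / flops_per_token(moe_active)
print(f"A dense 754B model costs ~{ratio:.1f}x more compute per token")
```

Total parameters still determine memory footprint, which is why serving cost does not drop by the full 19x, but the per-token compute gap explains most of the pricing headroom.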
Benchmarks
From llm-stats.com and the official Z.ai release:
| Model | SWE-Bench Pro | SWE-Bench Verified |
|---|---|---|
| GLM-5.1 | 58.4 | 68.9 |
| GPT-5.4 | 57.7 | 67.3 |
| Claude Opus 4.6 | 57.3 | 66.8 |
| Gemini 3.1 Pro | 55.1 | 64.2 |
SWE-Bench Pro tests software engineering agents on real GitHub issues — not curated problems, but actual bugs and feature requests with full repository context. A score of 58.4 means the model resolved 58.4% of tested issues autonomously.
One context point: benchmark comparisons should be treated as directional, not definitive. Scores reflect specific scaffolding setups, and real-world results vary with system prompt, tool configuration, and task type.
What Changed from GLM-5
GLM-5 was released February 11, 2026, with 744B total parameters and $1.00/1M input pricing. GLM-5.1 ships six weeks later with:
- 10B additional parameters (754B total)
- Improved long-horizon task planning (the "experiment–analyze–optimize" loop)
- Reduced pricing: $0.80/1M input, $2.40/1M output (down from $1.00/$3.20)
- Updated Hugging Face weights
The core differentiator in v5.1 is the autonomous feedback loop: on complex engineering tasks, GLM-5.1 proactively runs tests, identifies bottlenecks, adjusts its approach, and iterates — rather than generating one-shot code and stopping.
API Access
Z.ai provides an OpenAI-compatible API:
````python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_API_KEY",
    base_url="https://api.z.ai/v1"
)

response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "user", "content": "Refactor this function to handle edge cases:\n\n```python\ndef divide(a, b):\n    return a / b\n```"}
    ]
)

print(response.choices[0].message.content)
````
The API supports:
- Synchronous and async requests
- Function calling / tool use
- Structured output (JSON mode)
- Context caching (reduces cost for repeated long-context calls)
- Streaming
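Tool use follows the standard OpenAI function-calling schema. The sketch below shows the schema and a local dispatcher; the `run_shell` tool and its stubbed handler are made up for illustration, not Z.ai built-ins.

```python
import json

# Tool schema in the OpenAI function-calling format the API accepts.
# `run_shell` is an example tool defined here, not a Z.ai built-in.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command in the repository and return stdout.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

def dispatch(name: str, arguments_json: str) -> str:
    """Route one tool call from the model to a local handler (stubbed here)."""
    args = json.loads(arguments_json)
    if name == "run_shell":
        return f"(stub) would run: {args['command']}"
    raise ValueError(f"unknown tool: {name}")

print(dispatch("run_shell", '{"command": "pytest -q"}'))
```

In practice you pass `tools=TOOLS` to `client.chat.completions.create(...)`, route each entry in `response.choices[0].message.tool_calls` through `dispatch`, and return the results as `role: "tool"` messages so the model can continue.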
Full documentation: docs.z.ai/guides/llm/glm-5.1
Open Weights
The model weights are available under MIT license:
```shell
# HuggingFace
huggingface-cli download zai-org/GLM-5.1
```
Running a 754B MoE locally requires significant hardware: even at 4-bit quantization the weights alone occupy roughly 380 GB, so a multi-GPU node (on the order of 5-8 H100 80GB cards, or heavy CPU offloading) is the practical floor — and quantization carries accuracy trade-offs on long-horizon tasks.
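The memory floor is simple arithmetic over total parameters (MoE routing saves compute, not weight storage). This is our own estimate for the weights alone, excluding KV cache and activations:

```python
def weight_memory_gb(total_params: float, bits: int) -> float:
    """Memory for the weights alone (excludes KV cache and activations)."""
    return total_params * bits / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(754e9, bits):,.0f} GB")
```

At 16-bit that is ~1.5 TB, at 4-bit still ~377 GB — well beyond any single accelerator.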
For most developers, the API is the practical path. The open weights matter for fine-tuning, compliance requirements, or deployments where data cannot leave your infrastructure.
Pricing in Context
| Model | Input $/1M | Output $/1M | SWE-Bench Pro |
|---|---|---|---|
| GLM-5.1 | $0.80 | $2.40 | 58.4 |
| Claude Opus 4.6 | $5.00 | $25.00 | 57.3 |
| GPT-5.4 | $3.00 | $12.00 | 57.7 |
GLM-5.1 delivers comparable or better benchmark performance at roughly 1/6th the input cost of Claude Opus 4.6. This pricing difference is material for high-volume agentic workloads where token costs accumulate quickly.
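To make the difference concrete, here is the table's rates applied to a hypothetical month of agentic traffic (2B input tokens, 200M output tokens — the workload size is our assumption):

```python
def monthly_cost(in_toks: float, out_toks: float,
                 in_rate: float, out_rate: float) -> float:
    """USD cost for a month of traffic; rates are $ per 1M tokens."""
    return (in_toks * in_rate + out_toks * out_rate) / 1e6

# Hypothetical workload: 2B input tokens, 200M output tokens per month.
for name, in_rate, out_rate in [("GLM-5.1", 0.80, 2.40),
                                ("Claude Opus 4.6", 5.00, 25.00),
                                ("GPT-5.4", 3.00, 12.00)]:
    print(f"{name}: ${monthly_cost(2e9, 2e8, in_rate, out_rate):,.0f}/month")
```

At this volume the gap is roughly $2,080 vs $15,000 per month against Claude Opus 4.6.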
The 202K context window (vs. Claude Opus 4.6's 200K) is approximately equivalent for most use cases.
The Huawei Chip Story
GLM-5.1's independence from US hardware is the engineering story behind the benchmark. The model was trained on 100,000 Huawei Ascend 910B chips using the MindSpore framework — a complete stack that replaces NVIDIA GPUs and CUDA end to end.
Zhipu completed a Hong Kong IPO in January 2026, raising approximately HKD 4.35 billion (USD $558 million) to fund continued model development. The move to Z.ai branding reflects a shift toward international developer positioning.
When to Use GLM-5.1
Strong fit:
- Agentic coding pipelines where cost scales with usage
- Tasks requiring 100K+ token context (entire codebases, long documents)
- Deployments where open weights are a compliance or control requirement
- Situations where Claude or GPT APIs are blocked by geographic or regulatory constraints
Weaker fit:
- Multimodal tasks (GLM-5.1 is text-focused; vision capabilities are more limited than Gemini or GPT-5.4)
- If you need a mature, battle-tested inference ecosystem — Z.ai's API is newer than OpenAI's or Anthropic's
- Tasks requiring real-time knowledge beyond the training cutoff
Quick Evaluation Steps
If you want to test GLM-5.1 for your use case before committing:
- Get a Z.ai API key at z.ai
- Run a few representative tasks from your workload using the OpenAI-compatible client shown above
- Compare output quality and cost against your current model on 10–20 representative inputs
- Check whether the 202K context window covers your typical prompt size
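The steps above reduce to a small A/B harness. The sketch below keeps the model call injectable so the comparison logic is testable; in practice `complete` wraps `client.chat.completions.create` for each endpoint (e.g. `base_url="https://api.z.ai/v1"` with `model="glm-5.1"` vs your current provider), and scoring is left to you.

```python
from typing import Callable

def run_suite(complete: Callable[[str], str], prompts: list[str]) -> list[str]:
    """Run every prompt through one model endpoint and collect raw outputs."""
    return [complete(p) for p in prompts]

def side_by_side(prompts: list[str], a: list[str], b: list[str]) -> list[tuple[str, str, str]]:
    """Pair two models' outputs per prompt for manual or scripted review."""
    return list(zip(prompts, a, b))

# Stub "models" so the harness shape is visible without an API key.
prompts = ["p1", "p2"]
rows = side_by_side(prompts,
                    run_suite(lambda p: f"glm:{p}", prompts),
                    run_suite(lambda p: f"baseline:{p}", prompts))
print(rows[0])  # ('p1', 'glm:p1', 'baseline:p1')
```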
The Coding Plan subscription ($3–$80/month depending on tier) is aimed at individual developers; the API pay-as-you-go is better for pipeline integration.