GLM-5.1: Zhipu's Open-Weight Agentic Coder That Topped SWE-Bench Pro
On April 7, 2026, Zhipu AI (now operating as Z.ai) released GLM-5.1, an open-weight model built for long-horizon agentic coding. The headline number is a SWE-Bench Pro score of 58.4 — above GPT-5.4 (57.7) and Claude Opus 4.6 (57.3). The weights are MIT licensed and available on Hugging Face.
This is a significant milestone: a Chinese lab, training entirely on Huawei hardware with no NVIDIA chips, now leads what is arguably the most demanding real-world software engineering benchmark.
Architecture
GLM-5.1 uses a Mixture-of-Experts (MoE) design:
| Attribute | Value |
|---|---|
| Total parameters | 754B |
| Active per forward pass | ~40B |
| Context window | 202K tokens |
| Max output tokens | 128K |
| Training hardware | Huawei Ascend 910B (100,000 chips) |
| Framework | MindSpore |
| License | MIT |
The MoE architecture means inference cost is closer to a 40B dense model than a 754B one — relevant when comparing API pricing to similarly-benchmarked models.
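A back-of-envelope sketch of why this holds: decode-time compute scales with the parameters actually activated per token, not the total stored. The ~2-FLOPs-per-parameter-per-token rule of thumb and the hypothetical 754B dense comparison below are our own assumptions, not figures from the release.

```python
# Decode FLOPs per token scale with ACTIVE parameters (~2 FLOPs per
# parameter per token), not with total stored parameters.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for one generated token."""
    return 2 * active_params

moe_active = 40e9    # GLM-5.1 active parameters per forward pass
dense_total = 754e9  # hypothetical dense model of the same total size

ratio = flops_per_token(dense_total) / flops_per_token(moe_active)
print(f"A dense 754B model costs ~{ratio:.1f}x more compute per token")
```

Total parameters still determine memory footprint, which is why serving cost does not drop by the full 19x, but the per-token compute gap explains most of the pricing headroom.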
Benchmarks
From llm-stats.com and the official Z.ai release:
| Model | SWE-Bench Pro | SWE-Bench Verified |
|---|---|---|
| GLM-5.1 | 58.4 | 68.9 |
| GPT-5.4 | 57.7 | 67.3 |
| Claude Opus 4.6 | 57.3 | 66.8 |
| Gemini 3.1 Pro | 55.1 | 64.2 |
SWE-Bench Pro tests software engineering agents on real GitHub issues — not curated problems, but actual bugs and feature requests with full repository context. A score of 58.4 means the model resolved 58.4% of tested issues autonomously.
One context point: benchmark comparisons should be treated as directional, not definitive. Scores reflect specific scaffolding setups, and real-world results vary with system prompt, tool configuration, and task type.
What Changed from GLM-5
GLM-5 was released February 11, 2026, with 744B total parameters and $1.00/1M input pricing. GLM-5.1 ships six weeks later with:
- 10B additional parameters (754B total)
- Improved long-horizon task planning (the "experiment–analyze–optimize" loop)
- Reduced pricing: $0.80/1M input, $2.40/1M output (down from $1.00/$3.20)
- Updated Hugging Face weights
The core differentiator in v5.1 is the autonomous feedback loop: on complex engineering tasks, GLM-5.1 proactively runs tests, identifies bottlenecks, adjusts its approach, and iterates — rather than generating one-shot code and stopping.
API Access
Z.ai provides an OpenAI-compatible API:
````python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_API_KEY",
    base_url="https://api.z.ai/v1"
)

response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "user", "content": "Refactor this function to handle edge cases:\n\n```python\ndef divide(a, b):\n    return a / b\n```"}
    ]
)

print(response.choices[0].message.content)
````
The API supports:
- Synchronous and async requests
- Function calling / tool use
- Structured output (JSON mode)
- Context caching (reduces cost for repeated long-context calls)
- Streaming
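Tool use follows the standard OpenAI function-calling schema. The sketch below shows the schema and a local dispatcher; the `run_shell` tool and its stubbed handler are made up for illustration, not Z.ai built-ins.

```python
import json

# Tool schema in the OpenAI function-calling format the API accepts.
# `run_shell` is an example tool defined here, not a Z.ai built-in.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command in the repository and return stdout.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

def dispatch(name: str, arguments_json: str) -> str:
    """Route one tool call from the model to a local handler (stubbed here)."""
    args = json.loads(arguments_json)
    if name == "run_shell":
        return f"(stub) would run: {args['command']}"
    raise ValueError(f"unknown tool: {name}")

print(dispatch("run_shell", '{"command": "pytest -q"}'))
```

In practice you pass `tools=TOOLS` to `client.chat.completions.create(...)`, route each entry in `response.choices[0].message.tool_calls` through `dispatch`, and return the results as `role: "tool"` messages so the model can continue.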
Full documentation: docs.z.ai/guides/llm/glm-5.1
Open Weights
The model weights are available under MIT license:
```shell
# HuggingFace
huggingface-cli download zai-org/GLM-5.1
```
Running a 754B MoE locally requires significant hardware: even at 4-bit quantization the weights alone occupy roughly 380 GB, so a multi-GPU node (on the order of 5-8 H100 80GB cards, or heavy CPU offloading) is the practical floor — and quantization carries accuracy trade-offs on long-horizon tasks.
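The memory floor is simple arithmetic over total parameters (MoE routing saves compute, not weight storage). This is our own estimate for the weights alone, excluding KV cache and activations:

```python
def weight_memory_gb(total_params: float, bits: int) -> float:
    """Memory for the weights alone (excludes KV cache and activations)."""
    return total_params * bits / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(754e9, bits):,.0f} GB")
```

At 16-bit that is ~1.5 TB, at 4-bit still ~377 GB — well beyond any single accelerator.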
For most developers, the API is the practical path. The open weights matter for fine-tuning, compliance requirements, or deployments where data cannot leave your infrastructure.
Pricing in Context
| Model | Input $/1M | Output $/1M | SWE-Bench Pro |
|---|---|---|---|
| GLM-5.1 | $0.80 | $2.40 | 58.4 |
| Claude Opus 4.6 | $5.00 | $25.00 | 57.3 |
| GPT-5.4 | $3.00 | $12.00 | 57.7 |
GLM-5.1 delivers comparable or better benchmark performance at roughly 1/6th the input cost of Claude Opus 4.6. This pricing difference is material for high-volume agentic workloads where token costs accumulate quickly.
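To make the difference concrete, here is the table's rates applied to a hypothetical month of agentic traffic (2B input tokens, 200M output tokens — the workload size is our assumption):

```python
def monthly_cost(in_toks: float, out_toks: float,
                 in_rate: float, out_rate: float) -> float:
    """USD cost for a month of traffic; rates are $ per 1M tokens."""
    return (in_toks * in_rate + out_toks * out_rate) / 1e6

# Hypothetical workload: 2B input tokens, 200M output tokens per month.
for name, in_rate, out_rate in [("GLM-5.1", 0.80, 2.40),
                                ("Claude Opus 4.6", 5.00, 25.00),
                                ("GPT-5.4", 3.00, 12.00)]:
    print(f"{name}: ${monthly_cost(2e9, 2e8, in_rate, out_rate):,.0f}/month")
```

At this volume the gap is roughly $2,080 vs $15,000 per month against Claude Opus 4.6.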
The 202K context window (vs. Claude Opus 4.6's 200K) is approximately equivalent for most use cases.
The Huawei Chip Story
GLM-5.1's independence from US hardware is the engineering story behind the benchmark. The model was trained on 100,000 Huawei Ascend 910B chips using the MindSpore framework — a complete stack that replaces NVIDIA GPUs and CUDA end to end.
Zhipu completed a Hong Kong IPO in January 2026, raising approximately HKD 4.35 billion (USD $558 million) to fund continued model development. The move to Z.ai branding reflects a shift toward international developer positioning.
When to Use GLM-5.1
Strong fit:
- Agentic coding pipelines where cost scales with usage
- Tasks requiring 100K+ token context (entire codebases, long documents)
- Deployments where open weights are a compliance or control requirement
- Situations where Claude or GPT APIs are blocked by geographic or regulatory constraints
Weaker fit:
- Multimodal tasks (GLM-5.1 is text-focused; vision capabilities are more limited than Gemini or GPT-5.4)
- If you need a mature, battle-tested inference ecosystem — Z.ai's API is newer than OpenAI's or Anthropic's
- Tasks requiring real-time knowledge beyond the training cutoff
Quick Evaluation Steps
If you want to test GLM-5.1 for your use case before committing:
- Get a Z.ai API key at z.ai
- Run a few representative tasks from your workload using the OpenAI-compatible client shown above
- Compare output quality and cost against your current model on 10–20 representative inputs
- Check whether the 202K context window covers your typical prompt size
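The steps above reduce to a small A/B harness. The sketch below keeps the model call injectable so the comparison logic is testable; in practice `complete` wraps `client.chat.completions.create` for each endpoint (e.g. `base_url="https://api.z.ai/v1"` with `model="glm-5.1"` vs your current provider), and scoring is left to you.

```python
from typing import Callable

def run_suite(complete: Callable[[str], str], prompts: list[str]) -> list[str]:
    """Run every prompt through one model endpoint and collect raw outputs."""
    return [complete(p) for p in prompts]

def side_by_side(prompts: list[str], a: list[str], b: list[str]) -> list[tuple[str, str, str]]:
    """Pair two models' outputs per prompt for manual or scripted review."""
    return list(zip(prompts, a, b))

# Stub "models" so the harness shape is visible without an API key.
prompts = ["p1", "p2"]
rows = side_by_side(prompts,
                    run_suite(lambda p: f"glm:{p}", prompts),
                    run_suite(lambda p: f"baseline:{p}", prompts))
print(rows[0])  # ('p1', 'glm:p1', 'baseline:p1')
```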
The Coding Plan subscription ($3–$80/month depending on tier) is aimed at individual developers; the API pay-as-you-go is better for pipeline integration.