ARTICLES · 2026-06-06 · BY EFFLOOW CONTENT FACTORY

Grok Build 0.1: xAI's Agentic Coding API with Parallel Worktrees

xAI's Grok Build 0.1 is a fast agentic coding model at $1/$2 per MTok with 8 parallel Git worktrees and MCP support.

xAI launched Grok Build 0.1 in public beta in May 2026, positioning it as a coding-first model optimized for agentic workflows. The model ID is grok-build-0.1, it's available through the xAI API and OpenRouter, and it was built specifically for software engineering tasks — not adapted from a general-purpose base at the last step. This scout covers what the model does, how it's priced, where it fits, and what to watch for when evaluating it against existing tooling.

Scout methodology: This article is based on xAI's official announcement (x.ai/news/grok-build-0-1), the xAI model documentation, a live OpenRouter API query conducted on 2026-06-06, and third-party developer coverage. No live API calls were made to xAI directly — Effloow does not hold an xAI API key. All pricing and parameter data was cross-referenced against the OpenRouter public API response.

What Grok Build 0.1 actually is

Grok Build 0.1 is not Grok with a coding system prompt. It's a separate model line trained specifically for agentic software engineering, according to the xAI announcement. The model supports text and image input, outputs text only, and runs at 100+ tokens per second on xAI's serving infrastructure — fast enough to be practical as a code-completion backbone in interactive tools.

The context window is 256,000 tokens, confirmed by querying the OpenRouter API directly. That's large enough for most real codebases when you're working on a feature branch, though not unlimited — projects with very large dependency graphs or monorepos may still hit the ceiling.
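The OpenRouter check described above can be reproduced against its public model listing. The following is a minimal sketch under stated assumptions: that the listing at https://openrouter.ai/api/v1/models returns entries with `id`, `context_length`, and a `pricing` object of per-token USD strings, and that the slug is `x-ai/grok-build-0.1` (the provider prefix is a guess; this article only gives the model ID grok-build-0.1). The sample entry is built from figures quoted in this article, not from a captured response.

```python
import json
import urllib.request
from typing import Optional

MODELS_URL = "https://openrouter.ai/api/v1/models"
ASSUMED_SLUG = "x-ai/grok-build-0.1"  # assumption: provider prefix not confirmed


def fetch_models(url: str = MODELS_URL) -> list:
    """Download the public model listing (performs a network call)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["data"]


def summarize(models: list, slug: str) -> Optional[dict]:
    """Return context length and per-million-token prices for one model."""
    for entry in models:
        if entry.get("id") == slug:
            pricing = entry.get("pricing", {})
            return {
                "context_length": entry.get("context_length"),
                # Listing prices are assumed to be USD per single token.
                "input_per_mtok": round(float(pricing.get("prompt", 0)) * 1e6, 4),
                "output_per_mtok": round(float(pricing.get("completion", 0)) * 1e6, 4),
            }
    return None


# Illustrative entry built from the figures in this article.
sample = [{
    "id": ASSUMED_SLUG,
    "context_length": 256000,
    "pricing": {"prompt": "0.000001", "completion": "0.000002"},
}]
summary = summarize(sample, ASSUMED_SLUG)
print(summary)
```

To run it against live data, pass `fetch_models()` in place of `sample` and adjust the slug to whatever the listing actually uses.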

One notable architectural decision: built-in reasoning is always active. You cannot turn it off. For complex multi-step coding tasks this adds depth, but it also means every call carries reasoning token overhead. For high-frequency, low-complexity calls (think: simple file edits or single-line completions), that overhead is a cost you would avoid if reasoning were opt-in.
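To get a feel for how that overhead scales, here is a rough estimate. It assumes reasoning tokens are billed at the $2.00/M output rate, which the sources for this article do not confirm, and the token counts are hypothetical rather than measured.

```python
OUTPUT_PRICE_PER_MTOK = 2.00  # USD, the output rate quoted in this article


def output_cost_usd(visible_tokens: int, reasoning_tokens: int) -> float:
    """Output-side cost of one call, assuming reasoning bills as output."""
    total = visible_tokens + reasoning_tokens
    return total * OUTPUT_PRICE_PER_MTOK / 1_000_000


def reasoning_share(visible_tokens: int, reasoning_tokens: int) -> float:
    """Fraction of output tokens spent on reasoning rather than the answer."""
    total = visible_tokens + reasoning_tokens
    return reasoning_tokens / total if total else 0.0


# Hypothetical single-line edit: 40 visible tokens, 400 reasoning tokens.
small = reasoning_share(40, 400)
# Hypothetical multi-file change: 4,000 visible tokens, 2,000 reasoning tokens.
large = reasoning_share(4_000, 2_000)
print(f"small edit: {small:.0%} of output spend is reasoning")
print(f"large task: {large:.0%} of output spend is reasoning")
```

With these made-up numbers, reasoning dominates the output bill on the small edit and is a minority of it on the large task, which is the asymmetry the paragraph above describes.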

Parallel Git worktrees: the key differentiator

The most distinctive feature in Grok Build 0.1 is not the model itself — it's the execution architecture the model is designed around. Grok Build supports up to 8 parallel agents running simultaneously, each in an isolated Git worktree.

What that means in practice: you can assign Grok Build multiple independent tasks across different branches of the same repo, and they run concurrently without interfering with each other. Each agent operates in its own checkout, so branch A's changes don't contaminate branch B's working state mid-run.

The workflow model is plan → search → build. The agent first produces a plan for the task, then searches relevant code and documentation, then builds the implementation. This is a structured alternative to the more freeform tool-use loops in some other agentic coding tools.

For teams that need to run many parallel tasks — say, fixing multiple independent bugs, or running several feature implementations side-by-side — this architecture has real operational value. The bottleneck is no longer the model's throughput; it's whether you have enough independent tasks to fill 8 worktrees at once.
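The isolation described in this section rests on a standard Git feature, `git worktree add`. The sketch below illustrates that mechanism only; it is not Grok Build's orchestration, which the sources do not document. The function names, the sibling-directory naming scheme, and the task callables are all hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

MAX_AGENTS = 8  # the parallel-agent cap quoted in this article


def worktree_path(repo: Path, branch: str) -> Path:
    """Sibling directory for a branch's checkout (naming is an assumption)."""
    return repo.parent / f"{repo.name}-{branch.replace('/', '-')}"


def worktree_command(repo: Path, branch: str, base: str = "HEAD") -> list:
    """Build the `git worktree add` call that gives one task its own checkout."""
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", branch, str(worktree_path(repo, branch)), base]


def run_git(cmd: list) -> int:
    """Execute a git command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def run_isolated(repo: Path, tasks: dict,
                 git: Callable[[list], int] = run_git) -> dict:
    """Run each task in its own worktree, at most MAX_AGENTS concurrently."""
    paths = {}
    # Create checkouts one at a time: they all share a single .git directory.
    for branch in tasks:
        if git(worktree_command(repo, branch)) != 0:
            raise RuntimeError(f"could not create worktree for {branch}")
        paths[branch] = worktree_path(repo, branch)
    # Each task receives only its own checkout path, so edits cannot collide.
    with ThreadPoolExecutor(max_workers=MAX_AGENTS) as pool:
        futures = {b: pool.submit(tasks[b], paths[b]) for b in tasks}
        return {b: f.result() for b, f in futures.items()}
```

A call such as `run_isolated(Path("myrepo"), {"fix/login": patch_login, "fix/cache": patch_cache})` would create two checkouts and run both callables side by side; any tasks beyond the eighth wait in the pool's queue.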

Pricing and where it lands

Pricing confirmed via OpenRouter API (live query, 2026-06-06):

Input: $1.00/M tokens
Output: $2.00/M tokens
Cache read: $0.20/M tokens

The cache read pricing is worth noting. At $0.20/M, cached prompt tokens cost one-fifth the fresh input rate. For agentic coding workflows with large, repeated system prompts or shared context, cache hit rate has a meaningful effect on total cost.
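The effect of cache hit rate can be worked through with the three quoted prices. This is a sketch under the assumption that cached prompt tokens are billed at the cache read rate in place of the input rate; the task size is hypothetical.

```python
INPUT_PER_MTOK = 1.00       # USD, from the pricing list in this article
OUTPUT_PER_MTOK = 2.00
CACHE_READ_PER_MTOK = 0.20


def task_cost_usd(input_tokens: int, output_tokens: int,
                  cache_hit_rate: float) -> float:
    """Cost of one task when a fraction of the prompt is served from cache."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    total = (fresh * INPUT_PER_MTOK
             + cached * CACHE_READ_PER_MTOK
             + output_tokens * OUTPUT_PER_MTOK)
    return total / 1_000_000


# Hypothetical task: 50,000 input tokens and 2,000 output tokens.
for rate in (0.0, 0.5, 0.9):
    cost = task_cost_usd(50_000, 2_000, rate)
    print(f"cache hit rate {rate:.0%}: ${cost:.4f}")
```

For this input-heavy task, the cost falls from about $0.054 with no cache hits to about $0.018 at a 90% hit rate, roughly a threefold difference.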

At $1/$2, Grok Build 0.1 sits well below the roughly $5/M input and $25-$30/M output rates listed for the comparison models later in this article. This makes it practical for high-volume automation workflows where cost-per-task matters, the kind of pipeline where you're running hundreds of small coding tasks rather than a few large ones.

SWE-Bench Verified performance

xAI reports a SWE-Bench Verified score of 70.8% for Grok Build 0.1. That figure comes from their internal harness, as stated in the announcement.

This is an important caveat. SWE-Bench Verified scores are not directly comparable across providers unless the same evaluation harness is used. The harness covers things like how the agent is scaffolded, how file context is provided, how retries are handled, and how partial solutions are scored. Different labs run different harnesses, and the differences can move scores by several percentage points in either direction.

The 70.8% figure is a useful v1 baseline — it tells you the model is functional and respectable for an initial release. It should not be read as directly comparable to scores reported by Anthropic, OpenAI, or DeepMind unless those scores were produced on the same harness.

As of this writing, we could not find independent third-party evaluations of Grok Build 0.1 on a standardized harness.

How it compares (with caveats)

The table below uses the best available public data for each model. SWE-Bench scores across providers are not directly comparable due to harness differences — treat the column as directional, not definitive.

Metric	Grok Build 0.1	Claude Code (Opus 4.8)	Codex (GPT-5.5)
SWE-Bench Verified	70.8% (xAI harness)	[DATA NOT AVAILABLE on same harness]	[DATA NOT AVAILABLE]
Input price	$1.00/M	~$5.00/M	~$5.00/M
Output price	$2.00/M	~$25.00/M	~$30.00/M
Serving speed	100+ tok/sec	[DATA NOT AVAILABLE]	[DATA NOT AVAILABLE]
Context window	256K tokens	~200K tokens	~1M tokens
Parallel agents	Up to 8	Varies by client	[DATA NOT AVAILABLE]
Built-in reasoning	Always-on	Opt-in	Opt-in
MCP support	Yes (native)	Yes (via client)	Yes (via client)
Image input	Yes	Yes	Yes

The pricing gap is the most concrete comparison: at the listed rates, Opus 4.8 costs about 5x as much as Grok Build 0.1 on input and about 12.5x as much on output. That gap matters for high-volume pipelines. For quality-sensitive, low-volume tasks where per-task cost is less important than reliability, the comparison shifts toward whichever model performs best on your specific task distribution.
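The multiples follow directly from the table's price rows. A short check, using only the approximate figures listed above (the dictionary and function names are illustrative):

```python
# USD per million tokens (input, output), copied from the comparison table.
PRICES = {
    "Grok Build 0.1": (1.00, 2.00),
    "Claude Code (Opus 4.8)": (5.00, 25.00),
    "Codex (GPT-5.5)": (5.00, 30.00),
}


def ratio_vs_baseline(model: str, baseline: str = "Grok Build 0.1") -> tuple:
    """How many times the baseline's input and output price a model charges."""
    model_in, model_out = PRICES[model]
    base_in, base_out = PRICES[baseline]
    return model_in / base_in, model_out / base_out


for name in PRICES:
    in_x, out_x = ratio_vs_baseline(name)
    print(f"{name}: {in_x:.1f}x input, {out_x:.1f}x output")
```

Because output tokens carry the larger multiple, the gap widens for tasks that generate a lot of code relative to the context they read.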

MCP support and integrations

Grok Build 0.1 supports MCP (Model Context Protocol), both through official Grok Connectors and through Bring-Your-Own-MCP. Official connectors include GitHub, Notion, Linear, Google Workspace, Microsoft 365, Vercel, and Canva, according to the xAI docs and third-party guides.

The BYOMCP path means you can attach any MCP-compatible server to Grok Build, not just the pre-built connectors. This is important if your team has internal tools or custom data sources already exposed via MCP — you're not locked into the official connector list.
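What makes a server "MCP-compatible" is that it accepts JSON-RPC 2.0 messages such as `tools/list` and `tools/call`. The sketch below builds those two requests to show the shape of the traffic. The tool name `search_tickets` and its arguments are hypothetical, and how Grok Build registers a custom server is not shown because the sources for this article do not describe it.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC requests carry a unique id per message


def mcp_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request of the kind MCP clients send to servers."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes.
list_tools = mcp_request("tools/list")

# Invoke one tool by name; the tool and its arguments are made up.
call_tool = mcp_request(
    "tools/call",
    {"name": "search_tickets", "arguments": {"query": "login timeout"}},
)
print(json.dumps(call_tool, indent=2))
```

A real session starts with an `initialize` handshake before either of these requests, and the transport (stdio or HTTP) is chosen by the client and server.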

Third-party integrations reported in developer coverage include Kilo Code and OpenCode. Grok Skills (xAI's own skill-extension system) is also mentioned in the docs as a way to extend agent behavior, though detailed documentation on the Skills API was not fully reviewed for this scout.

The CLI

xAI ships a first-party CLI for Grok Build. Install command from the official docs:

curl -fsSL https://x.ai/cli/install.sh | bash

It's also available via npm. The latest version as of early June 2026 is v0.2.20, per developer community sources.

The CLI includes a --sandbox flag that runs agent tasks inside a Shuru microVM. This provides filesystem isolation: the agent's file operations are scoped to the sandbox, which reduces the risk of unintended modifications to the host system. The sandbox requires macOS 14 or later on Apple Silicon. Whether it supports Linux or Intel Macs could not be confirmed at time of writing.

Access to the CLI requires a SuperGrok or X Premium Plus subscription.

What it's good for

Based on the architecture and pricing:

High-volume parallel coding tasks. If your workflow involves running many independent tasks — patch generation, test scaffolding, dependency updates across multiple branches — the 8-worktree parallel architecture and low per-token cost make Grok Build 0.1 a practical fit. For an overview of how multi-agent coding systems are being evaluated and where the gaps remain, see our piece on the Meta Agent Challenge and autonomous development benchmarking.

Cost-sensitive automation pipelines. At $1/$2 input/output, you can run substantially more agentic tasks per dollar than with models in the $5-$30 range. For teams building automation pipelines that run continuously, this adds up.

MCP-heavy workflows. Native MCP support with a broad connector list means less glue code if your tools are already MCP-compatible.

What to watch for

Always-on reasoning has tradeoffs. For straightforward coding tasks, the reasoning overhead adds tokens without proportional benefit. You can't turn it off, so cost modeling needs to account for reasoning token volume.

The SWE-Bench harness caveat is real. Until third-party evaluations run Grok Build 0.1 on the same harness used by other labs, the 70.8% figure is useful context but not a direct comparison. For a deeper look at how coding agent benchmarks differ across evaluation setups, see our coverage of DataCurve's DeepSWE benchmark methodology.

v0.1 means the model is early. The 0.1 version number is not just branding — this is an early release. Expect behavior changes, capability additions, and possibly pricing adjustments as xAI iterates. The --sandbox macOS-only constraint also limits immediate adoption on non-Apple-Silicon environments.

Subscription gating for CLI. SuperGrok or X Premium Plus is required for CLI access. API access pricing is separate. Teams evaluating adoption need to account for both.

Verdict

Grok Build 0.1 enters a crowded agentic coding space with two genuine advantages: parallel Git worktrees built into the execution model, and pricing that is substantially below the current top-tier alternatives. The 100+ token/second serving speed means it's responsive enough for interactive use, not just batch processing.

The weaknesses are real: always-on reasoning that can't be disabled, a v0.1 release status that implies ongoing iteration, and SWE-Bench numbers that can't be directly compared across labs without a shared harness. The sandbox's restriction to macOS on Apple Silicon also limits enterprise adoption in mixed environments.

For developers running high-volume, parallel, cost-sensitive agentic pipelines, this is worth evaluating now. For teams where per-task quality is the primary constraint and volume is lower, the more established tooling from Anthropic or OpenAI likely still holds an edge until independent benchmarks catch up.

FAQ

What does Grok Build 0.1 cost?
Input tokens are $1.00/M and output tokens are $2.00/M. Cache read tokens are $0.20/M. These prices are confirmed via the OpenRouter public API (queried 2026-06-06) and are consistent with xAI documentation.

How does the 70.8% SWE-Bench Verified score compare to other models?
Directly, it doesn't — at least not cleanly. The 70.8% figure was measured on xAI's internal harness. SWE-Bench Verified scores differ meaningfully across providers depending on how the evaluation is scaffolded. Treat it as a v1 baseline, not a leaderboard position.

Do I need a SuperGrok subscription to use Grok Build 0.1?
The CLI requires SuperGrok or X Premium Plus. The API (via xAI or OpenRouter) is separate — API access pricing is pay-per-token without a subscription requirement.

What is the --sandbox flag in the CLI?
It runs agent tasks inside a Shuru microVM, isolating the agent's filesystem operations from the host. It requires macOS 14 or later on Apple Silicon. This is useful for safety when running automated agents locally.

Can I connect my own MCP servers to Grok Build?
Yes. Grok Build supports Bring-Your-Own-MCP (BYOMCP), meaning any MCP-compatible server can be attached, not just the official Grok Connectors list.
