Grok Build 0.1: xAI's Fast, Cheap Coding Model vs. Claude and Gemini
xAI just shipped grok-build-0.1 on the xAI API — a coding model trained specifically for agentic software engineering, served at 100+ tokens per second and priced aggressively at $1 per million input tokens and $2 per million output. It's the model behind the new Grok Build CLI, and early customers like Notion and Kilo Code are leaning on it for speed and cost. So what does it actually do, how does it stack up against the Claude and Gemini coding models, and is it good enough to put a terminal agent like Claude Code or Google's Antigravity CLI on notice? Here's an honest look.
What grok-build-0.1 is
grok-build-0.1 is a purpose-built coding model — not a general chat model with a coding mode bolted on. xAI trained it specifically for agentic coding tasks: web development, debugging, and MCP (Model Context Protocol) support so it can drive external tools. It accepts text and image input, returns text with no fixed output cap, and ships with a 256K-token context window. The same model powers xAI's Grok Build CLI, a terminal-native coding agent, and it's available through the xAI API directly plus OpenRouter and the Vercel AI Gateway.
Two numbers define its pitch. First, speed: 100+ tokens per second is fast enough that an agent's many small round-trips feel responsive instead of laggy. Second, price: at $1 in / $2 out per million tokens, it's one of the cheapest coding-capable models on the market. xAI is explicit that beyond coding it's also "a speedy, economical option for general-purpose agentic and tool-calling use cases." Prompt caching is supported, which can cut effective input cost substantially on the repeated context that agentic loops generate.
💡 Read the version number. "0.1" is a deliberate signal — this is a first, fast, cost-optimized coding model, not xAI's flagship. xAI's larger next-generation model is expected to power Grok Build later; for now, grok-build-0.1 competes on throughput and economics, not on topping the intelligence charts.
How it compares to Claude and Gemini coding models
The cleanest way to place grok-build-0.1 is on three axes: price, context, and measured coding ability. Here's the landscape as of mid-2026 (API list prices, per million tokens):
| Model | Input | Output | Context |
|---|---|---|---|
| xAI grok-build-0.1 | $1.00 | $2.00 | 256K |
| Anthropic Claude Opus 4.8 | $5.00 | $25.00 | 1M |
| Anthropic Claude Sonnet 4.6 | $3.00 | $15.00 | 1M |
| Anthropic Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| Google Gemini 3 Pro / 3.1 Pro | $2.00* | $12.00* | 1M |
*Gemini 3 Pro uses context-tiered pricing — roughly $2/$12 up to 200K tokens and $4/$18 beyond. Always check each provider's current pricing page before budgeting.
On price, grok-build-0.1 sits at the bottom alongside Claude Haiku 4.5 on input and below it on output — materially cheaper than Sonnet, Gemini Pro, or Opus. That's the whole point of the product.
On coding ability, the picture flips. On the widely-cited SWE-bench Verified benchmark (real GitHub issues a model must actually fix), third-party reports put grok-build-0.1 around 70%. For comparison, the frontier coding setups report meaningfully higher — Claude in the high-80s and Gemini 3.1 Pro around 80%. Treat all of these as directional: scores vary with the agent harness, the prompt, and the test subset, and a model number is not the same as an end-to-end coding tool's number. But the ordering is consistent and unsurprising given the positioning: grok-build-0.1 trades some raw problem-solving for speed and cost.
⚠️ Benchmarks measure the model, not your codebase. A 15-point SWE-bench gap can vanish or widen on your stack depending on language, test coverage, and how well each model uses tools. Benchmarks narrow the field; a short bake-off on your own tickets decides it.
So — is it good for coding?
Yes, for the right job. grok-build-0.1 is a strong fit when throughput and cost dominate: high-volume agentic workflows, web-development scaffolding, routine debugging, large fan-outs where you run many agent steps and the per-token bill adds up fast, and latency-sensitive interactive coding where 100+ tokens/sec keeps the loop feeling instant. That it's already powering production features at Notion and Kilo Code is a real signal — those are demanding, high-volume environments.
It's a weaker fit for the hardest, highest-stakes work: gnarly multi-file refactors, subtle concurrency or security bugs, and long-horizon autonomous runs where one wrong turn compounds. There, the extra reasoning of a frontier model like Claude Opus 4.8 — and its 1M-token context for holding a large codebase at once — still earns its higher price. The smart move isn't "pick one forever"; it's to route by task: a fast, cheap model for the bulk of the work, a frontier model for the parts where being right the first time is worth 5× the tokens.
The Grok Build CLI vs. Claude Code and Antigravity CLI
A model is only as good as the agent wrapped around it. grok-build-0.1 ships inside the Grok Build CLI, a terminal-native coding agent that — like its rivals — supports MCP, headless/scripted runs, and an open agent protocol (ACP) so you can wire it into your own loop or IDE. Reviews describe it spawning parallel sub-agents that work in isolated git worktrees, the same fan-out pattern we covered in our piece on agents, subagents, and workflows. How do the three terminal agents line up?
- Claude Code (Anthropic) — the most mature of the three. Defaults to Claude Opus 4.8 with a 1M-token context and a fast mode, plus a deep feature set: dynamic workflows, agent teams, skills, and hooks. It's the benchmark for terminal-native agentic coding, and the one we compared head-to-head with OpenAI's Codex CLI. Best when you want the strongest reasoning and the richest orchestration, and you'll pay frontier token rates for it.
- Google Antigravity CLI — Google's Go-based, multi-agent replacement for the Gemini CLI (which retires June 18, 2026). It's built around running multiple agents and is backed by the Gemini 3 family, with Google Cloud integration as a natural pull. Best if your stack already lives on Google Cloud and Gemini.
- Grok Build CLI (xAI) — the newest entrant, single-model on grok-build-0.1 today. Its differentiator is the model underneath: fast and cheap. Best when you're running a lot of agentic coding and want to keep latency and spend down, or want an xAI-native option.
The honest summary: Claude Code is the one to beat on capability and polish; Antigravity CLI is the Google-ecosystem play; Grok Build is the speed-and-cost play. If your bottleneck is how smart the agent is on hard problems, Claude Code leads today. If your bottleneck is how much a high-volume agentic workflow costs and how snappy it feels, Grok Build is genuinely compelling — and because all three speak MCP and ACP, you don't have to marry one. A common pattern emerging in 2026 is exactly that: a fast, cheap agent for breadth, a frontier agent for depth.
Getting started
You can call grok-build-0.1 straight from the xAI API with an API key, or reach it through OpenRouter or the Vercel AI Gateway if you're already routing models through one of those. For the full terminal experience, install the Grok Build CLI and point it at a project. Because it's MCP-capable, it slots into the same tool-and-server setup you'd use for any modern coding agent.
The bottom line
grok-build-0.1 is a clear, well-aimed product: a fast, inexpensive, agentic-coding model that won't top the intelligence benchmarks but will move a lot of routine software work quickly and cheaply. Against Claude and Gemini, it wins on price and speed and trails on peak coding ability — which is exactly the trade its "0.1" name advertises. For teams doing high-volume agentic coding, it's worth a real evaluation; for the hardest problems, a frontier model like Claude Opus 4.8 still earns its keep. The winning strategy in 2026 isn't loyalty to one model or one CLI — it's matching the tool to the task, which is the same discipline that runs through everything from agentic workflows to the cloud and DevOps skills our certification labs teach.