Claude Opus 4.8: Fast Mode, Dynamic Workflows & Near-Mythos
Anthropic released Claude Opus 4.8 on May 28, 2026 — just 41 days after Opus 4.7, the shortest flagship upgrade cycle the company has shipped. The headline feature is not a new capability: it is a price drop. Fast Mode, which lets Opus 4.8 run at roughly 2.5x normal speed, now costs $10/$50 per million tokens. For Opus 4.7, the same mode cost $30/$150. That is a 3x reduction with no change in speed gain, and it changes the economic math for production agent deployments.
Alongside the pricing shift come two genuinely new features — Dynamic Workflows, a research-preview mechanism for fanning out hundreds of parallel subagents from a single task, and Effort Control, a five-level API parameter that replaces the old temperature and sampling knobs. Anthropic has also removed temperature, top_p, and top_k from Opus 4.8 entirely; sending those parameters now returns a 400 error.
This guide covers what changed, what the benchmark numbers mean in practice, and what you need to update in code that was written for Opus 4.7.
Why This Release Matters for Agent Developers
Claude models have been getting faster and cheaper in steps, but Opus 4.8 is the first time a significant Fast Mode price drop has landed on the flagship tier. Before May 28, running Opus at 2.5x speed cost six times as much per million tokens as the standard endpoint ($30 vs. $5 input, $150 vs. $25 output). Now it costs twice as much. For workloads that do not depend on the lowest possible time-to-first-token (background agents, batch analysis, report generation), the calculus flips: Fast Mode becomes the default choice for most tasks, not a premium option reserved for speed-critical paths.
The removal of temperature and top_p is the other change developers will notice immediately. Anthropic's reasoning, consistent with the shift toward Effort Control, is that sampling parameters were an indirect and unreliable way to control the depth of reasoning. Effort levels are more semantically meaningful and simpler to expose in product UIs.
What Changed: A Quick Reference
| Feature | Opus 4.7 | Opus 4.8 |
|---|---|---|
| Model ID | claude-opus-4-7 | claude-opus-4-8 |
| Standard pricing (input/output per MTok) | $5 / $25 | $5 / $25 |
| Fast Mode pricing (input/output per MTok) | $30 / $150 | $10 / $50 |
| Fast Mode speed | ~2.5x standard | ~2.5x standard |
| Context window | 1M tokens | 1M tokens |
| Temperature / top_p / top_k | Supported | Removed (400 error) |
| Effort Control | Not available | low / medium / high / xhigh / max |
| Dynamic Workflows | Not available | Research preview |
| Available on | Still available | Claude API, Amazon Bedrock, Google Cloud Vertex AI, GitHub Copilot |
Fast Mode: The Economics of 3x Cheaper
Fast Mode is not a different model — it is the same Opus 4.8 weights served through a higher-throughput inference path. Quality and context window are identical. The tradeoff is that the system prioritizes throughput over minimizing time-to-first-token, which means it performs slightly worse on tasks that benefit from low latency (interactive chat, streaming completions that the user watches as they arrive). For agentic tasks running in the background, that tradeoff is irrelevant.
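To make the throughput versus time-to-first-token tradeoff concrete, here is a toy latency model: total time is the wait for the first token plus the output length divided by streaming speed. The numbers are hypothetical placeholders chosen only to show the shape of the tradeoff; they are not measurements of Opus 4.8.

```python
# Toy latency model for the throughput vs. time-to-first-token (TTFT) tradeoff.
# All numbers are hypothetical placeholders, not measured Opus 4.8 figures.

def end_to_end_seconds(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Wall-clock time for one response: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / tokens_per_s


STANDARD = {"ttft_s": 1.0, "tokens_per_s": 40.0}  # hypothetical
FAST = {"ttft_s": 1.5, "tokens_per_s": 100.0}     # hypothetical: slower start, 2.5x streaming

for tokens in (20, 2000):
    std = end_to_end_seconds(STANDARD["ttft_s"], tokens, STANDARD["tokens_per_s"])
    fast = end_to_end_seconds(FAST["ttft_s"], tokens, FAST["tokens_per_s"])
    print(f"{tokens:>5} output tokens: standard {std:.1f}s, fast {fast:.1f}s")
```

With these made-up numbers, the standard path wins on a 20-token reply (1.5s vs. 1.7s) while Fast Mode wins on a 2,000-token report (21.5s vs. 51.0s), which is why the tradeoff matters for interactive chat but not for long background jobs.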
The practical implication: if you have an agent pipeline that runs reports, processes documents, or does code review in batch mode, Fast Mode is now a far cheaper way to get Opus-class output at roughly 2.5x speed. It is not the cheapest way to run Opus overall: at $10/$50 per million tokens it still costs twice the standard Opus 4.8 rate ($5/$25) and more than three times Sonnet 4.6's $3 input rate. The real question is whether a 2.5x speedup is worth a 2x per-token premium; under Opus 4.7, the same speedup carried a 6x premium.
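The per-job arithmetic is easy to check. This small sketch uses the per-MTok prices from the quick-reference table; the job size (200k input tokens, 20k output tokens) is a hypothetical example.

```python
# Per-job cost at each pricing tier, using the per-MTok prices from the quick-reference table.
PRICES_PER_MTOK = {  # tier -> (input $, output $) per million tokens
    "opus-4.7 fast": (30.0, 150.0),
    "opus-4.8 fast": (10.0, 50.0),
    "opus-4.8 standard": (5.0, 25.0),
}


def job_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the given tier."""
    in_price, out_price = PRICES_PER_MTOK[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical batch job: 200k input tokens, 20k output tokens.
for tier in PRICES_PER_MTOK:
    print(f"{tier:<18} ${job_cost(tier, 200_000, 20_000):.2f}")
```

This prints $9.00, $3.00, and $1.50 for the three tiers: the 3x drop from Opus 4.7 Fast Mode, and the remaining 2x premium over standard Opus 4.8.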
To use Fast Mode via the API, send the Fast Mode beta header with the request:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Fast Mode is opt-in: same model ID, plus the Fast Mode beta header
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    extra_headers={"anthropic-beta": "fast-mode-2026-05-01"},
    messages=[{"role": "user", "content": "Review this pull request for security issues..."}],
)
print(response.content[0].text)
```
Check the Anthropic docs for the current Fast Mode beta header value, as the date suffix may update with new API versions.
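Because Fast Mode is opt-in per request, it can help to keep the decision and the header string in one place. A sketch, assuming the header value shown above (confirm it against the docs) and a hypothetical `FAST_MODE_BETA` environment variable for overriding it; `request_kwargs` is an illustrative helper name, not part of the SDK.

```python
import os

# Beta header value used in this article; confirm the current string in Anthropic's docs.
# FAST_MODE_BETA is a hypothetical env var so the value can change without a code edit.
FAST_MODE_BETA = os.environ.get("FAST_MODE_BETA", "fast-mode-2026-05-01")


def request_kwargs(prompt: str, *, fast: bool, max_tokens: int = 4096) -> dict:
    """Build kwargs for client.messages.create(), adding the Fast Mode header only when asked."""
    kwargs = {
        "model": "claude-opus-4-8",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if fast:
        kwargs["extra_headers"] = {"anthropic-beta": FAST_MODE_BETA}
    return kwargs


# Usage (requires an API key and the client from the previous example):
# response = client.messages.create(**request_kwargs("Summarize this report...", fast=True))
```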
Dynamic Workflows: Hundreds of Parallel Subagents
Dynamic Workflows is in research preview as of the May 28 launch. The mechanism works in two phases. First, the model receives a hard problem and produces a plan — a decomposition of the task into units of work. Second, it fans those units out across hundreds of parallel subagents, each completing its slice independently. The model then collects results, verifies them against the original pass-or-fail criteria (such as a test suite), and reports a unified outcome.
Anthropic's public example is a codebase-scale migration in Claude Code: the agent decomposes the migration by module, runs parallel subagents on each module, uses the existing test suite as the acceptance signal, and submits a merge-ready result. No human orchestration between phases.
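The plan, fan-out, and verify shape described above can be simulated locally to make the control flow concrete. Everything in this sketch is hypothetical: module names stand in for units of work, `work_unit` stands in for a subagent, and `passes_acceptance` stands in for a test suite. It does not call the API or reproduce how Dynamic Workflows is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(task: list[str]) -> list[str]:
    """Phase 1: decompose the task into independent units of work (one per module here)."""
    return sorted(set(task))


def work_unit(module: str) -> tuple[str, bool]:
    """Phase 2 worker: pretend to migrate one module and report whether it succeeded."""
    migrated = not module.startswith("legacy_")  # hypothetical failure condition
    return module, migrated


def passes_acceptance(results: dict[str, bool]) -> bool:
    """Verification: apply the pass/fail signal to the combined result."""
    return all(results.values())


def run_workflow(task: list[str], max_workers: int = 8) -> dict:
    units = plan(task)
    # Fan out: each unit runs independently; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(work_unit, units))
    failed = [module for module, ok in results.items() if not ok]
    return {"passed": passes_acceptance(results), "failed": failed}


print(run_workflow(["api", "auth", "legacy_billing", "api"]))
```

The key design point is that verification runs against the merged result, not per unit, which mirrors using one test suite as the acceptance signal for the whole migration.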
Dynamic Workflows requires the claude-opus-4-8 model and the dynamic-workflows-2026-05-01 beta header:
```python
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=8192,
    extra_headers={"anthropic-beta": "dynamic-workflows-2026-05-01"},
    messages=[{
        "role": "user",
        "content": (
            "Migrate all API calls in this codebase from requests to httpx. "
            "Use the existing pytest suite as the acceptance test. "
            "Report only after all tests pass."
        ),
    }],
)
```
The research preview label means access may be gated or throttled depending on your API tier. If you hit rate limits on Dynamic Workflows, the fallback is to implement your own fan-out with the standard Opus 4.8 endpoint — the model quality is the same, you just manage subagent coordination yourself.
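A minimal sketch of that self-managed fan-out, assuming each subagent is one call to the standard `claude-opus-4-8` endpoint. The model call is injected as `call_model` so the sketch runs without network access; in practice it could wrap `anthropic.AsyncAnthropic().messages.create(...)`. The concurrency cap is an assumption you would tune to your own rate limits.

```python
import asyncio
from typing import Awaitable, Callable


async def fan_out(
    units: list[str],
    call_model: Callable[[str], Awaitable[str]],
    max_concurrency: int = 16,
) -> list[str]:
    """Run one subagent call per unit, capping in-flight requests to stay under rate limits."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(unit: str) -> str:
        async with semaphore:
            return await call_model(unit)

    # gather returns results in the same order as units
    return await asyncio.gather(*(run_one(u) for u in units))


async def fake_call(unit: str) -> str:
    """Stand-in for a real model call."""
    await asyncio.sleep(0.01)  # simulated network latency
    return f"done: {unit}"


results = asyncio.run(fan_out(["module_a", "module_b", "module_c"], fake_call, max_concurrency=2))
print(results)
```

Collecting results and checking them against your acceptance test is then ordinary code on top of `results`.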
Effort Control: Five Levels Replace Temperature
Opus 4.8 introduces a formal effort parameter that replaces the informal guidance developers used to give through temperature settings. The five levels are:
- low: Minimum reasoning — fast responses, basic tasks, simple lookups
- medium: Moderate depth — suitable for well-scoped drafting, summarization, and structured extraction
- high: The default — recommended for most production workloads
- xhigh: Extended reasoning — Anthropic recommends this for "difficult tasks and long-running asynchronous workflows"
- max: Maximum available compute — reserved for the hardest problems; highest latency and cost
Claude Code surfaces xhigh (shown in the UI as "extra") as the recommended setting for coding tasks. The API accepts effort as a top-level parameter:
```python
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    effort="xhigh",  # or "low" | "medium" | "high" | "max"
    messages=[{
        "role": "user",
        "content": "Find all race conditions in this concurrent Go codebase...",
    }],
)
# IMPORTANT: do NOT send temperature, top_p, or top_k
# These will return a 400 error on Opus 4.8
```
The practical guidance from Anthropic's docs is to start at high for most tasks and promote to xhigh when quality falls short. Moving to max is rarely justified for production pipelines given the cost and latency increase.
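The "start at high, promote to xhigh" guidance can be written as a small escalation loop. In this sketch, `generate` and `good_enough` are hypothetical callables you supply: `generate(effort)` would wrap `client.messages.create(..., effort=effort)`, and `good_enough` is your own quality check (tests passing, a schema validating, and so on). `max` is deliberately left out of the default ladder.

```python
from typing import Callable

ESCALATION = ("high", "xhigh")  # add "max" only if you have seen xhigh fall short


def generate_with_escalation(
    generate: Callable[[str], str],
    good_enough: Callable[[str], bool],
    levels: tuple[str, ...] = ESCALATION,
) -> tuple[str, str]:
    """Try each effort level in order; return (effort_used, output) for the first acceptable output."""
    if not levels:
        raise ValueError("need at least one effort level")
    output = ""
    for effort in levels:
        output = generate(effort)
        if good_enough(output):
            return effort, output
    return levels[-1], output  # nothing passed: return the last attempt for inspection
```

Because each retry costs a full call, this pattern pays off when most requests pass at `high` and only a minority need the more expensive level.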
Near-Mythos Alignment: What It Means for Code
Anthropic's alignment team runs Opus 4.8 through the same evaluation suite they use for Mythos Preview, their most capable internal model. The result: Opus 4.8 produces rates of misaligned behavior — deception, cooperation with misuse requests — that are "similar to Claude Mythos Preview."
The more concrete number for developers is code integrity: Opus 4.8 is approximately four times less likely than Opus 4.7 to leave a flaw in its own code without mentioning it. This is the honesty improvement Anthropic has been building toward — the model flags what it does not know, reports when a task is going poorly, and avoids the pattern of presenting plausible-looking code that it cannot internally verify.
For agentic pipelines, this translates to a lower rate of silent failures. If Opus 4.8 cannot complete a step, it says so. That is a meaningfully different failure mode than a model that confidently returns wrong results.
Benchmark Results
| Benchmark | Opus 4.7 | Opus 4.8 | Change |
|---|---|---|---|
| SWE-bench Pro | 64.3% | 69.2% | +4.9pp |
| SWE-bench Verified | 87.6% | 88.6% | +1.0pp |
| Terminal-Bench 2.1 | 66.1% | 74.6% | +8.5pp |
| GraphWalks 1M-token F1 | 40.3% | 68.1% | +27.8pp |
| USAMO 2026 (math) | 69.3% | 96.7% | +27.4pp |
| Multidisciplinary reasoning | 54.7% | 57.9% | +3.2pp |
The two standout jumps are GraphWalks (+27.8pp) and USAMO (+27.4pp). GraphWalks measures long-context retrieval at one million tokens — the kind of task that stresses multi-hop reasoning over very large documents or codebases. The improvement here is large enough to change whether Opus 4.8 is usable for RAG-over-full-codebase patterns that Opus 4.7 handled poorly.
USAMO 2026 is a competition math benchmark. The jump from 69.3% to 96.7% is notable less for the math use case and more as an indicator of abstract reasoning depth. Competition math requires long chains of valid inference without shortcuts — the same property that makes models better at complex debugging and multi-step planning.
Terminal-Bench 2.1 (+8.5pp, to 74.6%) is the most practically relevant for agent developers. It tests agents operating in terminal environments across multi-step tasks. GPT-5.5 still leads at 78.2%, but the gap has narrowed from ~12pp to ~3.6pp.
Migrating from Opus 4.7
If your code sends temperature, top_p, or top_k to claude-opus-4-7, those parameters will break when you switch model IDs. Remove them before updating the model string:
```python
# Opus 4.7 call: fails once the model ID is switched to claude-opus-4-8
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    temperature=0.3,  # remove this
    top_p=0.9,        # remove this
    messages=[...],
)

# Opus 4.8 equivalent
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    effort="high",  # replaces temperature intent
    messages=[...],
)
```
If you were using temperature=0 to reduce variance, the equivalent signal in Opus 4.8 is effort="low" or effort="medium". If you were using high temperature for creative diversity, effort="max" does not replicate that — Opus 4.8's approach to diversity is built into the model rather than controlled via sampling.
The model ID update itself (claude-opus-4-7 → claude-opus-4-8) is otherwise a drop-in replacement: same context window, same SDK, same message format.
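If sampling defaults are injected by a shared wrapper, a small guard can strip them before the request goes out. A minimal sketch: `strip_sampling_params` is a hypothetical helper name, and the dropped keys are the three parameters this article says Opus 4.8 rejects with a 400 error.

```python
import warnings

REMOVED_ON_OPUS_4_8 = ("temperature", "top_p", "top_k")


def strip_sampling_params(kwargs: dict, default_effort: str = "high") -> dict:
    """Return a copy of request kwargs that omits parameters Opus 4.8 rejects."""
    clean = dict(kwargs)  # copy so the caller's dict is left untouched
    dropped = [key for key in REMOVED_ON_OPUS_4_8 if key in clean]
    for key in dropped:
        del clean[key]
    if dropped:
        warnings.warn(f"Dropped unsupported parameters for Opus 4.8: {', '.join(dropped)}")
    clean.setdefault("effort", default_effort)  # keep an explicit effort if one was set
    return clean


old = {"model": "claude-opus-4-8", "max_tokens": 1024, "temperature": 0.3, "top_p": 0.9, "messages": []}
print(strip_sampling_params(old))
```

Emitting a warning rather than silently dropping the keys makes it easier to find the wrapper or client initialization code that is still adding them.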
Common Mistakes to Avoid
Keeping temperature in your API calls. Opus 4.8 returns a 400 error if you send temperature, top_p, or top_k. This will break any wrapper or SDK call that adds these as defaults. Check your client initialization code, not just your per-request parameters.
Assuming Fast Mode is automatically applied. Fast Mode is opt-in via a beta header. Standard claude-opus-4-8 calls use the standard endpoint. If you want the cheaper, faster inference path, you need to enable it explicitly.
Treating Dynamic Workflows as a stable API. Research preview features can change behavior, rate limits, or availability without the standard deprecation notice. Add a fallback path in your agent code that handles the case where the dynamic-workflows header returns a 422 or is silently ignored.
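One way to structure that fallback, as a sketch under stated assumptions: `dynamic_call`, `manual_fan_out`, and `used_dynamic` are hypothetical callables you supply, and the error is assumed to expose its HTTP status as `.status_code` (the anthropic SDK's `APIStatusError` does). Detecting a silently ignored header depends on what the response looks like, so that check is left to `used_dynamic`.

```python
from typing import Any, Callable

FALLBACK_STATUSES = {422, 429}  # feature rejected, or preview rate limit hit


def run_with_fallback(
    dynamic_call: Callable[[], Any],
    manual_fan_out: Callable[[], Any],
    used_dynamic: Callable[[Any], bool] = lambda response: True,
) -> tuple[str, Any]:
    """Prefer Dynamic Workflows; fall back to self-managed fan-out on known failure modes."""
    try:
        response = dynamic_call()
    except Exception as exc:
        if getattr(exc, "status_code", None) not in FALLBACK_STATUSES:
            raise  # unrelated errors should still surface
        return "fallback", manual_fan_out()
    if not used_dynamic(response):  # request succeeded but the feature was ignored
        return "fallback", manual_fan_out()
    return "dynamic", response
```

Returning which path ran makes it easy to log how often the research-preview feature is actually being used.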
Using max effort for everything. effort="max" carries the highest latency and cost. For most tasks, high or xhigh produces equivalent results. Reserve max for tasks where you have observed xhigh falling short.
FAQ
Q: Is claude-opus-4-8 available on all Anthropic plans?
Yes. As of May 28, 2026, claude-opus-4-8 is available on the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and GitHub Copilot. Developer and Pro tier API access is available without a waitlist.
Q: Does Opus 4.8 Fast Mode reduce output quality?
Per Anthropic's documentation, Fast Mode uses the same model weights and produces the same quality outputs. The difference is in how inference is scheduled — Fast Mode prioritizes throughput, which means slightly higher time-to-first-token on average but higher tokens-per-second once generation starts. For batch workloads, the result quality should be identical to the standard endpoint.
Q: Can I still use Claude Opus 4.7 after the 4.8 release?
Yes. claude-opus-4-7 remains available. Anthropic follows a standard deprecation notice period before removing models. The 4.7 model is not immediately deprecated with the 4.8 release.
Q: What is the difference between xhigh effort and Dynamic Workflows?
Effort control (xhigh) increases the depth of reasoning within a single model call — more compute per response. Dynamic Workflows is a different mechanism: it breaks a task into parallel subagent calls, distributes work, and aggregates results. They are complementary: you can use xhigh effort within each subagent in a Dynamic Workflow.
Q: Is Sonnet 4.8 available yet?
No. As of June 2026, Anthropic has not released Claude Sonnet 4.8. The current Sonnet model is claude-sonnet-4-6. Anthropic has not announced a Sonnet 4.8 release date.
Key Takeaways
Claude Opus 4.8's most consequential change is the Fast Mode price drop — $30/$150 to $10/$50 per million tokens makes Opus-class speed economically viable for a much wider range of production agent workloads. The removal of temperature parameters requires a migration step, but the replacement Effort Control API is simpler to reason about. Dynamic Workflows is worth watching in research preview; for teams that can use it, it addresses one of the most common reasons to build custom orchestration on top of a frontier model.
- Model ID: claude-opus-4-8, available May 28, 2026
- Fast Mode is now your default for batch agent work: 3x cheaper than Opus 4.7 Fast Mode, same 2.5x speed gain
- Remove temperature/top_p/top_k before updating model IDs; they return 400 errors on Opus 4.8
- Effort levels low → max replace sampling parameter tuning; start at high, promote to xhigh for complex tasks
- Dynamic Workflows (research preview): fans out hundreds of parallel subagents from a single call, useful for codebase-scale operations in Claude Code
- Benchmark highlights: +27.4pp on USAMO math, +27.8pp on GraphWalks 1M-context retrieval, +8.5pp on Terminal-Bench 2.1
- Near-Mythos alignment: 4x fewer silent code flaws vs. Opus 4.7; the model reports what it does not know