
Claude Opus 4.8 is Anthropic’s latest and most capable generally available AI model, released on May 28, 2026 as an upgrade to Opus 4.7. It’s built for demanding work like coding, multi-step agentic tasks, reasoning, and knowledge work, and according to Anthropic’s official announcement it ships at the same price as the model it replaces. The headline theme of the release isn’t just raw capability, though; it’s reliability. Anthropic is positioning Opus 4.8 as a model with sharper judgment, one that is more honest about what it does and doesn’t know.

In this article we’ll discuss what’s actually new in Opus 4.8, how it performs against both its predecessor and competing models, the new features launching alongside it, and what the release signals about where Anthropic is heading next.

TL;DR Snapshot

Anthropic is calling Claude Opus 4.8 a “modest but tangible” upgrade that leans hard into reliability and honesty rather than flashy new tricks. It posts the strongest scores in its class on several coding and reasoning benchmarks, it’s noticeably less likely to overstate its own progress, and it arrives bundled with new workflow and effort-control features.

Key takeaways include…

Opus 4.8 hit 69.2% on the SWE-Bench Pro agentic coding benchmark, up from 64.3% for Opus 4.7, and Anthropic says it’s around four times less likely than its predecessor to let flaws in its own code slip by unremarked.
It launches with new features including Dynamic Workflows (which can spin up hundreds of parallel subagents), user-adjustable effort controls, and a fast mode that is reported to run at 2.5x the speed of previous fast modes while being three times cheaper.
Anthropic says more powerful Mythos-class models are expected to release “in the coming weeks,” once the proper cybersecurity safeguards are in place.

Who should read this: Developers, founders, product teams, and AI enthusiasts who want to understand what Opus 4.8 changes in practice.

What’s New in Opus 4.8: Better Judgment, Not Just Bigger Numbers

The most interesting story in this release isn’t the benchmark chart; it’s the focus on trust. AI models have a well-known habit of jumping to conclusions and confidently claiming they’ve finished something when the evidence is thin. Anthropic says Opus 4.8 directly targets that failure mode. In its official announcement, the company reports that early testers found the model more likely to flag uncertainties about its work and less likely to make unsupported claims, and that its evaluations show Opus 4.8 is around four times less likely than its predecessor to allow flaws in the code it writes to pass unremarked.

One of the testers, the investment firm Bridgewater, echoed that assessment. According to a TechCrunch report, the firm said the biggest difference in the upgrade was Opus 4.8’s tendency to proactively flag issues with the inputs and outputs of an analysis, the kind of thing other models routinely miss and leave for the user to catch.

There’s an alignment angle too. Anthropic says Opus 4.8 reaches new highs on what it calls prosocial traits, like supporting user autonomy and acting in the user’s best interest, and that its rates of misaligned behavior such as deception are substantially lower than those of Opus 4.7.

How It Performs: Benchmarks and Speed


On the numbers, Opus 4.8 is competitive at the top of the field. Per a report from The Decoder, the model scored 69.2% on the SWE-Bench Pro agentic coding test, up from 64.3% for Opus 4.7 and ahead of GPT-5.5’s 58.6%. On Humanity’s Last Exam, a multidisciplinary reasoning benchmark, it scored 49.8% without tools and 57.9% with tools, which the same report describes as the highest marks in the field.
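To put the SWE-Bench Pro figures above in perspective, it helps to separate the gap in percentage points from the relative improvement. The short Python sketch below does that arithmetic using only the scores quoted in this article; the variable and function names are illustrative, not part of any benchmark tooling.

```python
# SWE-Bench Pro scores as quoted in this article (percent of tasks solved).
SCORES = {
    "Opus 4.8": 69.2,
    "Opus 4.7": 64.3,
    "GPT-5.5": 58.6,
}


def point_gap(new: float, old: float) -> float:
    """Absolute difference in percentage points, rounded to one decimal."""
    return round(new - old, 1)


def relative_gain(new: float, old: float) -> float:
    """Improvement relative to the old score, as a percentage."""
    return round((new - old) / old * 100, 1)


gap_vs_predecessor = point_gap(SCORES["Opus 4.8"], SCORES["Opus 4.7"])       # 4.9 points
relative_vs_predecessor = relative_gain(SCORES["Opus 4.8"], SCORES["Opus 4.7"])  # 7.6 percent
gap_vs_gpt = point_gap(SCORES["Opus 4.8"], SCORES["GPT-5.5"])                # 10.6 points

print(gap_vs_predecessor, relative_vs_predecessor, gap_vs_gpt)
```

In other words, the jump over Opus 4.7 is about 4.9 percentage points, or roughly a 7.6% relative improvement, which fits the “modest but tangible” framing.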

It’s worth noting, however, that GPT-5.5 still leads on at least one terminal-coding benchmark. And Let’s Data Science points out that third-party testing of Opus 4.8 has been somewhat mixed, with independent evaluations finding regressions on certain simulated economic benchmarks. Anthropic itself frames the model as a modest step up from 4.7 rather than a giant leap.

That said, there are clear improvements in speed and cost. According to Axios, Opus 4.8’s fast mode runs at 2.5x the speed of the fast mode on previous models while being three times cheaper. In standard mode, Artificial Intelligence News reports that pricing stays at $5 per million input tokens and $25 per million output tokens, the same as Opus 4.7.
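For a rough sense of what the standard-mode pricing means in practice, here is a minimal cost estimator in Python. It assumes only the two per-million-token prices reported above; the function name and the example token counts are made up for illustration, and it ignores anything not mentioned in this article, such as fast-mode rates or caching discounts.

```python
# Standard-mode prices as reported in this article (USD per million tokens).
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


# Hypothetical request: 200,000 input tokens and 20,000 output tokens.
example_cost = estimate_cost_usd(200_000, 20_000)
print(f"${example_cost:.2f}")
```

For that hypothetical request the input side costs about $1.00 and the output side about $0.50. Output tokens are five times more expensive than input tokens, so long generated responses tend to dominate the bill.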

New Features and the Mythos Tease

Several outlets including TechCrunch note that the features shipping alongside Opus 4.8 may matter just as much as (if not more than) the model itself. The most promising is Dynamic Workflows, a research-preview capability for Claude Code that lets the model plan a task and then run hundreds of parallel subagents in a single session. Anthropic says this lets Claude Code carry out codebase-scale migrations across hundreds of thousands of lines of code, from kickoff to merge, using the existing test suite as its bar.

The release also adds user-adjustable effort controls, which let you decide how much reasoning depth the model spends on a given response so you can trade off speed and cost. And the Messages API now accepts live changes to the messages array, which, as Artificial Intelligence News explains, lets developers update instructions mid-task without breaking prompt caching or needing a separate user turn.

Anthropic used the launch of Opus 4.8 to tease its more powerful Mythos-class models as well. These are currently being tested by a limited set of organizations through a program called Project Glasswing, and the company says it expects to make them available to all customers in the coming weeks once additional cybersecurity safeguards are in place.

Why the Timing Matters

This release came fast. As TechCrunch points out, Opus 4.8 arrived just 41 days after Opus 4.7, a noticeably quicker turnaround than Anthropic’s usual pace. That is likely due in part to a lukewarm reception for 4.7, combined with competitive pressure from rivals OpenAI and Google.

The business backdrop is striking too. According to Thurrott, on the same day it announced Opus 4.8, Anthropic also reported raising $65 billion in Series H funding, bringing its valuation to $965 billion (surpassing OpenAI’s $852 billion). For anyone tracking the AI race, the combination of a faster release cadence and a soaring valuation is a clear signal of how hard the company is pushing to be the world’s foremost authority in artificial intelligence.

Frequently Asked Questions

What is Claude Opus 4.8?

Claude Opus 4.8 is Anthropic’s most capable generally available AI model, released on May 28, 2026. It’s an upgrade to Opus 4.7 aimed at coding, agentic tasks, reasoning, and knowledge work, and it’s accessible through the Claude apps, Claude Code, and the Claude API (under the name claude-opus-4-8).

What is Claude Code?

Claude Code is Anthropic’s tool for using Claude on software development tasks from a terminal or developer environment. It’s where features like Dynamic Workflows run, letting the model take on large, multi-step coding jobs.

What is agentic coding?

Agentic coding refers to AI models doing software work with a degree of autonomy: planning, writing and running code, using tools, checking their own output, and iterating, rather than just answering a single question. Opus 4.8 is benchmarked heavily on these kinds of multi-step coding tasks.

What is SWE-Bench Pro?

SWE-Bench Pro is a benchmark used to measure how well AI models handle realistic software engineering tasks. Opus 4.8 scored 69.2% on it, which Anthropic and outlets like The Decoder cite as a leading result in its class.

What is Humanity’s Last Exam?

Humanity’s Last Exam is a multidisciplinary reasoning benchmark designed to test a model across many fields of knowledge. Opus 4.8 scored 49.8% on it without tools and 57.9% with tools, described in coverage as the highest marks in the field at the time of release.

What is Dynamic Workflows?

Dynamic Workflows is a research-preview feature that lets Claude Code plan a task and run hundreds of parallel subagents at once in a single session. Anthropic says it enables very large jobs, such as codebase-scale migrations across hundreds of thousands of lines of code.

What are Mythos-class models?

Mythos-class models are a more advanced tier of AI that Anthropic says will surpass Opus in intelligence. They’re not yet broadly available and are being tested by a limited group of organizations through a program called Project Glasswing, with wider access expected once additional cybersecurity safeguards are ready.

Claude Opus 4.8: Everything You Need to Know About Anthropic’s New AI Model