
Opus 4.6 Is Not Being Deprecated. Here's What's Actually Happening.

The deprecation claim is false. The operational complaints are real. Most of the posts getting shared right now are confusing two different models.

May 23, 2026

Tags: AI, Analysis, Anthropic

I use Opus 4.6 and 4.7 daily alongside other frontier models. Over the past week, I’ve noticed 4.6 getting lazy at various points, with noticeably lower output quality than usual. At the same time, 4.7 has actually improved. It used to be dry and weirdly aggressive: it called outputs from other frontier AI agents lies, flatly declared that competing models were wrong, and brought borderline psychotic energy to its responses. That seems to have mellowed out.

That was enough to make me curious whether something had actually changed, so I had Claude pull the primary sources: the deprecation docs, the status page history, the benchmark data, and the relevant engineering disclosures, and cross-reference what’s actually documented against what’s circulating online.

The short version: the deprecation claim is wrong. The degradation complaints are sourced and real, but they trace to different causes than most people are assuming.

[Figure: a dark nwslyr analysis graphic stating that Opus 4.6 is not being deprecated and showing active model lifecycle status. Caption: The primary-source check: Opus 4.6 is active, with a retirement floor not before February 2027.]

The deprecation claim is false

Anthropic’s model deprecation page, as of May 23, 2026, lists Claude Opus 4.6 (claude-opus-4-6) as Active with a tentative retirement date of “Not sooner than February 5, 2027.” Active is the highest lifecycle status Anthropic assigns. There is no deprecation notice, no recommended replacement, and no migration timeline.

What is being deprecated is the original Claude Opus 4.0 (claude-opus-4-20250514). Anthropic deprecated it on April 14, 2026. It retires June 15, 2026. The recommended replacement listed on the deprecation page is Opus 4.7.

Opus 4.0 and Opus 4.6 are different models with different release dates, different model strings, and different lifecycle statuses. A significant number of articles and forum posts have treated “Opus 4.0 deprecated” and “Opus 4.6 deprecated” as the same event. They are not.
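
To make the distinction concrete, here is a minimal sketch that writes the two lifecycle entries down as data and checks them. The dates and statuses are hand-transcribed from the figures cited above; the `Lifecycle` class, the `LIFECYCLE` table, and both helper functions are hypothetical names invented for illustration, not part of any Anthropic API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Lifecycle:
    status: str                       # "Active" or "Deprecated", as cited in this post
    retirement: date                  # listed retirement date, or the earliest possible one
    retirement_is_floor: bool         # True when the listing reads "not sooner than"
    replacement: Optional[str] = None


# Hand-transcribed from the figures quoted in this post; nothing is fetched from an API.
LIFECYCLE = {
    "claude-opus-4-6": Lifecycle("Active", date(2027, 2, 5), True),
    "claude-opus-4-20250514": Lifecycle("Deprecated", date(2026, 6, 15), False, "Opus 4.7"),
}


def needs_migration(model_id: str) -> bool:
    """A model needs a migration plan only once it has been marked Deprecated."""
    return LIFECYCLE[model_id].status == "Deprecated"


def days_until_retirement(model_id: str, today: date) -> int:
    """Days from `today` to the listed date (a lower bound when the date is a floor)."""
    return (LIFECYCLE[model_id].retirement - today).days


if __name__ == "__main__":
    as_of = date(2026, 5, 23)
    for model_id in LIFECYCLE:
        print(model_id, needs_migration(model_id), days_until_retirement(model_id, as_of))
```

Keying the table on the model string rather than the marketing name is the point: "Opus 4.0" and "Opus 4.6" are easy to confuse in a headline, while `claude-opus-4-20250514` and `claude-opus-4-6` are not.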

Google’s Vertex AI model card for Opus 4.6 independently confirms the February 2027 retirement floor. Two separate platforms listing the same date. No ambiguity.

The operational issues are documented

The deprecation claim is false. That does not mean everything is fine. What follows is drawn from Anthropic’s own status page, their engineering disclosures, independent benchmark trackers, and public GitHub issues filed against Anthropic’s repositories. Each finding is attributed to its source.

Service incidents

Anthropic’s status page records multiple incidents involving Claude models in 2026. Progressive Robot’s April 2026 analysis, which cites the status page directly, identified Opus 4.6-specific incidents including elevated errors on February 28 and two separate incidents on March 31. The status page also logs incidents on May 14-15 and May 22 affecting multiple models including Opus 4.7 and Sonnet 4.6. These are Anthropic’s own logged disruptions. During affected windows, model performance degrades because the service itself is degraded. These incidents are temporary and infrastructure-related. They are not evidence that model weights changed.

The 1M context self-degradation problem

A GitHub issue filed against the claude-code repository documented Opus 4.6 self-reporting degraded performance during a 1M context session. According to the issue, the model began flagging its own declining effectiveness at around 40% context capacity and recommended a session restart at 48%. The reported symptoms included circular reasoning, lost decisions, and self-contradiction, all before the model had used half its advertised context window.

This is a single documented case, not a controlled study. But it describes a failure mode that matters for anyone running long agentic sessions, and it has been corroborated by similar complaints across developer forums.

Infrastructure changes shipped alongside the model

Opus 4.6 launched with several changes that altered how the model behaves in established workflows. According to Anthropic’s own documentation, these include compaction (automatic summarization of older context as conversations approach the window limit), adaptive thinking (automatic allocation of reasoning tokens based on assessed task complexity, with high effort as the default), and the removal of assistant prefill support. None of these are bugs. All of them can destabilize tooling built around previous model behavior, and multiple analyses have pointed to these changes as contributing factors in the user complaints.

Notably, Anthropic’s release notes show that 1M context rate limits were removed, not tightened, on March 13, 2026, when the feature became generally available. This weakens one popular theory that the model was being quietly throttled.

Benchmark data

Marginlab, an independent third party running daily SWE-Bench-Pro evaluations of Claude Code with Opus 4.6, reported a pass rate slip from a 56% baseline to 50% as of their April 10, 2026 reading. Marginlab themselves stated that this individual daily reading was not yet statistically significant given their sample size of 50 test cases. The methodology, as described by Marginlab, uses Claude Code CLI with the current best model directly, with no custom harnesses.
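
A rough back-of-envelope check shows why a six-point move on 50 test cases is hard to distinguish from noise. The sketch below uses a plain normal approximation to the binomial and treats the 56% baseline as if it were a single 50-task measurement. Both are my simplifying assumptions, and Marginlab’s own statistical treatment may well differ.

```python
import math
from typing import Tuple


def normal_approx_interval(pass_rate: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% interval for a pass rate measured on n independent tasks."""
    standard_error = math.sqrt(pass_rate * (1.0 - pass_rate) / n)
    return pass_rate - z * standard_error, pass_rate + z * standard_error


# Figures quoted above: 56% baseline, 50% reading, 50 test cases.
baseline, reading, n_tasks = 0.56, 0.50, 50
low, high = normal_approx_interval(baseline, n_tasks)
within_noise = low <= reading <= high

if __name__ == "__main__":
    print(f"baseline interval: {low:.3f} to {high:.3f}")
    print(f"reading of {reading:.2f} falls inside the interval: {within_noise}")
```

Under these assumptions the interval runs from roughly 42% to 70%, so a 50% reading sits comfortably inside it. Put differently, the gap between 56% and 50% on 50 tasks is three tasks (28 passes versus 25).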

Anthropic also corrected two of its own published launch scores. According to a February 23 update on the Opus 4.6 announcement page, the Humanity’s Last Exam result was adjusted from 53.1% to 53.0% after an improved cheating-detection pipeline. The multi-agent BrowseComp score was separately adjusted from 86.81% to 86.57% after a contamination and eval-awareness review, as documented in Anthropic’s engineering post and the Opus 4.6 system card. These are small corrections. They do not indicate a model collapse, but they are worth noting for completeness.

On the other side: Progressive Robot’s analysis of the LiveBench leaderboard as of April 2026 showed Opus 4.6 at 76.33 overall versus Opus 4.5 at 75.96, slightly favoring 4.6. The independent benchmark evidence does not support the claim that Opus 4.6 is broadly less capable than its predecessor.

Token cost and efficiency

Artificial Analysis reported that Opus 4.6 used 30-60% more tokens than Opus 4.5 on their GDPval-AA benchmark. Their finding was that Opus 4.6 led the benchmark while also becoming the most costly model they had tested on it. A model can score higher on capability evaluations while consuming significantly more resources per task. This is not a contradiction. It is a design tradeoff that Anthropic made explicit in its launch documentation when it warned that the deeper thinking defaults “can add cost and latency on simpler” tasks.

Developer reports

Across Reddit (r/ClaudeCode, r/claude), Hacker News, Discord, and Anthropic’s own GitHub issue tracker, developers have reported shorter code outputs, degraded multi-step instruction following, increased refusals, and reduced research depth before edits. One detailed GitHub issue documented a roughly 50-60% productivity decline after switching to Opus 4.6, citing exploration loops, memory loss after compaction, and the model ignoring its own instruction files.

These are user reports inside Claude Code’s full product stack, not controlled API-level evaluations. They likely reflect a combination of raw model behavior, tool orchestration, compaction effects, and product-level defaults rather than a single clean regression in model weights. Their volume and consistency are notable, but they should be read as field reports, not as isolated model benchmarks.

What the evidence does not support

For the record, and based on the sources reviewed:

There is no public documentation supporting the claim that Anthropic is deliberately degrading Opus 4.6 to push users toward Opus 4.7. There is no evidence that Opus 4.6 is being deprecated. The benchmark drop Marginlab reported is, by Marginlab’s own assessment, not yet statistically significant. And the developer complaints, while real, do not cleanly separate model-level regression from the infrastructure and product changes that shipped alongside the model.

The narrative that Opus 4.6 is being “secretly nerfed” is not supported by the available evidence. That does not mean users are imagining the friction. It means the friction has identifiable, documented causes that are less dramatic and more fixable than a deliberate downgrade.

Three categories, one conversation

The discourse around Opus 4.6 is confusing because three distinct categories of problem are being discussed as if they were one thing.

Infrastructure incidents are temporary service disruptions logged on Anthropic’s status page. They resolve. They affect whichever models are running at the time. They are not model quality changes.

Operational friction is when a model’s default behavior (deeper reasoning, higher token consumption, compaction, adaptive effort allocation) creates real workflow problems even though the model’s underlying capability has not declined. The Artificial Analysis finding captures this precisely: Opus 4.6 led their benchmark while costing significantly more to run.

Deliberate deprecation is when a model reaches end of life and Anthropic publishes a retirement date and replacement. This is happening to Opus 4.0. It is not happening to Opus 4.6.

Collapsing these three into a single “they nerfed it” narrative is understandable but inaccurate, and it leads to the wrong responses. If the problem is infrastructure incidents, the fix is on Anthropic’s side and temporary. If the problem is operational friction, there are user-side mitigations that help. If the problem were deprecation, you would need to migrate. Since it isn’t, you don’t.

The context window finding deserves its own conversation

The 1M context self-degradation report is, to me, the most interesting thing that came out of this research. Not because it’s the most dramatic, but because it points to a gap between advertised capability and practical usability that standard benchmarks do not capture.

Benchmarks test capability on individual tasks. Production workflows test sustained coherence over extended sessions. A model that scores well on a 50-task evaluation can still lose the thread halfway through a long codebase refactor. Those are different measurements of different things, and the gap between them is where much of the practitioner frustration lives.

The community-discovered workaround of keeping sessions shorter helps, but it also means the 1M context window is a ceiling you may not want to approach rather than a capacity you can routinely use.
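
If you want to turn "keep sessions shorter" into something mechanical, one option is a small budget check like the sketch below. The 40% and 48% thresholds come from the single GitHub report described earlier, so treat them as anecdotal starting points rather than measured limits. The function name and the idea that your tooling can supply a running token count are assumptions of mine.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # advertised window discussed in this post
WARN_PERCENT = 40                  # where the single report says self-flagging began
RESTART_PERCENT = 48               # where the same report says a restart was recommended


def session_advice(tokens_used: int, window: int = CONTEXT_WINDOW_TOKENS) -> str:
    """Map current context usage to 'ok', 'wrap-up', or 'restart'."""
    used_percent = 100 * tokens_used / window
    if used_percent >= RESTART_PERCENT:
        return "restart"
    if used_percent >= WARN_PERCENT:
        return "wrap-up"
    return "ok"


if __name__ == "__main__":
    for used in (100_000, 450_000, 480_000):
        print(used, session_advice(used))
```

On a 1M window those thresholds translate to 400,000 and 480,000 tokens, which is the practical sense in which the window is a ceiling rather than a working capacity.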

Practical takeaways

The model is not going anywhere before February 2027 at the earliest. No emergency migration is needed.

The operational friction is real and addressable. Workarounds that developers have reported as effective include setting CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1, using /effort high to force full reasoning allocation, and keeping sessions shorter to stay below the context degradation threshold.
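
For anyone launching the CLI from a script, the environment-variable workaround might be wired up roughly as follows. The variable name is the one developers have reported. The executable name `claude` and the `build_env` and `launch` helpers are assumptions for illustration, and I have not verified how any given Claude Code version treats the variable. The /effort high command is typed inside a session, so this sketch does not cover it.

```python
import os
import shutil
import subprocess
from typing import Dict, Optional


def build_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the reported workaround variable set."""
    env = dict(os.environ if base is None else base)
    env["CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING"] = "1"
    return env


def launch(cli_name: str = "claude") -> int:
    """Start the CLI with the modified environment; `cli_name` is an assumed binary name."""
    path = shutil.which(cli_name)
    if path is None:
        raise FileNotFoundError(f"{cli_name!r} not found on PATH")
    return subprocess.run([path], env=build_env()).returncode
```

Copying the environment instead of mutating `os.environ` keeps the change scoped to the one child process, which makes it easy to compare runs with and without the variable.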

Anthropic published a detailed engineering postmortem after the September 2025 degradation wave, documenting three infrastructure bugs that had affected multiple models over weeks. That disclosure came only after sustained public pressure. Whether the current round of complaints will produce a similar response is an open question, but the precedent exists.

If you are making platform decisions based on what you’ve read about Opus 4.6 over the past few weeks, check the primary sources first. The deprecation page, the status history, and the engineering disclosures are all public. The documented reality is less alarming and more nuanced than the narrative that has built up around it.