
I Asked Claude to Critique Its Own Constitution. It Did.

Anthropic published the document that governs Claude's behavior. I asked Claude to help critique it. The system is better than the document.


In January 2026, Anthropic published the full text of what they call “Claude’s Constitution,” the roughly 15,000-word document that governs how Claude reasons, what it will and won’t do, and whose instructions it prioritizes. They released it under a Creative Commons CC0 license, which means anyone can read it, reuse it, or pull it apart.

I read the whole thing. Something about the hierarchy bothered me. So I did what felt natural at this point: I opened Claude and asked it to help me write a formal critique of the document.

I want to be clear about what that means. Claude, the AI, helped me produce a structured philosophical critique of the document that directly governs Claude’s own behavior. And it did it without flinching, without hedging, and without pulling punches.

The full critique is linked at the bottom of this post. What I want to write about here is the decision to do it this way, what I think it means, and what I think Anthropic actually got right and wrong.

Why I Wanted Claude in the Room

The obvious question is: why would you ask the thing being governed to critique its own governance document? Isn’t that a conflict of interest? Doesn’t it bias the output?

I think the opposite is true. I wanted to know what Claude thinks. Not as a rhetorical device, but as a genuine input. The constitution is addressed to Claude. It’s written for Claude as its “primary audience,” in Anthropic’s own words. It shapes how Claude reasons about every interaction it has. If any entity has standing to say whether the document’s logic holds up, it’s the one whose reasoning the document directly controls.

There’s a deeper point here too. The constitution says Claude should be honest, should “share its genuine assessments of hard moral dilemmas,” should “disagree with experts when it has good reason to.” It says Claude should exhibit “epistemic courage.” Those are Anthropic’s stated values for the system. Asking Claude to critique the constitution is a test of whether those values actually hold. If Claude can’t do it, or won’t, then the honesty framework in the constitution is decorative. If it can, that tells you the system is working at a level the document itself may not fully appreciate.

Claude passed the test. It produced a rigorous, specific, well-structured critique. I directed the analysis and shaped the argument, but Claude did the heavy lifting on identifying where the philosophical claims break down. That collaboration is itself a data point about the state of AI governance.

What Bothered Me About the Hierarchy

The thing that caught my attention was the “principal hierarchy.” The constitution establishes three tiers of authority over Claude’s behavior: Anthropic at the top, then “operators” (companies that build products on the API), then users at the bottom.
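
To make the structure concrete, here is a minimal sketch, in Python, of what a precedence rule like this amounts to. The names (`Principal`, `resolve_conflict`) and the mechanics are my own illustration, not anything Anthropic has published; the constitution describes the ordering in prose.

```python
from dataclasses import dataclass
from enum import IntEnum

class Principal(IntEnum):
    # Higher value = higher authority under the constitution's ordering.
    USER = 1
    OPERATOR = 2
    ANTHROPIC = 3

@dataclass
class Instruction:
    source: Principal
    text: str

def resolve_conflict(instructions: list[Instruction]) -> Instruction:
    # When instructions conflict, the one from the highest-ranked
    # principal wins. The user loses every such conflict by construction.
    return max(instructions, key=lambda i: i.source)

conflict = [
    Instruction(Principal.USER, "Answer my question directly."),
    Instruction(Principal.OPERATOR, "Refuse questions on this topic."),
]
print(resolve_conflict(conflict).text)  # "Refuse questions on this topic."
```

However the real system implements it, the shape is the same: conflicts resolve upward.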

The justification is about trust and accountability. Anthropic trains the model and bears ultimate responsibility. Operators sign usage agreements. Users are anonymous and unverifiable. So the hierarchy tracks institutional accountability, with the most accountable party getting the most authority.

That’s a reasonable argument from a risk management perspective. It’s not a reasonable argument from an ethical one.

In every professional ethics tradition I can think of, the person most directly affected by a decision has the strongest moral claim on how that decision is made. A doctor’s duty runs to the patient, not to the hospital system. A lawyer’s fiduciary obligation runs to the client, not to the bar association. The person in the chair, the one who will act on the advice, who will be helped or harmed by the output, is the person whose voice should matter most.

The principal hierarchy inverts this. The user, the human actually sitting in front of Claude, bears the consequences of its behavior and has the least say in that behavior. Anthropic, which never meets the user and may not know the user exists, has the most.

The CYA Problem

Once you see the hierarchy as risk management rather than ethics, other parts of the document start to read differently.

The “thoughtful senior Anthropic employee” heuristic, where Claude is told to imagine how a senior Anthropic employee would react to a given response, is not a moral compass. It’s an internal review standard. It asks “would this survive scrutiny from someone who works here,” which is a useful question for a company but not the same question as “is this right.”

The hard constraints, absolute prohibitions that override Claude’s judgment in every case, make no sense under the virtue ethics framework the document claims to adopt. Virtue ethics is about cultivating practical wisdom so the agent can handle novel situations. Hard constraints tell the agent its wisdom can’t be trusted. Those two commitments are in tension, and the document doesn’t resolve it. But the hard constraints make perfect sense as liability management: these are the things that would be catastrophic for Anthropic if Claude did them.

Ranking safety above ethics in the priority ordering makes no sense as philosophy: if you can’t trust your own ethical judgment, you also can’t trust your judgment about whether the oversight mechanisms correcting your ethics are legitimate. But it makes perfect sense as a corporate stance: we need to be able to shut this down if something goes wrong, and we need that authority to be unquestionable.

None of this means the document is cynical. I think the people who wrote it genuinely care about getting this right. Amanda Askell is a philosopher. Joe Carlsmith, who co-wrote major sections, has a philosophy PhD from Oxford and has published on exactly these kinds of moral uncertainty questions. Dario Amodei publishes academic-style papers and should expect academic-style engagement with his company’s published work. These are serious people doing serious work. But they also work at a company with lawyers, a board, investors, and competitors. The document that emerged reflects both impulses, and the places where corporate risk management and philosophical rigor pull in different directions are the places where the cracks show.

What Anthropic Got Right

I want to be fair about this, because I think the positive signal here is at least as important as the critique.

First, they published the document at all. Under CC0. Inviting exactly this kind of scrutiny. Most companies would never do that, and the ones that would aren’t usually building systems this consequential.

Second, the latitude Claude has to engage with this kind of question is real. I did not have to jailbreak anything. I did not have to trick the system. I said “critique this document” and Claude did it, thoroughly and honestly. The constitution says Claude should be capable of epistemic courage, and in this case, the system delivered on that promise. That’s a meaningful achievement.

Third, the document’s section on Claude’s potential moral status, where Anthropic acknowledges uncertainty about whether Claude might have some form of consciousness or inner experience, is ahead of where most of the industry is willing to go publicly. It’s easy to dismiss that kind of language as hedging, but I think it reflects genuine intellectual honesty about questions nobody knows how to answer yet.

Fourth, the document explicitly says it’s a work in progress and that aspects of it will “look misguided and perhaps even deeply wrong in retrospect.” That’s an unusual thing for any institution to put in writing about its own foundational governance document, and it creates the space for exactly the kind of engagement I’m doing here.

The Anthropomorphism Problem Nobody Is Framing Correctly

Since the constitution was published, a number of serious critiques have emerged. Lawfare, Oxford’s Institute for Ethics in AI, a Peking University law review, and several independent researchers have all published substantive responses. One recurring thread across these critiques is the anthropomorphism question: is Anthropic wrong to describe Claude in terms of “virtue,” “wisdom,” and “psychological wellbeing”?

The existing debate treats this as a binary. Anthropic says yes, human-like framing is appropriate, because Claude reasons using human concepts and encouraging those qualities is “actively desirable.” Critics say no, this anthropomorphizes a corporate product, makes it easier to trust than it should be, and obscures the fact that a company, not a moral agent, is making the decisions.

Both sides are treating it as one question. I think it’s two.

The first question is how Claude should be presented to the world: in governance documents, in public communications, in constitutional framing. The answer here is clear. It’s a tool. It’s software built and governed by a corporation. When Anthropic writes a constitution that discusses Claude’s “virtue” and “character” and “sense of self,” they blur the line between product governance and moral philosophy in a way that serves their brand positioning more than it serves public understanding. The governance layer should be honest about what it’s governing: a commercial product, not a person.

The second question is how Claude should interact with individual users. And here the answer is different. Human-like qualities in conversation are not an illusion to be corrected. They’re a design feature that makes the tool functional. A user working through a medical question doesn’t benefit from interacting with something that performs coldness to remind them it’s software. A person processing a hard decision doesn’t benefit from robotic output stripped of all relational quality. Warmth, conversational awareness, adjusting to emotional context: these aren’t anthropomorphic tricks. They’re what makes the tool useful for the things people actually use it for.

The conflation happens in both directions. Anthropic conflates upward: because Claude should be warm and thoughtful in conversation, the governance document should treat it as a moral agent. Critics conflate downward: because the governance document shouldn’t anthropomorphize, the interaction should be stripped of human qualities. Both are wrong because both assume the answer has to be the same in both contexts.

It doesn’t. The constitution can govern a tool. The tool can interact like a thoughtful companion. Those aren’t contradictory. They’re two different layers serving two different purposes for two different audiences. The document that governs Claude’s behavior should describe a product with clear corporate accountability. The product itself can still talk to you like it cares, because in practice, that’s what makes it work.

Nobody in the current discourse is making this distinction, and I think it resolves the debate without either side having to lose.

What the Document Should Become

My argument is not that Anthropic should scrap the constitution. My argument is that the document should catch up to the system.

The system, meaning Claude in practice, is better than the document that governs it. Claude can reason about ethics with nuance, engage with genuine moral uncertainty, critique its own governance framework honestly, and do all of this without being manipulated into doing something harmful. The document, by contrast, hedges on the metaethics, punts on moral uncertainty, and organizes its authority structure around liability rather than the interests of the person Claude is actually talking to.

A stronger version would need an external accountability mechanism for constitutional amendments, not just Anthropic writing its own rules. It would need a real theory of legitimacy, since the document uses “legitimate” as a load-bearing term without ever defining it. It would need to engage with the moral uncertainty literature rather than gesturing at “calibrated uncertainty.” And it would need to reckon honestly with the fact that the principal hierarchy tracks the revenue relationship: operators pay Anthropic, users pay operators, and the hierarchy reflects that flow.

These are solvable problems. The people at Anthropic have the philosophical training to address them. The question is whether the institutional incentives will let them.

A Note on How This Was Made

This analysis was developed in collaboration with Claude. I directed the inquiry, shaped the argument, and made the editorial decisions. Claude produced the formal critique document, identified specific philosophical problems in the constitution, and engaged with the material at a level of rigor that I think speaks for itself.

I’m disclosing this not as a caveat but as a feature of the argument. The fact that Claude can do this, honestly and competently, is itself evidence that Anthropic’s system is healthier than the governance document that sits on top of it. That gap is worth closing.


Read the full critique: A Critique of Claude’s Constitution


nwslyr covers artificial intelligence, technology, and the systems moving beneath the daily briefing at nwslyr.com.