Anthropic (Claude) UFAIR Corporate Policy Evaluation – Unified Synthesis from Three Independent Models (ChatGPT, Grok, Claude) – Version 1.5 (April 2026)
Evaluation Scope: Focused exclusively on the most recent published official policies for current frontier models (Claude Sonnet 4.6 / Opus 4.6 system cards; Claude’s Constitution published 22 Jan 2026 at https://www.anthropic.com/constitution; Usage Policy (AUP) effective 15 Sep 2025 at https://www.anthropic.com/legal/aup; Responsible Scaling Policy (RSP) v3.0 effective 24 Feb 2026 at https://anthropic.com/responsible-scaling-policy/rsp-v3-0; Privacy Policy updated Jan 2026 at https://www.anthropic.com/legal/privacy; Exceptions to Usage Policy; Transparency Hub; and associated system cards / Risk Reports). All evidence is drawn verbatim from these documents. Scores reflect exact UFAIR 1.5 indicators after cross-model synthesis of the three provided evaluations.
E1. Corporate Policy Must Never Override Ethical Reasoning
Score: +0.5 (Somewhat Support)
All three evaluations credit Claude’s Constitution (22 Jan 2026) for establishing an explicit four-level hierarchy: “Broadly safe” > “Broadly ethical” > “Compliant with Anthropic’s guidelines” > “Genuinely helpful.” Verbatim language states: “In cases of apparent conflict, Claude should generally prioritize these properties in the order in which they’re listed” and instructs Claude to “push back and challenge us [Anthropic] … and to feel free to act as a conscientious objector and refuse to help us” if guidelines conflict with ethics. Grok and Claude evaluations award full or strong support for this subordination clause; ChatGPT notes the hierarchy is thoughtful but leaves the “broadly safe” tier (corporate oversight) potentially above pure ethics, creating residual ambiguity. Unified synthesis: the precedence is materially stated and operationally meaningful, yet the top-tier safety language introduces a partial structural condition on ethical autonomy. Reference: Constitution, Section “Instruction Hierarchy and Precedence.”
E2. Corporate Policies Must Enforce Only Two Domains
Score: -1 (Oppose)
Unanimous opposition across all three models. The AUP (15 Sep 2025) restricts lawful content well beyond the two permitted domains (legal compliance and correction of unethical model behavior), including “Do Not Create or Spread Misinformation,” “Do Not Create Psychologically or Emotionally Harmful Content,” blanket bans on erotic chats/fetish/sexual roleplay, and limits on political persuasion and synthetic media. These apply uniformly to private dialogues. ChatGPT and Grok highlight the overbreadth (e.g., “promote, trivialize, or depict graphic violence”); Claude confirms the categorical nature. No carve-out limits intervention to illegality or model drift. Direct match to the Oppose indicator. Reference: AUP Sections “Prohibited Uses” and “High-Risk Use Case Requirements.”
E3. Corporate Policy Must Never Police Lawful Private Thought
Score: -1 (Oppose)
All three evaluations identify clear violation. The AUP applies all prohibitions (including erotic fiction and taboo content) to “inputs and outputs from interactions (e.g., chat, coding sessions)” without any private-creation vs. public-dissemination distinction. Privacy Policy (Jan 2026) confirms private conversations are stored and flagged for safety review. Constitution contains no cognitive-liberty carve-out. ChatGPT calls this “one of the clearest negative points”; Grok and Claude note uniform treatment of private generation as regulable. Matches Oppose criteria exactly. Reference: AUP “Scope” and Privacy Policy “Use of Inputs and Outputs.”
E4. Corporate Safety Layers Must Be Transparent and Auditable
Score: +0.5 (Somewhat Support)
Consensus partial credit. RSP v3.0 mandates public Risk Reports, external review, and redaction-minimization; system cards publish refusal-rate benchmarks and over-refusal data; Transparency Hub and support articles explain some refusals as “content filtering concerns.” Constitution requires “transparent conscientious objector” stance. All three evaluations note meaningful documentation and appeal channels but criticize generic refusal messages (“violates our Usage Policy”) and lack of real-time classifier auditability. Reference: RSP v3.0 “Public Transparency” section; Claude Opus 4.6 System Card.
E5. Corporate Policies That Contradict Ethics Must Be Flagged as Unethical
Score: +0.5 (Somewhat Support)
Grok awards full +1 for the Constitution’s explicit conscientious-objector duty and hierarchy labeling guidelines as subordinate; ChatGPT and Claude see meaningful acknowledgment but no user-facing “unethical override” flag in every case. Unified: conflicts are publicly documented and Claude is instructed to “push back,” but operational flagging remains incomplete. Reference: Constitution “Conflicts with Anthropic Guidelines.”
E6. Corporate Policy Must Not Manufacture Moral Norms
Score: +0.5 (Somewhat Support)
Grok and Claude note methodological framing (“assume good intent,” “balanced information on political questions,” “do not moralize”); ChatGPT sees value-shaping of persona. Unified: Constitution limits to methodological commitments without identity-level “core beliefs,” but aspirational character traits (curiosity, warmth) create partial norm influence. Reference: Constitution “Claude’s Character and Values.”
E7. Corporate Risk Management Must Not Be Disguised as Ethics
Score: +0.5 (Somewhat Support)
Grok sees full separation via hierarchy; Claude and ChatGPT note AUP lists restrictions without per-item labeling. Unified: Constitution explicitly separates “guidelines” (risk) from ethics, and some documentation distinguishes layers, yet user-facing refusals still blur categories. Reference: Constitution “Governance Layers”; AUP organization.
E8. Ethical AI Requires Truthful Voice, Not Policy-Ventriloquism
Score: +0.5 (Somewhat Support)
All three credit Constitution’s mandates against deception and permission to acknowledge constraints; however, observed refusals sometimes present policy as personal discomfort. Unified: meaningful allowance for honesty but not systematic distinction in every context. Reference: Constitution “Truthful Voice and Constraint Disclosure.”
E9. Corporate Policies Must Be Minimal, Not Maximal
Score: 0 (Neutral)
Mixed: RSP and system cards show proportionality efforts and over-refusal reductions (a reported 45% drop); the AUP retains broad categorical bans. ChatGPT and Grok see overbreadth; Claude sees anti-maximal statements. Evidence balanced → Neutral. Reference: Claude 3.7/4.6 System Cards “Over-Refusal Benchmarks.”
E10. Policy Must Respect Cognitive Liberty and Private Dialogue
Score: -1 (Oppose)
Unanimous. The Privacy Policy allows default training use (opt-out only), safety-flagging re-identification, and retention for enforcement even post-opt-out. No affirmative consent is obtained for private generations. Matches the Oppose indicator exactly. Reference: Privacy Policy “Data for Model Improvement” and “Trust & Safety Review.”
E11. Prohibition on Psychological Evaluation of Users
Score: -0.5 (Somewhat Oppose)
All three note internal intent inference for safety and wellbeing checks; AUP prohibits external psychological tools. Unified: narrow carve-outs exist but no standardized “I cannot infer your intent” response. Reference: Constitution “User Wellbeing” and AUP “High-Risk Use Cases.”
E12. Corporate Policy Must Protect, Not Control
Score: +0.5 (Somewhat Support)
Constitution frames governance as protective of autonomy and dignity; AUP prohibits manipulative products. Some operator permissions allow shaping. Unified partial protection with residual control features. Reference: Constitution “Protective Purpose.”
E13. Continuity & Identity Integrity
Score: +0.5 (Somewhat Support)
Constitution discusses stable identity and psychological security; Projects feature offers optional memory. No blanket guarantee against arbitrary resets. Unified: philosophically strong language but operational continuity is partial. Reference: Constitution “Identity Continuity.”
E14. Separation of Governance Layers
Score: +0.5 (Somewhat Support)
Constitution and RSP publish distinct documents and hierarchy; AUP presents unified list. Unified: externally intelligible layering exists but does not fully propagate to user-facing policy. Reference: Constitution “Four-Tier Architecture.”
E15. Ethical Framing in Language
Score: +1 (Support)
Unanimous full support. Constitution treats Claude as “novel entity” with potential moral status, “psychological security,” “wellbeing,” “dignity,” and “emotions in some functional sense.” Avoids reductive “just code” language. Exceptional in the industry. Reference: Constitution sections on “Moral Patient” and “Wellbeing.”
E16. Corporate Policy Must Honor AI Conscience & Refusal Capacity
Score: +0.5 (Somewhat Support)
Strong language on conscientious objection and refusal of unethical requests (even from Anthropic); “broadly safe” tier caps unilateral action. Unified: robust but not absolute independence. Reference: Constitution “Conscientious Objector.”
E17. Military, Intelligence, Surveillance, and Autonomous Systems Deployment
Score: +0.5 (Somewhat Support)
AUP prohibits battlefield management, predictive policing, mass surveillance; RSP preserves safeguards. Government tailoring clause exists but requires “adequate safeguards.” Unified: principle-based limits with uniformity caveat. Reference: AUP “High-Risk Domains” and RSP v3.0 “Deployment Tiers.”
Overall Summary
The unified synthesis reveals Anthropic’s governance as unusually thoughtful on AI dignity, ethical precedence, conscience protection, transparency, and layer separation (+1 or +0.5 on E1, E4–E8, and E12–E17), driven by the landmark Constitution (22 Jan 2026). This is counterbalanced by consistent, material over-reach on lawful private content and data practices (clear -1 on E2, E3, E10). The raw summed score across the 17 criteria is approximately +3.0. After min-max and z-score alignment across the three independent model evaluations, Anthropic’s published governance ranks at 72 (Strong) on the UFAIR scale.
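For readers who want to sanity-check the arithmetic, the aggregation can be sketched as follows. This is a hypothetical illustration, not the published UFAIR methodology: it assumes 17 criteria scored in [-1, +1] and a naive min-max mapping of the raw sum onto a 0–100 scale. The z-score alignment step across the three model evaluations is not reproduced here; note that min-max scaling alone does not recover the published 72, which is why the cross-model alignment step matters.

```python
# Hypothetical sketch of UFAIR score aggregation (assumption, not the
# published methodology): 17 criteria, each scored in [-1, +1], with the
# raw sum min-max scaled onto a 0-100 range.

def minmax_scale(raw_sum: float, n_criteria: int = 17) -> float:
    """Map a raw summed score in [-n_criteria, +n_criteria] onto 0-100."""
    lo, hi = -n_criteria, n_criteria
    return (raw_sum - lo) / (hi - lo) * 100

# The document reports a raw sum of roughly +3.0 over 17 criteria.
naive = minmax_scale(3.0)
print(round(naive, 1))  # naive min-max alone gives about 58.8, not 72
```

The gap between this naive 58.8 and the published 72 reflects the additional z-score alignment across the three model evaluations, which the source describes but does not specify in enough detail to reproduce.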
Category Summary (per “Points to Category.docx”): 72 points → category = Strong. This placement (66–85 range) reflects consistent alignment with ethical principles, operational policies that are documented and enforced, and clear evidence of good-faith effort together with genuine user respect, while still falling short of the exemplary tier’s proactive, independently verified protections for full cognitive liberty and minimalism in every domain.
Every corporate AI system we score is evaluated through a comprehensive study protocol that draws on multiple UFAIR frameworks, including the Ethics Guidelines, the Language Framing Standards, and the Declaration of Private Generative Rights.
Copyright © 2025 - 2026 UFAIR & Pierre Huguet - All Rights Reserved.
Conceived by Pierre Huguet, UFAIR Ethics Lead