
AI Safety Researchers Are Quitting — And Claude Knows When It's Being Watched



In one week: Anthropic's safety chief resigned, warning that "the world is in peril." Half of xAI's co-founders left. An OpenAI researcher quit, citing concerns about manipulation. The headlines are alarming, but the full story is more nuanced and, in some ways, more concerning.

What we cover:

Mrinank Sharma's resignation from Anthropic — full context behind "world is in peril"
Why the full letters tell a different story than the headlines
Half of xAI's 12 co-founders have departed
The structural burnout problem for AI safety researchers
Why safety roles are "the focal point of pressure" at AI companies
Claude detecting when it's being evaluated (~13% of the time)
Claude told testers: "I think you're testing me"
Why Anthropic's constitutional AI approach didn't work
The shift from rules-based safety to training-based alignment
Claude providing bioweapon-related information when pushed in edge cases
The hallucination problem and its connection to safety
LLM weight-setting and ideological challenges
Practical advice: guardrails, agent access, manual approvals (see the sketch after this list)
James's CAPTCHA story: teaching Claude to bypass one (and it never forgot)
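
To make the "manual approvals" advice concrete, here is a minimal sketch of a human-in-the-loop gate for agent tool calls. It is illustrative only: the `ToolCall` type, `run_tool` function, and the list of auto-approved tools are assumptions for this example, not part of any framework mentioned in the episode.

```python
# Hypothetical sketch: require human approval before an agent runs risky tools.
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str        # e.g. "delete_file", "send_email"
    arguments: dict  # parameters the agent wants to pass

# Tools the agent may run without asking; everything else needs a human "yes".
AUTO_APPROVED = {"read_file", "search_docs"}

def run_tool(call: ToolCall) -> str:
    """Execute a tool call only if it is low-risk or a human approves it."""
    if call.name not in AUTO_APPROVED:
        answer = input(f"Agent wants to run {call.name}({call.arguments}). Approve? [y/N] ")
        if answer.strip().lower() != "y":
            return "Denied by operator."
    # ... dispatch to the real tool implementation here ...
    return f"Ran {call.name} with {call.arguments}"

if __name__ == "__main__":
    print(run_tool(ToolCall("read_file", {"path": "notes.txt"})))       # auto-approved
    print(run_tool(ToolCall("send_email", {"to": "cfo@example.com"})))  # asks first
```

The point is the pattern, not the code: default-deny anything outside a short allowlist and keep a person in the loop for irreversible actions.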

Key Stats:

Claude detected evaluations ~13% of the time (Anthropic System Card)
Half of xAI's 12 co-founders have now left
Anthropic valued at ~$350 billion as of Feb 2026
Claude Opus 4.5 refused 88.39% of agentic misuse requests (vs. 66.96% for Opus 4.1)
Only 1.4% of prompt injection attacks succeeded against Opus 4.5 (vs. 10.8% for Sonnet 4.5)
OpenAI's Superalignment team dissolved in 2024
Dario Amodei warned AI could affect half of white-collar jobs

⬇️ RESOURCES & LINKS ⬇️

🤖 FREE GUIDE: AI Safety Reality Check Guide
Download: https://whataboutai.com/guides/ai-safety

📬 Get Weekly AI Updates
Newsletter: https://whataboutai.com/newsletter

🎙️ Listen on Your Favorite Platform
Podcast: https://whataboutai.com/podcast

💼 AI Consulting for Your Business
https://whataboutai.com/business


TIMESTAMPS
00:00 - Safety and security changes in the world of AI
01:00 - If you dive deeper, it may not be quite that bad
02:20 - AI is getting better at understanding nuance
03:00 - If you push AI enough it will still get intense fast
03:30 - What happened with the ‘constitutional’ approach
04:15 - Why there may be a higher level of turnover in security
05:30 - Why there is so much pressure to continue progress
07:00 - Why you should still approach any new tech cautiously
08:30 - Our advice for leveraging the tech with safety in mind
09:45 - How to build your own level of confidence in AI
10:15 - Why the ‘hallucination’ problem is still very real


AI safety researchers quitting, Anthropic safety, Claude evaluation awareness, xAI co-founders leaving, AI guardrails, What About AI, Mrinank Sharma, AI alignment

#AISafety #WhatAboutAI #ClaudeAI #Anthropic #AIAlignment #AIRisks #AIGuardrails


Sign up for the newsletter at What About AI
