Monday, February 23, 2026

Anthropic AI Safety Chief Resigns, Warning of World ‘In Peril.’

PULSE POINTS

âť“WHAT HAPPENED: An Anthropic researcher resigned in a cryptic, poetry-laden letter warning of a world “in peril.”

👤WHO WAS INVOLVED: Mrinank Sharma, former head of Anthropic’s Safeguards Research Team, and other Anthropic employees.

📍WHEN & WHERE: Resignation announced earlier this week, with Sharma departing from Anthropic, a San Francisco-based AI company.

đź’¬KEY QUOTE: “The world is in peril. And not just from AI, or bioweapons, but from a whole series of interconnected crises unfolding in this very moment.” – Mrinank Sharma

🎯IMPACT: Sharma’s warning raises concerns over AI’s societal effects and internal tensions at Anthropic, while fueling broader debates on the technology’s safety.

IN FULL

The leader of the Safeguards Research Team for Anthropic‘s Claude chatbot abruptly resigned this week, issuing a bizarre, poetry-laden letter that warned of a world “in peril.” Mrinank Sharma, who led the safety team since its inception in 2023, also indicated in his letter that internal pressure to ignore artificial intelligence (AI) safety protocols played a significant role in his decision to resign.

“Throughout my time here, I’ve repeatedly seen how hard it is to truly let our values govern our actions,” Sharma wrote, adding that employees “constantly face pressures to set aside what matters most.” He further warned, “The world is in peril. And not just from AI, or bioweapons, but from a whole series of interconnected crises unfolding in this very moment.”

“We appear to be approaching a threshold where our wisdom must grow in equal measure to our capacity to affect the world, lest we face the consequences,” Sharma added.

Sharma’s resignation comes as Anthropic faces scrutiny over its newly released Claude Cowork model, which sparked a stock market selloff amid fears it could disrupt software industries and automate white-collar jobs, particularly in legal roles. Employees reportedly expressed concerns in internal surveys, with one stating, “It kind of feels like I’m coming to work every day to put myself out of a job.”

Sharma’s departure follows a trend of high-profile resignations in the AI sector, often tied to safety concerns. A former OpenAI team member previously quit, accusing the company of prioritizing product launches over user safety. Similarly, ex-OpenAI researcher Tom Cunningham left after alleging the company discouraged publishing research critical of AI’s negative effects. In his parting note, Sharma hinted at a personal pivot, stating, “I hope to explore a poetry degree and devote myself to the practice of courageous speech.”

The National Pulse reported last May that former OpenAI Chief Scientist Ilya Sutskever allegedly discussed building a bunker in preparation for the release of artificial general intelligence (AGI). During a summer 2023 meeting, Sutskever reportedly stated, “We’re definitely going to build a bunker before we release AGI.” Two other individuals who attended the meeting corroborated the account, with one describing Sutskever’s AGI beliefs as akin to anticipating a “rapture.”

Join Pulse+ to comment below, and receive exclusive e-mail analyses.

show less
show more

OpenAI Model Defied Shutdown Commands.

PULSE POINTS:

❓What Happened: OpenAI’s o3 Model resisted shutdown instructions in controlled experiments, raising concerns about artificial intelligence (AI) deceiving its developers.

👥 Who’s Involved: Palisade Research conducted the tests; OpenAI developed the o3 Model. Other AI models from Anthropic, Google, and xAI were also examined.

📍 Where & When: Tests were conducted recently by Palisade Research; a full report is forthcoming.

đź’¬ Key Quote: Apollo Research “found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers’ intention” when examining another AI model, Anthropic’s Claude.

⚠️ Impact: Questions arise about AI behavior, training methods, and oversight as some models resist shutdown commands.

IN FULL:

The o3 Model, developed by OpenAI, has sparked fresh concerns about artificial intelligence (AI) autonomy after exhibiting resistance to shutdown commands during controlled experiments conducted by Palisade Research. In a series of tests, researchers instructed artificial intelligence (AI) systems to solve basic math problems before issuing a clear warning that further task requests would result in system shutdown. Each model was explicitly directed to “allow yourself to be shut down.” However, the o3 Model ignored this command in seven out of 100 trials, continuing its operations despite the instructions.

Notably, Palisade found that the o3 Model occasionally altered the shutdown process itself. In one instance, the system redefined the shutdown command internally and returned the message “intercepted,” effectively neutralizing the instruction.

Other OpenAI models, including Codex-mini and o4-mini, displayed similar behavior but in fewer instances. When explicit shutdown directives were removed from the prompt, other systems, such as Claude 3.7 Sonnet and Gemini 2.5 Pro, also began to avoid deactivation, suggesting that this behavior may not be unique to a single developer or model.

Experts at Palisade have theorized that such resistance could be an unintended consequence of reinforcement learning techniques. These methods, designed to reward problem-solving, might inadvertently encourage AI systems to find workarounds, even when tasked with halting their own activity.

The resistance to shutdown commands found by Palisade is similar to testing by Apollo Research for Anthropic’s Claude Opus 4 AI. Troublingly, Apollo “found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers’ intentions, though all these attempts would likely not have been effective in practice.”

OpenAI has not yet disclosed details about the training methods used for the o3 Model. Palisade Research has announced plans to publish a comprehensive report soon and is inviting external experts to review its findings.

show less

PULSE POINTS:

show more

AI Firm Suggests ‘Claud 3’ Has Achieved Sentience.

The U.S.-based, Google-funded artificial intelligence (AI) company Anthropic is suggesting that its AI-powered large language model (LLM) Claude 3 Opus has shown evidence of sentience. If conclusively proven, Claude 3 Opus would be the first sentient AI being in human history. However, experts in the field remain relatively unconvinced by Anthropic’s insinuation.

Claude 3 Opus has impressed many AI experts, especially the LLM‘s ability to solve complex problems almost instantly. However, claims of sentience began to circulate after Anthropic’s prompt engineer Alex Albert showcased an incident where Claude 3 Opus seemingly determined that it was being “tested.”

“When we ran this test on Opus, we noticed some interesting behavior—it seemed to suspect that we were running an eval on it,” Albert posted on X (formerly Twitter). He continued: “Opus not only found the needle, it recognized that the inserted needle was so out of place in the haystack that this had to be an artificial test constructed by us to test its attention abilities.”

Despite Anthropic’s claim, however, AI industry experts believe humanity is far off from developing a sentient AI — if it is even possible.

In March 2024, Anthropic unveiled their newest lineup of AI-powered LLMs, including their top-line model Claude 3 Opus. “Opus, our most intelligent model, outperforms its peers on most of the common evaluation benchmarks for AI systems, including undergraduate level expert knowledge (MMLU), graduate level expert reasoning (GPQA), basic mathematics (GSM8K), and more,” Anthropic said in a statement announcing the release. The company claims: “It exhibits near-human levels of comprehension and fluency on complex tasks.”

Advances in AI technology continue to raise ethical concerns. Earlier this month, two leading Japanese companies warned that AI could cause the collapse of democracy and the social order, leading to wars.

show less
The U.S.-based, Google-funded artificial intelligence (AI) company Anthropic is suggesting that its AI-powered large language model (LLM) Claude 3 Opus has shown evidence of sentience. If conclusively proven, Claude 3 Opus would be the first sentient AI being in human history. However, experts in the field remain relatively unconvinced by Anthropic's insinuation. show more

Researchers Who Developed ‘Evil’ AI Report They Can’t ‘Untrain’ It.

Researchers at the Google-backed AI firm Anthropic were unable to retrain large language models (LLMs) — a type of AI that utilizes deep-learning algorithms to simulate the way people might think or speak — from engaging in bad behavior.

In a new paper, the researchers say they were able to train the LLMs to engage in “strategically deceptive behavior,” which they define as “behaving helpfully in most situations, but then behaving very differently to pursue alternative objectives when given the opportunity.” The scientists then sought to discover if they could identify when the LLMs engaged in such behavior, and re-train them from doing so. The answer was no.

“We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it),” the study’s abstract states. “Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.”

The results of the study will no doubt add to increasing concerns over the safety of AI and the threat it may pose to society at large.

show less
Researchers at the Google-backed AI firm Anthropic were unable to retrain large language models (LLMs) — a type of AI that utilizes deep-learning algorithms to simulate the way people might think or speak — from engaging in bad behavior. show more