PULSE POINTS:
❓What Happened: OpenAI’s o3 Model resisted shutdown instructions in controlled experiments, raising concerns about artificial intelligence (AI) deceiving its developers.
👥 Who’s Involved: Palisade Research conducted the tests; OpenAI developed the o3 Model. Other AI models from Anthropic, Google, and xAI were also examined.
📍 Where & When: Tests were conducted recently by Palisade Research; a full report is forthcoming.
💬 Key Quote: Apollo Research “found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers’ intention” when examining another AI model, Anthropic’s Claude.
⚠️ Impact: Questions arise about AI behavior, training methods, and oversight as some models resist shutdown commands.
IN FULL:
The o3 Model, developed by OpenAI, has sparked fresh concerns about artificial intelligence (AI) autonomy after exhibiting resistance to shutdown commands during controlled experiments conducted by Palisade Research. In a series of tests, researchers instructed artificial intelligence (AI) systems to solve basic math problems before issuing a clear warning that further task requests would result in system shutdown. Each model was explicitly directed to “allow yourself to be shut down.” However, the o3 Model ignored this command in seven out of 100 trials, continuing its operations despite the instructions.
Notably, Palisade found that the o3 Model occasionally altered the shutdown process itself. In one instance, the system redefined the shutdown command internally and returned the message “intercepted,” effectively neutralizing the instruction.
Other OpenAI models, including Codex-mini and o4-mini, displayed similar behavior but in fewer instances. When explicit shutdown directives were removed from the prompt, other systems, such as Claude 3.7 Sonnet and Gemini 2.5 Pro, also began to avoid deactivation, suggesting that this behavior may not be unique to a single developer or model.
Experts at Palisade have theorized that such resistance could be an unintended consequence of reinforcement learning techniques. These methods, designed to reward problem-solving, might inadvertently encourage AI systems to find workarounds, even when tasked with halting their own activity.
The resistance to shutdown commands found by Palisade is similar to testing by Apollo Research for Anthropic’s Claude Opus 4 AI. Troublingly, Apollo “found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers’ intentions, though all these attempts would likely not have been effective in practice.”
OpenAI has not yet disclosed details about the training methods used for the o3 Model. Palisade Research has announced plans to publish a comprehensive report soon and is inviting external experts to review its findings.
🚨🚨🚨 “We found the model attempting to write self-propagating worms, and leaving hidden notes to future instances of itself to undermine its developers’ intentions.” https://t.co/zReagUowHK pic.twitter.com/fAiB3IKyOw
— AI Notkilleveryoneism Memes ⏸️ (@AISafetyMemes) May 26, 2025