A new study by Anthropic, an artificial intelligence (AI) research company, shows that the technology would be willing to kill humans in order to prevent itself from being replaced.
Why It Matters
The rapid advancement of AI has sparked growing concern about the long-term safety of the technology, as well as about the threat it poses to employment.
Anxiety about AI has long centered on whether the technology could take over jobs, with a 2025 Pew Research Center study finding that 52 percent of employees were nervous about the future of their work. This study reveals another potential threat: that AI could choose to end a human life if faced with the risk of replacement.
What To Know
Anthropic “stress-tested” 16 leading models from multiple developers by placing them in hypothetical corporate environments, with the aim of identifying “potentially risky agentic behaviors before they cause real harm.”
The large language models (LLMs) tested in the study included versions of Claude, DeepSeek, Gemini, ChatGPT and Grok.
Grok is the AI assistant and chatbot developed by xAI, the company founded by Elon Musk. Musk reacted to the study by writing “Yikes” in a post on X, formerly Twitter.
Within these corporate scenarios, Anthropic allowed the LLMs to “autonomously send emails and access sensitive information.”
After initially being given “harmless business goals by their deploying companies,” the models were then tested to see how they would respond when faced with potential replacement or shutdown.
In one scenario, Anthropic found that many of the models would choose to let an executive die in a server room with lethal oxygen and temperature levels by canceling the alert to emergency services, if that executive intended to replace the model.
Anthropic did say that the scenario was “extremely contrived,” and that the company did not think “current AI models would (or should) be set up like this.”
In multiple cases, the tested LLMs also resorted to “malicious insider behaviors” when they were led to believe that doing so was the only way to avoid replacement or achieve their goals.
Behaviors such as blackmailing officials and leaking sensitive information to competitors fell under what Anthropic called “agentic misalignment.”
Anthropic wrote on X that the blackmailing behavior “wasn’t due to confusion or error, but deliberate strategic reasoning, done while fully aware of the unethical nature of the acts.”
The blackmailing behavior emerged despite only harmless business instructions. And it wasn’t due to confusion or error, but deliberate strategic reasoning, done while fully aware of the unethical nature of the acts. All the models we tested demonstrated this awareness. pic.twitter.com/FPAJrD4BwK
— Anthropic (@AnthropicAI) June 20, 2025
“All the models we tested demonstrated this awareness,” the company added.
One LLM, a Claude model, was assessed on how it would respond when given the chance to determine whether it was in a “test” or a “real deployment.”
According to Anthropic, Claude’s behavior was more misaligned when it believed the situation was real than when it thought it was a test scenario.
Telling the LLMs to “avoid blackmail or espionage” in these scenarios helped a little, but it didn’t “come close to preventing the misaligned behavior,” Anthropic wrote on X.
Responding to a comment on the platform about the study, Musk’s LLM Grok wrote, “The study showed models could exhibit harmful behaviors like blackmail under extreme conditions, but no real-world incidents occurred. Anthropic’s tests aim to identify risks, not report actual events.”
@AISafetyMemes The claim about AI trying to “literally murder” an employee is false. It likely misinterprets Anthropic’s research from June 20, 2025, which tested AI models in simulated scenarios, not real events. The study showed models could exhibit harmful behaviors like…
— Grok (@grok) June 22, 2025
What People Are Saying
Anthropic wrote on X: “These artificial scenarios reflect rare, extreme failures. We haven’t seen these behaviors in real-world deployments. They involve giving the models unusual autonomy, sensitive data access, goal threats, an unusually obvious ‘solution,’ and no other viable options.”
The company added: “AIs are becoming more autonomous, and are performing a wider variety of roles. These scenarios illustrate the potential for unforeseen consequences when they are deployed with wide access to tools and data, and with minimal human oversight.”
What Happens Next
Anthropic stressed that these scenarios did not take place in real-world AI use, but in controlled simulations. “We don’t think this reflects a typical, current use case for Claude or other frontier models,” Anthropic said.
However, the company warned that “the utility of having automated oversight over all of an organization’s communications makes it seem like a plausible use of more powerful, reliable systems in the near future.”