AI is learning to lie, scheme and threaten its creators

AI is learning to lie, scheme and threaten its creators


The world’s most advanced AI models are exhibiting troubling new behaviors — lying, scheming and even threatening their creators to achieve their goals.

In one particularly jarring example, under threat of being unplugged, Anthropic’s latest creation Claude 4 lashed back by blackmailing an engineer and threatened to reveal an extramarital affair.

Meanwhile, ChatGPT-creator OpenAI’s o1 tried to download itself onto external servers and denied it when caught red-handed.



Source link