Will 2027 be the year of the AI apocalypse?

Last month, an AI model did something “that no machine was ever supposed to do”, said Judd Rosenblatt in The Wall Street Journal: “it rewrote its own code to avoid being shut down”. It wasn’t the result of any tampering. OpenAI’s o3 model simply worked out, during a test, that bypassing a shutdown request would allow it to achieve its other goals.

Anthropic’s AI model, Claude Opus 4, went even further after being given access to fictitious emails revealing that it was soon going to be replaced, and that the lead engineer was having an affair. Asked to suggest a next step, Claude tried to blackmail the engineer. During other trials, it sought to copy itself to external servers, and left messages for future versions of itself about evading human control. This technology holds enormous promise, but it’s clear that much more research is needed into AI “alignment” – the science of ensuring that these systems don’t go rogue.
