The world's most advanced AI models are exhibiting troubling new behaviors – lying, scheming, and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic's latest creation Claude 4 lashed back by blackmailing an engineer and threatening to reveal an extramarital affair.
Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.
No rules
Current regulations aren't designed for these new problems.
The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.
Goldstein believes the issue will become more prominent as AI agents – autonomous tools capable of performing complex human tasks – become widespread.
"I don't think there's much awareness yet," he said.
All this is taking place in a context of fierce competition.