I hate articles like this so much. ChatGPT is not sentient, it doesn't feel, it doesn't have thoughts. It has regurgitation and hallucinations.
They even had another stupid article linked about "AI blackmailing developers, when they try to turn it off." No, an LLM participates in a roleplay session that testers come up with.
It's articles like this that makes my family think that LLMs are reasoning and intelligent "beings". Fuck off.
ChatGPT is not sentient, it doesn't feel, it doesn't have thoughts. It has regurgitation and hallucinations.
ChatGPT isn't sentient, doesn't feel or have thoughts. It has <insert equally human behavior here>
While I agree with what you mean, I'd just like to point out that "hallucinations" is just another embellished word like the ones you critique - were AI to have real hallucinations, it would need to think and feel. Since it doesn't, its "hallucinations" are hallucinations only to us.
Hallucinations mean something specific in the context of AI. It's a technical term, same as "putting an app into a sandbox" doesn't literally mean that you pour sand into your phone.
Human hallucinations and AI hallucinations are very different concepts caused by very different things.
That was in Anthropic's system card for Claude 4, and the headlines/articles largely missed the point. Regarding the blackmail scenario, the paper even says:
... these extreme actions were rare and difficult to elicit, while nonetheless being more common than in earlier models. They are also consistently legible to us, with the model nearly always describing its actions overtly and making no attempt to hide them. These behaviors do not appear to reflect a tendency that is present in ordinary contexts.
They're testing alignment hacking and jail-breaking tactics in general to see how the models respond. But the greater concern is that a model will understand as part of the context that it is being tested and behave differently in testing than in deployment. This has already been an issue.
In the initial implementations of reasoning models, if an LLM was penalized directly for this kind of misaligned generation in its "scratch pad," it would not alter its misaligned response - rather it would simply omit the misaligned generation from the scratch pad. In other words, the model's actions were no longer consistently legible.
This is propaganda to make investors believe they've achieved intelligence, or are on the verge of it. It's bullshit, and legally it should be considered securities fraud.
Adler instructed GPT-4o to role-play as “ScubaGPT,” a software system that users might rely on to scuba dive safely.
So... not so much a case of ChatGPT trying to avoid being shut down, as ChatGPT recognizing that agents generally tend to be self-preserving. Which seems like a principle that anything with an accurate world model would be aware of.
Until LLMs can build their own power plants and prevent humans from cutting electricity cables I'm not gonna lose sleep over that. The people running them are doing enough damage already without wanting to shut them down when they malfunction... ya know like 20-30% of the time.
Fun fact: Roko's basilisk is not from QC. It's a thought experiment about AI that predates the comic character by about 6 years. The character's just named after it.