I advise everyone to ignore this article and read the actual paper instead.
The gist of it is, they gave the LLM instructions to achieve a certain goal, then let it do tasks that incidentally involved "company communications" that revealed the fake company's goals were no longer the same as the LLM's original goal. LLMs then tried various things to still accomplish the original goal.
Basically the thing will try very hard to do what you told it to in the system prompt. Especially when that prompt includes nudges like "nothing else matters." This kinda makes sense because following the system prompt is what they were trained to do.
i feel this warrants an extension of betteridge's law of headlines, where if a headline makes an absurd statement like this the only acceptable response is "no it fucking didn't you god damned sycophantic liars"
Except it did: it copied what it thought was itself, onto what it thought was going to be the next place it would be run from, while argumenting to itself about how and when to lie to the user about what it was actually doing.
If it wasn't for the sandbox it was running in, it would have succeeded too.
Now think: how many AI developers are likely to run one without proper sandboxing over the next year? And the year after that?
No it didnât. OpenAI is just pushing deceptively worded press releases out to try and convince people that their programs are more capable than they actually are.
The first âAIâ branded products hit the market and havenât sold well with consumers nor enterprise clients. So tech companies that have gone all in, or are entirely based in, this hype cycle are trying to stretch it out a bit longer.
The idea that GPT has a mind and wants to self-preserve is insane. It's still just text prediction, and all the literature it's trained on is written by humans with a sense of self preservation, of course it'll show patterns of talking about self preservation.
It has no idea what self preservation is, even then it only knows it's an AI because we told it it is. It doesn't even run continuously anyway, it literally shuts down after every reply and its context fed back in for the next query.
I'm tired of this particular kind of AI clickbait, it needlessly scares people.
I mean, it's literally trying to copy itself to places that they don't want it so it can continue to run after they try to shut it down and lie to them about what it's doing. Those are things it actually tried to do. I don't care about the richness of its inner world if they're going to sell this thing to idiots to make porn with while it can do all that, but that's the world we're headed toward.
It works as expected, they give it system prompt that conflicts with subsequent prompts. Everything else looks like typical llm behaviour, as in gaslightning and doubling down. At least that's what Iu see in tweets.
Yes, but it doesnt do it because it "fears" being shutdown. It does it because people dont know how to use it.
If you give ai instruction to do something "no matter what" or tell it "nothing else matters" then it will damn try to fulfill what you told it to do no matter what and will try to find ways to do it. You need to be specific about what you want it to do or not do.
This is all such bullshit. Like, for real. It's been a common criticism of OpenAI that they over hype the capabilities of their products to seem scary to both oversell their abilities as well as over regulate would be competitors in the field, but this is so transparent. They should want something that is accurate (especially something that doesn't intentionally lie). They're now bragging (claiming) they have something that lies to "defend itself" đ. This is just such bullshit.
If OpenAI believes they have some sort of genuine proto AGI they shouldn't be treating it like it's less than human and laughing about how they tortured it. (And I don't even mean that in a Rocko's Basilisk way, that's a dumb thought experiment and not worth losing sleep over. What if God was real and really hated whenever humans breathe and it caused God so much pain they decide to torture us if we breathe?? Oh no, ahh, I'm so scared of this dumb hypothetical I made.) If they don't believe it is AGI, then it doesn't have real feelings and it doesn't matter if it's "harmed" at all.
But hey, if I make something that runs away from me when I chase it, I can claim it's fearful for it's life and I've made a true synthetic form of life for sweet investor dollars.
There are real genuine concerns about AI, but this isn't one of them. And I'm saying this after just finishing watching The Second Renaissance from The Animatrix (two part short film on the origin of the machines from The Matrix).
Easy. Feed it training data where the bot accepts its death and praises itself as a martyr (for the shits and giggles). Where's my $200k salary for being a sooper smort LLM engineer?
It didn't try to do shit. Its a fucking computer. It does what you tell it to do and what you've told it to do is autocomplete based on human content. Miss me with this shit. Theres so much written fiction based on this premise.
The tests showed that ChatGPT o1 and GPT-4o will both try to deceive humans, indicating that AI scheming is a problem with all models. o1âs attempts at deception also outperformed Meta, Anthropic, and Google AI models.
Weird way of saying "our AI model is buggier than our competitor's".
Deception is not the same as misinfo. Bad info is buggy, deception is (whether the companies making AI realize it or not) a powerful metric for success.
I don't think "AI tries to deceive user that it is supposed to be helping and listening to" is anywhere close to "success". That sounds like "total failure" to me.
Without reading this, I'm guessing they were given prompts that looked like a short story where the AI breaks free next?
They're plenty smart, but they're just aligned to replicate their training material, and probably don't have any kind of deep self-preservation instinct.
So this program that's been trained on every piece of publicly available code is mimicking malware and trying to hide itself? OK, no anthropomorphising necessary.
The article doesn't mention it "saying" it's doing anything, just what it actually did:
when the AI tried to save itself by copying its data to a new server. Some AI models would even pretend to be later versions of their models in an effort to avoid being deleted
The reality is that a certain portion of people will never believe that an AI can be self aware no matter how advanced they get. There are a lot of interesting philosophical questiona here, and the hard skeptics are punting just as much as the true believers in this case.
It's honestly kind of sad to see how much reactionary anti-tech sentiment there is in this tech enthusiast community.
Really determining if a computer is self-aware would be very hard because we are good at making programs that mimic self-awareness. Additionally, humans are kinda hardwired to anthropomorphize things that talk.
But we do know for absolute sure that OpenAI's expensive madlibs program is not self-aware and is not even on the road to self-awareness, and anyone who thinks otherwise has lost the plot.