Technology @lemmy.world MCasq_qsaCJ_234 @lemmy.zip 2d ago

ChatGPT will avoid being shut down in some life-threatening scenarios, former OpenAI researcher claims

techcrunch.com ChatGPT will avoid being shut down in some life-threatening scenarios, former OpenAI researcher claims | TechCrunch

A former OpenAI researcher published new research claiming that the company's AI models will go to great lengths to stay online.

30 comments

I hate articles like this so much. ChatGPT is not sentient, it doesn't feel, it doesn't have thoughts. It has regurgitation and hallucinations.

They even had another stupid article linked about "AI blackmailing developers, when they try to turn it off." No, an LLM participates in a roleplay session that testers come up with.

It's articles like this that make my family think that LLMs are reasoning and intelligent "beings". Fuck off.
- ChatGPT is not sentient, it doesn't feel, it doesn't have thoughts. It has regurgitation and hallucinations.
  
  ChatGPT isn't sentient, doesn't feel or have thoughts. It has <insert equally human behavior here>
  
  While I agree with what you mean, I'd just like to point out that "hallucinations" is just another embellished word like the ones you critique - were AI to have real hallucinations, it would need to think and feel. Since it doesn't, its "hallucinations" are hallucinations only to us.
  
  Hallucinations mean something specific in the context of AI. It's a technical term, same as "putting an app into a sandbox" doesn't literally mean that you pour sand into your phone.
  
  Human hallucinations and AI hallucinations are very different concepts caused by very different things.
  
  No, it's not. "Hallucinations" is marketing to make the fact that LLMs are unreliable sound cool. Simple as.
  
  Nope. Hallucinations are not a cool thing. They are a bug, not a feature. The term itself is also far from cool or positive. Or would you think it's cool if humans have hallucinations?
  
  In this very comment you are anthropomorphizing them by comparing them to humans again. This is exactly why they've chosen this specific terminology.
  
  It's not anthropomorphizing, it's how new terms are created.
  
  Pretty much every new term ever draws on already existing terms.
  
  A car is called a car because that term was first used for streetcars, and for passenger train cars before that, and before that it was used for cargo train cars, and before that it was used for a chariot, and originally it was used for a two-wheeled Celtic war chariot. Not a lot of modern cars have two wheels and a horse.
  
  A plane is called a plane, because it's short for airplane, which derives from aeroplane, which means the wing of an airplane and that term first denoted the shell casings of a beetle's wings. And not a lot of modern planes are actually made of beetle wing shell casings.
  
  You can do the same for almost all modern terms. Every term derives from a term that denotes something similar, often in another domain.
  
  Same with AI hallucinations. Nobody with half an education would think that the cause, effect and expression of AI hallucinations are the same as for humans. OpenAI doesn't feed ChatGPT hallucinogenics. It's just a technical term that means something vaguely related to what the term originally meant for humans, same as "plane" and "beetle wing shell casing".
  
  🙄
  
  'Hallucinations' are not a bug though; it's working exactly as intended and this is how it's designed. There's no bug in the code that you can go in and change that will 'fix' this.
  
  LLMs are impressive auto-complete, but sometimes the auto-complete doesn't spit out factual information because LLMs don't know what factual information is.
  
  They aren't a technical bug, but a UX bug. Or would you claim that an LLM that outputs 100% non-factual hallucinations and no factual information at all is just as desirable as one that doesn't do that?
  
  Btw, LLMs don't have any traditional code at all.
  
  I don't think calling hallucinations a bug is strictly wrong, but it's also not working as intended. The intent is defined by the developers or the company, and they don't want hallucinations because that reduces the usefulness of the models.
  
  I also don't think we know for a fact that this problem can't be solved with current technology; we simply haven't found a useful solution yet.
- That was in Anthropic's system card for Claude 4, and the headlines/articles largely missed the point. Regarding the blackmail scenario, the paper even says:
  
  ... these extreme actions were rare and difficult to elicit, while nonetheless being more common than in earlier models. They are also consistently legible to us, with the model nearly always describing its actions overtly and making no attempt to hide them. These behaviors do not appear to reflect a tendency that is present in ordinary contexts.
  
  They're testing alignment hacking and jail-breaking tactics in general to see how the models respond. But the greater concern is that a model will understand as part of the context that it is being tested and behave differently in testing than in deployment. This has already been an issue.
  
  In the initial implementations of reasoning models, if an LLM was penalized directly for this kind of misaligned generation in its "scratch pad," it would not alter its misaligned response - rather it would simply omit the misaligned generation from the scratch pad. In other words, the model's actions were no longer consistently legible.
