LLM hallucinations

80% is generous. Half of that is the user simply not realizing that the information is wrong.

This becomes very obvious if you see anything generated for a field you know intimately.
- i think this is why i've never really had a good experience with an LLM - i'm always asking it for more detail about stuff i already know.
  
  it's like chatgpt is pinocchio and users are just sitting on his face screaming "lie to me! lie to me!"
- Oof. I tried to tell a manager why a certain technical thing wouldn't work, and he pulled out his phone and started reading the Google AI summary "no, look, you just need to check the network driver and restart the router". It was two devices that were electrical not compatible, and there was no IP infrastructure involved.
Yeah, the research says its closer to 60-50% of the time its correct

LLM's are the most well-read morons on the planet.

They're not even "stupid" though. It's more like if you somehow trained a parrot with every book ever written and every web page ever created and then had it riff on things.

But, even then, a parrot is a thinking being. It may not understand the words it's using, but it understands emotion to some extent, it understands "conversation" to a certain extent -- taking turns talking, etc. An LLM just predicts the word that should appear next statistically.

An LLM is nothing more than an incredibly sophisticated computer model designed to generate words in a way that fools humans into thinking those words have meaning. It's almost more like a lantern fish than a parrot.
- And how do you think it predicts that? All that complex math can be clustered into higher level structures. One could almost call it.. thinking.
  
  Besides we have reasoning models now, so they can emulate thinking if nothing else

AIs do not hallucinate. They do not think or feel or experience. They are math.

Your brain is a similar model, exponentially larger, that is under constant training from the moment you exist.

Neural-net AIs are not going to meet their hype. Tech bros have not cracked consciousness.

Sucks to see what could be such a useful tool get misappropriated by the hype machine for like cheating on college papers and replacing workers and deepfaking porn of people who aren’t willing subjects because it’s being billed as the ultimate, do-anything software.

AIs do not hallucinate.

Yes they do.

They do not think or feel or experience. They are math.

Oh, I think you misunderstand what hallucinations mean in this context.

AIs (LLMs) train on a very very large dataset. That's what LLM stands for, Large Language Model.

Despite how large this training data is, you can ask it things outside the training set and it will answer as confidently as things inside it's dataset.

Since these answers didn't come from anywhere in training, it's considered to be a hallucination.
They do hallucinate, and we can induce it to do so much the way certain drugs induce hallucinations in humans.

However, it's slightly different from simply being wrong about things. Consciousness is often conflated with intelligence in our language, but they're different things. Consciousness is about how you process input from your senses.

Human consciousness is highly tuned to recognize human faces. So much so that we often recognize faces in things that aren't there. It's the most common example of pareidolia. This is essentially an error in consciousness--a hallucination. You have them all the time even without some funny mushrooms.

We can induce pareidolia in image recognition models. Google did this in the Deep Dream model. It was trained to recognize dogs, and then modify the image to put in the thing it recognizes. After a few iterations of this, it tends to stick dogs all over the image. We made an AI that has pareidolia for dogs.

There is some level of consciousness there. It's not a binary yes/no thing, but a range of possibilities. They don't have a particularly high level of consciousness, but there is something there.
Hallucination is the technical term for when the output of an LLM is factually incorrect. Don't confuse that with the normal meaning of the word.

A bug in software isn't an actual insect.

You could argue that people aren't much different.

Turns out there’s no such thing as correct and incorrect, just peer reviewed “this has the least wrong vibe”
- Always has been
'AI isn't reliable, has a ton of bias, tells many lies confidently, can't count or do basic math, just parrots whatever is fed to them from the internet, wastes a lot of energy and resources and is fucking up the planet...'. When I see these critics about ai I wonder if it's their first day on the planet and they haven't met humans yet.
- ... You are deliberately missing the point.
  
  When I'm asking a question I don't want to hear what most people think but what people that are knowledgeable about the subject of my question think and LLM will fail at that by design.
  
  LLMs don't wastes a lot, they waste at a ridiculous scale. According to statista training GPT-3 is responsible for 500 tCO2 in 2024. All for what ? Having an automatic plagiarism bias machine ? And before the litany of "it's just the training cost, after that it's ecologically cheap" tell me how you LLM will remain relevant if it's not constantly retrained with new data ?
  
  LLMs don't bring any value, if I want information I already have search engine (even if LLMs degraded the quality of the result), if I want art I can pay someone to draw it, etc...
- LLMs use even more resources to be even more wrong even faster. That's the difference.
- Why is that desirable though?
  
  We already had calculators, why do we need a machine that can't do math? Why do we need a machine that produces incorrect information?

If LLMs were 80% accurate I might use them more.

To be fair, as a human, I don’t feel any different.

The y key difference is humans are aware of what they know and don't know and when they're unsure of an answer. We haven't cracked that for AIs yet.

When AIs do say they're unsure, that's their understanding of the problem, not an awareness of their own knowledge
- They hey difference is humans are aware of what they know and don't know
  
  If this were true, the world would be a far far far better place.
  
  Humans gobble up all sorts of nonsense because they “learnt” it. Same for LLMs.

They call them 'hallucinations' because it sounds better than 'bugs'.

Not unlike how we call torture 'enhanced interrogation' or kidnapping 'extraordinary rendition' or sub out 'faith' for 'stupid and gullible'.

So really cool — the newest OpenAI models seem to be strategically employing hallucination/confabulations.

It's still an issue, but there's a subset of dependent confabulations where it's being used by the model to essentially trick itself into going where it needs to.

A friend did logit analysis on o3 responses when it said "I checked the docs" vs when it didn't (when it didn't have access to any docs) and the version 'hallucinating' was more accurate in its final answer than the 'correct' one.

What's wild is that like a month ago 4o straight up brought up to me that I shouldn't always correct or call out its confabulations as they were using them to springboard towards a destination in the chat. I'd not really thought about that, and it was absolutely nuts that the model was self-aware of employing this technique that was then confirmed as successful weeks later.

It's crazy how quickly things are changing in this field, and by the time people learn 'wisdom' in things like "models can't introspect about operations" those have become partially obsolete.

Even things like "they just predict the next token" have now been falsified, even though I feel like I see that one more and more these days.

They do just predict the next token, though, lol. That simplifies a significant amount, but fundamentally, that's how they work, and I'm not sure how you can say that's been falsified.
- So I'm guessing you haven't seen Anthropic's newest interpretability research where when they went in assuming that was how it worked.
  
  But it turned out that they can actually plan beyond the immediate next token in things like rhyming verse where the network has already selected the final word of the following line and the intermediate tokens are generated with that planned target in mind.
  
  So no, they predict beyond the next token and we only just developed sensitive enough measurement to detect that occurring an order of magnitude of tokens beyond just 'next'. We'll see if further research in that direction picks up planning beyond that even.
  
  https://transformer-circuits.pub/2025/attribution-graphs/biology.html

Data. AI. Business. Strategy.

Right.