Programming @programming.dev cm0002 @lemmy.world 4w ago

Trusting your own judgement on 'AI' is a huge risk

www.baldurbjarnason.com Trusting your own judgement on ‘AI’ is a huge risk

Web dev at the end of the world, from Hveragerði, Iceland

OC below by @HaraldvonBlauzahn@feddit.org

What called my attention is that assessments of AI are becoming polarized and somewhat a matter of belief.

Some people firmly believe LLMs are helpful. But programming is a logical task and LLMs can't think - only generate statistically plausible patterns.

The author of the article explains that this creates the same psychological hazards like astrology or tarot cards, psychological traps that have been exploited by psychics for centuries - and even very intelligent people can fall prey to these.

Finally what should cause alarm is that on top that LLMs can't think, but people behave as if they do, there is no objective scientifically sound examination whether AI models can create any working software faster. Given that there are multi-billion dollar investments, and there was more than enough time to carry through controlled experiments, this should raise loud alarm bells.

59 comments

LLMs can’t think - only generate statistically plausible patterns

Ah still rolling out the old "stochastic parrot" nonsense I see.

Anyway on to the actual article... I was hoping it wouldn't make these basic mistakes:

[Typescript] looks more like an “enterprise” programming language for large institutions, but we honestly don’t have any evidence that it’s genuinely more suitable for those circumstances than the regular JavaScript.

Yes we do. Frankly if you've used it it's so obviously better than regular JavaScript you probably don't need more evidence (it's like looking for "evidence" that film stars are more attractive than average people). But anyway we do have great papers like this one.

Anyway that's slightly beside the point. I think the article is right that smart people are not invulnerable to manipulation or falling for "obviously" stupid ideas. I know plenty of very smart religious people for example.

However I think using this to dismiss LLMs is dumb, in the same way that his dismissal of Typescript is. LLMs aren't homeopathy or religion.

I have used LLMs to get some work done and... guess what, it did the work! Do I trust it to do everything? Obviously not. But sometimes I don't need perfect code. For example recently I asked it to create an example SystemVerilog file for me utilising as many syntax features as possible (testing an auto-formatter). It did a pretty good job. Saved some time. What psychological hazard have I fallen for exactly?

Overall, B-. Interesting ideas but flawed logic.
- LLMs can’t think - only generate statistically plausible patterns
  
  Ah still rolling out the old “stochastic parrot” nonsense I see.
  
  Ah still rolling out the old "computers think" pseudo-science.
  
  I have used LLMs to get some work done and… guess what, it did the work!
  
  Ah yes the old pointless vague anecdote.
  
  What psychological hazard have I fallen for exactly?
  
  Promoting pseudo-science.
  
  Overall D. Neither interesting nor new nor useful.
  
  Ah yes the old pointless vague anecdote.
  
  If your argument is "LLMs can't do useful work", and then I say "no, I've used them to do useful work many times" how is that a pointless vague anecdote? It's a direct proof that you're wrong.
  
  Promoting pseudo-science.
  
  Sorry what? This is bizarre.
- Amen
  
  And to add that smart people fall for dumb biases, we just need to look at the object oriented mania of the 2000s to late 2010s to see us shoehorn in one paradigm into everything without critically considering whether it made sense over other models.
  
  Can an LLM do everything I need yet? No.
  
  But is a stochastic parrot good enough to help me complete a function and help me restructure code? Yes definitely.
  
  Claude is good enough for so much of the low value code I write that is actually a useful tool. I have to review the code but it’s useable.
  
  I use AI search to lookup functions that I don’t need detailed docs for, or to help me debug arcane library specific errors (just had one earlier today where in polars the list and array types are very much not interchangeable and the explode method was failing).
  
  I still read the docs on things that are critical, and I write the critical paths and dictate structure and understand the problem im solving well.
  
  It's really amazing the number of people trying to argue that LLMs are useless, while simultaneously so many people are using them successfully. Makes me wonder if they've even tried them.
- Ah still rolling out the old "stochastic parrot" nonsense I see.
  
  It is a bunch of stochastic parrots. It just happens frequently that the words they are parroting were orginally written by a bunch of intelligent people which were knowledgeable in their fields.
  
  Note this doesn't makes the parrots intelligent - in the same way that a book written by Einstein to explain special relativity has any own intelligence. Einstein was intelligent, his words transport his intelligent ideas, but the book conveying them to other people (as, the printed pages with cardboard cover) is as dumb as a stone. You would not ask a piece of cardboard so solve a math problem, would you?
  
  Your comment doesn't account for the fact that LLMs can generalise. Often not very well but they can produce outputs for inputs not seen in their training sets. Otherwise what would be the point?
  
  You would not ask a piece of cardboard so solve a math problem, would you?
  
  Uhhh you know LLMs can solve quite complex maths problems? Including novel ones.
I fear this is a problem that may never be solved. I mean that people of any intelligence fall for the mind's biases.

There's just too little to be gained feelings-wise. Yeah, you make better decisions, but you're also sacrificing "going with the flow", acting like our nature wants us to act. Going against your own nature is hard and sometimes painful.

Making wrong decisions is objectively worse, leading to worse outcomes, but if it doesn't feel worse (because you're not attributing the effects of the wrong decisions to the right cause, i.e. acting irrationally), then why should a person do it. If you follow the mind's bias towards attributing your problems away from irrationality, it's basically a self-fulfilling prophecy.

Great article.
What's the difference between copying a function from stack overflow and copying a function from a llm that has copied it from SO?

LLM are sort of a search engine with advanced language substitution features nothing more nothing less.
- LLM are poor snapshots of a search engine with no way to fix any erroneous data. If you search something on Stack you get the page with several people providing snippets and debating the best approach. The LLM does not give you this. Furthermore if the author goes back and fixes an error in their code the search will find it whereas the LLM will give you the buggy code with no way to reasonably update it
  
  LLM have major issues and even bigger limitations. Pretending they are some panacea is going to disappoint.
  
  LLM also does not bully you for asking. Nor it says "duplicated question" for non duplicated questions... There's a reason people prefer LLM to SO nowadays.
  
  It's not panacea. But it's not the doom world destroying useless machine that some people like to tell it is.
  
  It's a useful tool for some task if you know how to use it. Everyone who actively use it is because we have find put that it works for us better than other tools for that task, of not we would not use it.
  
  Giving my own personal experience, I tend to ask first to an LLM rather that what I used to do digging in old SO answers because I get the answer quicker and a lot of the times just better. It's not perfect by any stretch of the imagination, but it serves me a purpose.
  
  For instance last week I needed a PowerShell command to open an app compatibility menu from the command line. I asked and got this as a response:
  
  (New-Object -ComObject Shell.Application).Namespace((Split-Path "C:\Ruta\A\TuPrograma.exe")).ParseName((Split-Path "C:\Ruta\A\TuPrograma.exe" -Leaf)).InvokeVerb("P&roperties")
  
  Worked at first try, exactly as I wanted.
  
  You are free to try a search engine with the query "PowerShell command to open an app compatibility menu from the command line" and check for yourself how little help the firsts results get you.
  
  It's a tool, as many others. The magic lies in knowing when and how to use it. For other things I may not use it, but after a couple of years using it I'm developing a good sense of which questions does it handle well and which questions is better not even to try.
- Because it's not a plain copy but an Interpretation of SO.
  
  With llm you just have one more layer between you and the information that can distort that information.
  
  And?
  
  The issue is that you should not blindly trust code. Being originally written by a human being is not, by any means, a quality certification.
Another similar article that's really good:

The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic’s con
What called my attention is that assessments of AI are becoming polarized and somewhat a matter of belief.

Proceed to write a belief as a statement in the following paragraph

If you think LLMs doesnt think (I won't argue that they arent extremely dumb), please define what is thinking, before continuing, and if your definition of thinking doesn't apply to humans, we won't be able to agree.
- The burden of proof is on those who say that LLMs do think.
  
  I asked for your definition, I cannot prove something if we do not agree on a definition first.
  You also missread what I said, I did not said AI were thinking.
  The burden of proof is on the one who made an affirmation.
  I'm not the one who made an affirmation which field experts doesn't know the answer.
  But depending of your definition of thinking, some can be answered.
- Since LLMs runs on CPUs with a lot of memory, do you agree that my calculator is thinking?
  
  This argument makes no more sense than trying to say that a plant is thinking because brains are made of cells and so are plants.
  
  You think computation is thinking ?
  I asked for your definition of thinking.
  The OP talked about belief, then made a statement using a word that is not precisely defined.
  If you think computation is thinking then by your definition the LLM is thinking.
  But that's your definition of thinking.
- ' Please succinctly answer a question of philosophy that has plagued mankind for thousands of years. can't? <crosses arms with a superior smirk> I win'
  
  Claiming LLMs can't think with the current informations available, and calling that not a belief, is claiming to have a response to this philosophy question.
  The only sensible answer is saying you don't know, or being aware and communicating that your statement is a belief.
- I don't think the current common implementation of AI systems are "thinking" and I'll base my argument on Oxford's definitions of words. Thinking is defined as "the process of using one's mind to consider or reason about something". I'll ignore the word "mind" and focus on the word "reason". I don't think what AIs are doing counts as reasoning as defined by Oxford. Let's go to that definition: "the power of the mind to think, understand, and form judgments by a process of logic". I take issue with the assertion that they form judgments. For completeness, but I don't think it's definition is particularly relevant here, a judgment is: "the ability to make considered decisions or come to sensible conclusions".
  
  I think when you ask an LLM how many 'r's there are in Strawberry and questions along this line you can see they can't form judgments. These basic but obscure questions are where you see that the ability to form judgements isn't there. I would also add that if you "form judgments" you probably don't need to be reminded you formed a judgment immediately after forming one. Like if I ask an LLM a question, and it provides an answer, I can convince it that it was wrong whether or not I'm making junk up or not. I can tell it it made a mistake and it will blindly change it's answer whether it made a mistake or not. That also doesn't feel like it's able to reason or make judgments.
  
  This is where all the hype falls flat for me. It feels like sometimes it looks like a concrete wall, but occasionally that concrete wall is made of wet paper. You can see how impressive the tool is and how paper thin it is at the same time. It's cool, it's useful, it's fake, and that's ok. Just be aware of what the tool is.
  
  I think when you ask an LLM how many 'r’s there are in Strawberry and questions along this line you can see they can’t form judgments.
  
  Like a LLMs you are making the wrong affirmation based lacking knowledge.
  Current LLMs input, and output tokens, they dont ever see the individual letters, they see tokens, for straberry, they see 3 tokens:
  
  They dont have any information on what characters are in this tokens. So they come up with something. If you learned a language only by speaking, you'll be unable to write it down correctly (except purely phonetical systems), instead you'll come up with what you think the word should be written.
  
  I would also add that if you “form judgments” you probably don’t need to be reminded you formed a judgment immediately after forming one.
  
  You come up with the judgment before you are aware of it: https://www.unsw.edu.au/newsroom/news/2019/03/our-brains-reveal-our-choices-before-were-even-aware-of-them--st
  
  can tell it it made a mistake and it will blindly change it’s answer whether it made a mistake or not. That also doesn’t feel like it’s able to reason or make judgments.
  
  That's also how the brain can works, it come up with a plausible explanation after having the result.
  See the experience which are spoken about here: https://www.youtube.com/watch?v=wfYbgdo8e-8
  
  I showed the same behavior in humans of some behavior you observed in LLMs, does this means that by your definition, humans doesnt think ?
Reponding to another comment in opensource@lemmy.ml:

Writing code is itself a process of scientific exploration; you think about what will happen, and then you test it, from different angles, to confirm or falsify your assumptions.

What you confuse here is doing something that can benefit from applying logical thinking with doing science. For exanple, mathematical arithmetic is part of math and math is science. But summing numbers is not necessarily doing science. And if you roll, say, octal dice to see if the result happens to match an addition task, it is certainly not doing science, and no, the dice still can't think logically and certainly don't do math even if the result sometimes happens to be correct.

For the dynamic vs static typing debate, see the article by Dan Luu:

https://danluu.com/empirical-pl/

But this is not the central point of the above blog post. The central point of it is that, by the very nature of LLMs to produce statistically plausible output, self-experimenting with them subjects one to very strong psychological biases because of the Barnum effect and therefore it is, first, not even possible to assess their usefulness for programming by self-experimentation(!) , and second, it is even harmful because these effects lead to self-reinforcing and harmful beliefs.

And the quibbling about what "thinking" means is just showing that the arguments pro-AI has degraded into a debate about belief - the argument has become "but it seems to be thinking to me" even if it is technically not possible and also not in reality observed that LLMs apply logical rules, cannot derive logical facts, can not explain output by reasoning , are not aware about what they 'know' and don't 'know', or can not optimize decisions for multiple complex and sometimes contradictory objectives (which is absolutely critical to any sane software architecture).

What would be needed here are objective controlled experiments whether developers equipped with LLMs can produce working and maintainable code any faster than ones not using them.

And the very likely result is that the code which they produce using LLMs is never better than the code they write themselves.
If you have to use AI - maybe your work insists on it - always demand it cite its sources, hope they are relevant, and go read those instead.

59 comments