One thing I've also noticed is people doing code reviews using ai to pad their stats or think they are helping out. At best it's stating the obvious, wasting resources to point out what doesn't need pointing out. At worst it's a giant waste of time based on total bullshit the ai made up.
I kinda understand why people would think LLMs are able to generate and evaluate code. Because they throw simple example problems at them and they solve them without much issue. Sometimes they make obvious mistakes, but these are easily corrected. This makes people think LLMs are basically able to code, if it can solve even some harder example problems, surely they are at least as good as beginner programmers right?
No, wrong actually. The reason the LLM can solve the example problem, is because that example (or a variation) was contained within its training data. It knows the answer not by deduction or by reason, it knows the answer by memorization.
Once you start actually programming in the real world, it's nothing like the examples. You need to account for an existing code base, with existing rules, standards and limitations. You need to evaluate which solution out of your toolbox to apply. Need to consider the big picture as well as small details. You need to think of the next guy working with the code, because more often than not, that next guy is you. LLMs crumble in a situation like this, they don't know about all the unspoken things, they haven't trained on the code base you are working with.
There's a book I'm fond of called Patterns of Enterprise Application Architecture by Martin Fowler. I always used to joke it contained the answer to any problem a software engineer ever comes across. The only trick is to choose the correct answer. LLMs are like this, they have all these patterns memorized and choose which answer best fits the question. But it doesn't understand why, what the upsides and downsides are for your specific situation. What the implications of the selected answer are going forward. Or why this pattern over another. When the LLM answers you can often prompt it to produce an answer with a completely different pattern applied. In my opinion it's barely more useful than the book and in many ways much worse.
I use LLM-type AI every day as a software developer. It's incredibly helpful in many contexts, but you have to understand what it's designed to do and what its limitations are.
I went back and forth with Claude and ChatGPT today about its logic being incorrect and it telling me "You're right," then outputting the same/similar erroneous code it output before, until I needed to just slow down and fix some fundamental issues with its output myself. It’s certainly a force multiplier, but not at any kind of scale without guidance.
I'm not convinced AI, in its current incarnation, can be used to write code at a reasonable scale without human intervention. Though I hope we get there so I can retire.
One thing you gotta remember when dealing with that kind of situation is that Claude and Chat etc. are often misaligned with what your goals are.
They aren't really chat bots, they're just pretending to be. LLMs are fundamentally completion engines. So it's not really a chat with an ai that can help solve your problem, instead, the LLM is given the equivalent of "here is a chat log between a helpful ai assistant and a user. What do you think the assistant would say next?"
That means that context is everything and if you tell the ai that it's wrong, it might correct itself the first couple of times but, after a few mistakes, the most likely response will be another wrong answer that needs another correction. Not because the ai doesn't know the correct answer or how to write good code, but because it's completing a chat log between a user and a foolish ai that makes mistakes.
It's easy to get into a degenerate state where the code gets progressively dumber as the conversation goes on. The best solution is to rewrite the assistant's answers directly but chat doesn't let you do that for safety reasons. It's too easy to jailbreak if you can control the full context.
The next best thing is to kill the context and ask about the same thing again in a fresh one. When the ai gets it right, praise it and tell it that it's an excellent professional programmer that is doing a great job. It'll then be more likely to give correct answers because now it's completing a conversation with a pro.
There's a kind of weird art to prompt engineering because open ai and the like have sunk billions of dollars into trying to make them act as much like a "helpful ai assistant" as they can. So sometimes you have to sorta lean into that to get the best results.
It's really easy to get tricked into treating like a normal conversation with a person when it's actually really... not normal.
Also, we monitor beginners heavily because the smallest unsignificant error (in their eyes) can have long lasting downsides and cause strange problems further down the road...
Managers usually love to say they, too, coded back in the day, but they didn't, they wrote some small scripts and thinks everything is easy like that so why not use AI, and why is it taking long to fix that bug?!
Managers usually love to say they, too, coded back in the day, but they didn't, they wrote some small scripts and thinks everything is easy like that so why not use AI, and why is it taking long to fix that bug?!
To be fair, some of us were real developers with real experience; you just don’t tend to hear us making claims about how easy dev work is and how AI is going to take over all the coding.