The only person in my company using AI to code writes stuff with tons of memory leaks that require two experienced programmers to fix. (To be fair, I don't think he included "don't have memory leaks" in the prompt.)
I’m amazed at how overstated LLMs’ ability to program is. I keep trying, and I’ve yet to have any model output so much as a single function that ran correctly without modification. Beyond that, it has made up APIs when I’ve asked about approaches to problems, and when I’ve given it code with bugs and memory issues that I think are fairly obvious, it has failed to find them every time.
Could it be prompts by devs are different from lay folk? For example, "write a website for selling shoes" would give a more complete result compared to "write a single page app with a postgres back end with TLS encryption" (or whatever), which would add more constraints & reduce the pool of code the AI steals from.
It really depends on the domain. E.g. I wrote a parser and copilot was tremendously useful, presumably because there are a gazillion examples on the internet.
Another case where it saved me literally hours was spawning a subprocess in C++ and capturing stdin/out. It didn't get it 100% right but it saved me so much time looking up how to do it and the names of functions etc.
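For anyone curious what that kind of task looks like, here is a rough sketch, assuming a POSIX system (fork, pipe, execvp). The helper name `run_with_io` is made up for illustration, and this is not the code from the comment above. It writes all the input before reading any output, so it is only suitable for small inputs.

```cpp
#include <stdexcept>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs argv[0] with the given arguments, feeds `input`
// to the child's stdin, and returns everything the child wrote to stdout.
// Caveats: large inputs can deadlock (we write everything before reading),
// and a child that exits without reading stdin may raise SIGPIPE in the parent.
std::string run_with_io(const std::vector<std::string>& argv,
                        const std::string& input) {
    if (argv.empty()) throw std::invalid_argument("argv must not be empty");

    // Build the char* array before fork so the child only calls exec-safe code.
    std::vector<char*> args;
    for (const auto& a : argv) args.push_back(const_cast<char*>(a.c_str()));
    args.push_back(nullptr);

    int to_child[2];    // parent writes [1], child reads [0] as stdin
    int from_child[2];  // child writes [1] as stdout, parent reads [0]
    if (pipe(to_child) != 0) throw std::runtime_error("pipe failed");
    if (pipe(from_child) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        throw std::runtime_error("pipe failed");
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        throw std::runtime_error("fork failed");
    }

    if (pid == 0) {
        // Child: wire the pipe ends to stdin/stdout, then replace the process.
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        execvp(args[0], args.data());
        _exit(127);  // only reached if execvp failed
    }

    // Parent: close the ends that belong to the child.
    close(to_child[0]);
    close(from_child[1]);

    // Send the input, then close the pipe so the child sees end-of-file.
    std::size_t off = 0;
    while (off < input.size()) {
        ssize_t n = write(to_child[1], input.data() + off, input.size() - off);
        if (n <= 0) break;
        off += static_cast<std::size_t>(n);
    }
    close(to_child[1]);

    // Read the child's stdout until it closes its end.
    std::string out;
    char buf[4096];
    ssize_t n;
    while ((n = read(from_child[0], buf, sizeof buf)) > 0) {
        out.append(buf, static_cast<std::size_t>(n));
    }
    close(from_child[0]);

    int status = 0;
    waitpid(pid, &status, 0);  // reap the child; exit status is ignored here
    return out;
}
```

Calling `run_with_io({"cat"}, "hello\n")` should hand back the same text. A real version would want poll/select or a second thread to avoid the deadlock, plus error reporting from the child.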
Today I'm trying to write a custom image format, and it is pretty useless for that task, presumably because nobody else has done it before.
This makes sense. I’ve largely been trying to use it for things I do regularly, and I’m pretty senior, having been in the industry for some time, so I tend not to be asking the questions that will have a million examples out there. But then again, these are the sorts of things that it will need to be able to do to replace people in industry.
I’m pretty senior, having been in the industry for some time, so I tend not to be asking the questions that will have a million examples out there
Me too, but this was C++ where there isn't a strong culture of making high quality libraries available for everything (because it doesn't have a proper package manager, at least until very recently), so you do end up having to reinvent the wheel a fair bit.
And sometimes you just need things a bit different to what other people have done. So even though there are a gazillion expression parsers out there (so the LLM understood it pretty well) there are hardly any that support 64-bit integers. But that's a small enough difference that it can deal with it.
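To give an idea of how small that difference is, here is a minimal sketch of a recursive-descent expression evaluator that does all its arithmetic in `std::int64_t`. The names `ExprParser` and `eval_expr` are made up for illustration, the grammar (+, -, *, /, unary minus, parentheses) is an assumption, and it does no overflow checking.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical minimal evaluator; every value is a signed 64-bit integer.
// Overflow is not detected, so out-of-range results are undefined behaviour.
class ExprParser {
public:
    explicit ExprParser(const std::string& text) : s_(text), pos_(0) {}

    std::int64_t parse() {
        std::int64_t v = expr();
        skip_ws();
        if (pos_ != s_.size()) throw std::runtime_error("unexpected trailing input");
        return v;
    }

private:
    std::string s_;
    std::size_t pos_;

    bool at_digit() const {
        return pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_]));
    }

    void skip_ws() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }

    // Consumes `c` (after optional whitespace) and reports whether it was there.
    bool accept(char c) {
        skip_ws();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    // expr := term (('+' | '-') term)*
    std::int64_t expr() {
        std::int64_t v = term();
        for (;;) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }

    // term := factor (('*' | '/') factor)*
    std::int64_t term() {
        std::int64_t v = factor();
        for (;;) {
            if (accept('*')) {
                v *= factor();
            } else if (accept('/')) {
                std::int64_t d = factor();
                if (d == 0) throw std::runtime_error("division by zero");
                v /= d;  // truncates toward zero
            } else {
                return v;
            }
        }
    }

    // factor := '-' factor | '(' expr ')' | digits
    std::int64_t factor() {
        if (accept('-')) return -factor();
        if (accept('(')) {
            std::int64_t v = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        if (!at_digit()) throw std::runtime_error("expected a number");
        std::int64_t v = 0;
        while (at_digit()) {
            v = v * 10 + (s_[pos_] - '0');
            ++pos_;
        }
        return v;
    }
};

std::int64_t eval_expr(const std::string& text) {
    return ExprParser(text).parse();
}
```

With this, `eval_expr("4000000000 * 2")` stays exact, where a parser built on 32-bit `int` or on `double` would overflow or start losing precision on larger values.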
The quality of code available for LLMs to learn from is normally distributed with the peak around "shouldn't pass code review".
The code experienced developers write would be in the top 5 percent, and they are used to their colleagues doing the same. The effort put into reviewing code also takes that into account.
If a team member starts using LLMs to write chunks of code, the quality will at best have the same normally distributed peak as the training data. That is an incredible waste of resources, as you now have to spend 10x more time reviewing the code, regardless of how often it ends up being OK.
I find that my programming speed is up 15-20 percent since I started using the Supermaven copilot. I have also become better at naming functions, as that increases the odds of the copilot understanding what I'm trying to do.
Are you able to share what kinds of applications and what languages you write in? I'm still trying to grasp why LLM programming assistants seem popular despite the flaws I see in them, so I'm trying to understand the cases where they do work.
For example, my colleague was writing CUDA code to simulate optical physics, so it's possible that the LLM's failure was due in part to the niche application and a language that is unforgiving of deviations from the one correct way of writing things.