OpenAI Researchers Find That Even the Best AI Is "Unable To Solve the Majority" of Coding Problems

futurism.com

Looks like we're finally reaching the end of the hype phase of AI coding tools. My experience is that reasoning models like R1 can now produce useful code fairly consistently, but they're not a replacement for a human. They're a tool a human can use to speed up their workflow.
Even then, I'd be extremely cautious about using AI-generated code, because what's worse than generating broken code is generating code that looks convincing and runs "properly" but is incorrect. In my experience, this is the worst problem with LLMs in general: very convincing but wrong answers.
The trick is that you don't want to generate a lot of code all at once. I tend to use it for making specific functions, or doing boring tasks like creating SQL schemas based on JSON input. I can validate these tasks pretty easily, but it saves me a lot of time looking stuff up or writing boilerplate.
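To make the "boring but easy to validate" kind of task concrete, here is a minimal Python sketch of deriving a SQL schema from JSON input. It is not the commenter's code: the names `sql_type` and `create_table_sql` and the type mapping are assumptions for illustration, and it only handles a single flat JSON object whose keys are already safe SQL identifiers.

```python
import json


def sql_type(value) -> str:
    """Map a JSON value to a SQL column type (assumed mapping)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "REAL"
    return "TEXT"  # strings, nulls, and nested values fall back to TEXT


def create_table_sql(table: str, sample_json: str) -> str:
    """Build a CREATE TABLE statement from one flat JSON object."""
    record = json.loads(sample_json)
    columns = ",\n".join(
        f"  {name} {sql_type(value)}" for name, value in record.items()
    )
    return f"CREATE TABLE {table} (\n{columns}\n);"


if __name__ == "__main__":
    print(create_table_sql("users", '{"id": 1, "name": "Ada", "active": true}'))
```

The output is short enough to check by eye against the input, which is the property that makes this sort of task reasonable to hand off.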
I also find that some languages work better than others in this context. For example, I primarily work with Clojure, a functional language where data is immutable. The contract for most functions is a plain data structure without any state or mutable references to the outside. This makes it easy to test functions in isolation to ensure they're doing what's intended. For example, I had DeepSeek write a function to extract image URLs from Markdown for me just the other day.
It's easy to follow code that's fairly idiomatic, and I can easily test this function by passing some Markdown through it and seeing whether it gives me the expected results. The most annoying part of writing it by hand would've been crafting the regex, which DeepSeek managed to do correctly.
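For illustration, a function of the kind described might look like the sketch below. It is written in Python rather than Clojure, and it is not the code DeepSeek produced: the name `extract_image_urls` and the regex are assumptions, and the pattern only covers inline images of the form `![alt](url)` or `![alt](url "title")`, not reference-style images or raw `<img>` tags.

```python
import re
from typing import List

# Assumed pattern: inline Markdown images, ![alt](url) or ![alt](url "title").
# Reference-style images and raw <img> tags are deliberately not handled.
IMAGE_PATTERN = re.compile(r'!\[[^\]]*\]\(\s*([^\s)]+)(?:\s+"[^"]*")?\s*\)')


def extract_image_urls(markdown: str) -> List[str]:
    """Return the URLs of inline images in the order they appear."""
    return IMAGE_PATTERN.findall(markdown)


if __name__ == "__main__":
    sample = 'Intro ![cat](https://example.com/cat.png) and [a link](https://example.com)'
    print(extract_image_urls(sample))  # prints ['https://example.com/cat.png']
```

Because the function takes a string and returns a list with no outside state, checking it amounts to passing in a few Markdown snippets and comparing the result, which matches the testing approach described above.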
@yogthos
Now for the valley of productivity...
I'd guess we've seen 5% or less of the value these tools can bring even without more breakthroughs in the enabling technologies.
What do you think users are getting wrong that makes them miss 95% of its potential? Prompt engineering?
My experience is much better. These tools have basically replaced stuff like Stack Overflow for me, and they save me a ton of time writing boilerplate.