How Performant Are LLM Agents (AI Chatbots) on Real-World Work Tasks? They Fail 70% or More of the Time.
The Agent Company
That tracks with my experience
You have to very carefully scope things for them and have a plan for when they inevitably screw up.
In my experience they’re great for bootstrapping, but they really fall apart when you need them to do something surgical on a larger codebase.
Mine too
I’ve been working on an app, and it was fantastic for the basics. Then I decided to refactor an API, and Claude Code would run for hours without really getting there.
Also a good warning: I just had to completely rewrite an MCP server I had Claude build. When I needed to update it, I found the whole server was one giant if/else statement and utterly unmaintainable.
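
For anyone wondering what the alternative to one giant if/else tends to look like, here is a minimal sketch of a handler registry. It is not the server from the comment above and it does not use any real MCP SDK; `HANDLERS`, `tool`, `dispatch`, and the two example tools are all hypothetical names made up for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: tool name -> handler function.
HANDLERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {}


def tool(name: str):
    """Register a handler under a tool name (illustrative, not an MCP SDK API)."""
    def register(fn: Callable[[Dict[str, Any]], Any]):
        if name in HANDLERS:
            raise ValueError(f"duplicate tool: {name}")
        HANDLERS[name] = fn
        return fn
    return register


@tool("echo")
def echo(args: Dict[str, Any]) -> str:
    # Each tool lives in its own small function instead of an if/else branch.
    return args["text"]


@tool("add")
def add(args: Dict[str, Any]) -> int:
    return args["a"] + args["b"]


def dispatch(name: str, args: Dict[str, Any]) -> Any:
    """Look up the handler by name; unknown names fail loudly."""
    handler = HANDLERS.get(name)
    if handler is None:
        raise KeyError(f"unknown tool: {name}")
    return handler(args)
```

Calling `dispatch("add", {"a": 2, "b": 3})` routes to the `add` handler. The point of the design is that adding or changing a tool touches one small function rather than a shared branch chain, which is usually easier to review when an agent wrote the first draft.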