The people here saying "no shit, LLMs were not even designed to play chess" are not the people this is directed at.
Multiple times at my job I have had to explain, often to upper management, that LLMs are not AGIs.
Stories like these help an underinformed general public wrap their heads around the idea that a "computer that can talk" =/= a "computer that can truly think/reason".
They say LLMs can “reason” now, but they obviously can’t. At best, they could be trained to write a code snippet and run the code to get the answer. I’ve noticed that when asked to do math, ChatGPT will now translate my math question into Python and run that to get the answer, since it can’t do math itself reliably.
There are algorithms for playing chess that win by searching through the possible moves several moves in advance (as deep as the hardware allows) and choosing the one most likely to lead to the best outcome. This is probably more or less what the Atari game is doing. LLMs could probably be given the tools to run that algorithm themselves. However, the LLM itself can’t possibly do the same thing.
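For anyone curious what that kind of look-ahead looks like, here is a minimal sketch of depth-limited minimax. It is a toy, not chess: the game is "take 1 to 3 stones, whoever takes the last one wins", and the function names are made up for illustration. I am not claiming this is what the Atari cartridge actually runs.

```python
from typing import List


def legal_moves(stones: int) -> List[int]:
    """Toy game: remove 1, 2, or 3 stones; whoever takes the last stone wins."""
    return [n for n in (1, 2, 3) if n <= stones]


def minimax(stones: int, depth: int, maximizing: bool) -> int:
    """Score a position from the maximizing player's point of view.

    +1 means the maximizer wins, -1 means the maximizer loses,
    0 means the search ran out of depth before reaching the end.
    """
    if stones == 0:
        # The previous player took the last stone, so the side to move has lost.
        return -1 if maximizing else 1
    if depth == 0:
        # Out of look-ahead. A chess program would plug in a heuristic
        # evaluation of the board here instead of a flat 0.
        return 0
    scores = [
        minimax(stones - move, depth - 1, not maximizing)
        for move in legal_moves(stones)
    ]
    return max(scores) if maximizing else min(scores)


def best_move(stones: int, depth: int) -> int:
    """Pick the move whose resulting position scores best for the side to move."""
    return max(
        legal_moves(stones),
        key=lambda move: minimax(stones - move, depth - 1, False),
    )
```

With enough depth, `best_move(5, 6)` picks the move that leaves the opponent on 4 stones, which is a lost position for whoever has to move. The point is that the program gets there by mechanically checking the tree of moves, which is exactly the part a text predictor does not do on its own.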
What exactly is it that makes the image-generating AI use the ugliest colors for backgrounds? This one is like the stained walls in a chain-smoker's house.
For some reason it reminds me of a quote from Friends: "voice recognition is gonna be pretty much standard on any computer you buy. So you can be like 'wash my car', 'clean my room'. You know it's not gonna be able to do any of those things, but it'll understand what you're saying"
"Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid." attributed to Einstein but I read he didn't say it.
I’d like to see the Atari write a shitty article about the seven best and worst kinds of moviegoers. Or role play with me that they are a 300 year old sparkly vampire and I am an insufferable teenage girl with zero friends or ability to emote proper human emotions.
No shit. Chess programs are specifically built and optimised to the nth degree for this specific use case and nothing else. They do not share the massive compute overhead and convoluted nondeterministic nature of an LLM.
This is like drag racing an F1 car and a Camry and being surprised at the result.
It’s a fundamental limitation of how LLMs work. They simply don’t follow a set of rules the way a traditionally programmed computer game does.
Imagine you have only long-term memory that you can’t add to. You might get a few sentences of short-term memory before you’ve forgotten the context of the beginning of the conversation.
Then add on the fact that chess is very much a forward-thinking game and LLMs don’t stand a chance against other methods. It’s the classic case of “When all you have is a hammer, everything looks like a nail.” LLMs can be a great tool, but they can’t be your only tool.
My biggest disappointment with how AI is being implemented is the inability to incorporate context-specific execution of small programs to emulate things like calculators and chess programs. Like, why does it do the hard-mode approach to literally everything? When asked to do math, why doesn't it execute something that emulates a calculator?
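The idea is simple enough to sketch. Below is a rough illustration of "if it looks like arithmetic, hand it to a calculator instead of guessing". The `calculate` and `route` names and the regex check are hypothetical, made up for this example, and this is not how any particular vendor wires up its tools.

```python
import ast
import operator
import re

# Only these operators are allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def calculate(expression: str) -> float:
    """Evaluate plain arithmetic by walking the parsed syntax tree (no eval)."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


# Crude check: the prompt contains nothing but digits, spaces, and operators.
_ARITHMETIC = re.compile(r"^[\d\s.+\-*/()]+$")


def route(prompt: str) -> str:
    """Send arithmetic to the calculator; everything else goes to the model."""
    if _ARITHMETIC.match(prompt):
        return str(calculate(prompt))
    return "(hand off to the language model)"
```

So `route("12 * 12")` comes back from the calculator, while `route("who won the game?")` falls through to the model. The hard part in practice is the routing decision, not the calculator.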
LLMs have been getting reasoning front ends added to them, like o3 and DeepSeek. That's why they can solve problems that simple LLMs failed at.
I found one reference to o3 being rated around 800 at chess, but I'd really like to see Atari chess vs o3. My telling my friend how I think it would fail isn't convincing.
There are chess engines floating around out there under 4 KB, probably even less if you were writing the thing directly in assembly for the 6502 instruction set. Still, 128 bytes of RAM to work with is punishing.
But chess is mostly a 'solved problem' computationally. It's impressive given the hardware constraints, but this whole Atari vs ChatGPT thing is like a grandmaster in a Mechanical Turk vs a toddler.