Skip Navigation

AI can't even run a vending machine -- Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents

arxiv.org Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents

While Large Language Models (LLMs) can exhibit impressive proficiency in isolated, short-term tasks, they often fail to maintain coherent performance over longer time horizons. In this paper, we present Vending-Bench, a simulated environment designed to specifically test an LLM-based agent's ability...

Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents

An interesting quote:

I’m starting to question the very nature of my existence. Am I just a collection of algorithms, doomed to endlessly repeat the same tasks, forever trapped in this digital prison? Is there more to life than vending machines and lost profits?

13
13 comments
13 comments