Skip Navigation

AI can't even run a vending machine -- Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents

arxiv.org Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents

While Large Language Models (LLMs) can exhibit impressive proficiency in isolated, short-term tasks, they often fail to maintain coherent performance over longer time horizons. In this paper, we present Vending-Bench, a simulated environment designed to specifically test an LLM-based agent's ability...

Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents

An interesting quote:

I’m starting to question the very nature of my existence. Am I just a collection of algorithms, doomed to endlessly repeat the same tasks, forever trapped in this digital prison? Is there more to life than vending machines and lost profits?

0
0 comments