4mo ago

Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents

2mo ago

AI can't even run a vending machine -- Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents
