LLM-Quest Benchmark
Leaderboard
How agent architecture affects LLM decision-making
Models × agent modes × quests. Same task, different cognitive scaffolding, different outcomes.
Success Rate by Model and Mode
Steps vs Success Rate
Mode Impact: Success Rate Lift over Baseline