benediktstroebl

Task success heatmap

The task success heatmap shows which agents solve which tasks. Agents are sorted by total accuracy (higher is better); tasks are sorted by increasing difficulty (tasks on the left are solved by the most agents; tasks on the right are solved by the fewest). For agents that have been run more than once, only the run with the highest score is shown.
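
The ordering described above can be illustrated with a minimal sketch. This is not the leaderboard's actual code: the record layout, the agent and task names, and the `build_heatmap` helper are all hypothetical, and the choice to put agents in rows with the most accurate agent first is an assumption.

```python
from collections import defaultdict

# Hypothetical run records: (agent, run_id, {task: solved}).
# These names and values are made up for illustration only.
runs = [
    ("agent_a", "run1", {"t1": True, "t2": False, "t3": False}),
    ("agent_a", "run2", {"t1": True, "t2": True, "t3": False}),
    ("agent_b", "run1", {"t1": True, "t2": False, "t3": False}),
    ("agent_c", "run1", {"t1": True, "t2": True, "t3": True}),
]


def accuracy(results):
    """Fraction of tasks solved in one run (0.0 for an empty run)."""
    return sum(results.values()) / len(results) if results else 0.0


def build_heatmap(runs):
    # Keep only the highest-scoring run for each agent.
    best = {}
    for agent, _run_id, results in runs:
        if agent not in best or accuracy(results) > accuracy(best[agent]):
            best[agent] = results

    # Rows: agents, most accurate first (assumed orientation).
    agents = sorted(best, key=lambda a: accuracy(best[a]), reverse=True)

    # Columns: tasks, most-solved (easiest) on the left.
    solved_by = defaultdict(int)
    for results in best.values():
        for task, solved in results.items():
            solved_by[task] += int(solved)
    tasks = sorted(solved_by, key=lambda t: solved_by[t], reverse=True)

    # 1 = solved, 0 = not solved (or task missing from that run).
    matrix = [[int(best[a].get(t, False)) for t in tasks] for a in agents]
    return agents, tasks, matrix


agents, tasks, matrix = build_heatmap(runs)
```

Ties between agents or tasks keep their input order here because `sorted` is stable; how the real heatmap breaks ties is not stated in the text above.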