Task success heatmap
The task success heatmap shows which agents can solve which tasks. Agents are sorted by total accuracy (higher is better). Tasks are sorted from easiest to hardest: tasks on the left are solved by the most agents, and tasks on the right are solved by the fewest. For agents that have been run more than once, the run with the highest score is shown.
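The ordering rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the leaderboard's actual code. The input format (a list of `(agent, {task: solved})` pairs), the hypothetical function name `build_heatmap`, and the use of "number of tasks solved" as a run's score are all assumptions.

```python
# Hypothetical input format: each run is (agent_name, {task_id: solved}).
Run = tuple[str, dict[str, bool]]


def build_heatmap(runs: list[Run]) -> tuple[list[str], list[str], list[list[int]]]:
    """Return (agents, tasks, matrix) ordered as described above (sketch)."""
    # Keep only the highest-scoring run for each agent (score = tasks solved, assumed).
    best: dict[str, dict[str, bool]] = {}
    for agent, results in runs:
        if agent not in best or sum(results.values()) > sum(best[agent].values()):
            best[agent] = results

    # Union of all tasks; alphabetical order is only a tie-breaker.
    tasks = sorted({t for results in best.values() for t in results})

    def accuracy(agent: str) -> float:
        # Tasks missing from a run count as unsolved.
        solved = sum(best[agent].get(t, False) for t in tasks)
        return solved / (len(tasks) or 1)

    # Agents: highest total accuracy first.
    agents = sorted(best, key=accuracy, reverse=True)

    # Tasks: solved by the most agents (easiest) on the left, fewest on the right.
    solve_count = {t: sum(best[a].get(t, False) for a in agents) for t in tasks}
    tasks.sort(key=lambda t: solve_count[t], reverse=True)

    # 1 = solved, 0 = not solved; one row per agent, one column per task.
    matrix = [[int(best[a].get(t, False)) for t in tasks] for a in agents]
    return agents, tasks, matrix


# Example: agent "A" has two runs; only its better run (3 tasks solved) is kept.
runs = [
    ("A", {"t1": True, "t2": False, "t3": True}),
    ("A", {"t1": True, "t2": True, "t3": True}),
    ("B", {"t1": False, "t2": False, "t3": True}),
    ("C", {"t1": False, "t2": False, "t3": False}),
]
agents, tasks, matrix = build_heatmap(runs)
```

In this example, `t3` moves to the leftmost column because two agents solve it, while `t1` and `t2` are each solved by only one. Python's stable sort keeps ties in alphabetical order. A real leaderboard may break ties differently.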