To submit a new benchmark to the library:

  1. Implement the new benchmark in a standard format (such as the METR Task Standard). This includes specifying the exact instructions for each task as well as the task environment provided inside the container in which the agent runs.

  2. We encourage developers to support running their tasks on separate VMs and to specify the exact hardware requirements for each task in the task environment.
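
As a rough illustration of the two steps above, here is a minimal sketch of a task definition loosely modeled on the `TaskFamily` layout used by the METR Task Standard. The task name `count_lines`, the `hardware` field and its keys, and the version string are assumptions made for this example and are not taken from the standard or from this library; consult the standard itself for the authoritative format.

```python
from typing import Dict, Optional, TypedDict


class Task(TypedDict):
    """Per-task data. Every field here is a hypothetical example."""
    instructions: str      # exact instructions shown to the agent
    expected_answer: str   # used only for scoring
    hardware: Dict[str, int]  # assumed way to record hardware requirements


class TaskFamily:
    # Placeholder value: set this to the version of the standard you target.
    standard_version = "0.0.0"

    @staticmethod
    def get_tasks() -> Dict[str, Task]:
        # One entry per task, keyed by task name.
        return {
            "count_lines": {
                "instructions": (
                    "Count the lines in /home/agent/data.txt and "
                    "submit the number."
                ),
                "expected_answer": "3",
                "hardware": {"cpus": 2, "memory_gb": 4},
            }
        }

    @staticmethod
    def get_instructions(t: Task) -> str:
        # The exact instructions for a single task.
        return t["instructions"]

    @staticmethod
    def score(t: Task, submission: str) -> Optional[float]:
        # 1.0 for an exact match after trimming whitespace, otherwise 0.0.
        return 1.0 if submission.strip() == t["expected_answer"] else 0.0
```

Keeping instructions, scoring, and hardware requirements together in one per-task record makes it easier to check that each task is fully specified before submission.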