To submit a new agent for evaluation, developers should only need to:

  1. Ensure that the agent provides a specific entry point (e.g., a Python script or function).

  2. Integrate logging by wrapping all LLM API calls to report cost, latency, and relevant parameters.

    • For our own evaluations, we have been relying on Weights & Biases' Weave, which provides integrations for a number of LLM providers.
    • Both Vivaria and UK AISI's Inspect provide logging functionality.
    • However, they are missing some pieces we are interested in, such as the latency and parameters of LLM calls. Weave provides a minimum-effort solution.
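
The two steps above can be sketched in plain Python. This is a minimal illustration using only the standard library, not Weave's, Vivaria's, or Inspect's API. All names (`log_llm_call`, `fake_llm_call`, `run_agent`, `CALL_LOG`) and the per-token prices are hypothetical, and the LLM call is a stand-in that returns a canned response with token counts.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory log; a real setup would forward records to a tracing backend.
CALL_LOG: list[dict[str, Any]] = []

# Hypothetical per-token prices in USD, for illustration only.
PRICE_PER_INPUT_TOKEN = 1e-6
PRICE_PER_OUTPUT_TOKEN = 2e-6


def log_llm_call(fn: Callable[..., dict[str, Any]]) -> Callable[..., dict[str, Any]]:
    """Wrap an LLM call to record its explicit keyword parameters, latency, and cost."""

    @functools.wraps(fn)
    def wrapper(**params: Any) -> dict[str, Any]:
        start = time.perf_counter()
        response = fn(**params)
        latency_s = time.perf_counter() - start

        # Assumes the response carries token counts under a "usage" key.
        usage = response.get("usage", {})
        cost_usd = (
            usage.get("input_tokens", 0) * PRICE_PER_INPUT_TOKEN
            + usage.get("output_tokens", 0) * PRICE_PER_OUTPUT_TOKEN
        )
        CALL_LOG.append(
            {
                "function": fn.__name__,
                "params": params,  # only the keyword arguments the caller passed
                "latency_s": latency_s,
                "cost_usd": cost_usd,
            }
        )
        return response

    return wrapper


@log_llm_call
def fake_llm_call(*, model: str, prompt: str, temperature: float = 0.0) -> dict[str, Any]:
    """Stand-in for a provider client call; returns a canned response."""
    return {
        "text": prompt.upper(),
        "usage": {"input_tokens": len(prompt.split()), "output_tokens": 3},
    }


def run_agent(task: str) -> str:
    """Hypothetical single entry point that an evaluation harness could call."""
    response = fake_llm_call(model="example-model", prompt=task, temperature=0.2)
    return response["text"]
```

Calling `run_agent("solve this task")` appends one record to `CALL_LOG` containing the call's parameters, latency, and estimated cost. In practice, a tracing library would replace the hand-written decorator and the in-memory list.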