Duplicated from benediktstroebl/hal

agent-evals/core_leaderboard

core_leaderboard / utils

3 contributors

History: 11 commits

Latest commit: 5a7e21a by benediktstroebl, "added failure report and two new swebench variants", 10 months ago

data.py (9.47 kB): "format update and added monitor llm client backend", 10 months ago
pareto.py (1.34 kB): "big update with raw predictions section and dropdowns that dynamically parse agents of current leaderboard", 10 months ago
processing.py (6.27 kB): "added failure report and two new swebench variants", 10 months ago
viz.py (10.3 kB): "added failure report and two new swebench variants", 10 months ago