Commit History

update title
bc0f99c

Zachary Siegel committed on

agent submission instructions
2c91b5e

Zachary Siegel committed on

add results to leaderboard
8de3f0a

Zachary Siegel committed on

added first agent to leaderboard
64319c0

Zachary Siegel committed on

scaffold for core bench
b335ab8

Zachary Siegel committed on

core bench outline
2faf3bd

Zachary Siegel committed on

modified heading and added about tab text
c50a008

benediktstroebl committed on

added one line descriptions to each benchmark with acknowledgements and modified headline
4e68e9f

benediktstroebl committed on

vis update
e24146f

benediktstroebl committed on

Added sorting to heatmap
9250161

benediktstroebl committed on

Update app.py
f5fc72d

benediktstroebl committed on

update
e10e28c

benediktstroebl committed on

added verified agents management and column and fixed widths
b7d1f08

benediktstroebl committed on

added task heatmaps
47280b7

benediktstroebl committed on

Update
fd35772

benediktstroebl committed on

Added MLAgentBench
22fef14

benediktstroebl committed on

Update app.py
9f9bed8

benediktstroebl committed on

Update
ff06039

benediktstroebl committed on

Update app.py
f400b47

benediktstroebl committed on

Big update with SQL backend
e59dbdb

benediktstroebl committed on

added timestamp to task summary prompt for failure report and fixed failure report gradio issue
19bb306

benediktstroebl committed on

added failure report and two new swebench variants
5a7e21a

benediktstroebl committed on

formatting and download fix
eb2a754

benediktstroebl committed on

fixed step headline not showing
8946d7b

benediktstroebl committed on

format update
3e874db

benediktstroebl committed on

layout update
066588c

benediktstroebl committed on

refactoring
5cbaf0e

benediktstroebl committed on

refactoring and USACO as default front page
221fb8a

benediktstroebl committed on

new data structure with global dict for faster processing
f9140ad

benediktstroebl committed on

big update with raw predictions section and dropdowns that dynamically parse agents of current leaderboard
ca89148

benediktstroebl committed on

Added Raw prediction dashboard
07044da

benediktstroebl committed on

added task flow plot
575c750

benediktstroebl committed on

added initial version of visibility feature and fixed automatic update of results every hour
0b3117f

benediktstroebl committed on

fixed sorting. Modified axis labels
bf0e375

benediktstroebl committed on

added usaco
387c612

benediktstroebl committed on

added auto update every 1 h of HF space
5f9c44d

benediktstroebl committed on

added Pareto frontier to plot
4415138

benediktstroebl committed on

added auto update
bb94aa7

benediktstroebl committed on

update
5b0a5d3

benediktstroebl committed on

Fix auto update
108bc02

benediktstroebl committed on

added automatic download for results
1783518

benediktstroebl committed on

update
356b0eb

benediktstroebl committed on

app file
7e8296f

benediktstroebl committed on