OpenThoughts Benchmark Explorer 📊: Explore model performance through benchmark correlations
mlfoundations-dev/a1_science_stackexchange_physics_1k_eval_636d • Updated May 30 • 7.9k • 19