---
title: Head to Head Evaluations Comparator
emoji: 🦀
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Evaluates 2 models or 1 model w/diff configs on a dataset
---
This Space replicates the evaluation of different models on various datasets.
Dataset: https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro
GitHub: https://github.com/TIGER-AI-Lab/MMLU-Pro
Paper: https://arxiv.org/abs/2406.01574 (submitted to NeurIPS 2024)
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
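
To illustrate what a head-to-head comparison of two models (or one model under two configurations) on the same dataset can look like, here is a minimal sketch in plain Python. It is not taken from this Space's `app.py`: the function name `head_to_head`, its arguments, and the returned keys are hypothetical, and it assumes each model's output has already been reduced to one answer label per question.

```python
from collections import Counter


def head_to_head(gold, preds_a, preds_b):
    """Compare two prediction lists against the gold answers, item by item.

    Hypothetical helper for illustration only; not this Space's actual code.
    """
    if not (len(gold) == len(preds_a) == len(preds_b)):
        raise ValueError("gold, preds_a and preds_b must have the same length")

    outcomes = Counter()
    for answer, a, b in zip(gold, preds_a, preds_b):
        a_ok, b_ok = a == answer, b == answer
        if a_ok and b_ok:
            outcomes["both_correct"] += 1
        elif a_ok:
            outcomes["only_a_correct"] += 1
        elif b_ok:
            outcomes["only_b_correct"] += 1
        else:
            outcomes["both_wrong"] += 1

    n = len(gold)
    correct_a = outcomes["both_correct"] + outcomes["only_a_correct"]
    correct_b = outcomes["both_correct"] + outcomes["only_b_correct"]
    return {
        "n": n,
        "accuracy_a": correct_a / n if n else 0.0,
        "accuracy_b": correct_b / n if n else 0.0,
        "both_correct": outcomes["both_correct"],
        "only_a_correct": outcomes["only_a_correct"],
        "only_b_correct": outcomes["only_b_correct"],
        "both_wrong": outcomes["both_wrong"],
    }


if __name__ == "__main__":
    # Toy data: four multiple-choice questions with made-up answers.
    gold = ["A", "C", "B", "D"]
    model_a = ["A", "C", "D", "D"]
    model_b = ["A", "B", "B", "A"]
    print(head_to_head(gold, model_a, model_b))
```

The per-item breakdown (`only_a_correct`, `only_b_correct`) is reported alongside the two accuracies because two runs with similar overall accuracy can still disagree on many individual questions.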