---
title: Head to Head Evaluations Comparator
emoji: 🦀
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Evaluates 2 models or 1 model w/diff configs on a dataset
---
This Space replicates model evaluations and compares them head to head: either two models, or one model under two different configurations, on the same dataset.
Dataset: https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro
GitHub: https://github.com/TIGER-AI-Lab/MMLU-Pro
Paper: https://arxiv.org/abs/2406.01574 (submitted to NeurIPS 2024)
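
A head-to-head comparison of two evaluation runs can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in `app.py`: the names `compare_runs` and `HeadToHead`, and the toy answer lists, are hypothetical. It assumes each run yields one predicted option letter per question, scored against the same gold answers.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HeadToHead:
    """Summary of two runs scored on the same questions (hypothetical record)."""
    accuracy_a: float
    accuracy_b: float
    only_a_correct: int
    only_b_correct: int
    both_correct: int
    both_wrong: int


def compare_runs(gold: Sequence[str],
                 preds_a: Sequence[str],
                 preds_b: Sequence[str]) -> HeadToHead:
    """Compare two runs question by question against the gold answers."""
    if not (len(gold) == len(preds_a) == len(preds_b)):
        raise ValueError("gold and both prediction lists must have equal length")
    if not gold:
        raise ValueError("at least one question is required")

    only_a = only_b = both = neither = 0
    for answer, a, b in zip(gold, preds_a, preds_b):
        a_ok, b_ok = (a == answer), (b == answer)
        if a_ok and b_ok:
            both += 1
        elif a_ok:
            only_a += 1
        elif b_ok:
            only_b += 1
        else:
            neither += 1

    total = len(gold)
    return HeadToHead(
        accuracy_a=(both + only_a) / total,
        accuracy_b=(both + only_b) / total,
        only_a_correct=only_a,
        only_b_correct=only_b,
        both_correct=both,
        both_wrong=neither,
    )


# Toy data for illustration only; not real model outputs.
gold = ["A", "B", "C", "D"]
run_a = ["A", "B", "D", "D"]  # e.g. model A, or configuration 1
run_b = ["A", "C", "C", "A"]  # e.g. model B, or configuration 2
result = compare_runs(gold, run_a, run_b)
```

Reporting the questions where only one run is correct, alongside overall accuracy, shows whether two runs with similar scores actually agree on the same questions.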

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference