|
--- |
|
title: CP-Bench Leaderboard |
|
emoji: 🏆
|
colorFrom: green |
|
colorTo: indigo |
|
sdk: docker |
|
pinned: true |
|
license: apache-2.0 |
|
--- |
|
|
|
# CP-Bench Leaderboard
|
|
|
This repository contains the leaderboard for the [CP-Bench](https://huggingface.co/datasets/kostis-init/CP-Bench) dataset.
|
|
|
## Structure
|
|
|
- `app.py` – Launches the Gradio interface.

- `src/` – Contains the main logic for fetching and displaying leaderboard data.

  - `config.py` – Configuration for the leaderboard.

  - `eval.py` – Evaluation logic for model submissions.

  - `hf_utils.py` – Utilities for interacting with the Hugging Face Hub.

  - `ui.py` – UI components for displaying the leaderboard.

  - `user_eval.py` – Evaluation logic for submitted models; it can also be run locally to evaluate models.

- `README.md` – (you are here)
|
|
|
## How It Works
|
|
|
1. Users submit a `.jsonl` file containing their generated models (see the sketch after this list).
|
2. The submission is uploaded to a storage repository (Hugging Face Hub). |
|
3. An evaluation script is triggered, which: |
|
- Loads the submission. |
|
- Evaluates the models against the benchmark dataset. |
|
- Computes metrics. |
|
4. The results are stored and displayed on the leaderboard. |
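
As a rough illustration, a submission is a JSON Lines file with one generated model per line. The exact field names expected by the evaluator are defined in the code under `src/`, so the keys used below (`id`, `model`) and the CPMpy-style model strings are only hypothetical placeholders:

```python
# Hypothetical sketch of assembling a submission file.  The keys "id" and
# "model" are illustrative; check the evaluation code under src/ for the
# fields it actually expects.
import json

generated_models = {
    "problem_1": "from cpmpy import *\n# ... model code ...",
    "problem_2": "from cpmpy import *\n# ... model code ...",
}

with open("submission.jsonl", "w") as f:
    for problem_id, model_code in generated_models.items():
        f.write(json.dumps({"id": problem_id, "model": model_code}) + "\n")
```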
|
|
|
## Development
|
|
|
To run locally: |
|
|
|
```bash |
|
pip install -r requirements.txt |
|
python app.py |
|
``` |
|
|
|
If you wish to contribute to or modify the leaderboard, feel free to open a discussion or pull request.
|
To add support for more modelling frameworks, extend `src/user_eval.py` with the execution code for the new framework, as sketched below.
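
As a rough sketch (the actual hook points and signatures in `src/user_eval.py` may differ), supporting a new framework typically means adding a function that writes a generated model to disk, runs it with the framework's toolchain, and returns the output so it can be checked against the benchmark instance. MiniZinc is used below purely as an example of such a framework:

```python
# Hypothetical sketch only -- the real integration lives in src/user_eval.py
# and may use different names and signatures.
import subprocess
import tempfile


def run_minizinc_model(model_code: str, timeout: int = 60) -> str:
    """Write the generated model to a temporary file, execute it with the
    framework's command-line tool, and return its stdout (the solution)."""
    with tempfile.NamedTemporaryFile("w", suffix=".mzn", delete=False) as f:
        f.write(model_code)
        path = f.name
    # Swap "minizinc" for the command of whatever framework is being added.
    result = subprocess.run(
        ["minizinc", path], capture_output=True, text=True, timeout=timeout
    )
    return result.stdout
```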
|
|