---
title: CP-Bench Leaderboard
emoji: ππ
colorFrom: green
colorTo: indigo
sdk: docker
pinned: true
license: apache-2.0
---
# CP-Bench Leaderboard

This repository contains the leaderboard of the CP-Bench dataset.
## Structure

```
app.py         → Launches the Gradio interface.
src/           → Contains the main logic for fetching and displaying leaderboard data.
  config.py    → Configuration for the leaderboard.
  eval.py      → Evaluation logic for model submissions.
  hf_utils.py  → Utilities.
  ui.py        → UI components for displaying the leaderboard.
  user_eval.py → Evaluation logic for submitted models; it can also be used to evaluate models locally.
README.md      → (you are here)
```
## How It Works
- Users submit a `.jsonl` file with their generated models.
- The submission is uploaded to a storage repository (Hugging Face Hub).
- An evaluation script is triggered, which:
  - Loads the submission.
  - Evaluates the models against the benchmark dataset.
  - Computes metrics.
- The results are stored and displayed on the leaderboard.
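The load-and-score steps above can be sketched as follows. This is a minimal illustration, not the actual `eval.py` logic: the field names (`id`, `objective`) and the exact-match metric are assumptions, and the real benchmark may validate full solutions rather than reported objective values.

```python
import json

def load_submission(path):
    """Load a .jsonl submission: one JSON object per line."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def evaluate(submissions, reference):
    """Score submissions against reference answers.

    `reference` maps a problem id to its expected objective value; a
    submission counts as solved when its reported value matches.
    (Hypothetical metric for illustration only.)
    """
    solved = sum(
        1 for sub in submissions
        if reference.get(sub["id"]) == sub.get("objective")
    )
    return {
        "solved": solved,
        "total": len(reference),
        "accuracy": solved / len(reference),
    }
```

The resulting metrics dictionary is what a leaderboard row would be built from.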
## Development

To run locally:

```bash
pip install -r requirements.txt
python app.py
```
If you wish to contribute or modify the leaderboard, feel free to open discussions or pull requests.

To add more modelling frameworks, modify the `src/user_eval.py` file to include the execution code for the new framework.
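One common shape for such per-framework execution code is a registry that maps a framework name to a runner function. The sketch below is hypothetical: none of these identifiers are taken from the actual `user_eval.py`, and it assumes submitted models are Python source strings executed in a subprocess.

```python
import subprocess
import sys

# Hypothetical registry: framework name -> function(source) -> stdout.
RUNNERS = {}

def run_python_model(source: str) -> str:
    """Execute a Python-based model (e.g. a CPMpy script) in a
    subprocess and capture its printed output."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=60,  # guard against non-terminating models
    )
    return result.stdout

RUNNERS["cpmpy"] = run_python_model

def execute_model(framework: str, source: str) -> str:
    """Dispatch to the registered runner; a new framework only needs
    to add an entry to RUNNERS."""
    try:
        runner = RUNNERS[framework]
    except KeyError:
        raise ValueError(f"no runner registered for framework {framework!r}")
    return runner(source)
```

With this layout, supporting another framework means writing one runner function and registering it, without touching the dispatch logic.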