---
title: CP-Bench Leaderboard
emoji: 🚀📡
colorFrom: green
colorTo: indigo
sdk: docker
pinned: true
license: apache-2.0
---

# 🚀 CP-Bench Leaderboard

This repository hosts the leaderboard for the CP-Bench dataset.

πŸ“ Structure

  • app.py β€” Launches the Gradio interface.
  • src/ β€” Contains the main logic for fetching and displaying leaderboard data.'
    • config.py β€” Configuration for the leaderboard.
    • eval.py β€” Evaluation logic for model submissions.
    • hf_utils.py β€” Utilities file.
    • ui.py β€” UI components for displaying the leaderboard.
    • user_eval.py β€” The logic for the evaluation of submitted models, it can also be used to evaluate models locally.
  • README.md β€” (you are here)

## 🧠 How It Works

1. Users submit a `.jsonl` file with their generated models (see the sketch after this list).
2. The submission is uploaded to a storage repository on the Hugging Face Hub.
3. An evaluation script is triggered, which:
   - Loads the submission.
   - Evaluates the models against the benchmark dataset.
   - Computes metrics.
4. The results are stored and displayed on the leaderboard.
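
As a rough illustration of step 1, the snippet below assembles a submission file in Python. This is a minimal sketch only: the exact field names the evaluator expects are defined in `src/user_eval.py`, so `"problem"` and `"model"` here are assumptions, not the confirmed schema.

```python
import json

# Hypothetical submission entries: one generated model per benchmark problem.
# The field names "problem" and "model" are illustrative assumptions; check
# src/user_eval.py for the schema the evaluator actually expects.
entries = [
    {"problem": "knapsack", "model": "from cpmpy import *\n# ... constraint model ..."},
    {"problem": "sudoku", "model": "from cpmpy import *\n# ... constraint model ..."},
]

# Write one JSON object per line, which is all the .jsonl format requires.
with open("submission.jsonl", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
```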

πŸ› οΈ Development

To run locally:

pip install -r requirements.txt
python app.py

If you wish to contribute or modify the leaderboard, feel free to open a discussion or pull request. To add support for another modelling framework, modify `src/user_eval.py` to include the execution code for the new framework; a sketch of what such a hook might look like follows.
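
The sketch below shows one plausible shape for a framework-dispatch hook. The actual function names and signatures in `src/user_eval.py` may differ; `run_model`, the `"my_new_framework"` branch, and `my-solver-cli` are hypothetical placeholders used only to illustrate where execution code for a new framework would be added.

```python
import subprocess
import sys

# Hypothetical framework-dispatch hook; not the actual code in src/user_eval.py.
def run_model(framework: str, model_path: str, timeout: int = 60) -> str:
    """Execute a submitted model file and return its standard output."""
    if framework == "cpmpy":
        # Python-based frameworks can be run directly with the interpreter.
        cmd = [sys.executable, model_path]
    elif framework == "my_new_framework":
        # Add the execution command for the new framework here
        # ("my-solver-cli" is a placeholder, not a real tool).
        cmd = ["my-solver-cli", model_path]
    else:
        raise ValueError(f"Unsupported framework: {framework}")
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return result.stdout
```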