Commit 2990bc2
Parent(s): 70cc330

Update README.md with project structure and development instructions; fix error percentage calculation in user_eval.py

Files changed:
- README.md +33 -1
- src/user_eval.py +1 -1
README.md
CHANGED

@@ -11,4 +11,36 @@ pinned: true
 license: apache-2.0
 ---
 
-# CP
+# [CP-Bench Leaderboard] – Hugging Face Leaderboard Space
+
+This repository contains the logic and configuration for the **[CP-Bench Leaderboard]**.
+
+## Structure
+
+- `app.py` – Launches the Gradio interface.
+- `src/` – Contains the main logic for fetching and displaying leaderboard data.
+- `config.py` – Configuration for the leaderboard.
+- `eval.py` – Evaluation logic for model submissions.
+- `hf_utils.py` – Utilities file.
+- `ui.py` – UI components for displaying the leaderboard.
+- `user_eval.py` – The logic for evaluating submitted models; it can also be used to evaluate models locally.
+- `README.md` – (you are here)
+
+## How It Works
+
+1. Users submit a .jsonl file with their generated models.
+2. The submission is uploaded to a storage repository (Hugging Face Hub).
+3. An evaluation script is triggered, which:
+   - Loads the submission.
+   - Evaluates the models against the benchmark dataset.
+   - Computes metrics.
+4. The results are stored and displayed on the leaderboard.
+
+## Development
+
+To run locally:
+
+```bash
+pip install -r requirements.txt
+python app.py
+```
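The "How It Works" steps in the new README describe reading a .jsonl submission (one generated model per line) before evaluation. As a minimal sketch of that loading step, assuming one JSON object per line; this is illustrative only and not the Space's actual loader or schema:

```python
import json

def load_submission(path):
    """Read a .jsonl submission: one JSON object per line.

    Illustrative sketch only; the real entry schema is defined by the Space.
    """
    entries = []
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError as exc:
                print(f"Skipping malformed line {line_no}: {exc}")
    return entries

# Example usage (hypothetical file name):
# submissions = load_submission("my_models.jsonl")
```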
src/user_eval.py
CHANGED

@@ -328,7 +328,7 @@ def evaluate_submission(submitted_models, summary_file_path, modelling_framw, to
 summary_f.write(f" Total Submitted Models Parsed: {total_submitted_models}\n")
 summary_f.write(f" Models That Ran Successfully (out of the submitted models): {models_ran_successfully}/{total_submitted_models}\n")
 summary_f.write(f" Submission coverage perc: {float(total_submitted_models) / len(ground_truth_models) * 100:.2f}%\n")
-summary_f.write(f" Error perc: {(total_submitted_models - models_ran_successfully) / len(
+summary_f.write(f" Error perc: {(total_submitted_models - models_ran_successfully) / len(total_submitted_models) * 100:.2f}%\n")
 summary_f.write(f" Consistency perc: {consistency_checks_passed / len(ground_truth_models) * 100:.2f}%\n")
 summary_f.write(f" Final Solution Accuracy perc: {all_checks_passed / len(ground_truth_models) * 100:.2f}%\n")
 summary_f.write("-" * 30 + "\n")
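The changed line above sits in a summary block that reports percentages derived from a few counters. As a rough, self-contained sketch of that kind of summary, with hypothetical variable names and assuming every input is a plain integer count (this is not the repository's exact code):

```python
def write_summary(path, total_submitted, ran_successfully,
                  consistency_passed, all_passed, num_ground_truth):
    """Write an evaluation summary; all arguments are integer counts (hypothetical names)."""
    with open(path, "w") as summary_f:
        summary_f.write(f" Total Submitted Models Parsed: {total_submitted}\n")
        summary_f.write(f" Models That Ran Successfully: {ran_successfully}/{total_submitted}\n")
        summary_f.write(f" Submission coverage perc: {total_submitted / num_ground_truth * 100:.2f}%\n")
        # Error rate is measured among the submitted models only.
        summary_f.write(f" Error perc: {(total_submitted - ran_successfully) / total_submitted * 100:.2f}%\n")
        summary_f.write(f" Consistency perc: {consistency_passed / num_ground_truth * 100:.2f}%\n")
        summary_f.write(f" Final Solution Accuracy perc: {all_passed / num_ground_truth * 100:.2f}%\n")
        summary_f.write("-" * 30 + "\n")

# Example with made-up numbers: 90 of 100 benchmark problems submitted, 81 ran successfully.
write_summary("summary.txt", total_submitted=90, ran_successfully=81,
              consistency_passed=70, all_passed=65, num_ground_truth=100)
```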