kostis-init committed on
Commit 2990bc2 · 1 Parent(s): 70cc330

Update README.md with project structure and development instructions; fix error percentage calculation in user_eval.py

Files changed (2)
  1. README.md +33 -1
  2. src/user_eval.py +1 -1
README.md CHANGED
@@ -11,4 +11,36 @@ pinned: true
  license: apache-2.0
  ---
 
- # CP Bench Leaderboard
+ # 🚀 [CP-Bench Leaderboard] – Hugging Face Leaderboard Space
+
+ This repository contains the logic and configuration for the **[CP-Bench Leaderboard]**.
+
+ ## 📁 Structure
+
+ - `app.py` – Launches the Gradio interface.
+ - `src/` – Contains the main logic for fetching and displaying leaderboard data.
+   - `config.py` – Configuration for the leaderboard.
+   - `eval.py` – Evaluation logic for model submissions.
+   - `hf_utils.py` – Helper utilities for interacting with the Hugging Face Hub.
+   - `ui.py` – UI components for displaying the leaderboard.
+   - `user_eval.py` – Evaluation logic for submitted models; it can also be used to evaluate models locally.
+ - `README.md` – (you are here)
+
+ ## 🧠 How It Works
+
+ 1. Users submit a `.jsonl` file with their generated models.
+ 2. The submission is uploaded to a storage repository (Hugging Face Hub).
+ 3. An evaluation script is triggered, which:
+    - Loads the submission.
+    - Evaluates the models against the benchmark dataset.
+    - Computes metrics.
+ 4. The results are stored and displayed on the leaderboard.
+
+ ## 🛠️ Development
+
+ To run locally:
+
+ ```bash
+ pip install -r requirements.txt
+ python app.py
+ ```
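For readers of the new "How It Works" section above: the diff does not show what a submission file actually contains, so the following is a minimal sketch only, assuming each line of the `.jsonl` file is one JSON object and using hypothetical field names (`problem_id`, `model`) that are not defined by this commit; the real schema is whatever `src/user_eval.py` expects.

```python
import json

def load_submission(path: str) -> list[dict]:
    """Read a .jsonl submission: one JSON object per non-empty line.

    The field names referenced by callers (e.g. problem_id, model) are
    illustrative placeholders, not the schema enforced by src/user_eval.py.
    """
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entries.append(json.loads(line))
    return entries

if __name__ == "__main__":
    submission = load_submission("submission.jsonl")
    print(f"Parsed {len(submission)} submitted models")
```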
src/user_eval.py CHANGED
@@ -328,7 +328,7 @@ def evaluate_submission(submitted_models, summary_file_path, modelling_framw, to
  summary_f.write(f" Total Submitted Models Parsed: {total_submitted_models}\n")
  summary_f.write(f" Models That Ran Successfully (out of the submitted models): {models_ran_successfully}/{total_submitted_models}\n")
  summary_f.write(f" Submission coverage perc: {float(total_submitted_models) / len(ground_truth_models) * 100:.2f}%\n")
- summary_f.write(f" Error perc: {(total_submitted_models - models_ran_successfully) / len(ground_truth_models) * 100:.2f}%\n")
+ summary_f.write(f" Error perc: {(total_submitted_models - models_ran_successfully) / total_submitted_models * 100:.2f}%\n")
  summary_f.write(f" Consistency perc: {consistency_checks_passed / len(ground_truth_models) * 100:.2f}%\n")
  summary_f.write(f" Final Solution Accuracy perc: {all_checks_passed / len(ground_truth_models) * 100:.2f}%\n")
  summary_f.write("-" * 30 + "\n")