from dataclasses import dataclass
from enum import Enum


@dataclass
class Task:
    benchmark: str
    metric: str
    col_name: str


class Tasks(Enum):
    task0 = Task("FormulaOne", "success_rate", "Success Rate (%)")


NUM_FEWSHOT = 0
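
# Illustrative sketch (assumption, not part of the original Space code): apps built
# on the Hugging Face leaderboard template commonly derive their result columns
# from the Tasks enum, roughly like:
#
#     BENCHMARK_COLS = [task.value.col_name for task in Tasks]
#     # -> ["Success Rate (%)"]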
TITLE = """
<h1 id="space-title" style="
    text-align: center;
    font-family: 'Segoe UI', 'Helvetica Neue', sans-serif;
    font-weight: 300;
    letter-spacing: 0.05em;
    color: white;
    text-transform: none;
    margin-top: 2rem;
    font-size: 2.6rem;
">
  FormulaOne Leaderboard
</h1>
"""
INTRODUCTION_TEXT = """
Welcome to the official leaderboard for the paper:
**FormulaOne: Measuring the Depth of Algorithmic Reasoning Beyond Competitive Programming** <br>
*Gal Beniamini, Yuval Dor, Alon Vinnikov, Shir Granot Peled, Or Weinstein, Or Sharir, Noam Wies, Tomer Nussbaum, Nadav Schweiger, Ido Ben Shaul, Tomer Zekharya, Yoav Levine, Shai Shalev-Shwartz, Amnon Shashua* <br>
**AAI, July 2025**
FormulaOne is a new benchmark designed to challenge frontier AI models. The benchmark is constructed from a vast and conceptually diverse family of dynamic programming problems derived from Monadic Second-Order (MSO) logic on graphs, a framework with profound connections to theoretical computer science.
"""
LLM_BENCHMARKS_TEXT = """
## How it works
## Reproducibility
To reproduce our results, here are the commands you can run:
"""
EVALUATION_QUEUE_TEXT = """
## 🧪 Submitting to the FormulaOne Leaderboard
This leaderboard evaluates systems on the FormulaOne core dataset. Submissions consist of a `.jsonl` file with solution code for each problem.
### 📄 I. Format Your Submission File
Your submission must be a `.jsonl` file with one entry per problem:
```json
{"problem_id": "1", "solution": "<your Python code here>"}
{"problem_id": "2", "solution": "<your Python code here>"}
...
```
- `problem_id`: Must match the official list of FormulaOne core problems.
- `solution`: Python code implementing the required callback functions.
🔗 Full list of `problem_id`s:
View the [FormulaOne core dataset](https://github.com/double-ai/formulaone-dataset-release/dataset/formulaone) for the complete list of problem IDs.
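For example, here is a minimal, unofficial sketch of how such a file could be produced, assuming your generated solutions live in a Python dict keyed by problem ID (the names below are illustrative, not part of the official tooling):
```python
import json

# Hypothetical container: problem_id -> solution source code produced by your system.
solutions = {
    "1": "def callback(...): ...",
    "2": "def callback(...): ...",
}

# Write one JSON object per line, matching the expected submission schema.
with open("submission.jsonl", "w", encoding="utf-8") as f:
    for problem_id, code in solutions.items():
        f.write(json.dumps({"problem_id": problem_id, "solution": code}) + "\n")
```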
⚠️ Validation Rules:
Submissions must:
- Contain exactly two columns: ["problem_id", "solution"]
- Include all required problems (no missing/unknown IDs)
- Provide solutions as Python strings
- Avoid duplicates
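As a rough local pre-check along these lines (unofficial sketch; the actual validator may differ, and `PROBLEM_IDS` is a placeholder for the official ID list):
```python
import json

# Placeholder: replace with the official FormulaOne core problem IDs.
PROBLEM_IDS = {"1", "2"}

def precheck(path: str) -> None:
    seen = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            # Exactly the two expected keys per entry.
            assert set(entry) == {"problem_id", "solution"}, "unexpected columns"
            # Known problem ID, solution given as a Python string, no duplicates.
            assert entry["problem_id"] in PROBLEM_IDS, "unknown problem_id"
            assert isinstance(entry["solution"], str), "solution must be a string"
            assert entry["problem_id"] not in seen, "duplicate problem_id"
            seen.add(entry["problem_id"])
    # All required problems present.
    assert seen == PROBLEM_IDS, f"missing IDs: {PROBLEM_IDS - seen}"
```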
### 📤 II. Submit via the UI below
- Upload your `.jsonl` file.
- Fill in the following fields:
- **System Name**
- **Organization**
- **System Type**
- Click **Submit**.
### ⏱️ After Submission
Submissions are validated and evaluated within ~24 hours. Results will appear on the leaderboard once processed.
"""
CITATION_BUTTON_LABEL = """📖 How to cite FormulaOne"""
CITATION_BUTTON_TEXT = r"""
@misc{beniamini2025formulaonemeasuringdepthalgorithmic,
title={FormulaOne: Measuring the Depth of Algorithmic Reasoning Beyond Competitive Programming},
author={Gal Beniamini and Yuval Dor and Alon Vinnikov and Shir Granot Peled and Or Weinstein and Or Sharir and Noam Wies and Tomer Nussbaum and Nadav Schweiger and Ido Ben Shaul and Tomer Zekharya and Yoav Levine and Shai Shalev-Shwartz and Amnon Shashua},
year={2025},
eprint={2507.13337},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2507.13337},
}
"""