Update strings
- pages/about.py +2 -1
- src/strings.py +9 -2
pages/about.py
CHANGED
````diff
@@ -8,11 +8,12 @@ ABOUT_LEADERBOARD = """
 
 ### π Resources
 - **Documentation**: [Official docs](https://autogluon.github.io/fev/latest/)
+- **Publication**: ["fev-bench: A Realistic Benchmark for Time Series Forecasting"](https://arxiv.org/abs/2509.26468)
 - **Source Code**: [GitHub repository](https://github.com/autogluon/fev)
 - **Issues & Questions**: [GitHub Issues](https://github.com/autogluon/fev/issues)
 
 ### π Submit Your Model
-Ready to add your model to the leaderboard? Follow this [tutorial](https://autogluon.github.io/fev/latest/tutorials/
+Ready to add your model to the leaderboard? Follow this [tutorial](https://autogluon.github.io/fev/latest/tutorials/05-add-your-model/) to evaluate your model with fev and contribute your results.
 """
 st.set_page_config(layout="wide", page_title="About FEV", page_icon=":material/info:")
 st.markdown(ABOUT_LEADERBOARD)
````
src/strings.py
CHANGED
````diff
@@ -14,9 +14,13 @@ Model names are colored by type: <span style='color: {COLORS["dl_text"]}; font-w
 
 The full matrix $E_{{rj}}$ with the error of each model $j$ on task $r$ is available at the bottom of the page.
 
-* **Avg. win rate (%)**: Fraction of all possible model pairs and tasks where this model achieves lower error than the competing model. For model $j$, defined as $W_j = \\frac{{1}}{{R(M-1)}} \\sum_{{r=1}}^{{R}} \\sum_{{k \\neq j}} (\\mathbf{{1}}(E_{{rj}} < E_{{rk}}) + 0.5 \\cdot \\mathbf{{1}}(E_{{rj}} = E_{{rk}}))$ where $R$ is number of tasks, $M$ is number of models. Ties count as half-wins.
+* **Avg. win rate (%)**: Fraction of all possible model pairs and tasks where this model achieves lower error than the competing model. For model $j$, defined as $W_j = \\frac{{1}}{{R(M-1)}} \\sum_{{r=1}}^{{R}} \\sum_{{k \\neq j}} (\\mathbf{{1}}(E_{{rj}} < E_{{rk}}) + 0.5 \\cdot \\mathbf{{1}}(E_{{rj}} = E_{{rk}}))$ where $R$ is number of tasks, $M$ is number of models. Ties count as half-wins.
 
-
+Ranges from 0% (worst) to 100% (best). Higher values are better. This value changes as new models are added to the benchmark.
+
+* **Skill score (%)**: Measures how much the model reduces forecasting error compared to the Seasonal Naive baseline. Computed as $S_j = 100 \\times (1 - \\sqrt[R]{{\\prod_{{r=1}}^{{R}} E_{{rj}}/E_{{r\\beta}}}})$, where $E_{{r\\beta}}$ is baseline error on task $r$. Relative errors are clipped between 0.01 and 100 before aggregation to avoid extreme outliers. Positive values indicate better-than-baseline performance, negative values indicate worse-than-baseline performance.
+
+Higher values are better. This value does not change as new models are added to the benchmark.
 
 * **Median runtime (s)**: Median end-to-end time (training + prediction across all evaluation windows) in seconds. Note that inference times depend on hardware, batch sizes, and implementation details, so these serve as a rough guide rather than definitive performance benchmarks.
 
````
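The win-rate definition added in this hunk is easy to sanity-check numerically. Below is a minimal sketch, not taken from the fev codebase: `avg_win_rate` is a hypothetical helper and the `errors` matrix is made-up data. It computes $W_j$ for each model from an $R \times M$ error matrix, counting ties as half-wins; a companion sketch for the skill score follows the citation hunk below.

```python
import numpy as np

def avg_win_rate(E: np.ndarray) -> np.ndarray:
    """W_j = 100/(R(M-1)) * sum_r sum_{k!=j} [1(E_rj < E_rk) + 0.5 * 1(E_rj = E_rk)]."""
    R, M = E.shape
    # Pairwise comparison tensors: entry [r, j, k] compares model j vs. model k on task r.
    wins = (E[:, :, None] < E[:, None, :]).astype(float)
    ties = (E[:, :, None] == E[:, None, :]).astype(float)
    score = wins + 0.5 * ties
    # Zero out self-comparisons (j == k), which register as ties above.
    idx = np.arange(M)
    score[:, idx, idx] = 0.0
    return 100 * score.sum(axis=(0, 2)) / (R * (M - 1))

# Made-up 3-task x 3-model error matrix, illustrative only.
errors = np.array([
    [0.8, 1.0, 1.2],
    [0.9, 0.9, 1.1],
    [1.5, 1.0, 0.7],
])
print(avg_win_rate(errors))  # -> [58.33, 58.33, 33.33], one win rate (%) per model
```

Note how the tie in the second task splits a win between the first two models, which is why their rates match despite different per-task errors.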
````diff
@@ -57,6 +61,9 @@ CITATION_FEV = """
 title={{fev-bench}: A Realistic Benchmark for Time Series Forecasting},
 author={Shchur, Oleksandr and Ansari, Abdul Fatir and Turkmen, Caner and Stella, Lorenzo and Erickson, Nick and Guerron, Pablo and Bohlke-Schneider, Michael and Wang, Yuyang},
 year={2025},
+eprint={2509.26468},
+archivePrefix={arXiv},
+primaryClass={cs.LG}
 }
 ```
 """
````
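For the skill score introduced in the first src/strings.py hunk, here is a similarly hedged sketch, assuming NumPy and made-up per-task errors (`skill_score` is a hypothetical helper, not fev's actual implementation). It clips the relative errors to [0.01, 100] and aggregates them with a geometric mean, as the docstring describes.

```python
import numpy as np

def skill_score(model_errors: np.ndarray, baseline_errors: np.ndarray) -> float:
    """S_j = 100 * (1 - geometric_mean(E_rj / E_r,beta)), relative errors clipped to [0.01, 100]."""
    rel = np.clip(model_errors / baseline_errors, 0.01, 100.0)
    # Geometric mean computed in log space for numerical stability.
    return 100.0 * (1.0 - np.exp(np.log(rel).mean()))

# Made-up per-task errors for one model vs. the Seasonal Naive baseline.
model = np.array([0.8, 0.9, 1.0])
baseline = np.array([1.0, 1.0, 1.0])
print(skill_score(model, baseline))  # -> ~10.4, i.e. ~10% lower error than the baseline
```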