shchuro committed on
Commit
393f9c5
·
1 Parent(s): c85d72b

Update strings

Files changed (2)
  1. pages/about.py +2 -1
  2. src/strings.py +9 -2
pages/about.py CHANGED
@@ -8,11 +8,12 @@ ABOUT_LEADERBOARD = """

  ### 📚 Resources
  - **Documentation**: [Official docs](https://autogluon.github.io/fev/latest/)
+ - **Publication**: ["fev-bench: A Realistic Benchmark for Time Series Forecasting"](https://arxiv.org/abs/2509.26468)
  - **Source Code**: [GitHub repository](https://github.com/autogluon/fev)
  - **Issues & Questions**: [GitHub Issues](https://github.com/autogluon/fev/issues)

  ### 🚀 Submit Your Model
- Ready to add your model to the leaderboard? Follow this [tutorial](https://autogluon.github.io/fev/latest/tutorials/04-models/) to evaluate your model with fev and contribute your results.
+ Ready to add your model to the leaderboard? Follow this [tutorial](https://autogluon.github.io/fev/latest/tutorials/05-add-your-model/) to evaluate your model with fev and contribute your results.
  """
  st.set_page_config(layout="wide", page_title="About FEV", page_icon=":material/info:")
  st.markdown(ABOUT_LEADERBOARD)
src/strings.py CHANGED
@@ -14,9 +14,13 @@ Model names are colored by type: <span style='color: {COLORS["dl_text"]}; font-w

  The full matrix $E_{{rj}}$ with the error of each model $j$ on task $r$ is available at the bottom of the page.

- * **Avg. win rate (%)**: Fraction of all possible model pairs and tasks where this model achieves lower error than the competing model. For model $j$, defined as $W_j = \\frac{{1}}{{R(M-1)}} \\sum_{{r=1}}^{{R}} \\sum_{{k \\neq j}} (\\mathbf{{1}}(E_{{rj}} < E_{{rk}}) + 0.5 \\cdot \\mathbf{{1}}(E_{{rj}} = E_{{rk}}))$ where $R$ is number of tasks, $M$ is number of models. Ties count as half-wins. Ranges from 0% (worst) to 100% (best). Higher values are better.
+ * **Avg. win rate (%)**: Fraction of all possible model pairs and tasks where this model achieves lower error than the competing model. For model $j$, defined as $W_j = \\frac{{1}}{{R(M-1)}} \\sum_{{r=1}}^{{R}} \\sum_{{k \\neq j}} (\\mathbf{{1}}(E_{{rj}} < E_{{rk}}) + 0.5 \\cdot \\mathbf{{1}}(E_{{rj}} = E_{{rk}}))$ where $R$ is number of tasks, $M$ is number of models. Ties count as half-wins.

- * **Skill score (%)**: Measures how much the model reduces forecasting error compared to the Seasonal Naive baseline. Computed as $S_j = 100 \\times (1 - \\sqrt[R]{{\\prod_{{r=1}}^{{R}} E_{{rj}}/E_{{r\\beta}}}})$, where $E_{{r\\beta}}$ is baseline error on task $r$. Relative errors are clipped between 0.01 and 100 before aggregation to avoid extreme outliers. Positive values indicate better-than-baseline performance, negative values indicate worse-than-baseline performance. Higher values are better.
+ Ranges from 0% (worst) to 100% (best). Higher values are better. This value changes as new models are added to the benchmark.
+
+ * **Skill score (%)**: Measures how much the model reduces forecasting error compared to the Seasonal Naive baseline. Computed as $S_j = 100 \\times (1 - \\sqrt[R]{{\\prod_{{r=1}}^{{R}} E_{{rj}}/E_{{r\\beta}}}})$, where $E_{{r\\beta}}$ is baseline error on task $r$. Relative errors are clipped between 0.01 and 100 before aggregation to avoid extreme outliers. Positive values indicate better-than-baseline performance, negative values indicate worse-than-baseline performance.
+
+ Higher values are better. This value does not change as new models are added to the benchmark.

  * **Median runtime (s)**: Median end-to-end time (training + prediction across all evaluation windows) in seconds. Note that inference times depend on hardware, batch sizes, and implementation details, so these serve as a rough guide rather than definitive performance benchmarks.

@@ -57,6 +61,9 @@ CITATION_FEV = """
  title={{fev-bench}: A Realistic Benchmark for Time Series Forecasting},
  author={Shchur, Oleksandr and Ansari, Abdul Fatir and Turkmen, Caner and Stella, Lorenzo and Erickson, Nick and Guerron, Pablo and Bohlke-Schneider, Michael and Wang, Yuyang},
  year={2025},
+ eprint={2509.26468},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG}
  }
  ```
  """