Gregor Betz committed on
Commit 97d5b99 · unverified · 1 Parent(s): ad554f1

description

Files changed (1):
  1. src/display/about.py (+6 -17)
src/display/about.py CHANGED

@@ -53,23 +53,12 @@ Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingfac
 
 Unlike these leaderboards, the `/\/` Open CoT Leaderboard assess a model's ability to effectively reason about a `task`:
 
-| Leaderboard | Measures | Metric | Focus |
-|:---|:---|:---|:---|
-| 🤗 Open LLM Leaderboard | Task performance | Absolute accuracy | Task performance |
-
-### 🤗 Open LLM Leaderboard
-* Can `model` solve `task`?
-* Measures `task` performance.
-* Metric: absolute accuracy.
-* Covers broad spectrum of `tasks`.
-
-### `/\/` Open CoT Leaderboard
-* Can `model` do CoT to improve in `task`?
-* Measures ability to reason (about `task`).
-* Metric: relative accuracy gain.
-* Focuses on critical thinking `tasks`.
-
-
+| 🤗 Open LLM Leaderboard | `/\/` Open CoT Leaderboard |
+|:---|:---|
+| Can `model` solve `task`? | Can `model` do CoT to improve in `task`? |
+| Measures `task` performance. | Measures ability to reason (about `task`). |
+| Metric: absolute accuracy. | Metric: relative accuracy gain. |
+| Covers broad spectrum of `tasks`. | Focuses on critical thinking `tasks`. |
 
 
 ## Test dataset selection (`tasks`)
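The diff's table names "relative accuracy gain" as the Open CoT Leaderboard's metric, in contrast to the Open LLM Leaderboard's absolute accuracy. As a minimal sketch of what that could mean (an illustrative definition only; the function name and the exact formula are assumptions, not taken from the leaderboard's code):

```python
def relative_accuracy_gain(base_accuracy: float, cot_accuracy: float) -> float:
    """Accuracy gain from chain-of-thought, relative to a no-CoT baseline.

    NOTE: illustrative definition only; the Open CoT Leaderboard's actual
    implementation may normalize or aggregate differently.
    """
    if base_accuracy <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (cot_accuracy - base_accuracy) / base_accuracy


# Example: baseline 50% accuracy, 60% with CoT -> 20% relative gain.
gain = relative_accuracy_gain(0.5, 0.6)
```

Under this reading, a model is rewarded for how much CoT improves its own baseline, not for raw task performance.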