Spaces: Running on CPU Upgrade
Gregor Betz committed: description

src/display/about.py (+6 / -17)
src/display/about.py
CHANGED
@@ -53,23 +53,12 @@ Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingfac
 
 Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:
 
-### 🤗 Open LLM Leaderboard
-
-
-
-
-
-* Measures `task` performance.
-* Metric: absolute accuracy.
-* Covers broad spectrum of `tasks`.
-
-### `/\/` Open CoT Leaderboard
-* Can `model` do CoT to improve in `task`?
-* Measures ability to reason (about `task`).
-* Metric: relative accuracy gain.
-* Focuses on critical thinking `tasks`.
-
-
+| 🤗 Open LLM Leaderboard | `/\/` Open CoT Leaderboard |
+|:---|:---|
+| Can `model` solve `task`? | Can `model` do CoT to improve in `task`? |
+| Measures `task` performance. | Measures ability to reason (about `task`). |
+| Metric: absolute accuracy. | Metric: relative accuracy gain. |
+| Covers broad spectrum of `tasks`. | Focuses on critical thinking `tasks`. |
 
 
 ## Test dataset selection (`tasks`)
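The two metrics contrasted in the added table can be sketched in a few lines of Python. This is an illustrative sketch, not code from the repository: the function names and inputs are hypothetical, and it assumes "relative accuracy gain" means the plain difference between CoT and baseline accuracy (the leaderboard's exact definition may differ).

```python
# Illustrative sketch only -- names and the gain definition are assumptions,
# not taken from the Open CoT Leaderboard codebase.

def accuracy(correct: list[bool]) -> float:
    """Absolute accuracy: fraction of task items answered correctly."""
    return sum(correct) / len(correct)

def relative_accuracy_gain(base_correct: list[bool],
                           cot_correct: list[bool]) -> float:
    """How much chain-of-thought improves over answering directly."""
    return accuracy(cot_correct) - accuracy(base_correct)

# Per-item correctness with and without CoT prompting (toy data):
base = [True, False, False, True]   # 2/4 correct without CoT
cot = [True, True, False, True]     # 3/4 correct with CoT

print(accuracy(base))                     # 0.5
print(relative_accuracy_gain(base, cot))  # 0.25
```

The point of the contrast: a model can score high on absolute accuracy while showing zero (or negative) gain from reasoning, which is exactly what the CoT leaderboard's relative metric is designed to surface.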