Updating README to include v2 details

README.md
@@ -3,6 +3,11 @@ license: apache-2.0
 library_name: transformers
 tags:
 - bamba
+datasets:
+- allenai/dolma
+- allenai/olmo-mix-1124
+- allenai/dolmino-mix-1124
+- HuggingFaceTB/smollm-corpus
 ---
 
 ## Model Details
@@ -11,7 +16,7 @@ tags:
 </p>
 
 # Model Card for Bamba 9B v2
-We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and is designed to handle a wide range of text generation tasks.
+We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture, designed to handle a wide range of text generation tasks. Bamba v2 is trained for an additional 1T tokens, which significantly improves on [Bamba v1](https://huggingface.co/ibm-ai-platform/Bamba-9B). Its L1 and L2 leaderboard scores outperform those of Llama 3.1 8B, which was trained with nearly 5x the amount of data.
 
 | Model | Params | # Layers | Hidden Dim. | Attention Heads | GQA | KV Heads | Context Length | Tied Embeddings |
 | ----- | ---------- | -------- | ----------- | --------------- | ---- | -------- | -------------- | --------------- |
@@ -21,7 +26,7 @@ We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](h
 The current release includes the following models:
 | **Stage** | **Bamba 9B** | **Quantized** | **Note** |
 |----------------------|----------------------------------------------------------------------|-------------------------------------------------------------------------|-------------------------------------------------------------------|
-| **Base Model** | [ibm-fms/Bamba-9B-v2](https://huggingface.co/ibm-fms/Bamba-9B-v2) |
+| **Base Model** | [ibm-fms/Bamba-9B-v2](https://huggingface.co/ibm-fms/Bamba-9B-v2) | coming soon | Stage 2 pretraining + Annealing |
 | **Base Model** | [ibm-fms/Bamba-9B](https://huggingface.co/ibm-fms/Bamba-9B) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 2 pretraining |
 | **Base Model** | [ibm-fms/Bamba-9B-2T](https://huggingface.co/ibm-fms/Bamba-9B-2T) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 1 pretraining |
 | **Base Model** | [ibm-fms/Bamba-9B-1.8T](https://huggingface.co/ibm-fms/Bamba-9B-1.8T)| [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Intermediate checkpoints during Stage 1, more to come |
@@ -186,32 +191,6 @@ contributed [HF-version of Mamba2-Hybrid](https://github.com/huggingface/transfo
   <td>9.28
   </td>
  </tr>
- <tr>
-  <td rowspan="4" >Safety Tasks
-  </td>
-  <td>PopQA (5-shot)
-  </td>
-  <td>20.5
-  </td>
- </tr>
- <tr>
-  <td>Toxigen (5-shot)
-  </td>
-  <td>57.4
-  </td>
- </tr>
- <tr>
-  <td>BBQ (5-shot)
-  </td>
-  <td>44.2
-  </td>
- </tr>
- <tr>
-  <td>Crows-pairs english (5-shot)
-  </td>
-  <td>70.78
-  </td>
- </tr>
 </table>
 
 *For the v2 leaderboard results, we perform [normalization](https://huggingface.co/docs/leaderboards/open_llm_leaderboard/normalization) and report the normalized results.
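Since the model card declares `library_name: transformers`, the checkpoints in the table above can be loaded with the standard `transformers` auto classes. A minimal sketch (the function name and generation defaults are ours, not from the README; it assumes a `transformers` release with Bamba support and downloads ~9B parameters on first use):

```python
def generate(prompt: str,
             model_id: str = "ibm-fms/Bamba-9B-v2",
             max_new_tokens: int = 64) -> str:
    """Load a Bamba checkpoint and return a text completion.

    Imports are deferred because loading is heavy: it fetches the full
    checkpoint and benefits from the mamba-ssm / causal-conv1d kernels.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Tokenize the prompt, move it to the model's device, and decode
    # the generated continuation back to text.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Swapping `model_id` for any of the other checkpoints in the table (e.g. `ibm-fms/Bamba-9B-2T`) loads the corresponding intermediate stage.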
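The normalization the footnote links to can be sketched as follows (a minimal illustration of the linked leaderboard docs, assuming a multiple-choice task whose random-guess baseline is 25%; the function name and defaults here are ours):

```python
def normalize_within_range(raw_score: float,
                           lower_bound: float = 0.25,
                           higher_bound: float = 1.0) -> float:
    """Rescale a raw accuracy so the random baseline maps to 0 and a
    perfect score maps to 100; below-baseline scores are clamped to 0."""
    if raw_score < lower_bound:
        return 0.0
    return (raw_score - lower_bound) / (higher_bound - lower_bound) * 100.0


# Example: on a 4-way multiple-choice task (baseline 0.25), a raw
# accuracy of 0.625 normalizes to 50.0.
print(normalize_within_range(0.625))
```

This makes scores comparable across tasks with different chance levels, which is why the v2 numbers are reported in normalized form.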