Updating README to include v2 details

README.md CHANGED

@@ -3,6 +3,11 @@ license: apache-2.0
 library_name: transformers
 tags:
 - bamba
+datasets:
+- allenai/dolma
+- allenai/olmo-mix-1124
+- allenai/dolmino-mix-1124
+- HuggingFaceTB/smollm-corpus
 ---
 
 ## Model Details
@@ -11,7 +16,7 @@ tags:
 </p>
 
 # Model Card for Bamba 9B v2
-We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and is designed to handle a wide range of text generation tasks.
+We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture, designed to handle a wide range of text generation tasks. Bamba v2 is trained on an additional 1T tokens, which significantly improves on [Bamba v1](https://huggingface.co/ibm-ai-platform/Bamba-9B). Its L1 and L2 leaderboard scores outperform Llama 3.1 8B, which was trained with nearly 5x the amount of data.
 
 | Model | Params | # Layers | Hidden Dim. | Attention Heads | GQA | KV Heads | Context Length | Tied Embeddings |
 | ----- | ---------- | -------- | ----------- | --------------- | ---- | -------- | -------------- | --------------- |
@@ -21,7 +26,7 @@ We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](h
 The current release includes the following models:
 | **Stage** | **Bamba 9B** | **Quantized** | **Note** |
 |----------------------|----------------------------------------------------------------------|-------------------------------------------------------------------------|-------------------------------------------------------------------|
-| **Base Model** | [ibm-fms/Bamba-9B-v2](https://huggingface.co/ibm-fms/Bamba-9B-v2) |
+| **Base Model** | [ibm-fms/Bamba-9B-v2](https://huggingface.co/ibm-fms/Bamba-9B-v2) | coming soon | Stage 2 pretraining + Annealing |
 | **Base Model** | [ibm-fms/Bamba-9B](https://huggingface.co/ibm-fms/Bamba-9B) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 2 pretraining |
 | **Base Model** | [ibm-fms/Bamba-9B-2T](https://huggingface.co/ibm-fms/Bamba-9B-2T) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 1 pretraining |
 | **Base Model** | [ibm-fms/Bamba-9B-1.8T](https://huggingface.co/ibm-fms/Bamba-9B-1.8T)| [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Intermediate checkpoints during Stage 1, more to come |
@@ -186,32 +191,6 @@ contributed [HF-version of Mamba2-Hybrid](https://github.com/huggingface/transfo
 <td>9.28
 </td>
 </tr>
-<tr>
-<td rowspan="4" >Safety Tasks
-</td>
-<td>PopQA (5-shot)
-</td>
-<td>20.5
-</td>
-</tr>
-<tr>
-<td>Toxigen (5-shot)
-</td>
-<td>57.4
-</td>
-</tr>
-<tr>
-<td>BBQ (5-shot)
-</td>
-<td>44.2
-</td>
-</tr>
-<tr>
-<td>Crows-pairs english (5-shot)
-</td>
-<td>70.78
-</td>
-</tr>
 </table>
 
 *For the v2 leaderboard results, we perform [normalization](https://huggingface.co/docs/leaderboards/open_llm_leaderboard/normalization) and report the normalized results.