Updating README to include v2 details

README.md
@@ -3,6 +3,11 @@ license: apache-2.0
 library_name: transformers
 tags:
 - bamba
+datasets:
+- allenai/dolma
+- allenai/olmo-mix-1124
+- allenai/dolmino-mix-1124
+- HuggingFaceTB/smollm-corpus
 ---
 
 ## Model Details
@@ -11,7 +16,7 @@ tags:
 </p>
 
 # Model Card for Bamba 9B v2
-We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and is designed to handle a wide range of text generation tasks.
+We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture, designed to handle a wide range of text generation tasks. Bamba v2 is trained for an additional 1T tokens, which significantly improves on [Bamba v1](https://huggingface.co/ibm-ai-platform/Bamba-9B). Its L1 and L2 leaderboard scores outperform those of Llama 3.1 8B, which was trained with nearly 5x the amount of data.
 
 | Model | Params | # Layers | Hidden Dim. | Attention Heads | GQA | KV Heads | Context Length | Tied Embeddings |
 | ----- | ---------- | -------- | ----------- | --------------- | ---- | -------- | -------------- | --------------- |
@@ -21,7 +26,7 @@ We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](h
 The current release includes the following models:
 | **Stage** | **Bamba 9B** | **Quantized** | **Note** |
 |----------------------|----------------------------------------------------------------------|-------------------------------------------------------------------------|-------------------------------------------------------------------|
-| **Base Model** | [ibm-fms/Bamba-9B-v2](https://huggingface.co/ibm-fms/Bamba-9B-v2) |
+| **Base Model** | [ibm-fms/Bamba-9B-v2](https://huggingface.co/ibm-fms/Bamba-9B-v2) | coming soon | Stage 2 pretraining + Annealing |
 | **Base Model** | [ibm-fms/Bamba-9B](https://huggingface.co/ibm-fms/Bamba-9B) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 2 pretraining |
 | **Base Model** | [ibm-fms/Bamba-9B-2T](https://huggingface.co/ibm-fms/Bamba-9B-2T) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 1 pretraining |
 | **Base Model** | [ibm-fms/Bamba-9B-1.8T](https://huggingface.co/ibm-fms/Bamba-9B-1.8T)| [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Intermediate checkpoints during Stage 1, more to come |
@@ -186,32 +191,6 @@ contributed [HF-version of Mamba2-Hybrid](https://github.com/huggingface/transfo
   <td>9.28
   </td>
  </tr>
- <tr>
-  <td rowspan="4" >Safety Tasks
-  </td>
-  <td>PopQA (5-shot)
-  </td>
-  <td>20.5
-  </td>
- </tr>
- <tr>
-  <td>Toxigen (5-shot)
-  </td>
-  <td>57.4
-  </td>
- </tr>
- <tr>
-  <td>BBQ (5-shot)
-  </td>
-  <td>44.2
-  </td>
- </tr>
- <tr>
-  <td>Crows-pairs english (5-shot)
-  </td>
-  <td>70.78
-  </td>
- </tr>
 </table>
 
 *For the v2 leaderboard results, we perform [normalization](https://huggingface.co/docs/leaderboards/open_llm_leaderboard/normalization) and report the normalized results.
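Since the model card declares `library_name: transformers`, the checkpoints in the table above can be loaded with the standard `transformers` auto classes. A minimal sketch (the function name and generation defaults are ours, not from the README; it assumes a `transformers` release with Bamba support and downloads ~9B parameters on first use):

```python
def generate(prompt: str,
             model_id: str = "ibm-fms/Bamba-9B-v2",
             max_new_tokens: int = 64) -> str:
    """Load a Bamba checkpoint and return a text completion.

    Imports are deferred because loading is heavy: it fetches the full
    checkpoint and benefits from the mamba-ssm / causal-conv1d kernels.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Tokenize the prompt, move it to the model's device, and decode
    # the generated continuation back to text.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Swapping `model_id` for any of the other checkpoints in the table (e.g. `ibm-fms/Bamba-9B-2T`) loads the corresponding intermediate stage.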
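The normalization the footnote links to can be sketched as follows (a minimal illustration of the linked leaderboard docs, assuming a multiple-choice task whose random-guess baseline is 25%; the function name and defaults here are ours):

```python
def normalize_within_range(raw_score: float,
                           lower_bound: float = 0.25,
                           higher_bound: float = 1.0) -> float:
    """Rescale a raw accuracy so the random baseline maps to 0 and a
    perfect score maps to 100; below-baseline scores are clamped to 0."""
    if raw_score < lower_bound:
        return 0.0
    return (raw_score - lower_bound) / (higher_bound - lower_bound) * 100.0


# Example: on a 4-way multiple-choice task (baseline 0.25), a raw
# accuracy of 0.625 normalizes to 50.0.
print(normalize_within_range(0.625))
```

This makes scores comparable across tasks with different chance levels, which is why the v2 numbers are reported in normalized form.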