Update README.md
README.md
@@ -16,7 +16,7 @@ datasets:
 </p>
 
 # Model Card for Bamba 9B v2
 
-We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and designed to handle a wide range of text generation tasks. Bamba v2 is trained for an additional 1T tokens, which significantly improves on [Bamba v1](https://huggingface.co/ibm-ai-platform/Bamba-9B). Its L1 and L2 leaderboard scores outperform those of Llama 3.1 8B, which was trained with nearly 5x the amount of data.
+We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and designed to handle a wide range of text generation tasks. Bamba v2 is trained for an additional 1T tokens, which significantly improves on [Bamba v1](https://huggingface.co/ibm-ai-platform/Bamba-9B-v1). Its L1 and L2 leaderboard scores outperform those of Llama 3.1 8B, which was trained with nearly 5x the amount of data.
 
 | Model | Params | # Layers | Hidden Dim. | Attention Heads | GQA | KV Heads | Context Length | Tied Embeddings |
 | ----- | ---------- | -------- | ----------- | --------------- | ---- | -------- | -------------- | --------------- |
@@ -27,7 +27,7 @@ The current release includes the following models:
 | **Stage** | **Bamba 9B** | **Quantized** | **Note** |
 |----------------------|----------------------------------------------------------------------|-------------------------------------------------------------------------|-------------------------------------------------------------------|
 | **Base Model** | [ibm-fms/Bamba-9B-v2](https://huggingface.co/ibm-ai-platform/Bamba-9B-v2) | coming soon | Stage 2 pretraining + Annealing |
-| **Base Model** | [ibm-fms/Bamba-9B-v1](https://huggingface.co/ibm-fms/Bamba-9B) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 2 pretraining |
+| **Base Model** | [ibm-fms/Bamba-9B-v1](https://huggingface.co/ibm-fms/Bamba-9B-v1) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 2 pretraining |
 | **Base Model** | [ibm-fms/Bamba-9B-2T](https://huggingface.co/ibm-fms/Bamba-9B-2T) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 1 pretraining |
 | **Base Model** | [ibm-fms/Bamba-9B-1.8T](https://huggingface.co/ibm-fms/Bamba-9B-1.8T) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Intermediate checkpoints during Stage 1, more to come |
 | **SFT** | coming soon | coming soon | to be released in the next drop |
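
Since the card text above describes a text-generation checkpoint, a short usage sketch may help. This is not part of the diff itself; it assumes a recent `transformers` release with Bamba support (plus `torch` and `accelerate`) and uses the `ibm-ai-platform/Bamba-9B-v2` repository id from the release table.

```python
# Minimal sketch, not an official example: load the Bamba-9B-v2 checkpoint
# referenced above and generate a short continuation. Assumes a transformers
# version with Bamba support; adjust model_id, dtype, and device placement as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-ai-platform/Bamba-9B-v2"  # repository id from the release table
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "Bamba is a decoder-only language model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```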