Update README.md
README.md
@@ -16,7 +16,7 @@ datasets:
 </p>
 
 # Model Card for Bamba 9B v2
 
-We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and designed to handle a wide range of text generation tasks. Bamba v2 is trained for an additional 1T tokens, which significantly improves on [Bamba v1](https://huggingface.co/ibm-ai-platform/Bamba-9B). Its L1 and L2 leaderboard scores outperform those of Llama 3.1 8B, which was trained with nearly 5x the amount of data.
+We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and designed to handle a wide range of text generation tasks. Bamba v2 is trained for an additional 1T tokens, which significantly improves on [Bamba v1](https://huggingface.co/ibm-ai-platform/Bamba-9B-v1). Its L1 and L2 leaderboard scores outperform those of Llama 3.1 8B, which was trained with nearly 5x the amount of data.
 
 | Model | Params | # Layers | Hidden Dim. | Attention Heads | GQA | KV Heads | Context Length | Tied Embeddings |
 | ----- | ---------- | -------- | ----------- | --------------- | ---- | -------- | -------------- | --------------- |
@@ -27,7 +27,7 @@ The current release includes the following models:
 | **Stage** | **Bamba 9B** | **Quantized** | **Note** |
 |----------------------|----------------------------------------------------------------------|-------------------------------------------------------------------------|-------------------------------------------------------------------|
 | **Base Model** | [ibm-fms/Bamba-9B-v2](https://huggingface.co/ibm-ai-platform/Bamba-9B-v2) | coming soon | Stage 2 pretraining + Annealing |
-| **Base Model** | [ibm-fms/Bamba-9B-v1](https://huggingface.co/ibm-fms/Bamba-9B) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 2 pretraining |
+| **Base Model** | [ibm-fms/Bamba-9B-v1](https://huggingface.co/ibm-fms/Bamba-9B-v1) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 2 pretraining |
 | **Base Model** | [ibm-fms/Bamba-9B-2T](https://huggingface.co/ibm-fms/Bamba-9B-2T) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 1 pretraining |
 | **Base Model** | [ibm-fms/Bamba-9B-1.8T](https://huggingface.co/ibm-fms/Bamba-9B-1.8T) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Intermediate checkpoints during Stage 1, more to come |
 | **SFT** | coming soon | coming soon | to be released in the next drop |
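
Since the card text above describes a text-generation checkpoint, a short usage sketch may help. This is not part of the diff itself; it assumes a recent `transformers` release with Bamba support (plus `torch` and `accelerate`) and uses the `ibm-ai-platform/Bamba-9B-v2` repository id from the release table.

```python
# Minimal sketch, not an official example: load the Bamba-9B-v2 checkpoint
# referenced above and generate a short continuation. Assumes a transformers
# version with Bamba support; adjust model_id, dtype, and device placement as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-ai-platform/Bamba-9B-v2"  # repository id from the release table
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "Bamba is a decoder-only language model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```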