divykum committed on
Commit ad081bc · verified · 1 Parent(s): 0388197

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -16,7 +16,7 @@ datasets:
  </p>
 
  # Model Card for Bamba 9B v2
- We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and is designed to handle a wide range of text generation tasks. Bamba v2 is trained for an additional 1T tokens that significantly improves on [Bamba v1](https://huggingface.co/ibm-ai-platform/Bamba-9B). The L1 and L2 leaderboard scores outperform Llama 3.1 8B, which was trained with nearly 5x the amount of data.
+ We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and is designed to handle a wide range of text generation tasks. Bamba v2 is trained for an additional 1T tokens that significantly improves on [Bamba v1](https://huggingface.co/ibm-ai-platform/Bamba-9B-v1). The L1 and L2 leaderboard scores outperform Llama 3.1 8B, which was trained with nearly 5x the amount of data.
 
  | Model | Params | # Layers | Hidden Dim. | Attention Heads | GQA | KV Heads | Context Length | Tied Embeddings |
  | ----- | ---------- | -------- | ----------- | --------------- | ---- | -------- | -------------- | --------------- |
@@ -27,7 +27,7 @@ The current release includes the following models:
  | **Stage** | **Bamba 9B** | **Quantized** | **Note** |
  |----------------------|----------------------------------------------------------------------|-------------------------------------------------------------------------|-------------------------------------------------------------------|
  | **Base Model** | [ibm-fms/Bamba-9B-v2](https://huggingface.co/ibm-ai-platform/Bamba-9B-v2) | coming soon | Stage 2 pretraining + Annealing |
- | **Base Model** | [ibm-fms/Bamba-9B-v1](https://huggingface.co/ibm-fms/Bamba-9B) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 2 pretraining |
+ | **Base Model** | [ibm-fms/Bamba-9B-v1](https://huggingface.co/ibm-fms/Bamba-9B-v1) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 2 pretraining |
  | **Base Model** | [ibm-fms/Bamba-9B-2T](https://huggingface.co/ibm-fms/Bamba-9B-2T) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 1 pretraining |
  | **Base Model** | [ibm-fms/Bamba-9B-1.8T](https://huggingface.co/ibm-fms/Bamba-9B-1.8T)| [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Intermediate checkpoints during Stage 1, more to come |
  | **SFT** | coming soon | coming soon | to be released in the next drop |