rganti committed on
Commit
83dc877
·
verified ·
1 Parent(s): 8bc5b3e

Updating README to include v2 details

Files changed (1)
  1. README.md +7 -28
README.md CHANGED
@@ -3,6 +3,11 @@ license: apache-2.0
  library_name: transformers
  tags:
  - bamba
+ datasets:
+ - allenai/dolma
+ - allenai/olmo-mix-1124
+ - allenai/dolmino-mix-1124
+ - HuggingFaceTB/smollm-corpus
  ---
 
  ## Model Details
@@ -11,7 +16,7 @@ tags:
  </p>
 
  # Model Card for Bamba 9B v2
- We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and is designed to handle a wide range of text generation tasks. It is trained from scratch using a two-stage training approach. In the first stage, the model is trained on 2 trillion tokens from the Dolma v1.7 dataset. In the second stage, it undergoes additional training on another 1.1 trillion tokens, leveraging a carefully curated blend of high-quality data to further refine its performance and enhance output quality.
+ We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture, designed to handle a wide range of text generation tasks. Bamba v2 is trained for an additional 1T tokens, significantly improving on [Bamba v1](https://huggingface.co/ibm-ai-platform/Bamba-9B). Its L1 and L2 leaderboard scores outperform Llama 3.1 8B, which was trained with nearly 5x the amount of data.
 
  | Model | Params | # Layers | Hidden Dim. | Attention Heads | GQA | KV Heads | Context Length | Tied Embeddings |
  | ----- | ---------- | -------- | ----------- | --------------- | ---- | -------- | -------------- | --------------- |
@@ -21,7 +26,7 @@ We introduce Bamba-9B-v2, a decoder-only language model based on the [Mamba-2](h
  The current release includes the following models:
  | **Stage** | **Bamba 9B** | **Quantized** | **Note** |
  |----------------------|----------------------------------------------------------------------|-------------------------------------------------------------------------|-------------------------------------------------------------------|
- | **Base Model** | [ibm-fms/Bamba-9B-v2](https://huggingface.co/ibm-fms/Bamba-9B-v2) | TBD | Stage 2 pretraining |
+ | **Base Model** | [ibm-fms/Bamba-9B-v2](https://huggingface.co/ibm-fms/Bamba-9B-v2) | coming soon | Stage 2 pretraining + Annealing |
  | **Base Model** | [ibm-fms/Bamba-9B](https://huggingface.co/ibm-fms/Bamba-9B) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 2 pretraining |
  | **Base Model** | [ibm-fms/Bamba-9B-2T](https://huggingface.co/ibm-fms/Bamba-9B-2T) | [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Stage 1 pretraining |
  | **Base Model** | [ibm-fms/Bamba-9B-1.8T](https://huggingface.co/ibm-fms/Bamba-9B-1.8T)| [ibm-fms/Bamba-9B-fp8](https://huggingface.co/ibm-fms/Bamba-9B-fp8) | Intermediate checkpoints during Stage 1, more to come |
@@ -186,32 +191,6 @@ contributed [HF-version of Mamba2-Hybrid](https://github.com/huggingface/transfo
  <td>9.28
  </td>
  </tr>
- <tr>
- <td rowspan="4" >Safety Tasks
- </td>
- <td>PopQA (5-shot)
- </td>
- <td>20.5
- </td>
- </tr>
- <tr>
- <td>Toxigen (5-shot)
- </td>
- <td>57.4
- </td>
- </tr>
- <tr>
- <td>BBQ (5-shot)
- </td>
- <td>44.2
- </td>
- </tr>
- <tr>
- <td>Crows-pairs english (5-shot)
- </td>
- <td>70.78
- </td>
- </tr>
  </table>
 
  *For the v2 leaderboard results, we perform [normalization](https://huggingface.co/docs/leaderboards/open_llm_leaderboard/normalization) and report the normalized results.
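
The normalization linked above rescales each raw benchmark score against its random-chance baseline before averaging, so that guessing maps to 0 and a perfect score maps to 100. A minimal sketch of that rescaling (the function name and the example scores below are illustrative, not taken from this model card):

```python
def normalize_score(raw: float, baseline: float) -> float:
    """Rescale a raw accuracy (0..1) against its random-chance baseline.

    Scores at or below the baseline map to 0; a perfect score maps to 100.
    """
    if raw <= baseline:
        return 0.0
    return (raw - baseline) / (1.0 - baseline) * 100.0

# A 4-choice task has a 25% random-guess baseline (illustrative values):
print(round(normalize_score(0.65, 0.25), 2))  # 53.33
print(normalize_score(0.20, 0.25))            # 0.0 (at or below chance)
```

This is why normalized leaderboard numbers can look lower than the raw accuracies reported elsewhere for the same benchmarks.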