TechxGenus committed on
Commit 06c55e3 · verified · 1 Parent(s): 37c4c7e

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,160 @@
---
library_name: transformers
license: apache-2.0
tags:
- jamba
- mamba
- moe
---

A compatible version of [Jamba-v0.1](https://huggingface.co/ai21labs/Jamba-v0.1) in transformers that no longer requires `trust_remote_code=True`.

# Model Card for Jamba

Jamba is a state-of-the-art, hybrid SSM-Transformer LLM. It delivers throughput gains over traditional Transformer-based models, while outperforming or matching the leading models of its size class on most common benchmarks.

Jamba is the first production-scale Mamba implementation, which opens up interesting research and application opportunities. While this initial experimentation shows encouraging gains, we expect these to be further enhanced with future optimizations and explorations.

This model card is for the base version of Jamba. It’s a pretrained, mixture-of-experts (MoE) generative text model, with 12B active parameters and a total of 52B parameters across all experts. It supports a 256K context length, and can fit up to 140K tokens on a single 80GB GPU.

For full details of this model, please read the [release blog post](https://www.ai21.com/blog/announcing-jamba).

## Model Details

- **Developed by:** [AI21](https://www.ai21.com)
- **Model type:** Joint Attention and Mamba (Jamba)
- **License:** Apache 2.0
- **Context length:** 256K
- **Knowledge cutoff date:** March 5, 2024

## Usage
### Prerequisites
Jamba requires `transformers` version 4.40.0 or higher:
```bash
pip install "transformers>=4.40.0"
```

In order to run optimized Mamba implementations, you first need to install `mamba-ssm` and `causal-conv1d`:
```bash
pip install mamba-ssm "causal-conv1d>=1.2.0"
```
The model also has to be on a CUDA device.

You can run the model without the optimized Mamba kernels, but this is **not** recommended, as it results in significantly higher latency. To do so, specify `use_mamba_kernels=False` when loading the model.

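As a minimal sketch (using the same checkpoint as the examples below), disabling the kernels looks like this; expect noticeably slower generation:

```python
from transformers import AutoModelForCausalLM

# Fall back to the pure-PyTorch Mamba path (no mamba-ssm / causal-conv1d kernels).
# Useful for debugging or CPU-only environments, but significantly slower.
model = AutoModelForCausalLM.from_pretrained("TechxGenus/Jamba-v0.1-hf",
                                             use_mamba_kernels=False)
```
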
### Run the model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("TechxGenus/Jamba-v0.1-hf")
tokenizer = AutoTokenizer.from_pretrained("TechxGenus/Jamba-v0.1-hf")

input_ids = tokenizer("In the recent Super Bowl LVIII,", return_tensors='pt').to(model.device)["input_ids"]

outputs = model.generate(input_ids, max_new_tokens=216)

print(tokenizer.batch_decode(outputs))
# ["<|startoftext|>In the recent Super Bowl LVIII, the Kansas City Chiefs emerged victorious, defeating the San Francisco 49ers in a thrilling overtime showdown. The game was a nail-biter, with both teams showcasing their skills and determination.\n\nThe Chiefs, led by their star quarterback Patrick Mahomes, displayed their offensive prowess, while the 49ers, led by their strong defense, put up a tough fight. The game went into overtime, with the Chiefs ultimately securing the win with a touchdown.\n\nThe victory marked the Chiefs' second Super Bowl win in four years, solidifying their status as one of the top teams in the NFL. The game was a testament to the skill and talent of both teams, and a thrilling end to the NFL season.\n\nThe Super Bowl is not just about the game itself, but also about the halftime show and the commercials. This year's halftime show featured a star-studded lineup, including Usher, Alicia Keys, and Lil Jon. The show was a spectacle of music and dance, with the performers delivering an energetic and entertaining performance.\n"]
```


<details>
<summary><strong>Loading the model in half precision</strong></summary>

The published checkpoint is saved in BF16. In order to load it into RAM in BF16/FP16, you need to specify `torch_dtype`:

```python
from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained("TechxGenus/Jamba-v0.1-hf",
                                             torch_dtype=torch.bfloat16)    # you can also use torch_dtype=torch.float16
```

When using half precision, you can enable the [FlashAttention2](https://github.com/Dao-AILab/flash-attention) implementation of the attention blocks. To use it, you also need the model on a CUDA device. Since in this precision the model is too big to fit on a single 80GB GPU, you'll also need to parallelize it using [accelerate](https://huggingface.co/docs/accelerate/index).
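If `flash-attn` is not already installed, it can typically be added with `pip install flash-attn --no-build-isolation`; this assumes a CUDA toolchain is available, and the exact command may vary by environment. For example: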
```python
from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained("TechxGenus/Jamba-v0.1-hf",
                                             torch_dtype=torch.bfloat16,
                                             attn_implementation="flash_attention_2",
                                             device_map="auto")
```

</details>
<details><summary><strong>Load the model in 8-bit</strong></summary>

**Using 8-bit precision, it is possible to fit up to 140K sequence lengths on a single 80GB GPU.** You can easily quantize the model to 8-bit using [bitsandbytes](https://huggingface.co/docs/bitsandbytes/index). To avoid degrading model quality, we recommend excluding the Mamba blocks from the quantization:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True,
                                         llm_int8_skip_modules=["mamba"])
model = AutoModelForCausalLM.from_pretrained("TechxGenus/Jamba-v0.1-hf",
                                             torch_dtype=torch.bfloat16,
                                             attn_implementation="flash_attention_2",
                                             quantization_config=quantization_config)
```
</details>

### Fine-tuning example
Jamba is a base model that can be fine-tuned for custom solutions (including for chat/instruct versions). You can fine-tune it using any technique of your choice. Here is an example of fine-tuning with the [PEFT](https://huggingface.co/docs/peft/index) library:

```python
from datasets import load_dataset
from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("TechxGenus/Jamba-v0.1-hf")
model = AutoModelForCausalLM.from_pretrained("TechxGenus/Jamba-v0.1-hf", device_map='auto')

dataset = load_dataset("Abirate/english_quotes", split="train")
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    logging_dir='./logs',
    logging_steps=10,
    learning_rate=2e-3
)
lora_config = LoraConfig(
    r=8,
    target_modules=["embed_tokens", "x_proj", "in_proj", "out_proj"],
    task_type="CAUSAL_LM",
    bias="none"
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    peft_config=lora_config,
    train_dataset=dataset,
    dataset_text_field="quote",
)

trainer.train()
```
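Note that the `target_modules` above adapt the embeddings and the Mamba projections (`in_proj`, `out_proj`, `x_proj`). Extending LoRA to the attention or MoE projections is also possible, but the exact module names to target are an assumption about the Jamba implementation rather than an official recommendation, so inspect `model.named_modules()` before changing them.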

## Results on common benchmarks
| Benchmark | Score |
|--------------|:-----:|
| HellaSwag | 87.1% |
| Arc Challenge | 64.4% |
| WinoGrande | 82.5% |
| PIQA | 83.2% |
| MMLU | 67.4% |
| BBH | 45.4% |
| TruthfulQA | 46.4% |
| GSM8K (CoT) | 59.9% |

It's crucial that the 'BOS' token is added to all prompts; this might not be enabled by default in all eval frameworks.

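As a quick sanity check (a minimal sketch, not an official eval recipe), you can verify that the tokenizer in this repo prepends the BOS token:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TechxGenus/Jamba-v0.1-hf")

# add_bos_token is enabled in this repo's tokenizer_config.json, so the first id
# should be the BOS token (<|startoftext|>, id 1).
ids = tokenizer("In the recent Super Bowl LVIII,")["input_ids"]
assert ids[0] == tokenizer.bos_token_id
print(tokenizer.convert_ids_to_tokens(ids[:3]))
```
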
## Notice
Jamba is a pretrained base model and did not undergo any alignment for instruct/chat interactions.

As a base model, Jamba is intended for use as a foundation layer for fine-tuning, training, and developing custom solutions. Jamba does not have safety moderation mechanisms, and guardrails should be added for responsible and safe use.

## About AI21
AI21 builds reliable, practical, and scalable AI solutions for the enterprise.

Jamba is the first in AI21’s new family of models, and the Instruct version of Jamba is available in beta via the [AI21 platform](https://www.ai21.com/studio).
config.json ADDED
@@ -0,0 +1,41 @@
{
  "architectures": [
    "JambaForCausalLM"
  ],
  "attention_dropout": 0.0,
  "attn_layer_offset": 4,
  "attn_layer_period": 8,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "expert_layer_offset": 1,
  "expert_layer_period": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "mamba_conv_bias": true,
  "mamba_d_conv": 4,
  "mamba_d_state": 16,
  "mamba_dt_rank": 256,
  "mamba_expand": 2,
  "mamba_proj_bias": false,
  "max_position_embeddings": 262144,
  "model_type": "jamba",
  "num_attention_heads": 32,
  "num_experts": 16,
  "num_experts_per_tok": 2,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "num_logits_to_keep": 1,
  "output_router_logits": false,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "router_aux_loss_coef": 0.001,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.0",
  "use_cache": true,
  "use_mamba_kernels": true,
  "vocab_size": 65536
}
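For reference, here is a minimal sketch that prints the layer layout implied by this config. The modulo interpretation of `attn_layer_period`/`attn_layer_offset` and `expert_layer_period`/`expert_layer_offset` is an assumption based on the described Jamba layout (one attention layer every 8 layers, one MoE layer every 2 layers), not something stated in this file:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("TechxGenus/Jamba-v0.1-hf")

for i in range(config.num_hidden_layers):
    # Assumed interpretation: layer i uses attention when i % period == offset,
    # and a mixture-of-experts MLP when i % expert_period == expert_offset.
    block = "attention" if i % config.attn_layer_period == config.attn_layer_offset else "mamba"
    experts = config.num_experts if i % config.expert_layer_period == config.expert_layer_offset else 1
    print(f"layer {i:2d}: {block:9s} experts={experts}")
```
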
generation_config.json ADDED
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.40.0"
}
model-00001-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1aace34ee0da3bf95605bd150fff6d3e78110be4048a3c389b0a740354b2ccb7
size 4951761424
model-00002-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0ba1de67a86329431f14f7ffa165d84055d32ce57a6d2314e3b2464eac3732dc
size 4884669624
model-00003-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1abc4f16865fb78241c9453292ee3b2ca2c1e2d54ee945631da625834b95c9b2
size 4992557120
model-00004-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:45fab97739a58e924791572ea3d06f9c90b9ff2a299460aaa4bd87c6e9d424f3
size 4958853560
model-00005-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c4b0ec6e8f33e6d7b1f837cd4c25818487dcc7e478734606da28110507e51c97
size 4975763832
model-00006-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ed98d5c3c8d7ab7352944bea09b0d54d98066cf567ba3d069da12c05575d56ed
size 4884669616
model-00007-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:735be2bc568711bf42a4caebcda8288dd300b31b48fa098b00df3cf1a98e10e2
size 4884669640
model-00008-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d0c8d817b2b47661d361e8b520128b3194185f756cc2204a95d642e24895ee51
size 4992557176
model-00009-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e50222cf865ca5678d22574b131294303c46b249478cf70113c701f70331e999
size 4932507176
model-00010-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b1b4b69b24ae55827b6c8b1e4a10807aa3525bc85f4d34dc002ac7440757fbf4
size 4884669672
model-00011-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:60213cac13b92ed34b93ce48e670434f22e3bf8b2b8df20c60b7bf8a9515c35c
size 4884669696
model-00012-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:05805eacd3bb40cc9da802350409f1cb078e8b276da7e06c7a8a5ca5b26cc887
size 4884669688
model-00013-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:201df979a1b34ced6cdbb7a790163412636779f1119e3845a704c489181d03d2
size 4932507176
model-00014-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d0a7eb42a9ea3a385442c2e758dd5efd5dc5b913f1d10bfd37792cc963a33c93
size 4992557152
model-00015-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a4b9afe4398000c28b36e3aa40c87086af673d4f8a64bfc5767941ab2008bcc9
size 4884669688
model-00016-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dd1ac6cc861971c43bdf0c9c6d4c9fe72d33e5227e054a621e2e68f001419763
size 4884669688
model-00017-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:52d9eea696dd29ef413d617bbcb62a9f159e8fe8170d36e018932cef45ee281d
size 4908522856
model-00018-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:77acada7c098e81280645ea0a9dbfa00196dca6da8946498b9907e9e376fb42d
size 4908654000
model-00019-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:09e10dfd6c6459cd3460b1d667639717d3657274c1694c19a6fdbac1be6a76bf
size 4992557168
model-00020-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2bd5c27b2cca6e06f7b4497ce8c9b1522a64846817a871bad274d08507960ed0
size 4884669696
model-00021-of-00021.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a47ef23db8deb5364da676a40dc3dcb011fb9d9ceef13ba044c176e9a83ac1e3
size 4647318576
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
{
  "bos_token": {
    "content": "<|startoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|pad|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|unk|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:02fd6530b8ede0eedd8e509fcab32da7b1dd04c8119f8498c787100f13112713
size 1124742
tokenizer_config.json ADDED
@@ -0,0 +1,47 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<|pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<|startoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<|unk|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<|startoftext|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<|pad|>",
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<|unk|>",
  "use_default_system_prompt": false
}