Commit b22e530 (verified) by bjoernp · 1 Parent(s): f3887a8

Create README.md

Files changed (1)
  1. README.md +37 -0
README.md ADDED
@@ -0,0 +1,37 @@
+ ---
+ license: apache-2.0
+ ---
+
+ # BitLLama Micro (Experimental + untrained)
+ This repository contains the modeling code for a 1.58-bit Llama model, following the reference paper: https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf
+
+ For more details, see: https://github.com/bjoernpl/bitllama
+
+ The model was initialized with the following config:
+
+ ```
+ from transformers.models.bitllama import BitLlamaForCausalLM, LlamaConfig
+
+ model_config = LlamaConfig(
+     # Config for a tiny model with 1.62M parameters
+     bos_token_id=1,
+     eos_token_id=2,
+     hidden_act="silu",
+     hidden_size=512,
+     initializer_range=0.02,
+     intermediate_size=1365,
+     max_position_embeddings=32000,
+     num_attention_heads=8,
+     num_hidden_layers=12,
+     num_key_value_heads=4,
+     pretraining_tp=1,
+     rms_norm_eps=1e-05,
+     rope_scaling=None,
+     tie_word_embeddings=True,
+     use_cache=True,
+     vocab_size=32000,
+ )
+
+ model = BitLlamaForCausalLM._from_config(model_config)
+ model.push_to_hub("bjoernp/micro-bitllama")
+ ```
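
As a sanity check on the config in this README, the sketch below estimates the parameter count from the listed hyperparameters. It assumes the standard Llama layer layout (bias-free `q/k/v/o` projections with grouped-query attention, a three-matrix gated MLP, and two RMSNorm weights per layer). Whether the BitLlama layers add further parameters, such as extra norms inside the quantized linear layers, is not stated in the README, so treat the result as a rough estimate. The helper name `estimate_llama_params` is hypothetical and not part of any library.

```python
# Rough parameter estimate for the README config.
# ASSUMPTION: standard Llama layer shapes without biases; the actual BitLlama
# implementation may differ. `estimate_llama_params` is a hypothetical helper.

def estimate_llama_params(vocab_size, hidden_size, intermediate_size,
                          num_hidden_layers, num_attention_heads,
                          num_key_value_heads, tie_word_embeddings=True):
    head_dim = hidden_size // num_attention_heads
    kv_dim = num_key_value_heads * head_dim  # grouped-query K/V width

    embed = vocab_size * hidden_size
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim
    attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # gate_proj, up_proj, down_proj
    mlp = 3 * hidden_size * intermediate_size
    # input and post-attention RMSNorm weights
    norms = 2 * hidden_size
    per_layer = attn + mlp + norms

    final_norm = hidden_size
    # With tied embeddings the output head reuses the embedding matrix
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    return embed + num_hidden_layers * per_layer + final_norm + lm_head


total = estimate_llama_params(
    vocab_size=32000,
    hidden_size=512,
    intermediate_size=1365,
    num_hidden_layers=12,
    num_attention_heads=8,
    num_key_value_heads=4,
    tie_word_embeddings=True,
)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate is 50,993,664 parameters (about 51M), of which 16,384,000 belong to the tied embedding matrix. That is well above the 1.62M figure quoted in the config comment, so that figure is worth re-checking against the instantiated model.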