weiweiz1 committed on
Commit 7dcceb0 · 1 Parent(s): c8254c9

Update README.md

Files changed (1): README.md +12 -1

README.md CHANGED
@@ -33,4 +33,15 @@ The sparse version of GPT-J 6B is a pruned variant derived from the original [GP
 The model consists of 28 layers with a model dimension of 4096, and a feedforward dimension of 16384. The model
 dimension is split into 16 heads, each with a dimension of 256. Rotary Position Embedding (RoPE) is applied to 64
 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as
-GPT-2/GPT-3.
+GPT-2/GPT-3.
+
+
+
+## Evaluation results
+
+<figure>
+
+| Model    | Sparsity | Dataset        | Precision | Dense Acc ↑ | Sparse Acc ↑ | Acc fluctuation |
+|----------|----------|----------------|-----------|-------------|--------------|-----------------|
+| gpt-j-6B | 40%      | Lambada_openai | FP32      | 0.6831      | 0.6922       | +1.33%          |
+| gpt-j-6B | 40%      | Lambada_openai | BF16      | 0.6771      | 0.6874       | +0.63%          |
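As a sanity check on the numbers in this change, here is a minimal Python sketch (not part of the model card itself) that verifies the architecture arithmetic from the description and reproduces the "Acc fluctuation" column. It assumes each fluctuation is the relative change of the sparse accuracy versus the FP32 dense baseline, in percent; that interpretation is inferred from the values, not stated in the README.

```python
# Architecture figures quoted in the README description.
d_model = 4096
n_heads = 16

# Each head gets an equal slice of the model dimension.
head_dim = d_model // n_heads
print(head_dim)  # 256, matching the stated per-head dimension

# Evaluation numbers from the table above.
fp32_dense = 0.6831
sparse_acc = {"FP32": 0.6922, "BF16": 0.6874}

# Assumption: "Acc fluctuation" = (sparse - FP32 dense) / FP32 dense, in %.
for precision, acc in sparse_acc.items():
    fluctuation = (acc - fp32_dense) / fp32_dense * 100
    print(f"{precision}: {fluctuation:+.2f}%")  # +1.33% and +0.63%
```

Under this assumption both table entries are reproduced exactly; comparing the BF16 sparse accuracy against the BF16 dense value (0.6771) instead would give roughly +1.52%, which does not match the table.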