Update README.md
README.md (CHANGED)
@@ -33,4 +33,15 @@ The sparse version of GPT-J 6B is a pruned variant derived from the original [GP
 The model consists of 28 layers with a model dimension of 4096, and a feedforward dimension of 16384. The model
 dimension is split into 16 heads, each with a dimension of 256. Rotary Position Embedding (RoPE) is applied to 64
 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as
-GPT-2/GPT-3.
+GPT-2/GPT-3.
+
+
+
+## Evaluation results
+
+<figure>
+
+| Model    | Sparsity | Dataset        | Precision | Dense Acc ↑ | Sparse Acc ↑ | Acc fluctuation (vs. dense FP32) |
+|----------|----------|----------------|-----------|-------------|--------------|----------------------------------|
+| gpt-j-6B | 40%      | Lambada_openai | FP32      | 0.6831      | 0.6922       | +1.33%                           |
+| gpt-j-6B | 40%      | Lambada_openai | BF16      | 0.6771      | 0.6874       | +0.63%                           |
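For readers who want to see the architecture paragraph above in code, here is a minimal sketch expressed as a `transformers` `GPTJConfig`. The field values are taken directly from the description; note that released GPT-J checkpoints typically pad the embedding matrix beyond the 50257-entry BPE vocabulary, so the config of an actual checkpoint may differ slightly.

```python
# Minimal sketch: the architecture described above, as a transformers GPTJConfig.
# Values come from the model description; an actual checkpoint's config may differ
# (e.g. released GPT-J weights pad the embedding matrix past the 50257 BPE entries).
from transformers import GPTJConfig

config = GPTJConfig(
    n_layer=28,        # 28 transformer layers
    n_embd=4096,       # model dimension
    n_inner=16384,     # feed-forward dimension (4 * n_embd)
    n_head=16,         # 16 heads -> 4096 / 16 = 256 dims per head
    rotary_dim=64,     # RoPE applied to 64 dims of each head
    vocab_size=50257,  # same BPE vocabulary as GPT-2/GPT-3
)
```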
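Accuracies like those in the table are usually produced with EleutherAI's lm-evaluation-harness. As a rough approximation of the Lambada_openai protocol, the sketch below measures greedy last-word accuracy with plain `transformers`; the model id is a placeholder, since this hunk does not name the sparse checkpoint, and the harness's exact scoring may differ from this generate-and-compare loop. The fluctuation column follows from the table's own numbers: 0.6922 / 0.6831 − 1 ≈ +1.33% and 0.6874 / 0.6831 − 1 ≈ +0.63%, both measured against the dense FP32 baseline.

```python
# Hedged sketch: approximate Lambada_openai last-word accuracy with transformers.
# The table's numbers were presumably produced with lm-evaluation-harness; this
# greedy generate-and-compare loop is only an approximation of that protocol.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # placeholder: the sparse checkpoint id is not given in this hunk
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"  # BF16, as in the second table row
)
model.eval()

dataset = load_dataset("EleutherAI/lambada_openai", split="test")

correct = 0
for example in dataset:
    # LAMBADA asks the model to predict the final word of each passage.
    prompt, target = example["text"].rsplit(" ", 1)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    target_len = len(tokenizer(" " + target).input_ids)
    with torch.no_grad():
        output = model.generate(
            **inputs, max_new_tokens=target_len, do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )
    prediction = tokenizer.decode(output[0, inputs.input_ids.shape[1]:]).strip()
    correct += int(prediction == target)

print(f"Lambada_openai last-word accuracy: {correct / len(dataset):.4f}")
```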