Commit eedca21
Parent(s): 5b8b83b
Update README.md
README.md CHANGED
@@ -36,7 +36,7 @@ A sequence of word embeddings is therefore processed sequentially by each transf
 
 ## Details model architecture
 
-The *conventional* T5 architectures are summarized in the following table
+The *conventional* T5 architectures are summarized in the following table:
 
 | Model | nl | ff | dm | kv | nh | #Params|
 | ----| ---- | ---- | ---- | ---- | ---- | ----|
@@ -48,7 +48,17 @@ The *conventional* T5 architectures are summarized in the following table.
 | **XL** | **24/24** | **16384** | **1024** | **128** | **32** | **3B**|
 | XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B|
 
-
+with the following definitions:
+
+| NL | Number of transformer blocks (depth) |
+| EL | Number of transformer blocks in the encoder (encoder depth) |
+| DL | Number of transformer blocks in the decoder (decoder depth) |
+| DM | Dimension of embedding vector (output vector of transformers block) |
+| KV | Dimension of key/value projection matrix |
+| NH | Number of attention heads |
+| FF | Dimension of intermediate vector within transformer block (size of feed-forward projection matrix) |
+| SH | Signifies that attention heads are shared |
+| SKV | Signifies that key-values projection matrices are tied |
 
 ## Pre-Training
 
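A minimal sketch of how the abbreviations defined above map onto a concrete configuration, assuming the Hugging Face `transformers` library is available; the values are taken from the XL row of the table, and the field names are `T5Config` attributes:

```python
# Sketch: express the XL row (nl=24/24, ff=16384, dm=1024, kv=128, nh=32)
# as a T5 configuration. Assumes the `transformers` library is installed.
from transformers import T5Config

xl_config = T5Config(
    num_layers=24,          # EL: number of transformer blocks in the encoder
    num_decoder_layers=24,  # DL: number of transformer blocks in the decoder
    d_ff=16384,             # FF: dimension of the feed-forward projection in each block
    d_model=1024,           # DM: dimension of the embedding / block output vector
    d_kv=128,               # KV: dimension of each key/value projection
    num_heads=32,           # NH: number of attention heads
)

print(xl_config)
# Instantiating T5ForConditionalGeneration(xl_config) would build a model with
# roughly 3B parameters, matching the #Params column for the XL row.
```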