Commit 5b8b83b · 1 Parent(s): 9df3444
Update README.md
README.md
CHANGED
@@ -38,14 +38,14 @@ A sequence of word embeddings is therefore processed sequentially by each transf
 
 The *conventional* T5 architectures are summarized in the following table.
 
-| Model |
+| Model | nl | ff | dm | kv | nh | #Params|
 | ----| ---- | ---- | ---- | ---- | ---- | ----|
 | Tiny | 4/4 | 1024 | 256 | 32 | 4 | 16M|
 | Mini | 4/4 | 1536 | 384 | 32 | 8 | 31M|
 | Small | 6/6 | 2048 | 512 | 32 | 8 | 60M|
 | Base | 12/12 | 3072 | 768 | 64 | 12 | 220M|
 | Large | 24/24 | 4096 | 1024 | 64 | 16 | 738M|
-
+| **XL** | **24/24** | **16384** | **1024** | **128** | **32** | **3B**|
 | XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B|
 
 This
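For reviewers who want to sanity-check the rows, the post-change table can be written out as data. This is only a sketch: the `T5Row` class, the `ROWS` list, and the `by_name` helper are hypothetical names, not part of the README. The reading of the column abbreviations (`nl` as encoder/decoder block counts, `ff` as feed-forward width, `dm` as model width, `kv` as per-head key/value width, `nh` as number of attention heads) is an assumption, since the hunk itself does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class T5Row:
    """One row of the table as it reads after this commit (hypothetical helper)."""
    name: str
    nl: str      # assumed: encoder/decoder block counts, e.g. "12/12"
    ff: int      # assumed: feed-forward inner dimension
    dm: int      # assumed: model (embedding) dimension
    kv: int      # assumed: per-head key/value dimension
    nh: int      # assumed: number of attention heads
    params: str  # parameter count exactly as printed in the table

    @property
    def attn_inner_dim(self) -> int:
        # Total width of the concatenated attention heads under the assumed column meanings.
        return self.nh * self.kv


# Values copied verbatim from the "+" side of the hunk.
ROWS = [
    T5Row("Tiny", "4/4", 1024, 256, 32, 4, "16M"),
    T5Row("Mini", "4/4", 1536, 384, 32, 8, "31M"),
    T5Row("Small", "6/6", 2048, 512, 32, 8, "60M"),
    T5Row("Base", "12/12", 3072, 768, 64, 12, "220M"),
    T5Row("Large", "24/24", 4096, 1024, 64, 16, "738M"),
    T5Row("XL", "24/24", 16384, 1024, 128, 32, "3B"),
    T5Row("XXL", "24/24", 65536, 1024, 128, 128, "11B"),
]


def by_name(name: str) -> T5Row:
    """Look up a row by its model name; raises KeyError if it is absent."""
    for row in ROWS:
        if row.name == name:
            return row
    raise KeyError(name)


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.name:>5}: nh*kv = {row.attn_inner_dim}, dm = {row.dm}")
```

Under those assumptions, `nh * kv` equals `dm` for the Base and Large rows but not for the XL row added here (32 × 128 against a `dm` of 1024), which may be worth a second look when reviewing the new line.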