Commit b8ecf29 · 1 Parent(s): db69d5e
Update README.md
README.md CHANGED
@@ -38,7 +38,11 @@ A sequence of word embeddings is therefore processed sequentially by each transf
 
 ## Details model architecture
 
-The *conventional* T5 architectures are summarized in the following table:
+This model checkpoint - **t5-efficient-xl** - is of model type **XL** with **no** variations.
+It has **2852** million parameters and thus requires **11406** MB of memory in full precision (*fp32*)
+or **5703** MB of memory in half precision (*fp16* or *bf16*).
+
+The *conventional* T5 architectures are summarized as follows:
 
 | Model | nl (el/dl) | ff | dm | kv | nh | #Params|
 | ----| ---- | ---- | ---- | ---- | ---- | ----|
@@ -50,7 +54,7 @@ The *conventional* T5 architectures are summarized in the following table:
 | **XL** | **24/24** | **16384** | **1024** | **128** | **32** | **3B**|
 | XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B|
 
-with the following definitions:
+, whereas the following abbreviations are used:
 
 | Abbreviation | Definition |
 | ----| ---- |
@@ -66,10 +70,6 @@ with the following definitions:
 
 If a model checkpoint has no specific, *el* or *dl* than both the number of encoder- and decoder layers correspond no *nl*.
 
-This model checkpoint - **t5-efficient-xl** - is of model type **XL** with **no** variations.
-It has **2852** million parameters and thus requires **11406** MB of memory in full precision (*fp32*)
-or **5703** MB of memory in half precision (*fp16* or *bf16*).
-
 ## Pre-Training
 
 The checkpoint was pretrained on the [Colossal, Cleaned version of Common Crawl (C4)](https://huggingface.co/datasets/c4) for 524288 steps using
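
The memory figures in the README lines moved by this commit follow from the parameter count multiplied by the bytes stored per parameter. The sketch below reproduces that arithmetic. It assumes 4 bytes per parameter for *fp32*, 2 bytes for *fp16* and *bf16*, and 1 MB = 10^6 bytes; the helper name `weight_memory_mb` is hypothetical, and the calculation covers the weights only (no activations or optimizer state).

```python
# Bytes needed to store one parameter in each precision (assumed values).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_mb(num_params: int, dtype: str) -> float:
    """Return the memory needed for the weights alone, in MB (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


# Rounded parameter count quoted in the README (2852 million).
rounded_params = 2852 * 10**6

fp32_mb = weight_memory_mb(rounded_params, "fp32")
fp16_mb = weight_memory_mb(rounded_params, "fp16")
print(f"fp32: {fp32_mb:.0f} MB, fp16/bf16: {fp16_mb:.0f} MB")
```

With the rounded count this gives 11408 MB and 5704 MB, each within 2 MB of the README's 11406 MB and 5703 MB. The small gap would be consistent with the README figures being derived from an unrounded parameter count, though the diff does not state how they were computed.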