patrickvonplaten committed
Commit b8ecf29 · 1 Parent(s): db69d5e

Update README.md
Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -38,7 +38,11 @@ A sequence of word embeddings is therefore processed sequentially by each transf

## Details of the model architecture

- The *conventional* T5 architectures are summarized in the following table:
+ This model checkpoint - **t5-efficient-xl** - is of model type **XL** with **no** variations.
+ It has **2852** million parameters and thus requires **11406** MB of memory in full precision (*fp32*)
+ or **5703** MB of memory in half precision (*fp16* or *bf16*).
+
+ The *conventional* T5 architectures are summarized as follows:

| Model | nl (el/dl) | ff | dm | kv | nh | #Params|
| ----| ---- | ---- | ---- | ---- | ---- | ----|
@@ -50,7 +54,7 @@ The *conventional* T5 architectures are summarized in the following table:
| **XL** | **24/24** | **16384** | **1024** | **128** | **32** | **3B**|
| XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B|

- with the following definitions:
+ whereas the following abbreviations are used:

| Abbreviation | Definition |
| ----| ---- |
@@ -66,10 +70,6 @@ with the following definitions:

If a model checkpoint does not specify *el* or *dl*, then the number of encoder and decoder layers both correspond to *nl*.

- This model checkpoint - **t5-efficient-xl** - is of model type **XL** with **no** variations.
- It has **2852** million parameters and thus requires **11406** MB of memory in full precision (*fp32*)
- or **5703** MB of memory in half precision (*fp16* or *bf16*).
-
## Pre-Training

The checkpoint was pretrained on the [Colossal, Cleaned version of Common Crawl (C4)](https://huggingface.co/datasets/c4) for 524288 steps using the span-based masked language modeling (MLM) objective.
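
As a quick, non-authoritative cross-check of the table above, the hyperparameters behind the *nl (el/dl)*, *ff*, *dm*, *kv* and *nh* columns can be read straight from the checkpoint's config. A minimal sketch, assuming the checkpoint is hosted on the Hub as `google/t5-efficient-xl` and that the `transformers` library is installed:

```python
# Minimal sketch: map this checkpoint's config onto the columns of the
# architecture table (assumes the Hub id "google/t5-efficient-xl").
from transformers import T5Config

config = T5Config.from_pretrained("google/t5-efficient-xl")

print("el (encoder layers):", config.num_layers)
print("dl (decoder layers):", config.num_decoder_layers)
print("ff (feed-forward dim):", config.d_ff)
print("dm (model dim):", config.d_model)
print("kv (key/value dim):", config.d_kv)
print("nh (attention heads):", config.num_heads)
```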
 
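
The memory figures quoted for this checkpoint follow directly from the parameter count: the card's numbers are consistent with 4 bytes per parameter in *fp32*, 2 bytes in *fp16*/*bf16*, and 1 MB = 10^6 bytes. A back-of-the-envelope sketch; because **2852** million is itself a rounded figure, the output lands a few MB away from the card's 11406 MB / 5703 MB:

```python
# Rough memory-footprint estimate from the rounded parameter count in the card.
# Prints ~11408 MB (fp32) and ~5704 MB (fp16/bf16); the card's 11406 / 5703 MB
# were presumably computed from the unrounded parameter count.
params = 2852e6  # parameters, rounded to the nearest million

for precision, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2)]:
    megabytes = params * bytes_per_param / 1e6  # using 1 MB = 10^6 bytes
    print(f"{precision}: ~{megabytes:.0f} MB")
```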