patrickvonplaten committed on
Commit 9df3444 · 1 Parent(s): fb84e9e

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -36,7 +36,7 @@ A sequence of word embeddings is therefore processed sequentially by each transf
 
 ## Details model architecture
 
-The *conventional* T5 architectures are
+The *conventional* T5 architectures are summarized in the following table.
 
 | Model | NL | dff | dmodel | dkv | NH | #Params|
 | ----| ---- | ---- | ---- | ---- | ---- | ----|
@@ -45,9 +45,11 @@ The *conventional* T5 architectures are
 | Small | 6/6 | 2048 | 512 | 32 | 8 | 60M|
 | Base | 12/12 | 3072 | 768 | 64 | 12 | 220M|
 | Large | 24/24 | 4096 | 1024 | 64 | 16 | 738M|
-| XL | 24/24 | 16384 | 1024 | 128 | 32 | 3B|
+**| XL | 24/24 | 16384 | 1024 | 128 | 32 | 3B|**
 | XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B|
 
+This
+
 ## Pre-Training
 
 The checkpoint was pretrained on the [Colossal, Cleaned version of Common Crawl (C4)](https://huggingface.co/datasets/c4) for 524288 steps using