Commit eedca21
Parent(s): 5b8b83b
Update README.md
README.md CHANGED
@@ -36,7 +36,7 @@ A sequence of word embeddings is therefore processed sequentially by each transf
 
 ## Details model architecture
 
-The *conventional* T5 architectures are summarized in the following table
+The *conventional* T5 architectures are summarized in the following table:
 
 | Model | nl | ff | dm | kv | nh | #Params|
 | ----| ---- | ---- | ---- | ---- | ---- | ----|
@@ -48,7 +48,17 @@ The *conventional* T5 architectures are summarized in the following table.
 | **XL** | **24/24** | **16384** | **1024** | **128** | **32** | **3B**|
 | XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B|
 
-
+with the following definitions:
+
+| NL | Number of transformer blocks (depth) |
+| EL | Number of transformer blocks in the encoder (encoder depth) |
+| DL | Number of transformer blocks in the decoder (decoder depth) |
+| DM | Dimension of embedding vector (output vector of transformers block) |
+| KV | Dimension of key/value projection matrix |
+| NH | Number of attention heads |
+| FF | Dimension of intermediate vector within transformer block (size of feed-forward projection matrix) |
+| SH | Signifies that attention heads are shared |
+| SKV | Signifies that key-values projection matrices are tied |
 
 ## Pre-Training
 
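A minimal sketch of how the abbreviations defined above map onto a concrete configuration, assuming the Hugging Face `transformers` library is available; the values are taken from the XL row of the table, and the field names are `T5Config` attributes:

```python
# Sketch: express the XL row (nl=24/24, ff=16384, dm=1024, kv=128, nh=32)
# as a T5 configuration. Assumes the `transformers` library is installed.
from transformers import T5Config

xl_config = T5Config(
    num_layers=24,          # EL: number of transformer blocks in the encoder
    num_decoder_layers=24,  # DL: number of transformer blocks in the decoder
    d_ff=16384,             # FF: dimension of the feed-forward projection in each block
    d_model=1024,           # DM: dimension of the embedding / block output vector
    d_kv=128,               # KV: dimension of each key/value projection
    num_heads=32,           # NH: number of attention heads
)

print(xl_config)
# Instantiating T5ForConditionalGeneration(xl_config) would build a model with
# roughly 3B parameters, matching the #Params column for the XL row.
```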