patrickvonplaten committed on
Commit eedca21
1 Parent(s): 5b8b83b

Update README.md

Files changed (1)
  1. README.md +12 -2
README.md CHANGED
@@ -36,7 +36,7 @@ A sequence of word embeddings is therefore processed sequentially by each transf
 
 ## Details model architecture
 
-The *conventional* T5 architectures are summarized in the following table.
+The *conventional* T5 architectures are summarized in the following table:
 
 | Model | nl | ff | dm | kv | nh | #Params|
 | ----| ---- | ---- | ---- | ---- | ---- | ----|
@@ -48,7 +48,17 @@ The *conventional* T5 architectures are summarized in the following table.
 | **XL** | **24/24** | **16384** | **1024** | **128** | **32** | **3B**|
 | XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B|
 
-This
+with the following definitions:
+
+| NL | Number of transformer blocks (depth) |
+| EL | Number of transformer blocks in the encoder (encoder depth) |
+| DL | Number of transformer blocks in the decoder (decoder depth) |
+| DM | Dimension of embedding vector (output vector of transformers block) |
+| KV | Dimension of key/value projection matrix |
+| NH | Number of attention heads |
+| FF | Dimension of intermediate vector within transformer block (size of feed-forward projection matrix) |
+| SH | Signifies that attention heads are shared |
+| SKV | Signifies that key-values projection matrices are tied |
 
 ## Pre-Training
 
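
For context (not part of the commit itself): the nl/ff/dm/kv/nh columns in the table added above correspond to fields of `T5Config` in the Hugging Face `transformers` library (`num_layers`/`num_decoder_layers`, `d_ff`, `d_model`, `d_kv`, `num_heads`). A minimal sketch, using the XL row from the diff (24/24, 16384, 1024, 128, 32, ~3B parameters), could look like this:

```python
# Sketch only: map the README's architecture columns onto a T5Config.
# Values are the XL row of the table; parameter names are the standard
# T5Config fields from the transformers library.
from transformers import T5Config

config = T5Config(
    num_layers=24,          # nl (encoder depth)
    num_decoder_layers=24,  # nl (decoder depth)
    d_ff=16384,             # ff: feed-forward (intermediate) dimension
    d_model=1024,           # dm: embedding / hidden dimension
    d_kv=128,               # kv: key/value projection dimension
    num_heads=32,           # nh: number of attention heads
)

print(config)
# Instantiating T5ForConditionalGeneration(config) would build a randomly
# initialized model of roughly 3B parameters (several GB of RAM).
```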