patrickvonplaten committed on
Commit 63dfd57 · 1 Parent(s): 685fc6b

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
```diff
@@ -10,14 +10,14 @@ license: apache-2.0
 ---
 
 # T5-Efficient-XL
-## *One of T5's Deep-Narrow checkpoints*
+### *One of T5's Deep-Narrow checkpoints*
 
 T5-Efficient-XL is a variation of the original [T5-3B](https://huggingface.co/t5-3b) checkpoint and follows the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
 It is a *pretrained-only* checkpoint and was released with the
 paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)**
 by *Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler*.
 
-In a nutshell, the paper indicates that a **DeepNarrow** model architecture is favorable for **downstream** performance compared to other model architectures
+In a nutshell, the paper indicates that a **Deep-Narrow** model architecture is favorable for **downstream** performance compared to other model architectures
 of similar parameter count.
 
 To quote the paper:
```
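Since the README describes a *pretrained-only* checkpoint that follows the standard T5 architecture, a minimal loading sketch may be useful for context. It assumes the checkpoint is published on the Hub under the identifier `google/t5-efficient-xl` (an assumption based on the checkpoint name; adjust if the repository differs) and uses the standard `transformers` T5 classes.

```python
# Minimal sketch: load the pretrained-only checkpoint with the standard T5 classes.
# NOTE: the Hub identifier below is an assumption based on the checkpoint name.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "google/t5-efficient-xl"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# The checkpoint was only pretrained with the span-corruption objective,
# so it is expected to be fine-tuned before use on a downstream task.
inputs = tokenizer("summarize: studies have shown that owning a dog is good for you", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```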