patrickvonplaten committed
Commit 05444a3
1 Parent(s): 6f56197

Update README.md
Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -9,9 +9,11 @@ tags:
 license: apache-2.0
 ---
 
-T5-Efficient-XL is a checkpoint of the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
+# T5-Efficient-XL (T5's Deep-Narrow checkpoints)
 
-The checkpoint was released with the paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)**
+T5-Efficient-XL is a variation of the original [T5-3B](https://huggingface.co/t5-3b) checkpoint and follows the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
+It is a *pretrained-only* checkpoint and was released with the
+paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)**
 by *Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler*.
 
 In a nutshell, the paper indicates that a **DeepNarrow** model architecture is favorable for **downstream** performance compared to other model architectures
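As a side note on the diff itself, the `+4 -2` stat that git reports for README.md can be derived directly from the first character of each line in the hunk body. The sketch below is illustrative only (not part of the model card), and the hunk lines are abridged to their prefixes plus a short excerpt:

```python
# Illustrative sketch: how git's "+4 -2" stat for README.md follows
# from the unified diff hunk. Only each line's first character matters:
# " " = context, "-" = removed, "+" = added. Lines are abridged.
hunk_body = [
    " license: apache-2.0",
    " ---",
    " ",
    "-T5-Efficient-XL is a checkpoint of the [T5 model architecture] ...",
    "+# T5-Efficient-XL (T5's Deep-Narrow checkpoints)",
    " ",
    "-The checkpoint was released with the paper **[Scale Efficiently ...]**",
    "+T5-Efficient-XL is a variation of the original [T5-3B] checkpoint ...",
    "+It is a *pretrained-only* checkpoint and was released with the",
    "+paper **[Scale Efficiently: Insights from Pre-training and ...]**",
    " by *Yi Tay, ... Donald Metzler*.",
    " ",
    " In a nutshell, the paper indicates that a **DeepNarrow** model ...",
]

added = sum(line.startswith("+") for line in hunk_body)
removed = sum(line.startswith("-") for line in hunk_body)
context = sum(line.startswith(" ") for line in hunk_body)

print(f"+{added} -{removed}")  # → +4 -2, matching the file stat above

# The hunk header "@@ -9,9 +9,11 @@" is consistent with these counts:
assert context + removed == 9   # length of the old span
assert context + added == 11    # length of the new span
```

This also explains why a net change of two lines (9 → 11) shows up as `+4 -2`: two old lines were replaced (counted once as removals, once as additions) on top of the two genuinely new lines.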