patrickvonplaten committed on
Commit 63dfd57 · 1 Parent(s): 685fc6b

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
```diff
@@ -10,14 +10,14 @@ license: apache-2.0
 ---
 
 # T5-Efficient-XL
-## *One of T5's Deep-Narrow checkpoints*
+### *One of T5's Deep-Narrow checkpoints*
 
 T5-Efficient-XL is a variation of the original [T5-3B](https://huggingface.co/t5-3b) checkpoint and follows the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
 It is a *pretrained-only* checkpoint and was released with the
 paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)**
 by *Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler*.
 
-In a nutshell, the paper indicates that a **DeepNarrow** model architecture is favorable for **downstream** performance compared to other model architectures
+In a nutshell, the paper indicates that a **Deep-Narrow** model architecture is favorable for **downstream** performance compared to other model architectures
 of similar parameter count.
 
 To quote the paper:
```
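Since the README describes a *pretrained-only* checkpoint that follows the standard T5 architecture, a minimal loading sketch may be useful for context. It assumes the checkpoint is published on the Hub under the identifier `google/t5-efficient-xl` (an assumption based on the checkpoint name; adjust if the repository differs) and uses the standard `transformers` T5 classes.

```python
# Minimal sketch: load the pretrained-only checkpoint with the standard T5 classes.
# NOTE: the Hub identifier below is an assumption based on the checkpoint name.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "google/t5-efficient-xl"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# The checkpoint was only pretrained with the span-corruption objective,
# so it is expected to be fine-tuned before use on a downstream task.
inputs = tokenizer("summarize: studies have shown that owning a dog is good for you", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```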