google/t5-efficient-xl

Tags: text2text-generation, text-generation-inference


patrickvonplaten committed on Feb 15, 2022

Commit 05444a3 (1 parent: 6f56197)

Update README.md

Files changed (1)

README.md +4 -2

@@ -9,9 +9,11 @@ tags:
 license: apache-2.0
 ---
-T5-Efficient-XL is a checkpoint of the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
-The checkpoint was released with the paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)**
+# T5-Efficient-XL (T5's Deep-Narrow checkpoints)
+T5-Efficient-XL is a variation of the original [T5-3B](https://huggingface.co/t5-3b) checkpoint and follows the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
+It is a *pretrained-only* checkpoint and was released with the
+paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)**
 by *Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler*.
 In a nutshell, the paper indicates that a **DeepNarrow** model architecture is favorable for **downstream** performance compared to other model architectures
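The hunk header `@@ -9,9 +9,11 @@` states that this hunk covers 9 lines of the old README.md and 11 lines of the new one, both starting at line 9. Because the commit has a single hunk, its `+4 -2` totals apply to that hunk, so the counts should satisfy 9 - 2 + 4 = 11. The following is a minimal Python sketch of that check, assuming standard unified-diff hunk headers; `HunkRange`, `parse_hunk_header`, and `counts_consistent` are hypothetical helper names introduced for illustration, not part of any Hugging Face tooling.

```python
import re
from typing import NamedTuple


class HunkRange(NamedTuple):
    """Start line and line count for one side of a unified-diff hunk."""
    start: int
    count: int


# Matches "@@ -old_start[,old_count] +new_start[,new_count] @@".
# In the unified format an omitted count means the range is 1 line long.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> tuple[HunkRange, HunkRange]:
    """Return the (old, new) ranges parsed from a hunk header line."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    old = HunkRange(int(old_start), int(old_count or 1))
    new = HunkRange(int(new_start), int(new_count or 1))
    return old, new


def counts_consistent(line: str, added: int, removed: int) -> bool:
    """Check that old_count - removed + added == new_count for one hunk."""
    old, new = parse_hunk_header(line)
    return old.count - removed + added == new.count


old, new = parse_hunk_header("@@ -9,9 +9,11 @@ tags:")
print(old, new)  # HunkRange(start=9, count=9) HunkRange(start=9, count=11)
print(counts_consistent("@@ -9,9 +9,11 @@ tags:", added=4, removed=2))  # True
```

The regex ignores any trailing section label (here `tags:`), which diff tools append only as a readability hint.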