Commit · 05444a3
1 Parent(s): 6f56197
Update README.md
README.md CHANGED
@@ -9,9 +9,11 @@ tags:
 license: apache-2.0
 ---
 
-T5-Efficient-XL
+# T5-Efficient-XL (T5's Deep-Narrow checkpoints)
 
-
+T5-Efficient-XL is a variation of the original [T5-3B](https://huggingface.co/t5-3b) checkpoint and follows the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
+It is a *pretrained-only* checkpoint and was released with the
+paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)**
 by *Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler*.
 
 In a nutshell, the paper indicates that a **DeepNarrow** model architecture is favorable for **downstream** performance compared to other model architectures