Update README.md
README.md
CHANGED
@@ -16,14 +16,14 @@ Please check the [official repository](https://github.com/microsoft/DeBERTa) for
 In DeBERTa V3, we replaced the MLM objective with the RTD(Replaced Token Detection) objective introduced by ELECTRA for pre-training, as well as some innovations to be introduced in our upcoming paper. Compared to DeBERTa-V2, our V3 version significantly improves the model performance in downstream tasks. You can find a simple introduction about the model from the appendix A11 in our original [paper](https://arxiv.org/abs/2006.03654), but we will provide more details in a separate write-up.

-mDeBERTa is multilingual version of DeBERTa
+mDeBERTa is the multilingual version of DeBERTa with the same model structure but was trained on the CC100 multilingual data.

 The mDeBERTa V3 base model comes with 12 layers and a hidden size of 768. Its total parameter number is 280M since we use a vocabulary containing 250K tokens which introduce 190M parameters in the Embedding layer. This model was trained using the 2.5T CC100 data as XLM-R.

 #### Fine-tuning on NLU tasks

-We present the dev results on XNLI with zero-shot crosslingual transfer setting, i.e. training with english data only, test
+We present the dev results on XNLI with zero-shot crosslingual transfer setting, i.e. training with english data only, test on other languages.

 | Model |avg | en | fr| es | de | el | bg | ru |tr |ar |vi | th | zh | hi | sw | ur |
 |--------------| ----|----|----|---- |-- |-- |-- | -- |-- |-- |-- | -- | -- | -- | -- | -- |
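As a rough sketch of how the NLI fine-tuning setup described in the updated card could be started with Hugging Face Transformers: the checkpoint name `microsoft/mdeberta-v3-base`, the 3-label head, and the example sentence pair are assumptions for illustration, not part of this commit, and the tokenizer additionally requires the `sentencepiece` package.

```python
# Minimal sketch (assumptions: checkpoint "microsoft/mdeberta-v3-base",
# 3 NLI labels, sentencepiece installed). Not the official fine-tuning script.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/mdeberta-v3-base",
    num_labels=3,  # entailment / neutral / contradiction, as in MNLI/XNLI
)

# Premise/hypothesis are encoded as a single sequence pair; in the zero-shot
# cross-lingual setting the model is fine-tuned on English pairs only and then
# evaluated on the other XNLI languages without further training.
inputs = tokenizer(
    "The cat sat on the mat.",           # hypothetical premise
    "An animal is resting on the mat.",  # hypothetical hypothesis
    return_tensors="pt",
    truncation=True,
)

with torch.no_grad():
    logits = model(**inputs).logits

print(logits.shape)  # torch.Size([1, 3]), one score per NLI label
```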