Update README.md
README.md
CHANGED
@@ -16,14 +16,14 @@ Please check the [official repository](https://github.com/microsoft/DeBERTa) for
 In DeBERTa V3, we replaced the MLM objective with the RTD(Replaced Token Detection) objective introduced by ELECTRA for pre-training, as well as some innovations to be introduced in our upcoming paper. Compared to DeBERTa-V2, our V3 version significantly improves the model performance in downstream tasks. You can find a simple introduction about the model from the appendix A11 in our original [paper](https://arxiv.org/abs/2006.03654), but we will provide more details in a separate write-up.

-mDeBERTa is multilingual version of DeBERTa
+mDeBERTa is the multilingual version of DeBERTa with the same model structure but was trained on the CC100 multilingual data.

 The mDeBERTa V3 base model comes with 12 layers and a hidden size of 768. Its total parameter number is 280M since we use a vocabulary containing 250K tokens which introduce 190M parameters in the Embedding layer. This model was trained using the 2.5T CC100 data as XLM-R.

 #### Fine-tuning on NLU tasks

-We present the dev results on XNLI with zero-shot crosslingual transfer setting, i.e. training with english data only, test
+We present the dev results on XNLI with zero-shot crosslingual transfer setting, i.e. training with english data only, test on other languages.

 | Model |avg | en | fr| es | de | el | bg | ru |tr |ar |vi | th | zh | hi | sw | ur |
 |--------------| ----|----|----|---- |-- |-- |-- | -- |-- |-- |-- | -- | -- | -- | -- | -- |
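As a rough sketch of how the NLI fine-tuning setup described in the updated card could be started with Hugging Face Transformers: the checkpoint name `microsoft/mdeberta-v3-base`, the 3-label head, and the example sentence pair are assumptions for illustration, not part of this commit, and the tokenizer additionally requires the `sentencepiece` package.

```python
# Minimal sketch (assumptions: checkpoint "microsoft/mdeberta-v3-base",
# 3 NLI labels, sentencepiece installed). Not the official fine-tuning script.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/mdeberta-v3-base",
    num_labels=3,  # entailment / neutral / contradiction, as in MNLI/XNLI
)

# Premise/hypothesis are encoded as a single sequence pair; in the zero-shot
# cross-lingual setting the model is fine-tuned on English pairs only and then
# evaluated on the other XNLI languages without further training.
inputs = tokenizer(
    "The cat sat on the mat.",           # hypothetical premise
    "An animal is resting on the mat.",  # hypothetical hypothesis
    return_tensors="pt",
    truncation=True,
)

with torch.no_grad():
    logits = model(**inputs).logits

print(logits.shape)  # torch.Size([1, 3]), one score per NLI label
```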