Spaces:
Runtime error
Runtime error
| # M2M-100 Tokenization | |
| We apply different tokenization strategies for different languages following the existing literature. Here we provide tok.sh a tokenizer that can be used to reproduce our results. | |
| To reproduce the results, follow these steps: | |
| ``` | |
| tgt_lang=... | |
| reference_translation=... | |
| cat generation_output | grep -P "^H" | sort -V | cut -f 3- | sh tok.sh $tgt_lang > hyp | |
| cat $reference_translation |sh tok.sh $tgt_lang > ref | |
| sacrebleu -tok 'none' ref < hyp | |
| ``` | |
| ## Installation | |
| Tools needed for all the languages except Arabic can be installed by running install_dependencies.sh | |
| If you want to evaluate Arabic models, please follow the instructions provided here: http://alt.qcri.org/tools/arabic-normalizer/ to install | |