# Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)

This page includes instructions for training models described in [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](https://arxiv.org/abs/1909.02074).

## Training a joint alignment-translation model on WMT'18 En-De

##### 1. Extract and preprocess the WMT'18 En-De data
```bash
./prepare-wmt18en2de_no_norm_no_escape_no_agressive.sh
```
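
If you want to verify this step before moving on, a quick check like the following can help; it assumes the preparation script wrote its BPE-encoded output under `bpe.32k/`, the directory referenced in the next steps:

```bash
# Optional sanity check: the BPE-encoded training files should contain the same
# number of parallel lines on the English and German sides.
wc -l bpe.32k/train.en bpe.32k/train.de
```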
##### 2. Generate alignments with a statistical alignment toolkit, e.g. Giza++ or FastAlign.

In this example, we use FastAlign.
```bash
git clone git@github.com:clab/fast_align.git
pushd fast_align
mkdir build
cd build
cmake ..
make
popd
ALIGN=fast_align/build/fast_align
paste bpe.32k/train.en bpe.32k/train.de | awk -F '\t' '{print $1 " ||| " $2}' > bpe.32k/train.en-de
$ALIGN -i bpe.32k/train.en-de -d -o -v > bpe.32k/train.align
```
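
The resulting `bpe.32k/train.align` should contain one line per sentence pair, made up of `srcidx-tgtidx` pairs (e.g. `0-0 1-2 ...`), which is the format `fairseq-preprocess` reads via `--align-suffix`. A quick, optional check:

```bash
# Optional sanity check: one alignment line per sentence pair, in "srcidx-tgtidx" pairs.
wc -l bpe.32k/train.en-de bpe.32k/train.align
head -n 1 bpe.32k/train.align
```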
##### 3. Preprocess the dataset with the above generated alignments.

```bash
fairseq-preprocess \
    --source-lang en --target-lang de \
    --trainpref bpe.32k/train \
    --validpref bpe.32k/valid \
    --testpref bpe.32k/test \
    --align-suffix align \
    --destdir binarized/ \
    --joined-dictionary \
    --workers 32
```
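
As an optional check (assuming the default `fairseq-preprocess` output layout), `--joined-dictionary` should leave identical source and target dictionaries in `binarized/`:

```bash
# Optional sanity check: with --joined-dictionary, dict.en.txt and dict.de.txt are
# copies of the same shared vocabulary.
diff binarized/dict.en.txt binarized/dict.de.txt && echo "joined dictionary looks OK"
ls binarized/
```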
##### 4. Train a model

```bash
fairseq-train \
    binarized \
    --arch transformer_wmt_en_de_big_align --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --activation-fn relu \
    --lr 0.0002 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
    --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \
    --max-tokens 3500 --label-smoothing 0.1 \
    --save-dir ./checkpoints --log-interval 1000 --max-update 60000 \
    --keep-interval-updates -1 --save-interval-updates 0 \
    --load-alignments --criterion label_smoothed_cross_entropy_with_alignment \
    --fp16
```
Note that the `--fp16` flag requires that you have CUDA 9.1 or greater and a Volta GPU or newer.

If you want to train the above model with big batches (assuming your machine has 8 GPUs), as sketched after this list:
- add `--update-freq 8` to simulate training on 8x8=64 GPUs
- increase the learning rate; 0.0007 works well for big batches
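
Putting those two notes together, a big-batch variant of the training command could look like the sketch below (identical to the command above except for the learning rate and the added `--update-freq`):

```bash
fairseq-train \
    binarized \
    --arch transformer_wmt_en_de_big_align --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --activation-fn relu \
    --lr 0.0007 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
    --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \
    --max-tokens 3500 --label-smoothing 0.1 \
    --save-dir ./checkpoints --log-interval 1000 --max-update 60000 \
    --keep-interval-updates -1 --save-interval-updates 0 \
    --load-alignments --criterion label_smoothed_cross_entropy_with_alignment \
    --update-freq 8 \
    --fp16
```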
##### 5. Evaluate and generate the alignments (BPE level)

```bash
fairseq-generate \
    binarized --gen-subset test --print-alignment \
    --source-lang en --target-lang de \
    --path checkpoints/checkpoint_best.pt --beam 5 --nbest 1
```
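
To work with the alignments afterwards, you can redirect the generation log to a file and pull out the alignment lines; a minimal sketch, assuming the usual `fairseq-generate` output format in which alignment lines are prefixed with `A-`:

```bash
fairseq-generate \
    binarized --gen-subset test --print-alignment \
    --source-lang en --target-lang de \
    --path checkpoints/checkpoint_best.pt --beam 5 --nbest 1 > generate.out
# Alignment lines look like "A-<id>\t0-0 1-2 ...": sort by sentence id, keep the pairs.
grep '^A-' generate.out | sort -V | cut -f2- > test.bpe.align
```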
##### 6. Other resources.

The code for:
1. preparing alignment test sets
2. converting BPE-level alignments to token-level alignments
3. symmetrizing bidirectional alignments
4. evaluating alignments using the AER metric

can be found [here](https://github.com/lilt/alignment-scripts).
## Citation

```bibtex
@inproceedings{garg2019jointly,
  title = {Jointly Learning to Align and Translate with Transformer Models},
  author = {Garg, Sarthak and Peitz, Stephan and Nallasamy, Udhyakumar and Paulik, Matthias},
  booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  address = {Hong Kong},
  month = {November},
  url = {https://arxiv.org/abs/1909.02074},
  year = {2019},
}
```