# Scaling Neural Machine Translation (Ott et al., 2018)

This page includes instructions for reproducing results from the paper [Scaling Neural Machine Translation (Ott et al., 2018)](https://arxiv.org/abs/1806.00187).

## Pre-trained models

Model | Description | Dataset | Download
---|---|---|---
`transformer.wmt14.en-fr` | Transformer <br> ([Ott et al., 2018](https://arxiv.org/abs/1806.00187)) | [WMT14 English-French](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-fr.joined-dict.transformer.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.en-fr.joined-dict.newstest2014.tar.bz2)
`transformer.wmt16.en-de` | Transformer <br> ([Ott et al., 2018](https://arxiv.org/abs/1806.00187)) | [WMT16 English-German](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt16.en-de.joined-dict.transformer.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt16.en-de.joined-dict.newstest2014.tar.bz2)
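
To use one of the pre-trained models locally, download and extract the corresponding archive. A minimal sketch for the En-De model (the URL is the model link from the table above; the extracted contents are an assumption based on the archive name):

```bash
# Download the pre-trained WMT'16 En-De model (URL from the table above)
curl -O https://dl.fbaipublicfiles.com/fairseq/models/wmt16.en-de.joined-dict.transformer.tar.bz2

# Extract the .tar.bz2 archive containing the checkpoint and vocabulary
tar -xjvf wmt16.en-de.joined-dict.transformer.tar.bz2
```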

## Training a new model on WMT'16 En-De

First download the [preprocessed WMT'16 En-De data provided by Google](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8).

Then:

##### 1. Extract the WMT'16 En-De data
```bash
TEXT=wmt16_en_de_bpe32k
mkdir -p $TEXT
tar -xzvf wmt16_en_de.tar.gz -C $TEXT
```

##### 2. Preprocess the dataset with a joined dictionary
```bash
fairseq-preprocess \
    --source-lang en --target-lang de \
    --trainpref $TEXT/train.tok.clean.bpe.32000 \
    --validpref $TEXT/newstest2013.tok.bpe.32000 \
    --testpref $TEXT/newstest2014.tok.bpe.32000 \
    --destdir data-bin/wmt16_en_de_bpe32k \
    --nwordssrc 32768 --nwordstgt 32768 \
    --joined-dictionary \
    --workers 20
```
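
With `--joined-dictionary`, the source and target vocabularies are identical. A quick sanity check after preprocessing (a sketch; `dict.en.txt` and `dict.de.txt` are the dictionary files `fairseq-preprocess` writes to `--destdir`):

```bash
# The two dictionaries should be byte-identical when --joined-dictionary is used
diff data-bin/wmt16_en_de_bpe32k/dict.en.txt data-bin/wmt16_en_de_bpe32k/dict.de.txt \
    && echo "joined dictionary OK"
```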

##### 3. Train a model
```bash
fairseq-train \
    data-bin/wmt16_en_de_bpe32k \
    --arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
    --dropout 0.3 --weight-decay 0.0 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 3584 \
    --fp16
```

Note that the `--fp16` flag requires you to have CUDA 9.1 or greater and a Volta GPU or newer.

If you want to train the above model with big batches (assuming your machine has 8 GPUs):
- add `--update-freq 16` to simulate training on 8x16=128 GPUs
- increase the learning rate; 0.001 works well for big batches (see the combined command below)
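
Putting both changes together, a big-batch variant of the training command might look like this (a sketch, assuming 8 GPUs; all flags other than `--lr` and `--update-freq` are unchanged from step 3):

```bash
fairseq-train \
    data-bin/wmt16_en_de_bpe32k \
    --arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 0.001 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
    --dropout 0.3 --weight-decay 0.0 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 3584 \
    --update-freq 16 \
    --fp16
```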

##### 4. Evaluate
```bash
fairseq-generate \
    data-bin/wmt16_en_de_bpe32k \
    --path checkpoints/checkpoint_best.pt \
    --beam 4 --lenpen 0.6 --remove-bpe
```
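
To compute BLEU from the generation output, one option is to split the hypotheses and references out of the log and score them with `fairseq-score` (a sketch; `gen.out` is a hypothetical filename):

```bash
# Save the generation output, then separate hypothesis (H-*) and reference (T-*) lines
fairseq-generate \
    data-bin/wmt16_en_de_bpe32k \
    --path checkpoints/checkpoint_best.pt \
    --beam 4 --lenpen 0.6 --remove-bpe > gen.out

grep ^H gen.out | cut -f3- > gen.out.sys   # hypothesis text is the third tab-separated field
grep ^T gen.out | cut -f2- > gen.out.ref   # reference text is the second tab-separated field
fairseq-score --sys gen.out.sys --ref gen.out.ref
```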

## Citation

```bibtex
@inproceedings{ott2018scaling,
  title = {Scaling Neural Machine Translation},
  author = {Ott, Myle and Edunov, Sergey and Grangier, David and Auli, Michael},
  booktitle = {Proceedings of the Third Conference on Machine Translation (WMT)},
  year = 2018,
}
```