# Neural Machine Translation

This README contains instructions for [using pretrained translation models](#example-usage-torchhub)
as well as [training new models](#training-a-new-model).

## Pre-trained models

Model | Description | Dataset | Download
---|---|---|---
`conv.wmt14.en-fr` | Convolutional <br> ([Gehring et al., 2017](https://arxiv.org/abs/1705.03122)) | [WMT14 English-French](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.v2.en-fr.fconv-py.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.v2.en-fr.newstest2014.tar.bz2) <br> newstest2012/2013: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.v2.en-fr.ntst1213.tar.bz2)
`conv.wmt14.en-de` | Convolutional <br> ([Gehring et al., 2017](https://arxiv.org/abs/1705.03122)) | [WMT14 English-German](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-de.fconv-py.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.en-de.newstest2014.tar.bz2)
`conv.wmt17.en-de` | Convolutional <br> ([Gehring et al., 2017](https://arxiv.org/abs/1705.03122)) | [WMT17 English-German](http://statmt.org/wmt17/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt17.v2.en-de.fconv-py.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt17.v2.en-de.newstest2014.tar.bz2)
`transformer.wmt14.en-fr` | Transformer <br> ([Ott et al., 2018](https://arxiv.org/abs/1806.00187)) | [WMT14 English-French](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-fr.joined-dict.transformer.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.en-fr.joined-dict.newstest2014.tar.bz2)
`transformer.wmt16.en-de` | Transformer <br> ([Ott et al., 2018](https://arxiv.org/abs/1806.00187)) | [WMT16 English-German](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt16.en-de.joined-dict.transformer.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt16.en-de.joined-dict.newstest2014.tar.bz2)
`transformer.wmt18.en-de` | Transformer <br> ([Edunov et al., 2018](https://arxiv.org/abs/1808.09381)) <br> WMT'18 winner | [WMT'18 English-German](http://www.statmt.org/wmt18/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt18.en-de.ensemble.tar.gz) <br> See NOTE in the archive
`transformer.wmt19.en-de` | Transformer <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) <br> WMT'19 winner | [WMT'19 English-German](http://www.statmt.org/wmt19/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-de.joined-dict.ensemble.tar.gz)
`transformer.wmt19.de-en` | Transformer <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) <br> WMT'19 winner | [WMT'19 German-English](http://www.statmt.org/wmt19/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.de-en.joined-dict.ensemble.tar.gz)
`transformer.wmt19.en-ru` | Transformer <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) <br> WMT'19 winner | [WMT'19 English-Russian](http://www.statmt.org/wmt19/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-ru.ensemble.tar.gz)
`transformer.wmt19.ru-en` | Transformer <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) <br> WMT'19 winner | [WMT'19 Russian-English](http://www.statmt.org/wmt19/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.ru-en.ensemble.tar.gz)

## Example usage (torch.hub)

We require a few additional Python dependencies for preprocessing:
```bash
pip install sacremoses subword_nmt
```

Interactive translation via PyTorch Hub:
```python
import torch

# List available models
torch.hub.list('pytorch/fairseq')  # [..., 'transformer.wmt16.en-de', ... ]

# Load a transformer trained on WMT'16 En-De
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt16.en-de', tokenizer='moses', bpe='subword_nmt')

# The underlying model is available under the *models* attribute
import fairseq  # needed so the name resolves in the isinstance check below
assert isinstance(en2de.models[0], fairseq.models.transformer.TransformerModel)

# Translate a sentence
en2de.translate('Hello world!')
# 'Hallo Welt!'
```

## Example usage (CLI tools)

Generation with the binarized test sets can be run in batch mode as follows, e.g. for WMT 2014 English-French on a GTX-1080ti:
```bash
mkdir -p data-bin
curl https://dl.fbaipublicfiles.com/fairseq/models/wmt14.v2.en-fr.fconv-py.tar.bz2 | tar xvjf - -C data-bin
curl https://dl.fbaipublicfiles.com/fairseq/data/wmt14.v2.en-fr.newstest2014.tar.bz2 | tar xvjf - -C data-bin
fairseq-generate data-bin/wmt14.en-fr.newstest2014  \
    --path data-bin/wmt14.en-fr.fconv-py/model.pt \
    --beam 5 --batch-size 128 --remove-bpe | tee /tmp/gen.out
# ...
# | Translated 3003 sentences (96311 tokens) in 166.0s (580.04 tokens/s)
# | Generate test with beam=5: BLEU4 = 40.83, 67.5/46.9/34.4/25.5 (BP=1.000, ratio=1.006, syslen=83262, reflen=82787)

# Compute BLEU score
grep ^H /tmp/gen.out | cut -f3- > /tmp/gen.out.sys
grep ^T /tmp/gen.out | cut -f2- > /tmp/gen.out.ref
fairseq-score --sys /tmp/gen.out.sys --ref /tmp/gen.out.ref
# BLEU4 = 40.83, 67.5/46.9/34.4/25.5 (BP=1.000, ratio=1.006, syslen=83262, reflen=82787)
```
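
The `grep`/`cut` pipeline above relies on the tab-separated layout of the `fairseq-generate` output, where hypothesis lines look like `H-<id>\t<score>\t<text>` and reference lines like `T-<id>\t<text>`. As a minimal sketch, assuming that layout, the same extraction can be done in Python. The helper name `split_generate_output` and the sample lines are hypothetical and only for illustration.

```python
def split_generate_output(lines):
    """Collect hypotheses (H-*) and references (T-*), aligned by sentence id."""
    hyps, refs = {}, {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        tag, _, sent_id = fields[0].partition("-")
        if not sent_id.isdigit():
            continue  # skip log lines such as "| Translated ..."
        if tag == "H" and len(fields) >= 3:
            # H-<id> \t <score> \t <text>  (same fields as `cut -f3-`)
            hyps[int(sent_id)] = "\t".join(fields[2:])
        elif tag == "T" and len(fields) >= 2:
            # T-<id> \t <text>  (same fields as `cut -f2-`)
            refs[int(sent_id)] = "\t".join(fields[1:])
    ids = sorted(hyps.keys() & refs.keys())
    return [hyps[i] for i in ids], [refs[i] for i in ids]


# Made-up sample in the assumed output layout (not real model output).
sample = [
    "S-1\tHello world !\n",
    "T-1\tBonjour le monde !\n",
    "H-1\t-0.25\tBonjour le monde !\n",
    "S-0\tThank you .\n",
    "T-0\tMerci .\n",
    "H-0\t-0.10\tMerci beaucoup .\n",
    "| Translated 2 sentences\n",
]
sys_lines, ref_lines = split_generate_output(sample)
print(sys_lines)
print(ref_lines)
```

Unlike `grep`, this sketch sorts by sentence id, because the generation output may not follow corpus order. The BLEU score is unaffected as long as the two lists stay aligned.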

## Training a new model

### IWSLT'14 German to English (Transformer)

The following instructions can be used to train a Transformer model on the [IWSLT'14 German to English dataset](http://workshop2014.iwslt.org/downloads/proceeding.pdf).

First download and preprocess the data:
```bash
# Download and prepare the data
cd examples/translation/
bash prepare-iwslt14.sh
cd ../..

# Preprocess/binarize the data
TEXT=examples/translation/iwslt14.tokenized.de-en
fairseq-preprocess --source-lang de --target-lang en \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir data-bin/iwslt14.tokenized.de-en \
    --workers 20
```

Next we'll train a Transformer translation model over this data:
```bash
CUDA_VISIBLE_DEVICES=0 fairseq-train \
    data-bin/iwslt14.tokenized.de-en \
    --arch transformer_iwslt_de_en --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096
```
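
The `--lr-scheduler inverse_sqrt --warmup-updates 4000 --lr 5e-4` settings above describe a schedule that warms up linearly and then decays with the inverse square root of the update number. The following is a minimal sketch of that shape, not fairseq's own code. The function name `inverse_sqrt_lr` is hypothetical, and starting the warmup from `0.0` is an assumption about the default when `--warmup-init-lr` is not given.

```python
import math


def inverse_sqrt_lr(step, peak_lr=5e-4, warmup_updates=4000, warmup_init_lr=0.0):
    """Learning rate after `step` optimizer updates (step >= 1)."""
    if step < warmup_updates:
        # Linear warmup from warmup_init_lr (assumed 0.0 here) up to peak_lr.
        return warmup_init_lr + step * (peak_lr - warmup_init_lr) / warmup_updates
    # After warmup: decay proportional to 1 / sqrt(step), equal to peak_lr
    # exactly at step == warmup_updates.
    return peak_lr * math.sqrt(warmup_updates / step)


for step in (1, 2000, 4000, 16000):
    print(step, inverse_sqrt_lr(step))
```

Under these assumptions the rate peaks at update 4000 and falls back to half the peak by update 16000, which is why the warmup length and the peak rate are usually tuned together.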

Finally we can evaluate our trained model:
```bash
fairseq-generate data-bin/iwslt14.tokenized.de-en \
    --path checkpoints/checkpoint_best.pt \
    --batch-size 128 --beam 5 --remove-bpe
```
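
The `--remove-bpe` flag above undoes the subword segmentation before the output is scored. As a rough illustration, assuming the data was segmented with subword-nmt-style `@@ ` continuation markers, the post-processing amounts to a string replacement. The helper `remove_bpe` below is a hypothetical stand-in, not the function fairseq calls.

```python
def remove_bpe(line, marker="@@ "):
    """Join subword units by deleting the continuation marker."""
    # Appending a space lets a marker at the very end of the line match too.
    return (line + " ").replace(marker, "").rstrip()


print(remove_bpe("the quick bro@@ wn fo@@ x"))  # the quick brown fox
```

Scoring the segmented output directly would compare subword units rather than words, so the BLEU numbers would not be comparable to published results.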

### WMT'14 English to German (Convolutional)

The following instructions can be used to train a Convolutional translation model on the WMT English to German dataset.
See the [Scaling NMT README](../scaling_nmt/README.md) for instructions to train a Transformer translation model on this data.

The WMT English to German dataset can be preprocessed using the `prepare-wmt14en2de.sh` script.
By default it will produce a dataset that was modeled after [Attention Is All You Need (Vaswani et al., 2017)](https://arxiv.org/abs/1706.03762), but with additional news-commentary-v12 data from WMT'17.

To use only data available in WMT'14 or to replicate results obtained in the original [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](https://arxiv.org/abs/1705.03122) paper, please use the `--icml17` option.

```bash
# Download and prepare the data
cd examples/translation/
# WMT'17 data:
bash prepare-wmt14en2de.sh
# or to use WMT'14 data:
# bash prepare-wmt14en2de.sh --icml17
cd ../..

# Binarize the dataset
TEXT=examples/translation/wmt17_en_de
fairseq-preprocess \
    --source-lang en --target-lang de \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir data-bin/wmt17_en_de --thresholdtgt 0 --thresholdsrc 0 \
    --workers 20

# Train the model
mkdir -p checkpoints/fconv_wmt_en_de
fairseq-train \
    data-bin/wmt17_en_de \
    --arch fconv_wmt_en_de \
    --lr 0.5 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --lr-scheduler fixed --force-anneal 50 \
    --save-dir checkpoints/fconv_wmt_en_de

# Evaluate
fairseq-generate data-bin/wmt17_en_de \
    --path checkpoints/fconv_wmt_en_de/checkpoint_best.pt \
    --beam 5 --remove-bpe
```
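
The training commands in this README all pass `--criterion label_smoothed_cross_entropy --label-smoothing 0.1`. The sketch below shows the textbook form of that loss for a single target token, mixing the usual negative log-likelihood with a cross-entropy against a uniform distribution. It is an illustration under that assumption; fairseq's implementation may distribute the smoothing mass slightly differently, and `label_smoothed_nll` is a hypothetical name.

```python
import math


def label_smoothed_nll(log_probs, target, epsilon=0.1):
    """Label-smoothed loss for one token, given log-probs over the vocabulary."""
    nll = -log_probs[target]                     # standard cross-entropy term
    uniform = -sum(log_probs) / len(log_probs)   # cross-entropy vs. a uniform target
    return (1.0 - epsilon) * nll + epsilon * uniform


# Toy 3-word vocabulary; index 0 is the correct token.
log_probs = [math.log(p) for p in (0.7, 0.2, 0.1)]
print(label_smoothed_nll(log_probs, target=0, epsilon=0.0))  # plain NLL
print(label_smoothed_nll(log_probs, target=0, epsilon=0.1))  # smoothed loss
```

With smoothing enabled the loss stays above the plain negative log-likelihood in this example, which penalizes over-confident predictions.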

### WMT'14 English to French
```bash
# Download and prepare the data
cd examples/translation/
bash prepare-wmt14en2fr.sh
cd ../..

# Binarize the dataset
TEXT=examples/translation/wmt14_en_fr
fairseq-preprocess \
    --source-lang en --target-lang fr \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir data-bin/wmt14_en_fr --thresholdtgt 0 --thresholdsrc 0 \
    --workers 60

# Train the model
mkdir -p checkpoints/fconv_wmt_en_fr
fairseq-train \
    data-bin/wmt14_en_fr \
    --lr 0.5 --clip-norm 0.1 --dropout 0.1 --max-tokens 3000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --lr-scheduler fixed --force-anneal 50 \
    --arch fconv_wmt_en_fr \
    --save-dir checkpoints/fconv_wmt_en_fr

# Evaluate
fairseq-generate \
    data-bin/wmt14_en_fr \
    --path checkpoints/fconv_wmt_en_fr/checkpoint_best.pt \
    --beam 5 --remove-bpe
```

## Multilingual Translation

We also support training multilingual translation models. In this example we'll
train a multilingual `{de,fr}-en` translation model using the IWSLT'17 datasets.

Note that we use slightly different preprocessing here than for the IWSLT'14
De-En data above. In particular, we learn a joint BPE code for all three
languages and use `fairseq-interactive` and sacrebleu to score the test set.

```bash
# First install sacrebleu and sentencepiece
pip install sacrebleu sentencepiece

# Then download and preprocess the data
cd examples/translation/
bash prepare-iwslt17-multilingual.sh
cd ../..

# Binarize the de-en dataset
TEXT=examples/translation/iwslt17.de_fr.en.bpe16k
fairseq-preprocess --source-lang de --target-lang en \
    --trainpref $TEXT/train.bpe.de-en --validpref $TEXT/valid.bpe.de-en \
    --joined-dictionary \
    --destdir data-bin/iwslt17.de_fr.en.bpe16k \
    --workers 10

# Binarize the fr-en dataset
# NOTE: it's important to reuse the en dictionary from the previous step
fairseq-preprocess --source-lang fr --target-lang en \
    --trainpref $TEXT/train.bpe.fr-en --validpref $TEXT/valid.bpe.fr-en \
    --joined-dictionary --tgtdict data-bin/iwslt17.de_fr.en.bpe16k/dict.en.txt \
    --destdir data-bin/iwslt17.de_fr.en.bpe16k \
    --workers 10

# Train a multilingual transformer model
# NOTE: the command below assumes 1 GPU, but accumulates gradients from
#       8 fwd/bwd passes to simulate training on 8 GPUs
mkdir -p checkpoints/multilingual_transformer
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt17.de_fr.en.bpe16k/ \
    --max-epoch 50 \
    --ddp-backend=no_c10d \
    --task multilingual_translation --lang-pairs de-en,fr-en \
    --arch multilingual_transformer_iwslt_de_en \
    --share-decoders --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --lr 0.0005 --lr-scheduler inverse_sqrt --min-lr '1e-09' \
    --warmup-updates 4000 --warmup-init-lr '1e-07' \
    --label-smoothing 0.1 --criterion label_smoothed_cross_entropy \
    --dropout 0.3 --weight-decay 0.0001 \
    --save-dir checkpoints/multilingual_transformer \
    --max-tokens 4000 \
    --update-freq 8

# Generate and score the test set with sacrebleu
SRC=de
sacrebleu --test-set iwslt17 --language-pair ${SRC}-en --echo src \
    | python scripts/spm_encode.py --model examples/translation/iwslt17.de_fr.en.bpe16k/sentencepiece.bpe.model \
    > iwslt17.test.${SRC}-en.${SRC}.bpe
cat iwslt17.test.${SRC}-en.${SRC}.bpe \
    | fairseq-interactive data-bin/iwslt17.de_fr.en.bpe16k/ \
      --task multilingual_translation --lang-pairs de-en,fr-en \
      --source-lang ${SRC} --target-lang en \
      --path checkpoints/multilingual_transformer/checkpoint_best.pt \
      --buffer-size 2000 --batch-size 128 \
      --beam 5 --remove-bpe=sentencepiece \
    > iwslt17.test.${SRC}-en.en.sys
grep ^H iwslt17.test.${SRC}-en.en.sys | cut -f3 \
    | sacrebleu --test-set iwslt17 --language-pair ${SRC}-en
```
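
The note in the training command above says that `--update-freq 8` on one GPU simulates training on 8 GPUs. The arithmetic behind that claim can be sketched as follows, assuming `--max-tokens` caps the batch size per GPU per forward/backward pass. The helper `tokens_per_update` is hypothetical.

```python
def tokens_per_update(max_tokens, update_freq, num_gpus=1):
    """Upper bound on the tokens contributing to one optimizer step."""
    return max_tokens * update_freq * num_gpus


# One GPU accumulating gradients over 8 passes, as in the command above.
single_gpu = tokens_per_update(max_tokens=4000, update_freq=8, num_gpus=1)
# Eight GPUs with no accumulation.
eight_gpus = tokens_per_update(max_tokens=4000, update_freq=1, num_gpus=8)
print(single_gpu, eight_gpus)  # 32000 32000
```

Both settings give the same effective batch size, so the learning-rate settings can stay unchanged when moving between them. Only the wall-clock time per update differs.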

##### Argument format during inference

During inference you must specify a single `--source-lang` and
`--target-lang`, which together indicate the inference language direction.
`--lang-pairs`, `--encoder-langtok` and `--decoder-langtok` must be set to
the same values as during training.
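
One way to keep these arguments consistent is to derive the inference arguments from the settings used for training. The sketch below is a hypothetical helper, not part of fairseq. It assumes that `--encoder-langtok` takes a value and that `--decoder-langtok` is a boolean switch, and the `train_settings` dictionary is an illustrative structure.

```python
def build_inference_args(train_settings, source_lang, target_lang):
    """Return CLI arguments for translating in one language direction."""
    args = [
        "--task", "multilingual_translation",
        # Must match the value used for training.
        "--lang-pairs", train_settings["lang_pairs"],
        # Exactly one direction per inference run.
        "--source-lang", source_lang,
        "--target-lang", target_lang,
    ]
    # Language-token options are repeated only if they were set for training.
    if train_settings.get("encoder_langtok"):
        args += ["--encoder-langtok", train_settings["encoder_langtok"]]
    if train_settings.get("decoder_langtok"):
        args.append("--decoder-langtok")
    return args


# Settings mirroring the training command in this README (no language tokens).
train_settings = {
    "lang_pairs": "de-en,fr-en",
    "encoder_langtok": None,
    "decoder_langtok": False,
}
fr_en_args = build_inference_args(train_settings, "fr", "en")
print(" ".join(fr_en_args))
```

The resulting list can be appended to a `fairseq-interactive` or `fairseq-generate` invocation, together with the data directory and `--path`.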