# Example usage for Neural Machine Translation
These sample data processing scripts for the FAIR Sequence-to-Sequence Toolkit provide an example of pre-processing data for the NMT task, along with instructions for replicating the results from the paper [Scaling Neural Machine Translation (Ott et al., 2018)](https://arxiv.org/abs/1806.00187).
## Preprocessing
### prepare-iwslt14.sh
Provides an example of pre-processing for the IWSLT'14 German-to-English translation task: ["Report on the 11th IWSLT evaluation campaign" by Cettolo et al.](http://workshop2014.iwslt.org/downloads/proceeding.pdf)
### prepare-wmt14en2de.sh
Provides an example of pre-processing for the WMT'14 English-to-German translation task. By default, it produces a dataset modeled after ["Attention Is All You Need" by Vaswani et al.](https://arxiv.org/abs/1706.03762), with the news-commentary-v12 data included.
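Binarization pairs each source line with the target line at the same position, so the source and target files of each split must have the same number of lines. The following is a minimal sanity-check sketch you could run after a prepare script finishes. It assumes that the output directory holds files named `train.<lang>`, `valid.<lang>` and `test.<lang>`. That naming, and the `check_parallel` helper itself, are hypothetical and are not part of the provided scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the prepare scripts): check that each
# split has the same number of lines on the source and target side.
# Assumes files named <dir>/<split>.<lang>, e.g. wmt14_en_de/train.en.
check_parallel() {
  local dir="$1" src="$2" tgt="$3"
  local split n_src n_tgt status=0
  for split in train valid test; do
    # Report (and count as a failure) any split whose files are missing.
    if [ ! -f "$dir/$split.$src" ] || [ ! -f "$dir/$split.$tgt" ]; then
      echo "missing $split.$src or $split.$tgt in $dir" >&2
      status=1
      continue
    fi
    # Compare line counts; a mismatch means the pair is misaligned.
    n_src=$(wc -l < "$dir/$split.$src")
    n_tgt=$(wc -l < "$dir/$split.$tgt")
    if [ "$n_src" -ne "$n_tgt" ]; then
      echo "mismatch in $split: $n_src $src lines vs $n_tgt $tgt lines" >&2
      status=1
    fi
  done
  # Non-zero exit status if any split was missing or misaligned.
  return "$status"
}
```

For example, `check_parallel examples/translation/wmt14_en_de en de && echo OK` prints `OK` only when every split is present and aligned. The check compares counts only. It cannot detect lines that were reordered on one side.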
...
$ bash prepare-wmt14en2de.sh
$ cd ../..
# Binarize the dataset:
$ TEXT=examples/translation/wmt14_en_de
$ python preprocess.py --source-lang en --target-lang de \