Commit 181dc58e authored by Myle Ott, committed by Facebook Github Bot

Documentation fixes

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/927

Differential Revision: D18691521

Pulled By: myleott

fbshipit-source-id: a79cb0a7614a30be765e741761819263d9fb5047
parent 5d9392df
@@ -59,17 +59,9 @@ To use the model without GLU, please set `--encoder-glu 0 --decoder-glu 0`.
 For LightConv, please use `--encoder-conv-type lightweight --decoder-conv-type lightweight`, otherwise the default is DynamicConv.
 For best BLEU results, lenpen may need to be manually tuned.
-To use the CUDA kernels, first install the PyTorch modules using the commands below
-```sh
-# to install lightconv
-python fairseq/modules/lightconv_layer/cuda_function_gen.py
-python fairseq/modules/lightconv_layer/setup.py install
-# to install dynamicconv
-python fairseq/modules/dynamicconv_layer/cuda_function_gen.py
-python fairseq/modules/dynamicconv_layer/setup.py install
-```
-Once the CUDA modules are installed, they will automatically be used instead of the PyTorch modules.
+To use the CUDA kernels, first install the PyTorch modules using the commands
+above. Once the CUDA modules are installed, they will automatically be used
+instead of the PyTorch modules.
 ### IWSLT14 De-En
 Training and evaluating DynamicConv (without GLU) on a GPU:
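The "commands above" that the new wording points to are the CUDA kernel build steps, presumably the same ones this hunk removes from this spot in the README. For reference, that install sequence (taken from the removed block) is:
```sh
# Generate and install the fused CUDA kernels (same steps as the removed block)
# lightconv
python fairseq/modules/lightconv_layer/cuda_function_gen.py
python fairseq/modules/lightconv_layer/setup.py install
# dynamicconv
python fairseq/modules/dynamicconv_layer/cuda_function_gen.py
python fairseq/modules/dynamicconv_layer/setup.py install
```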
@@ -50,15 +50,35 @@ fairseq-train \
 Note that the `--fp16` flag requires you have CUDA 9.1 or greater and a Volta GPU or newer.
-If you want to train the above model with big batches (assuming your machine has 8 GPUs):
+***IMPORTANT:*** You will get better performance by training with big batches and
+increasing the learning rate. If you want to train the above model with big batches
+(assuming your machine has 8 GPUs):
 - add `--update-freq 16` to simulate training on 8x16=128 GPUs
 - increase the learning rate; 0.001 works well for big batches
 ##### 4. Evaluate
+Now we can evaluate our trained model.
+Note that the original [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
+paper used a couple tricks to achieve better BLEU scores. We use these same tricks in
+the Scaling NMT paper, so it's important to apply them when reproducing our results.
+First, use the [average_checkpoints.py](/scripts/average_checkpoints.py) script to
+average the last few checkpoints. Averaging the last 5-10 checkpoints is usually
+good, but you may need to adjust this depending on how long you've trained:
+```bash
+python scripts/average_checkpoints.py \
+    --inputs /path/to/checkpoints \
+    --num-epoch-checkpoints 5 \
+    --output checkpoint.avg5.pt
+```
+Next, generate translations using a beam width of 4 and length penalty of 0.6:
 ```bash
 fairseq-generate \
     data-bin/wmt16_en_de_bpe32k \
-    --path checkpoints/checkpoint_best.pt \
+    --path checkpoint.avg5.pt \
     --beam 4 --lenpen 0.6 --remove-bpe
 ```
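The big-batch recipe added in this hunk boils down to two extra flags on the `fairseq-train` command that sits just above it in the README: `--update-freq 16` and `--lr 0.001`. A minimal sketch of how they combine, assuming the WMT'16 En-De setup used in this README; the other flags stand in for whatever the full command already specifies and are not taken from this commit:
```bash
# Hedged sketch only, not the README's exact command: on an 8-GPU machine,
# --update-freq 16 accumulates gradients over 16 steps per GPU, simulating
# 8 x 16 = 128 GPUs, and the learning rate is raised to 0.001 to match.
fairseq-train data-bin/wmt16_en_de_bpe32k \
    --arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
    --optimizer adam --lr 0.001 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --max-tokens 3584 --fp16 \
    --update-freq 16
```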