"python/pyproject.toml" did not exist on "b675ecaec40d11ef8c0980de748e1a0471fcf5c4"
Commit 208295df authored by Myle Ott, committed by Facebook Github Bot

Update README.md

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/899

Differential Revision: D16448602

Pulled By: myleott

fbshipit-source-id: afd1a1b713274b6328150cd85d7f8a81833597aa
parent af6b361c
@@ -15,23 +15,20 @@ The model is trained with online responsibility assignment and shared parameteri
 The following command will train a `hMoElp` model with `3` experts:
 ```
-$ CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/wmt17_en_de \
+$ fairseq-train --ddp-backend='no_c10d' \
+    data-bin/wmt17_en_de \
     --max-update 100000 \
     --task translation_moe \
     --method hMoElp --mean-pool-gating-network \
     --num-experts 3 \
-    --arch transformer_vaswani_wmt_en_de --share-all-embeddings \
+    --arch transformer_wmt_en_de --share-all-embeddings \
     --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
     --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
     --lr 0.0007 --min-lr 1e-09 \
     --dropout 0.1 --weight-decay 0.0 --criterion cross_entropy \
-    --max-tokens 3584 \
-    --update-freq 8
+    --max-tokens 3584
 ```
-**Note**: the above command assumes 1 GPU, but accumulates gradients from 8 fwd/bwd passes to simulate training on 8 GPUs.
-You can accelerate training on up to 8 GPUs by adjusting the `CUDA_VISIBLE_DEVICES` and `--update-freq` options accordingly.
 ## Translate
 Once a model is trained, we can generate translations from different experts using the `--gen-expert` option.
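As a rough sketch of that step (the checkpoint path and the expert index `0` are illustrative placeholders, not taken from this diff), decoding with a single expert could look like:

```
# Generate translations from expert 0, reusing the MoE flags from the training command above.
# The --path value is a hypothetical checkpoint location.
$ fairseq-generate data-bin/wmt17_en_de \
    --path checkpoints/checkpoint_best.pt \
    --beam 1 --remove-bpe \
    --task translation_moe \
    --method hMoElp --mean-pool-gating-network \
    --num-experts 3 \
    --gen-expert 0
```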