Commit 849605a0 authored by Myle Ott, committed by Facebook Github Bot

Update comments and citations

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/676

Differential Revision: D15114128

Pulled By: myleott

fbshipit-source-id: b11dde77b2f2610d33649101aea03fb5a3eeb56a
parent 257a3b89
@@ -101,9 +101,15 @@ fairseq(-py) is BSD-licensed.
The license applies to the pre-trained models as well.
We also provide an additional patent grant.
# Credits
This is a PyTorch version of
[fairseq](https://github.com/facebookresearch/fairseq), a sequence-to-sequence
learning toolkit from Facebook AI Research. The original authors of this
reimplementation are (in no particular order) Sergey Edunov, Myle Ott, and Sam
Gross.
# Citation
Please cite as:
```bibtex
@inproceedings{ott2019fairseq,
title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
year = {2019},
}
```
@@ -86,7 +86,7 @@ This matches row 3 from Table 7 in the paper.
@article{shen2019mixture,
title = {Mixture Models for Diverse Machine Translation: Tricks of the Trade},
author = {Tianxiao Shen and Myle Ott and Michael Auli and Marc'Aurelio Ranzato},
-    journal = {arXiv preprint arXiv:1902.07816},
+    journal = {International Conference on Machine Learning},
year = 2019,
}
```
@@ -314,7 +314,7 @@ def add_optimization_args(parser):
help='Learning Rate Scheduler')
group.add_argument('--lr-shrink', default=0.1, type=float, metavar='LS',
help='learning rate shrink factor for annealing, lr_new = (lr * lr_shrink)')
-    group.add_argument('--min-lr', default=1e-5, type=float, metavar='LR',
+    group.add_argument('--min-lr', default=-1, type=float, metavar='LR',
help='minimum learning rate')
# fmt: on
return group
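Note on the hunk above: the default of `--min-lr` changes from `1e-5` to `-1`. As a hedged illustration (this is not fairseq's actual training loop), a minimum-learning-rate option of this kind is typically used as a stopping condition, and a default of `-1` means that check never fires unless the user opts in:

```python
import argparse

# Illustrative parser mirroring the option above; everything except
# --min-lr and its new default of -1 is an assumption for this sketch.
parser = argparse.ArgumentParser()
parser.add_argument('--min-lr', default=-1, type=float, metavar='LR',
                    help='minimum learning rate')
parser.add_argument('--max-epoch', default=5, type=int)
args = parser.parse_args([])

lr = 0.25        # stand-in for the scheduler's current learning rate
lr_shrink = 0.1  # mirrors --lr-shrink from the hunk above
epoch = 0

# A loop of this shape keeps training while the learning rate stays above
# --min-lr; with the default of -1 the lr comparison alone never stops it.
while lr > args.min_lr and epoch < args.max_epoch:
    # ... train one epoch, then anneal (stand-in for lr_scheduler.step()) ...
    lr *= lr_shrink
    epoch += 1

print(f'stopped after {epoch} epochs at lr={lr:g}')
```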
@@ -184,7 +184,7 @@ class Trainer(object):
# Whenever *samples* contains more than one mini-batch, we
# want to accumulate gradients locally and only call
# all-reduce in the last backwards pass. Currently the
-        # *need_reduction* flag is only supported by
+        # *accumulate_grads* flag is only supported by
# LegacyDistributedDataParallel.
if i < len(samples) - 1:
self.model.accumulate_grads = True
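The comment fix above refers to the gradient-accumulation path: local gradients are accumulated across mini-batches, and the all-reduce is deferred to the last backward pass. A minimal sketch of that pattern, assuming a toy wrapper (`AccumulatingDDPSketch` and `all_reduce_grads` are illustrative names, not fairseq's LegacyDistributedDataParallel):

```python
import torch
import torch.nn as nn

class AccumulatingDDPSketch(nn.Module):
    """Toy stand-in for a distributed wrapper honouring *accumulate_grads*."""

    def __init__(self, module):
        super().__init__()
        self.module = module
        self.accumulate_grads = False

    def forward(self, *args, **kwargs):
        return self.module(*args, **kwargs)

    def all_reduce_grads(self):
        # While accumulate_grads is set, only local accumulation happens;
        # synchronisation is deferred to the last backward pass.
        if self.accumulate_grads:
            return
        for p in self.module.parameters():
            if p.grad is not None:
                # real code would call torch.distributed.all_reduce(p.grad) here
                pass

model = AccumulatingDDPSketch(nn.Linear(8, 2))
samples = [torch.randn(4, 8) for _ in range(3)]  # one update split into 3 mini-batches

for i, sample in enumerate(samples):
    # accumulate locally for all but the last mini-batch, as in the comment above
    model.accumulate_grads = i < len(samples) - 1
    model(sample).sum().backward()
    model.all_reduce_grads()

# The wrapped module's .grad tensors now hold the sum over all three
# mini-batches; an optimizer step would follow here.
```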