Commit 849605a0 authored by Myle Ott, committed by Facebook Github Bot

Update comments and citations

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/676

Differential Revision: D15114128

Pulled By: myleott

fbshipit-source-id: b11dde77b2f2610d33649101aea03fb5a3eeb56a
parent 257a3b89
@@ -101,9 +101,15 @@ fairseq(-py) is BSD-licensed.
The license applies to the pre-trained models as well.
We also provide an additional patent grant.
# Credits
This is a PyTorch version of
[fairseq](https://github.com/facebookresearch/fairseq), a sequence-to-sequence
learning toolkit from Facebook AI Research. The original authors of this
reimplementation are (in no particular order) Sergey Edunov, Myle Ott, and Sam
Gross.
# Citation
Please cite as:
```bibtex
@inproceedings{ott2019fairseq,
title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
year = {2019},
}
```
@@ -86,7 +86,7 @@ This matches row 3 from Table 7 in the paper.
@article{shen2019mixture,
title = {Mixture Models for Diverse Machine Translation: Tricks of the Trade},
author = {Tianxiao Shen and Myle Ott and Michael Auli and Marc'Aurelio Ranzato},
-    journal = {arXiv preprint arXiv:1902.07816},
+    journal = {International Conference on Machine Learning},
year = 2019,
}
```
@@ -314,7 +314,7 @@ def add_optimization_args(parser):
help='Learning Rate Scheduler')
group.add_argument('--lr-shrink', default=0.1, type=float, metavar='LS',
help='learning rate shrink factor for annealing, lr_new = (lr * lr_shrink)')
-    group.add_argument('--min-lr', default=1e-5, type=float, metavar='LR',
+    group.add_argument('--min-lr', default=-1, type=float, metavar='LR',
help='minimum learning rate')
# fmt: on
return group
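Note on the hunk above: the default of `--min-lr` changes from `1e-5` to `-1`. As a hedged illustration (this is not fairseq's actual training loop), a minimum-learning-rate option of this kind is typically used as a stopping condition, and a default of `-1` means that check never fires unless the user opts in:

```python
import argparse

# Illustrative parser mirroring the option above; everything except
# --min-lr and its new default of -1 is an assumption for this sketch.
parser = argparse.ArgumentParser()
parser.add_argument('--min-lr', default=-1, type=float, metavar='LR',
                    help='minimum learning rate')
parser.add_argument('--max-epoch', default=5, type=int)
args = parser.parse_args([])

lr = 0.25        # stand-in for the scheduler's current learning rate
lr_shrink = 0.1  # mirrors --lr-shrink from the hunk above
epoch = 0

# A loop of this shape keeps training while the learning rate stays above
# --min-lr; with the default of -1 the lr comparison alone never stops it.
while lr > args.min_lr and epoch < args.max_epoch:
    # ... train one epoch, then anneal (stand-in for lr_scheduler.step()) ...
    lr *= lr_shrink
    epoch += 1

print(f'stopped after {epoch} epochs at lr={lr:g}')
```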
@@ -184,7 +184,7 @@ class Trainer(object):
# Whenever *samples* contains more than one mini-batch, we
# want to accumulate gradients locally and only call
# all-reduce in the last backwards pass. Currently the
-        # *need_reduction* flag is only supported by
+        # *accumulate_grads* flag is only supported by
# LegacyDistributedDataParallel.
if i < len(samples) - 1:
self.model.accumulate_grads = True
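The comment fix above refers to the gradient-accumulation path: local gradients are accumulated across mini-batches, and the all-reduce is deferred to the last backward pass. A minimal sketch of that pattern, assuming a toy wrapper (`AccumulatingDDPSketch` and `all_reduce_grads` are illustrative names, not fairseq's LegacyDistributedDataParallel):

```python
import torch
import torch.nn as nn

class AccumulatingDDPSketch(nn.Module):
    """Toy stand-in for a distributed wrapper honouring *accumulate_grads*."""

    def __init__(self, module):
        super().__init__()
        self.module = module
        self.accumulate_grads = False

    def forward(self, *args, **kwargs):
        return self.module(*args, **kwargs)

    def all_reduce_grads(self):
        # While accumulate_grads is set, only local accumulation happens;
        # synchronisation is deferred to the last backward pass.
        if self.accumulate_grads:
            return
        for p in self.module.parameters():
            if p.grad is not None:
                # real code would call torch.distributed.all_reduce(p.grad) here
                pass

model = AccumulatingDDPSketch(nn.Linear(8, 2))
samples = [torch.randn(4, 8) for _ in range(3)]  # one update split into 3 mini-batches

for i, sample in enumerate(samples):
    # accumulate locally for all but the last mini-batch, as in the comment above
    model.accumulate_grads = i < len(samples) - 1
    model(sample).sum().backward()
    model.all_reduce_grads()

# The wrapped module's .grad tensors now hold the sum over all three
# mini-batches; an optimizer step would follow here.
```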