- 02 Oct, 2018 2 commits
-
-
Michael Auli authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/300
Differential Revision: D10154711
Pulled By: edunov
fbshipit-source-id: 859d1ac59923b67c1547b6f7acb94f801b0c3318
-
Liezl Puzon authored
Summary: Using argparse Namespace hides the actual args that are expected and makes code harder to read. Note the difference in style for the args list:

    def __init__(
        self,
        tgt_dataset,
        tgt_dict,
        backtranslation_model,
        unkpen,
        sampling,
        beam,
        max_len_a,
        max_len_b,
    ):

instead of

    def __init__(
        self, tgt_dataset, tgt_dict, backtranslation_model, unkpen,
        sampling, beam, max_len_a, max_len_b,
    ):

Reviewed By: dpacgopinath
Differential Revision: D10152331
fbshipit-source-id: 6539ccba09d48acf23759996b7e32fb329b3e3f6
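As a rough illustration of the point above (a minimal sketch; the class names below are hypothetical and the real BacktranslationDataset may store these fields differently), compare a constructor that names its arguments explicitly with one that unpacks them from an argparse Namespace:

    import argparse

    class ExplicitArgsDataset:
        # Explicit parameters: the expected arguments are visible in the signature.
        def __init__(self, tgt_dataset, tgt_dict, backtranslation_model,
                     unkpen, sampling, beam, max_len_a, max_len_b):
            self.tgt_dataset = tgt_dataset
            self.tgt_dict = tgt_dict
            self.backtranslation_model = backtranslation_model
            self.unkpen = unkpen
            self.sampling = sampling
            self.beam = beam
            self.max_len_a = max_len_a
            self.max_len_b = max_len_b

    class NamespaceArgsDataset:
        # Namespace-based: callers must read the body to learn which fields args needs.
        def __init__(self, tgt_dataset, tgt_dict, backtranslation_model, args: argparse.Namespace):
            self.tgt_dataset = tgt_dataset
            self.tgt_dict = tgt_dict
            self.backtranslation_model = backtranslation_model
            self.unkpen = args.unkpen
            self.sampling = args.sampling
            self.beam = args.beam
            self.max_len_a = args.max_len_a
            self.max_len_b = args.max_len_b

With explicit parameters, a missing or misspelled argument fails at the call site; with a Namespace, it only surfaces later as an AttributeError when the field is accessed.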
-
- 01 Oct, 2018 1 commit
-
-
alexeib authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/296
Differential Revision: D10121830
Pulled By: alexeib
fbshipit-source-id: 1b73430bdfdcb20a9a6123abfca3472a0d307b3b
-
- 30 Sep, 2018 3 commits
-
-
Myle Ott authored
Summary: Changelog:
- `90f52a1`: Support loading subsets of the data on each worker with the `--fix-batches-to-gpus` flag. This should fix #217 and #266.
- `6eda0a9`: Update README for replicating the "Scaling Neural Machine Translation" paper
- `b14c7cf`: Fallback to no_c10d backend for pytorch 0.4.1 (fixes #294)
Pull Request resolved: https://github.com/pytorch/fairseq/pull/295
Differential Revision: D10121559
Pulled By: myleott
fbshipit-source-id: 41c84d0ee4cdd113544b5d3aa38ae8b23acc2c27
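The `--fix-batches-to-gpus` flag above is a training option. A minimal sketch of passing it to fairseq's train.py, driven from Python only to keep the example in one language; the data directory, architecture, and world size are placeholders, not values taken from this changelog:

    import subprocess

    # Hypothetical invocation; paths and hyperparameters below are placeholders.
    subprocess.run([
        "python", "train.py", "data-bin/wmt16_en_de",
        "--arch", "transformer_wmt_en_de",
        "--fix-batches-to-gpus",            # each worker keeps loading the same subset of batches
        "--distributed-world-size", "8",
    ], check=True)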
-
myleott authored
-
myleott authored
-
- 25 Sep, 2018 18 commits
-
-
Myle Ott authored
Co-authored-by: liezl200 <lie@fb.com>
-
Sergey Edunov authored
-
Myle Ott authored
-
alexeib authored
-
Alexei Baevski authored
-
Myle Ott authored
-
Myle Ott authored
-
Sergey Edunov authored
-
Sergey Edunov authored
-
Myle Ott authored
-
Myle Ott authored
-
Stephen Roller authored
-
Myle Ott authored
-
Myle Ott authored
-
Sergey Edunov authored
- no more FP16Trainer, we just have an FP16Optimizer wrapper
- most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
- Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0
- Trainer.train_step now takes a list of samples, which will allow cleaner --update-freq
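The dummy-batch behavior described above can be illustrated with a small stand-alone sketch (plain PyTorch, not the actual Trainer code; the model, criterion, and batch shapes are made up): when a worker has fewer real batches than its peers, it still runs a forward/backward pass so all workers stay in lockstep, but the loss is multiplied by 0 so the dummy batch contributes no gradient.

    import torch
    import torch.nn as nn

    model = nn.Linear(16, 4)                 # stand-in for a FairseqModel
    criterion = nn.CrossEntropyLoss()

    def train_step(batch, is_dummy=False):
        # Forward/backward on one batch; dummy batches still run both passes
        # but their loss is scaled by 0, so they add nothing to the gradient.
        logits = model(batch["inputs"])
        loss = criterion(logits, batch["targets"])
        if is_dummy:
            loss = loss * 0.0
        loss.backward()
        return loss.item()

    real = {"inputs": torch.randn(8, 16), "targets": torch.randint(0, 4, (8,))}
    dummy = {"inputs": torch.randn(8, 16), "targets": torch.randint(0, 4, (8,))}

    train_step(real)                          # normal contribution to the gradient
    train_step(dummy, is_dummy=True)          # runs fwd/bwd but accumulates zero gradient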
-
Myle Ott authored
-
Stephen Roller authored
-
Stephen Roller authored
-
- 24 Sep, 2018 2 commits
-
-
Sergey Edunov authored
Update readme with WMT'18 model (#433)
-
Sergey Edunov authored
-
- 18 Sep, 2018 4 commits
-
-
Sergey Edunov authored
Oss master
-
Sergey Edunov authored
-
Sergey Edunov authored
-
Sergey Edunov authored
-
- 07 Sep, 2018 1 commit
-
-
Angela Fan authored
-
- 04 Sep, 2018 1 commit
-
-
Myle Ott authored
-
- 03 Sep, 2018 8 commits