- 25 Sep, 2018 8 commits
  - Myle Ott authored
  - Stephen Roller authored
  - Myle Ott authored
  - Myle Ott authored
  - Sergey Edunov authored:
    - no more FP16Trainer; we just have an FP16Optimizer wrapper
    - most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
    - Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker; we hide the gradients from these dummy batches by multiplying the loss by 0 (see the sketch after this list)
    - Trainer.train_step now takes a list of samples, which will allow cleaner --update-freq handling
  - Myle Ott authored
  - Stephen Roller authored
  - Stephen Roller authored
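The dummy-batch handling described in the commit above can be sketched in a few lines of PyTorch. This is a hypothetical illustration, not fairseq's actual Trainer API: the `train_step` signature and the `net_input`/`target` keys are assumptions.

```python
def train_step(model, criterion, optimizer, sample, is_dummy=False):
    """One fwd/bwd/update step; `sample` is a hypothetical dict of tensors.

    When a worker runs out of real batches, it still calls this with a dummy
    batch (is_dummy=True) so the distributed gradient all-reduce does not hang,
    but the loss is multiplied by 0 so the dummy batch contributes no gradient.
    """
    output = model(**sample["net_input"])
    loss = criterion(output, sample["target"])
    if is_dummy:
        loss = loss * 0.0  # keeps the graph (and the all-reduce) alive, zeroes the grads
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()
```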
- 24 Sep, 2018 2 commits
  - Sergey Edunov authored: Update readme with WMT'18 model (#433)
  - Sergey Edunov authored
- 18 Sep, 2018 4 commits
  - Sergey Edunov authored: Oss master
  - Sergey Edunov authored
  - Sergey Edunov authored
  - Sergey Edunov authored
- 07 Sep, 2018 1 commit
  - Angela Fan authored
- 04 Sep, 2018 1 commit
  - Myle Ott authored
- 03 Sep, 2018 24 commits
  - Myle Ott authored
  - Myle Ott authored
  - Myle Ott authored
  - alexeib authored
  - Myle Ott authored
  - Myle Ott authored
  - alexeib authored
  - Myle Ott authored
  - Myle Ott authored
  - Li Zhao authored
  - Alexei Baevski authored: also don't crash if a param does not receive grads
  - Myle Ott authored
  - Myle Ott authored
  - Alexei Baevski authored
  - Sergey Edunov authored
  - Myle Ott authored
  - Alexei Baevski authored
  - Louis Martin authored
  - Myle Ott authored
  - Myle Ott authored
  - alexeib authored
  - Myle Ott authored
  - Myle Ott authored
  - Alexei Baevski authored