1. 25 Jan, 2019 1 commit
  2. 15 Jan, 2019 1 commit
  3. 07 Jan, 2019 1 commit
  4. 05 Jan, 2019 1 commit
  5. 25 Sep, 2018 1 commit
      Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0 · 1082ba35
      Sergey Edunov authored
      - no more FP16Trainer; we just have an FP16Optimizer wrapper (sketched below)
      - most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time (sketched below)
      - Trainer now requires an extra dummy_batch argument at initialization, which we do forward/backward on when there's an uneven number of batches per worker; we hide the gradients from these dummy batches by multiplying the loss by 0 (sketched below)
      - Trainer.train_step now takes a list of samples, which will allow for cleaner --update-freq handling (sketched below)
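      A minimal sketch of the FP16Optimizer wrapper idea, assuming hypothetical names, a fixed loss scale, and a generic inner optimizer (this is not the actual fairseq API): the model keeps FP16 parameters, an FP32 master copy receives the unscaled gradients, and the real update runs in FP32.

        class FP16OptimizerSketch:
            """Hypothetical FP16 optimizer wrapper: scale the loss before
            backward to avoid FP16 underflow, then unscale the gradients
            into FP32 master weights and step the wrapped optimizer."""

            def __init__(self, fp16_params, inner_optimizer_cls, loss_scale=128.0, **opt_kwargs):
                self.fp16_params = list(fp16_params)
                self.fp32_params = [p.detach().clone().float().requires_grad_()
                                    for p in self.fp16_params]
                self.optimizer = inner_optimizer_cls(self.fp32_params, **opt_kwargs)
                self.loss_scale = loss_scale

            def backward(self, loss):
                (loss * self.loss_scale).backward()

            def step(self):
                for p16, p32 in zip(self.fp16_params, self.fp32_params):
                    p32.grad = p16.grad.detach().float() / self.loss_scale
                self.optimizer.step()
                for p16, p32 in zip(self.fp16_params, self.fp32_params):
                    p16.data.copy_(p32.data)  # copy updated FP32 weights back to FP16

            def zero_grad(self):
                for p in self.fp16_params:
                    p.grad = None
                self.optimizer.zero_grad()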
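      A sketch of how one object can behave like DistributedDataParallel and a FairseqModel at the same time (the class name and the attribute-fallback rule are illustrative, not the actual DistributedFairseqModel implementation): attribute lookups that DDP does not resolve fall through to the wrapped module.

        from torch.nn.parallel import DistributedDataParallel

        class DistributedModelSketch(DistributedDataParallel):
            # Gradient synchronization comes from DistributedDataParallel;
            # anything it does not define (e.g. max_positions) is looked up
            # on the wrapped module, so the Trainer can treat this object
            # as if it were the underlying FairseqModel.
            def __getattr__(self, name):
                try:
                    return super().__getattr__(name)
                except AttributeError:
                    return getattr(super().__getattr__('module'), name)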
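      The dummy-batch trick, sketched with an assumed sample layout and criterion signature: every worker must run forward/backward on every step so the gradient all-reduce does not stall, so a worker with no real data left reuses the dummy batch and multiplies its loss by 0, zeroing its gradient contribution while keeping the communication pattern identical.

        def forward_backward(model, criterion, sample, dummy_batch):
            # Hypothetical helper; `sample is None` stands in for
            # "this worker has run out of real batches".
            is_dummy = sample is None
            batch = dummy_batch if is_dummy else sample
            loss = criterion(model(**batch['net_input']), batch['target'])
            if is_dummy:
                loss = loss * 0.0  # same graph, same all-reduce, zero gradients
            loss.backward()
            return 0.0 if is_dummy else loss.item()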
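      And a sketch of train_step taking a list of samples (argument names and the loss normalization are assumptions): gradient accumulation for --update-freq happens inside a single call, followed by one optimizer step.

        def train_step(model, criterion, optimizer, samples):
            # Hypothetical sketch: each entry in `samples` is one sub-batch;
            # gradients accumulate across all of them, then one update is applied.
            optimizer.zero_grad()
            for sample in samples:
                loss = criterion(model(**sample['net_input']), sample['target'])
                (loss / len(samples)).backward()
            optimizer.step()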
  6. 18 Sep, 2018 1 commit
  7. 04 Sep, 2018 1 commit
  8. 03 Sep, 2018 1 commit