1. 05 Feb, 2019 1 commit
  2. 25 Sep, 2018 1 commit
      Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0 · 1082ba35
      Sergey Edunov authored
      - no more FP16Trainer, we just have an FP16Optimizer wrapper
      - most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
      - Trainer now requires an extra dummy_batch argument at initialization, which we run fwd/bwd on whenever there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0 (see the sketch after this list)
      - Trainer.train_step now takes a list of samples, which allows cleaner handling of --update-freq (gradient accumulation)
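
      The dummy-batch trick above is easy to get wrong, so here is a minimal sketch of the idea in plain PyTorch. The function name, arguments, and sample keys are hypothetical simplifications of the description above; the real Trainer does considerably more bookkeeping.

      ```python
      def train_step(model, criterion, optimizer, samples, dummy_batch):
          """Accumulate gradients over a list of samples (hypothetical sketch).

          When this worker runs out of real batches, it substitutes dummy_batch
          so that every worker performs the same number of fwd/bwd passes
          (keeping distributed all-reduces in sync), and multiplies the loss
          by 0 so the dummy batch contributes no gradient.
          """
          optimizer.zero_grad()
          for sample in samples:
              is_dummy = sample is None  # this worker had no real batch left
              sample = dummy_batch if is_dummy else sample
              loss = criterion(model(sample["net_input"]), sample["target"])
              if is_dummy:
                  loss = loss * 0.0  # gradients from the dummy batch are hidden
              loss.backward()
          optimizer.step()
      ```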
  3. 15 Jun, 2018 1 commit
  4. 02 Mar, 2018 1 commit
  5. 27 Feb, 2018 1 commit
      fairseq-py goes distributed (#106) · 66415206
      Myle Ott authored
      This PR includes breaking API changes to modularize fairseq-py and adds support for distributed training across multiple nodes.
      
      Changes:
      - c7033ef: add support for distributed training! See updated README for usage.
      - e016299: modularize fairseq-py, adding support for register_model, register_criterion, register_optimizer, etc. (a registry sketch follows this list)
      - 154e440: update the LSTM implementation to use PackedSequence objects in the encoder, better following PyTorch best practices and improving performance (a sketch also follows this list)
      - 90c2973 and 1da6265: improve unit test coverage
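
      For readers unfamiliar with the registry pattern behind register_model and friends, here is a minimal sketch. It is a deliberate simplification (the real decorators also wire up architecture-specific command-line arguments), and the model name is made up.

      ```python
      MODEL_REGISTRY = {}

      def register_model(name):
          """Class decorator that registers a model class under `name`."""
          def register_model_cls(cls):
              if name in MODEL_REGISTRY:
                  raise ValueError(f"Cannot register duplicate model ({name})")
              MODEL_REGISTRY[name] = cls
              return cls
          return register_model_cls

      @register_model("toy_lstm")  # hypothetical model name
      class ToyLSTMModel:
          pass

      # Look up by name, e.g. from an --arch command-line flag:
      model_cls = MODEL_REGISTRY["toy_lstm"]
      ```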
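      And a minimal sketch of the PackedSequence idiom the LSTM encoder moved to, assuming a recent PyTorch (enforce_sorted requires >= 1.1). The module here is illustrative, not fairseq's actual encoder.

      ```python
      import torch
      import torch.nn as nn
      from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

      class ToyEncoder(nn.Module):
          def __init__(self, vocab_size, embed_dim=32, hidden_dim=64, pad_idx=0):
              super().__init__()
              self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
              self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

          def forward(self, tokens, lengths):
              x = self.embed(tokens)
              # Pack so the LSTM skips pad positions entirely instead of
              # computing (and later masking) hidden states for them.
              packed = pack_padded_sequence(x, lengths, batch_first=True,
                                            enforce_sorted=False)
              packed_out, (h, c) = self.lstm(packed)
              out, _ = pad_packed_sequence(packed_out, batch_first=True)
              return out, h

      enc = ToyEncoder(vocab_size=100)
      tokens = torch.tensor([[4, 5, 6], [7, 8, 0]])  # second row is right-padded
      out, h = enc(tokens, lengths=torch.tensor([3, 2]))
      ```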
  6. 22 Jan, 2018 1 commit
  7. 12 Nov, 2017 1 commit
      Version 0.1.0 -> 0.2.0 · 13a3c811
      Myle Ott authored
      Release notes:
      - 5c7f4954: Added simple LSTM model with input feeding and attention
      - 6e4b7e22: Refactored model definitions and incremental generation to be cleaner
      - 7ae79c12: Split interactive generation out of generate.py and into a new binary: interactive.py
      - 19a3865d: Subtle correctness fix in the beam search decoder. Previously, for a beam size of k, we might emit a
                 hypothesis if the <eos> was among the top 2*k candidates. Now we only emit hypotheses for which the
                 <eos> is among the top-k candidates (sketched after these notes). This may subtly change generation
                 results, and in the case of k=1 we will now produce strictly greedy outputs.
      - 97d7fcb9: Fixed a bug in the padding direction, where previously we right-padded the source and left-padded the
                 target. We now left-pad the source and right-pad the target (sketched after these notes). This should
                 not affect existing trained models, but may change (usually improve) the quality of new models.
      - f442f896: Add support for batching based on the number of sentences (`--max-sentences`) in addition to the
                 number of tokens (`--max-tokens`). When batching by the number of sentences, one can optionally
                 normalize the gradients by the number of sentences with `--sentence-avg` (the default is to normalize
                 by the number of tokens; see the normalization sketch after these notes).
      - c6d6256b: Add `--log-format` option and JSON logger
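
      A minimal sketch of the corrected <eos> check in 19a3865d, with hypothetical names; the real decoder tracks scores per beam and across time steps. Here `lprobs` is a 1-D tensor of candidate log-probabilities for a single decoding step.

      ```python
      import torch

      def finalize_step(lprobs, beam_size, eos_idx):
          """Consider 2*k candidates, but only finalize <eos> within the top-k."""
          scores, indices = lprobs.topk(2 * beam_size)
          finalized, continuations = [], []
          for rank in range(2 * beam_size):
              idx, score = indices[rank].item(), scores[rank].item()
              if idx == eos_idx:
                  if rank < beam_size:  # the fix: previously any rank < 2*k was emitted
                      finalized.append(score)
              else:
                  continuations.append((idx, score))
          return finalized, continuations[:beam_size]

      lprobs = torch.log_softmax(torch.randn(10), dim=-1)
      done, beams = finalize_step(lprobs, beam_size=2, eos_idx=3)
      ```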
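      The padding convention in 97d7fcb9 is easiest to see in a toy example (helper name hypothetical):

      ```python
      def pad_to(tokens, length, pad_idx=0, left=False):
          """Pad a token list to `length`, on the left or the right."""
          padding = [pad_idx] * (length - len(tokens))
          return padding + tokens if left else tokens + padding

      src = pad_to([4, 5, 6], 5, left=True)   # source: [0, 0, 4, 5, 6]
      tgt = pad_to([7, 8, 9], 5, left=False)  # target: [7, 8, 9, 0, 0]
      ```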
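      And the gradient-normalization choice behind `--sentence-avg`, reduced to its core (function and variable names are illustrative, not fairseq's API):

      ```python
      def normalized_loss(total_loss, ntokens, nsentences, sentence_avg=False):
          """Normalize the summed loss by tokens (default) or by sentences."""
          sample_size = nsentences if sentence_avg else ntokens
          return total_loss / sample_size

      print(normalized_loss(120.0, ntokens=60, nsentences=4))                     # 2.0 per token
      print(normalized_loss(120.0, ntokens=60, nsentences=4, sentence_avg=True))  # 30.0 per sentence
      ```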
  8. 24 Oct, 2017 1 commit
  9. 19 Oct, 2017 1 commit
  10. 15 Sep, 2017 1 commit