1. 30 Jun, 2019 1 commit
  2. 27 Jun, 2019 1 commit
    • 2/N bmuf · c246df42
      Nayan Singhal authored
      Summary:
      Added BMUF (Blockwise Model Update Filtering) implementation (a sketch follows the Todo list below).
      
      Todo:
      1) Add unit test cases for model averaging and BMUF
      2) Add a warmup phase before actually starting to train the model
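
      For context, a minimal sketch of the BMUF block update (after Chen & Huo, 2016); function and parameter names are illustrative, not the committed fairseq code:

      ```
      import torch.distributed as dist

      def bmuf_block_update(params, global_params, deltas,
                            block_momentum=0.875, block_lr=1.0):
          # hedged sketch of one BMUF sync, run every N local steps
          world_size = dist.get_world_size()
          for p, g, d in zip(params, global_params, deltas):
              # 1) average the locally trained parameters across workers
              dist.all_reduce(p.data)
              p.data.div_(world_size)
              # 2) block gradient: how far the average moved away from
              #    the previous global model
              grad = p.data - g
              # 3) filter the update with block momentum
              d.mul_(block_momentum).add_(grad, alpha=block_lr)
              # 4) new global model; Nesterov-style BMUF restarts each
              #    worker from global + momentum * delta
              g.add_(d)
              p.data.copy_(g + block_momentum * d)
      ```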
      
      Reviewed By: jay-mahadeokar
      
      Differential Revision: D15871477
      
      fbshipit-source-id: 866b0aba2d5bea5b65b4438acb49c886c4a87924
  3. 12 Jun, 2019 1 commit
    • Add Model Averaging · 6982c404
      Nayan Singhal authored
      Summary:
      Implemented model averaging for fairseq.
      Removed the DDP wrapper if a global optimizer is provided.
      Syncing all the models based on the iteration provided in the input (a sketch follows the TODO list below).
      
      TODO:
      1) Fix throughput and wps meter. Need to check other meters too.
      2) Replace Model average code with BMUF algorithm implementation.
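
      As a rough illustration of the periodic averaging step described above (a hedged sketch, not the committed code; names are illustrative):

      ```
      import torch.distributed as dist

      def average_model_params(model, num_updates, sync_every=50):
          # every `sync_every` updates, replace each parameter with its
          # mean across workers
          if num_updates % sync_every != 0:
              return
          world_size = dist.get_world_size()
          for p in model.parameters():
              dist.all_reduce(p.data)
              p.data.div_(world_size)
      ```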
      
      Reviewed By: myleott
      
      Differential Revision: D15711044
      
      fbshipit-source-id: 58a4af74db2a61d06762597b95836cbeb1ed82cc
  4. 11 Jun, 2019 1 commit
  5. 30 May, 2019 2 commits
  6. 23 May, 2019 1 commit
    • Allow unused params in distributed training · 72a5487c
      Kritika Singh authored
      Summary:
      Context from https://fb.workplace.com/groups/1405155842844877/permalink/2785095451517569/:
      
      I am adding a model to pyspeech (formerly fairspeq) with the following `forward`:
      ```
      def forward(self, src_tokens, src_lengths, prev_output_tokens, name):
          encoder_out = self.encoder(src_tokens, src_lengths)
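          # only one decoder runs per batch, so the other decoder's
          # parameters receive no gradient; this is what DDP complains
          # about below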
          if name == Dataset.d1:
              decoder_out = self.decoder1(prev_output_tokens, encoder_out)
          elif name == Dataset.d2:
              decoder_out = self.decoder2(encoder_out)
          return decoder_out
      ```
      When I run distributed training on this model, I get the following error:
      
      ```
      RuntimeError: Expected to have finished reduction in the prior iteration before starting a
      new one. This error indicates that your module has parameters that were not used in
      producing loss. You can enable unused parameter detection by (1) passing the keyword
      argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2)
      making sure all `forward` function outputs participate in calculating loss. If you already have
      done the above two steps, then the distributed data parallel module wasn't able to locate the
      output tensors in the return value of your module's `forward` function. Please include the loss
      function and the structure of the return value of `forward` of your module when reporting this
      issue (e.g. list, dict, iterable). (prepare_for_backward at
      caffe2/torch/csrc/distributed/c10d/reducer.cpp:410)
      ```
      
      The recommended fix is to pass `find_unused_parameters=True` to `DistributedDataParallel`'s initialization, as sketched below.
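
      A minimal sketch of that fix, assuming `model` is an nn.Module already on the right device and `local_rank` comes from the launcher:

      ```
      import torch

      model = torch.nn.parallel.DistributedDataParallel(
          model,
          device_ids=[local_rank],
          output_device=local_rank,
          find_unused_parameters=True,  # enables unused-parameter detection
      )
      ```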
      
      Reviewed By: myleott
      
      Differential Revision: D15439726
      
      fbshipit-source-id: 7fd80d4a3f49ac90182dec723b49b14e6689406a
  7. 20 May, 2019 1 commit
  8. 17 May, 2019 1 commit
  9. 08 May, 2019 1 commit
  10. 07 May, 2019 1 commit
    • Memory-Mapped IndexedDataset implementation (#589) · a1c997bd
      Davide Caroselli authored
      Summary:
      Following discussion in https://github.com/pytorch/fairseq/issues/574:
      
      - Implemented `MMapIndexedDataset` and `MMapIndexedDatasetBuilder`, compatible with `IndexedDataset`/`IndexedDatasetBuilder` (see the sketch after this list)
      - Updated scripts/read_binarized.py to support the new `MMapIndexedDataset`
      - Replaced the '--raw-text' and '--lazy-load' options with '--dataset-impl', and moved the option definition from custom task args to the more appropriate, higher-level options.add_dataset_args()
      - Also implemented utility functions in indexed_dataset: make_dataset() and dataset_exists()
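
      As a rough illustration of the memory-mapping idea in the first point (a hedged sketch, not the actual `MMapIndexedDataset`):

      ```
      import numpy as np
      import torch

      class TinyMMapDataset(torch.utils.data.Dataset):
          # token ids live in one flat binary file that is mmap'ed once;
          # an offsets array locates each item (names are illustrative)
          def __init__(self, data_path, offsets):
              self._data = np.memmap(data_path, dtype=np.int64, mode="r")
              self._offsets = offsets  # offsets[i]..offsets[i+1] bound item i

          def __len__(self):
              return len(self._offsets) - 1

          def __getitem__(self, i):
              # slicing a memmap touches only the pages this item spans
              a = np.array(self._data[self._offsets[i]:self._offsets[i + 1]])
              return torch.from_numpy(a)
      ```
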
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/589
      
      Differential Revision: D14597128
      
      Pulled By: myleott
      
      fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e
  11. 05 May, 2019 1 commit
  12. 04 May, 2019 1 commit
  13. 30 Apr, 2019 1 commit
  14. 29 Apr, 2019 1 commit
  15. 15 Mar, 2019 1 commit
    • 0.6.1 -> 0.6.2 (#577) · e6422528
      Myle Ott authored
      Summary:
      Changelog:
      - 998ba4f: Add language models from Baevski & Auli (2018)
      - 4294c4f6: Add mixture of experts code from Shen et al. (2019)
      - 00493490: Add example for multilingual training
      - 48d9afbe: Speed improvements, including fused operators from apex
      - 44d27e64: Add Tensorboard support
      - d17fa851: Add Adadelta optimizer
      - 9e1c880f: Add `FairseqEncoderModel`
      - b65c579b: Add `FairseqTask.inference_step` to modularize generate.py
      - 2ad1178e: Add back `--curriculum`
      - Misc bug fixes and other features
      
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/577
      
      Differential Revision: D14481233
      
      Pulled By: myleott
      
      fbshipit-source-id: 4ff8625ef1c0b24273fc65df7c5658e3c932e8b7
  16. 12 Mar, 2019 1 commit
    • Handle 3+ dimensional input in sequence_generator + nits · 860010e9
      Dmytro Okhonko authored
      Summary: sequence_generator assumes that the model input is a 2d tensor of longs, but it can be something like a 3d tensor of floats, and we should be able to handle this as long as the first dimension is the batch size, followed by the source lengths.
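
      A sketch of the shape handling this implies (variable names are illustrative):

      ```
      # take batch size and source length from the first two dimensions,
      # so a float input of shape (bsz, src_len, feat_dim) is handled
      # like a long tensor of shape (bsz, src_len)
      src_tokens = sample["net_input"]["src_tokens"]
      bsz, src_len = src_tokens.size()[:2]
      ```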
      
      Reviewed By: myleott
      
      Differential Revision: D14420044
      
      fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015
  17. 04 Mar, 2019 1 commit
  18. 26 Feb, 2019 2 commits
  19. 01 Feb, 2019 1 commit
    • Support custom Dictionary implementations in 'preprocess.py' (#448) · bbb4120b
      Davide Caroselli authored
      Summary:
      The `preprocess.py` script has been refactored in order to:
      
      1. Use the `options` module for command-line argument parsing. This gives `preprocess.py` the ability to load custom modules with the `--user-dir` flag (already implemented in all other binaries).
      2. Move dictionary loading and building code into the Task implementation. This allows custom Dictionary classes to be used during the data generation step (see the sketch below).
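
      As an illustration of what the second point enables, a hedged sketch of a task substituting its own dictionary class (the task name and `MyDictionary` are hypothetical):

      ```
      from fairseq.data import Dictionary
      from fairseq.tasks import FairseqTask, register_task

      class MyDictionary(Dictionary):
          # hypothetical custom dictionary, e.g. with extra special symbols
          pass

      @register_task("my_custom_task")  # loaded via --user-dir
      class MyCustomTask(FairseqTask):
          @classmethod
          def load_dictionary(cls, filename):
              # preprocess.py now delegates dictionary handling to the
              # task, so this override is honored during binarization too
              return MyDictionary.load(filename)
      ```
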
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/448
      
      Differential Revision: D13674819
      
      Pulled By: myleott
      
      fbshipit-source-id: b40648a98ed6c08284577e5ec25876e018d8c822
  20. 30 Jan, 2019 2 commits
  21. 25 Jan, 2019 1 commit
  22. 16 Jan, 2019 1 commit
    • FIX: '--user-dir' on multi-gpu (#449) · 7853818c
      Davide Caroselli authored
      Summary:
      On a multi-gpu training scenario, the `train.py` script spawns new processes with `torch.multiprocessing.spawn`. Unfortunately those child processes don't inherit the modules imported with `--user-dir`.
      
      This pull request fixes the problem: custom module import is now explicit in every `main()` function, as sketched below.
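
      A minimal sketch of the pattern (the helper shown is hypothetical; fairseq's actual loader differs):

      ```
      import importlib.util
      import os

      def import_user_module(user_dir):
          # explicitly import the --user-dir package so custom tasks and
          # models register themselves in the current process
          path = os.path.join(user_dir, "__init__.py")
          spec = importlib.util.spec_from_file_location("user_module", path)
          module = importlib.util.module_from_spec(spec)
          spec.loader.exec_module(module)

      def main(args):
          # runs both in the parent and in every child spawned by
          # torch.multiprocessing.spawn, so the import happens everywhere
          if getattr(args, "user_dir", None):
              import_user_module(args.user_dir)
          ...
      ```
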
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/449
      
      Differential Revision: D13676922
      
      Pulled By: myleott
      
      fbshipit-source-id: 520358d66155697885b878a37e7d0484bddbc1c6
  23. 15 Jan, 2019 1 commit
  24. 14 Jan, 2019 1 commit
  25. 05 Jan, 2019 1 commit
  26. 26 Dec, 2018 1 commit
    • Merge internal changes (#422) · 8ce6499d
      Myle Ott authored
      Summary:
      - 04cc608: Add `--match-source-len` option to generate.py for sequence-tagging tasks
      - 19f1a40: Add `--no-repeat-ngram-size` option to generate.py for ngram blocking (see the sketch below)
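
      As a rough illustration of what ngram blocking does (a hedged sketch, not fairseq's implementation):

      ```
      def banned_next_tokens(prefix, n):
          # given the tokens generated so far, return the tokens that
          # would complete an n-gram already present in `prefix`, which
          # is what --no-repeat-ngram-size=n forbids
          if n <= 0 or len(prefix) < n:
              return set()
          last = tuple(prefix[len(prefix) - n + 1:])  # current (n-1)-gram
          banned = set()
          for i in range(len(prefix) - n + 1):
              if tuple(prefix[i:i + n - 1]) == last:
                  banned.add(prefix[i + n - 1])
          return banned

      # e.g. banned_next_tokens([1, 2, 3, 1, 2], 3) == {3}
      ```
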
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/422
      
      Differential Revision: D13548445
      
      Pulled By: myleott
      
      fbshipit-source-id: 26d1ae83993e428fcb020dac5ae358b0e36233d9
  27. 07 Dec, 2018 1 commit
  28. 06 Dec, 2018 1 commit
  29. 18 Nov, 2018 1 commit
  30. 07 Nov, 2018 1 commit
  31. 30 Sep, 2018 1 commit
    • Merge internal changes (#295) · b87c5366
      Myle Ott authored
      Summary:
      Changelog:
      - `90f52a1`: Support loading subsets of the data on each worker with the `--fix-batches-to-gpus` flag. This should fix #217 and #266 (see the sketch after this list).
      - `6eda0a9`: Update README for replicating the "Scaling Neural Machine Translation" paper
      - `b14c7cf`: Fallback to no_c10d backend for pytorch 0.4.1 (fixes #294)
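
      A hedged sketch of the idea behind `--fix-batches-to-gpus` (illustrative, not fairseq's code): each worker keeps the same deterministic shard every epoch, so per-worker data caching stays valid:

      ```
      import random

      def shard_for_epoch(batches, rank, world_size, epoch, fix_batches_to_gpus):
          rng = random.Random(epoch)
          if fix_batches_to_gpus:
              mine = batches[rank::world_size]  # same subset every epoch
              rng.shuffle(mine)                 # shuffle only within it
              return mine
          shuffled = list(batches)
          rng.shuffle(shuffled)                 # global reshuffle first
          return shuffled[rank::world_size]
      ```
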
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/295
      
      Differential Revision: D10121559
      
      Pulled By: myleott
      
      fbshipit-source-id: 41c84d0ee4cdd113544b5d3aa38ae8b23acc2c27
  32. 25 Sep, 2018 4 commits
  33. 03 Sep, 2018 2 commits