1. 24 Jan, 2019 5 commits
  2. 17 Jan, 2019 2 commits
  3. 16 Jan, 2019 3 commits
• FIX: '--user-dir' on multi-gpu (#449) · 7853818c
      Davide Caroselli authored
      Summary:
      On a multi-gpu training scenario, the `train.py` script spawns new processes with `torch.multiprocessing.spawn`. Unfortunately those child processes don't inherit the modules imported with `--user-dir`.
      
This pull request fixes the problem: the custom module import is now explicit in every `main()` function (a sketch of the pattern follows this entry).
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/449
      
      Differential Revision: D13676922
      
      Pulled By: myleott
      
      fbshipit-source-id: 520358d66155697885b878a37e7d0484bddbc1c6
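A minimal sketch of the pattern this fix describes, assuming a hypothetical `import_user_module` helper (the real fairseq entry points differ in detail): because `torch.multiprocessing.spawn` starts fresh interpreter processes, anything imported only in the parent is absent in the children, so the import is repeated inside `main()` itself.

```python
# Sketch only (not fairseq's actual train.py): make the --user-dir import
# explicit inside main(), so every process spawned by
# torch.multiprocessing.spawn performs it as well.
import argparse
import importlib
import os
import sys

import torch.multiprocessing as mp


def import_user_module(user_dir):
    # Assumed helper: put the parent of --user-dir on sys.path and import it.
    if user_dir:
        user_dir = os.path.abspath(user_dir)
        sys.path.insert(0, os.path.dirname(user_dir))
        importlib.import_module(os.path.basename(user_dir))


def main(args):
    # Explicit import here means it also runs inside every spawned child.
    import_user_module(args.user_dir)
    # ... build the task/model and run training ...


def distributed_main(rank, args):
    args.device_id = rank
    main(args)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--user-dir", default=None)
    parser.add_argument("--world-size", type=int, default=1)
    args = parser.parse_args()
    if args.world_size > 1:
        mp.spawn(distributed_main, args=(args,), nprocs=args.world_size)
    else:
        main(args)
```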
• Add --checkpoint-upper-bound to average_checkpoints.py (#452) · bdec179b
      Myle Ott authored
      Summary:
      This is useful for averaging the last N checkpoints, ending at some "best" checkpoint.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/452
      
      Differential Revision: D13695407
      
      Pulled By: myleott
      
      fbshipit-source-id: 5d9d2bff3706834f01501e9259834c77fb335817
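For illustration, a hedged sketch of what averaging with an upper bound amounts to; the `checkpoint<N>.pt` naming scheme, the `'model'` key, and the helper names are assumptions rather than the script's actual code.

```python
# Illustrative sketch of checkpoint averaging with an upper bound.
import collections
import re

import torch


def last_n_checkpoints(paths, n, upper_bound=None):
    """Pick the last n epoch checkpoints, optionally ending at `upper_bound`."""
    numbered = sorted(
        (int(re.search(r"checkpoint(\d+)\.pt$", p).group(1)), p) for p in paths
    )
    if upper_bound is not None:
        numbered = [(i, p) for i, p in numbered if i <= upper_bound]
    return [p for _, p in numbered[-n:]]


def average_checkpoints(paths):
    """Element-wise average of the 'model' state dicts of the given checkpoints."""
    avg = collections.OrderedDict()
    for path in paths:
        state = torch.load(path, map_location="cpu")["model"]
        for k, v in state.items():
            avg[k] = avg.get(k, 0) + v.float() / len(paths)
    return avg
```

With `n=5` and `upper_bound=20`, for example, this averages checkpoints 16 through 20, i.e. the last five checkpoints ending at the one that performed best on the validation set.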
• optimizations for token_block_dataset · d1dc66d9
      Ruty Rinott authored
      Summary:
Optimize memory use of token_block_dataset by replacing Python data structures with NumPy arrays (sketched after this entry), applying the needed parts from D13498973 instead of rebasing it on these changes.
      
      Reviewed By: edunov
      
      Differential Revision: D13678485
      
      fbshipit-source-id: c0c827a8b95834a6a5456476040ebdc8e42136d4
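A toy sketch of the kind of replacement involved, not the actual TokenBlockDataset change: block boundaries held in one compact NumPy array rather than a Python list of per-block tuples.

```python
import numpy as np


def build_slice_indices(sizes, block_size):
    """Split a corpus with the given per-item sizes into fixed-size token blocks.

    Returns an int64 array of shape (num_blocks, 2) holding [start, end) token
    offsets, instead of a Python list of (start, end) tuples.
    """
    total = int(np.sum(sizes))
    starts = np.arange(0, total, block_size, dtype=np.int64)
    ends = np.minimum(starts + block_size, total)
    return np.stack([starts, ends], axis=1)
```

A Python list of millions of small tuples pays per-object overhead on every element; the array costs a flat 16 bytes per block.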
  4. 15 Jan, 2019 2 commits
  5. 14 Jan, 2019 2 commits
  6. 10 Jan, 2019 1 commit
  7. 09 Jan, 2019 2 commits
  8. 07 Jan, 2019 1 commit
  9. 05 Jan, 2019 3 commits
  10. 28 Dec, 2018 3 commits
  11. 26 Dec, 2018 2 commits
  12. 24 Dec, 2018 2 commits
• Improve memory efficiency of FP16 optimization (#404) · 03a57dec
      Myle Ott authored
      Summary:
      Previously when training with --fp16, we stored a copy of the model parameters in FP32 for optimization, which consumed a lot of memory. An alternative is to just do the conversions to FP32 on the fly, which allows the caching allocator to reuse/save some memory.
      
      This reduces peak memory usage by ~20% with a negligible reduction in training speed (~2% slower) when training a big transformer on 8 GPUs on wmt en-de with --update-freq=16.
      
      This does not affect convergence, i.e., models will train exactly as they did before.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/404
      
      Differential Revision: D13394376
      
      Pulled By: myleott
      
      fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf
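A simplified sketch of the idea, assuming an Adam-style update; this is not fairseq's actual implementation. The FP32 copies are created per parameter inside the step and written back to the FP16 weights, so no persistent FP32 copy of the full model is kept and the caching allocator can reuse the temporaries.

```python
import torch


@torch.no_grad()
def adam_step_fp16(params, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One Adam-style update over FP16 params, doing the math in FP32 temporaries."""
    beta1, beta2 = betas
    for p in params:
        if p.grad is None:
            continue
        # On-the-fly FP32 copies of the parameter and its gradient.
        p32 = p.detach().float()
        g32 = p.grad.detach().float()

        st = state.setdefault(p, {
            "step": 0,
            "exp_avg": torch.zeros_like(p32),
            "exp_avg_sq": torch.zeros_like(p32),
        })
        st["step"] += 1
        st["exp_avg"].mul_(beta1).add_(g32, alpha=1 - beta1)
        st["exp_avg_sq"].mul_(beta2).addcmul_(g32, g32, value=1 - beta2)

        step_size = lr * (1 - beta2 ** st["step"]) ** 0.5 / (1 - beta1 ** st["step"])
        denom = st["exp_avg_sq"].sqrt().add_(eps)
        p32.addcdiv_(st["exp_avg"], denom, value=-step_size)

        p.copy_(p32)  # cast the FP32 result back into the FP16 parameter
```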
• Add BufferedIterator (#419) · 0f833526
      Myle Ott authored
      Summary:
      This improves performance for datasets that load data lazily. Enabled by default since it shouldn't compromise performance for non-lazy datasets.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/419
      
      Differential Revision: D13546585
      
      Pulled By: myleott
      
      fbshipit-source-id: f6152e2047291b0d68cd7506cd772b0caafe95be
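A minimal sketch of such a buffered iterator, assuming a plain background thread and a bounded queue; the real class may differ.

```python
import queue
import threading


class BufferedIterator:
    """Pre-fetch items from a (possibly lazy, slow) iterable in a background thread."""

    _END = object()  # sentinel marking exhaustion of the source iterable

    def __init__(self, iterable, buffer_size=10):
        self._queue = queue.Queue(maxsize=buffer_size)
        self._thread = threading.Thread(
            target=self._fill, args=(iter(iterable),), daemon=True
        )
        self._thread.start()

    def _fill(self, it):
        for item in it:
            self._queue.put(item)  # blocks when the buffer is full
        self._queue.put(self._END)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()
        if item is self._END:
            raise StopIteration
        return item
```

Wrapping a lazy loader as `for batch in BufferedIterator(loader)` overlaps its I/O with the consumer's compute, while the bounded queue caps how far ahead it reads.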
  13. 18 Dec, 2018 1 commit
• data per gpu change · 9ca82a0e
      Haoran Li authored
Summary: Avoid loading the entire dataset on every GPU, to reduce the per-GPU memory footprint (see the sketch after this entry).
      
      Reviewed By: rutyrinott
      
      Differential Revision: D13163548
      
      fbshipit-source-id: 4ba717c8021ba5723d02225bae5782e2c3a18640
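A hedged sketch of the general idea only (the names are illustrative, not the actual diff): each rank materializes just the examples it owns, so per-GPU memory scales roughly with 1/world_size instead of with the full dataset.

```python
import numpy as np


def shard_indices(num_examples, rank, world_size):
    """Example indices owned by this rank (simple round-robin sharding)."""
    return np.arange(rank, num_examples, world_size)


def load_shard(read_example, num_examples, rank, world_size):
    """Load only this rank's shard instead of the whole dataset."""
    return [read_example(i) for i in shard_indices(num_examples, rank, world_size)]
```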
  14. 11 Dec, 2018 1 commit
  15. 08 Dec, 2018 1 commit
  16. 07 Dec, 2018 2 commits
• Add --fp16-scale-tolerance (#397) · 03ef3ab8
      Myle Ott authored
      Summary:
      Let's only decrease the loss scale if a large enough percentage of batches overflow.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/397
      
      Differential Revision: D13355159
      
      Pulled By: myleott
      
      fbshipit-source-id: e17dde73d34a639519b4348c013fdd19d2b314e6
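A simplified loss-scaler sketch of the behaviour this flag names (not the exact fairseq code): with a tolerance of 0.0 every overflow halves the scale, as before; with, say, 0.25 the scale only drops once at least 25% of the batches since the last rescale have overflowed.

```python
class DynamicLossScaler:
    def __init__(self, init_scale=2.0 ** 7, scale_factor=2.0, scale_window=2000,
                 tolerance=0.0):
        self.loss_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.tolerance = tolerance  # --fp16-scale-tolerance
        self._iter = 0
        self._last_overflow_iter = -1
        self._last_rescale_iter = -1
        self._overflows_since_rescale = 0

    def update_scale(self, overflow):
        iters_since_rescale = self._iter - self._last_rescale_iter
        if overflow:
            self._last_overflow_iter = self._iter
            self._overflows_since_rescale += 1
            pct_overflow = self._overflows_since_rescale / float(iters_since_rescale)
            if pct_overflow >= self.tolerance:
                # Enough recent batches overflowed: back off the loss scale.
                self.loss_scale /= self.scale_factor
                self._last_rescale_iter = self._iter
                self._overflows_since_rescale = 0
        elif (self._iter - self._last_overflow_iter) % self.scale_window == 0:
            # A long stretch without overflow: try a larger scale again.
            self.loss_scale *= self.scale_factor
            self._last_rescale_iter = self._iter
        self._iter += 1
```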
• Take a dummy train step under OOM to keep multiprocessing in sync · 6c006a34
      Halil Akin authored
Summary: This is not a guaranteed solution (processes may still get out of sync if an OOM happens after an all_gather/all_reduce has already been issued), but it should still make multiprocessing training more robust in practice, since we usually seem to OOM early enough.
      
      Reviewed By: myleott
      
      Differential Revision: D13086018
      
      fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
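A hedged sketch of the pattern, not the actual trainer code: on a CUDA OOM the worker drops its partial gradients, frees cached memory, and runs a forward/backward on a small pre-built dummy batch with the loss zeroed out, so the gradient all-reduce fires the same number of times on every process.

```python
import torch


def train_step(model, criterion, optimizer, batch, dummy_batch):
    def forward_backward(sample, zero_loss=False):
        loss = criterion(model(sample["input"]), sample["target"])
        if zero_loss:
            loss = loss * 0.0  # keeps gradient shapes (and the all-reduce) intact
        loss.backward()
        return loss

    try:
        loss = forward_backward(batch)
    except RuntimeError as e:
        if "out of memory" not in str(e):
            raise
        print("WARNING: ran out of memory, taking a dummy train step")
        optimizer.zero_grad()      # discard any partially computed gradients
        torch.cuda.empty_cache()   # release cached blocks before retrying
        loss = forward_backward(dummy_batch, zero_loss=True)
    optimizer.step()
    optimizer.zero_grad()
    return loss
```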
  17. 06 Dec, 2018 4 commits
  18. 04 Dec, 2018 1 commit
  19. 30 Nov, 2018 1 commit
  20. 29 Nov, 2018 1 commit
• fixes on bi-transformer onnx · 7bbe528d
      Haoran Li authored
Summary: Replace the dynamic index_put with copying into a newly created tensor (see the sketch after this entry).
      
      Reviewed By: wanchaol
      
      Differential Revision: D13244573
      
      fbshipit-source-id: 909f7913ad579ed035f29bb52321ff01e09a2c60
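A minimal sketch of the substitution described; the function names are made up for illustration. The in-place `x[idx] = value` lowers to a dynamic index_put, which is awkward to export, and is replaced by slicing and concatenating into a fresh tensor.

```python
import torch


def set_row_inplace(x, idx, value):
    x[idx] = value  # dynamic index_put: hard to export/trace cleanly
    return x


def set_row_by_copy(x, idx, value):
    # Same result, expressed as slicing + concatenation into a new tensor.
    return torch.cat([x[:idx], value.unsqueeze(0), x[idx + 1:]], dim=0)
```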