1. 30 Jan, 2019 1 commit
    • Merge internal changes (#483) · 42be3ebd
      Myle Ott authored
      Summary:
      Changelog:
      - `4889802`: can now detokenize sentencepiece output with `--remove-bpe=sentencepiece` (fixes #331). Also added `--sacrebleu` for computing detokenized BLEU.
      - `0d76427`: fix assertion error when training language model with dataset containing empty sentences
      - minor bug and style fixes
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/483
      
      Differential Revision: D13867899
      
      Pulled By: myleott
      
      fbshipit-source-id: 25c940b847fe270262ac8f5ac838407b3977fdda
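      For intuition, a minimal sketch of what `--remove-bpe=sentencepiece` amounts to (illustrative only, not fairseq's actual code): sentencepiece marks word-initial pieces with U+2581, so detokenization is just joining the pieces and restoring spaces.

      ```python
      # Minimal sketch of sentencepiece detokenization (illustrative only).
      # Sentencepiece prefixes word-initial pieces with U+2581.
      def remove_sentencepiece_bpe(tokens):
          # Join pieces, turn each boundary marker back into a space,
          # and trim the space introduced by the first word's marker.
          return "".join(tokens).replace("\u2581", " ").strip()

      print(remove_sentencepiece_bpe(["\u2581Hello", "\u2581wor", "ld", "!"]))
      # -> Hello world!
      ```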
  2. 29 Jan, 2019 1 commit
  3. 25 Jan, 2019 4 commits
  4. 24 Jan, 2019 6 commits
  5. 17 Jan, 2019 2 commits
  6. 16 Jan, 2019 3 commits
    • FIX: '--user-dir' on multi-gpu (#449) · 7853818c
      Davide Caroselli authored
      Summary:
      On a multi-gpu training scenario, the `train.py` script spawns new processes with `torch.multiprocessing.spawn`. Unfortunately those child processes don't inherit the modules imported with `--user-dir`.
      
      This pull request fixes the problem: the custom module import is now explicit in every `main()` function.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/449
      
      Differential Revision: D13676922
      
      Pulled By: myleott
      
      fbshipit-source-id: 520358d66155697885b878a37e7d0484bddbc1c6
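      A hedged sketch of the failure mode and the fix (names here are illustrative, not fairseq's exact API): under the `spawn` start method, child processes begin with fresh interpreter state, so a module imported dynamically in the parent must be re-imported inside each child's `main()`.

      ```python
      # Illustrative sketch of the --user-dir pitfall (not fairseq's exact API).
      import importlib
      import os
      import sys

      import torch.multiprocessing as mp

      def import_user_module(user_dir):
          # Make the plugin directory importable and import it as a package,
          # which is roughly what --user-dir does.
          module_path = os.path.abspath(user_dir)
          parent, name = os.path.split(module_path)
          if name not in sys.modules:
              sys.path.insert(0, parent)
              importlib.import_module(name)

      def main(rank, user_dir):
          # Spawned children start with fresh interpreter state, so the
          # import must be repeated explicitly in every main().
          import_user_module(user_dir)
          # ... set up the model/trainer for this rank ...

      if __name__ == "__main__":
          mp.spawn(main, args=("./my_plugins",), nprocs=2)
      ```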
    • Add --checkpoint-upper-bound to average_checkpoints.py (#452) · bdec179b
      Myle Ott authored
      Summary:
      This is useful for averaging the last N checkpoints, ending at some "best" checkpoint.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/452
      
      Differential Revision: D13695407
      
      Pulled By: myleott
      
      fbshipit-source-id: 5d9d2bff3706834f01501e9259834c77fb335817
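      For intuition, a minimal sketch of checkpoint averaging with an upper bound (illustrative only; it assumes each checkpoint stores its weights under a `model` key, as fairseq's do):

      ```python
      # Illustrative sketch of averaging the last N checkpoints ending at
      # an upper bound; assumes weights live under the "model" key.
      import torch

      def average_checkpoints(paths):
          avg = None
          for path in paths:
              state = torch.load(path, map_location="cpu")["model"]
              if avg is None:
                  avg = {k: v.clone().float() for k, v in state.items()}
              else:
                  for k, v in state.items():
                      avg[k] += v.float()
          return {k: v / len(paths) for k, v in avg.items()}

      # Average the 5 epoch checkpoints ending at the "best" epoch 30,
      # roughly the effect of --checkpoint-upper-bound 30:
      paths = [f"checkpoints/checkpoint{e}.pt" for e in range(26, 31)]
      averaged = average_checkpoints(paths)
      ```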
    • optimizations for token_block_dataset · d1dc66d9
      Ruty Rinott authored
      Summary:
      Optimizes memory use of token_block_dataset by replacing Python data structures with numpy arrays. Applies the needed parts of D13498973 instead of rebasing it onto these changes.
      
      Reviewed By: edunov
      
      Differential Revision: D13678485
      
      fbshipit-source-id: c0c827a8b95834a6a5456476040ebdc8e42136d4
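      A hedged sketch of the kind of change involved (illustrative, not the actual diff): storing block boundaries in a single numpy array instead of a list of Python tuples removes per-object overhead, which matters when there are millions of blocks.

      ```python
      # Illustrative sketch: block boundaries as one numpy array instead
      # of a list of Python tuples.
      import numpy as np

      sizes = np.array([7, 3, 12, 5])   # tokens per sentence
      block_size = 8
      total = int(sizes.sum())

      boundaries = []
      start = 0
      while start < total:
          end = min(start + block_size, total)
          boundaries.append((start, end))
          start = end

      # One contiguous int64 array of shape (num_blocks, 2): 16 bytes per
      # block, versus a tuple of two boxed Python ints (~100+ bytes).
      slice_indices = np.array(boundaries, dtype=np.int64)
      ```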
  7. 15 Jan, 2019 2 commits
  8. 14 Jan, 2019 2 commits
  9. 10 Jan, 2019 1 commit
  10. 09 Jan, 2019 2 commits
  11. 07 Jan, 2019 1 commit
  12. 05 Jan, 2019 3 commits
  13. 28 Dec, 2018 3 commits
  14. 26 Dec, 2018 2 commits
  15. 24 Dec, 2018 2 commits
    • Improve memory efficiency of FP16 optimization (#404) · 03a57dec
      Myle Ott authored
      Summary:
      Previously, when training with `--fp16`, we stored a copy of the model parameters in FP32 for optimization, which consumed a lot of memory. An alternative is to do the conversions to FP32 on the fly, which allows the caching allocator to reuse/save some memory.
      
      This reduces peak memory usage by ~20% with a negligible reduction in training speed (~2% slower) when training a big transformer on 8 GPUs on WMT En-De with `--update-freq=16`.
      
      This does not affect convergence, i.e., models will train exactly as they did before.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/404
      
      Differential Revision: D13394376
      
      Pulled By: myleott
      
      fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf
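      A simplified sketch of the idea, using plain SGD for brevity (illustrative; fairseq's actual optimizer wraps Adam and loss scaling): the FP32 copies live only for the duration of each parameter's update, so the caching allocator can reuse their memory.

      ```python
      # Simplified sketch using plain SGD (not fairseq's actual code).
      import torch

      def fp16_sgd_step(params, lr=0.1):
          """One update over FP16 parameters without a persistent FP32 copy."""
          for p in params:
              # Transient FP32 views of the parameter and its gradient; the
              # caching allocator can reuse this memory across parameters,
              # unlike a persistent FP32 master copy of the whole model.
              p32 = p.data.float()
              g32 = p.grad.data.float()
              p32.add_(g32, alpha=-lr)   # do the update in FP32 for accuracy
              p.data.copy_(p32)          # cast the result back to FP16 in place
      ```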
    • Add BufferedIterator (#419) · 0f833526
      Myle Ott authored
      Summary:
      This improves performance for datasets that load data lazily. Enabled by default since it shouldn't compromise performance for non-lazy datasets.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/419
      
      Differential Revision: D13546585
      
      Pulled By: myleott
      
      fbshipit-source-id: f6152e2047291b0d68cd7506cd772b0caafe95be
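      A hedged sketch of what a buffered iterator can look like (illustrative, not fairseq's exact implementation): a background thread prefetches items from a slow, lazy iterable into a bounded queue, so the consumer rarely blocks on I/O.

      ```python
      # Illustrative sketch of a buffered (prefetching) iterator.
      import queue
      import threading

      class BufferedIterator:
          _END = object()  # sentinel marking iterator exhaustion

          def __init__(self, iterable, buffer_size=8):
              self._queue = queue.Queue(maxsize=buffer_size)
              self._thread = threading.Thread(
                  target=self._fill, args=(iter(iterable),), daemon=True
              )
              self._thread.start()

          def _fill(self, it):
              # Runs in the background: eagerly pull from the (possibly
              # lazy/slow) source into the bounded queue.
              for item in it:
                  self._queue.put(item)
              self._queue.put(self._END)

          def __iter__(self):
              return self

          def __next__(self):
              item = self._queue.get()
              if item is self._END:
                  raise StopIteration
              return item
      ```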
  16. 18 Dec, 2018 1 commit
    • data per gpu change · 9ca82a0e
      Haoran Li authored
      Summary: Avoid loading the entire dataset on every GPU, to reduce the memory footprint.
      
      Reviewed By: rutyrinott
      
      Differential Revision: D13163548
      
      fbshipit-source-id: 4ba717c8021ba5723d02225bae5782e2c3a18640
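      A hedged sketch of the general idea (illustrative; the actual change is in the dataset loading code): each rank keeps only the examples it will consume, rather than materializing the full dataset in every process.

      ```python
      def shard_indices(num_examples, rank, world_size):
          # Each rank keeps a strided shard of indices rather than a
          # full in-memory copy of the dataset.
          return list(range(rank, num_examples, world_size))

      print(shard_indices(12, rank=1, world_size=4))  # [1, 5, 9]
      ```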
  17. 11 Dec, 2018 1 commit
  18. 08 Dec, 2018 1 commit
  19. 07 Dec, 2018 2 commits
    • Add --fp16-scale-tolerance (#397) · 03ef3ab8
      Myle Ott authored
      Summary:
      Let's only decrease the loss scale if a large enough percentage of batches overflow.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/397
      
      Differential Revision: D13355159
      
      Pulled By: myleott
      
      fbshipit-source-id: e17dde73d34a639519b4348c013fdd19d2b314e6
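      A minimal sketch of the policy (assumed names, not the actual fairseq loss scaler): track the fraction of batches that overflowed since the last rescale, and halve the scale only when that fraction exceeds the tolerance. The usual scale-increase logic after a stretch of clean batches is omitted here.

      ```python
      class LossScaler:
          """Sketch: halve the FP16 loss scale only when the fraction of
          overflowing batches since the last rescale exceeds a tolerance."""

          def __init__(self, scale=128.0, tolerance=0.05):
              self.scale = scale
              self.tolerance = tolerance
              self.batches_since_rescale = 0
              self.overflows_since_rescale = 0

          def update(self, overflowed):
              self.batches_since_rescale += 1
              if overflowed:
                  self.overflows_since_rescale += 1
                  pct = self.overflows_since_rescale / self.batches_since_rescale
                  # A lone overflow after a long run of good batches stays
                  # below the tolerance and no longer halves the scale.
                  if pct > self.tolerance:
                      self.scale /= 2.0
                      self.batches_since_rescale = 0
                      self.overflows_since_rescale = 0
      ```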
    • Take a dummy train step under OOM to keep multiprocessing in sync · 6c006a34
      Halil Akin authored
      Summary: This is not a guaranteed solution (processes may still get out of sync if an OOM happens after an all_gather/all_reduce has already run), but it should make multiprocessing training more robust in practice, since OOMs usually happen early enough.
      
      Reviewed By: myleott
      
      Differential Revision: D13086018
      
      fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
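      A hedged sketch of the recovery strategy (illustrative names and batch layout, not the actual trainer): the worker that OOMs runs a tiny dummy step with a zeroed loss, so it still reaches the gradient all_reduce that its peers are blocked on.

      ```python
      # Illustrative sketch (assumed batch layout and trainer structure).
      import torch

      def safe_train_step(model, criterion, optimizer, batch, dummy_batch):
          try:
              loss = criterion(model(batch["input"]), batch["target"])
          except RuntimeError as e:
              if "out of memory" not in str(e):
                  raise
              # Free what we can, then run a tiny dummy batch with zeroed
              # loss so this worker still joins the gradient all_reduce
              # that the other workers are waiting on.
              torch.cuda.empty_cache()
              optimizer.zero_grad()
              loss = criterion(model(dummy_batch["input"]),
                               dummy_batch["target"]) * 0.0
          loss.backward()   # gradients are synchronized here under DDP
          optimizer.step()
      ```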