1. 30 Apr, 2019 1 commit
  2. 22 Apr, 2019 1 commit
    • reduce memory footprint for average_checkpoints (#647) · d63477e1
      Yongqiang Wang authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/647
      
      The current implementation of average_checkpoints requires loading all
      the model parameters into memory and then doing the averaging. Averaging
      large models (e.g., a transformer) over a large number of checkpoints
      (e.g., >50) may require over 100 GB of memory.
      
      Loading all the parameters at once is not necessary: since the number of
      models is known in advance, each checkpoint can be folded into a running
      average and released before the next one is loaded.
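      A minimal sketch of this streaming approach (editor's illustration, not
      the code from this commit; the function name and the assumption that each
      checkpoint stores its parameters under a "model" key are hypothetical):

      ```python
      import collections

      import torch


      def average_checkpoints_streaming(paths):
          """Average model parameters across checkpoints, one file at a time."""
          num_models = len(paths)  # known in advance, so each contribution can be scaled up front
          averaged = collections.OrderedDict()
          for path in paths:
              state = torch.load(path, map_location="cpu")
              params = state["model"]  # assumed fairseq-style checkpoint layout
              for name, tensor in params.items():
                  # accumulate in float64 so repeated additions stay numerically stable
                  contribution = tensor.to(dtype=torch.float64) / num_models
                  if name in averaged:
                      averaged[name] += contribution
                  else:
                      averaged[name] = contribution
              del state, params  # free this checkpoint before loading the next one
          return averaged
      ```

      Peak memory then stays near the size of a single model (the running
      average plus the checkpoint currently being read) instead of growing with
      the number of checkpoints.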
      
      Reviewed By: skritika
      
      Differential Revision: D15027513
      
      fbshipit-source-id: 0afe37c9a031a9ab0f1e78844a37be49ec5f76f1
  3. 16 Jan, 2019 1 commit
  4. 06 Dec, 2018 1 commit
  5. 15 Jun, 2018 5 commits
  6. 02 Apr, 2018 1 commit
    • Merge internal changes (#136) · d3795d6c
      Myle Ott authored
      Changes:
      - 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search (see the sketch after this list)
      - c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model
      - 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
      - Small bug fixes for distributed training, the LSTM model, and the inverse square root LR scheduler
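      To illustrate the `--sampling` change above: a minimal sketch of drawing
      the next token from the model's softmax distribution instead of keeping
      the highest-scoring candidates (editor's illustration; the function name
      and the temperature argument are assumptions, not fairseq's API):

      ```python
      import torch


      def pick_next_token(logits, sampling=False, temperature=1.0):
          """logits: (batch, vocab) next-token scores; returns (batch, 1) token ids."""
          if sampling:
              # --sampling-style decoding: draw from the softmax distribution
              probs = torch.softmax(logits / temperature, dim=-1)
              return torch.multinomial(probs, num_samples=1)
          # greedy choice; beam search keeps the top-k extensions instead
          return logits.argmax(dim=-1, keepdim=True)
      ```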