1. 30 Apr, 2019 1 commit
    • Fix upgrade_state_dict for XLM Transformer sentence encoder (#680) · 121877f5
      Liezl Puzon authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/680
      
      Some embedding names were renamed but this one was missed
      
      So far I've only seen this affect runs that continue training. If you encounter errors when continuing training from an XLM save_dir, rebasing past this diff (or patching it in and canarying) should fix the problem.
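      
      A minimal sketch of the kind of key renaming an upgrade_state_dict hook performs, assuming old checkpoints still carry a stale parameter name; the old/new key names below are illustrative, not the actual fairseq identifiers touched by this commit.
      
      ```python
      # Illustrative upgrade hook: rename stale checkpoint keys in place so
      # old checkpoints keep loading. Key names are hypothetical, not fairseq's.
      def upgrade_state_dict_named(state_dict, name):
          prefix = name + '.' if name else ''
          renames = {
              prefix + 'embed_positions.weights': prefix + 'embed_positions.weight',  # hypothetical
          }
          for old_key, new_key in renames.items():
              if old_key in state_dict:
                  state_dict[new_key] = state_dict.pop(old_key)
          return state_dict
      ```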
      
      Reviewed By: pipibjc
      
      Differential Revision: D15137463
      
      fbshipit-source-id: c72067f16aaf1ba2b8286938bd25a19b70ae8712
  2. 29 Apr, 2019 2 commits
  3. 27 Apr, 2019 2 commits
  4. 26 Apr, 2019 1 commit
  5. 25 Apr, 2019 6 commits
  6. 24 Apr, 2019 1 commit
  7. 22 Apr, 2019 2 commits
    • Fix generation with --no-early-stop (#627) · fa52d202
      Max Ryabinin authored
      Summary:
      Because the size of `unfinalized_scores` equals the current `bsz` rather than the initial batch size, we need to index it by `unfin_idx` instead of `sent` in `is_finished`.
      Fixes #588.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/627
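      
      A hedged sketch of the indexing change described above, assuming a generator loop that tracks finalized hypotheses per original sentence while `unfinalized_scores` shrinks with the batch; this is simplified and not fairseq's actual SequenceGenerator code.
      
      ```python
      # unfinalized_scores has one row per sentence still in the (shrinking)
      # batch, so it must be indexed by the within-batch index unfin_idx, not
      # by the original sentence id sent. Simplified sketch only.
      def is_finished(sent, unfin_idx, beam_size, finalized, unfinalized_scores=None):
          if len(finalized[sent]) == beam_size:
              if unfinalized_scores is None:
                  return True  # early stopping: enough hypotheses finalized
              # --no-early-stop: only stop once no unfinalized candidate can win
              worst_finalized = min(h['score'] for h in finalized[sent])
              best_unfinalized = unfinalized_scores[unfin_idx].max()  # was: [sent]
              return worst_finalized >= best_unfinalized
          return False
      ```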
      
      Differential Revision: D15034641
      
      Pulled By: myleott
      
      fbshipit-source-id: 2638e68e877ae01256cac7d8e69b5b7fec8f7017
    • reduce memory footprint for average_checkpoints (#647) · d63477e1
      Yongqiang Wang authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/647
      
      The current implementation of average_checkpoints loads all the model
      parameters into memory and only then does the averaging. Averaging large
      models (e.g., transformer) over a large number of checkpoints (e.g., >50)
      can require over 100 GB of memory.
      
      Loading all the parameters at once is not necessary, since the number of models is known in advance.
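      
      A minimal sketch of the memory-saving idea, assuming each checkpoint stores its parameters under a 'model' key (as fairseq checkpoints do): keep a single running sum and divide at the end, so only one checkpoint is resident at a time. This is a simplified illustration, not the actual average_checkpoints implementation.
      
      ```python
      import torch
      
      def average_checkpoints_lowmem(paths):
          """Average parameters across checkpoints, holding only one in memory."""
          avg_state, n = None, len(paths)
          for path in paths:
              state = torch.load(path, map_location='cpu')['model']
              if avg_state is None:
                  avg_state = {k: v.clone().float() for k, v in state.items()}
              else:
                  for k, v in state.items():
                      avg_state[k].add_(v.float())
              del state  # free the current checkpoint before loading the next
          return {k: v.div_(n) for k, v in avg_state.items()}
      ```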
      
      Reviewed By: skritika
      
      Differential Revision: D15027513
      
      fbshipit-source-id: 0afe37c9a031a9ab0f1e78844a37be49ec5f76f1
  8. 17 Apr, 2019 3 commits
  9. 16 Apr, 2019 1 commit
  10. 15 Apr, 2019 3 commits
  11. 12 Apr, 2019 1 commit
  12. 10 Apr, 2019 4 commits
  13. 09 Apr, 2019 2 commits
  14. 07 Apr, 2019 1 commit
    • move distributed_init after get_batch_iterator · 34028c63
      Haoran Li authored
      Summary: There are constant wait-timeout issues when using multiple nodes; even setting copylocallytempdir:/ doesn't help (e.g., f105637629). It seems to work after moving distributed_init after get_batch_iterator (e.g., f106520580).
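      
      A hedged sketch of the reordering, with illustrative call sites: do the potentially slow per-node data work (building the batch iterator) before joining the distributed process group, so the rendezvous does not time out waiting on a slow node. The surrounding setup code is not fairseq's actual train.py.
      
      ```python
      # Illustrative ordering only; names follow the commit description.
      def setup_training(args, task, distributed_init):
          # 1) per-node, potentially slow: load/preprocess data and build batches
          epoch_itr = task.get_batch_iterator(
              dataset=task.dataset('train'),
              max_tokens=args.max_tokens,
          )
          # 2) only then synchronize with the other nodes
          distributed_init(args)
          return epoch_itr
      ```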
      
      Reviewed By: myleott
      
      Differential Revision: D14817769
      
      fbshipit-source-id: edbb101a28d8082241c7bdd8c5500c9dad27647c
  15. 05 Apr, 2019 3 commits
  16. 04 Apr, 2019 1 commit
    • aligned training task and CE related changes · 3658fa32
      Jay Mahadeokar authored
      Summary:
      This diff adds:
      
      1. An aligned training task specifically for cross-entropy criterion training using prod data and prod-like models.
      2. A few changes to correctly register the task and criterions (see the registration sketch below).
      3. Changes to the trainer code for propagating the accuracy metrics we care about for training.
      
      A couple of things are hacky right now:
      - The reporting is not modular (this needs to be thought about in general for fairseq).
      
      - The get-dummy-batch logic could be specific to the task instead of specific to the dataset.
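      
      A hedged sketch of the registration pattern referenced in item 2, using fairseq's decorator-based registries; the task/criterion names and class bodies are placeholders, not the code added by this diff.
      
      ```python
      from fairseq.criterions import FairseqCriterion, register_criterion
      from fairseq.tasks import FairseqTask, register_task
      
      
      @register_task('aligned_training')  # hypothetical task name
      class AlignedTrainingTask(FairseqTask):
          """Placeholder task; real setup/dataset loading omitted."""
      
      
      @register_criterion('aligned_cross_entropy')  # hypothetical criterion name
      class AlignedCrossEntropyCriterion(FairseqCriterion):
          def forward(self, model, sample, reduce=True):
              # should return (loss, sample_size, logging_output)
              raise NotImplementedError
      ```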
      
      Reviewed By: myleott
      
      Differential Revision: D14670482
      
      fbshipit-source-id: dc077247b2ae9d26a8e842a386ec5faa5771e836
  17. 03 Apr, 2019 2 commits
  18. 02 Apr, 2019 3 commits
  19. 29 Mar, 2019 1 commit