"vscode:/vscode.git/clone" did not exist on "406e03b7b1dade31738dc777aa2b396a3fbc3183"
  1. 06 Dec, 2018 3 commits
  2. 04 Dec, 2018 1 commit
  3. 30 Nov, 2018 1 commit
  4. 29 Nov, 2018 2 commits
  5. 27 Nov, 2018 2 commits
  6. 26 Nov, 2018 2 commits
  7. 19 Nov, 2018 1 commit
    • Protect against failures in case of OOMs · a442244d
      Halil Akin authored
      Summary: Fixing some distributed failures that happen when OOMs are observed.
      
      Reviewed By: myleott
      
      Differential Revision: D13121054
      
      fbshipit-source-id: f71a0a695332acbaa1797e89887b8b7c7ddaa727
  8. 18 Nov, 2018 2 commits
  9. 17 Nov, 2018 1 commit
  10. 16 Nov, 2018 1 commit
    • make dictionary optional · a4e34985
      Haoran Li authored
      Reviewed By: jingfeidu
      
      Differential Revision: D13104360
      
      fbshipit-source-id: 9636f5ee2721818f98b33af559fa24292534a72f
  11. 14 Nov, 2018 1 commit
  12. 13 Nov, 2018 1 commit
  13. 10 Nov, 2018 1 commit
    • pipeline for LM training · 880e7cd4
      Ruty Rinott authored
      Summary:
      Step 2 of the pipeline for LM training. It assumes tokenized text data as input,
      splits it into train/validation/test, and runs binarization
      (step a_ii in https://fb.quip.com/kazzAxvZHBj9)
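
      Purely as an illustration of the split step (this is not the pipeline's actual code; file handling and split ratios are assumptions):

```python
import random

def split_corpus(path, train_frac=0.98, valid_frac=0.01, seed=0):
    """Split a tokenized text file into train/valid/test line lists."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    random.Random(seed).shuffle(lines)
    n_train = int(len(lines) * train_frac)
    n_valid = int(len(lines) * valid_frac)
    return (lines[:n_train],                   # train split
            lines[n_train:n_train + n_valid],  # validation split
            lines[n_train + n_valid:])         # test split (remainder)
```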
      
      Reviewed By: borguz
      
      Differential Revision: D10454705
      
      fbshipit-source-id: 74e8679041f5507c4e404c1b719547c2ae9ed983
  14. 08 Nov, 2018 1 commit
    • Fix error when training multilingual_translation task with multi-GPU · 189fcabf
      Peng-Jen Chen authored
      Summary:
      D10052908 introduces the multilingual_translation task, but it raises an exception when training with multiple GPUs: P60202593
      
      With Myle's help, we found that the cause is an improperly handled dummy batch data type, which results in optimizer.backward() not being executed the same number of times across different GPUs.
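
      For context, a minimal sketch of the usual pattern (hedged; this is the generic distributed-training workaround, not fairseq's exact code): a rank with no real data still runs forward/backward on a dummy batch, with the loss zeroed out, so every rank reaches the gradient all-reduce the same number of times.

```python
def train_step(model, criterion, sample, dummy_batch):
    """Run one forward/backward; substitute a dummy batch when this rank has no data."""
    is_dummy = sample is None
    if is_dummy:
        sample = dummy_batch       # same structure and dtypes as a real batch
    loss = criterion(model, sample)
    if is_dummy:
        loss = loss * 0.0          # contributes nothing, but backward() still fires
    loss.backward()                # keeps gradient all-reduce counts equal across GPUs
    return loss
```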
      
      Reviewed By: xianxl
      
      Differential Revision: D12964263
      
      fbshipit-source-id: 4991039030bf373f0c484e131acc4736487be4d8
  15. 07 Nov, 2018 2 commits
    • Merge internal changes · 8eb232ce
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/352
      
      Differential Revision: D12956930
      
      Pulled By: myleott
      
      fbshipit-source-id: 39334a79544bac570feb04be9103269d7c1563f9
    • Support BPE end of word marker suffix in fairseq noising module · 2b13f3c0
      Liezl Puzon authored
      Summary:
      There are two ways to implement BPE markers:
      1. use a continuation marker suffix to indicate that there is at least one more subtoken left in the word
      2. use an end-of-word marker suffix to indicate that there are no more subtokens left in the word
      
      This diff adds logic to account for either kind of BPE marker suffix, along with a corresponding test. I also refactored the test setup to reduce the number of boolean args when setting up test data.
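
      As a hedged illustration of the two conventions (the marker strings and helper below are examples, not fairseq's exact defaults):

```python
# The word "hello" split into BPE subtokens under the two marker conventions.
with_continuation_marker = ["hel@@", "lo"]    # "@@" suffix: more subtokens follow
with_end_of_word_marker = ["hel", "lo_EOW"]   # "_EOW" suffix: last subtoken of the word

def is_word_end(token, bpe_cont_marker="@@", bpe_end_marker=None):
    """Return True if this subtoken ends a word, under either marker convention."""
    if bpe_end_marker is not None:
        return token.endswith(bpe_end_marker)
    return not token.endswith(bpe_cont_marker)
```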
      
      Reviewed By: xianxl
      
      Differential Revision: D12919428
      
      fbshipit-source-id: 405e9f346dce6e736c1305288721dfc7b63e872a
  16. 02 Nov, 2018 2 commits
  17. 01 Nov, 2018 6 commits
  18. 30 Oct, 2018 1 commit
    • transformer onnx trace: skip no-op transpose (#333) · 672977c1
      James Cross authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/333
      
      A tiny hack to speed up inference slightly for transformer beam search after export to graph mode. Specifically, there is no need to transpose a dimension with size 1 (the sequence length of a single decoder time step during beam search) with its neighbor immediately before a view/reshape.
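
      A small sanity check of the claim (illustrative shapes only): when the transposed dimension has size 1, the transpose followed by a reshape yields the same result as the reshape alone, so the trace can drop it.

```python
import torch

x = torch.randn(1, 5, 8)  # sequence length 1, as in a single beam-search decoder step
with_transpose = x.transpose(0, 1).reshape(5, 8)
without_transpose = x.reshape(5, 8)  # skipping the size-1 transpose changes nothing
assert torch.equal(with_transpose, without_transpose)
```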
      
      Reviewed By: jmp84
      
      Differential Revision: D12833011
      
      fbshipit-source-id: f9c344a9ad595e6e48a8a65b31cf2b1392f9b938
  19. 27 Oct, 2018 1 commit
    • Extend WordShuffle noising function to apply to non-bpe tokens · 90c01b3a
      Xian Li authored
      Summary:
      We'd like to reuse the noising functions and DenoisingDataset in
      adversarial training. However, the current noising functions assume the input is
      subword tokens. The goal of this diff is to extend them so the noising can be
      applied to word tokens. Since we're mostly interested in word shuffle
      noising, I only modified the WordShuffle class.
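
      For reference, a hedged sketch of the word-shuffle idea applied to plain word tokens (not the exact fairseq implementation): bounded uniform noise is added to each position and the tokens are re-ordered by the noisy positions, so each token is displaced by at most roughly max_shuffle_distance slots.

```python
import numpy as np

def word_shuffle(tokens, max_shuffle_distance=3, seed=0):
    """Locally shuffle a list of word tokens within a bounded distance."""
    rng = np.random.RandomState(seed)
    noise = rng.uniform(0, max_shuffle_distance, size=len(tokens))
    order = np.argsort(np.arange(len(tokens)) + noise)
    return [tokens[i] for i in order]

print(word_shuffle("the quick brown fox jumps over the lazy dog".split()))
```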
      
      Reviewed By: liezl200
      
      Differential Revision: D10523177
      
      fbshipit-source-id: 1e5d27362850675010e73cd38850c890d42652ab
  20. 26 Oct, 2018 1 commit
    • Fix print & add more informative logging · 6117f827
      Wei Ho authored
      Summary: Fix fairseq's `force` option for disabling print suppression (otherwise, `print(..., force=True)` fails on master since the force kwarg gets passed to the builtin print).
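
      A hedged sketch of the intended behavior (helper and argument names here are illustrative, not fairseq's exact code): the wrapper has to pop the force kwarg before delegating, because the builtin print does not accept it.

```python
import builtins

def suppress_print(is_master):
    """Silence print() on non-master ranks unless the caller passes force=True."""
    builtin_print = builtins.print

    def wrapped_print(*args, **kwargs):
        force = kwargs.pop("force", False)  # must be popped: builtin print has no 'force' kwarg
        if is_master or force:
            builtin_print(*args, **kwargs)

    builtins.print = wrapped_print
```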
      
      Reviewed By: dpacgopinath
      
      Differential Revision: D10522058
      
      fbshipit-source-id: bbc10c021a7d21396ebfbb1bf007f6b9b162f4fd
  21. 25 Oct, 2018 2 commits
  22. 23 Oct, 2018 2 commits
  23. 22 Oct, 2018 1 commit
    • Fix another distributed syncing issue · 23e9dc2e
      Halil Akin authored
      Summary:
      This is another failure caused by distributed GPUs getting out of sync.
      We trigger save_and_eval (which makes the inter-GPU communication calls) based on
      the number of updates, but that counter only tracks weight updates. Whenever
      something goes wrong during training and the weights can't be updated, the counter
      diverges, nodes go out of sync, and nodes start failing. So we should check the
      number of iterations instead.
      
      I am, again, making a small change to save the day, but we should decouple and
      refactor the save_and_eval logic from training to have fewer headaches in the
      future; I plan to work on that later. This should solve some of the issues for now.
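
      A hedged sketch of the distinction (names and structure illustrative only): a counter that advances only on successful weight updates can diverge across ranks, while an iteration counter advances on every batch on every rank, so gating the collective call on iterations keeps the ranks aligned.

```python
def training_loop(batches, train_step, save_and_eval, save_interval=1000):
    """Illustrative only: gate collective calls on iterations, not weight updates."""
    num_updates = 0
    for num_iterations, batch in enumerate(batches, start=1):
        if train_step(batch):   # may fail (e.g. OOM) on some ranks but not others...
            num_updates += 1    # ...so this counter can diverge across ranks

        # Out-of-sync risk: "if num_updates % save_interval == 0: save_and_eval()"
        # Safer: every rank counts every batch, so all ranks enter together.
        if num_iterations % save_interval == 0:
            save_and_eval()
```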
      
      Reviewed By: jhcross
      
      Differential Revision: D10478427
      
      fbshipit-source-id: b9deacfea252b2fb66b81c799fa78e2439fa514c
  24. 21 Oct, 2018 1 commit
  25. 19 Oct, 2018 1 commit
    • Update upgrade_state_dict in transformer.py to upgrade_state_dict_named (#317) · 0a628401
      Peng-Jen Chen authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/317
      
      When upgrading the `state_dict` variable, the `upgrade_state_dict` function in TransformerEncoder/TransformerDecoder doesn't handle multiple encoders/decoders, which will be the case in D10052908.
      
      Before this change, we hit error message [1] when loading a checkpoint for the multilingual_transformer model in D10052908. This diff fixes it.
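
      A hedged sketch of why the named variant is needed (key names below are hypothetical placeholders, not fairseq's real parameters): with several encoders/decoders in one model, each sub-module's checkpoint keys carry its own prefix, so the upgrade routine must rewrite only the keys under that prefix.

```python
def upgrade_state_dict_named(state_dict, name):
    """Rename legacy checkpoint keys, touching only this sub-module's own prefix."""
    for old_key in list(state_dict):
        # 'old_param' / 'new_param' are placeholder names for a renamed parameter.
        if old_key.startswith(name + ".") and ".old_param" in old_key:
            new_key = old_key.replace(".old_param", ".new_param")
            state_dict[new_key] = state_dict.pop(old_key)
    return state_dict

# A model with one decoder per language pair would call this once per sub-module,
# passing each sub-module's own key prefix as `name`.
```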
      
      Reviewed By: myleott, liezl200
      
      Differential Revision: D10375418
      
      fbshipit-source-id: 7104c1a463e78f3fa33d8479a37c51608be50610