1. 18 Oct, 2019 3 commits
  2. 15 Oct, 2019 2 commits
    • Add Unit test cases for BMUF · b5f41f82
      Nayan Singhal authored
      Summary:
      This unit test guards the bmuf code.
      
      Change:
      1. distributed_init assumes we are always using a CUDA device, which is not the case when using the "gloo" backend on a CPU-only machine (see the sketch below).
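
      A sketch of the backend-aware initialization this implies (illustrative only; the function and argument names are assumptions, not the actual fairseq code):

      ```python
      import torch
      import torch.distributed as dist

      def distributed_init_sketch(backend, init_method, world_size, rank):
          # Initialize the process group for either backend.
          dist.init_process_group(backend=backend, init_method=init_method,
                                  world_size=world_size, rank=rank)
          # Only pin a CUDA device when the backend actually uses GPUs;
          # "gloo" may be running on a CPU-only machine.
          if backend == "nccl" and torch.cuda.is_available():
              torch.cuda.set_device(rank % torch.cuda.device_count())
      ```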
      
      Reviewed By: jay-mahadeokar
      
      Differential Revision: D17821391
      
      fbshipit-source-id: 28e1bb39f7a4889b1dc6bd636b7c499e55bfc69a
    • fix libnat imports · e3a40d9d
      Changhan Wang authored
      Summary: Bring back the changes in D17661768
      
      Reviewed By: ailzhang
      
      Differential Revision: D17920299
      
      fbshipit-source-id: be3f93a044a8710c8b475012c39e36a3e6507fad
  3. 12 Oct, 2019 1 commit
  4. 11 Oct, 2019 2 commits
    • fix the random mask function for CMLM model · 02b74c58
      Jiatao Gu authored
      Summary: The original implementation of the random mask differed from what was stated in the paper.
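
      As context, CMLM-style random masking samples how many target tokens to mask (uniformly, at least one) and then masks that many randomly chosen positions. A simplified sketch, not the exact fix:

      ```python
      import torch

      def random_mask_sketch(target_tokens, pad_idx, unk_idx):
          # target_tokens: (batch, len) LongTensor of target token ids
          can_mask = target_tokens.ne(pad_idx)
          scores = torch.rand_like(target_tokens, dtype=torch.float)
          scores.masked_fill_(~can_mask, 2.0)  # never pick padding positions
          # sample the number of tokens to mask per sentence, at least one
          num_to_mask = (can_mask.sum(1).float()
                         * torch.rand(target_tokens.size(0), device=target_tokens.device)).long() + 1
          # mask the positions with the lowest random scores
          ranks = scores.sort(1)[1].sort(1)[1]
          mask = ranks < num_to_mask.unsqueeze(1)
          return target_tokens.masked_fill(mask & can_mask, unk_idx)
      ```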
      
      Reviewed By: kahne
      
      Differential Revision: D17652564
      
      fbshipit-source-id: 238a9158041b3ff2482ee50ce6151c3f77f0b2c1
    • add new_arange function + FIX BUGS of returning attn values · cce92bdd
      Jiatao Gu authored
      Summary:
      Implementation of the Levenshtein Transformer paper.
      Add a new helper function "new_arange" to create an arange tensor easily.
      Fix bugs in returning attn values for NAT models.
      Delete files which are unnecessary or experimental.
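
      For reference, a minimal version of such a helper might look like the sketch below (illustrative; the actual implementation may differ):

      ```python
      import torch

      def new_arange(x, *size):
          # Return an arange tensor on the same device as `x`, expanded to
          # `size` (defaults to x.size()); the range runs over the last dim.
          if len(size) == 0:
              size = x.size()
          return torch.arange(size[-1], device=x.device).expand(*size).contiguous()
      ```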
      
      Reviewed By: kahne
      
      Differential Revision: D17652009
      
      fbshipit-source-id: 436bbb5d45de2f8067003232de4f2bd51e87719c
  5. 10 Oct, 2019 2 commits
    • Add ctc loss to ASR task (#1233) · c4893ca6
      Dmytro Okhonko authored
      Summary:
      Adds CTC loss and corresponding transformer-based CTC models.
      
      Tested with
      `CUDA_VISIBLE_DEVICES=0 python train.py $DATA_PATH --save-dir $SAVE_DIR --max-epoch 30 --task speech_recognition --arch vggtransformer_enc_1 --optimizer adadelta --lr 1.0 --adadelta-eps 1e-8 --adadelta-rho 0.95 --clip-norm 10.0  --max-tokens 10000 --log-format json --log-interval 1 --criterion ctc_loss --user-dir examples/speech_recognition/ --validate-interval=10`
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1233
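
      At its core, a CTC criterion wraps torch.nn.functional.ctc_loss; a rough sketch (argument names are illustrative, not the exact fairseq criterion):

      ```python
      import torch.nn.functional as F

      def ctc_criterion_sketch(log_probs, targets, input_lengths, target_lengths, blank_idx=0):
          # log_probs: (T, batch, vocab) log-softmax outputs of the acoustic model
          # targets:   (batch, max_target_len) padded label id sequences
          return F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                            blank=blank_idx, reduction="sum", zero_infinity=True)
      ```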
      
      Reviewed By: jcai1
      
      Differential Revision: D17856824
      
      Pulled By: okhonko
      
      fbshipit-source-id: f3eac64d3fdd0c37cf8c539dd360cfb610d8a6ef
    • wav2letter integration · 33646ac9
      Jeff Cai authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/846
      
      Reviewed By: jcai1
      
      Differential Revision: D17845996
      
      Pulled By: okhonko
      
      fbshipit-source-id: 3826fd9a4418496916bf1835c319dd85c89945cc
  6. 09 Oct, 2019 1 commit
    • Fix data loading memory issue in pyspeech · b6e001f6
      Alex Xiao authored
      Summary:
      We currently shard data when creating the batch iterator. This means we first load all indices/frame lengths/handles into memory, and then do the sharding. This makes it impossible to train on large datasets with a large number of workers, because each worker needs to load the entire dataset into memory. For training on a million hours of data (i.e. semi-supervised or unsupervised approaches), this data loading makes it flat out impossible to use 8 GPUs.
      
      3 changes:
      
      1. This diff modifies the data loading such that we do the sharding while we read the handles file, rather than later (see the sketch after this list). This modification is done on a task-by-task basis, since the task specifies how the data is loaded. I've tried to make the code compatible with both sharding during handle loading and sharding during batch iteration. I've currently only done the sharding during handle loading for the aligned_training task.
      
      2. To support data sharding at data loading time and the requirement that all shards must have exactly the same # of batches, I've added a method to do this synchronization: shards with too many batches simply truncate the extra ones, similar to what we already do.

      3. In fairspeq/train.py, we are actually loading the training dataset and batch iterator twice, once in train.py and once when loading the checkpoint (which we always do, regardless of whether there is a checkpoint). This means double the loading time, which can be painful for very large files. I've removed the extraneous loading in this diff as well.
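
      A toy sketch of change 1, sharding while streaming through a handles file instead of loading everything first (the file format and names here are hypothetical):

      ```python
      def load_handles_shard(handles_path, shard_id, num_shards):
          # Keep only this worker's shard as we read, so no worker ever
          # holds the full index/handle list in memory.
          handles = []
          with open(handles_path) as f:
              for i, line in enumerate(f):
                  if i % num_shards == shard_id:
                      handles.append(line.rstrip("\n"))
          return handles
      ```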
      
      Reviewed By: yqwangustc
      
      Differential Revision: D17750715
      
      fbshipit-source-id: 0e6e3d363525fa5661f1c784303390ea13f46377
  7. 08 Oct, 2019 3 commits
    • Add printing of PyTorch memory summary on OOM (#885) · 63b6b3f4
      Jerry Ma authored
      Summary:
      PyTorch now has more comprehensive memory instrumentation, added in https://github.com/pytorch/pytorch/pull/27361 . This PR makes fairseq print a summary table of the memory state when an OOM occurs.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/885
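
      A rough sketch of the kind of handler this adds, assuming torch.cuda.memory_summary() from the linked PyTorch PR is available (not the exact fairseq code):

      ```python
      import sys
      import torch

      def log_oom_sketch(exc):
          # Called from an `except RuntimeError` handler around forward/backward:
          # dump PyTorch's allocator summary table before re-raising or skipping.
          print(f"OOM: {exc}", file=sys.stderr)
          if torch.cuda.is_available():
              print(torch.cuda.memory_summary(), file=sys.stderr)
      ```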
      
      Differential Revision: D17820445
      
      Pulled By: jma127
      
      fbshipit-source-id: 1887417c7648d703f78e1cff9f2a5b89901f49d0
    • ensemble levts · 34e79c58
      Jungo Kasai authored
      Summary:
      Add ensemble wrappers to the Levenshtein NAT.
      Final softmax ensemble over the pipeline of three steps: deletion, placeholder insertion, and word selection.
      1. Deletion
      2. Placeholder Insertion
      3. Word Selection

      Each step involves scoring, averaging the scores over the ensemble, and then making hard decisions with argmax; only then does the next step follow. By design, we cannot do the three steps in parallel.
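
      As a rough illustration of one ensembled step (hypothetical scoring API, not the actual wrapper):

      ```python
      import torch

      def ensemble_decision(score_fns, *inputs):
          # `score_fns` are per-model scoring functions for one of the three
          # steps (deletion / placeholder insertion / word selection).
          # Average the per-model scores, then make the hard argmax decision.
          scores = torch.stack([fn(*inputs) for fn in score_fns], dim=0)
          return scores.mean(dim=0).argmax(dim=-1)
      ```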
      
      Reviewed By: kahne
      
      Differential Revision: D17723202
      
      fbshipit-source-id: 05f7a4fcd922a972cc4796ca397e8220f0b4d53e
    • fix max lengths in Levenshtein Transformer · c2165224
      Changhan Wang authored
      Summary: Fix the max length calculation in Levenshtein Transformer
      
      Reviewed By: jhcross
      
      Differential Revision: D17672946
      
      fbshipit-source-id: e5efbe7e56cf879d3e822864e4398f99f45b04d4
  8. 07 Oct, 2019 1 commit
    • Setting Global sync to 50 in BMUF · 6f58e15e
      Nayan Singhal authored
      Summary:
      In all our final settings we use global_sync = 50, and we get results comparable to DDP and Caffe2.

      This sets the default --global-sync-iter to 50, so users only need to pass --use-bmuf to enable BMUF for training.
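
      For context, the periodic sync this interval controls amounts to something like the sketch below (simplified: real BMUF also applies block momentum; names are illustrative):

      ```python
      import torch.distributed as dist

      def maybe_sync_params(model, step, global_sync_iter=50):
          # Every `global_sync_iter` local steps, average parameters across workers.
          if step % global_sync_iter != 0:
              return
          world_size = dist.get_world_size()
          for p in model.parameters():
              dist.all_reduce(p.data, op=dist.ReduceOp.SUM)
              p.data.div_(world_size)
      ```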
      
      Reviewed By: skritika
      
      Differential Revision: D17765094
      
      fbshipit-source-id: 369591eeff266d757f89e1fc8dda01711146fdbc
  9. 05 Oct, 2019 1 commit
  10. 04 Oct, 2019 2 commits
  11. 01 Oct, 2019 1 commit
  12. 30 Sep, 2019 2 commits
  13. 29 Sep, 2019 2 commits
  14. 28 Sep, 2019 1 commit
  15. 27 Sep, 2019 5 commits
  16. 26 Sep, 2019 1 commit
  17. 24 Sep, 2019 1 commit
  18. 23 Sep, 2019 3 commits
  19. 20 Sep, 2019 3 commits
    • Remove extraneous call to RNG in multi-GPU code path · 10f9349e
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/865
      
      Differential Revision: D17510276
      
      Pulled By: myleott
      
      fbshipit-source-id: 24119402ad5fe95a1312fadb77bafe49a9197c6b
    • Update README.race.md · e869c80d
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1155
      
      Differential Revision: D17509762
      
      Pulled By: myleott
      
      fbshipit-source-id: 4de535289c1f35abff0d8142d8580f3ede039f47
    • added multilingual masked LM training (#849) · 32335404
      Naman Goyal authored
      Summary:
      The multilingual-RoBERTa training is working with aconneau's XLM data.

      Two pieces remaining:

      1) `XLM` limits each batch to examples from the same language. I am not 100% sure about the reason for that, but it should be easy to implement: we can add a `batch_by_size_and_language` function instead of the default `batch_by_size`. If it's not critical, I would prefer to leave it out, as that keeps the code very clean and simple.

      2) `sample_ratio` in `ConcatDataset` works with `int` values by tiling the datasets according to the ratio. Currently I handle this by rounding the ratio to the first decimal and then multiplying by 10 (see the sketch below). We can see whether such simple heuristics are good enough; there are other options (we can talk about them offline).
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/849
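
      The rounding heuristic mentioned above, as a toy sketch (the function name is hypothetical):

      ```python
      def ratios_to_tiling_factors(sample_ratios):
          # Round each ratio to one decimal place and scale by 10 to get
          # integer tiling factors usable by ConcatDataset.
          return [max(1, int(round(r * 10))) for r in sample_ratios]

      print(ratios_to_tiling_factors([0.3, 1.0, 0.5]))  # -> [3, 10, 5]
      ```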
      
      Differential Revision: D17162460
      
      fbshipit-source-id: d967f3d872f7a1f0aa4ea418bd362b68af9e432f
  20. 19 Sep, 2019 2 commits
    • Add dataset class for weighted sampling with replacement. (#861) · a8a85c26
      Jerry Ma authored
      Summary:
      As discussed with Naman earlier today. Weighted sampling with
      replacement can be done on a per-epoch basis using `set_epoch()`
      functionality, which generates the samples as a function of random seed
      and epoch.
      
      Additionally, `FairseqTask` needs to set the starting epoch for the
      dataset at the very beginning of iterator construction.
      
      Not yet implemented is the per-epoch iterator construction, which
      is necessary to actually regenerate the batches for each epoch.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/861
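
      A rough sketch of the idea (not the actual fairseq class): indices are regenerated deterministically from (seed, epoch) via set_epoch(), sampling with replacement according to per-example weights.

      ```python
      import numpy as np
      from torch.utils.data import Dataset

      class WeightedResamplingDatasetSketch(Dataset):
          def __init__(self, dataset, weights, seed=0):
              self.dataset = dataset
              self.weights = np.asarray(weights, dtype=np.float64)
              self.weights /= self.weights.sum()
              self.seed = seed
              self.set_epoch(0)

          def set_epoch(self, epoch):
              # Resample a full epoch's worth of indices, with replacement.
              rng = np.random.RandomState(self.seed + epoch)
              self.indices = rng.choice(len(self.dataset), size=len(self.dataset),
                                        replace=True, p=self.weights)

          def __getitem__(self, idx):
              return self.dataset[self.indices[idx]]

          def __len__(self):
              return len(self.dataset)
      ```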
      
      Differential Revision: D17460687
      
      Pulled By: jma127
      
      fbshipit-source-id: 1c2a54f04ac96b3561c100a6fd66a9fccbe3c658
    • Add cython language_level hints · 0eaaf355
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1147
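
      For reference, a language_level hint is usually added either as a directive comment at the top of each .pyx file or via the build script; a minimal sketch (the file path is illustrative):

      ```python
      # At the top of each .pyx file:
      #   # cython: language_level=3

      # Or when building, via Cython's compiler directives:
      from Cython.Build import cythonize

      extensions = cythonize(
          ["fairseq/data/data_utils_fast.pyx"],
          compiler_directives={"language_level": "3"},
      )
      ```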
      
      Differential Revision: D17468447
      
      Pulled By: myleott
      
      fbshipit-source-id: 0dbac04b92c8df74ad991d5e92cd02036d662369
  21. 18 Sep, 2019 1 commit