1. 21 Nov, 2019 3 commits
    • Fix warmup for fixed_schedule in case of first update (#1408) · 60e16a35
      Tatiana Likhomanenko authored
      Summary:
      I hit the following error while using warmup with the fixed LR schedule:
      
      ```
      Traceback (most recent call last):
        File "/private/home/antares/.conda/envs/fairseq-20190809/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
          fn(i, *args)
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/train.py", line 291, in distributed_main
          main(args, init_distributed=True)
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/train.py", line 81, in main
          train(args, trainer, task, epoch_itr)
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/train.py", line 122, in train
          log_output = trainer.train_step(samples)
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/fairseq/trainer.py", line 409, in train_step
          self.optimizer.step()
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/fairseq/optim/fp16_optimizer.py", line 153, in step
          self.fp32_optimizer.step(closure)
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/fairseq/optim/fairseq_optimizer.py", line 98, in step
          self.optimizer.step(closure)
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/fairseq/optim/nag.py", line 68, in step
          lr_correct = lr / lr_old
      ZeroDivisionError: float division by zero
      ```
      This happens because `num_updates=0` on the first iteration, so the `lr` we set on the optimizer is zero.
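      A minimal sketch of the guard (a hypothetical helper, not the actual patch): keep the warmup factor strictly positive at `num_updates=0` so the optimizer never receives a zero `lr`.
      
      ```
      def warmup_lr(base_lr, num_updates, warmup_updates):
          # Hypothetical helper: num_updates + 1 keeps the factor > 0 at
          # num_updates == 0, so the lr / lr_old division in nag.py can
          # never see a zero old learning rate.
          if warmup_updates > 0 and num_updates < warmup_updates:
              return base_lr * float(num_updates + 1) / warmup_updates
          return base_lr
      ```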
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1408
      
      Differential Revision: D18637526
      
      Pulled By: myleott
      
      fbshipit-source-id: fdd81dd69b1b38bc21a4fa315b4e25cee03af6bf
    • added instructions to FT bart on cnn-dm · 226c1f48
      ngoyal2707 authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/922
      
      Differential Revision: D18617322
      
      fbshipit-source-id: 50645197cb7f075b5f878818a97358653077c3e0
    • Refactor data sharding to be specified via caller of task rather than task itself · 99fbd317
      Alex Xiao authored
      Summary: Modifying the number of shards internally to disable data sharding for batch iteration is dangerous, because the callers of these tasks are not limited to fairseq/train. We should therefore put the onus of sharding the data properly on the caller rather than on the task itself.
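      As a sketch of the new contract (built on fairseq's `get_batch_iterator` signature; the `args` fields here are assumptions), the caller passes the sharding parameters explicitly instead of the task mutating its own shard count:
      
      ```
      def build_train_iterator(task, args):
          # The caller decides how data is sharded across workers by
          # passing num_shards / shard_id; the task no longer changes
          # its shard count internally.
          return task.get_batch_iterator(
              dataset=task.dataset("train"),
              max_tokens=args.max_tokens,
              num_shards=args.distributed_world_size,  # one shard per worker
              shard_id=args.distributed_rank,          # this worker's slice
          )
      ```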
      
      Reviewed By: myleott
      
      Differential Revision: D18456424
      
      fbshipit-source-id: d46be16c441c50082f9a768d0b259e6c28a4b67b
  2. 20 Nov, 2019 1 commit
  3. 19 Nov, 2019 3 commits
  4. 18 Nov, 2019 3 commits
  5. 17 Nov, 2019 1 commit
  6. 15 Nov, 2019 1 commit
  7. 14 Nov, 2019 4 commits
  8. 13 Nov, 2019 5 commits
  9. 12 Nov, 2019 1 commit
    • More thorough support for iterable datasets · 2a9b4ec2
      Spencer Poff authored
      Summary: Use PyTorch IterableDataset for streaming iterators, so that there is a clean interface distinction between datasets that stream their data and those that support indexed access.
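      A self-contained sketch of the split (class names are illustrative, not fairseq's): streaming data subclasses `torch.utils.data.IterableDataset` and implements only `__iter__`, while indexed data keeps `__len__`/`__getitem__`.
      
      ```
      from torch.utils.data import Dataset, IterableDataset
      
      class StreamingTextDataset(IterableDataset):
          """Streaming: samples arrive in order; no random access."""
          def __init__(self, source):
              self.source = source  # any iterable, e.g. a file handle
      
          def __iter__(self):
              return iter(self.source)
      
      class IndexedTextDataset(Dataset):
          """Indexed: supports len() and access by position."""
          def __init__(self, samples):
              self.samples = list(samples)
      
          def __len__(self):
              return len(self.samples)
      
          def __getitem__(self, idx):
              return self.samples[idx]
      ```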
      
      Reviewed By: myleott
      
      Differential Revision: D18438694
      
      fbshipit-source-id: 482857d8357091ea2a6bf819535b09ba7f1a5b7d
  10. 10 Nov, 2019 1 commit
  11. 09 Nov, 2019 1 commit
  12. 08 Nov, 2019 2 commits
    • Move fb_pathmgr registration out of train.py · e98bf7e6
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/903
      
      Reviewed By: sujitoc
      
      Differential Revision: D18327653
      
      fbshipit-source-id: 739ddbaf54862acdf7b4f1bc3ad538bde5ae00fd
    • Fix LevT edge cases · e9171ce1
      Xian Li authored
      Summary:
      Avoid the case where `can_ins_mask` is all False, so that `max_lengths` has size [0, 1], which breaks the `expand_as` operator. Move the computation back into the skipping branch in the scripted code.
      
      The same applies to deletion and `ins_word`.
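      Schematically (a hedged sketch with assumed names and shapes, not the exact LevT code), the fix keeps the `expand_as` call behind the guard:
      
      ```
      import torch
      
      def clamp_insertions(scores, can_ins_mask, max_lengths):
          # scores: [batch, length]; can_ins_mask: [batch] bool;
          # max_lengths: [num_selected, 1]
          if not can_ins_mask.any():
              return scores  # skipping branch: max_lengths would be [0, 1]
          selected = scores[can_ins_mask]           # [k, length] with k > 0
          bounds = max_lengths.expand_as(selected)  # safe now
          out = scores.clone()
          out[can_ins_mask] = torch.min(selected, bounds)
          return out
      ```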
      
      Reviewed By: kahne
      
      Differential Revision: D18365340
      
      fbshipit-source-id: 509ac21d7d6fd9083d0710697288203977314c52
  13. 07 Nov, 2019 4 commits
  14. 06 Nov, 2019 2 commits
  15. 05 Nov, 2019 2 commits
    • XLM-R code and model release (#900) · e23e5eaa
      ngoyal2707 authored
      Summary:
      TODO:
      1) Need to update bibtex entry
      2) Need to upload models, spm_vocab and dict.txt to public s3 location.
      
      For Future:
      
      1) I will probably add instructions for fine-tuning on XNLI, NER, POS, etc., but there is currently no timeline for that (a usage sketch follows below).
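      Once the checkpoints are uploaded, usage should follow the standard fairseq hub flow (a sketch; the exact hub names depend on the release):
      
      ```
      import torch
      
      # assumes the released checkpoint is registered with torch.hub
      xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')
      xlmr.eval()
      
      tokens = xlmr.encode('Hello world!')      # sentencepiece BPE + dict
      features = xlmr.extract_features(tokens)  # [1, seq_len, hidden_dim]
      ```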
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/900
      
      Reviewed By: myleott
      
      Differential Revision: D18333076
      
      Pulled By: myleott
      
      fbshipit-source-id: 3f3d3716fcc41c78d2dd4525f60b519abbd0459c
    • Fixing key padding mask during transformer generation · 68dd3e17
      Spencer Poff authored
      Summary:
      https://github.com/pytorch/fairseq/pull/1097 added key_padding_mask history to TransformerDecoderLayer, but in an edge case where only the current or only the previous key_padding_mask exists, the resulting key_padding_mask has the wrong size.
      
      This diff adds empty columns in such cases to ensure key_padding_mask has a usable size.
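      In sketch form (an assumed helper, not the exact fairseq code), all-False columns stand in for whichever mask is missing before the two are concatenated:
      
      ```
      import torch
      
      def combine_key_padding_masks(prev_mask, curr_mask,
                                    batch_size, prev_len, curr_len):
          # False means "not padded", so zero-filled placeholders make the
          # combined mask cover prev_len + curr_len keys without masking
          # anything extra.
          if prev_mask is None and curr_mask is None:
              return None
          if prev_mask is None:
              prev_mask = torch.zeros(batch_size, prev_len, dtype=torch.bool)
          if curr_mask is None:
              curr_mask = torch.zeros(batch_size, curr_len, dtype=torch.bool)
          return torch.cat([prev_mask, curr_mask], dim=1)
      ```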
      
      Reviewed By: myleott
      
      Differential Revision: D18224313
      
      fbshipit-source-id: c9fb7266baf0a2d79a66704e00a5ea8bd2987ff6
  16. 02 Nov, 2019 1 commit
  17. 01 Nov, 2019 2 commits
  18. 31 Oct, 2019 2 commits
  19. 30 Oct, 2019 1 commit
    • layer drop · 856d8b82
      Xian Li authored
      Summary: This diff enables LayerDrop in the transformer decoder in the production training pipeline (ptt_transformer). It builds on top of the fairseq implementation (D18094657) added by Angela Fan, and adds additional logic to drop the corresponding layers at test time in the exported model.
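      A hedged sketch of the training-time behavior (the ptt_transformer export logic is not reproduced here): each layer in the stack is skipped with probability p while training, and an exported model can simply omit the dropped layers.
      
      ```
      import torch
      import torch.nn as nn
      
      class LayerDropStack(nn.Module):
          """Illustrative LayerDrop wrapper: skip each layer with
          probability p during training; run every layer at eval time."""
          def __init__(self, layers, p=0.2):
              super().__init__()
              self.layers = nn.ModuleList(layers)
              self.p = p
      
          def forward(self, x):
              for layer in self.layers:
                  if self.training and torch.rand(1).item() < self.p:
                      continue  # drop this layer for the current batch
                  x = layer(x)
              return x
      ```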
      
      Reviewed By: jhcross
      
      Differential Revision: D18165586
      
      fbshipit-source-id: 373ac00268a25fa9e412edcb483becdfe792d992