1. 13 Nov, 2019 4 commits
  2. 12 Nov, 2019 1 commit
    • More thorough support for iterable datasets · 2a9b4ec2
      Spencer Poff authored
      Summary: Use PyTorch IterableDataset for streaming iterators, so that there is a clean differentiation in interface between datasets that stream data and those that support indexed access.
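      As a hedged illustration (IndexedTextDataset/StreamingTextDataset are made-up names, not the fairseq classes), the interface split looks roughly like this in PyTorch, where only the map-style dataset exposes __getitem__/__len__::

          from torch.utils.data import Dataset, IterableDataset, DataLoader

          class IndexedTextDataset(Dataset):
              """Map-style: supports random (indexed) access."""
              def __init__(self, lines):
                  self.lines = lines
              def __len__(self):
                  return len(self.lines)
              def __getitem__(self, index):
                  return self.lines[index]

          class StreamingTextDataset(IterableDataset):
              """Streaming: supports only sequential iteration."""
              def __init__(self, path):
                  self.path = path
              def __iter__(self):
                  with open(self.path) as f:
                      for line in f:
                          yield line.rstrip("\n")

          # A DataLoader accepts either, but shuffling and samplers need indexed access.
          loader = DataLoader(IndexedTextDataset(["a", "b", "c"]), batch_size=2, shuffle=True)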
      
      Reviewed By: myleott
      
      Differential Revision: D18438694
      
      fbshipit-source-id: 482857d8357091ea2a6bf819535b09ba7f1a5b7d
  3. 10 Nov, 2019 1 commit
  4. 09 Nov, 2019 1 commit
  5. 08 Nov, 2019 2 commits
    • Move fb_pathmgr registration out of train.py · e98bf7e6
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/903
      
      Reviewed By: sujitoc
      
      Differential Revision: D18327653
      
      fbshipit-source-id: 739ddbaf54862acdf7b4f1bc3ad538bde5ae00fd
    • Fix LevT edge cases · e9171ce1
      Xian Li authored
      Summary:
      Avoid the case where can_ins_mask is all False and max_lengths therefore has size [0, 1], which makes the expand_as operator fail. Move the computation back into the skipping branch in the script.

      The same applies to deletion and ins_word.
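      A minimal sketch of the guard (illustrative names and shapes, not the actual LevT code): when the mask selects zero rows, the dependent expand_as is skipped instead of being run on an empty tensor::

          import torch

          can_ins_mask = torch.zeros(4, dtype=torch.bool)   # all False: no insertion slots
          max_lens = torch.full((4, 1), 10.0)                # per-sentence length budget

          if can_ins_mask.any():                             # the "skipping branch"
              selected = max_lens[can_ins_mask]              # shape [num_selected, 1]
              scores = torch.rand(int(can_ins_mask.sum()), 8)
              budget = selected.expand_as(scores)            # safe: num_selected > 0
          else:
              budget = None                                  # nothing to insert this step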
      
      Reviewed By: kahne
      
      Differential Revision: D18365340
      
      fbshipit-source-id: 509ac21d7d6fd9083d0710697288203977314c52
  6. 07 Nov, 2019 4 commits
  7. 06 Nov, 2019 2 commits
  8. 05 Nov, 2019 2 commits
    • XLM-R code and model release (#900) · e23e5eaa
      ngoyal2707 authored
      Summary:
      TODO:
      1) Need to update the bibtex entry
      2) Need to upload the models, spm_vocab, and dict.txt to a public S3 location.

      For the future:

      1) I will probably add instructions for finetuning on XNLI, NER, POS, etc., but there is currently no timeline for that.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/900
      
      Reviewed By: myleott
      
      Differential Revision: D18333076
      
      Pulled By: myleott
      
      fbshipit-source-id: 3f3d3716fcc41c78d2dd4525f60b519abbd0459c
    • Fixing key padding mask during transformer generation · 68dd3e17
      Spencer Poff authored
      Summary:
      https://github.com/pytorch/fairseq/pull/1097 added key padding mask history in TransformerDecoderLayer, but in the edge case where only the current or only the previous key_padding_mask exists, the resulting key_padding_mask is the wrong size.

      This diff adds empty columns in that case to ensure the key_padding_mask has a usable size.
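      A hedged sketch of the idea (combine_key_padding_masks is a made-up helper, not the exact fairseq code): the missing side is filled with all-False columns so the concatenated mask covers every key position::

          import torch

          def combine_key_padding_masks(prev_mask, curr_mask, batch_size, prev_len, curr_len):
              # None means "no padding recorded"; materialize it as empty (all-False) columns
              if prev_mask is None and curr_mask is None:
                  return None
              if prev_mask is None:
                  prev_mask = torch.zeros(batch_size, prev_len, dtype=torch.bool)
              if curr_mask is None:
                  curr_mask = torch.zeros(batch_size, curr_len, dtype=torch.bool)
              return torch.cat([prev_mask, curr_mask], dim=1)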
      
      Reviewed By: myleott
      
      Differential Revision: D18224313
      
      fbshipit-source-id: c9fb7266baf0a2d79a66704e00a5ea8bd2987ff6
  9. 02 Nov, 2019 1 commit
  10. 01 Nov, 2019 2 commits
  11. 31 Oct, 2019 2 commits
  12. 30 Oct, 2019 1 commit
    • layer drop · 856d8b82
      Xian Li authored
      Summary: This diff enables layer drop in the transformer decoder in the production training pipeline (ptt_transformer). It builds on top of the fairseq implementation D18094657 added by Angela Fan, and adds additional logic to handle dropping the corresponding layers at test time in the exported model.
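      A hedged sketch of the behaviour (not the ptt_transformer code itself): each layer is skipped with probability p during training, and the exported test-time model has to skip the same pruned layers explicitly::

          import torch
          import torch.nn as nn

          class LayerDropDecoder(nn.Module):
              def __init__(self, layers, layerdrop=0.2, layers_to_keep=None):
                  super().__init__()
                  self.layers = nn.ModuleList(layers)
                  self.layerdrop = layerdrop
                  self.layers_to_keep = layers_to_keep     # e.g. {0, 2, 4} in the exported model

              def forward(self, x):
                  for i, layer in enumerate(self.layers):
                      if self.training:
                          if torch.rand(1).item() < self.layerdrop:
                              continue                     # stochastically drop this layer
                      elif self.layers_to_keep is not None and i not in self.layers_to_keep:
                          continue                         # exported model skips pruned layers
                      x = layer(x)
                  return x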
      
      Reviewed By: jhcross
      
      Differential Revision: D18165586
      
      fbshipit-source-id: 373ac00268a25fa9e412edcb483becdfe792d992
  13. 28 Oct, 2019 1 commit
    • Fix LevT generator interface · 50cf3bb5
      Ning Dong authored
      Summary: Revert the interface change for iterative_refinement_generator
      
      Reviewed By: kahne
      
      Differential Revision: D18165103
      
      fbshipit-source-id: 075c276746eb90d7c359b6ad92e1ef25e8452bcc
  14. 27 Oct, 2019 1 commit
    • adding layerdrop code for training, pruning, and readme (#890) · dabbef46
      Angela Fan authored
      Summary:
      TEST 1: EVALUATION TIME WORKS
      checked
      achieves correct model perplexity: 18.68
      
      TEST 2: TRAINING NEW MODEL WORKS
      checked
      
      without layerdrop:
      --decoder-layerdrop 0 OR no flag at all
      | epoch 001:     10 / 11201 loss=27.469, nll_loss=27.469, ppl=185799477.36, wps=1764, ups=0, wpb=9216.000, bsz=3.000, num_updates=7, lr=0.0004376, gnorm=25.471, clip=1.000, oom=0.000, loss_scale=8.000, wall=37, train_wall=30
      | epoch 001:     20 / 11201 loss=27.443, nll_loss=27.443, ppl=182500427.22, wps=2449, ups=0, wpb=9216.000, bsz=3.000, num_updates=17, lr=0.0010626, gnorm=25.273, clip=1.000, oom=0.000, loss_scale=8.000, wall=64, train_wall=57
      | epoch 001:     30 / 11201 loss=27.404, nll_loss=27.404, ppl=177612215.78, wps=2720, ups=0, wpb=9216.000, bsz=3.000, num_updates=27, lr=0.0016876, gnorm=25.136, clip=1.000, oom=0.000, loss_scale=8.000, wall=91, train_wall=84
      | epoch 001:     40 / 11201 loss=27.009, nll_loss=27.009, ppl=135079983.00, wps=2865, ups=0, wpb=9216.000, bsz=3.000, num_updates=37, lr=0.0023126, gnorm=24.311, clip=1.000, oom=0.000, loss_scale=8.000, wall=119, train_wall=112
      | epoch 001:     50 / 11201 loss=26.418, nll_loss=26.418, ppl=89680259.41, wps=2952, ups=0, wpb=9216.000, bsz=3.000, num_updates=47, lr=0.0029376, gnorm=22.775, clip=1.000, oom=0.000, loss_scale=8.000, wall=147, train_wall=140
      
      with layerdrop (regularization effect should be seen in PPL):
      --decoder-layerdrop 0.2
      
      | epoch 001:     10 / 11201 loss=25.186, nll_loss=25.186, ppl=38182937.27, wps=2428, ups=0, wpb=9216.000, bsz=3.000, num_updates=8, lr=0.0005001, gnorm=17.082, clip=1.000, oom=0.000, loss_scale=16.000, wall=30, train_wall=24
      | epoch 001:     20 / 11201 loss=25.270, nll_loss=25.270, ppl=40451933.50, wps=3173, ups=0, wpb=9216.000, bsz=3.000, num_updates=18, lr=0.0011251, gnorm=17.162, clip=1.000, oom=0.000, loss_scale=16.000, wall=52, train_wall=45
      | epoch 001:     30 / 11201 loss=25.349, nll_loss=25.349, ppl=42752256.68, wps=3454, ups=0, wpb=9216.000, bsz=3.000, num_updates=28, lr=0.0017501, gnorm=17.370, clip=1.000, oom=0.000, loss_scale=16.000, wall=75, train_wall=68
      | epoch 001:     40 / 11201 loss=25.115, nll_loss=25.115, ppl=36343806.30, wps=3619, ups=0, wpb=9216.000, bsz=3.000, num_updates=38, lr=0.0023751, gnorm=16.945, clip=1.000, oom=0.000, loss_scale=16.000, wall=97, train_wall=90
      | epoch 001:     50 / 11201 loss=24.804, nll_loss=24.804, ppl=29284345.78, wps=3716, ups=0, wpb=9216.000, bsz=3.000, num_updates=48, lr=0.0030001, gnorm=16.406, clip=1.000, oom=0.000, loss_scale=16.000, wall=119, train_wall=112
      
      TEST 3: PICKING UP TRAINING FROM EXISTING MODEL
      checked
      
      | loaded checkpoint /checkpoint/angelafan/structured_0.1_block_8_sd02/checkpoint_last.pt (epoch 272 @ 381066 updates)
      | loading train data for epoch 272
      | loaded 1801350 examples from: /private/home/angelafan/lm_work/fairseq-py/data-bin/wikitext-103/train
      
      TEST 4: EVALUATING EXISTING BERT MODEL REPROS RESULTS
      | [input] dictionary: 50265 types
      | [label] dictionary: 9 types
      | Accuracy:  0.9231651376146789
      achieves correct accuracy on SST2 for this model
      
      TEST 5: TRAINING NEW BERT MODEL WORKS
      checked and works
      
      TEST 6: NMT
      
      without layerdrop
      --encoder-layerdrop 0 --decoder-layerdrop 0 OR combinations of flag specified and not specified
      
      | epoch 001:     10 / 92203 loss=15.820, nll_loss=15.830, ppl=58267.93, wps=4902, ups=0, wpb=1477.818, bsz=51.636, num_updates=11, lr=1.47473e-06, gnorm=7.207, clip=0.000, oom=0.000, loss_scale=128.000, wall=60, train_wall=3
      | epoch 001:     20 / 92203 loss=15.523, nll_loss=15.501, ppl=46359.29, wps=5037, ups=0, wpb=1496.476, bsz=45.333, num_updates=21, lr=2.72448e-06, gnorm=6.869, clip=0.000, oom=0.000, loss_scale=128.000, wall=63, train_wall=6
      | epoch 001:     30 / 92203 loss=15.185, nll_loss=15.123, ppl=35695.79, wps=5085, ups=0, wpb=1519.355, bsz=44.645, num_updates=31, lr=3.97423e-06, gnorm=6.186, clip=0.000, oom=0.000, loss_scale=128.000, wall=66, train_wall=9
      | epoch 001:     40 / 92203 loss=14.940, nll_loss=14.849, ppl=29505.60, wps=5116, ups=1, wpb=1521.244, bsz=42.927, num_updates=41, lr=5.22398e-06, gnorm=5.610, clip=0.000, oom=0.000, loss_scale=128.000, wall=69, train_wall=12
      | epoch 001:     50 / 92203 loss=14.745, nll_loss=14.630, ppl=25346.87, wps=5070, ups=1, wpb=1507.961, bsz=41.725, num_updates=51, lr=6.47373e-06, gnorm=5.104, clip=0.000, oom=0.000, loss_scale=128.000, wall=71, train_wall=15
      
      with layerdrop (regularization effect should be seen in PPL)
      
      A) works with --encoder-layerdrop 0.2 --decoder-layerdrop 0.2
      B) works with different settings --encoder-layerdrop 0.3 --decoder-layerdrop 0.5
      C) works with one on and one off --encoder-layerdrop 0.2 --decoder-layerdrop 0
      
      | epoch 001:     10 / 92203 loss=15.817, nll_loss=15.828, ppl=58158.54, wps=5355, ups=0, wpb=1477.818, bsz=51.636, num_updates=11, lr=1.47473e-06, gnorm=6.959, clip=0.000, oom=0.000, loss_scale=128.000, wall=59, train_wall=3
      | epoch 001:     20 / 92203 loss=15.650, nll_loss=15.641, ppl=51111.63, wps=5515, ups=0, wpb=1496.476, bsz=45.333, num_updates=21, lr=2.72448e-06, gnorm=6.825, clip=0.000, oom=0.000, loss_scale=128.000, wall=61, train_wall=6
      | epoch 001:     30 / 92203 loss=15.440, nll_loss=15.408, ppl=43491.58, wps=5602, ups=0, wpb=1519.355, bsz=44.645, num_updates=31, lr=3.97423e-06, gnorm=6.576, clip=0.000, oom=0.000, loss_scale=128.000, wall=64, train_wall=8
      | epoch 001:     40 / 92203 loss=15.247, nll_loss=15.193, ppl=37457.14, wps=5676, ups=1, wpb=1521.244, bsz=42.927, num_updates=41, lr=5.22398e-06, gnorm=6.124, clip=0.000, oom=0.000, loss_scale=128.000, wall=67, train_wall=11
      | epoch 001:     50 / 92203 loss=15.055, nll_loss=14.977, ppl=32259.92, wps=5598, ups=1, wpb=1507.961, bsz=41.725, num_updates=51, lr=6.47373e-06, gnorm=5.661, clip=0.000, oom=0.000, loss_scale=128.000, wall=69, train_wall=14
      
      TEST 7: PRUNING TEST CASES (a sketch of the pruning idea follows after this summary)
      
      A) after adding the pruning flags, model can evaluate as a full model
      checked, reaches correct PPL
      num. model params: 246933504
      | Evaluated 217646 tokens in 196.3s (1108.99 tokens/s)
      | Loss: 2.9275, Perplexity: 18.68
      
      B) after adding pruning flags, model can be pruned. this works with multiple flag settings
      checked three cases:
      num. model params: 146163712
      | Evaluated 217646 tokens in 106.0s (2054.07 tokens/s)
      | Loss: 3.0932, Perplexity: 22.05
      
      num. model params: 209144832
      | Evaluated 217646 tokens in 162.8s (1336.99 tokens/s)
      | Loss: 2.9526, Perplexity: 19.16
      
      C) model can pick up training if you want to finetune the pruned model
      checked:
      | loading train data for epoch 272
      | loaded 1801350 examples from: /private/home/angelafan/lm_work/fairseq-py/data-bin/wikitext-103/train
      | WARNING: overflow detected, setting loss scale to: 64.0
      | WARNING: overflow detected, setting loss scale to: 32.0
      | epoch 272:   1500 / 5601 loss=5.015, nll_loss=5.015, ppl=32.33, wps=11598, ups=1, wpb=18432.000, bsz=6.000, num_updates=98, lr=0.0061251, gnorm=0.613, clip=1.000, oom=0.000, loss_scale=32.000, wall=156, train_wall=252396
      
      D) works with BERT
      checked:
      without specifying any flags, reproduces the correct standard accuracy
      with flags, produces the correct pruned accuracy
      
      | [input] dictionary: 50265 types
      | [label] dictionary: 9 types
      | Accuracy:  0.9231651376146789
      
      | [input] dictionary: 50265 types
      | [label] dictionary: 9 types
      | Pruning model to specified layer configuration - this works best if the model was trained with LayerDrop
      | Accuracy:  0.9220183486238532
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/890
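      A hedged sketch of the pruning idea exercised in TEST 7 above (prune_layers is an illustrative helper, not the fairseq implementation or its flags): a model trained with LayerDrop can be evaluated with only a subset of its layers by rebuilding the layer list before inference::

          import torch.nn as nn

          def prune_layers(all_layers, layers_to_keep):
              """Keep only the listed layers when evaluating a pruned model."""
              keep = set(layers_to_keep)
              return nn.ModuleList([layer for i, layer in enumerate(all_layers) if i in keep])

          # e.g. keep every other layer of a decoder trained with layerdrop
          # decoder.layers = prune_layers(decoder.layers, range(0, 16, 2))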
      
      Reviewed By: edunov
      
      Differential Revision: D18094657
      
      Pulled By: huihuifan
      
      fbshipit-source-id: 2bbaa2ff0039e906782694fc2038b8c17a8693e7
  15. 26 Oct, 2019 1 commit
    • fix a type mismatch in NAT quantization run · eb68afca
      Xian Li authored
      Summary:
      Fix a type mismatch that was found after patching NAT on top of quantization.
      Ning suggested this fix. We still need to understand why it only appears after patching the quantization diff.
      
      Reviewed By: kahne, jhcross
      
      Differential Revision: D18147726
      
      fbshipit-source-id: a51becc9ad58a637a0180074eaa2b46990ab9f84
  16. 25 Oct, 2019 2 commits
  17. 24 Oct, 2019 4 commits
  18. 23 Oct, 2019 1 commit
    • Add warmup support in reduce_on_plateau lr schedule · 8defa9d9
      Yilei Li authored
      Summary:
      Enables the reduce_on_plateau schedule with an optional warmup phase, where we linearly increase the learning rate from some initial learning rate (``--warmup-init-lr``) until the configured learning rate (``--lr``). Thereafter the lr is adjusted according to the original reduce_on_plateau scheme.
      During warmup::
      
            lrs = torch.linspace(args.warmup_init_lr, args.lr, args.warmup_updates)
            lr = lrs[update_num]
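
      A hedged sketch of the full schedule (make_lr_step is an illustrative helper, not the fairseq scheduler itself): linear warmup as above, then hand control to torch.optim.lr_scheduler.ReduceLROnPlateau::

          import torch

          def make_lr_step(optimizer, warmup_init_lr, lr, warmup_updates):
              lrs = torch.linspace(warmup_init_lr, lr, warmup_updates)
              plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=3)

              def step(update_num, val_loss=None):
                  if update_num < warmup_updates:
                      for group in optimizer.param_groups:
                          group["lr"] = lrs[update_num].item()   # linear warmup
                  elif val_loss is not None:
                      plateau.step(val_loss)                     # original reduce_on_plateau behaviour
              return step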
      
      Reviewed By: yqwangustc
      
      Differential Revision: D17779925
      
      fbshipit-source-id: c3bfb3321c76850824fc42df4fac4e5dcf73fbf8
  19. 22 Oct, 2019 3 commits
  20. 20 Oct, 2019 2 commits
    • Enable separate models for insertion and deletion; · 66d24dc2
      Jiatao Gu authored
      Summary:
      The diff contains two fixes:
      (1) enabling non-shared decoder layers for deletion/insertion
      (2) adding options to perform sampling instead of argmax when learning deletion (a minimal sketch follows below)
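      A minimal sketch of the difference between the two decision rules (illustrative shapes, not the LevT code)::

          import torch
          import torch.nn.functional as F

          word_del_logits = torch.randn(2, 6, 2)       # [batch, length, {keep, delete}]

          # argmax: always take the most likely decision
          del_argmax = word_del_logits.argmax(dim=-1)

          # sampling: draw the decision from the predicted distribution instead
          probs = F.softmax(word_del_logits, dim=-1)
          del_sampled = torch.multinomial(probs.view(-1, 2), num_samples=1).view(2, 6)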
      
      Reviewed By: kahne
      
      Differential Revision: D18011220
      
      fbshipit-source-id: c60815fb7bc3a0004c81249504f7a641536ae2d8
    • Fix typos on Examples for Nonautoregressive translation · a3c629b5
      Jiatao Gu authored
      Summary: Fix typos in the examples
      
      Reviewed By: kahne
      
      Differential Revision: D18030097
      
      fbshipit-source-id: 84f0cbafd85e50ffd5033738835373935e3b83d4
  21. 18 Oct, 2019 2 commits