1. 02 Dec, 2019 1 commit
  2. 26 Nov, 2019 2 commits
  3. 21 Nov, 2019 1 commit
  4. 19 Nov, 2019 1 commit
  5. 18 Nov, 2019 1 commit
  6. 17 Nov, 2019 1 commit
  7. 14 Nov, 2019 1 commit
  8. 10 Nov, 2019 1 commit
  9. 09 Nov, 2019 1 commit
  10. 07 Nov, 2019 2 commits
  11. 06 Nov, 2019 1 commit
  12. 05 Nov, 2019 1 commit
    • XLM-R code and model release (#900) · e23e5eaa
      ngoyal2707 authored
      Summary:
      TODO:
      1) Need to update the BibTeX entry
      2) Need to upload the models, spm_vocab, and dict.txt to a public S3 location.
      
      For Future:
      
      1) I will probably add instructions for fine-tuning on XNLI, NER, POS tagging, etc., but there is currently no timeline for that (a usage sketch follows below).
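
      For reference, a minimal usage sketch, assuming the released model is exposed through fairseq's torch.hub interface under a name like `xlmr.large` (the exact hub name, checkpoint location, and method names follow fairseq's RoBERTa-style hub interface and are assumptions, not confirmed by this commit):

      ```python
      # Hypothetical usage sketch for the released XLM-R model; the hub name
      # 'xlmr.large' and the method names mirror fairseq's RoBERTa-style hub
      # interface and may differ for this particular release.
      import torch

      xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')
      xlmr.eval()  # disable dropout for deterministic feature extraction

      # SentencePiece-based encoding shared across all languages
      tokens = xlmr.encode('Bonjour le monde !')

      # Final-layer representations: (batch, sequence, hidden)
      features = xlmr.extract_features(tokens)
      print(features.shape)
      ```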
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/900
      
      Reviewed By: myleott
      
      Differential Revision: D18333076
      
      Pulled By: myleott
      
      fbshipit-source-id: 3f3d3716fcc41c78d2dd4525f60b519abbd0459c
  13. 02 Nov, 2019 1 commit
  14. 27 Oct, 2019 1 commit
    • adding layerdrop code for training, pruning, and readme (#890) · dabbef46
      Angela Fan authored
      Summary:
      TEST 1: EVALUATION TIME WORKS
      checked
      achieves correct model perplexity: 18.68
      
      TEST 2: TRAINING NEW MODEL WORKS
      checked
      
      without layerdrop:
      --decoder-layerdrop 0 OR no flag at all
      | epoch 001:     10 / 11201 loss=27.469, nll_loss=27.469, ppl=185799477.36, wps=1764, ups=0, wpb=9216.000, bsz=3.000, num_updates=7, lr=0.0004376, gnorm=25.471, clip=1.000, oom=0.000, loss_scale=8.000, wall=37, train_wall=30
      | epoch 001:     20 / 11201 loss=27.443, nll_loss=27.443, ppl=182500427.22, wps=2449, ups=0, wpb=9216.000, bsz=3.000, num_updates=17, lr=0.0010626, gnorm=25.273, clip=1.000, oom=0.000, loss_scale=8.000, wall=64, train_wall=57
      | epoch 001:     30 / 11201 loss=27.404, nll_loss=27.404, ppl=177612215.78, wps=2720, ups=0, wpb=9216.000, bsz=3.000, num_updates=27, lr=0.0016876, gnorm=25.136, clip=1.000, oom=0.000, loss_scale=8.000, wall=91, train_wall=84
      | epoch 001:     40 / 11201 loss=27.009, nll_loss=27.009, ppl=135079983.00, wps=2865, ups=0, wpb=9216.000, bsz=3.000, num_updates=37, lr=0.0023126, gnorm=24.311, clip=1.000, oom=0.000, loss_scale=8.000, wall=119, train_wall=112
      | epoch 001:     50 / 11201 loss=26.418, nll_loss=26.418, ppl=89680259.41, wps=2952, ups=0, wpb=9216.000, bsz=3.000, num_updates=47, lr=0.0029376, gnorm=22.775, clip=1.000, oom=0.000, loss_scale=8.000, wall=147, train_wall=140
      
      with layerdrop (a regularization effect should be visible in the PPL; a minimal sketch of the mechanism follows the logs below):
      --decoder-layerdrop 0.2
      
      | epoch 001:     10 / 11201 loss=25.186, nll_loss=25.186, ppl=38182937.27, wps=2428, ups=0, wpb=9216.000, bsz=3.000, num_updates=8, lr=0.0005001, gnorm=17.082, clip=1.000, oom=0.000, loss_scale=16.000, wall=30, train_wall=24
      | epoch 001:     20 / 11201 loss=25.270, nll_loss=25.270, ppl=40451933.50, wps=3173, ups=0, wpb=9216.000, bsz=3.000, num_updates=18, lr=0.0011251, gnorm=17.162, clip=1.000, oom=0.000, loss_scale=16.000, wall=52, train_wall=45
      | epoch 001:     30 / 11201 loss=25.349, nll_loss=25.349, ppl=42752256.68, wps=3454, ups=0, wpb=9216.000, bsz=3.000, num_updates=28, lr=0.0017501, gnorm=17.370, clip=1.000, oom=0.000, loss_scale=16.000, wall=75, train_wall=68
      | epoch 001:     40 / 11201 loss=25.115, nll_loss=25.115, ppl=36343806.30, wps=3619, ups=0, wpb=9216.000, bsz=3.000, num_updates=38, lr=0.0023751, gnorm=16.945, clip=1.000, oom=0.000, loss_scale=16.000, wall=97, train_wall=90
      | epoch 001:     50 / 11201 loss=24.804, nll_loss=24.804, ppl=29284345.78, wps=3716, ups=0, wpb=9216.000, bsz=3.000, num_updates=48, lr=0.0030001, gnorm=16.406, clip=1.000, oom=0.000, loss_scale=16.000, wall=119, train_wall=112
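
      As a rough illustration of the mechanism behind these flags (a sketch only, not the fairseq implementation): during training, each layer in the stack is skipped with probability p, which regularizes the model and makes it robust to pruning whole layers later.

      ```python
      # Sketch of the LayerDrop idea behind --decoder-layerdrop p (illustration
      # only, not the fairseq implementation): each layer is skipped with
      # probability p per forward pass during training.
      import torch
      import torch.nn as nn

      class LayerDropStack(nn.Module):
          def __init__(self, layers, p=0.2):
              super().__init__()
              self.layers = nn.ModuleList(layers)
              self.p = p  # per-layer drop probability

          def forward(self, x):
              for layer in self.layers:
                  if self.training and torch.rand(1).item() < self.p:
                      continue  # drop this layer for the current batch
                  x = layer(x)
              return x

      stack = LayerDropStack([nn.Linear(16, 16) for _ in range(8)], p=0.2)
      out = stack(torch.randn(4, 16))  # in train() mode, ~20% of layers are skipped
      ```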
      
      TEST 3: PICKING UP TRAINING FROM EXISTING MODEL
      checked
      
      | loaded checkpoint /checkpoint/angelafan/structured_0.1_block_8_sd02/checkpoint_last.pt (epoch 272 @ 381066 updates)
      | loading train data for epoch 272
      | loaded 1801350 examples from: /private/home/angelafan/lm_work/fairseq-py/data-bin/wikitext-103/train
      
      TEST 4: EVALUATING AN EXISTING BERT MODEL REPRODUCES RESULTS
      | [input] dictionary: 50265 types
      | [label] dictionary: 9 types
      | Accuracy:  0.9231651376146789
      achieves the correct accuracy on SST-2 for this model
      
      TEST 5: TRAINING NEW BERT MODEL WORKS
      checked and works
      
      TEST 6: NMT
      
      without layerdrop
      --encoder-layerdrop 0 --decoder-layerdrop 0 OR various combinations of the flags specified and unspecified
      
      | epoch 001:     10 / 92203 loss=15.820, nll_loss=15.830, ppl=58267.93, wps=4902, ups=0, wpb=1477.818, bsz=51.636, num_updates=11, lr=1.47473e-06, gnorm=7.207, clip=0.000, oom=0.000, loss_scale=128.000, wall=60, train_wall=3
      | epoch 001:     20 / 92203 loss=15.523, nll_loss=15.501, ppl=46359.29, wps=5037, ups=0, wpb=1496.476, bsz=45.333, num_updates=21, lr=2.72448e-06, gnorm=6.869, clip=0.000, oom=0.000, loss_scale=128.000, wall=63, train_wall=6
      | epoch 001:     30 / 92203 loss=15.185, nll_loss=15.123, ppl=35695.79, wps=5085, ups=0, wpb=1519.355, bsz=44.645, num_updates=31, lr=3.97423e-06, gnorm=6.186, clip=0.000, oom=0.000, loss_scale=128.000, wall=66, train_wall=9
      | epoch 001:     40 / 92203 loss=14.940, nll_loss=14.849, ppl=29505.60, wps=5116, ups=1, wpb=1521.244, bsz=42.927, num_updates=41, lr=5.22398e-06, gnorm=5.610, clip=0.000, oom=0.000, loss_scale=128.000, wall=69, train_wall=12
      | epoch 001:     50 / 92203 loss=14.745, nll_loss=14.630, ppl=25346.87, wps=5070, ups=1, wpb=1507.961, bsz=41.725, num_updates=51, lr=6.47373e-06, gnorm=5.104, clip=0.000, oom=0.000, loss_scale=128.000, wall=71, train_wall=15
      
      with layerdrop (a regularization effect should be visible in the PPL)
      
      A) works with --encoder-layerdrop 0.2 --decoder-layerdrop 0.2
      B) works with different settings --encoder-layerdrop 0.3 --decoder-layerdrop 0.5
      C) works with one on and one off --encoder-layerdrop 0.2 --decoder-layerdrop 0
      
      | epoch 001:     10 / 92203 loss=15.817, nll_loss=15.828, ppl=58158.54, wps=5355, ups=0, wpb=1477.818, bsz=51.636, num_updates=11, lr=1.47473e-06, gnorm=6.959, clip=0.000, oom=0.000, loss_scale=128.000, wall=59, train_wall=3
      | epoch 001:     20 / 92203 loss=15.650, nll_loss=15.641, ppl=51111.63, wps=5515, ups=0, wpb=1496.476, bsz=45.333, num_updates=21, lr=2.72448e-06, gnorm=6.825, clip=0.000, oom=0.000, loss_scale=128.000, wall=61, train_wall=6
      | epoch 001:     30 / 92203 loss=15.440, nll_loss=15.408, ppl=43491.58, wps=5602, ups=0, wpb=1519.355, bsz=44.645, num_updates=31, lr=3.97423e-06, gnorm=6.576, clip=0.000, oom=0.000, loss_scale=128.000, wall=64, train_wall=8
      | epoch 001:     40 / 92203 loss=15.247, nll_loss=15.193, ppl=37457.14, wps=5676, ups=1, wpb=1521.244, bsz=42.927, num_updates=41, lr=5.22398e-06, gnorm=6.124, clip=0.000, oom=0.000, loss_scale=128.000, wall=67, train_wall=11
      | epoch 001:     50 / 92203 loss=15.055, nll_loss=14.977, ppl=32259.92, wps=5598, ups=1, wpb=1507.961, bsz=41.725, num_updates=51, lr=6.47373e-06, gnorm=5.661, clip=0.000, oom=0.000, loss_scale=128.000, wall=69, train_wall=14
      
      TEST 7: PRUNING TESTCASES
      
      A) after adding the pruning flags, the model can still be evaluated as a full model
      checked, reaches the correct PPL
      num. model params: 246933504
      | Evaluated 217646 tokens in 196.3s (1108.99 tokens/s)
      | Loss: 2.9275, Perplexity: 18.68
      
      B) after adding the pruning flags, the model can be pruned; this works with multiple flag settings (a minimal pruning sketch follows after case D below)
      checked three cases:
      num. model params: 146163712
      | Evaluated 217646 tokens in 106.0s (2054.07 tokens/s)
      | Loss: 3.0932, Perplexity: 22.05
      
      num. model params: 209144832
      | Evaluated 217646 tokens in 162.8s (1336.99 tokens/s)
      | Loss: 2.9526, Perplexity: 19.16
      
      C) the model can resume training if you want to fine-tune the pruned model
      checked:
      | loading train data for epoch 272
      | loaded 1801350 examples from: /private/home/angelafan/lm_work/fairseq-py/data-bin/wikitext-103/train
      | WARNING: overflow detected, setting loss scale to: 64.0
      | WARNING: overflow detected, setting loss scale to: 32.0
      | epoch 272:   1500 / 5601 loss=5.015, nll_loss=5.015, ppl=32.33, wps=11598, ups=1, wpb=18432.000, bsz=6.000, num_updates=98, lr=0.0061251, gnorm=0.613, clip=1.000, oom=0.000, loss_scale=32.000, wall=156, train_wall=252396
      
      D) works with BERT
      checked:
      without specifying any flags, it reproduces the correct standard accuracy
      with the flags, it produces the correct pruned accuracy
      
      | [input] dictionary: 50265 types
      | [label] dictionary: 9 types
      | Accuracy:  0.9231651376146789
      
      | [input] dictionary: 50265 types
      | [label] dictionary: 9 types
      | Pruning model to specified layer configuration - this works best if the model was trained with LayerDrop
      | Accuracy:  0.9220183486238532
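
      The pruning flags select which trained layers to keep at inference time. A minimal sketch of the idea (illustration only, not the fairseq implementation; the keep-every-other-layer pattern is just an example):

      ```python
      # Illustration of pruning a LayerDrop-trained stack at inference time:
      # keep a subset of the trained layers and drop the rest. Models trained
      # with LayerDrop tolerate this with only a small loss in quality.
      import torch.nn as nn

      def prune_layers(layers: nn.ModuleList, keep_every: int = 2) -> nn.ModuleList:
          """Return a new ModuleList containing every `keep_every`-th layer."""
          kept = [layer for i, layer in enumerate(layers) if i % keep_every == 0]
          return nn.ModuleList(kept)

      pruned = prune_layers(nn.ModuleList([nn.Linear(8, 8) for _ in range(16)]), keep_every=2)
      ```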
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/890
      
      Reviewed By: edunov
      
      Differential Revision: D18094657
      
      Pulled By: huihuifan
      
      fbshipit-source-id: 2bbaa2ff0039e906782694fc2038b8c17a8693e7
  15. 20 Oct, 2019 1 commit
  16. 18 Oct, 2019 1 commit
  17. 10 Oct, 2019 2 commits
    • Add ctc loss to ASR task (#1233) · c4893ca6
      Dmytro Okhonko authored
      Summary:
      Adds CTC loss and the corresponding Transformer-based CTC models.
      
      Tested with
      `CUDA_VISIBLE_DEVICES=0 python train.py $DATA_PATH --save-dir $SAVE_DIR --max-epoch 30 --task speech_recognition --arch vggtransformer_enc_1 --optimizer adadelta --lr 1.0 --adadelta-eps 1e-8 --adadelta-rho 0.95 --clip-norm 10.0  --max-tokens 10000 --log-format json --log-interval 1 --criterion ctc_loss --user-dir examples/speech_recognition/ --validate-interval=10`
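
      For context, the objective behind the new `ctc_loss` criterion can be illustrated with PyTorch's built-in `nn.CTCLoss` (a sketch on dummy shapes, not the fairseq criterion itself):

      ```python
      # Rough illustration of the CTC objective on dummy tensors, using
      # PyTorch's built-in nn.CTCLoss; not the fairseq ctc_loss criterion.
      import torch
      import torch.nn as nn

      T, N, C = 50, 4, 32  # time steps, batch size, vocab size (index 0 = blank)
      log_probs = torch.randn(T, N, C).log_softmax(dim=-1)
      targets = torch.randint(1, C, (N, 12), dtype=torch.long)  # label sequences
      input_lengths = torch.full((N,), T, dtype=torch.long)
      target_lengths = torch.full((N,), 12, dtype=torch.long)

      ctc = nn.CTCLoss(blank=0, zero_infinity=True)
      loss = ctc(log_probs, targets, input_lengths, target_lengths)
      print(loss.item())
      ```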
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1233
      
      Reviewed By: jcai1
      
      Differential Revision: D17856824
      
      Pulled By: okhonko
      
      fbshipit-source-id: f3eac64d3fdd0c37cf8c539dd360cfb610d8a6ef
    • wav2letter integration · 33646ac9
      Jeff Cai authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/846
      
      Reviewed By: jcai1
      
      Differential Revision: D17845996
      
      Pulled By: okhonko
      
      fbshipit-source-id: 3826fd9a4418496916bf1835c319dd85c89945cc
  18. 05 Oct, 2019 1 commit
  19. 30 Sep, 2019 1 commit
  20. 29 Sep, 2019 1 commit
  21. 28 Sep, 2019 1 commit
  22. 27 Sep, 2019 3 commits
  23. 24 Sep, 2019 1 commit
  24. 20 Sep, 2019 1 commit
  25. 17 Sep, 2019 2 commits
  26. 05 Sep, 2019 1 commit
    • Return predicted token for RoBERTa filling mask · 3e3fe722
      Roman Rädle authored
      Summary:
      Added the `predicted_token` to each `topk` filled output item
      
      Updated the RoBERTa mask-filling example in README.md
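
      A sketch of what the updated output looks like, assuming the README-style hub interface where `fill_mask` returns `(filled_sentence, score, predicted_token)` tuples (the exact tuple layout is an assumption based on the README example):

      ```python
      # Sketch of the updated fill_mask output; assumes each topk item now
      # carries the predicted token alongside the filled sentence and score.
      import torch

      roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
      roberta.eval()

      for filled, score, predicted_token in roberta.fill_mask(
              'The first Star Wars film was released in <mask>.', topk=3):
          print(f'{predicted_token!r} ({score:.3f}): {filled}')
      ```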
      
      Reviewed By: myleott
      
      Differential Revision: D17188810
      
      fbshipit-source-id: 5fdc57ff2c13239dabf13a8dad43ae9a55e8931c
  27. 03 Sep, 2019 1 commit
    • Fix an error in the command about Hierarchical Neural Story Generation (#1099) · 6c00b338
      altale authored
      Summary:
      When I tried to reproduce the experiment in _Hierarchical Neural Story Generation_, I found that the generation command could not be executed.

      It failed with **fairseq-generate: error: unrecognized arguments: --sampling-temperature 0.8**
      In the documentation, I found:
      ```
      --temperature   temperature for generation
      Default: 1.0
      ```
      I could not find a parameter named `--sampling-temperature`, so I think `--sampling-temperature` should be changed to `--temperature`.
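
      For clarity, `--temperature` scales the logits before the softmax during sampling; a minimal sketch of the effect (illustration only, not fairseq's sampling code):

      ```python
      # Illustration of what the --temperature flag controls during sampling:
      # logits are divided by the temperature before the softmax, so values
      # below 1.0 sharpen the distribution and values above 1.0 flatten it.
      import torch

      def sample_with_temperature(logits: torch.Tensor, temperature: float = 0.8) -> int:
          probs = torch.softmax(logits / temperature, dim=-1)
          return int(torch.multinomial(probs, num_samples=1).item())

      next_token = sample_with_temperature(torch.randn(1000), temperature=0.8)
      ```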
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1099
      
      Differential Revision: D17163065
      
      Pulled By: myleott
      
      fbshipit-source-id: 25c430eeee4703f8ec30353825ffec4bb973da0d
  28. 27 Aug, 2019 1 commit
  29. 22 Aug, 2019 3 commits
  30. 20 Aug, 2019 1 commit
  31. 19 Aug, 2019 2 commits