- 12 Nov, 2019 1 commit
-
-
Spencer Poff authored
Summary: Use PyTorch IterableDataset for streaming iterators, so that there is a clean interface distinction between datasets that stream data and those that support indexed access. Reviewed By: myleott Differential Revision: D18438694 fbshipit-source-id: 482857d8357091ea2a6bf819535b09ba7f1a5b7d
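For readers unfamiliar with the distinction, here is a minimal sketch (not the fairseq class; the file-reading source is an assumption for illustration) of a streaming dataset built on torch.utils.data.IterableDataset, which supports iteration only, in contrast to map-style datasets that implement __getitem__/__len__:

```python
import torch

class StreamingTextDataset(torch.utils.data.IterableDataset):
    """Yields lines from a text file; no random (indexed) access."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")

# A DataLoader accepts either kind, but samplers/shuffling only apply to
# map-style (indexed) datasets:
# loader = torch.utils.data.DataLoader(StreamingTextDataset("corpus.txt"))
```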
-
- 10 Nov, 2019 1 commit
-
-
Louis Martin authored
Summary: Checked locally that everything works fine. The model is uploaded to fbaipublicfiles. I fixed a few inconsistencies in the BPE encoding along the way, e.g. the one related to https://github.com/pytorch/fairseq/issues/1306. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/904 Reviewed By: ngoyal2707 Differential Revision: D18418345 Pulled By: louismartin fbshipit-source-id: 53acb4d021581968d70430ee9babee07d6573c17
-
- 09 Nov, 2019 1 commit
-
-
Naman Goyal authored
Summary: This is the first version of the BART code / model release. It still requires a lot of cleanup, instructions, and making sure results are reproducible before we can release it. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/902 Differential Revision: D18389535 fbshipit-source-id: 77f16800307ce831bd29538fdd34800793210f46
-
- 08 Nov, 2019 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/903 Reviewed By: sujitoc Differential Revision: D18327653 fbshipit-source-id: 739ddbaf54862acdf7b4f1bc3ad538bde5ae00fd
-
Xian Li authored
Summary: Avoid the case where can_ins_mask is all False, so that max_lengths has size [0, 1], which breaks the expand_as operator. Move it back into the skipping branch in the scripted code. The same applies to deletion and ins_word. Reviewed By: kahne Differential Revision: D18365340 fbshipit-source-id: 509ac21d7d6fd9083d0710697288203977314c52
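An illustrative guard for the edge case described above (tensor names are hypothetical, not the LevT code): skip the masked computation entirely when the mask selects no rows, so a later expand_as never sees an empty [0, 1] tensor:

```python
import torch

def masked_max_lengths(length_scores, can_ins_mask):
    # length_scores: (batch, max_len); can_ins_mask: (batch,) bool
    if not can_ins_mask.any():
        # Skip: indexing with an all-False mask yields a [0, max_len] tensor,
        # and the subsequent argmax a [0, 1] one, which breaks expand_as.
        return None
    return length_scores[can_ins_mask].argmax(-1, keepdim=True)
```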
-
- 07 Nov, 2019 4 commits
-
-
Kevin authored
Summary: Solves https://github.com/pytorch/fairseq/issues/1218. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1219 Differential Revision: D18339541 Pulled By: myleott fbshipit-source-id: 6d5bd7b60fa7fd30c038fdad54591343a01f228b
-
Louis MARTIN authored
Summary: Models seem to train fine with this modification. I checked that the mask for beginning of words is correct but didn't check if the actual masking worked correctly. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1292 Differential Revision: D18338307 Pulled By: myleott fbshipit-source-id: eae9e29d6ab648e768d70921694a898554496704
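A hedged sketch of the beginning-of-word mask being checked here, assuming GPT-2/RoBERTa-style BPE where a token that starts a new word carries the leading "Ġ" space marker (this convention is an assumption for illustration, not fairseq's exact helper):

```python
def beginning_of_word_mask(bpe_tokens):
    # True for tokens that start a word, False for word-internal continuations.
    return [i == 0 or tok.startswith("Ġ") for i, tok in enumerate(bpe_tokens)]

# e.g. ["Hello", "Ġworld", "ly"] -> [True, True, False]
```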
-
freewym authored
Summary: …all set_epoch() for each sub dataset Pull Request resolved: https://github.com/pytorch/fairseq/pull/1272 Differential Revision: D18338300 Pulled By: myleott fbshipit-source-id: 973d57f52c5cf4ad40122d4a625942281c7983b7
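A hypothetical sketch of the behavior described above: a concatenating wrapper that forwards set_epoch() to every sub-dataset so epoch-dependent shuffling stays in sync (names are illustrative, not fairseq's exact API):

```python
class ConcatDatasetWithEpoch:
    def __init__(self, datasets):
        self.datasets = list(datasets)

    def set_epoch(self, epoch):
        # Propagate the epoch to every sub-dataset that cares about it.
        for ds in self.datasets:
            if hasattr(ds, "set_epoch"):
                ds.set_epoch(epoch)
```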
-
Liam authored
Summary: "pytorch.fairseq" -> "pytorch/fairseq" to avoid following error: ``` ValueError: not enough values to unpack (expected 2, got 1) Pull Request resolved: https://github.com/pytorch/fairseq/pull/1310 Differential Revision: D18338223 Pulled By: myleott fbshipit-source-id: c95fcc3bb814c7f980a22996dc7923d6d487810b
-
- 06 Nov, 2019 2 commits
-
-
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/901 Differential Revision: D18349686 fbshipit-source-id: ba0a378e3fb98a35b3ef2e2103c2f921c4729e40
-
Jerry Ma authored
Summary: - Adds memory summary logging to validation and optimization steps. - Clarifies in the logging that optimization OOMs are not recoverable. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/893 Differential Revision: D18110763 Pulled By: jma127 fbshipit-source-id: 49340e611169c606ab9c991265167a79f51846e6
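A minimal sketch (not the fairseq logging code) of the kind of memory summary that can be emitted around validation and optimization steps:

```python
import logging
import torch

def log_cuda_memory(tag: str) -> None:
    if not torch.cuda.is_available():
        return
    allocated = torch.cuda.memory_allocated() / 2 ** 20
    peak = torch.cuda.max_memory_allocated() / 2 ** 20
    logging.info("%s: %.0f MiB allocated (peak %.0f MiB)", tag, allocated, peak)

# e.g. log_cuda_memory("before optimization step")
```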
-
- 05 Nov, 2019 2 commits
-
-
ngoyal2707 authored
Summary: TODO: 1) update the BibTeX entry; 2) upload the models, spm_vocab, and dict.txt to a public S3 location. For the future: I will probably add instructions for finetuning on XNLI, NER, POS, etc., but there is currently no timeline for that. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/900 Reviewed By: myleott Differential Revision: D18333076 Pulled By: myleott fbshipit-source-id: 3f3d3716fcc41c78d2dd4525f60b519abbd0459c
-
Spencer Poff authored
Summary: https://github.com/pytorch/fairseq/pull/1097 added key padding mask history in TransformerDecoderLayer, but during an edge case where only the current or only the previous key_padding_mask exists, the resulting key_padding_mask is the wrong size. This diff adds empty columns in such a case to ensure key_padding_mask is a usable size. Reviewed By: myleott Differential Revision: D18224313 fbshipit-source-id: c9fb7266baf0a2d79a66704e00a5ea8bd2987ff6
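A hedged sketch of the fix described above: when only the cached or only the current key_padding_mask exists, substitute an all-False (i.e. "not padded") block of matching shape so the concatenation yields a usable size. Shapes and names are illustrative, not the MultiheadAttention internals:

```python
import torch

def combine_key_padding_masks(prev_mask, curr_mask, batch_size, prev_len):
    # prev_mask: (batch, prev_len) or None; curr_mask: (batch, 1) or None
    if prev_mask is None and curr_mask is None:
        return None
    if prev_mask is None:
        prev_mask = torch.zeros(batch_size, prev_len, dtype=torch.bool)
    if curr_mask is None:
        curr_mask = torch.zeros(batch_size, 1, dtype=torch.bool)
    return torch.cat([prev_mask, curr_mask], dim=1)  # (batch, prev_len + 1)
```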
-
- 02 Nov, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1340 Differential Revision: D18289455 Pulled By: myleott fbshipit-source-id: a1c8163a35273b6c646d300142701e8a317d7378
-
- 01 Nov, 2019 2 commits
-
-
Chau Tran authored
Summary: Fix integration test Reviewed By: xianxl Differential Revision: D18040440 fbshipit-source-id: 98c8ab7970d081f17deb54c69aa35669de12d767
-
Halil Akin authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/898 Pull Request resolved: https://github.com/pytorch/fairseq/pull/1333 Pull Request resolved: https://github.com/fairinternal/fairspeq/pull/11 These in_proj_weight and in_proj_bias properties are not the right way to provide backward compatibility, and they are causing other incompatibilities with the new Dynamic Quantization API. So let's remove them and properly fix the failing tests. Reviewed By: myleott Differential Revision: D18264129 fbshipit-source-id: fc1838657a60d914ca83c4e0f6add5ed8206ac54
-
- 31 Oct, 2019 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/897 Differential Revision: D18250587 Pulled By: myleott fbshipit-source-id: b9cef376bc014b68766229aab7b6e454480757d3
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/895 Reviewed By: akinh Differential Revision: D18246479 Pulled By: myleott fbshipit-source-id: a610f1e4943619d32a523601a572fb09cdc5638d
-
- 30 Oct, 2019 1 commit
-
-
Xian Li authored
Summary: This diff enables layer drop in the transformer decoder in the production training pipeline (ptt_transformer). It builds on top of the fairseq implementation in D18094657 added by Angela Fan, and adds additional logic to handle dropping the corresponding layers at test time in the exported model. Reviewed By: jhcross Differential Revision: D18165586 fbshipit-source-id: 373ac00268a25fa9e412edcb483becdfe792d992
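For context, a minimal sketch of the LayerDrop idea this builds on: during training each layer is skipped with probability p, while at test time all layers run (or a pruned subset is kept). This is illustrative only, not the fairseq or ptt_transformer implementation:

```python
import torch
import torch.nn as nn

class LayerDropStack(nn.Module):
    def __init__(self, layers, p=0.2):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.p = p

    def forward(self, x):
        for layer in self.layers:
            # Drop the whole layer with probability p, only during training.
            if self.training and torch.rand(()).item() < self.p:
                continue
            x = layer(x)
        return x

# e.g. LayerDropStack([nn.Linear(16, 16) for _ in range(6)], p=0.2)
```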
-
- 28 Oct, 2019 1 commit
-
-
Ning Dong authored
Summary: Revert the interface change for iterative_refinement_generator Reviewed By: kahne Differential Revision: D18165103 fbshipit-source-id: 075c276746eb90d7c359b6ad92e1ef25e8452bcc
-
- 27 Oct, 2019 1 commit
-
-
Angela Fan authored
Summary:

TEST 1: EVALUATION TIME WORKS
checked; achieves correct model perplexity: 18.68

TEST 2: TRAINING NEW MODEL WORKS
checked

without layerdrop: --decoder-layerdrop 0 OR no flag at all
| epoch 001: 10 / 11201 loss=27.469, nll_loss=27.469, ppl=185799477.36, wps=1764, ups=0, wpb=9216.000, bsz=3.000, num_updates=7, lr=0.0004376, gnorm=25.471, clip=1.000, oom=0.000, loss_scale=8.000, wall=37, train_wall=30
| epoch 001: 20 / 11201 loss=27.443, nll_loss=27.443, ppl=182500427.22, wps=2449, ups=0, wpb=9216.000, bsz=3.000, num_updates=17, lr=0.0010626, gnorm=25.273, clip=1.000, oom=0.000, loss_scale=8.000, wall=64, train_wall=57
| epoch 001: 30 / 11201 loss=27.404, nll_loss=27.404, ppl=177612215.78, wps=2720, ups=0, wpb=9216.000, bsz=3.000, num_updates=27, lr=0.0016876, gnorm=25.136, clip=1.000, oom=0.000, loss_scale=8.000, wall=91, train_wall=84
| epoch 001: 40 / 11201 loss=27.009, nll_loss=27.009, ppl=135079983.00, wps=2865, ups=0, wpb=9216.000, bsz=3.000, num_updates=37, lr=0.0023126, gnorm=24.311, clip=1.000, oom=0.000, loss_scale=8.000, wall=119, train_wall=112
| epoch 001: 50 / 11201 loss=26.418, nll_loss=26.418, ppl=89680259.41, wps=2952, ups=0, wpb=9216.000, bsz=3.000, num_updates=47, lr=0.0029376, gnorm=22.775, clip=1.000, oom=0.000, loss_scale=8.000, wall=147, train_wall=140

with layerdrop (regularization effect should be seen in PPL): --decoder-layerdrop 0.2
| epoch 001: 10 / 11201 loss=25.186, nll_loss=25.186, ppl=38182937.27, wps=2428, ups=0, wpb=9216.000, bsz=3.000, num_updates=8, lr=0.0005001, gnorm=17.082, clip=1.000, oom=0.000, loss_scale=16.000, wall=30, train_wall=24
| epoch 001: 20 / 11201 loss=25.270, nll_loss=25.270, ppl=40451933.50, wps=3173, ups=0, wpb=9216.000, bsz=3.000, num_updates=18, lr=0.0011251, gnorm=17.162, clip=1.000, oom=0.000, loss_scale=16.000, wall=52, train_wall=45
| epoch 001: 30 / 11201 loss=25.349, nll_loss=25.349, ppl=42752256.68, wps=3454, ups=0, wpb=9216.000, bsz=3.000, num_updates=28, lr=0.0017501, gnorm=17.370, clip=1.000, oom=0.000, loss_scale=16.000, wall=75, train_wall=68
| epoch 001: 40 / 11201 loss=25.115, nll_loss=25.115, ppl=36343806.30, wps=3619, ups=0, wpb=9216.000, bsz=3.000, num_updates=38, lr=0.0023751, gnorm=16.945, clip=1.000, oom=0.000, loss_scale=16.000, wall=97, train_wall=90
| epoch 001: 50 / 11201 loss=24.804, nll_loss=24.804, ppl=29284345.78, wps=3716, ups=0, wpb=9216.000, bsz=3.000, num_updates=48, lr=0.0030001, gnorm=16.406, clip=1.000, oom=0.000, loss_scale=16.000, wall=119, train_wall=112

TEST 3: PICKING UP TRAINING FROM EXISTING MODEL
checked
| loaded checkpoint /checkpoint/angelafan/structured_0.1_block_8_sd02/checkpoint_last.pt (epoch 272 @ 381066 updates)
| loading train data for epoch 272
| loaded 1801350 examples from: /private/home/angelafan/lm_work/fairseq-py/data-bin/wikitext-103/train

TEST 4: EVALUATING EXISTING BERT MODEL REPROS RESULTS
| [input] dictionary: 50265 types
| [label] dictionary: 9 types
| Accuracy: 0.9231651376146789
achieves correct accuracy on SST2 for this model

TEST 5: TRAINING NEW BERT MODEL WORKS
checked and works

TEST 6: NMT
without layerdrop: --encoder-layerdrop 0 --decoder-layerdrop 0 OR combinations of flag specified and not specified
| epoch 001: 10 / 92203 loss=15.820, nll_loss=15.830, ppl=58267.93, wps=4902, ups=0, wpb=1477.818, bsz=51.636, num_updates=11, lr=1.47473e-06, gnorm=7.207, clip=0.000, oom=0.000, loss_scale=128.000, wall=60, train_wall=3
| epoch 001: 20 / 92203 loss=15.523, nll_loss=15.501, ppl=46359.29, wps=5037, ups=0, wpb=1496.476, bsz=45.333, num_updates=21, lr=2.72448e-06, gnorm=6.869, clip=0.000, oom=0.000, loss_scale=128.000, wall=63, train_wall=6
| epoch 001: 30 / 92203 loss=15.185, nll_loss=15.123, ppl=35695.79, wps=5085, ups=0, wpb=1519.355, bsz=44.645, num_updates=31, lr=3.97423e-06, gnorm=6.186, clip=0.000, oom=0.000, loss_scale=128.000, wall=66, train_wall=9
| epoch 001: 40 / 92203 loss=14.940, nll_loss=14.849, ppl=29505.60, wps=5116, ups=1, wpb=1521.244, bsz=42.927, num_updates=41, lr=5.22398e-06, gnorm=5.610, clip=0.000, oom=0.000, loss_scale=128.000, wall=69, train_wall=12
| epoch 001: 50 / 92203 loss=14.745, nll_loss=14.630, ppl=25346.87, wps=5070, ups=1, wpb=1507.961, bsz=41.725, num_updates=51, lr=6.47373e-06, gnorm=5.104, clip=0.000, oom=0.000, loss_scale=128.000, wall=71, train_wall=15

with layerdrop (regularization effect should be seen in PPL):
A) works with --encoder-layerdrop 0.2 --decoder-layerdrop 0.2
B) works with different settings: --encoder-layerdrop 0.3 --decoder-layerdrop 0.5
C) works with one on and one off: --encoder-layerdrop 0.2 --decoder-layerdrop 0
| epoch 001: 10 / 92203 loss=15.817, nll_loss=15.828, ppl=58158.54, wps=5355, ups=0, wpb=1477.818, bsz=51.636, num_updates=11, lr=1.47473e-06, gnorm=6.959, clip=0.000, oom=0.000, loss_scale=128.000, wall=59, train_wall=3
| epoch 001: 20 / 92203 loss=15.650, nll_loss=15.641, ppl=51111.63, wps=5515, ups=0, wpb=1496.476, bsz=45.333, num_updates=21, lr=2.72448e-06, gnorm=6.825, clip=0.000, oom=0.000, loss_scale=128.000, wall=61, train_wall=6
| epoch 001: 30 / 92203 loss=15.440, nll_loss=15.408, ppl=43491.58, wps=5602, ups=0, wpb=1519.355, bsz=44.645, num_updates=31, lr=3.97423e-06, gnorm=6.576, clip=0.000, oom=0.000, loss_scale=128.000, wall=64, train_wall=8
| epoch 001: 40 / 92203 loss=15.247, nll_loss=15.193, ppl=37457.14, wps=5676, ups=1, wpb=1521.244, bsz=42.927, num_updates=41, lr=5.22398e-06, gnorm=6.124, clip=0.000, oom=0.000, loss_scale=128.000, wall=67, train_wall=11
| epoch 001: 50 / 92203 loss=15.055, nll_loss=14.977, ppl=32259.92, wps=5598, ups=1, wpb=1507.961, bsz=41.725, num_updates=51, lr=6.47373e-06, gnorm=5.661, clip=0.000, oom=0.000, loss_scale=128.000, wall=69, train_wall=14

TEST 7: PRUNING TESTCASES
A) after adding the pruning flags, model can evaluate as a full model: checked, reaches correct PPL
num. model params: 246933504 | Evaluated 217646 tokens in 196.3s (1108.99 tokens/s) | Loss: 2.9275, Perplexity: 18.68
B) after adding pruning flags, model can be pruned; this works with multiple flag settings: checked three cases
num. model params: 146163712 | Evaluated 217646 tokens in 106.0s (2054.07 tokens/s) | Loss: 3.0932, Perplexity: 22.05
num. model params: 209144832 | Evaluated 217646 tokens in 162.8s (1336.99 tokens/s) | Loss: 2.9526, Perplexity: 19.16
C) model can pick up training if you want to finetune the pruned model (checked):
| loading train data for epoch 272
| loaded 1801350 examples from: /private/home/angelafan/lm_work/fairseq-py/data-bin/wikitext-103/train
| WARNING: overflow detected, setting loss scale to: 64.0
| WARNING: overflow detected, setting loss scale to: 32.0
| epoch 272: 1500 / 5601 loss=5.015, nll_loss=5.015, ppl=32.33, wps=11598, ups=1, wpb=18432.000, bsz=6.000, num_updates=98, lr=0.0061251, gnorm=0.613, clip=1.000, oom=0.000, loss_scale=32.000, wall=156, train_wall=252396
D) works with BERT (checked): without specifying any flags, reproduces the correct standard accuracy; with flags, produces the correct pruned accuracy
| [input] dictionary: 50265 types
| [label] dictionary: 9 types
| Accuracy: 0.9231651376146789
| [input] dictionary: 50265 types
| [label] dictionary: 9 types
| Pruning model to specified layer configuration - this works best if the model was trained with LayerDrop
| Accuracy: 0.9220183486238532

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/890 Reviewed By: edunov Differential Revision: D18094657 Pulled By: huihuifan fbshipit-source-id: 2bbaa2ff0039e906782694fc2038b8c17a8693e7
-
- 26 Oct, 2019 1 commit
-
-
Xian Li authored
Summary: Fix a type mismatch that was found after patching NAT on top of quantization. Ning suggested this fix. Still need to understand why this only appears after patching the quantization diff. Reviewed By: kahne, jhcross Differential Revision: D18147726 fbshipit-source-id: a51becc9ad58a637a0180074eaa2b46990ab9f84
-
- 25 Oct, 2019 2 commits
-
-
Halil Akin authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1304 Pull Request resolved: https://github.com/pytorch/translate/pull/657 Pull Request resolved: https://github.com/facebookresearch/pytext/pull/1065 Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/889 We are converting matmuls to quantizable nn.Linear modules in this diff. First, let's profile after the diff to see how the low-level operations change. Reviewed By: jmp84, edunov, lly-zero-one, jhcross Differential Revision: D17964796 fbshipit-source-id: 3ddd3ff81fa1ea5864dded98e993f4fe3b71fe5e
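Background for why nn.Linear matters here: PyTorch dynamic quantization swaps supported module types (such as nn.Linear) for quantized counterparts, so projections written as raw matmuls over Parameters are invisible to it. A minimal sketch on a toy model, not the fairseq change itself:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)  # the Linear layers are replaced by DynamicQuantizedLinear
```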
-
Halil Akin authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/888 We want to simplify multihead attention and get rid of the dynamic in_proj_weight logic. Sending the diff early for feedback, will have further changes as I try to fix breaking tests Reviewed By: edunov Differential Revision: D17912661 fbshipit-source-id: 0e6319fc694d8ec5187d1c2fefe5839d9d522186
-
- 24 Oct, 2019 4 commits
-
-
Ning Dong authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1299 LevT calls into the tracing-compliant transformer that we didn't plan to open-source earlier. This is a workaround to unbreak master. Will revisit and simplify the code later. Reviewed By: pipibjc Differential Revision: D18110339 fbshipit-source-id: 3bb51c56c2c20f45db1d5786d030b374b412eab1
-
Jerry Ma authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/892 Differential Revision: D18109685 Pulled By: jma127 fbshipit-source-id: f96e1080a5577b8ee0748dfdd956bf72bed47474
-
Jerry Ma authored
Summary: Makes more sense to reset either both meters or neither of them. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/891 Differential Revision: D18109027 Pulled By: jma127 fbshipit-source-id: f63baed9a6b928a6f591a76e69ef6e9c524e4398
-
Ning Dong authored
Summary: NAT productionization diff: (1) Integrate NAT model training / evaluation in the LATTE base training workflow. (2) Make NAT tracing compliant. Since it calls into the fairseq transformer, we need to refactor the code, and I created a near-copy of it named fb_tracing_transformer. (3) The decoder-side C++ code landed in an earlier diff. Reviewed By: xianxl Differential Revision: D17888324 fbshipit-source-id: ef4ef195fddd360da921502adcef82b087e46ce6
-
- 23 Oct, 2019 1 commit
-
-
Yilei Li authored
Summary: Enables the reduce_on_plateau schedule with an optional warmup phase, where we linearly increase the learning rate from some initial learning rate (``--warmup-init-lr``) until the configured learning rate (``--lr``). Thereafter the lr is adjusted according to the original reduce_on_plateau scheme. During warmup: lrs = torch.linspace(args.warmup_init_lr, args.lr, args.warmup_updates); lr = lrs[update_num]. Reviewed By: yqwangustc Differential Revision: D17779925 fbshipit-source-id: c3bfb3321c76850824fc42df4fac4e5dcf73fbf8
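A hedged sketch of the schedule described above: linear warmup from warmup_init_lr to lr over warmup_updates steps, then handing control to ReduceLROnPlateau. The argument names mirror the flags in the summary, but the wiring itself is illustrative, not the fairseq scheduler class:

```python
import torch

def build_warmup_plateau(params, lr=1e-3, warmup_init_lr=1e-7, warmup_updates=4000):
    opt = torch.optim.Adam(params, lr=warmup_init_lr)
    warmup_lrs = torch.linspace(warmup_init_lr, lr, warmup_updates)
    plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, mode="min")

    def step(update_num, val_loss=None):
        if update_num < warmup_updates:
            for group in opt.param_groups:          # linear warmup phase
                group["lr"] = warmup_lrs[update_num].item()
        elif val_loss is not None:
            plateau.step(val_loss)                  # reduce-on-plateau phase
        return opt.param_groups[0]["lr"]

    return opt, step
```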
-
- 22 Oct, 2019 3 commits
-
-
Changhan Wang authored
Summary: Bugfix for inconsistent scores on the same input sentences. This only affects the displayed scores in `generate.py` and does not affect the model outputs. Reviewed By: MultiPath Differential Revision: D17799343 fbshipit-source-id: 2b868ac03097a4db27db736e126a61d50958acc5
-
Louis MARTIN authored
Summary: Very small change. The previous message was misleading: the length of TokenBlockDataset is the number of "blocks" or "streams", not, strictly speaking, the number of batches, if I am not mistaken. I use the notion of batch from the RoBERTa pretraining README: https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.pretraining.md. It took me some time to understand what was going on; I hope it saves some time for others. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1279 Differential Revision: D18051476 fbshipit-source-id: 71fa35f21b9dbc8d6bde28cd3a487723690aadee
-
Louis MARTIN authored
Summary: Fix for https://github.com/pytorch/fairseq/issues/1240 Tested with MaskedLMTask. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1281 Differential Revision: D18051472 fbshipit-source-id: 0aeff60c71489655f5e621349f780ba9cd8c027a
-
- 20 Oct, 2019 2 commits
-
-
Jiatao Gu authored
Summary: The diff contains two fixes: (1) enabling non-shared decoder layers for deletion/insertion; (2) adding options to perform sampling instead of argmax when learning the deletion. Reviewed By: kahne Differential Revision: D18011220 fbshipit-source-id: c60815fb7bc3a0004c81249504f7a641536ae2d8
-
Jiatao Gu authored
Summary: Fix typos in the examples Reviewed By: kahne Differential Revision: D18030097 fbshipit-source-id: 84f0cbafd85e50ffd5033738835373935e3b83d4
-
- 18 Oct, 2019 3 commits
-
-
Spencer Poff authored
Summary: In https://github.com/fairinternal/fairseq-py/pull/877, sequence_generator began calling `model.forward_decoder`, but not all decoder models were given an implementation of that function. Reviewed By: okhonko Differential Revision: D17863751 fbshipit-source-id: ea70b636c9dafcf87f5d5e49631d0c4b7cf14984
-
dikshameghwal authored
Summary: Removed redundant quotes in the filename assigned to the dev dataset for GLUE tasks. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1270 Differential Revision: D18013071 fbshipit-source-id: 35f00162e117c6584dc859f760503ca32dcb706e
-
Changhan Wang authored
Summary: When the `if` statements in the levenshtein transformer decoder forward are removed, `attn` may get inconsistent batch sizes with output tokens. This is a fix. Reviewed By: cndn Differential Revision: D17936411 fbshipit-source-id: a1583f3806dc9f41caeb783c043429e247035803
-
- 15 Oct, 2019 2 commits
-
-
Nayan Singhal authored
Summary: This unit test guards the BMUF code. Change: distributed_init assumed we are always using a CUDA device, which is not the case when using the "gloo" backend on a CPU machine. Reviewed By: jay-mahadeokar Differential Revision: D17821391 fbshipit-source-id: 28e1bb39f7a4889b1dc6bd636b7c499e55bfc69a
-
Changhan Wang authored
Summary: Bring back the changes in D17661768 Reviewed By: ailzhang Differential Revision: D17920299 fbshipit-source-id: be3f93a044a8710c8b475012c39e36a3e6507fad
-
- 12 Oct, 2019 1 commit
-
-
Sujit Verma authored
Summary: Added option to save checkpoints using Path Manager. Reviewed By: hudeven Differential Revision: D17392754 fbshipit-source-id: 4b8e556ef8455a1548e5a083d779ed809cd785be
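A hedged sketch of the option described above, assuming an iopath/fvcore-style path manager whose open() accepts non-local URIs; the fallback to plain filesystem I/O and the function itself are illustrative, not the fairseq checkpoint code:

```python
import torch

def save_checkpoint(state, path, path_manager=None):
    # torch.save accepts a file object, so the same call works for local
    # files and for whatever backends the path manager supports.
    if path_manager is not None:
        with path_manager.open(path, "wb") as f:
            torch.save(state, f)
    else:
        with open(path, "wb") as f:
            torch.save(state, f)
```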
-