1. 13 Nov, 2019 4 commits
  2. 12 Nov, 2019 1 commit
    • More thorough support for iterable datasets · 2a9b4ec2
      Spencer Poff authored
      Summary: Use PyTorch IterableDataset for streaming iterators, so that there is a clean differentiation in interface between datasets that stream data and those that support indexed access.
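      As a hedged illustration (IndexedTextDataset/StreamingTextDataset are made-up names, not the fairseq classes), the interface split looks roughly like this in PyTorch, where only the map-style dataset exposes __getitem__/__len__::

          from torch.utils.data import Dataset, IterableDataset, DataLoader

          class IndexedTextDataset(Dataset):
              """Map-style: supports random (indexed) access."""
              def __init__(self, lines):
                  self.lines = lines
              def __len__(self):
                  return len(self.lines)
              def __getitem__(self, index):
                  return self.lines[index]

          class StreamingTextDataset(IterableDataset):
              """Streaming: supports only sequential iteration."""
              def __init__(self, path):
                  self.path = path
              def __iter__(self):
                  with open(self.path) as f:
                      for line in f:
                          yield line.rstrip("\n")

          # A DataLoader accepts either, but shuffling and samplers need indexed access.
          loader = DataLoader(IndexedTextDataset(["a", "b", "c"]), batch_size=2, shuffle=True)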
      
      Reviewed By: myleott
      
      Differential Revision: D18438694
      
      fbshipit-source-id: 482857d8357091ea2a6bf819535b09ba7f1a5b7d
  3. 10 Nov, 2019 1 commit
  4. 09 Nov, 2019 1 commit
  5. 08 Nov, 2019 2 commits
    • Move fb_pathmgr registration out of train.py · e98bf7e6
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/903
      
      Reviewed By: sujitoc
      
      Differential Revision: D18327653
      
      fbshipit-source-id: 739ddbaf54862acdf7b4f1bc3ad538bde5ae00fd
    • Fix LevT edge cases · e9171ce1
      Xian Li authored
      Summary:
      Avoid the case where can_ins_mask is all False and max_lengths therefore has size [0, 1], which makes the expand_as operator fail. Move the computation back into the skipping branch in the script.

      The same applies to deletion and ins_word.
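      A minimal sketch of the guard (illustrative names and shapes, not the actual LevT code): when the mask selects zero rows, the dependent expand_as is skipped instead of being run on an empty tensor::

          import torch

          can_ins_mask = torch.zeros(4, dtype=torch.bool)   # all False: no insertion slots
          max_lens = torch.full((4, 1), 10.0)                # per-sentence length budget

          if can_ins_mask.any():                             # the "skipping branch"
              selected = max_lens[can_ins_mask]              # shape [num_selected, 1]
              scores = torch.rand(int(can_ins_mask.sum()), 8)
              budget = selected.expand_as(scores)            # safe: num_selected > 0
          else:
              budget = None                                  # nothing to insert this step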
      
      Reviewed By: kahne
      
      Differential Revision: D18365340
      
      fbshipit-source-id: 509ac21d7d6fd9083d0710697288203977314c52
  6. 07 Nov, 2019 4 commits
  7. 06 Nov, 2019 2 commits
  8. 05 Nov, 2019 2 commits
    • XLM-R code and model release (#900) · e23e5eaa
      ngoyal2707 authored
      Summary:
      TODO:
      1) Need to update the bibtex entry
      2) Need to upload the models, spm_vocab, and dict.txt to a public S3 location.

      For the future:

      1) I will probably add instructions for finetuning on XNLI, NER, POS, etc., but there is currently no timeline for that.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/900
      
      Reviewed By: myleott
      
      Differential Revision: D18333076
      
      Pulled By: myleott
      
      fbshipit-source-id: 3f3d3716fcc41c78d2dd4525f60b519abbd0459c
    • Fixing key padding mask during transformer generation · 68dd3e17
      Spencer Poff authored
      Summary:
      https://github.com/pytorch/fairseq/pull/1097 added key padding mask history in TransformerDecoderLayer, but in the edge case where only the current or only the previous key_padding_mask exists, the resulting key_padding_mask is the wrong size.

      This diff adds empty columns in that case to ensure the key_padding_mask has a usable size.
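      A hedged sketch of the idea (combine_key_padding_masks is a made-up helper, not the exact fairseq code): the missing side is filled with all-False columns so the concatenated mask covers every key position::

          import torch

          def combine_key_padding_masks(prev_mask, curr_mask, batch_size, prev_len, curr_len):
              # None means "no padding recorded"; materialize it as empty (all-False) columns
              if prev_mask is None and curr_mask is None:
                  return None
              if prev_mask is None:
                  prev_mask = torch.zeros(batch_size, prev_len, dtype=torch.bool)
              if curr_mask is None:
                  curr_mask = torch.zeros(batch_size, curr_len, dtype=torch.bool)
              return torch.cat([prev_mask, curr_mask], dim=1)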
      
      Reviewed By: myleott
      
      Differential Revision: D18224313
      
      fbshipit-source-id: c9fb7266baf0a2d79a66704e00a5ea8bd2987ff6
  9. 02 Nov, 2019 1 commit
  10. 01 Nov, 2019 2 commits
  11. 31 Oct, 2019 2 commits
  12. 30 Oct, 2019 1 commit
    • layer drop · 856d8b82
      Xian Li authored
      Summary: This diff enables layer drop in the transformer decoder in the production training pipeline (ptt_transformer). It builds on top of the fairseq implementation D18094657 added by Angela Fan, and adds additional logic to handle dropping the corresponding layers at test time in the exported model.
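      A hedged sketch of the behaviour (not the ptt_transformer code itself): each layer is skipped with probability p during training, and the exported test-time model has to skip the same pruned layers explicitly::

          import torch
          import torch.nn as nn

          class LayerDropDecoder(nn.Module):
              def __init__(self, layers, layerdrop=0.2, layers_to_keep=None):
                  super().__init__()
                  self.layers = nn.ModuleList(layers)
                  self.layerdrop = layerdrop
                  self.layers_to_keep = layers_to_keep     # e.g. {0, 2, 4} in the exported model

              def forward(self, x):
                  for i, layer in enumerate(self.layers):
                      if self.training:
                          if torch.rand(1).item() < self.layerdrop:
                              continue                     # stochastically drop this layer
                      elif self.layers_to_keep is not None and i not in self.layers_to_keep:
                          continue                         # exported model skips pruned layers
                      x = layer(x)
                  return x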
      
      Reviewed By: jhcross
      
      Differential Revision: D18165586
      
      fbshipit-source-id: 373ac00268a25fa9e412edcb483becdfe792d992
  13. 28 Oct, 2019 1 commit
    • Fix LevT generator interface · 50cf3bb5
      Ning Dong authored
      Summary: Revert the interface change for iterative_refinement_generator
      
      Reviewed By: kahne
      
      Differential Revision: D18165103
      
      fbshipit-source-id: 075c276746eb90d7c359b6ad92e1ef25e8452bcc
  14. 27 Oct, 2019 1 commit
    • adding layerdrop code for training, pruning, and readme (#890) · dabbef46
      Angela Fan authored
      Summary:
      TEST 1: EVALUATION TIME WORKS
      checked
      achieves correct model perplexity: 18.68
      
      TEST 2: TRAINING NEW MODEL WORKS
      checked
      
      without layerdrop:
      --decoder-layerdrop 0 OR no flag at all
      | epoch 001:     10 / 11201 loss=27.469, nll_loss=27.469, ppl=185799477.36, wps=1764, ups=0, wpb=9216.000, bsz=3.000, num_updates=7, lr=0.0004376, gnorm=25.471, clip=1.000, oom=0.000, loss_scale=8.000, wall=37, train_wall=30
      | epoch 001:     20 / 11201 loss=27.443, nll_loss=27.443, ppl=182500427.22, wps=2449, ups=0, wpb=9216.000, bsz=3.000, num_updates=17, lr=0.0010626, gnorm=25.273, clip=1.000, oom=0.000, loss_scale=8.000, wall=64, train_wall=57
      | epoch 001:     30 / 11201 loss=27.404, nll_loss=27.404, ppl=177612215.78, wps=2720, ups=0, wpb=9216.000, bsz=3.000, num_updates=27, lr=0.0016876, gnorm=25.136, clip=1.000, oom=0.000, loss_scale=8.000, wall=91, train_wall=84
      | epoch 001:     40 / 11201 loss=27.009, nll_loss=27.009, ppl=135079983.00, wps=2865, ups=0, wpb=9216.000, bsz=3.000, num_updates=37, lr=0.0023126, gnorm=24.311, clip=1.000, oom=0.000, loss_scale=8.000, wall=119, train_wall=112
      | epoch 001:     50 / 11201 loss=26.418, nll_loss=26.418, ppl=89680259.41, wps=2952, ups=0, wpb=9216.000, bsz=3.000, num_updates=47, lr=0.0029376, gnorm=22.775, clip=1.000, oom=0.000, loss_scale=8.000, wall=147, train_wall=140
      
      with layerdrop (regularization effect should be seen in PPL):
      --decoder-layerdrop 0.2
      
      | epoch 001:     10 / 11201 loss=25.186, nll_loss=25.186, ppl=38182937.27, wps=2428, ups=0, wpb=9216.000, bsz=3.000, num_updates=8, lr=0.0005001, gnorm=17.082, clip=1.000, oom=0.000, loss_scale=16.000, wall=30, train_wall=24
      | epoch 001:     20 / 11201 loss=25.270, nll_loss=25.270, ppl=40451933.50, wps=3173, ups=0, wpb=9216.000, bsz=3.000, num_updates=18, lr=0.0011251, gnorm=17.162, clip=1.000, oom=0.000, loss_scale=16.000, wall=52, train_wall=45
      | epoch 001:     30 / 11201 loss=25.349, nll_loss=25.349, ppl=42752256.68, wps=3454, ups=0, wpb=9216.000, bsz=3.000, num_updates=28, lr=0.0017501, gnorm=17.370, clip=1.000, oom=0.000, loss_scale=16.000, wall=75, train_wall=68
      | epoch 001:     40 / 11201 loss=25.115, nll_loss=25.115, ppl=36343806.30, wps=3619, ups=0, wpb=9216.000, bsz=3.000, num_updates=38, lr=0.0023751, gnorm=16.945, clip=1.000, oom=0.000, loss_scale=16.000, wall=97, train_wall=90
      | epoch 001:     50 / 11201 loss=24.804, nll_loss=24.804, ppl=29284345.78, wps=3716, ups=0, wpb=9216.000, bsz=3.000, num_updates=48, lr=0.0030001, gnorm=16.406, clip=1.000, oom=0.000, loss_scale=16.000, wall=119, train_wall=112
      
      TEST 3: PICKING UP TRAINING FROM EXISTING MODEL
      checked
      
      | loaded checkpoint /checkpoint/angelafan/structured_0.1_block_8_sd02/checkpoint_last.pt (epoch 272 @ 381066 updates)
      | loading train data for epoch 272
      | loaded 1801350 examples from: /private/home/angelafan/lm_work/fairseq-py/data-bin/wikitext-103/train
      
      TEST 4: EVALUATING EXISTING BERT MODEL REPROS RESULTS
      | [input] dictionary: 50265 types
      | [label] dictionary: 9 types
      | Accuracy:  0.9231651376146789
      achieves correct accuracy on SST2 for this model
      
      TEST 5: TRAINING NEW BERT MODEL WORKS
      checked and works
      
      TEST 6: NMT
      
      without layerdrop
      --encoder-layerdrop 0 --decoder-layerdrop 0 OR combinations of flag specified and not specified
      
      | epoch 001:     10 / 92203 loss=15.820, nll_loss=15.830, ppl=58267.93, wps=4902, ups=0, wpb=1477.818, bsz=51.636, num_updates=11, lr=1.47473e-06, gnorm=7.207, clip=0.000, oom=0.000, loss_scale=128.000, wall=60, train_wall=3
      | epoch 001:     20 / 92203 loss=15.523, nll_loss=15.501, ppl=46359.29, wps=5037, ups=0, wpb=1496.476, bsz=45.333, num_updates=21, lr=2.72448e-06, gnorm=6.869, clip=0.000, oom=0.000, loss_scale=128.000, wall=63, train_wall=6
      | epoch 001:     30 / 92203 loss=15.185, nll_loss=15.123, ppl=35695.79, wps=5085, ups=0, wpb=1519.355, bsz=44.645, num_updates=31, lr=3.97423e-06, gnorm=6.186, clip=0.000, oom=0.000, loss_scale=128.000, wall=66, train_wall=9
      | epoch 001:     40 / 92203 loss=14.940, nll_loss=14.849, ppl=29505.60, wps=5116, ups=1, wpb=1521.244, bsz=42.927, num_updates=41, lr=5.22398e-06, gnorm=5.610, clip=0.000, oom=0.000, loss_scale=128.000, wall=69, train_wall=12
      | epoch 001:     50 / 92203 loss=14.745, nll_loss=14.630, ppl=25346.87, wps=5070, ups=1, wpb=1507.961, bsz=41.725, num_updates=51, lr=6.47373e-06, gnorm=5.104, clip=0.000, oom=0.000, loss_scale=128.000, wall=71, train_wall=15
      
      with layerdrop (regularization effect should be seen in PPL)
      
      A) works with --encoder-layerdrop 0.2 --decoder-layerdrop 0.2
      B) works with different settings --encoder-layerdrop 0.3 --decoder-layerdrop 0.5
      C) works with one on and one off --encoder-layerdrop 0.2 --decoder-layerdrop 0
      
      | epoch 001:     10 / 92203 loss=15.817, nll_loss=15.828, ppl=58158.54, wps=5355, ups=0, wpb=1477.818, bsz=51.636, num_updates=11, lr=1.47473e-06, gnorm=6.959, clip=0.000, oom=0.000, loss_scale=128.000, wall=59, train_wall=3
      | epoch 001:     20 / 92203 loss=15.650, nll_loss=15.641, ppl=51111.63, wps=5515, ups=0, wpb=1496.476, bsz=45.333, num_updates=21, lr=2.72448e-06, gnorm=6.825, clip=0.000, oom=0.000, loss_scale=128.000, wall=61, train_wall=6
      | epoch 001:     30 / 92203 loss=15.440, nll_loss=15.408, ppl=43491.58, wps=5602, ups=0, wpb=1519.355, bsz=44.645, num_updates=31, lr=3.97423e-06, gnorm=6.576, clip=0.000, oom=0.000, loss_scale=128.000, wall=64, train_wall=8
      | epoch 001:     40 / 92203 loss=15.247, nll_loss=15.193, ppl=37457.14, wps=5676, ups=1, wpb=1521.244, bsz=42.927, num_updates=41, lr=5.22398e-06, gnorm=6.124, clip=0.000, oom=0.000, loss_scale=128.000, wall=67, train_wall=11
      | epoch 001:     50 / 92203 loss=15.055, nll_loss=14.977, ppl=32259.92, wps=5598, ups=1, wpb=1507.961, bsz=41.725, num_updates=51, lr=6.47373e-06, gnorm=5.661, clip=0.000, oom=0.000, loss_scale=128.000, wall=69, train_wall=14
      
      TEST 7: PRUNING TEST CASES (a sketch of the pruning idea follows after this summary)
      
      A) after adding the pruning flags, model can evaluate as a full model
      checked, reaches correct PPL
      num. model params: 246933504
      | Evaluated 217646 tokens in 196.3s (1108.99 tokens/s)
      | Loss: 2.9275, Perplexity: 18.68
      
      B) after adding pruning flags, model can be pruned. this works with multiple flag settings
      checked three cases:
      num. model params: 146163712
      | Evaluated 217646 tokens in 106.0s (2054.07 tokens/s)
      | Loss: 3.0932, Perplexity: 22.05
      
      num. model params: 209144832
      | Evaluated 217646 tokens in 162.8s (1336.99 tokens/s)
      | Loss: 2.9526, Perplexity: 19.16
      
      C) model can pick up training if you want to finetune the pruned model
      checked:
      | loading train data for epoch 272
      | loaded 1801350 examples from: /private/home/angelafan/lm_work/fairseq-py/data-bin/wikitext-103/train
      | WARNING: overflow detected, setting loss scale to: 64.0
      | WARNING: overflow detected, setting loss scale to: 32.0
      | epoch 272:   1500 / 5601 loss=5.015, nll_loss=5.015, ppl=32.33, wps=11598, ups=1, wpb=18432.000, bsz=6.000, num_updates=98, lr=0.0061251, gnorm=0.613, clip=1.000, oom=0.000, loss_scale=32.000, wall=156, train_wall=252396
      
      D) works with BERT
      checked:
      without specifying any flags, reproduces the correct standard accuracy
      with flags, produces the correct pruned accuracy
      
      | [input] dictionary: 50265 types
      | [label] dictionary: 9 types
      | Accuracy:  0.9231651376146789
      
      | [input] dictionary: 50265 types
      | [label] dictionary: 9 types
      | Pruning model to specified layer configuration - this works best if the model was trained with LayerDrop
      | Accuracy:  0.9220183486238532
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/890
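      A hedged sketch of the pruning idea exercised in TEST 7 above (prune_layers is an illustrative helper, not the fairseq implementation or its flags): a model trained with LayerDrop can be evaluated with only a subset of its layers by rebuilding the layer list before inference::

          import torch.nn as nn

          def prune_layers(all_layers, layers_to_keep):
              """Keep only the listed layers when evaluating a pruned model."""
              keep = set(layers_to_keep)
              return nn.ModuleList([layer for i, layer in enumerate(all_layers) if i in keep])

          # e.g. keep every other layer of a decoder trained with layerdrop
          # decoder.layers = prune_layers(decoder.layers, range(0, 16, 2))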
      
      Reviewed By: edunov
      
      Differential Revision: D18094657
      
      Pulled By: huihuifan
      
      fbshipit-source-id: 2bbaa2ff0039e906782694fc2038b8c17a8693e7
  15. 26 Oct, 2019 1 commit
    • fix a type mismatch in NAT quantization run · eb68afca
      Xian Li authored
      Summary:
      Fix a type mismatch that was found after patching NAT on top of quantization.
      Ning suggested this fix. We still need to understand why it only appears after patching the quantization diff.
      
      Reviewed By: kahne, jhcross
      
      Differential Revision: D18147726
      
      fbshipit-source-id: a51becc9ad58a637a0180074eaa2b46990ab9f84
  16. 25 Oct, 2019 2 commits
  17. 24 Oct, 2019 4 commits
  18. 23 Oct, 2019 1 commit
    • Add warmup support in reduce_on_plateau lr schedule · 8defa9d9
      Yilei Li authored
      Summary:
      Enables the reduce_on_plateau schedule with an optional warmup phase, where we linearly increase the learning rate from some initial learning rate (``--warmup-init-lr``) until the configured learning rate (``--lr``). Thereafter the lr is adjusted according to the original reduce_on_plateau scheme.
      During warmup::
      
            lrs = torch.linspace(args.warmup_init_lr, args.lr, args.warmup_updates)
            lr = lrs[update_num]
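
      A hedged sketch of the full schedule (make_lr_step is an illustrative helper, not the fairseq scheduler itself): linear warmup as above, then hand control to torch.optim.lr_scheduler.ReduceLROnPlateau::

          import torch

          def make_lr_step(optimizer, warmup_init_lr, lr, warmup_updates):
              lrs = torch.linspace(warmup_init_lr, lr, warmup_updates)
              plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=3)

              def step(update_num, val_loss=None):
                  if update_num < warmup_updates:
                      for group in optimizer.param_groups:
                          group["lr"] = lrs[update_num].item()   # linear warmup
                  elif val_loss is not None:
                      plateau.step(val_loss)                     # original reduce_on_plateau behaviour
              return step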
      
      Reviewed By: yqwangustc
      
      Differential Revision: D17779925
      
      fbshipit-source-id: c3bfb3321c76850824fc42df4fac4e5dcf73fbf8
  19. 22 Oct, 2019 3 commits
  20. 20 Oct, 2019 2 commits
    • Enable separate models for insertion and deletion; · 66d24dc2
      Jiatao Gu authored
      Summary:
      The diff contains two fixes:
      (1) enabling non-shared decoder layers for deletion/insertion
      (2) adding options to perform sampling instead of argmax when learning deletion (a minimal sketch follows below)
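      A minimal sketch of the difference between the two decision rules (illustrative shapes, not the LevT code)::

          import torch
          import torch.nn.functional as F

          word_del_logits = torch.randn(2, 6, 2)       # [batch, length, {keep, delete}]

          # argmax: always take the most likely decision
          del_argmax = word_del_logits.argmax(dim=-1)

          # sampling: draw the decision from the predicted distribution instead
          probs = F.softmax(word_del_logits, dim=-1)
          del_sampled = torch.multinomial(probs.view(-1, 2), num_samples=1).view(2, 6)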
      
      Reviewed By: kahne
      
      Differential Revision: D18011220
      
      fbshipit-source-id: c60815fb7bc3a0004c81249504f7a641536ae2d8
    • Fix typos on Examples for Nonautoregressive translation · a3c629b5
      Jiatao Gu authored
      Summary: Fix typos in the examples
      
      Reviewed By: kahne
      
      Differential Revision: D18030097
      
      fbshipit-source-id: 84f0cbafd85e50ffd5033738835373935e3b83d4
  21. 18 Oct, 2019 2 commits