1. 21 Nov, 2019 3 commits
    • Fix warmup for fixed_schedule in case of first update (#1408) · 60e16a35
      Tatiana Likhomanenko authored
      Summary:
      I hit the following error while using warmup with the fixed LR schedule:
      
      ```
      Traceback (most recent call last):
        File "/private/home/antares/.conda/envs/fairseq-20190809/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
          fn(i, *args)
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/train.py", line 291, in distributed_main
          main(args, init_distributed=True)
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/train.py", line 81, in main
          train(args, trainer, task, epoch_itr)
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/train.py", line 122, in train
          log_output = trainer.train_step(samples)
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/fairseq/trainer.py", line 409, in train_step
          self.optimizer.step()
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/fairseq/optim/fp16_optimizer.py", line 153, in step
          self.fp32_optimizer.step(closure)
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/fairseq/optim/fairseq_optimizer.py", line 98, in step
          self.optimizer.step(closure)
        File "/private/home/antares/work/unsupervised/blank_test/fairseq-py/fairseq/optim/nag.py", line 68, in step
          lr_correct = lr / lr_old
      ZeroDivisionError: float division by zero
      ```
      This happens because `num_updates=0` on the first iteration, so the `lr` we set on the optimizer is zero.
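      A minimal sketch of the guard (a hypothetical helper, not the actual patch): keep the warmup factor strictly positive at `num_updates=0` so the optimizer never receives a zero `lr`.
      
      ```
      def warmup_lr(base_lr, num_updates, warmup_updates):
          # Hypothetical helper: num_updates + 1 keeps the factor > 0 at
          # num_updates == 0, so the lr / lr_old division in nag.py can
          # never see a zero old learning rate.
          if warmup_updates > 0 and num_updates < warmup_updates:
              return base_lr * float(num_updates + 1) / warmup_updates
          return base_lr
      ```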
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1408
      
      Differential Revision: D18637526
      
      Pulled By: myleott
      
      fbshipit-source-id: fdd81dd69b1b38bc21a4fa315b4e25cee03af6bf
    • added instructions to FT bart on cnn-dm · 226c1f48
      ngoyal2707 authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/922
      
      Differential Revision: D18617322
      
      fbshipit-source-id: 50645197cb7f075b5f878818a97358653077c3e0
    • Refactor data sharding to be specified via caller of task rather than task itself · 99fbd317
      Alex Xiao authored
      Summary: Modifying the number of shards internally to disable data sharding for batch iteration is dangerous, because the callers of these tasks are not limited to fairseq/train. We should therefore put the onus of sharding the data properly on the caller rather than on the task itself.
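      As a sketch of the new contract (built on fairseq's `get_batch_iterator` signature; the `args` fields here are assumptions), the caller passes the sharding parameters explicitly instead of the task mutating its own shard count:
      
      ```
      def build_train_iterator(task, args):
          # The caller decides how data is sharded across workers by
          # passing num_shards / shard_id; the task no longer changes
          # its shard count internally.
          return task.get_batch_iterator(
              dataset=task.dataset("train"),
              max_tokens=args.max_tokens,
              num_shards=args.distributed_world_size,  # one shard per worker
              shard_id=args.distributed_rank,          # this worker's slice
          )
      ```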
      
      Reviewed By: myleott
      
      Differential Revision: D18456424
      
      fbshipit-source-id: d46be16c441c50082f9a768d0b259e6c28a4b67b
  2. 20 Nov, 2019 1 commit
  3. 19 Nov, 2019 3 commits
  4. 18 Nov, 2019 3 commits
  5. 17 Nov, 2019 1 commit
  6. 15 Nov, 2019 1 commit
  7. 14 Nov, 2019 4 commits
  8. 13 Nov, 2019 5 commits
  9. 12 Nov, 2019 1 commit
    • More thorough support for iterable datasets · 2a9b4ec2
      Spencer Poff authored
      Summary: Use PyTorch IterableDataset for streaming iterators, so that there is a clean interface distinction between datasets that stream their data and those that support indexed access.
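      A self-contained sketch of the split (class names are illustrative, not fairseq's): streaming data subclasses `torch.utils.data.IterableDataset` and implements only `__iter__`, while indexed data keeps `__len__`/`__getitem__`.
      
      ```
      from torch.utils.data import Dataset, IterableDataset
      
      class StreamingTextDataset(IterableDataset):
          """Streaming: samples arrive in order; no random access."""
          def __init__(self, source):
              self.source = source  # any iterable, e.g. a file handle
      
          def __iter__(self):
              return iter(self.source)
      
      class IndexedTextDataset(Dataset):
          """Indexed: supports len() and access by position."""
          def __init__(self, samples):
              self.samples = list(samples)
      
          def __len__(self):
              return len(self.samples)
      
          def __getitem__(self, idx):
              return self.samples[idx]
      ```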
      
      Reviewed By: myleott
      
      Differential Revision: D18438694
      
      fbshipit-source-id: 482857d8357091ea2a6bf819535b09ba7f1a5b7d
  10. 10 Nov, 2019 1 commit
  11. 09 Nov, 2019 1 commit
  12. 08 Nov, 2019 2 commits
    • Move fb_pathmgr registration out of train.py · e98bf7e6
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/903
      
      Reviewed By: sujitoc
      
      Differential Revision: D18327653
      
      fbshipit-source-id: 739ddbaf54862acdf7b4f1bc3ad538bde5ae00fd
    • Fix LevT edge cases · e9171ce1
      Xian Li authored
      Summary:
      Avoid the case where `can_ins_mask` is all False, so that `max_lengths` has size [0, 1], which breaks the `expand_as` operator. Move the computation back into the skipping branch in the scripted code.
      
      The same applies to deletion and `ins_word`.
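      Schematically (a hedged sketch with assumed names and shapes, not the exact LevT code), the fix keeps the `expand_as` call behind the guard:
      
      ```
      import torch
      
      def clamp_insertions(scores, can_ins_mask, max_lengths):
          # scores: [batch, length]; can_ins_mask: [batch] bool;
          # max_lengths: [num_selected, 1]
          if not can_ins_mask.any():
              return scores  # skipping branch: max_lengths would be [0, 1]
          selected = scores[can_ins_mask]           # [k, length] with k > 0
          bounds = max_lengths.expand_as(selected)  # safe now
          out = scores.clone()
          out[can_ins_mask] = torch.min(selected, bounds)
          return out
      ```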
      
      Reviewed By: kahne
      
      Differential Revision: D18365340
      
      fbshipit-source-id: 509ac21d7d6fd9083d0710697288203977314c52
  13. 07 Nov, 2019 4 commits
  14. 06 Nov, 2019 2 commits
  15. 05 Nov, 2019 2 commits
    • XLM-R code and model release (#900) · e23e5eaa
      ngoyal2707 authored
      Summary:
      TODO:
      1) Need to update bibtex entry
      2) Need to upload models, spm_vocab and dict.txt to public s3 location.
      
      For Future:
      
      1) I will probably add instructions for fine-tuning on XNLI, NER, POS, etc., but there is currently no timeline for that (a usage sketch follows below).
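      Once the checkpoints are uploaded, usage should follow the standard fairseq hub flow (a sketch; the exact hub names depend on the release):
      
      ```
      import torch
      
      # assumes the released checkpoint is registered with torch.hub
      xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')
      xlmr.eval()
      
      tokens = xlmr.encode('Hello world!')      # sentencepiece BPE + dict
      features = xlmr.extract_features(tokens)  # [1, seq_len, hidden_dim]
      ```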
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/900
      
      Reviewed By: myleott
      
      Differential Revision: D18333076
      
      Pulled By: myleott
      
      fbshipit-source-id: 3f3d3716fcc41c78d2dd4525f60b519abbd0459c
    • Fixing key padding mask during transformer generation · 68dd3e17
      Spencer Poff authored
      Summary:
      https://github.com/pytorch/fairseq/pull/1097 added key_padding_mask history to TransformerDecoderLayer, but in an edge case where only the current or only the previous key_padding_mask exists, the resulting key_padding_mask has the wrong size.
      
      This diff adds empty columns in such cases to ensure key_padding_mask has a usable size.
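      In sketch form (an assumed helper, not the exact fairseq code), all-False columns stand in for whichever mask is missing before the two are concatenated:
      
      ```
      import torch
      
      def combine_key_padding_masks(prev_mask, curr_mask,
                                    batch_size, prev_len, curr_len):
          # False means "not padded", so zero-filled placeholders make the
          # combined mask cover prev_len + curr_len keys without masking
          # anything extra.
          if prev_mask is None and curr_mask is None:
              return None
          if prev_mask is None:
              prev_mask = torch.zeros(batch_size, prev_len, dtype=torch.bool)
          if curr_mask is None:
              curr_mask = torch.zeros(batch_size, curr_len, dtype=torch.bool)
          return torch.cat([prev_mask, curr_mask], dim=1)
      ```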
      
      Reviewed By: myleott
      
      Differential Revision: D18224313
      
      fbshipit-source-id: c9fb7266baf0a2d79a66704e00a5ea8bd2987ff6
  16. 02 Nov, 2019 1 commit
  17. 01 Nov, 2019 2 commits
  18. 31 Oct, 2019 2 commits
  19. 30 Oct, 2019 1 commit
    • layer drop · 856d8b82
      Xian Li authored
      Summary: This diff enables LayerDrop in the transformer decoder in the production training pipeline (ptt_transformer). It builds on top of the fairseq implementation (D18094657) added by Angela Fan, and adds additional logic to drop the corresponding layers at test time in the exported model.
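      A hedged sketch of the training-time behavior (the ptt_transformer export logic is not reproduced here): each layer in the stack is skipped with probability p while training, and an exported model can simply omit the dropped layers.
      
      ```
      import torch
      import torch.nn as nn
      
      class LayerDropStack(nn.Module):
          """Illustrative LayerDrop wrapper: skip each layer with
          probability p during training; run every layer at eval time."""
          def __init__(self, layers, p=0.2):
              super().__init__()
              self.layers = nn.ModuleList(layers)
              self.p = p
      
          def forward(self, x):
              for layer in self.layers:
                  if self.training and torch.rand(1).item() < self.p:
                      continue  # drop this layer for the current batch
                  x = layer(x)
              return x
      ```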
      
      Reviewed By: jhcross
      
      Differential Revision: D18165586
      
      fbshipit-source-id: 373ac00268a25fa9e412edcb483becdfe792d992