- 29 May, 2019 4 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/610 Differential Revision: D15541261 Pulled By: myleott fbshipit-source-id: f0b823cf4f04c5ef3205f6d259c6dcad4cc329b1
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/608 Differential Revision: D15541220 Pulled By: myleott fbshipit-source-id: 52a8e4da72cc6e3e25cf98c989d34a269d614c9d
-
Spencer Poff authored
Summary: There were two non-obvious errors I ran into while creating a new language modeling task: - `transformer_lm` implicitly required the `tokens_per_sample` arg - `transformer_lm` assumed the task had a `dictionary` and `output_dictionary` property, neither of which are specified in the FairseqTask interface Reviewed By: myleott Differential Revision: D15532345 fbshipit-source-id: 200d7d3b542c35f17cc2d6bca4219c4a4d17cb6b
-
Kartikay Khandelwal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/765 Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/614 This diff has changes needed to make XLM torchscript exportable. Reviewed By: bethebunny Differential Revision: D15497208 fbshipit-source-id: fd9645119e154e3c397f147acf9144d661d9a5c8
-
- 28 May, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/605 Differential Revision: D15518167 Pulled By: myleott fbshipit-source-id: 8b0e6b32adff018136d0d251b7fde3818e373d6f
-
- 24 May, 2019 2 commits
-
-
Yongqiang Wang authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/747 In https://github.com/pytorch/fairseq/pull/647, checkpoint averaging was not implemented correctly for shared parameters. This diff has the correct implementation and a test case to guard against future regressions. Reviewed By: myleott Differential Revision: D15402943 fbshipit-source-id: 8004836d5c2571814ea54844650618008a9ee522
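The shared-parameter pitfall can be sketched without torch: when two state_dict keys alias the same underlying storage, a per-key in-place update touches that storage twice, while averaging into a fresh dict does not. A minimal sketch (names and the list-as-storage trick are illustrative, not fairseq's actual code):

```python
def average_checkpoints(state_dicts):
    """Average parameters across checkpoints into a fresh dict keyed by
    name -- aliased keys are simply averaged independently, which is safe."""
    n = len(state_dicts)
    avg = {}
    for sd in state_dicts:
        for name, value in sd.items():
            avg[name] = avg.get(name, 0.0) + value / n
    return avg

# The buggy pattern: per-key in-place updates on a model whose encoder and
# decoder embeddings share one storage (simulated here with one list).
shared = [1.0]
params = {"encoder.embed_tokens.weight": shared,
          "decoder.embed_tokens.weight": shared}
for name in params:
    params[name][0] += 1.0  # the shared storage is updated once per alias

assert shared[0] == 3.0  # updated twice; 2.0 if the keys were independent
```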
-
Jingfei Du authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/758 Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/603 Fixed a typo in `_mask_block` of the masked LM. The typo meant we never replaced a masked token with a random token, which should account for 10% of the masked tokens. Reviewed By: akinh Differential Revision: D15492315 fbshipit-source-id: 1e03dc862e23a6543e51d7401c74608d366ba62d
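The 80/10/10 masking scheme this fix restores can be sketched with numpy; the helper name, signature, and defaults below are hypothetical, not fairseq's exact code:

```python
import numpy as np

def mask_block(tokens, mask_idx, vocab_size, mask_prob=0.15, seed=0):
    """BERT-style masking: of the selected positions, ~80% become <mask>,
    ~10% become a random token, and ~10% are left unchanged."""
    rng = np.random.default_rng(seed)
    tokens = np.array(tokens)
    n_mask = max(1, int(round(len(tokens) * mask_prob)))
    # sample positions without replacement so no position is picked twice
    positions = rng.choice(len(tokens), size=n_mask, replace=False)
    for pos in positions:
        r = rng.random()
        if r < 0.8:
            tokens[pos] = mask_idx
        elif r < 0.9:
            # the branch the typo disabled: replace with a random token
            tokens[pos] = rng.integers(vocab_size)
        # else: keep the original token (model must still predict it)
    return tokens, positions
```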
-
- 23 May, 2019 3 commits
-
-
Jason Fried authored
Summary: In Python 3.7, importing ABC classes from `collections` emits a deprecation warning; in 3.8 it will not work at all. This changes all code using ABCs from `collections` to import from `collections.abc` instead. I am not fixing pre-existing lint issues; where `arc lint` auto-fixed, I accepted the fix, except for spelling in code. Reviewed By: lisroach Differential Revision: D15461049 fbshipit-source-id: ac2bf2ec8cffacd8ba5572882b0832bbf99a1646
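The fix amounts to changing the import path; for example (the `flatten` helper is illustrative, not from the diff):

```python
from collections.abc import Iterable  # not: from collections import Iterable

def flatten(x):
    """Recursively flatten nested iterables, leaving strings intact."""
    for item in x:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)
        else:
            yield item

assert list(flatten([1, [2, [3, 4]], "ab"])) == [1, 2, 3, 4, "ab"]
```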
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/600 Differential Revision: D15469322 Pulled By: myleott fbshipit-source-id: fdefa8efbb10e48b2a04a6bc10404fd2f3f21ecf
-
Kritika Singh authored
Summary: Context from https://fb.workplace.com/groups/1405155842844877/permalink/2785095451517569/: I am adding a model to pyspeech (formerly fairspeq) with the following `forward`:
```
def forward(self, src_tokens, src_lengths, prev_output_tokens, name):
    encoder_out = self.encoder(src_tokens, src_lengths)
    if name == Dataset.d1:
        decoder_out = self.decoder1(prev_output_tokens, encoder_out)
    elif name == Dataset.d2:
        decoder_out = self.decoder2(encoder_out)
    return decoder_out
```
When I run distributed training on this model, I get the following error:
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable). (prepare_for_backward at caffe2/torch/csrc/distributed/c10d/reducer.cpp:410)
```
The recommended fix is to pass find_unused_parameters=True to DistributedDataParallel's initialization. Reviewed By: myleott Differential Revision: D15439726 fbshipit-source-id: 7fd80d4a3f49ac90182dec723b49b14e6689406a
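What `find_unused_parameters=True` changes can be illustrated with a torch-free toy reducer. This is a simplification of DDP's real behavior (which buckets gradients and synchronizes across ranks); the class and names here are invented for illustration:

```python
class ToyReducer:
    """Mimics DDP's check that every registered parameter received a
    gradient. With find_unused_parameters=True, unused parameters are
    marked ready instead of triggering the reduction error."""
    def __init__(self, param_names, find_unused_parameters=False):
        self.param_names = set(param_names)
        self.find_unused_parameters = find_unused_parameters

    def prepare_for_backward(self, used):
        unused = self.param_names - set(used)
        if unused and not self.find_unused_parameters:
            raise RuntimeError(
                "Expected to have finished reduction in the prior iteration")
        return unused  # real DDP marks these gradient buckets as ready

reducer = ToyReducer({"decoder1.weight", "decoder2.weight"},
                     find_unused_parameters=True)
# A batch from dataset d1 only exercises decoder1:
assert reducer.prepare_for_backward({"decoder1.weight"}) == {"decoder2.weight"}
```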
-
- 22 May, 2019 2 commits
-
-
Matt Le authored
Summary: Fixes the semi-supervised translation task to deal with the change in the order of data loading and model creation (D15428242). When we build the model, we create the backtranslation function, which we can then pass to the constructor of BacktranslationDataset. Reviewed By: myleott Differential Revision: D15455420 fbshipit-source-id: 95101ca92f8af33702be3416147edd98da135a20
-
zhiqiang authored
Summary: Remove duplicate definition of PositionalEmbedding in `lightconv.py` Pull Request resolved: https://github.com/pytorch/fairseq/pull/754 Differential Revision: D15451443 Pulled By: myleott fbshipit-source-id: a3d82ab2c1335d66be3c5d67a07893162d138c7a
-
- 21 May, 2019 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/595 Differential Revision: D15428242 Pulled By: myleott fbshipit-source-id: 3cec83a2353498a4802398eba8bcb1aefaf6d5c4
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/596 Differential Revision: D15432359 Pulled By: myleott fbshipit-source-id: ebfdf0031864c3c88357543c0202ba0bd65a7b90
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/597 Differential Revision: D15432965 Pulled By: myleott fbshipit-source-id: 4471a2a8bb468bb639a80f977ab4c20480acb461
-
- 20 May, 2019 4 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/592 Differential Revision: D15415499 Pulled By: myleott fbshipit-source-id: 87ba09b9b38501daebd95bbf28815e048c78f9a3
-
Jingfei Du authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/752 Previously we sampled masked tokens with replace=True (the default). Because of this, we could mask the same token multiple times, ultimately masking fewer tokens than intended. Reviewed By: liaimi Differential Revision: D15403556 fbshipit-source-id: cf12eeb13f9610431136a345de9199ad0292984b
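The effect of the fix is easy to see with numpy (sequence length and mask count below are made-up examples):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, n_mask = 512, 77  # e.g. ~15% of a 512-token block

with_replacement = rng.choice(seq_len, size=n_mask, replace=True)      # old, buggy
without_replacement = rng.choice(seq_len, size=n_mask, replace=False)  # fixed

# With replacement, duplicate draws collapse to fewer unique positions,
# so fewer than n_mask tokens actually end up masked.
assert len(set(without_replacement.tolist())) == n_mask
assert len(set(with_replacement.tolist())) <= n_mask
```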
-
Ning Dong authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/730 Pull Request resolved: https://github.com/pytorch/translate/pull/528 Add/modify the necessary functions for ConcatDataset to work in PytorchTranslateTask and replace MultiCorpusSampledDataset, which doesn't support mixed batches. Any ideas on how to implement the collater here for mixed batches? For now I'm just using the collater of the first dataset. Reviewed By: liezl200 Differential Revision: D15260872 fbshipit-source-id: 14b148c506e9f8ebf4fe60a49f95444d4123d76f
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/591 Differential Revision: D15415490 Pulled By: myleott fbshipit-source-id: c45df5f3b5327911e2c9b11642e7da2e8bb835dc
-
- 19 May, 2019 1 commit
-
-
Kartikay Khandelwal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/570 Pull Request resolved: https://github.com/pytorch/fairseq/pull/731 Currently the LearnedPositionalEmbedding module computes the position tensor based on the input data. However, this doesn't work for XLM, where we need different behavior for the masked LM and the translation LM. In this diff I keep the same default behavior for LearnedPositionalEmbedding as before but add the ability for these models to work with pre-computed position tensors. Reviewed By: myleott Differential Revision: D15305474 fbshipit-source-id: de7d908245a2a620b58d36055211600a08f2d1dc
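The idea can be sketched in numpy: derive positions from non-pad tokens by default, but accept a precomputed position array when the caller (e.g. XLM) supplies one. The function name and signature are hypothetical, not the module's exact API:

```python
import numpy as np

def learned_positional_embedding(tokens, weight, pad_idx, positions=None):
    """Look up learned position embeddings. By default, positions are
    derived from the input: non-pad tokens count up from pad_idx + 1,
    pads map to pad_idx. A precomputed `positions` array overrides this."""
    tokens = np.asarray(tokens)
    if positions is None:
        mask = (tokens != pad_idx).astype(int)
        positions = np.cumsum(mask, axis=-1) * mask + pad_idx
    return weight[positions]
```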
-
- 17 May, 2019 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/588 Differential Revision: D15389638 Pulled By: myleott fbshipit-source-id: 4632ce22d51dc2c74d250bae999630095d849701
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/586 Differential Revision: D15372949 Pulled By: myleott fbshipit-source-id: c1cf1c645e8d55fc8568f23a47c45677ac9ab1da
-
- 16 May, 2019 5 commits
-
-
Jingfei Du authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/744 Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/587 After we added additional prediction layers for language model predictions, fine-tuning broke for two reasons: 1. the checkpoint cannot be loaded since we didn't update the state_dict names; 2. lm_output_learned_bias is not initialized if load_softmax is false. Reviewed By: myleott Differential Revision: D15377380 fbshipit-source-id: d58544b1d2c549586abef42fec19ec8bf27a994a
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/743 Original commit changeset: 0afe37c9a031 According to edunov: "We need to be careful here with shared parameters, I believe right now it is broken if you have shared encoder/decoder input embeddings (encoder.embed_tokens.weight and decoder.embed_tokens.weight) as they get updated several times" We also have OSS issues that look related, e.g., https://github.com/pytorch/fairseq/issues/732. Backing this out until we can confirm the correct behavior for shared params. Differential Revision: D15372673 fbshipit-source-id: 8683c0f2514e21fa1e9d2fe6dfc48d98957a2831
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/585 Differential Revision: D15372416 fbshipit-source-id: add226a4558ae4d84dd261e9317b80c43970f771
-
Peng-Jen Chen authored
Summary: Similar to TranslationTask, we want to enable the multilingual translation task to load 'train{k}' datasets from the data-bin folder. Reviewed By: lematt1991 Differential Revision: D15363481 fbshipit-source-id: 5fed7be19383023b792ed2fd38e655cbcecc8b90
-
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/584 Reviewed By: myleott Differential Revision: D15360774 Pulled By: myleott fbshipit-source-id: b18efbb6ff5a8832c61b689f3d87c958cbd908e9
-
- 15 May, 2019 7 commits
-
-
Ruty Rinott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/583 D14610694 fixed issues in layerNorm exporting by making it conditional. D15260838 changed the implementation of TransformerDecoderLayer to the one under transformer, thus losing the fix. Bringing it back here. Reviewed By: myleott, geof90, liaimi Differential Revision: D15357119 fbshipit-source-id: e29e053ca5beca0008d7a8dad9880a483a14c7b9
-
Naman Goyal authored
Summary: added shuffle as an arg for masked_lm for experimenting with pad-efficient batching Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/582 Reviewed By: jingfeidu Differential Revision: D15355105 Pulled By: jingfeidu fbshipit-source-id: 9925271a0bc2f9d283f354d158bd4b5ec8788b39
-
Naman Goyal authored
Summary: 1) Added pooled_output for sentence classification as `Tanh(Linear())`. 2) Added lm_head_transform as `LayerNorm(GeLU(Linear(x)))` 3) `act_dropout = 0.0` 4) added `lm_output_learned_bias` Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/581 Reviewed By: borguz Differential Revision: D15353575 Pulled By: borguz fbshipit-source-id: 4ff64c6ceed23f3e99348f73d189546f1d84452e
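The two heads added here can be sketched in numpy; shapes, weights, and function names below are illustrative, not the diff's actual modules:

```python
import numpy as np

def gelu(x):
    # tanh-approximate GeLU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def pooled_output(first_token, w, b):
    """Sentence-classification pooling head: Tanh(Linear(x))."""
    return np.tanh(first_token @ w + b)

def lm_head_transform(x, w, b, eps=1e-5):
    """LM head transform: LayerNorm(GeLU(Linear(x))), normalizing over
    the last (feature) dimension."""
    h = gelu(x @ w + b)
    return (h - h.mean(-1, keepdims=True)) / np.sqrt(h.var(-1, keepdims=True) + eps)
```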
-
Myle Ott authored
Summary: - `FairseqModel` -> `FairseqEncoderDecoderModel` - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer` - `encoder_out_dict` -> `encoder_out` - rm unused `remove_head` functions - update docs Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/561 Differential Revision: D15271142 Pulled By: myleott fbshipit-source-id: 8e8864e399336020f0271c780598e968ff51a264
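The new decoder contract splits forward into two overridable pieces; a toy numpy version (shapes and the trivial feature extractor are stand-ins, not fairseq code):

```python
import numpy as np

class ToyDecoder:
    """After this refactor, a decoder exposes extract_features() (hidden
    states before the output projection) and output_layer() (projection
    to vocabulary logits), with forward() composing the two."""
    def __init__(self, dim, vocab_size, seed=0):
        rng = np.random.default_rng(seed)
        self.out_proj = rng.standard_normal((dim, vocab_size))

    def extract_features(self, x):
        return np.tanh(x)  # stand-in for the real decoder stack

    def output_layer(self, features):
        return features @ self.out_proj  # vocabulary logits

    def forward(self, x):
        return self.output_layer(self.extract_features(x))
```

Splitting the two makes it easy to reuse hidden states (e.g. for probing or feature extraction) without paying for the vocabulary projection.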
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/578 Differential Revision: D15352060 Pulled By: myleott fbshipit-source-id: 7dc2fceca37ec96c89356662831b0d82f28bef6f
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/579 Differential Revision: D15352058 Pulled By: myleott fbshipit-source-id: cebef02edcfcb203ef2e32c64f7f28e08c4e46b0
-
Myle Ott authored
Summary: Various fixes for Masked LM - use --activation-fn instead of --gelu - use --dataset-impl instead of --lazy-load - add embed_scale option to TransformerSentenceEncoder - fix encoder_normalize_before to include a final layer norm - delete BertLayerNorm Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/573 Reviewed By: borguz Differential Revision: D15317933 Pulled By: myleott fbshipit-source-id: 8ecb46556ad43e76e92d41ed8f5a62e8516fd375
-
- 14 May, 2019 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/575 Differential Revision: D15318004 Pulled By: myleott fbshipit-source-id: ad918d71b1bd8074decf5ec3463dd9bc9487bbe9
-
Nayan Singhal authored
Summary: 1. Define an EpochMinibatchIterator which extends EpochBatchIterator. It has the same functionality as EpochBatchIterator except for two major changes: it uses static batching and uses MiniBatchIterator for getting the indices. 2. SplitSeqCollater is used instead of Seq2SeqCollater. 3. LSTM_subsample now stores the previous states and resets them once the sample is over. Reviewed By: jay-mahadeokar Differential Revision: D15209023 fbshipit-source-id: 900b8bd1f25159ffc77f8106e26729a3e7422a1f
-
Dmytro Okhonko authored
Summary: Move `load_checkpoint`, `save_checkpoint` and `reload_train` from train.py to checkpoint_utils.py. Move `get_perplexity` from train.py to utils.py. This will make train.py lighter and allow us to reuse all this utils functionality when fairseq is used as an external library.
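Since fairseq reports cross-entropy loss in base 2, the `get_perplexity` helper being moved is essentially the following; this is a simplified sketch (the real function also formats the result), under the assumption that the loss is base-2:

```python
def get_perplexity(loss, base=2):
    """Perplexity from a base-2 cross-entropy loss: base ** loss,
    guarding against overflow for very large losses."""
    if loss is None:
        return 0.0
    try:
        return base ** loss
    except OverflowError:
        return float("inf")

assert get_perplexity(0.0) == 1.0  # a perfect model: one effective choice
assert get_perplexity(3.0) == 8.0  # 3 bits of uncertainty per token
```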
-
- 13 May, 2019 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/576 Differential Revision: D15318086 Pulled By: myleott fbshipit-source-id: c6587737ca7b97edc97ad4aef5c5c9ac7e92b2f2
-
Myle Ott authored
Summary: This was named gelu_fast after the original implementation: https://github.com/hendrycks/GELUs/blob/master/mnist_ae.py#L62-L63 But in practice it's actually slower and uses more memory. Rename to gelu_accurate. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/571 Differential Revision: D15317874 Pulled By: myleott fbshipit-source-id: c96fbc89bf91b27ced1ab8d5b25a8f23f922ec24
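The renamed function is the tanh approximation from the linked reference code; side by side with the erf-based exact GELU (pure-math sketch, not fairseq's torch implementation):

```python
import math

def gelu(x):
    """Exact GELU via the Gaussian CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_accurate(x):
    """The tanh approximation, renamed from gelu_fast in this commit."""
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# The two agree closely over typical activation ranges.
assert abs(gelu(1.0) - gelu_accurate(1.0)) < 1e-3
```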
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/574 Differential Revision: D15317984 Pulled By: myleott fbshipit-source-id: 09a66229cc6b4c95678ca1ca13c9e0da25b203de
-