  23 May, 2019 (3 commits)
    • collections.abc python 3.8 · 6b3a516f
      Jason Fried authored
      Summary:
      In Python 3.7, importing ABCs from `collections` rather than `collections.abc` raises a DeprecationWarning; in Python 3.8 it will not work at all.
      
      This changes all code that uses ABCs from `collections` to import them from `collections.abc` instead.
      
      I am not fixing pre-existing lint issues; where `arc lint` auto-fixed something, I accepted the change, except for spelling fixes in code.
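
      A minimal sketch of the resulting import pattern (the class names here are illustrative, not taken from the diff):
      ```
      # Prefer collections.abc, which is mandatory as of Python 3.8; fall
      # back to the old location only on interpreters that predate it.
      try:
          from collections.abc import Iterable, Mapping
      except ImportError:  # Python < 3.3
          from collections import Iterable, Mapping
      ```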
      
      Reviewed By: lisroach
      
      Differential Revision: D15461049
      
      fbshipit-source-id: ac2bf2ec8cffacd8ba5572882b0832bbf99a1646
    • Fix gating for find_unused_parameters · 128f4bea
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/600
      
      Differential Revision: D15469322
      
      Pulled By: myleott
      
      fbshipit-source-id: fdefa8efbb10e48b2a04a6bc10404fd2f3f21ecf
    • Allow unused params in distributed training · 72a5487c
      Kritika Singh authored
      Summary:
      Context from https://fb.workplace.com/groups/1405155842844877/permalink/2785095451517569/:
      
      I am adding a model to pyspeech (formerly fairspeq) with the following `forward`:
      ```
      def forward(self, src_tokens, src_lengths, prev_output_tokens, name):
          encoder_out = self.encoder(src_tokens, src_lengths)
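          # Only one decoder branch runs for a given batch, so the other
          # decoder's parameters receive no gradient in that iteration.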
          if name == Dataset.d1:
              decoder_out = self.decoder1(prev_output_tokens, encoder_out)
          elif name == Dataset.d2:
              decoder_out = self.decoder2(encoder_out)
          return decoder_out
      ```
      When I run distributed training on this model, I get the following error:
      
      ```
      RuntimeError: Expected to have finished reduction in the prior iteration before starting a
      new one. This error indicates that your module has parameters that were not used in
      producing loss. You can enable unused parameter detection by (1) passing the keyword
      argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2)
      making sure all `forward` function outputs participate in calculating loss. If you already have
      done the above two steps, then the distributed data parallel module wasn't able to locate the
      output tensors in the return value of your module's `forward` function. Please include the loss
      function and the structure of the return value of `forward` of your module when reporting this
      issue (e.g. list, dict, iterable). (prepare_for_backward at
      caffe2/torch/csrc/distributed/c10d/reducer.cpp:410)
      ```
      
      The recommended fix is to pass `find_unused_parameters=True` to `DistributedDataParallel`'s constructor, as sketched below.
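
      A minimal sketch of that fix (assumes `torch.distributed` is already initialized and `local_rank` comes from the launcher; the model is illustrative, not pyspeech's):
      ```
      import torch.nn as nn
      from torch.nn.parallel import DistributedDataParallel

      class TwoDecoderModel(nn.Module):
          def __init__(self):
              super().__init__()
              self.encoder = nn.Linear(8, 8)
              self.decoder1 = nn.Linear(8, 4)
              self.decoder2 = nn.Linear(8, 4)

          def forward(self, x, use_first):
              h = self.encoder(x)
              # Only one decoder runs, mirroring the forward() above.
              return self.decoder1(h) if use_first else self.decoder2(h)

      ddp_model = DistributedDataParallel(
          TwoDecoderModel().cuda(),
          device_ids=[local_rank],      # assumed: provided by the launch script
          find_unused_parameters=True,  # tolerate per-iteration unused params
      )
      ```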
      
      Reviewed By: myleott
      
      Differential Revision: D15439726
      
      fbshipit-source-id: 7fd80d4a3f49ac90182dec723b49b14e6689406a
  22 May, 2019 (2 commits)
    • Fix semisupervised translation · c11aaf14
      Matt Le authored
      Summary: Fixes the semisupervised translation task to handle the change in the order of data loading and model creation (D15428242). When we build the model, we create the backtranslation function, which we can then pass to the constructor of BacktranslationDataset.
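
      A rough sketch of the resulting flow (fairseq names from this era; the exact signatures and helpers are assumptions, not the actual internal API):
      ```
      from fairseq.data import BacktranslationDataset

      # 1) The model is now built before the dataset that depends on it.
      model = task.build_model(args)

      # 2) The backtranslation function closes over the freshly built model.
      def backtranslation_fn(sample):
          # hypothetical generation helper; the real task wires up a generator
          return task.backtranslator.generate([model], sample)

      # 3) ...and is passed into the dataset's constructor.
      dataset = BacktranslationDataset(
          tgt_dataset=tgt_dataset,
          src_dict=task.source_dictionary,
          backtranslation_fn=backtranslation_fn,
      )
      ```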
      
      Reviewed By: myleott
      
      Differential Revision: D15455420
      
      fbshipit-source-id: 95101ca92f8af33702be3416147edd98da135a20
    • Remove duplicate code (#754) · 886ef6bc
      zhiqiang authored
      Summary:
      Remove duplicate definition of PositionalEmbedding in `lightconv.py`
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/754
      
      Differential Revision: D15451443
      
      Pulled By: myleott
      
      fbshipit-source-id: a3d82ab2c1335d66be3c5d67a07893162d138c7a
  15 May, 2019 (1 commit)
    • Fix biTransformer export (#583) · 2a3adcdc
      Ruty Rinott authored
      Summary:
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/583
      
      D14610694 fixed issues in LayerNorm exporting by making it conditional. D15260838 changed the implementation of TransformerDecoderLayer to the one under transformer, thus losing that fix; this brings it back.
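
      A hedged sketch of what "making it conditional" looks like (modeled on fairseq's LayerNorm helper; the details are assumptions, not the exact diff):
      ```
      import torch
      import torch.nn as nn

      def LayerNorm(normalized_shape, eps=1e-5, export=False):
          # When exporting (e.g. for TorchScript/ONNX), skip the fused CUDA
          # kernel and return the plain, traceable nn.LayerNorm instead.
          if not export and torch.cuda.is_available():
              try:
                  from apex.normalization import FusedLayerNorm
                  return FusedLayerNorm(normalized_shape, eps=eps)
              except ImportError:
                  pass
          return nn.LayerNorm(normalized_shape, eps=eps)
      ```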
      
      Reviewed By: myleott, geof90, liaimi
      
      Differential Revision: D15357119
      
      fbshipit-source-id: e29e053ca5beca0008d7a8dad9880a483a14c7b9