1. 11 Jun, 2019 6 commits
  2. 10 Jun, 2019 2 commits
    • More generator features for demo (#791) · 4868c182
      Myle Ott authored
      Summary:
      - make it possible to import file_utils.py without its optional dependencies installed (a minimal sketch follows this entry)
      - add some more demo features
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/791
      
      Differential Revision: D15739950
      
      Pulled By: myleott
      
      fbshipit-source-id: 38df5209973a6fe2e3651575b97134e096aaf5bf
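      A minimal sketch of the import-guarding idea described above, assuming a generic optional dependency; the package name (`requests`) and the `fetch` helper are illustrative, not the actual file_utils.py change:
      ```
      # Hypothetical sketch: wrap optional third-party imports so the module can
      # still be imported when those packages are not installed.
      try:
          import requests  # optional dependency, only needed for remote downloads
      except ImportError:
          requests = None

      def fetch(url):
          # Fail only when the optional feature is actually used.
          if requests is None:
              raise ImportError("fetch() requires the `requests` package; please install it")
          return requests.get(url).content
      ```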
    • fix log printing in progress bar (#778) · a58c1127
      freewym authored
      Summary:
      In the current progress bar, the counter used for log_interval always starts from 0, which is wrong when reloading from a checkpoint in the middle of an epoch. This fix reads the offset from the iterator and uses it to initialize the counter (a minimal sketch follows this entry).
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/778
      
      Differential Revision: D15739953
      
      Pulled By: myleott
      
      fbshipit-source-id: a1d13403ec5783b22e01d7cb63874fd8dea7f8b0
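      A minimal sketch of the idea, using a simplified progress-bar class rather than fairseq's actual implementation; the `offset` argument and attribute name are assumptions:
      ```
      # Hypothetical sketch: seed the log-interval counter from the iterator's
      # offset so that logging stays aligned after resuming mid-epoch.
      class SimpleProgressBar:
          def __init__(self, iterable, log_interval=100, offset=0):
              self.iterable = iterable
              self.log_interval = log_interval
              self.offset = offset  # previously the counter always started at 0

          def __iter__(self):
              for i, obj in enumerate(self.iterable, start=self.offset):
                  yield obj
                  if i % self.log_interval == 0:
                      print(f"| step {i}")

      # Usage: pass the position recovered from the restored epoch iterator,
      # e.g. SimpleProgressBar(epoch_itr, offset=getattr(epoch_itr, "offset", 0)).
      ```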
  3. 07 Jun, 2019 1 commit
  4. 06 Jun, 2019 1 commit
  5. 04 Jun, 2019 4 commits
  6. 03 Jun, 2019 2 commits
  7. 02 Jun, 2019 2 commits
  8. 01 Jun, 2019 1 commit
  9. 31 May, 2019 1 commit
  10. 30 May, 2019 7 commits
  11. 29 May, 2019 7 commits
  12. 28 May, 2019 1 commit
  13. 24 May, 2019 2 commits
  14. 23 May, 2019 3 commits
    • collections.abc python 3.8 · 6b3a516f
      Jason Fried authored
      Summary:
      In Python 3.7, importing the ABC classes from `collections` instead of `collections.abc` emits a DeprecationWarning; in 3.8 it will not work at all.

      This changes all code that uses ABCs from `collections` so that it attempts to import them from `collections.abc` instead (a minimal sketch follows this entry).

      I am not fixing pre-existing lint issues; where `arc lint` auto-fixed something I accepted it, except for spelling changes in code.
      
      Reviewed By: lisroach
      
      Differential Revision: D15461049
      
      fbshipit-source-id: ac2bf2ec8cffacd8ba5572882b0832bbf99a1646
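      A minimal sketch of the compatibility pattern, not the actual diff; the `flatten` helper is only an illustration of code that depends on the ABCs:
      ```
      # Prefer collections.abc, which is required on Python 3.8+, and fall back
      # to the old location on interpreters where the ABCs still live in collections.
      try:
          from collections.abc import Iterable
      except ImportError:
          from collections import Iterable

      def flatten(nested):
          """Recursively flatten nested iterables, treating strings as atoms."""
          for item in nested:
              if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
                  yield from flatten(item)
              else:
                  yield item

      print(list(flatten([1, [2, [3, "abc"]], 4])))  # [1, 2, 3, 'abc', 4]
      ```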
    • Fix gating for find_unused_parameters · 128f4bea
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/600
      
      Differential Revision: D15469322
      
      Pulled By: myleott
      
      fbshipit-source-id: fdefa8efbb10e48b2a04a6bc10404fd2f3f21ecf
    • Allow unused params in distributed training · 72a5487c
      Kritika Singh authored
      Summary:
      Context from https://fb.workplace.com/groups/1405155842844877/permalink/2785095451517569/:
      
      I am adding a model to pyspeech (formerly fairspeq) with the following `forward`:
      ```
      def forward(self, src_tokens, src_lengths, prev_output_tokens, name):
          encoder_out = self.encoder(src_tokens, src_lengths)
          if name == Dataset.d1:
              decoder_out = self.decoder1(prev_output_tokens, encoder_out)
          elif name == Dataset.d2:
              decoder_out = self.decoder2(encoder_out)
          return decoder_out
      ```
      When I run distributed training on this model, I get the following error:
      
      ```
      RuntimeError: Expected to have finished reduction in the prior iteration before starting a
      new one. This error indicates that your module has parameters that were not used in
      producing loss. You can enable unused parameter detection by (1) passing the keyword
      argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2)
      making sure all `forward` function outputs participate in calculating loss. If you already have
      done the above two steps, then the distributed data parallel module wasn't able to locate the
      output tensors in the return value of your module's `forward` function. Please include the loss
      function and the structure of the return value of `forward` of your module when reporting this
      issue (e.g. list, dict, iterable). (prepare_for_backward at
      caffe2/torch/csrc/distributed/c10d/reducer.cpp:410)
      ```
      
      The recommended fix is to pass `find_unused_parameters=True` when constructing `DistributedDataParallel`; a sketch of the wrapping code follows this entry.
      
      Reviewed By: myleott
      
      Differential Revision: D15439726
      
      fbshipit-source-id: 7fd80d4a3f49ac90182dec723b49b14e6689406a
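      A minimal sketch of the recommended fix, assuming the distributed process group has already been initialized and the model has been moved to its device; the helper name and `local_rank` argument are illustrative:
      ```
      import torch.nn as nn
      from torch.nn.parallel import DistributedDataParallel

      def wrap_for_ddp(model: nn.Module, local_rank: int) -> nn.Module:
          # Assumes torch.distributed.init_process_group(...) has already run.
          return DistributedDataParallel(
              model,
              device_ids=[local_rank],
              output_device=local_rank,
              find_unused_parameters=True,  # tolerate params (e.g. an unused decoder) this step
          )
      ```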