Commits · 9dd8724577fd01511a59df962e856a7e30be4618 · OpenDAS / Fairseq

30 Nov, 2018 1 commit

fixed torch 0.4.0 , "RuntimeError: Expected object of type torch.cuda… (#393) · 9dd87245

linkerr authored Nov 30, 2018

Summary:
….LongTensor but found type torch.cuda.FloatTensor for argument #3 'index' " error

In torch 0.4.0 (torch.__version__ == 0.4.0),
new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1)
returns a float-dtype Tensor, so executing line 321 of fairseq/fairseq/models/fconv.py throws a RuntimeError.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/393
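
A minimal sketch of one way to guard against this, assuming an explicit cast is acceptable. The helper name `build_new_order` and the tensor sizes are hypothetical, and the actual change in #393 may differ. Casting with `.long()` hands `index_select` an integer index regardless of the dtype that `torch.arange` returns:

```python
import torch


def build_new_order(bsz, beam_size):
    """Repeat each batch index beam_size times, as an int64 index tensor."""
    order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1)
    # .long() is a no-op if the tensor is already int64 and a cast otherwise.
    return order.long()


new_order = build_new_order(bsz=2, beam_size=3)
encoder_out = torch.randn(2, 4)  # hypothetical (bsz, dim) encoder output
expanded = encoder_out.index_select(0, new_order)
```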

Differential Revision: D13276496

Pulled By: myleott

fbshipit-source-id: e7986246fbe2c79fff61bcab0e5bec9dd63e0afd


29 Nov, 2018 2 commits

fixes on bi-transformer onnx · 7bbe528d

Haoran Li authored Nov 28, 2018

Summary: replace dynamic index put with copying and creating a new tensor

Reviewed By: wanchaol

Differential Revision: D13244573

fbshipit-source-id: 909f7913ad579ed035f29bb52321ff01e09a2c60


Fix --ddp-backend=no_c10d for params that don't require grads · 866d0d2e

Myle Ott authored Nov 28, 2018

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/388

Reviewed By: theweiho

Differential Revision: D13244869

fbshipit-source-id: d22c18f63f9a691ccc7245e06bc9a5b776a192b5


27 Nov, 2018 2 commits

Decoder embedding sharing in PyTorch Translate for denoising autoencoder (#386) · 07e34244

Liezl Puzon authored Nov 27, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/386

Pull Request resolved: https://github.com/pytorch/translate/pull/266

This allows decoder embedding sharing for denoising autoencoder modules with different decoders (one for src decoding and one for tgt decoding)

Reviewed By: dpacgopinath

Differential Revision: D13133015

fbshipit-source-id: 3c98be639d705744ccf5ba3a8fd7d10ddc7aef4a


onnx bi-transformer (#385) · a5e2d786

Haoran Li authored Nov 26, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/385

Pull Request resolved: https://github.com/facebookresearch/pytext/pull/6

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14292

Reviewed By: jingfeidu

Differential Revision: D10517864

fbshipit-source-id: 81008b5cc6aab70e23329c187392fb72ee057d78


26 Nov, 2018 2 commits

Fix some recursive functions (e.g., reorder_incremental_state) to only touch... · 14506a83

Myle Ott authored Nov 25, 2018

Fix some recursive functions (e.g., reorder_incremental_state) to only touch each module once (#379)

Summary:
This can happen if a module is registered in more than one place in the network.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/379
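
The idea can be sketched without PyTorch, assuming a simple tree of modules. `Node` and `apply_once` are hypothetical names used only for illustration, not fairseq code. A set of already visited object ids stops a shared module from being processed twice:

```python
class Node:
    """Hypothetical stand-in for a module that may be registered in several places."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.reorder_calls = 0

    def reorder_incremental_state(self):
        self.reorder_calls += 1


def apply_once(root, fn_name):
    """Call fn_name on every reachable module exactly once."""
    seen = set()

    def visit(module):
        if id(module) in seen:
            return  # already handled through another parent
        seen.add(id(module))
        getattr(module, fn_name)()
        for child in module.children:
            visit(child)

    visit(root)


shared = Node("shared_embedding")
root = Node("model", [Node("encoder", [shared]), Node("decoder", [shared])])
apply_once(root, "reorder_incremental_state")
```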

Differential Revision: D13154498

Pulled By: myleott

fbshipit-source-id: a35575d1956a46cd35ac8b16a719ad20ac3e380a


Refactor BacktranslationDataset to be more reusable (#354) · 3c19878f

Myle Ott authored Nov 25, 2018

Summary:
- generalize AppendEosDataset -> TransformEosDataset
- remove EOS logic from BacktranslationDataset (use TransformEosDataset instead)
- BacktranslationDataset takes a backtranslation_fn instead of building the SequenceGenerator itself
Pull Request resolved: https://github.com/pytorch/fairseq/pull/354
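
A rough sketch of the last point, assuming the dataset only needs a callable that maps a target sentence to a synthetic source. `TinyBacktranslationDataset` and `reverse_words` are hypothetical and far simpler than the real classes:

```python
class TinyBacktranslationDataset:
    """Hypothetical stand-in: pairs each target with a generated source."""

    def __init__(self, tgt_dataset, backtranslation_fn):
        self.tgt_dataset = tgt_dataset
        # The dataset no longer builds a generator itself; the caller injects one.
        self.backtranslation_fn = backtranslation_fn

    def __len__(self):
        return len(self.tgt_dataset)

    def __getitem__(self, index):
        target = self.tgt_dataset[index]
        return {"source": self.backtranslation_fn(target), "target": target}


def reverse_words(sentence):
    """Toy 'backtranslation' used only to make the example deterministic."""
    return " ".join(reversed(sentence.split()))


dataset = TinyBacktranslationDataset(["guten morgen", "danke"], reverse_words)
```

Passing the function in makes it easy to swap a real sequence generator for a stub in tests.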

Reviewed By: liezl200

Differential Revision: D12970233

Pulled By: myleott

fbshipit-source-id: d5c5b0e0a75eca1bd3a50382ac24621f35c32f36


19 Nov, 2018 1 commit

Protect against failures in case of OOMs · a442244d

Halil Akin authored Nov 19, 2018

Summary: Fixing some distributed failures that happen when OOMs are observed.

Reviewed By: myleott

Differential Revision: D13121054

fbshipit-source-id: f71a0a695332acbaa1797e89887b8b7c7ddaa727


18 Nov, 2018 2 commits

Merge small fixes from internal · 693894b6

Naman Goyal authored Nov 18, 2018

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/374

Differential Revision: D13116074

Pulled By: myleott

fbshipit-source-id: 485724cc5a40e8360d21e4bf9c35821baa0ddc57


Fix build for docs · 0864a9c4

Myle Ott authored Nov 18, 2018

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/372

Differential Revision: D13114426

Pulled By: myleott

fbshipit-source-id: 6c24b96a3556a0ecd3d1f350642a884254a40bd3


17 Nov, 2018 1 commit

Add LegacyDistributedDataParallel in place of no_c10d (#370) · 2625b0a4

Myle Ott authored Nov 17, 2018

Summary:
This should bring back the speedup with --update-freq that we reported in the Scaling Neural Machine Translation paper.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/370

Differential Revision: D13100281

Pulled By: myleott

fbshipit-source-id: 4a81b51bb7390a197add314a4be5512bbf68c085


16 Nov, 2018 1 commit

make dictionary optional · a4e34985

Haoran Li authored Nov 16, 2018

Reviewed By: jingfeidu

Differential Revision: D13104360

fbshipit-source-id: 9636f5ee2721818f98b33af559fa24292534a72f


14 Nov, 2018 1 commit

Fix dummy batch when --max-tokens is small (fixes #347) · 161d1e06

Myle Ott authored Nov 14, 2018

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/366

Differential Revision: D13058513

Pulled By: myleott

fbshipit-source-id: a146d2cfb345d404775ed8d6b8e4a4ad4e7a33b4


13 Nov, 2018 1 commit

Support for BPE vocabs + denoising autoencoder in PyTorch Translate (#362) · 7e60d45b

Liezl Puzon authored Nov 13, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/362

Pull Request resolved: https://github.com/pytorch/translate/pull/254

This actually uses the fairseq logic which supports BPE cont / end word marker suffixes.

Reviewed By: xianxl

Differential Revision: D12952766

fbshipit-source-id: 35a1bbc38240e4145bec0fc419f2d0a6a73ae2e5


10 Nov, 2018 1 commit

pipeline for LM training · 880e7cd4

Ruty Rinott authored Nov 09, 2018

Summary:
Step 2 of the pipeline for LM training.
Assumes tokenized text data as input, splits it into train/validation/test, and runs binarization
(step a_ii in https://fb.quip.com/kazzAxvZHBj9)

Reviewed By: borguz

Differential Revision: D10454705

fbshipit-source-id: 74e8679041f5507c4e404c1b719547c2ae9ed983


08 Nov, 2018 1 commit

Fix error when training multilingual_translation task with multi-GPU · 189fcabf

Peng-Jen Chen authored Nov 08, 2018

Summary:
D10052908 introduced the multilingual_translation task, but it raises an exception when training with multiple GPUs: P60202593

With Myle's help, we found that the cause is an improperly handled dummy batch data type, which means optimizer.backward() is not executed the same number of times across different GPUs.

Reviewed By: xianxl

Differential Revision: D12964263

fbshipit-source-id: 4991039030bf373f0c484e131acc4736487be4d8


07 Nov, 2018 2 commits

Merge internal changes · 8eb232ce

Myle Ott authored Nov 07, 2018

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/352

Differential Revision: D12956930

Pulled By: myleott

fbshipit-source-id: 39334a79544bac570feb04be9103269d7c1563f9


Support BPE end of word marker suffix in fairseq noising module · 2b13f3c0

Liezl Puzon authored Nov 06, 2018

Summary:
There are 2 ways to implement BPE:
1. use a continuation marker suffix to indicate that there is at least one more subtoken left in the word
2. use an end-of-word marker suffix to indicate that there are no more subtokens left in the word

This adds some logic to account for either kind of BPE marker suffix. This diff adds a corresponding test. I also refactored the test setup to reduce the number of boolean args when setting up test data.
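
The two conventions can be illustrated with a small sketch that maps each subword token to the index of the word it belongs to. The function name `word_ids`, its parameters, and the marker strings `@@` and `_EOW` are assumptions chosen for the example, not the names used in the noising module:

```python
def word_ids(tokens, bpe_cont_marker=None, bpe_end_marker=None):
    """Return, for each token, the index of the word it is part of."""
    ids = []
    current_word = 0
    for token in tokens:
        ids.append(current_word)
        if bpe_cont_marker is not None:
            # Continuation marker: a token WITHOUT the suffix closes the word.
            ends_word = not token.endswith(bpe_cont_marker)
        elif bpe_end_marker is not None:
            # End-of-word marker: a token WITH the suffix closes the word.
            ends_word = token.endswith(bpe_end_marker)
        else:
            # No BPE: every token is its own word.
            ends_word = True
        if ends_word:
            current_word += 1
    return ids


cont_ids = word_ids(["he@@", "llo", "wor@@", "ld"], bpe_cont_marker="@@")
end_ids = word_ids(["he", "llo_EOW", "wor", "ld_EOW"], bpe_end_marker="_EOW")
```

Both calls group the tokens into the same two words, which is what lets word-level noising treat either marker style uniformly.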

Reviewed By: xianxl

Differential Revision: D12919428

fbshipit-source-id: 405e9f346dce6e736c1305288721dfc7b63e872a


02 Nov, 2018 2 commits

Refactor fairseq/test_noising with a word shuffle helper function (#340) · b1521f96

Liezl Puzon authored Nov 01, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/340

This allows us to do a lot less copy paste when adding new word shuffle function tests

Reviewed By: xianxl

Differential Revision: D12810304

fbshipit-source-id: a56b5df093d17be2b73837897c526978cab92b70


Black formatting in fairseq/test_noising (#341) · 0b05467d

Liezl Puzon authored Nov 01, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/341

Use black formatting in test_noising.py

Reviewed By: xianxl

Differential Revision: D12810285

fbshipit-source-id: 5517dd5d2f086831f487d88acf6bc2fa18820297


01 Nov, 2018 6 commits

Fix "ignore-case" behavior (#339) · 726a47dc

ngimel authored Nov 01, 2018

Summary:
Currently, if `ignore-case` is set, the same line will be yielded twice - once as lower-cased version, once as original version, leading to lower than expected uncased scores.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/339
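
A reconstruction of the described behavior, assuming the reader is a simple generator over lines. Both function names are hypothetical and the original scoring code may be structured differently:

```python
def read_lines_buggy(lines, ignore_case=False):
    """Yields the lower-cased line AND the original when ignore_case is set."""
    for line in lines:
        if ignore_case:
            yield line.lower()
        yield line  # missing 'else': runs even after the lower-cased yield


def read_lines_fixed(lines, ignore_case=False):
    """Yields exactly one version of each line."""
    for line in lines:
        if ignore_case:
            yield line.lower()
        else:
            yield line


buggy = list(read_lines_buggy(["Hello World"], ignore_case=True))
fixed = list(read_lines_fixed(["Hello World"], ignore_case=True))
```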

Differential Revision: D12890386

Pulled By: myleott

fbshipit-source-id: 0570e5f6e8f848f2c6439d615e70aca6df097eef


Move fairseq part of D10478427 directly into pytorch-translate (#337) · 50a671f7

Myle Ott authored Nov 01, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/337

Pull Request resolved: https://github.com/pytorch/translate/pull/250

Reviewed By: akinh

Differential Revision: D12880352

fbshipit-source-id: 61e9888a9cc3df07e805820b74a5fcf359dfe0ea


Denoising autoencoder task (#251) · c9c660c0

Liezl Puzon authored Nov 01, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/251

We should use shared encoder and separate decoders as in:

https://fb.facebook.com/groups/2156114531381111/permalink/2169028113423086/

Generation is a hack; ideally the net input should have the lang pair info so that when we pass the sample to the model, it can select the correct encoder/decoder pair.

diff [2/2] will be for flow integration for basic experimentation

TODO in a future diff: figure out how to generalize this so export will work??

This works with vocab reduction, but we only support vocab reduction for src-tgt, not src-src model. A future (lowpri) task could be to add word prediction vocab reduction for src-src model to speed up training.

Reviewed By: xianxl

Differential Revision: D10512576

fbshipit-source-id: 545d96cad8e814b9da7be102a48cc5cac358b758


Fix tests + style nits + Python 3.5 compat · 5bbd148e

Myle Ott authored Nov 01, 2018

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/336

Differential Revision: D12876709

Pulled By: myleott

fbshipit-source-id: a31536e2eb93f752600b9940c28e9b9fcefc8b86


Update bleu.py (#320) · f3a0939e

Zihao Fu authored Oct 31, 2018

Summary:
Modify the error message of bleu.
Fixes the issue: https://github.com/pytorch/fairseq/issues/284
Pull Request resolved: https://github.com/pytorch/fairseq/pull/320

Differential Revision: D12876721

Pulled By: myleott

fbshipit-source-id: df25885a94a584cbf4b86a1665e3e513c7eb8e9a


match examples/stories/writingPrompts scripts to correct folder · f41088a5

John Pope authored Oct 31, 2018

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/290

Differential Revision: D12876759

Pulled By: myleott

fbshipit-source-id: 9f6d1c9de27dad29368a7edb923dfcf770355938


30 Oct, 2018 1 commit

transformer onnx trace: skip no-op transpose (#333) · 672977c1

James Cross authored Oct 29, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/333

A tiny hack to speed up inference slightly for transformer beam search after export to graph mode. Specifically, there is no need to transpose a dimension with size 1 (the sequence length of a single decoder time step during beam search) with its neighbor immediately before a view/reshape.
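
Why the transpose is a no-op can be checked directly. The sketch below assumes a tensor laid out as (seq_len, bsz * num_heads, head_dim) with seq_len equal to 1; the sizes are hypothetical and the layout in the actual code may differ:

```python
import torch

bsz_times_heads, head_dim = 6, 4  # hypothetical sizes

# One decoder time step: the leading (sequence length) dimension has size 1.
x = torch.arange(bsz_times_heads * head_dim, dtype=torch.float32)
x = x.view(1, bsz_times_heads, head_dim)

# Swapping a size-1 dimension with its neighbor does not move any element
# in memory, so a plain view gives the same result as transpose + view.
with_transpose = x.transpose(0, 1).contiguous().view(bsz_times_heads, 1, head_dim)
without_transpose = x.view(bsz_times_heads, 1, head_dim)
same = torch.equal(with_transpose, without_transpose)
```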

Reviewed By: jmp84

Differential Revision: D12833011

fbshipit-source-id: f9c344a9ad595e6e48a8a65b31cf2b1392f9b938


27 Oct, 2018 1 commit

Extend WordShuffle noising function to apply to non-bpe tokens · 90c01b3a

Xian Li authored Oct 26, 2018

Summary:
We'd like to reuse the noising functions and DenoisingDataset in
adversarial training. However, the current noising functions assume the inputs are
subword tokens. The goal of this diff is to extend them so the noising can be
applied to word tokens. Since we're mostly interested in the word shuffle
noising, I only modified the WordShuffle class.

Reviewed By: liezl200

Differential Revision: D10523177

fbshipit-source-id: 1e5d27362850675010e73cd38850c890d42652ab


26 Oct, 2018 1 commit

Fix print & add more informative logging · 6117f827

Wei Ho authored Oct 26, 2018

Summary: Fix fairseq's `force` option for disabling print suppression (otherwise, `print(..., force=True)` fails on master since the force kwarg gets passed to the builtin print).
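
A minimal sketch of the pattern, assuming print suppression is implemented by wrapping the builtin. `make_print` is a hypothetical helper, not the fairseq function. The key point is that `force` is popped from the keyword arguments before they reach the builtin `print`, on master as well as on workers:

```python
import builtins
import io


def make_print(is_master, builtin_print=builtins.print):
    """Return a print function that is silent on non-master workers unless forced."""

    def wrapped_print(*args, **kwargs):
        # Always remove 'force'; the builtin print would reject it as an
        # unexpected keyword argument.
        force = kwargs.pop("force", False)
        if is_master or force:
            builtin_print(*args, **kwargs)

    return wrapped_print


buffer = io.StringIO()
worker_print = make_print(is_master=False)
worker_print("suppressed", file=buffer)
worker_print("shown", force=True, file=buffer)
master_print = make_print(is_master=True)
master_print("master", force=True, file=buffer)
```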

Reviewed By: dpacgopinath

Differential Revision: D10522058

fbshipit-source-id: bbc10c021a7d21396ebfbb1bf007f6b9b162f4fd


25 Oct, 2018 2 commits

LanguagePairDataset and BacktranslationDataset changes for semi supervised task setup (#330) · 0d63cf03

Deepak Gopinath authored Oct 25, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/330

As part of the semi-supervised task setup (https://github.com/pytorch/translate/pull/243), this diff adds the ability for LanguagePairDataset to remove EOS from source or append EOS to target. This functionality is required by BacktranslationDataset to use translations as source data.

Also added changes to BacktranslationDataset to make it work on GPU. We needed to transfer back-translated sentences back to CPU for the LanguagePairDataset to collate.
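
The EOS handling can be sketched on plain token-id lists. The function `adjust_eos`, its flag names, and the EOS id of 2 are assumptions for illustration and need not match the LanguagePairDataset arguments:

```python
EOS = 2  # hypothetical end-of-sentence token id


def adjust_eos(src, tgt, eos=EOS, remove_eos_from_source=False,
               append_eos_to_target=False):
    """Optionally strip a trailing EOS from src and add one to tgt."""
    if remove_eos_from_source and src and src[-1] == eos:
        src = src[:-1]
    if append_eos_to_target and (not tgt or tgt[-1] != eos):
        tgt = tgt + [eos]
    return src, tgt


new_src, new_tgt = adjust_eos(
    [5, 6, EOS], [7, 8],
    remove_eos_from_source=True,
    append_eos_to_target=True,
)
```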

Reviewed By: liezl200

Differential Revision: D10846294

fbshipit-source-id: b015ecb5fcef26fba507c30f8a4992bdbc54899f


make fairseq models compatible with character inputs and use character inputs for elmo in pytext · 4afa455e

Haoran Li authored Oct 25, 2018

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/321

Reviewed By: alexeib

Differential Revision: D10430186

fbshipit-source-id: 9cc8fe0f202cc49370cecf36312bcc9bf0b4deee


23 Oct, 2018 2 commits

Add size method to BacktranslationDataset + misc fixes (#325) · 613ffeea

Deepak Gopinath authored Oct 22, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/325

RoundRobinZipDataset requires a size(index) method to be implemented in every dataset used. Also added missing return statements in a few methods.

Reviewed By: liezl200

Differential Revision: D10457159

fbshipit-source-id: 01856eb455f2f3a21e7fb723129ff35fbe29e0ae


Expose BacktranslationDataset from fairseq.data (#324) · 1aae5f6a

Deepak Gopinath authored Oct 22, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/324

BacktranslationDataset was introduced recently but was not exposed as part of the fairseq.data module

Reviewed By: liezl200

Differential Revision: D10412717

fbshipit-source-id: 8a9d4ecd43fd376e895c450d00e765a869c95eff


22 Oct, 2018 1 commit

Fix another distributed syncing issue · 23e9dc2e

Halil Akin authored Oct 22, 2018

Summary:
This is another failure due to distributed GPUs getting out of sync.
We run save_and_eval (which has the inter-GPU communication calls) by
looking at the number of updates. But the number of updates means weight updates. Whenever
there is an issue in training and the weights can't be updated, nodes go
out of sync and start failing. So we should check the number of iterations instead.

I am, again, making a small change to save the day, but we should decouple/refactor
the save_and_eval logic from the training to have fewer headaches in the future.
I plan to work on that in the future, but this should solve some of the
issues for now.

Reviewed By: jhcross

Differential Revision: D10478427

fbshipit-source-id: b9deacfea252b2fb66b81c799fa78e2439fa514c


21 Oct, 2018 1 commit

Manually port pull request 385 · 8441cbf3

Peng-Jen Chen authored Oct 20, 2018

Summary:
Manually port fairinternal fairseq-py pull request #385 [1] to fbcode.

Resolve the merge conflict of removing fp16_trainer per offline discussion with Myle. Also updated the code to make generate.py work.

[1] https://github.com/fairinternal/fairseq-py/pull/385/commits/18fa6e154781cf0c4b1596429dba7e753a545069

Reviewed By: liezl200

Differential Revision: D10052908

fbshipit-source-id: c3c378d78dc1e9ac087c815f359e78c0048ff2f5


19 Oct, 2018 1 commit

Update upgrade_state_dict in transformer.py to upgrade_state_dict_named (#317) · 0a628401

Peng-Jen Chen authored Oct 19, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/317

When upgrading the `state_dict` variable, the `upgrade_state_dict` function in TransformerEncoder/TransformerDecoder doesn't handle multiple encoders/decoders; however, that will be the case in D10052908.

Before this change, we hit error message [1] when loading a checkpoint for the multilingual_transformer model in D10052908. This diff fixes it.
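
Why a name prefix helps can be sketched with a plain dict. The upgrade rule below (renaming `old_weight` to `weight`) and the key prefixes are purely hypothetical; only the idea of scoping the upgrade to a named submodule reflects the commit:

```python
def upgrade_state_dict_named(state_dict, name):
    """Apply a (hypothetical) key rename only under the given module prefix."""
    old_key = f"{name}.old_weight"
    if old_key in state_dict:
        state_dict[f"{name}.weight"] = state_dict.pop(old_key)
    return state_dict


# Two encoders living under different prefixes in one checkpoint.
state = {
    "models.en-de.encoder.old_weight": 1,
    "models.en-fr.encoder.old_weight": 2,
}
for prefix in ("models.en-de.encoder", "models.en-fr.encoder"):
    upgrade_state_dict_named(state, prefix)
```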

Reviewed By: myleott, liezl200

Differential Revision: D10375418

fbshipit-source-id: 7104c1a463e78f3fa33d8479a37c51608be50610


17 Oct, 2018 1 commit

fix make_positions() typo (#316) · 0eea6923

James Cross authored Oct 17, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/316

This code should actually be keeping the padded positions as `padding_idx` (though note that this is on the ONNX export path, and it has no effect in the most common case when using the exported network to do un-batched inference).
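
The intended behavior can be sketched on a single sequence of token ids, assuming positions are numbered from `padding_idx + 1` and padding keeps `padding_idx`. This pure-Python version is an illustration only, not the tensor code on the ONNX path:

```python
def make_positions(tokens, padding_idx):
    """Number non-padding tokens from padding_idx + 1; pads keep padding_idx."""
    positions = []
    next_position = padding_idx + 1
    for token in tokens:
        if token == padding_idx:
            positions.append(padding_idx)
        else:
            positions.append(next_position)
            next_position += 1
    return positions


right_padded = make_positions([7, 8, 9, 1, 1], padding_idx=1)
left_padded = make_positions([1, 1, 7, 8], padding_idx=1)
```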

Reviewed By: myleott

Differential Revision: D10431872

fbshipit-source-id: 79fe4ac27cafcd4701e0f2a90e29d1b7362dc6f8


06 Oct, 2018 2 commits

Add denoising dataset for denoising autoencoder (#306) · e286243c

Liezl Puzon authored Oct 05, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/306

This uses a source dataset to generate a batch of {source: noisy source, target: original clean source} which allows us to train a denoising autoencoding component as part of a seq2seq model.
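
A small sketch of the sample structure, assuming the dataset wraps a list of token-id sequences and a noising callable. `TinyDenoisingDataset` and `swap_adjacent` are hypothetical, and the deterministic swap stands in for real noising:

```python
def swap_adjacent(tokens):
    """Toy deterministic noise: swap each pair of neighboring tokens."""
    noisy = list(tokens)
    for i in range(0, len(noisy) - 1, 2):
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


class TinyDenoisingDataset:
    """Hypothetical stand-in: source is the noised input, target is the clean input."""

    def __init__(self, src_dataset, noise_fn):
        self.src_dataset = src_dataset
        self.noise_fn = noise_fn

    def __len__(self):
        return len(self.src_dataset)

    def __getitem__(self, index):
        clean = self.src_dataset[index]
        return {"source": self.noise_fn(clean), "target": clean}


denoising = TinyDenoisingDataset([[1, 2, 3, 4, 5]], swap_adjacent)
sample = denoising[0]
```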

Reviewed By: xianxl

Differential Revision: D10078981

fbshipit-source-id: 026225984d4a97062ac05dc3a36e79b5c841fe9c


Have noising account for sentences with and without EOS (#305) · 8798a240

Liezl Puzon authored Oct 05, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/305

Previously, noising code assumed that every sentence had an EOS which had to be excluded from noising operations (since we shouldn't drop, blank, or shuffle EOS). This logic allows the noising module to handle sentences with EOS and without EOS
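
One way to express this, assuming sentences are lists of token ids and the noising operation is any callable on the non-EOS tokens. The helper name `noise_keeping_eos` and the EOS id of 2 are hypothetical:

```python
def noise_keeping_eos(tokens, eos, noise_fn):
    """Apply noise_fn to the sentence body, leaving a trailing EOS in place."""
    has_eos = bool(tokens) and tokens[-1] == eos
    body = tokens[:-1] if has_eos else tokens
    noised = noise_fn(list(body))
    return noised + [eos] if has_eos else noised


def reverse(tokens):
    """Toy deterministic stand-in for a shuffle."""
    return tokens[::-1]


with_eos = noise_keeping_eos([4, 5, 6, 2], eos=2, noise_fn=reverse)
without_eos = noise_keeping_eos([4, 5, 6], eos=2, noise_fn=reverse)
```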

Reviewed By: xianxl

Differential Revision: D10114425

fbshipit-source-id: 04ec8547343eb94266bda1ac7fca3d8a1991c9f4


05 Oct, 2018 1 commit

multihead_attention: pre-transpose incremental state (#232) · 265f42b7

James Cross authored Oct 05, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/232

Though transpose operations are essentially free during PyTorch execution, they can result in costly operations when exported to Caffe2 inference nets via ONNX tracing, especially when applied repeatedly to large tensors.

For this reason, we update `MultiheadAttention` to store its incremental state with shape (bsz, num_heads, seq_len, head_dim), that is after transposing the projected input. This should result in non-trivially faster exported models without changing the semantics or speed of PyTorch execution.
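
A minimal sketch of a cache kept in the pre-transposed layout, assuming each decoding step produces keys of shape (bsz, num_heads, 1, head_dim). The helper `append_step` and the sizes are hypothetical, not the `MultiheadAttention` internals:

```python
import torch


def append_step(cache, step):
    """Append one time step along the seq_len axis (dim=2) of the cache."""
    return step if cache is None else torch.cat([cache, step], dim=2)


bsz, num_heads, head_dim = 2, 4, 8  # hypothetical sizes
cache = None
for _ in range(3):
    # The new keys arrive already transposed, so no transpose of the
    # growing cache is needed at any step.
    k_step = torch.randn(bsz, num_heads, 1, head_dim)
    cache = append_step(cache, k_step)
```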

Reviewed By: myleott

Differential Revision: D10186506

fbshipit-source-id: 8a42712423ee767ea49ed88d2a4653f900d14fba
