- 09 Jan, 2019 1 commit
-
-
Art Matsak authored
Summary: https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset is no longer valid; it redirects to a blog post listing page. Pull Request resolved: https://github.com/pytorch/fairseq/pull/436 Differential Revision: D13607961 Pulled By: myleott fbshipit-source-id: 1a1074ffcbc454e29bc9d5aed84fdf2089a224bc
-
- 07 Jan, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/433 Differential Revision: D13588032 Pulled By: myleott fbshipit-source-id: 0e5ff361e27b206c4490264f0f51863367499e81
-
- 05 Jan, 2019 3 commits
-
-
Myle Ott authored
-
Myle Ott authored
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/283 Pull Request resolved: https://github.com/pytorch/fairseq/pull/428 Differential Revision: D13564190 Pulled By: myleott fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
-
- 28 Dec, 2018 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/425 Differential Revision: D13558340 Pulled By: myleott fbshipit-source-id: dff8c77027e821d8c80bfbd6a6ccce9ca1a44b78
-
Myle Ott authored
Summary: This was broken in 03a57dec. Pull Request resolved: https://github.com/pytorch/fairseq/pull/424 Differential Revision: D13557540 Pulled By: myleott fbshipit-source-id: 62deda5353032aff20d35d046b0bb843da44d27c
-
Paul Michel authored
Summary: BacktranslationDataset would throw an error when the underlying dataset was an IndexedCachedDataset because prefetching was not handled correctly. This fixes the error. Pull Request resolved: https://github.com/pytorch/fairseq/pull/410 Differential Revision: D13557539 Pulled By: myleott fbshipit-source-id: 398ab59a3ebdbf1c666d862b9f905654eece800c
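A minimal sketch of the prefetch-forwarding pattern such a fix relies on, following the supports_prefetch/prefetch convention used by fairseq datasets; the class and attribute names here are illustrative, not the actual patch:

```python
class WrappedDataset:
    """Illustrative wrapper that forwards prefetch calls to an underlying
    dataset (e.g., an IndexedCachedDataset that caches items ahead of time)."""

    def __init__(self, base_dataset):
        self.base = base_dataset

    @property
    def supports_prefetch(self):
        # Only advertise prefetch support if the wrapped dataset has it.
        return getattr(self.base, "supports_prefetch", False)

    def prefetch(self, indices):
        # Delegate so the underlying cache is populated before __getitem__.
        self.base.prefetch(indices)
```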
-
- 26 Dec, 2018 2 commits
-
-
Myle Ott authored
Summary: - 04cc608: Add `--match-source-len` option to generate.py for sequence-tagging tasks - 19f1a40: Add `--no-repeat-ngram-size` option to generate.py for ngram blocking Pull Request resolved: https://github.com/pytorch/fairseq/pull/422 Differential Revision: D13548445 Pulled By: myleott fbshipit-source-id: 26d1ae83993e428fcb020dac5ae358b0e36233d9
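For context, a minimal sketch of the ngram-blocking idea behind `--no-repeat-ngram-size`; the function name and exact banning rule are illustrative, not the generate.py implementation:

```python
def blocked_tokens(prev_tokens, ngram_size):
    """Return tokens that would complete an n-gram already present in
    prev_tokens (a sketch of --no-repeat-ngram-size style blocking)."""
    if ngram_size <= 0 or len(prev_tokens) < ngram_size:
        return set()
    # Collect every n-gram generated so far, keyed by its (n-1)-token prefix.
    seen = {}
    for i in range(len(prev_tokens) - ngram_size + 1):
        prefix = tuple(prev_tokens[i:i + ngram_size - 1])
        seen.setdefault(prefix, set()).add(prev_tokens[i + ngram_size - 1])
    # Tokens following the current (n-1)-token suffix are banned.
    suffix = tuple(prev_tokens[len(prev_tokens) - ngram_size + 1:])
    return seen.get(suffix, set())

# e.g., with ngram_size=2, after [1, 2, 1] the bigram (1, 2) already
# occurred, so token 2 is blocked:
assert blocked_tokens([1, 2, 1], 2) == {2}
```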
-
Emanuele Bugliarello authored
Summary: Add argument `--no-token-positional-embeddings` to TransformerModel (currently only available in TransformerLanguageModel) to disable positional embeddings. Pull Request resolved: https://github.com/pytorch/fairseq/pull/421 Differential Revision: D13548450 Pulled By: myleott fbshipit-source-id: b352c702ed1609e3b84d9a8404941d3274a7f883
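A minimal sketch of what gating positional embeddings behind such a flag looks like; `TinyEmbedder` and its arguments are hypothetical, not the fairseq TransformerModel code:

```python
import torch
import torch.nn as nn

class TinyEmbedder(nn.Module):
    """Token embeddings with optional learned positional embeddings,
    gated by a flag analogous to --no-token-positional-embeddings."""

    def __init__(self, vocab, dim, max_pos=512, use_positional=True):
        super().__init__()
        self.tokens = nn.Embedding(vocab, dim)
        self.positions = nn.Embedding(max_pos, dim) if use_positional else None

    def forward(self, x):  # x: (batch, seq_len) of token ids
        out = self.tokens(x)
        if self.positions is not None:
            pos = torch.arange(x.size(1), device=x.device)
            out = out + self.positions(pos)  # broadcasts over the batch dim
        return out
```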
-
- 24 Dec, 2018 2 commits
-
-
Myle Ott authored
Summary: Previously when training with --fp16, we stored a copy of the model parameters in FP32 for optimization, which consumed a lot of memory. An alternative is to just do the conversions to FP32 on the fly, which allows the caching allocator to reuse/save some memory. This reduces peak memory usage by ~20% with a negligible reduction in training speed (~2% slower) when training a big transformer on 8 GPUs on wmt en-de with --update-freq=16. This does not affect convergence, i.e., models will train exactly as they did before. Pull Request resolved: https://github.com/pytorch/fairseq/pull/404 Differential Revision: D13394376 Pulled By: myleott fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf
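A deliberately simplified sketch of the upcast-update-downcast pattern described above, shown here for plain SGD; fairseq's actual FP16 optimizer is more involved:

```python
def sgd_step_fp16(params, lr=0.1):
    """Rather than keeping a persistent FP32 master copy of every FP16
    parameter, upcast, update, and downcast within the step so the caching
    allocator can reuse the temporary buffers. Illustrative only."""
    for p in params:
        if p.grad is None:
            continue
        fp32_param = p.data.float()      # temporary FP32 copy of the weights
        fp32_grad = p.grad.data.float()  # temporary FP32 copy of the grads
        fp32_param.add_(fp32_grad, alpha=-lr)
        p.data.copy_(fp32_param)         # write the update back as FP16
```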
-
Myle Ott authored
Summary: This improves performance for datasets that load data lazily. Enabled by default since it shouldn't compromise performance for non-lazy datasets. Pull Request resolved: https://github.com/pytorch/fairseq/pull/419 Differential Revision: D13546585 Pulled By: myleott fbshipit-source-id: f6152e2047291b0d68cd7506cd772b0caafe95be
-
- 18 Dec, 2018 1 commit
-
-
Haoran Li authored
Summary: Avoid loading the entire dataset per GPU, to reduce the memory footprint. Reviewed By: rutyrinott Differential Revision: D13163548 fbshipit-source-id: 4ba717c8021ba5723d02225bae5782e2c3a18640
-
- 11 Dec, 2018 1 commit
-
-
Suvrat Bhooshan authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/406 Static helper function in TranslationTask to load pretrained models Reviewed By: myleott Differential Revision: D13345276 fbshipit-source-id: 3a675ee1a144ceb8b010f30e1a6163ef670b53f3
-
- 08 Dec, 2018 1 commit
-
-
Peng Li authored
Summary: The original code reported the size of a valid sample instead of the invalid one when raising an exception, which was confusing. Pull Request resolved: https://github.com/pytorch/fairseq/pull/403 Differential Revision: D13391431 Pulled By: myleott fbshipit-source-id: 4642ed027c0f664424fc5a9baf4363791144feaf
-
- 07 Dec, 2018 2 commits
-
-
Myle Ott authored
Summary: Let's only decrease the loss scale if a large enough percentage of batches overflow. Pull Request resolved: https://github.com/pytorch/fairseq/pull/397 Differential Revision: D13355159 Pulled By: myleott fbshipit-source-id: e17dde73d34a639519b4348c013fdd19d2b314e6
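A minimal sketch of a loss scaler with an overflow tolerance, assuming a sliding window over recent batches; all names and defaults here are illustrative:

```python
class LossScaler:
    """Dynamic loss scaler that lowers the scale only when the fraction of
    recent batches that overflowed exceeds a tolerance, rather than on
    every single overflow."""

    def __init__(self, scale=2.0 ** 15, tolerance=0.05, window=100):
        self.scale = scale
        self.tolerance = tolerance
        self.window = window
        self.overflows = []  # 1 if a batch overflowed, else 0

    def update(self, overflowed):
        self.overflows.append(1 if overflowed else 0)
        self.overflows = self.overflows[-self.window:]
        if sum(self.overflows) / len(self.overflows) > self.tolerance:
            self.scale /= 2.0  # back off only on sustained overflow
            self.overflows.clear()
```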
-
Halil Akin authored
Summary: This is not a guaranteed solution (processes may still get out of sync if an OOM happens after an all_gather/all_reduce has already been done), but it should still make multiprocessing training more robust in practice, since we usually seem to OOM early enough. Reviewed By: myleott Differential Revision: D13086018 fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
-
- 06 Dec, 2018 4 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/400 Differential Revision: D13366996 Pulled By: myleott fbshipit-source-id: b4907815e7cc1b4a2aceab11210bf64cb3d814c9
-
Myle Ott authored
Summary: Not switching to Black formatting just yet, but adding fmt: off directives in case we decide to later. Pull Request resolved: https://github.com/pytorch/fairseq/pull/399 Differential Revision: D13364674 Pulled By: myleott fbshipit-source-id: a20a11a18be3d583ee30eff770278fb4bd05b93c
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/398 Differential Revision: D13358876 Pulled By: myleott fbshipit-source-id: 57673f2643aac01492cb8f5728bb9f1a34ba6aa7
-
Teng Li authored
Summary: As the title says, it is better to enable this for certain use cases, to make sure things are right. Reviewed By: myleott, pietern Differential Revision: D13351753 fbshipit-source-id: cf495960fda71ebd679c23212e19703c93a9dbdc
-
- 04 Dec, 2018 1 commit
-
-
Myle Ott authored
Summary: This kind of issue should be rare, but the exception that was thrown before ("UnpicklingError: invalid load key") was very opaque, so let's use something a bit clearer. Pull Request resolved: https://github.com/pytorch/fairseq/pull/396 Differential Revision: D13325600 Pulled By: myleott fbshipit-source-id: 2e7093752d45d6b04a3d506aca8d5694b72ab638
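A sketch of the wrap-and-rethrow pattern, assuming the corrupt file surfaces as a pickle.UnpicklingError from torch.load; the message text is illustrative:

```python
import pickle

import torch

def load_checkpoint(path):
    """Wrap torch.load so a truncated or corrupt checkpoint yields a clear
    message instead of a bare "invalid load key" UnpicklingError."""
    try:
        return torch.load(path, map_location="cpu")
    except pickle.UnpicklingError as e:
        raise Exception(
            f"Cannot load {path}: the file appears to be corrupted "
            "or only partially written."
        ) from e
```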
-
- 30 Nov, 2018 1 commit
-
-
linkerr authored
Summary: Fixes a "….LongTensor but found type torch.cuda.FloatTensor for argument #3 'index'" RuntimeError on torch.__version__ == 0.4.0, where new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1) returns a float dtype tensor, so line 321 of fairseq/fairseq/models/fconv.py throws when executed. Pull Request resolved: https://github.com/pytorch/fairseq/pull/393 Differential Revision: D13276496 Pulled By: myleott fbshipit-source-id: e7986246fbe2c79fff61bcab0e5bec9dd63e0afd
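For illustration, one way to avoid the dtype problem is to request an integer dtype explicitly (a sketch; the actual fix in the PR may differ):

```python
import torch

# On torch 0.4.0, torch.arange(bsz) could yield a FloatTensor, which then
# fails when used as an index (e.g., in index_select). Asking for an
# integer dtype up front avoids the RuntimeError:
bsz, beam_size = 4, 5
new_order = (
    torch.arange(bsz, dtype=torch.long).view(-1, 1).repeat(1, beam_size).view(-1)
)
assert new_order.dtype == torch.long
```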
-
- 29 Nov, 2018 2 commits
-
-
Haoran Li authored
Summary: replace dynamic index put with copying and creating a new tensor Reviewed By: wanchaol Differential Revision: D13244573 fbshipit-source-id: 909f7913ad579ed035f29bb52321ff01e09a2c60
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/388 Reviewed By: theweiho Differential Revision: D13244869 fbshipit-source-id: d22c18f63f9a691ccc7245e06bc9a5b776a192b5
-
- 27 Nov, 2018 2 commits
-
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/386 Pull Request resolved: https://github.com/pytorch/translate/pull/266 This allows decoder embedding sharing for denoising autoencoder modules with different decoders (one for src decoding and one for tgt decoding) Reviewed By: dpacgopinath Differential Revision: D13133015 fbshipit-source-id: 3c98be639d705744ccf5ba3a8fd7d10ddc7aef4a
-
Haoran Li authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/385 Pull Request resolved: https://github.com/facebookresearch/pytext/pull/6 Pull Request resolved: https://github.com/pytorch/pytorch/pull/14292 Reviewed By: jingfeidu Differential Revision: D10517864 fbshipit-source-id: 81008b5cc6aab70e23329c187392fb72ee057d78
-
- 26 Nov, 2018 2 commits
-
-
Myle Ott authored
Fix some recursive functions (e.g., reorder_incremental_state) to only touch each module once (#379) Summary: A module can otherwise be visited more than once if it is registered in more than one place in the network. Pull Request resolved: https://github.com/pytorch/fairseq/pull/379 Differential Revision: D13154498 Pulled By: myleott fbshipit-source-id: a35575d1956a46cd35ac8b16a719ad20ac3e380a
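A minimal sketch of the dedup idea: a recursive apply that tracks visited modules so shared submodules are only touched once (a hypothetical helper, not the fairseq code):

```python
def apply_once(module, fn, seen=None):
    """Call fn on each module exactly once, even when the same module
    object is registered under more than one parent (a plain recursive
    traversal would visit it repeatedly)."""
    if seen is None:
        seen = set()
    if id(module) in seen:
        return
    seen.add(id(module))
    fn(module)
    for child in module.children():
        apply_once(child, fn, seen)
```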
-
Myle Ott authored
Summary: - generalize AppendEosDataset -> TransformEosDataset - remove EOS logic from BacktranslationDataset (use TransformEosDataset instead) - BacktranslationDataset takes a backtranslation_fn instead of building the SequenceGenerator itself Pull Request resolved: https://github.com/pytorch/fairseq/pull/354 Reviewed By: liezl200 Differential Revision: D12970233 Pulled By: myleott fbshipit-source-id: d5c5b0e0a75eca1bd3a50382ac24621f35c32f36
-
- 19 Nov, 2018 1 commit
-
-
Halil Akin authored
Summary: Fixing some distributed failures that happen when OOMs are observed. Reviewed By: myleott Differential Revision: D13121054 fbshipit-source-id: f71a0a695332acbaa1797e89887b8b7c7ddaa727
-
- 18 Nov, 2018 2 commits
-
-
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/374 Differential Revision: D13116074 Pulled By: myleott fbshipit-source-id: 485724cc5a40e8360d21e4bf9c35821baa0ddc57
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/372 Differential Revision: D13114426 Pulled By: myleott fbshipit-source-id: 6c24b96a3556a0ecd3d1f350642a884254a40bd3
-
- 17 Nov, 2018 1 commit
-
-
Myle Ott authored
Summary: This should bring back the speedup with --update-freq that we reported in the Scaling Neural Machine Translation paper. Pull Request resolved: https://github.com/pytorch/fairseq/pull/370 Differential Revision: D13100281 Pulled By: myleott fbshipit-source-id: 4a81b51bb7390a197add314a4be5512bbf68c085
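A sketch of the underlying idea: gradient accumulation that synchronizes across workers only on the last micro-batch of each update. This version uses DDP's no_sync() context manager from recent PyTorch, which the 2018 fairseq code predates:

```python
import contextlib

def train_step(ddp_model, optimizer, criterion, micro_batches):
    """Accumulate gradients over micro_batches and let DDP all-reduce them
    only on the final backward, mirroring the --update-freq speedup.
    Assumes ddp_model is a torch.nn.parallel.DistributedDataParallel."""
    optimizer.zero_grad()
    for i, (x, y) in enumerate(micro_batches):
        last = i == len(micro_batches) - 1
        # Skip gradient synchronization on all but the last micro-batch.
        ctx = contextlib.nullcontext() if last else ddp_model.no_sync()
        with ctx:
            loss = criterion(ddp_model(x), y)
            loss.backward()
    optimizer.step()
```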
-
- 16 Nov, 2018 1 commit
-
-
Haoran Li authored
Reviewed By: jingfeidu Differential Revision: D13104360 fbshipit-source-id: 9636f5ee2721818f98b33af559fa24292534a72f
-
- 14 Nov, 2018 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/366 Differential Revision: D13058513 Pulled By: myleott fbshipit-source-id: a146d2cfb345d404775ed8d6b8e4a4ad4e7a33b4
-
- 13 Nov, 2018 1 commit
-
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/362 Pull Request resolved: https://github.com/pytorch/translate/pull/254 This actually uses the fairseq logic which supports BPE cont / end word marker suffixes. Reviewed By: xianxl Differential Revision: D12952766 fbshipit-source-id: 35a1bbc38240e4145bec0fc419f2d0a6a73ae2e5
-
- 10 Nov, 2018 1 commit
-
-
Ruty Rinott authored
Summary: Step 2 of the pipeline for LM training. It assumes tokenized text data as input, splits it into train/validation/test, and runs binarization (step a_ii in https://fb.quip.com/kazzAxvZHBj9). Reviewed By: borguz Differential Revision: D10454705 fbshipit-source-id: 74e8679041f5507c4e404c1b719547c2ae9ed983
-
- 08 Nov, 2018 1 commit
-
-
Peng-Jen Chen authored
Summary: D10052908 introduced the multilingual_translation task, but it raised an exception when training with multiple GPUs: P60202593. With Myle's help, we found that the cause was an improperly handled dummy batch data type, which made optimizer.backward() execute a different number of times across different GPUs. Reviewed By: xianxl Differential Revision: D12964263 fbshipit-source-id: 4991039030bf373f0c484e131acc4736487be4d8
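A minimal sketch of the lockstep idea (not the dtype fix itself): every rank runs backward the same number of times by substituting a zero-scaled dummy batch when it has no real data; the names here are illustrative:

```python
def step(model, criterion, batch, dummy_batch):
    """Keep ranks in lockstep by running backward on a zero-scaled dummy
    batch whenever a GPU has no real batch, so every worker performs the
    same number of gradient all-reduces."""
    x, y = batch if batch is not None else dummy_batch
    loss = criterion(model(x), y)
    if batch is None:
        loss = loss * 0.0  # contributes nothing but keeps collectives aligned
    loss.backward()
```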
-
- 07 Nov, 2018 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/352 Differential Revision: D12956930 Pulled By: myleott fbshipit-source-id: 39334a79544bac570feb04be9103269d7c1563f9
-
Liezl Puzon authored
Summary: There are two ways to implement BPE: 1. use a continuation marker suffix to indicate that there is at least one more subtoken left in the word; 2. use an end-of-word marker suffix to indicate that there are no more subtokens left in the word. This adds some logic to account for either kind of BPE marker suffix, along with a corresponding test. I also refactored the test setup to reduce the number of boolean args when setting up test data. Reviewed By: xianxl Differential Revision: D12919428 fbshipit-source-id: 405e9f346dce6e736c1305288721dfc7b63e872a
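A minimal sketch of word-boundary detection under both conventions; "@@" and "</w>" are common suffix choices and are used here only as examples:

```python
def is_word_end(token, style):
    """Decide whether a BPE subtoken ends a word, under either marker
    convention described above."""
    if style == "continuation":
        # A continuation suffix means more subtokens follow, so the word
        # ends exactly when the suffix is absent.
        return not token.endswith("@@")
    elif style == "end_of_word":
        # An end-of-word suffix directly marks the final subtoken.
        return token.endswith("</w>")
    raise ValueError(style)

assert is_word_end("hel@@", "continuation") is False
assert is_word_end("lo", "continuation") is True
assert is_word_end("lo</w>", "end_of_word") is True
```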
-