1. 14 Jan, 2019 1 commit
    • Fixes (#442) · d9284ee7
      Huihui Fan authored
      Summary:
      Minor fixes:
      1. Add the fairseq logo.
      2. Fix encoder padding for the fconv self-attention model.
      3. Update the legacy DDP (distributed data parallel) code.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/442
      
      Differential Revision: D13651715
      
      Pulled By: myleott
      
      fbshipit-source-id: ac93c80f1dbffdfe03fbd4b8a8ea527aecb576a7
  2. 10 Jan, 2019 1 commit
  3. 09 Jan, 2019 2 commits
  4. 07 Jan, 2019 1 commit
  5. 05 Jan, 2019 3 commits
  6. 28 Dec, 2018 3 commits
  7. 26 Dec, 2018 2 commits
  8. 24 Dec, 2018 2 commits
    • Improve memory efficiency of FP16 optimization (#404) · 03a57dec
      Myle Ott authored
      Summary:
      Previously, when training with --fp16, we stored a copy of the model parameters in FP32 for optimization, which consumed a lot of memory. An alternative is to do the conversions to FP32 on the fly, which lets the caching allocator reuse and save memory (a sketch of the idea follows this entry).
      
      This reduces peak memory usage by ~20% with a negligible reduction in training speed (~2% slower) when training a big transformer on 8 GPUs on wmt en-de with --update-freq=16.
      
      This does not affect convergence, i.e., models will train exactly as they did before.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/404
      
      Differential Revision: D13394376
      
      Pulled By: myleott
      
      fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf
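      A minimal sketch of the on-the-fly conversion idea in plain PyTorch (the SGD update and function name are illustrative assumptions, not fairseq's actual optimizer):

      ```python
      import torch

      def fp16_sgd_step(params, lr, loss_scale):
          """One SGD update over FP16 params, upcasting to FP32 on the fly.

          Rather than keeping a persistent FP32 master copy of every
          parameter, each parameter is upcast only for the duration of
          its update, so the FP32 temporaries can be reclaimed by
          PyTorch's caching allocator.
          """
          for p in params:
              if p.grad is None:
                  continue
              p32 = p.data.float()                        # short-lived FP32 copy
              g32 = p.grad.data.float().div_(loss_scale)  # unscale the gradient
              p32.add_(g32, alpha=-lr)                    # FP32 parameter update
              p.data.copy_(p32)                           # write back as FP16
              # p32 and g32 go out of scope each iteration, so their
              # memory can be reused for the next parameter instead of
              # living for the whole run.
      ```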
    • Add BufferedIterator (#419) · 0f833526
      Myle Ott authored
      Summary:
      This improves performance for datasets that load data lazily. It is enabled by default, since it shouldn't compromise performance for non-lazy datasets (a sketch follows this entry).
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/419
      
      Differential Revision: D13546585
      
      Pulled By: myleott
      
      fbshipit-source-id: f6152e2047291b0d68cd7506cd772b0caafe95be
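      A sketch of what such an iterator can look like (this class is an assumption about the approach, not fairseq's implementation): a background thread prefetches items from the wrapped iterable into a bounded queue, so lazy I/O overlaps with training.

      ```python
      import queue
      import threading

      class BufferedIterator:
          """Prefetch items from `iterable` in a background thread."""

          _SENTINEL = object()

          def __init__(self, iterable, buffer_size=8):
              self._queue = queue.Queue(maxsize=buffer_size)
              self._thread = threading.Thread(
                  target=self._fill, args=(iterable,), daemon=True
              )
              self._thread.start()

          def _fill(self, iterable):
              # put() blocks when the buffer is full; error handling in
              # the worker thread is omitted for brevity.
              for item in iterable:
                  self._queue.put(item)
              self._queue.put(self._SENTINEL)

          def __iter__(self):
              return self

          def __next__(self):
              item = self._queue.get()
              if item is self._SENTINEL:
                  raise StopIteration
              return item

      # Usage: for batch in BufferedIterator(lazy_batches): train_step(batch)
      ```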
  9. 18 Dec, 2018 1 commit
    • data per gpu change · 9ca82a0e
      Haoran Li authored
      Summary: Avoid loading the entire dataset on each GPU, to reduce the memory footprint (a sketch of the sharding idea follows this entry).
      
      Reviewed By: rutyrinott
      
      Differential Revision: D13163548
      
      fbshipit-source-id: 4ba717c8021ba5723d02225bae5782e2c3a18640
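      A hypothetical sketch of the idea: each rank materializes only its own slice of a line-based dataset (the round-robin assignment by rank and world size is an assumption about the strategy):

      ```python
      def load_shard(path, rank, world_size):
          """Load only this rank's slice of a line-based dataset.

          Every process scans the file but keeps just one line in
          `world_size`, so per-GPU memory shrinks with the number of
          workers instead of each worker holding the full dataset.
          """
          shard = []
          with open(path, encoding="utf-8") as f:
              for i, line in enumerate(f):
                  if i % world_size == rank:
                      shard.append(line.rstrip("\n"))
          return shard
      ```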
  10. 11 Dec, 2018 1 commit
  11. 08 Dec, 2018 1 commit
  12. 07 Dec, 2018 2 commits
    • Add --fp16-scale-tolerance (#397) · 03ef3ab8
      Myle Ott authored
      Summary:
      Let's only decrease the loss scale if a large enough percentage of batches overflow (a sketch follows this entry).
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/397
      
      Differential Revision: D13355159
      
      Pulled By: myleott
      
      fbshipit-source-id: e17dde73d34a639519b4348c013fdd19d2b314e6
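      A sketch of a dynamic loss scaler with such a tolerance (the attribute names mirror the flag but are assumptions):

      ```python
      class DynamicLossScaler:
          """Loss scaler that only backs off when overflows are frequent."""

          def __init__(self, init_scale=2.0 ** 15, scale_factor=2.0,
                       scale_window=2000, tolerance=0.05):
              self.loss_scale = init_scale
              self.scale_factor = scale_factor
              self.scale_window = scale_window  # steps between scale increases
              self.tolerance = tolerance        # max tolerated overflow fraction
              self._iter = 0
              self._last_overflow_iter = -1
              self._last_rescale_iter = -1
              self._overflows_since_rescale = 0

          def update_scale(self, overflow):
              iters_since_rescale = self._iter - self._last_rescale_iter
              if overflow:
                  self._last_overflow_iter = self._iter
                  self._overflows_since_rescale += 1
                  pct = self._overflows_since_rescale / float(iters_since_rescale)
                  # Only back off if enough recent batches overflowed; an
                  # isolated overflow no longer halves the scale.
                  if pct >= self.tolerance:
                      self.loss_scale /= self.scale_factor
                      self._last_rescale_iter = self._iter
                      self._overflows_since_rescale = 0
              elif (self._iter - self._last_overflow_iter) % self.scale_window == 0:
                  # A long overflow-free stretch: try a larger scale.
                  self.loss_scale *= self.scale_factor
                  self._last_rescale_iter = self._iter
              self._iter += 1
      ```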
    • Take a dummy train step under OOM to keep multiprocessing in sync · 6c006a34
      Halil Akin authored
      Summary: This is not a guaranteed solution (processes may still get out of sync if an OOM happens after an all_gather/all_reduce has already run), but it should make multiprocessing training more robust in practice, since OOMs usually happen early enough in the step (a sketch follows this entry).
      
      Reviewed By: myleott
      
      Differential Revision: D13086018
      
      fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
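      A simplified sketch of the mechanism (not the actual fairseq code): the worker that hits an OOM zeroes its state and still joins the gradient all_reduce, so every process executes the same number of collective calls.

      ```python
      import torch
      import torch.distributed as dist

      def train_step(model, batch, criterion):
          try:
              loss = criterion(model(batch["input"]), batch["target"])
              loss.backward()
          except RuntimeError as e:
              if "out of memory" not in str(e):
                  raise
              # OOM: free what we can and fall through with zero
              # gradients, i.e. take a dummy step for this batch.
              model.zero_grad()
              torch.cuda.empty_cache()
          for p in model.parameters():
              if p.grad is None:
                  # Parameters that never received a gradient (e.g. after
                  # an OOM before backward) still contribute zeros.
                  p.grad = torch.zeros_like(p)
              # Every process reaches this call, keeping ranks in sync.
              dist.all_reduce(p.grad.data)
      ```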
  13. 06 Dec, 2018 4 commits
  14. 04 Dec, 2018 1 commit
  15. 30 Nov, 2018 1 commit
  16. 29 Nov, 2018 2 commits
  17. 27 Nov, 2018 2 commits
  18. 26 Nov, 2018 2 commits
  19. 19 Nov, 2018 1 commit
    • Protect against failures in case of OOMs · a442244d
      Halil Akin authored
      Summary: Fix some distributed training failures that happen when OOMs are observed.
      
      Reviewed By: myleott
      
      Differential Revision: D13121054
      
      fbshipit-source-id: f71a0a695332acbaa1797e89887b8b7c7ddaa727
  20. 18 Nov, 2018 2 commits
  21. 17 Nov, 2018 1 commit
  22. 16 Nov, 2018 1 commit
    • make dictionary optional · a4e34985
      Haoran Li authored
      Reviewed By: jingfeidu
      
      Differential Revision: D13104360
      
      fbshipit-source-id: 9636f5ee2721818f98b33af559fa24292534a72f
  23. 14 Nov, 2018 1 commit
  24. 13 Nov, 2018 1 commit
  25. 10 Nov, 2018 1 commit
    • pipeline for LM training · 880e7cd4
      Ruty Rinott authored
      Summary:
      Step 2 of the pipeline for LM training: assume tokenized text data as input, split it into train/validation/test sets, and run binarization (step a_ii in https://fb.quip.com/kazzAxvZHBj9). A sketch of the split step follows this entry.
      
      Reviewed By: borguz
      
      Differential Revision: D10454705
      
      fbshipit-source-id: 74e8679041f5507c4e404c1b719547c2ae9ed983
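      A hypothetical sketch of the split step (the ratios, file naming, and random line-level split are assumptions; binarization is left to the existing preprocessing tools):

      ```python
      import random

      def split_corpus(path, valid_frac=0.01, test_frac=0.01, seed=0):
          """Partition a tokenized text file into train/valid/test files."""
          rng = random.Random(seed)  # fixed seed for a reproducible split
          outs = {name: open("%s.%s" % (path, name), "w")
                  for name in ("train", "valid", "test")}
          try:
              with open(path) as f:
                  for line in f:
                      r = rng.random()
                      if r < test_frac:
                          outs["test"].write(line)
                      elif r < test_frac + valid_frac:
                          outs["valid"].write(line)
                      else:
                          outs["train"].write(line)
          finally:
              for fh in outs.values():
                  fh.close()
      ```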