- 16 Jan, 2019 3 commits
Davide Caroselli authored
Summary: In a multi-GPU training scenario, the `train.py` script spawns new processes with `torch.multiprocessing.spawn`. Unfortunately, those child processes don't inherit the modules imported with `--user-dir`. This pull request fixes the problem: the custom module import is now explicit in every `main()` function. Pull Request resolved: https://github.com/pytorch/fairseq/pull/449 Differential Revision: D13676922 Pulled By: myleott fbshipit-source-id: 520358d66155697885b878a37e7d0484bddbc1c6
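The fix described above can be sketched as follows. This is a hedged illustration, not fairseq's actual code: `import_user_module`, the module name `user_module`, and `distributed_main` are hypothetical stand-ins for the real `--user-dir` handling.

```python
import importlib
import sys

def import_user_module(user_dir):
    # Make the user's directory importable and load the custom module.
    # (Hypothetical names; a simplified stand-in for the real flag handling.)
    if user_dir is not None:
        if user_dir not in sys.path:
            sys.path.insert(0, user_dir)
        importlib.import_module("user_module")

def distributed_main(rank, args):
    # Entry point passed to torch.multiprocessing.spawn. Spawned children
    # start from a fresh interpreter, so the import must be repeated here
    # explicitly rather than relying on the parent's imports.
    import_user_module(args.user_dir)
    # ... training loop would follow ...
```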
Myle Ott authored
Summary: This is useful for averaging the last N checkpoints, ending at some "best" checkpoint. Pull Request resolved: https://github.com/pytorch/fairseq/pull/452 Differential Revision: D13695407 Pulled By: myleott fbshipit-source-id: 5d9d2bff3706834f01501e9259834c77fb335817
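The core of checkpoint averaging can be sketched in a few lines. This is a simplified illustration of what `scripts/average_checkpoints.py` does: real checkpoints hold torch tensors, while here parameters are plain lists of floats.

```python
def average_checkpoints(state_dicts):
    # Average each parameter element-wise across N checkpoints.
    n = len(state_dicts)
    averaged = {}
    for key in state_dicts[0]:
        columns = zip(*(sd[key] for sd in state_dicts))
        averaged[key] = [sum(vals) / n for vals in columns]
    return averaged

# With this commit, the last N checkpoints *ending at* a chosen "best"
# checkpoint can be selected before being passed to a function like this.
```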
Ruty Rinott authored
Summary: Optimize memory use of token_block_dataset by replacing Python data structures with numpy arrays. Applies the needed parts of D13498973 instead of rebasing it onto these changes. Reviewed By: edunov Differential Revision: D13678485 fbshipit-source-id: c0c827a8b95834a6a5456476040ebdc8e42136d4
- 15 Jan, 2019 2 commits
Davide Caroselli authored
Summary: The correct help message was obscured by the transient `ArgumentParser` used only to eagerly read the `--user-dir` flag. To reproduce, try: ```bash python3 train.py --help ``` Pull Request resolved: https://github.com/pytorch/fairseq/pull/446 Differential Revision: D13674731 Pulled By: myleott fbshipit-source-id: b9503a4d7ef26405be630d31c0ca02386d783031
Davide Caroselli authored
Summary: Document the command-line option --user-dir in docs/overview.rst Pull Request resolved: https://github.com/pytorch/fairseq/pull/447 Differential Revision: D13674744 Pulled By: myleott fbshipit-source-id: 17049ee5c9f692f5298ef9fa7381ee583f269cde
- 14 Jan, 2019 2 commits
Davide Caroselli authored
Summary: Following discussion on official fairseq (https://github.com/pytorch/fairseq/issues/438), I added the `--user-dir` option to the command line. The user can now specify a path in order to import a custom module with proprietary tasks, architectures and so on. Pull Request resolved: https://github.com/pytorch/fairseq/pull/440 Differential Revision: D13651721 Pulled By: myleott fbshipit-source-id: 38b87454487f1ffa5eaf19c4bcefa0b3b15a8f43
Huihui Fan authored
Summary: Minor fixes: (1) add fairseq logo; (2) encoder padding for fconv self-attention; (3) legacy DDP change. Pull Request resolved: https://github.com/pytorch/fairseq/pull/442 Differential Revision: D13651715 Pulled By: myleott fbshipit-source-id: ac93c80f1dbffdfe03fbd4b8a8ea527aecb576a7
- 10 Jan, 2019 1 commit
Wei Ho authored
Summary: https://github.com/pytorch/fairseq/blob/master/fairseq/trainer.py#L164 calls `train()` without any argument Reviewed By: myleott Differential Revision: D13599203 fbshipit-source-id: 3a096a6dd35a7a3f8309fbda3b54a36f606475e3
- 09 Jan, 2019 2 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/439 Differential Revision: D13608151 Pulled By: myleott fbshipit-source-id: 198b84995a6329f8329829cc91184d88f1eab947
Art Matsak authored
Summary: https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset is no longer valid; it redirects to a blog post listing page. Pull Request resolved: https://github.com/pytorch/fairseq/pull/436 Differential Revision: D13607961 Pulled By: myleott fbshipit-source-id: 1a1074ffcbc454e29bc9d5aed84fdf2089a224bc
- 07 Jan, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/433 Differential Revision: D13588032 Pulled By: myleott fbshipit-source-id: 0e5ff361e27b206c4490264f0f51863367499e81
- 05 Jan, 2019 3 commits
Myle Ott authored
Myle Ott authored
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/283 Pull Request resolved: https://github.com/pytorch/fairseq/pull/428 Differential Revision: D13564190 Pulled By: myleott fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
- 28 Dec, 2018 3 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/425 Differential Revision: D13558340 Pulled By: myleott fbshipit-source-id: dff8c77027e821d8c80bfbd6a6ccce9ca1a44b78
Myle Ott authored
Summary: This was broken in 03a57dec. Pull Request resolved: https://github.com/pytorch/fairseq/pull/424 Differential Revision: D13557540 Pulled By: myleott fbshipit-source-id: 62deda5353032aff20d35d046b0bb843da44d27c
Paul Michel authored
Summary: BacktranslationDataset would throw an error when the underlying dataset was an IndexedCachedDataset because prefetching was not handled correctly. This fixes the error. Pull Request resolved: https://github.com/pytorch/fairseq/pull/410 Differential Revision: D13557539 Pulled By: myleott fbshipit-source-id: 398ab59a3ebdbf1c666d862b9f905654eece800c
- 26 Dec, 2018 2 commits
Myle Ott authored
Summary: - 04cc608: Add `--match-source-len` option to generate.py for sequence-tagging tasks - 19f1a40: Add `--no-repeat-ngram-size` option to generate.py for ngram blocking Pull Request resolved: https://github.com/pytorch/fairseq/pull/422 Differential Revision: D13548445 Pulled By: myleott fbshipit-source-id: 26d1ae83993e428fcb020dac5ae358b0e36233d9
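The ngram-blocking idea behind `--no-repeat-ngram-size` can be sketched as follows. This is a hedged, list-based illustration (the function name is hypothetical); the real implementation works on batched beam hypotheses.

```python
def banned_next_tokens(tokens, no_repeat_ngram_size):
    # Return the set of next tokens that would complete an n-gram
    # already present in `tokens`, so the decoder can ban them.
    n = no_repeat_ngram_size
    if n == 0 or len(tokens) < n - 1:
        return set()
    prefix = tuple(tokens[len(tokens) - (n - 1):])  # last n-1 generated tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned
```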
Emanuele Bugliarello authored
Summary: Add argument `--no-token-positional-embeddings` to TransformerModel (currently only available in TransformerLanguageModel) to disable positional embeddings. Pull Request resolved: https://github.com/pytorch/fairseq/pull/421 Differential Revision: D13548450 Pulled By: myleott fbshipit-source-id: b352c702ed1609e3b84d9a8404941d3274a7f883
- 24 Dec, 2018 2 commits
Myle Ott authored
Summary: Previously when training with --fp16, we stored a copy of the model parameters in FP32 for optimization, which consumed a lot of memory. An alternative is to just do the conversions to FP32 on the fly, which allows the caching allocator to reuse/save some memory. This reduces peak memory usage by ~20% with a negligible reduction in training speed (~2% slower) when training a big transformer on 8 GPUs on wmt en-de with --update-freq=16. This does not affect convergence, i.e., models will train exactly as they did before. Pull Request resolved: https://github.com/pytorch/fairseq/pull/404 Differential Revision: D13394376 Pulled By: myleott fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf
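The memory trade-off described above can be illustrated schematically. This is not fairseq's code: `to_low_precision` fakes an FP16 cast by rounding, purely to show the pattern of upcasting transiently at the optimizer step instead of keeping a persistent FP32 master copy.

```python
def to_low_precision(x):
    # Stand-in for an FP16 cast (rounding plays the role of precision loss).
    return round(x, 3)

def fp16_sgd_step(params16, grads16, lr):
    # No persistent FP32 master copy: each parameter is upcast on the fly,
    # updated in high precision, then cast back down. The transient copies
    # can be freed (or reused by a caching allocator) immediately.
    updated = []
    for p, g in zip(params16, grads16):
        p32 = float(p)        # transient high-precision copy
        p32 -= lr * float(g)  # update in high precision
        updated.append(to_low_precision(p32))
    return updated
```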
Myle Ott authored
Summary: This improves performance for datasets that load data lazily. Enabled by default since it shouldn't compromise performance for non-lazy datasets. Pull Request resolved: https://github.com/pytorch/fairseq/pull/419 Differential Revision: D13546585 Pulled By: myleott fbshipit-source-id: f6152e2047291b0d68cd7506cd772b0caafe95be
- 18 Dec, 2018 1 commit
Haoran Li authored
Summary: Avoid loading the entire dataset on each GPU, to reduce the memory footprint Reviewed By: rutyrinott Differential Revision: D13163548 fbshipit-source-id: 4ba717c8021ba5723d02225bae5782e2c3a18640
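One common way to avoid loading the full dataset on every GPU is for each worker to read only its own shard. The sketch below (hypothetical names, not the code from this diff) illustrates the strided-sharding idea.

```python
def shard_indices(num_examples, num_shards, shard_id):
    # Each of `num_shards` workers keeps a disjoint, strided subset of
    # example indices, so no worker materializes the whole dataset.
    return list(range(shard_id, num_examples, num_shards))
```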
- 11 Dec, 2018 1 commit
Suvrat Bhooshan authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/406 Static helper function in TranslationTask to load pretrained models Reviewed By: myleott Differential Revision: D13345276 fbshipit-source-id: 3a675ee1a144ceb8b010f30e1a6163ef670b53f3
- 08 Dec, 2018 1 commit
Peng Li authored
Summary: The original code reports the size of a valid sample instead of the invalid one when raising an exception, which is confusing. Pull Request resolved: https://github.com/pytorch/fairseq/pull/403 Differential Revision: D13391431 Pulled By: myleott fbshipit-source-id: 4642ed027c0f664424fc5a9baf4363791144feaf
- 07 Dec, 2018 2 commits
Myle Ott authored
Summary: Let's only decrease the loss scale if a large enough percentage of batches overflow. Pull Request resolved: https://github.com/pytorch/fairseq/pull/397 Differential Revision: D13355159 Pulled By: myleott fbshipit-source-id: e17dde73d34a639519b4348c013fdd19d2b314e6
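The overflow-tolerance idea can be sketched as below; the parameter names and windowed bookkeeping are hypothetical simplifications, not fairseq's exact `DynamicLossScaler` API.

```python
def update_loss_scale(scale, recent_overflows, tolerance=0.05, factor=2.0):
    # Decrease the loss scale only when the fraction of recent batches
    # that overflowed exceeds `tolerance`, instead of on every overflow.
    pct_overflow = sum(recent_overflows) / len(recent_overflows)
    if pct_overflow > tolerance:
        return scale / factor
    return scale
```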
Halil Akin authored
Summary: This is not a guaranteed solution (since processes may still get out of sync if OOM happens after an all_gather/all_reduce has been done) - but should still make multiprocessing training more robust in practice since it seems we usually OOM early enough. Reviewed By: myleott Differential Revision: D13086018 fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
- 06 Dec, 2018 4 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/400 Differential Revision: D13366996 Pulled By: myleott fbshipit-source-id: b4907815e7cc1b4a2aceab11210bf64cb3d814c9
Myle Ott authored
Summary: Not switching to Black formatting just yet, but adding fmt: off directives in case we decide to later. Pull Request resolved: https://github.com/pytorch/fairseq/pull/399 Differential Revision: D13364674 Pulled By: myleott fbshipit-source-id: a20a11a18be3d583ee30eff770278fb4bd05b93c
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/398 Differential Revision: D13358876 Pulled By: myleott fbshipit-source-id: 57673f2643aac01492cb8f5728bb9f1a34ba6aa7
Teng Li authored
Summary: As the title says, it's better to enable this for certain use cases to make sure things are right. Reviewed By: myleott, pietern Differential Revision: D13351753 fbshipit-source-id: cf495960fda71ebd679c23212e19703c93a9dbdc
- 04 Dec, 2018 1 commit
Myle Ott authored
Summary: This kind of issue should be rare, but the exception that was thrown before ("UnpicklingError: invalid load key") was very opaque, so let's use something a bit clearer. Pull Request resolved: https://github.com/pytorch/fairseq/pull/396 Differential Revision: D13325600 Pulled By: myleott fbshipit-source-id: 2e7093752d45d6b04a3d506aca8d5694b72ab638
- 30 Nov, 2018 1 commit
linkerr authored
Summary: Fixes a RuntimeError ("….LongTensor but found type torch.cuda.FloatTensor for argument #3 'index'") on torch.__version__ == 0.4.0: `new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1)` returns a float-dtype tensor, so executing line 321 of fairseq/fairseq/models/fconv.py throws the error. Pull Request resolved: https://github.com/pytorch/fairseq/pull/393 Differential Revision: D13276496 Pulled By: myleott fbshipit-source-id: e7986246fbe2c79fff61bcab0e5bec9dd63e0afd
- 29 Nov, 2018 2 commits
Haoran Li authored
Summary: replace dynamic index put with copying and creating a new tensor Reviewed By: wanchaol Differential Revision: D13244573 fbshipit-source-id: 909f7913ad579ed035f29bb52321ff01e09a2c60
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/388 Reviewed By: theweiho Differential Revision: D13244869 fbshipit-source-id: d22c18f63f9a691ccc7245e06bc9a5b776a192b5
- 27 Nov, 2018 2 commits
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/386 Pull Request resolved: https://github.com/pytorch/translate/pull/266 This allows decoder embedding sharing for denoising autoencoder modules with different decoders (one for src decoding and one for tgt decoding) Reviewed By: dpacgopinath Differential Revision: D13133015 fbshipit-source-id: 3c98be639d705744ccf5ba3a8fd7d10ddc7aef4a
Haoran Li authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/385 Pull Request resolved: https://github.com/facebookresearch/pytext/pull/6 Pull Request resolved: https://github.com/pytorch/pytorch/pull/14292 Reviewed By: jingfeidu Differential Revision: D10517864 fbshipit-source-id: 81008b5cc6aab70e23329c187392fb72ee057d78
- 26 Nov, 2018 2 commits
Myle Ott authored
Fix some recursive functions (e.g., reorder_incremental_state) to only touch each module once (#379) Summary: A module can otherwise be visited more than once if it is registered in more than one place in the network. Pull Request resolved: https://github.com/pytorch/fairseq/pull/379 Differential Revision: D13154498 Pulled By: myleott fbshipit-source-id: a35575d1956a46cd35ac8b16a719ad20ac3e380a
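The fix can be illustrated with a seen-set guard. This is a hedged sketch (names hypothetical), with `children()` standing in for torch's `nn.Module` child traversal.

```python
def apply_once(module, fn, _seen=None):
    # Visit each unique module object exactly once, even when the same
    # module is registered in more than one place in the network.
    if _seen is None:
        _seen = set()
    if id(module) in _seen:
        return
    _seen.add(id(module))
    fn(module)
    for child in module.children():
        apply_once(child, fn, _seen)
```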
Myle Ott authored
Summary: - generalize AppendEosDataset -> TransformEosDataset - remove EOS logic from BacktranslationDataset (use TransformEosDataset instead) - BacktranslationDataset takes a backtranslation_fn instead of building the SequenceGenerator itself Pull Request resolved: https://github.com/pytorch/fairseq/pull/354 Reviewed By: liezl200 Differential Revision: D12970233 Pulled By: myleott fbshipit-source-id: d5c5b0e0a75eca1bd3a50382ac24621f35c32f36
- 19 Nov, 2018 1 commit
Halil Akin authored
Summary: Fixing some distributed failures that happen when OOMs are observed. Reviewed By: myleott Differential Revision: D13121054 fbshipit-source-id: f71a0a695332acbaa1797e89887b8b7c7ddaa727
- 18 Nov, 2018 1 commit
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/374 Differential Revision: D13116074 Pulled By: myleott fbshipit-source-id: 485724cc5a40e8360d21e4bf9c35821baa0ddc57