1. 30 Jan, 2019 1 commit
    • Merge internal changes (#483) · 42be3ebd
      Myle Ott authored
      Summary:
      Changelog:
      - `4889802`: can now detokenize sentencepiece output with `--remove-bpe=sentencepiece` (fixes #331). Also added `--sacrebleu` for computing detokenized BLEU.
      - `0d76427`: fix assertion error when training language model with dataset containing empty sentences
      - minor bug and style fixes
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/483
      
      Differential Revision: D13867899
      
      Pulled By: myleott
      
      fbshipit-source-id: 25c940b847fe270262ac8f5ac838407b3977fdda
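      For intuition, a minimal sketch of what `--remove-bpe=sentencepiece` amounts to (illustrative only, not fairseq's actual code): sentencepiece marks word-initial pieces with U+2581, so detokenization is just joining the pieces and restoring spaces.

      ```python
      # Minimal sketch of sentencepiece detokenization (illustrative only).
      # Sentencepiece prefixes word-initial pieces with U+2581.
      def remove_sentencepiece_bpe(tokens):
          # Join pieces, turn each boundary marker back into a space,
          # and trim the space introduced by the first word's marker.
          return "".join(tokens).replace("\u2581", " ").strip()

      print(remove_sentencepiece_bpe(["\u2581Hello", "\u2581wor", "ld", "!"]))
      # -> Hello world!
      ```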
  2. 29 Jan, 2019 1 commit
  3. 25 Jan, 2019 4 commits
  4. 24 Jan, 2019 6 commits
  5. 17 Jan, 2019 2 commits
  6. 16 Jan, 2019 3 commits
    • FIX: '--user-dir' on multi-gpu (#449) · 7853818c
      Davide Caroselli authored
      Summary:
      On a multi-gpu training scenario, the `train.py` script spawns new processes with `torch.multiprocessing.spawn`. Unfortunately those child processes don't inherit the modules imported with `--user-dir`.
      
      This pull request fixes the problem: the custom module import is now explicit in every `main()` function.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/449
      
      Differential Revision: D13676922
      
      Pulled By: myleott
      
      fbshipit-source-id: 520358d66155697885b878a37e7d0484bddbc1c6
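      A hedged sketch of the failure mode and the fix (names here are illustrative, not fairseq's exact API): under the `spawn` start method, child processes begin with fresh interpreter state, so a module imported dynamically in the parent must be re-imported inside each child's `main()`.

      ```python
      # Illustrative sketch of the --user-dir pitfall (not fairseq's exact API).
      import importlib
      import os
      import sys

      import torch.multiprocessing as mp

      def import_user_module(user_dir):
          # Make the plugin directory importable and import it as a package,
          # which is roughly what --user-dir does.
          module_path = os.path.abspath(user_dir)
          parent, name = os.path.split(module_path)
          if name not in sys.modules:
              sys.path.insert(0, parent)
              importlib.import_module(name)

      def main(rank, user_dir):
          # Spawned children start with fresh interpreter state, so the
          # import must be repeated explicitly in every main().
          import_user_module(user_dir)
          # ... set up the model/trainer for this rank ...

      if __name__ == "__main__":
          mp.spawn(main, args=("./my_plugins",), nprocs=2)
      ```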
    • Add --checkpoint-upper-bound to average_checkpoints.py (#452) · bdec179b
      Myle Ott authored
      Summary:
      This is useful for averaging the last N checkpoints, ending at some "best" checkpoint.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/452
      
      Differential Revision: D13695407
      
      Pulled By: myleott
      
      fbshipit-source-id: 5d9d2bff3706834f01501e9259834c77fb335817
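      For intuition, a minimal sketch of checkpoint averaging with an upper bound (illustrative only; it assumes each checkpoint stores its weights under a `model` key, as fairseq's do):

      ```python
      # Illustrative sketch of averaging the last N checkpoints ending at
      # an upper bound; assumes weights live under the "model" key.
      import torch

      def average_checkpoints(paths):
          avg = None
          for path in paths:
              state = torch.load(path, map_location="cpu")["model"]
              if avg is None:
                  avg = {k: v.clone().float() for k, v in state.items()}
              else:
                  for k, v in state.items():
                      avg[k] += v.float()
          return {k: v / len(paths) for k, v in avg.items()}

      # Average the 5 epoch checkpoints ending at the "best" epoch 30,
      # roughly the effect of --checkpoint-upper-bound 30:
      paths = [f"checkpoints/checkpoint{e}.pt" for e in range(26, 31)]
      averaged = average_checkpoints(paths)
      ```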
    • optimizations for token_block_dataset · d1dc66d9
      Ruty Rinott authored
      Summary:
      Optimizes memory use of token_block_dataset by replacing Python data structures with numpy arrays. Applies the needed parts of D13498973 instead of rebasing it onto these changes.
      
      Reviewed By: edunov
      
      Differential Revision: D13678485
      
      fbshipit-source-id: c0c827a8b95834a6a5456476040ebdc8e42136d4
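      A hedged sketch of the kind of change involved (illustrative, not the actual diff): storing block boundaries in a single numpy array instead of a list of Python tuples removes per-object overhead, which matters when there are millions of blocks.

      ```python
      # Illustrative sketch: block boundaries as one numpy array instead
      # of a list of Python tuples.
      import numpy as np

      sizes = np.array([7, 3, 12, 5])   # tokens per sentence
      block_size = 8
      total = int(sizes.sum())

      boundaries = []
      start = 0
      while start < total:
          end = min(start + block_size, total)
          boundaries.append((start, end))
          start = end

      # One contiguous int64 array of shape (num_blocks, 2): 16 bytes per
      # block, versus a tuple of two boxed Python ints (~100+ bytes).
      slice_indices = np.array(boundaries, dtype=np.int64)
      ```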
  7. 15 Jan, 2019 2 commits
  8. 14 Jan, 2019 2 commits
  9. 10 Jan, 2019 1 commit
  10. 09 Jan, 2019 2 commits
  11. 07 Jan, 2019 1 commit
  12. 05 Jan, 2019 3 commits
  13. 28 Dec, 2018 3 commits
  14. 26 Dec, 2018 2 commits
  15. 24 Dec, 2018 2 commits
    • Improve memory efficiency of FP16 optimization (#404) · 03a57dec
      Myle Ott authored
      Summary:
      Previously, when training with `--fp16`, we stored a copy of the model parameters in FP32 for optimization, which consumed a lot of memory. An alternative is to do the conversions to FP32 on the fly, which allows the caching allocator to reuse/save some memory.
      
      This reduces peak memory usage by ~20% with a negligible reduction in training speed (~2% slower) when training a big transformer on 8 GPUs on WMT En-De with `--update-freq=16`.
      
      This does not affect convergence, i.e., models will train exactly as they did before.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/404
      
      Differential Revision: D13394376
      
      Pulled By: myleott
      
      fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf
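      A simplified sketch of the idea, using plain SGD for brevity (illustrative; fairseq's actual optimizer wraps Adam and loss scaling): the FP32 copies live only for the duration of each parameter's update, so the caching allocator can reuse their memory.

      ```python
      # Simplified sketch using plain SGD (not fairseq's actual code).
      import torch

      def fp16_sgd_step(params, lr=0.1):
          """One update over FP16 parameters without a persistent FP32 copy."""
          for p in params:
              # Transient FP32 views of the parameter and its gradient; the
              # caching allocator can reuse this memory across parameters,
              # unlike a persistent FP32 master copy of the whole model.
              p32 = p.data.float()
              g32 = p.grad.data.float()
              p32.add_(g32, alpha=-lr)   # do the update in FP32 for accuracy
              p.data.copy_(p32)          # cast the result back to FP16 in place
      ```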
    • Add BufferedIterator (#419) · 0f833526
      Myle Ott authored
      Summary:
      This improves performance for datasets that load data lazily. Enabled by default since it shouldn't compromise performance for non-lazy datasets.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/419
      
      Differential Revision: D13546585
      
      Pulled By: myleott
      
      fbshipit-source-id: f6152e2047291b0d68cd7506cd772b0caafe95be
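      A hedged sketch of what a buffered iterator can look like (illustrative, not fairseq's exact implementation): a background thread prefetches items from a slow, lazy iterable into a bounded queue, so the consumer rarely blocks on I/O.

      ```python
      # Illustrative sketch of a buffered (prefetching) iterator.
      import queue
      import threading

      class BufferedIterator:
          _END = object()  # sentinel marking iterator exhaustion

          def __init__(self, iterable, buffer_size=8):
              self._queue = queue.Queue(maxsize=buffer_size)
              self._thread = threading.Thread(
                  target=self._fill, args=(iter(iterable),), daemon=True
              )
              self._thread.start()

          def _fill(self, it):
              # Runs in the background: eagerly pull from the (possibly
              # lazy/slow) source into the bounded queue.
              for item in it:
                  self._queue.put(item)
              self._queue.put(self._END)

          def __iter__(self):
              return self

          def __next__(self):
              item = self._queue.get()
              if item is self._END:
                  raise StopIteration
              return item
      ```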
  16. 18 Dec, 2018 1 commit
    • data per gpu change · 9ca82a0e
      Haoran Li authored
      Summary: Avoid loading the entire dataset on every GPU, to reduce the memory footprint.
      
      Reviewed By: rutyrinott
      
      Differential Revision: D13163548
      
      fbshipit-source-id: 4ba717c8021ba5723d02225bae5782e2c3a18640
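      A hedged sketch of the general idea (illustrative; the actual change is in the dataset loading code): each rank keeps only the examples it will consume, rather than materializing the full dataset in every process.

      ```python
      def shard_indices(num_examples, rank, world_size):
          # Each rank keeps a strided shard of indices rather than a
          # full in-memory copy of the dataset.
          return list(range(rank, num_examples, world_size))

      print(shard_indices(12, rank=1, world_size=4))  # [1, 5, 9]
      ```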
  17. 11 Dec, 2018 1 commit
  18. 08 Dec, 2018 1 commit
  19. 07 Dec, 2018 2 commits
    • Add --fp16-scale-tolerance (#397) · 03ef3ab8
      Myle Ott authored
      Summary:
      Let's only decrease the loss scale if a large enough percentage of batches overflow.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/397
      
      Differential Revision: D13355159
      
      Pulled By: myleott
      
      fbshipit-source-id: e17dde73d34a639519b4348c013fdd19d2b314e6
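      A minimal sketch of the policy (assumed names, not the actual fairseq loss scaler): track the fraction of batches that overflowed since the last rescale, and halve the scale only when that fraction exceeds the tolerance. The usual scale-increase logic after a stretch of clean batches is omitted here.

      ```python
      class LossScaler:
          """Sketch: halve the FP16 loss scale only when the fraction of
          overflowing batches since the last rescale exceeds a tolerance."""

          def __init__(self, scale=128.0, tolerance=0.05):
              self.scale = scale
              self.tolerance = tolerance
              self.batches_since_rescale = 0
              self.overflows_since_rescale = 0

          def update(self, overflowed):
              self.batches_since_rescale += 1
              if overflowed:
                  self.overflows_since_rescale += 1
                  pct = self.overflows_since_rescale / self.batches_since_rescale
                  # A lone overflow after a long run of good batches stays
                  # below the tolerance and no longer halves the scale.
                  if pct > self.tolerance:
                      self.scale /= 2.0
                      self.batches_since_rescale = 0
                      self.overflows_since_rescale = 0
      ```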
    • Take a dummy train step under OOM to keep multiprocessing in sync · 6c006a34
      Halil Akin authored
      Summary: This is not a guaranteed solution (processes may still get out of sync if an OOM happens after an all_gather/all_reduce has already run), but it should make multiprocessing training more robust in practice, since OOMs usually happen early enough.
      
      Reviewed By: myleott
      
      Differential Revision: D13086018
      
      fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
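      A hedged sketch of the recovery strategy (illustrative names and batch layout, not the actual trainer): the worker that OOMs runs a tiny dummy step with a zeroed loss, so it still reaches the gradient all_reduce that its peers are blocked on.

      ```python
      # Illustrative sketch (assumed batch layout and trainer structure).
      import torch

      def safe_train_step(model, criterion, optimizer, batch, dummy_batch):
          try:
              loss = criterion(model(batch["input"]), batch["target"])
          except RuntimeError as e:
              if "out of memory" not in str(e):
                  raise
              # Free what we can, then run a tiny dummy batch with zeroed
              # loss so this worker still joins the gradient all_reduce
              # that the other workers are waiting on.
              torch.cuda.empty_cache()
              optimizer.zero_grad()
              loss = criterion(model(dummy_batch["input"]),
                               dummy_batch["target"]) * 0.0
          loss.backward()   # gradients are synchronized here under DDP
          optimizer.step()
      ```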