- 29 Jan, 2019 1 commit
-
-
Jingfei Du authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/482 With this change, we can use different dictionary classes when calling build_dictionary and build_and_save_dictionary Reviewed By: liaimi Differential Revision: D13855100 fbshipit-source-id: 62e6db310b5f078e05c547d2671252233be7b7f0
-
- 25 Jan, 2019 4 commits
-
-
Myle Ott authored
Summary: Changelog:
- `e330f56`: Add code for the "Pay Less Attention with Lightweight and Dynamic Convolutions" paper
- `5e3b98c`: Add scripts for computing tokenized BLEU with compound splitting and sacrebleu
- update READMEs
- misc fixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/473 Differential Revision: D13819717 Pulled By: myleott fbshipit-source-id: f2dc12ea89a436b950cafec3593ed1b04af808e9
-
Xian Li authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/474 Reviewed By: theweiho, akinh Differential Revision: D13701447 fbshipit-source-id: 34036dce7601835b605e3b169210edc7a6715de6
-
Lucio Dery authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/472 Implementation of "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost" (https://arxiv.org/abs/1804.04235) Differential Revision: D13388049 fbshipit-source-id: 24ad30f4bac248e6aeaced5064bb83784058f03d
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/471 Differential Revision: D13818918 Pulled By: myleott fbshipit-source-id: d3b8dc50e81ee1d2dcc5efc5815998be8461085f
-
- 24 Jan, 2019 6 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/470 Differential Revision: D13803964 Pulled By: myleott fbshipit-source-id: 91b66599e9a539833fcedea07c608b349ba3b449
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/469 Differential Revision: D13802945 Pulled By: myleott fbshipit-source-id: b6976506a8336b96ee40505c4a7638541cc99c95
-
Davide Caroselli authored
Summary: When opening text files without specifying the encoding (i.e. `open(path, "r")` or `open(path, "w")`), Python 3 uses the preferred locale encoding (`locale.getpreferredencoding()`), so the result is platform dependent and can change from one machine to another. I believe fairseq should enforce its own standard (UTF-8 seems like the best choice to me). This pull request explicitly specifies UTF-8 encoding when reading text files. Pull Request resolved: https://github.com/pytorch/fairseq/pull/460 Differential Revision: D13802525 Pulled By: myleott fbshipit-source-id: 672fd55707ee559ab36d74bc1c24026166ea2367
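A minimal sketch of the pattern this change enforces (the file name is hypothetical):

```python
import locale

# Without an explicit encoding, Python 3 falls back to the locale's
# preferred encoding, which differs across platforms.
print(locale.getpreferredencoding())  # e.g. "UTF-8" on Linux, "cp1252" on Windows

path = "example.txt"  # hypothetical data file

# Writing with an explicit encoding makes the bytes on disk deterministic.
with open(path, "w", encoding="utf-8") as f:
    f.write("Bonne journée\n")

# Reading back with the same explicit encoding works on any machine;
# a plain open(path, "r") could mis-decode on a non-UTF-8 locale.
with open(path, "r", encoding="utf-8") as f:
    print(f.read())
```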
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/468 Differential Revision: D13802590 Pulled By: myleott fbshipit-source-id: e374e38e74dc91bda0579ae41e26289fb0ba56a2
-
vufg authored
Summary: Although both are supported by Python 3.6, I think it would be better to unify the usage of the string formatting functions. Pull Request resolved: https://github.com/pytorch/fairseq/pull/467 Differential Revision: D13802506 Pulled By: myleott fbshipit-source-id: 5c4877547b1c4ca806ab54c80ae483cfbaa7827a
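The summary doesn't say which two styles were mixed; assuming `%`-formatting and `str.format`, this is the kind of unification meant:

```python
n, total = 3, 10

# Two equivalent styles; mixing them in one codebase hurts readability.
msg_percent = "processed %d of %d batches" % (n, total)     # old %-style
msg_format = "processed {} of {} batches".format(n, total)  # str.format style

assert msg_percent == msg_format
```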
-
frankang authored
Summary: Fix a bug where the GroupedIterator restarted iterating from the beginning when initialized (https://github.com/pytorch/fairseq/issues/441). Correct the filter criterion for dict-type sentence sizes (https://github.com/pytorch/fairseq/issues/451). Pull Request resolved: https://github.com/pytorch/fairseq/pull/455 Differential Revision: D13725646 Pulled By: myleott fbshipit-source-id: e698fa6f9b45460f95a75c9e9976a3aa3b6aa523
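A hedged illustration of the grouping behavior at stake (a simplified stand-in, not fairseq's class): the wrapper must consume from the iterator it is given, so a partially consumed iterator resumes mid-epoch instead of restarting at index 0.

```python
import itertools


def grouped(iterator, chunk_size):
    """Yield fixed-size chunks, consuming from the iterator as given."""
    while True:
        chunk = list(itertools.islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


it = iter(range(10))
next(it)  # simulate a partially consumed (resumed) epoch: item 0 is gone
print(list(grouped(it, 4)))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9]] — no restart
```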
-
- 17 Jan, 2019 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/454 Differential Revision: D13708565 Pulled By: myleott fbshipit-source-id: 5cd0e07e3e1885eef14e3a5e8074f24cf4bde632
-
Myle Ott authored
Summary: There was a very subtle bug here 😢 When we recently removed this line (7633129b), it meant that the learning rate scheduler didn't get initialized until after the first update. Unfortunately pytorch optimizers store the learning rate in their internal state, so some learning rate schedulers use their `__init__` method to reset the learning rate to some sane initial value. This is especially problematic for LR schedulers that include a warmup, where the Optimizer is likely to contain the peak learning rate at initialization, and it's only in the LR scheduler's `__init__` that the (much smaller) warmup value is set. For example, the inverse_sqrt scheduler resets the learning rate upon initialization: https://github.com/pytorch/fairseq/blob/7853818c2e33a63ec17a31bcfe20e4fc75d94130/fairseq/optim/lr_scheduler/inverse_square_root_schedule.py#L48-L50 **Impact:** For the last ~1.5 weeks, the first training update would use the optimizer...
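The linked inverse_sqrt schedule makes the bug concrete. A condensed sketch of that schedule's shape (parameter names simplified; not the full fairseq class):

```python
class InverseSqrtSchedule:
    """Linear warmup to the peak lr, then decay proportional to 1/sqrt(step)."""

    def __init__(self, optimizer, warmup_init_lr, peak_lr, warmup_updates):
        self.optimizer = optimizer
        self.warmup_init_lr = warmup_init_lr
        self.lr_step = (peak_lr - warmup_init_lr) / warmup_updates
        self.decay_factor = peak_lr * warmup_updates ** 0.5
        self.warmup_updates = warmup_updates
        # The line that matters for the bug: __init__ overwrites whatever
        # lr the optimizer was built with (typically the peak lr) with the
        # much smaller warmup value. Construct the scheduler only *after*
        # the first update, and that first step runs at peak lr.
        self.step_update(0)

    def step_update(self, num_updates):
        if num_updates < self.warmup_updates:
            lr = self.warmup_init_lr + num_updates * self.lr_step
        else:
            lr = self.decay_factor * num_updates ** -0.5
        for group in self.optimizer.param_groups:
            group["lr"] = lr
        return lr


class FakeOptimizer:  # minimal stand-in for a torch optimizer
    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]


opt = FakeOptimizer(lr=5e-4)  # constructed holding the peak lr
sched = InverseSqrtSchedule(opt, warmup_init_lr=1e-7, peak_lr=5e-4, warmup_updates=4000)
print(opt.param_groups[0]["lr"])  # 1e-07, not 5e-4
```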
-
- 16 Jan, 2019 3 commits
-
-
Davide Caroselli authored
Summary: In a multi-GPU training scenario, the `train.py` script spawns new processes with `torch.multiprocessing.spawn`. Unfortunately those child processes don't inherit the modules imported with `--user-dir`. This pull request fixes the problem: the custom module import is now explicit in every `main()` function. Pull Request resolved: https://github.com/pytorch/fairseq/pull/449 Differential Revision: D13676922 Pulled By: myleott fbshipit-source-id: 520358d66155697885b878a37e7d0484bddbc1c6
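A hedged sketch of the fix's shape, assuming a generic entry point and module path (not the actual fairseq code): modules imported in the parent are invisible to children started with `spawn`, so each child re-imports explicitly.

```python
import importlib
import os
import sys

import torch.multiprocessing as mp


def import_user_module(user_dir):
    """Import the custom module by path; safe to call once per process."""
    if user_dir is not None:
        module_parent, module_name = os.path.split(os.path.abspath(user_dir))
        if module_name not in sys.modules:
            sys.path.insert(0, module_parent)
            importlib.import_module(module_name)


def distributed_main(rank, user_dir):
    # Runs in a fresh process: the parent's imports are gone, so the
    # user module must be imported again here, explicitly.
    import_user_module(user_dir)
    # ... set up the device for `rank` and run training ...


if __name__ == "__main__":
    user_dir = "./my_plugins"  # hypothetical --user-dir value
    mp.spawn(distributed_main, args=(user_dir,), nprocs=2)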
-
Myle Ott authored
Summary: This is useful for averaging the last N checkpoints, ending at some "best" checkpoint. Pull Request resolved: https://github.com/pytorch/fairseq/pull/452 Differential Revision: D13695407 Pulled By: myleott fbshipit-source-id: 5d9d2bff3706834f01501e9259834c77fb335817
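A minimal sketch of parameter averaging over a chosen range of checkpoints (filenames and the plain arithmetic mean are assumptions here; the real tool lives in fairseq's scripts/average_checkpoints.py):

```python
import collections

import torch


def average_checkpoints(paths):
    """Arithmetic mean of model parameters across checkpoint files."""
    avg = collections.OrderedDict()
    for path in paths:
        state = torch.load(path, map_location="cpu")["model"]
        for k, v in state.items():
            avg[k] = avg.get(k, 0) + v.float() / len(paths)
    return avg


# e.g. average the last 5 checkpoints, ending at a "best" epoch 10
paths = ["checkpoint{}.pt".format(i) for i in range(6, 11)]
# torch.save({"model": average_checkpoints(paths)}, "averaged.pt")
```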
-
Ruty Rinott authored
Summary: Optimize memory use of token_block_dataset by replacing Python data structures with numpy arrays, applying the needed parts from D13498973 instead of rebasing it onto these changes. Reviewed By: edunov Differential Revision: D13678485 fbshipit-source-id: c0c827a8b95834a6a5456476040ebdc8e42136d4
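A sketch of the kind of change described, assuming the dataset stores (start, end) block boundaries: a Python list of tuples costs tens of bytes per entry in object overhead, while a contiguous int64 numpy array costs 16 bytes per block.

```python
import numpy as np

sizes = np.array([7, 12, 5, 9], dtype=np.int64)  # tokens per sentence
block_size = 10

total = int(sizes.sum())

# Instead of a Python list like [(0, 10), (10, 20), ...] built in a loop,
# compute all block boundaries as one (num_blocks, 2) int64 array.
starts = np.arange(0, total, block_size, dtype=np.int64)
ends = np.minimum(starts + block_size, total)
slice_indices = np.stack([starts, ends], axis=1)
print(slice_indices)  # [[ 0 10] [10 20] [20 30] [30 33]]
```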
-
- 15 Jan, 2019 2 commits
-
-
Davide Caroselli authored
Summary: The correct help message was obfuscated by the transient `ArgumentParser` used only to eagerly read the `--user-dir` flag. To reproduce, just try: ```bash python3 train.py --help ``` Pull Request resolved: https://github.com/pytorch/fairseq/pull/446 Differential Revision: D13674731 Pulled By: myleott fbshipit-source-id: b9503a4d7ef26405be630d31c0ca02386d783031
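A hedged sketch of the underlying pattern (flags simplified): the eager pre-parser must be created with `add_help=False` and use `parse_known_args`, otherwise it intercepts `--help` and prints its own nearly empty usage instead of the real one.

```python
import argparse

# Pass 1: eagerly read --user-dir only. add_help=False keeps this
# transient parser from hijacking --help from the real parser.
pre_parser = argparse.ArgumentParser(add_help=False)
pre_parser.add_argument("--user-dir", default=None)
pre_args, _ = pre_parser.parse_known_args()

# ... import the user module from pre_args.user_dir here, so that
# plugins can register extra arguments before the full parse ...

# Pass 2: the full parser owns --help and shows every option.
parser = argparse.ArgumentParser(description="train a model")
parser.add_argument("--user-dir", default=None)
parser.add_argument("--lr", type=float, default=0.25)
args = parser.parse_args()
```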
-
Davide Caroselli authored
Summary: Command line option --user-dir documented in docs/overview.rst Pull Request resolved: https://github.com/pytorch/fairseq/pull/447 Differential Revision: D13674744 Pulled By: myleott fbshipit-source-id: 17049ee5c9f692f5298ef9fa7381ee583f269cde
-
- 14 Jan, 2019 2 commits
-
-
Davide Caroselli authored
Summary: Following the discussion on the official fairseq repository (https://github.com/pytorch/fairseq/issues/438), I added the `--user-dir` option to the command line. The user can now specify a path in order to import a custom module with proprietary tasks, architectures and so on. Pull Request resolved: https://github.com/pytorch/fairseq/pull/440 Differential Revision: D13651721 Pulled By: myleott fbshipit-source-id: 38b87454487f1ffa5eaf19c4bcefa0b3b15a8f43
-
Huihui Fan authored
Summary: Minor fixes: (1) add the fairseq logo; (2) encoder padding for fconv self-attention; (3) legacy DDP change. Pull Request resolved: https://github.com/pytorch/fairseq/pull/442 Differential Revision: D13651715 Pulled By: myleott fbshipit-source-id: ac93c80f1dbffdfe03fbd4b8a8ea527aecb576a7
-
- 10 Jan, 2019 1 commit
-
-
Wei Ho authored
Summary: https://github.com/pytorch/fairseq/blob/master/fairseq/trainer.py#L164 calls `train()` without any argument Reviewed By: myleott Differential Revision: D13599203 fbshipit-source-id: 3a096a6dd35a7a3f8309fbda3b54a36f606475e3
-
- 09 Jan, 2019 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/439 Differential Revision: D13608151 Pulled By: myleott fbshipit-source-id: 198b84995a6329f8329829cc91184d88f1eab947
-
Art Matsak authored
Summary: https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset is no longer valid; it redirects to a blog post listing page. Pull Request resolved: https://github.com/pytorch/fairseq/pull/436 Differential Revision: D13607961 Pulled By: myleott fbshipit-source-id: 1a1074ffcbc454e29bc9d5aed84fdf2089a224bc
-
- 07 Jan, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/433 Differential Revision: D13588032 Pulled By: myleott fbshipit-source-id: 0e5ff361e27b206c4490264f0f51863367499e81
-
- 05 Jan, 2019 3 commits
-
-
Myle Ott authored
-
Myle Ott authored
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/283 Pull Request resolved: https://github.com/pytorch/fairseq/pull/428 Differential Revision: D13564190 Pulled By: myleott fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
-
- 28 Dec, 2018 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/425 Differential Revision: D13558340 Pulled By: myleott fbshipit-source-id: dff8c77027e821d8c80bfbd6a6ccce9ca1a44b78
-
Myle Ott authored
Summary: This was broken in 03a57dec. Pull Request resolved: https://github.com/pytorch/fairseq/pull/424 Differential Revision: D13557540 Pulled By: myleott fbshipit-source-id: 62deda5353032aff20d35d046b0bb843da44d27c
-
Paul Michel authored
Summary: BacktranslationDataset would throw an error when the underlying dataset was an IndexedCachedDataset because prefetching was not handled correctly. This fixes the error. Pull Request resolved: https://github.com/pytorch/fairseq/pull/410 Differential Revision: D13557539 Pulled By: myleott fbshipit-source-id: 398ab59a3ebdbf1c666d862b9f905654eece800c
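The shape of the fix, as a hedged sketch with simplified names (not the actual fairseq classes): a wrapping dataset has to expose and forward the cache-priming `prefetch` call, otherwise a cached backend is asked for items it never cached.

```python
class WrapperDataset:
    """Forward prefetch to the wrapped dataset instead of ignoring it."""

    def __init__(self, base):
        self.base = base

    @property
    def supports_prefetch(self):
        # Advertise prefetch support only if the wrapped dataset has it.
        return getattr(self.base, "supports_prefetch", False)

    def prefetch(self, indices):
        # Prime the underlying cache before __getitem__ is called.
        self.base.prefetch(indices)

    def __getitem__(self, index):
        return self.base[index]

    def __len__(self):
        return len(self.base)
```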
-
- 26 Dec, 2018 2 commits
-
-
Myle Ott authored
Summary:
- 04cc608: Add `--match-source-len` option to generate.py for sequence-tagging tasks
- 19f1a40: Add `--no-repeat-ngram-size` option to generate.py for ngram blocking
Pull Request resolved: https://github.com/pytorch/fairseq/pull/422 Differential Revision: D13548445 Pulled By: myleott fbshipit-source-id: 26d1ae83993e428fcb020dac5ae358b0e36233d9
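A small standalone sketch of what ngram blocking does during beam search (an illustration, not the generate.py implementation): before picking the next token, ban any token that would complete an n-gram already present in the hypothesis.

```python
def banned_tokens(hypothesis, ngram_size):
    """Tokens that would repeat an existing ngram if appended next (ngram_size >= 2)."""
    if len(hypothesis) < ngram_size - 1:
        return set()
    prefix = tuple(hypothesis[-(ngram_size - 1):])  # last n-1 tokens
    banned = set()
    for i in range(len(hypothesis) - ngram_size + 1):
        # If this position starts the same (n-1)-gram as the current
        # suffix, the token that followed it is banned now.
        if tuple(hypothesis[i:i + ngram_size - 1]) == prefix:
            banned.add(hypothesis[i + ngram_size - 1])
    return banned


# With --no-repeat-ngram-size=2, "the" may not follow "of" a second time:
print(banned_tokens(["a", "of", "the", "b", "of"], 2))  # {'the'}
```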
-
Emanuele Bugliarello authored
Summary: Add argument `--no-token-positional-embeddings` to TransformerModel (currently only available in TransformerLanguageModel) to disable positional embeddings. Pull Request resolved: https://github.com/pytorch/fairseq/pull/421 Differential Revision: D13548450 Pulled By: myleott fbshipit-source-id: b352c702ed1609e3b84d9a8404941d3274a7f883
-
- 24 Dec, 2018 2 commits
-
-
Myle Ott authored
Summary: Previously when training with --fp16, we stored a copy of the model parameters in FP32 for optimization, which consumed a lot of memory. An alternative is to just do the conversions to FP32 on the fly, which allows the caching allocator to reuse/save some memory. This reduces peak memory usage by ~20% with a negligible reduction in training speed (~2% slower) when training a big transformer on 8 GPUs on wmt en-de with --update-freq=16. This does not affect convergence, i.e., models will train exactly as they did before. Pull Request resolved: https://github.com/pytorch/fairseq/pull/404 Differential Revision: D13394376 Pulled By: myleott fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf
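A simplified sketch of the trade-off described, assuming a plain SGD-style update and a hypothetical `grad_scale` (the real trainer is more involved): instead of a persistent FP32 master copy of every parameter, each parameter is upcast only for the duration of its own update, so the temporaries can be recycled by the caching allocator.

```python
import torch


def fp16_step(params_fp16, lr, grad_scale):
    """One update without a persistent FP32 master copy of the weights."""
    for p in params_fp16:
        if p.grad is None:
            continue
        p32 = p.data.float()                 # on-the-fly FP32 copy
        g32 = p.grad.data.float() / grad_scale  # unscale in FP32
        p32.add_(g32, alpha=-lr)             # accumulate update in FP32
        p.data.copy_(p32)                    # write back to FP16 storage
```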
-
Myle Ott authored
Summary: This improves performance for datasets that load data lazily. Enabled by default since it shouldn't compromise performance for non-lazy datasets. Pull Request resolved: https://github.com/pytorch/fairseq/pull/419 Differential Revision: D13546585 Pulled By: myleott fbshipit-source-id: f6152e2047291b0d68cd7506cd772b0caafe95be
-
- 18 Dec, 2018 1 commit
-
-
Haoran Li authored
Summary: Avoid loading the entire dataset on every GPU, to reduce memory footprint. Reviewed By: rutyrinott Differential Revision: D13163548 fbshipit-source-id: 4ba717c8021ba5723d02225bae5782e2c3a18640
-
- 11 Dec, 2018 1 commit
-
-
Suvrat Bhooshan authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/406 Static helper function in TranslationTask to load pretrained models Reviewed By: myleott Differential Revision: D13345276 fbshipit-source-id: 3a675ee1a144ceb8b010f30e1a6163ef670b53f3
-
- 08 Dec, 2018 1 commit
-
-
Peng Li authored
Summary: The original code reports the size of a valid sample instead of the invalid one when raising an exception, which is confusing. Pull Request resolved: https://github.com/pytorch/fairseq/pull/403 Differential Revision: D13391431 Pulled By: myleott fbshipit-source-id: 4642ed027c0f664424fc5a9baf4363791144feaf
-
- 07 Dec, 2018 2 commits
-
-
Myle Ott authored
Summary: Let's only decrease the loss scale if a large enough percentage of batches overflow. Pull Request resolved: https://github.com/pytorch/fairseq/pull/397 Differential Revision: D13355159 Pulled By: myleott fbshipit-source-id: e17dde73d34a639519b4348c013fdd19d2b314e6
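A hedged sketch of the policy (class name, thresholds, and window are assumptions): track recent overflows and halve the loss scale only when the overflow fraction since the last rescale exceeds a tolerance, rather than on every single overflow.

```python
class DynamicLossScaler:
    """Decrease the loss scale only when overflows are frequent enough."""

    def __init__(self, init_scale=2.0 ** 15, tolerance=0.05, window=2000):
        self.loss_scale = init_scale
        self.tolerance = tolerance  # max tolerated overflow fraction
        self.window = window        # stable updates before growing again
        self._iter = 0
        self._last_overflow_iter = -1
        self._last_rescale_iter = -1
        self._overflows_since_rescale = 0

    def update_scale(self, overflow):
        self._iter += 1
        if overflow:
            self._last_overflow_iter = self._iter
            self._overflows_since_rescale += 1
            pct = self._overflows_since_rescale / (self._iter - self._last_rescale_iter)
            if pct >= self.tolerance:
                self.loss_scale /= 2.0  # too many overflows: back off
                self._last_rescale_iter = self._iter
                self._overflows_since_rescale = 0
        elif (self._iter - self._last_overflow_iter) % self.window == 0:
            self.loss_scale *= 2.0      # long stable stretch: grow again
```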
-
Halil Akin authored
Summary: This is not a guaranteed solution (since processes may still get out of sync if OOM happens after an all_gather/all_reduce has been done) - but should still make multiprocessing training more robust in practice since it seems we usually OOM early enough. Reviewed By: myleott Differential Revision: D13086018 fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
-
- 06 Dec, 2018 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/400 Differential Revision: D13366996 Pulled By: myleott fbshipit-source-id: b4907815e7cc1b4a2aceab11210bf64cb3d814c9
-