- 17 Jan, 2019 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/454 Differential Revision: D13708565 Pulled By: myleott fbshipit-source-id: 5cd0e07e3e1885eef14e3a5e8074f24cf4bde632
-
Myle Ott authored
Summary: There was a very subtle bug here
😢 When we recently removed this line (7633129b), it meant that the learning rate scheduler didn't get initialized until after the first update. Unfortunately pytorch optimizers store the learning rate in their internal state, so some learning rate schedulers use their `__init__` method to reset the learning rate to some sane initial value. This is especially problematic for LR schedulers that include a warmup, where the Optimizer is likely to contain the peak learning rate at initialization, and it's only in the LR scheduler's `__init__` that the (much smaller) warmup value is set. For example, the inverse_sqrt scheduler resets the learning rate upon initialization: https://github.com/pytorch/fairseq/blob/7853818c2e33a63ec17a31bcfe20e4fc75d94130/fairseq/optim/lr_scheduler/inverse_square_root_schedule.py#L48-L50 **Impact:** For the last ~1.5 weeks, the first training update would use the optimizer's default learning rate instead of the initial rate set by the LR scheduler. All subsequent updates used the correct learning rates. This primarily affects LR schedulers with warmups. Pull Request resolved: https://github.com/pytorch/fairseq/pull/453 Differential Revision: D13704453 Pulled By: myleott fbshipit-source-id: a946da30100f837c66bdc6b9b77b014ab4eb8764
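The bug described above can be illustrated with a minimal sketch (hypothetical classes, not fairseq's actual API): the optimizer is constructed holding the peak learning rate, and only the scheduler's `__init__` resets it to the much smaller warmup value, so skipping scheduler construction before the first update leaves the peak rate in place.

```python
# Sketch of why the LR scheduler must be built before the first update:
# pytorch-style optimizers keep the lr in internal state, and warmup
# schedulers overwrite it with a small initial value in __init__.

class Optimizer:
    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]  # pytorch-style internal state

class InverseSqrtSchedule:
    """Warm up linearly to peak_lr, then decay proportionally to 1/sqrt(t)."""
    def __init__(self, optimizer, peak_lr=5e-4, warmup_init_lr=1e-7,
                 warmup_updates=4000):
        self.optimizer = optimizer
        self.peak_lr = peak_lr
        self.warmup_init_lr = warmup_init_lr
        self.warmup_updates = warmup_updates
        self.lr_step = (peak_lr - warmup_init_lr) / warmup_updates
        # Crucial side effect: reset the optimizer's lr at construction time.
        self.optimizer.param_groups[0]["lr"] = warmup_init_lr

    def step_update(self, num_updates):
        if num_updates < self.warmup_updates:
            lr = self.warmup_init_lr + num_updates * self.lr_step
        else:
            lr = self.peak_lr * (self.warmup_updates / num_updates) ** 0.5
        self.optimizer.param_groups[0]["lr"] = lr
        return lr

opt = Optimizer(lr=5e-4)          # optimizer starts at the peak lr
sched = InverseSqrtSchedule(opt)  # __init__ drops it to the warmup value
print(opt.param_groups[0]["lr"])  # 1e-07, not 5e-4
```

If the scheduler is only constructed after the first update, that first step runs at `5e-4` instead of `1e-7`, which is exactly the impact described in the commit.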
-
- 16 Jan, 2019 3 commits
-
-
Davide Caroselli authored
Summary: In a multi-GPU training scenario, the `train.py` script spawns new processes with `torch.multiprocessing.spawn`. Unfortunately those child processes don't inherit the modules imported with `--user-dir`. This pull request fixes the problem: the custom module import is now explicit in every `main()` function. Pull Request resolved: https://github.com/pytorch/fairseq/pull/449 Differential Revision: D13676922 Pulled By: myleott fbshipit-source-id: 520358d66155697885b878a37e7d0484bddbc1c6
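A minimal sketch of the fix's idea (hypothetical helper, not fairseq's exact implementation): each spawned worker calls an explicit import helper at the top of its `main()`, since `torch.multiprocessing.spawn` workers don't inherit imports performed in the parent process.

```python
# Explicitly import a "--user-dir" style directory as a top-level module.
# Calling this inside every worker's main() makes the import independent
# of what the parent process imported.
import importlib.util
import os
import sys
import tempfile

def import_user_module(user_dir):
    """Import the package at user_dir (must contain __init__.py)."""
    module_name = os.path.basename(os.path.normpath(user_dir))
    if module_name in sys.modules:
        return sys.modules[module_name]
    spec = importlib.util.spec_from_file_location(
        module_name, os.path.join(user_dir, "__init__.py")
    )
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module

# Demo: build a throwaway "user dir" and import it explicitly.
with tempfile.TemporaryDirectory() as tmp:
    user_dir = os.path.join(tmp, "my_plugins")
    os.makedirs(user_dir)
    with open(os.path.join(user_dir, "__init__.py"), "w") as f:
        f.write("CUSTOM_TASKS = ['my_task']\n")
    mod = import_user_module(user_dir)
    print(mod.CUSTOM_TASKS)  # ['my_task']
```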
-
Myle Ott authored
Summary: This is useful for averaging the last N checkpoints, ending at some "best" checkpoint. Pull Request resolved: https://github.com/pytorch/fairseq/pull/452 Differential Revision: D13695407 Pulled By: myleott fbshipit-source-id: 5d9d2bff3706834f01501e9259834c77fb335817
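The averaging itself is simple; this sketch uses plain dicts of floats in place of torch state dicts (fairseq's real implementation lives in `scripts/average_checkpoints.py` and operates on tensors).

```python
# Average the last N checkpoints ending at a chosen "best" checkpoint.
# Plain dicts of floats stand in for torch state dicts here.

def average_checkpoints(state_dicts):
    """Element-wise mean over parameter dicts sharing the same keys."""
    n = len(state_dicts)
    return {key: sum(sd[key] for sd in state_dicts) / n
            for key in state_dicts[0]}

# Pretend checkpoint 7 is the "best": average checkpoints 5..7 (last N=3).
checkpoints = {i: {"w": float(i), "b": 2.0 * i} for i in range(1, 11)}
best, n = 7, 3
selected = [checkpoints[i] for i in range(best - n + 1, best + 1)]
print(average_checkpoints(selected))  # {'w': 6.0, 'b': 12.0}
```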
-
Ruty Rinott authored
Summary: Optimize memory use of token_block_dataset by replacing Python data structures with numpy arrays, applying the needed parts of D13498973 instead of rebasing it on recent changes. Reviewed By: edunov Differential Revision: D13678485 fbshipit-source-id: c0c827a8b95834a6a5456476040ebdc8e42136d4
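The commit stores indices in numpy arrays; this stdlib-only sketch uses `array.array` to illustrate the same memory saving, one contiguous typed buffer versus a list of boxed int objects.

```python
# Compare the footprint of a Python list of ints against a typed buffer.
# The commit uses numpy arrays; array.array shown here makes the same point
# without a third-party dependency.
import sys
from array import array

n = 100_000
as_list = list(range(n))          # list of pointers to int objects
as_array = array("q", range(n))   # contiguous signed 64-bit buffer

list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
array_bytes = sys.getsizeof(as_array)
print(array_bytes < list_bytes)   # True: the typed buffer is far smaller
```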
-
- 15 Jan, 2019 2 commits
-
-
Davide Caroselli authored
Summary: The correct help message was obscured by the transient `ArgumentParser` used only to eagerly read the `--user-dir` flag. To reproduce just try: ```bash python3 train.py --help ``` Pull Request resolved: https://github.com/pytorch/fairseq/pull/446 Differential Revision: D13674731 Pulled By: myleott fbshipit-source-id: b9503a4d7ef26405be630d31c0ca02386d783031
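The pattern behind the fix can be sketched as follows (hypothetical function name): the transient parser is created with `add_help=False` and uses `parse_known_args()`, so it extracts `--user-dir` without swallowing `--help` or any other flag meant for the real parser.

```python
# Eagerly pre-parse --user-dir with a transient parser that must not
# intercept --help; everything else is left for the main parser.
import argparse

def pre_parse_user_dir(argv):
    transient = argparse.ArgumentParser(add_help=False)
    transient.add_argument("--user-dir", default=None)
    args, remaining = transient.parse_known_args(argv)
    return args.user_dir, remaining

user_dir, remaining = pre_parse_user_dir(
    ["--user-dir", "/tmp/plugins", "--lr", "0.1"]
)
print(user_dir)   # /tmp/plugins
print(remaining)  # ['--lr', '0.1']
```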
-
Davide Caroselli authored
Summary: Command line option --user-dir documented in docs/overview.rst Pull Request resolved: https://github.com/pytorch/fairseq/pull/447 Differential Revision: D13674744 Pulled By: myleott fbshipit-source-id: 17049ee5c9f692f5298ef9fa7381ee583f269cde
-
- 14 Jan, 2019 2 commits
-
-
Davide Caroselli authored
Summary: Following discussion on official fairseq (https://github.com/pytorch/fairseq/issues/438), I added the `--user-dir` option to the command line. The user can now specify a path in order to import a custom module with proprietary tasks, architectures and so on. Pull Request resolved: https://github.com/pytorch/fairseq/pull/440 Differential Revision: D13651721 Pulled By: myleott fbshipit-source-id: 38b87454487f1ffa5eaf19c4bcefa0b3b15a8f43
-
Huihui Fan authored
Summary: minor fixes: 1- adding fairseq logo 2- encoder padding for fconv self att 3- legacy ddp change Pull Request resolved: https://github.com/pytorch/fairseq/pull/442 Differential Revision: D13651715 Pulled By: myleott fbshipit-source-id: ac93c80f1dbffdfe03fbd4b8a8ea527aecb576a7
-
- 10 Jan, 2019 1 commit
-
-
Wei Ho authored
Summary: https://github.com/pytorch/fairseq/blob/master/fairseq/trainer.py#L164 calls `train()` without any argument Reviewed By: myleott Differential Revision: D13599203 fbshipit-source-id: 3a096a6dd35a7a3f8309fbda3b54a36f606475e3
-
- 09 Jan, 2019 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/439 Differential Revision: D13608151 Pulled By: myleott fbshipit-source-id: 198b84995a6329f8329829cc91184d88f1eab947
-
Art Matsak authored
Summary: https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset is no longer valid; it redirects to a blog post listing page. Pull Request resolved: https://github.com/pytorch/fairseq/pull/436 Differential Revision: D13607961 Pulled By: myleott fbshipit-source-id: 1a1074ffcbc454e29bc9d5aed84fdf2089a224bc
-
- 07 Jan, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/433 Differential Revision: D13588032 Pulled By: myleott fbshipit-source-id: 0e5ff361e27b206c4490264f0f51863367499e81
-
- 05 Jan, 2019 3 commits
-
-
Myle Ott authored
-
Myle Ott authored
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/283 Pull Request resolved: https://github.com/pytorch/fairseq/pull/428 Differential Revision: D13564190 Pulled By: myleott fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
-
- 28 Dec, 2018 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/425 Differential Revision: D13558340 Pulled By: myleott fbshipit-source-id: dff8c77027e821d8c80bfbd6a6ccce9ca1a44b78
-
Myle Ott authored
Summary: This was broken in 03a57dec. Pull Request resolved: https://github.com/pytorch/fairseq/pull/424 Differential Revision: D13557540 Pulled By: myleott fbshipit-source-id: 62deda5353032aff20d35d046b0bb843da44d27c
-
Paul Michel authored
Summary: BacktranslationDataset would throw an error when the underlying dataset was an IndexedCachedDataset because prefetching was not handled correctly. This fixes the error. Pull Request resolved: https://github.com/pytorch/fairseq/pull/410 Differential Revision: D13557539 Pulled By: myleott fbshipit-source-id: 398ab59a3ebdbf1c666d862b9f905654eece800c
-
- 26 Dec, 2018 2 commits
-
-
Myle Ott authored
Summary: - 04cc608: Add `--match-source-len` option to generate.py for sequence-tagging tasks - 19f1a40: Add `--no-repeat-ngram-size` option to generate.py for ngram blocking Pull Request resolved: https://github.com/pytorch/fairseq/pull/422 Differential Revision: D13548445 Pulled By: myleott fbshipit-source-id: 26d1ae83993e428fcb020dac5ae358b0e36233d9
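The ngram-blocking idea can be sketched as follows (a simplified, per-hypothesis version; fairseq applies it inside beam search over token IDs): given the tokens generated so far, compute the set of next tokens that would complete an n-gram already present in the hypothesis, and ban them.

```python
# --no-repeat-ngram-size style blocking: find next tokens that would
# repeat an n-gram already generated in this hypothesis.

def banned_next_tokens(tokens, ngram_size):
    if ngram_size <= 0 or len(tokens) < ngram_size:
        return set()
    banned = set()
    # The last (n-1) tokens form the prefix a repeated n-gram would extend.
    prefix = tuple(tokens[-(ngram_size - 1):]) if ngram_size > 1 else ()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

# "the" was already followed by "cat", so with ngram_size=2 generating
# "cat" next would repeat the bigram ("the", "cat").
tokens = ["the", "cat", "sat", "on", "the"]
print(banned_next_tokens(tokens, 2))  # {'cat'}
```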
-
Emanuele Bugliarello authored
Summary: Add argument `--no-token-positional-embeddings` to TransformerModel (currently only available in TransformerLanguageModel) to disable positional embeddings. Pull Request resolved: https://github.com/pytorch/fairseq/pull/421 Differential Revision: D13548450 Pulled By: myleott fbshipit-source-id: b352c702ed1609e3b84d9a8404941d3274a7f883
-
- 24 Dec, 2018 2 commits
-
-
Myle Ott authored
Summary: Previously when training with --fp16, we stored a copy of the model parameters in FP32 for optimization, which consumed a lot of memory. An alternative is to just do the conversions to FP32 on the fly, which allows the caching allocator to reuse/save some memory. This reduces peak memory usage by ~20% with a negligible reduction in training speed (~2% slower) when training a big transformer on 8 GPUs on wmt en-de with --update-freq=16. This does not affect convergence, i.e., models will train exactly as they did before. Pull Request resolved: https://github.com/pytorch/fairseq/pull/404 Differential Revision: D13394376 Pulled By: myleott fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf
-
Myle Ott authored
Summary: This improves performance for datasets that load data lazily. Enabled by default since it shouldn't compromise performance for non-lazy datasets. Pull Request resolved: https://github.com/pytorch/fairseq/pull/419 Differential Revision: D13546585 Pulled By: myleott fbshipit-source-id: f6152e2047291b0d68cd7506cd772b0caafe95be
-
- 18 Dec, 2018 1 commit
-
-
Haoran Li authored
Summary: Avoid loading entire data set per gpu to reduce memory footprint Reviewed By: rutyrinott Differential Revision: D13163548 fbshipit-source-id: 4ba717c8021ba5723d02225bae5782e2c3a18640
-
- 11 Dec, 2018 1 commit
-
-
Suvrat Bhooshan authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/406 Static helper function in TranslationTask to load pretrained models Reviewed By: myleott Differential Revision: D13345276 fbshipit-source-id: 3a675ee1a144ceb8b010f30e1a6163ef670b53f3
-
- 08 Dec, 2018 1 commit
-
-
Peng Li authored
Summary: The original code reports the size of a valid sample instead of the invalid one when raising an exception, which is confusing. Pull Request resolved: https://github.com/pytorch/fairseq/pull/403 Differential Revision: D13391431 Pulled By: myleott fbshipit-source-id: 4642ed027c0f664424fc5a9baf4363791144feaf
-
- 07 Dec, 2018 2 commits
-
-
Myle Ott authored
Summary: Let's only decrease the loss scale if a large enough percentage of batches overflow. Pull Request resolved: https://github.com/pytorch/fairseq/pull/397 Differential Revision: D13355159 Pulled By: myleott fbshipit-source-id: e17dde73d34a639519b4348c013fdd19d2b314e6
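The tolerance idea can be sketched like this (simplified from fairseq's dynamic loss scaler; names and defaults are illustrative): rather than halving the loss scale on every overflow, track the fraction of batches since the last rescale that overflowed and only decrease the scale when that fraction reaches a threshold.

```python
# Decrease the fp16 loss scale only when a large enough percentage of
# recent batches overflowed, instead of on every single overflow.

class DynamicLossScaler:
    def __init__(self, init_scale=2.0 ** 15, tolerance=0.05):
        self.loss_scale = init_scale
        self.tolerance = tolerance
        self._iter = 0
        self._last_rescale_iter = 0
        self._overflows_since_rescale = 0

    def update_scale(self, overflow):
        self._iter += 1
        if overflow:
            self._overflows_since_rescale += 1
            since_rescale = self._iter - self._last_rescale_iter
            pct_overflow = self._overflows_since_rescale / since_rescale
            if pct_overflow >= self.tolerance:
                self.loss_scale /= 2.0
                self._last_rescale_iter = self._iter
                self._overflows_since_rescale = 0

scaler = DynamicLossScaler(init_scale=128.0, tolerance=0.5)
scaler.update_scale(overflow=True)   # 1/1 = 100% >= 50% -> halve
print(scaler.loss_scale)             # 64.0
for _ in range(10):
    scaler.update_scale(overflow=False)
scaler.update_scale(overflow=True)   # 1/11 < 50% -> keep the scale
print(scaler.loss_scale)             # 64.0
```

A single rare overflow (e.g., one bad batch) no longer permanently shrinks the scale, while sustained overflows still trigger a decrease.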
-
Halil Akin authored
Summary: This is not a guaranteed solution (since processes may still get out of sync if OOM happens after an all_gather/all_reduce has been done) - but should still make multiprocessing training more robust in practice since it seems we usually OOM early enough. Reviewed By: myleott Differential Revision: D13086018 fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
-
- 06 Dec, 2018 4 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/400 Differential Revision: D13366996 Pulled By: myleott fbshipit-source-id: b4907815e7cc1b4a2aceab11210bf64cb3d814c9
-
Myle Ott authored
Summary: Not switching to Black formatting just yet, but adding fmt: off directives in case we decide to later. Pull Request resolved: https://github.com/pytorch/fairseq/pull/399 Differential Revision: D13364674 Pulled By: myleott fbshipit-source-id: a20a11a18be3d583ee30eff770278fb4bd05b93c
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/398 Differential Revision: D13358876 Pulled By: myleott fbshipit-source-id: 57673f2643aac01492cb8f5728bb9f1a34ba6aa7
-
Teng Li authored
Summary: As the title says, it's better to enable this for certain use cases to make sure things are right. Reviewed By: myleott, pietern Differential Revision: D13351753 fbshipit-source-id: cf495960fda71ebd679c23212e19703c93a9dbdc
-
- 04 Dec, 2018 1 commit
-
-
Myle Ott authored
Summary: This kind of issue should be rare, but the exception that was thrown before ("UnpicklingError: invalid load key") was very opaque, so let's use something a bit clearer. Pull Request resolved: https://github.com/pytorch/fairseq/pull/396 Differential Revision: D13325600 Pulled By: myleott fbshipit-source-id: 2e7093752d45d6b04a3d506aca8d5694b72ab638
-
- 30 Nov, 2018 1 commit
-
-
linkerr authored
Summary: With torch.__version__ == 0.4.0, `new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1)` returns a float dtype Tensor, so executing line 321 of fairseq/fairseq/models/fconv.py throws the RuntimeError "….LongTensor but found type torch.cuda.FloatTensor for argument #3 'index'". Pull Request resolved: https://github.com/pytorch/fairseq/pull/393 Differential Revision: D13276496 Pulled By: myleott fbshipit-source-id: e7986246fbe2c79fff61bcab0e5bec9dd63e0afd
-
- 29 Nov, 2018 2 commits
-
-
Haoran Li authored
Summary: replace dynamic index put with copying and creating a new tensor Reviewed By: wanchaol Differential Revision: D13244573 fbshipit-source-id: 909f7913ad579ed035f29bb52321ff01e09a2c60
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/388 Reviewed By: theweiho Differential Revision: D13244869 fbshipit-source-id: d22c18f63f9a691ccc7245e06bc9a5b776a192b5
-
- 27 Nov, 2018 2 commits
-
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/386 Pull Request resolved: https://github.com/pytorch/translate/pull/266 This allows decoder embedding sharing for denoising autoencoder modules with different decoders (one for src decoding and one for tgt decoding) Reviewed By: dpacgopinath Differential Revision: D13133015 fbshipit-source-id: 3c98be639d705744ccf5ba3a8fd7d10ddc7aef4a
-
Haoran Li authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/385 Pull Request resolved: https://github.com/facebookresearch/pytext/pull/6 Pull Request resolved: https://github.com/pytorch/pytorch/pull/14292 Reviewed By: jingfeidu Differential Revision: D10517864 fbshipit-source-id: 81008b5cc6aab70e23329c187392fb72ee057d78
-
- 26 Nov, 2018 2 commits
-
-
Myle Ott authored
Fix some recursive functions (e.g., reorder_incremental_state) to only touch each module once (#379) Summary: This can happen if a module is registered in more than one place in the network. Pull Request resolved: https://github.com/pytorch/fairseq/pull/379 Differential Revision: D13154498 Pulled By: myleott fbshipit-source-id: a35575d1956a46cd35ac8b16a719ad20ac3e380a
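The fix's pattern can be sketched with plain Python objects (hypothetical `Module` class, not torch's): walk the network recursively but track visited modules by `id()`, so a module registered in two places is only touched once.

```python
# Apply fn to every module in a network exactly once, even when the same
# module object is registered under multiple parents (e.g., a shared layer).

def apply_once(module, fn, seen=None):
    if seen is None:
        seen = set()
    if id(module) in seen:
        return
    seen.add(id(module))
    fn(module)
    for child in module.children:
        apply_once(child, fn, seen)

class Module:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

shared = Module("shared_layer")
net = Module("model", [Module("encoder", [shared]),
                       Module("decoder", [shared])])

calls = []
apply_once(net, lambda m: calls.append(m.name))
print(calls)  # ['model', 'encoder', 'shared_layer', 'decoder']
```

Without the `seen` set, `shared_layer` would be visited twice, which is harmless for idempotent functions but wrong for stateful ones like `reorder_incremental_state`.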
-
Myle Ott authored
Summary: - generalize AppendEosDataset -> TransformEosDataset - remove EOS logic from BacktranslationDataset (use TransformEosDataset instead) - BacktranslationDataset takes a backtranslation_fn instead of building the SequenceGenerator itself Pull Request resolved: https://github.com/pytorch/fairseq/pull/354 Reviewed By: liezl200 Differential Revision: D12970233 Pulled By: myleott fbshipit-source-id: d5c5b0e0a75eca1bd3a50382ac24621f35c32f36
-