- 16 Jan, 2019 3 commits
Davide Caroselli authored
Summary: In a multi-GPU training scenario, the `train.py` script spawns new processes with `torch.multiprocessing.spawn`. Unfortunately, those child processes don't inherit the modules imported with `--user-dir`. This pull request fixes the problem: the custom module import is now explicit in every `main()` function. Pull Request resolved: https://github.com/pytorch/fairseq/pull/449 Differential Revision: D13676922 Pulled By: myleott fbshipit-source-id: 520358d66155697885b878a37e7d0484bddbc1c6
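The fix described above can be sketched as follows. This is a hedged illustration, not fairseq's actual code: `import_user_module`, the module name `user_module`, and `distributed_main` are hypothetical stand-ins for the real `--user-dir` handling.

```python
import importlib
import sys

def import_user_module(user_dir):
    # Make the user's directory importable and load the custom module.
    # (Hypothetical names; a simplified stand-in for the real flag handling.)
    if user_dir is not None:
        if user_dir not in sys.path:
            sys.path.insert(0, user_dir)
        importlib.import_module("user_module")

def distributed_main(rank, args):
    # Entry point passed to torch.multiprocessing.spawn. Spawned children
    # start from a fresh interpreter, so the import must be repeated here
    # explicitly rather than relying on the parent's imports.
    import_user_module(args.user_dir)
    # ... training loop would follow ...
```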
Myle Ott authored
Summary: This is useful for averaging the last N checkpoints, ending at some "best" checkpoint. Pull Request resolved: https://github.com/pytorch/fairseq/pull/452 Differential Revision: D13695407 Pulled By: myleott fbshipit-source-id: 5d9d2bff3706834f01501e9259834c77fb335817
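The core of checkpoint averaging can be sketched in a few lines. This is a simplified illustration of what `scripts/average_checkpoints.py` does: real checkpoints hold torch tensors, while here parameters are plain lists of floats.

```python
def average_checkpoints(state_dicts):
    # Average each parameter element-wise across N checkpoints.
    n = len(state_dicts)
    averaged = {}
    for key in state_dicts[0]:
        columns = zip(*(sd[key] for sd in state_dicts))
        averaged[key] = [sum(vals) / n for vals in columns]
    return averaged

# With this commit, the last N checkpoints *ending at* a chosen "best"
# checkpoint can be selected before being passed to a function like this.
```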
Ruty Rinott authored
Summary: Optimize memory use of token_block_dataset by replacing Python data structures with numpy arrays. Applies the needed parts of D13498973 instead of rebasing it onto these changes. Reviewed By: edunov Differential Revision: D13678485 fbshipit-source-id: c0c827a8b95834a6a5456476040ebdc8e42136d4
- 15 Jan, 2019 2 commits
Davide Caroselli authored
Summary: The correct help message was obscured by the transient `ArgumentParser` used only to eagerly read the `--user-dir` flag. To reproduce, try: ```bash python3 train.py --help ``` Pull Request resolved: https://github.com/pytorch/fairseq/pull/446 Differential Revision: D13674731 Pulled By: myleott fbshipit-source-id: b9503a4d7ef26405be630d31c0ca02386d783031
Davide Caroselli authored
Summary: Document the command-line option --user-dir in docs/overview.rst Pull Request resolved: https://github.com/pytorch/fairseq/pull/447 Differential Revision: D13674744 Pulled By: myleott fbshipit-source-id: 17049ee5c9f692f5298ef9fa7381ee583f269cde
- 14 Jan, 2019 2 commits
Davide Caroselli authored
Summary: Following discussion on official fairseq (https://github.com/pytorch/fairseq/issues/438), I added the `--user-dir` option to the command line. The user can now specify a path in order to import a custom module with proprietary tasks, architectures and so on. Pull Request resolved: https://github.com/pytorch/fairseq/pull/440 Differential Revision: D13651721 Pulled By: myleott fbshipit-source-id: 38b87454487f1ffa5eaf19c4bcefa0b3b15a8f43
Huihui Fan authored
Summary: Minor fixes: (1) add fairseq logo; (2) encoder padding for fconv self-attention; (3) legacy DDP change. Pull Request resolved: https://github.com/pytorch/fairseq/pull/442 Differential Revision: D13651715 Pulled By: myleott fbshipit-source-id: ac93c80f1dbffdfe03fbd4b8a8ea527aecb576a7
- 10 Jan, 2019 1 commit
Wei Ho authored
Summary: https://github.com/pytorch/fairseq/blob/master/fairseq/trainer.py#L164 calls `train()` without any argument Reviewed By: myleott Differential Revision: D13599203 fbshipit-source-id: 3a096a6dd35a7a3f8309fbda3b54a36f606475e3
- 09 Jan, 2019 2 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/439 Differential Revision: D13608151 Pulled By: myleott fbshipit-source-id: 198b84995a6329f8329829cc91184d88f1eab947
Art Matsak authored
Summary: https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset is no longer valid; it redirects to a blog post listing page. Pull Request resolved: https://github.com/pytorch/fairseq/pull/436 Differential Revision: D13607961 Pulled By: myleott fbshipit-source-id: 1a1074ffcbc454e29bc9d5aed84fdf2089a224bc
- 07 Jan, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/433 Differential Revision: D13588032 Pulled By: myleott fbshipit-source-id: 0e5ff361e27b206c4490264f0f51863367499e81
- 05 Jan, 2019 3 commits
Myle Ott authored
Myle Ott authored
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/283 Pull Request resolved: https://github.com/pytorch/fairseq/pull/428 Differential Revision: D13564190 Pulled By: myleott fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
- 28 Dec, 2018 3 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/425 Differential Revision: D13558340 Pulled By: myleott fbshipit-source-id: dff8c77027e821d8c80bfbd6a6ccce9ca1a44b78
Myle Ott authored
Summary: This was broken in 03a57dec. Pull Request resolved: https://github.com/pytorch/fairseq/pull/424 Differential Revision: D13557540 Pulled By: myleott fbshipit-source-id: 62deda5353032aff20d35d046b0bb843da44d27c
Paul Michel authored
Summary: BacktranslationDataset would throw an error when the underlying dataset was an IndexedCachedDataset because prefetching was not handled correctly. This fixes the error. Pull Request resolved: https://github.com/pytorch/fairseq/pull/410 Differential Revision: D13557539 Pulled By: myleott fbshipit-source-id: 398ab59a3ebdbf1c666d862b9f905654eece800c
- 26 Dec, 2018 2 commits
Myle Ott authored
Summary: - 04cc608: Add `--match-source-len` option to generate.py for sequence-tagging tasks - 19f1a40: Add `--no-repeat-ngram-size` option to generate.py for ngram blocking Pull Request resolved: https://github.com/pytorch/fairseq/pull/422 Differential Revision: D13548445 Pulled By: myleott fbshipit-source-id: 26d1ae83993e428fcb020dac5ae358b0e36233d9
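The ngram-blocking idea behind `--no-repeat-ngram-size` can be sketched as follows. This is a hedged, list-based illustration (the function name is hypothetical); the real implementation works on batched beam hypotheses.

```python
def banned_next_tokens(tokens, no_repeat_ngram_size):
    # Return the set of next tokens that would complete an n-gram
    # already present in `tokens`, so the decoder can ban them.
    n = no_repeat_ngram_size
    if n == 0 or len(tokens) < n - 1:
        return set()
    prefix = tuple(tokens[len(tokens) - (n - 1):])  # last n-1 generated tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned
```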
Emanuele Bugliarello authored
Summary: Add argument `--no-token-positional-embeddings` to TransformerModel (currently only available in TransformerLanguageModel) to disable positional embeddings. Pull Request resolved: https://github.com/pytorch/fairseq/pull/421 Differential Revision: D13548450 Pulled By: myleott fbshipit-source-id: b352c702ed1609e3b84d9a8404941d3274a7f883
- 24 Dec, 2018 2 commits
Myle Ott authored
Summary: Previously when training with --fp16, we stored a copy of the model parameters in FP32 for optimization, which consumed a lot of memory. An alternative is to just do the conversions to FP32 on the fly, which allows the caching allocator to reuse/save some memory. This reduces peak memory usage by ~20% with a negligible reduction in training speed (~2% slower) when training a big transformer on 8 GPUs on wmt en-de with --update-freq=16. This does not affect convergence, i.e., models will train exactly as they did before. Pull Request resolved: https://github.com/pytorch/fairseq/pull/404 Differential Revision: D13394376 Pulled By: myleott fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf
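The memory trade-off described above can be illustrated schematically. This is not fairseq's code: `to_low_precision` fakes an FP16 cast by rounding, purely to show the pattern of upcasting transiently at the optimizer step instead of keeping a persistent FP32 master copy.

```python
def to_low_precision(x):
    # Stand-in for an FP16 cast (rounding plays the role of precision loss).
    return round(x, 3)

def fp16_sgd_step(params16, grads16, lr):
    # No persistent FP32 master copy: each parameter is upcast on the fly,
    # updated in high precision, then cast back down. The transient copies
    # can be freed (or reused by a caching allocator) immediately.
    updated = []
    for p, g in zip(params16, grads16):
        p32 = float(p)        # transient high-precision copy
        p32 -= lr * float(g)  # update in high precision
        updated.append(to_low_precision(p32))
    return updated
```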
Myle Ott authored
Summary: This improves performance for datasets that load data lazily. Enabled by default since it shouldn't compromise performance for non-lazy datasets. Pull Request resolved: https://github.com/pytorch/fairseq/pull/419 Differential Revision: D13546585 Pulled By: myleott fbshipit-source-id: f6152e2047291b0d68cd7506cd772b0caafe95be
- 18 Dec, 2018 1 commit
Haoran Li authored
Summary: Avoid loading the entire dataset on each GPU, to reduce the memory footprint Reviewed By: rutyrinott Differential Revision: D13163548 fbshipit-source-id: 4ba717c8021ba5723d02225bae5782e2c3a18640
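One common way to avoid loading the full dataset on every GPU is for each worker to read only its own shard. The sketch below (hypothetical names, not the code from this diff) illustrates the strided-sharding idea.

```python
def shard_indices(num_examples, num_shards, shard_id):
    # Each of `num_shards` workers keeps a disjoint, strided subset of
    # example indices, so no worker materializes the whole dataset.
    return list(range(shard_id, num_examples, num_shards))
```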
- 11 Dec, 2018 1 commit
Suvrat Bhooshan authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/406 Static helper function in TranslationTask to load pretrained models Reviewed By: myleott Differential Revision: D13345276 fbshipit-source-id: 3a675ee1a144ceb8b010f30e1a6163ef670b53f3
- 08 Dec, 2018 1 commit
Peng Li authored
Summary: The original code reports the size of a valid sample instead of the invalid one when raising an exception, which is confusing. Pull Request resolved: https://github.com/pytorch/fairseq/pull/403 Differential Revision: D13391431 Pulled By: myleott fbshipit-source-id: 4642ed027c0f664424fc5a9baf4363791144feaf
- 07 Dec, 2018 2 commits
Myle Ott authored
Summary: Let's only decrease the loss scale if a large enough percentage of batches overflow. Pull Request resolved: https://github.com/pytorch/fairseq/pull/397 Differential Revision: D13355159 Pulled By: myleott fbshipit-source-id: e17dde73d34a639519b4348c013fdd19d2b314e6
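The overflow-tolerance idea can be sketched as below; the parameter names and windowed bookkeeping are hypothetical simplifications, not fairseq's exact `DynamicLossScaler` API.

```python
def update_loss_scale(scale, recent_overflows, tolerance=0.05, factor=2.0):
    # Decrease the loss scale only when the fraction of recent batches
    # that overflowed exceeds `tolerance`, instead of on every overflow.
    pct_overflow = sum(recent_overflows) / len(recent_overflows)
    if pct_overflow > tolerance:
        return scale / factor
    return scale
```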
Halil Akin authored
Summary: This is not a guaranteed solution (since processes may still get out of sync if OOM happens after an all_gather/all_reduce has been done) - but should still make multiprocessing training more robust in practice since it seems we usually OOM early enough. Reviewed By: myleott Differential Revision: D13086018 fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
- 06 Dec, 2018 4 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/400 Differential Revision: D13366996 Pulled By: myleott fbshipit-source-id: b4907815e7cc1b4a2aceab11210bf64cb3d814c9
Myle Ott authored
Summary: Not switching to Black formatting just yet, but adding fmt: off directives in case we decide to later. Pull Request resolved: https://github.com/pytorch/fairseq/pull/399 Differential Revision: D13364674 Pulled By: myleott fbshipit-source-id: a20a11a18be3d583ee30eff770278fb4bd05b93c
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/398 Differential Revision: D13358876 Pulled By: myleott fbshipit-source-id: 57673f2643aac01492cb8f5728bb9f1a34ba6aa7
Teng Li authored
Summary: As the title says, it's better to enable this for certain use cases to make sure things are right. Reviewed By: myleott, pietern Differential Revision: D13351753 fbshipit-source-id: cf495960fda71ebd679c23212e19703c93a9dbdc
- 04 Dec, 2018 1 commit
Myle Ott authored
Summary: This kind of issue should be rare, but the exception that was thrown before ("UnpicklingError: invalid load key") was very opaque, so let's use something a bit clearer. Pull Request resolved: https://github.com/pytorch/fairseq/pull/396 Differential Revision: D13325600 Pulled By: myleott fbshipit-source-id: 2e7093752d45d6b04a3d506aca8d5694b72ab638
- 30 Nov, 2018 1 commit
linkerr authored
Summary: Fixes a RuntimeError ("….LongTensor but found type torch.cuda.FloatTensor for argument #3 'index'") on torch.__version__ == 0.4.0: `new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1)` returns a float-dtype tensor, so executing line 321 of fairseq/fairseq/models/fconv.py throws the error. Pull Request resolved: https://github.com/pytorch/fairseq/pull/393 Differential Revision: D13276496 Pulled By: myleott fbshipit-source-id: e7986246fbe2c79fff61bcab0e5bec9dd63e0afd
- 29 Nov, 2018 2 commits
Haoran Li authored
Summary: replace dynamic index put with copying and creating a new tensor Reviewed By: wanchaol Differential Revision: D13244573 fbshipit-source-id: 909f7913ad579ed035f29bb52321ff01e09a2c60
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/388 Reviewed By: theweiho Differential Revision: D13244869 fbshipit-source-id: d22c18f63f9a691ccc7245e06bc9a5b776a192b5
- 27 Nov, 2018 2 commits
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/386 Pull Request resolved: https://github.com/pytorch/translate/pull/266 This allows decoder embedding sharing for denoising autoencoder modules with different decoders (one for src decoding and one for tgt decoding) Reviewed By: dpacgopinath Differential Revision: D13133015 fbshipit-source-id: 3c98be639d705744ccf5ba3a8fd7d10ddc7aef4a
Haoran Li authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/385 Pull Request resolved: https://github.com/facebookresearch/pytext/pull/6 Pull Request resolved: https://github.com/pytorch/pytorch/pull/14292 Reviewed By: jingfeidu Differential Revision: D10517864 fbshipit-source-id: 81008b5cc6aab70e23329c187392fb72ee057d78
- 26 Nov, 2018 2 commits
Myle Ott authored
Fix some recursive functions (e.g., reorder_incremental_state) to only touch each module once (#379) Summary: A module can otherwise be visited more than once if it is registered in more than one place in the network. Pull Request resolved: https://github.com/pytorch/fairseq/pull/379 Differential Revision: D13154498 Pulled By: myleott fbshipit-source-id: a35575d1956a46cd35ac8b16a719ad20ac3e380a
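The fix can be illustrated with a seen-set guard. This is a hedged sketch (names hypothetical), with `children()` standing in for torch's `nn.Module` child traversal.

```python
def apply_once(module, fn, _seen=None):
    # Visit each unique module object exactly once, even when the same
    # module is registered in more than one place in the network.
    if _seen is None:
        _seen = set()
    if id(module) in _seen:
        return
    _seen.add(id(module))
    fn(module)
    for child in module.children():
        apply_once(child, fn, _seen)
```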
Myle Ott authored
Summary: - generalize AppendEosDataset -> TransformEosDataset - remove EOS logic from BacktranslationDataset (use TransformEosDataset instead) - BacktranslationDataset takes a backtranslation_fn instead of building the SequenceGenerator itself Pull Request resolved: https://github.com/pytorch/fairseq/pull/354 Reviewed By: liezl200 Differential Revision: D12970233 Pulled By: myleott fbshipit-source-id: d5c5b0e0a75eca1bd3a50382ac24621f35c32f36
- 19 Nov, 2018 1 commit
Halil Akin authored
Summary: Fixing some distributed failures that happen when OOMs are observed. Reviewed By: myleott Differential Revision: D13121054 fbshipit-source-id: f71a0a695332acbaa1797e89887b8b7c7ddaa727
- 18 Nov, 2018 1 commit
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/374 Differential Revision: D13116074 Pulled By: myleott fbshipit-source-id: 485724cc5a40e8360d21e4bf9c35821baa0ddc57