- 11 Dec, 2018 1 commit
-
-
Suvrat Bhooshan authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/406 Static helper function in TranslationTask to load pretrained models Reviewed By: myleott Differential Revision: D13345276 fbshipit-source-id: 3a675ee1a144ceb8b010f30e1a6163ef670b53f3
-
- 08 Dec, 2018 1 commit
-
-
Peng Li authored
Summary: The original code reports the size of a valid sample instead of the invalid one when raising an exception, which is confusing. Pull Request resolved: https://github.com/pytorch/fairseq/pull/403 Differential Revision: D13391431 Pulled By: myleott fbshipit-source-id: 4642ed027c0f664424fc5a9baf4363791144feaf
-
- 07 Dec, 2018 2 commits
-
-
Myle Ott authored
Summary: Let's only decrease the loss scale if a large enough percentage of batches overflow. Pull Request resolved: https://github.com/pytorch/fairseq/pull/397 Differential Revision: D13355159 Pulled By: myleott fbshipit-source-id: e17dde73d34a639519b4348c013fdd19d2b314e6
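The idea can be sketched as follows (a minimal, hypothetical `LossScaler`; fairseq's actual FP16 optimizer differs): halve the loss scale only when the fraction of overflowing batches in a recent window exceeds a tolerance, instead of on every overflow.

```python
class LossScaler:
    """Sketch: halve the loss scale only if enough recent batches overflowed."""

    def __init__(self, init_scale=128.0, tolerance=0.05, window=100):
        self.loss_scale = init_scale
        self.tolerance = tolerance   # max fraction of overflows before rescaling
        self.window = window         # number of recent batches to consider
        self._overflows = []         # 1 if the batch overflowed, else 0

    def update(self, overflowed):
        self._overflows.append(1 if overflowed else 0)
        recent = self._overflows[-self.window:]
        if sum(recent) / len(recent) > self.tolerance:
            self.loss_scale /= 2
            self._overflows.clear()  # start a fresh window after rescaling
        return self.loss_scale
```

A single spurious overflow no longer shrinks the scale; only a sustained run of overflows does.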
-
Halil Akin authored
Summary: This is not a guaranteed solution (since processes may still get out of sync if OOM happens after an all_gather/all_reduce has been done) - but should still make multiprocessing training more robust in practice since it seems we usually OOM early enough. Reviewed By: myleott Differential Revision: D13086018 fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
-
- 06 Dec, 2018 4 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/400 Differential Revision: D13366996 Pulled By: myleott fbshipit-source-id: b4907815e7cc1b4a2aceab11210bf64cb3d814c9
-
Myle Ott authored
Summary: Not switching to Black formatting just yet, but adding fmt: off directives in case we decide to later. Pull Request resolved: https://github.com/pytorch/fairseq/pull/399 Differential Revision: D13364674 Pulled By: myleott fbshipit-source-id: a20a11a18be3d583ee30eff770278fb4bd05b93c
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/398 Differential Revision: D13358876 Pulled By: myleott fbshipit-source-id: 57673f2643aac01492cb8f5728bb9f1a34ba6aa7
-
Teng Li authored
Summary: As the title says, it is better to enable this for certain use cases to make sure things are right Reviewed By: myleott, pietern Differential Revision: D13351753 fbshipit-source-id: cf495960fda71ebd679c23212e19703c93a9dbdc
-
- 04 Dec, 2018 1 commit
-
-
Myle Ott authored
Summary: This kind of issue should be rare, but the exception that was thrown before ("UnpicklingError: invalid load key") was very opaque, so let's use something a bit clearer. Pull Request resolved: https://github.com/pytorch/fairseq/pull/396 Differential Revision: D13325600 Pulled By: myleott fbshipit-source-id: 2e7093752d45d6b04a3d506aca8d5694b72ab638
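The fix described can be sketched like this (a hypothetical `load_checkpoint` helper using plain `pickle` in place of `torch.load`): catch the opaque unpickling error and re-raise with a message that points at the actual problem.

```python
import pickle

def load_checkpoint(path):
    """Sketch: re-raise opaque unpickling errors with a clearer message."""
    with open(path, "rb") as f:
        try:
            return pickle.load(f)
        except pickle.UnpicklingError as e:
            raise Exception(
                "Cannot load {}: the checkpoint file appears to be "
                "corrupted or truncated ({})".format(path, e)
            )
```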
-
- 30 Nov, 2018 1 commit
-
-
linkerr authored
Summary: Fixes a "….LongTensor but found type torch.cuda.FloatTensor for argument #3 'index'" error with torch.__version__ == 0.4.0: new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1) returns a float-dtype tensor, so executing line 321 of fairseq/fairseq/models/fconv.py throws a RuntimeError. Pull Request resolved: https://github.com/pytorch/fairseq/pull/393 Differential Revision: D13276496 Pulled By: myleott fbshipit-source-id: e7986246fbe2c79fff61bcab0e5bec9dd63e0afd
-
- 29 Nov, 2018 2 commits
-
-
Haoran Li authored
Summary: replace dynamic index put with copying and creating a new tensor Reviewed By: wanchaol Differential Revision: D13244573 fbshipit-source-id: 909f7913ad579ed035f29bb52321ff01e09a2c60
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/388 Reviewed By: theweiho Differential Revision: D13244869 fbshipit-source-id: d22c18f63f9a691ccc7245e06bc9a5b776a192b5
-
- 27 Nov, 2018 2 commits
-
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/386 Pull Request resolved: https://github.com/pytorch/translate/pull/266 This allows decoder embedding sharing for denoising autoencoder modules with different decoders (one for src decoding and one for tgt decoding) Reviewed By: dpacgopinath Differential Revision: D13133015 fbshipit-source-id: 3c98be639d705744ccf5ba3a8fd7d10ddc7aef4a
-
Haoran Li authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/385 Pull Request resolved: https://github.com/facebookresearch/pytext/pull/6 Pull Request resolved: https://github.com/pytorch/pytorch/pull/14292 Reviewed By: jingfeidu Differential Revision: D10517864 fbshipit-source-id: 81008b5cc6aab70e23329c187392fb72ee057d78
-
- 26 Nov, 2018 2 commits
-
-
Myle Ott authored
Fix some recursive functions (e.g., reorder_incremental_state) to only touch each module once (#379) Summary: This can happen if a module is registered in more than one place in the network. Pull Request resolved: https://github.com/pytorch/fairseq/pull/379 Differential Revision: D13154498 Pulled By: myleott fbshipit-source-id: a35575d1956a46cd35ac8b16a719ad20ac3e380a
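The fix amounts to tracking visited modules while recursing. A minimal sketch (hypothetical `apply_once` helper with duck-typed `children()`, not fairseq's actual code):

```python
def apply_once(module, fn, seen=None):
    """Sketch: call fn on each unique sub-module exactly once, even when
    the same module object is registered in more than one place."""
    if seen is None:
        seen = set()
    if id(module) in seen:
        return
    seen.add(id(module))
    fn(module)
    for child in getattr(module, "children", lambda: [])():
        apply_once(child, fn, seen)
```

Without the `seen` set, a shared sub-module would have its incremental state reordered twice, corrupting it.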
-
Myle Ott authored
Summary:
- generalize AppendEosDataset -> TransformEosDataset
- remove EOS logic from BacktranslationDataset (use TransformEosDataset instead)
- BacktranslationDataset takes a backtranslation_fn instead of building the SequenceGenerator itself
Pull Request resolved: https://github.com/pytorch/fairseq/pull/354 Reviewed By: liezl200 Differential Revision: D12970233 Pulled By: myleott fbshipit-source-id: d5c5b0e0a75eca1bd3a50382ac24621f35c32f36
-
- 19 Nov, 2018 1 commit
-
-
Halil Akin authored
Summary: Fixing some distributed failures that happen when OOMs are observed. Reviewed By: myleott Differential Revision: D13121054 fbshipit-source-id: f71a0a695332acbaa1797e89887b8b7c7ddaa727
-
- 18 Nov, 2018 2 commits
-
-
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/374 Differential Revision: D13116074 Pulled By: myleott fbshipit-source-id: 485724cc5a40e8360d21e4bf9c35821baa0ddc57
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/372 Differential Revision: D13114426 Pulled By: myleott fbshipit-source-id: 6c24b96a3556a0ecd3d1f350642a884254a40bd3
-
- 17 Nov, 2018 1 commit
-
-
Myle Ott authored
Summary: This should bring back the speedup with --update-freq that we reported in the Scaling Neural Machine Translation paper. Pull Request resolved: https://github.com/pytorch/fairseq/pull/370 Differential Revision: D13100281 Pulled By: myleott fbshipit-source-id: 4a81b51bb7390a197add314a4be5512bbf68c085
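The speedup comes from accumulating gradients locally across the batches of one update and communicating only once. A rough sketch of the pattern (all callables are hypothetical stand-ins for the trainer's internals, with gradients modeled as plain lists):

```python
def train_step(batches, compute_grads, all_reduce_grads, apply_update):
    """Sketch: with --update-freq N, accumulate gradients over N batches
    and all-reduce them only once, just before the parameter update."""
    accumulated = None
    for batch in batches:
        grads = compute_grads(batch)
        if accumulated is None:
            accumulated = grads
        else:
            accumulated = [a + g for a, g in zip(accumulated, grads)]
    all_reduce_grads(accumulated)  # single communication per update
    apply_update(accumulated)
    return accumulated
```

Syncing once per update instead of once per batch is what restores the throughput reported in the Scaling Neural Machine Translation paper.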
-
- 16 Nov, 2018 1 commit
-
-
Haoran Li authored
Reviewed By: jingfeidu Differential Revision: D13104360 fbshipit-source-id: 9636f5ee2721818f98b33af559fa24292534a72f
-
- 14 Nov, 2018 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/366 Differential Revision: D13058513 Pulled By: myleott fbshipit-source-id: a146d2cfb345d404775ed8d6b8e4a4ad4e7a33b4
-
- 13 Nov, 2018 1 commit
-
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/362 Pull Request resolved: https://github.com/pytorch/translate/pull/254 This actually uses the fairseq logic which supports BPE cont / end word marker suffixes. Reviewed By: xianxl Differential Revision: D12952766 fbshipit-source-id: 35a1bbc38240e4145bec0fc419f2d0a6a73ae2e5
-
- 10 Nov, 2018 1 commit
-
-
Ruty Rinott authored
Summary: Step 2 of the pipeline for LM training. It assumes tokenized text data as input, splits it into train/validation/test, and runs binarization (step a_ii in https://fb.quip.com/kazzAxvZHBj9). Reviewed By: borguz Differential Revision: D10454705 fbshipit-source-id: 74e8679041f5507c4e404c1b719547c2ae9ed983
-
- 08 Nov, 2018 1 commit
-
-
Peng-Jen Chen authored
Summary: D10052908 introduced the multilingual_translation task, but it raised an exception when training with multiple GPUs: P60202593. With Myle's help, we found that the cause was an improperly handled dummy batch data type, which made optimizer.backward() execute a different number of times across GPUs. Reviewed By: xianxl Differential Revision: D12964263 fbshipit-source-id: 4991039030bf373f0c484e131acc4736487be4d8
-
- 07 Nov, 2018 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/352 Differential Revision: D12956930 Pulled By: myleott fbshipit-source-id: 39334a79544bac570feb04be9103269d7c1563f9
-
Liezl Puzon authored
Summary: There are 2 ways to implement BPE: 1. use a continuation marker suffix to indicate that there is at least one more subtoken left in the word 2. use an end-of-word marker suffix to indicate that there are no more subtokens left in the word. This diff adds logic to account for either kind of BPE marker suffix, along with a corresponding test. I also refactored the test setup to reduce the number of boolean args when setting up test data. Reviewed By: xianxl Differential Revision: D12919428 fbshipit-source-id: 405e9f346dce6e736c1305288721dfc7b63e872a
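The two conventions can be illustrated with a small sketch (hypothetical `word_ends` helper; "@@" and "</w>" are commonly used markers, assumed here rather than taken from fairseq):

```python
def word_ends(tokens, cont_marker="@@", end_marker=None):
    """Sketch: mark which subword tokens end a word under either BPE
    convention: a continuation-marker suffix, or an end-of-word suffix."""
    ends = []
    for tok in tokens:
        if end_marker is not None:
            ends.append(tok.endswith(end_marker))       # end-of-word marker
        else:
            ends.append(not tok.endswith(cont_marker))  # continuation marker
    return ends
```

Both calls below segment the same two words, just signaled by opposite conventions.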
-
- 02 Nov, 2018 2 commits
-
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/340 This allows us to do a lot less copy paste when adding new word shuffle function tests Reviewed By: xianxl Differential Revision: D12810304 fbshipit-source-id: a56b5df093d17be2b73837897c526978cab92b70
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/341 Use black formatting in test_noising.py Reviewed By: xianxl Differential Revision: D12810285 fbshipit-source-id: 5517dd5d2f086831f487d88acf6bc2fa18820297
-
- 01 Nov, 2018 6 commits
-
-
ngimel authored
Summary: Currently, if `ignore-case` is set, the same line will be yielded twice - once as lower-cased version, once as original version, leading to lower than expected uncased scores. Pull Request resolved: https://github.com/pytorch/fairseq/pull/339 Differential Revision: D12890386 Pulled By: myleott fbshipit-source-id: 0570e5f6e8f848f2c6439d615e70aca6df097eef
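The fixed behavior can be sketched as a small generator (hypothetical `read_lines`, not the actual scoring code): each line is yielded exactly once, lower-cased when the flag is set, whereas the bug yielded both the lower-cased and the original version.

```python
def read_lines(lines, ignore_case=False):
    """Sketch: yield each line exactly once, lower-cased if requested."""
    for line in lines:
        if ignore_case:
            line = line.lower()
        yield line
```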
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/337 Pull Request resolved: https://github.com/pytorch/translate/pull/250 Reviewed By: akinh Differential Revision: D12880352 fbshipit-source-id: 61e9888a9cc3df07e805820b74a5fcf359dfe0ea
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/251 We should use a shared encoder and separate decoders as in: https://fb.facebook.com/groups/2156114531381111/permalink/2169028113423086/ Generation is a hack; ideally the net input would include the language-pair info so that when we pass the sample to the model, it can select the correct encoder/decoder pair. Diff [2/2] will cover flow integration for basic experimentation. TODO in a future diff: figure out how to generalize this so export will work. This works with vocab reduction, but we only support vocab reduction for the src-tgt model, not the src-src model. A future (low-priority) task could be to add word prediction vocab reduction for the src-src model to speed up training. Reviewed By: xianxl Differential Revision: D10512576 fbshipit-source-id: 545d96cad8e814b9da7be102a48cc5cac358b758
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/336 Differential Revision: D12876709 Pulled By: myleott fbshipit-source-id: a31536e2eb93f752600b9940c28e9b9fcefc8b86
-
Zihao Fu authored
Summary: Modify the BLEU error message. Fixes https://github.com/pytorch/fairseq/issues/284 Pull Request resolved: https://github.com/pytorch/fairseq/pull/320 Differential Revision: D12876721 Pulled By: myleott fbshipit-source-id: df25885a94a584cbf4b86a1665e3e513c7eb8e9a
-
John Pope authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/290 Differential Revision: D12876759 Pulled By: myleott fbshipit-source-id: 9f6d1c9de27dad29368a7edb923dfcf770355938
-
- 30 Oct, 2018 1 commit
-
-
James Cross authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/333 A tiny hack to speed up inference slightly for transformer beam search after export to graph mode. Specifically, there is no need to transpose a dimension with size 1 (the sequence length of a single decoder time step during beam search) with its neighbor immediately before a view/reshape. Reviewed By: jmp84 Differential Revision: D12833011 fbshipit-source-id: f9c344a9ad595e6e48a8a65b31cf2b1392f9b938
-
- 27 Oct, 2018 1 commit
-
-
Xian Li authored
Summary: We'd like to reuse the noising functions and DenoisingDataset in adversarial training. However, the current noising functions assume the input is subword tokens. The goal of this diff is to extend them so the noising can also be applied to word tokens. Since we're mostly interested in word shuffle noising, I only modified the WordShuffle class. Reviewed By: liezl200 Differential Revision: D10523177 fbshipit-source-id: 1e5d27362850675010e73cd38850c890d42652ab
-
- 26 Oct, 2018 1 commit
-
-
Wei Ho authored
Summary: Fix fairseq's `force` option for disabling print suppression (otherwise, `print(..., force=True)` fails on master since the force kwarg gets passed to the builtin print). Reviewed By: dpacgopinath Differential Revision: D10522058 fbshipit-source-id: bbc10c021a7d21396ebfbb1bf007f6b9b162f4fd
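The fix can be sketched with a wrapper factory (hypothetical `make_print`; fairseq actually patches `builtins.print` for non-master workers): the `force` kwarg must be popped before delegating, since passing it through crashes the builtin print.

```python
def make_print(real_print, suppress=True):
    """Sketch: drop output on suppressed workers unless force=True,
    removing 'force' from kwargs before calling the real print."""
    def print_(*args, **kwargs):
        force = kwargs.pop("force", False)
        if not suppress or force:
            real_print(*args, **kwargs)
    return print_
```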
-
- 25 Oct, 2018 2 commits
-
-
Deepak Gopinath authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/330 As part of the semi-supervised task setup (https://github.com/pytorch/translate/pull/243), this diff adds the ability for LanguagePairDataset to remove EOS from the source or append EOS to the target. This functionality is required by BacktranslationDataset to use translations as source data. Also added changes to BacktranslationDataset to make it work on GPU: we needed to transfer back-translated sentences back to CPU for the LanguagePairDataset to collate. Reviewed By: liezl200 Differential Revision: D10846294 fbshipit-source-id: b015ecb5fcef26fba507c30f8a4992bdbc54899f
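The EOS handling can be sketched on plain token lists (hypothetical `adjust_eos` helper and flag names; the real dataset operates on tensors):

```python
def adjust_eos(tokens, eos, remove_source_eos=False, append_target_eos=False):
    """Sketch: strip a trailing EOS from source tokens or append one to
    target tokens, as needed when translations are reused as source data."""
    tokens = list(tokens)
    if remove_source_eos and tokens and tokens[-1] == eos:
        tokens = tokens[:-1]
    if append_target_eos and (not tokens or tokens[-1] != eos):
        tokens.append(eos)
    return tokens
```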
-
Haoran Li authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/321 Reviewed By: alexeib Differential Revision: D10430186 fbshipit-source-id: 9cc8fe0f202cc49370cecf36312bcc9bf0b4deee
-