- 13 Nov, 2018 1 commit
-
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/362 Pull Request resolved: https://github.com/pytorch/translate/pull/254 This actually uses the fairseq logic, which supports BPE continuation / end-of-word marker suffixes. Reviewed By: xianxl Differential Revision: D12952766 fbshipit-source-id: 35a1bbc38240e4145bec0fc419f2d0a6a73ae2e5
-
- 10 Nov, 2018 1 commit
-
-
Ruty Rinott authored
Summary: Step 2 of the LM training pipeline assumes tokenized text data as input. It splits the data into train/validation/test and runs binarization (step a_ii in https://fb.quip.com/kazzAxvZHBj9) Reviewed By: borguz Differential Revision: D10454705 fbshipit-source-id: 74e8679041f5507c4e404c1b719547c2ae9ed983
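A minimal sketch of the split step described above; the function name, ratios, and seed are illustrative, not the actual pipeline code, and the resulting shards would then go through fairseq binarization.

```python
import random

def split_corpus(lines, valid_frac=0.01, test_frac=0.01, seed=0):
    """Shuffle tokenized lines and split them into train/validation/test lists."""
    lines = list(lines)
    random.Random(seed).shuffle(lines)
    n_valid = int(len(lines) * valid_frac)
    n_test = int(len(lines) * test_frac)
    return {
        "valid": lines[:n_valid],
        "test": lines[n_valid:n_valid + n_test],
        "train": lines[n_valid + n_test:],
    }
```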
-
- 08 Nov, 2018 1 commit
-
-
Peng-Jen Chen authored
Summary: D10052908 introduces the multilingual_translation task, but it raises an exception when training with multiple GPUs: P60202593 With Myle's help, we found that the cause is an improperly handled dummy batch data type, which results in optimizer.backward() not being executed the same number of times across different GPUs. Reviewed By: xianxl Differential Revision: D12964263 fbshipit-source-id: 4991039030bf373f0c484e131acc4736487be4d8
-
- 07 Nov, 2018 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/352 Differential Revision: D12956930 Pulled By: myleott fbshipit-source-id: 39334a79544bac570feb04be9103269d7c1563f9
-
Liezl Puzon authored
Summary: There are 2 ways to implement BPE: 1. use a continuation marker suffix to indicate that there is at least one more subtoken left in the word 2. use an end-of-word marker suffix to indicate that there are no more subtokens left in the word This adds some logic to account for either kind of BPE marker suffix. This diff adds a corresponding test. I also refactored the test setup to reduce the number of boolean args when setting up test data. Reviewed By: xianxl Differential Revision: D12919428 fbshipit-source-id: 405e9f346dce6e736c1305288721dfc7b63e872a
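A minimal sketch of the two marker conventions this commit accounts for; the marker strings and helper name below are illustrative, not the fairseq noising API.

```python
CONT_MARKER = "@@"    # continuation-marker style: "hel@@ lo" -> "hello"
EOW_MARKER = "_EOW"   # end-of-word-marker style:  "hel lo_EOW" -> "hello"

def is_word_end(token, marker, marker_is_continuation):
    """Return True if this subword token ends a word under the given convention."""
    if marker_is_continuation:
        # Continuation style: a token ends a word when it does NOT carry the marker.
        return not token.endswith(marker)
    # End-of-word style: a token ends a word only when it carries the marker.
    return token.endswith(marker)

print([is_word_end(t, CONT_MARKER, True) for t in ["hel@@", "lo", "wor@@", "ld"]])
print([is_word_end(t, EOW_MARKER, False) for t in ["hel", "lo_EOW", "wor", "ld_EOW"]])
# both print [False, True, False, True]
```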
-
- 02 Nov, 2018 2 commits
-
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/340 This allows us to do a lot less copy paste when adding new word shuffle function tests Reviewed By: xianxl Differential Revision: D12810304 fbshipit-source-id: a56b5df093d17be2b73837897c526978cab92b70
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/341 Use black formatting in test_noising.py Reviewed By: xianxl Differential Revision: D12810285 fbshipit-source-id: 5517dd5d2f086831f487d88acf6bc2fa18820297
-
- 01 Nov, 2018 6 commits
-
-
ngimel authored
Summary: Currently, if `ignore-case` is set, the same line will be yielded twice - once as the lower-cased version and once as the original version - leading to lower-than-expected uncased scores. Pull Request resolved: https://github.com/pytorch/fairseq/pull/339 Differential Revision: D12890386 Pulled By: myleott fbshipit-source-id: 0570e5f6e8f848f2c6439d615e70aca6df097eef
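A hedged sketch of the intended behavior (function and argument names are illustrative, not the actual scoring script): with ignore-case set, each line is yielded exactly once, lower-cased, instead of once lower-cased and once unchanged.

```python
def read_lines(path, ignore_case=False):
    """Yield each line once, lower-cased when ignore_case is set."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            yield line.lower() if ignore_case else line
```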
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/337 Pull Request resolved: https://github.com/pytorch/translate/pull/250 Reviewed By: akinh Differential Revision: D12880352 fbshipit-source-id: 61e9888a9cc3df07e805820b74a5fcf359dfe0ea
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/251 We should use a shared encoder and separate decoders as in: https://fb.facebook.com/groups/2156114531381111/permalink/2169028113423086/ Generation is a hack; ideally the net input should carry the lang pair info so that when we pass the sample to the model, it can select the correct encoder/decoder pair. Diff [2/2] will be for flow integration for basic experimentation. TODO in a future diff: figure out how to generalize this so export will work. This works with vocab reduction, but we only support vocab reduction for the src-tgt model, not the src-src model. A future (low-pri) task could be to add word prediction vocab reduction for the src-src model to speed up training. Reviewed By: xianxl Differential Revision: D10512576 fbshipit-source-id: 545d96cad8e814b9da7be102a48cc5cac358b758
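A minimal sketch of the shared-encoder / separate-decoders layout described here; class and argument names are illustrative, not the pytorch_translate implementation.

```python
import torch.nn as nn

class SharedEncoderMultiDecoder(nn.Module):
    """One encoder shared across language pairs, one decoder per target language."""

    def __init__(self, encoder, decoders):
        super().__init__()
        self.encoder = encoder                   # shared encoder
        self.decoders = nn.ModuleDict(decoders)  # e.g. {"src-tgt1": dec1, "src-tgt2": dec2}

    def forward(self, src_tokens, src_lengths, prev_output_tokens, lang_pair):
        # Ideally lang_pair would travel with the net input so generation can pick
        # the right decoder; here it is passed explicitly.
        encoder_out = self.encoder(src_tokens, src_lengths)
        return self.decoders[lang_pair](prev_output_tokens, encoder_out)
```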
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/336 Differential Revision: D12876709 Pulled By: myleott fbshipit-source-id: a31536e2eb93f752600b9940c28e9b9fcefc8b86
-
Zihao Fu authored
Summary: Modify the BLEU error message. Fixes the issue: https://github.com/pytorch/fairseq/issues/284 Pull Request resolved: https://github.com/pytorch/fairseq/pull/320 Differential Revision: D12876721 Pulled By: myleott fbshipit-source-id: df25885a94a584cbf4b86a1665e3e513c7eb8e9a
-
John Pope authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/290 Differential Revision: D12876759 Pulled By: myleott fbshipit-source-id: 9f6d1c9de27dad29368a7edb923dfcf770355938
-
- 30 Oct, 2018 1 commit
-
-
James Cross authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/333 A tiny hack to speed up inference slightly for transformer beam search after export to graph mode. Specifically, there is no need to transpose a dimension with size 1 (the sequence length of a single decoder time step during beam search) with its neighbor immediately before a view/reshape. Reviewed By: jmp84 Differential Revision: D12833011 fbshipit-source-id: f9c344a9ad595e6e48a8a65b31cf2b1392f9b938
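An illustrative sketch of the shortcut (not the exported transformer code itself): when the leading dimension has size 1, as it does for a single decoder step during beam search, the transpose before the reshape can be skipped without changing the result.

```python
import torch

def collapse_time_and_batch(x):
    """Flatten (seq_len, bsz, dim) into (seq_len * bsz, dim)."""
    if x.size(0) == 1:
        # One decoder step: transposing a size-1 dim is a no-op for the reshape below.
        return x.view(-1, x.size(-1))
    return x.transpose(0, 1).contiguous().view(-1, x.size(-1))
```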
-
- 27 Oct, 2018 1 commit
-
-
Xian Li authored
Summary: We'd like to reuse the noising functions and DenoisingDataset in adversarial training. However, the current noising functions assume the input is subword tokens. The goal of this diff is to extend them so the noising can be applied to word tokens. Since we're mostly interested in word shuffle noising, I only modified the WordShuffle class. Reviewed By: liezl200 Differential Revision: D10523177 fbshipit-source-id: 1e5d27362850675010e73cd38850c890d42652ab
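A minimal sketch of word-level shuffle noising with bounded displacement; the parameter and function names are illustrative, and fairseq's WordShuffle operates on padded tensors and also understands the BPE markers discussed above.

```python
import random

def word_shuffle(words, max_shuffle_distance=3):
    """Shuffle words so that each word moves at most a few positions."""
    # Score each position as index + noise in [0, k); sorting by the score gives a
    # local shuffle whose displacement is bounded by the noise range.
    scores = [i + random.uniform(0, max_shuffle_distance) for i in range(len(words))]
    order = sorted(range(len(words)), key=lambda i: scores[i])
    return [words[i] for i in order]

print(word_shuffle("the quick brown fox jumps over the lazy dog".split()))
```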
-
- 26 Oct, 2018 1 commit
-
-
Wei Ho authored
Summary: Fix fairseq's `force` option for disabling print suppression (otherwise, `print(..., force=True)` fails on master since the force kwarg gets passed to the builtin print). Reviewed By: dpacgopinath Differential Revision: D10522058 fbshipit-source-id: bbc10c021a7d21396ebfbb1bf007f6b9b162f4fd
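A hedged sketch of the pattern being fixed (the helper name is illustrative): the overriding print must pop the custom `force` kwarg before delegating to the builtin, otherwise the builtin raises TypeError on the unexpected keyword.

```python
import builtins

def suppress_output(is_master):
    """Silence print on non-master workers unless print(..., force=True) is used."""
    builtin_print = builtins.print

    def print_(*args, **kwargs):
        force = kwargs.pop("force", False)  # strip before calling the real print
        if is_master or force:
            builtin_print(*args, **kwargs)

    builtins.print = print_
```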
-
- 25 Oct, 2018 2 commits
-
-
Deepak Gopinath authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/330 As part of the semi-supervised task setup (https://github.com/pytorch/translate/pull/243), this diff adds the ability for LanguagePairDataset to remove EOS from the source or append EOS to the target. This functionality is required by BacktranslationDataset to use translations as source data. Also added changes to BacktranslationDataset to make it work on GPU. We needed to transfer back-translated sentences back to CPU for LanguagePairDataset to collate. Reviewed By: liezl200 Differential Revision: D10846294 fbshipit-source-id: b015ecb5fcef26fba507c30f8a4992bdbc54899f
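An illustrative sketch of the two EOS adjustments (flag and function names are assumptions, not the exact LanguagePairDataset options): strip a trailing EOS from the source and/or append EOS to the target so back-translated outputs can serve directly as source-side data.

```python
import torch

def adjust_eos(src, tgt, eos, remove_eos_from_source=False, append_eos_to_target=False):
    """Return (src, tgt) 1-D token tensors with EOS removed/appended as requested."""
    if remove_eos_from_source and len(src) > 0 and src[-1] == eos:
        src = src[:-1]
    if append_eos_to_target and (len(tgt) == 0 or tgt[-1] != eos):
        tgt = torch.cat([tgt, tgt.new_tensor([eos])])
    return src, tgt

src, tgt = adjust_eos(torch.tensor([5, 6, 2]), torch.tensor([7, 8]), eos=2,
                      remove_eos_from_source=True, append_eos_to_target=True)
print(src, tgt)  # tensor([5, 6]) tensor([7, 8, 2])
```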
-
Haoran Li authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/321 Reviewed By: alexeib Differential Revision: D10430186 fbshipit-source-id: 9cc8fe0f202cc49370cecf36312bcc9bf0b4deee
-
- 23 Oct, 2018 2 commits
-
-
Deepak Gopinath authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/325 RoundRobinZipDataset requires a size(index) method to be implemented in every dataset it wraps. Also added missing return statements in a few methods. Reviewed By: liezl200 Differential Revision: D10457159 fbshipit-source-id: 01856eb455f2f3a21e7fb723129ff35fbe29e0ae
-
Deepak Gopinath authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/324 BacktranslationDataset was introduced recently but was not exposed as part of the fairseq.data module Reviewed By: liezl200 Differential Revision: D10412717 fbshipit-source-id: 8a9d4ecd43fd376e895c450d00e765a869c95eff
-
- 22 Oct, 2018 1 commit
-
-
Halil Akin authored
Summary: This is another failure due to distributed GPUs getting out of sync. We run save_and_eval (which has the inter-GPU communication calls) based on the number of updates, but the number of updates means weight updates. Whenever there is an issue in training and the weights can't be updated, the nodes go out of sync and start failing. So we should check the number of iterations instead. I am, again, making a small change to save the day, but we should decouple/refactor the save_and_eval logic from training to have fewer headaches in the future; I plan to work on that later. This should solve some of the issues for now. Reviewed By: jhcross Differential Revision: D10478427 fbshipit-source-id: b9deacfea252b2fb66b81c799fa78e2439fa514c
-
- 21 Oct, 2018 1 commit
-
-
Peng-Jen Chen authored
Summary: Manually port fairinternal fairseq-py pull request #385 [1] to fbcode. Resolves the merge conflict from removing fp16_trainer, per offline discussion with Myle. Also updated the code to make generate.py work. [1] https://github.com/fairinternal/fairseq-py/pull/385/commits/18fa6e154781cf0c4b1596429dba7e753a545069 Reviewed By: liezl200 Differential Revision: D10052908 fbshipit-source-id: c3c378d78dc1e9ac087c815f359e78c0048ff2f5
-
- 19 Oct, 2018 1 commit
-
-
Peng-Jen Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/317 When upgrading the `state_dict` variable, the `upgrade_state_dict` function in TransformerEncoder/TransformerDecoder doesn't handle multiple encoders/decoders, which is the case in D10052908. Before this change, we hit error message [1] when loading a checkpoint for the multilingual_transformer model in D10052908. This diff fixes it. Reviewed By: myleott, liezl200 Differential Revision: D10375418 fbshipit-source-id: 7104c1a463e78f3fa33d8479a37c51608be50610
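A hedged sketch of the idea behind the fix (the key pattern and helper name are assumptions, not the exact fairseq change): key upgrades are applied per named encoder/decoder prefix rather than assuming a single top-level encoder/decoder namespace.

```python
def upgrade_state_dict_with_prefix(state_dict, prefix):
    """Rename one legacy key pattern under a given module prefix, e.g. 'models.en-de.encoder.'."""
    for old_key in list(state_dict.keys()):
        if old_key.startswith(prefix) and ".layer_norms.0." in old_key:
            new_key = old_key.replace(".layer_norms.0.", ".self_attn_layer_norm.")
            state_dict[new_key] = state_dict.pop(old_key)
    return state_dict
```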
-
- 17 Oct, 2018 1 commit
-
-
James Cross authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/316 This code should actually be keeping the padded positions as `padding_idx` (though note that this is on the ONNX export path, and it has no effect in the most common case when using the exported network to do un-batched inference). Reviewed By: myleott Differential Revision: D10431872 fbshipit-source-id: 79fe4ac27cafcd4701e0f2a90e29d1b7362dc6f8
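An illustrative sketch of the invariant described (not the ONNX export path itself): position ids are computed from non-pad tokens only, and padded slots keep the value padding_idx.

```python
import torch

def make_positions_sketch(tokens, padding_idx):
    """Positions start at padding_idx + 1; padded slots stay at padding_idx."""
    mask = tokens.ne(padding_idx).long()
    return torch.cumsum(mask, dim=1) * mask + padding_idx

print(make_positions_sketch(torch.tensor([[5, 6, 7, 1, 1]]), padding_idx=1))
# tensor([[2, 3, 4, 1, 1]])
```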
-
- 06 Oct, 2018 2 commits
-
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/306 This uses a source dataset to generate a batch of {source: noisy source, target: original clean source} which allows us to train a denoising autoencoding component as part of a seq2seq model. Reviewed By: xianxl Differential Revision: D10078981 fbshipit-source-id: 026225984d4a97062ac05dc3a36e79b5c841fe9c
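A minimal sketch of the pairing this dataset produces (names and the word-dropout noise are illustrative, not the fairseq noising functions): each example pairs a noised copy of a monolingual sentence with the clean original.

```python
import random

def make_denoising_example(words, drop_prob=0.1):
    """Pair a noised source (simple word dropout here) with the clean target."""
    noisy = [w for w in words if random.random() >= drop_prob]
    return {"source": noisy or words, "target": words}

print(make_denoising_example("a simple clean monolingual sentence".split()))
```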
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/305 Previously, the noising code assumed that every sentence had an EOS, which had to be excluded from noising operations (since we shouldn't drop, blank, or shuffle EOS). This logic allows the noising module to handle sentences both with and without EOS. Reviewed By: xianxl Differential Revision: D10114425 fbshipit-source-id: 04ec8547343eb94266bda1ac7fca3d8a1991c9f4
-
- 05 Oct, 2018 1 commit
-
-
James Cross authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/232 Though transpose operations are essentially free during PyTorch execution, they can result in costly operations when exported to Caffe2 inference nets via ONNX tracing, especially when applied repeatedly to large tensors. For this reason, we update `MultiheadAttention` to store its incremental state with shape (bsz, num_heads, seq_len, head_dim), that is, after transposing the projected input. This should result in non-trivially faster exported models without changing the semantics or speed of PyTorch execution. Reviewed By: myleott Differential Revision: D10186506 fbshipit-source-id: 8a42712423ee767ea49ed88d2a4653f900d14fba
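An illustrative, simplified sketch of the caching layout (not the real MultiheadAttention code): the cached key/value is stored already in (bsz, num_heads, seq_len, head_dim) form, so each incremental step only reshapes the new step and concatenates, and the exported graph never re-transposes the growing cache.

```python
import torch

def append_to_cache(cache, step_k, bsz, num_heads, head_dim):
    """Append one decoder step's projected key to the (bsz, num_heads, seq_len, head_dim) cache."""
    # step_k: (1, bsz, num_heads * head_dim) -- the projection for a single time step
    step_k = step_k.view(bsz, num_heads, head_dim).unsqueeze(2)  # (bsz, num_heads, 1, head_dim)
    if cache is None:
        return step_k
    return torch.cat([cache, step_k], dim=2)  # grow along seq_len without transposing the cache
```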
-
- 04 Oct, 2018 1 commit
-
-
Liezl Puzon authored
Summary: If we want our parallel data to have EOS at the end of the source, we keep the EOS at the end of the generated source-dialect backtranslation. If we don't want our parallel data to have EOS at the end of the source, we **remove** the EOS at the end of the generated source-dialect backtranslation. Note: we always want EOS at the end of our target / reference in parallel data so our model can learn to generate a sentence of any arbitrary length. So we make sure that the original target has an EOS before returning a batch of {generated src, original target}. If the original targets in the tgt dataset don't have an EOS, we append EOS to each tgt sample before collating. We only do this for the purpose of collating a {generated src, original tgt} batch AFTER generating the backtranslations. We don't enforce any EOS before passing tgt to the tgt->src model for generating the backtranslation. Users of this dataset are expected to format tgt dataset examples in the format that the tgt->src model expects. Reviewed By: jmp84 Differential Revision: D10157725 fbshipit-source-id: eb6a15f13c651f7c435b8db28103c9a8189845fb
-
- 03 Oct, 2018 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/302 Differential Revision: D10174608 Pulled By: myleott fbshipit-source-id: 4e2dfc76eae97afc5488f29b47e74f9897a643ff
-
Liezl Puzon authored
Summary: This generalizes BacktranslationDataset to allow us to use any SequenceGenerator class. For example, if we want to use this model in PyTorch Translate, we can pass the following to the BacktranslationDataset init: (1) a PyTorch Translate SequenceGenerator class as generator_class and (2) the appropriate args for initializing that class as kwargs. Reviewed By: xianxl Differential Revision: D10156552 fbshipit-source-id: 0495d825bf4727da96d0d9a40dc434135ff3486c
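A hedged sketch of the generalization (constructor arguments are illustrative, not the exact BacktranslationDataset signature): the generator class and its init kwargs are passed in, so any SequenceGenerator-like implementation can be plugged in.

```python
class BacktranslationDatasetSketch:
    """Wraps a target-side dataset and a backtranslation generator built from a pluggable class."""

    def __init__(self, tgt_dataset, generator_class, **generator_kwargs):
        self.tgt_dataset = tgt_dataset
        # Any class with a compatible generate() interface can be used here.
        self.generator = generator_class(**generator_kwargs)
```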
-
- 02 Oct, 2018 2 commits
-
-
Michael Auli authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/300 Differential Revision: D10154711 Pulled By: edunov fbshipit-source-id: 859d1ac59923b67c1547b6f7acb94f801b0c3318
-
Liezl Puzon authored
Summary: Using an argparse Namespace hides the actual args that are expected and makes the code harder to read. Note the difference in style for the args list def __init__( self, tgt_dataset, tgt_dict, backtranslation_model, unkpen, sampling, beam, max_len_a, max_len_b, ): instead of def __init__( self, tgt_dataset, tgt_dict, backtranslation_model, unkpen, sampling, beam, max_len_a, max_len_b, ): Reviewed By: dpacgopinath Differential Revision: D10152331 fbshipit-source-id: 6539ccba09d48acf23759996b7e32fb329b3e3f6
-
- 01 Oct, 2018 1 commit
-
-
alexeib authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/296 Differential Revision: D10121830 Pulled By: alexeib fbshipit-source-id: 1b73430bdfdcb20a9a6123abfca3472a0d307b3b
-
- 30 Sep, 2018 3 commits
-
-
Myle Ott authored
Summary: Changelog:
- `90f52a1`: Support loading subsets of the data on each worker with the `--fix-batches-to-gpus` flag. This should fix #217 and #266.
- `6eda0a9`: Update README for replicating the "Scaling Neural Machine Translation" paper
- `b14c7cf`: Fallback to no_c10d backend for pytorch 0.4.1 (fixes #294)
Pull Request resolved: https://github.com/pytorch/fairseq/pull/295 Differential Revision: D10121559 Pulled By: myleott fbshipit-source-id: 41c84d0ee4cdd113544b5d3aa38ae8b23acc2c27
-
myleott authored
-
myleott authored
-
- 25 Sep, 2018 4 commits
-
-
Myle Ott authored
Co-authored-by: liezl200 <lie@fb.com>
-
Sergey Edunov authored
-
Myle Ott authored
-
alexeib authored
-