- 30 Jun, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/697 Differential Revision: D16068465 Pulled By: myleott fbshipit-source-id: c2563c3c682e7e8406e6d7c8e895d8afbec551eb
- 27 Jun, 2019 1 commit
Nayan Singhal authored
Summary: Added BMUF implementation. TODO: 1) Add a unit test case for testing model averaging and BMUF. 2) Add warmup before actually starting to train the model. Reviewed By: jay-mahadeokar Differential Revision: D15871477 fbshipit-source-id: 866b0aba2d5bea5b65b4438acb49c886c4a87924
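For reference, a minimal sketch of the BMUF idea (block-wise model update filtering): periodically average parameters across workers, then filter the block-level update with momentum. It assumes `torch.distributed` is already initialized; the class and argument names are illustrative, not fairseq's actual API.

```python
import torch
import torch.distributed as dist

class BMUF:
    """Block-wise model update filtering over periodic parameter syncs."""

    def __init__(self, params, block_momentum=0.875, block_lr=1.0, sync_period=50):
        self.params = list(params)
        self.block_momentum = block_momentum
        self.block_lr = block_lr
        self.sync_period = sync_period
        # Previous globally-synced copy of each parameter, plus the
        # momentum-smoothed block update.
        self.global_params = [p.detach().clone() for p in self.params]
        self.smoothed = [torch.zeros_like(p) for p in self.params]

    def maybe_sync(self, num_updates):
        if num_updates % self.sync_period != 0:
            return
        world_size = dist.get_world_size()
        for p, g, s in zip(self.params, self.global_params, self.smoothed):
            dist.all_reduce(p.data)          # 1) sum parameters across workers
            p.data.div_(world_size)          #    ... and take the mean
            delta = p.data - g               # 2) this block's aggregate update
            s.mul_(self.block_momentum).add_(delta, alpha=self.block_lr)
            p.data.copy_(g + s)              # 3) apply the filtered update
            g.copy_(p.data)
```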
- 12 Jun, 2019 1 commit
Nayan Singhal authored
Summary: Implemented model averaging for fairseq. Removed the DDP wrapper if a global optimizer is provided. Syncing all the models based on the iteration provided in the input. TODO: 1) Fix throughput and wps meters; need to check other meters too. 2) Replace the model averaging code with the BMUF algorithm implementation. Reviewed By: myleott Differential Revision: D15711044 fbshipit-source-id: 58a4af74db2a61d06762597b95836cbeb1ed82cc
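A minimal sketch of the plain periodic model averaging this commit describes, assuming `torch.distributed` is initialized; `sync_every` is an illustrative name, not fairseq's flag.

```python
import torch.distributed as dist

def average_model(model, num_updates, sync_every=50):
    """Every `sync_every` updates, replace each parameter by its mean
    across all workers."""
    if num_updates % sync_every != 0:
        return
    world_size = dist.get_world_size()
    for p in model.parameters():
        dist.all_reduce(p.data)   # sum the parameter across workers
        p.data.div_(world_size)   # ... then divide to get the mean
```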
- 11 Jun, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/792 Differential Revision: D15741781 Pulled By: myleott fbshipit-source-id: c256c7900c307d485904e69b1526b9acbe08fec9
- 30 May, 2019 2 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/613 Differential Revision: D15541384 Pulled By: myleott fbshipit-source-id: ef2c0b0a51cdf37af2ccff0546f524d49f87e65d
Sujit Verma authored
Summary: Changes for supporting tensorboard scalar plotting. Reviewed By: myleott Differential Revision: D15456534 Pulled By: myleott fbshipit-source-id: a012a4eea028aae764ce11786570b7d96841c4a5
- 23 May, 2019 1 commit
Kritika Singh authored
Summary: Context from https://fb.workplace.com/groups/1405155842844877/permalink/2785095451517569/: I am adding a model to pyspeech (formerly fairspeq) with the following `forward`:

```python
def forward(self, src_tokens, src_lengths, prev_output_tokens, name):
    encoder_out = self.encoder(src_tokens, src_lengths)
    if name == Dataset.d1:
        decoder_out = self.decoder1(prev_output_tokens, encoder_out)
    elif name == Dataset.d2:
        decoder_out = self.decoder2(encoder_out)
    return decoder_out
```

When I run distributed training on this model, I get the following error:

```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable). (prepare_for_backward at caffe2/torch/csrc/distributed/c10d/reducer.cpp:410)
```

The recommended fix is to pass `find_unused_parameters=True` to DistributedDataParallel's initialization. Reviewed By: myleott Differential Revision: D15439726 fbshipit-source-id: 7fd80d4a3f49ac90182dec723b49b14e6689406a
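A minimal sketch of the recommended fix; the linear module is a stand-in for the multi-decoder model above, and the single-process gloo group exists only to make the snippet self-contained.

```python
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

# Illustrative single-process group; real training initializes one
# process per GPU via the launcher.
dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29500",
                        rank=0, world_size=1)

model = nn.Linear(8, 8)  # stand-in for the multi-decoder model above
ddp_model = DistributedDataParallel(
    model,
    find_unused_parameters=True,  # tolerate parameters unused in a given forward
)
```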
- 20 May, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/592 Differential Revision: D15415499 Pulled By: myleott fbshipit-source-id: 87ba09b9b38501daebd95bbf28815e048c78f9a3
- 17 May, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/588 Differential Revision: D15389638 Pulled By: myleott fbshipit-source-id: 4632ce22d51dc2c74d250bae999630095d849701
- 08 May, 2019 1 commit
Myle Ott authored
Reviewed By: jmp84 Differential Revision: D15264847 fbshipit-source-id: 4ba9224d1b35c3de0d26c9b4c1ee6d641d3d8535
- 07 May, 2019 1 commit
Davide Caroselli authored
Summary: Following discussion in https://github.com/pytorch/fairseq/issues/574:
- Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder, compatible with IndexedDataset/IndexedDatasetBuilder
- Updated scripts/read_binarized.py to support the new MMapIndexedDataset
- Replaced the '--raw-text' and '--lazy-load' options with '--dataset-impl', and moved the option definition from custom task args to the more high-level options.add_dataset_args() (more appropriate)
- Also implemented utility functions in indexed_dataset: make_dataset() and dataset_exists()

Pull Request resolved: https://github.com/pytorch/fairseq/pull/589 Differential Revision: D14597128 Pulled By: myleott fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e
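For context, a minimal sketch of the memory-mapping idea behind MMapIndexedDataset: token arrays stored back to back in one binary file, addressed through an index of (offset, length) pairs, and read via numpy memory mapping so nothing is loaded eagerly. The layout and names are illustrative, not fairseq's actual on-disk format.

```python
import numpy as np

class TinyMMapDataset:
    """Token arrays stored contiguously in one binary file, addressed by
    (offset, length) pairs and read through a numpy memory map."""

    def __init__(self, bin_path, offsets, lengths, dtype=np.int64):
        self.data = np.memmap(bin_path, dtype=dtype, mode="r")
        self.offsets = offsets
        self.lengths = lengths

    def __len__(self):
        return len(self.lengths)

    def __getitem__(self, i):
        start = self.offsets[i]
        return self.data[start:start + self.lengths[i]]  # no eager loading
```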
- 05 May, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/695 Differential Revision: D15182613 Pulled By: myleott fbshipit-source-id: 4196346517d8e75ed9e903e9e01ab943d086f6f1
- 04 May, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/508 The previous version applied the temperature after the softmax. Fix that, and also generalize so it works with other search approaches. Pull Request resolved: https://github.com/pytorch/fairseq/pull/694 Differential Revision: D15175160 Pulled By: myleott fbshipit-source-id: cc87ff0e97a8a1dd37f9983163f58a8641155ab0
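One way to see why the ordering matters, as a minimal sketch: dividing the probabilities by the temperature after the softmax and renormalizing just recovers the plain softmax, so the temperature has no effect at all.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.1])
temperature = 0.5

# Correct: divide the logits by the temperature *before* the softmax.
sharpened = F.softmax(logits / temperature, dim=-1)

# Buggy ordering: scaling the probabilities afterwards and renormalizing
# leaves the distribution unchanged, so the temperature does nothing.
scaled = F.softmax(logits, dim=-1) / temperature
renormalized = scaled / scaled.sum()

print(sharpened)      # noticeably peakier distribution
print(renormalized)   # identical to F.softmax(logits, dim=-1)
```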
- 30 Apr, 2019 1 commit
Myle Ott authored
Summary:
- Add --add-bos-token option to LM task
- Cleanup utils.py and options.py

Pull Request resolved: https://github.com/pytorch/fairseq/pull/654 Differential Revision: D15041794 Pulled By: myleott fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e
- 29 Apr, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/676 Differential Revision: D15114128 Pulled By: myleott fbshipit-source-id: b11dde77b2f2610d33649101aea03fb5a3eeb56a
- 15 Mar, 2019 1 commit
Myle Ott authored
Summary: Changelog:
- 998ba4f: Add language models from Baevski & Auli (2018)
- 4294c4f6: Add mixture of experts code from Shen et al. (2019)
- 00493490: Add example for multilingual training
- 48d9afbe: Speed improvements, including fused operators from apex
- 44d27e64: Add Tensorboard support
- d17fa851: Add Adadelta optimizer
- 9e1c880f: Add `FairseqEncoderModel`
- b65c579b: Add `FairseqTask.inference_step` to modularize generate.py
- 2ad1178e: Add back `--curriculum`
- Misc bug fixes and other features

Pull Request resolved: https://github.com/pytorch/fairseq/pull/577 Differential Revision: D14481233 Pulled By: myleott fbshipit-source-id: 4ff8625ef1c0b24273fc65df7c5658e3c932e8b7
- 12 Mar, 2019 1 commit
Dmytro Okhonko authored
Summary: sequence_generator assumes that the model input is a 2D tensor of longs, but it can be something like a 3D tensor of floats, and we should be able to handle this as long as the first dimension is the batch size, followed by source lengths. Reviewed By: myleott Differential Revision: D14420044 fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015
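A minimal sketch of the shape-agnostic assumption: rely only on dim 0 being the batch size and dim 1 the source length, so both 2D long inputs (text) and 3D float inputs (e.g. acoustic features) work.

```python
import torch

def batch_info(src_tokens):
    # Only assume dim 0 = batch, dim 1 = source length; any trailing
    # feature dims are passed through untouched.
    return src_tokens.size(0), src_tokens.size(1)

print(batch_info(torch.zeros(4, 9, dtype=torch.long)))  # text input: (4, 9)
print(batch_info(torch.zeros(4, 9, 80)))                # float features: (4, 9)
```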
- 04 Mar, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/554 Differential Revision: D14300596 Pulled By: myleott fbshipit-source-id: f38c8e58daef99d5e4b97dd423e4142e4294a4f0
- 26 Feb, 2019 2 commits
Myle Ott authored
Summary: Enable with the `--tensorboard-logdir` option. Pull Request resolved: https://github.com/pytorch/fairseq/pull/530 Differential Revision: D14218430 Pulled By: myleott fbshipit-source-id: e7a54f66f928e3bb02ae03fda09b22fa4fa7d053
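A minimal sketch of the scalar logging this enables, shown here with the `torch.utils.tensorboard` API; the tag names and values are illustrative.

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/example")   # cf. --tensorboard-logdir
for step, loss in enumerate([4.0, 3.1, 2.7]):
    writer.add_scalar("train/loss", loss, step)  # one scalar curve per tag
writer.close()
```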
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/529 Differential Revision: D14218384 Pulled By: myleott fbshipit-source-id: 5d2cbb1f56ea42e9929785aff4a5ae5f44d13724
- 01 Feb, 2019 1 commit
Davide Caroselli authored
Summary: The `preprocess.py` script has been refactored in order to:

1. Use the `options` module for command line argument parsing. This gives `preprocess.py` the ability to load custom modules with the `--user-dir` flag (already implemented in all other binaries).
2. Move dictionary loading and building code to the Task implementation. This allows custom Dictionary classes to be used during the data generation step.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/448 Differential Revision: D13674819 Pulled By: myleott fbshipit-source-id: b40648a98ed6c08284577e5ec25876e018d8c822
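A minimal sketch of point 2, with every name an illustrative stand-in rather than fairseq's exact signature: once dictionary loading sits behind a task classmethod, a custom task can substitute its own Dictionary class at preprocessing time.

```python
class Dictionary:
    @classmethod
    def load(cls, filename):
        d = cls()
        d.filename = filename  # stub: real code would parse the vocab file
        return d

class UppercaseDictionary(Dictionary):
    """Hypothetical custom dictionary class."""

class TranslationTask:
    @classmethod
    def load_dictionary(cls, filename):
        return Dictionary.load(filename)  # default behavior

class MyTask(TranslationTask):
    @classmethod
    def load_dictionary(cls, filename):
        # The override is all it takes for preprocessing to build data
        # with a proprietary dictionary class.
        return UppercaseDictionary.load(filename)
```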
- 30 Jan, 2019 2 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/484 Differential Revision: D13880636 Pulled By: myleott fbshipit-source-id: 984b2e1c3b281c28243102eb971ea45ec891d94e
Myle Ott authored
Summary: Changelog:
- `4889802`: can now detokenize sentencepiece output with `--remove-bpe=sentencepiece` (fixes #331). Also added `--sacrebleu` for computing detokenized BLEU.
- `0d76427`: fix assertion error when training a language model with a dataset containing empty sentences
- minor bug and style fixes

Pull Request resolved: https://github.com/pytorch/fairseq/pull/483 Differential Revision: D13867899 Pulled By: myleott fbshipit-source-id: 25c940b847fe270262ac8f5ac838407b3977fdda
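A minimal sketch of what sentencepiece detokenization amounts to (illustrative, not fairseq's exact implementation): drop the spaces between pieces and turn the "▁" (U+2581) word-boundary marker back into spaces.

```python
def remove_sentencepiece(text: str) -> str:
    # Pieces are space-separated; "\u2581" marks original word boundaries.
    return text.replace(" ", "").replace("\u2581", " ").strip()

print(remove_sentencepiece("\u2581Hello \u2581wor ld !"))  # -> "Hello world!"
```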
- 25 Jan, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/471 Differential Revision: D13818918 Pulled By: myleott fbshipit-source-id: d3b8dc50e81ee1d2dcc5efc5815998be8461085f
- 16 Jan, 2019 1 commit
Davide Caroselli authored
Summary: In a multi-GPU training scenario, the `train.py` script spawns new processes with `torch.multiprocessing.spawn`. Unfortunately those child processes don't inherit the modules imported with `--user-dir`. This pull request fixes the problem: custom module import is now explicit in every `main()` function. Pull Request resolved: https://github.com/pytorch/fairseq/pull/449 Differential Revision: D13676922 Pulled By: myleott fbshipit-source-id: 520358d66155697885b878a37e7d0484bddbc1c6
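A minimal sketch of the pattern, not fairseq's exact helper: re-import the `--user-dir` module inside each spawned worker's `main()`, since modules imported in the parent are not inherited by `torch.multiprocessing.spawn` children.

```python
import importlib
import os
import sys

def import_user_module(user_dir):
    if user_dir is None:
        return
    user_dir = os.path.abspath(user_dir)
    # Make the parent importable, then import the directory as a module.
    sys.path.insert(0, os.path.dirname(user_dir))
    importlib.import_module(os.path.basename(user_dir))

def main(args):
    # Called explicitly in every entry point, so spawned workers re-import
    # the custom modules themselves.
    import_user_module(getattr(args, "user_dir", None))
    # ... build the task/model and train as usual ...
```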
- 15 Jan, 2019 1 commit
Davide Caroselli authored
Summary: The correct help message was obfuscated by the transient `ArgumentParser` used only to eagerly read the `--user-dir` flag. To reproduce, just try:

```bash
python3 train.py --help
```

Pull Request resolved: https://github.com/pytorch/fairseq/pull/446 Differential Revision: D13674731 Pulled By: myleott fbshipit-source-id: b9503a4d7ef26405be630d31c0ca02386d783031
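For context, a minimal sketch of the underlying pattern: a throwaway parser reads `--user-dir` early via `parse_known_args()`, and passing `add_help=False` keeps it from swallowing `--help` before the real parser is built.

```python
import argparse

def get_parser():
    # Throwaway parser: reads --user-dir early without consuming --help.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--user-dir", default=None)
    pre_args, _ = pre.parse_known_args()
    # ... import_user_module(pre_args.user_dir) would run here ...

    # Real parser inherits the flag and keeps a working --help.
    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument("--lr", type=float, default=0.25)
    return parser

print(get_parser().parse_args([]))  # Namespace(user_dir=None, lr=0.25)
```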
- 14 Jan, 2019 1 commit
Davide Caroselli authored
Summary: Following discussion on official fairseq (https://github.com/pytorch/fairseq/issues/438), I added the `--user-dir` option to the command line. The user can now specify a path in order to import a custom module with proprietary tasks, architectures and so on. Pull Request resolved: https://github.com/pytorch/fairseq/pull/440 Differential Revision: D13651721 Pulled By: myleott fbshipit-source-id: 38b87454487f1ffa5eaf19c4bcefa0b3b15a8f43
- 05 Jan, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/283 Pull Request resolved: https://github.com/pytorch/fairseq/pull/428 Differential Revision: D13564190 Pulled By: myleott fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
- 26 Dec, 2018 1 commit
Myle Ott authored
Summary:
- 04cc608: Add `--match-source-len` option to generate.py for sequence-tagging tasks
- 19f1a40: Add `--no-repeat-ngram-size` option to generate.py for ngram blocking

Pull Request resolved: https://github.com/pytorch/fairseq/pull/422 Differential Revision: D13548445 Pulled By: myleott fbshipit-source-id: 26d1ae83993e428fcb020dac5ae358b0e36233d9
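A minimal sketch of ngram blocking (illustrative, not fairseq's batched implementation): before emitting the next token, ban any token that would complete an n-gram already present in the hypothesis.

```python
def banned_tokens(hypothesis, n):
    """Tokens that would complete an already-seen n-gram if emitted next."""
    if n <= 1 or len(hypothesis) < n - 1:
        return set()
    prefix = tuple(hypothesis[-(n - 1):])   # the current (n-1)-token context
    banned = set()
    for i in range(len(hypothesis) - n + 1):
        if tuple(hypothesis[i:i + n - 1]) == prefix:
            banned.add(hypothesis[i + n - 1])
    return banned

print(banned_tokens([5, 3, 9, 5, 3], n=3))  # {9}: "5 3 9" occurred already
```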
- 07 Dec, 2018 1 commit
Myle Ott authored
Summary: Let's only decrease the loss scale if a large enough percentage of batches overflow. Pull Request resolved: https://github.com/pytorch/fairseq/pull/397 Differential Revision: D13355159 Pulled By: myleott fbshipit-source-id: e17dde73d34a639519b4348c013fdd19d2b314e6
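A minimal sketch of the tolerance idea, with illustrative names and defaults: lower the scale only when the fraction of recent batches that overflowed exceeds a threshold, instead of backing off on every single overflow.

```python
class LossScaler:
    def __init__(self, scale=128.0, tolerance=0.05, window=100):
        self.scale = scale
        self.tolerance = tolerance
        self.window = window
        self.overflows = []  # 1 if that batch overflowed, else 0

    def update(self, overflowed: bool):
        self.overflows.append(int(overflowed))
        self.overflows = self.overflows[-self.window:]
        # Back off only when overflows are frequent, not on every spike.
        if sum(self.overflows) / len(self.overflows) > self.tolerance:
            self.scale /= 2.0
            self.overflows.clear()  # restart the measurement window
```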
- 06 Dec, 2018 1 commit
Myle Ott authored
Summary: Not switching to Black formatting just yet, but adding fmt: off directives in case we decide to later. Pull Request resolved: https://github.com/pytorch/fairseq/pull/399 Differential Revision: D13364674 Pulled By: myleott fbshipit-source-id: a20a11a18be3d583ee30eff770278fb4bd05b93c
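What the directives look like in practice: Black leaves the region between the markers untouched, so hand-aligned blocks survive a later switch.

```python
# fmt: off
PERMUTATION = [
    0, 2,
    1, 3,
]
# fmt: on
```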
- 18 Nov, 2018 1 commit
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/374 Differential Revision: D13116074 Pulled By: myleott fbshipit-source-id: 485724cc5a40e8360d21e4bf9c35821baa0ddc57
- 07 Nov, 2018 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/352 Differential Revision: D12956930 Pulled By: myleott fbshipit-source-id: 39334a79544bac570feb04be9103269d7c1563f9
- 30 Sep, 2018 1 commit
Myle Ott authored
Summary: Changelog:
- `90f52a1`: Support loading subsets of the data on each worker with the `--fix-batches-to-gpus` flag. This should fix #217 and #266.
- `6eda0a9`: Update README for replicating the "Scaling Neural Machine Translation" paper
- `b14c7cf`: Fall back to the no_c10d backend for pytorch 0.4.1 (fixes #294)

Pull Request resolved: https://github.com/pytorch/fairseq/pull/295 Differential Revision: D10121559 Pulled By: myleott fbshipit-source-id: 41c84d0ee4cdd113544b5d3aa38ae8b23acc2c27
- 25 Sep, 2018 4 commits
Myle Ott authored
Myle Ott authored
Myle Ott authored
Sergey Edunov authored
- No more FP16Trainer; we just have an FP16Optimizer wrapper.
- Most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time.
- Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0.
- Trainer.train_step now takes a list of samples, which will allow cleaner --update-freq handling.
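A minimal sketch of the dummy-batch trick from the third bullet: the worker still runs forward/backward (keeping collective ops in sync across workers) but multiplies the loss by 0, so the dummy batch contributes no gradient.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
dummy_batch = torch.zeros(2, 4)

loss = model(dummy_batch).sum()
(loss * 0).backward()  # backward still runs, but every gradient is zero
print([p.grad.abs().sum().item() for p in model.parameters()])  # [0.0, 0.0]
```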
- 03 Sep, 2018 2 commits
Myle Ott authored
Alexei Baevski authored