- 30 Jun, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/697 Differential Revision: D16068465 Pulled By: myleott fbshipit-source-id: c2563c3c682e7e8406e6d7c8e895d8afbec551eb
- 27 Jun, 2019 1 commit
Nayan Singhal authored
Summary: Added BMUF implementation. TODO: 1) Add a unit test case for testing model averaging and BMUF. 2) Add warmup before actually starting to train the model. Reviewed By: jay-mahadeokar Differential Revision: D15871477 fbshipit-source-id: 866b0aba2d5bea5b65b4438acb49c886c4a87924
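For reference, a minimal sketch of the BMUF idea (block-wise model update filtering): periodically average parameters across workers, then filter the block-level update with momentum. It assumes `torch.distributed` is already initialized; the class and argument names are illustrative, not fairseq's actual API.

```python
import torch
import torch.distributed as dist

class BMUF:
    """Block-wise model update filtering over periodic parameter syncs."""

    def __init__(self, params, block_momentum=0.875, block_lr=1.0, sync_period=50):
        self.params = list(params)
        self.block_momentum = block_momentum
        self.block_lr = block_lr
        self.sync_period = sync_period
        # Previous globally-synced copy of each parameter, plus the
        # momentum-smoothed block update.
        self.global_params = [p.detach().clone() for p in self.params]
        self.smoothed = [torch.zeros_like(p) for p in self.params]

    def maybe_sync(self, num_updates):
        if num_updates % self.sync_period != 0:
            return
        world_size = dist.get_world_size()
        for p, g, s in zip(self.params, self.global_params, self.smoothed):
            dist.all_reduce(p.data)          # 1) sum parameters across workers
            p.data.div_(world_size)          #    ... and take the mean
            delta = p.data - g               # 2) this block's aggregate update
            s.mul_(self.block_momentum).add_(delta, alpha=self.block_lr)
            p.data.copy_(g + s)              # 3) apply the filtered update
            g.copy_(p.data)
```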
- 12 Jun, 2019 1 commit
Nayan Singhal authored
Summary: Implemented model averaging for fairseq. Removed the DDP wrapper if a global optimizer is provided. Syncing all the models based on the iteration provided in the input. TODO: 1) Fix throughput and wps meters; need to check other meters too. 2) Replace the model averaging code with the BMUF algorithm implementation. Reviewed By: myleott Differential Revision: D15711044 fbshipit-source-id: 58a4af74db2a61d06762597b95836cbeb1ed82cc
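A minimal sketch of the plain periodic model averaging this commit describes, assuming `torch.distributed` is initialized; `sync_every` is an illustrative name, not fairseq's flag.

```python
import torch.distributed as dist

def average_model(model, num_updates, sync_every=50):
    """Every `sync_every` updates, replace each parameter by its mean
    across all workers."""
    if num_updates % sync_every != 0:
        return
    world_size = dist.get_world_size()
    for p in model.parameters():
        dist.all_reduce(p.data)   # sum the parameter across workers
        p.data.div_(world_size)   # ... then divide to get the mean
```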
- 11 Jun, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/792 Differential Revision: D15741781 Pulled By: myleott fbshipit-source-id: c256c7900c307d485904e69b1526b9acbe08fec9
- 30 May, 2019 2 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/613 Differential Revision: D15541384 Pulled By: myleott fbshipit-source-id: ef2c0b0a51cdf37af2ccff0546f524d49f87e65d
Sujit Verma authored
Summary: Changes for supporting tensorboard scalar plotting. Reviewed By: myleott Differential Revision: D15456534 Pulled By: myleott fbshipit-source-id: a012a4eea028aae764ce11786570b7d96841c4a5
- 23 May, 2019 1 commit
Kritika Singh authored
Summary: Context from https://fb.workplace.com/groups/1405155842844877/permalink/2785095451517569/: I am adding a model to pyspeech (formerly fairspeq) with the following `forward`:

```python
def forward(self, src_tokens, src_lengths, prev_output_tokens, name):
    encoder_out = self.encoder(src_tokens, src_lengths)
    if name == Dataset.d1:
        decoder_out = self.decoder1(prev_output_tokens, encoder_out)
    elif name == Dataset.d2:
        decoder_out = self.decoder2(encoder_out)
    return decoder_out
```

When I run distributed training on this model, I get the following error:

```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable). (prepare_for_backward at caffe2/torch/csrc/distributed/c10d/reducer.cpp:410)
```

The recommended fix is to pass `find_unused_parameters=True` to DistributedDataParallel's initialization. Reviewed By: myleott Differential Revision: D15439726 fbshipit-source-id: 7fd80d4a3f49ac90182dec723b49b14e6689406a
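A minimal sketch of the recommended fix; the linear module is a stand-in for the multi-decoder model above, and the single-process gloo group exists only to make the snippet self-contained.

```python
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

# Illustrative single-process group; real training initializes one
# process per GPU via the launcher.
dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29500",
                        rank=0, world_size=1)

model = nn.Linear(8, 8)  # stand-in for the multi-decoder model above
ddp_model = DistributedDataParallel(
    model,
    find_unused_parameters=True,  # tolerate parameters unused in a given forward
)
```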
- 20 May, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/592 Differential Revision: D15415499 Pulled By: myleott fbshipit-source-id: 87ba09b9b38501daebd95bbf28815e048c78f9a3
- 17 May, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/588 Differential Revision: D15389638 Pulled By: myleott fbshipit-source-id: 4632ce22d51dc2c74d250bae999630095d849701
- 08 May, 2019 1 commit
Myle Ott authored
Reviewed By: jmp84 Differential Revision: D15264847 fbshipit-source-id: 4ba9224d1b35c3de0d26c9b4c1ee6d641d3d8535
- 07 May, 2019 1 commit
Davide Caroselli authored
Summary: Following discussion in https://github.com/pytorch/fairseq/issues/574:
- Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder, compatible with IndexedDataset/IndexedDatasetBuilder
- Updated scripts/read_binarized.py to support the new MMapIndexedDataset
- Replaced the '--raw-text' and '--lazy-load' options with '--dataset-impl', and moved the option definition from custom task args to the more high-level options.add_dataset_args() (more appropriate)
- Also implemented utility functions in indexed_dataset: make_dataset() and dataset_exists()

Pull Request resolved: https://github.com/pytorch/fairseq/pull/589 Differential Revision: D14597128 Pulled By: myleott fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e
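For context, a minimal sketch of the memory-mapping idea behind MMapIndexedDataset: token arrays stored back to back in one binary file, addressed through an index of (offset, length) pairs, and read via numpy memory mapping so nothing is loaded eagerly. The layout and names are illustrative, not fairseq's actual on-disk format.

```python
import numpy as np

class TinyMMapDataset:
    """Token arrays stored contiguously in one binary file, addressed by
    (offset, length) pairs and read through a numpy memory map."""

    def __init__(self, bin_path, offsets, lengths, dtype=np.int64):
        self.data = np.memmap(bin_path, dtype=dtype, mode="r")
        self.offsets = offsets
        self.lengths = lengths

    def __len__(self):
        return len(self.lengths)

    def __getitem__(self, i):
        start = self.offsets[i]
        return self.data[start:start + self.lengths[i]]  # no eager loading
```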
- 05 May, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/695 Differential Revision: D15182613 Pulled By: myleott fbshipit-source-id: 4196346517d8e75ed9e903e9e01ab943d086f6f1
- 04 May, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/508 The previous version applied the temperature after the softmax. Fix that, and also generalize so it works with other search approaches. Pull Request resolved: https://github.com/pytorch/fairseq/pull/694 Differential Revision: D15175160 Pulled By: myleott fbshipit-source-id: cc87ff0e97a8a1dd37f9983163f58a8641155ab0
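One way to see why the ordering matters, as a minimal sketch: dividing the probabilities by the temperature after the softmax and renormalizing just recovers the plain softmax, so the temperature has no effect at all.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.1])
temperature = 0.5

# Correct: divide the logits by the temperature *before* the softmax.
sharpened = F.softmax(logits / temperature, dim=-1)

# Buggy ordering: scaling the probabilities afterwards and renormalizing
# leaves the distribution unchanged, so the temperature does nothing.
scaled = F.softmax(logits, dim=-1) / temperature
renormalized = scaled / scaled.sum()

print(sharpened)      # noticeably peakier distribution
print(renormalized)   # identical to F.softmax(logits, dim=-1)
```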
- 30 Apr, 2019 1 commit
Myle Ott authored
Summary:
- Add --add-bos-token option to LM task
- Cleanup utils.py and options.py

Pull Request resolved: https://github.com/pytorch/fairseq/pull/654 Differential Revision: D15041794 Pulled By: myleott fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e
- 29 Apr, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/676 Differential Revision: D15114128 Pulled By: myleott fbshipit-source-id: b11dde77b2f2610d33649101aea03fb5a3eeb56a
- 15 Mar, 2019 1 commit
Myle Ott authored
Summary: Changelog:
- 998ba4f: Add language models from Baevski & Auli (2018)
- 4294c4f6: Add mixture of experts code from Shen et al. (2019)
- 00493490: Add example for multilingual training
- 48d9afbe: Speed improvements, including fused operators from apex
- 44d27e64: Add Tensorboard support
- d17fa851: Add Adadelta optimizer
- 9e1c880f: Add `FairseqEncoderModel`
- b65c579b: Add `FairseqTask.inference_step` to modularize generate.py
- 2ad1178e: Add back `--curriculum`
- Misc bug fixes and other features

Pull Request resolved: https://github.com/pytorch/fairseq/pull/577 Differential Revision: D14481233 Pulled By: myleott fbshipit-source-id: 4ff8625ef1c0b24273fc65df7c5658e3c932e8b7
- 12 Mar, 2019 1 commit
Dmytro Okhonko authored
Summary: sequence_generator assumes that the model input is a 2D tensor of longs, but it can be something like a 3D tensor of floats, and we should be able to handle this as long as the first dimension is the batch size, followed by source lengths. Reviewed By: myleott Differential Revision: D14420044 fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015
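A minimal sketch of the shape-agnostic assumption: rely only on dim 0 being the batch size and dim 1 the source length, so both 2D long inputs (text) and 3D float inputs (e.g. acoustic features) work.

```python
import torch

def batch_info(src_tokens):
    # Only assume dim 0 = batch, dim 1 = source length; any trailing
    # feature dims are passed through untouched.
    return src_tokens.size(0), src_tokens.size(1)

print(batch_info(torch.zeros(4, 9, dtype=torch.long)))  # text input: (4, 9)
print(batch_info(torch.zeros(4, 9, 80)))                # float features: (4, 9)
```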
- 04 Mar, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/554 Differential Revision: D14300596 Pulled By: myleott fbshipit-source-id: f38c8e58daef99d5e4b97dd423e4142e4294a4f0
- 26 Feb, 2019 2 commits
Myle Ott authored
Summary: Enable with the `--tensorboard-logdir` option. Pull Request resolved: https://github.com/pytorch/fairseq/pull/530 Differential Revision: D14218430 Pulled By: myleott fbshipit-source-id: e7a54f66f928e3bb02ae03fda09b22fa4fa7d053
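A minimal sketch of the scalar logging this enables, shown here with the `torch.utils.tensorboard` API; the tag names and values are illustrative.

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/example")   # cf. --tensorboard-logdir
for step, loss in enumerate([4.0, 3.1, 2.7]):
    writer.add_scalar("train/loss", loss, step)  # one scalar curve per tag
writer.close()
```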
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/529 Differential Revision: D14218384 Pulled By: myleott fbshipit-source-id: 5d2cbb1f56ea42e9929785aff4a5ae5f44d13724
- 01 Feb, 2019 1 commit
Davide Caroselli authored
Summary: The `preprocess.py` script has been refactored in order to:

1. Use the `options` module for command line argument parsing. This gives `preprocess.py` the ability to load custom modules with the `--user-dir` flag (already implemented in all other binaries).
2. Move dictionary loading and building code to the Task implementation. This allows custom Dictionary classes to be used during the data generation step.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/448 Differential Revision: D13674819 Pulled By: myleott fbshipit-source-id: b40648a98ed6c08284577e5ec25876e018d8c822
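A minimal sketch of point 2, with every name an illustrative stand-in rather than fairseq's exact signature: once dictionary loading sits behind a task classmethod, a custom task can substitute its own Dictionary class at preprocessing time.

```python
class Dictionary:
    @classmethod
    def load(cls, filename):
        d = cls()
        d.filename = filename  # stub: real code would parse the vocab file
        return d

class UppercaseDictionary(Dictionary):
    """Hypothetical custom dictionary class."""

class TranslationTask:
    @classmethod
    def load_dictionary(cls, filename):
        return Dictionary.load(filename)  # default behavior

class MyTask(TranslationTask):
    @classmethod
    def load_dictionary(cls, filename):
        # The override is all it takes for preprocessing to build data
        # with a proprietary dictionary class.
        return UppercaseDictionary.load(filename)
```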
- 30 Jan, 2019 2 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/484 Differential Revision: D13880636 Pulled By: myleott fbshipit-source-id: 984b2e1c3b281c28243102eb971ea45ec891d94e
Myle Ott authored
Summary: Changelog:
- `4889802`: can now detokenize sentencepiece output with `--remove-bpe=sentencepiece` (fixes #331). Also added `--sacrebleu` for computing detokenized BLEU.
- `0d76427`: fix assertion error when training a language model with a dataset containing empty sentences
- minor bug and style fixes

Pull Request resolved: https://github.com/pytorch/fairseq/pull/483 Differential Revision: D13867899 Pulled By: myleott fbshipit-source-id: 25c940b847fe270262ac8f5ac838407b3977fdda
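A minimal sketch of what sentencepiece detokenization amounts to (illustrative, not fairseq's exact implementation): drop the spaces between pieces and turn the "▁" (U+2581) word-boundary marker back into spaces.

```python
def remove_sentencepiece(text: str) -> str:
    # Pieces are space-separated; "\u2581" marks original word boundaries.
    return text.replace(" ", "").replace("\u2581", " ").strip()

print(remove_sentencepiece("\u2581Hello \u2581wor ld !"))  # -> "Hello world!"
```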
- 25 Jan, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/471 Differential Revision: D13818918 Pulled By: myleott fbshipit-source-id: d3b8dc50e81ee1d2dcc5efc5815998be8461085f
- 16 Jan, 2019 1 commit
Davide Caroselli authored
Summary: In a multi-GPU training scenario, the `train.py` script spawns new processes with `torch.multiprocessing.spawn`. Unfortunately those child processes don't inherit the modules imported with `--user-dir`. This pull request fixes the problem: custom module import is now explicit in every `main()` function. Pull Request resolved: https://github.com/pytorch/fairseq/pull/449 Differential Revision: D13676922 Pulled By: myleott fbshipit-source-id: 520358d66155697885b878a37e7d0484bddbc1c6
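A minimal sketch of the pattern, not fairseq's exact helper: re-import the `--user-dir` module inside each spawned worker's `main()`, since modules imported in the parent are not inherited by `torch.multiprocessing.spawn` children.

```python
import importlib
import os
import sys

def import_user_module(user_dir):
    if user_dir is None:
        return
    user_dir = os.path.abspath(user_dir)
    # Make the parent importable, then import the directory as a module.
    sys.path.insert(0, os.path.dirname(user_dir))
    importlib.import_module(os.path.basename(user_dir))

def main(args):
    # Called explicitly in every entry point, so spawned workers re-import
    # the custom modules themselves.
    import_user_module(getattr(args, "user_dir", None))
    # ... build the task/model and train as usual ...
```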
- 15 Jan, 2019 1 commit
Davide Caroselli authored
Summary: The correct help message was obfuscated by the transient `ArgumentParser` used only to eagerly read the `--user-dir` flag. To reproduce, just try:

```bash
python3 train.py --help
```

Pull Request resolved: https://github.com/pytorch/fairseq/pull/446 Differential Revision: D13674731 Pulled By: myleott fbshipit-source-id: b9503a4d7ef26405be630d31c0ca02386d783031
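For context, a minimal sketch of the underlying pattern: a throwaway parser reads `--user-dir` early via `parse_known_args()`, and passing `add_help=False` keeps it from swallowing `--help` before the real parser is built.

```python
import argparse

def get_parser():
    # Throwaway parser: reads --user-dir early without consuming --help.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--user-dir", default=None)
    pre_args, _ = pre.parse_known_args()
    # ... import_user_module(pre_args.user_dir) would run here ...

    # Real parser inherits the flag and keeps a working --help.
    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument("--lr", type=float, default=0.25)
    return parser

print(get_parser().parse_args([]))  # Namespace(user_dir=None, lr=0.25)
```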
- 14 Jan, 2019 1 commit
Davide Caroselli authored
Summary: Following discussion on official fairseq (https://github.com/pytorch/fairseq/issues/438), I added the `--user-dir` option to the command line. The user can now specify a path in order to import a custom module with proprietary tasks, architectures and so on. Pull Request resolved: https://github.com/pytorch/fairseq/pull/440 Differential Revision: D13651721 Pulled By: myleott fbshipit-source-id: 38b87454487f1ffa5eaf19c4bcefa0b3b15a8f43
- 05 Jan, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/283 Pull Request resolved: https://github.com/pytorch/fairseq/pull/428 Differential Revision: D13564190 Pulled By: myleott fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
- 26 Dec, 2018 1 commit
Myle Ott authored
Summary:
- 04cc608: Add `--match-source-len` option to generate.py for sequence-tagging tasks
- 19f1a40: Add `--no-repeat-ngram-size` option to generate.py for ngram blocking

Pull Request resolved: https://github.com/pytorch/fairseq/pull/422 Differential Revision: D13548445 Pulled By: myleott fbshipit-source-id: 26d1ae83993e428fcb020dac5ae358b0e36233d9
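A minimal sketch of ngram blocking (illustrative, not fairseq's batched implementation): before emitting the next token, ban any token that would complete an n-gram already present in the hypothesis.

```python
def banned_tokens(hypothesis, n):
    """Tokens that would complete an already-seen n-gram if emitted next."""
    if n <= 1 or len(hypothesis) < n - 1:
        return set()
    prefix = tuple(hypothesis[-(n - 1):])   # the current (n-1)-token context
    banned = set()
    for i in range(len(hypothesis) - n + 1):
        if tuple(hypothesis[i:i + n - 1]) == prefix:
            banned.add(hypothesis[i + n - 1])
    return banned

print(banned_tokens([5, 3, 9, 5, 3], n=3))  # {9}: "5 3 9" occurred already
```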
- 07 Dec, 2018 1 commit
Myle Ott authored
Summary: Let's only decrease the loss scale if a large enough percentage of batches overflow. Pull Request resolved: https://github.com/pytorch/fairseq/pull/397 Differential Revision: D13355159 Pulled By: myleott fbshipit-source-id: e17dde73d34a639519b4348c013fdd19d2b314e6
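A minimal sketch of the tolerance idea, with illustrative names and defaults: lower the scale only when the fraction of recent batches that overflowed exceeds a threshold, instead of backing off on every single overflow.

```python
class LossScaler:
    def __init__(self, scale=128.0, tolerance=0.05, window=100):
        self.scale = scale
        self.tolerance = tolerance
        self.window = window
        self.overflows = []  # 1 if that batch overflowed, else 0

    def update(self, overflowed: bool):
        self.overflows.append(int(overflowed))
        self.overflows = self.overflows[-self.window:]
        # Back off only when overflows are frequent, not on every spike.
        if sum(self.overflows) / len(self.overflows) > self.tolerance:
            self.scale /= 2.0
            self.overflows.clear()  # restart the measurement window
```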
- 06 Dec, 2018 1 commit
Myle Ott authored
Summary: Not switching to Black formatting just yet, but adding fmt: off directives in case we decide to later. Pull Request resolved: https://github.com/pytorch/fairseq/pull/399 Differential Revision: D13364674 Pulled By: myleott fbshipit-source-id: a20a11a18be3d583ee30eff770278fb4bd05b93c
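What the directives look like in practice: Black leaves the region between the markers untouched, so hand-aligned blocks survive a later switch.

```python
# fmt: off
PERMUTATION = [
    0, 2,
    1, 3,
]
# fmt: on
```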
- 18 Nov, 2018 1 commit
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/374 Differential Revision: D13116074 Pulled By: myleott fbshipit-source-id: 485724cc5a40e8360d21e4bf9c35821baa0ddc57
- 07 Nov, 2018 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/352 Differential Revision: D12956930 Pulled By: myleott fbshipit-source-id: 39334a79544bac570feb04be9103269d7c1563f9
- 30 Sep, 2018 1 commit
Myle Ott authored
Summary: Changelog:
- `90f52a1`: Support loading subsets of the data on each worker with the `--fix-batches-to-gpus` flag. This should fix #217 and #266.
- `6eda0a9`: Update README for replicating the "Scaling Neural Machine Translation" paper
- `b14c7cf`: Fall back to the no_c10d backend for pytorch 0.4.1 (fixes #294)

Pull Request resolved: https://github.com/pytorch/fairseq/pull/295 Differential Revision: D10121559 Pulled By: myleott fbshipit-source-id: 41c84d0ee4cdd113544b5d3aa38ae8b23acc2c27
- 25 Sep, 2018 4 commits
Myle Ott authored
Myle Ott authored
Myle Ott authored
Sergey Edunov authored
- No more FP16Trainer; we just have an FP16Optimizer wrapper.
- Most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time.
- Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0.
- Trainer.train_step now takes a list of samples, which will allow cleaner --update-freq handling.
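A minimal sketch of the dummy-batch trick from the third bullet: the worker still runs forward/backward (keeping collective ops in sync across workers) but multiplies the loss by 0, so the dummy batch contributes no gradient.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
dummy_batch = torch.zeros(2, 4)

loss = model(dummy_batch).sum()
(loss * 0).backward()  # backward still runs, but every gradient is zero
print([p.grad.abs().sum().item() for p in model.parameters()])  # [0.0, 0.0]
```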
- 03 Sep, 2018 2 commits
Myle Ott authored
Alexei Baevski authored