Commits · 0a628401adb62873899b4e13ac9415c1b330ca45 · OpenDAS / Fairseq

19 Oct, 2018 1 commit

Update upgrade_state_dict in transformer.py to upgrade_state_dict_named (#317) · 0a628401

Peng-Jen Chen authored Oct 19, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/317

When upgrading the `state_dict` variable, the `upgrade_state_dict` function in TransformerEncoder/TransformerDecoder doesn't handle multiple encoders/decoders; however, D10052908 introduces exactly that case.

Before this change, we hit error message [1] when loading a checkpoint for the multilingual_transformer model in D10052908. This diff fixes it.
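
To illustrate why a name-aware upgrade helps, here is a minimal sketch using plain dicts. It is not the actual fairseq code: the key names (`old_weight`, `weight`), the module names, and this free-function signature are assumptions made up for the example.

```python
def upgrade_state_dict_named(state_dict, name):
    """Upgrade only the keys that belong to the module called `name`.

    Hypothetical stand-in for a real upgrade: it just renames
    `<name>.old_weight` to `<name>.weight`.
    """
    old_key = "{}.old_weight".format(name)
    new_key = "{}.weight".format(name)
    if old_key in state_dict:
        state_dict[new_key] = state_dict.pop(old_key)
    return state_dict


# A checkpoint holding two encoders, as a multilingual model might.
checkpoint = {
    "encoders.en.old_weight": 1.0,
    "encoders.fr.old_weight": 2.0,
}
for module_name in ("encoders.en", "encoders.fr"):
    upgrade_state_dict_named(checkpoint, module_name)
```

Because each call is told which prefix it owns, the same upgrade logic can be applied once per encoder or decoder instead of assuming a single fixed key prefix.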

Reviewed By: myleott, liezl200

Differential Revision: D10375418

fbshipit-source-id: 7104c1a463e78f3fa33d8479a37c51608be50610

17 Oct, 2018 1 commit

fix make_positions() typo (#316) · 0eea6923

James Cross authored Oct 17, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/316

This code should actually be keeping the padded positions as `padding_idx` (though note that this is on the ONNX export path, and it has no effect in the most common case when using the exported network to do un-batched inference).
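
The intended behaviour can be sketched in pure Python for a single row of token ids. This is only an illustration, not the tensor code on the ONNX export path; the convention that real tokens are numbered from `padding_idx + 1` is an assumption of this sketch.

```python
def make_positions(tokens, padding_idx):
    """Return position numbers for one row of token ids.

    Non-padding tokens are numbered from padding_idx + 1 upward;
    padding tokens keep padding_idx as their position.
    """
    positions = []
    next_pos = padding_idx + 1
    for tok in tokens:
        if tok == padding_idx:
            positions.append(padding_idx)
        else:
            positions.append(next_pos)
            next_pos += 1
    return positions
```

With un-batched inference there is no padding in the row, so the padded branch is never taken, which matches the note that the fix has no effect in the most common case.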

Reviewed By: myleott

Differential Revision: D10431872

fbshipit-source-id: 79fe4ac27cafcd4701e0f2a90e29d1b7362dc6f8

06 Oct, 2018 2 commits

Add denoising dataset for denoising autoencoder (#306) · e286243c

Liezl Puzon authored Oct 05, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/306

This uses a source dataset to generate a batch of {source: noisy source, target: original clean source} which allows us to train a denoising autoencoding component as part of a seq2seq model.
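
A minimal sketch of the {noisy source, clean target} pairing, assuming token lists and a made-up blanking noise function. `blank_noise`, `make_denoising_sample`, and the `<blank>` token are hypothetical names for illustration, not the dataset's actual API.

```python
import random


def blank_noise(tokens, blank_token="<blank>", prob=0.3, rng=None):
    """Replace each token with `blank_token` with probability `prob`."""
    rng = rng or random.Random(0)
    return [blank_token if rng.random() < prob else tok for tok in tokens]


def make_denoising_sample(clean_tokens, noise_fn=blank_noise):
    """Pair a noised copy of the sentence (source) with the clean one (target)."""
    clean = list(clean_tokens)
    return {"source": noise_fn(clean), "target": clean}


sample = make_denoising_sample(["the", "cat", "sat", "down"])
```

The model is then trained to reconstruct `sample["target"]` from `sample["source"]`.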

Reviewed By: xianxl

Differential Revision: D10078981

fbshipit-source-id: 026225984d4a97062ac05dc3a36e79b5c841fe9c

Have noising account for sentences with and without EOS (#305) · 8798a240

Liezl Puzon authored Oct 05, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/305

Previously, the noising code assumed that every sentence had an EOS which had to be excluded from noising operations (since we shouldn't drop, blank, or shuffle EOS). This logic allows the noising module to handle sentences both with and without EOS.
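
As a rough sketch of the idea (not the noising module itself), a word-dropout function can check for a trailing EOS and set it aside before noising. The function name, arguments, and the "keep at least one word" rule are assumptions of this example.

```python
import random


def word_dropout(tokens, eos, prob, rng):
    """Drop tokens with probability `prob`, never touching a trailing EOS."""
    has_eos = len(tokens) > 0 and tokens[-1] == eos
    body = tokens[:-1] if has_eos else tokens
    kept = [tok for tok in body if rng.random() >= prob]
    if not kept and body:
        # Keep at least one word so the sentence is not emptied.
        kept = [body[0]]
    # Re-attach EOS only if the input sentence had one.
    return kept + [eos] if has_eos else kept
```

The same call works for both kinds of input, for example `word_dropout(["a", "b", "</s>"], "</s>", 0.1, random.Random(0))` and `word_dropout(["a", "b"], "</s>", 0.1, random.Random(0))`.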

Reviewed By: xianxl

Differential Revision: D10114425

fbshipit-source-id: 04ec8547343eb94266bda1ac7fca3d8a1991c9f4

05 Oct, 2018 1 commit

multihead_attention: pre-transpose incremental state (#232) · 265f42b7

James Cross authored Oct 05, 2018

Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/232

Though transpose operations are essentially free during PyTorch execution, they can result in costly operations when exported to Caffe2 inference nets via ONNX tracing, especially when applied repeatedly to large tensors.

For this reason, we update `MultiheadAttention` to store its incremental state with shape (bsz, num_heads, seq_len, head_dim), that is, after transposing the projected input. This should result in non-trivially faster exported models without changing the semantics or speed of PyTorch execution.

Reviewed By: myleott

Differential Revision: D10186506

fbshipit-source-id: 8a42712423ee767ea49ed88d2a4653f900d14fba

04 Oct, 2018 1 commit

Option to remove EOS at source in backtranslation dataset · b9e29a47

Liezl Puzon authored Oct 03, 2018

Summary:
If we want our parallel data to have EOS at the end of source, we keep the EOS at the end of the generated source dialect backtranslation.
If we don't want our parallel data to have EOS at the end of source, we **remove** the EOS at the end of the generated source dialect backtranslation.

Note: we always want EOS at the end of our target / reference in parallel data so our model can learn to generate a sentence of arbitrary length. So we make sure that the original target has an EOS before returning a batch of {generated src, original target}. If the original targets in the tgt dataset don't have an EOS, we append EOS to each tgt sample before collating.
We only do this for the purpose of collating a {generated src, original tgt} batch AFTER generating the backtranslations. We don't enforce any EOS before passing tgt to the tgt->src model for generating the backtranslation. Users of this dataset are expected to format tgt dataset examples in the format that the tgt->src model expects.
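
The EOS rules above can be sketched as follows, using lists of token ids. `collate_pair` and `remove_eos_at_src` are hypothetical names chosen for this illustration, and real collation would also pad and batch the samples.

```python
def collate_pair(generated_src, original_tgt, eos, remove_eos_at_src=False):
    """Build one {source, target} sample after backtranslation.

    - source: optionally strip the trailing EOS from the generated source
    - target: always ends with EOS (appended if missing)
    """
    src = list(generated_src)
    tgt = list(original_tgt)
    if remove_eos_at_src and src and src[-1] == eos:
        src = src[:-1]
    if not tgt or tgt[-1] != eos:
        tgt.append(eos)
    return {"source": src, "target": tgt}
```

Note that the sketch only touches the target after generation; nothing is enforced on the tgt sequence that is fed to the tgt->src model.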

Reviewed By: jmp84

Differential Revision: D10157725

fbshipit-source-id: eb6a15f13c651f7c435b8db28103c9a8189845fb

03 Oct, 2018 2 commits

Fix proxying in DistributedFairseqModel · fc677c94

Myle Ott authored Oct 03, 2018

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/302

Differential Revision: D10174608

Pulled By: myleott

fbshipit-source-id: 4e2dfc76eae97afc5488f29b47e74f9897a643ff

Pass in kwargs and SequenceGenerator class to init BacktranslationDataset · f766c9a0

Liezl Puzon authored Oct 02, 2018

Summary: This generalizes BacktranslationDataset to allow us to use any SequenceGenerator class. For example, if we want to use this model in PyTorch Translate, we can pass the following to the BacktranslationDataset init: (1) a PyTorch Translate SequenceGenerator class as generator_class and (2) the appropriate args for initializing that class as kwargs.
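
The pattern can be sketched like this. Both classes below are hypothetical stand-ins written for this example: the real dataset and generator constructors take different arguments, and only the `generator_class` plus `**kwargs` idea comes from the summary above.

```python
class ReverseGenerator:
    """Toy generator: 'translates' by reversing tokens, truncated to max_len."""

    def __init__(self, models, tgt_dict, max_len=10):
        self.models = models
        self.tgt_dict = tgt_dict
        self.max_len = max_len

    def generate(self, tokens):
        return list(reversed(tokens))[: self.max_len]


class BacktranslationDatasetSketch:
    """Builds its generator from a caller-supplied class and kwargs."""

    def __init__(self, tgt_dataset, tgt_dict, backtranslation_model,
                 generator_class, **kwargs):
        self.tgt_dataset = tgt_dataset
        # Any class with a compatible constructor can be plugged in here.
        self.generator = generator_class(
            models=[backtranslation_model], tgt_dict=tgt_dict, **kwargs
        )

    def __len__(self):
        return len(self.tgt_dataset)

    def __getitem__(self, index):
        tgt = self.tgt_dataset[index]
        return {"source": self.generator.generate(tgt), "target": tgt}
```

A caller would then write something like `BacktranslationDatasetSketch(data, tgt_dict, model, generator_class=ReverseGenerator, max_len=2)`, with `max_len` forwarded untouched to the generator's constructor.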

Reviewed By: xianxl

Differential Revision: D10156552

fbshipit-source-id: 0495d825bf4727da96d0d9a40dc434135ff3486c

02 Oct, 2018 2 commits

Update README.md · df88ba95

Michael Auli authored Oct 02, 2018

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/300

Differential Revision: D10154711

Pulled By: edunov

fbshipit-source-id: 859d1ac59923b67c1547b6f7acb94f801b0c3318

Explicitly list out generation args for backtranslation dataset · 86e93f2b

Liezl Puzon authored Oct 02, 2018

Summary:
Using argparse Namespace hides the actual args that are expected and makes code harder to read.

Note the difference in style for the args list:

    def __init__(
        self,
        tgt_dataset,
        tgt_dict,
        backtranslation_model,
        unkpen,
        sampling,
        beam,
        max_len_a,
        max_len_b,
    ):

instead of

    def __init__(
        self, tgt_dataset, tgt_dict, backtranslation_model, unkpen, sampling,
        beam,  max_len_a, max_len_b,
    ):

Reviewed By: dpacgopinath

Differential Revision: D10152331

fbshipit-source-id: 6539ccba09d48acf23759996b7e32fb329b3e3f6

01 Oct, 2018 1 commit

Merge internal changes · 22e535e2

alexeib authored Sep 30, 2018

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/296

Differential Revision: D10121830

Pulled By: alexeib

fbshipit-source-id: 1b73430bdfdcb20a9a6123abfca3472a0d307b3b

30 Sep, 2018 3 commits

Merge internal changes (#295) · b87c5366

Myle Ott authored Sep 30, 2018

Summary:
Changelog:
- `90f52a1`: Support loading subsets of the data on each worker with the `--fix-batches-to-gpus` flag. This should fix #217 and #266.
- `6eda0a9`: Update README for replicating the "Scaling Neural Machine Translation" paper
- `b14c7cf`: Fallback to no_c10d backend for pytorch 0.4.1 (fixes #294)
Pull Request resolved: https://github.com/pytorch/fairseq/pull/295

Differential Revision: D10121559

Pulled By: myleott

fbshipit-source-id: 41c84d0ee4cdd113544b5d3aa38ae8b23acc2c27

fbshipit-source-id: 17992f6a5908f078942544b769eda7a340a5e359 · 0bc5c2e9

myleott authored Sep 30, 2018

fbshipit-source-id: 6a835d32f9dc5e0de118f1b46d365d0e0cc85e11 · f8377a70

myleott authored Sep 30, 2018

25 Sep, 2018 18 commits
- Online backtranslation module · 864b89d0
  Myle Ott authored Sep 25, 2018
```
Co-authored-by: liezl200 <lie@fb.com>
```
- Add back secondary set · a4fe8c99
  Sergey Edunov authored Sep 24, 2018
  
- Merge internal changes · 535ca991
  Myle Ott authored Sep 24, 2018
  
- fix issue with truncated dict · 28069cf4
  alexeib authored Sep 21, 2018
  
- core changes to support latte collab · cfd2a3a0
  Alexei Baevski authored Sep 20, 2018
  
- Better support for various c10d API changes · fbe8ce65
  Myle Ott authored Sep 17, 2018
  
- Fix type of c10d bucket size · 78071e0f
  Myle Ott authored Sep 12, 2018
  
- Parallel preprocessing · 862cad11
  Sergey Edunov authored Sep 12, 2018
  
- Fix adaptive loss logging · ee46c63b
  Sergey Edunov authored Sep 10, 2018
  
- Add unit test to verify reproducibility after reloading checkpoints · e775877f
  Myle Ott authored Sep 09, 2018
  
- Fix validation loss · 83e08b6f
  Myle Ott authored Sep 09, 2018
  
- Pass encoder_input to generator, rather than src_tokens/src_lengths. · bfeb7732
  Stephen Roller authored Sep 08, 2018
  
- Update LM test with --no-c10d · 8bd8ec8f
  Myle Ott authored Sep 07, 2018
  
- Disable c10d for AdaptiveLoss · f66e9cb5
  Myle Ott authored Sep 06, 2018
  
- Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0 · 1082ba35
  Sergey Edunov authored Sep 06, 2018
```
- no more FP16Trainer, we just have an FP16Optimizer wrapper
- most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
- Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0
- Trainer.train_step now takes a list of samples, which will allow cleaner --update-freq
```
- Revert sequence generator changes · 311d2c6c
  Myle Ott authored Sep 06, 2018
  
- Sequence generator bug fix. · 0714080b
  Stephen Roller authored Sep 05, 2018
  
- Generator: net_input instead of manual src_tokens. · e6d45d5c
  Stephen Roller authored Sep 05, 2018
  
24 Sep, 2018 2 commits
- Merge pull request #287 from pytorch/oss-master · 25524f19
  Sergey Edunov authored Sep 24, 2018
```
Update readme with WMT'18 model (#433)
```
- Update readme with WMT'18 model (#433) · 86b5cfe4
  Sergey Edunov authored Sep 24, 2018
  
18 Sep, 2018 4 commits
- Merge pull request #279 from pytorch/oss-master · 5d150856
  Sergey Edunov authored Sep 17, 2018
```
Oss master
```
- Readme fix · 74b3f1e9
  Sergey Edunov authored Sep 17, 2018
  
- Fix docs · fe2d1581
  Sergey Edunov authored Sep 17, 2018
  
- Fix readme · 5d944b06
  Sergey Edunov authored Sep 17, 2018
  
07 Sep, 2018 1 commit
- modified stories readme to include sample preprocessing code to split stories to 1k tokens · 5d00e8ee
  Angela Fan authored Sep 07, 2018
  
04 Sep, 2018 1 commit
- Update documentation · 4a47b889
  Myle Ott authored Sep 03, 2018
  