Commits · 4ac2c5f2cc8a8b1f221f1e8e9b7839f07c25d997 · OpenDAS / Fairseq

29 Sep, 2019 1 commit

Implementation of the WeCNLP abstract "Cross+Self-Attention for Transformer Models" (#1097) · 4ac2c5f2

Stephan Peitz authored Sep 29, 2019

Summary:
This PR implements a new attention module which combines cross-attention (encoder-decoder attention) and the decoder self-attention. This work was accepted as an abstract at WeCNLP 2019 (https://www.wecnlp.ai/wecnlp-2019).

Cross+Self-Attention reduces the amount of parameter and increases the inference speed without any degradation in translation quality.
More details can be found in the attached [abstract](https://github.com/pytorch/fairseq/files/3561282/paper.pdf)
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1097

Differential Revision: D17653168

Pulled By: myleott

fbshipit-source-id: deb834c2c78a229d7418ffbfea20ba3ce252991c

4ac2c5f2

27 Sep, 2019 1 commit

Levenshtein Transformer paper code · 86857a58

Changhan Wang authored Sep 27, 2019

Summary:
Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006)
* Added Levenshtein Transformer model, task and criterion class
* Added iterative NAT Transformer, insertion Transformer and CMLM Transformer model class for baselines
* Add an option for prepending BOS to dictionary class and translation task class

Reviewed By: myleott

Differential Revision: D17297372

fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7

86857a58

19 Sep, 2019 1 commit

Add dataset class for weighted sampling with replacement. (#861) · a8a85c26

Jerry Ma authored Sep 19, 2019

Summary:
As discussed with Naman earlier today. Weighted sampling with
replacement can be done on a per-epoch basis using `set_epoch()`
functionality, which generates the samples as a function of random seed
and epoch.

Additionally, `FairseqTask` needs to set the starting epoch for the
dataset at the very beginning of iterator construction.

Not yet implemented is the per-epoch iterator construction, which
is necessary to actually regenerate the batches for each epoch.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/861

Differential Revision: D17460687

Pulled By: jma127

fbshipit-source-id: 1c2a54f04ac96b3561c100a6fd66a9fccbe3c658

a8a85c26

19 Aug, 2019 1 commit

Small fixes · 6ce55e4b

Myle Ott authored Aug 19, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/835

Differential Revision: D16904038

Pulled By: myleott

fbshipit-source-id: 2c9d0b913f8d688297ac80fcabd905bd1397f66a

6ce55e4b

14 Aug, 2019 1 commit

Fix tests · 7c89e13f

Myle Ott authored Aug 13, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/822

Differential Revision: D16800078

Pulled By: myleott

fbshipit-source-id: b86e08e01f2fe13c64b77f1d23a5f6800f252bf7

7c89e13f

13 Aug, 2019 1 commit

Add fairseq-validate · d015d23a

Myle Ott authored Aug 13, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/765

Differential Revision: D16763357

Pulled By: myleott

fbshipit-source-id: 758b03158e486ee82786e2d5bf4e46073b50c503

d015d23a

08 Aug, 2019 1 commit

Asr initial push (#810) · 72f9364c

Dmytro Okhonko authored Aug 08, 2019

Summary:
Initial code for speech recognition task.
Right now only one ASR model added - https://arxiv.org/abs/1904.11660

unit test testing:
python -m unittest discover tests

also run model training with this code and obtained
5.0 test_clean | 13.4 test_other
on librispeech with pytorch/audio features
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/810

Reviewed By: cpuhrsch

Differential Revision: D16706659

Pulled By: okhonko

fbshipit-source-id: 89a5f9883e50bc0e548234287aa0ea73f7402514

72f9364c

01 Aug, 2019 1 commit

Fix sampling with beam>1 · 4abadbdf

Myle Ott authored Aug 01, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/792

Differential Revision: D16591987

Pulled By: myleott

fbshipit-source-id: d27c490ae75f80ded19226b8384f4776485dd694

4abadbdf

30 Jul, 2019 1 commit

Relicense fairseq under MIT license (#786) · e75cff5f

Myle Ott authored Jul 30, 2019

Summary:
The previous BSD+PATENTS license was controversial. We have been
approved to relicense fairseq under the MIT license.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/786

Differential Revision: D16560654

Pulled By: myleott

fbshipit-source-id: f78b1beb4f2895dd7b9bfc79f5f952a2bfb94034

e75cff5f

22 Jul, 2019 2 commits

Implement sparse transformer fixed attention pattern (#804) · a03fe6fa

Sara Hanson authored Jul 22, 2019

Summary:
Pull Request resolved: https://github.com/facebookresearch/pytext/pull/804

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/746

Pull Request resolved: https://github.com/pytorch/fairseq/pull/894

Adding an implementation of the sparse transformer to multi-head attention using the fixed attention pattern specified https://arxiv.org/pdf/1904.10509.pdf. The sparse_mask masks out words using -inf; after softmax, -inf becomes 0. Thus, a mask does not need to be re-calculated and re-applied when multiplying attn_weights and values.

Four inputs are added to the config: sparse, is_bidirectional, stride, expressivity. If we are using the sparse transformer, is_bidirectional, stride, and expressivity must be specified (there are defaults). If is_bidirectional is False, the mask values using the fixed attention pattern described in the paper. If is_bidirectional is True, subset one includes all values in the current stride window and a summary from every stride window--all other values are masked. Stride (L in the paper) controls the window size and expressivity (c in the paper) controls the size of the summary.

Reviewed By: borguz

Differential Revision: D16042988

fbshipit-source-id: c59166dc7cfe89187a256e4076000c2458842fd5

a03fe6fa

Move Masked LM components to legacy/ -- new ones are coming · 47fd9852

Myle Ott authored Jul 21, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/740

Differential Revision: D16377797

Pulled By: myleott

fbshipit-source-id: f7d6c8b00a77e279ea94376b1f0fcd15087eaf5f

47fd9852

17 Jul, 2019 1 commit

Nucleus (top-P) sampling (#710) · e46b924d

Xing Zhou authored Jul 17, 2019

Summary:
Implement Nucleus (top-P) sampling: sample among the smallest set of elements whose cumulative probability mass exceeds p.

To test it:
python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/710

Test Plan:
python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3

python tests/test_sequence_generator.py

python tests/test_binaries.py

Reviewed By: myleott

Differential Revision: D16286688

Pulled By: xingz9

fbshipit-source-id: 1776d21e17c4532a3d24ac75bb7e75da9acad58f

e46b924d

23 Jun, 2019 1 commit

Fix resuming training when using --memory-efficient-fp16 · efb43450

Myle Ott authored Jun 23, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/678

Differential Revision: D15956712

Pulled By: myleott

fbshipit-source-id: 5048d06ddfbec0045558a22c777a966cca1ec396

efb43450

11 Jun, 2019 1 commit

Python3.5 compat (#794) · a8f28ecb

Bairen Yi authored Jun 11, 2019

Summary:
See #467. Ping myleott to review.

This is a work-related contribution. Ping lark to review.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/794

Differential Revision: D15756816

Pulled By: myleott

fbshipit-source-id: 6dce3ff3a713bf5f60e5782bc260b2ca9d2c0a9b

a8f28ecb

06 Jun, 2019 1 commit

Change encoder_learned_pos default back to True for xlm_base · fa7791df

Matt Le authored Jun 06, 2019

Reviewed By: pipibjc

Differential Revision: D15635402

fbshipit-source-id: e92fab914de40775d7bad851420355240d822bde

fa7791df

04 Jun, 2019 1 commit

Fix loading XLM pretraining · 5408bc08

Matt Le authored Jun 04, 2019

Summary: We never actually load the model parameters from an XLM model when using tranformer_from_pretrained_xlm. Also, change encoder_learned_pos from True -> False

Reviewed By: liezl200

Differential Revision: D15629061

fbshipit-source-id: 759eadc88041eae94505477960de57dd78a99dcb

5408bc08

30 May, 2019 1 commit

Add --reset-dataloader · ffc3bb58

Myle Ott authored May 30, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/613

Differential Revision: D15541384

Pulled By: myleott

fbshipit-source-id: ef2c0b0a51cdf37af2ccff0546f524d49f87e65d

ffc3bb58

24 May, 2019 1 commit

Implement reducing footprint of average checkpoint correctly (#747) · 8ce2c35d

Yongqiang Wang authored May 24, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/747

In https://github.com/pytorch/fairseq/pull/647, checkpoint averaging
is not Implemented correctly when it comes to shared parameters. This diff
has the right Implementation and a test case to guard future change.

Reviewed By: myleott

Differential Revision: D15402943

fbshipit-source-id: 8004836d5c2571814ea54844650618008a9ee522

8ce2c35d

20 May, 2019 1 commit

Make ConcatDataset work in PytorchTranslateTask multi-path dataset loading (#730) · ee28411f

Ning Dong authored May 20, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/730

Pull Request resolved: https://github.com/pytorch/translate/pull/528

Add/modify necessary functions for ConcatDataset to work in PytorchTranslateTask and replace MultiCorpusSampledDataset which doesn't support mixed batch.

Any idea on how to implement collater here for mixed batch? Now I'm just using the collater of the first dataset.

Reviewed By: liezl200

Differential Revision: D15260872

fbshipit-source-id: 14b148c506e9f8ebf4fe60a49f95444d4123d76f

ee28411f

17 May, 2019 1 commit

Clean up sharded train iterator · 3bfbb49b

Myle Ott authored May 16, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/586

Differential Revision: D15372949

Pulled By: myleott

fbshipit-source-id: c1cf1c645e8d55fc8568f23a47c45677ac9ab1da

3bfbb49b

15 May, 2019 1 commit

Updates to model API (#561) · dffb1674

Myle Ott authored May 15, 2019

Summary:
- `FairseqModel` -> `FairseqEncoderDecoderModel`
- add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
- `encoder_out_dict` -> `encoder_out`
- rm unused `remove_head` functions
- update docs
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/561

Differential Revision: D15271142

Pulled By: myleott

fbshipit-source-id: 8e8864e399336020f0271c780598e968ff51a264

dffb1674

14 May, 2019 2 commits

rm default_key from MultiCorpusSampledDataset · 7432130e

Myle Ott authored May 14, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/575

Differential Revision: D15318004

Pulled By: myleott

fbshipit-source-id: ad918d71b1bd8074decf5ec3463dd9bc9487bbe9

7432130e

Move save/load checkpoint functions to utils · cd1e5c09

Dmytro Okhonko authored May 14, 2019

Summary:
Move `load_checkpoint`, `save_checkpoint` and `reload_train` from train.py to checkpoint_utils.py
Move `get_perplexity` from train.py to utils.py.
This will make train.py lighter and allow us to reuse all this utils functionality when fairseq is used as external library.

Reviewed By: myleott

Differential Revision: D15289607

fbshipit-source-id: 4b7c95225ac22e402bcda3497811361809110df1

cd1e5c09

09 May, 2019 1 commit

expose arguments for bias_kv and zero_attn for masked_lm · 93ec8d0b

Jingfei Du authored May 08, 2019

Summary: the old no_bias_kv argument for masked_lm models are not used. Split it into 2 arguments and expose them.

Reviewed By: myleott

Differential Revision: D15266154

fbshipit-source-id: 60b041f8370ca1d8869ed3402fb9a67d1cd8e0e8

93ec8d0b

07 May, 2019 2 commits

Memory-Mapped IndexedDataset implementation (#589) · a1c997bd

Davide Caroselli authored May 07, 2019

Summary:
Following discussion in https://github.com/pytorch/fairseq/issues/574:

 - Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder compatible with IndexedDataset/IndexedDatasetBuilder
- Update scripts/read_binarized.py to support new MMapIndexedDataset
- Option '--raw-text' and '--lazy-load' replaced with '--dataset-impl' and moved the option definition custom task args to more high-level options.add_dataset_args() (more appropriate)
- Implemented also utils functions in indexed_dataset: make_dataset(), dataset_exists()
Pull Request resolved: https://github.com/pytorch/fairseq/pull/589

Differential Revision: D14597128

Pulled By: myleott

fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e

a1c997bd

Improve init speed of TokenBlockDataset and EpochBatchIterator · e4edf27a

Myle Ott authored May 07, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/704

Differential Revision: D15221549

Pulled By: myleott

fbshipit-source-id: b0021acdc2d7792ce51421f1432e1f2bd8218f7b

e4edf27a

06 May, 2019 1 commit

allowing sharded dataset (#696) · 0add50c2

Naman Goyal authored May 06, 2019

Summary:
Co-authored-by: myleott <myleott@fb.com>

Changing `data` to be `str` with colon separated list for loading sharded datasets. This change is useful for loading large datasets that cannot fit into, memory. The large dataset can be sharded and then each shard is loaded in one epoch in roudrobin manner.

For example, if there are `5` shards of data and `10` epochs then the shards will be iterated upon `[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]`.

myleott We need to look into `translation.py` as it currently already expects a list and then concats the datasets.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/696

Differential Revision: D15214049

fbshipit-source-id: 03e43a7b69c7aefada2ca668abf1eac1969fe013

0add50c2

04 May, 2019 1 commit

Fix and generalize --temperature option (#508) · 96ac28d3

Myle Ott authored May 04, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/508

The previous version applied the temperature after the softmax. Fix that, and
also generalize so it works with other search approaches.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/694

Differential Revision: D15175160

Pulled By: myleott

fbshipit-source-id: cc87ff0e97a8a1dd37f9983163f58a8641155ab0

96ac28d3

30 Apr, 2019 1 commit

Merge internal changes (#654) · d45db804

Myle Ott authored Apr 29, 2019

Summary:
- Add --add-bos-token option to LM task
- Cleanup utils.py and options.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/654

Differential Revision: D15041794

Pulled By: myleott

fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e

d45db804

25 Apr, 2019 3 commits

Fix fairseq unittest timeouts (#667) · 57b6a6db

Liezl Puzon authored Apr 25, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/667

Use smaller models so that unittests won't timeout

Reviewed By: pipibjc

Differential Revision: D15056894

fbshipit-source-id: af9fbda6ea6e56cf82d52555620121b189e2f013

57b6a6db

XLM for NMT: option to only load encoder or decoder (#666) · 5008fd4e

Liezl Puzon authored Apr 25, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/666

Option to load the XLM weights into only the encoder or the decoder

Reviewed By: pipibjc

Differential Revision: D14881004

fbshipit-source-id: 6d0d598ea9c445ec468f71b8e855712de89a5dac

5008fd4e

Load a XLM model into transformer encoder / decoder for MT training (#629) · 8da9b1c5

Liezl Puzon authored Apr 25, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/629

Use GeLU as an alternate activation layer for ReLU.

Reviewed By: lematt1991

Differential Revision: D14689851

fbshipit-source-id: 7ec81fa34bc7bd0e1e43b337847ae932dcbf8b15

8da9b1c5

17 Apr, 2019 1 commit

Enable custom sampling strategy in MultiCorpusSampledDataset (#639) · 90d6eac2

Ning Dong authored Apr 16, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/639

Add argument sampling_func in the constructor to enable custom sampling over a list of dataset keys. The default strategy is to sample uniformly as it did previously.

Reviewed By: liezl200

Differential Revision: D14965774

fbshipit-source-id: f3285688a9ae3729c0ba12c22254c1144d0eea9e

90d6eac2

15 Apr, 2019 1 commit

Simplify and generalize utils.make_positions · e12e1d25

Myle Ott authored Apr 15, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/625

Differential Revision: D14822123

Pulled By: myleott

fbshipit-source-id: 8a263d30020588577ee02fb8c6959ff918705103

e12e1d25

10 Apr, 2019 1 commit

Back translation + denoising in MultilingualTranslation task (#620) · d7e19573

Peng-Jen Chen authored Apr 10, 2019

Summary:
- Add language token to MultilingualTranslation task
- Add back translation and denoising loss to MultilingualTranslation task
Pull Request resolved: https://github.com/pytorch/fairseq/pull/620

Reviewed By: liezl200

Differential Revision: D14756873

Pulled By: pipibjc

fbshipit-source-id: 89d668db26848fd95f446edf5923bab2113636f7

d7e19573

12 Mar, 2019 2 commits

Handle 3+ dimensional input in sequence_generator + nits · 860010e9

Dmytro Okhonko authored Mar 12, 2019

Summary: sequence_generator assumes that model input is 2d tensor of longs. But it can be something like 3d tensor of floats and we should be able to handle this as long as first dimension is batch size followed by source lengths.

Reviewed By: myleott

Differential Revision: D14420044

fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015

860010e9

Adadelta optimizer · d17fa851

Dmytro Okhonko authored Mar 12, 2019

Summary: Adding Adadelta optimizer to fairseq as wrapper around torch.optim.Adadelta

Reviewed By: myleott

Differential Revision: D14418635

fbshipit-source-id: 6bf5ec008e905a4a2cbf7415e9492f5eea3ff07f

d17fa851

28 Feb, 2019 2 commits

Move string line encoding logic from tokenizer to Dictionary (unified diff). (#541) · f296824f

Vladimir Karpukhin authored Feb 28, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/541

Just a combo of a stacked pair D14057943 & D14176011,
Made this as a separete diff cause there seems to be some issue with porting a stacked change into github repo

Differential Revision: D14251048

fbshipit-source-id: 0a47f534a69d6ab2ebe035fba40fd51748cccfb8

f296824f

Add test for mixture of experts · bc919276

Myle Ott authored Feb 28, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/543

Differential Revision: D14259481

Pulled By: myleott

fbshipit-source-id: fcb0a150b8e851cf86ea5ed1f083f56e1600588e

bc919276

26 Feb, 2019 1 commit

Add Tensorboard support (#530) · 44d27e64

Myle Ott authored Feb 25, 2019

Summary:
Enable with the `--tensorboard-logdir` option.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/530

Differential Revision: D14218430

Pulled By: myleott

fbshipit-source-id: e7a54f66f928e3bb02ae03fda09b22fa4fa7d053

44d27e64