1. 29 Sep, 2019 1 commit
  2. 27 Sep, 2019 1 commit
    • Levenshtein Transformer paper code · 86857a58
      Changhan Wang authored
      Summary:
      Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006)
      * Added the Levenshtein Transformer model, task and criterion classes
      * Added the iterative NAT Transformer, insertion Transformer and CMLM Transformer model classes as baselines
      * Added an option for prepending BOS to the dictionary class and the translation task class
      (A high-level decoding sketch follows at the end of this entry.)
      
      Reviewed By: myleott
      
      Differential Revision: D17297372
      
      fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7
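      For orientation, the decoding procedure in the paper alternates deletion and insertion edits until the output stops changing. The pseudocode below is a paraphrase for readers of this log, not the code added in this commit, and every method name on `model` is hypothetical:

      ```python
      def levenshtein_decode(model, src_tokens, init_tokens, max_iter=10):
          """Iterative refinement: delete tokens, open placeholder slots, fill them in."""
          y = init_tokens  # e.g. just <bos> </s>, or a draft translation to refine
          for _ in range(max_iter):
              # 1) deletion: classify every token as keep/delete, then drop the deletions
              y = model.apply_deletions(y, model.predict_deletions(src_tokens, y))
              # 2) insertion: predict how many placeholder slots to open between tokens
              y = model.insert_placeholders(y, model.predict_insertions(src_tokens, y))
              # 3) token prediction: fill each placeholder with an actual token
              y = model.fill_placeholders(y, model.predict_tokens(src_tokens, y))
              if model.no_edits_made(y):  # converged; stop early
                  break
          return y
      ```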
  3. 14 Aug, 2019 1 commit
  4. 13 Aug, 2019 1 commit
  5. 30 Jul, 2019 1 commit
  6. 22 Jul, 2019 1 commit
  7. 17 Jul, 2019 1 commit
    • Nucleus (top-P) sampling (#710) · e46b924d
      Xing Zhou authored
      Summary:
      Implement nucleus (top-p) sampling: sample from the smallest set of tokens whose cumulative probability mass exceeds p. (A minimal illustrative sketch follows at the end of this entry.)
      
      To test it:
      python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/710
      
      Test Plan:
      python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3
      
      python tests/test_sequence_generator.py
      
      python tests/test_binaries.py
      
      Reviewed By: myleott
      
      Differential Revision: D16286688
      
      Pulled By: xingz9
      
      fbshipit-source-id: 1776d21e17c4532a3d24ac75bb7e75da9acad58f
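      As a reference point, the top-p rule itself can be written in a few lines. This is a minimal illustration, not fairseq's SequenceGenerator code; the helper name and the standalone setup are assumptions:

      ```python
      import torch

      def sample_top_p(probs: torch.Tensor, p: float = 0.3) -> int:
          """Sample from the smallest set of tokens whose cumulative probability exceeds p."""
          sorted_probs, sorted_indices = torch.sort(probs, descending=True)
          cumulative = torch.cumsum(sorted_probs, dim=-1)
          # count tokens whose cumulative mass is still below p, then keep one more
          # so the kept set just exceeds p (always keep at least one token)
          cutoff = int((cumulative < p).sum().item()) + 1
          kept = sorted_probs[:cutoff]
          kept = kept / kept.sum()                          # renormalize the truncated distribution
          choice = torch.multinomial(kept, num_samples=1)   # sample within the kept set
          return int(sorted_indices[choice].item())

      # e.g. with --sampling-topp 0.3, only the few most probable tokens are ever drawn
      probs = torch.softmax(torch.randn(32000), dim=-1)
      next_token = sample_top_p(probs, p=0.3)
      ```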
  8. 11 Jun, 2019 1 commit
  9. 06 Jun, 2019 1 commit
  10. 04 Jun, 2019 1 commit
    • Fix loading XLM pretraining · 5408bc08
      Matt Le authored
      Summary: We never actually load the model parameters from an XLM model when using transformer_from_pretrained_xlm. Also, changed encoder_learned_pos from True to False.
      
      Reviewed By: liezl200
      
      Differential Revision: D15629061
      
      fbshipit-source-id: 759eadc88041eae94505477960de57dd78a99dcb
  11. 09 May, 2019 1 commit
    • expose arguments for bias_kv and zero_attn for masked_lm · 93ec8d0b
      Jingfei Du authored
      Summary: The old no_bias_kv argument for masked_lm models is not used. Split it into two arguments and exposed both.
      
      Reviewed By: myleott
      
      Differential Revision: D15266154
      
      fbshipit-source-id: 60b041f8370ca1d8869ed3402fb9a67d1cd8e0e8
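      These options map onto the add_bias_kv / add_zero_attn switches of multi-head attention. A rough sketch of exposing them as two independent flags, using torch.nn.MultiheadAttention for illustration rather than fairseq's actual masked_lm argument code:

      ```python
      import argparse
      import torch.nn as nn

      parser = argparse.ArgumentParser()
      parser.add_argument("--bias-kv", action="store_true",
                          help="add a learnable bias to the key/value sequences")
      parser.add_argument("--zero-attn", action="store_true",
                          help="append a zero vector to the key/value sequences")
      args = parser.parse_args(["--bias-kv"])

      # the two behaviours are now independent instead of hiding behind one flag
      attn = nn.MultiheadAttention(embed_dim=768, num_heads=12,
                                   add_bias_kv=args.bias_kv,
                                   add_zero_attn=args.zero_attn)
      ```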
  12. 07 May, 2019 1 commit
    • Memory-Mapped IndexedDataset implementation (#589) · a1c997bd
      Davide Caroselli authored
      Summary:
      Following discussion in https://github.com/pytorch/fairseq/issues/574:
      
      - Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder, compatible with IndexedDataset/IndexedDatasetBuilder
      - Updated scripts/read_binarized.py to support the new MMapIndexedDataset
      - Replaced the '--raw-text' and '--lazy-load' options with '--dataset-impl', and moved the option definition from custom task args to the more appropriate high-level options.add_dataset_args()
      - Also implemented utility functions in indexed_dataset: make_dataset() and dataset_exists()
      (A toy memory-mapping sketch follows at the end of this entry.)
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/589
      
      Differential Revision: D14597128
      
      Pulled By: myleott
      
      fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e
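      To illustrate the memory-mapping idea behind the new dataset (a toy example with made-up names; the real MMapIndexedDataset defines its own binary index and header format):

      ```python
      import numpy as np

      class TinyMMapDataset:
          """Reads token ids from a flat binary file without loading it into RAM."""

          def __init__(self, data_path, sizes, dtype=np.int32):
              self._data = np.memmap(data_path, dtype=dtype, mode="r")
              self._offsets = np.concatenate(([0], np.cumsum(sizes)))
              self._sizes = sizes

          def __len__(self):
              return len(self._sizes)

          def __getitem__(self, i):
              start, end = self._offsets[i], self._offsets[i + 1]
              return self._data[start:end]          # a view into the mmap, no copy

      # writing side: concatenate token arrays and remember each example's length
      examples = [np.array([5, 6, 7], dtype=np.int32), np.array([8, 9], dtype=np.int32)]
      with open("data.bin", "wb") as f:
          for ex in examples:
              f.write(ex.tobytes())

      ds = TinyMMapDataset("data.bin", sizes=np.array([3, 2]))
      print(ds[1])   # -> [8 9]
      ```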
  13. 04 May, 2019 1 commit
  14. 30 Apr, 2019 1 commit
  15. 25 Apr, 2019 3 commits
  16. 12 Mar, 2019 2 commits
    • Handle 3+ dimensional input in sequence_generator + nits · 860010e9
      Dmytro Okhonko authored
      Summary: sequence_generator assumes that the model input is a 2D tensor of longs, but it can also be something like a 3D tensor of floats; we should be able to handle this as long as the first dimension is the batch size, followed by the source lengths.
      
      Reviewed By: myleott
      
      Differential Revision: D14420044
      
      fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015
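      The shape assumption being relaxed, shown in isolation (an illustrative helper, not the actual sequence_generator change; pad index 1 is fairseq's default):

      ```python
      import torch

      def infer_src_lengths(src_tokens, pad_idx=1):
          """Batch-first inputs: (batch, src_len) token ids or (batch, src_len, feat) floats."""
          if src_tokens.dim() == 2:
              # token ids: count non-padding symbols per sentence
              return src_tokens.ne(pad_idx).long().sum(dim=1)
          # continuous features: every time step along dim 1 counts
          return torch.full((src_tokens.size(0),), src_tokens.size(1), dtype=torch.long)
      ```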
    • Adadelta optimizer · d17fa851
      Dmytro Okhonko authored
      Summary: Adding the Adadelta optimizer to fairseq as a wrapper around torch.optim.Adadelta
      
      Reviewed By: myleott
      
      Differential Revision: D14418635
      
      fbshipit-source-id: 6bf5ec008e905a4a2cbf7415e9492f5eea3ff07f
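      The wrapper pattern in miniature. Class and method names below are assumptions loosely modeled on fairseq's optimizer registry, not the code added here; the point is that the update rule itself stays in torch.optim.Adadelta:

      ```python
      import argparse
      import torch

      class AdadeltaWrapper:
          """Thin wrapper exposing torch.optim.Adadelta's hyper-parameters as CLI flags."""

          @staticmethod
          def add_args(parser):
              parser.add_argument("--lr", type=float, default=1.0)
              parser.add_argument("--adadelta-rho", type=float, default=0.9)
              parser.add_argument("--adadelta-eps", type=float, default=1e-6)
              parser.add_argument("--weight-decay", type=float, default=0.0)

          def __init__(self, args, params):
              # all of the actual optimization logic stays in the stock PyTorch optimizer
              self.optimizer = torch.optim.Adadelta(
                  params, lr=args.lr, rho=args.adadelta_rho,
                  eps=args.adadelta_eps, weight_decay=args.weight_decay)
      ```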
  17. 28 Feb, 2019 1 commit
  18. 01 Feb, 2019 1 commit
    • Support custom Dictionary implementations in 'preprocess.py' (#448) · bbb4120b
      Davide Caroselli authored
      Summary:
      The `preprocess.py` script has been refactored in order to:
      
      1. Use the `options` module for command-line argument parsing. This gives `preprocess.py` the ability to load custom modules with the `--user-dir` flag (already supported by all other binaries).
      2. Dictionary loading and building code has moved into the Task implementation. This allows custom Dictionary classes to be used during the data generation step. (A short user-dir sketch follows at the end of this entry.)
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/448
      
      Differential Revision: D13674819
      
      Pulled By: myleott
      
      fbshipit-source-id: b40648a98ed6c08284577e5ec25876e018d8c822
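      A sketch of what point 2 enables: a `--user-dir` task that substitutes its own Dictionary during data generation. The load_dictionary hook is assumed to match the refactored task interface, and all names here are illustrative:

      ```python
      from fairseq.data import Dictionary
      from fairseq.tasks import register_task
      from fairseq.tasks.translation import TranslationTask

      class MyDictionary(Dictionary):
          """A custom Dictionary, e.g. one that reserves extra special symbols."""
          def __init__(self):
              super().__init__()
              self.add_symbol("<sep>")

      @register_task("my_translation")
      class MyTranslationTask(TranslationTask):
          @classmethod
          def load_dictionary(cls, filename):
              # preprocess.py now asks the task for its dictionary instead of
              # hard-coding fairseq.data.Dictionary
              return MyDictionary.load(filename)
      ```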
  19. 30 Jan, 2019 1 commit
  20. 25 Jan, 2019 1 commit
  21. 05 Jan, 2019 1 commit
  22. 03 Oct, 2018 1 commit
  23. 25 Sep, 2018 2 commits
  24. 03 Sep, 2018 2 commits
  25. 25 Jul, 2018 1 commit
  26. 21 Jun, 2018 1 commit
  27. 15 Jun, 2018 5 commits
    • Fix bidirectional lstm · bfcc6ec7
      Myle Ott authored
    • Add FairseqTask · ff68a9ef
      Myle Ott authored
      A Task defines the data format, stores shared state (e.g., dictionaries) and provides helpers for building the model/criterion and calculating the loss.
      
      Changes:
      - Add TranslationTask and LanguageModelingTask. New tasks can be registered with the @register_task decorator (a minimal registration example follows this entry).
      - Add EpochBatchIterator to encapsulate batching and saving/restoring dataloader position
      - Remove LEFT_PAD_* constants and make them configurable per task
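      A condensed registration example. The @register_task decorator is the mechanism introduced here; the hook names below follow later versions of FairseqTask, and the toy task body is purely illustrative:

      ```python
      from fairseq.tasks import FairseqTask, register_task

      @register_task("toy_copy")
      class ToyCopyTask(FairseqTask):
          """A task owns the dictionaries and knows how to load data and build batches."""

          @staticmethod
          def add_args(parser):
              parser.add_argument("data", help="path to the data directory")

          @classmethod
          def setup_task(cls, args, **kwargs):
              # a real task would load or build its dictionaries here
              return cls(args)

          def load_dataset(self, split, **kwargs):
              # a real task would populate self.datasets[split] from files under args.data
              raise NotImplementedError
      ```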
    • Myle Ott · 16a72b4d
    • Conv lm implementation · 4c2ef2de
      alexeib authored
      This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf
      
      There are 3 modes for constructing batches:
      
      - token block: fill each sample with a specified number of tokens without regard for sentence delimiters; this is what was used for training in the paper
      - complete: fill each sample with a specified number of tokens but make sure it contains only complete sentences (i.e. if the next sentence goes over the token block limit, move it to the next sample); this was used for evaluation in the paper
      - eos: one sentence per sample (skip blank lines)
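      The first two modes can be sketched with a toy batching function (independent of fairseq's actual dataset code; `sentences` is a list of token-id lists, and the eos mode would simply yield one sentence per sample):

      ```python
      def make_blocks(sentences, block_size, mode="token block"):
          blocks, current = [], []
          for sent in sentences:
              if mode == "token block":
                  # ignore sentence boundaries; cut every block_size tokens
                  current.extend(sent)
                  while len(current) >= block_size:
                      blocks.append(current[:block_size])
                      current = current[block_size:]
              elif mode == "complete":
                  # only whole sentences; start a new block rather than split a sentence
                  if current and len(current) + len(sent) > block_size:
                      blocks.append(current)
                      current = []
                  current.extend(sent)
          if current:
              blocks.append(current)
          return blocks
      ```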
      
      some results (perplexity):

      - GCNN-13, GBW: 37.46
      - GCNN-14B, GBW: 33.88
      - GCNN-8, Wiki103: 43.76
      - GCNN-14, Wiki103: 35.66
      
      train:
      
      python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(850, 6)] * 3 + [(850,1)] + [(850,5)] * 4 + [(850,1)] + [(850,4)] * 3 + [(1024,4)] + [(2048, 4)]' --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion cross_entropy --max-tokens 1024 --max-target-positions 1024 --seed 1 --log-format json --log-interval 500
      
      eval:
      
      python eval_lm.py ~abaevski/data/wiki103 --path '/checkpoint02/abaevski/2018-04-27/lm_wiki.fp16.mxup300000.fconv.adam.lrs=reduce_lr_on_plateau.emb280.layers(850,6)*3+(850,1)+(850,5)*4+(850,1)+(850,4)*3+(1024,1)+(2048,4).lr0.0005.clp0.1.drp0.3.wd0.0.crt=cross_entropy.mxtk2048.smptk256.seed1.ngpu8/checkpoint_last.pt'
    • Fix tests · ae2585d9
      Myle Ott authored
  28. 24 May, 2018 1 commit
  29. 02 Apr, 2018 1 commit
    • Merge internal changes (#136) · d3795d6c
      Myle Ott authored
      Changes:
      - 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search
      - c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model
      - 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
      - small bugfixes for distributed training, LSTM, inverse square root LR scheduler
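      The checkpoint averaging added in c777340 amounts to a mean over matching parameter tensors. A bare-bones sketch; the real scripts/average_checkpoints.py also carries over checkpoint metadata and checks that keys match:

      ```python
      import torch

      def average_checkpoints(paths):
          """Average the parameters of several fairseq checkpoints into one state dict."""
          avg = None
          for path in paths:
              # fairseq checkpoints keep the model weights under the 'model' key
              state = torch.load(path, map_location="cpu")["model"]
              if avg is None:
                  avg = {k: v.clone().float() for k, v in state.items()}
              else:
                  for k, v in state.items():
                      avg[k] += v.float()
          return {k: v / len(paths) for k, v in avg.items()}
      ```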
  30. 27 Feb, 2018 2 commits