Commits · 34726d5612322b79b621e52f7f6fe47a6716eb65 · OpenDAS / Fairseq

02 May, 2019 4 commits

Move distributed_init into DistributedFairseqModel (#687) · 34726d56

Myle Ott authored May 02, 2019

Summary:
This should make rendezvous happen as lazily as possible.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/687
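Making rendezvous lazy means process-group setup runs only when a distributed wrapper is actually built, not at program start. Below is a minimal sketch of that pattern. `LazyDistributedWrapper` and `init_fn` are hypothetical names. In real code, `init_fn` would do something like `torch.distributed.init_process_group`.

```python
from typing import Callable


class LazyDistributedWrapper:
    """Hypothetical wrapper that triggers distributed init on construction."""

    _initialized = False  # Class-level flag: init runs at most once per process.

    def __init__(self, model: object, init_fn: Callable[[], None]):
        if not LazyDistributedWrapper._initialized:
            init_fn()  # Rendezvous happens here, as late as possible.
            LazyDistributedWrapper._initialized = True
        self.model = model


calls = []
print(len(calls))  # 0: nothing has happened before a wrapper is built
LazyDistributedWrapper("model_a", lambda: calls.append("init"))
LazyDistributedWrapper("model_b", lambda: calls.append("init"))
print(calls)  # ['init']
```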

Differential Revision: D15151145

Pulled By: myleott

fbshipit-source-id: d70816a85414c5d509a6b12e2b339b4736db2c88

Validate on all sets based on --save-interval-updates · fb18be00

Myle Ott authored May 02, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/693

Differential Revision: D15174831

fbshipit-source-id: 98688b1269ead5694e5116659ff64507d3c0d1c0

Fix inconsistent gradient check · 4a30a5f6

Myle Ott authored May 02, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/692

Differential Revision: D15174954

fbshipit-source-id: 1a7bff9aeed3e2cc658577be9d79e8c9f72314c2

Make CTC work with more encoder-only models · ffc9c8cc

Kritika Singh authored May 01, 2019

Summary:
Changes include:
1. Added get_normalized_probabilities to the encoder-only base class FairseqEncoderModel
2. Made CTCCriterion work for both batch_first (LSTMSubsampleEncoderModel) and batch_second (LSTMEncoderOnly) encoder types
3. Added tests for different encoder and CTC combinations.
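The batch_first/batch_second distinction in item 2 is about which axis of the encoder output holds time. PyTorch's `torch.nn.CTCLoss` expects time-major log-probabilities shaped (T, N, C). A criterion can therefore normalize the layout first. The sketch below assumes encoder outputs are rectangular nested lists. The helper `to_time_major` is hypothetical and is not the CTCCriterion code.

```python
from typing import List, TypeVar

T = TypeVar("T")


def to_time_major(encoder_out: List[List[T]], batch_first: bool) -> List[List[T]]:
    """Return encoder output laid out as (time, batch, ...).

    Hypothetical helper: batch-first output (batch, time, ...) has its first
    two axes swapped; time-major output is returned unchanged.
    """
    if not batch_first:
        return encoder_out
    # zip(*rows) swaps the first two axes of a rectangular nested list.
    return [list(step) for step in zip(*encoder_out)]


# Two utterances (batch=2), three frames each (time=3).
batch_first_out = [[1, 2, 3], [4, 5, 6]]
print(to_time_major(batch_first_out, batch_first=True))  # [[1, 4], [2, 5], [3, 6]]
```

With real tensors, the same step is a single `transpose(0, 1)`.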

TODO:
CTC still doesn't work for VGGLSTMEncoderModel so I have disabled that. Will debug and send out fix in another diff.

Reviewed By: jay-mahadeokar

Differential Revision: D15158818

fbshipit-source-id: acb484bad705c937d676d2c3dcde3e3562d68ed9

01 May, 2019 5 commits

Make MultiCorpusSampledDataset and IndexedCachedDataset Picklable · e112d501

Myle Ott authored May 01, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/691

Differential Revision: D15172543

Pulled By: myleott

fbshipit-source-id: f2b626ff7f5e95f0ddc83c105af7ab9d092a135e

add ConcatDataset support for XLM · 91c78477

taineleau authored May 01, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/684

Differential Revision: D15154631

Pulled By: myleott

fbshipit-source-id: 5e7dd9651d9ed239b60c51b9a11d08c80307d3ba

Support dataset upsampling / relative ratio in PytorchTranslateTask (#494) · ff74ca94

Ning Dong authored May 01, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/494

Pull Request resolved: https://github.com/pytorch/fairseq/pull/657

Library side change split from D14924942

Added 2 arguments for load_dataset in PytorchTranslateTask
1. dataset_upsampling. A nested dictionary {direction:{dataset: upsampling_ratio}}. An upsampling_ratio larger than one means that the bitext is observed more often than it is actually present in the combined bitext and synthetic training corpus.

2. dataset_relative_ratio. A tuple (dataset, ratio). The ratio represents how frequently a certain dataset is sampled relative to the rest of the corpora map.

At most one of them can be specified.
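One way to read the two options is as per-corpus sampling weights. The sketch below assumes each corpus starts with a weight equal to its size. Under that assumption, an upsampling ratio multiplies a corpus's weight, and a relative ratio `(name, r)` sets that corpus's weight to r times the combined weight of all the others. The function `sampling_weights` and this interpretation are assumptions, not the PytorchTranslateTask code. The direction level of the nested dictionary is omitted for brevity.

```python
from typing import Dict, Optional, Tuple


def sampling_weights(
    sizes: Dict[str, int],
    upsampling: Optional[Dict[str, float]] = None,
    relative_ratio: Optional[Tuple[str, float]] = None,
) -> Dict[str, float]:
    """Hypothetical: turn corpus sizes plus one of the two options into weights."""
    if upsampling is not None and relative_ratio is not None:
        raise ValueError("at most one of upsampling / relative_ratio may be set")
    weights = {name: float(n) for name, n in sizes.items()}
    if upsampling is not None:
        # A ratio > 1 means the corpus is seen more often than its raw size implies.
        for name, ratio in upsampling.items():
            weights[name] *= ratio
    elif relative_ratio is not None:
        name, ratio = relative_ratio
        rest = sum(w for other, w in weights.items() if other != name)
        # The chosen corpus is sampled `ratio` times as often as all others combined.
        weights[name] = ratio * rest
    return weights


sizes = {"bitext": 100, "synthetic": 400}
print(sampling_weights(sizes, upsampling={"bitext": 2.0}))
# {'bitext': 200.0, 'synthetic': 400.0}
print(sampling_weights(sizes, relative_ratio=("bitext", 1.0)))
# {'bitext': 400.0, 'synthetic': 400.0}
```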

Reviewed By: liezl200

Differential Revision: D15041293

fbshipit-source-id: 92daad29895c234e26d1b19f121106118a3957ad

Better OOM recovery · da9e493e

Myle Ott authored May 01, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/685

Differential Revision: D15154647

Pulled By: myleott

fbshipit-source-id: 36c72359755192a4a53367e19f8dd006791d483c

Add default noising argument in WordNoiser initialization (#664) · 37420855

Ning Dong authored Apr 30, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/664

Previously, the noising arguments (dropout_prob for WordDropout and max_shuffle_distance for WordShuffle) were only passed in noising(), so they could not be customized in NoisingDataset.

Now a default argument is added to the initializer, so the value can be specified at construction.
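The pattern here is a constructor default that a per-call argument can still override. Below is a minimal sketch, assuming a word-dropout noiser over token lists. The class `WordDropoutSketch` is illustrative and is not the fairseq `WordDropout` implementation.

```python
import random
from typing import List, Optional


class WordDropoutSketch:
    """Hypothetical noiser: drop each word with probability dropout_prob."""

    def __init__(self, default_dropout_prob: float = 0.1, seed: int = 0):
        # Default chosen at construction, so a dataset wrapper can configure it once.
        self.default_dropout_prob = default_dropout_prob
        self.rng = random.Random(seed)

    def noising(self, words: List[str], dropout_prob: Optional[float] = None) -> List[str]:
        # An explicit per-call value still wins over the constructor default.
        p = self.default_dropout_prob if dropout_prob is None else dropout_prob
        return [w for w in words if self.rng.random() >= p]


noiser = WordDropoutSketch(default_dropout_prob=0.0)
print(noiser.noising(["a", "b", "c"]))                    # ['a', 'b', 'c']
print(noiser.noising(["a", "b", "c"], dropout_prob=1.0))  # []
```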

Reviewed By: liezl200

Differential Revision: D15071632

fbshipit-source-id: 59a9bf5a5e6d03c1e74f1b31c1927e221cb11dfa

30 Apr, 2019 6 commits

adding polynomial lr scheduler (#683) · 9421e978

Naman Goyal authored Apr 30, 2019

Summary:
Co-authored-by: jingfeidu <jingfeidu@fb.com>

The implementation is by Jingfei Du from the branch "bigbert". It was copied over to this CR to get it merged in isolation, since the other changes already seem to be in master.

**Small changes from original:**
Added the following line in `__init__`, as discovered by myleott:

```
self.optimizer.set_lr(self.warmup_factor * self.lr)
```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/683
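For reference, polynomial decay with linear warmup can be written as a pure function of the update count. This is a sketch under assumed parameter names (`warmup_updates`, `total_updates`, `end_lr`, `power`). It illustrates the schedule's shape rather than reproducing the scheduler's exact code. The `__init__` line quoted above presumably exists so that the very first update already uses the warmed-up rate, rather than the optimizer's configured peak rate.

```python
def polynomial_lr(
    num_updates: int,
    peak_lr: float,
    warmup_updates: int,
    total_updates: int,
    end_lr: float = 0.0,
    power: float = 1.0,
) -> float:
    """Sketch: linear warmup to peak_lr, then polynomial decay to end_lr."""
    if warmup_updates > 0 and num_updates <= warmup_updates:
        # Linear warmup: the warmup factor grows from 0 to 1.
        return peak_lr * num_updates / warmup_updates
    if num_updates >= total_updates:
        return end_lr
    # Fraction of the decay phase still remaining, from 1 down to 0.
    pct_remaining = 1 - (num_updates - warmup_updates) / (total_updates - warmup_updates)
    return (peak_lr - end_lr) * pct_remaining ** power + end_lr


for step in (0, 5, 10, 55, 100, 150):
    print(step, polynomial_lr(step, peak_lr=1.0, warmup_updates=10, total_updates=100))
# lr per step: 0.0, 0.5, 1.0, 0.5, 0.0, 0.0
```

With `power=1.0` the decay is linear. Larger powers make the rate fall faster early in the decay phase.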

Reviewed By: myleott

Differential Revision: D15149628

Pulled By: myleott

fbshipit-source-id: 5f715611182cdd111e636c66d5f24aa88fa03e29

Merge internal changes · 6b8cb7db

Myle Ott authored Apr 30, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/682

Differential Revision: D15147735

Pulled By: myleott

fbshipit-source-id: 4a5f12c0b24591f964fe1f465be3775a67578e79

Add rm_pt.py helper script for removing checkpoint files · f5e52c19

Myle Ott authored Apr 30, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/681

Differential Revision: D15147107

fbshipit-source-id: 4452c98059586a4d748868a7659329285a76d5ef

Merge internal changes (#654) · d45db804

Myle Ott authored Apr 29, 2019

Summary:
- Add --add-bos-token option to LM task
- Cleanup utils.py and options.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/654

Differential Revision: D15041794

Pulled By: myleott

fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e

Add more details in error message when sentence length > max tokens (#672) · 89a69616

Liezl Puzon authored Apr 29, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/672

title

Reviewed By: jmp84, pipibjc

Differential Revision: D15094977

fbshipit-source-id: c24e4ec9355b53e1585ac4da32809f1c339c7364

Fix upgrade_state_dict for XLM Transformer sentence encoder (#680) · 121877f5

Liezl Puzon authored Apr 29, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/680

Some embedding names were renamed, but this one was missed.

So far I've only seen this affect our runs when continuing training. If you encounter any errors when continuing training from an XLM save_dir, rebasing past this diff (or patching it in and canarying) should fix the problem.

Reviewed By: pipibjc

Differential Revision: D15137463

fbshipit-source-id: c72067f16aaf1ba2b8286938bd25a19b70ae8712

29 Apr, 2019 2 commits

Update README.md (#679) · ace8f724

Myle Ott authored Apr 29, 2019

Summary:
Add missing backslash.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/679

Differential Revision: D15122270

Pulled By: myleott

fbshipit-source-id: fbdfde648051294eaa9f7a4e0c4cfbc57491a718

Update comments and citations · 849605a0

Myle Ott authored Apr 29, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/676

Differential Revision: D15114128

Pulled By: myleott

fbshipit-source-id: b11dde77b2f2610d33649101aea03fb5a3eeb56a

27 Apr, 2019 2 commits

Add args and sys.argv to tensorboard (#673) · 257a3b89

Noe Casas authored Apr 27, 2019

Summary:
Log fairseq's `args` and `sys.argv` in tensorboard to easily identify run hyperparameters from within tensorboard.

The idea was suggested in https://twitter.com/Thom_Wolf/status/1106300583835766786
Pull Request resolved: https://github.com/pytorch/fairseq/pull/673
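A run's hyperparameters can be turned into a single text blob for a text panel. Below is a minimal sketch, assuming an `argparse.Namespace` plus the raw `sys.argv` list. The helper `run_description` is hypothetical. With tensorboardX installed, the result could then be passed to `SummaryWriter.add_text`.

```python
import argparse
from typing import List


def run_description(args: argparse.Namespace, argv: List[str]) -> str:
    """Render hyperparameters and the command line as one text blob."""
    lines = [f"{key} = {value!r}" for key, value in sorted(vars(args).items())]
    lines.append("argv: " + " ".join(argv))
    return "\n".join(lines)


parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.0005)
parser.add_argument("--arch", default="transformer")
args = parser.parse_args(["--lr", "0.001"])
print(run_description(args, ["train.py", "--lr", "0.001"]))
# In a training script, argv would be sys.argv and the text would go to a
# SummaryWriter, e.g. writer.add_text("args", text)  (not executed here).
```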

Differential Revision: D15114159

Pulled By: myleott

fbshipit-source-id: d48133a7f629dffe984836712390c317916cf413

Add small comments for MonolingualDataset and TokenBlockDataset · 8bf8399d

Myle Ott authored Apr 27, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/669

Differential Revision: D15114160

Pulled By: myleott

fbshipit-source-id: 64f4a8154c8931ddbbe459d4d4a54c46680ad6b6

26 Apr, 2019 1 commit

Passing kwargs in setup_task in fairseq_task (#670) · f701aa8c

Mohammad Sadegh Rasooli authored Apr 26, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/670

The pytorch-translate task needs to use extra arguments (such as vocabulary objects). Passing kwargs makes it possible to have extra arguments in setup_task.
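The change amounts to forwarding keyword arguments from a generic entry point to the task's `setup_task` classmethod. Below is a minimal sketch, assuming a registry-style dispatcher. `TASK_REGISTRY`, `TranslateLikeTask`, and the `src_dict` keyword are hypothetical names.

```python
from typing import Any, Dict, Optional, Type


class BaseTask:
    def __init__(self, args: Dict[str, Any]):
        self.args = args

    @classmethod
    def setup_task(cls, args: Dict[str, Any], **kwargs: Any) -> "BaseTask":
        return cls(args)


class TranslateLikeTask(BaseTask):
    """Hypothetical task that wants a prebuilt vocabulary passed in."""

    def __init__(self, args: Dict[str, Any], src_dict: Optional[list] = None):
        super().__init__(args)
        self.src_dict = src_dict

    @classmethod
    def setup_task(cls, args: Dict[str, Any], **kwargs: Any) -> "TranslateLikeTask":
        # Extra objects such as vocabularies arrive through **kwargs.
        return cls(args, src_dict=kwargs.get("src_dict"))


TASK_REGISTRY: Dict[str, Type[BaseTask]] = {"translate_like": TranslateLikeTask}


def setup_task(args: Dict[str, Any], **kwargs: Any) -> BaseTask:
    # Forward kwargs unchanged so task-specific extras reach the task class.
    return TASK_REGISTRY[args["task"]].setup_task(args, **kwargs)


task = setup_task({"task": "translate_like"}, src_dict=["<pad>", "hello"])
print(task.src_dict)  # ['<pad>', 'hello']
```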

Reviewed By: akinh, pipibjc

Differential Revision: D15086810

fbshipit-source-id: 555f7976020eaac1febb8226f5a0055af0407ea6

25 Apr, 2019 6 commits

Fix fairseq unittest timeouts (#667) · 57b6a6db

Liezl Puzon authored Apr 25, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/667

Use smaller models so that unittests won't time out.

Reviewed By: pipibjc

Differential Revision: D15056894

fbshipit-source-id: af9fbda6ea6e56cf82d52555620121b189e2f013

XLM for NMT: option to only load encoder or decoder (#666) · 5008fd4e

Liezl Puzon authored Apr 25, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/666

Option to load the XLM weights into only the encoder or the decoder
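Loading pretrained weights into only one side can be done by filtering a state dict by key prefix and renaming the survivors. The sketch below assumes the pretrained weights are a flat `{name: value}` mapping. The helper `load_into_component` and the key prefixes are hypothetical.

```python
from typing import Dict


def load_into_component(
    pretrained: Dict[str, float], src_prefix: str, dst_prefix: str
) -> Dict[str, float]:
    """Hypothetical: rename keys under src_prefix to dst_prefix, drop the rest."""
    return {
        dst_prefix + key[len(src_prefix):]: value
        for key, value in pretrained.items()
        if key.startswith(src_prefix)
    }


xlm_state = {"sentence_encoder.embed_tokens.weight": 0.1, "lm_head.bias": 0.2}
# Encoder-only: map the pretrained encoder weights onto the MT encoder.
print(load_into_component(xlm_state, "sentence_encoder.", "encoder."))
# {'encoder.embed_tokens.weight': 0.1}
```

To load into the decoder instead, pass `"decoder."` as `dst_prefix`. With real PyTorch modules, such a partial dict could then be applied with `load_state_dict(..., strict=False)`.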

Reviewed By: pipibjc

Differential Revision: D14881004

fbshipit-source-id: 6d0d598ea9c445ec468f71b8e855712de89a5dac

Load an XLM model into transformer encoder / decoder for MT training (#629) · 8da9b1c5

Liezl Puzon authored Apr 25, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/629

Use GeLU as an alternative activation layer to ReLU.

Reviewed By: lematt1991

Differential Revision: D14689851

fbshipit-source-id: 7ec81fa34bc7bd0e1e43b337847ae932dcbf8b15

Add gelu and gelu_fast as possible activation functions (#653) · 8500bdd0

Liezl Puzon authored Apr 25, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/653

After this diff, you can train a transformer model with --activation-fn 'relu', 'gelu', or 'gelu_fast'

gelu_fast is the default implementation in https://github.com/hendrycks/GELUs/blob/master/mnist_fcn.py#L72-L77
gelu is the alternate implementation in https://github.com/hendrycks/GELUs/blob/master/mnist_fcn.py#L72-L77 and the default implementation in https://github.com/facebookresearch/XLM
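The two variants differ only in how the Gaussian CDF is computed. `gelu` uses the exact error function. This sketch assumes, as in the linked reference code, that `gelu_fast` is the tanh approximation. It uses scalars from Python's `math` module, whereas the fairseq versions operate on tensors.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_fast(x: float) -> float:
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


for x in (-2.0, 0.0, 1.0, 3.0):
    # The two agree closely; the approximation error is well below 1e-2 here.
    print(x, round(gelu(x), 4), round(gelu_fast(x), 4))
```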

Reviewed By: pipibjc

Differential Revision: D14966006

fbshipit-source-id: 94e95fb99bd548ba47cf23b4999265c7b6833fc1

Added link to blog post (#662) · d8d03745

ankur6ue authored Apr 24, 2019

Summary:
Added link to blog post about incremental decoder in the FairseqIncrementalDecoder class description.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/662

Differential Revision: D15077845

Pulled By: myleott

fbshipit-source-id: f23294721739600e14feb2cca4ece95f2b968f44

added link to sample stories · 5ecedd69

Angela Fan authored Apr 24, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/665

Differential Revision: D15077853

Pulled By: huihuifan

fbshipit-source-id: 2a0d3f6236ae002579f1ee72735d6d8000b8e6b6

24 Apr, 2019 1 commit

Don't reload best validation loss when using --reset-optimizer · 0020477a

Myle Ott authored Apr 24, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/661

Differential Revision: D15068312

Pulled By: myleott

fbshipit-source-id: 1216835fd4c7f83ea5e350bff83901c93ac57447

22 Apr, 2019 2 commits

Fix generation with --no-early-stop (#627) · fa52d202

Max Ryabinin authored Apr 22, 2019

Summary:
Because the size of `unfinalized_scores` equals the current `bsz` rather than the initial batch size, we need to index it by `unfin_idx` instead of `sent` in `is_finished`.
Fixes #588.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/627
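The bug is an indexing mismatch. Once finished sentences are dropped, the active batch is smaller than the original. Per-sentence data sized to the active batch must therefore be indexed by position in that batch, not by original sentence id. This is a toy illustration with plain lists. The variable names echo the description above, but the code is not the sequence generator itself.

```python
# Original batch had 5 sentences; sentences 1 and 3 have already finished.
original_ids = [0, 1, 2, 3, 4]
finished = {1, 3}

# Active (unfinalized) sentences, in order; bsz shrinks from 5 to 3.
active_ids = [sent for sent in original_ids if sent not in finished]
bsz = len(active_ids)

# One score per *active* sentence, so this list has length bsz, not 5.
unfinalized_scores = [-1.2, -0.7, -2.5]
assert len(unfinalized_scores) == bsz

for unfin_idx, sent in enumerate(active_ids):
    # Correct: index by position in the shrunken batch.
    score = unfinalized_scores[unfin_idx]
    print(sent, score)

# Indexing by the original id would raise IndexError for sent == 4
# and silently read sentence 4's score for sent == 2.
```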

Differential Revision: D15034641

Pulled By: myleott

fbshipit-source-id: 2638e68e877ae01256cac7d8e69b5b7fec8f7017

reduce memory footprint for average_checkpoints (#647) · d63477e1

Yongqiang Wang authored Apr 21, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/647

The current implementation of average_checkpoints requires loading all
the model parameters into memory and then doing the averaging. Averaging large
models (e.g., transformer) over a large number of checkpoints (e.g., >50)
may require over 100GB of memory.

Loading all the parameters is not necessary, as we know the number of models in advance.
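Because the number of checkpoints is known up front, the average can be accumulated one checkpoint at a time. Only one checkpoint plus a running sum is held in memory. This sketch uses plain Python floats in place of tensors. `load_checkpoint` is a hypothetical loader supplied by the caller.

```python
from typing import Callable, Dict, List


def average_checkpoints(
    paths: List[str], load_checkpoint: Callable[[str], Dict[str, float]]
) -> Dict[str, float]:
    """Average parameters across checkpoints while holding one at a time."""
    num_models = len(paths)  # Known in advance, so we can scale at the end.
    running_sum: Dict[str, float] = {}
    for path in paths:
        params = load_checkpoint(path)  # Only this checkpoint is loaded now.
        for name, value in params.items():
            running_sum[name] = running_sum.get(name, 0.0) + value
        del params  # Drop our reference before loading the next one.
    return {name: total / num_models for name, total in running_sum.items()}


fake_disk = {"a.pt": {"w": 1.0}, "b.pt": {"w": 2.0}, "c.pt": {"w": 6.0}}
print(average_checkpoints(list(fake_disk), fake_disk.__getitem__))  # {'w': 3.0}
```

Dividing each checkpoint by `num_models` as it is loaded is an equivalent variant.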

Reviewed By: skritika

Differential Revision: D15027513

fbshipit-source-id: 0afe37c9a031a9ab0f1e78844a37be49ec5f76f1

17 Apr, 2019 3 commits

Open BlockPairDataset for MaskedLMData to work (#641) · d2f3007c

Kartikay Khandelwal authored Apr 17, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/641

Fix breaking import

Reviewed By: pipibjc

Differential Revision: D14978454

fbshipit-source-id: 7b43152cb30100881e9991ead871531ee3f60e07

Enable custom sampling strategy in MultiCorpusSampledDataset (#639) · 90d6eac2

Ning Dong authored Apr 16, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/639

Add argument sampling_func to the constructor to enable custom sampling over a list of dataset keys. The default strategy is to sample uniformly, as before.
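A sampling function in this sense only needs to map a list of dataset keys to one chosen key. The sketch below shows a uniform default and a weighted alternative. The function names and the exact signature (a list of keys in, one key out) are assumptions, not the MultiCorpusSampledDataset code.

```python
import random
from typing import Callable, Dict, List


def uniform_sampler(keys: List[str]) -> str:
    # Default behavior: every corpus is equally likely.
    return random.choice(keys)


def make_weighted_sampler(weights: Dict[str, float]) -> Callable[[List[str]], str]:
    """Build a hypothetical sampling_func that favors corpora by weight."""

    def sample(keys: List[str]) -> str:
        return random.choices(keys, weights=[weights[k] for k in keys], k=1)[0]

    return sample


random.seed(0)
keys = ["news", "web"]
sampler = make_weighted_sampler({"news": 3.0, "web": 1.0})
draws = [sampler(keys) for _ in range(1000)]
print(draws.count("news") / len(draws))  # roughly 0.75
```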

Reviewed By: liezl200

Differential Revision: D14965774

fbshipit-source-id: f3285688a9ae3729c0ba12c22254c1144d0eea9e

Black formatting for multi_corpus_sampled_dataset.py (#638) · 17cef3f6

Ning Dong authored Apr 16, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/638

RT

Reviewed By: liezl200

Differential Revision: D14967268

fbshipit-source-id: 2da361497743d90a841fdbf2a50085136c70b468

16 Apr, 2019 1 commit

Open Source MLM Implementation in Fairseq (#635) · 8776928c

Kartikay Khandelwal authored Apr 16, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/635

Adding a task and relevant models, datasets and criteria needed for training Cross-lingual Language Models similar to Masked Language Model used in XLM (Lample and Conneau, 2019 - https://arxiv.org/abs/1901.07291).

Reviewed By: liezl200

Differential Revision: D14943776

fbshipit-source-id: 3e416a730303d1dd4f5b92550c78db989be27073

15 Apr, 2019 3 commits

Better distributed init · 303b95ce

Myle Ott authored Apr 15, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/615

Differential Revision: D14933742

Pulled By: myleott

fbshipit-source-id: c2c20425875743c89bbc2ac564a2fbb6ff4958b2

fix checkpoint timer (#634) · de8aeab5

freewym authored Apr 15, 2019

Summary:
If args.keep_interval_updates or args.keep_last_epochs > 0, `checkpoints` would refer to the list of checkpoint files to be removed, which can be empty, so the logging code was moved to the right position.
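The fix keeps two lists apart that had shared one name. One is the checkpoints just written, used for logging. The other is the old checkpoints selected for deletion, which may be empty. The sketch below shows that ordering, assuming epoch checkpoints are named `checkpoint<epoch>.pt`. The function `save_then_prune` and its arguments are hypothetical.

```python
import re
from typing import Callable, List


def save_then_prune(
    written: List[str],
    existing: List[str],
    keep_last_epochs: int,
    log: Callable[[str], None],
) -> List[str]:
    """Log what was saved, then return old epoch checkpoints to delete."""
    if written:
        # Log from the list of files just written, not from the removal list.
        log(f"saved checkpoint {written[0]}")
    if keep_last_epochs <= 0:
        return []
    epoch_ckpts = sorted(
        (p for p in existing if re.fullmatch(r"checkpoint\d+\.pt", p)),
        key=lambda p: int(re.findall(r"\d+", p)[0]),
        reverse=True,
    )
    # A separate name: this list is legitimately empty when nothing is old enough.
    to_remove = epoch_ckpts[keep_last_epochs:]
    return to_remove


messages: List[str] = []
files = ["checkpoint1.pt", "checkpoint2.pt", "checkpoint3.pt", "checkpoint_last.pt"]
print(save_then_prune(["checkpoint3.pt"], files, 2, messages.append))  # ['checkpoint1.pt']
print(messages)  # ['saved checkpoint checkpoint3.pt']
```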
Pull Request resolved: https://github.com/pytorch/fairseq/pull/634

Differential Revision: D14933655

Pulled By: myleott

fbshipit-source-id: 68182ee99d9701e1536833d31e0a7c5d2eb2d679

Simplify and generalize utils.make_positions · e12e1d25

Myle Ott authored Apr 15, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/625

Differential Revision: D14822123

Pulled By: myleott

fbshipit-source-id: 8a263d30020588577ee02fb8c6959ff918705103

12 Apr, 2019 1 commit

Fix hybrid transformer state dict update after encoder layernorm rename (#633) · a47630e1

Liezl Puzon authored Apr 12, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/633

Pull Request resolved: https://github.com/pytorch/translate/pull/456

This diff makes it easier to upgrade the state dict for components that use TransformerEncoderLayer

Reviewed By: jhcross

Differential Revision: D14916941

fbshipit-source-id: 6d0258c8a9492a720684dadce59c90fc87cbf5cf

10 Apr, 2019 3 commits

Fix sacrebleu (#630) · 58b912f6

Xian Li authored Apr 10, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/630

The sacrebleu scorer has stopped working in pytorch_translate (and possibly
fairseq too), probably due to a recent API change.

Reviewed By: jmp84

Differential Revision: D14792797

fbshipit-source-id: c2a00246e08bc913c41e60c5fbf8ab4ab5e80d18

Make TransformerEncoderLayer layer norm names more descriptive · e5ba94ab

Liezl Puzon authored Apr 10, 2019

Summary:
I added an upgrade_state_dict function so that loading old models will still work

layer_norms[0] --> self_attn_layer_norm
layer_norms[1] --> final_layer_norm
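One way to keep old checkpoints loadable is to rewrite their keys at load time. The sketch below applies the mapping above to a flat state dict. The key layout (`<prefix>.layer_norms.<i>.<param>`) and the helper `upgrade_layer_norm_keys` are assumptions for illustration.

```python
from typing import Dict

# Old index -> new descriptive name, per the mapping described above.
LAYER_NORM_RENAMES = {"0": "self_attn_layer_norm", "1": "final_layer_norm"}


def upgrade_layer_norm_keys(state_dict: Dict[str, float], prefix: str) -> Dict[str, float]:
    """Hypothetical: rename '<prefix>.layer_norms.<i>.<param>' keys in place."""
    for old_idx, new_name in LAYER_NORM_RENAMES.items():
        for param in ("weight", "bias"):
            old_key = f"{prefix}.layer_norms.{old_idx}.{param}"
            if old_key in state_dict:
                state_dict[f"{prefix}.{new_name}.{param}"] = state_dict.pop(old_key)
    return state_dict


old = {"encoder.layers.0.layer_norms.0.weight": 1.0, "encoder.layers.0.layer_norms.1.bias": 0.0}
print(upgrade_layer_norm_keys(old, "encoder.layers.0"))
# {'encoder.layers.0.self_attn_layer_norm.weight': 1.0, 'encoder.layers.0.final_layer_norm.bias': 0.0}
```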

Reviewed By: pipibjc

Differential Revision: D14689849

fbshipit-source-id: b2809262c11fe9d083e571fa31044798aefd48ce

Add anneal-eps argument · 309f2511

Kritika Singh authored Apr 10, 2019

Summary: Used in fairspeq/train.py

Reviewed By: myleott, yqwangustc

Differential Revision: D14841512

fbshipit-source-id: 02fd7b58841c32e2797e3159e65f2bef36f02da1
