Commits · 861dd2b7aab577295acd904e9ac27e53afbc1034 · OpenDAS / Fairseq

16 May, 2019 1 commit

fixed cmd arg for shuffle dataset masked lm task · 861dd2b7

Naman Goyal authored May 15, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/584

Reviewed By: myleott

Differential Revision: D15360774

Pulled By: myleott

fbshipit-source-id: b18efbb6ff5a8832c61b689f3d87c958cbd908e9

15 May, 2019 7 commits

Fix biTransformer export (#583) · 2a3adcdc

Ruty Rinott authored May 15, 2019

Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/583

D14610694 fixed issues in LayerNorm export by making it conditional. D15260838 changed the implementation of TransformerDecoderLayer to the one under transformer, thus losing the fix. Bringing it back here.

Reviewed By: myleott, geof90, liaimi

Differential Revision: D15357119

fbshipit-source-id: e29e053ca5beca0008d7a8dad9880a483a14c7b9

added shuffle as arg for masked_lm for experimenting with pad-efficient batching (#582) · 74c936dc

Naman Goyal authored May 15, 2019

Summary:
added shuffle as arg for masked_lm for experimenting with pad-efficient batching
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/582

Reviewed By: jingfeidu

Differential Revision: D15355105

Pulled By: jingfeidu

fbshipit-source-id: 9925271a0bc2f9d283f354d158bd4b5ec8788b39

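Why shuffling matters for padding: when batches are cut from a length-sorted list, sequences of similar length share a batch and fewer pad tokens are needed, while shuffling mixes short and long sequences. A minimal sketch, assuming fixed-size batches and made-up sequence lengths (this is not fairseq's batching code):

```python
import random


def count_padding(lengths, batch_size):
    """Total pad tokens when each batch is padded to its longest sequence."""
    pads = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        pads += sum(max(batch) - n for n in batch)
    return pads


def order_lengths(lengths, shuffle, seed=0):
    """Return the lengths in shuffled order, or sorted by length."""
    ordered = list(lengths)
    if shuffle:
        random.Random(seed).shuffle(ordered)
    else:
        ordered.sort()
    return ordered


lengths = [5, 50, 7, 48, 6, 52, 8, 49]  # hypothetical sentence lengths
print(count_padding(order_lengths(lengths, shuffle=False), batch_size=4))  # 15
print(count_padding(order_lengths(lengths, shuffle=True), batch_size=4))
```

For fixed-size contiguous batches, sorting gives the minimum padding; a shuffled order can only match or exceed it, in exchange for more random batch composition.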

added missing dense layers in masked lm model (#581) · d1d3a581

Naman Goyal authored May 15, 2019

Summary:
1) Added pooled_output for sentence classification as `Tanh(Linear())`.
2) Added lm_head_transform as `LayerNorm(GeLU(Linear(x)))`
3) `act_dropout = 0.0`
4) added `lm_output_learned_bias`
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/581

Reviewed By: borguz

Differential Revision: D15353575

Pulled By: borguz

fbshipit-source-id: 4ff64c6ceed23f3e99348f73d189546f1d84452e

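The pooler and LM-head transform listed above can be sketched in plain Python. This assumes the pooler is applied to the first token's hidden state, omits LayerNorm's learned scale and shift, and leaves out `lm_output_learned_bias`; the weights are placeholders:

```python
import math


def linear(x, weight, bias):
    """y = W x + b for a plain-list vector x and a row-major weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def gelu(x):
    """Exact (erf-based) GELU."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(x, eps=1e-5):
    """LayerNorm without learned scale/shift."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


def pooled_output(cls_state, weight, bias):
    """Sentence-classification pooler: Tanh(Linear(x)) on the first token's state."""
    return [math.tanh(v) for v in linear(cls_state, weight, bias)]


def lm_head_transform(state, weight, bias):
    """Masked-LM head transform: LayerNorm(GeLU(Linear(x)))."""
    return layer_norm([gelu(v) for v in linear(state, weight, bias)])


W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only
b = [0.0, 0.0]
print(pooled_output([0.5, -2.0], W, b))
print(lm_head_transform([0.5, -2.0], W, b))
```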

Updates to model API (#561) · dffb1674

Myle Ott authored May 15, 2019

Summary:
- `FairseqModel` -> `FairseqEncoderDecoderModel`
- add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
- `encoder_out_dict` -> `encoder_out`
- rm unused `remove_head` functions
- update docs
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/561

Differential Revision: D15271142

Pulled By: myleott

fbshipit-source-id: 8e8864e399336020f0271c780598e968ff51a264

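The `extract_features`/`output_layer` split separates computing decoder hidden states from projecting them to the vocabulary. A toy, framework-free sketch, assuming `forward` simply composes the two (`ToyDecoder` and its arithmetic are hypothetical):

```python
class ToyDecoder:
    """Toy decoder mirroring the extract_features/output_layer split (not fairseq code)."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size

    def extract_features(self, prev_output_tokens, encoder_out=None):
        # Hypothetical "features": one float per target position.
        feats = [float(t) for t in prev_output_tokens]
        if encoder_out is not None:
            feats = [f + encoder_out for f in feats]
        return feats, {"attn": None}

    def output_layer(self, features):
        # Project each feature to vocabulary-sized scores.
        return [[f * (v + 1) for v in range(self.vocab_size)] for f in features]

    def forward(self, prev_output_tokens, encoder_out=None):
        x, extra = self.extract_features(prev_output_tokens, encoder_out)
        return self.output_layer(x), extra


logits, extra = ToyDecoder(3).forward([1, 2], encoder_out=0.5)
print(logits)
```

Callers that only need hidden states (for example, to attach a different head) can call `extract_features` directly.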

Allow TransformerSentenceEncoder to return only last state · a0c5f9b8

Myle Ott authored May 15, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/578

Differential Revision: D15352060

Pulled By: myleott

fbshipit-source-id: 7dc2fceca37ec96c89356662831b0d82f28bef6f

Add missing imports · 52778827

Myle Ott authored May 15, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/579

Differential Revision: D15352058

Pulled By: myleott

fbshipit-source-id: cebef02edcfcb203ef2e32c64f7f28e08c4e46b0

Various fixes for Masked LM (#573) · bf106796

Myle Ott authored May 14, 2019

Summary:
Various fixes for Masked LM

- use --activation-fn instead of --gelu
- use --dataset-impl instead of --lazy-load
- add embed_scale option to TransformerSentenceEncoder
- fix encoder_normalize_before to include a final layer norm
- delete BertLayerNorm
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/573

Reviewed By: borguz

Differential Revision: D15317933

Pulled By: myleott

fbshipit-source-id: 8ecb46556ad43e76e92d41ed8f5a62e8516fd375

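A toy sketch of what `encoder_normalize_before` changes, assuming it selects pre-norm residual blocks: LayerNorm is applied to each sublayer's input, so without a final LayerNorm the stack's output is never normalized. The `embed_scale` argument here just multiplies the input embeddings (Transformers commonly use sqrt(embed_dim)); the sublayers are hypothetical callables:

```python
import math


def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def encode(embeddings, sublayers, normalize_before, embed_scale=None):
    """Toy residual encoder over a single position's vector.

    sublayers: callables standing in for attention/FFN blocks (hypothetical).
    """
    x = list(embeddings)
    if embed_scale is not None:
        x = [embed_scale * v for v in x]
    for f in sublayers:
        if normalize_before:
            # Pre-norm: normalize the sublayer input, add the raw residual.
            x = [a + b for a, b in zip(x, f(layer_norm(x)))]
        else:
            # Post-norm: normalize after the residual addition.
            x = layer_norm([a + b for a, b in zip(x, f(x))])
    if normalize_before:
        # Final LayerNorm; without it the pre-norm output stays unnormalized.
        x = layer_norm(x)
    return x


out = encode([1.0, 2.0, 3.0, 4.0], [lambda v: [2.0 * t for t in v]] * 2,
             normalize_before=True, embed_scale=math.sqrt(4))
print(out)
```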

14 May, 2019 3 commits

rm default_key from MultiCorpusSampledDataset · 7432130e

Myle Ott authored May 14, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/575

Differential Revision: D15318004

Pulled By: myleott

fbshipit-source-id: ad918d71b1bd8074decf5ec3463dd9bc9487bbe9

Alignment Training task using minibatch · 2c278ff0

Nayan Singhal authored May 14, 2019

Summary:
1. Define an EpochMinibatchIterator which extends the EpochBatchIterator. It has the same functionality as EpochBatchIterator except for two major changes: it uses static batching, and it uses MiniBatchIterator to get the indices.
2. SplitSeqCollater is used instead of Seq2SeqCollater.
3. LSTM_subsample now stores the previous states and resets them once the sample is over.

Reviewed By: jay-mahadeokar

Differential Revision: D15209023

fbshipit-source-id: 900b8bd1f25159ffc77f8106e26729a3e7422a1f

Move save/load checkpoint functions to utils · cd1e5c09

Dmytro Okhonko authored May 14, 2019

Summary:
Move `load_checkpoint`, `save_checkpoint` and `reload_train` from train.py to checkpoint_utils.py
Move `get_perplexity` from train.py to utils.py.
This will make train.py lighter and allow us to reuse all this utils functionality when fairseq is used as an external library.

Reviewed By: myleott

Differential Revision: D15289607

fbshipit-source-id: 4b7c95225ac22e402bcda3497811361809110df1

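A sketch of a perplexity helper in the spirit of `get_perplexity`, assuming the loss is an average in base 2 (so perplexity is 2 ** loss) and that overflow should be reported as infinity rather than raised. The rounding is illustrative; the actual helper's return format may differ:

```python
import math


def get_perplexity(loss):
    """Perplexity from an average base-2 loss; float('inf') if 2 ** loss overflows."""
    try:
        return round(math.pow(2, loss), 2)
    except OverflowError:
        return float("inf")


print(get_perplexity(3.0))  # 8.0
```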

13 May, 2019 4 commits

Transition smoothly after warmup in polynomial LR decay schedule · c124d272

Myle Ott authored May 13, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/576

Differential Revision: D15318086

Pulled By: myleott

fbshipit-source-id: c6587737ca7b97edc97ad4aef5c5c9ac7e92b2f2

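One way to read "transition smoothly": measure the decay from the end of warmup, so the learning rate reaches `peak_lr` exactly when warmup ends and decays from there without a jump. A sketch under that assumption, with linear warmup; the parameter names are illustrative:

```python
def polynomial_lr(num_updates, peak_lr, warmup_updates, total_updates,
                  end_lr=0.0, power=1.0):
    """Linear warmup to peak_lr, then polynomial decay to end_lr.

    Decay is measured from the end of warmup, so the schedule is continuous
    at num_updates == warmup_updates.
    """
    if warmup_updates > 0 and num_updates <= warmup_updates:
        return peak_lr * num_updates / warmup_updates
    if num_updates >= total_updates:
        return end_lr
    decay_steps = total_updates - warmup_updates
    remaining = 1.0 - (num_updates - warmup_updates) / decay_steps
    return (peak_lr - end_lr) * remaining ** power + end_lr


print([polynomial_lr(n, 1.0, 100, 1100) for n in (50, 100, 101, 600, 1100)])
```

At `warmup_updates` both branches give `peak_lr`, which is the kind of continuity the title describes.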

gelu_fast -> gelu_accurate (#571) · 939ab6ae

Myle Ott authored May 13, 2019

Summary:
This was named gelu_fast after the original implementation: https://github.com/hendrycks/GELUs/blob/master/mnist_ae.py#L62-L63

But in practice it's actually slower and uses more memory. Rename to gelu_accurate.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/571

Differential Revision: D15317874

Pulled By: myleott

fbshipit-source-id: c96fbc89bf91b27ced1ab8d5b25a8f23f922ec24

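The two GELU forms can be compared numerically in plain Python. This assumes, consistent with the linked original implementation, that `gelu_accurate` is the tanh approximation and that plain `gelu` uses `erf`. The sketch only checks that the values agree closely; it does not measure the speed or memory differences that motivated the rename:

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_accurate(x):
    """Tanh approximation of GELU (the formula formerly named gelu_fast)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


for v in (-3.0, -1.0, 0.0, 0.5, 2.0):
    print(v, gelu(v), gelu_accurate(v))
```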

Lint · 72291287

Myle Ott authored May 13, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/574

Differential Revision: D15317984

Pulled By: myleott

fbshipit-source-id: 09a66229cc6b4c95678ca1ca13c9e0da25b203de

Add LAMB optimizer · b95f1b5d

Myle Ott authored May 13, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/572

Differential Revision: D15317928

Pulled By: myleott

fbshipit-source-id: b3f0e9229737a63b49937e7c5b918470f18ddc45

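The LAMB update can be sketched for a single parameter tensor as Adam-style moments plus a layer-wise trust ratio ||w|| / ||r||. This follows the commonly described form of the algorithm, not the fairseq implementation, whose details (bias correction, clipping of the trust ratio, defaults) may differ:

```python
import math


def lamb_step(w, g, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-6, weight_decay=0.0):
    """One LAMB update for a single parameter tensor stored as plain lists.

    Returns the updated (w, m, v); t is the 1-based step count.
    """
    b1, b2 = betas
    m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
    v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - b1 ** t) for mi in m]
    v_hat = [vi / (1 - b2 ** t) for vi in v]
    # Adam-style direction plus decoupled weight decay.
    r = [mh / (math.sqrt(vh) + eps) + weight_decay * wi
         for mh, vh, wi in zip(m_hat, v_hat, w)]
    w_norm = math.sqrt(sum(x * x for x in w))
    r_norm = math.sqrt(sum(x * x for x in r))
    # Layer-wise trust ratio rescales the step to the parameter's magnitude.
    trust = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0
    w = [wi - lr * trust * ri for wi, ri in zip(w, r)]
    return w, m, v


w, m, v = lamb_step([3.0, 4.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0], t=1, lr=0.1)
print(w)
```

Because `r` is rescaled by ||w|| / ||r||, the step length for each tensor is `lr * ||w||` (0.5 in the example above) whenever both norms are nonzero, regardless of the gradient's scale.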

12 May, 2019 2 commits

Fix option in docs (#735) · d0577ba7

zhiqiang authored May 12, 2019

Summary:
`--output-format` -> `--dataset-impl` in Tutorial: Classifying Names with a Character-Level RNN
Pull Request resolved: https://github.com/pytorch/fairseq/pull/735

Differential Revision: D15314625

Pulled By: myleott

fbshipit-source-id: 65b8efd1a367ca754e5b9dca088aefbc648864dd

Add scripts for working with txt files containing document boundaries · 287d31e2

Myle Ott authored May 12, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/736

Differential Revision: D15314626

Pulled By: myleott

fbshipit-source-id: 1e0c32529afee57e43fe5d6c7991cd13eb8a52c4

11 May, 2019 2 commits

convert logits to fp32 for calculating loss in masked_lm_loss criterion · 43722c5e

Naman Goyal authored May 11, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/568

Differential Revision: D15308483

Pulled By: myleott

fbshipit-source-id: 9d898ce523e46e6b6fb444274f478da0b577b603

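Why upcast before the loss: half precision tops out at 65504, so intermediate values such as `exp(logit)` overflow easily and small probabilities lose precision. A plain-Python sketch of one failure mode, using `struct`'s half-precision format to emulate fp16 storage; in PyTorch the equivalent step is typically calling `.float()` on the logits before `log_softmax`:

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to IEEE half precision; overflow raises an error."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def log_softmax(logits):
    """Numerically stable log-softmax computed in Python floats (double precision)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


logits = [to_fp16(z) for z in (12.0, 3.5, -2.0)]  # logits as stored in fp16

try:
    to_fp16(math.exp(logits[0]))  # exp(12) is about 162755, beyond the fp16 max of 65504
except (OverflowError, struct.error):
    print("fp16 overflow in exp")

# Upcast first, then compute the loss in higher precision.
log_probs = log_softmax([float(z) for z in logits])
print(log_probs)
```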

Add missing options to TransformerDecoderLayer · 5dcc855a

Myle Ott authored May 11, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/560

Differential Revision: D15260838

Pulled By: myleott

fbshipit-source-id: 5f80dd82775c10ce46a3e1c451ccaf0ef55bfa31

10 May, 2019 2 commits

add option to specify lr-threshold while using lr-on-plateau strategy · 8a2e6e81

Jay Mahadeokar authored May 10, 2019

Summary: As in title.

Reviewed By: skritika

Differential Revision: D15299135

fbshipit-source-id: 2fd513b32c0ab41911cdf0b0186f6c3bb5256285

fbshipit-source-id: 682b375c6e7535f12faaf9ca32811051f9e874da · 47fbc491

myleott authored May 10, 2019

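A sketch of how a threshold changes plateau detection for the lr-threshold option in 8a2e6e81, assuming a "lower is better" metric and the relative threshold rule used by PyTorch's `ReduceLROnPlateau` (an improvement must beat the best value by a factor of `1 - threshold`). The class and parameter names are illustrative:

```python
class PlateauScheduler:
    """Toy reduce-on-plateau rule with a relative improvement threshold."""

    def __init__(self, lr, shrink=0.1, patience=0, threshold=1e-4):
        self.lr = lr
        self.shrink = shrink
        self.patience = patience
        self.threshold = threshold
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        # Count it as an improvement only if it beats best by a relative margin.
        if val_loss < self.best * (1 - self.threshold):
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.shrink
                self.bad_epochs = 0
        return self.lr


s = PlateauScheduler(1.0, threshold=0.01)
print(s.step(2.0), s.step(1.99))
```

With `threshold=0.01` a drop from 2.0 to 1.99 counts as a plateau and the LR shrinks; with `threshold=1e-4` the same drop counts as an improvement.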

09 May, 2019 5 commits

Merge pull request #727 from pytorch/fix_lr_scheduler · cfeb2163

Myle Ott authored May 09, 2019

Set initial learning rate in LR schedulers by calling step_update(0) at init

Set initial learning rate in LR schedulers by calling step_update(0) at init · 219cbf6e

Myle Ott authored May 09, 2019

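A toy illustration of why calling `step_update(0)` at init matters: until the scheduler runs once, the optimizer keeps whatever LR it was constructed with, so the first updates would skip the intended warmup value. `ToyOptimizer` and `InverseSqrtSchedule` are hypothetical stand-ins, assuming linear warmup followed by inverse-square-root decay:

```python
class ToyOptimizer:
    """Stand-in optimizer that only stores a learning rate."""

    def __init__(self, lr):
        self.lr = lr

    def set_lr(self, lr):
        self.lr = lr


class InverseSqrtSchedule:
    """Toy warmup + inverse-sqrt schedule that sets the LR at construction time."""

    def __init__(self, optimizer, peak_lr, warmup_updates, warmup_init_lr=0.0):
        if warmup_updates <= 0:
            raise ValueError("warmup_updates must be positive")
        self.optimizer = optimizer
        self.peak_lr = peak_lr
        self.warmup_updates = warmup_updates
        self.warmup_init_lr = warmup_init_lr
        # Apply the schedule for update 0 so the optimizer starts at warmup_init_lr
        # instead of the LR it was constructed with.
        self.step_update(0)

    def step_update(self, num_updates):
        if num_updates < self.warmup_updates:
            # Linear warmup from warmup_init_lr towards peak_lr.
            step = (self.peak_lr - self.warmup_init_lr) / self.warmup_updates
            lr = self.warmup_init_lr + num_updates * step
        else:
            # Inverse square-root decay, equal to peak_lr at the end of warmup.
            lr = self.peak_lr * (self.warmup_updates / num_updates) ** 0.5
        self.optimizer.set_lr(lr)
        return lr


opt = ToyOptimizer(lr=0.5)
sched = InverseSqrtSchedule(opt, peak_lr=1e-3, warmup_updates=10, warmup_init_lr=1e-7)
print(opt.lr)  # 1e-07
```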

Revert "Add sweep scripts" · 2af922f1

Myle Ott authored May 09, 2019

This reverts commit 8e8e1afc.

Add sweep scripts · 8e8e1afc

Myle Ott authored May 09, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/564

Differential Revision: D15278017

Pulled By: myleott

fbshipit-source-id: b6fba1b62145ea533b40f5eb9b134e6aa122e546

expose arguments for bias_kv and zero_attn for masked_lm · 93ec8d0b

Jingfei Du authored May 08, 2019

Summary: the old no_bias_kv argument for masked_lm models is not used. Split it into 2 arguments and expose them.

Reviewed By: myleott

Differential Revision: D15266154

fbshipit-source-id: 60b041f8370ca1d8869ed3402fb9a67d1cd8e0e8

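A toy single-query attention showing what the two options do, assuming they behave like `add_bias_kv` and `add_zero_attn` in PyTorch's `nn.MultiheadAttention`: a learned key/value pair, or an all-zero key/value pair, is appended as an extra position the query can attend to. The function below is illustrative, not fairseq code:

```python
import math


def attend(query, keys, values, bias_k=None, bias_v=None, add_zero_attn=False):
    """Single-query dot-product attention over plain-list vectors.

    bias_k/bias_v: optional learned key/value appended as an extra position.
    add_zero_attn: append an all-zero key/value position.
    """
    keys, values = list(keys), list(values)
    if bias_k is not None and bias_v is not None:
        keys.append(bias_k)
        values.append(bias_v)
    if add_zero_attn:
        keys.append([0.0] * len(query))
        values.append([0.0] * len(values[0]))
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights


print(attend([0.0, 0.0], [[1.0, 0.0]], [[2.0, 4.0]], add_zero_attn=True))
```

The zero key scores 0 and its zero value contributes nothing to the output, so it gives the query somewhere to put attention mass without attending to real content.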

08 May, 2019 7 commits

Don't allow abbreviated argument options · acb9ab32

Myle Ott authored May 08, 2019

Reviewed By: jmp84

Differential Revision: D15264847

fbshipit-source-id: 4ba9224d1b35c3de0d26c9b4c1ee6d641d3d8535

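Python's `argparse` accepts unambiguous prefixes of long options by default, so a typo or a prefix can silently bind to an unintended flag, and a prefix that is unique today may become ambiguous when options are added. One way to turn this off is `allow_abbrev=False` (Python 3.5+). A sketch with two hypothetical options:

```python
import argparse


def build_parser(allow_abbrev):
    parser = argparse.ArgumentParser(allow_abbrev=allow_abbrev)
    # Hypothetical options for illustration; not fairseq's option set.
    parser.add_argument("--lr-scheduler", default="fixed")
    parser.add_argument("--lr-shrink", type=float, default=0.1)
    return parser


# With abbreviations allowed, an unambiguous prefix silently expands.
args = build_parser(allow_abbrev=True).parse_args(["--lr-sch", "inverse_sqrt"])
print(args.lr_scheduler)  # inverse_sqrt

# With allow_abbrev=False the same prefix is rejected (argparse exits with status 2).
try:
    build_parser(allow_abbrev=False).parse_args(["--lr-sch", "inverse_sqrt"])
except SystemExit as exc:
    print("rejected, exit code", exc.code)
```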

Better error message for incorrect --dataset-impl · 61f29f7f

Myle Ott authored May 08, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/723

Differential Revision: D15260870

Pulled By: myleott

fbshipit-source-id: 73d9b138b9ab44f96824076258f1a6319193d0f7

bug_fixes and small changes to masked lm (#721) · bd6e5c4f

Naman Goyal authored May 08, 2019

Summary:
1) Made the model compatible with using either `masked_lm_dataset` or `monolingual_dataset`.
2) fixed the default args setting for the task (`bert` vs `masked_lm`). myleott should we keep both?
3) bug in setting default value of `sentence_class_num`
4) bug for padding mask in `fp16`.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/721

Differential Revision: D15259885

fbshipit-source-id: 9dbf7fb8192992c1251670287bed719e41c08fcc

Cleanup LM + Flake8 · f2563c21

Myle Ott authored May 08, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/720

Differential Revision: D15259091

Pulled By: myleott

fbshipit-source-id: 06a35996c06ccddb49fdc9e01e348ff3c9da334e

Fix indexing in TokenBlockDataset · eddcdf08

Myle Ott authored May 08, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/719

Differential Revision: D15258483

Pulled By: myleott

fbshipit-source-id: dd00daa6f1c87264c1196a77dfffc8c876ebde7f

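As background on what is being indexed: a token-block dataset concatenates sequences into one stream and cuts fixed-size blocks, so each block needs flat (start, end) offsets plus a way to map an offset back to a sequence and position. A toy sketch of that bookkeeping; the helpers are hypothetical, and the commit does not say which index was wrong:

```python
def block_slices(sizes, block_size):
    """(start, end) offsets of fixed-size blocks over the concatenated sequences.

    sizes: token count of each sequence. The last block may be shorter.
    """
    total = sum(sizes)
    return [(start, min(start + block_size, total)) for start in range(0, total, block_size)]


def locate(sizes, offset):
    """Map a flat token offset to (sequence index, position within that sequence)."""
    for i, n in enumerate(sizes):
        if offset < n:
            return i, offset
        offset -= n
    raise IndexError("offset beyond end of data")


sizes = [3, 4, 2]
print(block_slices(sizes, 4))
print([locate(sizes, start) for start, _ in block_slices(sizes, 4)])
```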

Bugfix · 0cb45bcb

Myle Ott authored May 08, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/717

Differential Revision: D15254560

Pulled By: myleott

fbshipit-source-id: 2a07614e8d294636f706939e60f0091c73115494

bugfix data not in args · 6a7eb6ce

Jay Mahadeokar authored May 07, 2019

Summary:
D15214049 introduced a bug such that if a task's args do not contain `data`, training fails with the following error:
```
File "/data/users/jaym/fbsource/fbcode/buck-out/dev/gen/deeplearning/projects/fairspeq/train#link-tree/train.py", line 119, in reload_train
   if len(args.data.split(":")) == 1:
AttributeError: 'Namespace' object has no attribute 'data'
```

This diff checks whether `data` is in args to avoid the above error.

Reviewed By: myleott, jmp84

Differential Revision: D15253373

fbshipit-source-id: 14fb9ad878ee50f1b7583349bb17e29c03c40815

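The guard described above can be sketched as follows; `num_data_shards` is a hypothetical helper, and `"data" in args` relies on `argparse.Namespace` supporting membership tests:

```python
import argparse


def num_data_shards(args):
    """Number of colon-separated data paths, or 0 when the task has no `data` arg."""
    # Check membership first so tasks without --data don't raise AttributeError.
    if "data" not in args:
        return 0
    return len(args.data.split(":"))


print(num_data_shards(argparse.Namespace(data="shard0:shard1")))  # 2
print(num_data_shards(argparse.Namespace(task="speech")))  # 0
```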

07 May, 2019 5 commits

fixed arg passing in masked_lm_dataset · 20e7836e

Naman Goyal authored May 07, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/715

Differential Revision: D15240723

fbshipit-source-id: 11d7280cb187d68f107902822e878f2a04b840c7

bugfix: passing args · e37bd948

taineleau authored May 07, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/711

Differential Revision: D15239618

Pulled By: myleott

fbshipit-source-id: 82f3f79501a13a967324b8a66281cd134bf1ef23

Memory-Mapped IndexedDataset implementation (#589) · a1c997bd

Davide Caroselli authored May 07, 2019

Summary:
Following discussion in https://github.com/pytorch/fairseq/issues/574:

- Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder, compatible with IndexedDataset/IndexedDatasetBuilder
- Updated scripts/read_binarized.py to support the new MMapIndexedDataset
- Replaced options '--raw-text' and '--lazy-load' with '--dataset-impl', and moved the option definition from custom task args to the more appropriate, higher-level options.add_dataset_args()
- Also implemented utility functions in indexed_dataset: make_dataset(), dataset_exists()
Pull Request resolved: https://github.com/pytorch/fairseq/pull/589

Differential Revision: D14597128

Pulled By: myleott

fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e

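The idea behind a memory-mapped dataset: tokens are stored back to back in a binary file, an index records each item's byte offset and length, and items are sliced out of an OS-managed mapping instead of being loaded up front. A toy sketch using Python's `mmap`; the file layout and class names are hypothetical and do not match fairseq's on-disk format:

```python
import array
import mmap
import os
import tempfile

ITEMSIZE = array.array("i").itemsize  # bytes per C int on this platform (typically 4)


def write_dataset(path, sequences):
    """Write token sequences back to back; return (byte offsets, lengths) as the index."""
    offsets, sizes, pos = [], [], 0
    with open(path, "wb") as f:
        for seq in sequences:
            f.write(array.array("i", seq).tobytes())
            offsets.append(pos)
            sizes.append(len(seq))
            pos += len(seq) * ITEMSIZE
    return offsets, sizes


class MMapTokens:
    """Read-only, memory-mapped access to the sequences written above (toy format)."""

    def __init__(self, path, offsets, sizes):
        self._file = open(path, "rb")
        self._mmap = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._view = memoryview(self._mmap)
        self.offsets, self.sizes = offsets, sizes

    def __len__(self):
        return len(self.sizes)

    def __getitem__(self, i):
        start = self.offsets[i]
        end = start + self.sizes[i] * ITEMSIZE
        # Slicing the mapping touches only the pages backing this item.
        return self._view[start:end].cast("i").tolist()

    def close(self):
        # Release the buffer export before closing the mapping.
        self._view.release()
        self._mmap.close()
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tokens.bin")
    offsets, sizes = write_dataset(path, [[5, 6, 7], [8], [9, 10]])
    ds = MMapTokens(path, offsets, sizes)
    print(len(ds), ds[0], ds[2])
    ds.close()
```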

Improve init speed of TokenBlockDataset and EpochBatchIterator · e4edf27a

Myle Ott authored May 07, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/704

Differential Revision: D15221549

Pulled By: myleott

fbshipit-source-id: b0021acdc2d7792ce51421f1432e1f2bd8218f7b

Mask out embeddings associated with padding (#710) · 8d9063fe

Kartikay Khandelwal authored May 06, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/710

Previously there was a bug in how we dealt with padding when computing the input representation from the segment and position embeddings. D15144912 fixed this by adding an offset based on the padding id. However, this makes assumptions about the padding id which may not hold for vocabularies built outside of PyText and fairseq. Based on a discussion with barlaso, this diff zeroes out all the embeddings associated with padding.

Reviewed By: borguz

Differential Revision: D15209395

fbshipit-source-id: 5573020e610f5466e673fe3845c3ed34ebb5c44d

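A minimal sketch of masking padding after summing embeddings, assuming token, position and segment embeddings are added per position and `pad_idx` marks padding. Multiplying by a 0/1 mask does not depend on what the padding row of any embedding table contains; the tables below are made up:

```python
def embed(tokens, segments, token_emb, pos_emb, seg_emb, pad_idx):
    """Sum token, position and segment embeddings, then zero rows at padding positions."""
    out = []
    for pos, (tok, seg) in enumerate(zip(tokens, segments)):
        vec = [t + p + s for t, p, s in zip(token_emb[tok], pos_emb[pos], seg_emb[seg])]
        # 0/1 mask: padding positions contribute an all-zero vector.
        keep = 0.0 if tok == pad_idx else 1.0
        out.append([keep * v for v in vec])
    return out


token_emb = [[0.1, 0.2], [9.0, 9.0], [0.3, 0.4]]  # row 1 is padding, deliberately nonzero
pos_emb = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
seg_emb = [[0.0, 0.5]]
print(embed([0, 2, 1], [0, 0, 0], token_emb, pos_emb, seg_emb, pad_idx=1))
```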

06 May, 2019 2 commits

allowing sharded dataset (#696) · 0add50c2

Naman Goyal authored May 06, 2019

Summary:
Co-authored-by: myleott <myleott@fb.com>

Changes `data` to be a `str` holding a colon-separated list of paths, for loading sharded datasets. This is useful for large datasets that cannot fit into memory: the dataset can be sharded, and each shard is then loaded for one epoch in round-robin order.

For example, if there are `5` shards of data and `10` epochs then the shards will be iterated upon `[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]`.

myleott We need to look into `translation.py` as it currently already expects a list and then concats the datasets.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/696

Differential Revision: D15214049

fbshipit-source-id: 03e43a7b69c7aefada2ca668abf1eac1969fe013

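The round-robin schedule described above can be sketched as picking path `(epoch - 1) % num_shards` from the colon-separated `data` string, assuming 1-based epochs; shard names are placeholders:

```python
def shard_for_epoch(data, epoch):
    """Pick the data path for a 1-based epoch from a colon-separated list."""
    paths = data.split(":")
    return paths[(epoch - 1) % len(paths)]


data = ":".join(f"shard{i}" for i in range(5))  # hypothetical shard directories
print([shard_for_epoch(data, e) for e in range(1, 11)])
```

One limitation of a colon separator is that a path containing ':' cannot be listed.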

Remove redundant distributed init · 57da383c

Myle Ott authored May 06, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/707

Differential Revision: D15219014

Pulled By: myleott

fbshipit-source-id: f38f2cf817d05e0871ff9084a810d109848e827c
