Commits · 47fd985269e92735826c05d9160d68dc8e8a9807 · OpenDAS / Fairseq

22 Jul, 2019 1 commit

Move Masked LM components to legacy/ -- new ones are coming · 47fd9852

Myle Ott authored Jul 21, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/740

Differential Revision: D16377797

Pulled By: myleott

fbshipit-source-id: f7d6c8b00a77e279ea94376b1f0fcd15087eaf5f

47fd9852

21 Jul, 2019 1 commit

Default to mmap and infer dataset implementations automatically · 5f78106a

Myle Ott authored Jul 21, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/751

Differential Revision: D16410989

Pulled By: myleott

fbshipit-source-id: ddbbee49756f9ff6c4487977a3f5d2259b7abafe

5f78106a

17 Jul, 2019 1 commit

Nucleus (top-P) sampling (#710) · e46b924d

Xing Zhou authored Jul 17, 2019

Summary:
Implement Nucleus (top-P) sampling: sample among the smallest set of elements whose cumulative probability mass exceeds p.

To test it:
python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/710

Test Plan:
python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3

python tests/test_sequence_generator.py

python tests/test_binaries.py

Reviewed By: myleott

Differential Revision: D16286688

Pulled By: xingz9

fbshipit-source-id: 1776d21e17c4532a3d24ac75bb7e75da9acad58f

e46b924d

09 Jul, 2019 1 commit

Fix multilingual evaluation bug (#592) · e4335001

Peng-Jen Chen authored Jul 09, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/592

Fix bug reported at
https://github.com/pytorch/fairseq/commit/9c3bb5c6d6c7d6442a28ccb8a81b2fc4e5782ace#r34181600

D15682169 breaks the multilingual translation generation.

Reviewed By: dpacgopinath

Differential Revision: D16147454

fbshipit-source-id: e0cf4d32f362190a0542fa0160f65a2a207ca3fa

e4335001

24 Jun, 2019 1 commit

Add 'doc' break mode to TokenBlockDataset · d9c79133

Myle Ott authored Jun 24, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/679

Test Plan: https://our.intern.facebook.com/intern/chronos/jobinstance/?jobinstanceid=5191319216&smc=chronos_gp_admin_client&log_type=stdout&offset=0&pretty_logs=false

Differential Revision: D15961008

Pulled By: myleott

fbshipit-source-id: cf214de96665b33887ef64cfcb45a51f81002ed1

d9c79133

21 Jun, 2019 1 commit

get_batch_iterator: allow max_positions=None (#673) · 7b4f5517

James Cross authored Jun 21, 2019

Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/673

This function breaks when leaving the argument `max_positions` with the default value `None`, which is presumably not the intended behavior.

Reviewed By: theweiho, myleott

Differential Revision: D15937221

fbshipit-source-id: 1f5dc1c27ad9b6a89501d2dc015de12181059349

7b4f5517

20 Jun, 2019 2 commits

Better explain the inference argument format of multilingual translation · 9c3bb5c6

Peng-Jen Chen authored Jun 19, 2019

Summary:
In https://github.com/pytorch/fairseq/issues/656, people are often confused about how to set multilingual translation parameters at inference time.

This diff add more checks to ensure the arguments (`--lang-pairs`, `--encoder-langtok`, `--decoder-langtok`) load from checkpoint are consistent with arguments specified in generate/interactive command line.
We also add a section in example page to explain how to set the arguments

Reviewed By: myleott

Differential Revision: D15682169

fbshipit-source-id: 64e6db94cd72ea7ce2d0aa1067c9c2dcd3b8a2ac

9c3bb5c6

wav2vec model (#654) · 392fce8a

alexeib authored Jun 19, 2019

Summary:
Merging wav2vec to master. Includes renames (Cpc -> wav2vec) and some light example files.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/654

Differential Revision: D15913409

Pulled By: alexeib

fbshipit-source-id: f723e6f211706cd9431c7d76dc12c4e80c9cfc80

392fce8a

11 Jun, 2019 4 commits

Iterate on torch.hub interface · 5bdee18e

Myle Ott authored Jun 11, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/793

Differential Revision: D15758755

Pulled By: myleott

fbshipit-source-id: b93e4ac11bde36a0b59b4d6d1c84d31c3124d767

5bdee18e

Automatically fill in default values from add_args · eea4d20b

Myle Ott authored Jun 11, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/797

Differential Revision: D15761071

Pulled By: myleott

fbshipit-source-id: 257d4a2297e83da7e59baed154dbafd6bfe614bf

eea4d20b

Python3.5 compat (#794) · a8f28ecb

Bairen Yi authored Jun 11, 2019

Summary:
See #467. Ping myleott to review.

This is a work-related contribution. Ping lark to review.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/794

Differential Revision: D15756816

Pulled By: myleott

fbshipit-source-id: 6dce3ff3a713bf5f60e5782bc260b2ca9d2c0a9b

a8f28ecb

when given prefix_tokens, sequence generator would generate (exactly) same... · 9dc9a486

yilinyang7 authored Jun 11, 2019

when given prefix_tokens, sequence generator would generate (exactly) same finished candidates (#713)

Summary:
https://github.com/pytorch/fairseq/issues/712
Pull Request resolved: https://github.com/pytorch/fairseq/pull/713

Differential Revision: D15242432

Pulled By: myleott

fbshipit-source-id: a230ee48f4bf891c805609c428d7233a0ad21179

9dc9a486

01 Jun, 2019 1 commit

Fix positions for LM · 8c03ff2d

Myle Ott authored May 31, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/622

Differential Revision: D15572555

Pulled By: myleott

fbshipit-source-id: 2b81f22207b4c894ffe645af0b45c70ac0a80612

8c03ff2d

31 May, 2019 1 commit

Replace --decoder-final-norm with --no-decoder-final-norm · 8ca05802

Myle Ott authored May 30, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/620

Differential Revision: D15569440

Pulled By: myleott

fbshipit-source-id: c4681f1c72467c04cd2654e87bc724c94b76e3fb

8ca05802

22 May, 2019 1 commit

Fix semisupervised translation · c11aaf14

Matt Le authored May 22, 2019

Summary: Fixes semisupervised translation task to deal with change in order of data loading and model creation (D15428242). When we build the model, we create the backtranslation function, which we can then pass in to the constructor of BacktranslationDataset

Reviewed By: myleott

Differential Revision: D15455420

fbshipit-source-id: 95101ca92f8af33702be3416147edd98da135a20

c11aaf14

17 May, 2019 1 commit

Clean up sharded train iterator · 3bfbb49b

Myle Ott authored May 16, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/586

Differential Revision: D15372949

Pulled By: myleott

fbshipit-source-id: c1cf1c645e8d55fc8568f23a47c45677ac9ab1da

3bfbb49b

16 May, 2019 2 commits

Add multi-dataset loading to multilingual_translation · 0863ea68

Peng-Jen Chen authored May 15, 2019

Summary: Similar to TranslationTask, we want to enable multilingual translation task to be able to load 'train{k}' datasets from data-bin folder.

Reviewed By: lematt1991

Differential Revision: D15363481

fbshipit-source-id: 5fed7be19383023b792ed2fd38e655cbcecc8b90

0863ea68

fixed cmd arg for shuffle dataset masked lm task · 861dd2b7

Naman Goyal authored May 15, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/584

Reviewed By: myleott

Differential Revision: D15360774

Pulled By: myleott

fbshipit-source-id: b18efbb6ff5a8832c61b689f3d87c958cbd908e9

861dd2b7

15 May, 2019 3 commits

added shuffle as arg for masked_lm for experimenting with pad effecie… (#582) · 74c936dc

Naman Goyal authored May 15, 2019

Summary:
added shuffle as arg for masked_lm for experimenting with pad effecient batching
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/582

Reviewed By: jingfeidu

Differential Revision: D15355105

Pulled By: jingfeidu

fbshipit-source-id: 9925271a0bc2f9d283f354d158bd4b5ec8788b39

74c936dc

Updates to model API (#561) · dffb1674

Myle Ott authored May 15, 2019

Summary:
- `FairseqModel` -> `FairseqEncoderDecoderModel`
- add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
- `encoder_out_dict` -> `encoder_out`
- rm unused `remove_head` functions
- update docs
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/561

Differential Revision: D15271142

Pulled By: myleott

fbshipit-source-id: 8e8864e399336020f0271c780598e968ff51a264

dffb1674

Various fixes for Masked LM (#573) · bf106796

Myle Ott authored May 14, 2019

Summary:
Various fixes for Masked LM

- use --activation-fn instead of --gelu
- use --dataset-impl instead of --lazy-load
- add embed_scale option to TransformerSentenceEncoder
- fix encoder_normalize_before to include a final layer norm
- delete BertLayerNorm
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/573

Reviewed By: borguz

Differential Revision: D15317933

Pulled By: myleott

fbshipit-source-id: 8ecb46556ad43e76e92d41ed8f5a62e8516fd375

bf106796

14 May, 2019 1 commit

rm default_key from MultiCorpusSampledDataset · 7432130e

Myle Ott authored May 14, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/575

Differential Revision: D15318004

Pulled By: myleott

fbshipit-source-id: ad918d71b1bd8074decf5ec3463dd9bc9487bbe9

7432130e

10 May, 2019 1 commit
- fbshipit-source-id: 682b375c6e7535f12faaf9ca32811051f9e874da · 47fbc491
  myleott authored May 10, 2019
  
  47fbc491
08 May, 2019 1 commit

Cleanup LM + Flake8 · f2563c21

Myle Ott authored May 08, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/720

Differential Revision: D15259091

Pulled By: myleott

fbshipit-source-id: 06a35996c06ccddb49fdc9e01e348ff3c9da334e

f2563c21

07 May, 2019 1 commit

Memory-Mapped IndexedDataset implementation (#589) · a1c997bd

Davide Caroselli authored May 07, 2019

Summary:
Following discussion in https://github.com/pytorch/fairseq/issues/574:

 - Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder compatible with IndexedDataset/IndexedDatasetBuilder
- Update scripts/read_binarized.py to support new MMapIndexedDataset
- Option '--raw-text' and '--lazy-load' replaced with '--dataset-impl' and moved the option definition custom task args to more high-level options.add_dataset_args() (more appropriate)
- Implemented also utils functions in indexed_dataset: make_dataset(), dataset_exists()
Pull Request resolved: https://github.com/pytorch/fairseq/pull/589

Differential Revision: D14597128

Pulled By: myleott

fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e

a1c997bd

06 May, 2019 3 commits

allowing sharded dataset (#696) · 0add50c2

Naman Goyal authored May 06, 2019

Summary:
Co-authored-by: myleott <myleott@fb.com>

Changing `data` to be `str` with colon separated list for loading sharded datasets. This change is useful for loading large datasets that cannot fit into, memory. The large dataset can be sharded and then each shard is loaded in one epoch in roudrobin manner.

For example, if there are `5` shards of data and `10` epochs then the shards will be iterated upon `[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]`.

myleott We need to look into `translation.py` as it currently already expects a list and then concats the datasets.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/696

Differential Revision: D15214049

fbshipit-source-id: 03e43a7b69c7aefada2ca668abf1eac1969fe013

0add50c2

added masked_lm task (#697) · e1ffea87

Naman Goyal authored May 06, 2019



Summary:
Co-authored-by: jingfeidu <jingfeidu@fb.com>

1) Adding `masked_lm` task for BERT like training. Code mostly taken from jingfeidu 's implementation.

2) Added `has_eos` option to `block_pair_dataset` for working with dataset that has been preprocessed with having `eos`.

Depends on: https://github.com/pytorch/fairseq/pull/696
Pull Request resolved: https://github.com/pytorch/fairseq/pull/697

Differential Revision: D15214050

fbshipit-source-id: c179ce2d70e59d2ddc941b13ceda99d929878931

e1ffea87

Fix semisupervised_translation task (#706) · 817fccf5

Maksym Del authored May 06, 2019

Summary:
Pass required "sample_key" argument to forward-backward call in semi-supervised task.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/706

Differential Revision: D15217957

Pulled By: pipibjc

fbshipit-source-id: bf943d566c5caa67682dfb16ff8b7c432323cdba

817fccf5

04 May, 2019 1 commit

Fix and generalize --temperature option (#508) · 96ac28d3

Myle Ott authored May 04, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/508

The previous version applied the temperature after the softmax. Fix that, and
also generalize so it works with other search approaches.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/694

Differential Revision: D15175160

Pulled By: myleott

fbshipit-source-id: cc87ff0e97a8a1dd37f9983163f58a8641155ab0

96ac28d3

01 May, 2019 1 commit

add ConcatDataset support for XLM · 91c78477

taineleau authored May 01, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/684

Differential Revision: D15154631

Pulled By: myleott

fbshipit-source-id: 5e7dd9651d9ed239b60c51b9a11d08c80307d3ba

91c78477

30 Apr, 2019 1 commit

Merge internal changes (#654) · d45db804

Myle Ott authored Apr 29, 2019

Summary:
- Add --add-bos-token option to LM task
- Cleanup utils.py and options.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/654

Differential Revision: D15041794

Pulled By: myleott

fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e

d45db804

26 Apr, 2019 1 commit

Passing kwargs in setup_task in fairseq_task (#670) · f701aa8c

Mohammad Sadegh Rasooli authored Apr 26, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/670

Pytorch-translate task needs to use extra arguments (such as vocabulary objects). By passing kwargs, we are able to have the ability to have extra arguments in setup_task

Reviewed By: akinh, pipibjc

Differential Revision: D15086810

fbshipit-source-id: 555f7976020eaac1febb8226f5a0055af0407ea6

f701aa8c

25 Apr, 2019 1 commit

Load a XLM model into transformer encoder / decoder for MT training (#629) · 8da9b1c5

Liezl Puzon authored Apr 25, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/629

Use GeLU as an alternate activation layer for ReLU.

Reviewed By: lematt1991

Differential Revision: D14689851

fbshipit-source-id: 7ec81fa34bc7bd0e1e43b337847ae932dcbf8b15

8da9b1c5

16 Apr, 2019 1 commit

Open Source MLM Implementation in Fairseq (#635) · 8776928c

Kartikay Khandelwal authored Apr 16, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/635

Adding a task and relevant models, datasets and criteria needed for training Cross-lingual Language Models similar to Masked Language Model used in XLM (Lample and Conneau, 2019 - https://arxiv.org/abs/1901.07291).

Reviewed By: liezl200

Differential Revision: D14943776

fbshipit-source-id: 3e416a730303d1dd4f5b92550c78db989be27073

8776928c

10 Apr, 2019 1 commit

Back translation + denoising in MultilingualTranslation task (#620) · d7e19573

Peng-Jen Chen authored Apr 10, 2019

Summary:
- Add language token to MultilingualTranslation task
- Add back translation and denoising loss to MultilingualTranslation task
Pull Request resolved: https://github.com/pytorch/fairseq/pull/620

Reviewed By: liezl200

Differential Revision: D14756873

Pulled By: pipibjc

fbshipit-source-id: 89d668db26848fd95f446edf5923bab2113636f7

d7e19573

05 Apr, 2019 1 commit

Eval and log on a subset of directions for multimodel training (#605) · 40ac340b

Liezl Puzon authored Apr 05, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/605

Eval and log on a subset of directions for multimodel training

This reduces code duplication in PyTorch Translate's semi_supervised task and will enable clean multitask setups in the future.

Reviewed By: pipibjc, dpacgopinath

Differential Revision: D14672779

fbshipit-source-id: 1342c71781f0824cc56a38ad1c1822e34eaef337

40ac340b

15 Mar, 2019 1 commit

0.6.1 -> 0.6.2 (#577) · e6422528

Myle Ott authored Mar 15, 2019

Summary:
Changelog:
- 998ba4f: Add language models from Baevski & Auli (2018)
- 4294c4f6: Add mixture of experts code from Shen et al. (2019)
- 00493490: Add example for multilingual training
- 48d9afbe: Speed improvements, including fused operators from apex
- 44d27e64: Add Tensorboard support
- d17fa851: Add Adadelta optimizer
- 9e1c880f: Add `FairseqEncoderModel`
- b65c579b: Add `FairseqTask.inference_step` to modularize generate.py
- 2ad1178e: Add back `--curriculum`
- Misc bug fixes and other features

Pull Request resolved: https://github.com/pytorch/fairseq/pull/577

Differential Revision: D14481233

Pulled By: myleott

fbshipit-source-id: 4ff8625ef1c0b24273fc65df7c5658e3c932e8b7

e6422528

04 Mar, 2019 1 commit

Add --curriculum (fixes #533) · 2ad1178e

Myle Ott authored Mar 04, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/554

Differential Revision: D14300596

Pulled By: myleott

fbshipit-source-id: f38c8e58daef99d5e4b97dd423e4142e4294a4f0

2ad1178e

28 Feb, 2019 2 commits

Deprecate _aggregate_logging_outputs · 8a8df81d

Myle Ott authored Feb 28, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/498

Differential Revision: D14024524

Pulled By: myleott

fbshipit-source-id: 1b0be4bb212dbab41ea0959ac34020832ff00645

8a8df81d

Move string line encoding logic from tokenizer to Dictionary (unified diff). (#541) · f296824f

Vladimir Karpukhin authored Feb 28, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/541

Just a combo of a stacked pair D14057943 & D14176011,
Made this as a separete diff cause there seems to be some issue with porting a stacked change into github repo

Differential Revision: D14251048

fbshipit-source-id: 0a47f534a69d6ab2ebe035fba40fd51748cccfb8

f296824f