Commits · 1328bc090499fd96aeac5ec52bca332c9875bc2f · OpenDAS / Fairseq

26 Jun, 2019 3 commits

add missing condition for first moment type as statement · 1328bc09

Alexander Rives authored Jun 26, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/684

Differential Revision: D16006333

Pulled By: myleott

fbshipit-source-id: 95bd4215734281194008fa029e81407d63b335ac

Fix dataset loading when there are multiple valid subsets (#835) · 8b514b9f

Liang Wang authored Jun 26, 2019

Summary:
When there are multiple valid subsets, say `valid`, `valid1` and `valid2`, and `combine=True` holds, loading the `valid` subset will try to locate and load `valid`, `valid1`, `valid2`... and then combine them into one dataset. Setting `combine` to `False` solves this issue.

In my experiment, I have 3 valid subsets with 3000, 5000 and 7801 examples. With the argument `--valid-subset valid,valid1,valid2`, the log is as follows:

```
......
| ./mix_data/bin valid src-trg 3000 examples
| ./mix_data/bin valid1 src-trg 5000 examples
| ./mix_data/bin valid2 src-trg 7801 examples
| ./mix_data/bin valid1 src-trg 5000 examples
| ./mix_data/bin valid2 src-trg 7801 examples
......
```

As shown above, `valid1` and `valid2` subsets are incorrectly loaded twice.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/835
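
The duplicate loading described above can be reproduced with a minimal sketch. The function `load_split` and the `available` set are hypothetical stand-ins, not fairseq's actual loader; the sketch only assumes that with `combine=True` the loader keeps probing `valid`, `valid1`, `valid2`, ... until a subset is missing.

```python
import itertools


def load_split(split, available, combine):
    """Return the names of the on-disk subsets read when loading `split`.

    Hypothetical stand-in for a dataset loader: with combine=True it keeps
    probing split, split1, split2, ... until one of them is missing.
    """
    loaded = []
    for k in itertools.count():
        name = split + (str(k) if k > 0 else "")
        if name not in available:
            break
        loaded.append(name)
        if not combine:
            # Only the requested subset is read.
            break
    return loaded


available = {"valid", "valid1", "valid2"}
requested = ["valid", "valid1", "valid2"]  # --valid-subset valid,valid1,valid2

with_combine = [n for s in requested for n in load_split(s, available, combine=True)]
without_combine = [n for s in requested for n in load_split(s, available, combine=False)]
```

In this sketch `with_combine` lists `valid1` and `valid2` twice, in the same order as the log above, while `without_combine` reads each subset once.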

Differential Revision: D16006343

Pulled By: myleott

fbshipit-source-id: ece7fee3a00f97a6b3409defbf7f7ffaf0a54fdc

Move task import in MultilingualTransformer to fix circular dependencies · ab2fa185

Myle Ott authored Jun 26, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/687

Differential Revision: D16005399

Pulled By: myleott

fbshipit-source-id: bf099c17e2095394acc452e9abcb4ee04afd0426

25 Jun, 2019 1 commit

avoid "divided by zero error" in logging_outputs when --use-bmuf is enabled (#812) · b3864b28

freewym authored Jun 25, 2019

Summary:

When doing multi-GPU training with --use-bmuf turned on and --global-sync-iter > 1, each replica may not sync with the other replicas at every iteration, so logging_outputs only contains each replica's own stats. In addition, logging_outputs may be empty at the end of an epoch after "a dummy iteration", because the number of replicas does not evenly divide the number of batches in the training data. If this happens, sample_size and ntokens would be 0 for some replicas and cause a "divided by 0" error. This fix sets *loss to 0 if sample_size/ntokens is 0.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/812
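
A minimal sketch of the guard described above, assuming each entry of `logging_outputs` is a dict carrying a summed `loss` and a `sample_size`. The helper name `aggregate_loss` and the dict layout are illustrative assumptions, not the actual criterion code.

```python
def aggregate_loss(logging_outputs):
    """Average the summed loss over sample_size for one replica.

    Hypothetical helper: returns a loss of 0.0 instead of dividing by zero
    when the replica saw no samples (empty logging_outputs).
    """
    loss_sum = sum(log.get("loss", 0.0) for log in logging_outputs)
    sample_size = sum(log.get("sample_size", 0) for log in logging_outputs)
    # Guard: a replica that only ran a dummy iteration has sample_size == 0.
    loss = loss_sum / sample_size if sample_size > 0 else 0.0
    return {"loss": loss, "sample_size": sample_size}


empty_replica = aggregate_loss([])
busy_replica = aggregate_loss([
    {"loss": 6.0, "sample_size": 2},
    {"loss": 2.0, "sample_size": 2},
])
```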

Reviewed By: myleott, yqwangustc

Differential Revision: D15908614

Pulled By: nayansinghal

fbshipit-source-id: c92e8e095f012bdb4ef753a3c627fd215afa215d

24 Jun, 2019 1 commit

Add 'doc' break mode to TokenBlockDataset · d9c79133

Myle Ott authored Jun 24, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/679

Test Plan: https://our.intern.facebook.com/intern/chronos/jobinstance/?jobinstanceid=5191319216&smc=chronos_gp_admin_client&log_type=stdout&offset=0&pretty_logs=false

Differential Revision: D15961008

Pulled By: myleott

fbshipit-source-id: cf214de96665b33887ef64cfcb45a51f81002ed1

23 Jun, 2019 3 commits

Fix resuming training when using --memory-efficient-fp16 · efb43450

Myle Ott authored Jun 23, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/678

Differential Revision: D15956712

Pulled By: myleott

fbshipit-source-id: 5048d06ddfbec0045558a22c777a966cca1ec396

Fixed argument for Adaptive Softmax Instantiation · 39a60b84

Alex Mathai authored Jun 23, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/830

Differential Revision: D15960624

Pulled By: myleott

fbshipit-source-id: ecfef5c51b886e3162bb8e07d232c6e9ea1169b0

Stringify 2-d tensors with Dictionary. · 4340b34e

Qian Wang authored Jun 23, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/828

Differential Revision: D15960629

Pulled By: myleott

fbshipit-source-id: ca631651e9a90ce8ed90ca23987519001fea3656

21 Jun, 2019 2 commits

get_batch_iterator: allow max_positions=None (#673) · 7b4f5517

James Cross authored Jun 21, 2019

Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/673

This function breaks when leaving the argument `max_positions` with the default value `None`, which is presumably not the intended behavior.
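
The intended behavior can be sketched as follows. The function `filter_indices` and its signature are hypothetical and only illustrate treating `max_positions=None` as "no limit"; they are not the fairseq function touched by this commit.

```python
def filter_indices(indices, size_fn, max_positions=None):
    """Yield the indices whose size fits within max_positions.

    Hypothetical sketch: when max_positions is None there is no limit,
    so every index is kept instead of comparing against None.
    """
    if max_positions is None:
        yield from indices
        return
    for idx in indices:
        if size_fn(idx) <= max_positions:
            yield idx


sizes = [3, 12, 7, 25]
kept_all = list(filter_indices(range(len(sizes)), sizes.__getitem__))
kept_short = list(filter_indices(range(len(sizes)), sizes.__getitem__, max_positions=10))
```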

Reviewed By: theweiho, myleott

Differential Revision: D15937221

fbshipit-source-id: 1f5dc1c27ad9b6a89501d2dc015de12181059349

Support DDP.no_sync context manager · b625d53d

Myle Ott authored Jun 20, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/671

Differential Revision: D15925248

fbshipit-source-id: 9eeea8a257929347e2458afdfc1def8dbb925a72

20 Jun, 2019 6 commits

Use bert init for xlm_base · 6be5f07c

Matt Le authored Jun 20, 2019

Summary:
Use bert init for xlm_base. This seems to be much closer to what is done in the [XLM](https://github.com/facebookresearch/XLM/blob/master/src/model/transformer.py#L44) repo.

At update 10 with BERT init (f121471600), loss starts at 14.234

At update 10 without BERT init (f121471612), loss starts at 154.423

Reviewed By: liezl200, pipibjc

Differential Revision: D15874836

fbshipit-source-id: f81bf83a078992d7476ba7fdf263b731a9f5b66d

v0.7.1: fix PyPI setup and tests · 881381cf

Myle Ott authored Jun 20, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/818

Differential Revision: D15916265

Pulled By: myleott

fbshipit-source-id: c66c0bd988d3472c4150226952f34ee8d4c3db86

Enhanced MMapIndexedDataset: less memory, higher speed (#816) · 9462a819

davidecaroselli authored Jun 19, 2019

Summary:
I have upgraded my previous implementation of MMapIndexedDataset. Now:
- It uses up to **4 times less memory and disk space**
- Words per second is slightly improved thanks to less memory access
Pull Request resolved: https://github.com/pytorch/fairseq/pull/816

Differential Revision: D15899848

Pulled By: myleott

fbshipit-source-id: 9ddeb4809729ef69cc6b0867b33ee71184d845e6

Better explain the inference argument format of multilingual translation · 9c3bb5c6

Peng-Jen Chen authored Jun 19, 2019

Summary:
In https://github.com/pytorch/fairseq/issues/656, people are often confused about how to set multilingual translation parameters at inference time.

This diff adds more checks to ensure that the arguments (`--lang-pairs`, `--encoder-langtok`, `--decoder-langtok`) loaded from the checkpoint are consistent with the arguments specified on the generate/interactive command line.
We also add a section to the example page explaining how to set the arguments.
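
A consistency check of this kind might look like the sketch below. The helper `check_args_consistent`, the `CHECKED_ARGS` tuple, and the error wording are assumptions made for illustration; only the three argument names come from the description above.

```python
from argparse import Namespace

# Attribute names corresponding to --lang-pairs, --encoder-langtok, --decoder-langtok.
CHECKED_ARGS = ("lang_pairs", "encoder_langtok", "decoder_langtok")


def check_args_consistent(checkpoint_args, cli_args, names=CHECKED_ARGS):
    """Raise ValueError if any checked argument differs from the checkpoint.

    Hypothetical helper, not the check added by this diff.
    """
    mismatches = []
    for name in names:
        saved = getattr(checkpoint_args, name, None)
        given = getattr(cli_args, name, None)
        if saved != given:
            mismatches.append((name, saved, given))
    if mismatches:
        details = "; ".join(
            f"--{n.replace('_', '-')}: checkpoint={s!r}, command line={g!r}"
            for n, s, g in mismatches
        )
        raise ValueError("Arguments differ from the checkpoint: " + details)


ckpt = Namespace(lang_pairs="de-en,fr-en", encoder_langtok="src", decoder_langtok=True)
same = Namespace(lang_pairs="de-en,fr-en", encoder_langtok="src", decoder_langtok=True)
check_args_consistent(ckpt, same)  # passes silently
```

Failing early with a message that names each mismatched flag is one way to surface the confusion reported in the linked issue before generation starts.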

Reviewed By: myleott

Differential Revision: D15682169

fbshipit-source-id: 64e6db94cd72ea7ce2d0aa1067c9c2dcd3b8a2ac

wav2vec model (#654) · 392fce8a

alexeib authored Jun 19, 2019

Summary:
Merging wav2vec to master. Includes renames (Cpc -> wav2vec) and some light example files.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/654

Differential Revision: D15913409

Pulled By: alexeib

fbshipit-source-id: f723e6f211706cd9431c7d76dc12c4e80c9cfc80

v0.7.0 (#817) · bd710e75

Myle Ott authored Jun 19, 2019

Summary:
Notable (possibly breaking) changes:
- d45db804: Move checkpoint utility functions from utils.py into checkpoint_utils.py
- f2563c21: Move LM definitions into separate files
- dffb1674: Updates to model API:
  - `FairseqModel` -> `FairseqEncoderDecoderModel`
  - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
  - `encoder_out_dict` -> `encoder_out`
  - rm unused `remove_head` functions
- 34726d56: Move `distributed_init` into `DistributedFairseqModel`
- cf17068a: Simplify distributed launch by automatically launching multiprocessing on each node for all visible GPUs (allows launching just one job per node instead of one per GPU)
- d45db804: Change default LR scheduler from `reduce_lr_on_plateau` to `fixed`
- 96ac28d3: Rename `--sampling-temperature` -> `--temperature`
- fc1a19a3: Deprecate dummy batches
- a1c997bd: Add memory mapped datasets
- 0add50c2: Allow cycling over multiple datasets, where each one becomes an "epoch"

Plus many additional features and bugfixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/817

Differential Revision: D15913844

Pulled By: myleott

fbshipit-source-id: d5b5d678efdd9dd3e4d7ca848ddcf1ec2b21bf6b

19 Jun, 2019 4 commits

Add option to freeze transformer params for fine-tuning · af9500dc

Michael Wu authored Jun 19, 2019

Summary: add flags to freeze embedding parameters and transformer layer parameters in `TransformerSentenceEncoder`.

Reviewed By: myleott

Differential Revision: D15866135

fbshipit-source-id: e634d7adfd5e81eacccf2b9cf6bc15bad30bd1fe

Support different embed dim in Transformer decoder · 461a366d

Myle Ott authored Jun 19, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/811

Differential Revision: D15880880

Pulled By: myleott

fbshipit-source-id: c47e09a90c945aca82b26edb4a8af93e063d5b00

Replace the use of the deprecated torch.distributed.reduce_op with torch.distributed.ReduceOp (#804) · 00ac823e

freewym authored Jun 19, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/804

Differential Revision: D15877033

Pulled By: myleott

fbshipit-source-id: 58e7c39a88b67345a55b761fee4d9f211a5ee82c

Add fairspeq task to train ASR model with auxiliary data. (#813) · 14282ff3

Arya McCarthy authored Jun 18, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/813

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/663

Pull Request resolved: https://github.com/fairinternal/fairspeq/pull/4

Introduce a new training task for speech models that accepts additional (auxiliary) training data.

Reviewed By: liezl200

Differential Revision: D15846661

fbshipit-source-id: 8b2cbfd56a86cf03c0b34c4a025bebdd5db7204e

15 Jun, 2019 1 commit

Close memory maps · 1c1fd730

Myle Ott authored Jun 15, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/655

Differential Revision: D15816573

fbshipit-source-id: ac0118a1d407dc132cc7d82e029eac6c8ec76d2a

13 Jun, 2019 1 commit

Switch to gzip for large WMT'18 ensemble (#803) · 6d1233fa

Myle Ott authored Jun 12, 2019

Summary:
It's so much faster to extract (3 minutes instead of 20).
Pull Request resolved: https://github.com/pytorch/fairseq/pull/803

Differential Revision: D15795810

Pulled By: myleott

fbshipit-source-id: 3b2ae8bd7924a77ac8e795f5e1a7da0c4ae27374

12 Jun, 2019 3 commits

Add Model Averaging · 6982c404

Nayan Singhal authored Jun 12, 2019

Summary:
Implemented model averaging for fairseq.
Removed the DDP wrapper if a global optimizer is provided.
Models are synced based on the iteration provided in the input.

TODO:
1) Fix throughput and wps meter. Need to check other meters too.
2) Replace Model average code with BMUF algorithm implementation.
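
As a rough illustration of periodic model averaging, the sketch below keeps each replica's parameters as a plain list of floats and averages them every `sync_iter` iterations. All names here (`average_models`, `train`, `local_update`, `sync_iter`) are hypothetical, and this is plain averaging, not the BMUF algorithm mentioned in the TODO.

```python
def average_models(replicas):
    """Replace every replica's parameters with the element-wise mean."""
    n = len(replicas)
    mean = [sum(vals) / n for vals in zip(*replicas)]
    return [list(mean) for _ in replicas]


def train(replicas, local_update, num_iters, sync_iter):
    """Run local updates on each replica and average every sync_iter iterations."""
    for it in range(1, num_iters + 1):
        # Each replica trains independently, with no gradient sync.
        replicas = [local_update(params, rank) for rank, params in enumerate(replicas)]
        if it % sync_iter == 0:
            replicas = average_models(replicas)
    return replicas


# Toy update: replica `rank` adds rank + 1 to each parameter per iteration.
toy_update = lambda params, rank: [p + rank + 1 for p in params]
synced = train([[0.0], [0.0]], toy_update, num_iters=2, sync_iter=2)
```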

Reviewed By: myleott

Differential Revision: D15711044

fbshipit-source-id: 58a4af74db2a61d06762597b95836cbeb1ed82cc

Add more torch.hub deps · 78c2fcf0

Myle Ott authored Jun 12, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/801

Differential Revision: D15781975

Pulled By: myleott

fbshipit-source-id: b86276cd3a40138c09494637c43ce52a56c4aced

Add missing dependencies to hubconf · 37df862e

Myle Ott authored Jun 11, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/799

Differential Revision: D15773932

Pulled By: myleott

fbshipit-source-id: 650c0621bedb3b7ecebc0654d8e10d7692c50994

11 Jun, 2019 7 commits

Iterate on torch.hub interface · 5bdee18e

Myle Ott authored Jun 11, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/793

Differential Revision: D15758755

Pulled By: myleott

fbshipit-source-id: b93e4ac11bde36a0b59b4d6d1c84d31c3124d767

Automatically fill in default values from add_args · eea4d20b

Myle Ott authored Jun 11, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/797

Differential Revision: D15761071

Pulled By: myleott

fbshipit-source-id: 257d4a2297e83da7e59baed154dbafd6bfe614bf

Add exception for bsz=1 with prefix generation (#796) · 1b937bb2

Myle Ott authored Jun 11, 2019

Summary:
This is a temporary workaround to support sampling after https://github.com/pytorch/fairseq/issues/713. We'll need to revisit this to support sampling and beam more generally.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/796

Differential Revision: D15760808

Pulled By: myleott

fbshipit-source-id: ecaf4f161b0c30de037f32007e4610a559a49230

Python3.5 compat (#794) · a8f28ecb

Bairen Yi authored Jun 11, 2019

Summary:
See #467. Ping myleott to review.

This is a work-related contribution. Ping lark to review.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/794

Differential Revision: D15756816

Pulled By: myleott

fbshipit-source-id: 6dce3ff3a713bf5f60e5782bc260b2ca9d2c0a9b

Add generic registry mechanism · 9b40999e

Myle Ott authored Jun 11, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/792

Differential Revision: D15741781

Pulled By: myleott

fbshipit-source-id: c256c7900c307d485904e69b1526b9acbe08fec9

when given prefix_tokens, sequence generator would generate (exactly) same finished candidates (#713) · 9dc9a486

yilinyang7 authored Jun 11, 2019

Summary:
https://github.com/pytorch/fairseq/issues/712
Pull Request resolved: https://github.com/pytorch/fairseq/pull/713

Differential Revision: D15242432

Pulled By: myleott

fbshipit-source-id: a230ee48f4bf891c805609c428d7233a0ad21179

Fix of MHA for TPUs (#636) · ee8bcb17

Sergey Edunov authored Jun 10, 2019

Summary:
Multi-head attention is currently not TPU-friendly: specifically, .data_ptr() is not supported and should not be used. There are also potential correctness issues with the existing code (e.g. data_ptr() can point to the same storage for different tensors). Rather than relying on data_ptr(), we should explicitly set the self_attention or encoder_decoder_attention flags.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/636

Reviewed By: myleott

Differential Revision: D15709898

Pulled By: edunov

fbshipit-source-id: f931713193c51be848a5de20da730ac3a3ce0187

10 Jun, 2019 2 commits

More generator features for demo (#791) · 4868c182

Myle Ott authored Jun 10, 2019

Summary:
- make it possible to load file_utils.py without the dependencies
- add some more demo features
Pull Request resolved: https://github.com/pytorch/fairseq/pull/791

Differential Revision: D15739950

Pulled By: myleott

fbshipit-source-id: 38df5209973a6fe2e3651575b97134e096aaf5bf

fix log printing in progress bar (#778) · a58c1127

freewym authored Jun 10, 2019

Summary:
In the current progress bar, the counter for log_interval will always start from 0, which is not correct if reloading from a checkpoint in the middle of an epoch. This fix obtains the offset from the iterator to set the counter correctly.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/778
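
The effect of the offset can be sketched in a few lines. The function `log_points` is a hypothetical illustration of a log-interval counter, not the progress bar code itself; `offset` stands for the number of batches already consumed in the epoch before the checkpoint was saved.

```python
def log_points(remaining_batches, log_interval, offset=0):
    """Return the counter values at which a log line would be printed.

    Hypothetical sketch: the counter starts at `offset` rather than 0, so
    logging stays aligned with the position inside the epoch after a resume.
    """
    points = []
    for i, _batch in enumerate(remaining_batches, start=offset):
        if i > 0 and i % log_interval == 0:
            points.append(i)
    return points


# Epoch of 10 batches, resumed after 6 were already consumed.
remaining = range(6, 10)
restarted = log_points(remaining, log_interval=4)          # counter restarts at 0
resumed = log_points(remaining, log_interval=4, offset=6)  # counter resumes at 6
```

With the counter restarted at 0, the four remaining batches never reach the interval in this sketch; with the offset applied, a log line is emitted at update 8 as it would have been in an uninterrupted epoch.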

Differential Revision: D15739953

Pulled By: myleott

fbshipit-source-id: a1d13403ec5783b22e01d7cb63874fd8dea7f8b0

07 Jun, 2019 1 commit

Replace unknown word by original source word when empty string is given (#770) · 1ca075a2

Ning Dong authored Jun 06, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/770

Without this change, the comment at https://fburl.com/w1cejgw9 is inconsistent with the implementation.

Reviewed By: xianxl

Differential Revision: D15582826

fbshipit-source-id: 16d8368560153b251beed8b290f51fcdd8a8faee

06 Jun, 2019 1 commit

Change encoder_learned_pos default back to True for xlm_base · fa7791df

Matt Le authored Jun 06, 2019

Reviewed By: pipibjc

Differential Revision: D15635402

fbshipit-source-id: e92fab914de40775d7bad851420355240d822bde

04 Jun, 2019 4 commits

Fix loading XLM pretraining · 5408bc08

Matt Le authored Jun 04, 2019

Summary: We never actually load the model parameters from an XLM model when using transformer_from_pretrained_xlm. Also, change encoder_learned_pos from True to False.

Reviewed By: liezl200

Differential Revision: D15629061

fbshipit-source-id: 759eadc88041eae94505477960de57dd78a99dcb

Fixing xlm example docs (#776) · 0d636744

lematt1991 authored Jun 04, 2019

Summary:
Resolves #762
Pull Request resolved: https://github.com/pytorch/fairseq/pull/776

Differential Revision: D15631503

Pulled By: lematt1991

fbshipit-source-id: 103f77d553476917b8b0f8001767217fb311d920

Remove overridden inverse_sqrt lr scheduler in dynamic conv example (#769) · b1dd40cf

lematt1991 authored Jun 04, 2019

Summary:
Resolves #768
Pull Request resolved: https://github.com/pytorch/fairseq/pull/769

Differential Revision: D15621841

Pulled By: lematt1991

fbshipit-source-id: 694effe3788ff7d04864217d673608ec31da589e

Adding masked_lm_dictionary to pytorch_translate (#630) · 4ed5abc9

Biao Lu authored Jun 03, 2019

Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/630

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/629

Pull Request resolved: https://github.com/pytorch/translate/pull/562

Pull Request resolved: https://github.com/pytorch/fairseq/pull/774

forked masked_lm_dictionary from fairseq
changed import in pytorch_translate to use the new masked_lm_dictionary
registered corresponding tasks

Reviewed By: liezl200

Differential Revision: D15410352

fbshipit-source-id: 06516caabdd4dc5cdee9ad1d8025978f4eea6c4b
