1. 08 Jul, 2019 1 commit
    • Integrate torch.nn and fairseq MultiheadAttention (#772) · 6d2e0831
      Guanheng Zhang authored
      Summary:
      Integrate torch.nn and fairseq MultiheadAttention modules. In the future, both libraries will benefit from shared performance optimizations.
      
      Under the following circumstances, the MultiheadAttention calculation will still remain in fairseq:
      1. onnx trace
      2. incremental state
      3. static kv
      
      We plan to gradually migrate those capabilities to PyTorch's core library.
      
      fairseq users can use the attribute self.enable_torch_version to force the calculation to run in either torch or fairseq. We used the following script to verify that both versions yield the same results.
      
      ------------------------------------------------------------------------------------
      ```
      import torch
      from fairseq.modules import MultiheadAttention
      import time
      
      embed_dim = 64
      kv_embed_dim = 1208
      num_heads = 16
      src_len = 20
      tgt_len = 30
      bsz = 10
      
      model = MultiheadAttention(embed_dim, num_heads, kdim=kv_embed_dim, vdim=kv_embed_dim,
                                 bias=True, add_bias_kv=True, add_zero_attn=True)
      
      query = torch.rand((src_len, bsz, embed_dim))
      key = torch.rand((src_len, bsz, kv_embed_dim))
      value = torch.rand((src_len, bsz, kv_embed_dim))
      
      # random 0/1 mask converted to an additive mask: 0 -> -inf, 1 -> 0.0
      attn_mask = torch.randint(0, 2, (src_len, src_len)).float()
      attn_mask.masked_fill_(attn_mask == 0, float('-inf'))
      attn_mask.masked_fill_(attn_mask > 0, float('0.0'))
      
      # repeat the same padding pattern for every sequence in the batch
      seq_mask = torch.randint(0, 2, (1, src_len))
      key_padding_mask = seq_mask.repeat(bsz, 1) == 1
      
      # Apply torch.nn version
      model.enable_torch_version = True
      torch_output, torch_weight = model(query, key, value, key_padding_mask=key_padding_mask, attn_mask=attn_mask)
      
      # Apply fairseq version
      model.enable_torch_version = False
      fairseq_output, fairseq_weight = model(query, key, value, key_padding_mask=key_padding_mask, attn_mask=attn_mask)
      
      print("torch and fairseq generate same results: outputs are same ? ",
            torch.allclose(torch_output, fairseq_output, atol=5e-6, rtol=1e-6),
            ", weights are same ? ",
            torch.allclose(torch_weight, fairseq_weight, atol=5e-6, rtol=1e-6)
      )
      ```
      ------------------------------------------------------------------------------------
      Expected results:
      torch and fairseq generate same results: outputs are same ?  True , weights are same ?  True
      
      ------------------------------------------------------------------------------------
      Similar performance is expected from both versions. Using the following setup, we obtained initial benchmark results:
      
      #########################
      embed_dim = 32
      kv_embed_dim = 32
      num_heads = 4
      src_len = 3
      tgt_len = 2
      bsz = 4
      num_samples = 50000
      
      #########################
      torch-version MultiheadAttention cpu time: 0.46589  ms per iteration.
      fairseq-version MultiheadAttention cpu time: 0.47861  ms per iteration.
      torch-version MultiheadAttention gpu time: 0.82330  ms per iteration.
      fairseq-version MultiheadAttention gpu time: 0.79410  ms per iteration.
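      The per-iteration timings above can be reproduced with a harness along these lines (a hedged sketch; the actual benchmark script is not part of the commit message, and `benchmark` is a hypothetical helper):

```python
import time

def benchmark(fn, num_samples=1000):
    """Return the average wall-clock time of fn() in milliseconds."""
    fn()  # warm-up call so one-time setup cost is excluded
    start = time.perf_counter()
    for _ in range(num_samples):
        fn()
    return (time.perf_counter() - start) / num_samples * 1000.0

# usage: benchmark(lambda: model(query, key, value), num_samples=50000)
```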
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/772
      
      Reviewed By: myleott
      
      Differential Revision: D16108450
      
      Pulled By: zhangguanheng66
      
      fbshipit-source-id: cd2eb5a6eeeab6c274999b7928c2af14fc211565
  2. 06 Jul, 2019 2 commits
  3. 04 Jul, 2019 1 commit
    • support streaming iterator · 5c241c8c
      Spencer Poff authored
      Summary:
      For tasks that involve streaming data directly from an API, we need a simpler epoch iterator.
      
      Also included in this change is support for initializing a dictionary with an arbitrary list of special symbols.
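      The idea can be sketched as follows (a minimal illustration, not fairseq's actual iterator API):

```python
class StreamingEpochIterator:
    """Epoch iterator over a data stream: no len() on the dataset is
    required, so batches can come straight from an API."""

    def __init__(self, make_stream):
        self.make_stream = make_stream  # callable returning a fresh batch iterator
        self.epoch = 0

    def next_epoch_itr(self):
        self.epoch += 1
        return self.make_stream()

# a dictionary could similarly accept an arbitrary list of extra special
# symbols at construction time, e.g. ["<mask>", "<sep>"] (hypothetical usage)
```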
      
      Reviewed By: myleott
      
      Differential Revision: D16110603
      
      fbshipit-source-id: be6d9f680292dec1512614871f9269c95ac84861
  4. 02 Jul, 2019 1 commit
    • add --max-tokens-valid option for validation · bccfddbb
      Xutai Ma authored
      Summary: Add the --max-tokens-valid option. Sometimes a separate max batch-token limit for validation is helpful, for example when the validation set contains a sequence longer than max_tokens (rare in MT, but possible in ASR or AST).
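      As a toy illustration of why a separate validation budget helps, consider a token-budget batcher (a sketch only, not fairseq's actual batching code, which also accounts for padding and --max-sentences):

```python
def batch_by_tokens(lengths, max_tokens):
    """Group sample indices into batches whose summed length stays
    within max_tokens; an over-long sample gets a batch of its own."""
    batches, cur, cur_tokens = [], [], 0
    for i, n in enumerate(lengths):
        if cur and cur_tokens + n > max_tokens:
            batches.append(cur)
            cur, cur_tokens = [], 0
        cur.append(i)
        cur_tokens += n
    if cur:
        batches.append(cur)
    return batches

# with a separate validation budget, a long validation sequence no longer
# forces the training max_tokens upward
```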
      
      Reviewed By: myleott
      
      Differential Revision: D16076951
      
      fbshipit-source-id: ae7f4218594580b9450a8196d7afa1e7e2018aee
  5. 01 Jul, 2019 3 commits
  6. 30 Jun, 2019 4 commits
  7. 27 Jun, 2019 2 commits
    • Update generate.py (#831) · c86d70cc
      Bao-Yu authored
      Summary:
      Repeated use of the loop variable 'i' in evaluation may cause problems.
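      A minimal illustration of the hazard (not the actual generate.py code):

```python
results = []
for i in range(3):           # outer index
    for i in range(10, 12):  # inner loop silently rebinds `i`
        pass
    results.append(i)        # `i` is now the inner loop's last value (11)

# every appended value is 11, not the outer index
```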
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/831
      
      Differential Revision: D15980227
      
      Pulled By: myleott
      
      fbshipit-source-id: 7b6b54a6b54938ad63ed1720d930505b56e5c84b
    • 2/N bmuf · c246df42
      Nayan Singhal authored
      Summary:
      Added BMUF implementation.
      
      Todo:
      1) Add unit test cases for model averaging and BMUF
      2) Add warmup before actually starting to train the model
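      The core BMUF (blockwise model-update filtering, Chen &amp; Huo 2016) step can be sketched in a single process as follows; the fairseq implementation distributes the averaging across workers, and this function is only a hypothetical illustration of the update rule:

```python
def bmuf_step(global_params, avg_params, prev_delta,
              block_momentum=0.9, block_lr=1.0):
    """One block-level update: treat the averaged drift since the last
    sync as a 'block gradient' and smooth it with block momentum."""
    grad = [a - g for a, g in zip(avg_params, global_params)]
    delta = [block_momentum * d + block_lr * gr
             for d, gr in zip(prev_delta, grad)]
    new_global = [g + d for g, d in zip(global_params, delta)]
    return new_global, delta
```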
      
      Reviewed By: jay-mahadeokar
      
      Differential Revision: D15871477
      
      fbshipit-source-id: 866b0aba2d5bea5b65b4438acb49c886c4a87924
  8. 26 Jun, 2019 3 commits
  9. 25 Jun, 2019 1 commit
    • avoid "divided by zero error" in logging_outputs when --use-bmuf is enabled (#812) · b3864b28
      freewym authored
      Summary:
      When doing multi-GPU training with --use-bmuf turned on and --global-sync-iter > 1, each replica may not sync with the other replicas at every iteration, so logging_outputs only contains its own stats. Moreover, logging_outputs may be empty at the end of an epoch after "a dummy iteration", because the number of replicas does not evenly divide the number of batches of training data. When this happens, sample_size and ntokens are 0 on some replica, causing a "divided by zero" error. This fix sets *loss to 0 whenever sample_size/ntokens is 0.
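      The fix amounts to a guard of this shape (a sketch of the idea, not the exact aggregation code):

```python
def safe_average(loss_sum, sample_size):
    # when a replica has no samples after a dummy iteration,
    # report 0 instead of dividing by zero
    if sample_size == 0:
        return 0.0
    return loss_sum / sample_size
```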
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/812
      
      Reviewed By: myleott, yqwangustc
      
      Differential Revision: D15908614
      
      Pulled By: nayansinghal
      
      fbshipit-source-id: c92e8e095f012bdb4ef753a3c627fd215afa215d
  10. 24 Jun, 2019 1 commit
  11. 23 Jun, 2019 3 commits
  12. 21 Jun, 2019 2 commits
  13. 20 Jun, 2019 6 commits
    • Use bert init for xlm_base · 6be5f07c
      Matt Le authored
      Summary:
      Use bert init for xlm_base.  This seems to be much closer to what is done in the [XLM](https://github.com/facebookresearch/XLM/blob/master/src/model/transformer.py#L44) repo.
      
      At update 10 with BERT init (f121471600), loss starts at 14.234
      
      At update 10 without BERT init (f121471612), loss starts at 154.423
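      BERT-style initialization conventionally means N(0, 0.02) weights with zeroed biases; a sketch of such an init hook (assuming the usual convention, not necessarily fairseq's exact code):

```python
import torch.nn as nn

def bert_init(module):
    """BERT-style init: N(0, 0.02) weights for Linear and Embedding
    layers, zero biases, LayerNorm reset to identity."""
    if isinstance(module, (nn.Linear, nn.Embedding)):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
    if isinstance(module, nn.Linear) and module.bias is not None:
        nn.init.zeros_(module.bias)
    if isinstance(module, nn.LayerNorm):
        nn.init.zeros_(module.bias)
        nn.init.ones_(module.weight)

layer = nn.Linear(16, 16)
layer.apply(bert_init)  # weight std is now around 0.02
```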
      
      Reviewed By: liezl200, pipibjc
      
      Differential Revision: D15874836
      
      fbshipit-source-id: f81bf83a078992d7476ba7fdf263b731a9f5b66d
    • v0.7.1: fix PyPI setup and tests · 881381cf
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/818
      
      Differential Revision: D15916265
      
      Pulled By: myleott
      
      fbshipit-source-id: c66c0bd988d3472c4150226952f34ee8d4c3db86
    • Enhanced MMapIndexedDataset: less memory, higher speed (#816) · 9462a819
      davidecaroselli authored
      Summary:
      I have made an upgrade to my previous implementation of MMapIndexedDataset; now:
      - It uses up to **4 times less memory and disk space**
      - Words per second is slightly improved thanks to fewer memory accesses
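      The mechanism can be illustrated with a toy memory-mapped layout: one flat binary data file plus an (offset, length) index, so samples are paged in on demand rather than held in RAM (a sketch only, not fairseq's actual on-disk format):

```python
import numpy as np

def write_dataset(prefix, samples):
    """Write samples as one flat int32 .bin file plus an
    (offset, length) index."""
    offsets, lengths, flat = [], [], []
    pos = 0
    for s in samples:
        offsets.append(pos)
        lengths.append(len(s))
        flat.extend(s)
        pos += len(s)
    np.asarray(flat, dtype=np.int32).tofile(prefix + ".bin")
    np.savez(prefix + ".idx.npz", offsets=offsets, lengths=lengths)

def read_sample(prefix, i):
    """Fetch sample i through np.memmap: only touched pages are read."""
    idx = np.load(prefix + ".idx.npz")
    data = np.memmap(prefix + ".bin", dtype=np.int32, mode="r")
    start = int(idx["offsets"][i])
    return data[start:start + int(idx["lengths"][i])]
```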
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/816
      
      Differential Revision: D15899848
      
      Pulled By: myleott
      
      fbshipit-source-id: 9ddeb4809729ef69cc6b0867b33ee71184d845e6
    • Better explain the inference argument format of multilingual translation · 9c3bb5c6
      Peng-Jen Chen authored
      Summary:
      In https://github.com/pytorch/fairseq/issues/656, people are often confused about how to set multilingual translation parameters at inference time.
      
      This diff adds more checks to ensure that the arguments (`--lang-pairs`, `--encoder-langtok`, `--decoder-langtok`) loaded from the checkpoint are consistent with the arguments specified on the generate/interactive command line.
      We also add a section to the examples page explaining how to set these arguments.
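      The checks behave roughly like this sketch (a hypothetical helper; the real validation lives inside the multilingual translation task):

```python
def check_args_consistent(ckpt_args, cli_args,
                          keys=("lang_pairs", "encoder_langtok",
                                "decoder_langtok")):
    """Raise if an argument saved in the checkpoint differs from the one
    given on the generate/interactive command line."""
    for k in keys:
        if ckpt_args.get(k) != cli_args.get(k):
            raise ValueError(
                "--{} mismatch: checkpoint has {!r}, command line has {!r}"
                .format(k.replace("_", "-"), ckpt_args.get(k),
                        cli_args.get(k)))
```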
      
      Reviewed By: myleott
      
      Differential Revision: D15682169
      
      fbshipit-source-id: 64e6db94cd72ea7ce2d0aa1067c9c2dcd3b8a2ac
    • wav2vec model (#654) · 392fce8a
      alexeib authored
      Summary:
      Merging wav2vec to master. Includes renames (Cpc -> wav2vec) and some light example files.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/654
      
      Differential Revision: D15913409
      
      Pulled By: alexeib
      
      fbshipit-source-id: f723e6f211706cd9431c7d76dc12c4e80c9cfc80
    • v0.7.0 (#817) · bd710e75
      Myle Ott authored
      Summary:
      Notable (possibly breaking) changes:
      - d45db804: Remove checkpoint utility functions from utils.py into checkpoint_utils.py
      - f2563c21: Move LM definitions into separate files
      - dffb1674: Updates to model API:
        - `FairseqModel` -> `FairseqEncoderDecoderModel`
        - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
        - `encoder_out_dict` -> `encoder_out`
        - rm unused `remove_head` functions
      - 34726d56: Move `distributed_init` into `DistributedFairseqModel`
      - cf17068a: Simplify distributed launch by automatically launching multiprocessing on each node for all visible GPUs (allows launching just one job per node instead of one per GPU)
      - d45db804: Change default LR scheduler from `reduce_lr_on_plateau` to `fixed`
      - 96ac28d3: Rename `--sampling-temperature` -> `--temperature`
      - fc1a19a3: Deprecate dummy batches
      - a1c997bd: Add memory mapped datasets
      - 0add50c2: Allow cycling over multiple datasets, where each one becomes an "epoch"
      
      Plus many additional features and bugfixes
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/817
      
      Differential Revision: D15913844
      
      Pulled By: myleott
      
      fbshipit-source-id: d5b5d678efdd9dd3e4d7ca848ddcf1ec2b21bf6b
  14. 19 Jun, 2019 4 commits
  15. 15 Jun, 2019 1 commit
  16. 13 Jun, 2019 1 commit
  17. 12 Jun, 2019 3 commits
    • Add Model Averaging · 6982c404
      Nayan Singhal authored
      Summary:
      Implemented model averaging for fairseq.
      Removed the DDP wrapper when a global optimizer is provided.
      All models are synced based on the iteration interval provided in the input.
      
      TODO:
      1) Fix throughput and wps meter. Need to check other meters too.
      2) Replace Model average code with BMUF algorithm implementation.
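      Plain model averaging, the precursor to the BMUF algorithm mentioned above, can be sketched as follows (a single-process stand-in for what is actually done across workers with torch.distributed):

```python
def average_replicas(replicas):
    """Replace every replica's parameters with the element-wise mean
    across replicas, as done at each sync interval."""
    n = len(replicas)
    mean = [sum(group) / n for group in zip(*replicas)]
    return [list(mean) for _ in replicas]
```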
      
      Reviewed By: myleott
      
      Differential Revision: D15711044
      
      fbshipit-source-id: 58a4af74db2a61d06762597b95836cbeb1ed82cc
    • Add more torch.hub deps · 78c2fcf0
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/801
      
      Differential Revision: D15781975
      
      Pulled By: myleott
      
      fbshipit-source-id: b86276cd3a40138c09494637c43ce52a56c4aced
    • Add missing dependencies to hubconf · 37df862e
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/799
      
      Differential Revision: D15773932
      
      Pulled By: myleott
      
      fbshipit-source-id: 650c0621bedb3b7ecebc0654d8e10d7692c50994
  18. 11 Jun, 2019 1 commit