1. 19 Jul, 2019 5 commits
  2. 17 Jul, 2019 5 commits
  3. 14 Jul, 2019 1 commit
  4. 11 Jul, 2019 2 commits
  5. 10 Jul, 2019 1 commit
  6. 09 Jul, 2019 1 commit
  7. 08 Jul, 2019 1 commit
    • Integrate torch.nn and fairseq MultiheadAttention (#772) · 6d2e0831
      Guanheng Zhang authored
      Summary:
      Integrate the torch.nn and fairseq MultiheadAttention modules. Going forward, both libraries will benefit from shared performance optimizations.
      
      The MultiheadAttention calculation still remains in fairseq under the following circumstances:
      1. onnx trace
      2. incremental state
      3. static kv
      
      We plan to gradually migrate those capabilities to PyTorch's core library.
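
      As a rough illustration, the fallback decision amounts to a check like the one below, gated by the enable_torch_version attribute described next; the helper name and attribute lookups are illustrative, not the module's verbatim code.

      ```
      def should_use_torch_attention(module, incremental_state=None, static_kv=False):
          """Use the torch.nn kernel only when none of the fairseq-only features
          (ONNX trace, incremental state, static kv) are requested."""
          return (
              getattr(module, "enable_torch_version", False)
              and not getattr(module, "onnx_trace", False)  # 1. onnx trace
              and incremental_state is None                 # 2. incremental state
              and not static_kv                             # 3. static kv
          )
      ```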
      
      Fairseq users can use the attribute self.enable_torch_version to force the calculation to run in either torch or fairseq. We use the following script to verify that both versions yield the same results.
      
      ------------------------------------------------------------------------------------
      ```
      import torch
      from fairseq.modules import MultiheadAttention
      import time
      
      embed_dim = 64
      kv_embed_dim = 1208
      num_heads = 16
      src_len = 20
      tgt_len = 30
      bsz = 10
      
      model = MultiheadAttention(embed_dim, num_heads, kdim=kv_embed_dim, vdim=kv_embed_dim,
                                 bias=True, add_bias_kv=True, add_zero_attn=True)
      
      query = torch.rand((src_len, bsz, embed_dim))
      key = torch.rand((src_len, bsz, kv_embed_dim))
      value = torch.rand((src_len, bsz, kv_embed_dim))
      
      # attn_mask: 0.0 keeps a position, float('-inf') masks it out
      attn_mask = torch.randint(0, 2, (src_len, src_len)).float()
      attn_mask.masked_fill_(attn_mask == 0, float('-inf'))
      attn_mask.masked_fill_(attn_mask > 0, float('0.0'))
      
      # key_padding_mask: replicate one random mask across the batch; True marks padded positions
      seq_mask = torch.randint(0, 2, (1, src_len))
      key_padding_mask = seq_mask
      for i in range(bsz - 1):
          key_padding_mask = torch.cat([key_padding_mask, seq_mask], dim=0)
      key_padding_mask = key_padding_mask == 1
      
      # Apply torch.nn version
      model.enable_torch_version = True
      torch_output, torch_weight = model(query, key, value, key_padding_mask=key_padding_mask, attn_mask=attn_mask)
      
      # Apply fairseq version
      model.enable_torch_version = False
      fairseq_output, fairseq_weight = model(query, key, value, key_padding_mask=key_padding_mask, attn_mask=attn_mask)
      
      print("torch and fairseq generate same results: outputs are same ? ",
            torch.allclose(torch_output, fairseq_output, atol=5e-6, rtol=1e-6),
            ", weights are same ? ",
            torch.allclose(torch_weight, fairseq_weight, atol=5e-6, rtol=1e-6)
      )
      ```
      ------------------------------------------------------------------------------------
      Expected results:
      torch and fairseq generate same results: outputs are same ?  True , weights are same ?  True
      
      ------------------------------------------------------------------------------------
      Similar performance is expected from both versions. Using the following setup, we obtained these initial benchmark results:
      
      #########################
      embed_dim = 32
      kv_embed_dim = 32
      num_heads = 4
      src_len = 3
      tgt_len = 2
      bsz = 4
      num_samples = 50000
      
      #########################
      torch-version MultiheadAttention cpu time: 0.46589  ms per iteration.
      fairseq-version MultiheadAttention cpu time: 0.47861  ms per iteration.
      torch-version MultiheadAttention gpu time: 0.82330  ms per iteration.
      fairseq-version MultiheadAttention gpu time: 0.79410  ms per iteration.
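
      A minimal sketch of how such a per-iteration timing can be reproduced; the harness below is illustrative (wall-clock time averaged over num_samples forward passes), not the exact script behind the numbers above.

      ```
      import time
      import torch
      from fairseq.modules import MultiheadAttention

      embed_dim, num_heads, src_len, bsz, num_samples = 32, 4, 3, 4, 50000

      model = MultiheadAttention(embed_dim, num_heads, bias=True)
      query = torch.rand((src_len, bsz, embed_dim))
      key = torch.rand((src_len, bsz, embed_dim))
      value = torch.rand((src_len, bsz, embed_dim))

      def benchmark(use_torch):
          model.enable_torch_version = use_torch
          start = time.time()
          with torch.no_grad():
              for _ in range(num_samples):
                  model(query, key, value)
          return (time.time() - start) / num_samples * 1000  # ms per iteration

      print("torch-version:   %.5f ms per iteration" % benchmark(True))
      print("fairseq-version: %.5f ms per iteration" % benchmark(False))
      ```
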
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/772
      
      Reviewed By: myleott
      
      Differential Revision: D16108450
      
      Pulled By: zhangguanheng66
      
      fbshipit-source-id: cd2eb5a6eeeab6c274999b7928c2af14fc211565
  8. 06 Jul, 2019 2 commits
  9. 04 Jul, 2019 1 commit
    • support streaming iterator · 5c241c8c
      Spencer Poff authored
      Summary:
      For tasks that involve streaming data directly from an API, we need a simpler epoch iterator.
      
      Also included in this change is support for initializing a dictionary with an arbitrary list of special symbols.
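
      As an illustration only (not the class added in this diff), a streaming epoch iterator can be as simple as wrapping an open-ended batch stream and handing out a fixed number of batches per "epoch":

      ```
      class SimpleStreamingIterator:
          """Treats each fixed-length chunk of an open-ended batch stream as one epoch,
          without needing to know the dataset length up front."""

          def __init__(self, batch_stream, batches_per_epoch):
              self._stream = iter(batch_stream)
              self.batches_per_epoch = batches_per_epoch
              self.epoch = 0

          def next_epoch_itr(self):
              self.epoch += 1
              return (next(self._stream) for _ in range(self.batches_per_epoch))
      ```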
      
      Reviewed By: myleott
      
      Differential Revision: D16110603
      
      fbshipit-source-id: be6d9f680292dec1512614871f9269c95ac84861
  10. 02 Jul, 2019 1 commit
    • add --max-tokens-valid option for validation · bccfddbb
      Xutai Ma authored
      Summary: Add the --max-tokens-valid option. A separate max-tokens limit for validation can sometimes be helpful, for example when the validation set contains a sequence longer than max_tokens (rare in MT, but possible in ASR or AST).
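
      The intended behaviour, sketched under the assumption that the new option simply falls back to --max-tokens when unset (helper name is illustrative):

      ```
      def resolve_max_tokens_valid(max_tokens, max_tokens_valid=None):
          # If --max-tokens-valid is not given, validation uses the same
          # token budget as training, so existing behaviour is unchanged.
          return max_tokens if max_tokens_valid is None else max_tokens_valid

      assert resolve_max_tokens_valid(4096) == 4096          # default: same as training
      assert resolve_max_tokens_valid(4096, 16384) == 16384  # larger budget for long validation sequences
      ```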
      
      Reviewed By: myleott
      
      Differential Revision: D16076951
      
      fbshipit-source-id: ae7f4218594580b9450a8196d7afa1e7e2018aee
  11. 01 Jul, 2019 3 commits
  12. 30 Jun, 2019 4 commits
  13. 27 Jun, 2019 2 commits
    • Update generate.py (#831) · c86d70cc
      Bao-Yu authored
      Summary:
      The loop variable 'i' is reused inside the evaluation loop in generate.py, which can cause indexing problems; a minimal illustration follows below.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/831
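
      The class of bug in miniature (a standalone illustration, not the actual generate.py code):

      ```
      for i in range(3):           # i indexes the batch
          for i in range(10):      # i is silently reused to index hypotheses
              pass
          print(i)                 # always 9 here, no longer the batch index
      ```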
      
      Differential Revision: D15980227
      
      Pulled By: myleott
      
      fbshipit-source-id: 7b6b54a6b54938ad63ed1720d930505b56e5c84b
    • 2/N bmuf · c246df42
      Nayan Singhal authored
      Summary:
      Added a BMUF (block-wise model update filtering) implementation.

      Todo:
      1) Add unit tests covering model averaging and BMUF
      2) Add warm-up before actually starting to train the model
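
      For reference, a rough sketch of the classical BMUF synchronization step; the parameter names follow the original block momentum formulation and are illustrative, not this diff's code.

      ```
      def bmuf_sync(avg_params, prev_global_params, prev_delta,
                    block_momentum=0.875, block_lr=1.0):
          """One sync: avg_params is the average of all replicas' parameters for
          this block; prev_global_params / prev_delta come from the previous sync."""
          block_grad = avg_params - prev_global_params                  # G(t): progress in this block
          delta = block_momentum * prev_delta + block_lr * block_grad   # filtered update
          new_global = prev_global_params + delta                       # W(t) = W(t-1) + delta
          return new_global, delta
      ```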
      
      Reviewed By: jay-mahadeokar
      
      Differential Revision: D15871477
      
      fbshipit-source-id: 866b0aba2d5bea5b65b4438acb49c886c4a87924
  14. 26 Jun, 2019 3 commits
  15. 25 Jun, 2019 1 commit
    • avoid "division by zero" error in logging_outputs when --use-bmuf is enabled (#812) · b3864b28
      freewym authored
      Summary:
      When doing multi-GPU training with --use-bmuf turned on and --global-sync-iter > 1, each replica may not sync with the other replicas at every iteration, so logging_outputs only contains each replica's own stats. In addition, logging_outputs may be empty at the end of an epoch, after "a dummy iteration", when the number of replicas does not divide the number of batches in the training data. If this happens, sample_size and ntokens are 0 for some replica and cause a "division by zero" error. This fix sets *loss to 0 when sample_size/ntokens is 0.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/812
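
      A sketch of the guard described above (the aggregation shown is illustrative; the log keys follow standard fairseq criterion conventions):

      ```
      import math

      def aggregate_loss(logging_outputs):
          # logging_outputs can be empty (e.g. after the dummy iteration at the
          # end of an epoch), so guard the division by sample_size.
          loss_sum = sum(log.get('loss', 0) for log in logging_outputs)
          sample_size = sum(log.get('sample_size', 0) for log in logging_outputs)
          return loss_sum / sample_size / math.log(2) if sample_size > 0 else 0.0
      ```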
      
      Reviewed By: myleott, yqwangustc
      
      Differential Revision: D15908614
      
      Pulled By: nayansinghal
      
      fbshipit-source-id: c92e8e095f012bdb4ef753a3c627fd215afa215d
  16. 24 Jun, 2019 1 commit
  17. 23 Jun, 2019 3 commits
  18. 21 Jun, 2019 2 commits
  19. 20 Jun, 2019 1 commit