Commits · ce7f044bb100aeec6b3c524a654ce8c177403c0b · OpenDAS / Fairseq

29 Jul, 2019 3 commits

Add instructions to load RoBERTa models on PyTorch 1.0 · ce7f044b

Myle Ott authored Jul 29, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/921

Differential Revision: D16541025

Pulled By: myleott

fbshipit-source-id: bb78d30fe285da2adfc7c4e5897ee01fa413b2e4

ce7f044b

Add RoBERTa · 8d036c2f

Myle Ott authored Jul 28, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/916

Differential Revision: D16537774

Pulled By: myleott

fbshipit-source-id: 86bb7b1913a428ee4a21674cc3fc7b39264067ec

8d036c2f

Update BPE library code · a80cade9

Myle Ott authored Jul 28, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/780

Differential Revision: D16537567

Pulled By: myleott

fbshipit-source-id: 4e18c529959935e82ea122c3a2ee477308ffcbe3

a80cade9

28 Jul, 2019 6 commits

Change default --num-workers to 1 · 76ff39f5

Myle Ott authored Jul 28, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/779

Differential Revision: D16536673

Pulled By: myleott

fbshipit-source-id: bf56e9a81d3086f3d95a3273391dc5e04ed2dbc4

76ff39f5

Add Adamax optimizer · c446c44b

Myle Ott authored Jul 28, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/914

Differential Revision: D16536670

Pulled By: myleott

fbshipit-source-id: 8a41c98f0fb87af6c384cdade756e3eae2978a88

c446c44b

Correctly zero padding index in TransformerSentenceEncoder · 1362b21b

Myle Ott authored Jul 28, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/912

Differential Revision: D16536561

Pulled By: myleott

fbshipit-source-id: 54c5c20a826a14f4e690770e027bcb282acdf911

1362b21b

Misc dataset improvements · 8207f263

Myle Ott authored Jul 28, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/911

Differential Revision: D16536559

Pulled By: myleott

fbshipit-source-id: 7fe495054ce5b7658b1d3a43eca38c5858360236

8207f263

Make hub_utils.generator inherit from nn.Module · abc13e28

Myle Ott authored Jul 28, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/913

Differential Revision: D16536562

Pulled By: myleott

fbshipit-source-id: ce28642da6868ec884e3e416388a652977a062df

abc13e28

Fix compatibility with PyTorch 1.0.x (Fixes #906) · 5218a7c9

Myle Ott authored Jul 28, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/910

Differential Revision: D16536532

Pulled By: myleott

fbshipit-source-id: 56bb5570e70b5670ad87c64d9dd20c64c1fa9f5c

5218a7c9

27 Jul, 2019 2 commits

Add return_all_hiddens flag to hub interface · 40f16872

Myle Ott authored Jul 27, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/909

Differential Revision: D16532919

Pulled By: myleott

fbshipit-source-id: 16ce884cf3d84579026e4406a75ba3c01a128dbd

40f16872

Add RoBERTa README · 17fcc72a

Myle Ott authored Jul 26, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/778

Differential Revision: D16525447

Pulled By: myleott

fbshipit-source-id: e721e3a10e243a2408a04f89f06b5adbbe2fdff2

17fcc72a

25 Jul, 2019 2 commits

Standardize on 'teacher forcing' rather than 'input feeding' which is… (#769) · 8835d93c

Myle Ott authored Jul 25, 2019

Summary:
Input feeding generally refers to a slightly different concept.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/769

Differential Revision: D16491898

Pulled By: myleott

fbshipit-source-id: 68573584e820f11f199db4e7e37e9ee7a69a3287

8835d93c

Update torch.hub usage · 3d764a3d

Myle Ott authored Jul 25, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/770

Differential Revision: D16491911

Pulled By: myleott

fbshipit-source-id: 8dd2b76f8fa24183640ae9d1129ea47ded77d43d

3d764a3d

24 Jul, 2019 1 commit

check save_dir before beginning training · b49ea81c

Spencer Poff authored Jul 24, 2019

Summary: After 8 hours of training, I sadly discovered that my checkpoint directory wasn't globally readable. Adding this check at the beginning of the train loop to keep that from happening again!
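
A fail-fast check of this kind can be sketched as follows. This is a minimal illustration, not the code from this commit: the function name `verify_save_dir` is hypothetical, and it assumes the property worth checking is that files can be created in the directory before any training time is spent.

```python
import os
import tempfile


def verify_save_dir(save_dir):
    """Fail early if checkpoints cannot be written to save_dir (hypothetical helper)."""
    # Create the directory up front so a bad path fails now, not at the first save.
    os.makedirs(save_dir, exist_ok=True)
    try:
        # Creating (and automatically deleting) a temporary file proves we can write here.
        with tempfile.NamedTemporaryFile(dir=save_dir):
            pass
    except OSError as err:
        raise OSError(f"Unable to write checkpoints to {save_dir}") from err
```

Calling such a helper once before the first training step turns a late failure into an immediate one.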

Reviewed By: myleott

Differential Revision: D16455394

fbshipit-source-id: 35959aa058150b2afb63710c468d01ebc8a12b0c

b49ea81c

23 Jul, 2019 3 commits

Update README.md · 208295df

Myle Ott authored Jul 23, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/899

Differential Revision: D16448602

Pulled By: myleott

fbshipit-source-id: afd1a1b713274b6328150cd85d7f8a81833597aa

208295df

Initializing mask as a tensor of ints (not long) (#875) · af6b361c

Taylan Bilal authored Jul 23, 2019

Summary:
Since mask really is a tensor of ints, this change should be mathematically
equivalent to the base.

On the other hand, this has performance implications for xla, hence the
pull request.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/875

Differential Revision: D16232877

Pulled By: myleott

fbshipit-source-id: e63175ee0016dcf0dfe10e2fd22570b8bbfbde84

af6b361c

Fix read_binarized.py script · 30123e2c

Myle Ott authored Jul 22, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/762

Differential Revision: D16427266

Pulled By: myleott

fbshipit-source-id: 9bd9b8c6b4994ae98a62a37b34d03265bd365453

30123e2c

22 Jul, 2019 8 commits

Implement sparse transformer fixed attention pattern (#804) · a03fe6fa

Sara Hanson authored Jul 22, 2019

Summary:
Pull Request resolved: https://github.com/facebookresearch/pytext/pull/804

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/746

Pull Request resolved: https://github.com/pytorch/fairseq/pull/894

Adding an implementation of the sparse transformer to multi-head attention using the fixed attention pattern specified in https://arxiv.org/pdf/1904.10509.pdf. The sparse_mask masks out words using -inf; after softmax, -inf becomes 0. Thus, the mask does not need to be re-calculated and re-applied when multiplying attn_weights and values.

Four inputs are added to the config: sparse, is_bidirectional, stride, expressivity. If we are using the sparse transformer, is_bidirectional, stride, and expressivity must be specified (there are defaults). If is_bidirectional is False, the mask values are computed using the fixed attention pattern described in the paper. If is_bidirectional is True, subset one includes all values in the current stride window and a summary from every stride window; all other values are masked. Stride (L in the paper) controls the window size and expressivity (c in the paper) controls the size of the summary.
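
The masking idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code added by this commit: the helper names `fixed_sparse_mask` and `masked_softmax` are hypothetical, the "summary" of a window is assumed to be its last `expressivity` positions, and the non-bidirectional case is assumed to be causal (a position only sees itself and earlier positions).

```python
import math


def fixed_sparse_mask(seq_len, stride, expressivity, is_bidirectional=False):
    """Additive mask: 0.0 where attention is allowed, -inf where it is masked."""
    mask = [[-math.inf] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            if not is_bidirectional and j > i:
                continue  # assumed causal: no attention to future positions
            same_window = (j // stride) == (i // stride)
            is_summary = (j % stride) >= stride - expressivity
            if same_window or is_summary:
                mask[i][j] = 0.0
    return mask


def masked_softmax(scores, mask_row):
    """Softmax over scores + mask; positions masked with -inf get weight 0."""
    shifted = [s + m for s, m in zip(scores, mask_row)]
    peak = max(shifted)  # finite, because the diagonal is never masked
    exps = [math.exp(x - peak) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


mask = fixed_sparse_mask(seq_len=8, stride=4, expressivity=1)
allowed = [j for j, m in enumerate(mask[5]) if m == 0.0]
print(allowed)  # [3, 4, 5]
weights = masked_softmax([0.0] * 8, mask[5])
```

Because the masked positions already receive exactly zero weight after the softmax, multiplying the weights by the values needs no second masking step, which is the point made in the summary.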

Reviewed By: borguz

Differential Revision: D16042988

fbshipit-source-id: c59166dc7cfe89187a256e4076000c2458842fd5

a03fe6fa

Add new Masked LM task + criterion · e8d609a8

Myle Ott authored Jul 22, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/761

Differential Revision: D16421335

Pulled By: myleott

fbshipit-source-id: 257d92c2b90361147642e2baa38486b4d18f6297

e8d609a8

Add new Datasets · 654affc0

Myle Ott authored Jul 22, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/757

Differential Revision: D16418305

Pulled By: myleott

fbshipit-source-id: 25f293a2792509f7a75c688e4bf8cff02e6bba2e

654affc0

Simplify hubconf · 51ba3521

Myle Ott authored Jul 22, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/758

Differential Revision: D16418932

Pulled By: myleott

fbshipit-source-id: 59f005164b61b9fa712922eeb23525f7eec38f38

51ba3521

Fix --reset-meters · 906411da

Myle Ott authored Jul 22, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/756

Differential Revision: D16418302

Pulled By: myleott

fbshipit-source-id: 62495a0bff41d1741e2b09807a3b43ff2c66c8fb

906411da

Add fallback for SLURM config · bccfa7d0

Myle Ott authored Jul 22, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/752

Differential Revision: D16417582

Pulled By: myleott

fbshipit-source-id: 6b4289febcf9290452bb91f1f2181a02c09c82a7

bccfa7d0

Move Masked LM components to legacy/ -- new ones are coming · 47fd9852

Myle Ott authored Jul 21, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/740

Differential Revision: D16377797

Pulled By: myleott

fbshipit-source-id: f7d6c8b00a77e279ea94376b1f0fcd15087eaf5f

47fd9852

Misc improvements to torch hub interface · 9c89e882

Myle Ott authored Jul 21, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/750

Differential Revision: D16410986

Pulled By: myleott

fbshipit-source-id: 8ee6b4371d6ae5b041b00a54a6039a422345795e

9c89e882

21 Jul, 2019 4 commits

Update GPT-2 BPE · 62b5498b

Myle Ott authored Jul 21, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/749

Differential Revision: D16410984

Pulled By: myleott

fbshipit-source-id: 7698df46b8a179afccb287990f9705358690454a

62b5498b

Default to mmap and infer dataset implementations automatically · 5f78106a

Myle Ott authored Jul 21, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/751

Differential Revision: D16410989

Pulled By: myleott

fbshipit-source-id: ddbbee49756f9ff6c4487977a3f5d2259b7abafe

5f78106a

Fix topp sampling issues (#882) · 1f96d284

Liang Wang authored Jul 21, 2019

Summary:
Two issues here:

1. `last_included` should be the last included index, `cumsum_mask[:, :, -1:]`, instead of `cumsum_mask[:, :, :1]` (which is either 0 or 1);

2. If `--no-repeat-ngram-size` is set, the sum of `probs` may be less than 1, so we need to re-normalize to make it a valid probability distribution.

The following code can reproduce the issue:

```
import torch
import numpy as np

def _sample_topp(probs):

    # =====  Code from  fairseq/search.py _sample_topp ======

    # sort the last dimension (vocab dimension) in descending order
    sorted_probs, sorted_indices = probs.sort(descending=True)

    # compute a mask to indicate the words to be included in the top-P set.
    cumsum_probs = sorted_probs.cumsum(dim=2)
    mask = cumsum_probs.lt(sampling_topp)

    # note that mask was computed by 'lt'. One more word needs to be included
    # so that the cumulative probability mass can exceed p.
    cumsum_mask = mask.cumsum(dim=2)
    last_included = cumsum_mask[:, :, :1]
    mask = mask.scatter_(2, last_included, 1)

    # truncate unnecessary dims.
    max_dim = last_included.max()
    truncated_mask = mask[:, :, :max_dim + 1]
    truncated_probs = sorted_probs[:, :, :max_dim + 1]
    truncated_indices = sorted_indices[:, :, :max_dim + 1]

    # trim the words that are not in top-P by setting their probabilities
    # to 0, so that they would not be sampled later.
    trim_mask = 1 - truncated_mask
    trimed_probs = truncated_probs.masked_fill_(trim_mask, 0)
    return trimed_probs, truncated_indices

    # ========================================================

if __name__ == '__main__':
    np.random.seed(1234)
    torch.manual_seed(1234)

    sampling_topp = 0.9
    probs = torch.softmax(torch.randn(1, 1, 10), dim=-1)
    # probs = tensor([0.0545, 0.0779, 0.0189, 0.0647, 0.0282, 0.0862, 0.0656, 0.1041, 0.0399, 0.4600])
    print('probs =', probs[0][0])

    trimed_probs, truncated_indices = _sample_topp(probs)

    cum_probs = trimed_probs.cumsum(dim=-1)[0][0]
    # cumsum = tensor([0.4600, 0.5641])
    print('cumsum =', cum_probs)
    # Will throw AssertionError
    assert float(cum_probs[-1]) >= sampling_topp

```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/882
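
The effect of both points can be traced without PyTorch on the probabilities printed by the reproduction script. The sketch below is an illustration only, not the patched fairseq code: the helper names `top_p_keep_count` and `renormalize` are hypothetical, and the clamp to the last valid index is an added precaution assumed here for the case where every cumulative sum stays below p.

```python
from itertools import accumulate


def top_p_keep_count(sorted_probs, p, use_fix=True):
    """Number of highest-probability entries kept by the top-p truncation."""
    cumsum_probs = list(accumulate(sorted_probs))
    mask = [c < p for c in cumsum_probs]  # strictly below p, as with 'lt'
    cumsum_mask = list(accumulate(int(m) for m in mask))
    # Buggy variant reads the first element (0 or 1); the fix reads the last one.
    last_included = cumsum_mask[-1] if use_fix else cumsum_mask[0]
    # Assumption: clamp so the index stays valid when all entries are below p.
    last_included = min(last_included, len(sorted_probs) - 1)
    return last_included + 1


def renormalize(probs):
    """Rescale so the kept probabilities sum to 1 again."""
    total = sum(probs)
    return [x / total for x in probs]


probs = [0.0545, 0.0779, 0.0189, 0.0647, 0.0282,
         0.0862, 0.0656, 0.1041, 0.0399, 0.4600]
sorted_probs = sorted(probs, reverse=True)
buggy = top_p_keep_count(sorted_probs, 0.9, use_fix=False)
fixed = top_p_keep_count(sorted_probs, 0.9, use_fix=True)
print(buggy, fixed)  # 2 7
```

With the first-element slice only two entries survive (cumulative mass about 0.5641, as in the script's output); with the last-element slice seven survive and their mass exceeds 0.9.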

Differential Revision: D16409269

Pulled By: xingz9

fbshipit-source-id: 94b1122eed50c656057b64e22af6f4a6ea7a68af

1f96d284

Rename data.transforms -> data.encoders · f812e529

Myle Ott authored Jul 21, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/747

Differential Revision: D16403464

Pulled By: myleott

fbshipit-source-id: ee3b4184f129a02be833c7bdc00685978b4de883

f812e529

19 Jul, 2019 7 commits

Rename _load_model_ensemble -> load_model_ensemble_and_task · 69d0f7f8

Myle Ott authored Jul 19, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/738

Differential Revision: D16377803

Pulled By: myleott

fbshipit-source-id: 6beb2f78e7464b70ff65a965d2b747cdca0ca951

69d0f7f8

Allow not specifying --warmup-init-lr · 7efde226

Myle Ott authored Jul 19, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/736

Differential Revision: D16378001

Pulled By: myleott

fbshipit-source-id: 2907f63bcbf7068ceaa48b00096040fa2639e569

7efde226

Create standalone label_smoothed_nll_loss · ffe53d6f

Myle Ott authored Jul 19, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/739

Differential Revision: D16377798

Pulled By: myleott

fbshipit-source-id: 20047c80de2e6f108269ace4ae3eec906a5920dd

ffe53d6f

Store task in the criterion base class · c811e0e0

Myle Ott authored Jul 19, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/737

Differential Revision: D16377805

Pulled By: myleott

fbshipit-source-id: 1e090a02ff4fbba8695173f57d3cc5b88ae98bbf

c811e0e0

Improve interactive generation (support --tokenizer and --bpe) · 8af55542

Myle Ott authored Jul 19, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/734

Differential Revision: D16377044

Pulled By: myleott

fbshipit-source-id: 37d5553d76aa7c653113fec089f59710281c31d7

8af55542

Switch to torch.nn.functional.gelu when available · be5821b8

Myle Ott authored Jul 19, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/735

Differential Revision: D16377046

Pulled By: myleott

fbshipit-source-id: 9725d4a3ce6b2fc8cee0b1d1cb8921f9d59c551a

be5821b8

v0.7.1 -> v0.7.2 (#891) · b002d009

Myle Ott authored Jul 19, 2019

Summary:
No major API changes since the last release. Cutting a new release since we'll be merging significant (possibly breaking) changes to logging, data loading and the masked LM implementation soon.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/891

Differential Revision: D16377132

Pulled By: myleott

fbshipit-source-id: f1cb88e671ccd510e53334d0f449fe18585268c7

b002d009

17 Jul, 2019 4 commits

Support Latent Variable Model in base training (#879) · 1f5b414f

Ning Dong authored Jul 17, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/879

Pull Request resolved: https://github.com/pytorch/translate/pull/598

Details in https://fb.workplace.com/notes/ning-dong/closing-research-to-production-gap-a-story-of-latent-variable-model-migration/443418839813586/

Reviewed By: xianxl

Differential Revision: D15742439

fbshipit-source-id: 168c84bd30a5da3c2fb404fcca74266deef1f964

1f5b414f

Nucleus (top-P) sampling (#710) · e46b924d

Xing Zhou authored Jul 17, 2019

Summary:
Implement Nucleus (top-P) sampling: sample among the smallest set of elements whose cumulative probability mass exceeds p.
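
For a single distribution, the definition above can be sketched in a few lines of standard-library Python. This is an illustration of the technique only, not the batched tensor implementation in this commit; the function name `nucleus_sample` is hypothetical.

```python
import random


def nucleus_sample(probs, p, rng=random):
    """Sample an index from the smallest set whose cumulative mass exceeds p.

    Returns (sampled_index, nucleus), where nucleus lists the candidate indices.
    """
    # Visit indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass > p:
            break  # smallest prefix whose mass exceeds p
    # Renormalize inside the nucleus before sampling.
    weights = [probs[i] / mass for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0], nucleus


probs = [0.1, 0.5, 0.15, 0.25]
_, small = nucleus_sample(probs, 0.3)
_, larger = nucleus_sample(probs, 0.7)
print(small, larger)  # [1] [1, 3]
```

A small p (such as the 0.3 used in the test command) keeps only the most likely tokens, so sampling stays close to greedy decoding; larger values admit more of the tail.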

To test it:
python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/710

Test Plan:
python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3

python tests/test_sequence_generator.py

python tests/test_binaries.py

Reviewed By: myleott

Differential Revision: D16286688

Pulled By: xingz9

fbshipit-source-id: 1776d21e17c4532a3d24ac75bb7e75da9acad58f

e46b924d

Small bug fix for generation when batch_size is small · 473389a3

Jiajun Shen authored Jul 17, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/727

Differential Revision: D16332742

Pulled By: myleott

fbshipit-source-id: becedd573c2c071fd21fcb5e55fead554c9bd9d1

473389a3

Add suppress_defaults functionality to options parser (#723) · 61e328cc

Myle Ott authored Jul 17, 2019

Summary:
This is useful for standalone scripts that want to load a model and inherit most of the args from the model (e.g., eval_lm.py).
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/723
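
The general idea can be sketched with the standard `argparse` module, which supports suppressing defaults through `argument_default=argparse.SUPPRESS`. This is a sketch under assumptions rather than fairseq's options parser: the function `parse_overrides`, the two flags, and the `saved_args` dictionary standing in for arguments stored with a model are all hypothetical.

```python
import argparse


def parse_overrides(argv, saved_args):
    """Merge explicitly passed flags over arguments saved with a model."""
    # With SUPPRESS, flags that were not given on the command line are left
    # out of the namespace instead of appearing with a default value.
    parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
    parser.add_argument("--beam", type=int)
    parser.add_argument("--max-tokens", type=int)
    explicit = vars(parser.parse_args(argv))

    merged = dict(saved_args)
    merged.update(explicit)  # only user-specified values override the saved ones
    return merged


saved = {"beam": 5, "max_tokens": 4000}
print(parse_overrides(["--beam", "10"], saved))
# {'beam': 10, 'max_tokens': 4000}
```

Without suppression, an unspecified `--max-tokens` would show up as a default and silently overwrite the value inherited from the model.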

Differential Revision: D16255751

Pulled By: myleott

fbshipit-source-id: 562b61511d5d7113e805c9644c877ebb8a3a1889

61e328cc