- 23 Jul, 2019 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/899 Differential Revision: D16448602 Pulled By: myleott fbshipit-source-id: afd1a1b713274b6328150cd85d7f8a81833597aa
-
Taylan Bilal authored
Summary: Since mask really is a tensor of ints, this change should be mathematically equivalent to the base. On the other hand, this has performance implications for xla, hence the pull request. Pull Request resolved: https://github.com/pytorch/fairseq/pull/875 Differential Revision: D16232877 Pulled By: myleott fbshipit-source-id: e63175ee0016dcf0dfe10e2fd22570b8bbfbde84
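As a gloss on the claim above, a two-line sketch of the equivalence (illustrative only; the actual diff isn't shown in the summary): selecting with a boolean mask and reducing gives the same result as multiplying by the same mask treated as ints, and the latter keeps tensor shapes static, which xla prefers.
```
import torch

x = torch.randn(5)
mask = torch.tensor([1, 0, 1, 1, 0])  # an int mask, as the summary notes

# Boolean selection (data-dependent shape) vs. multiply-and-sum (static shape):
assert torch.isclose(x[mask.bool()].sum(), (x * mask).sum())
```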
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/762 Differential Revision: D16427266 Pulled By: myleott fbshipit-source-id: 9bd9b8c6b4994ae98a62a37b34d03265bd365453
-
- 22 Jul, 2019 8 commits
-
-
Sara Hanson authored
Summary: Pull Request resolved: https://github.com/facebookresearch/pytext/pull/804 Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/746 Pull Request resolved: https://github.com/pytorch/fairseq/pull/894 Adds an implementation of the sparse transformer to multi-head attention, using the fixed attention pattern specified in https://arxiv.org/pdf/1904.10509.pdf. The sparse_mask masks out words using -inf; after softmax, -inf becomes 0. Thus, the mask does not need to be re-calculated and re-applied when multiplying attn_weights and values. Four inputs are added to the config: sparse, is_bidirectional, stride, and expressivity. When the sparse transformer is used, is_bidirectional, stride, and expressivity take effect (defaults are provided). If is_bidirectional is False, the mask follows the fixed attention pattern described in the paper. If is_bidirectional is True, subset one includes all values in the current stride window and a summary from every stride window; all other values are masked. Stride (L in the paper) controls the window size and expressivity (c in the paper) controls the size of the summary. Reviewed By: borguz Differential Revision: D16042988 fbshipit-source-id: c59166dc7cfe89187a256e4076000c2458842fd5
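To make the pattern concrete, here is a rough sketch (not the fairseq code) of the unidirectional fixed mask from the paper, using -inf as described above; the helper name and loop structure are illustrative assumptions.
```
import torch

def fixed_attention_mask(seq_len, stride, expressivity):
    """Hypothetical helper: causal fixed-pattern mask (Child et al., 2019)."""
    mask = torch.full((seq_len, seq_len), float('-inf'))
    for i in range(seq_len):
        for j in range(i + 1):  # unidirectional: only past positions
            same_window = (j // stride) == (i // stride)
            # summary positions: the last `expressivity` slots of each window
            is_summary = (j % stride) >= (stride - expressivity)
            if same_window or is_summary:
                mask[i, j] = 0.0  # kept; -inf entries vanish after softmax
    return mask  # added to attn_weights before softmax

print(fixed_attention_mask(8, stride=4, expressivity=1))
```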
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/761 Differential Revision: D16421335 Pulled By: myleott fbshipit-source-id: 257d92c2b90361147642e2baa38486b4d18f6297
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/757 Differential Revision: D16418305 Pulled By: myleott fbshipit-source-id: 25f293a2792509f7a75c688e4bf8cff02e6bba2e
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/758 Differential Revision: D16418932 Pulled By: myleott fbshipit-source-id: 59f005164b61b9fa712922eeb23525f7eec38f38
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/756 Differential Revision: D16418302 Pulled By: myleott fbshipit-source-id: 62495a0bff41d1741e2b09807a3b43ff2c66c8fb
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/752 Differential Revision: D16417582 Pulled By: myleott fbshipit-source-id: 6b4289febcf9290452bb91f1f2181a02c09c82a7
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/740 Differential Revision: D16377797 Pulled By: myleott fbshipit-source-id: f7d6c8b00a77e279ea94376b1f0fcd15087eaf5f
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/750 Differential Revision: D16410986 Pulled By: myleott fbshipit-source-id: 8ee6b4371d6ae5b041b00a54a6039a422345795e
-
- 21 Jul, 2019 4 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/749 Differential Revision: D16410984 Pulled By: myleott fbshipit-source-id: 7698df46b8a179afccb287990f9705358690454a
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/751 Differential Revision: D16410989 Pulled By: myleott fbshipit-source-id: ddbbee49756f9ff6c4487977a3f5d2259b7abafe
-
Liang Wang authored
Summary: Two issues here:
1. `last_included` should be the last included index `cumsum_mask[:, :, -1:]`, not `cumsum_mask[:, :, :1]` (which is either 0 or 1);
2. if `--no-repeat-ngram-size` is set, the sum of `probs` may be less than 1, so we need to re-normalize to make it a valid probability distribution.

The following code reproduces the issue:
```
import torch
import numpy as np


def _sample_topp(probs):
    # ===== Code from fairseq/search.py _sample_topp ======
    # sort the last dimension (vocab dimension) in descending order
    sorted_probs, sorted_indices = probs.sort(descending=True)

    # compute a mask to indicate the words to be included in the top-P set.
    cumsum_probs = sorted_probs.cumsum(dim=2)
    mask = cumsum_probs.lt(sampling_topp)

    # note that mask was computed by 'lt'. One more word needs to be included
    # so that the cumulative probability mass can exceed p.
    cumsum_mask = mask.cumsum(dim=2)
    last_included = cumsum_mask[:, :, :1]
    mask = mask.scatter_(2, last_included, 1)

    # truncate unnecessary dims.
    max_dim = last_included.max()
    truncated_mask = mask[:, :, :max_dim + 1]
    truncated_probs = sorted_probs[:, :, :max_dim + 1]
    truncated_indices = sorted_indices[:, :, :max_dim + 1]

    # trim the words that are not in top-P by setting their probabilities
    # to 0, so that they would not be sampled later.
    trim_mask = 1 - truncated_mask
    trimed_probs = truncated_probs.masked_fill_(trim_mask, 0)
    return trimed_probs, truncated_indices
    # ========================================================


if __name__ == '__main__':
    np.random.seed(1234)
    torch.manual_seed(1234)
    sampling_topp = 0.9

    probs = torch.softmax(torch.randn(1, 1, 10), dim=-1)
    # probs = tensor([0.0545, 0.0779, 0.0189, 0.0647, 0.0282, 0.0862, 0.0656, 0.1041, 0.0399, 0.4600])
    print('probs =', probs[0][0])

    trimed_probs, truncated_indices = _sample_topp(probs)
    cum_probs = trimed_probs.cumsum(dim=-1)[0][0]
    # cumsum = tensor([0.4600, 0.5641])
    print('cumsum =', cum_probs)

    # Will throw AssertionError
    assert float(cum_probs[-1]) >= sampling_topp
```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/882 Differential Revision: D16409269 Pulled By: xingz9 fbshipit-source-id: 94b1122eed50c656057b64e22af6f4a6ea7a68af
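A minimal sketch of the corrected routine, assuming the fix follows the two points above (this mirrors the reproduction code, with the two changes marked; it is not necessarily the exact patch):
```
import torch

def _sample_topp_fixed(probs, sampling_topp=0.9):
    sorted_probs, sorted_indices = probs.sort(descending=True)
    cumsum_probs = sorted_probs.cumsum(dim=2)
    mask = cumsum_probs.lt(sampling_topp)
    cumsum_mask = mask.cumsum(dim=2)

    # Fix 1: the last included index is the final cumulative count, not the first.
    last_included = cumsum_mask[:, :, -1:]
    mask = mask.scatter_(2, last_included, 1)

    max_dim = last_included.max()
    truncated_mask = mask[:, :, :max_dim + 1]
    truncated_probs = sorted_probs[:, :, :max_dim + 1]
    truncated_indices = sorted_indices[:, :, :max_dim + 1]
    trimmed_probs = truncated_probs.masked_fill_(~truncated_mask, 0)

    # Fix 2: re-normalize so the result is a valid distribution even when
    # upstream filtering (e.g. --no-repeat-ngram-size) removed some mass.
    trimmed_probs = trimmed_probs / trimmed_probs.sum(dim=-1, keepdim=True)
    return trimmed_probs, truncated_indices

probs = torch.softmax(torch.randn(1, 1, 10), dim=-1)
trimmed_probs, _ = _sample_topp_fixed(probs)
assert torch.isclose(trimmed_probs.sum(dim=-1), torch.tensor(1.0)).all()
```
-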
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/747 Differential Revision: D16403464 Pulled By: myleott fbshipit-source-id: ee3b4184f129a02be833c7bdc00685978b4de883
-
- 19 Jul, 2019 7 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/738 Differential Revision: D16377803 Pulled By: myleott fbshipit-source-id: 6beb2f78e7464b70ff65a965d2b747cdca0ca951
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/736 Differential Revision: D16378001 Pulled By: myleott fbshipit-source-id: 2907f63bcbf7068ceaa48b00096040fa2639e569
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/739 Differential Revision: D16377798 Pulled By: myleott fbshipit-source-id: 20047c80de2e6f108269ace4ae3eec906a5920dd
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/737 Differential Revision: D16377805 Pulled By: myleott fbshipit-source-id: 1e090a02ff4fbba8695173f57d3cc5b88ae98bbf
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/734 Differential Revision: D16377044 Pulled By: myleott fbshipit-source-id: 37d5553d76aa7c653113fec089f59710281c31d7
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/735 Differential Revision: D16377046 Pulled By: myleott fbshipit-source-id: 9725d4a3ce6b2fc8cee0b1d1cb8921f9d59c551a
-
Myle Ott authored
Summary: No major API changes since the last release. Cutting a new release since we'll be merging significant (possibly breaking) changes to logging, data loading and the masked LM implementation soon. Pull Request resolved: https://github.com/pytorch/fairseq/pull/891 Differential Revision: D16377132 Pulled By: myleott fbshipit-source-id: f1cb88e671ccd510e53334d0f449fe18585268c7
-
- 17 Jul, 2019 5 commits
-
-
Ning Dong authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/879 Pull Request resolved: https://github.com/pytorch/translate/pull/598 Details in https://fb.workplace.com/notes/ning-dong/closing-research-to-production-gap-a-story-of-latent-variable-model-migration/443418839813586/ Reviewed By: xianxl Differential Revision: D15742439 fbshipit-source-id: 168c84bd30a5da3c2fb404fcca74266deef1f964
-
Xing Zhou authored
Summary: Implement Nucleus (top-P) sampling: sample among the smallest set of elements whose cumulative probability mass exceeds p. To test it:
```
python generate.py ~myleott/data/data-bin/wmt17_zh_en_full/ --path ~myleott/zh_en/model.pt --remove-bpe --nbest 5 --beam 5 --sampling --sampling-topp 0.3
```
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/710 Test Plan:
```
python generate.py ~myleott/data/data-bin/wmt17_zh_en_full/ --path ~myleott/zh_en/model.pt --remove-bpe --nbest 5 --beam 5 --sampling --sampling-topp 0.3
python tests/test_sequence_generator.py
python tests/test_binaries.py
```
Reviewed By: myleott Differential Revision: D16286688 Pulled By: xingz9 fbshipit-source-id: 1776d21e17c4532a3d24ac75bb7e75da9acad58f
-
Jiajun Shen authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/727 Differential Revision: D16332742 Pulled By: myleott fbshipit-source-id: becedd573c2c071fd21fcb5e55fead554c9bd9d1
-
Myle Ott authored
Summary: This is useful for standalone scripts that want to load a model and inherit most of the args from the model (e.g., eval_lm.py). Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/723 Differential Revision: D16255751 Pulled By: myleott fbshipit-source-id: 562b61511d5d7113e805c9644c877ebb8a3a1889
-
Taylan Bilal authored
Summary: Applying non_pad_mask results in dynamic shapes, which is bad for TPUs. This is an equivalent loss computation (tested), but the tensor shapes are constant (in the case of reduce=True). Pull Request resolved: https://github.com/pytorch/fairseq/pull/876 Differential Revision: D16241621 Pulled By: myleott fbshipit-source-id: 973254b7e0842f2b55817afd66b2a110a566f149
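A minimal sketch of the idea, assuming the description above (the function name and shapes are illustrative, not the fairseq code): compute the loss over all positions and zero out padding by masking, so tensor shapes never depend on how much padding a batch contains.
```
import torch
import torch.nn.functional as F

def nll_loss_static_shapes(lprobs, target, padding_idx):
    # lprobs: (n_tokens, vocab); target: (n_tokens,)
    nll = -lprobs.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)
    # Zero out padding instead of indexing with non_pad_mask, so the
    # intermediate tensors keep a constant shape (good for XLA/TPU).
    nll = nll.masked_fill(target.eq(padding_idx), 0.0)
    return nll.sum()

lprobs = F.log_softmax(torch.randn(6, 10), dim=-1)
target = torch.tensor([3, 1, 4, 0, 0, 0])  # 0 is the padding index here
print(nll_loss_static_shapes(lprobs, target, padding_idx=0))
```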
-
- 14 Jul, 2019 1 commit
-
-
Taylan Bilal authored
Summary: Tensor resizing doesn't work well with TPUs; this change is equivalent to the base and works better with TPUs. Pull Request resolved: https://github.com/pytorch/fairseq/pull/877 Differential Revision: D16241620 Pulled By: myleott fbshipit-source-id: 402c7d5eb6175a66a0420d10e74eb0a9e085790e
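The summary doesn't show the change itself; as a generic, hedged illustration of why in-place resizing is a problem: `resize_` produces buffers whose shape depends on runtime state, whereas pre-allocating the maximum size once keeps shapes constant, which XLA compilation prefers.
```
import torch

max_len = 16

# Shape changes at runtime (problematic for XLA/TPU):
buf = torch.zeros(4)
buf.resize_(8)  # buffer shape now depends on control flow

# Constant-shape alternative: allocate the maximum size once up front
# and write into it, so compiled graphs see a single static shape.
fixed = torch.zeros(max_len)
fixed[:8] = 1.0
```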
-
- 11 Jul, 2019 2 commits
-
-
Taylan Bilal authored
Summary: self._optimizer has __getstate__. We need this so that fairseq optimizers work with pytorch/xla:
```
% find . | xargs grep -s -i __getstate__
./third_party/tensorflow/tensorflow/python/util/deprecation_wrapper.py:  def __getstate__(self):
./torch_xla_py/xla_model.py:    for param_group in optimizer.__getstate__()['param_groups']:
```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/872 Differential Revision: D16211062 Pulled By: alexeib fbshipit-source-id: 1b5575c85d34b7b021d719a03fd58d1c2ee453ee
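A minimal sketch of the delegation this implies, assuming the wrapper holds the real optimizer in self._optimizer (the class here is a stand-in, not fairseq's actual FairseqOptimizer):
```
import torch

class WrappedOptimizer:
    """Stand-in for a fairseq-style optimizer wrapper."""
    def __init__(self, params, lr=0.1):
        self._optimizer = torch.optim.SGD(params, lr=lr)

    def __getstate__(self):
        # Delegate so callers like xla_model.py, which reads
        # optimizer.__getstate__()['param_groups'], see the real state.
        return self._optimizer.__getstate__()

opt = WrappedOptimizer([torch.nn.Parameter(torch.zeros(2))])
print(opt.__getstate__()['param_groups'][0]['lr'])  # 0.1
```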
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/711 Differential Revision: D16192752 Pulled By: myleott fbshipit-source-id: 102ed337a3d31e2047be7c033e9007c04223a684
-
- 10 Jul, 2019 1 commit
-
-
Taylan Bilal authored
Summary: We need this so that `progress_bar`s work with pytorch/xla, i.e. TPUs. See [here](https://github.com/pytorch/xla/blob/master/torch_xla_py/data_parallel.py#L130). Pull Request resolved: https://github.com/pytorch/fairseq/pull/871 Differential Revision: D16181062 Pulled By: myleott fbshipit-source-id: 02c65033260396c2a243fbb66e31ffc2965f2376
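The summary doesn't say which hook was missing; a plausible minimal sketch, assuming the issue was that the wrapping progress bar needed to forward len() to its underlying iterable (the linked data_parallel.py code takes the length of the loader it is handed):
```
class ProgressBarWrapper:
    """Hypothetical stand-in for fairseq's progress_bar wrappers."""
    def __init__(self, iterable):
        self.iterable = iterable

    def __iter__(self):
        for item in self.iterable:
            yield item

    def __len__(self):
        # Forward len() so consumers (e.g. torch_xla's data_parallel)
        # can size their per-device loops.
        return len(self.iterable)

print(len(ProgressBarWrapper(range(10))))  # 10
```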
-
- 09 Jul, 2019 1 commit
-
-
Peng-Jen Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/592 Fix the bug reported at https://github.com/pytorch/fairseq/commit/9c3bb5c6d6c7d6442a28ccb8a81b2fc4e5782ace#r34181600: D15682169 broke multilingual translation generation. Reviewed By: dpacgopinath Differential Revision: D16147454 fbshipit-source-id: e0cf4d32f362190a0542fa0160f65a2a207ca3fa
-
- 08 Jul, 2019 1 commit
-
-
Guanheng Zhang authored
Summary: Integrate torch.nn and fairseq MultiheadAttention modules. In the future, both libraries will benefit from performance optimization together. Under the following circumstances, the calculation of MultiheadAttention will still remain in fairseq:
1. onnx trace
2. incremental state
3. static kv

We plan to gradually migrate those capabilities to PyTorch's core library. Fairseq users can use the attribute self.enable_torch_version to force the calculations in either torch or fairseq. We use the following script to ensure both versions yield the same results:
```
import torch
from fairseq.modules import MultiheadAttention
import time

embed_dim = 64
kv_embed_dim = 1208
num_heads = 16
src_len = 20
tgt_len = 30
bsz = 10

model = MultiheadAttention(embed_dim, num_heads, kdim=kv_embed_dim, vdim=kv_embed_dim,
                           bias=True, add_bias_kv=True, add_zero_attn=True)

query = torch.rand((src_len, bsz, embed_dim))
key = torch.rand((src_len, bsz, kv_embed_dim))
value = torch.rand((src_len, bsz, kv_embed_dim))

attn_mask = torch.randint(0, 2, (src_len, src_len)).float()
attn_mask.masked_fill_(attn_mask == 0, float('-inf'))
attn_mask.masked_fill_(attn_mask > 0, float('0.0'))

seq_mask = torch.randint(0, 2, (1, src_len))
key_padding_mask = seq_mask
for i in range(bsz - 1):
    key_padding_mask = torch.cat([key_padding_mask, seq_mask], dim=0)
key_padding_mask = key_padding_mask == 1

# Apply torch.nn version
model.enable_torch_version = True
torch_output, torch_weight = model(query, key, value,
                                   key_padding_mask=key_padding_mask,
                                   attn_mask=attn_mask)

# Apply fairseq version
model.enable_torch_version = False
fairseq_output, fairseq_weight = model(query, key, value,
                                       key_padding_mask=key_padding_mask,
                                       attn_mask=attn_mask)

print("torch and fairseq generate same results: outputs are same ? ",
      torch.allclose(torch_output, fairseq_output, atol=5e-6, rtol=1e-6),
      ", weights are same ? ",
      torch.allclose(torch_weight, fairseq_weight, atol=5e-6, rtol=1e-6))
```
Expected results: torch and fairseq generate same results: outputs are same ? True , weights are same ? True

Similar performance is expected for both versions. Using the following setup, the initial performance benchmark results are:
```
embed_dim = 32
kv_embed_dim = 32
num_heads = 4
src_len = 3
tgt_len = 2
bsz = 4
num_samples = 50000
```
torch-version MultiheadAttention cpu time: 0.46589 ms per iteration.
fairseq-version MultiheadAttention cpu time: 0.47861 ms per iteration.
torch-version MultiheadAttention gpu time: 0.82330 ms per iteration.
fairseq-version MultiheadAttention gpu time: 0.79410 ms per iteration.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/772 Reviewed By: myleott Differential Revision: D16108450 Pulled By: zhangguanheng66 fbshipit-source-id: cd2eb5a6eeeab6c274999b7928c2af14fc211565
-
- 06 Jul, 2019 2 commits
-
-
vineetk1 authored
Added a comment to inform coders that pack_padded_sequence requires that padding must be on the right (#860) Summary: The PyTorch documentation for pack_padded_sequence has no information about the requirement that padding must be on the right. Therefore, this information is added as a comment on line 212 of https://github.com/vineetk1/fairseq/blob/master/fairseq/models/lstm.py. Pull Request resolved: https://github.com/pytorch/fairseq/pull/860 Differential Revision: D16142102 Pulled By: myleott fbshipit-source-id: 7cb6d4df64b17b54b223de03bd966ca16077c3fe
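For illustration, a small self-contained example of the requirement being documented: inputs to pack_padded_sequence must be right-padded (real timesteps first, padding at the end of each row).
```
import torch
from torch.nn.utils.rnn import pack_padded_sequence

batch = torch.randn(2, 5, 8)    # (batch, time, features), batch_first layout
lengths = torch.tensor([5, 3])  # true lengths, sorted in decreasing order
batch[1, 3:] = 0.0              # padding sits on the right of each sequence

packed = pack_padded_sequence(batch, lengths, batch_first=True)
print(packed.data.shape)        # (5 + 3, 8): only real timesteps remain
```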
-
Louis MARTIN authored
Summary: Fairseq wouldn't install on macOS. A workaround was found here: https://github.com/pytorch/fairseq/issues/289. This is now automatic in setup.py, though maybe there's a cleaner way to do it. I checked that it compiles fine on Linux and macOS. Pull Request resolved: https://github.com/pytorch/fairseq/pull/862 Differential Revision: D16142105 Pulled By: myleott fbshipit-source-id: 998ac7781d7a1ac047f4f9239c1fe16eab4be0dd
-
- 04 Jul, 2019 1 commit
-
-
Spencer Poff authored
Summary: For tasks that involve streaming data directly from an API, we need a simpler epoch iterator. Also included in this change is support for initializing a dictionary with an arbitrary list of special symbols. Reviewed By: myleott Differential Revision: D16110603 fbshipit-source-id: be6d9f680292dec1512614871f9269c95ac84861
-
- 02 Jul, 2019 1 commit
-
-
Xutai Ma authored
Summary: Add the max-token-valid option. Sometimes a separate max batch token count for validation is helpful, for example when the validation set contains a sequence longer than max_tokens (rare in MT, but it could happen in ASR or AST). Reviewed By: myleott Differential Revision: D16076951 fbshipit-source-id: ae7f4218594580b9450a8196d7afa1e7e2018aee
-
- 01 Jul, 2019 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/847 Differential Revision: D16075498 Pulled By: myleott fbshipit-source-id: 62e27a8c4764f53f181c502674dfab1e6b0537e2
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/703 Differential Revision: D16072305 Pulled By: myleott fbshipit-source-id: b77019bdcfbfb95f2817a29a74515bc8f5b682bf
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/696 Differential Revision: D16068394 Pulled By: myleott fbshipit-source-id: 92b44470ab8aeb9f99838cf74e34176104eb2b87
-