- 23 Jul, 2019 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/899 Differential Revision: D16448602 Pulled By: myleott fbshipit-source-id: afd1a1b713274b6328150cd85d7f8a81833597aa
-
Taylan Bilal authored
Summary: Since mask really is a tensor of ints, this change should be mathematically equivalent to the base. On the other hand, this has performance implications for xla, hence the pull request. Pull Request resolved: https://github.com/pytorch/fairseq/pull/875 Differential Revision: D16232877 Pulled By: myleott fbshipit-source-id: e63175ee0016dcf0dfe10e2fd22570b8bbfbde84
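As a gloss on the claim above, a two-line sketch of the equivalence (illustrative only; the actual diff isn't shown in the summary): selecting with a boolean mask and reducing gives the same result as multiplying by the same mask treated as ints, and the latter keeps tensor shapes static, which xla prefers.
```
import torch

x = torch.randn(5)
mask = torch.tensor([1, 0, 1, 1, 0])  # an int mask, as the summary notes

# Boolean selection (data-dependent shape) vs. multiply-and-sum (static shape):
assert torch.isclose(x[mask.bool()].sum(), (x * mask).sum())
```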
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/762 Differential Revision: D16427266 Pulled By: myleott fbshipit-source-id: 9bd9b8c6b4994ae98a62a37b34d03265bd365453
-
- 22 Jul, 2019 8 commits
-
-
Sara Hanson authored
Summary: Pull Request resolved: https://github.com/facebookresearch/pytext/pull/804 Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/746 Pull Request resolved: https://github.com/pytorch/fairseq/pull/894 Adds an implementation of the sparse transformer to multi-head attention, using the fixed attention pattern specified in https://arxiv.org/pdf/1904.10509.pdf. The sparse_mask masks out words using -inf; after softmax, -inf becomes 0. Thus, the mask does not need to be re-calculated and re-applied when multiplying attn_weights and values. Four inputs are added to the config: sparse, is_bidirectional, stride, and expressivity. When the sparse transformer is used, is_bidirectional, stride, and expressivity take effect (defaults are provided). If is_bidirectional is False, the mask follows the fixed attention pattern described in the paper. If is_bidirectional is True, subset one includes all values in the current stride window and a summary from every stride window; all other values are masked. Stride (L in the paper) controls the window size and expressivity (c in the paper) controls the size of the summary. Reviewed By: borguz Differential Revision: D16042988 fbshipit-source-id: c59166dc7cfe89187a256e4076000c2458842fd5
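To make the pattern concrete, here is a rough sketch (not the fairseq code) of the unidirectional fixed mask from the paper, using -inf as described above; the helper name and loop structure are illustrative assumptions.
```
import torch

def fixed_attention_mask(seq_len, stride, expressivity):
    """Hypothetical helper: causal fixed-pattern mask (Child et al., 2019)."""
    mask = torch.full((seq_len, seq_len), float('-inf'))
    for i in range(seq_len):
        for j in range(i + 1):  # unidirectional: only past positions
            same_window = (j // stride) == (i // stride)
            # summary positions: the last `expressivity` slots of each window
            is_summary = (j % stride) >= (stride - expressivity)
            if same_window or is_summary:
                mask[i, j] = 0.0  # kept; -inf entries vanish after softmax
    return mask  # added to attn_weights before softmax

print(fixed_attention_mask(8, stride=4, expressivity=1))
```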
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/761 Differential Revision: D16421335 Pulled By: myleott fbshipit-source-id: 257d92c2b90361147642e2baa38486b4d18f6297
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/757 Differential Revision: D16418305 Pulled By: myleott fbshipit-source-id: 25f293a2792509f7a75c688e4bf8cff02e6bba2e
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/758 Differential Revision: D16418932 Pulled By: myleott fbshipit-source-id: 59f005164b61b9fa712922eeb23525f7eec38f38
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/756 Differential Revision: D16418302 Pulled By: myleott fbshipit-source-id: 62495a0bff41d1741e2b09807a3b43ff2c66c8fb
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/752 Differential Revision: D16417582 Pulled By: myleott fbshipit-source-id: 6b4289febcf9290452bb91f1f2181a02c09c82a7
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/740 Differential Revision: D16377797 Pulled By: myleott fbshipit-source-id: f7d6c8b00a77e279ea94376b1f0fcd15087eaf5f
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/750 Differential Revision: D16410986 Pulled By: myleott fbshipit-source-id: 8ee6b4371d6ae5b041b00a54a6039a422345795e
-
- 21 Jul, 2019 4 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/749 Differential Revision: D16410984 Pulled By: myleott fbshipit-source-id: 7698df46b8a179afccb287990f9705358690454a
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/751 Differential Revision: D16410989 Pulled By: myleott fbshipit-source-id: ddbbee49756f9ff6c4487977a3f5d2259b7abafe
-
Liang Wang authored
Summary: Two issues here:
1. `last_included` should be the last included index `cumsum_mask[:, :, -1:]`, not `cumsum_mask[:, :, :1]` (which is either 0 or 1);
2. if `--no-repeat-ngram-size` is set, the sum of `probs` may be less than 1, so we need to re-normalize to make it a valid probability distribution.

The following code reproduces the issue:
```
import torch
import numpy as np


def _sample_topp(probs):
    # ===== Code from fairseq/search.py _sample_topp ======
    # sort the last dimension (vocab dimension) in descending order
    sorted_probs, sorted_indices = probs.sort(descending=True)

    # compute a mask to indicate the words to be included in the top-P set.
    cumsum_probs = sorted_probs.cumsum(dim=2)
    mask = cumsum_probs.lt(sampling_topp)

    # note that mask was computed by 'lt'. One more word needs to be included
    # so that the cumulative probability mass can exceed p.
    cumsum_mask = mask.cumsum(dim=2)
    last_included = cumsum_mask[:, :, :1]
    mask = mask.scatter_(2, last_included, 1)

    # truncate unnecessary dims.
    max_dim = last_included.max()
    truncated_mask = mask[:, :, :max_dim + 1]
    truncated_probs = sorted_probs[:, :, :max_dim + 1]
    truncated_indices = sorted_indices[:, :, :max_dim + 1]

    # trim the words that are not in top-P by setting their probabilities
    # to 0, so that they would not be sampled later.
    trim_mask = 1 - truncated_mask
    trimed_probs = truncated_probs.masked_fill_(trim_mask, 0)
    return trimed_probs, truncated_indices
    # ========================================================


if __name__ == '__main__':
    np.random.seed(1234)
    torch.manual_seed(1234)
    sampling_topp = 0.9

    probs = torch.softmax(torch.randn(1, 1, 10), dim=-1)
    # probs = tensor([0.0545, 0.0779, 0.0189, 0.0647, 0.0282, 0.0862, 0.0656, 0.1041, 0.0399, 0.4600])
    print('probs =', probs[0][0])

    trimed_probs, truncated_indices = _sample_topp(probs)
    cum_probs = trimed_probs.cumsum(dim=-1)[0][0]
    # cumsum = tensor([0.4600, 0.5641])
    print('cumsum =', cum_probs)

    # Will throw AssertionError
    assert float(cum_probs[-1]) >= sampling_topp
```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/882 Differential Revision: D16409269 Pulled By: xingz9 fbshipit-source-id: 94b1122eed50c656057b64e22af6f4a6ea7a68af
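A minimal sketch of the corrected routine, assuming the fix follows the two points above (this mirrors the reproduction code, with the two changes marked; it is not necessarily the exact patch):
```
import torch

def _sample_topp_fixed(probs, sampling_topp=0.9):
    sorted_probs, sorted_indices = probs.sort(descending=True)
    cumsum_probs = sorted_probs.cumsum(dim=2)
    mask = cumsum_probs.lt(sampling_topp)
    cumsum_mask = mask.cumsum(dim=2)

    # Fix 1: the last included index is the final cumulative count, not the first.
    last_included = cumsum_mask[:, :, -1:]
    mask = mask.scatter_(2, last_included, 1)

    max_dim = last_included.max()
    truncated_mask = mask[:, :, :max_dim + 1]
    truncated_probs = sorted_probs[:, :, :max_dim + 1]
    truncated_indices = sorted_indices[:, :, :max_dim + 1]
    trimmed_probs = truncated_probs.masked_fill_(~truncated_mask, 0)

    # Fix 2: re-normalize so the result is a valid distribution even when
    # upstream filtering (e.g. --no-repeat-ngram-size) removed some mass.
    trimmed_probs = trimmed_probs / trimmed_probs.sum(dim=-1, keepdim=True)
    return trimmed_probs, truncated_indices

probs = torch.softmax(torch.randn(1, 1, 10), dim=-1)
trimmed_probs, _ = _sample_topp_fixed(probs)
assert torch.isclose(trimmed_probs.sum(dim=-1), torch.tensor(1.0)).all()
```
-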
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/747 Differential Revision: D16403464 Pulled By: myleott fbshipit-source-id: ee3b4184f129a02be833c7bdc00685978b4de883
-
- 19 Jul, 2019 7 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/738 Differential Revision: D16377803 Pulled By: myleott fbshipit-source-id: 6beb2f78e7464b70ff65a965d2b747cdca0ca951
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/736 Differential Revision: D16378001 Pulled By: myleott fbshipit-source-id: 2907f63bcbf7068ceaa48b00096040fa2639e569
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/739 Differential Revision: D16377798 Pulled By: myleott fbshipit-source-id: 20047c80de2e6f108269ace4ae3eec906a5920dd
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/737 Differential Revision: D16377805 Pulled By: myleott fbshipit-source-id: 1e090a02ff4fbba8695173f57d3cc5b88ae98bbf
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/734 Differential Revision: D16377044 Pulled By: myleott fbshipit-source-id: 37d5553d76aa7c653113fec089f59710281c31d7
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/735 Differential Revision: D16377046 Pulled By: myleott fbshipit-source-id: 9725d4a3ce6b2fc8cee0b1d1cb8921f9d59c551a
-
Myle Ott authored
Summary: No major API changes since the last release. Cutting a new release since we'll be merging significant (possibly breaking) changes to logging, data loading and the masked LM implementation soon. Pull Request resolved: https://github.com/pytorch/fairseq/pull/891 Differential Revision: D16377132 Pulled By: myleott fbshipit-source-id: f1cb88e671ccd510e53334d0f449fe18585268c7
-
- 17 Jul, 2019 5 commits
-
-
Ning Dong authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/879 Pull Request resolved: https://github.com/pytorch/translate/pull/598 Details in https://fb.workplace.com/notes/ning-dong/closing-research-to-production-gap-a-story-of-latent-variable-model-migration/443418839813586/ Reviewed By: xianxl Differential Revision: D15742439 fbshipit-source-id: 168c84bd30a5da3c2fb404fcca74266deef1f964
-
Xing Zhou authored
Summary: Implement Nucleus (top-P) sampling: sample among the smallest set of elements whose cumulative probability mass exceeds p. To test it:
```
python generate.py ~myleott/data/data-bin/wmt17_zh_en_full/ --path ~myleott/zh_en/model.pt --remove-bpe --nbest 5 --beam 5 --sampling --sampling-topp 0.3
```
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/710 Test Plan:
```
python generate.py ~myleott/data/data-bin/wmt17_zh_en_full/ --path ~myleott/zh_en/model.pt --remove-bpe --nbest 5 --beam 5 --sampling --sampling-topp 0.3
python tests/test_sequence_generator.py
python tests/test_binaries.py
```
Reviewed By: myleott Differential Revision: D16286688 Pulled By: xingz9 fbshipit-source-id: 1776d21e17c4532a3d24ac75bb7e75da9acad58f
-
Jiajun Shen authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/727 Differential Revision: D16332742 Pulled By: myleott fbshipit-source-id: becedd573c2c071fd21fcb5e55fead554c9bd9d1
-
Myle Ott authored
Summary: This is useful for standalone scripts that want to load a model and inherit most of the args from the model (e.g., eval_lm.py). Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/723 Differential Revision: D16255751 Pulled By: myleott fbshipit-source-id: 562b61511d5d7113e805c9644c877ebb8a3a1889
-
Taylan Bilal authored
Summary: Applying non_pad_mask results in dynamic shapes, which is bad for TPUs. This is an equivalent loss computation (tested), but the tensor shapes are constant (in the case of reduce=True). Pull Request resolved: https://github.com/pytorch/fairseq/pull/876 Differential Revision: D16241621 Pulled By: myleott fbshipit-source-id: 973254b7e0842f2b55817afd66b2a110a566f149
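A minimal sketch of the idea, assuming the description above (the function name and shapes are illustrative, not the fairseq code): compute the loss over all positions and zero out padding by masking, so tensor shapes never depend on how much padding a batch contains.
```
import torch
import torch.nn.functional as F

def nll_loss_static_shapes(lprobs, target, padding_idx):
    # lprobs: (n_tokens, vocab); target: (n_tokens,)
    nll = -lprobs.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)
    # Zero out padding instead of indexing with non_pad_mask, so the
    # intermediate tensors keep a constant shape (good for XLA/TPU).
    nll = nll.masked_fill(target.eq(padding_idx), 0.0)
    return nll.sum()

lprobs = F.log_softmax(torch.randn(6, 10), dim=-1)
target = torch.tensor([3, 1, 4, 0, 0, 0])  # 0 is the padding index here
print(nll_loss_static_shapes(lprobs, target, padding_idx=0))
```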
-
- 14 Jul, 2019 1 commit
-
-
Taylan Bilal authored
Summary: Tensor resizing doesn't work well with TPUs; this change is equivalent to the base and works better with TPUs. Pull Request resolved: https://github.com/pytorch/fairseq/pull/877 Differential Revision: D16241620 Pulled By: myleott fbshipit-source-id: 402c7d5eb6175a66a0420d10e74eb0a9e085790e
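The summary doesn't show the change itself; as a generic, hedged illustration of why in-place resizing is a problem: `resize_` produces buffers whose shape depends on runtime state, whereas pre-allocating the maximum size once keeps shapes constant, which XLA compilation prefers.
```
import torch

max_len = 16

# Shape changes at runtime (problematic for XLA/TPU):
buf = torch.zeros(4)
buf.resize_(8)  # buffer shape now depends on control flow

# Constant-shape alternative: allocate the maximum size once up front
# and write into it, so compiled graphs see a single static shape.
fixed = torch.zeros(max_len)
fixed[:8] = 1.0
```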
-
- 11 Jul, 2019 2 commits
-
-
Taylan Bilal authored
Summary: self._optimizer has __getstate__. We need this so that fairseq optimizers work with pytorch/xla:
```
% find . | xargs grep -s -i __getstate__
./third_party/tensorflow/tensorflow/python/util/deprecation_wrapper.py:  def __getstate__(self):
./torch_xla_py/xla_model.py:    for param_group in optimizer.__getstate__()['param_groups']:
```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/872 Differential Revision: D16211062 Pulled By: alexeib fbshipit-source-id: 1b5575c85d34b7b021d719a03fd58d1c2ee453ee
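A minimal sketch of the delegation this implies, assuming the wrapper holds the real optimizer in self._optimizer (the class here is a stand-in, not fairseq's actual FairseqOptimizer):
```
import torch

class WrappedOptimizer:
    """Stand-in for a fairseq-style optimizer wrapper."""
    def __init__(self, params, lr=0.1):
        self._optimizer = torch.optim.SGD(params, lr=lr)

    def __getstate__(self):
        # Delegate so callers like xla_model.py, which reads
        # optimizer.__getstate__()['param_groups'], see the real state.
        return self._optimizer.__getstate__()

opt = WrappedOptimizer([torch.nn.Parameter(torch.zeros(2))])
print(opt.__getstate__()['param_groups'][0]['lr'])  # 0.1
```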
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/711 Differential Revision: D16192752 Pulled By: myleott fbshipit-source-id: 102ed337a3d31e2047be7c033e9007c04223a684
-
- 10 Jul, 2019 1 commit
-
-
Taylan Bilal authored
Summary: We need this so that `progress_bar`s work with pytorch/xla, i.e. TPUs. See [here](https://github.com/pytorch/xla/blob/master/torch_xla_py/data_parallel.py#L130). Pull Request resolved: https://github.com/pytorch/fairseq/pull/871 Differential Revision: D16181062 Pulled By: myleott fbshipit-source-id: 02c65033260396c2a243fbb66e31ffc2965f2376
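The summary doesn't say which hook was missing; a plausible minimal sketch, assuming the issue was that the wrapping progress bar needed to forward len() to its underlying iterable (the linked data_parallel.py code takes the length of the loader it is handed):
```
class ProgressBarWrapper:
    """Hypothetical stand-in for fairseq's progress_bar wrappers."""
    def __init__(self, iterable):
        self.iterable = iterable

    def __iter__(self):
        for item in self.iterable:
            yield item

    def __len__(self):
        # Forward len() so consumers (e.g. torch_xla's data_parallel)
        # can size their per-device loops.
        return len(self.iterable)

print(len(ProgressBarWrapper(range(10))))  # 10
```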
-
- 09 Jul, 2019 1 commit
-
-
Peng-Jen Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/592 Fix the bug reported at https://github.com/pytorch/fairseq/commit/9c3bb5c6d6c7d6442a28ccb8a81b2fc4e5782ace#r34181600: D15682169 broke multilingual translation generation. Reviewed By: dpacgopinath Differential Revision: D16147454 fbshipit-source-id: e0cf4d32f362190a0542fa0160f65a2a207ca3fa
-
- 08 Jul, 2019 1 commit
-
-
Guanheng Zhang authored
Summary: Integrate torch.nn and fairseq MultiheadAttention modules. In the future, both libraries will benefit from performance optimization together. Under the following circumstances, the calculation of MultiheadAttention will still remain in fairseq:
1. onnx trace
2. incremental state
3. static kv

We plan to gradually migrate those capabilities to PyTorch's core library. Fairseq users can use the attribute self.enable_torch_version to force the calculations in either torch or fairseq. We use the following script to ensure both versions yield the same results:
```
import torch
from fairseq.modules import MultiheadAttention
import time

embed_dim = 64
kv_embed_dim = 1208
num_heads = 16
src_len = 20
tgt_len = 30
bsz = 10

model = MultiheadAttention(embed_dim, num_heads, kdim=kv_embed_dim, vdim=kv_embed_dim,
                           bias=True, add_bias_kv=True, add_zero_attn=True)

query = torch.rand((src_len, bsz, embed_dim))
key = torch.rand((src_len, bsz, kv_embed_dim))
value = torch.rand((src_len, bsz, kv_embed_dim))

attn_mask = torch.randint(0, 2, (src_len, src_len)).float()
attn_mask.masked_fill_(attn_mask == 0, float('-inf'))
attn_mask.masked_fill_(attn_mask > 0, float('0.0'))

seq_mask = torch.randint(0, 2, (1, src_len))
key_padding_mask = seq_mask
for i in range(bsz - 1):
    key_padding_mask = torch.cat([key_padding_mask, seq_mask], dim=0)
key_padding_mask = key_padding_mask == 1

# Apply torch.nn version
model.enable_torch_version = True
torch_output, torch_weight = model(query, key, value,
                                   key_padding_mask=key_padding_mask,
                                   attn_mask=attn_mask)

# Apply fairseq version
model.enable_torch_version = False
fairseq_output, fairseq_weight = model(query, key, value,
                                       key_padding_mask=key_padding_mask,
                                       attn_mask=attn_mask)

print("torch and fairseq generate same results: outputs are same ? ",
      torch.allclose(torch_output, fairseq_output, atol=5e-6, rtol=1e-6),
      ", weights are same ? ",
      torch.allclose(torch_weight, fairseq_weight, atol=5e-6, rtol=1e-6))
```
Expected results: torch and fairseq generate same results: outputs are same ? True , weights are same ? True

Similar performance is expected for both versions. Using the following setup, the initial performance benchmark results are:
```
embed_dim = 32
kv_embed_dim = 32
num_heads = 4
src_len = 3
tgt_len = 2
bsz = 4
num_samples = 50000
```
torch-version MultiheadAttention cpu time: 0.46589 ms per iteration.
fairseq-version MultiheadAttention cpu time: 0.47861 ms per iteration.
torch-version MultiheadAttention gpu time: 0.82330 ms per iteration.
fairseq-version MultiheadAttention gpu time: 0.79410 ms per iteration.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/772 Reviewed By: myleott Differential Revision: D16108450 Pulled By: zhangguanheng66 fbshipit-source-id: cd2eb5a6eeeab6c274999b7928c2af14fc211565
-
- 06 Jul, 2019 2 commits
-
-
vineetk1 authored
Added a comment to inform coders that pack_padded_sequence requires that padding must be on the right (#860) Summary: The PyTorch documentation for pack_padded_sequence has no information about the requirement that padding must be on the right. Therefore, this information is added as a comment on line 212 of https://github.com/vineetk1/fairseq/blob/master/fairseq/models/lstm.py. Pull Request resolved: https://github.com/pytorch/fairseq/pull/860 Differential Revision: D16142102 Pulled By: myleott fbshipit-source-id: 7cb6d4df64b17b54b223de03bd966ca16077c3fe
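For illustration, a small self-contained example of the requirement being documented: inputs to pack_padded_sequence must be right-padded (real timesteps first, padding at the end of each row).
```
import torch
from torch.nn.utils.rnn import pack_padded_sequence

batch = torch.randn(2, 5, 8)    # (batch, time, features), batch_first layout
lengths = torch.tensor([5, 3])  # true lengths, sorted in decreasing order
batch[1, 3:] = 0.0              # padding sits on the right of each sequence

packed = pack_padded_sequence(batch, lengths, batch_first=True)
print(packed.data.shape)        # (5 + 3, 8): only real timesteps remain
```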
-
Louis MARTIN authored
Summary: Fairseq wouldn't install on macOS. A workaround was found here: https://github.com/pytorch/fairseq/issues/289. This is now automatic in setup.py, though maybe there's a cleaner way to do it. I checked that it compiles fine on Linux and macOS. Pull Request resolved: https://github.com/pytorch/fairseq/pull/862 Differential Revision: D16142105 Pulled By: myleott fbshipit-source-id: 998ac7781d7a1ac047f4f9239c1fe16eab4be0dd
-
- 04 Jul, 2019 1 commit
-
-
Spencer Poff authored
Summary: For tasks that involve streaming data directly from an API, we need a simpler epoch iterator. Also included in this change is support for initializing a dictionary with an arbitrary list of special symbols. Reviewed By: myleott Differential Revision: D16110603 fbshipit-source-id: be6d9f680292dec1512614871f9269c95ac84861
-
- 02 Jul, 2019 1 commit
-
-
Xutai Ma authored
Summary: Add the max-token-valid option. Sometimes a separate max batch token count for validation is helpful, for example when the validation set contains a sequence longer than max_tokens (rare in MT, but it could happen in ASR or AST). Reviewed By: myleott Differential Revision: D16076951 fbshipit-source-id: ae7f4218594580b9450a8196d7afa1e7e2018aee
-
- 01 Jul, 2019 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/847 Differential Revision: D16075498 Pulled By: myleott fbshipit-source-id: 62e27a8c4764f53f181c502674dfab1e6b0537e2
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/703 Differential Revision: D16072305 Pulled By: myleott fbshipit-source-id: b77019bdcfbfb95f2817a29a74515bc8f5b682bf
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/696 Differential Revision: D16068394 Pulled By: myleott fbshipit-source-id: 92b44470ab8aeb9f99838cf74e34176104eb2b87
-