- 31 Jul, 2019 3 commits
Dongjin Na authored
Summary: Adds a missing backslash to the convolutional language model training usage example. Pull Request resolved: https://github.com/pytorch/fairseq/pull/941 Differential Revision: D16581388 Pulled By: myleott fbshipit-source-id: 7e2e05ecf13e86cb844dc5200d49f560c63b12ff
-
Johannes Villmow authored
Summary: Just a small fix for issue https://github.com/pytorch/fairseq/issues/936. Pull Request resolved: https://github.com/pytorch/fairseq/pull/937 Differential Revision: D16580263 Pulled By: myleott fbshipit-source-id: 1777e782491c63697726e95bd555892da3fed4ec
-
Nathan Ng authored
Summary: Release of the WMT 19 pretrained models Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/767 Reviewed By: edunov Differential Revision: D16472717 Pulled By: nng555 fbshipit-source-id: acf0fa3548c33f2bf2b5f71e551c782ad8c31a42
-
- 30 Jul, 2019 4 commits
Myle Ott authored
Summary: Fixes https://github.com/pytorch/fairseq/issues/930. Pull Request resolved: https://github.com/pytorch/fairseq/pull/931 Differential Revision: D16562511 Pulled By: myleott fbshipit-source-id: c4c07e2f067326b79daa547dcb3db84aeddbd555
-
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/787 Differential Revision: D16562052 fbshipit-source-id: 640e30b2378ec917d60092558d3088a77f9741cb
-
Myle Ott authored
Summary: The previous BSD+PATENTS license was controversial. We have been approved to relicense fairseq under the MIT license. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/786 Differential Revision: D16560654 Pulled By: myleott fbshipit-source-id: f78b1beb4f2895dd7b9bfc79f5f952a2bfb94034
-
Myle Ott authored
Summary: Fixes https://github.com/pytorch/fairseq/issues/926 Pull Request resolved: https://github.com/pytorch/fairseq/pull/929 Differential Revision: D16560281 Pulled By: myleott fbshipit-source-id: 751051bcdbf25207315bb05f5bee0235d21be627
-
- 29 Jul, 2019 8 commits
Naman Goyal authored
Summary: 1) Added a GLUE data preprocessing script. 2) Updated the README with usage. TODO: 1) release the fairseq dictionary and remove the hard-coded path; 2) remove the hard-coded path for BPE encoding. myleott, what do you recommend for the above TODOs? Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/771 Reviewed By: myleott Differential Revision: D16547679 Pulled By: myleott fbshipit-source-id: 6a6562d9b6215523d048fdf3daee63ffac21e231
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/924 Differential Revision: D16548165 Pulled By: myleott fbshipit-source-id: 49569ece3e54fad7b4f0dfb201ac99123bfdd4f2
-
Xing Zhou authored
Summary: Update README.md to include the recently implemented top-p/nucleus sampling. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/783 Differential Revision: D16543974 Pulled By: myleott fbshipit-source-id: 27c502af10ee390d29607038118a99ff0067aec4
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/923 Differential Revision: D16541289 Pulled By: myleott fbshipit-source-id: b3563a9d61507d4864ac6ecf0648672eaa40b5f3
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/920 Differential Revision: D16540932 Pulled By: myleott fbshipit-source-id: b64438ad8651ecc8fe8904c5f69fa6111b4bed64
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/921 Differential Revision: D16541025 Pulled By: myleott fbshipit-source-id: bb78d30fe285da2adfc7c4e5897ee01fa413b2e4
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/916 Differential Revision: D16537774 Pulled By: myleott fbshipit-source-id: 86bb7b1913a428ee4a21674cc3fc7b39264067ec
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/780 Differential Revision: D16537567 Pulled By: myleott fbshipit-source-id: 4e18c529959935e82ea122c3a2ee477308ffcbe3
-
- 28 Jul, 2019 6 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/779 Differential Revision: D16536673 Pulled By: myleott fbshipit-source-id: bf56e9a81d3086f3d95a3273391dc5e04ed2dbc4
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/914 Differential Revision: D16536670 Pulled By: myleott fbshipit-source-id: 8a41c98f0fb87af6c384cdade756e3eae2978a88
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/912 Differential Revision: D16536561 Pulled By: myleott fbshipit-source-id: 54c5c20a826a14f4e690770e027bcb282acdf911
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/911 Differential Revision: D16536559 Pulled By: myleott fbshipit-source-id: 7fe495054ce5b7658b1d3a43eca38c5858360236
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/913 Differential Revision: D16536562 Pulled By: myleott fbshipit-source-id: ce28642da6868ec884e3e416388a652977a062df
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/910 Differential Revision: D16536532 Pulled By: myleott fbshipit-source-id: 56bb5570e70b5670ad87c64d9dd20c64c1fa9f5c
-
- 27 Jul, 2019 2 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/909 Differential Revision: D16532919 Pulled By: myleott fbshipit-source-id: 16ce884cf3d84579026e4406a75ba3c01a128dbd
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/778 Differential Revision: D16525447 Pulled By: myleott fbshipit-source-id: e721e3a10e243a2408a04f89f06b5adbbe2fdff2
-
- 25 Jul, 2019 2 commits
Myle Ott authored
Summary: Input feeding generally refers to a slightly different concept Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/769 Differential Revision: D16491898 Pulled By: myleott fbshipit-source-id: 68573584e820f11f199db4e7e37e9ee7a69a3287
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/770 Differential Revision: D16491911 Pulled By: myleott fbshipit-source-id: 8dd2b76f8fa24183640ae9d1129ea47ded77d43d
-
- 24 Jul, 2019 1 commit
Spencer Poff authored
Summary: I sadly discovered, after 8 hours of training, that my checkpoint directory wasn't globally readable. This adds a check at the beginning of the train loop to keep that from happening again! Reviewed By: myleott Differential Revision: D16455394 fbshipit-source-id: 35959aa058150b2afb63710c468d01ebc8a12b0c
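The summary does not say how the check is implemented. A minimal sketch, assuming it amounts to creating the save directory, then writing, reading back, and deleting a probe file before the train loop starts (the helper name `check_checkpoint_dir_usable` and the probe filename are hypothetical, not fairseq's API):

```python
import os


def check_checkpoint_dir_usable(save_dir: str) -> None:
    """Hypothetical helper: fail fast, before training, if save_dir is unusable.

    Creates save_dir if needed, then writes, reads back, and deletes a probe
    file, so permission problems surface immediately instead of at the first
    checkpoint save hours later.
    """
    os.makedirs(save_dir, exist_ok=True)
    probe = os.path.join(save_dir, ".probe")  # hypothetical probe filename
    try:
        with open(probe, "w") as f:
            f.write("ok")
        with open(probe) as f:
            if f.read() != "ok":
                raise OSError("probe file did not read back correctly")
    except OSError as e:
        raise OSError(f"Checkpoint directory {save_dir!r} is not usable") from e
    finally:
        # Leave the directory as we found it, even if the check failed.
        if os.path.exists(probe):
            os.remove(probe)
```

Calling it once at the top of the training entry point turns a failure that would otherwise appear at the first checkpoint save into an immediate one.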
-
- 23 Jul, 2019 3 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/899 Differential Revision: D16448602 Pulled By: myleott fbshipit-source-id: afd1a1b713274b6328150cd85d7f8a81833597aa
-
Taylan Bilal authored
Summary: Since mask really is a tensor of ints, this change should be mathematically equivalent to the base version. However, it has performance implications for XLA, hence this pull request. Pull Request resolved: https://github.com/pytorch/fairseq/pull/875 Differential Revision: D16232877 Pulled By: myleott fbshipit-source-id: e63175ee0016dcf0dfe10e2fd22570b8bbfbde84
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/762 Differential Revision: D16427266 Pulled By: myleott fbshipit-source-id: 9bd9b8c6b4994ae98a62a37b34d03265bd365453
-
- 22 Jul, 2019 8 commits
Sara Hanson authored
Summary: Pull Request resolved: https://github.com/facebookresearch/pytext/pull/804 Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/746 Pull Request resolved: https://github.com/pytorch/fairseq/pull/894 Adds an implementation of the sparse transformer to multi-head attention, using the fixed attention pattern specified in https://arxiv.org/pdf/1904.10509.pdf. The sparse_mask masks out words using -inf; after softmax, -inf becomes 0. Thus, the mask does not need to be recalculated and reapplied when multiplying attn_weights and values. Four inputs are added to the config: sparse, is_bidirectional, stride, and expressivity. If the sparse transformer is used, is_bidirectional, stride, and expressivity must be specified, though defaults are provided. If is_bidirectional is False, the mask is computed using the fixed attention pattern described in the paper. If is_bidirectional is True, subset one includes all values in the current stride window plus a summary from every stride window; all other values are masked. Stride (L in the paper) controls the window size, and expressivity (c in the paper) controls the size of the summary. Reviewed By: borguz Differential Revision: D16042988 fbshipit-source-id: c59166dc7cfe89187a256e4076000c2458842fd5
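To make the fixed pattern concrete, here is a framework-free sketch (not the fairseq implementation) that builds the additive -inf mask for one sequence. It assumes that the "summary" of a stride window is its last `expressivity` positions, as in the paper's fixed pattern, and that the unidirectional case additionally forbids attending to future positions; `fixed_sparse_mask` and `masked_softmax` are hypothetical names:

```python
import math
from typing import List

NEG_INF = float("-inf")


def fixed_sparse_mask(seq_len: int, stride: int, expressivity: int,
                      bidirectional: bool = False) -> List[List[float]]:
    """Additive mask: entry [i][j] is 0.0 if query i may attend to key j, else -inf."""
    if not 0 < expressivity <= stride:
        raise ValueError("expected 0 < expressivity <= stride")
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            same_window = (j // stride) == (i // stride)
            # Assumption: the last `expressivity` slots of each window are its summary.
            is_summary = (j % stride) >= stride - expressivity
            allowed = same_window or is_summary
            if not bidirectional:
                allowed = allowed and j <= i  # causal: never attend to the future
            row.append(0.0 if allowed else NEG_INF)
        mask.append(row)
    return mask


def masked_softmax(scores: List[float], mask_row: List[float]) -> List[float]:
    """Add the mask to raw scores, then softmax; -inf logits get exactly 0 weight."""
    logits = [s + m for s, m in zip(scores, mask_row)]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]
```

For example, with `seq_len=8`, `stride=4`, and `expressivity=1`, unidirectional query 5 may attend to key 3 (the summary of the first window) and keys 4 and 5 (its own window, up to itself). `masked_softmax([0.0] * 8, fixed_sparse_mask(8, 4, 1)[5])` therefore spreads weight 1/3 over those three keys and gives exactly 0 to the rest, which is why the mask does not need to be reapplied when multiplying by the values.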
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/761 Differential Revision: D16421335 Pulled By: myleott fbshipit-source-id: 257d92c2b90361147642e2baa38486b4d18f6297
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/757 Differential Revision: D16418305 Pulled By: myleott fbshipit-source-id: 25f293a2792509f7a75c688e4bf8cff02e6bba2e
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/758 Differential Revision: D16418932 Pulled By: myleott fbshipit-source-id: 59f005164b61b9fa712922eeb23525f7eec38f38
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/756 Differential Revision: D16418302 Pulled By: myleott fbshipit-source-id: 62495a0bff41d1741e2b09807a3b43ff2c66c8fb
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/752 Differential Revision: D16417582 Pulled By: myleott fbshipit-source-id: 6b4289febcf9290452bb91f1f2181a02c09c82a7
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/740 Differential Revision: D16377797 Pulled By: myleott fbshipit-source-id: f7d6c8b00a77e279ea94376b1f0fcd15087eaf5f
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/750 Differential Revision: D16410986 Pulled By: myleott fbshipit-source-id: 8ee6b4371d6ae5b041b00a54a6039a422345795e
-
- 21 Jul, 2019 3 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/749 Differential Revision: D16410984 Pulled By: myleott fbshipit-source-id: 7698df46b8a179afccb287990f9705358690454a
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/751 Differential Revision: D16410989 Pulled By: myleott fbshipit-source-id: ddbbee49756f9ff6c4487977a3f5d2259b7abafe
-
Liang Wang authored
Summary: Two issues here: 1. `last_included` should be the last included index, `cumsum_mask[:, :, -1:]`, instead of `cumsum_mask[:, :, :1]` (which is always either 0 or 1); 2. if `--no-repeat-ngram-size` is set, the sum of `probs` may be less than 1, so we need to re-normalize to make it a valid probability distribution. The following code reproduces these issues:
```
import torch
import numpy as np


def _sample_topp(probs):
    # ===== Code from fairseq/search.py _sample_topp ======
    # sort the last dimension (vocab dimension) in descending order
    sorted_probs, sorted_indices = probs.sort(descending=True)

    # compute a mask to indicate the words to be included in the top-P set.
    cumsum_probs = sorted_probs.cumsum(dim=2)
    mask = cumsum_probs.lt(sampling_topp)

    # note that mask was computed by 'lt'. One more word needs to be included
    # so that the cumulative probability mass can exceed p.
    cumsum_mask = mask.cumsum(dim=2)
    last_included = cumsum_mask[:, :, :1]  # the bug: should be [:, :, -1:]
    mask = mask.scatter_(2, last_included, 1)

    # truncate unnecessary dims.
    max_dim = last_included.max()
    truncated_mask = mask[:, :, :max_dim + 1]
    truncated_probs = sorted_probs[:, :, :max_dim + 1]
    truncated_indices = sorted_indices[:, :, :max_dim + 1]

    # trim the words that are not in top-P by setting their probabilities
    # to 0, so that they would not be sampled later.
    # '~' replaces the original '1 - truncated_mask', which fails on
    # PyTorch >= 1.2 where comparison ops return bool tensors.
    trim_mask = ~truncated_mask
    trimed_probs = truncated_probs.masked_fill_(trim_mask, 0)
    return trimed_probs, truncated_indices
    # ========================================================


if __name__ == '__main__':
    np.random.seed(1234)
    torch.manual_seed(1234)
    sampling_topp = 0.9
    probs = torch.softmax(torch.randn(1, 1, 10), dim=-1)
    # probs = tensor([0.0545, 0.0779, 0.0189, 0.0647, 0.0282, 0.0862, 0.0656, 0.1041, 0.0399, 0.4600])
    print('probs =', probs[0][0])

    trimed_probs, truncated_indices = _sample_topp(probs)

    cum_probs = trimed_probs.cumsum(dim=-1)[0][0]
    # cumsum = tensor([0.4600, 0.5641])
    print('cumsum =', cum_probs)
    # Will throw AssertionError
    assert float(cum_probs[-1]) >= sampling_topp
```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/882 Differential Revision: D16409269 Pulled By: xingz9 fbshipit-source-id: 94b1122eed50c656057b64e22af6f4a6ea7a68af
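The two fixes can be illustrated without PyTorch. The following is a hedged, framework-free sketch, not the code merged into fairseq/search.py. It renormalizes the input first (one reasonable reading of issue 2), keeps words in descending order up to and including the one whose cumulative mass reaches `p` (issue 1), and renormalizes the kept mass so it can be sampled from directly. `top_p_filter` is a hypothetical name:

```python
from typing import List, Tuple


def top_p_filter(probs: List[float], p: float) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: return (kept probabilities, kept vocab indices)."""
    total = sum(probs)
    # Issue 2: probs may sum to < 1 (e.g. after n-gram blocking), so renormalize.
    probs = [q / total for q in probs]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept_idx: List[int] = []
    cum = 0.0
    for i in order:
        kept_idx.append(i)
        cum += probs[i]
        # Issue 1: keep every word whose cumulative mass is < p, plus the first
        # word that pushes it to >= p (the true "last included" index).
        if cum >= p:
            break
    kept_mass = sum(probs[i] for i in kept_idx)
    return [probs[i] / kept_mass for i in kept_idx], kept_idx
```

On the probabilities printed in the reproduction above with `p = 0.9`, this keeps seven words (indices 9, 7, 5, 1, 6, 3, 0, with original mass of about 0.913) rather than stopping after two.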
-