1. 22 Jul, 2019 4 commits
  2. 21 Jul, 2019 4 commits
      Update GPT-2 BPE · 62b5498b
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/749
      
      Differential Revision: D16410984
      
      Pulled By: myleott
      
      fbshipit-source-id: 7698df46b8a179afccb287990f9705358690454a
      Default to mmap and infer dataset implementations automatically · 5f78106a
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/751
      
      Differential Revision: D16410989
      
      Pulled By: myleott
      
      fbshipit-source-id: ddbbee49756f9ff6c4487977a3f5d2259b7abafe
      Fix topp sampling issues (#882) · 1f96d284
      Liang Wang authored
      Summary:
      Two issues here:
      
      1. `last_included` should be the last included index `cumsum_mask[:, :, -1:]` instead of `cumsum_mask[:, :, :1]`  (which is either 0 or 1);
      
2. If `--no-repeat-ngram-size` is set, the sum of `probs` may be less than 1, so we need to re-normalize it to make it a valid probability distribution.
      
The following code reproduces these issues (a sketch of the corrected logic follows the script):
      
      ```
      import torch
      import numpy as np
      
      def _sample_topp(probs):
      
          # =====  Code from  fairseq/search.py _sample_topp ======
      
          # sort the last dimension (vocab dimension) in descending order
          sorted_probs, sorted_indices = probs.sort(descending=True)
      
          # compute a mask to indicate the words to be included in the top-P set.
          cumsum_probs = sorted_probs.cumsum(dim=2)
          mask = cumsum_probs.lt(sampling_topp)
      
          # note that mask was computed by 'lt'. One more word needs to be included
          # so that the cumulative probability mass can exceed p.
          cumsum_mask = mask.cumsum(dim=2)
          last_included = cumsum_mask[:, :, :1]
          mask = mask.scatter_(2, last_included, 1)
      
          # truncate unnecessary dims.
          max_dim = last_included.max()
          truncated_mask = mask[:, :, :max_dim + 1]
          truncated_probs = sorted_probs[:, :, :max_dim + 1]
          truncated_indices = sorted_indices[:, :, :max_dim + 1]
      
          # trim the words that are not in top-P by setting their probabilities
          # to 0, so that they would not be sampled later.
          trim_mask = 1 - truncated_mask
          trimed_probs = truncated_probs.masked_fill_(trim_mask, 0)
          return trimed_probs, truncated_indices
      
          # ========================================================
      
      if __name__ == '__main__':
          np.random.seed(1234)
          torch.manual_seed(1234)
      
          sampling_topp = 0.9
          probs = torch.softmax(torch.randn(1, 1, 10), dim=-1)
          # probs = tensor([0.0545, 0.0779, 0.0189, 0.0647, 0.0282, 0.0862, 0.0656, 0.1041, 0.0399, 0.4600])
          print('probs =', probs[0][0])
      
          trimed_probs, truncated_indices = _sample_topp(probs)
      
          cum_probs = trimed_probs.cumsum(dim=-1)[0][0]
          # cumsum = tensor([0.4600, 0.5641])
          print('cumsum =', cum_probs)
          # Will throw AssertionError
          assert float(cum_probs[-1]) >= sampling_topp
      
      ```
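
For reference, a minimal sketch of the corrected logic described in (1) and (2). This is an illustration only, not necessarily the exact upstream patch; the function name, the clamp on `last_included`, and the final re-normalization are added here for clarity.

```python
import torch

def _sample_topp_fixed(probs, sampling_topp=0.9):
    # sort the last dimension (vocab dimension) in descending order
    sorted_probs, sorted_indices = probs.sort(descending=True)

    # mask of words whose cumulative probability is still below p
    cumsum_probs = sorted_probs.cumsum(dim=2)
    mask = cumsum_probs.lt(sampling_topp)

    # fix (1): one more word must be included so the cumulative mass can
    # exceed p; the count of masked words is exactly the index of that word
    cumsum_mask = mask.cumsum(dim=2)
    last_included = cumsum_mask[:, :, -1:]
    last_included.clamp_(0, mask.size(2) - 1)  # guard against running past the vocab end
    mask = mask.scatter_(2, last_included, 1)

    # truncate unnecessary dims and zero out words outside the top-p set
    max_dim = last_included.max()
    truncated_mask = mask[:, :, :max_dim + 1]
    truncated_probs = sorted_probs[:, :, :max_dim + 1]
    truncated_indices = sorted_indices[:, :, :max_dim + 1]
    trimmed_probs = truncated_probs.masked_fill_(~truncated_mask, 0)

    # fix (2): re-normalize so the result is a valid distribution even when
    # the input probs summed to less than 1 (e.g. with --no-repeat-ngram-size)
    trimmed_probs = trimmed_probs / trimmed_probs.sum(dim=-1, keepdim=True)
    return trimmed_probs, truncated_indices
```
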
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/882
      
      Differential Revision: D16409269
      
      Pulled By: xingz9
      
      fbshipit-source-id: 94b1122eed50c656057b64e22af6f4a6ea7a68af
      Rename data.transforms -> data.encoders · f812e529
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/747
      
      Differential Revision: D16403464
      
      Pulled By: myleott
      
      fbshipit-source-id: ee3b4184f129a02be833c7bdc00685978b4de883
  3. 19 Jul, 2019 7 commits
  4. 17 Jul, 2019 5 commits
  5. 14 Jul, 2019 1 commit
  6. 11 Jul, 2019 2 commits
  7. 10 Jul, 2019 1 commit
  8. 09 Jul, 2019 1 commit
  9. 08 Jul, 2019 1 commit
      Integrate torch.nn and fairseq MultiheadAttention (#772) · 6d2e0831
      Guanheng Zhang authored
      Summary:
Integrate the torch.nn and fairseq MultiheadAttention modules. In the future, both libraries will benefit from shared performance optimizations.
      
In the following circumstances, the MultiheadAttention calculation will still be performed in fairseq:
      1. onnx trace
      2. incremental state
      3. static kv
      
We plan to gradually migrate those capabilities to PyTorch's core library.
      
Fairseq users can use the attribute self.enable_torch_version to force the calculation to run in either torch or fairseq. We use the following script to ensure both versions yield the same results.
      
      ------------------------------------------------------------------------------------
      ```
      import torch
      from fairseq.modules import MultiheadAttention
      import time
      
      embed_dim = 64
      kv_embed_dim = 1208
      num_heads = 16
      src_len = 20
      tgt_len = 30
      bsz = 10
      
      model = MultiheadAttention(embed_dim, num_heads, kdim=kv_embed_dim, vdim=kv_embed_dim,
                                 bias=True, add_bias_kv=True, add_zero_attn=True)
      
      query = torch.rand((src_len, bsz, embed_dim))
      key = torch.rand((src_len, bsz, kv_embed_dim))
      value = torch.rand((src_len, bsz, kv_embed_dim))
      
      attn_mask = torch.randint(0, 2, (src_len, src_len)).float()
      attn_mask.masked_fill_(attn_mask == 0, float('-inf'))
      attn_mask.masked_fill_(attn_mask > 0, float('0.0'))
      
      seq_mask = torch.randint(0, 2, (1, src_len))
      key_padding_mask = seq_mask
      for i in range(bsz-1):
    key_padding_mask = torch.cat([key_padding_mask, seq_mask], dim=0)
      key_padding_mask = key_padding_mask == 1
      
      # Apply torch.nn version
      model.enable_torch_version = True
      torch_output, torch_weight = model(query, key, value, key_padding_mask=key_padding_mask, attn_mask=attn_mask)
      
      # Apply fairseq version
      model.enable_torch_version = False
      fairseq_output, fairseq_weight = model(query, key, value, key_padding_mask=key_padding_mask, attn_mask=attn_mask)
      
      print("torch and fairseq generate same results: outputs are same ? ",
            torch.allclose(torch_output, fairseq_output, atol=5e-6, rtol=1e-6),
            ", weights are same ? ",
            torch.allclose(torch_weight, fairseq_weight, atol=5e-6, rtol=1e-6)
      )
      ```
      ------------------------------------------------------------------------------------
      Expected results:
      torch and fairseq generate same results: outputs are same ?  True , weights are same ?  True
      
      ------------------------------------------------------------------------------------
Similar performance is expected for both versions. Using the following setup, we obtained the initial benchmark results below:
      
      #########################
      embed_dim = 32
      kv_embed_dim = 32
      num_heads = 4
      src_len = 3
      tgt_len = 2
      bsz = 4
      num_samples = 50000
      
      #########################
      torch-version MultiheadAttention cpu time: 0.46589  ms per iteration.
      fairseq-version MultiheadAttention cpu time: 0.47861  ms per iteration.
      torch-version MultiheadAttention gpu time: 0.82330  ms per iteration.
      fairseq-version MultiheadAttention gpu time: 0.79410  ms per iteration.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/772
      
      Reviewed By: myleott
      
      Differential Revision: D16108450
      
      Pulled By: zhangguanheng66
      
      fbshipit-source-id: cd2eb5a6eeeab6c274999b7928c2af14fc211565
  10. 06 Jul, 2019 2 commits
  11. 04 Jul, 2019 1 commit
      support streaming iterator · 5c241c8c
      Spencer Poff authored
      Summary:
      For tasks that involve streaming data directly from an API, we need a simpler epoch iterator.
      
      Also included in this change is support for initializing a dictionary with an arbitrary list of special symbols.
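
A minimal sketch of what a streaming epoch iterator could look like; the class and method names here are illustrative assumptions, not the actual fairseq API.

```python
from typing import Any, Iterable, Iterator


class StreamingEpochIterator:
    """Illustrative epoch iterator over a stream of batches.

    Unlike an index-based epoch iterator, it does not assume the dataset
    length is known up front; it simply pulls batches from the stream.
    """

    def __init__(self, batch_stream: Iterable[Any], epoch: int = 0):
        self.batch_stream = batch_stream
        self.epoch = epoch

    def next_epoch_itr(self) -> Iterator[Any]:
        # advance the epoch counter and return a fresh iterator over the stream
        self.epoch += 1
        return iter(self.batch_stream)
```
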
      
      Reviewed By: myleott
      
      Differential Revision: D16110603
      
      fbshipit-source-id: be6d9f680292dec1512614871f9269c95ac84861
  12. 02 Jul, 2019 1 commit
      add --max-tokens-valid option for validation · bccfddbb
      Xutai Ma authored
Summary: Add the --max-tokens-valid option. Sometimes a separate maximum number of batch tokens for validation is helpful, for example when the validation set contains a sequence longer than max_tokens (this is rare in MT but can happen in ASR or AST).
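
A short sketch of the presumed fallback behavior, assuming the new option defaults to --max-tokens when it is not set (the argument names below are illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--max-tokens', type=int, default=4096)
parser.add_argument('--max-tokens-valid', type=int, default=None)
args = parser.parse_args([])  # e.g. no --max-tokens-valid given on the command line

# fall back to the training token budget when a separate validation budget is not set
if args.max_tokens_valid is None:
    args.max_tokens_valid = args.max_tokens
print(args.max_tokens_valid)  # 4096
```
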
      
      Reviewed By: myleott
      
      Differential Revision: D16076951
      
      fbshipit-source-id: ae7f4218594580b9450a8196d7afa1e7e2018aee
  13. 01 Jul, 2019 3 commits
  14. 30 Jun, 2019 4 commits
  15. 27 Jun, 2019 2 commits
      Update generate.py (#831) · c86d70cc
      Bao-Yu authored
      Summary:
Repeated use of the loop variable 'i' during evaluation may cause problems.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/831
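
For context, a toy illustration of the kind of index shadowing the summary refers to (not the actual generate.py code):

```python
# The inner loop silently overwrites the outer loop index 'i', so code after
# the inner loop that uses 'i' sees the hypothesis index, not the sample index.
samples = [['a', 'b'], ['c'], ['d', 'e', 'f']]
for i, sample in enumerate(samples):
    for i, hypo in enumerate(sample):  # reuses 'i'
        pass
    print('outer index is now', i)  # prints 1, 0, 2 -- not 0, 1, 2
```
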
      
      Differential Revision: D15980227
      
      Pulled By: myleott
      
      fbshipit-source-id: 7b6b54a6b54938ad63ed1720d930505b56e5c84b
      2/N bmuf · c246df42
      Nayan Singhal authored
      Summary:
Added BMUF (blockwise model-update filtering) implementation; a sketch of the block update rule follows the todo list below.

Todo:
1) Add a unit test covering model averaging and BMUF
2) Add warmup before actually starting to train the model
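
For context, a minimal sketch of the BMUF block update rule (block momentum plus block learning rate, following Chen and Huo's blockwise model-update filtering formulation; names and defaults are illustrative, and this is not the fairseq implementation):

```python
import torch


def bmuf_block_update(prev_params, avg_params, prev_delta,
                      block_momentum=0.875, block_lr=1.0, nesterov=True):
    """One BMUF synchronization step over flattened parameter tensors."""
    # model-update produced by this block of parallel training
    grad_block = avg_params - prev_params
    # filter it with block momentum and scale by the block learning rate
    delta = block_momentum * prev_delta + block_lr * grad_block
    # new global model
    new_params = prev_params + delta
    # with Nesterov block momentum, workers restart from a look-ahead point
    broadcast_params = new_params + block_momentum * delta if nesterov else new_params
    return new_params, delta, broadcast_params


# toy usage with flattened parameters
prev = torch.zeros(4)
avg = torch.ones(4) * 0.1  # average of the workers' models after one block
new, delta, start = bmuf_block_update(prev, avg, torch.zeros(4))
```
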
      
      Reviewed By: jay-mahadeokar
      
      Differential Revision: D15871477
      
      fbshipit-source-id: 866b0aba2d5bea5b65b4438acb49c886c4a87924
  16. 26 Jun, 2019 1 commit