1. 19 Jul, 2019 5 commits
  2. 17 Jul, 2019 5 commits
  3. 14 Jul, 2019 1 commit
  4. 11 Jul, 2019 2 commits
  5. 10 Jul, 2019 1 commit
  6. 09 Jul, 2019 1 commit
  7. 08 Jul, 2019 1 commit
    • Integrate torch.nn and fairseq MultiheadAttention (#772) · 6d2e0831
      Guanheng Zhang authored
      Summary:
      Integrate the torch.nn and fairseq MultiheadAttention modules. Going forward, both libraries will benefit from shared performance optimizations.
      
      The MultiheadAttention calculation still remains in fairseq under the following circumstances:
      1. onnx trace
      2. incremental state
      3. static kv
      
      We plan to gradually migrate those capabilities to PyTorch's core library.
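
      As a rough illustration, the fallback decision amounts to a check like the one below, gated by the enable_torch_version attribute described next; the helper name and attribute lookups are illustrative, not the module's verbatim code.

      ```
      def should_use_torch_attention(module, incremental_state=None, static_kv=False):
          """Use the torch.nn kernel only when none of the fairseq-only features
          (ONNX trace, incremental state, static kv) are requested."""
          return (
              getattr(module, "enable_torch_version", False)
              and not getattr(module, "onnx_trace", False)  # 1. onnx trace
              and incremental_state is None                 # 2. incremental state
              and not static_kv                             # 3. static kv
          )
      ```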
      
      Fairseq users can use the attribute self.enable_torch_version to force the calculation to run in either torch or fairseq. We use the following script to verify that both versions yield the same results.
      
      ------------------------------------------------------------------------------------
      ```
      import torch
      from fairseq.modules import MultiheadAttention
      import time
      
      embed_dim = 64
      kv_embed_dim = 1208
      num_heads = 16
      src_len = 20
      tgt_len = 30
      bsz = 10
      
      model = MultiheadAttention(embed_dim, num_heads, kdim=kv_embed_dim, vdim=kv_embed_dim,
                                 bias=True, add_bias_kv=True, add_zero_attn=True)
      
      query = torch.rand((src_len, bsz, embed_dim))
      key = torch.rand((src_len, bsz, kv_embed_dim))
      value = torch.rand((src_len, bsz, kv_embed_dim))
      
      # attn_mask: 0.0 keeps a position, float('-inf') masks it out
      attn_mask = torch.randint(0, 2, (src_len, src_len)).float()
      attn_mask.masked_fill_(attn_mask == 0, float('-inf'))
      attn_mask.masked_fill_(attn_mask > 0, float('0.0'))
      
      # key_padding_mask: replicate one random mask across the batch; True marks padded positions
      seq_mask = torch.randint(0, 2, (1, src_len))
      key_padding_mask = seq_mask
      for i in range(bsz - 1):
          key_padding_mask = torch.cat([key_padding_mask, seq_mask], dim=0)
      key_padding_mask = key_padding_mask == 1
      
      # Apply torch.nn version
      model.enable_torch_version = True
      torch_output, torch_weight = model(query, key, value, key_padding_mask=key_padding_mask, attn_mask=attn_mask)
      
      # Apply fairseq version
      model.enable_torch_version = False
      fairseq_output, fairseq_weight = model(query, key, value, key_padding_mask=key_padding_mask, attn_mask=attn_mask)
      
      print("torch and fairseq generate same results: outputs are same ? ",
            torch.allclose(torch_output, fairseq_output, atol=5e-6, rtol=1e-6),
            ", weights are same ? ",
            torch.allclose(torch_weight, fairseq_weight, atol=5e-6, rtol=1e-6)
      )
      ```
      ------------------------------------------------------------------------------------
      Expected results:
      torch and fairseq generate same results: outputs are same ?  True , weights are same ?  True
      
      ------------------------------------------------------------------------------------
      Similar performance is expected from both versions. Using the following setup, we obtained these initial benchmark results:
      
      #########################
      embed_dim = 32
      kv_embed_dim = 32
      num_heads = 4
      src_len = 3
      tgt_len = 2
      bsz = 4
      num_samples = 50000
      
      #########################
      torch-version MultiheadAttention cpu time: 0.46589  ms per iteration.
      fairseq-version MultiheadAttention cpu time: 0.47861  ms per iteration.
      torch-version MultiheadAttention gpu time: 0.82330  ms per iteration.
      fairseq-version MultiheadAttention gpu time: 0.79410  ms per iteration.
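
      A minimal sketch of how such a per-iteration timing can be reproduced; the harness below is illustrative (wall-clock time averaged over num_samples forward passes), not the exact script behind the numbers above.

      ```
      import time
      import torch
      from fairseq.modules import MultiheadAttention

      embed_dim, num_heads, src_len, bsz, num_samples = 32, 4, 3, 4, 50000

      model = MultiheadAttention(embed_dim, num_heads, bias=True)
      query = torch.rand((src_len, bsz, embed_dim))
      key = torch.rand((src_len, bsz, embed_dim))
      value = torch.rand((src_len, bsz, embed_dim))

      def benchmark(use_torch):
          model.enable_torch_version = use_torch
          start = time.time()
          with torch.no_grad():
              for _ in range(num_samples):
                  model(query, key, value)
          return (time.time() - start) / num_samples * 1000  # ms per iteration

      print("torch-version:   %.5f ms per iteration" % benchmark(True))
      print("fairseq-version: %.5f ms per iteration" % benchmark(False))
      ```
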
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/772
      
      Reviewed By: myleott
      
      Differential Revision: D16108450
      
      Pulled By: zhangguanheng66
      
      fbshipit-source-id: cd2eb5a6eeeab6c274999b7928c2af14fc211565
  8. 06 Jul, 2019 2 commits
  9. 04 Jul, 2019 1 commit
    • support streaming iterator · 5c241c8c
      Spencer Poff authored
      Summary:
      For tasks that involve streaming data directly from an API, we need a simpler epoch iterator.
      
      Also included in this change is support for initializing a dictionary with an arbitrary list of special symbols.
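
      As an illustration only (not the class added in this diff), a streaming epoch iterator can be as simple as wrapping an open-ended batch stream and handing out a fixed number of batches per "epoch":

      ```
      class SimpleStreamingIterator:
          """Treats each fixed-length chunk of an open-ended batch stream as one epoch,
          without needing to know the dataset length up front."""

          def __init__(self, batch_stream, batches_per_epoch):
              self._stream = iter(batch_stream)
              self.batches_per_epoch = batches_per_epoch
              self.epoch = 0

          def next_epoch_itr(self):
              self.epoch += 1
              return (next(self._stream) for _ in range(self.batches_per_epoch))
      ```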
      
      Reviewed By: myleott
      
      Differential Revision: D16110603
      
      fbshipit-source-id: be6d9f680292dec1512614871f9269c95ac84861
  10. 02 Jul, 2019 1 commit
    • add --max-tokens-valid option for validation · bccfddbb
      Xutai Ma authored
      Summary: Add the --max-tokens-valid option. A separate max-tokens limit for validation can sometimes be helpful, for example when the validation set contains a sequence longer than max_tokens (rare in MT, but possible in ASR or AST).
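
      The intended behaviour, sketched under the assumption that the new option simply falls back to --max-tokens when unset (helper name is illustrative):

      ```
      def resolve_max_tokens_valid(max_tokens, max_tokens_valid=None):
          # If --max-tokens-valid is not given, validation uses the same
          # token budget as training, so existing behaviour is unchanged.
          return max_tokens if max_tokens_valid is None else max_tokens_valid

      assert resolve_max_tokens_valid(4096) == 4096          # default: same as training
      assert resolve_max_tokens_valid(4096, 16384) == 16384  # larger budget for long validation sequences
      ```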
      
      Reviewed By: myleott
      
      Differential Revision: D16076951
      
      fbshipit-source-id: ae7f4218594580b9450a8196d7afa1e7e2018aee
  11. 01 Jul, 2019 3 commits
  12. 30 Jun, 2019 4 commits
  13. 27 Jun, 2019 2 commits
    • Update generate.py (#831) · c86d70cc
      Bao-Yu authored
      Summary:
      The loop variable 'i' is reused inside the evaluation loop in generate.py, which can cause indexing problems; a minimal illustration follows below.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/831
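
      The class of bug in miniature (a standalone illustration, not the actual generate.py code):

      ```
      for i in range(3):           # i indexes the batch
          for i in range(10):      # i is silently reused to index hypotheses
              pass
          print(i)                 # always 9 here, no longer the batch index
      ```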
      
      Differential Revision: D15980227
      
      Pulled By: myleott
      
      fbshipit-source-id: 7b6b54a6b54938ad63ed1720d930505b56e5c84b
    • 2/N bmuf · c246df42
      Nayan Singhal authored
      Summary:
      Added a BMUF (block-wise model update filtering) implementation.

      Todo:
      1) Add unit tests covering model averaging and BMUF
      2) Add warm-up before actually starting to train the model
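
      For reference, a rough sketch of the classical BMUF synchronization step; the parameter names follow the original block momentum formulation and are illustrative, not this diff's code.

      ```
      def bmuf_sync(avg_params, prev_global_params, prev_delta,
                    block_momentum=0.875, block_lr=1.0):
          """One sync: avg_params is the average of all replicas' parameters for
          this block; prev_global_params / prev_delta come from the previous sync."""
          block_grad = avg_params - prev_global_params                  # G(t): progress in this block
          delta = block_momentum * prev_delta + block_lr * block_grad   # filtered update
          new_global = prev_global_params + delta                       # W(t) = W(t-1) + delta
          return new_global, delta
      ```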
      
      Reviewed By: jay-mahadeokar
      
      Differential Revision: D15871477
      
      fbshipit-source-id: 866b0aba2d5bea5b65b4438acb49c886c4a87924
  14. 26 Jun, 2019 3 commits
  15. 25 Jun, 2019 1 commit
    • avoid "division by zero" error in logging_outputs when --use-bmuf is enabled (#812) · b3864b28
      freewym authored
      Summary:
      When doing multi-GPU training with --use-bmuf turned on and --global-sync-iter > 1, each replica may not sync with the other replicas at every iteration, so logging_outputs only contains each replica's own stats. In addition, logging_outputs may be empty at the end of an epoch, after "a dummy iteration", when the number of replicas does not divide the number of batches in the training data. If this happens, sample_size and ntokens are 0 for some replica and cause a "division by zero" error. This fix sets *loss to 0 when sample_size/ntokens is 0.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/812
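
      A sketch of the guard described above (the aggregation shown is illustrative; the log keys follow standard fairseq criterion conventions):

      ```
      import math

      def aggregate_loss(logging_outputs):
          # logging_outputs can be empty (e.g. after the dummy iteration at the
          # end of an epoch), so guard the division by sample_size.
          loss_sum = sum(log.get('loss', 0) for log in logging_outputs)
          sample_size = sum(log.get('sample_size', 0) for log in logging_outputs)
          return loss_sum / sample_size / math.log(2) if sample_size > 0 else 0.0
      ```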
      
      Reviewed By: myleott, yqwangustc
      
      Differential Revision: D15908614
      
      Pulled By: nayansinghal
      
      fbshipit-source-id: c92e8e095f012bdb4ef753a3c627fd215afa215d
  16. 24 Jun, 2019 1 commit
  17. 23 Jun, 2019 3 commits
  18. 21 Jun, 2019 2 commits
  19. 20 Jun, 2019 1 commit