1. 08 Jul, 2019 1 commit
    • Integrate torch.nn and fairseq MultiheadAttention (#772) · 6d2e0831
      Guanheng Zhang authored
      Summary:
      Integrate torch.nn and fairseq MultiheadAttention modules. In the future, both libraries will benefit from shared performance optimizations.
      
      Under the following circumstances, the MultiheadAttention calculation will still remain in fairseq:
      1. onnx trace
      2. incremental state
      3. static kv
      
      We plan to gradually migrate those capabilities to PyTorch's core library.
      
      fairseq users can use the attribute self.enable_torch_version to force the calculation to run in either torch or fairseq. We used the following script to verify that both versions yield the same results.
      
      ------------------------------------------------------------------------------------
      ```
      import torch
      from fairseq.modules import MultiheadAttention
      import time
      
      embed_dim = 64
      kv_embed_dim = 1208
      num_heads = 16
      src_len = 20
      tgt_len = 30
      bsz = 10
      
      model = MultiheadAttention(embed_dim, num_heads, kdim=kv_embed_dim, vdim=kv_embed_dim,
                                 bias=True, add_bias_kv=True, add_zero_attn=True)
      
      query = torch.rand((src_len, bsz, embed_dim))
      key = torch.rand((src_len, bsz, kv_embed_dim))
      value = torch.rand((src_len, bsz, kv_embed_dim))
      
      # random 0/1 mask converted to an additive mask: 0 -> -inf, 1 -> 0.0
      attn_mask = torch.randint(0, 2, (src_len, src_len)).float()
      attn_mask.masked_fill_(attn_mask == 0, float('-inf'))
      attn_mask.masked_fill_(attn_mask > 0, float('0.0'))
      
      # repeat the same padding pattern for every sequence in the batch
      seq_mask = torch.randint(0, 2, (1, src_len))
      key_padding_mask = seq_mask.repeat(bsz, 1) == 1
      
      # Apply torch.nn version
      model.enable_torch_version = True
      torch_output, torch_weight = model(query, key, value, key_padding_mask=key_padding_mask, attn_mask=attn_mask)
      
      # Apply fairseq version
      model.enable_torch_version = False
      fairseq_output, fairseq_weight = model(query, key, value, key_padding_mask=key_padding_mask, attn_mask=attn_mask)
      
      print("torch and fairseq generate same results: outputs are same ? ",
            torch.allclose(torch_output, fairseq_output, atol=5e-6, rtol=1e-6),
            ", weights are same ? ",
            torch.allclose(torch_weight, fairseq_weight, atol=5e-6, rtol=1e-6)
      )
      ```
      ------------------------------------------------------------------------------------
      Expected results:
      torch and fairseq generate same results: outputs are same ?  True , weights are same ?  True
      
      ------------------------------------------------------------------------------------
      Similar performance is expected from both versions. Using the following setup, we obtained initial benchmark results:
      
      #########################
      embed_dim = 32
      kv_embed_dim = 32
      num_heads = 4
      src_len = 3
      tgt_len = 2
      bsz = 4
      num_samples = 50000
      
      #########################
      torch-version MultiheadAttention cpu time: 0.46589  ms per iteration.
      fairseq-version MultiheadAttention cpu time: 0.47861  ms per iteration.
      torch-version MultiheadAttention gpu time: 0.82330  ms per iteration.
      fairseq-version MultiheadAttention gpu time: 0.79410  ms per iteration.
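      The per-iteration timings above can be reproduced with a harness along these lines (a hedged sketch; the actual benchmark script is not part of the commit message, and `benchmark` is a hypothetical helper):

```python
import time

def benchmark(fn, num_samples=1000):
    """Return the average wall-clock time of fn() in milliseconds."""
    fn()  # warm-up call so one-time setup cost is excluded
    start = time.perf_counter()
    for _ in range(num_samples):
        fn()
    return (time.perf_counter() - start) / num_samples * 1000.0

# usage: benchmark(lambda: model(query, key, value), num_samples=50000)
```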
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/772
      
      Reviewed By: myleott
      
      Differential Revision: D16108450
      
      Pulled By: zhangguanheng66
      
      fbshipit-source-id: cd2eb5a6eeeab6c274999b7928c2af14fc211565
  2. 06 Jul, 2019 2 commits
  3. 04 Jul, 2019 1 commit
    • support streaming iterator · 5c241c8c
      Spencer Poff authored
      Summary:
      For tasks that involve streaming data directly from an API, we need a simpler epoch iterator.
      
      Also included in this change is support for initializing a dictionary with an arbitrary list of special symbols.
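      The idea can be sketched as follows (a minimal illustration, not fairseq's actual iterator API):

```python
class StreamingEpochIterator:
    """Epoch iterator over a data stream: no len() on the dataset is
    required, so batches can come straight from an API."""

    def __init__(self, make_stream):
        self.make_stream = make_stream  # callable returning a fresh batch iterator
        self.epoch = 0

    def next_epoch_itr(self):
        self.epoch += 1
        return self.make_stream()

# a dictionary could similarly accept an arbitrary list of extra special
# symbols at construction time, e.g. ["<mask>", "<sep>"] (hypothetical usage)
```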
      
      Reviewed By: myleott
      
      Differential Revision: D16110603
      
      fbshipit-source-id: be6d9f680292dec1512614871f9269c95ac84861
  4. 02 Jul, 2019 1 commit
    • add --max-tokens-valid option for validation · bccfddbb
      Xutai Ma authored
      Summary: Add the --max-tokens-valid option. Sometimes a separate max batch-token limit for validation is helpful, for example when the validation set contains a sequence longer than max_tokens (rare in MT, but possible in ASR or AST).
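      As a toy illustration of why a separate validation budget helps, consider a token-budget batcher (a sketch only, not fairseq's actual batching code, which also accounts for padding and --max-sentences):

```python
def batch_by_tokens(lengths, max_tokens):
    """Group sample indices into batches whose summed length stays
    within max_tokens; an over-long sample gets a batch of its own."""
    batches, cur, cur_tokens = [], [], 0
    for i, n in enumerate(lengths):
        if cur and cur_tokens + n > max_tokens:
            batches.append(cur)
            cur, cur_tokens = [], 0
        cur.append(i)
        cur_tokens += n
    if cur:
        batches.append(cur)
    return batches

# with a separate validation budget, a long validation sequence no longer
# forces the training max_tokens upward
```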
      
      Reviewed By: myleott
      
      Differential Revision: D16076951
      
      fbshipit-source-id: ae7f4218594580b9450a8196d7afa1e7e2018aee
  5. 01 Jul, 2019 3 commits
  6. 30 Jun, 2019 4 commits
  7. 27 Jun, 2019 2 commits
    • Update generate.py (#831) · c86d70cc
      Bao-Yu authored
      Summary:
      Repeated use of the loop variable 'i' in evaluation may cause problems.
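      A minimal illustration of the hazard (not the actual generate.py code):

```python
results = []
for i in range(3):           # outer index
    for i in range(10, 12):  # inner loop silently rebinds `i`
        pass
    results.append(i)        # `i` is now the inner loop's last value (11)

# every appended value is 11, not the outer index
```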
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/831
      
      Differential Revision: D15980227
      
      Pulled By: myleott
      
      fbshipit-source-id: 7b6b54a6b54938ad63ed1720d930505b56e5c84b
    • 2/N bmuf · c246df42
      Nayan Singhal authored
      Summary:
      Added BMUF implementation.
      
      Todo:
      1) Add unit test cases for model averaging and BMUF
      2) Add warmup before actually starting to train the model
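      The core BMUF (blockwise model-update filtering, Chen &amp; Huo 2016) step can be sketched in a single process as follows; the fairseq implementation distributes the averaging across workers, and this function is only a hypothetical illustration of the update rule:

```python
def bmuf_step(global_params, avg_params, prev_delta,
              block_momentum=0.9, block_lr=1.0):
    """One block-level update: treat the averaged drift since the last
    sync as a 'block gradient' and smooth it with block momentum."""
    grad = [a - g for a, g in zip(avg_params, global_params)]
    delta = [block_momentum * d + block_lr * gr
             for d, gr in zip(prev_delta, grad)]
    new_global = [g + d for g, d in zip(global_params, delta)]
    return new_global, delta
```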
      
      Reviewed By: jay-mahadeokar
      
      Differential Revision: D15871477
      
      fbshipit-source-id: 866b0aba2d5bea5b65b4438acb49c886c4a87924
  8. 26 Jun, 2019 3 commits
  9. 25 Jun, 2019 1 commit
    • avoid "divided by zero error" in logging_outputs when --use-bmuf is enabled (#812) · b3864b28
      freewym authored
      Summary:
      When doing multi-GPU training with --use-bmuf turned on and --global-sync-iter > 1, each replica may not sync with the other replicas at every iteration, so logging_outputs only contains its own stats. Moreover, logging_outputs may be empty at the end of an epoch after "a dummy iteration", because the number of replicas does not evenly divide the number of batches of training data. When this happens, sample_size and ntokens are 0 on some replica, causing a "divided by zero" error. This fix sets *loss to 0 whenever sample_size/ntokens is 0.
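      The fix amounts to a guard of this shape (a sketch of the idea, not the exact aggregation code):

```python
def safe_average(loss_sum, sample_size):
    # when a replica has no samples after a dummy iteration,
    # report 0 instead of dividing by zero
    if sample_size == 0:
        return 0.0
    return loss_sum / sample_size
```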
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/812
      
      Reviewed By: myleott, yqwangustc
      
      Differential Revision: D15908614
      
      Pulled By: nayansinghal
      
      fbshipit-source-id: c92e8e095f012bdb4ef753a3c627fd215afa215d
  10. 24 Jun, 2019 1 commit
  11. 23 Jun, 2019 3 commits
  12. 21 Jun, 2019 2 commits
  13. 20 Jun, 2019 6 commits
    • Use bert init for xlm_base · 6be5f07c
      Matt Le authored
      Summary:
      Use bert init for xlm_base.  This seems to be much closer to what is done in the [XLM](https://github.com/facebookresearch/XLM/blob/master/src/model/transformer.py#L44) repo.
      
      At update 10 with BERT init (f121471600), loss starts at 14.234
      
      At update 10 without BERT init (f121471612), loss starts at 154.423
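      BERT-style initialization conventionally means N(0, 0.02) weights with zeroed biases; a sketch of such an init hook (assuming the usual convention, not necessarily fairseq's exact code):

```python
import torch.nn as nn

def bert_init(module):
    """BERT-style init: N(0, 0.02) weights for Linear and Embedding
    layers, zero biases, LayerNorm reset to identity."""
    if isinstance(module, (nn.Linear, nn.Embedding)):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
    if isinstance(module, nn.Linear) and module.bias is not None:
        nn.init.zeros_(module.bias)
    if isinstance(module, nn.LayerNorm):
        nn.init.zeros_(module.bias)
        nn.init.ones_(module.weight)

layer = nn.Linear(16, 16)
layer.apply(bert_init)  # weight std is now around 0.02
```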
      
      Reviewed By: liezl200, pipibjc
      
      Differential Revision: D15874836
      
      fbshipit-source-id: f81bf83a078992d7476ba7fdf263b731a9f5b66d
    • v0.7.1: fix PyPI setup and tests · 881381cf
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/818
      
      Differential Revision: D15916265
      
      Pulled By: myleott
      
      fbshipit-source-id: c66c0bd988d3472c4150226952f34ee8d4c3db86
    • Enhanced MMapIndexedDataset: less memory, higher speed (#816) · 9462a819
      davidecaroselli authored
      Summary:
      I have made an upgrade to my previous implementation of MMapIndexedDataset; now:
      - It uses up to **4 times less memory and disk space**
      - Words per second is slightly improved thanks to fewer memory accesses
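      The mechanism can be illustrated with a toy memory-mapped layout: one flat binary data file plus an (offset, length) index, so samples are paged in on demand rather than held in RAM (a sketch only, not fairseq's actual on-disk format):

```python
import numpy as np

def write_dataset(prefix, samples):
    """Write samples as one flat int32 .bin file plus an
    (offset, length) index."""
    offsets, lengths, flat = [], [], []
    pos = 0
    for s in samples:
        offsets.append(pos)
        lengths.append(len(s))
        flat.extend(s)
        pos += len(s)
    np.asarray(flat, dtype=np.int32).tofile(prefix + ".bin")
    np.savez(prefix + ".idx.npz", offsets=offsets, lengths=lengths)

def read_sample(prefix, i):
    """Fetch sample i through np.memmap: only touched pages are read."""
    idx = np.load(prefix + ".idx.npz")
    data = np.memmap(prefix + ".bin", dtype=np.int32, mode="r")
    start = int(idx["offsets"][i])
    return data[start:start + int(idx["lengths"][i])]
```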
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/816
      
      Differential Revision: D15899848
      
      Pulled By: myleott
      
      fbshipit-source-id: 9ddeb4809729ef69cc6b0867b33ee71184d845e6
    • Better explain the inference argument format of multilingual translation · 9c3bb5c6
      Peng-Jen Chen authored
      Summary:
      In https://github.com/pytorch/fairseq/issues/656, people are often confused about how to set multilingual translation parameters at inference time.
      
      This diff adds more checks to ensure that the arguments (`--lang-pairs`, `--encoder-langtok`, `--decoder-langtok`) loaded from the checkpoint are consistent with the arguments specified on the generate/interactive command line.
      We also add a section to the examples page explaining how to set these arguments.
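      The checks behave roughly like this sketch (a hypothetical helper; the real validation lives inside the multilingual translation task):

```python
def check_args_consistent(ckpt_args, cli_args,
                          keys=("lang_pairs", "encoder_langtok",
                                "decoder_langtok")):
    """Raise if an argument saved in the checkpoint differs from the one
    given on the generate/interactive command line."""
    for k in keys:
        if ckpt_args.get(k) != cli_args.get(k):
            raise ValueError(
                "--{} mismatch: checkpoint has {!r}, command line has {!r}"
                .format(k.replace("_", "-"), ckpt_args.get(k),
                        cli_args.get(k)))
```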
      
      Reviewed By: myleott
      
      Differential Revision: D15682169
      
      fbshipit-source-id: 64e6db94cd72ea7ce2d0aa1067c9c2dcd3b8a2ac
    • wav2vec model (#654) · 392fce8a
      alexeib authored
      Summary:
      Merging wav2vec to master. Includes renames (Cpc -> wav2vec) and some light example files.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/654
      
      Differential Revision: D15913409
      
      Pulled By: alexeib
      
      fbshipit-source-id: f723e6f211706cd9431c7d76dc12c4e80c9cfc80
    • v0.7.0 (#817) · bd710e75
      Myle Ott authored
      Summary:
      Notable (possibly breaking) changes:
      - d45db804: Remove checkpoint utility functions from utils.py into checkpoint_utils.py
      - f2563c21: Move LM definitions into separate files
      - dffb1674: Updates to model API:
        - `FairseqModel` -> `FairseqEncoderDecoderModel`
        - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
        - `encoder_out_dict` -> `encoder_out`
        - rm unused `remove_head` functions
      - 34726d56: Move `distributed_init` into `DistributedFairseqModel`
      - cf17068a: Simplify distributed launch by automatically launching multiprocessing on each node for all visible GPUs (allows launching just one job per node instead of one per GPU)
      - d45db804: Change default LR scheduler from `reduce_lr_on_plateau` to `fixed`
      - 96ac28d3: Rename `--sampling-temperature` -> `--temperature`
      - fc1a19a3: Deprecate dummy batches
      - a1c997bd: Add memory mapped datasets
      - 0add50c2: Allow cycling over multiple datasets, where each one becomes an "epoch"
      
      Plus many additional features and bugfixes
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/817
      
      Differential Revision: D15913844
      
      Pulled By: myleott
      
      fbshipit-source-id: d5b5d678efdd9dd3e4d7ca848ddcf1ec2b21bf6b
  14. 19 Jun, 2019 4 commits
  15. 15 Jun, 2019 1 commit
  16. 13 Jun, 2019 1 commit
  17. 12 Jun, 2019 3 commits
    • Add Model Averaging · 6982c404
      Nayan Singhal authored
      Summary:
      Implemented model averaging for fairseq.
      Removed the DDP wrapper when a global optimizer is provided.
      All models are synced based on the iteration interval provided in the input.
      
      TODO:
      1) Fix throughput and wps meter. Need to check other meters too.
      2) Replace Model average code with BMUF algorithm implementation.
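      Plain model averaging, the precursor to the BMUF algorithm mentioned above, can be sketched as follows (a single-process stand-in for what is actually done across workers with torch.distributed):

```python
def average_replicas(replicas):
    """Replace every replica's parameters with the element-wise mean
    across replicas, as done at each sync interval."""
    n = len(replicas)
    mean = [sum(group) / n for group in zip(*replicas)]
    return [list(mean) for _ in replicas]
```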
      
      Reviewed By: myleott
      
      Differential Revision: D15711044
      
      fbshipit-source-id: 58a4af74db2a61d06762597b95836cbeb1ed82cc
    • Add more torch.hub deps · 78c2fcf0
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/801
      
      Differential Revision: D15781975
      
      Pulled By: myleott
      
      fbshipit-source-id: b86276cd3a40138c09494637c43ce52a56c4aced
    • Add missing dependencies to hubconf · 37df862e
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/799
      
      Differential Revision: D15773932
      
      Pulled By: myleott
      
      fbshipit-source-id: 650c0621bedb3b7ecebc0654d8e10d7692c50994
  18. 11 Jun, 2019 1 commit