1. 31 Aug, 2019 1 commit
  2. 30 Aug, 2019 2 commits
    • set numpy seed explicitly + other minor fixes (#850) · 4a7cd582
      alexeib authored
      Summary:
      Not setting the numpy seed explicitly at the beginning was an extremely annoying bug to find. It caused different GPUs to have a different view of the data if some randomization was used in the dataset (e.g. subsample dataset).
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/850
      
      Differential Revision: D17085006
      
      Pulled By: alexeib
      
      fbshipit-source-id: 62bb2116369fb703df878e6bc24c06f1ea4e75a0
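
The failure mode described in the summary can be sketched as follows. This is a minimal illustration, not the fairseq code: `subsample_indices`, the dataset length, and the seed value are all hypothetical, and the two "workers" are simulated in a single process.

```python
import numpy as np


def subsample_indices(dataset_len, ratio, seed=None):
    """Pick a random subset of indices, as a subsampled dataset might.

    Hypothetical helper for illustration; `seed=None` mimics a worker
    that never seeded NumPy and so draws from OS entropy.
    """
    rng = np.random.RandomState(seed)
    size = int(dataset_len * ratio)
    return np.sort(rng.choice(dataset_len, size=size, replace=False))


# Two simulated workers (e.g. one per GPU) seeded with the same value
# see exactly the same subsample.
worker_a = subsample_indices(1000, 0.1, seed=1)
worker_b = subsample_indices(1000, 0.1, seed=1)
same_view = bool(np.array_equal(worker_a, worker_b))

# Without an explicit seed, each worker draws its own subset, so their
# views of the data will almost certainly differ.
unseeded_a = subsample_indices(1000, 0.1)
unseeded_b = subsample_indices(1000, 0.1)
```

Seeding once at startup, before any dataset is built, is what keeps every process on the same subset.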
    • Adopt Contributor Covenant · 8777465b
      Paul O'Shannessy authored
      Summary:
      In order to foster healthy open source communities, we're adopting the
      [Contributor Covenant](https://www.contributor-covenant.org/). It has been
      built by open source community members and represents a shared understanding of
      what is expected from a healthy community.
      
      Reviewed By: josephsavona, danobi, rdzhabarov
      
      Differential Revision: D17104640
      
      fbshipit-source-id: d210000de686c5f0d97d602b50472d5869bc6a49
  3. 29 Aug, 2019 1 commit
  4. 28 Aug, 2019 1 commit
    • use numpy function for filter by size when possible (#845) · 108f94bc
      Naman Goyal authored
      Summary:
      For the general masked language modeling use case, this is much faster (`3 minutes vs 1 sec`).
      
      Let me know what you think about it, myleott. If you don't like all the special-case checking, we can think about reorganizing the dataset APIs to always have `sizes` as a property calculated in `__init__`.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/845
      
      Reviewed By: myleott
      
      Differential Revision: D16993769
      
      Pulled By: myleott
      
      fbshipit-source-id: 161bba62af2965190c07c47e838ee967cb886e88
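
As a rough illustration of why a NumPy path can be so much faster, the sketch below contrasts a per-index Python loop with a single vectorized mask over a precomputed `sizes` array. The function names, the `max_size` parameter, and the sample data are assumptions made for this example and are not taken from the fairseq source.

```python
import numpy as np


def filter_by_size_loop(indices, sizes, max_size):
    """Baseline: check every index one at a time in Python."""
    return [int(i) for i in indices if sizes[i] <= max_size]


def filter_by_size_numpy(indices, sizes, max_size):
    """Vectorized: one boolean mask over the whole `sizes` array."""
    indices = np.asarray(indices)
    return indices[sizes[indices] <= max_size]


sizes = np.array([5, 120, 33, 512, 7, 64])
indices = np.arange(len(sizes))

kept_loop = filter_by_size_loop(indices, sizes, max_size=64)
kept_numpy = filter_by_size_numpy(indices, sizes, max_size=64)
```

Both functions keep the same indices; the vectorized version only applies when the dataset exposes its sizes as an array, which is presumably the reason for the special-case checking mentioned in the summary.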
  5. 27 Aug, 2019 4 commits
  6. 26 Aug, 2019 1 commit
  7. 23 Aug, 2019 3 commits
    • Suppress leaked semaphore warnings · 833f053d
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/844
      
      Differential Revision: D16985131
      
      Pulled By: myleott
      
      fbshipit-source-id: 66ba3b9aa0cdf329a1e38fc09786f34906afdb43
    • Cythonize token block dataset (#834) · 4fc39538
      Naman Goyal authored
      Summary:
      Cythonized the token block dataset code; it's `> 100x` faster. Token block construction for the entire `bookwiki+CC+stories+openweb` takes just ~`39.9` seconds.
      
      TODO:
      1) I think I can make it 2x faster.
      2) cleanup.
      
      EDIT History:
      ~~First pass at parallelizing `token_block_dataset`. The code feels somewhat complicated and cluttered.
      It is 2-3x faster, though, in my tests on the `bookwiki` dataset with both `complete` and `complete_doc` modes.
      myleott, can you take a look for correctness? I am still not 100% sure that I am not missing corner cases.~~
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/834
      
      Test Plan:
      Imported from GitHub, without a `Test Plan:` line.
      
      Test workflow: f133816198
      
      Reviewed By: myleott
      
      Differential Revision: D16970257
      
      Pulled By: myleott
      
      fbshipit-source-id: ec45a308193c9e9f3e7075336c15df4723228d6f
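
For readers unfamiliar with token blocks, the following pure-Python sketch shows one plausible reading of a `complete`-style mode: whole sentences are packed into blocks of at most `block_size` tokens. It is an assumption-laden illustration, not the Cython code from this commit; `complete_blocks` and the sample sizes are hypothetical.

```python
def complete_blocks(sizes, block_size):
    """Group consecutive sentences into blocks of at most `block_size` tokens.

    Returns (start, end) token offsets into the concatenated stream.
    A sentence longer than `block_size` gets a block of its own.
    """
    blocks = []
    start = 0   # token offset where the current block begins
    length = 0  # tokens accumulated in the current block
    for size in sizes:
        # Close the current block if adding this sentence would overflow it.
        if length > 0 and length + size > block_size:
            blocks.append((start, start + length))
            start += length
            length = 0
        length += size
    if length > 0:
        blocks.append((start, start + length))
    return blocks


sentence_sizes = [3, 4, 2, 6, 1]
blocks = complete_blocks(sentence_sizes, block_size=8)
```

A tight per-sentence loop like this is slow in the Python interpreter over a large corpus, which is the kind of code that tends to benefit from being compiled with Cython.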
    • wav2vec everstore support · 6e2bd794
      Alexei Baevski authored
      Summary: Changes for internal support.
      
      Differential Revision: D16646887
      
      fbshipit-source-id: ac5bf6c32901819726249422324eae32a0a6e148
  8. 22 Aug, 2019 3 commits
  9. 21 Aug, 2019 4 commits
    • fix string format to work in python 3.5 (#1050) · 93057cc0
      Trinkle23897 authored
      Summary:
      Change the string format in fairseq/data/subsample_dataset.py#20.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1050
      
      Differential Revision: D16946060
      
      Pulled By: okhonko
      
      fbshipit-source-id: 0eabf22e7ffd4f658b6d18c87dc6e59c81a355c7
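
The commit message does not show the changed line. Assuming the original used an f-string (formatted string literals were only added in Python 3.6), a `str.format` call gives the same output on Python 3.5. The variable names and message text below are hypothetical.

```python
# Hypothetical values standing in for whatever the dataset logs.
actual_size = 250
total_size = 1000

# Python 3.6+ only: f-strings are a syntax error on Python 3.5.
# message = f"subsampled {actual_size} / {total_size} examples"

# Works on Python 3.5 and later: str.format with positional fields.
message = "subsampled {} / {} examples ({:.1%})".format(
    actual_size, total_size, actual_size / total_size
)
```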
    • Parameterized criterions (#808) · ba5f829f
      Jeff Cai authored
      Summary:
      Support criterions with parameters, such as the AutoSegmentationCriterion (ASG) used in wav2letter, which has a transition matrix parameter. This is needed to integrate wav2letter's ASG into PySpeech.
      
      With this diff, parameters in criterions will be:
      (1) updated by optimizers, with a configurable learning rate
      (2) saved and loaded from checkpoints, preserving backward compatibility for criterions without parameters
      (3) synchronized across nodes in distributed training.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/808
      
      Reviewed By: jcai1
      
      Differential Revision: D16934097
      
      Pulled By: okhonko
      
      fbshipit-source-id: 121ec9382459385c6f9cbef3a8274bec1a434038
    • Multiset (#838) · a2f5361d
      alexeib authored
      Summary:
      Adds the ability to tag individual examples with the names of their datasets, along with some minor miscellaneous fixes and improvements.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/838
      
      Differential Revision: D16919175
      
      Pulled By: alexeib
      
      fbshipit-source-id: 4bf493299645bae63f3ee6382e15f18a9f73666c
    • vggblock support without pooling and pooling_kernel_size missing self (#839) · 7a31fe06
      Siddharth Dalmia authored
      Summary:
      1) VggBlock was not supported if the pooling kernel size was None.
      2) Since we modify the pooling kernel size with _pair, we should use self.pooling_kernel_size. But I agree it doesn't matter, as PyTorch is robust to this.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/839
      
      Differential Revision: D16934112
      
      Pulled By: okhonko
      
      fbshipit-source-id: b6b95163b0e7f7203d76d535f01a41912382bdc3
  10. 20 Aug, 2019 2 commits
  11. 19 Aug, 2019 6 commits
  12. 17 Aug, 2019 1 commit
  13. 16 Aug, 2019 2 commits
  14. 15 Aug, 2019 5 commits
  15. 14 Aug, 2019 4 commits
    • initial light and dynamic convolution kernels (#547) · f840564d
      Nathan Ng authored
      Summary:
      CUDA code for the light/dynamicconv kernels, including PyTorch modules. The modules can be built by running setup.py in each respective folder and can then be imported and used like any other module.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/547
      
      Reviewed By: myleott, shubho
      
      Differential Revision: D15703660
      
      Pulled By: nng555
      
      fbshipit-source-id: e9c913753be3a1cd571965f7200df6678b644520
    • Update READMEs · b8704686
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/823
      
      Differential Revision: D16804995
      
      Pulled By: myleott
      
      fbshipit-source-id: abac5dc0ed6b7bfe2309ba273456e54b37340b2c
    • v0.7.2 -> v0.8.0 (#1017) · ffffe04e
      Myle Ott authored
      Summary:
      Changelog:
      - Relicensed under MIT license
      - Add RoBERTa
      - Add wav2vec
      - Add WMT'19 models
      - Add initial ASR code
      - Changed torch.hub interface (`generate` renamed to `translate`)
      - Add `--tokenizer` and `--bpe`
      - `f812e529`: Renamed data.transforms -> data.encoders
      - `654affc0`: New Dataset API (optional)
      - `47fd9852`: Deprecate old Masked LM components
      - `5f78106a`: Set mmap as default dataset format and infer format automatically
      - Misc fixes for sampling
      - Misc fixes to support PyTorch 1.2
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1017
      
      Differential Revision: D16799880
      
      Pulled By: myleott
      
      fbshipit-source-id: 45ad8bc531724a53063cbc24ca1c93f715cdc5a7
    • Fix tests · 7c89e13f
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/822
      
      Differential Revision: D16800078
      
      Pulled By: myleott
      
      fbshipit-source-id: b86e08e01f2fe13c64b77f1d23a5f6800f252bf7