  1. 28 Aug, 2019 1 commit
    • use numpy function for filter by size when possible (#845) · 108f94bc
      Naman Goyal authored
      Summary:
      For the general masked language modeling use case, this is much faster (`3 minutes` vs. `1 sec`).
      
      myleott, let me know what you think. If you don't like all the special-case checking, we can consider reorganizing the dataset APIs to always have `sizes` as a property calculated in `__init__`.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/845
      
      Reviewed By: myleott
      
      Differential Revision: D16993769
      
      Pulled By: myleott
      
      fbshipit-source-id: 161bba62af2965190c07c47e838ee967cb886e88
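The speedup described in this commit comes from replacing a per-example Python loop with one vectorized numpy operation. The following is a minimal sketch of that idea, not the code from the pull request: the function names and the `max_size` parameter are hypothetical, and it assumes the dataset exposes a precomputed numpy `sizes` array.

```python
import numpy as np


def filter_by_size_loop(indices, sizes, max_size):
    """Baseline: one Python-level size lookup and comparison per example."""
    return [int(i) for i in indices if sizes[i] <= max_size]


def filter_by_size_numpy(indices, sizes, max_size):
    """Vectorized: build one boolean mask over the precomputed sizes array."""
    indices = np.asarray(indices)
    return indices[sizes[indices] <= max_size]


# Hypothetical example data: token counts for five examples.
sizes = np.array([5, 12, 7, 30, 3])
indices = np.arange(len(sizes))

kept = filter_by_size_numpy(indices, sizes, max_size=10)
```

The vectorized path only applies when `sizes` is available as an array up front, which is presumably why the commit message mentions special-case checking.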
  2. 27 Aug, 2019 4 commits
  3. 26 Aug, 2019 1 commit
  4. 23 Aug, 2019 3 commits
    • Suppress leaked semaphore warnings · 833f053d
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/844
      
      Differential Revision: D16985131
      
      Pulled By: myleott
      
      fbshipit-source-id: 66ba3b9aa0cdf329a1e38fc09786f34906afdb43
    • Cythonize token block dataset (#834) · 4fc39538
      Naman Goyal authored
      Summary:
      Cythonized the token block dataset code; it's `> 100x` faster. Building token blocks for the entire `bookwiki+CC+stories+openweb` dataset takes just ~`39.9` seconds.
      
      TODO:
      1) I think I can make it another 2x faster.
      2) Clean up.
      
      EDIT History:
      ~~First pass at parallelizing `token_block_dataset`. The code feels somewhat complicated and cluttered.
      This is 2-3x faster, though, in my tests on the `bookwiki` dataset with both `complete` and `complete_doc` modes.
      myleott, can you take a look for correctness? I am still not 100% sure that I am not missing corner cases.~~
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/834
      
      Test Plan:
      Imported from GitHub, without a `Test Plan:` line.
      
      Test workflow: f133816198
      
      Reviewed By: myleott
      
      Differential Revision: D16970257
      
      Pulled By: myleott
      
      fbshipit-source-id: ec45a308193c9e9f3e7075336c15df4723228d6f
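For context, the loop this commit moves to Cython is the kind that walks over every sentence size to compute block boundaries. The sketch below is a simplified pure-Python illustration of a `complete`-style mode (blocks never split a sentence); it is an assumption about the general technique, not the fairseq implementation, and the function name is hypothetical. In this sketch a sentence longer than `block_size` simply ends up in its own oversized block.

```python
def complete_mode_blocks(sizes, block_size):
    """Group consecutive sentences into blocks of at most block_size tokens.

    Returns a list of (start, end) token offsets into the flattened corpus.
    """
    blocks = []
    start = 0   # token offset where the current block begins
    length = 0  # number of tokens accumulated in the current block
    for size in sizes:
        # Close the current block if adding this sentence would overflow it.
        if length > 0 and length + size > block_size:
            blocks.append((start, start + length))
            start += length
            length = 0
        length += size
    # Flush the final, possibly shorter, block.
    if length > 0:
        blocks.append((start, start + length))
    return blocks


blocks = complete_mode_blocks([4, 3, 5, 2, 6], block_size=8)
# blocks == [(0, 7), (7, 14), (14, 20)]
```

A tight integer loop like this runs once per sentence in the corpus, which is why compiling it tends to pay off on large datasets.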
    • wav2vec everstore support · 6e2bd794
      Alexei Baevski authored
      Summary: changes for internal support
      
      Differential Revision: D16646887
      
      fbshipit-source-id: ac5bf6c32901819726249422324eae32a0a6e148
  5. 22 Aug, 2019 3 commits
  6. 21 Aug, 2019 4 commits
    • fix string format to work in python 3.5 (#1050) · 93057cc0
      Trinkle23897 authored
      Summary:
      Change the string format in fairseq/data/subsample_dataset.py#20
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1050
      
      Differential Revision: D16946060
      
      Pulled By: okhonko
      
      fbshipit-source-id: 0eabf22e7ffd4f658b6d18c87dc6e59c81a355c7
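The commit message does not show the changed line. A common cause of this kind of incompatibility is an f-string, which is a syntax error before Python 3.6; the sketch below assumes that was the issue and shows the portable `str.format` form. The function name and message text are hypothetical.

```python
def describe_subsample(original_size, actual_size, ratio):
    # An f-string such as f"... {original_size} ..." needs Python 3.6+;
    # str.format also works on Python 3.5.
    return "subsampled dataset from {} to {} (ratio={})".format(
        original_size, actual_size, ratio
    )


message = describe_subsample(100, 25, 0.25)
```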
    • Parameterized criterions (#808) · ba5f829f
      Jeff Cai authored
      Summary:
      Support criterions with parameters, such as the AutoSegmentationCriterion (ASG) used in wav2letter, which has a transition matrix parameter. This is needed to integrate wav2letter's ASG into PySpeech.
      
      With this diff, parameters in criterions will be:
      (1) updated by optimizers, with a configurable learning rate
      (2) saved and loaded from checkpoints, preserving backward compatibility for criterions without parameters
      (3) synchronized across nodes in distributed training.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/808
      
      Reviewed By: jcai1
      
      Differential Revision: D16934097
      
      Pulled By: okhonko
      
      fbshipit-source-id: 121ec9382459385c6f9cbef3a8274bec1a434038
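Points (1) and (2) above can be illustrated in plain PyTorch. This is a minimal sketch under stated assumptions, not the fairseq trainer code and not a real ASG loss: `ToyParameterizedCriterion` is a hypothetical criterion with a learnable transition matrix, and the learning rates and checkpoint layout are made up for illustration. Point (3), synchronization across nodes, is not shown.

```python
import torch
import torch.nn as nn


class ToyParameterizedCriterion(nn.Module):
    """Hypothetical criterion that owns a learnable transition matrix."""

    def __init__(self, num_labels):
        super().__init__()
        self.transitions = nn.Parameter(torch.zeros(num_labels, num_labels))

    def forward(self, emissions, targets):
        # emissions: (T, num_labels); targets: (T,) label indices.
        emission_score = emissions.gather(1, targets.unsqueeze(1)).sum()
        transition_score = self.transitions[targets[:-1], targets[1:]].sum()
        # Toy loss: negative unnormalized score of the target path.
        return -(emission_score + transition_score)


model = nn.Linear(4, 3)
criterion = ToyParameterizedCriterion(num_labels=3)

# (1) Criterion parameters get their own group with a separate learning rate.
optimizer = torch.optim.SGD(
    [
        {"params": model.parameters(), "lr": 0.1},
        {"params": criterion.parameters(), "lr": 0.01},
    ]
)

features = torch.randn(5, 4)
targets = torch.tensor([0, 1, 1, 2, 0])
loss = criterion(model(features), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# (2) The criterion state is stored next to the model state in the checkpoint.
checkpoint = {"model": model.state_dict(), "criterion": criterion.state_dict()}
```

When loading an older checkpoint that has no `"criterion"` entry, a loader could simply skip that step, which is one way to keep backward compatibility for criterions without parameters.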
    • Multiset (#838) · a2f5361d
      alexeib authored
      Summary:
      Adds the ability to tag individual examples with the names of their datasets, along with some minor miscellaneous fixes and improvements.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/838
      
      Differential Revision: D16919175
      
      Pulled By: alexeib
      
      fbshipit-source-id: 4bf493299645bae63f3ee6382e15f18a9f73666c
    • vggblock support without pooling and pooling_kernel_size missing self (#839) · 7a31fe06
      Siddharth Dalmia authored
      Summary:
      1) VggBlock was not supported if the pooling kernel size was None.
      2) Since we modify the pooling kernel size using `_pair`, we should use `self.pooling_kernel_size`. It doesn't matter in practice, though, as PyTorch is robust to this.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/839
      
      Differential Revision: D16934112
      
      Pulled By: okhonko
      
      fbshipit-source-id: b6b95163b0e7f7203d76d535f01a41912382bdc3
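The two fixes can be sketched with a reduced block. This is an illustrative assumption, not the fairseq `VGGBlock`: `TinyVggBlock` is a hypothetical class with a single convolution, and it only shows how to treat `pooling_kernel_size=None` as "no pooling" and how to use the normalized `self.pooling_kernel_size` when building the pooling layer.

```python
import torch
import torch.nn as nn
from torch.nn.modules.utils import _pair


class TinyVggBlock(nn.Module):
    """Hypothetical reduced block: one conv + ReLU, optional max pooling."""

    def __init__(self, in_channels, out_channels, conv_kernel_size,
                 pooling_kernel_size=None):
        super().__init__()
        self.conv_kernel_size = _pair(conv_kernel_size)
        # Handle None before normalizing, so "no pooling" stays None.
        self.pooling_kernel_size = (
            _pair(pooling_kernel_size) if pooling_kernel_size is not None else None
        )
        padding = tuple(k // 2 for k in self.conv_kernel_size)
        layers = [
            nn.Conv2d(in_channels, out_channels, self.conv_kernel_size,
                      padding=padding),
            nn.ReLU(),
        ]
        if self.pooling_kernel_size is not None:
            # Use the normalized attribute, not the raw constructor argument.
            layers.append(
                nn.MaxPool2d(kernel_size=self.pooling_kernel_size, ceil_mode=True)
            )
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


x = torch.randn(2, 1, 8, 8)
no_pool_shape = tuple(TinyVggBlock(1, 4, 3)(x).shape)
pooled_shape = tuple(TinyVggBlock(1, 4, 3, pooling_kernel_size=2)(x).shape)
```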
  7. 20 Aug, 2019 2 commits
  8. 19 Aug, 2019 6 commits
  9. 17 Aug, 2019 1 commit
  10. 16 Aug, 2019 2 commits
  11. 15 Aug, 2019 5 commits
  12. 14 Aug, 2019 5 commits
  13. 13 Aug, 2019 3 commits