1. 10 May, 2023 2 commits
  2. 09 May, 2023 6 commits
  3. 05 May, 2023 6 commits
    • fix doc of specaugment transform (#3314) · a8dc4de5
      Xiaohui Zhang authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3314
      
      Reviewed By: nateanl
      
      Differential Revision: D45621958
      
      Pulled By: xiaohui-zhang
      
      fbshipit-source-id: 17555a865790adadc2abd40a86571596386a12fc
    • Update squim tutorial (#3313) · 05ef7dc6
      Zhaoheng Ni authored
      Summary:
      Add scatter plots for STOI, PESQ, Si-SDR, and MOS scores to demonstrate the performance of `SquimObjective` and `SquimSubjective` models and how close they are to the ground truths.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3313
      
      Reviewed By: hwangjeff
      
      Differential Revision: D45620311
      
      Pulled By: nateanl
      
      fbshipit-source-id: cb58ffd3744df4749b9385876da8de0cffd93557
    • Add SpecAugment transform (#3309) · 82febc59
      Xiaohui Zhang authored
      Summary:
      (2/2 of the previous https://github.com/pytorch/audio/pull/2360 which I accidentally closed)
      
      The previous way of doing SpecAugment via the Frequency/TimeMasking transforms has the following problems:
      - Only zero masking is possible; masking with the mean value is not supported.
      - mask_along_axis is hard-coded to mask the 1st dimension, and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
      - For 3D spectrogram tensors whose first dimension is batch or channel, all items in the batch (or all channels) have to share the same mask, because the hard-coding above means mask_along_axis_iid only supports 4D tensors.
      - For 2D spectrogram tensors without a batch or channel dimension, time/frequency masking can't be applied at all, because the hard-coding above means mask_along_axis only supports 3D tensors.
      - Applying multiple time/frequency masks is not straightforward with the current design. Getting N masks along the time or frequency axis requires applying N Frequency/TimeMasking transforms to the input tensors in sequence, which makes for an inconvenient API. A separate SpecAugment transform is needed to handle this.
      
      To solve these issues, we:
      - [done in the previous [PR](https://github.com/pytorch/audio/pull/3289)] Extend mask_along_axis_iid to support 3D+ tensors and mask_along_axis to support 2D+ tensors. Both can now mask either of the last two dimensions (where the time and frequency dimensions live) of the input tensor.
      - [done in this PR] Introduce the SpecAugment transform.
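
The idea of a single transform that applies several time and frequency masks, filled with either zero or the mean, can be sketched in plain Python. This is only an illustration under stated assumptions, not torchaudio's implementation: the function names, the `fill` argument, and the nested-list spectrogram layout (frequency rows by time columns) are all hypothetical, and the mean is taken over the whole unmasked input.

```python
import random


def apply_masks(spec, n_masks, max_width, axis, value, rng):
    """Return a copy of a 2D spectrogram (freq x time lists) with n_masks bands overwritten.

    axis=0 masks frequency bins (rows); axis=1 masks time frames (columns).
    Each band has a random width in [0, max_width] and a random start.
    """
    size = len(spec) if axis == 0 else len(spec[0])
    out = [list(row) for row in spec]  # copy so the input is not mutated
    for _ in range(n_masks):
        width = rng.randint(0, min(max_width, size))
        start = rng.randint(0, size - width)
        for f, row in enumerate(out):
            for t in range(len(row)):
                idx = f if axis == 0 else t
                if start <= idx < start + width:
                    row[t] = value
    return out


def spec_augment(spec, n_time_masks, time_mask_param,
                 n_freq_masks, freq_mask_param, fill="mean", rng=None):
    """Apply several time masks, then several frequency masks, in one call."""
    rng = rng or random.Random()
    if fill == "mean":
        # Assumption: the fill value is the mean of the whole unmasked input.
        value = sum(map(sum, spec)) / (len(spec) * len(spec[0]))
    else:
        value = 0.0
    out = apply_masks(spec, n_time_masks, time_mask_param, 1, value, rng)
    return apply_masks(out, n_freq_masks, freq_mask_param, 0, value, rng)
```

A single call such as `spec_augment(spec, 2, 10, 2, 5)` then stands in for chaining four separate masking transforms.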
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3309
      
      Reviewed By: nateanl
      
      Differential Revision: D45592926
      
      Pulled By: xiaohui-zhang
      
      fbshipit-source-id: 97cd686dbb6c1c6ff604716b71a876e616aaf1a2
    • Fix missing PTS initialization with NVIDIA encoder (#3312) · 1e3af12f
      huyao authored
      Summary:
      Fix **Failed to write packet (Invalid argument)** error when encoding FLV video streams using NVIDIA hardware encoders.
      
      Resolves https://github.com/pytorch/audio/issues/3311
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3312
      
      Reviewed By: nateanl
      
      Differential Revision: D45611656
      
      Pulled By: mthrok
      
      fbshipit-source-id: 531a83a27d3b19ed9e9aedd161769c60aa0bd175
    • Fix doc version (#3310) · bfb47017
      moto authored
      Summary:
      Fixes a regression caused by migrating the build_doc job to GHA, after which the version number was not set properly.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3310
      
      Reviewed By: nateanl
      
      Differential Revision: D45607829
      
      Pulled By: mthrok
      
      fbshipit-source-id: 3450a38fa6982fcc56676a80144e9eed1aad02ec
    • Fix MKL issue on Intel mac build (#3307) · 3e897ca7
      moto authored
      Summary:
      * Remove MKL and NumPy from the Conda build env
      * Remove the `caffe2::mkl` dependency from `torch_cpu`, which introduced an unnecessary and undesired dependency on Intel mac.
      
      TorchAudio does not use BLAS libraries directly, so all mentions of MKL should be removed from the codebase.
      However, this was causing an issue on Intel mac. It turned out that the `torch_cpu` target pulls in the `caffe2::mkl` dependency, and the linker on macOS keeps a library dependency even if no symbol from that library is used. This stray MKL dependency should be fixed on the core side, but in the meantime we can modify the target to remove it.
      
      NumPy is also not needed at build time or run time, so it is removed as well.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3307
      
      Reviewed By: atalman
      
      Differential Revision: D45606944
      
      Pulled By: mthrok
      
      fbshipit-source-id: 853411ccbbca31796b808a2b052b4cfa564718cd
  4. 04 May, 2023 3 commits
    • Add older mkl build constraint only (#3302) · 1e48af06
      atalman authored
      Summary:
      Similar to what we used to have here:
      https://github.com/pytorch/test-infra/pull/3896/files
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3302
      
      Reviewed By: nateanl
      
      Differential Revision: D45574845
      
      Pulled By: atalman
      
      fbshipit-source-id: 142c35dfd811a5f5c170dcd082bec8d055edd9cb
    • Add mkl dependency to torchaudio MacOS x86 builds (#3300) · b5795943
      atalman authored
      Summary:
      Add mkl dependency to torchaudio MacOS x86 builds
      
      Already tested here: https://github.com/pytorch/audio/actions/runs/4878179835/jobs/8703586137
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3300
      
      Reviewed By: jeanschmidt, mthrok
      
      Differential Revision: D45566352
      
      Pulled By: atalman
      
      fbshipit-source-id: a0376016506891240b2dd03d4fa4889028bf764b
    • Extend mask_along_axis{,_iid} (#3289) · 74bd971a
      Xiaohui Zhang authored
      Summary:
      (1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360) which I accidentally closed)
      
      The previous way of doing SpecAugment via the Frequency/TimeMasking transforms has the following problems:
      - Only zero masking is possible; masking with the mean value is not supported.
      - mask_along_axis is hard-coded to mask the 1st dimension, and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
      - For 3D spectrogram tensors whose first dimension is batch or channel, all items in the batch (or all channels) have to share the same mask, because the hard-coding above means mask_along_axis_iid only supports 4D tensors.
      - For 2D spectrogram tensors without a batch or channel dimension, time/frequency masking can't be applied at all, because the hard-coding above means mask_along_axis only supports 3D tensors.
      - Applying multiple time/frequency masks is not straightforward with the current design.
      
      To solve these issues, we:
      - Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Both can now mask either of the last two dimensions (where the time and frequency dimensions live) of the input tensor.
      
      The introduction of SpecAugment transform will be done in another PR.
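
Masking one of the last two dimensions, for inputs with or without a leading batch/channel dimension, might look roughly like the following. This is a hypothetical pure-Python sketch using nested lists, not the torchaudio functions themselves: the name `mask_last_dims`, its `iid` flag, and the way the band is drawn are assumptions made for illustration, and all batch items are assumed to have the same shape.

```python
import random


def mask_last_dims(spec, mask_param, axis, value=0.0, iid=False, rng=None):
    """Mask a random band along one of the last two dimensions.

    spec is a nested list shaped (freq, time) or (batch, freq, time).
    axis is -2 (frequency) or -1 (time). With iid=True each batch item
    draws its own band; otherwise one band is shared by all items.
    """
    if axis not in (-2, -1):
        raise ValueError("axis must be -2 or -1")
    rng = rng or random.Random()
    is_2d = not isinstance(spec[0][0], list)
    batch = [spec] if is_2d else spec  # treat 2D input as a batch of one

    def size_of(item):
        return len(item) if axis == -2 else len(item[0])

    def draw(size):
        width = rng.randint(0, min(mask_param, size))
        start = rng.randint(0, size - width)
        return start, start + width

    shared = None if iid else draw(size_of(batch[0]))
    out = []
    for item in batch:
        start, end = draw(size_of(item)) if iid else shared
        out.append([
            [value if start <= (f if axis == -2 else t) < end else x
             for t, x in enumerate(row)]
            for f, row in enumerate(item)
        ])
    return out[0] if is_2d else out
```

Indexing the axis from the end (`-2`, `-1`) is what lets the same code serve 2D and 3D inputs; hard-coding a position counted from the front is the limitation described above.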
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3289
      
      Reviewed By: hwangjeff
      
      Differential Revision: D45460357
      
      Pulled By: xiaohui-zhang
      
      fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3
  5. 03 May, 2023 4 commits
  6. 02 May, 2023 3 commits
  7. 01 May, 2023 2 commits
  8. 29 Apr, 2023 1 commit
  9. 28 Apr, 2023 1 commit
    • Add cuctc decoder (#3096) · 0a1801ed
      Yuekai Zhang authored
      Summary:
      This PR implements a CUDA-based CTC prefix beam search decoder.
      
      Several benchmark results on a V100 are attached below:
      | decoder type | model | dataset | decoding time (s) | beam size | batch size | model unit | subsampling factor | vocab size |
      |--------------|-------|---------|-------------------|-----------|------------|------------|--------------------|------------|
      | cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
      | cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
      | cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
      | cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |
      
      Note:
      1. The design parallelizes computation across the batch and vocab axes and loops over the frame axis. Compared with CPU implementations, it is therefore better suited to shorter sequence lengths and larger vocab sizes.
      2. WER is the same as with CPU implementations. However, decoding with an LM is not supported yet.
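
For readers unfamiliar with the algorithm, CTC prefix beam search can be sketched on the CPU in a few lines of Python. This is a generic textbook-style sketch and not the CUDA decoder added in this PR: the function name, the list-of-lists input, and the log-space bookkeeping are assumptions made for illustration. Its outer loop over frames is the sequential axis; the inner loops over prefixes and tokens are the kind of work a GPU implementation can spread across threads.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs, beam_size, blank=0):
    """log_probs: T x V per-frame log-probabilities.

    Returns [(prefix_tuple, log_score), ...] sorted best first.
    """
    # Each prefix keeps (log p of paths ending in blank, log p ending in non-blank).
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:  # sequential over frames
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for token, lp in enumerate(frame):
                if token == blank:
                    # A blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                elif prefix and token == prefix[-1]:
                    # Repeat with no blank in between collapses into the same prefix.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b, logsumexp(nb, p_nb + lp))
                    # Repeat after a blank extends the prefix.
                    ext = prefix + (token,)
                    b, nb = nxt[ext]
                    nxt[ext] = (b, logsumexp(nb, p_b + lp))
                else:
                    ext = prefix + (token,)
                    b, nb = nxt[ext]
                    nxt[ext] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])  # prune to the best beam_size prefixes
    return [(prefix, logsumexp(p_b, p_nb)) for prefix, (p_b, p_nb) in beams.items()]
```

Because the scores of all alignments that collapse to the same prefix are merged, the result can differ from greedy best-path decoding, which only follows the single most likely alignment.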
      
      Resolves: https://github.com/pytorch/audio/issues/2957.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3096
      
      Reviewed By: nateanl
      
      Differential Revision: D44709397
      
      Pulled By: mthrok
      
      fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
  10. 25 Apr, 2023 1 commit
  11. 19 Apr, 2023 2 commits
  12. 18 Apr, 2023 1 commit
  13. 12 Apr, 2023 3 commits
  14. 11 Apr, 2023 2 commits
  15. 10 Apr, 2023 3 commits