1. 22 May, 2023 3 commits
  2. 21 May, 2023 2 commits
  3. 20 May, 2023 1 commit
  4. 19 May, 2023 1 commit
  5. 17 May, 2023 4 commits
    • Improve the performance of YUV420P frame conversion (#3342) · 72d3fe09
      moto authored
      Summary:
      This commit improves the performance of converting YUV420P frames from AVFrame to torch Tensor.
      
      It changes two things:
      1. Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
      2. Get rid of the intermediate UV plane copy.
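
As a rough illustration of change 1, the sketch below upsamples a half-resolution chroma plane by copying each sample into a 2x2 block, which is what nearest-neighbor interpolation with a scale factor of 2 produces. The actual change is in torchaudio's C++ code; the pure-Python function `upsample_nearest_2x` and the list-of-lists plane layout are assumptions made only for illustration.

```python
def upsample_nearest_2x(plane):
    """Nearest-neighbor 2x upsampling of a 2D plane by plain data copy.

    `plane` is a list of rows (hypothetical layout, for illustration only).
    """
    out = []
    for row in plane:
        # Duplicate every sample horizontally: [a, b] -> [a, a, b, b]
        wide = [value for value in row for _ in range(2)]
        # Duplicate the widened row vertically.
        out.append(wide)
        out.append(list(wide))
    return out


# A 2x2 chroma plane (as in YUV420P, where U and V are subsampled by 2
# in both directions) becomes a 4x4 plane matching the luma resolution.
u_plane = [[1, 2],
           [3, 4]]
u_full = upsample_nearest_2x(u_plane)
```

No interpolation arithmetic is involved, only copies, which is consistent with the observation that the manual copy is faster and less variable than the general-purpose `interpolate` call.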
      
      The following compares the time it takes to process 30 seconds of YUV420P frames at 25 FPS with a resolution of 320x240. The measured times in each column are sorted in ascending order.
      
      Some observations:
      * `torch::nn::functional::interpolate` with the `torch::kNearest` option is not as fast as copying data manually.
      * Switching from `interpolate` to manual data copy reduces the variance.
      
      run | main | 1 | 1+2 | improvement (from main to 1+2)
      -- | -- | -- | -- | --
      1 | 0.452250583 | 0.417490125 | 0.40155375 | 11.21%
      2 | 0.462039958 | 0.42006675 | 0.401764125 | 13.05%
      3 | 0.463067666 | 0.42416 | 0.402651334 | 13.05%
      4 | 0.464228166 | 0.424545458 | 0.402985667 | 13.19%
      5 | 0.465777375 | 0.425629208 | 0.405604625 | 12.92%
      6 | 0.469628666 | 0.427044333 | 0.40628525 | 13.49%
      7 | 0.475935125 | 0.42805875 | 0.406412167 | 14.61%
      8 | 0.482277667 | 0.429921209 | 0.407279 | 15.55%
      9 | 0.496695208 | 0.431182792 | 0.442013791 | 11.01%
      10 | 0.546653625 | 0.541639584 | 0.4711585 | 13.81%
      
      (All times are in seconds.)
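
The improvement column appears to be the relative reduction in elapsed time, (main - new) / main. This reading is inferred from the numbers rather than stated in the commit, and the helper name `improvement_percent` is hypothetical. A small check against run 1 of the table above:

```python
def improvement_percent(baseline, new):
    """Relative reduction in elapsed time, as a percentage of the baseline."""
    return (baseline - new) / baseline * 100


# Run 1 of the 320x240 table: "main" vs. "1+2"
run1 = improvement_percent(0.452250583, 0.40155375)
print(f"{run1:.2f}%")  # prints "11.21%", matching the table
```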
      
      With a higher resolution, the improvement is smaller but still consistent.
      
      run | main | 1+2 | improvement
      -- | -- | -- | --
      1 | 4.032393 | 3.991784667 | 1.01%
      2 | 4.052248084 | 3.992672208 | 1.47%
      3 | 4.07705575 | 4.000541666 | 1.88%
      4 | 4.143954792 | 4.020671584 | 2.98%
      5 | 4.170711959 | 4.025753125 | 3.48%
      6 | 4.240229292 | 4.045504875 | 4.59%
      7 | 4.267384042 | 4.045588125 | 5.20%
      8 | 4.277025958 | 4.061980083 | 5.03%
      9 | 4.312192042 | 4.163251959 | 3.45%
      10 | 4.406109875 | 4.312560334 | 2.12%
      
      <details><summary>code</summary>
      
      ```python
      import time
      
      from torchaudio.io import StreamReader
      
      def test():
          r = StreamReader(src="testsrc=duration=30", format="lavfi")
          # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
          r.add_video_stream(-1, filter_desc="format=yuv420p")
          t0 = time.monotonic()
          r.process_all_packets()
          elapsed = time.monotonic() - t0
          print(elapsed)
      
      for _ in range(10):
          test()
      ```
      </details>
      
      <details><summary>env</summary>
      
      ```
      PyTorch version: 2.1.0.dev20230325
      Is debug build: False
      CUDA used to build PyTorch: None
      ROCM used to build PyTorch: N/A
      
      OS: macOS 13.3.1 (arm64)
      GCC version: Could not collect
      Clang version: 14.0.6
      CMake version: version 3.22.1
      Libc version: N/A
      
      Python version: 3.9.16 (main, Mar  8 2023, 04:29:24)  [Clang 14.0.6 ] (64-bit runtime)
      Python platform: macOS-13.3.1-arm64-arm-64bit
      Is CUDA available: False
      CUDA runtime version: No CUDA
      CUDA_MODULE_LOADING set to: N/A
      GPU models and configuration: No CUDA
      Nvidia driver version: No CUDA
      cuDNN version: No CUDA
      HIP runtime version: N/A
      MIOpen runtime version: N/A
      Is XNNPACK available: True
      
      CPU:
      Apple M1
      
      Versions of relevant libraries:
      [pip3] torch==2.1.0.dev20230325
      [pip3] torchaudio==2.1.0a0+541b525
      [conda] pytorch                   2.1.0.dev20230325         py3.9_0    pytorch-nightly
      [conda] torchaudio                2.1.0a0+541b525           dev_0    <develop>
      ```
      
      </details>
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3342
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D45947716
      
      Pulled By: mthrok
      
      fbshipit-source-id: 17e5930f57544b4f2e48a9b2185464694a88ab68
    • Improve the performance of NV12 frame conversion (#3344) · c11661e0
      moto authored
      Summary:
      Similar to https://github.com/pytorch/audio/pull/3342, this commit improves the performance of NV12 frame conversion.
      
      It changes two things:
      
      - Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
      - Get rid of intermediate UV plane copy
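
For NV12, the chroma samples are stored in a single half-resolution plane with U and V interleaved. The sketch below reads that plane once and writes full-resolution U and V planes directly, without first building separate half-resolution U and V planes. It is a pure-Python illustration under assumed names (`split_and_upsample_nv12_uv`, list-of-lists layout), not the C++ code from the PR.

```python
def split_and_upsample_nv12_uv(uv_plane):
    """Split an interleaved NV12 chroma plane into full-resolution U and V.

    `uv_plane` is a list of rows, each laid out as [U0, V0, U1, V1, ...]
    (hypothetical list-of-lists layout, for illustration only).
    """
    u_full, v_full = [], []
    for row in uv_plane:
        u_row, v_row = [], []
        for i in range(0, len(row), 2):
            # Each chroma sample covers two luma columns...
            u_row += [row[i], row[i]]
            v_row += [row[i + 1], row[i + 1]]
        # ...and two luma rows.
        u_full += [u_row, list(u_row)]
        v_full += [v_row, list(v_row)]
    return u_full, v_full


# One chroma row covering a 4x2 luma region: U samples 10, 20; V samples 11, 21.
u, v = split_and_upsample_nv12_uv([[10, 11, 20, 21]])
```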
      
      With 320x240 video:
      
      run | main | pr | improvement
      -- | -- | -- | --
      1 | 0.600671417 | 0.464993125 | 22.59%
      2 | 0.638846084 | 0.456763542 | 28.50%
      3 | 0.64158175 | 0.458295333 | 28.57%
      4 | 0.649868584 | 0.455450583 | 29.92%
      5 | 0.612171333 | 0.462435625 | 24.46%
      6 | 0.6128095 | 0.456716166 | 25.47%
      7 | 0.632084583 | 0.463357083 | 26.69%
      8 | 0.610733083 | 0.46148625 | 24.44%
      9 | 0.613825834 | 0.4559555 | 25.72%
      10 | 0.653857458 | 0.455375375 | 30.36%
      
      (All times are in seconds.)
      
      With 1080x720 video:
      
      run | main | pr | improvement
      -- | -- | -- | --
      1 | 4.984154333 | 4.21090375 | 15.51%
      2 | 4.988090625 | 4.239649375 | 15.00%
      3 | 4.988896375 | 4.227277458 | 15.27%
      4 | 4.998186584 | 4.161077042 | 16.75%
      5 | 5.06180425 | 4.191672584 | 17.19%
      6 | 5.108769667 | 4.198468458 | 17.82%
      7 | 5.151363625 | 4.181942167 | 18.82%
      8 | 5.199527875 | 4.239319084 | 18.47%
      9 | 5.224903708 | 4.194901959 | 19.71%
      10 | 5.333422583 | 4.320925792 | 18.98%
      
      (All times are in seconds.)
      
      <details><summary>code</summary>
      
      ```python
      import time
      
      from torchaudio.io import StreamReader
      
      def test():
          r = StreamReader(src="testsrc=duration=30", format="lavfi")
          # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
          r.add_video_stream(-1, filter_desc="format=nv12")
          t0 = time.monotonic()
          r.process_all_packets()
          elapsed = time.monotonic() - t0
          print(elapsed)
      
      for _ in range(10):
          test()
      ```
      </details>
      
      <details><summary>env</summary>
      
      ```
      PyTorch version: 2.1.0.dev20230325
      Is debug build: False
      CUDA used to build PyTorch: None
      ROCM used to build PyTorch: N/A
      
      OS: macOS 13.3.1 (arm64)
      GCC version: Could not collect
      Clang version: 14.0.6
      CMake version: version 3.22.1
      Libc version: N/A
      
      Python version: 3.9.16 (main, Mar  8 2023, 04:29:24)  [Clang 14.0.6 ] (64-bit runtime)
      Python platform: macOS-13.3.1-arm64-arm-64bit
      Is CUDA available: False
      CUDA runtime version: No CUDA
      CUDA_MODULE_LOADING set to: N/A
      GPU models and configuration: No CUDA
      Nvidia driver version: No CUDA
      cuDNN version: No CUDA
      HIP runtime version: N/A
      MIOpen runtime version: N/A
      Is XNNPACK available: True
      
      CPU:
      Apple M1
      
      Versions of relevant libraries:
      [pip3] torch==2.1.0.dev20230325
      [pip3] torchaudio==2.1.0a0+541b525
      [conda] pytorch                   2.1.0.dev20230325         py3.9_0    pytorch-nightly
      [conda] torchaudio                2.1.0a0+541b525           dev_0    <develop>
      ```
      
      </details>
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3344
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D45948511
      
      Pulled By: mthrok
      
      fbshipit-source-id: ae9b300cbcb4295f3f7470736f258280005a21e5
    • Fix for breadcrumbs displaying "Old version (stable)" on Nightly build (#3333) · 3ffd76c8
      Carl Parker authored
      Summary:
      Previously, `breadcrumbs.html` identified a nightly build version by the prefix "Nightly" which would normally be prepended to the version in `conf.py`. However, the version string is coming through without the "Nightly" prefix, so this change causes `breadcrumbs.html` to key on the substring "dev" instead.
      
      We apparently aren't getting "Nightly" because the environment variable BUILD_VERSION is available, so `conf.py` uses the value of that env var instead of the version string imported from the `torchaudio` module itself, which actually appears to be incorrect; see below.
      
      If I install torchaudio using
      
          conda install torchaudio -c pytorch-nightly
      
      then `torchaudio.__version__` returns the incorrect version string:
      
          2.0.0.dev20230309
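
The detection change can be sketched in Python as follows. The real check lives in the `breadcrumbs.html` template; the function name `is_nightly` and the stable version string `2.0.0` are hypothetical and used only for illustration.

```python
def is_nightly(version: str) -> bool:
    """Treat any version string containing "dev" as a nightly build."""
    return "dev" in version


version = "2.0.0.dev20230309"

# Old check: relies on a "Nightly" prefix, which is missing when
# BUILD_VERSION supplies the version string.
old_check = version.startswith("Nightly")

# New check: keys on the "dev" substring instead.
new_check = is_nightly(version)

# A made-up stable version string has no "dev" component.
stable_check = is_nightly("2.0.0")
```

Here `old_check` is false and `new_check` is true for the same nightly version string, which is the mismatch the commit addresses.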
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3333
      
      Reviewed By: mthrok
      
      Differential Revision: D45926466
      
      Pulled By: carljparker
      
      fbshipit-source-id: d5516f2d9f1716c2400d3e9b285bd5d32b4b3a77
    • Add 420p10le CPU support to StreamReader (#3332) · c12f4734
      moto authored
      Summary:
      This commit adds support for decoding the YUV420P10LE format.
      
      The image tensor returned for this format has
      - NCHW format (C == 3)
      - int16 type
      - value range [0, 2^10).
      
      Note that the value range is different from what the "hevc_cuvid" decoder
      returns. The "hevc_cuvid" decoder uses the full range of int16 (internally,
      it's uint16) to express the color (with some intervals), but the values
      returned by the CPU "hevc" decoder are within [0, 2^10).
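
To make the range concrete: 10-bit samples span 0 to 2^10 - 1 = 1023, so they fit in int16 with room to spare. The sketch below rescales such values to [0, 1]. The helper name `normalize_10bit` is hypothetical, and dividing by 1023 is only one possible convention; the right scaling depends on how the data is consumed downstream.

```python
BIT_DEPTH = 10
MAX_VALUE = 2**BIT_DEPTH - 1  # 1023, the largest value in [0, 2^10)


def normalize_10bit(sample: int) -> float:
    """Map a 10-bit sample in [0, 1023] to a float in [0.0, 1.0]."""
    if not 0 <= sample <= MAX_VALUE:
        raise ValueError(f"sample {sample} is outside [0, {MAX_VALUE}]")
    return sample / MAX_VALUE
```

Values coming from a decoder that uses the full 16-bit range would need a different divisor, so the same helper should not be applied to both outputs.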
      
      Addresses https://github.com/pytorch/audio/issues/3331
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3332
      
      Reviewed By: hwangjeff
      
      Differential Revision: D45925097
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4e669b65c030f388bba2fdbb8f00faf7e2981508
  6. 16 May, 2023 3 commits
  7. 15 May, 2023 1 commit
  8. 11 May, 2023 3 commits
  9. 10 May, 2023 4 commits
  10. 09 May, 2023 6 commits
  11. 05 May, 2023 6 commits
    • fix doc of specaugment transform (#3314) · a8dc4de5
      Xiaohui Zhang authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3314
      
      Reviewed By: nateanl
      
      Differential Revision: D45621958
      
      Pulled By: xiaohui-zhang
      
      fbshipit-source-id: 17555a865790adadc2abd40a86571596386a12fc
    • Update squim tutorial (#3313) · 05ef7dc6
      Zhaoheng Ni authored
      Summary:
      Add scatter plots for STOI, PESQ, Si-SDR, and MOS scores to demonstrate the performance of `SquimObjective` and `SquimSubjective` models and how close they are to the ground truths.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3313
      
      Reviewed By: hwangjeff
      
      Differential Revision: D45620311
      
      Pulled By: nateanl
      
      fbshipit-source-id: cb58ffd3744df4749b9385876da8de0cffd93557
    • Add SpecAugment transform (#3309) · 82febc59
      Xiaohui Zhang authored
      Summary:
      (2/2 of the previous https://github.com/pytorch/audio/pull/2360 which I accidentally closed)
      
      The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
      - Only zero masking can be done; masking by mean value is not supported.
      - mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
      - For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors due to the above hard-coding.
      - For 2D spectrogram tensors w/o a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors due to the above hard-coding.
      - It's not straightforward to apply multiple time/frequency masks with the current design. If we need N masks along the time/frequency axis, we need to sequentially apply N Frequency/TimeMasking transforms to the input tensors, which makes for an inconvenient API. We need to introduce a separate SpecAugment transform to handle this.
      
      To solve these issues, here we
      - [done in the previous [PR](https://github.com/pytorch/audio/pull/3289)] Extend mask_along_axis_iid to support 3D+ tensors and mask_along_axis to support 2D+ tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.
      - [done in this PR] Introduce the SpecAugment transform.
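
The idea of applying N masks in a single call, with a mask value other than zero, can be sketched on a plain 2D list. This is not torchaudio's implementation or API: the function `apply_masks`, its parameters, and the way widths and offsets are sampled are all assumptions made for illustration.

```python
import random


def apply_masks(spec, axis, n_masks, max_width, mask_value=0.0, rng=random):
    """Return a copy of a 2D spectrogram with `n_masks` masks along `axis`.

    `spec` is a list of rows indexed as spec[freq][time]; axis=0 masks
    frequency bins, axis=1 masks time frames. Illustration only.
    """
    out = [list(row) for row in spec]
    size = len(out) if axis == 0 else len(out[0])
    for _ in range(n_masks):
        # Sample a mask width and a start offset that keep the mask in bounds.
        width = rng.randint(0, min(max_width, size))
        start = rng.randint(0, size - width)
        for f in range(len(out)):
            for t in range(len(out[0])):
                index = f if axis == 0 else t
                if start <= index < start + width:
                    out[f][t] = mask_value
    return out


spec = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0]]
# Mask with the mean value instead of zero.
mean = sum(sum(row) for row in spec) / (len(spec) * len(spec[0]))
masked = apply_masks(spec, axis=1, n_masks=2, max_width=2, mask_value=mean,
                     rng=random.Random(0))
```

Because the input has no batch or channel dimension, this also shows the 2D case that the older masking functions could not handle.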
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3309
      
      Reviewed By: nateanl
      
      Differential Revision: D45592926
      
      Pulled By: xiaohui-zhang
      
      fbshipit-source-id: 97cd686dbb6c1c6ff604716b71a876e616aaf1a2
    • Fix missing PTS initialization with NVIDIA encoder (#3312) · 1e3af12f
      huyao authored
      Summary:
      Fix **Failed to write packet (Invalid argument)** error when encoding FLV video streams using NVIDIA hardware encoders.
      
      Resolve https://github.com/pytorch/audio/issues/3311
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3312
      
      Reviewed By: nateanl
      
      Differential Revision: D45611656
      
      Pulled By: mthrok
      
      fbshipit-source-id: 531a83a27d3b19ed9e9aedd161769c60aa0bd175
    • Fix doc version (#3310) · bfb47017
      moto authored
      Summary:
      Fixes the regression caused by the migration of the build_doc job to GHA, where the version number is not properly set.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3310
      
      Reviewed By: nateanl
      
      Differential Revision: D45607829
      
      Pulled By: mthrok
      
      fbshipit-source-id: 3450a38fa6982fcc56676a80144e9eed1aad02ec
    • Fix MKL issue on Intel mac build (#3307) · 3e897ca7
      moto authored
      Summary:
      * Remove MKL and NumPy from the Conda build env
      * Remove the `caffe2::mkl` dependency from `torch_cpu`, which introduced an unnecessary and undesired dependency on Intel mac.
      
      TorchAudio does not use BLAS libraries directly, so all mentions of MKL should be removed from the codebase.
      However, this was causing an issue on Intel mac. It turned out that the `torch_cpu` target pulls in the `caffe2::mkl` dependency, and the linker on macOS keeps a library dependency even if no symbol from that library is used. This stray MKL dependency should be fixed on the core side, but in the meantime we can modify the target and remove it.
      
      Also, we don't need NumPy at build time or run time, so it is removed as well.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3307
      
      Reviewed By: atalman
      
      Differential Revision: D45606944
      
      Pulled By: mthrok
      
      fbshipit-source-id: 853411ccbbca31796b808a2b052b4cfa564718cd
  12. 04 May, 2023 3 commits
    • Add older mkl build contraint only (#3302) · 1e48af06
      atalman authored
      Summary:
      Similar to what we used to have here:
      https://github.com/pytorch/test-infra/pull/3896/files
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3302
      
      Reviewed By: nateanl
      
      Differential Revision: D45574845
      
      Pulled By: atalman
      
      fbshipit-source-id: 142c35dfd811a5f5c170dcd082bec8d055edd9cb
    • Add mkl dependency to torchaudio MacOS x86 builds (#3300) · b5795943
      atalman authored
      Summary:
      Add mkl dependency to torchaudio MacOS x86 builds
      
      Already tested here: https://github.com/pytorch/audio/actions/runs/4878179835/jobs/8703586137
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3300
      
      Reviewed By: jeanschmidt, mthrok
      
      Differential Revision: D45566352
      
      Pulled By: atalman
      
      fbshipit-source-id: a0376016506891240b2dd03d4fa4889028bf764b
    • Extend mask_along_axis{,_iid} (#3289) · 74bd971a
      Xiaohui Zhang authored
      Summary:
      (1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360) which I accidentally closed)
      
      The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
      - Only zero masking can be done; masking by mean value is not supported.
      - mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
      - For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors due to the above hard-coding.
      - For 2D spectrogram tensors w/o a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors due to the above hard-coding.
      - It's not straightforward to apply multiple time/frequency masks with the current design.
      
      To solve these issues, here we
      - Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.
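
What "mask one of the last two dimensions" means for inputs of different rank can be sketched with nested lists. The helper `mask_last_two` and its negative-axis convention are hypothetical and are not torchaudio's signature. The sketch applies the same mask to every leading index, which is the non-iid behavior.

```python
def mask_last_two(x, axis, start, width, mask_value=0.0):
    """Mask [start, start + width) along one of the last two dimensions.

    `x` is a nested list with at least 2 dimensions; axis=-2 masks the
    second-to-last dimension, axis=-1 the last one. Hypothetical helper,
    not torchaudio's implementation.
    """
    if axis not in (-2, -1):
        raise ValueError("axis must be -2 or -1")
    if isinstance(x[0][0], list):
        # More than 2 dimensions: recurse over the leading dimension.
        return [mask_last_two(sub, axis, start, width, mask_value) for sub in x]
    return [
        [
            mask_value
            if start <= (i if axis == -2 else j) < start + width
            else value
            for j, value in enumerate(row)
        ]
        for i, row in enumerate(x)
    ]


# 2D input (freq, time): mask time frame 1.
masked_2d = mask_last_two([[1, 2, 3], [4, 5, 6]], axis=-1, start=1, width=1)

# 3D input (channel, freq, time): the same call shape works unchanged.
masked_3d = mask_last_two([[[1, 2], [3, 4]]], axis=-1, start=0, width=1)
```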
      
      The introduction of SpecAugment transform will be done in another PR.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3289
      
      Reviewed By: hwangjeff
      
      Differential Revision: D45460357
      
      Pulled By: xiaohui-zhang
      
      fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3
  13. 03 May, 2023 3 commits