1. 24 Oct, 2023 1 commit
  2. 12 Oct, 2023 1 commit
  3. 11 Oct, 2023 1 commit
  4. 09 Oct, 2023 1 commit
  5. 13 Jul, 2023 1 commit
  6. 12 Jul, 2023 1 commit
  7. 05 Jul, 2023 1 commit
  8. 08 Jun, 2023 1 commit
    • Delay the initialization of CUDA tensor converter (#3419) · 7dff24ca
      moto authored
      Summary:
      The StreamReader decoding process consists of three steps:
      
      1. Decode the incoming AVPacket into an AVFrame
      2. Pass the AVFrame through AVFilter for post-processing
      3. Convert the resulting AVFrame into a Tensor
      
      The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized when the output stream is defined, which allows the output stream shape to be retrieved.
      
      For the CPU decoder, this works fine because resizing happens in step 2, so the resulting shape can be retrieved.
      However, this is problematic for the GPU decoder, because resizing is currently done through a GPU decoder option (step 1) and there seems to be no interface to retrieve the output shape. The refactor therefore introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405
      
      AVFilter internally adapts to changes in the input frame size. This commit changes the conversion process to behave similarly: it waits until the first frame arrives before finalizing the frame shape.
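
The deferred initialization can be illustrated with a small Python sketch. This is not torchaudio's implementation: `LazyConverter` and the dict-based frame are hypothetical stand-ins, and the only point is that the shape is fixed when the first frame arrives rather than when the stream is configured.

```python
class LazyConverter:
    """Hypothetical converter that learns the frame shape from the first frame."""

    def __init__(self):
        self.shape = None  # unknown until the first frame arrives

    def convert(self, frame):
        # `frame` stands in for a decoded frame: a dict with height, width, data.
        if self.shape is None:
            self.shape = (frame["height"], frame["width"])  # finalize on first frame
        h, w = self.shape
        data = frame["data"]
        # Reshape the flat sample list into rows of `w` samples each.
        return [data[r * w:(r + 1) * w] for r in range(h)]


conv = LazyConverter()
rows = conv.convert({"height": 2, "width": 3, "data": [1, 2, 3, 4, 5, 6]})
print(conv.shape, rows)
```

Before the first call to `convert`, `conv.shape` is `None`, which mirrors the situation where the GPU decoder's output shape cannot be queried up front.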
      
      Fixes https://github.com/pytorch/audio/issues/3405
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3419
      
      Differential Revision: D46557505
      
      Pulled By: mthrok
      
      fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6
  9. 03 Jun, 2023 1 commit
  10. 02 Jun, 2023 1 commit
  11. 01 Jun, 2023 1 commit
    • Use dlopen for FFmpeg (#3353) · b14ced1a
      moto authored
      Summary:
      This commit changes the way the FFmpeg extension is built and used.
      Instead of linking the (LGPL) FFmpeg libraries to torchaudio at build time,
      it uses dlopen to search for and link them at run time.
      
      For dlopen-ing, we use PyTorch's `at::DynamicLibrary` class, which provides
      a portable wrapper.
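
As a rough analogy for run-time loading, the sketch below uses Python's `ctypes` rather than `at::DynamicLibrary`. The helper `load_first_available` and the candidate file names are hypothetical; real library names depend on the platform and the installed FFmpeg version.

```python
import ctypes


def load_first_available(candidates):
    """Return a handle for the first library that loads, or None."""
    for name in candidates:
        try:
            return ctypes.CDLL(name)  # dlopen on POSIX, LoadLibrary on Windows
        except OSError:
            continue  # not found under this name; try the next candidate
    return None


# Hypothetical candidate names, tried in order.
handle = load_first_available(["libavutil.so.58", "libavutil.so.57", "libavutil.so.56"])
print("found" if handle is not None else "not found")
```

Because the lookup happens at run time, a missing library becomes a condition the program can handle instead of a failure when the extension itself is loaded.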
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3353
      
      Differential Revision: D46059199
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4493a5fd8a4c802178d20276522f5334d637307d
  12. 17 May, 2023 3 commits
    • Improve the performance of YUV420P frame conversion (#3342) · 72d3fe09
      moto authored
      Summary:
      This commit improves the performance of converting YUV420P frames from AVFrame to torch Tensor.
      
      It changes two things:
      1. Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
      2. Get rid of the intermediate UV plane copy
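
In YUV420P the chroma planes are stored at half the width and half the height of the luma plane, so conversion has to upsample them by 2x. A minimal pure-Python sketch of nearest-neighbor upsampling by plain data copy follows. It only illustrates the technique; the function name is hypothetical and this is not the C++ code in the commit.

```python
def upsample_nearest_2x(plane):
    """Repeat every sample of a 2-D plane twice horizontally and vertically."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))  # duplicate the row
    return out


u_plane = [[10, 20], [30, 40]]  # chroma plane at half resolution
upsampled = upsample_nearest_2x(u_plane)
print(upsampled)
```

Each source sample ends up in a 2x2 block of the output, which is what nearest-neighbor interpolation produces at an exact scale factor of two.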
      
      The following compares the time it takes to process 30 seconds of YUV420P frames at 25 FPS with a resolution of 320x240. The measured times are sorted in ascending order.
      
      Some observations:
      * `torch::nn::functional::interpolate` with `torch::kNearest` option is not as fast as copying data manually.
      * Switching from `interpolate` to manual data copy reduces the variance.
      
      run | main | 1 | 1+2 | improvement (from main to 1+2)
      -- | -- | -- | -- | --
      1 | 0.452250583 | 0.417490125 | 0.40155375 | 11.21%
      2 | 0.462039958 | 0.42006675 | 0.401764125 | 13.05%
      3 | 0.463067666 | 0.42416 | 0.402651334 | 13.05%
      4 | 0.464228166 | 0.424545458 | 0.402985667 | 13.19%
      5 | 0.465777375 | 0.425629208 | 0.405604625 | 12.92%
      6 | 0.469628666 | 0.427044333 | 0.40628525 | 13.49%
      7 | 0.475935125 | 0.42805875 | 0.406412167 | 14.61%
      8 | 0.482277667 | 0.429921209 | 0.407279 | 15.55%
      9 | 0.496695208 | 0.431182792 | 0.442013791 | 11.01%
      10 | 0.546653625 | 0.541639584 | 0.4711585 | 13.81%
      
      [second]
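
The improvement column is consistent with the relative reduction in elapsed time, (main - new) / main. A short sketch that recomputes one entry, with a hypothetical helper name:

```python
def improvement(baseline, new):
    """Relative reduction in elapsed time, as a percentage of the baseline."""
    return (baseline - new) / baseline * 100


# Run 1 of the 320x240 measurement: main vs. 1+2
run1 = improvement(0.452250583, 0.40155375)
print(f"{run1:.2f}%")
```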
      
      At a higher resolution, the improvement is smaller but consistent.
      
      run | main | 1+2 | improvement
      -- | -- | -- | --
      1 | 4.032393 | 3.991784667 | 1.01%
      2 | 4.052248084 | 3.992672208 | 1.47%
      3 | 4.07705575 | 4.000541666 | 1.88%
      4 | 4.143954792 | 4.020671584 | 2.98%
      5 | 4.170711959 | 4.025753125 | 3.48%
      6 | 4.240229292 | 4.045504875 | 4.59%
      7 | 4.267384042 | 4.045588125 | 5.20%
      8 | 4.277025958 | 4.061980083 | 5.03%
      9 | 4.312192042 | 4.163251959 | 3.45%
      10 | 4.406109875 | 4.312560334 | 2.12%
      
      <details><summary>code</summary>
      
      ```python
      import time
      
      from torchaudio.io import StreamReader
      
      def test():
          r = StreamReader(src="testsrc=duration=30", format="lavfi")
          # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
          r.add_video_stream(-1, filter_desc="format=yuv420p")
          t0 = time.monotonic()
          r.process_all_packets()
          elapsed = time.monotonic() - t0
          print(elapsed)
      
      for _ in range(10):
          test()
      ```
      </details>
      
      <details><summary>env</summary>
      
      ```
      PyTorch version: 2.1.0.dev20230325
      Is debug build: False
      CUDA used to build PyTorch: None
      ROCM used to build PyTorch: N/A
      
      OS: macOS 13.3.1 (arm64)
      GCC version: Could not collect
      Clang version: 14.0.6
      CMake version: version 3.22.1
      Libc version: N/A
      
      Python version: 3.9.16 (main, Mar  8 2023, 04:29:24)  [Clang 14.0.6 ] (64-bit runtime)
      Python platform: macOS-13.3.1-arm64-arm-64bit
      Is CUDA available: False
      CUDA runtime version: No CUDA
      CUDA_MODULE_LOADING set to: N/A
      GPU models and configuration: No CUDA
      Nvidia driver version: No CUDA
      cuDNN version: No CUDA
      HIP runtime version: N/A
      MIOpen runtime version: N/A
      Is XNNPACK available: True
      
      CPU:
      Apple M1
      
      Versions of relevant libraries:
      [pip3] torch==2.1.0.dev20230325
      [pip3] torchaudio==2.1.0a0+541b525
      [conda] pytorch                   2.1.0.dev20230325         py3.9_0    pytorch-nightly
      [conda] torchaudio                2.1.0a0+541b525           dev_0    <develop>
      ```
      
      </details>
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3342
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D45947716
      
      Pulled By: mthrok
      
      fbshipit-source-id: 17e5930f57544b4f2e48a9b2185464694a88ab68
    • Improve the performance of NV12 frame conversion (#3344) · c11661e0
      moto authored
      Summary:
      Similar to https://github.com/pytorch/audio/pull/3342, this commit improves the performance of NV12 frame conversion.
      
      It changes two things:
      
      - Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
      - Get rid of the intermediate UV plane copy
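
For context, NV12 stores chroma as a single half-resolution plane in which U and V samples are interleaved (U0 V0 U1 V1 ...). The sketch below shows how the two components can be read with strided access. The helper name is hypothetical, and this does not claim to reproduce how the commit avoids the intermediate copy.

```python
def split_nv12_chroma(uv_row):
    """Split one interleaved NV12 chroma row (U0 V0 U1 V1 ...) into U and V."""
    return uv_row[0::2], uv_row[1::2]


u, v = split_nv12_chroma([100, 200, 101, 201, 102, 202])
print(u, v)
```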
      
      With 320x240 video
      
      run | main | pr | improvement
      -- | -- | -- | --
      1 | 0.600671417 | 0.464993125 | 22.59%
      2 | 0.638846084 | 0.456763542 | 28.50%
      3 | 0.64158175 | 0.458295333 | 28.57%
      4 | 0.649868584 | 0.455450583 | 29.92%
      5 | 0.612171333 | 0.462435625 | 24.46%
      6 | 0.6128095 | 0.456716166 | 25.47%
      7 | 0.632084583 | 0.463357083 | 26.69%
      8 | 0.610733083 | 0.46148625 | 24.44%
      9 | 0.613825834 | 0.4559555 | 25.72%
      10 | 0.653857458 | 0.455375375 | 30.36%
      
      [second]
      
      With 1080x720 video
      
      run | main | pr | improvement
      -- | -- | -- | --
      1 | 4.984154333 | 4.21090375 | 15.51%
      2 | 4.988090625 | 4.239649375 | 15.00%
      3 | 4.988896375 | 4.227277458 | 15.27%
      4 | 4.998186584 | 4.161077042 | 16.75%
      5 | 5.06180425 | 4.191672584 | 17.19%
      6 | 5.108769667 | 4.198468458 | 17.82%
      7 | 5.151363625 | 4.181942167 | 18.82%
      8 | 5.199527875 | 4.239319084 | 18.47%
      9 | 5.224903708 | 4.194901959 | 19.71%
      10 | 5.333422583 | 4.320925792 | 18.98%
      
      [second]
      
      <details><summary>code</summary>
      
      ```python
      import time
      
      from torchaudio.io import StreamReader
      
      def test():
          r = StreamReader(src="testsrc=duration=30", format="lavfi")
          # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
          r.add_video_stream(-1, filter_desc="format=nv12")
          t0 = time.monotonic()
          r.process_all_packets()
          elapsed = time.monotonic() - t0
          print(elapsed)
      
      for _ in range(10):
          test()
      ```
      </details>
      
      <details><summary>env</summary>
      
      ```
      PyTorch version: 2.1.0.dev20230325
      Is debug build: False
      CUDA used to build PyTorch: None
      ROCM used to build PyTorch: N/A
      
      OS: macOS 13.3.1 (arm64)
      GCC version: Could not collect
      Clang version: 14.0.6
      CMake version: version 3.22.1
      Libc version: N/A
      
      Python version: 3.9.16 (main, Mar  8 2023, 04:29:24)  [Clang 14.0.6 ] (64-bit runtime)
      Python platform: macOS-13.3.1-arm64-arm-64bit
      Is CUDA available: False
      CUDA runtime version: No CUDA
      CUDA_MODULE_LOADING set to: N/A
      GPU models and configuration: No CUDA
      Nvidia driver version: No CUDA
      cuDNN version: No CUDA
      HIP runtime version: N/A
      MIOpen runtime version: N/A
      Is XNNPACK available: True
      
      CPU:
      Apple M1
      
      Versions of relevant libraries:
      [pip3] torch==2.1.0.dev20230325
      [pip3] torchaudio==2.1.0a0+541b525
      [conda] pytorch                   2.1.0.dev20230325         py3.9_0    pytorch-nightly
      [conda] torchaudio                2.1.0a0+541b525           dev_0    <develop>
      ```
      
      </details>
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3344
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D45948511
      
      Pulled By: mthrok
      
      fbshipit-source-id: ae9b300cbcb4295f3f7470736f258280005a21e5
    • Add 420p10le CPU support to StreamReader (#3332) · c12f4734
      moto authored
      Summary:
      This commit adds support for decoding the YUV420P10LE format.
      
      The image tensor returned for this format has:
      - NCHW format (C == 3)
      - int16 type
      - value range [0, 2^10).
      
      Note that the value range is different from what the "hevc_cuvid" decoder
      returns. The "hevc_cuvid" decoder uses the full range of int16 (internally,
      it's uint16) to express the color (with some intervals), but the values
      returned by the CPU "hevc" decoder are within [0, 2^10).
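
To make the value range concrete, the sketch below normalizes a 10-bit sample to a float and, as an assumption, maps it onto a 16-bit scale with a 6-bit left shift. The shift is one way to produce values that span 16 bits "with some intervals"; the source does not state that "hevc_cuvid" does exactly this. Both helper names are hypothetical.

```python
MAX_10BIT = (1 << 10) - 1  # 1023, the largest value in [0, 2^10)


def normalize_10bit(value):
    """Map a 10-bit sample to a float in [0.0, 1.0]."""
    return value / MAX_10BIT


def to_16bit_scale(value):
    # Assumption: spread 10-bit samples over 16 bits by a 6-bit left shift,
    # so consecutive 10-bit values land 64 apart on the 16-bit scale.
    return value << 6


print(normalize_10bit(1023), to_16bit_scale(1023))
```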
      
      Addresses https://github.com/pytorch/audio/issues/3331
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3332
      
      Reviewed By: hwangjeff
      
      Differential Revision: D45925097
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4e669b65c030f388bba2fdbb8f00faf7e2981508
  13. 23 Mar, 2023 2 commits
  14. 16 Mar, 2023 1 commit
    • Refactor Tensor conversion in StreamReader (#3170) · 014d7140
      moto authored
      Summary:
      Currently, when the Buffer converts AVFrame* to torch::Tensor,
      it checks the format each time a frame is passed and then
      performs the conversion.
      
      This commit changes it so that the conversion operation is
      pre-instantiated at the time the output stream is configured.
      
      It introduces Converter implementations for various formats
      and uses templates to embed them in the Buffer class.
      This way, branching such as if/switch is eliminated from
      the decoding path.
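
The idea of binding the converter once, at configuration time, can be sketched in Python with a function lookup in place of C++ templates. The `Buffer` class, the converter functions, and the format names used as keys are hypothetical and only illustrate the pattern.

```python
def convert_gray(frame):
    return [frame]  # a single plane


def convert_rgb(frame):
    # Split packed RGB samples (R0 G0 B0 R1 G1 B1 ...) into three planes.
    return [frame[0::3], frame[1::3], frame[2::3]]


CONVERTERS = {"gray8": convert_gray, "rgb24": convert_rgb}


class Buffer:
    """Hypothetical buffer that binds its converter when the stream is configured."""

    def __init__(self, pix_fmt):
        self.convert = CONVERTERS[pix_fmt]  # format lookup happens once, here
        self.frames = []

    def push(self, frame):
        self.frames.append(self.convert(frame))  # no per-frame format check


buf = Buffer("rgb24")
buf.push([1, 2, 3, 4, 5, 6])
print(buf.frames)
```

An unsupported format fails in the constructor, before any frame is decoded, rather than partway through the decoding loop.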
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3170
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D44048293
      
      Pulled By: mthrok
      
      fbshipit-source-id: 30d8b240a5695d7513f499ce17853f2f0ffcab9f