- 24 Oct, 2023 1 commit
-
-
moto-meta authored
Differential Revision: D50506299 Pull Request resolved: https://github.com/pytorch/audio/pull/3669
-
- 12 Oct, 2023 1 commit
-
-
moto-meta authored
Differential Revision: D50205775 Pull Request resolved: https://github.com/pytorch/audio/pull/3651
-
- 11 Oct, 2023 1 commit
-
-
moto-meta authored
Differential Revision: D50082877 Pull Request resolved: https://github.com/pytorch/audio/pull/3646
-
- 09 Oct, 2023 1 commit
-
-
moto-meta authored
Differential Revision: D49965263 Pull Request resolved: https://github.com/pytorch/audio/pull/3639
-
- 13 Jul, 2023 1 commit
-
-
Moto Hira authored
Differential Revision: D47402174 Original commit changeset: 00c0719ab184 Original Phabricator Diff: D47402174 fbshipit-source-id: b1f6ea4cc3ecef3f72a87bf2f67bf9644c847546
-
- 12 Jul, 2023 1 commit
-
-
moto authored
Summary:
- FFmpeg 6 deprecated attributes
- Guard CUDA specific functions not used in CPU builds

Pull Request resolved: https://github.com/pytorch/audio/pull/3471 Differential Revision: D47402174 Pulled By: mthrok fbshipit-source-id: 00c0719ab1849b50c0b56b03d8fb38bc7aa74538
-
- 05 Jul, 2023 1 commit
-
-
moto authored
Summary: This reverts commit b7d3e89a. We will use pre-built binaries instead of dlopen. Pull Request resolved: https://github.com/pytorch/audio/pull/3456 Differential Revision: D47239681 Pulled By: mthrok fbshipit-source-id: 0446a62410d914081184fc20c386afa00b1e41b6
-
- 08 Jun, 2023 1 commit
-
-
moto authored
Summary: The StreamReader decoding process is composed of three steps:

1. Decode the incoming AVPacket into an AVFrame
2. Pass the AVFrame through AVFilter for post-processing
3. Convert the resulting AVFrame

The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized when the output stream is defined, which allows the output stream shape to be retrieved. For the CPU decoder this works fine, because resizing happens in step 2 and the resulting shape can be retrieved. However, it is problematic for the GPU decoder, because resizing is currently done through a GPU decoder option (step 1) and there seems to be no interface to retrieve the output shape. The refactor introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405

AVFilter internally adapts to changes in the input frame size. This commit changes the conversion process to behave similarly, so that it waits until the first frame comes in before finalizing the frame shape.

Fix https://github.com/pytorch/audio/issues/3405

Pull Request resolved: https://github.com/pytorch/audio/pull/3419 Differential Revision: D46557505 Pulled By: mthrok fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6
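The idea of deferring the shape decision until the first frame arrives can be sketched in plain Python. This is only an illustration, not the torchaudio implementation: the class name `LazyFrameConverter`, the list-of-rows frame representation, and the choice to raise on a later shape change are all assumptions made for the example.

```python
class LazyFrameConverter:
    """Hypothetical converter that fixes its output shape on the first frame."""

    def __init__(self):
        # The shape is unknown when the output stream is configured.
        self.shape = None

    def convert(self, frame):
        # `frame` is a list of rows, each row a list of pixel values.
        height, width = len(frame), len(frame[0])
        if self.shape is None:
            # First frame: finalize the output shape now.
            self.shape = (height, width)
        elif self.shape != (height, width):
            # Illustrative choice only: reject frames of a different size.
            raise ValueError(
                f"frame shape changed: {self.shape} -> {(height, width)}"
            )
        # Copy the data so the caller's frame is not aliased.
        return [list(row) for row in frame]


converter = LazyFrameConverter()
print(converter.shape)             # None, nothing decoded yet
converter.convert([[1, 2, 3], [4, 5, 6]])
print(converter.shape)             # (2, 3)
```

The point of the sketch is that no shape query is needed at configuration time, so a decoder that resizes internally (as the GPU decoder does in step 1) does not have to expose its output shape in advance.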
-
- 03 Jun, 2023 1 commit
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3402 This is a second attempt at https://github.com/pytorch/audio/pull/3353. The basic logic to enable dlopen for FFmpeg libraries is the same. It uses `at::DynamicLibrary`, which allows compiling torchaudio without linking FFmpeg libraries. This time, the option DLOPEN_FFMPEG has been added to enable this feature, so that users have a way to disable it and keep using build-time linking. Please refer to stub.h for more technical detail. Differential Revision: D46403783 fbshipit-source-id: ca3db57ff6bdc50c8c225d22f12f3e76c6dc3f16
-
- 02 Jun, 2023 1 commit
-
-
Moto Hira authored
Differential Revision: D46059199 Original commit changeset: 4493a5fd8a4c Original Phabricator Diff: D46059199 fbshipit-source-id: 71cde3f8cd870d1ad9114e3e87cdd1ba564441c0
-
- 01 Jun, 2023 1 commit
-
-
moto authored
Summary: This commit changes the way the FFmpeg extension is built and used. Instead of linking (LGPL) FFmpeg libraries to torchaudio at build time, it uses dlopen to search for and link them at run time. For dlopen-ing, we use PyTorch's `at::DynamicLibrary` class, which provides a portable wrapper. Pull Request resolved: https://github.com/pytorch/audio/pull/3353 Differential Revision: D46059199 Pulled By: mthrok fbshipit-source-id: 4493a5fd8a4c802178d20276522f5334d637307d
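Run-time library lookup of this kind can be illustrated with Python's `ctypes`, which wraps the same dlopen mechanism. This is a rough analogy, not the C++ code from the commit: the helper name `load_first_available` and the candidate library names are assumptions, and the real names depend on the platform and the FFmpeg version installed.

```python
import ctypes


def load_first_available(candidates):
    """Return a handle for the first library that loads, or None."""
    for name in candidates:
        try:
            # ctypes.CDLL calls dlopen (or LoadLibrary on Windows).
            return ctypes.CDLL(name)
        except OSError:
            # Not found or not loadable: try the next candidate.
            continue
    return None


# Hypothetical Linux sonames, newest first.
handle = load_first_available(
    ["libavutil.so.58", "libavutil.so.57", "libavutil.so.56"]
)
print("libavutil found" if handle else "libavutil not found")
```

Because the lookup happens at run time, the program still starts when no candidate is installed; it only loses the feature that depends on the library.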
-
- 17 May, 2023 3 commits
-
-
moto authored
Summary: This commit improves the performance of converting YUV420P frames from AVFrame to torch Tensor. It changes two things:

1. Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
2. Get rid of the intermediate UV plane copy.

The following compares the time it takes to process 30 seconds of YUV420P frames at 25 FPS and 320x240 resolution. The measured times are sorted by value.

Some observations:

* `torch::nn::functional::interpolate` with the `torch::kNearest` option is not as fast as copying data manually.
* Switching from `interpolate` to manual data copy reduces the variance.

run | main | 1 | 1+2 | improvement (from main to 1+2)
-- | -- | -- | -- | --
1 | 0.452250583 | 0.417490125 | 0.40155375 | 11.21%
2 | 0.462039958 | 0.42006675 | 0.401764125 | 13.05%
3 | 0.463067666 | 0.42416 | 0.402651334 | 13.05%
4 | 0.464228166 | 0.424545458 | 0.402985667 | 13.19%
5 | 0.465777375 | 0.425629208 | 0.405604625 | 12.92%
6 | 0.469628666 | 0.427044333 | 0.40628525 | 13.49%
7 | 0.475935125 | 0.42805875 | 0.406412167 | 14.61%
8 | 0.482277667 | 0.429921209 | 0.407279 | 15.55%
9 | 0.496695208 | 0.431182792 | 0.442013791 | 11.01%
10 | 0.546653625 | 0.541639584 | 0.4711585 | 13.81%

[second]

With a higher resolution, the improvement is smaller but consistent.

run | main | 1+2 | improvement
-- | -- | -- | --
1 | 4.032393 | 3.991784667 | 1.01%
2 | 4.052248084 | 3.992672208 | 1.47%
3 | 4.07705575 | 4.000541666 | 1.88%
4 | 4.143954792 | 4.020671584 | 2.98%
5 | 4.170711959 | 4.025753125 | 3.48%
6 | 4.240229292 | 4.045504875 | 4.59%
7 | 4.267384042 | 4.045588125 | 5.20%
8 | 4.277025958 | 4.061980083 | 5.03%
9 | 4.312192042 | 4.163251959 | 3.45%
10 | 4.406109875 | 4.312560334 | 2.12%

<details><summary>code</summary>

```python
import time

from torchaudio.io import StreamReader


def test():
    r = StreamReader(src="testsrc=duration=30", format="lavfi")
    # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
    r.add_video_stream(-1, filter_desc="format=yuv420p")
    t0 = time.monotonic()
    r.process_all_packets()
    elapsed = time.monotonic() - t0
    print(elapsed)


for _ in range(10):
    test()
```

</details>

<details><summary>env</summary>

```
PyTorch version: 2.1.0.dev20230325
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.6
CMake version: version 3.22.1
Libc version: N/A

Python version: 3.9.16 (main, Mar 8 2023, 04:29:24) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1

Versions of relevant libraries:
[pip3] torch==2.1.0.dev20230325
[pip3] torchaudio==2.1.0a0+541b525
[conda] pytorch 2.1.0.dev20230325 py3.9_0 pytorch-nightly
[conda] torchaudio 2.1.0a0+541b525 dev_0 <develop>
```

</details>

Pull Request resolved: https://github.com/pytorch/audio/pull/3342 Reviewed By: xiaohui-zhang Differential Revision: D45947716 Pulled By: mthrok fbshipit-source-id: 17e5930f57544b4f2e48a9b2185464694a88ab68
-
moto authored
Summary: Similar to https://github.com/pytorch/audio/pull/3342, this commit improves the performance of NV12 frame conversion. It changes two things:

- Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
- Get rid of the intermediate UV plane copy.

With 320x240 video:

run | main | pr | improvement
-- | -- | -- | --
1 | 0.600671417 | 0.464993125 | 22.59%
2 | 0.638846084 | 0.456763542 | 28.50%
3 | 0.64158175 | 0.458295333 | 28.57%
4 | 0.649868584 | 0.455450583 | 29.92%
5 | 0.612171333 | 0.462435625 | 24.46%
6 | 0.6128095 | 0.456716166 | 25.47%
7 | 0.632084583 | 0.463357083 | 26.69%
8 | 0.610733083 | 0.46148625 | 24.44%
9 | 0.613825834 | 0.4559555 | 25.72%
10 | 0.653857458 | 0.455375375 | 30.36%

[second]

With 1080x720 video:

run | main | pr | improvement
-- | -- | -- | --
1 | 4.984154333 | 4.21090375 | 15.51%
2 | 4.988090625 | 4.239649375 | 15.00%
3 | 4.988896375 | 4.227277458 | 15.27%
4 | 4.998186584 | 4.161077042 | 16.75%
5 | 5.06180425 | 4.191672584 | 17.19%
6 | 5.108769667 | 4.198468458 | 17.82%
7 | 5.151363625 | 4.181942167 | 18.82%
8 | 5.199527875 | 4.239319084 | 18.47%
9 | 5.224903708 | 4.194901959 | 19.71%
10 | 5.333422583 | 4.320925792 | 18.98%

[second]

<details><summary>code</summary>

```python
import time

from torchaudio.io import StreamReader


def test():
    r = StreamReader(src="testsrc=duration=30", format="lavfi")
    # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
    r.add_video_stream(-1, filter_desc="format=nv12")
    t0 = time.monotonic()
    r.process_all_packets()
    elapsed = time.monotonic() - t0
    print(elapsed)


for _ in range(10):
    test()
```

</details>

<details><summary>env</summary>

```
PyTorch version: 2.1.0.dev20230325
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.6
CMake version: version 3.22.1
Libc version: N/A

Python version: 3.9.16 (main, Mar 8 2023, 04:29:24) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1

Versions of relevant libraries:
[pip3] torch==2.1.0.dev20230325
[pip3] torchaudio==2.1.0a0+541b525
[conda] pytorch 2.1.0.dev20230325 py3.9_0 pytorch-nightly
[conda] torchaudio 2.1.0a0+541b525 dev_0 <develop>
```

</details>

Pull Request resolved: https://github.com/pytorch/audio/pull/3344 Reviewed By: xiaohui-zhang Differential Revision: D45948511 Pulled By: mthrok fbshipit-source-id: ae9b300cbcb4295f3f7470736f258280005a21e5
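What "nearest-neighbor upsampling by manual data copy" means for NV12 can be shown with a small pure-Python sketch. It is not the C++ code from the commit: the function name `nv12_to_yuv444` and the nested-list plane representation are assumptions for illustration. NV12 stores a full-resolution Y plane followed by a half-resolution plane in which U and V samples are interleaved.

```python
def nv12_to_yuv444(y_plane, uv_plane):
    """Expand NV12 planes into three full-resolution planes [Y, U, V].

    y_plane:  H rows of W luma samples.
    uv_plane: H/2 rows of W samples, interleaved as U0 V0 U1 V1 ...
    """
    u_full, v_full = [], []
    for uv_row in uv_plane:
        # De-interleave, repeating each chroma sample twice horizontally.
        u_row = [u for u in uv_row[0::2] for _ in range(2)]
        v_row = [v for v in uv_row[1::2] for _ in range(2)]
        # Repeat each chroma row twice vertically.
        u_full += [u_row, list(u_row)]
        v_full += [v_row, list(v_row)]
    return [[list(row) for row in y_plane], u_full, v_full]


y = [[1, 2, 3, 4], [5, 6, 7, 8]]   # 2x4 luma plane
uv = [[10, 20, 11, 21]]            # 1x4 interleaved chroma plane
for plane in nv12_to_yuv444(y, uv):
    print(plane)
```

Each chroma sample is written straight to its four destination positions, with no separately allocated half-resolution U and V planes in between, which is the copy the commit removes.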
-
moto authored
Summary: This commit adds support for decoding the YUV420P010LE format. The image tensor returned for this format has:

- NCHW format (C == 3)
- int16 type
- value range [0, 2^10).

Note that the value range is different from what the "hevc_cuvid" decoder returns. The "hevc_cuvid" decoder uses the full range of int16 (internally, it's uint16) to express the color (with some intervals), but the values returned by the CPU "hevc" decoder are within [0, 2^10).

Address https://github.com/pytorch/audio/issues/3331

Pull Request resolved: https://github.com/pytorch/audio/pull/3332 Reviewed By: hwangjeff Differential Revision: D45925097 Pulled By: mthrok fbshipit-source-id: 4e669b65c030f388bba2fdbb8f00faf7e2981508
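The difference between the two value ranges can be made concrete with a short sketch. The helper names below are hypothetical, and the bit layout is an assumption: the sketch supposes that the full-range samples carry the 10 significant bits in the upper bits of a 16-bit word (as in the P010 layout), which would explain the "intervals" mentioned above. The commit itself does not state this.

```python
def normalize_10bit(sample):
    """Map a CPU-decoder sample in [0, 2**10) to a float in [0.0, 1.0]."""
    return sample / 1023


def full_range_to_10bit(sample_int16):
    """Map a full-range int16 sample to [0, 2**10).

    Assumes the 10 significant bits sit in the high bits of the word.
    """
    unsigned = sample_int16 & 0xFFFF   # reinterpret int16 as uint16
    return unsigned >> 6               # drop the 6 unused low bits


print(normalize_10bit(1023))       # largest CPU-decoder value
print(full_range_to_10bit(-64))    # int16 -64 is uint16 65472
```

Under that assumption, a consumer handling both decoders would bring the samples to a common scale before comparing or mixing them.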
-
- 23 Mar, 2023 2 commits
-
-
moto authored
Summary: With the support of CUDA filters in https://github.com/pytorch/audio/issues/3183, it is now possible to change the pixel format of a CUDA frame. This commit adds conversion for the YUV444P format. Pull Request resolved: https://github.com/pytorch/audio/pull/3199 Reviewed By: hwangjeff Differential Revision: D44323928 Pulled By: mthrok fbshipit-source-id: 6d9b205e7235df5f21e7d3e06166b3a169f1ae9f
-
moto authored
Summary: StreamReader behaves differently when dealing with YUV formats. It implicitly converts the image format to YUV444P, because otherwise the image planes do not have the same shape and the image cannot be expressed as a regular PyTorch Tensor with a dedicated dimension for each color channel. This commit adds warnings for such conversions. Pull Request resolved: https://github.com/pytorch/audio/pull/3201 Reviewed By: nateanl Differential Revision: D44311017 Pulled By: mthrok fbshipit-source-id: 73a02a19c013c0263f349e1f3a3603e3d3eddb6a
-
- 16 Mar, 2023 1 commit
-
-
moto authored
Summary: Currently, when the Buffer converts AVFrame* to torch::Tensor, it checks the format each time a frame is passed and then performs the conversion. This commit changes it so that the conversion operation is pre-instantiated at the time the output stream is configured. It introduces Converter implementations for various formats and uses templates to embed them in the Buffer class. This way, branching such as if/switch is eliminated from the decoding path. Pull Request resolved: https://github.com/pytorch/audio/pull/3170 Reviewed By: xiaohui-zhang Differential Revision: D44048293 Pulled By: mthrok fbshipit-source-id: 30d8b240a5695d7513f499ce17853f2f0ffcab9f
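The pattern of choosing the conversion once, at configuration time, can be sketched in Python. The commit does this with C++ templates; the version below is only an analogy using a stored function reference. The format names, the dict-based frame, and the `Buffer` interface shown here are assumptions for illustration, not the torchaudio classes.

```python
def _convert_gray(frame):
    # Single plane: wrap it so the result always has a channel dimension.
    return [frame["data"]]


def _convert_rgb(frame):
    # Interleaved R, G, B samples are split into three channel planes.
    data = frame["data"]
    return [data[0::3], data[1::3], data[2::3]]


# Hypothetical format table; the format is looked up once per stream.
_CONVERTERS = {"gray8": _convert_gray, "rgb24": _convert_rgb}


class Buffer:
    def __init__(self, pixel_format):
        # The format check happens here, when the stream is configured.
        if pixel_format not in _CONVERTERS:
            raise ValueError(f"unsupported pixel format: {pixel_format}")
        self._convert = _CONVERTERS[pixel_format]
        self.frames = []

    def push(self, frame):
        # Decoding path: no if/switch on the format, just one call.
        self.frames.append(self._convert(frame))


buf = Buffer("rgb24")
buf.push({"data": [1, 2, 3, 4, 5, 6]})
print(buf.frames)
```

An unsupported format is rejected when the stream is configured rather than on the first decoded frame, and the per-frame path carries no format checks.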
-