Improve the performance of NV12 frame conversion (#3344)
Summary:
Similar to https://github.com/pytorch/audio/pull/3342, this commit improves the performance of NV12 frame conversion.

It changes two things:
- Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
- Get rid of intermediate UV plane copy

with 320x240

run | main (s) | pr (s) | improvement
-- | -- | -- | --
1 | 0.600671417 | 0.464993125 | 22.59%
2 | 0.638846084 | 0.456763542 | 28.50%
3 | 0.64158175 | 0.458295333 | 28.57%
4 | 0.649868584 | 0.455450583 | 29.92%
5 | 0.612171333 | 0.462435625 | 24.46%
6 | 0.6128095 | 0.456716166 | 25.47%
7 | 0.632084583 | 0.463357083 | 26.69%
8 | 0.610733083 | 0.46148625 | 24.44%
9 | 0.613825834 | 0.4559555 | 25.72%
10 | 0.653857458 | 0.455375375 | 30.36%

with 1080x720 video

run | main (s) | pr (s) | improvement
-- | -- | -- | --
1 | 4.984154333 | 4.21090375 | 15.51%
2 | 4.988090625 | 4.239649375 | 15.00%
3 | 4.988896375 | 4.227277458 | 15.27%
4 | 4.998186584 | 4.161077042 | 16.75%
5 | 5.06180425 | 4.191672584 | 17.19%
6 | 5.108769667 | 4.198468458 | 17.82%
7 | 5.151363625 | 4.181942167 | 18.82%
8 | 5.199527875 | 4.239319084 | 18.47%
9 | 5.224903708 | 4.194901959 | 19.71%
10 | 5.333422583 | 4.320925792 | 18.98%

<details><summary>code</summary>

```python
import time

from torchaudio.io import StreamReader


def test():
    r = StreamReader(src="testsrc=duration=30", format="lavfi")
    # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
    r.add_video_stream(-1, filter_desc="format=nv12")

    t0 = time.monotonic()
    r.process_all_packets()
    elapsed = time.monotonic() - t0
    print(elapsed)


for _ in range(10):
    test()
```

</details>

<details><summary>env</summary>

```
PyTorch version: 2.1.0.dev20230325
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.6
CMake version: version 3.22.1
Libc version: N/A

Python version: 3.9.16 (main, Mar 8 2023, 04:29:24) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1

Versions of relevant libraries:
[pip3] torch==2.1.0.dev20230325
[pip3] torchaudio==2.1.0a0+541b525
[conda] pytorch 2.1.0.dev20230325 py3.9_0 pytorch-nightly
[conda] torchaudio 2.1.0a0+541b525 dev_0 <develop>
```

</details>

Pull Request resolved: https://github.com/pytorch/audio/pull/3344

Reviewed By: xiaohui-zhang

Differential Revision: D45948511

Pulled By: mthrok

fbshipit-source-id: ae9b300cbcb4295f3f7470736f258280005a21e5
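To illustrate the first change in the summary (nearest-neighbor upsampling done by manual data copy rather than a generic interpolation call), here is a minimal pure-Python sketch. It is not the C++ code from this commit. The function name `nv12_to_planar_yuv444` is hypothetical, and the sketch assumes a tightly packed NV12 buffer (no row padding): a full-resolution Y plane followed by a half-resolution plane of interleaved U and V samples. Each chroma sample is copied into a 2x2 block of the output.

```python
from typing import Tuple


def nv12_to_planar_yuv444(
    buf: bytes, width: int, height: int
) -> Tuple[bytearray, bytearray, bytearray]:
    """Split a tightly packed NV12 buffer into full-resolution Y, U, V planes.

    Hypothetical helper for illustration only. Chroma is upsampled with
    nearest-neighbor by copying each sample into a 2x2 block.
    """
    if width % 2 or height % 2:
        raise ValueError("NV12 requires even width and height")
    y_size = width * height
    if len(buf) != y_size * 3 // 2:
        raise ValueError("buffer size does not match a packed NV12 frame")

    y = bytearray(buf[:y_size])
    u = bytearray(y_size)
    v = bytearray(y_size)

    for row in range(height // 2):
        # Each row of the interleaved UV plane holds width/2 (U, V) pairs,
        # which is exactly `width` bytes.
        src = y_size + row * width
        u_row = bytearray(width)
        v_row = bytearray(width)
        for col in range(width // 2):
            cu = buf[src + 2 * col]
            cv = buf[src + 2 * col + 1]
            # Horizontal duplication: one chroma sample covers two pixels.
            u_row[2 * col] = u_row[2 * col + 1] = cu
            v_row[2 * col] = v_row[2 * col + 1] = cv
        # Vertical duplication: the same row is written to two output rows.
        top = 2 * row * width
        bottom = top + width
        u[top:top + width] = u_row
        u[bottom:bottom + width] = u_row
        v[top:top + width] = v_row
        v[bottom:bottom + width] = v_row

    return y, u, v
```

Reading U and V directly from the interleaved plane while duplicating also avoids materializing a separate de-interleaved half-resolution copy first, which is the idea behind the second change in the summary. Decoded frames from real decoders often carry per-row padding (a line size larger than the width), which this sketch does not handle.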