- 10 May, 2022 1 commit
-
-
moto authored
Summary: This commits add `hw_accel` option to `Streamer::add_video_stream` method. Specifying `hw_accel="cuda"` allows to create the chunk Tensor directly from CUDA, when the following conditions are met. 1. the video format is H264, 2. underlying ffmpeg is compiled with NVENC, and 3. the client code specifies `decoder="h264_cuvid"`. A simple benchmark yields x7 improvement in the decoding speed. <details> ```python import time from torchaudio.prototype.io import Streamer srcs = [ "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4", "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4", # offline version ] patterns = [ ("h264_cuvid", None, "cuda:0"), # NVDEC on CUDA:0 -> CUDA:0 ("h264_cuvid", None, "cuda:1"), # NVDEC on CUDA:1 -> CUDA:1 ("h264_cuvid", None, None), # NVDEC -> CPU (None, None, None), # CPU ] for src in srcs: print(src, flush=True) for (decoder, decoder_options, hw_accel) in patterns: s = Streamer(src) s.add_video_stream(5, decoder=decoder, decoder_options=decoder_options, hw_accel=hw_accel) t0 = time.monotonic() num_frames = 0 for i, (chunk, ) in enumerate(s.stream()): num_frames += chunk.shape[0] t1 = time.monotonic() print(chunk.dtype, chunk.shape, chunk.device) print(time.monotonic() - t0, num_frames, flush=True) ``` </details> ``` https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4 torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0 10.781158386962488 6175 torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1 10.771313901990652 6175 torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu 27.88662809302332 6175 torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu 83.22728440898936 6175 ./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4 torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0 12.945253834011964 6175 torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1 12.870224556012545 6175 torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu 28.03406483103754 6175 torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu 82.6120332319988 6175 ``` With HW resizing <details> ```python import time from torchaudio.prototype.io import Streamer srcs = [ "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4", "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4", ] patterns = [ # Decode with NVDEC, CUDA HW scaling -> CUDA:0 ("h264_cuvid", {"resize": "960x540"}, "", "cuda:0"), # Decoded with NVDEC, CUDA HW scaling -> CPU ("h264_cuvid", {"resize": "960x540"}, "", None), # CPU decoding, CPU scaling (None, None, "scale=width=960:height=540", None), ] for src in srcs: print(src, flush=True) for (decoder, decoder_options, filter_desc, hw_accel) in patterns: s = Streamer(src) s.add_video_stream( 5, decoder=decoder, decoder_options=decoder_options, filter_desc=filter_desc, hw_accel=hw_accel, ) t0 = time.monotonic() num_frames = 0 for i, (chunk, ) in enumerate(s.stream()): num_frames += chunk.shape[0] t1 = time.monotonic() print(chunk.dtype, chunk.shape, chunk.device) print(time.monotonic() - t0, num_frames, flush=True) ``` </details> ``` ./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4 torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0 12.890056837990414 6175 torch.uint8 torch.Size([5, 3, 540, 960]) cpu 10.697489063022658 6175 torch.uint8 torch.Size([5, 3, 540, 960]) cpu 85.19899423001334 6175 https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4 torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0 10.712715593050234 6175 torch.uint8 torch.Size([5, 3, 540, 960]) cpu 11.030170071986504 6175 torch.uint8 torch.Size([5, 3, 540, 960]) cpu 84.8515750519582 6175 ``` Pull Request resolved: https://github.com/pytorch/audio/pull/2331 Reviewed By: hwangjeff Differential Revision: D36217169 Pulled By: mthrok fbshipit-source-id: 7979570b083cfc238ad4735b44305d8649f0607b
-
- 14 Apr, 2022 1 commit
-
-
moto authored
Summary: This commit adds support to specify decoder to Streamer's add stream method. This is roughly equivalent to `ffmpeg`'s `-c:v foo` and `-c:a foo` options. This allows to override the decoder codec and/or specify the option of the decoder. This change allows to specify Nvidia NVDEC codec for supported formats, which uses dedicated hardware for decoding the video. --- Note: The CL might look overwhelming, but it's essentially, add new parameters in Python, and pass them down all the way to `AVCodecContextPtr`, which initializes the actual decoder implementation (`AVCodecContext`.) Pull Request resolved: https://github.com/pytorch/audio/pull/2327 Reviewed By: carolineechen Differential Revision: D35626904 Pulled By: mthrok fbshipit-source-id: a115ed548624e53c16bacfecff5aa6c9d4e8bede
-
- 04 Mar, 2022 1 commit
-
-
moto authored
Summary: This commit adds the following behavior to `seek` so that `seek` works after a frame is decoded. 1. Flush the decoder buffer. 2. Recreate filter graphs (so that internal state is re-initialized) 3. Discard the buffered tensor. (decoded chunks) Also it disallows negative values for seek timestamp. Pull Request resolved: https://github.com/pytorch/audio/pull/2264 Reviewed By: carolineechen Differential Revision: D34497826 Pulled By: mthrok fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d
-
- 21 Jan, 2022 1 commit
-
-
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/2164. Removes debug code and associated arguments/fields left in previous PRs. Pull Request resolved: https://github.com/pytorch/audio/pull/2168 Reviewed By: hwangjeff Differential Revision: D33712999 Pulled By: mthrok fbshipit-source-id: 0729e9fbc146c48887379b6231e4d6e8cb520c44
-
- 05 Jan, 2022 1 commit
-
-
moto authored
Summary: MSVS is strict on header and would not allow `std::runtime_error` without `<stdexcept>`. Also this commit removes the stray `"libavutil/frame.h"`, inserted by IDE, which I missed. Ref https://github.com/pytorch/audio/issues/2124 Pull Request resolved: https://github.com/pytorch/audio/pull/2135 Reviewed By: carolineechen Differential Revision: D33419120 Pulled By: mthrok fbshipit-source-id: 328e723b70a3608133d9ddef9fc4a95e5d90e61d
-
- 29 Dec, 2021 1 commit
-
-
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review. Add StreamProcessor class that bundles `Buffer`, `FilterGraph` and `Decoder`. Note: The API to retrieve the buffered Tensors is tentative. For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md. Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later. Needs to be imported after https://github.com/pytorch/audio/issues/2044. Pull Request resolved: https://github.com/pytorch/audio/pull/2045 Reviewed By: carolineechen Differential Revision: D33299858 Pulled By: mthrok fbshipit-source-id: d85bececed475f45622743f137dd59cb1390ceed
-