Add HW acceleration support on Streamer (#2331)
Summary:
This commit adds a `hw_accel` option to the `Streamer::add_video_stream` method.
Specifying `hw_accel="cuda"` allows the chunk Tensor to be created directly on CUDA
when the following conditions are met:
1. the video format is H264,
2. the underlying FFmpeg is compiled with NVENC, and
3. the client code specifies `decoder="h264_cuvid"`.
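For illustration, a minimal usage sketch (the file path is a placeholder; it assumes an H264 source and an FFmpeg build that provides the `h264_cuvid` decoder):
```python
from torchaudio.prototype.io import Streamer

s = Streamer("input.mp4")    # placeholder path to an H264 video
s.add_video_stream(
    5,                       # frames per chunk
    decoder="h264_cuvid",    # decode with NVDEC
    hw_accel="cuda:0",       # produce chunks directly as CUDA tensors
)
for (chunk,) in s.stream():
    print(chunk.dtype, chunk.shape, chunk.device)  # e.g. torch.uint8, [5, 3, H, W], cuda:0
```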
A simple benchmark shows roughly a 7x improvement in decoding speed.
<details>
```python
import time
from torchaudio.prototype.io import Streamer
srcs = [
"https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
"./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4", # offline version
]
patterns = [
("h264_cuvid", None, "cuda:0"), # NVDEC on CUDA:0 -> CUDA:0
("h264_cuvid", None, "cuda:1"), # NVDEC on CUDA:1 -> CUDA:1
("h264_cuvid", None, None), # NVDEC -> CPU
(None, None, None), # CPU
]
for src in srcs:
print(src, flush=True)
for (decoder, decoder_options, hw_accel) in patterns:
s = Streamer(src)
s.add_video_stream(5, decoder=decoder, decoder_options=decoder_options, hw_accel=hw_accel)
t0 = time.monotonic()
num_frames = 0
for i, (chunk, ) in enumerate(s.stream()):
num_frames += chunk.shape[0]
t1 = time.monotonic()
print(chunk.dtype, chunk.shape, chunk.device)
print(time.monotonic() - t0, num_frames, flush=True)
```
</details>
```
https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
10.781158386962488 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
10.771313901990652 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
27.88662809302332 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
83.22728440898936 6175
./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
12.945253834011964 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
12.870224556012545 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
28.03406483103754 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
82.6120332319988 6175
```
With HW resizing (GPU-side scaling via the `resize` decoder option of `h264_cuvid`, compared against a CPU `scale` filter):
<details>
```python
import time
from torchaudio.prototype.io import Streamer
srcs = [
"./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
"https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
]
patterns = [
# Decode with NVDEC, CUDA HW scaling -> CUDA:0
("h264_cuvid", {"resize": "960x540"}, "", "cuda:0"),
# Decoded with NVDEC, CUDA HW scaling -> CPU
("h264_cuvid", {"resize": "960x540"}, "", None),
# CPU decoding, CPU scaling
(None, None, "scale=width=960:height=540", None),
]
for src in srcs:
print(src, flush=True)
for (decoder, decoder_options, filter_desc, hw_accel) in patterns:
s = Streamer(src)
s.add_video_stream(
5,
decoder=decoder,
decoder_options=decoder_options,
filter_desc=filter_desc,
hw_accel=hw_accel,
)
t0 = time.monotonic()
num_frames = 0
for i, (chunk, ) in enumerate(s.stream()):
num_frames += chunk.shape[0]
t1 = time.monotonic()
print(chunk.dtype, chunk.shape, chunk.device)
print(time.monotonic() - t0, num_frames, flush=True)
```
</details>
```
./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
12.890056837990414 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
10.697489063022658 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
85.19899423001334 6175
https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
10.712715593050234 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
11.030170071986504 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
84.8515750519582 6175
```
Pull Request resolved: https://github.com/pytorch/audio/pull/2331
Reviewed By: hwangjeff
Differential Revision: D36217169
Pulled By: mthrok
fbshipit-source-id: 7979570b083cfc238ad4735b44305d8649f0607b