torchaudio/csrc/ffmpeg/stream_reader/conversion.cpp · 7dff24cae4b40d858439dbfd5b2d8be72b57ab70 · OpenDAS / Torchaudio

Delay the initialization of CUDA tensor converter (#3419) · 7dff24ca

moto authored Jun 08, 2023

Summary:
The StreamReader decoding process consists of three steps:

1. Decode the incoming AVPacket into AVFrame
2. Pass the AVFrame through AVFilter for post-processing
3. Convert the resulting AVFrame

The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized when the output stream is defined, which allows the output stream shape to be retrieved.

For the CPU decoder, this works fine because resizing happens in step 2, so the resulting shape can be retrieved.
However, this is problematic for the GPU decoder, because resizing is currently done through a GPU decoder option (step 1) and there seems to be no interface for retrieving the output shape. The refactor therefore introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405

AVFilter internally adapts to changes in the input frame size. This commit changes the conversion process to behave similarly: it waits until the first frame arrives before finalizing the frame shape.
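
The idea of delaying shape finalization until the first frame can be sketched as follows. This is only an illustration under stated assumptions, not the code in conversion.cpp: `Frame`, `Buffer`, and `LazyConverter` are hypothetical stand-ins (a real converter would work with AVFrame and tensors), and rejecting a later change of frame size is an assumption of this sketch.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoded frame (not FFmpeg's AVFrame).
struct Frame {
  int height;
  int width;
  std::vector<unsigned char> data;  // height * width bytes, single plane
};

// Hypothetical output buffer (a real converter would produce a tensor).
struct Buffer {
  int height = 0;
  int width = 0;
  std::vector<unsigned char> data;
};

// Converter that does not need the frame shape at construction time.
class LazyConverter {
 public:
  LazyConverter() = default;

  // Returns (height, width) once known; empty before the first frame.
  std::optional<std::pair<int, int>> shape() const {
    if (!initialized_) {
      return std::nullopt;
    }
    return std::make_pair(height_, width_);
  }

  Buffer convert(const Frame& frame) {
    if (!initialized_) {
      // Finalize the shape from the first frame that arrives.
      height_ = frame.height;
      width_ = frame.width;
      initialized_ = true;
    }
    // Assumption of this sketch: the shape stays fixed after the first frame.
    assert(frame.height == height_ && frame.width == width_);
    Buffer out;
    out.height = height_;
    out.width = width_;
    out.data = frame.data;  // plain copy stands in for the real conversion
    return out;
  }

 private:
  bool initialized_ = false;
  int height_ = 0;
  int width_ = 0;
};
```

With this structure, a caller that asks for the shape before any frame has been decoded gets an empty result instead of a value guessed at construction time, which is the situation the GPU decoder path runs into when resizing happens inside the decoder.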

Fix https://github.com/pytorch/audio/issues/3405

Pull Request resolved: https://github.com/pytorch/audio/pull/3419

Differential Revision: D46557505

Pulled By: mthrok

fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6


conversion.cpp 21.9 KB
