    Delay the initialization of CUDA tensor converter (#3419) · 7dff24ca
    moto authored
    Summary:
    The StreamReader decoding process consists of three steps:
    
    1. Decode the incoming AVPacket into AVFrame
    2. Pass AVFrame through AVFilter to perform post process
    3. Convert the resulting AVFrame
    
    The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized at the time the output stream is defined, which allows the output stream shape to be retrieved.
    
    For the CPU decoder, this works fine because resizing happens in step 2, and the resulting shape can be retrieved.
    However, this is problematic for the GPU decoder, because resizing is currently done through a GPU decoder option (step 1), and there seems to be no interface for retrieving the output shape. The refactor therefore introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405
    
    AVFilter internally adapts to changes in the input frame size. This commit makes the conversion process behave similarly: it waits until the first frame arrives before finalizing the frame shape.
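
The deferred initialization described above can be illustrated with a minimal sketch. This is not the code from conversion.cpp: `Frame` and `LazyConverter` are hypothetical stand-ins (no FFmpeg or CUDA types are used), and the sketch only shows the general idea of finalizing the shape when the first frame arrives rather than when the output stream is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a decoded frame; NOT FFmpeg's AVFrame.
struct Frame {
  int height;
  int width;
  std::vector<unsigned char> data;  // height * width bytes, single plane
};

// Hypothetical converter that does not know the frame shape up front.
// The shape is finalized lazily, from the first frame it receives.
class LazyConverter {
 public:
  // Copies the frame into the internal buffer and returns the byte count.
  std::size_t convert(const Frame& frame) {
    if (!initialized_) {
      // First frame: take the shape the decoder actually produced.
      height_ = frame.height;
      width_ = frame.width;
      buffer_.resize(static_cast<std::size_t>(height_) * width_);
      initialized_ = true;
    }
    // This sketch assumes later frames keep the same shape.
    assert(frame.height == height_ && frame.width == width_);
    assert(frame.data.size() == buffer_.size());
    buffer_ = frame.data;
    return buffer_.size();
  }

  bool initialized() const { return initialized_; }
  int height() const { return height_; }  // -1 until the first frame
  int width() const { return width_; }    // -1 until the first frame

 private:
  bool initialized_ = false;
  int height_ = -1;
  int width_ = -1;
  std::vector<unsigned char> buffer_;
};
```

With this arrangement, the shape is unknown (reported as -1 in this sketch) until `convert` has been called once, which mirrors the trade-off in the commit: the output shape cannot be queried before the first frame is decoded.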
    
    Fix https://github.com/pytorch/audio/issues/3405
    
    Pull Request resolved: https://github.com/pytorch/audio/pull/3419
    
    Differential Revision: D46557505
    
    Pulled By: mthrok
    
    fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6
conversion.cpp 21.9 KB