Update the logic to fetch pixel format from filter graph (#3479)

Summary:
When using the GPU decoder in some environments, reading the output format from the filter graph failed because the software pixel format could not be determined.

We do not know the exact cause, but when it happens, the input link of the buffer sink does not have a HW frames context. Since no filter can currently convert the pixel format of a CUDA frame, we fall back to the HW frames context of the output link of the buffer source.

Environments where this was observed:

Env1
- OS: Fedora 36 (x86_64)
- GCC 12.2.1
- Python 3.10.12
- GPU: GeForce RTX 3070 Ti Laptop GPU
- FFmpeg: 5.1.3
- nv-codec-header: n11.1.5.2
- CUDA: 12.1

Env2
- Ubuntu 20.04.4 LTS (x86_64)
- GCC 9.4.0
- Python 3.11.3
- GPU: Quadro GV100
- FFmpeg: 5.1.3
- nv-codec-header: n11.1.5.2
- CUDA: 11.4

Pull Request resolved: https://github.com/pytorch/audio/pull/3479
Differential Revision: D47482407
Pulled By: mthrok
fbshipit-source-id: 1c53096b27824453b260138ab64e1948afeeefc7
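
The fallback described in the summary can be sketched in isolation as follows. This is a minimal illustration, not the code from this commit: `PixFmt`, `HwFramesCtx`, `Link`, and `resolve_sw_format` are hypothetical stand-ins for the FFmpeg types (`AVPixelFormat`, `AVHWFramesContext`, `AVFilterLink`) so that the snippet compiles without libavfilter. It also assumes a null check on the source link, which the actual change does not perform.

```cpp
#include <optional>

// Hypothetical stand-ins for the FFmpeg types; these are NOT the real structs.
enum class PixFmt { Cuda, Nv12, P010 };

struct HwFramesCtx {
  PixFmt sw_format;  // software pixel format behind the HW frames
};

struct Link {
  PixFmt format;
  const HwFramesCtx* hw_frames_ctx;  // may be null if the context was not propagated
};

// Resolve the software pixel format reported for the filter graph output.
// sink_in: input link of the buffer sink.
// src_out: output link of the buffer source.
std::optional<PixFmt> resolve_sw_format(const Link& sink_in, const Link& src_out) {
  // Non-CUDA frames already carry a software pixel format.
  if (sink_in.format != PixFmt::Cuda) {
    return sink_in.format;
  }
  // Preferred: the HW frames context propagated to the sink link.
  if (sink_in.hw_frames_ctx) {
    return sink_in.hw_frames_ctx->sw_format;
  }
  // Fallback: the HW frames context on the source link.
  // The null check here is an assumption added for this sketch.
  if (src_out.hw_frames_ctx) {
    return src_out.hw_frames_ctx->sw_format;
  }
  return std::nullopt;  // the software format cannot be determined
}
```

With a sink link that reports CUDA but carries no HW frames context, the function returns the `sw_format` of the source link; it returns an empty optional only when neither link has a context.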

Commit cf53a486 authored Jul 14, 2023 by moto Committed by Facebook GitHub Bot Jul 14, 2023

Showing with 20 additions and 2 deletions

torchaudio/csrc/ffmpeg/filter_graph.cpp +20 -2

--- a/torchaudio/csrc/ffmpeg/filter_graph.cpp
+++ b/torchaudio/csrc/ffmpeg/filter_graph.cpp
@@ -195,8 +195,26 @@ FilterGraphOutputInfo FilterGraph::get_output_info() const {
      break;
    }
    case AVMEDIA_TYPE_VIDEO: {
-      if (l->format == AV_PIX_FMT_CUDA && l->hw_frames_ctx) {
-        auto frames_ctx = (AVHWFramesContext*)(l->hw_frames_ctx->data);
+      // If this is CUDA, retrieve the software pixel format from HW frames
+      // context.
+      if (l->format == AV_PIX_FMT_CUDA) {
+        // Originally, we were expecting that filter graph would propagate the
+        // HW frames context, so that we can retrieve it from the sink link.
+        // However, this is sometimes not the case.
+        // We do not know what is causing this behavior (GPU? libavfilter?
+        // format?), so we resort to the source link in such cases.
+        //
+        // (Technically, filters like scale_cuda could change the pixel format.
+        // We expect that hw_frames_ctx is propagated in such cases, but we do
+        // not know.)
+        // TODO: check how scale_cuda interferes.
+        auto frames_ctx = [&]() -> AVHWFramesContext* {
+          if (l->hw_frames_ctx) {
+            return (AVHWFramesContext*)(l->hw_frames_ctx->data);
+          }
+          return (AVHWFramesContext*)(buffersrc_ctx->outputs[0]
+                                          ->hw_frames_ctx->data);
+        }();
        ret.format = frames_ctx->sw_format;
      }
      ret.frame_rate = l->frame_rate;