Commit c76fd58b authored by moto's avatar moto Committed by Facebook GitHub Bot

Reuse HW device context in GPU encoder (#3215)

Summary:
In https://github.com/pytorch/audio/issues/3178, a mechanism to cache HW device context was introduced.
This commit applies that reuse to StreamWriter, so that
GPU video decoding and encoding share a single device context.

This gives back about 250 - 300 MB of GPU memory.

 ---

Q: What is HW device context?
From https://ffmpeg.org/doxygen/4.1/structAVHWDeviceContext.html#details
> This struct aggregates all the (hardware/vendor-specific) "high-level" state, i.e.
>
> state that is not tied to a concrete processing configuration. E.g., in an API that supports hardware-accelerated encoding and decoding, this struct will (if possible) wrap the state that is common to both encoding and decoding and from which specific instances of encoders or decoders can be derived.

Pull Request resolved: https://github.com/pytorch/audio/pull/3215

Reviewed By: nateanl

Differential Revision: D44504051

Pulled By: mthrok

fbshipit-source-id: 77579cdc8bd9e9b8a218e3f29031d091cda83860
parent c07a96ab
```diff
@@ -122,6 +122,8 @@ void configure_codec_context(
   // will retrieve the HW pixel format from opaque pointer.
   codec_ctx->get_format = get_hw_format;
   codec_ctx->hw_device_ctx = av_buffer_ref(get_cuda_context(device.index()));
+  TORCH_INTERNAL_ASSERT(
+      codec_ctx->hw_device_ctx, "Failed to reference HW device context.");
 #endif
 }
 }
```
```diff
+#include <torchaudio/csrc/ffmpeg/hw_context.h>
 #include <torchaudio/csrc/ffmpeg/stream_writer/encode_process.h>

 namespace torchaudio::io {
@@ -460,15 +461,9 @@ void configure_hw_accel(AVCodecContext* ctx, const std::string& hw_accel) {
   // context to AVCodecContext. But this way, it will be deallocated
   // automatically at the time AVCodecContext is freed, so we do that.
-  int ret = av_hwdevice_ctx_create(
-      &ctx->hw_device_ctx,
-      AV_HWDEVICE_TYPE_CUDA,
-      std::to_string(device.index()).c_str(),
-      nullptr,
-      0);
-  TORCH_CHECK(
-      ret >= 0, "Failed to create CUDA device context: ", av_err2string(ret));
-  assert(ctx->hw_device_ctx);
+  ctx->hw_device_ctx = av_buffer_ref(get_cuda_context(device.index()));
+  TORCH_INTERNAL_ASSERT(
+      ctx->hw_device_ctx, "Failed to reference HW device context.");
   ctx->sw_pix_fmt = ctx->pix_fmt;
   ctx->pix_fmt = AV_PIX_FMT_CUDA;
@@ -483,7 +478,7 @@ void configure_hw_accel(AVCodecContext* ctx, const std::string& hw_accel) {
   frames_ctx->height = ctx->height;
   frames_ctx->initial_pool_size = 5;
-  ret = av_hwframe_ctx_init(ctx->hw_frames_ctx);
+  int ret = av_hwframe_ctx_init(ctx->hw_frames_ctx);
   TORCH_CHECK(
       ret >= 0,
       "Failed to initialize CUDA frame context: ",
```