Commit c76fd58b authored by moto's avatar moto Committed by Facebook GitHub Bot

Reuse HW device context in GPU encoder (#3215)

Summary:
In https://github.com/pytorch/audio/issues/3178, a mechanism to cache HW device context was introduced.
This commit applies that reuse to StreamWriter, so that
GPU video decoding and encoding share a single device context.

This gives back about 250 - 300 MB of GPU memory.

 ---

Q: What is HW device context?
From https://ffmpeg.org/doxygen/4.1/structAVHWDeviceContext.html#details
> This struct aggregates all the (hardware/vendor-specific) "high-level" state, i.e.
>
> state that is not tied to a concrete processing configuration. E.g., in an API that supports hardware-accelerated encoding and decoding, this struct will (if possible) wrap the state that is common to both encoding and decoding and from which specific instances of encoders or decoders can be derived.

Pull Request resolved: https://github.com/pytorch/audio/pull/3215

Reviewed By: nateanl

Differential Revision: D44504051

Pulled By: mthrok

fbshipit-source-id: 77579cdc8bd9e9b8a218e3f29031d091cda83860
parent c07a96ab
```diff
@@ -122,6 +122,8 @@ void configure_codec_context(
   // will retrieve the HW pixel format from opaque pointer.
   codec_ctx->get_format = get_hw_format;
   codec_ctx->hw_device_ctx = av_buffer_ref(get_cuda_context(device.index()));
+  TORCH_INTERNAL_ASSERT(
+      codec_ctx->hw_device_ctx, "Failed to reference HW device context.");
 #endif
 }
 }
```
```diff
+#include <torchaudio/csrc/ffmpeg/hw_context.h>
 #include <torchaudio/csrc/ffmpeg/stream_writer/encode_process.h>

 namespace torchaudio::io {
@@ -460,15 +461,9 @@ void configure_hw_accel(AVCodecContext* ctx, const std::string& hw_accel) {
   // context to AVCodecContext. But this way, it will be deallocated
   // automatically at the time AVCodecContext is freed, so we do that.
-  int ret = av_hwdevice_ctx_create(
-      &ctx->hw_device_ctx,
-      AV_HWDEVICE_TYPE_CUDA,
-      std::to_string(device.index()).c_str(),
-      nullptr,
-      0);
-  TORCH_CHECK(
-      ret >= 0, "Failed to create CUDA device context: ", av_err2string(ret));
-  assert(ctx->hw_device_ctx);
+  ctx->hw_device_ctx = av_buffer_ref(get_cuda_context(device.index()));
+  TORCH_INTERNAL_ASSERT(
+      ctx->hw_device_ctx, "Failed to reference HW device context.");
   ctx->sw_pix_fmt = ctx->pix_fmt;
   ctx->pix_fmt = AV_PIX_FMT_CUDA;
@@ -483,7 +478,7 @@ void configure_hw_accel(AVCodecContext* ctx, const std::string& hw_accel) {
   frames_ctx->height = ctx->height;
   frames_ctx->initial_pool_size = 5;
-  ret = av_hwframe_ctx_init(ctx->hw_frames_ctx);
+  int ret = av_hwframe_ctx_init(ctx->hw_frames_ctx);
   TORCH_CHECK(
       ret >= 0,
       "Failed to initialize CUDA frame context: ",
```