Commit 4609daf7 authored by moto, committed by Facebook GitHub Bot

Update HW video processing tutorial (#2739)

Summary:
* Add HW encoding to HW tutorial

https://colab.research.google.com/drive/1DDah_IaGULEO66CfQWltRqaVheBkiXdN#scrollTo=eXzKSVrHk1vS

Pull Request resolved: https://github.com/pytorch/audio/pull/2739

Reviewed By: hwangjeff

Differential Revision: D40197086

Pulled By: hwangjeff

fbshipit-source-id: 1780a5419f6705f7c24ba96bd46c3310438af7db
parent 8ef6de9f
......@@ -6,29 +6,32 @@
"id": "3BxT9HiUjMyv"
},
"source": [
"# Accelerated Video Decoding with NVDEC\n",
"# Hardware-Accelerated Video Decoding and Encoding\n",
"\n",
"This tutorial shows how to use Nvidia's hardware video decoding (NVDEC)† with TorchAudio.\n",
"This tutorial shows how to use NVIDIA's hardware video decoder (NVDEC) and encoder (NVENC) with TorchAudio.\n",
"\n",
"Using hardware encoder/decoder improves the speed of loading and saving certain types of videos. Using them in TorchAduio requires additional FFmpeg configuration. This tutorial goes over how to compile FFmpeg, and compare the speed it takes to process video.\n",
"\n",
"**WARNING**\n",
"\n",
 This tutorial instsalls">
"> This tutorial installs FFmpeg in a system directory.\n",
"> If you run this tutorial on your system, please adjust the build configuration accordingly.\n",
"\n",
"**NOTE**\n",
"\n",
"> This tutorial is authored in Google Colab, and is tailored to Google Colab's specifications.\n",
">\n",
"> Please check out this tutorial in [Google Colab](https://colab.research.google.com/drive/1DDah_IaGULEO66CfQWltRqaVheBkiXdN#sandboxMode=true).\n",
">\n",
"> If you install FFmpeg following this tutorial, please adjust the build configuration accordingly.\n",
"> This tutorial was authored in Google Colab, and is tailored to Google Colab's specifications. Please check out this tutorial in [Google Colab](https://colab.research.google.com/drive/1DDah_IaGULEO66CfQWltRqaVheBkiXdN#sandboxMode=true).\n",
"\n",
"To use NVDEC with TorchAudio, the following items are required.\n",
"To use NVENC/NVDEC with TorchAudio, the following items are required.\n",
"\n",
"1. Nvidia GPU with hardware video encoder.\n",
"2. FFmpeg libraries compiled with NVDEC support.\n",
"1. NVIDIA GPU with hardware video decoder/encoder.\n",
"2. FFmpeg libraries compiled with NVDEC/NVENC support.\n",
"3. PyTorch / TorchAudio with CUDA support.\n",
"\n",
"TorchAudio's binary distributions are compiled against FFmpeg 4 libraries, and they contain the logic required for hardware-based decoding.\n",
"TorchAudio's official binary distributions are compiled with FFmpeg 4 libraries, and they contain the logic required for hardware-based decoding/encoding.\n",
"\n",
"In the following sections, we build FFmpeg 4 libraries with NVDEC support and enable hardware acceleration through TorchAudio's `StreamReader` API. We then compare the time it takes to decode the same MP4 video with CPU and NVDEC.\n",
"In the following sections, we build FFmpeg 4 libraries with NVDEC/NVENC support, then we demonstrate the performance imrovement using TorchAudio's `StreamReader`/`StreamWriter`.\n",
"\n",
"† For details on NVDEC and FFmpeg, please refer to the following articles.\n",
"† For details on NVDEC/NVENC and FFmpeg, please refer to the following articles.\n",
"\n",
"* https://docs.nvidia.com/video-technologies/video-codec-sdk/nvdec-video-decoder-api-prog-guide/\n",
"* https://docs.nvidia.com/video-technologies/video-codec-sdk/ffmpeg-with-nvidia-gpu/#compiling-ffmpeg\n",
......@@ -52,14 +55,14 @@
"base_uri": "https://localhost:8080/"
},
"id": "gCegZlbgutNM",
"outputId": "8a8977be-a568-4ec4-b641-a7dc19b03e3f"
"outputId": "0eca8ead-4671-4b83-ca9b-a602642844a5"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Thu Jun 2 04:14:27 2022 \n",
"Fri Oct 7 13:01:26 2022 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
......@@ -68,7 +71,7 @@
"| | | MIG M. |\n",
"|===============================+======================+======================|\n",
"| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 56C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |\n",
"| N/A 56C P8 10W / 70W | 0MiB / 15109MiB | 0% Default |\n",
"| | | N/A |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
......@@ -88,14 +91,14 @@
},
{
"cell_type": "markdown",
"metadata": {
"id": "S8sX7UqrSPck"
},
"source": [
"## Update PyTorch and TorchAudio with nightly builds\n",
"\n",
"Until TorchAudio 0.12 is released, we need to use the nightly builds of PyTorch and TorchAudio."
]
"Until TorchAudio 0.13 is released, we need to use the nightly builds of PyTorch and TorchAudio."
],
"metadata": {
"id": "8EI-DwaeQbjp"
}
},
{
"cell_type": "code",
......@@ -104,30 +107,33 @@
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "TdQUL-fMSTsn",
"outputId": "937c0450-09c4-4cc0-ad8b-2731b4ff531b"
"id": "fACIAl1dgYt-",
"outputId": "7ff2cb88-33e5-4b8c-9542-4f53cce09cfa"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/, https://download.pytorch.org/whl/nightly/cu113\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/, https://download.pytorch.org/whl/nightly/cu116\n",
"Collecting torch\n",
" Downloading https://download.pytorch.org/whl/nightly/cu113/torch-1.13.0.dev20220601%2Bcu113-cp37-cp37m-linux_x86_64.whl (2102.2 MB)\n",
" Downloading https://download.pytorch.org/whl/nightly/cu116/torch-1.14.0.dev20221007%2Bcu116-cp37-cp37m-linux_x86_64.whl (2286.1 MB)\n",
"\u001b[?25l\n",
"\u001b[?25hCollecting torchaudio\n",
" Downloading https://download.pytorch.org/whl/nightly/cu113/torchaudio-0.12.0.dev20220601%2Bcu113-cp37-cp37m-linux_x86_64.whl (3.8 MB)\n",
" Downloading https://download.pytorch.org/whl/nightly/cu116/torchaudio-0.13.0.dev20221006%2Bcu116-cp37-cp37m-linux_x86_64.whl (4.2 MB)\n",
"\u001b[?25l\n",
"\u001b[?25hRequirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch) (4.2.0)\n",
"Installing collected packages: torch, torchaudio\n",
"Successfully installed torch-1.13.0.dev20220601+cu113 torchaudio-0.12.0.dev20220601+cu113\n"
"\u001b[?25hRequirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch) (4.1.1)\n",
"Collecting torch\n",
" Downloading https://download.pytorch.org/whl/nightly/cu116/torch-1.13.0.dev20221006%2Bcu116-cp37-cp37m-linux_x86_64.whl (1983.0 MB)\n",
"\u001b[?25l\n",
"\u001b[?25hInstalling collected packages: torch, torchaudio\n",
"Successfully installed torch-1.13.0.dev20221006+cu116 torchaudio-0.13.0.dev20221006+cu116\n"
]
}
],
"source": [
"!pip3 uninstall -y -q torchaudio torch\n",
"!pip3 install --progress-bar off --pre torch torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu113 2> /dev/null"
"!pip uninstall -y -q torch torchaudio torchvision torchtext\n",
"!pip install --progress-bar off --pre torch torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu116 2> /dev/null"
]
},
{
......@@ -136,7 +142,7 @@
"id": "RAziwx_MqJzr"
},
"source": [
"## Build FFmpeg libraries with Nvidia NVDEC support\n"
"## Build FFmpeg libraries with Nvidia NVDEC/NVENC support\n"
]
},
{
......@@ -147,7 +153,7 @@
"source": [
"### Install NVIDIA Video Codec Headers\n",
"\n",
"To build FFmpeg with NVDEC, we first install the headers that FFmpeg uses to interact with Video Codec SDK."
"To build FFmpeg with NVDEC/NVENC, we first install the headers that FFmpeg uses to interact with Video Codec SDK."
]
},
{
......@@ -158,7 +164,7 @@
"base_uri": "https://localhost:8080/"
},
"id": "vgbusLNCq2mV",
"outputId": "3ff90684-8194-4b6c-9e1f-f295d94df2b0"
"outputId": "04b9c2b6-bc18-4dd0-ea8f-4b320559ebb3"
},
"outputs": [
{
......@@ -166,12 +172,24 @@
"name": "stdout",
"text": [
"Cloning into 'nv-codec-headers'...\n",
"remote: Enumerating objects: 808, done.\u001b[K\n",
"remote: Counting objects: 100% (808/808), done.\u001b[K\n",
"remote: Compressing objects: 100% (688/688), done.\u001b[K\n",
"remote: Total 808 (delta 436), reused 0 (delta 0)\n",
"Receiving objects: 100% (808/808), 154.86 KiB | 396.00 KiB/s, done.\n",
"Resolving deltas: 100% (436/436), done.\n",
"remote: Enumerating objects: 819, done.\u001b[K\n",
"remote: Counting objects: 100% (819/819), done.\u001b[K\n",
"remote: Compressing objects: 100% (697/697), done.\u001b[K\n",
"remote: Total 819 (delta 439), reused 0 (delta 0)\u001b[K\n",
"Receiving objects: 100% (819/819), 156.42 KiB | 410.00 KiB/s, done.\n",
"Resolving deltas: 100% (439/439), done.\n",
"Note: checking out 'n11.0.10.1'.\n",
"\n",
"You are in 'detached HEAD' state. You can look around, make experimental\n",
"changes and commit them, and you can discard any commits you make in this\n",
"state without impacting any branches by performing another checkout.\n",
"\n",
"If you want to create a new branch to retain commits you create, you may\n",
"do so (now or later) by using -b with the checkout command again. Example:\n",
"\n",
" git checkout -b <new-branch-name>\n",
"\n",
"HEAD is now at 315ad74 add cuMemcpy\n",
"sed 's#@@PREFIX@@#/usr/local#' ffnvcodec.pc.in > ffnvcodec.pc\n",
"install -m 0755 -d '/usr/local/include/ffnvcodec'\n",
"install -m 0644 include/ffnvcodec/*.h '/usr/local/include/ffnvcodec'\n",
......@@ -182,7 +200,8 @@
],
"source": [
"!git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git\n",
"!cd nv-codec-headers && sudo make install"
"# Note: Google Colab's GPU has NVENC API ver 11.0, so we checkout 11.0 tag.\n",
"!cd nv-codec-headers && git checkout n11.0.10.1 && sudo make install"
]
},
{
......@@ -217,7 +236,7 @@
"source": [
"### Install FFmpeg build and runtime dependencies\n",
"\n",
"In the later test, we use H264-encoded MP4 video streamed over HTTPS protocol, so we install the libraries for them here."
"In the later test, we use H264 video codec and HTTPS protocol, so we install the libraries for them here."
]
},
{
......@@ -228,35 +247,34 @@
"base_uri": "https://localhost:8080/"
},
"id": "Vn50oz2UvvPP",
"outputId": "afccca8a-b272-425f-9c5e-14c919db60cc"
"outputId": "b2e695d9-c953-4370-fb8d-92cdd8a832be"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"STRIP\tinstall-libavutil-shared\n",
"\n",
"... Omitted for brevity ...\n",
"\n",
"\n",
"Setting up libx264-dev:amd64 (2:0.152.2854+gite9a5903-2) ...\n",
"Setting up yasm (1.3.0-2build1) ...\n",
"Setting up libunbound2:amd64 (1.6.7-1ubuntu2.4) ...\n",
"Setting up libunbound2:amd64 (1.6.7-1ubuntu2.5) ...\n",
"Setting up libp11-kit-dev:amd64 (0.23.9-2ubuntu0.1) ...\n",
"Setting up libtasn1-6-dev:amd64 (4.13-2) ...\n",
"Setting up libtasn1-doc (4.13-2) ...\n",
"Setting up libgnutlsxx28:amd64 (3.5.18-1ubuntu1.5) ...\n",
"Setting up libgnutls-dane0:amd64 (3.5.18-1ubuntu1.5) ...\n",
"Setting up libgnutls-openssl27:amd64 (3.5.18-1ubuntu1.5) ...\n",
"Setting up libgnutlsxx28:amd64 (3.5.18-1ubuntu1.6) ...\n",
"Setting up libgnutls-dane0:amd64 (3.5.18-1ubuntu1.6) ...\n",
"Setting up libgnutls-openssl27:amd64 (3.5.18-1ubuntu1.6) ...\n",
"Setting up libgmpxx4ldbl:amd64 (2:6.1.2+dfsg-2) ...\n",
"Setting up libidn2-dev:amd64 (2.0.4-1.1ubuntu0.2) ...\n",
"Setting up libidn2-0-dev (2.0.4-1.1ubuntu0.2) ...\n",
"Setting up libgmp-dev:amd64 (2:6.1.2+dfsg-2) ...\n",
"Setting up nettle-dev:amd64 (3.4.1-0ubuntu0.18.04.1) ...\n",
"Setting up libgnutls28-dev:amd64 (3.5.18-1ubuntu1.5) ...\n",
"Setting up libgnutls28-dev:amd64 (3.5.18-1ubuntu1.6) ...\n",
"Processing triggers for man-db (2.8.3-2ubuntu0.1) ...\n",
"Processing triggers for libc-bin (2.27-3ubuntu1.3) ...\n",
"/sbin/ldconfig.real: /usr/local/lib/python3.7/dist-packages/ideep4py/lib/libmkldnn.so.0 is not a symbolic link\n",
"\n"
"Processing triggers for libc-bin (2.27-3ubuntu1.6) ...\n"
]
}
],
......@@ -275,11 +293,12 @@
"\n",
"Next we configure FFmpeg build. Note the following:\n",
"\n",
"1. We provide flags like `-I/usr/local/cuda/include`, `-L/usr/local/cuda/lib64` and `--enable-nvdec` to enable NVDEC. Please check out the Transcoding Guide† for the detail.\n",
"2. We also provide NVCC flags with compute capability 37. \n",
"1. We provide flags like `-I/usr/local/cuda/include`, `-L/usr/local/cuda/lib64` to let the build process know where the CUDA libraries are found.\n",
"2. We provide flags like `--enable-nvdec` and `--enable-nvenc` to enable NVDEC/NVENC. Please check out the Transcoding Guide† for the detail.\n",
"3. We also provide NVCC flags with compute capability 37. \n",
"This is because by default the configuration script verifies NVCC by compiling sample code targeting compute capability 30, which is too old for CUDA 11.\n",
"3. Many features are disabled to reduce the compilation time.\n",
"4. We install the library in `/usr/lib/`, which is one of the active search path of the dynamic loader. \n",
"4. Many features are disabled to reduce the compilation time.\n",
"5. We install the library in `/usr/lib/`, which is one of the active search path of the dynamic loader. \n",
"Doing so allows the resulting libraries to be found without requiring a restart of the current session. This might be an undesirable location, e.g. when one isn't using a disposable VM.\n",
"\n",
"† NVIDIA FFmpeg Transcoding Guide https://developer.nvidia.com/blog/nvidia-ffmpeg-transcoding-guide/"
......@@ -293,7 +312,7 @@
"base_uri": "https://localhost:8080/"
},
"id": "KZr8bMdztRz1",
"outputId": "ea48c157-54bc-42aa-dbe6-235514de418a"
"outputId": "d542338b-bfcd-4176-c349-4a9235fcf411"
},
"outputs": [
{
......@@ -350,9 +369,9 @@
"iconv libxcb_xfixes\n",
"\n",
"External libraries providing hardware acceleration:\n",
"cuda cuvid nvdec\n",
"cuda_llvm ffnvcodec nvenc\n",
"cuda_nvcc libnpp v4l2_m2m\n",
"cuda cuvid nvenc\n",
"cuda_llvm ffnvcodec v4l2_m2m\n",
"cuda_nvcc nvdec\n",
"\n",
"Libraries:\n",
"avcodec avformat swscale\n",
......@@ -363,13 +382,14 @@
"ffmpeg ffprobe\n",
"\n",
"Enabled decoders:\n",
"aac hevc vc1\n",
"av1 mjpeg vp8\n",
"h263 mpeg1video vp9\n",
"h264 mpeg2video\n",
"aac hevc rawvideo\n",
"av1 mjpeg vc1\n",
"h263 mpeg1video vp8\n",
"h264 mpeg2video vp9\n",
"h264_cuvid mpeg4\n",
"\n",
"Enabled encoders:\n",
"h264_nvenc libx264\n",
"\n",
"Enabled hwaccels:\n",
"av1_nvdec mpeg1_nvdec vp8_nvdec\n",
......@@ -384,21 +404,24 @@
"mov\n",
"\n",
"Enabled muxers:\n",
"mov mp4\n",
"\n",
"Enabled protocols:\n",
"file tcp\n",
"https tls\n",
"\n",
"Enabled filters:\n",
"aformat hflip trim\n",
"anull null vflip\n",
"atrim scale\n",
"format transpose\n",
"aformat hflip transpose\n",
"anull null trim\n",
"atrim scale vflip\n",
"format testsrc2\n",
"\n",
"Enabled bsfs:\n",
"h264_mp4toannexb null vp9_superframe_split\n",
"aac_adtstoasc null vp9_superframe_split\n",
"h264_mp4toannexb vp9_superframe\n",
"\n",
"Enabled indevs:\n",
"lavfi\n",
"\n",
"Enabled outdevs:\n",
"\n",
......@@ -421,9 +444,18 @@
"# NOTE:\n",
"# We disable most of the features to speed up compilation\n",
"# The necessary components are\n",
"# - demuxer: mov\n",
"# - decoder: h264\n",
"# - demuxer: mov, aac\n",
"# - decoder: h264, h264_nvdec\n",
"# - muxer: mp4\n",
"# - encoder: libx264, h264_nvenc\n",
"# - gnutls (HTTPS)\n",
"#\n",
"# Additionally, we use FFmpeg's virtual input device to generate \n",
"# test video data. This requires\n",
"# - input device: lavfi\n",
"# - filter: testsrc2\n",
"# - decoder: rawvideo\n",
"#\n",
"\n",
"!cd ffmpeg && ./configure \\\n",
" --prefix='/usr/' \\\n",
......@@ -445,8 +477,14 @@
" --enable-decoder=aac \\\n",
" --enable-decoder=h264 \\\n",
" --enable-decoder=h264_cuvid \\\n",
" --enable-decoder=rawvideo \\\n",
" --enable-indev=lavfi \\\n",
" --enable-encoder=libx264 \\\n",
" --enable-encoder=h264_nvenc \\\n",
" --enable-demuxer=mov \\\n",
" --enable-muxer=mp4 \\\n",
" --enable-filter=scale \\\n",
" --enable-filter=testsrc2 \\\n",
" --enable-protocol=file \\\n",
" --enable-protocol=https \\\n",
" --enable-gnutls \\\n",
......@@ -455,8 +493,8 @@
" --enable-nonfree \\\n",
" --enable-cuda-nvcc \\\n",
" --enable-libx264 \\\n",
" --enable-libnpp \\\n",
" --enable-nvenc \\\n",
" --enable-cuvid \\\n",
" --enable-nvdec"
]
},
......@@ -477,7 +515,7 @@
"base_uri": "https://localhost:8080/"
},
"id": "Ki9PN3V8XbSh",
"outputId": "62b51d59-ffd2-4465-dd6f-e8e2600f01f2"
"outputId": "47b40f5f-ec63-4827-fdac-a7e17f6eb1d8"
},
"outputs": [
{
......@@ -521,62 +559,64 @@
"base_uri": "https://localhost:8080/"
},
"id": "4fu9POO7kbQL",
"outputId": "7e9361f0-4f7b-4007-db94-b24daa295801"
"outputId": "e55f7443-3a7f-471c-c894-5677fd843ee7"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"ffprobe version 4.4.2 Copyright (c) 2007-2021 the FFmpeg developers\n",
" built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)\n",
" configuration: --prefix=/usr/ --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 --nvccflags='-gencode arch=compute_37,code=sm_37 -O2' --disable-doc --disable-static --disable-bsfs --disable-decoders --disable-encoders --disable-filters --disable-demuxers --disable-devices --disable-muxers --disable-parsers --disable-postproc --disable-protocols --enable-decoder=aac --enable-decoder=h264 --enable-decoder=h264_cuvid --enable-demuxer=mov --enable-filter=scale --enable-protocol=file --enable-protocol=https --enable-gnutls --enable-shared --enable-gpl --enable-nonfree --enable-cuda-nvcc --enable-libx264 --enable-libnpp --enable-nvenc --enable-nvdec\n",
" libavutil 56. 70.100 / 56. 70.100\n",
" libavcodec 58.134.100 / 58.134.100\n",
" libavformat 58. 76.100 / 58. 76.100\n",
" libavdevice 58. 13.100 / 58. 13.100\n",
" libavfilter 7.110.100 / 7.110.100\n",
" libswscale 5. 9.100 / 5. 9.100\n",
" libswresample 3. 9.100 / 3. 9.100\n",
"Decoders:\n",
" V..... = Video\n",
" A..... = Audio\n",
" S..... = Subtitle\n",
" .F.... = Frame-level multithreading\n",
" ..S... = Slice-level multithreading\n",
" ...X.. = Codec is experimental\n",
" ....B. = Supports draw_horiz_band\n",
" .....D = Supports direct rendering method 1\n",
" ------\n",
" V....D av1 Alliance for Open Media AV1\n",
" V...BD h263 H.263 / H.263-1996, H.263+ / H.263-1998 / H.263 version 2\n",
" VFS..D h264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10\n",
" V..... h264_cuvid Nvidia CUVID H264 decoder (codec h264)\n",
" VFS..D hevc HEVC (High Efficiency Video Coding)\n",
" V....D mjpeg MJPEG (Motion JPEG)\n",
" V.S.BD mpeg1video MPEG-1 video\n",
" V.S.BD mpeg2video MPEG-2 video\n",
" VF..BD mpeg4 MPEG-4 part 2\n",
" V....D vc1 SMPTE VC-1\n",
" VFS..D vp8 On2 VP8\n",
" VFS..D vp9 Google VP9\n",
" A....D aac AAC (Advanced Audio Coding)\n"
" V..... h264_cuvid Nvidia CUVID H264 decoder (codec h264)\n"
]
}
],
"source": [
"!ffprobe -decoders"
"!ffprobe -hide_banner -decoders | grep h264"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "499UCyKtFPrO",
"outputId": "2cd0f6a5-cde9-42b7-c30b-95c920a44720"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" V..... libx264 libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)\n",
" V....D h264_nvenc NVIDIA NVENC H.264 encoder (codec h264)\n"
]
}
],
"source": [
"!ffmpeg -hide_banner -encoders | grep 264"
]
},
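The `grep` checks above can also be scripted. The following is a minimal sketch, assuming the column layout shown in the listing output above (a six-character flag field, then the codec name, then a description). `parse_codec_names` is a hypothetical helper for illustration and is not part of FFmpeg or TorchAudio.

```python
def parse_codec_names(listing):
    """Collect codec names from `ffmpeg -encoders` / `-decoders` style output."""
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        # Codec rows look like: "V....D h264_nvenc NVIDIA NVENC H.264 encoder ..."
        # The first field is a six-character capability flag string.
        if len(fields) < 2 or len(fields[0]) != 6 or fields[0][0] not in "VAS":
            continue
        # Legend rows such as "V..... = Video" are not codecs; skip them.
        if fields[1] == "=":
            continue
        names.add(fields[1])
    return names


# Sample rows copied from the encoder listing shown above.
sample = (
    " V..... libx264              libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)\n"
    " V....D h264_nvenc           NVIDIA NVENC H.264 encoder (codec h264)\n"
)
encoders = parse_codec_names(sample)
print(sorted(encoders))
```

In practice the `listing` text would come from running `ffmpeg -hide_banner -encoders` and capturing its standard output.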
{
"cell_type": "markdown",
"source": [
"The following command fetches video from remote server, decode with NVDEC (cuvid) and re-encode with NVENC. If this command does not work, then there is an issue with FFmpeg installation, and TorchAudio would not be able to use them either."
],
"metadata": {
"id": "yZunKfgUb-i_"
}
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "A--jYQBrfcOr",
"outputId": "a2993442-fa41-4ad0-925d-1c84a4ac3cad"
"outputId": "f7f32af2-845d-4326-c9a7-6f3e05ca7231"
},
"outputs": [
{
......@@ -597,46 +637,69 @@
" Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)\n",
" Metadata:\n",
" handler_name : #Mainconcept MP4 Sound Media Handler\n",
" vendor_id : [0][0][0][0]\n"
" vendor_id : [0][0][0][0]\n",
"Stream mapping:\n",
" Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))\n",
" Stream #0:1 -> #0:1 (copy)\n",
"Press [q] to stop, [?] for help\n",
"Output #0, mp4, to 'test.mp4':\n",
" Metadata:\n",
" major_brand : mp42\n",
" minor_version : 512\n",
" compatible_brands: mp42iso2avc1mp41\n",
" encoder : Lavf58.76.100\n",
" Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), cuda(tv, bt709, progressive), 360x240 [SAR 1:1 DAR 3:2], q=2-31, 5000 kb/s, 29.97 fps, 30k tbn (default)\n",
" Metadata:\n",
" handler_name : ?Mainconcept Video Media Handler\n",
" vendor_id : [0][0][0][0]\n",
" encoder : Lavc58.134.100 h264_nvenc\n",
" Side data:\n",
" cpb: bitrate max/min/avg: 0/0/5000000 buffer size: 10000000 vbv_delay: N/A\n",
" Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)\n",
" Metadata:\n",
" handler_name : #Mainconcept MP4 Sound Media Handler\n",
" vendor_id : [0][0][0][0]\n",
"frame= 6175 fps=1712 q=11.0 Lsize= 37935kB time=00:03:26.01 bitrate=1508.5kbits/s speed=57.1x \n",
"video:34502kB audio:3234kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.526932%\n"
]
}
],
"source": [
"!ffprobe -hide_banner \"https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4_small.mp4\""
"!ffmpeg -hide_banner -y -vsync 0 -hwaccel cuvid -hwaccel_output_format cuda -c:v h264_cuvid -resize 360x240 -i \"https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4_small.mp4\" -c:a copy -c:v h264_nvenc -b:v 5M test.mp4"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HSCZFqt95GZm"
"id": "UYzJWphMG9nh"
},
"source": [
"## Benchmark NVDEC with TorchAudio\n",
"## Benchmarking GPU Encoding and Decoding\n",
"\n",
"Now that FFmpeg and the resulting libraries are ready to use, we test NVDEC with TorchAudio. For the basics of TorchAudio's streaming API, please refer to [Streaming API tutorial](https://pytorch.org/audio/main/tutorials/streaming_api_tutorial.html).\n",
"Now that FFmpeg and the resulting libraries are ready to use, we test NVDEC/NVENC with TorchAudio. For the basics of TorchAudio's streaming APIs, please refer to [Media I/O tutorials](https://pytorch.org/audio/main/tutorials.io.html).\n",
"\n",
"**Note**\n",
"\n",
"If you rebuild FFmpeg after importing class StreamReader, you'll need to restart the session to activate the newly built FFmpeg libraries."
"If you rebuild FFmpeg after importing torchaudio, you'll need to restart the session to activate the newly built FFmpeg libraries."
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 11,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wDJ-qdM45fV6",
"outputId": "4e533844-40a4-4cfc-c3ce-21230dbd4ae4"
"outputId": "9242a9bd-62e8-4217-8113-a18a62fbbe94"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"1.13.0.dev20220601+cu113\n",
"0.12.0.dev20220601+cu113\n"
"1.13.0.dev20221006+cu116\n",
"0.13.0.dev20221006+cu116\n"
]
}
],
......@@ -647,18 +710,45 @@
"print(torch.__version__)\n",
"print(torchaudio.__version__)\n",
"\n",
"from torchaudio.io import StreamReader"
"from torchaudio.io import StreamReader, StreamWriter"
]
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 12,
"metadata": {
"id": "zKMACIJNIXns"
},
"outputs": [],
"source": [
"import time\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"\n",
"pd.set_option('display.max_rows', None)\n",
"pd.set_option('display.max_columns', None)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HSCZFqt95GZm"
},
"source": [
"## Benchmark NVDEC with `StreamReader`\n",
"\n",
"First we test hardware decoding, and we fetch video from multiple locations (local file, network file, AWS S3) and use NVDEC to decod them."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "lbA3S6p_Z2lI",
"outputId": "99ce81f8-513d-4301-8297-3165f1c94711"
"outputId": "818e6aef-048a-49d3-a29b-1e3f421dc8ef"
},
"outputs": [
{
......@@ -667,27 +757,27 @@
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting boto3\n",
" Downloading boto3-1.24.1-py3-none-any.whl (132 kB)\n",
"\u001b[?25l\n",
"\u001b[?25hCollecting botocore<1.28.0,>=1.27.1\n",
" Downloading botocore-1.27.1-py3-none-any.whl (8.8 MB)\n",
" Downloading boto3-1.24.88-py3-none-any.whl (132 kB)\n",
"\u001b[?25l\n",
"\u001b[?25hCollecting s3transfer<0.7.0,>=0.6.0\n",
" Downloading s3transfer-0.6.0-py3-none-any.whl (79 kB)\n",
"\u001b[?25l\n",
"\u001b[?25hCollecting jmespath<2.0.0,>=0.7.1\n",
" Downloading jmespath-1.0.0-py3-none-any.whl (23 kB)\n",
"Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.7/dist-packages (from botocore<1.28.0,>=1.27.1->boto3) (2.8.2)\n",
"Collecting urllib3<1.27,>=1.25.4\n",
" Downloading urllib3-1.26.9-py2.py3-none-any.whl (138 kB)\n",
" Downloading jmespath-1.0.1-py3-none-any.whl (20 kB)\n",
"Collecting botocore<1.28.0,>=1.27.88\n",
" Downloading botocore-1.27.88-py3-none-any.whl (9.2 MB)\n",
"\u001b[?25l\n",
"\u001b[?25hRequirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.28.0,>=1.27.1->boto3) (1.15.0)\n",
"\u001b[?25hCollecting urllib3<1.27,>=1.25.4\n",
" Downloading urllib3-1.26.12-py2.py3-none-any.whl (140 kB)\n",
"\u001b[?25l\n",
"\u001b[?25hRequirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.7/dist-packages (from botocore<1.28.0,>=1.27.88->boto3) (2.8.2)\n",
"Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.28.0,>=1.27.88->boto3) (1.15.0)\n",
"Installing collected packages: urllib3, jmespath, botocore, s3transfer, boto3\n",
" Attempting uninstall: urllib3\n",
" Found existing installation: urllib3 1.24.3\n",
" Uninstalling urllib3-1.24.3:\n",
" Successfully uninstalled urllib3-1.24.3\n",
"Successfully installed boto3-1.24.1 botocore-1.27.1 jmespath-1.0.0 s3transfer-0.6.0 urllib3-1.26.9\n"
"Successfully installed boto3-1.24.88 botocore-1.27.88 jmespath-1.0.1 s3transfer-0.6.0 urllib3-1.26.12\n"
]
}
],
......@@ -697,41 +787,34 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 14,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "g3TMksRk5ara",
"outputId": "8268b252-b4bc-48d6-fe7c-e381981b1177"
"outputId": "17191485-7188-4986-e540-c77df1c5afe0"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"1.24.1\n"
"1.24.88\n"
]
}
],
"source": [
"import time\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import boto3\n",
"from botocore import UNSIGNED\n",
"from botocore.config import Config\n",
"\n",
"print(boto3.__version__)\n",
"\n",
"pd.set_option('display.max_rows', None)\n",
"pd.set_option('display.max_columns', None)"
"print(boto3.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 15,
"metadata": {
"id": "d0GYPyGGTz-q"
},
......@@ -748,12 +831,12 @@
"source": [
"First, we define the functions we'll use for testing.\n",
"\n",
"Funcion `test` decodes the given source from start to end, and it reports the elapsed time, and returns one image frmae as a sample."
"Funcion `test_decode` decodes the given source from start to end, and it reports the elapsed time, and returns one image frmae as a sample."
]
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 16,
"metadata": {
"id": "GsmoRAY75T6J"
},
......@@ -763,7 +846,7 @@
"samples = [[None, None] for _ in range(4)]\n",
"\n",
"\n",
"def test(src, config, i_sample):\n",
"def test_decode(src, config, i_sample):\n",
" print(\"=\" * 40)\n",
" print(\"* Configuration:\", config)\n",
" print(\"* Source:\", src)\n",
......@@ -803,7 +886,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 17,
"metadata": {
"id": "mlrA9oVM3qvt"
},
......@@ -811,14 +894,6 @@
"source": [
"local_src = \"input.mp4\"\n",
"\n",
"cpu_conf = {\n",
" \"decoder\": \"h264\", # CPU decoding\n",
"}\n",
"cuda_conf = {\n",
" \"decoder\": \"h264_cuvid\", # Use CUDA HW decoder\n",
" \"hw_accel\": \"cuda:0\", # Then keep the memory on CUDA:0\n",
"}\n",
"\n",
"i_sample = 520"
]
},
......@@ -833,13 +908,13 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 18,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "z6kivDretHtS",
"outputId": "754b3bfa-58f9-4ac5-bb1a-b44820bf8c50"
"outputId": "fc14c8a1-6f26-4370-b532-d90098d0f775"
},
"outputs": [
{
......@@ -853,18 +928,22 @@
" - Chunk: torch.Size([5, 3, 540, 960]) cpu torch.uint8\n",
"\n",
" - Processed 6175 frames.\n",
" - Elapsed: 45.752042501000005 seconds.\n",
" - Elapsed: 46.691246449000005 seconds.\n",
"\n"
]
}
],
"source": [
"elapsed, sample = test(local_src, cpu_conf, i_sample)"
"cpu_conf = {\n",
" \"decoder\": \"h264\", # CPU decoding\n",
"}\n",
"\n",
"elapsed, sample = test_decode(local_src, cpu_conf, i_sample)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 19,
"metadata": {
"id": "vAEDVY0D37bJ"
},
......@@ -885,13 +964,13 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 20,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "V6jwiJD-tW0M",
"outputId": "6ebb87af-3056-4b41-dcba-b6e104a4084a"
"outputId": "0f40c5b2-5e04-4ca1-9902-5a9ed9dc5820"
},
"outputs": [
{
......@@ -905,18 +984,23 @@
" - Chunk: torch.Size([5, 3, 540, 960]) cuda:0 torch.uint8\n",
"\n",
" - Processed 6175 frames.\n",
" - Elapsed: 7.458571206999977 seconds.\n",
" - Elapsed: 6.117441406000012 seconds.\n",
"\n"
]
}
],
"source": [
"elapsed, sample = test(local_src, cuda_conf, i_sample)"
"cuda_conf = {\n",
" \"decoder\": \"h264_cuvid\", # Use CUDA HW decoder\n",
" \"hw_accel\": \"cuda:0\", # Then keep the memory on CUDA:0\n",
"}\n",
"\n",
"elapsed, sample = test_decode(local_src, cuda_conf, i_sample)"
]
},
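For reference, the two elapsed times reported above for the local file can be turned into a speedup factor. The snippet below only does that arithmetic. The numbers are copied from the outputs of this particular Colab run and will vary between runs and hardware.

```python
# Elapsed times (seconds) copied from the cell outputs above.
cpu_elapsed = 46.691246449000005   # "decoder": "h264" (CPU)
nvdec_elapsed = 6.117441406000012  # "decoder": "h264_cuvid" with "hw_accel": "cuda:0"

speedup = cpu_elapsed / nvdec_elapsed
print(f"NVDEC speedup over CPU decoding: {speedup:.1f}x")
```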
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 21,
"metadata": {
"id": "yIA6UAaw39VC"
},
......@@ -939,7 +1023,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 22,
"metadata": {
"id": "ICa21rL84QVb"
},
......@@ -960,13 +1044,13 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 23,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6AGfH8X8tboy",
"outputId": "c9c6eea1-93e5-4b0b-9a3f-86fbbbe63ce1"
"outputId": "3039e4b6-35a2-4b55-e7a7-d8b1b63e84f6"
},
"outputs": [
{
......@@ -980,18 +1064,18 @@
" - Chunk: torch.Size([5, 3, 540, 960]) cpu torch.uint8\n",
"\n",
" - Processed 6175 frames.\n",
" - Elapsed: 40.36345302500001 seconds.\n",
" - Elapsed: 46.460909987000036 seconds.\n",
"\n"
]
}
],
"source": [
"elapsed, sample = test(network_src, cpu_conf, i_sample)"
"elapsed, sample = test_decode(network_src, cpu_conf, i_sample)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 24,
"metadata": {
"id": "4u8glKk14bYT"
},
......@@ -1012,13 +1096,13 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 25,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "95bky0TqtfMI",
"outputId": "ed78214f-79ce-407a-aae0-af9530ed4a17"
"outputId": "14e2cea4-fe27-405b-feec-e276c466542a"
},
"outputs": [
{
......@@ -1032,18 +1116,18 @@
" - Chunk: torch.Size([5, 3, 540, 960]) cuda:0 torch.uint8\n",
"\n",
" - Processed 6175 frames.\n",
" - Elapsed: 4.222158643999933 seconds.\n",
" - Elapsed: 4.23164078800005 seconds.\n",
"\n"
]
}
],
"source": [
"elapsed, sample = test(network_src, cuda_conf, i_sample)"
"elapsed, sample = test_decode(network_src, cuda_conf, i_sample)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 26,
"metadata": {
"id": "kir92rPk4eLr"
},
......@@ -1066,7 +1150,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": 27,
"metadata": {
"id": "XBYc5AiGORUU"
},
......@@ -1088,7 +1172,7 @@
"#### Defining Helper class\n",
"\n",
"StreamReader supports file-like objects with `read` method. In addition to this,\n",
"if the file-like object has `seek` method, StreamReader attempts to use it for more reliable detection of medi formats.\n",
"if the file-like object has `seek` method, StreamReader attempts to use it for more reliable detection of media formats.\n",
"\n",
"However, the seek method of `boto3`'s S3 client response object only raises errors to let users know that seek operation is not supported. Therefore we wrap it with a class that does not have `seek` method. This way, StreamReader won't try to use the `seek` method.\n",
"\n",
......@@ -1099,7 +1183,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 28,
"metadata": {
"id": "Odq5pMJ4M2ph"
},
......@@ -1129,13 +1213,13 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": 29,
"metadata": {
"id": "rjDH0g0qOAhh",
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "rjDH0g0qOAhh",
"outputId": "1a210ed1-6497-4da2-d70a-adbd5dba0b1f"
"outputId": "e8323115-a3f6-4525-9645-e20d77fadc95"
},
"outputs": [
{
......@@ -1144,12 +1228,12 @@
"text": [
"========================================\n",
"* Configuration: {'decoder': 'h264'}\n",
"* Source: <botocore.response.StreamingBody object at 0x7fecbfcb5c90>\n",
"* Source: <botocore.response.StreamingBody object at 0x7fb991dddfd0>\n",
"========================================\n",
" - Chunk: torch.Size([5, 3, 540, 960]) cpu torch.uint8\n",
"\n",
" - Processed 6175 frames.\n",
" - Elapsed: 40.16508613600001 seconds.\n",
" - Elapsed: 40.758733775999985 seconds.\n",
"\n"
]
}
......@@ -1157,12 +1241,12 @@
"source": [
"response = s3_client.get_object(Bucket=bucket, Key=key)\n",
"src = UnseekableWrapper(response[\"Body\"])\n",
"elapsed, sample = test(src, cpu_conf, i_sample)"
"elapsed, sample = test_decode(src, cpu_conf, i_sample)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 30,
"metadata": {
"id": "JfjvXqBqQGyu"
},
......@@ -1183,13 +1267,13 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 31,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "17-0MVlZOm-f",
"outputId": "9a561873-28aa-4962-a527-1edba8510caf"
"outputId": "9604cb57-cdb1-4e72-8754-47dfa737f0f7"
},
"outputs": [
{
......@@ -1198,12 +1282,12 @@
"text": [
"========================================\n",
"* Configuration: {'decoder': 'h264_cuvid', 'hw_accel': 'cuda:0'}\n",
"* Source: <botocore.response.StreamingBody object at 0x7fecbfc70390>\n",
"* Source: <botocore.response.StreamingBody object at 0x7fb991d390d0>\n",
"========================================\n",
" - Chunk: torch.Size([5, 3, 540, 960]) cuda:0 torch.uint8\n",
"\n",
" - Processed 6175 frames.\n",
" - Elapsed: 4.510979067999983 seconds.\n",
" - Elapsed: 4.620101478000038 seconds.\n",
"\n"
]
}
......@@ -1211,12 +1295,12 @@
"source": [
"response = s3_client.get_object(Bucket=bucket, Key=key)\n",
"src = UnseekableWrapper(response[\"Body\"])\n",
"elapsed, sample = test(src, cuda_conf, i_sample)"
"elapsed, sample = test_decode(src, cuda_conf, i_sample)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 32,
"metadata": {
"id": "VpDLx494QIN4"
},
......@@ -1239,24 +1323,12 @@
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": 33,
"metadata": {
"id": "XKpuH_tXvMSF"
},
"outputs": [],
"source": [
"cpu_conf = {\n",
" \"decoder\": \"h264\", # CPU decoding\n",
" \"filter_desc\": \"scale=360:240\", # Software filter\n",
"}\n",
"cuda_conf = {\n",
" \"decoder\": \"h264_cuvid\", # Use CUDA HW decoder\n",
" \"decoder_option\": {\n",
" \"resize\": \"360x240\", # Then apply HW preprocessing (resize)\n",
" },\n",
" \"hw_accel\": \"cuda:0\", # Then keep the memory on CUDA:0\n",
"}\n",
"\n",
"i_sample = 1085"
]
},
......@@ -1271,13 +1343,13 @@
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": 34,
"metadata": {
"id": "OEtAVQVYwVCy",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "d5a88558-25e4-4f54-e5c5-80ede47eb9f5"
"id": "OEtAVQVYwVCy",
"outputId": "29fd9dca-5378-48a3-9ef3-26ab1b892107"
},
"outputs": [
{
......@@ -1291,18 +1363,23 @@
" - Chunk: torch.Size([5, 3, 240, 360]) cpu torch.uint8\n",
"\n",
" - Processed 6175 frames.\n",
" - Elapsed: 18.506949264000013 seconds.\n",
" - Elapsed: 19.082725973000038 seconds.\n",
"\n"
]
}
],
"source": [
"elapsed, sample = test(local_src, cpu_conf, i_sample)"
"cpu_conf = {\n",
" \"decoder\": \"h264\", # CPU decoding\n",
" \"filter_desc\": \"scale=360:240\", # Software filter\n",
"}\n",
"\n",
"elapsed, sample = test_decode(local_src, cpu_conf, i_sample)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": 35,
"metadata": {
"id": "uzZMz0rW4j73"
},
......@@ -1323,13 +1400,13 @@
},
{
"cell_type": "code",
"execution_count": 34,
"execution_count": 36,
"metadata": {
"id": "XajhZb-G4mwm",
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "XajhZb-G4mwm",
"outputId": "9e0eb31f-3ba6-44dc-a2d8-bec146d4ca6d"
"outputId": "29038cfd-2d4e-4b4b-a40f-ef6731ddff96"
},
"outputs": [
{
......@@ -1343,18 +1420,26 @@
" - Chunk: torch.Size([5, 3, 240, 360]) cuda:0 torch.uint8\n",
"\n",
" - Processed 6175 frames.\n",
" - Elapsed: 4.9442481019999605 seconds.\n",
" - Elapsed: 4.157032522999998 seconds.\n",
"\n"
]
}
],
"source": [
"elapsed, sample = test(local_src, cuda_conf, i_sample)"
"cuda_conf = {\n",
" \"decoder\": \"h264_cuvid\", # Use CUDA HW decoder\n",
" \"decoder_option\": {\n",
" \"resize\": \"360x240\", # Then apply HW preprocessing (resize)\n",
" },\n",
" \"hw_accel\": \"cuda:0\", # Then keep the memory on CUDA:0\n",
"}\n",
"\n",
"elapsed, sample = test_decode(local_src, cuda_conf, i_sample)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"execution_count": 37,
"metadata": {
"id": "5mXoy38TwYVx"
},
......@@ -1370,7 +1455,7 @@
"id": "KEA2jrrKqmr4"
},
"source": [
"## Results\n",
"### Results\n",
"\n",
"The following table summarizes the time it took to decode the same media with CPU and NVDEC.\n",
"We see significant speedup with NVDEC."
......@@ -1378,13 +1463,13 @@
},
{
"cell_type": "code",
"execution_count": 36,
"execution_count": 38,
"metadata": {
"id": "Ni0bihmogZAb",
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Ni0bihmogZAb",
"outputId": "4ec33e88-2bbd-48fa-cf61-6a4b27250e0c"
"outputId": "b586825a-1164-4b9e-f5ca-79febf95628a"
},
"outputs": [
{
......@@ -1392,10 +1477,10 @@
"name": "stdout",
"text": [
" CPU NVDEC\n",
"Decoding (local file) 45.752041 7.458571\n",
"Decoding (network file) 40.363453 4.222158\n",
"Decoding (file-like object, S3) 40.165085 4.510979\n",
"Decoding + Resize 18.506948 4.944248\n"
"Decoding (local file) 46.691246 6.117441\n",
"Decoding (network file) 46.460911 4.231641\n",
"Decoding (file-like object, S3) 40.758736 4.620101\n",
"Decoding + Resize 19.082726 4.157032\n"
]
}
],
......@@ -1419,7 +1504,7 @@
},
{
"cell_type": "code",
"execution_count": 37,
"execution_count": 39,
"metadata": {
"id": "nWTT5ih5v1s6"
},
......@@ -1446,14 +1531,14 @@
},
{
"cell_type": "code",
"execution_count": 38,
"execution_count": 40,
"metadata": {
"id": "5AL9u6_xmRQa",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "df0f38de-57d5-44bd-ee65-5304986a6ad8"
"outputId": "89bdea08-4d2c-4192-874c-6e1b316ea63a"
},
"outputs": [
{
......@@ -1464,7 +1549,7 @@
]
},
"metadata": {},
"execution_count": 38
"execution_count": 40
},
{
"output_type": "display_data",
......@@ -1489,6 +1574,715 @@
"plt.plot(block=False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eXzKSVrHk1vS"
},
"source": [
"## Benchmark NVENC with `StreamWriter`\n",
"\n",
"Next, we benchmark encoding speed with StreamWriter and NVENC.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"id": "k5WkyQEp0hjv"
},
"outputs": [],
"source": [
"def test_encode(data, dst, **config):\n",
" print(\"=\" * 40)\n",
" print(\"* Configuration:\", config)\n",
" print(\"* Destination:\", dst)\n",
" print(\"=\" * 40)\n",
"\n",
" s = StreamWriter(dst)\n",
" s.add_video_stream(**config)\n",
"\n",
" t0 = time.monotonic()\n",
" with s.open():\n",
" s.write_video_chunk(0, data)\n",
" elapsed = time.monotonic() - t0\n",
"\n",
" print()\n",
" print(f\" - Processed {len(data)} frames.\")\n",
" print(f\" - Elapsed: {elapsed} seconds.\")\n",
" print()\n",
" return elapsed\n",
"\n",
"result = torch.zeros((3, 3))"
]
},
{
"cell_type": "markdown",
"source": [
"We use ``StreamReader`` to generate test data."
],
"metadata": {
"id": "wx3cQ7YwX880"
}
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"id": "nHynkI-oK9mX"
},
"outputs": [],
"source": [
"def get_data(frame_rate, height, width, format, duration=15):\n",
" src = f\"testsrc2=rate={frame_rate}:size={width}x{height}:duration={duration}\"\n",
" s = StreamReader(src=src, format=\"lavfi\")\n",
" s.add_basic_video_stream(-1, format=format)\n",
" s.process_all_packets()\n",
" video, = s.pop_chunks()\n",
" return video"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1IQafz-HEQtu"
},
"source": [
"### Encode MP4 - 360P\n",
"\n",
"For the first test, we compare the time it takes for the CPU encoder and NVENC to encode 15 seconds of low-resolution (640x360) video."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"id": "2dZk8Ui7EZew"
},
"outputs": [],
"source": [
"pict_config = {\n",
" \"height\": 360,\n",
" \"width\": 640,\n",
" \"frame_rate\": 30000/1001,\n",
" \"format\": \"yuv444p\",\n",
"}\n",
"\n",
"video = get_data(**pict_config)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BZWLHmf8H3Q5"
},
"source": [
"#### CPU"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"id": "wP648bHCH9p_",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "30af3ebc-a27c-4c40-a533-b4ac8e977527"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"========================================\n",
"* Configuration: {'height': 360, 'width': 640, 'frame_rate': 29.97002997002997, 'format': 'yuv444p', 'encoder': 'libx264', 'encoder_format': 'yuv444p'}\n",
"* Destination: 360p_cpu.mp4\n",
"========================================\n",
"\n",
" - Processed 450 frames.\n",
" - Elapsed: 3.280829835000077 seconds.\n",
"\n"
]
}
],
"source": [
"encode_config = {\n",
" \"encoder\": \"libx264\",\n",
" \"encoder_format\": \"yuv444p\",\n",
"}\n",
"\n",
"result[0, 0] = test_encode(video, \"360p_cpu.mp4\", **pict_config, **encode_config)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "P731iWsIH45U"
},
"source": [
"#### CUDA (from CPU Tensor)\n",
"\n",
"Now we test NVENC. This time, the data is sent from CPU memory to GPU memory as part of encoding."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"id": "zQ9W_UVTIf-E",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "3a9ccbfb-d243-4f1f-a790-0595a1de8bba"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"========================================\n",
"* Configuration: {'height': 360, 'width': 640, 'frame_rate': 29.97002997002997, 'format': 'yuv444p', 'encoder': 'h264_nvenc', 'encoder_format': 'yuv444p', 'encoder_option': {'gpu': '0'}}\n",
"* Destination: 360p_cuda.mp4\n",
"========================================\n",
"\n",
" - Processed 450 frames.\n",
" - Elapsed: 0.34294435300000714 seconds.\n",
"\n"
]
}
],
"source": [
"encode_config = {\n",
" \"encoder\": \"h264_nvenc\", # Use NVENC\n",
" \"encoder_format\": \"yuv444p\",\n",
" \"encoder_option\": {\"gpu\": \"0\"}, # Run encoding on the cuda:0 device\n",
"}\n",
"\n",
"result[1, 0] = test_encode(video, \"360p_cuda.mp4\", **pict_config, **encode_config)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QirJIHuSm-WT"
},
"source": [
"#### CUDA (from CUDA Tensor)\n",
"\n",
"If the data is already on a CUDA device, we can pass it to the GPU encoder directly."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"id": "EHjyMYpJnEb3",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "b3a4c529-9b6e-468e-a836-6e5f5fe51d56"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"========================================\n",
"* Configuration: {'height': 360, 'width': 640, 'frame_rate': 29.97002997002997, 'format': 'yuv444p', 'encoder': 'h264_nvenc', 'encoder_format': 'yuv444p', 'encoder_option': {'gpu': '0'}, 'hw_accel': 'cuda:0'}\n",
"* Destination: 360p_cuda_hw.mp4\n",
"========================================\n",
"\n",
" - Processed 450 frames.\n",
" - Elapsed: 0.2424524550000342 seconds.\n",
"\n"
]
}
],
"source": [
"device = \"cuda:0\"\n",
"\n",
"encode_config = {\n",
" \"encoder\": \"h264_nvenc\", # GPU Encoder\n",
" \"encoder_format\": \"yuv444p\",\n",
" \"encoder_option\": {\"gpu\": \"0\"}, # Run encoding on the cuda:0 device\n",
" \"hw_accel\": device, # Data comes from cuda:0 device\n",
"}\n",
"\n",
"result[2, 0] = test_encode(video.to(torch.device(device)), \"360p_cuda_hw.mp4\", **pict_config, **encode_config)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LBCBSKkRJeiA"
},
"source": [
"### Encode MP4 - 720P\n",
"\n",
"Let's run the same tests on video with larger resolution."
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"id": "CUdzN9BzJlo_"
},
"outputs": [],
"source": [
"pict_config = {\n",
" \"height\": 720,\n",
" \"width\": 1280,\n",
" \"frame_rate\": 30000/1001,\n",
" \"format\": \"yuv444p\",\n",
"}\n",
"\n",
"video = get_data(**pict_config)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wTjAcXGAJsE8"
},
"source": [
"#### CPU"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "FoEM76wFJta5",
"outputId": "abe666de-8d02-469a-beea-74fcff494b89"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"========================================\n",
"* Configuration: {'height': 720, 'width': 1280, 'frame_rate': 29.97002997002997, 'format': 'yuv444p', 'encoder': 'libx264', 'encoder_format': 'yuv444p'}\n",
"* Destination: 720p_cpu.mp4\n",
"========================================\n",
"\n",
" - Processed 450 frames.\n",
" - Elapsed: 11.638768525999922 seconds.\n",
"\n"
]
}
],
"source": [
"encode_config = {\n",
" \"encoder\": \"libx264\",\n",
" \"encoder_format\": \"yuv444p\",\n",
"}\n",
"\n",
"result[0, 1] = test_encode(video, \"720p_cpu.mp4\", **pict_config, **encode_config)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lgbIN67BKARA"
},
"source": [
"#### CUDA (from CPU Tensor)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "fgNR79olJ7Yo",
"outputId": "65484cc4-ab54-4373-b1a4-50bfea20e73d"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"========================================\n",
"* Configuration: {'height': 720, 'width': 1280, 'frame_rate': 29.97002997002997, 'format': 'yuv444p', 'encoder': 'h264_nvenc', 'encoder_format': 'yuv444p'}\n",
"* Destination: 720p_cuda.mp4\n",
"========================================\n",
"\n",
" - Processed 450 frames.\n",
" - Elapsed: 0.8508033889999069 seconds.\n",
"\n"
]
}
],
"source": [
"encode_config = {\n",
" \"encoder\": \"h264_nvenc\",\n",
" \"encoder_format\": \"yuv444p\",\n",
"}\n",
"\n",
"result[1, 1] = test_encode(video, \"720p_cuda.mp4\", **pict_config, **encode_config)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oD_XUn4-KDod"
},
"source": [
"#### CUDA (from CUDA Tensor)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2NX4SU5xJ7zc",
"outputId": "130ad5a6-a197-46dc-d799-d230e2b469c0"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"========================================\n",
"* Configuration: {'height': 720, 'width': 1280, 'frame_rate': 29.97002997002997, 'format': 'yuv444p', 'encoder': 'h264_nvenc', 'encoder_format': 'yuv444p', 'encoder_option': {'gpu': '0'}, 'hw_accel': 'cuda:0'}\n",
"* Destination: 720p_cuda_hw.mp4\n",
"========================================\n",
"\n",
" - Processed 450 frames.\n",
" - Elapsed: 0.6384492569999338 seconds.\n",
"\n"
]
}
],
"source": [
"device = \"cuda:0\"\n",
"\n",
"encode_config = {\n",
" \"encoder\": \"h264_nvenc\",\n",
" \"encoder_format\": \"yuv444p\",\n",
" \"encoder_option\": {\"gpu\": \"0\"},\n",
" \"hw_accel\": device,\n",
"}\n",
"\n",
"result[2, 1] = test_encode(video.to(torch.device(device)), \"720p_cuda_hw.mp4\", **pict_config, **encode_config)"
]
},
{
"cell_type": "markdown",
"source": [
"### Encode MP4 - 1080P\n",
"\n",
"We make the video even larger."
],
"metadata": {
"id": "FoMDEhtYO-xD"
}
},
{
"cell_type": "code",
"source": [
"pict_config = {\n",
" \"height\": 1080,\n",
" \"width\": 1920,\n",
" \"frame_rate\": 30000/1001,\n",
" \"format\": \"yuv444p\",\n",
"}\n",
"\n",
"video = get_data(**pict_config)"
],
"metadata": {
"id": "klcLfXipO5OX"
},
"execution_count": 51,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### CPU"
],
"metadata": {
"id": "o28eOGRuPD3n"
}
},
{
"cell_type": "code",
"source": [
"encode_config = {\n",
" \"encoder\": \"libx264\",\n",
" \"encoder_format\": \"yuv444p\",\n",
"}\n",
"\n",
"result[0, 2] = test_encode(video, \"1080p_cpu.mp4\", **pict_config, **encode_config)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6UuiAEowOedt",
"outputId": "2bcc2c9b-dc27-4dd5-b469-44e760a0068a"
},
"execution_count": 52,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"========================================\n",
"* Configuration: {'height': 1080, 'width': 1920, 'frame_rate': 29.97002997002997, 'format': 'yuv444p', 'encoder': 'libx264', 'encoder_format': 'yuv444p'}\n",
"* Destination: 1080p_cpu.mp4\n",
"========================================\n",
"\n",
" - Processed 450 frames.\n",
" - Elapsed: 27.020421489 seconds.\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"#### CUDA (from CPU Tensor)"
],
"metadata": {
"id": "ACJ3oEwgPFAh"
}
},
{
"cell_type": "code",
"source": [
"encode_config = {\n",
" \"encoder\": \"h264_nvenc\",\n",
" \"encoder_format\": \"yuv444p\",\n",
"}\n",
"\n",
"result[1, 2] = test_encode(video, \"1080p_cuda.mp4\", **pict_config, **encode_config)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "dzYNn8HtOeVi",
"outputId": "53b8e871-360f-483a-c7d6-c217a913dae9"
},
"execution_count": 53,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"========================================\n",
"* Configuration: {'height': 1080, 'width': 1920, 'frame_rate': 29.97002997002997, 'format': 'yuv444p', 'encoder': 'h264_nvenc', 'encoder_format': 'yuv444p'}\n",
"* Destination: 1080p_cuda.mp4\n",
"========================================\n",
"\n",
" - Processed 450 frames.\n",
" - Elapsed: 1.60377999800005 seconds.\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"#### CUDA (from CUDA Tensor)"
],
"metadata": {
"id": "YcEH15BYPOQ_"
}
},
{
"cell_type": "code",
"source": [
"device = \"cuda:0\"\n",
"\n",
"encode_config = {\n",
" \"encoder\": \"h264_nvenc\",\n",
" \"encoder_format\": \"yuv444p\",\n",
" \"encoder_option\": {\"gpu\": \"0\"},\n",
" \"hw_accel\": device,\n",
"}\n",
"\n",
"result[2, 2] = test_encode(video.to(torch.device(device)), \"1080p_cuda_hw.mp4\", **pict_config, **encode_config)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "cUZOWVllOeRl",
"outputId": "b7608edc-ed60-4bc9-ab44-fa0b333a4264"
},
"execution_count": 54,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"========================================\n",
"* Configuration: {'height': 1080, 'width': 1920, 'frame_rate': 29.97002997002997, 'format': 'yuv444p', 'encoder': 'h264_nvenc', 'encoder_format': 'yuv444p', 'encoder_option': {'gpu': '0'}, 'hw_accel': 'cuda:0'}\n",
"* Destination: 1080p_cuda_hw.mp4\n",
"========================================\n",
"\n",
" - Processed 450 frames.\n",
" - Elapsed: 1.4101193979998925 seconds.\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"### Result\n",
"\n",
"The following table summarizes the time, in seconds, it took to encode the same 450 frames with each configuration."
],
"metadata": {
"id": "Z9lEeklfPY-Z"
}
},
{
"cell_type": "code",
"source": [
"labels = [\"CPU\", \"CUDA (from CPU Tensor)\", \"CUDA (from CUDA Tensor)\"]\n",
"columns = [\"360P\", \"720P\", \"1080P\"]\n",
"res = pd.DataFrame(\n",
" result.numpy(),\n",
" index=labels,\n",
" columns=columns,\n",
")\n",
"print(res)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "jDWUQFjVPaUV",
"outputId": "eda3af22-a346-4e0a-fd48-f1227e524ddf"
},
"execution_count": 55,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" 360P 720P 1080P\n",
"CPU 3.280830 11.638768 27.020422\n",
"CUDA (from CPU Tensor) 0.342944 0.850803 1.603780\n",
"CUDA (from CUDA Tensor) 0.242452 0.638449 1.410119\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"plt.plot(result.T)\n",
"plt.legend(labels)\n",
"plt.xticks([i for i in range(3)], columns)\n",
"plt.grid(visible=True, axis='y')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 265
},
"id": "gEccR5zwTYRQ",
"outputId": "486af2b2-9137-4197-bc6b-b17af0b2075a"
},
"execution_count": 56,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"source": [
"The resulting videos look like the following."
],
"metadata": {
"id": "mVeuM2gmkKld"
}
},
{
"cell_type": "code",
"source": [
"from IPython.display import HTML\n",
"\n",
"HTML('''\n",
"<div>\n",
" <video width=360 controls autoplay>\n",
" <source src=\"https://download.pytorch.org/torchaudio/tutorial-assets/streamwriter_360p_cpu.mp4\" type=\"video/mp4\">\n",
" </video>\n",
" <video width=360 controls autoplay>\n",
" <source src=\"https://download.pytorch.org/torchaudio/tutorial-assets/streamwriter_360p_cuda.mp4\" type=\"video/mp4\">\n",
" </video>\n",
" <video width=360 controls autoplay>\n",
" <source src=\"https://download.pytorch.org/torchaudio/tutorial-assets/streamwriter_360p_cuda_hw.mp4\" type=\"video/mp4\">\n",
" </video>\n",
"</div>\n",
"''')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 431
},
"id": "LNPWCXxOgnqz",
"outputId": "cd9449d8-d464-4c0e-88c3-e71d7b3261e5"
},
"execution_count": 61,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<IPython.core.display.HTML object>"
],
"text/html": [
"\n",
"<div>\n",
" <video width=360 controls autoplay>\n",
" <source src=\"https://download.pytorch.org/torchaudio/tutorial-assets/streamwriter_360p_cpu.mp4\" type=\"video/mp4\">\n",
" </video>\n",
" <video width=360 controls autoplay>\n",
" <source src=\"https://download.pytorch.org/torchaudio/tutorial-assets/streamwriter_360p_cuda.mp4\" type=\"video/mp4\">\n",
" </video>\n",
" <video width=360 controls autoplay>\n",
" <source src=\"https://download.pytorch.org/torchaudio/tutorial-assets/streamwriter_360p_cuda_hw.mp4\" type=\"video/mp4\">\n",
" </video>\n",
"</div>\n"
]
},
"metadata": {},
"execution_count": 61
}
]
},
{
"cell_type": "markdown",
"metadata": {
......@@ -1497,7 +2291,7 @@
"source": [
"## Conclusion\n",
"\n",
"We looked at how to build FFmpeg libraries with NVDEC support and use it from TorchAudio. NVDEC provides significant speed up."
"We looked at how to build FFmpeg libraries with NVDEC/NVENC support and use them from TorchAudio. In the benchmarks above, NVDEC and NVENC provide a significant speedup when loading and saving video."
]
}
],
......@@ -1505,12 +2299,10 @@
"accelerator": "GPU",
"colab": {
"collapsed_sections": [
"S8sX7UqrSPck",
"VcKsO-zurtWM",
"PWT0m_LGq_GE",
"TpWDpxBvDy-b"
],
"name": "TorchAudio_Streaming_API_HW_Acceleration_Tutorial.ipynb",
"provenance": [],
"toc_visible": true
},
......
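The encoding result table in the diff above lists raw elapsed times, so the relative gain from NVENC is easier to read once each row is divided into the CPU row. The following is a minimal sketch of that arithmetic in plain Python. The timings are copied from the table; the helper name `speedup_over_cpu` is hypothetical and is not part of the notebook or of TorchAudio.

```python
# Elapsed times in seconds, copied from the encoding result table above.
columns = ["360P", "720P", "1080P"]
timings = {
    "CPU": [3.280830, 11.638768, 27.020422],
    "CUDA (from CPU Tensor)": [0.342944, 0.850803, 1.603780],
    "CUDA (from CUDA Tensor)": [0.242452, 0.638449, 1.410119],
}


def speedup_over_cpu(timings, baseline="CPU"):
    """Hypothetical helper: return {label: [baseline_time / time, ...]}.

    The baseline row itself is left out of the result.
    """
    base = timings[baseline]
    return {
        label: [b / t for b, t in zip(base, row)]
        for label, row in timings.items()
        if label != baseline
    }


speedups = speedup_over_cpu(timings)
for label, row in speedups.items():
    # One line per configuration, one speedup factor per resolution.
    print(label, " ".join(f"{c}: {s:.1f}x" for c, s in zip(columns, row)))
```

These numbers come from a single run on one Colab instance, so the ratios are only indicative and will vary with the GPU, the CPU, and the encoder settings.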