{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "view-in-github" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "metadata": { "id": "VcjSRFELVbNk" }, "source": [ "# MMAction2 Tutorial\n", "\n", "Welcome to MMAction2! This is the official Colab tutorial for using MMAction2. In this tutorial, you will learn how to:\n", "- Perform inference with an MMAction2 recognizer.\n", "- Train a new recognizer with a new dataset.\n", "- Perform spatio-temporal detection.\n", "\n", "Let's start!" ] }, { "cell_type": "markdown", "metadata": { "id": "7LqHGkGEVqpm" }, "source": [ "## Install MMAction2" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Bf8PpPXtVvmg", "outputId": "75519a17-cc0a-491f-98a1-f287b090cf82" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nvcc: NVIDIA (R) Cuda compiler driver\n", "Copyright (c) 2005-2020 NVIDIA Corporation\n", "Built on Mon_Oct_12_20:09:46_PDT_2020\n", "Cuda compilation tools, release 11.1, V11.1.105\n", "Build cuda_11.1.TC455_06.29190527_0\n", "gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0\n", "Copyright (C) 2017 Free Software Foundation, Inc.\n", "This is free software; see the source for copying conditions. 
There is NO\n", "warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n", "\n" ] } ], "source": [ "# Check nvcc version\n", "!nvcc -V\n", "# Check GCC version\n", "!gcc --version" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "5PAJ4ArzV5Ry", "outputId": "992b30c2-8281-4198-97c8-df2a287b0ae8" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Looking in links: https://download.pytorch.org/whl/torch_stable.html\n", "Collecting torch==1.8.0+cu101\n", " Downloading https://download.pytorch.org/whl/cu101/torch-1.8.0%2Bcu101-cp37-cp37m-linux_x86_64.whl (763.5 MB)\n", "\u001B[K |████████████████████████████████| 763.5 MB 15 kB/s \n", "\u001B[?25hCollecting torchvision==0.9.0+cu101\n", " Downloading https://download.pytorch.org/whl/cu101/torchvision-0.9.0%2Bcu101-cp37-cp37m-linux_x86_64.whl (17.3 MB)\n", "\u001B[K |████████████████████████████████| 17.3 MB 983 kB/s \n", "\u001B[?25hCollecting torchtext==0.9.0\n", " Downloading torchtext-0.9.0-cp37-cp37m-manylinux1_x86_64.whl (7.1 MB)\n", "\u001B[K |████████████████████████████████| 7.1 MB 10.9 MB/s \n", "\u001B[?25hCollecting torchaudio==0.8.0\n", " Downloading torchaudio-0.8.0-cp37-cp37m-manylinux1_x86_64.whl (1.9 MB)\n", "\u001B[K |████████████████████████████████| 1.9 MB 46.6 MB/s \n", "\u001B[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from torch==1.8.0+cu101) (1.21.5)\n", "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch==1.8.0+cu101) (3.10.0.2)\n", "Requirement already satisfied: pillow>=4.1.1 in /usr/local/lib/python3.7/dist-packages (from torchvision==0.9.0+cu101) (7.1.2)\n", "Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from torchtext==0.9.0) (4.62.3)\n", "Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from torchtext==0.9.0) 
(2.23.0)\n", "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->torchtext==0.9.0) (1.24.3)\n", "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->torchtext==0.9.0) (2.10)\n", "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->torchtext==0.9.0) (2021.10.8)\n", "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->torchtext==0.9.0) (3.0.4)\n", "Installing collected packages: torch, torchvision, torchtext, torchaudio\n", " Attempting uninstall: torch\n", " Found existing installation: torch 1.10.0+cu111\n", " Uninstalling torch-1.10.0+cu111:\n", " Successfully uninstalled torch-1.10.0+cu111\n", " Attempting uninstall: torchvision\n", " Found existing installation: torchvision 0.11.1+cu111\n", " Uninstalling torchvision-0.11.1+cu111:\n", " Successfully uninstalled torchvision-0.11.1+cu111\n", " Attempting uninstall: torchtext\n", " Found existing installation: torchtext 0.11.0\n", " Uninstalling torchtext-0.11.0:\n", " Successfully uninstalled torchtext-0.11.0\n", " Attempting uninstall: torchaudio\n", " Found existing installation: torchaudio 0.10.0+cu111\n", " Uninstalling torchaudio-0.10.0+cu111:\n", " Successfully uninstalled torchaudio-0.10.0+cu111\n", "Successfully installed torch-1.8.0+cu101 torchaudio-0.8.0 torchtext-0.9.0 torchvision-0.9.0+cu101\n", "Looking in links: https://download.openmmlab.com/mmcv/dist/cu101/torch1.8.0/index.html\n", "Collecting mmcv-full\n", " Downloading https://download.openmmlab.com/mmcv/dist/cu101/torch1.8.0/mmcv_full-1.4.5-cp37-cp37m-manylinux1_x86_64.whl (60.7 MB)\n", "\u001B[K |████████████████████████████████| 60.7 MB 66 kB/s \n", "\u001B[?25hCollecting addict\n", " Downloading addict-2.4.0-py3-none-any.whl (3.8 kB)\n", "Requirement already satisfied: Pillow in 
/usr/local/lib/python3.7/dist-packages (from mmcv-full) (7.1.2)\n", "Requirement already satisfied: pyyaml in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (3.13)\n", "Collecting yapf\n", " Downloading yapf-0.32.0-py2.py3-none-any.whl (190 kB)\n", "\u001B[K |████████████████████████████████| 190 kB 15.6 MB/s \n", "\u001B[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (1.21.5)\n", "Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (21.3)\n", "Requirement already satisfied: opencv-python>=3 in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (4.1.2.30)\n", "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->mmcv-full) (3.0.7)\n", "Installing collected packages: yapf, addict, mmcv-full\n", "Successfully installed addict-2.4.0 mmcv-full-1.4.5 yapf-0.32.0\n", "Cloning into 'mmaction2'...\n", "remote: Enumerating objects: 15036, done.\u001B[K\n", "remote: Counting objects: 100% (233/233), done.\u001B[K\n", "remote: Compressing objects: 100% (192/192), done.\u001B[K\n", "remote: Total 15036 (delta 86), reused 72 (delta 41), pack-reused 14803\u001B[K\n", "Receiving objects: 100% (15036/15036), 49.25 MiB | 25.23 MiB/s, done.\n", "Resolving deltas: 100% (10608/10608), done.\n", "/content/mmaction2\n", "Obtaining file:///content/mmaction2\n", "Collecting decord>=0.4.1\n", " Downloading decord-0.6.0-py3-none-manylinux2010_x86_64.whl (13.6 MB)\n", "\u001B[K |████████████████████████████████| 13.6 MB 10.2 MB/s \n", "\u001B[?25hCollecting einops\n", " Downloading einops-0.4.0-py3-none-any.whl (28 kB)\n", "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.21.0) (3.2.2)\n", "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.21.0) (1.21.5)\n", "Requirement already satisfied: opencv-contrib-python in 
/usr/local/lib/python3.7/dist-packages (from mmaction2==0.21.0) (4.1.2.30)\n", "Requirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.21.0) (7.1.2)\n", "Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.21.0) (1.4.1)\n", "Requirement already satisfied: torch>=1.3 in /usr/local/lib/python3.7/dist-packages (from mmaction2==0.21.0) (1.8.0+cu101)\n", "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch>=1.3->mmaction2==0.21.0) (3.10.0.2)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.21.0) (1.3.2)\n", "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.21.0) (2.8.2)\n", "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.21.0) (3.0.7)\n", "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmaction2==0.21.0) (0.11.0)\n", "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib->mmaction2==0.21.0) (1.15.0)\n", "Installing collected packages: einops, decord, mmaction2\n", " Running setup.py develop for mmaction2\n", "Successfully installed decord-0.6.0 einops-0.4.0 mmaction2-0.21.0\n", "Collecting av\n", " Downloading av-8.1.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (36.1 MB)\n", "\u001B[K |████████████████████████████████| 36.1 MB 298 kB/s \n", "\u001B[?25hRequirement already satisfied: imgaug in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 2)) (0.2.9)\n", "Requirement already satisfied: librosa in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 3)) (0.8.1)\n", "Requirement already 
satisfied: lmdb in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 4)) (0.99)\n", "Requirement already satisfied: moviepy in /usr/local/lib/python3.7/dist-packages (from -r requirements/optional.txt (line 5)) (0.2.3.5)\n", "Collecting onnx\n", " Downloading onnx-1.11.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (12.8 MB)\n", "\u001B[K |████████████████████████████████| 12.8 MB 52.3 MB/s \n", "\u001B[?25hCollecting onnxruntime\n", " Downloading onnxruntime-1.10.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB)\n", "\u001B[K |████████████████████████████████| 4.9 MB 51.6 MB/s \n", "\u001B[?25hCollecting pims\n", " Downloading PIMS-0.5.tar.gz (85 kB)\n", "\u001B[K |████████████████████████████████| 85 kB 5.2 MB/s \n", "\u001B[?25hCollecting PyTurboJPEG\n", " Downloading PyTurboJPEG-1.6.5.tar.gz (11 kB)\n", "Collecting timm\n", " Downloading timm-0.5.4-py3-none-any.whl (431 kB)\n", "\u001B[K |████████████████████████████████| 431 kB 64.7 MB/s \n", "\u001B[?25hRequirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (7.1.2)\n", "Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (1.21.5)\n", "Requirement already satisfied: scikit-image>=0.11.0 in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (0.18.3)\n", "Requirement already satisfied: imageio in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (2.4.1)\n", "Requirement already satisfied: opencv-python in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (4.1.2.30)\n", "Requirement already satisfied: Shapely in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (1.8.0)\n", "Requirement already satisfied: scipy in 
/usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (1.4.1)\n", "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (3.2.2)\n", "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from imgaug->-r requirements/optional.txt (line 2)) (1.15.0)\n", "Requirement already satisfied: PyWavelets>=1.1.1 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.11.0->imgaug->-r requirements/optional.txt (line 2)) (1.2.0)\n", "Requirement already satisfied: tifffile>=2019.7.26 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.11.0->imgaug->-r requirements/optional.txt (line 2)) (2021.11.2)\n", "Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.11.0->imgaug->-r requirements/optional.txt (line 2)) (2.6.3)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 2)) (1.3.2)\n", "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 2)) (0.11.0)\n", "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 2)) (2.8.2)\n", "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug->-r requirements/optional.txt (line 2)) (3.0.7)\n", "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (21.3)\n", "Requirement already satisfied: numba>=0.43.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (0.51.2)\n", "Requirement already satisfied: resampy>=0.2.2 in 
/usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (0.2.2)\n", "Requirement already satisfied: decorator>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (4.4.2)\n", "Requirement already satisfied: soundfile>=0.10.2 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (0.10.3.post1)\n", "Requirement already satisfied: scikit-learn!=0.19.0,>=0.14.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (1.0.2)\n", "Requirement already satisfied: joblib>=0.14 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (1.1.0)\n", "Requirement already satisfied: pooch>=1.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (1.6.0)\n", "Requirement already satisfied: audioread>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from librosa->-r requirements/optional.txt (line 3)) (2.1.9)\n", "Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from numba>=0.43.0->librosa->-r requirements/optional.txt (line 3)) (57.4.0)\n", "Requirement already satisfied: llvmlite<0.35,>=0.34.0.dev0 in /usr/local/lib/python3.7/dist-packages (from numba>=0.43.0->librosa->-r requirements/optional.txt (line 3)) (0.34.0)\n", "Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.7/dist-packages (from pooch>=1.0->librosa->-r requirements/optional.txt (line 3)) (2.23.0)\n", "Requirement already satisfied: appdirs>=1.3.0 in /usr/local/lib/python3.7/dist-packages (from pooch>=1.0->librosa->-r requirements/optional.txt (line 3)) (1.4.4)\n", "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->pooch>=1.0->librosa->-r requirements/optional.txt (line 3)) (3.0.4)\n", "Requirement already satisfied: 
urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->pooch>=1.0->librosa->-r requirements/optional.txt (line 3)) (1.24.3)\n", "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->pooch>=1.0->librosa->-r requirements/optional.txt (line 3)) (2.10)\n", "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->pooch>=1.0->librosa->-r requirements/optional.txt (line 3)) (2021.10.8)\n", "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn!=0.19.0,>=0.14.0->librosa->-r requirements/optional.txt (line 3)) (3.1.0)\n", "Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.7/dist-packages (from soundfile>=0.10.2->librosa->-r requirements/optional.txt (line 3)) (1.15.0)\n", "Requirement already satisfied: pycparser in /usr/local/lib/python3.7/dist-packages (from cffi>=1.0->soundfile>=0.10.2->librosa->-r requirements/optional.txt (line 3)) (2.21)\n", "Requirement already satisfied: tqdm<5.0,>=4.11.2 in /usr/local/lib/python3.7/dist-packages (from moviepy->-r requirements/optional.txt (line 5)) (4.62.3)\n", "Requirement already satisfied: typing-extensions>=3.6.2.1 in /usr/local/lib/python3.7/dist-packages (from onnx->-r requirements/optional.txt (line 6)) (3.10.0.2)\n", "Requirement already satisfied: protobuf>=3.12.2 in /usr/local/lib/python3.7/dist-packages (from onnx->-r requirements/optional.txt (line 6)) (3.17.3)\n", "Requirement already satisfied: flatbuffers in /usr/local/lib/python3.7/dist-packages (from onnxruntime->-r requirements/optional.txt (line 7)) (2.0)\n", "Collecting slicerator>=0.9.8\n", " Downloading slicerator-1.0.0-py3-none-any.whl (9.3 kB)\n", "Requirement already satisfied: torch>=1.4 in /usr/local/lib/python3.7/dist-packages (from timm->-r requirements/optional.txt (line 10)) (1.8.0+cu101)\n", "Requirement 
already satisfied: torchvision in /usr/local/lib/python3.7/dist-packages (from timm->-r requirements/optional.txt (line 10)) (0.9.0+cu101)\n", "Building wheels for collected packages: pims, PyTurboJPEG\n", " Building wheel for pims (setup.py) ... \u001B[?25l\u001B[?25hdone\n", " Created wheel for pims: filename=PIMS-0.5-py3-none-any.whl size=84325 sha256=acdeb0697c66e2b9cc49a549f9a3c67a35b36642e6724eeac9795e25e6d9de47\n", " Stored in directory: /root/.cache/pip/wheels/75/02/a9/86571c38081ba4c1832eb95430b5d588dfa15a738e2a603737\n", " Building wheel for PyTurboJPEG (setup.py) ... \u001B[?25l\u001B[?25hdone\n", " Created wheel for PyTurboJPEG: filename=PyTurboJPEG-1.6.5-py3-none-any.whl size=12160 sha256=b5fffd01e16b4d2a1d2f4e1cd976501c1e3ea1b3872f91bf595f6c025735a4e0\n", " Stored in directory: /root/.cache/pip/wheels/1b/6a/97/17286b24cd97dda462b5a886107f8663f1ccc7705f148b3850\n", "Successfully built pims PyTurboJPEG\n", "Installing collected packages: slicerator, timm, PyTurboJPEG, pims, onnxruntime, onnx, av\n", "Successfully installed PyTurboJPEG-1.6.5 av-8.1.0 onnx-1.11.0 onnxruntime-1.10.0 pims-0.5 slicerator-1.0.0 timm-0.5.4\n" ] } ], "source": [ "# Install dependencies (use cu111 because Colab has CUDA 11.1)\n", "!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html\n", "\n", "# Install mmcv-full so that we can use CUDA operators\n", "!pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html\n", "\n", "# Install mmaction2\n", "!rm -rf mmaction2\n", "!git clone https://github.com/open-mmlab/mmaction2.git\n", "%cd mmaction2\n", "\n", "!pip install -e .\n", "\n", "# Install some optional requirements\n", "!pip install -r requirements/optional.txt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "No_zZAFpWC-a", "outputId": "1f5dd76e-7749-4fc3-ee97-83c5e1700f29" }, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "1.8.0+cu101 True\n", "0.21.0\n", "10.1\n", "GCC 7.3\n" ] } ], "source": [ "# Check PyTorch installation\n", "import torch, torchvision\n", "print(torch.__version__, torch.cuda.is_available())\n", "\n", "# Check MMAction2 installation\n", "import mmaction\n", "print(mmaction.__version__)\n", "\n", "# Check MMCV installation\n", "from mmcv.ops import get_compiling_cuda_version, get_compiler_version\n", "print(get_compiling_cuda_version())\n", "print(get_compiler_version())" ] }, { "cell_type": "markdown", "metadata": { "id": "pXf7oV5DWdab" }, "source": [ "## Perform inference with an MMAction2 recognizer\n", "MMAction2 provides high-level APIs for inference and training." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "64CW6d_AaT-Q", "outputId": "d08bfb9b-ab1e-451b-d3b2-89023a59766b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2021-07-11 12:44:00-- https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth\n", "Resolving download.openmmlab.com (download.openmmlab.com)... 47.88.36.78\n", "Connecting to download.openmmlab.com (download.openmmlab.com)|47.88.36.78|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 97579339 (93M) [application/octet-stream]\n", "Saving to: ‘checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth’\n", "\n", "checkpoints/tsn_r50 100%[===================>] 93.06M 11.4MB/s in 8.1s \n", "\n", "2021-07-11 12:44:09 (11.4 MB/s) - ‘checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth’ saved [97579339/97579339]\n", "\n" ] } ], "source": [ "!mkdir -p checkpoints\n", "!wget -c https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \\\n", " -O checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "HNZB7NoSabzj", "outputId": "b2f9bd71-1490-44d3-81c6-5037d804f0b1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Use load_from_local loader\n" ] } ], "source": [ "from mmaction.apis import inference_recognizer, init_recognizer\n", "\n", "# Choose the config used to initialize the recognizer\n", "config = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'\n", "# Set the checkpoint file to load\n", "checkpoint = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'\n", "# Initialize the recognizer\n", "model = init_recognizer(config, checkpoint, device='cuda:0')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "rEMsBnpHapAn" }, "outputs": [], "source": [ "# Use the recognizer to do inference\n", "video = 'demo/demo.mp4'\n", "label = 'tools/data/kinetics/label_map_k400.txt'\n", "results = inference_recognizer(model, video)\n", "\n", "# Map the predicted class indices to human-readable label names\n", "with open(label) as f:\n", "    labels = [x.strip() for x in f]\n", "results = [(labels[k[0]], k[1]) for k in results]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": 
"NIyJXqfWathq", "outputId": "ca24528b-f99d-414a-fa50-456f6068b463" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "arm wrestling: 29.616438\n", "rock scissors paper: 10.754841\n", "shaking hands: 9.908401\n", "clapping: 9.189913\n", "massaging feet: 8.305307\n" ] } ], "source": [ "# Let's show the results\n", "for result in results:\n", "    print(f'{result[0]}: ', result[1])" ] }, { "cell_type": "markdown", "metadata": { "id": "QuZG8kZ2fJ5d" }, "source": [ "## Train a recognizer on a customized dataset\n", "\n", "To train a new recognizer, there are usually three things to do:\n", "1. Support a new dataset\n", "2. Modify the config\n", "3. Train a new recognizer" ] }, { "cell_type": "markdown", "metadata": { "id": "neEFyxChfgiJ" }, "source": [ "### Support a new dataset\n", "\n", "In this tutorial, we give an example of converting the data into the format of an existing dataset. Other methods and more advanced usage can be found in the [doc](/docs/en/tutorials/new_dataset.md).\n", "\n", "First, let's download a tiny dataset obtained from [Kinetics-400](https://deepmind.com/research/open-source/open-source-datasets/kinetics/). We select 30 videos with their labels as the training set and 10 videos with their labels as the test set." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "gjsUj9JzgUlJ", "outputId": "61c4704d-db81-4ca5-ed16-e2454dbdfe8e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rm: cannot remove 'kinetics400_tiny.zip*': No such file or directory\n", "--2021-07-11 12:44:29-- https://download.openmmlab.com/mmaction/kinetics400_tiny.zip\n", "Resolving download.openmmlab.com (download.openmmlab.com)... 47.88.36.78\n", "Connecting to download.openmmlab.com (download.openmmlab.com)|47.88.36.78|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 18308682 (17M) [application/zip]\n", "Saving to: ‘kinetics400_tiny.zip’\n", "\n", "kinetics400_tiny.zi 100%[===================>] 17.46M 10.7MB/s in 1.6s \n", "\n", "2021-07-11 12:44:31 (10.7 MB/s) - ‘kinetics400_tiny.zip’ saved [18308682/18308682]\n", "\n" ] } ], "source": [ "# download, decompress the data\n", "!rm kinetics400_tiny.zip*\n", "!rm -rf kinetics400_tiny\n", "!wget https://download.openmmlab.com/mmaction/kinetics400_tiny.zip\n", "!unzip kinetics400_tiny.zip > /dev/null" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "AbZ-o7V6hNw4", "outputId": "b091909c-def2-49b5-88c2-01b00802b162" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Reading package lists...\n", "Building dependency tree...\n", "Reading state information...\n", "The following NEW packages will be installed:\n", " tree\n", "0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.\n", "Need to get 40.7 kB of archives.\n", "After this operation, 105 kB of additional disk space will be used.\n", "Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tree amd64 1.7.0-5 [40.7 kB]\n", "Fetched 40.7 kB in 0s (88.7 kB/s)\n", "Selecting previously unselected package tree.\n", "(Reading database ... 
160815 files and directories currently installed.)\n", "Preparing to unpack .../tree_1.7.0-5_amd64.deb ...\n", "Unpacking tree (1.7.0-5) ...\n", "Setting up tree (1.7.0-5) ...\n", "Processing triggers for man-db (2.8.3-2ubuntu0.1) ...\n", "kinetics400_tiny\n", "├── kinetics_tiny_train_video.txt\n", "├── kinetics_tiny_val_video.txt\n", "├── train\n", "│   ├── 27_CSXByd3s.mp4\n", "│   ├── 34XczvTaRiI.mp4\n", "│   ├── A-wiliK50Zw.mp4\n", "│   ├── D32_1gwq35E.mp4\n", "│   ├── D92m0HsHjcQ.mp4\n", "│   ├── DbX8mPslRXg.mp4\n", "│   ├── FMlSTTpN3VY.mp4\n", "│   ├── h10B9SVE-nk.mp4\n", "│   ├── h2YqqUhnR34.mp4\n", "│   ├── iRuyZSKhHRg.mp4\n", "│   ├── IyfILH9lBRo.mp4\n", "│   ├── kFC3KY2bOP8.mp4\n", "│   ├── LvcFDgCAXQs.mp4\n", "│   ├── O46YA8tI530.mp4\n", "│   ├── oMrZaozOvdQ.mp4\n", "│   ├── oXy-e_P_cAI.mp4\n", "│   ├── P5M-hAts7MQ.mp4\n", "│   ├── phDqGd0NKoo.mp4\n", "│   ├── PnOe3GZRVX8.mp4\n", "│   ├── R8HXQkdgKWA.mp4\n", "│   ├── RqnKtCEoEcA.mp4\n", "│   ├── soEcZZsBmDs.mp4\n", "│   ├── TkkZPZHbAKA.mp4\n", "│   ├── T_TMNGzVrDk.mp4\n", "│   ├── WaS0qwP46Us.mp4\n", "│   ├── Wh_YPQdH1Zg.mp4\n", "│   ├── WWP5HZJsg-o.mp4\n", "│   ├── xGY2dP0YUjA.mp4\n", "│   ├── yLC9CtWU5ws.mp4\n", "│   └── ZQV4U2KQ370.mp4\n", "└── val\n", " ├── 0pVGiAU6XEA.mp4\n", " ├── AQrbRSnRt8M.mp4\n", " ├── b6Q_b7vgc7Q.mp4\n", " ├── ddvJ6-faICE.mp4\n", " ├── IcLztCtvhb8.mp4\n", " ├── ik4BW3-SCts.mp4\n", " ├── jqRrH30V0k4.mp4\n", " ├── SU_x2LQqSLs.mp4\n", " ├── u4Rm6srmIS8.mp4\n", " └── y5Iu7XkTqV0.mp4\n", "\n", "2 directories, 42 files\n" ] } ], "source": [ "# Check the directory structure of the tiny data\n", "\n", "# Install tree first\n", "!apt-get -q install tree\n", "!tree kinetics400_tiny" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fTdi6dI0hY3g", "outputId": "ffda0997-8d77-431a-d66e-2f273e80c756" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "D32_1gwq35E.mp4 0\n", "iRuyZSKhHRg.mp4 1\n", 
"oXy-e_P_cAI.mp4 0\n", "34XczvTaRiI.mp4 1\n", "h2YqqUhnR34.mp4 0\n", "O46YA8tI530.mp4 0\n", "kFC3KY2bOP8.mp4 1\n", "WWP5HZJsg-o.mp4 1\n", "phDqGd0NKoo.mp4 1\n", "yLC9CtWU5ws.mp4 0\n", "27_CSXByd3s.mp4 1\n", "IyfILH9lBRo.mp4 1\n", "T_TMNGzVrDk.mp4 1\n", "TkkZPZHbAKA.mp4 0\n", "PnOe3GZRVX8.mp4 1\n", "soEcZZsBmDs.mp4 1\n", "FMlSTTpN3VY.mp4 1\n", "WaS0qwP46Us.mp4 0\n", "A-wiliK50Zw.mp4 1\n", "oMrZaozOvdQ.mp4 1\n", "ZQV4U2KQ370.mp4 0\n", "DbX8mPslRXg.mp4 1\n", "h10B9SVE-nk.mp4 1\n", "P5M-hAts7MQ.mp4 0\n", "R8HXQkdgKWA.mp4 0\n", "D92m0HsHjcQ.mp4 0\n", "RqnKtCEoEcA.mp4 0\n", "LvcFDgCAXQs.mp4 0\n", "xGY2dP0YUjA.mp4 0\n", "Wh_YPQdH1Zg.mp4 0\n" ] } ], "source": [ "# After downloading the data, we need to check the annotation format\n", "!cat kinetics400_tiny/kinetics_tiny_train_video.txt" ] }, { "cell_type": "markdown", "metadata": { "id": "0bq0mxmEi29H" }, "source": [ "According to the format defined in [`VideoDataset`](./datasets/video_dataset.py), each line describes one sample video: its file path and its label, separated by whitespace." ] }, { "cell_type": "markdown", "metadata": { "id": "Ht_DGJA9jQar" }, "source": [ "### Modify the config\n", "\n", "In the next step, we need to modify the config for training.\n", "To speed up the process, we fine-tune a pre-trained recognizer instead of training from scratch." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "LjCcmCKOjktc" }, "outputs": [], "source": [ "from mmcv import Config\n", "cfg = Config.fromfile('./configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py')" ] }, { "cell_type": "markdown", "metadata": { "id": "tc8YhFFGjp3e" }, "source": [ "Given a config that trains a TSN model on the full Kinetics-400 dataset, we need to modify some values to use it for training TSN on the Kinetics400-tiny dataset.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "tlhu9byjjt-K", "outputId": "3b9a3c49-ace0-41d3-dd15-d6c8579755f8" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Config:\n", "model = dict(\n", " type='Recognizer2D',\n", " backbone=dict(\n", " type='ResNet',\n", " pretrained='torchvision://resnet50',\n", " depth=50,\n", " norm_eval=False),\n", " cls_head=dict(\n", " type='TSNHead',\n", " num_classes=2,\n", " in_channels=2048,\n", " spatial_type='avg',\n", " consensus=dict(type='AvgConsensus', dim=1),\n", " dropout_ratio=0.4,\n", " init_std=0.01),\n", " train_cfg=None,\n", " test_cfg=dict(average_clips=None))\n", "optimizer = dict(type='SGD', lr=7.8125e-05, momentum=0.9, weight_decay=0.0001)\n", "optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))\n", "lr_config = dict(policy='step', step=[40, 80])\n", "total_epochs = 10\n", "checkpoint_config = dict(interval=5)\n", "log_config = dict(interval=5, hooks=[dict(type='TextLoggerHook')])\n", "dist_params = dict(backend='nccl')\n", "log_level = 'INFO'\n", "load_from = './checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'\n", "resume_from = None\n", "workflow = [('train', 1)]\n", "dataset_type = 'VideoDataset'\n", "data_root = 'kinetics400_tiny/train/'\n", "data_root_val = 'kinetics400_tiny/val/'\n", "ann_file_train = 'kinetics400_tiny/kinetics_tiny_train_video.txt'\n", "ann_file_val = 
'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", "ann_file_test = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", "img_norm_cfg = dict(\n", " mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)\n", "train_pipeline = [\n", " dict(type='DecordInit'),\n", " dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8),\n", " dict(type='DecordDecode'),\n", " dict(\n", " type='MultiScaleCrop',\n", " input_size=224,\n", " scales=(1, 0.875, 0.75, 0.66),\n", " random_crop=False,\n", " max_wh_scale_gap=1),\n", " dict(type='Resize', scale=(224, 224), keep_ratio=False),\n", " dict(type='Flip', flip_ratio=0.5),\n", " dict(\n", " type='Normalize',\n", " mean=[123.675, 116.28, 103.53],\n", " std=[58.395, 57.12, 57.375],\n", " to_bgr=False),\n", " dict(type='FormatShape', input_format='NCHW'),\n", " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", " dict(type='ToTensor', keys=['imgs', 'label'])\n", "]\n", "val_pipeline = [\n", " dict(type='DecordInit'),\n", " dict(\n", " type='SampleFrames',\n", " clip_len=1,\n", " frame_interval=1,\n", " num_clips=8,\n", " test_mode=True),\n", " dict(type='DecordDecode'),\n", " dict(type='Resize', scale=(-1, 256)),\n", " dict(type='CenterCrop', crop_size=224),\n", " dict(\n", " type='Normalize',\n", " mean=[123.675, 116.28, 103.53],\n", " std=[58.395, 57.12, 57.375],\n", " to_bgr=False),\n", " dict(type='FormatShape', input_format='NCHW'),\n", " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", " dict(type='ToTensor', keys=['imgs'])\n", "]\n", "test_pipeline = [\n", " dict(type='DecordInit'),\n", " dict(\n", " type='SampleFrames',\n", " clip_len=1,\n", " frame_interval=1,\n", " num_clips=25,\n", " test_mode=True),\n", " dict(type='DecordDecode'),\n", " dict(type='Resize', scale=(-1, 256)),\n", " dict(type='ThreeCrop', crop_size=256),\n", " dict(\n", " type='Normalize',\n", " mean=[123.675, 116.28, 103.53],\n", " std=[58.395, 57.12, 57.375],\n", " to_bgr=False),\n", " 
dict(type='FormatShape', input_format='NCHW'),\n", " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", " dict(type='ToTensor', keys=['imgs'])\n", "]\n", "data = dict(\n", " videos_per_gpu=2,\n", " workers_per_gpu=2,\n", " train=dict(\n", " type='VideoDataset',\n", " ann_file='kinetics400_tiny/kinetics_tiny_train_video.txt',\n", " data_prefix='kinetics400_tiny/train/',\n", " pipeline=[\n", " dict(type='DecordInit'),\n", " dict(\n", " type='SampleFrames', clip_len=1, frame_interval=1,\n", " num_clips=8),\n", " dict(type='DecordDecode'),\n", " dict(\n", " type='MultiScaleCrop',\n", " input_size=224,\n", " scales=(1, 0.875, 0.75, 0.66),\n", " random_crop=False,\n", " max_wh_scale_gap=1),\n", " dict(type='Resize', scale=(224, 224), keep_ratio=False),\n", " dict(type='Flip', flip_ratio=0.5),\n", " dict(\n", " type='Normalize',\n", " mean=[123.675, 116.28, 103.53],\n", " std=[58.395, 57.12, 57.375],\n", " to_bgr=False),\n", " dict(type='FormatShape', input_format='NCHW'),\n", " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", " dict(type='ToTensor', keys=['imgs', 'label'])\n", " ]),\n", " val=dict(\n", " type='VideoDataset',\n", " ann_file='kinetics400_tiny/kinetics_tiny_val_video.txt',\n", " data_prefix='kinetics400_tiny/val/',\n", " pipeline=[\n", " dict(type='DecordInit'),\n", " dict(\n", " type='SampleFrames',\n", " clip_len=1,\n", " frame_interval=1,\n", " num_clips=8,\n", " test_mode=True),\n", " dict(type='DecordDecode'),\n", " dict(type='Resize', scale=(-1, 256)),\n", " dict(type='CenterCrop', crop_size=224),\n", " dict(\n", " type='Normalize',\n", " mean=[123.675, 116.28, 103.53],\n", " std=[58.395, 57.12, 57.375],\n", " to_bgr=False),\n", " dict(type='FormatShape', input_format='NCHW'),\n", " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", " dict(type='ToTensor', keys=['imgs'])\n", " ]),\n", " test=dict(\n", " type='VideoDataset',\n", " ann_file='kinetics400_tiny/kinetics_tiny_val_video.txt',\n", " 
data_prefix='kinetics400_tiny/val/',\n", " pipeline=[\n", " dict(type='DecordInit'),\n", " dict(\n", " type='SampleFrames',\n", " clip_len=1,\n", " frame_interval=1,\n", " num_clips=25,\n", " test_mode=True),\n", " dict(type='DecordDecode'),\n", " dict(type='Resize', scale=(-1, 256)),\n", " dict(type='ThreeCrop', crop_size=256),\n", " dict(\n", " type='Normalize',\n", " mean=[123.675, 116.28, 103.53],\n", " std=[58.395, 57.12, 57.375],\n", " to_bgr=False),\n", " dict(type='FormatShape', input_format='NCHW'),\n", " dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),\n", " dict(type='ToTensor', keys=['imgs'])\n", " ]))\n", "evaluation = dict(\n", " interval=5,\n", " metrics=['top_k_accuracy', 'mean_class_accuracy'],\n", " save_best='auto')\n", "work_dir = './tutorial_exps'\n", "omnisource = False\n", "seed = 0\n", "gpu_ids = range(0, 1)\n", "\n" ] } ], "source": [ "from mmcv.runner import set_random_seed\n", "\n", "# Modify dataset type and path\n", "cfg.dataset_type = 'VideoDataset'\n", "cfg.data_root = 'kinetics400_tiny/train/'\n", "cfg.data_root_val = 'kinetics400_tiny/val/'\n", "cfg.ann_file_train = 'kinetics400_tiny/kinetics_tiny_train_video.txt'\n", "cfg.ann_file_val = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", "cfg.ann_file_test = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", "\n", "cfg.data.test.type = 'VideoDataset'\n", "cfg.data.test.ann_file = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", "cfg.data.test.data_prefix = 'kinetics400_tiny/val/'\n", "\n", "cfg.data.train.type = 'VideoDataset'\n", "cfg.data.train.ann_file = 'kinetics400_tiny/kinetics_tiny_train_video.txt'\n", "cfg.data.train.data_prefix = 'kinetics400_tiny/train/'\n", "\n", "cfg.data.val.type = 'VideoDataset'\n", "cfg.data.val.ann_file = 'kinetics400_tiny/kinetics_tiny_val_video.txt'\n", "cfg.data.val.data_prefix = 'kinetics400_tiny/val/'\n", "\n", "# The flag is used to determine whether it is omnisource training\n", "cfg.setdefault('omnisource', False)\n", "# Modify 
num classes of the model in cls_head\n", "cfg.model.cls_head.num_classes = 2\n", "# We can use the pre-trained TSN model\n", "cfg.load_from = './checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'\n", "\n", "# Set up working dir to save files and logs.\n", "cfg.work_dir = './tutorial_exps'\n", "\n", "# The original learning rate (LR) is set for 8-GPU training.\n", "# We divide it by 8 since we only use one GPU, and by a further 16\n", "# because the per-GPU batch size is also reduced by a factor of 16.\n", "cfg.data.videos_per_gpu = cfg.data.videos_per_gpu // 16\n", "cfg.optimizer.lr = cfg.optimizer.lr / 8 / 16\n", "cfg.total_epochs = 10\n", "\n", "# We can set the checkpoint saving interval to reduce the storage cost\n", "cfg.checkpoint_config.interval = 5\n", "# We can set the log print interval to reduce how often the log is printed\n", "cfg.log_config.interval = 5\n", "\n", "# Set the seed so that the results are more reproducible\n", "cfg.seed = 0\n", "set_random_seed(0, deterministic=False)\n", "cfg.gpu_ids = range(1)\n", "\n", "# Save the best checkpoint\n", "cfg.evaluation.save_best='auto'\n", "\n", "\n", "# Have a look at the final config used for training\n", "print(f'Config:\\n{cfg.pretty_text}')\n" ] }, { "cell_type": "markdown", "metadata": { "id": "tES-qnZ3k38Z" }, "source": [ "### Train a new recognizer\n", "\n", "Finally, let's initialize the dataset and recognizer, then train a new recognizer!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "dDBWkdDRk6oz", "outputId": "a85d80d7-b3c4-43f1-d49a-057e8036807f" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Use load_from_torchvision loader\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2021-07-11 13:00:46,931 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}\n", "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n", " cpuset_checked))\n", "2021-07-11 13:00:46,980 - mmaction - INFO - load checkpoint from ./checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth\n", "2021-07-11 13:00:46,981 - mmaction - INFO - Use load_from_local loader\n", "2021-07-11 13:00:47,071 - mmaction - WARNING - The model and loaded state dict do not match exactly\n", "\n", "size mismatch for cls_head.fc_cls.weight: copying a param with shape torch.Size([400, 2048]) from checkpoint, the shape in current model is torch.Size([2, 2048]).\n", "size mismatch for cls_head.fc_cls.bias: copying a param with shape torch.Size([400]) from checkpoint, the shape in current model is torch.Size([2]).\n", "2021-07-11 13:00:47,074 - mmaction - INFO - Start running, host: root@b465112b4add, work_dir: /content/mmaction2/tutorial_exps\n", "2021-07-11 13:00:47,078 - mmaction - INFO - Hooks will be executed in the following order:\n", "before_run:\n", "(VERY_HIGH ) StepLrUpdaterHook \n", "(NORMAL ) CheckpointHook \n", "(NORMAL ) EvalHook \n", "(VERY_LOW ) TextLoggerHook \n", " 
-------------------- \n", "before_train_epoch:\n", "(VERY_HIGH ) StepLrUpdaterHook \n", "(NORMAL ) EvalHook \n", "(LOW ) IterTimerHook \n", "(VERY_LOW ) TextLoggerHook \n", " -------------------- \n", "before_train_iter:\n", "(VERY_HIGH ) StepLrUpdaterHook \n", "(NORMAL ) EvalHook \n", "(LOW ) IterTimerHook \n", " -------------------- \n", "after_train_iter:\n", "(ABOVE_NORMAL) OptimizerHook \n", "(NORMAL ) CheckpointHook \n", "(NORMAL ) EvalHook \n", "(LOW ) IterTimerHook \n", "(VERY_LOW ) TextLoggerHook \n", " -------------------- \n", "after_train_epoch:\n", "(NORMAL ) CheckpointHook \n", "(NORMAL ) EvalHook \n", "(VERY_LOW ) TextLoggerHook \n", " -------------------- \n", "before_val_epoch:\n", "(LOW ) IterTimerHook \n", "(VERY_LOW ) TextLoggerHook \n", " -------------------- \n", "before_val_iter:\n", "(LOW ) IterTimerHook \n", " -------------------- \n", "after_val_iter:\n", "(LOW ) IterTimerHook \n", " -------------------- \n", "after_val_epoch:\n", "(VERY_LOW ) TextLoggerHook \n", " -------------------- \n", "2021-07-11 13:00:47,081 - mmaction - INFO - workflow: [('train', 1)], max: 10 epochs\n", "/usr/local/lib/python3.7/dist-packages/mmcv/runner/hooks/evaluation.py:190: UserWarning: runner.meta is None. Creating an empty one.\n", " warnings.warn('runner.meta is None. 
Creating an empty one.')\n", "2021-07-11 13:00:51,802 - mmaction - INFO - Epoch [1][5/15]\tlr: 7.813e-05, eta: 0:02:16, time: 0.942, data_time: 0.730, memory: 2918, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.7604, loss: 0.7604, grad_norm: 14.8813\n", "2021-07-11 13:00:52,884 - mmaction - INFO - Epoch [1][10/15]\tlr: 7.813e-05, eta: 0:01:21, time: 0.217, data_time: 0.028, memory: 2918, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6282, loss: 0.6282, grad_norm: 10.1834\n", "2021-07-11 13:00:53,706 - mmaction - INFO - Epoch [1][15/15]\tlr: 7.813e-05, eta: 0:00:59, time: 0.164, data_time: 0.001, memory: 2918, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.7165, loss: 0.7165, grad_norm: 10.8534\n", "2021-07-11 13:00:57,724 - mmaction - INFO - Epoch [2][5/15]\tlr: 7.813e-05, eta: 0:01:09, time: 0.802, data_time: 0.596, memory: 2918, top1_acc: 0.3000, top5_acc: 1.0000, loss_cls: 0.7001, loss: 0.7001, grad_norm: 11.4311\n", "2021-07-11 13:00:59,219 - mmaction - INFO - Epoch [2][10/15]\tlr: 7.813e-05, eta: 0:01:00, time: 0.296, data_time: 0.108, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6916, loss: 0.6916, grad_norm: 12.7101\n", "2021-07-11 13:01:00,040 - mmaction - INFO - Epoch [2][15/15]\tlr: 7.813e-05, eta: 0:00:51, time: 0.167, data_time: 0.004, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6567, loss: 0.6567, grad_norm: 8.8837\n", "2021-07-11 13:01:04,152 - mmaction - INFO - Epoch [3][5/15]\tlr: 7.813e-05, eta: 0:00:56, time: 0.820, data_time: 0.618, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6320, loss: 0.6320, grad_norm: 11.4025\n", "2021-07-11 13:01:05,526 - mmaction - INFO - Epoch [3][10/15]\tlr: 7.813e-05, eta: 0:00:50, time: 0.276, data_time: 0.075, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6542, loss: 0.6542, grad_norm: 10.6429\n", "2021-07-11 13:01:06,350 - mmaction - INFO - Epoch [3][15/15]\tlr: 7.813e-05, eta: 0:00:44, time: 0.165, data_time: 0.001, memory: 2918, top1_acc: 
0.2000, top5_acc: 1.0000, loss_cls: 0.7661, loss: 0.7661, grad_norm: 12.8421\n", "2021-07-11 13:01:10,771 - mmaction - INFO - Epoch [4][5/15]\tlr: 7.813e-05, eta: 0:00:47, time: 0.883, data_time: 0.676, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6410, loss: 0.6410, grad_norm: 10.6697\n", "2021-07-11 13:01:11,776 - mmaction - INFO - Epoch [4][10/15]\tlr: 7.813e-05, eta: 0:00:42, time: 0.201, data_time: 0.011, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6949, loss: 0.6949, grad_norm: 10.5467\n", "2021-07-11 13:01:12,729 - mmaction - INFO - Epoch [4][15/15]\tlr: 7.813e-05, eta: 0:00:38, time: 0.190, data_time: 0.026, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6290, loss: 0.6290, grad_norm: 11.2779\n", "2021-07-11 13:01:16,816 - mmaction - INFO - Epoch [5][5/15]\tlr: 7.813e-05, eta: 0:00:38, time: 0.817, data_time: 0.608, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6011, loss: 0.6011, grad_norm: 9.1335\n", "2021-07-11 13:01:18,176 - mmaction - INFO - Epoch [5][10/15]\tlr: 7.813e-05, eta: 0:00:35, time: 0.272, data_time: 0.080, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6652, loss: 0.6652, grad_norm: 11.0616\n", "2021-07-11 13:01:19,119 - mmaction - INFO - Epoch [5][15/15]\tlr: 7.813e-05, eta: 0:00:32, time: 0.188, data_time: 0.017, memory: 2918, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6440, loss: 0.6440, grad_norm: 11.6473\n", "2021-07-11 13:01:19,120 - mmaction - INFO - Saving checkpoint at 5 epochs\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 4.9 task/s, elapsed: 2s, ETA: 0s" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2021-07-11 13:01:21,673 - mmaction - INFO - Evaluating top_k_accuracy ...\n", "2021-07-11 13:01:21,677 - mmaction - INFO - \n", "top1_acc\t0.7000\n", "top5_acc\t1.0000\n", "2021-07-11 13:01:21,679 - mmaction - INFO - Evaluating mean_class_accuracy ...\n", "2021-07-11 
13:01:21,682 - mmaction - INFO - \n", "mean_acc\t0.7000\n", "2021-07-11 13:01:22,264 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_5.pth.\n", "2021-07-11 13:01:22,267 - mmaction - INFO - Best top1_acc is 0.7000 at 5 epoch.\n", "2021-07-11 13:01:22,271 - mmaction - INFO - Epoch(val) [5][5]\ttop1_acc: 0.7000, top5_acc: 1.0000, mean_class_accuracy: 0.7000\n", "2021-07-11 13:01:26,623 - mmaction - INFO - Epoch [6][5/15]\tlr: 7.813e-05, eta: 0:00:31, time: 0.868, data_time: 0.656, memory: 2918, top1_acc: 0.7000, top5_acc: 1.0000, loss_cls: 0.6753, loss: 0.6753, grad_norm: 11.8640\n", "2021-07-11 13:01:27,597 - mmaction - INFO - Epoch [6][10/15]\tlr: 7.813e-05, eta: 0:00:28, time: 0.195, data_time: 0.003, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6715, loss: 0.6715, grad_norm: 11.3347\n", "2021-07-11 13:01:28,736 - mmaction - INFO - Epoch [6][15/15]\tlr: 7.813e-05, eta: 0:00:25, time: 0.228, data_time: 0.063, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5769, loss: 0.5769, grad_norm: 9.2541\n", "2021-07-11 13:01:32,860 - mmaction - INFO - Epoch [7][5/15]\tlr: 7.813e-05, eta: 0:00:24, time: 0.822, data_time: 0.620, memory: 2918, top1_acc: 0.9000, top5_acc: 1.0000, loss_cls: 0.5379, loss: 0.5379, grad_norm: 8.0147\n", "2021-07-11 13:01:34,340 - mmaction - INFO - Epoch [7][10/15]\tlr: 7.813e-05, eta: 0:00:22, time: 0.298, data_time: 0.109, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6187, loss: 0.6187, grad_norm: 11.5244\n", "2021-07-11 13:01:35,165 - mmaction - INFO - Epoch [7][15/15]\tlr: 7.813e-05, eta: 0:00:19, time: 0.165, data_time: 0.002, memory: 2918, top1_acc: 0.4000, top5_acc: 1.0000, loss_cls: 0.7063, loss: 0.7063, grad_norm: 12.4979\n", "2021-07-11 13:01:39,435 - mmaction - INFO - Epoch [8][5/15]\tlr: 7.813e-05, eta: 0:00:17, time: 0.853, data_time: 0.641, memory: 2918, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.5369, loss: 0.5369, grad_norm: 8.6545\n", "2021-07-11 13:01:40,808 
- mmaction - INFO - Epoch [8][10/15]\tlr: 7.813e-05, eta: 0:00:15, time: 0.275, data_time: 0.086, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6407, loss: 0.6407, grad_norm: 12.5537\n", "2021-07-11 13:01:41,627 - mmaction - INFO - Epoch [8][15/15]\tlr: 7.813e-05, eta: 0:00:12, time: 0.164, data_time: 0.001, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6073, loss: 0.6073, grad_norm: 11.4028\n", "2021-07-11 13:01:45,651 - mmaction - INFO - Epoch [9][5/15]\tlr: 7.813e-05, eta: 0:00:11, time: 0.803, data_time: 0.591, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5596, loss: 0.5596, grad_norm: 10.0821\n", "2021-07-11 13:01:46,891 - mmaction - INFO - Epoch [9][10/15]\tlr: 7.813e-05, eta: 0:00:08, time: 0.248, data_time: 0.044, memory: 2918, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.6470, loss: 0.6470, grad_norm: 11.8979\n", "2021-07-11 13:01:47,944 - mmaction - INFO - Epoch [9][15/15]\tlr: 7.813e-05, eta: 0:00:06, time: 0.211, data_time: 0.041, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6657, loss: 0.6657, grad_norm: 12.0643\n", "2021-07-11 13:01:52,200 - mmaction - INFO - Epoch [10][5/15]\tlr: 7.813e-05, eta: 0:00:04, time: 0.849, data_time: 0.648, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.6310, loss: 0.6310, grad_norm: 11.5690\n", "2021-07-11 13:01:53,707 - mmaction - INFO - Epoch [10][10/15]\tlr: 7.813e-05, eta: 0:00:02, time: 0.303, data_time: 0.119, memory: 2918, top1_acc: 0.8000, top5_acc: 1.0000, loss_cls: 0.5178, loss: 0.5178, grad_norm: 9.3324\n", "2021-07-11 13:01:54,520 - mmaction - INFO - Epoch [10][15/15]\tlr: 7.813e-05, eta: 0:00:00, time: 0.162, data_time: 0.001, memory: 2918, top1_acc: 0.5000, top5_acc: 1.0000, loss_cls: 0.6919, loss: 0.6919, grad_norm: 12.6688\n", "2021-07-11 13:01:54,522 - mmaction - INFO - Saving checkpoint at 10 epochs\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 5.9 task/s, 
elapsed: 2s, ETA: 0s" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2021-07-11 13:01:56,741 - mmaction - INFO - Evaluating top_k_accuracy ...\n", "2021-07-11 13:01:56,743 - mmaction - INFO - \n", "top1_acc\t1.0000\n", "top5_acc\t1.0000\n", "2021-07-11 13:01:56,749 - mmaction - INFO - Evaluating mean_class_accuracy ...\n", "2021-07-11 13:01:56,750 - mmaction - INFO - \n", "mean_acc\t1.0000\n", "2021-07-11 13:01:57,267 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_10.pth.\n", "2021-07-11 13:01:57,269 - mmaction - INFO - Best top1_acc is 1.0000 at 10 epoch.\n", "2021-07-11 13:01:57,270 - mmaction - INFO - Epoch(val) [10][5]\ttop1_acc: 1.0000, top5_acc: 1.0000, mean_class_accuracy: 1.0000\n" ] } ], "source": [ "import os.path as osp\n", "\n", "from mmaction.datasets import build_dataset\n", "from mmaction.models import build_model\n", "from mmaction.apis import train_model\n", "\n", "import mmcv\n", "\n", "# Build the dataset\n", "datasets = [build_dataset(cfg.data.train)]\n", "\n", "# Build the recognizer\n", "model = build_model(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))\n", "\n", "# Create work_dir\n", "mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))\n", "train_model(model, datasets, cfg, distributed=False, validate=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "zdSd7oTLlxIf" }, "source": [ "### Understand the log\n", "From the log, we can get a basic understanding of the training process and see how well the recognizer is trained.\n", "\n", "First, the ResNet-50 backbone pre-trained on ImageNet is loaded. This is common practice, since training from scratch is more costly. 
The log shows that all the weights of the ResNet-50 backbone are loaded except `fc.bias` and `fc.weight`.\n", "\n", "Second, since the dataset we are using is small, we load a pre-trained TSN model and fine-tune it for action recognition.\n", "The original TSN is trained on the original Kinetics-400 dataset, which contains 400 classes, but the Kinetics-400 Tiny dataset has only 2 classes. Therefore, the last FC layer of the pre-trained TSN for classification has a different weight shape and is not loaded.\n", "\n", "Third, after training, the recognizer is evaluated with the default evaluation metrics. The results show that the recognizer achieves 100% top-1 accuracy and 100% top-5 accuracy on the validation set.\n", "\n", "Not bad!" ] }, { "cell_type": "markdown", "metadata": { "id": "ryVoSfZVmogw" }, "source": [ "## Test the trained recognizer\n", "\n", "After fine-tuning the recognizer, let's check the prediction results!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "eyY3hCMwyTct", "outputId": "ea54ff0a-4299-4e93-c1ca-4fe597e7516b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ ] 0/10, elapsed: 0s, ETA:" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. 
Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n", " cpuset_checked))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10/10, 2.2 task/s, elapsed: 5s, ETA: 0s\n", "Evaluating top_k_accuracy ...\n", "\n", "top1_acc\t1.0000\n", "top5_acc\t1.0000\n", "\n", "Evaluating mean_class_accuracy ...\n", "\n", "mean_acc\t1.0000\n", "top1_acc: 1.0000\n", "top5_acc: 1.0000\n", "mean_class_accuracy: 1.0000\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/content/mmaction2/mmaction/datasets/base.py:166: UserWarning: Option arguments for metrics has been changed to `metric_options`, See 'https://github.com/open-mmlab/mmaction2/pull/286' for more details\n", " 'Option arguments for metrics has been changed to '\n" ] } ], "source": [ "from mmaction.apis import single_gpu_test\n", "from mmaction.datasets import build_dataloader\n", "from mmcv.parallel import MMDataParallel\n", "\n", "# Build a test dataloader\n", "dataset = build_dataset(cfg.data.test, dict(test_mode=True))\n", "data_loader = build_dataloader(\n", " dataset,\n", " videos_per_gpu=1,\n", " workers_per_gpu=cfg.data.workers_per_gpu,\n", " dist=False,\n", " shuffle=False)\n", "model = MMDataParallel(model, device_ids=[0])\n", "outputs = single_gpu_test(model, data_loader)\n", "\n", "eval_config = cfg.evaluation\n", "eval_config.pop('interval')\n", "eval_res = dataset.evaluate(outputs, **eval_config)\n", "for name, val in eval_res.items():\n", " print(f'{name}: {val:.04f}')" ] }, { "cell_type": "markdown", "metadata": { "id": "jZ4t44nWmZDM" }, "source": [ "## Perform Spatio-Temporal Detection\n", "Here we first install MMDetection." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "w1p0_g76nHOQ", "outputId": "b30a6be3-c457-452e-c789-7083117c5011" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/content\n", "Cloning into 'mmdetection'...\n", "remote: Enumerating objects: 23137, done.\u001B[K\n", "remote: Total 23137 (delta 0), reused 0 (delta 0), pack-reused 23137\u001B[K\n", "Receiving objects: 100% (23137/23137), 25.88 MiB | 25.75 MiB/s, done.\n", "Resolving deltas: 100% (16198/16198), done.\n", "/content/mmdetection\n", "Obtaining file:///content/mmdetection\n", "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from mmdet==2.21.0) (3.2.2)\n", "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from mmdet==2.21.0) (1.21.5)\n", "Requirement already satisfied: pycocotools in /usr/local/lib/python3.7/dist-packages (from mmdet==2.21.0) (2.0.4)\n", "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from mmdet==2.21.0) (1.15.0)\n", "Collecting terminaltables\n", " Downloading terminaltables-3.1.10-py2.py3-none-any.whl (15 kB)\n", "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.21.0) (0.11.0)\n", "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.21.0) (2.8.2)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.21.0) (1.3.2)\n", "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.21.0) (3.0.7)\n", "Installing collected packages: terminaltables, mmdet\n", " Running setup.py develop for mmdet\n", "Successfully installed mmdet-2.21.0 terminaltables-3.1.10\n", "/content/mmaction2\n" ] } ], "source": [ "# Git clone 
mmdetection repo\n", "%cd ..\n", "!git clone https://github.com/open-mmlab/mmdetection.git\n", "%cd mmdetection\n", "\n", "# install mmdet\n", "!pip install -e .\n", "%cd ../mmaction2" ] }, { "cell_type": "markdown", "metadata": { "id": "vlOQsH8OnVKn" }, "source": [ "Download a video to `demo` directory in MMAction2." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "QaW3jg5Enish", "outputId": "c70cde3a-b337-41d0-cb08-82dfc746d9ef" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2022-02-19 11:02:59-- https://download.openmmlab.com/mmaction/dataset/sample/1j20qq1JyX4.mp4\n", "Resolving download.openmmlab.com (download.openmmlab.com)... 47.254.186.233\n", "Connecting to download.openmmlab.com (download.openmmlab.com)|47.254.186.233|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 4864186 (4.6M) [video/mp4]\n", "Saving to: ‘demo/1j20qq1JyX4.mp4’\n", "\n", "demo/1j20qq1JyX4.mp 100%[===================>] 4.64M 3.78MB/s in 1.2s \n", "\n", "2022-02-19 11:03:01 (3.78 MB/s) - ‘demo/1j20qq1JyX4.mp4’ saved [4864186/4864186]\n", "\n" ] } ], "source": [ "!wget https://download.openmmlab.com/mmaction/dataset/sample/1j20qq1JyX4.mp4 -O demo/1j20qq1JyX4.mp4" ] }, { "cell_type": "markdown", "metadata": { "id": "LYGxdu8Vnoah" }, "source": [ "Run spatio-temporal demo." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "LPLiaHaYnrb7", "outputId": "8a8f8a16-ad7b-4559-c19c-c8264533bff3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imageio: 'ffmpeg-linux64-v3.3.1' was not found on your computer; downloading it now.\n", "Try 1. 
Download from https://github.com/imageio/imageio-binaries/raw/master/ffmpeg/ffmpeg-linux64-v3.3.1 (43.8 MB)\n", "Downloading: 8192/45929032 bytes (0.0%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b3883008/45929032 bytes (8.5%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b7995392/45929032 bytes (17.4%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b11796480/45929032 bytes (25.7%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b16072704/45929032 bytes (35.0%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b20152320/45929032 bytes (43.9%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b24305664/45929032 bytes (52.9%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b28319744/45929032 bytes (61.7%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b32440320/45929032 bytes (70.6%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b36634624/45929032 bytes (79.8%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b40886272/45929032 bytes (89.0%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b45146112/45929032 bytes (98.3%)\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b45929032/45929032 bytes (100.0%)\n", " Done\n", "File saved as /root/.imageio/ffmpeg/ffmpeg-linux64-v3.3.1.\n", "load checkpoint from http path: http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth\n", "Downloading: \"http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth\" to /root/.cache/torch/hub/checkpoints/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth\n", "100% 160M/160M [00:21<00:00, 7.77MB/s]\n", "Performing Human Detection for each frame\n", "[>>] 217/217, 8.6 task/s, elapsed: 25s, ETA: 0sload checkpoint from http path: 
https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth\n", "Downloading: \"https://download.openmmlab.com/mmaction/detection/ava/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth\" to /root/.cache/torch/hub/checkpoints/slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb_20201217-16378594.pth\n", "100% 228M/228M [00:31<00:00, 7.55MB/s]\n", "Performing SpatioTemporal Action Detection for each clip\n", "[> ] 167/217, 7.7 task/s, elapsed: 22s, ETA: 7sPerforming visualization\n", "[MoviePy] >>>> Building video demo/stdet_demo.mp4\n", "[MoviePy] Writing video demo/stdet_demo.mp4\n", "100% 434/434 [00:12<00:00, 36.07it/s]\n", "[MoviePy] Done.\n", "[MoviePy] >>>> Video ready: demo/stdet_demo.mp4 \n", "\n" ] } ], "source": [ "!python demo/demo_spatiotemporal_det.py --video demo/1j20qq1JyX4.mp4" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 341 }, "id": "-0atQCzBo9-C", "outputId": "b6bb3a67-669c-45d0-cdf4-25b6210362d0" }, "outputs": [ { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check the video\n", "from IPython.display import HTML\n", "from base64 import b64encode\n", "mp4 = open('demo/stdet_demo.mp4','rb').read()\n", "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n", "HTML(\"\"\"\n", "\n", "\"\"\" % data_url)" ] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "MMAction2 Tutorial.ipynb", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, 
"file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 0 }