Commit 37437e80 authored by sunxx1

Merge branch 'sun_22.10' into 'main'

Sun 22.10

See merge request dcutoolkit/deeplearing/dlexamples_new!54
parents 8442f072 701c0060
ARG PYTORCH="1.6.0"
ARG CUDA="10.1"
ARG CUDNN="7"
FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel
ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0+PTX"
ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all"
ENV CMAKE_PREFIX_PATH="$(dirname $(which conda))/../"
RUN apt-get update && apt-get install -y git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 ffmpeg \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Install mmcv-full
RUN pip install --no-cache-dir mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.6.0/index.html
# Install MMAction2
RUN conda clean --all
RUN git clone https://github.com/open-mmlab/mmaction2.git /mmaction2
WORKDIR /mmaction2
RUN mkdir -p /mmaction2/data
ENV FORCE_CUDA="1"
RUN pip install cython --no-cache-dir
RUN pip install --no-cache-dir -e .
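For reference, a minimal build-and-run sketch for the image above; the `docker/Dockerfile` path, the `mmaction2` tag, and the mounted data directory are assumptions rather than part of the original files.

```shell
# Build the image defined above (path and tag are hypothetical).
docker build -f docker/Dockerfile -t mmaction2 .
# Start a container with GPU access and mount a local dataset into /mmaction2/data.
docker run --gpus all --shm-size=8g -it -v $(pwd)/data:/mmaction2/data mmaction2
```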
ARG PYTORCH="1.9.0"
ARG CUDA="10.2"
ARG CUDNN="7"
FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel
ARG MMCV="1.3.8"
ARG MMACTION="0.24.0"
ENV PYTHONUNBUFFERED TRUE
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
ca-certificates \
g++ \
openjdk-11-jre-headless \
# MMAction2 requirements
ffmpeg git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 \
libsndfile1 libturbojpeg \
&& rm -rf /var/lib/apt/lists/*
ENV PATH="/opt/conda/bin:$PATH"
ENV FORCE_CUDA="1"
# TORCHSERVE
RUN pip install torchserve torch-model-archiver
# MMLAB
ARG PYTORCH
ARG CUDA
RUN ["/bin/bash", "-c", "pip install mmcv-full==${MMCV} -f https://download.openmmlab.com/mmcv/dist/cu${CUDA//./}/torch${PYTORCH}/index.html"]
# RUN pip install mmaction2==${MMACTION}
RUN pip install git+https://github.com/open-mmlab/mmaction2.git
RUN useradd -m model-server \
&& mkdir -p /home/model-server/tmp
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh \
&& chown -R model-server /home/model-server
COPY config.properties /home/model-server/config.properties
RUN mkdir /home/model-server/model-store && chown -R model-server /home/model-server/model-store
EXPOSE 8080 8081 8082
USER model-server
WORKDIR /home/model-server
ENV TEMP=/home/model-server/tmp
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["serve"]
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
model_store=/home/model-server/model-store
load_models=all
#!/bin/bash
set -e
if [[ "$1" = "serve" ]]; then
shift 1
torchserve --start --ts-config /home/model-server/config.properties
else
eval "$@"
fi
# prevent docker exit
tail -f /dev/null
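A hedged sketch of how the TorchServe image above might be built and started; the `docker/serve/` context path, the `mmaction2-serve` tag, and the local `model-store` directory are assumptions.

```shell
# Build the serving image (context path and tag are hypothetical).
docker build -f docker/serve/Dockerfile -t mmaction2-serve docker/serve/
# Run it, publishing the inference/management/metrics ports from config.properties
# and mounting a local directory of .mar archives as the model store.
docker run --rm --gpus all \
    -p 8080:8080 -p 8081:8081 -p 8082:8082 \
    -v $(pwd)/model-store:/home/model-server/model-store \
    mmaction2-serve
```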
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
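As a usage sketch, assuming `sphinx-build` and the documentation requirements are already installed, the docs are built through the catch-all target in Sphinx "make mode":

```shell
# Build the HTML docs into _build/html via the catch-all target above.
make html
# Extra Sphinx options can be passed through the O shortcut, e.g. fail on warnings.
make html O="-W"
```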
.header-logo {
background-image: url("../images/mmaction2.png");
background-size: 130px 40px;
height: 40px;
width: 130px;
}
mmaction.apis
-------------
.. automodule:: mmaction.apis
:members:
mmaction.core
-------------
optimizer
^^^^^^^^^
.. automodule:: mmaction.core.optimizer
:members:
evaluation
^^^^^^^^^^
.. automodule:: mmaction.core.evaluation
:members:
scheduler
^^^^^^^^^
.. automodule:: mmaction.core.scheduler
:members:
mmaction.localization
---------------------
localization
^^^^^^^^^^^^
.. automodule:: mmaction.localization
:members:
mmaction.models
---------------
models
^^^^^^
.. automodule:: mmaction.models
:members:
recognizers
^^^^^^^^^^^
.. automodule:: mmaction.models.recognizers
:members:
localizers
^^^^^^^^^^
.. automodule:: mmaction.models.localizers
:members:
common
^^^^^^
.. automodule:: mmaction.models.common
:members:
backbones
^^^^^^^^^
.. automodule:: mmaction.models.backbones
:members:
heads
^^^^^
.. automodule:: mmaction.models.heads
:members:
necks
^^^^^
.. automodule:: mmaction.models.necks
:members:
losses
^^^^^^
.. automodule:: mmaction.models.losses
:members:
mmaction.datasets
-----------------
datasets
^^^^^^^^
.. automodule:: mmaction.datasets
:members:
pipelines
^^^^^^^^^
.. automodule:: mmaction.datasets.pipelines
:members:
samplers
^^^^^^^^
.. automodule:: mmaction.datasets.samplers
:members:
mmaction.utils
--------------
.. automodule:: mmaction.utils
:members:
# Benchmark
We compare our results with some popular frameworks and official releases in terms of speed.
## Settings
### Hardware
- 8 NVIDIA Tesla V100 (32G) GPUs
- Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz
### Software Environment
- Python 3.7
- PyTorch 1.4
- CUDA 10.1
- CUDNN 7.6.03
- NCCL 2.4.08
### Metrics
The time we measured is the average training time per iteration, including data processing and model training.
Training speed is measured in s/iter; the lower, the better. Note that we skip the first 50 iterations, as they may include device warm-up time.
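As a rough illustration of how such an average can be extracted, the sketch below assumes an MMCV-style text log in which each training iteration prints a `, time: <seconds>` field; the log file name and the 50-iteration warm-up skip are placeholders to adjust.

```shell
# Average the per-iteration 'time' field from a training log, skipping the
# first 50 iterations as warm-up (log name and field format are assumptions).
grep -o ', time: [0-9.]*' train.log \
    | awk -F': ' 'NR > 50 { sum += $2; n++ } END { if (n) printf "%.3f s/iter\n", sum / n }'
```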
### Comparison Rules
Here we compare MMAction2 with other video understanding toolboxes under the same data and model settings, measured by the training time per iteration. We use
- commit id [7f3490d](https://github.com/open-mmlab/mmaction/tree/7f3490d3db6a67fe7b87bfef238b757403b670e3) (1/5/2020) of MMAction
- commit id [8d53d6f](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd) (5/5/2020) of Temporal-Shift-Module
- commit id [8299c98](https://github.com/facebookresearch/SlowFast/tree/8299c9862f83a067fa7114ce98120ae1568a83ec) (7/7/2020) of PySlowFast
- commit id [f13707f](https://github.com/wzmsltw/BSN-boundary-sensitive-network/tree/f13707fbc362486e93178c39f9c4d398afe2cb2f) (12/12/2018) of BSN (boundary sensitive network)
- commit id [45d0514](https://github.com/JJBOY/BMN-Boundary-Matching-Network/tree/45d05146822b85ca672b65f3d030509583d0135a) (17/10/2019) of BMN (boundary matching network)
To ensure a fair comparison, all experiments were conducted on the same hardware with the same dataset. The rawframe dataset we used is generated by the [data preparation tools](/tools/data/kinetics/README.md); the video dataset we used is a special resized video cache called '256p dense-encoded video', which decodes faster and is generated by the script [here](/tools/data/resize_videos.py). As shown in the table below, it yields a significant improvement over normal 256p videos, especially when the sampling is sparse (as in [TSN](/configs/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb.py)).
For each model setting, we keep the same data preprocessing methods to ensure the same feature input.
In addition, we use Memcached, a distributed in-memory caching system, to load the data so that IO time is comparable, except for the comparisons with PySlowFast, which reads raw videos directly from disk by default.
We provide the training logs from which the average iteration time is calculated, with the actual settings recorded inside; feel free to verify them and file an issue if something does not look right.
## Main Results
### Recognizers
| Model | input | io backend | batch size x gpus | MMAction2 (s/iter) | GPU mem(GB) | MMAction (s/iter) | GPU mem(GB) | Temporal-Shift-Module (s/iter) | GPU mem(GB) | PySlowFast (s/iter) | GPU mem(GB) |
| :------------------------------------------------------------------------------------------ | :----------------------: | :--------: | :---------------: | :-------------------------------------------------------------------------------------------------------------------------: | :---------: | :------------------------------------------------------------------------------------------------------------------: | :---------: | :-------------------------------------------------------------------------------------------------------------------------------: | :---------: | :--------------------------------------------------------------------------------------------------------------------: | :---------: |
| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 256p rawframes | Memcached | 32x8 | **[0.32](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsn_256p_rawframes_memcahed_32x8.zip)** | 8.1 | [0.38](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction/tsn_256p_rawframes_memcached_32x8.zip) | 8.1 | [0.42](https://download.openmmlab.com/mmaction/benchmark/recognition/temporal_shift_module/tsn_256p_rawframes_memcached_32x8.zip) | 10.5 | x | x |
| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 256p videos | Disk | 32x8 | **[1.42](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsn_256p_videos_disk_32x8.zip)** | 8.1 | x | x | x | x | TODO | TODO |
| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 32x8 | **[0.61](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsn_256p_fast_videos_disk_32x8.zip)** | 8.1 | x | x | x | x | TODO | TODO |
| [I3D heavy](/configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.34](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_heavy_256p_videos_disk_8x8.zip)** | 4.6 | x | x | x | x | [0.44](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_i3d_r50_8x8_video.log) | 4.6 |
| [I3D heavy](/configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.35](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_heavy_256p_fast_videos_disk_8x8.zip)** | 4.6 | x | x | x | x | [0.36](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_i3d_r50_8x8_fast_video.log) | 4.6 |
| [I3D](/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py) | 256p rawframes | Memcached | 8x8 | **[0.43](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_256p_rawframes_memcahed_8x8.zip)** | 5.0 | [0.56](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction/i3d_256p_rawframes_memcached_8x8.zip) | 5.0 | x | x | x | x |
| [TSM](/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py) | 256p rawframes | Memcached | 8x8 | **[0.31](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsm_256p_rawframes_memcahed_8x8.zip)** | 6.9 | x | x | [0.41](https://download.openmmlab.com/mmaction/benchmark/recognition/temporal_shift_module/tsm_256p_rawframes_memcached_8x8.zip) | 9.1 | x | x |
| [Slowonly](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.32](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowonly_256p_videos_disk_8x8.zip)** | 3.1 | TODO | TODO | x | x | [0.34](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowonly_r50_4x16_video.log) | 3.4 |
| [Slowonly](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.25](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowonly_256p_fast_videos_disk_8x8.zip)** | 3.1 | TODO | TODO | x | x | [0.28](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowonly_r50_4x16_fast_video.log) | 3.4 |
| [Slowfast](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.69](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowfast_256p_videos_disk_8x8.zip)** | 6.1 | x | x | x | x | [1.04](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowfast_r50_4x16_video.log) | 7.0 |
| [Slowfast](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.68](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowfast_256p_fast_videos_disk_8x8.zip)** | 6.1 | x | x | x | x | [0.96](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowfast_r50_4x16_fast_video.log) | 7.0 |
| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.45](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/r2plus1d_256p_videos_disk_8x8.zip)** | 5.1 | x | x | x | x | x | x |
| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.44](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/r2plus1d_256p_fast_videos_disk_8x8.zip)** | 5.1 | x | x | x | x | x | x |
### Localizers
| Model | MMAction2 (s/iter) | BSN(boundary sensitive network) (s/iter) | BMN(boundary matching network) (s/iter) |
| :------------------------------------------------------------------------------------------------------------------ | :-----------------------: | :--------------------------------------: | :-------------------------------------: |
| BSN ([TEM + PEM + PGM](/configs/localization/bsn)) | **0.074(TEM)+0.040(PEM)** | 0.101(TEM)+0.040(PEM) | x |
| BMN ([bmn_400x100_2x8_9e_activitynet_feature](/configs/localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py)) | **3.27** | x | 3.30 |
## Details of Comparison
### TSN
- **MMAction2**
```shell
# rawframes
bash tools/slurm_train.sh ${PARTITION_NAME} benchmark_tsn configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_tsn_rawframes
# videos
bash tools/slurm_train.sh ${PARTITION_NAME} benchmark_tsn configs/recognition/tsn/tsn_r50_video_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_tsn_video
```
- **MMAction**
```shell
python -u tools/train_recognizer.py configs/TSN/tsn_kinetics400_2d_rgb_r50_seg3_f1s1.py
```
- **Temporal-Shift-Module**
```shell
python main.py kinetics RGB --arch resnet50 --num_segments 3 --gd 20 --lr 0.02 --wd 1e-4 --lr_steps 20 40 --epochs 1 --batch-size 256 -j 32 --dropout 0.5 --consensus_type=avg --eval-freq=10 --npb --print-freq 1
```
### I3D
- **MMAction2**
```shell
# rawframes
bash tools/slurm_train.sh ${PARTITION_NAME} benchmark_i3d configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_i3d_rawframes
# videos
bash tools/slurm_train.sh ${PARTITION_NAME} benchmark_i3d configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py --work-dir work_dirs/benchmark_i3d_video
```
- **MMAction**
```shell
python -u tools/train_recognizer.py configs/I3D_RGB/i3d_kinetics400_3d_rgb_r50_c3d_inflate3x1x1_seg1_f32s2.py
```
- **PySlowFast**
```shell
python tools/run_net.py --cfg configs/Kinetics/I3D_8x8_R50.yaml DATA.PATH_TO_DATA_DIR ${DATA_ROOT} NUM_GPUS 8 TRAIN.BATCH_SIZE 64 TRAIN.AUTO_RESUME False LOG_PERIOD 1 SOLVER.MAX_EPOCH 1 > pysf_i3d_r50_8x8_video.log
```
You may reproduce the result by writing a simple script to parse out the value of the field 'time_diff'.
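For example, a minimal parsing sketch (the exact log layout is an assumption; adjust the pattern and the warm-up skip to match your actual log):

```shell
# Average the 'time_diff' values printed by PySlowFast, skipping warm-up iterations.
grep -o '"time_diff": [0-9.]*' pysf_i3d_r50_8x8_video.log \
    | awk -F': ' 'NR > 50 { sum += $2; n++ } END { if (n) printf "%.3f s/iter\n", sum / n }'
```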
### SlowFast
- **MMAction2**
```shell
bash tools/slurm_train.sh ${PARTITION_NAME} benchmark_slowfast configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py --work-dir work_dirs/benchmark_slowfast_video
```
- **PySlowFast**
```shell
python tools/run_net.py --cfg configs/Kinetics/SLOWFAST_4x16_R50.yaml DATA.PATH_TO_DATA_DIR ${DATA_ROOT} NUM_GPUS 8 TRAIN.BATCH_SIZE 64 TRAIN.AUTO_RESUME False LOG_PERIOD 1 SOLVER.MAX_EPOCH 1 > pysf_slowfast_r50_4x16_video.log
```
You may reproduce the result by writing a simple script to parse out the value of the field 'time_diff'.
### SlowOnly
- **MMAction2**
```shell
bash tools/slurm_train.sh ${PARTITION_NAME} benchmark_slowonly configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py --work-dir work_dirs/benchmark_slowonly_video
```
- **PySlowFast**
```shell
python tools/run_net.py --cfg configs/Kinetics/SLOW_4x16_R50.yaml DATA.PATH_TO_DATA_DIR ${DATA_ROOT} NUM_GPUS 8 TRAIN.BATCH_SIZE 64 TRAIN.AUTO_RESUME False LOG_PERIOD 1 SOLVER.MAX_EPOCH 1 > pysf_slowonly_r50_4x16_video.log
```
You may reproduce the result by writing a simple script to parse out the value of the field 'time_diff'.
### R2plus1D
- **MMAction2**
```shell
bash tools/slurm_train.sh ${PARTITION_NAME} benchmark_r2plus1d configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py --work-dir work_dirs/benchmark_r2plus1d_video
```
## Changelog
### 0.24.0 (05/05/2022)
**Highlights**
- Support different seeds
**New Features**
- Add lateral norm in multigrid config ([#1567](https://github.com/open-mmlab/mmaction2/pull/1567))
- Add openpose 25 joints in graph config ([#1578](https://github.com/open-mmlab/mmaction2/pull/1578))
- Support MLU Backend ([#1608](https://github.com/open-mmlab/mmaction2/pull/1608))
**Bug and Typo Fixes**
- Fix local_rank ([#1558](https://github.com/open-mmlab/mmaction2/pull/1558))
- Fix install typo ([#1571](https://github.com/open-mmlab/mmaction2/pull/1571))
- Fix the inference API doc ([#1580](https://github.com/open-mmlab/mmaction2/pull/1580))
- Fix zh-CN demo.md and getting_started.md ([#1587](https://github.com/open-mmlab/mmaction2/pull/1587))
- Remove Recommonmark ([#1595](https://github.com/open-mmlab/mmaction2/pull/1595))
- Fix inference with ndarray ([#1603](https://github.com/open-mmlab/mmaction2/pull/1603))
- Fix the log error when `IterBasedRunner` is used ([#1606](https://github.com/open-mmlab/mmaction2/pull/1606))
### 0.23.0 (04/01/2022)
**Highlights**
- Support different seeds
- Provide multi-node training & testing script
- Update error log
**New Features**
- Support different seeds([#1502](https://github.com/open-mmlab/mmaction2/pull/1502))
- Provide multi-node training & testing script([#1521](https://github.com/open-mmlab/mmaction2/pull/1521))
- Update error log([#1546](https://github.com/open-mmlab/mmaction2/pull/1546))
**Documentations**
- Update gpus in Slowfast readme([#1497](https://github.com/open-mmlab/mmaction2/pull/1497))
- Fix work_dir in multigrid config([#1498](https://github.com/open-mmlab/mmaction2/pull/1498))
- Add sub bn docs([#1503](https://github.com/open-mmlab/mmaction2/pull/1503))
- Add shortcycle sampler docs([#1513](https://github.com/open-mmlab/mmaction2/pull/1513))
- Update Windows Declaration([#1520](https://github.com/open-mmlab/mmaction2/pull/1520))
- Update the link for ST-GCN([#1544](https://github.com/open-mmlab/mmaction2/pull/1544))
- Update install commands([#1549](https://github.com/open-mmlab/mmaction2/pull/1549))
**Bug and Typo Fixes**
- Update colab tutorial install cmds([#1522](https://github.com/open-mmlab/mmaction2/pull/1522))
- Fix num_iters_per_epoch in analyze_logs.py([#1530](https://github.com/open-mmlab/mmaction2/pull/1530))
- Fix distributed_sampler([#1532](https://github.com/open-mmlab/mmaction2/pull/1532))
- Fix cd dir error([#1545](https://github.com/open-mmlab/mmaction2/pull/1545))
- Update arg names([#1548](https://github.com/open-mmlab/mmaction2/pull/1548))
**ModelZoo**
### 0.22.0 (03/05/2022)
**Highlights**
- Support Multigrid training strategy
- Support CPU training
- Support audio demo
- Support topk customizing in models/heads/base.py
**New Features**
- Support Multigrid training strategy([#1378](https://github.com/open-mmlab/mmaction2/pull/1378))
- Support STGCN in demo_skeleton.py([#1391](https://github.com/open-mmlab/mmaction2/pull/1391))
- Support CPU training([#1407](https://github.com/open-mmlab/mmaction2/pull/1407))
- Support audio demo([#1425](https://github.com/open-mmlab/mmaction2/pull/1425))
- Support topk customizing in models/heads/base.py([#1452](https://github.com/open-mmlab/mmaction2/pull/1452))
**Documentations**
- Add OpenMMLab platform([#1393](https://github.com/open-mmlab/mmaction2/pull/1393))
- Update links([#1394](https://github.com/open-mmlab/mmaction2/pull/1394))
- Update readme in configs([#1404](https://github.com/open-mmlab/mmaction2/pull/1404))
- Update instructions to install mmcv-full([#1426](https://github.com/open-mmlab/mmaction2/pull/1426))
- Add shortcut([#1433](https://github.com/open-mmlab/mmaction2/pull/1433))
- Update modelzoo([#1439](https://github.com/open-mmlab/mmaction2/pull/1439))
- Add video_structuralize in readme([#1455](https://github.com/open-mmlab/mmaction2/pull/1455))
- Update OpenMMLab repo information([#1482](https://github.com/open-mmlab/mmaction2/pull/1482))
**Bug and Typo Fixes**
- Update train.py([#1375](https://github.com/open-mmlab/mmaction2/pull/1375))
- Fix printout bug([#1382](https://github.com/open-mmlab/mmaction2/pull/1382))
- Update multi processing setting([#1395](https://github.com/open-mmlab/mmaction2/pull/1395))
- Setup multi processing both in train and test([#1405](https://github.com/open-mmlab/mmaction2/pull/1405))
- Fix bug in nondistributed multi-gpu training([#1406](https://github.com/open-mmlab/mmaction2/pull/1406))
- Add variable fps in ava_dataset.py([#1409](https://github.com/open-mmlab/mmaction2/pull/1409))
- Only support distributed training([#1414](https://github.com/open-mmlab/mmaction2/pull/1414))
- Set test_mode for AVA configs([#1432](https://github.com/open-mmlab/mmaction2/pull/1432))
- Support single label([#1434](https://github.com/open-mmlab/mmaction2/pull/1434))
- Add check copyright([#1447](https://github.com/open-mmlab/mmaction2/pull/1447))
- Support Windows CI([#1448](https://github.com/open-mmlab/mmaction2/pull/1448))
- Fix wrong device of class_weight in models/losses/cross_entropy_loss.py([#1457](https://github.com/open-mmlab/mmaction2/pull/1457))
- Fix bug caused by distributed([#1459](https://github.com/open-mmlab/mmaction2/pull/1459))
- Update readme([#1460](https://github.com/open-mmlab/mmaction2/pull/1460))
- Fix lint caused by colab automatic upload([#1461](https://github.com/open-mmlab/mmaction2/pull/1461))
- Refine CI([#1471](https://github.com/open-mmlab/mmaction2/pull/1471))
- Update pre-commit([#1474](https://github.com/open-mmlab/mmaction2/pull/1474))
- Add deprecation message for deploy tool([#1483](https://github.com/open-mmlab/mmaction2/pull/1483))
**ModelZoo**
- Support slowfast_steplr([#1421](https://github.com/open-mmlab/mmaction2/pull/1421))
### 0.21.0 (31/12/2021)
**Highlights**
- Support 2s-AGCN
- Support publish models in Windows
- Improve some sthv1 related models
- Support BABEL
**New Features**
- Support 2s-AGCN([#1248](https://github.com/open-mmlab/mmaction2/pull/1248))
- Support skip postproc in ntu_pose_extraction([#1295](https://github.com/open-mmlab/mmaction2/pull/1295))
- Support publish models in Windows([#1325](https://github.com/open-mmlab/mmaction2/pull/1325))
- Add copyright checkhook in pre-commit-config([#1344](https://github.com/open-mmlab/mmaction2/pull/1344))
**Documentations**
- Add MMFlow ([#1273](https://github.com/open-mmlab/mmaction2/pull/1273))
- Revise README.md and add projects.md ([#1286](https://github.com/open-mmlab/mmaction2/pull/1286))
- Add 2s-AGCN in Updates([#1289](https://github.com/open-mmlab/mmaction2/pull/1289))
- Add MMFewShot([#1300](https://github.com/open-mmlab/mmaction2/pull/1300))
- Add MMHuman3d([#1304](https://github.com/open-mmlab/mmaction2/pull/1304))
- Update pre-commit([#1313](https://github.com/open-mmlab/mmaction2/pull/1313))
- Use share menu from the theme instead([#1328](https://github.com/open-mmlab/mmaction2/pull/1328))
- Update installation command([#1340](https://github.com/open-mmlab/mmaction2/pull/1340))
**Bug and Typo Fixes**
- Update the inference part in notebooks([#1256](https://github.com/open-mmlab/mmaction2/pull/1256))
- Update the map_location([#1262](https://github.com/open-mmlab/mmaction2/pull/1262))
- Fix bug that start_index is not used in RawFrameDecode([#1278](https://github.com/open-mmlab/mmaction2/pull/1278))
- Fix bug in init_random_seed([#1282](https://github.com/open-mmlab/mmaction2/pull/1282))
- Fix bug in setup.py([#1303](https://github.com/open-mmlab/mmaction2/pull/1303))
- Fix interrogate error in workflows([#1305](https://github.com/open-mmlab/mmaction2/pull/1305))
- Fix typo in slowfast config([#1309](https://github.com/open-mmlab/mmaction2/pull/1309))
- Cancel previous runs that are not completed([#1327](https://github.com/open-mmlab/mmaction2/pull/1327))
- Fix missing skip_postproc parameter([#1347](https://github.com/open-mmlab/mmaction2/pull/1347))
- Update ssn.py([#1355](https://github.com/open-mmlab/mmaction2/pull/1355))
- Use latest youtube-dl([#1357](https://github.com/open-mmlab/mmaction2/pull/1357))
- Fix test-best([#1362](https://github.com/open-mmlab/mmaction2/pull/1362))
**ModelZoo**
- Improve some sthv1 related models([#1306](https://github.com/open-mmlab/mmaction2/pull/1306))
- Support BABEL([#1332](https://github.com/open-mmlab/mmaction2/pull/1332))
### 0.20.0 (07/10/2021)
**Highlights**
- Support TorchServe
- Add video structuralize demo
- Support using 3D skeletons for skeleton-based action recognition
- Benchmark PoseC3D on UCF and HMDB
**New Features**
- Support TorchServe ([#1212](https://github.com/open-mmlab/mmaction2/pull/1212))
- Support 3D skeletons pre-processing ([#1218](https://github.com/open-mmlab/mmaction2/pull/1218))
- Support video structuralize demo ([#1197](https://github.com/open-mmlab/mmaction2/pull/1197))
**Documentations**
- Revise README.md and add projects.md ([#1214](https://github.com/open-mmlab/mmaction2/pull/1214))
- Add CN docs for Skeleton dataset, PoseC3D and ST-GCN ([#1228](https://github.com/open-mmlab/mmaction2/pull/1228), [#1237](https://github.com/open-mmlab/mmaction2/pull/1237), [#1236](https://github.com/open-mmlab/mmaction2/pull/1236))
- Add tutorial for custom dataset training for skeleton-based action recognition ([#1234](https://github.com/open-mmlab/mmaction2/pull/1234))
**Bug and Typo Fixes**
- Fix tutorial link ([#1219](https://github.com/open-mmlab/mmaction2/pull/1219))
- Fix GYM links ([#1224](https://github.com/open-mmlab/mmaction2/pull/1224))
**ModelZoo**
- Benchmark PoseC3D on UCF and HMDB ([#1223](https://github.com/open-mmlab/mmaction2/pull/1223))
- Add ST-GCN + 3D skeleton model for NTU60-XSub ([#1236](https://github.com/open-mmlab/mmaction2/pull/1236))
### 0.19.0 (07/10/2021)
**Highlights**
- Support ST-GCN
- Refactor the inference API
- Add code spell check hook
**New Features**
- Support ST-GCN ([#1123](https://github.com/open-mmlab/mmaction2/pull/1123))
**Improvement**
- Add label maps for every dataset ([#1127](https://github.com/open-mmlab/mmaction2/pull/1127))
- Remove useless code MultiGroupCrop ([#1180](https://github.com/open-mmlab/mmaction2/pull/1180))
- Refactor Inference API ([#1191](https://github.com/open-mmlab/mmaction2/pull/1191))
- Add code spell check hook ([#1208](https://github.com/open-mmlab/mmaction2/pull/1208))
- Use docker in CI ([#1159](https://github.com/open-mmlab/mmaction2/pull/1159))
**Documentations**
- Update metafiles to new OpenMMLAB protocols ([#1134](https://github.com/open-mmlab/mmaction2/pull/1134))
- Switch to new doc style ([#1160](https://github.com/open-mmlab/mmaction2/pull/1160))
- Improve the ERROR message ([#1203](https://github.com/open-mmlab/mmaction2/pull/1203))
- Fix invalid URL in getting_started ([#1169](https://github.com/open-mmlab/mmaction2/pull/1169))
**Bug and Typo Fixes**
- Compatible with new MMClassification ([#1139](https://github.com/open-mmlab/mmaction2/pull/1139))
- Add missing runtime dependencies ([#1144](https://github.com/open-mmlab/mmaction2/pull/1144))
- Fix THUMOS tag proposals path ([#1156](https://github.com/open-mmlab/mmaction2/pull/1156))
- Fix LoadHVULabel ([#1194](https://github.com/open-mmlab/mmaction2/pull/1194))
- Switch the default value of `persistent_workers` to False ([#1202](https://github.com/open-mmlab/mmaction2/pull/1202))
- Fix `_freeze_stages` for MobileNetV2 ([#1193](https://github.com/open-mmlab/mmaction2/pull/1193))
- Fix resume when building rawframes ([#1150](https://github.com/open-mmlab/mmaction2/pull/1150))
- Fix device bug for class weight ([#1188](https://github.com/open-mmlab/mmaction2/pull/1188))
- Correct Arg names in extract_audio.py ([#1148](https://github.com/open-mmlab/mmaction2/pull/1148))
**ModelZoo**
- Add TSM-MobileNetV2 ported from TSM ([#1163](https://github.com/open-mmlab/mmaction2/pull/1163))
- Add ST-GCN for NTURGB+D-XSub-60 ([#1123](https://github.com/open-mmlab/mmaction2/pull/1123))
### 0.18.0 (02/09/2021)
**Improvement**
- Add CopyRight ([#1099](https://github.com/open-mmlab/mmaction2/pull/1099))
- Support NTU Pose Extraction ([#1076](https://github.com/open-mmlab/mmaction2/pull/1076))
- Support Caching in RawFrameDecode ([#1078](https://github.com/open-mmlab/mmaction2/pull/1078))
- Add citations & Support python3.9 CI & Use fixed-version sphinx ([#1125](https://github.com/open-mmlab/mmaction2/pull/1125))
**Documentations**
- Add Descriptions of PoseC3D dataset ([#1053](https://github.com/open-mmlab/mmaction2/pull/1053))
**Bug and Typo Fixes**
- Fix SSV2 checkpoints ([#1101](https://github.com/open-mmlab/mmaction2/pull/1101))
- Fix CSN normalization ([#1116](https://github.com/open-mmlab/mmaction2/pull/1116))
- Fix typo ([#1121](https://github.com/open-mmlab/mmaction2/pull/1121))
- Fix new_crop_quadruple bug ([#1108](https://github.com/open-mmlab/mmaction2/pull/1108))
### 0.17.0 (03/08/2021)
**Highlights**
- Support PyTorch 1.9
- Support Pytorchvideo Transforms
- Support PreciseBN
**New Features**
- Support Pytorchvideo Transforms ([#1008](https://github.com/open-mmlab/mmaction2/pull/1008))
- Support PreciseBN ([#1038](https://github.com/open-mmlab/mmaction2/pull/1038))
**Improvements**
- Remove redundant augmentations in config files ([#996](https://github.com/open-mmlab/mmaction2/pull/996))
- Make resource directory to hold common resource pictures ([#1011](https://github.com/open-mmlab/mmaction2/pull/1011))
- Remove deprecated FrameSelector ([#1010](https://github.com/open-mmlab/mmaction2/pull/1010))
- Support Concat Dataset ([#1000](https://github.com/open-mmlab/mmaction2/pull/1000))
- Add `to-mp4` option to resize_videos.py ([#1021](https://github.com/open-mmlab/mmaction2/pull/1021))
- Add option to keep tail frames ([#1050](https://github.com/open-mmlab/mmaction2/pull/1050))
- Update MIM support ([#1061](https://github.com/open-mmlab/mmaction2/pull/1061))
- Calculate Top-K accurate and inaccurate classes ([#1047](https://github.com/open-mmlab/mmaction2/pull/1047))
**Bug and Typo Fixes**
- Fix bug in PoseC3D demo ([#1009](https://github.com/open-mmlab/mmaction2/pull/1009))
- Fix some problems in resize_videos.py ([#1012](https://github.com/open-mmlab/mmaction2/pull/1012))
- Support torch1.9 ([#1015](https://github.com/open-mmlab/mmaction2/pull/1015))
- Remove redundant code in CI ([#1046](https://github.com/open-mmlab/mmaction2/pull/1046))
- Fix bug about persistent_workers ([#1044](https://github.com/open-mmlab/mmaction2/pull/1044))
- Support TimeSformer feature extraction ([#1035](https://github.com/open-mmlab/mmaction2/pull/1035))
- Fix ColorJitter ([#1025](https://github.com/open-mmlab/mmaction2/pull/1025))
**ModelZoo**
- Add TSM-R50 sthv1 models trained by PytorchVideo RandAugment and AugMix ([#1008](https://github.com/open-mmlab/mmaction2/pull/1008))
- Update SlowOnly SthV1 checkpoints ([#1034](https://github.com/open-mmlab/mmaction2/pull/1034))
- Add SlowOnly Kinetics400 checkpoints trained with Precise-BN ([#1038](https://github.com/open-mmlab/mmaction2/pull/1038))
- Add CSN-R50 from scratch checkpoints ([#1045](https://github.com/open-mmlab/mmaction2/pull/1045))
- TPN Kinetics-400 Checkpoints trained with the new ColorJitter ([#1025](https://github.com/open-mmlab/mmaction2/pull/1025))
**Documentation**
- Add Chinese translation of feature_extraction.md ([#1020](https://github.com/open-mmlab/mmaction2/pull/1020))
- Fix the code snippet in getting_started.md ([#1023](https://github.com/open-mmlab/mmaction2/pull/1023))
- Fix TANet config table ([#1028](https://github.com/open-mmlab/mmaction2/pull/1028))
- Add description to PoseC3D dataset ([#1053](https://github.com/open-mmlab/mmaction2/pull/1053))
### 0.16.0 (01/07/2021)
**Highlights**
- Support using backbone from pytorch-image-models(timm)
- Support PIMS Decoder
- Demo for skeleton-based action recognition
- Support Timesformer
**New Features**
- Support using backbones from pytorch-image-models(timm) for TSN ([#880](https://github.com/open-mmlab/mmaction2/pull/880))
- Support torchvision transformations in preprocessing pipelines ([#972](https://github.com/open-mmlab/mmaction2/pull/972))
- Demo for skeleton-based action recognition ([#972](https://github.com/open-mmlab/mmaction2/pull/972))
- Support Timesformer ([#839](https://github.com/open-mmlab/mmaction2/pull/839))
**Improvements**
- Add a tool to find invalid videos ([#907](https://github.com/open-mmlab/mmaction2/pull/907), [#950](https://github.com/open-mmlab/mmaction2/pull/950))
- Add an option to specify spectrogram_type ([#909](https://github.com/open-mmlab/mmaction2/pull/909))
- Add json output to video demo ([#906](https://github.com/open-mmlab/mmaction2/pull/906))
- Add MIM related docs ([#918](https://github.com/open-mmlab/mmaction2/pull/918))
- Rename lr to scheduler ([#916](https://github.com/open-mmlab/mmaction2/pull/916))
- Support `--cfg-options` for demos ([#911](https://github.com/open-mmlab/mmaction2/pull/911))
- Support number counting for flow-wise filename template ([#922](https://github.com/open-mmlab/mmaction2/pull/922))
- Add Chinese tutorial ([#941](https://github.com/open-mmlab/mmaction2/pull/941))
- Change ResNet3D default values ([#939](https://github.com/open-mmlab/mmaction2/pull/939))
- Adjust script structure ([#935](https://github.com/open-mmlab/mmaction2/pull/935))
- Add font color to args in long_video_demo ([#947](https://github.com/open-mmlab/mmaction2/pull/947))
- Polish code style with Pylint ([#908](https://github.com/open-mmlab/mmaction2/pull/908))
- Support PIMS Decoder ([#946](https://github.com/open-mmlab/mmaction2/pull/946))
- Improve Metafiles ([#956](https://github.com/open-mmlab/mmaction2/pull/956), [#979](https://github.com/open-mmlab/mmaction2/pull/979), [#966](https://github.com/open-mmlab/mmaction2/pull/966))
- Add links to download Kinetics400 validation ([#920](https://github.com/open-mmlab/mmaction2/pull/920))
- Audit the usage of shutil.rmtree ([#943](https://github.com/open-mmlab/mmaction2/pull/943))
- Polish localizer related codes([#913](https://github.com/open-mmlab/mmaction2/pull/913))
**Bug and Typo Fixes**
- Fix spatiotemporal detection demo ([#899](https://github.com/open-mmlab/mmaction2/pull/899))
- Fix docstring for 3D inflate ([#925](https://github.com/open-mmlab/mmaction2/pull/925))
- Fix bug of writing text to video with TextClip ([#952](https://github.com/open-mmlab/mmaction2/pull/952))
- Fix mmcv install in CI ([#977](https://github.com/open-mmlab/mmaction2/pull/977))
**ModelZoo**
- Add TSN with Swin Transformer backbone as an example for using pytorch-image-models(timm) backbones ([#880](https://github.com/open-mmlab/mmaction2/pull/880))
- Port CSN checkpoints from VMZ ([#945](https://github.com/open-mmlab/mmaction2/pull/945))
- Release various checkpoints for UCF101, HMDB51 and Sthv1 ([#938](https://github.com/open-mmlab/mmaction2/pull/938))
- Support Timesformer ([#839](https://github.com/open-mmlab/mmaction2/pull/839))
- Update TSM modelzoo ([#981](https://github.com/open-mmlab/mmaction2/pull/981))
### 0.15.0 (31/05/2021)
**Highlights**
- Support PoseC3D
- Support ACRN
- Support MIM
**New Features**
- Support PoseC3D ([#786](https://github.com/open-mmlab/mmaction2/pull/786), [#890](https://github.com/open-mmlab/mmaction2/pull/890))
- Support MIM ([#870](https://github.com/open-mmlab/mmaction2/pull/870))
- Support ACRN and Focal Loss ([#891](https://github.com/open-mmlab/mmaction2/pull/891))
- Support Jester dataset ([#864](https://github.com/open-mmlab/mmaction2/pull/864))
**Improvements**
- Add `metric_options` for evaluation to docs ([#873](https://github.com/open-mmlab/mmaction2/pull/873))
- Support creating a new label map based on custom classes for the spatio-temporal detection demo ([#879](https://github.com/open-mmlab/mmaction2/pull/879))
- Improve document about AVA dataset preparation ([#878](https://github.com/open-mmlab/mmaction2/pull/878))
- Provide a script to extract clip-level feature ([#856](https://github.com/open-mmlab/mmaction2/pull/856))
**Bug and Typo Fixes**
- Fix issues about resume ([#877](https://github.com/open-mmlab/mmaction2/pull/877), [#878](https://github.com/open-mmlab/mmaction2/pull/878))
- Correct the key name of `eval_results` dictionary for metric 'mmit_mean_average_precision' ([#885](https://github.com/open-mmlab/mmaction2/pull/885))
**ModelZoo**
- Support Jester dataset ([#864](https://github.com/open-mmlab/mmaction2/pull/864))
- Support ACRN and Focal Loss ([#891](https://github.com/open-mmlab/mmaction2/pull/891))
### 0.14.0 (30/04/2021)
**Highlights**
- Support TRN
- Support Diving48
**New Features**
- Support TRN ([#755](https://github.com/open-mmlab/mmaction2/pull/755))
- Support Diving48 ([#835](https://github.com/open-mmlab/mmaction2/pull/835))
- Support Webcam Demo for Spatio-temporal Action Detection Models ([#795](https://github.com/open-mmlab/mmaction2/pull/795))
**Improvements**
- Add softmax option for pytorch2onnx tool ([#781](https://github.com/open-mmlab/mmaction2/pull/781))
- Support TRN ([#755](https://github.com/open-mmlab/mmaction2/pull/755))
- Test with onnx models and TensorRT engines ([#758](https://github.com/open-mmlab/mmaction2/pull/758))
- Speed up AVA Testing ([#784](https://github.com/open-mmlab/mmaction2/pull/784))
- Add `self.with_neck` attribute ([#796](https://github.com/open-mmlab/mmaction2/pull/796))
- Update installation document ([#798](https://github.com/open-mmlab/mmaction2/pull/798))
- Use a random master port ([#809](https://github.com/open-mmlab/mmaction2/pull/809))
- Update AVA processing data document ([#801](https://github.com/open-mmlab/mmaction2/pull/801))
- Refactor spatio-temporal augmentation ([#782](https://github.com/open-mmlab/mmaction2/pull/782))
- Add QR code in CN README ([#812](https://github.com/open-mmlab/mmaction2/pull/812))
- Add Alternative way to download Kinetics ([#817](https://github.com/open-mmlab/mmaction2/pull/817), [#822](https://github.com/open-mmlab/mmaction2/pull/822))
- Refactor Sampler ([#790](https://github.com/open-mmlab/mmaction2/pull/790))
- Use EvalHook in MMCV with backward compatibility ([#793](https://github.com/open-mmlab/mmaction2/pull/793))
- Use MMCV Model Registry ([#843](https://github.com/open-mmlab/mmaction2/pull/843))
**Bug and Typo Fixes**
- Fix a bug in pytorch2onnx.py when `num_classes <= 4` ([#800](https://github.com/open-mmlab/mmaction2/pull/800), [#824](https://github.com/open-mmlab/mmaction2/pull/824))
- Fix `demo_spatiotemporal_det.py` error ([#803](https://github.com/open-mmlab/mmaction2/pull/803), [#805](https://github.com/open-mmlab/mmaction2/pull/805))
- Fix loading config bugs when resume ([#820](https://github.com/open-mmlab/mmaction2/pull/820))
- Make HMDB51 annotation generation more robust ([#811](https://github.com/open-mmlab/mmaction2/pull/811))
**ModelZoo**
- Update checkpoint for 256 height in something-V2 ([#789](https://github.com/open-mmlab/mmaction2/pull/789))
- Support Diving48 ([#835](https://github.com/open-mmlab/mmaction2/pull/835))
### 0.13.0 (31/03/2021)
**Highlights**
- Support LFB
- Support using backbone from MMCls/TorchVision
- Add Chinese documentation
**New Features**
- Support LFB ([#553](https://github.com/open-mmlab/mmaction2/pull/553))
- Support using backbones from MMCls for TSN ([#679](https://github.com/open-mmlab/mmaction2/pull/679))
- Support using backbones from TorchVision for TSN ([#720](https://github.com/open-mmlab/mmaction2/pull/720))
- Support Mixup and Cutmix for recognizers ([#681](https://github.com/open-mmlab/mmaction2/pull/681))
- Support Chinese documentation ([#665](https://github.com/open-mmlab/mmaction2/pull/665), [#680](https://github.com/open-mmlab/mmaction2/pull/680), [#689](https://github.com/open-mmlab/mmaction2/pull/689), [#701](https://github.com/open-mmlab/mmaction2/pull/701), [#702](https://github.com/open-mmlab/mmaction2/pull/702), [#703](https://github.com/open-mmlab/mmaction2/pull/703), [#706](https://github.com/open-mmlab/mmaction2/pull/706), [#716](https://github.com/open-mmlab/mmaction2/pull/716), [#717](https://github.com/open-mmlab/mmaction2/pull/717), [#731](https://github.com/open-mmlab/mmaction2/pull/731), [#733](https://github.com/open-mmlab/mmaction2/pull/733), [#735](https://github.com/open-mmlab/mmaction2/pull/735), [#736](https://github.com/open-mmlab/mmaction2/pull/736), [#737](https://github.com/open-mmlab/mmaction2/pull/737), [#738](https://github.com/open-mmlab/mmaction2/pull/738), [#739](https://github.com/open-mmlab/mmaction2/pull/739), [#740](https://github.com/open-mmlab/mmaction2/pull/740), [#742](https://github.com/open-mmlab/mmaction2/pull/742), [#752](https://github.com/open-mmlab/mmaction2/pull/752), [#759](https://github.com/open-mmlab/mmaction2/pull/759), [#761](https://github.com/open-mmlab/mmaction2/pull/761), [#772](https://github.com/open-mmlab/mmaction2/pull/772), [#775](https://github.com/open-mmlab/mmaction2/pull/775))
**Improvements**
- Add slowfast config/json/log/ckpt for training custom classes of AVA ([#678](https://github.com/open-mmlab/mmaction2/pull/678))
- Set RandAugment as Imgaug default transforms ([#585](https://github.com/open-mmlab/mmaction2/pull/585))
- Add `--test-last` & `--test-best` for `tools/train.py` to test checkpoints after training ([#608](https://github.com/open-mmlab/mmaction2/pull/608))
- Add fcn_testing in TPN ([#684](https://github.com/open-mmlab/mmaction2/pull/684))
- Remove redundant recall functions ([#741](https://github.com/open-mmlab/mmaction2/pull/741))
- Recursively remove pretrained step for testing ([#695](https://github.com/open-mmlab/mmaction2/pull/695))
- Improve demo by limiting inference fps ([#668](https://github.com/open-mmlab/mmaction2/pull/668))
**Bug and Typo Fixes**
- Fix a bug about multi-class in VideoDataset ([#723](https://github.com/open-mmlab/mmaction2/pull/723))
- Reverse key-value in anet filelist generation ([#686](https://github.com/open-mmlab/mmaction2/pull/686))
- Fix flow norm cfg typo ([#693](https://github.com/open-mmlab/mmaction2/pull/693))
**ModelZoo**
- Add LFB for AVA2.1 ([#553](https://github.com/open-mmlab/mmaction2/pull/553))
- Add TSN with ResNeXt-101-32x4d backbone as an example for using MMCls backbones ([#679](https://github.com/open-mmlab/mmaction2/pull/679))
- Add TSN with Densenet161 backbone as an example for using TorchVision backbones ([#720](https://github.com/open-mmlab/mmaction2/pull/720))
- Add slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb ([#690](https://github.com/open-mmlab/mmaction2/pull/690))
- Add slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb ([#704](https://github.com/open-mmlab/mmaction2/pull/704))
- Add slowonly_nl_kinetics_pretrained_r50_4x16x1(8x8x1)\_20e_ava_rgb ([#730](https://github.com/open-mmlab/mmaction2/pull/730))
### 0.12.0 (28/02/2021)
**Highlights**
- Support TSM-MobileNetV2
- Support TANet
- Support GPU Normalize
**New Features**
- Support TSM-MobileNetV2 ([#415](https://github.com/open-mmlab/mmaction2/pull/415))
- Support flip with label mapping ([#591](https://github.com/open-mmlab/mmaction2/pull/591))
- Add seed option for sampler ([#642](https://github.com/open-mmlab/mmaction2/pull/642))
- Support GPU Normalize ([#586](https://github.com/open-mmlab/mmaction2/pull/586))
- Support TANet ([#595](https://github.com/open-mmlab/mmaction2/pull/595))
**Improvements**
- Training custom classes of ava dataset ([#555](https://github.com/open-mmlab/mmaction2/pull/555))
- Add CN README in homepage ([#592](https://github.com/open-mmlab/mmaction2/pull/592), [#594](https://github.com/open-mmlab/mmaction2/pull/594))
- Support soft label for CrossEntropyLoss ([#625](https://github.com/open-mmlab/mmaction2/pull/625))
- Refactor config: Specify `train_cfg` and `test_cfg` in `model` ([#629](https://github.com/open-mmlab/mmaction2/pull/629))
- Provide an alternative way to download older kinetics annotations ([#597](https://github.com/open-mmlab/mmaction2/pull/597))
- Update FAQ for
- 1). data pipeline about video and frames ([#598](https://github.com/open-mmlab/mmaction2/pull/598))
- 2). how to show results ([#598](https://github.com/open-mmlab/mmaction2/pull/598))
- 3). batch size setting for batchnorm ([#657](https://github.com/open-mmlab/mmaction2/pull/657))
- 4). how to fix stages of backbone when finetuning models ([#658](https://github.com/open-mmlab/mmaction2/pull/658))
- Modify default value of `save_best` ([#600](https://github.com/open-mmlab/mmaction2/pull/600))
- Use BibTex rather than latex in markdown ([#607](https://github.com/open-mmlab/mmaction2/pull/607))
- Add warnings of uninstalling mmdet and supplementary documents ([#624](https://github.com/open-mmlab/mmaction2/pull/624))
- Support soft label for CrossEntropyLoss ([#625](https://github.com/open-mmlab/mmaction2/pull/625))
**Bug and Typo Fixes**
- Fix value of `pem_low_temporal_iou_threshold` in BSN ([#556](https://github.com/open-mmlab/mmaction2/pull/556))
- Fix ActivityNet download script ([#601](https://github.com/open-mmlab/mmaction2/pull/601))
**ModelZoo**
- Add TSM-MobileNetV2 for Kinetics400 ([#415](https://github.com/open-mmlab/mmaction2/pull/415))
- Add deeper SlowFast models ([#605](https://github.com/open-mmlab/mmaction2/pull/605))
### 0.11.0 (31/01/2021)
**Highlights**
- Support imgaug
- Support spatial temporal demo
- Refactor EvalHook, config structure, unittest structure
**New Features**
- Support [imgaug](https://imgaug.readthedocs.io/en/latest/index.html) for augmentations in the data pipeline ([#492](https://github.com/open-mmlab/mmaction2/pull/492))
- Support setting `max_testing_views` for extremely large models to save GPU memory used ([#511](https://github.com/open-mmlab/mmaction2/pull/511))
- Add spatial temporal demo ([#547](https://github.com/open-mmlab/mmaction2/pull/547), [#566](https://github.com/open-mmlab/mmaction2/pull/566))
**Improvements**
- Refactor EvalHook ([#395](https://github.com/open-mmlab/mmaction2/pull/395))
- Refactor AVA hook ([#567](https://github.com/open-mmlab/mmaction2/pull/567))
- Add repo citation ([#545](https://github.com/open-mmlab/mmaction2/pull/545))
- Add dataset size of Kinetics400 ([#503](https://github.com/open-mmlab/mmaction2/pull/503))
- Add lazy operation docs ([#504](https://github.com/open-mmlab/mmaction2/pull/504))
- Add class_weight for CrossEntropyLoss and BCELossWithLogits ([#509](https://github.com/open-mmlab/mmaction2/pull/509))
- Add some explanation about the resampling in SlowFast ([#502](https://github.com/open-mmlab/mmaction2/pull/502))
- Modify paper title in README.md ([#512](https://github.com/open-mmlab/mmaction2/pull/512))
- Add alternative ways to download Kinetics ([#521](https://github.com/open-mmlab/mmaction2/pull/521))
- Add OpenMMLab projects link in README ([#530](https://github.com/open-mmlab/mmaction2/pull/530))
- Change default preprocessing to resize the short edge to 256 ([#538](https://github.com/open-mmlab/mmaction2/pull/538))
- Add config tag in dataset README ([#540](https://github.com/open-mmlab/mmaction2/pull/540))
- Add solution for markdownlint installation issue ([#497](https://github.com/open-mmlab/mmaction2/pull/497))
- Add dataset overview in readthedocs ([#548](https://github.com/open-mmlab/mmaction2/pull/548))
- Modify the trigger mode of the warnings of missing mmdet ([#583](https://github.com/open-mmlab/mmaction2/pull/583))
- Refactor config structure ([#488](https://github.com/open-mmlab/mmaction2/pull/488), [#572](https://github.com/open-mmlab/mmaction2/pull/572))
- Refactor unittest structure ([#433](https://github.com/open-mmlab/mmaction2/pull/433))
**Bug and Typo Fixes**
- Fix a bug about ava dataset validation ([#527](https://github.com/open-mmlab/mmaction2/pull/527))
- Fix a bug about ResNet pretrain weight initialization ([#582](https://github.com/open-mmlab/mmaction2/pull/582))
- Fix a bug in CI due to MMCV index ([#495](https://github.com/open-mmlab/mmaction2/pull/495))
- Remove invalid links of MiT and MMiT ([#516](https://github.com/open-mmlab/mmaction2/pull/516))
- Fix frame rate bug for AVA preparation ([#576](https://github.com/open-mmlab/mmaction2/pull/576))
**ModelZoo**
### 0.10.0 (31/12/2020)
**Highlights**
- Support Spatio-Temporal Action Detection (AVA)
- Support precise BN
**New Features**
- Support precise BN ([#501](https://github.com/open-mmlab/mmaction2/pull/501/))
- Support Spatio-Temporal Action Detection (AVA) ([#351](https://github.com/open-mmlab/mmaction2/pull/351))
- Support to return feature maps in `inference_recognizer` ([#458](https://github.com/open-mmlab/mmaction2/pull/458))
**Improvements**
- Add arg `stride` to long_video_demo.py, to make inference faster ([#468](https://github.com/open-mmlab/mmaction2/pull/468))
- Support training and testing for Spatio-Temporal Action Detection ([#351](https://github.com/open-mmlab/mmaction2/pull/351))
- Fix CI due to pip upgrade ([#454](https://github.com/open-mmlab/mmaction2/pull/454))
- Add markdown lint in pre-commit hook ([#255](https://github.com/open-mmlab/mmaction2/pull/255))
- Speed up confusion matrix calculation ([#465](https://github.com/open-mmlab/mmaction2/pull/465))
- Use title case in modelzoo statistics ([#456](https://github.com/open-mmlab/mmaction2/pull/456))
- Add FAQ documents for easy troubleshooting. ([#413](https://github.com/open-mmlab/mmaction2/pull/413), [#420](https://github.com/open-mmlab/mmaction2/pull/420), [#439](https://github.com/open-mmlab/mmaction2/pull/439))
- Support Spatio-Temporal Action Detection with context ([#471](https://github.com/open-mmlab/mmaction2/pull/471))
- Add class weight for CrossEntropyLoss and BCELossWithLogits ([#509](https://github.com/open-mmlab/mmaction2/pull/509))
- Add Lazy OPs docs ([#504](https://github.com/open-mmlab/mmaction2/pull/504))
**Bug and Typo Fixes**
- Fix typo in default argument of BaseHead ([#446](https://github.com/open-mmlab/mmaction2/pull/446))
- Fix potential bug about `output_config` overwrite ([#463](https://github.com/open-mmlab/mmaction2/pull/463))
**ModelZoo**
- Add SlowOnly, SlowFast for AVA2.1 ([#351](https://github.com/open-mmlab/mmaction2/pull/351))
### 0.9.0 (30/11/2020)
**Highlights**
- Support GradCAM utils for recognizers
- Support ResNet Audio model
**New Features**
- Automatically add modelzoo statistics to readthedocs ([#327](https://github.com/open-mmlab/mmaction2/pull/327))
- Support GYM99 ([#331](https://github.com/open-mmlab/mmaction2/pull/331), [#336](https://github.com/open-mmlab/mmaction2/pull/336))
- Add AudioOnly Pathway from AVSlowFast. ([#355](https://github.com/open-mmlab/mmaction2/pull/355))
- Add GradCAM utils for recognizer ([#324](https://github.com/open-mmlab/mmaction2/pull/324))
- Add print config script ([#345](https://github.com/open-mmlab/mmaction2/pull/345))
- Add online motion vector decoder ([#291](https://github.com/open-mmlab/mmaction2/pull/291))
**Improvements**
- Support PyTorch 1.7 in CI ([#312](https://github.com/open-mmlab/mmaction2/pull/312))
- Support to predict different labels in a long video ([#274](https://github.com/open-mmlab/mmaction2/pull/274))
- Update docs about test crops ([#359](https://github.com/open-mmlab/mmaction2/pull/359))
- Polish code format using pylint manually ([#338](https://github.com/open-mmlab/mmaction2/pull/338))
- Update unittest coverage ([#358](https://github.com/open-mmlab/mmaction2/pull/358), [#322](https://github.com/open-mmlab/mmaction2/pull/322), [#325](https://github.com/open-mmlab/mmaction2/pull/325))
- Add random seed for building filelists ([#323](https://github.com/open-mmlab/mmaction2/pull/323))
- Update colab tutorial ([#367](https://github.com/open-mmlab/mmaction2/pull/367))
- Set default batch_size of evaluation and testing to 1 ([#250](https://github.com/open-mmlab/mmaction2/pull/250))
- Rename the preparation docs to `README.md` ([#388](https://github.com/open-mmlab/mmaction2/pull/388))
- Move docs about demo to `demo/README.md` ([#329](https://github.com/open-mmlab/mmaction2/pull/329))
- Remove redundant code in `tools/test.py` ([#310](https://github.com/open-mmlab/mmaction2/pull/310))
- Automatically calculate number of test clips for Recognizer2D ([#359](https://github.com/open-mmlab/mmaction2/pull/359))
**Bug and Typo Fixes**
- Fix rename Kinetics classnames bug ([#384](https://github.com/open-mmlab/mmaction2/pull/384))
- Fix a bug in BaseDataset when `data_prefix` is None ([#314](https://github.com/open-mmlab/mmaction2/pull/314))
- Fix a bug about `tmp_folder` in `OpenCVInit` ([#357](https://github.com/open-mmlab/mmaction2/pull/357))
- Fix `get_thread_id` when not using disk as backend ([#354](https://github.com/open-mmlab/mmaction2/pull/354), [#357](https://github.com/open-mmlab/mmaction2/pull/357))
- Fix the bug of HVU object `num_classes` from 1679 to 1678 ([#307](https://github.com/open-mmlab/mmaction2/pull/307))
- Fix typo in `export_model.md` ([#399](https://github.com/open-mmlab/mmaction2/pull/399))
- Fix OmniSource training configs ([#321](https://github.com/open-mmlab/mmaction2/pull/321))
- Fix Issue #306: Bug of SampleAVAFrames ([#317](https://github.com/open-mmlab/mmaction2/pull/317))
**ModelZoo**
- Add SlowOnly model for GYM99, both RGB and Flow ([#336](https://github.com/open-mmlab/mmaction2/pull/336))
- Add auto modelzoo statistics in readthedocs ([#327](https://github.com/open-mmlab/mmaction2/pull/327))
- Add TSN for HMDB51 pretrained on Kinetics400, Moments in Time and ImageNet. ([#372](https://github.com/open-mmlab/mmaction2/pull/372))
### v0.8.0 (31/10/2020)
**Highlights**
- Support [OmniSource](https://arxiv.org/abs/2003.13042)
- Support C3D
- Support video recognition with audio modality
- Support HVU
- Support X3D
**New Features**
- Support AVA dataset preparation ([#266](https://github.com/open-mmlab/mmaction2/pull/266))
- Support the training of video recognition dataset with multiple tag categories ([#235](https://github.com/open-mmlab/mmaction2/pull/235))
- Support joint training with multiple training datasets of multiple formats, including images, untrimmed videos, etc. ([#242](https://github.com/open-mmlab/mmaction2/pull/242))
- Support to specify a start epoch to conduct evaluation ([#216](https://github.com/open-mmlab/mmaction2/pull/216))
- Implement X3D models, support testing with model weights converted from SlowFast ([#288](https://github.com/open-mmlab/mmaction2/pull/288))
**Improvements**
- Set default values of 'average_clips' in each config file so that there is no need to set it explicitly during testing in most cases ([#232](https://github.com/open-mmlab/mmaction2/pull/232))
- Extend HVU datatools to generate individual file list for each tag category ([#258](https://github.com/open-mmlab/mmaction2/pull/258))
- Support data preparation for Kinetics-600 and Kinetics-700 ([#254](https://github.com/open-mmlab/mmaction2/pull/254))
- Use `metric_dict` to replace hardcoded arguments in `evaluate` function ([#286](https://github.com/open-mmlab/mmaction2/pull/286))
- Add `cfg-options` in arguments to override some settings in the used config for convenience ([#212](https://github.com/open-mmlab/mmaction2/pull/212))
- Rename the old evaluating protocol `mean_average_precision` as `mmit_mean_average_precision` since it is only used on MMIT and is not the `mAP` we usually talk about. Add `mean_average_precision`, which is the real `mAP` ([#235](https://github.com/open-mmlab/mmaction2/pull/235))
- Add accurate setting (Three crop * 2 clip) and report corresponding performance for TSM model ([#241](https://github.com/open-mmlab/mmaction2/pull/241))
- Add citations in each preparing_dataset.md in `tools/data/dataset` ([#289](https://github.com/open-mmlab/mmaction2/pull/289))
- Update the performance of audio-visual fusion on Kinetics-400 ([#281](https://github.com/open-mmlab/mmaction2/pull/281))
- Support data preparation of OmniSource web datasets, including GoogleImage, InsImage, InsVideo and KineticsRawVideo ([#294](https://github.com/open-mmlab/mmaction2/pull/294))
- Use `metric_options` dict to provide metric args in `evaluate` ([#286](https://github.com/open-mmlab/mmaction2/pull/286))
**Bug Fixes**
- Register `FrameSelector` in `PIPELINES` ([#268](https://github.com/open-mmlab/mmaction2/pull/268))
- Fix the potential bug for default value in dataset_setting ([#245](https://github.com/open-mmlab/mmaction2/pull/245))
- Fix multi-node dist test ([#292](https://github.com/open-mmlab/mmaction2/pull/292))
- Fix the data preparation bug for `something-something` dataset ([#278](https://github.com/open-mmlab/mmaction2/pull/278))
- Fix the invalid config url in slowonly README data benchmark ([#249](https://github.com/open-mmlab/mmaction2/pull/249))
- Validate that the performance of models trained with videos has no significant difference compared to that of models trained with rawframes ([#256](https://github.com/open-mmlab/mmaction2/pull/256))
- Correct the `img_norm_cfg` used by TSN-3seg-R50 UCF-101 model, improve the Top-1 accuracy by 3% ([#273](https://github.com/open-mmlab/mmaction2/pull/273))
**ModelZoo**
- Add Baselines for Kinetics-600 and Kinetics-700, including TSN-R50-8seg and SlowOnly-R50-8x8 ([#259](https://github.com/open-mmlab/mmaction2/pull/259))
- Add OmniSource benchmark on MiniKinetics ([#296](https://github.com/open-mmlab/mmaction2/pull/296))
- Add Baselines for HVU, including TSN-R18-8seg on 6 tag categories of HVU ([#287](https://github.com/open-mmlab/mmaction2/pull/287))
- Add X3D models ported from [SlowFast](https://github.com/facebookresearch/SlowFast/) ([#288](https://github.com/open-mmlab/mmaction2/pull/288))
### v0.7.0 (30/9/2020)
**Highlights**
- Support TPN
- Support JHMDB, UCF101-24, HVU dataset preparation
- Support ONNX model conversion
**New Features**
- Support the data pre-processing pipeline for the HVU Dataset ([#277](https://github.com/open-mmlab/mmaction2/pull/227/))
- Support real-time action recognition from web camera ([#171](https://github.com/open-mmlab/mmaction2/pull/171))
- Support onnx ([#160](https://github.com/open-mmlab/mmaction2/pull/160))
- Support UCF101-24 preparation ([#219](https://github.com/open-mmlab/mmaction2/pull/219))
- Support evaluating mAP for ActivityNet with [CUHK17_activitynet_pred](http://activity-net.org/challenges/2017/evaluation.html) ([#176](https://github.com/open-mmlab/mmaction2/pull/176))
- Add the data pipeline for ActivityNet, including downloading videos, extracting RGB and Flow frames, finetuning TSN and extracting feature ([#190](https://github.com/open-mmlab/mmaction2/pull/190))
- Support JHMDB preparation ([#220](https://github.com/open-mmlab/mmaction2/pull/220))
**ModelZoo**
- Add finetuning setting for SlowOnly ([#173](https://github.com/open-mmlab/mmaction2/pull/173))
- Add TSN and SlowOnly models trained with [OmniSource](https://arxiv.org/abs/2003.13042), which achieve 75.7% Top-1 with TSN-R50-3seg and 80.4% Top-1 with SlowOnly-R101-8x8 ([#215](https://github.com/open-mmlab/mmaction2/pull/215))
**Improvements**
- Support demo with video url ([#165](https://github.com/open-mmlab/mmaction2/pull/165))
- Support multi-batch when testing ([#184](https://github.com/open-mmlab/mmaction2/pull/184))
- Add tutorial for adding a new learning rate updater ([#181](https://github.com/open-mmlab/mmaction2/pull/181))
- Add config name in meta info ([#183](https://github.com/open-mmlab/mmaction2/pull/183))
- Remove git hash in `__version__` ([#189](https://github.com/open-mmlab/mmaction2/pull/189))
- Check mmcv version ([#189](https://github.com/open-mmlab/mmaction2/pull/189))
- Update url with 'https://download.openmmlab.com' ([#208](https://github.com/open-mmlab/mmaction2/pull/208))
- Update Docker file to support PyTorch 1.6 and update `install.md` ([#209](https://github.com/open-mmlab/mmaction2/pull/209))
- Polish readthedocs display ([#217](https://github.com/open-mmlab/mmaction2/pull/217), [#229](https://github.com/open-mmlab/mmaction2/pull/229))
**Bug Fixes**
- Fix the bug when using OpenCV to extract only RGB frames with original shape ([#184](https://github.com/open-mmlab/mmaction2/pull/187))
- Fix the bug of sthv2 `num_classes` from 339 to 174 ([#174](https://github.com/open-mmlab/mmaction2/pull/174), [#207](https://github.com/open-mmlab/mmaction2/pull/207))
### v0.6.0 (2/9/2020)
**Highlights**
- Support TIN, CSN, SSN, NonLocal
- Support FP16 training
**New Features**
- Support NonLocal module and provide ckpt in TSM and I3D ([#41](https://github.com/open-mmlab/mmaction2/pull/41))
- Support SSN ([#33](https://github.com/open-mmlab/mmaction2/pull/33), [#37](https://github.com/open-mmlab/mmaction2/pull/37), [#52](https://github.com/open-mmlab/mmaction2/pull/52), [#55](https://github.com/open-mmlab/mmaction2/pull/55))
- Support CSN ([#87](https://github.com/open-mmlab/mmaction2/pull/87))
- Support TIN ([#53](https://github.com/open-mmlab/mmaction2/pull/53))
- Support HMDB51 dataset preparation ([#60](https://github.com/open-mmlab/mmaction2/pull/60))
- Support encoding videos from frames ([#84](https://github.com/open-mmlab/mmaction2/pull/84))
- Support FP16 training ([#25](https://github.com/open-mmlab/mmaction2/pull/25))
- Enhance demo by supporting rawframe inference ([#59](https://github.com/open-mmlab/mmaction2/pull/59)), output video/gif ([#72](https://github.com/open-mmlab/mmaction2/pull/72))
**ModelZoo**
- Update Slowfast modelzoo ([#51](https://github.com/open-mmlab/mmaction2/pull/51))
- Update TSN, TSM video checkpoints ([#50](https://github.com/open-mmlab/mmaction2/pull/50))
- Add data benchmark for TSN ([#57](https://github.com/open-mmlab/mmaction2/pull/57))
- Add data benchmark for SlowOnly ([#77](https://github.com/open-mmlab/mmaction2/pull/77))
- Add BSN/BMN performance results with feature extracted by our codebase ([#99](https://github.com/open-mmlab/mmaction2/pull/99))
**Improvements**
- Polish data preparation codes ([#70](https://github.com/open-mmlab/mmaction2/pull/70))
- Improve data preparation scripts ([#58](https://github.com/open-mmlab/mmaction2/pull/58))
- Improve unittest coverage and minor fix ([#62](https://github.com/open-mmlab/mmaction2/pull/62))
- Support PyTorch 1.6 in CI ([#117](https://github.com/open-mmlab/mmaction2/pull/117))
- Support `with_offset` for rawframe dataset ([#48](https://github.com/open-mmlab/mmaction2/pull/48))
- Support json annotation files ([#119](https://github.com/open-mmlab/mmaction2/pull/119))
- Support `multi-class` in TSMHead ([#104](https://github.com/open-mmlab/mmaction2/pull/104))
- Support using `val_step()` to validate data for each `val` workflow ([#123](https://github.com/open-mmlab/mmaction2/pull/123))
- Use `xxInit()` method to get `total_frames` and make `total_frames` a required key ([#90](https://github.com/open-mmlab/mmaction2/pull/90))
- Add paper introduction in model readme ([#140](https://github.com/open-mmlab/mmaction2/pull/140))
- Adjust the directory structure of `tools/` and rename some scripts files ([#142](https://github.com/open-mmlab/mmaction2/pull/142))
**Bug Fixes**
- Fix configs for localization test ([#67](https://github.com/open-mmlab/mmaction2/pull/67))
- Fix configs of SlowOnly by fixing lr to 8 gpus ([#136](https://github.com/open-mmlab/mmaction2/pull/136))
- Fix the bug in analyze_log ([#54](https://github.com/open-mmlab/mmaction2/pull/54))
- Fix the bug of generating HMDB51 class index file ([#69](https://github.com/open-mmlab/mmaction2/pull/69))
- Fix the bug of using `load_checkpoint()` in ResNet ([#93](https://github.com/open-mmlab/mmaction2/pull/93))
- Fix the bug of `--work-dir` when using slurm training script ([#110](https://github.com/open-mmlab/mmaction2/pull/110))
- Correct the sthv1/sthv2 rawframes file list generation command ([#71](https://github.com/open-mmlab/mmaction2/pull/71))
- Fix `CosineAnnealing` typo ([#47](https://github.com/open-mmlab/mmaction2/pull/47))
### v0.5.0 (9/7/2020)
**Highlights**
- MMAction2 is released
**New Features**
- Support various datasets: UCF101, Kinetics-400, Something-Something V1&V2, Moments in Time,
Multi-Moments in Time, THUMOS14
- Support various action recognition methods: TSN, TSM, R(2+1)D, I3D, SlowOnly, SlowFast, Non-local
- Support various action localization methods: BSN, BMN
- Colab demo for action recognition
# Copyright (c) OpenMMLab. All rights reserved.
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import subprocess
import sys
import pytorch_sphinx_theme
sys.path.insert(0, os.path.abspath('..'))
# -- Project information -----------------------------------------------------
project = 'MMAction2'
copyright = '2020, OpenMMLab'
author = 'MMAction2 Authors'
version_file = '../mmaction/version.py'
def get_version():
    with open(version_file, 'r') as f:
        exec(compile(f.read(), version_file, 'exec'))
    return locals()['__version__']
# The full version, including alpha/beta/rc tags
release = get_version()
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc', 'sphinx.ext.napoleon', 'sphinx.ext.viewcode',
'sphinx_markdown_tables', 'sphinx_copybutton', 'myst_parser'
]
# numpy and torch are required
autodoc_mock_imports = ['mmaction.version', 'PIL']
copybutton_prompt_text = r'>>> |\.\.\. '
copybutton_prompt_is_regexp = True
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
# -- Options for HTML output -------------------------------------------------
source_suffix = {'.rst': 'restructuredtext', '.md': 'markdown'}
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'pytorch_sphinx_theme'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_theme_path = [pytorch_sphinx_theme.get_html_theme_path()]
html_theme_options = {
# 'logo_url': 'https://mmaction2.readthedocs.io/en/latest/',
'menu': [
{
'name':
'Tutorial',
'url':
'https://colab.research.google.com/github/'
'open-mmlab/mmaction2/blob/master/demo/mmaction2_tutorial.ipynb'
},
{
'name': 'GitHub',
'url': 'https://github.com/open-mmlab/mmaction2'
},
{
'name':
'Upstream',
'children': [
{
'name': 'MMCV',
'url': 'https://github.com/open-mmlab/mmcv',
'description': 'Foundational library for computer vision'
},
{
'name':
'MMClassification',
'url':
'https://github.com/open-mmlab/mmclassification',
'description':
'Open source image classification toolbox based on PyTorch'
},
{
'name': 'MMDetection',
'url': 'https://github.com/open-mmlab/mmdetection',
'description': 'Object detection toolbox and benchmark'
},
]
},
],
# Specify the language of shared menu
'menu_lang':
'en'
}
language = 'en'
master_doc = 'index'
html_static_path = ['_static']
html_css_files = ['css/readthedocs.css']
myst_enable_extensions = ['colon_fence']
myst_heading_anchors = 3
def builder_inited_handler(app):
    subprocess.run(['./merge_docs.sh'])
    subprocess.run(['./stat.py'])
def setup(app):
    app.connect('builder-inited', builder_inited_handler)
# Data Preparation
We provide some tips for MMAction2 data preparation in this file.
<!-- TOC -->
- [Notes on Video Data Format](#notes-on-video-data-format)
- [Getting Data](#getting-data)
- [Prepare videos](#prepare-videos)
- [Extract frames](#extract-frames)
- [Alternative to denseflow](#alternative-to-denseflow)
- [Generate file list](#generate-file-list)
- [Prepare audio](#prepare-audio)
<!-- TOC -->
## Notes on Video Data Format
MMAction2 supports two types of data format: raw frames and video. The former is widely used in previous projects such as [TSN](https://github.com/yjxiong/temporal-segment-networks).
This is fast when SSD is available but fails to scale to the fast-growing datasets.
(For example, the newest edition of [Kinetics](https://deepmind.com/research/open-source/open-source-datasets/kinetics/) has 650K videos and the total frames will take up several TBs.)
The latter saves much space but has to do computation-intensive video decoding at execution time.
To make video decoding faster, we support several efficient video loading libraries, such as [decord](https://github.com/zhreshold/decord), [PyAV](https://github.com/PyAV-Org/PyAV), etc.
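If you go the video route, decoding a clip with one of these libraries takes only a few lines. Below is a minimal sketch using decord (the video path is a placeholder; decord must be installed first, e.g. via `pip install decord`):
```python
# Minimal sketch: decode frames on the fly with decord.
# 'demo/demo.mp4' is a placeholder; replace it with a real video path.
from decord import VideoReader

vr = VideoReader('demo/demo.mp4')   # open the video container
print(f'total frames: {len(vr)}')   # frame count is known without decoding everything
frame = vr[0].asnumpy()             # decode a single frame into an HWC uint8 array
print(f'frame shape: {frame.shape}')
```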
## Getting Data
The following guide is helpful when you want to experiment with a custom dataset.
Similar to the datasets stated above, it is recommended to organize it in `$MMACTION2/data/$DATASET`.
### Prepare videos
Please refer to the official website and/or the official script to prepare the videos.
Note that the videos should be arranged in either
(1). A two-level directory organized by `${CLASS_NAME}/${VIDEO_ID}`, which is recommended to be used for action recognition datasets (such as UCF101 and Kinetics)
(2). A single-level directory, which is recommended to be used for action detection datasets or those with multiple annotations per video (such as THUMOS14).
### Extract frames
To extract both frames and optical flow, you can use the tool [denseflow](https://github.com/open-mmlab/denseflow) we wrote.
Since different frame extraction tools produce different numbers of frames,
it is beneficial to use the same tool for both frame extraction and flow computation, to avoid mismatching frame counts.
```shell
python build_rawframes.py ${SRC_FOLDER} ${OUT_FOLDER} [--task ${TASK}] [--level ${LEVEL}] \
[--num-worker ${NUM_WORKER}] [--flow-type ${FLOW_TYPE}] [--out-format ${OUT_FORMAT}] \
[--ext ${EXT}] [--new-width ${NEW_WIDTH}] [--new-height ${NEW_HEIGHT}] [--new-short ${NEW_SHORT}] \
[--resume] [--use-opencv] [--mixed-ext]
```
- `SRC_FOLDER`: Folder of the original video.
- `OUT_FOLDER`: Root folder where the extracted frames and optical flow are stored.
- `TASK`: Extraction task indicating which kind of frames to extract. Allowed choices are `rgb`, `flow`, `both`.
- `LEVEL`: Directory level. 1 for the single-level directory or 2 for the two-level directory.
- `NUM_WORKER`: Number of workers to build rawframes.
- `FLOW_TYPE`: Flow type to extract, e.g., `None`, `tvl1`, `warp_tvl1`, `farn`, `brox`.
- `OUT_FORMAT`: Output format for extracted frames, e.g., `jpg`, `h5`, `png`.
- `EXT`: Video file extension, e.g., `avi`, `mp4`.
- `NEW_WIDTH`: Resized image width of output.
- `NEW_HEIGHT`: Resized image height of output.
- `NEW_SHORT`: Resized image short side length keeping ratio.
- `--resume`: Whether to resume optical flow extraction instead of overwriting.
- `--use-opencv`: Whether to use OpenCV to extract rgb frames.
- `--mixed-ext`: Indicate whether to process video files with mixed extensions.
The recommended practice is to
1. set `$OUT_FOLDER` to a folder located on an SSD.
2. symlink `$OUT_FOLDER` to `$MMACTION2/data/$DATASET/rawframes`.
3. set `new-short` instead of using `new-width` and `new-height`.
```shell
ln -s ${YOUR_FOLDER} $MMACTION2/data/$DATASET/rawframes
```
#### Alternative to denseflow
In case your device doesn't fulfill the installation requirements of [denseflow](https://github.com/open-mmlab/denseflow) (such as the Nvidia driver version), or you just want to see some quick demos of flow extraction, we provide a python script `tools/misc/flow_extraction.py` as an alternative to denseflow. You can use it to extract rgb frames and optical flow from one or several videos. Note that the script is much slower than denseflow, since it runs the optical flow algorithms on the CPU.
```shell
python tools/misc/flow_extraction.py --input ${INPUT} [--prefix ${PREFIX}] [--dest ${DEST}] [--rgb-tmpl ${RGB_TMPL}] \
[--flow-tmpl ${FLOW_TMPL}] [--start-idx ${START_IDX}] [--method ${METHOD}] [--bound ${BOUND}] [--save-rgb]
```
- `INPUT`: Videos for frame extraction; either a single video or a video list. The video list should be a txt file consisting only of filenames without directories.
- `PREFIX`: The prefix of input videos, used when input is a video list.
- `DEST`: The destination to save extracted frames.
- `RGB_TMPL`: The template filename of rgb frames.
- `FLOW_TMPL`: The template filename of flow frames.
- `START_IDX`: The start index of extracted frames.
- `METHOD`: The method used to generate flow.
- `BOUND`: The maximum of optical flow.
- `--save-rgb`: Also save extracted rgb frames.
### Generate file list
We provide a convenient script to generate annotation file list. You can use the following command to generate file lists given extracted frames / downloaded videos.
```shell
cd $MMACTION2
python tools/data/build_file_list.py ${DATASET} ${SRC_FOLDER} [--rgb-prefix ${RGB_PREFIX}] \
[--flow-x-prefix ${FLOW_X_PREFIX}] [--flow-y-prefix ${FLOW_Y_PREFIX}] [--num-split ${NUM_SPLIT}] \
[--subset ${SUBSET}] [--level ${LEVEL}] [--format ${FORMAT}] [--out-root-path ${OUT_ROOT_PATH}] \
[--seed ${SEED}] [--shuffle]
```
- `DATASET`: Dataset to be prepared, e.g., `ucf101`, `kinetics400`, `thumos14`, `sthv1`, `sthv2`, etc.
- `SRC_FOLDER`: Folder of the corresponding data format:
- "$MMACTION2/data/$DATASET/rawframes" if `--format rawframes`.
- "$MMACTION2/data/$DATASET/videos" if `--format videos`.
- `RGB_PREFIX`: Name prefix of rgb frames.
- `FLOW_X_PREFIX`: Name prefix of x flow frames.
- `FLOW_Y_PREFIX`: Name prefix of y flow frames.
- `NUM_SPLIT`: Number of splits for the file list.
- `SUBSET`: Subset to generate the file list for. Allowed choices are `train`, `val`, `test`.
- `LEVEL`: Directory level. 1 for the single-level directory or 2 for the two-level directory.
- `FORMAT`: Source data format to generate file list. Allowed choices are `rawframes`, `videos`.
- `OUT_ROOT_PATH`: Root path for output.
- `SEED`: Random seed.
- `--shuffle`: Whether to shuffle the file list.
Now, you can go to [getting_started.md](getting_started.md) to train and test the model.
### Prepare audio
We also provide a simple script for audio waveform extraction and mel-spectrogram generation.
```shell
cd $MMACTION2
python tools/data/extract_audio.py ${ROOT} ${DST_ROOT} [--ext ${EXT}] [--num-workers ${N_WORKERS}] \
[--level ${LEVEL}]
```
- `ROOT`: The root directory of the videos.
- `DST_ROOT`: The destination root directory of the audios.
- `EXT`: Extension of the video files. e.g., `mp4`.
- `N_WORKERS`: Number of processes to be used.
After extracting audios, you are free to decode and generate the spectrogram on-the-fly such as [this](/configs/recognition_audio/resnet/tsn_r50_64x1x1_100e_kinetics400_audio.py). As for the annotations, you can directly use those of the rawframes as long as you keep the relative position of the audio files the same as that of the rawframes directory. However, extracting spectrograms on-the-fly is slow and bad for prototype iteration. Therefore, we also provide a script (and many useful tools to play with) to generate spectrograms off-line.
```shell
cd $MMACTION2
python tools/data/build_audio_features.py ${AUDIO_HOME_PATH} ${SPECTROGRAM_SAVE_PATH} [--level ${LEVEL}] \
[--ext $EXT] [--num-workers $N_WORKERS] [--part $PART]
```
- `AUDIO_HOME_PATH`: The root directory of the audio files.
- `SPECTROGRAM_SAVE_PATH`: The destination root directory of the audio features.
- `EXT`: Extension of the audio files. e.g., `m4a`.
- `N_WORKERS`: Number of processes to be used.
- `PART`: Determines how many parts the files are split into and which part to run, e.g., `2/5` means splitting all files into 5 parts and processing the 2nd one. This is useful if you have several machines.
The annotations for audio spectrogram features are identical to those of rawframes. You can simply make a copy of `dataset_[train/val]_list_rawframes.txt` and rename it as `dataset_[train/val]_list_audio_feature.txt`.
# FAQ
## Outline
We list some common issues faced by many users and their corresponding solutions here.
- [Installation](#installation)
- [Data](#data)
- [Training](#training)
- [Testing](#testing)
- [Deploying](#deploying)
Feel free to enrich the list if you find any frequent issues and have ways to help others to solve them.
If the contents here do not cover your issue, please create an issue using the [provided templates](/.github/ISSUE_TEMPLATE/error-report.md) and make sure you fill in all required information in the template.
## Installation
- **"No module named 'mmcv.ops'"; "No module named 'mmcv.\_ext'"**
1. Uninstall existing mmcv in the environment using `pip uninstall mmcv`
2. Install mmcv-full following the [installation instruction](https://mmcv.readthedocs.io/en/latest/#installation)
- **"OSError: MoviePy Error: creation of None failed because of the following error"**
Refer to [install.md](https://github.com/open-mmlab/mmaction2/blob/master/docs/install.md#requirements)
1. For Windows users, [ImageMagick](https://www.imagemagick.org/script/index.php) will not be automatically detected by MoviePy, there is a need to modify `moviepy/config_defaults.py` file by providing the path to the ImageMagick binary called `magick`, like `IMAGEMAGICK_BINARY = "C:\\Program Files\\ImageMagick_VERSION\\magick.exe"`
2. For Linux users, there is a need to modify the `/etc/ImageMagick-6/policy.xml` file by commenting out `<policy domain="path" rights="none" pattern="@*" />` to `<!-- <policy domain="path" rights="none" pattern="@*" /> -->`, if ImageMagick is not detected by moviepy.
- **"Why I got the error message 'Please install XXCODEBASE to use XXX' even if I have already installed XXCODEBASE?"**
You got that error message because our project failed to import a function or a class from XXCODEBASE. You can try to run the corresponding line to see what happens. One possible reason is that, for some OpenMMLab codebases, you need to install mmcv-full before installing them.
## Data
- **FileNotFound like `No such file or directory: xxx/xxx/img_00300.jpg`**
In our repo, we set `start_index=1` as the default value for rawframe datasets, and `start_index=0` as the default value for video datasets.
If you encounter a FileNotFound error for the first or last frame of the data, check whether the frame files start from offset 0 or 1,
that is `xxx_00000.jpg` or `xxx_00001.jpg`, and then change the `start_index` value of the dataset in the configs, e.g. as in the sketch below.
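A minimal (hypothetical) rawframe dataset override for frames starting at `img_00000.jpg` could look like the following; the annotation file, data prefix and pipeline are placeholders to be replaced by your own settings:
```python
data = dict(
    train=dict(
        type='RawframeDataset',
        ann_file='data/my_dataset/train_list.txt',   # placeholder
        data_prefix='data/my_dataset/rawframes',     # placeholder
        filename_tmpl='img_{:05}.jpg',
        start_index=0,   # frames are named img_00000.jpg, img_00001.jpg, ...
        pipeline=[]))    # fill in your actual train pipeline
```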
- **How should we preprocess the videos in the dataset? Resizing them to a fixed size (so that all videos share the same resolution), e.g. `340x256` (1), or resizing them so that the short edges of all videos are of the same length, e.g. 256px or 320px (2)?**
We have tried both preprocessing approaches and found that (2) is the better solution in general, so we use (2) with a short edge length of 256px as the default preprocessing setting. We benchmarked these preprocessing approaches and you may find the results in the [TSN Data Benchmark](https://github.com/open-mmlab/mmaction2/tree/master/configs/recognition/tsn) and the [SlowOnly Data Benchmark](https://github.com/open-mmlab/mmaction2/tree/master/configs/recognition/slowonly); a typical pipeline step implementing (2) is sketched below.
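As a rough illustration (not a full config), resizing the short edge in a pipeline is typically done with a `Resize` step whose scale is `(-1, 256)`; the surrounding steps below are placeholders:
```python
train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
    dict(type='DecordDecode'),
    # keep the aspect ratio and resize the short edge to 256 pixels
    dict(type='Resize', scale=(-1, 256)),
    # ... crop / flip / normalize / format steps follow in a full config
]
```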
- **Mismatched data pipeline items lead to errors like `KeyError: 'total_frames'`**
We have separate pipelines for processing videos and raw frames.
**For videos**, we decode them on the fly in the pipeline, so paired steps like `DecordInit & DecordDecode`, `OpenCVInit & OpenCVDecode`, `PyAVInit & PyAVDecode` should be used in this case, as in [this example](https://github.com/open-mmlab/mmaction2/blob/023777cfd26bb175f85d78c455f6869673e0aa09/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py#L47-L49).
**For frames**, the images have already been decoded offline, so the pipeline item `RawFrameDecode` should be used in this case, as in [this example](https://github.com/open-mmlab/mmaction2/blob/023777cfd26bb175f85d78c455f6869673e0aa09/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb.py#L49).
`KeyError: 'total_frames'` is caused by incorrectly using `RawFrameDecode` for videos: when the input is a video, `total_frames` cannot be known beforehand. The two decoding setups are sketched below.
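A rough side-by-side sketch of the two decoding setups (only the decoding-related steps are shown; the sampling parameters are illustrative):
```python
# For videos: an *Init step opens the container, a *Decode step decodes the sampled frames.
video_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
    dict(type='DecordDecode'),
]
# For rawframes: the images were decoded offline, so only RawFrameDecode is needed
# and `total_frames` comes from the annotation file.
frame_pipeline = [
    dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
    dict(type='RawFrameDecode'),
]
```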
## Training
- **How to just use trained recognizer models for backbone pre-training?**
Refer to [Use Pre-Trained Model](https://github.com/open-mmlab/mmaction2/blob/master/docs/tutorials/2_finetune.md#use-pre-trained-model):
to use a pre-trained model for the whole network, the new config adds the link of the pre-trained model to `load_from`.
To use only the backbone for pre-training, change the `pretrained` value in the backbone dict of the config to the checkpoint path / url.
When training, the unexpected keys will be ignored.
- **How to visualize the training accuracy/loss curves in real-time?**
Use `TensorboardLoggerHook` in `log_config` like
```python
log_config=dict(interval=20, hooks=[dict(type='TensorboardLoggerHook')])
```
You can refer to [tutorials/1_config.md](tutorials/1_config.md), [tutorials/7_customize_runtime.md](tutorials/7_customize_runtime.md#log-config), and [this](https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py#L118).
- **In batchnorm.py: Expected more than 1 value per channel when training**
To use batchnorm, the batch_size should be larger than 1. If `drop_last` is set to False when building dataloaders, the last batch of an epoch will sometimes have `batch_size==1` (what a coincidence ...) and training will throw this error. You can set `drop_last` to True to avoid it:
```python
train_dataloader=dict(drop_last=True)
```
- **How to fix stages of backbone when finetuning a model?**
You can refer to [`def _freeze_stages()`](https://github.com/open-mmlab/mmaction2/blob/0149a0e8c1e0380955db61680c0006626fd008e9/mmaction/models/backbones/x3d.py#L458) and [`frozen_stages`](https://github.com/open-mmlab/mmaction2/blob/0149a0e8c1e0380955db61680c0006626fd008e9/mmaction/models/backbones/x3d.py#L183-L184),
and remember to set `find_unused_parameters = True` in config files for distributed training or testing.
Actually, users can set `frozen_stages` to freeze stages of the backbone for all models except C3D, since all backbones inheriting from `ResNet` and `ResNet3D` support the inner function `_freeze_stages()`; a minimal override is sketched below.
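As an illustrative partial override on top of a base config (the values here are examples, not a recommended setting), freezing the first stage of a ResNet backbone looks roughly like:
```python
model = dict(
    backbone=dict(
        type='ResNet',
        depth=50,
        pretrained='torchvision://resnet50',
        frozen_stages=1))   # freeze the stem and stage 1
# needed for distributed training/testing when some parameters receive no gradients
find_unused_parameters = True
```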
- **How to set memcached setting in config files?**
In MMAction2, you can pass memcached kwargs to `class DecordInit` for video datasets or `RawFrameDecode` for rawframe datasets.
For more details, you can refer to [`class FileClient`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/fileio/file_client.py) in MMCV.
Here is an example to use memcached for rawframes dataset:
```python
mc_cfg = dict(server_list_cfg='server_list_cfg', client_cfg='client_cfg', sys_path='sys_path')
train_pipeline = [
...
dict(type='RawFrameDecode', io_backend='memcached', **mc_cfg),
...
]
```
- **How to set `load_from` value in config files to finetune models?**
In MMAction2, we set `load_from=None` as the default in `configs/_base_/default_runtime.py`, and owing to the [inheritance design](/docs/tutorials/1_config.md),
users can directly change it by setting `load_from` in their configs, as sketched below.
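A minimal sketch (the checkpoint URL below is a placeholder for a real model-zoo link or a local path):
```python
# in your own config, which inherits from a base config
load_from = 'https://download.openmmlab.com/mmaction/PATH/TO/SOME_CHECKPOINT.pth'  # placeholder
```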
## Testing
- **How to make predicted score normalized by softmax within \[0, 1\]?**
Change this in the config: set `model['test_cfg'] = dict(average_clips='prob')`.
- **What if the model is too large and the GPU memory can not fit even only one testing sample?**
By default, the 3D models are tested with 10 clips x 3 crops, i.e. 30 views in total. For extremely large models, the GPU memory cannot fit even a single testing sample (because of the 30 views). To handle this, you can set `max_testing_views=n` in `model['test_cfg']` of the config file, so that n views are used as a batch during forwarding to save GPU memory; see the sketch below.
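A partial config override illustrating both testing options mentioned in this section (the value 10 is just an example):
```python
model = dict(
    test_cfg=dict(
        average_clips='prob',   # normalize clip scores with softmax before averaging
        max_testing_views=10))  # forward at most 10 views at a time to save GPU memory
```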
- **How to show test results?**
During testing, we can use the command `--out xxx.json/pkl/yaml` to output result files for checking. The testing output has exactly the same order as the test dataset.
Besides, we provide an analysis tool for evaluating a model using the output result files: [`tools/analysis/eval_metric.py`](/tools/analysis/eval_metric.py).
## Deploying
- **Why is the ONNX model converted by MMAction2 throwing errors when being converted to other frameworks such as TensorRT?**
For now, we can only make sure that models in MMAction2 are ONNX-compatible. However, some operations in ONNX may be unsupported by your target deployment framework, e.g. TensorRT in [this issue](https://github.com/open-mmlab/mmaction2/issues/414). When such a situation occurs, we suggest you raise an issue and ask the community for help, as long as `pytorch2onnx.py` works well and is verified numerically.
# Feature Extraction
We provide easy-to-use scripts for feature extraction.
## Clip-level Feature Extraction
Clip-level feature extraction extracts deep features from a video clip, which usually lasts from several seconds to tens of seconds. The extracted feature is an n-dim vector for each clip. When performing multi-view feature extraction, e.g. n clips x m crops, the extracted feature is the average of the n * m views.
Before applying clip-level feature extraction, you need to prepare a video list (which includes all videos that you want to extract features from). For example, the video list for videos in UCF101 will look like:
```
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c02.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c03.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c04.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c05.avi
...
YoYo/v_YoYo_g25_c01.avi
YoYo/v_YoYo_g25_c02.avi
YoYo/v_YoYo_g25_c03.avi
YoYo/v_YoYo_g25_c04.avi
YoYo/v_YoYo_g25_c05.avi
```
Assume the root of UCF101 videos is `data/ucf101/videos` and the name of the video list is `ucf101.txt`, to extract clip-level feature of UCF101 videos with Kinetics-400 pretrained TSN, you can use the following script:
```shell
python tools/misc/clip_feature_extraction.py \
configs/recognition/tsn/tsn_r50_clip_feature_extraction_1x1x3_rgb.py \
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth \
--video-list ucf101.txt \
--video-root data/ucf101/videos \
--out ucf101_feature.pkl
```
and the extracted features will be stored in `ucf101_feature.pkl`; a quick way to inspect them is sketched below.
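The sketch assumes the `.pkl` file stores a list with one feature vector per video, in the same order as the lines of `ucf101.txt`; check the actual structure of your output if it differs.
```python
import pickle

import numpy as np

# load the pickle file produced by clip_feature_extraction.py
with open('ucf101_feature.pkl', 'rb') as f:
    features = pickle.load(f)

print(f'number of videos: {len(features)}')
print(f'feature shape of the first video: {np.asarray(features[0]).shape}')
```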
You can also use distributed clip-level feature extraction. Below is an example for a node with 8 GPUs.
```shell
bash tools/misc/dist_clip_feature_extraction.sh \
configs/recognition/tsn/tsn_r50_clip_feature_extraction_1x1x3_rgb.py \
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth \
8 \
--video-list ucf101.txt \
--video-root data/ucf101/videos \
--out ucf101_feature.pkl
```
To extract clip-level feature of UCF101 videos with Kinetics-400 pretrained SlowOnly, you can use the following script:
```shell
python tools/misc/clip_feature_extraction.py \
configs/recognition/slowonly/slowonly_r50_clip_feature_extraction_4x16x1_rgb.py \
https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth \
--video-list ucf101.txt \
--video-root data/ucf101/videos \
--out ucf101_feature.pkl
```
The two config files demonstrate what a minimal config file for feature extraction looks like. You can also use other existing config files for feature extraction, as long as they use videos rather than raw frames for training and testing:
```shell
python tools/misc/clip_feature_extraction.py \
configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py \
https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth \
--video-list ucf101.txt \
--video-root data/ucf101/videos \
--out ucf101_feature.pkl
```
# Getting Started
This page provides basic tutorials about the usage of MMAction2.
For installation instructions, please see [install.md](install.md).
<!-- TOC -->
- [Getting Started](#getting-started)
- [Datasets](#datasets)
- [Inference with Pre-Trained Models](#inference-with-pre-trained-models)
- [Test a dataset](#test-a-dataset)
- [High-level APIs for testing a video and rawframes](#high-level-apis-for-testing-a-video-and-rawframes)
- [Build a Model](#build-a-model)
- [Build a model with basic components](#build-a-model-with-basic-components)
- [Write a new model](#write-a-new-model)
- [Train a Model](#train-a-model)
- [Iteration pipeline](#iteration-pipeline)
- [Training setting](#training-setting)
- [Train with a single GPU](#train-with-a-single-gpu)
- [Train with multiple GPUs](#train-with-multiple-gpus)
- [Train with multiple machines](#train-with-multiple-machines)
- [Launch multiple jobs on a single machine](#launch-multiple-jobs-on-a-single-machine)
- [Tutorials](#tutorials)
<!-- TOC -->
## Datasets
It is recommended to symlink the dataset root to `$MMACTION2/data`.
If your folder structure is different, you may need to change the corresponding paths in config files.
```
mmaction2
├── mmaction
├── tools
├── configs
├── data
│ ├── kinetics400
│ │ ├── rawframes_train
│ │ ├── rawframes_val
│ │ ├── kinetics_train_list.txt
│ │ ├── kinetics_val_list.txt
│ ├── ucf101
│ │ ├── rawframes_train
│ │ ├── rawframes_val
│ │ ├── ucf101_train_list.txt
│ │ ├── ucf101_val_list.txt
│ ├── ...
```
For more information on data preparation, please see [data_preparation.md](data_preparation.md)
For using custom datasets, please refer to [Tutorial 3: Adding New Dataset](tutorials/3_new_dataset.md)
## Inference with Pre-Trained Models
We provide testing scripts to evaluate a whole dataset (Kinetics-400, Something-Something V1&V2, (Multi-)Moments in Time, etc.),
and provide some high-level APIs for easier integration into other projects.
MMAction2 also supports testing with CPU. However, it will be **very slow** and should only be used for debugging on a device without GPU.
To test with CPU, one should first disable all GPUs (if any) with `export CUDA_VISIBLE_DEVICES=-1`, and then call the testing scripts directly with `python tools/test.py {OTHER_ARGS}`.
### Test a dataset
- [x] single GPU
- [x] single node multiple GPUs
- [x] multiple nodes
You can use the following commands to test a dataset.
```shell
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] \
[--gpu-collect] [--tmpdir ${TMPDIR}] [--options ${OPTIONS}] [--average-clips ${AVG_TYPE}] \
[--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}] [--onnx] [--tensorrt]
# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] \
[--gpu-collect] [--tmpdir ${TMPDIR}] [--options ${OPTIONS}] [--average-clips ${AVG_TYPE}] \
[--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}]
```
Optional arguments:
- `RESULT_FILE`: Filename of the output results. If not specified, the results will not be saved to a file.
- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., `top_k_accuracy` and `mean_class_accuracy` are available for all recognition datasets, `mmit_mean_average_precision` for Multi-Moments in Time, `mean_average_precision` for Multi-Moments in Time and HVU single category, `AR@AN` for ActivityNet, etc.
- `--gpu-collect`: If specified, recognition results will be collected using gpu communication. Otherwise, it will save the results on different gpus to `TMPDIR` and collect them by the rank 0 worker.
- `TMPDIR`: Temporary directory used for collecting results from multiple workers, available when `--gpu-collect` is not specified.
- `OPTIONS`: Custom options used for evaluation. Allowed values depend on the arguments of the `evaluate` function in dataset.
- `AVG_TYPE`: Items to average the test clips. If set to `prob`, it will apply softmax before averaging the clip scores. Otherwise, it will directly average the clip scores.
- `JOB_LAUNCHER`: Items for distributed job initialization launcher. Allowed choices are `none`, `pytorch`, `slurm`, `mpi`. In particular, if set to `none`, the test will run in non-distributed mode.
- `LOCAL_RANK`: ID for local rank. If not specified, it will be set to 0.
- `--onnx`: If specified, recognition results will be generated by the ONNX model and `CHECKPOINT_FILE` should be an ONNX model file path. ONNX model files are generated by `/tools/deployment/pytorch2onnx.py`. For now, multi-gpu mode and dynamic input shape mode are not supported. Please note that the output tensors of the dataset and the input tensors of the ONNX model should share the same shape. And it is recommended to remove all test-time augmentation methods in `test_pipeline` (`ThreeCrop`, `TenCrop`, `twice_sample`, etc.)
- `--tensorrt`: If specified, recognition results will be generated by a TensorRT engine and `CHECKPOINT_FILE` should be a TensorRT engine file path. TensorRT engines are generated from exported ONNX models with the TensorRT official conversion tools. For now, multi-gpu mode and dynamic input shape mode are not supported. Please note that the output tensors of the dataset and the input tensors of the TensorRT engine should share the same shape. And it is recommended to remove all test-time augmentation methods in `test_pipeline` (`ThreeCrop`, `TenCrop`, `twice_sample`, etc.)
Examples:
Assume that you have already downloaded the checkpoints to the directory `checkpoints/`.
1. Test TSN on Kinetics-400 (without saving the test results) and evaluate the top-k accuracy and mean class accuracy.
```shell
python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \
checkpoints/SOME_CHECKPOINT.pth \
--eval top_k_accuracy mean_class_accuracy
```
2. Test TSN on Something-Something V1 with 8 GPUs, and evaluate the top-k accuracy.
```shell
./tools/dist_test.sh configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb.py \
checkpoints/SOME_CHECKPOINT.pth \
8 --out results.pkl --eval top_k_accuracy
```
3. Test TSN on Kinetics-400 in a slurm environment and evaluate the top-k accuracy.
```shell
python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \
checkpoints/SOME_CHECKPOINT.pth \
--launcher slurm --eval top_k_accuracy
```
4. Test TSN on Something-Something V1 with an ONNX model and evaluate the top-k accuracy.
```shell
python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \
checkpoints/SOME_CHECKPOINT.onnx \
--eval top_k_accuracy --onnx
```
### High-level APIs for testing a video and rawframes
Here is an example of building the model and testing a given video.
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer
config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
# download the checkpoint from model zoo and put it in `checkpoints/`
checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'
# assign the desired device.
device = 'cuda:0' # or 'cpu'
device = torch.device(device)
# build the model from a config file and a checkpoint file
model = init_recognizer(config_file, checkpoint_file, device=device)
# test a single video and show the result:
video = 'demo/demo.mp4'
labels = 'tools/data/kinetics/label_map_k400.txt'
results = inference_recognizer(model, video)
# show the results
labels = open(labels).readlines()
labels = [x.strip() for x in labels]
results = [(labels[k[0]], k[1]) for k in results]
print('The top-5 labels with corresponding scores are:')
for result in results:
print(f'{result[0]}: ', result[1])
```
Here is an example of building the model and testing with a given rawframes directory.
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer
config_file = 'configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py'
# download the checkpoint from model zoo and put it in `checkpoints/`
checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'
# assign the desired device.
device = 'cuda:0' # or 'cpu'
device = torch.device(device)
# build the model from a config file and a checkpoint file
model = init_recognizer(config_file, checkpoint_file, device=device)
# test rawframe directory of a single video and show the result:
video = 'SOME_DIR_PATH/'
labels = 'tools/data/kinetics/label_map_k400.txt'
results = inference_recognizer(model, video)
# show the results
labels = open(labels).readlines()
labels = [x.strip() for x in labels]
results = [(labels[k[0]], k[1]) for k in results]
print('The top-5 labels with corresponding scores are:')
for result in results:
print(f'{result[0]}: ', result[1])
```
Here is an example of building the model and testing with a given video url.
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer
config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
# download the checkpoint from model zoo and put it in `checkpoints/`
checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'
# assign the desired device.
device = 'cuda:0' # or 'cpu'
device = torch.device(device)
# build the model from a config file and a checkpoint file
model = init_recognizer(config_file, checkpoint_file, device=device)
# test url of a single video and show the result:
video = 'https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4'
labels = 'tools/data/kinetics/label_map_k400.txt'
results = inference_recognizer(model, video)
# show the results
labels = open(labels).readlines()
labels = [x.strip() for x in labels]
results = [(labels[k[0]], k[1]) for k in results]
print('The top-5 labels with corresponding scores are:')
for result in results:
print(f'{result[0]}: ', result[1])
```
:::{note}
We define `data_prefix` in config files and set it to `None` by default in our provided inference configs.
If the `data_prefix` is not None, the path for the video file (or rawframe directory) to get will be `data_prefix/video`.
Here, the `video` is the param in the demo scripts above.
This detail can be found in `rawframe_dataset.py` and `video_dataset.py`. For example,
- When video (rawframes) path is `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME/img_xxxxx.jpg`), and `data_prefix` is None in the config file,
the param `video` should be `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME`).
- When video (rawframes) path is `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME/img_xxxxx.jpg`), and `data_prefix` is `SOME_DIR_PATH` in the config file,
the param `video` should be `VIDEO.mp4` (`VIDEO_NAME`).
- When rawframes path is `VIDEO_NAME/img_xxxxx.jpg`, and `data_prefix` is None in the config file, the param `video` should be `VIDEO_NAME`.
- When passing a url instead of a local video file, you need to use OpenCV as the video decoding backend.
:::
A notebook demo can be found in [demo/demo.ipynb](/demo/demo.ipynb)
## Build a Model
### Build a model with basic components
In MMAction2, model components are basically categorized into 4 types.
- recognizer: the whole recognizer model pipeline, usually contains a backbone and cls_head.
- backbone: usually an FCN network to extract feature maps, e.g., ResNet, BNInception.
- cls_head: the component for classification task, usually contains an FC layer with some pooling layers.
- localizer: the model for localization task, currently available: BSN, BMN.
Following some basic pipelines (e.g., `Recognizer2D`), the model structure
can be customized through config files with little effort.
If we want to implement some new components, e.g., the temporal shift backbone structure as
in [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383), there are several things to do.
1. Create a new file `mmaction/models/backbones/resnet_tsm.py`.
```python
from ..builder import BACKBONES
from .resnet import ResNet
@BACKBONES.register_module()
class ResNetTSM(ResNet):

    def __init__(self,
                 depth,
                 num_segments=8,
                 is_shift=True,
                 shift_div=8,
                 shift_place='blockres',
                 temporal_pool=False,
                 **kwargs):
        pass

    def forward(self, x):
        # implementation is ignored
        pass
```
2. Import the module in `mmaction/models/backbones/__init__.py`
```python
from .resnet_tsm import ResNetTSM
```
3. Modify the config file from
```python
backbone=dict(
type='ResNet',
pretrained='torchvision://resnet50',
depth=50,
norm_eval=False)
```
to
```python
backbone=dict(
type='ResNetTSM',
pretrained='torchvision://resnet50',
depth=50,
norm_eval=False,
shift_div=8)
```
### Write a new model
To write a new recognition pipeline, you need to inherit from `BaseRecognizer`,
which defines the following abstract methods.
- `forward_train()`: forward method of the training mode.
- `forward_test()`: forward method of the testing mode.
[Recognizer2D](/mmaction/models/recognizers/recognizer2d.py) and [Recognizer3D](/mmaction/models/recognizers/recognizer3d.py)
are good examples which show how to do that.
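To make the structure concrete, here is a schematic sketch of a new recognizer. It only mirrors the pattern of the existing recognizers; the exact import paths and method signatures may differ between versions, so treat it as a template rather than drop-in code (`MyRecognizer` is a hypothetical name):
```python
from mmaction.models.builder import RECOGNIZERS
from mmaction.models.recognizers.base import BaseRecognizer


@RECOGNIZERS.register_module()
class MyRecognizer(BaseRecognizer):
    """Hypothetical recognizer following the Recognizer2D/3D pattern."""

    def forward_train(self, imgs, labels, **kwargs):
        # extract features with the backbone, classify, and return a dict of losses
        x = self.extract_feat(imgs)
        cls_score = self.cls_head(x)
        return self.cls_head.loss(cls_score, labels.squeeze(), **kwargs)

    def forward_test(self, imgs):
        # return classification scores for testing
        x = self.extract_feat(imgs)
        cls_score = self.cls_head(x)
        return cls_score.cpu().numpy()
```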
## Train a Model
### Iteration pipeline
MMAction2 implements distributed training and non-distributed training,
which uses `MMDistributedDataParallel` and `MMDataParallel` respectively.
We adopt distributed training for both single machine and multiple machines.
Supposing that the server has 8 GPUs, 8 processes will be started and each process runs on a single GPU.
Each process keeps an isolated model, data loader, and optimizer.
Model parameters are only synchronized once at the beginning.
After a forward and backward pass, gradients will be allreduced among all GPUs,
and the optimizer will update model parameters.
Since the gradients are allreduced, the model parameter stays the same for all processes after the iteration.
### Training setting
All outputs (log files and checkpoints) will be saved to the working directory,
which is specified by `work_dir` in the config file.
By default we evaluate the model on the validation set after each epoch. You can change the evaluation interval by modifying the `interval` argument in the training config:
```python
evaluation = dict(interval=5)  # This evaluates the model every 5 epochs.
```
According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you need to set the learning rate proportional to the batch size if you use different numbers of GPUs or videos per GPU, e.g., lr=0.01 for 4 GPUs x 2 videos/gpu and lr=0.08 for 16 GPUs x 4 videos/gpu (see the helper sketched below).
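As a quick sanity check, here is a tiny helper (not part of MMAction2) that applies the rule with the reference setting above, i.e. lr=0.01 for a total batch size of 4 GPUs x 2 videos = 8:
```python
def scaled_lr(num_gpus, videos_per_gpu, base_lr=0.01, base_batch_size=8):
    """Linear scaling rule: lr grows proportionally with the total batch size."""
    return base_lr * (num_gpus * videos_per_gpu) / base_batch_size

print(scaled_lr(4, 2))    # 0.01 -> the reference setting
print(scaled_lr(16, 4))   # 0.08 -> 16 GPUs x 4 videos per GPU
```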
MMAction2 also supports training with CPU. However, it will be **very slow** and should only be used for debugging on a device without GPU.
To train with CPU, one should first disable all GPUs (if any) with `export CUDA_VISIBLE_DEVICES=-1`, and then call the training scripts directly with `python tools/train.py {OTHER_ARGS}`.
### Train with a single GPU
```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```
If you want to specify the working directory in the command, you can add an argument `--work-dir ${YOUR_WORK_DIR}`.
### Train with multiple GPUs
```shell
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
```
Optional arguments are:
- `--validate` (**strongly recommended**): Perform evaluation every k epochs during training (k defaults to 5 and can be modified by changing the `interval` value of the `evaluation` dict in each config file).
- `--test-last`: Test the final checkpoint when training is over, save the prediction to `${WORK_DIR}/last_pred.pkl`.
- `--test-best`: Test the best checkpoint when training is over, save the prediction to `${WORK_DIR}/best_pred.pkl`.
- `--work-dir ${WORK_DIR}`: Override the working directory specified in the config file.
- `--resume-from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
- `--gpus ${GPU_NUM}`: Number of gpus to use, which is only applicable to non-distributed training.
- `--gpu-ids ${GPU_IDS}`: IDs of gpus to use, which is only applicable to non-distributed training.
- `--seed ${SEED}`: Seed id for random state in python, numpy and pytorch to generate random numbers.
- `--deterministic`: If specified, it will set deterministic options for CUDNN backend.
- `JOB_LAUNCHER`: Items for distributed job initialization launcher. Allowed choices are `none`, `pytorch`, `slurm`, `mpi`. Especially, if set to none, it will test in a non-distributed mode.
- `LOCAL_RANK`: ID for local rank. If not specified, it will be set to 0.
Difference between `resume-from` and `load-from`:
`resume-from` loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming the training process that is interrupted accidentally.
`load-from` only loads the model weights and the training epoch starts from 0. It is usually used for finetuning.
Here is an example of resuming TSN training from a checkpoint with 8 GPUs.
```shell
./tools/dist_train.sh configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py 8 --resume-from work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/latest.pth
```
### Train with multiple machines
If you can run MMAction2 on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`. (This script also supports single machine training.)
```shell
[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} [--work-dir ${WORK_DIR}]
```
Here is an example of using 16 GPUs to train TSN on the dev partition in a slurm cluster. (use `GPUS_PER_NODE=8` to specify a single slurm cluster node with 8 GPUs.)
```shell
GPUS=16 ./tools/slurm_train.sh dev tsn_r50_k400 configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb
```
You can check [slurm_train.sh](/tools/slurm_train.sh) for full arguments and environment variables.
If your multiple machines are simply connected via Ethernet, you can run the following commands:
On the first machine:
```shell
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
```
On the second machine:
```shell
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
```
It can be extremely slow if you do not have high-speed networking like InfiniBand.
### Launch multiple jobs on a single machine
If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
you need to specify different ports (29500 by default) for each job to avoid communication conflict.
If you use `dist_train.sh` to launch training jobs, you can set the port in commands.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
```
If you launch training jobs with slurm, you need to modify `dist_params` in the config files (usually the 6th line from the bottom in config files) to set different communication ports.
In `config1.py`,
```python
dist_params = dict(backend='nccl', port=29500)
```
In `config2.py`,
```python
dist_params = dict(backend='nccl', port=29501)
```
Then you can launch two jobs with `config1.py` and `config2.py`.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py [--work-dir ${WORK_DIR}]
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py [--work-dir ${WORK_DIR}]
```
## Tutorials
Currently, we provide some tutorials for users to [learn about configs](tutorials/1_config.md), [finetune model](tutorials/2_finetune.md),
[add new dataset](tutorials/3_new_dataset.md), [customize data pipelines](tutorials/4_data_pipeline.md),
[add new modules](tutorials/5_new_modules.md), [export a model to ONNX](tutorials/6_export_model.md) and [customize runtime settings](tutorials/7_customize_runtime.md).
Welcome to MMAction2's documentation!
=====================================
You can switch between Chinese and English documents in the lower-left corner of the layout.
您可以在页面左下角切换文档语言。
.. toctree::
:maxdepth: 2
install.md
getting_started.md
demo.md
benchmark.md
.. toctree::
:maxdepth: 2
:caption: Datasets
datasets.md
data_preparation.md
supported_datasets.md
.. toctree::
:maxdepth: 2
:caption: Model Zoo
modelzoo.md
recognition_models.md
localization_models.md
detection_models.md
skeleton_models.md
.. toctree::
:maxdepth: 2
:caption: Tutorials
tutorials/1_config.md
tutorials/2_finetune.md
tutorials/3_new_dataset.md
tutorials/4_data_pipeline.md
tutorials/5_new_modules.md
tutorials/6_export_model.md
tutorials/7_customize_runtime.md
.. toctree::
:maxdepth: 2
:caption: Useful Tools and Scripts
useful_tools.md
.. toctree::
:maxdepth: 2
:caption: Notes
changelog.md
faq.md
.. toctree::
:caption: API Reference
api.rst
.. toctree::
:caption: Switch Language
switch_language.md
Indices and tables
==================
* :ref:`genindex`
* :ref:`search`
# Installation
We provide some tips for MMAction2 installation in this file.
<!-- TOC -->
- [Installation](#installation)
- [Requirements](#requirements)
- [Prepare environment](#prepare-environment)
- [Install MMAction2](#install-mmaction2)
- [Install with CPU only](#install-with-cpu-only)
- [Another option: Docker Image](#another-option-docker-image)
- [A from-scratch setup script](#a-from-scratch-setup-script)
- [Developing with multiple MMAction2 versions](#developing-with-multiple-mmaction2-versions)
- [Verification](#verification)
<!-- TOC -->
## Requirements
- Linux, Windows (We can successfully install mmaction2 on Windows and run inference, but we haven't tried training yet)
- Python 3.6+
- PyTorch 1.3+
- CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible)
- GCC 5+
- [mmcv](https://github.com/open-mmlab/mmcv) 1.1.1+
- Numpy
- ffmpeg (4.2 is preferred)
- [decord](https://github.com/dmlc/decord) (optional, 0.4.1+): Install CPU version by `pip install decord==0.4.1` and install GPU version from source
- [PyAV](https://github.com/mikeboers/PyAV) (optional): `conda install av -c conda-forge -y`
- [PyTurboJPEG](https://github.com/lilohuang/PyTurboJPEG) (optional): `pip install PyTurboJPEG`
- [denseflow](https://github.com/open-mmlab/denseflow) (optional): See [here](https://github.com/innerlee/setup) for simple install scripts.
- [moviepy](https://zulko.github.io/moviepy/) (optional): `pip install moviepy`. See [here](https://zulko.github.io/moviepy/install.html) for official installation. **Note** (according to [this issue](https://github.com/Zulko/moviepy/issues/693)):
1. For Windows users, [ImageMagick](https://www.imagemagick.org/script/index.php) will not be automatically detected by MoviePy,
there is a need to modify `moviepy/config_defaults.py` file by providing the path to the ImageMagick binary called `magick`, like `IMAGEMAGICK_BINARY = "C:\\Program Files\\ImageMagick_VERSION\\magick.exe"`
2. For Linux users, there is a need to modify the `/etc/ImageMagick-6/policy.xml` file by commenting out
`<policy domain="path" rights="none" pattern="@*" />` to `<!-- <policy domain="path" rights="none" pattern="@*" /> -->`, if [ImageMagick](https://www.imagemagick.org/script/index.php) is not detected by `moviepy`.
- [Pillow-SIMD](https://docs.fast.ai/performance.html#pillow-simd) (optional): Install it by the following scripts.
```shell
conda uninstall -y --force pillow pil jpeg libtiff libjpeg-turbo
pip uninstall -y pillow pil jpeg libtiff libjpeg-turbo
conda install -yc conda-forge libjpeg-turbo
CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
conda install -y jpeg libtiff
```
:::{note}
You need to run `pip uninstall mmcv` first if you have mmcv installed.
If mmcv and mmcv-full are both installed, there will be `ModuleNotFoundError`.
:::
## Prepare environment
a. Create a conda virtual environment and activate it.
```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
```
b. Install PyTorch and torchvision following the [official instructions](https://pytorch.org/), e.g.,
```shell
conda install pytorch torchvision -c pytorch
```
:::{note}
Make sure that your compilation CUDA version and runtime CUDA version match.
You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/).
`E.g.1` If you have CUDA 10.1 installed under `/usr/local/cuda` and would like to install PyTorch 1.5,
you need to install the prebuilt PyTorch with CUDA 10.1.
```shell
conda install pytorch cudatoolkit=10.1 torchvision -c pytorch
```
`E.g.2` If you have CUDA 9.2 installed under `/usr/local/cuda` and would like to install PyTorch 1.3.1,
you need to install the prebuilt PyTorch with CUDA 9.2.
```shell
conda install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2 -c pytorch
```
If you build PyTorch from source instead of installing the prebuilt package, you can use more CUDA versions such as 9.0.
:::
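For example, you can compare the toolkit version under `/usr/local/cuda` with the CUDA version your PyTorch build was compiled against (a minimal sketch):

```shell
# CUDA toolkit installed locally (used when compiling CUDA extensions)
nvcc --version | grep release
# CUDA version the installed PyTorch binaries were built with
python -c "import torch; print(torch.version.cuda)"
```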
## Install MMAction2
We recommend installing MMAction2 with [MIM](https://github.com/open-mmlab/mim).
```shell
pip install git+https://github.com/open-mmlab/mim.git
mim install mmaction2 -f https://github.com/open-mmlab/mmaction2.git
```
MIM can automatically install OpenMMLab projects and their requirements.
Or, you can install MMAction2 manually:
a. Install mmcv-full. We recommend installing the pre-built package as below.
```shell
# pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html
```
mmcv-full is only compiled on PyTorch 1.x.0 because the compatibility usually holds between 1.x.0 and 1.x.1. If your PyTorch version is 1.x.1, you can install mmcv-full compiled with PyTorch 1.x.0 and it usually works well.
```shell
# We can ignore the micro version of PyTorch
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10/index.html
```
See [here](https://github.com/open-mmlab/mmcv#installation) for different versions of MMCV compatible to different PyTorch and CUDA versions.
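If you are unsure which wheel index to use, printing the installed PyTorch and CUDA versions gives the two values that appear in the URL (a sketch; drop the micro version of PyTorch as noted above):

```shell
# prints e.g. "1.10.1 10.2" -> use .../cu102/torch1.10/index.html
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```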
Optionally, you can compile mmcv from source with the following commands:
```shell
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e . # package mmcv-full, which contains cuda ops, will be installed after this step
# OR pip install -e . # package mmcv, which contains no cuda ops, will be installed after this step
cd ..
```
Or directly run
```shell
pip install mmcv-full
# alternative: pip install mmcv
```
**Important:** You need to run `pip uninstall mmcv` first if you have mmcv installed. If mmcv and mmcv-full are both installed, there will be `ModuleNotFoundError`.
b. Clone the MMAction2 repository.
```shell
git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
```
c. Install build requirements and then install MMAction2.
```shell
pip install -r requirements/build.txt
pip install -v -e . # or "python setup.py develop"
```
If you build MMAction2 on macOS, replace the last command with
```shell
CC=clang CXX=clang++ CFLAGS='-stdlib=libc++' pip install -e .
```
d. Install mmdetection for spatio-temporal detection tasks.
This part is **optional** if you're not going to do spatio-temporal detection.
See [here](https://github.com/open-mmlab/mmdetection#installation) to install mmdetection.
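As a minimal sketch (the linked guide is authoritative; pick an mmdet version compatible with your installed mmcv-full):

```shell
# install mmdetection from PyPI for spatio-temporal detection models (e.g. AVA)
pip install mmdet
```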
:::{note}
1. The git commit id will be written to the version number with step c, e.g. 0.6.0+2e7045c. The version will also be saved in trained models.
It is recommended that you run step c each time you pull updates from GitHub. If C++/CUDA code is modified, then this step is compulsory.
2. Following the above instructions, MMAction2 is installed in `dev` mode; any local modifications made to the code will take effect without reinstalling it (unless you submit some commits and want to update the version number).
3. If you would like to use `opencv-python-headless` instead of `opencv-python`,
you can install it before installing MMCV.
4. If you would like to use `PyAV`, you can install it with `conda install av -c conda-forge -y`.
5. Some dependencies are optional. Running `python setup.py develop` will only install the minimum runtime requirements.
To use optional dependencies like `decord`, either install them with `pip install -r requirements/optional.txt`,
or specify the desired extras when calling `pip` (e.g. `pip install -v -e .[optional]`;
valid keys for the `[optional]` field are `all`, `tests`, `build`, and `optional`), e.g. `pip install -v -e .[tests,build]`.
:::
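For instance, a couple of common combinations (a sketch; the extras keys are the ones listed in note 5 above):

```shell
# minimum runtime requirements only
pip install -v -e .
# additionally pull in the optional dependencies (e.g. decord, moviepy)
pip install -v -e .[optional]
# or install the test and build extras together
pip install -v -e .[tests,build]
```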
## Install with CPU only
The code can be built in a CPU-only environment (where CUDA is not available).
In CPU mode you can run `demo/demo.py`, for example.
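A CPU-only setup might look like the following sketch; it assumes the mmcv wheel index offers `cpu` builds alongside the `cuXXX` ones:

```shell
# CPU-only PyTorch build
conda install pytorch torchvision cpuonly -c pytorch
# CPU build of mmcv-full (adjust the torch version to match your install)
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cpu/torch1.10.0/index.html
# install MMAction2 itself
pip install -v -e .
```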
## Another option: Docker Image
We provide a [Dockerfile](/docker/Dockerfile) to build an image.
```shell
# build an image with PyTorch 1.6.0, CUDA 10.1, CUDNN 7.
docker build -f ./docker/Dockerfile --rm -t mmaction2 .
```
**Important:** Make sure you've installed the [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker).
Run it with the following command:
```shell
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmaction2/data mmaction2
```
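Inside the container you can then run the usual commands against the mounted data directory, for example (a sketch; the config name is one of the standard TSN configs, adjust it to your experiment):

```shell
# verify the installation inside the container
python -c "import mmaction; print(mmaction.__version__)"
# launch a training run using the data mounted at /mmaction2/data
python tools/train.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py
```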
## A from-scratch setup script
Here is a full script for setting up MMAction2 with conda and linking the dataset path (supposing that your Kinetics-400 dataset path is `$KINETICS400_ROOT`).
```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
# install latest pytorch prebuilt with the default prebuilt CUDA version (usually the latest)
conda install -c pytorch pytorch torchvision -y
# install the latest mmcv or mmcv-full; here we take mmcv as an example
pip install mmcv
# install mmaction2
git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
pip install -r requirements/build.txt
python setup.py develop
mkdir data
ln -s $KINETICS400_ROOT data
```
## Developing with multiple MMAction2 versions
The train and test scripts already modify the `PYTHONPATH` to ensure that they use the MMAction2 in the current directory.
To use the default MMAction2 installed in the environment instead of the one you are working with, remove the following line from those scripts:
```shell
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
```
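To confirm which copy of MMAction2 is actually picked up, you can print the resolved package location (the path shown will differ per machine):

```shell
# shows whether the local repository or the site-packages install is being used
python -c "import mmaction; print(mmaction.__file__)"
```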
## Verification
To verify whether MMAction2 and the required environment are installed correctly,
we can run sample Python code to initialize a recognizer and run inference on a demo video:
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer

config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
device = 'cuda:0'  # or 'cpu'
device = torch.device(device)

# build the recognizer from the config file (no checkpoint weights are loaded here)
model = init_recognizer(config_file, device=device)
# run inference on the demo video
inference_recognizer(model, 'demo/demo.mp4')
```
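If you also want to check that pretrained weights load correctly, the same snippet can be extended with a checkpoint; the sketch below uses a placeholder local path (download a matching checkpoint from the MMAction2 model zoo first), and the exact return format of `inference_recognizer` varies slightly across 0.x releases.

```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer

config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
checkpoint_file = 'checkpoints/tsn_kinetics400_rgb.pth'  # placeholder path, not a real file name

model = init_recognizer(config_file, checkpoint_file, device=torch.device('cuda:0'))
results = inference_recognizer(model, 'demo/demo.mp4')
# typically a list of (label index, score) pairs sorted by score
print(results[:5])
```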
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.http://sphinx-doc.org/
	exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd
#!/usr/bin/env bash
# ensure ../demo/README.md ends with a blank line before concatenation
sed -i '$a\\n' ../demo/README.md
# gather models
cat ../configs/localization/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Action Localization Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > localization_models.md
cat ../configs/recognition/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Action Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > recognition_models.md
cat ../configs/recognition_audio/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" >> recognition_models.md
cat ../configs/detection/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Spatio Temporal Action Detection Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > detection_models.md
cat ../configs/skeleton/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Skeleton-based Action Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > skeleton_models.md
# demo
cat ../demo/README.md | sed "s/md#t/html#t/g" | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > demo.md
# gather datasets
cat ../tools/data/*/README.md | sed 's/# Preparing/# /g' | sed 's/#/#&/' > prepare_data.md
sed -i 's/(\/tools\/data\/activitynet\/README.md/(#activitynet/g' supported_datasets.md
sed -i 's/(\/tools\/data\/kinetics\/README.md/(#kinetics-400-600-700/g' supported_datasets.md
sed -i 's/(\/tools\/data\/mit\/README.md/(#moments-in-time/g' supported_datasets.md
sed -i 's/(\/tools\/data\/mmit\/README.md/(#multi-moments-in-time/g' supported_datasets.md
sed -i 's/(\/tools\/data\/sthv1\/README.md/(#something-something-v1/g' supported_datasets.md
sed -i 's/(\/tools\/data\/sthv2\/README.md/(#something-something-v2/g' supported_datasets.md
sed -i 's/(\/tools\/data\/thumos14\/README.md/(#thumos-14/g' supported_datasets.md
sed -i 's/(\/tools\/data\/ucf101\/README.md/(#ucf-101/g' supported_datasets.md
sed -i 's/(\/tools\/data\/ucf101_24\/README.md/(#ucf101-24/g' supported_datasets.md
sed -i 's/(\/tools\/data\/jhmdb\/README.md/(#jhmdb/g' supported_datasets.md
sed -i 's/(\/tools\/data\/hvu\/README.md/(#hvu/g' supported_datasets.md
sed -i 's/(\/tools\/data\/hmdb51\/README.md/(#hmdb51/g' supported_datasets.md
sed -i 's/(\/tools\/data\/jester\/README.md/(#jester/g' supported_datasets.md
sed -i 's/(\/tools\/data\/ava\/README.md/(#ava/g' supported_datasets.md
sed -i 's/(\/tools\/data\/gym\/README.md/(#gym/g' supported_datasets.md
sed -i 's/(\/tools\/data\/omnisource\/README.md/(#omnisource/g' supported_datasets.md
sed -i 's/(\/tools\/data\/diving48\/README.md/(#diving48/g' supported_datasets.md
sed -i 's/(\/tools\/data\/skeleton\/README.md/(#skeleton/g' supported_datasets.md
cat prepare_data.md >> supported_datasets.md
sed -i 's/](\/docs\//](/g' supported_datasets.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' supported_datasets.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' benchmark.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' getting_started.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' install.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' changelog.md
sed -i 's/](\/docs\//](/g' ./tutorials/*.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' ./tutorials/*.md
# Projects based on MMAction2
There are many research works and projects built on MMAction2.
We list some of them as examples of how to extend MMAction2 for your own projects.
As this page may not be complete, please feel free to create a PR to update it.
## Projects as an extension
- [OTEAction2](https://github.com/openvinotoolkit/mmaction2): OpenVINO Training Extensions for Action Recognition.
## Projects of papers
There are also projects released together with papers.
Some of the papers are published in top-tier conferences (CVPR, ICCV, and ECCV); the others are also highly influential.
To make this list a useful reference for the community to develop and compare new video understanding algorithms, we list them in the chronological order of top-tier conferences.
Methods already supported and maintained by MMAction2 are not listed.
- Evidential Deep Learning for Open Set Action Recognition, ICCV 2021 Oral. [\[paper\]](https://arxiv.org/abs/2107.10161)[\[github\]](https://github.com/Cogito2012/DEAR)
- Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective, ICCV 2021 Oral. [\[paper\]](https://arxiv.org/abs/2103.17263)[\[github\]](https://github.com/xvjiarui/VFS)
- MGSampler: An Explainable Sampling Strategy for Video Action Recognition, ICCV 2021. [\[paper\]](https://arxiv.org/abs/2104.09952)[\[github\]](https://github.com/MCG-NJU/MGSampler)
- MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions, ICCV 2021. [\[paper\]](https://arxiv.org/abs/2105.07404)
- Video Swin Transformer. [\[paper\]](https://arxiv.org/abs/2106.13230)[\[github\]](https://github.com/SwinTransformer/Video-Swin-Transformer)
- Long Short-Term Transformer for Online Action Detection. [\[paper\]](https://arxiv.org/abs/2107.03377)