# Feature Extraction
We provide easy-to-use scripts for feature extraction.
## Clip-level Feature Extraction
Clip-level feature extraction extracts deep features from a video clip, which usually lasts from several seconds to tens of seconds. The extracted feature is an n-dim vector for each clip. When performing multi-view feature extraction, e.g., n clips x m crops, the extracted feature is the average of the n * m views.
Before applying clip-level feature extraction, you need to prepare a video list (which includes all videos that you want to extract features from). For example, the video list for UCF101 videos will look like:
```
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c02.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c03.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c04.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c05.avi
...
YoYo/v_YoYo_g25_c01.avi
YoYo/v_YoYo_g25_c02.avi
YoYo/v_YoYo_g25_c03.avi
YoYo/v_YoYo_g25_c04.avi
YoYo/v_YoYo_g25_c05.avi
```
Assume the root of the UCF101 videos is `data/ucf101/videos` and the video list is named `ucf101.txt`. To extract clip-level features of UCF101 videos with a Kinetics-400 pretrained TSN, you can use the following script:
```shell
python tools/misc/clip_feature_extraction.py \
configs/recognition/tsn/tsn_r50_clip_feature_extraction_1x1x3_rgb.py \
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth \
--video-list ucf101.txt \
--video-root data/ucf101/videos \
--out ucf101_feature.pkl
```
The extracted features will be stored in `ucf101_feature.pkl`.
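To inspect the result, you can load the file with `pickle`. Below is a minimal sketch, assuming the dump holds one feature per video in the order of the video list (the exact structure may differ between versions):
```python
import pickle

with open('ucf101_feature.pkl', 'rb') as f:
    features = pickle.load(f)

print(type(features), len(features))                # e.g. one entry per video
print(getattr(features[0], 'shape', None))          # feature dimensionality, if ndarray
```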
You can also run clip-level feature extraction in a distributed manner. Below is an example for a node with 8 GPUs.
```shell
bash tools/misc/dist_clip_feature_extraction.sh \
configs/recognition/tsn/tsn_r50_clip_feature_extraction_1x1x3_rgb.py \
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth \
8 \
--video-list ucf101.txt \
--video-root data/ucf101/videos \
--out ucf101_feature.pkl
```
To extract clip-level features of UCF101 videos with a Kinetics-400 pretrained SlowOnly, you can use the following script:
```shell
python tools/misc/clip_feature_extraction.py \
configs/recognition/slowonly/slowonly_r50_clip_feature_extraction_4x16x1_rgb.py \
https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth \
--video-list ucf101.txt \
--video-root data/ucf101/videos \
--out ucf101_feature.pkl
```
The two config files above demonstrate what a minimal config file for feature extraction looks like. You can also use other existing config files for feature extraction, as long as they use videos rather than raw frames for training and testing:
```shell
python tools/misc/clip_feature_extraction.py \
configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py \
https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth \
--video-list ucf101.txt \
--video-root data/ucf101/videos \
--out ucf101_feature.pkl
```
# Getting Started
This page provides basic tutorials about the usage of MMAction2.
For installation instructions, please see [install.md](install.md).
<!-- TOC -->
- [Getting Started](#getting-started)
- [Datasets](#datasets)
- [Inference with Pre-Trained Models](#inference-with-pre-trained-models)
- [Test a dataset](#test-a-dataset)
- [High-level APIs for testing a video and rawframes](#high-level-apis-for-testing-a-video-and-rawframes)
- [Build a Model](#build-a-model)
- [Build a model with basic components](#build-a-model-with-basic-components)
- [Write a new model](#write-a-new-model)
- [Train a Model](#train-a-model)
- [Iteration pipeline](#iteration-pipeline)
- [Training setting](#training-setting)
- [Train with a single GPU](#train-with-a-single-gpu)
- [Train with multiple GPUs](#train-with-multiple-gpus)
- [Train with multiple machines](#train-with-multiple-machines)
- [Launch multiple jobs on a single machine](#launch-multiple-jobs-on-a-single-machine)
- [Tutorials](#tutorials)
<!-- TOC -->
## Datasets
It is recommended to symlink the dataset root to `$MMACTION2/data`.
If your folder structure is different, you may need to change the corresponding paths in config files.
```
mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── kinetics400
│   │   ├── rawframes_train
│   │   ├── rawframes_val
│   │   ├── kinetics_train_list.txt
│   │   ├── kinetics_val_list.txt
│   ├── ucf101
│   │   ├── rawframes_train
│   │   ├── rawframes_val
│   │   ├── ucf101_train_list.txt
│   │   ├── ucf101_val_list.txt
│   ├── ...
```
For more information on data preparation, please see [data_preparation.md](data_preparation.md).
For using custom datasets, please refer to [Tutorial 3: Adding New Dataset](tutorials/3_new_dataset.md).
## Inference with Pre-Trained Models
We provide testing scripts to evaluate a whole dataset (Kinetics-400, Something-Something V1&V2, (Multi-)Moments in Time, etc.),
and provide some high-level APIs for easier integration into other projects.
MMAction2 also supports testing with CPU. However, it will be **very slow** and should only be used for debugging on a device without GPU.
To test with CPU, one should first disable all GPUs (if exist) with `export CUDA_VISIBLE_DEVICES=-1`, and then call the testing scripts directly with `python tools/test.py {OTHER_ARGS}`.
### Test a dataset
- [x] single GPU
- [x] single node multiple GPUs
- [x] multiple nodes
You can use the following commands to test a dataset.
```shell
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] \
[--gpu-collect] [--tmpdir ${TMPDIR}] [--options ${OPTIONS}] [--average-clips ${AVG_TYPE}] \
[--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}] [--onnx] [--tensorrt]
# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] \
[--gpu-collect] [--tmpdir ${TMPDIR}] [--options ${OPTIONS}] [--average-clips ${AVG_TYPE}] \
[--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}]
```
Optional arguments:
- `RESULT_FILE`: Filename of the output results. If not specified, the results will not be saved to a file.
- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., `top_k_accuracy` and `mean_class_accuracy` are available for all recognition datasets, `mmit_mean_average_precision` for Multi-Moments in Time, `mean_average_precision` for Multi-Moments in Time and HVU single categories, `AR@AN` for ActivityNet, etc.
- `--gpu-collect`: If specified, recognition results will be collected using gpu communication. Otherwise, it will save the results on different gpus to `TMPDIR` and collect them by the rank 0 worker.
- `TMPDIR`: Temporary directory used for collecting results from multiple workers, available when `--gpu-collect` is not specified.
- `OPTIONS`: Custom options used for evaluation. Allowed values depend on the arguments of the `evaluate` function in dataset.
- `AVG_TYPE`: The type of averaging applied to the test clips. If set to `prob`, it will apply softmax before averaging the clip scores; otherwise, it will average the clip scores directly (see the short sketch after this list).
- `JOB_LAUNCHER`: The launcher for distributed job initialization. Allowed choices are `none`, `pytorch`, `slurm`, `mpi`. In particular, if set to `none`, the test will run in non-distributed mode.
- `LOCAL_RANK`: ID for local rank. If not specified, it will be set to 0.
- `--onnx`: If specified, recognition results will be generated by an ONNX model, and `CHECKPOINT_FILE` should be the path to an ONNX model file. ONNX model files are generated by `/tools/deployment/pytorch2onnx.py`. For now, multi-gpu mode and dynamic input shape mode are not supported. Please note that the output tensors of the dataset and the input tensors of the ONNX model should share the same shape. It is also recommended to remove all test-time augmentation methods in `test_pipeline` (`ThreeCrop`, `TenCrop`, `twice_sample`, etc.).
- `--tensorrt`: If specified, recognition results will be generated by a TensorRT engine, and `CHECKPOINT_FILE` should be the path to a TensorRT engine file. TensorRT engines are generated from exported ONNX models with the TensorRT official conversion tools. For now, multi-gpu mode and dynamic input shape mode are not supported. Please note that the output tensors of the dataset and the input tensors of the TensorRT engine should share the same shape. It is also recommended to remove all test-time augmentation methods in `test_pipeline` (`ThreeCrop`, `TenCrop`, `twice_sample`, etc.).
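To make the `--average-clips` option above concrete, here is a hedged NumPy sketch (not the actual implementation) of averaging raw clip scores versus averaging softmax probabilities:
```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

clip_scores = np.random.randn(3, 5)            # hypothetical: 3 clips, 5 classes

avg_score = clip_scores.mean(axis=0)           # AVG_TYPE == 'score'
avg_prob = softmax(clip_scores).mean(axis=0)   # AVG_TYPE == 'prob'
print(avg_score.argmax(), avg_prob.argmax())
```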
Examples:
Assume that you have already downloaded the checkpoints to the directory `checkpoints/`.
1. Test TSN on Kinetics-400 (without saving the test results) and evaluate the top-k accuracy and mean class accuracy.
```shell
python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \
checkpoints/SOME_CHECKPOINT.pth \
--eval top_k_accuracy mean_class_accuracy
```
2. Test TSN on Something-Something V1 with 8 GPUs, and evaluate the top-k accuracy.
```shell
./tools/dist_test.sh configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb.py \
checkpoints/SOME_CHECKPOINT.pth \
8 --out results.pkl --eval top_k_accuracy
```
3. Test TSN on Kinetics-400 in a slurm environment and evaluate the top-k accuracy.
```shell
python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \
checkpoints/SOME_CHECKPOINT.pth \
--launcher slurm --eval top_k_accuracy
```
4. Test TSN on Kinetics-400 with an ONNX model and evaluate the top-k accuracy.
```shell
python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \
checkpoints/SOME_CHECKPOINT.onnx \
--eval top_k_accuracy --onnx
```
### High-level APIs for testing a video and rawframes
Here is an example of building the model and testing a given video.
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer
config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
# download the checkpoint from model zoo and put it in `checkpoints/`
checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'
# assign the desired device.
device = 'cuda:0' # or 'cpu'
device = torch.device(device)
# build the model from a config file and a checkpoint file
model = init_recognizer(config_file, checkpoint_file, device=device)
# test a single video and show the result:
video = 'demo/demo.mp4'
results = inference_recognizer(model, video)

# show the results
labels = open('tools/data/kinetics/label_map_k400.txt').readlines()
labels = [x.strip() for x in labels]
results = [(labels[k[0]], k[1]) for k in results]

print('The top-5 labels with corresponding scores are:')
for result in results:
    print(f'{result[0]}: ', result[1])
```
Here is an example of building the model and testing with a given rawframes directory.
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer
config_file = 'configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py'
# download the checkpoint from model zoo and put it in `checkpoints/`
checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'
# assign the desired device.
device = 'cuda:0' # or 'cpu'
device = torch.device(device)
# build the model from a config file and a checkpoint file
model = init_recognizer(config_file, checkpoint_file, device=device)
# test the rawframe directory of a single video and show the result:
video = 'SOME_DIR_PATH/'
results = inference_recognizer(model, video)

# show the results
labels = open('tools/data/kinetics/label_map_k400.txt').readlines()
labels = [x.strip() for x in labels]
results = [(labels[k[0]], k[1]) for k in results]

print('The top-5 labels with corresponding scores are:')
for result in results:
    print(f'{result[0]}: ', result[1])
```
Here is an example of building the model and testing with a given video URL.
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer
config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
# download the checkpoint from model zoo and put it in `checkpoints/`
checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'
# assign the desired device.
device = 'cuda:0' # or 'cpu'
device = torch.device(device)
# build the model from a config file and a checkpoint file
model = init_recognizer(config_file, checkpoint_file, device=device)
# test the URL of a single video and show the result:
video = 'https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4'
results = inference_recognizer(model, video)

# show the results
labels = open('tools/data/kinetics/label_map_k400.txt').readlines()
labels = [x.strip() for x in labels]
results = [(labels[k[0]], k[1]) for k in results]

print('The top-5 labels with corresponding scores are:')
for result in results:
    print(f'{result[0]}: ', result[1])
```
:::{note}
We define `data_prefix` in the config files and set it to `None` by default in our provided inference configs.
If `data_prefix` is not `None`, the path of the video file (or rawframe directory) to load will be `data_prefix/video`,
where `video` is the parameter used in the demo scripts above.
This detail can be found in `rawframe_dataset.py` and `video_dataset.py`. For example,
- When video (rawframes) path is `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME/img_xxxxx.jpg`), and `data_prefix` is None in the config file,
the param `video` should be `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME`).
- When video (rawframes) path is `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME/img_xxxxx.jpg`), and `data_prefix` is `SOME_DIR_PATH` in the config file,
the param `video` should be `VIDEO.mp4` (`VIDEO_NAME`).
- When rawframes path is `VIDEO_NAME/img_xxxxx.jpg`, and `data_prefix` is None in the config file, the param `video` should be `VIDEO_NAME`.
- When passing a url instead of a local video file, you need to use OpenCV as the video decoding backend.
:::
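The rules above boil down to a simple join; here is a small illustrative sketch (not the dataset code itself):
```python
import os.path as osp

def resolve_path(data_prefix, video):
    # mimic how the dataset combines data_prefix with the `video` argument
    return video if data_prefix is None else osp.join(data_prefix, video)

print(resolve_path(None, 'SOME_DIR_PATH/VIDEO.mp4'))   # SOME_DIR_PATH/VIDEO.mp4
print(resolve_path('SOME_DIR_PATH', 'VIDEO.mp4'))      # SOME_DIR_PATH/VIDEO.mp4
```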
A notebook demo can be found in [demo/demo.ipynb](/demo/demo.ipynb).
## Build a Model
### Build a model with basic components
In MMAction2, model components are basically categorized into 4 types.
- recognizer: the whole recognizer model pipeline, usually contains a backbone and cls_head.
- backbone: usually an FCN network to extract feature maps, e.g., ResNet, BNInception.
- cls_head: the component for classification task, usually contains an FC layer with some pooling layers.
- localizer: the model for localization task, currently available: BSN, BMN.
Following some basic pipelines (e.g., `Recognizer2D`), the model structure
can be customized through config files with no pain.
If we want to implement some new components, e.g., the temporal shift backbone structure as
in [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383), there are several things to do.
1. Create a new file `mmaction/models/backbones/resnet_tsm.py`.
```python
from ..builder import BACKBONES
from .resnet import ResNet


@BACKBONES.register_module()
class ResNetTSM(ResNet):

    def __init__(self,
                 depth,
                 num_segments=8,
                 is_shift=True,
                 shift_div=8,
                 shift_place='blockres',
                 temporal_pool=False,
                 **kwargs):
        # initialize the plain ResNet first, then record the TSM-specific options
        super().__init__(depth, **kwargs)
        self.num_segments = num_segments
        self.is_shift = is_shift
        self.shift_div = shift_div
        self.shift_place = shift_place
        self.temporal_pool = temporal_pool

    def forward(self, x):
        # implementation of the temporal shift is omitted here
        pass
```
2. Import the module in `mmaction/models/backbones/__init__.py`
```python
from .resnet_tsm import ResNetTSM
```
3. Modify the config file from
```python
backbone=dict(
    type='ResNet',
    pretrained='torchvision://resnet50',
    depth=50,
    norm_eval=False)
```
to
```python
backbone=dict(
    type='ResNetTSM',
    pretrained='torchvision://resnet50',
    depth=50,
    norm_eval=False,
    shift_div=8)
```
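To sanity-check a newly registered backbone without launching a full training run, you can build it directly from its config dict. A minimal sketch, assuming `build_backbone` is exposed by `mmaction.models`:
```python
import torch
from mmaction.models import build_backbone

backbone = build_backbone(
    dict(type='ResNetTSM', pretrained=None, depth=50, norm_eval=False, shift_div=8))
backbone.init_weights()
# dummy input: (batch * num_segments, channels, height, width)
feat = backbone(torch.randn(8, 3, 224, 224))
print(feat.shape)
```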
### Write a new model
To write a new recognition pipeline, you need to inherit from `BaseRecognizer`,
which defines the following abstract methods.
- `forward_train()`: forward method of the training mode.
- `forward_test()`: forward method of the testing mode.
[Recognizer2D](/mmaction/models/recognizers/recognizer2d.py) and [Recognizer3D](/mmaction/models/recognizers/recognizer3d.py)
are good examples which show how to do that.
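As a rough sketch only (the exact signatures and loss handling differ between recognizers; see the two classes above for the real interface), a new recognizer placed under `mmaction/models/recognizers/` could look like:
```python
from ..builder import RECOGNIZERS
from .base import BaseRecognizer


@RECOGNIZERS.register_module()
class MyRecognizer(BaseRecognizer):
    """Schematic recognizer: extract features, classify, return losses/scores."""

    def forward_train(self, imgs, labels, **kwargs):
        # training mode: compute classification losses
        x = self.extract_feat(imgs)
        cls_score = self.cls_head(x)
        return self.cls_head.loss(cls_score, labels.squeeze(), **kwargs)

    def forward_test(self, imgs):
        # testing mode: return (clip-averaged) class scores
        x = self.extract_feat(imgs)
        cls_score = self.cls_head(x)
        return self.average_clip(cls_score).cpu().numpy()
```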
## Train a Model
### Iteration pipeline
MMAction2 implements distributed training and non-distributed training,
which use `MMDistributedDataParallel` and `MMDataParallel` respectively.
We adopt distributed training for both single machine and multiple machines.
Supposing that the server has 8 GPUs, 8 processes will be started and each process runs on a single GPU.
Each process keeps an isolated model, data loader, and optimizer.
Model parameters are only synchronized once at the beginning.
After a forward and backward pass, gradients will be allreduced among all GPUs,
and the optimizer will update model parameters.
Since the gradients are allreduced, the model parameters stay the same across all processes after each iteration.
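For intuition, here is a toy PyTorch sketch of that mechanism (plain `DistributedDataParallel`, not MMAction2's wrappers); the address, port, and layer sizes are arbitrary:
```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# a single-process "world" is enough to show the mechanics; the real launchers
# start one such process per GPU
dist.init_process_group(
    backend='gloo', init_method='tcp://127.0.0.1:29500', rank=0, world_size=1)

model = DDP(torch.nn.Linear(16, 4))   # parameters are broadcast from rank 0
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

loss = model(torch.randn(8, 16)).sum()
loss.backward()    # gradients are allreduced across processes here
optimizer.step()   # so every process applies the same update
dist.destroy_process_group()
```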
### Training setting
All outputs (log files and checkpoints) will be saved to the working directory,
which is specified by `work_dir` in the config file.
By default, we evaluate the model on the validation set after each epoch. You can change the evaluation interval by modifying the `interval` argument in the training config:
```python
evaluation = dict(interval=5)  # This evaluates the model every 5 epochs.
```
According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you need to set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, e.g., lr=0.01 for 4 GPUs x 2 videos/GPU and lr=0.08 for 16 GPUs x 4 videos/GPU (see the short sketch below).
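The scaling itself is simple arithmetic; a small illustrative sketch using the numbers above:
```python
# linear scaling rule: lr grows with the total batch size (GPUs x videos per GPU)
base_lr, base_batch = 0.01, 4 * 2      # reference point: 4 GPUs x 2 videos/GPU
num_gpus, videos_per_gpu = 16, 4
lr = base_lr * (num_gpus * videos_per_gpu) / base_batch
print(lr)  # 0.08
```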
MMAction2 also supports training with CPU. However, it will be **very slow** and should only be used for debugging on a device without GPU.
To train with CPU, one should first disable all GPUs (if exist) with `export CUDA_VISIBLE_DEVICES=-1`, and then call the training scripts directly with `python tools/train.py {OTHER_ARGS}`.
### Train with a single GPU
```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```
If you want to specify the working directory in the command, you can add an argument `--work-dir ${YOUR_WORK_DIR}`.
### Train with multiple GPUs
```shell
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
```
Optional arguments are:
- `--validate` (**strongly recommended**): Perform evaluation every k epochs during training (k defaults to 5 and can be modified by changing the `interval` value of the `evaluation` dict in each config file).
- `--test-last`: Test the final checkpoint when training is over, save the prediction to `${WORK_DIR}/last_pred.pkl`.
- `--test-best`: Test the best checkpoint when training is over, save the prediction to `${WORK_DIR}/best_pred.pkl`.
- `--work-dir ${WORK_DIR}`: Override the working directory specified in the config file.
- `--resume-from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
- `--gpus ${GPU_NUM}`: Number of gpus to use, which is only applicable to non-distributed training.
- `--gpu-ids ${GPU_IDS}`: IDs of gpus to use, which is only applicable to non-distributed training.
- `--seed ${SEED}`: Seed id for random state in python, numpy and pytorch to generate random numbers.
- `--deterministic`: If specified, it will set deterministic options for CUDNN backend.
- `JOB_LAUNCHER`: The launcher for distributed job initialization. Allowed choices are `none`, `pytorch`, `slurm`, `mpi`. In particular, if set to `none`, the job will run in non-distributed mode.
- `LOCAL_RANK`: ID for local rank. If not specified, it will be set to 0.
Difference between `resume-from` and `load-from`:
`resume-from` loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming the training process that is interrupted accidentally.
`load-from` only loads the model weights and the training epoch starts from 0. It is usually used for finetuning.
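For reference, both behaviors can also be set directly in a config file; the values below are placeholders only:
```python
# finetuning: load weights only, training starts from epoch 0
load_from = 'checkpoints/SOME_CHECKPOINT.pth'
# resuming: restore weights, optimizer state and epoch from a previous run
resume_from = None  # e.g. 'work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/latest.pth'
```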
Here is an example of using 8 GPUs to resume training TSN from a previous checkpoint.
```shell
./tools/dist_train.sh configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py 8 --resume-from work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/latest.pth
```
### Train with multiple machines
If you can run MMAction2 on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`. (This script also supports single machine training.)
```shell
[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} [--work-dir ${WORK_DIR}]
```
Here is an example of using 16 GPUs to train TSN on the dev partition in a slurm cluster. (use `GPUS_PER_NODE=8` to specify a single slurm cluster node with 8 GPUs.)
```shell
GPUS=16 ./tools/slurm_train.sh dev tsn_r50_k400 configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb
```
You can check [slurm_train.sh](/tools/slurm_train.sh) for full arguments and environment variables.
If you have multiple machines that are simply connected via Ethernet, you can run the following commands:
On the first machine:
```shell
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
```
On the second machine:
```shell
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
```
It can be extremely slow if you do not have high-speed networking like InfiniBand.
### Launch multiple jobs on a single machine
If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
you need to specify different ports (29500 by default) for each job to avoid communication conflicts.
If you use `dist_train.sh` to launch training jobs, you can set the port in commands.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
```
If you launch training jobs with slurm, you need to modify `dist_params` in the config files (usually the 6th line from the bottom of the config file) to set different communication ports.
In `config1.py`,
```python
dist_params = dict(backend='nccl', port=29500)
```
In `config2.py`,
```python
dist_params = dict(backend='nccl', port=29501)
```
Then you can launch two jobs with `config1.py` and `config2.py`.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py [--work-dir ${WORK_DIR}]
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py [--work-dir ${WORK_DIR}]
```
## Tutorials
Currently, we provide some tutorials for users to [learn about configs](tutorials/1_config.md), [finetune model](tutorials/2_finetune.md),
[add new dataset](tutorials/3_new_dataset.md), [customize data pipelines](tutorials/4_data_pipeline.md),
[add new modules](tutorials/5_new_modules.md), [export a model to ONNX](tutorials/6_export_model.md) and [customize runtime settings](tutorials/7_customize_runtime.md).
Welcome to MMAction2's documentation!
=====================================
You can switch between Chinese and English documents in the lower-left corner of the layout.
您可以在页面左下角切换文档语言。
.. toctree::
   :maxdepth: 2

   install.md
   getting_started.md
   demo.md
   benchmark.md

.. toctree::
   :maxdepth: 2
   :caption: Datasets

   datasets.md
   data_preparation.md
   supported_datasets.md

.. toctree::
   :maxdepth: 2
   :caption: Model Zoo

   modelzoo.md
   recognition_models.md
   localization_models.md
   detection_models.md
   skeleton_models.md

.. toctree::
   :maxdepth: 2
   :caption: Tutorials

   tutorials/1_config.md
   tutorials/2_finetune.md
   tutorials/3_new_dataset.md
   tutorials/4_data_pipeline.md
   tutorials/5_new_modules.md
   tutorials/6_export_model.md
   tutorials/7_customize_runtime.md

.. toctree::
   :maxdepth: 2
   :caption: Useful Tools and Scripts

   useful_tools.md

.. toctree::
   :maxdepth: 2
   :caption: Notes

   changelog.md
   faq.md

.. toctree::
   :caption: API Reference

   api.rst

.. toctree::
   :caption: Switch Language

   switch_language.md

Indices and tables
==================
* :ref:`genindex`
* :ref:`search`
# Installation
We provide some tips for MMAction2 installation in this file.
<!-- TOC -->
- [Installation](#installation)
- [Requirements](#requirements)
- [Prepare environment](#prepare-environment)
- [Install MMAction2](#install-mmaction2)
- [Install with CPU only](#install-with-cpu-only)
- [Another option: Docker Image](#another-option-docker-image)
- [A from-scratch setup script](#a-from-scratch-setup-script)
- [Developing with multiple MMAction2 versions](#developing-with-multiple-mmaction2-versions)
- [Verification](#verification)
<!-- TOC -->
## Requirements
- Linux, Windows (We can successfully install mmaction2 on Windows and run inference, but we haven't tried training yet)
- Python 3.6+
- PyTorch 1.3+
- CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible)
- GCC 5+
- [mmcv](https://github.com/open-mmlab/mmcv) 1.1.1+
- Numpy
- ffmpeg (4.2 is preferred)
- [decord](https://github.com/dmlc/decord) (optional, 0.4.1+): Install CPU version by `pip install decord==0.4.1` and install GPU version from source
- [PyAV](https://github.com/mikeboers/PyAV) (optional): `conda install av -c conda-forge -y`
- [PyTurboJPEG](https://github.com/lilohuang/PyTurboJPEG) (optional): `pip install PyTurboJPEG`
- [denseflow](https://github.com/open-mmlab/denseflow) (optional): See [here](https://github.com/innerlee/setup) for simple install scripts.
- [moviepy](https://zulko.github.io/moviepy/) (optional): `pip install moviepy`. See [here](https://zulko.github.io/moviepy/install.html) for official installation. **Note**(according to [this issue](https://github.com/Zulko/moviepy/issues/693)) that:
1. For Windows users, [ImageMagick](https://www.imagemagick.org/script/index.php) will not be automatically detected by MoviePy,
there is a need to modify `moviepy/config_defaults.py` file by providing the path to the ImageMagick binary called `magick`, like `IMAGEMAGICK_BINARY = "C:\\Program Files\\ImageMagick_VERSION\\magick.exe"`
2. For Linux users, there is a need to modify the `/etc/ImageMagick-6/policy.xml` file by commenting out
`<policy domain="path" rights="none" pattern="@*" />` to `<!-- <policy domain="path" rights="none" pattern="@*" /> -->`, if [ImageMagick](https://www.imagemagick.org/script/index.php) is not detected by `moviepy`.
- [Pillow-SIMD](https://github.com/uploadcare/pillow-simd) (optional): Install it by the following scripts.
```shell
conda uninstall -y --force pillow pil jpeg libtiff libjpeg-turbo
pip uninstall -y pillow pil jpeg libtiff libjpeg-turbo
conda install -yc conda-forge libjpeg-turbo
CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
conda install -y jpeg libtiff
```
:::{note}
You need to run `pip uninstall mmcv` first if you have mmcv installed.
If mmcv and mmcv-full are both installed, there will be `ModuleNotFoundError`.
:::
## Prepare environment
a. Create a conda virtual environment and activate it.
```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
```
b. Install PyTorch and torchvision following the [official instructions](https://pytorch.org/), e.g.,
```shell
conda install pytorch torchvision -c pytorch
```
:::{note}
Make sure that your compilation CUDA version and runtime CUDA version match.
You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/).
`E.g.1` If you have CUDA 10.1 installed under `/usr/local/cuda` and would like to install PyTorch 1.5,
you need to install the prebuilt PyTorch with CUDA 10.1.
```shell
conda install pytorch cudatoolkit=10.1 torchvision -c pytorch
```
`E.g.2` If you have CUDA 9.2 installed under `/usr/local/cuda` and would like to install PyTorch 1.3.1.,
you need to install the prebuilt PyTorch with CUDA 9.2.
```shell
conda install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2 -c pytorch
```
If you build PyTorch from source instead of installing the prebuilt package, you can use more CUDA versions such as 9.0.
:::
## Install MMAction2
We recommend installing MMAction2 with [MIM](https://github.com/open-mmlab/mim).
```shell
pip install git+https://github.com/open-mmlab/mim.git
mim install mmaction2 -f https://github.com/open-mmlab/mmaction2.git
```
MIM can automatically install OpenMMLab projects and their requirements.
Or, you can install MMAction2 manually:
a. Install mmcv-full. We recommend installing the pre-built package as below.
```shell
# pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html
```
mmcv-full is only compiled on PyTorch 1.x.0 because the compatibility usually holds between 1.x.0 and 1.x.1. If your PyTorch version is 1.x.1, you can install mmcv-full compiled with PyTorch 1.x.0 and it usually works well.
```
# We can ignore the micro version of PyTorch
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10/index.html
```
See [here](https://github.com/open-mmlab/mmcv#installation) for different versions of MMCV compatible to different PyTorch and CUDA versions.
Optionally, you can compile mmcv from source with the following commands:
```shell
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e . # package mmcv-full, which contains cuda ops, will be installed after this step
# OR pip install -e . # package mmcv, which contains no cuda ops, will be installed after this step
cd ..
```
Or directly run
```shell
pip install mmcv-full
# alternative: pip install mmcv
```
**Important:** You need to run `pip uninstall mmcv` first if you have mmcv installed. If mmcv and mmcv-full are both installed, there will be `ModuleNotFoundError`.
b. Clone the MMAction2 repository.
```shell
git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
```
c. Install build requirements and then install MMAction2.
```shell
pip install -r requirements/build.txt
pip install -v -e . # or "python setup.py develop"
```
If you build MMAction2 on macOS, replace the last command with
```shell
CC=clang CXX=clang++ CFLAGS='-stdlib=libc++' pip install -e .
```
d. Install mmdetection for spatial temporal detection tasks.
This part is **optional** if you're not going to do spatial temporal detection.
See [here](https://github.com/open-mmlab/mmdetection#installation) to install mmdetection.
:::{note}
1. The git commit id will be written to the version number with step b, e.g. 0.6.0+2e7045c. The version will also be saved in trained models.
It is recommended that you run step b each time you pull some updates from github. If C++/CUDA codes are modified, then this step is compulsory.
2. Following the above instructions, MMAction2 is installed in `dev` mode: any local modifications made to the code will take effect without reinstalling it (unless you submit some commits and want to update the version number).
3. If you would like to use `opencv-python-headless` instead of `opencv-python`,
you can install it before installing MMCV.
4. If you would like to use `PyAV`, you can install it with `conda install av -c conda-forge -y`.
5. Some dependencies are optional. Running `python setup.py develop` will only install the minimum runtime requirements.
To use optional dependencies like `decord`, either install them with `pip install -r requirements/optional.txt`
or specify desired extras when calling `pip` (e.g. `pip install -v -e .[optional]`,
valid keys for the `[optional]` field are `all`, `tests`, `build`, and `optional`) like `pip install -v -e .[tests,build]`.
:::
## Install with CPU only
The code can be built for a CPU-only environment (where CUDA isn't available).
In CPU mode you can, for instance, run `demo/demo.py`.
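The following minimal sketch runs inference on the bundled demo video entirely on CPU (no checkpoint is loaded here, so the prediction itself is meaningless):
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer

config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
model = init_recognizer(config_file, device=torch.device('cpu'))
results = inference_recognizer(model, 'demo/demo.mp4')
print(results)
```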
## Another option: Docker Image
We provide a [Dockerfile](/docker/Dockerfile) to build an image.
```shell
# build an image with PyTorch 1.6.0, CUDA 10.1, CUDNN 7.
docker build -f ./docker/Dockerfile --rm -t mmaction2 .
```
**Important:** Make sure you've installed the [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker).
Run it with command:
```shell
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmaction2/data mmaction2
```
## A from-scratch setup script
Here is a full script for setting up MMAction2 with conda and linking the dataset path (supposing that your Kinetics-400 dataset path is $KINETICS400_ROOT).
```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
# install latest pytorch prebuilt with the default prebuilt CUDA version (usually the latest)
conda install -c pytorch pytorch torchvision -y
# install the latest mmcv or mmcv-full, here we take mmcv as example
pip install mmcv
# install mmaction2
git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
pip install -r requirements/build.txt
python setup.py develop
mkdir data
ln -s $KINETICS400_ROOT data
```
## Developing with multiple MMAction2 versions
The train and test scripts already modify the `PYTHONPATH` to ensure that the scripts use the MMAction2 in the current directory.
To use the default MMAction2 installed in the environment rather than the one you are working with, you can remove the following line from those scripts:
```shell
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
```
## Verification
To verify whether MMAction2 and the required environment are installed correctly,
we can run the following sample Python code to initialize a recognizer and run inference on a demo video:
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer
config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
device = 'cuda:0' # or 'cpu'
device = torch.device(device)
model = init_recognizer(config_file, device=device)
# inference the demo video
inference_recognizer(model, 'demo/demo.mp4')
```
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd
#!/usr/bin/env bash
sed -i '$a\\n' ../demo/README.md
# gather models
cat ../../configs/localization/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Action Localization Models' | sed 's/](\/docs\/en\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > localization_models.md
cat ../../configs/recognition/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Action Recognition Models' | sed 's/](\/docs\/en\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > recognition_models.md
cat ../../configs/recognition_audio/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed 's/](\/docs\/en\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" >> recognition_models.md
cat ../../configs/detection/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Spatio Temporal Action Detection Models' | sed 's/](\/docs\/en\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > detection_models.md
cat ../../configs/skeleton/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Skeleton-based Action Recognition Models' | sed 's/](\/docs\/en\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > skeleton_models.md
# demo
cat ../../demo/README.md | sed "s/md#t/html#t/g" | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > demo.md
# gather datasets
cat ../../tools/data/*/README.md | sed 's/# Preparing/# /g' | sed 's/#/#&/' > prepare_data.md
sed -i 's/(\/tools\/data\/activitynet\/README.md/(#activitynet/g' supported_datasets.md
sed -i 's/(\/tools\/data\/kinetics\/README.md/(#kinetics-400600700/g' supported_datasets.md
sed -i 's/(\/tools\/data\/mit\/README.md/(#moments-in-time/g' supported_datasets.md
sed -i 's/(\/tools\/data\/mmit\/README.md/(#multi-moments-in-time/g' supported_datasets.md
sed -i 's/(\/tools\/data\/sthv1\/README.md/(#something-something-v1/g' supported_datasets.md
sed -i 's/(\/tools\/data\/sthv2\/README.md/(#something-something-v2/g' supported_datasets.md
sed -i "s/(\/tools\/data\/thumos14\/README.md/(#thumos14/g" supported_datasets.md
sed -i 's/(\/tools\/data\/ucf101\/README.md/(#ucf-101/g' supported_datasets.md
sed -i 's/(\/tools\/data\/ucf101_24\/README.md/(#ucf101-24/g' supported_datasets.md
sed -i 's/(\/tools\/data\/jhmdb\/README.md/(#jhmdb/g' supported_datasets.md
sed -i 's/(\/tools\/data\/hvu\/README.md/(#hvu/g' supported_datasets.md
sed -i 's/(\/tools\/data\/hmdb51\/README.md/(#hmdb51/g' supported_datasets.md
sed -i 's/(\/tools\/data\/jester\/README.md/(#jester/g' supported_datasets.md
sed -i 's/(\/tools\/data\/ava\/README.md/(#ava/g' supported_datasets.md
sed -i 's/(\/tools\/data\/gym\/README.md/(#gym/g' supported_datasets.md
sed -i 's/(\/tools\/data\/omnisource\/README.md/(#omnisource/g' supported_datasets.md
sed -i 's/(\/tools\/data\/diving48\/README.md/(#diving48/g' supported_datasets.md
sed -i 's/(\/tools\/data\/skeleton\/README.md/(#skeleton-dataset/g' supported_datasets.md
cat prepare_data.md >> supported_datasets.md
sed -i 's/](\/docs\/en\//](/g' supported_datasets.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' supported_datasets.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' benchmark.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' getting_started.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' install.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' changelog.md
sed -i 's/](\/docs\/en\//](/g' ./tutorials/*.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' ./tutorials/*.md
# Projects based on MMAction2
There are many research works and projects built on MMAction2.
We list some of them as examples of how to extend MMAction2 for your own projects.
As this page might not be complete, please feel free to create a PR to update it.
## Projects as an extension
- [OTEAction2](https://github.com/openvinotoolkit/mmaction2): OpenVINO Training Extensions for Action Recognition.
## Projects of papers
There are also projects released with papers.
Some of the papers are published in top-tier conferences (CVPR, ICCV, and ECCV), while the others are also highly influential.
To make this list also a reference for the community to develop and compare new video understanding algorithms, we list them following the time order of top-tier conferences.
Methods already supported and maintained by MMAction2 are not listed.
- Evidential Deep Learning for Open Set Action Recognition, ICCV 2021 Oral. [\[paper\]](https://arxiv.org/abs/2107.10161)[\[github\]](https://github.com/Cogito2012/DEAR)
- Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective, ICCV 2021 Oral. [\[paper\]](https://arxiv.org/abs/2103.17263)[\[github\]](https://github.com/xvjiarui/VFS)
- MGSampler: An Explainable Sampling Strategy for Video Action Recognition, ICCV 2021. [\[paper\]](https://arxiv.org/abs/2104.09952)[\[github\]](https://github.com/MCG-NJU/MGSampler)
- MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions, ICCV 2021. [\[paper\]](https://arxiv.org/abs/2105.07404)
- Video Swin Transformer. [\[paper\]](https://arxiv.org/abs/2106.13230)[\[github\]](https://github.com/SwinTransformer/Video-Swin-Transformer)
- Long Short-Term Transformer for Online Action Detection. [\[paper\]](https://arxiv.org/abs/2107.03377)
#!/usr/bin/env python
# Copyright (c) OpenMMLab. All rights reserved.
import functools as func
import glob
import re
from os.path import basename, splitext
import numpy as np
import titlecase
def anchor(name):
    return re.sub(r'-+', '-', re.sub(r'[^a-zA-Z0-9]', '-',
                                     name.strip().lower())).strip('-')
# Count algorithms
files = sorted(glob.glob('*_models.md'))
# files = sorted(glob.glob('docs/*_models.md'))
stats = []
for f in files:
    with open(f, 'r') as content_file:
        content = content_file.read()

    # title
    title = content.split('\n')[0].replace('#', '')

    # skip IMAGE and ABSTRACT tags
    content = [
        x for x in content.split('\n')
        if 'IMAGE' not in x and 'ABSTRACT' not in x
    ]
    content = '\n'.join(content)

    # count papers
    papers = set(
        (papertype, titlecase.titlecase(paper.lower().strip()))
        for (papertype, paper) in re.findall(
            r'<!--\s*\[([A-Z]*?)\]\s*-->\s*\n.*?\btitle\s*=\s*{(.*?)}',
            content, re.DOTALL))

    # paper links
    revcontent = '\n'.join(list(reversed(content.splitlines())))
    paperlinks = {}
    for _, p in papers:
        print(p)
        q = p.replace('\\', '\\\\').replace('?', '\\?')
        paperlinks[p] = ' '.join(
            (f'[->]({splitext(basename(f))[0]}.html#{anchor(paperlink)})'
             for paperlink in re.findall(
                 rf'\btitle\s*=\s*{{\s*{q}\s*}}.*?\n## (.*?)\s*[,;]?\s*\n',
                 revcontent, re.DOTALL | re.IGNORECASE)))
        print(' ', paperlinks[p])
    paperlist = '\n'.join(
        sorted(f' - [{t}] {x} ({paperlinks[x]})' for t, x in papers))

    # count configs
    configs = set(x.lower().strip()
                  for x in re.findall(r'https.*configs/.*\.py', content))

    # count ckpts
    ckpts = set(x.lower().strip()
                for x in re.findall(r'https://download.*\.pth', content)
                if 'mmaction' in x)

    statsmsg = f"""
## [{title}]({f})
* Number of checkpoints: {len(ckpts)}
* Number of configs: {len(configs)}
* Number of papers: {len(papers)}
{paperlist}
"""

    stats.append((papers, configs, ckpts, statsmsg))
allpapers = func.reduce(lambda a, b: a.union(b), [p for p, _, _, _ in stats])
allconfigs = func.reduce(lambda a, b: a.union(b), [c for _, c, _, _ in stats])
allckpts = func.reduce(lambda a, b: a.union(b), [c for _, _, c, _ in stats])
msglist = '\n'.join(x for _, _, _, x in stats)
papertypes, papercounts = np.unique([t for t, _ in allpapers],
return_counts=True)
countstr = '\n'.join(
[f' - {t}: {c}' for t, c in zip(papertypes, papercounts)])
modelzoo = f"""
# Overview
* Number of checkpoints: {len(allckpts)}
* Number of configs: {len(allconfigs)}
* Number of papers: {len(allpapers)}
{countstr}
For supported datasets, see [datasets overview](datasets.md).
{msglist}
"""
with open('modelzoo.md', 'w') as f:
    f.write(modelzoo)
# Count datasets
files = ['supported_datasets.md']
# files = sorted(glob.glob('docs/tasks/*.md'))
datastats = []
for f in files:
    with open(f, 'r') as content_file:
        content = content_file.read()

    # title
    title = content.split('\n')[0].replace('#', '')

    # count papers
    papers = set(
        (papertype, titlecase.titlecase(paper.lower().strip()))
        for (papertype, paper) in re.findall(
            r'<!--\s*\[([A-Z]*?)\]\s*-->\s*\n.*?\btitle\s*=\s*{(.*?)}',
            content, re.DOTALL))

    # paper links
    revcontent = '\n'.join(list(reversed(content.splitlines())))
    paperlinks = {}
    for _, p in papers:
        print(p)
        q = p.replace('\\', '\\\\').replace('?', '\\?')
        paperlinks[p] = ', '.join(
            (f'[{p.strip()} ->]({splitext(basename(f))[0]}.html#{anchor(p)})'
             for p in re.findall(
                 rf'\btitle\s*=\s*{{\s*{q}\s*}}.*?\n## (.*?)\s*[,;]?\s*\n',
                 revcontent, re.DOTALL | re.IGNORECASE)))
        print(' ', paperlinks[p])
    paperlist = '\n'.join(
        sorted(f' - [{t}] {x} ({paperlinks[x]})' for t, x in papers))

    statsmsg = f"""
## [{title}]({f})
* Number of papers: {len(papers)}
{paperlist}
"""

    datastats.append((papers, configs, ckpts, statsmsg))
alldatapapers = func.reduce(lambda a, b: a.union(b),
[p for p, _, _, _ in datastats])
# Summarize
msglist = '\n'.join(x for _, _, _, x in stats)
datamsglist = '\n'.join(x for _, _, _, x in datastats)
papertypes, papercounts = np.unique([t for t, _ in alldatapapers],
return_counts=True)
countstr = '\n'.join(
[f' - {t}: {c}' for t, c in zip(papertypes, papercounts)])
modelzoo = f"""
# Overview
* Number of papers: {len(alldatapapers)}
{countstr}
For supported action algorithms, see [modelzoo overview](modelzoo.md).
{datamsglist}
"""
with open('datasets.md', 'w') as f:
    f.write(modelzoo)
# Supported Datasets
- Action Recognition
- [UCF101](/tools/data/ucf101/README.md) \[ [Homepage](https://www.crcv.ucf.edu/research/data-sets/ucf101/) \].
- [HMDB51](/tools/data/hmdb51/README.md) \[ [Homepage](https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/) \].
- [Kinetics-\[400/600/700\]](/tools/data/kinetics/README.md) \[ [Homepage](https://deepmind.com/research/open-source/kinetics) \]
- [Something-Something V1](/tools/data/sthv1/README.md) \[ [Homepage](https://20bn.com/datasets/something-something/v1) \]
- [Something-Something V2](/tools/data/sthv2/README.md) \[ [Homepage](https://20bn.com/datasets/something-something) \]
- [Moments in Time](/tools/data/mit/README.md) \[ [Homepage](http://moments.csail.mit.edu/) \]
- [Multi-Moments in Time](/tools/data/mmit/README.md) \[ [Homepage](http://moments.csail.mit.edu/challenge_iccv_2019.html) \]
- [HVU](/tools/data/hvu/README.md) \[ [Homepage](https://github.com/holistic-video-understanding/HVU-Dataset) \]
- [Jester](/tools/data/jester/README.md) \[ [Homepage](https://developer.qualcomm.com/software/ai-datasets/jester) \]
- [GYM](/tools/data/gym/README.md) \[ [Homepage](https://sdolivia.github.io/FineGym/) \]
- [ActivityNet](/tools/data/activitynet/README.md) \[ [Homepage](http://activity-net.org/) \]
- [Diving48](/tools/data/diving48/README.md) \[ [Homepage](http://www.svcl.ucsd.edu/projects/resound/dataset.html) \]
- [OmniSource](/tools/data/omnisource/README.md) \[ [Homepage](https://kennymckormick.github.io/omnisource/) \]
- Temporal Action Detection
- [ActivityNet](/tools/data/activitynet/README.md) \[ [Homepage](http://activity-net.org/) \]
- [THUMOS14](/tools/data/thumos14/README.md) \[ [Homepage](https://www.crcv.ucf.edu/THUMOS14/download.html) \]
- Spatial Temporal Action Detection
- [AVA](/tools/data/ava/README.md) \[ [Homepage](https://research.google.com/ava/index.html) \]
- [UCF101-24](/tools/data/ucf101_24/README.md) \[ [Homepage](http://www.thumos.info/download.html) \]
- [JHMDB](/tools/data/jhmdb/README.md) \[ [Homepage](http://jhmdb.is.tue.mpg.de/) \]
- Skeleton-based Action Recognition
- [PoseC3D Skeleton Dataset](/tools/data/skeleton/README.md) \[ [Homepage](https://kennymckormick.github.io/posec3d/) \]
The supported datasets are listed above.
We provide shell scripts for data preparation under the path `$MMACTION2/tools/data/`.
Below are the detailed tutorials of data deployment for each dataset.
## <a href='https://mmaction2.readthedocs.io/en/latest/'>English</a>
## <a href='https://mmaction2.readthedocs.io/zh_CN/latest/'>简体中文</a>
# Tutorial 1: Learn about Configs
We use python files as configs and incorporate modular and inheritance design into our config system, which is convenient for conducting various experiments.
You can find all the provided configs under `$MMAction2/configs`. If you wish to inspect the config file,
you may run `python tools/analysis/print_config.py /PATH/TO/CONFIG` to see the complete config.
<!-- TOC -->
- [Tutorial 1: Learn about Configs](#tutorial-1-learn-about-configs)
- [Modify config through script arguments](#modify-config-through-script-arguments)
- [Config File Structure](#config-file-structure)
- [Config File Naming Convention](#config-file-naming-convention)
- [Config System for Action localization](#config-system-for-action-localization)
- [Config System for Action Recognition](#config-system-for-action-recognition)
- [Config System for Spatio-Temporal Action Detection](#config-system-for-spatio-temporal-action-detection)
- [FAQ](#faq)
- [Use intermediate variables in configs](#use-intermediate-variables-in-configs)
<!-- TOC -->
## Modify config through script arguments
When submitting jobs using "tools/train.py" or "tools/test.py", you may specify `--cfg-options` to modify the config in place.
- Update config keys of dict.
The config options can be specified following the order of the dict keys in the original config.
For example, `--cfg-options model.backbone.norm_eval=False` changes all the BN modules in the model backbone to `train` mode.
- Update keys inside a list of configs.
Some config dicts are composed as a list in your config. For example, the training pipeline `data.train.pipeline` is normally a list
e.g. `[dict(type='SampleFrames'), ...]`. If you want to change `'SampleFrames'` to `'DenseSampleFrames'` in the pipeline,
you may specify `--cfg-options data.train.pipeline.0.type=DenseSampleFrames`.
- Update values of lists/tuples.
If the value to be updated is a list or a tuple, e.g., the config file normally sets `workflow=[('train', 1)]` and you want to
change this key, you may specify `--cfg-options workflow="[(train,1),(val,1)]"`. Note that the quotation mark " is necessary to
support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.
A short sketch of how these overrides are applied programmatically is given after this list.
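Under the hood, the overrides are merged into the loaded config; here is a hedged Python sketch of what `--cfg-options` amounts to, using mmcv's `Config` (not the exact implementation in the training tools):
```python
from mmcv import Config

cfg = Config.fromfile('configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py')
# dotted keys mirror the command-line examples above
cfg.merge_from_dict({'model.backbone.norm_eval': False})
print(cfg.model.backbone.norm_eval)  # False
```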
## Config File Structure
There are 3 basic component types under `config/_base_`: model, schedule, and default_runtime.
Many methods, such as TSN, I3D, and SlowOnly, can be easily constructed with one of each.
The configs that are composed of components from `_base_` are called _primitive_.
For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3.
For easy understanding, we recommend contributors to inherit from existing methods.
For example, if some modification is made based on TSN, users may first inherit the basic TSN structure by specifying `_base_ = ../tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py`, then modify the necessary fields in the config file, as sketched below.
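As a rough illustration (the overridden values are placeholders, not a recommended recipe), such a child config might look like:
```python
# illustrative child config: inherit everything from the primitive TSN config
# and override only the fields that change
_base_ = ['../tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py']

model = dict(backbone=dict(norm_eval=True))  # hypothetical override
total_epochs = 50                            # hypothetical override
```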
If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder under `configs/TASK`.
Please refer to [mmcv](https://mmcv.readthedocs.io/en/latest/understand_mmcv/config.html) for detailed documentation.
## Config File Naming Convention
We follow the style below to name config files. Contributors are advised to follow the same style.
```
{model}_[model setting]_{backbone}_[misc]_{data setting}_[gpu x batch_per_gpu]_{schedule}_{dataset}_{modality}
```
`{xxx}` is a required field and `[yyy]` is optional.
- `{model}`: model type, e.g. `tsn`, `i3d`, etc.
- `[model setting]`: specific setting for some models.
- `{backbone}`: backbone type, e.g. `r50` (ResNet-50), etc.
- `[misc]`: miscellaneous setting/plugins of model, e.g. `dense`, `320p`, `video`, etc.
- `{data setting}`: frame sample setting in `{clip_len}x{frame_interval}x{num_clips}` format.
- `[gpu x batch_per_gpu]`: GPUs and samples per GPU.
- `{schedule}`: training schedule, e.g. `20e` means 20 epochs.
- `{dataset}`: dataset name, e.g. `kinetics400`, `mmit`, etc.
- `{modality}`: frame modality, e.g. `rgb`, `flow`, etc.
### Config System for Action localization
We incorporate modular design into our config system,
which is convenient to conduct various experiments.
- An Example of BMN
To help the users have a basic idea of a complete config structure and the modules in an action localization system,
we make brief comments on the config of BMN as follows.
For more detailed usage and the alternatives for each parameter in each module, please refer to the [API documentation](https://mmaction2.readthedocs.io/en/latest/api.html).
```python
# model settings
model = dict( # Config of the model
type='BMN', # Type of the localizer
temporal_dim=100, # Total frames selected for each video
boundary_ratio=0.5, # Ratio for determining video boundaries
num_samples=32, # Number of samples for each proposal
num_samples_per_bin=3, # Number of bin samples for each sample
feat_dim=400, # Dimension of feature
soft_nms_alpha=0.4, # Soft NMS alpha
soft_nms_low_threshold=0.5, # Soft NMS low threshold
soft_nms_high_threshold=0.9, # Soft NMS high threshold
post_process_top_k=100) # Top k proposals in post process
# model training and testing settings
train_cfg = None # Config of training hyperparameters for BMN
test_cfg = dict(average_clips='score') # Config for testing hyperparameters for BMN
# dataset settings
dataset_type = 'ActivityNetDataset' # Type of dataset for training, validation and testing
data_root = 'data/activitynet_feature_cuhk/csv_mean_100/' # Root path to data for training
data_root_val = 'data/activitynet_feature_cuhk/csv_mean_100/' # Root path to data for validation and testing
ann_file_train = 'data/ActivityNet/anet_anno_train.json' # Path to the annotation file for training
ann_file_val = 'data/ActivityNet/anet_anno_val.json' # Path to the annotation file for validation
ann_file_test = 'data/ActivityNet/anet_anno_test.json' # Path to the annotation file for testing
train_pipeline = [ # List of training pipeline steps
dict(type='LoadLocalizationFeature'), # Load localization feature pipeline
dict(type='GenerateLocalizationLabels'), # Generate localization labels pipeline
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the localizer
keys=['raw_feature', 'gt_bbox'], # Keys of input
meta_name='video_meta', # Meta name
meta_keys=['video_name']), # Meta keys of input
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['raw_feature']), # Keys to be converted from image to tensor
dict( # Config of ToDataContainer
type='ToDataContainer', # Pipeline to convert the data to DataContainer
fields=[dict(key='gt_bbox', stack=False, cpu_only=True)]) # Required fields to be converted with keys and attributes
]
val_pipeline = [ # List of validation pipeline steps
dict(type='LoadLocalizationFeature'), # Load localization feature pipeline
dict(type='GenerateLocalizationLabels'), # Generate localization labels pipeline
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the localizer
keys=['raw_feature', 'gt_bbox'], # Keys of input
meta_name='video_meta', # Meta name
meta_keys=[
'video_name', 'duration_second', 'duration_frame', 'annotations',
'feature_frame'
]), # Meta keys of input
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['raw_feature']), # Keys to be converted from image to tensor
dict( # Config of ToDataContainer
type='ToDataContainer', # Pipeline to convert the data to DataContainer
fields=[dict(key='gt_bbox', stack=False, cpu_only=True)]) # Required fields to be converted with keys and attributes
]
test_pipeline = [ # List of testing pipeline steps
dict(type='LoadLocalizationFeature'), # Load localization feature pipeline
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the localizer
keys=['raw_feature'], # Keys of input
meta_name='video_meta', # Meta name
meta_keys=[
'video_name', 'duration_second', 'duration_frame', 'annotations',
'feature_frame'
]), # Meta keys of input
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['raw_feature']), # Keys to be converted from image to tensor
]
data = dict( # Config of data
videos_per_gpu=8, # Batch size of each single GPU
workers_per_gpu=8, # Workers to pre-fetch data for each single GPU
train_dataloader=dict( # Additional config of train dataloader
drop_last=True), # Whether to drop out the last batch of data in training
val_dataloader=dict( # Additional config of validation dataloader
videos_per_gpu=1), # Batch size of each single GPU during evaluation
test_dataloader=dict( # Additional config of test dataloader
videos_per_gpu=2), # Batch size of each single GPU during testing
test=dict( # Testing dataset config
type=dataset_type,
ann_file=ann_file_test,
pipeline=test_pipeline,
data_prefix=data_root_val),
val=dict( # Validation dataset config
type=dataset_type,
ann_file=ann_file_val,
pipeline=val_pipeline,
data_prefix=data_root_val),
train=dict( # Training dataset config
type=dataset_type,
ann_file=ann_file_train,
pipeline=train_pipeline,
data_prefix=data_root))
# optimizer
optimizer = dict(
# Config used to build optimizer, support (1). All the optimizers in PyTorch
# whose arguments are also the same as those in PyTorch. (2). Custom optimizers
# which are built on `constructor`, referring to "tutorials/5_new_modules.md"
# for implementation.
type='Adam', # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
lr=0.001, # Learning rate, see detail usages of the parameters in the documentation of PyTorch
weight_decay=0.0001) # Weight decay of Adam
optimizer_config = dict( # Config used to build the optimizer hook
grad_clip=None) # Most of the methods do not use gradient clip
# learning policy
lr_config = dict( # Learning rate scheduler config used to register LrUpdater hook
policy='step', # Policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
step=7) # Steps to decay the learning rate
total_epochs = 9 # Total epochs to train the model
checkpoint_config = dict( # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
interval=1) # Interval to save checkpoint
evaluation = dict( # Config of evaluation during training
interval=1, # Interval to perform evaluation
metrics=['AR@AN']) # Metrics to be performed
log_config = dict( # Config to register logger hook
interval=50, # Interval to print the log
hooks=[ # Hooks to be implemented during training
dict(type='TextLoggerHook'), # The logger used to record the training process
# dict(type='TensorboardLoggerHook'), # The Tensorboard logger is also supported
])
# runtime settings
dist_params = dict(backend='nccl') # Parameters to setup distributed training, the port can also be set
log_level = 'INFO' # The level of logging
work_dir = './work_dirs/bmn_400x100_2x8_9e_activitynet_feature/' # Directory to save the model checkpoints and logs for the current experiments
load_from = None # Load a model from a given path as a pre-trained model. This will not resume training
resume_from = None # Resume from a given checkpoint path; training will be resumed from the epoch at which the checkpoint was saved
workflow = [('train', 1)] # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once
output_config = dict( # Config of localization output
out=f'{work_dir}/results.json', # Path to output file
output_format='json') # File format of output file
```
### Config System for Action Recognition
We incorporate modular design into our config system,
which makes it convenient to conduct various experiments.
- An Example of TSN
To help the users have a basic idea of a complete config structure and the modules in an action recognition system,
we make brief comments on the config of TSN as follows.
For more detailed usage and alternatives for each parameter in each module, please refer to the API documentation.
```python
# model settings
model = dict( # Config of the model
type='Recognizer2D', # Type of the recognizer
backbone=dict( # Dict for backbone
type='ResNet', # Name of the backbone
pretrained='torchvision://resnet50', # The url/site of the pretrained model
depth=50, # Depth of ResNet model
norm_eval=False), # Whether to set BN layers to eval mode when training
cls_head=dict( # Dict for classification head
type='TSNHead', # Name of classification head
num_classes=400, # Number of classes to be classified.
in_channels=2048, # The input channels of classification head.
spatial_type='avg', # Type of pooling in spatial dimension
consensus=dict(type='AvgConsensus', dim=1), # Config of consensus module
dropout_ratio=0.4, # Probability in dropout layer
        init_std=0.01), # Std value for linear layer initialization
# model training and testing settings
train_cfg=None, # Config of training hyperparameters for TSN
test_cfg=dict(average_clips=None)) # Config for testing hyperparameters for TSN.
# dataset settings
dataset_type = 'RawframeDataset' # Type of dataset for training, validation and testing
data_root = 'data/kinetics400/rawframes_train/' # Root path to data for training
data_root_val = 'data/kinetics400/rawframes_val/' # Root path to data for validation and testing
ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' # Path to the annotation file for training
ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' # Path to the annotation file for validation
ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' # Path to the annotation file for testing
img_norm_cfg = dict( # Config of image normalization used in data pipeline
mean=[123.675, 116.28, 103.53], # Mean values of different channels to normalize
std=[58.395, 57.12, 57.375], # Std values of different channels to normalize
to_bgr=False) # Whether to convert channels from RGB to BGR
train_pipeline = [ # List of training pipeline steps
dict( # Config of SampleFrames
type='SampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=1, # Frames of each sampled output clip
frame_interval=1, # Temporal interval of adjacent sampled frames
num_clips=3), # Number of clips to be sampled
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(-1, 256)), # The scale to resize images
dict( # Config of MultiScaleCrop
type='MultiScaleCrop', # Multi scale crop pipeline, cropping images with a list of randomly selected scales
input_size=224, # Input size of the network
scales=(1, 0.875, 0.75, 0.66), # Scales of width and height to be selected
random_crop=False, # Whether to randomly sample cropping bbox
max_wh_scale_gap=1), # Maximum gap of w and h scale levels
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(224, 224), # The scale to resize images
        keep_ratio=False), # Whether to keep the aspect ratio when resizing
dict( # Config of Flip
type='Flip', # Flip Pipeline
        flip_ratio=0.5), # Probability of applying flip
dict( # Config of Normalize
type='Normalize', # Normalize pipeline
**img_norm_cfg), # Config of image normalization
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format
input_format='NCHW'), # Final image shape format
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the recognizer
keys=['imgs', 'label'], # Keys of input
meta_keys=[]), # Meta keys of input
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['imgs', 'label']) # Keys to be converted from image to tensor
]
val_pipeline = [ # List of validation pipeline steps
dict( # Config of SampleFrames
type='SampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=1, # Frames of each sampled output clip
frame_interval=1, # Temporal interval of adjacent sampled frames
num_clips=3, # Number of clips to be sampled
test_mode=True), # Whether to set test mode in sampling
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(-1, 256)), # The scale to resize images
dict( # Config of CenterCrop
type='CenterCrop', # Center crop pipeline, cropping the center area from images
crop_size=224), # The size to crop images
dict( # Config of Flip
type='Flip', # Flip pipeline
        flip_ratio=0), # Probability of applying flip
dict( # Config of Normalize
type='Normalize', # Normalize pipeline
**img_norm_cfg), # Config of image normalization
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format
input_format='NCHW'), # Final image shape format
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the recognizer
keys=['imgs', 'label'], # Keys of input
meta_keys=[]), # Meta keys of input
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['imgs']) # Keys to be converted from image to tensor
]
test_pipeline = [ # List of testing pipeline steps
dict( # Config of SampleFrames
type='SampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=1, # Frames of each sampled output clip
frame_interval=1, # Temporal interval of adjacent sampled frames
num_clips=25, # Number of clips to be sampled
test_mode=True), # Whether to set test mode in sampling
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(-1, 256)), # The scale to resize images
dict( # Config of TenCrop
        type='TenCrop', # Ten crop pipeline, cropping ten areas from images
crop_size=224), # The size to crop images
dict( # Config of Flip
type='Flip', # Flip pipeline
        flip_ratio=0), # Probability of applying flip
dict( # Config of Normalize
type='Normalize', # Normalize pipeline
**img_norm_cfg), # Config of image normalization
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format
input_format='NCHW'), # Final image shape format
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the recognizer
keys=['imgs', 'label'], # Keys of input
meta_keys=[]), # Meta keys of input
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['imgs']) # Keys to be converted from image to tensor
]
data = dict( # Config of data
videos_per_gpu=32, # Batch size of each single GPU
workers_per_gpu=2, # Workers to pre-fetch data for each single GPU
train_dataloader=dict( # Additional config of train dataloader
drop_last=True), # Whether to drop out the last batch of data in training
val_dataloader=dict( # Additional config of validation dataloader
videos_per_gpu=1), # Batch size of each single GPU during evaluation
test_dataloader=dict( # Additional config of test dataloader
videos_per_gpu=2), # Batch size of each single GPU during testing
train=dict( # Training dataset config
type=dataset_type,
ann_file=ann_file_train,
data_prefix=data_root,
pipeline=train_pipeline),
val=dict( # Validation dataset config
type=dataset_type,
ann_file=ann_file_val,
data_prefix=data_root_val,
pipeline=val_pipeline),
test=dict( # Testing dataset config
type=dataset_type,
ann_file=ann_file_test,
data_prefix=data_root_val,
pipeline=test_pipeline))
# optimizer
optimizer = dict(
# Config used to build optimizer, support (1). All the optimizers in PyTorch
# whose arguments are also the same as those in PyTorch. (2). Custom optimizers
# which are built on `constructor`, referring to "tutorials/5_new_modules.md"
# for implementation.
type='SGD', # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
lr=0.01, # Learning rate, see detail usages of the parameters in the documentation of PyTorch
    momentum=0.9, # Momentum
weight_decay=0.0001) # Weight decay of SGD
optimizer_config = dict( # Config used to build the optimizer hook
grad_clip=dict(max_norm=40, norm_type=2)) # Use gradient clip
# learning policy
lr_config = dict( # Learning rate scheduler config used to register LrUpdater hook
policy='step', # Policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
step=[40, 80]) # Steps to decay the learning rate
total_epochs = 100 # Total epochs to train the model
checkpoint_config = dict( # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
interval=5) # Interval to save checkpoint
evaluation = dict( # Config of evaluation during training
interval=5, # Interval to perform evaluation
metrics=['top_k_accuracy', 'mean_class_accuracy'], # Metrics to be performed
metric_options=dict(top_k_accuracy=dict(topk=(1, 3))), # Set top-k accuracy to 1 and 3 during validation
save_best='top_k_accuracy') # set `top_k_accuracy` as key indicator to save best checkpoint
eval_config = dict(
metric_options=dict(top_k_accuracy=dict(topk=(1, 3)))) # Set top-k accuracy to 1 and 3 during testing. You can also use `--eval top_k_accuracy` to assign evaluation metrics
log_config = dict( # Config to register logger hook
interval=20, # Interval to print the log
hooks=[ # Hooks to be implemented during training
dict(type='TextLoggerHook'), # The logger used to record the training process
# dict(type='TensorboardLoggerHook'), # The Tensorboard logger is also supported
])
# runtime settings
dist_params = dict(backend='nccl') # Parameters to setup distributed training, the port can also be set
log_level = 'INFO' # The level of logging
work_dir = './work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/' # Directory to save the model checkpoints and logs for the current experiments
load_from = None # Load a model from a given path as a pre-trained model. This will not resume training
resume_from = None # Resume from a given checkpoint path; training will be resumed from the epoch at which the checkpoint was saved
workflow = [('train', 1)] # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once
```
### Config System for Spatio-Temporal Action Detection
We incorporate modular design into our config system, which makes it convenient to conduct various experiments.
- An Example of FastRCNN
To help the users have a basic idea of a complete config structure and the modules in a spatio-temporal action detection system,
we make brief comments on the config of FastRCNN as follows.
For more detailed usage and alternatives for each parameter in each module, please refer to the API documentation.
```python
# model setting
model = dict( # Config of the model
type='FastRCNN', # Type of the detector
backbone=dict( # Dict for backbone
type='ResNet3dSlowOnly', # Name of the backbone
depth=50, # Depth of ResNet model
pretrained=None, # The url/site of the pretrained model
pretrained2d=False, # If the pretrained model is 2D
lateral=False, # If the backbone is with lateral connections
num_stages=4, # Stages of ResNet model
conv1_kernel=(1, 7, 7), # Conv1 kernel size
conv1_stride_t=1, # Conv1 temporal stride
pool1_stride_t=1, # Pool1 temporal stride
spatial_strides=(1, 2, 2, 1)), # The spatial stride for each ResNet stage
roi_head=dict( # Dict for roi_head
type='AVARoIHead', # Name of the roi_head
bbox_roi_extractor=dict( # Dict for bbox_roi_extractor
type='SingleRoIExtractor3D', # Name of the bbox_roi_extractor
roi_layer_type='RoIAlign', # Type of the RoI op
output_size=8, # Output feature size of the RoI op
with_temporal_pool=True), # If temporal dim is pooled
bbox_head=dict( # Dict for bbox_head
type='BBoxHeadAVA', # Name of the bbox_head
in_channels=2048, # Number of channels of the input feature
num_classes=81, # Number of action classes + 1
multilabel=True, # If the dataset is multilabel
dropout_ratio=0.5)), # The dropout ratio used
# model training and testing settings
train_cfg=dict( # Training config of FastRCNN
rcnn=dict( # Dict for rcnn training config
assigner=dict( # Dict for assigner
type='MaxIoUAssignerAVA', # Name of the assigner
pos_iou_thr=0.9, # IoU threshold for positive examples, > pos_iou_thr -> positive
neg_iou_thr=0.9, # IoU threshold for negative examples, < neg_iou_thr -> negative
min_pos_iou=0.9), # Minimum acceptable IoU for positive examples
sampler=dict( # Dict for sample
type='RandomSampler', # Name of the sampler
num=32, # Batch Size of the sampler
pos_fraction=1, # Positive bbox fraction of the sampler
neg_pos_ub=-1, # Upper bound of the ratio of num negative to num positive
add_gt_as_proposals=True), # Add gt bboxes as proposals
pos_weight=1.0, # Loss weight of positive examples
debug=False)), # Debug mode
test_cfg=dict( # Testing config of FastRCNN
rcnn=dict( # Dict for rcnn testing config
action_thr=0.002))) # The threshold of an action
# dataset settings
dataset_type = 'AVADataset' # Type of dataset for training, validation and testing
data_root = 'data/ava/rawframes' # Root path to data
anno_root = 'data/ava/annotations' # Root path to annotations
ann_file_train = f'{anno_root}/ava_train_v2.1.csv' # Path to the annotation file for training
ann_file_val = f'{anno_root}/ava_val_v2.1.csv' # Path to the annotation file for validation
exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' # Path to the exclude annotation file for training
exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' # Path to the exclude annotation file for validation
label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' # Path to the label file
proposal_file_train = f'{anno_root}/ava_dense_proposals_train.FAIR.recall_93.9.pkl' # Path to the human detection proposals for training examples
proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' # Path to the human detection proposals for validation examples
img_norm_cfg = dict( # Config of image normalization used in data pipeline
mean=[123.675, 116.28, 103.53], # Mean values of different channels to normalize
std=[58.395, 57.12, 57.375], # Std values of different channels to normalize
to_bgr=False) # Whether to convert channels from RGB to BGR
train_pipeline = [ # List of training pipeline steps
dict( # Config of SampleFrames
type='AVASampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=4, # Frames of each sampled output clip
frame_interval=16), # Temporal interval of adjacent sampled frames
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices
dict( # Config of RandomRescale
        type='RandomRescale', # Randomly rescale the short edge within a given range
        scale_range=(256, 320)), # The short-edge size range of RandomRescale
dict( # Config of RandomCrop
type='RandomCrop', # Randomly crop a patch with the given size
size=256), # The size of the cropped patch
dict( # Config of Flip
type='Flip', # Flip Pipeline
        flip_ratio=0.5), # Probability of applying flip
dict( # Config of Normalize
type='Normalize', # Normalize pipeline
**img_norm_cfg), # Config of image normalization
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format
input_format='NCTHW', # Final image shape format
collapse=True), # Collapse the dim N if N == 1
dict( # Config of Rename
type='Rename', # Rename keys
mapping=dict(imgs='img')), # The old name to new name mapping
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), # Keys to be converted from image to tensor
dict( # Config of ToDataContainer
type='ToDataContainer', # Convert other types to DataContainer type pipeline
fields=[ # Fields to convert to DataContainer
dict( # Dict of fields
                key=['proposals', 'gt_bboxes', 'gt_labels'], # Keys to convert to DataContainer
                stack=False)]), # Whether to stack these tensors
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the detector
keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], # Keys of input
meta_keys=['scores', 'entity_ids']), # Meta keys of input
]
val_pipeline = [ # List of validation pipeline steps
dict( # Config of SampleFrames
type='AVASampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=4, # Frames of each sampled output clip
        frame_interval=16), # Temporal interval of adjacent sampled frames
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(-1, 256)), # The scale to resize images
dict( # Config of Normalize
type='Normalize', # Normalize pipeline
**img_norm_cfg), # Config of image normalization
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format
input_format='NCTHW', # Final image shape format
collapse=True), # Collapse the dim N if N == 1
dict( # Config of Rename
type='Rename', # Rename keys
mapping=dict(imgs='img')), # The old name to new name mapping
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['img', 'proposals']), # Keys to be converted from image to tensor
dict( # Config of ToDataContainer
type='ToDataContainer', # Convert other types to DataContainer type pipeline
fields=[ # Fields to convert to DataContainer
dict( # Dict of fields
                key=['proposals'], # Keys to convert to DataContainer
                stack=False)]), # Whether to stack these tensors
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the detector
keys=['img', 'proposals'], # Keys of input
meta_keys=['scores', 'entity_ids'], # Meta keys of input
nested=True) # Whether to wrap the data in a nested list
]
data = dict( # Config of data
videos_per_gpu=16, # Batch size of each single GPU
workers_per_gpu=2, # Workers to pre-fetch data for each single GPU
val_dataloader=dict( # Additional config of validation dataloader
videos_per_gpu=1), # Batch size of each single GPU during evaluation
train=dict( # Training dataset config
type=dataset_type,
ann_file=ann_file_train,
exclude_file=exclude_file_train,
pipeline=train_pipeline,
label_file=label_file,
proposal_file=proposal_file_train,
person_det_score_thr=0.9,
data_prefix=data_root),
val=dict( # Validation dataset config
type=dataset_type,
ann_file=ann_file_val,
exclude_file=exclude_file_val,
pipeline=val_pipeline,
label_file=label_file,
proposal_file=proposal_file_val,
person_det_score_thr=0.9,
data_prefix=data_root))
data['test'] = data['val'] # Set test_dataset as val_dataset
# optimizer
optimizer = dict(
# Config used to build optimizer, support (1). All the optimizers in PyTorch
# whose arguments are also the same as those in PyTorch. (2). Custom optimizers
# which are built on `constructor`, referring to "tutorials/5_new_modules.md"
# for implementation.
type='SGD', # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
lr=0.2, # Learning rate, see detail usages of the parameters in the documentation of PyTorch (for 8gpu)
    momentum=0.9, # Momentum
weight_decay=0.00001) # Weight decay of SGD
optimizer_config = dict( # Config used to build the optimizer hook
grad_clip=dict(max_norm=40, norm_type=2)) # Use gradient clip
lr_config = dict( # Learning rate scheduler config used to register LrUpdater hook
policy='step', # Policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
step=[40, 80], # Steps to decay the learning rate
warmup='linear', # Warmup strategy
    warmup_by_epoch=True, # Whether warmup_iters counts epochs (True) or iterations (False)
warmup_iters=5, # Number of iters or epochs for warmup
warmup_ratio=0.1) # The initial learning rate is warmup_ratio * lr
total_epochs = 20 # Total epochs to train the model
checkpoint_config = dict( # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
interval=1) # Interval to save checkpoint
workflow = [('train', 1)] # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once
evaluation = dict( # Config of evaluation during training
interval=1, save_best='mAP@0.5IOU') # Interval to perform evaluation and the key for saving best checkpoint
log_config = dict( # Config to register logger hook
interval=20, # Interval to print the log
hooks=[ # Hooks to be implemented during training
dict(type='TextLoggerHook'), # The logger used to record the training process
])
# runtime settings
dist_params = dict(backend='nccl') # Parameters to setup distributed training, the port can also be set
log_level = 'INFO' # The level of logging
work_dir = ('./work_dirs/ava/' # Directory to save the model checkpoints and logs for the current experiments
'slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb')
load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/' # load models as a pre-trained model from a given path. This will not resume training
'slowonly_r50_4x16x1_256e_kinetics400_rgb/'
'slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth')
resume_from = None # Resume from a given checkpoint path; training will be resumed from the epoch at which the checkpoint was saved
```
## FAQ
### Use intermediate variables in configs
Some intermediate variables are used in the config files, like `train_pipeline`/`val_pipeline`/`test_pipeline`,
`ann_file_train`/`ann_file_val`/`ann_file_test`, `img_norm_cfg` etc.
For example, we would like to first define `train_pipeline`/`val_pipeline`/`test_pipeline` and pass them into `data`.
Thus, `train_pipeline`/`val_pipeline`/`test_pipeline` are intermediate variables.
We also define `ann_file_train`/`ann_file_val`/`ann_file_test` and `data_root`/`data_root_val` to provide the data pipeline with some
basic information.
In addition, we use `img_norm_cfg` as an intermediate variable to construct data augmentation components.
```python
...
dataset_type = 'RawframeDataset'
data_root = 'data/kinetics400/rawframes_train'
data_root_val = 'data/kinetics400/rawframes_val'
ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt'
ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
dict(type='RawFrameDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(
type='MultiScaleCrop',
input_size=224,
scales=(1, 0.8),
random_crop=False,
max_wh_scale_gap=0),
dict(type='Resize', scale=(224, 224), keep_ratio=False),
dict(type='Flip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
dict(
type='SampleFrames',
clip_len=32,
frame_interval=2,
num_clips=1,
test_mode=True),
dict(type='RawFrameDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
dict(
type='SampleFrames',
clip_len=32,
frame_interval=2,
num_clips=10,
test_mode=True),
dict(type='RawFrameDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='ThreeCrop', crop_size=256),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
data = dict(
videos_per_gpu=8,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=ann_file_train,
data_prefix=data_root,
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix=data_root_val,
pipeline=val_pipeline),
test=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix=data_root_val,
pipeline=test_pipeline))
```
# Tutorial 2: Finetuning Models
This tutorial provides instructions for using pre-trained models
to finetune on other datasets, so that better performance can be achieved.
<!-- TOC -->
- [Tutorial 2: Finetuning Models](#tutorial-2-finetuning-models)
- [Outline](#outline)
- [Modify Head](#modify-head)
- [Modify Dataset](#modify-dataset)
- [Modify Training Schedule](#modify-training-schedule)
- [Use Pre-Trained Model](#use-pre-trained-model)
<!-- TOC -->
## Outline
There are two steps to finetune a model on a new dataset.
1. Add support for the new dataset. See [Tutorial 3: Adding New Dataset](3_new_dataset.md).
2. Modify the configs. This will be discussed in this tutorial.
For example, if the users want to finetune a model pre-trained on the Kinetics-400 dataset to another dataset, say UCF101,
then four parts in the config (see [here](1_config.md)) need attention.
## Modify Head
The `num_classes` in the `cls_head` needs to be changed to the class number of the new dataset.
The weights of the pre-trained models are reused except for the final prediction layer.
So it is safe to change the class number.
In our case, UCF101 has 101 classes.
So we change it from 400 (class number of Kinetics-400) to 101.
```python
model = dict(
type='Recognizer2D',
backbone=dict(
type='ResNet',
pretrained='torchvision://resnet50',
depth=50,
norm_eval=False),
cls_head=dict(
type='TSNHead',
num_classes=101, # change from 400 to 101
in_channels=2048,
spatial_type='avg',
consensus=dict(type='AvgConsensus', dim=1),
dropout_ratio=0.4,
init_std=0.01),
train_cfg=None,
test_cfg=dict(average_clips=None))
```
Note that the `pretrained='torchvision://resnet50'` setting only initializes the backbone.
If you are training a new model from ImageNet-pretrained weights, this is what you want.
However, this setting is not what we need for finetuning the whole network:
that is done via `load_from`, which will be discussed later.
## Modify Dataset
MMAction2 supports UCF101, Kinetics-400, Moments in Time, Multi-Moments in Time, THUMOS14,
Something-Something V1&V2, ActivityNet Dataset.
The users may need to adapt one of the above datasets to fit their own dataset.
In our case, UCF101 is already supported by various dataset types, like `RawframeDataset`,
so we change the config as follows.
```python
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/ucf101/rawframes_train/'
data_root_val = 'data/ucf101/rawframes_val/'
ann_file_train = 'data/ucf101/ucf101_train_list.txt'
ann_file_val = 'data/ucf101/ucf101_val_list.txt'
ann_file_test = 'data/ucf101/ucf101_val_list.txt'
```
## Modify Training Schedule
Finetuning usually requires a smaller learning rate and fewer training epochs.
```python
# optimizer
optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001) # change from 0.01 to 0.005
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[20, 40])
total_epochs = 50 # change from 100 to 50
checkpoint_config = dict(interval=5)
```
## Use Pre-Trained Model
To use the pre-trained model for the whole network, the new config adds the link to the pre-trained model in `load_from`.
We set `load_from=None` by default in `configs/_base_/default_runtime.py`; owing to the [inheritance design](/docs/en/tutorials/1_config.md), users can directly override it by setting `load_from` in their configs.
```python
# use the pre-trained model for the whole TSN network
load_from = 'https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/mmaction-v1/recognition/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth' # model path can be found in model zoo
```
# Tutorial 3: Adding New Dataset
In this tutorial, we will introduce some methods for customizing your own dataset by reorganizing data and mixing datasets for the project.
<!-- TOC -->
- [Tutorial 3: Adding New Dataset](#tutorial-3-adding-new-dataset)
- [Customize Datasets by Reorganizing Data](#customize-datasets-by-reorganizing-data)
- [Reorganize datasets to existing format](#reorganize-datasets-to-existing-format)
- [An example of a custom dataset](#an-example-of-a-custom-dataset)
- [Customize Dataset by Mixing Dataset](#customize-dataset-by-mixing-dataset)
- [Repeat dataset](#repeat-dataset)
<!-- TOC -->
## Customize Datasets by Reorganizing Data
### Reorganize datasets to existing format
The simplest way is to convert your dataset to existing dataset formats (RawframeDataset or VideoDataset).
There are three kinds of annotation files.
- rawframe annotation
The annotation of a rawframe dataset is a text file with multiple lines,
where each line indicates the `frame_directory` (relative path), `total_frames` and `label` of a video,
separated by whitespace.
Here is an example (a minimal sketch for generating such a list follows the annotation examples below).
```
some/directory-1 163 1
some/directory-2 122 1
some/directory-3 258 2
some/directory-4 234 2
some/directory-5 295 3
some/directory-6 121 3
```
- video annotation
The annotation of a video dataset is a text file with multiple lines,
where each line indicates a sample video with its `filepath` (relative path) and `label`,
separated by whitespace.
Here is an example.
```
some/path/000.mp4 1
some/path/001.mp4 1
some/path/002.mp4 2
some/path/003.mp4 2
some/path/004.mp4 3
some/path/005.mp4 3
```
- ActivityNet annotation
The annotation of the ActivityNet dataset is a JSON file. Each key is a video name,
and the corresponding value is the metadata and annotations for the video.
Here is an example.
```
{
"video1": {
"duration_second": 211.53,
"duration_frame": 6337,
"annotations": [
{
"segment": [
30.025882995319815,
205.2318595943838
],
"label": "Rock climbing"
}
],
"feature_frame": 6336,
"fps": 30.0,
"rfps": 29.9579255898
},
"video2": {
"duration_second": 26.75,
"duration_frame": 647,
"annotations": [
{
"segment": [
2.578755070202808,
24.914101404056165
],
"label": "Drinking beer"
}
],
"feature_frame": 624,
"fps": 24.0,
"rfps": 24.1869158879
}
}
```
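The rawframe annotation list above can be produced with a short script. Here is a minimal sketch; the directory layout `data/custom/rawframes/<video>/` and the `label_map` dictionary are assumptions for illustration only.
```python
import os
import os.path as osp

# Assumes frames have already been extracted to data/custom/rawframes/<video>/
# and that you provide a video-name -> class-index mapping yourself.
data_root = 'data/custom/rawframes'
label_map = {'video1': 0, 'video2': 1}  # hypothetical mapping

with open('custom_rawframe_list.txt', 'w') as fout:
    for video, label in label_map.items():
        total_frames = len(os.listdir(osp.join(data_root, video)))
        fout.write(f'{video} {total_frames} {label}\n')
```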
There are two ways to work with custom datasets.
- online conversion
You can write a new Dataset class inheriting from [BaseDataset](/mmaction/datasets/base.py), and overwrite three methods
`load_annotations(self)`, `evaluate(self, results, metrics, logger)` and `dump_results(self, results, out)`,
like [RawframeDataset](/mmaction/datasets/rawframe_dataset.py), [VideoDataset](/mmaction/datasets/video_dataset.py) or [ActivityNetDataset](/mmaction/datasets/activitynet_dataset.py).
- offline conversion
You can convert the annotation format to one of the expected formats above and save it to
a pickle or JSON file; then you can simply use `RawframeDataset`, `VideoDataset` or `ActivityNetDataset`, as in the sketch below.
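For instance, here is a minimal offline-conversion sketch. The input file `my_annotations.csv` and its `filename,label` layout are hypothetical and only for illustration; the output follows the video annotation format described above.
```python
import csv

# Hypothetical input: a CSV file with `filename,label` rows.
# Output: a video annotation list in the "filepath label" format above.
with open('my_annotations.csv') as fin, open('custom_video_list.txt', 'w') as fout:
    for filename, label in csv.reader(fin):
        fout.write(f'{filename} {int(label)}\n')
```
The resulting `custom_video_list.txt` can then be consumed directly by `VideoDataset`.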
After the data pre-processing, the users need to further modify the config files to use the dataset.
Here is an example of using a custom dataset in rawframe format.
In `configs/task/method/my_custom_config.py`:
```python
...
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'path/to/your/root'
data_root_val = 'path/to/your/root_val'
ann_file_train = 'data/custom/custom_train_list.txt'
ann_file_val = 'data/custom/custom_val_list.txt'
ann_file_test = 'data/custom/custom_val_list.txt'
...
data = dict(
videos_per_gpu=32,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=ann_file_train,
...),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
...),
test=dict(
type=dataset_type,
ann_file=ann_file_test,
...))
...
```
We use this approach to support the rawframe datasets.
### An example of a custom dataset
Assume the annotations are in a new text format, and the image file names follow a template like `img_00005.jpg`.
The video annotations are stored in a text file `annotation.txt` as follows:
```
directory,total frames,class
D32_1gwq35E,299,66
-G-5CJ0JkKY,249,254
T4h1bvOd9DA,299,33
4uZ27ivBl00,299,341
0LfESFkfBSw,249,186
-YIsNpBEx6c,299,169
```
We can create a new dataset in `mmaction/datasets/my_dataset.py` to load the data.
```python
import copy
import os.path as osp
import mmcv
from .base import BaseDataset
from .builder import DATASETS
@DATASETS.register_module()
class MyDataset(BaseDataset):
def __init__(self,
ann_file,
pipeline,
data_prefix=None,
test_mode=False,
filename_tmpl='img_{:05}.jpg'):
        super().__init__(ann_file, pipeline, data_prefix=data_prefix, test_mode=test_mode)
self.filename_tmpl = filename_tmpl
def load_annotations(self):
video_infos = []
with open(self.ann_file, 'r') as fin:
for line in fin:
if line.startswith("directory"):
continue
frame_dir, total_frames, label = line.split(',')
if self.data_prefix is not None:
frame_dir = osp.join(self.data_prefix, frame_dir)
video_infos.append(
dict(
frame_dir=frame_dir,
total_frames=int(total_frames),
label=int(label)))
return video_infos
def prepare_train_frames(self, idx):
results = copy.deepcopy(self.video_infos[idx])
results['filename_tmpl'] = self.filename_tmpl
return self.pipeline(results)
def prepare_test_frames(self, idx):
results = copy.deepcopy(self.video_infos[idx])
results['filename_tmpl'] = self.filename_tmpl
return self.pipeline(results)
def evaluate(self,
results,
metrics='top_k_accuracy',
topk=(1, 5),
logger=None):
pass
```
Then in the config, to use `MyDataset` you can modify the config as the following
```python
dataset_A_train = dict(
type='MyDataset',
ann_file=ann_file_train,
pipeline=train_pipeline
)
```
## Customize Dataset by Mixing Dataset
MMAction2 also supports mixing datasets for training. Currently it supports repeating a dataset.
### Repeat dataset
We use `RepeatDataset` as a wrapper to repeat the dataset. For example, suppose the original dataset is `Dataset_A`;
to repeat it, the config looks like the following
```python
dataset_A_train = dict(
type='RepeatDataset',
times=N,
dataset=dict( # This is the original config of Dataset_A
type='Dataset_A',
...
pipeline=train_pipeline
)
)
```
# Tutorial 4: Customize Data Pipelines
In this tutorial, we will introduce the design of data pipelines, and how to customize and extend your own data pipelines for the project.
<!-- TOC -->
- [Tutorial 4: Customize Data Pipelines](#tutorial-4-customize-data-pipelines)
- [Design of Data Pipelines](#design-of-data-pipelines)
- [Data loading](#data-loading)
- [Pre-processing](#pre-processing)
- [Formatting](#formatting)
- [Extend and Use Custom Pipelines](#extend-and-use-custom-pipelines)
<!-- TOC -->
## Design of Data Pipelines
Following typical conventions, we use `Dataset` and `DataLoader` for data loading
with multiple workers. `Dataset` returns a dict of data items corresponding
to the arguments of the model's forward method.
Since the data in action recognition & localization may not be of the same size (image size, gt bbox size, etc.),
the `DataContainer` in MMCV is used to help collect and distribute data of different sizes.
See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.
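As a quick illustration, here is a minimal sketch of wrapping a variable-sized tensor in a `DataContainer` (the tensor shape is made up for illustration); this mirrors how `ToDataContainer` wraps `gt_bbox` in the localization config shown earlier.
```python
import torch
from mmcv.parallel import DataContainer as DC

# A hypothetical variable-length ground-truth bbox tensor: 5 proposals,
# each a (start, end) pair. stack=False tells the collate function to keep
# such items as a list instead of stacking them into one batched tensor.
gt_bbox = torch.rand(5, 2)
container = DC(gt_bbox, stack=False, cpu_only=True)
print(container.data.shape)  # torch.Size([5, 2])
```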
The data preparation pipeline and the dataset are decoupled. Usually a dataset
defines how to process the annotations, and a data pipeline defines all the steps to prepare a data dict.
A pipeline consists of a sequence of operations. Each operation takes a dict as input and outputs a dict for the next operation.
We present a typical pipeline in the following figure. The blue blocks are pipeline operations.
As the pipeline proceeds, each operator can add new keys (marked as green) to the result dict or update the existing keys (marked as orange).
![pipeline figure](https://github.com/open-mmlab/mmaction2/raw/master/resources/data_pipeline.png)
The operations are categorized into data loading, pre-processing and formatting.
Here is a pipeline example for TSN.
```python
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
dict(type='RawFrameDecode', io_backend='disk'),
dict(type='Resize', scale=(-1, 256)),
dict(
type='MultiScaleCrop',
input_size=224,
scales=(1, 0.875, 0.75, 0.66),
random_crop=False,
max_wh_scale_gap=1),
dict(type='Resize', scale=(224, 224), keep_ratio=False),
dict(type='Flip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
dict(
type='SampleFrames',
clip_len=1,
frame_interval=1,
num_clips=3,
test_mode=True),
dict(type='RawFrameDecode', io_backend='disk'),
dict(type='Resize', scale=(-1, 256)),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
dict(
type='SampleFrames',
clip_len=1,
frame_interval=1,
num_clips=25,
test_mode=True),
dict(type='RawFrameDecode', io_backend='disk'),
dict(type='Resize', scale=(-1, 256)),
dict(type='TenCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
```
We have supported some lazy operators and encourage users to apply them.
Lazy ops record how the data should be processed, but postpone the actual processing of the raw data until it reaches the `Fuse` stage.
Specifically, lazy ops avoid frequent read and modification operations on the raw data and process it only once in the final `Fuse` stage, thus accelerating data preprocessing.
Here is a pipeline example applying lazy ops.
```python
train_pipeline = [
dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
dict(type='RawFrameDecode', decoding_backend='turbojpeg'),
# The following three lazy ops only process the bbox of frames without
# modifying the raw data.
dict(type='Resize', scale=(-1, 256), lazy=True),
dict(
type='MultiScaleCrop',
input_size=224,
scales=(1, 0.8),
random_crop=False,
max_wh_scale_gap=0,
lazy=True),
dict(type='Resize', scale=(224, 224), keep_ratio=False, lazy=True),
    # The lazy operator `Flip` only records whether a frame should be flipped
    # and the flip direction.
    dict(type='Flip', flip_ratio=0.5, lazy=True),
    # Process the raw data once in the `Fuse` stage.
dict(type='Fuse'),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label'])
]
```
For each operation, we list the related dict fields that are added/updated/removed below, where `*` means the key may not be affected.
### Data loading
`SampleFrames`
- add: frame_inds, clip_len, frame_interval, num_clips, \*total_frames
`DenseSampleFrames`
- add: frame_inds, clip_len, frame_interval, num_clips, \*total_frames
`PyAVDecode`
- add: imgs, original_shape
- update: \*frame_inds
`DecordDecode`
- add: imgs, original_shape
- update: \*frame_inds
`OpenCVDecode`
- add: imgs, original_shape
- update: \*frame_inds
`RawFrameDecode`
- add: imgs, original_shape
- update: \*frame_inds
### Pre-processing
`RandomCrop`
- add: crop_bbox, img_shape
- update: imgs
`RandomResizedCrop`
- add: crop_bbox, img_shape
- update: imgs
`MultiScaleCrop`
- add: crop_bbox, img_shape, scales
- update: imgs
`Resize`
- add: img_shape, keep_ratio, scale_factor
- update: imgs
`Flip`
- add: flip, flip_direction
- update: imgs, label
`Normalize`
- add: img_norm_cfg
- update: imgs
`CenterCrop`
- add: crop_bbox, img_shape
- update: imgs
`ThreeCrop`
- add: crop_bbox, img_shape
- update: imgs
`TenCrop`
- add: crop_bbox, img_shape
- update: imgs
### Formatting
`ToTensor`
- update: specified by `keys`.
`ImageToTensor`
- update: specified by `keys`.
`Transpose`
- update: specified by `keys`.
`Collect`
- add: img_metas (the keys of img_metas is specified by `meta_keys`)
- remove: all other keys except for those specified by `keys`
It is **noteworthy** that the first key, commonly `imgs`, will be used as the main key to calculate the batch size.
`FormatShape`
- add: input_shape
- update: imgs
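To see the formatting behaviour concretely, the following minimal sketch builds a pipeline of only `Collect` and `ToTensor` and runs a hand-made results dict through it; the array shape and the extra key are made up for illustration.
```python
import numpy as np
from mmaction.datasets.pipelines import Compose

# A formatting-only pipeline: Collect keeps only the listed keys,
# ToTensor converts them to torch tensors.
pipeline = Compose([
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
])

results = dict(
    imgs=np.random.rand(3, 224, 224, 3).astype(np.float32),  # fake frames
    label=1,
    extra_key='removed by Collect')
out = pipeline(results)
print(sorted(out.keys()))  # only the collected keys remain
```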
## Extend and Use Custom Pipelines
1. Write a new pipeline in any file, e.g., `my_pipeline.py`. It takes a dict as input and returns a dict.
```python
from mmaction.datasets import PIPELINES
@PIPELINES.register_module()
class MyTransform:
    def __call__(self, results):
        # add or update fields of the results dict here
        results['key'] = 'value'
        return results
```
2. Import the new class.
```python
from .my_pipeline import MyTransform
```
3. Use it in config files.
```python
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='DenseSampleFrames', clip_len=8, frame_interval=8, num_clips=1),
dict(type='RawFrameDecode', io_backend='disk'),
dict(type='MyTransform'), # use a custom pipeline
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label'])
]
```
# Tutorial 5: Adding New Modules
In this tutorial, we will introduce some methods for customizing the optimizer, developing new components and adding a new learning rate scheduler for this project.
<!-- TOC -->
- [Tutorial 5: Adding New Modules](#tutorial-5-adding-new-modules)
- [Customize Optimizer](#customize-optimizer)
- [Customize Optimizer Constructor](#customize-optimizer-constructor)
- [Develop New Components](#develop-new-components)
- [Add new backbones](#add-new-backbones)
- [Add new heads](#add-new-heads)
- [Add new loss](#add-new-loss)
- [Add new learning rate scheduler (updater)](#add-new-learning-rate-scheduler-updater)
<!-- TOC -->
## Customize Optimizer
An example of a customized optimizer is [CopyOfSGD](/mmaction/core/optimizer/copy_of_sgd.py), defined in `mmaction/core/optimizer/copy_of_sgd.py`.
More generally, a customized optimizer could be defined as follows.
Assume you want to add an optimizer named `MyOptimizer`, which has arguments `a`, `b` and `c`.
You need to first implement the new optimizer in a file, e.g., in `mmaction/core/optimizer/my_optimizer.py`:
```python
from mmcv.runner import OPTIMIZERS
from torch.optim import Optimizer

@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):

    def __init__(self, a, b, c):
        # build the parameter groups and defaults from a, b and c,
        # then call Optimizer.__init__ here
        pass
```
Then add this module in `mmaction/core/optimizer/__init__.py`, so that the registry will find the new module and add it:
```python
from .my_optimizer import MyOptimizer
```
Then you can use `MyOptimizer` in `optimizer` field of config files.
In the configs, the optimizers are defined by the field `optimizer` like the following:
```python
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
```
To use your own optimizer, the field can be changed as
```python
optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)
```
We already support using all the optimizers implemented by PyTorch, and the only modification needed is to change the `optimizer` field of the config files.
For example, if you want to use `Adam` (though the performance may drop a lot), the modification could be as follows.
```python
optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001)
```
The users can directly set arguments following the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch.
## Customize Optimizer Constructor
Some models may have some parameter-specific settings for optimization, e.g. weight decay for BatchNorm layers.
The users can do such fine-grained parameter tuning by customizing the optimizer constructor.
You can write a new optimizer constructor inheriting from [DefaultOptimizerConstructor](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py)
and overwrite the `add_params(self, params, module)` method.
An example of a customized optimizer constructor is [TSMOptimizerConstructor](/mmaction/core/optimizer/tsm_optimizer_constructor.py).
More generally, a customized optimizer constructor could be defined as follows.
In `mmaction/core/optimizer/my_optimizer_constructor.py`:
```python
from mmcv.runner import OPTIMIZER_BUILDERS, DefaultOptimizerConstructor

@OPTIMIZER_BUILDERS.register_module()
class MyOptimizerConstructor(DefaultOptimizerConstructor):

    def add_params(self, params, module):
        # implement the parameter-specific settings (lr, weight decay, ...) here
        pass
```
In `mmaction/core/optimizer/__init__.py`:
```python
from .my_optimizer_constructor import MyOptimizerConstructor
```
Then you can use `MyOptimizerConstructor` in `optimizer` field of config files.
```python
# optimizer
optimizer = dict(
type='SGD',
constructor='MyOptimizerConstructor',
paramwise_cfg=dict(fc_lr5=True),
lr=0.02,
momentum=0.9,
weight_decay=0.0001)
```
## Develop New Components
We basically categorize model components into 4 types.
- recognizer: the whole recognizer model pipeline, usually contains a backbone and cls_head.
- backbone: usually an FCN network to extract feature maps, e.g., ResNet, BNInception.
- cls_head: the component for classification task, usually contains an FC layer with some pooling layers.
- localizer: the model for temporal localization task, currently available: BSN, BMN, SSN.
### Add new backbones
Here we show how to develop new components with an example of TSN.
1. Create a new file `mmaction/models/backbones/resnet.py`.
```python
import torch.nn as nn
from ..builder import BACKBONES
@BACKBONES.register_module()
class ResNet(nn.Module):
def __init__(self, arg1, arg2):
pass
def forward(self, x): # should return a tuple
pass
def init_weights(self, pretrained=None):
pass
```
2. Import the module in `mmaction/models/backbones/__init__.py`.
```python
from .resnet import ResNet
```
3. Use it in your config file.
```python
model = dict(
...
backbone=dict(
type='ResNet',
arg1=xxx,
arg2=xxx),
)
```
### Add new heads
Here we show how to develop a new head, using TSNHead as an example.
1. Create a new file `mmaction/models/heads/tsn_head.py`.
You can write a new classification head inheriting from [BaseHead](/mmaction/models/heads/base.py),
and overwrite `init_weights(self)` and `forward(self, x)` method.
```python
from ..builder import HEADS
from .base import BaseHead
@HEADS.register_module()
class TSNHead(BaseHead):
def __init__(self, arg1, arg2):
pass
def forward(self, x):
pass
def init_weights(self):
pass
```
2. Import the module in `mmaction/models/heads/__init__.py`
```python
from .tsn_head import TSNHead
```
3. Use it in your config file
```python
model = dict(
...
cls_head=dict(
type='TSNHead',
num_classes=400,
in_channels=2048,
arg1=xxx,
        arg2=xxx),
)
```
### Add new loss
Assume you want to add a new loss named `MyLoss`. To add a new loss function, the users need to implement it in `mmaction/models/losses/my_loss.py`.
```python
import torch
import torch.nn as nn
from ..builder import LOSSES
def my_loss(pred, target):
assert pred.size() == target.size() and target.numel() > 0
loss = torch.abs(pred - target)
return loss
@LOSSES.register_module()
class MyLoss(nn.Module):
def forward(self, pred, target):
loss = my_loss(pred, target)
return loss
```
Then the users need to add it in the `mmaction/models/losses/__init__.py`
```python
from .my_loss import MyLoss, my_loss
```
To use it, modify the `loss_xxx` field. Since MyLoss is for regression, we can use it for the bbox loss `loss_bbox`.
```python
loss_bbox=dict(type='MyLoss')
```
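As a quick sanity check (illustrative only, with made-up tensor shapes, and assuming `MyLoss` has been added as described above), the loss can also be called directly:
```python
import torch
from mmaction.models.losses.my_loss import MyLoss  # the class defined above

# MyLoss returns the element-wise L1 distance without reduction,
# so the output has the same shape as the inputs.
pred = torch.rand(4, 2)
target = torch.rand(4, 2)
print(MyLoss()(pred, target).shape)  # torch.Size([4, 2])
```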
## Add new learning rate scheduler (updater)
The default way to construct an lr updater (namely, a 'scheduler' in PyTorch convention) is to modify the config, for example:
```python
...
lr_config = dict(policy='step', step=[20, 40])
...
```
In the API of [`train.py`](/mmaction/apis/train.py), the learning rate updater hook is registered based on the config:
```python
...
runner.register_training_hooks(
cfg.lr_config,
optimizer_config,
cfg.checkpoint_config,
cfg.log_config,
cfg.get('momentum_config', None))
...
```
So far, the supported updaters can be found in [mmcv](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py), but if you want to customize a new learning rate updater, you may follow the steps below:
1. First, write your own LrUpdaterHook in `$MMAction2/mmaction/core/scheduler`. The following snippet is an example of a customized lr updater that uses a list of learning rates `lrs`: the learning rate switches to the next value in `lrs` at each step in `steps`:
```python
from mmcv.runner import HOOKS, LrUpdaterHook

@HOOKS.register_module()
# Register it here
class RelativeStepLrUpdaterHook(LrUpdaterHook):
    # You should inherit it from mmcv's LrUpdaterHook
def __init__(self, steps, lrs, **kwargs):
super().__init__(**kwargs)
assert len(steps) == (len(lrs))
self.steps = steps
self.lrs = lrs
def get_lr(self, runner, base_lr):
# Only this function is required to override
# This function is called before each training epoch, return the specific learning rate here.
progress = runner.epoch if self.by_epoch else runner.iter
        for i in range(len(self.steps)):
            if progress < self.steps[i]:
                return self.lrs[i]
        return self.lrs[-1]  # keep the last learning rate after the final step
```
2. Modify your config:
In your config file, swap the original `lr_config` by:
```python
lr_config = dict(policy='RelativeStep', steps=[20, 40, 60], lrs=[0.1, 0.01, 0.001])
```
More examples can be found in [mmcv](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py).
# Tutorial 6: Exporting a model to ONNX
Open Neural Network Exchange [(ONNX)](https://onnx.ai/) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves.
<!-- TOC -->
- [Tutorial 6: Exporting a model to ONNX](#tutorial-6-exporting-a-model-to-onnx)
- [Supported Models](#supported-models)
- [Usage](#usage)
- [Prerequisite](#prerequisite)
- [Recognizers](#recognizers)
- [Localizers](#localizers)
<!-- TOC -->
## Supported Models
So far, our codebase supports ONNX exporting for PyTorch models trained with MMAction2. The supported models are:
- I3D
- TSN
- TIN
- TSM
- R(2+1)D
- SLOWFAST
- SLOWONLY
- BMN
- BSN (TEM, PEM)
## Usage
For simple exporting, you can use the [script](/tools/deployment/pytorch2onnx.py) here. Note that the packages `onnx` and `onnxruntime` are required for verification after exporting.
### Prerequisite
First, install `onnx` and `onnxruntime`.
```shell
pip install onnx onnxruntime
```
We provide a Python script to export PyTorch models trained with MMAction2 to ONNX.
```shell
python tools/deployment/pytorch2onnx.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--shape ${SHAPE}] \
[--verify] [--show] [--output-file ${OUTPUT_FILE}] [--is-localizer] [--opset-version ${VERSION}]
```
Optional arguments:
- `--shape`: The shape of the input tensor to the model. For a 2D recognizer (e.g., TSN), the input should be `$batch $clip $channel $height $width` (e.g., `1 1 3 224 224`); for a 3D recognizer (e.g., I3D), the input should be `$batch $clip $channel $time $height $width` (e.g., `1 1 3 32 224 224`); for a localizer such as BSN, the input for each module is different, please check the `forward` function for it. If not specified, it will be set to `1 1 3 224 224`.
- `--verify`: Determines whether to verify the exported model, i.e., whether it runs and whether its outputs numerically match the PyTorch model. If not specified, it will be set to `False`. A minimal manual check is also sketched after this list.
- `--show`: Determines whether to print the architecture of the exported model. If not specified, it will be set to `False`.
- `--output-file`: The output onnx model name. If not specified, it will be set to `tmp.onnx`.
- `--is-localizer`: Determines whether the model to be exported is a localizer. If not specified, it will be set to `False`.
- `--opset-version`: Determines the operation set version of onnx, we recommend you to use a higher version such as 11 for compatibility. If not specified, it will be set to `11`.
- `--softmax`: Determines whether to add a softmax layer at the end of recognizers. If not specified, it will be set to `False`. For now, localizers are not supported.
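After exporting, you can also load the ONNX file yourself to double-check it. The snippet below is a minimal sketch for a 2D recognizer exported with the default settings; the file name `tmp.onnx` and the input shape are the defaults mentioned above, so adjust them if you passed `--output-file` or `--shape`.
```python
import numpy as np
import onnxruntime as ort

# Load the exported model and run a random dummy input through it.
session = ort.InferenceSession('tmp.onnx')
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print([out.shape for out in outputs])  # e.g. class scores for a recognizer
```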
### Recognizers
For recognizers, please run:
```shell
python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --shape $SHAPE --verify
```
### Localizers
For localizers, please run:
```shell
python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --is-localizer --shape $SHAPE --verify
```
Please file an issue if you discover any checkpoints that are not perfectly exported or suffer some loss in accuracy.
# Tutorial 7: Customize Runtime Settings
In this tutorial, we will introduce some methods about how to customize optimization methods, training schedules, workflow and hooks when running your own settings for the project.
<!-- TOC -->
- [Tutorial 7: Customize Runtime Settings](#tutorial-7-customize-runtime-settings)
- [Customize Optimization Methods](#customize-optimization-methods)
- [Customize optimizer supported by PyTorch](#customize-optimizer-supported-by-pytorch)
- [Customize self-implemented optimizer](#customize-self-implemented-optimizer)
- [1. Define a new optimizer](#1-define-a-new-optimizer)
- [2. Add the optimizer to registry](#2-add-the-optimizer-to-registry)
- [3. Specify the optimizer in the config file](#3-specify-the-optimizer-in-the-config-file)
- [Customize optimizer constructor](#customize-optimizer-constructor)
- [Additional settings](#additional-settings)
- [Customize Training Schedules](#customize-training-schedules)
- [Customize Workflow](#customize-workflow)
- [Customize Hooks](#customize-hooks)
- [Customize self-implemented hooks](#customize-self-implemented-hooks)
- [1. Implement a new hook](#1-implement-a-new-hook)
- [2. Register the new hook](#2-register-the-new-hook)
- [3. Modify the config](#3-modify-the-config)
- [Use hooks implemented in MMCV](#use-hooks-implemented-in-mmcv)
- [Modify default runtime hooks](#modify-default-runtime-hooks)
- [Checkpoint config](#checkpoint-config)
- [Log config](#log-config)
- [Evaluation config](#evaluation-config)
<!-- TOC -->
## Customize Optimization Methods
### Customize optimizer supported by PyTorch
We already support all the optimizers implemented by PyTorch; the only modification needed is to change the `optimizer` field of the config files.
For example, if you want to use `Adam`, the modification could be as follows.
```python
optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001)
```
To modify the learning rate of the model, users only need to modify the `lr` field in the optimizer config.
Users can directly set arguments following the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch.
For example, to use `Adam` with the settings of `torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)` in PyTorch,
the modification could be as follows.
```python
optimizer = dict(type='Adam', lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
```
### Customize self-implemented optimizer
#### 1. Define a new optimizer
A customized optimizer could be defined as follows.
Assume you want to add an optimizer named `MyOptimizer`, which has arguments `a`, `b`, and `c`.
You need to create a new directory named `mmaction/core/optimizer`,
and then implement the new optimizer in a file, e.g., `mmaction/core/optimizer/my_optimizer.py`:
```python
from mmcv.runner import OPTIMIZERS
from torch.optim import Optimizer


@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):

    def __init__(self, a, b, c):
        # Build the parameter groups / defaults from a, b and c here.
        pass
```
#### 2. Add the optimizer to registry
To find the module defined above, it should first be imported into the main namespace. There are two ways to achieve this.
- Modify `mmaction/core/optimizer/__init__.py` to import it.
The newly defined module should be imported in `mmaction/core/optimizer/__init__.py` so that the registry will
find the new module and add it:
```python
from .my_optimizer import MyOptimizer
```
- Use `custom_imports` in the config to manually import it
```python
custom_imports = dict(imports=['mmaction.core.optimizer.my_optimizer'], allow_failed_imports=False)
```
The module `mmaction.core.optimizer.my_optimizer` will be imported at the beginning of the program and the class `MyOptimizer` is then automatically registered.
Note that only the package containing the class `MyOptimizer` should be imported. `mmaction.core.optimizer.my_optimizer.MyOptimizer` **cannot** be imported directly.
#### 3. Specify the optimizer in the config file
Then you can use `MyOptimizer` in `optimizer` field of config files.
In the configs, the optimizers are defined by the field `optimizer` like the following:
```python
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
```
To use your own optimizer, the field can be changed to
```python
optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)
```
### Customize optimizer constructor
Some models may have parameter-specific settings for optimization, e.g. a different weight decay for BatchNorm layers.
Users can apply such fine-grained parameter tuning by customizing the optimizer constructor.
```python
from mmcv.runner.optimizer import OPTIMIZER_BUILDERS, OPTIMIZERS
from mmcv.utils import build_from_cfg


@OPTIMIZER_BUILDERS.register_module()
class MyOptimizerConstructor:

    def __init__(self, optimizer_cfg, paramwise_cfg=None):
        self.optimizer_cfg = optimizer_cfg
        self.paramwise_cfg = {} if paramwise_cfg is None else paramwise_cfg

    def __call__(self, model):
        # Apply rules from `paramwise_cfg` here, then build the optimizer.
        optimizer_cfg = self.optimizer_cfg.copy()
        optimizer_cfg['params'] = model.parameters()
        return build_from_cfg(optimizer_cfg, OPTIMIZERS)
```
The default optimizer constructor is implemented [here](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/optimizer/default_constructor.py#L11),
which could also serve as a template for new optimizer constructors.
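As a concrete illustration, the hypothetical constructor below (the name `NoDecayNormConstructor` is made up for this sketch and is not part of MMAction2) puts the parameters of normalization layers into a group with zero weight decay:
```python
# Sketch of a constructor that disables weight decay for normalization
# layers; illustrative only, not an official MMAction2 component.
import torch.nn as nn
from mmcv.runner.optimizer import OPTIMIZER_BUILDERS, OPTIMIZERS
from mmcv.utils import build_from_cfg


@OPTIMIZER_BUILDERS.register_module()
class NoDecayNormConstructor:

    def __init__(self, optimizer_cfg, paramwise_cfg=None):
        self.optimizer_cfg = optimizer_cfg
        self.paramwise_cfg = {} if paramwise_cfg is None else paramwise_cfg

    def __call__(self, model):
        decay, no_decay = [], []
        for module in model.modules():
            is_norm = isinstance(
                module,
                (nn.modules.batchnorm._BatchNorm, nn.GroupNorm, nn.LayerNorm))
            for param in module.parameters(recurse=False):
                (no_decay if is_norm else decay).append(param)
        optimizer_cfg = self.optimizer_cfg.copy()
        # The second parameter group overrides the global weight decay.
        optimizer_cfg['params'] = [
            dict(params=decay),
            dict(params=no_decay, weight_decay=0.)
        ]
        return build_from_cfg(optimizer_cfg, OPTIMIZERS)
```
Once registered, it can be selected in the config by adding `constructor='NoDecayNormConstructor'` to the `optimizer` dict.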
### Additional settings
Tricks not implemented by the optimizer should be implemented through the optimizer constructor (e.g., setting parameter-wise learning rates) or hooks.
We list some common settings that could stabilize or accelerate training. Feel free to create a PR or issue for more settings.
- __Use gradient clip to stabilize training__:
Some models need gradient clipping to stabilize the training process. An example is shown below:
```python
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```
- __Use momentum schedule to accelerate model convergence__:
We support the momentum scheduler to modify the model's momentum according to the learning rate, which could make the model converge faster.
The momentum scheduler is usually used together with the LR scheduler; for example, the following config is used in 3D detection to accelerate convergence.
For more details, please refer to the implementation of [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L327)
and [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/momentum_updater.py#L130).
```python
lr_config = dict(
policy='cyclic',
target_ratio=(10, 1e-4),
cyclic_times=1,
step_ratio_up=0.4,
)
momentum_config = dict(
policy='cyclic',
target_ratio=(0.85 / 0.95, 1),
cyclic_times=1,
step_ratio_up=0.4,
)
```
## Customize Training Schedules
We use the step learning rate schedule with default values in config files; this calls [`StepLRHook`](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L153) in MMCV.
We support many other learning rate schedules [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py), such as the `CosineAnnealing` and `Poly` schedules. Here are some examples:
- Poly schedule:
```python
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
```
- CosineAnnealing schedule:
```python
lr_config = dict(
policy='CosineAnnealing',
warmup='linear',
warmup_iters=1000,
warmup_ratio=1.0 / 10,
min_lr_ratio=1e-5)
```
## Customize Workflow
By default, we recommend users to use `EvalHook` to do evaluation after each training epoch, but they can still use the `val` workflow as an alternative.
Workflow is a list of (phase, epochs) to specify the running order and epochs. By default it is set to be
```python
workflow = [('train', 1)]
```
which means running 1 epoch for training.
Sometimes users may want to check some metrics (e.g. loss, accuracy) of the model on the validation set.
In this case, we can set the workflow as
```python
workflow = [('train', 1), ('val', 1)]
```
so that 1 epoch for training and 1 epoch for validation will be run iteratively.
:::{note}
1. The parameters of the model will not be updated during the val epoch.
2. The keyword `total_epochs` in the config only controls the number of training epochs and will not affect the validation workflow.
3. Workflows `[('train', 1), ('val', 1)]` and `[('train', 1)]` will not change the behavior of `EvalHook`, because `EvalHook` is called by `after_train_epoch` and the validation workflow only affects hooks that are called through `after_val_epoch`.
Therefore, the only difference between `[('train', 1), ('val', 1)]` and `[('train', 1)]` is that the runner will calculate losses on the validation set after each training epoch.
:::
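For example, a config combining this workflow with a purely illustrative number of training epochs could look like:
```python
# One validation epoch follows every training epoch; `total_epochs`
# (illustrative value) only counts the training epochs.
workflow = [('train', 1), ('val', 1)]
total_epochs = 100
```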
## Customize Hooks
### Customize self-implemented hooks
#### 1. Implement a new hook
Here we give an example of creating a new hook in MMAction2 and using it in training.
```python
from mmcv.runner import HOOKS, Hook
@HOOKS.register_module()
class MyHook(Hook):
def __init__(self, a, b):
pass
def before_run(self, runner):
pass
def after_run(self, runner):
pass
def before_epoch(self, runner):
pass
def after_epoch(self, runner):
pass
def before_iter(self, runner):
pass
def after_iter(self, runner):
pass
```
Depending on the functionality of the hook, the users need to specify what the hook will do at each stage of the training in `before_run`, `after_run`, `before_epoch`, `after_epoch`, `before_iter`, and `after_iter`.
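For instance, a small hypothetical hook that reports how long each epoch takes only needs to fill in two of these stages (the name `EpochTimerHook` is made up for illustration):
```python
import time

from mmcv.runner import HOOKS, Hook


@HOOKS.register_module()
class EpochTimerHook(Hook):
    """Hypothetical hook that logs the wall-clock time of every epoch."""

    def before_epoch(self, runner):
        # Record when the epoch starts.
        self._start_time = time.time()

    def after_epoch(self, runner):
        elapsed = time.time() - self._start_time
        runner.logger.info(f'Epoch {runner.epoch + 1} took {elapsed:.1f}s')
```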
#### 2. Register the new hook
Then we need to ensure `MyHook` is imported. Assuming the file is `mmaction/core/utils/my_hook.py`, there are two ways to do that:
- Modify `mmaction/core/utils/__init__.py` to import it.
The newly defined module should be imported in `mmaction/core/utils/__init__.py` so that the registry will
find the new module and add it:
```python
from .my_hook import MyHook
```
- Use `custom_imports` in the config to manually import it
```python
custom_imports = dict(imports=['mmaction.core.utils.my_hook'], allow_failed_imports=False)
```
#### 3. Modify the config
```python
custom_hooks = [
dict(type='MyHook', a=a_value, b=b_value)
]
```
You can also set the priority of the hook by setting the key `priority` to `'NORMAL'` or `'HIGHEST'`, as below:
```python
custom_hooks = [
dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL')
]
```
By default the hook's priority is set as `NORMAL` during registration.
### Use hooks implemented in MMCV
If the hook is already implemented in MMCV, you can directly modify the config to use the hook as below
```python
mmcv_hooks = [
dict(type='MMCVHook', a=a_value, b=b_value, priority='NORMAL')
]
```
### Modify default runtime hooks
There are some common hooks that are not registered through `custom_hooks` but are registered by default when importing MMCV. They are:
- log_config
- checkpoint_config
- evaluation
- lr_config
- optimizer_config
- momentum_config
Among those hooks, only the logger hook has `VERY_LOW` priority; the others have `NORMAL` priority.
The above-mentioned tutorials already cover how to modify `optimizer_config`, `momentum_config`, and `lr_config`.
Here we show what we can do with `log_config`, `checkpoint_config`, and `evaluation`.
#### Checkpoint config
The MMCV runner will use `checkpoint_config` to initialize [`CheckpointHook`](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/hooks/checkpoint.py#L9).
```python
checkpoint_config = dict(interval=1)
```
Users could set `max_keep_ckpts` to save only a small number of checkpoints, or decide whether to store the state dict of the optimizer by `save_optimizer`.
More details of the arguments are [here](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.CheckpointHook).
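For example, the following (illustrative) setting keeps only the three most recent checkpoints and skips saving the optimizer state:
```python
# Save every epoch, keep at most 3 checkpoints, and skip the optimizer
# state dict to reduce file size.
checkpoint_config = dict(interval=1, max_keep_ckpts=3, save_optimizer=False)
```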
#### Log config
The `log_config` wraps multiple logger hooks and enables setting intervals. Now MMCV supports `WandbLoggerHook`, `MlflowLoggerHook`, and `TensorboardLoggerHook`.
The detail usages can be found in the [doc](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.LoggerHook).
```python
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook')
])
```
#### Evaluation config
The config of `evaluation` will be used to initialize the [`EvalHook`](https://github.com/open-mmlab/mmaction2/blob/master/mmaction/core/evaluation/eval_hooks.py#L12).
Except for the key `interval`, other arguments such as `metrics` will be passed to `dataset.evaluate()`.
```python
evaluation = dict(interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'])
```
Apart from training/testing scripts, we provide lots of useful tools under the `tools/` directory.
## Useful Tools Link
<!-- TOC -->
- [Useful Tools Link](#useful-tools-link)
- [Log Analysis](#log-analysis)
- [Model Complexity](#model-complexity)
- [Model Conversion](#model-conversion)
- [MMAction2 model to ONNX (experimental)](#mmaction2-model-to-onnx-experimental)
- [Prepare a model for publishing](#prepare-a-model-for-publishing)
- [Model Serving](#model-serving)
- [1. Convert model from MMAction2 to TorchServe](#1-convert-model-from-mmaction2-to-torchserve)
- [2. Build `mmaction-serve` docker image](#2-build-mmaction-serve-docker-image)
- [3. Launch `mmaction-serve`](#3-launch-mmaction-serve)
- [4. Test deployment](#4-test-deployment)
- [Miscellaneous](#miscellaneous)
- [Evaluating a metric](#evaluating-a-metric)
- [Print the entire config](#print-the-entire-config)
- [Check videos](#check-videos)
<!-- TOC -->
## Log Analysis
`tools/analysis/analyze_logs.py` plots loss/top-k acc curves given a training log file. Run `pip install seaborn` first to install the dependency.
![acc_curve_image](https://github.com/open-mmlab/mmaction2/raw/master/resources/acc_curve.png)
```shell
python tools/analysis/analyze_logs.py plot_curve ${JSON_LOGS} [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}]
```
Examples:
- Plot the classification loss of some run.
```shell
python tools/analysis/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
```
- Plot the top-1 acc and top-5 acc of some run, and save the figure to a pdf.
```shell
python tools/analysis/analyze_logs.py plot_curve log.json --keys top1_acc top5_acc --out results.pdf
```
- Compare the top-1 acc of two runs in the same figure.
```shell
python tools/analysis/analyze_logs.py plot_curve log1.json log2.json --keys top1_acc --legend run1 run2
```
You can also compute the average training speed.
```shell
python tools/analysis/analyze_logs.py cal_train_time ${JSON_LOGS} [--include-outliers]
```
- Compute the average training speed for a config file.
```shell
python tools/analysis/analyze_logs.py cal_train_time work_dirs/some_exp/20200422_153324.log.json
```
The output is expected to be like the following.
```text
-----Analyze train time of work_dirs/some_exp/20200422_153324.log.json-----
slowest epoch 60, average time is 0.9736
fastest epoch 18, average time is 0.9001
time std over epochs is 0.0177
average iter time: 0.9330 s/iter
```
## Model Complexity
`/tools/analysis/get_flops.py` is a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch) to compute the FLOPs and params of a given model.
```shell
python tools/analysis/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
```
We will get a result like this:
```text
==============================
Input shape: (1, 3, 32, 340, 256)
Flops: 37.1 GMac
Params: 28.04 M
==============================
```
:::{note}
This tool is still experimental and we do not guarantee that the number is absolutely correct.
You may use the result for simple comparisons, but double check it before you adopt it in technical reports or papers.
(1) FLOPs are related to the input shape while parameters are not. The default input shape is (1, 3, 340, 256) for 2D recognizers and (1, 3, 32, 340, 256) for 3D recognizers.
(2) Some operators are not counted into FLOPs like GN and custom operators. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py) for details.
:::
## Model Conversion
### MMAction2 model to ONNX (experimental)
`/tools/deployment/pytorch2onnx.py` is a script to convert model to [ONNX](https://github.com/onnx/onnx) format.
It also supports comparing the output results between the PyTorch and ONNX models for verification.
Run `pip install onnx onnxruntime` first to install the dependencies.
Please note that a softmax layer can be added for recognizers via the `--softmax` option, in order to get predictions in range `[0, 1]`.
- For recognizers, please run:
```shell
python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --shape $SHAPE --verify
```
- For localizers, please run:
```shell
python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --is-localizer --shape $SHAPE --verify
```
### Prepare a model for publishing
`tools/deployment/publish_model.py` helps users to prepare their model for publishing.
Before you upload a model to AWS, you may want to:
1. convert model weights to CPU tensors
2. delete the optimizer states
3. compute the hash of the checkpoint file and append the hash id to the filename.
```shell
python tools/deployment/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
```
E.g.,
```shell
python tools/deployment/publish_model.py work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/latest.pth tsn_r50_1x1x3_100e_kinetics400_rgb.pth
```
The final output filename will be `tsn_r50_1x1x3_100e_kinetics400_rgb-{hash id}.pth`.
## Model Serving
In order to serve an `MMAction2` model with [`TorchServe`](https://pytorch.org/serve/), you can follow the steps below:
### 1. Convert model from MMAction2 to TorchServe
```shell
python tools/deployment/mmaction2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
--output_folder ${MODEL_STORE} \
--model-name ${MODEL_NAME} \
--label-file ${LABEL_FILE}
```
### 2. Build `mmaction-serve` docker image
```shell
DOCKER_BUILDKIT=1 docker build -t mmaction-serve:latest docker/serve/
```
### 3. Launch `mmaction-serve`
Check the official docs for [running TorchServe with docker](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment).
Example:
```shell
docker run --rm \
--cpus 8 \
--gpus device=0 \
-p8080:8080 -p8081:8081 -p8082:8082 \
--mount type=bind,source=$MODEL_STORE,target=/home/model-server/model-store \
mmaction-serve:latest
```
**Note**: `${MODEL_STORE}` needs to be an absolute path.
[Read the docs](https://github.com/pytorch/serve/blob/072f5d088cce9bb64b2a18af065886c9b01b317b/docs/rest_api.md) about the Inference (8080), Management (8081) and Metrics (8082) APIs.
### 4. Test deployment
```shell
# Assume you are under the directory `mmaction2`
curl http://127.0.0.1:8080/predictions/${MODEL_NAME} -T demo/demo.mp4
```
You should obtain a response similar to:
```json
{
"arm wrestling": 1.0,
"rock scissors paper": 4.962051880497143e-10,
"shaking hands": 3.9761663406245873e-10,
"massaging feet": 1.1924419784925533e-10,
"stretching leg": 1.0601879096849842e-10
}
```
## Miscellaneous
### Evaluating a metric
`tools/analysis/eval_metric.py` evaluates certain metrics of the results saved in a file according to a config file.
The saved result file is created by `tools/test.py` with the argument `--out ${RESULT_FILE}`,
and stores the final output of the whole model.
```shell
python tools/analysis/eval_metric.py ${CONFIG_FILE} ${RESULT_FILE} [--eval ${EVAL_METRICS}] [--cfg-options ${CFG_OPTIONS}] [--eval-options ${EVAL_OPTIONS}]
```
### Print the entire config
`tools/analysis/print_config.py` prints the whole config verbatim, expanding all its imports.
```shell
python tools/analysis/print_config.py ${CONFIG} [-h] [--options ${OPTIONS [OPTIONS...]}]
```
### Check videos
`tools/analysis/check_videos.py` uses the specified video decoder to iterate over all samples specified by the input configuration file, looks for invalid videos (corrupted or missing), and saves the corresponding file paths to the output file. Please note that after deleting invalid videos, users need to regenerate the video file list.
```shell
python tools/analysis/check_videos.py ${CONFIG} [-h] [--options OPTIONS [OPTIONS ...]] [--cfg-options CFG_OPTIONS [CFG_OPTIONS ...]] [--output-file OUTPUT_FILE] [--split SPLIT] [--decoder DECODER] [--num-processes NUM_PROCESSES] [--remove-corrupted-videos]
```
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)