# Feature Extraction
We provide easy-to-use scripts for feature extraction.
## Clip-level Feature Extraction
Clip-level feature extraction extracts deep features from a video clip, which usually lasts from several seconds to tens of seconds. The extracted feature is an n-dim vector for each clip. When performing multi-view feature extraction, e.g., n clips x m crops, the extracted feature is the average of the n * m views.
Before applying clip-level feature extraction, you need to prepare a video list (which includes all videos that you want to extract features from). For example, the video list for UCF101 videos will look like:
```
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c02.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c03.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c04.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c05.avi
...
YoYo/v_YoYo_g25_c01.avi
YoYo/v_YoYo_g25_c02.avi
YoYo/v_YoYo_g25_c03.avi
YoYo/v_YoYo_g25_c04.avi
YoYo/v_YoYo_g25_c05.avi
```
Assume the root of the UCF101 videos is `data/ucf101/videos` and the video list is named `ucf101.txt`. To extract clip-level features of UCF101 videos with a Kinetics-400 pretrained TSN, you can use the following script:
```shell
python tools/misc/clip_feature_extraction.py \
configs/recognition/tsn/tsn_r50_clip_feature_extraction_1x1x3_rgb.py \
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth \
--video-list ucf101.txt \
--video-root data/ucf101/videos \
--out ucf101_feature.pkl
```
The extracted features will be stored in `ucf101_feature.pkl`.
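To inspect the result, you can load the file with `pickle`. Below is a minimal sketch, assuming the dump holds one feature per video in the order of the video list (the exact structure may differ between versions):
```python
import pickle

with open('ucf101_feature.pkl', 'rb') as f:
    features = pickle.load(f)

print(type(features), len(features))                # e.g. one entry per video
print(getattr(features[0], 'shape', None))          # feature dimensionality, if ndarray
```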
You can also run clip-level feature extraction in a distributed manner. Below is an example for a node with 8 GPUs.
```shell
bash tools/misc/dist_clip_feature_extraction.sh \
configs/recognition/tsn/tsn_r50_clip_feature_extraction_1x1x3_rgb.py \
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb/tsn_r50_320p_1x1x3_100e_kinetics400_rgb_20200702-cc665e2a.pth \
8 \
--video-list ucf101.txt \
--video-root data/ucf101/videos \
--out ucf101_feature.pkl
```
To extract clip-level features of UCF101 videos with a Kinetics-400 pretrained SlowOnly, you can use the following script:
```shell
python tools/misc/clip_feature_extraction.py \
configs/recognition/slowonly/slowonly_r50_clip_feature_extraction_4x16x1_rgb.py \
https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth \
--video-list ucf101.txt \
--video-root data/ucf101/videos \
--out ucf101_feature.pkl
```
The two config files above demonstrate what a minimal config file for feature extraction looks like. You can also use other existing config files for feature extraction, as long as they use videos rather than raw frames for training and testing:
```shell
python tools/misc/clip_feature_extraction.py \
configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py \
https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb/slowonly_r50_video_320p_4x16x1_256e_kinetics400_rgb_20201014-c9cdc656.pth \
--video-list ucf101.txt \
--video-root data/ucf101/videos \
--out ucf101_feature.pkl
```
# Getting Started
This page provides basic tutorials about the usage of MMAction2.
For installation instructions, please see [install.md](install.md).
<!-- TOC -->
- [Getting Started](#getting-started)
- [Datasets](#datasets)
- [Inference with Pre-Trained Models](#inference-with-pre-trained-models)
- [Test a dataset](#test-a-dataset)
- [High-level APIs for testing a video and rawframes](#high-level-apis-for-testing-a-video-and-rawframes)
- [Build a Model](#build-a-model)
- [Build a model with basic components](#build-a-model-with-basic-components)
- [Write a new model](#write-a-new-model)
- [Train a Model](#train-a-model)
- [Iteration pipeline](#iteration-pipeline)
- [Training setting](#training-setting)
- [Train with a single GPU](#train-with-a-single-gpu)
- [Train with multiple GPUs](#train-with-multiple-gpus)
- [Train with multiple machines](#train-with-multiple-machines)
- [Launch multiple jobs on a single machine](#launch-multiple-jobs-on-a-single-machine)
- [Tutorials](#tutorials)
<!-- TOC -->
## Datasets
It is recommended to symlink the dataset root to `$MMACTION2/data`.
If your folder structure is different, you may need to change the corresponding paths in config files.
```
mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── kinetics400
│   │   ├── rawframes_train
│   │   ├── rawframes_val
│   │   ├── kinetics_train_list.txt
│   │   ├── kinetics_val_list.txt
│   ├── ucf101
│   │   ├── rawframes_train
│   │   ├── rawframes_val
│   │   ├── ucf101_train_list.txt
│   │   ├── ucf101_val_list.txt
│   ├── ...
```
For more information on data preparation, please see [data_preparation.md](data_preparation.md).
For using custom datasets, please refer to [Tutorial 3: Adding New Dataset](tutorials/3_new_dataset.md).
## Inference with Pre-Trained Models
We provide testing scripts to evaluate a whole dataset (Kinetics-400, Something-Something V1&V2, (Multi-)Moments in Time, etc.),
and provide some high-level APIs for easier integration into other projects.
MMAction2 also supports testing with CPU. However, it will be **very slow** and should only be used for debugging on a device without GPU.
To test with CPU, one should first disable all GPUs (if exist) with `export CUDA_VISIBLE_DEVICES=-1`, and then call the testing scripts directly with `python tools/test.py {OTHER_ARGS}`.
### Test a dataset
- [x] single GPU
- [x] single node multiple GPUs
- [x] multiple nodes
You can use the following commands to test a dataset.
```shell
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] \
[--gpu-collect] [--tmpdir ${TMPDIR}] [--options ${OPTIONS}] [--average-clips ${AVG_TYPE}] \
[--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}] [--onnx] [--tensorrt]
# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] \
[--gpu-collect] [--tmpdir ${TMPDIR}] [--options ${OPTIONS}] [--average-clips ${AVG_TYPE}] \
[--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}]
```
Optional arguments:
- `RESULT_FILE`: Filename of the output results. If not specified, the results will not be saved to a file.
- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., `top_k_accuracy` and `mean_class_accuracy` are available for all recognition datasets, `mmit_mean_average_precision` for Multi-Moments in Time, `mean_average_precision` for Multi-Moments in Time and HVU single categories, `AR@AN` for ActivityNet, etc.
- `--gpu-collect`: If specified, recognition results will be collected using gpu communication. Otherwise, it will save the results on different gpus to `TMPDIR` and collect them by the rank 0 worker.
- `TMPDIR`: Temporary directory used for collecting results from multiple workers, available when `--gpu-collect` is not specified.
- `OPTIONS`: Custom options used for evaluation. Allowed values depend on the arguments of the `evaluate` function in dataset.
- `AVG_TYPE`: The type of averaging applied to the test clips. If set to `prob`, it will apply softmax before averaging the clip scores; otherwise, it will average the clip scores directly (see the short sketch after this list).
- `JOB_LAUNCHER`: The launcher for distributed job initialization. Allowed choices are `none`, `pytorch`, `slurm`, `mpi`. In particular, if set to `none`, the test will run in non-distributed mode.
- `LOCAL_RANK`: ID for local rank. If not specified, it will be set to 0.
- `--onnx`: If specified, recognition results will be generated by an ONNX model, and `CHECKPOINT_FILE` should be the path to an ONNX model file. ONNX model files are generated by `/tools/deployment/pytorch2onnx.py`. For now, multi-gpu mode and dynamic input shape mode are not supported. Please note that the output tensors of the dataset and the input tensors of the ONNX model should share the same shape. It is also recommended to remove all test-time augmentation methods in `test_pipeline` (`ThreeCrop`, `TenCrop`, `twice_sample`, etc.).
- `--tensorrt`: If specified, recognition results will be generated by a TensorRT engine, and `CHECKPOINT_FILE` should be the path to a TensorRT engine file. TensorRT engines are generated from exported ONNX models with the TensorRT official conversion tools. For now, multi-gpu mode and dynamic input shape mode are not supported. Please note that the output tensors of the dataset and the input tensors of the TensorRT engine should share the same shape. It is also recommended to remove all test-time augmentation methods in `test_pipeline` (`ThreeCrop`, `TenCrop`, `twice_sample`, etc.).
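To make the `--average-clips` option above concrete, here is a hedged NumPy sketch (not the actual implementation) of averaging raw clip scores versus averaging softmax probabilities:
```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

clip_scores = np.random.randn(3, 5)            # hypothetical: 3 clips, 5 classes

avg_score = clip_scores.mean(axis=0)           # AVG_TYPE == 'score'
avg_prob = softmax(clip_scores).mean(axis=0)   # AVG_TYPE == 'prob'
print(avg_score.argmax(), avg_prob.argmax())
```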
Examples:
Assume that you have already downloaded the checkpoints to the directory `checkpoints/`.
1. Test TSN on Kinetics-400 (without saving the test results) and evaluate the top-k accuracy and mean class accuracy.
```shell
python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \
checkpoints/SOME_CHECKPOINT.pth \
--eval top_k_accuracy mean_class_accuracy
```
2. Test TSN on Something-Something V1 with 8 GPUs, and evaluate the top-k accuracy.
```shell
./tools/dist_test.sh configs/recognition/tsn/tsn_r50_1x1x8_50e_sthv1_rgb.py \
checkpoints/SOME_CHECKPOINT.pth \
8 --out results.pkl --eval top_k_accuracy
```
3. Test TSN on Kinetics-400 in a slurm environment and evaluate the top-k accuracy.
```shell
python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \
checkpoints/SOME_CHECKPOINT.pth \
--launcher slurm --eval top_k_accuracy
```
4. Test TSN on Kinetics-400 with an ONNX model and evaluate the top-k accuracy.
```shell
python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \
checkpoints/SOME_CHECKPOINT.onnx \
--eval top_k_accuracy --onnx
```
### High-level APIs for testing a video and rawframes
Here is an example of building the model and testing a given video.
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer
config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
# download the checkpoint from model zoo and put it in `checkpoints/`
checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'
# assign the desired device.
device = 'cuda:0' # or 'cpu'
device = torch.device(device)
# build the model from a config file and a checkpoint file
model = init_recognizer(config_file, checkpoint_file, device=device)
# test a single video and show the result:
video = 'demo/demo.mp4'
results = inference_recognizer(model, video)

# show the results
labels = open('tools/data/kinetics/label_map_k400.txt').readlines()
labels = [x.strip() for x in labels]
results = [(labels[k[0]], k[1]) for k in results]

print('The top-5 labels with corresponding scores are:')
for result in results:
    print(f'{result[0]}: ', result[1])
```
Here is an example of building the model and testing with a given rawframes directory.
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer
config_file = 'configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py'
# download the checkpoint from model zoo and put it in `checkpoints/`
checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'
# assign the desired device.
device = 'cuda:0' # or 'cpu'
device = torch.device(device)
# build the model from a config file and a checkpoint file
model = init_recognizer(config_file, checkpoint_file, device=device)
# test the rawframe directory of a single video and show the result:
video = 'SOME_DIR_PATH/'
results = inference_recognizer(model, video)

# show the results
labels = open('tools/data/kinetics/label_map_k400.txt').readlines()
labels = [x.strip() for x in labels]
results = [(labels[k[0]], k[1]) for k in results]

print('The top-5 labels with corresponding scores are:')
for result in results:
    print(f'{result[0]}: ', result[1])
```
Here is an example of building the model and testing with a given video URL.
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer
config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
# download the checkpoint from model zoo and put it in `checkpoints/`
checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'
# assign the desired device.
device = 'cuda:0' # or 'cpu'
device = torch.device(device)
# build the model from a config file and a checkpoint file
model = init_recognizer(config_file, checkpoint_file, device=device)
# test the URL of a single video and show the result:
video = 'https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4'
results = inference_recognizer(model, video)

# show the results
labels = open('tools/data/kinetics/label_map_k400.txt').readlines()
labels = [x.strip() for x in labels]
results = [(labels[k[0]], k[1]) for k in results]

print('The top-5 labels with corresponding scores are:')
for result in results:
    print(f'{result[0]}: ', result[1])
```
:::{note}
We define `data_prefix` in the config files and set it to `None` by default in our provided inference configs.
If `data_prefix` is not `None`, the path of the video file (or rawframe directory) to load will be `data_prefix/video`,
where `video` is the parameter used in the demo scripts above.
This detail can be found in `rawframe_dataset.py` and `video_dataset.py`. For example,
- When video (rawframes) path is `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME/img_xxxxx.jpg`), and `data_prefix` is None in the config file,
the param `video` should be `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME`).
- When video (rawframes) path is `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME/img_xxxxx.jpg`), and `data_prefix` is `SOME_DIR_PATH` in the config file,
the param `video` should be `VIDEO.mp4` (`VIDEO_NAME`).
- When rawframes path is `VIDEO_NAME/img_xxxxx.jpg`, and `data_prefix` is None in the config file, the param `video` should be `VIDEO_NAME`.
- When passing a url instead of a local video file, you need to use OpenCV as the video decoding backend.
:::
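The rules above boil down to a simple join; here is a small illustrative sketch (not the dataset code itself):
```python
import os.path as osp

def resolve_path(data_prefix, video):
    # mimic how the dataset combines data_prefix with the `video` argument
    return video if data_prefix is None else osp.join(data_prefix, video)

print(resolve_path(None, 'SOME_DIR_PATH/VIDEO.mp4'))   # SOME_DIR_PATH/VIDEO.mp4
print(resolve_path('SOME_DIR_PATH', 'VIDEO.mp4'))      # SOME_DIR_PATH/VIDEO.mp4
```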
A notebook demo can be found in [demo/demo.ipynb](/demo/demo.ipynb).
## Build a Model
### Build a model with basic components
In MMAction2, model components are basically categorized into 4 types.
- recognizer: the whole recognizer model pipeline, usually contains a backbone and cls_head.
- backbone: usually an FCN network to extract feature maps, e.g., ResNet, BNInception.
- cls_head: the component for classification task, usually contains an FC layer with some pooling layers.
- localizer: the model for localization task, currently available: BSN, BMN.
Following some basic pipelines (e.g., `Recognizer2D`), the model structure
can be customized through config files with no pain.
If we want to implement some new components, e.g., the temporal shift backbone structure as
in [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383), there are several things to do.
1. Create a new file `mmaction/models/backbones/resnet_tsm.py`.
```python
from ..builder import BACKBONES
from .resnet import ResNet


@BACKBONES.register_module()
class ResNetTSM(ResNet):

    def __init__(self,
                 depth,
                 num_segments=8,
                 is_shift=True,
                 shift_div=8,
                 shift_place='blockres',
                 temporal_pool=False,
                 **kwargs):
        # initialize the plain ResNet first, then record the TSM-specific options
        super().__init__(depth, **kwargs)
        self.num_segments = num_segments
        self.is_shift = is_shift
        self.shift_div = shift_div
        self.shift_place = shift_place
        self.temporal_pool = temporal_pool

    def forward(self, x):
        # implementation of the temporal shift is omitted here
        pass
```
2. Import the module in `mmaction/models/backbones/__init__.py`
```python
from .resnet_tsm import ResNetTSM
```
3. Modify the config file from
```python
backbone=dict(
    type='ResNet',
    pretrained='torchvision://resnet50',
    depth=50,
    norm_eval=False)
```
to
```python
backbone=dict(
    type='ResNetTSM',
    pretrained='torchvision://resnet50',
    depth=50,
    norm_eval=False,
    shift_div=8)
```
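To sanity-check a newly registered backbone without launching a full training run, you can build it directly from its config dict. A minimal sketch, assuming `build_backbone` is exposed by `mmaction.models`:
```python
import torch
from mmaction.models import build_backbone

backbone = build_backbone(
    dict(type='ResNetTSM', pretrained=None, depth=50, norm_eval=False, shift_div=8))
backbone.init_weights()
# dummy input: (batch * num_segments, channels, height, width)
feat = backbone(torch.randn(8, 3, 224, 224))
print(feat.shape)
```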
### Write a new model
To write a new recognition pipeline, you need to inherit from `BaseRecognizer`,
which defines the following abstract methods.
- `forward_train()`: forward method of the training mode.
- `forward_test()`: forward method of the testing mode.
[Recognizer2D](/mmaction/models/recognizers/recognizer2d.py) and [Recognizer3D](/mmaction/models/recognizers/recognizer3d.py)
are good examples which show how to do that.
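As a rough sketch only (the exact signatures and loss handling differ between recognizers; see the two classes above for the real interface), a new recognizer placed under `mmaction/models/recognizers/` could look like:
```python
from ..builder import RECOGNIZERS
from .base import BaseRecognizer


@RECOGNIZERS.register_module()
class MyRecognizer(BaseRecognizer):
    """Schematic recognizer: extract features, classify, return losses/scores."""

    def forward_train(self, imgs, labels, **kwargs):
        # training mode: compute classification losses
        x = self.extract_feat(imgs)
        cls_score = self.cls_head(x)
        return self.cls_head.loss(cls_score, labels.squeeze(), **kwargs)

    def forward_test(self, imgs):
        # testing mode: return (clip-averaged) class scores
        x = self.extract_feat(imgs)
        cls_score = self.cls_head(x)
        return self.average_clip(cls_score).cpu().numpy()
```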
## Train a Model
### Iteration pipeline
MMAction2 implements distributed training and non-distributed training,
which use `MMDistributedDataParallel` and `MMDataParallel` respectively.
We adopt distributed training for both single machine and multiple machines.
Supposing that the server has 8 GPUs, 8 processes will be started and each process runs on a single GPU.
Each process keeps an isolated model, data loader, and optimizer.
Model parameters are only synchronized once at the beginning.
After a forward and backward pass, gradients will be allreduced among all GPUs,
and the optimizer will update model parameters.
Since the gradients are allreduced, the model parameters stay the same across all processes after each iteration.
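For intuition, here is a toy PyTorch sketch of that mechanism (plain `DistributedDataParallel`, not MMAction2's wrappers); the address, port, and layer sizes are arbitrary:
```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# a single-process "world" is enough to show the mechanics; the real launchers
# start one such process per GPU
dist.init_process_group(
    backend='gloo', init_method='tcp://127.0.0.1:29500', rank=0, world_size=1)

model = DDP(torch.nn.Linear(16, 4))   # parameters are broadcast from rank 0
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

loss = model(torch.randn(8, 16)).sum()
loss.backward()    # gradients are allreduced across processes here
optimizer.step()   # so every process applies the same update
dist.destroy_process_group()
```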
### Training setting
All outputs (log files and checkpoints) will be saved to the working directory,
which is specified by `work_dir` in the config file.
By default, we evaluate the model on the validation set after each epoch. You can change the evaluation interval by modifying the `interval` argument in the training config:
```python
evaluation = dict(interval=5)  # This evaluates the model every 5 epochs.
```
According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you need to set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, e.g., lr=0.01 for 4 GPUs x 2 videos/GPU and lr=0.08 for 16 GPUs x 4 videos/GPU (see the short sketch below).
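The scaling itself is simple arithmetic; a small illustrative sketch using the numbers above:
```python
# linear scaling rule: lr grows with the total batch size (GPUs x videos per GPU)
base_lr, base_batch = 0.01, 4 * 2      # reference point: 4 GPUs x 2 videos/GPU
num_gpus, videos_per_gpu = 16, 4
lr = base_lr * (num_gpus * videos_per_gpu) / base_batch
print(lr)  # 0.08
```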
MMAction2 also supports training with CPU. However, it will be **very slow** and should only be used for debugging on a device without GPU.
To train with CPU, one should first disable all GPUs (if exist) with `export CUDA_VISIBLE_DEVICES=-1`, and then call the training scripts directly with `python tools/train.py {OTHER_ARGS}`.
### Train with a single GPU
```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```
If you want to specify the working directory in the command, you can add an argument `--work-dir ${YOUR_WORK_DIR}`.
### Train with multiple GPUs
```shell
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
```
Optional arguments are:
- `--validate` (**strongly recommended**): Perform evaluation every k epochs during training (k defaults to 5 and can be modified by changing the `interval` value of the `evaluation` dict in each config file).
- `--test-last`: Test the final checkpoint when training is over, save the prediction to `${WORK_DIR}/last_pred.pkl`.
- `--test-best`: Test the best checkpoint when training is over, save the prediction to `${WORK_DIR}/best_pred.pkl`.
- `--work-dir ${WORK_DIR}`: Override the working directory specified in the config file.
- `--resume-from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
- `--gpus ${GPU_NUM}`: Number of gpus to use, which is only applicable to non-distributed training.
- `--gpu-ids ${GPU_IDS}`: IDs of gpus to use, which is only applicable to non-distributed training.
- `--seed ${SEED}`: Seed id for random state in python, numpy and pytorch to generate random numbers.
- `--deterministic`: If specified, it will set deterministic options for CUDNN backend.
- `JOB_LAUNCHER`: The launcher for distributed job initialization. Allowed choices are `none`, `pytorch`, `slurm`, `mpi`. In particular, if set to `none`, the job will run in non-distributed mode.
- `LOCAL_RANK`: ID for local rank. If not specified, it will be set to 0.
Difference between `resume-from` and `load-from`:
`resume-from` loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming the training process that is interrupted accidentally.
`load-from` only loads the model weights and the training epoch starts from 0. It is usually used for finetuning.
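For reference, both behaviors can also be set directly in a config file; the values below are placeholders only:
```python
# finetuning: load weights only, training starts from epoch 0
load_from = 'checkpoints/SOME_CHECKPOINT.pth'
# resuming: restore weights, optimizer state and epoch from a previous run
resume_from = None  # e.g. 'work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/latest.pth'
```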
Here is an example of using 8 GPUs to resume training TSN from a previous checkpoint.
```shell
./tools/dist_train.sh configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py 8 --resume-from work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/latest.pth
```
### Train with multiple machines
If you can run MMAction2 on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`. (This script also supports single machine training.)
```shell
[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} [--work-dir ${WORK_DIR}]
```
Here is an example of using 16 GPUs to train TSN on the dev partition in a slurm cluster. (use `GPUS_PER_NODE=8` to specify a single slurm cluster node with 8 GPUs.)
```shell
GPUS=16 ./tools/slurm_train.sh dev tsn_r50_k400 configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb
```
You can check [slurm_train.sh](/tools/slurm_train.sh) for full arguments and environment variables.
If you have multiple machines that are simply connected via Ethernet, you can run the following commands:
On the first machine:
```shell
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
```
On the second machine:
```shell
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
```
It can be extremely slow if you do not have high-speed networking like InfiniBand.
### Launch multiple jobs on a single machine
If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
you need to specify different ports (29500 by default) for each job to avoid communication conflicts.
If you use `dist_train.sh` to launch training jobs, you can set the port in commands.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
```
If you launch training jobs with slurm, you need to modify `dist_params` in the config files (usually the 6th line from the bottom of the config file) to set different communication ports.
In `config1.py`,
```python
dist_params = dict(backend='nccl', port=29500)
```
In `config2.py`,
```python
dist_params = dict(backend='nccl', port=29501)
```
Then you can launch two jobs with `config1.py` and `config2.py`.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py [--work-dir ${WORK_DIR}]
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py [--work-dir ${WORK_DIR}]
```
## Tutorials
Currently, we provide some tutorials for users to [learn about configs](tutorials/1_config.md), [finetune model](tutorials/2_finetune.md),
[add new dataset](tutorials/3_new_dataset.md), [customize data pipelines](tutorials/4_data_pipeline.md),
[add new modules](tutorials/5_new_modules.md), [export a model to ONNX](tutorials/6_export_model.md) and [customize runtime settings](tutorials/7_customize_runtime.md).
Welcome to MMAction2's documentation!
=====================================
You can switch between Chinese and English documents in the lower-left corner of the layout.
您可以在页面左下角切换文档语言。
.. toctree::
   :maxdepth: 2

   install.md
   getting_started.md
   demo.md
   benchmark.md

.. toctree::
   :maxdepth: 2
   :caption: Datasets

   datasets.md
   data_preparation.md
   supported_datasets.md

.. toctree::
   :maxdepth: 2
   :caption: Model Zoo

   modelzoo.md
   recognition_models.md
   localization_models.md
   detection_models.md
   skeleton_models.md

.. toctree::
   :maxdepth: 2
   :caption: Tutorials

   tutorials/1_config.md
   tutorials/2_finetune.md
   tutorials/3_new_dataset.md
   tutorials/4_data_pipeline.md
   tutorials/5_new_modules.md
   tutorials/6_export_model.md
   tutorials/7_customize_runtime.md

.. toctree::
   :maxdepth: 2
   :caption: Useful Tools and Scripts

   useful_tools.md

.. toctree::
   :maxdepth: 2
   :caption: Notes

   changelog.md
   faq.md

.. toctree::
   :caption: API Reference

   api.rst

.. toctree::
   :caption: Switch Language

   switch_language.md

Indices and tables
==================
* :ref:`genindex`
* :ref:`search`
# Installation
We provide some tips for MMAction2 installation in this file.
<!-- TOC -->
- [Installation](#installation)
- [Requirements](#requirements)
- [Prepare environment](#prepare-environment)
- [Install MMAction2](#install-mmaction2)
- [Install with CPU only](#install-with-cpu-only)
- [Another option: Docker Image](#another-option-docker-image)
- [A from-scratch setup script](#a-from-scratch-setup-script)
- [Developing with multiple MMAction2 versions](#developing-with-multiple-mmaction2-versions)
- [Verification](#verification)
<!-- TOC -->
## Requirements
- Linux, Windows (We can successfully install mmaction2 on Windows and run inference, but we haven't tried training yet)
- Python 3.6+
- PyTorch 1.3+
- CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible)
- GCC 5+
- [mmcv](https://github.com/open-mmlab/mmcv) 1.1.1+
- Numpy
- ffmpeg (4.2 is preferred)
- [decord](https://github.com/dmlc/decord) (optional, 0.4.1+): Install CPU version by `pip install decord==0.4.1` and install GPU version from source
- [PyAV](https://github.com/mikeboers/PyAV) (optional): `conda install av -c conda-forge -y`
- [PyTurboJPEG](https://github.com/lilohuang/PyTurboJPEG) (optional): `pip install PyTurboJPEG`
- [denseflow](https://github.com/open-mmlab/denseflow) (optional): See [here](https://github.com/innerlee/setup) for simple install scripts.
- [moviepy](https://zulko.github.io/moviepy/) (optional): `pip install moviepy`. See [here](https://zulko.github.io/moviepy/install.html) for official installation. **Note**(according to [this issue](https://github.com/Zulko/moviepy/issues/693)) that:
1. For Windows users, [ImageMagick](https://www.imagemagick.org/script/index.php) will not be automatically detected by MoviePy,
there is a need to modify `moviepy/config_defaults.py` file by providing the path to the ImageMagick binary called `magick`, like `IMAGEMAGICK_BINARY = "C:\\Program Files\\ImageMagick_VERSION\\magick.exe"`
2. For Linux users, there is a need to modify the `/etc/ImageMagick-6/policy.xml` file by commenting out
`<policy domain="path" rights="none" pattern="@*" />` to `<!-- <policy domain="path" rights="none" pattern="@*" /> -->`, if [ImageMagick](https://www.imagemagick.org/script/index.php) is not detected by `moviepy`.
- [Pillow-SIMD](https://github.com/uploadcare/pillow-simd) (optional): Install it by the following scripts.
```shell
conda uninstall -y --force pillow pil jpeg libtiff libjpeg-turbo
pip uninstall -y pillow pil jpeg libtiff libjpeg-turbo
conda install -yc conda-forge libjpeg-turbo
CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
conda install -y jpeg libtiff
```
:::{note}
You need to run `pip uninstall mmcv` first if you have mmcv installed.
If mmcv and mmcv-full are both installed, there will be `ModuleNotFoundError`.
:::
## Prepare environment
a. Create a conda virtual environment and activate it.
```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
```
b. Install PyTorch and torchvision following the [official instructions](https://pytorch.org/), e.g.,
```shell
conda install pytorch torchvision -c pytorch
```
:::{note}
Make sure that your compilation CUDA version and runtime CUDA version match.
You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/).
`E.g.1` If you have CUDA 10.1 installed under `/usr/local/cuda` and would like to install PyTorch 1.5,
you need to install the prebuilt PyTorch with CUDA 10.1.
```shell
conda install pytorch cudatoolkit=10.1 torchvision -c pytorch
```
`E.g.2` If you have CUDA 9.2 installed under `/usr/local/cuda` and would like to install PyTorch 1.3.1.,
you need to install the prebuilt PyTorch with CUDA 9.2.
```shell
conda install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2 -c pytorch
```
If you build PyTorch from source instead of installing the prebuilt package, you can use more CUDA versions such as 9.0.
:::
## Install MMAction2
We recommend installing MMAction2 with [MIM](https://github.com/open-mmlab/mim).
```shell
pip install git+https://github.com/open-mmlab/mim.git
mim install mmaction2 -f https://github.com/open-mmlab/mmaction2.git
```
MIM can automatically install OpenMMLab projects and their requirements.
Or, you can install MMAction2 manually:
a. Install mmcv-full. We recommend installing the pre-built package as below.
```shell
# pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html
```
mmcv-full is only compiled on PyTorch 1.x.0 because the compatibility usually holds between 1.x.0 and 1.x.1. If your PyTorch version is 1.x.1, you can install mmcv-full compiled with PyTorch 1.x.0 and it usually works well.
```
# We can ignore the micro version of PyTorch
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10/index.html
```
See [here](https://github.com/open-mmlab/mmcv#installation) for different versions of MMCV compatible to different PyTorch and CUDA versions.
Optionally, you can compile mmcv from source with the following commands:
```shell
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e . # package mmcv-full, which contains cuda ops, will be installed after this step
# OR pip install -e . # package mmcv, which contains no cuda ops, will be installed after this step
cd ..
```
Or directly run
```shell
pip install mmcv-full
# alternative: pip install mmcv
```
**Important:** You need to run `pip uninstall mmcv` first if you have mmcv installed. If mmcv and mmcv-full are both installed, there will be `ModuleNotFoundError`.
b. Clone the MMAction2 repository.
```shell
git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
```
c. Install build requirements and then install MMAction2.
```shell
pip install -r requirements/build.txt
pip install -v -e . # or "python setup.py develop"
```
If you build MMAction2 on macOS, replace the last command with
```shell
CC=clang CXX=clang++ CFLAGS='-stdlib=libc++' pip install -e .
```
d. Install mmdetection for spatial temporal detection tasks.
This part is **optional** if you're not going to do spatial temporal detection.
See [here](https://github.com/open-mmlab/mmdetection#installation) to install mmdetection.
:::{note}
1. The git commit id will be written to the version number with step b, e.g. 0.6.0+2e7045c. The version will also be saved in trained models.
It is recommended that you run step b each time you pull some updates from github. If C++/CUDA codes are modified, then this step is compulsory.
2. Following the above instructions, MMAction2 is installed in `dev` mode: any local modifications made to the code will take effect without reinstalling it (unless you submit some commits and want to update the version number).
3. If you would like to use `opencv-python-headless` instead of `opencv-python`,
you can install it before installing MMCV.
4. If you would like to use `PyAV`, you can install it with `conda install av -c conda-forge -y`.
5. Some dependencies are optional. Running `python setup.py develop` will only install the minimum runtime requirements.
To use optional dependencies like `decord`, either install them with `pip install -r requirements/optional.txt`
or specify desired extras when calling `pip` (e.g. `pip install -v -e .[optional]`,
valid keys for the `[optional]` field are `all`, `tests`, `build`, and `optional`) like `pip install -v -e .[tests,build]`.
:::
## Install with CPU only
The code can be built for a CPU-only environment (where CUDA isn't available).
In CPU mode you can, for instance, run `demo/demo.py`.
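The following minimal sketch runs inference on the bundled demo video entirely on CPU (no checkpoint is loaded here, so the prediction itself is meaningless):
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer

config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
model = init_recognizer(config_file, device=torch.device('cpu'))
results = inference_recognizer(model, 'demo/demo.mp4')
print(results)
```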
## Another option: Docker Image
We provide a [Dockerfile](/docker/Dockerfile) to build an image.
```shell
# build an image with PyTorch 1.6.0, CUDA 10.1, CUDNN 7.
docker build -f ./docker/Dockerfile --rm -t mmaction2 .
```
**Important:** Make sure you've installed the [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker).
Run it with command:
```shell
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmaction2/data mmaction2
```
## A from-scratch setup script
Here is a full script for setting up MMAction2 with conda and linking the dataset path (supposing that your Kinetics-400 dataset path is $KINETICS400_ROOT).
```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
# install latest pytorch prebuilt with the default prebuilt CUDA version (usually the latest)
conda install -c pytorch pytorch torchvision -y
# install the latest mmcv or mmcv-full, here we take mmcv as example
pip install mmcv
# install mmaction2
git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
pip install -r requirements/build.txt
python setup.py develop
mkdir data
ln -s $KINETICS400_ROOT data
```
## Developing with multiple MMAction2 versions
The train and test scripts already modify the `PYTHONPATH` to ensure that the scripts use the MMAction2 in the current directory.
To use the default MMAction2 installed in the environment rather than the one you are working with, you can remove the following line from those scripts:
```shell
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
```
## Verification
To verify whether MMAction2 and the required environment are installed correctly,
we can run the following sample Python code to initialize a recognizer and run inference on a demo video:
```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer
config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
device = 'cuda:0' # or 'cpu'
device = torch.device(device)
model = init_recognizer(config_file, device=device)
# inference the demo video
inference_recognizer(model, 'demo/demo.mp4')
```
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd
#!/usr/bin/env bash
sed -i '$a\\n' ../demo/README.md
# gather models
cat ../../configs/localization/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Action Localization Models' | sed 's/](\/docs\/en\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > localization_models.md
cat ../../configs/recognition/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Action Recognition Models' | sed 's/](\/docs\/en\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > recognition_models.md
cat ../../configs/recognition_audio/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed 's/](\/docs\/en\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" >> recognition_models.md
cat ../../configs/detection/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Spatio Temporal Action Detection Models' | sed 's/](\/docs\/en\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > detection_models.md
cat ../../configs/skeleton/*/README.md | sed "s/md#t/html#t/g" | sed "s/#/#&/" | sed '1i\# Skeleton-based Action Recognition Models' | sed 's/](\/docs\/en\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > skeleton_models.md
# demo
cat ../../demo/README.md | sed "s/md#t/html#t/g" | sed 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' | sed "s/getting_started.html##t/getting_started.html#t/g" > demo.md
# gather datasets
cat ../../tools/data/*/README.md | sed 's/# Preparing/# /g' | sed 's/#/#&/' > prepare_data.md
sed -i 's/(\/tools\/data\/activitynet\/README.md/(#activitynet/g' supported_datasets.md
sed -i 's/(\/tools\/data\/kinetics\/README.md/(#kinetics-400600700/g' supported_datasets.md
sed -i 's/(\/tools\/data\/mit\/README.md/(#moments-in-time/g' supported_datasets.md
sed -i 's/(\/tools\/data\/mmit\/README.md/(#multi-moments-in-time/g' supported_datasets.md
sed -i 's/(\/tools\/data\/sthv1\/README.md/(#something-something-v1/g' supported_datasets.md
sed -i 's/(\/tools\/data\/sthv2\/README.md/(#something-something-v2/g' supported_datasets.md
sed -i "s/(\/tools\/data\/thumos14\/README.md/(#thumos14/g" supported_datasets.md
sed -i 's/(\/tools\/data\/ucf101\/README.md/(#ucf-101/g' supported_datasets.md
sed -i 's/(\/tools\/data\/ucf101_24\/README.md/(#ucf101-24/g' supported_datasets.md
sed -i 's/(\/tools\/data\/jhmdb\/README.md/(#jhmdb/g' supported_datasets.md
sed -i 's/(\/tools\/data\/hvu\/README.md/(#hvu/g' supported_datasets.md
sed -i 's/(\/tools\/data\/hmdb51\/README.md/(#hmdb51/g' supported_datasets.md
sed -i 's/(\/tools\/data\/jester\/README.md/(#jester/g' supported_datasets.md
sed -i 's/(\/tools\/data\/ava\/README.md/(#ava/g' supported_datasets.md
sed -i 's/(\/tools\/data\/gym\/README.md/(#gym/g' supported_datasets.md
sed -i 's/(\/tools\/data\/omnisource\/README.md/(#omnisource/g' supported_datasets.md
sed -i 's/(\/tools\/data\/diving48\/README.md/(#diving48/g' supported_datasets.md
sed -i 's/(\/tools\/data\/skeleton\/README.md/(#skeleton-dataset/g' supported_datasets.md
cat prepare_data.md >> supported_datasets.md
sed -i 's/](\/docs\/en\//](/g' supported_datasets.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' supported_datasets.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' benchmark.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' getting_started.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' install.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' changelog.md
sed -i 's/](\/docs\/en\//](/g' ./tutorials/*.md
sed -i 's=](/=](https://github.com/open-mmlab/mmaction2/tree/master/=g' ./tutorials/*.md
# Projects based on MMAction2
There are many research works and projects built on MMAction2.
We list some of them as examples of how to extend MMAction2 for your own projects.
As this page might not be complete, please feel free to create a PR to update it.
## Projects as an extension
- [OTEAction2](https://github.com/openvinotoolkit/mmaction2): OpenVINO Training Extensions for Action Recognition.
## Projects of papers
There are also projects released with papers.
Some of the papers are published in top-tier conferences (CVPR, ICCV, and ECCV), while the others are also highly influential.
To make this list also a reference for the community to develop and compare new video understanding algorithms, we list them following the time order of top-tier conferences.
Methods already supported and maintained by MMAction2 are not listed.
- Evidential Deep Learning for Open Set Action Recognition, ICCV 2021 Oral. [\[paper\]](https://arxiv.org/abs/2107.10161)[\[github\]](https://github.com/Cogito2012/DEAR)
- Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective, ICCV 2021 Oral. [\[paper\]](https://arxiv.org/abs/2103.17263)[\[github\]](https://github.com/xvjiarui/VFS)
- MGSampler: An Explainable Sampling Strategy for Video Action Recognition, ICCV 2021. [\[paper\]](https://arxiv.org/abs/2104.09952)[\[github\]](https://github.com/MCG-NJU/MGSampler)
- MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions, ICCV 2021. [\[paper\]](https://arxiv.org/abs/2105.07404)
- Video Swin Transformer. [\[paper\]](https://arxiv.org/abs/2106.13230)[\[github\]](https://github.com/SwinTransformer/Video-Swin-Transformer)
- Long Short-Term Transformer for Online Action Detection. [\[paper\]](https://arxiv.org/abs/2107.03377)
#!/usr/bin/env python
# Copyright (c) OpenMMLab. All rights reserved.
import functools as func
import glob
import re
from os.path import basename, splitext
import numpy as np
import titlecase
def anchor(name):
    return re.sub(r'-+', '-', re.sub(r'[^a-zA-Z0-9]', '-',
                                     name.strip().lower())).strip('-')
# Count algorithms
files = sorted(glob.glob('*_models.md'))
# files = sorted(glob.glob('docs/*_models.md'))
stats = []
for f in files:
    with open(f, 'r') as content_file:
        content = content_file.read()

    # title
    title = content.split('\n')[0].replace('#', '')

    # skip IMAGE and ABSTRACT tags
    content = [
        x for x in content.split('\n')
        if 'IMAGE' not in x and 'ABSTRACT' not in x
    ]
    content = '\n'.join(content)

    # count papers
    papers = set(
        (papertype, titlecase.titlecase(paper.lower().strip()))
        for (papertype, paper) in re.findall(
            r'<!--\s*\[([A-Z]*?)\]\s*-->\s*\n.*?\btitle\s*=\s*{(.*?)}',
            content, re.DOTALL))

    # paper links
    revcontent = '\n'.join(list(reversed(content.splitlines())))
    paperlinks = {}
    for _, p in papers:
        print(p)
        q = p.replace('\\', '\\\\').replace('?', '\\?')
        paperlinks[p] = ' '.join(
            (f'[->]({splitext(basename(f))[0]}.html#{anchor(paperlink)})'
             for paperlink in re.findall(
                 rf'\btitle\s*=\s*{{\s*{q}\s*}}.*?\n## (.*?)\s*[,;]?\s*\n',
                 revcontent, re.DOTALL | re.IGNORECASE)))
        print(' ', paperlinks[p])
    paperlist = '\n'.join(
        sorted(f' - [{t}] {x} ({paperlinks[x]})' for t, x in papers))

    # count configs
    configs = set(x.lower().strip()
                  for x in re.findall(r'https.*configs/.*\.py', content))

    # count ckpts
    ckpts = set(x.lower().strip()
                for x in re.findall(r'https://download.*\.pth', content)
                if 'mmaction' in x)

    statsmsg = f"""
## [{title}]({f})
* Number of checkpoints: {len(ckpts)}
* Number of configs: {len(configs)}
* Number of papers: {len(papers)}
{paperlist}
"""

    stats.append((papers, configs, ckpts, statsmsg))
allpapers = func.reduce(lambda a, b: a.union(b), [p for p, _, _, _ in stats])
allconfigs = func.reduce(lambda a, b: a.union(b), [c for _, c, _, _ in stats])
allckpts = func.reduce(lambda a, b: a.union(b), [c for _, _, c, _ in stats])
msglist = '\n'.join(x for _, _, _, x in stats)
papertypes, papercounts = np.unique([t for t, _ in allpapers],
return_counts=True)
countstr = '\n'.join(
[f' - {t}: {c}' for t, c in zip(papertypes, papercounts)])
modelzoo = f"""
# Overview
* Number of checkpoints: {len(allckpts)}
* Number of configs: {len(allconfigs)}
* Number of papers: {len(allpapers)}
{countstr}
For supported datasets, see [datasets overview](datasets.md).
{msglist}
"""
with open('modelzoo.md', 'w') as f:
    f.write(modelzoo)
# Count datasets
files = ['supported_datasets.md']
# files = sorted(glob.glob('docs/tasks/*.md'))
datastats = []
for f in files:
    with open(f, 'r') as content_file:
        content = content_file.read()

    # title
    title = content.split('\n')[0].replace('#', '')

    # count papers
    papers = set(
        (papertype, titlecase.titlecase(paper.lower().strip()))
        for (papertype, paper) in re.findall(
            r'<!--\s*\[([A-Z]*?)\]\s*-->\s*\n.*?\btitle\s*=\s*{(.*?)}',
            content, re.DOTALL))

    # paper links
    revcontent = '\n'.join(list(reversed(content.splitlines())))
    paperlinks = {}
    for _, p in papers:
        print(p)
        q = p.replace('\\', '\\\\').replace('?', '\\?')
        paperlinks[p] = ', '.join(
            (f'[{p.strip()} ->]({splitext(basename(f))[0]}.html#{anchor(p)})'
             for p in re.findall(
                 rf'\btitle\s*=\s*{{\s*{q}\s*}}.*?\n## (.*?)\s*[,;]?\s*\n',
                 revcontent, re.DOTALL | re.IGNORECASE)))
        print(' ', paperlinks[p])
    paperlist = '\n'.join(
        sorted(f' - [{t}] {x} ({paperlinks[x]})' for t, x in papers))

    statsmsg = f"""
## [{title}]({f})
* Number of papers: {len(papers)}
{paperlist}
"""

    datastats.append((papers, configs, ckpts, statsmsg))
alldatapapers = func.reduce(lambda a, b: a.union(b),
[p for p, _, _, _ in datastats])
# Summarize
msglist = '\n'.join(x for _, _, _, x in stats)
datamsglist = '\n'.join(x for _, _, _, x in datastats)
papertypes, papercounts = np.unique([t for t, _ in alldatapapers],
return_counts=True)
countstr = '\n'.join(
[f' - {t}: {c}' for t, c in zip(papertypes, papercounts)])
modelzoo = f"""
# Overview
* Number of papers: {len(alldatapapers)}
{countstr}
For supported action algorithms, see [modelzoo overview](modelzoo.md).
{datamsglist}
"""
with open('datasets.md', 'w') as f:
    f.write(modelzoo)
# Supported Datasets
- Action Recognition
- [UCF101](/tools/data/ucf101/README.md) \[ [Homepage](https://www.crcv.ucf.edu/research/data-sets/ucf101/) \].
- [HMDB51](/tools/data/hmdb51/README.md) \[ [Homepage](https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/) \].
- [Kinetics-\[400/600/700\]](/tools/data/kinetics/README.md) \[ [Homepage](https://deepmind.com/research/open-source/kinetics) \]
- [Something-Something V1](/tools/data/sthv1/README.md) \[ [Homepage](https://20bn.com/datasets/something-something/v1) \]
- [Something-Something V2](/tools/data/sthv2/README.md) \[ [Homepage](https://20bn.com/datasets/something-something) \]
- [Moments in Time](/tools/data/mit/README.md) \[ [Homepage](http://moments.csail.mit.edu/) \]
- [Multi-Moments in Time](/tools/data/mmit/README.md) \[ [Homepage](http://moments.csail.mit.edu/challenge_iccv_2019.html) \]
- [HVU](/tools/data/hvu/README.md) \[ [Homepage](https://github.com/holistic-video-understanding/HVU-Dataset) \]
- [Jester](/tools/data/jester/README.md) \[ [Homepage](https://developer.qualcomm.com/software/ai-datasets/jester) \]
- [GYM](/tools/data/gym/README.md) \[ [Homepage](https://sdolivia.github.io/FineGym/) \]
- [ActivityNet](/tools/data/activitynet/README.md) \[ [Homepage](http://activity-net.org/) \]
- [Diving48](/tools/data/diving48/README.md) \[ [Homepage](http://www.svcl.ucsd.edu/projects/resound/dataset.html) \]
- [OmniSource](/tools/data/omnisource/README.md) \[ [Homepage](https://kennymckormick.github.io/omnisource/) \]
- Temporal Action Detection
- [ActivityNet](/tools/data/activitynet/README.md) \[ [Homepage](http://activity-net.org/) \]
- [THUMOS14](/tools/data/thumos14/README.md) \[ [Homepage](https://www.crcv.ucf.edu/THUMOS14/download.html) \]
- Spatial Temporal Action Detection
- [AVA](/tools/data/ava/README.md) \[ [Homepage](https://research.google.com/ava/index.html) \]
- [UCF101-24](/tools/data/ucf101_24/README.md) \[ [Homepage](http://www.thumos.info/download.html) \]
- [JHMDB](/tools/data/jhmdb/README.md) \[ [Homepage](http://jhmdb.is.tue.mpg.de/) \]
- Skeleton-based Action Recognition
- [PoseC3D Skeleton Dataset](/tools/data/skeleton/README.md) \[ [Homepage](https://kennymckormick.github.io/posec3d/) \]
The supported datasets are listed above.
We provide shell scripts for data preparation under the path `$MMACTION2/tools/data/`.
Below are the detailed tutorials of data deployment for each dataset.
## <a href='https://mmaction2.readthedocs.io/en/latest/'>English</a>
## <a href='https://mmaction2.readthedocs.io/zh_CN/latest/'>简体中文</a>
# Tutorial 1: Learn about Configs
We use python files as configs and incorporate modular and inheritance design into our config system, which is convenient for conducting various experiments.
You can find all the provided configs under `$MMAction2/configs`. If you wish to inspect the config file,
you may run `python tools/analysis/print_config.py /PATH/TO/CONFIG` to see the complete config.
<!-- TOC -->
- [Tutorial 1: Learn about Configs](#tutorial-1-learn-about-configs)
- [Modify config through script arguments](#modify-config-through-script-arguments)
- [Config File Structure](#config-file-structure)
- [Config File Naming Convention](#config-file-naming-convention)
- [Config System for Action localization](#config-system-for-action-localization)
- [Config System for Action Recognition](#config-system-for-action-recognition)
- [Config System for Spatio-Temporal Action Detection](#config-system-for-spatio-temporal-action-detection)
- [FAQ](#faq)
- [Use intermediate variables in configs](#use-intermediate-variables-in-configs)
<!-- TOC -->
## Modify config through script arguments
When submitting jobs using "tools/train.py" or "tools/test.py", you may specify `--cfg-options` to modify the config in place.
- Update config keys of dict.
The config options can be specified following the order of the dict keys in the original config.
For example, `--cfg-options model.backbone.norm_eval=False` changes all the BN modules in the model backbone to `train` mode.
- Update keys inside a list of configs.
Some config dicts are composed as a list in your config. For example, the training pipeline `data.train.pipeline` is normally a list
e.g. `[dict(type='SampleFrames'), ...]`. If you want to change `'SampleFrames'` to `'DenseSampleFrames'` in the pipeline,
you may specify `--cfg-options data.train.pipeline.0.type=DenseSampleFrames`.
- Update values of lists/tuples.
If the value to be updated is a list or a tuple, e.g., the config file normally sets `workflow=[('train', 1)]` and you want to
change this key, you may specify `--cfg-options workflow="[(train,1),(val,1)]"`. Note that the quotation mark " is necessary to
support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.
A short sketch of how these overrides are applied programmatically is given after this list.
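Under the hood, the overrides are merged into the loaded config; here is a hedged Python sketch of what `--cfg-options` amounts to, using mmcv's `Config` (not the exact implementation in the training tools):
```python
from mmcv import Config

cfg = Config.fromfile('configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py')
# dotted keys mirror the command-line examples above
cfg.merge_from_dict({'model.backbone.norm_eval': False})
print(cfg.model.backbone.norm_eval)  # False
```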
## Config File Structure
There are 3 basic component types under `config/_base_`: model, schedule, and default_runtime.
Many methods, such as TSN, I3D, and SlowOnly, can be easily constructed with one of each.
The configs that are composed of components from `_base_` are called _primitive_.
For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3.
For easy understanding, we recommend contributors to inherit from existing methods.
For example, if some modification is made based on TSN, users may first inherit the basic TSN structure by specifying `_base_ = ../tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py`, then modify the necessary fields in the config file, as sketched below.
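As a rough illustration (the overridden values are placeholders, not a recommended recipe), such a child config might look like:
```python
# illustrative child config: inherit everything from the primitive TSN config
# and override only the fields that change
_base_ = ['../tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py']

model = dict(backbone=dict(norm_eval=True))  # hypothetical override
total_epochs = 50                            # hypothetical override
```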
If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder under `configs/TASK`.
Please refer to [mmcv](https://mmcv.readthedocs.io/en/latest/understand_mmcv/config.html) for detailed documentation.
## Config File Naming Convention
We follow the style below to name config files. Contributors are advised to follow the same style.
```
{model}_[model setting]_{backbone}_[misc]_{data setting}_[gpu x batch_per_gpu]_{schedule}_{dataset}_{modality}
```
`{xxx}` is a required field and `[yyy]` is optional.
- `{model}`: model type, e.g. `tsn`, `i3d`, etc.
- `[model setting]`: specific setting for some models.
- `{backbone}`: backbone type, e.g. `r50` (ResNet-50), etc.
- `[misc]`: miscellaneous setting/plugins of model, e.g. `dense`, `320p`, `video`, etc.
- `{data setting}`: frame sample setting in `{clip_len}x{frame_interval}x{num_clips}` format.
- `[gpu x batch_per_gpu]`: GPUs and samples per GPU.
- `{schedule}`: training schedule, e.g. `20e` means 20 epochs.
- `{dataset}`: dataset name, e.g. `kinetics400`, `mmit`, etc.
- `{modality}`: frame modality, e.g. `rgb`, `flow`, etc.
### Config System for Action localization
We incorporate modular design into our config system,
which is convenient to conduct various experiments.
- An Example of BMN
To help the users have a basic idea of a complete config structure and the modules in an action localization system,
we make brief comments on the config of BMN as follows.
For more detailed usage and the alternatives for each parameter in each module, please refer to the [API documentation](https://mmaction2.readthedocs.io/en/latest/api.html).
```python
# model settings
model = dict( # Config of the model
type='BMN', # Type of the localizer
temporal_dim=100, # Total frames selected for each video
boundary_ratio=0.5, # Ratio for determining video boundaries
num_samples=32, # Number of samples for each proposal
num_samples_per_bin=3, # Number of bin samples for each sample
feat_dim=400, # Dimension of feature
soft_nms_alpha=0.4, # Soft NMS alpha
soft_nms_low_threshold=0.5, # Soft NMS low threshold
soft_nms_high_threshold=0.9, # Soft NMS high threshold
post_process_top_k=100) # Top k proposals in post process
# model training and testing settings
train_cfg = None # Config of training hyperparameters for BMN
test_cfg = dict(average_clips='score') # Config for testing hyperparameters for BMN
# dataset settings
dataset_type = 'ActivityNetDataset' # Type of dataset for training, validation and testing
data_root = 'data/activitynet_feature_cuhk/csv_mean_100/' # Root path to data for training
data_root_val = 'data/activitynet_feature_cuhk/csv_mean_100/' # Root path to data for validation and testing
ann_file_train = 'data/ActivityNet/anet_anno_train.json' # Path to the annotation file for training
ann_file_val = 'data/ActivityNet/anet_anno_val.json' # Path to the annotation file for validation
ann_file_test = 'data/ActivityNet/anet_anno_test.json' # Path to the annotation file for testing
train_pipeline = [ # List of training pipeline steps
dict(type='LoadLocalizationFeature'), # Load localization feature pipeline
dict(type='GenerateLocalizationLabels'), # Generate localization labels pipeline
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the localizer
keys=['raw_feature', 'gt_bbox'], # Keys of input
meta_name='video_meta', # Meta name
meta_keys=['video_name']), # Meta keys of input
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['raw_feature']), # Keys to be converted from image to tensor
dict( # Config of ToDataContainer
type='ToDataContainer', # Pipeline to convert the data to DataContainer
fields=[dict(key='gt_bbox', stack=False, cpu_only=True)]) # Required fields to be converted with keys and attributes
]
val_pipeline = [ # List of validation pipeline steps
dict(type='LoadLocalizationFeature'), # Load localization feature pipeline
dict(type='GenerateLocalizationLabels'), # Generate localization labels pipeline
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the localizer
keys=['raw_feature', 'gt_bbox'], # Keys of input
meta_name='video_meta', # Meta name
meta_keys=[
'video_name', 'duration_second', 'duration_frame', 'annotations',
'feature_frame'
]), # Meta keys of input
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['raw_feature']), # Keys to be converted from image to tensor
dict( # Config of ToDataContainer
type='ToDataContainer', # Pipeline to convert the data to DataContainer
fields=[dict(key='gt_bbox', stack=False, cpu_only=True)]) # Required fields to be converted with keys and attributes
]
test_pipeline = [ # List of testing pipeline steps
dict(type='LoadLocalizationFeature'), # Load localization feature pipeline
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the localizer
keys=['raw_feature'], # Keys of input
meta_name='video_meta', # Meta name
meta_keys=[
'video_name', 'duration_second', 'duration_frame', 'annotations',
'feature_frame'
]), # Meta keys of input
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['raw_feature']), # Keys to be converted from image to tensor
]
data = dict( # Config of data
videos_per_gpu=8, # Batch size of each single GPU
workers_per_gpu=8, # Workers to pre-fetch data for each single GPU
train_dataloader=dict( # Additional config of train dataloader
drop_last=True), # Whether to drop out the last batch of data in training
val_dataloader=dict( # Additional config of validation dataloader
videos_per_gpu=1), # Batch size of each single GPU during evaluation
test_dataloader=dict( # Additional config of test dataloader
videos_per_gpu=2), # Batch size of each single GPU during testing
test=dict( # Testing dataset config
type=dataset_type,
ann_file=ann_file_test,
pipeline=test_pipeline,
data_prefix=data_root_val),
val=dict( # Validation dataset config
type=dataset_type,
ann_file=ann_file_val,
pipeline=val_pipeline,
data_prefix=data_root_val),
train=dict( # Training dataset config
type=dataset_type,
ann_file=ann_file_train,
pipeline=train_pipeline,
data_prefix=data_root))
# optimizer
optimizer = dict(
# Config used to build optimizer, support (1). All the optimizers in PyTorch
# whose arguments are also the same as those in PyTorch. (2). Custom optimizers
# which are built on `constructor`, referring to "tutorials/5_new_modules.md"
# for implementation.
type='Adam', # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
lr=0.001, # Learning rate, see detail usages of the parameters in the documentation of PyTorch
weight_decay=0.0001) # Weight decay of Adam
optimizer_config = dict( # Config used to build the optimizer hook
grad_clip=None) # Most of the methods do not use gradient clip
# learning policy
lr_config = dict( # Learning rate scheduler config used to register LrUpdater hook
policy='step', # Policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
step=7) # Steps to decay the learning rate
total_epochs = 9 # Total epochs to train the model
checkpoint_config = dict( # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
interval=1) # Interval to save checkpoint
evaluation = dict( # Config of evaluation during training
interval=1, # Interval to perform evaluation
metrics=['AR@AN']) # Metrics to be performed
log_config = dict( # Config to register logger hook
interval=50, # Interval to print the log
hooks=[ # Hooks to be implemented during training
dict(type='TextLoggerHook'), # The logger used to record the training process
# dict(type='TensorboardLoggerHook'), # The Tensorboard logger is also supported
])
# runtime settings
dist_params = dict(backend='nccl') # Parameters to setup distributed training, the port can also be set
log_level = 'INFO' # The level of logging
work_dir = './work_dirs/bmn_400x100_2x8_9e_activitynet_feature/' # Directory to save the model checkpoints and logs for the current experiments
load_from = None # Load a model from a given path as a pre-trained model. This will not resume training
resume_from = None # Resume from a given checkpoint path; training will be resumed from the epoch at which the checkpoint was saved
workflow = [('train', 1)] # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once
output_config = dict( # Config of localization output
out=f'{work_dir}/results.json', # Path to output file
output_format='json') # File format of output file
```
### Config System for Action Recognition
We incorporate modular design into our config system,
which makes it convenient to conduct various experiments.
- An Example of TSN
To help the users have a basic idea of a complete config structure and the modules in an action recognition system,
we make brief comments on the config of TSN as follows.
For more detailed usage and alternatives for each parameter in each module, please refer to the API documentation.
```python
# model settings
model = dict( # Config of the model
type='Recognizer2D', # Type of the recognizer
backbone=dict( # Dict for backbone
type='ResNet', # Name of the backbone
pretrained='torchvision://resnet50', # The url/site of the pretrained model
depth=50, # Depth of ResNet model
norm_eval=False), # Whether to set BN layers to eval mode when training
cls_head=dict( # Dict for classification head
type='TSNHead', # Name of classification head
num_classes=400, # Number of classes to be classified.
in_channels=2048, # The input channels of classification head.
spatial_type='avg', # Type of pooling in spatial dimension
consensus=dict(type='AvgConsensus', dim=1), # Config of consensus module
dropout_ratio=0.4, # Probability in dropout layer
        init_std=0.01), # Std value for linear layer initialization
# model training and testing settings
train_cfg=None, # Config of training hyperparameters for TSN
test_cfg=dict(average_clips=None)) # Config for testing hyperparameters for TSN.
# dataset settings
dataset_type = 'RawframeDataset' # Type of dataset for training, validation and testing
data_root = 'data/kinetics400/rawframes_train/' # Root path to data for training
data_root_val = 'data/kinetics400/rawframes_val/' # Root path to data for validation and testing
ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt' # Path to the annotation file for training
ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt' # Path to the annotation file for validation
ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt' # Path to the annotation file for testing
img_norm_cfg = dict( # Config of image normalization used in data pipeline
mean=[123.675, 116.28, 103.53], # Mean values of different channels to normalize
std=[58.395, 57.12, 57.375], # Std values of different channels to normalize
to_bgr=False) # Whether to convert channels from RGB to BGR
train_pipeline = [ # List of training pipeline steps
dict( # Config of SampleFrames
type='SampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=1, # Frames of each sampled output clip
frame_interval=1, # Temporal interval of adjacent sampled frames
num_clips=3), # Number of clips to be sampled
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(-1, 256)), # The scale to resize images
dict( # Config of MultiScaleCrop
type='MultiScaleCrop', # Multi scale crop pipeline, cropping images with a list of randomly selected scales
input_size=224, # Input size of the network
scales=(1, 0.875, 0.75, 0.66), # Scales of width and height to be selected
random_crop=False, # Whether to randomly sample cropping bbox
max_wh_scale_gap=1), # Maximum gap of w and h scale levels
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(224, 224), # The scale to resize images
        keep_ratio=False), # Whether to keep the aspect ratio when resizing
dict( # Config of Flip
type='Flip', # Flip Pipeline
        flip_ratio=0.5), # Probability of applying flip
dict( # Config of Normalize
type='Normalize', # Normalize pipeline
**img_norm_cfg), # Config of image normalization
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format
input_format='NCHW'), # Final image shape format
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the recognizer
keys=['imgs', 'label'], # Keys of input
meta_keys=[]), # Meta keys of input
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['imgs', 'label']) # Keys to be converted from image to tensor
]
val_pipeline = [ # List of validation pipeline steps
dict( # Config of SampleFrames
type='SampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=1, # Frames of each sampled output clip
frame_interval=1, # Temporal interval of adjacent sampled frames
num_clips=3, # Number of clips to be sampled
test_mode=True), # Whether to set test mode in sampling
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(-1, 256)), # The scale to resize images
dict( # Config of CenterCrop
type='CenterCrop', # Center crop pipeline, cropping the center area from images
crop_size=224), # The size to crop images
dict( # Config of Flip
type='Flip', # Flip pipeline
        flip_ratio=0), # Probability of applying flip
dict( # Config of Normalize
type='Normalize', # Normalize pipeline
**img_norm_cfg), # Config of image normalization
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format
input_format='NCHW'), # Final image shape format
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the recognizer
keys=['imgs', 'label'], # Keys of input
meta_keys=[]), # Meta keys of input
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['imgs']) # Keys to be converted from image to tensor
]
test_pipeline = [ # List of testing pipeline steps
dict( # Config of SampleFrames
type='SampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=1, # Frames of each sampled output clip
frame_interval=1, # Temporal interval of adjacent sampled frames
num_clips=25, # Number of clips to be sampled
test_mode=True), # Whether to set test mode in sampling
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(-1, 256)), # The scale to resize images
dict( # Config of TenCrop
        type='TenCrop', # Ten crop pipeline, cropping ten areas from images
crop_size=224), # The size to crop images
dict( # Config of Flip
type='Flip', # Flip pipeline
        flip_ratio=0), # Probability of applying flip
dict( # Config of Normalize
type='Normalize', # Normalize pipeline
**img_norm_cfg), # Config of image normalization
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format
input_format='NCHW'), # Final image shape format
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the recognizer
keys=['imgs', 'label'], # Keys of input
meta_keys=[]), # Meta keys of input
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['imgs']) # Keys to be converted from image to tensor
]
data = dict( # Config of data
videos_per_gpu=32, # Batch size of each single GPU
workers_per_gpu=2, # Workers to pre-fetch data for each single GPU
train_dataloader=dict( # Additional config of train dataloader
drop_last=True), # Whether to drop out the last batch of data in training
val_dataloader=dict( # Additional config of validation dataloader
videos_per_gpu=1), # Batch size of each single GPU during evaluation
test_dataloader=dict( # Additional config of test dataloader
videos_per_gpu=2), # Batch size of each single GPU during testing
train=dict( # Training dataset config
type=dataset_type,
ann_file=ann_file_train,
data_prefix=data_root,
pipeline=train_pipeline),
val=dict( # Validation dataset config
type=dataset_type,
ann_file=ann_file_val,
data_prefix=data_root_val,
pipeline=val_pipeline),
test=dict( # Testing dataset config
type=dataset_type,
ann_file=ann_file_test,
data_prefix=data_root_val,
pipeline=test_pipeline))
# optimizer
optimizer = dict(
# Config used to build optimizer, support (1). All the optimizers in PyTorch
# whose arguments are also the same as those in PyTorch. (2). Custom optimizers
# which are built on `constructor`, referring to "tutorials/5_new_modules.md"
# for implementation.
type='SGD', # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
lr=0.01, # Learning rate, see detail usages of the parameters in the documentation of PyTorch
    momentum=0.9, # Momentum
weight_decay=0.0001) # Weight decay of SGD
optimizer_config = dict( # Config used to build the optimizer hook
grad_clip=dict(max_norm=40, norm_type=2)) # Use gradient clip
# learning policy
lr_config = dict( # Learning rate scheduler config used to register LrUpdater hook
policy='step', # Policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
step=[40, 80]) # Steps to decay the learning rate
total_epochs = 100 # Total epochs to train the model
checkpoint_config = dict( # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
interval=5) # Interval to save checkpoint
evaluation = dict( # Config of evaluation during training
interval=5, # Interval to perform evaluation
metrics=['top_k_accuracy', 'mean_class_accuracy'], # Metrics to be performed
metric_options=dict(top_k_accuracy=dict(topk=(1, 3))), # Set top-k accuracy to 1 and 3 during validation
save_best='top_k_accuracy') # set `top_k_accuracy` as key indicator to save best checkpoint
eval_config = dict(
metric_options=dict(top_k_accuracy=dict(topk=(1, 3)))) # Set top-k accuracy to 1 and 3 during testing. You can also use `--eval top_k_accuracy` to assign evaluation metrics
log_config = dict( # Config to register logger hook
interval=20, # Interval to print the log
hooks=[ # Hooks to be implemented during training
dict(type='TextLoggerHook'), # The logger used to record the training process
# dict(type='TensorboardLoggerHook'), # The Tensorboard logger is also supported
])
# runtime settings
dist_params = dict(backend='nccl') # Parameters to setup distributed training, the port can also be set
log_level = 'INFO' # The level of logging
work_dir = './work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/' # Directory to save the model checkpoints and logs for the current experiments
load_from = None # Load a model from a given path as a pre-trained model. This will not resume training
resume_from = None # Resume from a given checkpoint path; training will be resumed from the epoch at which the checkpoint was saved
workflow = [('train', 1)] # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once
```
### Config System for Spatio-Temporal Action Detection
We incorporate modular design into our config system, which makes it convenient to conduct various experiments.
- An Example of FastRCNN
To help the users have a basic idea of a complete config structure and the modules in a spatio-temporal action detection system,
we make brief comments on the config of FastRCNN as follows.
For more detailed usage and alternatives for each parameter in each module, please refer to the API documentation.
```python
# model setting
model = dict( # Config of the model
type='FastRCNN', # Type of the detector
backbone=dict( # Dict for backbone
type='ResNet3dSlowOnly', # Name of the backbone
depth=50, # Depth of ResNet model
pretrained=None, # The url/site of the pretrained model
pretrained2d=False, # If the pretrained model is 2D
lateral=False, # If the backbone is with lateral connections
num_stages=4, # Stages of ResNet model
conv1_kernel=(1, 7, 7), # Conv1 kernel size
conv1_stride_t=1, # Conv1 temporal stride
pool1_stride_t=1, # Pool1 temporal stride
spatial_strides=(1, 2, 2, 1)), # The spatial stride for each ResNet stage
roi_head=dict( # Dict for roi_head
type='AVARoIHead', # Name of the roi_head
bbox_roi_extractor=dict( # Dict for bbox_roi_extractor
type='SingleRoIExtractor3D', # Name of the bbox_roi_extractor
roi_layer_type='RoIAlign', # Type of the RoI op
output_size=8, # Output feature size of the RoI op
with_temporal_pool=True), # If temporal dim is pooled
bbox_head=dict( # Dict for bbox_head
type='BBoxHeadAVA', # Name of the bbox_head
in_channels=2048, # Number of channels of the input feature
num_classes=81, # Number of action classes + 1
multilabel=True, # If the dataset is multilabel
dropout_ratio=0.5)), # The dropout ratio used
# model training and testing settings
train_cfg=dict( # Training config of FastRCNN
rcnn=dict( # Dict for rcnn training config
assigner=dict( # Dict for assigner
type='MaxIoUAssignerAVA', # Name of the assigner
pos_iou_thr=0.9, # IoU threshold for positive examples, > pos_iou_thr -> positive
neg_iou_thr=0.9, # IoU threshold for negative examples, < neg_iou_thr -> negative
min_pos_iou=0.9), # Minimum acceptable IoU for positive examples
sampler=dict( # Dict for sample
type='RandomSampler', # Name of the sampler
num=32, # Batch Size of the sampler
pos_fraction=1, # Positive bbox fraction of the sampler
neg_pos_ub=-1, # Upper bound of the ratio of num negative to num positive
add_gt_as_proposals=True), # Add gt bboxes as proposals
pos_weight=1.0, # Loss weight of positive examples
debug=False)), # Debug mode
test_cfg=dict( # Testing config of FastRCNN
rcnn=dict( # Dict for rcnn testing config
action_thr=0.002))) # The threshold of an action
# dataset settings
dataset_type = 'AVADataset' # Type of dataset for training, validation and testing
data_root = 'data/ava/rawframes' # Root path to data
anno_root = 'data/ava/annotations' # Root path to annotations
ann_file_train = f'{anno_root}/ava_train_v2.1.csv' # Path to the annotation file for training
ann_file_val = f'{anno_root}/ava_val_v2.1.csv' # Path to the annotation file for validation
exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv' # Path to the exclude annotation file for training
exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv' # Path to the exclude annotation file for validation
label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt' # Path to the label file
proposal_file_train = f'{anno_root}/ava_dense_proposals_train.FAIR.recall_93.9.pkl' # Path to the human detection proposals for training examples
proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl' # Path to the human detection proposals for validation examples
img_norm_cfg = dict( # Config of image normalization used in data pipeline
mean=[123.675, 116.28, 103.53], # Mean values of different channels to normalize
std=[58.395, 57.12, 57.375], # Std values of different channels to normalize
to_bgr=False) # Whether to convert channels from RGB to BGR
train_pipeline = [ # List of training pipeline steps
dict( # Config of SampleFrames
type='AVASampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=4, # Frames of each sampled output clip
frame_interval=16), # Temporal interval of adjacent sampled frames
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices
dict( # Config of RandomRescale
        type='RandomRescale', # Randomly rescale the short edge within a given range
        scale_range=(256, 320)), # The short-edge size range of RandomRescale
dict( # Config of RandomCrop
type='RandomCrop', # Randomly crop a patch with the given size
size=256), # The size of the cropped patch
dict( # Config of Flip
type='Flip', # Flip Pipeline
        flip_ratio=0.5), # Probability of applying flip
dict( # Config of Normalize
type='Normalize', # Normalize pipeline
**img_norm_cfg), # Config of image normalization
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format
input_format='NCTHW', # Final image shape format
collapse=True), # Collapse the dim N if N == 1
dict( # Config of Rename
type='Rename', # Rename keys
mapping=dict(imgs='img')), # The old name to new name mapping
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), # Keys to be converted from image to tensor
dict( # Config of ToDataContainer
type='ToDataContainer', # Convert other types to DataContainer type pipeline
fields=[ # Fields to convert to DataContainer
dict( # Dict of fields
                key=['proposals', 'gt_bboxes', 'gt_labels'], # Keys to convert to DataContainer
                stack=False)]), # Whether to stack these tensors
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the detector
keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'], # Keys of input
meta_keys=['scores', 'entity_ids']), # Meta keys of input
]
val_pipeline = [ # List of validation pipeline steps
dict( # Config of SampleFrames
type='AVASampleFrames', # Sample frames pipeline, sampling frames from video
clip_len=4, # Frames of each sampled output clip
        frame_interval=16), # Temporal interval of adjacent sampled frames
dict( # Config of RawFrameDecode
type='RawFrameDecode'), # Load and decode Frames pipeline, picking raw frames with given indices
dict( # Config of Resize
type='Resize', # Resize pipeline
scale=(-1, 256)), # The scale to resize images
dict( # Config of Normalize
type='Normalize', # Normalize pipeline
**img_norm_cfg), # Config of image normalization
dict( # Config of FormatShape
type='FormatShape', # Format shape pipeline, Format final image shape to the given input_format
input_format='NCTHW', # Final image shape format
collapse=True), # Collapse the dim N if N == 1
dict( # Config of Rename
type='Rename', # Rename keys
mapping=dict(imgs='img')), # The old name to new name mapping
dict( # Config of ToTensor
type='ToTensor', # Convert other types to tensor type pipeline
keys=['img', 'proposals']), # Keys to be converted from image to tensor
dict( # Config of ToDataContainer
type='ToDataContainer', # Convert other types to DataContainer type pipeline
fields=[ # Fields to convert to DataContainer
dict( # Dict of fields
                key=['proposals'], # Keys to convert to DataContainer
                stack=False)]), # Whether to stack these tensors
dict( # Config of Collect
type='Collect', # Collect pipeline that decides which keys in the data should be passed to the detector
keys=['img', 'proposals'], # Keys of input
meta_keys=['scores', 'entity_ids'], # Meta keys of input
nested=True) # Whether to wrap the data in a nested list
]
data = dict( # Config of data
videos_per_gpu=16, # Batch size of each single GPU
workers_per_gpu=2, # Workers to pre-fetch data for each single GPU
val_dataloader=dict( # Additional config of validation dataloader
videos_per_gpu=1), # Batch size of each single GPU during evaluation
train=dict( # Training dataset config
type=dataset_type,
ann_file=ann_file_train,
exclude_file=exclude_file_train,
pipeline=train_pipeline,
label_file=label_file,
proposal_file=proposal_file_train,
person_det_score_thr=0.9,
data_prefix=data_root),
val=dict( # Validation dataset config
type=dataset_type,
ann_file=ann_file_val,
exclude_file=exclude_file_val,
pipeline=val_pipeline,
label_file=label_file,
proposal_file=proposal_file_val,
person_det_score_thr=0.9,
data_prefix=data_root))
data['test'] = data['val'] # Set test_dataset as val_dataset
# optimizer
optimizer = dict(
# Config used to build optimizer, support (1). All the optimizers in PyTorch
# whose arguments are also the same as those in PyTorch. (2). Custom optimizers
# which are built on `constructor`, referring to "tutorials/5_new_modules.md"
# for implementation.
type='SGD', # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
lr=0.2, # Learning rate, see detail usages of the parameters in the documentation of PyTorch (for 8gpu)
    momentum=0.9, # Momentum
weight_decay=0.00001) # Weight decay of SGD
optimizer_config = dict( # Config used to build the optimizer hook
grad_clip=dict(max_norm=40, norm_type=2)) # Use gradient clip
lr_config = dict( # Learning rate scheduler config used to register LrUpdater hook
policy='step', # Policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
step=[40, 80], # Steps to decay the learning rate
warmup='linear', # Warmup strategy
    warmup_by_epoch=True, # Whether warmup_iters counts epochs (True) or iterations (False)
warmup_iters=5, # Number of iters or epochs for warmup
warmup_ratio=0.1) # The initial learning rate is warmup_ratio * lr
total_epochs = 20 # Total epochs to train the model
checkpoint_config = dict( # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
interval=1) # Interval to save checkpoint
workflow = [('train', 1)] # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once
evaluation = dict( # Config of evaluation during training
interval=1, save_best='mAP@0.5IOU') # Interval to perform evaluation and the key for saving best checkpoint
log_config = dict( # Config to register logger hook
interval=20, # Interval to print the log
hooks=[ # Hooks to be implemented during training
dict(type='TextLoggerHook'), # The logger used to record the training process
])
# runtime settings
dist_params = dict(backend='nccl') # Parameters to setup distributed training, the port can also be set
log_level = 'INFO' # The level of logging
work_dir = ('./work_dirs/ava/' # Directory to save the model checkpoints and logs for the current experiments
'slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb')
load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/' # load models as a pre-trained model from a given path. This will not resume training
'slowonly_r50_4x16x1_256e_kinetics400_rgb/'
'slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth')
resume_from = None # Resume from a given checkpoint path; training will be resumed from the epoch at which the checkpoint was saved
```
## FAQ
### Use intermediate variables in configs
Some intermediate variables are used in the config files, like `train_pipeline`/`val_pipeline`/`test_pipeline`,
`ann_file_train`/`ann_file_val`/`ann_file_test`, `img_norm_cfg` etc.
For example, we would like to first define `train_pipeline`/`val_pipeline`/`test_pipeline` and pass them into `data`.
Thus, `train_pipeline`/`val_pipeline`/`test_pipeline` are intermediate variables.
We also define `ann_file_train`/`ann_file_val`/`ann_file_test` and `data_root`/`data_root_val` to provide the data pipeline with some
basic information.
In addition, we use `img_norm_cfg` as an intermediate variable to construct data augmentation components.
```python
...
dataset_type = 'RawframeDataset'
data_root = 'data/kinetics400/rawframes_train'
data_root_val = 'data/kinetics400/rawframes_val'
ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt'
ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
dict(type='RawFrameDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(
type='MultiScaleCrop',
input_size=224,
scales=(1, 0.8),
random_crop=False,
max_wh_scale_gap=0),
dict(type='Resize', scale=(224, 224), keep_ratio=False),
dict(type='Flip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
dict(
type='SampleFrames',
clip_len=32,
frame_interval=2,
num_clips=1,
test_mode=True),
dict(type='RawFrameDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
dict(
type='SampleFrames',
clip_len=32,
frame_interval=2,
num_clips=10,
test_mode=True),
dict(type='RawFrameDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='ThreeCrop', crop_size=256),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
data = dict(
videos_per_gpu=8,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=ann_file_train,
data_prefix=data_root,
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix=data_root_val,
pipeline=val_pipeline),
test=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix=data_root_val,
pipeline=test_pipeline))
```
# Tutorial 2: Finetuning Models
This tutorial provides instructions for using pre-trained models
to finetune on other datasets, so that better performance can be achieved.
<!-- TOC -->
- [Tutorial 2: Finetuning Models](#tutorial-2-finetuning-models)
- [Outline](#outline)
- [Modify Head](#modify-head)
- [Modify Dataset](#modify-dataset)
- [Modify Training Schedule](#modify-training-schedule)
- [Use Pre-Trained Model](#use-pre-trained-model)
<!-- TOC -->
## Outline
There are two steps to finetune a model on a new dataset.
1. Add support for the new dataset. See [Tutorial 3: Adding New Dataset](3_new_dataset.md).
2. Modify the configs. This will be discussed in this tutorial.
For example, if the users want to finetune a model pre-trained on the Kinetics-400 dataset to another dataset, say UCF101,
then four parts in the config (see [here](1_config.md)) need attention.
## Modify Head
The `num_classes` in the `cls_head` needs to be changed to the class number of the new dataset.
The weights of the pre-trained models are reused except for the final prediction layer.
So it is safe to change the class number.
In our case, UCF101 has 101 classes.
So we change it from 400 (class number of Kinetics-400) to 101.
```python
model = dict(
type='Recognizer2D',
backbone=dict(
type='ResNet',
pretrained='torchvision://resnet50',
depth=50,
norm_eval=False),
cls_head=dict(
type='TSNHead',
num_classes=101, # change from 400 to 101
in_channels=2048,
spatial_type='avg',
consensus=dict(type='AvgConsensus', dim=1),
dropout_ratio=0.4,
init_std=0.01),
train_cfg=None,
test_cfg=dict(average_clips=None))
```
Note that the `pretrained='torchvision://resnet50'` setting only initializes the backbone.
If you are training a new model from ImageNet-pretrained weights, this is what you want.
However, this setting is not what we need for finetuning the whole network:
that is done via `load_from`, which will be discussed later.
## Modify Dataset
MMAction2 supports UCF101, Kinetics-400, Moments in Time, Multi-Moments in Time, THUMOS14,
Something-Something V1&V2, ActivityNet Dataset.
The users may need to adapt one of the above datasets to fit their own dataset.
In our case, UCF101 is already supported by various dataset types, like `RawframeDataset`,
so we change the config as follows.
```python
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/ucf101/rawframes_train/'
data_root_val = 'data/ucf101/rawframes_val/'
ann_file_train = 'data/ucf101/ucf101_train_list.txt'
ann_file_val = 'data/ucf101/ucf101_val_list.txt'
ann_file_test = 'data/ucf101/ucf101_val_list.txt'
```
## Modify Training Schedule
Finetuning usually requires a smaller learning rate and fewer training epochs.
```python
# optimizer
optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001) # change from 0.01 to 0.005
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[20, 40])
total_epochs = 50 # change from 100 to 50
checkpoint_config = dict(interval=5)
```
## Use Pre-Trained Model
To use the pre-trained model for the whole network, the new config adds the link to the pre-trained model in `load_from`.
We set `load_from=None` by default in `configs/_base_/default_runtime.py`; owing to the [inheritance design](/docs/en/tutorials/1_config.md), users can directly override it by setting `load_from` in their configs.
```python
# use the pre-trained model for the whole TSN network
load_from = 'https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/mmaction-v1/recognition/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth' # model path can be found in model zoo
```
# Tutorial 3: Adding New Dataset
In this tutorial, we will introduce some methods for customizing your own dataset by reorganizing data and mixing datasets for the project.
<!-- TOC -->
- [Tutorial 3: Adding New Dataset](#tutorial-3-adding-new-dataset)
- [Customize Datasets by Reorganizing Data](#customize-datasets-by-reorganizing-data)
- [Reorganize datasets to existing format](#reorganize-datasets-to-existing-format)
- [An example of a custom dataset](#an-example-of-a-custom-dataset)
- [Customize Dataset by Mixing Dataset](#customize-dataset-by-mixing-dataset)
- [Repeat dataset](#repeat-dataset)
<!-- TOC -->
## Customize Datasets by Reorganizing Data
### Reorganize datasets to existing format
The simplest way is to convert your dataset to existing dataset formats (RawframeDataset or VideoDataset).
There are three kinds of annotation files.
- rawframe annotation
The annotation of a rawframe dataset is a text file with multiple lines,
where each line indicates the `frame_directory` (relative path), `total_frames` and `label` of a video,
separated by whitespace.
Here is an example (a minimal sketch for generating such a list follows the annotation examples below).
```
some/directory-1 163 1
some/directory-2 122 1
some/directory-3 258 2
some/directory-4 234 2
some/directory-5 295 3
some/directory-6 121 3
```
- video annotation
The annotation of a video dataset is a text file with multiple lines,
where each line indicates a sample video with its `filepath` (relative path) and `label`,
separated by whitespace.
Here is an example.
```
some/path/000.mp4 1
some/path/001.mp4 1
some/path/002.mp4 2
some/path/003.mp4 2
some/path/004.mp4 3
some/path/005.mp4 3
```
- ActivityNet annotation
The annotation of the ActivityNet dataset is a JSON file. Each key is a video name,
and the corresponding value is the metadata and annotations for the video.
Here is an example.
```
{
"video1": {
"duration_second": 211.53,
"duration_frame": 6337,
"annotations": [
{
"segment": [
30.025882995319815,
205.2318595943838
],
"label": "Rock climbing"
}
],
"feature_frame": 6336,
"fps": 30.0,
"rfps": 29.9579255898
},
"video2": {
"duration_second": 26.75,
"duration_frame": 647,
"annotations": [
{
"segment": [
2.578755070202808,
24.914101404056165
],
"label": "Drinking beer"
}
],
"feature_frame": 624,
"fps": 24.0,
"rfps": 24.1869158879
}
}
```
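The rawframe annotation list above can be produced with a short script. Here is a minimal sketch; the directory layout `data/custom/rawframes/<video>/` and the `label_map` dictionary are assumptions for illustration only.
```python
import os
import os.path as osp

# Assumes frames have already been extracted to data/custom/rawframes/<video>/
# and that you provide a video-name -> class-index mapping yourself.
data_root = 'data/custom/rawframes'
label_map = {'video1': 0, 'video2': 1}  # hypothetical mapping

with open('custom_rawframe_list.txt', 'w') as fout:
    for video, label in label_map.items():
        total_frames = len(os.listdir(osp.join(data_root, video)))
        fout.write(f'{video} {total_frames} {label}\n')
```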
There are two ways to work with custom datasets.
- online conversion
You can write a new Dataset class inheriting from [BaseDataset](/mmaction/datasets/base.py), and overwrite three methods
`load_annotations(self)`, `evaluate(self, results, metrics, logger)` and `dump_results(self, results, out)`,
like [RawframeDataset](/mmaction/datasets/rawframe_dataset.py), [VideoDataset](/mmaction/datasets/video_dataset.py) or [ActivityNetDataset](/mmaction/datasets/activitynet_dataset.py).
- offline conversion
You can convert the annotation format to one of the expected formats above and save it to
a pickle or JSON file; then you can simply use `RawframeDataset`, `VideoDataset` or `ActivityNetDataset`, as in the sketch below.
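For instance, here is a minimal offline-conversion sketch. The input file `my_annotations.csv` and its `filename,label` layout are hypothetical and only for illustration; the output follows the video annotation format described above.
```python
import csv

# Hypothetical input: a CSV file with `filename,label` rows.
# Output: a video annotation list in the "filepath label" format above.
with open('my_annotations.csv') as fin, open('custom_video_list.txt', 'w') as fout:
    for filename, label in csv.reader(fin):
        fout.write(f'{filename} {int(label)}\n')
```
The resulting `custom_video_list.txt` can then be consumed directly by `VideoDataset`.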
After the data pre-processing, the users need to further modify the config files to use the dataset.
Here is an example of using a custom dataset in rawframe format.
In `configs/task/method/my_custom_config.py`:
```python
...
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'path/to/your/root'
data_root_val = 'path/to/your/root_val'
ann_file_train = 'data/custom/custom_train_list.txt'
ann_file_val = 'data/custom/custom_val_list.txt'
ann_file_test = 'data/custom/custom_val_list.txt'
...
data = dict(
videos_per_gpu=32,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=ann_file_train,
...),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
...),
test=dict(
type=dataset_type,
ann_file=ann_file_test,
...))
...
```
We use this approach to support the rawframe datasets.
### An example of a custom dataset
Assume the annotations are in a new text format, and the image file names follow a template like `img_00005.jpg`.
The video annotations are stored in a text file `annotation.txt` as follows:
```
directory,total frames,class
D32_1gwq35E,299,66
-G-5CJ0JkKY,249,254
T4h1bvOd9DA,299,33
4uZ27ivBl00,299,341
0LfESFkfBSw,249,186
-YIsNpBEx6c,299,169
```
We can create a new dataset in `mmaction/datasets/my_dataset.py` to load the data.
```python
import copy
import os.path as osp
import mmcv
from .base import BaseDataset
from .builder import DATASETS
@DATASETS.register_module()
class MyDataset(BaseDataset):
def __init__(self,
ann_file,
pipeline,
data_prefix=None,
test_mode=False,
filename_tmpl='img_{:05}.jpg'):
        super().__init__(ann_file, pipeline, data_prefix=data_prefix, test_mode=test_mode)
self.filename_tmpl = filename_tmpl
def load_annotations(self):
video_infos = []
with open(self.ann_file, 'r') as fin:
for line in fin:
if line.startswith("directory"):
continue
frame_dir, total_frames, label = line.split(',')
if self.data_prefix is not None:
frame_dir = osp.join(self.data_prefix, frame_dir)
video_infos.append(
dict(
frame_dir=frame_dir,
total_frames=int(total_frames),
label=int(label)))
return video_infos
def prepare_train_frames(self, idx):
results = copy.deepcopy(self.video_infos[idx])
results['filename_tmpl'] = self.filename_tmpl
return self.pipeline(results)
def prepare_test_frames(self, idx):
results = copy.deepcopy(self.video_infos[idx])
results['filename_tmpl'] = self.filename_tmpl
return self.pipeline(results)
def evaluate(self,
results,
metrics='top_k_accuracy',
topk=(1, 5),
logger=None):
pass
```
Then in the config, to use `MyDataset` you can modify the config as the following
```python
dataset_A_train = dict(
type='MyDataset',
ann_file=ann_file_train,
pipeline=train_pipeline
)
```
## Customize Dataset by Mixing Dataset
MMAction2 also supports mixing datasets for training. Currently it supports repeating a dataset.
### Repeat dataset
We use `RepeatDataset` as a wrapper to repeat the dataset. For example, suppose the original dataset is `Dataset_A`;
to repeat it, the config looks like the following
```python
dataset_A_train = dict(
type='RepeatDataset',
times=N,
dataset=dict( # This is the original config of Dataset_A
type='Dataset_A',
...
pipeline=train_pipeline
)
)
```
# Tutorial 4: Customize Data Pipelines
In this tutorial, we will introduce the design of data pipelines, and how to customize and extend your own data pipelines for the project.
<!-- TOC -->
- [Tutorial 4: Customize Data Pipelines](#tutorial-4-customize-data-pipelines)
- [Design of Data Pipelines](#design-of-data-pipelines)
- [Data loading](#data-loading)
- [Pre-processing](#pre-processing)
- [Formatting](#formatting)
- [Extend and Use Custom Pipelines](#extend-and-use-custom-pipelines)
<!-- TOC -->
## Design of Data Pipelines
Following typical conventions, we use `Dataset` and `DataLoader` for data loading
with multiple workers. `Dataset` returns a dict of data items corresponding
to the arguments of the model's forward method.
Since the data in action recognition & localization may not be of the same size (image size, gt bbox size, etc.),
the `DataContainer` in MMCV is used to help collect and distribute data of different sizes.
See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.
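As a quick illustration, here is a minimal sketch of wrapping a variable-sized tensor in a `DataContainer` (the tensor shape is made up for illustration); this mirrors how `ToDataContainer` wraps `gt_bbox` in the localization config shown earlier.
```python
import torch
from mmcv.parallel import DataContainer as DC

# A hypothetical variable-length ground-truth bbox tensor: 5 proposals,
# each a (start, end) pair. stack=False tells the collate function to keep
# such items as a list instead of stacking them into one batched tensor.
gt_bbox = torch.rand(5, 2)
container = DC(gt_bbox, stack=False, cpu_only=True)
print(container.data.shape)  # torch.Size([5, 2])
```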
The data preparation pipeline and the dataset are decoupled. Usually a dataset
defines how to process the annotations, and a data pipeline defines all the steps to prepare a data dict.
A pipeline consists of a sequence of operations. Each operation takes a dict as input and outputs a dict for the next operation.
We present a typical pipeline in the following figure. The blue blocks are pipeline operations.
As the pipeline proceeds, each operator can add new keys (marked as green) to the result dict or update the existing keys (marked as orange).
![pipeline figure](https://github.com/open-mmlab/mmaction2/raw/master/resources/data_pipeline.png)
The operations are categorized into data loading, pre-processing and formatting.
Here is a pipeline example for TSN.
```python
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
dict(type='RawFrameDecode', io_backend='disk'),
dict(type='Resize', scale=(-1, 256)),
dict(
type='MultiScaleCrop',
input_size=224,
scales=(1, 0.875, 0.75, 0.66),
random_crop=False,
max_wh_scale_gap=1),
dict(type='Resize', scale=(224, 224), keep_ratio=False),
dict(type='Flip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
dict(
type='SampleFrames',
clip_len=1,
frame_interval=1,
num_clips=3,
test_mode=True),
dict(type='RawFrameDecode', io_backend='disk'),
dict(type='Resize', scale=(-1, 256)),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
dict(
type='SampleFrames',
clip_len=1,
frame_interval=1,
num_clips=25,
test_mode=True),
dict(type='RawFrameDecode', io_backend='disk'),
dict(type='Resize', scale=(-1, 256)),
dict(type='TenCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
```
We have supported some lazy operators and encourage users to apply them.
Lazy ops record how the data should be processed, but postpone the actual processing of the raw data until it reaches the `Fuse` stage.
Specifically, lazy ops avoid frequent read and modification operations on the raw data and process it only once in the final `Fuse` stage, thus accelerating data preprocessing.
Here is a pipeline example applying lazy ops.
```python
train_pipeline = [
dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
dict(type='RawFrameDecode', decoding_backend='turbojpeg'),
# The following three lazy ops only process the bbox of frames without
# modifying the raw data.
dict(type='Resize', scale=(-1, 256), lazy=True),
dict(
type='MultiScaleCrop',
input_size=224,
scales=(1, 0.8),
random_crop=False,
max_wh_scale_gap=0,
lazy=True),
dict(type='Resize', scale=(224, 224), keep_ratio=False, lazy=True),
    # The lazy operator `Flip` only records whether a frame should be flipped
    # and the flip direction.
    dict(type='Flip', flip_ratio=0.5, lazy=True),
    # Process the raw data once in the `Fuse` stage.
dict(type='Fuse'),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label'])
]
```
For each operation, we list the related dict fields that are added/updated/removed below, where `*` means the key may not be affected.
### Data loading
`SampleFrames`
- add: frame_inds, clip_len, frame_interval, num_clips, \*total_frames
`DenseSampleFrames`
- add: frame_inds, clip_len, frame_interval, num_clips, \*total_frames
`PyAVDecode`
- add: imgs, original_shape
- update: \*frame_inds
`DecordDecode`
- add: imgs, original_shape
- update: \*frame_inds
`OpenCVDecode`
- add: imgs, original_shape
- update: \*frame_inds
`RawFrameDecode`
- add: imgs, original_shape
- update: \*frame_inds
### Pre-processing
`RandomCrop`
- add: crop_bbox, img_shape
- update: imgs
`RandomResizedCrop`
- add: crop_bbox, img_shape
- update: imgs
`MultiScaleCrop`
- add: crop_bbox, img_shape, scales
- update: imgs
`Resize`
- add: img_shape, keep_ratio, scale_factor
- update: imgs
`Flip`
- add: flip, flip_direction
- update: imgs, label
`Normalize`
- add: img_norm_cfg
- update: imgs
`CenterCrop`
- add: crop_bbox, img_shape
- update: imgs
`ThreeCrop`
- add: crop_bbox, img_shape
- update: imgs
`TenCrop`
- add: crop_bbox, img_shape
- update: imgs
### Formatting
`ToTensor`
- update: specified by `keys`.
`ImageToTensor`
- update: specified by `keys`.
`Transpose`
- update: specified by `keys`.
`Collect`
- add: img_metas (the keys of img_metas is specified by `meta_keys`)
- remove: all other keys except for those specified by `keys`
It is **noteworthy** that the first key, commonly `imgs`, will be used as the main key to calculate the batch size.
`FormatShape`
- add: input_shape
- update: imgs
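To see the formatting behaviour concretely, the following minimal sketch builds a pipeline of only `Collect` and `ToTensor` and runs a hand-made results dict through it; the array shape and the extra key are made up for illustration.
```python
import numpy as np
from mmaction.datasets.pipelines import Compose

# A formatting-only pipeline: Collect keeps only the listed keys,
# ToTensor converts them to torch tensors.
pipeline = Compose([
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
])

results = dict(
    imgs=np.random.rand(3, 224, 224, 3).astype(np.float32),  # fake frames
    label=1,
    extra_key='removed by Collect')
out = pipeline(results)
print(sorted(out.keys()))  # only the collected keys remain
```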
## Extend and Use Custom Pipelines
1. Write a new pipeline in any file, e.g., `my_pipeline.py`. It takes a dict as input and returns a dict.
```python
from mmaction.datasets import PIPELINES
@PIPELINES.register_module()
class MyTransform:
    def __call__(self, results):
        # add or update fields of the results dict here
        results['key'] = 'value'
        return results
```
2. Import the new class.
```python
from .my_pipeline import MyTransform
```
3. Use it in config files.
```python
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='DenseSampleFrames', clip_len=8, frame_interval=8, num_clips=1),
dict(type='RawFrameDecode', io_backend='disk'),
dict(type='MyTransform'), # use a custom pipeline
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label'])
]
```
# Tutorial 5: Adding New Modules
In this tutorial, we will introduce some methods for customizing the optimizer, developing new components and adding a new learning rate scheduler for this project.
<!-- TOC -->
- [Tutorial 5: Adding New Modules](#tutorial-5-adding-new-modules)
- [Customize Optimizer](#customize-optimizer)
- [Customize Optimizer Constructor](#customize-optimizer-constructor)
- [Develop New Components](#develop-new-components)
- [Add new backbones](#add-new-backbones)
- [Add new heads](#add-new-heads)
- [Add new loss](#add-new-loss)
- [Add new learning rate scheduler (updater)](#add-new-learning-rate-scheduler-updater)
<!-- TOC -->
## Customize Optimizer
An example of a customized optimizer is [CopyOfSGD](/mmaction/core/optimizer/copy_of_sgd.py), defined in `mmaction/core/optimizer/copy_of_sgd.py`.
More generally, a customized optimizer could be defined as follows.
Assume you want to add an optimizer named `MyOptimizer`, which has arguments `a`, `b` and `c`.
You need to first implement the new optimizer in a file, e.g., in `mmaction/core/optimizer/my_optimizer.py`:
```python
from mmcv.runner import OPTIMIZERS
from torch.optim import Optimizer

@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):

    def __init__(self, a, b, c):
        # build the parameter groups and defaults from a, b and c,
        # then call Optimizer.__init__ here
        pass
```
Then add this module in `mmaction/core/optimizer/__init__.py`, so that the registry will find the new module and add it:
```python
from .my_optimizer import MyOptimizer
```
Then you can use `MyOptimizer` in `optimizer` field of config files.
In the configs, the optimizers are defined by the field `optimizer` like the following:
```python
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
```
To use your own optimizer, the field can be changed as
```python
optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)
```
We already support using all the optimizers implemented by PyTorch, and the only modification needed is to change the `optimizer` field of the config files.
For example, if you want to use `Adam` (though the performance may drop a lot), the modification could be as follows.
```python
optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001)
```
The users can directly set arguments following the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch.
## Customize Optimizer Constructor
Some models may have some parameter-specific settings for optimization, e.g. weight decay for BatchNorm layers.
The users can do such fine-grained parameter tuning by customizing the optimizer constructor.
You can write a new optimizer constructor inheriting from [DefaultOptimizerConstructor](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py)
and overwrite the `add_params(self, params, module)` method.
An example of a customized optimizer constructor is [TSMOptimizerConstructor](/mmaction/core/optimizer/tsm_optimizer_constructor.py).
More generally, a customized optimizer constructor could be defined as follows.
In `mmaction/core/optimizer/my_optimizer_constructor.py`:
```python
from mmcv.runner import OPTIMIZER_BUILDERS, DefaultOptimizerConstructor

@OPTIMIZER_BUILDERS.register_module()
class MyOptimizerConstructor(DefaultOptimizerConstructor):

    def add_params(self, params, module):
        # implement the parameter-specific settings (lr, weight decay, ...) here
        pass
```
In `mmaction/core/optimizer/__init__.py`:
```python
from .my_optimizer_constructor import MyOptimizerConstructor
```
Then you can use `MyOptimizerConstructor` in `optimizer` field of config files.
```python
# optimizer
optimizer = dict(
type='SGD',
constructor='MyOptimizerConstructor',
paramwise_cfg=dict(fc_lr5=True),
lr=0.02,
momentum=0.9,
weight_decay=0.0001)
```
## Develop New Components
We basically categorize model components into 4 types.
- recognizer: the whole recognizer model pipeline, usually contains a backbone and cls_head.
- backbone: usually an FCN network to extract feature maps, e.g., ResNet, BNInception.
- cls_head: the component for classification task, usually contains an FC layer with some pooling layers.
- localizer: the model for temporal localization task, currently available: BSN, BMN, SSN.
### Add new backbones
Here we show how to develop new components with an example of TSN.
1. Create a new file `mmaction/models/backbones/resnet.py`.
```python
import torch.nn as nn
from ..builder import BACKBONES
@BACKBONES.register_module()
class ResNet(nn.Module):
def __init__(self, arg1, arg2):
pass
def forward(self, x): # should return a tuple
pass
def init_weights(self, pretrained=None):
pass
```
2. Import the module in `mmaction/models/backbones/__init__.py`.
```python
from .resnet import ResNet
```
3. Use it in your config file.
```python
model = dict(
...
backbone=dict(
type='ResNet',
arg1=xxx,
arg2=xxx),
)
```
### Add new heads
Here we show how to develop a new head, using TSNHead as an example.
1. Create a new file `mmaction/models/heads/tsn_head.py`.
You can write a new classification head inheriting from [BaseHead](/mmaction/models/heads/base.py),
and overwrite `init_weights(self)` and `forward(self, x)` method.
```python
from ..builder import HEADS
from .base import BaseHead
@HEADS.register_module()
class TSNHead(BaseHead):
def __init__(self, arg1, arg2):
pass
def forward(self, x):
pass
def init_weights(self):
pass
```
2. Import the module in `mmaction/models/heads/__init__.py`
```python
from .tsn_head import TSNHead
```
3. Use it in your config file
```python
model = dict(
...
cls_head=dict(
type='TSNHead',
num_classes=400,
in_channels=2048,
arg1=xxx,
        arg2=xxx),
)
```
### Add new loss
Assume you want to add a new loss named `MyLoss`. To add a new loss function, the users need to implement it in `mmaction/models/losses/my_loss.py`.
```python
import torch
import torch.nn as nn
from ..builder import LOSSES
def my_loss(pred, target):
assert pred.size() == target.size() and target.numel() > 0
loss = torch.abs(pred - target)
return loss
@LOSSES.register_module()
class MyLoss(nn.Module):
def forward(self, pred, target):
loss = my_loss(pred, target)
return loss
```
Then the users need to add it in the `mmaction/models/losses/__init__.py`
```python
from .my_loss import MyLoss, my_loss
```
To use it, modify the `loss_xxx` field. Since MyLoss is for regression, we can use it for the bbox loss `loss_bbox`.
```python
loss_bbox=dict(type='MyLoss')
```
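As a quick sanity check (illustrative only, with made-up tensor shapes, and assuming `MyLoss` has been added as described above), the loss can also be called directly:
```python
import torch
from mmaction.models.losses.my_loss import MyLoss  # the class defined above

# MyLoss returns the element-wise L1 distance without reduction,
# so the output has the same shape as the inputs.
pred = torch.rand(4, 2)
target = torch.rand(4, 2)
print(MyLoss()(pred, target).shape)  # torch.Size([4, 2])
```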
## Add new learning rate scheduler (updater)
The default way to construct an lr updater (namely, a 'scheduler' in PyTorch convention) is to modify the config, for example:
```python
...
lr_config = dict(policy='step', step=[20, 40])
...
```
In the API of [`train.py`](/mmaction/apis/train.py), the learning rate updater hook is registered based on the config:
```python
...
runner.register_training_hooks(
cfg.lr_config,
optimizer_config,
cfg.checkpoint_config,
cfg.log_config,
cfg.get('momentum_config', None))
...
```
So far, the supported updaters can be found in [mmcv](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py), but if you want to customize a new learning rate updater, you may follow the steps below:
1. First, write your own LrUpdaterHook in `$MMAction2/mmaction/core/scheduler`. The following snippet is an example of a customized lr updater that uses a list of learning rates `lrs`: the learning rate switches to the next value in `lrs` at each step in `steps`:
```python
from mmcv.runner import HOOKS, LrUpdaterHook

@HOOKS.register_module()
# Register it here
class RelativeStepLrUpdaterHook(LrUpdaterHook):
    # You should inherit it from mmcv's LrUpdaterHook
def __init__(self, steps, lrs, **kwargs):
super().__init__(**kwargs)
assert len(steps) == (len(lrs))
self.steps = steps
self.lrs = lrs
def get_lr(self, runner, base_lr):
# Only this function is required to override
# This function is called before each training epoch, return the specific learning rate here.
progress = runner.epoch if self.by_epoch else runner.iter
        for i in range(len(self.steps)):
            if progress < self.steps[i]:
                return self.lrs[i]
        return self.lrs[-1]  # keep the last learning rate after the final step
```
2. Modify your config:
In your config file, swap the original `lr_config` by:
```python
lr_config = dict(policy='RelativeStep', steps=[20, 40, 60], lrs=[0.1, 0.01, 0.001])
```
More examples can be found in [mmcv](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py).
# Tutorial 6: Exporting a model to ONNX
Open Neural Network Exchange [(ONNX)](https://onnx.ai/) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves.
<!-- TOC -->
- [Tutorial 6: Exporting a model to ONNX](#tutorial-6-exporting-a-model-to-onnx)
- [Supported Models](#supported-models)
- [Usage](#usage)
- [Prerequisite](#prerequisite)
- [Recognizers](#recognizers)
- [Localizers](#localizers)
<!-- TOC -->
## Supported Models
So far, our codebase supports ONNX exporting for PyTorch models trained with MMAction2. The supported models are:
- I3D
- TSN
- TIN
- TSM
- R(2+1)D
- SLOWFAST
- SLOWONLY
- BMN
- BSN (TEM, PEM)
## Usage
For simple exporting, you can use the [script](/tools/deployment/pytorch2onnx.py) here. Note that the packages `onnx` and `onnxruntime` are required for verification after exporting.
### Prerequisite
First, install `onnx` and `onnxruntime`.
```shell
pip install onnx onnxruntime
```
We provide a Python script to export PyTorch models trained with MMAction2 to ONNX.
```shell
python tools/deployment/pytorch2onnx.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--shape ${SHAPE}] \
[--verify] [--show] [--output-file ${OUTPUT_FILE}] [--is-localizer] [--opset-version ${VERSION}]
```
Optional arguments:
- `--shape`: The shape of the input tensor to the model. For a 2D recognizer (e.g., TSN), the input should be `$batch $clip $channel $height $width` (e.g., `1 1 3 224 224`); for a 3D recognizer (e.g., I3D), the input should be `$batch $clip $channel $time $height $width` (e.g., `1 1 3 32 224 224`); for a localizer such as BSN, the input for each module is different, please check the `forward` function for it. If not specified, it will be set to `1 1 3 224 224`.
- `--verify`: Determines whether to verify the exported model, i.e., whether it runs and whether its outputs numerically match the PyTorch model. If not specified, it will be set to `False`. A minimal manual check is also sketched after this list.
- `--show`: Determines whether to print the architecture of the exported model. If not specified, it will be set to `False`.
- `--output-file`: The output onnx model name. If not specified, it will be set to `tmp.onnx`.
- `--is-localizer`: Determines whether the model to be exported is a localizer. If not specified, it will be set to `False`.
- `--opset-version`: Determines the operation set version of onnx, we recommend you to use a higher version such as 11 for compatibility. If not specified, it will be set to `11`.
- `--softmax`: Determines whether to add a softmax layer at the end of recognizers. If not specified, it will be set to `False`. For now, localizers are not supported.
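After exporting, you can also load the ONNX file yourself to double-check it. The snippet below is a minimal sketch for a 2D recognizer exported with the default settings; the file name `tmp.onnx` and the input shape are the defaults mentioned above, so adjust them if you passed `--output-file` or `--shape`.
```python
import numpy as np
import onnxruntime as ort

# Load the exported model and run a random dummy input through it.
session = ort.InferenceSession('tmp.onnx')
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print([out.shape for out in outputs])  # e.g. class scores for a recognizer
```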
### Recognizers
For recognizers, please run:
```shell
python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --shape $SHAPE --verify
```
### Localizers
For localizers, please run:
```shell
python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --is-localizer --shape $SHAPE --verify
```
Please file an issue if you discover any checkpoints that are not perfectly exported or suffer some loss in accuracy.
# Tutorial 7: Customize Runtime Settings
In this tutorial, we will introduce some methods about how to customize optimization methods, training schedules, workflow and hooks when running your own settings for the project.
<!-- TOC -->
- [Tutorial 7: Customize Runtime Settings](#tutorial-7-customize-runtime-settings)
- [Customize Optimization Methods](#customize-optimization-methods)
- [Customize optimizer supported by PyTorch](#customize-optimizer-supported-by-pytorch)
- [Customize self-implemented optimizer](#customize-self-implemented-optimizer)
- [1. Define a new optimizer](#1-define-a-new-optimizer)
- [2. Add the optimizer to registry](#2-add-the-optimizer-to-registry)
- [3. Specify the optimizer in the config file](#3-specify-the-optimizer-in-the-config-file)
- [Customize optimizer constructor](#customize-optimizer-constructor)
- [Additional settings](#additional-settings)
- [Customize Training Schedules](#customize-training-schedules)
- [Customize Workflow](#customize-workflow)
- [Customize Hooks](#customize-hooks)
- [Customize self-implemented hooks](#customize-self-implemented-hooks)
- [1. Implement a new hook](#1-implement-a-new-hook)
- [2. Register the new hook](#2-register-the-new-hook)
- [3. Modify the config](#3-modify-the-config)
- [Use hooks implemented in MMCV](#use-hooks-implemented-in-mmcv)
- [Modify default runtime hooks](#modify-default-runtime-hooks)
- [Checkpoint config](#checkpoint-config)
- [Log config](#log-config)
- [Evaluation config](#evaluation-config)
<!-- TOC -->
## Customize Optimization Methods
### Customize optimizer supported by PyTorch
We already support all the optimizers implemented by PyTorch; the only modification needed is to change the `optimizer` field of the config files.
For example, if you want to use `Adam`, the modification could be as follows.
```python
optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001)
```
To modify the learning rate of the model, users only need to modify the `lr` field in the optimizer config.
Users can directly set arguments following the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch.
For example, to use `Adam` with the settings of `torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)` in PyTorch,
the modification could be as follows.
```python
optimizer = dict(type='Adam', lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
```
### Customize self-implemented optimizer
#### 1. Define a new optimizer
A customized optimizer could be defined as follows.
Assume you want to add an optimizer named `MyOptimizer`, which has arguments `a`, `b`, and `c`.
You need to create a new directory named `mmaction/core/optimizer`,
and then implement the new optimizer in a file, e.g., `mmaction/core/optimizer/my_optimizer.py`:
```python
from mmcv.runner import OPTIMIZERS
from torch.optim import Optimizer


@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):

    def __init__(self, a, b, c):
        # Build the parameter groups / defaults from a, b and c here.
        pass
```
#### 2. Add the optimizer to registry
To find the module defined above, it should first be imported into the main namespace. There are two ways to achieve this.
- Modify `mmaction/core/optimizer/__init__.py` to import it.
The newly defined module should be imported in `mmaction/core/optimizer/__init__.py` so that the registry will
find the new module and add it:
```python
from .my_optimizer import MyOptimizer
```
- Use `custom_imports` in the config to manually import it
```python
custom_imports = dict(imports=['mmaction.core.optimizer.my_optimizer'], allow_failed_imports=False)
```
The module `mmaction.core.optimizer.my_optimizer` will be imported at the beginning of the program and the class `MyOptimizer` is then automatically registered.
Note that only the package containing the class `MyOptimizer` should be imported. `mmaction.core.optimizer.my_optimizer.MyOptimizer` **cannot** be imported directly.
#### 3. Specify the optimizer in the config file
Then you can use `MyOptimizer` in `optimizer` field of config files.
In the configs, the optimizers are defined by the field `optimizer` like the following:
```python
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
```
To use your own optimizer, the field can be changed to
```python
optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)
```
### Customize optimizer constructor
Some models may have parameter-specific settings for optimization, e.g. a different weight decay for BatchNorm layers.
Users can apply such fine-grained parameter tuning by customizing the optimizer constructor.
```python
from mmcv.runner.optimizer import OPTIMIZER_BUILDERS, OPTIMIZERS
from mmcv.utils import build_from_cfg


@OPTIMIZER_BUILDERS.register_module()
class MyOptimizerConstructor:

    def __init__(self, optimizer_cfg, paramwise_cfg=None):
        self.optimizer_cfg = optimizer_cfg
        self.paramwise_cfg = {} if paramwise_cfg is None else paramwise_cfg

    def __call__(self, model):
        # Apply rules from `paramwise_cfg` here, then build the optimizer.
        optimizer_cfg = self.optimizer_cfg.copy()
        optimizer_cfg['params'] = model.parameters()
        return build_from_cfg(optimizer_cfg, OPTIMIZERS)
```
The default optimizer constructor is implemented [here](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/optimizer/default_constructor.py#L11),
which could also serve as a template for new optimizer constructors.
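As a concrete illustration, the hypothetical constructor below (the name `NoDecayNormConstructor` is made up for this sketch and is not part of MMAction2) puts the parameters of normalization layers into a group with zero weight decay:
```python
# Sketch of a constructor that disables weight decay for normalization
# layers; illustrative only, not an official MMAction2 component.
import torch.nn as nn
from mmcv.runner.optimizer import OPTIMIZER_BUILDERS, OPTIMIZERS
from mmcv.utils import build_from_cfg


@OPTIMIZER_BUILDERS.register_module()
class NoDecayNormConstructor:

    def __init__(self, optimizer_cfg, paramwise_cfg=None):
        self.optimizer_cfg = optimizer_cfg
        self.paramwise_cfg = {} if paramwise_cfg is None else paramwise_cfg

    def __call__(self, model):
        decay, no_decay = [], []
        for module in model.modules():
            is_norm = isinstance(
                module,
                (nn.modules.batchnorm._BatchNorm, nn.GroupNorm, nn.LayerNorm))
            for param in module.parameters(recurse=False):
                (no_decay if is_norm else decay).append(param)
        optimizer_cfg = self.optimizer_cfg.copy()
        # The second parameter group overrides the global weight decay.
        optimizer_cfg['params'] = [
            dict(params=decay),
            dict(params=no_decay, weight_decay=0.)
        ]
        return build_from_cfg(optimizer_cfg, OPTIMIZERS)
```
Once registered, it can be selected in the config by adding `constructor='NoDecayNormConstructor'` to the `optimizer` dict.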
### Additional settings
Tricks not implemented by the optimizer should be implemented through the optimizer constructor (e.g., setting parameter-wise learning rates) or hooks.
We list some common settings that could stabilize or accelerate training. Feel free to create a PR or issue for more settings.
- __Use gradient clip to stabilize training__:
Some models need gradient clipping to stabilize the training process. An example is shown below:
```python
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```
- __Use momentum schedule to accelerate model convergence__:
We support the momentum scheduler to modify the model's momentum according to the learning rate, which could make the model converge faster.
The momentum scheduler is usually used together with the LR scheduler; for example, the following config is used in 3D detection to accelerate convergence.
For more details, please refer to the implementation of [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L327)
and [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/momentum_updater.py#L130).
```python
lr_config = dict(
policy='cyclic',
target_ratio=(10, 1e-4),
cyclic_times=1,
step_ratio_up=0.4,
)
momentum_config = dict(
policy='cyclic',
target_ratio=(0.85 / 0.95, 1),
cyclic_times=1,
step_ratio_up=0.4,
)
```
## Customize Training Schedules
We use the step learning rate schedule with default values in config files; this calls [`StepLRHook`](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L153) in MMCV.
We support many other learning rate schedules [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py), such as the `CosineAnnealing` and `Poly` schedules. Here are some examples:
- Poly schedule:
```python
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
```
- CosineAnnealing schedule:
```python
lr_config = dict(
policy='CosineAnnealing',
warmup='linear',
warmup_iters=1000,
warmup_ratio=1.0 / 10,
min_lr_ratio=1e-5)
```
## Customize Workflow
By default, we recommend users to use `EvalHook` to do evaluation after each training epoch, but they can still use the `val` workflow as an alternative.
Workflow is a list of (phase, epochs) to specify the running order and epochs. By default it is set to be
```python
workflow = [('train', 1)]
```
which means running 1 epoch for training.
Sometimes users may want to check some metrics (e.g. loss, accuracy) of the model on the validation set.
In this case, we can set the workflow as
```python
workflow = [('train', 1), ('val', 1)]
```
so that 1 epoch for training and 1 epoch for validation will be run iteratively.
:::{note}
1. The parameters of the model will not be updated during the val epoch.
2. The keyword `total_epochs` in the config only controls the number of training epochs and will not affect the validation workflow.
3. Workflows `[('train', 1), ('val', 1)]` and `[('train', 1)]` will not change the behavior of `EvalHook`, because `EvalHook` is called by `after_train_epoch` and the validation workflow only affects hooks that are called through `after_val_epoch`.
Therefore, the only difference between `[('train', 1), ('val', 1)]` and `[('train', 1)]` is that the runner will calculate losses on the validation set after each training epoch.
:::
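For example, a config combining this workflow with a purely illustrative number of training epochs could look like:
```python
# One validation epoch follows every training epoch; `total_epochs`
# (illustrative value) only counts the training epochs.
workflow = [('train', 1), ('val', 1)]
total_epochs = 100
```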
## Customize Hooks
### Customize self-implemented hooks
#### 1. Implement a new hook
Here we give an example of creating a new hook in MMAction2 and using it in training.
```python
from mmcv.runner import HOOKS, Hook
@HOOKS.register_module()
class MyHook(Hook):
def __init__(self, a, b):
pass
def before_run(self, runner):
pass
def after_run(self, runner):
pass
def before_epoch(self, runner):
pass
def after_epoch(self, runner):
pass
def before_iter(self, runner):
pass
def after_iter(self, runner):
pass
```
Depending on the functionality of the hook, the users need to specify what the hook will do at each stage of the training in `before_run`, `after_run`, `before_epoch`, `after_epoch`, `before_iter`, and `after_iter`.
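For instance, a small hypothetical hook that reports how long each epoch takes only needs to fill in two of these stages (the name `EpochTimerHook` is made up for illustration):
```python
import time

from mmcv.runner import HOOKS, Hook


@HOOKS.register_module()
class EpochTimerHook(Hook):
    """Hypothetical hook that logs the wall-clock time of every epoch."""

    def before_epoch(self, runner):
        # Record when the epoch starts.
        self._start_time = time.time()

    def after_epoch(self, runner):
        elapsed = time.time() - self._start_time
        runner.logger.info(f'Epoch {runner.epoch + 1} took {elapsed:.1f}s')
```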
#### 2. Register the new hook
Then we need to ensure `MyHook` is imported. Assuming the file is `mmaction/core/utils/my_hook.py`, there are two ways to do that:
- Modify `mmaction/core/utils/__init__.py` to import it.
The newly defined module should be imported in `mmaction/core/utils/__init__.py` so that the registry will
find the new module and add it:
```python
from .my_hook import MyHook
```
- Use `custom_imports` in the config to manually import it
```python
custom_imports = dict(imports=['mmaction.core.utils.my_hook'], allow_failed_imports=False)
```
#### 3. Modify the config
```python
custom_hooks = [
dict(type='MyHook', a=a_value, b=b_value)
]
```
You can also set the priority of the hook by setting the key `priority` to `'NORMAL'` or `'HIGHEST'`, as below:
```python
custom_hooks = [
dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL')
]
```
By default the hook's priority is set as `NORMAL` during registration.
### Use hooks implemented in MMCV
If the hook is already implemented in MMCV, you can directly modify the config to use the hook as below
```python
mmcv_hooks = [
dict(type='MMCVHook', a=a_value, b=b_value, priority='NORMAL')
]
```
### Modify default runtime hooks
There are some common hooks that are not registered through `custom_hooks` but are registered by default when importing MMCV. They are:
- log_config
- checkpoint_config
- evaluation
- lr_config
- optimizer_config
- momentum_config
Among those hooks, only the logger hook has `VERY_LOW` priority; the others have `NORMAL` priority.
The above-mentioned tutorials already cover how to modify `optimizer_config`, `momentum_config`, and `lr_config`.
Here we show what we can do with `log_config`, `checkpoint_config`, and `evaluation`.
#### Checkpoint config
The MMCV runner will use `checkpoint_config` to initialize [`CheckpointHook`](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/hooks/checkpoint.py#L9).
```python
checkpoint_config = dict(interval=1)
```
Users could set `max_keep_ckpts` to save only a small number of checkpoints, or decide whether to store the state dict of the optimizer by `save_optimizer`.
More details of the arguments are [here](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.CheckpointHook).
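For example, the following (illustrative) setting keeps only the three most recent checkpoints and skips saving the optimizer state:
```python
# Save every epoch, keep at most 3 checkpoints, and skip the optimizer
# state dict to reduce file size.
checkpoint_config = dict(interval=1, max_keep_ckpts=3, save_optimizer=False)
```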
#### Log config
The `log_config` wraps multiple logger hooks and enables setting intervals. Now MMCV supports `WandbLoggerHook`, `MlflowLoggerHook`, and `TensorboardLoggerHook`.
The detail usages can be found in the [doc](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.LoggerHook).
```python
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook')
])
```
#### Evaluation config
The config of `evaluation` will be used to initialize the [`EvalHook`](https://github.com/open-mmlab/mmaction2/blob/master/mmaction/core/evaluation/eval_hooks.py#L12).
Except for the key `interval`, other arguments such as `metrics` will be passed to `dataset.evaluate()`.
```python
evaluation = dict(interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'])
```
Apart from training/testing scripts, we provide lots of useful tools under the `tools/` directory.
## Useful Tools Link
<!-- TOC -->
- [Useful Tools Link](#useful-tools-link)
- [Log Analysis](#log-analysis)
- [Model Complexity](#model-complexity)
- [Model Conversion](#model-conversion)
- [MMAction2 model to ONNX (experimental)](#mmaction2-model-to-onnx-experimental)
- [Prepare a model for publishing](#prepare-a-model-for-publishing)
- [Model Serving](#model-serving)
- [1. Convert model from MMAction2 to TorchServe](#1-convert-model-from-mmaction2-to-torchserve)
- [2. Build `mmaction-serve` docker image](#2-build-mmaction-serve-docker-image)
- [3. Launch `mmaction-serve`](#3-launch-mmaction-serve)
- [4. Test deployment](#4-test-deployment)
- [Miscellaneous](#miscellaneous)
- [Evaluating a metric](#evaluating-a-metric)
- [Print the entire config](#print-the-entire-config)
- [Check videos](#check-videos)
<!-- TOC -->
## Log Analysis
`tools/analysis/analyze_logs.py` plots loss/top-k acc curves given a training log file. Run `pip install seaborn` first to install the dependency.
![acc_curve_image](https://github.com/open-mmlab/mmaction2/raw/master/resources/acc_curve.png)
```shell
python tools/analysis/analyze_logs.py plot_curve ${JSON_LOGS} [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}]
```
Examples:
- Plot the classification loss of some run.
```shell
python tools/analysis/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
```
- Plot the top-1 acc and top-5 acc of some run, and save the figure to a pdf.
```shell
python tools/analysis/analyze_logs.py plot_curve log.json --keys top1_acc top5_acc --out results.pdf
```
- Compare the top-1 acc of two runs in the same figure.
```shell
python tools/analysis/analyze_logs.py plot_curve log1.json log2.json --keys top1_acc --legend run1 run2
```
You can also compute the average training speed.
```shell
python tools/analysis/analyze_logs.py cal_train_time ${JSON_LOGS} [--include-outliers]
```
- Compute the average training speed for a config file.
```shell
python tools/analysis/analyze_logs.py cal_train_time work_dirs/some_exp/20200422_153324.log.json
```
The output is expected to be like the following.
```text
-----Analyze train time of work_dirs/some_exp/20200422_153324.log.json-----
slowest epoch 60, average time is 0.9736
fastest epoch 18, average time is 0.9001
time std over epochs is 0.0177
average iter time: 0.9330 s/iter
```
## Model Complexity
`/tools/analysis/get_flops.py` is a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch) to compute the FLOPs and params of a given model.
```shell
python tools/analysis/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
```
We will get a result like this:
```text
==============================
Input shape: (1, 3, 32, 340, 256)
Flops: 37.1 GMac
Params: 28.04 M
==============================
```
:::{note}
This tool is still experimental and we do not guarantee that the number is absolutely correct.
You may use the result for simple comparisons, but double check it before you adopt it in technical reports or papers.
(1) FLOPs are related to the input shape while parameters are not. The default input shape is (1, 3, 340, 256) for 2D recognizers and (1, 3, 32, 340, 256) for 3D recognizers.
(2) Some operators are not counted into FLOPs like GN and custom operators. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py) for details.
:::
## Model Conversion
### MMAction2 model to ONNX (experimental)
`/tools/deployment/pytorch2onnx.py` is a script to convert model to [ONNX](https://github.com/onnx/onnx) format.
It also supports comparing the output results between the PyTorch and ONNX models for verification.
Run `pip install onnx onnxruntime` first to install the dependencies.
Please note that a softmax layer can be added for recognizers via the `--softmax` option, in order to get predictions in range `[0, 1]`.
- For recognizers, please run:
```shell
python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --shape $SHAPE --verify
```
- For localizers, please run:
```shell
python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --is-localizer --shape $SHAPE --verify
```
### Prepare a model for publishing
`tools/deployment/publish_model.py` helps users to prepare their model for publishing.
Before you upload a model to AWS, you may want to:
1. convert model weights to CPU tensors
2. delete the optimizer states
3. compute the hash of the checkpoint file and append the hash id to the filename.
```shell
python tools/deployment/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
```
E.g.,
```shell
python tools/deployment/publish_model.py work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/latest.pth tsn_r50_1x1x3_100e_kinetics400_rgb.pth
```
The final output filename will be `tsn_r50_1x1x3_100e_kinetics400_rgb-{hash id}.pth`.
## Model Serving
In order to serve an `MMAction2` model with [`TorchServe`](https://pytorch.org/serve/), you can follow the steps below:
### 1. Convert model from MMAction2 to TorchServe
```shell
python tools/deployment/mmaction2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
--output_folder ${MODEL_STORE} \
--model-name ${MODEL_NAME} \
--label-file ${LABEL_FILE}
```
### 2. Build `mmaction-serve` docker image
```shell
DOCKER_BUILDKIT=1 docker build -t mmaction-serve:latest docker/serve/
```
### 3. Launch `mmaction-serve`
Check the official docs for [running TorchServe with docker](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment).
Example:
```shell
docker run --rm \
--cpus 8 \
--gpus device=0 \
-p8080:8080 -p8081:8081 -p8082:8082 \
--mount type=bind,source=$MODEL_STORE,target=/home/model-server/model-store \
mmaction-serve:latest
```
**Note**: `${MODEL_STORE}` needs to be an absolute path.
[Read the docs](https://github.com/pytorch/serve/blob/072f5d088cce9bb64b2a18af065886c9b01b317b/docs/rest_api.md) about the Inference (8080), Management (8081) and Metrics (8082) APIs.
### 4. Test deployment
```shell
# Assume you are under the directory `mmaction2`
curl http://127.0.0.1:8080/predictions/${MODEL_NAME} -T demo/demo.mp4
```
You should obtain a response similar to:
```json
{
"arm wrestling": 1.0,
"rock scissors paper": 4.962051880497143e-10,
"shaking hands": 3.9761663406245873e-10,
"massaging feet": 1.1924419784925533e-10,
"stretching leg": 1.0601879096849842e-10
}
```
## Miscellaneous
### Evaluating a metric
`tools/analysis/eval_metric.py` evaluates certain metrics of the results saved in a file according to a config file.
The saved result file is created by `tools/test.py` with the argument `--out ${RESULT_FILE}`,
and stores the final output of the whole model.
```shell
python tools/analysis/eval_metric.py ${CONFIG_FILE} ${RESULT_FILE} [--eval ${EVAL_METRICS}] [--cfg-options ${CFG_OPTIONS}] [--eval-options ${EVAL_OPTIONS}]
```
### Print the entire config
`tools/analysis/print_config.py` prints the whole config verbatim, expanding all its imports.
```shell
python tools/analysis/print_config.py ${CONFIG} [-h] [--options ${OPTIONS [OPTIONS...]}]
```
### Check videos
`tools/analysis/check_videos.py` uses the specified video decoder to iterate over all samples specified by the input configuration file, looks for invalid videos (corrupted or missing), and saves the corresponding file paths to the output file. Please note that after deleting invalid videos, users need to regenerate the video file list.
```shell
python tools/analysis/check_videos.py ${CONFIG} [-h] [--options OPTIONS [OPTIONS ...]] [--cfg-options CFG_OPTIONS [CFG_OPTIONS ...]] [--output-file OUTPUT_FILE] [--split SPLIT] [--decoder DECODER] [--num-processes NUM_PROCESSES] [--remove-corrupted-videos]
```
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)