We compare our results with some popular frameworks and official releases in terms of speed.
## Settings
### Hardware
- 8 NVIDIA Tesla V100 (32G) GPUs
- Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz
### Software Environment
- Python 3.7
- PyTorch 1.4
- CUDA 10.1
- CUDNN 7.6.03
- NCCL 2.4.08
### Metrics
The time we measure is the average training time per iteration, including data processing and model training.
Training speed is measured in s/iter; the lower, the better. Note that we skip the first 50 iterations, as they may include device warmup time.
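As a minimal illustration of this metric, the sketch below averages per-iteration durations after dropping the warmup iterations; `iter_times` and the function name are ours for illustration, not part of any MMAction2 tool.

```python
# A minimal sketch: average per-iteration time (s/iter), skipping the first
# 50 warmup iterations. `iter_times` is assumed to be a list of per-iteration
# durations in seconds parsed from a training log.
def average_iter_time(iter_times, num_warmup=50):
    measured = iter_times[num_warmup:]
    return sum(measured) / len(measured)
```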
### Comparison Rules
Here we compare our MMAction2 repo with other video understanding toolboxes under the same data and model settings, measured by the training time per iteration. Specifically, we use
- commit id [7f3490d](https://github.com/open-mmlab/mmaction/tree/7f3490d3db6a67fe7b87bfef238b757403b670e3)(1/5/2020) of MMAction
- commit id [8d53d6f](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd)(5/5/2020) of Temporal-Shift-Module
- commit id [8299c98](https://github.com/facebookresearch/SlowFast/tree/8299c9862f83a067fa7114ce98120ae1568a83ec)(7/7/2020) of PySlowFast
- commit id [f13707f](https://github.com/wzmsltw/BSN-boundary-sensitive-network/tree/f13707fbc362486e93178c39f9c4d398afe2cb2f)(12/12/2018) of BSN(boundary sensitive network)
- commit id [45d0514](https://github.com/JJBOY/BMN-Boundary-Matching-Network/tree/45d05146822b85ca672b65f3d030509583d0135a)(17/10/2019) of BMN(boundary matching network)
To ensure a fair comparison, all experiments were conducted under the same hardware environment and on the same dataset. The rawframe dataset we use is generated by the [data preparation tools](/tools/data/kinetics/README.md); the video dataset is a special version of resized video cache called '256p dense-encoded video', which features faster decoding and is generated by the script [here](/tools/data/resize_videos.py). As the table below shows, a significant improvement can be observed compared with normal 256p videos, especially when the sampling is sparse (as in [TSN](/configs/recognition/tsn/tsn_r50_video_320p_1x1x3_100e_kinetics400_rgb.py)).
For each model setting, we keep the same data preprocessing methods to ensure identical feature input.
In addition, we use Memcached, a distributed caching system, to load the data, ensuring comparable IO time across toolboxes, except for the comparisons with PySlowFast, which reads raw videos directly from disk by default.
We provide the training logs from which we calculate the average iteration time, with the actual settings recorded inside. Feel free to verify them and file an issue if something does not make sense.
| Model | Input | IO Backend | Batch Size x GPUs | MMAction2 (s/iter) | GPU mem (GB) | MMAction (s/iter) | GPU mem (GB) | Temporal-Shift-Module (s/iter) | GPU mem (GB) | PySlowFast (s/iter) | GPU mem (GB) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 256p videos | Disk | 32x8 | **[1.42](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsn_256p_videos_disk_32x8.zip)** | 8.1 | x | x | x | x | TODO | TODO |
| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 32x8 | **[0.61](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsn_256p_fast_videos_disk_32x8.zip)** | 8.1 | x | x | x | x | TODO | TODO |
| [I3D heavy](/configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.34](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_heavy_256p_videos_disk_8x8.zip)** | 4.6 | x | x | x | x | [0.44](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_i3d_r50_8x8_video.log) | 4.6 |
| [I3D heavy](/configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.35](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_heavy_256p_fast_videos_disk_8x8.zip)** | 4.6 | x | x | x | x | [0.36](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_i3d_r50_8x8_fast_video.log) | 4.6 |
| [I3D](/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py) | 256p rawframes | Memcached | 8x8 | **[0.43](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_256p_rawframes_memcahed_8x8.zip)** | 5.0 | [0.56](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction/i3d_256p_rawframes_memcached_8x8.zip) | 5.0 | x | x | x | x |
| [TSM](/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py) | 256p rawframes | Memcached | 8x8 | **[0.31](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsm_256p_rawframes_memcahed_8x8.zip)** | 6.9 | x | x | [0.41](https://download.openmmlab.com/mmaction/benchmark/recognition/temporal_shift_module/tsm_256p_rawframes_memcached_8x8.zip) | 9.1 | x | x |
| [Slowonly](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.32](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowonly_256p_videos_disk_8x8.zip)** | 3.1 | TODO | TODO | x | x | [0.34](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowonly_r50_4x16_video.log) | 3.4 |
| [Slowonly](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.25](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowonly_256p_fast_videos_disk_8x8.zip)** | 3.1 | TODO | TODO | x | x | [0.28](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowonly_r50_4x16_fast_video.log) | 3.4 |
| [Slowfast](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.69](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowfast_256p_videos_disk_8x8.zip)** | 6.1 | x | x | x | x | [1.04](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowfast_r50_4x16_video.log) | 7.0 |
| [Slowfast](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.68](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowfast_256p_fast_videos_disk_8x8.zip)** | 6.1 | x | x | x | x | [0.96](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowfast_r50_4x16_fast_video.log) | 7.0 |
| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.45](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/r2plus1d_256p_videos_disk_8x8.zip)** | 5.1 | x | x | x | x | x | x |
| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.44](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/r2plus1d_256p_fast_videos_disk_8x8.zip)** | 5.1 | x | x | x | x | x | x |
- Add description to PoseC3D dataset ([#1053](https://github.com/open-mmlab/mmaction2/pull/1053))
### 0.16.0 (01/07/2021)
**Highlights**
- Support using backbones from pytorch-image-models (timm)
- Support PIMS Decoder
- Demo for skeleton-based action recognition
- Support Timesformer
**New Features**
- Support using backbones from pytorch-image-models (timm) for TSN ([#880](https://github.com/open-mmlab/mmaction2/pull/880))
- Support torchvision transformations in preprocessing pipelines ([#972](https://github.com/open-mmlab/mmaction2/pull/972))
- Demo for skeleton-based action recognition ([#972](https://github.com/open-mmlab/mmaction2/pull/972))
- Support Timesformer ([#839](https://github.com/open-mmlab/mmaction2/pull/839))
**Improvements**
- Add a tool to find invalid videos ([#907](https://github.com/open-mmlab/mmaction2/pull/907), [#950](https://github.com/open-mmlab/mmaction2/pull/950))
- Add an option to specify spectrogram_type ([#909](https://github.com/open-mmlab/mmaction2/pull/909))
- Add json output to video demo ([#906](https://github.com/open-mmlab/mmaction2/pull/906))
- Add MIM related docs ([#918](https://github.com/open-mmlab/mmaction2/pull/918))
- Rename lr to scheduler ([#916](https://github.com/open-mmlab/mmaction2/pull/916))
- Support `--cfg-options` for demos ([#911](https://github.com/open-mmlab/mmaction2/pull/911))
- Support number counting for flow-wise filename template ([#922](https://github.com/open-mmlab/mmaction2/pull/922))
- Add Chinese tutorial ([#941](https://github.com/open-mmlab/mmaction2/pull/941))
- Fix docstring for 3D inflate ([#925](https://github.com/open-mmlab/mmaction2/pull/925))
- Fix bug of writing text to video with TextClip ([#952](https://github.com/open-mmlab/mmaction2/pull/952))
- Fix mmcv install in CI ([#977](https://github.com/open-mmlab/mmaction2/pull/977))
**ModelZoo**
- Add TSN with Swin Transformer backbone as an example of using pytorch-image-models (timm) backbones ([#880](https://github.com/open-mmlab/mmaction2/pull/880))
- Port CSN checkpoints from VMZ ([#945](https://github.com/open-mmlab/mmaction2/pull/945))
- Release various checkpoints for UCF101, HMDB51 and Sthv1 ([#938](https://github.com/open-mmlab/mmaction2/pull/938))
- Support Timesformer ([#839](https://github.com/open-mmlab/mmaction2/pull/839))
### 0.15.0 (31/05/2021)
**New Features**
- Support PoseC3D ([#786](https://github.com/open-mmlab/mmaction2/pull/786), [#890](https://github.com/open-mmlab/mmaction2/pull/890))
- Support MIM ([#870](https://github.com/open-mmlab/mmaction2/pull/870))
- Support ACRN and Focal Loss ([#891](https://github.com/open-mmlab/mmaction2/pull/891))
- Support Jester dataset ([#864](https://github.com/open-mmlab/mmaction2/pull/864))
**Improvements**
- Add `metric_options` for evaluation to docs ([#873](https://github.com/open-mmlab/mmaction2/pull/873))
- Support creating a new label map based on custom classes for the spatio-temporal action detection demo ([#879](https://github.com/open-mmlab/mmaction2/pull/879))
- Improve document about AVA dataset preparation ([#878](https://github.com/open-mmlab/mmaction2/pull/878))
- Provide a script to extract clip-level feature ([#856](https://github.com/open-mmlab/mmaction2/pull/856))
**Bug and Typo Fixes**
- Fix issues about resume ([#877](https://github.com/open-mmlab/mmaction2/pull/877), [#878](https://github.com/open-mmlab/mmaction2/pull/878))
- Correct the key name of `eval_results` dictionary for metric 'mmit_mean_average_precision' ([#885](https://github.com/open-mmlab/mmaction2/pull/885))
**ModelZoo**
- Support Jester dataset ([#864](https://github.com/open-mmlab/mmaction2/pull/864))
- Support ACRN and Focal Loss ([#891](https://github.com/open-mmlab/mmaction2/pull/891))
### 0.14.0 (30/04/2021)
**Highlights**
- Support TRN
- Support Diving48
**New Features**
- Support TRN ([#755](https://github.com/open-mmlab/mmaction2/pull/755))
- Support Diving48 ([#835](https://github.com/open-mmlab/mmaction2/pull/835))
- Support Webcam Demo for Spatio-temporal Action Detection Models ([#795](https://github.com/open-mmlab/mmaction2/pull/795))
**Improvements**
- Add softmax option for pytorch2onnx tool ([#781](https://github.com/open-mmlab/mmaction2/pull/781))
- Support TRN ([#755](https://github.com/open-mmlab/mmaction2/pull/755))
- Test with onnx models and TensorRT engines ([#758](https://github.com/open-mmlab/mmaction2/pull/758))
- Speed up AVA Testing ([#784](https://github.com/open-mmlab/mmaction2/pull/784))
- Add QR code in CN README ([#812](https://github.com/open-mmlab/mmaction2/pull/812))
- Add an alternative way to download Kinetics ([#817](https://github.com/open-mmlab/mmaction2/pull/817), [#822](https://github.com/open-mmlab/mmaction2/pull/822))
- Use EvalHook in MMCV with backward compatibility ([#793](https://github.com/open-mmlab/mmaction2/pull/793))
- Use MMCV Model Registry ([#843](https://github.com/open-mmlab/mmaction2/pull/843))
**Bug and Typo Fixes**
- Fix a bug in pytorch2onnx.py when `num_classes <= 4` ([#800](https://github.com/open-mmlab/mmaction2/pull/800), [#824](https://github.com/open-mmlab/mmaction2/pull/824))
- Support [imgaug](https://imgaug.readthedocs.io/en/latest/index.html) for augmentations in the data pipeline ([#492](https://github.com/open-mmlab/mmaction2/pull/492))
- Support setting `max_testing_views` for extremely large models to save GPU memory used ([#511](https://github.com/open-mmlab/mmaction2/pull/511))
- Fix a bug in BaseDataset when `data_prefix` is None ([#314](https://github.com/open-mmlab/mmaction2/pull/314))
- Fix a bug about `tmp_folder` in `OpenCVInit` ([#357](https://github.com/open-mmlab/mmaction2/pull/357))
- Fix `get_thread_id` when not using disk as backend ([#354](https://github.com/open-mmlab/mmaction2/pull/354), [#357](https://github.com/open-mmlab/mmaction2/pull/357))
- Fix the bug of HVU object `num_classes` from 1679 to 1678 ([#307](https://github.com/open-mmlab/mmaction2/pull/307))
- Fix typo in `export_model.md` ([#399](https://github.com/open-mmlab/mmaction2/pull/399))
- Fix OmniSource training configs ([#321](https://github.com/open-mmlab/mmaction2/pull/321))
- Fix Issue #306: Bug of SampleAVAFrames ([#317](https://github.com/open-mmlab/mmaction2/pull/317))
**ModelZoo**
- Add SlowOnly model for GYM99, both RGB and Flow ([#336](https://github.com/open-mmlab/mmaction2/pull/336))
- Add auto modelzoo statistics in readthedocs ([#327](https://github.com/open-mmlab/mmaction2/pull/327))
- Add TSN for HMDB51 pretrained on Kinetics400, Moments in Time and ImageNet. ([#372](https://github.com/open-mmlab/mmaction2/pull/372))
### v0.8.0 (31/10/2020)
**Highlights**
- Support [OmniSource](https://arxiv.org/abs/2003.13042)
- Support C3D
- Support video recognition with audio modality
- Support HVU
- Support X3D
**New Features**
- Support AVA dataset preparation ([#266](https://github.com/open-mmlab/mmaction2/pull/266))
- Support the training of video recognition dataset with multiple tag categories ([#235](https://github.com/open-mmlab/mmaction2/pull/235))
- Support joint training with multiple training datasets of multiple formats, including images, untrimmed videos, etc. ([#242](https://github.com/open-mmlab/mmaction2/pull/242))
- Support specifying a start epoch for evaluation ([#216](https://github.com/open-mmlab/mmaction2/pull/216))
- Implement X3D models, support testing with model weights converted from SlowFast ([#288](https://github.com/open-mmlab/mmaction2/pull/288))
**Improvements**
- Set default values of 'average_clips' in each config file so that there is no need to set it explicitly during testing in most cases ([#232](https://github.com/open-mmlab/mmaction2/pull/232))
- Extend HVU datatools to generate individual file list for each tag category ([#258](https://github.com/open-mmlab/mmaction2/pull/258))
- Support data preparation for Kinetics-600 and Kinetics-700 ([#254](https://github.com/open-mmlab/mmaction2/pull/254))
- Use `metric_dict` to replace hardcoded arguments in `evaluate` function ([#286](https://github.com/open-mmlab/mmaction2/pull/286))
- Add `cfg-options` in arguments to override some settings in the used config for convenience ([#212](https://github.com/open-mmlab/mmaction2/pull/212))
- Rename the old evaluating protocol `mean_average_precision` as `mmit_mean_average_precision` since it is only used on MMIT and is not the `mAP` we usually talk about. Add `mean_average_precision`, which is the real `mAP` ([#235](https://github.com/open-mmlab/mmaction2/pull/235))
- Add accurate setting (Three crop * 2 clip) and report corresponding performance for TSM model ([#241](https://github.com/open-mmlab/mmaction2/pull/241))
- Add citations in each preparing_dataset.md in `tools/data/dataset` ([#289](https://github.com/open-mmlab/mmaction2/pull/289))
- Update the performance of audio-visual fusion on Kinetics-400 ([#281](https://github.com/open-mmlab/mmaction2/pull/281))
- Support data preparation of OmniSource web datasets, including GoogleImage, InsImage, InsVideo and KineticsRawVideo ([#294](https://github.com/open-mmlab/mmaction2/pull/294))
- Use `metric_options` dict to provide metric args in `evaluate` ([#286](https://github.com/open-mmlab/mmaction2/pull/286))
**Bug Fixes**
- Register `FrameSelector` in `PIPELINES` ([#268](https://github.com/open-mmlab/mmaction2/pull/268))
- Fix the potential bug for default value in dataset_setting ([#245](https://github.com/open-mmlab/mmaction2/pull/245))
- Fix multi-node dist test ([#292](https://github.com/open-mmlab/mmaction2/pull/292))
- Fix the data preparation bug for `something-something` dataset ([#278](https://github.com/open-mmlab/mmaction2/pull/278))
- Fix the invalid config url in slowonly README data benchmark ([#249](https://github.com/open-mmlab/mmaction2/pull/249))
- Validate that the performance of models trained with videos has no significant difference compared with models trained with rawframes ([#256](https://github.com/open-mmlab/mmaction2/pull/256))
- Correct the `img_norm_cfg` used by TSN-3seg-R50 UCF-101 model, improve the Top-1 accuracy by 3% ([#273](https://github.com/open-mmlab/mmaction2/pull/273))
**ModelZoo**
- Add Baselines for Kinetics-600 and Kinetics-700, including TSN-R50-8seg and SlowOnly-R50-8x8 ([#259](https://github.com/open-mmlab/mmaction2/pull/259))
- Add OmniSource benchmark on MiniKinetics ([#296](https://github.com/open-mmlab/mmaction2/pull/296))
- Add Baselines for HVU, including TSN-R18-8seg on 6 tag categories of HVU ([#287](https://github.com/open-mmlab/mmaction2/pull/287))
- Add X3D models ported from [SlowFast](https://github.com/facebookresearch/SlowFast/) ([#288](https://github.com/open-mmlab/mmaction2/pull/288))
### v0.7.0 (30/9/2020)
**Highlights**
- Support TPN
- Support JHMDB, UCF101-24, HVU dataset preparation
- Support ONNX model conversion
**New Features**
- Support the data pre-processing pipeline for the HVU Dataset ([#277](https://github.com/open-mmlab/mmaction2/pull/277))
- Support real-time action recognition from web camera ([#171](https://github.com/open-mmlab/mmaction2/pull/171))
- Support onnx ([#160](https://github.com/open-mmlab/mmaction2/pull/160))
- Support UCF101-24 preparation ([#219](https://github.com/open-mmlab/mmaction2/pull/219))
- Support evaluating mAP for ActivityNet with [CUHK17_activitynet_pred](http://activity-net.org/challenges/2017/evaluation.html) ([#176](https://github.com/open-mmlab/mmaction2/pull/176))
- Add the data pipeline for ActivityNet, including downloading videos, extracting RGB and flow frames, finetuning TSN and extracting features ([#190](https://github.com/open-mmlab/mmaction2/pull/190))
- Support JHMDB preparation ([#220](https://github.com/open-mmlab/mmaction2/pull/220))
**ModelZoo**
- Add finetuning setting for SlowOnly ([#173](https://github.com/open-mmlab/mmaction2/pull/173))
- Add TSN and SlowOnly models trained with [OmniSource](https://arxiv.org/abs/2003.13042), which achieve 75.7% Top-1 with TSN-R50-3seg and 80.4% Top-1 with SlowOnly-R101-8x8 ([#215](https://github.com/open-mmlab/mmaction2/pull/215))
**Improvements**
- Support demo with video url ([#165](https://github.com/open-mmlab/mmaction2/pull/165))
- Support multi-batch when testing ([#184](https://github.com/open-mmlab/mmaction2/pull/184))
- Add tutorial for adding a new learning rate updater ([#181](https://github.com/open-mmlab/mmaction2/pull/181))
- Add config name in meta info ([#183](https://github.com/open-mmlab/mmaction2/pull/183))
- Remove git hash in `__version__` ([#189](https://github.com/open-mmlab/mmaction2/pull/189))
- Check mmcv version ([#189](https://github.com/open-mmlab/mmaction2/pull/189))
- Update url with 'https://download.openmmlab.com' ([#208](https://github.com/open-mmlab/mmaction2/pull/208))
- Update Docker file to support PyTorch 1.6 and update `install.md` ([#209](https://github.com/open-mmlab/mmaction2/pull/209))
- Fix the bug when using OpenCV to extract only RGB frames with original shape ([#187](https://github.com/open-mmlab/mmaction2/pull/187))
- Fix the bug of sthv2 `num_classes` from 339 to 174 ([#174](https://github.com/open-mmlab/mmaction2/pull/174), [#207](https://github.com/open-mmlab/mmaction2/pull/207))
### v0.6.0 (2/9/2020)
**Highlights**
- Support TIN, CSN, SSN, NonLocal
- Support FP16 training
**New Features**
- Support NonLocal module and provide ckpt in TSM and I3D ([#41](https://github.com/open-mmlab/mmaction2/pull/41))
- Support SSN ([#33](https://github.com/open-mmlab/mmaction2/pull/33), [#37](https://github.com/open-mmlab/mmaction2/pull/37), [#52](https://github.com/open-mmlab/mmaction2/pull/52), [#55](https://github.com/open-mmlab/mmaction2/pull/55))
- Support CSN ([#87](https://github.com/open-mmlab/mmaction2/pull/87))
- Support TIN ([#53](https://github.com/open-mmlab/mmaction2/pull/53))
- Support HMDB51 dataset preparation ([#60](https://github.com/open-mmlab/mmaction2/pull/60))
- Support encoding videos from frames ([#84](https://github.com/open-mmlab/mmaction2/pull/84))
- Support FP16 training ([#25](https://github.com/open-mmlab/mmaction2/pull/25))
We provide some tips for MMAction2 data preparation in this file.
<!-- TOC -->
- [Notes on Video Data Format](#notes-on-video-data-format)
- [Getting Data](#getting-data)
  - [Prepare videos](#prepare-videos)
  - [Extract frames](#extract-frames)
    - [Alternative to denseflow](#alternative-to-denseflow)
  - [Generate file list](#generate-file-list)
  - [Prepare audio](#prepare-audio)
<!-- TOC -->
## Notes on Video Data Format
MMAction2 supports two types of data format: raw frames and video. The former is widely used in previous projects such as [TSN](https://github.com/yjxiong/temporal-segment-networks).
This is fast when SSD is available but fails to scale to the fast-growing datasets.
(For example, the newest edition of [Kinetics](https://deepmind.com/research/open-source/open-source-datasets/kinetics/) has 650K videos and the total frames will take up several TBs.)
The latter saves much space but has to do the computation-intensive video decoding at execution time.
To make video decoding faster, we support several efficient video loading libraries, such as [decord](https://github.com/zhreshold/decord), [PyAV](https://github.com/PyAV-Org/PyAV), etc.
## Getting Data
The following guide is helpful when you want to experiment with a custom dataset.
Similar to the datasets stated above, it is recommended to organize it in `$MMACTION2/data/$DATASET`.
### Prepare videos
Please refer to the official website and/or the official script to prepare the videos.
Note that the videos should be arranged in either of the following two structures:
1. A two-level directory organized as `${CLASS_NAME}/${VIDEO_ID}`, which is recommended for action recognition datasets (such as UCF101 and Kinetics).
2. A single-level directory, which is recommended for action detection datasets and those with multiple annotations per video (such as THUMOS14).
### Extract frames
To extract both frames and optical flow, you can use the tool [denseflow](https://github.com/open-mmlab/denseflow) we wrote.
Since different frame extraction tools produce different numbers of frames,
it is beneficial to use the same tool for both frame extraction and flow computation, to avoid mismatching frame counts.
In case your device doesn't meet the installation requirements of [denseflow](https://github.com/open-mmlab/denseflow) (such as the Nvidia driver version), or you just want a quick demo of flow extraction, we provide the Python script `tools/misc/flow_extraction.py` as an alternative to denseflow. You can use it to extract RGB frames and optical flow from one or several videos. Note that the script is much slower than denseflow, since it runs the optical flow algorithms on CPU. Its arguments are listed below, with a usage sketch after the list.
- `INPUT`: Videos for frame extraction; either a single video or a video list. The video list should be a txt file consisting of filenames only, without directories.
- `PREFIX`: The prefix of the input videos, used when the input is a video list.
- `DEST`: The destination directory for the extracted frames.
- `RGB_TMPL`: The filename template of RGB frames.
- `FLOW_TMPL`: The filename template of flow frames.
- `START_IDX`: The start index of the extracted frames.
- `METHOD`: The method used to generate flow.
- `BOUND`: The maximum of optical flow.
- `SAVE_RGB`: Also save the extracted RGB frames.
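For instance, a hedged invocation of the script from Python might look like the sketch below; the flag spellings and paths are assumptions for illustration, so check `python tools/misc/flow_extraction.py --help` before use.

```python
import subprocess

# Hypothetical invocation; flag spellings and paths are assumptions,
# verify them against the script's --help output.
subprocess.run([
    'python', 'tools/misc/flow_extraction.py',
    '--input', 'video_list.txt',            # INPUT: a single video or a video list
    '--prefix', 'data/my_dataset/videos',   # PREFIX of the listed videos
    '--dest', 'data/my_dataset/rawframes',  # DEST for the extracted frames
    '--method', 'tvl1',                     # METHOD used to generate flow
    '--bound', '20',                        # BOUND: the maximum of optical flow
    '--save-rgb',                           # SAVE_RGB: also save rgb frames
], check=True)
```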
### Generate file list
We provide a convenient script to generate annotation file lists. You can use the following command to generate file lists given extracted frames / downloaded videos.
### Prepare audio
We also provide a script to extract audio from videos, with the following arguments:
- `DST_ROOT`: The destination root directory of the audios.
- `EXT`: Extension of the video files, e.g., `mp4`.
- `N_WORKERS`: Number of processes to be used.
After extracting audios, you are free to decode and generate spectrograms on the fly, as in [this config](/configs/recognition_audio/resnet/tsn_r50_64x1x1_100e_kinetics400_audio.py). As for the annotations, you can directly use those of the rawframes as long as you keep the relative positions of the audio files the same as in the rawframes directory. However, extracting spectrograms on the fly is slow and bad for prototype iteration, so we also provide a script (and many useful tools to play with) to generate spectrograms offline. Its arguments are:
- `AUDIO_HOME_PATH`: The root directory of the audio files.
- `SPECTROGRAM_SAVE_PATH`: The destination root directory of the audio features.
- `EXT`: Extension of the audio files, e.g., `m4a`.
- `N_WORKERS`: Number of processes to be used.
- `PART`: Determines how many parts the files are split into and which part to run, e.g., `2/5` means splitting all files into 5 parts and processing the 2nd one. This is useful if you have several machines.
The annotations for audio spectrogram features are identical to those of the rawframes. You can simply make a copy of `dataset_[train/val]_list_rawframes.txt` and rename it as `dataset_[train/val]_list_audio_feature.txt`, for example:
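A minimal sketch of that copy step in Python, assuming the file lists sit in the current directory:

```python
import shutil

# The audio-feature annotations are identical to the rawframe ones,
# so copying the lists under the new name is sufficient.
for split in ('train', 'val'):
    shutil.copy(f'dataset_{split}_list_rawframes.txt',
                f'dataset_{split}_list_audio_feature.txt')
```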
We list some common issues faced by many users and their corresponding solutions here.
- [Installation](#installation)
- [Data](#data)
- [Training](#training)
- [Testing](#testing)
- [Deploying](#deploying)
Feel free to enrich the list if you find any frequent issues and have ways to help others solve them.
If the contents here do not cover your issue, please create an issue using the [provided templates](/.github/ISSUE_TEMPLATE/error-report.md) and make sure you fill in all required information in the template.
## Installation
-**"No module named 'mmcv.ops'"; "No module named 'mmcv.\_ext'"**
1. Uninstall existing mmcv in the environment using `pip uninstall mmcv`
2. Install mmcv-full following the [installation instruction](https://mmcv.readthedocs.io/en/latest/#installation)
-**"OSError: MoviePy Error: creation of None failed because of the following error"**
Refer to [install.md](https://github.com/open-mmlab/mmaction2/blob/master/docs/install.md#requirements)
1. For Windows users, [ImageMagick](https://www.imagemagick.org/script/index.php) will not be automatically detected by MoviePy. You need to modify the `moviepy/config_defaults.py` file to provide the path to the ImageMagick binary, called `magick`, like `IMAGEMAGICK_BINARY = "C:\\Program Files\\ImageMagick_VERSION\\magick.exe"`
2. For Linux users, if ImageMagick is not detected by MoviePy, you need to modify the `/etc/ImageMagick-6/policy.xml` file by commenting out `<policy domain="path" rights="none" pattern="@*" />`, i.e., changing it to `<!-- <policy domain="path" rights="none" pattern="@*" /> -->`.
-**"Why I got the error message 'Please install XXCODEBASE to use XXX' even if I have already installed XXCODEBASE?"**
You got that error message because our project failed to import a function or a class from XXCODEBASE. You can try to run the corresponding line to see what happens. One possible reason is, for some codebases in OpenMMLAB, you need to install mmcv-full before you install them.
## Data
- **FileNotFound like `No such file or directory: xxx/xxx/img_00300.jpg`**
In our repo, we set `start_index=1` as the default value for rawframe datasets and `start_index=0` as the default value for video datasets.
If you encounter a FileNotFound error for the first or last frame of the data, check whether the files begin with offset 0 or 1,
i.e., `xxx_00000.jpg` or `xxx_00001.jpg`, and then change the `start_index` value in the config accordingly, as sketched below.
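A minimal sketch, assuming rawframes named from `img_00000.jpg` and the dataset-level `start_index` argument; the exact location depends on your config:

```python
# sketch: frames start at img_00000.jpg, so override the default start_index=1
data = dict(
    train=dict(start_index=0),
    val=dict(start_index=0),
    test=dict(start_index=0))
```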
- **How should we preprocess the videos in the dataset? Resizing them to a fixed size (all videos with the same height and width) like `340x256` (1), or resizing them so that the short edges of all videos are of the same length (256px or 320px) (2)?**
We have tried both preprocessing approaches and found that (2) is the better solution in general, so we use (2) with a short edge length of 256px as the default preprocessing setting. We benchmarked these preprocessing approaches, and you can find the results in the [TSN Data Benchmark](https://github.com/open-mmlab/mmaction2/tree/master/configs/recognition/tsn) and the [SlowOnly Data Benchmark](https://github.com/open-mmlab/mmaction2/tree/master/configs/recognition/slowonly).
- **Mismatched data pipeline items lead to errors like `KeyError: 'total_frames'`**
We have pipelines for processing both videos and frames.
**For videos**, we have to decode them on the fly in the pipeline, so pairs like `DecordInit & DecordDecode`, `OpenCVInit & OpenCVDecode`, and `PyAVInit & PyAVDecode` should be used, as in [this example](https://github.com/open-mmlab/mmaction2/blob/023777cfd26bb175f85d78c455f6869673e0aa09/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py#L47-L49).
**For frames**, the images have already been decoded offline, so the pipeline item `RawFrameDecode` should be used, as in [this example](https://github.com/open-mmlab/mmaction2/blob/023777cfd26bb175f85d78c455f6869673e0aa09/configs/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb.py#L49).
`KeyError: 'total_frames'` is caused by incorrectly using the `RawFrameDecode` step for videos: when the input is a video, `total_frames` cannot be known beforehand.
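For reference, here is a hedged sketch of the two pairings; the `SampleFrames` parameters are placeholders:

```python
# For videos: init a decoder first, then decode the sampled frames on the fly.
video_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
    dict(type='DecordDecode'),
]

# For rawframes: the images were decoded offline, so just load them.
frame_pipeline = [
    dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
    dict(type='RawFrameDecode'),
]
```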
## Training
- **How to just use trained recognizer models for backbone pre-training?**
Refer to [Use Pre-Trained Model](https://github.com/open-mmlab/mmaction2/blob/master/docs/tutorials/2_finetune.md#use-pre-trained-model).
To use a pre-trained model for the whole network, the new config adds the link of the pre-trained model in `load_from`.
To use a backbone for pre-training, change the `pretrained` value in the backbone dict of the config file to the checkpoint path / URL.
When training, the unexpected keys will be ignored.
- **How to visualize the training accuracy/loss curves in real-time?**
You can refer to [tutorials/1_config.md](tutorials/1_config.md), [tutorials/7_customize_runtime.md](tutorials/7_customize_runtime.md#log-config), and [this](https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py#L118).
- **In batchnorm.py: Expected more than 1 value per channel when training**
To use batch normalization, the batch size should be larger than 1. If `drop_last` is set to False when building dataloaders, the last batch of an epoch may have `batch_size==1`, and training will throw this error. You can set `drop_last=True` to avoid it:
```python
train_dataloader = dict(drop_last=True)
```
- **How to fix stages of the backbone when finetuning a model?**
You can refer to [`def _freeze_stages()`](https://github.com/open-mmlab/mmaction2/blob/0149a0e8c1e0380955db61680c0006626fd008e9/mmaction/models/backbones/x3d.py#L458) and [`frozen_stages`](https://github.com/open-mmlab/mmaction2/blob/0149a0e8c1e0380955db61680c0006626fd008e9/mmaction/models/backbones/x3d.py#L183-L184),
remembering to set `find_unused_parameters = True` in the config files for distributed training or testing.
Users can set `frozen_stages` to freeze stages in all backbones except the C3D model, since all backbones inheriting from `ResNet` and `ResNet3D` support the internal function `_freeze_stages()`.
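A minimal sketch of such a config override, with the stage count chosen arbitrarily for illustration:

```python
# sketch: freeze the first two stages of the backbone while finetuning
model = dict(backbone=dict(frozen_stages=2))
find_unused_parameters = True  # required for distributed training, per the note above
```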
- **How to set the memcached setting in config files?**
In MMAction2, you can pass memcached kwargs to `class DecordInit` for video datasets or `RawFrameDecode` for rawframe datasets.
For more details, you can refer to [`class FileClient`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/fileio/file_client.py) in MMCV.
Here is an example of using memcached for a rawframes dataset:
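The following is a sketch only; the three memcached paths are placeholders for your own deployment, and the `SampleFrames` parameters are illustrative:

```python
# placeholders: point these at your memcached deployment
mc_cfg = dict(
    server_list_cfg='/path/to/server_list.conf',
    client_cfg='/path/to/client.conf',
    sys_path='/path/to/memcached/python/bindings')

train_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
    # pass the memcached kwargs through io_backend, as described above
    dict(type='RawFrameDecode', io_backend='memcached', **mc_cfg),
]
```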
-**How to set `load_from` value in config files to finetune models?**
In MMAction2, We set `load_from=None` as default in `configs/_base_/default_runtime.py` and owing to [inheritance design](/docs/tutorials/1_config.md),
users can directly change it by setting `load_from` in their configs.
## Testing
- **How to make the predicted scores normalized by softmax within \[0, 1\]?**
Change this in the config: set `average_clips='prob'` in `model['test_cfg']`, as sketched below.
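A one-line config override, assuming your config inherits a base config:

```python
# apply softmax over class scores before averaging the clip scores
model = dict(test_cfg=dict(average_clips='prob'))
```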
- **What if the model is too large and the GPU memory can not fit even one testing sample?**
By default, the 3D models are tested with 10 clips x 3 crops, i.e., 30 views in total. For extremely large models, the GPU memory cannot fit even a single testing sample (because of the 30 views). To handle this, you can set `max_testing_views=n` in `model['test_cfg']` of the config file; then n views will be used as a batch during forwarding, to save the GPU memory used.
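A minimal override, with the view count chosen only for illustration:

```python
# sketch: forward at most 5 of the 30 views at a time to bound GPU memory
model = dict(test_cfg=dict(max_testing_views=5))
```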
- **How to show test results?**
During testing, we can use the option `--out xxx.json/pkl/yaml` to output result files for checking. The testing output has exactly the same order as the test dataset.
Besides, we provide an analysis tool for evaluating a model using the output result files in [`tools/analysis/eval_metric.py`](/tools/analysis/eval_metric.py).
## Deploying
- **Why does the ONNX model converted by MMAction2 throw an error when it is converted to other frameworks, such as TensorRT?**
For now, we can only ensure that models in MMAction2 are ONNX-compatible. However, some operations in ONNX may be unsupported by your target framework for deployment, e.g., TensorRT in [this issue](https://github.com/open-mmlab/mmaction2/issues/414). When such a situation occurs, we suggest you raise an issue and ask the community for help, as long as `pytorch2onnx.py` works well and is verified numerically.
We provide easy-to-use scripts for feature extraction.
## Clip-level Feature Extraction
Clip-level feature extraction extracts deep features from a video clip, which usually lasts several seconds to tens of seconds. The extracted feature is an n-dim vector for each clip. When performing multi-view feature extraction, e.g., n clips x m crops, the extracted feature is the average of the n * m views.
Before applying clip-level feature extraction, you need to prepare a video list (which includes all videos that you want to extract features from). For example, the video list for videos in UCF101 will look like:
```
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c02.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c03.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c04.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c05.avi
...
YoYo/v_YoYo_g25_c01.avi
YoYo/v_YoYo_g25_c02.avi
YoYo/v_YoYo_g25_c03.avi
YoYo/v_YoYo_g25_c04.avi
YoYo/v_YoYo_g25_c05.avi
```
Assume that the root of the UCF101 videos is `data/ucf101/videos` and the name of the video list is `ucf101.txt`. To extract clip-level features of UCF101 videos with a Kinetics-400 pretrained TSN, you can use the following script:
The two config files demonstrate what a minimal config file for feature extraction looks like. You can also use other existing config files for feature extraction, as long as they use videos rather than raw frames for training and testing.
This page provides basic tutorials about the usage of MMAction2.
For installation instructions, please see [install.md](install.md).
<!-- TOC -->
- [Getting Started](#getting-started)
  - [Datasets](#datasets)
  - [Inference with Pre-Trained Models](#inference-with-pre-trained-models)
    - [Test a dataset](#test-a-dataset)
    - [High-level APIs for testing a video and rawframes](#high-level-apis-for-testing-a-video-and-rawframes)
  - [Build a Model](#build-a-model)
    - [Build a model with basic components](#build-a-model-with-basic-components)
    - [Write a new model](#write-a-new-model)
  - [Train a Model](#train-a-model)
    - [Iteration pipeline](#iteration-pipeline)
    - [Training setting](#training-setting)
    - [Train with a single GPU](#train-with-a-single-gpu)
    - [Train with multiple GPUs](#train-with-multiple-gpus)
    - [Train with multiple machines](#train-with-multiple-machines)
    - [Launch multiple jobs on a single machine](#launch-multiple-jobs-on-a-single-machine)
  - [Tutorials](#tutorials)
<!-- TOC -->
## Datasets
It is recommended to symlink the dataset root to `$MMACTION2/data`.
If your folder structure is different, you may need to change the corresponding paths in config files.
```
mmaction2
├── mmaction
├── tools
├── configs
├── data
│ ├── kinetics400
│ │ ├── rawframes_train
│ │ ├── rawframes_val
│ │ ├── kinetics_train_list.txt
│ │ ├── kinetics_val_list.txt
│ ├── ucf101
│ │ ├── rawframes_train
│ │ ├── rawframes_val
│ │ ├── ucf101_train_list.txt
│ │ ├── ucf101_val_list.txt
│ ├── ...
```
For more information on data preparation, please see [data_preparation.md](data_preparation.md).
For using custom datasets, please refer to [Tutorial 3: Adding New Dataset](tutorials/3_new_dataset.md).
## Inference with Pre-Trained Models
We provide testing scripts to evaluate a whole dataset (Kinetics-400, Something-Something V1&V2, (Multi-)Moments in Time, etc.),
and also provide some high-level APIs for easier integration into other projects.
MMAction2 also supports testing with CPU. However, it will be **very slow** and should only be used for debugging on a device without GPU.
To test with CPU, one should first disable all GPUs (if exist) with `export CUDA_VISIBLE_DEVICES=-1`, and then call the testing scripts directly with `python tools/test.py {OTHER_ARGS}`.
### Test a dataset
- [x] single GPU
- [x] single node multiple GPUs
- [x] multiple node
You can use the following commands to test a dataset.
- `RESULT_FILE`: Filename of the output results. If not specified, the results will not be saved to a file.
- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., `top_k_accuracy` and `mean_class_accuracy` are available for all recognition datasets, `mmit_mean_average_precision` for Multi-Moments in Time, `mean_average_precision` for Multi-Moments in Time and HVU single category, `AR@AN` for ActivityNet, etc.
- `--gpu-collect`: If specified, recognition results will be collected using GPU communication. Otherwise, the results on different GPUs will be saved to `TMPDIR` and collected by the rank 0 worker.
- `TMPDIR`: Temporary directory used for collecting results from multiple workers, available when `--gpu-collect` is not specified.
- `OPTIONS`: Custom options used for evaluation. Allowed values depend on the arguments of the `evaluate` function of the dataset.
- `AVG_TYPE`: Items to average the test clips. If set to `prob`, it will apply softmax before averaging the clip scores. Otherwise, it will directly average the clip scores.
- `JOB_LAUNCHER`: Items for distributed job initialization launcher. Allowed choices are `none`, `pytorch`, `slurm`, `mpi`. Especially, if set to `none`, it will test in a non-distributed mode.
- `LOCAL_RANK`: ID for local rank. If not specified, it will be set to 0.
- `--onnx`: If specified, recognition results will be generated by the ONNX model, and `CHECKPOINT_FILE` should be the ONNX model file path. ONNX model files are generated by `/tools/deployment/pytorch2onnx.py`. For now, multi-GPU mode and dynamic input shape mode are not supported. Please note that the output tensors of the dataset and the input tensors of the ONNX model should share the same shape. It is also recommended to remove all test-time augmentation methods from `test_pipeline` (`ThreeCrop`, `TenCrop`, `twice_sample`, etc.).
- `--tensorrt`: If specified, recognition results will be generated by the TensorRT engine, and `CHECKPOINT_FILE` should be the TensorRT engine file path. TensorRT engines are generated from exported ONNX models with the official TensorRT conversion tools. For now, multi-GPU mode and dynamic input shape mode are not supported. Please note that the output tensors of the dataset and the input tensors of the TensorRT engine should share the same shape. It is also recommended to remove all test-time augmentation methods from `test_pipeline` (`ThreeCrop`, `TenCrop`, `twice_sample`, etc.).
Examples:
Assume that you have already downloaded the checkpoints to the directory `checkpoints/`.
1. Test TSN on Kinetics-400 (without saving the test results) and evaluate the top-k accuracy and mean class accuracy.
```python
print('The top-5 labels with corresponding scores are:')
for result in results:
    print(f'{result[0]}: ', result[1])
```
:::{note}
We define `data_prefix` in config files and set it to None by default in our provided inference configs.
If the `data_prefix` is not None, the path for the video file (or rawframe directory) to get will be `data_prefix/video`.
Here, the `video` is the param in the demo scripts above.
This detail can be found in `rawframe_dataset.py` and `video_dataset.py`. For example,
- When video (rawframes) path is `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME/img_xxxxx.jpg`), and `data_prefix` is None in the config file,
the param `video` should be `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME`).
- When video (rawframes) path is `SOME_DIR_PATH/VIDEO.mp4` (`SOME_DIR_PATH/VIDEO_NAME/img_xxxxx.jpg`), and `data_prefix` is `SOME_DIR_PATH` in the config file,
the param `video` should be `VIDEO.mp4` (`VIDEO_NAME`).
- When rawframes path is `VIDEO_NAME/img_xxxxx.jpg`, and `data_prefix` is None in the config file, the param `video` should be `VIDEO_NAME`.
- When passing a url instead of a local video file, you need to use OpenCV as the video decoding backend.
:::
A notebook demo can be found in [demo/demo.ipynb](/demo/demo.ipynb).
## Build a Model
### Build a model with basic components
In MMAction2, model components are basically categorized into 4 types.
- recognizer: the whole recognizer model pipeline, usually contains a backbone and cls_head.
- backbone: usually an FCN network to extract feature maps, e.g., ResNet, BNInception.
- cls_head: the component for classification task, usually contains an FC layer with some pooling layers.
- localizer: the model for localization task, currently available: BSN, BMN.
Following some basic pipelines (e.g., `Recognizer2D`), the model structure
can be customized through config files without pain, as the sketch below shows.
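The following is a hypothetical minimal recognizer config, assembled from the component types listed above; the specific parameter values are illustrative:

```python
# sketch of a recognizer config: a backbone paired with a cls_head
model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNet',
        pretrained='torchvision://resnet50',
        depth=50),
    cls_head=dict(
        type='TSNHead',
        num_classes=400,
        in_channels=2048))
```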
If we want to implement some new components, e.g., the temporal shift backbone structure as
in [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383), there are several things to do.
1. Create a new file `mmaction/models/backbones/resnet_tsm.py`.
```python
from ..builder import BACKBONES
from .resnet import ResNet


@BACKBONES.register_module()
class ResNetTSM(ResNet):

    def __init__(self,
                 depth,
                 num_segments=8,
                 is_shift=True,
                 shift_div=8,
                 shift_place='blockres',
                 temporal_pool=False,
                 **kwargs):
        pass

    def forward(self, x):
        # implementation is ignored
        pass
```
2. Import the module in `mmaction/models/backbones/__init__.py`
```python
from .resnet_tsm import ResNetTSM
```
3. Modify the config file from
```python
backbone=dict(
    type='ResNet',
    pretrained='torchvision://resnet50',
    depth=50,
    norm_eval=False)
```
to
```python
backbone=dict(
    type='ResNetTSM',
    pretrained='torchvision://resnet50',
    depth=50,
    norm_eval=False,
    shift_div=8)
```
### Write a new model
To write a new recognition pipeline, you need to inherit from `BaseRecognizer`,
which defines the following abstract methods.
- `forward_train()`: forward method of the training mode.
- `forward_test()`: forward method of the testing mode.
[Recognizer2D](/mmaction/models/recognizers/recognizer2d.py) and [Recognizer3D](/mmaction/models/recognizers/recognizer3d.py)
are good examples which show how to do that.
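A hedged skeleton of such a subclass is sketched below; the class name is hypothetical, and the import paths for the registry and base class may vary across MMAction2 versions:

```python
# skeleton only; import paths may differ between MMAction2 versions
from mmaction.models.builder import RECOGNIZERS
from mmaction.models.recognizers import BaseRecognizer


@RECOGNIZERS.register_module()
class MyRecognizer(BaseRecognizer):
    """Hypothetical recognizer implementing the two abstract methods."""

    def forward_train(self, imgs, labels, **kwargs):
        # extract features, run the cls_head, and return a dict of losses
        pass

    def forward_test(self, imgs):
        # return the averaged clip scores for testing
        pass
```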
## Train a Model
### Iteration pipeline
MMAction2 implements distributed training and non-distributed training,
which use `MMDistributedDataParallel` and `MMDataParallel`, respectively.
We adopt distributed training for both single machine and multiple machines.
Supposing that the server has 8 GPUs, 8 processes will be started and each process runs on a single GPU.
Each process keeps an isolated model, data loader, and optimizer.
Model parameters are only synchronized once at the beginning.
After a forward and backward pass, gradients will be allreduced among all GPUs,
and the optimizer will update model parameters.
Since the gradients are allreduced, the model parameter stays the same for all processes after the iteration.
### Training setting
All outputs (log files and checkpoints) will be saved to the working directory,
which is specified by `work_dir` in the config file.
By default, we evaluate the model on the validation set after each epoch. You can change the evaluation interval by modifying the `interval` argument in the training config:
```python
evaluation = dict(interval=5)  # This evaluates the model every 5 epochs.
```
According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you need to set the learning rate proportional to the batch size if you use different GPUs or videos per GPU, e.g., lr=0.01 for 4 GPUs x 2 video/gpu and lr=0.08 for 16 GPUs x 4 video/gpu.
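As a worked example of this rule, using the two settings quoted above:

```python
# linear scaling rule: the learning rate scales with the total batch size
base_lr = 0.01               # reference: 4 GPUs x 2 videos per GPU
base_batch = 4 * 2
new_batch = 16 * 4           # target: 16 GPUs x 4 videos per GPU
lr = base_lr * new_batch / base_batch
print(lr)                    # 0.08, matching the example above
```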
MMAction2 also supports training with CPU. However, it will be **very slow** and should only be used for debugging on a device without GPU.
To train with CPU, one should first disable all GPUs (if exist) with `export CUDA_VISIBLE_DEVICES=-1`, and then call the training scripts directly with `python tools/train.py {OTHER_ARGS}`.
- `--validate` (**strongly recommended**): Perform evaluation every k epochs during training (k defaults to 5 and can be modified by changing the `interval` value in the `evaluation` dict of each config file).
- `--test-last`: Test the final checkpoint when training is over, and save the predictions to `${WORK_DIR}/last_pred.pkl`.
- `--test-best`: Test the best checkpoint when training is over, and save the predictions to `${WORK_DIR}/best_pred.pkl`.
- `--work-dir ${WORK_DIR}`: Override the working directory specified in the config file.
- `--resume-from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
- `--gpus ${GPU_NUM}`: Number of GPUs to use, which is only applicable to non-distributed training.
- `--gpu-ids ${GPU_IDS}`: IDs of the GPUs to use, which is only applicable to non-distributed training.
- `--seed ${SEED}`: Seed for the random state in python, numpy and pytorch when generating random numbers.
- `--deterministic`: If specified, it will set deterministic options for the CUDNN backend.
- `JOB_LAUNCHER`: Items for distributed job initialization launcher. Allowed choices are `none`, `pytorch`, `slurm`, `mpi`. Especially, if set to `none`, it will train in a non-distributed mode.
- `LOCAL_RANK`: ID for local rank. If not specified, it will be set to 0.
Difference between `resume-from` and `load-from`:
`resume-from` loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming the training process that is interrupted accidentally.
`load-from` only loads the model weights and the training epoch starts from 0. It is usually used for finetuning.
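A sketch of the two options in a config, with placeholder paths:

```python
# finetuning: load model weights only; training starts from epoch 0
load_from = 'https://download.openmmlab.com/path/to/checkpoint.pth'  # placeholder URL

# resuming: restore model weights, optimizer status, and the epoch number
resume_from = 'work_dirs/my_experiment/epoch_20.pth'  # placeholder path
```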
Here is an example of using 8 GPUs to load TSN checkpoint.
If you run MMAction2 on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`. (This script also supports single-machine training.)
Here is an example of using 16 GPUs to train TSN on the dev partition in a slurm cluster. (use `GPUS_PER_NODE=8` to specify a single slurm cluster node with 8 GPUs.)
```shell
GPUS=16 ./tools/slurm_train.sh dev tsn_r50_k400 configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py --work-dir work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb
```
You can check [slurm_train.sh](/tools/slurm_train.sh) for full arguments and environment variables.
If you have multiple machines connected simply via Ethernet, you can run the following commands:
On the first machine:
```shell
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
```
On the second machine:
```shell
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
```
It can be extremely slow if you do not have high-speed networking like InfiniBand.
### Launch multiple jobs on a single machine
If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
you need to specify different ports (29500 by default) for each job to avoid communication conflict.
If you use `dist_train.sh` to launch training jobs, you can set the port in the commands.
If you launch training jobs with slurm, you need to modify `dist_params` in the config files (usually the 6th line from the bottom) to set different communication ports.
In `config1.py`,
```python
dist_params=dict(backend='nccl',port=29500)
```
In `config2.py`,
```python
dist_params=dict(backend='nccl',port=29501)
```
Then you can launch two jobs with `config1.py` and `config2.py`.
Currently, we provide some tutorials for users to [learn about configs](tutorials/1_config.md), [finetune model](tutorials/2_finetune.md),
[add new dataset](tutorials/3_new_dataset.md), [customize data pipelines](tutorials/4_data_pipeline.md),
[add new modules](tutorials/5_new_modules.md), [export a model to ONNX](tutorials/6_export_model.md) and [customize runtime settings](tutorials/7_customize_runtime.md).
- [denseflow](https://github.com/open-mmlab/denseflow) (optional): See [here](https://github.com/innerlee/setup) for simple install scripts.
- [moviepy](https://zulko.github.io/moviepy/) (optional): `pip install moviepy`. See [here](https://zulko.github.io/moviepy/install.html) for the official installation. **Note** (according to [this issue](https://github.com/Zulko/moviepy/issues/693)) that:
  1. For Windows users, [ImageMagick](https://www.imagemagick.org/script/index.php) will not be automatically detected by MoviePy;
     you need to modify the `moviepy/config_defaults.py` file to provide the path to the ImageMagick binary, called `magick`, like `IMAGEMAGICK_BINARY = "C:\\Program Files\\ImageMagick_VERSION\\magick.exe"`
  2. For Linux users, if [ImageMagick](https://www.imagemagick.org/script/index.php) is not detected by `moviepy`, you need to modify the `/etc/ImageMagick-6/policy.xml` file by commenting out
     `<policy domain="path" rights="none" pattern="@*" />`, i.e., changing it to `<!-- <policy domain="path" rights="none" pattern="@*" /> -->`.
- [Pillow-SIMD](https://docs.fast.ai/performance.html#pillow-simd) (optional): Install it with the following scripts.
mmcv-full is only compiled on PyTorch 1.x.0 because the compatibility usually holds between 1.x.0 and 1.x.1. If your PyTorch version is 1.x.1, you can install mmcv-full compiled with PyTorch 1.x.0 and it usually works well.
See [here](https://github.com/open-mmlab/mmcv#installation) for different versions of MMCV compatible to different PyTorch and CUDA versions.
Optionally, you can compile mmcv from source with the following commands:
```shell
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e .  # package mmcv-full, which contains cuda ops, will be installed after this step
# OR pip install -e .  # package mmcv, which contains no cuda ops, will be installed after this step
cd ..
```
Or directly run
```shell
pip install mmcv-full
# alternative: pip install mmcv
```
**Important:** You need to run `pip uninstall mmcv` first if you have mmcv installed. If mmcv and mmcv-full are both installed, there will be `ModuleNotFoundError`.
d. Install mmdetection for spatial temporal detection tasks.
This part is **optional** if you're not going to do spatial temporal detection.
See [here](https://github.com/open-mmlab/mmdetection#installation) to install mmdetection.
:::{note}
1. The git commit id will be written to the version number with step b, e.g. 0.6.0+2e7045c. The version will also be saved in trained models.
It is recommended that you run step b each time you pull some updates from github. If C++/CUDA codes are modified, then this step is compulsory.
2. Following the above instructions, MMAction2 is installed on `dev` mode, any local modifications made to the code will take effect without the need to reinstall it (unless you submit some commits and want to update the version number).
3. If you would like to use `opencv-python-headless` instead of `opencv-python`,
you can install it before installing MMCV.
4. If you would like to use `PyAV`, you can install it with `conda install av -c conda-forge -y`.
5. Some dependencies are optional. Running `python setup.py develop` will only install the minimum runtime requirements.
To use optional dependencies like `decord`, either install them with `pip install -r requirements/optional.txt`
or specify desired extras when calling `pip` (e.g. `pip install -v -e .[optional]`,
valid keys for the `[optional]` field are `all`, `tests`, `build`, and `optional`) like `pip install -v -e .[tests,build]`.
:::
## Install with CPU only
The code can be built for a CPU-only environment (where CUDA isn't available).
In CPU mode, you can run `demo/demo.py`, for example.
## Another option: Docker Image
We provide a [Dockerfile](/docker/Dockerfile) to build an image.
```shell
# build an image with PyTorch 1.6.0, CUDA 10.1, CUDNN 7
docker build -f ./docker/Dockerfile --rm -t mmaction2 .
```
**Important:** Make sure you've installed the [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker).
Run it with command:
```shell
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmaction2/data mmaction2
```
## A from-scratch setup script
Here is a full script for setting up MMAction2 with conda and linking the dataset path (supposing that your Kinetics-400 dataset path is $KINETICS400_ROOT).
```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
# install latest pytorch prebuilt with the default prebuilt CUDA version (usually the latest)
conda install -c pytorch pytorch torchvision -y
# install the latest mmcv or mmcv-full, here we take mmcv as example
pip install mmcv
```
There are many research works and projects built on MMAction2.
We list some of them as examples of how to extend MMAction2 for your own projects.
As the page might not be completed, please feel free to create a PR to update this page.
## Projects as an extension
- [OTEAction2](https://github.com/openvinotoolkit/mmaction2): OpenVINO Training Extensions for Action Recognition.
## Projects of papers
There are also projects released with papers.
Some of the papers are published in top-tier conferences (CVPR, ICCV, and ECCV), the others are also highly influential.
To make this list also a reference for the community to develop and compare new video understanding algorithms, we list them following the time order of top-tier conferences.
Methods already supported and maintained by MMAction2 are not listed.
- Evidential Deep Learning for Open Set Action Recognition, ICCV 2021 Oral. [\[paper\]](https://arxiv.org/abs/2107.10161)[\[github\]](https://github.com/Cogito2012/DEAR)
- Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective, ICCV 2021 Oral. [\[paper\]](https://arxiv.org/abs/2103.17263)[\[github\]](https://github.com/xvjiarui/VFS)
- MGSampler: An Explainable Sampling Strategy for Video Action Recognition, ICCV 2021. [\[paper\]](https://arxiv.org/abs/2104.09952)[\[github\]](https://github.com/MCG-NJU/MGSampler)
- MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions, ICCV 2021. [\[paper\]](https://arxiv.org/abs/2105.07404)
- Video Swin Transformer. [\[paper\]](https://arxiv.org/abs/2106.13230)[\[github\]](https://github.com/SwinTransformer/Video-Swin-Transformer)
- Long Short-Term Transformer for Online Action Detection. [\[paper\]](https://arxiv.org/abs/2107.03377)