- [Config System for Action localization](#config-system-for-action-localization)
- [Config System for Action Recognition](#config-system-for-action-recognition)
- [Config System for Spatio-Temporal Action Detection](#config-system-for-spatio-temporal-action-detection)
- [FAQ](#faq)
- [Use intermediate variables in configs](#use-intermediate-variables-in-configs)
<!-- TOC -->
## Modify config through script arguments
When submitting jobs using "tools/train.py" or "tools/test.py", you may specify `--cfg-options` to modify the config in place.
- Update config keys of dict.
The config options can be specified following the order of the dict keys in the original config.
For example, `--cfg-options model.backbone.norm_eval=False` changes all the BN modules in the model backbone to `train` mode.
- Update keys inside a list of configs.
Some config dicts are composed as a list in your config. For example, the training pipeline `data.train.pipeline` is normally a list
e.g. `[dict(type='SampleFrames'), ...]`. If you want to change `'SampleFrames'` to `'DenseSampleFrames'` in the pipeline,
you may specify `--cfg-options data.train.pipeline.0.type=DenseSampleFrames`.
- Update values of lists/tuples.
Some values to be updated are lists or tuples. For example, the config file normally sets `workflow=[('train', 1)]`. If you want to
change this key, you may specify `--cfg-options workflow="[(train,1),(val,1)]"`. Note that the quotation marks " are necessary to
support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.
## Config File Structure
There are 3 basic component types under `config/_base_`: model, schedule, default_runtime.
Many methods could be easily constructed with one of each, like TSN, I3D, SlowOnly, etc.
The configs that are composed by components from `_base_` are called _primitive_.
For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3.
For easy understanding, we recommend contributors to inherit from existing methods.
For example, if some modification is made based on TSN, users may first inherit the basic TSN structure by specifying `_base_ = ../tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py`, then modify the necessary fields in the config files.
If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder under `configs/TASK`.
Please refer to [mmcv](https://mmcv.readthedocs.io/en/latest/understand_mmcv/config.html) for detailed documentation.
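As a minimal sketch of this inheritance mechanism (the overridden field below is purely illustrative):

```python
_base_ = ['../tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py']

# Only the fields that differ from the primitive config need to be written;
# everything else is inherited from the _base_ file.
model = dict(cls_head=dict(num_classes=101))  # e.g. adapt the head to a 101-class dataset
```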
## Config File Naming Convention
We follow the style below to name config files. Contributors are advised to follow the same style.
```
{model}_[model setting]_{backbone}_[misc]_{data setting}_[gpu x batch_per_gpu]_{schedule}_{dataset}_{modality}
```
`{xxx}` is a required field and `[yyy]` is optional.
- `{model}`: model type, e.g. `tsn`, `i3d`, etc.
- `[model setting]`: specific setting for some models.
- `{backbone}`: backbone type, e.g. `r50` (ResNet-50), etc.
- `[misc]`: miscellaneous setting/plugins of the model, e.g. `dense`, `320p`, `video`, etc.
- `{data setting}`: frame sample setting in `{clip_len}x{frame_interval}x{num_clips}` format.
- `[gpu x batch_per_gpu]`: GPUs and samples per GPU.
- `{schedule}`: training schedule, e.g. `20e` means 20 epochs.
- `{dataset}`: dataset name, e.g. `kinetics400`, `mmit`, etc.
- `{modality}`: frame modality, e.g. `rgb`, `flow`, etc.
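For example, taking `tsn_r50_1x1x3_100e_kinetics400_rgb.py` (used above): `tsn` is the model, `r50` the backbone, `1x1x3` the data setting (`clip_len=1`, `frame_interval=1`, `num_clips=3`), `100e` the schedule (100 epochs), `kinetics400` the dataset and `rgb` the modality.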
### Config System for Action localization
We incorporate modular design into our config system,
which is convenient to conduct various experiments.
- An Example of BMN
To help the users have a basic idea of a complete config structure and the modules in an action localization system,
we make brief comments on the config of BMN as follows.
For more detailed usage and alternatives for each parameter in each module, please refer to the [API documentation](https://mmaction2.readthedocs.io/en/latest/api.html).
```python
# model settings
model = dict(  # Config of the model
    type='BMN',  # Type of the localizer
    temporal_dim=100,  # Total frames selected for each video
    boundary_ratio=0.5,  # Ratio for determining video boundaries
    num_samples=32,  # Number of samples for each proposal
    num_samples_per_bin=3)  # Number of bin samples for each sample
# dataset settings and data pipelines (most pipeline steps are omitted here; only the final step survives)
train_pipeline = [
    # ...
    dict(  # Config of ToTensor
        type='ToTensor',  # Convert other types to tensor type pipeline
        keys=['raw_feature']),  # Keys to be converted to tensor
]
data=dict(# Config of data
videos_per_gpu=8,# Batch size of each single GPU
workers_per_gpu=8,# Workers to pre-fetch data for each single GPU
train_dataloader=dict(# Additional config of train dataloader
drop_last=True),# Whether to drop out the last batch of data in training
val_dataloader=dict(# Additional config of validation dataloader
videos_per_gpu=1),# Batch size of each single GPU during evaluation
test_dataloader=dict(# Additional config of test dataloader
videos_per_gpu=2),# Batch size of each single GPU during testing
test=dict(# Testing dataset config
type=dataset_type,
ann_file=ann_file_test,
pipeline=test_pipeline,
data_prefix=data_root_val),
val=dict(# Validation dataset config
type=dataset_type,
ann_file=ann_file_val,
pipeline=val_pipeline,
data_prefix=data_root_val),
train=dict(# Training dataset config
type=dataset_type,
ann_file=ann_file_train,
pipeline=train_pipeline,
data_prefix=data_root))
# optimizer
optimizer=dict(
# Config used to build optimizer, support (1). All the optimizers in PyTorch
# whose arguments are also the same as those in PyTorch. (2). Custom optimizers
# which are built on `constructor`, referring to "tutorials/5_new_modules.md"
# for implementation.
type='Adam',# Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
lr=0.001,# Learning rate, see detail usages of the parameters in the documentation of PyTorch
weight_decay=0.0001)# Weight decay of Adam
optimizer_config=dict(# Config used to build the optimizer hook
grad_clip=None)# Most of the methods do not use gradient clip
# learning policy
lr_config=dict(# Learning rate scheduler config used to register LrUpdater hook
policy='step',# Policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
step=7)# Steps to decay the learning rate
total_epochs=9# Total epochs to train the model
checkpoint_config=dict(# Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
interval=1)# Interval to save checkpoint
evaluation=dict(# Config of evaluation during training
interval=1,# Interval to perform evaluation
metrics=['AR@AN'])# Metrics to be performed
log_config=dict(# Config to register logger hook
interval=50,# Interval to print the log
hooks=[# Hooks to be implemented during training
dict(type='TextLoggerHook'),# The logger used to record the training process
# dict(type='TensorboardLoggerHook'), # The Tensorboard logger is also supported
])
# runtime settings
dist_params=dict(backend='nccl')# Parameters to setup distributed training, the port can also be set
log_level='INFO'# The level of logging
work_dir='./work_dirs/bmn_400x100_2x8_9e_activitynet_feature/'# Directory to save the model checkpoints and logs for the current experiments
load_from=None# load models as a pre-trained model from a given path. This will not resume training
resume_from=None# Resume checkpoints from a given path; the training will be resumed from the epoch when the checkpoint was saved
workflow=[('train',1)]# Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once
output_config=dict(# Config of localization output
out=f'{work_dir}/results.json',# Path to output file
output_format='json')# File format of output file
```
### Config System for Action Recognition
We incorporate modular design into our config system,
which is convenient to conduct various experiments.
- An Example of TSN
To help the users have a basic idea of a complete config structure and the modules in an action recognition system,
we make brief comments on the config of TSN as follows.
For more detailed usage and alternatives for each parameter in each module, please refer to the API documentation.
```python
# model settings
model=dict(# Config of the model
type='Recognizer2D',# Type of the recognizer
backbone=dict(# Dict for backbone
type='ResNet',# Name of the backbone
pretrained='torchvision://resnet50',# The url/site of the pretrained model
depth=50,# Depth of ResNet model
norm_eval=False),# Whether to set BN layers to eval mode when training
cls_head=dict(# Dict for classification head
type='TSNHead',# Name of classification head
num_classes=400,# Number of classes to be classified.
in_channels=2048,# The input channels of classification head.
spatial_type='avg',# Type of pooling in spatial dimension
consensus=dict(type='AvgConsensus',dim=1),# Config of consensus module
dropout_ratio=0.4,# Probability in dropout layer
init_std=0.01),# Std value for linear layer initiation
# model training and testing settings
train_cfg=None,# Config of training hyperparameters for TSN
test_cfg=dict(average_clips=None))# Config for testing hyperparameters for TSN.
# dataset settings
dataset_type='RawframeDataset'# Type of dataset for training, validation and testing
data_root='data/kinetics400/rawframes_train/'# Root path to data for training
data_root_val='data/kinetics400/rawframes_val/'# Root path to data for validation and testing
ann_file_train='data/kinetics400/kinetics400_train_list_rawframes.txt'# Path to the annotation file for training
ann_file_val='data/kinetics400/kinetics400_val_list_rawframes.txt'# Path to the annotation file for validation
ann_file_test='data/kinetics400/kinetics400_val_list_rawframes.txt'# Path to the annotation file for testing
img_norm_cfg=dict(# Config of image normalization used in data pipeline
mean=[123.675,116.28,103.53],# Mean values of different channels to normalize
std=[58.395,57.12,57.375],# Std values of different channels to normalize
to_bgr=False)# Whether to convert channels from RGB to BGR
train_pipeline=[# List of training pipeline steps
dict(# Config of SampleFrames
type='SampleFrames',# Sample frames pipeline, sampling frames from video
clip_len=1,# Frames of each sampled output clip
frame_interval=1,# Temporal interval of adjacent sampled frames
num_clips=3),# Number of clips to be sampled
dict(# Config of RawFrameDecode
type='RawFrameDecode'),# Load and decode Frames pipeline, picking raw frames with given indices
dict(# Config of Resize
type='Resize',# Resize pipeline
scale=(-1,256)),# The scale to resize images
dict(# Config of MultiScaleCrop
type='MultiScaleCrop',# Multi scale crop pipeline, cropping images with a list of randomly selected scales
input_size=224,# Input size of the network
scales=(1,0.875,0.75,0.66),# Scales of width and height to be selected
random_crop=False,# Whether to randomly sample cropping bbox
max_wh_scale_gap=1),# Maximum gap of w and h scale levels
dict(# Config of Resize
type='Resize',# Resize pipeline
scale=(224,224),# The scale to resize images
keep_ratio=False),# Whether to resize with changing the aspect ratio
dict(# Config of Flip
type='Flip',# Flip Pipeline
flip_ratio=0.5),# Probability of implementing flip
dict(# Config of Normalize
type='Normalize',# Normalize pipeline
**img_norm_cfg),# Config of image normalization
dict(# Config of FormatShape
type='FormatShape',# Format shape pipeline, Format final image shape to the given input_format
input_format='NCHW'),# Final image shape format
dict(# Config of Collect
type='Collect',# Collect pipeline that decides which keys in the data should be passed to the recognizer
keys=['imgs','label'],# Keys of input
meta_keys=[]),# Meta keys of input
dict(# Config of ToTensor
type='ToTensor',# Convert other types to tensor type pipeline
keys=['imgs','label'])# Keys to be converted from image to tensor
]
val_pipeline=[# List of validation pipeline steps
dict(# Config of SampleFrames
type='SampleFrames',# Sample frames pipeline, sampling frames from video
clip_len=1,# Frames of each sampled output clip
frame_interval=1,# Temporal interval of adjacent sampled frames
num_clips=3,# Number of clips to be sampled
test_mode=True),# Whether to set test mode in sampling
dict(# Config of RawFrameDecode
type='RawFrameDecode'),# Load and decode Frames pipeline, picking raw frames with given indices
dict(# Config of Resize
type='Resize',# Resize pipeline
scale=(-1,256)),# The scale to resize images
dict(# Config of CenterCrop
type='CenterCrop',# Center crop pipeline, cropping the center area from images
crop_size=224),# The size to crop images
dict(# Config of Flip
type='Flip',# Flip pipeline
flip_ratio=0),# Probability of implementing flip
dict(# Config of Normalize
type='Normalize',# Normalize pipeline
**img_norm_cfg),# Config of image normalization
dict(# Config of FormatShape
type='FormatShape',# Format shape pipeline, Format final image shape to the given input_format
input_format='NCHW'),# Final image shape format
dict(# Config of Collect
type='Collect',# Collect pipeline that decides which keys in the data should be passed to the recognizer
keys=['imgs','label'],# Keys of input
meta_keys=[]),# Meta keys of input
dict(# Config of ToTensor
type='ToTensor',# Convert other types to tensor type pipeline
keys=['imgs'])# Keys to be converted from image to tensor
]
test_pipeline=[# List of testing pipeline steps
dict(# Config of SampleFrames
type='SampleFrames',# Sample frames pipeline, sampling frames from video
clip_len=1,# Frames of each sampled output clip
frame_interval=1,# Temporal interval of adjacent sampled frames
num_clips=25,# Number of clips to be sampled
test_mode=True),# Whether to set test mode in sampling
dict(# Config of RawFrameDecode
type='RawFrameDecode'),# Load and decode Frames pipeline, picking raw frames with given indices
dict(# Config of Resize
type='Resize',# Resize pipeline
scale=(-1,256)),# The scale to resize images
dict(# Config of TenCrop
type='TenCrop',# Ten crop pipeline, cropping ten area from images
crop_size=224),# The size to crop images
dict(# Config of Flip
type='Flip',# Flip pipeline
flip_ratio=0),# Probability of implementing flip
dict(# Config of Normalize
type='Normalize',# Normalize pipeline
**img_norm_cfg),# Config of image normalization
dict(# Config of FormatShape
type='FormatShape',# Format shape pipeline, Format final image shape to the given input_format
input_format='NCHW'),# Final image shape format
dict(# Config of Collect
type='Collect',# Collect pipeline that decides which keys in the data should be passed to the recognizer
keys=['imgs','label'],# Keys of input
meta_keys=[]),# Meta keys of input
dict(# Config of ToTensor
type='ToTensor',# Convert other types to tensor type pipeline
keys=['imgs'])# Keys to be converted from image to tensor
]
data=dict(# Config of data
videos_per_gpu=32,# Batch size of each single GPU
workers_per_gpu=2,# Workers to pre-fetch data for each single GPU
train_dataloader=dict(# Additional config of train dataloader
drop_last=True),# Whether to drop out the last batch of data in training
val_dataloader=dict(# Additional config of validation dataloader
videos_per_gpu=1),# Batch size of each single GPU during evaluation
test_dataloader=dict(# Additional config of test dataloader
videos_per_gpu=2),# Batch size of each single GPU during testing
train=dict(# Training dataset config
type=dataset_type,
ann_file=ann_file_train,
data_prefix=data_root,
pipeline=train_pipeline),
val=dict(# Validation dataset config
type=dataset_type,
ann_file=ann_file_val,
data_prefix=data_root_val,
pipeline=val_pipeline),
test=dict(# Testing dataset config
type=dataset_type,
ann_file=ann_file_test,
data_prefix=data_root_val,
pipeline=test_pipeline))
# optimizer
optimizer=dict(
# Config used to build optimizer, support (1). All the optimizers in PyTorch
# whose arguments are also the same as those in PyTorch. (2). Custom optimizers
# which are built on `constructor`, referring to "tutorials/5_new_modules.md"
# for implementation.
type='SGD',# Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
lr=0.01,# Learning rate, see detail usages of the parameters in the documentation of PyTorch
momentum=0.9,# Momentum,
weight_decay=0.0001)# Weight decay of SGD
optimizer_config=dict(# Config used to build the optimizer hook
grad_clip=dict(max_norm=40,norm_type=2))# Use gradient clip
# learning policy
lr_config=dict(# Learning rate scheduler config used to register LrUpdater hook
policy='step',# Policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
step=[40,80])# Steps to decay the learning rate
total_epochs=100# Total epochs to train the model
checkpoint_config=dict(# Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
interval=5)# Interval to save checkpoint
evaluation=dict(# Config of evaluation during training
interval=5,# Interval to perform evaluation
metrics=['top_k_accuracy','mean_class_accuracy'],# Metrics to be performed
metric_options=dict(top_k_accuracy=dict(topk=(1,3))),# Set top-k accuracy to 1 and 3 during validation
save_best='top_k_accuracy')# set `top_k_accuracy` as key indicator to save best checkpoint
eval_config=dict(
metric_options=dict(top_k_accuracy=dict(topk=(1,3))))# Set top-k accuracy to 1 and 3 during testing. You can also use `--eval top_k_accuracy` to assign evaluation metrics
log_config=dict(# Config to register logger hook
interval=20,# Interval to print the log
hooks=[# Hooks to be implemented during training
dict(type='TextLoggerHook'),# The logger used to record the training process
# dict(type='TensorboardLoggerHook'), # The Tensorboard logger is also supported
])
# runtime settings
dist_params=dict(backend='nccl')# Parameters to setup distributed training, the port can also be set
log_level='INFO'# The level of logging
work_dir='./work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/'# Directory to save the model checkpoints and logs for the current experiments
load_from=None# load models as a pre-trained model from a given path. This will not resume training
resume_from=None# Resume checkpoints from a given path; the training will be resumed from the epoch when the checkpoint was saved
workflow=[('train',1)]# Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once
```
### Config System for Spatio-Temporal Action Detection
We incorporate modular design into our config system, which is convenient to conduct various experiments.
- An Example of FastRCNN
To help the users have a basic idea of a complete config structure and the modules in a spatio-temporal action detection system,
we make brief comments on the config of FastRCNN as follows.
For more detailed usage and alternatives for each parameter in each module, please refer to the API documentation.
```python
# model setting
model=dict(# Config of the model
type='FastRCNN',# Type of the detector
backbone=dict(# Dict for backbone
type='ResNet3dSlowOnly',# Name of the backbone
depth=50,# Depth of ResNet model
pretrained=None,# The url/site of the pretrained model
pretrained2d=False,# If the pretrained model is 2D
lateral=False,# If the backbone is with lateral connections
num_stages=4,# Stages of ResNet model
conv1_kernel=(1,7,7),# Conv1 kernel size
conv1_stride_t=1,# Conv1 temporal stride
pool1_stride_t=1,# Pool1 temporal stride
spatial_strides=(1,2,2,1)),# The spatial stride for each ResNet stage
roi_head=dict(# Dict for roi_head
type='AVARoIHead',# Name of the roi_head
bbox_roi_extractor=dict(# Dict for bbox_roi_extractor
type='SingleRoIExtractor3D',# Name of the bbox_roi_extractor
roi_layer_type='RoIAlign',# Type of the RoI op
output_size=8,# Output feature size of the RoI op
with_temporal_pool=True),# If temporal dim is pooled
bbox_head=dict(# Dict for bbox_head
type='BBoxHeadAVA',# Name of the bbox_head
in_channels=2048,# Number of channels of the input feature
num_classes=81,# Number of action classes + 1
multilabel=True,# If the dataset is multilabel
dropout_ratio=0.5)),# The dropout ratio used
# model training and testing settings
train_cfg=dict(# Training config of FastRCNN
rcnn=dict(# Dict for rcnn training config
assigner=dict(# Dict for assigner
type='MaxIoUAssignerAVA',# Name of the assigner
pos_iou_thr=0.9,# IoU threshold for positive examples, > pos_iou_thr -> positive
neg_iou_thr=0.9,# IoU threshold for negative examples, < neg_iou_thr -> negative
min_pos_iou=0.9),# Minimum acceptable IoU for positive examples
sampler=dict(# Dict for sample
type='RandomSampler',# Name of the sampler
num=32,# Batch Size of the sampler
pos_fraction=1,# Positive bbox fraction of the sampler
neg_pos_ub=-1,# Upper bound of the ratio of num negative to num positive
add_gt_as_proposals=True),# Add gt bboxes as proposals
pos_weight=1.0,# Loss weight of positive examples
debug=False)),# Debug mode
test_cfg=dict(# Testing config of FastRCNN
rcnn=dict(# Dict for rcnn testing config
action_thr=0.002)))# The threshold of an action
# dataset settings
dataset_type='AVADataset'# Type of dataset for training, validation and testing
data_root='data/ava/rawframes'# Root path to data
anno_root='data/ava/annotations'# Root path to annotations
ann_file_train=f'{anno_root}/ava_train_v2.1.csv'# Path to the annotation file for training
ann_file_val=f'{anno_root}/ava_val_v2.1.csv'# Path to the annotation file for validation
exclude_file_train=f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv'# Path to the exclude annotation file for training
exclude_file_val=f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv'# Path to the exclude annotation file for validation
label_file=f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt'# Path to the label file
proposal_file_train=f'{anno_root}/ava_dense_proposals_train.FAIR.recall_93.9.pkl'# Path to the human detection proposals for training examples
proposal_file_val=f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl'# Path to the human detection proposals for validation examples
img_norm_cfg=dict(# Config of image normalization used in data pipeline
mean=[123.675,116.28,103.53],# Mean values of different channels to normalize
std=[58.395,57.12,57.375],# Std values of different channels to normalize
to_bgr=False)# Whether to convert channels from RGB to BGR
train_pipeline=[# List of training pipeline steps
dict(# Config of SampleFrames
type='AVASampleFrames',# Sample frames pipeline, sampling frames from video
clip_len=4,# Frames of each sampled output clip
frame_interval=16),# Temporal interval of adjacent sampled frames
dict(# Config of RawFrameDecode
type='RawFrameDecode'),# Load and decode Frames pipeline, picking raw frames with given indices
dict(# Config of RandomRescale
type='RandomRescale',# Randomly rescale the short edge within a given range
scale_range=(256,320)),# The short-edge size range of RandomRescale
dict(# Config of RandomCrop
type='RandomCrop',# Randomly crop a patch with the given size
size=256),# The size of the cropped patch
dict(# Config of Flip
type='Flip',# Flip Pipeline
flip_ratio=0.5),# Probability of implementing flip
dict(# Config of Normalize
type='Normalize',# Normalize pipeline
**img_norm_cfg),# Config of image normalization
dict(# Config of FormatShape
type='FormatShape',# Format shape pipeline, Format final image shape to the given input_format
input_format='NCTHW',# Final image shape format
collapse=True),# Collapse the dim N if N == 1
dict(# Config of Rename
type='Rename',# Rename keys
mapping=dict(imgs='img')),# The old name to new name mapping
dict(# Config of ToTensor
type='ToTensor',# Convert other types to tensor type pipeline
keys=['img','proposals','gt_bboxes','gt_labels']),# Keys to be converted from image to tensor
dict(# Config of ToDataContainer
type='ToDataContainer',# Convert other types to DataContainer type pipeline
fields=[# Fields to convert to DataContainer
dict(# Dict of fields
key=['proposals','gt_bboxes','gt_labels'],# Keys to Convert to DataContainer
stack=False)]),# Whether to stack these tensor
dict(# Config of Collect
type='Collect',# Collect pipeline that decides which keys in the data should be passed to the detector
keys=['img','proposals','gt_bboxes','gt_labels'],# Keys of input
meta_keys=['scores','entity_ids']),# Meta keys of input
]
val_pipeline=[# List of validation pipeline steps
dict(# Config of SampleFrames
type='AVASampleFrames',# Sample frames pipeline, sampling frames from video
clip_len=4,# Frames of each sampled output clip
frame_interval=16),# Temporal interval of adjacent sampled frames
dict(# Config of RawFrameDecode
type='RawFrameDecode'),# Load and decode Frames pipeline, picking raw frames with given indices
dict(# Config of Resize
type='Resize',# Resize pipeline
scale=(-1,256)),# The scale to resize images
dict(# Config of Normalize
type='Normalize',# Normalize pipeline
**img_norm_cfg),# Config of image normalization
dict(# Config of FormatShape
type='FormatShape',# Format shape pipeline, Format final image shape to the given input_format
input_format='NCTHW',# Final image shape format
collapse=True),# Collapse the dim N if N == 1
dict(# Config of Rename
type='Rename',# Rename keys
mapping=dict(imgs='img')),# The old name to new name mapping
dict(# Config of ToTensor
type='ToTensor',# Convert other types to tensor type pipeline
keys=['img','proposals']),# Keys to be converted from image to tensor
dict(# Config of ToDataContainer
type='ToDataContainer',# Convert other types to DataContainer type pipeline
fields=[# Fields to convert to DataContainer
dict(# Dict of fields
key=['proposals'],# Keys to Convert to DataContainer
stack=False)]),# Whether to stack these tensor
dict(# Config of Collect
type='Collect',# Collect pipeline that decides which keys in the data should be passed to the detector
keys=['img','proposals'],# Keys of input
meta_keys=['scores','entity_ids'],# Meta keys of input
nested=True)# Whether to wrap the data in a nested list
]
data=dict(# Config of data
videos_per_gpu=16,# Batch size of each single GPU
workers_per_gpu=2,# Workers to pre-fetch data for each single GPU
val_dataloader=dict(# Additional config of validation dataloader
videos_per_gpu=1),# Batch size of each single GPU during evaluation
train=dict(# Training dataset config
type=dataset_type,
ann_file=ann_file_train,
exclude_file=exclude_file_train,
pipeline=train_pipeline,
label_file=label_file,
proposal_file=proposal_file_train,
person_det_score_thr=0.9,
data_prefix=data_root),
val=dict(# Validation dataset config
type=dataset_type,
ann_file=ann_file_val,
exclude_file=exclude_file_val,
pipeline=val_pipeline,
label_file=label_file,
proposal_file=proposal_file_val,
person_det_score_thr=0.9,
data_prefix=data_root))
data['test']=data['val']# Set test_dataset as val_dataset
# optimizer
optimizer=dict(
# Config used to build optimizer, support (1). All the optimizers in PyTorch
# whose arguments are also the same as those in PyTorch. (2). Custom optimizers
# which are built on `constructor`, referring to "tutorials/5_new_modules.md"
# for implementation.
type='SGD',# Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
lr=0.2,# Learning rate, see detail usages of the parameters in the documentation of PyTorch (for 8gpu)
momentum=0.9,# Momentum,
weight_decay=0.00001)# Weight decay of SGD
optimizer_config=dict(# Config used to build the optimizer hook
grad_clip=dict(max_norm=40,norm_type=2))# Use gradient clip
lr_config=dict(# Learning rate scheduler config used to register LrUpdater hook
policy='step',# Policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9
step=[40,80],# Steps to decay the learning rate
warmup='linear',# Warmup strategy
warmup_by_epoch=True,# Warmup_iters indicates iter num or epoch num
warmup_iters=5,# Number of iters or epochs for warmup
warmup_ratio=0.1)# The initial learning rate is warmup_ratio * lr
total_epochs=20# Total epochs to train the model
checkpoint_config=dict(# Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation
interval=1)# Interval to save checkpoint
workflow=[('train',1)]# Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once
evaluation=dict(# Config of evaluation during training
interval=1,save_best='mAP@0.5IOU')# Interval to perform evaluation and the key for saving best checkpoint
log_config=dict(# Config to register logger hook
interval=20,# Interval to print the log
hooks=[# Hooks to be implemented during training
dict(type='TextLoggerHook'),# The logger used to record the training process
])
# runtime settings
dist_params=dict(backend='nccl')# Parameters to setup distributed training, the port can also be set
log_level='INFO'# The level of logging
work_dir = './work_dirs/ava/'  # Directory to save the model checkpoints and logs for the current experiments
load_from = ('https://download.openmmlab.com/mmaction/recognition/slowonly/'  # load models as a pre-trained model from a given path. This will not resume training
             '...')
```
To use a pre-trained model for the whole network, the new config adds the link of the pre-trained model in `load_from`.
We set `load_from=None` as default in `configs/_base_/default_runtime.py` and owing to [inheritance design](/docs/tutorials/1_config.md), users can directly change it by setting `load_from` in their configs.
```python
# use the pre-trained model for the whole TSN network
load_from = 'https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/mmaction-v1/recognition/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'  # model path can be found in model zoo
```
In this tutorial, we will introduce some methods about how to customize your own datasets for the project by reorganizing data and mixing datasets.
<!-- TOC -->
- [Customize Datasets by Reorganizing Data](#customize-datasets-by-reorganizing-data)
- [Reorganize datasets to existing format](#reorganize-datasets-to-existing-format)
- [An example of a custom dataset](#an-example-of-a-custom-dataset)
- [Customize Dataset by Mixing Dataset](#customize-dataset-by-mixing-dataset)
- [Repeat dataset](#repeat-dataset)
<!-- TOC -->
## Customize Datasets by Reorganizing Data
### Reorganize datasets to existing format
The simplest way is to convert your dataset to existing dataset formats (RawframeDataset or VideoDataset).
There are three kinds of annotation files.
- rawframe annotation
The annotation of a rawframe dataset is a text file with multiple lines,
where each line indicates the `frame_directory` (relative path) of a video,
the `total_frames` of the video and the `label` of the video, separated by whitespace.
Here is an example.
```
some/directory-1 163 1
some/directory-2 122 1
some/directory-3 258 2
some/directory-4 234 2
some/directory-5 295 3
some/directory-6 121 3
```
- video annotation
The annotation of a video dataset is a text file with multiple lines,
where each line indicates a sample video with its `filepath` (relative path) and `label`,
separated by whitespace.
Here is an example.
```
some/path/000.mp4 1
some/path/001.mp4 1
some/path/002.mp4 2
some/path/003.mp4 2
some/path/004.mp4 3
some/path/005.mp4 3
```
- ActivityNet annotation
The annotation of ActivityNet dataset is a json file. Each key is a video name
and the corresponding value is the meta data and annotation for the video.
Here is an example.
```
{
"video1": {
"duration_second": 211.53,
"duration_frame": 6337,
"annotations": [
{
"segment": [
30.025882995319815,
205.2318595943838
],
"label": "Rock climbing"
}
],
"feature_frame": 6336,
"fps": 30.0,
"rfps": 29.9579255898
},
"video2": {
"duration_second": 26.75,
"duration_frame": 647,
"annotations": [
{
"segment": [
2.578755070202808,
24.914101404056165
],
"label": "Drinking beer"
}
],
"feature_frame": 624,
"fps": 24.0,
"rfps": 24.1869158879
}
}
```
There are two ways to work with custom datasets.
- online conversion
You can write a new Dataset class inherited from [BaseDataset](/mmaction/datasets/base.py), and overwrite three methods
`load_annotations(self)`, `evaluate(self, results, metrics, logger)` and `dump_results(self, results, out)`,
like [RawframeDataset](/mmaction/datasets/rawframe_dataset.py), [VideoDataset](/mmaction/datasets/video_dataset.py) or [ActivityNetDataset](/mmaction/datasets/activitynet_dataset.py).
- offline conversion
You can convert the annotation format to the expected format above and save it to
a pickle or json file, then you can simply use `RawframeDataset`, `VideoDataset` or `ActivityNetDataset`.
After the data pre-processing, the users need to further modify the config files to use the dataset.
Here is an example of using a custom dataset in rawframe format.
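A minimal sketch is given below; the paths and file names are placeholders, and `train_pipeline`/`val_pipeline` are assumed to be defined as in the recognition example above.

```python
# dataset settings for a custom dataset organized in rawframe format
dataset_type = 'RawframeDataset'
data_root = 'data/custom/rawframes_train/'  # hypothetical path to training frames
data_root_val = 'data/custom/rawframes_val/'  # hypothetical path to validation frames
ann_file_train = 'data/custom/custom_train_list.txt'  # hypothetical rawframe annotation file
ann_file_val = 'data/custom/custom_val_list.txt'

data = dict(
    videos_per_gpu=32,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline))
```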
In this tutorial, we will introduce some methods about the design of data pipelines, and how to customize and extend your own data pipelines for the project.
<!-- TOC -->
- [Tutorial 4: Customize Data Pipelines](#tutorial-4-customize-data-pipelines)
- [Design of Data Pipelines](#design-of-data-pipelines)
- [Data loading](#data-loading)
- [Pre-processing](#pre-processing)
- [Formatting](#formatting)
- [Extend and Use Custom Pipelines](#extend-and-use-custom-pipelines)
<!-- TOC -->
## Design of Data Pipelines
Following typical conventions, we use `Dataset` and `DataLoader` for data loading
with multiple workers. `Dataset` returns a dict of data items corresponding to
the arguments of the model's forward method.
Since the data in action recognition & localization may not be the same size (image size, gt bbox size, etc.),
the `DataContainer` in MMCV is used to help collect and distribute data of different sizes.
See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.
The data preparation pipeline and the dataset are decomposed. Usually a dataset
defines how to process the annotations and a data pipeline defines all the steps to prepare a data dict.
A pipeline consists of a sequence of operations. Each operation takes a dict as input and also outputs a dict for the next operation.
We present a typical pipeline in the following figure. The blue blocks are pipeline operations.
As the pipeline proceeds, each operator can add new keys (marked as green) to the result dict or update the existing keys (marked as orange).
We have supported some lazy operators and encourage users to apply them.
Lazy ops record how the data should be processed, but postpone the processing of the raw data until the data reaches the `Fuse` stage.
Specifically, lazy ops avoid frequent reading and modification operations on the raw data, and instead process the raw data once in the final `Fuse` stage, thus accelerating data preprocessing.
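As a minimal sketch of what a single pipeline operation looks like, assuming `PIPELINES` can be imported from `mmaction.datasets` (the `MyTransform` name and the key it writes are purely illustrative):

```python
from mmaction.datasets import PIPELINES


@PIPELINES.register_module()
class MyTransform:
    """A hypothetical pipeline step: it receives a result dict and returns a result dict."""

    def __call__(self, results):
        # read existing keys, add or update keys, then hand the dict to the next operation
        results['my_key'] = 'my_value'
        return results
```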
In this tutorial, we will introduce some methods about how to customize optimizers, develop new components and add a new learning rate scheduler for this project.
The users can directly set arguments following the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch.
## Customize Optimizer Constructor
Some models may have some parameter-specific settings for optimization, e.g. weight decay for BatchNorm layers.
The users can do those fine-grained parameter tuning through customizing optimizer constructor.
You can write a new optimizer constructor inheriting from [DefaultOptimizerConstructor](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py)
and overwrite the `add_params(self, params, module)` method.
An example of a customized optimizer constructor is [TSMOptimizerConstructor](/mmaction/core/optimizer/tsm_optimizer_constructor.py).
More generally, a customized optimizer constructor could be defined as follows.
In `mmaction/core/optimizer/my_optimizer_constructor.py`:
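A minimal sketch of such a constructor; the BatchNorm handling below is an illustrative choice, not the actual `TSMOptimizerConstructor` logic.

```python
import torch.nn as nn
from mmcv.runner import OPTIMIZER_BUILDERS, DefaultOptimizerConstructor


@OPTIMIZER_BUILDERS.register_module()
class MyOptimizerConstructor(DefaultOptimizerConstructor):
    """A hypothetical constructor that disables weight decay for BN parameters."""

    def add_params(self, params, module):
        for name, m in module.named_modules():
            is_bn = isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d))
            for param in m.parameters(recurse=False):
                if not param.requires_grad:
                    continue
                param_group = {'params': [param]}
                if is_bn:
                    # illustrative choice: no weight decay on BN parameters
                    param_group['weight_decay'] = 0.
                params.append(param_group)
```

With this file imported in `mmaction/core/optimizer/__init__.py`, the constructor can be selected in the config, e.g. `optimizer = dict(type='SGD', constructor='MyOptimizerConstructor', lr=0.01, momentum=0.9)`.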
Then the users need to add it in the `mmaction/models/losses/__init__.py`
```python
from .my_loss import MyLoss, my_loss
```
To use it, modify the `loss_xxx` field. Since MyLoss is for regression, we can use it for the bbox loss `loss_bbox`.
```python
loss_bbox = dict(type='MyLoss')
```
## Add new learning rate scheduler (updater)
The default manner of constructing an lr updater (called a 'scheduler' by PyTorch convention) is to modify the config, such as:
```python
...
lr_config=dict(policy='step',step=[20,40])
...
```
In the API [`train.py`](/mmaction/apis/train.py), the learning rate updater hook is registered based on the config at:
```python
...
runner.register_training_hooks(
cfg.lr_config,
optimizer_config,
cfg.checkpoint_config,
cfg.log_config,
cfg.get('momentum_config',None))
...
```
So far, the supported updaters can be found in [mmcv](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py), but if you want to customize a new learning rate updater, you may follow the steps below:
1. First, write your own LrUpdaterHook in `$MMAction2/mmaction/core/scheduler`. The following snippet is an example of a customized lr updater that uses a list of learning rates `lrs`, switching to the corresponding learning rate at each step in `steps`:
```python
from mmcv.runner import HOOKS, LrUpdaterHook


@HOOKS.register_module()
# Register it here
class RelativeStepLrUpdaterHook(LrUpdaterHook):
    # You should inherit it from mmcv.LrUpdaterHook

    def __init__(self, steps, lrs, **kwargs):
        super().__init__(**kwargs)
        assert len(steps) == len(lrs)
        self.steps = steps
        self.lrs = lrs

    def get_lr(self, runner, base_lr):
        # Only this function is required to override.
        # It is called before each training epoch (or iteration) and returns the specific learning rate.
        progress = runner.epoch if self.by_epoch else runner.iter
        for i in range(len(self.steps)):
            if progress < self.steps[i]:
                return self.lrs[i]
```
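Assuming the hook above is importable, it can then be selected through `lr_config`; `policy='RelativeStep'` maps to `RelativeStepLrUpdaterHook` following mmcv's naming convention, and the numbers below are placeholders.

```python
lr_config = dict(policy='RelativeStep', steps=[20, 40, 60], lrs=[0.1, 0.01, 0.001])
```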
Open Neural Network Exchange [(ONNX)](https://onnx.ai/) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves.
<!-- TOC -->
- [Supported Models](#supported-models)
- [Usage](#usage)
- [Prerequisite](#prerequisite)
- [Recognizers](#recognizers)
- [Localizers](#localizers)
<!-- TOC -->
## Supported Models
So far, our codebase supports ONNX exporting from PyTorch models trained with MMAction2. The supported models are:
- I3D
- TSN
- TIN
- TSM
- R(2+1)D
- SLOWFAST
- SLOWONLY
- BMN
- BSN (TEM, PEM)
## Usage
For simple exporting, you can use the [script](/tools/deployment/pytorch2onnx.py) here. Note that the packages `onnx` and `onnxruntime` are required for verification after exporting.
### Prerequisite
First, install `onnx` and `onnxruntime`.
```shell
pip install onnx onnxruntime
```
We provide a Python script to export the PyTorch model trained by MMAction2 to ONNX.
- `--shape`: The shape of the input tensor to the model. For 2D recognizers (e.g. TSN), the input should be `$batch $clip $channel $height $width` (e.g. `1 1 3 224 224`); for 3D recognizers (e.g. I3D), the input should be `$batch $clip $channel $time $height $width` (e.g. `1 1 3 32 224 224`); for localizers such as BSN, the input of each module is different, please check the `forward` function of the module. If not specified, it will be set to `1 1 3 224 224`.
- `--verify`: Determines whether to verify the exported model, i.e. check that it runs and that its outputs match the PyTorch model numerically. If not specified, it will be set to `False`.
- `--show`: Determines whether to print the architecture of the exported model. If not specified, it will be set to `False`.
- `--output-file`: The output ONNX model name. If not specified, it will be set to `tmp.onnx`.
- `--is-localizer`: Determines whether the model to be exported is a localizer. If not specified, it will be set to `False`.
- `--opset-version`: Determines the operator set version of ONNX; we recommend a higher version such as 11 for compatibility. If not specified, it will be set to `11`.
- `--softmax`: Determines whether to add a softmax layer at the end of recognizers. If not specified, it will be set to `False`. For now, localizers are not supported.
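Putting these options together, a typical invocation might look like the following; the config and checkpoint paths are placeholders.

```shell
python tools/deployment/pytorch2onnx.py \
    configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py \
    checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb.pth \
    --shape 1 1 3 224 224 \
    --output-file tsn.onnx \
    --verify
```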
In this tutorial, we will introduce some methods about how to customize optimization methods, training schedules, workflow and hooks when running your own settings for the project.
To modify the learning rate of the model, the users only need to modify the `lr` in the config of optimizer.
The users can directly set arguments following the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch.
For example, if you want to use `Adam` with the setting like `torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)` in PyTorch, the `optimizer` field can be set accordingly, as in the sketch below.
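A minimal sketch of that modification, simply mirroring the PyTorch arguments quoted above:

```python
optimizer = dict(
    type='Adam', lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
```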
The module `mmaction.core.optimizer.my_optimizer` will be imported at the beginning of the program and the class `MyOptimizer` is then automatically registered.
Note that only the package containing the class `MyOptimizer` should be imported. `mmaction.core.optimizer.my_optimizer.MyOptimizer` **cannot** be imported directly.
#### 3. Specify the optimizer in the config file
Then you can use `MyOptimizer` in `optimizer` field of config files.
In the configs, the optimizers are defined by the field `optimizer` like the following:
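For a custom optimizer, the field would look like the sketch below, where `MyOptimizer` and its arguments `a`, `b`, `c` are hypothetical placeholders:

```python
optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)
```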
The default optimizer constructor is implemented [here](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/optimizer/default_constructor.py#L11),
which could also serve as a template for new optimizer constructor.
### Additional settings
Tricks not implemented by the optimizer should be implemented through optimizer constructor (e.g., set parameter-wise learning rates) or hooks.
We list some common settings that could stabilize or accelerate training. Feel free to create a PR or an issue for more settings.
- __Use gradient clip to stabilize training__:
Some models need gradient clip to clip the gradients to stabilize the training process. An example is as below:
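A sketch of such a setting, reusing the `grad_clip` values that appear in the configs earlier in this document:

```python
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
```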
- __Use momentum schedule to accelerate model convergence__:
We support momentum scheduler to modify model's momentum according to learning rate, which could make the model converge in a faster way.
Momentum scheduler is usually used with LR scheduler, for example, the following config is used in 3D detection to accelerate convergence.
For more details, please refer to the implementation of [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L327)
and [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/momentum_updater.py#L130).
```python
lr_config=dict(
policy='cyclic',
target_ratio=(10,1e-4),
cyclic_times=1,
step_ratio_up=0.4,
)
momentum_config=dict(
policy='cyclic',
target_ratio=(0.85/0.95,1),
cyclic_times=1,
step_ratio_up=0.4,
)
```
## Customize Training Schedules
By default, we use the step learning rate schedule in config files; this calls [`StepLRHook`](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L153) in MMCV.
We support many other learning rate schedules [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py), such as the `CosineAnnealing` and `Poly` schedules. Here are some examples:
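Two sketches of such schedules; the warmup values are illustrative:

```python
# Poly schedule
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)

# CosineAnnealing schedule
lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=1.0 / 10,
    min_lr_ratio=1e-5)
```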
By default, we recommend users to use `EvalHook` to do evaluation after each training epoch, but they can still use the `val` workflow as an alternative.
Workflow is a list of (phase, epochs) to specify the running order and epochs. By default it is set to be
```python
workflow=[('train',1)]
```
which means running 1 epoch for training.
Sometimes users may want to check some metrics (e.g. loss, accuracy) of the model on the validation set.
In such case, we can set the workflow as
```python
[('train',1),('val',1)]
```
so that 1 epoch for training and 1 epoch for validation will be run iteratively.
:::{note}
1. The parameters of the model will not be updated during the val epoch.
2. Keyword `total_epochs` in the config only controls the number of training epochs and will not affect the validation workflow.
3. Workflows `[('train', 1), ('val', 1)]` and `[('train', 1)]` will not change the behavior of `EvalHook` because `EvalHook` is called by `after_train_epoch` and validation workflow only affect hooks that are called through `after_val_epoch`.
Therefore, the only difference between `[('train', 1), ('val', 1)]` and `[('train', 1)]` is that the runner will calculate losses on validation set after each training epoch.
:::
## Customize Hooks
### Customize self-implemented hooks
#### 1. Implement a new hook
Here we give an example of creating a new hook in MMAction2 and using it in training.
```python
from mmcv.runner import HOOKS, Hook


@HOOKS.register_module()
class MyHook(Hook):

    def __init__(self, a, b):
        pass

    def before_run(self, runner):
        pass

    def after_run(self, runner):
        pass

    def before_epoch(self, runner):
        pass

    def after_epoch(self, runner):
        pass

    def before_iter(self, runner):
        pass

    def after_iter(self, runner):
        pass
```
Depending on the functionality of the hook, the users need to specify what the hook will do at each stage of the training in `before_run`, `after_run`, `before_epoch`, `after_epoch`, `before_iter`, and `after_iter`.
#### 2. Register the new hook
Then we need to make `MyHook` imported. Assuming the file is in `mmaction/core/utils/my_hook.py`, there are two ways to do that:
- Modify `mmaction/core/utils/__init__.py` to import it.
The newly defined module should be imported in `mmaction/core/utils/__init__.py` so that the registry will
find the new module and add it:
```python
from .my_hook import MyHook
```
- Use `custom_imports` in the config to manually import it
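A sketch of that alternative, assuming the hook lives in `mmaction/core/utils/my_hook.py` as above:

```python
custom_imports = dict(imports=['mmaction.core.utils.my_hook'], allow_failed_imports=False)
```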
There are some common hooks that are not registered through `custom_hooks` but have been registered by default when importing MMCV. They are:
- log_config
- checkpoint_config
- evaluation
- lr_config
- optimizer_config
- momentum_config
In those hooks, only the logger hook has `VERY_LOW` priority; the priority of the others is `NORMAL`.
The above-mentioned tutorials already cover how to modify `optimizer_config`, `momentum_config`, and `lr_config`.
Here we describe what we can do with `log_config`, `checkpoint_config`, and `evaluation`.
#### Checkpoint config
The MMCV runner will use `checkpoint_config` to initialize [`CheckpointHook`](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/hooks/checkpoint.py#L9).
```python
checkpoint_config=dict(interval=1)
```
The users could set `max_keep_ckpts` to save only a small number of checkpoints, or decide whether to store the state dict of the optimizer by `save_optimizer`.
More details of the arguments are [here](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.CheckpointHook)
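For instance, the sketch below (the values are illustrative) keeps only the three latest checkpoints and skips storing the optimizer state:

```python
checkpoint_config = dict(interval=1, max_keep_ckpts=3, save_optimizer=False)
```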
#### Log config
The `log_config` wraps multiple logger hooks and enables setting intervals. Now MMCV supports `WandbLoggerHook`, `MlflowLoggerHook`, and `TensorboardLoggerHook`.
The detail usages can be found in the [doc](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.LoggerHook).
```python
log_config=dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook')
])
```
#### Evaluation config
The config of `evaluation` will be used to initialize the [`EvalHook`](https://github.com/open-mmlab/mmaction2/blob/master/mmaction/core/evaluation/eval_hooks.py#L12).
Except for the key `interval`, other arguments such as `metrics` will be passed to `dataset.evaluate()`.
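For example, mirroring the recognition config earlier in this document:

```python
evaluation = dict(interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'])
```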
```
-----Analyze train time of work_dirs/some_exp/20200422_153324.log.json-----
slowest epoch 60, average time is 0.9736
fastest epoch 18, average time is 0.9001
time std over epochs is 0.0177
average iter time: 0.9330 s/iter
```
## Model Complexity
`/tools/analysis/get_flops.py` is a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch) to compute the FLOPs and params of a given model.
:::{note}
This tool is still experimental and we do not guarantee that the number is absolutely correct.
You may use the result for simple comparisons, but double check it before you adopt it in technical reports or papers.
(1) FLOPs are related to the input shape while parameters are not. The default input shape is (1, 3, 340, 256) for 2D recognizers and (1, 3, 32, 340, 256) for 3D recognizers.
(2) Some operators are not counted into FLOPs, like GN and custom operators. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py) for details.
:::
## Model Conversion
### MMAction2 model to ONNX (experimental)
`/tools/deployment/pytorch2onnx.py` is a script to convert model to [ONNX](https://github.com/onnx/onnx) format.
It also supports comparing the output results between Pytorch and ONNX model for verification.
Run `pip install onnx onnxruntime` first to install the dependency.
Please note that a softmax layer could be added for recognizers by `--softmax` option, in order to get predictions in range `[0, 1]`.
Check the official docs for [running TorchServe with docker](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment).
**Note**: ${MODEL_STORE} needs to be an absolute path.
[Read the docs](https://github.com/pytorch/serve/blob/072f5d088cce9bb64b2a18af065886c9b01b317b/docs/rest_api.md) about the Inference (8080), Management (8081) and Metrics (8082) APIs
`tools/analysis/check_videos.py` uses specified video encoder to iterate all samples that are specified by the input configuration file, looks for invalid videos (corrupted or missing), and saves the corresponding file path to the output file. Please note that after deleting invalid videos, users need to regenerate the video file list.
- MMAction: commit id [7f3490d](https://github.com/open-mmlab/mmaction/tree/7f3490d3db6a67fe7b87bfef238b757403b670e3)(1/5/2020)
- Temporal-Shift-Module: commit id [8d53d6f](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd)(5/5/2020)
- PySlowFast: commit id [8299c98](https://github.com/facebookresearch/SlowFast/tree/8299c9862f83a067fa7114ce98120ae1568a83ec)(7/7/2020)
- BSN(boundary sensitive network): commit id [f13707f](https://github.com/wzmsltw/BSN-boundary-sensitive-network/tree/f13707fbc362486e93178c39f9c4d398afe2cb2f)(12/12/2018)
- BMN(boundary matching network): commit id [45d0514](https://github.com/JJBOY/BMN-Boundary-Matching-Network/tree/45d05146822b85ca672b65f3d030509583d0135a)(17/10/2019)
| Model | Input | IO Backend | Batch Size x GPUs | MMAction2 (s/iter) | GPU mem (GB) | MMAction (s/iter) | GPU mem (GB) | Temporal-Shift-Module (s/iter) | GPU mem (GB) | PySlowFast (s/iter) | GPU mem (GB) |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 256p videos | Disk | 32x8 | **[1.42](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsn_256p_videos_disk_32x8.zip)** | 8.1 | x | x | x | x | TODO | TODO |
| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 32x8 | **[0.61](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsn_256p_fast_videos_disk_32x8.zip)** | 8.1 | x | x | x | x | TODO | TODO |
| [I3D heavy](/configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.34](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_heavy_256p_videos_disk_8x8.zip)** | 4.6 | x | x | x | x | [0.44](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_i3d_r50_8x8_video.log) | 4.6 |
| [I3D heavy](/configs/recognition/i3d/i3d_r50_video_heavy_8x8x1_100e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.35](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_heavy_256p_fast_videos_disk_8x8.zip)** | 4.6 | x | x | x | x | [0.36](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_i3d_r50_8x8_fast_video.log) | 4.6 |
| [I3D](/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py) | 256p rawframes | Memcached | 8x8 | **[0.43](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/i3d_256p_rawframes_memcahed_8x8.zip)** | 5.0 | [0.56](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction/i3d_256p_rawframes_memcached_8x8.zip) | 5.0 | x | x | x | x |
| [TSM](/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py) | 256p rawframes | Memcached | 8x8 | **[0.31](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/tsm_256p_rawframes_memcahed_8x8.zip)** | 6.9 | x | x | [0.41](https://download.openmmlab.com/mmaction/benchmark/recognition/temporal_shift_module/tsm_256p_rawframes_memcached_8x8.zip) | 9.1 | x | x |
| [Slowonly](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.32](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowonly_256p_videos_disk_8x8.zip)** | 3.1 | TODO | TODO | x | x | [0.34](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowonly_r50_4x16_video.log) | 3.4 |
| [Slowonly](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.25](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowonly_256p_fast_videos_disk_8x8.zip)** | 3.1 | TODO | TODO | x | x | [0.28](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowonly_r50_4x16_fast_video.log) | 3.4 |
| [Slowfast](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.69](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowfast_256p_videos_disk_8x8.zip)** | 6.1 | x | x | x | x | [1.04](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowfast_r50_4x16_video.log) | 7.0 |
| [Slowfast](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.68](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/slowfast_256p_fast_videos_disk_8x8.zip)** | 6.1 | x | x | x | x | [0.96](https://download.openmmlab.com/mmaction/benchmark/recognition/pyslowfast/pysf_slowfast_r50_4x16_fast_video.log) | 7.0 |
| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | 256p videos | Disk | 8x8 | **[0.45](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/r2plus1d_256p_videos_disk_8x8.zip)** | 5.1 | x | x | x | x | x | x |
| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | 256p dense-encoded video | Disk | 8x8 | **[0.44](https://download.openmmlab.com/mmaction/benchmark/recognition/mmaction2/r2plus1d_256p_fast_videos_disk_8x8.zip)** | 5.1 | x | x | x | x | x | x |