# Training
Run the train.sh script to start training.
The run_pretraining.sh script is used by FlagPerf.
# Original README.md
<div align="center">
<img src="http://10.0.53.25:9090/cmcc-ailab/tsm/raw/master/resources/mmaction2_logo.png" width="600"/>
<div>&nbsp;</div>
<div align="center">
<b><font size="5">OpenMMLab website</font></b>
<sup>
<a href="https://openmmlab.com">
<i><font size="4">HOT</font></i>
</a>
</sup>
&nbsp;&nbsp;&nbsp;&nbsp;
<b><font size="5">OpenMMLab platform</font></b>
<sup>
<a href="https://platform.openmmlab.com">
<i><font size="4">TRY IT OUT</font></i>
</a>
</sup>
</div>
[![Documentation](https://readthedocs.org/projects/mmaction2/badge/?version=latest)](https://mmaction2.readthedocs.io/en/latest/)
[![actions](https://github.com/open-mmlab/mmaction2/workflows/build/badge.svg)](https://github.com/open-mmlab/mmaction2/actions)
[![codecov](https://codecov.io/gh/open-mmlab/mmaction2/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmaction2)
[![PyPI](https://img.shields.io/pypi/v/mmaction2)](https://pypi.org/project/mmaction2/)
[![LICENSE](https://img.shields.io/github/license/open-mmlab/mmaction2.svg)](https://github.com/open-mmlab/mmaction2/blob/master/LICENSE)
[![Average time to resolve an issue](https://isitmaintained.com/badge/resolution/open-mmlab/mmaction2.svg)](https://github.com/open-mmlab/mmaction2/issues)
[![Percentage of issues still open](https://isitmaintained.com/badge/open/open-mmlab/mmaction2.svg)](https://github.com/open-mmlab/mmaction2/issues)
[📘Documentation](https://mmaction2.readthedocs.io/en/latest/) |
[🛠️Installation](https://mmaction2.readthedocs.io/en/latest/install.html) |
[👀Model Zoo](https://mmaction2.readthedocs.io/en/latest/modelzoo.html) |
[🆕Update News](https://mmaction2.readthedocs.io/en/latest/changelog.html) |
[🚀Ongoing Projects](https://github.com/open-mmlab/mmaction2/projects) |
[🤔Reporting Issues](https://github.com/open-mmlab/mmaction2/issues/new/choose)
</div>
English | [简体中文](/README_zh-CN.md)
## Introduction
MMAction2 is an open-source toolbox for video understanding based on PyTorch.
It is a part of the [OpenMMLab](https://openmmlab.com/) project.
The master branch works with **PyTorch 1.5+**.
<div align="center">
<div style="float:left;margin-right:10px;">
<img src="http://10.0.53.25:9090/cmcc-ailab/tsm/raw/master/resources/mmaction2_overview.gif" width="380px"><br>
<p style="font-size:1.5vw;">Action Recognition Results on Kinetics-400</p>
</div>
<div style="float:right;margin-right:0px;">
<img src="https://user-images.githubusercontent.com/34324155/123989146-2ecae680-d9fb-11eb-916b-b9db5563a9e5.gif" width="380px"><br>
<p style="font-size:1.5vw;">Skeleton-based Action Recognition Results on NTU-RGB+D-120</p>
</div>
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/30782254/155710881-bb26863e-fcb4-458e-b0c4-33cd79f96901.gif" width="580px"/><br>
<p style="font-size:1.5vw;">Skeleton-based Spatio-Temporal Action Detection and Action Recognition Results on Kinetics-400</p>
</div>
<div align="center">
<img src="http://10.0.53.25:9090/cmcc-ailab/tsm/raw/master/resources/spatio-temporal-det.gif" width="800px"/><br>
<p style="font-size:1.5vw;">Spatio-Temporal Action Detection Results on AVA-2.1</p>
</div>
## Major Features
- **Modular design**: We decompose a video understanding framework into different components. One can easily construct a customized video understanding framework by combining different modules.
- **Support four major video understanding tasks**: MMAction2 implements various algorithms for multiple video understanding tasks, including action recognition, action localization, spatio-temporal action detection, and skeleton-based action recognition. We support **27** different algorithms and **20** different datasets for the four major tasks.
- **Well tested and documented**: We provide detailed documentation and API reference, as well as unit tests.
## What's New
### 🌟 Preview of 1.x version
A brand new version of **MMAction2 v1.0.0rc0** was released on 01/09/2022:
- Unified interfaces of all components based on [MMEngine](https://github.com/open-mmlab/mmengine).
- Faster training and testing speed with complete support of mixed precision training.
- More flexible [architecture](https://mmaction2.readthedocs.io/en/1.x).
Find more new features in [1.x branch](https://github.com/open-mmlab/mmaction2/tree/1.x). Issues and PRs are welcome!
### 💎 Stable version
- (2022-03-04) We support **Multigrid** on Kinetics-400, achieving 76.07% Top-1 accuracy and faster training.
- (2021-11-24) We support **2s-AGCN** on NTU60 XSub, achieving 86.06% Top-1 accuracy on the joint stream and 86.89% Top-1 accuracy on the bone stream.
- (2021-10-29) We provide a demo for skeleton-based and RGB-based spatio-temporal action detection and action recognition (demo/demo_video_structuralize.py).
- (2021-10-26) We train and test **ST-GCN** on NTU60 with 3D keypoint annotations, achieving 84.61% Top-1 accuracy (higher than the 81.5% reported in the [paper](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewPaper/17135)).
- (2021-10-25) We provide a script (tools/data/skeleton/gen_ntu_rgbd_raw.py) to convert the NTU60 and NTU120 3D raw skeleton data to our format.
- (2021-10-25) We provide a [guide](https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/posec3d/custom_dataset_training.md) on how to train PoseC3D with custom datasets; [bit-scientist](https://github.com/bit-scientist) authored this PR!
- (2021-10-16) We support **PoseC3D** on UCF101 and HMDB51, achieving 87.0% and 69.3% Top-1 accuracy with 2D skeletons only. Pre-extracted 2D skeletons are also available.
**Release**: v0.24.0 was released on 05/05/2022. Please refer to [changelog.md](docs/en/changelog.md) for details and release history.
## Installation
MMAction2 depends on [PyTorch](https://pytorch.org/), [MMCV](https://github.com/open-mmlab/mmcv), [MMDetection](https://github.com/open-mmlab/mmdetection) (optional), and [MMPose](https://github.com/open-mmlab/mmpose) (optional).
Below are quick steps for installation.
Please refer to [install.md](docs/en/install.md) for more detailed instructions.
```shell
conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y
conda activate open-mmlab
pip3 install openmim
mim install mmcv-full
mim install mmdet # optional
mim install mmpose # optional
git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
pip3 install -e .
```
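As a quick sanity check (a minimal sketch, assuming the commands above completed without error), confirm that the installed packages import and report their versions:
```python
# Minimal installation check; the versions printed depend on your environment.
import torch
import mmcv
import mmaction

print('PyTorch:', torch.__version__, '| CUDA available:', torch.cuda.is_available())
print('MMCV:', mmcv.__version__)
print('MMAction2:', mmaction.__version__)
```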
## Get Started
Please see [getting_started.md](docs/en/getting_started.md) for the basic usage of MMAction2.
There are also tutorials:
- [learn about configs](docs/en/tutorials/1_config.md)
- [finetuning models](docs/en/tutorials/2_finetune.md)
- [adding new dataset](docs/en/tutorials/3_new_dataset.md)
- [designing data pipeline](docs/en/tutorials/4_data_pipeline.md)
- [adding new modules](docs/en/tutorials/5_new_modules.md)
- [exporting model to onnx](docs/en/tutorials/6_export_model.md)
- [customizing runtime settings](docs/en/tutorials/7_customize_runtime.md)
A Colab tutorial is also provided. You may preview the notebook [here](demo/mmaction2_tutorial.ipynb) or directly [run](https://colab.research.google.com/github/open-mmlab/mmaction2/blob/master/demo/mmaction2_tutorial.ipynb) on Colab.
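Beyond the tutorials, a minimal inference sketch looks roughly as follows. The config name comes from this repository, while the checkpoint path is a placeholder and the exact `inference_recognizer` signature may differ slightly between MMAction2 versions; check the model zoo and API docs for your installed release.
```python
from mmaction.apis import init_recognizer, inference_recognizer

# Placeholder paths: pick a config from configs/ and download the matching
# checkpoint from the model zoo before running.
config_file = 'configs/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb.py'
checkpoint_file = 'checkpoints/tsm_r50_1x1x8_50e_sthv2_rgb.pth'

model = init_recognizer(config_file, checkpoint_file, device='cuda:0')
results = inference_recognizer(model, 'demo/demo.mp4')  # top-k (label, score) pairs
print(results)
```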
## Supported Methods
<table style="margin-left:auto;margin-right:auto;font-size:1.3vw;padding:3px 5px;text-align:center;vertical-align:center;">
<tr>
<td colspan="5" style="font-weight:bold;">Action Recognition</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/c3d/README.md">C3D</a> (CVPR'2014)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/tsn/README.md">TSN</a> (ECCV'2016)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/i3d/README.md">I3D</a> (CVPR'2017)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/i3d/README.md">I3D Non-Local</a> (CVPR'2018)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/r2plus1d/README.md">R(2+1)D</a> (CVPR'2018)</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/trn/README.md">TRN</a> (ECCV'2018)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/tsm/README.md">TSM</a> (ICCV'2019)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/tsm/README.md">TSM Non-Local</a> (ICCV'2019)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/slowonly/README.md">SlowOnly</a> (ICCV'2019)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/slowfast/README.md">SlowFast</a> (ICCV'2019)</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/csn/README.md">CSN</a> (ICCV'2019)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/tin/README.md">TIN</a> (AAAI'2020)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/tpn/README.md">TPN</a> (CVPR'2020)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/x3d/README.md">X3D</a> (CVPR'2020)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/omnisource/README.md">OmniSource</a> (ECCV'2020)</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition_audio/resnet/README.md">MultiModality: Audio</a> (ArXiv'2020)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/tanet/README.md">TANet</a> (ArXiv'2020)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/timesformer/README.md">TimeSformer</a> (ICML'2021)</td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="5" style="font-weight:bold;">Action Localization</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/localization/ssn/README.md">SSN</a> (ICCV'2017)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/localization/bsn/README.md">BSN</a> (ECCV'2018)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/localization/bmn/README.md">BMN</a> (ICCV'2019)</td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="5" style="font-weight:bold;">Spatio-Temporal Action Detection</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/detection/acrn/README.md">ACRN</a> (ECCV'2018)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/detection/ava/README.md">SlowOnly+Fast R-CNN</a> (ICCV'2019)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/detection/ava/README.md">SlowFast+Fast R-CNN</a> (ICCV'2019)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/detection/lfb/README.md">LFB</a> (CVPR'2019)</td>
<td></td>
</tr>
<tr>
<td colspan="5" style="font-weight:bold;">Skeleton-based Action Recognition</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/stgcn/README.md">ST-GCN</a> (AAAI'2018)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/2s-agcn/README.md">2s-AGCN</a> (CVPR'2019)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/posec3d/README.md">PoseC3D</a> (ArXiv'2021)</td>
<td></td>
<td></td>
</tr>
</table>
Results and models are available in the *README.md* of each method's config directory.
A summary can be found on the [**model zoo**](https://mmaction2.readthedocs.io/en/latest/recognition_models.html) page.
We will keep up with the latest progress of the community and support more popular algorithms and frameworks.
If you have any feature requests, please feel free to leave a comment in [Issues](https://github.com/open-mmlab/mmaction2/issues/19).
## Supported Datasets
<table style="margin-left:auto;margin-right:auto;font-size:1.3vw;padding:3px 5px;text-align:center;vertical-align:center;">
<tr>
<td colspan="4" style="font-weight:bold;">Action Recognition</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/hmdb51/README.md">HMDB51</a> (<a href="https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/">Homepage</a>) (ICCV'2011)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/ucf101/README.md">UCF101</a> (<a href="https://www.crcv.ucf.edu/research/data-sets/ucf101/">Homepage</a>) (CRCV-IR-12-01)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/activitynet/README.md">ActivityNet</a> (<a href="http://activity-net.org/">Homepage</a>) (CVPR'2015)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/kinetics/README.md">Kinetics-[400/600/700]</a> (<a href="https://deepmind.com/research/open-source/kinetics/">Homepage</a>) (CVPR'2017)</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/sthv1/README.md">SthV1</a> (ICCV'2017)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/sthv2/README.md">SthV2</a> (<a href="https://developer.qualcomm.com/software/ai-datasets/something-something">Homepage</a>) (ICCV'2017)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/diving48/README.md">Diving48</a> (<a href="http://www.svcl.ucsd.edu/projects/resound/dataset.html">Homepage</a>) (ECCV'2018)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/jester/README.md">Jester</a> (<a href="https://developer.qualcomm.com/software/ai-datasets/jester">Homepage</a>) (ICCV'2019)</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/mit/README.md">Moments in Time</a> (<a href="http://moments.csail.mit.edu/">Homepage</a>) (TPAMI'2019)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/mmit/README.md">Multi-Moments in Time</a> (<a href="http://moments.csail.mit.edu/challenge_iccv_2019.html">Homepage</a>) (ArXiv'2019)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/hvu/README.md">HVU</a> (<a href="https://github.com/holistic-video-understanding/HVU-Dataset">Homepage</a>) (ECCV'2020)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/omnisource/README.md">OmniSource</a> (<a href="https://kennymckormick.github.io/omnisource/">Homepage</a>) (ECCV'2020)</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/gym/README.md">FineGYM</a> (<a href="https://sdolivia.github.io/FineGym/">Homepage</a>) (CVPR'2020)</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="4" style="font-weight:bold;">Action Localization</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/thumos14/README.md">THUMOS14</a> (<a href="https://www.crcv.ucf.edu/THUMOS14/download.html">Homepage</a>) (THUMOS Challenge 2014)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/activitynet/README.md">ActivityNet</a> (<a href="http://activity-net.org/">Homepage</a>) (CVPR'2015)</td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="4" style="font-weight:bold;">Spatio-Temporal Action Detection</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/ucf101_24/README.md">UCF101-24*</a> (<a href="http://www.thumos.info/download.html">Homepage</a>) (CRCV-IR-12-01)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/jhmdb/README.md">JHMDB*</a> (<a href="http://jhmdb.is.tue.mpg.de/">Homepage</a>) (ICCV'2015)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/ava/README.md">AVA</a> (<a href="https://research.google.com/ava/index.html">Homepage</a>) (CVPR'2018)</td>
<td></td>
</tr>
<tr>
<td colspan="4" style="font-weight:bold;">Skeleton-based Action Recognition</td>
</tr>
<tr>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/skeleton/README.md">PoseC3D-FineGYM</a> (<a href="https://kennymckormick.github.io/posec3d/">Homepage</a>) (ArXiv'2021)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/skeleton/README.md">PoseC3D-NTURGB+D</a> (<a href="https://kennymckormick.github.io/posec3d/">Homepage</a>) (ArXiv'2021)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/skeleton/README.md">PoseC3D-UCF101</a> (<a href="https://kennymckormick.github.io/posec3d/">Homepage</a>) (ArXiv'2021)</td>
<td><a href="https://github.com/open-mmlab/mmaction2/blob/master/tools/data/skeleton/README.md">PoseC3D-HMDB51</a> (<a href="https://kennymckormick.github.io/posec3d/">Homepage</a>) (ArXiv'2021)</td>
</tr>
</table>
Datasets marked with * are not fully supported yet, but related dataset preparation steps are provided. A summary can be found on the [**Supported Datasets**](https://mmaction2.readthedocs.io/en/latest/supported_datasets.html) page.
## Benchmark
To demonstrate the efficacy and efficiency of our framework, we compare MMAction2 with some other popular frameworks and official releases in terms of speed. Details can be found in [benchmark](docs/en/benchmark.md).
## Data Preparation
Please refer to [data_preparation.md](docs/en/data_preparation.md) for a general knowledge of data preparation.
The supported datasets are listed in [supported_datasets.md](docs/en/supported_datasets.md).
## FAQ
Please refer to [FAQ](docs/en/faq.md) for frequently asked questions.
## Projects built on MMAction2
Currently, there are many research works and projects built on MMAction2 by users from the community, such as:
- Video Swin Transformer. [\[paper\]](https://arxiv.org/abs/2106.13230)[\[github\]](https://github.com/SwinTransformer/Video-Swin-Transformer)
- Evidential Deep Learning for Open Set Action Recognition, ICCV 2021 **Oral**. [\[paper\]](https://arxiv.org/abs/2107.10161)[\[github\]](https://github.com/Cogito2012/DEAR)
- Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective, ICCV 2021 **Oral**. [\[paper\]](https://arxiv.org/abs/2103.17263)[\[github\]](https://github.com/xvjiarui/VFS)
and more. Check [projects.md](docs/en/projects.md) to see all related projects.
## Contributing
We appreciate all contributions to improve MMAction2. Please refer to [CONTRIBUTING.md](https://github.com/open-mmlab/mmcv/blob/master/CONTRIBUTING.md) in MMCV for more details about the contributing guideline.
## Acknowledgement
MMAction2 is an open-source project that is contributed by researchers and engineers from various colleges and companies.
We appreciate all the contributors who implement their methods or add new features and users who give valuable feedback.
We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their own new models.
## Citation
If you find this project useful in your research, please consider citing:
```BibTeX
@misc{2020mmaction2,
  title={OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark},
  author={MMAction2 Contributors},
  howpublished={\url{https://github.com/open-mmlab/mmaction2}},
  year={2020}
}
```
## License
This project is released under the [Apache 2.0 license](LICENSE).
## Projects in OpenMMLab
- [MIM](https://github.com/open-mmlab/mim): MIM installs OpenMMLab packages.
- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab image classification toolbox and benchmark.
- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab detection toolbox and benchmark.
- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab's next-generation platform for general 3D object detection.
- [MMYOLO](https://github.com/open-mmlab/mmyolo): OpenMMLab YOLO series toolbox and benchmark.
- [MMRotate](https://github.com/open-mmlab/mmrotate): OpenMMLab rotated object detection toolbox and benchmark.
- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab semantic segmentation toolbox and benchmark.
- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab text detection, recognition, and understanding toolbox.
- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab pose estimation toolbox and benchmark.
- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 3D human parametric model toolbox and benchmark.
- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab self-supervised learning toolbox and benchmark.
- [MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab model compression toolbox and benchmark.
- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab fewshot learning toolbox and benchmark.
- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab's next-generation action understanding toolbox and benchmark.
- [MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab video perception toolbox and benchmark.
- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab optical flow toolbox and benchmark.
- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab image and video editing toolbox.
- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab image and video generative models toolbox.
- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab model deployment framework.
# TSM_PyTorch Compute Performance Test
## 1. Model Introduction
TSM (Temporal Shift Module) is a model for video classification. It uses a temporal shift operation to exchange information between the frames sampled from a video and relies on a 2D convolutional neural network for classification, so the model combines two kinds of components: temporal shift modules and spatial 2D convolutions.
In the temporal shift module, part of each frame's feature channels is shifted to the neighboring frames along the temporal dimension. Each frame can still be processed independently by the 2D network, yet its features now carry information from adjacent time steps.
In the spatial convolution part, TSM applies a 2D convolutional network to each (shifted) frame to learn spatial features, which are then aggregated for classification.
TSM achieves strong results on video classification, especially on large-scale video datasets, while keeping the computational cost close to that of a plain 2D CNN.
## 2. Model Structure
The TSM architecture consists of temporal shift modules embedded in a 2D convolutional backbone (typically a ResNet).
The temporal shift module splits the channels of each frame's feature map into groups: one group is shifted to the previous frame, one group is shifted to the next frame, and the remaining channels stay in place. The shift itself requires no extra multiplications, so temporal information is exchanged at essentially zero additional computation while each frame can still be processed independently.
The 2D backbone performs the spatial modeling: stacked convolution, pooling, and batch-normalization layers with residual connections extract spatial features from each frame, and the per-frame predictions are averaged over the sampled segments to produce the clip-level result.
Overall, TSM is a deep 2D convolutional network augmented with channel shifts along the temporal dimension, which lets it model temporal structure with the efficiency of a 2D CNN and achieve strong accuracy on large-scale video classification benchmarks.
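To make the shift operation concrete, below is a minimal PyTorch sketch of the core idea. It is illustrative only; MMAction2's actual implementation lives in its TSM backbone (mmaction/models/backbones/resnet_tsm.py) and differs in detail, e.g. the shift is applied inside the residual branch of each ResNet block.
```python
import torch

def temporal_shift(x, num_segments, shift_div=8):
    """Shift 1/shift_div of the channels to the previous frame and another
    1/shift_div to the next frame; the remaining channels stay in place.

    x has shape (N * num_segments, C, H, W): the sampled frames of each clip
    are processed independently by the 2D CNN, so the temporal axis is folded
    into the batch dimension.
    """
    nt, c, h, w = x.size()
    n = nt // num_segments
    x = x.view(n, num_segments, c, h, w)
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # shift towards earlier frames
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # shift towards later frames
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # channels that are not shifted
    return out.view(nt, c, h, w)

# Example: 2 clips, 8 segments each, 64-channel 56x56 feature maps.
feat = torch.randn(2 * 8, 64, 56, 56)
print(temporal_shift(feat, num_segments=8).shape)  # torch.Size([16, 64, 56, 56])
```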
## 3. Dataset
Training uses the Something-Something V2 dataset, a large-scale video action recognition dataset released by TwentyBN (20BN). It contains 220,847 video clips of humans performing predefined hand gestures and hand-object interactions, ranging from simple actions such as clapping or waving to more involved activities. Clips average about 2.75 seconds and are recorded in real-world settings with varied lighting, backgrounds, and objects.
Something-Something V2 is a challenging benchmark: models must classify fine-grained gestures accurately in complex scenes. It is widely used for research and evaluation in video action recognition, behavior analysis, and video retrieval.
## 4. Training
### Environment Setup
Pull the training image from the SourceFind (光源) registry:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/paddlepaddle:2.3.2-centos7.6-dtk-22.10-py38-latest
```
### Data Preprocessing
Create a data subdirectory under /workspace/mmaction2, change into it, and create a symlink that points to the actual dataset path:
```
ln -s /dataset/ sthv2
```
Extract the dataset archive with:
```
cat 20bn-something-something-v2-?? | tar zx
```
After extraction, extract the RGB frames with:
```
bash extract_rgb_frames_opencv.sh
```
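Once the frames are extracted, a quick sanity check can confirm the expected layout. This is a hypothetical helper: the data/sthv2/rawframes path and img_xxxxx.jpg naming follow the defaults assumed by the extraction script and the TSM configs, so adjust them if your layout differs.
```python
from pathlib import Path

# Hypothetical path; adjust to your layout (one directory of img_xxxxx.jpg files per video).
rawframes_root = Path('data/sthv2/rawframes')

video_dirs = [d for d in rawframes_root.iterdir() if d.is_dir()]
print(f'{len(video_dirs)} frame directories found under {rawframes_root}')
if video_dirs:
    sample = video_dirs[0]
    print(f'{sample.name}: {len(list(sample.glob("img_*.jpg")))} RGB frames')
```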
## Accuracy
| Cards | Top-1 Accuracy |
| :--: | :----: |
| 4 | 59.14% |
## Source Repository and Issue Feedback
## References
https://github.com/open-mmlab/mmaction2
https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_1x1x8_50e_sthv2_rgb/20210816_224310.log
# Model name
modelName=TSM_PyTorch
# Model description
modelDescription=TSM (Temporal Shift Module) is a model for video classification
# Application scenarios (multiple tags separated by commas)
appScenario=Training,PyTorch,Video Classification
# Framework type (multiple tags separated by commas)
frameType=PyTorch