Commit 5b3e36dc authored by Sugon_ldc

add model TSM

Pipeline #315 failed with stages in 0 seconds
Collections:
- Name: AGCN
  README: configs/skeleton/2s-agcn/README.md
Models:
- Config: configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py
  In Collection: AGCN
  Metadata:
    Architecture: AGCN
    Batch Size: 24
    Epochs: 80
    Parameters: 3472176
    Training Data: NTU60-XSub
    Training Resources: 1 GPU
  Name: agcn_80e_ntu60_xsub_keypoint_3d
  Results:
    Dataset: NTU60-XSub
    Metrics:
      Top 1 Accuracy: 86.06
    Task: Skeleton-based Action Recognition
  Training Json Log: https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d/2sagcn_80e_ntu60_xsub_keypoint_3d.json
  Training Log: https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d/2sagcn_80e_ntu60_xsub_keypoint_3d.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d/2sagcn_80e_ntu60_xsub_keypoint_3d-3bed61ba.pth
- Config: configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d.py
  In Collection: AGCN
  Metadata:
    Architecture: AGCN
    Batch Size: 24
    Epochs: 80
    Parameters: 3472176
    Training Data: NTU60-XSub
    Training Resources: 2 GPU
  Name: agcn_80e_ntu60_xsub_bone_3d
  Results:
    Dataset: NTU60-XSub
    Metrics:
      Top 1 Accuracy: 86.89
    Task: Skeleton-based Action Recognition
  Training Json Log: https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d/2sagcn_80e_ntu60_xsub_bone_3d.json
  Training Log: https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d/2sagcn_80e_ntu60_xsub_bone_3d.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_bone_3d/2sagcn_80e_ntu60_xsub_bone_3d-278b8815.pth
# PoseC3D
[Revisiting Skeleton-based Action Recognition](https://arxiv.org/abs/2104.13586)
<!-- [ALGORITHM] -->
## Abstract
<!-- [ABSTRACT] -->
Human skeleton, as a compact representation of human action, has received increasing attention in recent years. Many skeleton-based action recognition methods adopt graph convolutional networks (GCN) to extract features on top of human skeletons. Despite the positive results shown in previous works, GCN-based methods are subject to limitations in robustness, interoperability, and scalability. In this work, we propose PoseC3D, a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons. Compared to GCN-based methods, PoseC3D is more effective in learning spatiotemporal features, more robust against pose estimation noises, and generalizes better in cross-dataset settings. Also, PoseC3D can handle multiple-person scenarios without additional computation cost, and its features can be easily integrated with other modalities at early fusion stages, which provides a great design space to further boost the performance. On four challenging datasets, PoseC3D consistently obtains superior performance, when used alone on skeletons and in combination with the RGB modality.
<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/34324155/142995620-21b5536c-8cda-48cd-9cb9-50b70cab7a89.png" width="800"/>
</div>
<table>
<thead>
<tr>
<td>
<div align="center">
<b> Pose Estimation Results </b>
<br/>
<img src="https://user-images.githubusercontent.com/34324155/116529341-6fc95080-a90f-11eb-8f0d-57fdb35d1ba4.gif" width="455"/>
<br/>
<br/>
<img src="https://user-images.githubusercontent.com/34324155/116531676-04cd4900-a912-11eb-8db4-a93343bedd01.gif" width="455"/>
</div></td>
<td>
<div align="center">
<b> Keypoint Heatmap Volume Visualization </b>
<br/>
<img src="https://user-images.githubusercontent.com/34324155/116529336-6dff8d00-a90f-11eb-807e-4d9168997655.gif" width="256"/>
<br/>
<br/>
<img src="https://user-images.githubusercontent.com/34324155/116531658-00a12b80-a912-11eb-957b-561c280a86da.gif" width="256"/>
</div></td>
<td>
<div align="center">
<b> Limb Heatmap Volume Visualization </b>
<br/>
<img src="https://user-images.githubusercontent.com/34324155/116529322-6a6c0600-a90f-11eb-81df-6fbb36230bd0.gif" width="256"/>
<br/>
<br/>
<img src="https://user-images.githubusercontent.com/34324155/116531649-fed76800-a911-11eb-8ca9-0b4e58f43ad9.gif" width="256"/>
</div></td>
</tr>
</thead>
</table>
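The visualizations above show the pseudo heatmap volumes that PoseC3D takes as input instead of a graph sequence. The sketch below shows how such volumes could be built from 2D keypoints, assuming the Gaussian formulation as we understand it from the paper. A keypoint map is a Gaussian centred on the keypoint and scaled by its confidence. A limb map is a Gaussian of the distance to the limb segment, scaled by the smaller of its two endpoint confidences. The function names, the `sigma=0.6` default and the element-wise max used to merge several persons are illustrative assumptions. This is plain Python for readability, not the mmaction2 data pipeline.

```python
import math


def keypoint_heatmap(persons, confs, h, w, sigma=0.6):
    """Build K Gaussian keypoint maps (each H x W) for a single frame.

    persons: list of persons, each a list of K (x, y) keypoints in pixels.
    confs:   matching list of persons, each a list of K confidence scores.
    """
    num_kp = len(persons[0])
    maps = [[[0.0] * w for _ in range(h)] for _ in range(num_kp)]
    for kps, scores in zip(persons, confs):
        for k, ((x, y), c) in enumerate(zip(kps, scores)):
            for i in range(h):  # row index, compared with y
                for j in range(w):  # column index, compared with x
                    d2 = (j - x) ** 2 + (i - y) ** 2
                    val = math.exp(-d2 / (2 * sigma ** 2)) * c
                    # Overlapping persons are merged with an element-wise max.
                    maps[k][i][j] = max(maps[k][i][j], val)
    return maps


def _dist_to_segment(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to the segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate limb: both endpoints coincide
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def limb_heatmap(persons, confs, limbs, h, w, sigma=0.6):
    """Build one H x W map per limb (a, b), scaled by the weaker endpoint."""
    maps = [[[0.0] * w for _ in range(h)] for _ in range(len(limbs))]
    for kps, scores in zip(persons, confs):
        for n, (a, b) in enumerate(limbs):
            (ax, ay), (bx, by) = kps[a], kps[b]
            c = min(scores[a], scores[b])
            for i in range(h):
                for j in range(w):
                    d = _dist_to_segment(j, i, ax, ay, bx, by)
                    val = math.exp(-d * d / (2 * sigma ** 2)) * c
                    maps[n][i][j] = max(maps[n][i][j], val)
    return maps


def keypoint_volume(frames, frame_confs, h, w, sigma=0.6):
    """Stack per-frame keypoint maps into a K x T x H x W nested list."""
    per_frame = [keypoint_heatmap(p, c, h, w, sigma)
                 for p, c in zip(frames, frame_confs)]
    return [[frame[k] for frame in per_frame] for k in range(len(per_frame[0]))]
```

Stacking the per-frame maps along time gives the K x T x H x W volume that a 3D-CNN backbone such as SlowOnly-R50 consumes. A practical pipeline would vectorise these loops with array operations.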
## Results and Models
### FineGYM
| config | pseudo heatmap | gpus | backbone | Mean Top-1 | ckpt | log | json |
| :---------------------------------------------------------------------------------------------------- | :------------: | :---: | :----------: | :--------: | :-------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------: |
| [slowonly_r50_u48_240e_gym_keypoint](/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py) | keypoint | 8 x 2 | SlowOnly-R50 | 93.7 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint-b07a98a0.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint.json) |
| [slowonly_r50_u48_240e_gym_limb](/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb.py) | limb | 8 x 2 | SlowOnly-R50 | 94.0 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb-c0d7b482.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb.json) |
| Fusion | | | | 94.3 | | | |
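The **Fusion** rows report the accuracy obtained by combining the keypoint and limb models. A common way to do this is score-level late fusion: add up the two models' per-class scores for each sample and take the argmax. The sketch below assumes equal weights, since the weights behind the reported numbers are not stated in these tables. It also computes the FineGYM metric, which we take to be mean class accuracy ("Mean Top-1"). `fuse_scores` and `mean_class_accuracy` are hypothetical helpers, and the scores are toy numbers, not real model outputs.

```python
def fuse_scores(score_lists, weights=None):
    """Weighted element-wise sum of per-sample class scores from several models.

    score_lists: one entry per model; each entry is a list of per-sample
    class-score lists, all in the same sample order.
    """
    if weights is None:
        weights = [1.0] * len(score_lists)  # equal weights (assumption)
    fused = []
    for per_model in zip(*score_lists):  # this sample's scores from each model
        num_classes = len(per_model[0])
        fused.append([sum(wt * s[c] for wt, s in zip(weights, per_model))
                      for c in range(num_classes)])
    return fused


def mean_class_accuracy(scores, labels):
    """Average of per-class top-1 accuracies over the classes present in labels."""
    hits, counts = {}, {}
    for s, y in zip(scores, labels):
        pred = max(range(len(s)), key=s.__getitem__)  # argmax
        counts[y] = counts.get(y, 0) + 1
        hits[y] = hits.get(y, 0) + int(pred == y)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


# Toy scores for three samples and two classes.
keypoint_scores = [[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]]
limb_scores = [[0.2, 0.8], [0.4, 0.6], [0.9, 0.1]]
labels = [1, 1, 0]
print(mean_class_accuracy(keypoint_scores, labels))  # 0.75
print(mean_class_accuracy(fuse_scores([keypoint_scores, limb_scores]), labels))  # 1.0
```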
### NTU60_XSub
| config | pseudo heatmap | gpus | backbone | Top-1 | ckpt | log | json |
| :------------------------------------------------------------------------------------------------------------------ | :------------: | :---: | :----------: | :---: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [slowonly_r50_u48_240e_ntu60_xsub_keypoint](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint.py) | keypoint | 8 x 2 | SlowOnly-R50 | 93.7 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint-f3adabf1.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint.json) |
| [slowonly_r50_u48_240e_ntu60_xsub_limb](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb.py) | limb | 8 x 2 | SlowOnly-R50 | 93.4 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb-1d69006a.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb.json) |
| Fusion | | | | 94.1 | | | |
### NTU120_XSub
| config | pseudo heatmap | gpus | backbone | Top-1 | ckpt | log | json |
| :-------------------------------------------------------------------------------------------------------------------- | :------------: | :---: | :----------: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [slowonly_r50_u48_240e_ntu120_xsub_keypoint](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py) | keypoint | 8 x 2 | SlowOnly-R50 | 86.3 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint-6736b03f.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint.json) |
| [slowonly_r50_u48_240e_ntu120_xsub_limb](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb.py) | limb | 8 x 2 | SlowOnly-R50 | 85.7 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb-803c2317.pth?) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb.json) |
| Fusion | | | | 86.9 | | | |
### UCF101
| config | pseudo heatmap | gpus | backbone | Top-1 | ckpt | log | json |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :----------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint](/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.py) | keypoint | 8 | SlowOnly-R50 | 87.0 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint-cae8aa4a.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.json) |
### HMDB51
| config | pseudo heatmap | gpus | backbone | Top-1 | ckpt | log | json |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :--: | :----------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint](/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.py) | keypoint | 8 | SlowOnly-R50 | 69.3 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint-76ffdd8b.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.json) |
:::{note}
1. The **gpus** column indicates the number of GPUs used to obtain the checkpoint (`8 x 2` corresponds to 16 GPUs in total). Note that the provided configs assume 8 GPUs by default.
According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), if you use a different number of GPUs or videos per GPU, you may set the learning rate proportional to the total batch size,
e.g., lr=0.01 for 8 GPUs x 8 videos/gpu and lr=0.04 for 16 GPUs x 16 videos/gpu.
2. You can follow the guide in [Preparing Skeleton Dataset](https://github.com/open-mmlab/mmaction2/tree/master/tools/data/skeleton) to obtain the skeleton annotations used in the above configs.
:::
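The rule in note 1 is plain proportional arithmetic. Below is a minimal sketch that takes the note's first example (lr=0.01 for 8 GPUs x 8 videos/gpu) as the reference setting. `scale_lr` is a hypothetical helper, and the value it returns is what you would set as `lr` in the optimizer settings of your config.

```python
def scale_lr(base_lr, base_batch_size, gpus, videos_per_gpu):
    """Linear Scaling Rule: scale the learning rate with the total batch size."""
    total_batch_size = gpus * videos_per_gpu
    return base_lr * total_batch_size / base_batch_size


# Reference setting from the note: lr=0.01 for 8 GPUs x 8 videos/gpu (batch 64).
# 16 GPUs x 16 videos/gpu is a 4x larger batch, so this prints about 0.04.
print(scale_lr(0.01, 8 * 8, gpus=16, videos_per_gpu=16))
```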
## Train
You can use the following command to train a model.
```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```
Example: train the PoseC3D model on the FineGYM dataset with deterministic settings and periodic validation.
```shell
python tools/train.py configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py \
--work-dir work_dirs/slowonly_r50_u48_240e_gym_keypoint \
--validate --seed 0 --deterministic
```
To train on your own dataset, refer to [Custom Dataset Training](https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/posec3d/custom_dataset_training.md).
For more details, refer to the **Training setting** section in [getting_started](/docs/en/getting_started.md#training-setting).
## Test
You can use the following command to test a model.
```shell
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
```
Example: test the PoseC3D model on the FineGYM dataset and dump the results to a pickle file.
```shell
python tools/test.py configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py \
checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \
--out result.pkl
```
For more details, refer to the **Test a dataset** section in [getting_started](/docs/en/getting_started.md#test-a-dataset).
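You can also recompute accuracies offline from the dumped `result.pkl`. The sketch below assumes the pickle holds a list with one class-score vector per test sample, in the same order as the test annotation file. `labels` is a hypothetical list of ground-truth class indices that you would read from that annotation file yourself. `load_scores` and `top_k_accuracy` are illustrative helpers, not mmaction2 APIs. Unpickling NumPy arrays requires NumPy to be installed.

```python
import pickle


def load_scores(path):
    """Load per-sample class scores dumped by `tools/test.py ... --out result.pkl`.

    Assumes a list with one score vector (e.g. a NumPy array) per test sample.
    """
    with open(path, 'rb') as f:
        results = pickle.load(f)
    return [[float(v) for v in r] for r in results]


def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose label is among the k highest-scoring classes."""
    hits = 0
    for s, y in zip(scores, labels):
        top_k = sorted(range(len(s)), key=s.__getitem__, reverse=True)[:k]
        hits += int(y in top_k)
    return hits / len(labels)
```

For example, `top_k_accuracy(load_scores('result.pkl'), labels, k=5)` would give a top-5 accuracy comparable to the `top_k_accuracy` metric requested with `--eval` above, provided the ordering assumption holds.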
## Citation
```BibTeX
@misc{duan2021revisiting,
title={Revisiting Skeleton-based Action Recognition},
author={Haodong Duan and Yue Zhao and Kai Chen and Dian Shao and Dahua Lin and Bo Dai},
year={2021},
eprint={2104.13586},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
# PoseC3D
## Introduction
<!-- [ALGORITHM] -->
```BibTeX
@misc{duan2021revisiting,
title={Revisiting Skeleton-based Action Recognition},
author={Haodong Duan and Yue Zhao and Kai Chen and Dian Shao and Dahua Lin and Bo Dai},
year={2021},
eprint={2104.13586},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
<table>
<thead>
<tr>
<td>
<div align="center">
<b> Pose Estimation Results </b>
<br/>
<img src="https://user-images.githubusercontent.com/34324155/116529341-6fc95080-a90f-11eb-8f0d-57fdb35d1ba4.gif" width="455"/>
<br/>
<br/>
<img src="https://user-images.githubusercontent.com/34324155/116531676-04cd4900-a912-11eb-8db4-a93343bedd01.gif" width="455"/>
</div></td>
<td>
<div align="center">
<b> Keypoint Heatmap Volume Visualization </b>
<br/>
<img src="https://user-images.githubusercontent.com/34324155/116529336-6dff8d00-a90f-11eb-807e-4d9168997655.gif" width="256"/>
<br/>
<br/>
<img src="https://user-images.githubusercontent.com/34324155/116531658-00a12b80-a912-11eb-957b-561c280a86da.gif" width="256"/>
</div></td>
<td>
<div align="center">
<b> Limb Heatmap Volume Visualization </b>
<br/>
<img src="https://user-images.githubusercontent.com/34324155/116529322-6a6c0600-a90f-11eb-81df-6fbb36230bd0.gif" width="256"/>
<br/>
<br/>
<img src="https://user-images.githubusercontent.com/34324155/116531649-fed76800-a911-11eb-8ca9-0b4e58f43ad9.gif" width="256"/>
</div></td>
</tr>
</thead>
</table>
## Model Zoo
### FineGYM
| config | pseudo heatmap | gpus | backbone | Mean Top-1 | ckpt | log | json |
| :---------------------------------------------------------------------------------------------------- | :------: | :------: | :----------: | :--------: | :-------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------: |
| [slowonly_r50_u48_240e_gym_keypoint](/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py) | keypoint | 8 x 2 | SlowOnly-R50 | 93.7 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint-b07a98a0.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint.json) |
| [slowonly_r50_u48_240e_gym_limb](/configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb.py) | limb | 8 x 2 | SlowOnly-R50 | 94.0 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb-c0d7b482.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb.json) |
| Fusion | | | | 94.3 | | | |
### NTU60_XSub
| config | pseudo heatmap | gpus | backbone | Top-1 | ckpt | log | json |
| :------------------------------------------------------------------------------------------------------------------ | :------: | :------: | :----------: | :---: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [slowonly_r50_u48_240e_ntu60_xsub_keypoint](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint.py) | keypoint | 8 x 2 | SlowOnly-R50 | 93.7 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint-f3adabf1.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint.json) |
| [slowonly_r50_u48_240e_ntu60_xsub_limb](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb.py) | limb | 8 x 2 | SlowOnly-R50 | 93.4 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb-1d69006a.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb.json) |
| Fusion | | | | 94.1 | | | |
### NTU120_XSub
| config | pseudo heatmap | gpus | backbone | Top-1 | ckpt | log | json |
| :-------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :----------: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [slowonly_r50_u48_240e_ntu120_xsub_keypoint](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py) | keypoint | 8 x 2 | SlowOnly-R50 | 86.3 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint-6736b03f.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint.json) |
| [slowonly_r50_u48_240e_ntu120_xsub_limb](/configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb.py) | limb | 8 x 2 | SlowOnly-R50 | 85.7 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb-803c2317.pth?) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb.json) |
| Fusion | | | | 86.9 | | | |
### UCF101
| config | pseudo heatmap | gpus | backbone | Top-1 | ckpt | log | json |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :----------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint](/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.py) | keypoint | 8 | SlowOnly-R50 | 87.0 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint-cae8aa4a.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.json) |
### HMDB51
| config | pseudo heatmap | gpus | backbone | Top-1 | ckpt | log | json |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :----------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint](/configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.py) | keypoint | 8 | SlowOnly-R50 | 69.3 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint-76ffdd8b.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.json) |
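Notes:
1. The **gpus** column refers to the number of GPUs used to obtain the checkpoint. By default, the configs provided by MMAction2 assume training with 8 GPUs.
According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), if you use a different number of GPUs or a different number of videos per GPU, you need to scale the learning rate in proportion to the batch size,
e.g., lr=0.2 for 8 GPUs x 16 videos/gpu and lr=0.4 for 16 GPUs x 16 videos/gpu.
2. You can follow [Preparing Skeleton Dataset](https://github.com/open-mmlab/mmaction2/blob/master/tools/data/skeleton/README_zh-CN.md) to obtain the skeleton annotations used in the above configs.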
## Train
You can use the following command to train a model.
```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```
Example: train the PoseC3D model on the FineGYM dataset with deterministic settings and periodic validation.
```shell
python tools/train.py configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py \
--work-dir work_dirs/slowonly_r50_u48_240e_gym_keypoint \
--validate --seed 0 --deterministic
```
To train on your own dataset, refer to [Custom Dataset Training](https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/posec3d/custom_dataset_training.md).
For more training details, refer to the **Training setting** section in [getting_started](/docs/zh_cn/getting_started.md#训练配置).
## Test
You can use the following command to test a model.
```shell
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
```
Example: test the PoseC3D model on the FineGYM dataset and dump the results to a pickle file.
```shell
python tools/test.py configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py \
checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \
--out result.pkl
```
For more testing details, refer to the **Test a dataset** section in [getting_started](/docs/zh_cn/getting_started.md#测试某个数据集).
# Custom Dataset Training with PoseC3D
This is a step-by-step tutorial on how to train PoseC3D on your custom dataset.
1. First, note that action recognition with PoseC3D requires only skeleton information, so you need to prepare custom annotation files (for training and validation). To start, replace the placeholders `mmdet_root` and `mmpose_root` in `ntu_pose_extraction.py` with your installation paths. Then use [ntu_pose_extraction.py](https://github.com/open-mmlab/mmaction2/blob/90fc8440961987b7fe3ee99109e2c633c4e30158/tools/data/skeleton/ntu_pose_extraction.py) as shown in [Prepare Annotations](https://github.com/open-mmlab/mmaction2/blob/master/tools/data/skeleton/README.md#prepare-annotations) to extract 2D keypoints for each video in your custom dataset. The command looks like this (assuming your video is named `some_video_from_my_dataset.mp4`):
```shell
# Run this command for each of your training and validation videos to generate one pickle file per video.
python ntu_pose_extraction.py some_video_from_my_dataset.mp4 some_video_from_my_dataset.pkl
```
@kennymckormick's [note](https://github.com/open-mmlab/mmaction2/issues/1216#issuecomment-950130079):
> One only thing you may need to change is that: since ntu_pose_extraction.py is developed specifically for pose extraction of NTU videos, you can skip the [ntu_det_postproc](https://github.com/open-mmlab/mmaction2/blob/90fc8440961987b7fe3ee99109e2c633c4e30158/tools/data/skeleton/ntu_pose_extraction.py#L307) step when using this script for extracting pose from your custom video datasets.
2. Next, collect all the per-video pickle files for training into one list and save it as a single file (e.g., `custom_dataset_train.pkl`); do the same for validation (e.g., `custom_dataset_val.pkl`). This completes the annotation files for your custom dataset.
3. Then launch training with the following command (adapted to your needs), as shown in [PoseC3D/Train](https://github.com/open-mmlab/mmaction2/blob/master/configs/skeleton/posec3d/README.md#train): `python tools/train.py configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py --work-dir work_dirs/slowonly_r50_u48_240e_ntu120_xsub_keypoint --validate --test-best --gpus 2 --seed 0 --deterministic`
- Before running the above command, modify the following variables in the config so that they point to your newly created annotation files:
```python
model = dict(
    ...
    cls_head=dict(
        ...
        num_classes=4,  # Your number of classes
        ...
    ),
    ...
)
ann_file_train = 'data/posec3d/custom_dataset_train.pkl'  # Your annotation file for training
ann_file_val = 'data/posec3d/custom_dataset_val.pkl'  # Your annotation file for validation
load_from = 'pretrained_weight.pth'  # You can use released weights for initialization; set to None to train from scratch
# You can also alter the hyperparameters or the training schedule
```
With that, training should start, and you can grab a cup of coffee while you watch how it goes.
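Step 2 above, collecting the per-video pickle files into one list per split, can be sketched as follows. The sketch assumes each per-video file holds a single annotation dict with a `frame_dir` key, as written by the `ntu_pose_extraction.py` version linked above. That script is written for NTU videos, so the `label` it stores may not match your classes. The hypothetical `label_of` callback lets you assign your own class indices instead. `merge_annotations` is an illustrative helper, not part of mmaction2.

```python
import glob
import os
import pickle


def merge_annotations(pkl_dir, out_path, label_of=None):
    """Merge per-video pickle files into a single annotation list.

    pkl_dir:  directory with one .pkl per video (output of ntu_pose_extraction.py).
    out_path: destination file; keep it outside pkl_dir so reruns do not pick it up.
    label_of: optional callable mapping a video name ('frame_dir') to a class index.
    """
    annotations = []
    for path in sorted(glob.glob(os.path.join(pkl_dir, '*.pkl'))):
        with open(path, 'rb') as f:
            anno = pickle.load(f)  # assumed: one dict per video
        if label_of is not None:
            anno['label'] = label_of(anno['frame_dir'])
        annotations.append(anno)
    with open(out_path, 'wb') as f:
        pickle.dump(annotations, f)
    return annotations
```

For example, you would call it once on the directory of training pickles with `out_path='data/posec3d/custom_dataset_train.pkl'`, and once on the validation pickles with `custom_dataset_val.pkl`, so that the outputs match `ann_file_train` and `ann_file_val` in the config.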
Collections:
- Name: PoseC3D
  README: configs/skeleton/posec3d/README.md
  Paper:
    URL: https://arxiv.org/abs/2104.13586
    Title: Revisiting Skeleton-based Action Recognition
Models:
- Config: configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint.py
  In Collection: PoseC3D
  Metadata:
    Architecture: SlowOnly-R50
    Batch Size: 16
    Epochs: 240
    Parameters: 2044867
    Training Data: FineGYM
    Training Resources: 16 GPUs
    pseudo heatmap: keypoint
  Name: slowonly_r50_u48_240e_gym_keypoint
  Results:
  - Dataset: FineGYM
    Metrics:
      mean Top 1 Accuracy: 93.7
    Task: Skeleton-based Action Recognition
  Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint.json
  Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_keypoint/slowonly_r50_u48_240e_gym_keypoint-b07a98a0.pth
- Config: configs/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb.py
  In Collection: PoseC3D
  Metadata:
    Architecture: SlowOnly-R50
    Batch Size: 16
    Epochs: 240
    Parameters: 2044867
    Training Data: FineGYM
    Training Resources: 16 GPUs
    pseudo heatmap: limb
  Name: slowonly_r50_u48_240e_gym_limb
  Results:
  - Dataset: FineGYM
    Metrics:
      mean Top 1 Accuracy: 94.0
    Task: Skeleton-based Action Recognition
  Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb.json
  Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_gym_limb/slowonly_r50_u48_240e_gym_limb-c0d7b482.pth
- Config: configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint.py
  In Collection: PoseC3D
  Metadata:
    Architecture: SlowOnly-R50
    Batch Size: 16
    Epochs: 240
    Parameters: 2024860
    Training Data: NTU60-XSub
    Training Resources: 16 GPUs
    pseudo heatmap: keypoint
  Name: slowonly_r50_u48_240e_ntu60_xsub_keypoint
  Results:
  - Dataset: NTU60-XSub
    Metrics:
      Top 1 Accuracy: 93.7
    Task: Skeleton-based Action Recognition
  Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint.json
  Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint-f3adabf1.pth
- Config: configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb.py
  In Collection: PoseC3D
  Metadata:
    Architecture: SlowOnly-R50
    Batch Size: 16
    Epochs: 240
    Parameters: 2024860
    Training Data: NTU60-XSub
    Training Resources: 16 GPUs
    pseudo heatmap: limb
  Name: slowonly_r50_u48_240e_ntu60_xsub_limb
  Results:
  - Dataset: NTU60-XSub
    Metrics:
      Top 1 Accuracy: 93.4
    Task: Skeleton-based Action Recognition
  Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb.json
  Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb/slowonly_r50_u48_240e_ntu60_xsub_limb-1d69006a.pth
- Config: configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py
  In Collection: PoseC3D
  Metadata:
    Architecture: SlowOnly-R50
    Batch Size: 16
    Epochs: 240
    Parameters: 2055640
    Training Data: NTU120-XSub
    Training Resources: 16 GPUs
    pseudo heatmap: keypoint
  Name: slowonly_r50_u48_240e_ntu120_xsub_keypoint
  Results:
  - Dataset: NTU120-XSub
    Metrics:
      Top 1 Accuracy: 86.3
    Task: Skeleton-based Action Recognition
  Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint.json
  Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint-6736b03f.pth
- Config: configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb.py
  In Collection: PoseC3D
  Metadata:
    Architecture: SlowOnly-R50
    Batch Size: 16
    Epochs: 240
    Parameters: 2055640
    Training Data: NTU120-XSub
    Training Resources: 16 GPUs
    pseudo heatmap: limb
  Name: slowonly_r50_u48_240e_ntu120_xsub_limb
  Results:
  - Dataset: NTU120-XSub
    Metrics:
      Top 1 Accuracy: 85.7
    Task: Skeleton-based Action Recognition
  Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb.json
  Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb/slowonly_r50_u48_240e_ntu120_xsub_limb-803c2317.pth
- Config: configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.py
  In Collection: PoseC3D
  Metadata:
    Architecture: SlowOnly-R50
    Batch Size: 16
    Epochs: 120
    Parameters: 3029984
    Training Data: HMDB51
    Training Resources: 8 GPUs
    pseudo heatmap: keypoint
  Name: slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint
  Results:
  - Dataset: HMDB51
    Metrics:
      Top 1 Accuracy: 69.3
    Task: Skeleton-based Action Recognition
  Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.json
  Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint-76ffdd8b.pth
- Config: configs/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.py
  In Collection: PoseC3D
  Metadata:
    Architecture: SlowOnly-R50
    Batch Size: 16
    Epochs: 120
    Parameters: 3055584
    Training Data: UCF101
    Training Resources: 8 GPUs
    pseudo heatmap: keypoint
  Name: slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint
  Results:
  - Dataset: UCF101
    Metrics:
      Top 1 Accuracy: 87.0
    Task: Skeleton-based Action Recognition
  Training Json Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.json
  Training Log: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint-cae8aa4a.pth
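The `Parameters` values for the FineGYM and NTU entries are consistent with a shared backbone plus a classifier that grows with `num_classes`. Assuming `I3DHead` ends in a single linear layer from 512 features to `num_classes` outputs (weights plus biases, so 513 parameters per class), and that the backbone feature width is bottleneck expansion (assumed 4) × `base_channels` × 2^(`num_stages` − 1), the sketch below recovers the same backbone size for all three datasets. The K400-pretrained HMDB51/UCF101 entries use a different `stage_blocks` layout and are left out. `head_params` and `backbone_params` are hypothetical helpers.

```python
# Parameter counts copied from the metafile entries above.
PARAMS = {
    'NTU60-XSub': 2024860,
    'FineGYM': 2044867,
    'NTU120-XSub': 2055640,
}
# num_classes used by the matching configs.
NUM_CLASSES = {'NTU60-XSub': 60, 'FineGYM': 99, 'NTU120-XSub': 120}

# Assumed bottleneck expansion of 4; base_channels and num_stages
# come from the backbone configs.
EXPANSION, BASE_CHANNELS, NUM_STAGES = 4, 32, 3
FEAT_DIM = EXPANSION * BASE_CHANNELS * 2 ** (NUM_STAGES - 1)


def head_params(num_classes: int, feat_dim: int = FEAT_DIM) -> int:
    """Weights plus biases of a single linear classifier (assumed head)."""
    return feat_dim * num_classes + num_classes


def backbone_params(dataset: str) -> int:
    """Total parameters minus the assumed linear head."""
    return PARAMS[dataset] - head_params(NUM_CLASSES[dataset])


if __name__ == '__main__':
    print(FEAT_DIM)  # 512, matching in_channels=512 of I3DHead
    for name in PARAMS:
        print(name, backbone_params(name))  # 1994080 for all three
```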
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=50,
        pretrained=None,
        in_channels=17,
        base_channels=32,
        num_stages=3,
        out_indices=(2, ),
        stage_blocks=(3, 4, 6),
        conv1_stride_s=1,
        pool1_stride_s=1,
        inflate=(0, 1, 1),
        spatial_strides=(2, 2, 2),
        temporal_strides=(1, 1, 2),
        dilations=(1, 1, 1)),
    cls_head=dict(
        type='I3DHead',
        in_channels=512,
        num_classes=51,
        spatial_type='avg',
        dropout_ratio=0.5),
    train_cfg=dict(),
    test_cfg=dict(average_clips='prob'))
dataset_type = 'PoseDataset'
ann_file = 'data/posec3d/hmdb51.pkl'
left_kp = [1, 3, 5, 7, 9, 11, 13, 15]
right_kp = [2, 4, 6, 8, 10, 12, 14, 16]
train_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='RandomResizedCrop', area_range=(0.56, 1.0)),
    dict(type='Resize', scale=(48, 48), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 56)),
    dict(type='CenterCrop', crop_size=56),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 56)),
    dict(type='CenterCrop', crop_size=56),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False,
        double=True,
        left_kp=left_kp,
        right_kp=right_kp),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=16,
    workers_per_gpu=2,
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type='RepeatDataset',
        times=10,
        dataset=dict(
            type=dataset_type,
            ann_file=ann_file,
            split='train1',
            data_prefix='',
            pipeline=train_pipeline)),
    val=dict(
        type=dataset_type,
        ann_file=ann_file,
        split='test1',
        data_prefix='',
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file,
        split='test1',
        data_prefix='',
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD', lr=0.01, momentum=0.9,
    weight_decay=0.0001)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[9, 11])
total_epochs = 12
checkpoint_config = dict(interval=1)
workflow = [('train', 1)]
evaluation = dict(
    interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
    interval=20, hooks=[
        dict(type='TextLoggerHook'),
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/posec3d_iclr/slowonly_kinetics400_pretrained_r50_u48_120e_hmdb51_split1_keypoint'  # noqa: E501
load_from = 'https://download.openmmlab.com/mmaction/skeleton/posec3d/k400_posec3d-041f49c6.pth'  # noqa: E501
resume_from = None
find_unused_parameters = True
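Because the HMDB51 training set is wrapped in `RepeatDataset` with `times=10`, each of the 12 configured epochs walks through the data ten times, which is where the "120e" in the config name comes from. The sketch below tabulates the resulting step schedule. It assumes the step policy multiplies the learning rate by `gamma=0.1` at each epoch listed in `step`, with epochs counted from 0; the helper names are hypothetical.

```python
REPEAT_TIMES = 10  # data.train.times (RepeatDataset)
TOTAL_EPOCHS = 12  # total_epochs
STEPS = [9, 11]    # lr_config.step
BASE_LR = 0.01     # optimizer.lr


def step_lr(base_lr: float, epoch: int, steps, gamma: float = 0.1) -> float:
    """LR for a 0-based epoch: decay by gamma at every step already reached."""
    num_decays = sum(1 for s in steps if epoch >= s)
    return base_lr * gamma ** num_decays


def schedule():
    """(epoch, passes over the original data so far, lr) for every epoch."""
    return [(epoch, epoch * REPEAT_TIMES, step_lr(BASE_LR, epoch, STEPS))
            for epoch in range(TOTAL_EPOCHS)]


if __name__ == '__main__':
    print(REPEAT_TIMES * TOTAL_EPOCHS)  # 120 passes over HMDB51 in total
    for epoch, passes, lr in schedule():
        print(f'epoch {epoch:2d} (after {passes:3d} passes): lr={lr:g}')
```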
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=50,
        pretrained=None,
        in_channels=17,
        base_channels=32,
        num_stages=3,
        out_indices=(2, ),
        stage_blocks=(3, 4, 6),
        conv1_stride_s=1,
        pool1_stride_s=1,
        inflate=(0, 1, 1),
        spatial_strides=(2, 2, 2),
        temporal_strides=(1, 1, 2),
        dilations=(1, 1, 1)),
    cls_head=dict(
        type='I3DHead',
        in_channels=512,
        num_classes=101,
        spatial_type='avg',
        dropout_ratio=0.5),
    train_cfg=dict(),
    test_cfg=dict(average_clips='prob'))
dataset_type = 'PoseDataset'
ann_file = 'data/posec3d/ucf101.pkl'
left_kp = [1, 3, 5, 7, 9, 11, 13, 15]
right_kp = [2, 4, 6, 8, 10, 12, 14, 16]
train_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='RandomResizedCrop', area_range=(0.56, 1.0)),
    dict(type='Resize', scale=(48, 48), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 56)),
    dict(type='CenterCrop', crop_size=56),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 56)),
    dict(type='CenterCrop', crop_size=56),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False,
        double=True,
        left_kp=left_kp,
        right_kp=right_kp),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=16,
    workers_per_gpu=2,
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type='RepeatDataset',
        times=10,
        dataset=dict(
            type=dataset_type,
            ann_file=ann_file,
            split='train1',
            data_prefix='',
            pipeline=train_pipeline)),
    val=dict(
        type=dataset_type,
        ann_file=ann_file,
        split='test1',
        data_prefix='',
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file,
        split='test1',
        data_prefix='',
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD', lr=0.01, momentum=0.9,
    weight_decay=0.0003)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[9, 11])
total_epochs = 12
checkpoint_config = dict(interval=1)
workflow = [('train', 1)]
evaluation = dict(
    interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
    interval=20, hooks=[
        dict(type='TextLoggerHook'),
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/posec3d_iclr/slowonly_kinetics400_pretrained_r50_u48_120e_ucf101_split1_keypoint'  # noqa: E501
load_from = 'https://download.openmmlab.com/mmaction/skeleton/posec3d/k400_posec3d-041f49c6.pth'  # noqa: E501
resume_from = None
find_unused_parameters = True
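The optimizer comment states that this learning rate is meant for 8 GPUs; with `videos_per_gpu=16` that is a total batch of 128 clips. If you train on a different number of GPUs (for example `--gpus 2`, as in the custom-dataset command), the linear scaling rule described in the MMAction2 documentation suggests scaling the learning rate in proportion to the total batch size. The helper below is a hypothetical sketch of that rule; treat its output as a starting point rather than a tuned value.

```python
def scaled_lr(base_lr: float, base_gpus: int, base_videos_per_gpu: int,
              gpus: int, videos_per_gpu: int) -> float:
    """Scale a learning rate linearly with the total batch size."""
    base_batch = base_gpus * base_videos_per_gpu
    new_batch = gpus * videos_per_gpu
    return base_lr * new_batch / base_batch


if __name__ == '__main__':
    # Reference setting from the optimizer comment: 8 GPUs x 16 videos.
    print(scaled_lr(0.01, 8, 16, gpus=2, videos_per_gpu=16))  # about 0.0025
    print(scaled_lr(0.2, 8, 16, gpus=2, videos_per_gpu=16))   # about 0.05
```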
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=50,
        pretrained=None,
        in_channels=17,
        base_channels=32,
        num_stages=3,
        out_indices=(2, ),
        stage_blocks=(4, 6, 3),
        conv1_stride_s=1,
        pool1_stride_s=1,
        inflate=(0, 1, 1),
        spatial_strides=(2, 2, 2),
        temporal_strides=(1, 1, 2),
        dilations=(1, 1, 1)),
    cls_head=dict(
        type='I3DHead',
        in_channels=512,
        num_classes=99,
        spatial_type='avg',
        dropout_ratio=0.5),
    train_cfg=dict(),
    test_cfg=dict(average_clips='prob'))
dataset_type = 'PoseDataset'
ann_file_train = 'data/posec3d/gym_train.pkl'
ann_file_val = 'data/posec3d/gym_val.pkl'
left_kp = [1, 3, 5, 7, 9, 11, 13, 15]
right_kp = [2, 4, 6, 8, 10, 12, 14, 16]
train_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='RandomResizedCrop', area_range=(0.56, 1.0)),
    dict(type='Resize', scale=(56, 56), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False,
        double=True,
        left_kp=left_kp,
        right_kp=right_kp),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=16,
    workers_per_gpu=2,
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix='',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix='',
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix='',
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD', lr=0.2, momentum=0.9,
    weight_decay=0.0003)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0)
total_epochs = 240
checkpoint_config = dict(interval=10)
workflow = [('train', 10)]
evaluation = dict(
    interval=10,
    metrics=['top_k_accuracy', 'mean_class_accuracy'],
    topk=(1, 5))
log_config = dict(
    interval=20, hooks=[
        dict(type='TextLoggerHook'),
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/posec3d/slowonly_r50_u48_240e_gym_keypoint'
load_from = None
resume_from = None
find_unused_parameters = False
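With `with_kp=True`, each of the 17 input channels (`in_channels=17`) is a pseudo heatmap for one COCO keypoint, drawn as a Gaussian with `sigma=0.6`; `use_score=True` scales it by the keypoint's confidence. The pure-Python sketch below illustrates that idea on a tiny grid. It is a simplified illustration, not MMAction2's `GeneratePoseTarget`: combining several persons with an element-wise max is an assumption, and `keypoint_channel` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Heatmap = List[List[float]]


def keypoint_channel(height: int, width: int,
                     centers: Sequence[Tuple[float, float]],
                     scores: Sequence[float],
                     sigma: float = 0.6,
                     use_score: bool = True) -> Heatmap:
    """One heatmap channel for a single keypoint type across all persons.

    Each person adds a Gaussian centred on its keypoint with peak equal to
    the confidence score (use_score=True) or 1.0 otherwise; persons are
    combined with an element-wise max (assumed).
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for (cx, cy), score in zip(centers, scores):
        peak = score if use_score else 1.0
        for y in range(height):
            for x in range(width):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                value = peak * math.exp(-d2 / (2 * sigma ** 2))
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


if __name__ == '__main__':
    # Two hypothetical persons with the same keypoint at different places.
    hm = keypoint_channel(5, 5, centers=[(1, 1), (3, 3)], scores=[0.9, 0.5])
    print(round(hm[1][1], 3), round(hm[3][3], 3))  # 0.9 0.5
```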
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=50,
        pretrained=None,
        in_channels=17,
        base_channels=32,
        num_stages=3,
        out_indices=(2, ),
        stage_blocks=(4, 6, 3),
        conv1_stride_s=1,
        pool1_stride_s=1,
        inflate=(0, 1, 1),
        spatial_strides=(2, 2, 2),
        temporal_strides=(1, 1, 2),
        dilations=(1, 1, 1)),
    cls_head=dict(
        type='I3DHead',
        in_channels=512,
        num_classes=99,
        spatial_type='avg',
        dropout_ratio=0.5),
    train_cfg=dict(),
    test_cfg=dict(average_clips='prob'))
dataset_type = 'PoseDataset'
ann_file_train = 'data/posec3d/gym_train.pkl'
ann_file_val = 'data/posec3d/gym_val.pkl'
left_kp = [1, 3, 5, 7, 9, 11, 13, 15]
right_kp = [2, 4, 6, 8, 10, 12, 14, 16]
skeletons = [[0, 5], [0, 6], [5, 7], [7, 9], [6, 8], [8, 10], [5, 11],
             [11, 13], [13, 15], [6, 12], [12, 14], [14, 16], [0, 1], [0, 2],
             [1, 3], [2, 4], [11, 12]]
train_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='RandomResizedCrop', area_range=(0.56, 1.0)),
    dict(type='Resize', scale=(56, 56), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=False,
        with_limb=True,
        skeletons=skeletons),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=False,
        with_limb=True,
        skeletons=skeletons),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=False,
        with_limb=True,
        skeletons=skeletons,
        double=True,
        left_kp=left_kp,
        right_kp=right_kp),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=16,
    workers_per_gpu=2,
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix='',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix='',
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix='',
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD', lr=0.2, momentum=0.9,
    weight_decay=0.0003)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0)
total_epochs = 240
checkpoint_config = dict(interval=10)
workflow = [('train', 10)]
evaluation = dict(
    interval=10,
    metrics=['top_k_accuracy', 'mean_class_accuracy'],
    topk=(1, 5))
log_config = dict(
    interval=20, hooks=[
        dict(type='TextLoggerHook'),
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/posec3d/slowonly_r50_u48_240e_gym_limb'
load_from = None
resume_from = None
find_unused_parameters = False
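The limb variant replaces keypoint heatmaps with limb heatmaps: `with_limb=True` draws one channel per entry of `skeletons`, and that list has exactly 17 pairs, so `in_channels=17` stays the same as in the keypoint configs. The sketch below renders a single limb as a Gaussian of each pixel's distance to the segment between its two keypoints. Scaling by the weaker of the two endpoint scores is an assumption about how `use_score` applies to limbs, and the helper names are hypothetical.

```python
import math
from typing import List, Tuple

# Limb list copied from the config: pairs of COCO keypoint indices.
SKELETONS = [[0, 5], [0, 6], [5, 7], [7, 9], [6, 8], [8, 10], [5, 11],
             [11, 13], [13, 15], [6, 12], [12, 14], [14, 16], [0, 1], [0, 2],
             [1, 3], [2, 4], [11, 12]]

Point = Tuple[float, float]


def point_to_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate limb: both endpoints coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def limb_channel(height: int, width: int, a: Point, b: Point,
                 score_a: float, score_b: float,
                 sigma: float = 0.6) -> List[List[float]]:
    """Gaussian 'tube' around one limb, scaled by the weaker endpoint score."""
    peak = min(score_a, score_b)
    return [[peak * math.exp(-point_to_segment_distance((x, y), a, b) ** 2
                             / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


if __name__ == '__main__':
    print(len(SKELETONS))  # 17, matching in_channels=17
    hm = limb_channel(5, 5, a=(0, 2), b=(4, 2), score_a=0.9, score_b=0.7)
    print([round(v, 2) for v in hm[2]])  # [0.7, 0.7, 0.7, 0.7, 0.7]
```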
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=50,
        pretrained=None,
        in_channels=17,
        base_channels=32,
        num_stages=3,
        out_indices=(2, ),
        stage_blocks=(4, 6, 3),
        conv1_stride_s=1,
        pool1_stride_s=1,
        inflate=(0, 1, 1),
        spatial_strides=(2, 2, 2),
        temporal_strides=(1, 1, 2),
        dilations=(1, 1, 1)),
    cls_head=dict(
        type='I3DHead',
        in_channels=512,
        num_classes=120,
        spatial_type='avg',
        dropout_ratio=0.5),
    train_cfg=dict(),
    test_cfg=dict(average_clips='prob'))
dataset_type = 'PoseDataset'
ann_file_train = 'data/posec3d/ntu120_xsub_train.pkl'
ann_file_val = 'data/posec3d/ntu120_xsub_val.pkl'
left_kp = [1, 3, 5, 7, 9, 11, 13, 15]
right_kp = [2, 4, 6, 8, 10, 12, 14, 16]
train_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='RandomResizedCrop', area_range=(0.56, 1.0)),
    dict(type='Resize', scale=(56, 56), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False,
        double=True,
        left_kp=left_kp,
        right_kp=right_kp),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=16,
    workers_per_gpu=2,
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix='',
        class_prob={i: 1 + int(i >= 60)
                    for i in range(120)},
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix='',
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix='',
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD', lr=0.2, momentum=0.9,
    weight_decay=0.0003)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0)
total_epochs = 240
checkpoint_config = dict(interval=10)
workflow = [('train', 10)]
evaluation = dict(
    interval=10,
    metrics=['top_k_accuracy', 'mean_class_accuracy'],
    topk=(1, 5))
log_config = dict(
    interval=20, hooks=[
        dict(type='TextLoggerHook'),
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint'
load_from = None
resume_from = None
find_unused_parameters = False
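In the NTU120 configs, `class_prob={i: 1 + int(i >= 60) for i in range(120)}` gives weight 1 to labels 0-59 (the action classes NTU120 shares with NTU60) and weight 2 to labels 60-119 (the classes added in NTU120). How MMAction2 turns these weights into a sampler is not shown in this config, so the sketch below only illustrates one simple reading, in which each training sample is drawn with probability proportional to its class weight. `class_draw_probability` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable

# Same weights as class_prob in the NTU120 configs.
CLASS_PROB = {i: 1 + int(i >= 60) for i in range(120)}


def class_draw_probability(labels: Iterable[int],
                           class_prob: Dict[int, float]) -> Dict[int, float]:
    """Chance of drawing each class if each sample is weighted by its class."""
    counts = Counter(labels)
    weighted = {c: n * class_prob[c] for c, n in counts.items()}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


if __name__ == '__main__':
    # Hypothetical toy labels: class 0 has twice as many samples as class 60,
    # but the doubled weight of class 60 evens out the draw probabilities.
    print(class_draw_probability([0, 0, 60], CLASS_PROB))  # {0: 0.5, 60: 0.5}
```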
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=50,
        pretrained=None,
        in_channels=17,
        base_channels=32,
        num_stages=3,
        out_indices=(2, ),
        stage_blocks=(4, 6, 3),
        conv1_stride_s=1,
        pool1_stride_s=1,
        inflate=(0, 1, 1),
        spatial_strides=(2, 2, 2),
        temporal_strides=(1, 1, 2),
        dilations=(1, 1, 1)),
    cls_head=dict(
        type='I3DHead',
        in_channels=512,
        num_classes=120,
        spatial_type='avg',
        dropout_ratio=0.5),
    train_cfg=dict(),
    test_cfg=dict(average_clips='prob'))
dataset_type = 'PoseDataset'
ann_file_train = 'data/posec3d/ntu120_xsub_train.pkl'
ann_file_val = 'data/posec3d/ntu120_xsub_val.pkl'
left_kp = [1, 3, 5, 7, 9, 11, 13, 15]
right_kp = [2, 4, 6, 8, 10, 12, 14, 16]
skeletons = [[0, 5], [0, 6], [5, 7], [7, 9], [6, 8], [8, 10], [5, 11],
             [11, 13], [13, 15], [6, 12], [12, 14], [14, 16], [0, 1], [0, 2],
             [1, 3], [2, 4], [11, 12]]
train_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='RandomResizedCrop', area_range=(0.56, 1.0)),
    dict(type='Resize', scale=(56, 56), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=False,
        with_limb=True,
        skeletons=skeletons),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=False,
        with_limb=True,
        skeletons=skeletons),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=False,
        with_limb=True,
        skeletons=skeletons,
        double=True,
        left_kp=left_kp,
        right_kp=right_kp),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=16,
    workers_per_gpu=2,
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix='',
        class_prob={i: 1 + int(i >= 60)
                    for i in range(120)},
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix='',
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix='',
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD', lr=0.2, momentum=0.9,
    weight_decay=0.0003)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0)
total_epochs = 240
checkpoint_config = dict(interval=10)
workflow = [('train', 10)]
evaluation = dict(
    interval=10,
    metrics=['top_k_accuracy', 'mean_class_accuracy'],
    topk=(1, 5))
log_config = dict(
    interval=20, hooks=[
        dict(type='TextLoggerHook'),
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/posec3d/slowonly_r50_u48_240e_ntu120_xsub_limb'
load_from = None
resume_from = None
find_unused_parameters = False
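At test time each video is sampled as 10 clips (`num_clips=10`), and `double=True` in `GeneratePoseTarget`, as we understand it, also renders a left-right flipped copy of every clip using `left_kp`/`right_kp`, giving 20 views per video. `test_cfg=dict(average_clips='prob')` then combines the views by averaging softmax probabilities rather than raw scores. The sketch below shows that averaging step in isolation; the function names and the two-views-per-clip count are assumptions for illustration.

```python
import math
from typing import List, Sequence

NUM_CLIPS = 10      # UniformSampleFrames(num_clips=10) in test_pipeline
VIEWS_PER_CLIP = 2  # double=True: original plus flipped heatmaps (assumed)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one view's class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_clips_prob(clip_logits: Sequence[Sequence[float]]) -> List[float]:
    """Video-level scores: softmax each view, then average over all views."""
    probs = [softmax(logits) for logits in clip_logits]
    num_views = len(probs)
    return [sum(p[c] for p in probs) / num_views for c in range(len(probs[0]))]


if __name__ == '__main__':
    # Hypothetical logits for 3 classes, repeated for all 20 views.
    views = [[2.0, 0.0, -1.0]] * (NUM_CLIPS * VIEWS_PER_CLIP)
    scores = average_clips_prob(views)
    print(len(views), [round(s, 3) for s in scores])
```

Averaging probabilities keeps one overconfident view from dominating the video-level prediction as much as it could when raw scores are summed.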
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=50,
        pretrained=None,
        in_channels=17,
        base_channels=32,
        num_stages=3,
        out_indices=(2, ),
        stage_blocks=(4, 6, 3),
        conv1_stride_s=1,
        pool1_stride_s=1,
        inflate=(0, 1, 1),
        spatial_strides=(2, 2, 2),
        temporal_strides=(1, 1, 2),
        dilations=(1, 1, 1)),
    cls_head=dict(
        type='I3DHead',
        in_channels=512,
        num_classes=60,
        spatial_type='avg',
        dropout_ratio=0.5),
    train_cfg=dict(),
    test_cfg=dict(average_clips='prob'))
dataset_type = 'PoseDataset'
ann_file_train = 'data/posec3d/ntu60_xsub_train.pkl'
ann_file_val = 'data/posec3d/ntu60_xsub_val.pkl'
left_kp = [1, 3, 5, 7, 9, 11, 13, 15]
right_kp = [2, 4, 6, 8, 10, 12, 14, 16]
train_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='RandomResizedCrop', area_range=(0.56, 1.0)),
    dict(type='Resize', scale=(56, 56), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False,
        double=True,
        left_kp=left_kp,
        right_kp=right_kp),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=16,
    workers_per_gpu=2,
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix='',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix='',
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix='',
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD', lr=0.2, momentum=0.9,
    weight_decay=0.0003)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0)
total_epochs = 240
checkpoint_config = dict(interval=10)
workflow = [('train', 10)]
evaluation = dict(
    interval=10,
    metrics=['top_k_accuracy', 'mean_class_accuracy'],
    topk=(1, 5))
log_config = dict(
    interval=20, hooks=[
        dict(type='TextLoggerHook'),
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint'
load_from = None
resume_from = None
find_unused_parameters = False
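The 240-epoch configs use `lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0)`. With `by_epoch=False` the learning rate is updated every iteration instead of once per epoch, following a half-cosine curve from `lr=0.2` down to `min_lr=0`. The sketch below evaluates the standard cosine-annealing formula; `ITERS_PER_EPOCH` is a hypothetical placeholder, since the real value depends on the dataset size, the number of GPUs and `videos_per_gpu`.

```python
import math

BASE_LR = 0.2          # optimizer.lr in the 240-epoch configs
TOTAL_EPOCHS = 240     # total_epochs
ITERS_PER_EPOCH = 100  # hypothetical; depends on dataset and batch size


def cosine_lr(base_lr: float, progress: int, max_progress: int,
              min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate after `progress` of `max_progress` steps."""
    factor = progress / max_progress
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * factor))


if __name__ == '__main__':
    max_iters = TOTAL_EPOCHS * ITERS_PER_EPOCH
    for epoch in (0, 60, 120, 180, 240):
        it = epoch * ITERS_PER_EPOCH
        print(epoch, round(cosine_lr(BASE_LR, it, max_iters), 4))
```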
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=50,
        pretrained=None,
        in_channels=17,
        base_channels=32,
        num_stages=3,
        out_indices=(2, ),
        stage_blocks=(4, 6, 3),
        conv1_stride_s=1,
        pool1_stride_s=1,
        inflate=(0, 1, 1),
        spatial_strides=(2, 2, 2),
        temporal_strides=(1, 1, 2),
        dilations=(1, 1, 1)),
    cls_head=dict(
        type='I3DHead',
        in_channels=512,
        num_classes=60,
        spatial_type='avg',
        dropout_ratio=0.5),
    train_cfg=dict(),
    test_cfg=dict(average_clips='prob'))
dataset_type = 'PoseDataset'
ann_file_train = 'data/posec3d/ntu60_xsub_train.pkl'
ann_file_val = 'data/posec3d/ntu60_xsub_val.pkl'
left_kp = [1, 3, 5, 7, 9, 11, 13, 15]
right_kp = [2, 4, 6, 8, 10, 12, 14, 16]
skeletons = [[0, 5], [0, 6], [5, 7], [7, 9], [6, 8], [8, 10], [5, 11],
             [11, 13], [13, 15], [6, 12], [12, 14], [14, 16], [0, 1], [0, 2],
             [1, 3], [2, 4], [11, 12]]
train_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='RandomResizedCrop', area_range=(0.56, 1.0)),
    dict(type='Resize', scale=(56, 56), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5, left_kp=left_kp, right_kp=right_kp),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=False,
        with_limb=True,
        skeletons=skeletons),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=False,
        with_limb=True,
        skeletons=skeletons),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=False,
        with_limb=True,
        skeletons=skeletons,
        double=True,
        left_kp=left_kp,
        right_kp=right_kp),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=16,
    workers_per_gpu=2,
    test_dataloader=dict(videos_per_gpu=1),
train=dict(
type=dataset_type,
ann_file=ann_file_train,
data_prefix='',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix='',
pipeline=val_pipeline),
test=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix='',
pipeline=test_pipeline))
# optimizer
optimizer = dict(
type='SGD', lr=0.2, momentum=0.9,
weight_decay=0.0003) # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0)
total_epochs = 240
checkpoint_config = dict(interval=10)
workflow = [('train', 10)]
evaluation = dict(
interval=10,
metrics=['top_k_accuracy', 'mean_class_accuracy'],
topk=(1, 5))
log_config = dict(
interval=20, hooks=[
dict(type='TextLoggerHook'),
])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/posec3d/slowonly_r50_u48_240e_ntu60_xsub_limb'
load_from = None
resume_from = None
find_unused_parameters = False
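Unlike the keypoint config, this one sets `with_kp=False, with_limb=True`. As a result, `GeneratePoseTarget` produces one heatmap channel per pair in `skeletons`, which gives 17 limbs and matches the backbone's `in_channels=17`. The PoseC3D paper describes a limb heatmap as a Gaussian of each pixel's distance to the segment joining two keypoints. The NumPy sketch below illustrates that idea. The function names are hypothetical. Using the smaller of the two endpoint scores as the limb confidence is an assumption, not a reading of mmaction2's code.

```python
import numpy as np

# Limb list copied from the config above (COCO 17-keypoint indices).
SKELETONS = [[0, 5], [0, 6], [5, 7], [7, 9], [6, 8], [8, 10], [5, 11],
             [11, 13], [13, 15], [6, 12], [12, 14], [14, 16], [0, 1], [0, 2],
             [1, 3], [2, 4], [11, 12]]


def limb_heatmap(h, w, start, end, sigma=0.6, score=1.0):
    """Gaussian of every pixel's distance to the segment start -> end.

    start, end: (x, y) joint positions in heatmap pixel coordinates.
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    pix = np.stack([xs, ys], axis=-1)                 # (h, w, 2) pixel coords
    a = np.asarray(start, dtype=np.float64)
    b = np.asarray(end, dtype=np.float64)
    ab = b - a
    length_sq = float(ab @ ab)
    if length_sq < 1e-12:                             # degenerate limb: one point
        t = np.zeros((h, w))
    else:
        # Project each pixel onto the line, then clamp to the segment.
        t = np.clip(((pix - a) @ ab) / length_sq, 0.0, 1.0)
    nearest = a + t[..., None] * ab                   # closest point on segment
    dist_sq = np.sum((pix - nearest) ** 2, axis=-1)
    return score * np.exp(-dist_sq / (2 * sigma ** 2))


def limb_heatmaps(h, w, keypoints, scores, skeletons=SKELETONS, sigma=0.6):
    """Stack one heatmap per limb into an array of shape (num_limbs, h, w)."""
    maps = [limb_heatmap(h, w, keypoints[i], keypoints[j], sigma,
                         min(scores[i], scores[j]))   # assumed limb confidence
            for i, j in skeletons]
    return np.stack(maps)


# Random 17-keypoint pose on a 64x64 grid (the val/test crop size).
rng = np.random.default_rng(0)
kps = rng.uniform(0, 63, size=(17, 2))
conf = rng.uniform(0.5, 1.0, size=17)
print(limb_heatmaps(64, 64, kps, conf).shape)  # (17, 64, 64)
```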
# STGCN
[Spatial temporal graph convolutional networks for skeleton-based action recognition](https://ojs.aaai.org/index.php/AAAI/article/view/12328)
<!-- [ALGORITHM] -->
## Abstract
<!-- [ABSTRACT] -->
Dynamics of human body skeletons convey significant information for human action recognition. Conventional approaches for modeling skeletons usually rely on hand-crafted parts or traversal rules, thus resulting in limited expressive power and difficulties of generalization. In this work, we propose a novel model of dynamic skeletons called Spatial-Temporal Graph Convolutional Networks (ST-GCN), which moves beyond the limitations of previous methods by automatically learning both the spatial and temporal patterns from data. This formulation not only leads to greater expressive power but also stronger generalization capability. On two large datasets, Kinetics and NTU-RGBD, it achieves substantial improvements over mainstream methods.
<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/34324155/142995893-d6618728-072c-46e1-b276-9b88cf21a01c.png" width="800"/>
</div>
## Results and Models
### NTU60_XSub
| config | keypoint | gpus | backbone | Top-1 | ckpt | log | json |
| :---------------------------------------------------------------------------------------------- | :------: | :--: | :------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------: |
| [stgcn_80e_ntu60_xsub_keypoint](/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py) | 2d | 2 | STGCN | 86.91 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint-e7bb9653.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint.json) |
| [stgcn_80e_ntu60_xsub_keypoint_3d](/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d.py) | 3d | 1 | STGCN | 84.61 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d-13e7ccf0.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d.log) | [json](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d.json) |
### BABEL
| config | gpus | backbone | Top-1 | Mean Top-1 | Top-1 Official (AGCN) | Mean Top-1 Official (AGCN) | ckpt | log |
| --------------------------------------------------------------------------- | :--: | :------: | :-------: | :--------: | :-------------------: | :------------------------: | :-----------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------: |
| [stgcn_80e_babel60](/configs/skeleton/stgcn/stgcn_80e_babel60.py) | 8 | ST-GCN | **42.39** | **28.28** | 41.14 | 24.46 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60-3d206418.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60.log) |
| [stgcn_80e_babel60_wfl](/configs/skeleton/stgcn/stgcn_80e_babel60_wfl.py) | 8 | ST-GCN | **40.31** | 29.79 | 33.41 | **30.42** | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60_wfl/stgcn_80e_babel60_wfl-1a9102d7.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60_wfl.log) |
| [stgcn_80e_babel120](/configs/skeleton/stgcn/stgcn_80e_babel120.py) | 8 | ST-GCN | **38.95** | **20.58** | 38.41 | 17.56 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120/stgcn_80e_babel120-e41eb6d7.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel120.log) |
| [stgcn_80e_babel120_wfl](/configs/skeleton/stgcn/stgcn_80e_babel120_wfl.py) | 8 | ST-GCN | **33.00** | 24.33 | 27.91 | **26.17**\* | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120_wfl/stgcn_80e_babel120_wfl-3f2c100d.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel120_wfl.log) |
\* This number is copied from the [paper](https://arxiv.org/pdf/2106.09696.pdf); the [released checkpoints](https://github.com/abhinanda-punnakkal/BABEL/tree/main/action_recognition) for BABEL-120 perform worse.
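BABEL is long-tailed, so the table reports two metrics:

- **Top-1:** overall accuracy.
- **Mean Top-1:** accuracy averaged over classes, which corresponds to the `mean_class_accuracy` metric in the configs.

The `_wfl` rows give up some overall Top-1 in exchange for a higher Mean Top-1. Below is a minimal NumPy sketch of the two metrics. It was written independently of mmaction2's evaluation functions, and the function names are hypothetical.

```python
import numpy as np


def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    return float(np.mean(np.argmax(scores, axis=1) == labels))


def mean_class_accuracy(scores, labels):
    """Top-1 accuracy computed per class, then averaged over classes in labels."""
    preds = np.argmax(scores, axis=1)
    per_class = [np.mean(preds[labels == c] == c) for c in np.unique(labels)]
    return float(np.mean(per_class))


# Toy long-tailed split: 8 samples of a head class, 2 of a tail class,
# and a model that always predicts the head class.
labels = np.array([0] * 8 + [1] * 2)
scores = np.tile([0.9, 0.1], (len(labels), 1))
print(top1_accuracy(scores, labels))        # 0.8
print(mean_class_accuracy(scores, labels))  # 0.5
```

The toy model scores well on Top-1 by ignoring the tail class entirely. Mean Top-1 exposes that failure.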
## Train
You can use the following command to train a model.
```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```
Example: train the STGCN model on the NTU60 dataset with deterministic settings and periodic validation.
```shell
python tools/train.py configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \
--work-dir work_dirs/stgcn_80e_ntu60_xsub_keypoint \
--validate --seed 0 --deterministic
```
For more details, refer to the **Training setting** section in [getting_started](/docs/en/getting_started.md#training-setting).
## Test
You can use the following command to test a model.
```shell
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
```
Example: test the STGCN model on the NTU60 dataset and dump the results to a pickle file.
```shell
python tools/test.py configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \
checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \
--out result.pkl
```
For more details, refer to the **Test a dataset** section in [getting_started](/docs/en/getting_started.md#test-a-dataset).
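The file written by `--out result.pkl` can be post-processed offline. This sketch assumes the file holds a list with one class-score array per test sample, in the same order as the annotation file. That is an assumption about the dump format, so check it against your mmaction2 version. The sketch recovers the top-1 and top-k predicted class indices. The function names are hypothetical, and the demo writes a tiny stand-in file instead of reading a real one.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def predictions_from_outputs(outputs, topk=5):
    """Turn per-sample class-score arrays into (top-1, top-k) class indices."""
    scores = np.stack([np.asarray(o, dtype=np.float64).ravel() for o in outputs])
    topk_idx = np.argsort(-scores, axis=1)[:, :topk]  # highest score first
    return topk_idx[:, 0], topk_idx


def load_predictions(path, topk=5):
    """Load a dumped result file, assumed to be a pickled list of score arrays."""
    with open(path, "rb") as f:
        outputs = pickle.load(f)
    return predictions_from_outputs(outputs, topk)


# Self-contained demo: write a tiny stand-in for result.pkl and read it back.
demo = [np.array([0.1, 0.7, 0.2]), np.array([0.5, 0.2, 0.3])]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "result.pkl"
    path.write_bytes(pickle.dumps(demo))
    top1, top2 = load_predictions(path, topk=2)
print(top1)  # [1 0]
```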
## Citation
```BibTeX
@inproceedings{yan2018spatial,
title={Spatial temporal graph convolutional networks for skeleton-based action recognition},
author={Yan, Sijie and Xiong, Yuanjun and Lin, Dahua},
booktitle={Thirty-second AAAI conference on artificial intelligence},
year={2018}
}
```
# STGCN
## Introduction
<!-- [ALGORITHM] -->
```BibTeX
@inproceedings{yan2018spatial,
title={Spatial temporal graph convolutional networks for skeleton-based action recognition},
author={Yan, Sijie and Xiong, Yuanjun and Lin, Dahua},
booktitle={Thirty-second AAAI conference on artificial intelligence},
year={2018}
}
```
## Model Zoo
### NTU60_XSub
| config | keypoint | gpus | backbone | Top-1 accuracy | ckpt | log | json |
| :---------------------------------------------------------------------------------------------- | :----: | :------: | :------: | :----------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------: |
| [stgcn_80e_ntu60_xsub_keypoint](/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py) | 2d | 2 | STGCN | 86.91 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint-e7bb9653.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint.log) | [json](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint.json) |
| [stgcn_80e_ntu60_xsub_keypoint_3d](/configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d.py) | 3d | 1 | STGCN | 84.61 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d-13e7ccf0.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d.log) | [json](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d.json) |
### BABEL
| config | gpus | backbone | Top-1 accuracy | Mean class Top-1 accuracy | Top-1 accuracy <br>(official, using AGCN) | Mean class Top-1 accuracy<br>(official, using AGCN) | ckpt | log |
| --------------------------------------------------------------------------- | :------: | :------: | :----------: | :-----------------: | :----------------------------------: | :----------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------: |
| [stgcn_80e_babel60](/configs/skeleton/stgcn/stgcn_80e_babel60.py) | 8 | ST-GCN | **42.39** | **28.28** | 41.14 | 24.46 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60-3d206418.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60.log) |
| [stgcn_80e_babel60_wfl](/configs/skeleton/stgcn/stgcn_80e_babel60_wfl.py) | 8 | ST-GCN | **40.31** | 29.79 | 33.41 | **30.42** | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60_wfl/stgcn_80e_babel60_wfl-1a9102d7.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60_wfl.log) |
| [stgcn_80e_babel120](/configs/skeleton/stgcn/stgcn_80e_babel120.py) | 8 | ST-GCN | **38.95** | **20.58** | 38.41 | 17.56 | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120/stgcn_80e_babel120-e41eb6d7.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel120.log) |
| [stgcn_80e_babel120_wfl](/configs/skeleton/stgcn/stgcn_80e_babel120_wfl.py) | 8 | ST-GCN | **33.00** | 24.33 | 27.91 | **26.17**\* | [ckpt](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120_wfl/stgcn_80e_babel120_wfl-3f2c100d.pth) | [log](https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel120_wfl.log) |
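\* Note: this number is quoted from the original [paper](https://arxiv.org/pdf/2106.09696.pdf); the publicly released [model weights](https://github.com/abhinanda-punnakkal/BABEL/tree/main/action_recognition) are somewhat less accurate.
## How to Train
Use the following command to train a model.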
```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```
Example: train the STGCN model on the NTU60 dataset with deterministic settings and periodic validation.
```shell
python tools/train.py configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \
--work-dir work_dirs/stgcn_80e_ntu60_xsub_keypoint \
--validate --seed 0 --deterministic
```
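For more training details, refer to the **Training setting** section in the [getting started tutorial](/docs/zh_cn/getting_started.md#训练配置).
## How to Test
Use the following command to test a model.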
```shell
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
```
Example: test the STGCN model on the NTU60 dataset and dump the results to a pickle file.
```shell
python tools/test.py configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py \
checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \
--out result.pkl
```
For more testing details, refer to the **Test a dataset** section in the [getting started tutorial](/docs/zh_cn/getting_started.md#测试某个数据集).
Collections:
- Name: STGCN
  README: configs/skeleton/stgcn/README.md
Models:
- Config: configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint.py
  In Collection: STGCN
  Metadata:
    Architecture: STGCN
    Batch Size: 16
    Epochs: 80
    Parameters: 3088704
    Training Data: NTU60-XSub
    Training Resources: 2 GPUs
  Name: stgcn_80e_ntu60_xsub_keypoint
  Results:
    Dataset: NTU60-XSub
    Metrics:
      Top 1 Accuracy: 86.91
    Task: Skeleton-based Action Recognition
  Training Json Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint.json
  Training Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint/stgcn_80e_ntu60_xsub_keypoint-e7bb9653.pth
- Config: configs/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d.py
  In Collection: STGCN
  Metadata:
    Architecture: STGCN
    Batch Size: 32
    Epochs: 80
    Parameters: 3088704
    Training Data: NTU60-XSub
    Training Resources: 1 GPU
  Name: stgcn_80e_ntu60_xsub_keypoint_3d
  Results:
    Dataset: NTU60-XSub
    Metrics:
      Top 1 Accuracy: 84.61
    Task: Skeleton-based Action Recognition
  Training Json Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d.json
  Training Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_ntu60_xsub_keypoint_3d/stgcn_80e_ntu60_xsub_keypoint_3d-13e7ccf0.pth
- Config: configs/skeleton/stgcn/stgcn_80e_babel60.py
  In Collection: STGCN
  Metadata:
    Architecture: STGCN
    Batch Size: 128
    Epochs: 80
    Parameters: 3088704
    Training Data: BABEL60
    Training Resources: 8 GPU
  Name: stgcn_80e_babel60
  Results:
    Dataset: BABEL60
    Metrics:
      Top 1 Accuracy: 42.39
      Mean Top 1 Accuracy: 28.28
    Task: Skeleton-based Action Recognition
  Training Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60/stgcn_80e_babel60-3d206418.pth
- Config: configs/skeleton/stgcn/stgcn_80e_babel60_wfl.py
  In Collection: STGCN
  Metadata:
    Architecture: STGCN
    Batch Size: 128
    Epochs: 80
    Parameters: 3088704
    Training Data: BABEL60
    Training Resources: 8 GPU
  Name: stgcn_80e_babel60_wfl
  Results:
    Dataset: BABEL60
    Metrics:
      Top 1 Accuracy: 40.31
      Mean Top 1 Accuracy: 29.79
    Task: Skeleton-based Action Recognition
  Training Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60_wfl/stgcn_80e_babel60_wfl.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel60_wfl/stgcn_80e_babel60_wfl-1a9102d7.pth
- Config: configs/skeleton/stgcn/stgcn_80e_babel120.py
  In Collection: STGCN
  Metadata:
    Architecture: STGCN
    Batch Size: 128
    Epochs: 80
    Parameters: 3104320
    Training Data: BABEL120
    Training Resources: 8 GPU
  Name: stgcn_80e_babel120
  Results:
    Dataset: BABEL120
    Metrics:
      Top 1 Accuracy: 38.95
      Mean Top 1 Accuracy: 20.58
    Task: Skeleton-based Action Recognition
  Training Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120/stgcn_80e_babel120.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120/stgcn_80e_babel120-e41eb6d7.pth
- Config: configs/skeleton/stgcn/stgcn_80e_babel120_wfl.py
  In Collection: STGCN
  Metadata:
    Architecture: STGCN
    Batch Size: 128
    Epochs: 80
    Parameters: 3104320
    Training Data: BABEL120
    Training Resources: 8 GPU
  Name: stgcn_80e_babel120_wfl
  Results:
    Dataset: BABEL120
    Metrics:
      Top 1 Accuracy: 33.00
      Mean Top 1 Accuracy: 24.33
    Task: Skeleton-based Action Recognition
  Training Log: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120_wfl/stgcn_80e_babel120_wfl.log
  Weights: https://download.openmmlab.com/mmaction/skeleton/stgcn/stgcn_80e_babel120_wfl/stgcn_80e_babel120_wfl-3f2c100d.pth
model = dict(
type='SkeletonGCN',
backbone=dict(
type='STGCN',
in_channels=3,
edge_importance_weighting=True,
graph_cfg=dict(layout='ntu-rgb+d', strategy='spatial')),
cls_head=dict(
type='STGCNHead',
num_classes=120,
in_channels=256,
num_person=1,
loss_cls=dict(type='CrossEntropyLoss')),
train_cfg=None,
test_cfg=None)
dataset_type = 'PoseDataset'
ann_file_train = 'data/babel/babel120_train.pkl'
ann_file_val = 'data/babel/babel120_val.pkl'
train_pipeline = [
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM', num_person=1),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
val_pipeline = [
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM', num_person=1),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
test_pipeline = [
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM', num_person=1),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
data = dict(
videos_per_gpu=16,
workers_per_gpu=2,
test_dataloader=dict(videos_per_gpu=1),
train=dict(
type='RepeatDataset',
times=5,
dataset=dict(
type=dataset_type,
ann_file=ann_file_train,
data_prefix='',
pipeline=train_pipeline)),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix='',
pipeline=val_pipeline),
test=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix='',
pipeline=test_pipeline))
# optimizer
optimizer = dict(
type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(policy='step', step=[10, 14])
total_epochs = 16
checkpoint_config = dict(interval=1)
evaluation = dict(
interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'])
log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/stgcn_80e_babel120'
load_from = None
resume_from = None
workflow = [('train', 1)]
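The config name says 80 epochs, yet `total_epochs = 16`. The reason is that the training set is wrapped in `RepeatDataset` with `times=5`, so each of the 16 runner epochs iterates over the data five times, for 16 × 5 = 80 passes. By the same arithmetic, the `step` milestones at runner epochs 10 and 14 fall after 50 and 70 passes. The sketch below assumes 0-based runner epochs and mmcv's default step decay factor of 0.1, which this config does not set explicitly. The helper names are hypothetical.

```python
def effective_epochs(total_epochs: int, repeat_times: int) -> int:
    """Passes over the original training set when it is wrapped in RepeatDataset."""
    return total_epochs * repeat_times


def step_lr(base_lr: float, epoch: int, steps=(10, 14), gamma: float = 0.1) -> float:
    """Step decay: multiply by gamma once per milestone reached (epoch is 0-based)."""
    num_decays = sum(epoch >= s for s in steps)
    return base_lr * gamma ** num_decays


repeat = 5
print(effective_epochs(16, repeat))        # 80
print([s * repeat for s in (10, 14)])      # [50, 70]: passes completed before each drop
print(step_lr(0.1, 9), step_lr(0.1, 10))   # lr just before and after the first drop
```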
samples_per_cls = [
518, 1993, 6260, 508, 208, 3006, 431, 724, 4527, 2131, 199, 1255, 487, 302,
136, 571, 267, 646, 1180, 405, 72, 731, 842, 1619, 271, 27, 1198, 1012,
110, 865, 462, 526, 405, 487, 101, 24, 84, 64, 168, 271, 609, 503, 76, 167,
415, 137, 421, 283, 2069, 715, 196, 66, 44, 989, 122, 43, 599, 396, 245,
380, 34, 236, 260, 325, 127, 133, 119, 66, 125, 50, 206, 191, 394, 69, 98,
145, 38, 21, 29, 64, 277, 65, 39, 31, 35, 85, 54, 80, 133, 66, 39, 64, 268,
34, 172, 54, 33, 21, 110, 19, 40, 55, 146, 39, 37, 75, 101, 20, 46, 55, 43,
21, 43, 87, 29, 36, 24, 37, 28, 39
]
model = dict(
type='SkeletonGCN',
backbone=dict(
type='STGCN',
in_channels=3,
edge_importance_weighting=True,
graph_cfg=dict(layout='ntu-rgb+d', strategy='spatial')),
cls_head=dict(
type='STGCNHead',
num_classes=120,
in_channels=256,
num_person=1,
loss_cls=dict(type='CBFocalLoss', samples_per_cls=samples_per_cls)),
train_cfg=None,
test_cfg=None)
dataset_type = 'PoseDataset'
ann_file_train = 'data/babel/babel120_train.pkl'
ann_file_val = 'data/babel/babel120_val.pkl'
train_pipeline = [
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM', num_person=1),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
val_pipeline = [
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM', num_person=1),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
test_pipeline = [
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM', num_person=1),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
data = dict(
videos_per_gpu=16,
workers_per_gpu=2,
test_dataloader=dict(videos_per_gpu=1),
train=dict(
type='RepeatDataset',
times=5,
dataset=dict(
type=dataset_type,
ann_file=ann_file_train,
data_prefix='',
pipeline=train_pipeline)),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix='',
pipeline=val_pipeline),
test=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix='',
pipeline=test_pipeline))
# optimizer
optimizer = dict(
type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(policy='step', step=[10, 14])
total_epochs = 16
checkpoint_config = dict(interval=1)
evaluation = dict(
interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'])
log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/stgcn_80e_babel120_wfl/'
load_from = None
resume_from = None
workflow = [('train', 1)]
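The `_wfl` configs swap `CrossEntropyLoss` for `CBFocalLoss` and pass `samples_per_cls`, which appears to be the per-class training-sample counts (120 entries for BABEL-120). The class-balanced loss of Cui et al. ("Class-Balanced Loss Based on Effective Number of Samples", CVPR 2019) weights each class by the inverse of its effective number of samples, `E_n = (1 - beta**n) / (1 - beta)`. The sketch below computes such weights and normalizes them to sum to the number of classes. `beta=0.9999` is an assumed value and the function name is hypothetical. Treat it as an illustration of the formulation rather than mmaction2's code.

```python
import numpy as np


def class_balanced_weights(samples_per_cls, beta=0.9999):
    """Per-class weights proportional to 1 / effective number of samples."""
    n = np.asarray(samples_per_cls, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Normalize so the weights sum to the number of classes.
    return weights / weights.sum() * len(n)


# Head, mid and tail counts taken from the BABEL-120 samples_per_cls list above.
w = class_balanced_weights([6260, 518, 19])
print(w.round(3))
```

Rarer classes get larger weights, but the effective number saturates for very frequent classes. The weighting is therefore milder than plain inverse frequency.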
model = dict(
type='SkeletonGCN',
backbone=dict(
type='STGCN',
in_channels=3,
edge_importance_weighting=True,
graph_cfg=dict(layout='ntu-rgb+d', strategy='spatial')),
cls_head=dict(
type='STGCNHead',
num_classes=60,
in_channels=256,
num_person=1,
loss_cls=dict(type='CrossEntropyLoss')),
train_cfg=None,
test_cfg=None)
dataset_type = 'PoseDataset'
ann_file_train = 'data/babel/babel60_train.pkl'
ann_file_val = 'data/babel/babel60_val.pkl'
train_pipeline = [
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM', num_person=1),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
val_pipeline = [
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM', num_person=1),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
test_pipeline = [
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM', num_person=1),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
data = dict(
videos_per_gpu=16,
workers_per_gpu=2,
test_dataloader=dict(videos_per_gpu=1),
train=dict(
type='RepeatDataset',
times=5,
dataset=dict(
type=dataset_type,
ann_file=ann_file_train,
data_prefix='',
pipeline=train_pipeline)),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix='',
pipeline=val_pipeline),
test=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix='',
pipeline=test_pipeline))
# optimizer
optimizer = dict(
type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(policy='step', step=[10, 14])
total_epochs = 16
checkpoint_config = dict(interval=1)
evaluation = dict(
interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'])
log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/stgcn_80e_babel60'
load_from = None
resume_from = None
workflow = [('train', 1)]
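The GCN pipelines are minimal: `PoseDecode`, then `FormatGCNInput` with `input_format='NCTVM'` and `num_person=1`. In the batched tensor, the axes are:

- **N:** batch size.
- **C:** coordinate channels (`in_channels=3` here).
- **T:** frames.
- **V:** joints (25 in the `ntu-rgb+d` layout).
- **M:** persons.

The NumPy sketch below shows the per-sample rearrangement. It assumes keypoints are stored as (M, T, V, C), that extra persons are dropped, and that missing ones are zero-padded. The function name is hypothetical. Treat the sketch as an illustration of the layout, not as mmaction2's exact transform.

```python
import numpy as np


def format_gcn_input(keypoint, num_person=1):
    """Rearrange (M, T, V, C) keypoints to (C, T, V, M) with num_person persons."""
    m, t, v, c = keypoint.shape
    if m < num_person:
        # Zero-pad missing persons.
        pad = np.zeros((num_person - m, t, v, c), dtype=keypoint.dtype)
        keypoint = np.concatenate([keypoint, pad], axis=0)
    else:
        # Keep only the first num_person persons.
        keypoint = keypoint[:num_person]
    return np.transpose(keypoint, (3, 1, 2, 0))


sample = np.random.rand(2, 30, 25, 3).astype(np.float32)  # 2 persons, 30 frames
x = format_gcn_input(sample, num_person=1)
print(x.shape)  # (3, 30, 25, 1)
```

The data loader then stacks such samples along a new leading N axis.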
samples_per_cls = [
518, 1993, 6260, 508, 208, 3006, 431, 724, 4527, 2131, 199, 1255, 487, 302,
136, 571, 267, 646, 1180, 405, 731, 842, 1619, 271, 1198, 1012, 865, 462,
526, 405, 487, 168, 271, 609, 503, 167, 415, 421, 283, 2069, 715, 196, 989,
122, 599, 396, 245, 380, 236, 260, 325, 133, 206, 191, 394, 145, 277, 268,
172, 146
]
model = dict(
type='SkeletonGCN',
backbone=dict(
type='STGCN',
in_channels=3,
edge_importance_weighting=True,
graph_cfg=dict(layout='ntu-rgb+d', strategy='spatial')),
cls_head=dict(
type='STGCNHead',
num_classes=60,
in_channels=256,
num_person=1,
loss_cls=dict(type='CBFocalLoss', samples_per_cls=samples_per_cls)),
train_cfg=None,
test_cfg=None)
dataset_type = 'PoseDataset'
ann_file_train = 'data/babel/babel60_train.pkl'
ann_file_val = 'data/babel/babel60_val.pkl'
train_pipeline = [
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM', num_person=1),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
val_pipeline = [
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM', num_person=1),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
test_pipeline = [
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM', num_person=1),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
data = dict(
videos_per_gpu=16,
workers_per_gpu=2,
test_dataloader=dict(videos_per_gpu=1),
train=dict(
type='RepeatDataset',
times=5,
dataset=dict(
type=dataset_type,
ann_file=ann_file_train,
data_prefix='',
pipeline=train_pipeline)),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix='',
pipeline=val_pipeline),
test=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix='',
pipeline=test_pipeline))
# optimizer
optimizer = dict(
type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(policy='step', step=[10, 14])
total_epochs = 16
checkpoint_config = dict(interval=1)
evaluation = dict(
interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'])
log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/stgcn_80e_babel60_wfl/'
load_from = None
resume_from = None
workflow = [('train', 1)]
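The focal part of the loss scales each term by `(1 - p_t)**gamma`. Confidently correct predictions therefore contribute little, and hard samples carry more of the loss. Below is a NumPy sketch of a class-weighted, sigmoid-based focal loss. The following details are assumptions, and mmaction2's `CBFocalLoss` may differ in them:

- `gamma=2.0`;
- averaging over samples;
- the function name.

`class_weights` can be any per-class weight vector, for example class-balanced weights derived from `samples_per_cls`.

```python
import numpy as np


def sigmoid_focal_loss(logits, labels, class_weights, gamma=2.0):
    """Class-weighted sigmoid focal loss, summed over classes, averaged over samples.

    logits: (N, K) raw scores; labels: (N,) ints; class_weights: (K,) array.
    """
    n, k = logits.shape
    y = np.eye(k)[labels]                      # one-hot targets
    p = 1.0 / (1.0 + np.exp(-logits))          # per-class sigmoid probabilities
    eps = 1e-12
    bce = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    p_t = np.where(y == 1, p, 1 - p)           # probability of the correct outcome
    modulator = (1.0 - p_t) ** gamma           # near 0 for easy examples
    w = class_weights[labels][:, None]         # each sample uses its class weight
    return float(np.sum(w * modulator * bce) / n)


logits = np.array([[4.0, -4.0], [-4.0, 4.0]])
labels = np.array([0, 0])  # first sample is easy, second is misclassified
weights = np.ones(2)
print(sigmoid_focal_loss(logits[:1], labels[:1], weights))
print(sigmoid_focal_loss(logits[1:], labels[1:], weights))
```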