<!-- [ALGORITHM] -->
<details>
<summary align="right"><a href="http://openaccess.thecvf.com/content_cvpr_2014/html/Toshev_DeepPose_Human_Pose_2014_CVPR_paper.html">DeepPose (CVPR'2014)</a></summary>
```bibtex
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
```
</details>
<!-- [ALGORITHM] -->
<details>
<summary align="right"><a href="https://arxiv.org/abs/2107.11291">RLE (ICCV'2021)</a></summary>
```bibtex
@inproceedings{li2021human,
title={Human pose regression with residual log-likelihood estimation},
author={Li, Jiefeng and Bian, Siyuan and Zeng, Ailing and Wang, Can and Pang, Bo and Liu, Wentao and Lu, Cewu},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={11025--11034},
year={2021}
}
```
</details>
<!-- [BACKBONE] -->
<details>
<summary align="right"><a href="http://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html">ResNet (CVPR'2016)</a></summary>
```bibtex
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
```
</details>
<!-- [DATASET] -->
<details>
<summary align="right"><a href="https://link.springer.com/chapter/10.1007/978-3-319-10602-1_48">COCO (ECCV'2014)</a></summary>
```bibtex
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
```
</details>
Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
| Arch | Input Size | AP | AP<sup>50</sup> | AP<sup>75</sup> | AR | AR<sup>50</sup> | ckpt | log |
| :-------------------------------------------- | :--------: | :---: | :-------------: | :-------------: | :---: | :-------------: | :-------------------------------------------: | :-------------------------------------------: |
| [deeppose_resnet_50_rle](/configs/body/2d_kpt_sview_rgb_img/deeppose/coco/res50_coco_256x192_rle.py) | 256x192 | 0.704 | 0.883 | 0.777 | 0.751 | 0.920 | [ckpt](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res50_coco_256x192_rle-2ea9bb4a_20220616.pth) | [log](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res50_coco_256x192_rle_20220616.log.json) |
| [deeppose_resnet_101_rle](/configs/body/2d_kpt_sview_rgb_img/deeppose/coco/res101_coco_256x192_rle.py) | 256x192 | 0.722 | 0.894 | 0.794 | 0.768 | 0.930 | [ckpt](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res101_coco_256x192_rle-16c3d461_20220615.pth) | [log](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res101_coco_256x192_rle_20220615.log.json) |
| [deeppose_resnet_152_rle](/configs/body/2d_kpt_sview_rgb_img/deeppose/coco/res152_coco_256x192_rle.py) | 256x192 | 0.731 | 0.897 | 0.805 | 0.777 | 0.933 | [ckpt](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res152_coco_256x192_rle-c05bdccf_20220615.pth) | [log](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res152_coco_256x192_rle_20220615.log.json) |
| [deeppose_resnet_152_rle](/configs/body/2d_kpt_sview_rgb_img/deeppose/coco/res152_coco_384x288_rle.py) | 384x288 | 0.749 | 0.901 | 0.815 | 0.793 | 0.935 | [ckpt](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res152_coco_384x288_rle-b77c4c37_20220624.pth) | [log](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res152_coco_384x288_rle_20220624.log.json) |
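For quick reference, a checkpoint from the table above can be loaded for top-down inference with the MMPose 0.x Python API. A minimal sketch; the test image path and the person bounding box are placeholders, and a real pipeline would take boxes from a person detector:
```python
from mmpose.apis import inference_top_down_pose_model, init_pose_model

# config and checkpoint taken from the first row of the table above
config_file = ('configs/body/2d_kpt_sview_rgb_img/deeppose/coco/'
               'res50_coco_256x192_rle.py')
checkpoint_file = ('https://download.openmmlab.com/mmpose/top_down/deeppose/'
                   'deeppose_res50_coco_256x192_rle-2ea9bb4a_20220616.pth')

pose_model = init_pose_model(config_file, checkpoint_file, device='cpu')

# hypothetical person box in (x, y, w, h, score) format
person_results = [{'bbox': [50, 50, 200, 400, 0.99]}]
pose_results, _ = inference_top_down_pose_model(
    pose_model,
    'tests/data/coco/000000000785.jpg',  # placeholder image path
    person_results,
    format='xywh')

print(pose_results[0]['keypoints'].shape)  # (17, 3): x, y, score per joint
```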
Collections:
- Name: RLE
Paper:
Title: Human pose regression with residual log-likelihood estimation
URL: https://arxiv.org/abs/2107.11291
README: https://github.com/open-mmlab/mmpose/blob/master/docs/en/papers/techniques/rle.md
Models:
- Config: configs/body/2d_kpt_sview_rgb_img/deeppose/coco/res50_coco_256x192_rle.py
In Collection: RLE
Metadata:
Architecture: &id001
- DeepPose
- RLE
- ResNet
Training Data: COCO
Name: deeppose_res50_coco_256x192_rle
Results:
- Dataset: COCO
Metrics:
AP: 0.704
AP@0.5: 0.883
AP@0.75: 0.777
AR: 0.751
AR@0.5: 0.92
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res50_coco_256x192_rle-2ea9bb4a_20220616.pth
- Config: configs/body/2d_kpt_sview_rgb_img/deeppose/coco/res101_coco_256x192_rle.py
In Collection: RLE
Metadata:
Architecture: *id001
Training Data: COCO
Name: deeppose_res101_coco_256x192_rle
Results:
- Dataset: COCO
Metrics:
AP: 0.722
AP@0.5: 0.894
AP@0.75: 0.794
AR: 0.768
AR@0.5: 0.93
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res101_coco_256x192_rle-16c3d461_20220615.pth
- Config: configs/body/2d_kpt_sview_rgb_img/deeppose/coco/res152_coco_256x192_rle.py
In Collection: RLE
Metadata:
Architecture: *id001
Training Data: COCO
Name: deeppose_res152_coco_256x192_rle
Results:
- Dataset: COCO
Metrics:
AP: 0.731
AP@0.5: 0.897
AP@0.75: 0.805
AR: 0.777
AR@0.5: 0.933
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res152_coco_256x192_rle-c05bdccf_20220615.pth
- Config: configs/body/2d_kpt_sview_rgb_img/deeppose/coco/res152_coco_384x288_rle.py
In Collection: RLE
Metadata:
Architecture: *id001
Training Data: COCO
Name: deeppose_res152_coco_384x288_rle
Results:
- Dataset: COCO
Metrics:
AP: 0.749
AP@0.5: 0.901
AP@0.75: 0.815
AR: 0.793
AR@0.5: 0.935
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res152_coco_384x288_rle-b77c4c37_20220624.pth
_base_ = [
'../../../../_base_/default_runtime.py',
'../../../../_base_/datasets/mpii.py'
]
evaluation = dict(interval=10, metric='PCKh', save_best='PCKh')
optimizer = dict(
type='Adam',
lr=5e-4,
)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[170, 200])
total_epochs = 210
log_config = dict(
interval=50, hooks=[
dict(type='TextLoggerHook'),
])
channel_cfg = dict(
num_output_channels=16,
dataset_joints=16,
dataset_channel=list(range(16)),
inference_channel=list(range(16)))
# model settings
model = dict(
type='TopDown',
pretrained='torchvision://resnet101',
backbone=dict(type='ResNet', depth=101, num_stages=4, out_indices=(3, )),
neck=dict(type='GlobalAveragePooling'),
keypoint_head=dict(
type='DeepposeRegressionHead',
in_channels=2048,
num_joints=channel_cfg['num_output_channels'],
loss_keypoint=dict(type='SmoothL1Loss', use_target_weight=True)),
train_cfg=dict(),
test_cfg=dict(flip_test=True))
data_cfg = dict(
image_size=[256, 256],
heatmap_size=[64, 64],
num_output_channels=channel_cfg['num_output_channels'],
num_joints=channel_cfg['dataset_joints'],
dataset_channel=channel_cfg['dataset_channel'],
inference_channel=channel_cfg['inference_channel'],
use_gt_bbox=True,
bbox_file=None,
)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='TopDownGetBboxCenterScale', padding=1.25),
dict(type='TopDownRandomFlip', flip_prob=0.5),
dict(
type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
dict(type='TopDownAffine'),
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(type='TopDownGenerateTargetRegression'),
dict(
type='Collect',
keys=['img', 'target', 'target_weight'],
meta_keys=[
'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
'rotation', 'flip_pairs'
]),
]
val_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='TopDownGetBboxCenterScale', padding=1.25),
dict(type='TopDownAffine'),
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(
type='Collect',
keys=['img'],
meta_keys=['image_file', 'center', 'scale', 'rotation', 'flip_pairs']),
]
test_pipeline = val_pipeline
data_root = 'data/mpii'
data = dict(
samples_per_gpu=64,
workers_per_gpu=2,
val_dataloader=dict(samples_per_gpu=32),
test_dataloader=dict(samples_per_gpu=32),
train=dict(
type='TopDownMpiiDataset',
ann_file=f'{data_root}/annotations/mpii_train.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=train_pipeline,
dataset_info={{_base_.dataset_info}}),
val=dict(
type='TopDownMpiiDataset',
ann_file=f'{data_root}/annotations/mpii_val.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=val_pipeline,
dataset_info={{_base_.dataset_info}}),
test=dict(
type='TopDownMpiiDataset',
ann_file=f'{data_root}/annotations/mpii_val.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=test_pipeline,
dataset_info={{_base_.dataset_info}}),
)
_base_ = [
'../../../../_base_/default_runtime.py',
'../../../../_base_/datasets/mpii.py'
]
evaluation = dict(interval=10, metric='PCKh', save_best='PCKh')
optimizer = dict(
type='Adam',
lr=5e-4,
)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[170, 200])
total_epochs = 210
log_config = dict(
interval=50, hooks=[
dict(type='TextLoggerHook'),
])
channel_cfg = dict(
num_output_channels=16,
dataset_joints=16,
dataset_channel=list(range(16)),
inference_channel=list(range(16)))
# model settings
model = dict(
type='TopDown',
pretrained='torchvision://resnet152',
backbone=dict(type='ResNet', depth=152, num_stages=4, out_indices=(3, )),
neck=dict(type='GlobalAveragePooling'),
keypoint_head=dict(
type='DeepposeRegressionHead',
in_channels=2048,
num_joints=channel_cfg['num_output_channels'],
loss_keypoint=dict(type='SmoothL1Loss', use_target_weight=True)),
train_cfg=dict(),
test_cfg=dict(flip_test=True))
data_cfg = dict(
image_size=[256, 256],
heatmap_size=[64, 64],
num_output_channels=channel_cfg['num_output_channels'],
num_joints=channel_cfg['dataset_joints'],
dataset_channel=channel_cfg['dataset_channel'],
inference_channel=channel_cfg['inference_channel'],
use_gt_bbox=True,
bbox_file=None,
)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='TopDownGetBboxCenterScale', padding=1.25),
dict(type='TopDownRandomFlip', flip_prob=0.5),
dict(
type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
dict(type='TopDownAffine'),
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(type='TopDownGenerateTargetRegression'),
dict(
type='Collect',
keys=['img', 'target', 'target_weight'],
meta_keys=[
'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
'rotation', 'flip_pairs'
]),
]
val_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='TopDownGetBboxCenterScale', padding=1.25),
dict(type='TopDownAffine'),
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(
type='Collect',
keys=['img'],
meta_keys=['image_file', 'center', 'scale', 'rotation', 'flip_pairs']),
]
test_pipeline = val_pipeline
data_root = 'data/mpii'
data = dict(
samples_per_gpu=64,
workers_per_gpu=2,
val_dataloader=dict(samples_per_gpu=32),
test_dataloader=dict(samples_per_gpu=32),
train=dict(
type='TopDownMpiiDataset',
ann_file=f'{data_root}/annotations/mpii_train.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=train_pipeline,
dataset_info={{_base_.dataset_info}}),
val=dict(
type='TopDownMpiiDataset',
ann_file=f'{data_root}/annotations/mpii_val.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=val_pipeline,
dataset_info={{_base_.dataset_info}}),
test=dict(
type='TopDownMpiiDataset',
ann_file=f'{data_root}/annotations/mpii_val.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=test_pipeline,
dataset_info={{_base_.dataset_info}}),
)
_base_ = [
'../../../../_base_/default_runtime.py',
'../../../../_base_/datasets/mpii.py'
]
evaluation = dict(interval=10, metric='PCKh', save_best='PCKh')
optimizer = dict(
type='Adam',
lr=5e-4,
)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[170, 200])
total_epochs = 210
log_config = dict(
interval=50, hooks=[
dict(type='TextLoggerHook'),
])
channel_cfg = dict(
num_output_channels=16,
dataset_joints=16,
dataset_channel=list(range(16)),
inference_channel=list(range(16)))
# model settings
model = dict(
type='TopDown',
pretrained='torchvision://resnet50',
backbone=dict(type='ResNet', depth=50, num_stages=4, out_indices=(3, )),
neck=dict(type='GlobalAveragePooling'),
keypoint_head=dict(
type='DeepposeRegressionHead',
in_channels=2048,
num_joints=channel_cfg['num_output_channels'],
loss_keypoint=dict(type='SmoothL1Loss', use_target_weight=True)),
train_cfg=dict(),
test_cfg=dict(flip_test=True))
data_cfg = dict(
image_size=[256, 256],
heatmap_size=[64, 64],
num_output_channels=channel_cfg['num_output_channels'],
num_joints=channel_cfg['dataset_joints'],
dataset_channel=channel_cfg['dataset_channel'],
inference_channel=channel_cfg['inference_channel'],
use_gt_bbox=True,
bbox_file=None,
)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='TopDownGetBboxCenterScale', padding=1.25),
dict(type='TopDownRandomFlip', flip_prob=0.5),
dict(
type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
dict(type='TopDownAffine'),
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(type='TopDownGenerateTargetRegression'),
dict(
type='Collect',
keys=['img', 'target', 'target_weight'],
meta_keys=[
'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
'rotation', 'flip_pairs'
]),
]
val_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='TopDownGetBboxCenterScale', padding=1.25),
dict(type='TopDownAffine'),
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(
type='Collect',
keys=['img'],
meta_keys=['image_file', 'center', 'scale', 'rotation', 'flip_pairs']),
]
test_pipeline = val_pipeline
data_root = 'data/mpii'
data = dict(
samples_per_gpu=64,
workers_per_gpu=2,
val_dataloader=dict(samples_per_gpu=32),
test_dataloader=dict(samples_per_gpu=32),
train=dict(
type='TopDownMpiiDataset',
ann_file=f'{data_root}/annotations/mpii_train.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=train_pipeline,
dataset_info={{_base_.dataset_info}}),
val=dict(
type='TopDownMpiiDataset',
ann_file=f'{data_root}/annotations/mpii_val.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=val_pipeline,
dataset_info={{_base_.dataset_info}}),
test=dict(
type='TopDownMpiiDataset',
ann_file=f'{data_root}/annotations/mpii_val.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=test_pipeline,
dataset_info={{_base_.dataset_info}}),
)
_base_ = [
'../../../../_base_/default_runtime.py',
'../../../../_base_/datasets/mpii.py'
]
evaluation = dict(interval=10, metric='PCKh', save_best='PCKh')
optimizer = dict(
type='Adam',
lr=5e-4,
)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[170, 200])
total_epochs = 210
log_config = dict(
interval=50, hooks=[
dict(type='TextLoggerHook'),
])
channel_cfg = dict(
num_output_channels=16,
dataset_joints=16,
dataset_channel=list(range(16)),
inference_channel=list(range(16)))
# model settings
model = dict(
type='TopDown',
pretrained='torchvision://resnet50',
backbone=dict(type='ResNet', depth=50, num_stages=4, out_indices=(3, )),
neck=dict(type='GlobalAveragePooling'),
keypoint_head=dict(
type='DeepposeRegressionHead',
in_channels=2048,
num_joints=channel_cfg['num_output_channels'],
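        # RLE: the regression head also predicts a per-keypoint scale (sigma);
        # out_sigma=True below enables this extra output, which RLELoss consumes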
loss_keypoint=dict(
type='RLELoss',
use_target_weight=True,
size_average=True,
residual=True),
out_sigma=True),
train_cfg=dict(),
test_cfg=dict(flip_test=True))
data_cfg = dict(
image_size=[256, 256],
heatmap_size=[64, 64],
num_output_channels=channel_cfg['num_output_channels'],
num_joints=channel_cfg['dataset_joints'],
dataset_channel=channel_cfg['dataset_channel'],
inference_channel=channel_cfg['inference_channel'],
use_gt_bbox=True,
bbox_file=None,
)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='TopDownGetBboxCenterScale', padding=1.25),
dict(type='TopDownRandomFlip', flip_prob=0.5),
dict(
type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
dict(type='TopDownAffine'),
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(type='TopDownGenerateTargetRegression'),
dict(
type='Collect',
keys=['img', 'target', 'target_weight'],
meta_keys=[
'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
'rotation', 'flip_pairs'
]),
]
val_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='TopDownGetBboxCenterScale', padding=1.25),
dict(type='TopDownAffine'),
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(
type='Collect',
keys=['img'],
meta_keys=['image_file', 'center', 'scale', 'rotation', 'flip_pairs']),
]
test_pipeline = val_pipeline
data_root = 'data/mpii'
data = dict(
samples_per_gpu=64,
workers_per_gpu=2,
val_dataloader=dict(samples_per_gpu=32),
test_dataloader=dict(samples_per_gpu=32),
train=dict(
type='TopDownMpiiDataset',
ann_file=f'{data_root}/annotations/mpii_train.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=train_pipeline,
dataset_info={{_base_.dataset_info}}),
val=dict(
type='TopDownMpiiDataset',
ann_file=f'{data_root}/annotations/mpii_val.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=val_pipeline,
dataset_info={{_base_.dataset_info}}),
test=dict(
type='TopDownMpiiDataset',
ann_file=f'{data_root}/annotations/mpii_val.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=test_pipeline,
dataset_info={{_base_.dataset_info}}),
)
<!-- [ALGORITHM] -->
<details>
<summary align="right"><a href="http://openaccess.thecvf.com/content_cvpr_2014/html/Toshev_DeepPose_Human_Pose_2014_CVPR_paper.html">DeepPose (CVPR'2014)</a></summary>
```bibtex
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
```
</details>
<!-- [BACKBONE] -->
<details>
<summary align="right"><a href="http://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html">ResNet (CVPR'2016)</a></summary>
```bibtex
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
```
</details>
<!-- [DATASET] -->
<details>
<summary align="right"><a href="http://openaccess.thecvf.com/content_cvpr_2014/html/Andriluka_2D_Human_Pose_2014_CVPR_paper.html">MPII (CVPR'2014)</a></summary>
```bibtex
@inproceedings{andriluka14cvpr,
author = {Andriluka, Mykhaylo and Pishchulin, Leonid and Gehler, Peter and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
```
</details>
Results on MPII val set
| Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
| :---------------------------------------------------------- | :--------: | :---: | :------: | :---------------------------------------------------------: | :---------------------------------------------------------: |
| [deeppose_resnet_50](/configs/body/2d_kpt_sview_rgb_img/deeppose/mpii/res50_mpii_256x256.py) | 256x256 | 0.825 | 0.174 | [ckpt](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res50_mpii_256x256-c63cd0b6_20210203.pth) | [log](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res50_mpii_256x256_20210203.log.json) |
| [deeppose_resnet_101](/configs/body/2d_kpt_sview_rgb_img/deeppose/mpii/res101_mpii_256x256.py) | 256x256 | 0.841 | 0.193 | [ckpt](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res101_mpii_256x256-87516a90_20210205.pth) | [log](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res101_mpii_256x256_20210205.log.json) |
| [deeppose_resnet_152](/configs/body/2d_kpt_sview_rgb_img/deeppose/mpii/res152_mpii_256x256.py) | 256x256 | 0.850 | 0.198 | [ckpt](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res152_mpii_256x256-15f5e6f9_20210205.pth) | [log](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res152_mpii_256x256_20210205.log.json) |
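Here `Mean` is PCKh@0.5 and `Mean@0.1` is PCKh@0.1, i.e. the fraction of predicted joints that fall within 0.5 (resp. 0.1) head-segment lengths of the ground truth. A minimal sketch of the metric with MMPose's `keypoint_pck_accuracy` helper; the arrays and the 60 px head size are stand-ins for real predictions and annotations:
```python
import numpy as np
from mmpose.core.evaluation import keypoint_pck_accuracy

N, K = 4, 16  # 4 hypothetical samples, 16 MPII joints
pred = np.random.rand(N, K, 2) * 256  # predicted (x, y) per joint
gt = np.random.rand(N, K, 2) * 256    # ground-truth (x, y) per joint
mask = np.ones((N, K), dtype=bool)    # which joints are annotated

# PCKh normalizes pixel distances by the head segment length of each sample;
# a placeholder head size of 60 px is used here for both axes
head_size = np.full((N, 2), 60.0)

_, pckh_05, _ = keypoint_pck_accuracy(pred, gt, mask, thr=0.5, normalize=head_size)
_, pckh_01, _ = keypoint_pck_accuracy(pred, gt, mask, thr=0.1, normalize=head_size)
print(f'PCKh@0.5: {pckh_05:.3f}  PCKh@0.1: {pckh_01:.3f}')
```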
Collections:
- Name: ResNet
Paper:
Title: Deep residual learning for image recognition
URL: http://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html
README: https://github.com/open-mmlab/mmpose/blob/master/docs/en/papers/backbones/resnet.md
Models:
- Config: configs/body/2d_kpt_sview_rgb_img/deeppose/mpii/res50_mpii_256x256.py
In Collection: ResNet
Metadata:
Architecture: &id001
- DeepPose
- ResNet
Training Data: MPII
Name: deeppose_res50_mpii_256x256
Results:
- Dataset: MPII
Metrics:
Mean: 0.825
Mean@0.1: 0.174
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res50_mpii_256x256-c63cd0b6_20210203.pth
- Config: configs/body/2d_kpt_sview_rgb_img/deeppose/mpii/res101_mpii_256x256.py
In Collection: ResNet
Metadata:
Architecture: *id001
Training Data: MPII
Name: deeppose_res101_mpii_256x256
Results:
- Dataset: MPII
Metrics:
Mean: 0.841
Mean@0.1: 0.193
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res101_mpii_256x256-87516a90_20210205.pth
- Config: configs/body/2d_kpt_sview_rgb_img/deeppose/mpii/res152_mpii_256x256.py
In Collection: ResNet
Metadata:
Architecture: *id001
Training Data: MPII
Name: deeppose_res152_mpii_256x256
Results:
- Dataset: MPII
Metrics:
Mean: 0.85
Mean@0.1: 0.198
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res152_mpii_256x256-15f5e6f9_20210205.pth
<!-- [ALGORITHM] -->
<details>
<summary align="right"><a href="http://openaccess.thecvf.com/content_cvpr_2014/html/Toshev_DeepPose_Human_Pose_2014_CVPR_paper.html">DeepPose (CVPR'2014)</a></summary>
```bibtex
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
```
</details>
<!-- [ALGORITHM] -->
<details>
<summary align="right"><a href="https://arxiv.org/abs/2107.11291">RLE (ICCV'2021)</a></summary>
```bibtex
@inproceedings{li2021human,
title={Human pose regression with residual log-likelihood estimation},
author={Li, Jiefeng and Bian, Siyuan and Zeng, Ailing and Wang, Can and Pang, Bo and Liu, Wentao and Lu, Cewu},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={11025--11034},
year={2021}
}
```
</details>
<!-- [BACKBONE] -->
<details>
<summary align="right"><a href="http://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html">ResNet (CVPR'2016)</a></summary>
```bibtex
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
```
</details>
<!-- [DATASET] -->
<details>
<summary align="right"><a href="http://openaccess.thecvf.com/content_cvpr_2014/html/Andriluka_2D_Human_Pose_2014_CVPR_paper.html">MPII (CVPR'2014)</a></summary>
```bibtex
@inproceedings{andriluka14cvpr,
author = {Andriluka, Mykhaylo and Pishchulin, Leonid and Gehler, Peter and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
```
</details>
Results on MPII val set
| Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
| :---------------------------------------------------------- | :--------: | :---: | :------: | :---------------------------------------------------------: | :---------------------------------------------------------: |
| [deeppose_resnet_50_rle](/configs/body/2d_kpt_sview_rgb_img/deeppose/mpii/res50_mpii_256x256_rle.py) | 256x256 | 0.860 | 0.263 | [ckpt](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res50_mpii_256x256_rle-5f92a619_20220504.pth) | [log](https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res50_mpii_256x256_rle_20220504.log.json) |
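RLE keeps the plain regression architecture but trains it with the negative log-likelihood of the prediction error under a learned density, so the head predicts a scale `sigma` alongside each coordinate (the `out_sigma=True` option in the config above). The paper models the error with a normalizing flow plus a residual term (the `residual=True` flag of `RLELoss`); the sketch below drops the flow and uses a plain Laplace density, purely to illustrate the shape of the objective:
```python
import torch


def rle_style_loss(pred, sigma, target):
    """Simplified RLE-style loss (illustration only).

    pred, target: (N, K, 2) keypoint coordinates
    sigma: (N, K, 2) predicted positive scales
    The actual RLELoss learns the error density with a normalizing flow;
    a Laplace density stands in for it here.
    """
    error = (pred - target) / sigma
    # negative log-likelihood of Laplace(target, sigma):
    # -log p = log(2 * sigma) + |error|
    nll = torch.log(2 * sigma) + error.abs()
    return nll.mean()


pred = torch.rand(2, 16, 2)
target = torch.rand(2, 16, 2)
sigma = torch.rand(2, 16, 2) + 0.1  # keep scales strictly positive
print(rle_style_loss(pred, sigma, target))
```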
Collections:
- Name: RLE
Paper:
Title: Human pose regression with residual log-likelihood estimation
URL: https://arxiv.org/abs/2107.11291
README: https://github.com/open-mmlab/mmpose/blob/master/docs/en/papers/techniques/rle.md
Models:
- Config: configs/body/2d_kpt_sview_rgb_img/deeppose/mpii/res50_mpii_256x256_rle.py
In Collection: RLE
Metadata:
Architecture:
- DeepPose
- RLE
- ResNet
Training Data: MPII
Name: deeppose_res50_mpii_256x256_rle
Results:
- Dataset: MPII
Metrics:
Mean: 0.86
Mean@0.1: 0.263
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res50_mpii_256x256_rle-5f92a619_20220504.pth
# Bottom-up Human Pose Estimation via Disentangled Keypoint Regression (DEKR)
<!-- [ALGORITHM] -->
<details>
<summary align="right"><a href="https://arxiv.org/abs/2104.02300">DEKR (CVPR'2021)</a></summary>
```bibtex
@inproceedings{geng2021bottom,
title={Bottom-up human pose estimation via disentangled keypoint regression},
author={Geng, Zigang and Sun, Ke and Xiao, Bin and Zhang, Zhaoxiang and Wang, Jingdong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14676--14686},
year={2021}
}
```
</details>
DEKR is a popular 2D bottom-up pose estimation approach that simultaneously detects all person instances and regresses the offsets from each instance center to its joints.
To predict the offsets more accurately, the offsets of different joints are regressed by separate branches with deformable convolutional layers, so convolution kernels with different shapes are adopted to extract features for the corresponding joint.
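Concretely, every pixel that is scored as an instance center carries a 2K-channel offset vector pointing to that instance's K joints. A toy decoding sketch under that assumption; shapes and names are illustrative, and the real implementation additionally applies pooling-based NMS and pose rescoring:
```python
import numpy as np


def decode_dekr_outputs(center_heatmap, offsets, thr=0.5):
    """Toy decoding of DEKR-style outputs for a single image.

    center_heatmap: (H, W) instance-center confidence map
    offsets: (2K, H, W) per-pixel offsets from a center to the K joints
    Returns a list of (K, 2) keypoint arrays, one per detected instance.
    """
    num_joints = offsets.shape[0] // 2
    poses = []
    ys, xs = np.where(center_heatmap > thr)  # candidate instance centers
    for y, x in zip(ys, xs):
        off = offsets[:, y, x].reshape(num_joints, 2)  # (dx, dy) per joint
        # joint location = center position + regressed offset
        poses.append(np.stack([x + off[:, 0], y + off[:, 1]], axis=1))
    return poses


# hypothetical 17-joint output on a 128x128 map with one confident center
heatmap = np.zeros((128, 128))
heatmap[64, 64] = 0.9
offsets = np.random.randn(34, 128, 128)
print(len(decode_dekr_outputs(heatmap, offsets)))  # -> 1 instance
```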
<!-- [ALGORITHM] -->
<details>
<summary align="right"><a href="https://arxiv.org/abs/2104.02300">DEKR (CVPR'2021)</a></summary>
```bibtex
@inproceedings{geng2021bottom,
title={Bottom-up human pose estimation via disentangled keypoint regression},
author={Geng, Zigang and Sun, Ke and Xiao, Bin and Zhang, Zhaoxiang and Wang, Jingdong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14676--14686},
year={2021}
}
```
</details>
<!-- [ALGORITHM] -->
<details>
<summary align="right"><a href="http://openaccess.thecvf.com/content_CVPR_2019/html/Sun_Deep_High-Resolution_Representation_Learning_for_Human_Pose_Estimation_CVPR_2019_paper.html">HRNet (CVPR'2019)</a></summary>
```bibtex
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
```
</details>
<!-- [DATASET] -->
<details>
<summary align="right"><a href="https://link.springer.com/chapter/10.1007/978-3-319-10602-1_48">COCO (ECCV'2014)</a></summary>
```bibtex
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
```
</details>
Results on COCO val2017 without multi-scale test
| Arch | Input Size | AP | AP<sup>50</sup> | AP<sup>75</sup> | AR | AR<sup>50</sup> | ckpt | log |
| :-------------------------------------------- | :--------: | :---: | :-------------: | :-------------: | :---: | :-------------: | :-------------------------------------------: | :-------------------------------------------: |
| [HRNet-w32](/configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/coco/hrnet_w32_coco_512x512.py) | 512x512 | 0.680 | 0.868 | 0.745 | 0.728 | 0.897 | [ckpt](https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w32_coco_512x512-2a3056de_20220928.pth) | [log](https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w32_coco_512x512-20220928.log.json) |
| [HRNet-w48](/configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/coco/hrnet_w48_coco_640x640.py) | 640x640 | 0.709 | 0.876 | 0.773 | 0.758 | 0.909 | [ckpt](https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w48_coco_640x640-8854b2f1_20220930.pth) | [log](https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w48_coco_640x640-20220930.log.json) |
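Unlike the top-down models above, DEKR consumes the whole image directly, so no person detector is needed at inference time. A minimal sketch with the MMPose 0.x bottom-up API; the image path is a placeholder:
```python
from mmpose.apis import inference_bottom_up_pose_model, init_pose_model

# config and checkpoint taken from the first row of the table above
config_file = ('configs/body/2d_kpt_sview_rgb_img/'
               'disentangled_keypoint_regression/coco/hrnet_w32_coco_512x512.py')
checkpoint_file = ('https://download.openmmlab.com/mmpose/bottom_up/dekr/'
                   'hrnet_w32_coco_512x512-2a3056de_20220928.pth')

pose_model = init_pose_model(config_file, checkpoint_file, device='cpu')

# all instances are detected in one forward pass over the full image
pose_results, _ = inference_bottom_up_pose_model(
    pose_model, 'tests/data/coco/000000000785.jpg')  # placeholder image path
```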
Results on COCO val2017 with multi-scale test. The 3 default scales (\[2, 1, 0.5\]) are used
| Arch | Input Size | AP | AP<sup>50</sup> | AP<sup>75</sup> | AR | AR<sup>50</sup> | ckpt |
| :------------------------------------------------------------------ | :--------: | :---: | :-------------: | :-------------: | :---: | :-------------: | :------------------------------------------------------------------: |
| [HRNet-w32](/configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/coco/hrnet_w32_coco_512x512_multiscale.py)\* | 512x512 | 0.705 | 0.878 | 0.767 | 0.759 | 0.921 | [ckpt](https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w32_coco_512x512-2a3056de_20220928.pth) |
| [HRNet-w48](/configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/coco/hrnet_w48_coco_640x640_multiscale.py)\* | 640x640 | 0.722 | 0.882 | 0.785 | 0.778 | 0.928 | [ckpt](https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w48_coco_640x640-8854b2f1_20220930.pth) |
\* These configs are generally used for evaluation. The training settings are identical to their single-scale counterparts.
Results of the models provided by the authors on COCO val2017, evaluated with the same protocol
| Arch | Input Size | Setting | AP | AP<sup>50</sup> | AP<sup>75</sup> | AR | AR<sup>50</sup> | ckpt |
| :-------- | :--------: | :----------: | :---: | :-------------: | :-------------: | :---: | :-------------: | :----------------------------------------------------------: |
| HRNet-w32 | 512x512 | single-scale | 0.678 | 0.868 | 0.744 | 0.728 | 0.897 | see [official implementation](https://github.com/HRNet/DEKR) |
| HRNet-w48 | 640x640 | single-scale | 0.707 | 0.876 | 0.773 | 0.757 | 0.909 | see [official implementation](https://github.com/HRNet/DEKR) |
| HRNet-w32 | 512x512 | multi-scale | 0.708 | 0.880 | 0.773 | 0.763 | 0.921 | see [official implementation](https://github.com/HRNet/DEKR) |
| HRNet-w48 | 640x640 | multi-scale | 0.721 | 0.881 | 0.786 | 0.779 | 0.927 | see [official implementation](https://github.com/HRNet/DEKR) |
The discrepancy between these results and those reported in the paper is attributed to differences in implementation details of the evaluation process.
Collections:
- Name: DEKR
Paper:
Title: Bottom-up human pose estimation via disentangled keypoint regression
URL: https://arxiv.org/abs/2104.02300
README: https://github.com/open-mmlab/mmpose/blob/master/docs/en/papers/algorithms/dekr.md
Models:
- Config: configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/coco/hrnet_w32_coco_512x512.py
In Collection: DEKR
Metadata:
Architecture: &id001
- DEKR
- HRNet
Training Data: COCO
Name: disentangled_keypoint_regression_hrnet_w32_coco_512x512
Results:
- Dataset: COCO
Metrics:
AP: 0.68
AP@0.5: 0.868
AP@0.75: 0.745
AR: 0.728
AR@0.5: 0.897
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w32_coco_512x512-2a3056de_20220928.pth
- Config: configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/coco/hrnet_w48_coco_640x640.py
In Collection: DEKR
Metadata:
Architecture: *id001
Training Data: COCO
Name: disentangled_keypoint_regression_hrnet_w48_coco_640x640
Results:
- Dataset: COCO
Metrics:
AP: 0.709
AP@0.5: 0.876
AP@0.75: 0.773
AR: 0.758
AR@0.5: 0.909
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w48_coco_640x640-8854b2f1_20220930.pth
- Config: configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/coco/hrnet_w32_coco_512x512_multiscale.py
In Collection: DEKR
Metadata:
Architecture: *id001
Training Data: COCO
Name: disentangled_keypoint_regression_hrnet_w32_coco_512x512_multiscale
Results:
- Dataset: COCO
Metrics:
AP: 0.705
AP@0.5: 0.878
AP@0.75: 0.767
AR: 0.759
AR@0.5: 0.921
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w32_coco_512x512-2a3056de_20220928.pth
- Config: configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/coco/hrnet_w48_coco_640x640_multiscale.py
In Collection: DEKR
Metadata:
Architecture: *id001
Training Data: COCO
Name: disentangled_keypoint_regression_hrnet_w48_coco_640x640_multiscale
Results:
- Dataset: COCO
Metrics:
AP: 0.722
AP@0.5: 0.882
AP@0.75: 0.785
AR: 0.778
AR@0.5: 0.928
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w48_coco_640x640-8854b2f1_20220930.pth
_base_ = [
'../../../../_base_/default_runtime.py',
'../../../../_base_/datasets/coco.py'
]
checkpoint_config = dict(interval=20)
evaluation = dict(interval=20, metric='mAP', save_best='AP')
optimizer = dict(
type='Adam',
lr=0.001,
)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[90, 120])
total_epochs = 140
channel_cfg = dict(
dataset_joints=17,
dataset_channel=[
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
],
inference_channel=[
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
])
data_cfg = dict(
image_size=512,
base_size=256,
base_sigma=2,
heatmap_size=[128],
num_joints=channel_cfg['dataset_joints'],
dataset_channel=channel_cfg['dataset_channel'],
inference_channel=channel_cfg['inference_channel'],
num_scales=1,
scale_aware_sigma=False,
)
# model settings
model = dict(
type='DisentangledKeypointRegressor',
pretrained='https://download.openmmlab.com/mmpose/'
'pretrain_models/hrnet_w32-36af842e.pth',
backbone=dict(
type='HRNet',
in_channels=3,
extra=dict(
stage1=dict(
num_modules=1,
num_branches=1,
block='BOTTLENECK',
num_blocks=(4, ),
num_channels=(64, )),
stage2=dict(
num_modules=1,
num_branches=2,
block='BASIC',
num_blocks=(4, 4),
num_channels=(32, 64)),
stage3=dict(
num_modules=4,
num_branches=3,
block='BASIC',
num_blocks=(4, 4, 4),
num_channels=(32, 64, 128)),
stage4=dict(
num_modules=3,
num_branches=4,
block='BASIC',
num_blocks=(4, 4, 4, 4),
num_channels=(32, 64, 128, 256),
multiscale_output=True)),
),
keypoint_head=dict(
type='DEKRHead',
in_channels=(32, 64, 128, 256),
in_index=(0, 1, 2, 3),
num_heatmap_filters=32,
num_joints=channel_cfg['dataset_joints'],
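        # 'resize_concat': upsample all four HRNet branch outputs to the
        # highest resolution and concatenate them along the channel axis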
input_transform='resize_concat',
heatmap_loss=dict(
type='JointsMSELoss',
use_target_weight=True,
loss_weight=1.0,
),
offset_loss=dict(
type='SoftWeightSmoothL1Loss',
use_target_weight=True,
supervise_empty=False,
loss_weight=0.002,
beta=1 / 9.0,
)),
train_cfg=dict(),
test_cfg=dict(
num_joints=channel_cfg['dataset_joints'],
max_num_people=30,
project2image=False,
align_corners=False,
max_pool_kernel=5,
use_nms=True,
nms_dist_thr=0.05,
nms_joints_thr=8,
keypoint_threshold=0.01,
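        # learned pose-rescoring network (released by the DEKR authors) that
        # re-ranks candidate poses at test time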
rescore_cfg=dict(
in_channels=74,
norm_indexes=(5, 6),
pretrained='https://download.openmmlab.com/mmpose/'
'pretrain_models/kpt_rescore_coco-33d58c5c.pth'),
flip_test=True))
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='BottomUpRandomAffine',
rot_factor=30,
scale_factor=[0.75, 1.5],
scale_type='short',
trans_factor=40),
dict(type='BottomUpRandomFlip', flip_prob=0.5),
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(type='GetKeypointCenterArea'),
dict(
type='BottomUpGenerateHeatmapTarget',
sigma=(2, 4),
gen_center_heatmap=True,
bg_weight=0.1,
),
dict(
type='BottomUpGenerateOffsetTarget',
radius=4,
),
dict(
type='Collect',
keys=['img', 'heatmaps', 'masks', 'offsets', 'offset_weights'],
meta_keys=[]),
]
val_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='BottomUpGetImgSize', test_scale_factor=[1]),
dict(
type='BottomUpResizeAlign',
transforms=[
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
]),
dict(
type='Collect',
keys=['img'],
meta_keys=[
'image_file', 'aug_data', 'test_scale_factor', 'base_size',
'center', 'scale', 'flip_index', 'num_joints', 'skeleton',
'image_size', 'heatmap_size'
]),
]
test_pipeline = val_pipeline
data_root = 'data/coco'
data = dict(
workers_per_gpu=4,
train_dataloader=dict(samples_per_gpu=10),
val_dataloader=dict(samples_per_gpu=1),
test_dataloader=dict(samples_per_gpu=1),
train=dict(
type='BottomUpCocoDataset',
ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
img_prefix=f'{data_root}/train2017/',
data_cfg=data_cfg,
pipeline=train_pipeline,
dataset_info={{_base_.dataset_info}}),
val=dict(
type='BottomUpCocoDataset',
ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
img_prefix=f'{data_root}/val2017/',
data_cfg=data_cfg,
pipeline=val_pipeline,
dataset_info={{_base_.dataset_info}}),
test=dict(
type='BottomUpCocoDataset',
ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
img_prefix=f'{data_root}/val2017/',
data_cfg=data_cfg,
pipeline=test_pipeline,
dataset_info={{_base_.dataset_info}}),
)
_base_ = ['hrnet_w32_coco_512x512.py']
model = dict(
test_cfg=dict(
multi_scale_score_decrease=1.0,
nms_dist_thr=0.1,
max_pool_kernel=9,
))
val_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='BottomUpGetImgSize',
base_length=32,
test_scale_factor=[0.5, 1, 2]),
dict(
type='BottomUpResizeAlign',
base_length=32,
transforms=[
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
]),
dict(
type='Collect',
keys=['img'],
meta_keys=[
'image_file', 'aug_data', 'test_scale_factor', 'base_size',
'center', 'scale', 'flip_index', 'num_joints', 'skeleton',
'image_size', 'heatmap_size'
]),
]
test_pipeline = val_pipeline
data = dict(
val=dict(pipeline=val_pipeline),
test=dict(pipeline=test_pipeline),
)
_base_ = [
'../../../../_base_/default_runtime.py',
'../../../../_base_/datasets/coco.py'
]
checkpoint_config = dict(interval=20)
evaluation = dict(interval=20, metric='mAP', save_best='AP')
optimizer = dict(
type='Adam',
lr=0.001,
)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[90, 120])
total_epochs = 140
channel_cfg = dict(
dataset_joints=17,
dataset_channel=[
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
],
inference_channel=[
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
])
data_cfg = dict(
image_size=640,
base_size=320,
base_sigma=2,
heatmap_size=[160],
num_joints=channel_cfg['dataset_joints'],
dataset_channel=channel_cfg['dataset_channel'],
inference_channel=channel_cfg['inference_channel'],
num_scales=1,
scale_aware_sigma=False,
)
# model settings
model = dict(
type='DisentangledKeypointRegressor',
pretrained='https://download.openmmlab.com/mmpose/'
'pretrain_models/hrnet_w48-8ef0771d.pth',
backbone=dict(
type='HRNet',
in_channels=3,
extra=dict(
stage1=dict(
num_modules=1,
num_branches=1,
block='BOTTLENECK',
num_blocks=(4, ),
num_channels=(64, )),
stage2=dict(
num_modules=1,
num_branches=2,
block='BASIC',
num_blocks=(4, 4),
num_channels=(48, 96)),
stage3=dict(
num_modules=4,
num_branches=3,
block='BASIC',
num_blocks=(4, 4, 4),
num_channels=(48, 96, 192)),
stage4=dict(
num_modules=3,
num_branches=4,
block='BASIC',
num_blocks=(4, 4, 4, 4),
num_channels=(48, 96, 192, 384),
multiscale_output=True)),
),
keypoint_head=dict(
type='DEKRHead',
in_channels=(48, 96, 192, 384),
in_index=(0, 1, 2, 3),
num_heatmap_filters=48,
num_joints=channel_cfg['dataset_joints'],
input_transform='resize_concat',
heatmap_loss=dict(
type='JointsMSELoss',
use_target_weight=True,
loss_weight=1.0,
),
offset_loss=dict(
type='SoftWeightSmoothL1Loss',
use_target_weight=True,
supervise_empty=False,
loss_weight=0.002,
beta=1 / 9.0,
)),
train_cfg=dict(),
test_cfg=dict(
num_joints=channel_cfg['dataset_joints'],
max_num_people=30,
project2image=False,
align_corners=False,
max_pool_kernel=5,
use_nms=True,
nms_dist_thr=0.05,
nms_joints_thr=8,
keypoint_threshold=0.01,
rescore_cfg=dict(
in_channels=74,
norm_indexes=(5, 6),
pretrained='https://download.openmmlab.com/mmpose/'
'pretrain_models/kpt_rescore_coco-33d58c5c.pth'),
flip_test=True))
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='BottomUpRandomAffine',
rot_factor=30,
scale_factor=[0.75, 1.5],
scale_type='short',
trans_factor=40),
dict(type='BottomUpRandomFlip', flip_prob=0.5),
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(type='GetKeypointCenterArea'),
dict(
type='BottomUpGenerateHeatmapTarget',
sigma=(2, 4),
gen_center_heatmap=True,
bg_weight=0.1,
),
dict(
type='BottomUpGenerateOffsetTarget',
radius=4,
),
dict(
type='Collect',
keys=['img', 'heatmaps', 'masks', 'offsets', 'offset_weights'],
meta_keys=[]),
]
val_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='BottomUpGetImgSize', test_scale_factor=[1]),
dict(
type='BottomUpResizeAlign',
transforms=[
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
]),
dict(
type='Collect',
keys=['img'],
meta_keys=[
'image_file', 'aug_data', 'test_scale_factor', 'base_size',
'center', 'scale', 'flip_index', 'num_joints', 'skeleton',
'image_size', 'heatmap_size'
]),
]
test_pipeline = val_pipeline
data_root = 'data/coco'
data = dict(
workers_per_gpu=4,
train_dataloader=dict(samples_per_gpu=5),
val_dataloader=dict(samples_per_gpu=1),
test_dataloader=dict(samples_per_gpu=1),
train=dict(
type='BottomUpCocoDataset',
ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
img_prefix=f'{data_root}/train2017/',
data_cfg=data_cfg,
pipeline=train_pipeline,
dataset_info={{_base_.dataset_info}}),
val=dict(
type='BottomUpCocoDataset',
ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
img_prefix=f'{data_root}/val2017/',
data_cfg=data_cfg,
pipeline=val_pipeline,
dataset_info={{_base_.dataset_info}}),
test=dict(
type='BottomUpCocoDataset',
ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
img_prefix=f'{data_root}/val2017/',
data_cfg=data_cfg,
pipeline=test_pipeline,
dataset_info={{_base_.dataset_info}}),
)
_base_ = ['hrnet_w48_coco_640x640.py']
model = dict(test_cfg=dict(
nms_dist_thr=0.1,
max_pool_kernel=11,
))
val_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='BottomUpGetImgSize',
base_length=32,
test_scale_factor=[0.5, 1, 2]),
dict(
type='BottomUpResizeAlign',
base_length=32,
transforms=[
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
]),
dict(
type='Collect',
keys=['img'],
meta_keys=[
'image_file', 'aug_data', 'test_scale_factor', 'base_size',
'center', 'scale', 'flip_index', 'num_joints', 'skeleton',
'image_size', 'heatmap_size'
]),
]
test_pipeline = val_pipeline
data = dict(
val=dict(pipeline=val_pipeline),
test=dict(pipeline=test_pipeline),
)
<!-- [ALGORITHM] -->
<details>
<summary align="right"><a href="https://arxiv.org/abs/2104.02300">DEKR (CVPR'2021)</a></summary>
```bibtex
@inproceedings{geng2021bottom,
title={Bottom-up human pose estimation via disentangled keypoint regression},
author={Geng, Zigang and Sun, Ke and Xiao, Bin and Zhang, Zhaoxiang and Wang, Jingdong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14676--14686},
year={2021}
}
```
</details>
<!-- [ALGORITHM] -->
<details>
<summary align="right"><a href="http://openaccess.thecvf.com/content_CVPR_2019/html/Sun_Deep_High-Resolution_Representation_Learning_for_Human_Pose_Estimation_CVPR_2019_paper.html">HRNet (CVPR'2019)</a></summary>
```bibtex
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
```
</details>
<!-- [DATASET] -->
<details>
<summary align="right"><a href="http://openaccess.thecvf.com/content_CVPR_2019/html/Li_CrowdPose_Efficient_Crowded_Scenes_Pose_Estimation_and_a_New_Benchmark_CVPR_2019_paper.html">CrowdPose (CVPR'2019)</a></summary>
```bibtex
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
```
</details>
Results on CrowdPose test without multi-scale test
| Arch | Input Size | AP | AP<sup>50</sup> | AP<sup>75</sup> | AR | AR<sup>50</sup> | ckpt | log |
| :-------------------------------------------- | :--------: | :---: | :-------------: | :-------------: | :---: | :-------------: | :-------------------------------------------: | :-------------------------------------------: |
| [HRNet-w32](/configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/crowdpose/hrnet_w32_crowdpose_512x512.py) | 512x512 | 0.663 | 0.857 | 0.715 | 0.719 | 0.893 | [ckpt](https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w32_crowdpose_512x512-685aff75_20220924.pth) | [log](https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w32_crowdpose_512x512-20220924.log.json) |
| [HRNet-w48](/configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/crowdpose/hrnet_w48_crowdpose_640x640.py) | 640x640 | 0.682 | 0.869 | 0.736 | 0.742 | 0.911 | [ckpt](https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w48_crowdpose_640x640-ef6b6040_20220930.pth) | [log](https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w48_crowdpose_640x640-20220930.log.json) |
Results on CrowdPose test with multi-scale test. The 3 default scales (\[2, 1, 0.5\]) are used
| Arch | Input Size | AP | AP<sup>50</sup> | AP<sup>75</sup> | AR | AR<sup>50</sup> | ckpt |
| :------------------------------------------------------------------ | :--------: | :---: | :-------------: | :-------------: | :---: | :-------------: | :------------------------------------------------------------------: |
| [HRNet-w32](/configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/crowdpose/hrnet_w32_crowdpose_512x512_multiscale.py)\* | 512x512 | 0.692 | 0.874 | 0.748 | 0.755 | 0.926 | [ckpt](https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w32_crowdpose_512x512-685aff75_20220924.pth) |
| [HRNet-w48](/configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/crowdpose/hrnet_w48_crowdpose_640x640_multiscale.py)\* | 640x640 | 0.696 | 0.869 | 0.749 | 0.769 | 0.933 | [ckpt](https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w48_crowdpose_640x640-ef6b6040_20220930.pth) |
\* These configs are generally used for evaluation. The training settings are identical to their single-scale counterparts.
Collections:
- Name: DEKR
Paper:
Title: Bottom-up human pose estimation via disentangled keypoint regression
URL: https://arxiv.org/abs/2104.02300
README: https://github.com/open-mmlab/mmpose/blob/master/docs/en/papers/algorithms/dekr.md
Models:
- Config: configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/crowdpose/hrnet_w32_crowdpose_512x512.py
In Collection: DEKR
Metadata:
Architecture: &id001
- DEKR
- HRNet
Training Data: CrowdPose
Name: disentangled_keypoint_regression_hrnet_w32_crowdpose_512x512
Results:
- Dataset: CrowdPose
Metrics:
AP: 0.663
AP@0.5: 0.857
AP@0.75: 0.715
AR: 0.719
AR@0.5: 0.893
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w32_crowdpose_512x512-685aff75_20220924.pth
- Config: configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/crowdpose/hrnet_w48_crowdpose_640x640.py
In Collection: DEKR
Metadata:
Architecture: *id001
Training Data: CrowdPose
Name: disentangled_keypoint_regression_hrnet_w48_crowdpose_640x640
Results:
- Dataset: CrowdPose
Metrics:
AP: 0.682
AP@0.5: 0.869
AP@0.75: 0.736
AR: 0.742
AR@0.5: 0.911
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w48_crowdpose_640x640-ef6b6040_20220930.pth
- Config: configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/crowdpose/hrnet_w32_crowdpose_512x512_multiscale.py
In Collection: DEKR
Metadata:
Architecture: *id001
Training Data: CrowdPose
Name: disentangled_keypoint_regression_hrnet_w32_crowdpose_512x512_multiscale
Results:
- Dataset: CrowdPose
Metrics:
AP: 0.692
AP@0.5: 0.874
AP@0.75: 0.748
AR: 0.755
AR@0.5: 0.926
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w32_crowdpose_512x512-685aff75_20220924.pth
- Config: configs/body/2d_kpt_sview_rgb_img/disentangled_keypoint_regression/crowdpose/hrnet_w48_crowdpose_640x640_multiscale.py
In Collection: DEKR
Metadata:
Architecture: *id001
Training Data: CrowdPose
Name: disentangled_keypoint_regression_hrnet_w48_crowdpose_640x640_multiscale
Results:
- Dataset: CrowdPose
Metrics:
AP: 0.696
AP@0.5: 0.869
AP@0.75: 0.749
AR: 0.769
AR@0.5: 0.933
Task: Body 2D Keypoint
Weights: https://download.openmmlab.com/mmpose/bottom_up/dekr/hrnet_w48_crowdpose_640x640-ef6b6040_20220930.pth
_base_ = [
'../../../../_base_/default_runtime.py',
'../../../../_base_/datasets/crowdpose.py'
]
checkpoint_config = dict(interval=20)
evaluation = dict(interval=20, metric='mAP', save_best='AP')
optimizer = dict(
type='Adam',
lr=0.001,
)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[200, 260])
total_epochs = 300
channel_cfg = dict(
num_output_channels=14,
dataset_joints=14,
dataset_channel=[
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
],
inference_channel=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
data_cfg = dict(
image_size=512,
base_size=256,
base_sigma=2,
heatmap_size=[128, 256],
num_joints=channel_cfg['dataset_joints'],
dataset_channel=channel_cfg['dataset_channel'],
inference_channel=channel_cfg['inference_channel'],
num_scales=2,
scale_aware_sigma=False,
)
# model settings
model = dict(
type='DisentangledKeypointRegressor',
pretrained='https://download.openmmlab.com/mmpose/'
'pretrain_models/hrnet_w32-36af842e.pth',
backbone=dict(
type='HRNet',
in_channels=3,
extra=dict(
stage1=dict(
num_modules=1,
num_branches=1,
block='BOTTLENECK',
num_blocks=(4, ),
num_channels=(64, )),
stage2=dict(
num_modules=1,
num_branches=2,
block='BASIC',
num_blocks=(4, 4),
num_channels=(32, 64)),
stage3=dict(
num_modules=4,
num_branches=3,
block='BASIC',
num_blocks=(4, 4, 4),
num_channels=(32, 64, 128)),
stage4=dict(
num_modules=3,
num_branches=4,
block='BASIC',
num_blocks=(4, 4, 4, 4),
num_channels=(32, 64, 128, 256),
multiscale_output=True)),
),
keypoint_head=dict(
type='DEKRHead',
in_channels=(32, 64, 128, 256),
in_index=(0, 1, 2, 3),
num_heatmap_filters=32,
num_joints=channel_cfg['dataset_joints'],
input_transform='resize_concat',
heatmap_loss=dict(
type='JointsMSELoss',
use_target_weight=True,
loss_weight=1.0,
),
offset_loss=dict(
type='SoftWeightSmoothL1Loss',
use_target_weight=True,
supervise_empty=False,
loss_weight=0.004,
beta=1 / 9.0,
)),
train_cfg=dict(),
test_cfg=dict(
num_joints=channel_cfg['dataset_joints'],
max_num_people=30,
project2image=False,
align_corners=False,
max_pool_kernel=5,
use_nms=True,
nms_dist_thr=0.05,
nms_joints_thr=7,
keypoint_threshold=0.01,
rescore_cfg=dict(
in_channels=59,
norm_indexes=(0, 1),
pretrained='https://download.openmmlab.com/mmpose/'
'pretrain_models/kpt_rescore_crowdpose-300c7efe.pth'),
flip_test=True))
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='BottomUpRandomAffine',
rot_factor=30,
scale_factor=[0.75, 1.5],
scale_type='short',
trans_factor=40),
dict(type='BottomUpRandomFlip', flip_prob=0.5),
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(type='GetKeypointCenterArea'),
dict(
type='BottomUpGenerateHeatmapTarget',
sigma=(2, 4),
gen_center_heatmap=True,
bg_weight=0.1,
),
dict(
type='BottomUpGenerateOffsetTarget',
radius=4,
),
dict(
type='Collect',
keys=['img', 'heatmaps', 'masks', 'offsets', 'offset_weights'],
meta_keys=[]),
]
val_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='BottomUpGetImgSize', test_scale_factor=[1]),
dict(
type='BottomUpResizeAlign',
transforms=[
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
]),
dict(
type='Collect',
keys=['img'],
meta_keys=[
'image_file', 'aug_data', 'test_scale_factor', 'base_size',
'center', 'scale', 'flip_index', 'num_joints', 'skeleton',
'image_size', 'heatmap_size'
]),
]
test_pipeline = val_pipeline
data_root = 'data/crowdpose'
data = dict(
workers_per_gpu=4,
train_dataloader=dict(samples_per_gpu=10),
val_dataloader=dict(samples_per_gpu=1),
test_dataloader=dict(samples_per_gpu=1),
train=dict(
type='BottomUpCrowdPoseDataset',
ann_file=f'{data_root}/annotations/mmpose_crowdpose_trainval.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=train_pipeline,
dataset_info={{_base_.dataset_info}}),
val=dict(
type='BottomUpCrowdPoseDataset',
ann_file=f'{data_root}/annotations/mmpose_crowdpose_test.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=val_pipeline,
dataset_info={{_base_.dataset_info}}),
test=dict(
type='BottomUpCrowdPoseDataset',
ann_file=f'{data_root}/annotations/mmpose_crowdpose_test.json',
img_prefix=f'{data_root}/images/',
data_cfg=data_cfg,
pipeline=test_pipeline,
dataset_info={{_base_.dataset_info}}),
)