Multi-person human pose estimation is defined as the task of detecting the poses (or keypoints) of all people from an input image.
Existing approaches can be categorized into top-down and bottom-up approaches.
Top-down methods (e.g. DeepPose) divide the task into two stages: human detection and pose estimation. They perform human detection first, followed by single-person pose estimation given human bounding boxes.
Bottom-up approaches (e.g. Associative Embedding) first detect all the keypoints and then group/associate them into person instances.
## Data preparation
Please follow [DATA Preparation](/docs/en/dataset_zoo/2d_body_keypoint.md) to prepare data.
## Demo
Please follow [Demo](/demo/docs/en/2d_human_pose_demo.md#2d-human-pose-demo) to run demos.
# Associative embedding: End-to-end learning for joint detection and grouping (AE)
Associative Embedding is one of the most popular 2D bottom-up pose estimation approaches, that first detect all the keypoints and then group/associate them into person instances.
In order to group all the predicted keypoints to individuals, a tag is also predicted for each detected keypoint. Tags of the same person are similar, while tags of different people are different. Thus the keypoints can be grouped according to the tags.
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
```
</details>
Results on COCO val2017 without multi-scale test
| Arch | Input Size | AP | AP<sup>50</sup> | AP<sup>75</sup> | AR | AR<sup>50</sup> | ckpt | log |