# CrowdHuman ## Introduction Introduced by Shao et al. in [CrowdHuman: A Benchmark for Detecting Human in a Crowd](https://arxiv.org/pdf/1805.00123.pdf) CrowdHuman is a benchmark dataset to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, rich-annotated and contains high diversity. CrowdHuman contains 15000, 4370 and 5000 images for training, validation, and testing, respectively. There are a total of 470K human instances from train and validation subsets and 23 persons per image, with various kinds of occlusions in the dataset. Each human instance is annotated with a head bounding-box, human visible-region bounding-box and human full-body bounding-box. We hope our dataset will serve as a solid baseline and help promote future research in human detection tasks. ## Prepare the data Download the original dataset from [CrowdHuman](https://www.crowdhuman.org/download.html). Then convert annotations by detection/tools/create_crowd_anno.py - Data Tree of CrowdHuman should look like: ```bash $ tree CrowdHuman CrowdHuman ├── annotations │ ├── annotation_train.json │ ├── annotation_train.odgt │ ├── annotation_val.json │ ├── annotation_val.odgt │ └── ... └── Images ├── 1074488,79b360006b38332b.jpg ├── 1074488,79d54000c6f9d9e5.jpg └── ... ``` ## Model Zoo ### Cascade Mask R-CNN + InternImage | backbone | schd | box mAP | mask mAP | train speed | train time | #param | FLOPs | Config | Download | | :------------: | :--: | :-----: | :------: | :---------: | :--------: | :----: | :---: | :------------------------------------------------------: | :------: | | InternImage-XL | 3x | TBD | TBD | TBD | TBD | TBD | TBD | [config](./cascade_internimage_xl_fpn_3x_crowd_human.py) | TBD | - Training speed is measured with A100 GPUs using current code and may be faster than the speed in logs. - Some logs are our recent newly trained ones. There might be slight differences between the results in logs and our paper.