Commit 41b18fd8 authored by zhe chen's avatar zhe chen

Use pre-commit to reformat code
parent ff20ea39
......@@ -4,17 +4,15 @@
# Licensed under The MIT License [see LICENSE for details]
# --------------------------------------------------------
from __future__ import absolute_import, division, print_function
import math
import time
import torch
import torch.nn as nn
from functions.dcnv3_func import DCNv3Function, dcnv3_core_pytorch
from torch.autograd import gradcheck
H_in, W_in = 8, 8
N, M, D = 2, 4, 16
......
......@@ -35,7 +35,7 @@ def build_optimizer(config, model):
optimizer = None
use_zero = config.TRAIN.OPTIMIZER.USE_ZERO
if use_zero:
print(f'\nUse Zero!')
if opt_lower == 'sgd':
# an ugly implementation
# this problem is fixed after torch 1.12
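The `USE_ZERO` flag above typically wraps the base optimizer in `ZeroRedundancyOptimizer` so that optimizer state is sharded across DDP ranks (the "ugly implementation" comment likely refers to param-group limitations before torch 1.12). A hedged sketch, not the repository's code; `build_sgd` and its defaults are assumptions:

```python
# Sketch of what USE_ZERO enables: shard SGD optimizer state across DDP ranks.
# `build_sgd` is a hypothetical helper, not the repository's build_optimizer.
import torch


def build_sgd(model, lr=0.01, momentum=0.9, weight_decay=1e-4, use_zero=False):
    params = [p for p in model.parameters() if p.requires_grad]
    if use_zero:
        # requires an initialized process group (torch.distributed.init_process_group)
        from torch.distributed.optim import ZeroRedundancyOptimizer
        return ZeroRedundancyOptimizer(params, optimizer_class=torch.optim.SGD,
                                       lr=lr, momentum=momentum,
                                       weight_decay=weight_decay)
    return torch.optim.SGD(params, lr=lr, momentum=momentum,
                           weight_decay=weight_decay)
```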
......@@ -119,7 +119,7 @@ def set_weight_decay_and_lr(
if f'levels.{i}' in name:
param.requires_grad = False
# 1. check wd
if len(param.shape) == 1 or name.endswith('.bias') or (
name in skip_list) or check_keywords_in_name(
name, skip_keywords):
wd = 0.
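The weight-decay rule in this hunk can be sketched as a standalone helper: 1-D parameters (norm weights), biases, and names matched by the skip list or skip keywords get zero weight decay. A minimal sketch under assumed names; `assign_wd` is not part of the repository:

```python
# Hypothetical helper mirroring the skip rule shown in the diff above.
def check_keywords_in_name(name, keywords=()):
    return any(k in name for k in keywords)


def assign_wd(name, shape, base_wd, skip_list=(), skip_keywords=()):
    # 1-D params (e.g. LayerNorm), biases, and explicitly skipped names: wd = 0
    if (len(shape) == 1 or name.endswith('.bias')
            or name in skip_list
            or check_keywords_in_name(name, skip_keywords)):
        return 0.
    return base_wd
```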
......
......@@ -4,13 +4,15 @@
# Licensed under The MIT License [see LICENSE for details]
# --------------------------------------------------------
import math
import os
from collections import OrderedDict

import numpy as np
import torch
import torch.distributed as dist
from timm.utils import get_state_dict
try:
# noinspection PyUnresolvedReferences
from apex import amp
......
# InternImage for Object Detection
This folder contains the implementation of InternImage for object detection.
Our detection code is developed on top of [MMDetection v2.28.1](https://github.com/open-mmlab/mmdetection/tree/v2.28.1).
## Usage
### Install
......@@ -28,6 +27,7 @@ conda activate internimage
- Install `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2`:
For example, to install torch==1.11 with CUDA==11.3:
```bash
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
```
......@@ -47,12 +47,14 @@ pip install opencv-python termcolor yacs pyyaml scipy
```
- Compile CUDA operators
```bash
cd ./ops_dcnv3
sh ./make.sh
# unit test (all checks should print True)
python test.py
```
- You can also install the operator using the pre-built `.whl` files:
[DCNv3-1.0-whl](https://github.com/OpenGVLab/InternImage/releases/tag/whl_files)
......@@ -61,7 +63,6 @@ python test.py
Prepare COCO according to the guidelines in [MMDetection v2.28.1](https://github.com/open-mmlab/mmdetection/blob/master/docs/en/1_exist_data_model.md).
### Evaluation
To evaluate our `InternImage` on COCO val, run:
......@@ -107,6 +108,7 @@ GPUS=32 sh slurm_train.sh <partition> <job-name> configs/coco/cascade_internimag
### Export
To export a detection model from PyTorch to TensorRT, run:
```shell
MODEL="model_name"
CKPT_PATH="/path/to/model/ckpt.pth"
......@@ -122,6 +124,7 @@ python deploy.py \
```
For example, to export `mask_rcnn_internimage_t_fpn_1x_coco` from PyTorch to TensorRT, run:
```shell
MODEL="mask_rcnn_internimage_t_fpn_1x_coco"
CKPT_PATH="/path/to/model/ckpt/mask_rcnn_internimage_t_fpn_1x_coco.pth"
......
......@@ -46,4 +46,4 @@ data = dict(
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline))
evaluation = dict(interval=1, metric='bbox', classwise=True)
......@@ -125,4 +125,4 @@ model = dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100,
mask_thr_binary=0.5)))
# COCO
## Introduction
Introduced by Lin et al. in [Microsoft COCO: Common Objects in Context](https://arxiv.org/pdf/1405.0312v3.pdf)
......@@ -11,19 +10,18 @@ Splits: The first version of MS COCO dataset was released in 2014. It contains 1
Based on community feedback, in 2017 the training/validation split was changed from 83K/41K to 118K/5K. The new split uses the same images and annotations. The 2017 test set is a subset of 41K images of the 2015 test set. Additionally, the 2017 release contains a new unannotated dataset of 123K images.
## Model Zoo
### Mask R-CNN + InternImage
| backbone | schd | box mAP | mask mAP | train speed | train time | #param | FLOPs | Config | Download |
| :-----------: | :--: | :-----: | :------: | :----------: | :--------: | :----: | :---: | :------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T | 1x | 47.2 | 42.5 | 0.36s / iter | 9h | 49M | 270G | [config](./mask_rcnn_internimage_t_fpn_1x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_t_fpn_1x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_t_fpn_1x_coco.log.json) |
| InternImage-T | 3x | 49.1 | 43.7 | 0.34s / iter | 26h | 49M | 270G | [config](./mask_rcnn_internimage_t_fpn_3x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_t_fpn_3x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_t_fpn_3x_coco.log.json) |
| InternImage-S | 1x | 47.8 | 43.3 | 0.40s / iter | 10h | 69M | 340G | [config](./mask_rcnn_internimage_s_fpn_1x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_1x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_1x_coco.log.json) |
| InternImage-S | 3x | 49.7 | 44.5 | 0.40s / iter | 30h | 69M | 340G | [config](./mask_rcnn_internimage_s_fpn_3x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_3x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_3x_coco.log.json) |
| InternImage-B | 1x | 48.8 | 44.0 | 0.45s / iter | 11.5h | 115M | 501G | [config](./mask_rcnn_internimage_b_fpn_1x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_1x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_1x_coco.log.json) |
| InternImage-B | 3x | 50.3 | 44.8 | 0.45s / iter | 34h | 115M | 501G | [config](./mask_rcnn_internimage_b_fpn_3x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_3x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_3x_coco.log.json) |
- Training speed is measured with A100 GPUs using the current code, so it may be faster than the speed recorded in the logs.
- Some logs are from our recently retrained models; results in these logs may differ slightly from those reported in our paper.
......@@ -31,22 +29,21 @@ Based on community feedback, in 2017 the training/validation split was changed f
### Cascade Mask R-CNN + InternImage
| backbone | schd | box mAP | mask mAP | train speed | train time | #param | FLOPs | Config | Download |
| :------------: | :--: | :-----: | :------: | :----------: | :--------: | :----: | :---: | :-----------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-L | 1x | 54.9 | 47.7 | 0.73s / iter | 18h | 277M | 1399G | [config](./cascade_internimage_l_fpn_1x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_l_fpn_1x_coco.pth) |
| InternImage-L | 3x | 56.1 | 48.5 | 0.79s / iter | 15h (4n) | 277M | 1399G | [config](./cascade_internimage_l_fpn_3x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_l_fpn_3x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_l_fpn_3x_coco.log.json) |
| InternImage-XL | 1x | 55.3 | 48.1 | 0.82s / iter | 21h | 387M | 1782G | [config](./cascade_internimage_xl_fpn_1x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_xl_fpn_1x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_xl_fpn_1x_coco.log.json) |
| InternImage-XL | 3x | 56.2 | 48.8 | 0.91s / iter | 17h (4n) | 387M | 1782G | [config](./cascade_internimage_xl_fpn_3x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_xl_fpn_3x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_xl_fpn_3x_coco.log.json) |
- Training speed is measured with A100 GPUs using the current code, so it may be faster than the speed recorded in the logs.
- Some logs are from our recently retrained models; results in these logs may differ slightly from those reported in our paper.
- Please set `with_cp=True` to save memory if you encounter out-of-memory issues.
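`with_cp` enables gradient checkpointing inside the backbone, trading recomputation for activation memory. The idea, sketched with `torch.utils.checkpoint` on a toy block (this is not InternImage's actual block implementation):

```python
# Toy residual block gating gradient checkpointing the way a with_cp flag does.
import torch
from torch.utils.checkpoint import checkpoint


class ToyBlock(torch.nn.Module):
    def __init__(self, dim, with_cp=False):
        super().__init__()
        self.with_cp = with_cp
        self.mlp = torch.nn.Sequential(torch.nn.Linear(dim, dim), torch.nn.GELU())

    def forward(self, x):
        if self.with_cp and x.requires_grad:
            # do not store intermediate activations; recompute them in backward
            return x + checkpoint(self.mlp, x, use_reentrant=False)
        return x + self.mlp(x)
```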
### DINO + InternImage
| backbone | lr type | pretrain | schd | box mAP | train time | #param | Config | Download |
| :-----------: | :--------------: | :----------: | :--: | :-----: | :--------: | :----: | :---------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T | layer-wise lr | ImageNet-1K | 1x | 53.9 | 9.5h | 49M | [config](./dino_4scale_internimage_t_1x_coco_layer_wise_lr.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/dino_4scale_internimage_t_1x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/dino_4scale_internimage_t_1x_coco.json) |
| InternImage-L | layer-wise lr | ImageNet-22K | 1x | 57.5 | 18h | 241M | [config](./dino_4scale_internimage_l_1x_coco_layer_wise_lr.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/dino_4scale_internimage_l_1x_coco_layer_wise_lr.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/dino_4scale_internimage_l_1x_coco_layer_wise_lr.log.json) |
| InternImage-L | 0.1x backbone lr | ImageNet-22K | 1x | 57.6 | 18h | 241M | [config](./dino_4scale_internimage_l_1x_coco_0.1x_backbone_lr.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/dino_4scale_internimage_l_1x_coco_0.1x_backbone_lr.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/dino_4scale_internimage_l_1x_coco_0.1x_backbone_lr.log.json) |
......@@ -106,4 +106,4 @@ checkpoint_config = dict(
interval=1,
max_keep_ckpts=3,
save_last=True,
)
......@@ -106,4 +106,4 @@ checkpoint_config = dict(
interval=1,
max_keep_ckpts=3,
save_last=True,
)
......@@ -175,4 +175,4 @@ checkpoint_config = dict(
interval=1,
max_keep_ckpts=3,
save_last=True,
)
......@@ -174,4 +174,4 @@ checkpoint_config = dict(
interval=1,
max_keep_ckpts=3,
save_last=True,
)
......@@ -174,4 +174,4 @@ checkpoint_config = dict(
interval=1,
max_keep_ckpts=3,
save_last=True,
)
......@@ -46,4 +46,4 @@ checkpoint_config = dict(
interval=1,
max_keep_ckpts=3,
save_last=True,
)
......@@ -89,4 +89,4 @@ checkpoint_config = dict(
interval=1,
max_keep_ckpts=3,
save_last=True,
)
......@@ -46,4 +46,4 @@ checkpoint_config = dict(
interval=1,
max_keep_ckpts=3,
save_last=True,
)
......@@ -89,4 +89,4 @@ checkpoint_config = dict(
interval=1,
max_keep_ckpts=3,
save_last=True,
)
......@@ -46,4 +46,4 @@ checkpoint_config = dict(
interval=1,
max_keep_ckpts=3,
save_last=True,
)
......@@ -47,4 +47,4 @@ checkpoint_config = dict(
interval=1,
max_keep_ckpts=3,
save_last=True,
)
......@@ -89,4 +89,4 @@ checkpoint_config = dict(
interval=1,
max_keep_ckpts=3,
save_last=True,
)
# CrowdHuman
## Introduction
Introduced by Shao et al. in [CrowdHuman: A Benchmark for Detecting Human in a Crowd](https://arxiv.org/pdf/1805.00123.pdf)
......@@ -8,6 +7,7 @@ Introduced by Shao et al. in [CrowdHuman: A Benchmark for Detecting Human in a C
CrowdHuman is a benchmark dataset for better evaluating detectors in crowd scenarios. The CrowdHuman dataset is large, richly annotated, and highly diverse. It contains 15,000, 4,370, and 5,000 images for training, validation, and testing, respectively. There are a total of 470K human instances in the train and validation subsets, with an average of 23 persons per image and various kinds of occlusions. Each human instance is annotated with a head bounding box, a human visible-region bounding box, and a human full-body bounding box. We hope the dataset will serve as a solid baseline and help promote future research in human detection tasks.
## Prepare the data
Download the original dataset from [CrowdHuman](https://www.crowdhuman.org/download.html), then convert the annotations with `detection/tools/create_crowd_anno.py`.
- The data tree of CrowdHuman should look like:
......@@ -25,16 +25,15 @@ Download the original dataset from [CrowdHuman](https://www.crowdhuman.org/downl
├── 1074488,79d54000c6f9d9e5.jpg
└── ...
```
## Model Zoo
### Cascade Mask R-CNN + InternImage
| backbone | schd | box mAP | mask mAP | train speed | train time | #param | FLOPs | Config | Download |
| :------------: | :--: | :-----: | :------: | :---------: | :--------: | :----: | :---: | :------------------------------------------------------: | :------: |
| InternImage-XL | 3x | TBD | TBD | TBD | TBD | TBD | TBD | [config](./cascade_internimage_xl_fpn_3x_crowd_human.py) | TBD |
- Training speed is measured with A100 GPUs using the current code, so it may be faster than the speed recorded in the logs.
- Some logs are from our recently retrained models; results in these logs may differ slightly from those reported in our paper.