Commit 41b18fd8 authored by zhe chen

Use pre-commit to reformat code

parent ff20ea39
......@@ -4,17 +4,15 @@
# Licensed under The MIT License [see LICENSE for details]
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import, division, print_function
import math
import time
import torch
import torch.nn as nn
import math
from torch.autograd import gradcheck
from functions.dcnv3_func import DCNv3Function, dcnv3_core_pytorch
from torch.autograd import gradcheck
H_in, W_in = 8, 8
N, M, D = 2, 4, 16
......
......@@ -35,7 +35,7 @@ def build_optimizer(config, model):
optimizer = None
use_zero = config.TRAIN.OPTIMIZER.USE_ZERO
if use_zero:
print(f"\nUse Zero!")
print(f'\nUse Zero!')
if opt_lower == 'sgd':
# an ugly implementation
# this problem is fixed after torch 1.12
......@@ -119,7 +119,7 @@ def set_weight_decay_and_lr(
if f'levels.{i}' in name:
param.requires_grad = False
# 1. check wd
if len(param.shape) == 1 or name.endswith(".bias") or (
if len(param.shape) == 1 or name.endswith('.bias') or (
name in skip_list) or check_keywords_in_name(
name, skip_keywords):
wd = 0.
......
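The condition in the hunk above decides which parameters are excluded from weight decay. As a standalone sketch of the same rule (with `check_keywords_in_name` assumed to be a simple substring test; the helper names here are illustrative, not the repo's exact API):

```python
def check_keywords_in_name(name, keywords=()):
    """True if any keyword occurs in the parameter name."""
    return any(keyword in name for keyword in keywords)

def uses_zero_weight_decay(name, shape, skip_list=(), skip_keywords=()):
    """Mirror the check above: 1-D tensors (norms), biases, explicitly
    skipped names, and keyword matches get zero weight decay."""
    return (len(shape) == 1
            or name.endswith('.bias')
            or name in skip_list
            or check_keywords_in_name(name, skip_keywords))
```

For example, `uses_zero_weight_decay('levels.0.norm.weight', (64,))` is true because the parameter is 1-D, while a 2-D `head.weight` keeps the regular weight decay.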
......@@ -4,13 +4,15 @@
# Licensed under The MIT License [see LICENSE for details]
# --------------------------------------------------------
import os
import math
import torch
import os
from collections import OrderedDict
import numpy as np
import torch
import torch.distributed as dist
from collections import OrderedDict
from timm.utils import get_state_dict
try:
# noinspection PyUnresolvedReferences
from apex import amp
......
......@@ -4,7 +4,6 @@ This folder contains the implementation of the InternImage for object detection.
Our detection code is developed on top of [MMDetection v2.28.1](https://github.com/open-mmlab/mmdetection/tree/v2.28.1).
## Usage
### Install
......@@ -28,6 +27,7 @@ conda activate internimage
- Install `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2`:
For example, to install torch==1.11 with CUDA==11.3:
```bash
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
```
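One quick way to confirm an installed build satisfies the lower bounds above (`PyTorch>=1.10.0`, `torchvision>=0.9.0`) is to compare dotted version strings numerically; a minimal stdlib sketch (the helper name is ours, not part of the repo):

```python
def meets_minimum(installed: str, minimum: str) -> bool:
    """Numeric comparison of dotted version strings; local build tags
    such as '+cu113' are stripped before comparing."""
    parse = lambda v: [int(part) for part in v.split('+')[0].split('.')]
    return parse(installed) >= parse(minimum)

# The build installed above satisfies the PyTorch>=1.10.0 requirement:
print(meets_minimum('1.11.0+cu113', '1.10.0'))  # True
```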
......@@ -47,12 +47,14 @@ pip install opencv-python termcolor yacs pyyaml scipy
```
- Compile CUDA operators
```bash
cd ./ops_dcnv3
sh ./make.sh
# unit test (all checks should print True)
python test.py
```
- Alternatively, install the operator from the prebuilt `.whl` files:
[DCNv3-1.0-whl](https://github.com/OpenGVLab/InternImage/releases/tag/whl_files)
......@@ -61,7 +63,6 @@ python test.py
Prepare COCO according to the guidelines in [MMDetection v2.28.1](https://github.com/open-mmlab/mmdetection/blob/master/docs/en/1_exist_data_model.md).
### Evaluation
To evaluate our `InternImage` on COCO val, run:
......@@ -107,6 +108,7 @@ GPUS=32 sh slurm_train.sh <partition> <job-name> configs/coco/cascade_internimag
### Export
To export a detection model from PyTorch to TensorRT, run:
```shell
MODEL="model_name"
CKPT_PATH="/path/to/model/ckpt.pth"
......@@ -122,6 +124,7 @@ python deploy.py \
```
For example, to export `mask_rcnn_internimage_t_fpn_1x_coco` from PyTorch to TensorRT, run:
```shell
MODEL="mask_rcnn_internimage_t_fpn_1x_coco"
CKPT_PATH="/path/to/model/ckpt/mask_rcnn_internimage_t_fpn_1x_coco.pth"
......
# COCO
## Introduction
Introduced by Lin et al. in [Microsoft COCO: Common Objects in Context](https://arxiv.org/pdf/1405.0312v3.pdf)
......@@ -11,19 +10,18 @@ Splits: The first version of MS COCO dataset was released in 2014. It contains 1
Based on community feedback, in 2017 the training/validation split was changed from 83K/41K to 118K/5K. The new split uses the same images and annotations. The 2017 test set is a subset of 41K images of the 2015 test set. Additionally, the 2017 release contains a new unannotated dataset of 123K images.
## Model Zoo
### Mask R-CNN + InternImage
| backbone | schd | box mAP | mask mAP | train speed | train time |#param | FLOPs | Config | Download |
| :------------: | :---------: | :-----: | :------: | :-----: |:------: | :-----: |:------: | :-----: | :---: |
| backbone | schd | box mAP | mask mAP | train speed | train time | #param | FLOPs | Config | Download |
| :-----------: | :--: | :-----: | :------: | :----------: | :--------: | :----: | :---: | :------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T | 1x | 47.2 | 42.5 | 0.36s / iter | 9h | 49M | 270G | [config](./mask_rcnn_internimage_t_fpn_1x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_t_fpn_1x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_t_fpn_1x_coco.log.json) |
| InternImage-T | 3x | 49.1 | 43.7 | 0.34s / iter | 26h | 49M | 270G | [config](./mask_rcnn_internimage_t_fpn_3x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_t_fpn_3x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_t_fpn_3x_coco.log.json) |
| InternImage-S | 1x | 47.8 | 43.3 | 0.40s / iter | 10h | 69M | 340G | [config](./mask_rcnn_internimage_s_fpn_1x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_1x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_1x_coco.log.json) |
| InternImage-S | 3x | 49.7 | 44.5 | 0.40s / iter | 30h | 69M | 340G | [config](./mask_rcnn_internimage_s_fpn_3x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_3x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_3x_coco.log.json) |
| InternImage-B | 1x | 48.8 | 44.0 | 0.45s / iter | 11.5h | 115M | 501G | [config](./mask_rcnn_internimage_b_fpn_1x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_1x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_1x_coco.log.json) |
| InternImage-B | 3x | 50.3 | 44.8 | 0.45s / iter | 34h | 115M | 501G | [config](./mask_rcnn_internimage_b_fpn_3x_coco.py)| [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_3x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_3x_coco.log.json) |
| InternImage-B | 3x | 50.3 | 44.8 | 0.45s / iter | 34h | 115M | 501G | [config](./mask_rcnn_internimage_b_fpn_3x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_3x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_b_fpn_3x_coco.log.json) |
- Training speed is measured on A100 GPUs with the current code and may be faster than the speed recorded in the logs.
- Some logs come from recently retrained models, so their results may differ slightly from those reported in our paper.
......@@ -32,7 +30,7 @@ Based on community feedback, in 2017 the training/validation split was changed f
### Cascade Mask R-CNN + InternImage
| backbone | schd | box mAP | mask mAP | train speed | train time | #param | FLOPs | Config | Download |
| :------------: | :---------: | :-----: | :------: | :-----: | :---: | :-----: | :---: | :---: | :---: |
| :------------: | :--: | :-----: | :------: | :----------: | :--------: | :----: | :---: | :-----------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-L | 1x | 54.9 | 47.7 | 0.73s / iter | 18h | 277M | 1399G | [config](./cascade_internimage_l_fpn_1x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_l_fpn_1x_coco.pth) |
| InternImage-L | 3x | 56.1 | 48.5 | 0.79s / iter | 15h (4n) | 277M | 1399G | [config](./cascade_internimage_l_fpn_3x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_l_fpn_3x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_l_fpn_3x_coco.log.json) |
| InternImage-XL | 1x | 55.3 | 48.1 | 0.82s / iter | 21h | 387M | 1782G | [config](./cascade_internimage_xl_fpn_1x_coco.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_xl_fpn_1x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/cascade_internimage_xl_fpn_1x_coco.log.json) |
......@@ -42,11 +40,10 @@ Based on community feedback, in 2017 the training/validation split was changed f
- Some logs come from recently retrained models, so their results may differ slightly from those reported in our paper.
- Please set `with_cp=True` to save memory if you encounter out-of-memory issues.
### DINO + InternImage
| backbone | lr type | pretrain | schd | box mAP | train time | #param | Config | Download |
| :------------: | :---------: |:---------: | :---------: | :-----: | :---: | :-----: | :---: | :---: |
| :-----------: | :--------------: | :----------: | :--: | :-----: | :--------: | :----: | :---------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T | layer-wise lr | ImageNet-1K | 1x | 53.9 | 9.5h | 49M | [config](./dino_4scale_internimage_t_1x_coco_layer_wise_lr.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/dino_4scale_internimage_t_1x_coco.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/dino_4scale_internimage_t_1x_coco.json) |
| InternImage-L | layer-wise lr | ImageNet-22K | 1x | 57.5 | 18h | 241M | [config](./dino_4scale_internimage_l_1x_coco_layer_wise_lr.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/dino_4scale_internimage_l_1x_coco_layer_wise_lr.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/dino_4scale_internimage_l_1x_coco_layer_wise_lr.log.json) |
| InternImage-L | 0.1x backbone lr | ImageNet-22K | 1x | 57.6 | 18h | 241M | [config](./dino_4scale_internimage_l_1x_coco_0.1x_backbone_lr.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/dino_4scale_internimage_l_1x_coco_0.1x_backbone_lr.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/resolve/main/dino_4scale_internimage_l_1x_coco_0.1x_backbone_lr.log.json) |
# CrowdHuman
## Introduction
Introduced by Shao et al. in [CrowdHuman: A Benchmark for Detecting Human in a Crowd](https://arxiv.org/pdf/1805.00123.pdf)
......@@ -8,6 +7,7 @@ Introduced by Shao et al. in [CrowdHuman: A Benchmark for Detecting Human in a C
CrowdHuman is a benchmark dataset for better evaluating detectors in crowd scenarios. The dataset is large, richly annotated, and highly diverse: it contains 15,000, 4,370, and 5,000 images for training, validation, and testing, respectively. The train and validation subsets together contain 470K human instances, with an average of 23 persons per image and various kinds of occlusion. Each human instance is annotated with a head bounding box, a visible-region bounding box, and a full-body bounding box. We hope the dataset will serve as a solid baseline and help promote future research in human detection.
## Prepare the data
Download the original dataset from [CrowdHuman](https://www.crowdhuman.org/download.html), then convert the annotations with `detection/tools/create_crowd_anno.py`.
- Data Tree of CrowdHuman should look like:
......@@ -25,16 +25,15 @@ Download the original dataset from [CrowdHuman](https://www.crowdhuman.org/downl
├── 1074488,79d54000c6f9d9e5.jpg
└── ...
```
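The downloaded annotations come as `.odgt` files, i.e. one JSON record per line. A minimal sketch of reading them (the field names `ID` and `gtboxes` follow the official CrowdHuman format; the parser function name and the sample record are ours, for illustration):

```python
import json

def parse_odgt(text):
    """Parse .odgt content: each non-empty line is one JSON record."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

sample = '{"ID": "273271,c9db000d5146c15", "gtboxes": [{"tag": "person", "fbox": [72, 202, 163, 503]}]}'
records = parse_odgt(sample)
print(records[0]['ID'])  # 273271,c9db000d5146c15
```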
## Model Zoo
### Cascade Mask R-CNN + InternImage
| backbone | schd | box mAP | mask mAP | train speed | train time | #param | FLOPs | Config | Download |
| :------------: | :---------: |:-------:|:--------:|:-----------:|:-----------:|:------:|:-----:| :---: |:--------:|
| :------------: | :--: | :-----: | :------: | :---------: | :--------: | :----: | :---: | :------------------------------------------------------: | :------: |
| InternImage-XL | 3x | TBD | TBD | TBD | TBD | TBD | TBD | [config](./cascade_internimage_xl_fpn_3x_crowd_human.py) | TBD |
- Training speed is measured on A100 GPUs with the current code and may be faster than the speed recorded in the logs.
- Some logs come from recently retrained models, so their results may differ slightly from those reported in our paper.