# InternImage for Semantic Segmentation

This folder contains the implementation of InternImage for semantic segmentation.

Our segmentation code is developed on top of [MMSegmentation v0.27.0](https://github.com/open-mmlab/mmsegmentation/tree/v0.27.0).

<!-- TOC -->

- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Released Models](#released-models)
- [Evaluation](#evaluation)
- [Training](#training)
- [Manage Jobs with Slurm](#manage-jobs-with-slurm)
- [Image Demo](#image-demo)
- [Export](#export)

<!-- TOC -->

## Installation

- Clone this repository:

```bash
git clone https://github.com/OpenGVLab/InternImage.git
cd InternImage
```

- Create a conda virtual environment and activate it:

```bash
conda create -n internimage python=3.9
conda activate internimage
```

- Install `CUDA>=10.2` with `cudnn>=7` following
  the [official installation instructions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
- Install `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2`:

For example, to install `torch==1.11` with `CUDA==11.3`:

```bash
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113  -f https://download.pytorch.org/whl/torch_stable.html
```

- Install other requirements:

  Note: installing OpenCV through conda breaks GPU support in torchvision, so install OpenCV with pip instead.

```bash
conda install -c conda-forge termcolor yacs pyyaml scipy pip -y
pip install opencv-python
```

- Install `timm`, `mmcv-full`, `mmsegmentation`, and `mmdet`:

```bash
pip install -U openmim
mim install mmcv-full==1.5.0
mim install mmsegmentation==0.27.0
pip install timm==0.6.11 mmdet==2.28.1
```

- Install the remaining requirements, including the pinned `numpy` and `pydantic` versions:

```bash
pip install opencv-python termcolor yacs pyyaml scipy
# Please use a version of numpy lower than 2.0
pip install numpy==1.26.4
pip install pydantic==1.10.13
```
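Before compiling the operators, a quick check that the pinned packages actually ended up in the active environment can save a confusing build failure later. The sketch below is illustrative only: the helper names `numpy_below_2` and `print_versions` are hypothetical, and it assumes the `internimage` environment is activated and simply reads each package's `__version__` attribute.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking the environment set up above.

# Succeed only if the version string's major component is below 2,
# matching the "numpy lower than 2.0" requirement.
numpy_below_2() {
  major=$(echo "$1" | cut -d. -f1)
  [ "$major" -lt 2 ]
}

# Print the installed version of each package the segmentation code uses.
# Assumes the `internimage` conda environment is active.
print_versions() {
  python -c '
import numpy, torch, mmcv, mmseg, mmdet, timm
for mod in (numpy, torch, mmcv, mmseg, mmdet, timm):
    print(mod.__name__, mod.__version__)
'
}

# Example usage:
#   print_versions
#   numpy_below_2 "$(python -c "import numpy; print(numpy.__version__)")" \
#     || echo "numpy >= 2.0 found; reinstall with: pip install numpy==1.26.4"
```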

- Compile the CUDA operators:

Before compiling, run `nvcc -V` to check that your `nvcc` version matches the CUDA version PyTorch was built with.
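One way to run this comparison in a single step is sketched below. It is a minimal sketch, not part of the repository: the function names are hypothetical, it parses the `release X.Y` field that `nvcc -V` prints, and it compares that against `torch.version.cuda`, assuming both `nvcc` and PyTorch are available in the active environment.

```shell
#!/bin/sh
# Hypothetical helpers: compare the CUDA release of nvcc with the CUDA
# version PyTorch was built against.

# Read `nvcc -V` output on stdin and print its "MAJOR.MINOR" release.
nvcc_cuda_version() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Trim a version string such as "11.3.1" down to "11.3".
major_minor() {
  echo "$1" | cut -d. -f1,2
}

# Return non-zero (and print a warning) when the two versions differ.
check_cuda_match() {
  nvcc_ver=$(nvcc -V | nvcc_cuda_version)
  torch_ver=$(major_minor "$(python -c 'import torch; print(torch.version.cuda)')")
  if [ "$nvcc_ver" = "$torch_ver" ]; then
    echo "OK: nvcc $nvcc_ver matches PyTorch CUDA $torch_ver"
  else
    echo "Mismatch: nvcc $nvcc_ver vs PyTorch CUDA $torch_ver" >&2
    return 1
  fi
}

# Example usage:
#   check_cuda_match
```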

```bash
cd ./ops_dcnv3
sh ./make.sh
# unit test (every check should print True)
python test.py
```

- You can also install the operator using the precompiled `.whl` files from
  [DCNv3-1.0-whl](https://github.com/OpenGVLab/InternImage/releases/tag/whl_files).

## Data Preparation

Prepare datasets according to the [guidelines](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/dataset_prepare.md#prepare-datasets) in MMSegmentation.

## Released Models

<details open>
<summary> Dataset: ADE20K </summary>
<br>
<div>

|   method    |    backbone    | resolution | mIoU (ss/ms) | #param | FLOPs |                                       Config                                        |                                                                                                                       Download                                                                                                                       |
| :---------: | :------------: | :--------: | :----------: | :----: | :---: | :---------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
|   UperNet   | InternImage-T  |  512x512   | 47.9 / 48.1  |  59M   | 944G  |         [config](./configs/ade20k/upernet_internimage_t_512_160k_ade20k.py)         |              [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512_160k_ade20k.log.json)              |
|   UperNet   | InternImage-S  |  512x512   | 50.1 / 50.9  |  80M   | 1017G |         [config](./configs/ade20k/upernet_internimage_s_512_160k_ade20k.py)         |              [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512_160k_ade20k.log.json)              |
|   UperNet   | InternImage-B  |  512x512   | 50.8 / 51.3  |  128M  | 1185G |         [config](./configs/ade20k/upernet_internimage_b_512_160k_ade20k.py)         |              [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512_160k_ade20k.log.json)              |
|   UperNet   | InternImage-L  |  640x640   | 53.9 / 54.1  |  256M  | 2526G |         [config](./configs/ade20k/upernet_internimage_l_640_160k_ade20k.py)         |              [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_640_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_640_160k_ade20k.log.json)              |
|   UperNet   | InternImage-XL |  640x640   | 55.0 / 55.3  |  368M  | 3142G |        [config](./configs/ade20k/upernet_internimage_xl_640_160k_ade20k.py)         |             [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_640_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_640_160k_ade20k.log.json)             |
|   UperNet   | InternImage-H  |  896x896   | 59.9 / 60.3  | 1.12B  | 3566G |         [config](./configs/ade20k/upernet_internimage_h_896_160k_ade20k.py)         |              [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_h_896_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_h_896_160k_ade20k.log.json)              |
| Mask2Former | InternImage-H  |  896x896   | 62.6 / 62.9  | 1.31B  | 4635G | [config](./configs/ade20k/mask2former_internimage_h_896_80k_cocostuff2ade20k_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896_80k_cocostuff2ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_896_80k_cocostuff2ade20k.log.json) |

</div>

</details>

<details>
<summary> Dataset: Cityscapes </summary>
<br>
<div>

|    method     |    backbone    | resolution | mIoU (ss/ms)  | #params | FLOPs |                                             Config                                             |                                                                                                                                 Download                                                                                                                                 |
| :-----------: | :------------: | :--------: | :-----------: | :-----: | :---: | :--------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
|    UperNet    | InternImage-T  |  512x1024  | 82.58 / 83.40 |   59M   | 1889G |        [config](./configs/cityscapes/upernet_internimage_t_512x1024_160k_cityscapes.py)        |               [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512x1024_160k_cityscapes.log.json)               |
|    UperNet    | InternImage-S  |  512x1024  | 82.74 / 83.45 |   80M   | 2035G |        [config](./configs/cityscapes/upernet_internimage_s_512x1024_160k_cityscapes.py)        |               [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512x1024_160k_cityscapes.log.json)               |
|    UperNet    | InternImage-B  |  512x1024  | 83.18 / 83.97 |  128M   | 2369G |        [config](./configs/cityscapes/upernet_internimage_b_512x1024_160k_cityscapes.py)        |               [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512x1024_160k_cityscapes.log.json)               |
|    UperNet    | InternImage-L  |  512x1024  | 83.68 / 84.41 |  256M   | 3234G |        [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_cityscapes.py)        |               [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_cityscapes.log.json)               |
|   UperNet\*   | InternImage-L  |  512x1024  | 85.94 / 86.22 |  256M   | 3234G |   [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.py)   |    [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.pth)  \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.log.json)     |
|    UperNet    | InternImage-XL |  512x1024  | 83.62 / 84.28 |  368M   | 4022G |       [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_cityscapes.py)        |              [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_cityscapes.log.json)              |
|   UperNet\*   | InternImage-XL |  512x1024  | 86.20 / 86.42 |  368M   | 4022G |  [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.py)   |    [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json)    |
|  SegFormer\*  | InternImage-L  |  512x1024  | 85.16 / 85.67 |  220M   | 1580G |  [config](./configs/cityscapes/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.py)  |   [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.log.json)   |
|  SegFormer\*  | InternImage-XL |  512x1024  | 85.41 / 85.93 |  330M   | 2364G | [config](./configs/cityscapes/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.py)  |  [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json)  |
| Mask2Former\* | InternImage-H  | 1024x1024  | 86.37 / 86.96 |  1094M  | 7878G | [config](./configs/cityscapes/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.log.json) |

\* denotes that the model is trained with the additional Mapillary dataset.

</div>

</details>

<details>
<summary> Dataset: COCO-Stuff-164K </summary>
<br>
<div>

|   method    |   backbone    | resolution | mIoU (ss) | #params | FLOPs |                                        Config                                         |                                                                                                                    Download                                                                                                                    |
| :---------: | :-----------: | :--------: | :-------: | :-----: | :---: | :-----------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| Mask2Former | InternImage-H |  896x896   |   52.6    |  1.31B  | 4635G | [config](./configs/coco_stuff164k/mask2former_internimage_h_896_80k_cocostuff164k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896_80k_cocostuff164k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_896_80k_cocostuff164k.log.json) |

</div>

</details>

<details>
<summary> Dataset: COCO-Stuff-10K </summary>
<br>
<div>

|   method    |   backbone    | resolution | mIoU (ss/ms) | #params | FLOPs |                                           Config                                            |                                                                                                                           Download                                                                                                                           |
| :---------: | :-----------: | :--------: | :----------: | :-----: | :---: | :-----------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| Mask2Former | InternImage-H |  512x512   | 59.2 / 59.6  |  1.28B  | 1528G | [config](./configs/coco_stuff10k/mask2former_internimage_h_512_40k_cocostuff164k_to_10k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_512_40k_cocostuff164k_to_10k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_512_40k_cocostuff164k_to_10k.log.json) |

</div>

</details>

<details>
<summary> Dataset: Pascal-Context-59 </summary>
<br>
<div>

|   method    |   backbone    | resolution | mIoU (ss/ms) | #param | FLOPs |                                          Config                                           |                                                                                                                        Download                                                                                                                        |
| :---------: | :-----------: | :--------: | :----------: | :----: | :---: | :---------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| Mask2Former | InternImage-H |  480x480   | 69.7 / 70.3  | 1.07B  | 867G  | [config](./configs/pascal_context/mask2former_internimage_h_480_40k_pascal_context_59.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_480_40k_pascal_context_59.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_480_40k_pascal_context_59.log.json) |

</div>

</details>

<details>
<summary> Dataset: NYU-Depth-V2 </summary>
<br>
<div>

|   method    |   backbone    | resolution | mIoU (ss/ms) | #param | FLOPs |                                  Config                                   |                                                                                                          Download                                                                                                          |
| :---------: | :-----------: | :--------: | :----------: | :----: | :---: | :-----------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| Mask2Former | InternImage-H |  480x480   | 67.1 / 68.1  | 1.07B  | 867G  | [config](./configs/nyu_depth_v2/mask2former_internimage_h_480_40k_nyu.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_480_40k_nyu.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_480_40k_nyu.log.json) |

</div>

</details>
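The download links in the tables above follow a fixed pattern on the `OpenGVLab/InternImage` Hugging Face repository: checkpoints live under `resolve/main/<name>.pth` and training logs under `raw/main/<name>.log.json`. For most rows `<name>` matches the config file name, but not always (the ADE20K Mask2Former checkpoint drops the config's `_ss` suffix), so copy the name from the link. The sketch below builds those URLs and fetches a checkpoint into `pretrained/`, the directory used in the evaluation examples; the helper names are hypothetical and it assumes `wget` is installed.

```shell
#!/bin/sh
# Hypothetical helpers for the release links listed in the tables.
HF_REPO="https://huggingface.co/OpenGVLab/InternImage"

# Checkpoint URL for a released model name.
ckpt_url() {
  echo "$HF_REPO/resolve/main/$1.pth"
}

# Training-log URL for a released model name.
log_url() {
  echo "$HF_REPO/raw/main/$1.log.json"
}

# Download a checkpoint into ./pretrained unless it is already there;
# a partial file is removed if the download fails.
fetch_ckpt() {
  mkdir -p pretrained
  dest="pretrained/$1.pth"
  if [ -f "$dest" ]; then
    echo "already present: $dest"
    return 0
  fi
  wget -O "$dest" "$(ckpt_url "$1")" || { rm -f "$dest"; return 1; }
}

# Example usage:
#   fetch_ckpt upernet_internimage_t_512_160k_ade20k
```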

## Evaluation

To evaluate our `InternImage` on ADE20K val, run:

```bash
sh dist_test.sh <config-file> <checkpoint> <gpu-num> --eval mIoU
```

For example, to evaluate the `InternImage-T` with a single GPU:

```bash
python test.py configs/ade20k/upernet_internimage_t_512_160k_ade20k.py pretrained/upernet_internimage_t_512_160k_ade20k.pth --eval mIoU
```

For example, to evaluate `InternImage-B` on a single node with 8 GPUs:

```bash
sh dist_test.sh configs/ade20k/upernet_internimage_b_512_160k_ade20k.py pretrained/upernet_internimage_b_512_160k_ade20k.pth 8 --eval mIoU
```
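In both examples the checkpoint is stored under `pretrained/` with the same base name as the config. If you follow that convention, a small wrapper can derive the checkpoint path from the config path. This is a sketch under that assumption; `ckpt_for_config` and `evaluate` are hypothetical names, not scripts shipped with the repository, and configs whose checkpoint name differs need the path passed explicitly instead.

```shell
#!/bin/sh
# Hypothetical wrapper around dist_test.sh.

# Map configs/<dataset>/<name>.py to pretrained/<name>.pth.
ckpt_for_config() {
  echo "pretrained/$(basename "$1" .py).pth"
}

# Evaluate one config with the given number of GPUs (default: 8).
evaluate() {
  cfg=$1
  gpus=${2:-8}
  sh dist_test.sh "$cfg" "$(ckpt_for_config "$cfg")" "$gpus" --eval mIoU
}

# Example usage:
#   evaluate configs/ade20k/upernet_internimage_b_512_160k_ade20k.py 8
```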

## Training

To train an `InternImage` on ADE20K, run:

```bash
sh dist_train.sh <config-file> <gpu-num>
```

For example, to train `InternImage-T` with 8 GPUs on 1 node (total batch size 16), run:

```bash
sh dist_train.sh configs/ade20k/upernet_internimage_t_512_160k_ade20k.py 8
```
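The example above pairs 8 GPUs with a total batch size of 16, i.e. 2 images per GPU. If you train on fewer GPUs and want to keep the same total batch size, the per-GPU value has to grow accordingly. The sketch below only does that arithmetic; `per_gpu_batch` is a hypothetical helper, and the usage line assumes that `dist_train.sh` forwards extra arguments to `train.py` and that `train.py` accepts MMSegmentation v0.27's `--cfg-options` override for `data.samples_per_gpu`.

```shell
#!/bin/sh
# Hypothetical helper: images per GPU needed to keep a fixed total batch size.
per_gpu_batch() {
  total=$1
  gpus=$2
  if [ $((total % gpus)) -ne 0 ]; then
    echo "total batch size $total is not divisible by $gpus GPUs" >&2
    return 1
  fi
  echo $((total / gpus))
}

# Example usage (assumes dist_train.sh forwards extra arguments to train.py):
#   sh dist_train.sh configs/ade20k/upernet_internimage_t_512_160k_ade20k.py 4 \
#     --cfg-options data.samples_per_gpu="$(per_gpu_batch 16 4)"
```

A larger per-GPU batch needs proportionally more GPU memory, so this only helps when each card has headroom.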

## Manage Jobs with Slurm

For example, to train `InternImage-XL` with 8 GPUs on 1 node (total batch size 16), run:

```bash
GPUS=8 sh slurm_train.sh <partition> <job-name> configs/ade20k/upernet_internimage_xl_640_160k_ade20k.py
```

## Image Demo

To run inference on a single image, use a command like the one below. If you pass a directory instead of a single image, the script processes every image in that directory.

```bash
CUDA_VISIBLE_DEVICES=0 python image_demo.py \
  data/ade/ADEChallengeData2016/images/validation/ADE_val_00000591.jpg \
  configs/ade20k/upernet_internimage_t_512_160k_ade20k.py  \
  checkpoint_dir/seg/upernet_internimage_t_512_160k_ade20k.pth  \
  --palette ade20k
```

## Export

First, install `mmdeploy`:

```shell
pip install mmdeploy==0.14.0
```

To export a segmentation model from PyTorch to TensorRT, run:

```shell
MODEL="model_name"
CKPT_PATH="/path/to/model/ckpt.pth"

python deploy.py \
    "./deploy/configs/mmseg/segmentation_tensorrt_static-512x512.py" \
    "./configs/ade20k/${MODEL}.py" \
    "${CKPT_PATH}" \
    "./deploy/demo.png" \
    --work-dir "./work_dirs/mmseg/${MODEL}" \
    --device cuda \
    --dump-info
```

For example, to export `upernet_internimage_t_512_160k_ade20k` from PyTorch to TensorRT, run:

```shell
MODEL="upernet_internimage_t_512_160k_ade20k"
CKPT_PATH="/path/to/model/ckpt/upernet_internimage_t_512_160k_ade20k.pth"

python deploy.py \
    "./deploy/configs/mmseg/segmentation_tensorrt_static-512x512.py" \
    "./configs/ade20k/${MODEL}.py" \
    "${CKPT_PATH}" \
    "./deploy/demo.png" \
    --work-dir "./work_dirs/mmseg/${MODEL}" \
    --device cuda \
    --dump-info
```
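After the export finishes, it can be worth confirming that the work directory contains the expected artifacts before wiring it into an inference pipeline. The sketch below is hedged: `check_export` is a hypothetical helper, and the file list assumes the repository's `deploy.py` follows mmdeploy 0.x's default naming (`end2end.onnx`, `end2end.engine`, plus `deploy.json` and `pipeline.json` from `--dump-info`); adjust the list if your work directory differs.

```shell
#!/bin/sh
# Hypothetical check of an mmdeploy work directory.
# Assumption: mmdeploy 0.x default output names; edit the list if needed.
check_export() {
  dir=$1
  missing=0
  for f in end2end.onnx end2end.engine deploy.json pipeline.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return $missing
}

# Example usage:
#   check_export "./work_dirs/mmseg/upernet_internimage_t_512_160k_ade20k"
```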