# InternImage for Image Classification

This folder contains the implementation of InternImage for image classification.

<!-- TOC -->

- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Released Models](#released-models)
- [Evaluation](#evaluation)
- [Training from Scratch on ImageNet-1K](#training-from-scratch-on-imagenet-1k)
- [Manage Jobs with Slurm](#manage-jobs-with-slurm)
- [Training with DeepSpeed](#training-with-deepspeed)
- [Extracting Intermediate Features](#extracting-intermediate-features)
- [Export](#export)

<!-- TOC -->

## Installation

- Clone this repository:

```bash
git clone https://github.com/OpenGVLab/InternImage.git
cd InternImage
```

- Create a conda virtual environment and activate it:

```bash
conda create -n internimage python=3.9
conda activate internimage
```

- Install `CUDA>=10.2` with `cudnn>=7` following
  the [official installation instructions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
- Install `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2`:

For example, to install `torch==1.11` with `CUDA==11.3`:

```bash
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113  -f https://download.pytorch.org/whl/torch_stable.html
```

- Install `timm==0.6.11`, `mmcv-full==1.5.0`, and `mmdet==2.28.1`:

```bash
pip install -U openmim
mim install mmcv-full==1.5.0
pip install timm==0.6.11 mmdet==2.28.1
```

- Install other requirements:

```bash
pip install opencv-python termcolor yacs pyyaml scipy
# Please use a version of numpy lower than 2.0
pip install numpy==1.26.4
pip install pydantic==1.10.13
```
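As a quick sanity check after installation, the pinned versions above can be compared with what is actually installed. The sketch below is only an illustration: the `PINS` table and the helper names are hypothetical and not part of this repository, and it relies only on the standard-library `importlib.metadata`.

```python
from importlib import metadata

# Hypothetical pin table, copied from the install commands above.
PINS = {
    "timm": "0.6.11",
    "mmcv-full": "1.5.0",
    "mmdet": "2.28.1",
    "pydantic": "1.10.13",
}


def major_version(version):
    """Return the leading integer of a version string such as '1.26.4'."""
    return int(version.split(".")[0])


def check_environment(installed_version=metadata.version):
    """Return a list of human-readable problems (empty if everything matches)."""
    problems = []
    for name, wanted in PINS.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}=={found} found, expected {wanted}")
    # numpy is not pinned exactly; it only has to be lower than 2.0.
    try:
        numpy_version = installed_version("numpy")
    except metadata.PackageNotFoundError:
        problems.append("numpy is not installed (expected <2.0)")
    else:
        if major_version(numpy_version) >= 2:
            problems.append(f"numpy=={numpy_version} found, expected <2.0")
    return problems
```

Calling `check_environment()` inside the activated `internimage` environment returns an empty list when every pin is satisfied.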

- Compile the CUDA operators:

Before compiling, please use the `nvcc -V` command to check whether your `nvcc` version matches the CUDA version of PyTorch.

```bash
cd ./ops_dcnv3
sh ./make.sh
# unit test (every check should report True)
python test.py
```
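The `nvcc` version check recommended before compiling can also be scripted. This is a minimal sketch, assuming `nvcc -V` prints a line containing `release X.Y` and that PyTorch exposes its CUDA version as `torch.version.cuda`; the helper names are hypothetical and not part of this repository.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract the CUDA release (e.g. '11.3') from the text printed by `nvcc -V`."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a 'release X.Y' entry in the nvcc output")
    return match.group(1)


def versions_match(nvcc_output, torch_cuda_version):
    """Compare the nvcc release with the CUDA version PyTorch was built with."""
    return parse_nvcc_release(nvcc_output) == torch_cuda_version


def report():
    """Run `nvcc -V` and print whether it matches the installed PyTorch build."""
    import torch  # imported lazily so the helpers above work without PyTorch

    output = subprocess.run(
        ["nvcc", "-V"], capture_output=True, text=True, check=True
    ).stdout
    print("nvcc release:", parse_nvcc_release(output))
    print("torch.version.cuda:", torch.version.cuda)
    print("match:", versions_match(output, torch.version.cuda))
```

Run `report()` in the environment where you plan to build `ops_dcnv3`; a mismatch is a hint to install a matching CUDA toolkit or PyTorch build first.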

- You can also install the operator using the precompiled `.whl` files:
  [DCNv3-1.0-whl](https://github.com/OpenGVLab/InternImage/releases/tag/whl_files)

## Data Preparation

We provide the following ways to prepare data:

<details open>
  <summary>Standard ImageNet-1K</summary>

We use the standard ImageNet dataset, which you can download from http://image-net.org/.

- For the standard folder dataset, move the validation images into labeled sub-folders. The file structure should look like:

  ```bash
  $ tree data
  imagenet
  ├── train
  │   ├── class1
  │   │   ├── img1.jpeg
  │   │   ├── img2.jpeg
  │   │   └── ...
  │   ├── class2
  │   │   ├── img3.jpeg
  │   │   └── ...
  │   └── ...
  └── val
      ├── class1
      │   ├── img4.jpeg
      │   ├── img5.jpeg
      │   └── ...
      ├── class2
      │   ├── img6.jpeg
      │   └── ...
      └── ...
  ```
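Moving the validation images into labeled sub-folders can be scripted. The following is a minimal sketch, assuming you already have a mapping from each validation file name to its class folder name; the function name and the `filename_to_class` argument are hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path


def sort_val_images(val_dir, filename_to_class):
    """Move each image in `val_dir` into a sub-folder named after its class.

    `filename_to_class` maps a file name (e.g. 'img4.jpeg') to a class folder
    name (e.g. 'class1'). Files missing from the mapping are left untouched.
    Returns the number of files moved.
    """
    val_dir = Path(val_dir)
    moved = 0
    # Materialize the listing first because the loop modifies the directory.
    for image_path in sorted(val_dir.iterdir()):
        if not image_path.is_file():
            continue  # skip class sub-folders that already exist
        class_name = filename_to_class.get(image_path.name)
        if class_name is None:
            continue
        target_dir = val_dir / class_name
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(image_path), str(target_dir / image_path.name))
        moved += 1
    return moved
```

For example, `sort_val_images("data/imagenet/val", {"img4.jpeg": "class1"})` would move `img4.jpeg` into `data/imagenet/val/class1/`; the path is a placeholder.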

</details>

<details>
  <summary>Zipped ImageNet-1K</summary>

- To speed up reading from a massive number of small image files, we also support zipped ImageNet, which consists of
  four files:

  - `train.zip`, `val.zip`: the zipped folders for the train and validation splits.
  - `train.txt`, `val.txt`: the relative path of each image in the corresponding zip file and its ground-truth
    label.

  Make sure the data folder looks like this:

  ```bash
  $ tree data
  data
  └── ImageNet-Zip
      ├── train_map.txt
      ├── train.zip
      ├── val_map.txt
      └── val.zip

  $ head -n 5 meta_data/val.txt
  ILSVRC2012_val_00000001.JPEG	65
  ILSVRC2012_val_00000002.JPEG	970
  ILSVRC2012_val_00000003.JPEG	230
  ILSVRC2012_val_00000004.JPEG	809
  ILSVRC2012_val_00000005.JPEG	516

  $ head -n 5 meta_data/train.txt
  n01440764/n01440764_10026.JPEG	0
  n01440764/n01440764_10027.JPEG	0
  n01440764/n01440764_10029.JPEG	0
  n01440764/n01440764_10040.JPEG	0
  n01440764/n01440764_10042.JPEG	0
  ```
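To illustrate how the tab-separated label files pair with the zip archives, here is a minimal sketch that looks up one sample. It assumes the path in each line is exactly the member name inside the zip file; the helper names are hypothetical, and this is not the loader used by this repository.

```python
import zipfile


def parse_map_line(line):
    """Split one 'relative/path<TAB>label' line into (path, int_label)."""
    path, label = line.rstrip("\n").split("\t")
    return path, int(label)


def read_sample(zip_path, map_path, index):
    """Return (image_bytes, label) for the `index`-th entry of the label file."""
    with open(map_path, "r", encoding="utf-8") as f:
        entries = [parse_map_line(line) for line in f if line.strip()]
    path, label = entries[index]
    # Read a single member without extracting the whole archive.
    with zipfile.ZipFile(zip_path) as archive:
        image_bytes = archive.read(path)
    return image_bytes, label
```

The returned bytes can then be decoded with any image library, for example `cv2.imdecode` from the `opencv-python` package installed earlier.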

</details>

<details>
  <summary>ImageNet-22K</summary>

- For the ImageNet-22K dataset, make a folder named `fall11_whole` and move all images to labeled sub-folders in this
  folder. Then download the train-val split
  files ([ILSVRC2011fall_whole_map_train.txt](https://github.com/SwinTransformer/storage/releases/download/v2.0.1/ILSVRC2011fall_whole_map_train.txt)
  and [ILSVRC2011fall_whole_map_val.txt](https://github.com/SwinTransformer/storage/releases/download/v2.0.1/ILSVRC2011fall_whole_map_val.txt))
  and put them in the parent directory of `fall11_whole`. The file structure should look like:

  ```bash
  $ tree imagenet22k/
  imagenet22k/
  ├── ILSVRC2011fall_whole_map_train.txt
  ├── ILSVRC2011fall_whole_map_val.txt
  └── fall11_whole
      ├── n00004475
      ├── n00005787
      ├── n00006024
      ├── n00006484
      └── ...
  ```

</details>

## Released Models

<details open>
<summary> Open-Source Visual Pretrained Models </summary>
<br>
<div>

|      name      |   pretrain   | pre-training  resolution | #param |                                               download                                                |
| :------------: | :----------: | :----------------------: | :----: | :---------------------------------------------------------------------------------------------------: |
| InternImage-L  | ImageNet-22K |         384x384          |  223M  |   [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22k_192to384.pth)    |
| InternImage-XL | ImageNet-22K |         384x384          |  335M  |   [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22k_192to384.pth)   |
| InternImage-H  |  Joint 427M  |         384x384          | 1.08B  |  [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_jointto22k_384.pth)   |
| InternImage-G  |      -       |         384x384          |   3B   | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_pretrainto22k_384.pth) |

</div>

</details>

<details open>
<summary> ImageNet-1K Image Classification </summary>
<br>
<div>

|      name      |   pretrain   | resolution | acc@1 | #param | FLOPs |                                                                              download                                                                               |
| :------------: | :----------: | :--------: | :---: | :----: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T  | ImageNet-1K  |  224x224   | 83.5  |  30M   |  5G   |       [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_t_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_t_1k_224.yaml)       |
| InternImage-S  | ImageNet-1K  |  224x224   | 84.2  |  50M   |  8G   |       [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_s_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_s_1k_224.yaml)       |
| InternImage-B  | ImageNet-1K  |  224x224   | 84.9  |  97M   |  16G  |       [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_b_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_b_1k_224.yaml)       |
| InternImage-L  | ImageNet-22K |  384x384   | 87.7  |  223M  | 108G  |  [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22kto1k_384.pth) \| [cfg](configs/without_lr_decay/internimage_l_22kto1k_384.yaml)  |
| InternImage-XL | ImageNet-22K |  384x384   | 88.0  |  335M  | 163G  | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22kto1k_384.pth) \| [cfg](configs/without_lr_decay/internimage_xl_22kto1k_384.yaml) |
| InternImage-H  |  Joint 427M  |  640x640   | 89.6  | 1.08B  | 1478G |  [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22kto1k_640.pth) \| [cfg](configs/without_lr_decay/internimage_h_22kto1k_640.yaml)  |
| InternImage-G  |      -       |  512x512   | 90.1  |   3B   | 2700G |  [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_22kto1k_512.pth) \| [cfg](configs/without_lr_decay/internimage_g_22kto1k_512.yaml)  |

</div>

</details>

## Evaluation

To evaluate a pretrained `InternImage` on the ImageNet validation set, run:

```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py --eval \
--cfg <config-file> --resume <checkpoint> --data-path <imagenet-path>
```

For example, to evaluate `InternImage-B` with a single GPU:

```bash
python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval \
--cfg configs/internimage_b_1k_224.yaml --resume internimage_b_1k_224.pth --data-path <imagenet-path>
```
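If you launch evaluations from a script rather than the shell, the same command can be assembled programmatically. This is a minimal sketch that only reuses the flags shown above; `build_eval_command` is a hypothetical helper and `data/imagenet` is a placeholder path.

```python
import shlex


def build_eval_command(num_gpus, cfg, checkpoint, data_path, master_port=12345):
    """Assemble the evaluation command shown above as an argument list."""
    return [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(num_gpus),
        "--master_port", str(master_port),
        "main.py", "--eval",
        "--cfg", cfg,
        "--resume", checkpoint,
        "--data-path", data_path,
    ]


command = build_eval_command(
    num_gpus=1,
    cfg="configs/internimage_b_1k_224.yaml",
    checkpoint="internimage_b_1k_224.pth",
    data_path="data/imagenet",  # placeholder; replace with your ImageNet directory
)
print(shlex.join(command))
```

The resulting list can be passed directly to `subprocess.run(command, check=True)`, which avoids shell-quoting issues with unusual paths.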

## Training from Scratch on ImageNet-1K

> The paper results were obtained from models trained with configs in `configs/without_lr_decay`.

To train an `InternImage` on ImageNet from scratch, run:

```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py \
--cfg <config-file> --data-path <imagenet-path> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
```

## Manage Jobs with Slurm

For example, to train or evaluate `InternImage` with 8 GPUs on a single node, run:

`InternImage-T`:

```bash
# Train for 300 epochs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml
# Evaluate on ImageNet-1K
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --resume pretrained/internimage_t_1k_224.pth --eval
```

`InternImage-S`:

```bash
# Train for 300 epochs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml
# Evaluate on ImageNet-1K
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --resume pretrained/internimage_s_1k_224.pth --eval
```

`InternImage-XL`:

```bash
# Train for 300 epochs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_xl_22kto1k_384.yaml
# Evaluate on ImageNet-1K
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_xl_22kto1k_384.yaml --resume pretrained/internimage_xl_22kto1k_384.pth --eval
```

<!--
### Test pretrained model on ImageNet-22K

For example, to evaluate the `InternImage-L-22k`:

```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py \
--cfg configs/internimage_xl_22k_192to384.yaml --data-path <imagenet-path> [--batch-size <batch-size-per-gpu> --output <output-directory>] \
--resume internimage_xl_22k_192to384.pth --eval
``` -->

<!-- ### Fine-tuning from a ImageNet-22K pretrained model

For example, to fine-tune an `InternImage-XL-22k` model pretrained on ImageNet-22K:

```bash
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_image_.yaml --pretrained intern_image_b.pth --eval
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345  main.py \
--cfg configs/.yaml --pretrained swin_base_patch4_window7_224_22k.pth \
--data-path <imagenet-path> --batch-size 64 --accumulation-steps 2 [--use-checkpoint]
``` -->

## Training with DeepSpeed

We support [DeepSpeed](https://github.com/microsoft/DeepSpeed) to reduce memory costs when training large-scale models, e.g. InternImage-H with over 1 billion parameters.

To use it, first install the requirements:

```bash
pip install deepspeed==0.8.3
```

Then you can launch training on a Slurm system with 8 GPUs as follows (using the tiny and huge models as examples).
The default ZeRO stage is 1; it can be configured via the command-line argument `--zero-stage`.

```bash
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 128 --accumulation-steps 4
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 128 --accumulation-steps 4 --eval --resume ckpt.pth
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 128 --accumulation-steps 4 --eval --resume deepspeed_ckpt_dir
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh <partition> <job-name> configs/internimage_h_22kto1k_640.yaml --batch-size 16 --accumulation-steps 4 --pretrained pretrained/internimage_h_jointto22k_384.pth
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh <partition> <job-name> configs/internimage_h_22kto1k_640.yaml --batch-size 16 --accumulation-steps 4 --pretrained pretrained/internimage_h_jointto22k_384.pth --zero-stage 3
```

🤗 **HuggingFace Accelerate Integration of DeepSpeed**

Optionally, you can use DeepSpeed through our [HuggingFace Accelerate](https://github.com/huggingface/accelerate) integration.

```bash
pip install accelerate==0.18.0
```

```bash
accelerate launch --config_file configs/accelerate/dist_8gpus_zero3_wo_loss_scale.yaml main_accelerate.py --cfg configs/internimage_h_22kto1k_640.yaml --data-path data/imagenet --batch-size 16 --pretrained pretrained/internimage_h_jointto22k_384.pth --accumulation-steps 4
accelerate launch --config_file configs/accelerate/dist_8gpus_zero3_offload.yaml main_accelerate.py --cfg configs/internimage_t_1k_224.yaml --data-path data/imagenet --batch-size 128 --accumulation-steps 4 --output output_zero3_offload
accelerate launch --config_file configs/accelerate/dist_8gpus_zero1.yaml main_accelerate.py --cfg configs/internimage_t_1k_224.yaml --data-path data/imagenet --batch-size 128 --accumulation-steps 4
```

**Memory Costs**

Here is the reference GPU memory cost for InternImage-H with 8 GPUs.

- Total batch size = 512 (batch size of 16 per GPU × 8 GPUs × 4 gradient accumulation steps).

| Resolution | ZeRO Stage | CPU Offloading | Memory |
| :--------: | :--------: | :------------: | :----: |
|    640     |   zero1    |     False      | 22572  |
|    640     |   zero3    |     False      | 20000  |
|    640     |   zero3    |      True      | 19144  |
|    384     |   zero1    |     False      | 16000  |
|    384     |   zero3    |      True      | 11928  |
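The total batch size quoted for these measurements follows from the per-GPU batch size, the number of GPUs, and the gradient accumulation steps. A small worked example (the function name is only illustrative):

```python
def total_batch_size(per_gpu_batch_size, num_gpus, accumulation_steps):
    """Number of samples that contribute to one optimizer step."""
    return per_gpu_batch_size * num_gpus * accumulation_steps


# InternImage-H setting used for the memory table: 16 per GPU, 8 GPUs, 4 steps.
print(total_batch_size(16, 8, 4))   # 512
# InternImage-T example commands: --batch-size 128 --accumulation-steps 4 on 8 GPUs.
print(total_batch_size(128, 8, 4))  # 4096
```

When you change the number of GPUs, adjusting `--batch-size` or `--accumulation-steps` so that this product stays the same keeps the optimization setting comparable.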

**Convert Checkpoints**

To convert a DeepSpeed checkpoint to a PyTorch fp32 checkpoint, you can use the following snippet:

```python
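from deepspeed.utils.zero_to_fp32 import convert_zero_checkpoint_to_fp32_state_dict

# `checkpoint_dir` is the directory that holds the DeepSpeed checkpoint; `tag` selects the sub-folder to convert.
convert_zero_checkpoint_to_fp32_state_dict(checkpoint_dir, 'best.pth', tag='best')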
```

Then you can use `best.pth` as usual, e.g., `model.load_state_dict(torch.load('best.pth'))`.

> Due to limited computational resources, the DeepSpeed training scripts have so far been verified only for the first few epochs. Please file an issue if you have problems reproducing the full training run.

## Extracting Intermediate Features

To extract the features of an intermediate layer, you can use `extract_feature.py`.

For example, to extract the features of `b.png` from the layers `patch_embed` and `levels.0.downsample` and save them to `b.pth`, run:

```bash
python extract_feature.py --cfg configs/internimage_t_1k_224.yaml --img b.png --keys patch_embed levels.0.downsample --save --resume internimage_t_1k_224.pth
```

## Export

First, install `mmdeploy`:

```shell
pip
```

To export `InternImage-T` from PyTorch to ONNX, run:

```shell
python export.py --model_name internimage_t_1k_224 --ckpt_dir /path/to/ckpt/dir --onnx
```

To export `InternImage-T` from PyTorch to TensorRT, run:

```shell
git clone https://github.com/open-mmlab/mmdeploy.git
cd mmdeploy && git checkout v0.13.0
export MMDEPLOY_DIR=$(pwd)
# prepare our custom ops, you can find it at InternImage/tensorrt/modulated_deform_conv_v3
cp -r ../../tensorrt/modulated_deform_conv_v3 csrc/mmdeploy/backend_ops/tensorrt/

# build custom ops
mkdir -p build && cd build
cmake -DCMAKE_CXX_COMPILER=g++ -DMMDEPLOY_TARGET_BACKENDS=trt -DTENSORRT_DIR=${TENSORRT_DIR} -DCUDNN_DIR=${CUDNN_DIR} ..
make -j$(nproc) && make install

# install mmdeploy after building the custom ops (run from the mmdeploy root, not from build/)
cd ${MMDEPLOY_DIR}
pip install -e .
cd ../

python export.py --model_name internimage_t_1k_224 --ckpt_dir /path/to/ckpt/dir --trt
```