# InternImage for Image Classification

This folder contains the implementation of InternImage for image classification.

<!-- TOC -->

- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Released Models](#released-models)
- [Evaluation](#evaluation)
- [Training from Scratch on ImageNet-1K](#training-from-scratch-on-imagenet-1k)
- [Manage Jobs with Slurm](#manage-jobs-with-slurm)
- [Training with DeepSpeed](#training-with-deepspeed)
- [Extracting Intermediate Features](#extracting-intermediate-features)
- [Export](#export)

<!-- TOC -->

## Installation

- Clone this repository:

```bash
git clone https://github.com/OpenGVLab/InternImage.git
cd InternImage
```

- Create a conda virtual environment and activate it:

```bash
conda create -n internimage python=3.9
conda activate internimage
```

- Install `CUDA>=10.2` with `cudnn>=7` following
  the [official installation instructions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
- Install `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2`:

For example, to install `torch==1.11` with `CUDA==11.3`:

```bash
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
```
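
After installation, you can quickly confirm that the PyTorch build matches your CUDA toolkit. This check is a small convenience addition, not part of the original setup:

```python
# Sanity check: the reported CUDA version should match your toolkit (e.g., 11.3).
import torch

print(torch.__version__)          # e.g., 1.11.0+cu113
print(torch.version.cuda)         # e.g., 11.3
print(torch.cuda.is_available())  # should be True on a GPU machine
```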

- Install `timm==0.6.11`, `mmcv-full==1.5.0`, `mmsegmentation==0.27.0`, and `mmdet==2.28.1`:

```bash
pip install -U openmim
mim install mmcv-full==1.5.0
mim install mmsegmentation==0.27.0
pip install timm==0.6.11 mmdet==2.28.1
```

- Install other requirements:

```bash
pip install opencv-python termcolor yacs pyyaml scipy
# Please use a version of numpy lower than 2.0
pip install numpy==1.26.4
pip install pydantic==1.10.13
```

- Compile CUDA operators

Before compiling, please use the `nvcc -V` command to check whether your `nvcc` version matches the CUDA version of PyTorch.

```bash
cd ./ops_dcnv3
sh ./make.sh
# unit test (you should see all checks return True)
python test.py
```
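
After `make.sh` finishes, a quick import check confirms the extension is visible to Python. This is a minimal sketch; `DCNv3` is assumed to be the module name built by `ops_dcnv3/setup.py`:

```python
# Smoke test for the compiled operator; run in the environment used for compilation.
import torch
import DCNv3  # noqa: F401 -- raises ImportError if compilation failed

print('DCNv3 extension loaded; CUDA available:', torch.cuda.is_available())
```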

- You can also install the operator using the precompiled `.whl` files:
  [DCNv3-1.0-whl](https://github.com/OpenGVLab/InternImage/releases/tag/whl_files)

## Data Preparation

We provide the following ways to prepare data:

<details open>
  <summary>Standard ImageNet-1K</summary>

- We use the standard ImageNet dataset, which you can download from http://image-net.org/.

- For the standard folder dataset, move the validation images into labeled sub-folders. The file structure should look like:

  ```bash
  $ tree data
  imagenet
  ├── train
  │   ├── class1
  │   │   ├── img1.jpeg
  │   │   ├── img2.jpeg
  │   │   └── ...
  │   ├── class2
  │   │   ├── img3.jpeg
  │   │   └── ...
  │   └── ...
  └── val
      ├── class1
      │   ├── img4.jpeg
      │   ├── img5.jpeg
      │   └── ...
      ├── class2
      │   ├── img6.jpeg
      │   └── ...
      └── ...
  ```
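
  This layout matches the standard `torchvision.datasets.ImageFolder` structure, so you can sanity-check it with a few lines (an illustration only, independent of the training pipeline):

  ```python
  # Verify the folder layout by letting torchvision index it.
  from torchvision import datasets

  val = datasets.ImageFolder('imagenet/val')
  print(len(val.classes), 'classes,', len(val.samples), 'images')  # expect 1000 classes, 50000 images
  ```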

</details>

<details>
  <summary>Zipped ImageNet-1K</summary>

- To avoid the slow reads caused by massive numbers of small files, we also support zipped ImageNet, which consists of four files:

  - `train.zip`, `val.zip`: the zipped folders for the train and validation splits.
  - `train_map.txt`, `val_map.txt`: the relative path of each image inside the corresponding zip file, together with its
    ground-truth label. Make sure the data folder looks like this:

  ```bash
  $ tree data
  data
  └── ImageNet-Zip
      ├── train_map.txt
      ├── train.zip
      ├── val_map.txt
      └── val.zip

  $ head -n 5 meta_data/val.txt
  ILSVRC2012_val_00000001.JPEG	65
  ILSVRC2012_val_00000002.JPEG	970
  ILSVRC2012_val_00000003.JPEG	230
  ILSVRC2012_val_00000004.JPEG	809
  ILSVRC2012_val_00000005.JPEG	516

  $ head -n 5 meta_data/train.txt
  n01440764/n01440764_10026.JPEG	0
  n01440764/n01440764_10027.JPEG	0
  n01440764/n01440764_10029.JPEG	0
  n01440764/n01440764_10040.JPEG	0
  n01440764/n01440764_10042.JPEG	0
  ```
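
  Each line of a map file is a relative path and an integer label separated by a tab. A minimal reading sketch (file names follow the tree above):

  ```python
  # Parse a map file into (relative_path, label) pairs.
  def read_map_file(path):
      samples = []
      with open(path) as f:
          for line in f:
              rel_path, label = line.rstrip('\n').split('\t')
              samples.append((rel_path, int(label)))
      return samples

  samples = read_map_file('data/ImageNet-Zip/val_map.txt')
  print(samples[0])  # e.g., ('ILSVRC2012_val_00000001.JPEG', 65)
  ```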

</details>

<details>
  <summary>ImageNet-22K</summary>

- For the ImageNet-22K dataset, make a folder named `fall11_whole` and move all images into labeled sub-folders inside
  it. Then download the train-val split
  files ([ILSVRC2011fall_whole_map_train.txt](https://github.com/SwinTransformer/storage/releases/download/v2.0.1/ILSVRC2011fall_whole_map_train.txt)
  & [ILSVRC2011fall_whole_map_val.txt](https://github.com/SwinTransformer/storage/releases/download/v2.0.1/ILSVRC2011fall_whole_map_val.txt))
  and put them in the parent directory of `fall11_whole`. The file structure should look like:

  ```bash
    $ tree imagenet22k/
    imagenet22k/
    └── fall11_whole
        ├── n00004475
        ├── n00005787
        ├── n00006024
        ├── n00006484
        └── ...
  ```

</details>

<details>
  <summary>iNaturalist 2018</summary>

- For iNaturalist 2018, please download the dataset from the [official repository](https://github.com/visipedia/inat_comp/blob/master/2018/README.md).
  The file structure should look like:

  ```bash
    $ tree inat2018/
    inat2018/
    ├── categories.json
    ├── test2018
    ├── test2018.json
    ├── train2018.json
    ├── train2018_locations.json
    ├── val2018
    ├── val2018.json
    └── val2018_locations.json
  ```

</details>

## Released Models

<details open>
<summary> Open-Source Visual Pretrained Models </summary>
<br>
<div>

|      name      |       pretrain       | resolution | #param |                                                                                  download                                                                                   |
| :------------: | :------------------: | :--------: | :----: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-L  |        IN-22K        |  384x384   |  223M  |     [pth](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22k_192to384.pth)    \| [hf](https://huggingface.co/OpenGVLab/internimage_l_22k_384)      |
| InternImage-XL |        IN-22K        |  384x384   |  335M  |     [pth](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22k_192to384.pth)   \| [hf](https://huggingface.co/OpenGVLab/internimage_xl_22k_384)     |
| InternImage-H  | Joint 427M -> IN-22K |  384x384   | 1.08B  | [pth](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_jointto22k_384.pth)   \| [hf](https://huggingface.co/OpenGVLab/internimage_h_jointto22k_384)  |
| InternImage-G  | Joint 427M -> IN-22K |  384x384   |   3B   | [pth](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_pretrainto22k_384.pth) \| [hf](https://huggingface.co/OpenGVLab/internimage_g_jointto22k_384) |

</div>

</details>
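
The `pth` links above are plain PyTorch checkpoints, so you can inspect one before use. A minimal sketch; the top-level `'model'` key is an assumption about how the training script saves weights, hence the fallback:

```python
# Peek inside a downloaded checkpoint without building the model.
import torch

ckpt = torch.load('internimage_l_22k_192to384.pth', map_location='cpu')
state_dict = ckpt.get('model', ckpt)  # fall back to a bare state dict
print(len(state_dict), 'tensors; first key:', next(iter(state_dict)))
```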

<details open>
<summary> ImageNet-1K Image Classification </summary>
<br>
<div>

|      name      |       pretrain       | resolution | acc@1 | #param | FLOPs |                                                                                                                download                                                                                                                 |
| :------------: | :------------------: | :--------: | :---: | :----: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T  |        IN-1K         |  224x224   | 83.5  |  30M   |  5G   |          [pth](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_t_1k_224.pth) \| [hf](https://huggingface.co/OpenGVLab/internimage_t_1k_224) \| [cfg](configs/without_lr_decay/internimage_t_1k_224.yaml)          |
| InternImage-S  |        IN-1K         |  224x224   | 84.2  |  50M   |  8G   |          [pth](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_s_1k_224.pth) \| [hf](https://huggingface.co/OpenGVLab/internimage_s_1k_224) \| [cfg](configs/without_lr_decay/internimage_s_1k_224.yaml)          |
| InternImage-B  |        IN-1K         |  224x224   | 84.9  |  97M   |  16G  |          [pth](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_b_1k_224.pth) \| [hf](https://huggingface.co/OpenGVLab/internimage_b_1k_224) \| [cfg](configs/without_lr_decay/internimage_b_1k_224.yaml)          |
| InternImage-L  |        IN-22K        |  384x384   | 87.7  |  223M  | 108G  |  [pth](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22kto1k_384.pth) \| [hf](https://huggingface.co/OpenGVLab/internimage_l_22kto1k_384) \| [cfg](configs/without_lr_decay/internimage_l_22kto1k_384.yaml)   |
| InternImage-XL |        IN-22K        |  384x384   | 88.0  |  335M  | 163G  | [pth](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22kto1k_384.pth) \| [hf](https://huggingface.co/OpenGVLab/internimage_xl_22kto1k_384) \| [cfg](configs/without_lr_decay/internimage_xl_22kto1k_384.yaml) |
| InternImage-H  | Joint 427M -> IN-22K |  640x640   | 89.6  | 1.08B  | 1478G |  [pth](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22kto1k_640.pth) \| [hf](https://huggingface.co/OpenGVLab/internimage_h_22kto1k_640) \| [cfg](configs/without_lr_decay/internimage_h_22kto1k_640.yaml)   |
| InternImage-G  | Joint 427M -> IN-22K |  512x512   | 90.1  |   3B   | 2700G |  [pth](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_22kto1k_512.pth) \| [hf](https://huggingface.co/OpenGVLab/internimage_g_22kto1k_512) \| [cfg](configs/without_lr_decay/internimage_g_22kto1k_512.yaml)   |

</div>

</details>

<details open>
<summary> iNaturalist 2018 Image Classification </summary>
<br>
<div>

|     name      |       pretrain       | resolution | acc@1 | #param |                                                                                                                                  download                                                                                                                                  |
| :-----------: | :------------------: | :--------: | :---: | :----: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-H | Joint 427M -> IN-22K |  384x384   | 92.6  |  1.1B  | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22ktoinat18_384.pth) \| [cfg](configs/inaturalist2018/internimage_h_22ktoinat18_384.yaml) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_h_22ktoinat18_384.log) |

</div>

</details>

## Evaluation

To evaluate a pretrained `InternImage` on ImageNet val, run:

```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py --eval \
--cfg <config-file> --resume <checkpoint> --data-path <imagenet-path>
```

For example, to evaluate `InternImage-B` with a single GPU:

```bash
python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval \
--cfg configs/internimage_b_1k_224.yaml --resume internimage_b_1k_224.pth --data-path <imagenet-path>
```

## Training from Scratch on ImageNet-1K

> The paper results were obtained from models trained with configs in `configs/without_lr_decay`.

To train an `InternImage` on ImageNet from scratch, run:

```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345  main.py \
--cfg <config-file> --data-path <imagenet-path> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
```

## Manage Jobs with Slurm

For example, to train or evaluate `InternImage` on a Slurm cluster, run:

<details open>
<summary> InternImage-T (IN-1K) </summary>
<br>

```bash
# Train for 300 epochs with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 512
# Train for 300 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 128
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --resume pretrained/internimage_t_1k_224.pth --eval
```

</details>

<details>
<summary> InternImage-S (IN-1K) </summary>
<br>

```bash
# Train for 300 epochs with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --batch-size 512
# Train for 300 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --batch-size 128
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --resume pretrained/internimage_s_1k_224.pth --eval
```

</details>

<details>
<summary> InternImage-B (IN-1K) </summary>
<br>

```bash
# Train for 300 epochs with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_b_1k_224.yaml --batch-size 512
# Train for 300 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_b_1k_224.yaml --batch-size 128
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_b_1k_224.yaml --resume pretrained/internimage_b_1k_224.pth --eval
```

</details>

<details>
<summary> InternImage-L (IN-22K to IN-1K) </summary>
<br>

```bash
# Train for 20 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_l_22kto1k_384.yaml --batch-size 16
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_l_22kto1k_384.yaml --resume pretrained/internimage_l_22kto1k_384.pth --eval
```

</details>

<details>
<summary> InternImage-XL (IN-22K to IN-1K) </summary>
<br>

```bash
# Train for 20 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_xl_22kto1k_384.yaml --batch-size 16
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_xl_22kto1k_384.yaml --resume pretrained/internimage_xl_22kto1k_384.pth --eval
```

</details>

<details>
<summary> InternImage-H (IN-22K to IN-1K) </summary>
<br>

```bash
# Train for 20 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_h_22kto1k_640.yaml --batch-size 16
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_h_22kto1k_640.yaml --resume pretrained/internimage_h_22kto1k_640.pth --eval
```

</details>

<details>
<summary> InternImage-G (IN-22K to IN-1K) </summary>
<br>

```bash
# Train for 20 epochs with 64 GPUs
GPUS=64 sh train_in1k.sh <partition> <job-name> configs/internimage_g_22kto1k_512.yaml --batch-size 8
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_g_22kto1k_512.yaml --resume pretrained/internimage_g_22kto1k_512.pth --eval
```

</details>

## Training with DeepSpeed

We support using [DeepSpeed](https://github.com/microsoft/DeepSpeed) to reduce memory costs when training large-scale models, e.g., InternImage-H with over 1 billion parameters.
To use it, first install the requirements:

```bash
pip install deepspeed==0.8.3
```

Then you can launch training on a Slurm system with 8 GPUs as follows (tiny and huge models shown as examples).
The default ZeRO stage is 1; it can be configured via the command-line argument `--zero-stage`.

```bash
# Train InternImage-T (ZeRO stage 1 by default)
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 128 --accumulation-steps 4
# Evaluate from a regular .pth checkpoint
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 128 --accumulation-steps 4 --eval --resume ckpt.pth
# Evaluate from a DeepSpeed checkpoint directory
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 128 --accumulation-steps 4 --eval --resume deepspeed_ckpt_dir
# Fine-tune InternImage-H from the joint-to-22K pretrained weights
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh <partition> <job-name> configs/internimage_h_22kto1k_640.yaml --batch-size 16 --accumulation-steps 4 --pretrained pretrained/internimage_h_jointto22k_384.pth
# The same fine-tuning with ZeRO stage 3
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh <partition> <job-name> configs/internimage_h_22kto1k_640.yaml --batch-size 16 --accumulation-steps 4 --pretrained pretrained/internimage_h_jointto22k_384.pth --zero-stage 3
```

🤗 **HuggingFace Accelerate Integration of DeepSpeed**

Optionally, you can use our [HuggingFace Accelerate](https://github.com/huggingface/accelerate) integration to launch DeepSpeed.

```bash
pip install accelerate==0.18.0
```

```bash
# InternImage-H fine-tuning with ZeRO stage 3 (without loss scaling)
accelerate launch --config_file configs/accelerate/dist_8gpus_zero3_wo_loss_scale.yaml main_accelerate.py --cfg configs/internimage_h_22kto1k_640.yaml --data-path data/imagenet --batch-size 16 --pretrained pretrained/internimage_h_jointto22k_384.pth --accumulation-steps 4
# InternImage-T with ZeRO stage 3 plus CPU offloading
accelerate launch --config_file configs/accelerate/dist_8gpus_zero3_offload.yaml main_accelerate.py --cfg configs/internimage_t_1k_224.yaml --data-path data/imagenet --batch-size 128 --accumulation-steps 4 --output output_zero3_offload
# InternImage-T with ZeRO stage 1
accelerate launch --config_file configs/accelerate/dist_8gpus_zero1.yaml main_accelerate.py --cfg configs/internimage_t_1k_224.yaml --data-path data/imagenet --batch-size 128 --accumulation-steps 4
```

**Memory Costs**

Here is the reference GPU memory cost for InternImage-H with 8 GPUs.

- Total batch size = 512, i.e., batch size 16 per GPU × 8 GPUs × 4 gradient accumulation steps.

| Resolution | ZeRO Stage | CPU Offloading | Memory (MB) |
| :--------: | :--------: | :------------: | :----: |
|    640     |   zero1    |     False      | 22572  |
|    640     |   zero3    |     False      | 20000  |
|    640     |   zero3    |      True      | 19144  |
|    384     |   zero1    |     False      | 16000  |
|    384     |   zero3    |      True      | 11928  |

**Convert Checkpoints**

To convert a DeepSpeed checkpoint into a standard PyTorch fp32 checkpoint, you can use the following snippet.

```python
from deepspeed.utils.zero_to_fp32 import convert_zero_checkpoint_to_fp32_state_dict

# checkpoint_dir is the DeepSpeed checkpoint directory written during training
checkpoint_dir = '<deepspeed-checkpoint-dir>'
convert_zero_checkpoint_to_fp32_state_dict(checkpoint_dir, 'best.pth', tag='best')
```

Then, you can use `best.pth` as usual, e.g., `model.load_state_dict(torch.load('best.pth'))`.

> Due to limited computational resources, the DeepSpeed training scripts are currently verified only for the first few epochs. Please file an issue if you have problems reproducing the full training.

## Extracting Intermediate Features

To extract the features of an intermediate layer, you can use `extract_feature.py`.

For example, to extract the features of `b.png` from the layers `patch_embed` and `levels.0.downsample` and save them to `b.pth`:

```bash
python extract_feature.py --cfg configs/internimage_t_1k_224.yaml --img b.png --keys patch_embed levels.0.downsample --save --resume internimage_t_1k_224.pth
```
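
Afterwards, you can load and inspect the saved features. A short sketch, assuming `b.pth` holds a dict that maps each requested key to its output tensor:

```python
# Inspect the extracted features saved by extract_feature.py.
import torch

feats = torch.load('b.pth', map_location='cpu')
for name, tensor in feats.items():
    print(name, tuple(tensor.shape))
```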

## Export

First, install `mmdeploy`:

```shell
pip install mmdeploy==0.14.0
```

To export `InternImage-T` from PyTorch to ONNX, run:

```shell
python export.py --model_name internimage_t_1k_224 --ckpt_dir /path/to/ckpt/dir --onnx
```
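
To double-check the exported graph, you can open it with the `onnx` package. The output filename below is an assumption about what `export.py` writes:

```python
# Load the exported model and list its inputs and outputs.
import onnx

model = onnx.load('internimage_t_1k_224.onnx')
print('inputs: ', [i.name for i in model.graph.input])
print('outputs:', [o.name for o in model.graph.output])
```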

To export `InternImage-T` from PyTorch to TensorRT, run:

```shell
git clone https://github.com/open-mmlab/mmdeploy.git
cd mmdeploy && git checkout v0.13.0
export MMDEPLOY_DIR=$(pwd)
# prepare our custom ops; you can find them at InternImage/tensorrt/modulated_deform_conv_v3
cp -r ../../tensorrt/modulated_deform_conv_v3 csrc/mmdeploy/backend_ops/tensorrt/

# build custom ops
mkdir -p build && cd build
cmake -DCMAKE_CXX_COMPILER=g++ -DMMDEPLOY_TARGET_BACKENDS=trt -DTENSORRT_DIR=${TENSORRT_DIR} -DCUDNN_DIR=${CUDNN_DIR} ..
make -j$(nproc) && make install

# install mmdeploy after building the custom ops
pip install -e .
cd ../

python export.py --model_name internimage_t_1k_224 --ckpt_dir /path/to/ckpt/dir --trt
```