# InternImage for Image Classification

This folder contains the implementation of InternImage for image classification.

<!-- TOC -->

- [Install](#install)
- [Data Preparation](#data-preparation)
- [Evaluation](#evaluation)
- [Training from Scratch on ImageNet-1K](#training-from-scratch-on-imagenet-1k)
- [Manage Jobs with Slurm](#manage-jobs-with-slurm)
- [Training with DeepSpeed](#training-with-deepspeed)
- [Extracting Intermediate Features](#extracting-intermediate-features)
- [Export](#export)

<!-- TOC -->

## Usage

### Install

- Clone this repo:

```bash
git clone https://github.com/OpenGVLab/InternImage.git
cd InternImage
```

- Create a conda virtual environment and activate it:

```bash
conda create -n internimage python=3.7 -y
conda activate internimage
```

- Install `CUDA>=10.2` with `cudnn>=7` following
  the [official installation instructions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
- Install `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2`:

For example, to install `torch==1.11` with `CUDA==11.3`:

```bash
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
```
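
As a quick sanity check, you can verify that the installed build sees your GPU and was compiled against the intended CUDA version (standard PyTorch calls, nothing repo-specific):

```python
import torch

print(torch.__version__)          # e.g. 1.11.0+cu113
print(torch.version.cuda)         # CUDA version the wheel was built against, e.g. 11.3
print(torch.cuda.is_available())  # should print True once driver and wheel match
```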

- Install `timm==0.6.11`, `mmcv-full==1.5.0`, and `mmdet==2.28.1`:

```bash
pip install -U openmim
mim install mmcv-full==1.5.0
pip install timm==0.6.11 mmdet==2.28.1
```

- Install other requirements:

```bash
pip install opencv-python termcolor yacs pyyaml scipy
```

- Compile the CUDA operators:

```bash
cd ./ops_dcnv3
sh ./make.sh
# unit test (all checks should print True)
python test.py
```
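
After the build succeeds, you can also smoke-test the compiled op directly from Python. The import path and constructor arguments below are assumptions based on this folder's layout; `test.py` remains the authoritative check:

```python
# Run from inside ops_dcnv3/ after building. Module path and arguments are
# assumed from this folder's layout; test.py is the authoritative check.
import torch
from modules import DCNv3

m = DCNv3(channels=64, group=4).cuda()
x = torch.randn(2, 56, 56, 64).cuda()  # DCNv3 consumes channels-last (N, H, W, C) input
print(m(x).shape)                      # expect torch.Size([2, 56, 56, 64])
```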

- Alternatively, you can install the operator from precompiled `.whl` files:
  [DCNv3-1.0-whl](https://github.com/OpenGVLab/InternImage/releases/tag/whl_files)

### Data Preparation

We use the standard ImageNet dataset, which you can download from http://image-net.org/. We provide the following two ways to
load data:

- For the standard folder dataset, move the validation images into labeled sub-folders. The file structure should look like:

  ```bash
  $ tree imagenet
  imagenet
  ├── train
  │   ├── class1
  │   │   ├── img1.jpeg
  │   │   ├── img2.jpeg
  │   │   └── ...
  │   ├── class2
  │   │   ├── img3.jpeg
  │   │   └── ...
  │   └── ...
  └── val
      ├── class1
      │   ├── img4.jpeg
      │   ├── img5.jpeg
      │   └── ...
      ├── class2
      │   ├── img6.jpeg
      │   └── ...
      └── ...
  ```

- To avoid the slow reads caused by a massive number of small files, we also support a zipped ImageNet format, which consists
  of four files (a minimal reading sketch follows this list):

  - `train.zip`, `val.zip`: the zipped folders for the train and validation splits.
  - `train_map.txt`, `val_map.txt`: each line stores an image's relative path inside the corresponding zip file and its
    ground-truth label. Make sure the data folder looks like this:

  ```bash
  $ tree data
  data
  └── ImageNet-Zip
      ├── train_map.txt
      ├── train.zip
      ├── val_map.txt
      └── val.zip

  $ head -n 5 data/ImageNet-Zip/val_map.txt
  ILSVRC2012_val_00000001.JPEG	65
  ILSVRC2012_val_00000002.JPEG	970
  ILSVRC2012_val_00000003.JPEG	230
  ILSVRC2012_val_00000004.JPEG	809
  ILSVRC2012_val_00000005.JPEG	516

  $ head -n 5 data/ImageNet-Zip/train_map.txt
  n01440764/n01440764_10026.JPEG	0
  n01440764/n01440764_10027.JPEG	0
  n01440764/n01440764_10029.JPEG	0
  n01440764/n01440764_10040.JPEG	0
  n01440764/n01440764_10042.JPEG	0
  ```

- For the ImageNet-22K dataset, make a folder named `fall11_whole` and move all images into labeled sub-folders inside it.
  Then download the train-val split
  files ([ILSVRC2011fall_whole_map_train.txt](https://github.com/SwinTransformer/storage/releases/download/v2.0.1/ILSVRC2011fall_whole_map_train.txt)
  & [ILSVRC2011fall_whole_map_val.txt](https://github.com/SwinTransformer/storage/releases/download/v2.0.1/ILSVRC2011fall_whole_map_val.txt))
  and put them in the parent directory of `fall11_whole`. The file structure should look like:

  ```bash
    $ tree imagenet22k/
    imagenet22k/
    └── fall11_whole
        ├── n00004475
        ├── n00005787
        ├── n00006024
        ├── n00006484
        └── ...
  ```
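
As referenced above, here is a minimal sketch of reading one image and its label from the zipped format. The file names follow the layout shown above; the reading logic itself is an illustrative assumption, not code from this repo:

```python
import io
import zipfile
from PIL import Image

# Each line of the map file is "<relative-path-in-zip>\t<label>".
with open('data/ImageNet-Zip/train_map.txt') as f:
    path, label = f.readline().rstrip('\n').split('\t')

# Depending on how the archive was created, entry names may carry a
# leading folder prefix; adjust `path` accordingly.
with zipfile.ZipFile('data/ImageNet-Zip/train.zip') as zf:
    img = Image.open(io.BytesIO(zf.read(path))).convert('RGB')

print(path, int(label), img.size)
```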

### Evaluation

To evaluate a pretrained `InternImage` on ImageNet val, run:

```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py --eval \
--cfg <config-file> --resume <checkpoint> --data-path <imagenet-path>
```

For example, to evaluate `InternImage-B` with a single GPU:

```bash
python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval \
--cfg configs/internimage_b_1k_224.yaml --resume internimage_b_1k_224.pth --data-path <imagenet-path>
```

### Training from Scratch on ImageNet-1K

> The paper results were obtained from models trained with configs in `configs/without_lr_decay`.

To train an `InternImage` on ImageNet from scratch, run:

```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py \
--cfg <config-file> --data-path <imagenet-path> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
```

### Manage Jobs with Slurm

For example, to run `InternImage` jobs with 8 GPUs on a single node, use the commands below. As written, the `--resume <checkpoint> --eval` flags evaluate a pretrained checkpoint; drop them to train from scratch for 300 epochs.

`InternImage-T`:

```bash
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --resume internimage_t_1k_224.pth --eval
```

`InternImage-S`:

```bash
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --resume internimage_s_1k_224.pth --eval
```

`InternImage-XL`:

```bash
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_xl_22kto1k_384.yaml --resume internimage_xl_22kto1k_384.pth --eval
```

<!--
### Test pretrained model on ImageNet-22K

For example, to evaluate the `InternImage-L-22k`:

```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345  main.py \
--cfg configs/internimage_xl_22k_192to384.yaml --data-path <imagenet-path> [--batch-size <batch-size-per-gpu> --output <output-directory>] \
--resume internimage_xl_22k_192to384.pth --eval
``` -->

<!-- ### Fine-tuning from an ImageNet-22K pretrained model

For example, to fine-tune an `InternImage-XL-22k` model pretrained on ImageNet-22K:

```bash
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_image_.yaml --pretrained intern_image_b.pth --eval
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345  main.py \
--cfg configs/.yaml --pretrained swin_base_patch4_window7_224_22k.pth \
--data-path <imagenet-path> --batch-size 64 --accumulation-steps 2 [--use-checkpoint]
``` -->

### Training with DeepSpeed

We support [DeepSpeed](https://github.com/microsoft/DeepSpeed) to reduce the memory cost of training large-scale models, e.g., InternImage-H with over 1 billion parameters.
To use it, first install the requirements:

```bash
pip install deepspeed==0.8.3
```

Then you can launch training on a Slurm system with 8 GPUs as follows (tiny and huge models as examples).
The default ZeRO stage is 1; it can be configured via the command-line argument `--zero-stage`.

```bash
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh vc_research_4 train configs/internimage_t_1k_224.yaml --batch-size 128 --accumulation-steps 4
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh vc_research_4 train configs/internimage_t_1k_224.yaml --batch-size 128 --accumulation-steps 4 --eval --resume ckpt.pth
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh vc_research_4 train configs/internimage_t_1k_224.yaml --batch-size 128 --accumulation-steps 4 --eval --resume deepspeed_ckpt_dir
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh vc_research_4 train configs/internimage_h_22kto1k_640.yaml --batch-size 16 --accumulation-steps 4 --pretrained ckpt/internimage_h_jointto22k_384.pth
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh vc_research_4 train configs/internimage_h_22kto1k_640.yaml --batch-size 16 --accumulation-steps 4 --pretrained ckpt/internimage_h_jointto22k_384.pth --zero-stage 3
```

🤗 **Hugging Face Accelerate Integration of DeepSpeed**

Optionally, you can use our [Hugging Face Accelerate](https://github.com/huggingface/accelerate) integration to run DeepSpeed.

```bash
pip install accelerate==0.18.0
```

```bash
accelerate launch --config_file configs/accelerate/dist_8gpus_zero3_wo_loss_scale.yaml main_accelerate.py --cfg configs/internimage_h_22kto1k_640.yaml --data-path /mnt/lustre/share/images --batch-size 16 --pretrained ckpt/internimage_h_jointto22k_384.pth --accumulation-steps 4
accelerate launch --config_file configs/accelerate/dist_8gpus_zero3_offload.yaml main_accelerate.py --cfg configs/internimage_t_1k_224.yaml --data-path /mnt/lustre/share/images --batch-size 128 --accumulation-steps 4 --output output_zero3_offload
accelerate launch --config_file configs/accelerate/dist_8gpus_zero1.yaml main_accelerate.py --cfg configs/internimage_t_1k_224.yaml --data-path /mnt/lustre/share/images --batch-size 128 --accumulation-steps 4
```

**Memory Costs**

Here is the reference GPU memory cost for InternImage-H with 8 GPUs.

- Total batch size = 512: 16 per GPU × 8 GPUs × 4 gradient-accumulation steps.

| Resolution | DeepSpeed | CPU offloading | Memory (MB) |
| ---------- | --------- | -------------- | ----------- |
| 640        | zero1     | False          | 22572  |
| 640        | zero3     | False          | 20000  |
| 640        | zero3     | True           | 19144  |
| 384        | zero1     | False          | 16000  |
| 384        | zero3     | True           | 11928  |

**Convert Checkpoints**

To convert a DeepSpeed checkpoint to a PyTorch fp32 checkpoint, you can use the following snippet.

```python
from deepspeed.utils.zero_to_fp32 import convert_zero_checkpoint_to_fp32_state_dict

# checkpoint_dir: the DeepSpeed checkpoint directory; tag: the sub-folder to convert
convert_zero_checkpoint_to_fp32_state_dict(checkpoint_dir, 'best.pth', tag='best')
```

Then, you can use `best.pth` as usual, e.g., `model.load_state_dict(torch.load('best.pth'))`.

> Due to limited computational resources, the DeepSpeed training scripts have only been verified for the first few epochs. Please file an issue if you have problems reproducing a full training run.

### Extracting Intermediate Features

To extract the features of an intermediate layer, you can use `extract_feature.py`.

For example, to extract the features of `b.png` from the layers `patch_embed` and `levels.0.downsample` and save them to `b.pth`, run:

```bash
python extract_feature.py --cfg configs/internimage_t_1k_224.yaml --img b.png --keys patch_embed levels.0.downsample --save --resume internimage_t_1k_224.pth
```
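
The saved file can then be inspected with `torch.load`. That it holds a dict mapping each requested key to a tensor is an assumption for illustration; check `extract_feature.py` if your version saves a different structure:

```python
import torch

# Assumption: b.pth stores {layer_key: feature_tensor}; verify against
# extract_feature.py if the structure differs.
features = torch.load('b.pth', map_location='cpu')
for key, value in features.items():
    print(key, tuple(value.shape))
```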

### Export

To export `InternImage-T` from PyTorch to ONNX, run:

```shell
python export.py --model_name internimage_t_1k_224 --ckpt_dir /path/to/ckpt/dir --onnx
```
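
To sanity-check an exported model, you can run it with `onnxruntime`. The output filename and the 224×224 input shape below are assumptions based on the model name; adjust them to whatever `export.py` actually writes:

```python
import numpy as np
import onnxruntime as ort

# Assumed output path and input resolution; check export.py's output.
session = ort.InferenceSession('internimage_t_1k_224.onnx',
                               providers=['CPUExecutionProvider'])
input_name = session.get_inputs()[0].name
dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)
logits = session.run(None, {input_name: dummy})[0]
print(logits.shape)  # expected (1, 1000) for an ImageNet-1K classifier
```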

To export `InternImage-T` from PyTorch to TensorRT, run:

```shell
python export.py --model_name internimage_t_1k_224 --ckpt_dir /path/to/ckpt/dir --trt
```