Commit 0552aa5e authored by zhe chen

Update README.md

parent a32a696e
@@ -4,8 +4,9 @@ This folder contains the implementation of the InternImage for image classification
<!-- TOC -->

- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Released Models](#released-models)
- [Evaluation](#evaluation)
- [Training from Scratch on ImageNet-1K](#training-from-scratch-on-imagenet-1k)
- [Manage Jobs with Slurm](#manage-jobs-with-slurm)
@@ -15,11 +16,9 @@ This folder contains the implementation of the InternImage for image classification

<!-- TOC -->
## Installation

- Clone this repository:
```bash
git clone https://github.com/OpenGVLab/InternImage.git
...@@ -37,7 +36,7 @@ conda activate internimage ...@@ -37,7 +36,7 @@ conda activate internimage
the [official installation instructions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) the [official installation instructions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
- Install `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2`: - Install `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2`:
For example, to install `torch==1.11` with `CUDA==11.3`:
```bash
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
@@ -59,6 +58,8 @@ pip install opencv-python termcolor yacs pyyaml scipy
- Compile the CUDA operators:

Before compiling, please use `nvcc -V` to check that your `nvcc` version matches the CUDA version of your PyTorch build (a version-check sketch follows the commands below).
```bash
cd ./ops_dcnv3
sh ./make.sh
@@ -66,13 +67,17 @@ sh ./make.sh
python test.py
```
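For reference, the version check can be automated; below is a minimal sketch (not part of this repository, and it assumes `nvcc` is on your `PATH`):

```python
# Minimal sketch: compare PyTorch's CUDA build against the local nvcc toolkit.
# Assumes `nvcc` is on PATH; not part of this repository.
import re
import subprocess

import torch

torch_cuda = torch.version.cuda  # e.g. "11.3", or None for CPU-only builds
nvcc_out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout
match = re.search(r"release (\d+\.\d+)", nvcc_out)
nvcc_cuda = match.group(1) if match else None

print(f"PyTorch CUDA: {torch_cuda}, nvcc CUDA: {nvcc_cuda}")
if torch_cuda != nvcc_cuda:
    print("Warning: versions differ; the DCNv3 build may fail or misbehave.")
```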
- You can also install the operator using precompiled `.whl` files:
  [DCNv3-1.0-whl](https://github.com/OpenGVLab/InternImage/releases/tag/whl_files)
## Data Preparation

We use the standard ImageNet dataset, which you can download from http://image-net.org/.
We provide the following two ways to load data:
<details open>
<summary>Standard ImageNet-1K</summary>
- For the standard folder dataset, move the validation images into labeled sub-folders. The file structure should look like the tree below (a minimal loading sketch follows it):
@@ -100,6 +105,11 @@ load data:
```
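A folder laid out this way can be read with torchvision's `ImageFolder`; the snippet below is a minimal sketch (the path and transforms are placeholders, not the repo's training pipeline):

```python
# Minimal sketch: load the labeled-sub-folder layout with torchvision.
# "imagenet/val" and the transforms are placeholders, not the repo's pipeline.
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
val_set = datasets.ImageFolder("imagenet/val", transform=transform)
print(len(val_set), "images,", len(val_set.classes), "classes")
```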
</details>
<details>
<summary>Zipped ImageNet-1K</summary>
- To avoid the slow speed of reading massive numbers of small files, we also support zipped ImageNet, which includes four files (a minimal reading sketch follows the example below):
@@ -131,6 +141,11 @@ load data:
n01440764/n01440764_10042.JPEG 0
```
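To illustrate the format, here is a minimal sketch of reading samples straight from a zipped split and its map file, where each map line is `<path-inside-zip> <class-index>` (the file names and the internal zip layout are assumptions; the repo's actual loader may differ):

```python
# Minimal sketch: iterate a zipped split using its map file.
# "val.zip" / "val_map.txt" and the zip-internal paths are assumptions.
import io
import zipfile

from PIL import Image

def iter_zipped_imagenet(zip_path, map_path):
    with zipfile.ZipFile(zip_path) as zf, open(map_path) as f:
        for line in f:
            rel_path, label = line.rsplit(maxsplit=1)
            with zf.open(rel_path) as member:  # read one image from the archive
                img = Image.open(io.BytesIO(member.read())).convert("RGB")
            yield img, int(label)

# Example: first sample of the validation split
img, label = next(iter_zipped_imagenet("val.zip", "val_map.txt"))
```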
</details>
<details>
<summary>ImageNet-22K</summary>
- For the ImageNet-22K dataset, make a folder named `fall11_whole` and move all images into labeled sub-folders in this
folder. Then download the train-val split
file ([ILSVRC2011fall_whole_map_train.txt](https://github.com/SwinTransformer/storage/releases/download/v2.0.1/ILSVRC2011fall_whole_map_train.txt)
@@ -148,7 +163,46 @@ load data:
└── ...
```
</details>
## Released Models
<details open>
<summary> Open-Source Visual Pretrained Models </summary>
<br>
<div>
| name | pretrain | pre-training resolution | #param | download |
| :------------: | :----------: | :----------------------: | :----: | :---------------------------------------------------------------------------------------------------: |
| InternImage-L | ImageNet-22K | 384x384 | 223M | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22k_192to384.pth) |
| InternImage-XL | ImageNet-22K | 384x384 | 335M | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22k_192to384.pth) |
| InternImage-H | Joint 427M | 384x384 | 1.08B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_jointto22k_384.pth) |
| InternImage-G | - | 384x384 | 3B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_pretrainto22k_384.pth) |
</div>
</details>
<details open>
<summary> ImageNet-1K Image Classification </summary>
<br>
<div>
| name | pretrain | resolution | acc@1 | #param | FLOPs | download |
| :------------: | :----------: | :--------: | :---: | :----: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T | ImageNet-1K | 224x224 | 83.5 | 30M | 5G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_t_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_t_1k_224.yaml) |
| InternImage-S | ImageNet-1K | 224x224 | 84.2 | 50M | 8G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_s_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_s_1k_224.yaml) |
| InternImage-B | ImageNet-1K | 224x224 | 84.9 | 97M | 16G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_b_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_b_1k_224.yaml) |
| InternImage-L | ImageNet-22K | 384x384 | 87.7 | 223M | 108G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22kto1k_384.pth) \| [cfg](configs/without_lr_decay/internimage_l_22kto1k_384.yaml) |
| InternImage-XL | ImageNet-22K | 384x384 | 88.0 | 335M | 163G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22kto1k_384.pth) \| [cfg](configs/without_lr_decay/internimage_xl_22kto1k_384.yaml) |
| InternImage-H | Joint 427M | 640x640 | 89.6 | 1.08B | 1478G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22kto1k_640.pth) \| [cfg](configs/without_lr_decay/internimage_h_22kto1k_640.yaml) |
| InternImage-G | - | 512x512 | 90.1 | 3B | 2700G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_22kto1k_512.pth) \| [cfg](configs/without_lr_decay/internimage_g_22kto1k_512.yaml) |
</div>
</details>
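For a quick sanity check on a downloaded checkpoint before wiring it into the configs, the following hedged sketch inspects it with plain `torch.load` (the `"model"` key is an assumption about the checkpoint layout and may differ):

```python
# Minimal sketch for inspecting a downloaded checkpoint; the "model" key is
# an assumption about the layout and may differ per checkpoint.
import torch

ckpt = torch.load("internimage_t_1k_224.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(len(state_dict), "entries in the state dict")
```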
## Evaluation
To evaluate a pretrained `InternImage` on ImageNet val, run:
@@ -164,7 +218,7 @@ python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py
--cfg configs/internimage_b_1k_224.yaml --resume internimage_b_1k_224.pth --data-path <imagenet-path>
```
## Training from Scratch on ImageNet-1K
> The paper results were obtained from models trained with configs in `configs/without_lr_decay`.
@@ -175,7 +229,7 @@ python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port
--cfg <config-file> --data-path <imagenet-path> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
```
## Manage Jobs with Slurm
For example, to train `InternImage` with 8 GPUs on a single node for 300 epochs, run:
@@ -219,7 +273,7 @@ python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py
--data-path <imagenet-path> --batch-size 64 --accumulation-steps 2 [--use-checkpoint]
``` -->
## Training with DeepSpeed
We support using [DeepSpeed](https://github.com/microsoft/DeepSpeed) to reduce the memory cost of training large-scale models, e.g., InternImage-H with over 1 billion parameters.
To use it, first install the requirements as
@@ -232,25 +286,25 @@ Then you could launch the training in a slurm system with 8 GPUs as follows (tin
The default ZeRO stage is 1; it can be configured via the command-line argument `--zero-stage`.
```
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh INTERN2 train configs/internimage_t_1k_224.yaml --batch-size 128 --accumulation-steps 4
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh INTERN2 train configs/internimage_t_1k_224.yaml --batch-size 128 --accumulation-steps 4 --eval --resume ckpt.pth
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh INTERN2 train configs/internimage_t_1k_224.yaml --batch-size 128 --accumulation-steps 4 --eval --resume deepspeed_ckpt_dir
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh INTERN2 train configs/internimage_h_22kto1k_640.yaml --batch-size 16 --accumulation-steps 4 --pretrained ckpt/internimage_h_jointto22k_384.pth
GPUS=8 GPUS_PER_NODE=8 sh train_in1k_deepspeed.sh INTERN2 train configs/internimage_h_22kto1k_640.yaml --batch-size 16 --accumulation-steps 4 --pretrained ckpt/internimage_h_jointto22k_384.pth --zero-stage 3
```
🤗 **Hugging Face Accelerate Integration of DeepSpeed**

Optionally, you can use our [Hugging Face Accelerate](https://github.com/huggingface/accelerate) integration to run DeepSpeed.
```bash
pip install accelerate==0.18.0
```
```bash
accelerate launch --config_file configs/accelerate/dist_8gpus_zero3_wo_loss_scale.yaml main_accelerate.py --cfg configs/internimage_h_22kto1k_640.yaml --data-path data/imagenet --batch-size 16 --pretrained ckpt/internimage_h_jointto22k_384.pth --accumulation-steps 4
accelerate launch --config_file configs/accelerate/dist_8gpus_zero3_offload.yaml main_accelerate.py --cfg configs/internimage_t_1k_224.yaml --data-path data/imagenet --batch-size 128 --accumulation-steps 4 --output output_zero3_offload
accelerate launch --config_file configs/accelerate/dist_8gpus_zero1.yaml main_accelerate.py --cfg configs/internimage_t_1k_224.yaml --data-path data/imagenet --batch-size 128 --accumulation-steps 4
```
**Memory Costs**

@@ -259,8 +313,8 @@ Here is the reference GPU memory cost for InternImage-H with 8 GPUs.
- total batch size = 512 (8 GPUs × batch size 16 per GPU × 4 gradient-accumulation steps).
| Resolution | ZeRO Stage | CPU Offloading | Memory (MB) |
| :--------: | :--------: | :------------: | :---------: |
| 640 | zero1 | False | 22572 |
| 640 | zero3 | False | 20000 |
| 640 | zero3 | True | 19144 |
@@ -269,7 +323,7 @@ Here is the reference GPU memory cost for InternImage-H with 8 GPUs.

**Convert Checkpoints**
To convert DeepSpeed checkpoints to a plain PyTorch fp32 checkpoint, you can use the following snippet.
```python
from deepspeed.utils.zero_to_fp32 import convert_zero_checkpoint_to_fp32_state_dict
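# Hedged sketch of the conversion call (the paths are placeholders and the
# argument order is an assumption for this DeepSpeed version): gather the
# sharded ZeRO checkpoint and write a single fp32 state dict.
convert_zero_checkpoint_to_fp32_state_dict('deepspeed_ckpt_dir', 'best.pth')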
@@ -280,7 +334,7 @@ Then, you could use `best.pth` as usual, e.g., `model.load_state_dict(torch.load
> Due to limited computational resources, the DeepSpeed training scripts are currently only verified for the first few epochs. Please file an issue if you have problems reproducing the full training.
## Extracting Intermediate Features
To extract the features of an intermediate layer, you can use `extract_feature.py`.
@@ -290,7 +344,7 @@ For example, extract features of `b.png` from layers `patch_embed` and `levels.0
python extract_feature.py --cfg configs/internimage_t_1k_224.yaml --img b.png --keys patch_embed levels.0.downsample --save --resume internimage_t_1k_224.pth
```
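Under the hood, this kind of extraction is typically done with forward hooks keyed by module name. The following self-contained sketch shows the mechanism on a toy model (it is not `extract_feature.py` itself, and the layer name is just a stand-in):

```python
# Self-contained sketch of name-based forward hooks on a toy model; this is
# not extract_feature.py itself, and "patch_embed" is just a stand-in name.
import torch
import torch.nn as nn

model = nn.Sequential()
model.add_module("patch_embed", nn.Conv2d(3, 8, kernel_size=4, stride=4))
model.add_module("body", nn.Conv2d(8, 8, kernel_size=3, padding=1))

features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach().cpu()  # stash this layer's output
    return hook

for name, module in model.named_modules():
    if name == "patch_embed":
        module.register_forward_hook(save_output(name))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))

print({k: tuple(v.shape) for k, v in features.items()})  # {'patch_embed': (1, 8, 56, 56)}
```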
## Export

To export `InternImage-T` from PyTorch to ONNX, run:
...
@@ -74,7 +74,8 @@ def parse_option():
                        type=str,
                        help='dataset name',
                        default=None)
    parser.add_argument('--data-path', type=str, help='path to dataset',
                        default='data/imagenet')
    parser.add_argument('--zip',
                        action='store_true',
                        help='use zipped dataset instead of folder dataset')
...