Commit 477b5ed3 authored by zhe chen

Update README.md

parent f37f9c2a
.idea/
.DS_Store
__pycache__/
@@ -6,3 +5,4 @@ classification/convertor/
segmentation/convertor/
checkpoint_dir/
demo/
pretrained/
@@ -10,7 +10,7 @@ This folder contains the implementation of the InternImage for image classificat
- [Evaluation](#evaluation)
- [Training from Scratch on ImageNet-1K](#training-from-scratch-on-imagenet-1k)
- [Manage Jobs with Slurm](#manage-jobs-with-slurm)
- [Training with DeepSpeed](#training-with-deepspeed)
- [Extracting Intermediate Features](#extracting-intermediate-features)
- [Export](#export)
@@ -47,6 +47,7 @@ pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.p
```bash
pip install -U openmim
mim install mmcv-full==1.5.0
mim install mmsegmentation==0.27.0
pip install timm==0.6.11 mmdet==2.28.1
```
@@ -59,7 +60,7 @@ pip install numpy==1.26.4
pip install pydantic==1.10.13
```
- Compile CUDA operators
Before compiling, please use the `nvcc -V` command to check whether your `nvcc` version matches the CUDA version of PyTorch.
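For example, a quick check could look like the following (output values are illustrative; the two reported CUDA versions should agree, e.g. both 11.3):

```bash
# CUDA toolkit version used by nvcc
nvcc -V
# CUDA version that PyTorch was built against
python -c "import torch; print(torch.version.cuda)"
```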
@@ -79,8 +80,9 @@ We provide the following ways to prepare data:
<details open>
<summary>Standard ImageNet-1K</summary>
<br>
- We use the standard ImageNet dataset; you can download it from http://image-net.org/.
- For the standard folder dataset, move the validation images into labeled sub-folders. The file structure should look like this:
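For instance, a typical layout looks like the sketch below (illustrative only; the class folder names are the WordNet synset IDs, e.g. `n01440764`):

```
imagenet/
├── train/
│   ├── n01440764/
│   │   ├── n01440764_10026.JPEG
│   │   └── ...
│   └── ...
└── val/
    ├── n01440764/
    │   ├── ILSVRC2012_val_00000293.JPEG
    │   └── ...
    └── ...
```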
@@ -195,12 +197,12 @@ We use standard ImageNet dataset, you can download it from http://image-net.org/
<br>
<div>
| name | pretrain | resolution | #param | download |
| :------------: | :----------: | :--------: | :----: | :---------------------------------------------------------------------------------------------------: |
| InternImage-L | ImageNet-22K | 384x384 | 223M | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22k_192to384.pth) |
| InternImage-XL | ImageNet-22K | 384x384 | 335M | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22k_192to384.pth) |
| InternImage-H | Joint 427M | 384x384 | 1.08B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_jointto22k_384.pth) |
| InternImage-G | Joint 427M | 384x384 | 3B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_pretrainto22k_384.pth) |
</div>
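For example, to download the InternImage-L weights listed above into the `pretrained/` folder used by the commands below:

```bash
mkdir -p pretrained
wget -P pretrained https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22k_192to384.pth
```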
@@ -212,14 +214,14 @@ We use standard ImageNet dataset, you can download it from http://image-net.org/
<div>
| name | pretrain | resolution | acc@1 | #param | FLOPs | download |
| :------------: | :----------: | :--------: | :---: | :----: | :---: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T | ImageNet-1K | 224x224 | 83.5 | 30M | 5G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_t_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_t_1k_224.yaml) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_t_1k_224.log) |
| InternImage-S | ImageNet-1K | 224x224 | 84.2 | 50M | 8G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_s_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_s_1k_224.yaml) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_s_1k_224.log) |
| InternImage-B | ImageNet-1K | 224x224 | 84.9 | 97M | 16G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_b_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_b_1k_224.yaml) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_b_1k_224.log) |
| InternImage-L | ImageNet-22K | 384x384 | 87.7 | 223M | 108G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22kto1k_384.pth) \| [cfg](configs/without_lr_decay/internimage_l_22kto1k_384.yaml) |
| InternImage-XL | ImageNet-22K | 384x384 | 88.0 | 335M | 163G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22kto1k_384.pth) \| [cfg](configs/without_lr_decay/internimage_xl_22kto1k_384.yaml) |
| InternImage-H | Joint 427M | 640x640 | 89.6 | 1.08B | 1478G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22kto1k_640.pth) \| [cfg](configs/without_lr_decay/internimage_h_22kto1k_640.yaml) |
| InternImage-G | Joint 427M | 512x512 | 90.1 | 3B | 2700G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_22kto1k_512.pth) \| [cfg](configs/without_lr_decay/internimage_g_22kto1k_512.yaml) |
</div>
@@ -231,8 +233,8 @@ We use standard ImageNet dataset, you can download it from http://image-net.org/
<div>
| name | pretrain | resolution | acc@1 | #param | download |
| :-----------: | :--------: | :--------: | :---: | :----: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-H | Joint 427M | 384x384 | 92.6 | 1.1B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22ktoinat18_384.pth) \| [cfg](configs/inaturalist2018/internimage_h_22ktoinat18_384.yaml) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_h_22ktoinat18_384.log) |
</div>
@@ -267,56 +269,104 @@ python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --maste
## Manage Jobs with Slurm
For example, to train or evaluate `InternImage` on a Slurm cluster, run:
<details open>
<summary> InternImage-T (IN-1K) </summary>
<br>
```bash
# Train for 300 epochs with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 512
# Train for 300 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 128
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --resume pretrained/internimage_t_1k_224.pth --eval
```
</details>
<details>
<summary> InternImage-S (IN-1K) </summary>
<br>
```bash
# Train for 300 epochs with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --batch-size 512
# Train for 300 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --batch-size 128
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --resume pretrained/internimage_s_1k_224.pth --eval
```
</details>
<details>
<summary> InternImage-B (IN-1K) </summary>
<br>
```bash
# Train for 300 epochs with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_b_1k_224.yaml --batch-size 512
# Train for 300 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_b_1k_224.yaml --batch-size 128
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_b_1k_224.yaml --resume pretrained/internimage_b_1k_224.pth --eval
```
</details>
<details>
<summary> InternImage-L (IN-22K to IN-1K) </summary>
<br>
```bash
# Train for 20 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_l_22kto1k_384.yaml --batch-size 16
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_l_22kto1k_384.yaml --resume pretrained/internimage_l_22kto1k_384.pth --eval
```
</details>
<details>
<summary> InternImage-XL (IN-22K to IN-1K) </summary>
<br>
```bash
# Train for 20 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_xl_22kto1k_384.yaml --batch-size 16
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_xl_22kto1k_384.yaml --resume pretrained/internimage_xl_22kto1k_384.pth --eval
```
</details>
<details>
<summary> InternImage-H (IN-22K to IN-1K) </summary>
<br>
```bash
# Train for 20 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_h_22kto1k_640.yaml --batch-size 16
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_h_22kto1k_640.yaml --resume pretrained/internimage_h_22kto1k_640.pth --eval
```
</details>
<details>
<summary> InternImage-G (IN-22K to IN-1K) </summary>
<br>
```bash
# Train for 20 epochs with 64 GPUs
GPUS=64 sh train_in1k.sh <partition> <job-name> configs/internimage_g_22kto1k_512.yaml --batch-size 8
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_g_22kto1k_512.yaml --resume pretrained/internimage_g_22kto1k_512.pth --eval
```
</details>
## Training with DeepSpeed
@@ -394,7 +444,7 @@ python extract_feature.py --cfg configs/internimage_t_1k_224.yaml --img b.png --
First, install `mmdeploy`:
```shell
pip install mmdeploy==0.14.0
```
To export `InternImage-T` from PyTorch to ONNX, run:
......
@@ -28,7 +28,7 @@ MODEL:
PRETRAINED: 'pretrained/internimage_h_jointto22k_384.pth'
TRAIN:
EMA:
ENABLE: false
DECAY: 0.9999
EPOCHS: 100
WARMUP_EPOCHS: 0
@@ -36,11 +36,7 @@ TRAIN:
BASE_LR: 2e-05 # 512
WARMUP_LR: .0
MIN_LR: .0
LR_LAYER_DECAY: true
LR_LAYER_DECAY_RATIO: 0.9
USE_CHECKPOINT: true
RAND_INIT_FT_HEAD: true
OPTIMIZER:
DCN_LR_MUL: 0.1
AMP_OPT_LEVEL: O0
EVAL_FREQ: 1
@@ -4,11 +4,22 @@ This folder contains the implementation of the InternImage for semantic segmenta
Our segmentation code is developed on top of [MMSegmentation v0.27.0](https://github.com/open-mmlab/mmsegmentation/tree/v0.27.0).
<!-- TOC -->
- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Released Models](#released-models)
- [Evaluation](#evaluation)
- [Training](#training)
- [Manage Jobs with Slurm](#manage-jobs-with-slurm)
- [Image Demo](#image-demo)
- [Export](#export)
<!-- TOC -->
## Installation
- Clone this repository:
```bash
git clone https://github.com/OpenGVLab/InternImage.git
@@ -26,11 +37,10 @@ conda activate internimage
the [official installation instructions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
- Install `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2`:
For example, to install `torch==1.11` with `CUDA==11.3`:
```bash
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
```
- Install other requirements:
@@ -42,7 +52,7 @@ conda install -c conda-forge termcolor yacs pyyaml scipy pip -y
pip install opencv-python
```
- Install `timm`, `mmcv-full`, and `mmsegmentation`:
```bash
pip install -U openmim
@@ -51,8 +61,19 @@ mim install mmsegmentation==0.27.0
pip install timm==0.6.11 mmdet==2.28.1
```
- Install other requirements:
```bash
pip install opencv-python termcolor yacs pyyaml scipy
# Please use a version of numpy lower than 2.0
pip install numpy==1.26.4
pip install pydantic==1.10.13
```
- Compile CUDA operators
Before compiling, please use the `nvcc -V` command to check whether your `nvcc` version matches the CUDA version of PyTorch.
```bash
cd ./ops_dcnv3
sh ./make.sh
@@ -60,14 +81,71 @@ sh ./make.sh
python test.py
```
- You can also install the operator using precompiled `.whl` files
[DCNv3-1.0-whl](https://github.com/OpenGVLab/InternImage/releases/tag/whl_files)
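For example (the wheel filename below is purely illustrative; download the file matching your Python, PyTorch, and CUDA versions from the release page first):

```bash
# Illustrative filename only; pick the wheel that matches your environment
pip install DCNv3-1.0+cu113torch1.11.0-cp39-cp39-linux_x86_64.whl
```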
## Data Preparation
Prepare datasets according to the [guidelines](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/dataset_prepare.md#prepare-datasets) in MMSegmentation.
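For example, ADE20K is expected under `data/ade` in roughly the following layout (a sketch of the standard MMSegmentation arrangement):

```
data/ade/ADEChallengeData2016/
├── annotations/
│   ├── training/
│   └── validation/
└── images/
    ├── training/
    └── validation/
```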
## Released Models
<details open>
<summary> Dataset: ADE20K </summary>
<br>
<div>
| method | backbone | resolution | mIoU (ss/ms) | #param | FLOPs | Config | Download |
| :---------: | :------------: | :--------: | :----------: | :----: | :---: | :---------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| UperNet | InternImage-T | 512x512 | 47.9 / 48.1 | 59M | 944G | [config](./configs/ade20k/upernet_internimage_t_512_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512_160k_ade20k.log.json) |
| UperNet | InternImage-S | 512x512 | 50.1 / 50.9 | 80M | 1017G | [config](./configs/ade20k/upernet_internimage_s_512_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512_160k_ade20k.log.json) |
| UperNet | InternImage-B | 512x512 | 50.8 / 51.3 | 128M | 1185G | [config](./configs/ade20k/upernet_internimage_b_512_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512_160k_ade20k.log.json) |
| UperNet | InternImage-L | 640x640 | 53.9 / 54.1 | 256M | 2526G | [config](./configs/ade20k/upernet_internimage_l_640_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_640_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_640_160k_ade20k.log.json) |
| UperNet | InternImage-XL | 640x640 | 55.0 / 55.3 | 368M | 3142G | [config](./configs/ade20k/upernet_internimage_xl_640_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_640_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_640_160k_ade20k.log.json) |
| UperNet | InternImage-H | 896x896 | 59.9 / 60.3 | 1.12B | 3566G | [config](./configs/ade20k/upernet_internimage_h_896_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_h_896_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_h_896_160k_ade20k.log.json) |
| Mask2Former | InternImage-H | 896x896 | 62.6 / 62.9 | 1.31B | 4635G | [config](./configs/ade20k/mask2former_internimage_h_896_80k_cocostuff2ade20k_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896_80k_cocostuff2ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_896_80k_cocostuff2ade20k.log.json) |
</div>
</details>
<details>
<summary> Dataset: Cityscapes </summary>
<br>
<div>
| method | backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download |
| :---------: | :------------: | :--------: | :-----------: | :-----: | :---: | :-------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| UperNet | InternImage-T | 512x1024 | 82.58 / 83.40 | 59M | 1889G | [config](./configs/cityscapes/upernet_internimage_t_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512x1024_160k_cityscapes.log.json) |
| UperNet | InternImage-S | 512x1024 | 82.74 / 83.45 | 80M | 2035G | [config](./configs/cityscapes/upernet_internimage_s_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512x1024_160k_cityscapes.log.json) |
| UperNet | InternImage-B | 512x1024 | 83.18 / 83.97 | 128M | 2369G | [config](./configs/cityscapes/upernet_internimage_b_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512x1024_160k_cityscapes.log.json) |
| UperNet | InternImage-L | 512x1024 | 83.68 / 84.41 | 256M | 3234G | [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_cityscapes.log.json) |
| UperNet\* | InternImage-L | 512x1024 | 85.94 / 86.22 | 256M | 3234G | [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.log.json) |
| UperNet | InternImage-XL | 512x1024 | 83.62 / 84.28 | 368M | 4022G | [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_cityscapes.log.json) |
| UperNet\* | InternImage-XL | 512x1024 | 86.20 / 86.42 | 368M | 4022G | [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) |
| SegFormer\* | InternImage-L | 512x1024 | 85.16 / 85.67 | 220M | 1580G | [config](./configs/cityscapes/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.log.json) |
| SegFormer\* | InternImage-XL | 512x1024 | 85.41 / 85.93 | 330M | 2364G | [config](./configs/cityscapes/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) |
\* denotes that the model is trained with the extra Mapillary dataset.
</div>
</details>
<details>
<summary> Dataset: COCO-Stuff-164K </summary>
<br>
<div>
| method | backbone | resolution | mIoU (ss) | #params | FLOPs | Config | Download |
| :---------: | :-----------: | :--------: | :-------: | :-----: | :---: | :--------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| Mask2Former | InternImage-H | 896x896 | 52.6 | 1.31B | 4635G | [config](./configs/coco_stuff164k/mask2former_internimage_h_896_80k_cocostuff164k_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896_80k_cocostuff164k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_896_80k_cocostuff164k.log.json) |
</div>
</details>
## Evaluation
To evaluate our `InternImage` on ADE20K val, run:
@@ -75,21 +153,19 @@ To evaluate our `InternImage` on ADE20K val, run:
sh dist_test.sh <config-file> <checkpoint> <gpu-num> --eval mIoU
```
You can download checkpoint files from [here](https://huggingface.co/OpenGVLab/InternImage/tree/fc1e4e7e01c3e7a39a3875bdebb6577a7256ff91). Then place them in `segmentation/pretrained/`.
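For instance, to fetch the UperNet + InternImage-T checkpoint listed above into `pretrained/`:

```bash
mkdir -p pretrained
wget -P pretrained https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512_160k_ade20k.pth
```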
For example, to evaluate `InternImage-T` with a single GPU:
```bash
python test.py configs/ade20k/upernet_internimage_t_512_160k_ade20k.py pretrained/upernet_internimage_t_512_160k_ade20k.pth --eval mIoU
```
For example, to evaluate `InternImage-B` on a single node with 8 GPUs:
```bash
sh dist_test.sh configs/ade20k/upernet_internimage_b_512_160k_ade20k.py pretrained/upernet_internimage_b_512_160k_ade20k.pth 8 --eval mIoU
```
## Training
To train an `InternImage` on ADE20K, run:
@@ -103,7 +179,7 @@ For example, to train `InternImage-T` with 8 GPU on 1 node (total batch size 16)
sh dist_train.sh configs/ade20k/upernet_internimage_t_512_160k_ade20k.py 8
```
## Manage Jobs with Slurm
For example, to train `InternImage-XL` with 8 GPUs on 1 node (total batch size 16), run:
@@ -111,10 +187,10 @@ For example, to train `InternImage-XL` with 8 GPU on 1 node (total batch size 16
GPUS=8 sh slurm_train.sh <partition> <job-name> configs/ade20k/upernet_internimage_xl_640_160k_ade20k.py
```
## Image Demo
To run inference on a single image or multiple images, use the command below.
If you specify a directory instead of a single image, all images in that directory will be processed.
```
CUDA_VISIBLE_DEVICES=0 python image_demo.py \
@@ -124,7 +200,13 @@ CUDA_VISIBLE_DEVICES=0 python image_demo.py \
--palette ade20k
```
## Export
First, install `mmdeploy`:
```shell
pip install mmdeploy==0.14.0
```
To export a segmentation model from PyTorch to TensorRT, run:
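The exact command is collapsed in this diff; as a rough sketch, an export with mmdeploy's `tools/deploy.py` typically looks like the following (run from an mmdeploy checkout; the deploy config name and all paths are illustrative assumptions, not the repo's exact invocation):

```bash
# Illustrative sketch: the TensorRT deploy config name and the paths below are assumptions.
python tools/deploy.py \
    configs/mmseg/segmentation_tensorrt_static-512x512.py \
    /path/to/InternImage/segmentation/configs/ade20k/upernet_internimage_t_512_160k_ade20k.py \
    /path/to/pretrained/upernet_internimage_t_512_160k_ade20k.pth \
    /path/to/sample_image.png \
    --work-dir work_dirs/internimage_t_trt \
    --device cuda:0
```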
......
@@ -4,7 +4,7 @@
## Introduction
The Common Objects in COntext-stuff (COCO-Stuff) dataset is a scene-understanding dataset used for tasks such as semantic segmentation, object detection, and image captioning. It was constructed by annotating the original COCO dataset, which labeled things but neglected stuff annotations. COCO-Stuff-164K contains 164k images spanning 172 categories: 80 things, 91 stuff, and 1 unlabeled class.
## Model Zoo
......