Commit 477b5ed3 authored by zhe chen

Update README.md

parent f37f9c2a
.idea/
.DS_Store
__pycache__/
@@ -6,3 +5,4 @@ classification/convertor/
segmentation/convertor/
checkpoint_dir/
demo/
pretrained/
@@ -10,7 +10,7 @@ This folder contains the implementation of the InternImage for image classificat
- [Evaluation](#evaluation)
- [Training from Scratch on ImageNet-1K](#training-from-scratch-on-imagenet-1k)
- [Manage Jobs with Slurm](#manage-jobs-with-slurm)
- [Training with DeepSpeed](#training-with-deepspeed)
- [Extracting Intermediate Features](#extracting-intermediate-features)
- [Export](#export)
@@ -47,6 +47,7 @@ pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.p
```bash
pip install -U openmim
mim install mmcv-full==1.5.0
mim install mmsegmentation==0.27.0
pip install timm==0.6.11 mmdet==2.28.1
```
@@ -59,7 +60,7 @@ pip install numpy==1.26.4
pip install pydantic==1.10.13
```
- Compile CUDA operators
Before compiling, please use the `nvcc -V` command to check whether your `nvcc` version matches the CUDA version of PyTorch.
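For example, a quick check could look like the following (output values are illustrative; the two reported CUDA versions should agree, e.g. both 11.3):

```bash
# CUDA toolkit version used by nvcc
nvcc -V
# CUDA version that PyTorch was built against
python -c "import torch; print(torch.version.cuda)"
```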
@@ -79,8 +80,9 @@ We provide the following ways to prepare data:
<details open>
<summary>Standard ImageNet-1K</summary>
<br>
- We use the standard ImageNet dataset; you can download it from http://image-net.org/.
- For the standard folder dataset, move the validation images into labeled sub-folders. The file structure should look like this:
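For instance, a typical layout looks like the sketch below (illustrative only; the class folder names are the WordNet synset IDs, e.g. `n01440764`):

```
imagenet/
├── train/
│   ├── n01440764/
│   │   ├── n01440764_10026.JPEG
│   │   └── ...
│   └── ...
└── val/
    ├── n01440764/
    │   ├── ILSVRC2012_val_00000293.JPEG
    │   └── ...
    └── ...
```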
@@ -195,12 +197,12 @@ We use standard ImageNet dataset, you can download it from http://image-net.org/
<br>
<div>
| name | pretrain | resolution | #param | download |
| :------------: | :----------: | :--------: | :----: | :---------------------------------------------------------------------------------------------------: |
| InternImage-L | ImageNet-22K | 384x384 | 223M | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22k_192to384.pth) |
| InternImage-XL | ImageNet-22K | 384x384 | 335M | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22k_192to384.pth) |
| InternImage-H | Joint 427M | 384x384 | 1.08B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_jointto22k_384.pth) |
| InternImage-G | Joint 427M | 384x384 | 3B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_pretrainto22k_384.pth) |
</div>
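For example, to download the InternImage-L weights listed above into the `pretrained/` folder used by the commands below:

```bash
mkdir -p pretrained
wget -P pretrained https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22k_192to384.pth
```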
@@ -212,14 +214,14 @@ We use standard ImageNet dataset, you can download it from http://image-net.org/
<div>
| name | pretrain | resolution | acc@1 | #param | FLOPs | download |
| :------------: | :----------: | :--------: | :---: | :----: | :---: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T | ImageNet-1K | 224x224 | 83.5 | 30M | 5G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_t_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_t_1k_224.yaml) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_t_1k_224.log) |
| InternImage-S | ImageNet-1K | 224x224 | 84.2 | 50M | 8G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_s_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_s_1k_224.yaml) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_s_1k_224.log) |
| InternImage-B | ImageNet-1K | 224x224 | 84.9 | 97M | 16G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_b_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_b_1k_224.yaml) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_b_1k_224.log) |
| InternImage-L | ImageNet-22K | 384x384 | 87.7 | 223M | 108G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22kto1k_384.pth) \| [cfg](configs/without_lr_decay/internimage_l_22kto1k_384.yaml) |
| InternImage-XL | ImageNet-22K | 384x384 | 88.0 | 335M | 163G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22kto1k_384.pth) \| [cfg](configs/without_lr_decay/internimage_xl_22kto1k_384.yaml) |
| InternImage-H | Joint 427M | 640x640 | 89.6 | 1.08B | 1478G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22kto1k_640.pth) \| [cfg](configs/without_lr_decay/internimage_h_22kto1k_640.yaml) |
| InternImage-G | Joint 427M | 512x512 | 90.1 | 3B | 2700G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_22kto1k_512.pth) \| [cfg](configs/without_lr_decay/internimage_g_22kto1k_512.yaml) |
</div>
@@ -231,8 +233,8 @@ We use standard ImageNet dataset, you can download it from http://image-net.org/
<div>
| name | pretrain | resolution | acc@1 | #param | download |
| :-----------: | :--------: | :--------: | :---: | :----: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-H | Joint 427M | 384x384 | 92.6 | 1.1B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22ktoinat18_384.pth) \| [cfg](configs/inaturalist2018/internimage_h_22ktoinat18_384.yaml) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_h_22ktoinat18_384.log) |
</div>
@@ -267,56 +269,104 @@ python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --maste
## Manage Jobs with Slurm
For example, to train or evaluate `InternImage` on a Slurm cluster, run:
<details open>
<summary> InternImage-T (IN-1K) </summary>
<br>
```bash
# Train for 300 epochs with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 512
# Train for 300 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 128
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --resume pretrained/internimage_t_1k_224.pth --eval
```
</details>
<details>
<summary> InternImage-S (IN-1K) </summary>
<br>
```bash
# Train for 300 epochs with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --batch-size 512
# Train for 300 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --batch-size 128
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --resume pretrained/internimage_s_1k_224.pth --eval
```
</details>
<details>
<summary> InternImage-B (IN-1K) </summary>
<br>
```bash
# Train for 300 epochs with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_b_1k_224.yaml --batch-size 512
# Train for 300 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_b_1k_224.yaml --batch-size 128
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_b_1k_224.yaml --resume pretrained/internimage_b_1k_224.pth --eval
```
</details>
<details>
<summary> InternImage-L (IN-22K to IN-1K) </summary>
<br>
```bash
# Train for 20 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_l_22kto1k_384.yaml --batch-size 16
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_l_22kto1k_384.yaml --resume pretrained/internimage_l_22kto1k_384.pth --eval
```
</details>
<details>
<summary> InternImage-XL (IN-22K to IN-1K) </summary>
<br>
```bash
# Train for 20 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_xl_22kto1k_384.yaml --batch-size 16
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_xl_22kto1k_384.yaml --resume pretrained/internimage_xl_22kto1k_384.pth --eval
```
</details>
<details>
<summary> InternImage-H (IN-22K to IN-1K) </summary>
<br>
```bash
# Train for 20 epochs with 32 GPUs
GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_h_22kto1k_640.yaml --batch-size 16
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_h_22kto1k_640.yaml --resume pretrained/internimage_h_22kto1k_640.pth --eval
```
</details>
<details>
<summary> InternImage-G (IN-22K to IN-1K) </summary>
<br>
```bash
# Train for 20 epochs with 64 GPUs
GPUS=64 sh train_in1k.sh <partition> <job-name> configs/internimage_g_22kto1k_512.yaml --batch-size 8
# Evaluate on ImageNet-1K with 8 GPUs
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_g_22kto1k_512.yaml --resume pretrained/internimage_g_22kto1k_512.pth --eval
```
</details>
## Training with DeepSpeed
@@ -394,7 +444,7 @@ python extract_feature.py --cfg configs/internimage_t_1k_224.yaml --img b.png --
First, install `mmdeploy`:
```shell
pip install mmdeploy==0.14.0
```
To export `InternImage-T` from PyTorch to ONNX, run:
......
@@ -28,7 +28,7 @@ MODEL:
PRETRAINED: 'pretrained/internimage_h_jointto22k_384.pth'
TRAIN:
EMA:
ENABLE: false
DECAY: 0.9999
EPOCHS: 100
WARMUP_EPOCHS: 0
@@ -36,11 +36,7 @@ TRAIN:
BASE_LR: 2e-05 # 512
WARMUP_LR: .0
MIN_LR: .0
LR_LAYER_DECAY: true
LR_LAYER_DECAY_RATIO: 0.9
USE_CHECKPOINT: true
RAND_INIT_FT_HEAD: true
OPTIMIZER:
DCN_LR_MUL: 0.1
AMP_OPT_LEVEL: O0
EVAL_FREQ: 1
@@ -4,11 +4,22 @@ This folder contains the implementation of the InternImage for semantic segmenta
Our segmentation code is developed on top of [MMSegmentation v0.27.0](https://github.com/open-mmlab/mmsegmentation/tree/v0.27.0).
<!-- TOC -->
- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Released Models](#released-models)
- [Evaluation](#evaluation)
- [Training](#training)
- [Manage Jobs with Slurm](#manage-jobs-with-slurm)
- [Image Demo](#image-demo)
- [Export](#export)
<!-- TOC -->
## Installation
- Clone this repository:
```bash
git clone https://github.com/OpenGVLab/InternImage.git
@@ -26,11 +37,10 @@ conda activate internimage
the [official installation instructions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
- Install `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2`:
For example, to install `torch==1.11` with `CUDA==11.3`:
```bash
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
```
- Install other requirements:
@@ -42,7 +52,7 @@ conda install -c conda-forge termcolor yacs pyyaml scipy pip -y
pip install opencv-python
```
- Install `timm`, `mmcv-full`, and `mmsegmentation`:
```bash
pip install -U openmim
@@ -51,8 +61,19 @@ mim install mmsegmentation==0.27.0
pip install timm==0.6.11 mmdet==2.28.1
```
- Install other requirements:
```bash
pip install opencv-python termcolor yacs pyyaml scipy
# Please use a version of numpy lower than 2.0
pip install numpy==1.26.4
pip install pydantic==1.10.13
```
- Compile CUDA operators
Before compiling, please use the `nvcc -V` command to check whether your `nvcc` version matches the CUDA version of PyTorch.
```bash
cd ./ops_dcnv3
sh ./make.sh
@@ -60,14 +81,71 @@ sh ./make.sh
python test.py
```
- You can also install the operator using precompiled `.whl` files
[DCNv3-1.0-whl](https://github.com/OpenGVLab/InternImage/releases/tag/whl_files)
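For example (the wheel filename below is purely illustrative; download the file matching your Python, PyTorch, and CUDA versions from the release page first):

```bash
# Illustrative filename only; pick the wheel that matches your environment
pip install DCNv3-1.0+cu113torch1.11.0-cp39-cp39-linux_x86_64.whl
```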
## Data Preparation
Prepare datasets according to the [guidelines](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/dataset_prepare.md#prepare-datasets) in MMSegmentation.
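For example, ADE20K is expected under `data/ade` in roughly the following layout (a sketch of the standard MMSegmentation arrangement):

```
data/ade/ADEChallengeData2016/
├── annotations/
│   ├── training/
│   └── validation/
└── images/
    ├── training/
    └── validation/
```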
## Released Models
<details open>
<summary> Dataset: ADE20K </summary>
<br>
<div>
| method | backbone | resolution | mIoU (ss/ms) | #param | FLOPs | Config | Download |
| :---------: | :------------: | :--------: | :----------: | :----: | :---: | :---------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| UperNet | InternImage-T | 512x512 | 47.9 / 48.1 | 59M | 944G | [config](./configs/ade20k/upernet_internimage_t_512_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512_160k_ade20k.log.json) |
| UperNet | InternImage-S | 512x512 | 50.1 / 50.9 | 80M | 1017G | [config](./configs/ade20k/upernet_internimage_s_512_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512_160k_ade20k.log.json) |
| UperNet | InternImage-B | 512x512 | 50.8 / 51.3 | 128M | 1185G | [config](./configs/ade20k/upernet_internimage_b_512_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512_160k_ade20k.log.json) |
| UperNet | InternImage-L | 640x640 | 53.9 / 54.1 | 256M | 2526G | [config](./configs/ade20k/upernet_internimage_l_640_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_640_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_640_160k_ade20k.log.json) |
| UperNet | InternImage-XL | 640x640 | 55.0 / 55.3 | 368M | 3142G | [config](./configs/ade20k/upernet_internimage_xl_640_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_640_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_640_160k_ade20k.log.json) |
| UperNet | InternImage-H | 896x896 | 59.9 / 60.3 | 1.12B | 3566G | [config](./configs/ade20k/upernet_internimage_h_896_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_h_896_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_h_896_160k_ade20k.log.json) |
| Mask2Former | InternImage-H | 896x896 | 62.6 / 62.9 | 1.31B | 4635G | [config](./configs/ade20k/mask2former_internimage_h_896_80k_cocostuff2ade20k_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896_80k_cocostuff2ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_896_80k_cocostuff2ade20k.log.json) |
</div>
</details>
<details>
<summary> Dataset: Cityscapes </summary>
<br>
<div>
| method | backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download |
| :---------: | :------------: | :--------: | :-----------: | :-----: | :---: | :-------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| UperNet | InternImage-T | 512x1024 | 82.58 / 83.40 | 59M | 1889G | [config](./configs/cityscapes/upernet_internimage_t_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512x1024_160k_cityscapes.log.json) |
| UperNet | InternImage-S | 512x1024 | 82.74 / 83.45 | 80M | 2035G | [config](./configs/cityscapes/upernet_internimage_s_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512x1024_160k_cityscapes.log.json) |
| UperNet | InternImage-B | 512x1024 | 83.18 / 83.97 | 128M | 2369G | [config](./configs/cityscapes/upernet_internimage_b_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512x1024_160k_cityscapes.log.json) |
| UperNet | InternImage-L | 512x1024 | 83.68 / 84.41 | 256M | 3234G | [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_cityscapes.log.json) |
| UperNet\* | InternImage-L | 512x1024 | 85.94 / 86.22 | 256M | 3234G | [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.log.json) |
| UperNet | InternImage-XL | 512x1024 | 83.62 / 84.28 | 368M | 4022G | [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_cityscapes.log.json) |
| UperNet\* | InternImage-XL | 512x1024 | 86.20 / 86.42 | 368M | 4022G | [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) |
| SegFormer\* | InternImage-L | 512x1024 | 85.16 / 85.67 | 220M | 1580G | [config](./configs/cityscapes/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.log.json) |
| SegFormer\* | InternImage-XL | 512x1024 | 85.41 / 85.93 | 330M | 2364G | [config](./configs/cityscapes/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) |
\* denotes that the model is trained with the extra Mapillary dataset.
</div>
</details>
<details>
<summary> Dataset: COCO-Stuff-164K </summary>
<br>
<div>
| method | backbone | resolution | mIoU (ss) | #params | FLOPs | Config | Download |
| :---------: | :-----------: | :--------: | :-------: | :-----: | :---: | :--------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| Mask2Former | InternImage-H | 896x896 | 52.6 | 1.31B | 4635G | [config](./configs/coco_stuff164k/mask2former_internimage_h_896_80k_cocostuff164k_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896_80k_cocostuff164k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_896_80k_cocostuff164k.log.json) |
</div>
</details>
## Evaluation
To evaluate our `InternImage` on ADE20K val, run:
@@ -75,21 +153,19 @@ To evaluate our `InternImage` on ADE20K val, run:
sh dist_test.sh <config-file> <checkpoint> <gpu-num> --eval mIoU
```
You can download checkpoint files from [here](https://huggingface.co/OpenGVLab/InternImage/tree/fc1e4e7e01c3e7a39a3875bdebb6577a7256ff91). Then place them in `segmentation/pretrained/`.
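For instance, to fetch the UperNet + InternImage-T checkpoint listed above into `pretrained/`:

```bash
mkdir -p pretrained
wget -P pretrained https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512_160k_ade20k.pth
```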
For example, to evaluate `InternImage-T` with a single GPU:
```bash
python test.py configs/ade20k/upernet_internimage_t_512_160k_ade20k.py pretrained/upernet_internimage_t_512_160k_ade20k.pth --eval mIoU
```
For example, to evaluate `InternImage-B` on a single node with 8 GPUs:
```bash
sh dist_test.sh configs/ade20k/upernet_internimage_b_512_160k_ade20k.py pretrained/upernet_internimage_b_512_160k_ade20k.pth 8 --eval mIoU
```
## Training
To train an `InternImage` on ADE20K, run:
@@ -103,7 +179,7 @@ For example, to train `InternImage-T` with 8 GPU on 1 node (total batch size 16)
sh dist_train.sh configs/ade20k/upernet_internimage_t_512_160k_ade20k.py 8
```
## Manage Jobs with Slurm
For example, to train `InternImage-XL` with 8 GPUs on 1 node (total batch size 16), run:
@@ -111,10 +187,10 @@ For example, to train `InternImage-XL` with 8 GPU on 1 node (total batch size 16
GPUS=8 sh slurm_train.sh <partition> <job-name> configs/ade20k/upernet_internimage_xl_640_160k_ade20k.py
```
## Image Demo
To run inference on a single image or multiple images, use the command below.
If you specify a directory instead of a single image, all images in that directory will be processed.
```
CUDA_VISIBLE_DEVICES=0 python image_demo.py \
@@ -124,7 +200,13 @@ CUDA_VISIBLE_DEVICES=0 python image_demo.py \
--palette ade20k
```
## Export
First, install `mmdeploy`:
```shell
pip install mmdeploy==0.14.0
```
To export a segmentation model from PyTorch to TensorRT, run:
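The exact command is collapsed in this diff; as a rough sketch, an export with mmdeploy's `tools/deploy.py` typically looks like the following (run from an mmdeploy checkout; the deploy config name and all paths are illustrative assumptions, not the repo's exact invocation):

```bash
# Illustrative sketch: the TensorRT deploy config name and the paths below are assumptions.
python tools/deploy.py \
    configs/mmseg/segmentation_tensorrt_static-512x512.py \
    /path/to/InternImage/segmentation/configs/ade20k/upernet_internimage_t_512_160k_ade20k.py \
    /path/to/pretrained/upernet_internimage_t_512_160k_ade20k.pth \
    /path/to/sample_image.png \
    --work-dir work_dirs/internimage_t_trt \
    --device cuda:0
```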
......
@@ -4,7 +4,7 @@
## Introduction
The Common Objects in COntext-stuff (COCO-Stuff) dataset is a scene-understanding dataset used for tasks such as semantic segmentation, object detection, and image captioning. It was constructed by annotating the original COCO dataset, which labeled things but neglected stuff annotations. COCO-Stuff-164K contains 164k images spanning 172 categories: 80 things, 91 stuff, and 1 unlabeled class.
## Model Zoo
......