# Cityscapes

Introduced by Cordts et al. in [The Cityscapes Dataset for Semantic Urban Scene Understanding](https://paperswithcode.com/paper/the-cityscapes-dataset-for-semantic-urban).

Cityscapes is a large-scale dataset focused on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories (flat surfaces, humans, vehicles, constructions, objects, nature, sky, and void). The dataset consists of around 5,000 finely annotated images and 20,000 coarsely annotated ones. The data was captured in 50 cities over several months, in the daytime and in good weather conditions. It was originally recorded as video, and the frames were manually selected to have the following features: a large number of dynamic objects, varying scene layout, and varying background.
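
To make the class grouping concrete, the sketch below writes the 8 categories out as a plain Python mapping. The class names follow the public Cityscapes class definitions as best I recall them; the constant `CITYSCAPES_GROUPS` and the helper `group_of` are hypothetical names used only for illustration and are not part of this repository.

```python
# Illustrative mapping of Cityscapes categories to their classes.
# Assumption: class names follow the public Cityscapes class definitions.
CITYSCAPES_GROUPS = {
    "flat": ["road", "sidewalk", "parking", "rail track"],
    "human": ["person", "rider"],
    "vehicle": ["car", "truck", "bus", "on rails", "motorcycle",
                "bicycle", "caravan", "trailer"],
    "construction": ["building", "wall", "fence", "guard rail",
                     "bridge", "tunnel"],
    "object": ["pole", "pole group", "traffic sign", "traffic light"],
    "nature": ["vegetation", "terrain"],
    "sky": ["sky"],
    "void": ["ground", "dynamic", "static"],
}


def group_of(class_name: str) -> str:
    """Return the category that contains class_name (hypothetical helper)."""
    for group, classes in CITYSCAPES_GROUPS.items():
        if class_name in classes:
            return group
    raise KeyError(f"unknown Cityscapes class: {class_name}")


num_classes = sum(len(classes) for classes in CITYSCAPES_GROUPS.values())
print(len(CITYSCAPES_GROUPS), num_classes)
print(group_of("rider"))
```

Note that the standard Cityscapes benchmark evaluates on a 19-class subset of these labels, so the mIoU figures in the tables below are presumably computed over that subset rather than all 30 classes.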

## Model Zoo

### UperNet + InternImage

|    backbone    | resolution | mIoU (ss/ms)  | train speed  | train time | #params | FLOPs |                             Config                             |                                                                                                                    Download                                                                                                                    |
| :------------: | :--------: | :-----------: | :----------: | :--------: | :-----: | :---: | :------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-T  |  512x1024  | 82.58 / 83.40 | 0.32s / iter |   14.5h    |   59M   | 1889G | [config](./upernet_internimage_t_512x1024_160k_cityscapes.py)  |  [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512x1024_160k_cityscapes.log.json)  |
| InternImage-S  |  512x1024  | 82.74 / 83.45 | 0.36s / iter |   16.5h    |   80M   | 2035G | [config](./upernet_internimage_s_512x1024_160k_cityscapes.py)  |  [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512x1024_160k_cityscapes.log.json)  |
| InternImage-B  |  512x1024  | 83.18 / 83.97 | 0.39s / iter |    17h     |  128M   | 2369G | [config](./upernet_internimage_b_512x1024_160k_cityscapes.py)  |  [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512x1024_160k_cityscapes.log.json)  |
| InternImage-L  |  512x1024  | 83.68 / 84.41 | 0.50s / iter |    23h     |  256M   | 3234G | [config](./upernet_internimage_l_512x1024_160k_cityscapes.py)  |  [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_cityscapes.log.json)  |
| InternImage-XL |  512x1024  | 83.62 / 84.28 | 0.56s / iter |    26h     |  368M   | 4022G | [config](./upernet_internimage_xl_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_cityscapes.log.json) |

- Training speed is measured on A100 GPUs.
- If you run into out-of-memory errors, set `with_cp=True` to save memory.

### UperNet + InternImage (with additional data)

Training schedule: Mapillary (80k iterations) + Cityscapes with coarse data (160k iterations)

|    backbone    | resolution | mIoU (ss/ms)  | train speed  | train time | #params | FLOPs |                                  Config                                  |                                                                                                                              Download                                                                                                                              |
| :------------: | :--------: | :-----------: | :----------: | :--------: | :-----: | :---: | :----------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-L  |  512x1024  | 85.94 / 86.22 | 0.50s / iter |    23h     |  256M   | 3234G | [config](./upernet_internimage_l_512x1024_160k_mapillary2cityscapes.py)  | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.pth)  \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.log.json)  |
| InternImage-XL |  512x1024  | 86.20 / 86.42 | 0.56s / iter |    26h     |  368M   | 4022G | [config](./upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) |

### SegFormerHead + InternImage (with additional data)

Training schedule: Mapillary (80k iterations) + Cityscapes with coarse data (160k iterations)

|    backbone    | resolution | mIoU (ss/ms)  | train speed  | train time | #params | FLOPs |                                   Config                                   |                                                                                                                                Download                                                                                                                                |
| :------------: | :--------: | :-----------: | :----------: | :--------: | :-----: | :---: | :------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-L  |  512x1024  | 85.16 / 85.67 | 0.37s / iter |    17h     |  220M   | 1580G | [config](./segformer_internimage_l_512x1024_160k_mapillary2cityscapes.py)  |  [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.log.json)  |
| InternImage-XL |  512x1024  | 85.41 / 85.93 | 0.43s / iter |   19.5h    |  330M   | 2364G | [config](./segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) |

### Mask2Former + InternImage (with additional data)

Training schedule: Mapillary (80k iterations) + Cityscapes with coarse data (80k iterations)

|   backbone    | resolution | mIoU (ss/ms)  | #params | FLOPs |                                     Config                                     |                                                                                                                                 Download                                                                                                                                 |
| :-----------: | :--------: | :-----------: | :-----: | :---: | :----------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternImage-H | 1024x1024  | 86.37 / 86.96 |  1094M  | 7878G | [config](./mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.log.json) |
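
The checkpoint and log links in the tables above share a common URL pattern, so they can be built from a model's file stem. The sketch below only assembles the URLs and does not download anything; `HF_BASE` and `artifact_urls` are hypothetical names introduced for illustration.

```python
# Base URL of the model repository, as it appears in the Download columns.
HF_BASE = "https://huggingface.co/OpenGVLab/InternImage"


def artifact_urls(stem: str) -> dict:
    """Build checkpoint and log URLs for a file stem (hypothetical helper).

    The stem is the file name without extension, for example
    "upernet_internimage_t_512x1024_160k_cityscapes".
    """
    return {
        "ckpt": f"{HF_BASE}/resolve/main/{stem}.pth",
        "log": f"{HF_BASE}/raw/main/{stem}.log.json",
    }


urls = artifact_urls("upernet_internimage_t_512x1024_160k_cityscapes")
print(urls["ckpt"])
print(urls["log"])
```

The stem does not always match the config file name: the Mask2Former config ends in `_ss`, while its checkpoint and log do not, so pass the stem exactly as it appears in the Download column.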