# ADE20K

Introduced by Zhou et al. in [Scene Parsing Through ADE20K Dataset](https://paperswithcode.com/paper/scene-parsing-through-ade20k-dataset).

The ADE20K semantic segmentation dataset contains more than 20K scene-centric images, exhaustively annotated with pixel-level object and object-part labels. There are 150 semantic categories in total, covering stuff (e.g., sky, road, grass) and discrete objects (e.g., person, car, bed).

## Model Zoo

### UperNet + InternImage

| backbone | resolution | mIoU (ss/ms) | train speed | train time | #param | FLOPs | Config | Download |
|:--------------:|:----------:|:------------:|:------------:|:----------:|:------:|:-----:|:------:|:--------:|
| InternImage-T | 512x512 | 47.9 / 48.1 | 0.23s / iter | 10.5h | 59M | 944G | [config](./upernet_internimage_t_512_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512_160k_ade20k.log.json) |
| InternImage-S | 512x512 | 50.1 / 50.9 | 0.25s / iter | 11.5h | 80M | 1017G | [config](./upernet_internimage_s_512_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512_160k_ade20k.log.json) |
| InternImage-B | 512x512 | 50.8 / 51.3 | 0.26s / iter | 12h | 128M | 1185G | [config](./upernet_internimage_b_512_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512_160k_ade20k.log.json) |
| InternImage-L | 640x640 | 53.9 / 54.1 | 0.42s / iter | 19h | 256M | 2526G | [config](./upernet_internimage_l_640_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_640_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_640_160k_ade20k.log.json) |
| InternImage-XL | 640x640 | 55.0 / 55.3 | 0.47s / iter | 22h | 368M | 3142G | [config](./upernet_internimage_xl_640_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_640_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_640_160k_ade20k.log.json) |
| InternImage-H | 896x896 | 59.9 / 60.3 | 0.94s / iter | 2d (2n) | 1.12B | 3566G | [config](./upernet_internimage_h_896_160k_ade20k.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_h_896_160k_ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_h_896_160k_ade20k.log.json) |

- mIoU is reported for single-scale (ss) and multi-scale (ms) testing.
- Training speed is measured on an A100 GPU.
- Set `with_cp=True` to save memory if you run into out-of-memory issues; see the config sketch below.
- The logs are from our most recent training runs, so their results may differ slightly from those reported in the paper.
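For the `with_cp` note above, a minimal sketch of an override config: it inherits one of the released configs and enables activation checkpointing in the backbone. The `_base_` file name and the `with_cp` key come from the table and note above; the rest follows the standard mmcv config-inheritance pattern. Treat this as a sketch, not a shipped config.

```python
# enable_with_cp.py -- hypothetical override config, not shipped with the repo.
# Inherits the released UperNet + InternImage-H config and turns on
# activation checkpointing (`with_cp`) to trade extra compute for GPU memory.
_base_ = ['./upernet_internimage_h_896_160k_ade20k.py']

model = dict(
    backbone=dict(
        with_cp=True,  # recompute activations in the backward pass instead of storing them
    ),
)
```

Pass this file to the usual mmsegmentation training script in place of the original config; all other settings are merged in from `_base_`.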
### Mask2Former + InternImage

| backbone | resolution | mIoU (ss/ms) | train speed | train time | #param | FLOPs | Config | Download |
|:-------------:|:----------:|:------------:|:------------:|:----------:|:------:|:-----:|:------:|:--------:|
| InternImage-H | 896x896 | 62.6 / 62.9 | 1.21s / iter | 1.5d (2n) | 1.31B | 4635G | [config](./mask2former_internimage_h_896_80k_cocostuff2ade20k_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896_80k_cocostuff2ade20k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_896_80k_cocostuff2ade20k.log.json) |
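To try one of these checkpoints on a single image, here is a hedged sketch using the mmsegmentation 0.x API that this codebase builds on. It assumes you run it from the repository's segmentation directory so that InternImage's custom backbone and ops are registered, and that `demo.jpg` is a placeholder for any test image; the config and checkpoint names are those linked in the tables above.

```python
# A sketch of single-image inference with the mmsegmentation 0.x API.
# Assumes InternImage's custom modules (backbone, DCNv3 ops) are importable,
# e.g. when running from the repo's segmentation directory.
from mmseg.apis import init_segmentor, inference_segmentor

config = 'mask2former_internimage_h_896_80k_cocostuff2ade20k_ss.py'
checkpoint = 'mask2former_internimage_h_896_80k_cocostuff2ade20k.pth'  # downloaded from the link above

model = init_segmentor(config, checkpoint, device='cuda:0')
result = inference_segmentor(model, 'demo.jpg')  # list with one H x W array of ADE20K class ids
```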