Commit 0d97cc8c authored by Sugon_ldc

add new model
_base_: '../_base_/cityscapes.yml'

batch_size: 2
iters: 160000

model:
  type: OCRNet
  backbone:
    type: HRNet_W48
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w48_ssld.tar.gz
  num_classes: 19
  backbone_indices: [0]

optimizer:
  type: sgd

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9

loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
_base_: '../_base_/cityscapes.yml'

batch_size: 2
iters: 40000

model:
  type: OCRNet
  backbone:
    type: HRNet_W48
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w48_ssld.tar.gz
  num_classes: 19
  backbone_indices: [0]

optimizer:
  type: sgd

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9

loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
_base_: '../_base_/cityscapes.yml'

batch_size: 2
iters: 40000

model:
  type: OCRNet
  backbone:
    type: HRNet_W48
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w48_ssld.tar.gz
  num_classes: 19
  backbone_indices: [0]

optimizer:
  type: sgd

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9

loss:
  types:
    - type: MixedLoss
      losses:
        - type: CrossEntropyLoss
        - type: SemanticConnectivityLoss
      coef: [1, 0.1]
    - type: CrossEntropyLoss
  coef: [1, 0.4]
_base_: './ocrnet_hrnetw18_voc12aug_512x512_40k.yml'

model:
  backbone:
    type: HRNet_W48
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w48_ssld.tar.gz
# Panoptic Feature Pyramid Networks
## Reference
> Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár. "Panoptic Feature Pyramid Networks." arXiv preprint arXiv:1901.02446 (2019).
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|PFPNNet|ResNet101_vd|1024x512|40000|79.07%|79.46%|79.75%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/pfpn_resnet101_os8_cityscapes_512x1024_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/pfpn_resnet101_os8_cityscapes_512x1024_40k/train.log )\| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=29ef44625be2fb00a255b215b26cf00f)|
_base_: '../_base_/cityscapes.yml'

batch_size: 4
iters: 40000

model:
  type: PFPNNet
  backbone:
    type: ResNet101_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
  backbone_indices: [0, 1, 2, 3]
  channels: 256

train_dataset:
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [512, 1024]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.4
      contrast_range: 0.4
      saturation_range: 0.4
    - type: Normalize

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  end_lr: 0
  power: 0.9

optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 0.0005
# PointRend: Image Segmentation As Rendering
## Reference
> Alexander Kirillov, Yuxin Wu, Kaiming He, Ross Girshick. "PointRend: Image Segmentation As Rendering." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9799-9808. 2020.
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|-|-|-|-|-|-|-|-|
|PointRend|ResNet50_vd|1024x512|80000|76.54%|76.84%|77.45%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/pointrend_resnet50_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/pointrend_resnet50_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=bda232796400bc15141a088197d9a8c0) |
### Pascal VOC 2012 + Aug
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|-|-|-|-|-|-|-|-|
|PointRend|ResNet50_vd|512x512|40000|72.82%|73.53%|74.62%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/pointrend_resnet50_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/pointrend_resnet50_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=35c2d83707f51b23eabbe734606493a5) |
|PointRend|ResNet101_vd|512x512|40000|74.09%|74.7%|74.85%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/pointrend_resnet101_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/pointrend_resnet101_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=b2f7b7e99bba213db27b52826086686a) |
_base_: 'pointrend_resnet50_os8_cityscapes_1024x512_80k.yml'

model:
  backbone:
    type: ResNet101_vd
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz

_base_: 'pointrend_resnet50_os8_voc12aug_512x512_40k.yml'

model:
  backbone:
    type: ResNet101_vd
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
_base_: '../_base_/cityscapes.yml'

model:
  type: PointRend
  backbone:
    type: ResNet50_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  backbone_indices: [0, 1, 2, 3]

loss:
  types:
    - type: CrossEntropyLoss
    - type: PointCrossEntropyLoss
  coef: [1, 1]

optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 0.0005
_base_: '../_base_/pascal_voc12aug.yml'

model:
  type: PointRend
  backbone:
    type: ResNet50_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  backbone_indices: [0, 1, 2, 3]

loss:
  types:
    - type: CrossEntropyLoss
    - type: PointCrossEntropyLoss
  coef: [1, 1]

optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 0.0005
# PortraitNet: Real-time Portrait Segmentation Network for Mobile Device
## Reference
> Song-Hai Zhang, Xin Dong, Jia Li, Ruilong Li, Yong-Liang Yang. "PortraitNet: Real-time portrait segmentation network for mobile device." CAD & Graphics 2019.
## Performance
| Model | Backbone | Dataset | Resolution | Training Iters | mIoU | Link |
|-|-|-|-|-|-|-|
|PortraitNet|MobileNetV2|EG1800|224x224|46000|96.92%|[model](https://bj.bcebos.com/paddleseg/dygraph/portraitnet_mobilenetv2_eg1800_224x224_46k/model.pdparams)|
|PortraitNet|MobileNetV2|Supervise.ly|224x224|60000|93.94%|[model](https://bj.bcebos.com/paddleseg/dygraph/portraitnet_mobilenetv2_supervisely_224x224_60k/model.pdparams)|
## Online Tutorial
[AI Studio](https://aistudio.baidu.com/aistudio/projectdetail/1754799)
## Dataset
The training process downloads the dataset automatically.
You can also download and inspect it manually from [Baidu Pan](https://pan.baidu.com/s/15uBpR7zFF2zpUccoq5pQYg) (password: ajcs).
batch_size: 64
iters: 46000

train_dataset:
  type: EG1800
  dataset_root: data/EG1800
  common_transforms:
    - type: RandomAffine
      max_rotation: 45
      min_scale_factor: 0.5
      max_scale_factor: 1.5
      size: [224, 224]
      translation_offset: 56
    - type: RandomHorizontalFlip
  transforms1:
    - type: Normalize
      mean: [0.485, 0.458, 0.408]
      std: [0.23, 0.23, 0.23]
  transforms2:
    - type: RandomDistort
      brightness_range: 0.6
      contrast_range: 0.4
      saturation_range: 0.6
      hue_prob: 0.0
      sharpness_range: 0.2
      sharpness_prob: 0.5
    - type: RandomBlur
      prob: 0.5
      blur_type: random
    - type: RandomNoise
    - type: Normalize
      mean: [0.485, 0.458, 0.408]
      std: [0.23, 0.23, 0.23]
  mode: train

val_dataset:
  type: EG1800
  dataset_root: data/EG1800
  common_transforms:
    - type: ScalePadding
      target_size: [224, 224]
      im_padding_value: [127.5, 127.5, 127.5]
      label_padding_value: 0
    - type: Normalize
      mean: [0.485, 0.458, 0.408]
      std: [0.23, 0.23, 0.23]
  transforms1: null
  transforms2: null
  mode: val

optimizer:
  type: adam
  weight_decay: 5.0e-4

lr_scheduler:
  type: StepDecay
  learning_rate: 0.001
  step_size: 460
  gamma: 0.95

loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
    - type: FocalLoss
    - type: KLLoss
  coef: [1, 1, 0.3, 2]

model:
  type: PortraitNet
  backbone:
    type: MobileNetV2_x1_0
    channel_ratio: 1.0
    min_channel: 16
    pretrained: https://paddleseg.bj.bcebos.com/dygraph/backbone/mobilenetv2_x1_0_ssld.tar.gz
  add_edge: True
  num_classes: 2
_base_: './portraitnet_eg1800_224x224_46k.yml'

batch_size: 64
iters: 60000

train_dataset:
  type: SUPERVISELY
  dataset_root: data/Supervisely_face
  common_transforms:
    - type: RandomAffine
      max_rotation: 45
      min_scale_factor: 0.5
      max_scale_factor: 1.5
      size: [224, 224]
      translation_offset: 56
    - type: RandomHorizontalFlip
  transforms1:
    - type: Normalize
      mean: [0.485, 0.458, 0.408]
      std: [0.23, 0.23, 0.23]
  transforms2:
    - type: RandomDistort
      brightness_range: 0.6
      contrast_range: 0.4
      saturation_range: 0.6
      hue_prob: 0.0
      sharpness_range: 0.2
      sharpness_prob: 0.5
    - type: RandomBlur
      prob: 0.5
      blur_type: random
    - type: RandomNoise
    - type: Normalize
      mean: [0.485, 0.458, 0.408]
      std: [0.23, 0.23, 0.23]
  mode: train

val_dataset:
  type: SUPERVISELY
  dataset_root: data/Supervisely_face
  common_transforms:
    - type: ScalePadding
      target_size: [224, 224]
      im_padding_value: [127.5, 127.5, 127.5]
      label_padding_value: 0
    - type: Normalize
      mean: [0.485, 0.458, 0.408]
      std: [0.23, 0.23, 0.23]
  transforms1: null
  transforms2: null
  mode: val
# PP-HumanSeg-Lite
ConnectNet, a self-developed ultra-lightweight model, is suitable for real-time segmentation scenarios on the web and mobile devices. See the [paper](https://arxiv.org/abs/2112.07146) for more information.
## Network Structure
![](pphumanseg_lite.png)
## Performance
Refer to [PP-HumanSeg](../../contrib/PP-HumanSeg).
model:
  type: PPHumanSegLite
  align_corners: False
  num_classes: 2

export:
  transforms:
    - type: Resize
      target_size: [398, 224]
    - type: Normalize

val_dataset:
  type: Dataset
  dataset_root: data/mini_supervisely
  val_path: data/mini_supervisely/val.txt
  num_classes: 2
  transforms:
    - type: Resize
      target_size: [398, 224]
    - type: Normalize
  mode: val
batch_size: 64
iters: 2000

train_dataset:
  type: Dataset
  dataset_root: data/mini_supervisely
  train_path: data/mini_supervisely/train.txt
  num_classes: 2
  transforms:
    - type: Resize
      target_size: [398, 224]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.4
      contrast_range: 0.4
      saturation_range: 0.4
    - type: Normalize
  mode: train

val_dataset:
  type: Dataset
  dataset_root: data/mini_supervisely
  val_path: data/mini_supervisely/val.txt
  num_classes: 2
  transforms:
    - type: Resize
      target_size: [398, 224]
    - type: Normalize
  mode: val

export:
  transforms:
    - type: Resize
      target_size: [398, 224]
    - type: Normalize

optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 0.0005

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.05
  end_lr: 0
  power: 0.9

loss:
  types:
    - type: CrossEntropyLoss
  coef: [1]

model:
  type: PPHumanSegLite
  align_corners: False
  num_classes: 2
# PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model
## Reference
> Juncai Peng, Yi Liu, Shiyu Tang, Yuying Hao, Lutao Chu, Guowei Chen, Zewu Wu, Zeyu Chen, Zhiliang Yu, Yuning Du, Qingqing Dang, Baohua Lai, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma. "PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model." https://arxiv.org/abs/2204.02681
## Overview
We propose PP-LiteSeg, a novel lightweight model for the real-time semantic segmentation task. Specifically, we present a Flexible and Lightweight Decoder (FLD) to reduce the computation overhead of the previous decoder. To strengthen feature representations, we propose a Unified Attention Fusion Module (UAFM), which takes advantage of spatial and channel attention to produce a weight and then fuses the input features with this weight. Moreover, a Simple Pyramid Pooling Module (SPPM) is proposed to aggregate global context with low computation cost.
<div align="center">
<img src="https://user-images.githubusercontent.com/52520497/162148786-c8b91fd1-d006-4bad-8599-556daf959a75.png" width = "600" height = "300" alt="arch" />
</div>
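To make the UAFM design more concrete, below is a minimal, self-contained sketch of its spatial-attention fusion written against the Paddle API. The class and parameter names are illustrative rather than PaddleSeg's actual implementation; see `paddleseg/models` in the repository for the real module.

```python
# Minimal sketch of UAFM-style spatial-attention fusion (illustrative, not PaddleSeg's code).
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class UAFMSketch(nn.Layer):
    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        # project both inputs to a common channel count
        self.conv_low = nn.Conv2D(low_ch, out_ch, 3, padding=1)
        self.conv_high = nn.Conv2D(high_ch, out_ch, 3, padding=1)
        # 4 per-pixel statistics (mean/max of each feature) -> 1-channel weight
        self.conv_atten = nn.Conv2D(4, 1, 3, padding=1)

    def forward(self, feat_low, feat_high):
        feat_low = self.conv_low(feat_low)
        feat_high = self.conv_high(feat_high)
        # upsample the high-level (smaller) feature to the low-level resolution
        feat_up = F.interpolate(feat_high, size=feat_low.shape[2:], mode='bilinear')
        # spatial attention: channel-wise mean/max of both features
        stats = paddle.concat([
            paddle.mean(feat_up, axis=1, keepdim=True),
            paddle.max(feat_up, axis=1, keepdim=True),
            paddle.mean(feat_low, axis=1, keepdim=True),
            paddle.max(feat_low, axis=1, keepdim=True),
        ], axis=1)
        alpha = F.sigmoid(self.conv_atten(stats))    # weight in [0, 1]
        # weighted fusion: out = up * alpha + low * (1 - alpha)
        return feat_up * alpha + feat_low * (1 - alpha)

x_low = paddle.randn([1, 64, 64, 128])    # low-level feature (higher resolution)
x_high = paddle.randn([1, 128, 32, 64])   # high-level feature (lower resolution)
print(UAFMSketch(64, 128, 96)(x_low, x_high).shape)   # [1, 96, 64, 128]
```

The channel-attention variant follows the same pattern, with the weight computed from pooled channel statistics instead of per-pixel ones.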
## Training
**Prepare:**
* Install the GPU driver, CUDA toolkit and cuDNN
* Install Paddle and PaddleSeg ([doc](../../docs/install.md))
* Download the datasets and link them to `PaddleSeg/data` ([Cityscapes](https://paddleseg.bj.bcebos.com/dataset/cityscapes.tar), [CamVid](https://paddleseg.bj.bcebos.com/dataset/camvid.tar)); a download sketch follows the directory tree below
```
PaddleSeg/data
├── cityscapes
│   ├── gtFine
│   ├── infer.list
│   ├── leftImg8bit
│   ├── test.list
│   ├── train.list
│   ├── trainval.list
│   └── val.list
├── camvid
│   ├── annot
│   ├── images
│   ├── README.md
│   ├── test.txt
│   ├── train.txt
│   └── val.txt
```
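If you prefer to script the dataset preparation, here is a minimal Python sketch that downloads and extracts both archives into `PaddleSeg/data`. It assumes you run it from the PaddleSeg root and that the URLs above remain valid; using `wget` and `tar` directly works just as well.

```python
# Minimal sketch: download and extract the Cityscapes/CamVid archives into PaddleSeg/data.
import os
import tarfile
import urllib.request

DATASETS = {
    "cityscapes": "https://paddleseg.bj.bcebos.com/dataset/cityscapes.tar",
    "camvid": "https://paddleseg.bj.bcebos.com/dataset/camvid.tar",
}

os.makedirs("data", exist_ok=True)
for name, url in DATASETS.items():
    archive = os.path.join("data", f"{name}.tar")
    if not os.path.exists(archive):
        print(f"downloading {url} ...")
        urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive) as tar:
        tar.extractall("data")          # yields data/cityscapes and data/camvid
```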
**Training:**
The config files of PP-LiteSeg are under `PaddleSeg/configs/pp_liteseg/`.
Based on the `train.py` script, we set the config file and start training the model.
```Shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
export model=pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k # test resolution is 1024*512
# export model=pp_liteseg_stdc1_cityscapes_1024x512_scale0.75_160k # test resolution is 1536x768
# export model=pp_liteseg_stdc1_cityscapes_1024x512_scale1.0_160k # test resolution is 2048x1024
# export model=pp_liteseg_stdc2_cityscapes_1024x512_scale0.5_160k
# export model=pp_liteseg_stdc2_cityscapes_1024x512_scale0.75_160k
# export model=pp_liteseg_stdc2_cityscapes_1024x512_scale1.0_160k
# export model=pp_liteseg_stdc1_camvid_960x720_10k
# export model=pp_liteseg_stdc2_camvid_960x720_10k
python -m paddle.distributed.launch tools/train.py \
--config configs/pp_liteseg/${model}.yml \
--save_dir output/${model} \
--save_interval 1000 \
--num_workers 3 \
--do_eval \
--use_vdl
```
After the training, the weights are saved in `PaddleSeg/output/xxx/best_model/model.pdparams`.
Refer to [doc](../../docs/train/train.md) for the detailed usage of training.
## Evaluation
With the config file and trained weights, we use the `val.py` script to evaluate the model.
Refer to [doc](../../docs/evaluation/evaluate/evaluate.md) for the detailed usage of evaluation.
```shell
export CUDA_VISIBLE_DEVICES=0
export model=pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k
# export other model
python tools/val.py \
--config configs/pp_liteseg/${model}.yml \
--model_path output/${model}/best_model/model.pdparams \
--num_workers 3
```
## Deployment
**Using ONNX+TRT**
Prepare:
* Install the GPU driver, CUDA toolkit and cuDNN
* Download the TensorRT 7 tar file from [Nvidia](https://developer.nvidia.com/tensorrt). We provide [cuda10.2-cudnn8.0-trt7.1](https://paddle-inference-dist.bj.bcebos.com/tensorrt_test/cuda10.2-cudnn8.0-trt7.1.tgz)
* Install the TensorRT wheel from the tar file, i.e., `pip install TensorRT-7.1.3.4/python/xx.whl`
* Set the library path, i.e., `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:TensorRT-7.1.3.4/lib`
* Install Paddle and PaddleSeg ([doc](../../docs/install.md))
* Run `pip install 'pycuda>=2019.1.1'`
* Run `pip install paddle2onnx onnx onnxruntime`
We measure the inference speed with [infer_onnx_trt.py](../../deploy/python/infer_onnx_trt.py), which first exports the Paddle model to ONNX and then runs the ONNX model with TensorRT.
Sometimes the adaptive average pooling op cannot be converted to ONNX. To work around this, you can adjust the input shape of the model to a multiple of 128.
```shell
python deploy/python/infer_onnx_trt.py \
--config configs/pp_liteseg/pp_liteseg_xxx.yml \
--width 1024 \
--height 512
```
Please refer to [infer_onnx_trt.py](../../deploy/python/infer_onnx_trt.py) for the detailed usage.
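As a rough illustration of the export step, the sketch below converts a PP-LiteSeg model to ONNX with `paddle.onnx.export`. It assumes `PPLiteSeg` and `STDC1` can be imported from `paddleseg.models` (the component names used in the configs above) and that `paddle2onnx` is installed; the input height and width are kept as multiples of 128 to sidestep the adaptive-average-pooling issue mentioned above. See the script itself for the actual flow.

```python
# Hypothetical export sketch: build a PP-LiteSeg model and export it to ONNX
# with a fixed input shape whose sides are multiples of 128.
import paddle
from paddleseg.models import PPLiteSeg
from paddleseg.models.backbones import STDC1

model = PPLiteSeg(num_classes=19, backbone=STDC1())   # trained weights omitted for brevity
model.eval()

input_spec = [paddle.static.InputSpec(shape=[1, 3, 512, 1024], dtype='float32')]
# saves output/pp_liteseg.onnx
paddle.onnx.export(model, 'output/pp_liteseg', input_spec=input_spec, opset_version=11)
```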
**Using PaddleInference**
Export the trained model as an inference model ([doc](../../docs/model_export.md)).
Use PaddleInference to deploy the inference model on Nvidia GPU and x86 CPU ([Python API doc](../../docs/deployment/inference/python_inference.md), [C++ API doc](../../docs/deployment/inference/cpp_inference.md)).
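For a rough picture of the Python deployment path, the sketch below loads an exported model with the Paddle Inference API and runs a single forward pass. The file names follow PaddleSeg's usual export layout (`model.pdmodel` / `model.pdiparams`) and the random input stands in for a preprocessed image, so treat it as a starting point rather than the full deployment script.

```python
# Minimal Paddle Inference sketch (GPU): load an exported model and run one input.
import numpy as np
from paddle.inference import Config, create_predictor

config = Config("output/inference_model/model.pdmodel",
                "output/inference_model/model.pdiparams")   # assumed export paths
config.enable_use_gpu(100, 0)            # 100 MB initial workspace on GPU 0
predictor = create_predictor(config)

# NCHW float32 input; mean/std should match the Normalize transform in the config
img = np.random.rand(1, 3, 512, 1024).astype("float32")
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(img)
predictor.run()
output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
pred = output_handle.copy_to_cpu()       # per-pixel predictions
print(pred.shape)
```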
## Performance
### Cityscapes
| Model | Backbone | Training Iters | Train Resolution | Test Resolution | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|-|-|-|-|-|-|-|-|-|
|PP-LiteSeg-T|STDC1|160000|1024x512|1024x512|73.10%|73.89%|-|[config](./pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k.yml)\|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k/model.pdparams)\|[log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k/train.log)\|[vdl](https://www.paddlepaddle.org.cn/paddle/visualdl/service/app/scalar?id=66db3a2815980e41274ad587df2cd4e4)|
|PP-LiteSeg-T|STDC1|160000|1024x512|1536x768|76.03%|76.74%|-|[config](./pp_liteseg_stdc1_cityscapes_1024x512_scale0.75_160k.yml)\|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc1_cityscapes_1024x512_scale0.75_160k/model.pdparams)\|[log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc1_cityscapes_1024x512_scale0.75_160k/train.log)\|[vdl](https://www.paddlepaddle.org.cn/paddle/visualdl/service/app/scalar?id=ea5d56fbfceb8d020eabe46e9bc8c40c)|
|PP-LiteSeg-T|STDC1|160000|1024x512|2048x1024|77.04%|77.73%|77.46%|[config](./pp_liteseg_stdc1_cityscapes_1024x512_scale1.0_160k.yml)\|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc1_cityscapes_1024x512_scale1.0_160k/model.pdparams)\|[log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc1_cityscapes_1024x512_scale1.0_160k/train.log)\|[vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=b9d2ca9445c5b3ee41db8ec37252d3e8)|
|PP-LiteSeg-B|STDC2|160000|1024x512|1024x512|75.25%|75.65%|-|[config](./pp_liteseg_stdc2_cityscapes_1024x512_scale0.5_160k.yml)\|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc2_cityscapes_1024x512_scale0.5_160k/model.pdparams)\|[log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc2_cityscapes_1024x512_scale0.5_160k/train.log)\|[vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=75a52ed995914223474b3c17e628d65e)|
|PP-LiteSeg-B|STDC2|160000|1024x512|1536x768|78.75%|79.23%|-|[config](./pp_liteseg_stdc2_cityscapes_1024x512_scale0.75_160k.yml)\|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc2_cityscapes_1024x512_scale0.75_160k/model.pdparams)\|[log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc2_cityscapes_1024x512_scale0.75_160k/train.log)\|[vdl](https://www.paddlepaddle.org.cn/paddle/visualdl/service/app/scalar?id=a248fe1f645018306f1d4a0da33d97d6)|
|PP-LiteSeg-B|STDC2|160000|1024x512|2048x1024|79.04%|79.52%|79.85%|[config](./pp_liteseg_stdc2_cityscapes_1024x512_scale1.0_160k.yml)\|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc2_cityscapes_1024x512_scale1.0_160k/model.pdparams)\|[log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc2_cityscapes_1024x512_scale1.0_160k/train.log)\|[vdl](https://www.paddlepaddle.org.cn/paddle/visualdl/service/app/scalar?id=12fa0144ca6a1541186afd2c53d31bcb)|
Note that:
* Use [infer_onnx_trt.py](../../deploy/python/infer_onnx_trt.py) to measure the inference speed.
* The flip denotes horizontal flipping; the ms denotes multi-scale testing, i.e., (0.75, 1.0, 1.25) * test_resolution.
* Similar to other models in PaddleSeg, the mIoU values in the above table are evaluated on the Cityscapes validation set.
* You can download the trained models from the above table and use them for evaluation.
**The comparison with state-of-the-art real-time methods on Cityscapes is as follows.**
<div align="center">
|Model|Encoder|Resolution|mIoU(Val)|mIoU(Test)|FPS|
|-|-|-|-|-|-|
| ENet | - | 512x1024 | - | 58.3 | 76.9 |
| ICNet | PSPNet50 | 1024x2048 | - | 69.5 | 30.3 |
| ESPNet | ESPNet | 512x1024 | - | 60.3 | 112.9 |
| ESPNetV2 | ESPNetV2 | 512x1024 | 66.4 | 66.2 | - |
| SwiftNet | ResNet18 | 1024x2048 | 75.4 | 75.5 | 39.9 |
| BiSeNetV1 | Xception39 | 768x1536 | 69.0 | 68.4 | 105.8 |
| BiSeNetV1-L | ResNet18 | 768x1536 | 74.8 | 74.7 | 65.5 |
| BiSeNetV2 | - | 512x1024 | 73.4 | 72.6 | 156 |
| BiSeNetV2-L | - | 512x1024 | 75.8 | 75.3 | 47.3 |
| FasterSeg | - | 1024x2048 | 73.1 | 71.5 | 163.9 |
| SFNet | DF1 | 1024x2048 | - | 74.5 | 121 |
| STDC1-Seg50 | STDC1 | 512x1024 | 72.2 | 71.9 | 250.4 |
| STDC2-Seg50 | STDC2 | 512x1024 | 74.2 | 73.4 | 188.6 |
| STDC1-Seg75 | STDC1 | 768x1536 | 74.5 | 75.3 | 126.7 |
| STDC2-Seg75 | STDC2 | 768x1536 | 77.0 | 76.8 | 97.0 |
| PP-LiteSeg-T1 | STDC1 | 512x1024 | 73.1 | 72.0 | 273.6 |
| PP-LiteSeg-B1 | STDC2 | 512x1024 | 75.3 | 73.9 | 195.3 |
| PP-LiteSeg-T2 | STDC1 | 768x1536 | 76.0 | 74.9 | 143.6 |
| PP-LiteSeg-B2 | STDC2 | 768x1536 | 78.2 | 77.5 | 102.6 |
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/52520497/162148733-70be896a-eadb-4790-94e5-f48dad356b2d.png" width = "500" height = "430" alt="iou_fps" />
</div>
### CamVid
| Model | Backbone | Training Iters | Train Resolution | Test Resolution | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|-|-|-|-|-|-|-|-|-|
|PP-LiteSeg-T|STDC1|10000|960x720|960x720|73.30%|73.89%|73.66%|[config](./pp_liteseg_stdc1_camvid_960x720_10k.yml)\|[model](https://paddleseg.bj.bcebos.com/dygraph/camvid/pp_liteseg_stdc1_camvid_960x720_10k/model.pdparams)\|[log](https://paddleseg.bj.bcebos.com/dygraph/camvid/pp_liteseg_stdc1_camvid_960x720_10k/train.log)\|[vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=5685c196ff76493cecf867564c7e49be)|
|PP-LiteSeg-B|STDC2|10000|960x720|960x720|75.10%|75.85%|75.48%|[config](./pp_liteseg_stdc2_camvid_960x720_10k.yml)\|[model](https://paddleseg.bj.bcebos.com/dygraph/camvid/pp_liteseg_stdc2_camvid_960x720_10k/model.pdparams)\|[log](https://paddleseg.bj.bcebos.com/dygraph/camvid/pp_liteseg_stdc2_camvid_960x720_10k/train.log)\|[vdl](https://www.paddlepaddle.org.cn/paddle/visualdl/service/app/scalar?id=cf5223dd121d58ceff7fd93135efb573)|
Note:
* The flip denotes horizontal flipping; the ms denotes multi-scale testing, i.e., (0.75, 1.0, 1.25) * test_resolution.
* The mIoU values in the above table are evaluated on the CamVid test set.
batch_size: 6  # total: 4*6
iters: 10000

train_dataset:
  type: Dataset
  dataset_root: data/camvid
  num_classes: 11
  mode: train
  train_path: data/camvid/train.txt
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.5
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [960, 720]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.5
      contrast_range: 0.5
      saturation_range: 0.5
    - type: Normalize

val_dataset:
  type: Dataset
  dataset_root: data/camvid
  num_classes: 11
  mode: val
  val_path: data/camvid/val.txt
  transforms:
    - type: Normalize

optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 5.0e-4

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  end_lr: 0
  power: 0.9
  warmup_iters: 200
  warmup_start_lr: 1.0e-5

loss:
  types:
    - type: OhemCrossEntropyLoss
      min_kept: 250000  # batch_size * 960 * 720 // 16
    - type: OhemCrossEntropyLoss
      min_kept: 250000
    - type: OhemCrossEntropyLoss
      min_kept: 250000
  coef: [1, 1, 1]

model:
  type: PPLiteSeg
  backbone:
    type: STDC1
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/PP_STDCNet1.tar.gz
  arm_out_chs: [32, 64, 128]
  seg_head_inter_chs: [32, 64, 64]