Commit 0d97cc8c by Sugon_ldc: add new model
_base_: '../_base_/pascal_voc12aug.yml'

model:
  type: ISANet
  isa_channels: 256
  backbone:
    type: ResNet50_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  align_corners: True

optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 0.00001

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9

loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
# LRASPP
## Reference
> Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, et al. "Searching for MobileNetV3." In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1314-1324. 2019.
## Performance
### Cityscapes
| Model | Backbone | Resolution | Pooling Method | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|LRASPP|MobileNetV3_large_x1_0_os8|1024x512|Global|80000|72.33%|72.63%|73.77%|[config](./lraspp_mobilenetv3_cityscapes_1024x512_80k.yml) \| [model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/lraspp_mobilenetv3_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/lraspp_mobilenetv3_cityscapes_1024x512_80k/train.log) \| [vdl](https://www.paddlepaddle.org.cn/paddle/visualdl/service/app?id=d42c84fe5407fd2f1cf08e355348c441)|
|LRASPP|MobileNetV3_large_x1_0_os8|1024x512|Large kernel|80000|73.19%|73.40%|74.20%|[config](lraspp_mobilenetv3_cityscapes_1024x512_80k_large_kernel.yml) \| [model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/lraspp_mobilenetv3_cityscapes_1024x512_80k_large_kernel/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/lraspp_mobilenetv3_cityscapes_1024x512_80k_large_kernel/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=76c9c025d913c90ba703eeb5cef307e1)|
|LRASPP|MobileNetV3_large_x1_0|1024x512|Global|80000|70.13%|70.43%|72.12%|[config](lraspp_mobilenetv3_cityscapes_1024x512_80k_os32.yml) \| [model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/lraspp_mobilenetv3_cityscapes_1024x512_80k_os32/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/lraspp_mobilenetv3_cityscapes_1024x512_80k_os32/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=2ee4619b2858f38ff92cf602b793d248)|
Note that:
- The *global* pooling method uses a global average pooling layer in the LR-ASPP head, which adapts readily to small input images. In contrast, the *large-kernel* pooling method uses a 49x49 kernel for average pooling, consistent with the design in the original paper; the sketch after this list contrasts the two options.
- MobileNetV3_\*_os8 is a variant of MobileNetV3 tailored for semantic segmentation: the output stride is 8, and dilated convolutions replace the vanilla convolutions in the last two stages.
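The following is a minimal PaddlePaddle sketch of the two context-pooling options, written for illustration rather than taken from this repo. The 49x49 kernel with strides of [16, 20] follows the LR-ASPP design in the MobileNetV3 paper; the tensor sizes in the comments are assumptions for an os8 feature map.

```python
# Minimal sketch contrasting the two pooling options (illustrative, not the
# repo's implementation). Kernel 49x49 with strides [16, 20] follows the
# LR-ASPP design in the MobileNetV3 paper.
import paddle
import paddle.nn.functional as F

def context_pooling(feat, method="global"):
    """feat: (N, C, H, W) feature map entering the LR-ASPP context branch."""
    if method == "global":
        # One value per channel, so arbitrarily small inputs are fine.
        return F.adaptive_avg_pool2d(feat, output_size=1)
    # Large-kernel average pooling keeps a coarse spatial layout, but the
    # feature map must be at least 49x49.
    return F.avg_pool2d(feat, kernel_size=49, stride=[16, 20])

x = paddle.rand([1, 160, 64, 128])  # assumed os8 features for a 512x1024 input
print(context_pooling(x, "global").shape)        # [1, 160, 1, 1]
print(context_pooling(x, "large_kernel").shape)  # [1, 160, 1, 4]
```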
_base_: '../_base_/cityscapes.yml'

batch_size: 4
iters: 80000

optimizer:
  weight_decay: 5.0e-4

lr_scheduler:
  warmup_iters: 1000
  warmup_start_lr: 1.0e-5
  learning_rate: 0.005

loss:
  types:
    - type: OhemCrossEntropyLoss
      min_kept: 130000
  coef: [1]

train_dataset:
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [1024, 512]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.5
      contrast_range: 0.5
      saturation_range: 0.5
    - type: Normalize
  mode: train

model:
  type: LRASPP
  backbone:
    type: MobileNetV3_large_x1_0_os8  # out channels: [24, 40, 112, 160]
    pretrained: https://paddleseg.bj.bcebos.com/dygraph/backbone/mobilenetv3_large_x1_0_ssld.tar.gz
  backbone_indices: [0, 1, 3]
  lraspp_head_inter_chs: [32, 64]
  lraspp_head_out_ch: 128
  resize_mode: bilinear
_base_: '../_base_/cityscapes.yml'

batch_size: 4
iters: 80000

optimizer:
  weight_decay: 5.0e-4

lr_scheduler:
  warmup_iters: 1000
  warmup_start_lr: 1.0e-5
  learning_rate: 0.005

loss:
  types:
    - type: OhemCrossEntropyLoss
      min_kept: 130000
  coef: [1]

train_dataset:
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [1024, 512]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.5
      contrast_range: 0.5
      saturation_range: 0.5
    - type: Normalize
  mode: train

model:
  type: LRASPP
  backbone:
    type: MobileNetV3_large_x1_0_os8  # out channels: [24, 40, 112, 160]
    pretrained: https://paddleseg.bj.bcebos.com/dygraph/backbone/mobilenetv3_large_x1_0_ssld.tar.gz
  backbone_indices: [0, 1, 3]
  lraspp_head_inter_chs: [32, 64]
  lraspp_head_out_ch: 128
  resize_mode: bilinear
  use_gap: False
_base_: '../_base_/cityscapes.yml'

batch_size: 4
iters: 80000

optimizer:
  weight_decay: 5.0e-4

lr_scheduler:
  warmup_iters: 1000
  warmup_start_lr: 1.0e-5
  learning_rate: 0.005

loss:
  types:
    - type: OhemCrossEntropyLoss
      min_kept: 130000
  coef: [1]

train_dataset:
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [1024, 512]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.5
      contrast_range: 0.5
      saturation_range: 0.5
    - type: Normalize
  mode: train

model:
  type: LRASPP
  backbone:
    type: MobileNetV3_large_x1_0  # out channels: [24, 40, 112, 160]
    pretrained: https://paddleseg.bj.bcebos.com/dygraph/backbone/mobilenetv3_large_x1_0_ssld.tar.gz
  backbone_indices: [0, 1, 3]
  lraspp_head_inter_chs: [32, 64]
  lraspp_head_out_ch: 128
  resize_mode: bilinear
English | [简体中文](./README_cn.md)
# MobileSeg
MobileSeg models are semantic segmentation models designed for mobile and edge devices. They adopt an encoder-decoder architecture and use lightweight networks as the encoder; a sketch of this pattern is shown below.
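As a concrete, hedged illustration of this pattern, the skeleton below pairs an arbitrary lightweight backbone that returns multi-scale features with a naive fuse-and-upsample decoder. The class and argument names are invented for this sketch and are not PaddleSeg's MobileSeg API, which additionally uses a context module (`cm_*`) and aggregation modules (`arm_*`), as the configs below show.

```python
# Illustrative encoder-decoder skeleton (not PaddleSeg's MobileSeg code):
# any lightweight backbone that returns a list of multi-scale features can
# serve as the encoder.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class TinyEncoderDecoder(nn.Layer):
    def __init__(self, backbone, feat_chs, num_classes):
        super().__init__()
        self.backbone = backbone                      # lightweight encoder
        self.fuse = nn.Conv2D(sum(feat_chs), 128, 1)  # naive decoder: concat + 1x1
        self.head = nn.Conv2D(128, num_classes, 1)

    def forward(self, x):
        feats = self.backbone(x)  # list of (N, C_i, H_i, W_i), fine to coarse
        size = feats[0].shape[2:]
        up = [F.interpolate(f, size, mode="bilinear") for f in feats]
        logit = self.head(self.fuse(paddle.concat(up, axis=1)))
        # upsample logits back to the input resolution
        return F.interpolate(logit, x.shape[2:], mode="bilinear")
```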
## Reference
> Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. "Mobilenetv2: Inverted residuals and linear bottlenecks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4510-4520. 2018.
> Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. "Searching for mobilenetv3." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314-1324. 2019.
> Ma, Ningning, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. "Shufflenet v2: Practical guidelines for efficient cnn architecture design." In Proceedings of the European conference on computer vision (ECCV), pp. 116-131. 2018.
> Yu, Changqian, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, and Jingdong Wang. "Lite-hrnet: A lightweight high-resolution network." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10440-10450. 2021.
> Han, Kai, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, and Chang Xu. "Ghostnet: More features from cheap operations." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1580-1589. 2020.
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|MobileSeg|MobileNetV2|1024x512|80000|73.94%|74.32%|75.33%|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_mobilenetv2_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_mobilenetv2_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=f210c79b6fd52f5135cf2f238e9d678d)|
|MobileSeg|MobileNetV3_large_x1_0|1024x512|80000|73.47%|73.72%|74.70%|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_mobilenetv3_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_mobilenetv3_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=28c57d0e666337ea98a1046160ef95d2)|
|MobileSeg|Lite_HRNet_18|1024x512|80000|70.75%|71.62%|72.40%|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_litehrnet18_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_litehrnet18_cityscapes_1024x512_80k/train.log) \| [vdl](https://www.paddlepaddle.org.cn/paddle/visualdl/service/app/scalar?id=02706145c7c463f3c76a0cb9d54728b8)|
|MobileSeg|ShuffleNetV2_x1_0|1024x512|80000|69.46%|70.00%|70.90%|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_shufflenetv2_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_shufflenetv2_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=3d83c00cf9b90f2446959e8c97a4fb7a)|
|MobileSeg|GhostNet_x1_0|1024x512|80000|71.88%|72.22%|73.11%|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_ghostnet_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_ghostnet_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=73a6b325c0ae941a40746d53911c03bc)|
## Inference Speed
| Model | Backbone | V100 TRT Inference Speed(FPS) | Snapdragon 855 Inference Speed(FPS) |
|:-------- |:--------:|:-------------------------------:|:-----------------------------------:|
| MobileSeg | MobileNetV2 | 67.57 | 27.01 |
| **MobileSeg** | MobileNetV3_large_x1_0 | 67.39 | 32.90 |
| MobileSeg | Lite_HRNet_18 | *10.5* | 13.05 |
| MobileSeg | ShuffleNetV2_x1_0 | *37.09* | 39.61 |
| MobileSeg | GhostNet_x1_0 | *35.58* | 38.74 |
Note that:
* The V100 speed is measured with the Paddle Inference Python API, TensorRT enabled, FP32 data type, and an input of 1x3x1024x2048; a minimal benchmarking sketch follows this list.
* The Snapdragon 855 speed is measured with the Paddle Lite C++ API, single thread, and an input of 1x3x256x256.
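As a reference for reproducing the V100 numbers, here is a hedged sketch using the Paddle Inference Python API with TensorRT. The model file names are placeholders, the iteration counts are arbitrary, and exact TensorRT options can differ across Paddle versions, so treat this as an outline rather than the project's benchmarking script.

```python
# Hedged benchmarking sketch: Paddle Inference + TensorRT, FP32,
# input 1x3x1024x2048. "model.pdmodel"/"model.pdiparams" are placeholder
# paths for an exported inference model.
import time
import numpy as np
from paddle.inference import Config, PrecisionType, create_predictor

config = Config("model.pdmodel", "model.pdiparams")
config.enable_use_gpu(100, 0)  # 100 MB initial GPU memory pool, device 0
config.enable_tensorrt_engine(workspace_size=1 << 30,
                              max_batch_size=1,
                              precision_mode=PrecisionType.Float32)
predictor = create_predictor(config)

input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
data = np.random.rand(1, 3, 1024, 2048).astype("float32")

for _ in range(10):            # warm up (arbitrary count)
    input_handle.copy_from_cpu(data)
    predictor.run()

start = time.time()
for _ in range(50):            # timed runs (arbitrary count)
    input_handle.copy_from_cpu(data)
    predictor.run()
print("FPS: %.2f" % (50 / (time.time() - start)))
```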
Simplified Chinese | [English](./README.md)
# MobileSeg
The MobileSeg family of models adopts an encoder-decoder architecture with lightweight backbone networks, and is suitable for deployment on low-compute hardware such as x86 CPUs and ARM CPUs.
## Reference
> Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. "Mobilenetv2: Inverted residuals and linear bottlenecks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4510-4520. 2018.
> Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. "Searching for mobilenetv3." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314-1324. 2019.
> Ma, Ningning, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. "Shufflenet v2: Practical guidelines for efficient cnn architecture design." In Proceedings of the European conference on computer vision (ECCV), pp. 116-131. 2018.
> Yu, Changqian, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, and Jingdong Wang. "Lite-hrnet: A lightweight high-resolution network." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10440-10450. 2021.
> Han, Kai, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, and Chang Xu. "Ghostnet: More features from cheap operations." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1580-1589. 2020.
## Segmentation Accuracy
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|MobileSeg|MobileNetV2|1024x512|80000|73.94%|74.32%|75.33%|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_mobilenetv2_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_mobilenetv2_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=f210c79b6fd52f5135cf2f238e9d678d)|
|MobileSeg|MobileNetV3_large_x1_0|1024x512|80000|73.47%|73.72%|74.70%|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_mobilenetv3_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_mobilenetv3_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=28c57d0e666337ea98a1046160ef95d2)|
|MobileSeg|Lite_HRNet_18|1024x512|80000|70.75%|71.62%|72.40%|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_litehrnet18_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_litehrnet18_cityscapes_1024x512_80k/train.log) \| [vdl](https://www.paddlepaddle.org.cn/paddle/visualdl/service/app/scalar?id=02706145c7c463f3c76a0cb9d54728b8)|
|MobileSeg|ShuffleNetV2_x1_0|1024x512|80000|69.46%|70.00%|70.90%|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_shufflenetv2_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_shufflenetv2_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=3d83c00cf9b90f2446959e8c97a4fb7a)|
|MobileSeg|GhostNet_x1_0|1024x512|80000|71.88%|72.22%|73.11%|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_ghostnet_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mobileseg_ghostnet_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=73a6b325c0ae941a40746d53911c03bc)|
## Inference Speed
| Model | Backbone | V100 TRT Inference Speed(FPS) | Snapdragon 855 Inference Speed(FPS) |
|:-------- |:--------:|:-------------------------------:|:-----------------------------------:|
| MobileSeg | MobileNetV2 | 67.57 | 27.01 |
| **MobileSeg** | MobileNetV3_large_x1_0 | 67.39 | 32.90 |
| MobileSeg | Lite_HRNet_18 | *10.5* | 13.05 |
| MobileSeg | ShuffleNetV2_x1_0 | *37.09* | 39.61 |
| MobileSeg | GhostNet_x1_0 | *35.58* | 38.74 |
Test conditions:
* Nvidia V100 GPU: Paddle Inference Python API, TensorRT enabled, FP32 data type, input size 1x3x1024x2048.
* Xiaomi 9 (Snapdragon 855 CPU): Paddle Lite C++ API, single thread, input size 1x3x256x256.
_base_: '../_base_/cityscapes.yml'

batch_size: 4  # 4 GPUs are used by default
iters: 80000

optimizer:
  weight_decay: 5.0e-4

lr_scheduler:
  warmup_iters: 1000
  warmup_start_lr: 1.0e-5
  learning_rate: 0.005

loss:
  types:
    - type: OhemCrossEntropyLoss
      min_kept: 130000
    - type: OhemCrossEntropyLoss
      min_kept: 130000
    - type: OhemCrossEntropyLoss
      min_kept: 130000
  coef: [1, 1, 1]

train_dataset:
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [1024, 512]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.5
      contrast_range: 0.5
      saturation_range: 0.5
    - type: Normalize
  mode: train

model:
  type: MobileSeg
  backbone:
    type: GhostNet_x1_0  # out channels: [24, 40, 112, 160]
    pretrained: https://paddleseg.bj.bcebos.com/dygraph/backbone/ghostnet_x1_0.zip
  cm_bin_sizes: [1, 2, 4]
  cm_out_ch: 128
  arm_out_chs: [32, 64, 128]
  seg_head_inter_chs: [32, 32, 32]
_base_: '../_base_/cityscapes.yml'

batch_size: 4
iters: 80000

optimizer:
  weight_decay: 5.0e-4

lr_scheduler:
  warmup_iters: 1000
  warmup_start_lr: 1.0e-5
  learning_rate: 0.005

loss:
  types:
    - type: OhemCrossEntropyLoss
      min_kept: 130000
    - type: OhemCrossEntropyLoss
      min_kept: 130000
    - type: OhemCrossEntropyLoss
      min_kept: 130000
  coef: [1, 1, 1]

train_dataset:
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [1024, 512]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.5
      contrast_range: 0.5
      saturation_range: 0.5
    - type: Normalize
  mode: train

model:
  type: MobileSeg
  backbone:
    type: Lite_HRNet_18
    use_head: True  # out channels: [40, 40, 80, 160] when True, [40, 80, 160, 320] when False
    pretrained: https://paddleseg.bj.bcebos.com/dygraph/backbone/lite_hrnet_18.tar.gz
  backbone_indices: [0, 1, 2]
  cm_bin_sizes: [1, 2, 4]
  cm_out_ch: 128
  arm_out_chs: [32, 64, 128]
  seg_head_inter_chs: [32, 32, 32]
_base_: '../_base_/cityscapes.yml'

batch_size: 4
iters: 80000

optimizer:
  weight_decay: 5.0e-4

lr_scheduler:
  warmup_iters: 1000
  warmup_start_lr: 1.0e-5
  learning_rate: 0.005

loss:
  types:
    - type: OhemCrossEntropyLoss
      min_kept: 130000
    - type: OhemCrossEntropyLoss
      min_kept: 130000
    - type: OhemCrossEntropyLoss
      min_kept: 130000
  coef: [1, 1, 1]

train_dataset:
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [1024, 512]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.5
      contrast_range: 0.5
      saturation_range: 0.5
    - type: Normalize
  mode: train

model:
  type: MobileSeg
  backbone:
    type: MobileNetV2_x1_0  # out channels: [24, 32, 96, 320]
    pretrained: https://paddleseg.bj.bcebos.com/dygraph/backbone/mobilenetv2_x1_0_ssld.tar.gz
  cm_bin_sizes: [1, 2, 4]
  cm_out_ch: 128
  arm_out_chs: [32, 64, 128]
  seg_head_inter_chs: [32, 32, 32]
_base_: '../_base_/cityscapes.yml'

batch_size: 4
iters: 80000

optimizer:
  weight_decay: 5.0e-4

lr_scheduler:
  warmup_iters: 1000
  warmup_start_lr: 1.0e-5
  learning_rate: 0.005

loss:
  types:
    - type: OhemCrossEntropyLoss
      min_kept: 130000
    - type: OhemCrossEntropyLoss
      min_kept: 130000
    - type: OhemCrossEntropyLoss
      min_kept: 130000
  coef: [1, 1, 1]

train_dataset:
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [1024, 512]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.5
      contrast_range: 0.5
      saturation_range: 0.5
    - type: Normalize
  mode: train

model:
  type: MobileSeg
  backbone:
    type: MobileNetV3_large_x1_0  # out channels: [24, 40, 112, 160]
    pretrained: https://paddleseg.bj.bcebos.com/dygraph/backbone/mobilenetv3_large_x1_0_ssld.tar.gz
  cm_bin_sizes: [1, 2, 4]
  cm_out_ch: 128
  arm_out_chs: [32, 64, 128]
  seg_head_inter_chs: [32, 32, 32]
_base_: '../_base_/cityscapes.yml'

batch_size: 4
iters: 80000

optimizer:
  weight_decay: 5.0e-4

lr_scheduler:
  warmup_iters: 1000
  warmup_start_lr: 1.0e-5
  learning_rate: 0.005

loss:
  types:
    - type: OhemCrossEntropyLoss
      min_kept: 130000
    - type: OhemCrossEntropyLoss
      min_kept: 130000
    - type: OhemCrossEntropyLoss
      min_kept: 130000
  coef: [1, 1, 1]

train_dataset:
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [1024, 512]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.5
      contrast_range: 0.5
      saturation_range: 0.5
    - type: Normalize
  mode: train

model:
  type: MobileSeg
  backbone:
    type: ShuffleNetV2_x1_0  # out channels: [24, 116, 232, 464]
    pretrained: https://paddleseg.bj.bcebos.com/dygraph/backbone/shufflenetv2_x1_0.zip
  cm_bin_sizes: [1, 2, 4]
  cm_out_ch: 128
  arm_out_chs: [32, 64, 128]
  seg_head_inter_chs: [32, 32, 32]
# PSA: Polarized Self-Attention: Towards High-quality Pixel-wise Regression
## Reference
> Huajun Liu, Fuqiang Liu, Xinyi Fan and Dong Huang. "Polarized Self-Attention: Towards High-quality Pixel-wise Regression." arXiv preprint arXiv:2107.00782v2 (2021).
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
| :--------------: | :-------------: | :--------: | :------------: | :----: | :---------: | :------------: | :----------------------------------------------------------: |
| OCRNet-HRNet+psa | HRNETV2_W48+psa | 1024x2048 | 150000 | 84.62% | 84.90% | 84.01% | [model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mscale_ocrnet_hrnetv2_psa_cityscapes_1024x2048_150k/model.pdparams)\|[log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mscale_ocrnet_hrnetv2_psa_cityscapes_1024x2048_150k/train.log)\|[vdl](#) |
### Notes
* This model is MscaleOCRNet with PSA enabled in the HRNet_W48 backbone; a sketch of the PSA block follows this list.
* Since we could not reproduce the training results from [the authors' official repo](https://github.com/DeLightCMU/PSA), we followed the settings in the original paper to train and evaluate our models, and the final accuracy is lower than that reported in the paper.
* We observed reduced accuracy when applying ms+flip augmentation to MscaleOCRNet at test time, probably because MscaleOCRNet already performs multi-scale operations inside the network.
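For orientation, below is a hedged sketch of the parallel polarized self-attention block described in the paper: a channel-only branch and a spatial-only branch, each combining a softmax-normalized query with a sigmoid gate, with the two outputs summed. It is written with plain PaddlePaddle ops for illustration and is not the implementation shipped in this repo.

```python
# Minimal sketch of parallel Polarized Self-Attention (illustrative only).
# The inner channel ratio (C/2) follows the paper; layer names are invented.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class PolarizedSelfAttention(nn.Layer):
    def __init__(self, channels):
        super().__init__()
        mid = channels // 2
        # channel-only branch
        self.ch_wv = nn.Conv2D(channels, mid, 1)
        self.ch_wq = nn.Conv2D(channels, 1, 1)
        self.ch_wz = nn.Conv2D(mid, channels, 1)
        self.ln = nn.LayerNorm(channels)
        # spatial-only branch
        self.sp_wv = nn.Conv2D(channels, mid, 1)
        self.sp_wq = nn.Conv2D(channels, mid, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        mid = c // 2
        # channel-only attention: softmax over the HW axis of the query
        v = self.ch_wv(x).reshape([n, mid, h * w])           # (N, C/2, HW)
        q = F.softmax(self.ch_wq(x).reshape([n, h * w, 1]), axis=1)
        z = paddle.matmul(v, q).reshape([n, mid, 1, 1])      # (N, C/2, 1, 1)
        ch_att = F.sigmoid(
            self.ln(self.ch_wz(z).reshape([n, c])).reshape([n, c, 1, 1]))
        ch_out = x * ch_att
        # spatial-only attention: softmax over the channel axis of the query
        v = self.sp_wv(x).reshape([n, mid, h * w])           # (N, C/2, HW)
        q = F.softmax(F.adaptive_avg_pool2d(self.sp_wq(x), 1)
                      .reshape([n, 1, mid]), axis=-1)        # (N, 1, C/2)
        sp_att = F.sigmoid(paddle.matmul(q, v).reshape([n, 1, h, w]))
        sp_out = x * sp_att
        return ch_out + sp_out                               # parallel layout

block = PolarizedSelfAttention(64)
y = block(paddle.rand([2, 64, 32, 32]))  # output keeps the input shape
```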
_base_: '../_base_/cityscapes.yml'

batch_size: 1
iters: 150000

train_dataset:
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0
    - type: RandomPaddingCrop
      crop_size: [2048, 1024]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.25
      brightness_prob: 1
      contrast_range: 0.25
      contrast_prob: 1
      saturation_range: 0.25
      saturation_prob: 1
      hue_range: 0.25
      hue_prob: 1
    - type: RandomScaleAspect
    - type: Normalize
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
  mode: train

val_dataset:
  transforms:
    - type: Normalize
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
  mode: val

export:
  transforms:
    - type: Normalize
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]

model:
  type: MscaleOCRNet
  num_classes: 19
  backbone:
    type: HRNet_W48
    use_psa: True
    padding_same: False
  pretrained: https://paddleseg.bj.bcebos.com/dygraph/cityscapes/mscale_ocrnet_hrnetv2_psa_cityscapes_1024x2048_150k/mscale_ocrnet_pretrained_mappilary.zip

optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 5.0e-4

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.005
  end_lr: 0.0
  power: 2
  warmup_iters: 5000
  warmup_start_lr: 1.0e-5

loss:
  types:
    - type: CrossEntropyLoss
    - type: MixedLoss
      losses:
        - type: RMILoss
        - type: CrossEntropyLoss
      coef: [1.0, 1.0]
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [0.4, 1.0, 0.05, 0.05]
# Object-Contextual Representations for Semantic Segmentation
## Reference
> Yuan, Yuhui, Xilin Chen, and Jingdong Wang. "Object-contextual representations for semantic segmentation." arXiv preprint arXiv:1909.11065 (2019).
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|OCRNet|HRNet_w18|1024x512|160000|80.67%|81.21%|81.30%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/ocrnet_hrnetw18_cityscapes_1024x512_160k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/ocrnet_hrnetw18_cityscapes_1024x512_160k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=901a5d0a78b71ca56f06002f05547837)|
|OCRNet|HRNet_w48|1024x512|160000|82.15%|82.59%|82.85%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/ocrnet_hrnetw48_cityscapes_1024x512_160k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/ocrnet_hrnetw48_cityscapes_1024x512_160k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=176bf6ca4d89957ffe62ac7c30fcd039) |
### Pascal VOC 2012 + Aug
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|OCRNet|HRNet_w18|512x512|40000|75.76%|76.39%|77.95%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/ocrnet_hrnetw18_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/ocrnet_hrnetw18_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=74707b83bc14b7d236146ac4ceaf6c9c)|
|OCRNet|HRNet_w48|512x512|40000|79.76%|80.47%|81.02%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/ocrnet_hrnetw48_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/ocrnet_hrnetw48_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=8f695743c799f8966a72973f3259fad4) |
_base_: '../_base_/cityscapes.yml'

batch_size: 2
iters: 160000

model:
  type: OCRNet
  backbone:
    type: HRNet_W18
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w18_ssld.tar.gz
  num_classes: 19
  backbone_indices: [0]

optimizer:
  type: sgd

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9

loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
_base_: '../_base_/cityscapes.yml'

batch_size: 2
iters: 160000

model:
  type: OCRNet
  backbone:
    type: HRNet_W18
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w18_ssld.tar.gz
  backbone_indices: [0]

optimizer:
  type: sgd

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9

loss:
  types:
    - type: MixedLoss
      losses:
        - type: CrossEntropyLoss
        - type: LovaszSoftmaxLoss
      coef: [0.8, 0.2]
    - type: MixedLoss
      losses:
        - type: CrossEntropyLoss
        - type: LovaszSoftmaxLoss
      coef: [0.8, 0.2]
  coef: [1, 0.4]
batch_size: 4
iters: 15000

train_dataset:
  type: MiniDeepGlobeRoadExtraction
  dataset_root: data/MiniDeepGlobeRoadExtraction
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [768, 768]
    - type: RandomHorizontalFlip
    - type: Normalize
  mode: train

val_dataset:
  type: MiniDeepGlobeRoadExtraction
  dataset_root: data/MiniDeepGlobeRoadExtraction
  transforms:
    - type: Normalize
  mode: val

model:
  type: OCRNet
  backbone:
    type: HRNet_W18
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w18_ssld.tar.gz
  backbone_indices: [0]

optimizer:
  type: sgd

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9

loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
batch_size: 4
iters: 15000

train_dataset:
  type: MiniDeepGlobeRoadExtraction
  dataset_root: data/MiniDeepGlobeRoadExtraction
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [768, 768]
    - type: RandomHorizontalFlip
    - type: Normalize
  mode: train

val_dataset:
  type: MiniDeepGlobeRoadExtraction
  dataset_root: data/MiniDeepGlobeRoadExtraction
  transforms:
    - type: Normalize
  mode: val

model:
  type: OCRNet
  backbone:
    type: HRNet_W18
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w18_ssld.tar.gz
  backbone_indices: [0]

optimizer:
  type: sgd

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9

loss:
  types:
    - type: MixedLoss
      losses:
        - type: CrossEntropyLoss
        - type: LovaszHingeLoss
      coef: [1, 0.01]
    - type: MixedLoss
      losses:
        - type: CrossEntropyLoss
        - type: LovaszHingeLoss
      coef: [1, 0.01]
  coef: [1, 0.4]
_base_: '../_base_/pascal_voc12aug.yml'

model:
  type: OCRNet
  backbone:
    type: HRNet_W18
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w18_ssld.tar.gz
  backbone_indices: [0]

optimizer:
  type: sgd

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9

loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 1]