"include/ck/utility/utility.hpp" did not exist on "08c692433e527adb03995994c02181ea1ad8ba7e"
Commit 0d97cc8c authored by Sugon_ldc

add new model

# Dual Attention Network for Scene Segmentation
## Reference
> Fu, Jun, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang, and Hanqing Lu. "Dual attention network for scene segmentation." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3146-3154. 2019.
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|-|-|-|-|-|-|-|-|
|DANet|ResNet50_OS8|1024x512|80000|80.27%|80.53%|-|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/danet_resnet50_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/danet_resnet50_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=6caecf1222a0cc9124a376284a402cbe)|
### Pascal VOC 2012 + Aug
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|-|-|-|-|-|-|-|-|
|DANet|ResNet50_OS8|512x512|40000|78.55%|78.93%|79.68%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/danet_resnet50_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/danet_resnet50_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=51a403a54302bc81dd5ec0310a6d50ba)|
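
The mIoU columns report mean intersection-over-union across classes. As a rough illustration of the metric (not PaddleSeg's own implementation), the sketch below computes it from flat lists of predicted and ground-truth class ids. The function name `mean_iou`, the `ignore_index=255` default, and the choice to average only over classes that actually occur are assumptions made for this example.

```python
def mean_iou(pred, label, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in the prediction or the label."""
    intersect = [0] * num_classes
    pred_area = [0] * num_classes
    label_area = [0] * num_classes
    for p, t in zip(pred, label):
        if t == ignore_index:  # assumed convention: skip unlabeled pixels
            continue
        pred_area[p] += 1
        label_area[t] += 1
        if p == t:
            intersect[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_area[c] + label_area[c] - intersect[c]
        if union > 0:  # classes absent from both inputs are left out of the mean
            ious.append(intersect[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: per-class IoUs are 1/2 (class 0) and 2/3 (class 1).
pred = [0, 0, 1, 1]
label = [0, 1, 1, 1]
print(mean_iou(pred, label, num_classes=2))
```
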
_base_: '../_base_/cityscapes.yml'
batch_size: 1
iters: 80000
model:
  type: DANet
  backbone:
    type: ResNet101_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
  num_classes: 19
  backbone_indices: [2, 3]
optimizer:
  type: sgd
lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9
loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 1, 1, 0.4]
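
The `loss` section lists one loss per model output, and `coef` gives one weight per loss. A minimal sketch of how such a list is typically reduced to a single training loss, assuming the total is a plain weighted sum; the helper name `weighted_total` and the sample loss values are hypothetical.

```python
def weighted_total(loss_values, coef):
    """Combine per-output loss values into one scalar using the coef weights."""
    if len(loss_values) != len(coef):
        raise ValueError("coef needs exactly one weight per loss")
    return sum(w * v for w, v in zip(coef, loss_values))


# Hypothetical per-output loss values for the four CrossEntropyLoss entries.
losses = [0.5, 0.5, 0.5, 0.5]
print(weighted_total(losses, coef=[1, 1, 1, 0.4]))
```

With these weights the last output contributes 40% as much as each of the other three.
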
_base_: '../_base_/cityscapes.yml'
batch_size: 2
iters: 80000
model:
  type: DANet
  backbone:
    type: ResNet50_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  num_classes: 19
  backbone_indices: [2, 3]
optimizer:
  type: sgd
lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9
loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 1, 1, 0.4]
_base_: '../_base_/pascal_voc12aug.yml'
model:
  type: DANet
  backbone:
    type: ResNet50_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  backbone_indices: [2, 3]
loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 1, 1, 0.4]
# Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes
## Reference
> Yuanduo Hong, Huihui Pan, Weichao Sun, Yisong Jia. "Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes." arXiv preprint arXiv:2101.06085 (2021).
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|-|-|-|-|-|-|-|-|
|DDRNet_23|-|1024x1024|120000|79.85%|80.11%|80.44%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/ddrnet23_cityscapes_1024x1024_120k/model.pdparams)\|[log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/ddrnet23_cityscapes_1024x1024_120k/train.log)\|[vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=33c0d5f37e5a708c605e43ef3845ea56)|
_base_: '../_base_/cityscapes_1024x1024.yml'
batch_size: 3
iters: 120000
model:
  type: DDRNet_23
  enable_auxiliary_loss: False
  pretrained: https://bj.bcebos.com/paddleseg/dygraph/cityscapes/ddrnet23_cityscapes_1024x1024_120k/pretrain/model.pdparams
optimizer:
  type: sgd
  weight_decay: 0.0005
loss:
  types:
    - type: OhemCrossEntropyLoss
  coef: [1]
lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  end_lr: 0.0
  power: 0.9
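
The `lr_scheduler` block above describes a polynomial decay from `learning_rate` down to `end_lr`. The sketch below evaluates the standard polynomial decay formula with the values from this config. It assumes the decay runs over exactly `iters` steps (120000 here) and without cycling; the function name `poly_lr` is hypothetical.

```python
def poly_lr(step, base_lr=0.01, end_lr=0.0, power=0.9, decay_steps=120000):
    """Learning rate at a given step under polynomial decay (no cycling)."""
    step = min(step, decay_steps)  # hold at end_lr once the schedule is finished
    return (base_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr


for step in (0, 30000, 60000, 90000, 120000):
    print(step, poly_lr(step))
```

Because `power` is slightly below 1, the curve stays a little above a straight linear ramp for most of training and drops faster near the end.
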
# Improving Semantic Segmentation via Decoupled Body and Edge Supervision
## Reference
> Li, Xiangtai, Xia Li, Li Zhang, Guangliang Cheng, Jianping Shi, Zhouchen Lin, Shaohua Tan, and Yunhai Tong. "Improving semantic segmentation via decoupled body and edge supervision." arXiv preprint arXiv:2007.10035 (2020).
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|DecoupledSegNet|ResNet50_OS8|1024x512|80000|80.86%|81.34%|81.49%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/decoupledsegnet_resnet50_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/decoupledsegnet_resnet50_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://www.paddlepaddle.org.cn/paddle/visualdl/service/app/scalar?id=3c5cba5e6f89b33dc75b43c62026dc12)|
|DecoupledSegNet|ResNet50_OS8|832x832|80000|81.26%|81.56%|81.80%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/decoupledsegnet_resnet50_os8_cityscapes_832x832_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/decoupledsegnet_resnet50_os8_cityscapes_832x832_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=e3e8f9044d96a57f7337f5928f2c265f)|
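
The "mIoU (flip)" column refers to evaluation with horizontal-flip augmentation: the model also scores a mirrored copy of the image, and the two score maps are combined. A minimal single-channel sketch of the idea, assuming the two maps are simply averaged; `score_fn` stands in for a model and is hypothetical, as are the helper names.

```python
def hflip(grid):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in grid]


def predict_with_flip(score_fn, image):
    """Average the scores of the image and of its mirrored copy."""
    plain = score_fn(image)
    # Score the mirrored image, then mirror the result back so pixels line up.
    mirrored = hflip(score_fn(hflip(image)))
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(plain, mirrored)
    ]


def toy_score_fn(image):
    """Hypothetical position-dependent 'model', used only for illustration."""
    return [[value * (col + 1) for col, value in enumerate(row)] for row in image]


print(predict_with_flip(toy_score_fn, [[1, 2]]))
```

The "ms+flip" column extends the same idea by also scoring rescaled copies of the image and resizing each result back before combining.
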
_base_: '../_base_/cityscapes.yml'
model:
  type: DecoupledSegNet
  backbone:
    type: ResNet50_vd
    output_stride: 8
    multi_grid: [1, 2, 4]
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  num_classes: 19
  backbone_indices: [0, 3]
  aspp_ratios: [1, 12, 24, 36]
  aspp_out_channels: 256
  align_corners: False
  pretrained: null
loss:
  types:
    - type: OhemCrossEntropyLoss
    - type: RelaxBoundaryLoss
    - type: BCELoss
      weight: 'dynamic'
      edge_label: True
    - type: OhemEdgeAttentionLoss
  coef: [1,1,25,1]
train_dataset:
  edge: True
_base_: '../_base_/cityscapes.yml'
model:
  type: DecoupledSegNet
  backbone:
    type: ResNet50_vd
    output_stride: 8
    multi_grid: [1, 2, 4]
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  num_classes: 19
  backbone_indices: [0, 3]
  aspp_ratios: [1, 12, 24, 36]
  aspp_out_channels: 256
  align_corners: False
  pretrained: null
loss:
  types:
    - type: OhemCrossEntropyLoss
    - type: RelaxBoundaryLoss
    - type: BCELoss
      weight: 'dynamic'
      edge_label: True
    - type: OhemEdgeAttentionLoss
  coef: [1,1,25,1]
train_dataset:
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.75
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [832, 832]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.4
      contrast_range: 0.4
      saturation_range: 0.4
    - type: Normalize
  edge: True
optimizer:
  weight_decay: 5.0e-4
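
The `ResizeStepScaling` entry above defines a grid of scale factors from which one is drawn at random for each training sample, before `RandomPaddingCrop` cuts the 832x832 training patch. A rough sketch of that grid and of the resulting image size, assuming evenly spaced factors and a uniform random choice; the helper names are hypothetical and the 2048x1024 input is the native Cityscapes frame size.

```python
import random


def scale_candidates(min_scale, max_scale, step):
    """Evenly spaced scale factors from min_scale to max_scale inclusive."""
    if step <= 0:
        return [min_scale]
    count = int(round((max_scale - min_scale) / step)) + 1
    return [min_scale + i * step for i in range(count)]


def scaled_size(width, height, factor):
    """Image size after resizing by the chosen factor, rounded to whole pixels."""
    return int(round(width * factor)), int(round(height * factor))


candidates = scale_candidates(0.75, 2.0, 0.25)
print(candidates)

rng = random.Random(0)  # fixed seed only to make the example repeatable
factor = rng.choice(candidates)
print(factor, scaled_size(2048, 1024, factor))
```
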
# Rethinking Atrous Convolution for Semantic Image Segmentation
## Reference
> Chen, Liang-Chieh, George Papandreou, Florian Schroff, and Hartwig Adam. "Rethinking Atrous Convolution for Semantic Image Segmentation." arXiv preprint arXiv:1706.05587 (2017).
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|DeepLabV3|ResNet50_OS8|1024x512|80000|79.90%|80.22%|80.47%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/deeplabv3_resnet50_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/deeplabv3_resnet50_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=7e30d1cb34cd94400e1e1266538dfb6c)|
|DeepLabV3|ResNet101_OS8|1024x512|80000|80.85%|81.09%|81.54%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/deeplabv3_resnet101_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/deeplabv3_resnet101_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=1ff25b7f3c5e88a051b9dd273625f942)|
### Pascal VOC 2012 + Aug
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|DeepLabV3|ResNet50_OS8|512x512|40000|79.76%|80.11%|80.71%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/deeplabv3_resnet50_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/deeplabv3_resnet50_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=a962383ae3c581aa75f644c2dfbdae29)|
|DeepLabV3|ResNet101_OS8|512x512|40000|80.62%|80.87%|81.48%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/deeplabv3_resnet101_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/deeplabv3_resnet101_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=e20ea47329476ceb9150961154b87c8b)|
_base_: 'deeplabv3_resnet50_os8_cityscapes_1024x512_80k.yml'
model:
  backbone:
    type: ResNet101_vd
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
_base_: 'deeplabv3_resnet50_os8_voc12aug_512x512_40k.yml'
model:
  backbone:
    type: ResNet101_vd
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
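
The two ResNet101 configs above contain only the keys that differ from the file named in `_base_`; everything else is inherited. A simplified sketch of that kind of inheritance as a recursive dictionary merge, assuming nested mappings are merged key by key and all other values are overwritten. The function name `merge_config` is hypothetical, and the real loader may handle extra cases not shown here.

```python
import copy


def merge_config(base, child):
    """Return base updated with child, merging nested dicts key by key."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Abbreviated ResNet50 base config and the ResNet101 override from this commit.
base = {
    "batch_size": 2,
    "iters": 80000,
    "model": {
        "type": "DeepLabV3",
        "backbone": {"type": "ResNet50_vd", "output_stride": 8},
        "backbone_indices": [3],
    },
}
child = {"model": {"backbone": {"type": "ResNet101_vd"}}}

merged = merge_config(base, child)
print(merged["model"]["backbone"])
```

Only the backbone `type` changes; `output_stride`, `backbone_indices`, `batch_size`, and `iters` carry over from the base.
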
_base_: '../_base_/cityscapes.yml'
batch_size: 2
iters: 80000
model:
  type: DeepLabV3
  backbone:
    type: ResNet50_vd
    output_stride: 8
    multi_grid: [1, 2, 4]
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  backbone_indices: [3]
  aspp_ratios: [1, 12, 24, 36]
  aspp_out_channels: 256
  align_corners: False
  pretrained: null
_base_: '../_base_/pascal_voc12aug.yml'
model:
  type: DeepLabV3
  backbone:
    type: ResNet50_vd
    output_stride: 8
    multi_grid: [1, 2, 4]
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  backbone_indices: [3]
  aspp_ratios: [1, 12, 24, 36]
  aspp_out_channels: 256
  align_corners: False
  pretrained: null
# Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
## Reference
> Chen, Liang-Chieh, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. "Encoder-decoder with atrous separable convolution for semantic image segmentation." In Proceedings of the European conference on computer vision (ECCV), pp. 801-818. 2018.
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|DeepLabV3P|ResNet50_OS8|1024x512|80000|80.36%|80.57%|80.81%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/deeplabv3p_resnet50_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/deeplabv3p_resnet50_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=860bd0049ba5495d629a96d5aaf1bf75)|
|DeepLabV3P*|ResNet50_OS8|1024x512|80000|81.18%| 81.42% | 81.48% |[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/deeplabv3p_resnet50_os8_cityscapes_1024x512_80k_rmiloss/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/deeplabv3p_resnet50_os8_cityscapes_1024x512_80k_rmiloss/train.log) \| [vdl](https://www.paddlepaddle.org.cn/paddle/visualdl/service/app/scalar?id=ce094fb8a42c056b6edb92f975cfa0e3)|
|DeepLabV3P|ResNet101_OS8|1024x512|80000|81.10%|81.38%|81.24%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/deeplabv3p_resnet101_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/deeplabv3p_resnet101_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=8b11e75b8977a0fd74180145350c27de)|
|DeepLabV3P|ResNet101_OS8|769x769|80000|81.53%|81.88%|82.12%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/deeplabv3p_resnet101_os8_cityscapes_769x769_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/deeplabv3p_resnet101_os8_cityscapes_769x769_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=420039406361cbc3cf7ec14c1084d886)|
DeepLabV3P* is DeepLabV3P with [RMI Loss](https://arxiv.org/abs/1910.12037), which requires paddlepaddle=2.2.
### Pascal VOC 2012 + Aug
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|DeepLabV3P|ResNet50_OS8|512x512|40000|80.66%|81.33%|81.49%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/deeplabv3p_resnet50_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/deeplabv3p_resnet50_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=a2891ac5fb866b3ea8c38289e5b1d686)|
|DeepLabV3P|ResNet101_OS8|512x512|40000|80.60%|80.77%|80.75%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/deeplabv3p_resnet101_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/deeplabv3p_resnet101_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=304048e5c2b57949f56b75b88ccb5645)|
_base_: 'deeplabv3p_resnet50_os8_cityscapes_1024x512_80k.yml'
model:
  backbone:
    type: ResNet101_vd
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
_base_: '../_base_/cityscapes_769x769.yml'
batch_size: 2
iters: 80000
model:
  type: DeepLabV3P
  backbone:
    type: ResNet101_vd
    output_stride: 8
    multi_grid: [1, 2, 4]
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
  num_classes: 19
  backbone_indices: [0, 3]
  aspp_ratios: [1, 12, 24, 36]
  aspp_out_channels: 256
  align_corners: True
  pretrained: null
_base_: 'deeplabv3p_resnet50_os8_voc12aug_512x512_40k.yml'
model:
  backbone:
    type: ResNet101_vd
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
_base_: '../_base_/cityscapes.yml'
batch_size: 2
iters: 80000
model:
  type: DeepLabV3P
  backbone:
    type: ResNet50_vd
    output_stride: 8
    multi_grid: [1, 2, 4]
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  num_classes: 19
  backbone_indices: [0, 3]
  aspp_ratios: [1, 12, 24, 36]
  aspp_out_channels: 256
  align_corners: False
  pretrained: null
_base_: 'deeplabv3p_resnet50_os8_cityscapes_1024x512_80k.yml'
loss:
  types:
    - type: MixedLoss
      losses:
        - type: CrossEntropyLoss
        - type: RMILoss
      coef: [0.5, 0.5]