Commit 0d97cc8c authored by Sugon_ldc
add new model

_base_: '../_base_/pascal_voc12aug.yml'
model:
  type: DeepLabV3P
  backbone:
    type: ResNet50_vd
    output_stride: 8
    multi_grid: [1, 2, 4]
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  backbone_indices: [0, 3]
  aspp_ratios: [1, 12, 24, 36]
  aspp_out_channels: 256
  align_corners: False
  pretrained: null
# Dynamic Multi-scale Filters for Semantic Segmentation
## Reference
> Junjun He, Zhongying Deng, Yu Qiao. "Dynamic Multi-scale Filters for Semantic Segmentation." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3562-3572. 2019.
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|DMNet|ResNet101_vd|1024x512|80000|79.76%|80.11%|80.56%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/dmnet_resnet101_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/dmnet_resnet101_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://www.paddlepaddle.org.cn/paddle/visualdl/service/app/scalar?id=d5bac108e3ff90136771b677d8459d17)|
_base_: '../_base_/cityscapes.yml'
batch_size: 2
iters: 80000
model:
  type: DMNet
  backbone:
    type: ResNet101_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
optimizer:
  type: sgd
  weight_decay: 0.0005
loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  end_lr: 0.0
  power: 0.9
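The `lr_scheduler` block above asks for polynomial decay from 0.01 down to 0.0 with power 0.9 over the 80000 training iterations. The snippet below is a minimal sketch of that schedule, assuming the usual non-cyclic formula `lr = (base - end) * (1 - step / total) ** power + end`. The function name `poly_lr` is hypothetical and is not part of PaddleSeg.

```python
def poly_lr(step, base_lr=0.01, end_lr=0.0, power=0.9, total_steps=80000):
    """Polynomial-decay learning rate at a given iteration (illustrative only)."""
    # Clamp so the rate stays at end_lr once training has finished.
    step = min(step, total_steps)
    remaining = 1.0 - step / total_steps
    return (base_lr - end_lr) * remaining ** power + end_lr


if __name__ == "__main__":
    for it in (0, 20000, 40000, 80000):
        print(it, poly_lr(it))
```

With `power` below 1 the rate falls slowly at first and drops faster near the end of training.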
# Disentangled Non-Local Neural Networks
## Reference
> Minghao Yin, Zhuliang Yao, Yue Cao, Xiu Li, Zheng Zhang, Stephen Lin, Han Hu:
Disentangled Non-local Neural Networks. ECCV (15) 2020: 191-207.
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) |Links |
|-|-|-|-|-|-|-|-|
|DNLNet|ResNet50_OS8|1024x512|80000|79.95%|80.43%|-|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/dnlnet_resnet50_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/dnlnet_resnet50_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=922cf0682c5e684507ab54a14ef12847)|
|DNLNet|ResNet101_OS8|1024x512|80000|81.03%|81.38%|-|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/dnlnet_resnet101_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/dnlnet_resnet101_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=3e0d13c4d9dbf4115bbba2abdc88122c)|
### Pascal VOC 2012 + Aug
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|-|-|-|-|-|-|-|-|
|DNLNet|ResNet50_OS8|512x512|40000|80.89%|81.31%|81.56%|[model](https://paddleseg.bj.bcebos.com/dygraph/pascal_voc12/dnlnet_resnet50_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/pascal_voc12/dnlnet_resnet50_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=8877c77bef8b227af22c5eb3017138ce)|
|DNLNet|ResNet101_OS8|512x512|40000|80.49%|80.83%| 81.33%|[model](https://paddleseg.bj.bcebos.com/dygraph/pascal_voc12/dnlnet_resnet101_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/pascal_voc12/dnlnet_resnet101_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=1d42c22da1c465d9a38e4204bebeeb54)|
_base_: '../_base_/cityscapes.yml'
batch_size: 2
iters: 80000
model:
  type: DNLNet
  backbone:
    type: ResNet101_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
  num_classes: 19
optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 0.00004
lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9
loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
_base_: '../_base_/pascal_voc12aug.yml'
model:
  type: DNLNet
  backbone:
    type: ResNet101_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 4.0e-05
lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9
loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
_base_: '../_base_/cityscapes.yml'
batch_size: 2
iters: 80000
model:
  type: DNLNet
  backbone:
    type: ResNet50_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  num_classes: 19
optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 0.00004
lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9
loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
_base_: '../_base_/pascal_voc12aug.yml'
model:
  type: DNLNet
  backbone:
    type: ResNet50_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 4.0e-05
lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  power: 0.9
loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
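The `loss` section above lists two `CrossEntropyLoss` entries with `coef: [1, 0.4]`. A plausible reading, stated here as an assumption, is that the first entry applies to the main segmentation head, the second to an auxiliary head, and the total training loss is their coefficient-weighted sum. The helper `combine_losses` below is a hypothetical name used only to illustrate that arithmetic.

```python
def combine_losses(loss_values, coef=(1.0, 0.4)):
    """Weighted sum of per-head loss values (illustrative sketch)."""
    if len(loss_values) != len(coef):
        raise ValueError("expected one loss value per coefficient")
    return sum(c * value for c, value in zip(coef, loss_values))


if __name__ == "__main__":
    main_loss, aux_loss = 0.5, 1.0  # made-up values for illustration
    print(combine_losses([main_loss, aux_loss]))
```

The smaller coefficient keeps the auxiliary head from dominating the gradient while still supervising intermediate features.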
# Expectation-Maximization Attention Networks for Semantic Segmentation
## Reference
> Xia Li, Zhisheng Zhong, Jianlong Wu, Yibo Yang, Zhouchen Lin, Hong Liu:
Expectation-Maximization Attention Networks for Semantic Segmentation. ICCV 2019: 9166-9175.
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) |Links |
|-|-|-|-|-|-|-|-|
|EMANet|ResNet50_OS8|1024x512|80000|79.05%|79.34%|79.69%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/emanet_resnet50_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/emanet_resnet50_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=0a05a0c4cd7d785b9707bdc59f55f585)|
|EMANet|ResNet101_OS8|1024x512|80000|80.00%|80.23%|80.53%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/emanet_resnet101_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/emanet_resnet101_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=ee6926322b8e292ce23ce62ecdaa3439)|
### Pascal VOC 2012 + Aug
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|-|-|-|-|-|-|-|-|
|EMANet|ResNet50_OS8|512x512|40000|78.60%|78.90%|79.17%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/emanet_resnet50_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/emanet_resnet50_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=3e60b80b984a71f3d2b83b8a746a819c)|
|EMANet|ResNet101_OS8|512x512|40000|79.47%|79.97%| 80.67%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/emanet_resnet101_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/emanet_resnet101_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=f33479772409766dbc40b5f031cbdb1a)|
_base_: '../_base_/cityscapes.yml'
batch_size: 2
iters: 80000
model:
  type: EMANet
  backbone:
    type: ResNet101_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
  num_classes: 19
  ema_channels: 512
  gc_channels: 256
  num_bases: 64
  stage_num: 3
  momentum: 0.1
  concat_input: True
  enable_auxiliary_loss: True
  align_corners: False
optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 0.0005
loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
_base_: '../_base_/pascal_voc12aug.yml'
model:
  type: EMANet
  backbone:
    type: ResNet101_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
  ema_channels: 512
  gc_channels: 256
  num_bases: 64
  stage_num: 3
  momentum: 0.1
  concat_input: True
  enable_auxiliary_loss: True
  align_corners: True
optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 0.0005
loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
_base_: '../_base_/cityscapes.yml'
batch_size: 2
iters: 80000
model:
  type: EMANet
  backbone:
    type: ResNet50_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  num_classes: 19
  ema_channels: 512
  gc_channels: 256
  num_bases: 64
  stage_num: 3
  momentum: 0.1
  concat_input: True
  enable_auxiliary_loss: True
  align_corners: False
optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 0.0005
loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
_base_: '../_base_/pascal_voc12aug.yml'
model:
  type: EMANet
  backbone:
    type: ResNet50_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  ema_channels: 512
  gc_channels: 256
  num_bases: 64
  stage_num: 3
  momentum: 0.1
  concat_input: True
  enable_auxiliary_loss: True
  align_corners: True
optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 0.0005
loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
# ENCNet: Context Encoding for Semantic Segmentation
## Reference
> Hang Zhang, Kristin Dana, et al. "Context Encoding for Semantic Segmentation." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7151-7160. 2018.
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|ENCNet|ResNet101_vd|1024x512|80000|79.42%|80.02%|-|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/encnet_resnet101_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/encnet_resnet101_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://www.paddlepaddle.org.cn/paddle/visualdl/service/app/index?id=c2b819e6b666e4e50bba4b525f515d41)|
_base_: '../_base_/cityscapes.yml'
batch_size: 2
iters: 80000
model:
  type: ENCNet
  backbone:
    type: ResNet101_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
  num_codes: 32
  mid_channels: 512
  backbone_indices: [1, 2, 3]
  use_se_loss: True
  add_lateral: True
optimizer:
  type: sgd
  weight_decay: 0.0005
loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
    - type: SECrossEntropyLoss
  coef: [1, 0.4, 0.2]
lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  end_lr: 0.0
  power: 0.9
# ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
## Reference
> Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello. "ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation." arXiv preprint arXiv:1606.02147 (2016).
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|ENet|-|1024x512|80000|67.42%|68.11%|67.99%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/enet_cityscapes_1024x512_80k/model.pdparams)\|[log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/enet_cityscapes_1024x512_80k/train.log)\|[vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=5d57386cdfcdb6a6bcb5135af134a0f2)|
_base_: '../_base_/cityscapes.yml'
batch_size: 8
train_dataset:
  type: Cityscapes
  dataset_root: data/cityscapes
  transforms:
    - type: RandomPaddingCrop
      crop_size: [1024, 512]
    - type: RandomDistort
      brightness_range: 0.4
      contrast_range: 0.4
      saturation_range: 0.4
    - type: Normalize
  mode: train
model:
  type: ENet
  num_classes: 19
  pretrained: Null
optimizer:
  _inherited_: False
  type: adam
  weight_decay: 0.0002
lr_scheduler:
  end_lr: 0
  learning_rate: 0.001
  power: 0.9
  type: PolynomialDecay
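The config above combines `_base_` with `_inherited_: False` in the `optimizer` section. The sketch below shows one way such a merge could behave, assuming that a child section normally overlays its base counterpart key by key and that `_inherited_: False` makes the child section replace the base section outright. `merge_config` and the base values are hypothetical and chosen only for illustration; they are not taken from `cityscapes.yml`.

```python
import copy


def merge_config(child, base):
    """Overlay `child` on `base`; honour `_inherited_: False` (assumed semantics)."""
    # A section flagged `_inherited_: False` ignores the base section entirely.
    if child.get("_inherited_", True) is False:
        return {k: v for k, v in child.items() if k != "_inherited_"}
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if key == "_inherited_":
            continue
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(value, merged[key])
        else:
            merged[key] = value
    return merged


# Hypothetical base values, for illustration only.
base = {
    "optimizer": {"type": "sgd", "momentum": 0.9, "weight_decay": 4.0e-5},
    "lr_scheduler": {"type": "PolynomialDecay", "learning_rate": 0.01, "power": 0.9},
}
child = {
    "optimizer": {"_inherited_": False, "type": "adam", "weight_decay": 0.0002},
    "lr_scheduler": {"learning_rate": 0.001, "end_lr": 0},
}
merged = merge_config(child, base)
print(merged["optimizer"])
print(merged["lr_scheduler"])
```

Under these assumptions the SGD `momentum` key does not leak into the Adam optimizer, while `lr_scheduler` still picks up any base keys the child does not override.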
# ESPNetV2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network
## Reference
> Mehta, Sachin, Mohammad Rastegari, Linda Shapiro, and Hannaneh Hajishirzi. "Espnetv2: A light-weight, power efficient, and general purpose convolutional neural network." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9190-9200. 2019.
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | Links |
|:---:|:---:|:---:|:---:|:---:|:---:|
|ESPNetV2|-|1024x512|120000|70.88%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/espnet_cityscapes_1024x512_120k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/espnet_cityscapes_1024x512_120k/train.log) \| [vdl](https://www.paddlepaddle.org.cn/paddle/visualdl/service/app/scalar?id=c717bd8c2b5a083de759492158c14ffd)|
#### Additional Requirement
- PaddlePaddle develop version built after 2021-12-30
_base_: '../_base_/cityscapes.yml'
batch_size: 8
iters: 120000
optimizer:
  _inherited_: False
  type: adam
  weight_decay: 0.0002
lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.001
  end_lr: 0.0
  power: 0.9
loss:
  types:
    - type: CrossEntropyLoss
      weight: [2.79834108, 6.92945723, 3.84068512, 9.94349362, 9.77098823, 9.51484, 10.30981624, 9.94307377, 4.64933892, 9.55759938, 7.86692178, 9.53126629, 10.3496365, 6.67234062, 10.26054204, 10.28785275, 10.28988296, 10.40546021, 10.13848367]
  coef: [1, 1]
model:
  type: ESPNetV2
  in_channels: 3
  scale: 2.0
  num_classes: 19
  drop_prob: 0.0
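The `weight` list in the `loss` section above has 19 entries, one per Cityscapes class, so that rarer classes contribute more to the loss. The sketch below illustrates class-weighted cross-entropy on a toy two-class problem. It assumes normalisation by the sum of the applied weights, which may differ from what PaddleSeg's `CrossEntropyLoss` actually does; `weighted_cross_entropy` is a hypothetical helper.

```python
import math


def weighted_cross_entropy(probs, labels, class_weight):
    """Class-weighted cross-entropy over a batch of pixels (illustrative sketch).

    probs: per-pixel lists of class probabilities
    labels: per-pixel ground-truth class indices
    class_weight: one weight per class
    """
    weighted_nll = 0.0
    weight_sum = 0.0
    for pixel_probs, label in zip(probs, labels):
        w = class_weight[label]
        weighted_nll += -w * math.log(pixel_probs[label])
        weight_sum += w
    # Assumed normalisation: divide by the total weight that was applied.
    return weighted_nll / weight_sum


if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.25, 0.75]]
    labels = [0, 1]
    print(weighted_cross_entropy(probs, labels, class_weight=[1.0, 3.0]))
```

With uniform weights this reduces to the ordinary mean negative log-likelihood.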
# ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
## Reference
> Sachin Mehta, Mohammad Rastegari, Anat Caspi, Linda Shapiro, and Hannaneh Hajishirzi. "ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation." In Proceedings of the European Conference on Computer Vision, pp. 552-568. 2018.
## Performance
### Cityscapes
| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|ESPNet|-|1024x512|120000|61.82%|62.20%|62.89%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/espnetv1_cityscapes_1024x512_120k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/espnetv1_cityscapes_1024x512_120k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=472e91a0600420c99a0dc3a1e6f80f87)|