# SED
## Paper
`SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation`
- https://arxiv.org/abs/2311.15537
## Model Architecture
Overall architecture of SED: a hierarchical (learnable) image encoder and a frozen text encoder first generate a pixel-level image-text cost map. A gradual fusion decoder then combines the cost map with the hierarchical encoder's feature maps at different levels; the decoder stacks feature aggregation modules (FAM) and skip-layer fusion modules (SFM). In addition, a category early rejection (CER) scheme in the decoder accelerates inference without sacrificing performance.
<div align=center>
    <img src="./docs/main_arch.png"/>
</div>
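The pixel-level image-text cost map at the core of this architecture can be sketched as a cosine similarity between pixel embeddings and category text embeddings. This is a minimal illustration; the function and tensor names are assumptions, not the repository's actual code:

```python
import torch
import torch.nn.functional as F

def cost_map(pixel_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
    """Compute a pixel-level image-text cost map.

    pixel_feats: (B, C, H, W) features from the hierarchical image encoder.
    text_feats:  (N, C) embeddings of N category names from the frozen text encoder.
    Returns:     (B, N, H, W) cosine-similarity cost map.
    """
    pixel_feats = F.normalize(pixel_feats, dim=1)  # unit-norm along channels
    text_feats = F.normalize(text_feats, dim=1)    # unit-norm per category
    return torch.einsum("bchw,nc->bnhw", pixel_feats, text_feats)

# Toy usage: 2 images, 512-d features, an 8x8 spatial grid, 5 candidate categories.
cost = cost_map(torch.randn(2, 512, 8, 8), torch.randn(5, 512))
print(cost.shape)  # torch.Size([2, 5, 8, 8])
```

Each of the N channels in the output scores one category at every pixel, which is what the decoder refines.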

## Algorithm
The structure of the gradual fusion decoder (GFD) is shown below. GFD first performs feature aggregation at the spatial and class levels (a), then applies skip-layer fusion (b) to combine the feature maps of the previous decoder layer and the hierarchical encoder.
<div align=center>
    <img src="./docs/Structure of gradual fusion decoder.png"/>
</div>
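The skip-layer fusion idea can be sketched as follows: the coarser feature map from the previous decoder layer is upsampled and fused with a same-resolution hierarchical encoder feature. This is a simplified sketch; the module name, layer choices, and shapes are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipLayerFusion(nn.Module):
    """Illustrative skip-layer fusion: combine the previous decoder feature
    with a same-resolution encoder feature via upsample + concat + conv."""

    def __init__(self, dec_ch: int, enc_ch: int, out_ch: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(dec_ch + enc_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(1, out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, dec_feat: torch.Tensor, enc_feat: torch.Tensor) -> torch.Tensor:
        # Upsample the coarser decoder feature to the encoder feature's resolution.
        dec_feat = F.interpolate(dec_feat, size=enc_feat.shape[-2:],
                                 mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([dec_feat, enc_feat], dim=1))

sfm = SkipLayerFusion(dec_ch=256, enc_ch=128, out_ch=256)
out = sfm(torch.randn(1, 256, 16, 16), torch.randn(1, 128, 32, 32))
print(out.shape)  # torch.Size([1, 256, 32, 32])
```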

The structure of category early rejection (CER) is shown below. During training (a), an auxiliary convolution is attached after each decoder layer to predict a segmentation map supervised by the labels. During inference (b), a top-k strategy keeps the categories predicted to be present and rejects the absent ones in the next decoder layer.
<div align=center>
    <img src="./docs/Structure of category early rejection.png"/>
</div>
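The top-k rejection step at inference can be sketched as: score each category by its peak response over all pixels, then keep only the k most promising category channels for the next decoder layer. This is a simplified illustration under assumed shapes, not the repository's implementation:

```python
import torch

def early_reject(cost_map, k):
    """Keep only the top-k categories of a (B, N, H, W) cost map.

    Scores each category by its maximum response over all pixels, then
    slices the map down to the k most promising categories.
    Returns the reduced (B, k, H, W) map and the kept category indices.
    """
    scores = cost_map.flatten(2).max(dim=2).values  # (B, N): per-category peak score
    topk = scores.topk(k, dim=1).indices            # (B, k): surviving category ids
    idx = topk[:, :, None, None].expand(-1, -1, *cost_map.shape[-2:])
    return cost_map.gather(1, idx), topk

# Toy usage: reject all but 8 of 150 candidate categories (cf. TEST.TOPK 8 below).
reduced, kept = early_reject(torch.randn(1, 150, 32, 32), k=8)
print(reduced.shape)  # torch.Size([1, 8, 32, 32])
```

Later decoder layers then only compute over the surviving channels, which is where the inference speedup comes from.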


## Environment Setup
### Docker (Method 1)
Pull the Docker image from [SourceFind](https://www.sourcefind.cn/) as described below; the DCU deep-learning libraries required by this project can be downloaded from the [Guanghe](https://developer.sourcefind.cn/tool/) developer community.
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.8
docker run -it --shm-size=128G -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name sed <your IMAGE ID> bash # replace <your IMAGE ID> with the ID of the image pulled above (here: ffa1f63239fc)
cd /path/your_code_data/sed_pytorch
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

# Install detectron2-0.6
cd whl
pip install detectron2-0.6-cp38-cp38-linux_x86_64.whl
cd ..

cd open_clip/
make install
```
### Dockerfile (Method 2)
To build from the provided Dockerfile:
```
docker build --no-cache -t sed:latest .
docker run -it --shm-size=128G -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name sed_pytorch sed bash
cd /path/your_code_data/sed_pytorch
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

# Install detectron2-0.6
cd whl
pip install detectron2-0.6-cp38-cp38-linux_x86_64.whl
cd ..

cd open_clip/
make install
```
### Anaconda (Method 3)
Detailed steps for a local setup are given below.

The DCU-specific deep-learning libraries required by this project can be downloaded from the [Guanghe](https://developer.sourcefind.cn/tool/) developer community.
```
DTK driver: dtk24.04.2
python: python3.8
torch: 2.1.0
torchvision: 0.16.0
```
`Tips: the DTK driver, python, torch and other DCU-related tool versions above must match each other exactly.`

Install the remaining dependencies as follows:
```
cd /path/your_code_data/sed_pytorch

# Install detectron2-0.6
cd whl
pip install detectron2-0.6-cp38-cp38-linux_x86_64.whl
cd ..

pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
cd open_clip/
make install
```
## Datasets

SED has built-in support for several datasets, which are assumed to live under the directory given by the environment variable DETECTRON2_DATASETS. Detectron2 looks in that directory for datasets laid out as shown below.
Set the location of the built-in datasets with export DETECTRON2_DATASETS=/path/to/datasets. If unset, the default is ./datasets relative to the current working directory:

```
── datasets
│   ├── coco-stuff
│   ├── ADEChallengeData2016
│   ├── ADE20K_2021_17_01
│   ├── VOCdevkit
│   │  ├── VOC2010
│   │  └── VOC2012
```
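The dataset-root resolution described above can be sketched in a few lines: detectron2's built-in dataset registration reads the DETECTRON2_DATASETS environment variable and falls back to `./datasets` (a minimal illustration; the `coco_stuff_dir` variable is just an example):

```python
import os

# Resolve the dataset root the way detectron2's builtin-dataset registration does:
# the DETECTRON2_DATASETS env var, defaulting to ./datasets in the working directory.
root = os.environ.get("DETECTRON2_DATASETS", "./datasets")
coco_stuff_dir = os.path.join(root, "coco-stuff")  # example sub-dataset path
print(root)
```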
See [datasets/README.md](https://developer.sourcefind.cn/codes/modelzoo/sed_pytorch/-/blob/master/datasets/README.md) for dataset preparation details.

Quick dataset download links: [coco-stuff](https://www.modelscope.cn/datasets/OmniData/COCO-Stuff), [coco2017](https://cocodataset.org/#home), [ADEChallengeData2016](http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip), [ADE20K_2021_17_01](https://ade20k.csail.mit.edu/index.html), [VOCdevkit](https://link.zhihu.com/?target=http%3A//host.robots.ox.ac.uk/pascal/VOC/voc2012/)
## Training
First download the model files CLIP-convnext_base_w_320-laion_aesthetic-s13B-b82K and laionCLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup and place them under the weights directory:

[CLIP-convnext_base_w_320-laion_aesthetic-s13B-b82K](https://huggingface.co/laion/CLIP-convnext_base_w_320-laion_aesthetic-s13B-b82K)

[laionCLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup](https://huggingface.co/laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup)

Note: when using local model files, change the cached_file path in the download_pretrained_from_hf function of /open_clip/src/open_clip/pretrained.py to the local folder of the corresponding weights, e.g.:
```
# ConvNeXt-B
cached_file = './weights/CLIP-convnext_base_w_320-laion_aesthetic-s13B-b82K/open_clip_pytorch_model.bin'

```
### Single-Node Multi-GPU
```
sh run.sh [CONFIG] [NUM_GPUS] [OUTPUT_DIR] [OPTS]

# For ConvNeXt-B variant
sh run.sh configs/convnextB_768.yaml 4 output/
# For ConvNeXt-L variant
sh run.sh configs/convnextL_768.yaml 4 output/
```
## Inference
Download the model weights from the table below and place them in the weights folder:

|     Name     |    CLIP    | Download                                                                       |
|:------------:|:----------:|--------------------------------------------------------------------------------|
|   SED (B)    | ConvNeXt-B | [ckpt](https://drive.google.com/file/d/1qx6zGZgSPkF6TObregRz4uzQqSRHrgUw/view) |
| SED-fast (B) | ConvNeXt-B | [ckpt](https://drive.google.com/file/d/1qx6zGZgSPkF6TObregRz4uzQqSRHrgUw/view) |
|   SED (L)    | ConvNeXt-L | [ckpt](https://drive.google.com/file/d/1zAXE0QXy47n0cVn7j_2cSR85eqxdDGg8/view) |
| SED-fast (L) | ConvNeXt-L | [ckpt](https://drive.google.com/file/d/1zAXE0QXy47n0cVn7j_2cSR85eqxdDGg8/view) |

Note: the model config file, CLIP file, and weight file must correspond one-to-one.

### Single-GPU Inference

```
#sh eval.sh [CONFIG] [NUM_GPUS] [OUTPUT_DIR] [OPTS]

#sh eval.sh configs/convnextB_768.yaml 1 output/ MODEL.WEIGHTS path/to/weights.pth
sh eval.sh configs/convnextB_768.yaml 1 output/ MODEL.WEIGHTS ./weights/sed_model_base.pth
# Fast version.
#sh eval.sh configs/convnextB_768.yaml 1 output/ MODEL.WEIGHTS path/to/weights.pth  TEST.FAST_INFERENCE True  TEST.TOPK 8
sh eval.sh configs/convnextB_768.yaml 1 output/ MODEL.WEIGHTS ./weights/sed_model_base.pth  TEST.FAST_INFERENCE True  TEST.TOPK 8
```

Demo_for_vis:
```
# python demo/demo_for_vis.py  --input image_path --output result_path --opts MODEL.WEIGHTS path/to/your/pretrained_model.pth
# image_path: A list of space separated input images; or a single glob pattern such as 'directory/*.jpg'
python demo/demo_for_vis.py  --input ./docs/ADE_val_00000001_Original.jpg --output demo_results --opts MODEL.WEIGHTS ./weights/sed_model_base.pth
```


### Multi-GPU Inference

```
#sh eval.sh [CONFIG] [NUM_GPUS] [OUTPUT_DIR] [OPTS]

#sh eval.sh configs/convnextB_768.yaml 4 output/ MODEL.WEIGHTS path/to/weights.pth
sh eval.sh configs/convnextB_768.yaml 4 output/ MODEL.WEIGHTS ./weights/sed_model_base.pth
# Fast version.
#sh eval.sh configs/convnextB_768.yaml 4 output/ MODEL.WEIGHTS path/to/weights.pth  TEST.FAST_INFERENCE True  TEST.TOPK 8
sh eval.sh configs/convnextB_768.yaml 4 output/ MODEL.WEIGHTS ./weights/sed_model_base.pth  TEST.FAST_INFERENCE True  TEST.TOPK 8
```

## Results
Original image:
<div align=center>
    <img src="./docs/ADE_val_00000001_Original.jpg"/>
</div>

The visualization result from Demo_for_vis:

<div align=center>
    <img src="./docs/ADE_val_00000001.jpg"/>
</div>


### Accuracy
Inference was run on four DCU K100 cards; the numbers below are mIoU (%):

|     Name     |           CLIP            | A-847 | PC-459 | A-150 | PC-59 | PAS-20 |
|:------------:|:-------------------------:|------|------|-------|-------|--------|
|   SED (B)    |        ConvNeXt-B         | 11.2 | 18.6 | 31.7  | 57.7  | 94.4   |
| SED-fast (B) |        ConvNeXt-B         | 11.3 | 18.6 | 31.6  | 57.3  | 94.4   |
|   SED (L)    |        ConvNeXt-L         | 13.7 | 22.1 | 35.3  | 60.9  | 96.1   |
| SED-fast (L) |        ConvNeXt-L         | 13.9 | 22.6 | 35.2  | 60.6  | 96.1   |
## Application Scenarios
### Algorithm Category
`Image Segmentation`
### Key Application Industries
`Research, manufacturing, healthcare, smart home, education`
## Source Repository & Issue Feedback
- https://developer.sourcefind.cn/codes/modelzoo/sed_pytorch
## References
- https://github.com/xb534/SED