"vscode:/vscode.git/clone" did not exist on "ba74a8be7a620da0558f27802a19736627e9e64a"
README.md 6.27 KB
Newer Older
yuhai's avatar
yuhai committed
1
2
3
# SAM
## Paper
Segment Anything
- https://arxiv.org/abs/2304.02643
## Model Architecture

![](./assets/model_diagram.png)

As shown in the figure, the network consists of three main parts: an image encoder, a prompt encoder, and a lightweight mask decoder.
### Image encoder
A ViT-H/16 network processes the high-resolution input; its output is an embedding downscaled 16× relative to the input image (64×64). The channel dimension is then reduced to 256 by a 1×1 followed by a 3×3 convolutional layer.
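
To make the downscaling concrete, here is a minimal sketch (plain Python, not the library's code) of the patch-embedding arithmetic; the 1024×1024 input resolution is the standard SAM setting.

```python
# Shape arithmetic for the image encoder (illustrative, not SAM's code):
# a 1024x1024 input split into 16x16 patches yields a 64x64 embedding grid,
# whose channel dimension is then reduced to 256 by 1x1 and 3x3 convolutions.
def embedding_grid_side(input_size: int, patch_size: int) -> int:
    """Side length of the ViT patch-embedding grid."""
    return input_size // patch_size

side = embedding_grid_side(1024, 16)
print(side)  # 64, i.e. the input is downscaled 16x
```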

### Prompt encoder 
Prompts are mapped to 256-dimensional embeddings that combine positional encodings with learned foreground/background embeddings. A box is represented by an embedding pair for its top-left and bottom-right corners. Free-form text is encoded with CLIP's text encoder.
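
As an illustration of the positional part of this mapping, the sketch below builds a 256-dimensional sinusoidal embedding for a point prompt in plain Python. SAM itself uses learned random-frequency positional encodings plus learned foreground/background embeddings, so the fixed frequencies and the constant offset here are stand-ins.

```python
import math

# A minimal sketch (NOT SAM's actual code) of mapping a 2D point prompt to a
# 256-dimensional embedding: sinusoidal positional encoding of (x, y), plus a
# constant standing in for the learned foreground/background embedding.
DIM = 256

def point_embedding(x: float, y: float, is_foreground: bool) -> list:
    emb = []
    for i in range(DIM // 4):  # 4 values per frequency -> 256 total
        freq = 1.0 / (10000 ** (4 * i / DIM))
        emb += [math.sin(x * freq), math.cos(x * freq),
                math.sin(y * freq), math.cos(y * freq)]
    offset = 1.0 if is_foreground else -1.0  # learned in SAM; constant here
    return [v + offset for v in emb]

vec = point_embedding(0.25, 0.75, True)
print(len(vec))  # 256
```

A box prompt would simply produce two such vectors, one per corner.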

### Lightweight mask decoder 
![](./assets/mask_decoder.PNG)

The image embedding is upscaled 4× by two transposed convolution layers, and an MLP produces the final mask. The transformer uses a 256 embedding dimension; the cross-attention layers over the 64×64 image embedding use a 128 channel dimension.
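
The 4× upscaling follows from the standard transposed-convolution output-size formula; the sketch below shows the arithmetic (the kernel and stride values are assumptions consistent with a 2× upscale per layer, not values read from the library):

```python
# Output-size arithmetic for the mask decoder's upscaling (illustrative):
# two stride-2 transposed convolutions take the 64x64 image embedding to
# 256x256, i.e. a 4x upscale overall.
def transposed_conv_out(size: int, kernel: int = 2, stride: int = 2,
                        padding: int = 0) -> int:
    return (size - 1) * stride - 2 * padding + kernel

side = 64
for _ in range(2):  # two transposed conv layers
    side = transposed_conv_out(side)
print(side)  # 256
```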

## Algorithm Overview
![](./assets/algorithm.png)

SAM separates a heavyweight image encoder from a fast prompt encoder / mask decoder, so the same image embedding can be reused with different prompts (amortizing its cost). Given an image embedding, the prompt encoder and mask decoder are fast enough to predict masks in a web browser. To make SAM ambiguity-aware, it is designed to predict multiple masks for a single prompt, which lets it handle ambiguous prompts naturally.
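
The amortization can be sketched with stub functions (none of these names are the real API): the encoder runs once per image, and each new prompt only pays for the lightweight decode.

```python
# Illustrative sketch of SAM's amortized design, with stubs in place of the
# real networks: the expensive image encoder runs once, and the lightweight
# prompt encoder + mask decoder reuse the cached embedding for every prompt.
calls = {"encoder": 0}

def image_encoder(image):
    calls["encoder"] += 1
    return f"embedding({image})"  # stand-in for the heavy ViT forward pass

def decode(embedding, prompt):
    return f"mask({embedding}, {prompt})"  # stand-in for encoder+decoder

embedding = image_encoder("photo.jpg")  # run once; cost amortized
masks = [decode(embedding, p) for p in ["point", "box", "text"]]
print(calls["encoder"], len(masks))  # 1 3
```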

## Environment Setup
### Docker (Option 1)
Pull the image from [SourceFind (光源)](https://www.sourcefind.cn/#/service-list):
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
docker run -it --network=host --name=SAM_pytorch -v /opt/hyhal:/opt/hyhal:ro --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G --group-add video --cap-add=SYS_PTRACE image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10 /bin/bash
```
Install the remaining dependencies:
```
pip install opencv-python pycocotools matplotlib onnxruntime onnx 
```
### Dockerfile (Option 2)
```
cd /path/to/dockerfile
docker build --no-cache -t sam_pytorch:latest .
docker run -it --network=host --name=SAM_pytorch -v /opt/hyhal:/opt/hyhal:ro --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G --group-add video --cap-add=SYS_PTRACE sam_pytorch:latest /bin/bash
```

### Anaconda (Option 3)
1. The DCU-specific deep learning libraries this project requires can be downloaded from the 光合 developer community: https://developer.hpccube.com/tool/
```
DTK stack: dtk24.04.1
python: 3.10
torch: 2.1.0
torchvision: 0.16.0
```

Tip: the DTK stack, Python, torch, and the other DCU-related tools above must use exactly these matching versions.
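
Since DCU builds often carry a local version suffix, a hypothetical helper for checking the pin might compare only the base version (the function names and the suffix format shown are assumptions, not part of any DCU tooling):

```python
# Hypothetical version-pin check: DCU wheels may report a local suffix such
# as "2.1.0+das.opt1.dtk2404", so compare only the base version against the
# pinned one from the table above.
def base_version(version: str) -> str:
    return version.split("+")[0]

def matches(installed: str, pinned: str) -> bool:
    return base_version(installed) == pinned

print(matches("2.1.0+das.opt1.dtk2404", "2.1.0"))  # True
print(matches("2.2.0", "2.1.0"))                   # False
```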

2. Install the remaining dependencies
Install directly with pip:
```
pip install opencv-python pycocotools matplotlib onnxruntime onnx
pip install git+https://github.com/facebookresearch/segment-anything.git
```
Or clone and install locally:
```
pip install opencv-python pycocotools matplotlib onnxruntime onnx
git clone git@github.com:facebookresearch/segment-anything.git
cd segment-anything
pip install -e .
```
## Dataset

The training portion of this test uses the COCO2017 dataset.
- Dataset download center:
  - [SCNet AIDatasets](http://113.200.138.88:18080/aidatasets)
- Fast-channel download:
  - [coco2017 fast download](http://113.200.138.88:18080/aidatasets/coco2017)
- Official download links:
  - [Training data](http://images.cocodataset.org/zips/train2017.zip)
  - [Validation data](http://images.cocodataset.org/zips/val2017.zip)
  - [Test data](http://images.cocodataset.org/zips/test2017.zip)
  - [Labels](https://github.com/ultralytics/yolov5/releases/download/v1.0/coco2017labels.zip)

The dataset directory structure is as follows:

```
├── images 
│   ├── train2017
│   ├── val2017
│   ├── test2017
├── labels
│   ├── train2017
│   ├── val2017
├── annotations
│   ├── instances_val2017.json
├── LICENSE
├── README.txt 
├── test-dev2017.txt
├── train2017.txt
├── val2017.txt

```

Inference dataset: SA-1B Dataset
The full dataset can be downloaded [here](https://ai.facebook.com/datasets/segment-anything-downloads/).
The mini dataset used for trial training in this project is structured as follows:
```
── notebooks
   ├── images
   │   ├── dog.jpg
   │   ├── groceries.jpg
   │   └── trunk.jpg
```
## Training
### Fine-tuning
The official repository provides pretrained weights and scripts for mask generation, but no training script; fine-tuning is possible with a third-party example script.
If you are interested, see [the third-party training script](https://github.com/luca-medeiros/lightning-sam/blob/main/lightning_sam/train.py).

### Single-Node Multi-GPU
Download the pretrained model [here](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth).
```
git clone https://github.com/luca-medeiros/lightning-sam.git
cd lightning-sam
# Edit line 6 of pyproject.toml to: documentation = "https://this/needs/to/be/something/otherwise/poetry/complains"
pip install .
pip install tensorboardX==2.6.2.2
cd lightning_sam
# Edit config.py for your setup: number of GPUs, dataset path, checkpoint path
python train.py
```
Note: `pip install .` replaces the DCU builds of pytorch, torchvision, and triton; afterwards, re-download the matching DCU packages from the [developer community](https://cancon.hpccube.com:65024/4/main/pytorch).
## Inference
```
python scripts/amg.py --checkpoint <path/to/checkpoint> --model-type <model_type> --input <image_or_folder> --output <path/to/output>
```
Note: the pretrained checkpoint can be downloaded [here](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth).

In the command:

- `<path/to/checkpoint>`: path to the chosen weights
- `<model_type>`: model variant, one of `vit_h`, `vit_l`, or `vit_b`
- `<image_or_folder>`: path to an input image or folder
- `<path/to/output>`: directory where segmentation results are saved
## Results

![](./outputs/000000524456/0.png)

Some of the generated masks can be viewed in the sibling `outputs` directory; an example is shown above. An official demo can be tried [here](https://segment-anything.com/demo).
## Accuracy


## Application Scenarios
### Algorithm Category
Image segmentation
### Key Application Industries
Energy, healthcare, cybersecurity
## Source Repository and Issue Reporting
https://developer.hpccube.com/codes/modelzoo/sam_pytorch
## References
https://github.com/facebookresearch/segment-anything

https://github.com/luca-medeiros/lightning-sam/blob/main/lightning_sam (third-party SAM fine-tuning)