# SAM
## Paper
Segment Anything
- https://arxiv.org/abs/2304.02643
## Model Architecture

![](./assets/model_diagram.png)

As shown in the figure, the network has three main parts: an image encoder, a prompt encoder, and a lightweight mask decoder.
### Image encoder
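A ViT-H/16 network processes the high-resolution input and outputs an image embedding downscaled 16× relative to the input (64×64 for a 1024×1024 input). The channel dimension is then reduced to 256 by a 1×1 convolution followed by a 3×3 convolution.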
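
The shape arithmetic above can be traced in plain Python. This is a minimal sketch, not the model code. It assumes a 1024×1024 input (the resolution used in the SAM paper), a patch size of 16, and a ViT-H width of 1280 channels. Treating both neck convolutions as bias-free is also an assumption, used only to produce an illustrative weight count.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a 2D convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_image_encoder(input_size: int = 1024, patch_size: int = 16,
                        vit_dim: int = 1280, out_dim: int = 256) -> dict:
    # Patch embedding: kernel == stride == patch_size, hence a 16x downscale.
    grid = conv2d_out(input_size, kernel=patch_size, stride=patch_size)
    # Neck, step 1: a 1x1 conv maps vit_dim -> out_dim channels; spatial size is unchanged.
    grid = conv2d_out(grid, kernel=1)
    # Neck, step 2: a 3x3 conv with padding 1 keeps out_dim channels and the spatial size.
    grid = conv2d_out(grid, kernel=3, padding=1)
    # Weight counts, assuming both convolutions have no bias term.
    neck_params = vit_dim * out_dim * 1 * 1 + out_dim * out_dim * 3 * 3
    return {"embedding_shape": (out_dim, grid, grid), "neck_params": neck_params}


if __name__ == "__main__":
    info = trace_image_encoder()
    print(info["embedding_shape"])  # (256, 64, 64)
    print(info["neck_params"])      # 917504
```

Changing `input_size` shows how the embedding grid scales. For example, a 512×512 input would give a 32×32 grid under the same assumptions.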

### Prompt encoder 
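Prompts are mapped to 256-dimensional embeddings. A point is represented by its positional encoding plus a learned embedding that marks it as foreground or background. A box is represented by an embedding pair: the embeddings of its top-left and bottom-right corners. Free-form text is encoded with the text encoder from CLIP.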
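
A toy version of the point and box encoding can be sketched with the standard library. Everything here is hypothetical. `ToyPromptEncoder` is not the official class, and its "learned" embeddings are random vectors. The positional encoding uses random Fourier features: coordinates are normalized to [-1, 1], projected by a Gaussian matrix, and passed through sin and cos. That is an assumed choice for illustration. Text prompts are not covered.

```python
import math
import random

EMBED_DIM = 256  # every prompt token is a 256-d vector


class ToyPromptEncoder:
    """Hypothetical stand-in for a sparse prompt encoder (random, untrained weights)."""

    def __init__(self, image_size: int = 1024, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.image_size = image_size
        # Gaussian projection for Fourier positional features: 2 coords -> EMBED_DIM // 2.
        self.freqs = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM // 2)] for _ in range(2)]
        # Random stand-ins for the learned per-type embeddings.
        self.type_embed = {
            name: [rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)]
            for name in ("background", "foreground", "box_top_left", "box_bottom_right")
        }

    def positional_encoding(self, x: float, y: float) -> list:
        # Normalize pixel coordinates to [-1, 1], project, then take sin and cos.
        u = 2.0 * (x / self.image_size) - 1.0
        v = 2.0 * (y / self.image_size) - 1.0
        proj = [2.0 * math.pi * (u * fx + v * fy)
                for fx, fy in zip(self.freqs[0], self.freqs[1])]
        return [math.sin(p) for p in proj] + [math.cos(p) for p in proj]

    def _token(self, x: float, y: float, kind: str) -> list:
        # Token = positional encoding + type embedding, summed element-wise.
        pe = self.positional_encoding(x, y)
        return [a + b for a, b in zip(pe, self.type_embed[kind])]

    def encode_point(self, x: float, y: float, is_foreground: bool) -> list:
        return self._token(x, y, "foreground" if is_foreground else "background")

    def encode_box(self, x0: float, y0: float, x1: float, y1: float) -> list:
        # A box becomes an embedding pair: its top-left and bottom-right corners.
        return [self._token(x0, y0, "box_top_left"),
                self._token(x1, y1, "box_bottom_right")]


if __name__ == "__main__":
    enc = ToyPromptEncoder()
    point = enc.encode_point(500, 375, is_foreground=True)
    box = enc.encode_box(100, 100, 400, 300)
    print(len(point), len(box), len(box[0]))  # 256 2 256
```

Because the type embedding is added rather than concatenated, points and box corners share one 256-d token space. This lets the decoder attend to all of them uniformly.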

### Lightweight mask decoder 
![](./assets/mask_decoder.PNG)

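The transformer uses an embedding dimension of 256. After the transformer, the image embedding is upscaled 4× by two transposed convolution layers. An MLP then maps the output token to a dynamic linear classifier that predicts the mask at each image location. In the cross-attention layers that attend to the 64×64 image embedding, the channel dimension of the queries, keys, and values is reduced to 128.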
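
The upscaling and the dynamic linear classifier can be illustrated with a small shape trace and a toy dot product. This is a sketch, not the decoder implementation. The channel schedule 256 → 64 → 32 (`dim // 4`, then `dim // 8`) and the kernel-2, stride-2 transposed convolutions are assumptions for illustration. `mask_logits` is a hypothetical helper that dots one MLP output vector with every pixel's feature vector.

```python
def conv_transpose2d_out(size: int, kernel: int = 2, stride: int = 2, padding: int = 0) -> int:
    """Output length of a 2D transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel


def trace_upscaling(grid: int = 64, dim: int = 256) -> list:
    # Two transposed convs, each doubling resolution (4x overall); channels are
    # assumed to shrink to dim // 4 and then dim // 8.
    shapes = [(dim, grid, grid)]
    for channels in (dim // 4, dim // 8):
        grid = conv_transpose2d_out(grid)
        shapes.append((channels, grid, grid))
    return shapes


def cross_attention_dim(embed_dim: int = 256, downsample_rate: int = 2) -> int:
    # Queries, keys and values are projected to embed_dim // downsample_rate channels.
    return embed_dim // downsample_rate


def mask_logits(upscaled, weights):
    """Dynamic linear classifier: one logit per pixel = <pixel feature, MLP output>.

    upscaled: H x W grid of C-dim feature vectors; weights: C-dim vector from the MLP.
    """
    return [[sum(f * w for f, w in zip(pixel, weights)) for pixel in row] for row in upscaled]


if __name__ == "__main__":
    print(trace_upscaling())      # [(256, 64, 64), (64, 128, 128), (32, 256, 256)]
    print(cross_attention_dim())  # 128
    # Toy 2x2 "upscaled embedding" with 2 channels per pixel.
    features = [[[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 1.0], [-1.0, 0.0]]]
    logits = mask_logits(features, weights=[2.0, -1.0])
    print(logits)                                    # [[2.0, -1.0], [1.0, -2.0]]
    print([[z > 0 for z in row] for row in logits])  # [[True, False], [True, False]]
```

Because the classifier weights come from the output token, every prompt yields its own per-pixel classifier without retraining. The resulting 256×256 logits are resized to the input resolution afterwards.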

## Algorithm
![](./assets/algorithm.png)

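SAM is split into an image encoder and a fast prompt encoder / mask decoder. The same image embedding can therefore be reused with different prompts, amortizing its cost. Given an image embedding, the prompt encoder and mask decoder can predict a mask in a web browser. SAM is also ambiguity-aware: it predicts multiple masks for a single prompt, which lets it handle ambiguous prompts naturally.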
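
The amortization and multi-mask ideas can be sketched as a small caching wrapper. `CachedSegmenter` and the toy encoder and decoder are hypothetical stand-ins, not SAM code. The heavy encoder runs once per image. The light decoder runs once per prompt and returns several candidate masks with scores. For reference, the official `segment_anything` package exposes a similar split through `SamPredictor.set_image` (once per image) and `SamPredictor.predict` (per prompt, with `multimask_output=True` returning several masks and their scores).

```python
from typing import Any, Callable, Dict, List, Tuple

Candidate = Tuple[Any, float]  # (mask, predicted quality score)


class CachedSegmenter:
    """Hypothetical wrapper: heavy encoder once per image, light decoder once per prompt."""

    def __init__(self, encode_image: Callable[[Any], Any],
                 decode: Callable[[Any, Any], List[Candidate]]) -> None:
        self.encode_image = encode_image
        self.decode = decode
        self._cache: Dict[str, Any] = {}
        self.encoder_calls = 0

    def _embedding(self, image_id: str, image: Any) -> Any:
        # Compute the image embedding only the first time this image is seen.
        if image_id not in self._cache:
            self._cache[image_id] = self.encode_image(image)
            self.encoder_calls += 1
        return self._cache[image_id]

    def predict(self, image_id: str, image: Any, prompt: Any) -> List[Candidate]:
        # Return every candidate mask for the prompt, best score first.
        embedding = self._embedding(image_id, image)
        return sorted(self.decode(embedding, prompt), key=lambda c: c[1], reverse=True)


if __name__ == "__main__":
    # Toy stand-ins: the "embedding" is just a sum, and the decoder returns three
    # nested interpretations (part, object, whole) of an ambiguous point prompt.
    def toy_encoder(pixels):
        return sum(pixels)

    def toy_decoder(embedding, prompt):
        return [("part", 0.62), ("object", 0.91), ("whole", 0.74)]

    seg = CachedSegmenter(toy_encoder, toy_decoder)
    image = [1, 2, 3]
    for prompt in [(10, 20), (30, 40), (50, 60)]:
        best_mask, best_score = seg.predict("img-0", image, prompt)[0]
    print(best_mask, best_score, seg.encoder_calls)  # object 0.91 1
```

Returning all candidates rather than only the top one keeps the ambiguity visible to the caller. The caller can pick by score or show every option to the user.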

## Environment Setup
### Docker (Option 1)
Pull the image from [光源 (SourceFind)](https://www.sourcefind.cn/#/service-list):
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-22.10-py39-latest
docker run -it --network=host --name=SAM_pytorch  --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G --group-add video --cap-add=SYS_PTRACE image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-22.10-py39-latest /bin/bash
```
Install the other dependencies:
```
pip install opencv-python pycocotools matplotlib onnxruntime onnx
```
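
Inside the container, you can verify the extra dependencies with a short stdlib-only check. This is a sketch. The mapping from pip package names to import names is the usual one (for example, `opencv-python` installs the module `cv2`). `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# pip package name -> module name it installs (cv2 comes from opencv-python).
REQUIRED = {
    "opencv-python": "cv2",
    "pycocotools": "pycocotools",
    "matplotlib": "matplotlib",
    "onnxruntime": "onnxruntime",
    "onnx": "onnx",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be found by the current interpreter."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("all dependencies found" if not missing else "missing: " + ", ".join(missing))
```

The same check applies to the other installation options. For example, adding `"segment-anything": "segment_anything"` to the mapping confirms the package itself is importable.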
### Dockerfile (Option 2)
```
docker build --no-cache -t image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-22.10-py39-latest .
docker run -it --network=host --name=SAM_pytorch  --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G --group-add video --cap-add=SYS_PTRACE <image ID obtained above> /bin/bash
```
Install the other dependencies:
```
pip install opencv-python pycocotools matplotlib onnxruntime onnx
```
### Anaconda (Option 3)
Install directly with pip:
```
pip install git+https://github.com/facebookresearch/segment-anything.git
```
Or clone the repository and install it locally:
```
git clone git@github.com:facebookresearch/segment-anything.git
cd segment-anything
pip install -e .
```
## Dataset
Dataset name: SA-1B Dataset

The full dataset can be downloaded [here](https://ai.facebook.com/datasets/segment-anything-downloads/).
The mini dataset used in this project for trial training is structured as follows:
```
notebooks
└── images
    ├── dog.jpg
    ├── groceries.jpg
    └── truck.jpg
```
## Training
### Fine-tuning
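The official repository provides pretrained weights and a script for generating masks, but no training script. Fine-tuning can instead be done with an example script provided by a third party.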
If you are interested, see [here](https://github.com/luca-medeiros/lightning-sam/blob/main/lightning_sam/train.py).
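
For orientation when fine-tuning, the SAM paper supervises the mask output with a linear combination of focal loss and dice loss at a 20:1 ratio. It trains the IoU prediction head with a mean-squared error against the actual IoU. The pure-Python sketch below illustrates that objective on flat lists of per-pixel logits. It is not the third-party script's code. The focal-loss `alpha=0.25, gamma=2.0` values and the dice smoothing `eps=1.0` are common defaults assumed here.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Mean binary focal loss over pixels (flat lists of logits and 0/1 targets)."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        p_t = p if t == 1 else 1.0 - p
        alpha_t = alpha if t == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))
    return total / len(logits)


def dice_loss(logits, targets, eps=1.0):
    """Soft dice loss on sigmoid probabilities."""
    probs = [sigmoid(z) for z in logits]
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(targets) + eps)


def iou(logits, targets):
    """IoU between the thresholded prediction (logit > 0) and the 0/1 target."""
    pred = [1 if z > 0 else 0 for z in logits]
    inter = sum(p & t for p, t in zip(pred, targets))
    union = sum(p | t for p, t in zip(pred, targets))
    return inter / union if union else 1.0


def sam_style_loss(logits, targets, predicted_iou, focal_w=20.0, dice_w=1.0):
    # Mask loss: focal and dice combined 20:1; IoU head: squared error to the actual IoU.
    mask_loss = focal_w * focal_loss(logits, targets) + dice_w * dice_loss(logits, targets)
    iou_loss = (predicted_iou - iou(logits, targets)) ** 2
    return mask_loss + iou_loss


if __name__ == "__main__":
    targets = [1, 0, 1, 0]
    good = sam_style_loss([8.0, -8.0, 8.0, -8.0], targets, predicted_iou=1.0)
    bad = sam_style_loss([-8.0, 8.0, -8.0, 8.0], targets, predicted_iou=1.0)
    print(good < bad)  # True
```

In a real training loop these terms would be computed on tensors over whole batches. The structure of the objective stays the same.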

### Single Node, Single Card
```
git clone https://github.com/luca-medeiros/lightning-sam.git
cd lightning-sam
pip install .
cd lightning_sam
python train.py
```
Note: `pip install .` may replace the DCU build of PyTorch. If this happens, download the matching DCU package from the [developer community](https://cancon.hpccube.com:65024/4/main/pytorch).
## Inference
```
python scripts/amg.py --checkpoint <path/to/checkpoint> --model-type <model_type> --input <image_or_folder> --output <path/to/output>
```
Note: the pretrained ViT-H checkpoint can be downloaded [here](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth).

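In the command:

- `<path/to/checkpoint>`: path to the selected weights
- `<model_type>`: model variant, one of `vit_h`, `vit_l`, `vit_b` (it must match the checkpoint)
- `<image_or_folder>`: path to an input image or a folder of images
- `<path/to/output>`: directory where the segmentation results are saved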
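
To avoid mismatched `--model-type` and checkpoint arguments, the command can be assembled from Python. This is a sketch with hypothetical helper names (`amg_command`, `checkpoint_url`). It uses only the flags shown in the command above. The checkpoint file names are those published in the official segment-anything release, and the ViT-H one matches the download link in the note above.

```python
import shlex

# Official checkpoint file names per model type (segment-anything release).
CHECKPOINTS = {
    "vit_h": "sam_vit_h_4b8939.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_b": "sam_vit_b_01ec64.pth",
}
BASE_URL = "https://dl.fbaipublicfiles.com/segment_anything/"


def checkpoint_url(model_type: str) -> str:
    """Download URL of the official checkpoint for a model type."""
    return BASE_URL + CHECKPOINTS[model_type]


def amg_command(checkpoint: str, model_type: str, input_path: str, output_dir: str) -> list:
    """Build the argv for scripts/amg.py, rejecting unknown model types early."""
    if model_type not in CHECKPOINTS:
        raise ValueError(f"model_type must be one of {sorted(CHECKPOINTS)}, got {model_type!r}")
    return [
        "python", "scripts/amg.py",
        "--checkpoint", checkpoint,
        "--model-type", model_type,
        "--input", input_path,
        "--output", output_dir,
    ]


if __name__ == "__main__":
    cmd = amg_command("sam_vit_h_4b8939.pth", "vit_h", "notebooks/images", "outputs")
    print(shlex.join(cmd))
    print(checkpoint_url("vit_h"))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with paths that contain spaces.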
## Results

![](./outputs/000000524456/0.png)

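Some of the generated masks can be viewed in the `outputs` directory at the same level as this README; an example is shown above. The official demo can be tried [here](https://segment-anything.com/demo).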
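
To browse the results programmatically, a small reader can rank the saved masks for one image. This is a sketch under stated assumptions. It assumes the default PNG output layout: one folder per image containing `<id>.png` masks plus a `metadata.csv` with an `id` column and a numeric score column such as `predicted_iou`. Check the header of your own `metadata.csv` before relying on the column names. `top_masks` is a hypothetical helper.

```python
import csv
from pathlib import Path


def top_masks(result_dir, k=5, key="predicted_iou"):
    """Return (mask_png_path, score) for the k highest-scoring masks of one image.

    Assumes result_dir holds <id>.png files plus a metadata.csv with an `id`
    column and a numeric score column named by `key`.
    """
    result_dir = Path(result_dir)
    with open(result_dir / "metadata.csv", newline="") as f:
        rows = list(csv.DictReader(f))
    # Sort rows by the chosen score column, highest first.
    rows.sort(key=lambda r: float(r[key]), reverse=True)
    return [(result_dir / f"{r['id']}.png", float(r[key])) for r in rows[:k]]
```

For example, `top_masks("outputs/000000524456", k=3)` would list the three masks with the highest predicted IoU for the image shown above. Passing a different `key` (for instance an area or stability column, if present) changes the ranking.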
## Accuracy

## Application Scenarios
### Algorithm Category
Image segmentation
### Key Application Industries
Energy, healthcare, cybersecurity
## Source Repository and Issue Feedback
https://developer.hpccube.com/codes/modelzoo/sam_pytorch
## References
https://github.com/facebookresearch/segment-anything

https://github.com/luca-medeiros/lightning-sam/blob/main/lightning_sam (third-party SAM fine-tuning)