# DeepSolo
## Paper

`DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting`
- https://arxiv.org/abs/2211.10772

`DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting`
- https://arxiv.org/abs/2305.19957

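## Model Architecture
A simple DETR-like baseline that lets a single decoder with explicit points perform detection and recognition simultaneously (panels (c) and (f) of the figure below).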

<div align=center>
    <img src="./doc/image.png"/>
</div>

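## Algorithm
In DeepSolo, after receiving the image features, the encoder generates Bezier center-curve proposals, each represented by four Bezier control points, together with their scores, and the top-K scoring proposals are selected. For each selected curve proposal, N points are sampled uniformly on the curve; the coordinates of these points are encoded as positional queries and added to the content queries to form composite queries. The composite queries are then fed into a decoder with deformable cross-attention to gather useful text features. After the decoder, several simple parallel prediction heads (linear layers or MLPs) decode the queries into the center line, boundary, script, and confidence of the text, solving detection and recognition at the same time.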

<div align=center>
    <img src="./doc/DeepSolo.jpg"/>
</div>
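
The curve sampling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: it assumes the four control points define a cubic Bezier curve and that "uniform" sampling means evenly spaced values of the curve parameter t. The function name `sample_bezier_points` is hypothetical, and the positional encoding of the sampled coordinates is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sample_bezier_points(control_points: Sequence[Point], num_points: int) -> List[Point]:
    """Evaluate a cubic Bezier curve at num_points evenly spaced values of t in [0, 1]."""
    if len(control_points) != 4:
        raise ValueError("a cubic Bezier curve needs exactly four control points")
    if num_points < 2:
        raise ValueError("num_points must be at least 2")

    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = control_points
    points = []
    for i in range(num_points):
        t = i / (num_points - 1)
        # Cubic Bernstein basis weights; they sum to 1 for every t.
        b0 = (1 - t) ** 3
        b1 = 3 * t * (1 - t) ** 2
        b2 = 3 * t ** 2 * (1 - t)
        b3 = t ** 3
        x = b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3
        y = b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3
        points.append((x, y))
    return points


# Example: a gently curved center line sampled at N = 5 points.
center_line = sample_bezier_points([(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (3.0, 0.0)], 5)
```

The first and last sampled points coincide with the first and last control points, so the samples always span the full center curve.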

## Environment Setup
Training depends on the **Detectron2** library. Building Detectron2 requires `Python ≥ 3.7`, `PyTorch ≥ 1.8` with a `torchvision` version that matches the `PyTorch` installation, and `gcc & g++ ≥ 5.4`. Installing `Ninja` is recommended for a faster build.

Tips: If the `detectron2` installation fails, try installing it as follows:

```bash
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
```
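
Before building Detectron2, it can help to confirm that the interpreter and PyTorch meet the minimum versions listed above. The following is a rough sketch with hypothetical helper names (`parse_version`, `meets_minimum`, `check_environment`); it only covers Python and PyTorch and does not check torchvision compatibility, gcc/g++, or Ninja.

```python
import re
import sys
from typing import Dict, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, e.g. '2.1.0+das1.1' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: Tuple[int, ...]) -> bool:
    """Return True if the version string is at least the given minimum."""
    return parse_version(version) >= minimum


def check_environment() -> Dict[str, bool]:
    """Check the Python and PyTorch minimums stated for building Detectron2."""
    results = {"python>=3.7": sys.version_info[:2] >= (3, 7)}
    try:
        import torch  # only needed for the version lookup
        results["torch>=1.8"] = meets_minimum(torch.__version__, (1, 8))
    except ImportError:
        results["torch>=1.8"] = False
    return results


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'ok' if ok else 'NOT satisfied'}")
```

Local build suffixes such as `+das1.1` are ignored by the parser, so vendor-specific PyTorch builds compare on their base version only.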

### Docker (Option 1)

Adjust the `-v` paths, `docker_name`, and `imageID` to match your environment.

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.8

docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /home/deepsolo_pytorch
pip install --upgrade setuptools wheel
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
bash make.sh
```

### Dockerfile (Option 2)

Adjust the `-v` paths, `docker_name`, and `imageID` to match your environment.

```
cd ./docker

docker build --no-cache -t deepsolo:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /your_code_path/deepsolo_pytorch
pip install --upgrade setuptools wheel
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
bash make.sh
```

### Anaconda (Option 3)

1. The DCU-specific deep learning libraries required by this project can be downloaded and installed from the 光合 developer community: https://developer.sourcefind.cn/tool/

```
DTK software stack: dtk24.04.1
python: python3.8
torch: 2.1.0
torchvision: 0.16.0+das1.1.git7d45932.abi1.dtk2404.torch2.1
```

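Tips: The versions of the DTK software stack, Python, torch, and the other DCU-related tools listed above must match each other exactly.

2. Install the remaining, non-DCU-specific libraries with the steps below: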

```
pip install --upgrade setuptools wheel
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
bash make.sh
```

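## Datasets

Store all datasets under `deepsolo_pytorch/datasets`. The datasets are large, so download only those your training run needs. The required datasets are listed in the **DATASETS** field of each yaml file under the `configs` folder (the same applies to test data).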

### Training Datasets
`[SynthText150K (CurvedSynText150K)]` [images](https://github.com/aim-uofa/AdelaiDet/tree/master/datasets) | [annotations(Part1)](https://1drv.ms/u/s!ApEsJ9RIZdBQgQTfQC578sYbkPik?e=2Yz06g) | [annotations(Part2)](https://1drv.ms/u/s!ApEsJ9RIZdBQgQJWqH404p34Wb1m?e=KImg6N) \
or download from SCNet

`[MLT]` [images](https://github.com/aim-uofa/AdelaiDet/tree/master/datasets) | [annotations](https://1drv.ms/u/s!ApEsJ9RIZdBQgQBpvuvV2KBBbN64?e=HVTCab) \
or download from SCNet

`[ICDAR2013]` [images](https://1drv.ms/u/s!ApEsJ9RIZdBQgQcK05sWzK3_t26T?e=5jTWAa) | [annotations](https://1drv.ms/u/s!ApEsJ9RIZdBQfbgqFCeiKOrTM0E?e=UMfIQh) \
or download from SCNet

`[ICDAR2015]` [images](https://1drv.ms/u/s!ApEsJ9RIZdBQgQbupfCNqVxtYGna?e=b4TQY2) | [annotations](https://1drv.ms/u/s!ApEsJ9RIZdBQfhGW5JDiNcDxfWQ?e=PZ2JCX) \
or download from SCNet

`[Total-Text]` [images](https://1drv.ms/u/s!ApEsJ9RIZdBQgQjyPyivo_FnjJ1H?e=qgSFYL) | [annotations](https://1drv.ms/u/s!ApEsJ9RIZdBQgQOShwd8O0K5Dd1f?e=GYyPAX) \
or download from SCNet

`[CTW1500]` [images](https://1drv.ms/u/s!ApEsJ9RIZdBQgQlZVAH5AJld3Y9g?e=zgG71Z) | [annotations](https://1drv.ms/u/s!ApEsJ9RIZdBQfPpyzxoFV34zBg4?e=WK20AN) \
or download from SCNet

`[TextOCR]` [images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip) | [annotations](https://1drv.ms/u/s!ApEsJ9RIZdBQgQHY3mjH13GRLPGI?e=Dx1O99) \
or download from SCNet

`[Inverse-Text]` [images](https://1drv.ms/u/s!AimBgYV7JjTlgccVhlbD4I3z5QfmsQ?e=myu7Ue) | [annotations](https://1drv.ms/u/s!ApEsJ9RIZdBQf3G4vZpf4QD5NKo?e=xR3GtY) \
or download from SCNet

`[SynChinese130K]` [images](https://github.com/aim-uofa/AdelaiDet/tree/master/datasets) | [annotations](https://1drv.ms/u/s!AimBgYV7JjTlgch5W0n1Iv397i0csw?e=Gq8qww) \
or download from SCNet

`[ArT]` [images](https://github.com/aim-uofa/AdelaiDet/tree/master/datasets) | [annotations](https://1drv.ms/u/s!AimBgYV7JjTlgch45d0VHNCoPC1jfQ?e=likK00) \
or download from SCNet

`[LSVT]` [images](https://github.com/aim-uofa/AdelaiDet/tree/master/datasets) | [annotations](https://1drv.ms/u/s!AimBgYV7JjTlgch7yjmrCSN0TgoO4w?e=NKd5OG) \
or download from SCNet

`[ReCTS]` [images](https://github.com/aim-uofa/AdelaiDet/tree/master/datasets) | [annotations](https://1drv.ms/u/s!AimBgYV7JjTlgch_xZ8otxFWfNgZSg?e=pdq28B) \
or download from SCNet

`[Evaluation ground-truth]` [Link](https://1drv.ms/u/s!ApEsJ9RIZdBQem-MG1TjuRWApyA?e=fVPnmT) \
or download from SCNet


### Validation Datasets
```bash
# Create the folder that holds the evaluation ground-truth archives
cd datasets
mkdir -p evaluation
cd evaluation

wget -O gt_ctw1500.zip https://cloudstor.aarnet.edu.au/plus/s/xU3yeM3GnidiSTr/download
wget -O gt_totaltext.zip https://cloudstor.aarnet.edu.au/plus/s/SFHvin8BLUM4cNd/download
# Quote the URL so the shell does not interpret the "?" in the query string
wget -O gt_icdar2015.zip "https://drive.google.com/file/d/1wrq_-qIyb_8dhYVlDzLZTTajQzbic82Z/view?usp=sharing"
wget -O gt_inversetext.zip https://cloudstor.aarnet.edu.au/plus/s/xU3yeM3GnidiSTr/download
# Alternatively, download manually via the SCNet link of the corresponding dataset in the training dataset list above
```
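
A download can silently produce a file that is not a real archive, for example when a share-page URL returns an HTML page instead of the zip. The sketch below, with the hypothetical helper `check_archives`, reports for each expected ground-truth file whether it is missing, present but not a valid zip, or fine. It assumes it is pointed at the `datasets/evaluation` folder created above.

```python
import zipfile
from pathlib import Path
from typing import Dict, Sequence

# Archive names taken from the download commands above.
EXPECTED_ARCHIVES = (
    "gt_ctw1500.zip",
    "gt_totaltext.zip",
    "gt_icdar2015.zip",
    "gt_inversetext.zip",
)


def check_archives(evaluation_dir: str, names: Sequence[str] = EXPECTED_ARCHIVES) -> Dict[str, str]:
    """Classify each expected archive as 'ok', 'missing', or 'not a zip archive'."""
    status = {}
    for name in names:
        path = Path(evaluation_dir) / name
        if not path.is_file():
            status[name] = "missing"
        elif not zipfile.is_zipfile(str(path)):
            status[name] = "not a zip archive"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    for name, state in check_archives("datasets/evaluation").items():
        print(f"{name}: {state}")
```

Any file reported as `not a zip archive` should be downloaded again, manually if necessary.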

### Dataset Directory Structure
Datasets used for regular training should follow this directory layout:

```
├── datasets
│   ├── simple
│       ├── test_images
│       ├── train_images
│       ├── test.json
│       └── train.json
│   ├── evaluation
│       ├── gt_totaltext.zip
│       ├── gt_ctw1500.zip
│       ├── gt_icdar2015.zip
│       └── gt_inversetext.zip
│   ├── syntext1
│       ├── train_images
│       └── annotations
│           ├── train_37voc.json
│           └── train_96voc.json
│   ├── syntext2
│       ├── train_images
│       └── annotations
│           ├── train_37voc.json
│           └── train_96voc.json
│   ├── mlt2017
│       ├── train_images
│       ├── train_37voc.json
│       └── train_96voc.json
│   ├── totaltext
│       ├── train_images
│       ├── test_images
│       ├── weak_voc_new.txt
│       ├── weak_voc_pair_list.txt
│       ├── train_37voc.json
│       ├── train_96voc.json
│       └── test.json
│   ├── ic13
│       ├── train_images
│       ├── train_37voc.json
│       └── train_96voc.json
│   ├── ic15
│       ├── train_images
│       ├── test_images
│       ├── new_strong_lexicon
│       ├── strong_lexicon
│       ├── ch4_test_vocabulary.txt
│       ├── ch4_test_vocabulary_new.txt
│       ├── ch4_test_vocabulary_pair_list.txt
│       ├── GenericVocabulary.txt
│       ├── GenericVocabulary_new.txt
│       ├── GenericVocabulary_pair_list.txt
│       ├── train_37voc.json
│       ├── train_96voc.json
│       └── test.json
│   ├── ctw1500
│       ├── train_images
│       ├── test_images
│       ├── weak_voc_new.txt
│       ├── weak_voc_pair_list.txt
│       ├── train_96voc.json
│       └── test.json
│   ├── textocr
│       ├── train_images
│       ├── train_37voc_1.json
│       └── train_37voc_2.json
│   ├── inversetext
│       ├── test_images
│       └── test.json
│   ├── chnsyntext
│       ├── syn_130k_images
│       └── chn_syntext.json
│   ├── ArT
│       ├── rename_artimg_train
│       └── art_train.json
│   ├── LSVT
│       ├── rename_lsvtimg_train
│       └── lsvt_train.json
│   ├── ReCTS
│       ├── ReCTS_train_images  # 18,000 images
│       ├── ReCTS_val_images    # 2,000 images
│       ├── ReCTS_test_images   # 5,000 images
│       ├── rects_train.json
│       ├── rects_val.json
│       └── rects_test.json
```
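
To confirm that a local copy matches the layout above before starting a run, a small script can list the entries that are missing. The sketch below checks only a subset of the tree; `EXPECTED_LAYOUT` and `find_missing_entries` are hypothetical names, and the layout dictionary should be extended to cover whichever datasets your config actually uses.

```python
from pathlib import Path
from typing import Dict, List, Sequence

# A subset of the layout shown above: dataset folder -> entries expected inside it.
EXPECTED_LAYOUT: Dict[str, Sequence[str]] = {
    "totaltext": ("train_images", "test_images", "train_37voc.json", "train_96voc.json", "test.json"),
    "ctw1500": ("train_images", "test_images", "train_96voc.json", "test.json"),
    "evaluation": ("gt_totaltext.zip", "gt_ctw1500.zip"),
}


def find_missing_entries(datasets_root: str, layout: Dict[str, Sequence[str]] = EXPECTED_LAYOUT) -> List[str]:
    """Return the relative paths from the layout that do not exist under datasets_root."""
    root = Path(datasets_root)
    missing = []
    for dataset, entries in layout.items():
        for entry in entries:
            if not (root / dataset / entry).exists():
                missing.append(f"{dataset}/{entry}")
    return missing


if __name__ == "__main__":
    for path in find_missing_entries("datasets"):
        print("missing:", path)
```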

To use your own dataset, convert its annotations to **COCO** format and add the dataset to the `_PREDEFINED_SPLITS_TEXT` variable at line 18 of `DeepSolo/adet/data/builtin.py`, following the structure of the existing entries.
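
As a rough illustration of the two steps above, the sketch below first checks that an annotation file has the three top-level keys every COCO file carries (`images`, `annotations`, `categories`), then builds a registration entry. The entry shape is an assumption: it supposes `_PREDEFINED_SPLITS_TEXT` maps a dataset name to an `(image_folder, annotation_json)` pair relative to `datasets/`, so verify this against the existing entries in `builtin.py`. The name `simple_train` and both helper functions are hypothetical, and any text-specific annotation fields DeepSolo expects are not covered.

```python
import json
from typing import Dict, List, Tuple

REQUIRED_COCO_KEYS = ("images", "annotations", "categories")


def missing_coco_keys(annotation: dict) -> List[str]:
    """Return the standard top-level COCO keys that are absent from the annotation dict."""
    return [key for key in REQUIRED_COCO_KEYS if key not in annotation]


def load_and_check(json_path: str) -> List[str]:
    """Load an annotation file and report which top-level COCO keys are missing."""
    with open(json_path, "r", encoding="utf-8") as handle:
        return missing_coco_keys(json.load(handle))


def make_split_entry(name: str, image_dir: str, json_file: str) -> Dict[str, Tuple[str, str]]:
    """Build one entry in the ASSUMED name -> (image_dir, json_file) shape."""
    return {name: (image_dir, json_file)}


# Hypothetical entry for the bundled mini dataset; paths are relative to datasets/.
entry = make_split_entry("simple_train", "simple/train_images", "simple/train.json")
```

If the existing entries in `builtin.py` use a different shape, mirror those instead of this sketch.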

The project also provides a mini dataset, `simple`, for learning purposes.

## Training

### Single Node, Multiple GPUs

Tips: Adjust the following parameters in `train.sh` to match your setup:

`--config-file`: path to the yaml configuration file

`--num-gpus`: number of GPUs used for training

After making the changes, run:

```bash
bash train.sh
```
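
The contents of `train.sh` are not reproduced here. As a hedged sketch of how the two parameters fit together, the helper below assembles a Detectron2-style launch command. The entry-point path `tools/train_net.py`, the config path in the example, and the function name `build_train_command` are assumptions for illustration; check `train.sh` for the actual command.

```python
import shlex
from typing import List


def build_train_command(config_file: str, num_gpus: int, script: str = "tools/train_net.py") -> List[str]:
    """Assemble a launch command from the two parameters described above."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return ["python", script, "--config-file", config_file, "--num-gpus", str(num_gpus)]


if __name__ == "__main__":
    # Hypothetical config path; replace it with a yaml file from the configs folder.
    command = build_train_command("configs/your_config.yaml", num_gpus=4)
    print(" ".join(shlex.quote(part) for part in command))
```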

## Inference
Tips:
Testing offers two output modes. The first, `visualization show`, writes the result images to the `--output` path once testing finishes.

The second, `eval show`, prints the evaluation metrics once testing finishes and produces no result images.

The main parameters to modify are:

`${CONFIG_FILE}`: path to the yaml configuration file (remember to update the pretrained model path in it)

`${IMAGE_PATH}`: path to the data to be tested

`${MODEL_PATH}`: path to the pretrained model to be tested

To run your own pretrained model, update the corresponding setting: for `visualization show`, set the model path in the yaml file; for `eval show`, pass the model path through the `${MODEL_PATH}` argument. See the examples provided in `test.sh` for the exact configuration.

Taking `visualization show` as an example, the steps are as follows:

1. Download the CTW1500 pretrained model `pretrain_ctw_96voc.pth`

|Backbone|Training Data|Weights|
|:------:|:------:|:------:|
|Res-50|Synth150K+Total-Text+MLT17+IC13+IC15|[OneDrive](https://1drv.ms/u/s!AimBgYV7JjTlgcdtYzwEBGvOH6CiBw?e=trgKFE) / [SCNet]|

Place the pretrained model in the `pretrained_models/CTW1500/` folder. If you store it elsewhere, update the `MODEL.WEIGHTS` path in the yaml configuration file accordingly.

2. Put the data to be tested under `${IMAGE_PATH}` and run:

```bash
bash test.sh
```

3. Inference results are saved in the `test_results` folder by default; use the `--output` argument to change the output path.

## Results
Results on CTW1500:

<div align=center>
    <img src="./doc/results.jpg"/>
</div>

### Accuracy
Test results on `ctw1500` with `backbone=R50` are shown in the table below:

|Backbone|External Data|Det-P|Det-R|Det-F1|E2E-None|E2E-Generic|
|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
|Res-50(ours)|Synth150K+Total-Text+MLT17+IC13+IC15|0.9320|0.8363|0.8816|0.6783|0.9349|
|Res-50|Synth150K+Total-Text+MLT17+IC13+IC15|0.9329|0.8478|0.8883|0.6742|0.9373|
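
The Det-F1 column can be cross-checked against Det-P and Det-R. Assuming the table follows the standard definition of F1 as the harmonic mean of precision and recall, the short sketch below recomputes it from the two detection columns; the function name `f1_score` is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Det-P and Det-R values taken from the table above.
ours = f1_score(0.9320, 0.8363)
reference = f1_score(0.9329, 0.8478)
print(f"{ours:.4f} {reference:.4f}")  # prints "0.8816 0.8883"
```

Both recomputed values agree with the Det-F1 column to four decimal places.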

## Application Scenarios
### Algorithm Category
OCR

### Key Application Industries
Government, transportation, logistics

## Source Repository and Issue Feedback
http://developer.sourcefind.cn/codes/modelzoo/deepsolo_pytorch.git

## References
https://github.com/ViTAE-Transformer/DeepSolo.git