## Style Text

### Contents
- [1. Introduction](#工具简介)
- [2. Environment Setup](#环境配置)
- [3. Quick Start](#快速上手)
- [4. Application Cases](#应用案例)
- [5. Code Structure](#代码结构)

<a name="工具简介"></a>
### 1. Introduction
<div align="center">
    <img src="doc/images/3.png" width="800">
</div>

<div align="center">
    <img src="doc/images/1.png" width="600">
</div>

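The Style-Text data synthesis tool is built on Baidu's in-house text editing algorithm from the paper "Editing Text in the Wild" (https://arxiv.org/abs/1908.03047).

Unlike common GAN-based data synthesis tools, Style-Text is organized into three main modules: 1. a text foreground style transfer module, 2. a background extraction module, and 3. a fusion module. These three steps are all it takes to quickly transfer the style of the text in an image. The figure below shows some results produced by the tool.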

<div align="center">
    <img src="doc/images/2.png" width="1000">
</div>
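
The data flow between the three modules can be sketched as below. This is a minimal illustration, not the StyleText implementation: `synthesize`, `SynthResult`, and the assumption that the fusion module consumes the outputs of the other two stages are ours, and the real modules are neural networks loaded from `style_text_models/`. The field names mirror the `fake_text.jpg`, `fake_bg.jpg`, and `fake_fusion.jpg` files described under Quick Start.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical stand-ins for the three Style-Text modules.
TextGenerator = Callable[[Any, str], Any]   # (style image, text) -> styled text image
BackgroundExtractor = Callable[[Any], Any]  # style image -> background with text erased
Fuser = Callable[[Any, Any], Any]           # (styled text, background) -> final image


@dataclass
class SynthResult:
    fake_text: Any    # corresponds to fake_text.jpg
    fake_bg: Any      # corresponds to fake_bg.jpg
    fake_fusion: Any  # corresponds to fake_fusion.jpg


def synthesize(style_image: Any, text: str,
               text_gen: TextGenerator,
               bg_extract: BackgroundExtractor,
               fuse: Fuser) -> SynthResult:
    """Chain the three stages: style transfer, background extraction, fusion."""
    fake_text = text_gen(style_image, text)  # 1. text foreground style transfer
    fake_bg = bg_extract(style_image)        # 2. background extraction
    fake_fusion = fuse(fake_text, fake_bg)   # 3. fusion of foreground and background
    return SynthResult(fake_text, fake_bg, fake_fusion)
```

With toy string stand-ins such as `lambda img, t: f"{t}@{img}"` for each stage, the sketch runs end to end and makes the order of the stages easy to check.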

<a name="环境配置"></a>
### 2. Environment Setup

1. Install PaddleOCR by following the [Quick Installation](../doc/doc_ch/installation.md) guide.
2. Go to the `StyleText` directory, then download and unzip the models:
```bash
cd StyleText
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/style_text_models.zip
unzip style_text_models.zip
```

If you save the models in another location, update the model paths in `configs/config.yml`. All three of the following entries must be changed together:

```yaml
bg_generator:
  pretrain: style_text_models/bg_generator
...
text_generator:
  pretrain: style_text_models/text_generator
...
fusion_generator:
  pretrain: style_text_models/fusion_generator
```
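
The three paths can also be rewritten programmatically. A minimal sketch, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that each generator's weights keep their original sub-directory name in the new location; `relocate_pretrain` is a hypothetical helper, not part of StyleText. It searches nested dicts, so it works whether the three sections sit at the top level, as in the excerpt above, or under another key.

```python
import copy
import posixpath
from typing import Any, Dict

# Config sections that each carry a `pretrain` path.
GENERATOR_KEYS = ("bg_generator", "text_generator", "fusion_generator")


def relocate_pretrain(config: Dict[str, Any], model_dir: str) -> Dict[str, Any]:
    """Return a copy of `config` whose generator `pretrain` paths point into `model_dir`."""
    new_config = copy.deepcopy(config)  # leave the caller's dict untouched

    def visit(node: Any) -> None:
        if not isinstance(node, dict):
            return
        for key, value in node.items():
            if key in GENERATOR_KEYS and isinstance(value, dict) and "pretrain" in value:
                # Keep the last path component (e.g. "bg_generator") and re-root it.
                # posixpath keeps forward slashes, as in the YAML, on every OS.
                leaf = posixpath.basename(posixpath.normpath(value["pretrain"]))
                value["pretrain"] = posixpath.join(model_dir, leaf)
            visit(value)  # recurse into nested sections

    visit(new_config)
    return new_config
```

The updated dict can then be written back with `yaml.safe_dump` and passed to the tools via `-c`.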

<a name="快速上手"></a>
### 3. Quick Start

#### Synthesizing a single image
Given a style image and a text string, run `tools/synth_image` to synthesize a single image. The result images are saved in the current directory:
```bash
python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
```
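* Note: the `--language` option must match the language of the corpus. Currently the tool supports only English, Simplified Chinese, and Korean.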
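
When single-image synthesis is driven from a script, building the argument list explicitly avoids shell-quoting problems with corpus strings that contain spaces. The helper below is a hypothetical sketch that mirrors the command above; the language codes `ch` and `ko` are assumed from the font file names in `fonts/`, since only `en` appears in this README's example.

```python
from typing import List

# "en" comes from the example command; "ch" and "ko" are assumptions
# based on the font names (ch_standard.ttf, ko_standard.ttf).
SUPPORTED_LANGUAGES = {"en", "ch", "ko"}


def synth_image_cmd(style_image: str, text: str, language: str,
                    config: str = "configs/config.yml") -> List[str]:
    """Build the argv for one tools.synth_image run (hypothetical helper)."""
    if language not in SUPPORTED_LANGUAGES:
        # The corpus language must be one the tool supports.
        raise ValueError(f"unsupported language: {language!r}")
    return ["python3", "-m", "tools.synth_image", "-c", config,
            "--style_image", style_image,
            "--text_corpus", text,
            "--language", language]
```

Run from the `StyleText` directory, the list can be passed straight to `subprocess.run(cmd, check=True)`, which raises if the tool exits with an error.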

For example, given the following style image and the corpus "PaddleOCR":

<div align="center">
    <img src="examples/style_images/2.jpg" width="300">
</div>

The tool generates the synthesized image `fake_fusion.jpg`:
<div align="center">
    <img src="doc/images/4.jpg" width="300">
</div>

In addition, the program generates and saves two intermediate results. `fake_bg.jpg` is the background of the style reference image with the text removed:
   
<div align="center">
    <img src="doc/images/7.jpg" width="300">
</div>

`fake_text.jpg` is an image of the provided string, rendered on a gray background in the style of the text in the style reference image:
   
<div align="center">
    <img src="doc/images/8.jpg" width="300">
</div>
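
To confirm that a run produced all three images, a small check of the output directory is enough. This is a hypothetical convenience function, not part of StyleText; it only assumes the three file names given above.

```python
from pathlib import Path
from typing import Dict

# Result files written by a single-image run, as described above.
OUTPUT_FILES = ("fake_fusion.jpg", "fake_bg.jpg", "fake_text.jpg")


def check_outputs(directory: str = ".") -> Dict[str, bool]:
    """Report which of the three result images exist in `directory`."""
    root = Path(directory)
    return {name: (root / name).is_file() for name in OUTPUT_FILES}
```

Calling `check_outputs()` right after `tools.synth_image` finishes should report `True` for all three names.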

#### Batch synthesis
In practice, you often need to synthesize images in batches to augment a training set. StyleText can synthesize data in batches from a set of style images and a corpus. The process is as follows:

1. In `configs/dataset_config.yml`, configure the paths to the style images of the target scenario and to the corpus:

   * `Global`
     * `output_dir`: directory where the synthesized data is saved.
   * `StyleSampler`
     * `image_home`: directory containing the style images;
     * `label_file`: file listing the paths of the style images; if the dataset has labels, set `label_file` to the path of the label file;
     * `with_label`: whether `label_file` is a label file.
   * `CorpusGenerator`
     * `method`: corpus generation method; `FileCorpus` and `EnNumCorpus` are currently available. With `EnNumCorpus`, no other settings are needed; otherwise, set `corpus_file` and `language`.
     * `language`: language of the corpus;
     * `corpus_file`: path to the corpus file.
   
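   StyleText also provides a set of 50,000 general-scene images in Chinese, English, and Korean for use as text style images, making it easy to synthesize text images across a wide variety of scenes. The figure below shows some examples.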
   
   50,000 general-scene images (Chinese, English, Korean): [download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)
   
<div align="center">
    <img src="doc/images/5.png" width="800">
</div>
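
Before a long batch run, it can help to sanity-check the configuration. The sketch below is a hypothetical helper, not part of StyleText: it assumes `dataset_config.yml` has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`) with the `Global`, `StyleSampler`, and `CorpusGenerator` sections described above. Treating `image_home` and `label_file` as required is our assumption; the `FileCorpus` rule follows the `method` description.

```python
from typing import Any, Dict, List

CORPUS_METHODS = {"FileCorpus", "EnNumCorpus"}


def check_dataset_config(cfg: Dict[str, Any]) -> List[str]:
    """Return the problems found in a parsed dataset config (empty list if none)."""
    problems = []
    # `or {}` also covers sections that YAML parsed as None (empty sections).
    if not (cfg.get("Global") or {}).get("output_dir"):
        problems.append("Global.output_dir is not set")

    sampler = cfg.get("StyleSampler") or {}
    for key in ("image_home", "label_file"):
        if not sampler.get(key):
            problems.append(f"StyleSampler.{key} is not set")

    corpus = cfg.get("CorpusGenerator") or {}
    method = corpus.get("method")
    if method not in CORPUS_METHODS:
        problems.append(f"CorpusGenerator.method must be one of {sorted(CORPUS_METHODS)}")
    elif method == "FileCorpus":
        # EnNumCorpus needs nothing else; FileCorpus needs a file and a language.
        for key in ("corpus_file", "language"):
            if not corpus.get(key):
                problems.append(f"CorpusGenerator.{key} is required for FileCorpus")
    return problems
```

An empty list means the sections this sketch knows about are filled in; it does not check that the paths exist.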

2. Run `tools/synth_dataset` to synthesize the data:

   ``` bash
   python -m tools.synth_dataset -c configs/dataset_config.yml
   ```

<a name="应用案例"></a>
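### 4. Application Cases
The following two scenarios, English and digit recognition on metal surfaces and general Korean recognition, show how data synthesized with StyleText improves text recognition in practice. The figure below shows examples of real-scene images and synthesized images: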

<div align="center">
    <img src="doc/images/6.png" width="800">
</div>

After the synthesized data is added to training, recognition accuracy improves as shown in the table below:

| Scenario          | Characters         | Original data | Test data | Accuracy<br>(original data only) | Synthetic data added | Accuracy<br>(original + synthetic) | Gain (absolute) |
| ----------------- | ------------------ | ------------- | --------- | -------------------------------- | -------------------- | ---------------------------------- | --------------- |
| Metal surface     | English and digits | 2203          | 650       | 0.5938                           | 20000                | 0.7546                             | 16%             |
| Random background | Korean             | 5631          | 1230      | 0.3012                           | 100000               | 0.5057                             | 20%             |
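
The gain column reads as the absolute difference between the two accuracies. The arithmetic below uses only numbers from the table to show both the absolute and the relative improvement:

```python
from typing import Tuple

# Accuracies copied from the table: (original data only, original + synthetic).
CASES = {
    "metal surface (English and digits)": (0.5938, 0.7546),
    "random background (Korean)": (0.3012, 0.5057),
}


def gains(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute accuracy gain, gain relative to the baseline)."""
    absolute = after - before
    return absolute, absolute / before


for name, (before, after) in CASES.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute * 100:.2f} points absolute, +{relative * 100:.1f}% relative")
```

This prints gains of 16.08 and 20.45 points, or roughly 27% and 68% relative to the original-data baselines.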

<a name="代码结构"></a>
### 5. Code Structure

```
style_text_rec
|-- arch
|   |-- base_module.py
|   |-- decoder.py
|   |-- encoder.py
|   |-- spectral_norm.py
|   `-- style_text_rec.py
|-- configs
|   |-- config.yml
|   `-- dataset_config.yml
|-- engine
|   |-- corpus_generators.py
|   |-- predictors.py
|   |-- style_samplers.py
|   |-- synthesisers.py
|   |-- text_drawers.py
|   `-- writers.py
|-- examples
|   |-- corpus
|   |   `-- example.txt
|   |-- image_list.txt
|   `-- style_images
|       |-- 1.jpg
|       `-- 2.jpg
|-- fonts
|   |-- ch_standard.ttf
|   |-- en_standard.ttf
|   `-- ko_standard.ttf
|-- tools
|   |-- __init__.py
|   |-- synth_dataset.py
|   `-- synth_image.py
`-- utils
    |-- config.py
    |-- load_params.py
    |-- logging.py
    |-- math_functions.py
    `-- sys_funcs.py
```