简体中文 | [English](README.md)

## Style Text

### Contents
- [1. Introduction](#工具简介)
- [2. Environment Setup](#环境配置)
- [3. Quick Start](#快速上手)
- [4. Applications](#应用案例)
- [5. Code Structure](#代码结构)

<a name="工具简介"></a>
### 1. Introduction
<div align="center">
    <img src="doc/images/3.png" width="800">
</div>

<div align="center">
    <img src="doc/images/1.png" width="600">
</div>

The Style-Text data synthesis tool is based on [Editing Text in the Wild](https://arxiv.org/abs/1908.03047), a text-editing algorithm developed in-house at Baidu.

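Unlike commonly used GAN-based data synthesis tools, the Style-Text framework consists of three main modules: (1) a text foreground style transfer module, (2) a background extraction module, and (3) a fusion module. After these three steps, the text style of an image can be transferred quickly. The figure below shows some results produced by the tool.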

<div align="center">
    <img src="doc/images/2.png" width="1000">
</div>
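
The three-module flow described above can be pictured as a short data-flow sketch. This is not the tool's actual code: `synthesize` is a hypothetical helper, images are stood in for by plain strings, and the assumption that the fusion step simply consumes the outputs of the other two modules is a simplification. Only the stage names (`text_generator`, `bg_generator`, `fusion_generator`) and the output names (`fake_text`, `fake_bg`, `fake_fusion`) are taken from this README.

```python
from typing import Callable, Dict

# Stand-in type: a real implementation would pass image tensors, not strings.
Image = str


def synthesize(
    style_image: Image,
    text: str,
    text_generator: Callable[[Image, str], Image],
    bg_generator: Callable[[Image], Image],
    fusion_generator: Callable[[Image, Image], Image],
) -> Dict[str, Image]:
    """Run the three stages in order and return every intermediate result."""
    # 1. Render the new text in the style of the text found in the style image.
    fake_text = text_generator(style_image, text)
    # 2. Recover the background of the style image with its text removed.
    fake_bg = bg_generator(style_image)
    # 3. Blend the restyled text onto the clean background (assumed data flow).
    fake_fusion = fusion_generator(fake_text, fake_bg)
    return {"fake_text": fake_text, "fake_bg": fake_bg, "fake_fusion": fake_fusion}


if __name__ == "__main__":
    # Toy stand-ins that only record what each stage consumed.
    result = synthesize(
        "style.jpg",
        "PaddleOCR",
        text_generator=lambda img, txt: f"text({txt!r} styled like {img})",
        bg_generator=lambda img: f"bg({img})",
        fusion_generator=lambda fg, bg: f"fuse({fg}, {bg})",
    )
    for name, value in result.items():
        print(name, "=", value)
```

Passing the stages in as callables keeps the sketch independent of any particular model implementation.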

<a name="环境配置"></a>
### 2. Environment Setup

1. Follow the [Quick Installation](../doc/doc_ch/installation.md) guide to install PaddleOCR.
2. Enter the `StyleText` directory, then download and unzip the models:

```bash
cd StyleText
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/style_text_models.zip
unzip style_text_models.zip
```

If you save the models somewhere else, update the model paths in `configs/config.yml`. All three of the following settings must be changed together:

```
bg_generator:
  pretrain: style_text_models/bg_generator
...
text_generator:
  pretrain: style_text_models/text_generator
...
fusion_generator:
  pretrain: style_text_models/fusion_generator
```
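
Because the three `pretrain` entries have to move together, it can help to update them in one place. The sketch below assumes the config has already been parsed into a nested dictionary (for example with a YAML parser); `relocate_models` and the `/data/models` directory are hypothetical names used only for illustration.

```python
import copy
import posixpath

# The three sections of configs/config.yml that carry a `pretrain` entry.
GENERATORS = ("bg_generator", "text_generator", "fusion_generator")


def relocate_models(config: dict, model_dir: str) -> dict:
    """Return a copy of `config` whose three `pretrain` paths point into `model_dir`."""
    updated = copy.deepcopy(config)
    for name in GENERATORS:
        # Keep the last path component (e.g. "bg_generator") and swap its parent.
        basename = posixpath.basename(updated[name]["pretrain"])
        updated[name]["pretrain"] = posixpath.join(model_dir, basename)
    return updated


if __name__ == "__main__":
    config = {name: {"pretrain": f"style_text_models/{name}"} for name in GENERATORS}
    for name, section in relocate_models(config, "/data/models").items():
        print(name, "->", section["pretrain"])
```

Working on a deep copy leaves the original settings untouched, so the old and new paths can be compared before anything is written back.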

<a name="快速上手"></a>
### 3. Quick Start

#### Synthesize a single image
Given one style image and a piece of text, run `tools/synth_image` to synthesize a single image. The resulting images are saved in the current directory:

```bash
python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
```
* Note: the language option must match the corpus. The tool currently supports only English, Simplified Chinese, and Korean.
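
When the same command has to be issued for several texts, building the argument list in code avoids shell-quoting mistakes. The following is a minimal sketch: `build_synth_command` is a hypothetical helper, and only the `en` language code appears in this README, so `ch` and `ko` are assumed codes for Simplified Chinese and Korean.

```python
import shlex

# Only "en" appears in the command above; "ch" and "ko" are assumed codes
# for Simplified Chinese and Korean.
ASSUMED_LANGUAGES = ("en", "ch", "ko")


def build_synth_command(style_image: str, text: str, language: str,
                        config: str = "configs/config.yml") -> list:
    """Build the argument list for one `tools.synth_image` run."""
    if language not in ASSUMED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return [
        "python3", "-m", "tools.synth_image",
        "-c", config,
        "--style_image", style_image,
        "--text_corpus", text,
        "--language", language,
    ]


if __name__ == "__main__":
    cmd = build_synth_command("examples/style_images/2.jpg", "PaddleOCR", "en")
    # Print a copy-pasteable shell command.
    print(shlex.join(cmd))
    # To actually run it from the StyleText directory:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list means text containing spaces is passed through as a single `--text_corpus` value.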

For example, given the following image and the text "PaddleOCR":

<div align="center">
    <img src="examples/style_images/2.jpg" width="300">
</div>

The tool generates the synthesized image `fake_fusion.jpg`:
<div align="center">
    <img src="doc/images/4.jpg" width="300">
</div>

In addition, the program generates and saves intermediate results. `fake_bg.jpg` is the background of the style reference image with the text removed:

<div align="center">
    <img src="doc/images/7.jpg" width="300">
</div>

`fake_text.jpg` is an image of the given string rendered on a gray background, imitating the style of the text in the style reference image:

<div align="center">
    <img src="doc/images/8.jpg" width="300">
</div>

#### Batch synthesis
In practice, images often need to be synthesized in batches and added to the training set. StyleText can synthesize data in batches from a set of style images and a corpus. The process is as follows:

1. In `configs/dataset_config.yml`, configure the paths of the style images and the corpus for the target scenario:

   * `Global`
     * `output_dir`: directory where the synthesized data is saved.
   * `StyleSampler`
     * `image_home`: directory of the style images;
     * `label_file`: file listing the style image paths. If the dataset has labels, set `label_file` to the path of the label file;
     * `with_label`: whether `label_file` is a label file.
   * `CorpusGenerator`
     * `method`: corpus generation method. `FileCorpus` and `EnNumCorpus` are currently available. With `EnNumCorpus`, no other settings are needed; otherwise, set `corpus_file` and `language`;
     * `language`: language of the corpus;
     * `corpus_file`: path of the corpus file.

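   StyleText also provides 50,000 general-scene images in Chinese, English, and Korean for use as text style images, which makes it easy to synthesize text images covering a wide range of scenes. The figure below shows some examples.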

   50,000 general-scene images in Chinese, English, and Korean: [download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)

<div align="center">
    <img src="doc/images/5.png" width="800">
</div>

2. Run `tools/synth_dataset` to synthesize the data:

   ``` bash
   python -m tools.synth_dataset -c configs/dataset_config.yml
   ```
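
The rule for the `CorpusGenerator` section in step 1 can be written down as a small pre-flight check. This sketch assumes the section has already been loaded into a dictionary; `check_corpus_config` is a hypothetical helper and is not part of the tool.

```python
def check_corpus_config(corpus: dict) -> list:
    """Return a list of problems found in a `CorpusGenerator` section."""
    problems = []
    method = corpus.get("method")
    if method not in ("FileCorpus", "EnNumCorpus"):
        problems.append(f"unknown method: {method!r}")
    elif method == "FileCorpus":
        # A file-based corpus needs both the file to read and its language.
        for key in ("corpus_file", "language"):
            if not corpus.get(key):
                problems.append(f"FileCorpus requires `{key}`")
    # EnNumCorpus needs no further settings.
    return problems


if __name__ == "__main__":
    examples = [
        {"method": "EnNumCorpus"},
        {"method": "FileCorpus", "language": "en"},
        {"method": "FileCorpus", "corpus_file": "examples/corpus/example.txt", "language": "en"},
    ]
    for section in examples:
        print(section, "->", check_corpus_config(section) or "ok")
```

Running a check like this before `tools.synth_dataset` surfaces a missing `corpus_file` or `language` before a long batch job starts.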

<a name="应用案例"></a>
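### 4. Applications
The following two scenarios, English and digit recognition on metal surfaces and general Korean recognition, show how data synthesized with StyleText can improve text recognition in practice. The figure below shows some examples of real-scene images and synthesized images: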

<div align="center">
    <img src="doc/images/6.png" width="800">
</div>

After the synthesized data above is added to training, the accuracy of the recognition model improves, as shown in the following table:

| Scenario | Characters | Original data | Test data | Accuracy<br/>(original data only) | Added synthetic data | Accuracy<br/>(original + synthetic data) | Improvement |
| -------- | ---------- | -------- | -------- | -------------------------- | ------------ | ---------------------- | -------- |
| Metal surface | English and digits | 2203     | 650      | 0.5938                     | 20000        | 0.7546                 | 16%      |
| Random background | Korean       | 5631     | 1230     | 0.3012                     | 100000       | 0.5057                 | 20%      |
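
The last column of the table matches the absolute difference between the two accuracy columns, expressed in percentage points, rather than a relative gain. The short calculation below reproduces it from the figures in the table; `absolute_gain` is an illustrative helper name.

```python
# (scenario, accuracy with original data only, accuracy with synthetic data added)
RESULTS = [
    ("Metal surface", 0.5938, 0.7546),
    ("Random background (Korean)", 0.3012, 0.5057),
]


def absolute_gain(before: float, after: float) -> float:
    """Accuracy gain in percentage points, rounded to two decimals."""
    return round((after - before) * 100, 2)


if __name__ == "__main__":
    for scenario, before, after in RESULTS:
        print(f"{scenario}: +{absolute_gain(before, after)} points")
```

The gains come out at about 16.08 and 20.45 points, which the table rounds to 16% and 20%.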


<a name="代码结构"></a>
### 5. Code Structure
```
style_text_rec
|-- arch
|   |-- base_module.py
|   |-- decoder.py
|   |-- encoder.py
|   |-- spectral_norm.py
|   `-- style_text_rec.py
|-- configs
|   |-- config.yml
|   `-- dataset_config.yml
|-- engine
|   |-- corpus_generators.py
|   |-- predictors.py
|   |-- style_samplers.py
|   |-- synthesisers.py
|   |-- text_drawers.py
|   `-- writers.py
|-- examples
|   |-- corpus
|   |   `-- example.txt
|   |-- image_list.txt
|   `-- style_images
|       |-- 1.jpg
|       `-- 2.jpg
|-- fonts
|   |-- ch_standard.ttf
|   |-- en_standard.ttf
|   `-- ko_standard.ttf
|-- tools
|   |-- __init__.py
|   |-- synth_dataset.py
|   `-- synth_image.py
`-- utils
    |-- config.py
    |-- load_params.py
    |-- logging.py
    |-- math_functions.py
    `-- sys_funcs.py
```