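# Overview
PP-OCRv5 is the new-generation text recognition solution in the PP-OCR family, focused on recognizing text across many scenarios and script types. For script types, PP-OCRv5 supports five mainstream categories: Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese. For scenarios, PP-OCRv5 improves recognition in challenging cases such as complex Chinese and English handwriting, vertical text, and rare characters. On an internal multi-scenario evaluation set, PP-OCRv5 improves end-to-end results by 13 percentage points over PP-OCRv4. This sample adapts the PP-OCRv5 text detection and recognition models and runs inference through the MIGraphX C++ API.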

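## Model Introduction

### Text Detection
Text detection uses DBNet (paper: https://arxiv.org/pdf/1911.08947). Network structure:
![DBNet network structure](Images/DBNet.png)
The model outputs a probability map, and the Vatti clipping algorithm is then used to simplify the polygons of the text regions; the sample relies on the Clipping library for this step. The sample's model input shape is [1,3,640,640]. Model path: Resource/Models/ppocrv5_server_det_infer.onnx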
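
To make the role of the probability map more concrete, here is a minimal sketch of two steps commonly found in DB-style post-processing: binarizing the map with a pixel threshold and scoring a candidate region by its mean probability. The helper names `binarize` and `box_score` are hypothetical and not taken from the sample. The use of `segm_thres` and `box_thresh` here is an assumption based on how those parameters are described in the class documentation, and the Vatti clipping step is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: binarize a row-major H x W probability map.
// Pixels whose probability exceeds segm_thres become 1 (text), others 0.
std::vector<std::uint8_t> binarize(const std::vector<float>& prob,
                                   int width, int height, float segm_thres)
{
    assert(prob.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> mask(prob.size(), 0);
    for (std::size_t i = 0; i < prob.size(); ++i)
        mask[i] = prob[i] > segm_thres ? 1 : 0;
    return mask;
}

// Hypothetical helper: mean probability inside the axis-aligned box
// [x0, x1) x [y0, y1). Under the assumption above, a candidate region
// would be kept only if this score reaches box_thresh.
float box_score(const std::vector<float>& prob, int width,
                int x0, int y0, int x1, int y1)
{
    assert(x0 < x1 && y0 < y1);
    double sum = 0.0;
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            sum += prob[static_cast<std::size_t>(y) * width + x];
    return static_cast<float>(sum / ((x1 - x0) * (y1 - y0)));
}
```

A real detector would extract contours from the mask and score each contour's polygon rather than an axis-aligned box; the box form is used here only to keep the sketch short.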
### Text Recognition
Text recognition uses CRNN + CTCDecode (https://arxiv.org/pdf/2009.09941). Network structure:

![CRNN network structure](Images/CRNN.png)

In the sample, the model input shape is [1,3,48,720]. Model path: Resource/Models/ppocrv5_server_rec_infer.onnx
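
The CTC decoding step can be illustrated with a greedy decoder: take the best class at each time step, collapse consecutive repeats, and drop the blank symbol. The function below is a sketch, not the sample's `CTCDecode` implementation. The name `ctc_greedy_decode`, the row-major `T x C` layout, and the choice of index 0 as the blank are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Greedy CTC decoding sketch.
// logits: steps x classes scores, row-major (one row per time step).
// dict:   one string per class; dict[blank_id] is never emitted.
std::string ctc_greedy_decode(const std::vector<float>& logits,
                              int steps, int classes,
                              const std::vector<std::string>& dict,
                              int blank_id = 0)  // assumed blank index
{
    assert(logits.size() == static_cast<std::size_t>(steps) * classes);
    assert(dict.size() == static_cast<std::size_t>(classes));
    std::string text;
    int prev = blank_id;
    for (int t = 0; t < steps; ++t) {
        const float* row = logits.data() + static_cast<std::size_t>(t) * classes;
        // Argmax over the classes at this time step.
        int best = 0;
        for (int c = 1; c < classes; ++c)
            if (row[c] > row[best]) best = c;
        // Emit only when the class is not blank and differs from the previous step.
        if (best != blank_id && best != prev)
            text += dict[best];
        prev = best;
    }
    return text;
}
```

Because a blank between two identical classes resets `prev`, a sequence such as a, a, blank, a decodes to "aa" rather than "a", which is how CTC represents doubled characters.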
  																										
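## Preprocessing
### Detection Model Preprocessing
Input preprocessing for the detection model:
- Scale the image proportionally and pad it (along the right and bottom edges)
- Normalize the image: subtract the mean and divide by the standard deviation
- Transpose: MIGraphX expects input data in [N,C,H,W] layout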


```c++
cv::Size OcrDet::preproc(cv::Mat img, float* data)
{
    const float scale = 1.0f / 255.0f;
    const std::vector<float> s_mean = {0.485f, 0.456f, 0.406f};
    const std::vector<float> s_stdv = {0.229f, 0.224f, 0.225f};
    if (img.empty())
    {
        std::cout << "Source image is empty!\n";
        return cv::Size(1, 1);
    }
    cv::Mat res_img;
    cv::Size scale_r;
    // NOTE: cv::Size stores ints, so these float ratios are truncated on assignment.
    scale_r.width = float(net_input_width) / float(img.cols);
    scale_r.height = float(net_input_height) / float(img.rows);
    // Resize to the network input size (this call stretches the image;
    // it does not preserve the aspect ratio)
    cv::resize(img, res_img, cv::Size(net_input_width, net_input_height));
    int iw = res_img.cols;
    int ih = res_img.rows;
    memset(data, 0, 3 * iw * ih * sizeof(float));
    // HWC -> CHW, normalizing each channel with its own mean and standard deviation
    for (int i = 0; i < net_input_height; i++)
    {
        for (int j = 0; j < net_input_width; j++)
        {
            data[i*net_input_width+j+2*net_input_height*net_input_width] = (float(res_img.at<cv::Vec3b>(i, j)[2])*scale-s_mean[2])/s_stdv[2];
            data[i*net_input_width+j+net_input_height*net_input_width] =   (float(res_img.at<cv::Vec3b>(i, j)[1])*scale-s_mean[1])/s_stdv[1];
            data[i*net_input_width+j] =                                    (float(res_img.at<cv::Vec3b>(i, j)[0])*scale-s_mean[0])/s_stdv[0];
        }
    }
    return scale_r;
}
```
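
The preprocessing list mentions proportional scaling with padding on the right and bottom. As a sketch of that arithmetic only, the helper below computes the scaled size and the padding needed to fill the network input. `LetterboxPlan` and `plan_letterbox` are hypothetical names introduced for illustration and do not appear in the sample.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type: scaled size plus right/bottom padding.
struct LetterboxPlan {
    int new_w;       // width after aspect-preserving scaling
    int new_h;       // height after aspect-preserving scaling
    int pad_right;   // pixels to pad on the right edge
    int pad_bottom;  // pixels to pad on the bottom edge
    float ratio;     // single scale factor applied to both axes
};

// Pick the largest uniform scale that fits src inside dst, then pad the rest.
LetterboxPlan plan_letterbox(int src_w, int src_h, int dst_w, int dst_h)
{
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    float ratio = std::min(static_cast<float>(dst_w) / src_w,
                           static_cast<float>(dst_h) / src_h);
    // Clamp to [1, dst] so rounding can never produce an empty or oversized image.
    int new_w = std::max(1, std::min(dst_w, static_cast<int>(src_w * ratio)));
    int new_h = std::max(1, std::min(dst_h, static_cast<int>(src_h * ratio)));
    return {new_w, new_h, dst_w - new_w, dst_h - new_h, ratio};
}
```

With a single `ratio` for both axes, detected box coordinates can be mapped back to the source image by dividing by that one factor, and keeping it as a `float` avoids the integer truncation that a `cv::Size` would introduce.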
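### Recognition Model Preprocessing
Input preprocessing for the text recognition model:
- Scale the image proportionally, keeping the height fixed at the model input height, and pad it (along the right and bottom edges)
- Normalize the image; mean and standard deviation both default to 0.5
- Transpose: MIGraphX expects input data in [N,C,H,W] layout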
```c++

bool CTCDecode::preproc(cv::Mat img, float* data, int img_w, int img_h)
    {
        if (img.empty())
        {
            std::cout << "WARNING image is empty!\n";
            return false;
        }

        const float scale = 1.0f / 255.0f;
        int iw = img.cols;
        int ih = img.rows;
        // Uniform scale factor that keeps the image inside img_w x img_h
        float ratio = std::min(img_h * 1.0 / ih, img_w * 1.0 / iw);
        int nw = static_cast<int>(iw * ratio);
        int nh = img_h;  // the height always matches the model input height
        cv::Mat res_mat;
        cv::resize(img, res_mat, cv::Size(nw, nh));
        cv::Mat template_mat;
        int xdet = img_w - nw;
        int ydet = img_h - nh;
        // Pad with black on the bottom and right up to img_w x img_h
        cv::copyMakeBorder(res_mat, template_mat, 0, ydet, 0, xdet, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
        memset(data, 0, this->batch_size * 3 * img_w * img_h * sizeof(float));

        // NOTE: the indices below do not depend on b, so only the first batch slot is written.
        for (int b = 0; b < this->batch_size; b++)
        {
            for (int i = 0; i < img_h; i++)
            {
                for (int j = 0; j < img_w; j++)
                {
                    // HWC (BGR) -> CHW (RGB), normalized with mean 0.5 and std 0.5
                    data[i*img_w+j] = (template_mat.at<cv::Vec3b>(i, j)[2]*scale-0.5)/0.5;
                    data[i*img_w+j+img_h*img_w] = (template_mat.at<cv::Vec3b>(i, j)[1]*scale-0.5)/0.5;
                    data[i*img_w+j+2*img_h*img_w] = (template_mat.at<cv::Vec3b>(i, j)[0]*scale-0.5)/0.5;
                }
            }
        }
        return true;
    }
```
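
The layout change and normalization in the recognition preprocessing can also be shown without OpenCV. The sketch below converts an interleaved 8-bit BGR buffer into planar RGB floats using mean 0.5 and standard deviation 0.5, which maps pixel values into [-1, 1]. The function name `hwc_bgr_to_chw_rgb` and the use of `std::vector` buffers are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved 8-bit BGR image (H x W x 3) into planar RGB floats
// (3 x H x W), normalized as (v / 255 - 0.5) / 0.5.
std::vector<float> hwc_bgr_to_chw_rgb(const std::vector<std::uint8_t>& bgr,
                                      int width, int height)
{
    const std::size_t plane = static_cast<std::size_t>(width) * height;
    assert(bgr.size() == plane * 3);
    std::vector<float> chw(plane * 3);
    for (std::size_t p = 0; p < plane; ++p) {
        for (int c = 0; c < 3; ++c) {
            // Source channel (2 - c) flips BGR order into RGB plane order.
            float v = bgr[p * 3 + (2 - c)] / 255.0f;
            chw[c * plane + p] = (v - 0.5f) / 0.5f;
        }
    }
    return chw;
}
```

Plane 0 of the result holds red, plane 1 green, and plane 2 blue, matching the index pattern in the loop of `CTCDecode::preproc`.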
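## Class Overview
ppOcrEngine wraps the public API. OcrDet is the text detection class and CTCDecode is the text recognition class. ppOcrEngine holds the detector and the recognizer as two smart-pointer members. In forward, it first calls text_detector to find all text regions in the image, then passes each detected region to text_recognizer to recognize its content.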
```c++

class ppOcrEngine {
        private:
            std::shared_ptr<OcrDet> text_detector;
            std::shared_ptr<CTCDecode> text_recognizer;
        public:
            /**
             * @brief Initialize the OCR engine
             * @param det_model_path       path to the text detection model
             * @param rec_model_path       path to the text recognition model
             * @param character_dict_path  path to the character dictionary
             * @param segm_thres           pixel segmentation threshold
             * @param box_thresh           threshold for text-region boxes
             * @param offload_copy         memory copy mode. When true, no manual memory copy is needed.
             *                             When false, device memory for the inputs and outputs must be
             *                             preallocated, the preprocessed data copied to device memory
             *                             before inference, and the model output copied back from device
             *                             memory afterwards.
             * @param precision_mode       precision mode: fp32 or fp16; defaults to fp16
             */
            ppOcrEngine(const std::string &det_model_path,
                    const std::string &rec_model_path,
                    const std::string &character_dict_path,
                    const float segm_thres=0.3,
                    const float box_thresh=0.7,
                    bool offload_copy =true,
                    std::string precision_mode = "fp16") ;
            ~ppOcrEngine();
            std::vector<std::string> forward(cv::Mat &srcimg);
    };

    class CTCDecode
    {
    private:
        ...

    public:
        CTCDecode(std::string rec_model_path,
        std::string precision_mode="fp16",
        int image_width=480,
        int image_height=48,
        int channel=3,
        int batch_size = 1,
        bool offload_copy = true,
        std::string character_dict_path="./ppocr_keys_v5.txt");

        ~CTCDecode();
        /**
         * @brief Text recognition and decoding API; predicts at most 90 characters,
         *        with support for 18385 distinct characters
         * @param img input image
         * @return the decoded string
         */
        std::string forward(cv::Mat& img);

    private:
       ...

    };

    class OcrDet
    {
    private:
        ...

    public:
        OcrDet(std::string det_model_path,
            std::string precision_mode="float32",
            bool offload_copy = true,
            float segm_thres = 0.3,
            float box_thresh = 0.7);
        ~OcrDet();

        /**
         * @brief Inference API of the text detection model
         * @param img original image
         * @param text_roi_boxes  text-region coordinates, one entry per region, in the format
         *                        [[[tl.x, tl.y], [tr.x, tr.y], [br.x, br.y], [bl.x, bl.y]], ...]
         *                        (top-left, top-right, bottom-right, bottom-left)
         * @return true on success, false on failure
         */
        bool forward(cv::Mat& img,std::vector<std::vector<std::vector<int>>>& text_roi_boxes);

    private:
        ...

    };

```

 
## Inference
- Text detection
- Text recognition and decoding
- Text box visualization
- OCR result visualization
```c++
std::vector<std::string> ppOcrEngine::forward(cv::Mat &srcimg){
        std::vector<std::vector<std::vector<int>>> text_roi_boxes;

        std::vector<std::string> text_vec;
        auto start = std::chrono::high_resolution_clock::now();
        // Detect text regions
        text_detector->forward(srcimg,text_roi_boxes);
        if(text_roi_boxes.size() == 0)
        {
            std::cout<<"Not found text roi !\n";
            return std::vector<std::string>();
        }

        std::vector<cv::Point> points;
        // Recognize and decode each detected region
        for (size_t n = 0; n < text_roi_boxes.size(); n++) {

            cv::Rect rect;
            cv::Mat text_roi_mat;
            // Build the crop rectangle from the top-left and bottom-right corners
            rect.x = text_roi_boxes[n][0][0];
            rect.y = text_roi_boxes[n][0][1];
            rect.width = text_roi_boxes[n][2][0] -  text_roi_boxes[n][0][0];
            rect.height = text_roi_boxes[n][2][1] - text_roi_boxes[n][0][1];
            // Clip to the image so the crop below cannot go out of bounds
            rect &= cv::Rect(0, 0, srcimg.cols, srcimg.rows);
            // Skip regions that are too small to recognize
            if(rect.width <3 || rect.height<3)
            {
                continue;
            }
            text_roi_mat = srcimg(rect).clone();

            std::string text = text_recognizer->forward(text_roi_mat);
            text_vec.push_back(text);
            points.push_back(cv::Point(rect.x,rect.y));
        }
        auto end = std::chrono::high_resolution_clock::now();
        auto duration_ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
        std::cout<<"[Time info] elapsed: "<< duration_ms.count() <<" ms\n";
        // Visualize the text boxes
        visualize_boxes(srcimg,text_roi_boxes);
        // Visualize the OCR result
        cv::Mat res_img = visualize_text(text_vec,points, srcimg);
        ...
}

```
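
The crop rectangle in `forward` is built from the top-left and bottom-right corners only. As a more defensive alternative, the sketch below computes the axis-aligned bounding rectangle over all corner points of a region and clamps it to the image. The `Rect` struct is a hypothetical stand-in for `cv::Rect` so the example has no OpenCV dependency, and `bounding_rect_clamped` is not part of the sample.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for cv::Rect, to keep the sketch dependency-free.
struct Rect { int x, y, width, height; };

// Axis-aligned bounding rectangle of a region given as [[x, y], ...] corner
// points, clamped to an image of size cols x rows.
Rect bounding_rect_clamped(const std::vector<std::vector<int>>& box,
                           int cols, int rows)
{
    assert(!box.empty() && cols > 0 && rows > 0);
    int x0 = box[0][0], x1 = box[0][0];
    int y0 = box[0][1], y1 = box[0][1];
    for (const auto& pt : box) {
        x0 = std::min(x0, pt[0]); x1 = std::max(x1, pt[0]);
        y0 = std::min(y0, pt[1]); y1 = std::max(y1, pt[1]);
    }
    // Clamp every edge into the valid image range.
    x0 = std::max(0, std::min(x0, cols)); x1 = std::max(0, std::min(x1, cols));
    y0 = std::max(0, std::min(y0, rows)); y1 = std::max(0, std::min(y1, rows));
    return {x0, y0, x1 - x0, y1 - y0};
}
```

Taking the extremes over all four corners matters for slightly rotated text, where the top-left and bottom-right points alone can cut off part of the region.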

# Ocrv5 API Usage
The API is called in the following steps:
- Instantiate the class
- Read the test image
- Call the recognition interface

Example:
```c++
int main(int argc, char** argv){
    std::string det_model_onnx = "../Resource/Models/ppocrv5_server_det_infer.onnx";
    std::string rec_model_onnx = "../Resource/Models/ppocrv5_server_rec_infer.onnx";
    std::string img_path = "../Resource/Images/demo.png";
    std::string character_dict_path = "../Resource/ppocr_keys_v5.txt";
    std::string front = "../Resource/fonts/SimHei.ttf";  // font used when drawing the OCR result
    float segm_thres=0.3;
    float box_thresh=0.3;
    // NOTE: this call passes a font path as the fourth argument, which the
    // ppOcrEngine declaration shown in the class section does not list.
    ppOcrEngine ocr_engine(det_model_onnx,
        rec_model_onnx,
        character_dict_path,
        front,
        segm_thres,
        box_thresh,
        true,      // offload_copy
        "fp16");   // precision_mode

    cv::Mat img=cv::imread(img_path);
    if (img.empty())
    {
        std::cout << "Failed to read image: " << img_path << "\n";
        return 1;
    }
    ocr_engine.forward(img);
    return 0;
}
```
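The sample supports two inference precisions, fp32 and fp16, with fp16 as the default. Both the precision and the memory copy mode are set through arguments to the ocr_engine constructor.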