Tutorial_Python.md

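# Overview
PP-OCRv5 is the latest generation of the PP-OCR text recognition solution, targeting multiple scenarios and multiple script types. It supports five major script types: Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese. It also improves recognition in challenging scenarios such as complex Chinese and English handwriting, vertical text, and rare characters. On an internal multi-scenario evaluation set, PP-OCRv5 improves end-to-end performance by 13 percentage points over PP-OCRv4. This sample adapts the PP-OCRv5 text detection and recognition models and runs inference through the Python interface of MIGraphX 5.0.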

## Model Overview

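### Text Detection
Text detection uses DBNet (paper: https://arxiv.org/pdf/1911.08947). Network structure: ![alt text](Images/DBNet.png)

The model outputs a probability map, and the text-region polygons are then processed with the Vatti clipping algorithm. The sample uses a dynamic input shape (N,3,H,W) with a maximum input shape of [1,3,640,640]. Model path: Resource/Models/ppocrv5_server_det_infer.onnx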
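
To make the polygon step more concrete, here is a minimal sketch of how the offset distance is commonly derived in DB-style post-processing: the polygon area times an expansion ratio (the role played by the `unclip_ratio` parameter of TextDetector), divided by the perimeter. The helper names are hypothetical and are not taken from the sample. The offsetting itself would be done by a polygon-clipping library and is not shown.

```python
import numpy as np

def polygon_area(points: np.ndarray) -> float:
    """Area of a simple polygon via the shoelace formula."""
    x, y = points[:, 0], points[:, 1]
    return float(0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1))))

def polygon_perimeter(points: np.ndarray) -> float:
    """Sum of the edge lengths of a closed polygon."""
    edges = np.roll(points, -1, axis=0) - points
    return float(np.linalg.norm(edges, axis=1).sum())

def unclip_distance(points: np.ndarray, unclip_ratio: float = 2.0) -> float:
    """Offset distance = area * unclip_ratio / perimeter."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)

# A 100 x 20 text box: area 2000, perimeter 240
box = np.array([[0, 0], [100, 0], [100, 20], [0, 20]], dtype=np.float32)
distance = unclip_distance(box, unclip_ratio=2.0)
```

A larger ratio grows the box further outwards, which helps when the predicted region hugs the glyphs too tightly.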
### Text Recognition
Text recognition uses CRNN with CTC decoding (https://arxiv.org/pdf/2009.09941). Network structure: ![alt text](Images/CRNN.png)

The sample uses a dynamic input shape (N,3,48,W) with a maximum input shape of [1,3,48,720]. Model path: Resource/Models/ppocrv5_server_rec_infer.onnx
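
As an illustration of the CTC decoding step, the sketch below applies greedy decoding to one (T, C) probability matrix: take the best class per time step, drop repeated classes, and drop the blank. It is not the sample's CTCLabelDecode implementation. The function name, the toy character set, and the choice of index 0 as the blank are assumptions made for the example.

```python
import numpy as np

def ctc_greedy_decode(probs: np.ndarray, charset: list, blank: int = 0):
    """Decode one (T, C) probability matrix into (text, mean confidence)."""
    indices = probs.argmax(axis=1)
    scores = probs.max(axis=1)
    chars, confs = [], []
    prev = blank
    for idx, score in zip(indices, scores):
        # Keep a step only if it is neither blank nor a repeat of the previous step
        if idx != blank and idx != prev:
            chars.append(charset[idx])
            confs.append(float(score))
        prev = idx
    conf = float(np.mean(confs)) if confs else 0.0
    return "".join(chars), conf

charset = ["<blank>", "a", "b"]  # assumption: index 0 is the CTC blank
steps = [1, 1, 0, 1, 2, 2]       # best class per time step
probs = np.full((len(steps), len(charset)), 0.05, dtype=np.float32)
probs[np.arange(len(steps)), steps] = 0.9
text, conf = ctc_greedy_decode(probs, charset)
```

The blank between the two runs of `a` is what lets the decoder emit a doubled character: `a a <blank> a b b` becomes `aab`.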
  																										

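## Preprocessing
### Detection Model Preprocessing
Preprocessing of the detection model's input:
- Resize the image while keeping its aspect ratio, then pad it along the right and bottom edges
- Normalize the image: subtract the mean and divide by the standard deviation
- Transpose to the [N,C,H,W] layout that MIGraphX expects

The sample implements these steps mainly with OpenCV: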

```python
import cv2
import numpy as np

# Methods of the text detection class (excerpt)
def preprocess(self, src_img,
               mean=(0.485, 0.456, 0.406),
               std=(0.229, 0.224, 0.225),
               scale: float = 1.0 / 255):
    data = dict()
    img = src_img.copy()
    src_h, src_w, _ = img.shape
    # Resize with a fixed aspect ratio so that text regions are not distorted
    res_img, [ratio_h, ratio_w] = self.resize_image(img)
    # Scale to [0, 1], subtract the mean, divide by the standard deviation
    norm_img = (res_img * scale - mean) / std
    # HWC -> CHW
    image_data = norm_img.transpose(2, 0, 1)
    # CHW -> NCHW
    image_data = np.expand_dims(image_data, axis=0).astype(np.float32)
    image_data = np.ascontiguousarray(image_data)

    data["image"] = image_data
    data["shape"] = np.array([src_h, src_w, ratio_h, ratio_w])
    return data

def resize_image(self, img):
    h, w, _ = img.shape
    # Fit the longer side to the model input size
    if h > w:
        ratio = float(self.db_input_size[1]) / h
    else:
        ratio = float(self.db_input_size[0]) / w
    resize_h = int(h * ratio)
    resize_w = int(w * ratio)

    # Round both sides to a multiple of 32 (at least 32)
    resize_h = max(int(round(resize_h / 32) * 32), 32)
    resize_w = max(int(round(resize_w / 32) * 32), 32)

    try:
        img = cv2.resize(img, (resize_w, resize_h))
    except Exception as exc:
        print(img.shape, resize_w, resize_h)
        raise ValueError("resize error") from exc
    ratio_h = resize_h / float(h)
    ratio_w = resize_w / float(w)

    # Zero-pad on the right and bottom up to the full input size
    im_pad = np.zeros((self.db_input_size[1], self.db_input_size[0], 3), np.float32)
    im_pad[:resize_h, :resize_w, :] = img
    return im_pad, [ratio_h, ratio_w]
```
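
The `shape` entry returned by `preprocess` stores the source size and the two resize ratios, which is what is needed to map detected points back onto the original image. The sketch below shows one way to do that mapping; `to_source_coords` is a hypothetical helper, and the sample's own post-processing may differ in rounding and clipping details.

```python
import numpy as np

def to_source_coords(box: np.ndarray, shape: np.ndarray) -> np.ndarray:
    """Map (x, y) points from the resized input back onto the source image."""
    src_h, src_w, ratio_h, ratio_w = shape
    mapped = box.astype(np.float32)
    # Undo the resize, then keep the points inside the source image
    mapped[:, 0] = np.clip(np.round(mapped[:, 0] / ratio_w), 0, src_w - 1)
    mapped[:, 1] = np.clip(np.round(mapped[:, 1] / ratio_h), 0, src_h - 1)
    return mapped.astype(np.int32)

# A 1280 x 960 (W x H) image resized to 640 x 480: both ratios are 0.5
shape = np.array([960, 1280, 0.5, 0.5])
box = np.array([[10, 20], [110, 20], [110, 50], [10, 50]])
mapped = to_source_coords(box, shape)
```

Because the padding is added only on the right and bottom, the top-left origin is unchanged and no offset has to be subtracted before dividing by the ratios.
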
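### Recognition Model Preprocessing
Preprocessing of the recognition model's input:
- Resize the image to the model input height while keeping its aspect ratio, then pad it along the right and bottom edges
- Normalize the image; the mean and standard deviation both default to 0.5
- Transpose to the [N,C,H,W] layout that MIGraphX expects

```python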
def preprocess(self, img, max_wh_ratio):
    # max_wh_ratio is an output list: the resize ratio of this image is appended to it
    if not isinstance(max_wh_ratio, list):
        raise TypeError("max_wh_ratio must be list")

    max_h, max_w = self.rec_input_size  # (H, W)
    h, w = img.shape[:2]

    # Scale so that the height equals max_h, keeping the aspect ratio;
    # the width is capped at max_w
    ratio = max_h / h
    resized_w = min(int(w * ratio), max_w)
    re_size = (resized_w, max_h)  # cv2.resize expects (width, height)

    max_wh_ratio.append(ratio)
    resized_image = cv2.resize(img, re_size)
    resized_image = resized_image.astype("float32")
    # Normalize: HWC -> CHW, scale to [0, 1], then map to [-1, 1]
    resized_image = resized_image.transpose((2, 0, 1)) / 255
    resized_image -= 0.5
    resized_image /= 0.5

    # Zero-pad on the right up to the maximum input width
    padding_im = np.zeros((3, max_h, max_w), dtype=np.float32)
    padding_im[:, :, 0:re_size[0]] = resized_image

    return padding_im
```
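
The resize arithmetic above can be checked in isolation. The following sketch reproduces only the size computation for the default (48, 720) input; `rec_resize_plan` is a hypothetical helper introduced for illustration, not part of the sample.

```python
def rec_resize_plan(h: int, w: int, max_h: int = 48, max_w: int = 720):
    """Return (resized_w, pad_right, ratio) for a crop of size h x w."""
    ratio = max_h / h
    resized_w = min(int(w * ratio), max_w)
    return resized_w, max_w - resized_w, ratio

# A 32 x 200 crop is scaled by 1.5 to 48 x 300 and padded with 420 columns
plan_short = rec_resize_plan(32, 200)
# A 24 x 600 crop would become 48 x 1200, so its width is capped at 720
plan_long = rec_resize_plan(24, 600)
```

Note that when the width is capped, the crop is squeezed horizontally, so the aspect ratio is preserved only for crops that fit within the maximum width.
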
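## Class Reference
PPOcrV5 wraps the public API. TextDetector is the text detection class and TextRecgnizer is the text recognition class. BaseRecLabelDecode converts between the index sequences output by the model and the actual text labels. CTCLabelDecode inherits from BaseRecLabelDecode and decodes the recognition model's output, turning the predicted probabilities into characters and joining them into text.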

```python
class PPOcrV5():
    def __init__(self,
        det_model_path: str,
        rec_model_path: str,
        char_dict_path: str = "../Resource/ppocr_keys_v5.txt",
        db_input_size: tuple = (640, 640),
        rec_input_size: tuple = (48, 720),
        seg_thresh: float = 0.3,
        box_thresh: float = 0.7,
        precision_mode: str = 'fp32',
        offload_copy: bool = True,
        **kwargs
        ):
        """Initialize OCR detection and recognition inference.

        Text detection, character encoding, and recognition.

        Args:
            det_model_path: Path to the text detection model.
            rec_model_path: Path to the text recognition model.
            char_dict_path: Path to the character set file.
            db_input_size: Input size of the detection model.
            rec_input_size: Input size of the recognition model.
            seg_thresh: Pixel segmentation threshold.
            box_thresh: Threshold for text-region boxes.
            precision_mode: Precision mode, either fp32 or fp16.
            offload_copy: Data-copy mode. When True, no explicit memory copy is
                needed. When False, device memory for the inputs and outputs
                must be pre-allocated; the preprocessed data is copied to device
                memory before inference and the model output is copied back
                from device memory afterwards.
            **kwargs: Post-processing parameters of the text detection model.

        Returns:
            None.

        Examples:
            det_onnx_path = "PATH/TO/det_onnx_model.onnx"
            rec_onnx_path = "PATH/TO/rec_onnx_model.onnx"
            image_path = "PATH/TO/test.png"
            ppocrv5 = PPOcrV5(det_onnx_path, rec_onnx_path, offload_copy=True)
        """


class TextDetector(object):
    def __init__(
        self,
        det_model_path,
        db_input_size=(640, 640),
        thresh=0.3,
        box_thresh=0.7,
        max_candidates=1000,
        unclip_ratio=2.0,
        use_dilation=False,
        score_mode="fast",
        box_type="quad",
        precision_mode="float32",
        **kwargs,
    ):
        """Initialize the text detection model (DBNet).

        Args:
            det_model_path: Path to the text detection model.
            db_input_size: Input size of the detection model.
            thresh: Pixel segmentation threshold.
            box_thresh: Threshold for text-region boxes.
            max_candidates: Maximum number of candidate text regions.
            unclip_ratio: Expansion ratio applied to the polygons.
            use_dilation: Whether to dilate the binarized map.
            score_mode: Scoring mode.
            box_type: Box type, "quad" (rectangle) or "poly" (polygon);
                defaults to "quad".
            precision_mode: Precision mode: "fp16", "int8", or "float32".
            offload_copy: Data-copy mode (not in the signature shown above; the
                example below passes it as a keyword argument). When True, no
                explicit memory copy is needed. When False, device memory for
                the inputs and outputs must be pre-allocated; the preprocessed
                data is copied to device memory before inference and the model
                output is copied back from device memory afterwards.
            **kwargs: Post-processing parameters of the text detection model.

        Returns:
            None.

        Examples:
            self.db_detector = TextDetector(
                det_model_path,
                db_input_size,
                thresh=self.seg_thres,
                box_thresh=self.box_thresh,
                max_candidates=self.max_candidates,
                unclip_ratio=self.unclip_ratio,
                box_type=self.box_type,
                use_dilation=self.use_dilation,
                score_mode=self.score_mode,
                precision_mode=precision_mode,
                offload_copy=offload_copy)
        """

class TextRecgnizer(object):
    """Support SVTR_LCNet """
    def __init__(
        self,
        rec_model_path,
        rec_batch_num=2,
        rec_input_size=(48, 480),  # (H, W)
        rec_algorithm="SVTR_LCNet",
        precision_mode="fp32",
        **kwargs
    ):
        """Initialize the text recognition model (CRNN + CTC).

        Args:
            rec_model_path: Path to the text recognition model.
            rec_batch_num: Batch size used for inference.
            rec_input_size: Maximum input size for inference.
            rec_algorithm: Type of post-processing algorithm.
            precision_mode: Precision mode: "fp16" or "float32".
            **kwargs: Post-processing parameters of the text recognition model.

        Returns:
            None.

        Examples:
            self.text_extractor = TextRecgnizer(rec_model_path=rec_model_path,
                                                rec_input_size=rec_input_size,
                                                precision_mode=precision_mode,
                                                offload_copy=offload_copy)
        """

class BaseRecLabelDecode(object):
    def __init__(self, character_dict_path=None, use_space_char=False):
        """Convert between text-label and text-index.

        Text recognition (CRNN + CTC).

        Args:
            character_dict_path: Path to the character set file.
            use_space_char: Whether the character set includes the space character.

        Returns:
            None.
        """

class CTCLabelDecode(BaseRecLabelDecode):
    def __init__(self, character_dict_path=None, use_space_char=False, **kwargs):
        """Convert between text-label and text-index.

        Text recognition (CRNN + CTC).

        Args:
            character_dict_path: Path to the character set file.
            use_space_char: Whether the character set includes the space character.

        Returns:
            None.
        """
        super(CTCLabelDecode, self).__init__(character_dict_path, use_space_char)
```

 
## Inference
### Text Detection Inference
```python
def __call__(self, src_img):
    data = self.preprocess(src_img)
    # Two data-copy modes are supported. With offload_copy=True no explicit
    # memory copy is needed. With offload_copy=False the input/output device
    # memory is pre-allocated, the preprocessed data is copied to the device
    # before inference, and the output is copied back before post-processing.
    if not self.offload_copy:
        self.d_mem[self.det_input_name] = migraphx.to_gpu(migraphx.argument(data["image"]))
        results = self.db_model.run(self.d_mem)
        # Copy the inference result from the GPU back to the CPU
        result = migraphx.from_gpu(results[0])
    else:
        results = self.db_model.run({self.det_input_name: data["image"]})
        result = results[0]

    shape_list = np.expand_dims(data["shape"], axis=0)
    pred = np.array(result)
    pred = pred[:, 0, :, :]
    # Binarize: keep pixels whose probability exceeds the threshold
    segmentation = pred > self.thresh
    boxes_batch = []
    for batch_index in range(pred.shape[0]):
        src_h, src_w, ratio_h, ratio_w = shape_list[batch_index]
        if self.dilation_kernel is not None:
            mask = cv2.dilate(
                np.array(segmentation[batch_index]).astype(np.uint8),
                self.dilation_kernel,
            )
        else:
            mask = segmentation[batch_index]
        # Extract text regions from the predicted bitmap
        if self.box_type == "poly":
            boxes, scores = self.polygons_from_bitmap(
                pred[batch_index], mask, ratio_w, ratio_h, src_w, src_h
            )
        elif self.box_type == "quad":
            boxes, scores = self.boxes_from_bitmap(
                pred[batch_index], mask, ratio_w, ratio_h, src_w, src_h
            )
        else:
            raise ValueError("box_type can only be one of ['quad', 'poly']")

        boxes_batch.append(boxes)
    # Sort text regions top to bottom, then left to right
    det_box_batch = self.sorted_boxes(boxes_batch)
    # Map the text regions back to coordinates in the original image
    dt_boxes, det_rects = self.box_standardization(det_box_batch, shape_list)
    return dt_boxes, det_rects
```
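
The sorting step near the end of the detection code puts boxes into reading order. The body of `sorted_boxes` is not shown in this tutorial, so the sketch below is only one plausible way to sort the boxes of a single image: order by the top-left corner, then re-order by x any neighbours whose top edges are within a small tolerance. The function name, the `quad` helper, and the 10-pixel tolerance are assumptions.

```python
import numpy as np

def sort_reading_order(boxes, line_tol: float = 10.0):
    """Sort quads top-to-bottom, then left-to-right within a line."""
    # First pass: order by the top-left corner, y first and then x
    ordered = sorted(boxes, key=lambda b: (b[0][1], b[0][0]))
    # Second pass: neighbours whose top edges differ by less than line_tol
    # are treated as one line and re-ordered by x (insertion-sort style)
    for i in range(1, len(ordered)):
        j = i
        while (j > 0
               and abs(ordered[j][0][1] - ordered[j - 1][0][1]) < line_tol
               and ordered[j][0][0] < ordered[j - 1][0][0]):
            ordered[j], ordered[j - 1] = ordered[j - 1], ordered[j]
            j -= 1
    return ordered

def quad(x, y, w=100, h=20):
    """Axis-aligned quad given by its top-left corner (illustration only)."""
    return np.array([[x, y], [x + w, y], [x + w, y + h], [x, y + h]])

boxes = [quad(50, 80), quad(200, 12), quad(10, 15)]
ordered = sort_reading_order(boxes)
top_left = [tuple(int(v) for v in b[0]) for b in ordered]
```

Here the boxes at y=12 and y=15 count as the same line, so the one starting at x=10 comes first even though its top edge is slightly lower.
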
### Text Recognition Inference
```python
def __call__(self, batch_img_list):
    if len(batch_img_list) == 0:
        return []

    input_batch = self.rec_batch_num
    batch_outputs_pre = []
    batch_max_wh_ratio_pre = []
    # Each entry of batch_img_list holds the text-region crops (ROIs) of one image
    for img_list in batch_img_list:
        # Process the crops in order of increasing aspect ratio (width / height)
        width_list = [img.shape[1] / float(img.shape[0]) for img in img_list]
        indices = np.argsort(np.array(width_list))

        im_count = len(img_list)
        batch_outputs = []
        batch_max_wh_ratio = []
        for beg_img_no in range(0, im_count, input_batch):
            end_img_no = min(im_count, beg_img_no + input_batch)

            batch_norm_imgs = []
            max_wh_ratio = []  # preprocess appends one resize ratio per crop
            for ino in range(beg_img_no, end_img_no):
                # Preprocess a single crop and add a leading batch axis
                norm_img = self.preprocess(img_list[indices[ino]], max_wh_ratio)
                norm_img = norm_img[np.newaxis, :].astype(np.float32)
                batch_norm_imgs.append(norm_img)

            batch_max_wh_ratio.append(max_wh_ratio)
            # Concatenate the crops into one (N, 3, H, W) input for batched inference
            norm_img_batch = np.concatenate(batch_norm_imgs)

            if not self.offload_copy:
                self.d_mem[self.rec_input_name] = migraphx.to_gpu(migraphx.argument(norm_img_batch))
                results = self.rec_model.run(self.d_mem)
                # Copy the result back to the CPU, as in the detection path
                output = np.array(migraphx.from_gpu(results[0]))
            else:
                results = self.rec_model.run({self.rec_input_name: norm_img_batch})
                output = results[0]

            # Collect the per-crop outputs of every batch for post-processing
            batch_outputs.extend(np.array(output))

        batch_outputs_pre.append(np.array(batch_outputs))
        batch_max_wh_ratio_pre.append(batch_max_wh_ratio)

    return batch_outputs_pre, batch_max_wh_ratio_pre
```
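
Because the crops are processed in order of increasing aspect ratio, the outputs come back in that order rather than in input order. The sketch below shows how the batches of indices are formed and how results could be put back into input order with the same permutation. Both helper names are hypothetical, and the tutorial does not show whether the sample restores the order this way.

```python
import numpy as np

def batched_sorted_indices(aspect_ratios, batch_size):
    """Yield index batches in order of increasing aspect ratio."""
    order = np.argsort(np.asarray(aspect_ratios), kind="stable")
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

def restore_order(sorted_results, aspect_ratios):
    """Put results produced in sorted order back into input order."""
    order = np.argsort(np.asarray(aspect_ratios), kind="stable")
    restored = [None] * len(order)
    for pos, original_index in enumerate(order):
        restored[original_index] = sorted_results[pos]
    return restored

ratios = [5.0, 1.5, 9.0, 3.0]  # width / height of four crops
batches = [b.tolist() for b in batched_sorted_indices(ratios, batch_size=2)]
# Stand-in for model outputs: one label per crop, in processing order
sorted_results = [f"crop{i}" for batch in batches for i in batch]
restored = restore_order(sorted_results, ratios)
```

With these ratios the crops are processed as `[1, 3]` and then `[0, 2]`, and `restore_order` returns the labels in the original order 0 to 3.
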
# PPOcrV5 API Usage
Calling the API takes two steps:
- Instantiate the class
- Call the recognition interface

Example:
```python
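import cv2

# PPOcrV5 is the class described above; import it from the sample's module as appropriate.
if __name__ == '__main__':
    det_onnx_path = "../Resource/Models/ppocrv5_server_det_infer.onnx"
    rec_onnx_path = "../Resource/Models/ppocrv5_server_rec_infer.onnx"
    image_path = "../Resource/Images/lite_demo.png"
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    ppocrv5 = PPOcrV5(det_onnx_path, rec_onnx_path, offload_copy=True, precision_mode="fp32")
    # The call returns an image, which is saved to res.jpg
    res_img = ppocrv5(img)
    cv2.imwrite("res.jpg", res_img)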
```
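The sample supports two inference precisions, fp32 and fp16, with fp32 as the default. The precision and the memory-copy mode are controlled by the precision_mode and offload_copy parameters, respectively.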