Commit 4ffb5b62 authored by zhoujun, committed by GitHub

Merge pull request #924 from WenmuZhou/dygraph

Dygraph
parents bc93c549 aad3093a
# RECENT UPDATES
- 2020.8.24 Support using PaddleOCR through whl package installation; please refer to [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md)
- 2020.8.16 Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
- 2020.7.23 Release the playback and PPT of the live class on Bilibili, PaddleOCR Introduction, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519)
- 2020.7.15 Add mobile App demo, supporting both iOS and Android (based on EasyEdge and Paddle Lite)
- 2020.7.15 Improve the deployment ability: add C++ inference and serving deployment. In addition, benchmarks of the ultra-lightweight Chinese OCR model are provided.
- 2020.7.15 Add several related datasets, data annotation and synthesis tools.
- 2020.7.9 Add a new model that supports recognizing the space character.
- 2020.7.9 Add data augmentation and learning rate decay strategies during training.
- 2020.6.8 Add [datasets](./doc/doc_en/datasets_en.md) and keep updating
- 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support running detection and recognition prediction separately, and output the result score
- 2020.5.30 Provide Lightweight Chinese OCR online experience
- 2020.5.30 Model prediction and training support on Windows system
- 2020.5.30 Open source general Chinese OCR model
- 2020.5.14 Release [PaddleOCR Open Class](https://www.bilibili.com/video/BV1nf4y1U7RX?p=4)
- 2020.5.14 Release [PaddleOCR Practice Notebook](https://aistudio.baidu.com/aistudio/projectdetail/467229)
- 2020.5.14 Open source 8.6M lightweight Chinese OCR model
# Vertical multi-language OCR dataset
Here we have collected commonly used vertical multi-language OCR datasets. The list is updated continuously, and contributions of new datasets are welcome.
- [Chinese urban license plate dataset](#Chinese urban license plate dataset)
- [Bank credit card dataset](#Bank credit card dataset)
- [Captcha dataset-Captcha](#Captcha dataset-Captcha)
- [multi-language dataset](#multi-language dataset)
<a name="Chinese urban license plate dataset"></a>
## Chinese urban license plate dataset
- **Data source**: [https://github.com/detectRecog/CCPD](https://github.com/detectRecog/CCPD)
- **Data introduction**: It contains more than 250,000 vehicle license plate images with license plate detection and recognition annotations. The images cover the following scenes:
  * CCPD-Base: General license plate pictures
  * CCPD-DB: The license plate area is bright, dark, or unevenly lit
  * CCPD-FN: The license plate is relatively far from or close to the camera
  * CCPD-Rotate: The license plate is rotated (horizontal 20\~50 degrees, vertical -10\~10 degrees)
  * CCPD-Tilt: The license plate is tilted (horizontal 15\~45 degrees, vertical 15\~45 degrees)
  * CCPD-Blur: The license plate is blurred due to camera lens jitter
  * CCPD-Weather: The license plate is photographed on rainy, snowy, or foggy days
  * CCPD-Challenge: The most challenging images so far for license plate detection and recognition tasks
  * CCPD-NP: Pictures of new cars without license plates
![](../datasets/ccpd_demo.png)
- **Download address**
  * Baidu cloud download address (extraction code: hm0U): [https://pan.baidu.com/s/1i5AOjAbtkwb17Zy-NQGqkw](https://pan.baidu.com/s/1i5AOjAbtkwb17Zy-NQGqkw)
  * Google Drive download address: [https://drive.google.com/file/d/1rdEsCUcIUaYOVRkx5IMTRNA7PcGMmSgc/view](https://drive.google.com/file/d/1rdEsCUcIUaYOVRkx5IMTRNA7PcGMmSgc/view)
<a name="Bank credit card dataset"></a>
## Bank credit card dataset
- **Data source**: [https://www.kesci.com/home/dataset/5954cf1372ead054a5e25870](https://www.kesci.com/home/dataset/5954cf1372ead054a5e25870)
- **Data introduction**: There are three types of training data
  * 1. Sample card data of China Merchants Bank: card images and annotation data, 618 pictures in total.
  * 2. Single character data: pictures and annotation data, 37 pictures in total.
  * 3. Other bank cards only, without more detailed information, 50 pictures in total.
  * The demo image is shown below. The annotation information is stored in Excel, and the demo image is annotated as:
    * First 8 digits of the card number: 62257583
    * Card type: card of our bank
    * Expiry date: 07/41
    * Cardholder name in pinyin: MICHAEL
![](../datasets/cmb_demo.jpg)
- **Download address**: [https://cdn.kesci.com/cmb2017-2.zip](https://cdn.kesci.com/cmb2017-2.zip)
<a name="Captcha dataset-Captcha"></a>
## Captcha dataset-Captcha
- **Data source**: [https://github.com/lepture/captcha](https://github.com/lepture/captcha)
- **Data introduction**: This is a toolkit for data synthesis. It outputs captcha images according to the input text. Several demo images generated with the toolkit are shown below.
![](../datasets/captcha_demo.png)
- **Download address**: The dataset is generated and has no download address.
<a name="multi-language dataset"></a>
## multi-language dataset (Multi-lingual scene text detection and recognition)
- **Data source**: [https://rrc.cvc.uab.es/?ch=15&com=downloads](https://rrc.cvc.uab.es/?ch=15&com=downloads)
- **Data introduction**: The multi-language dataset MLT contains both detection and language recognition tasks.
* In the detection task, the training set contains 10000 images in 10 languages, and each language contains 1000 training images. The test set contains 10000 images.
* In the recognition task, the training set contains 111998 samples.
- **Download address**: The training set is large and can be downloaded in two parts. It can only be downloaded after registering on the website:
[https://rrc.cvc.uab.es/?ch=15&com=downloads](https://rrc.cvc.uab.es/?ch=15&com=downloads)
# Visualization
- [Chinese/English OCR Visualization (Space_support)](#Space_support)
- [Ultra-lightweight Chinese/English OCR Visualization](#Ultra-lightweight)
- [General Chinese/English OCR Visualization](#General)
<a name="Space_support"></a>
## Chinese/English OCR Visualization (Space_support)
### Ultra-lightweight Model
<div align="center">
<img src="../imgs_results/img_11.jpg" width="800">
</div>
### General OCR Model
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/en_paper.jpg" width="800">
</div>
<a name="Ultra-lightweight"></a>
## Ultra-lightweight Chinese/English OCR Visualization
<div align="center">
<img src="../imgs_results/1.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/7.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/12.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/4.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/6.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/9.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/16.png" width="800">
</div>
<div align="center">
<img src="../imgs_results/22.jpg" width="800">
</div>
<a name="General"></a>
## General Chinese/English OCR Visualization
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/11.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/2.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/8.jpg" width="800">
</div>
# paddleocr package
## Get started quickly
### Install package
Install from PyPI
```bash
pip install paddleocr
```
Build your own whl package and install it
```bash
python setup.py bdist_wheel
pip install dist/paddleocr-0.0.3-py3-none-any.whl
```
### 1. Use by code
* detection and recognition
```python
from paddleocr import PaddleOCR, draw_ocr

ocr = PaddleOCR()  # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path)
for line in result:
    print(line)

# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]      # bounding box of each text line
txts = [line[1][0] for line in result]    # recognized text
scores = [line[1][1] for line in result]  # recognition confidence
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
The output is a list; each item contains the bounding box, text, and recognition confidence.
```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```
Visualization of results
<div align="center">
<img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>
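
Each item printed by the detection and recognition call above has the form `[box, [text, score]]`. As a minimal sketch of how such a list can be post-processed, the snippet below keeps only the lines whose confidence reaches a threshold. The sample data is copied from the output shown above; the helper name `filter_by_score` and the threshold `0.95` are illustrative assumptions, not part of the paddleocr package.

```python
# Sample items copied from the output above; each item is [box, [text, score]].
result = [
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]],
     ['ACKNOWLEDGEMENTS', 0.99283075]],
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]],
     ['We would like to thank all the designers and', 0.9357758]],
    [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]],
     ['contributors whohave been involved in the', 0.9592447]],
]


def filter_by_score(items, min_score):
    """Return (text, score) pairs whose confidence is at least min_score.

    Hypothetical helper for illustration; not provided by paddleocr.
    """
    kept = []
    for _box, (text, score) in items:
        if score >= min_score:
            kept.append((text, score))
    return kept


confident = filter_by_score(result, min_score=0.95)
for text, score in confident:
    print(f'{score:.3f}  {text}')
```

The same unpacking pattern (`box, (text, score)`) works on the list returned by `ocr.ocr(img_path)` as long as it follows the item layout shown above.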
* only detection
```python
from paddleocr import PaddleOCR, draw_ocr

ocr = PaddleOCR()  # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path, rec=False)  # rec=False skips recognition
for line in result:
    print(line)

# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
The output is a list; each item contains only the bounding box.
```bash
[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
[[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
[[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
......
```
Visualization of results
<div align="center">
<img src="../imgs_results/whl/12_det.jpg" width="800">
</div>
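
In detection-only mode each item is a quadrilateral given as four `[x, y]` points, and the points are not always axis-aligned (see the second box in the output above). The sketch below converts such a quadrilateral into an axis-aligned rectangle `(left, top, right, bottom)`. The helper name `quad_to_rect` is an assumption made for illustration and is not part of paddleocr; the sample boxes are copied from the output above.

```python
def quad_to_rect(quad):
    """Return the axis-aligned rectangle (left, top, right, bottom)
    that encloses a four-point box. Hypothetical helper for illustration."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    return min(xs), min(ys), max(xs), max(ys)


# Sample boxes copied from the detection-only output above.
boxes = [
    [[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]],
    [[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]],
    [[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]],
]

rects = [quad_to_rect(box) for box in boxes]
for rect in rects:
    print(rect)
```

A rectangle in this `(left, top, right, bottom)` order matches the tuple layout that Pillow's `Image.crop` expects, so it could be used to cut a detected region out of the image before passing it on to a separate step.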
* only recognition
```python
from paddleocr import PaddleOCR

ocr = PaddleOCR()  # need to run only once to load model into memory
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
result = ocr.ocr(img_path, det=False)  # det=False skips detection
for line in result:
    print(line)
```
The output is a list; each item contains the text and recognition confidence.
```bash
['PAIN', 0.990372]
```
### 2. Use by command line
Show help information
```bash
paddleocr -h
```
* detection and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg
```
The output is a list; each item contains the bounding box, text, and recognition confidence.
```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```
* only detection
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --rec false
```
The output is a list; each item contains only the bounding box.
```bash
[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
[[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
[[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
......
```
* only recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false
```
The output is a list; each item contains the text and recognition confidence.
```bash
['PAIN', 0.990372]
```
## Use custom model
When the built-in models do not meet your needs, you can use your own trained models.
First, refer to the first section of [inference_en.md](./inference_en.md) to convert your detection and recognition models to inference models, and then use them as follows:
### 1. Use by code
```python
from paddleocr import PaddleOCR, draw_ocr

# The detection and recognition model paths must contain the model and params files
ocr = PaddleOCR(det_model_dir='{your_det_model_dir}', rec_model_dir='{your_rec_model_dir}')
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path)
for line in result:
    print(line)

# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]      # bounding box of each text line
txts = [line[1][0] for line in result]    # recognized text
scores = [line[1][1] for line in result]  # recognition confidence
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
### 2. Use by command line
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir}
```
## Parameter Description
| Parameter | Description | Default value |
|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|
| use_gpu | use GPU or not | TRUE |
| gpu_mem | GPU memory size used for initialization | 8000M |
| image_dir | The images path or folder path for predicting when used by the command line | |
| det_algorithm | Type of detection algorithm selected | DB |
| det_model_dir           | The text detection inference model folder. There are two ways to set this parameter: 1. None: automatically download the built-in model to `~/.paddleocr/det`; 2. the path of an inference model you converted yourself, which must contain the model and params files | None           |
| det_max_side_len | The maximum size of the long side of the image. When the long side exceeds this value, the long side will be resized to this size, and the short side will be scaled proportionally | 960 |
| det_db_thresh | Binarization threshold value of DB output map | 0.3 |
| det_db_box_thresh | The threshold value of the DB output box. Boxes score lower than this value will be discarded | 0.5 |
| det_db_unclip_ratio | The expanded ratio of DB output box | 2 |
| det_east_score_thresh | Binarization threshold value of EAST output map | 0.8 |
| det_east_cover_thresh | The threshold value of the EAST output box. Boxes score lower than this value will be discarded | 0.1 |
| det_east_nms_thresh | The NMS threshold value of EAST model output box | 0.2 |
| rec_algorithm | Type of recognition algorithm selected | CRNN |
| rec_model_dir           | The text recognition inference model folder. There are two ways to set this parameter: 1. None: automatically download the built-in model to `~/.paddleocr/rec`; 2. the path of an inference model you converted yourself, which must contain the model and params files | None |
| rec_image_shape | image shape of recognition algorithm | "3,32,320" |
| rec_char_type | Character type of recognition algorithm, Chinese (ch) or English (en) | ch |
| rec_batch_num | When performing recognition, the batchsize of forward images | 30 |
| max_text_length | The maximum text length that the recognition algorithm can recognize | 25 |
| rec_char_dict_path      | The character dictionary path, which must be changed to your own path when `rec_model_dir` uses mode 2 | ./ppocr/utils/ppocr_keys_v1.txt |
| use_space_char | Whether to recognize spaces | TRUE |
| enable_mkldnn | Whether to enable mkldnn | FALSE |
| det                     | Enable detection when the `ppocr.ocr` function is executed | TRUE |
| rec                     | Enable recognition when the `ppocr.ocr` function is executed | TRUE |
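
To make the `det_max_side_len` description concrete, the sketch below reproduces only the arithmetic stated in the table: when the long side exceeds the limit, it is resized to the limit and the short side is scaled by the same ratio. The function name `limit_long_side` is a hypothetical helper, and the actual preprocessing inside PaddleOCR may apply further adjustments (for example, rounding sizes to values the network requires) that are not shown here.

```python
def limit_long_side(width, height, max_side_len=960):
    """Scale (width, height) so the long side does not exceed max_side_len.

    Hypothetical helper illustrating the table's description of
    det_max_side_len; not the library's actual preprocessing code.
    """
    long_side = max(width, height)
    if long_side <= max_side_len:
        # Image already fits; no resizing is needed.
        return width, height
    ratio = max_side_len / long_side
    return round(width * ratio), round(height * ratio)


print(limit_long_side(1920, 1080))  # long side 1920 exceeds 960, so both sides shrink
print(limit_long_side(800, 600))    # long side 800 is within the limit, unchanged
```

With the default of 960, a 1920x1080 image is scaled by a ratio of 0.5, while an 800x600 image is left unchanged.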