"doc/doc_ch/recognition.md" did not exist on "0cc083f6a4bf099d19b4d02091fd0b0054cac930"
Commit 6ec9aa4d authored by LDOUBLEV's avatar LDOUBLEV
Browse files

Merge branch 'dygraph' of https://github.com/PaddlePaddle/PaddleOCR into trt_cpp

parents df1c97af 65e61f44
......@@ -148,7 +148,7 @@ The visual text detection results are saved to the ./inference_results folder by
![](../imgs_results/det_res_00018069.jpg)
You can use the parameters `limit_type` and `det_limit_side_len` to limit the size of the input image,
The optional parameters of `litmit_type` are [`max`, `min`], and
The optional parameters of `limit_type` are [`max`, `min`], and
`det_limit_size_len` is a positive integer, generally set to a multiple of 32, such as 960.
The default setting of the parameters is `limit_type='max', det_limit_side_len=960`. Indicates that the longest side of the network input image cannot exceed 960,
......
## OCR model list(V2.0, updated on 2021.1.20)
**Note** : Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance.
> **Note**
> 1. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance.
> 2. All models in this tutorial are all ppocr-series models, for more introduction of algorithms and models based on public dataset, you can refer to [algorithm overview tutorial](./algorithm_overview_en.md).
- [1. Text Detection Model](#Detection)
- [2. Text Recognition Model](#Recognition)
......@@ -12,9 +14,13 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine
|model type|model format|description|
|--- | --- | --- |
|inference model|inference.pdmodel、inference.pdiparams|Used for reasoning based on Python prediction engine,[detail](./inference_en.md)|
|inference model|inference.pdmodel、inference.pdiparams|Used for inference based on Paddle inference engine,[detail](./inference_en.md)|
|trained model, pre-trained model|\*.pdparams、\*.pdopt、\*.states |The checkpoints model saved in the training process, which stores the parameters of the model, mostly used for model evaluation and continuous training.|
|slim model|\*.nb|Generally used for Lite deployment|
|slim model|\*.nb| Model compressed by PaddleSim (a model compression tool using PaddlePaddle), which is suitable for mobile-side deployment scenarios (Paddle-Lite is needed for slim model deployment). |
Relationship of the above models is as follows.
![](../imgs_en/model_prod_flow_en.png)
<a name="Detection"></a>
### 1. Text Detection Model
......
## TEXT RECOGNITION
- [DATA PREPARATION](#DATA_PREPARATION)
- [Dataset Download](#Dataset_download)
- [Costom Dataset](#Costom_Dataset)
- [Dictionary](#Dictionary)
- [Add Space Category](#Add_space_category)
- [1 DATA PREPARATION](#DATA_PREPARATION)
- [1.1 Costom Dataset](#Costom_Dataset)
- [1.2 Dataset Download](#Dataset_download)
- [1.3 Dictionary](#Dictionary)
- [1.4 Add Space Category](#Add_space_category)
- [TRAINING](#TRAINING)
- [Data Augmentation](#Data_Augmentation)
- [Training](#Training)
- [Multi-language](#Multi_language)
- [2 TRAINING](#TRAINING)
- [2.1 Data Augmentation](#Data_Augmentation)
- [2.2 Training](#Training)
- [2.3 Multi-language](#Multi_language)
- [EVALUATION](#EVALUATION)
- [3 EVALUATION](#EVALUATION)
- [PREDICTION](#PREDICTION)
- [Training engine prediction](#Training_engine_prediction)
- [4 PREDICTION](#PREDICTION)
- [4.1 Training engine prediction](#Training_engine_prediction)
<a name="DATA_PREPARATION"></a>
### DATA PREPARATION
PaddleOCR supports two data formats: `LMDB` is used to train public data and evaluation algorithms; `general data` is used to train your own data:
PaddleOCR supports two data formats:
- `LMDB` is used to train data sets stored in lmdb format;
- `general data` is used to train data sets stored in text files:
Please organize the dataset as follows:
The default storage path for training data is `PaddleOCR/train_data`, if you already have a dataset on your disk, just create a soft link to the dataset directory:
```
# linux and mac os
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
# windows
mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
```
<a name="Dataset_download"></a>
* Dataset download
If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here),download the lmdb format dataset required for benchmark
If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path.
<a name="Costom_Dataset"></a>
* Use your own dataset:
#### 1.1 Costom dataset
If you want to use your own data for training, please refer to the following to organize your data.
- Training set
First put the training images in the same folder (train_images), and use a txt file (rec_gt_train.txt) to store the image path and label.
It is recommended to put the training images in the same folder, and use a txt file (rec_gt_train.txt) to store the image path and label. The contents of the txt file are as follows:
* Note: by default, the image path and image label are split with \t, if you use other methods to split, it will cause training error
```
" Image file name Image annotation "
train_data/train_0001.jpg 简单可依赖
train_data/train_0002.jpg 用科技让复杂的世界更简单
```
PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
```
# Training set label
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# Test Set Label
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
train_data/rec/train/word_001.jpg 简单可依赖
train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
...
```
The final training set should have the following file structure:
```
|-train_data
|-ic15_data
|-rec
|- rec_gt_train.txt
|- train
|- word_001.png
......@@ -82,6 +73,7 @@ Similar to the training set, the test set also needs to be provided a folder con
```
|-train_data
|-rec
|-ic15_data
|- rec_gt_test.txt
|- test
......@@ -90,8 +82,25 @@ Similar to the training set, the test set also needs to be provided a folder con
|- word_003.jpg
| ...
```
<a name="Dataset_download"></a>
#### 1.2 Dataset download
If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) ,download the lmdb format dataset required for benchmark
If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path.
PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
```
# Training set label
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# Test Set Label
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
```
<a name="Dictionary"></a>
- Dictionary
#### 1.3 Dictionary
Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.
......@@ -108,6 +117,8 @@ n
In `word_dict.txt`, there is a single word in each line, which maps characters and numeric indexes together, e.g "and" will be mapped to [2 5 1]
PaddleOCR has built-in dictionaries, which can be used on demand.
`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.
`ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters
......@@ -123,8 +134,6 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a
`ppocr/utils/dict/en_dict.txt` is a English dictionary with 63 characters
You can use it on demand.
The current multi-language model is still in the demo stage and will continue to optimize the model and add languages. **You are very welcome to provide us with dictionaries and fonts in other languages**,
If you like, you can submit the dictionary file to [dict](../../ppocr/utils/dict) and we will thank you in the Repo.
......@@ -136,14 +145,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co
If you need to customize dic file, please add character_dict_path field in configs/rec/rec_icdar15_train.yml to point to your dictionary path. And set character_type to ch.
<a name="Add_space_category"></a>
- Add space category
#### 1.4 Add space category
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
**Note: use_space_char only takes effect when character_type=ch**
<a name="TRAINING"></a>
### TRAINING
### 2 TRAINING
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
......@@ -166,7 +175,7 @@ Start training:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml
```
<a name="Data_Augmentation"></a>
- Data Augmentation
#### 2.1 Data Augmentation
PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file.
......@@ -175,7 +184,7 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand
Each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
<a name="Training"></a>
- Training
#### 2.2 Training
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process.
......@@ -268,7 +277,7 @@ Eval:
**Note that the configuration file for prediction/evaluation must be consistent with the training.**
<a name="Multi_language"></a>
- Multi-language
#### 2.3 Multi-language
PaddleOCR currently supports 26 (except Chinese) language recognition. A multi-language configuration file template is
provided under the path `configs/rec/multi_languages`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml)
......@@ -420,7 +429,7 @@ Eval:
```
<a name="EVALUATION"></a>
### EVALUATION
### 3 EVALUATION
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
......@@ -430,10 +439,10 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec
```
<a name="PREDICTION"></a>
### PREDICTION
### 4 PREDICTION
<a name="Training_engine_prediction"></a>
* Training engine prediction
#### 4.1 Training engine prediction
Using the model trained by paddleocr, you can quickly get prediction through the following script.
......
# paddleocr package
## Get started quickly
### install package
## 1 Get started quickly
### 1.1 install package
install by pypi
```bash
pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+
......@@ -12,9 +12,11 @@ build own whl package and install
python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr
```
### 1. Use by code
## 2 Use
### 2.1 Use by code
The paddleocr whl package will automatically download the ppocr lightweight model as the default model, which can be customized and replaced according to the section 3 **Custom Model**.
* detection classification and recognition
* detection angle classification and recognition
```python
from paddleocr import PaddleOCR,draw_ocr
# Paddleocr supports Chinese, English, French, German, Korean and Japanese.
......@@ -163,7 +165,7 @@ Output will be a list, each item contains classification result and confidence
['0', 0.99999964]
```
### Use by command line
### 2.2 Use by command line
show help information
```bash
......@@ -239,11 +241,11 @@ Output will be a list, each item contains classification result and confidence
['0', 0.99999964]
```
## Use custom model
## 3 Use custom model
When the built-in model cannot meet the needs, you need to use your own trained model.
First, refer to the first section of [inference_en.md](./inference_en.md) to convert your det and rec model to inference model, and then use it as follows
### 1. Use by code
### 3.1 Use by code
```python
from paddleocr import PaddleOCR,draw_ocr
......@@ -265,17 +267,17 @@ im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
### Use by command line
### 3.2 Use by command line
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true
```
### Use web images or numpy array as input
## 4 Use web images or numpy array as input
1. Web image
### 4.1 Web image
Use by code
- Use by code
```python
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
......@@ -294,12 +296,12 @@ im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
Use by command line
- Use by command line
```bash
paddleocr --image_dir http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg --use_angle_cls=true
```
2. Numpy array
### 4.2 Numpy array
Support numpy array as input only when used by code
```python
......@@ -324,7 +326,7 @@ im_show.save('result.jpg')
```
## Parameter Description
## 5 Parameter Description
| Parameter | Description | Default value |
|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|
......
doc/joinus.PNG

114 KB | W: | H:

doc/joinus.PNG

109 KB | W: | H:

doc/joinus.PNG
doc/joinus.PNG
doc/joinus.PNG
doc/joinus.PNG
  • 2-up
  • Swipe
  • Onion skin
......@@ -215,7 +215,7 @@ class AttnLabelEncode(BaseRecLabelEncode):
return None
data['length'] = np.array(len(text))
text = [0] + text + [len(self.character) - 1] + [0] * (self.max_text_len
- len(text) - 1)
- len(text) - 2)
data['label'] = np.array(text)
return data
......@@ -255,13 +255,13 @@ class SRNLabelEncode(BaseRecLabelEncode):
def __call__(self, data):
text = data['label']
text = self.encode(text)
char_num = len(self.character_str)
char_num = len(self.character)
if text is None:
return None
if len(text) > self.max_text_len:
return None
data['length'] = np.array(len(text))
text = text + [char_num] * (self.max_text_len - len(text))
text = text + [char_num - 1] * (self.max_text_len - len(text))
data['label'] = np.array(text)
return data
......
......@@ -84,11 +84,12 @@ class MakeShrinkMap(object):
return polygons, ignore_tags
def polygon_area(self, polygon):
# return cv2.contourArea(polygon.astype(np.float32))
edge = 0
for i in range(polygon.shape[0]):
next_index = (i + 1) % polygon.shape[0]
edge += (polygon[next_index, 0] - polygon[i, 0]) * (
polygon[next_index, 1] - polygon[i, 1])
return edge / 2.
"""
compute polygon area
"""
area = 0
q = polygon[-1]
for p in polygon:
area += p[0] * q[1] - p[1] * q[0]
q = p
return area / 2.0
......@@ -185,8 +185,8 @@ class DetResizeForTest(object):
resize_h = int(h * ratio)
resize_w = int(w * ratio)
resize_h = int(round(resize_h / 32) * 32)
resize_w = int(round(resize_w / 32) * 32)
resize_h = max(int(round(resize_h / 32) * 32), 32)
resize_w = max(int(round(resize_w / 32) * 32), 32)
try:
if int(resize_w) <= 0 or int(resize_h) <= 0:
......
......@@ -29,7 +29,7 @@ class RecMetric(object):
pred = pred.replace(" ", "")
target = target.replace(" ", "")
norm_edit_dis += Levenshtein.distance(pred, target) / max(
len(pred), len(target))
len(pred), len(target), 1)
if pred == target:
correct_num += 1
all_num += 1
......
......@@ -146,6 +146,9 @@ class AttentionLSTM(nn.Layer):
else:
targets = paddle.zeros(shape=[batch_size], dtype="int32")
probs = None
char_onehots = None
outputs = None
alpha = None
for i in range(num_steps):
char_onehots = self._char_to_onehot(
......
......@@ -47,6 +47,7 @@ def main():
config['Architecture']["Head"]['out_channels'] = len(
getattr(post_process_class, 'character'))
model = build_model(config['Architecture'])
use_srn = config['Architecture']['algorithm'] == "SRN"
best_model_dict = init_model(config, model, logger)
if len(best_model_dict):
......@@ -59,7 +60,7 @@ def main():
# start eval
metirc = program.eval(model, valid_dataloader, post_process_class,
eval_class)
eval_class, use_srn)
logger.info('metric eval ***************')
for k, v in metirc.items():
logger.info('{}:{}'.format(k, v))
......
......@@ -54,6 +54,13 @@ class TextRecognizer(object):
"character_dict_path": args.rec_char_dict_path,
"use_space_char": args.use_space_char
}
elif self.rec_algorithm == "RARE":
postprocess_params = {
'name': 'AttnLabelDecode',
"character_type": args.rec_char_type,
"character_dict_path": args.rec_char_dict_path,
"use_space_char": args.use_space_char
}
self.postprocess_op = build_post_process(postprocess_params)
self.predictor, self.input_tensor, self.output_tensors = \
utility.create_predictor(args, 'rec', logger)
......
......@@ -163,6 +163,11 @@ def train(config,
if type(eval_batch_step) == list and len(eval_batch_step) >= 2:
start_eval_step = eval_batch_step[0]
eval_batch_step = eval_batch_step[1]
if len(valid_dataloader) == 0:
logger.info(
'No Images in eval dataset, evaluation during training will be disabled'
)
start_eval_step = 1e111
logger.info(
"During the training process, after the {}th iteration, an evaluation is run every {} iterations".
format(start_eval_step, eval_batch_step))
......@@ -177,6 +182,8 @@ def train(config,
model_average = False
model.train()
use_srn = config['Architecture']['algorithm'] == "SRN"
if 'start_epoch' in best_model_dict:
start_epoch = best_model_dict['start_epoch']
else:
......@@ -195,7 +202,7 @@ def train(config,
break
lr = optimizer.get_lr()
images = batch[0]
if config['Architecture']['algorithm'] == "SRN":
if use_srn:
others = batch[-4:]
preds = model(images, others)
model_average = True
......@@ -251,8 +258,12 @@ def train(config,
min_average_window=10000,
max_average_window=15625)
Model_Average.apply()
cur_metric = eval(model, valid_dataloader, post_process_class,
eval_class)
cur_metric = eval(
model,
valid_dataloader,
post_process_class,
eval_class,
use_srn=use_srn)
cur_metric_str = 'cur metric, {}'.format(', '.join(
['{}: {}'.format(k, v) for k, v in cur_metric.items()]))
logger.info(cur_metric_str)
......@@ -316,7 +327,8 @@ def train(config,
return
def eval(model, valid_dataloader, post_process_class, eval_class):
def eval(model, valid_dataloader, post_process_class, eval_class,
use_srn=False):
model.eval()
with paddle.no_grad():
total_frame = 0.0
......@@ -327,7 +339,8 @@ def eval(model, valid_dataloader, post_process_class, eval_class):
break
images = batch[0]
start = time.time()
if "SRN" in str(model.head):
if use_srn:
others = batch[-4:]
preds = model(images, others)
else:
......
......@@ -50,6 +50,12 @@ def main(config, device, logger, vdl_writer):
# build dataloader
train_dataloader = build_dataloader(config, 'Train', device, logger)
if len(train_dataloader) == 0:
logger.error(
'No Images in train dataset, please check annotation file and path in the configuration file'
)
return
if config['Eval']:
valid_dataloader = build_dataloader(config, 'Eval', device, logger)
else:
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment