Commit 6ec9aa4d authored by LDOUBLEV

Merge branch 'dygraph' of https://github.com/PaddlePaddle/PaddleOCR into trt_cpp

parents df1c97af 65e61f44
...@@ -148,7 +148,7 @@ The visual text detection results are saved to the ./inference_results folder by ...@@ -148,7 +148,7 @@ The visual text detection results are saved to the ./inference_results folder by
![](../imgs_results/det_res_00018069.jpg) ![](../imgs_results/det_res_00018069.jpg)
You can use the parameters `limit_type` and `det_limit_side_len` to limit the size of the input image, You can use the parameters `limit_type` and `det_limit_side_len` to limit the size of the input image,
The optional parameters of `litmit_type` are [`max`, `min`], and The optional parameters of `limit_type` are [`max`, `min`], and
`det_limit_side_len` is a positive integer, generally set to a multiple of 32, such as 960. `det_limit_side_len` is a positive integer, generally set to a multiple of 32, such as 960.
The default setting of the parameters is `limit_type='max', det_limit_side_len=960`. Indicates that the longest side of the network input image cannot exceed 960, The default setting of the parameters is `limit_type='max', det_limit_side_len=960`. Indicates that the longest side of the network input image cannot exceed 960,
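As a rough illustration of how the two parameters interact, the sketch below computes a target size. It assumes that `max` caps the longest side and that `min` raises the shortest side to at least the limit (the latter is an assumption, the text above only describes `max`), then snaps each side to a multiple of 32 with a floor of 32, as in the `DetResizeForTest` hunk of this commit. The helper name `limit_side` is hypothetical and this is not the library's implementation.

```python
def limit_side(h, w, limit_type="max", det_limit_side_len=960):
    """Return (resize_h, resize_w) for an h x w image (hypothetical helper)."""
    if limit_type == "max":
        # Shrink only when the longest side exceeds the limit.
        longest = max(h, w)
        ratio = det_limit_side_len / longest if longest > det_limit_side_len else 1.0
    elif limit_type == "min":
        # Assumed behaviour: enlarge only when the shortest side is below the limit.
        shortest = min(h, w)
        ratio = det_limit_side_len / shortest if shortest < det_limit_side_len else 1.0
    else:
        raise ValueError("limit_type must be 'max' or 'min'")
    # Snap each side to a multiple of 32, never going below 32.
    resize_h = max(int(round(int(h * ratio) / 32) * 32), 32)
    resize_w = max(int(round(int(w * ratio) / 32) * 32), 32)
    return resize_h, resize_w


print(limit_side(1920, 1080))  # longest side is capped at 960
print(limit_side(10, 2000))    # very thin image: height is floored to 32
```

The floor of 32 matters for extreme aspect ratios, where rounding alone would produce a side of 0.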
......
## OCR model list(V2.0, updated on 2021.1.20) ## OCR model list(V2.0, updated on 2021.1.20)
**Note** : Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance. > **Note**
> 1. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance.
> 2. All models in this tutorial are all ppocr-series models, for more introduction of algorithms and models based on public dataset, you can refer to [algorithm overview tutorial](./algorithm_overview_en.md).
- [1. Text Detection Model](#Detection) - [1. Text Detection Model](#Detection)
- [2. Text Recognition Model](#Recognition) - [2. Text Recognition Model](#Recognition)
...@@ -12,9 +14,13 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine ...@@ -12,9 +14,13 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine
|model type|model format|description| |model type|model format|description|
|--- | --- | --- | |--- | --- | --- |
|inference model|inference.pdmodel、inference.pdiparams|Used for reasoning based on Python prediction engine,[detail](./inference_en.md)| |inference model|inference.pdmodel、inference.pdiparams|Used for inference based on Paddle inference engine,[detail](./inference_en.md)|
|trained model, pre-trained model|\*.pdparams、\*.pdopt、\*.states |The checkpoints model saved in the training process, which stores the parameters of the model, mostly used for model evaluation and continuous training.| |trained model, pre-trained model|\*.pdparams、\*.pdopt、\*.states |The checkpoints model saved in the training process, which stores the parameters of the model, mostly used for model evaluation and continuous training.|
|slim model|\*.nb|Generally used for Lite deployment| |slim model|\*.nb| Model compressed by PaddleSlim (a model compression tool using PaddlePaddle), which is suitable for mobile-side deployment scenarios (Paddle-Lite is needed for slim model deployment). |
Relationship of the above models is as follows.
![](../imgs_en/model_prod_flow_en.png)
<a name="Detection"></a> <a name="Detection"></a>
### 1. Text Detection Model ### 1. Text Detection Model
...@@ -80,7 +86,7 @@ If you want to train your own model, you can prepare the training set file, veri ...@@ -80,7 +86,7 @@ If you want to train your own model, you can prepare the training set file, veri
cd {your/path/}PaddleOCR/configs/rec/multi_language/ cd {your/path/}PaddleOCR/configs/rec/multi_language/
# The -l or --language parameter is required # The -l or --language parameter is required
# --train modify train_list path # --train modify train_list path
# --val modify eval_list path # --val modify eval_list path
# --data_dir modify data dir # --data_dir modify data dir
# -o modify default parameters # -o modify default parameters
# --dict Change the dictionary path. The example uses the default dictionary path, so that this parameter can be empty. # --dict Change the dictionary path. The example uses the default dictionary path, so that this parameter can be empty.
......
## TEXT RECOGNITION ## TEXT RECOGNITION
- [DATA PREPARATION](#DATA_PREPARATION) - [1 DATA PREPARATION](#DATA_PREPARATION)
- [Dataset Download](#Dataset_download) - [1.1 Costom Dataset](#Costom_Dataset)
- [Costom Dataset](#Costom_Dataset) - [1.2 Dataset Download](#Dataset_download)
- [Dictionary](#Dictionary) - [1.3 Dictionary](#Dictionary)
- [Add Space Category](#Add_space_category) - [1.4 Add Space Category](#Add_space_category)
- [TRAINING](#TRAINING) - [2 TRAINING](#TRAINING)
- [Data Augmentation](#Data_Augmentation) - [2.1 Data Augmentation](#Data_Augmentation)
- [Training](#Training) - [2.2 Training](#Training)
- [Multi-language](#Multi_language) - [2.3 Multi-language](#Multi_language)
- [EVALUATION](#EVALUATION) - [3 EVALUATION](#EVALUATION)
- [PREDICTION](#PREDICTION) - [4 PREDICTION](#PREDICTION)
- [Training engine prediction](#Training_engine_prediction) - [4.1 Training engine prediction](#Training_engine_prediction)
<a name="DATA_PREPARATION"></a> <a name="DATA_PREPARATION"></a>
### DATA PREPARATION ### DATA PREPARATION
PaddleOCR supports two data formats: `LMDB` is used to train public data and evaluation algorithms; `general data` is used to train your own data: PaddleOCR supports two data formats:
- `LMDB` is used to train data sets stored in lmdb format;
- `general data` is used to train data sets stored in text files:
Please organize the dataset as follows: Please organize the dataset as follows:
The default storage path for training data is `PaddleOCR/train_data`, if you already have a dataset on your disk, just create a soft link to the dataset directory: The default storage path for training data is `PaddleOCR/train_data`, if you already have a dataset on your disk, just create a soft link to the dataset directory:
``` ```
# linux and mac os
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
# windows
mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
``` ```
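For readers who prefer to create the link from a script, the same step can be sketched in Python with `os.symlink`. The helper name `link_dataset` and its arguments are hypothetical; the sketch assumes a POSIX-like system (on Windows, creating symlinks may require extra privileges).

```python
import os


def link_dataset(dataset_dir, train_data_dir, name="dataset"):
    """Create train_data_dir/name as a symlink to dataset_dir (hypothetical helper)."""
    os.makedirs(train_data_dir, exist_ok=True)
    link_path = os.path.join(train_data_dir, name)
    # Replace an existing link, roughly like the -f flag of `ln -sf`.
    if os.path.islink(link_path):
        os.remove(link_path)
    os.symlink(os.path.abspath(dataset_dir), link_path, target_is_directory=True)
    return link_path
```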
<a name="Dataset_download"></a>
* Dataset download
If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here),download the lmdb format dataset required for benchmark
If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path.
<a name="Costom_Dataset"></a> <a name="Costom_Dataset"></a>
* Use your own dataset: #### 1.1 Costom dataset
If you want to use your own data for training, please refer to the following to organize your data. If you want to use your own data for training, please refer to the following to organize your data.
- Training set - Training set
First put the training images in the same folder (train_images), and use a txt file (rec_gt_train.txt) to store the image path and label. It is recommended to put the training images in the same folder, and use a txt file (rec_gt_train.txt) to store the image path and label. The contents of the txt file are as follows:
* Note: by default, the image path and image label are split with \t, if you use other methods to split, it will cause training error * Note: by default, the image path and image label are split with \t, if you use other methods to split, it will cause training error
``` ```
" Image file name Image annotation " " Image file name Image annotation "
train_data/train_0001.jpg 简单可依赖 train_data/rec/train/word_001.jpg 简单可依赖
train_data/train_0002.jpg 用科技让复杂的世界更简单 train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
``` ...
PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
```
# Training set label
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# Test Set Label
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
``` ```
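Because the label file must separate the image path and the label with a tab, it can help to validate it before training. The following is a minimal sketch of such a check; `parse_label_lines` is a hypothetical helper, not part of PaddleOCR, and it assumes labels do not themselves contain tab characters.

```python
def parse_label_lines(lines):
    """Split 'image_path<TAB>label' lines into (path, label) pairs."""
    samples = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            # Any other separator (for example a space) is reported early.
            raise ValueError(f"line {line_no}: expected exactly one tab separator")
        samples.append((parts[0], parts[1]))
    return samples


pairs = parse_label_lines(["train_data/rec/train/word_001.jpg\t简单可依赖\n"])
print(pairs)
```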
The final training set should have the following file structure: The final training set should have the following file structure:
``` ```
|-train_data |-train_data
|-ic15_data |-rec
|- rec_gt_train.txt |- rec_gt_train.txt
|- train |- train
|- word_001.png |- word_001.png
|- word_002.jpg |- word_002.jpg
|- word_003.jpg |- word_003.jpg
| ... | ...
``` ```
- Test set - Test set
...@@ -82,6 +73,7 @@ Similar to the training set, the test set also needs to be provided a folder con ...@@ -82,6 +73,7 @@ Similar to the training set, the test set also needs to be provided a folder con
``` ```
|-train_data |-train_data
|-rec
|-ic15_data |-ic15_data
|- rec_gt_test.txt |- rec_gt_test.txt
|- test |- test
...@@ -90,8 +82,25 @@ Similar to the training set, the test set also needs to be provided a folder con ...@@ -90,8 +82,25 @@ Similar to the training set, the test set also needs to be provided a folder con
|- word_003.jpg |- word_003.jpg
| ... | ...
``` ```
<a name="Dataset_download"></a>
#### 1.2 Dataset download
If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) ,download the lmdb format dataset required for benchmark
If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path.
PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
```
# Training set label
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# Test Set Label
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
```
<a name="Dictionary"></a> <a name="Dictionary"></a>
- Dictionary #### 1.3 Dictionary
Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index. Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.
...@@ -108,6 +117,8 @@ n ...@@ -108,6 +117,8 @@ n
In `word_dict.txt`, there is a single word in each line, which maps characters and numeric indexes together, e.g "and" will be mapped to [2 5 1] In `word_dict.txt`, there is a single word in each line, which maps characters and numeric indexes together, e.g "and" will be mapped to [2 5 1]
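The mapping from characters to indexes can be sketched as follows. The dictionary contents below are hypothetical, chosen only so that "and" encodes to `[2, 5, 1]`; the sketch uses zero-based line positions and ignores any index offsets that a real label encoder may reserve for special tokens.

```python
def build_char_index(dict_lines):
    """Map each dictionary line (one character per line) to its line position."""
    return {line.rstrip("\r\n"): idx for idx, line in enumerate(dict_lines)}


def encode(text, char_index):
    """Encode a string as a list of dictionary indexes."""
    return [char_index[ch] for ch in text]


# Hypothetical dictionary file contents, one character per line.
dict_lines = ["l\n", "d\n", "a\n", "x\n", "y\n", "n\n"]
char_index = build_char_index(dict_lines)
print(encode("and", char_index))
```

A character that is missing from the dictionary raises `KeyError` in this sketch; how such characters are handled in practice depends on the encoder.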
PaddleOCR has built-in dictionaries, which can be used on demand.
`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters. `ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.
`ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters `ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters
...@@ -123,8 +134,6 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a ...@@ -123,8 +134,6 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a
`ppocr/utils/dict/en_dict.txt` is an English dictionary with 63 characters `ppocr/utils/dict/en_dict.txt` is an English dictionary with 63 characters
You can use it on demand.
The current multi-language model is still in the demo stage and will continue to optimize the model and add languages. **You are very welcome to provide us with dictionaries and fonts in other languages**, The current multi-language model is still in the demo stage and will continue to optimize the model and add languages. **You are very welcome to provide us with dictionaries and fonts in other languages**,
If you like, you can submit the dictionary file to [dict](../../ppocr/utils/dict) and we will thank you in the Repo. If you like, you can submit the dictionary file to [dict](../../ppocr/utils/dict) and we will thank you in the Repo.
...@@ -136,14 +145,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co ...@@ -136,14 +145,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co
To customize the dict file, add the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` to point to your dictionary path, and set `character_type` to `ch`. To customize the dict file, add the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` to point to your dictionary path, and set `character_type` to `ch`.
<a name="Add_space_category"></a> <a name="Add_space_category"></a>
- Add space category #### 1.4 Add space category
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`. If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
**Note: use_space_char only takes effect when character_type=ch** **Note: use_space_char only takes effect when character_type=ch**
<a name="TRAINING"></a> <a name="TRAINING"></a>
### TRAINING ### 2 TRAINING
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example: PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
...@@ -166,7 +175,7 @@ Start training: ...@@ -166,7 +175,7 @@ Start training:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml
``` ```
<a name="Data_Augmentation"></a> <a name="Data_Augmentation"></a>
- Data Augmentation #### 2.1 Data Augmentation
PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file. PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file.
...@@ -175,7 +184,7 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand ...@@ -175,7 +184,7 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand
Each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) Each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
<a name="Training"></a> <a name="Training"></a>
- Training #### 2.2 Training
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process. PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process.
...@@ -268,7 +277,7 @@ Eval: ...@@ -268,7 +277,7 @@ Eval:
**Note that the configuration file for prediction/evaluation must be consistent with the training.** **Note that the configuration file for prediction/evaluation must be consistent with the training.**
<a name="Multi_language"></a> <a name="Multi_language"></a>
- Multi-language #### 2.3 Multi-language
PaddleOCR currently supports 26 (except Chinese) language recognition. A multi-language configuration file template is PaddleOCR currently supports 26 (except Chinese) language recognition. A multi-language configuration file template is
provided under the path `configs/rec/multi_languages`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml) provided under the path `configs/rec/multi_languages`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml)
...@@ -420,7 +429,7 @@ Eval: ...@@ -420,7 +429,7 @@ Eval:
``` ```
<a name="EVALUATION"></a> <a name="EVALUATION"></a>
### EVALUATION ### 3 EVALUATION
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file. The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
...@@ -430,10 +439,10 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec ...@@ -430,10 +439,10 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec
``` ```
<a name="PREDICTION"></a> <a name="PREDICTION"></a>
### PREDICTION ### 4 PREDICTION
<a name="Training_engine_prediction"></a> <a name="Training_engine_prediction"></a>
* Training engine prediction #### 4.1 Training engine prediction
Using the model trained by paddleocr, you can quickly get prediction through the following script. Using the model trained by paddleocr, you can quickly get prediction through the following script.
......
# paddleocr package # paddleocr package
## Get started quickly ## 1 Get started quickly
### install package ### 1.1 install package
install by pypi install by pypi
```bash ```bash
pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+ pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+
...@@ -12,9 +12,11 @@ build own whl package and install ...@@ -12,9 +12,11 @@ build own whl package and install
python3 setup.py bdist_wheel python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr
``` ```
### 1. Use by code ## 2 Use
### 2.1 Use by code
The paddleocr whl package will automatically download the ppocr lightweight model as the default model, which can be customized and replaced according to the section 3 **Custom Model**.
* detection classification and recognition * detection angle classification and recognition
```python ```python
from paddleocr import PaddleOCR,draw_ocr from paddleocr import PaddleOCR,draw_ocr
# Paddleocr supports Chinese, English, French, German, Korean and Japanese. # Paddleocr supports Chinese, English, French, German, Korean and Japanese.
...@@ -163,7 +165,7 @@ Output will be a list, each item contains classification result and confidence ...@@ -163,7 +165,7 @@ Output will be a list, each item contains classification result and confidence
['0', 0.99999964] ['0', 0.99999964]
``` ```
### Use by command line ### 2.2 Use by command line
show help information show help information
```bash ```bash
...@@ -239,11 +241,11 @@ Output will be a list, each item contains classification result and confidence ...@@ -239,11 +241,11 @@ Output will be a list, each item contains classification result and confidence
['0', 0.99999964] ['0', 0.99999964]
``` ```
## Use custom model ## 3 Use custom model
When the built-in model cannot meet the needs, you need to use your own trained model. When the built-in model cannot meet the needs, you need to use your own trained model.
First, refer to the first section of [inference_en.md](./inference_en.md) to convert your det and rec model to inference model, and then use it as follows First, refer to the first section of [inference_en.md](./inference_en.md) to convert your det and rec model to inference model, and then use it as follows
### 1. Use by code ### 3.1 Use by code
```python ```python
from paddleocr import PaddleOCR,draw_ocr from paddleocr import PaddleOCR,draw_ocr
...@@ -265,17 +267,17 @@ im_show = Image.fromarray(im_show) ...@@ -265,17 +267,17 @@ im_show = Image.fromarray(im_show)
im_show.save('result.jpg') im_show.save('result.jpg')
``` ```
### Use by command line ### 3.2 Use by command line
```bash ```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true
``` ```
### Use web images or numpy array as input ## 4 Use web images or numpy array as input
1. Web image ### 4.1 Web image
Use by code - Use by code
```python ```python
from paddleocr import PaddleOCR, draw_ocr from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
...@@ -294,12 +296,12 @@ im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc ...@@ -294,12 +296,12 @@ im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc
im_show = Image.fromarray(im_show) im_show = Image.fromarray(im_show)
im_show.save('result.jpg') im_show.save('result.jpg')
``` ```
Use by command line - Use by command line
```bash ```bash
paddleocr --image_dir http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg --use_angle_cls=true paddleocr --image_dir http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg --use_angle_cls=true
``` ```
2. Numpy array ### 4.2 Numpy array
Support numpy array as input only when used by code Support numpy array as input only when used by code
```python ```python
...@@ -324,7 +326,7 @@ im_show.save('result.jpg') ...@@ -324,7 +326,7 @@ im_show.save('result.jpg')
``` ```
## Parameter Description ## 5 Parameter Description
| Parameter | Description | Default value | | Parameter | Description | Default value |
|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------| |-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|
......
doc/joinus.PNG (binary image changed, from 114 KB to 109 KB)
...@@ -215,7 +215,7 @@ class AttnLabelEncode(BaseRecLabelEncode): ...@@ -215,7 +215,7 @@ class AttnLabelEncode(BaseRecLabelEncode):
return None return None
data['length'] = np.array(len(text)) data['length'] = np.array(len(text))
text = [0] + text + [len(self.character) - 1] + [0] * (self.max_text_len text = [0] + text + [len(self.character) - 1] + [0] * (self.max_text_len
- len(text) - 1) - len(text) - 2)
data['label'] = np.array(text) data['label'] = np.array(text)
return data return data
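The effect of changing `- 1` to `- 2` in the hunk above can be checked with a small standalone sketch. It uses plain lists instead of NumPy, the helper name `pad_attention_label` is hypothetical, and the length guard is an assumption (the actual guard is elided in the diff). With a start token, an end token and `max_text_len - len(text) - 2` padding zeros, the label has exactly `max_text_len` entries.

```python
def pad_attention_label(text, num_classes, max_text_len):
    """Wrap encoded text with start/end tokens and zero-pad to max_text_len."""
    if len(text) + 2 > max_text_len:
        return None  # assumed guard: no room for the start and end tokens
    end_idx = num_classes - 1  # corresponds to len(self.character) - 1
    return [0] + text + [end_idx] + [0] * (max_text_len - len(text) - 2)


label = pad_attention_label([5, 6, 7], num_classes=40, max_text_len=10)
print(label, len(label))
```

With the previous `- 1`, the same input would yield one extra padding zero, so the label would be one element longer than `max_text_len`.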
...@@ -255,13 +255,13 @@ class SRNLabelEncode(BaseRecLabelEncode): ...@@ -255,13 +255,13 @@ class SRNLabelEncode(BaseRecLabelEncode):
def __call__(self, data): def __call__(self, data):
text = data['label'] text = data['label']
text = self.encode(text) text = self.encode(text)
char_num = len(self.character_str) char_num = len(self.character)
if text is None: if text is None:
return None return None
if len(text) > self.max_text_len: if len(text) > self.max_text_len:
return None return None
data['length'] = np.array(len(text)) data['length'] = np.array(len(text))
text = text + [char_num] * (self.max_text_len - len(text)) text = text + [char_num - 1] * (self.max_text_len - len(text))
data['label'] = np.array(text) data['label'] = np.array(text)
return data return data
......
...@@ -84,11 +84,12 @@ class MakeShrinkMap(object): ...@@ -84,11 +84,12 @@ class MakeShrinkMap(object):
return polygons, ignore_tags return polygons, ignore_tags
def polygon_area(self, polygon): def polygon_area(self, polygon):
# return cv2.contourArea(polygon.astype(np.float32)) """
edge = 0 compute polygon area
for i in range(polygon.shape[0]): """
next_index = (i + 1) % polygon.shape[0] area = 0
edge += (polygon[next_index, 0] - polygon[i, 0]) * ( q = polygon[-1]
polygon[next_index, 1] - polygon[i, 1]) for p in polygon:
area += p[0] * q[1] - p[1] * q[0]
return edge / 2. q = p
return area / 2.0
...@@ -185,8 +185,8 @@ class DetResizeForTest(object): ...@@ -185,8 +185,8 @@ class DetResizeForTest(object):
resize_h = int(h * ratio) resize_h = int(h * ratio)
resize_w = int(w * ratio) resize_w = int(w * ratio)
resize_h = int(round(resize_h / 32) * 32) resize_h = max(int(round(resize_h / 32) * 32), 32)
resize_w = int(round(resize_w / 32) * 32) resize_w = max(int(round(resize_w / 32) * 32), 32)
try: try:
if int(resize_w) <= 0 or int(resize_h) <= 0: if int(resize_w) <= 0 or int(resize_h) <= 0:
......
...@@ -29,7 +29,7 @@ class RecMetric(object): ...@@ -29,7 +29,7 @@ class RecMetric(object):
pred = pred.replace(" ", "") pred = pred.replace(" ", "")
target = target.replace(" ", "") target = target.replace(" ", "")
norm_edit_dis += Levenshtein.distance(pred, target) / max( norm_edit_dis += Levenshtein.distance(pred, target) / max(
len(pred), len(target)) len(pred), len(target), 1)
if pred == target: if pred == target:
correct_num += 1 correct_num += 1
all_num += 1 all_num += 1
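The extra `1` in `max(len(pred), len(target), 1)` guards against a division by zero when both strings are empty after spaces are removed. A self-contained sketch of the same normalization follows; it uses a small pure-Python edit distance in place of the `Levenshtein` package so that it runs without extra dependencies, and the function names are chosen for this example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalized_edit_distance(pred, target):
    """Edit distance divided by the longer length, safe for empty strings."""
    pred = pred.replace(" ", "")
    target = target.replace(" ", "")
    return levenshtein(pred, target) / max(len(pred), len(target), 1)


print(normalized_edit_distance("", " "))       # both empty after stripping spaces
print(normalized_edit_distance("abc", "abd"))  # one substitution out of three
```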
......
...@@ -146,6 +146,9 @@ class AttentionLSTM(nn.Layer): ...@@ -146,6 +146,9 @@ class AttentionLSTM(nn.Layer):
else: else:
targets = paddle.zeros(shape=[batch_size], dtype="int32") targets = paddle.zeros(shape=[batch_size], dtype="int32")
probs = None probs = None
char_onehots = None
outputs = None
alpha = None
for i in range(num_steps): for i in range(num_steps):
char_onehots = self._char_to_onehot( char_onehots = self._char_to_onehot(
......
...@@ -47,6 +47,7 @@ def main(): ...@@ -47,6 +47,7 @@ def main():
config['Architecture']["Head"]['out_channels'] = len( config['Architecture']["Head"]['out_channels'] = len(
getattr(post_process_class, 'character')) getattr(post_process_class, 'character'))
model = build_model(config['Architecture']) model = build_model(config['Architecture'])
use_srn = config['Architecture']['algorithm'] == "SRN"
best_model_dict = init_model(config, model, logger) best_model_dict = init_model(config, model, logger)
if len(best_model_dict): if len(best_model_dict):
...@@ -59,7 +60,7 @@ def main(): ...@@ -59,7 +60,7 @@ def main():
# start eval # start eval
metirc = program.eval(model, valid_dataloader, post_process_class, metirc = program.eval(model, valid_dataloader, post_process_class,
eval_class) eval_class, use_srn)
logger.info('metric eval ***************') logger.info('metric eval ***************')
for k, v in metirc.items(): for k, v in metirc.items():
logger.info('{}:{}'.format(k, v)) logger.info('{}:{}'.format(k, v))
......
...@@ -54,6 +54,13 @@ class TextRecognizer(object): ...@@ -54,6 +54,13 @@ class TextRecognizer(object):
"character_dict_path": args.rec_char_dict_path, "character_dict_path": args.rec_char_dict_path,
"use_space_char": args.use_space_char "use_space_char": args.use_space_char
} }
elif self.rec_algorithm == "RARE":
postprocess_params = {
'name': 'AttnLabelDecode',
"character_type": args.rec_char_type,
"character_dict_path": args.rec_char_dict_path,
"use_space_char": args.use_space_char
}
self.postprocess_op = build_post_process(postprocess_params) self.postprocess_op = build_post_process(postprocess_params)
self.predictor, self.input_tensor, self.output_tensors = \ self.predictor, self.input_tensor, self.output_tensors = \
utility.create_predictor(args, 'rec', logger) utility.create_predictor(args, 'rec', logger)
......
...@@ -163,6 +163,11 @@ def train(config, ...@@ -163,6 +163,11 @@ def train(config,
if type(eval_batch_step) == list and len(eval_batch_step) >= 2: if type(eval_batch_step) == list and len(eval_batch_step) >= 2:
start_eval_step = eval_batch_step[0] start_eval_step = eval_batch_step[0]
eval_batch_step = eval_batch_step[1] eval_batch_step = eval_batch_step[1]
if len(valid_dataloader) == 0:
logger.info(
'No Images in eval dataset, evaluation during training will be disabled'
)
start_eval_step = 1e111
logger.info( logger.info(
"During the training process, after the {}th iteration, an evaluation is run every {} iterations". "During the training process, after the {}th iteration, an evaluation is run every {} iterations".
format(start_eval_step, eval_batch_step)) format(start_eval_step, eval_batch_step))
...@@ -177,6 +182,8 @@ def train(config, ...@@ -177,6 +182,8 @@ def train(config,
model_average = False model_average = False
model.train() model.train()
use_srn = config['Architecture']['algorithm'] == "SRN"
if 'start_epoch' in best_model_dict: if 'start_epoch' in best_model_dict:
start_epoch = best_model_dict['start_epoch'] start_epoch = best_model_dict['start_epoch']
else: else:
...@@ -195,7 +202,7 @@ def train(config, ...@@ -195,7 +202,7 @@ def train(config,
break break
lr = optimizer.get_lr() lr = optimizer.get_lr()
images = batch[0] images = batch[0]
if config['Architecture']['algorithm'] == "SRN": if use_srn:
others = batch[-4:] others = batch[-4:]
preds = model(images, others) preds = model(images, others)
model_average = True model_average = True
...@@ -251,8 +258,12 @@ def train(config, ...@@ -251,8 +258,12 @@ def train(config,
min_average_window=10000, min_average_window=10000,
max_average_window=15625) max_average_window=15625)
Model_Average.apply() Model_Average.apply()
cur_metric = eval(model, valid_dataloader, post_process_class, cur_metric = eval(
eval_class) model,
valid_dataloader,
post_process_class,
eval_class,
use_srn=use_srn)
cur_metric_str = 'cur metric, {}'.format(', '.join( cur_metric_str = 'cur metric, {}'.format(', '.join(
['{}: {}'.format(k, v) for k, v in cur_metric.items()])) ['{}: {}'.format(k, v) for k, v in cur_metric.items()]))
logger.info(cur_metric_str) logger.info(cur_metric_str)
...@@ -316,7 +327,8 @@ def train(config, ...@@ -316,7 +327,8 @@ def train(config,
return return
def eval(model, valid_dataloader, post_process_class, eval_class): def eval(model, valid_dataloader, post_process_class, eval_class,
use_srn=False):
model.eval() model.eval()
with paddle.no_grad(): with paddle.no_grad():
total_frame = 0.0 total_frame = 0.0
...@@ -327,7 +339,8 @@ def eval(model, valid_dataloader, post_process_class, eval_class): ...@@ -327,7 +339,8 @@ def eval(model, valid_dataloader, post_process_class, eval_class):
break break
images = batch[0] images = batch[0]
start = time.time() start = time.time()
if "SRN" in str(model.head):
if use_srn:
others = batch[-4:] others = batch[-4:]
preds = model(images, others) preds = model(images, others)
else: else:
......
...@@ -50,6 +50,12 @@ def main(config, device, logger, vdl_writer): ...@@ -50,6 +50,12 @@ def main(config, device, logger, vdl_writer):
# build dataloader # build dataloader
train_dataloader = build_dataloader(config, 'Train', device, logger) train_dataloader = build_dataloader(config, 'Train', device, logger)
if len(train_dataloader) == 0:
logger.error(
'No Images in train dataset, please check annotation file and path in the configuration file'
)
return
if config['Eval']: if config['Eval']:
valid_dataloader = build_dataloader(config, 'Eval', device, logger) valid_dataloader = build_dataloader(config, 'Eval', device, logger)
else: else:
......