FROM hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev
# PaddleOCR based on Python 3.7
RUN pip3.7 install --upgrade pip -i https://mirror.baidu.com/pypi/simple
RUN python3.7 -m pip install paddlepaddle-gpu==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple
RUN pip3.7 install paddlehub --upgrade -i https://mirror.baidu.com/pypi/simple
RUN git clone https://github.com/PaddlePaddle/PaddleOCR.git /PaddleOCR
WORKDIR /PaddleOCR
RUN pip3.7 install -r requirements.txt -i https://mirror.baidu.com/pypi/simple
RUN mkdir -p /PaddleOCR/inference/
# Download the OCR detection model (light version). To use the normal version, change ch_ppocr_mobile_v1.1_det_infer to ch_ppocr_server_v1.1_det_infer and also change det_model_dir in deploy/hubserving/ocr_system/params.py
ADD {link} /PaddleOCR/inference/
RUN tar xf /PaddleOCR/inference/{file}.tar -C /PaddleOCR/inference/
# Download the direction classifier (light version, ch_ppocr_mobile_v1.1_cls_infer). If you use a different classifier, also change cls_model_dir in deploy/hubserving/ocr_system/params.py
ADD {link} /PaddleOCR/inference/
RUN tar xf /PaddleOCR/inference/{file}.tar -C /PaddleOCR/inference/
# Download the OCR recognition model (light version). To use the normal version, change ch_ppocr_mobile_v1.1_rec_infer to ch_ppocr_server_v1.1_rec_infer and also change rec_model_dir in deploy/hubserving/ocr_system/params.py
ADD {link} /PaddleOCR/inference/
RUN tar xf /PaddleOCR/inference/{file}.tar -C /PaddleOCR/inference/
PaddleOCR decomposes an algorithm into the following parts and modularizes each one, which makes it easier to develop new algorithms.
* Data loading and processing
* Network
* Post-processing
* Loss
* Metric
* Optimizer
The following sections introduce each part and explain how to add the modules required by a new algorithm.
## Data loading and processing
Data loading and processing consist of several modules that handle image reading, data augmentation and label generation. This part is under [ppocr/data](../../ppocr/data). Each file and folder is explained below:
```bash
ppocr/data/
├── imaug # Scripts for image reading, data augment and label production
│ ├── label_ops.py # Modules that transform the label
│ ├── operators.py # Modules that transform the image
│ ├──.....
├── __init__.py
├── lmdb_dataset.py # The dataset that reads the lmdb
└── simple_dataset.py # Read the dataset saved in the form of `image_path\tgt`
```
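For reference, each line of a label file read by `simple_dataset.py` holds an image path and its ground truth separated by a tab. The sketch below parses such lines; `parse_label_line` and `read_label_file` are hypothetical helpers, not PaddleOCR functions, and the sketch assumes the path itself contains no tab (it splits on the first tab only, so a label may contain tabs).

```python
def parse_label_line(line):
    """Split one `image_path<TAB>label` line into (image_path, label).

    Hypothetical helper for illustration; splits on the first tab only.
    """
    line = line.rstrip("\r\n")
    image_path, sep, label = line.partition("\t")
    if not sep:
        raise ValueError(f"expected 'image_path<TAB>label', got {line!r}")
    return image_path, label


def read_label_file(lines):
    """Parse an iterable of label lines, skipping blank ones."""
    return [parse_label_line(line) for line in lines if line.strip()]
```

For example, `read_label_file(open("train_list.txt", encoding="utf-8"))` would return a list of `(path, label)` pairs.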
PaddleOCR has many built-in image operation modules. To add a module that is not built in, follow these steps:
1. Create a new file under the [ppocr/data/imaug](../../ppocr/data/imaug) folder, such as my_module.py.
2. Add code to the my_module.py file. Sample code:
```python
class MyModule:
    def __init__(self, *args, **kwargs):
        # your init code
        pass

    def __call__(self, data):
        img = data['image']
        label = data['label']
        # your process code
        data['image'] = img
        data['label'] = label
        return data
```
3. Import the added module in the [ppocr/data/imaug/\__init\__.py](../../ppocr/data/imaug/__init__.py) file.
The data processing modules are executed in sequence, in the order in which they are listed in the config file. For example:
```yaml
# angle class data process
transforms:
  - DecodeImage: # load image
      img_mode: BGR
      channel_first: False
  - MyModule:
      args1: args1
      args2: args2
  - KeepKeys:
      keep_keys: ['image', 'label'] # dataloader will return list in this order
```
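The sequential execution described above can be illustrated with a framework-free sketch. It assumes each entry under `transforms` is a single-key mapping from an operator class name to its keyword arguments, and that an operator may return `None` to signal that a sample should be dropped. The `OPS` registry, `build_ops`, `run_ops`, `AddOne` and the toy `KeepKeys` are hypothetical stand-ins written for this example, not PaddleOCR's own code.

```python
class AddOne:
    """Toy operator standing in for an image transform."""

    def __init__(self, **kwargs):
        pass

    def __call__(self, data):
        data["image"] = [p + 1 for p in data["image"]]
        return data


class KeepKeys:
    """Toy version: return the requested fields as a list, in the configured order."""

    def __init__(self, keep_keys, **kwargs):
        self.keep_keys = keep_keys

    def __call__(self, data):
        return [data[k] for k in self.keep_keys]


OPS = {"AddOne": AddOne, "KeepKeys": KeepKeys}


def build_ops(transform_cfg):
    """Instantiate operators in list order from single-key config entries."""
    ops = []
    for entry in transform_cfg:
        if len(entry) != 1:
            raise ValueError(f"each transform entry needs exactly one key: {entry}")
        name, params = next(iter(entry.items()))
        ops.append(OPS[name](**(params or {})))
    return ops


def run_ops(data, ops):
    """Apply operators one after another; stop early if one returns None."""
    for op in ops:
        data = op(data)
        if data is None:
            return None
    return data
```

With `cfg = [{"AddOne": None}, {"KeepKeys": {"keep_keys": ["image", "label"]}}]`, calling `run_ops({"image": [0, 1], "label": "abc"}, build_ops(cfg))` returns `[[1, 2], 'abc']`: the order of the list decides both the order of execution and the order of the returned fields.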
## Network
The network part builds the model. PaddleOCR divides the network into four parts, located under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in sequence (transforms -> backbones -> necks -> heads).
```bash
├── architectures # Code for building network
├── transforms # Image Transformation Module
├── backbones # Feature extraction module
├── necks # Feature enhancement module
└── heads # Output module
```
PaddleOCR has built-in modules for commonly used algorithms such as DB, EAST, SAST, CRNN and Attention. To add a module that is not built in, follow the steps below. All four parts are added the same way; backbones are used as the example:
1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
2. Add code to the my_backbone.py file. Sample code:
```python
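import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class MyBackbone(nn.Layer):
    def __init__(self, in_channels=3, **kwargs):
        super(MyBackbone, self).__init__()
        # your init code; this Conv2D is only a placeholder layer
        self.conv = nn.Conv2D(in_channels, 32, kernel_size=3, padding=1)
        # number of output channels, read by the modules that follow
        self.out_channels = 32

    def forward(self, inputs):
        # your network forward
        y = self.conv(inputs)
        return y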
```
3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.
After adding the modules for the four parts, enable them in the configuration file, for example:
```yaml
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
    name: MyTransform
    args1: args1
    args2: args2
  Backbone:
    name: MyBackbone
    args1: args1
  Neck:
    name: MyNeck
    args1: args1
  Head:
    name: MyHead
    args1: args1
```
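The way the four configured parts are chained can be sketched without any framework. The sketch assumes, as in the configuration above, that each part is selected by its `name` and receives the remaining keys as arguments, and that a part left out of the config is simply skipped. The `REGISTRY`, `build_model` and the toy `Stage` class are hypothetical; they only stand in for real `nn.Layer` modules and record which stage ran.

```python
class Stage:
    """Toy stage that appends its tag so the data flow is visible."""

    def __init__(self, tag, **kwargs):
        self.tag = tag
        self.kwargs = kwargs  # e.g. args1 from the config

    def __call__(self, x):
        return x + [self.tag]


REGISTRY = {
    "MyTransform": lambda **kw: Stage("transform", **kw),
    "MyBackbone": lambda **kw: Stage("backbone", **kw),
    "MyNeck": lambda **kw: Stage("neck", **kw),
    "MyHead": lambda **kw: Stage("head", **kw),
}

# Fixed order in which data flows through the parts.
PART_ORDER = ("Transform", "Backbone", "Neck", "Head")


def build_model(arch_cfg):
    """Build the configured stages in order; absent or empty parts are skipped."""
    stages = []
    for part in PART_ORDER:
        part_cfg = arch_cfg.get(part)
        if not part_cfg:
            continue
        params = dict(part_cfg)  # copy so the config is not mutated
        name = params.pop("name")
        stages.append(REGISTRY[name](**params))

    def forward(x):
        for stage in stages:
            x = stage(x)
        return x

    return forward
```

With only `Backbone` and `Head` configured, `build_model(cfg)([])` returns `['backbone', 'head']`; adding `Transform` and `Neck` yields `['transform', 'backbone', 'neck', 'head']`.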
## Post-processing
Post-processing decodes the network output into text boxes or recognized text. This part is under [ppocr/postprocess](../../ppocr/postprocess).
PaddleOCR has built-in post-processing modules for algorithms such as DB, EAST, SAST, CRNN and Attention. To add a module that is not built in, follow these steps:
1. Create a new file under the [ppocr/postprocess](../../ppocr/postprocess) folder, such as my_postprocess.py.
2. Add your post-processing code to the my_postprocess.py file.
3. Import the added module in the [ppocr/postprocess/\__init\__.py](../../ppocr/postprocess/__init__.py) file.
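The steps above do not show the contents of `my_postprocess.py`. As one illustration, the sketch below implements greedy CTC decoding, the kind of decoding a CRNN-style recognizer needs: take the highest-scoring class at each time step, merge consecutive repeats, and drop the blank. It assumes predictions shaped `[batch, time, num_classes]`, the blank at class index 0, and a character list passed in by the caller; the constructor argument `character_list` is hypothetical, and label decoding is left out. A Paddle tensor input is converted with `.numpy()` when that method is available.

```python
class MyPostProcess:
    """Greedy CTC decoding sketch (blank assumed at class index 0)."""

    def __init__(self, character_list, **kwargs):
        # Index 0 is reserved for the CTC blank; characters start at index 1.
        self.character = ["<blank>"] + list(character_list)

    def __call__(self, preds, *args, **kwargs):
        # Accept Paddle tensors / numpy arrays as well as nested lists.
        if hasattr(preds, "numpy"):
            preds = preds.numpy()
        if hasattr(preds, "tolist"):
            preds = preds.tolist()
        return [self.decode_one(seq) for seq in preds]

    def decode_one(self, seq):
        """Decode one [time, num_classes] score matrix into (text, mean score)."""
        chars, scores = [], []
        prev = None
        for step in seq:
            idx = max(range(len(step)), key=step.__getitem__)  # argmax
            # Keep the class only if it is not blank and not a repeat.
            if idx != 0 and idx != prev:
                chars.append(self.character[idx])
                scores.append(step[idx])
            prev = idx
        mean_score = sum(scores) / len(scores) if scores else 0.0
        return "".join(chars), mean_score
```

Because repeats are merged only when they are adjacent, a blank between two identical classes (for example `a, blank, a`) still produces `"aa"`.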
After adding the post-processing module, enable it in the configuration file, for example:
```yaml
PostProcess:
  name: MyPostProcess
  args1: args1
  args2: args2
```
## Loss
The loss function is used to calculate the distance between the network output and the label. This part is under [ppocr/losses](../../ppocr/losses).
PaddleOCR has built-in loss function modules for algorithms such as DB, EAST, SAST, CRNN and Attention. To add a loss that is not built in, follow these steps:
1. Create a new file in the [ppocr/losses](../../ppocr/losses) folder, such as my_loss.py.
2. Add code to the my_loss.py file. Sample code:
```python
import paddle
from paddle import nn


class MyLoss(nn.Layer):
    def __init__(self, **kwargs):
        super(MyLoss, self).__init__()
        # your init code; CrossEntropyLoss is only a placeholder
        self.loss = nn.CrossEntropyLoss()

    def forward(self, predicts, batch):
        label = batch[1]
        # your loss code
        loss = self.loss(input=predicts, label=label)
        return {'loss': loss}
```
3. Import the added module in the [ppocr/losses/\__init\__.py](../../ppocr/losses/__init__.py) file.
After adding the loss function module, enable it in the configuration file, for example:
```yaml
Loss:
  name: MyLoss
  args1: args1
  args2: args2
```
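The contract a loss module follows here, taking `predicts` and `batch` and returning a dict with a `'loss'` entry, can be shown without Paddle. The sketch below computes mean cross-entropy over a batch of class-probability vectors in plain Python. It assumes `batch[1]` holds integer class labels, as in the sample above; the class name `ToyCrossEntropyLoss` and its `eps` argument are hypothetical.

```python
import math


class ToyCrossEntropyLoss:
    """Plain-Python stand-in following the (predicts, batch) -> {'loss': ...} contract."""

    def __init__(self, eps=1e-12, **kwargs):
        self.eps = eps  # guards against log(0)

    def __call__(self, predicts, batch):
        labels = batch[1]
        # predicts: per-sample probability vectors; labels: integer class ids
        losses = [-math.log(max(p[y], self.eps)) for p, y in zip(predicts, labels)]
        return {"loss": sum(losses) / len(losses)}
```

For `predicts = [[0.5, 0.5], [1.0, 0.0]]` with labels `[0, 0]`, the per-sample losses are `ln 2` and `0`, so the returned loss is `ln 2 / 2`.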
## Metric
Metrics measure the performance of the network on the current batch. This part is under [ppocr/metrics](../../ppocr/metrics). PaddleOCR has built-in metric modules for detection, classification and recognition. To add a metric that is not built in, follow these steps:
1. Create a new file under the [ppocr/metrics](../../ppocr/metrics) folder, such as my_metric.py.
2. Add code to the my_metric.py file. Sample code:
```python
class MyMetric(object):
    def __init__(self, main_indicator='acc', **kwargs):
        # main_indicator is used to select the best model
        self.main_indicator = main_indicator
        self.eps = 1e-5  # avoids division by zero on empty batches
        self.reset()

    def __call__(self, preds, batch, *args, **kwargs):
        # preds is the output of the post-processing
        # batch is the output of the dataloader
        labels = batch[1]
        cur_correct_num = 0
        cur_all_num = 0
        # your metric code
        self.correct_num += cur_correct_num
        self.all_num += cur_all_num
        return {'acc': cur_correct_num / (cur_all_num + self.eps), }

    def get_metric(self):
        """
        return metrics {
            'acc': 0,
            'norm_edit_dis': 0,
        }
        """
        acc = self.correct_num / (self.all_num + self.eps)
        self.reset()
        return {'acc': acc}

    def reset(self):
        # reset metric
        self.correct_num = 0
        self.all_num = 0
```
3. Import the added module in the [ppocr/metrics/\__init\__.py](../../ppocr/metrics/__init__.py) file.
After adding the metric module, enable it in the configuration file, for example:
```yaml
Metric:
  name: MyMetric
  main_indicator: acc
```
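The split between `__call__` (the value for the current batch) and `get_metric` (the value accumulated since the last reset) is easiest to see with a concrete rule. The sketch below fills in the `# your metric code` step with exact-match accuracy, assuming `preds` and `batch[1]` are equal-length lists of strings; the class name `ExactMatchMetric` is hypothetical.

```python
class ExactMatchMetric:
    """Exact-match accuracy following the __call__ / get_metric / reset pattern."""

    def __init__(self, main_indicator="acc", **kwargs):
        self.main_indicator = main_indicator
        self.eps = 1e-5  # avoids division by zero on empty batches
        self.reset()

    def __call__(self, preds, batch, *args, **kwargs):
        labels = batch[1]
        cur_correct_num = sum(1 for p, t in zip(preds, labels) if p == t)
        cur_all_num = len(labels)
        # accumulate for get_metric, but report only this batch here
        self.correct_num += cur_correct_num
        self.all_num += cur_all_num
        return {"acc": cur_correct_num / (cur_all_num + self.eps)}

    def get_metric(self):
        acc = self.correct_num / (self.all_num + self.eps)
        self.reset()  # start fresh for the next evaluation
        return {"acc": acc}

    def reset(self):
        self.correct_num = 0
        self.all_num = 0
```

After a batch with 1 of 2 correct and another with 1 of 1 correct, each call reports its own batch (about 0.5, then about 1.0), while `get_metric()` reports about 2/3 and resets the counters.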
## Optimizer
The optimizer is used to train the network and also includes the regularization and learning rate decay modules. This part is under [ppocr/optimizer](../../ppocr/optimizer). PaddleOCR has built-in common optimizers such as `Momentum`, `Adam` and `RMSProp`, common learning rate decay modules such as `Linear`, `Cosine`, `Step` and `Piecewise`, and common regularization modules such as `L1Decay` and `L2Decay`.
To add a module that is not built in, follow the steps below, taking `optimizer` as an example:
1. Create your own optimizer in the [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py) file.
PaddleOCR provides a variety of data augmentation methods. To add perturbation during training, uncomment the `RecAug` and `RandAugment` fields under `Train.dataset.transforms` in the configuration file.
The default perturbation methods are: cvtColor, blur, jitter, Gaussian noise, random crop, perspective, color reverse and RandAugment.
Except for RandAugment, each perturbation method is applied with a 50% probability during training. For the specific implementation, please refer to:
PaddleOCR supports alternating training and evaluation. You can set the evaluation frequency by modifying `eval_batch_step` in `configs/cls/cls_mv3.yml`. By default, evaluation runs every 1000 iterations. The following content is saved during training:
```bash
├── best_accuracy.pdopt # Optimizer parameters for the best model
├── best_accuracy.pdparams # Parameters of the best model
├── best_accuracy.states # Metric info and epochs of the best model
├── config.yml # Configuration file for this experiment
├── latest.pdopt # Optimizer parameters for the latest model
├── latest.pdparams # Parameters of the latest model
├── latest.states # Metric info and epochs of the latest model
└── train.log # Training log
```
If the evaluation set is large, evaluation will be time-consuming. It is recommended to evaluate less often, or to evaluate after training.
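`eval_batch_step` accepts either a single interval or a `[start, interval]` pair, which is how evaluation can be made less frequent or delayed. A minimal sketch of the resulting schedule follows. It assumes that with a pair, evaluation runs at every `interval`-th iteration after `start`, and that a single integer behaves like a pair with `start = 0`; `should_evaluate` and `eval_steps` are hypothetical helpers, not the training loop's own code.

```python
def should_evaluate(global_step, eval_batch_step):
    """Return True if evaluation should run at `global_step` (1-based).

    eval_batch_step: an int interval, or a [start, interval] list/tuple.
    """
    if isinstance(eval_batch_step, (list, tuple)):
        start, interval = eval_batch_step
    else:
        start, interval = 0, eval_batch_step
    return global_step > start and (global_step - start) % interval == 0


def eval_steps(total_steps, eval_batch_step):
    """List every step in 1..total_steps at which evaluation would run."""
    return [s for s in range(1, total_steps + 1) if should_evaluate(s, eval_batch_step)]
```

Over 6000 iterations, `2000` evaluates at steps 2000, 4000 and 6000, while `[1000, 2000]` evaluates at 3000 and 5000.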
...
...
### EVALUATION
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/cls/cls_mv3.yml` file.
| Flag | Supported script | Use | Default | Note |
| :--- | :--- | :--- | :--- | :--- |
| -c | ALL | Specify configuration file to use | None | **Please refer to the parameter introduction for configuration file usage** |
| -o | ALL | Set configuration options | None | Configuration using -o has higher priority than the configuration file selected with -c. E.g. `-o Global.use_gpu=false` |
## INTRODUCTION TO GLOBAL PARAMETERS OF CONFIGURATION FILE
Take `rec_chinese_lite_train_v1.1.yml` as an example
| Parameter | Use | Default | Note |
| :--- | :--- | :--- | :--- |
| algorithm | Select algorithm to use | Synchronize with configuration file | For selecting model, please refer to the supported model [list](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/README_en.md) |
| use_gpu | Set using GPU or not | true | \ |
| epoch_num | Maximum training epoch number | 500 | \ |
| save_model_dir | Set model save path | output/{model_name} | \ |
| save_epoch_step | Set model save interval | 3 | \ |
| eval_batch_step | Set the model evaluation interval | 2000 or [1000, 2000] | Run evaluation every 2000 iterations, or every 2000 iterations after the 1000th iteration |
| train_batch_size_per_card | Set the batch size during training | 256 | \ |
| test_batch_size_per_card | Set the batch size during testing | 256 | \ |
| loss_type | Set loss type | ctc | Supports two types of loss: ctc / attention |
| distort | Set whether to use distortion | false | For the supported distortion types, see [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) |
| use_space_char | Whether to recognize spaces | false | Only supported in character_type=ch mode |
| label_list | Set the angles supported by the direction classifier | ['0','180'] | Only valid for the direction classifier |
| reader_yml | Set the reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ |
| reader_function | Select data reading method | ppocr.data.rec.dataset_traversal,SimpleReader | Support two data reading methods: SimpleReader / LMDBReader |
| num_workers | Set the number of data reading threads | 8 | \ |
| cal_metric_during_train | Set whether to evaluate the metric during the training process. At this time, the metric of the model under the current batch is evaluated | true | \ |
| load_static_weights | Set whether the pre-training model is saved in static graph mode (currently only required by the detection algorithm) | true | \ |
| pretrained_model | Set the path of the pre-trained model | ./pretrain_models/CRNN/best_accuracy | \ |
| checkpoints | Set model parameter path | None | Used to load parameters and resume training after an interruption |
| use_visualdl | Set whether to enable visualdl for visual log display | False | [Tutorial](https://www.paddlepaddle.org.cn/paddle/visualdl) |
| infer_img | Set inference image path or folder path | ./infer_img | \|
| function | Select the optimizer function | ppocr.optimizer,AdamDecay | Only Adam is supported |
| base_lr | Set the base learning rate | 0.0005 | \ |
| name | Optimizer class name | Adam | Currently supports `Momentum`, `Adam`, `RMSProp`; see [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py) |
| beta1 | Set the exponential decay rate for the 1st moment estimates | 0.9 | \ |
| beta2 | Set the exponential decay rate for the 2nd moment estimates | 0.999 | \ |
| decay | Whether to use decay | \ | \ |
| function(decay) | Set the decay function | cosine_decay | Support cosine_decay, cosine_decay_warmup and piecewise_decay |
| step_each_epoch | The number of steps in an epoch. Used in cosine_decay/cosine_decay_warmup | 20 | Calculation: total_image_num / (batch_size_per_card * card_size) |
| total_epoch | The number of epochs. Used in cosine_decay/cosine_decay_warmup | 1000 | Consistent with Global.epoch_num |
| warmup_minibatch | Number of steps for linear warmup. Used in cosine_decay_warmup | 1000 | \ |
| boundaries | The step intervals to reduce learning rate. Used in piecewise_decay | - | The format is list |
| decay_rate | Learning rate decay rate. Used in piecewise_decay | - | \ |
| **lr** | Set the learning rate decay method | - | \ |
| name | Learning rate decay class name | Cosine | Currently supports `Linear`, `Cosine`, `Step`, `Piecewise`; see [ppocr/optimizer/learning_rate.py](../../ppocr/optimizer/learning_rate.py) |
| learning_rate | Set the base learning rate | 0.001 | \ |
| model_type | Network type | rec | Currently supports `rec`, `det`, `cls` |
| algorithm | Model name | CRNN | See [algorithm_overview](./algorithm_overview.md) for the support list |
| **Transform** | Set the transformation method | - | Currently only recognition algorithms are supported, see [ppocr/modeling/transform](../../ppocr/modeling/transform) for details |
| name | Transformation class name | TPS | Currently supports `TPS` |
| num_fiducial | Number of TPS control points | 20 | Ten each on the top and bottom edges |
| name | Metric method name | RecMetric | Currently supports `DetMetric`, `RecMetric`, `ClsMetric` |
| main_indicator | Main indicator, used to select the best model | acc | `hmean` for detection; `acc` for recognition and classification |
| label_file_list | Ground-truth file path | ["./train_data/train_list.txt"] | Not required when the dataset is LMDBDataSet |
| ratio_list | Ratio of data set | [1.0] | If there are two train_lists in label_file_list and ratio_list is [0.4,0.6], 40% will be sampled from train_list1, and 60% will be sampled from train_list2 to combine the entire dataset |
| transforms | List of methods to transform images and labels | [DecodeImage,CTCLabelEncode,RecResizeImg,KeepKeys] | See [ppocr/data/imaug](../../ppocr/data/imaug) |
| **loader** | dataloader related | - | |
| shuffle | Whether to shuffle the dataset at each epoch | True | \ |
| batch_size_per_card | Single-card batch size during training | 256 | \ |
| drop_last | Whether to drop the last incomplete mini-batch when the number of samples is not divisible by batch_size | True | \ |
| num_workers | The number of sub-processes used to load data, if it is 0, the sub-process is not started, and the data is loaded in the main process | 8 | \ |
| max_text_length | The maximum text length that the recognition algorithm can recognize | 25 | \ |
| rec_char_dict_path | The character dictionary path; change it to your own path when `rec_model_Name` uses mode 2 | ./ppocr/utils/ppocr_keys_v1.txt | \ |
| use_space_char | Whether to recognize spaces | TRUE | \ |
| drop_score | Filter the output by the recognition score; results below this score are not returned | 0.5 | \ |
| use_angle_cls | Whether to load the classification model | FALSE | \ |
| cls_model_dir | The classification inference model folder. It can be set in two ways: 1. None: automatically download the built-in model to `~/.paddleocr/cls`; 2. the path of an inference model you converted yourself, which must contain the model and params files | None | \ |
| cls | Whether to run classification when calling `ppocr.ocr` (in command-line mode, use `use_angle_cls` to control whether classification is performed) | FALSE | \ |
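The `ratio_list` row can be read as: from each file in `label_file_list`, keep the given fraction of that file's lines, then concatenate the results. The sketch below implements that reading with a seeded `random.sample`. The function name `sample_label_lines` and the `seed` argument are hypothetical, and the real dataset classes may differ in details such as rounding or when sampling is applied.

```python
import random


def sample_label_lines(lines_per_file, ratio_list, seed=None):
    """Keep round(len(lines) * ratio) lines from each file and concatenate them."""
    if len(lines_per_file) != len(ratio_list):
        raise ValueError("label_file_list and ratio_list must have the same length")
    rng = random.Random(seed)  # seeded so the sampled subset is reproducible
    data_lines = []
    for lines, ratio in zip(lines_per_file, ratio_list):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"ratio must be in [0, 1], got {ratio}")
        k = round(len(lines) * ratio)
        data_lines.extend(rng.sample(lines, k))
    return data_lines
```

With two 10-line files and `ratio_list = [0.4, 0.6]`, the combined list has 4 lines from the first file and 6 from the second; the default `[1.0]` keeps every line of a single file.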