
## Introduction

Generally, a more complex model achieves better performance on a task, but it also carries some redundancy. Model pruning reduces this redundancy by removing sub-networks from the neural network model, which lowers the computational complexity of the model and improves inference performance.

This example uses the [pruning APIs](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) provided by PaddleSlim to compress the OCR model.
[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) is an open-source library that integrates model pruning, quantization (including quantization-aware training and offline quantization), distillation, neural architecture search, and many other commonly used, leading model compression techniques.

Before reading this example, it is recommended that you go through the following pages:
1. [PaddleOCR training methods](../../../doc/doc_ch/quickstart.md)
2. [The demo of prune](https://github.com/PaddlePaddle/PaddleSlim/blob/release%2F2.0.0/docs/zh_cn/tutorials/pruning/dygraph/filter_pruning.md)

## Quick start

Four steps for OCR model pruning:
1. Install PaddleSlim
2. Prepare the trained model
3. Sensitivity analysis and pruning training
4. Export the model and deploy it for prediction

### 1. Install PaddleSlim

```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
git checkout develop
python3 setup.py install
```
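
After installing, it can help to confirm that the package is importable from the interpreter you will run PaddleOCR with. The following is a minimal check using only the standard library; the helper name `is_installed` is ours and is not part of PaddleSlim.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be found on the current Python path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # `paddleslim` is the import name of the package installed above.
    print("paddleslim importable:", is_installed("paddleslim"))
```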


### 2. Download Pre-trained Model
Model pruning requires a pre-trained model to start from.
PaddleOCR provides a series of [models](../../../doc/doc_en/models_list_en.md). Developers can pick one of these or use their own models, according to their needs.


### 3. Pruning sensitivity analysis

After the pre-trained model is loaded, sensitivity analysis is performed on each network layer of the model to measure how redundant that layer is, and the result is saved to a sensitivity file named `sen.pickle`. The sensitivity file can then be loaded via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) to determine the pruning ratio of each network layer automatically. For details of sensitivity analysis, see [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md).
The data format of the sensitivity file:

    sen.pickle(Dict){
        'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
        'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
    }

    example:
    {
        'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594},
        'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405}
    }

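
To make this layout concrete, the sketch below builds a dictionary of the same shape, writes it with `pickle`, and reads it back. The values are rounded from the example, and the temporary file path and variable names are illustrative assumptions; the real `sen.pickle` is produced by the sensitivity analysis script.

```python
import os
import pickle
import tempfile

# Same shape as sen.pickle: {parameter name: {pruning ratio: accuracy loss}}.
sensitivities = {
    "conv10_expand_weights": {0.1: 0.0065, 0.2: 0.0183, 0.3: 0.0145},
    "conv10_linear_weights": {0.1: 0.0511, 0.2: 0.0771, 0.3: 0.1210},
}

# Write to a temporary file (illustrative path), then load it again.
path = os.path.join(tempfile.mkdtemp(), "sen.pickle")
with open(path, "wb") as f:
    pickle.dump(sensitivities, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

# Both the parameter names and the float ratio keys survive the round trip.
for name, losses in loaded.items():
    print(name, "-> measured ratios:", sorted(losses))
```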
Loading the sensitivity file returns a dict. Each key is the name of a layer's parameter, and each value describes the pruning sensitivity of the corresponding layer. In the example above, pruning 10% of the filters in the layer corresponding to `conv10_expand_weights` leads to a 0.65% drop in model performance. For details, see [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86).
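
As a rough illustration of how per-layer pruning ratios can be derived from such a dict, the sketch below picks, for each parameter, the last measured ratio reached before the accuracy loss first exceeds a budget. This is a simplified stand-in written for this README, not PaddleSlim's implementation; the function name `pick_ratios` and the budget values are assumptions.

```python
from typing import Dict

Sensitivities = Dict[str, Dict[float, float]]


def pick_ratios(sensitivities: Sensitivities, max_loss: float) -> Dict[str, float]:
    """For each parameter, walk the ratios in increasing order and keep the
    last one seen before the measured loss first exceeds `max_loss`."""
    ratios = {}
    for name, losses in sensitivities.items():
        chosen = None
        for ratio in sorted(losses):
            if losses[ratio] > max_loss:
                break  # stop at the first ratio that costs too much accuracy
            chosen = ratio
        if chosen is not None:  # parameters with no acceptable ratio stay unpruned
            ratios[name] = chosen
    return ratios


# Values copied from the example above (first four ratios only).
example = {
    "conv10_expand_weights": {0.1: 0.006509952684312718, 0.2: 0.01827734339798862,
                              0.3: 0.014528405644659832, 0.4: 0.030615754498018757},
    "conv10_linear_weights": {0.1: 0.05113190831455035, 0.2: 0.07705573833558801,
                              0.3: 0.12096721757739311, 0.4: 0.1819252083008504},
}

print(pick_ratios(example, max_loss=0.02))
```

With a 2% loss budget only `conv10_expand_weights` is selected, because even a 10% cut of `conv10_linear_weights` already costs about 5.1%.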


Enter the PaddleOCR root directory and run the following command to perform sensitivity analysis on the model:

```bash

python3.7 deploy/slim/prune/sensitivity_anal.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model="your trained model"  Global.save_model_dir=./output/prune_model/

```
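
The `-o` flag overrides individual entries of the YAML config using dotted keys such as `Global.save_model_dir`. The sketch below shows the general idea of applying such overrides to a nested dict. It is a simplified illustration with a hypothetical helper `apply_overrides`, not PaddleOCR's actual parsing code, and it keeps every value as a plain string.

```python
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'Section.key=value' strings to a nested config dict in place."""
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections on the way
        node[leaf] = value  # values are kept as strings in this sketch
    return config


# A tiny stand-in for the loaded YAML config (illustrative values).
config = {"Global": {"pretrained_model": None, "save_model_dir": "./output/"}}
apply_overrides(config, [
    "Global.pretrained_model=./output/det_db/best_accuracy",
    "Global.save_model_dir=./output/prune_model/",
])
print(config["Global"])
```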


### 4. Export inference model and deploy it

We can export the pruned model as inference_model for deployment:
```bash
python deploy/slim/prune/export_prune_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml  -o Global.pretrained_model=./output/det_db/best_accuracy  Global.save_inference_dir=./prune/prune_inference_model
```

References for prediction and deployment with the inference model:
1. [inference model python prediction](../../../doc/doc_en/inference_en.md)
2. [inference model C++ prediction](../../cpp_infer/readme_en.md)