first init

Commit c6a27e0b authored Jan 07, 2025 by panhb
20 changed files
--- a/docs/images/res.jpg
+++ b/docs/images/res.jpg
--- a/docs/images/road554.png
+++ b/docs/images/road554.png
--- a/docs/images/roadsign_yml.png
+++ b/docs/images/roadsign_yml.png
--- a/docs/images/ssld_model.png
+++ b/docs/images/ssld_model.png
--- a/docs/images/yaml_show.png
+++ b/docs/images/yaml_show.png
--- a/docs/tutorials/DistributedTraining_cn.md
+++ b/docs/tutorials/DistributedTraining_cn.md
+[English](DistributedTraining_en.md) | 简体中文
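+# Distributed Training
+## 1. Introduction
+* Distributed training splits a training task across multiple compute nodes according to a given strategy, then aggregates the gradients and other information computed on each node and applies the update. PaddlePaddle's distributed training technology grew out of Baidu's production workloads and has been validated at very large scale in natural language processing, computer vision, search, and recommendation. High-performance distributed training is one of PaddlePaddle's core strengths, and PaddleDetection supports both single-machine and multi-machine training. For more methods and documentation, see the [distributed training quick start tutorial](https://fleet-x.readthedocs.io/en/latest/paddle_fleet_rst/parameter_server/ps_quick_start.html).
+## 2. Usage
+### 2.1 Single-machine training
+* Taking PP-YOLOE-s as an example: once the data is prepared locally, start the training task with `paddle.distributed.launch` or `fleetrun`. An example launch script follows.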
+```bash
+fleetrun \
+--selected_gpu 0,1,2,3,4,5,6,7 \
+tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \
+--eval > logs.txt 2>&1 &
+```
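+### 2.2 Multi-machine training
+* Compared with single-machine training, multi-machine training only requires the additional `--ips` argument, which lists the IP addresses of the machines that take part in distributed training, separated by commas. An example follows.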
+```shell
+ip_list="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151"
+fleetrun \
+--ips=${ip_list} \
+--selected_gpu 0,1,2,3,4,5,6,7 \
+tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \
+--eval > logs.txt 2>&1 &
+```
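+**Note:**
+* The IP addresses of the machines must be separated by commas. You can look them up with `ifconfig` (Linux/macOS) or `ipconfig` (Windows).
+* Password-free login must be set up between the machines, and they must be able to ping each other directly; otherwise communication cannot be established.
+* Code, data, and the launch command or script must be identical on all machines, and the training command or script must be run on every machine. The first device of the first machine in `ip_list` becomes trainer0, and so on.
+* The starting port may differ between machines. Before launching a multi-machine job, set the same starting port on every machine with `export FLAGS_START_PORT=17000`; a port value between `10000` and `20000` is recommended.
+## 3. Performance
+* [PP-YOLOE-s](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) was trained on a single machine and on 4 machines, each with 8 V100 GPUs. Accuracy and training time are shown below.
+Machine | Accuracy (mAP) | Training time
+-|-|-
+1 machine x 8 GPUs | 42.7% | 39h
+4 machines x 8 GPUs | 42.1% | 13h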
--- a/docs/tutorials/DistributedTraining_en.md
+++ b/docs/tutorials/DistributedTraining_en.md
+English | [简体中文](DistributedTraining_cn.md)
+## 1. Usage
+### 1.1 Single-machine
+* Taking PP-YOLOE-s as an example: once the data is prepared locally, start the training task with `paddle.distributed.launch` or `fleetrun`. An example launch script follows.
+```bash
+fleetrun \
+--selected_gpu 0,1,2,3,4,5,6,7 \
+tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \
+--eval > logs.txt 2>&1 &
+```
+### 1.2 Multi-machine
+* Compared with single-machine training, multi-machine training only requires the additional `--ips` argument, which lists the IP addresses of the machines that take part in distributed training, separated by commas. An example follows.
+```shell
+ip_list="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151"
+fleetrun \
+--ips=${ip_list} \
+--selected_gpu 0,1,2,3,4,5,6,7 \
+tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \
+--eval > logs.txt 2>&1 &
+```
+**Note:**
+* The IP addresses of the machines must be separated by commas. You can look them up with `ifconfig` (Linux/macOS) or `ipconfig` (Windows).
+* Password-free login must be set up between the machines, and they must be able to ping each other directly; otherwise communication cannot be established.
+* Code, data, and the launch command or script must be identical on all machines, and the training command or script must be run on every machine. The first device of the first machine in `ip_list` becomes trainer0, and so on.
+* The starting port may differ between machines. Before launching a multi-machine job, set the same starting port on every machine with `export FLAGS_START_PORT=17000`; a port value between `10000` and `20000` is recommended.
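The trainer numbering described in the note can be sketched in Python. This is only an illustration of the ordering, not PaddlePaddle's launcher code: the helper `enumerate_trainers` is hypothetical, and it assumes that every machine exposes the same GPU list and that trainers are numbered machine by machine in `ip_list` order.

```python
from typing import List, Tuple


def enumerate_trainers(ip_list: str, gpus: str) -> List[Tuple[int, str, int]]:
    """Return (trainer_id, ip, gpu_id) tuples in launch order.

    Hypothetical helper: trainers are numbered machine by machine,
    following the order of `ip_list`, then device by device.
    """
    ips = [ip.strip() for ip in ip_list.split(",") if ip.strip()]
    gpu_ids = [int(g) for g in gpus.split(",") if g.strip()]
    trainers = []
    for ip in ips:
        for gpu_id in gpu_ids:
            trainers.append((len(trainers), ip, gpu_id))
    return trainers


trainers = enumerate_trainers(
    "10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151",
    "0,1,2,3,4,5,6,7",
)
print(trainers[0])    # trainer0: first GPU of the first machine
print(len(trainers))  # one trainer per GPU across all machines
```

Under these assumptions the first GPU of the second machine would be trainer8, which is why the order of `ip_list` has to be the same on every machine.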
+## 2. Performance
+* [PP-YOLOE-s](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) was trained on a single machine and on 4 machines, each with 8 V100 GPUs. Accuracy and training time are shown below.
+Machine | mAP | Training time
+-|-|-
+single machine | 42.7% | 39h
+4 machines | 42.1% | 13h
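To put the table in perspective, the speedup and scaling efficiency can be worked out directly from the two training times. The small helper below is only a worked calculation on the numbers in the table; the function name `scaling_summary` is made up for this example.

```python
def scaling_summary(single_hours: float, multi_hours: float, num_machines: int):
    """Return (speedup, efficiency) of multi-machine vs. single-machine training."""
    speedup = single_hours / multi_hours
    efficiency = speedup / num_machines  # 1.0 would be perfect linear scaling
    return speedup, efficiency


speedup, efficiency = scaling_summary(single_hours=39, multi_hours=13, num_machines=4)
print(f"speedup: {speedup:.1f}x, scaling efficiency: {efficiency:.0%}")
```

With the figures above, 4 machines give a 3x speedup (75% of linear scaling), at a cost of 0.6 points of mAP (42.7% versus 42.1%).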
--- a/docs/tutorials/FAQ/FAQ第一期.md
+++ b/docs/tutorials/FAQ/FAQ第一期.md
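+# FAQ: Issue 1
+**Q:** During SOLOv2 training the mAP swings widely with no upward trend, detection quality is poor, and the detection confidence exceeds 1. What is the cause?
+**A:** If SOLOv2 training does not converge, first try updating PaddleDetection to the release/2.2 or develop branch.
+**Q:** Which optimizers are supported under Optimizer?
+**A:** Every optimizer supported by Paddle ([Optimizer](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html)) is supported in PaddleDetection; you only need to edit the configuration file manually.
+**Q:** After adding a function to tools/infer.py to compute FLOPs, the reported value is -1. What is the cause?
+**A:** Update PaddleDetection to the release/2.2 or develop branch and set `print_flops` to True to print the FLOPs.
+**Q:** I ran into a "module not registered" problem when using the official ReID module
+**A:** Try `pip uninstall paddledet` and reinstall, or run `python setup.py install`.
+**Q:** Is there a dynamic-graph version of the large-scale practical object detection model, or can it be converted to one?
+**A:** The dynamic-graph version of the large-scale practical model is being prepared. We are developing larger general-purpose pretrained models, which are expected to be released in version 2.3.
+**Q:** FairMOT video inference on the develop branch does not run to completion. For example, with a 300-frame video the code saves every predicted frame as an image but stops after 299 images, and no predicted video file is generated. How can this be solved?
+**A:** Video inference with a user-specified frame rate is now supported. Use the develop or release/2.2 branch with the following command: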
+```
+CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams --video_file={your video name}.mp4 --frame_rate=20 --save_videos
+```
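+**Q:** How do I change the input image size of a YOLOv3 model through the yml file?
+**A:** If deployment requires a specific input size, first make sure before training that `target_size` of `BatchRandomResize` in `TrainReader` in `configs/_base_/yolov3_reader.yml` includes that size. After training, for evaluation or inference, change `target_size` of `Resize` in `EvalReader` and `TestReader` to the same size. If you need to export the model (export_model), change `image_shape` in `TestReader` to the corresponding input image size.
+**Q:** My earlier models were all trained with static graphs. I now want to train with dynamic graphs but load an old static-graph model as the pretrained model. Can I load a checkpoint saved by the static graph directly? If not, is there another way?
+**A:** It works if you map the weight keys of the static-graph model one-to-one onto those of the dynamic-graph model; see [this code](https://github.com/nemonameless/weights_st2dy) for reference. There is no guarantee that every static-graph key can be mapped, though. The static graph also trains a background class while the dynamic graph trains without one, and current dynamic-graph models generally reach higher accuracy than the earlier static-graph ones, so if resources and time allow we recommend training the dynamic-graph version directly.
+**Q:** hm_loss is abnormal during TTFNet training
+**A:** When training on a single GPU, lower the learning rate by a factor of 8 accordingly. In addition, because TTFNet uses a relatively large learning rate by default, training on other datasets can become unstable. We recommend setting pretrain_weights to the officially released model trained on COCO and then lowering the learning rate further.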
--- a/docs/tutorials/FAQ/FAQ第零期.md
+++ b/docs/tutorials/FAQ/FAQ第零期.md
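+# FAQ: Issue 0
+**Q:**  Why does the loss become `NaN` when I train on a single GPU? </br>
+**A:**  The default learning rate in the configuration files is tuned for multi-GPU training (8x GPU). When training on a single GPU, adjust the learning rate accordingly (for example, divide it by 8).
+Taking [faster_rcnn_r50](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml) as an example, the static-graph settings in the table below are equivalent to one another. The "Boundaries" column corresponds to `boundaries` in `piecewise decay`: </br>
+| GPUs  | Batch size per GPU | Learning rate  | Max iterations | Boundaries       |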
+| :---------: |  :------------:|:------------: | :-------: | :--------------: |
+| 2          | 1 | 0.0025         | 720000    | [480000, 640000] |
+| 4          | 1 | 0.005          | 360000    | [240000, 320000] |
+| 8          | 1| 0.01           | 180000    | [120000, 160000] |
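+* The rule above applies to static graphs. In dynamic graphs, training is counted in epochs, so after changing the number of GPUs you only need to adjust the learning rate, in the same way as for static graphs.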
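The table follows a linear scaling rule: with fewer GPUs the learning rate shrinks and the iteration counts grow by the same factor. The sketch below reproduces the three rows from the 8-GPU baseline. The function name `scale_schedule` and its parameters are made up for this illustration and are not part of PaddleDetection.

```python
def scale_schedule(num_gpus, base_gpus=8, base_lr=0.01,
                   base_max_iters=180000, base_boundaries=(120000, 160000)):
    """Scale an 8-GPU static-graph schedule to `num_gpus` cards (linear scaling)."""
    factor = base_gpus / num_gpus  # e.g. 4.0 when going from 8 GPUs to 2
    return {
        "learning_rate": base_lr / factor,
        "max_iters": int(base_max_iters * factor),
        "boundaries": [int(b * factor) for b in base_boundaries],
    }


for gpus in (2, 4, 8):
    print(gpus, scale_schedule(gpus))
```

For dynamic-graph configurations only the `learning_rate` value would be needed, since the schedule there is expressed in epochs.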
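+**Q:**  How should `num_classes` in the configuration file be set for a custom dataset? </br>
+**A:**  In dynamic graphs, simply set `num_classes` to the number of classes in the custom dataset. In static graphs (under the static directory), YOLO-series and anchor-free models also set `num_classes` to the number of classes in the custom dataset. Other models, such as the RCNN series, SSD, RetinaNet, and SOLOv2, have to distinguish background boxes from foreground boxes during classification, so `num_classes` must be the number of dataset classes + 1, that is, with one extra background class.
+**Q:**  When training PP-YOLOv2 with `--eval` for validation during training, it hangs at the first eval. What should I do?</br>
+**A:**  If a PP-YOLO series model is trained from scratch with only pretrained backbone weights loaded, convergence is slow. When the model runs prediction before it has converged reasonably well, the predicted boxes are chaotic, so sorting and filtering during NMS take a very long time, which looks like a hang during eval. This usually happens with a custom dataset that has few samples, so that little training has been done by the first eval and the model has not converged well. Check the following three points.
+* The default configurations in PaddleDetection are generally for 8-GPU training, and `batch_size` in the configuration file is the per-GPU batch size. If you do not train on 8 GPUs or you change `batch_size`, scale the initial `learning_rate` down proportionally for better convergence
+* With a custom dataset that has few samples, increase `snapshot_epoch` so that more training has been done by the first eval and the model has converged reasonably well
+* When training on a custom dataset, you can fine-tune from our released weights trained on COCO or VOC to speed up convergence. Specify the pretrained weights with `-o pretrain_weights=xxx`, where xxx can be a model weight link published in the Model Zoo
+**Q:**  How can I better understand the reader and customize the reader file?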
+```
+# Number of reader processes per GPU
+worker_num: 2
+# Training data
+TrainReader:
+  inputs_def:
+    num_max_boxes: 50
+  # Per-sample transforms for the training data
+  sample_transforms:
+    - Decode: {} # Image decoding: converts the image data from numpy format to RGB; this OP is required
+    - Mixup: {alpha: 1.5, beta: 1.5} # Mixup augmentation: operates on gt_bbox/gt_score of two samples to build a virtual training sample; optional OP
+    - RandomDistort: {} # Random color distortion; optional OP
+    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} # Random canvas padding; optional OP
+    - RandomCrop: {} # Random cropping; optional OP
+    - RandomFlip: {} # Random horizontal flip, default probability 0.5; optional OP
+  # batch_transforms
+  batch_transforms:
+    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
+    - NormalizeBox: {}
+    - PadBox: {num_max_boxes: 50}
+    - BboxXYXY2XYWH: {}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+    - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
+  # Batch size used for training
+  batch_size: 24
+  # Whether to shuffle the data when reading
+  shuffle: true
+  # Whether to drop the last samples that cannot form a complete batch
+  drop_last: true
+  # mixup_epoch: a value greater than the maximum epoch means Mixup is used throughout training. The default is -1, which disables Mixup. If you delete the "- Mixup: {alpha: 1.5, beta: 1.5}" line, you must also set mixup_epoch to -1 or delete it
+  mixup_epoch: 25000
+  # Whether to speed up data loading through shared memory; the shared memory (for example /dev/shm) must be larger than 1G
+  use_shared_memory: true
+  # For single-scale training, remove the BatchRandomResize line from batch_transforms and append "- Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}" as the last line of sample_transforms
+  # Decode must be kept. To disable data augmentation, comment out or delete Mixup, RandomDistort, RandomExpand, RandomCrop, and RandomFlip. If Mixup is commented out or deleted, the mixup_epoch line must also be commented out or deleted, or set to -1 to disable Mixup
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
+```
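+**Q:**  How can users control which classes are output, that is, output only some of the classes when the image contains objects of several classes?
+**A:**  Users can modify the code themselves and add a filter condition.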
+```
+# filter by class_id
+keep_class_id = [1, 2]
+bbox_res = [e for e in bbox_res if int(e[0]) in keep_class_id]
+```
+https://github.com/PaddlePaddle/PaddleDetection/blob/b87a1ea86fa18ce69e44a17ad1b49c1326f19ff9/ppdet/engine/trainer.py#L438
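The filter above can be tried on its own with toy data. The sketch below wraps it in a hypothetical helper `filter_by_class`; the row layout `[class_id, score, x1, y1, x2, y2]` is an assumption made for the example, and only the class id in column 0 matters to the filter.

```python
def filter_by_class(bbox_res, keep_class_id):
    """Keep only detections whose class id (column 0) is in keep_class_id."""
    keep = set(keep_class_id)
    return [e for e in bbox_res if int(e[0]) in keep]


# Toy detections; the row layout [class_id, score, x1, y1, x2, y2] is assumed.
bbox_res = [
    [0, 0.91, 10, 20, 110, 220],
    [1, 0.80, 30, 40, 130, 240],
    [2, 0.55, 50, 60, 150, 260],
    [3, 0.42, 70, 80, 170, 280],
]
filtered = filter_by_class(bbox_res, keep_class_id=[1, 2])
print([int(e[0]) for e in filtered])
```

Only the rows for classes 1 and 2 remain; scores and box coordinates are passed through unchanged.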
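+**Q:**  After training on a custom dataset, the predicted labels are wrong
+**A:**  This usually happens because the anno_path of TestDataset was overlooked when the dataset paths were configured. Set anno_path to your own annotation path.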
+```
+TestDataset:
+  !ImageFolder
+    anno_path: annotations/instances_val2017.json
+```
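+**Q:** How do I print the FLOPs of the network?
+**A:** Set `print_flops: true` in `configs/runtime.yml` and install PaddleSlim (for example, `pip install paddleslim`); the model's FLOPs will then be printed.
+**Q:** How do I train with images that have no annotated boxes?
+**A:** Set `allow_empty: true` under TrainDataset in `configs/dataset/coco.py` or `configs/dataset/voc.py`; the dataset will then load images without annotated boxes for training. This feature supports the COCO and VOC data formats, and RCNN-series and YOLO-series models have been verified to train normally. Too many unannotated images hurt convergence, so you can set `empty_ratio: 0.1` under TrainDataset to randomly sample them and control their share of the total data. The default is 1., which uses all unannotated images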
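One way to read `empty_ratio` is as the target share of unannotated images in the final training set. The sketch below illustrates that reading only. The helper `sample_empty` and its sampling formula are assumptions made for this example and should not be taken as PaddleDetection's implementation.

```python
import random


def sample_empty(annotated, empty, empty_ratio, seed=0):
    """Subsample `empty` so it makes up about `empty_ratio` of the combined set."""
    if empty_ratio <= 0:
        return []
    if empty_ratio >= 1:
        return list(empty)  # 1.0 means: use every unannotated image
    # Solve n_empty / (n_annotated + n_empty) = empty_ratio for n_empty.
    target = round(len(annotated) * empty_ratio / (1.0 - empty_ratio))
    target = min(target, len(empty))
    return random.Random(seed).sample(list(empty), target)


annotated = [f"img_{i}.jpg" for i in range(90)]
empty = [f"bg_{i}.jpg" for i in range(50)]
kept = sample_empty(annotated, empty, empty_ratio=0.1)
print(len(kept), len(kept) / (len(annotated) + len(kept)))
```

With 90 annotated images and a ratio of 0.1, this keeps 10 of the 50 unannotated images, so they form one tenth of the 100 training samples.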
--- a/docs/tutorials/FAQ/README.md
+++ b/docs/tutorials/FAQ/README.md
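+# FAQ
+**PaddleDetection** thanks all developers for raising usage questions and requests. Based on these questions we compile a **FAQ** collection and update it **every Monday**. Past issues are listed below.
+- [FAQ: Issue 0](./FAQ第零期.md)
+- [FAQ: Issue 1](./FAQ第一期.md)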
--- a/docs/tutorials/GETTING_STARTED.md
+++ b/docs/tutorials/GETTING_STARTED.md
+English | [简体中文](GETTING_STARTED_cn.md)
+# Getting Started
+## Installation
+For setting up the running environment, please refer to the [installation
+instructions](INSTALL.md).
+## Data preparation
+- Please refer to [PrepareDetDataSet](./data/PrepareDetDataSet_en.md) for data preparation
+- Please set the data path in the corresponding dataset configuration file under `configs/datasets`
+## Training & Evaluation & Inference
+PaddleDetection provides scripts for training, evaluation, and inference, with various features selected through the configuration. For details on distributed training, see [DistributedTraining](./DistributedTraining_en.md).
+```bash
+# training on single-GPU
+export CUDA_VISIBLE_DEVICES=0
+python tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
+# training on multi-GPU
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
+# training on multi-machines and multi-GPUs
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+fleetrun --ips="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151" --selected_gpu 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
+# GPU evaluation
+export CUDA_VISIBLE_DEVICES=0
+python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams
+# Inference
+python tools/infer.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --infer_img=demo/000000570688.jpg -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams
+```
+### Other argument list
+The arguments below can also be listed with `--help`.
+|         FLAG             |  script supported  |    description    |     default     |      remark      |
+| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
+|          -c              |      ALL       |  Select config file  |  None  |  **required**, such as `-c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml` |
+|          -o              |      ALL       |  Set parameters in configure file  |  None  |  `-o` takes priority over the file given by `-c`, for example `-o use_gpu=False`  |
+|        --eval            |     train      |  Whether to perform evaluation in training  |  False  |  set `--eval` if needed  |
+|   -r/--resume_checkpoint |     train      |  Checkpoint path for resuming training  |  None  |  such as `-r output/faster_rcnn_r50_1x_coco/10000`  |
+|      --slim_config     |     ALL |  Configure file of slim method  |  None  |  such as `--slim_config configs/slim/prune/yolov3_prune_l1_norm.yml`  |
+|        --use_vdl          |   train/infer   |  Whether to record the data with [VisualDL](https://github.com/paddlepaddle/visualdl), so as to display in VisualDL  |  False  |  VisualDL requires Python>=3.5   |
+|        --vdl\_log_dir     |   train/infer   |  VisualDL logging directory for image  |  train:`vdl_log_dir/scalar` infer: `vdl_log_dir/image`  |  VisualDL requires Python>=3.5   |
+|      --output_eval       |   eval |  Directory for storing the evaluation output  | None  |   such as `--output_eval=eval_output`, default is current directory  |
+|       --json_eval        |       eval     |  Whether to evaluate with already existed bbox.json or mask.json  |  False  |  set `--json_eval` if needed and json path is set in `--output_eval`  |
+|      --classwise         |       eval     |  Whether to eval AP for each class and draw PR curve  |  False  |  set `--classwise` if needed  |
+|       --output_dir       |      infer     |  Directory for storing the output visualization files  |  `./output`  |  such as `--output_dir output`  |
+|    --draw_threshold      |      infer     |  Threshold to reserve the result for visualization  |  0.5  |   such as `--draw_threshold 0.7`  |
+|      --infer_dir         |       infer     |  Directory for images to perform inference on  |  None  | One of `infer_dir` and `infer_img` is required  |
+|      --infer_img         |       infer     |  Image path  |  None  | One of `infer_dir` and `infer_img` is required; `infer_img` has higher priority than `infer_dir`  |
+|      --save_results         |       infer     |  Whether to save detection results to file      |  False | Optional  |
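The priority of `-o` over the file given by `-c` can be pictured as a dictionary merge in which command-line values win. The sketch below is a simplified illustration: `apply_overrides` is a hypothetical helper, it handles flat `key=value` pairs only, and it is not PaddleDetection's actual option parser.

```python
import ast


def apply_overrides(config, overrides):
    """Return a copy of `config` with `key=value` overrides applied on top."""
    merged = dict(config)
    for item in overrides:
        key, _, raw = item.partition("=")
        try:
            value = ast.literal_eval(raw)  # numbers, True/False, lists, ...
        except (ValueError, SyntaxError):
            value = raw  # plain strings such as paths or URLs
        merged[key.strip()] = value
    return merged


file_config = {"use_gpu": True, "weights": "output/model_final"}
merged = apply_overrides(file_config, ["use_gpu=False"])
print(merged)
```

Keys that are not overridden keep the value from the configuration file, and the original dictionary is left untouched.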
+## Examples
+### Training
+- Perform evaluation in training
+  ```bash
+  export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+  python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --eval
+  ```
+  Training and evaluation alternate, with an evaluation at the end of each epoch. The model with the highest mAP so far is saved after each evaluation, in the same directory as `model_final`.
+  If the evaluation dataset is large, we suggest modifying `snapshot_epoch` in `configs/runtime.yml` to evaluate less often, or evaluating after training.
+- Fine-tune other task
+  When fine-tuning a pre-trained model on another task, `pretrain_weights` can be used directly. Parameters whose shapes do not match are ignored automatically. For example:
+  ```bash
+  export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+  # If the shape of parameters in program is different from pretrain_weights,
+  # then PaddleDetection will not use such parameters.
+  python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+                           -o pretrain_weights=output/faster_rcnn_r50_1x_coco/model_final
+  ```
+##### NOTES
+- `CUDA_VISIBLE_DEVICES` selects which GPUs are used, for example `export CUDA_VISIBLE_DEVICES=0,1,2,3`.
+- The dataset is downloaded automatically and cached in `~/.cache/paddle/dataset` if it is not found locally.
+- Pretrained models are downloaded automatically and cached in `~/.cache/paddle/weights`.
+- Checkpoints are saved in `output` by default; this can be changed through `save_dir` in `configs/runtime.yml`.
+### Evaluation
+- Evaluate by specified weights path and dataset path
+  ```bash
+  export CUDA_VISIBLE_DEVICES=0
+  python -u tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+                          -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams
+  ```
+  The model to be evaluated can be given either as a local path or as a link from [MODEL_ZOO](../MODEL_ZOO_cn.md).
+- Evaluate with json
+  ```bash
+  export CUDA_VISIBLE_DEVICES=0
+  python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+             --json_eval \
+             --output_eval evaluation/
+  ```
+  The json file must be named bbox.json or mask.json, placed in the `evaluation/` directory.
+### Inference
+- Output specified directory && Set up threshold
+  ```bash
+  export CUDA_VISIBLE_DEVICES=0
+  python tools/infer.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+                      --infer_img=demo/000000570688.jpg \
+                      --output_dir=infer_output/ \
+                      --draw_threshold=0.5 \
+                      -o weights=output/faster_rcnn_r50_fpn_1x_coco/model_final \
+                      --use_vdl=True
+  ```
+  `--draw_threshold` is an optional argument. Default is 0.5.
+  Different thresholds will produce different results depending on the calculation of [NMS](https://ieeexplore.ieee.org/document/1699659).
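The effect of `--draw_threshold` can be pictured as a score filter applied before drawing. The sketch below uses made-up detections and a hypothetical helper `filter_for_drawing`; it assumes that detections scoring at or above the threshold are kept.

```python
def filter_for_drawing(detections, draw_threshold=0.5):
    """Keep detections whose score reaches the visualization threshold."""
    return [d for d in detections if d["score"] >= draw_threshold]


# Made-up detections for illustration only.
detections = [
    {"label": "person", "score": 0.92},
    {"label": "dog", "score": 0.61},
    {"label": "kite", "score": 0.34},
]
for threshold in (0.5, 0.7):
    kept = filter_for_drawing(detections, threshold)
    print(threshold, [d["label"] for d in kept])
```

Raising the threshold from 0.5 to 0.7 removes the lower-scoring box from the visualization while leaving the model's raw predictions unchanged.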
+## Deployment
+Please refer to [deployment](../../deploy/README_en.md)
+## Model Compression
+Please refer to [slim](../../configs/slim/README_en.md)
--- a/docs/tutorials/GETTING_STARTED_cn.md
+++ b/docs/tutorials/GETTING_STARTED_cn.md
+[English](GETTING_STARTED.md) | 简体中文
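+# Get Started with PaddleDetection in 30 Minutes
+PaddleDetection is a mature object detection development kit that covers the full workflow from data preparation, model training, and model evaluation to model export and deployment. In this chapter we use a road sign detection dataset to walk through getting started with PaddleDetection.
+## 1 Installation
+To install and configure the running environment, see the [installation guide](INSTALL_cn.md)
+This walkthrough assumes that the PaddleDetection code has been cloned into the `/home/paddle` directory. All commands are run from `/home/paddle/PaddleDetection`
+## 2 Data preparation
+PaddleDetection currently supports four data formats: COCO, VOC, WiderFace, and MOT.
+- First prepare the data following the [data preparation guide](./data/PrepareDetDataSet.md).
+- Then set the data paths in the corresponding COCO, VOC, or other dataset configuration file under `configs/datasets`.
+- In this project we use the road sign recognition dataset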
+ ```bash
+python dataset/roadsign_voc/download_roadsign_voc.py
+```
+- After download, the data is laid out as follows
+```
+  ├── download_roadsign_voc.py
+  ├── annotations
+  │   ├── road0.xml
+  │   ├── road1.xml
+  │   |   ...
+  ├── images
+  │   ├── road0.png
+  │   ├── road1.png
+  │   |   ...
+  ├── label_list.txt
+  ├── train.txt
+  ├── valid.txt
+```
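+## 3 Configuration file changes and notes
+We train with the `configs/yolov3/yolov3_mobilenet_v1_roadsign` configuration.
+In the static-graph version, a model could usually be defined with two configuration files (a main configuration file and a reader configuration). Versions from PaddleDetection 2.0 onward use a decoupled, modular design: users assemble a detector by combining configuration modules and can freely modify and override each module's settings, as shown in the figure below
+<center>
+<img src="../images/roadsign_yml.png" width="500" >
+</center>
+<br><center>Configuration file summary</center></br>
+As the figure shows, the `yolov3_mobilenet_v1_roadsign.yml` configuration depends on other configuration files. In this example it depends on: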
+```bash
+  roadsign_voc.yml
+  runtime.yml
+  optimizer_40e.yml
+  yolov3_mobilenet_v1.yml
+  yolov3_reader.yml
+--------------------------------------
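+yolov3_mobilenet_v1_roadsign: the entry file
+roadsign_voc: mainly specifies the paths of the training and validation data
+runtime.yml: mainly specifies common runtime parameters, such as whether to use the GPU and how many epochs pass between checkpoints
+optimizer_40e.yml: mainly specifies the learning rate and optimizer configuration
+yolov3_mobilenet_v1.yml: mainly describes the model and the backbone network
+yolov3_reader.yml: mainly specifies the data reader configuration, such as batch size and the number of concurrent loading subprocesses, together with post-read preprocessing such as resize and data augmentation
+```
+<center><img src="../images/yaml_show.png" width="1000" ></center>
+<br><center>Configuration file structure</center></br>
+### Notes on modifying the configuration files
+* Changing the data paths
+When modifying the configuration files, setting up a custom dataset is a key step. For how to define a dataset, see [How to customize a dataset](https://aistudio.baidu.com/aistudio/projectdetail/1917140)
+* The default learning rate is tuned for multi-GPU training (8x GPU). When training on a single GPU, adjust the learning rate accordingly (for example, divide it by 8)
+* For more usage questions, see the [FAQ](FAQ)
+## 4 Training
+PaddleDetection provides single-GPU and multi-GPU training modes to cover different training needs
+* Single-GPU training
+```bash
+export CUDA_VISIBLE_DEVICES=0 # not needed on Windows and Mac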
+python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml
+```
+* Multi-GPU training
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 # not needed on Windows and Mac
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml
+```
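+* [Multi-machine, multi-GPU training](./DistributedTraining_cn.md)
+```bash
+fleetrun \
+--ips="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151" \
+--selected_gpu 0,1,2,3,4,5,6,7 \
+tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml
+```
+* Fine-tuning on other tasks
+  When fine-tuning a pretrained model on another task, the pretrained model can be loaded directly; parameters whose shapes do not match are ignored automatically. For example: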
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+  # Parameters whose shape in the model differs from the loaded weights are not loaded
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o pretrain_weights=output/model_final
+```
+* Resuming training
+  If training is interrupted for any reason, it can be resumed with the `-r` option
+```bash
+export CUDA_VISIBLE_DEVICES=0 # not needed on Windows and Mac
+python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -r output/faster_rcnn_r50_1x_coco/10000
+ ```
+## 5 Evaluation
+* By default, models produced during training are saved in the `output` folder under the current directory
+ ```bash
+export CUDA_VISIBLE_DEVICES=0 # not needed on Windows and Mac
+python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_roadsign.pdparams
+```
+* Evaluating while training
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 # not needed on Windows and Mac
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval
+```
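+  Evaluation alternates with training and starts after each training epoch finishes. After every evaluation, the model with the best mAP is saved to the `best_model` folder.
+  If the validation set is large, evaluation is time-consuming. Consider adjusting `snapshot_epoch` in `configs/runtime.yml` to evaluate less often, or evaluate after training finishes.
+- Evaluating from a json file
+```bash
+export CUDA_VISIBLE_DEVICES=0 # not needed on Windows and Mac
+python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
+             --json_eval \
+             --output_eval evaluation/
+```
+* The command above has no option for loading a model, so the default `weights` setting from the configuration file is used; `weights` refers to the model file saved from the last round of training
+* The json file must be named bbox.json or mask.json and placed in the `evaluation` directory.
+## 6 Inference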
+  ```bash
+  python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --infer_img=demo/000000570688.jpg -o weights=https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_roadsign.pdparams
+  ```
+ * Inference with custom parameters
+  ```bash
+  export CUDA_VISIBLE_DEVICES=0 # not needed on Windows and Mac
+  python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
+                      --infer_img=demo/road554.png \
+                      --output_dir=infer_output/ \
+                      --draw_threshold=0.5 \
+                      -o weights=output/yolov3_mobilenet_v1_roadsign/model_final \
+                      --use_vdl=True
+  ```
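+  `--draw_threshold` is an optional argument. Depending on the [NMS](https://ieeexplore.ieee.org/document/1699659) computation, different thresholds produce different results
+  `keep_top_k` sets the maximum number of output objects; the default is 100, and users can adjust it to their needs.
+The result looks like this: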
+![road554 image](../images/road554.png)
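+## 7 Training visualization
+PaddleDetection integrates the VisualDL visualization tool so that users can follow the state of training in real time. When the `use_vdl` switch is turned on, the recorded data includes:
+1. the loss trend
+2. the mAP trend
+```bash
+export CUDA_VISIBLE_DEVICES=0 # not needed on Windows and Mac
+python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
+                        --use_vdl=true \
+                        --vdl_log_dir=vdl_dir/scalar
+```
+Start VisualDL with the following command to view the logs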
+```shell
+# The command below starts a service on 127.0.0.1 that can be viewed through a web front end; use the --host argument to specify the actual IP address
+visualdl --logdir vdl_dir/scalar/
+```
+Enter the printed URL in a browser. The result looks like this:
+<center><img src="https://ai-studio-static-online.cdn.bcebos.com/ab767a202f084d1589f7d34702a75a7ef5d0f0a7e8c445bd80d54775b5761a8d" width="900" ></center>
+<br><center>Figure: VDL demo</center></br>
+**Argument list**
+The list below can also be viewed with `--help`
+|         FLAG             |     Script supported    |        Description        |      Default       |         Remark         |
+| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
+|          -c              |      ALL       |  Select the configuration file  |  None  |  **required**, for example `-c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml` |
+|          -o              |      ALL       |  Set or override parameters in the configuration file  |  None  |  Takes priority over the file given by `-c`, for example `-o use_gpu=False`  |
+|        --eval            |     train      |  Whether to evaluate during training  |  False  |  Pass `--eval` to enable |
+|   -r/--resume_checkpoint |     train      |  Checkpoint path for resuming training  |  None  |  For example `-r output/faster_rcnn_r50_1x_coco/10000`  |
+|       --slim_config             |     ALL      |  Configuration file of the model compression strategy  |  None  |  For example `--slim_config configs/slim/prune/yolov3_prune_l1_norm.yml`  |
+|        --use_vdl          |   train/infer   |  Whether to record data with [VisualDL](https://github.com/paddlepaddle/visualdl) for display in the VisualDL panel  |  False  |  VisualDL requires Python>=3.5   |
+|        --vdl\_log_dir     |   train/infer   |  Directory where VisualDL records are stored  |  train:`vdl_log_dir/scalar` infer: `vdl_log_dir/image`  |  VisualDL requires Python>=3.5   |
+|      --output_eval       |   eval |  Directory for saving the json output of evaluation  | None  |  For example `--output_eval=eval_output`; defaults to the current directory  |
+|       --json_eval        |       eval     |  Whether to evaluate from an existing bbox.json or mask.json  |  False  |  Pass `--json_eval` to enable; the json path is set with `--output_eval`  |
+|      --classwise         |       eval     |  Whether to evaluate per-class AP and draw per-class PR curves  |  False  |  Pass `--classwise` to enable |
+|       --output_dir       |      infer/export_model     |  Directory for prediction results or the exported model  |  `./output`  |  For example `--output_dir=output`  |
+|    --draw_threshold      |      infer     |  Score threshold for visualization  |  0.5  |  For example `--draw_threshold=0.7`  |
+|      --infer_dir         |       infer     |  Directory of images to run inference on  |  None  |    At least one of `--infer_img` and `--infer_dir` must be set |
+|      --infer_img         |       infer     |  Path of the image to run inference on  |  None  |  At least one of `--infer_img` and `--infer_dir` must be set; `infer_img` takes priority  |
+|      --save_results         |       infer     |  Whether to save the prediction results of the images to a file        |  False  |  Optional  |
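+## 8 Model export
+The model files saved during training contain both the forward prediction and the backpropagation logic. Backpropagation is not needed in industrial deployment, so the model has to be exported to the format required for deployment.
+PaddleDetection provides the `tools/export_model.py` script for exporting models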
+```bash
+python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --output_dir=./inference_model \
+ -o weights=output/yolov3_mobilenet_v1_roadsign/best_model
+```
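+The inference model is exported to the `inference_model/yolov3_mobilenet_v1_roadsign` directory as `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`. If no folder is specified, the model is exported to `output_inference`
+* For more on model export, see the [model export documentation](../../deploy/EXPORT_MODEL.md)
+## 9 Model compression
+To optimize models further, PaddleDetection provides complete tutorials and benchmarks for model compression based on PaddleSlim. Currently supported approaches:
+* Pruning
+* Quantization
+* Distillation
+* Combined strategies
+* For more on model compression, see the [model compression documentation](../../configs/slim/README.md).
+## 10 Inference deployment
+PaddleDetection offers several deployment options, including PaddleInference, PaddleServing, and PaddleLite, supports server, mobile, embedded, and other platforms, and provides complete Python and C++ deployment solutions.
+* Here we use Python as an example to show how to deploy a model with PaddleInference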
+```bash
+python deploy/python/infer.py --model_dir=./output_inference/yolov3_mobilenet_v1_roadsign --image_file=demo/road554.png --device=GPU
+```
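+* `infer.py` also provides a rich set of options, so users can run prediction on video files or a camera. For more, see [Python inference deployment](../../deploy/python)
+### Deployment options supported by PaddleDetection
+|Option|Language|Tutorial|Device/Platform|
+|-|-|-|-|
+|PaddleInference|Python|Complete|Linux (ARM, x86), Windows|
+|PaddleInference|C++|Complete|Linux (ARM, x86), Windows|
+|PaddleServing|Python|Complete|Linux (ARM, x86), Windows|
+|PaddleLite|C++|Complete|Android, iOS, FPGA, RK...|
+* For more on inference deployment, see the [inference deployment documentation](../../deploy/README.md).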
--- a/docs/tutorials/INSTALL.md
+++ b/docs/tutorials/INSTALL.md
+English | [简体中文](INSTALL_cn.md)
+# Installation
+This document covers how to install PaddleYOLO and its dependencies
+(including PaddlePaddle), together with COCO and Pascal VOC dataset.
+For general information about PaddleYOLO, please see [README.md](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop).
+## Requirements:
+- PaddlePaddle 2.3.2
+- OS 64 bit
+- Python 3(3.5.1+/3.6/3.7/3.8/3.9), 64 bit
+- pip/pip3(9.0.1+), 64 bit
+- CUDA >= 10.2
+- cuDNN >= 7.6
+Dependency of PaddleYOLO and PaddlePaddle:
+| PaddleYOLO version | PaddlePaddle version  |    tips    |
+| :----------------: | :---------------: | :-------: |
+|    develop           |       >= 2.3.2   |     Dygraph mode is set as default    |
+|    release/2.6       |       >= 2.3.2   |     Dygraph mode is set as default    |
+|    release/2.5       |       >= 2.2.2   |     Dygraph mode is set as default    |
+## Instruction
+### 1. Install PaddlePaddle
+```
+# CUDA10.2
+python -m pip install paddlepaddle-gpu==2.3.2 -i https://mirror.baidu.com/pypi/simple
+# CPU
+python -m pip install paddlepaddle==2.3.2 -i https://mirror.baidu.com/pypi/simple
+```
+- For quick installation with other CUDA versions or environments, please refer to the [PaddlePaddle Quick Installation document](https://www.paddlepaddle.org.cn/install/quick)
+- For other installation methods, such as conda or compiling from source, please refer to the [installation document](https://www.paddlepaddle.org.cn/documentation/docs/en/install/index_en.html)
+Please make sure that PaddlePaddle is installed successfully and that its version is not lower than the required version. Use the following commands to verify.
+```
+# check
+>>> import paddle
+>>> paddle.utils.run_check()
+# confirm the paddle's version
+python -c "import paddle; print(paddle.__version__)"
+```
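The "not lower than the required version" check can also be scripted. The sketch below compares version strings numerically rather than as text, so that for example `2.10.0` counts as newer than `2.3.2`. The helpers `version_tuple` and `meets_minimum` are hypothetical; in practice the installed version string would come from `paddle.__version__`.

```python
def version_tuple(version):
    """Turn '2.3.2' into (2, 3, 2); non-numeric suffixes such as 'rc0' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed, required="2.3.2"):
    """Return True if `installed` is at least `required`."""
    return version_tuple(installed) >= version_tuple(required)


print(meets_minimum("2.4.1"))
print(meets_minimum("2.2.2"))
```

The default of `2.3.2` matches the requirement listed above; pass a different `required` value for other release branches.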
+**Note**
+1.  To use PaddleDetection on multiple GPUs, please install NCCL first.
+### 2. Install PaddleDetection
+**Note:** Installing via pip only supports Python3
+```
+# Clone PaddleDetection repository
+cd <path/to/clone/PaddleDetection>
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+# Install other dependencies
+cd PaddleDetection
+pip install -r requirements.txt
+# Compile and install paddledet
+python setup.py install
+```
+**Note**
+1. On Windows, installing `pycocotools` may fail because the original version of cocoapi does not support Windows. An alternative version, which supports only Python3, can be used instead:
+    ```pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI```
+2. With Python <= 3.6, installing `pycocotools` may fail with an error like `distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('cython>=0.27.3')`. In that case install `cython` first, for example `pip install cython`
+After installation, make sure the tests pass:
+```shell
+python ppdet/modeling/tests/test_architectures.py
+```
+If the tests pass, output like the following is printed:
+```
+.......
+----------------------------------------------------------------------
+Ran 7 tests in 12.816s
+OK
+```
+## Use built Docker images
+> If you do not have a Docker environment, please refer to [Docker](https://www.docker.com/).
+We provide docker images containing the latest PaddleDetection code, with all environment and package dependencies pre-installed. All you have to do is **pull and run the docker image**. Then you can use PaddleDetection without any extra steps.
+These images and usage guidance are available on [docker hub](https://hub.docker.com/repository/docker/paddlecloud/paddledetection), with CPU, GPU, and ROCm versions.
+If you have custom requirements for automatically building docker images, see the GitHub repo [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton).
+## Inference demo
+**Congratulations!** You have now installed PaddleDetection successfully. Try our inference demo:
+```
+# Predict an image by GPU
+export CUDA_VISIBLE_DEVICES=0
+python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg
+```
+An image of the same name with the predicted result will be generated under the `output` folder.
+The result is shown below:
+![](../images/000000014439.jpg)
--- a/docs/tutorials/INSTALL_cn.md
+++ b/docs/tutorials/INSTALL_cn.md
+[English](INSTALL.md) | 简体中文
+# Installation
+## Requirements
+- PaddlePaddle 2.3.2
+- OS: 64-bit operating system
+- Python 3 (3.5.1+/3.6/3.7/3.8/3.9), 64-bit version
+- pip/pip3 (9.0.1+), 64-bit version
+- CUDA >= 10.2
+- cuDNN >= 7.6
+PaddleYOLO depends on the following PaddlePaddle versions:
+|  PaddleYOLO version  | PaddlePaddle version  |    Note    |
+| :------------------: | :---------------: | :-------: |
+|    develop           |       >= 2.3.2    |     Dynamic graph mode by default    |
+|    release/2.6       |       >= 2.3.2    |     Dynamic graph mode by default    |
+|    release/2.5       |       >= 2.2.2    |     Dynamic graph mode by default    |
+## Installation Instructions
+### 1. Install PaddlePaddle
+```
+# CUDA10.2
+python -m pip install paddlepaddle-gpu==2.3.2 -i https://mirror.baidu.com/pypi/simple
+# CPU
+python -m pip install paddlepaddle==2.3.2 -i https://mirror.baidu.com/pypi/simple
+```
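+- For other CUDA versions or environments, see the [PaddlePaddle quick installation guide](https://www.paddlepaddle.org.cn/install/quick)
+- For other installation methods such as conda or building from source, see the [PaddlePaddle installation documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/index_cn.html)
+Make sure that PaddlePaddle is installed successfully and that its version is not lower than the required version. Use the following commands to verify.
+```
+# Check in your Python interpreter that PaddlePaddle is installed successfully
+>>> import paddle
+>>> paddle.utils.run_check()
+# Check the PaddlePaddle version
+python -c "import paddle; print(paddle.__version__)"
+```
+**Note**
+1. If you want to use PaddleDetection in a multi-GPU environment, install NCCL first
+### 2. Install PaddleDetection
+**Note:** Installation via pip only supports Python 3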
+```
+# Clone the PaddleDetection repository
+cd <path/to/clone/PaddleDetection>
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+# Install other dependencies
+cd PaddleDetection
+pip install -r requirements.txt
+# Compile and install paddledet
+python setup.py install
+```
+**Note**
+1. If downloading the code from GitHub is slow, try [gitee](https://gitee.com/PaddlePaddle/PaddleDetection.git) or a [proxy accelerator](https://doc.fastgit.org/zh-cn/guide.html).
+2. If you are using Windows, the `pycocotools` dependency may fail to install because the original cocoapi does not support Windows. A third-party version, which only supports Python 3, can be used instead:
+    ```pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI```
+3. If you are using Python <= 3.6, installing `pycocotools` may fail with the error `distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('cython>=0.27.3')`. Installing `cython` first, for example with `pip install cython`, resolves this.
+After installation, make sure the tests pass:
+```
+python ppdet/modeling/tests/test_architectures.py
+```
+If the tests pass, the following output is printed:
+```
+.......
+----------------------------------------------------------------------
+Ran 7 tests in 12.816s
+OK
+```
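+## Use Docker Images
+> If you do not have a Docker environment, see the [Docker website](https://www.docker.com/) to install it.
+We provide Docker images that contain the latest PaddleDetection code, with all environment and library dependencies pre-installed. You only need to **pull the Docker image** and then **run the Docker image** to start using all features of PaddleDetection, with no extra steps.
+Get these images and the corresponding usage guides from [Docker Hub](https://hub.docker.com/repository/docker/paddlecloud/paddledetection), including CPU, GPU and ROCm versions.
+If you are interested in building Docker images automatically, or have custom requirements, visit [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton) to learn more.
+## Quick Experience
+**Congratulations!** You have installed PaddleDetection successfully. Next, try object detection with a quick demo:
+```
+# Predict an image on GPU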
+export CUDA_VISIBLE_DEVICES=0
+python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg
+```
+An image of the same name with the predicted result drawn on it will be generated in the `output` folder.
+The result is shown below:
+![](../images/000000014439.jpg)
--- a/docs/tutorials/QUICK_STARTED.md
+++ b/docs/tutorials/QUICK_STARTED.md
+English | [简体中文](QUICK_STARTED_cn.md)
+# Quick Start
+To help users try PaddleDetection and produce models quickly, this tutorial walks through a pipeline that obtains a decent object detection model by fine-tuning on a small dataset in about 10 minutes. In practical applications, users are advised to select a model configuration file that suits their specific needs.
+- **Set GPU**
+```bash
+export CUDA_VISIBLE_DEVICES=0
+```
+## Inference Demo with Pre-trained Models
+```
+# predict an image using PP-YOLO
+python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg
+```
+The result:
+![](../images/000000014439.jpg)
+## Data preparation
+The dataset is a [Kaggle dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) that includes 877 images in 4 categories: crosswalk, speedlimit, stop, trafficlight. It is divided into a training set (701 images) and a test set (176 images); [download link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
+```
+# Note: this command can be skipped;
+# the dataset will be downloaded automatically at the training stage.
+python dataset/roadsign_voc/download_roadsign_voc.py
+```
+## Training & Evaluation & Inference
+### 1. Training
+```
+# It takes about 10 minutes on a 1080Ti GPU and about 1 hour on CPU
+# -c sets the configuration file
+# -o overrides settings in the configuration file
+# --eval evaluates while training; a model named best_model.pdmodel with the best evaluation result is saved automatically
+python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval -o use_gpu=true
+```
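The `-o key=value` overrides described in the comments above can be pictured as a merge on top of the loaded YAML configuration. The following is a minimal sketch of that idea, not PaddleDetection's actual parser: `parse_overrides` and `merge` are hypothetical helper names, and the dotted-key form (`TrainReader.batch_size=8`) and the sample `config` dictionary are assumptions made for illustration.

```python
import ast


def parse_overrides(pairs):
    """Turn ["use_gpu=true", "TrainReader.batch_size=8"] into a nested dict.

    Hypothetical helper: dotted keys are assumed to address nested sections.
    """
    result = {}
    for pair in pairs:
        key, _, raw = pair.partition("=")
        lowered = raw.lower()
        if lowered in ("true", "false"):
            value = lowered == "true"
        else:
            try:
                # Numbers and lists become Python values.
                value = ast.literal_eval(raw)
            except (ValueError, SyntaxError):
                # Anything else (paths, URLs) stays a plain string.
                value = raw
        node = result
        *parents, leaf = key.split(".")
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


def merge(base, override):
    """Recursively merge override into a copy of base; override wins."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Illustrative configuration, standing in for the values loaded via -c.
config = {"use_gpu": False, "TrainReader": {"batch_size": 24, "shuffle": True}}
overrides = parse_overrides(["use_gpu=true", "TrainReader.batch_size=8"])
final = merge(config, overrides)
print(final)
```

Settings that are not overridden (here `shuffle`) keep the value from the configuration file, which is why `-o` is convenient for one-off changes such as switching devices.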
+To observe the loss curve in real time through VisualDL, add `--use_vdl=true` to the training command and set the log save path through `--vdl_log_dir`.
+**Note: VisualDL requires Python>=3.5**
+Please install [VisualDL](https://github.com/PaddlePaddle/VisualDL) first:
+```
+python -m pip install visualdl -i https://mirror.baidu.com/pypi/simple
+```
+```
+python -u tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
+                        --use_vdl=true \
+                        --vdl_log_dir=vdl_dir/scalar \
+                        --eval
+```
+View the change curve in real time through the visualdl command:
+```
+visualdl --logdir vdl_dir/scalar/ --host <host_IP> --port <port_num>
+```
+### 2. Evaluation
+```
+# Evaluates best_model by default
+# -c sets the configuration file
+# -o overrides settings in the configuration file
+python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true
+```
+The final mAP should be around 0.85. Because the dataset is small, the precision may vary slightly between training runs.
+### 3. Inference
+```
+# -c sets the configuration file
+# -o overrides settings in the configuration file
+# --infer_img sets the image path
+# After prediction, an image of the same name with the predicted result is generated in the output folder
+python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true --infer_img=demo/road554.png
+```
+The result is shown below:
+![](../images/road554.png)
--- a/docs/tutorials/QUICK_STARTED_cn.md
+++ b/docs/tutorials/QUICK_STARTED_cn.md
+[English](QUICK_STARTED.md) | 简体中文
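+# Quick Start
+To help users produce a model quickly and learn how to use PaddleDetection, this tutorial fine-tunes a pre-trained detection model on a small dataset. A model with good results can be produced in a short time. In real applications, users are advised to choose a suitable model configuration file for their needs.
+- **Set GPU**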
+```bash
+export CUDA_VISIBLE_DEVICES=0
+```
+## 1. Quick Experience
+```
+# Predict an image using the PP-YOLO model pre-trained on the COCO dataset
+python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg
+```
+The result is shown below:
+![demo image](../images/000000014439.jpg)
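+## 2. Data Preparation
+The dataset is based on the [Kaggle dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) and contains 877 images in 4 categories: crosswalk, speedlimit, stop, trafficlight.
+The data is divided into a training set of 701 images and a test set of 176 images, [download link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
+```
+# Note: this download step can be skipped; the dataset is downloaded automatically during training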
+python dataset/roadsign_voc/download_roadsign_voc.py
+```
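+## 3. Training, Evaluation and Inference
+### 1. Training
+```
+# Training with evaluation takes about 1 hour on CPU (use_gpu=false) and about 10 minutes on a 1080Ti GPU
+# -c specifies the configuration file to use
+# -o sets global variables in the configuration file (overriding its settings); here it enables the GPU
+# --eval evaluates while training; a model named model_final.pdparams is saved automatically at the end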
+python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval -o use_gpu=true
+```
+To observe the loss curve in real time through VisualDL, add `--use_vdl=true` to the training command and set the log save path through `--vdl_log_dir`.
+**Note: VisualDL requires Python>=3.5**
+First install [VisualDL](https://github.com/PaddlePaddle/VisualDL)
+```
+python -m pip install visualdl -i https://mirror.baidu.com/pypi/simple
+```
+```
+python -u tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
+                        --use_vdl=true \
+                        --vdl_log_dir=vdl_dir/scalar \
+                        --eval
+```
+View the curves in real time through the visualdl command:
+```
+visualdl --logdir vdl_dir/scalar/ --host <host_IP> --port <port_num>
+```
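+### 2. Evaluation
+```
+# Evaluation uses the model_final.pdparams saved during training by default
+# -c specifies the configuration file to use
+# -o sets global variables in the configuration file (overriding its settings)
+# Currently only single-GPU evaluation is supported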
+python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true
+```
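+The final model accuracy is around mAP=0.85. Because the dataset is small, the accuracy may fluctuate somewhat between training runs.
+### 3. Inference
+```
+# -c specifies the configuration file to use
+# -o sets global variables in the configuration file (overriding its settings)
+# --infer_img specifies the path of the image to predict
+# After prediction, an image of the same name with the predicted result drawn on it is generated in the output folder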
+python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true --infer_img=demo/road554.png
+```
+The result is shown below:
+![road554 image](../images/road554.png)
--- a/docs/tutorials/config_annotation/multi_scale_test_config.md
+++ b/docs/tutorials/config_annotation/multi_scale_test_config.md
+# Multi Scale Test Configuration
+Tags: Configuration
+---
+```yaml
+##################################### Multi scale test configuration #####################################
+EvalReader:
+  sample_transforms:
+  - Decode: {}
+  - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700 , 900]}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+TestReader:
+  sample_transforms:
+  - Decode: {}
+  - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700 , 900]}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+```
+---
+Multi-scale testing is a TTA (Test Time Augmentation) method that can improve object detection performance.
+The input image is resized to several scales, the model generates predictions (bboxes) at each scale, and finally all the predictions are combined into the final prediction. (Here **NMS** is used to aggregate the predictions.)
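To make the aggregation step concrete, here is a minimal sketch of greedy NMS applied to boxes pooled from two scales. It is an illustration only, not PaddleDetection's implementation: the helper names `iou` and `nms`, the 0.5 threshold, and the sample boxes are assumptions, and the boxes are assumed to be a single class and already mapped back to the original image coordinates.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (score, box) pairs; returns the kept pairs."""
    kept = []
    # Visit boxes from highest to lowest score and keep a box only if it
    # does not overlap too much with any box that was already kept.
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


# Hypothetical predictions for the same image at two test scales.
preds_scale_700 = [(0.90, [10, 10, 110, 110]), (0.40, [200, 200, 260, 260])]
preds_scale_900 = [(0.85, [12, 8, 112, 108]), (0.70, [205, 198, 262, 258])]

merged = nms(preds_scale_700 + preds_scale_900, iou_threshold=0.5)
print(merged)
```

Each object is detected once per scale, so pooling produces near-duplicate boxes; NMS keeps the highest-scoring box of each overlapping group.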
+## _MultiscaleTestResize_ option
+The `MultiscaleTestResize` option enables multi-scale test prediction.
+`origin_target_size: [800, 1333]` means the input image is first resized so that its short edge is 800, subject to the long edge not exceeding 1333.
+The `target_size: [700 , 900]` property specifies the different scales to test.
+It can be plugged into the evaluation or test (inference) process by adding a `MultiscaleTestResize` entry to `EvalReader.sample_transforms` or `TestReader.sample_transforms`.
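The short-edge and long-edge rule for `origin_target_size: [800, 1333]` can be worked through with a small calculation. This is a sketch under stated assumptions: `keep_ratio_size` is a hypothetical helper, the aspect ratio is assumed to be preserved, and rounding to the nearest pixel is a choice made here. How each entry of `target_size` is applied is not specified in this document, so the sketch covers only the first resize.

```python
def keep_ratio_size(height, width, short_target, long_max):
    """Return (new_height, new_width) for an aspect-preserving resize.

    The short edge is scaled toward short_target, but the scale is capped
    so that the long edge never exceeds long_max.
    """
    short_edge, long_edge = min(height, width), max(height, width)
    scale = min(short_target / short_edge, long_max / long_edge)
    return round(height * scale), round(width * scale)


# A 480x640 image: the short edge reaches 800 before the long edge hits 1333.
print(keep_ratio_size(480, 640, 800, 1333))   # (800, 1067)

# A very wide 500x2000 image: the long-edge cap of 1333 limits the scale.
print(keep_ratio_size(500, 2000, 800, 1333))  # (333, 1333)
```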
+---
+### Note
+Currently, only CascadeRCNN, FasterRCNN and MaskRCNN are supported for multi-scale testing, and the batch size must be 1.
\ No newline at end of file
--- a/docs/tutorials/config_annotation/multi_scale_test_config_cn.md
+++ b/docs/tutorials/config_annotation/multi_scale_test_config_cn.md
+# Multi Scale Test Configuration
+Tags: Configuration
+---
+```yaml
+##################################### Multi scale test configuration #####################################
+EvalReader:
+  sample_transforms:
+  - Decode: {}
+  - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700 , 900]}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+TestReader:
+  sample_transforms:
+  - Decode: {}
+  - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700 , 900]}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+```
+---
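+Multi-scale testing is a TTA (Test Time Augmentation) method that can improve object detection accuracy.
+The input image is first resized to several scales, the model then predicts on each of these scaled images, and finally the predictions from all scales are merged into the final prediction. (Here **NMS** is used to merge the predictions from different scales.)
+## _MultiscaleTestResize_ option
+The `MultiscaleTestResize` option enables multi-scale testing.
+`origin_target_size: [800, 1333]` means that the input image is first resized so that its short edge is 800 and its longest edge does not exceed 1333.
+`target_size: [700 , 900]` sets the different prediction scales.
+Multi-scale testing can be enabled in the evaluation or inference process by setting a `MultiscaleTestResize` entry in `EvalReader.sample_transforms` or `TestReader.sample_transforms`.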
+---
+### Note
+Currently, multi-scale testing only supports the CascadeRCNN, FasterRCNN and MaskRCNN networks, and the batch size must be 1.
\ No newline at end of file
--- a/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md
+++ b/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md
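+# YOLO series model parameter configuration tutorial
+Tag: Model parameter configuration
+Take `ppyolo_r50vd_dcn_1x_coco.yml` as an example. The model configuration consists of five sub-configuration files:
+- Data configuration file `coco_detection.yml`
+```yaml
+# Evaluation metric type
+metric: COCO
+# Number of categories in the dataset
+num_classes: 80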
+# TrainDataset
+TrainDataset:
+  !COCODataSet
+    # Image directory, relative to dataset_dir: os.path.join(dataset_dir, image_dir)
+    image_dir: train2017
+    # Annotation file path, relative to dataset_dir: os.path.join(dataset_dir, anno_path)
+    anno_path: annotations/instances_train2017.json
+    # Dataset directory
+    dataset_dir: dataset/coco
+    # data_fields
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+EvalDataset:
+  !COCODataSet
+    # Image directory, relative to dataset_dir: os.path.join(dataset_dir, image_dir)
+    image_dir: val2017
+    # Annotation file path, relative to dataset_dir: os.path.join(dataset_dir, anno_path)
+    anno_path: annotations/instances_val2017.json
+    # Dataset directory
+    dataset_dir: dataset/coco
+TestDataset:
+  !ImageFolder
+    # Annotation file path, relative to dataset_dir
+    anno_path: annotations/instances_val2017.json
+```
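+- Optimizer configuration file `optimizer_1x.yml`
+```yaml
+# Total number of training epochs
+epoch: 405
+# Learning rate settings
+LearningRate:
+  # Base learning rate; the default value is for 8-GPU training
+  base_lr: 0.01
+  # Learning rate adjustment strategy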
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    # Epochs at which the learning rate changes
+    milestones:
+    - 243
+    - 324
+  # Warmup
+  - !LinearWarmup
+    start_factor: 0.
+    steps: 4000
+# Optimizer
+OptimizerBuilder:
+  # Optimizer
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  # Regularization
+  regularizer:
+    factor: 0.0005
+    type: L2
+```
+- Data reader configuration file `ppyolo_reader.yml`
+```yaml
+# Number of reader worker processes per GPU
+worker_num: 2
+# Training data
+TrainReader:
+  inputs_def:
+    num_max_boxes: 50
+  # Training data transforms
+  sample_transforms:
+    - Decode: {}
+    - Mixup: {alpha: 1.5, beta: 1.5}
+    - RandomDistort: {}
+    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+    - RandomCrop: {}
+    - RandomFlip: {}
+  # batch_transforms
+  batch_transforms:
+    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
+    - NormalizeBox: {}
+    - PadBox: {num_max_boxes: 50}
+    - BboxXYXY2XYWH: {}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+    - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
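+  # Batch size during training
+  batch_size: 24
+  # Whether to shuffle the data
+  shuffle: true
+  # Whether to drop the last incomplete batch
+  drop_last: true
+  # mixup_epoch: a value greater than the total number of epochs means Mixup augmentation is used throughout training
+  mixup_epoch: 25000
+  # Whether to use shared memory to speed up data loading; make sure the shared memory size (e.g. /dev/shm) is greater than 1 GB
+  use_shared_memory: true
+# Evaluation data
+EvalReader:
+  # Evaluation data transforms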
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  # Batch size during evaluation
+  batch_size: 8
+  # Whether to drop samples without annotations
+  drop_empty: false
+# Test data
+TestReader:
+  inputs_def:
+    image_shape: [3, 608, 608]
+  # Test data transforms
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  # Batch size during testing
+  batch_size: 1
+```
+- Model configuration file `ppyolo_r50vd_dcn.yml`
+```yaml
+# Model architecture type
+architecture: YOLOv3
+# URL of the pretrained weights
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
+# norm_type
+norm_type: sync_bn
+# Whether to use EMA
+use_ema: true
+# ema_decay
+ema_decay: 0.9998
+# YOLOv3
+YOLOv3:
+  # backbone
+  backbone: ResNet
+  # neck
+  neck: PPYOLOFPN
+  # yolo_head
+  yolo_head: YOLOv3Head
+  # post_process
+  post_process: BBoxPostProcess
+# backbone
+ResNet:
+  # depth
+  depth: 50
+  # variant
+  variant: d
+  # return_idx; index 0 corresponds to res2
+  return_idx: [1, 2, 3]
+  # dcn_v2_stages
+  dcn_v2_stages: [3]
+  # freeze_at
+  freeze_at: -1
+  # freeze_norm
+  freeze_norm: false
+  # norm_decay
+  norm_decay: 0.
+# PPYOLOFPN
+PPYOLOFPN:
+  # Whether to use coord_conv
+  coord_conv: true
+  # Whether to use drop_block
+  drop_block: true
+  # block_size
+  block_size: 3
+  # keep_prob
+  keep_prob: 0.9
+  # Whether to use spp
+  spp: true
+# YOLOv3Head
+YOLOv3Head:
+  # anchors
+  anchors: [[10, 13], [16, 30], [33, 23],
+            [30, 61], [62, 45], [59, 119],
+            [116, 90], [156, 198], [373, 326]]
+  # anchor_masks
+  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+  # loss
+  loss: YOLOv3Loss
+  # Whether to use iou_aware
+  iou_aware: true
+  # iou_aware_factor
+  iou_aware_factor: 0.4
+# YOLOv3Loss
+YOLOv3Loss:
+  # ignore_thresh
+  ignore_thresh: 0.7
+  # downsample
+  downsample: [32, 16, 8]
+  # Whether to use label_smooth
+  label_smooth: false
+  # scale_x_y
+  scale_x_y: 1.05
+  # iou_loss
+  iou_loss: IouLoss
+  # iou_aware_loss
+  iou_aware_loss: IouAwareLoss
+# IouLoss
+IouLoss:
+  loss_weight: 2.5
+  loss_square: true
+# IouAwareLoss
+IouAwareLoss:
+  loss_weight: 1.0
+# BBoxPostProcess
+BBoxPostProcess:
+  decode:
+    name: YOLOBox
+    conf_thresh: 0.01
+    downsample_ratio: 32
+    clip_bbox: true
+    scale_x_y: 1.05
+  # NMS settings
+  nms:
+    name: MatrixNMS
+    keep_top_k: 100
+    score_threshold: 0.01
+    post_threshold: 0.01
+    nms_top_k: -1
+    background_label: -1
+```
+- Runtime configuration file `runtime.yml`
+```yaml
+# Whether to use GPU
+use_gpu: true
+# Logging interval (in iterations)
+log_iter: 20
+# save_dir
+save_dir: output
+# Model saving interval (in epochs)
+snapshot_epoch: 1
+```
--- a/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation_en.md
+++ b/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation_en.md
+# YOLO series model parameter configuration tutorial
+Tag: Model parameter configuration
+Take `ppyolo_r50vd_dcn_1x_coco.yml` as an example. The model configuration consists of five sub-configuration files:
+- Data configuration file `coco_detection.yml`
+```yaml
+# Evaluation metric type
+metric: COCO
+# The number of categories in the dataset
+num_classes: 80
+# TrainDataset
+TrainDataset:
+  !COCODataSet
+    # Image directory, relative to dataset_dir: os.path.join(dataset_dir, image_dir)
+    image_dir: train2017
+    # Annotation file path, relative to dataset_dir: os.path.join(dataset_dir, anno_path)
+    anno_path: annotations/instances_train2017.json
+    # Dataset directory
+    dataset_dir: dataset/coco
+    # data_fields
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+EvalDataset:
+  !COCODataSet
+    # Image directory, relative to dataset_dir: os.path.join(dataset_dir, image_dir)
+    image_dir: val2017
+    # Annotation file path, relative to dataset_dir: os.path.join(dataset_dir, anno_path)
+    anno_path: annotations/instances_val2017.json
+    # Dataset directory
+    dataset_dir: dataset/coco
+TestDataset:
+  !ImageFolder
+    # Annotation file path, relative to dataset_dir: os.path.join(dataset_dir, anno_path)
+    anno_path: annotations/instances_val2017.json
+```
+- Optimizer configuration file `optimizer_1x.yml`
+```yaml
+# Total number of training epochs
+epoch: 405
+# Learning rate settings
+LearningRate:
+  # Base learning rate; the default value is for 8-GPU training
+  base_lr: 0.01
+  # Learning rate adjustment strategy
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    # Epochs at which the learning rate changes
+    milestones:
+    - 243
+    - 324
+  # Warmup
+  - !LinearWarmup
+    start_factor: 0.
+    steps: 4000
+# Optimizer
+OptimizerBuilder:
+  # Optimizer
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  # Regularization
+  regularizer:
+    factor: 0.0005
+    type: L2
+```
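The schedule configured above combines a piecewise decay with a linear warmup. The sketch below shows one plausible reading of those settings rather than the library's exact code: `learning_rate` is a hypothetical function, the warmup is assumed to be counted in iterations and the milestones in epochs, and the two factors are assumed to multiply.

```python
def learning_rate(step, epoch, base_lr=0.01, gamma=0.1,
                  milestones=(243, 324), warmup_steps=4000, start_factor=0.0):
    """Learning rate at a given global step and epoch (illustrative only)."""
    # Piecewise decay: multiply by gamma once for each milestone reached.
    passed = sum(1 for milestone in milestones if epoch >= milestone)
    lr = base_lr * gamma ** passed
    # Linear warmup: ramp the factor from start_factor to 1 over warmup_steps.
    if step < warmup_steps:
        alpha = step / warmup_steps
        lr *= start_factor * (1.0 - alpha) + alpha
    return lr


print(learning_rate(step=0, epoch=0))         # start of warmup
print(learning_rate(step=2000, epoch=0))      # halfway through warmup
print(learning_rate(step=10**6, epoch=100))   # plateau at base_lr
print(learning_rate(step=10**6, epoch=243))   # after the first milestone
print(learning_rate(step=10**6, epoch=324))   # after the second milestone
```

With these values the rate rises from 0 to 0.01 over the first 4000 iterations, then drops by a factor of 10 at epoch 243 and again at epoch 324.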
+- Data reader configuration file `ppyolo_reader.yml`
+```yaml
+# Number of reader worker processes per GPU
+worker_num: 2
+# Training data
+TrainReader:
+  inputs_def:
+    num_max_boxes: 50
+  # Training data transforms
+  sample_transforms:
+    - Decode: {}
+    - Mixup: {alpha: 1.5, beta: 1.5}
+    - RandomDistort: {}
+    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+    - RandomCrop: {}
+    - RandomFlip: {}
+  # batch_transforms
+  batch_transforms:
+    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
+    - NormalizeBox: {}
+    - PadBox: {num_max_boxes: 50}
+    - BboxXYXY2XYWH: {}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+    - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
+  # Batch size during training
+  batch_size: 24
+  # Whether to shuffle the data
+  shuffle: true
+  # Whether to drop the last incomplete batch
+  drop_last: true
+  # mixup_epoch: a value greater than the total number of epochs means Mixup augmentation is used throughout training
+  mixup_epoch: 25000
+  # Whether to use shared memory to speed up data loading; make sure the shared memory size (e.g. /dev/shm) is greater than 1 GB
+  use_shared_memory: true
+# Evaluation data
+EvalReader:
+  # Evaluation data transforms
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  # Batch size during evaluation
+  batch_size: 8
+  # Whether to drop samples without annotations
+  drop_empty: false
+# Test data
+TestReader:
+  inputs_def:
+    image_shape: [3, 608, 608]
+  # Test data transforms
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  # Batch size during testing
+  batch_size: 1
+```
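The `NormalizeImage` entries in the reader above all use `is_scale: True` with the same mean and standard deviation. A minimal sketch of the per-pixel arithmetic follows, assuming that `is_scale` divides the 0-255 values by 255 before the mean is subtracted and the result divided by the standard deviation; `normalize_pixel` is a hypothetical helper working on a single RGB pixel rather than a whole image.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=MEAN, std=STD, is_scale=True):
    """Normalize one (R, G, B) pixel channel by channel."""
    # Assumed meaning of is_scale: map 0..255 to 0..1 first.
    values = [channel / 255.0 for channel in rgb] if is_scale else list(rgb)
    return [(value - m) / s for value, m, s in zip(values, mean, std)]


normalized = normalize_pixel([255, 128, 0])
print(normalized)
```

After this step each channel is roughly zero-centered with unit scale, which matches the statistics the pretrained backbone is usually trained with.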
+- Model configuration file `ppyolo_r50vd_dcn.yml`
+```yaml
+# Model architecture type
+architecture: YOLOv3
+# URL of the pretrained weights
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
+# norm_type
+norm_type: sync_bn
+# Whether to use EMA
+use_ema: true
+# ema_decay
+ema_decay: 0.9998
+# YOLOv3
+YOLOv3:
+  # backbone
+  backbone: ResNet
+  # neck
+  neck: PPYOLOFPN
+  # yolo_head
+  yolo_head: YOLOv3Head
+  # post_process
+  post_process: BBoxPostProcess
+# backbone
+ResNet:
+  # depth
+  depth: 50
+  # variant
+  variant: d
+  # return_idx; index 0 corresponds to res2
+  return_idx: [1, 2, 3]
+  # dcn_v2_stages
+  dcn_v2_stages: [3]
+  # freeze_at
+  freeze_at: -1
+  # freeze_norm
+  freeze_norm: false
+  # norm_decay
+  norm_decay: 0.
+# PPYOLOFPN
+PPYOLOFPN:
+  # Whether to use coord_conv
+  coord_conv: true
+  # Whether to use drop_block
+  drop_block: true
+  # block_size
+  block_size: 3
+  # keep_prob
+  keep_prob: 0.9
+  # Whether to use spp
+  spp: true
+# YOLOv3Head
+YOLOv3Head:
+  # anchors
+  anchors: [[10, 13], [16, 30], [33, 23],
+            [30, 61], [62, 45], [59, 119],
+            [116, 90], [156, 198], [373, 326]]
+  # anchor_masks
+  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+  # loss
+  loss: YOLOv3Loss
+  # Whether to use iou_aware
+  iou_aware: true
+  # iou_aware_factor
+  iou_aware_factor: 0.4
+# YOLOv3Loss
+YOLOv3Loss:
+  # ignore_thresh
+  ignore_thresh: 0.7
+  # downsample
+  downsample: [32, 16, 8]
+  # Whether to use label_smooth
+  label_smooth: false
+  # scale_x_y
+  scale_x_y: 1.05
+  # iou_loss
+  iou_loss: IouLoss
+  # iou_aware_loss
+  iou_aware_loss: IouAwareLoss
+# IouLoss
+IouLoss:
+  loss_weight: 2.5
+  loss_square: true
+# IouAwareLoss
+IouAwareLoss:
+  loss_weight: 1.0
+# BBoxPostProcess
+BBoxPostProcess:
+  decode:
+    name: YOLOBox
+    conf_thresh: 0.01
+    downsample_ratio: 32
+    clip_bbox: true
+    scale_x_y: 1.05
+  # NMS settings
+  nms:
+    name: MatrixNMS
+    keep_top_k: 100
+    score_threshold: 0.01
+    post_threshold: 0.01
+    nms_top_k: -1
+    background_label: -1
+```
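The `anchors`, `anchor_masks` and `downsample` values in the model configuration fit together as follows. This sketch assumes the usual YOLOv3 convention that the i-th mask belongs to the i-th downsample ratio; `anchors_per_level` and `grid_size` are hypothetical helpers, and the 608 input size is taken from the evaluation reader settings.

```python
ANCHORS = [[10, 13], [16, 30], [33, 23],
           [30, 61], [62, 45], [59, 119],
           [116, 90], [156, 198], [373, 326]]
ANCHOR_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
DOWNSAMPLE = [32, 16, 8]


def anchors_per_level(anchors, masks, strides):
    """Map each downsample ratio to the anchors selected by its mask."""
    return {stride: [anchors[i] for i in mask]
            for mask, stride in zip(masks, strides)}


def grid_size(input_size, stride):
    """Number of grid cells along one side of the output feature map."""
    return input_size // stride


levels = anchors_per_level(ANCHORS, ANCHOR_MASKS, DOWNSAMPLE)
for stride, level_anchors in levels.items():
    cells = grid_size(608, stride)
    print(f"stride {stride}: {cells}x{cells} grid, anchors {level_anchors}")
```

The coarsest feature map (stride 32) gets the three largest anchors and the finest one (stride 8) gets the three smallest, so large objects are matched on low-resolution maps and small objects on high-resolution maps.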
+- Runtime configuration file `runtime.yml`
+```yaml
+# Whether to use GPU
+use_gpu: true
+# Logging interval (in iterations)
+log_iter: 20
+# save_dir
+save_dir: output
+# Model saving interval (in epochs)
+snapshot_epoch: 1
+```