first init

Commit c6a27e0b authored Jan 07, 2025 by panhb
20 changed files
--- a/deploy/README.md
+++ b/deploy/README.md
+# PaddleDetection Inference Deployment
+PaddleDetection supports deployment through Paddle Inference, Paddle Serving, and Paddle-Lite on server, mobile, and embedded platforms, with complete Python and C++ deployment solutions.
+## Deployment options supported by PaddleDetection
+|Option|Language|Tutorial|Device/Platform|
+|-|-|-|-|
+|Paddle Inference|Python|Complete|Linux (ARM/x86), Windows|
+|Paddle Inference|C++|Complete|Linux (ARM/x86), Windows|
+|Paddle Serving|Python|Complete|Linux (ARM/x86), Windows|
+|Paddle-Lite|C++|Complete|Android, iOS, FPGA, RK...|
+## 1. Paddle Inference Deployment
+### 1.1 Export the model
+Use the `tools/export_model.py` script to export the model together with the configuration file used at deployment time, which is named `infer_cfg.yml`. The export command is:
+```bash
+# Export the YOLOv3 model
+python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=output/yolov3_mobilenet_v1_roadsign/best_model.pdparams
+```
+The inference model is exported to the `output_inference/yolov3_mobilenet_v1_roadsign` directory, which contains `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`.
+For details on model export, see the [PaddleDetection model export tutorial](EXPORT_MODEL.md).
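Before moving on to deployment, it can help to confirm that the export directory holds the four files listed above. The following is a minimal sketch, not part of PaddleDetection: the helper name `missing_export_files` is hypothetical, and the file list is taken only from the description in this section.

```python
from pathlib import Path

# File names taken from the export step described in this section.
EXPECTED_FILES = (
    "infer_cfg.yml",
    "model.pdiparams",
    "model.pdiparams.info",
    "model.pdmodel",
)


def missing_export_files(model_dir):
    """Return the expected export files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Directory name from the export example; adjust to your own model.
    missing = missing_export_files("output_inference/yolov3_mobilenet_v1_roadsign")
    print("missing files:", missing or "none")
```

An empty list means every expected file is present; anything else names the files to re-export.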
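+### 1.2 Run inference with Paddle Inference
+* Python deployment supports `CPU`, `GPU`, and `XPU` environments, Windows and Linux, and NV Jetson embedded devices. See [Python deployment](python/README.md).
+* C++ deployment supports `CPU`, `GPU`, and `XPU` environments, Windows and Linux, and NV Jetson embedded devices. See [C++ deployment](cpp/README.md).
+* PaddleDetection supports TensorRT acceleration. See the [TensorRT inference deployment tutorial](TENSOR_RT.md).
+**Note:** The Paddle inference library must be version 2.1 or later, and batch_size>1 is supported only for YOLOv3 and PP-YOLO.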
+## 2. PaddleServing Deployment
+### 2.1 Export the model
+To export a model in `PaddleServing` format, set `export_serving_model=True`:
+```bash
+python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=output/yolov3_mobilenet_v1_roadsign/best_model.pdparams --export_serving_model=True
+```
+The inference model is exported to the `output_inference/yolov3_mobilenet_v1_roadsign` directory, which contains `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel`, and the `serving_client/` and `serving_server/` folders.
+For details on model export, see the [PaddleDetection model export tutorial](EXPORT_MODEL.md).
+### 2.2 Run inference with PaddleServing
+* [Install PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md#installation)
+* [Use PaddleServing](./serving/README.md)
+## 3. PaddleLite Deployment
+- [Deploy PaddleDetection models with PaddleLite](./lite/README.md)
+- For detailed examples, see [Paddle-Lite-Demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo). For more information, see [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite).
+## 4. Third-party deployment (MNN, NCNN, OpenVINO)
+- Third-party deployment examples are provided for PicoDet and TinyPose; adapt them for other models.
+- Recommended tools for TinyPose: OpenVINO on Intel CPUs, PaddleInference on GPUs, and PaddleLite or MNN on ARM/Android.
+| Third_Engine | MNN  | NCNN  | OPENVINO   |
+| ------------ | ---- | ----- | ---------- |
+| PicoDet      | [PicoDet_MNN](./third_engine/demo_mnn/README.md)       | [PicoDet_NCNN](./third_engine/demo_ncnn/README.md) | [PicoDet_OPENVINO](./third_engine/demo_openvino/README.md)   |
+| TinyPose     | [TinyPose_MNN](./third_engine/demo_mnn_kpts/README.md) | -                                                  | [TinyPose_OPENVINO](./third_engine/demo_openvino_kpts/README.md) |
+## 5. Benchmark
+- Run the batch benchmark script on the exported model:
+```shell
+sh deploy/benchmark/benchmark.sh {model_dir} {model_name}
+```
+**Note:** For quantized models, use the `deploy/benchmark/benchmark_quant.sh` script.
+- Export the benchmark logs to Excel:
+```
+python deploy/benchmark/log_parser_excel.py --log_path=./output_pipeline --output_name=benchmark_excel.xlsx
+```
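+## 6. FAQ
+- Q1: Can a model trained with `Paddle 1.8.4` be deployed with `Paddle 2.0`?
+  Yes, because Paddle 2.0 is compatible with Paddle 1.8.4. The exception is models that rely on OPs newly added in Paddle 2.0 (such as SOLOv2), which are not supported.
+- Q2: On Windows, the inference library is built with VS2015. Will choosing VS2017 or VS2019 cause problems?
+  For Visual Studio compatibility, see [C++ binary compatibility between Visual Studio 2015, 2017, and 2019](https://docs.microsoft.com/zh-cn/cpp/porting/binary-compat-2015-2017?view=msvc-160).
+- Q3: Does continuous inference with cuDNN 8.0.4 leak memory?
+  QA testing found memory leaks during continuous inference across the cuDNN 8 series, and cuDNN 8 also performs worse than cuDNN 7. Deploying with CUDA + cuDNN 7.6.4 is recommended.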
--- a/deploy/README_en.md
+++ b/deploy/README_en.md
+# PaddleDetection Inference Deployment
+PaddleDetection supports deployment through Paddle Inference, Paddle Serving, and Paddle-Lite on server, mobile, and embedded platforms, with complete Python and C++ deployment solutions.
+## Deployment options supported by PaddleDetection
+| Option           | Language | Tutorial | Device/Platform           |
+| ---------------- | -------- | -------- | ------------------------- |
+| Paddle Inference | Python   | Complete | Linux (ARM/x86), Windows  |
+| Paddle Inference | C++      | Complete | Linux (ARM/x86), Windows  |
+| Paddle Serving   | Python   | Complete | Linux (ARM/x86), Windows  |
+| Paddle-Lite      | C++      | Complete | Android, iOS, FPGA, RK... |
+## 1. Paddle Inference Deployment
+### 1.1 Export the model
+Use the `tools/export_model.py` script to export the model together with the configuration file used at deployment time, which is named `infer_cfg.yml`. The export command is:
+```bash
+# Export the YOLOv3 model
+python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=output/yolov3_mobilenet_v1_roadsign/best_model.pdparams
+```
+The inference model is exported to the `output_inference/yolov3_mobilenet_v1_roadsign` directory, which contains `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`. For details on model export, see the [PaddleDetection model export tutorial](./EXPORT_MODEL_en.md).
+### 1.2 Run inference with Paddle Inference
+* Python deployment supports `CPU`, `GPU`, and `XPU` environments, Windows and Linux, and NV Jetson embedded devices. See [Python deployment](python/README.md).
+* C++ deployment supports `CPU`, `GPU`, and `XPU` environments, Windows and Linux, and NV Jetson embedded devices. See [C++ deployment](cpp/README.md).
+* PaddleDetection supports TensorRT acceleration. See the [TensorRT inference deployment tutorial](TENSOR_RT.md).
+**Note:** The Paddle inference library must be version 2.1 or later, and batch_size>1 is supported only for YOLOv3 and PP-YOLO.
+## 2. PaddleServing Deployment
+### 2.1 Export the model
+To export a model in `PaddleServing` format, set `export_serving_model=True`:
+```bash
+python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=output/yolov3_mobilenet_v1_roadsign/best_model.pdparams --export_serving_model=True
+```
+The inference model is exported to the `output_inference/yolov3_mobilenet_v1_roadsign` directory, which contains `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel`, and the `serving_client/` and `serving_server/` folders.
+For details on model export, see the [PaddleDetection model export tutorial](./EXPORT_MODEL_en.md).
+### 2.2 Run inference with PaddleServing
+* [Install PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md#installation)
+* [Use PaddleServing](./serving/README.md)
+## 3. PaddleLite Deployment
+- [Deploy the PaddleDetection model using PaddleLite](./lite/README.md)
+- For detailed examples, see [Paddle-Lite-Demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo). For more information, see [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite).
+## 4. Third-party deployment (MNN, NCNN, OpenVINO)
+- Third-party deployment examples are provided for PicoDet and TinyPose; adapt them for other models.
+- Recommended tools for TinyPose: OpenVINO on Intel CPUs, PaddleInference on Nvidia GPUs, and PaddleLite or MNN on ARM/Android.
+| Third_Engine | MNN                                                    | NCNN                                               | OPENVINO                                                     |
+| ------------ | ------------------------------------------------------ | -------------------------------------------------- | ------------------------------------------------------------ |
+| PicoDet      | [PicoDet_MNN](./third_engine/demo_mnn/README.md)       | [PicoDet_NCNN](./third_engine/demo_ncnn/README.md) | [PicoDet_OPENVINO](./third_engine/demo_openvino/README.md)   |
+| TinyPose     | [TinyPose_MNN](./third_engine/demo_mnn_kpts/README.md) | -                                                  | [TinyPose_OPENVINO](./third_engine/demo_openvino_kpts/README.md) |
+## 5. Benchmark Test
+- Using the exported model, run the Benchmark batch test script:
+```shell
+sh deploy/benchmark/benchmark.sh {model_dir} {model_name}
+```
+**Note:** For quantized models, use the `deploy/benchmark/benchmark_quant.sh` script.
+- Export the benchmark logs to Excel:
+```
+python deploy/benchmark/log_parser_excel.py --log_path=./output_pipeline --output_name=benchmark_excel.xlsx
+```
+## 6. FAQ
+- Q1: Can a model trained with `Paddle 1.8.4` be deployed with `Paddle 2.0`?
+  Yes, because Paddle 2.0 is compatible with Paddle 1.8.4. The exception is models that rely on OPs newly added in Paddle 2.0 (such as SOLOv2), which are not supported.
+- Q2: On Windows, the inference library is built with VS2015. Will choosing VS2017 or VS2019 cause problems?
+  For Visual Studio compatibility, see [C++ binary compatibility between Visual Studio 2015, 2017, and 2019](https://docs.microsoft.com/zh-cn/cpp/porting/binary-compat-2015-2017?view=msvc-160).
+- Q3: Does continuous inference with cuDNN 8.0.4 leak memory?
+  QA testing found memory leaks during continuous inference across the cuDNN 8 series, and cuDNN 8 also performs worse than cuDNN 7. Deploying with CUDA + cuDNN 7.6.4 is recommended.
--- a/deploy/TENSOR_RT.md
+++ b/deploy/TENSOR_RT.md
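+# TensorRT Inference Deployment Tutorial
+TensorRT is NVIDIA's acceleration library for unified model deployment. It runs on hardware such as the V100 and Jetson Xavier and can greatly improve inference speed. For the Paddle TensorRT tutorial, see [Inference with the Paddle-TensorRT library](https://www.paddlepaddle.org.cn/inference/optimize/paddle_trt.html).
+## 1. Install the PaddleInference library
+- For the Python package, download and install a build with TensorRT from [here](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html#python).
+- For the C++ inference library, download a build compiled with TensorRT from [here](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html).
+- If the official site provides no prebuilt Python package or C++ inference library for your setup, build it yourself following [installation from source](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/linux-compile.html).
+**Note:**
+- The TensorRT version on your machine must match the TensorRT version the inference library was built with.
+- Deployment in PaddleDetection requires TensorRT version > 6.0.
+## 2. Export the model
+For details on model export, see the [PaddleDetection model export tutorial](./EXPORT_MODEL.md).
+## 3. Enable TensorRT acceleration
+### 3.1 Configure TensorRT
+When building the predictor config with the Paddle inference library, enable the TensorRT engine: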
+```c++
+config->EnableUseGpu(100, 0); // Initialize 100 MB of GPU memory and use GPU ID 0
+config->GpuDeviceId();        // Returns the ID of the GPU in use
+// Enable TensorRT inference to improve GPU performance; requires an inference library built with TensorRT
+config->EnableTensorRtEngine(1 << 20             /*workspace_size*/,
+                             batch_size        /*max_batch_size*/,
+                             3                 /*min_subgraph_size*/,
+                             AnalysisConfig::Precision::kFloat32 /*precision*/,
+                             false             /*use_static*/,
+                             false             /*use_calib_mode*/);
+```
+**Note:**
+  If `--run_benchmark` is set to True, install the extra dependencies with `pip install pynvml psutil GPUtil`.
+### 3.2 TensorRT inference with fixed input size
+For example, set the following in the model's Reader configuration file:
+```yaml
+TestReader:
+  inputs_def:
+    image_shape: [3,608,608]
+  ...
+```
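+Alternatively, set `-o TestReader.inputs_def.image_shape=[3,608,608]` when exporting the model. The model then runs with a fixed input size. For details, see the [PaddleDetection model export tutorial](./EXPORT_MODEL.md).
+You can open the `model.pdmodel` file with [visualdl](https://www.paddlepaddle.org.cn/paddle/visualdl/demo/graph) and check whether the first input Tensor has a fixed size. Unspecified dimensions are shown as `?`, as in the figure below:
+![img](../docs/images/input_shape.png)
+Note: TensorRT does not support slice operations on the batch dimension, so Faster RCNN and Mask RCNN cannot use fixed-size input and the `TestReader.inputs_def.image_shape` field must not be set for them.
+Taking `YOLOv3` as an example, run fixed-size inference with: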
+```
+python python/infer.py --model_dir=./output_inference/yolov3_darknet53_270e_coco/ --image_file=./demo/000000014439.jpg --device=GPU --run_mode=trt_fp32 --run_benchmark=True
+```
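+### 3.3 TensorRT inference with dynamic input size
+With TensorRT version >= 6, TensorRT inference supports dynamic input sizes. If the model's Reader configuration file does not set a field such as `TestReader.inputs_def.image_shape=[3,608,608]`, or sets `image_shape=[3,-1,-1]`, the exported model runs with dynamic input sizes. RCNN-series models generally use dynamic-size inference.
+For dynamic input sizes in the Paddle inference library, see the description of the `SetTRTDynamicShapeInfo` function in [Paddle C++ inference](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/native_infer.html).
+Dynamic-size parameters of `python/infer.py`:
+- trt_min_shape sets the minimum size of the input image height and width for TensorRT. Default: 1
+- trt_max_shape sets the maximum size of the input image height and width for TensorRT. Default: 1280
+- trt_opt_shape sets the optimal size of the input image height and width for TensorRT. Default: 640
+**Note: dynamic shapes in `TensorRT` are 4-dimensional; these parameters set only the input image size.**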
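Because the three parameters above are scalars while TensorRT dynamic shapes are 4-dimensional, each value has to be expanded into a full shape. The sketch below illustrates one way this could look. It is an assumption for illustration, not the code in `python/infer.py`: the function name `trt_dynamic_shapes`, the input name `image`, and the NCHW layout with 3 channels are all hypothetical choices.

```python
def trt_dynamic_shapes(trt_min_shape=1, trt_max_shape=1280, trt_opt_shape=640,
                       batch_size=1, channels=3, input_name="image"):
    """Expand scalar height/width limits into 4-D NCHW shape ranges.

    Defaults mirror the documented defaults of the three parameters;
    input_name, channels and the NCHW layout are assumptions.
    """
    if not (trt_min_shape <= trt_opt_shape <= trt_max_shape):
        raise ValueError("expected trt_min_shape <= trt_opt_shape <= trt_max_shape")

    def nchw(size):
        # The same scalar is used for both height and width.
        return [batch_size, channels, size, size]

    return {
        "min": {input_name: nchw(trt_min_shape)},
        "max": {input_name: nchw(trt_max_shape)},
        "opt": {input_name: nchw(trt_opt_shape)},
    }


if __name__ == "__main__":
    # Values from the Faster RCNN command in this section.
    for kind, shape in trt_dynamic_shapes(800, 1280, 960).items():
        print(kind, shape)
```

Dictionaries of this form are the kind of min/max/opt shape information that `SetTRTDynamicShapeInfo` is documented to take; check the Paddle Inference reference for the exact argument types.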
+Taking `Faster RCNN` as an example, run dynamic-size inference with:
+```
+python python/infer.py --model_dir=./output_inference/faster_rcnn_r50_fpn_1x_coco/ --image_file=./demo/000000014439.jpg --device=GPU --run_mode=trt_fp16 --run_benchmark=True --trt_max_shape=1280 --trt_min_shape=800 --trt_opt_shape=960
+```
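+## 4. FAQ
+**Q:** An error says there is no `tensorrt_op`<br/>
+**A:** Check that you are using a Paddle Python package or inference library built with TensorRT.
+**Q:** An error says `op out of memory`<br/>
+**A:** Check whether someone else is using the GPU, and try an idle GPU.
+**Q:** An error says `some trt inputs dynamic shape info not set`<br/>
+**A:** `TensorRT` splits the network into several subgraphs. Only the dynamic shape of the input data was set, so the inputs of the other subgraphs have no dynamic shape information. There are two solutions:
+- Option 1: Increase `min_subgraph_size` to skip optimization of those subgraphs. Following the message, set `min_subgraph_size` larger than the number of OPs in the subgraphs whose inputs have no dynamic shape set.
+`min_subgraph_size` means that, when the TensorRT engine is loaded, only groups of more than `min_subgraph_size` consecutive OPs that TensorRT can optimize are actually optimized.
+- Option 2: Find the inputs of those subgraphs and set their dynamic shapes in the same way as above.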
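Option 1 amounts to choosing a threshold just above the largest subgraph that lacks dynamic shape information. A minimal sketch of that arithmetic follows; the helper `pick_min_subgraph_size` is hypothetical, and the OP counts are assumed to be read by hand from the warning message.

```python
def pick_min_subgraph_size(unset_subgraph_op_counts, current=3):
    """Return a min_subgraph_size that skips every listed subgraph.

    unset_subgraph_op_counts: OP counts of the subgraphs whose inputs have
    no dynamic shape set (hypothetical input, read from the log by hand).
    current: the value already in use (3 in the config example above).
    """
    if not unset_subgraph_op_counts:
        return current
    # "Larger than the number of OPs" in the biggest offending subgraph.
    return max(current, max(unset_subgraph_op_counts) + 1)


if __name__ == "__main__":
    print(pick_min_subgraph_size([5, 12]))
```

Raising the threshold also stops TensorRT from optimizing other small subgraphs, so Option 2 may be preferable when those subgraphs matter for speed.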
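+**Q:** How do I turn on logging?<br/>
+**A:** The inference library logs by default. Commenting out `config.disable_glog_info()` turns logging back on.
+**Q:** With TensorRT enabled, inference reports "Slice on batch axis is not supported in TensorRT"<br/>
+**A:** Try dynamic-size input.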
--- a/deploy/auto_compression/README.md
+++ b/deploy/auto_compression/README.md
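+# Automatic Compression
+Contents:
+- [1. Introduction](#1-introduction)
+- [2. Benchmark](#2-benchmark)
+- [3. Auto-compression workflow](#3-auto-compression-workflow)
+  - [3.1 Prepare the environment](#31-prepare-the-environment)
+  - [3.2 Prepare the dataset](#32-prepare-the-dataset)
+  - [3.3 Prepare the inference model](#33-prepare-the-inference-model)
+  - [3.4 Run auto-compression and export the model](#34-run-auto-compression-and-export-the-model)
+  - [3.5 Evaluate model accuracy](#35-evaluate-model-accuracy)
+- [4. Inference deployment](#4-inference-deployment)
+## 1. Introduction
+This example applies automatic compression to a PaddleDetection Inference deployment model. The compression strategy is quantization with distillation.
+## 2. Benchmark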
+### PP-YOLOE+
+| Model  | Base mAP | Offline quantization mAP | ACT quantization mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 |  Config | Quantized model  |
+| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: |
+| PP-YOLOE+_s	 | 43.7  |  - | 42.9  |   -  |   -   |  -  |  [config](./configs/ppyoloe_plus_s_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_s_qat_dis.tar) |
+| PP-YOLOE+_m | 49.8  |  - | 49.3  |   -  |   -   |  -  |  [config](./configs/ppyoloe_plus_m_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_m_qat_dis.tar) |
+| PP-YOLOE+_l | 52.9  |  - | 52.6  |   -  |   -   |  -  |  [config](./configs/ppyoloe_plus_l_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_l_qat_dis.tar) |
+| PP-YOLOE+_x | 54.7  |  - | 54.4  |   -  |   -   |  -  |  [config](./configs/ppyoloe_plus_x_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_x_qat_dis.tar) |
+- All mAP values are evaluated on the COCO val2017 dataset, IoU=0.5:0.95.
+### YOLOv8
+| Model  | Base mAP | Offline quantization mAP | ACT quantization mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 |  Config | Quantized model  |
+| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: |
+| YOLOv8-s | 44.9 |  43.9 | 44.3  |   9.27ms  |   4.65ms   |  **3.78ms**  |  [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/yolov8_s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov8_s_500e_coco_trt_nms_quant.tar) |
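+**Note:**
+- The YOLOv8 models in the table all include NMS and can be deployed directly in TRT. To align with the standard benchmark, test the models without NMS.
+- All mAP values are evaluated on the COCO val2017 dataset, IoU=0.5:0.95.
+- Latency in the table is measured on a Tesla T4 GPU with TensorRT enabled, batch_size=1.
+### PP-YOLOE
+| Model  | Base mAP | Offline quantization mAP | ACT quantization mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 |  Config | Quantized model  |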
+| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: |
+| PP-YOLOE-l | 50.9  |  - | 50.6  |   11.2ms  |   7.7ms   |  **6.7ms**  |  [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_l_300e_coco_quant.tar) |
+| PP-YOLOE-SOD | 38.5  |  - | 37.6  |   -  |   -   |  -  |  [config](./configs/ppyoloe_crn_l_80e_sliced_visdrone_640_025_qat.yml) | [Quant Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_sod_visdrone.tar) |
+- The PP-YOLOE-l mAP is evaluated on the COCO val2017 dataset, IoU=0.5:0.95.
+- The PP-YOLOE-l model is benchmarked on a Tesla V100 GPU with TensorRT enabled, batch_size=1, NMS included. The test script is the [benchmark demo](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/deploy/python).
+- PP-YOLOE-SOD is evaluated on the COCO-format [dataset](https://bj.bcebos.com/v1/paddledet/data/smalldet/visdrone_sliced.zip) obtained by slicing the VisDrone-DET images, IoU=0.5:0.95. Its definition file is [ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml](../../configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml).
+### PP-PicoDet
+| Model  | Strategy | mAP | FP32 | FP16 | INT8 |  Config | Model  |
+| :-------- |:-------- |:--------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: |
+| PicoDet-S-NPU | Baseline | 30.1   |   -   |  -  |  -  | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_416_coco_npu.yml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_416_coco_npu.tar) |
+| PicoDet-S-NPU |  Quantization-aware training | 29.7  |   -  |   -   |  -  |  [config](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/full_quantization/detection/configs/picodet_s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_npu_quant.tar) |
+- All mAP values are evaluated on the COCO val2017 dataset, IoU=0.5:0.95.
+### RT-DETR
+| Model             | Base mAP | ACT quantization mAP | TRT-FP32 | TRT-FP16 |  TRT-INT8  |                            Config                            |                       Quantized model                        |
+| :---------------- | :------- | :--------: | :------: | :------: | :--------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
+| RT-DETR-R50       | 53.1     |    53.0    | 32.05ms  |  9.12ms  | **6.96ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r50vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r50vd_6x_coco_quant.tar) |
+| RT-DETR-R101      | 54.3     |    54.1    | 54.13ms  | 12.68ms  | **9.20ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r101vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r101vd_6x_coco_quant.tar) |
+| RT-DETR-HGNetv2-L | 53.0     |    52.9    | 26.16ms  |  8.54ms  | **6.65ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_l_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_l_6x_coco_quant.tar) |
+| RT-DETR-HGNetv2-X | 54.8     |    54.6    | 49.22ms  | 12.50ms  | **9.24ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_x_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_x_6x_coco_quant.tar) |
+- Test environment for the table above: Tesla T4, TensorRT 8.6.0, CUDA 11.7, batch_size=1.
+| Model             | Base mAP | ACT quantization mAP | TRT-FP32 | TRT-FP16 |  TRT-INT8  |                            Config                            |                       Quantized model                        |
+| :---------------- | :------- | :--------: | :------: | :------: | :--------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
+| RT-DETR-R50       | 53.1     |    53.0    |  9.64ms  |  5.00ms  | **3.99ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r50vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r50vd_6x_coco_quant.tar) |
+| RT-DETR-R101      | 54.3     |    54.1    | 14.93ms  |  7.15ms  | **5.12ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r101vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r101vd_6x_coco_quant.tar) |
+| RT-DETR-HGNetv2-L | 53.0     |    52.9    |  8.17ms  |  4.77ms  | **4.00ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_l_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_l_6x_coco_quant.tar) |
+| RT-DETR-HGNetv2-X | 54.8     |    54.6    | 12.81ms  |  6.97ms  | **5.32ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_x_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_x_6x_coco_quant.tar) |
+- Test environment for the table above: A10, TensorRT 8.6.0, CUDA 11.6, batch_size=1.
+- All mAP values are evaluated on the COCO val2017 dataset, IoU=0.5:0.95.
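The latency columns are easier to compare as speedup ratios. The sketch below only does that arithmetic on the Tesla T4 figures from the first RT-DETR table; the dictionary and function names are illustrative, and no new measurements are introduced.

```python
# Latencies in ms copied from the Tesla T4 RT-DETR table: (FP32, FP16, INT8).
T4_LATENCY_MS = {
    "RT-DETR-R50": (32.05, 9.12, 6.96),
    "RT-DETR-R101": (54.13, 12.68, 9.20),
    "RT-DETR-HGNetv2-L": (26.16, 8.54, 6.65),
    "RT-DETR-HGNetv2-X": (49.22, 12.50, 9.24),
}


def speedup(baseline_ms, optimized_ms):
    """Ratio of baseline latency to optimized latency (higher is faster)."""
    return baseline_ms / optimized_ms


if __name__ == "__main__":
    for name, (fp32, fp16, int8) in T4_LATENCY_MS.items():
        print(f"{name}: INT8 vs FP32 {speedup(fp32, int8):.2f}x, "
              f"INT8 vs FP16 {speedup(fp16, int8):.2f}x")
```

On these figures the INT8 gain over FP32 is much larger than the gain over FP16, which is worth weighing against the small mAP drop in the ACT column.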
+## 3. Auto-compression workflow
+#### 3.1 Prepare the environment
+- PaddlePaddle >= 2.4 (install from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
+- PaddleSlim >= 2.4.1
+- PaddleDet >= 2.5
+- opencv-python
+Install paddlepaddle:
+```shell
+# CPU
+pip install paddlepaddle
+# GPU
+pip install paddlepaddle-gpu
+```
+Install paddleslim:
+```shell
+pip install paddleslim
+```
+Install paddledet:
+```shell
+pip install paddledet
+```
+**Note:** Auto-compression of YOLOv8 models requires the latest [Develop Paddle](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html) and [Develop PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim#%E5%AE%89%E8%A3%85) versions.
+#### 3.2 Prepare the dataset
+This example runs the auto-compression experiment on COCO data by default. For custom COCO data or data in other formats, prepare the data following the [data preparation guide](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/docs/tutorials/data/PrepareDataSet.md).
+If the dataset is not in COCO format, modify the Dataset fields in the reader configuration files under [configs](./configs).
+Taking PP-YOLOE as an example, if the dataset is ready, set the `dataset_dir` field of `EvalDataset` in `./configs/yolo_reader.yml` to your dataset path.
+#### 3.3 Prepare the inference model
+The inference model consists of two files, `model.pdmodel` and `model.pdiparams`: the `pdmodel` file is the model file and the file with the `pdiparams` suffix holds the weights.
+Export the Inference model following the [PaddleDetection documentation](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED_cn.md#8-%E6%A8%A1%E5%9E%8B%E5%AF%BC%E5%87%BA). The PP-YOLOE export below serves as an example:
+- Download the code
+```
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+```
+- Export the inference model
+PPYOLOE-l model, including NMS. For a quick trial, download the [exported PP-YOLOE-l model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_l_300e_coco.tar) directly.
+```shell
+python tools/export_model.py \
+        -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml \
+        -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams \
+        trt=True
+```
+YOLOv8-s model, including NMS. See the [YOLOv8 model documentation](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8), then run:
+```shell
+python tools/export_model.py \
+        -c configs/yolov8/yolov8_s_500e_coco.yml \
+        -o weights=https://paddledet.bj.bcebos.com/models/yolov8_s_500e_coco.pdparams \
+        trt=True
+```
+For a quick trial, download the [exported YOLOv8-s model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov8_s_500e_coco_trt_nms.tar) directly.
+#### 3.4 Run auto-compression and export the model
+The distillation-quantization example is launched with the `run.py` script, which uses the `paddleslim.auto_compression.AutoCompression` interface to compress the model. Set the model path and the distillation, quantization, and training parameters in the config file; the model can then be quantized and distilled. Run:
+- Single-GPU training:
+```
+export CUDA_VISIBLE_DEVICES=0
+python run.py --config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/'
+```
+- Multi-GPU training:
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
+          --config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/'
+```
+#### 3.5 Evaluate model accuracy
+Use the `eval.py` script to compute the model's mAP:
+```
+export CUDA_VISIBLE_DEVICES=0
+python eval.py --config_path=./configs/ppyoloe_l_qat_dis.yaml
+```
+Compute the model's mAP with Paddle Inference using TensorRT INT8:
+```
+export CUDA_VISIBLE_DEVICES=0
+python paddle_inference_eval.py --model_path ./output/ --reader_config configs/ppyoloe_reader.yml --precision int8 --use_trt=True
+```
+**Note:**
+- The path of the model to evaluate can be changed in the `model_dir` field of the config file.
+- `--precision` defaults to `paddle`. To use TensorRT, set `--use_trt=True`; `--precision` can then be set to fp32, fp16, or int8.
+## 4. Inference deployment
+- See the [PaddleDetection deployment tutorial](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/deploy). On GPU, deploy the quantized model with TensorRT enabled and the `trt_int8` mode set.
--- a/deploy/auto_compression/configs/picodet_reader.yml
+++ b/deploy/auto_compression/configs/picodet_reader.yml
+metric: COCO
+num_classes: 80
+# Dataset configuration
+TrainDataset:
+  !COCODataSet
+    image_dir: train2017
+    anno_path: annotations/instances_train2017.json
+    dataset_dir: dataset/coco/
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017
+    anno_path: annotations/instances_val2017.json
+    dataset_dir: dataset/coco/
+worker_num: 6
+eval_height: &eval_height 416
+eval_width: &eval_width 416
+eval_size: &eval_size [*eval_height, *eval_width]
+EvalReader:
+  sample_transforms:
+  - Decode: {}
+  - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
+  - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  batch_size: 8
+  shuffle: false
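The `PadBatch: {pad_to_stride: 32}` entry in this reader config rounds image sizes up to a multiple of the stride. The sketch below shows only that rounding arithmetic; it is an illustration under that assumption, not the PaddleDetection `PadBatch` implementation, and `padded_size` is a hypothetical helper.

```python
import math


def padded_size(size, stride=32):
    """Round size up to the nearest multiple of stride."""
    return int(math.ceil(size / stride) * stride)


if __name__ == "__main__":
    # 416 is the eval_height/eval_width set in this config.
    for size in (416, 417, 608):
        print(size, "->", padded_size(size))
```

Since the config resizes every image to 416x416 and 416 is already a multiple of 32, the rounding leaves that size unchanged.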
--- a/deploy/auto_compression/configs/picodet_s_qat_dis.yaml
+++ b/deploy/auto_compression/configs/picodet_s_qat_dis.yaml
+Global:
+  reader_config: ./configs/picodet_reader.yml
+  include_nms: True
+  Evaluation: True
+  model_dir: ./picodet_s_416_coco_npu/
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+Distillation:
+  alpha: 1.0
+  loss: l2
+QuantAware:
+  use_pact: true
+  activation_quantize_type: 'moving_average_abs_max'
+  weight_bits: 8
+  activation_bits: 8
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+TrainConfig:
+  train_iter: 8000
+  eval_iter: 1000
+  learning_rate:  
+    type: CosineAnnealingDecay
+    learning_rate: 0.00001
+    T_max: 8000
+  optimizer_builder:
+    optimizer:
+      type: SGD
+    weight_decay: 4.0e-05
--- a/deploy/auto_compression/configs/ppyoloe_crn_l_80e_sliced_visdrone_640_025_qat.yml
+++ b/deploy/auto_compression/configs/ppyoloe_crn_l_80e_sliced_visdrone_640_025_qat.yml
+Global:
+  reader_config: configs/ppyoloe_crn_l_80e_sliced_visdrone_640_025_reader.yml
+  input_list: ['image', 'scale_factor']
+  arch: YOLO
+  include_nms: True
+  Evaluation: True
+  model_dir: ../../output_inference/ppyoloe_crn_l_80e_sliced_visdrone_640_025
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+QuantAware:
+  onnx_format: True
+  use_pact: False
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+TrainConfig:
+  train_iter: 8000
+  eval_iter: 500
+  learning_rate:
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 6000
+  optimizer_builder:
+    optimizer:
+      type: SGD
+    weight_decay: 4.0e-05
--- a/deploy/auto_compression/configs/ppyoloe_crn_l_80e_sliced_visdrone_640_025_reader.yml
+++ b/deploy/auto_compression/configs/ppyoloe_crn_l_80e_sliced_visdrone_640_025_reader.yml
+metric: COCO
+num_classes: 10
+# Dataset configuration
+TrainDataset:
+  !COCODataSet
+    image_dir: train_images_640_025
+    anno_path: train_640_025.json
+    dataset_dir: dataset/visdrone_sliced
+EvalDataset:
+  !COCODataSet
+    image_dir: val_images_640_025
+    anno_path: val_640_025.json
+    dataset_dir: dataset/visdrone_sliced
+worker_num: 0
+# preprocess reader in test
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
+    #- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
+    - Permute: {}
+  batch_size: 16
--- a/deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml
+++ b/deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml
+Global:
+  reader_config: configs/ppyoloe_reader.yml
+  include_nms: True
+  Evaluation: True
+  model_dir: ./ppyoloe_crn_l_300e_coco
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+QuantAware:
+  use_pact: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+TrainConfig:
+  train_iter: 5000
+  eval_iter: 1000
+  learning_rate:  
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 6000
+  optimizer_builder:
+    optimizer: 
+      type: SGD
+    weight_decay: 4.0e-05
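The config above pairs `train_iter: 5000` with a `CosineAnnealingDecay` schedule whose `T_max` is 6000. The sketch below evaluates the standard closed-form cosine annealing formula to show what that implies. It assumes the scheduler steps once per training iteration and that the minimum learning rate is 0; both are assumptions, and the function is an illustration rather than Paddle's implementation.

```python
import math


def cosine_annealing_lr(step, base_lr=0.00003, t_max=6000, eta_min=0.0):
    """Closed-form cosine annealing for 0 <= step <= t_max.

    base_lr and t_max are taken from the config above; eta_min=0.0 is assumed.
    """
    cosine = math.cos(math.pi * step / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


if __name__ == "__main__":
    for step in (0, 1000, 3000, 5000):
        print(step, cosine_annealing_lr(step))
```

Under these assumptions training stops at step 5000 with the learning rate at roughly 7% of its initial value rather than at zero, because the cosine curve only completes at step 6000.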
--- a/deploy/auto_compression/configs/ppyoloe_plus_crn_t_auxhead_300e_coco_qat.yml
+++ b/deploy/auto_compression/configs/ppyoloe_plus_crn_t_auxhead_300e_coco_qat.yml
+Global:
+  reader_config: configs/ppyoloe_plus_reader.yml
+  include_nms: True
+  Evaluation: True
+  model_dir: ../../output_inference/ppyoloe_plus_crn_t_auxhead_300e_coco/
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+QuantAware:
+  onnx_format: True
+  use_pact: False
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+TrainConfig:
+  train_iter: 8000
+  eval_iter: 1000
+  learning_rate:
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 6000
+  optimizer_builder:
+    optimizer:
+      type: SGD
+    weight_decay: 4.0e-05
--- a/deploy/auto_compression/configs/ppyoloe_plus_l_qat_dis.yaml
+++ b/deploy/auto_compression/configs/ppyoloe_plus_l_qat_dis.yaml
+Global:
+  reader_config: configs/ppyoloe_plus_reader.yml
+  include_nms: True
+  Evaluation: True
+  model_dir: ./ppyoloe_plus_crn_l_80e_coco  
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+QuantAware:
+  use_pact: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+TrainConfig:
+  train_iter: 5000
+  eval_iter: 1000
+  learning_rate:  
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 6000
+  optimizer_builder:
+    optimizer: 
+      type: SGD
+    weight_decay: 4.0e-05
--- a/deploy/auto_compression/configs/ppyoloe_plus_m_qat_dis.yaml
+++ b/deploy/auto_compression/configs/ppyoloe_plus_m_qat_dis.yaml
+Global:
+  reader_config: configs/ppyoloe_plus_reader.yml
+  include_nms: True
+  Evaluation: True
+  model_dir: ./ppyoloe_plus_crn_m_80e_coco
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+QuantAware:
+  use_pact: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+TrainConfig:
+  train_iter: 5000
+  eval_iter: 1000
+  learning_rate:  
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 6000
+  optimizer_builder:
+    optimizer: 
+      type: SGD
+    weight_decay: 4.0e-05
--- a/deploy/auto_compression/configs/ppyoloe_plus_reader.yml
+++ b/deploy/auto_compression/configs/ppyoloe_plus_reader.yml
+metric: COCO
+num_classes: 80
+# Dataset configuration
+TrainDataset:
+  !COCODataSet
+    image_dir: train2017
+    anno_path: annotations/instances_train2017.json
+    dataset_dir: dataset/coco/
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017
+    anno_path: annotations/instances_val2017.json
+    dataset_dir: dataset/coco/
+worker_num: 0
+# preprocess reader in test
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
+    - Permute: {}
+  batch_size: 4
--- a/deploy/auto_compression/configs/ppyoloe_plus_s_qat_dis.yaml
+++ b/deploy/auto_compression/configs/ppyoloe_plus_s_qat_dis.yaml
+Global:
+  reader_config: configs/ppyoloe_plus_reader.yml
+  include_nms: True
+  Evaluation: True
+  model_dir: ./ppyoloe_plus_crn_s_80e_coco
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+QuantAware:
+  use_pact: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+TrainConfig:
+  train_iter: 5000
+  eval_iter: 1000
+  learning_rate:  
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 6000
+  optimizer_builder:
+    optimizer: 
+      type: SGD
+    weight_decay: 4.0e-05
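
Note on `activation_quantize_type: 'moving_average_abs_max'`: the name suggests that the activation scale is a smoothed estimate of the per-batch absolute maximum. The sketch below is one common way to write such an estimator, followed by a simulated int8 quantize/dequantize step. The moving rate of 0.9, the accumulator form, and both function names are assumptions for illustration; they are not taken from these configs, and PACT (`use_pact`) is not modeled.

```python
def moving_average_abs_max(batches, moving_rate=0.9):
    """Return a smoothed activation scale from per-batch absolute maxima.

    Assumed update rule (illustrative):
        accum = rate * accum + abs_max
        state = rate * state + 1
        scale = accum / state
    """
    accum, state, scale = 0.0, 0.0, 0.0
    for values in batches:
        abs_max = max(abs(v) for v in values)
        accum = moving_rate * accum + abs_max
        state = moving_rate * state + 1.0
        scale = accum / state
    return scale

def fake_quantize(x, scale, bits=8):
    """Round x onto a signed integer grid, clip, and map back to float."""
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits
    q = round(x / scale * qmax)
    q = max(-qmax, min(qmax, q))        # clip values outside [-scale, scale]
    return q * scale / qmax

scale = moving_average_abs_max([[1.0, -2.0], [0.5, 4.0]])
print(scale, fake_quantize(1.234, scale), fake_quantize(10.0, scale))
```

Values larger than the tracked scale are clipped to it, which is why a smoothed maximum is used instead of the maximum of a single batch.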
--- a/deploy/auto_compression/configs/ppyoloe_plus_sod_crn_l_qat_dis.yaml
+++ b/deploy/auto_compression/configs/ppyoloe_plus_sod_crn_l_qat_dis.yaml
+Global:
+  reader_config: configs/ppyoloe_plus_reader.yml
+  include_nms: True
+  Evaluation: True
+  model_dir: ../../output_inference/ppyoloe_plus_sod_crn_l_80e_coco  
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+QuantAware:
+  onnx_format: True
+  use_pact: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+TrainConfig:
+  train_iter: 1
+  eval_iter: 1
+  learning_rate:  
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 6000
+  optimizer_builder:
+    optimizer: 
+      type: SGD
+    weight_decay: 4.0e-05
--- a/deploy/auto_compression/configs/ppyoloe_plus_x_qat_dis.yaml
+++ b/deploy/auto_compression/configs/ppyoloe_plus_x_qat_dis.yaml
+Global:
+  reader_config: configs/ppyoloe_plus_reader.yml
+  include_nms: True
+  Evaluation: True
+  model_dir: ./ppyoloe_plus_crn_x_80e_coco  
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+QuantAware:
+  use_pact: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+TrainConfig:
+  train_iter: 5000
+  eval_iter: 1000
+  learning_rate:  
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 6000
+  optimizer_builder:
+    optimizer: 
+      type: SGD
+    weight_decay: 4.0e-05
--- a/deploy/auto_compression/configs/ppyoloe_reader.yml
+++ b/deploy/auto_compression/configs/ppyoloe_reader.yml
+metric: COCO
+num_classes: 80
+# Dataset configuration
+TrainDataset:
+  !COCODataSet
+    image_dir: train2017
+    anno_path: annotations/instances_train2017.json
+    dataset_dir: dataset/coco/
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017
+    anno_path: annotations/instances_val2017.json
+    dataset_dir: dataset/coco/
+worker_num: 0
+# Preprocessing pipeline used by the evaluation reader
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 4
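
Note on the two reader files: `ppyoloe_plus_reader.yml` and `ppyoloe_reader.yml` differ only in the `NormalizeImage` step. The sketch below applies both settings to a single RGB pixel to show the effect. It assumes that `is_scale` defaults to true when omitted and that `norm_type: none` skips the mean/std step; both are assumptions about the transform, and `normalize_pixel` is a hypothetical helper, not the PaddleDetection operator.

```python
def normalize_pixel(rgb, mean, std, is_scale=True, norm_type="mean_std"):
    """Normalize one RGB pixel the way the reader configs describe (assumed semantics)."""
    out = []
    for value, m, s in zip(rgb, mean, std):
        v = float(value)
        if is_scale:
            v /= 255.0                  # map [0, 255] to [0, 1]
        if norm_type == "mean_std":
            v = (v - m) / s             # per-channel standardization
        out.append(v)
    return out

pixel = [255, 128, 0]

# ppyoloe_plus_reader.yml: mean 0, std 1, norm_type none
plus = normalize_pixel(pixel, [0., 0., 0.], [1., 1., 1.], norm_type="none")

# ppyoloe_reader.yml: ImageNet mean/std, is_scale True
base = normalize_pixel(pixel, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225], is_scale=True)

print(plus)
print(base)
```

Feeding a model the wrong variant shifts every input value, so the reader file must match the one the model was exported with.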
--- a/deploy/auto_compression/configs/rtdetr_hgnetv2_l_qat_dis.yaml
+++ b/deploy/auto_compression/configs/rtdetr_hgnetv2_l_qat_dis.yaml
+Global:
+  reader_config: configs/rtdetr_reader.yml
+  include_nms: True
+  Evaluation: True
+  model_dir: ./rtdetr_hgnetv2_l_6x_coco/
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+QuantAware:
+  onnx_format: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+  - matmul_v2
+TrainConfig:
+  train_iter: 200
+  eval_iter: 50
+  learning_rate:  
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 10000
+  optimizer_builder:
+    optimizer: 
+      type: SGD
+    weight_decay: 4.0e-05
--- a/deploy/auto_compression/configs/rtdetr_hgnetv2_x_qat_dis.yaml
+++ b/deploy/auto_compression/configs/rtdetr_hgnetv2_x_qat_dis.yaml
+Global:
+  reader_config: configs/rtdetr_reader.yml
+  include_nms: True
+  Evaluation: True
+  model_dir: ./rtdetr_hgnetv2_x_6x_coco/
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+QuantAware:
+  onnx_format: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+  - matmul_v2
+TrainConfig:
+  train_iter: 500
+  eval_iter: 100
+  learning_rate:  
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 10000
+  optimizer_builder:
+    optimizer: 
+      type: SGD
+    weight_decay: 4.0e-05
--- a/deploy/auto_compression/configs/rtdetr_r101vd_qat_dis.yaml
+++ b/deploy/auto_compression/configs/rtdetr_r101vd_qat_dis.yaml
+Global:
+  reader_config: configs/rtdetr_reader.yml
+  include_nms: True
+  Evaluation: True
+  model_dir: ./rtdetr_r101vd_6x_coco/
+  model_filename: model.pdmodel
+  params_filename: model.pdiparams
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+QuantAware:
+  onnx_format: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+  - matmul_v2
+TrainConfig:
+  train_iter: 200
+  eval_iter: 50
+  learning_rate:  
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 10000
+  optimizer_builder:
+    optimizer: 
+      type: SGD
+    weight_decay: 4.0e-05