Commit fccfdfa5 authored Dec 25, 2023 by dlyrm: "update code" (parent dcc7bf4f). This page shows 20 changed files with 840 additions and 0 deletions.
**deploy/TENSOR_RT.md** (new file, mode 100644)
# TensorRT Inference Deployment Tutorial
TensorRT is NVIDIA's acceleration library for unified model deployment. It runs on hardware such as the V100 and Jetson Xavier and can greatly improve inference speed. For the Paddle TensorRT tutorial, see [Inference with the Paddle-TensorRT library](https://www.paddlepaddle.org.cn/inference/optimize/paddle_trt.html).
## 1. Install the Paddle Inference library
- Python package: download a TensorRT-enabled wheel from [here](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html#python) and install it.
- C++ inference library: download a TensorRT-enabled build from [here](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html).
- If no prebuilt Python package or C++ library is available, compile one yourself following the [source installation guide](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/linux-compile.html).

**Note:**
- The TensorRT version installed on your machine must match the TensorRT version your inference library was built against.
- Deployment with PaddleDetection requires TensorRT > 6.0.
## 2. Export the model
For details on exporting models, see the [PaddleDetection model export tutorial](./EXPORT_MODEL.md).
## 3. Enable TensorRT acceleration
### 3.1 Configure TensorRT
When building the predictor config with the Paddle inference library, simply enable the TensorRT engine:
```
config->EnableUseGpu(100, 0);  // initialize 100 MB of GPU memory, use GPU ID 0
config->GpuDeviceId();         // returns the GPU ID currently in use
// Enable TensorRT to speed up GPU inference; requires a TensorRT-enabled inference library
config->EnableTensorRtEngine(1 << 20 /*workspace_size*/,
                             batch_size /*max_batch_size*/,
                             3 /*min_subgraph_size*/,
                             AnalysisConfig::Precision::kFloat32 /*precision*/,
                             false /*use_static*/,
                             false /*use_calib_mode*/);
```
**Note:** if `--run_benchmark` is set to True, install the dependencies with `pip install pynvml psutil GPUtil`.
### 3.2 TensorRT fixed-size inference
For example, set the following in the model's Reader config file:
```yaml
TestReader:
  inputs_def:
    image_shape: [3, 608, 608]
  ...
```
Alternatively, pass `-o TestReader.inputs_def.image_shape=[3,608,608]` when exporting the model. The model will then run inference at a fixed input size; for details, see the [PaddleDetection model export tutorial](./EXPORT_MODEL.md).
You can open the `model.pdmodel` file with [VisualDL](https://www.paddlepaddle.org.cn/paddle/visualdl/demo/graph) and check whether the first input tensor's shape is fixed; any unspecified dimension is shown as `?` in the graph view.
Note: because TensorRT does not support slicing along the batch dimension, Faster RCNN and Mask RCNN cannot run fixed-size inference, so do not set the `TestReader.inputs_def.image_shape` field for them.
Using `YOLOv3` as an example, run fixed-size inference:
```
python python/infer.py --model_dir=./output_inference/yolov3_darknet53_270e_coco/ --image_file=./demo/000000014439.jpg --device=GPU --run_mode=trt_fp32 --run_benchmark=True
```
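The fixed-vs-dynamic distinction above can be sketched in a few lines of Python (an illustrative helper, not part of PaddleDetection): a shape like `[3, 608, 608]` is fixed, while a `-1` (shown as `?` in VisualDL) marks a dynamic dimension.

```python
def is_fixed_shape(image_shape):
    """Return True when every dimension of an inputs_def.image_shape
    entry is a concrete positive size, i.e. the exported model will
    run TensorRT fixed-size inference."""
    return all(isinstance(d, int) and d > 0 for d in image_shape)

print(is_fixed_shape([3, 608, 608]))  # True  -> fixed-size engine
print(is_fixed_shape([3, -1, -1]))    # False -> dynamic-size inference
```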
### 3.3 TensorRT动态尺寸预测
TensorRT版本>=6时,使用TensorRT预测时,可以支持动态尺寸输入。如果模型Reader配置文件中没有设置例如
`TestReader.inputs_def.image_shape=[3,608,608]`
的字段,或者
`image_shape=[3.-1,-1]`
,导出模型将以动态尺寸进行预测。一般RCNN系列模型使用动态图尺寸预测。
Paddle预测库关于动态尺寸输入请查看
[
Paddle CPP预测
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/native_infer.html
)
的
`SetTRTDynamicShapeInfo`
函数说明。
`python/infer.py`
设置动态尺寸输入参数说明:
-
trt_min_shape 用于设定TensorRT的输入图像height、width中的最小尺寸,默认值:1
-
trt_max_shape 用于设定TensorRT的输入图像height、width中的最大尺寸,默认值:1280
-
trt_opt_shape 用于设定TensorRT的输入图像height、width中的最优尺寸,默认值:640
**注意:`TensorRT`中动态尺寸设置是4维的,这里只设置输入图像的尺寸。**
以
`Faster RCNN`
为例,使用动态尺寸输入预测:
```
python python/infer.py --model_dir=./output_inference/faster_rcnn_r50_fpn_1x_coco/ --image_file=./demo/000000014439.jpg --device=GPU --run_mode=trt_fp16 --run_benchmark=True --trt_max_shape=1280 --trt_min_shape=800 --trt_opt_shape=960
```
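As a sketch of how `trt_min_shape`/`trt_opt_shape`/`trt_max_shape` turn into the 4-D min/opt/max shapes mentioned in the note (the helper name and dict layout are illustrative assumptions, not the Paddle API itself):

```python
def build_trt_shape_info(input_name, trt_min_shape=1, trt_opt_shape=640,
                         trt_max_shape=1280, batch_size=1, channels=3):
    """Build 4-D min/max/opt shape dicts in the spirit of Paddle's
    SetTRTDynamicShapeInfo; only the image height/width vary."""
    assert trt_min_shape <= trt_opt_shape <= trt_max_shape, \
        "shapes must satisfy min <= opt <= max"
    def make(hw):
        return [batch_size, channels, hw, hw]
    return ({input_name: make(trt_min_shape)},
            {input_name: make(trt_max_shape)},
            {input_name: make(trt_opt_shape)})

# Values from the Faster RCNN example above
min_s, max_s, opt_s = build_trt_shape_info("image", 800, 960, 1280)
print(min_s)  # {'image': [1, 3, 800, 800]}
```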
## 4. FAQ
**Q:** Error says there is no `tensorrt_op`<br/>
**A:** Check that you are using a TensorRT-enabled Paddle Python package or inference library.

**Q:** Error says `op out of memory`<br/>
**A:** Check whether someone else is also using the GPU; try a free GPU.

**Q:** Error says `some trt inputs dynamic shape info not set`<br/>
**A:** `TensorRT` partitions the network into multiple subgraphs; we only set dynamic shapes for the input data, and the inputs of the other subgraphs were left unset. Two fixes:
- Option 1: increase `min_subgraph_size` to skip optimizing those subgraphs. Following the error message, set `min_subgraph_size` greater than the number of ops in any subgraph whose inputs have no dynamic shape set. `min_subgraph_size` means that when the TensorRT engine is loaded, only contiguous, TensorRT-optimizable groups of more than `min_subgraph_size` ops are optimized.
- Option 2: find those subgraph inputs and set their dynamic shapes the same way as above.

**Q:** How do I enable logging?<br/>
**A:** The inference library logs by default; just comment out `config.disable_glog_info()` to keep logging on.

**Q:** With TensorRT enabled, inference reports "Slice on batch axis is not supported in TensorRT"<br/>
**A:** Try dynamic-size inputs.
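Option 1 in the FAQ amounts to picking `min_subgraph_size` just above the op count of the largest subgraph whose input shapes were left unset. A minimal sketch (the op counts are illustrative):

```python
def pick_min_subgraph_size(unset_subgraph_op_counts):
    """Return a min_subgraph_size large enough that TensorRT skips
    every subgraph whose dynamic input shapes were not set."""
    if not unset_subgraph_op_counts:
        return 3  # default from the EnableTensorRtEngine example above
    return max(unset_subgraph_op_counts) + 1

# Two problem subgraphs with 4 and 7 ops -> need min_subgraph_size > 7
print(pick_min_subgraph_size([4, 7]))  # 8
```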
**deploy/auto_compression/README.md** (new file, mode 100644)
# Automatic Compression
Contents:
- [1. Introduction](#1-introduction)
- [2. Benchmark](#2-benchmark)
- [3. Automatic compression workflow](#3-automatic-compression-workflow)
  - [3.1 Environment setup](#31-environment-setup)
  - [3.2 Dataset preparation](#32-dataset-preparation)
  - [3.3 Inference model preparation](#33-inference-model-preparation)
  - [3.4 Compress and produce the model](#34-compress-and-produce-the-model)
  - [3.5 Test model accuracy](#35-test-model-accuracy)
- [4. Inference deployment](#4-inference-deployment)
## 1. Introduction
This example applies automatic compression to PaddleDetection inference deployment models, using quantization with distillation as the compression strategy.
## 2. Benchmark
### PP-YOLOE+
| Model | Base mAP | Offline quant mAP | ACT quant mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | Config | Quantized model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: |
| PP-YOLOE+_s | 43.7 | - | 42.9 | - | - | - | [config](./configs/ppyoloe_plus_s_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_s_qat_dis.tar) |
| PP-YOLOE+_m | 49.8 | - | 49.3 | - | - | - | [config](./configs/ppyoloe_plus_m_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_m_qat_dis.tar) |
| PP-YOLOE+_l | 52.9 | - | 52.6 | - | - | - | [config](./configs/ppyoloe_plus_l_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_l_qat_dis.tar) |
| PP-YOLOE+_x | 54.7 | - | 54.4 | - | - | - | [config](./configs/ppyoloe_plus_x_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_x_qat_dis.tar) |

- mAP is evaluated on COCO val2017, IoU=0.5:0.95.
### YOLOv8
| Model | Base mAP | Offline quant mAP | ACT quant mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | Config | Quantized model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: |
| YOLOv8-s | 44.9 | 43.9 | 44.3 | 9.27ms | 4.65ms | **3.78ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/yolov8_s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov8_s_500e_coco_trt_nms_quant.tar) |

**Note:**
- The YOLOv8 models in the table all include NMS and can be deployed directly in TRT; to align with the usual test standard, test the model without NMS.
- mAP is evaluated on COCO val2017, IoU=0.5:0.95.
- Performance in the table was measured on a Tesla T4 GPU with TensorRT enabled and batch_size=1.
### PP-YOLOE
| Model | Base mAP | Offline quant mAP | ACT quant mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | Config | Quantized model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: |
| PP-YOLOE-l | 50.9 | - | 50.6 | 11.2ms | 7.7ms | **6.7ms** | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_l_300e_coco_quant.tar) |
| PP-YOLOE-SOD | 38.5 | - | 37.6 | - | - | - | [config](./configs/ppyoloe_crn_l_80e_sliced_visdrone_640_025_qat.yml) | [Quant Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_sod_visdrone.tar) |

- PP-YOLOE-l mAP is evaluated on COCO val2017, IoU=0.5:0.95.
- The PP-YOLOE-l model was tested on a Tesla V100 GPU with TensorRT enabled, batch_size=1, NMS included; the test script is the [benchmark demo](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/deploy/python).
- PP-YOLOE-SOD metrics are evaluated, IoU=0.5:0.95, on the COCO-format [dataset](https://bj.bcebos.com/v1/paddledet/data/smalldet/visdrone_sliced.zip) produced by slicing VisDrone-DET. Model definition: [ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml](../../configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml)
### PP-PicoDet
| Model | Strategy | mAP | FP32 | FP16 | INT8 | Config | Model |
| :-------- |:-------- |:--------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: |
| PicoDet-S-NPU | Baseline | 30.1 | - | - | - | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_416_coco_npu.yml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_416_coco_npu.tar) |
| PicoDet-S-NPU | Quant-aware training | 29.7 | - | - | - | [config](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/full_quantization/detection/configs/picodet_s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_npu_quant.tar) |

- mAP is evaluated on COCO val2017, IoU=0.5:0.95.
### RT-DETR
| Model | Base mAP | ACT quant mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | Config | Quantized model |
| :---------------- | :------- | :--------: | :------: | :------: | :--------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| RT-DETR-R50 | 53.1 | 53.0 | 32.05ms | 9.12ms | **6.96ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r50vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r50vd_6x_coco_quant.tar) |
| RT-DETR-R101 | 54.3 | 54.1 | 54.13ms | 12.68ms | **9.20ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r101vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r101vd_6x_coco_quant.tar) |
| RT-DETR-HGNetv2-L | 53.0 | 52.9 | 26.16ms | 8.54ms | **6.65ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_l_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_l_6x_coco_quant.tar) |
| RT-DETR-HGNetv2-X | 54.8 | 54.6 | 49.22ms | 12.50ms | **9.24ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_x_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_x_6x_coco_quant.tar) |

- Test environment for the table above: Tesla T4, TensorRT 8.6.0, CUDA 11.7, batch_size=1.
| Model | Base mAP | ACT quant mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | Config | Quantized model |
| :---------------- | :------- | :--------: | :------: | :------: | :--------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| RT-DETR-R50 | 53.1 | 53.0 | 9.64ms | 5.00ms | **3.99ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r50vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r50vd_6x_coco_quant.tar) |
| RT-DETR-R101 | 54.3 | 54.1 | 14.93ms | 7.15ms | **5.12ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r101vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r101vd_6x_coco_quant.tar) |
| RT-DETR-HGNetv2-L | 53.0 | 52.9 | 8.17ms | 4.77ms | **4.00ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_l_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_l_6x_coco_quant.tar) |
| RT-DETR-HGNetv2-X | 54.8 | 54.6 | 12.81ms | 6.97ms | **5.32ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_x_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_x_6x_coco_quant.tar) |

- Test environment for the table above: A10, TensorRT 8.6.0, CUDA 11.6, batch_size=1.
- mAP is evaluated on COCO val2017, IoU=0.5:0.95.
## 3. Automatic compression workflow
#### 3.1 Environment setup
- PaddlePaddle >= 2.4 (download and install from the [PaddlePaddle site](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim >= 2.4.1
- PaddleDet >= 2.5
- opencv-python

Install paddlepaddle:
```shell
# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu
```
Install paddleslim:
```shell
pip install paddleslim
```
Install paddledet:
```shell
pip install paddledet
```
**Note:** automatic compression of YOLOv8 models requires the latest [develop Paddle](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html) and [develop PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim#%E5%AE%89%E8%A3%85) builds.
#### 3.2 Dataset preparation
This example runs automatic compression on COCO data by default. For custom COCO-format data or data in other formats, see the [data preparation docs](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/docs/tutorials/data/PrepareDataSet.md).
If your dataset is not in COCO format, modify the Dataset fields in the reader config files under [configs](./configs).
Taking PP-YOLOE as an example, once the dataset is ready, simply set the `dataset_dir` field under `EvalDataset` in [./configs/yolo_reader.yml] to your dataset path.
#### 3.3 Inference model preparation
An inference model consists of two files, `model.pdmodel` and `model.pdiparams`: the `pdmodel` file is the model graph and the file with the `pdiparams` suffix holds the weights.
Export an Inference model following the [PaddleDetection docs](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED_cn.md#8-%E6%A8%A1%E5%9E%8B%E5%AF%BC%E5%87%BA); the PP-YOLOE export example below shows the steps:
- Clone the code:
```
git clone https://github.com/PaddlePaddle/PaddleDetection.git
```
- Export the inference model.
For the PP-YOLOE-l model with NMS (for a quick start, you can directly download the [exported PP-YOLOE-l model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_l_300e_coco.tar)):
```shell
python tools/export_model.py \
        -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml \
        -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams \
        trt=True
```
For the YOLOv8-s model with NMS, see the [YOLOv8 model docs](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8), then run:
```shell
python tools/export_model.py \
        -c configs/yolov8/yolov8_s_500e_coco.yml \
        -o weights=https://paddledet.bj.bcebos.com/models/yolov8_s_500e_coco.pdparams \
        trt=True
```
For a quick start, you can directly download the [exported YOLOv8-s model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov8_s_500e_coco_trt_nms.tar).
#### 3.4 Compress and produce the model
The quantization-with-distillation example is launched via the run.py script, which compresses the model through the `paddleslim.auto_compression.AutoCompression` interface. Set the model path, distillation, quantization, and training parameters in the config file; once configured, the model can be quantized and distilled. Commands:
- Single-GPU training:
```
export CUDA_VISIBLE_DEVICES=0
python run.py --config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/'
```
- Multi-GPU training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
          --config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/'
```
#### 3.5 Test model accuracy
Get the model's mAP with the eval.py script:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/ppyoloe_l_qat_dis.yaml
```
Get the model's mAP with Paddle Inference using TRT INT8:
```
export CUDA_VISIBLE_DEVICES=0
python paddle_inference_eval.py --model_path ./output/ --reader_config configs/ppyoloe_reader.yml --precision int8 --use_trt=True
```
**Note:**
- The path of the model under test can be changed via the `model_dir` field in the config file.
- `--precision` defaults to paddle; to use TRT, set `--use_trt=True`, and `--precision` can then be set to fp32/fp16/int8.
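The flag rules in the note can be captured with a small validator (an illustrative sketch of the described behavior, not code from paddle_inference_eval.py):

```python
def validate_eval_flags(precision="paddle", use_trt=False):
    """Mirror the flag rules above: --precision defaults to paddle;
    fp32/fp16/int8 are TRT precisions and require --use_trt=True."""
    trt_precisions = {"fp32", "fp16", "int8"}
    if precision == "paddle":
        return True
    if precision in trt_precisions:
        if not use_trt:
            raise ValueError(f"--precision {precision} requires --use_trt=True")
        return True
    raise ValueError(f"unknown precision: {precision}")

print(validate_eval_flags("int8", use_trt=True))  # True
```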
## 4. Inference deployment
- See the [PaddleDetection deployment tutorial](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/deploy); on GPU, deploy the quantized model with TensorRT enabled in trt_int8 mode.
**deploy/auto_compression/configs/picodet_reader.yml** (new file, mode 100644)
metric: COCO
num_classes: 80

# Dataset configuration
TrainDataset:
  !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: dataset/coco/
EvalDataset:
  !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco/

worker_num: 6
eval_height: &eval_height 416
eval_width: &eval_width 416
eval_size: &eval_size [*eval_height, *eval_width]

EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
    - Permute: {}
  batch_transforms:
    - PadBatch: {pad_to_stride: 32}
  batch_size: 8
  shuffle: false
**deploy/auto_compression/configs/picodet_s_qat_dis.yaml** (new file, mode 100644)
Global:
  reader_config: ./configs/picodet_reader.yml
  include_nms: True
  Evaluation: True
  model_dir: ./picodet_s_416_coco_npu/
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: l2

QuantAware:
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  weight_bits: 8
  activation_bits: 8
  quantize_op_types:
    - conv2d
    - depthwise_conv2d

TrainConfig:
  train_iter: 8000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00001
    T_max: 8000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
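The `CosineAnnealingDecay` schedule in `TrainConfig` anneals the learning rate from `learning_rate` down toward 0 over `T_max` steps. A sketch of the formula (matching the standard cosine-annealing definition with a minimum rate of 0):

```python
import math

def cosine_annealing_lr(base_lr, t, t_max, eta_min=0.0):
    """lr(t) = eta_min + (base_lr - eta_min) * (1 + cos(pi * t / t_max)) / 2"""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max)) / 2

# Values from picodet_s_qat_dis.yaml: learning_rate=0.00001, T_max=8000
print(cosine_annealing_lr(1e-5, 0, 8000))     # 1e-05 at the start
print(cosine_annealing_lr(1e-5, 4000, 8000))  # ~5e-06 halfway
print(cosine_annealing_lr(1e-5, 8000, 8000))  # ~0 at the end
```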
**deploy/auto_compression/configs/ppyoloe_crn_l_80e_sliced_visdrone_640_025_qat.yml** (new file, mode 100644)
Global:
  reader_config: configs/ppyoloe_crn_l_80e_sliced_visdrone_640_025_reader.yml
  input_list: ['image', 'scale_factor']
  arch: YOLO
  include_nms: True
  Evaluation: True
  model_dir: ../../output_inference/ppyoloe_crn_l_80e_sliced_visdrone_640_025
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  onnx_format: True
  use_pact: False
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
    - conv2d
    - depthwise_conv2d

TrainConfig:
  train_iter: 8000
  eval_iter: 500
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 6000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
**deploy/auto_compression/configs/ppyoloe_crn_l_80e_sliced_visdrone_640_025_reader.yml** (new file, mode 100644)
metric: COCO
num_classes: 10

# Dataset configuration
TrainDataset:
  !COCODataSet
    image_dir: train_images_640_025
    anno_path: train_640_025.json
    dataset_dir: dataset/visdrone_sliced
EvalDataset:
  !COCODataSet
    image_dir: val_images_640_025
    anno_path: val_640_025.json
    dataset_dir: dataset/visdrone_sliced

worker_num: 0

# preprocess reader in test
EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
    #- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 16
**deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml** (new file, mode 100644)
Global:
  reader_config: configs/ppyoloe_reader.yml
  include_nms: True
  Evaluation: True
  model_dir: ./ppyoloe_crn_l_300e_coco
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
    - conv2d
    - depthwise_conv2d

TrainConfig:
  train_iter: 5000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 6000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
**deploy/auto_compression/configs/ppyoloe_plus_crn_t_auxhead_300e_coco_qat.yml** (new file, mode 100644)
Global:
  reader_config: configs/ppyoloe_plus_reader.yml
  include_nms: True
  Evaluation: True
  model_dir: ../../output_inference/ppyoloe_plus_crn_t_auxhead_300e_coco/
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  onnx_format: True
  use_pact: False
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
    - conv2d
    - depthwise_conv2d

TrainConfig:
  train_iter: 8000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 6000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
**deploy/auto_compression/configs/ppyoloe_plus_l_qat_dis.yaml** (new file, mode 100644)
Global:
  reader_config: configs/ppyoloe_plus_reader.yml
  include_nms: True
  Evaluation: True
  model_dir: ./ppyoloe_plus_crn_l_80e_coco
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
    - conv2d
    - depthwise_conv2d

TrainConfig:
  train_iter: 5000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 6000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
**deploy/auto_compression/configs/ppyoloe_plus_m_qat_dis.yaml** (new file, mode 100644)
Global:
  reader_config: configs/ppyoloe_plus_reader.yml
  include_nms: True
  Evaluation: True
  model_dir: ./ppyoloe_plus_crn_m_80e_coco
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
    - conv2d
    - depthwise_conv2d

TrainConfig:
  train_iter: 5000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 6000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
**deploy/auto_compression/configs/ppyoloe_plus_reader.yml** (new file, mode 100644)
metric: COCO
num_classes: 80

# Dataset configuration
TrainDataset:
  !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: dataset/coco/
EvalDataset:
  !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco/

worker_num: 0

# preprocess reader in test
EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 4
**deploy/auto_compression/configs/ppyoloe_plus_s_qat_dis.yaml** (new file, mode 100644)
Global:
  reader_config: configs/ppyoloe_plus_reader.yml
  include_nms: True
  Evaluation: True
  model_dir: ./ppyoloe_plus_crn_s_80e_coco
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
    - conv2d
    - depthwise_conv2d

TrainConfig:
  train_iter: 5000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 6000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
**deploy/auto_compression/configs/ppyoloe_plus_sod_crn_l_qat_dis.yaml** (new file, mode 100644)
Global:
  reader_config: configs/ppyoloe_plus_reader.yml
  include_nms: True
  Evaluation: True
  model_dir: ../../output_inference/ppyoloe_plus_sod_crn_l_80e_coco
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  onnx_format: True
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
    - conv2d
    - depthwise_conv2d

TrainConfig:
  train_iter: 1
  eval_iter: 1
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 6000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
**deploy/auto_compression/configs/ppyoloe_plus_x_qat_dis.yaml** (new file, mode 100644)
Global:
  reader_config: configs/ppyoloe_plus_reader.yml
  include_nms: True
  Evaluation: True
  model_dir: ./ppyoloe_plus_crn_x_80e_coco
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
    - conv2d
    - depthwise_conv2d

TrainConfig:
  train_iter: 5000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 6000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
**deploy/auto_compression/configs/ppyoloe_reader.yml** (new file, mode 100644)
metric: COCO
num_classes: 80

# Dataset configuration
TrainDataset:
  !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: dataset/coco/
EvalDataset:
  !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco/

worker_num: 0

# preprocess reader in test
EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 4
**deploy/auto_compression/configs/rtdetr_hgnetv2_l_qat_dis.yaml** (new file, mode 100644)
Global:
  reader_config: configs/rtdetr_reader.yml
  include_nms: True
  Evaluation: True
  model_dir: ./rtdetr_hgnetv2_l_6x_coco/
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  onnx_format: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
    - conv2d
    - depthwise_conv2d
    - matmul_v2

TrainConfig:
  train_iter: 200
  eval_iter: 50
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 10000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
**deploy/auto_compression/configs/rtdetr_hgnetv2_x_qat_dis.yaml** (new file, mode 100644)
Global:
  reader_config: configs/rtdetr_reader.yml
  include_nms: True
  Evaluation: True
  model_dir: ./rtdetr_hgnetv2_x_6x_coco/
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  onnx_format: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
    - conv2d
    - depthwise_conv2d
    - matmul_v2

TrainConfig:
  train_iter: 500
  eval_iter: 100
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 10000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
**deploy/auto_compression/configs/rtdetr_r101vd_qat_dis.yaml** (new file, mode 100644)
Global:
  reader_config: configs/rtdetr_reader.yml
  include_nms: True
  Evaluation: True
  model_dir: ./rtdetr_r101vd_6x_coco/
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  onnx_format: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
    - conv2d
    - depthwise_conv2d
    - matmul_v2

TrainConfig:
  train_iter: 200
  eval_iter: 50
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 10000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
**deploy/auto_compression/configs/rtdetr_r50vd_qat_dis.yaml** (new file, mode 100644)
Global:
  reader_config: configs/rtdetr_reader.yml
  include_nms: True
  Evaluation: True
  model_dir: ./rtdetr_r50vd_6x_coco/
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

QuantAware:
  onnx_format: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
    - conv2d
    - depthwise_conv2d
    - matmul_v2

TrainConfig:
  train_iter: 500
  eval_iter: 100
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 10000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
**deploy/auto_compression/configs/rtdetr_reader.yml** (new file, mode 100644)
metric: COCO
num_classes: 80

# Dataset configuration
TrainDataset:
  !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: dataset/coco/
EvalDataset:
  !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco/

worker_num: 0

# preprocess reader in test
EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 1
  shuffle: false
  drop_last: false