# FAQ: Issue 0
**Q:** Why does the loss become `NaN` when I train on a single GPU? <br>
**A:** The learning rate in the config files is tuned for multi-GPU training (8 GPUs). For single-GPU training, scale the learning rate down proportionally (e.g., divide it by 8).
Taking [faster_rcnn_r50](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml) as an example, the equivalent static-graph settings are listed below; the "milestones" column corresponds to `boundaries` in `piecewise decay`: <br>
| GPUs | batch size/GPU | learning rate | max iters | milestones |
| :---------: | :------------:|:------------: | :-------: | :--------------: |
| 2 | 1 | 0.0025 | 720000 | [480000, 640000] |
| 4 | 1 | 0.005 | 360000 | [240000, 320000] |
| 8 | 1 | 0.01 | 180000 | [120000, 160000] |
* The table above applies to static graph mode, as sketched below. In dynamic graph mode, training is counted in epochs, so after changing the number of GPUs only the learning rate needs to be adjusted, in the same way as in static graph mode.
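As a sanity check, the linear scaling rule behind this table can be written down as a small helper (a minimal sketch; the function name and defaults are illustrative, not part of PaddleDetection):
```python
def scale_schedule(gpus, base_lr=0.01, base_gpus=8,
                   base_max_iters=180000, base_milestones=(120000, 160000)):
    """Rescale the 8-GPU static-graph schedule for a different GPU count."""
    factor = gpus / base_gpus
    return {
        "learning_rate": base_lr * factor,          # fewer GPUs -> smaller LR
        "max_iters": int(base_max_iters / factor),  # fewer GPUs -> more iterations
        "milestones": [int(m / factor) for m in base_milestones],
    }

# reproduces the 2-GPU row of the table above
print(scale_schedule(gpus=2))
# {'learning_rate': 0.0025, 'max_iters': 720000, 'milestones': [480000, 640000]}
```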
**Q:** How should `num_classes` be set in the config file for a custom dataset? <br>
**A:** In dynamic graph mode, simply set `num_classes` to the number of classes in your custom dataset. In static graph mode (under the `static` directory), YOLO-series and anchor-free models also use the raw class count, but models such as the RCNN series, SSD, RetinaNet, and SOLOv2 must distinguish background from foreground boxes during classification, so their `num_classes` must be the class count plus one, i.e. one extra background class.
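For instance, for the 4-class road-sign dataset used elsewhere in these docs, the setting would look like this (a sketch; the static-graph RCNN value is shown only to illustrate the +1 background rule):
```yaml
# dynamic graph, and static-graph YOLO/anchor-free models: the raw class count
num_classes: 4
# static graph, RCNN/SSD/RetinaNet/SOLOv2: class count + 1 background class
# num_classes: 5
```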
**Q:** When training PP-YOLOv2 with `--eval` for in-training validation, the first evaluation hangs. How do I handle this? <br>
**A:** PP-YOLO-series models converge slowly when trained from scratch with only backbone pretrained weights. If evaluation runs before the model has converged reasonably well, the predicted boxes are chaotic, and sorting and filtering them during NMS becomes extremely slow, which looks like a hang. This typically happens with small custom datasets, where only a few iterations have elapsed by the first evaluation. Check the following three points:
* The default configs in PaddleDetection assume 8-GPU training, and `batch_size` in the config is per GPU. If you train with fewer GPUs or change `batch_size`, scale the initial `learning_rate` down proportionally to get good convergence.
* If your custom dataset has few samples, increase `snapshot_epoch` so that more training has happened before the first evaluation and the model has converged reasonably well.
* When training on a custom dataset, you can finetune from weights we released for COCO or VOC to speed up convergence. Specify them with `-o pretrain_weights=xxx`, where xxx can be a model-weights URL from the Model Zoo, as in the command sketched below.
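For instance, a finetuning command might look like the following (a sketch; the config path and weights URL are illustrative, substitute the actual Model Zoo link for your model):
```bash
python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval \
    -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams
```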
**Q:** How can I better understand the reader and customize the reader config file? <br>
**A:** The annotated `TrainReader` below explains each field:
```yaml
# number of reader worker processes per GPU
worker_num: 2
# training data
TrainReader:
  inputs_def:
    num_max_boxes: 50
  # per-sample transforms for the training data
  sample_transforms:
    - Decode: {} # decodes the image, converting the raw data to an RGB image; this OP is mandatory
    - Mixup: {alpha: 1.5, beta: 1.5} # Mixup augmentation: mixes the gt_bbox/gt_score of two samples to build virtual training samples; optional OP
    - RandomDistort: {} # random color distortion; optional OP
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} # random canvas expansion; optional OP
    - RandomCrop: {} # random crop; optional OP
    - RandomFlip: {} # random horizontal flip, probability 0.5 by default; optional OP
  # batch transforms
  batch_transforms:
    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeBox: {}
    - PadBox: {num_max_boxes: 50}
    - BboxXYXY2XYWH: {}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
    - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
  # batch size during training
  batch_size: 24
  # whether to shuffle the data
  shuffle: true
  # whether to drop the last incomplete batch
  drop_last: true
  # mixup_epoch: if greater than the maximum epoch, Mixup augmentation is used throughout training. The default is -1, i.e. no Mixup. If you delete the `- Mixup: {alpha: 1.5, beta: 1.5}` line, you must also set mixup_epoch to -1 or delete it
  mixup_epoch: 25000
  # whether to use shared memory (e.g. /dev/shm) to speed up data loading; requires more than 1 GB of shared memory
  use_shared_memory: true
```
For single-scale training, remove the `BatchRandomResize` line from `batch_transforms` and append `- Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}` as the last line of `sample_transforms`.
`Decode` must always be kept. To remove the data augmentation, comment out or delete `Mixup`, `RandomDistort`, `RandomExpand`, `RandomCrop`, and `RandomFlip`; note that if `Mixup` is removed, the `mixup_epoch` line must also be removed or set to -1:
```yaml
sample_transforms:
  - Decode: {}
  - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
```
**Q:** How do I control which classes are output, i.e. output only some of the classes present in an image? <br>
**A:** You can modify the code and add a filter condition, for example:
```python
# filter by class_id
keep_class_id = [1, 2]
bbox_res = [e for e in bbox_res if int(e[0]) in keep_class_id]
```
https://github.com/PaddlePaddle/PaddleDetection/blob/b87a1ea86fa18ce69e44a17ad1b49c1326f19ff9/ppdet/engine/trainer.py#L438
**Q:** Training on a custom dataset yields wrong labels at prediction time. <br>
**A:** This usually happens because `anno_path` under `TestDataset` was overlooked when setting the dataset paths; set `anno_path` to your own annotation file.
```yaml
TestDataset:
  !ImageFolder
    anno_path: annotations/instances_val2017.json
```
**Q:** How do I print the network FLOPs? <br>
**A:** Set `print_flops: true` in `configs/runtime.yml` and install PaddleSlim (e.g. `pip install paddleslim`); the model FLOPs are then printed. A sketch of the config is shown below.
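A minimal sketch of the corresponding `configs/runtime.yml` (the other entries follow the runtime config shown later in this document):
```yaml
use_gpu: true
log_iter: 20
save_dir: output
snapshot_epoch: 1
print_flops: true  # requires PaddleSlim, e.g. pip install paddleslim
```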
**Q:** How do I train with images that contain no annotated boxes? <br>
**A:** Set `allow_empty: true` under `TrainDataset` in `configs/dataset/coco.py` or `configs/dataset/voc.py`; the dataset is then allowed to load unannotated samples for training. This supports the coco and voc data formats and has been verified to train normally for RCNN-series and YOLO-series models. If there are too many unannotated samples, model convergence suffers; in that case set e.g. `empty_ratio: 0.1` under `TrainDataset` to randomly subsample the unannotated data and control its share of the total. The default is 1., i.e. all unannotated samples are used, as sketched below.
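A sketch of the corresponding `TrainDataset` entry (the paths follow the COCO example used elsewhere in these docs; `allow_empty`/`empty_ratio` are the fields described above):
```yaml
TrainDataset:
  !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: dataset/coco
    allow_empty: true  # load samples without annotated boxes
    empty_ratio: 0.1   # randomly keep unannotated samples at 10% of the data; default 1. keeps all
```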
# FAQ / Frequently Asked Questions
**PaddleDetection** thanks all developers for their questions and requests. We compile them into this **FAQ** collection, updated every **Monday**. Past issues are listed below:
- [FAQ: Issue 0](./FAQ第零期.md)
- [FAQ: Issue 1](./FAQ第一期.md)
English | [简体中文](GETTING_STARTED_cn.md)
# Getting Started
## Installation
For setting up the running environment, please refer to [installation
instructions](INSTALL_cn.md).
## Data preparation
- Please refer to [PrepareDetDataSet](./data/PrepareDetDataSet_en.md) for data preparation
- Please set the data paths in the dataset configuration files under ```configs/datasets```
## Training & Evaluation & Inference
PaddleDetection provides scripts for training, evaluation, and inference, with various features controlled by the configuration files. For more details on distributed training, see [DistributedTraining](./DistributedTraining_en.md).
```bash
# training on single-GPU
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
# training on multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
# training on multi-machines and multi-GPUs
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
fleetrun --ips="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151" --selected_gpu 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
# GPU evaluation
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams
# Inference
python tools/infer.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --infer_img=demo/000000570688.jpg -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams
```
### Other arguments
The flags below can be listed with `--help`:
| FLAG | script supported | description | default | remark |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
| -c | ALL | Select config file | None | **required**, such as `-c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml` |
| -o | ALL | Set parameters in configure file | None | `-o` takes precedence over the settings in the file given by `-c`, such as `-o use_gpu=False` |
| --eval | train | Whether to perform evaluation in training | False | set `--eval` if needed |
| -r/--resume_checkpoint | train | Checkpoint path for resuming training | None | such as `-r output/faster_rcnn_r50_1x_coco/10000` |
| --slim_config | ALL | Configure file of slim method | None | such as `--slim_config configs/slim/prune/yolov3_prune_l1_norm.yml` |
| --use_vdl | train/infer | Whether to record the data with [VisualDL](https://github.com/paddlepaddle/visualdl), so as to display in VisualDL | False | VisualDL requires Python>=3.5 |
| --vdl\_log_dir | train/infer | VisualDL logging directory for image | train:`vdl_log_dir/scalar` infer: `vdl_log_dir/image` | VisualDL requires Python>=3.5 |
| --output_eval | eval | Directory for storing the evaluation output | None | such as `--output_eval=eval_output`, default is current directory |
| --json_eval | eval | Whether to evaluate with already existed bbox.json or mask.json | False | set `--json_eval` if needed and json path is set in `--output_eval` |
| --classwise | eval | Whether to eval AP for each class and draw PR curve | False | set `--classwise` if needed |
| --output_dir | infer | Directory for storing the output visualization files | `./output` | such as `--output_dir output` |
| --draw_threshold | infer | Threshold to reserve the result for visualization | 0.5 | such as `--draw_threshold 0.7` |
| --infer_dir | infer | Directory for images to perform inference on | None | One of `infer_dir` and `infer_img` is required |
| --infer_img | infer | Image path | None | One of `infer_dir` and `infer_img` is required; `infer_img` has higher priority over `infer_dir` |
| --save_results | infer | Whether to save detection results to file | False | Optional |
## Examples
### Training
- Perform evaluation in training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --eval
```
Training and evaluation are performed alternately, with evaluation at the end of each epoch. Meanwhile, the best model with the highest mAP so far is saved under the same path as `model_final`.
If the evaluation dataset is large, we suggest increasing `snapshot_epoch` in `configs/runtime.yml` to reduce the evaluation frequency, or evaluating after training completes.
- Fine-tuning on another task
When fine-tuning on another task with a pre-trained model, `pretrain_weights` can be used directly; parameters with mismatched shapes are ignored automatically. For example:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# If the shape of parameters in program is different from pretrain_weights,
# then PaddleDetection will not use such parameters.
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
-o pretrain_weights=output/faster_rcnn_r50_1x_coco/model_final
```
##### NOTES
- `CUDA_VISIBLE_DEVICES` can specify different gpu numbers. Such as: `export CUDA_VISIBLE_DEVICES=0,1,2,3`.
- Dataset will be downloaded automatically and cached in `~/.cache/paddle/dataset` if not be found locally.
- Pretrained model is downloaded automatically and cached in `~/.cache/paddle/weights`.
- Checkpoints are saved in `output` by default, and can be revised from `save_dir` in `configs/runtime.yml`.
### Evaluation
- Evaluate by specified weights path and dataset path
```bash
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
-o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams
```
The model path for evaluation can be either a local path or a link from the [MODEL_ZOO](../MODEL_ZOO_cn.md).
- Evaluate with json
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
--json_eval \
--output_eval evaluation/
```
The json file must be named bbox.json or mask.json, placed in the `evaluation/` directory.
### Inference
- Output to a specified directory and set the score threshold
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
--infer_img=demo/000000570688.jpg \
--output_dir=infer_output/ \
--draw_threshold=0.5 \
-o weights=output/faster_rcnn_r50_fpn_1x_coco/model_final \
--use_vdl=True
```
`--draw_threshold` is an optional argument. Default is 0.5.
Different thresholds will produce different results depending on the calculation of [NMS](https://ieeexplore.ieee.org/document/1699659).
## Deployment
Please refer to [deployment](../../deploy/README_en.md)
## Model Compression
Please refer to [slim](../../configs/slim/README_en.md)
[English](GETTING_STARTED.md) | 简体中文
# Get Started with PaddleDetection in 30 Minutes
PaddleDetection is a mature object detection development kit that covers the whole pipeline from data preparation through model training, evaluation, and export to deployment. This chapter walks through PaddleDetection quickly, using a road-sign detection dataset as the example.
## 1 Installation
For setting up the running environment, see the [installation guide](INSTALL_cn.md).
This walkthrough assumes the PaddleDetection code has been cloned into `/home/paddle`; all commands are run from the `/home/paddle/PaddleDetection` directory.
## 2 Data preparation
PaddleDetection currently supports four data formats: COCO, VOC, WIDER FACE, and MOT.
- First prepare the data following the [data preparation guide](./data/PrepareDetDataSet.md).
- Then set the data paths in the corresponding coco/voc dataset config under `configs/datasets`.
- This project uses the road-sign dataset:
```bash
python dataset/roadsign_voc/download_roadsign_voc.py
```
- The downloaded data is organized as follows:
```
├── download_roadsign_voc.py
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ | ...
├── images
│ ├── road0.png
│ ├── road1.png
│ | ...
├── label_list.txt
├── train.txt
├── valid.txt
```
## 3 Configuration files
We train with the `configs/yolov3/yolov3_mobilenet_v1_roadsign` config.
In the static-graph version, a model was usually defined by two config files (a main config plus a reader config). Since PaddleDetection 2.0, a decoupled modular design is used: a detector is assembled from config modules whose settings can be freely overridden, as shown below.
<center>
<img src="../images/roadsign_yml.png" width="500" >
</center>
<br><center>Config file summary</center>
As shown above, `yolov3_mobilenet_v1_roadsign.yml` depends on several other config files, in this case:
```bash
roadsign_voc.yml
runtime.yml
optimizer_40e.yml
yolov3_mobilenet_v1.yml
yolov3_reader.yml
--------------------------------------
yolov3_mobilenet_v1_roadsign   entry-point config
roadsign_voc                   paths of the training and validation data
runtime.yml                    common runtime options, e.g. whether to use the GPU and how often to save checkpoints
optimizer_40e.yml              learning-rate and optimizer settings
yolov3_mobilenet_v1.yml        model architecture and backbone settings
yolov3_reader.yml              data-reader settings (batch size, number of worker processes, etc.) and post-read preprocessing such as resize and data augmentation
```
<center><img src="../images/yaml_show.png" width="1000" ></center>
<br><center>配置文件结构说明</center></br>
### 修改配置文件说明
* 关于数据的路径修改说明
在修改配置文件中,用户如何实现自定义数据集是非常关键的一步,如何定义数据集请参考[如何自定义数据集](https://aistudio.baidu.com/aistudio/projectdetail/1917140)
* 默认学习率是适配多GPU训练(8x GPU),若使用单GPU训练,须对应调整学习率(例如,除以8)
* 更多使用问题,请参考[FAQ](FAQ)
## 4 Training
PaddleDetection supports both single-GPU and multi-GPU training:
* Single-GPU training
```bash
export CUDA_VISIBLE_DEVICES=0 # not needed on Windows and macOS
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml
```
* Multi-GPU training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 # not needed on Windows and macOS
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml
```
* [Multi-machine multi-GPU training](./DistributedTraining_cn.md)
```bash
fleetrun \
--ips="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151" \
--selected_gpu 0,1,2,3,4,5,6,7 \
tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml
```
* Fine-tuning on another task
When fine-tuning on another task, the pre-trained weights can be loaded directly; parameters with mismatched shapes are ignored automatically. For example:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# parameters whose shape differs from the loaded weights will not be loaded
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o pretrain_weights=output/model_final
```
* Resuming training
If training is interrupted for some reason, it can be resumed with the `-r` flag:
```bash
export CUDA_VISIBLE_DEVICES=0 # not needed on Windows and macOS
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -r output/faster_rcnn_r50_1x_coco/10000
```
## 5 Evaluation
* By default, trained models are saved under the `output` directory.
```bash
export CUDA_VISIBLE_DEVICES=0 # not needed on Windows and macOS
python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_roadsign.pdparams
```
* Evaluation during training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 # not needed on Windows and macOS
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval
```
Training and evaluation run alternately, with evaluation at the end of each epoch; after each evaluation, the best mAP model so far is saved to the `best_model` directory.
If the validation set is large, evaluation is time-consuming. Consider increasing `snapshot_epoch` in `configs/runtime.yml` to reduce the evaluation frequency (see the sketch below), or evaluate after training completes.
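For example, a sketch of the relevant `configs/runtime.yml` entry (evaluating every 5 epochs instead of every epoch):
```yaml
snapshot_epoch: 5
```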
- Evaluating with a json file
```bash
export CUDA_VISIBLE_DEVICES=0 # not needed on Windows and macOS
python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
--json_eval \
--output_eval evaluation/
```
* The command above does not specify a weights path, so the default `weights` value from the config file is used; `weights` points to the last model saved during training.
* The json file must be named bbox.json or mask.json and placed in the `evaluation` directory.
## 6 Prediction
```bash
python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --infer_img=demo/000000570688.jpg -o weights=https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_roadsign.pdparams
```
* Prediction with custom options
```bash
export CUDA_VISIBLE_DEVICES=0 # not needed on Windows and macOS
python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
--infer_img=demo/road554.png \
--output_dir=infer_output/ \
--draw_threshold=0.5 \
-o weights=output/yolov3_mobilenet_v1_roadsign/model_final \
--use_vdl=True
```
`--draw_threshold` is an optional argument; depending on the [NMS](https://ieeexplore.ieee.org/document/1699659) computation, different thresholds produce different results.
`keep_top_k` sets the maximum number of output objects; it defaults to 100 and can be tuned as needed, as in the sketch below.
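`keep_top_k` belongs to the NMS settings in the model config; a sketch of the relevant block, following the `BBoxPostProcess` example in the YOLO configuration tutorial later in this document:
```yaml
BBoxPostProcess:
  nms:
    name: MatrixNMS
    keep_top_k: 100  # maximum number of output objects per image
```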
The result looks like this:
![road554 image](../images/road554.png)
## 7 Training visualization
PaddleDetection integrates the VisualDL visualization tool so that users can monitor the training state in real time. When the `use_vdl` switch is on, the recorded data include:
1. the loss curve
2. the mAP curve
```bash
export CUDA_VISIBLE_DEVICES=0 # not needed on Windows and macOS
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
                      --use_vdl=true \
                      --vdl_log_dir=vdl_dir/scalar
```
Start VisualDL to view the logs:
```shell
# this starts a server on 127.0.0.1, viewable in a web page; use --host to bind a different IP address
visualdl --logdir vdl_dir/scalar/
```
Open the printed URL in a browser; the result looks like this:
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/ab767a202f084d1589f7d34702a75a7ef5d0f0a7e8c445bd80d54775b5761a8d" width="900" ></center>
<br><center>Figure: VisualDL demo</center>
**Argument list**
The flags below can be listed with `--help`:
| FLAG | script | description | default | remark |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
| -c | ALL | Config file | None | **required**, e.g. `-c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml` |
| -o | ALL | Override settings from the config file | None | Takes precedence over the file given by `-c`, e.g. `-o use_gpu=False` |
| --eval | train | Whether to evaluate during training | False | pass `--eval` to enable |
| -r/--resume_checkpoint | train | Checkpoint path for resuming training | None | e.g. `-r output/faster_rcnn_r50_1x_coco/10000` |
| --slim_config | ALL | Config file of the model-compression strategy | None | e.g. `--slim_config configs/slim/prune/yolov3_prune_l1_norm.yml` |
| --use_vdl | train/infer | Whether to record data with [VisualDL](https://github.com/paddlepaddle/visualdl) for display in its dashboard | False | VisualDL requires Python>=3.5 |
| --vdl\_log_dir | train/infer | Directory where VisualDL stores its data | train:`vdl_log_dir/scalar` infer: `vdl_log_dir/image` | VisualDL requires Python>=3.5 |
| --output_eval | eval | Directory for the evaluation json output | None | e.g. `--output_eval=eval_output`, defaults to the current directory |
| --json_eval | eval | Whether to evaluate from existing bbox.json or mask.json files | False | pass `--json_eval` to enable; the json path is set via `--output_eval` |
| --classwise | eval | Whether to report per-class AP and draw per-class PR curves | False | pass `--classwise` to enable |
| --output_dir | infer/export_model | Directory for prediction results or the exported model | `./output` | e.g. `--output_dir=output` |
| --draw_threshold | infer | Score threshold for visualization | 0.5 | e.g. `--draw_threshold=0.7` |
| --infer_dir | infer | Directory of images to run inference on | None | at least one of `--infer_img` and `--infer_dir` is required |
| --infer_img | infer | Path of the image to run inference on | None | at least one of `--infer_img` and `--infer_dir` is required; `infer_img` takes precedence |
| --save_results | infer | Whether to save the prediction results to file | False | optional |
## 8 Model export
The model files saved during training contain both the forward and backward passes, but industrial deployment only needs the forward pass, so the model must be exported into the deployment format.
PaddleDetection provides the `tools/export_model.py` script for exporting models:
```bash
python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --output_dir=./inference_model \
-o weights=output/yolov3_mobilenet_v1_roadsign/best_model
```
The inference model is exported to the `inference_model/yolov3_mobilenet_v1_roadsign` directory as `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`. If no output directory is specified, the model is exported to `output_inference`.
* For more on model export, see the [model export documentation](../../deploy/EXPORT_MODEL.md)
## 9 Model compression
For further optimization, PaddleDetection provides complete tutorials and benchmarks for model compression based on PaddleSlim. Currently supported strategies:
* pruning
* quantization
* distillation
* combined strategies
* For more on model compression, see the [model compression documentation](../../configs/slim/README.md)
## 10 Deployment
PaddleDetection supports deployment via Paddle Inference, Paddle Serving, and Paddle Lite, covering server, mobile, and embedded platforms, with complete Python and C++ solutions.
* Here we take Python as an example to show model deployment with Paddle Inference:
```bash
python deploy/python/infer.py --model_dir=./output_inference/yolov3_mobilenet_v1_roadsign --image_file=demo/road554.png --device=GPU
```
* `infer.py` also provides a rich interface for predicting from video files or cameras; see [Python deployment](../../deploy/python).
### Deployment options supported by PaddleDetection
| Form | Language | Tutorial | Device/Platform |
|-|-|-|-|
| PaddleInference | Python | complete | Linux (arm, x86), Windows |
| PaddleInference | C++ | complete | Linux (arm, x86), Windows |
| PaddleServing | Python | complete | Linux (arm, x86), Windows |
| PaddleLite | C++ | complete | Android, iOS, FPGA, RK... |
* For more on deployment, see the [deployment documentation](../../deploy/README.md)
English | [简体中文](INSTALL_cn.md)
# Installation
This document covers how to install PaddleYOLO and its dependencies
(including PaddlePaddle), together with COCO and Pascal VOC dataset.
For general information about PaddleYOLO, please see [README.md](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop).
## Requirements:
- PaddlePaddle 2.3.2
- OS 64 bit
- Python 3 (3.5.1+/3.6/3.7/3.8/3.9), 64 bit
- pip/pip3 (9.0.1+), 64 bit
- CUDA >= 10.2
- cuDNN >= 7.6
Dependency of PaddleYOLO and PaddlePaddle:
| PaddleYOLO version | PaddlePaddle version | tips |
| :----------------: | :---------------: | :-------: |
| develop | >= 2.3.2 | Dygraph mode is set as default |
| release/2.6 | >= 2.3.2 | Dygraph mode is set as default |
| release/2.5 | >= 2.2.2 | Dygraph mode is set as default |
## Instruction
### 1. Install PaddlePaddle
```
# CUDA10.2
python -m pip install paddlepaddle-gpu==2.3.2 -i https://mirror.baidu.com/pypi/simple
# CPU
python -m pip install paddlepaddle==2.3.2 -i https://mirror.baidu.com/pypi/simple
```
- For more CUDA version or environment to quick install, please refer to the [PaddlePaddle Quick Installation document](https://www.paddlepaddle.org.cn/install/quick)
- For more installation methods such as conda or compile with source code, please refer to the [installation document](https://www.paddlepaddle.org.cn/documentation/docs/en/install/index_en.html)
Please make sure that your PaddlePaddle is installed successfully and the version is not lower than the required version. Use the following command to verify.
```
# check
>>> import paddle
>>> paddle.utils.run_check()
# confirm the paddle's version
python -c "import paddle; print(paddle.__version__)"
```
**Note**
1. If you want to use PaddleDetection on multiple GPUs, please install NCCL first.
### 2. Install PaddleDetection
**Note:** Installing via pip only supports Python3
```
# Clone PaddleDetection repository
cd <path/to/clone/PaddleDetection>
git clone https://github.com/PaddlePaddle/PaddleDetection.git
# Install other dependencies
cd PaddleDetection
pip install -r requirements.txt
# Compile and install paddledet
python setup.py install
```
**Note**
1. If you are working on Windows, installing `pycocotools` may fail because the original cocoapi does not support Windows; use this third-party build instead, which only supports Python 3:
```pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI```
2. If you are using Python <= 3.6, installing `pycocotools` may fail with an error like `distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('cython>=0.27.3')`; install `cython` first, e.g. `pip install cython`
After installation, make sure the tests pass:
```shell
python ppdet/modeling/tests/test_architectures.py
```
If the tests pass, the following is printed:
```
.......
----------------------------------------------------------------------
Ran 7 tests in 12.816s
OK
```
## Use built Docker images
> If you do not have a Docker environment, please refer to [Docker](https://www.docker.com/).
We provide docker images containing the latest PaddleDetection code, and all environment and package dependencies are pre-installed. All you have to do is to **pull and run the docker image**. Then you can enjoy PaddleDetection without any extra steps.
Get these images and guidance in [docker hub](https://hub.docker.com/repository/docker/paddlecloud/paddledetection), including CPU, GPU, ROCm environment versions.
If you have some customized requirements about automatic building docker images, you can get it in github repo [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton).
## Inference demo
**Congratulations!** You have installed PaddleDetection successfully. Now try our inference demo:
```
# Predict an image by GPU
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg
```
An image of the same name with the predicted result will be generated under the `output` folder.
The result is as shown below:
![](../images/000000014439.jpg)
[English](INSTALL.md) | 简体中文
# Installation
## Requirements
- PaddlePaddle 2.3.2
- OS: 64-bit operating system
- Python 3 (3.5.1+/3.6/3.7/3.8/3.9), 64 bit
- pip/pip3 (9.0.1+), 64 bit
- CUDA >= 10.2
- cuDNN >= 7.6
Dependency of PaddleYOLO on PaddlePaddle versions:
| PaddleYOLO version | PaddlePaddle version | tips |
| :------------------: | :---------------: | :-------: |
| develop | >= 2.3.2 | dygraph mode is the default |
| release/2.6 | >= 2.3.2 | dygraph mode is the default |
| release/2.5 | >= 2.2.2 | dygraph mode is the default |
## Instructions
### 1. Install PaddlePaddle
```
# CUDA10.2
python -m pip install paddlepaddle-gpu==2.3.2 -i https://mirror.baidu.com/pypi/simple
# CPU
python -m pip install paddlepaddle==2.3.2 -i https://mirror.baidu.com/pypi/simple
```
- For quick installation with other CUDA versions or environments, see the [PaddlePaddle quick installation document](https://www.paddlepaddle.org.cn/install/quick)
- For other installation methods such as conda or building from source, see the [PaddlePaddle installation document](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/index_cn.html)
Make sure PaddlePaddle is installed successfully and its version is not lower than required. Verify with the following commands.
```
# confirm in your Python interpreter that PaddlePaddle is installed
>>> import paddle
>>> paddle.utils.run_check()
# confirm the PaddlePaddle version
python -c "import paddle; print(paddle.__version__)"
```
**Note**
1. If you want to use PaddleDetection on multiple GPUs, please install NCCL first.
### 2. Install PaddleDetection
**Note:** installation via pip only supports Python 3.
```
# clone the PaddleDetection repository
cd <path/to/clone/PaddleDetection>
git clone https://github.com/PaddlePaddle/PaddleDetection.git
# install the other dependencies
cd PaddleDetection
pip install -r requirements.txt
# compile and install paddledet
python setup.py install
```
**Note**
1. If downloading from GitHub is slow, try the [gitee mirror](https://gitee.com/PaddlePaddle/PaddleDetection.git) or a [proxy accelerator](https://doc.fastgit.org/zh-cn/guide.html).
2. If you are on Windows, installing `pycocotools` may fail because the original cocoapi does not support Windows; use this third-party build instead, which only supports Python 3:
```pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI```
3. If you are using Python <= 3.6, installing `pycocotools` may fail with an error like `distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('cython>=0.27.3')`; install `cython` first, e.g. `pip install cython`
After installation, make sure the tests pass:
```
python ppdet/modeling/tests/test_architectures.py
```
If the tests pass, the following is printed:
```
.......
----------------------------------------------------------------------
Ran 7 tests in 12.816s
OK
```
## Use the Docker image
> If you do not have a Docker environment, see the [Docker website](https://www.docker.com/) for installation.
We provide a docker image containing the latest PaddleDetection code with all environment and package dependencies pre-installed. Simply **pull and run the docker image**; no other steps are needed to use all of PaddleDetection's features.
Get the images and the corresponding usage guide from [Docker Hub](https://hub.docker.com/repository/docker/paddlecloud/paddledetection), including CPU, GPU, and ROCm versions.
If you are interested in automatically building docker images or have custom requirements, see [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton).
## Quick demo
**Congratulations!** You have installed PaddleDetection successfully. Now quickly try object detection:
```
# predict an image on the GPU
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg
```
An image of the same name with the predicted result is generated under the `output` folder.
The result is as shown below:
![](../images/000000014439.jpg)
English | [简体中文](QUICK_STARTED_cn.md)
# Quick Start
To let users experience PaddleDetection and produce a model in a short time, this tutorial shows how to obtain a decent object detection model in about 10 minutes by fine-tuning on a small dataset. In practical applications, users should select a suitable model configuration file for their specific needs.
- **Set GPU**
```bash
export CUDA_VISIBLE_DEVICES=0
```
## Inference Demo with Pre-trained Models
```
# predict an image using PP-YOLO
python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg
```
the result:
![](../images/000000014439.jpg)
## Data preparation
The dataset is a [Kaggle dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) of 877 images in 4 categories: crosswalk, speedlimit, stop, and trafficlight. It is split into a training set (701 images) and a test set (176 images); [download link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
```
# Note: this step can be skipped;
# the dataset will be downloaded automatically during training.
python dataset/roadsign_voc/download_roadsign_voc.py
```
## Training & Evaluation & Inference
### 1、Training
```
# training takes about 10 minutes on a 1080Ti GPU and about 1 hour on CPU
# -c sets the configuration file
# -o overrides settings in the configuration file
# --eval evaluates while training; the model with the best evaluation result is automatically saved as best_model
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval -o use_gpu=true
```
If you want to watch the loss curve in real time through VisualDL, add `--use_vdl=true` to the training command and set the log directory with `--vdl_log_dir`.
**Note: VisualDL need Python>=3.5**
Please install [VisualDL](https://github.com/PaddlePaddle/VisualDL) first
```
python -m pip install visualdl -i https://mirror.baidu.com/pypi/simple
```
```
python -u tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
--use_vdl=true \
--vdl_log_dir=vdl_dir/scalar \
--eval
```
View the change curve in real time through the visualdl command:
```
visualdl --logdir vdl_dir/scalar/ --host <host_IP> --port <port_num>
```
### 2、Evaluation
```
# Evaluate best_model by default
# -c set config file
# -o overwrite the settings in the configuration file
python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true
```
The final mAP should be around 0.85. The dataset is small so the precision may vary a little after each training.
### 3、Inference
```
# -c set config file
# -o overwrite the settings in the configuration file
# --infer_img image path
# After the prediction is over, an image of the same name with the prediction result will be generated in the output folder
python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true --infer_img=demo/road554.png
```
The result is as shown below:
![](../images/road554.png)
[English](QUICK_STARTED.md) | 简体中文
# Quick Start
To let users produce a model in a short time and learn how PaddleDetection is used, this tutorial fine-tunes a pre-trained detection model on a small dataset, yielding a decent model quickly. In real business scenarios, users should pick a suitable model config file for their needs.
- **Set the GPU**
```bash
export CUDA_VISIBLE_DEVICES=0
```
## 1. Quick demo
```
# predict an image with a PP-YOLO model pre-trained on the COCO dataset
python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg
```
The result is as shown below:
![demo image](../images/000000014439.jpg)
## 2. Data preparation
The dataset comes from a [Kaggle dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) of 877 images in 4 categories: crosswalk, speedlimit, stop, and trafficlight.
It is split into a training set of 701 images and a test set of 176 images; [download link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
```
# Note: this step can be skipped; the dataset will be downloaded automatically during training
python dataset/roadsign_voc/download_roadsign_voc.py
```
## 3. Training, evaluation, and inference
### 1. Training
```
# train with in-training evaluation; takes about 1 hour on CPU (use_gpu=false) and about 10 minutes on a 1080Ti GPU
# -c selects the configuration file
# -o overrides global settings in the configuration file; here it enables the GPU
# --eval evaluates while training; the final model is automatically saved as model_final.pdparams
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval -o use_gpu=true
```
To watch the loss curve in real time through VisualDL, add `--use_vdl=true` to the training command and set the log directory with `--vdl_log_dir`.
**Note that VisualDL requires Python>=3.5.**
First install [VisualDL](https://github.com/PaddlePaddle/VisualDL):
```
python -m pip install visualdl -i https://mirror.baidu.com/pypi/simple
```
```
python -u tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
--use_vdl=true \
--vdl_log_dir=vdl_dir/scalar \
--eval
```
View the curves in real time with the visualdl command:
```
visualdl --logdir vdl_dir/scalar/ --host <host_IP> --port <port_num>
```
### 2. Evaluation
```
# evaluation uses the model_final.pdparams saved during training by default
# -c selects the configuration file
# -o overrides global settings in the configuration file
# currently only single-GPU evaluation is supported
python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true
```
The final mAP should be around 0.85; the dataset is small, so the precision may vary a little between runs.
### 3. Inference
```
# -c selects the configuration file
# -o overrides global settings in the configuration file
# --infer_img sets the path of the image to predict
# after prediction, an image of the same name with the results drawn on it is generated in the output folder
python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true --infer_img=demo/road554.png
```
The result is as shown below:
![road554 image](../images/road554.png)
# Multi Scale Test Configuration
Tags: Configuration
---
```yaml
##################################### Multi scale test configuration #####################################
EvalReader:
  sample_transforms:
    - Decode: {}
    - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700, 900]}
    - NormalizeImage: {is_scale: true, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]}
    - Permute: {}
TestReader:
  sample_transforms:
    - Decode: {}
    - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700, 900]}
    - NormalizeImage: {is_scale: true, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]}
    - Permute: {}
```
---
Multi-scale test is a TTA (test-time augmentation) method that can improve object detection performance.
The input image is resized to several scales, the model produces predictions (bboxes) at each scale, and all predictions are combined into the final result (**NMS** is used to aggregate them).
## _MultiscaleTestResize_ option
The `MultiscaleTestResize` option enables multi-scale test prediction.
`origin_target_size: [800, 1333]` means the input image is first resized so that its short edge is 800 pixels and its long edge does not exceed 1333 pixels.
`target_size: [700, 900]` specifies the additional scales.
It can be plugged into the evaluation or test (inference) process by adding a `MultiscaleTestResize` entry to `EvalReader.sample_transforms` or `TestReader.sample_transforms`.
---
### Note
Currently only CascadeRCNN, FasterRCNN, and MaskRCNN support multi-scale testing, and the batch size must be 1.
# Multi Scale Test Configuration
Tags: Configuration
---
```yaml
##################################### Multi scale test configuration #####################################
EvalReader:
  sample_transforms:
    - Decode: {}
    - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700, 900]}
    - NormalizeImage: {is_scale: true, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]}
    - Permute: {}
TestReader:
  sample_transforms:
    - Decode: {}
    - MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700, 900]}
    - NormalizeImage: {is_scale: true, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]}
    - Permute: {}
```
---
Multi-scale test is a TTA (test-time augmentation) method that can improve object detection accuracy.
The input image is resized to several scales, the model produces predictions at each scale, and the predictions from all scales are merged into the final result (**NMS** is used for the merging).
## _MultiscaleTestResize_ option
The `MultiscaleTestResize` option enables multi-scale testing.
`origin_target_size: [800, 1333]` means the input image is first resized so that its short edge is 800 pixels and its long edge does not exceed 1333 pixels.
`target_size: [700, 900]` sets the additional prediction scales.
Multi-scale testing can be enabled during evaluation or prediction by adding a `MultiscaleTestResize` entry to `EvalReader.sample_transforms` or `TestReader.sample_transforms`.
---
### Note
Currently multi-scale testing only supports the CascadeRCNN, FasterRCNN, and MaskRCNN networks, and the batch size must be 1.
# YOLO series model parameter configuration tutorial
Tags: model parameter configuration
Take `ppyolo_r50vd_dcn_1x_coco.yml` as an example; this model is composed of five sub-config files:
- Dataset config file `coco_detection.yml`
```yaml
# dataset evaluation type
metric: COCO
# number of classes in the dataset
num_classes: 80
# TrainDataset
TrainDataset:
  !COCODataSet
    # image directory, relative to dataset_dir: os.path.join(dataset_dir, image_dir)
    image_dir: train2017
    # annotation file path, relative to dataset_dir: os.path.join(dataset_dir, anno_path)
    anno_path: annotations/instances_train2017.json
    # dataset root directory
    dataset_dir: dataset/coco
    # data_fields
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
  !COCODataSet
    # image directory, relative to dataset_dir: os.path.join(dataset_dir, image_dir)
    image_dir: val2017
    # annotation file path, relative to dataset_dir: os.path.join(dataset_dir, anno_path)
    anno_path: annotations/instances_val2017.json
    # dataset root directory
    dataset_dir: dataset/coco
TestDataset:
  !ImageFolder
    # annotation file path, relative to dataset_dir
    anno_path: annotations/instances_val2017.json
```
- Optimizer config file `optimizer_1x.yml`
```yaml
# total training epochs
epoch: 405
# learning rate settings
LearningRate:
  # base learning rate for 8-GPU training
  base_lr: 0.01
  # learning rate schedulers
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    # epochs at which the learning rate changes
    milestones:
    - 243
    - 324
  # warmup
  - !LinearWarmup
    start_factor: 0.
    steps: 4000
# optimizer builder
OptimizerBuilder:
  # optimizer
  optimizer:
    momentum: 0.9
    type: Momentum
  # regularization
  regularizer:
    factor: 0.0005
    type: L2
```
- Data reader config file `ppyolo_reader.yml`
```yaml
# number of reader worker processes per GPU
worker_num: 2
# training data
TrainReader:
  inputs_def:
    num_max_boxes: 50
  # per-sample transforms for the training data
  sample_transforms:
    - Decode: {}
    - Mixup: {alpha: 1.5, beta: 1.5}
    - RandomDistort: {}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {}
    - RandomFlip: {}
  # batch transforms
  batch_transforms:
    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeBox: {}
    - PadBox: {num_max_boxes: 50}
    - BboxXYXY2XYWH: {}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
    - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
  # batch size during training
  batch_size: 24
  # whether to shuffle the data
  shuffle: true
  # whether to drop the last incomplete batch
  drop_last: true
  # mixup_epoch: greater than the maximum epoch, so Mixup augmentation is used throughout training
  mixup_epoch: 25000
  # whether to use shared memory (e.g. /dev/shm) to speed up data loading; requires more than 1 GB of shared memory
  use_shared_memory: true
# evaluation data
EvalReader:
  # per-sample transforms for the evaluation data
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  # batch size during evaluation
  batch_size: 8
  # whether to drop samples without annotations
  drop_empty: false
# test data
TestReader:
  inputs_def:
    image_shape: [3, 608, 608]
  # per-sample transforms for the test data
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  # batch size during testing
  batch_size: 1
```
- Model config file `ppyolo_r50vd_dcn.yml`
```yaml
# model architecture type
architecture: YOLOv3
# pretrained weights
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
# norm_type
norm_type: sync_bn
# whether to use EMA
use_ema: true
# ema_decay
ema_decay: 0.9998
# YOLOv3
YOLOv3:
  # backbone
  backbone: ResNet
  # neck
  neck: PPYOLOFPN
  # yolo_head
  yolo_head: YOLOv3Head
  # post_process
  post_process: BBoxPostProcess
# backbone
ResNet:
  # depth
  depth: 50
  # variant
  variant: d
  # return_idx, 0 represents res2
  return_idx: [1, 2, 3]
  # dcn_v2_stages
  dcn_v2_stages: [3]
  # freeze_at
  freeze_at: -1
  # freeze_norm
  freeze_norm: false
  # norm_decay
  norm_decay: 0.
# PPYOLOFPN
PPYOLOFPN:
  # whether to use coord_conv
  coord_conv: true
  # whether to use drop_block
  drop_block: true
  # block_size
  block_size: 3
  # keep_prob
  keep_prob: 0.9
  # whether to use spp
  spp: true
# YOLOv3Head
YOLOv3Head:
  # anchors
  anchors: [[10, 13], [16, 30], [33, 23],
            [30, 61], [62, 45], [59, 119],
            [116, 90], [156, 198], [373, 326]]
  # anchor_masks
  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
  # loss
  loss: YOLOv3Loss
  # whether to use iou_aware
  iou_aware: true
  # iou_aware_factor
  iou_aware_factor: 0.4
# YOLOv3Loss
YOLOv3Loss:
  # ignore_thresh
  ignore_thresh: 0.7
  # downsample
  downsample: [32, 16, 8]
  # whether to use label_smooth
  label_smooth: false
  # scale_x_y
  scale_x_y: 1.05
  # iou_loss
  iou_loss: IouLoss
  # iou_aware_loss
  iou_aware_loss: IouAwareLoss
# IouLoss
IouLoss:
  loss_weight: 2.5
  loss_square: true
# IouAwareLoss
IouAwareLoss:
  loss_weight: 1.0
# BBoxPostProcess
BBoxPostProcess:
  decode:
    name: YOLOBox
    conf_thresh: 0.01
    downsample_ratio: 32
    clip_bbox: true
    scale_x_y: 1.05
  # NMS settings
  nms:
    name: MatrixNMS
    keep_top_k: 100
    score_threshold: 0.01
    post_threshold: 0.01
    nms_top_k: -1
    background_label: -1
```
- Runtime config file `runtime.yml`
```yaml
# whether to use the GPU
use_gpu: true
# logging interval in iterations
log_iter: 20
# save_dir
save_dir: output
# model save interval in epochs
snapshot_epoch: 1
```
# YOLO series model parameter configuration tutorial
Tag: model parameter configuration
Take `ppyolo_r50vd_dcn_1x_coco.yml` as an example; this model is composed of five sub-config files:
- Dataset config file `coco_detection.yml`
```yaml
# dataset evaluation type
metric: COCO
# number of classes in the dataset
num_classes: 80
# TrainDataset
TrainDataset:
  !COCODataSet
    # image directory, relative to dataset_dir: os.path.join(dataset_dir, image_dir)
    image_dir: train2017
    # annotation file path, relative to dataset_dir: os.path.join(dataset_dir, anno_path)
    anno_path: annotations/instances_train2017.json
    # dataset root directory
    dataset_dir: dataset/coco
    # data_fields
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
  !COCODataSet
    # image directory, relative to dataset_dir: os.path.join(dataset_dir, image_dir)
    image_dir: val2017
    # annotation file path, relative to dataset_dir: os.path.join(dataset_dir, anno_path)
    anno_path: annotations/instances_val2017.json
    # dataset root directory
    dataset_dir: dataset/coco
TestDataset:
  !ImageFolder
    # annotation file path, relative to dataset_dir
    anno_path: annotations/instances_val2017.json
```
- Optimizer config file `optimizer_1x.yml`
```yaml
# total training epochs
epoch: 405
# learning rate settings
LearningRate:
  # base learning rate for 8-GPU training
  base_lr: 0.01
  # learning rate schedulers
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    # epochs at which the learning rate changes
    milestones:
    - 243
    - 324
  # warmup
  - !LinearWarmup
    start_factor: 0.
    steps: 4000
# optimizer builder
OptimizerBuilder:
  # optimizer
  optimizer:
    momentum: 0.9
    type: Momentum
  # regularization
  regularizer:
    factor: 0.0005
    type: L2
```
- Data reader config file `ppyolo_reader.yml`
```yaml
# number of reader worker processes per GPU
worker_num: 2
# training data
TrainReader:
  inputs_def:
    num_max_boxes: 50
  # per-sample transforms for the training data
  sample_transforms:
    - Decode: {}
    - Mixup: {alpha: 1.5, beta: 1.5}
    - RandomDistort: {}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {}
    - RandomFlip: {}
  # batch transforms
  batch_transforms:
    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeBox: {}
    - PadBox: {num_max_boxes: 50}
    - BboxXYXY2XYWH: {}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
    - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
  # batch size during training
  batch_size: 24
  # whether to shuffle the data
  shuffle: true
  # whether to drop the last incomplete batch
  drop_last: true
  # mixup_epoch: greater than the maximum epoch, so Mixup augmentation is used throughout training
  mixup_epoch: 25000
  # whether to use shared memory (e.g. /dev/shm) to speed up data reading; requires more than 1 GB of shared memory
  use_shared_memory: true
# evaluation data
EvalReader:
  # per-sample transforms for the evaluation data
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  # batch size during evaluation
  batch_size: 8
  # whether to drop samples without annotations
  drop_empty: false
# test data
TestReader:
  inputs_def:
    image_shape: [3, 608, 608]
  # per-sample transforms for the test data
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  # batch size during testing
  batch_size: 1
```
- Model config file `ppyolo_r50vd_dcn.yml`
```yaml
# model architecture type
architecture: YOLOv3
# pretrained weights
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
# norm_type
norm_type: sync_bn
# whether to use EMA
use_ema: true
# ema_decay
ema_decay: 0.9998
# YOLOv3
YOLOv3:
  # backbone
  backbone: ResNet
  # neck
  neck: PPYOLOFPN
  # yolo_head
  yolo_head: YOLOv3Head
  # post_process
  post_process: BBoxPostProcess
# backbone
ResNet:
  # depth
  depth: 50
  # variant
  variant: d
  # return_idx, 0 represents res2
  return_idx: [1, 2, 3]
  # dcn_v2_stages
  dcn_v2_stages: [3]
  # freeze_at
  freeze_at: -1
  # freeze_norm
  freeze_norm: false
  # norm_decay
  norm_decay: 0.
# PPYOLOFPN
PPYOLOFPN:
  # whether to use coord_conv
  coord_conv: true
  # whether to use drop_block
  drop_block: true
  # block_size
  block_size: 3
  # keep_prob
  keep_prob: 0.9
  # whether to use spp
  spp: true
# YOLOv3Head
YOLOv3Head:
  # anchors
  anchors: [[10, 13], [16, 30], [33, 23],
            [30, 61], [62, 45], [59, 119],
            [116, 90], [156, 198], [373, 326]]
  # anchor_masks
  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
  # loss
  loss: YOLOv3Loss
  # whether to use iou_aware
  iou_aware: true
  # iou_aware_factor
  iou_aware_factor: 0.4
# YOLOv3Loss
YOLOv3Loss:
  # ignore_thresh
  ignore_thresh: 0.7
  # downsample
  downsample: [32, 16, 8]
  # whether to use label_smooth
  label_smooth: false
  # scale_x_y
  scale_x_y: 1.05
  # iou_loss
  iou_loss: IouLoss
  # iou_aware_loss
  iou_aware_loss: IouAwareLoss
# IouLoss
IouLoss:
  loss_weight: 2.5
  loss_square: true
# IouAwareLoss
IouAwareLoss:
  loss_weight: 1.0
# BBoxPostProcess
BBoxPostProcess:
  decode:
    name: YOLOBox
    conf_thresh: 0.01
    downsample_ratio: 32
    clip_bbox: true
    scale_x_y: 1.05
  # NMS settings
  nms:
    name: MatrixNMS
    keep_top_k: 100
    score_threshold: 0.01
    post_threshold: 0.01
    nms_top_k: -1
    background_label: -1
```
- Runtime config file `runtime.yml`
```yaml
# whether to use the GPU
use_gpu: true
# logging interval in iterations
log_iter: 20
# save_dir
save_dir: output
# model save interval in epochs
snapshot_epoch: 1
```
简体中文 | [English](DetAnnoTools_en.md)
# Object Detection Annotation Tools
## Contents
[LabelMe](#labelme)
* [Instruction](#instruction-of-labelme)
* [Installation](#installation)
* [Annotation of Images](#annotation-of-images-in-labelme)
* [Annotation Format](#annotation-format-of-labelme)
* [Export Format](#export-format-of-labelme)
* [Summary of Format Conversion](#summary-of-format-conversion)
* [Annotation file (json) --> VOC dataset](#annotation-file-json----voc-dataset)
* [Annotation file (json) --> COCO dataset](#annotation-file-json----coco-dataset)
[LabelImg](#labelimg)
* [Instruction](#instruction-of-labelimg)
* [Installation](#installation-of-labelimg)
* [Installation Notes](#installation-notes)
* [Annotation of Images](#annotation-of-images-in-labelimg)
* [Annotation Format](#annotation-format-of-labelimg)
* [Export Format](#export-format-of-labelimg)
* [Notes of Format Conversion](#notes-of-format-conversion)
## [LabelMe](https://github.com/wkentaro/labelme)
### Instruction of LabelMe
#### Installation
For installation details, see the Installation section of the [LabelMe official tutorial](https://github.com/wkentaro/labelme).
<details>
<summary><b> Ubuntu</b></summary>
```
sudo apt-get install labelme
# or
sudo pip3 install labelme
# or install standalone executable from:
# https://github.com/wkentaro/labelme/releases
```
</details>
<details>
<summary><b> macOS</b></summary>
```
brew install pyqt # maybe pyqt5
pip install labelme
# or
brew install wkentaro/labelme/labelme # command line interface
# brew install --cask wkentaro/labelme/labelme # app
# or install standalone executable/app from:
# https://github.com/wkentaro/labelme/releases
```
</details>
We recommend installing via Anaconda:
```
conda create --name=labelme python=3
conda activate labelme
pip install pyqt5
pip install labelme
```
#### Annotation of Images in LabelMe
After starting labelme, select an image file or a folder of images.
Select `create polygons` in the left toolbar and draw the annotation area as shown below (right-click the image area to choose a different annotation shape). After drawing, press Enter and fill in the label for the region in the popup box, e.g. `people`.
Click save in the left menu to generate the **annotation file** in `json` form.
![](https://media3.giphy.com/media/XdnHZgge5eynRK3ATK/giphy.gif?cid=790b7611192e4c0ec2b5e6990b6b0f65623154ffda66b122&rid=giphy.gif&ct=g)
### Annotation Format of LabelMe
#### Export Format of LabelMe
```
# generate annotation files
png/jpeg/jpg --> labelme annotation --> json
```
#### Summary of Format Conversion
```
# convert annotation files into the VOC dataset format
json --> labelme2voc.py --> VOC dataset
# convert annotation files into the COCO dataset format
json --> labelme2coco.py --> COCO dataset
```
#### Annotation file (json) --> VOC dataset
Use the official [labelme2voc.py](https://github.com/wkentaro/labelme/blob/main/examples/bbox_detection/labelme2voc.py) script.
Download the script and run it from the command line:
```bash
python labelme2voc.py data_annotated(annotation folder) data_dataset_voc(output folder) --labels labels.txt
```
Afterwards, the specified output folder contains the following directories:
```
# It generates:
# - data_dataset_voc/JPEGImages
# - data_dataset_voc/Annotations
# - data_dataset_voc/AnnotationsVisualization
```
#### Annotation file (json) --> COCO dataset
Use [x2coco.py provided by PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/tools/x2coco.py) to convert labelme-annotated data into the COCO dataset format:
```bash
python tools/x2coco.py \
--dataset_type labelme \
--json_input_dir ./labelme_annos/ \
--image_input_dir ./labelme_imgs/ \
--output_dir ./cocome/ \
--train_proportion 0.8 \
--val_proportion 0.2 \
--test_proportion 0.0
```
After conversion to COCO, the dataset directory structure is as follows (avoid Chinese characters in path and file names to prevent encoding problems):
```
dataset/xxx/
├── annotations
│ ├── train.json # coco annotation file
│ ├── valid.json # coco annotation file
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
...
```
## [LabelImg](https://github.com/tzutalin/labelImg)
### Instruction of LabelImg
#### Installation of LabelImg
For installation details, see the [LabelImg official tutorial](https://github.com/tzutalin/labelImg).
<details>
<summary><b> Ubuntu</b></summary>
```
sudo apt-get install pyqt5-dev-tools
sudo pip3 install -r requirements/requirements-linux-python3.txt
make qt5py3
python3 labelImg.py
python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
```
</details>
<details>
<summary><b>macOS</b></summary>
```
brew install qt # Install qt-5.x.x by Homebrew
brew install libxml2
# or using pip
pip3 install pyqt5 lxml # Install qt and lxml by pip
make qt5py3
python3 labelImg.py
python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
```
</details>
We recommend installing via Anaconda.
First download and enter the [labelImg](https://github.com/tzutalin/labelImg#labelimg) directory:
```
conda install pyqt=5
conda install -c anaconda lxml
pyrcc5 -o libs/resources.py resources.qrc
python labelImg.py
python labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
```
#### Installation Notes
Taking the Anaconda route as an example, setup is somewhat more involved than for LabelMe.
LabelImg is started by running the Python script: `python labelImg.py <image path>`.
#### Annotation of Images in LabelImg
After starting labelImg, select an image file or a folder of images.
Select `Create RectBox` in the left toolbar, draw the annotation area, and pick the corresponding label in the popup box.
Click save in the left menu; annotations can be exported as VOC, YOLO, or CreateML files.
![](https://user-images.githubusercontent.com/34162360/177526022-fd9c63d8-e476-4b63-ae02-76d032bb7656.gif)
### Annotation Format of LabelImg
#### Export Format of LabelImg
```
# generate annotation files
png/jpeg/jpg --> labelImg annotation --> xml/txt/json
```
#### Notes of Format Conversion
**PaddleDetection supports data in VOC or COCO format.** Annotation files exported from LabelImg must be converted into **VOC or COCO format**; see [PrepareDataSet](./PrepareDataSet.md#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE).
[简体中文](DetAnnoTools.md) | English
# Object Detection Annotation Tools
## Contents
[LabelMe](#LabelMe)
* [Instruction](#Instruction-of-LabelMe)
* [Installation](#Installation)
* [Annotation of Images](#Annotation-of-images-in-LabelMe)
* [Annotation Format](#Annotation-Format-of-LabelMe)
* [Export Format](#Export-Format-of-LabelMe)
* [Summary of Format Conversion](#Summary-of-Format-Conversion)
* [Annotation file(json)—>VOC Dataset](#annotation-filejsonvoc-dataset)
* [Annotation file(json)—>COCO Dataset](#annotation-filejsoncoco-dataset)
[LabelImg](#LabelImg)
* [Instruction](#Instruction-of-LabelImg)
* [Installation](#Installation-of-LabelImg)
* [Installation Notes](#Installation-Notes)
* [Annotation of images](#Annotation-of-images-in-LabelImg)
* [Annotation Format](#Annotation-Format-of-LabelImg)
* [Export Format](#Export-Format-of-LabelImg)
* [Notes of Format Conversion](#Notes-of-Format-Conversion)
## [LabelMe](https://github.com/wkentaro/labelme)
### Instruction of LabelMe
#### Installation
Please refer to [The github of LabelMe](https://github.com/wkentaro/labelme) for installation details.
<details>
<summary><b> Ubuntu</b></summary>
```
sudo apt-get install labelme
# or
sudo pip3 install labelme
# or install standalone executable from:
# https://github.com/wkentaro/labelme/releases
```
</details>
<details>
<summary><b> macOS</b></summary>
```
brew install pyqt # maybe pyqt5
pip install labelme
# or
brew install wkentaro/labelme/labelme # command line interface
# brew install --cask wkentaro/labelme/labelme # app
# or install standalone executable/app from:
# https://github.com/wkentaro/labelme/releases
```
</details>
We recommend installing via Anaconda.
```
conda create --name=labelme python=3
conda activate labelme
pip install pyqt5
pip install labelme
```
#### Annotation of Images in LabelMe
After starting labelme, select an image or a folder of images.
Select `create polygons` in the toolbar. Draw an annotation area as shown in the following GIF (right-click the image to choose a different shape). When finished, press the Enter/Return key, then fill in the corresponding label in the popup box, such as `people`.
Click the save button in the toolbar; it generates an annotation file in JSON.
![](https://media3.giphy.com/media/XdnHZgge5eynRK3ATK/giphy.gif?cid=790b7611192e4c0ec2b5e6990b6b0f65623154ffda66b122&rid=giphy.gif&ct=g)
### Annotation Format of LabelMe
#### Export Format of LabelMe
```
#generate an annotation file
png/jpeg/jpg-->labelme-->json
```
#### Summary of Format Conversion
```
#convert annotation file to VOC dataset format
json-->labelme2voc.py-->VOC dataset
#convert annotation file to COCO dataset format
json-->labelme2coco.py-->COCO dataset
```
#### Annotation file(json)—>VOC Dataset
Use this script [labelme2voc.py](https://github.com/wkentaro/labelme/blob/main/examples/bbox_detection/labelme2voc.py) in command line.
```bash
python labelme2voc.py data_annotated(annotation folder) data_dataset_voc(output folder) --labels labels.txt
```
Then, it will generate following contents:
```
# It generates:
# - data_dataset_voc/JPEGImages
# - data_dataset_voc/Annotations
# - data_dataset_voc/AnnotationsVisualization
```
#### Annotation file(json)—>COCO Dataset
Convert the data annotated by LabelMe to COCO dataset by the script [x2coco.py](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/tools/x2coco.py) provided by PaddleDetection.
```bash
python tools/x2coco.py \
--dataset_type labelme \
--json_input_dir ./labelme_annos/ \
--image_input_dir ./labelme_imgs/ \
--output_dir ./cocome/ \
--train_proportion 0.8 \
--val_proportion 0.2 \
--test_proportion 0.0
```
After the dataset is converted to COCO format, the directory structure is as follows (avoid Chinese characters in path and file names to prevent encoding problems; a matching dataset config is sketched after the listing):
```
dataset/xxx/
├── annotations
│ ├── train.json # Annotation file of coco data
│ ├── valid.json # Annotation file of coco data
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
...
```
## [LabelImg](https://github.com/tzutalin/labelImg)
### Instruction
#### Installation of LabelImg
Please refer to [The github of LabelImg](https://github.com/tzutalin/labelImg) for installation details.
<details>
<summary><b> Ubuntu</b></summary>
```
sudo apt-get install pyqt5-dev-tools
sudo pip3 install -r requirements/requirements-linux-python3.txt
make qt5py3
python3 labelImg.py
python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
```
</details>
<details>
<summary><b>macOS</b></summary>
```
brew install qt # Install qt-5.x.x by Homebrew
brew install libxml2
# or using pip
pip3 install pyqt5 lxml # Install qt and lxml by pip
make qt5py3
python3 labelImg.py
python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
```
</details>
We recommend installing via Anaconda.
First download and enter the [labelImg](https://github.com/tzutalin/labelImg#labelimg) directory:
```
conda install pyqt=5
conda install -c anaconda lxml
pyrcc5 -o libs/resources.py resources.qrc
python labelImg.py
python labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
```
#### Installation Notes
LabelImg is started by running the Python script: `python labelImg.py <IMAGE_PATH>`.
#### Annotation of images in LabelImg
After the startup of LabelImg, select an image or a folder of images.
Select `Create RectBox` in the toolbar. Draw an annotation area as shown in the following GIF, then select the corresponding label in the popup box. Annotations can be saved in three formats: VOC, YOLO, or CreateML.
![](https://user-images.githubusercontent.com/34162360/177526022-fd9c63d8-e476-4b63-ae02-76d032bb7656.gif)
### Annotation Format of LabelImg
#### Export Format of LabelImg
```
#generate annotation files
png/jpeg/jpg-->labelImg-->xml/txt/json
```
#### Notes of Format Conversion
**PaddleDetection supports the VOC and COCO formats.** The annotation files generated by LabelImg need to be converted to VOC or COCO format; see [PrepareDataSet](./PrepareDataSet.md#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE).
# Object Detection Data Preparation
## Contents
- [Object Detection Data](#object-detection-data)
- [Preparing Training Data](#preparing-training-data)
- [VOC Data](#voc-data)
- [VOC Dataset Download](#voc-dataset-download)
- [VOC Annotation Files](#voc-annotation-files)
- [COCO Data](#coco-data)
- [COCO Dataset Download](#coco-dataset-download)
- [COCO Annotation Files](#coco-annotation-files)
- [Preparing Custom Data](#preparing-custom-data)
- [Converting Custom Data to VOC](#converting-custom-data-to-voc)
- [Converting Custom Data to COCO](#converting-custom-data-to-coco)
- [Custom Reader for Custom Data](#custom-reader-for-custom-data)
- [Using Custom Data: an Example](#using-custom-data-an-example)
- [Format Conversion](#format-conversion)
- [Training on Custom Data](#training-on-custom-data)
- [(Optional) Generating Anchors](#optional-generating-anchors)
### Object Detection Data
Object detection data is more complex than classification data: for each image, the location and class of every target region must be labeled.
A target region is usually a rectangular box, expressed in one of three ways:
| Encoding | Description |
| :----------------: | :--------------------------------: |
| x1,y1,x2,y2 | (x1,y1) is the top-left corner, (x2,y2) the bottom-right corner |
| x1,y1,w,h | (x1,y1) is the top-left corner, w the region width, h the region height |
| xc,yc,w,h | (xc,yc) is the region center, w the region width, h the region height |
Common detection datasets such as Pascal VOC use `[x1,y1,x2,y2]` to represent bounding boxes, while [COCO](https://cocodataset.org/#format-data) uses `[x1,y1,w,h]`, as sketched below.
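The three encodings are interconvertible; a minimal sketch of the conversions (the function names are illustrative):
```python
def xyxy_to_xywh(box):
    """[x1, y1, x2, y2] -> [x1, y1, w, h] (COCO style)."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]

def xywh_to_cxcywh(box):
    """[x1, y1, w, h] -> [xc, yc, w, h] (center style)."""
    x1, y1, w, h = box
    return [x1 + w / 2, y1 + h / 2, w, h]

assert xyxy_to_xywh([10, 20, 50, 80]) == [10, 20, 40, 60]
assert xywh_to_cxcywh([10, 20, 40, 60]) == [30.0, 50.0, 40, 60]
```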
### Preparing Training Data
PaddleDetection supports the [COCO](http://cocodataset.org), [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/), and [WIDER-FACE](http://shuoyang1213.me/WIDERFACE/) data sources by default.
Custom data sources are also supported, via:
(1) converting custom data to VOC format;
(2) converting custom data to COCO format;
(3) defining a new data source with a custom reader.
First change into the `PaddleDetection` root directory:
```
cd PaddleDetection/
ppdet_root=$(pwd)
```
#### VOC Data
VOC data is the data used in the [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) competition. The Pascal VOC competition covers not only image classification but also object detection, image segmentation and other tasks, so its annotation files contain labels for multiple tasks.
For user-defined VOC data, decide for each optional field in the xml files whether to label it or use the default value, according to your actual situation.
##### VOC Dataset Download
- Download the VOC dataset automatically via code. The dataset is large, so the download takes a while:
```
# run the script to download the VOC dataset automatically
python dataset/voc/download_voc.py
```
After the script finishes, the VOC dataset is organized as follows:
```
>>cd dataset/voc/
>>tree
├── create_list.py
├── download_voc.py
├── generic_det_label_list.txt
├── generic_det_label_list_zh.txt
├── label_list.txt
├── VOCdevkit/VOC2007
│ ├── annotations
│ ├── 001789.xml
│ | ...
│ ├── JPEGImages
│ ├── 001789.jpg
│ | ...
│ ├── ImageSets
│ | ...
├── VOCdevkit/VOC2012
│ ├── Annotations
│ ├── 2011_003876.xml
│ | ...
│ ├── JPEGImages
│ ├── 2011_003876.jpg
│ | ...
│ ├── ImageSets
│ | ...
| ...
```
Description of each file:
```
# label_list.txt is the list of class names; the file name must be label_list.txt. For the VOC dataset, this file is not needed when use_default_label is true in the config file
>>cat label_list.txt
aeroplane
bicycle
...
# trainval.txt is the file list of the training set
>>cat trainval.txt
VOCdevkit/VOC2007/JPEGImages/007276.jpg VOCdevkit/VOC2007/Annotations/007276.xml
VOCdevkit/VOC2012/JPEGImages/2011_002612.jpg VOCdevkit/VOC2012/Annotations/2011_002612.xml
...
# test.txt is the file list of the test set
>>cat test.txt
VOCdevkit/VOC2007/JPEGImages/000001.jpg VOCdevkit/VOC2007/Annotations/000001.xml
...
# label_list.txt: the VOC class name list
>>cat label_list.txt
aeroplane
bicycle
...
```
- If the VOC dataset is already downloaded
Simply organize the files according to the structure above.
##### VOC Annotation File Format
In VOC data, each image has an xml file with the same name that records the coordinates, categories and other information of the objects in the image. For example, for image `2007_002055.jpg`:
![](../images/2007_002055.jpg)
The corresponding xml file contains basic information about the image, such as the file name, source, image size, and the object regions and their categories.
The xml file contains the following fields:
- filename, the image name.
- size, the image size, including image width, image height and image depth.
```
<size>
<width>500</width>
<height>375</height>
<depth>3</depth>
</size>
```
- object, one entry per object, including:
| Tag | Description |
| :--------: | :-----------: |
| name | object class name |
| pose | pose description of the object (optional field) |
| truncated | mark as `truncated` if more than 15-20% of the object is occluded and lies outside the bounding box (optional field) |
| difficult | objects that are hard to recognize are marked as `difficult` (optional field) |
| bndbox sub-tag | (xmin,ymin) is the top-left coordinate, (xmax,ymax) is the bottom-right coordinate |
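The fields above can be read with nothing more than the Python standard library. A minimal parsing sketch (the file path is an example):
```python
# Sketch: read object boxes and classes from one VOC xml file.
import xml.etree.ElementTree as ET

root = ET.parse('annotations/2007_002055.xml').getroot()
print(root.findtext('filename'),
      root.findtext('size/width'), root.findtext('size/height'))
for obj in root.iter('object'):
    name = obj.findtext('name')  # object class name
    box = [float(obj.findtext('bndbox/' + k))
           for k in ('xmin', 'ymin', 'xmax', 'ymax')]
    difficult = int(obj.findtext('difficult', default='0'))  # optional field
    print(name, box, difficult)
```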
#### COCO Data
COCO data is the data used in the [COCO](http://cocodataset.org) competition. Likewise, the COCO competition covers several tasks, and its annotation files contain labels for multiple tasks.
For user-defined COCO data, decide for each optional field in the json file whether to label it or use the default value, according to your actual situation.
##### COCO Dataset Download
- Download the COCO dataset automatically via code. The dataset is large, so the download takes a while:
```
# run the script to download the COCO dataset automatically
python dataset/coco/download_coco.py
```
After the script finishes, the COCO dataset is organized as follows:
```
>>cd dataset/coco/
>>tree
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ | ...
| ...
```
- If the COCO dataset is already downloaded
Simply organize the files according to the structure above.
##### COCO Annotation File Format
COCO annotations store the labels of all training images in a single json file, organized as nested dictionaries.
The json file contains the following keys:
- info, the annotation file info.
- licenses, the annotation file licenses.
- images, the list of image records; each element describes one image. For example:
```
{
'license': 3, # license
'file_name': '000000391895.jpg', # file_name
# coco_url
'coco_url': 'http://images.cocodataset.org/train2017/000000391895.jpg',
'height': 360, # image height
'width': 640, # image width
'date_captured': '2013-11-14 11:18:45', # date_captured
# flickr_url
'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg',
'id': 391895 # image id
}
```
- annotations, the list of object annotations; each element describes one object. For example:
```
{
'segmentation': # segmentation annotation of the object
'area': 2765.1486500000005, # area of the object region
'iscrowd': 0, # iscrowd
'image_id': 558840, # image id
'bbox': [199.84, 200.46, 77.71, 70.88], # bbox [x1,y1,w,h]
'category_id': 58, # category_id
'id': 156 # annotation id
}
```
```
# inspect the COCO annotation file
import json
coco_anno = json.load(open('./annotations/instances_train2017.json'))
# coco_anno.keys
print('\nkeys:', coco_anno.keys())
# view category information
print('\ncategories:', coco_anno['categories'])
# number of images
print('\nnumber of images:', len(coco_anno['images']))
# number of annotated objects
print('\nnumber of annotations:', len(coco_anno['annotations']))
# view one object annotation
print('\none object annotation:', coco_anno['annotations'][0])
```
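Building on the snippet above, a quick way to check class balance is to count annotations per category. This sketch reuses the `coco_anno` dictionary already loaded:
```python
# Sketch: per-category annotation counts from the loaded COCO json.
from collections import Counter

id2name = {c['id']: c['name'] for c in coco_anno['categories']}
counts = Counter(ann['category_id'] for ann in coco_anno['annotations'])
for cat_id, n in counts.most_common():
    print(id2name[cat_id], n)
```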
#### Preparing User Data
There are three ways to handle user data:
(1) convert the user data to VOC format (label only the fields required for object detection);
(2) convert the user data to COCO format (label only the fields required for object detection);
(3) write a custom reader for the user data (for complex data that needs a custom reader).
##### Converting User Data to VOC Format
After conversion to VOC format, the user dataset directory is structured as follows (avoid Chinese characters in path and file names to prevent encoding errors):
```
dataset/xxx/
├── annotations
│ ├── xxx1.xml
│ ├── xxx2.xml
│ ├── xxx3.xml
│ | ...
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
├── label_list.txt (must be provided, and the file name must be label_list.txt)
├── train.txt (training set file list, e.g. ./images/xxx1.jpg ./annotations/xxx1.xml)
└── valid.txt (validation set file list)
```
Description of each file:
```
# label_list.txt is the list of class names; the file name must be exactly this
>>cat label_list.txt
classname1
classname2
...
# train.txt is the training set file list
>>cat train.txt
./images/xxx1.jpg ./annotations/xxx1.xml
./images/xxx2.jpg ./annotations/xxx2.xml
...
# valid.txt is the validation set file list
>>cat valid.txt
./images/xxx3.jpg ./annotations/xxx3.xml
...
```
##### Converting User Data to COCO Format
`x2coco.py` in `./tools/` converts VOC datasets, labelme-labeled datasets and Cityscapes datasets to COCO format, for example:
(1) convert labelme data to COCO format:
```bash
python tools/x2coco.py \
--dataset_type labelme \
--json_input_dir ./labelme_annos/ \
--image_input_dir ./labelme_imgs/ \
--output_dir ./cocome/ \
--train_proportion 0.8 \
--val_proportion 0.2 \
--test_proportion 0.0
```
(2) convert VOC data to COCO format:
```bash
python tools/x2coco.py \
--dataset_type voc \
--voc_anno_dir path/to/VOCdevkit/VOC2007/Annotations/ \
--voc_anno_list path/to/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt \
--voc_label_list dataset/voc/label_list.txt \
--voc_out_name voc_train.json
```
After conversion to COCO format, the user dataset directory is structured as follows (avoid Chinese characters in path and file names to prevent encoding errors):
```
dataset/xxx/
├── annotations
│ ├── train.json # COCO-format annotation file
│ ├── valid.json # COCO-format annotation file
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
...
```
##### Custom Reader for User Data
If a new data source needs to be added to PaddleDetection, refer to the [add new data source](../advanced_tutorials/READER.md#2.3自定义数据集) section of the data processing documentation and implement the corresponding code. For a detailed walkthrough of the data processing code, see the [data processing documentation](../advanced_tutorials/READER.md).
#### User Data Example
Take the [Kaggle dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) competition data as an example of how to prepare custom data.
The Kaggle [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-detection) competition data contains 877 images in four classes: crosswalk, speedlimit, stop, trafficlight.
It can be downloaded from Kaggle or from this [download link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
Sample image from the road sign dataset:
![](../images/road554.png)
```
# download and unpack the data
>>cd ${ppdet_root}/dataset
# download the Kaggle dataset and unpack it; the current file organization is as follows
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ ├── road10.xml
│ | ...
├── images
│ ├── road0.jpg
│ ├── road1.jpg
│ ├── road2.jpg
│ | ...
```
#### Data Format Conversion
Split the data into a training set and a test set:
```
# generate the label_list.txt file
>>echo -e "speedlimit\ncrosswalk\ntrafficlight\nstop" > label_list.txt
# generate the train.txt, valid.txt and test.txt list files
>>ls images/*.png | shuf > all_image_list.txt
>>awk -F"/" '{print $2}' all_image_list.txt | awk -F".png" '{print $1}' | awk -F"\t" '{print "images/"$1".png annotations/"$1".xml"}' > all_list.txt
# the training, validation and test sets account for roughly 80%, 10% and 10% respectively
>>head -n 88 all_list.txt > test.txt
>>head -n 176 all_list.txt | tail -n 88 > valid.txt
>>tail -n 701 all_list.txt > train.txt
# remove the intermediate files
>>rm -rf all_image_list.txt all_list.txt
The final dataset file organization is:
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ ├── road10.xml
│ | ...
├── images
│ ├── road0.jpg
│ ├── road1.jpg
│ ├── road2.jpg
│ | ...
├── label_list.txt
├── test.txt
├── train.txt
└── valid.txt
# label_list.txt is the list of class names; the file name must be label_list.txt
>>cat label_list.txt
crosswalk
speedlimit
stop
trafficlight
# train.txt is the training set file list; each line is an image path and the corresponding annotation file path, separated by a space. Note that the paths are relative to the dataset folder.
>>cat train.txt
./images/road839.png ./annotations/road839.xml
./images/road363.png ./annotations/road363.xml
...
# valid.txt is the validation set file list; each line is an image path and the corresponding annotation file path, separated by a space. Note that the paths are relative to the dataset folder.
>>cat valid.txt
./images/road218.png ./annotations/road218.xml
./images/road681.png ./annotations/road681.xml
```
You can also download the [prepared data](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar) and unpack it into the `dataset/roadsign_voc/` folder.
Once the data is ready, it is good practice to get a feel for it: the number of images, image sizes, the number of object regions per class, object region sizes, and so on. Clean the data if necessary.
Statistics of the roadsign dataset:
| Split | Number of images |
| :--------: | :-----------: |
| train | 701 |
| valid | 176 |
**Notes:**
(1) For user data, check the data carefully before training to avoid crashes caused by malformed annotations or incomplete image files; a minimal check is sketched below.
(2) If the images are very large and the input size is not limited, memory usage will be high and may cause memory/GPU memory overflow. Set `batch_size` to a reasonable value, starting small and increasing gradually.
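As a concrete version of note (1), the sketch below walks a VOC-style list file and flags samples whose image cannot be decoded or whose xml cannot be parsed. It assumes the list format described above and requires Pillow:
```python
# Sketch: pre-training sanity check for a VOC-style file list.
import xml.etree.ElementTree as ET
from PIL import Image

with open('train.txt') as f:
    pairs = [line.split() for line in f if line.strip()]
for img_path, xml_path in pairs:
    try:
        with Image.open(img_path) as im:
            im.verify()  # cheap integrity check without a full decode
        ET.parse(xml_path)
    except Exception as e:
        print('bad sample:', img_path, xml_path, e)
```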
#### Training on Custom Data
Once the data is ready, modify the dataset configuration file under the `configs/datasets` folder of PaddleDetection. For example, the configuration file of the roadsign dataset is:
```
metric: VOC # currently COCO, VOC, WiderFace and other evaluation metrics are supported
num_classes: 4 # number of classes in the dataset, excluding the background class; 4 for the roadsign dataset, change it for your own data
TrainDataset:
  !VOCDataSet
    dataset_dir: dataset/roadsign_voc # dataset directory, relative to the PaddleDetection root
    anno_path: train.txt # training set list file, relative to dataset_dir
    label_list: label_list.txt # label file, relative to dataset_dir
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult'] # fields contained in each sample output by the dataset; note this field is specific to the training reader and must be configured
EvalDataset:
  !VOCDataSet
    dataset_dir: dataset/roadsign_voc # dataset directory, relative to the PaddleDetection root
    anno_path: valid.txt # validation set list file, relative to dataset_dir
    label_list: label_list.txt # label file, relative to dataset_dir
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
TestDataset:
  !ImageFolder
    anno_path: label_list.txt # annotation file path, used only to read the class list of the dataset; json and txt formats are supported
    dataset_dir: dataset/roadsign_voc # dataset directory; if set, anno_path is relative to dataset_dir, otherwise anno_path is relative to the PaddleDetection root
```
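A mismatch between `num_classes` and `label_list.txt` is a common source of training errors, so it is worth verifying before launching. A minimal check, using the roadsign paths above:
```python
# Sketch: confirm num_classes in the dataset config matches label_list.txt.
with open('dataset/roadsign_voc/label_list.txt') as f:
    labels = [line.strip() for line in f if line.strip()]
print(len(labels), labels)  # expect 4 classes for the roadsign dataset
```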
Then, in the corresponding model configuration file, replace the dataset path with the new custom dataset path. Take `configs/yolov3/yolov3_mobilenet_v1_roadsign.yml` as an example:
```
_BASE_: [
  '../datasets/roadsign_voc.yml', # points at the custom dataset configuration
'../runtime.yml',
'_base_/optimizer_40e.yml',
'_base_/yolov3_mobilenet_v1.yml',
'_base_/yolov3_reader.yml',
]
pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams
weights: output/yolov3_mobilenet_v1_roadsign/model_final
YOLOv3Loss:
ignore_thresh: 0.7
label_smooth: true
```
In PaddleDetection yml configuration files, `!` serializes a module instance directly (a function, an instance, etc.); the configuration above serializes Dataset instances this way.
Once the configuration is updated, training and evaluation can be launched as follows:
```
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval
```
For more detailed commands, see [Getting started with PaddleDetection in 30 minutes](../GETTING_STARTED_cn.md).
**Note:**
Check the dataset paths in the configuration carefully before running. If the TrainDataset or EvalDataset path is wrong during training or evaluation, the dataset will be downloaded automatically; if a custom dataset is used and the TestDataset path is wrong during inference, the class list of the default COCO dataset will be used instead.
### (Optional) Generating Anchors
For YOLO-family models, the default anchors work well in most cases. You can also run `tools/anchor_cluster.py` to obtain anchors tailored to your dataset, as follows:
``` bash
python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000
```
The main arguments currently supported by `tools/anchor_cluster.py` are listed below:
| Argument | Purpose | Default | Notes |
|:------:|:------:|:------:|:------:|
| -c/--config | model configuration file | none | required |
| -n/--n | number of clusters | 9 | the number of anchors |
| -s/--size | input image size | None | if set, the given size is used; otherwise the image size is read from the configuration file |
| -m/--method | anchor clustering method | v2 | currently only the YOLOv2 clustering algorithm is supported |
| -i/--iters | number of k-means iterations | 1000 | k-means stops on convergence or after the given number of iterations |
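For reference, the v2 method corresponds to YOLOv2-style clustering: k-means on the ground-truth box widths and heights with `1 - IoU` as the distance, comparing boxes as if they shared a top-left corner. The sketch below illustrates the idea only and is not the implementation in `tools/anchor_cluster.py`:
```python
# Sketch of YOLOv2-style anchor clustering (illustrative, simplified).
import numpy as np

def wh_iou(whs, anchors):
    """IoU between (N,2) box sizes and (K,2) anchor sizes, corners aligned."""
    inter = np.minimum(whs[:, None, :], anchors[None, :, :]).prod(2)
    return inter / (whs.prod(1)[:, None] + anchors.prod(1)[None, :] - inter)

def kmeans_anchors(whs, k=9, iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    anchors = whs[rng.choice(len(whs), k, replace=False)]
    for _ in range(iters):
        assign = wh_iou(whs, anchors).argmax(1)  # nearest anchor = highest IoU
        new = np.array([whs[assign == i].mean(0) if (assign == i).any()
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):  # converged
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(1))]  # sort anchors by area

# whs: an (N, 2) float array of ground-truth box sizes, e.g. scaled to 608x608.
```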
# How to Prepare Training Data
## Directory
- [How to Prepare Training Data](#how-to-prepare-training-data)
- [Directory](#directory)
- [Description of Object Detection Data](#description-of-object-detection-data)
- [Prepare Training Data](#prepare-training-data)
- [VOC Data](#voc-data)
- [VOC Dataset Download](#voc-dataset-download)
- [Introduction to VOC Data Annotation File](#introduction-to-voc-data-annotation-file)
- [COCO Data](#coco-data)
- [COCO Data Download](#coco-data-download)
- [Description of COCO Data Annotation](#description-of-coco-data-annotation)
- [User Data](#user-data)
- [Convert User Data to VOC Data](#convert-user-data-to-voc-data)
- [Convert User Data to COCO Data](#convert-user-data-to-coco-data)
- [Custom Reader for User Data](#custom-reader-for-user-data)
- [Example of User Data Conversion](#example-of-user-data-conversion)
### Description of Object Detection Data
The data for object detection is more complex than for classification: in each image, the position and category of every object must be labeled.
An object's position is usually represented by a rectangular box, expressed in one of the following three ways:
| Expression | Explanation |
| :---------: | :----------------------------------------------------------------------------: |
| x1,y1,x2,y2 | (x1,y1) is the top left coordinate, (x2,y2) is the bottom right coordinate |
| x1,y1,w,h | (x1,y1) is the top left coordinate, w is the width of the object, h is the height of the object |
| xc,yc,w,h | (xc,yc) is the center of the object, w is the width of the object, h is the height of the object |
Common object detection datasets such as Pascal VOC adopt `[x1,y1,x2,y2]` to express an object's bounding box, while COCO uses `[x1,y1,w,h]` ([format](https://cocodataset.org/#format-data)).
### Prepare Training Data
PaddleDetection supports the [COCO](http://cocodataset.org), [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) and [WIDER-FACE](http://shuoyang1213.me/WIDERFACE/) datasets by default.
It also supports custom data sources, including:
(1) converting custom data to VOC format;
(2) converting custom data to COCO format;
(3) customizing a new data source and adding a custom reader.
First, enter the `PaddleDetection` root directory
```
cd PaddleDetection/
ppdet_root=$(pwd)
```
#### VOC Data
VOC data is used in the [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) competition. The Pascal VOC competition contains not only an image classification task but also object detection, object segmentation and other tasks, so its annotation files contain the ground truth of multiple tasks.
For user-defined VOC data, decide for each optional field in the xml files whether to label it or use the default value, according to the actual situation.
##### VOC Dataset Download
- Download the VOC dataset automatically via code. The dataset is large and takes a long time to download
```
# Execute code to automatically download VOC dataset
python dataset/voc/download_voc.py
```
After the code finishes, the VOC dataset file organization is:
```
>>cd dataset/voc/
>>tree
├── create_list.py
├── download_voc.py
├── generic_det_label_list.txt
├── generic_det_label_list_zh.txt
├── label_list.txt
├── VOCdevkit/VOC2007
│ ├── annotations
│ ├── 001789.xml
│ | ...
│ ├── JPEGImages
│ ├── 001789.jpg
│ | ...
│ ├── ImageSets
│ | ...
├── VOCdevkit/VOC2012
│ ├── Annotations
│ ├── 2011_003876.xml
│ | ...
│ ├── JPEGImages
│ ├── 2011_003876.jpg
│ | ...
│ ├── ImageSets
│ | ...
| ...
```
Description of each file
```
# label_list.txt is the list of class names; the file name must be label_list.txt. For the VOC dataset, this file is not required when `use_default_label=true` in the config file.
>>cat label_list.txt
aeroplane
bicycle
...
# trainval.txt is the file list of the training set
>>cat trainval.txt
VOCdevkit/VOC2007/JPEGImages/007276.jpg VOCdevkit/VOC2007/Annotations/007276.xml
VOCdevkit/VOC2012/JPEGImages/2011_002612.jpg VOCdevkit/VOC2012/Annotations/2011_002612.xml
...
# test.txt is the file list of the test set
>>cat test.txt
VOCdevkit/VOC2007/JPEGImages/000001.jpg VOCdevkit/VOC2007/Annotations/000001.xml
...
# label_list.txt: the VOC class name list
>>cat label_list.txt
aeroplane
bicycle
...
```
- If the VOC dataset has been downloaded
Simply organize the files according to the structure above.
##### Introduction to VOC Data Annotation File
In the VOC dataset, each image file corresponds to an xml file with the same name that records the coordinates, categories and other information of the labeled object boxes, e.g. `2007_002055.jpg`:
![](../images/2007_002055.jpg)
The xml file contains basic information about the image, such as its file name, source, size, and the object regions and categories it contains.
The XML file contains the following fields:
- filename, indicating the image name.
- size, indicating the image size, including: image width, image height and image depth
```
<size>
<width>500</width>
<height>375</height>
<depth>3</depth>
</size>
```
- object field, one entry per object, including:
| Label | Explanation |
| :--------------: | :------------------------------------------------------------------------------------------------------------------------: |
| name | name of the object class |
| pose | pose description of the target object (optional field) |
| truncated | if more than 15-20% of the object is occluded and lies outside the bounding box, mark it as `truncated` (optional field) |
| difficult | objects that are difficult to recognize are marked as `difficult` (optional field) |
| bndbox sub-tag | (xmin,ymin) is the top left coordinate, (xmax,ymax) is the bottom right coordinate |
#### COCO Data
COCO data is used in the [COCO](http://cocodataset.org) competition. Likewise, the COCO competition contains multiple tasks, and its annotation file contains the annotation contents of multiple tasks.
For user-defined COCO data, decide for each optional field in the JSON file whether to label it or use the default value, according to the actual situation.
##### COCO Data Download
- Download the COCO dataset automatically via code. The dataset is large and takes a long time to download
```
# automatically download coco datasets by executing code
python dataset/coco/download_coco.py
```
After the code finishes, the organization of the COCO dataset files is:
```
>>cd dataset/coco/
>>tree
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ | ...
| ...
```
- If the COCO dataset has been downloaded
The files can be organized according to the above data file organization structure.
##### Description of COCO Data Annotation
COCO annotation stores the annotations of all training images in a single JSON file, organized as nested dictionaries.
The JSON file contains the following keys:
- info, indicating the annotation file info.
- licenses, indicating the annotation file licenses.
- images, indicating the list of image information in the annotation file; each element is the information of one image. The following is the information of one of the images:
```
{
'license': 3, # license
'file_name': '000000391895.jpg', # file_name
# coco_url
'coco_url': 'http://images.cocodataset.org/train2017/000000391895.jpg',
'height': 360, # image height
'width': 640, # image width
'date_captured': '2013-11-14 11:18:45', # date_captured
# flickr_url
'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg',
'id': 391895 # image id
}
```
- annotations: indicating the annotation information list of the target object in the annotation file. Each element is the annotation information of a target object. The following is the annotation information of one of the target objects:
```
{
'segmentation': # object segmentation annotation
'area': 2765.1486500000005, # object area
'iscrowd': 0, # iscrowd
'image_id': 558840, # image id
'bbox': [199.84, 200.46, 77.71, 70.88], # bbox [x1,y1,w,h]
'category_id': 58, # category_id
'id': 156 # annotation id
}
```
```
# Viewing coco annotation files
import json
coco_anno = json.load(open('./annotations/instances_train2017.json'))
# coco_anno.keys
print('\nkeys:', coco_anno.keys())
# Viewing categories information
print('\ncategories:', coco_anno['categories'])
# Viewing the number of images
print('\nthe number of images:', len(coco_anno['images']))
# Viewing the number of objects
print('\nthe number of annotation:', len(coco_anno['annotations']))
# View object annotation information
print('\nobject annotation information: ', coco_anno['annotations'][0])
```
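Continuing from the loaded `coco_anno`, the records can be joined through `image_id` and `category_id` to print human-readable annotations for a single image:
```python
# Sketch: list the annotations of one image with readable class names.
img = coco_anno['images'][0]
id2name = {c['id']: c['name'] for c in coco_anno['categories']}
for a in coco_anno['annotations']:
    if a['image_id'] == img['id']:
        print(img['file_name'], id2name[a['category_id']], a['bbox'])  # bbox is [x1,y1,w,h]
```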
The COCO data is prepared as follows.
The initial organization of `dataset/coco/` is:
```
>>cd dataset/coco/
>>tree
├── download_coco.py
```
#### User Data
There are three processing methods for user data:
(1) Convert user data into VOC data (only include labels necessary for object detection as required)
(2) Convert user data into COCO data (only include labels necessary for object detection as required)
(3) Customize a reader for user data (for complex data, you need to customize the reader)
##### Convert User Data to VOC Data
After the user dataset is converted to VOC data, the directory structure is as follows (note: avoid Chinese characters in path and file names to prevent errors caused by encoding problems):
```
dataset/xxx/
├── annotations
│ ├── xxx1.xml
│ ├── xxx2.xml
│ ├── xxx3.xml
│ | ...
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
├── label_list.txt (Must be provided and the file name must be label_list.txt )
├── train.txt (list of trainset ./images/xxx1.jpg ./annotations/xxx1.xml)
└── valid.txt (validation set file list)
```
Description of each file
```
# label_list.txt is a list of category names. The file name must be this
>>cat label_list.txt
classname1
classname2
...
# train.txt is the file list of the training set
>>cat train.txt
./images/xxx1.jpg ./annotations/xxx1.xml
./images/xxx2.jpg ./annotations/xxx2.xml
...
# valid.txt is the file list of the validation set
>>cat valid.txt
./images/xxx3.jpg ./annotations/xxx3.xml
...
```
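If the images and xml files follow the layout above, the two list files can be generated with a short script instead of by hand. A sketch with a random 90/10 split (directory names follow the example layout):
```python
# Sketch: build train.txt / valid.txt from paired image and xml files.
import glob, os, random

pairs = []
for xml_path in sorted(glob.glob('annotations/*.xml')):
    stem = os.path.splitext(os.path.basename(xml_path))[0]
    pairs.append(f'./images/{stem}.jpg ./annotations/{stem}.xml')
random.seed(0)
random.shuffle(pairs)
n_valid = max(1, len(pairs) // 10)  # hold out ~10% for validation
with open('valid.txt', 'w') as f:
    f.write('\n'.join(pairs[:n_valid]) + '\n')
with open('train.txt', 'w') as f:
    f.write('\n'.join(pairs[n_valid:]) + '\n')
```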
##### Convert User Data to COCO Data
`x2coco.py` is provided in `./tools/` to convert VOC datasets, labelme-labeled datasets or Cityscapes datasets into COCO data, for example:
(1) Convert labelme data to COCO data:
```bash
python tools/x2coco.py \
--dataset_type labelme \
--json_input_dir ./labelme_annos/ \
--image_input_dir ./labelme_imgs/ \
--output_dir ./cocome/ \
--train_proportion 0.8 \
--val_proportion 0.2 \
--test_proportion 0.0
```
(2) Convert VOC data to COCO data:
```bash
python tools/x2coco.py \
--dataset_type voc \
--voc_anno_dir path/to/VOCdevkit/VOC2007/Annotations/ \
--voc_anno_list path/to/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt \
--voc_label_list dataset/voc/label_list.txt \
--voc_out_name voc_train.json
```
After the user dataset is converted to COCO data, the directory structure is as follows (note: avoid Chinese characters in path and file names to prevent errors caused by encoding problems):
```
dataset/xxx/
├── annotations
│ ├── train.json # Annotation file of coco data
│ ├── valid.json # Annotation file of coco data
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
...
```
##### Custom Reader for User Data
If a new data source needs to be added to PaddleDetection, you can refer to the [add new data source](../advanced_tutorials/READER.md#2.3_Customizing_Dataset) section of the data processing documentation to develop the corresponding code and support the new data source. For a detailed code walkthrough, see the [data processing documentation](../advanced_tutorials/READER.md).
The configuration file for the Dataset exists in the `configs/datasets` folder. For example, the COCO dataset configuration file is as follows:
```
metric: COCO # currently COCO, VOC, OID, WiderFace and other evaluation standards are supported
num_classes: 80 # number of classes in the dataset, excluding the background class
TrainDataset:
  !COCODataSet
    image_dir: train2017 # path of the training set images, relative to dataset_dir
    anno_path: annotations/instances_train2017.json # path of the training set annotation file, relative to dataset_dir
    dataset_dir: dataset/coco # path of the dataset, relative to the PaddleDetection path
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] # controls the fields contained in the samples output by the dataset; note data_fields is specific to the TrainReader and must be configured
EvalDataset:
  !COCODataSet
    image_dir: val2017 # path of the validation set images, relative to dataset_dir
    anno_path: annotations/instances_val2017.json # path of the validation set annotation file, relative to dataset_dir
    dataset_dir: dataset/coco # path of the dataset, relative to the PaddleDetection path
TestDataset:
  !ImageFolder
    anno_path: dataset/coco/annotations/instances_val2017.json # path of the annotation file, used only to read the category information of the dataset; JSON and TXT formats are supported
    dataset_dir: dataset/coco # path of the dataset; if this row is added, `anno_path` is relative to `dataset_dir`, otherwise `anno_path` is used as-is
```
In PaddleDetection yml configuration files, `!` directly serializes a module instance (a function, an instance, etc.). The above configuration files all serialize Dataset instances.
**Note:**
Please carefully check the configuration path of the dataset before running. During training or verification, if the path of TrainDataset or EvalDataset is wrong, it will download the dataset automatically. When using a user-defined dataset, if the TestDataset path is incorrectly configured during inference, the category of the default COCO dataset will be used.
#### Example of User Data Conversion
Take the [Kaggle Dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) competition data as an example to illustrate how to prepare custom data. The dataset of the Kaggle [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-detection) competition contains 877 images in four categories: crosswalk, speedlimit, stop, trafficlight. It is available for download from Kaggle and also from this [link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
Example diagram of road sign dataset:
![](../images/road554.png)
```
# Download and unpack the data
>>cd ${ppdet_root}/dataset
# Download and unzip the Kaggle dataset. The current file organization is as follows
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ ├── road10.xml
│ | ...
├── images
│ ├── road0.jpg
│ ├── road1.jpg
│ ├── road2.jpg
│ | ...
```
The data is divided into a training set and a test set
```
# Generating label_list.txt
>>echo -e "speedlimit\ncrosswalk\ntrafficlight\nstop" > label_list.txt
# Generating train.txt, valid.txt and test.txt
>>ls images/*.png | shuf > all_image_list.txt
>>awk -F"/" '{print $2}' all_image_list.txt | awk -F".png" '{print $1}' | awk -F"\t" '{print "images/"$1".png annotations/"$1".xml"}' > all_list.txt
# The proportion of the training, validation and test sets is about 80%, 10% and 10% respectively.
>>head -n 88 all_list.txt > test.txt
>>head -n 176 all_list.txt | tail -n 88 > valid.txt
>>tail -n 701 all_list.txt > train.txt
# Deleting unused files
>>rm -rf all_image_list.txt all_list.txt
The organization structure of the final dataset file is:
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ ├── road10.xml
│ | ...
├── images
│ ├── road0.jpg
│ ├── road1.jpg
│ ├── road2.jpg
│ | ...
├── label_list.txt
├── test.txt
├── train.txt
└── valid.txt
# label_list.txt is the list of class names; the file name must be label_list.txt
>>cat label_list.txt
crosswalk
speedlimit
stop
trafficlight
# train.txt is the list of training dataset files, and each line is an image path and the corresponding annotation file path, separated by spaces. Note that the path here is a relative path within the dataset folder.
>>cat train.txt
./images/road839.png ./annotations/road839.xml
./images/road363.png ./annotations/road363.xml
...
# valid.txt is the list of validation dataset files. Each line is an image path and the corresponding annotation file path, separated by spaces. Note that the path here is a relative path within the dataset folder.
>>cat valid.txt
./images/road218.png ./annotations/road218.xml
./images/road681.png ./annotations/road681.xml
```
You can also download [the prepared data](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar) and unzip it to `dataset/roadsign_voc/`.
After preparing the data, it is generally good to inspect it: the number of images, image sizes, the number of target regions per class, target region sizes, etc. If necessary, clean the data.
Roadsign dataset statistics:
| data | number of images |
| :---: | :--------------: |
| train | 701 |
| valid | 176 |
**Explanation:**
(1) For user data, it is recommended to carefully check the data before training to avoid crashes during training due to wrong annotation formats or incomplete image data
(2) If the image size is too large and the read size is not limited, memory usage will be high and may cause memory / video memory overflow. Please set `batch_size` reasonably; you can try from small to large
# Data Preparation
Data plays a crucial role in deep learning development, and the quality of data collection and annotation is an important factor in improving a production model. This document describes how to prepare data in PaddleDetection: how to collect high-quality data that covers diverse scenario types to improve model generalization, and the annotation tools and methods for each task type and how to use the results in PaddleDetection.
## Data Collection
In real deployments of deep learning tasks, data collection often determines the quality of the final model. A few suggestions on data collection:
### Determine the Direction
The task type, the data categories and the target scenario determine what data to collect; first fix the overall direction of data collection based on these factors.
### Open-source Datasets
In real scenarios, data collection is quite expensive; collecting everything yourself carries a high cost in both time and money. Open-source datasets are an important means of increasing the amount of training data, so open-source data from similar tasks is often added. When using such data, comply with the usage conditions specified in each dataset's license.
### Add Scenario Data
Open-source data generally does not cover the actual target scenario. Assess the gap between the scenarios covered by the open-source datasets and the target scenario, supplement target-scenario data accordingly, and keep the training and deployment scenarios as consistent as possible.
### Class Balance
During collection, also try to keep the classes balanced, which helps the model learn the target features correctly.
## Data Annotation and Format
| Task type | Data annotation | Data format |
|:--------:| :--------:|:--------:|
| Object detection | [doc](DetAnnoTools.md) | [doc](PrepareDetDataSet.md) |
| Keypoint detection | [doc](KeyPointAnnoTools.md) | [doc](PrepareKeypointDataSet.md) |
| Multi-object tracking | [doc](MOTAnnoTools.md) | [doc](PrepareMOTDataSet.md) |
# Logging
This document talks about how to track metrics and visualize model performance during training. The library currently supports [VisualDL](https://www.paddlepaddle.org.cn/documentation/docs/en/guides/03_VisualDL/visualdl_usage_en.html) and [Weights & Biases](https://docs.wandb.ai).
## VisualDL
Logging to VisualDL is supported only for Python >= 3.5. To install VisualDL
```
pip install visualdl
```
PaddleDetection uses a callback to log the training metrics at the end of every step and metrics from the validation step at the end of every epoch. To use VisualDL for visualization, add the `--use_vdl` flag to the training command and `--vdl_log_dir <logs>` to set the directory which stores the records.
For example
```
python tools/train.py -c config.yml --use_vdl --vdl_log_dir ./logs
```
Another possible way to do this is to add the aforementioned flags to the `config.yml` file.
## Weights & Biases
W&B is an MLOps tool that can be used for experiment tracking, dataset/model versioning, visualizing results and collaborating with colleagues. A W&B logger is integrated directly into PaddleDetection; to use it, first install the wandb sdk and log in to your wandb account.
```
pip install wandb
wandb login
```
To log metrics with wandb while training, add the `--use_wandb` flag to the training command; any other arguments for the W&B logger can be provided like this -
```
python tools/train.py -c config.yml --use_wandb -o wandb-project=MyDetector wandb-entity=MyTeam wandb-save_dir=./logs
```
The arguments to the W&B logger must be preceded by `-o` and each individual argument must contain the prefix "wandb-".
If this is too tedious, an alternative way is to add the arguments to the `config.yml` file under the `wandb` header. For example
```
use_wandb: True
wandb:
project: MyProject
entity: MyTeam
save_dir: ./logs
```
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import (core, data, engine, modeling, model_zoo, optimizer, metrics,
utils, slim)
try:
from .version import full_version as __version__
from .version import commit as __git_commit__
except ImportError:
import sys
sys.stderr.write("Warning: import ppdet from source directory " \
"without installing, run 'python setup.py install' to " \
"install ppdet firstly\n")
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import config