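# YOLOV8 Detector

YOLOV8 is one of the more widely used detection algorithms in industry, and pretrained models are officially available in several sizes. This document describes how to build YOLOV8 inference on MIGraphX, covering both static inference and dynamic-shape inference. The same inference flow also applies to the other YOLOV8 model variants.

## Model Overview

YOLOV8 is a single-stage object detection algorithm. It builds on YOLOV5 and adds several improvements that raise both speed and accuracy:

- Backbone and neck: these appear to borrow from the ELAN design in YOLOv7. The C3 block of YOLOv5 is replaced by the C2f block, which has richer gradient flow, and the channel counts are adjusted per model scale. The structure is tuned for each model size rather than applying one set of parameters to every model, which improves model performance.
- Head: the head changes considerably compared with YOLOv5. It uses the now-mainstream decoupled head, which separates the classification and box-regression branches, and it moves from Anchor-Based to Anchor-Free.
- Loss: the TaskAlignedAssigner strategy is used for positive-sample assignment, and Distribution Focal Loss is introduced.
- Data augmentation: training adopts the YOLOX practice of disabling Mosaic augmentation for the last 10 epochs, which improves accuracy.

The network structure is shown in the figure.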

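## Detector Parameter Settings

In the samples project, the DetectorYOLOV8 node in Resource/Configuration.xml holds the parameters of the YOLOV8 detector. The values mainly follow the official inference example. The parameters are:

- ModelPathDynamic: path to the YOLOV8 dynamic-shape model
- ModelPathStatic: path to the YOLOV8 static model
- ClassNameFile: path to the COCO class-name file
- UseFP16: whether to run inference in FP16 mode
- NumberOfClasses: number of detection classes
- ConfidenceThreshold: confidence threshold; a proposal counts as a positive sample only if its score exceeds it
- NMSThreshold: non-maximum suppression threshold, used to remove duplicate boxes

```xml
<DetectorYOLOV8>
    <ModelPathDynamic>"../Resource/Models/yolov8n_dynamic.onnx"</ModelPathDynamic>
    <ModelPathStatic>"../Resource/Models/yolov8n_static.onnx"</ModelPathStatic>
    <ClassNameFile>"../Resource/Models/coco.names"</ClassNameFile>
    <UseFP16>0</UseFP16>
    <NumberOfClasses>80</NumberOfClasses>
    <ConfidenceThreshold>0.5</ConfidenceThreshold>
    <NMSThreshold>0.5</NMSThreshold>
</DetectorYOLOV8>
```

## Model Initialization

Model initialization first loads the YOLOV8 ONNX model with the parse_onnx() function.

- Static inference: call parse_onnx to parse the static model

```cpp
ErrorCode DetectorYOLOV8::Initialize(InitializationParameterOfDetector initializationParameterOfDetector, bool dynamic)
{
    ...

    // Load the model
    net = migraphx::parse_onnx(modelPath);
    LOG_INFO(stdout, "succeed to load model: %s\n", GetFileName(modelPath).c_str());

    ...
}
```

- Dynamic-shape inference: the maximum input shape of the model must be set; this example uses {1,3,1024,1024}

```cpp
ErrorCode DetectorYOLOV8::Initialize(InitializationParameterOfDetector initializationParameterOfDetector, bool dynamic)
{
    ...

    // Set the maximum shape of the "images" input, then parse the model
    migraphx::onnx_options onnx_options;
    onnx_options.map_input_dims["images"] = {1, 3, 1024, 1024};
    net = migraphx::parse_onnx(modelPath, onnx_options);

    ...
}
```

## Preprocessing

Before the data is fed to the model, the image goes through the following preprocessing steps:

- Convert the data layout to NCHW
- Normalize to [0.0, 1.0]
- Resize the input: static inference fixes the input size to relInputShape=[1,3,640,640], while dynamic inference resizes the input image to the configured dynamic size.

```cpp
ErrorCode DetectorYOLOV8::Detect(const cv::Mat &srcImage, std::vector<std::size_t> &relInputShape, std::vector<ResultOfDetection> &resultsOfDetection, bool dynamic)
{
    ...

    // Preprocess the data and convert it to NCHW layout
    // (scale by 1/255, resize to inputSize, swapRB=true, crop=false)
    inputSize = cv::Size(relInputShape[3], relInputShape[2]);
    cv::Mat inputBlob;
    cv::dnn::blobFromImage(srcImage, inputBlob, 1 / 255.0, inputSize, cv::Scalar(0, 0, 0), true, false);

    ...
}
```

## Inference

MIGraphX inference produces the output of the YOLOV8 model. For static inference, the shape of the input data inputData is the fixed input size of the model; for dynamic inference, it is the actual input size.

```cpp
ErrorCode DetectorYOLOV8::Detect(const cv::Mat &srcImage, std::vector<std::size_t> &relInputShape, std::vector<ResultOfDetection> &resultsOfDetection, bool dynamic)
{
    ...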
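    // Create the input data
    migraphx::parameter_map inputData;
    if(dynamic)
    {
        // Dynamic inference: use the actual input shape
        inputData[inputName] = migraphx::argument{migraphx::shape(inputShape.type(), relInputShape), (float*)inputBlob.data};
    }
    else
    {
        // Static inference: use the fixed input shape of the model
        inputData[inputName] = migraphx::argument{inputShape, (float*)inputBlob.data};
    }

    // Inference
    std::vector<migraphx::argument> inferenceResults = net.eval(inputData);

    ...
}
```

The MIGraphX inference result inferenceResults is of type std::vector<migraphx::argument>. The YOLOV8 ONNX model has a single output, so result is inferenceResults[0]. It has three dimensions:

- outputShape.lens()[0]=1 is the batch size.
- outputShape.lens()[1]=84 is the prediction for each proposal. The 84 values split into 4+80: the first 4 are the box-regression parameters of each feature point, which yield the predicted box after adjustment, and the last 80 are the class scores of the object at that feature point.
- outputShape.lens()[2]=8400 is the number of generated proposals.

To request specific output nodes, pass an outputNames argument to eval():

```cpp
...
// Inference
std::vector<std::string> outputNames = {"output0"};
std::vector<migraphx::argument> inferenceResults = net.eval(inputData, outputNames);
...
```

If outputNames is not specified, all output nodes are returned, in the same order as the output nodes in the ONNX file. The order of the ONNX output nodes can be checked with Netron.

With this information, the proposals are filtered against the confidenceThreshold. The maximum confidence score of a proposal, maxClassScore, is the highest of its 80 class scores. When maxClassScore is greater than confidenceThreshold, the coordinates and predicted class of the proposal are extracted; otherwise the proposal is skipped.

```cpp
ErrorCode DetectorYOLOV8::Detect(const cv::Mat &srcImage, std::vector<std::size_t> &relInputShape, std::vector<ResultOfDetection> &resultsOfDetection, bool dynamic)
{
    ...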
    // Number of proposals and number of values per proposal
    int numProposal = outs[0].size[2];
    int numOut = outs[0].size[1];

    // Reshape the output to numOut x numProposal, then transpose to numProposal x numOut
    outs[0] = outs[0].reshape(1, numOut);
    cv::transpose(outs[0], outs[0]);
    float *data = (float *)outs[0].data;

    // Collect the candidate boxes
    std::vector<float> confidences;
    std::vector<cv::Rect> boxes;
    std::vector<int> classIds;
    float ratioh = (float)srcImage.rows / inputSize.height, ratiow = (float)srcImage.cols / inputSize.width;

    // Compute x, y, w, h
    for (int n = 0; n < numProposal; n++)
    {
        // The class scores follow the 4 box parameters
        float *classes_scores = data + 4;
        cv::Mat scores(1, classNames.size(), CV_32FC1, classes_scores);
        cv::Point class_id;
        double maxClassScore;
        cv::minMaxLoc(scores, 0, &maxClassScore, 0, &class_id);
        if (maxClassScore > yolov8Parameter.confidenceThreshold)
        {
            confidences.push_back(maxClassScore);
            classIds.push_back(class_id.x);

            // (x, y) is the box center; map the box back to the source image
            float x = data[0];
            float y = data[1];
            float w = data[2];
            float h = data[3];
            int left = int((x - 0.5 * w) * ratiow);
            int top = int((y - 0.5 * h) * ratioh);
            int width = int(w * ratiow);
            int height = int(h * ratioh);
            boxes.push_back(cv::Rect(left, top, width, height));
        }
        data += numOut;
    }

    ...
}
```

To remove overlapping boxes and produce the final YOLOV8 detection results, non-maximum suppression is applied to the filtered proposals, and the detections are then saved to resultsOfDetection.

```cpp
ErrorCode DetectorYOLOV8::Detect(const cv::Mat &srcImage, std::vector<std::size_t> &relInputShape, std::vector<ResultOfDetection> &resultsOfDetection, bool dynamic)
{
    ...

    // Run non-maximum suppression to remove redundant overlapping boxes
    std::vector<int> indices;
    cv::dnn::NMSBoxes(boxes, confidences, yolov8Parameter.confidenceThreshold, yolov8Parameter.nmsThreshold, indices);
    for (size_t i = 0; i < indices.size(); ++i)
    {
        int idx = indices[i];
        int classID = classIds[idx];
        std::string className = classNames[classID];
        float confidence = confidences[idx];
        cv::Rect box = boxes[idx];

        // Save the coordinates, confidence score and class ID of each final proposal
        ResultOfDetection result;
        result.boundingBox = box;
        result.confidence = confidence; // confidence
        result.classID = classID;       // label
        result.className = className;
        resultsOfDetection.push_back(result);
    }

    ...
}
```
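
To make the non-maximum suppression step more concrete, the following is a minimal sketch of greedy NMS in plain C++ without OpenCV. It is not the implementation behind cv::dnn::NMSBoxes, and details such as tie handling may differ. The names Box, IoU and GreedyNMS are hypothetical and introduced only for this illustration; the two thresholds play the same roles as confidenceThreshold and nmsThreshold above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical box type for this sketch: top-left corner plus size.
struct Box
{
    float left, top, width, height;
};

// Intersection over union of two boxes; returns 0 when the union area is zero.
float IoU(const Box &a, const Box &b)
{
    float x1 = std::max(a.left, b.left);
    float y1 = std::max(a.top, b.top);
    float x2 = std::min(a.left + a.width, b.left + b.width);
    float y2 = std::min(a.top + a.height, b.top + b.height);
    float inter = std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);
    float uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: returns the indices of the kept boxes, highest score first.
std::vector<int> GreedyNMS(const std::vector<Box> &boxes, const std::vector<float> &scores,
                           float scoreThreshold, float nmsThreshold)
{
    // Keep only candidates above the score threshold
    std::vector<int> order;
    for (std::size_t i = 0; i < boxes.size(); ++i)
    {
        if (scores[i] > scoreThreshold)
        {
            order.push_back(static_cast<int>(i));
        }
    }

    // Sort the candidates by descending score
    std::stable_sort(order.begin(), order.end(),
                     [&scores](int a, int b) { return scores[a] > scores[b]; });

    // Keep a box only if it does not overlap an already kept box too much
    std::vector<int> keep;
    for (int idx : order)
    {
        bool suppressed = false;
        for (int k : keep)
        {
            if (IoU(boxes[idx], boxes[k]) > nmsThreshold)
            {
                suppressed = true;
                break;
            }
        }
        if (!suppressed)
        {
            keep.push_back(idx);
        }
    }
    return keep;
}
```

As a usage example, calling GreedyNMS(boxes, scores, 0.5f, 0.5f) on two heavily overlapping boxes and one distant box keeps the higher-scoring box of the overlapping pair together with the distant box. The returned indices can then be used to fill resultsOfDetection in the same way as the indices returned by cv::dnn::NMSBoxes.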