#! /bin/sh
############### Ubuntu ###############
# Reference: https://docs.opencv.org/3.4.11/d7/d9f/tutorial_linux_install.html
# apt-get install build-essential -y
# apt-get install cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev -y
# apt-get install python-dev python-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev -y # packages needed for image processing (optional)
############### CentOS ###############
yum install gcc gcc-c++ gtk2-devel gimp-devel gimp-devel-tools gimp-help-browser zlib-devel libtiff-devel libjpeg-devel libpng-devel gstreamer-devel libavc1394-devel libraw1394-devel libdc1394-devel jasper-devel jasper-utils swig python libtool nasm -y
############################ Online dependency installation ###############################
#cd ./3rdParty
#pip install rbuild-master.tar.gz
############################ Offline dependency installation ###############################
# Install dependencies
cd ./3rdParty/rbuild_depend
pip install click-6.6-py2.py3-none-any.whl
pip install six-1.15.0-py2.py3-none-any.whl
pip install subprocess32-3.5.4.tar.gz
pip install cget-0.1.9.tar.gz
# Install rbuild
cd ../
pip install rbuild-master.tar.gz
# Minimum required CMake version
cmake_minimum_required(VERSION 3.5)
# Project name
project(YOLOV8)
# Compiler settings
set(CMAKE_CXX_COMPILER g++)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++17") # C++17 is required for version 2.2 and above
set(CMAKE_BUILD_TYPE Release)
# Include directories
set(INCLUDE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/Src/
${CMAKE_CURRENT_SOURCE_DIR}/Src/Utility/
$ENV{DTKROOT}/include/
${CMAKE_CURRENT_SOURCE_DIR}/depend/include/)
include_directories(${INCLUDE_PATH})
# Library search paths
set(LIBRARY_PATH ${CMAKE_CURRENT_SOURCE_DIR}/depend/lib64/
$ENV{DTKROOT}/lib/)
link_directories(${LIBRARY_PATH})
# Libraries to link
set(LIBRARY opencv_core
opencv_imgproc
opencv_imgcodecs
opencv_dnn
migraphx
migraphx_gpu
migraphx_onnx)
link_libraries(${LIBRARY})
# Source files
set(SOURCE_FILES ${CMAKE_CURRENT_SOURCE_DIR}/Src/main.cpp
${CMAKE_CURRENT_SOURCE_DIR}/Src/YOLOV8.cpp
${CMAKE_CURRENT_SOURCE_DIR}/Src/Utility/CommonUtility.cpp
${CMAKE_CURRENT_SOURCE_DIR}/Src/Utility/Filesystem.cpp)
# Executable target
add_executable(YOLOV8 ${SOURCE_FILES})
# YOLOV8 Detector
YOLOV8 is one of the most widely used detection models in industry, and the official release provides pretrained models in several sizes. This document describes how to build YOLOV8 inference on top of MIGraphX, covering both static inference and dynamic-shape inference; the workflow shown here applies equally to the other YOLOV8 model variants.
## Model Overview
YOLOV8 is a single-stage object detection algorithm that builds on YOLOV5 with several improvements that significantly boost both speed and accuracy. The backbone and neck draw on the ELAN design of YOLOv7, replacing YOLOv5's C3 block with the C2f block, which has richer gradient flow, and tuning the channel counts separately for each model scale instead of applying one set of parameters to all models, which markedly improves performance. The head changes substantially compared with YOLOv5: it adopts the now-mainstream decoupled head that separates classification from box regression, and switches from anchor-based to anchor-free prediction. For the loss, the TaskAlignedAssigner positive-sample assignment strategy is used together with Distribution Focal Loss. For training-time data augmentation, the practice from YOLOX of disabling Mosaic augmentation during the last 10 epochs is adopted, which effectively improves accuracy. The network structure is shown in the figure below.
<img src=./yolov8_model.jpg style="zoom:100%;" align=middle>
## Detector Parameter Settings
The DetectorYOLOV8 node in the samples project's Resource/Configuration.xml file holds the YOLOV8 detector parameters, which largely follow the official inference example. The parameters are:
- ModelPathDynamic: path to the dynamic yolov8 model
- ModelPathStatic: path to the static yolov8 model
- ClassNameFile: path to the COCO class-name file
- UseFP16: whether to use FP16 inference mode
- NumberOfClasses: number of detection classes
- ConfidenceThreshold: confidence threshold used to decide whether a proposal contains a positive sample
- NMSThreshold: non-maximum suppression threshold used to remove duplicate boxes
```xml
<ModelPathDynamic>"../Resource/Models/yolov8n_dynamic.onnx"</ModelPathDynamic>
<ModelPathStatic>"../Resource/Models/yolov8n_static.onnx"</ModelPathStatic>
<ClassNameFile>"../Resource/Models/coco.names"</ClassNameFile>
<UseFP16>0</UseFP16><!--whether to use FP16-->
<NumberOfClasses>80</NumberOfClasses><!--number of classes (excluding background); COCO: 80, VOC: 20-->
<ConfidenceThreshold>0.5</ConfidenceThreshold>
<NMSThreshold>0.5</NMSThreshold>
```
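This tutorial does not show how the sample loads these values. Since Configuration.xml is an OpenCV XML storage file, one plausible way to read the DetectorYOLOV8 node is with cv::FileStorage; the sketch below is illustrative only, and the struct and function names are not the sample's actual API.
```cpp
#include <opencv2/core.hpp>
#include <string>

// Illustrative sketch: read the DetectorYOLOV8 node with cv::FileStorage.
struct YOLOV8Parameter
{
    std::string modelPathDynamic, modelPathStatic, classNameFile;
    int useFP16 = 0;
    int numberOfClasses = 80;
    float confidenceThreshold = 0.5f;
    float nmsThreshold = 0.5f;
};

inline YOLOV8Parameter LoadYOLOV8Parameter(const std::string &configPath)
{
    YOLOV8Parameter parameter;
    cv::FileStorage fs(configPath, cv::FileStorage::READ);
    cv::FileNode node = fs["DetectorYOLOV8"];
    node["ModelPathDynamic"]    >> parameter.modelPathDynamic;
    node["ModelPathStatic"]     >> parameter.modelPathStatic;
    node["ClassNameFile"]       >> parameter.classNameFile;
    node["UseFP16"]             >> parameter.useFP16;
    node["NumberOfClasses"]     >> parameter.numberOfClasses;
    node["ConfidenceThreshold"] >> parameter.confidenceThreshold;
    node["NMSThreshold"]        >> parameter.nmsThreshold;
    return parameter;
}
```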
## Model Initialization
Model initialization first loads the YOLOV8 ONNX model through the parse_onnx() function.
- Static inference: parse the static model with parse_onnx
```cpp
ErrorCode DetectorYOLOV8::Initialize(InitializationParameterOfDetector initializationParameterOfDetector, bool dynamic)
{
...
// Load the model
net = migraphx::parse_onnx(modelPath);
LOG_INFO(stdout,"succeed to load model: %s\n",GetFileName(modelPath).c_str());
...
}
```
- Dynamic-shape inference: the maximum input shape of the model must be specified; this example uses {1,3,1024,1024}
```cpp
ErrorCode DetectorYOLOV8::Initialize(InitializationParameterOfDetector initializationParameterOfDetector, bool dynamic)
{
...
migraphx::onnx_options onnx_options;
onnx_options.map_input_dims["images"]={1,3,1024,1024}; // maximum input shape for dynamic inference
net = migraphx::parse_onnx(modelPath, onnx_options);
...
}
```
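After parsing, the program has to be compiled for the GPU target before eval() can be called; the compile step is omitted from the excerpts above. The sketch below is only an assumption about how this might look with the MIGraphX C++ API used elsewhere in the sample (the target and FP16 quantization calls, migraphx::make_target and migraphx::quantize_fp16, may vary between MIGraphX versions, and the function name here is illustrative); the UseFP16 switch in the configuration would map to the optional quantization step. The input name and shape used later (inputName, inputShape) can presumably be queried via net.get_parameter_shapes(), which maps parameter names to migraphx::shape.
```cpp
#include <migraphx/onnx.hpp>
#include <migraphx/program.hpp>
#include <migraphx/quantization.hpp>
#include <migraphx/register_target.hpp>
#include <string>

// Sketch only, not the sample's exact code: parse, optionally quantize to FP16,
// and compile the program for the GPU target.
migraphx::program BuildProgram(const std::string &modelPath, bool useFP16)
{
    migraphx::program net = migraphx::parse_onnx(modelPath);
    if (useFP16)
    {
        migraphx::quantize_fp16(net); // corresponds to the UseFP16 switch in the configuration
    }
    net.compile(migraphx::make_target("gpu")); // required before net.eval()
    return net;
}
```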
## Preprocessing
Before the data is fed to the model, the image is preprocessed as follows:
- Convert the data layout to NCHW
- Normalize to [0.0, 1.0]
- Resize the input: static inference fixes the input size to relInputShape=[1,3,640,640], while dynamic inference resizes the image to the requested dynamic size (see the caller-side sketch after the code block below).
```cpp
ErrorCode DetectorYOLOV8::Detect(const cv::Mat &srcImage, std::vector<std::size_t> &relInputShape, std::vector<ResultOfDetection> &resultsOfDetection, bool dynamic)
{
...
// Preprocess the data and convert it to NCHW layout
inputSize = cv::Size(relInputShape[3], relInputShape[2]);
cv::Mat inputBlob;
cv::dnn::blobFromImage(srcImage,
inputBlob,
1 / 255.0,
inputSize,
cv::Scalar(0, 0, 0),
true,
false);
...
}
```
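The Detect interface receives the actual input shape through relInputShape; the caller side (main.cpp) is not reproduced in this tutorial. A minimal, purely illustrative sketch of how the two modes might set it (the detector object name is hypothetical):
```cpp
// Illustrative caller-side setup, not the sample's main.cpp.
std::vector<std::size_t> relInputShape;
if (dynamic)
    relInputShape = {1, 3, 416, 416}; // actual shape of this image; must not exceed the maximum shape {1, 3, 1024, 1024}
else
    relInputShape = {1, 3, 640, 640}; // fixed input size of the static model

std::vector<ResultOfDetection> resultsOfDetection;
detector.Detect(srcImage, relInputShape, resultsOfDetection, dynamic);
```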
## Inference
Once the image has been preprocessed and the YOLOV8 detection parameters have been set, inference is executed with MIGraphX to obtain the YOLOV8 model output. For static inference the shape of the input data inputData is the model's fixed input size; for dynamic inference it is the actual input size.
```cpp
ErrorCode DetectorYOLOV8::Detect(const cv::Mat &srcImage, std::vector<std::size_t> &relInputShape, std::vector<ResultOfDetection> &resultsOfDetection, bool dynamic)
{
...
// Create the input data
migraphx::parameter_map inputData;
if(dynamic)
{
inputData[inputName]= migraphx::argument{migraphx::shape(inputShape.type(), relInputShape), (float*)inputBlob.data};
}
else
{
inputData[inputName]= migraphx::argument{inputShape, (float*)inputBlob.data};
}
// Run inference
std::vector<migraphx::argument> inferenceResults = net.eval(inputData);
...
}
```
The MIGraphX inference result inferenceResults is of type std::vector<migraphx::argument>. The YOLOV8 ONNX model has a single output, so result equals inferenceResults[0]. result has three dimensions: outputShape.lens()[0] = 1 is the batch dimension, outputShape.lens()[1] = 84 is the prediction vector for each proposal, and outputShape.lens()[2] = 8400 is the number of generated proposals. The 84 values split into 4 + 80: the first 4 are the box regression parameters of each feature point, from which the predicted box is obtained, and the remaining 80 are the class scores for each feature point.
In addition, specific output nodes can be selected by passing an outputNames argument to eval():
```cpp
...
// Run inference
std::vector<std::string> outputNames = {"output0"};
std::vector<migraphx::argument> inferenceResults = net.eval(inputData, outputNames);
...
```
If outputNames is not specified, all output nodes are returned by default, in the same order as they appear in the ONNX model; the order can be inspected with netron.
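The post-processing excerpt below reads the model output through a cv::Mat named outs[0]; how that matrix is obtained from inferenceResults[0] is not shown in this document. One possible way, sketched under the assumption that the compiled program returns its output as a contiguous float buffer in host memory, is to wrap the argument's data pointer in a cv::Mat without copying:
```cpp
// Sketch only: wrap the first inference output in a cv::Mat (no copy),
// assuming the output resides in host memory as contiguous float data.
migraphx::argument result = inferenceResults[0];
migraphx::shape outputShape = result.get_shape(); // lens() == {1, 84, 8400}
std::vector<int> dims(outputShape.lens().begin(), outputShape.lens().end());
std::vector<cv::Mat> outs;
outs.emplace_back(static_cast<int>(dims.size()), dims.data(), CV_32F, result.data());
```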
With this information the proposals are filtered against the confidenceThreshold: when a proposal's maximum class score maxClassScore exceeds the threshold, its box coordinates and predicted class are extracted; otherwise the proposal is discarded.
```cpp
ErrorCode DetectorYOLOV8::Detect(const cv::Mat &srcImage, std::vector<std::size_t> &relInputShape, std::vector<ResultOfDetection> &resultsOfDetection, bool dynamic)
{
...
// Number of proposals
int numProposal = outs[0].size[2];
int numOut = outs[0].size[1];
// Reshape and transpose the output so that each row is one proposal
outs[0] = outs[0].reshape(1, numOut);
cv::transpose(outs[0], outs[0]);
float *data = (float *)outs[0].data;
// Containers for the candidate boxes
std::vector<float> confidences;
std::vector<cv::Rect> boxes;
std::vector<int> classIds;
float ratioh = (float)srcImage.rows / inputSize.height, ratiow = (float)srcImage.cols / inputSize.width;
// Decode x, y, w, h for each proposal
for (int n = 0; n < numProposal; n++)
{
float *classes_scores = data+4;
cv::Mat scores(1, classNames.size(), CV_32FC1, classes_scores);
cv::Point class_id;
double maxClassScore;
cv::minMaxLoc(scores, 0, &maxClassScore, 0, &class_id);
if (maxClassScore > yolov8Parameter.confidenceThreshold)
{
confidences.push_back(maxClassScore);
classIds.push_back(class_id.x);
float x = data[0];
float y = data[1];
float w = data[2];
float h = data[3];
int left = int((x - 0.5 * w) * ratiow);
int top = int((y - 0.5 * h) * ratioh);
int width = int(w * ratiow);
int height = int(h * ratioh);
boxes.push_back(cv::Rect(left, top, width, height));
}
data += numOut;
}
...
}
```
To remove overlapping anchor boxes and output the final YOLOV8 detection result, non-maximum suppression is applied to the filtered proposals, and the detections are finally stored in resultsOfDetection.
```cpp
ErrorCode DetectorYOLOV8::Detect(const cv::Mat &srcImage, std::vector<std::size_t> &relInputShape, std::vector<ResultOfDetection> &resultsOfDetection, bool dynamic)
{
...
// Apply non-maximum suppression to remove redundant overlapping boxes
std::vector<int> indices;
cv::dnn::NMSBoxes(boxes, confidences, yolov8Parameter.confidenceThreshold, yolov8Parameter.nmsThreshold, indices);
for (size_t i = 0; i < indices.size(); ++i)
{
int idx = indices[i];
int classID=classIds[idx];
string className=classNames[classID];
float confidence=confidences[idx];
cv::Rect box = boxes[idx];
// Save the box coordinates, confidence score, and class ID of each final proposal
ResultOfDetection result;
result.boundingBox=box;
result.confidence=confidence;// confidence
result.classID=classID; // label
result.className=className;
resultsOfDetection.push_back(result);
}
...
}
```
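The tutorial ends with resultsOfDetection; the code that renders the visualization image Result.jpg (mentioned in the README) is not covered here. A minimal OpenCV sketch, based only on the ResultOfDetection fields used above (the function name and colors are illustrative):
```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <string>
#include <vector>

// Sketch only: draw the final detections and save the visualization image.
void DrawAndSaveResults(cv::Mat image, const std::vector<ResultOfDetection> &resultsOfDetection,
                        const std::string &savePath)
{
    for (const auto &result : resultsOfDetection)
    {
        cv::rectangle(image, result.boundingBox, cv::Scalar(0, 255, 0), 2);
        std::string label = result.className + ": " + cv::format("%.2f", result.confidence);
        cv::putText(image, label, result.boundingBox.tl() + cv::Point(0, -5),
                    cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0), 1);
    }
    cv::imwrite(savePath, image);
}
```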
# YOLOV8 Detector
This document describes how to build the YOLOV8 dynamic-shape inference Python example on top of MIGraphX, and explains how to run the Python example to obtain YOLOV8 object detection results.
## Model Overview
YOLOV8 is a single-stage object detection algorithm that builds on YOLOV5 with several improvements that significantly boost both speed and accuracy. The backbone and neck draw on the ELAN design of YOLOv7, replacing YOLOv5's C3 block with the C2f block, which has richer gradient flow, and tuning the channel counts separately for each model scale instead of applying one set of parameters to all models, which markedly improves performance. The head changes substantially compared with YOLOv5: it adopts the now-mainstream decoupled head that separates classification from box regression, and switches from anchor-based to anchor-free prediction. For the loss, the TaskAlignedAssigner positive-sample assignment strategy is used together with Distribution Focal Loss. For training-time data augmentation, the practice from YOLOX of disabling Mosaic augmentation during the last 10 epochs is adopted, which effectively improves accuracy. The network structure is shown in the figure below.
<img src=./yolov8_model.jpg style="zoom:100%;" align=middle>
## Preprocessing
Before an image is fed to the model for detection it must be preprocessed, mainly by resizing and normalizing the input:
1. Convert the data layout to NCHW
2. Normalize to [0.0, 1.0]
3. Resize the input data
```python
def preprocess(self, image):
"""
Preprocesses the input image before performing inference.
Returns:
image_data: Preprocessed image data ready for inference.
"""
# Read the input image using OpenCV
# self.img = cv2.imread(self.input_image)
self.img = image
# Get the height and width of the input image
self.img_height, self.img_width = self.img.shape[:2]
# Convert the image color space from BGR to RGB
img = cv2.cvtColor(self.img, cv2.COLOR_BGR2RGB)
# Resize the image to match the input shape
img = cv2.resize(img, (self.inputWidth, self.inputHeight))
# Normalize the image data by dividing it by 255.0
image_data = np.array(img) / 255.0
# Transpose the image to have the channel dimension as the first dimension
image_data = np.transpose(image_data, (2, 0, 1)) # Channel first
# Expand the dimensions of the image data to match the expected input shape
image_data = np.expand_dims(image_data, axis=0).astype(np.float32)
# Make the array memory contiguous
image_data = np.ascontiguousarray(image_data)
# Return the preprocessed image data
return image_data
```
## Inference
To run YOLOV8 inference, the model is first parsed and compiled. For static inference, parse_onnx is called directly on the static model and the input shape is read from it; unlike the static case, dynamic-shape inference requires specifying the maximum input shape, which in this example is [1,3,1024,1024].
```python
class YOLOv8:
"""YOLOv8 object detection model class for handling inference and visualization."""
def __init__(self, model_path, dynamic=False, conf_thres=0.5, iou_thres=0.5):
"""
Initializes an instance of the YOLOv8 class.
Args:
model_path: Path to the ONNX model.
dynamic: whether use dynamic inference.
conf_thres: Confidence threshold for filtering detections.
iou_thres: IoU (Intersection over Union) threshold for non-maximum suppression.
"""
self.confThreshold = conf_thres
self.nmsThreshold = iou_thres
self.isDynamic = dynamic
# Load the class names used for detection
self.classNames = list(map(lambda x: x.strip(), open('../Resource/Models/coco.names', 'r').readlines()))
# Parse the inference model
if self.isDynamic:
maxInput={"images":[1,3,1024,1024]}
self.model = migraphx.parse_onnx(model_path, map_input_dims=maxInput)
# Print the model's input/output node information
print("inputs:")
inputs = self.model.get_inputs()
for key,value in inputs.items():
print("{}:{}".format(key,value))
print("outputs:")
outputs = self.model.get_outputs()
for key,value in outputs.items():
print("{}:{}".format(key,value))
# Get the model's input name
self.inputName = "images"
# Get the model's input shape
inputShape = inputs[self.inputName].lens()
self.inputHeight = int(inputShape[2])
self.inputWidth = int(inputShape[3])
print("inputName:{0} \ninputShape:{1}".format(self.inputName, inputShape))
else:
self.model = migraphx.parse_onnx(model_path)
...
# Compile the model
self.model.compile(t=migraphx.get_target("gpu"), device_id=0) # device_id: GPU device to use, defaults to device 0
print("Success to compile")
...
```
After initialization the model is ready for inference: the forward pass on the input data produces the model output result, and the detect function calls the postprocess function defined below to post-process result, obtaining the anchor coordinates, class confidences, and class IDs of the objects in the image and drawing them on the input image.
```python
def detect(self, image, input_shape=None):
if(self.isDynamic):
self.inputWidth = input_shape[3]
self.inputHeight = input_shape[2]
# Preprocess the input image
input_img = self.preprocess(image)
# Run inference
start = time.time()
result = self.model.run({self.inputName: input_img})
print('net forward time: {:.4f}'.format(time.time() - start))
# Post-process the model output
dstimg = self.postprocess(image, result)
return dstimg
```
To post-process the MIGraphX inference output result, detections are first filtered with the confidence threshold confThreshold and then non-maximum suppression is applied to remove redundant anchors. This is implemented in the postprocess function.
```python
def postprocess(self, input_image, output):
"""
Performs post-processing on the model's output to extract bounding boxes, scores, and class IDs.
Args:
input_image (numpy.ndarray): The input image.
output (numpy.ndarray): The output of the model.
Returns:
numpy.ndarray: The input image with detections drawn on it.
"""
# Transpose and squeeze the output to match the expected shape
outputs = np.transpose(np.squeeze(output[0]))
# Get the number of rows in the outputs array
rows = outputs.shape[0]
# Lists to store the bounding boxes, scores, and class IDs of the detections
boxes = []
scores = []
class_ids = []
# Calculate the scaling factors for the bounding box coordinates
x_factor = self.img_width / self.inputWidth
y_factor = self.img_height / self.inputHeight
# Iterate over each row in the outputs array
for i in range(rows):
# Extract the class scores from the current row
classes_scores = outputs[i][4:]
# Find the maximum score among the class scores
max_score = np.amax(classes_scores)
# If the maximum score is above the confidence threshold
if max_score >= self.confThreshold:
# Get the class ID with the highest score
class_id = np.argmax(classes_scores)
# Extract the bounding box coordinates from the current row
x, y, w, h = outputs[i][0], outputs[i][1], outputs[i][2], outputs[i][3]
# Calculate the scaled coordinates of the bounding box
left = int((x - w / 2) * x_factor)
top = int((y - h / 2) * y_factor)
width = int(w * x_factor)
height = int(h * y_factor)
# Add the class ID, score, and box coordinates to the respective lists
class_ids.append(class_id)
scores.append(max_score)
boxes.append([left, top, width, height])
# Apply non-maximum suppression to filter out overlapping bounding boxes
indices = cv2.dnn.NMSBoxes(boxes, scores, self.confThreshold, self.nmsThreshold)
# Iterate over the selected indices after non-maximum suppression
for i in indices:
# Get the box, score, and class ID corresponding to the index
box = boxes[i]
score = scores[i]
class_id = class_ids[i]
# Draw the detection on the input image
self.draw_detections(input_image, box, score, class_id)
# Return the modified input image
return input_image
```
Finally, the boxes, scores, and class_ids that survive NMS are visualized on the original image by drawing each detected object's location, class, and confidence score, producing the final YOLOV8 detection output.
```python
def draw_detections(self, img, box, score, class_id):
"""
Draws bounding boxes and labels on the input image based on the detected objects.
Args:
img: The input image to draw detections on.
box: Detected bounding box.
score: Corresponding detection score.
class_id: Class ID for the detected object.
Returns:
None
"""
# Extract the coordinates of the bounding box
x1, y1, w, h = box
# Retrieve the color for the class ID
color = self.color_palette[class_id]
# Draw the bounding box on the image
cv2.rectangle(img, (int(x1), int(y1)), (int(x1 + w), int(y1 + h)), color, 2)
# Create the label text with class name and score
label = f'{self.classNames[class_id]}: {score:.2f}'
# Calculate the dimensions of the label text
(label_width, label_height), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
# Calculate the position of the label text
label_x = x1
label_y = y1 - 10 if y1 - 10 > label_height else y1 + 10
# Draw a filled rectangle as the background for the label text
cv2.rectangle(img, (label_x, label_y - label_height), (label_x + label_width, label_y + label_height), color,
cv2.FILLED)
# Draw the label text on the image
cv2.putText(img, label, (label_x, label_y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
```
# -*- coding: utf-8 -*-
import os
import time
import migraphx
import argparse
import cv2
import numpy as np
class YOLOv8:
"""YOLOv8 object detection model class for handling inference and visualization."""
def __init__(self, model_path, dynamic=False, conf_thres=0.5, iou_thres=0.5):
"""
Initializes an instance of the YOLOv8 class.
Args:
model_path: Path to the ONNX model.
dynamic: whether use dynamic inference.
conf_thres: Confidence threshold for filtering detections.
iou_thres: IoU (Intersection over Union) threshold for non-maximum suppression.
"""
self.confThreshold = conf_thres
self.nmsThreshold = iou_thres
self.isDynamic = dynamic
# Load the class names used for detection
self.classNames = list(map(lambda x: x.strip(), open('../Resource/Models/coco.names', 'r').readlines()))
# Parse the inference model
if self.isDynamic:
maxInput={"images":[1,3,1024,1024]}
self.model = migraphx.parse_onnx(model_path, map_input_dims=maxInput)
# Print the model's input/output node information
print("inputs:")
inputs = self.model.get_inputs()
for key,value in inputs.items():
print("{}:{}".format(key,value))
print("outputs:")
outputs = self.model.get_outputs()
for key,value in outputs.items():
print("{}:{}".format(key,value))
# Get the model's input name
self.inputName = "images"
# Get the model's input shape
inputShape = inputs[self.inputName].lens()
self.inputHeight = int(inputShape[2])
self.inputWidth = int(inputShape[3])
print("inputName:{0} \ninputShape:{1}".format(self.inputName, inputShape))
else:
self.model = migraphx.parse_onnx(model_path)
# Print the model's input/output node information
print("inputs:")
inputs = self.model.get_inputs()
for key,value in inputs.items():
print("{}:{}".format(key,value))
print("outputs:")
outputs = self.model.get_outputs()
for key,value in outputs.items():
print("{}:{}".format(key,value))
# Get the model's input name
self.inputName = "images"
# Get the model's input shape
inputShape = inputs[self.inputName].lens()
self.inputHeight = int(inputShape[2])
self.inputWidth = int(inputShape[3])
print("inputName:{0} \ninputShape:{1}".format(self.inputName, inputShape))
# Compile the model
self.model.compile(t=migraphx.get_target("gpu"), device_id=0) # device_id: GPU device to use, defaults to device 0
print("Success to compile")
# Generate a color palette for the classes
self.color_palette = np.random.uniform(0, 255, size=(len(self.classNames), 3))
def draw_detections(self, img, box, score, class_id):
"""
Draws bounding boxes and labels on the input image based on the detected objects.
Args:
img: The input image to draw detections on.
box: Detected bounding box.
score: Corresponding detection score.
class_id: Class ID for the detected object.
Returns:
None
"""
# Extract the coordinates of the bounding box
x1, y1, w, h = box
# Retrieve the color for the class ID
color = self.color_palette[class_id]
# Draw the bounding box on the image
cv2.rectangle(img, (int(x1), int(y1)), (int(x1 + w), int(y1 + h)), color, 2)
# Create the label text with class name and score
label = f'{self.classNames[class_id]}: {score:.2f}'
# Calculate the dimensions of the label text
(label_width, label_height), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
# Calculate the position of the label text
label_x = x1
label_y = y1 - 10 if y1 - 10 > label_height else y1 + 10
# Draw a filled rectangle as the background for the label text
cv2.rectangle(img, (label_x, label_y - label_height), (label_x + label_width, label_y + label_height), color,
cv2.FILLED)
# Draw the label text on the image
cv2.putText(img, label, (label_x, label_y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
def preprocess(self, image):
"""
Preprocesses the input image before performing inference.
Returns:
image_data: Preprocessed image data ready for inference.
"""
# Read the input image using OpenCV
# self.img = cv2.imread(self.input_image)
self.img = image
# Get the height and width of the input image
self.img_height, self.img_width = self.img.shape[:2]
# Convert the image color space from BGR to RGB
img = cv2.cvtColor(self.img, cv2.COLOR_BGR2RGB)
# Resize the image to match the input shape
img = cv2.resize(img, (self.inputWidth, self.inputHeight))
# Normalize the image data by dividing it by 255.0
image_data = np.array(img) / 255.0
# Transpose the image to have the channel dimension as the first dimension
image_data = np.transpose(image_data, (2, 0, 1)) # Channel first
# Expand the dimensions of the image data to match the expected input shape
image_data = np.expand_dims(image_data, axis=0).astype(np.float32)
# Make the array memory contiguous
image_data = np.ascontiguousarray(image_data)
# Return the preprocessed image data
return image_data
def postprocess(self, input_image, output):
"""
Performs post-processing on the model's output to extract bounding boxes, scores, and class IDs.
Args:
input_image (numpy.ndarray): The input image.
output (numpy.ndarray): The output of the model.
Returns:
numpy.ndarray: The input image with detections drawn on it.
"""
# Transpose and squeeze the output to match the expected shape
outputs = np.transpose(np.squeeze(output[0]))
# Get the number of rows in the outputs array
rows = outputs.shape[0]
# Lists to store the bounding boxes, scores, and class IDs of the detections
boxes = []
scores = []
class_ids = []
# Calculate the scaling factors for the bounding box coordinates
x_factor = self.img_width / self.inputWidth
y_factor = self.img_height / self.inputHeight
# Iterate over each row in the outputs array
for i in range(rows):
# Extract the class scores from the current row
classes_scores = outputs[i][4:]
# Find the maximum score among the class scores
max_score = np.amax(classes_scores)
# If the maximum score is above the confidence threshold
if max_score >= self.confThreshold:
# Get the class ID with the highest score
class_id = np.argmax(classes_scores)
# Extract the bounding box coordinates from the current row
x, y, w, h = outputs[i][0], outputs[i][1], outputs[i][2], outputs[i][3]
# Calculate the scaled coordinates of the bounding box
left = int((x - w / 2) * x_factor)
top = int((y - h / 2) * y_factor)
width = int(w * x_factor)
height = int(h * y_factor)
# Add the class ID, score, and box coordinates to the respective lists
class_ids.append(class_id)
scores.append(max_score)
boxes.append([left, top, width, height])
# Apply non-maximum suppression to filter out overlapping bounding boxes
indices = cv2.dnn.NMSBoxes(boxes, scores, self.confThreshold, self.nmsThreshold)
# Iterate over the selected indices after non-maximum suppression
for i in indices:
# Get the box, score, and class ID corresponding to the index
box = boxes[i]
score = scores[i]
class_id = class_ids[i]
# Draw the detection on the input image
self.draw_detections(input_image, box, score, class_id)
# Return the modified input image
return input_image
def detect(self, image, input_shape=None):
if(self.isDynamic):
self.inputWidth = input_shape[3]
self.inputHeight = input_shape[2]
# Preprocess the input image
input_img = self.preprocess(image)
# Run inference
start = time.time()
result = self.model.run({self.inputName: input_img})
print('net forward time: {:.4f}'.format(time.time() - start))
# Post-process the model output
dstimg = self.postprocess(image, result)
return dstimg
def read_images(image_path):
image_lists = []
for image_name in sorted(os.listdir(image_path)):  # sort for a deterministic image order
image = cv2.imread(image_path +"/" + image_name, 1)
image_lists.append(image)
return image_lists
def yolov8_Static(imgpath, modelpath, confThreshold, nmsThreshold):
yolov8_detector = YOLOv8(modelpath, False, conf_thres=confThreshold,
iou_thres=nmsThreshold)
srcimg = cv2.imread(imgpath, 1)
dstimg = yolov8_detector.detect(srcimg)
# Save the detection result
cv2.imwrite("./Result.jpg", dstimg)
print("Success to save result")
def yolov8_dynamic(imgpath, modelpath, confThreshold, nmsThreshold):
# Set the dynamic input shapes
input_shapes = []
input_shapes.append([1,3,416,416])
input_shapes.append([1,3,608,608])
# Read the test images
image_lists = read_images(imgpath)
# Run inference
yolov8_detector = YOLOv8(modelpath, True,
conf_thres=confThreshold, iou_thres=nmsThreshold)
for i, image in enumerate(image_lists):
print("Start to inference image{}".format(i))
dstimg = yolov8_detector.detect(image, input_shapes[i])
# Save the detection result
result_name = "Result{}.jpg".format(i)
cv2.imwrite(result_name, dstimg)
print("Success to save results")
if __name__ == '__main__':
# Create an argument parser to handle command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument('--imgPath', type=str, default='../Resource/Images/image_test.jpg', help="image path")
parser.add_argument('--imgFolderPath', type=str, default='../Resource/Images/DynamicPics', help="image folder path")
parser.add_argument('--staticModelPath', type=str, default='../Resource/Models/yolov8n_static.onnx', help="static onnx filepath")
parser.add_argument('--dynamicModelPath', type=str, default='../Resource/Models/yolov8n_dynamic.onnx', help="dynamic onnx filepath")
parser.add_argument('--confThreshold', default=0.5, type=float, help='class confidence')
parser.add_argument('--nmsThreshold', default=0.5, type=float, help='nms iou thresh')
parser.add_argument("--staticInfer",action="store_true",default=False,help="Performing static inference")
parser.add_argument("--dynamicInfer",action="store_true",default=False,help="Performing dynamic inference")
args = parser.parse_args()
# Static inference
if args.staticInfer:
yolov8_Static(args.imgPath, args.staticModelPath, args.confThreshold, args.nmsThreshold)
# Dynamic inference
if args.dynamicInfer:
yolov8_dynamic(args.imgFolderPath, args.dynamicModelPath, args.confThreshold, args.nmsThreshold)
# YoloV8
## Model Introduction
YoloV8 is a single-stage object detection algorithm that builds on YOLOV5 with several improvements, significantly boosting both speed and accuracy.
## Model Structure
The main improvements in the YoloV8 model are:
- The backbone and neck draw on the ELAN design of YOLOv7, replacing YOLOv5's C3 block with the C2f block, which has richer gradient flow, and tuning the channel counts separately for each model scale instead of applying one set of parameters to all models, which markedly improves performance.
- The head changes substantially compared with YOLOv5: it adopts the now-mainstream decoupled head that separates classification from box regression, and switches from anchor-based to anchor-free prediction.
- The loss uses the TaskAlignedAssigner positive-sample assignment strategy and introduces Distribution Focal Loss.
- Training-time data augmentation adopts the YOLOX practice of disabling Mosaic augmentation during the last 10 epochs, which effectively improves accuracy.
## Python Inference
This section explains how to run the Python example; a detailed description of the Python example is in Tutorial_Python.md under the Doc directory.
### Pull the Image
Pull the MIGraphX image:
```shell
docker pull image.sourcefind.cn:5000/dcu/admin/base/migraphx:4.0.0-centos7.6-dtk23.04.1-py38-latest
```
### Set the Python Environment Variable
```shell
export PYTHONPATH=/opt/dtk/lib:$PYTHONPATH
```
### Install Dependencies
```shell
# Enter the Python sample directory
cd <path_to_yolov8_migraphx>/Python
# Install dependencies
pip install -r requirements.txt
```
### Run the Sample
The YoloV8 inference example program is YoloV8_infer_migraphx.py. Run it with the following commands:
```shell
# Enter the project root directory
cd <path_to_yolov8_migraphx>
# Enter the Python directory
cd Python/
```
1. Static inference
```shell
python YoloV8_infer_migraphx.py --staticInfer
```
After the program finishes, the visualization of the YOLOV8 static-inference detection result, Result.jpg, is written to the current directory.
<img src="./Resource/Images/Result.jpg" alt="Result" style="zoom: 50%;" />
2. Dynamic inference
```shell
python YoloV8_infer_migraphx.py --dynamicInfer
```
After the program finishes, the visualizations of the YoloV8 dynamic-inference detection results, Result0.jpg and Result1.jpg, are written to the current directory.
<img src="./Resource/Images/Result0.jpg" alt="Result_2" style="zoom: 50%;" />
<img src="./Resource/Images/Result1.jpg" alt="Result1" style="zoom: 50%;" />
## C++ Inference
This section explains how to run the C++ example; a detailed description of the C++ example is in Tutorial_Cpp.md under the Doc directory.
### Pull the Image
Pull the MIGraphX image:
```shell
docker pull image.sourcefind.cn:5000/dcu/admin/base/migraphx:4.0.0-centos7.6-dtk23.04.1-py38-latest
```
### Build the Project
```shell
rbuild build -d depend
```
### Set Environment Variables
Add the dependency library path to the LD_LIBRARY_PATH environment variable by appending the following line to ~/.bashrc:
```shell
export LD_LIBRARY_PATH=<path_to_yolov8_migraphx>/depend/lib64/:$LD_LIBRARY_PATH
```
Then run:
```shell
source ~/.bashrc
```
### Run the Sample
After the YoloV8 sample program has been built successfully, run it with the following commands:
```shell
# Enter the yolov8 migraphx project root directory
cd <path_to_yolov8_migraphx>
# Enter the build directory
cd build/
```
1. Static inference
```shell
./YOLOV8 0
```
After the program finishes, the visualization of the YOLOV8 static-inference detection result, Result.jpg, is written to the current directory.
<img src="./Resource/Images/Result.jpg" alt="Result" style="zoom:50%;" />
2. Dynamic inference
```shell
./YOLOV8 1
```
After the program finishes, the visualizations of the YoloV8 dynamic-shape inference detection results, Result0.jpg and Result1.jpg, are written to the build directory.
<img src="./Resource/Images/Result0.jpg" alt="Result" style="zoom:50%;" />
<img src="./Resource/Images/Result1.jpg" alt="Result" style="zoom:50%;" />
## Source Repository and Issue Reporting
https://developer.hpccube.com/codes/modelzoo/yolov8_migraphx
## References
https://github.com/ultralytics/ultralytics
<?xml version="1.0" encoding="GB2312"?>
<opencv_storage>
<!--YOLOV8 detector-->
<DetectorYOLOV8>
<ModelPathDynamic>"../Resource/Models/yolov8n_dynamic.onnx"</ModelPathDynamic>
<ModelPathStatic>"../Resource/Models/yolov8n_static.onnx"</ModelPathStatic>
<ClassNameFile>"../Resource/Models/coco.names"</ClassNameFile>
<UseFP16>0</UseFP16><!--whether to use FP16-->
<NumberOfClasses>80</NumberOfClasses><!--number of classes (excluding background); COCO: 80, VOC: 20-->
<ConfidenceThreshold>0.5</ConfidenceThreshold>
<NMSThreshold>0.5</NMSThreshold>
<!-- <ObjectThreshold>0.5</ObjectThreshold> -->
</DetectorYOLOV8>
</opencv_storage>