# YOLOv8 Detector

This document describes how to build a dynamic-shape YOLOv8 inference example in Python with MIGraphX and explains how to run the example to obtain YOLOv8 object detection results.

## Model Overview

YOLOv8 is a single-stage object detection algorithm that builds on YOLOv5 with several new design ideas, yielding large gains in both speed and accuracy. Specifically: the backbone and neck likely draw on the ELAN design of YOLOv7, replacing the C3 module of YOLOv5 with the C2f module, which has richer gradient flow, and tuning the channel counts separately for each model scale. This careful per-scale tuning of the structure, rather than blindly applying one set of parameters to every model, significantly improves performance. The head changes considerably compared to YOLOv5: it adopts the now-mainstream decoupled head, separating the classification and detection branches, and switches from anchor-based to anchor-free prediction. For loss computation it uses the TaskAlignedAssigner positive-sample assignment strategy and introduces Distribution Focal Loss. For training-time data augmentation it adopts the trick from YOLOX of disabling Mosaic augmentation during the last 10 epochs, which effectively improves accuracy. The network structure is shown in the figure.

## Preprocessing

Before an image is fed to the model for detection it must be preprocessed. In the order they appear in the code below, the main steps are:

1. Convert the color space from BGR to RGB
2. Resize the image to the model input size
3. Normalize the pixel values to [0.0, 1.0]
4. Transpose the data layout to NCHW

```python
def preprocess(self, image):
    """
    Preprocesses the input image before performing inference.

    Returns:
        image_data: Preprocessed image data ready for inference.
    """
    self.img = image

    # Get the height and width of the input image
    self.img_height, self.img_width = self.img.shape[:2]

    # Convert the image color space from BGR to RGB
    img = cv2.cvtColor(self.img, cv2.COLOR_BGR2RGB)

    # Resize the image to match the input shape
    img = cv2.resize(img, (self.inputWidth, self.inputHeight))

    # Normalize the image data by dividing it by 255.0
    image_data = np.array(img) / 255.0

    # Transpose the image to have the channel dimension first (HWC -> CHW)
    image_data = np.transpose(image_data, (2, 0, 1))

    # Expand the dimensions of the image data to match the expected NCHW input shape
    image_data = np.expand_dims(image_data, axis=0).astype(np.float32)

    # Make the array memory contiguous
    image_data = np.ascontiguousarray(image_data)

    # Return the preprocessed image data
    return image_data
```

## Inference

To run YOLOv8 inference, the model must first be parsed and compiled. For static inference, the parse_onnx function is called directly on the static model, and the input shape information is taken from the model itself. Dynamic-shape inference differs in that the maximum input shape must be supplied at parse time; this example sets it to [1, 3, 1024, 1024].

```python
import time

import cv2
import numpy as np
import migraphx


class YOLOv8:
    """YOLOv8 object detection model class for handling inference and visualization."""

    def __init__(self, model_path, dynamic=False, conf_thres=0.5, iou_thres=0.5):
        """
        Initializes an instance of the YOLOv8 class.

        Args:
            model_path: Path to the ONNX model.
            dynamic: Whether to use dynamic-shape inference.
            conf_thres: Confidence threshold for filtering detections.
            iou_thres: IoU (Intersection over Union) threshold for non-maximum suppression.
        """
        self.confThreshold = conf_thres
        self.nmsThreshold = iou_thres
        self.isDynamic = dynamic

        # Load the class names the model can detect
        self.classNames = list(map(lambda x: x.strip(), open('../Resource/Models/coco.names', 'r').readlines()))

        # Parse the inference model
        if self.isDynamic:
            # For dynamic-shape inference, specify the maximum input shape
            maxInput = {"images": [1, 3, 1024, 1024]}
            self.model = migraphx.parse_onnx(model_path, map_input_dims=maxInput)

            # Print the model's input/output node information
            print("inputs:")
            inputs = self.model.get_inputs()
            for key, value in inputs.items():
                print("{}:{}".format(key, value))
            print("outputs:")
            outputs = self.model.get_outputs()
            for key, value in outputs.items():
                print("{}:{}".format(key, value))

            # Input name of the model
            self.inputName = "images"

            # Input shape of the model
            inputShape = inputs[self.inputName].lens()
            self.inputHeight = int(inputShape[2])
            self.inputWidth = int(inputShape[3])
            print("inputName:{0} \ninputShape:{1}".format(self.inputName, inputShape))
        else:
            self.model = migraphx.parse_onnx(model_path)
            ...

        # Compile the model; device_id selects the GPU device (defaults to device 0)
        self.model.compile(t=migraphx.get_target("gpu"), device_id=0)
        print("Success to compile")
        ...
```
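As a quick illustration of the two initialization paths described above, the sketch below instantiates the class in both static and dynamic mode. This is only a usage sketch: the model path is a hypothetical placeholder for your own YOLOv8 ONNX export.

```python
# Hypothetical model path; replace it with your own YOLOv8 ONNX export.
model_path = '../Resource/Models/yolov8n.onnx'

# Static inference: the input shape is read from the ONNX file itself.
static_detector = YOLOv8(model_path, dynamic=False)

# Dynamic-shape inference: the model is parsed with the maximum input shape
# [1, 3, 1024, 1024]; the actual shape is supplied later, per detect() call.
dynamic_detector = YOLOv8(model_path, dynamic=True, conf_thres=0.5, iou_thres=0.5)
```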
Once the model is initialized, inference can begin: a forward pass over the input data produces the model output result. The detect function calls the postprocess function defined below to post-process result, obtaining the box coordinates, class confidence, and class ID of every detected object and drawing them on the input image.

```python
def detect(self, image, input_shape=None):
    # In dynamic mode the actual input size is supplied per call
    if self.isDynamic:
        self.inputWidth = input_shape[3]
        self.inputHeight = input_shape[2]

    # Preprocess the input image
    input_img = self.preprocess(image)

    # Run inference
    start = time.time()
    result = self.model.run({self.inputName: input_img})
    print('net forward time: {:.4f}'.format(time.time() - start))

    # Post-process the model output
    dstimg = self.postprocess(image, result)
    return dstimg
```

Post-processing the MIGraphX output result first filters candidate boxes against the confidence threshold confThreshold and then applies non-maximum suppression to eliminate redundant boxes. This logic is defined in the postprocess function.

```python
def postprocess(self, input_image, output):
    """
    Performs post-processing on the model's output to extract bounding boxes, scores, and class IDs.

    Args:
        input_image (numpy.ndarray): The input image.
        output (numpy.ndarray): The output of the model.

    Returns:
        numpy.ndarray: The input image with detections drawn on it.
    """
    # Transpose and squeeze the output to match the expected shape
    outputs = np.transpose(np.squeeze(output[0]))

    # Get the number of rows in the outputs array
    rows = outputs.shape[0]

    # Lists to store the bounding boxes, scores, and class IDs of the detections
    boxes = []
    scores = []
    class_ids = []

    # Calculate the scaling factors for the bounding box coordinates
    x_factor = self.img_width / self.inputWidth
    y_factor = self.img_height / self.inputHeight

    # Iterate over each row in the outputs array
    for i in range(rows):
        # Extract the class scores from the current row
        classes_scores = outputs[i][4:]

        # Find the maximum score among the class scores
        max_score = np.amax(classes_scores)

        # If the maximum score is above the confidence threshold
        if max_score >= self.confThreshold:
            # Get the class ID with the highest score
            class_id = np.argmax(classes_scores)

            # Extract the bounding box coordinates from the current row
            x, y, w, h = outputs[i][0], outputs[i][1], outputs[i][2], outputs[i][3]

            # Calculate the scaled coordinates of the bounding box
            left = int((x - w / 2) * x_factor)
            top = int((y - h / 2) * y_factor)
            width = int(w * x_factor)
            height = int(h * y_factor)

            # Add the class ID, score, and box coordinates to the respective lists
            class_ids.append(class_id)
            scores.append(max_score)
            boxes.append([left, top, width, height])

    # Apply non-maximum suppression to filter out overlapping bounding boxes
    indices = cv2.dnn.NMSBoxes(boxes, scores, self.confThreshold, self.nmsThreshold)

    # Iterate over the selected indices after non-maximum suppression
    for i in indices:
        # Get the box, score, and class ID corresponding to the index
        box = boxes[i]
        score = scores[i]
        class_id = class_ids[i]

        # Draw the detection on the input image
        self.draw_detections(input_image, box, score, class_id)

    # Return the modified input image
    return input_image
```
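To make the box rescaling above concrete, here is a small worked example of the center-format-to-corner-format conversion used in postprocess; the image size, input size, and detection values are made-up numbers for illustration.

```python
# A worked numeric sketch of the rescaling in postprocess, assuming a
# hypothetical 1280x720 source image and a 640x640 model input.
img_width, img_height = 1280, 720
input_width, input_height = 640, 640

x_factor = img_width / input_width    # 2.0
y_factor = img_height / input_height  # 1.125

# One detection in model coordinates: center (320, 320), width 100, height 50
x, y, w, h = 320.0, 320.0, 100.0, 50.0

left = int((x - w / 2) * x_factor)    # (320 - 50) * 2.0   = 540
top = int((y - h / 2) * y_factor)     # (320 - 25) * 1.125 = 331 (331.875 truncated)
width = int(w * x_factor)             # 100 * 2.0          = 200
height = int(h * y_factor)            # 50 * 1.125         = 56  (56.25 truncated)

print(left, top, width, height)       # 540 331 200 56
```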
The boxes, scores, and class_ids that survive NMS are then visualized on the original image: the position of each detected object, its class name, and its confidence score are drawn, producing the final YOLOv8 detection output.

```python
def draw_detections(self, img, box, score, class_id):
    """
    Draws bounding boxes and labels on the input image based on the detected objects.

    Args:
        img: The input image to draw detections on.
        box: Detected bounding box.
        score: Corresponding detection score.
        class_id: Class ID for the detected object.

    Returns:
        None
    """
    # Extract the coordinates of the bounding box
    x1, y1, w, h = box

    # Retrieve the color for the class ID
    color = self.color_palette[class_id]

    # Draw the bounding box on the image
    cv2.rectangle(img, (int(x1), int(y1)), (int(x1 + w), int(y1 + h)), color, 2)

    # Create the label text with class name and score
    label = f'{self.classNames[class_id]}: {score:.2f}'

    # Calculate the dimensions of the label text
    (label_width, label_height), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)

    # Calculate the position of the label text
    label_x = x1
    label_y = y1 - 10 if y1 - 10 > label_height else y1 + 10

    # Draw a filled rectangle as the background for the label text
    cv2.rectangle(img, (label_x, label_y - label_height), (label_x + label_width, label_y + label_height), color, cv2.FILLED)

    # Draw the label text on the image
    cv2.putText(img, label, (label_x, label_y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
```
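Putting the pieces together, the sketch below runs the example end to end in dynamic mode. It is a usage sketch only: the model path, image path, and output file name are hypothetical placeholders, and the per-call input shape is assumed to stay within the maximum shape [1, 3, 1024, 1024] set at parse time.

```python
import cv2

if __name__ == '__main__':
    # Instantiate the detector in dynamic-shape mode (hypothetical model path)
    detector = YOLOv8('../Resource/Models/yolov8n.onnx', dynamic=True)

    # Read a test image with OpenCV (BGR layout, as preprocess expects);
    # the path is a hypothetical placeholder
    image = cv2.imread('../Resource/Images/bus.jpg')

    # In dynamic mode, pass the per-call input shape (NCHW); it must not
    # exceed the maximum shape [1, 3, 1024, 1024] given at parse time
    dstimg = detector.detect(image, input_shape=[1, 3, 640, 640])

    # Save the image with the detection results drawn on it
    cv2.imwrite('result.jpg', dstimg)
```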