
# YOLOv8 Detector

This document describes how to build a dynamic-shape YOLOv8 inference example in Python on top of MIGraphX, and how to run that example to obtain YOLOv8 object detection results.

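## Model Overview

YOLOv8 is a single-stage object detection algorithm that builds on YOLOv5 with several improvements to both speed and accuracy. The backbone and neck appear to draw on the ELAN design of YOLOv7: the C3 module of YOLOv5 is replaced with the C2f module, which has richer gradient flow, and the channel counts are tuned separately for each model scale rather than applying one set of parameters to every model. The head changes more substantially compared with YOLOv5: it adopts the now-common decoupled head, which separates the classification and detection branches, and it moves from anchor-based to anchor-free prediction. For the loss, YOLOv8 uses the TaskAlignedAssigner strategy for positive sample assignment and introduces Distribution Focal Loss. For training-time data augmentation, it adopts the YOLOX practice of disabling Mosaic augmentation during the last 10 epochs, which improves accuracy. The network structure is shown in the figure below.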

<img src=./yolov8_model.jpg style="zoom:100%;" align=middle>

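## Preprocessing

Before an image is fed to the model, it must be preprocessed, mainly by resizing and normalizing it. The steps are:

1. Convert the color space from BGR to RGB
2. Resize the image to the model input size
3. Normalize pixel values to [0.0, 1.0]
4. Convert the data layout to NCHW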

```python
def preprocess(self, image):
        """
        Preprocesses the input image before performing inference.
        Args:
            image: image to preprocess.
        Returns:
            image_data: Preprocessed image data ready for inference.
        """
        # Keep the original BGR image; the caller has already decoded it
        self.img = image

        # Get the height and width of the input image
        self.img_height, self.img_width = self.img.shape[:2]

        # Convert the image color space from BGR to RGB
        img = cv2.cvtColor(self.img, cv2.COLOR_BGR2RGB)

        # Resize the image to match the input shape
        img = cv2.resize(img, (self.inputWidth, self.inputHeight))

        # Normalize the image data by dividing it by 255.0
        image_data = np.array(img) / 255.0

        # Transpose the image to have the channel dimension as the first dimension
        image_data = np.transpose(image_data, (2, 0, 1))  # Channel first

        # Expand the dimensions of the image data to match the expected input shape
        image_data = np.expand_dims(image_data, axis=0).astype(np.float32)

        # Make the array memory contiguous
        image_data = np.ascontiguousarray(image_data)

        # Return the preprocessed image data
        return image_data
```
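
As a quick sanity check of the layout conversion, the sketch below runs the same normalize, transpose, and batch steps on a synthetic image. It assumes only NumPy: the random array stands in for a decoded, already-resized RGB image, so `cv2` is not needed, and `to_nchw` is a hypothetical helper name that is not part of the sample.

```python
import numpy as np


def to_nchw(img_hwc):
    """Convert an HxWx3 uint8 image to a 1x3xHxW float32 tensor in [0.0, 1.0]."""
    data = img_hwc.astype(np.float32) / 255.0   # normalize to [0.0, 1.0]
    data = np.transpose(data, (2, 0, 1))        # HWC -> CHW
    data = np.expand_dims(data, axis=0)         # CHW -> NCHW with N = 1
    return np.ascontiguousarray(data)           # transpose only returns a strided view


# Synthetic stand-in for a resized RGB image (height 4, width 6).
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

tensor = to_nchw(fake_img)
print(tensor.shape, tensor.dtype, tensor.flags["C_CONTIGUOUS"])
```

The final `ascontiguousarray` call matters because `np.transpose` does not move data; without it, the buffer handed to the inference engine would still be laid out as HWC in memory.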

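## Inference

To run YOLOv8 inference, the model must first be parsed and compiled. For static inference, `parse_onnx` is called directly on the static model, and the input shape is read from the parsed model. Dynamic-shape inference differs in that the maximum input shape must be specified when parsing; this example uses [1,3,1024,1024].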

```python
class YOLOv8:
    """YOLOv8 object detection model class for handling inference and visualization."""

    def __init__(self, model_path, dynamic=False, conf_thres=0.5, iou_thres=0.5):
        """
        Initializes an instance of the YOLOv8 class.

        Args:
            model_path: Path to the ONNX model.
            dynamic: whether use dynamic inference.
            conf_thres: Confidence threshold for filtering detections.
            iou_thres: IoU (Intersection over Union) threshold for non-maximum suppression.
        """
        self.confThreshold = conf_thres
        self.nmsThreshold = iou_thres
        self.isDynamic = dynamic
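        # Load the class names the model can detect
        with open('../Resource/Models/coco.names', 'r') as f:
            self.classNames = [line.strip() for line in f]

        # Parse the model
        if self.isDynamic:
            # Dynamic-shape inference: specify the maximum input shape
            maxInput = {"images": [1, 3, 1024, 1024]}
            self.model = migraphx.parse_onnx(model_path, map_input_dims=maxInput)

            # Print the model's input/output node information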
            print("inputs:")
            inputs = self.model.get_inputs()
            for key,value in inputs.items():
                print("{}:{}".format(key,value))
            
            print("outputs:")
            outputs = self.model.get_outputs()
            for key,value in outputs.items():
                print("{}:{}".format(key,value))

            # Name of the model input
            self.inputName = "images"

            # Input size of the model
            inputShape = inputs[self.inputName].lens()
            self.inputHeight = int(inputShape[2])
            self.inputWidth = int(inputShape[3])
            print("inputName:{0} \ninputShape:{1}".format(self.inputName, inputShape))
        else:
            self.model = migraphx.parse_onnx(model_path)
            ... 
        
        # Compile the model
        self.model.compile(t=migraphx.get_target("gpu"), device_id=0)  # device_id: selects the GPU device; defaults to device 0
        print("Success to compile")
        
        ...
```
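
Because the maximum input shape is fixed at parse time, it can help to validate a requested shape before passing it to `detect`. The following is a minimal sketch, not part of the sample: `check_input_shape` is a hypothetical helper, it assumes the batch size and channel count must match the maximum shape exactly, and the `stride=32` divisibility check reflects the usual largest downsampling factor of YOLOv8 detection models rather than anything stated in this tutorial.

```python
MAX_INPUT_SHAPE = [1, 3, 1024, 1024]  # maximum shape used in this tutorial


def check_input_shape(shape, max_shape=MAX_INPUT_SHAPE, stride=32):
    """Return True if an NCHW shape fits within max_shape.

    stride=32 is an assumption (the usual largest YOLOv8 downsampling
    factor); pass stride=1 to disable the divisibility check.
    """
    if len(shape) != 4:
        return False
    n, c, h, w = shape
    max_n, max_c, max_h, max_w = max_shape
    if n != max_n or c != max_c:
        return False  # assumed: batch size and channels match the maximum shape
    if not (0 < h <= max_h and 0 < w <= max_w):
        return False  # spatial size must not exceed the maximum
    return h % stride == 0 and w % stride == 0


print(check_input_shape([1, 3, 640, 640]))    # True
print(check_input_shape([1, 3, 1280, 640]))   # False: height exceeds 1024
print(check_input_shape([1, 3, 500, 640]))    # False: 500 is not a multiple of 32
```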

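Once the model is initialized, inference can begin: a forward pass over the input data produces the model output `result`. The `detect` function then calls `postprocess` on `result` to obtain the bounding box coordinates, class confidence, and class ID of each detected object, and draws them on the input image.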

```python
def detect(self, image, input_shape=None):
        if self.isDynamic:
            # Dynamic-shape inference: take the input size from the requested NCHW shape
            self.inputWidth = input_shape[3]
            self.inputHeight = input_shape[2]
        # Preprocess the input image
        input_img = self.preprocess(image)

        # Run inference
        start = time.time()
        result = self.model.run({self.inputName: input_img})
        print('net forward time: {:.4f}'.format(time.time() - start))
        # Post-process the model output
        dstimg = self.postprocess(image, result)

        return dstimg
```
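
With dynamic shapes, the number of candidate boxes in the output changes with the input size. The sketch below estimates the expected output shape under assumptions that are not stated in this tutorial: a standard YOLOv8 detection head with strides 8, 16, and 32, one prediction per grid cell, 80 COCO classes, and an output laid out as (1, 4 + classes, predictions). `expected_output_shape` is a hypothetical helper name.

```python
def expected_output_shape(height, width, num_classes=80, strides=(8, 16, 32)):
    """Estimate the raw output shape for a given input size.

    Assumes one prediction per grid cell at each stride and an output
    layout of (batch, 4 box values + class scores, predictions).
    """
    predictions = sum((height // s) * (width // s) for s in strides)
    return (1, 4 + num_classes, predictions)


print(expected_output_shape(640, 640))     # (1, 84, 8400)
print(expected_output_shape(1024, 1024))   # (1, 84, 21504)
```

Under these assumptions, the transpose in `postprocess` turns the output into one row per candidate box, with the first four values holding the box and the remaining values holding the class scores.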

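Post-processing the MIGraphX output `result` first filters candidates by the confidence threshold `confThreshold`, then applies non-maximum suppression (NMS) to remove redundant boxes. Both steps are implemented in the `postprocess` function.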

```python
def postprocess(self, input_image, output):
        """
        Performs post-processing on the model's output to extract bounding boxes, scores, and class IDs.

        Args:
            input_image (numpy.ndarray): The input image.
            output (list): The model outputs returned by `self.model.run`; only the first output is used.

        Returns:
            numpy.ndarray: The input image with detections drawn on it.
        """

        # Transpose and squeeze the output to match the expected shape
        outputs = np.transpose(np.squeeze(output[0]))

        # Get the number of rows in the outputs array
        rows = outputs.shape[0]

        # Lists to store the bounding boxes, scores, and class IDs of the detections
        boxes = []
        scores = []
        class_ids = []

        # Calculate the scaling factors for the bounding box coordinates
        x_factor = self.img_width / self.inputWidth
        y_factor = self.img_height / self.inputHeight

        # Iterate over each row in the outputs array
        for i in range(rows):
            # Extract the class scores from the current row
            classes_scores = outputs[i][4:]

            # Find the maximum score among the class scores
            max_score = np.amax(classes_scores)

            # If the maximum score is above the confidence threshold
            if max_score >= self.confThreshold:
                # Get the class ID with the highest score
                class_id = np.argmax(classes_scores)

                # Extract the bounding box coordinates from the current row
                x, y, w, h = outputs[i][0], outputs[i][1], outputs[i][2], outputs[i][3]

                # Calculate the scaled coordinates of the bounding box
                left = int((x - w / 2) * x_factor)
                top = int((y - h / 2) * y_factor)
                width = int(w * x_factor)
                height = int(h * y_factor)

                # Add the class ID, score, and box coordinates to the respective lists
                class_ids.append(class_id)
                scores.append(max_score)
                boxes.append([left, top, width, height])

        # Apply non-maximum suppression to filter out overlapping bounding boxes
        indices = cv2.dnn.NMSBoxes(boxes, scores, self.confThreshold, self.nmsThreshold)

        # Iterate over the selected indices after non-maximum suppression
        # (flatten first: some OpenCV versions return an Nx1 array)
        for i in np.array(indices).flatten():
            # Get the box, score, and class ID corresponding to the index
            box = boxes[i]
            score = scores[i]
            class_id = class_ids[i]

            # Draw the detection on the input image
            self.draw_detections(input_image, box, score, class_id)

        # Return the modified input image
        return input_image
```
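
To make the suppression step concrete, here is a simplified pure-Python sketch of greedy NMS over boxes in the same `[left, top, width, height]` format. It illustrates the idea only and is not OpenCV's implementation of `cv2.dnn.NMSBoxes`; `iou` and `greedy_nms` are hypothetical helper names.

```python
def iou(box_a, box_b):
    """Intersection over Union of two [left, top, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_thres, iou_thres):
    """Return the indices of the kept boxes, highest score first."""
    # Drop low-scoring boxes, then visit the rest in descending score order.
    candidates = [i for i, s in enumerate(scores) if s > score_thres]
    candidates.sort(key=lambda i: scores[i], reverse=True)

    keep = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [200, 200, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, score_thres=0.5, iou_thres=0.5))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.92, so it is suppressed, while the third box does not overlap either and is kept.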

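Using the boxes, scores, and class_ids that survive NMS, the results are visualized on the original image by drawing each detected object's location, class, and confidence score, which yields the final YOLOv8 detection output.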

```python
def draw_detections(self, img, box, score, class_id):
        """
        Draws bounding boxes and labels on the input image based on the detected objects.

        Args:
            img: The input image to draw detections on.
            box: Detected bounding box.
            score: Corresponding detection score.
            class_id: Class ID for the detected object.

        Returns:
            None
        """

        # Extract the coordinates of the bounding box
        x1, y1, w, h = box

        # Retrieve the color for the class ID
        color = self.color_palette[class_id]

        # Draw the bounding box on the image
        cv2.rectangle(img, (int(x1), int(y1)), (int(x1 + w), int(y1 + h)), color, 2)

        # Create the label text with class name and score
        label = f'{self.classNames[class_id]}: {score:.2f}'

        # Calculate the dimensions of the label text
        (label_width, label_height), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)

        # Calculate the position of the label text
        label_x = x1
        label_y = y1 - 10 if y1 - 10 > label_height else y1 + 10

        # Draw a filled rectangle as the background for the label text
        cv2.rectangle(img, (label_x, label_y - label_height), (label_x + label_width, label_y + label_height), color,
                      cv2.FILLED)

        # Draw the label text on the image
        cv2.putText(img, label, (label_x, label_y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
```
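
`draw_detections` reads `self.color_palette`, whose definition is not shown in this document. One possible way to build such a palette is sketched below; `make_color_palette` and its fixed seed are assumptions made for illustration, and the sample may construct its palette differently.

```python
import random


def make_color_palette(num_classes, seed=0):
    """Return one (B, G, R) tuple of ints in [0, 255] per class.

    A fixed seed keeps the colors stable between runs.
    """
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 255) for _ in range(3))
            for _ in range(num_classes)]


# Hypothetical usage: one color per entry in the class name list.
palette = make_color_palette(80)
print(len(palette), palette[0])
```

Plain integer tuples are used here because OpenCV drawing functions accept them directly as colors.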