This directory features a collection of real-world applications and walkthroughs, provided as either Python files or notebooks. Explore the examples below to see how YOLO can be integrated into various applications.
We greatly appreciate contributions from the community, including examples, applications, and guides. If you'd like to contribute, please follow these guidelines:
1. **Create a pull request (PR)** with the title prefix `[Example]`, adding your new example folder to the `examples/` directory within the repository.
2. **Ensure your project adheres to the following standards:**
- Makes use of the `ultralytics` package.
- Includes a `README.md` with clear instructions for setting up and running the example.
- Avoids adding large files or dependencies unless they are absolutely necessary for the example.
- Comes with a contributor willing to provide support for the example and address related issues.
For more detailed information and guidance on contributing, please visit our [contribution documentation](https://docs.ultralytics.com/help/contributing/).
If you have any questions or concerns regarding these guidelines, feel free to open a PR or an issue in the repository, and we will assist you with the contribution process.
Make sure to replace `rtdetr-l.onnx` with the path to your RTDETR ONNX model file and `image.jpg` with the path to your input image, and adjust the confidence threshold (`conf-thres`) and IoU threshold (`iou-thres`) values as needed.
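As a hedged sketch (the script name and exact flag spellings below are assumptions, not this example's verified CLI), an invocation might look like:

```bash
# Hypothetical entry point; substitute your own model path, image path, and thresholds
python main.py --model rtdetr-l.onnx --img image.jpg --conf-thres 0.5 --iou-thres 0.5
```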
# YOLO-Series ONNXRuntime Rust Demo for Core YOLO Tasks
This repository provides a Rust demo for core YOLO-Series tasks such as `Classification`, `Segmentation`, `Detection`, `Pose Detection`, and `OBB` using ONNXRuntime. It supports various YOLO models, from YOLOv5 through YOLO11, across multiple vision tasks.
## Introduction
- This example leverages the latest versions of both ONNXRuntime and YOLO models.
- We utilize the [usls](https://github.com/jamjamjon/usls/tree/main) crate to streamline YOLO model inference, providing efficient data loading, visualization, and optimized inference performance.
## Features
- **Extensive Model Compatibility**: Supports `YOLOv5`, `YOLOv6`, `YOLOv7`, `YOLOv8`, `YOLOv9`, `YOLOv10`, `YOLO11`, `YOLO-world`, `RTDETR`, and others, covering a wide range of YOLO versions.
- **Versatile Task Coverage**: Includes `Classification`, `Segmentation`, `Detection`, `Pose`, and `OBB`.
- **Precision Flexibility**: Works with `FP16` and `FP32` ONNX models.
- **Execution Providers**: Accelerated support for `CPU`, `CUDA`, `CoreML`, and `TensorRT`.
- **Dynamic Input Shapes**: Dynamically adjusts to variable `batch`, `width`, and `height` dimensions for flexible model input.
- **Flexible Data Loading**: The `DataLoader` handles images, folders, videos, and video streams.
- **Real-Time Display and Video Export**: `Viewer` provides real-time frame visualization and video export functions, similar to OpenCV's `imshow()` and `imwrite()`.
- **Enhanced Annotation and Visualization**: The `Annotator` facilitates comprehensive result rendering, with support for bounding boxes (HBB), oriented bounding boxes (OBB), polygons, masks, keypoints, and text labels.
## Setup Instructions
### 1. ONNXRuntime Linking
<details>
<summary>You have two options to link the ONNXRuntime library:</summary>
- **Option 1: Manual Linking**
    - For detailed setup, consult the [ONNX Runtime linking documentation](https://ort.pyke.io/setup/linking).
    - **Linux or macOS**:
        1. Download the ONNX Runtime package from the [Releases page](https://github.com/microsoft/onnxruntime/releases).
        2. Set up the library path by exporting the `ORT_DYLIB_PATH` environment variable:
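For example (the library path is a placeholder for wherever you extracted the release archive):

```bash
# Point ORT_DYLIB_PATH at the ONNX Runtime shared library (.dylib on macOS)
export ORT_DYLIB_PATH=/path/to/onnxruntime/lib/libonnxruntime.so
```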
# YOLOv13 Object Detection REST API

A REST API server that detects objects in images using YOLOv13 models. Upload an image and get back detection results with bounding boxes and confidence scores.
**Key Benefits:**
- Real-time detection (~6.9 FPS with YOLOv13n)
- Multiple YOLO model support (YOLOv13, YOLOv8)
- Simple REST API interface
- Production-ready with error handling
## Quick Start
Before starting the server, make sure you have installed this extra requirement:
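The requirement itself is not reproduced here; as a hedged sketch assuming a FastAPI-style server, the install might look like the following (check the example's own requirements file for the actual packages):

```bash
# Hypothetical package names; install whatever the example's requirements actually list
pip install fastapi uvicorn python-multipart
```

Once the server is running, a detection request could look like this (the endpoint path and port are assumptions):

```bash
# Hypothetical endpoint and port; adjust to the server's actual route
curl -X POST -F "image=@image.jpg" http://localhost:8000/detect
```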
# Zero-shot Action Recognition with YOLOv8 (Inference on Video)
- Action recognition is a technique used to identify and classify actions performed by individuals in a video. This process enables more advanced analyses when multiple actions are considered. The actions can be detected and classified in real time.
- The system can be customized to recognize specific actions based on the user's preferences and requirements.
## Table of Contents
- [Step 1: Install the Required Libraries](#step-1-install-the-required-libraries)
- [Step 2: Run the Action Recognition Using Ultralytics YOLOv8](#step-2-run-the-action-recognition-using-ultralytics-yolov8)
- [Usage Options](#usage-options)
- [FAQ](#faq)
## Step 1: Install the Required Libraries
Clone the repository, install the dependencies, and `cd` into this example's directory before running the commands in Step 2, as sketched below.
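A minimal sketch, assuming the standard Ultralytics repository layout (the example folder name and requirements file name are assumptions):

```bash
# Clone the repository and move into this example's directory (folder name assumed)
git clone https://github.com/ultralytics/ultralytics
cd ultralytics/examples/YOLOv8-Action-Recognition

# Install the example's dependencies (file name assumed; at minimum the ultralytics package is needed)
pip install -r requirements.txt
```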
## Step 2: Run the Action Recognition Using Ultralytics YOLOv8
Here are the basic commands for running the inference:
### Note
The action recognition model will automatically detect and track people in the video, and classify their actions based on the specified labels. The results will be displayed in real-time on the video output. You can customize the action labels by modifying the `--labels` argument when running the script.
```bash
# Quick start
python action_recognition.py
# Basic usage
python action_recognition.py --source"https://www.youtube.com/watch?v=dQw4w9WgXcQ"--labels"dancing""singing a song"
python action_recognition.py --source"https://www.youtube.com/watch?v=dQw4w9WgXcQ"--device 0 --video-classifier-model"microsoft/xclip-base-patch32"--labels"dancing""singing a song"--fp16
```
## Usage Options
- `--weights`: Path to the YOLO model weights (default: "yolov8n.pt")
- `--device`: CUDA device, i.e. 0 or 0,1,2,3, or cpu (default: auto-detect)
- `--source`: Video file path or YouTube URL (default: "[rickroll](https://www.youtube.com/watch?v=dQw4w9WgXcQ)")
- `--output-path`: Output video file path
- `--crop-margin-percentage`: Percentage of margin to add around detected objects (default: 10)
- `--num-video-sequence-samples`: Number of video frames to use for classification (default: 8)
- `--skip-frame`: Number of frames to skip between detections (default: 1)
- `--video-cls-overlap-ratio`: Overlap ratio between video sequences (default: 0.25)
- `--fp16`: Use FP16 for inference (only for Hugging Face models)
- `--video-classifier-model`: Video classifier model name or path (default: "microsoft/xclip-base-patch32")
- `--labels`: Labels for zero-shot video classification (default: ["dancing", "singing a song"])
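For instance, to run inference on a local video and save the annotated output (the file paths here are illustrative):

```bash
# Process a local video and write the annotated result to disk
python action_recognition.py --source video.mp4 --output-path output.mp4
```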
## FAQ
**1. What Does Action Recognition Involve?**
Action recognition is a computational method used to identify and classify actions or activities performed by individuals in recorded video or real-time streams. This technique is widely used in video analysis, surveillance, and human-computer interaction, enabling the detection and understanding of human behaviors based on their motion patterns and context.
**2. Are Custom Action Labels Supported by the Action Recognition?**
Yes, custom action labels are supported by the action recognition system. The `action_recognition.py` script allows users to specify their own custom labels for zero-shot video classification. This can be done using the `--labels` argument when running the script. For example:
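```bash
# Hypothetical custom labels; replace with the actions you want to recognize
python action_recognition.py --labels "walking" "running" "jumping"
```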
You can adjust these labels to match the specific actions you want to recognize in your video. The system will then attempt to classify the detected actions based on these custom labels.
Additionally, you can choose between different video classification models:
1. For Hugging Face models, you can use any compatible video classification model. The default is set to:
- "microsoft/xclip-base-patch32"
2. For TorchVision models (no support for zero-shot labels), you can select from the following options (see the example after this list):
- "s3d"
- "r3d_18"
- "swin3d_t"
- "swin3d_b"
- "mvit_v1_b"
- "mvit_v2_s"
**3. Why Combine Action Recognition with YOLOv8?**
YOLOv8 specializes in the detection and tracking of objects in video streams. Action recognition complements this by enabling the identification and classification of actions performed by individuals, making it a valuable application of YOLOv8.
**4. Can I Employ Other YOLO Versions?**
Certainly, you have the flexibility to specify different YOLO model weights using the `--weights` option.
# YOLOv5 and YOLOv8 Inference with the OpenCV DNN API

This repository uses the OpenCV DNN API to run ONNX-exported YOLOv5 and YOLOv8 models. In theory it should also work for YOLOv6 and YOLOv7, but these have not been tested. Note that the example networks are exported with rectangular (640x480) resolutions, but any exported resolution will work. Depending on your use case, you may want to use the letterbox approach for square images.
The **main** branch version uses Qt as a GUI wrapper. The primary focus here is the **Inference** class file, which demonstrates how to transpose YOLOv8 models to work as YOLOv5 models.