"docs/source/vscode:/vscode.git/clone" did not exist on "a3febc061b867a30ffa80cbc7e394cb0208e7346"
Commit af155c51 authored by chenzk's avatar chenzk
Browse files

v1.0

parents
Pipeline #2732 failed with stages
in 0 seconds
# This CITATION.cff file was generated with https://bit.ly/cffinit
cff-version: 1.2.0
title: Ultralytics YOLO
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Glenn
    family-names: Jocher
    affiliation: Ultralytics
    orcid: 'https://orcid.org/0000-0001-5950-6979'
  - family-names: Qiu
    given-names: Jing
    affiliation: Ultralytics
    orcid: 'https://orcid.org/0000-0003-3783-7069'
  - given-names: Ayush
    family-names: Chaurasia
    affiliation: Ultralytics
    orcid: 'https://orcid.org/0000-0002-7603-6750'
repository-code: 'https://github.com/ultralytics/ultralytics'
url: 'https://ultralytics.com'
license: AGPL-3.0
version: 8.0.0
date-released: '2023-01-10'
---
comments: true
description: Learn how to contribute to Ultralytics YOLO open-source repositories. Follow guidelines for pull requests, code of conduct, and bug reporting.
keywords: Ultralytics, YOLO, open-source, contribution, pull request, code of conduct, bug reporting, GitHub, CLA, Google-style docstrings
---
# Contributing to Ultralytics Open-Source Projects
Welcome! We're thrilled that you're considering contributing to our [Ultralytics](https://www.ultralytics.com/) [open-source](https://github.com/ultralytics) projects. Your involvement not only helps enhance the quality of our repositories but also benefits the entire community. This guide provides clear guidelines and best practices to help you get started.
<a href="https://github.com/ultralytics/ultralytics/graphs/contributors">
<img width="100%" src="https://github.com/ultralytics/assets/raw/main/im/image-contributors.png" alt="Ultralytics open-source contributors"></a>
## Table of Contents
1. [Code of Conduct](#code-of-conduct)
2. [Contributing via Pull Requests](#contributing-via-pull-requests)
- [CLA Signing](#cla-signing)
- [Google-Style Docstrings](#google-style-docstrings)
- [GitHub Actions CI Tests](#github-actions-ci-tests)
3. [Reporting Bugs](#reporting-bugs)
4. [License](#license)
5. [Conclusion](#conclusion)
6. [FAQ](#faq)
## Code of Conduct
To ensure a welcoming and inclusive environment for everyone, all contributors must adhere to our [Code of Conduct](https://docs.ultralytics.com/help/code_of_conduct/). Respect, kindness, and professionalism are at the heart of our community.
## Contributing via Pull Requests
We greatly appreciate contributions in the form of pull requests. To make the review process as smooth as possible, please follow these steps; a command-line sketch of the same workflow follows the list:
1. **[Fork the repository](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo):** Start by forking the Ultralytics YOLO repository to your GitHub account.
2. **[Create a branch](https://docs.github.com/en/desktop/making-changes-in-a-branch/managing-branches-in-github-desktop):** Create a new branch in your forked repository with a clear, descriptive name that reflects your changes.
3. **Make your changes:** Ensure your code adheres to the project's style guidelines and does not introduce any new errors or warnings.
4. **[Test your changes](https://github.com/ultralytics/ultralytics/tree/main/tests):** Before submitting, test your changes locally to confirm they work as expected and don't cause any new issues.
5. **[Commit your changes](https://docs.github.com/en/desktop/making-changes-in-a-branch/committing-and-reviewing-changes-to-your-project-in-github-desktop):** Commit your changes with a concise and descriptive commit message. If your changes address a specific issue, include the issue number in your commit message.
6. **[Create a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request):** Submit a pull request from your forked repository to the main Ultralytics YOLO repository. Provide a clear and detailed explanation of your changes and how they improve the project.
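Below is a minimal command-line sketch of this workflow; the fork URL, branch name, and commit message are illustrative placeholders.

```bash
# 1. Fork ultralytics/ultralytics on GitHub, then clone your fork (replace <your-username>)
git clone https://github.com/<your-username>/ultralytics.git
cd ultralytics

# 2. Create a clearly named branch for your change (name is illustrative)
git checkout -b docs-fix-typo

# 3-4. Edit the code, test locally, then stage and commit with a concise message
git add .
git commit -m "Fix typo in contributing docs"

# 5-6. Push the branch to your fork, then open a pull request against ultralytics/ultralytics on GitHub
git push origin docs-fix-typo
```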
### CLA Signing
Before we can merge your pull request, you must sign our [Contributor License Agreement (CLA)](https://docs.ultralytics.com/help/CLA/). This legal agreement ensures that your contributions are properly licensed, allowing the project to continue being distributed under the AGPL-3.0 license.
After submitting your pull request, the CLA bot will guide you through the signing process. To sign the CLA, simply add a comment in your PR stating:
```
I have read the CLA Document and I sign the CLA
```
### Google-Style Docstrings
When adding new functions or classes, please include [Google-style docstrings](https://google.github.io/styleguide/pyguide.html). These docstrings provide clear, standardized documentation that helps other developers understand and maintain your code.
#### Example
This example illustrates a Google-style docstring. Ensure that both input and output `types` are always enclosed in parentheses, e.g., `(bool)`.
```python
def example_function(arg1, arg2=4):
    """
    Example function demonstrating Google-style docstrings.

    Args:
        arg1 (int): The first argument.
        arg2 (int): The second argument, with a default value of 4.

    Returns:
        (bool): True if successful, False otherwise.

    Examples:
        >>> result = example_function(1, 2)  # returns False
    """
    if arg1 == arg2:
        return True
    return False
```
#### Example with type hints
This example includes both a Google-style docstring and type hints for arguments and returns, though using either independently is also acceptable.
```python
def example_function(arg1: int, arg2: int = 4) -> bool:
    """
    Example function demonstrating Google-style docstrings.

    Args:
        arg1: The first argument.
        arg2: The second argument, with a default value of 4.

    Returns:
        True if successful, False otherwise.

    Examples:
        >>> result = example_function(1, 2)  # returns False
    """
    if arg1 == arg2:
        return True
    return False
```
#### Example Single-line
For smaller or simpler functions, a single-line docstring may be sufficient. The docstring must use three double-quotes, be a complete sentence, start with a capital letter, and end with a period.
```python
def example_small_function(arg1: int, arg2: int = 4) -> bool:
    """Example function with a single-line docstring."""
    return arg1 == arg2
```
### GitHub Actions CI Tests
All pull requests must pass the GitHub Actions [Continuous Integration](https://docs.ultralytics.com/help/CI/) (CI) tests before they can be merged. These tests include linting, unit tests, and other checks to ensure that your changes meet the project's quality standards. Review the CI output and address any issues that arise.
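As a rough local preview of the CI checks, you can lint and test before pushing. This is only a sketch; the authoritative tools, versions, and settings are defined in the repository's GitHub Actions workflows, and the `ruff`/`pytest` commands below are assumptions.

```bash
# Install the package in editable mode plus typical check tools (assumed toolchain; see .github/workflows for the source of truth)
pip install -e .
pip install pytest ruff

# Lint the code and run the test suite locally before opening or updating a PR
ruff check .
pytest tests/
```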
## Reporting Bugs
We highly value bug reports as they help us maintain the quality of our projects. When reporting a bug, please provide a [Minimum Reproducible Example](https://docs.ultralytics.com/help/minimum_reproducible_example/)—a simple, clear code example that consistently reproduces the issue. This allows us to quickly identify and resolve the problem.
## License
Ultralytics uses the [GNU Affero General Public License v3.0 (AGPL-3.0)](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) for its repositories. This license promotes openness, transparency, and collaborative improvement in software development. It ensures that all users have the freedom to use, modify, and share the software, fostering a strong community of collaboration and innovation.
We encourage all contributors to familiarize themselves with the terms of the AGPL-3.0 license to contribute effectively and ethically to the Ultralytics open-source community.
## Conclusion
Thank you for your interest in contributing to [Ultralytics](https://www.ultralytics.com/) [open-source](https://github.com/ultralytics) YOLO projects. Your participation is essential in shaping the future of our software and building a vibrant community of innovation and collaboration. Whether you're enhancing code, reporting bugs, or suggesting new features, your contributions are invaluable.
We're excited to see your ideas come to life and appreciate your commitment to advancing object detection technology. Together, let's continue to grow and innovate in this exciting open-source journey. Happy coding! 🚀🌟
## FAQ
### Why should I contribute to Ultralytics YOLO open-source repositories?
Contributing to Ultralytics YOLO open-source repositories improves the software, making it more robust and feature-rich for the entire community. Contributions can include code enhancements, bug fixes, documentation improvements, and new feature implementations. Additionally, contributing allows you to collaborate with other skilled developers and experts in the field, enhancing your own skills and reputation. For details on how to get started, refer to the [Contributing via Pull Requests](#contributing-via-pull-requests) section.
### How do I sign the Contributor License Agreement (CLA) for Ultralytics YOLO?
To sign the Contributor License Agreement (CLA), follow the instructions provided by the CLA bot after submitting your pull request. This process ensures that your contributions are properly licensed under the AGPL-3.0 license, maintaining the legal integrity of the open-source project. Add a comment in your pull request stating:
```
I have read the CLA Document and I sign the CLA.
```
For more information, see the [CLA Signing](#cla-signing) section.
### What are Google-style docstrings, and why are they required for Ultralytics YOLO contributions?
Google-style docstrings provide clear, concise documentation for functions and classes, improving code readability and maintainability. These docstrings outline the function's purpose, arguments, and return values with specific formatting rules. When contributing to Ultralytics YOLO, following Google-style docstrings ensures that your additions are well-documented and easily understood. For examples and guidelines, visit the [Google-Style Docstrings](#google-style-docstrings) section.
### How can I ensure my changes pass the GitHub Actions CI tests?
Before your pull request can be merged, it must pass all GitHub Actions Continuous Integration (CI) tests. These tests include linting, unit tests, and other checks to ensure the code meets
the project's quality standards. Review the CI output and fix any issues. For detailed information on the CI process and troubleshooting tips, see the [GitHub Actions CI Tests](#github-actions-ci-tests) section.
### How do I report a bug in Ultralytics YOLO repositories?
To report a bug, provide a clear and concise [Minimum Reproducible Example](https://docs.ultralytics.com/help/minimum_reproducible_example/) along with your bug report. This helps developers quickly identify and fix the issue. Ensure your example is minimal yet sufficient to replicate the problem. For more detailed steps on reporting bugs, refer to the [Reporting Bugs](#reporting-bugs) section.
# YOLOE
YOLOE achieves real-time perception of anything across multiple open prompt mechanisms, without being restricted to predefined categories.
## Paper
`YOLOE: Real-Time Seeing Anything`
- https://arxiv.org/pdf/2503.07465
## Model Structure
YOLOE adopts the typical YOLO architecture. It supports text prompts via RepRTA, visual prompts via SAVPE, and the prompt-free scenario via LRPC. In the YOLO classification head, the output channels of the last convolutional layer are changed from the number of classes in the closed-set setting to the embedding dimension, so that text and visual prompts can be supported.
<div align=center>
<img src="./doc/yoloe.png"/>
</div>
## Algorithm
Visual prompts indicate the object categories of interest through visual cues (anchor points).
<div align=center>
<img src="./doc/SAVPE.png"/>
</div>
## Environment Setup
```
mv yoloe_pytorch yoloe  # drop the framework suffix from the directory name
```
### Docker (Option 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy
# Replace <your IMAGE ID> with the ID of the image pulled above; for this image it is 6063b673703a
docker run -it --shm-size=64G -v $PWD/yoloe:/home/yoloe -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name ye <your IMAGE ID> bash
cd /home/yoloe
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
```
### Dockerfile (Option 2)
```
cd /home/yoloe/docker
docker build --no-cache -t ye:latest .
docker run --shm-size=64G --name ye -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v $PWD/../../yoloe:/home/yoloe -it ye bash
# If building the environment through the Dockerfile takes too long, comment out the pip install step inside it and install the Python packages after starting the container: pip install -r requirements.txt
```
### Anaconda (Option 3)
1. The special deep learning libraries required by this project for DCU GPUs can be downloaded and installed from the SourceFind developer community:
- https://developer.sourcefind.cn/tool/
```
DTK driver: dtk2504
python:python3.10
torch:2.4.1
torchvision:0.19.1
triton:3.0.0
vllm:0.6.2
flash-attn:2.6.1
deepspeed:0.14.2
apex:1.4.0
transformers:4.46.3
```
`Tips: the DTK driver, python, torch, and other DCU-related tool versions above must match one another exactly.`
2. Install the other, non-special libraries according to requirements.txt
```
cd /home/yoloe
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
```
## Dataset
`None`
## Training
## Inference
Pretrained weight directory structure:
```
/home/yoloe/
├── pretrain/yoloe-v8l-seg.pt
└── mobileclip_blt.pt
```
### Single Node, Single Card
```
cd /home/yoloe
export HF_ENDPOINT=https://hf-mirror.com
sh infer.sh
```
For more details, refer to [`README_origin`](./README_origin.md) from the upstream project.
## Results
`Input:`
```
--source ultralytics/assets/bus.jpg
--names person dog cat
```
`Output:`
```
ultralytics/assets/bus-output.jpg
```
<div align=center>
<img src="./doc/bus-output.png"/>
</div>
### Accuracy
DCU accuracy is consistent with GPU; inference framework: PyTorch.
## Application Scenarios
### Algorithm Category
`Object detection`
### Key Application Industries
`Manufacturing, e-commerce, healthcare, energy, education`
## Pretrained Weights
GitHub/Hugging Face download links: [yoloe-v8l-seg](https://github.com/ultralytics/assets/releases/download/v8.3.0/yoloe-v8l-seg.pt) and [mobileclip_blt](https://huggingface.co/apple/MobileCLIP-B-LT/blob/main/mobileclip_blt.pt)
GitHub-hosted weights can be downloaded with wget through a mirror, e.g. https://bgithub.xyz.
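For example (a sketch only; it assumes the mirror exposes the same path layout as github.com):
```
# Download the detection weights into pretrain/ via the GitHub mirror
wget https://bgithub.xyz/ultralytics/assets/releases/download/v8.3.0/yoloe-v8l-seg.pt -P pretrain/
```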
## Source Repository and Issue Reporting
- http://developer.sourcefind.cn/codes/modelzoo/yoloe_pytorch.git
## References
- https://github.com/THU-MIG/yoloe.git
# [YOLOE: Real-Time Seeing Anything](https://arxiv.org/abs/2503.07465)
Official PyTorch implementation of **YOLOE**.
<p align="center">
<img src="figures/comparison.svg" width=70%> <br>
Comparison of performance, training cost, and inference efficiency between YOLOE (Ours) and YOLO-Worldv2 in terms of open text prompts.
</p>
[YOLOE: Real-Time Seeing Anything](https://arxiv.org/abs/2503.07465).\
Ao Wang*, Lihao Liu*, Hui Chen, Zijia Lin, Jungong Han, and Guiguang Ding\
[![arXiv](https://img.shields.io/badge/arXiv-2503.07465-b31b1b.svg)](https://arxiv.org/abs/2503.07465) [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/jameslahm/yoloe/tree/main) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/jameslahm/yoloe) [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/zero-shot-object-detection-and-segmentation-with-yoloe.ipynb) [![Hugging Face Collection](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Collection-blue)](https://huggingface.co/collections/jameslahm/yoloe-67d5110aabaefbe129c15917) [![Openbayes Demo](https://img.shields.io/static/v1?label=Demo&message=OpenBayes%E8%B4%9D%E5%BC%8F%E8%AE%A1%E7%AE%97&color=green)](https://openbayes.com/console/public/tutorials/BQhUorEqyVX)
We introduce **YOLOE(ye)**, a highly **efficient**, **unified**, and **open** object detection and segmentation model that, like the human eye, works under different prompt mechanisms, such as *texts*, *visual inputs*, and the *prompt-free paradigm*, with **zero inference and transferring overhead** compared with closed-set YOLOs.
<!-- <p align="center">
<img src="figures/pipeline.svg" width=96%> <br>
</p> -->
<p align="center">
<img src="figures/visualization.svg" width=96%> <br>
</p>
<details>
<summary>
<font size="+1">Abstract</font>
</summary>
Object detection and segmentation are widely employed in computer vision applications, yet conventional models like YOLO series, while efficient and accurate, are limited by predefined categories, hindering adaptability in open scenarios. Recent open-set methods leverage text prompts, visual cues, or prompt-free paradigm to overcome this, but often compromise between performance and efficiency due to high computational demands or deployment complexity. In this work, we introduce YOLOE, which integrates detection and segmentation across diverse open prompt mechanisms within a single highly efficient model, achieving real-time seeing anything. For text prompts, we propose Re-parameterizable Region-Text Alignment (RepRTA) strategy. It refines pretrained textual embeddings via a re-parameterizable lightweight auxiliary network and enhances visual-textual alignment with zero inference and transferring overhead. For visual prompts, we present Semantic-Activated Visual Prompt Encoder (SAVPE). It employs decoupled semantic and activation branches to bring improved visual embedding and accuracy with minimal complexity. For prompt-free scenario, we introduce Lazy Region-Prompt Contrast (LRPC) strategy. It utilizes a built-in large vocabulary and specialized embedding to identify all objects, avoiding costly language model dependency. Extensive experiments show YOLOE's exceptional zero-shot performance and transferability with high inference efficiency and low training cost. Notably, on LVIS, with $3\times$ less training cost and $1.4\times$ inference speedup, YOLOE-v8-S surpasses YOLO-Worldv2-S by 3.5 AP. When transferring to COCO, YOLOE-v8-L achieves 0.6 $AP^b$ and 0.4 $AP^m$ gains over closed-set YOLOv8-L with nearly $4\times$ less training time.
</details>
<p></p>
<p align="center">
<img src="figures/pipeline.svg" width=96%> <br>
</p>
## Performance
### Zero-shot detection evaluation
- *Fixed AP* is reported on LVIS `minival` set with text (T) / visual (V) prompts.
- Training time is measured for text-prompt detection training on 8 Nvidia RTX 4090 GPUs.
- FPS is measured on T4 with TensorRT and iPhone 12 with CoreML, respectively.
- For training data, OG denotes Objects365v1 and GoldG.
- YOLOE can become YOLOs after re-parameterization with **zero inference and transferring overhead**.
| Model | Size | Prompt | Params | Data | Time | FPS | $AP$ | $AP_r$ | $AP_c$ | $AP_f$ | Log |
|---|---|---|---|---|---|---|---|---|---|---|---|
| [YOLOE-v8-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8s-seg.pt) | 640 | T / V | 12M / 13M | OG | 12.0h | 305.8 / 64.3 | 27.9 / 26.2 | 22.3 / 21.3 | 27.8 / 27.7 | 29.0 / 25.7 | [T](./logs/yoloe-v8s-seg) / [V](./logs/yoloe-v8s-seg-vp) |
| [YOLOE-v8-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8m-seg.pt) | 640 | T / V | 27M / 30M | OG | 17.0h | 156.7 / 41.7 | 32.6 / 31.0 | 26.9 / 27.0 | 31.9 / 31.7 | 34.4 / 31.1 | [T](./logs/yoloe-v8m-seg) / [V](./logs/yoloe-v8m-seg-vp) |
| [YOLOE-v8-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8l-seg.pt) | 640 | T / V | 45M / 50M | OG | 22.5h | 102.5 / 27.2 | 35.9 / 34.2 | 33.2 / 33.2 | 34.8 / 34.6 | 37.3 / 34.1 | [T](./logs/yoloe-v8l-seg) / [V](./logs/yoloe-v8l-seg-vp) |
| [YOLOE-11-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11s-seg.pt) | 640 | T / V | 10M / 12M | OG | 13.0h | 301.2 / 73.3 | 27.5 / 26.3 | 21.4 / 22.5 | 26.8 / 27.1 | 29.3 / 26.4 | [T](./logs/yoloe-11s-seg) / [V](./logs/yoloe-11s-seg-vp) |
| [YOLOE-11-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11m-seg.pt) | 640 | T / V | 21M / 27M | OG | 18.5h | 168.3 / 39.2 | 33.0 / 31.4 | 26.9 / 27.1 | 32.5 / 31.9 | 34.5 / 31.7 | [T](./logs/yoloe-11m-seg) / [V](./logs/yoloe-11m-seg-vp) |
| [YOLOE-11-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11l-seg.pt) | 640 | T / V | 26M / 32M | OG | 23.5h | 130.5 / 35.1 | 35.2 / 33.7 | 29.1 / 28.1 | 35.0 / 34.6 | 36.5 / 33.8 | [T](./logs/yoloe-11l-seg) / [V](./logs/yoloe-11l-seg-vp) |
### Zero-shot segmentation evaluation
- The model is the same as above in [Zero-shot detection evaluation](#zero-shot-detection-evaluation).
- *Standard AP<sup>m</sup>* is reported on LVIS `val` set with text (T) / visual (V) prompts.
| Model | Size | Prompt | $AP^m$ | $AP_r^m$ | $AP_c^m$ | $AP_f^m$ |
|---|---|---|---|---|---|---|
| YOLOE-v8-S | 640 | T / V | 17.7 / 16.8 | 15.5 / 13.5 | 16.3 / 16.7 | 20.3 / 18.2 |
| YOLOE-v8-M | 640 | T / V | 20.8 / 20.3 | 17.2 / 17.0 | 19.2 / 20.1 | 24.2 / 22.0 |
| YOLOE-v8-L | 640 | T / V | 23.5 / 22.0 | 21.9 / 16.5 | 21.6 / 22.1 | 26.4 / 24.3 |
| YOLOE-11-S | 640 | T / V | 17.6 / 17.1 | 16.1 / 14.4 | 15.6 / 16.8 | 20.5 / 18.6 |
| YOLOE-11-M | 640 | T / V | 21.1 / 21.0 | 17.2 / 18.3 | 19.6 / 20.6 | 24.4 / 22.6 |
| YOLOE-11-L | 640 | T / V | 22.6 / 22.5 | 19.3 / 20.5 | 20.9 / 21.7 | 26.0 / 24.1 |
### Prompt-free evaluation
- The model is the same as above in [Zero-shot detection evaluation](#zero-shot-detection-evaluation) except the specialized prompt embedding.
- *Fixed AP* is reported on LVIS `minival` set and FPS is measured on an Nvidia T4 GPU with PyTorch.
| Model | Size | Params | $AP$ | $AP_r$ | $AP_c$ | $AP_f$ | FPS | Log |
|---|---|---|---|---|---|---|---|---|
| [YOLOE-v8-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8s-seg-pf.pt) | 640 | 13M | 21.0 | 19.1 | 21.3 | 21.0 | 95.8 | [PF](./logs/yoloe-v8s-seg-pf/) |
| [YOLOE-v8-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8m-seg-pf.pt) | 640 | 29M | 24.7 | 22.2 | 24.5 | 25.3 | 45.9 | [PF](./logs/yoloe-v8m-seg-pf/) |
| [YOLOE-v8-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8l-seg-pf.pt) | 640 | 47M | 27.2 | 23.5 | 27.0 | 28.0 | 25.3 | [PF](./logs/yoloe-v8l-seg-pf/) |
| [YOLOE-11-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11s-seg-pf.pt) | 640 | 11M | 20.6 | 18.4 | 20.2 | 21.3 | 93.0 | [PF](./logs/yoloe-11s-seg-pf/) |
| [YOLOE-11-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11m-seg-pf.pt) | 640 | 24M | 25.5 | 21.6 | 25.5 | 26.1 | 42.5 | [PF](./logs/yoloe-11m-seg-pf/) |
| [YOLOE-11-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11l-seg-pf.pt) | 640 | 29M | 26.3 | 22.7 | 25.8 | 27.5 | 34.9 | [PF](./logs/yoloe-11l-seg-pf/) |
### Downstream transfer on COCO
- During transferring, YOLOE-v8 / YOLOE-11 is **exactly the same** as YOLOv8 / YOLO11.
- For *Linear probing*, only the last conv in the classification head is trainable.
- For *Full tuning*, all parameters are trainable.
| Model | Size | Epochs | $AP^b$ | $AP^b_{50}$ | $AP^b_{75}$ | $AP^m$ | $AP^m_{50}$ | $AP^m_{75}$ | Log |
|---|---|---|---|---|---|---|---|---|---|
| Linear probing | | | | | | | | | |
| [YOLOE-v8-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8s-seg-coco-pe.pt) | 640 | 10 | 35.6 | 51.5 | 38.9 | 30.3 | 48.2 | 32.0 | [LP](./logs/yoloe-v8s-seg-coco-pe/) |
| [YOLOE-v8-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8m-seg-coco-pe.pt) | 640 | 10 | 42.2 | 59.2 | 46.3 | 35.5 | 55.6 | 37.7 | [LP](./logs/yoloe-v8m-seg-coco-pe/) |
| [YOLOE-v8-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8l-seg-coco-pe.pt) | 640 | 10 | 45.4 | 63.3 | 50.0 | 38.3 | 59.6 | 40.8 | [LP](./logs/yoloe-v8l-seg-coco-pe/) |
| [YOLOE-11-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11s-seg-coco-pe.pt) | 640 | 10 | 37.0 | 52.9 | 40.4 | 31.5 | 49.7 | 33.5 | [LP](./logs/yoloe-11s-seg-coco-pe/) |
| [YOLOE-11-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11m-seg-coco-pe.pt) | 640 | 10 | 43.1 | 60.6 | 47.4 | 36.5 | 56.9 | 39.0 | [LP](./logs/yoloe-11m-seg-coco-pe/) |
| [YOLOE-11-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11l-seg-coco-pe.pt) | 640 | 10 | 45.1 | 62.8 | 49.5 | 38.0 | 59.2 | 40.6 | [LP](./logs/yoloe-11l-seg-coco-pe/) |
| Full tuning | | | | | | | | | |
| [YOLOE-v8-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8s-seg-coco.pt) | 640 | 160 | 45.0 | 61.6 | 49.1 | 36.7 | 58.3 | 39.1 | [FT](./logs/yoloe-v8s-seg-coco/) |
| [YOLOE-v8-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8m-seg-coco.pt) | 640 | 80 | 50.4 | 67.0 | 55.2 | 40.9 | 63.7 | 43.5 | [FT](./logs/yoloe-v8m-seg-coco/) |
| [YOLOE-v8-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8l-seg-coco.pt) | 640 | 80 | 53.0 | 69.8 | 57.9 | 42.7 | 66.5 | 45.6 | [FT](./logs/yoloe-v8l-seg-coco/) |
| [YOLOE-11-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11s-seg-coco.pt) | 640 | 160 | 46.2 | 62.9 | 50.0 | 37.6 | 59.3 | 40.1 | [FT](./logs/yoloe-11s-seg-coco/) |
| [YOLOE-11-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11m-seg-coco.pt) | 640 | 80 | 51.3 | 68.3 | 56.0 | 41.5 | 64.8 | 44.3 | [FT](./logs/yoloe-11m-seg-coco/) |
| [YOLOE-11-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11l-seg-coco.pt) | 640 | 80 | 52.6 | 69.7 | 57.5 | 42.4 | 66.2 | 45.2 | [FT](./logs/yoloe-11l-seg-coco/) |
## Installation
You can also quickly try YOLOE for [prediction](https://colab.research.google.com/drive/1LRFEVarAIVSnIeL_pCPtsFL87FsEe46U?usp=sharing) and [transferring](https://colab.research.google.com/drive/1y-r4y_owfFAfyqbqP2t64H7IqjURkKwe?usp=sharing) using the Colab notebooks.
A `conda` virtual environment is recommended.
```bash
conda create -n yoloe python=3.10 -y
conda activate yoloe
# If you clone this repo, please use this
pip install -r requirements.txt
# Or you can also directly install the repo by this
pip install git+https://github.com/THU-MIG/yoloe.git#subdirectory=third_party/CLIP
pip install git+https://github.com/THU-MIG/yoloe.git#subdirectory=third_party/ml-mobileclip
pip install git+https://github.com/THU-MIG/yoloe.git#subdirectory=third_party/lvis-api
pip install git+https://github.com/THU-MIG/yoloe.git
wget https://docs-assets.developer.apple.com/ml-research/datasets/mobileclip/mobileclip_blt.pt
```
## Demo
If the desired objects are not identified, please set a **smaller** confidence threshold, e.g., for visual prompts with handcrafted shapes or cross-image prompts.
```bash
# Optional for mirror: export HF_ENDPOINT=https://hf-mirror.com
pip install gradio==4.42.0 gradio_image_prompter==0.1.0 fastapi==0.112.2 huggingface-hub==0.26.3 gradio_client==1.3.0 pydantic==2.10.6
python app.py
# Please visit http://127.0.0.1:7860
```
## Prediction
```bash
# Download pretrained models
# Optional for mirror: export HF_ENDPOINT=https://hf-mirror.com
# Please replace the pt file with your desired model
pip install huggingface-hub==0.26.3
huggingface-cli download jameslahm/yoloe yoloe-v8l-seg.pt --local-dir pretrain
```
For yoloe-(v8s/m/l)/(11s/m/l)-seg, models can also be downloaded automatically using `from_pretrained`.
```python
from ultralytics import YOLOE
model = YOLOE.from_pretrained("jameslahm/yoloe-v8l-seg")
```
### Text prompt
```bash
python predict_text_prompt.py \
--source ultralytics/assets/bus.jpg \
--checkpoint pretrain/yoloe-v8l-seg.pt \
--names person dog cat \
--device cuda:0
```
### Visual prompt
```bash
python predict_visual_prompt.py
```
### Prompt free
```bash
python predict_prompt_free.py
```
## Transferring
After pretraining, YOLOE-v8 / YOLOE-11 can be re-parameterized into the same architecture as YOLOv8 / YOLO11, with **zero overhead for transferring**.
### Linear probing
Only the last conv, i.e., the prompt embedding, is trainable.
```bash
python train_pe.py
```
### Full tuning
All parameters are trainable, for better performance.
```bash
# For models with s scale, please change the epochs to 160 for longer training
python train_pe_all.py
```
## Validation
### Data
- Please download LVIS following [here](https://docs.ultralytics.com/zh/datasets/detect/lvis/) or [lvis.yaml](./ultralytics/cfg/datasets/lvis.yaml).
- We use this [`minival.txt`](./tools/lvis/minival.txt) with background images for evaluation.
```bash
# For evaluation with visual prompt, please obtain the referring data.
python tools/generate_lvis_visual_prompt_data.py
```
### Zero-shot evaluation on LVIS
- For text prompts, `python val.py`.
- For visual prompts, `python val_vp.py`
For *Fixed AP*, please refer to the comments in `val.py` and `val_vp.py`, and use `tools/eval_fixed_ap.py` for evaluation.
### Prompt-free evaluation
```bash
python val_pe_free.py
python tools/eval_open_ended.py --json ../datasets/lvis/annotations/lvis_v1_minival.json --pred runs/detect/val/predictions.json --fixed
```
### Downstream transfer on COCO
```bash
python val_coco.py
```
## Training
The training includes three stages:
- YOLOE is trained with text prompts for detection and segmentation for 30 epochs.
- Only the visual prompt encoder (SAVPE) is trained with visual prompts for 2 epochs.
- Only the specialized prompt embedding for the prompt-free scenario is trained for 1 epoch.
### Data
| Images | Raw Annotations | Processed Annotations |
|---|---|---|
| [Objects365v1](https://opendatalab.com/OpenDataLab/Objects365_v1) | [objects365_train.json](https://opendatalab.com/OpenDataLab/Objects365_v1) | [objects365_train_segm.json](https://huggingface.co/datasets/jameslahm/yoloe/blob/main/objects365_train_segm.json) |
| [GQA](https://nlp.stanford.edu/data/gqa/images.zip) | [final_mixed_train_no_coco.json](https://huggingface.co/GLIPModel/GLIP/blob/main/mdetr_annotations/final_mixed_train_no_coco.json) | [final_mixed_train_no_coco_segm.json](https://huggingface.co/datasets/jameslahm/yoloe/blob/main/final_mixed_train_no_coco_segm.json) |
| [Flickr30k](https://shannon.cs.illinois.edu/DenotationGraph/) | [final_flickr_separateGT_train.json](https://huggingface.co/GLIPModel/GLIP/blob/main/mdetr_annotations/final_flickr_separateGT_train.json) | [final_flickr_separateGT_train_segm.json](https://huggingface.co/datasets/jameslahm/yoloe/blob/main/final_flickr_separateGT_train_segm.json) |
For annotations, you can directly use our preprocessed ones or use the following script to obtain the processed annotations with segmentation masks.
```bash
# Generate segmentation data
conda create -n sam2 python==3.10.16
conda activate sam2
pip install -r third_party/sam2/requirements.txt
pip install -e third_party/sam2/
python tools/generate_sam_masks.py --img-path ../datasets/Objects365v1/images/train --json-path ../datasets/Objects365v1/annotations/objects365_train.json --batch
python tools/generate_sam_masks.py --img-path ../datasets/flickr/full_images/ --json-path ../datasets/flickr/annotations/final_flickr_separateGT_train.json
python tools/generate_sam_masks.py --img-path ../datasets/mixed_grounding/gqa/images --json-path ../datasets/mixed_grounding/annotations/final_mixed_train_no_coco.json
# Generate objects365v1 labels
python tools/generate_objects365v1.py
```
Then, please generate the data and embedding cache for training.
```bash
# Generate grounding segmentation cache
python tools/generate_grounding_cache.py --img-path ../datasets/flickr/full_images/ --json-path ../datasets/flickr/annotations/final_flickr_separateGT_train_segm.json
python tools/generate_grounding_cache.py --img-path ../datasets/mixed_grounding/gqa/images --json-path ../datasets/mixed_grounding/annotations/final_mixed_train_no_coco_segm.json
# Generate train label embeddings
python tools/generate_label_embedding.py
python tools/generate_global_neg_cat.py
```
Finally, please download MobileCLIP-B(LT) for the text encoder.
```bash
wget https://docs-assets.developer.apple.com/ml-research/datasets/mobileclip/mobileclip_blt.pt
```
### Text prompt
```bash
# For models with l scale, please change the initialization by referring to the comments in Line 549 in ultralytics/nn/modules/head.py
# If you want to train YOLOE only for detection, you can use `train.py`
python train_seg.py
```
### Visual prompt
```bash
# For visual prompt, because only SAVPE is trained, we can adopt the detection pipeline with less training time
# First, obtain the detection model
python tools/convert_segm2det.py
# Then, train the SAVPE module
python train_vp.py
# After training, please use tools/get_vp_segm.py to add the segmentation head
# python tools/get_vp_segm.py
```
### Prompt free
```bash
# Generate LVIS with single class for evaluation during training
python tools/generate_lvis_sc.py
# Similar to visual prompt, because only the specialized prompt embedding is trained, we can adopt the detection pipeline with less training time
python tools/convert_segm2det.py
python train_pe_free.py
# After training, please use tools/get_pf_free_segm.py to add the segmentation head
# python tools/get_pf_free_segm.py
```
## Export
After re-parameterization, YOLOE-v8 / YOLOE-11 can be exported into the identical format as YOLOv8 / YOLO11, with **zero overhead for inference**.
```bash
pip install onnx coremltools onnxslim
python export.py
```
## Benchmark
- For TensorRT, please refer to `benchmark.sh`; a usage sketch follows this list.
- For CoreML, please use the benchmark tool from [XCode 14](https://developer.apple.com/videos/play/wwdc2022/10027/).
- For prompt-free setting, please refer to `tools/benchmark_pf.py`.
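For example, a hedged sketch of the TensorRT flow, assuming `export.py` has produced an ONNX file at `pretrain/yoloe-v8l-seg.onnx` (the path is illustrative):

```bash
# benchmark.sh takes the model path without the .onnx suffix, builds ${MODEL}.engine with trtexec, and reports timings
bash benchmark.sh pretrain/yoloe-v8l-seg
```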
## Acknowledgement
The code base is built with [ultralytics](https://github.com/ultralytics/ultralytics), [YOLO-World](https://github.com/AILab-CVC/YOLO-World), [MobileCLIP](https://github.com/apple/ml-mobileclip), [lvis-api](https://github.com/lvis-dataset/lvis-api), [CLIP](https://github.com/openai/CLIP), and [GenerateU](https://github.com/FoundationVision/GenerateU).
Thanks for the great implementations!
## Citation
If our code or models help your work, please cite our paper:
```BibTeX
@misc{wang2025yoloerealtimeseeing,
title={YOLOE: Real-Time Seeing Anything},
author={Ao Wang and Lihao Liu and Hui Chen and Zijia Lin and Jungong Han and Guiguang Ding},
year={2025},
eprint={2503.07465},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.07465},
}
```
import torch
import numpy as np
import gradio as gr
import supervision as sv
from scipy.ndimage import binary_fill_holes
from ultralytics import YOLOE
from ultralytics.utils.torch_utils import smart_inference_mode
from ultralytics.models.yolo.yoloe.predict_vp import YOLOEVPSegPredictor
from gradio_image_prompter import ImagePrompter
from huggingface_hub import hf_hub_download
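# Download the requested YOLOE checkpoint from the jameslahm/yoloe Hugging Face repo and prepare it for inference.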
def init_model(model_id, is_pf=False):
filename = f"{model_id}-seg.pt" if not is_pf else f"{model_id}-seg-pf.pt"
path = hf_hub_download(repo_id="jameslahm/yoloe", filename=filename)
model = YOLOE(path)
model.eval()
model.to("cuda" if torch.cuda.is_available() else "cpu")
return model
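# Run YOLOE with the selected prompt mechanism (text, visual, or prompt-free) and return the image annotated with supervision.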
@smart_inference_mode()
def yoloe_inference(image, prompts, target_image, model_id, image_size, conf_thresh, iou_thresh, prompt_type):
model = init_model(model_id)
kwargs = {}
if prompt_type == "Text":
texts = prompts["texts"]
model.set_classes(texts, model.get_text_pe(texts))
elif prompt_type == "Visual":
kwargs = dict(
prompts=prompts,
predictor=YOLOEVPSegPredictor
)
if target_image:
model.predict(source=image, imgsz=image_size, conf=conf_thresh, iou=iou_thresh, return_vpe=True, **kwargs)
model.set_classes(["object0"], model.predictor.vpe)
model.predictor = None # unset VPPredictor
image = target_image
kwargs = {}
elif prompt_type == "Prompt-free":
vocab = model.get_vocab(prompts["texts"])
model = init_model(model_id, is_pf=True)
model.set_vocab(vocab, names=prompts["texts"])
model.model.model[-1].is_fused = True
model.model.model[-1].conf = 0.001
model.model.model[-1].max_det = 1000
results = model.predict(source=image, imgsz=image_size, conf=conf_thresh, iou=iou_thresh, **kwargs)
detections = sv.Detections.from_ultralytics(results[0])
resolution_wh = image.size
thickness = sv.calculate_optimal_line_thickness(resolution_wh=resolution_wh)
text_scale = sv.calculate_optimal_text_scale(resolution_wh=resolution_wh)
labels = [
f"{class_name} {confidence:.2f}"
for class_name, confidence
in zip(detections['class_name'], detections.confidence)
]
annotated_image = image.copy()
annotated_image = sv.MaskAnnotator(
color_lookup=sv.ColorLookup.INDEX,
opacity=0.4
).annotate(scene=annotated_image, detections=detections)
annotated_image = sv.BoxAnnotator(
color_lookup=sv.ColorLookup.INDEX,
thickness=thickness
).annotate(scene=annotated_image, detections=detections)
annotated_image = sv.LabelAnnotator(
color_lookup=sv.ColorLookup.INDEX,
text_scale=text_scale,
smart_position=True
).annotate(scene=annotated_image, detections=detections, labels=labels)
return annotated_image
def app():
with gr.Blocks():
with gr.Row():
with gr.Column():
with gr.Row():
raw_image = gr.Image(type="pil", label="Image", visible=True, interactive=True)
box_image = ImagePrompter(type="pil", label="DrawBox", visible=False, interactive=True)
mask_image = gr.ImageEditor(type="pil", label="DrawMask", visible=False, interactive=True, layers=False, canvas_size=(640, 640))
target_image = gr.Image(type="pil", label="Target Image", visible=False, interactive=True)
yoloe_infer = gr.Button(value="Detect & Segment Objects")
prompt_type = gr.Textbox(value="Text", visible=False)
with gr.Tab("Text") as text_tab:
texts = gr.Textbox(label="Input Texts", value='person,bus', placeholder='person,bus', visible=True, interactive=True)
with gr.Tab("Visual") as visual_tab:
with gr.Row():
visual_prompt_type = gr.Dropdown(choices=["bboxes", "masks"], value="bboxes", label="Visual Type", interactive=True)
visual_usage_type = gr.Radio(choices=["Intra-Image", "Cross-Image"], value="Intra-Image", label="Intra/Cross Image", interactive=True)
with gr.Tab("Prompt-Free") as prompt_free_tab:
gr.HTML(
"""
<p style='text-align: center'>
<b>Prompt-Free Mode is On</b>
</p>
""", show_label=False)
model_id = gr.Dropdown(
label="Model",
choices=[
"yoloe-v8s",
"yoloe-v8m",
"yoloe-v8l",
"yoloe-11s",
"yoloe-11m",
"yoloe-11l",
],
value="yoloe-v8l",
)
image_size = gr.Slider(
label="Image Size",
minimum=320,
maximum=1280,
step=32,
value=640,
)
conf_thresh = gr.Slider(
label="Confidence Threshold",
minimum=0.0,
maximum=1.0,
step=0.05,
value=0.25,
)
iou_thresh = gr.Slider(
label="IoU Threshold",
minimum=0.0,
maximum=1.0,
step=0.05,
value=0.70,
)
with gr.Column():
output_image = gr.Image(type="numpy", label="Annotated Image", visible=True)
def update_text_image_visibility():
return gr.update(value="Text"), gr.update(visible=True), gr.update(visible=False), gr.update(visible=False), gr.update(visible=False)
def update_visual_image_visiblity(visual_prompt_type, visual_usage_type):
if visual_prompt_type == "bboxes":
return gr.update(value="Visual"), gr.update(visible=False), gr.update(visible=True), gr.update(visible=False), gr.update(visible=(visual_usage_type == "Cross-Image"))
elif visual_prompt_type == "masks":
return gr.update(value="Visual"), gr.update(visible=False), gr.update(visible=False), gr.update(visible=True), gr.update(visible=(visual_usage_type == "Cross-Image"))
def update_pf_image_visibility():
return gr.update(value="Prompt-free"), gr.update(visible=True), gr.update(visible=False), gr.update(visible=False), gr.update(visible=False)
text_tab.select(
fn=update_text_image_visibility,
inputs=None,
outputs=[prompt_type, raw_image, box_image, mask_image, target_image]
)
visual_tab.select(
fn=update_visual_image_visiblity,
inputs=[visual_prompt_type, visual_usage_type],
outputs=[prompt_type, raw_image, box_image, mask_image, target_image]
)
prompt_free_tab.select(
fn=update_pf_image_visibility,
inputs=None,
outputs=[prompt_type, raw_image, box_image, mask_image, target_image]
)
def update_visual_prompt_type(visual_prompt_type):
if visual_prompt_type == "bboxes":
return gr.update(visible=True), gr.update(visible=False)
if visual_prompt_type == "masks":
return gr.update(visible=False), gr.update(visible=True)
return gr.update(visible=False), gr.update(visible=False)
def update_visual_usage_type(visual_usage_type):
if visual_usage_type == "Intra-Image":
return gr.update(visible=False)
if visual_usage_type == "Cross-Image":
return gr.update(visible=True)
return gr.update(visible=False)
visual_prompt_type.change(
fn=update_visual_prompt_type,
inputs=[visual_prompt_type],
outputs=[box_image, mask_image]
)
visual_usage_type.change(
fn=update_visual_usage_type,
inputs=[visual_usage_type],
outputs=[target_image]
)
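# Collect the inputs from the active tab (text, boxes, masks, or prompt-free) and build the prompts dict expected by yoloe_inference.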
def run_inference(raw_image, box_image, mask_image, target_image, texts, model_id, image_size, conf_thresh, iou_thresh, prompt_type, visual_prompt_type, visual_usage_type):
# add text/built-in prompts
if prompt_type == "Text" or prompt_type == "Prompt-free":
target_image = None
image = raw_image
if prompt_type == "Prompt-free":
with open('tools/ram_tag_list.txt', 'r') as f:
texts = [x.strip() for x in f.readlines()]
else:
texts = [text.strip() for text in texts.split(',')]
prompts = {
"texts": texts
}
# add visual prompt
elif prompt_type == "Visual":
if visual_usage_type != "Cross-Image":
target_image = None
if visual_prompt_type == "bboxes":
image, points = box_image["image"], box_image["points"]
points = np.array(points)
if len(points) == 0:
gr.Warning("No boxes are provided. No image output.", visible=True)
return gr.update(value=None)
bboxes = np.array([p[[0, 1, 3, 4]] for p in points if p[2] == 2])
prompts = {
"bboxes": bboxes,
"cls": np.array([0] * len(bboxes))
}
elif visual_prompt_type == "masks":
image, masks = mask_image["background"], mask_image["layers"][0]
# image = image.convert("RGB")
masks = np.array(masks.convert("L"))
masks = binary_fill_holes(masks).astype(np.uint8)
masks[masks > 0] = 1
if masks.sum() == 0:
gr.Warning("No masks are provided. No image output.", visible=True)
return gr.update(value=None)
prompts = {
"masks": masks[None],
"cls": np.array([0])
}
return yoloe_inference(image, prompts, target_image, model_id, image_size, conf_thresh, iou_thresh, prompt_type)
yoloe_infer.click(
fn=run_inference,
inputs=[raw_image, box_image, mask_image, target_image, texts, model_id, image_size, conf_thresh, iou_thresh, prompt_type, visual_prompt_type, visual_usage_type],
outputs=[output_image],
)
###################### Examples ##########################
text_examples = gr.Examples(
examples=[[
"ultralytics/assets/bus.jpg",
"person,bus",
"yoloe-v8l",
640,
0.25,
0.7]],
inputs=[raw_image, texts, model_id, image_size, conf_thresh, iou_thresh],
visible=True, cache_examples=False, label="Text Prompt Examples")
box_examples = gr.Examples(
examples=[[
{"image": "ultralytics/assets/bus_box.jpg", "points": [[235, 408, 2, 342, 863, 3]]},
"ultralytics/assets/zidane.jpg",
"yoloe-v8l",
640,
0.2,
0.7,
]],
inputs=[box_image, target_image, model_id, image_size, conf_thresh, iou_thresh],
visible=False, cache_examples=False, label="Box Visual Prompt Examples")
mask_examples = gr.Examples(
examples=[[
{"background": "ultralytics/assets/bus.jpg", "layers": ["ultralytics/assets/bus_mask.png"], "composite": "ultralytics/assets/bus_composite.jpg"},
"ultralytics/assets/zidane.jpg",
"yoloe-v8l",
640,
0.15,
0.7,
]],
inputs=[mask_image, target_image, model_id, image_size, conf_thresh, iou_thresh],
visible=False, cache_examples=False, label="Mask Visual Prompt Examples")
pf_examples = gr.Examples(
examples=[[
"ultralytics/assets/bus.jpg",
"yoloe-v8l",
640,
0.25,
0.7,
]],
inputs=[raw_image, model_id, image_size, conf_thresh, iou_thresh],
visible=False, cache_examples=False, label="Prompt-free Examples")
# Components update
def load_box_example(visual_usage_type):
return (gr.update(visible=True, value={"image": "ultralytics/assets/bus_box.jpg", "points": [[235, 408, 2, 342, 863, 3]]}),
gr.update(visible=(visual_usage_type=="Cross-Image")))
def load_mask_example(visual_usage_type):
return gr.update(visible=True), gr.update(visible=(visual_usage_type=="Cross-Image"))
box_examples.load_input_event.then(
fn=load_box_example,
inputs=visual_usage_type,
outputs=[box_image, target_image]
)
mask_examples.load_input_event.then(
fn=load_mask_example,
inputs=visual_usage_type,
outputs=[mask_image, target_image]
)
# Examples update
def update_text_examples():
return gr.Dataset(visible=True), gr.Dataset(visible=False), gr.Dataset(visible=False), gr.Dataset(visible=False)
def update_pf_examples():
return gr.Dataset(visible=False), gr.Dataset(visible=False), gr.Dataset(visible=False), gr.Dataset(visible=True)
def update_visual_examples(visual_prompt_type):
if visual_prompt_type == "bboxes":
return gr.Dataset(visible=False), gr.Dataset(visible=True), gr.Dataset(visible=False), gr.Dataset(visible=False),
elif visual_prompt_type == "masks":
return gr.Dataset(visible=False), gr.Dataset(visible=False), gr.Dataset(visible=True), gr.Dataset(visible=False),
text_tab.select(
fn=update_text_examples,
inputs=None,
outputs=[text_examples.dataset, box_examples.dataset, mask_examples.dataset, pf_examples.dataset]
)
visual_tab.select(
fn=update_visual_examples,
inputs=[visual_prompt_type],
outputs=[text_examples.dataset, box_examples.dataset, mask_examples.dataset, pf_examples.dataset]
)
prompt_free_tab.select(
fn=update_pf_examples,
inputs=None,
outputs=[text_examples.dataset, box_examples.dataset, mask_examples.dataset, pf_examples.dataset]
)
visual_prompt_type.change(
fn=update_visual_examples,
inputs=[visual_prompt_type],
outputs=[text_examples.dataset, box_examples.dataset, mask_examples.dataset, pf_examples.dataset]
)
visual_usage_type.change(
fn=update_visual_examples,
inputs=[visual_prompt_type],
outputs=[text_examples.dataset, box_examples.dataset, mask_examples.dataset, pf_examples.dataset]
)
gradio_app = gr.Blocks()
with gradio_app:
gr.HTML(
"""
<h1 style='text-align: center'>
<img src="/file=figures/logo.png" width="2.5%" style="display:inline;padding-bottom:4px">
YOLOE: Real-Time Seeing Anything
</h1>
""")
gr.HTML(
"""
<h3 style='text-align: center'>
<a href='https://arxiv.org/abs/2503.07465' target='_blank'>arXiv</a> | <a href='https://github.com/THU-MIG/yoloe' target='_blank'>github</a>
</h3>
""")
gr.Markdown(
"""
We introduce **YOLOE(ye)**, a highly **efficient**, **unified**, and **open** object detection and segmentation model that, like the human eye, works under different prompt mechanisms, such as *texts*, *visual inputs*, and the *prompt-free paradigm*.
"""
)
gr.Markdown(
"""
If the desired objects are not identified, please set a **smaller** confidence threshold, e.g., for visual prompts with handcrafted shapes or cross-image prompts.
"""
)
gr.Markdown(
"""
Drawing **multiple** boxes or handcrafted shapes as visual prompt in an image is also supported, which leads to more accurate prompt.
"""
)
with gr.Row():
with gr.Column():
app()
if __name__ == '__main__':
gradio_app.launch(allowed_paths=["figures"])
#!/bin/bash
# Build a TensorRT engine from ${MODEL}.onnx, then benchmark the saved engine with trtexec.
set -e
set -x
MODEL=$1
trtexec --onnx="${MODEL}.onnx" \
--fp16 \
--saveEngine="${MODEL}.engine" \
--timingCacheFile="${MODEL}.engine.timing.cache" \
--warmUp=500 \
--duration=10 \
--useCudaGraph \
--useSpinWait \
--noDataTransfers > /dev/null
trtexec \
--fp16 \
--loadEngine="${MODEL}.engine" \
--timingCacheFile="${MODEL}.engine.timing.cache" \
--warmUp=500 \
--duration=10 \
--useCudaGraph \
--useSpinWait \
--noDataTransfers
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy
ENV DEBIAN_FRONTEND=noninteractive
# RUN yum update && yum install -y git cmake wget build-essential
# RUN source /opt/dtk-dtk25.04/env.sh
# Install pip dependencies
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
-e .
-e third_party/lvis-api
-e third_party/ml-mobileclip
-e third_party/CLIP
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:latest image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Image is CUDA-optimized for YOLO11 single/multi-GPU training and inference
# Start FROM PyTorch image https://hub.docker.com/r/pytorch/pytorch or nvcr.io/nvidia/pytorch:23.03-py3
FROM pytorch/pytorch:2.5.0-cuda12.4-cudnn9-runtime
# Set environment variables
# Avoid DDP error "MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library" https://github.com/pytorch/pytorch/issues/37377
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1 \
MKL_THREADING_LAYER=GNU \
OMP_NUM_THREADS=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Install linux packages
# g++ required to build 'tflite_support' and 'lap' packages, libusb-1.0-0 required for 'tflite_support' package
# libsm6 required by libqxcb to create QT-based windows for visualization; set 'QT_DEBUG_PLUGINS=1' to test in docker
RUN apt-get update && \
apt-get install -y --no-install-recommends \
gcc git zip unzip wget curl htop libgl1 libglib2.0-0 libpython3-dev gnupg g++ libusb-1.0-0 libsm6 \
&& rm -rf /var/lib/apt/lists/*
# Security updates
# https://security.snyk.io/vuln/SNYK-UBUNTU1804-OPENSSL-3314796
RUN apt upgrade --no-install-recommends -y openssl tar
# Create working directory
WORKDIR /ultralytics
# Copy contents and configure git
COPY . .
RUN sed -i '/^\[http "https:\/\/github\.com\/"\]/,+1d' .git/config
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Install pip packages
RUN python3 -m pip install --upgrade pip wheel
# Note: -cu12 must be used with tensorrt
RUN pip install -e ".[export]" tensorrt-cu12 "albumentations>=1.4.6" comet pycocotools
# Run exports to AutoInstall packages
# Edge TPU export fails the first time so is run twice here
RUN yolo export model=tmp/yolo11n.pt format=edgetpu imgsz=32 || yolo export model=tmp/yolo11n.pt format=edgetpu imgsz=32
RUN yolo export model=tmp/yolo11n.pt format=ncnn imgsz=32
# Requires <= Python 3.10, bug with paddlepaddle==2.5.0 https://github.com/PaddlePaddle/X2Paddle/issues/991
RUN pip install "paddlepaddle>=2.6.0" x2paddle
# Fix error: `np.bool` was a deprecated alias for the builtin `bool` segmentation error in Tests
RUN pip install numpy==1.23.5
# Remove extra build files
RUN rm -rf tmp /root/.config/Ultralytics/persistent_cache.json
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest && sudo docker build -f docker/Dockerfile -t $t . && sudo docker push $t
# Pull and Run with access to all GPUs
# t=ultralytics/ultralytics:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus all $t
# Pull and Run with access to GPUs 2 and 3 (inside container CUDA devices will appear as 0 and 1)
# t=ultralytics/ultralytics:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus '"device=2,3"' $t
# Pull and Run with local directory access
# t=ultralytics/ultralytics:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus all -v "$(pwd)"/shared/datasets:/datasets $t
# Kill all
# sudo docker kill $(sudo docker ps -q)
# Kill all image-based
# sudo docker kill $(sudo docker ps -qa --filter ancestor=ultralytics/ultralytics:latest)
# DockerHub tag update
# t=ultralytics/ultralytics:latest tnew=ultralytics/ultralytics:v6.2 && sudo docker pull $t && sudo docker tag $t $tnew && sudo docker push $tnew
# Clean up
# sudo docker system prune -a --volumes
# Update Ubuntu drivers
# https://www.maketecheasier.com/install-nvidia-drivers-ubuntu/
# DDP test
# python -m torch.distributed.run --nproc_per_node 2 --master_port 1 train.py --epochs 3
# GCP VM from Image
# docker.io/ultralytics/ultralytics:latest
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:latest-arm64 image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Image is aarch64-compatible for Apple M1, M2, M3, Raspberry Pi and other ARM architectures
# Start FROM Ubuntu image https://hub.docker.com/_/ubuntu with "FROM arm64v8/ubuntu:22.04" (deprecated)
# Start FROM Debian image for arm64v8 https://hub.docker.com/r/arm64v8/debian (new)
FROM arm64v8/debian:bookworm-slim
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Install linux packages
# g++ required to build 'tflite_support' and 'lap' packages, libusb-1.0-0 required for 'tflite_support' package
# pkg-config and libhdf5-dev (not included) are needed to build 'h5py==3.11.0' aarch64 wheel required by 'tensorflow'
RUN apt-get update && \
apt-get install -y --no-install-recommends \
python3-pip git zip unzip wget curl htop gcc libgl1 libglib2.0-0 libpython3-dev gnupg g++ libusb-1.0-0 \
&& rm -rf /var/lib/apt/lists/*
# Create working directory
WORKDIR /ultralytics
# Copy contents and configure git
COPY . .
RUN sed -i '/^\[http "https:\/\/github\.com\/"\]/,+1d' .git/config
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Install pip packages
RUN python3 -m pip install --upgrade pip wheel
RUN pip install -e ".[export]"
# Creates a symbolic link to make 'python' point to 'python3'
RUN ln -sf /usr/bin/python3 /usr/bin/python
# Remove extra build files
RUN rm -rf /root/.config/Ultralytics/persistent_cache.json
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-arm64 && sudo docker build --platform linux/arm64 -f docker/Dockerfile-arm64 -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-arm64 && sudo docker run -it --ipc=host $t
# Pull and Run
# t=ultralytics/ultralytics:latest-arm64 && sudo docker pull $t && sudo docker run -it --ipc=host $t
# Pull and Run with local volume mounted
# t=ultralytics/ultralytics:latest-arm64 && sudo docker pull $t && sudo docker run -it --ipc=host -v "$(pwd)"/shared/datasets:/datasets $t
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:latest-conda image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Image is optimized for Ultralytics Anaconda (https://anaconda.org/conda-forge/ultralytics) installation and usage
# Start FROM miniconda3 image https://hub.docker.com/r/continuumio/miniconda3
FROM continuumio/miniconda3:latest
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Install linux packages
RUN apt-get update && \
apt-get install -y --no-install-recommends \
libgl1 \
&& rm -rf /var/lib/apt/lists/*
# Copy contents
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Install conda packages
# mkl required to fix 'OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory'
RUN conda config --set solver libmamba && \
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia && \
conda install -c conda-forge ultralytics mkl
# conda install -c pytorch -c nvidia -c conda-forge pytorch torchvision pytorch-cuda=12.1 ultralytics mkl
# Remove extra build files
RUN rm -rf /root/.config/Ultralytics/persistent_cache.json
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-conda && sudo docker build -f docker/Dockerfile-cpu -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-conda && sudo docker run -it --ipc=host $t
# Pull and Run
# t=ultralytics/ultralytics:latest-conda && sudo docker pull $t && sudo docker run -it --ipc=host $t
# Pull and Run with local volume mounted
# t=ultralytics/ultralytics:latest-conda && sudo docker pull $t && sudo docker run -it --ipc=host -v "$(pwd)"/shared/datasets:/datasets $t
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:latest-cpu image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Image is CPU-optimized for ONNX, OpenVINO and PyTorch YOLO11 deployments
# Use official Python base image for reproducibility (3.11.10 for export and 3.12.6 for inference)
FROM python:3.11.10-slim-bookworm
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Install linux packages
# g++ required to build 'tflite_support' and 'lap' packages, libusb-1.0-0 required for 'tflite_support' package
RUN apt-get update && \
apt-get install -y --no-install-recommends \
python3-pip git zip unzip wget curl htop libgl1 libglib2.0-0 libpython3-dev gnupg g++ libusb-1.0-0 \
&& rm -rf /var/lib/apt/lists/*
# Create working directory
WORKDIR /ultralytics
# Copy contents and configure git
COPY . .
RUN sed -i '/^\[http "https:\/\/github\.com\/"\]/,+1d' .git/config
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Install pip packages
RUN python3 -m pip install --upgrade pip wheel
RUN pip install -e ".[export]" --extra-index-url https://download.pytorch.org/whl/cpu
# Run exports to AutoInstall packages
RUN yolo export model=tmp/yolo11n.pt format=edgetpu imgsz=32
RUN yolo export model=tmp/yolo11n.pt format=ncnn imgsz=32
# Requires Python<=3.10, bug with paddlepaddle==2.5.0 https://github.com/PaddlePaddle/X2Paddle/issues/991
RUN pip install "paddlepaddle>=2.6.0" x2paddle
# Remove extra build files
RUN rm -rf tmp /root/.config/Ultralytics/persistent_cache.json
# Set default command to bash
CMD ["/bin/bash"]
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-cpu && sudo docker build -f docker/Dockerfile-cpu -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-cpu && sudo docker run -it --ipc=host --name NAME $t
# Pull and Run
# t=ultralytics/ultralytics:latest-cpu && sudo docker pull $t && sudo docker run -it --ipc=host --name NAME $t
# Pull and Run with local volume mounted
# t=ultralytics/ultralytics:latest-cpu && sudo docker pull $t && sudo docker run -it --ipc=host -v "$(pwd)"/shared/datasets:/datasets $t
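# ONNX export example (optional) -----------------------------------------------------------------------------------------
# Minimal sketch, assuming the latest-cpu tag above; exports the bundled yolo11n.pt to ONNX inside the container
# t=ultralytics/ultralytics:latest-cpu && sudo docker run -it --ipc=host $t yolo export model=yolo11n.pt format=onnx imgsz=320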
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:jetson-jetpack4 image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Supports JetPack4.x for YOLO11 on Jetson Nano, TX2, Xavier NX, AGX Xavier
# Start FROM https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-cuda
FROM nvcr.io/nvidia/l4t-cuda:10.2.460-runtime
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Add NVIDIA repositories for TensorRT dependencies
RUN wget -q -O - https://repo.download.nvidia.com/jetson/jetson-ota-public.asc | apt-key add - && \
echo "deb https://repo.download.nvidia.com/jetson/common r32.7 main" > /etc/apt/sources.list.d/nvidia-l4t-apt-source.list && \
echo "deb https://repo.download.nvidia.com/jetson/t194 r32.7 main" >> /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
# Install dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
git python3.8 python3.8-dev python3-pip python3-libnvinfer libopenmpi-dev libopenblas-base libomp-dev gcc \
&& rm -rf /var/lib/apt/lists/*
# Create symbolic links for python3.8 and pip3
RUN ln -sf /usr/bin/python3.8 /usr/bin/python3
RUN ln -s /usr/bin/pip3 /usr/bin/pip
# Create working directory
WORKDIR /ultralytics
# Copy contents and configure git
COPY . .
# Strip any [http "https://github.com/"] auth header block (e.g. injected by CI checkout) from .git/config
RUN sed -i '/^\[http "https:\/\/github\.com\/"\]/,+1d' .git/config
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Download onnxruntime-gpu 1.8.0 and tensorrt 8.2.0.6
# Other versions can be seen in https://elinux.org/Jetson_Zoo and https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048
ADD https://nvidia.box.com/shared/static/gjqofg7rkg97z3gc8jeyup6t8n9j8xjw.whl onnxruntime_gpu-1.8.0-cp38-cp38-linux_aarch64.whl
ADD https://forums.developer.nvidia.com/uploads/short-url/hASzFOm9YsJx6VVFrDW1g44CMmv.whl tensorrt-8.2.0.6-cp38-none-linux_aarch64.whl
# Install pip packages
RUN python3 -m pip install --upgrade pip wheel
RUN pip install \
onnxruntime_gpu-1.8.0-cp38-cp38-linux_aarch64.whl \
tensorrt-8.2.0.6-cp38-none-linux_aarch64.whl \
https://github.com/ultralytics/assets/releases/download/v0.0.0/torch-1.11.0a0+gitbc2c6ed-cp38-cp38-linux_aarch64.whl \
https://github.com/ultralytics/assets/releases/download/v0.0.0/torchvision-0.12.0a0+9b5a3fe-cp38-cp38-linux_aarch64.whl
RUN pip install -e ".[export]"
# Remove extra build files
RUN rm -rf *.whl /root/.config/Ultralytics/persistent_cache.json
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-jetson-jetpack4 && sudo docker build --platform linux/arm64 -f docker/Dockerfile-jetson-jetpack4 -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-jetson-jetpack4 && sudo docker run -it --ipc=host $t
# Pull and Run
# t=ultralytics/ultralytics:latest-jetson-jetpack4 && sudo docker pull $t && sudo docker run -it --ipc=host $t
# Pull and Run with NVIDIA runtime
# t=ultralytics/ultralytics:latest-jetson-jetpack4 && sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t
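# TensorRT export example (optional) -------------------------------------------------------------------------------------
# Minimal sketch, assuming a JetPack 4.x host with the NVIDIA container runtime available; builds a TensorRT engine from yolo11n.pt
# t=ultralytics/ultralytics:latest-jetson-jetpack4 && sudo docker run -it --ipc=host --runtime=nvidia $t yolo export model=yolo11n.pt format=engine imgsz=320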
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:jetson-jetpack5 image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Supports JetPack5.x for YOLO11 on Jetson Xavier NX, AGX Xavier, AGX Orin, Orin Nano and Orin NX
# Start FROM https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-pytorch
FROM nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Install linux packages
# g++ required to build 'tflite_support' and 'lap' packages
# libusb-1.0-0 required for 'tflite_support' package when exporting to TFLite
# pkg-config and libhdf5-dev (not included) are needed to build 'h5py==3.11.0' aarch64 wheel required by 'tensorflow'
RUN apt-get update && \
apt-get install -y --no-install-recommends \
gcc git zip unzip wget curl htop libgl1 libglib2.0-0 libpython3-dev gnupg g++ libusb-1.0-0 \
&& rm -rf /var/lib/apt/lists/*
# Create working directory
WORKDIR /ultralytics
# Copy contents and configure git
COPY . .
# Strip any [http "https://github.com/"] auth header block (e.g. injected by CI checkout) from .git/config
RUN sed -i '/^\[http "https:\/\/github\.com\/"\]/,+1d' .git/config
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Remove opencv-python from Ultralytics dependencies as it conflicts with opencv-python installed in base image
RUN sed -i '/opencv-python/d' pyproject.toml
# Download onnxruntime-gpu 1.15.1 for Jetson Linux 35.2.1 (JetPack 5.1). Other versions can be seen in https://elinux.org/Jetson_Zoo#ONNX_Runtime
ADD https://nvidia.box.com/shared/static/mvdcltm9ewdy2d5nurkiqorofz1s53ww.whl onnxruntime_gpu-1.15.1-cp38-cp38-linux_aarch64.whl
# Install pip packages manually for TensorRT compatibility https://github.com/NVIDIA/TensorRT/issues/2567
RUN python3 -m pip install --upgrade pip wheel
RUN pip install onnxruntime_gpu-1.15.1-cp38-cp38-linux_aarch64.whl
RUN pip install -e ".[export]"
# Remove extra build files
RUN rm -rf *.whl /root/.config/Ultralytics/persistent_cache.json
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-jetson-jetpack5 && sudo docker build --platform linux/arm64 -f docker/Dockerfile-jetson-jetpack5 -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-jetson-jetpack5 && sudo docker run -it --ipc=host $t
# Pull and Run
# t=ultralytics/ultralytics:latest-jetson-jetpack5 && sudo docker pull $t && sudo docker run -it --ipc=host $t
# Pull and Run with NVIDIA runtime
# t=ultralytics/ultralytics:latest-jetson-jetpack5 && sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t
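# Benchmark example (optional) -------------------------------------------------------------------------------------------
# Minimal sketch, assuming a JetPack 5.x device with the NVIDIA container runtime; compares yolo11n.pt speed and accuracy across the installed export formats
# t=ultralytics/ultralytics:latest-jetson-jetpack5 && sudo docker run -it --ipc=host --runtime=nvidia $t yolo benchmark model=yolo11n.pt imgsz=160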
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:jetson-jetpack6 image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Supports JetPack6.x for YOLO11 on Jetson AGX Orin, Orin NX and Orin Nano Series
# Start FROM https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-jetpack
FROM nvcr.io/nvidia/l4t-jetpack:r36.3.0
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Install dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
git python3-pip libopenmpi-dev libopenblas-base libomp-dev \
&& rm -rf /var/lib/apt/lists/*
# Create working directory
WORKDIR /ultralytics
# Copy contents and configure git
COPY . .
# Strip any [http "https://github.com/"] auth header block (e.g. injected by CI checkout) from .git/config
RUN sed -i '/^\[http "https:\/\/github\.com\/"\]/,+1d' .git/config
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Download onnxruntime-gpu 1.18.0 from https://elinux.org/Jetson_Zoo and https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048
ADD https://nvidia.box.com/shared/static/48dtuob7meiw6ebgfsfqakc9vse62sg4.whl onnxruntime_gpu-1.18.0-cp310-cp310-linux_aarch64.whl
# Pip install onnxruntime-gpu, torch, torchvision and ultralytics
RUN python3 -m pip install --upgrade pip wheel
RUN pip install \
onnxruntime_gpu-1.18.0-cp310-cp310-linux_aarch64.whl \
https://github.com/ultralytics/assets/releases/download/v0.0.0/torch-2.3.0-cp310-cp310-linux_aarch64.whl \
https://github.com/ultralytics/assets/releases/download/v0.0.0/torchvision-0.18.0a0+6043bc2-cp310-cp310-linux_aarch64.whl
RUN pip install -e ".[export]"
# Remove extra build files
RUN rm -rf *.whl /root/.config/Ultralytics/persistent_cache.json
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-jetson-jetpack6 && sudo docker build --platform linux/arm64 -f docker/Dockerfile-jetson-jetpack6 -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-jetson-jetpack6 && sudo docker run -it --ipc=host $t
# Pull and Run
# t=ultralytics/ultralytics:latest-jetson-jetpack6 && sudo docker pull $t && sudo docker run -it --ipc=host $t
# Pull and Run with NVIDIA runtime
# t=ultralytics/ultralytics:latest-jetson-jetpack6 && sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t
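# Predict on mounted images (optional) -----------------------------------------------------------------------------------
# Minimal sketch, assuming a JetPack 6.x device with the NVIDIA container runtime; the local ./images directory is a hypothetical example path
# t=ultralytics/ultralytics:latest-jetson-jetpack6 && sudo docker run -it --ipc=host --runtime=nvidia -v "$(pwd)"/images:/ultralytics/images $t yolo predict model=yolo11n.pt source=images imgsz=320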
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:latest-jupyter image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Image provides JupyterLab interface for interactive YOLO development and includes tutorial notebooks
# Start from Python-based Ultralytics image for full Python environment
FROM ultralytics/ultralytics:latest-python
# Install JupyterLab for interactive development
RUN /usr/local/bin/pip install jupyterlab
# Create persistent data directory structure
RUN mkdir /data
# Configure YOLO directory paths
RUN mkdir /data/datasets && /usr/local/bin/yolo settings datasets_dir="/data/datasets"
RUN mkdir /data/weights && /usr/local/bin/yolo settings weights_dir="/data/weights"
RUN mkdir /data/runs && /usr/local/bin/yolo settings runs_dir="/data/runs"
# Start JupyterLab with tutorial notebook
ENTRYPOINT ["/usr/local/bin/jupyter", "lab", "--allow-root", "--ip=*", "/ultralytics/examples/tutorial.ipynb"]
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-jupyter && sudo docker build -f docker/Dockerfile-jupyter -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-jupyter && sudo docker run -it --ipc=host -p 8888:8888 $t
# Pull and Run
# t=ultralytics/ultralytics:latest-jupyter && sudo docker pull $t && sudo docker run -it --ipc=host -p 8888:8888 $t
# Pull and Run with local volume mounted
# t=ultralytics/ultralytics:latest-jupyter && sudo docker pull $t && sudo docker run -it --ipc=host -p 8888:8888 -v "$(pwd)"/datasets:/data/datasets $t
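# Run detached and retrieve the login URL (optional) ---------------------------------------------------------------------
# Minimal sketch, assuming the latest-jupyter tag above; the container name yolo-jupyter is a hypothetical example
# t=ultralytics/ultralytics:latest-jupyter && sudo docker run -d --ipc=host -p 8888:8888 --name yolo-jupyter $t && sudo docker logs -f yolo-jupyter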