"docs/source/vscode:/vscode.git/clone" did not exist on "a3febc061b867a30ffa80cbc7e394cb0208e7346"
Commit af155c51 authored by chenzk's avatar chenzk
Browse files

v1.0

parents
Pipeline #2732 failed with stages
in 0 seconds
# This CITATION.cff file was generated with https://bit.ly/cffinit
cff-version: 1.2.0
title: Ultralytics YOLO
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Glenn
    family-names: Jocher
    affiliation: Ultralytics
    orcid: 'https://orcid.org/0000-0001-5950-6979'
  - family-names: Qiu
    given-names: Jing
    affiliation: Ultralytics
    orcid: 'https://orcid.org/0000-0003-3783-7069'
  - given-names: Ayush
    family-names: Chaurasia
    affiliation: Ultralytics
    orcid: 'https://orcid.org/0000-0002-7603-6750'
repository-code: 'https://github.com/ultralytics/ultralytics'
url: 'https://ultralytics.com'
license: AGPL-3.0
version: 8.0.0
date-released: '2023-01-10'
---
comments: true
description: Learn how to contribute to Ultralytics YOLO open-source repositories. Follow guidelines for pull requests, code of conduct, and bug reporting.
keywords: Ultralytics, YOLO, open-source, contribution, pull request, code of conduct, bug reporting, GitHub, CLA, Google-style docstrings
---
# Contributing to Ultralytics Open-Source Projects
Welcome! We're thrilled that you're considering contributing to our [Ultralytics](https://www.ultralytics.com/) [open-source](https://github.com/ultralytics) projects. Your involvement not only helps enhance the quality of our repositories but also benefits the entire community. This guide provides clear guidelines and best practices to help you get started.
<a href="https://github.com/ultralytics/ultralytics/graphs/contributors">
<img width="100%" src="https://github.com/ultralytics/assets/raw/main/im/image-contributors.png" alt="Ultralytics open-source contributors"></a>
## Table of Contents
1. [Code of Conduct](#code-of-conduct)
2. [Contributing via Pull Requests](#contributing-via-pull-requests)
- [CLA Signing](#cla-signing)
- [Google-Style Docstrings](#google-style-docstrings)
- [GitHub Actions CI Tests](#github-actions-ci-tests)
3. [Reporting Bugs](#reporting-bugs)
4. [License](#license)
5. [Conclusion](#conclusion)
6. [FAQ](#faq)
## Code of Conduct
To ensure a welcoming and inclusive environment for everyone, all contributors must adhere to our [Code of Conduct](https://docs.ultralytics.com/help/code_of_conduct/). Respect, kindness, and professionalism are at the heart of our community.
## Contributing via Pull Requests
We greatly appreciate contributions in the form of pull requests. To make the review process as smooth as possible, please follow these steps; a command-line sketch of the same workflow follows the list:
1. **[Fork the repository](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo):** Start by forking the Ultralytics YOLO repository to your GitHub account.
2. **[Create a branch](https://docs.github.com/en/desktop/making-changes-in-a-branch/managing-branches-in-github-desktop):** Create a new branch in your forked repository with a clear, descriptive name that reflects your changes.
3. **Make your changes:** Ensure your code adheres to the project's style guidelines and does not introduce any new errors or warnings.
4. **[Test your changes](https://github.com/ultralytics/ultralytics/tree/main/tests):** Before submitting, test your changes locally to confirm they work as expected and don't cause any new issues.
5. **[Commit your changes](https://docs.github.com/en/desktop/making-changes-in-a-branch/committing-and-reviewing-changes-to-your-project-in-github-desktop):** Commit your changes with a concise and descriptive commit message. If your changes address a specific issue, include the issue number in your commit message.
6. **[Create a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request):** Submit a pull request from your forked repository to the main Ultralytics YOLO repository. Provide a clear and detailed explanation of your changes and how they improve the project.
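Below is a minimal command-line sketch of this workflow; the fork URL, branch name, and commit message are illustrative placeholders.

```bash
# 1. Fork ultralytics/ultralytics on GitHub, then clone your fork (replace <your-username>)
git clone https://github.com/<your-username>/ultralytics.git
cd ultralytics

# 2. Create a clearly named branch for your change (name is illustrative)
git checkout -b docs-fix-typo

# 3-4. Edit the code, test locally, then stage and commit with a concise message
git add .
git commit -m "Fix typo in contributing docs"

# 5-6. Push the branch to your fork, then open a pull request against ultralytics/ultralytics on GitHub
git push origin docs-fix-typo
```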
### CLA Signing
Before we can merge your pull request, you must sign our [Contributor License Agreement (CLA)](https://docs.ultralytics.com/help/CLA/). This legal agreement ensures that your contributions are properly licensed, allowing the project to continue being distributed under the AGPL-3.0 license.
After submitting your pull request, the CLA bot will guide you through the signing process. To sign the CLA, simply add a comment in your PR stating:
```
I have read the CLA Document and I sign the CLA
```
### Google-Style Docstrings
When adding new functions or classes, please include [Google-style docstrings](https://google.github.io/styleguide/pyguide.html). These docstrings provide clear, standardized documentation that helps other developers understand and maintain your code.
#### Example
This example illustrates a Google-style docstring. Ensure that both input and output `types` are always enclosed in parentheses, e.g., `(bool)`.
```python
def example_function(arg1, arg2=4):
    """
    Example function demonstrating Google-style docstrings.

    Args:
        arg1 (int): The first argument.
        arg2 (int): The second argument, with a default value of 4.

    Returns:
        (bool): True if successful, False otherwise.

    Examples:
        >>> result = example_function(1, 2)  # returns False
    """
    if arg1 == arg2:
        return True
    return False
```
#### Example with type hints
This example includes both a Google-style docstring and type hints for arguments and returns, though using either independently is also acceptable.
```python
def example_function(arg1: int, arg2: int = 4) -> bool:
    """
    Example function demonstrating Google-style docstrings.

    Args:
        arg1: The first argument.
        arg2: The second argument, with a default value of 4.

    Returns:
        True if successful, False otherwise.

    Examples:
        >>> result = example_function(1, 2)  # returns False
    """
    if arg1 == arg2:
        return True
    return False
```
#### Example Single-line
For smaller or simpler functions, a single-line docstring may be sufficient. The docstring must use three double-quotes, be a complete sentence, start with a capital letter, and end with a period.
```python
def example_small_function(arg1: int, arg2: int = 4) -> bool:
    """Example function with a single-line docstring."""
    return arg1 == arg2
```
### GitHub Actions CI Tests
All pull requests must pass the GitHub Actions [Continuous Integration](https://docs.ultralytics.com/help/CI/) (CI) tests before they can be merged. These tests include linting, unit tests, and other checks to ensure that your changes meet the project's quality standards. Review the CI output and address any issues that arise.
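As a rough local preview of the CI checks, you can lint and test before pushing. This is only a sketch; the authoritative tools, versions, and settings are defined in the repository's GitHub Actions workflows, and the `ruff`/`pytest` commands below are assumptions.

```bash
# Install the package in editable mode plus typical check tools (assumed toolchain; see .github/workflows for the source of truth)
pip install -e .
pip install pytest ruff

# Lint the code and run the test suite locally before opening or updating a PR
ruff check .
pytest tests/
```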
## Reporting Bugs
We highly value bug reports as they help us maintain the quality of our projects. When reporting a bug, please provide a [Minimum Reproducible Example](https://docs.ultralytics.com/help/minimum_reproducible_example/)—a simple, clear code example that consistently reproduces the issue. This allows us to quickly identify and resolve the problem.
## License
Ultralytics uses the [GNU Affero General Public License v3.0 (AGPL-3.0)](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) for its repositories. This license promotes openness, transparency, and collaborative improvement in software development. It ensures that all users have the freedom to use, modify, and share the software, fostering a strong community of collaboration and innovation.
We encourage all contributors to familiarize themselves with the terms of the AGPL-3.0 license to contribute effectively and ethically to the Ultralytics open-source community.
## Conclusion
Thank you for your interest in contributing to [Ultralytics](https://www.ultralytics.com/) [open-source](https://github.com/ultralytics) YOLO projects. Your participation is essential in shaping the future of our software and building a vibrant community of innovation and collaboration. Whether you're enhancing code, reporting bugs, or suggesting new features, your contributions are invaluable.
We're excited to see your ideas come to life and appreciate your commitment to advancing object detection technology. Together, let's continue to grow and innovate in this exciting open-source journey. Happy coding! 🚀🌟
## FAQ
### Why should I contribute to Ultralytics YOLO open-source repositories?
Contributing to Ultralytics YOLO open-source repositories improves the software, making it more robust and feature-rich for the entire community. Contributions can include code enhancements, bug fixes, documentation improvements, and new feature implementations. Additionally, contributing allows you to collaborate with other skilled developers and experts in the field, enhancing your own skills and reputation. For details on how to get started, refer to the [Contributing via Pull Requests](#contributing-via-pull-requests) section.
### How do I sign the Contributor License Agreement (CLA) for Ultralytics YOLO?
To sign the Contributor License Agreement (CLA), follow the instructions provided by the CLA bot after submitting your pull request. This process ensures that your contributions are properly licensed under the AGPL-3.0 license, maintaining the legal integrity of the open-source project. Add a comment in your pull request stating:
```
I have read the CLA Document and I sign the CLA.
```
For more information, see the [CLA Signing](#cla-signing) section.
### What are Google-style docstrings, and why are they required for Ultralytics YOLO contributions?
Google-style docstrings provide clear, concise documentation for functions and classes, improving code readability and maintainability. These docstrings outline the function's purpose, arguments, and return values with specific formatting rules. When contributing to Ultralytics YOLO, following Google-style docstrings ensures that your additions are well-documented and easily understood. For examples and guidelines, visit the [Google-Style Docstrings](#google-style-docstrings) section.
### How can I ensure my changes pass the GitHub Actions CI tests?
Before your pull request can be merged, it must pass all GitHub Actions Continuous Integration (CI) tests. These tests include linting, unit tests, and other checks to ensure the code meets
the project's quality standards. Review the CI output and fix any issues. For detailed information on the CI process and troubleshooting tips, see the [GitHub Actions CI Tests](#github-actions-ci-tests) section.
### How do I report a bug in Ultralytics YOLO repositories?
To report a bug, provide a clear and concise [Minimum Reproducible Example](https://docs.ultralytics.com/help/minimum_reproducible_example/) along with your bug report. This helps developers quickly identify and fix the issue. Ensure your example is minimal yet sufficient to replicate the problem. For more detailed steps on reporting bugs, refer to the [Reporting Bugs](#reporting-bugs) section.
# YOLOE
YOLOE achieves real-time perception of anything across multiple open prompt mechanisms, without being restricted to predefined categories.
## Paper
`YOLOE: Real-Time Seeing Anything`
- https://arxiv.org/pdf/2503.07465
## Model Structure
YOLOE adopts the typical YOLO architecture. It supports text prompts via RepRTA, visual prompts via SAVPE, and the prompt-free scenario via LRPC. In the YOLO classification head, the output channels of the last convolutional layer are changed from the number of classes in the closed-set setting to the embedding dimension, so that text and visual prompts can be supported.
<div align=center>
<img src="./doc/yoloe.png"/>
</div>
## Algorithm
Visual prompts indicate the object categories of interest through visual cues (anchor points).
<div align=center>
<img src="./doc/SAVPE.png"/>
</div>
## Environment Setup
```
mv yoloe_pytorch yoloe  # drop the framework suffix from the directory name
```
### Docker (Option 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy
# Replace <your IMAGE ID> with the ID of the image pulled above; for this image it is 6063b673703a
docker run -it --shm-size=64G -v $PWD/yoloe:/home/yoloe -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name ye <your IMAGE ID> bash
cd /home/yoloe
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
```
### Dockerfile (Option 2)
```
cd /home/yoloe/docker
docker build --no-cache -t ye:latest .
docker run --shm-size=64G --name ye -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v $PWD/../../yoloe:/home/yoloe -it ye bash
# If building the environment through the Dockerfile takes too long, comment out the pip install step inside it and install the Python packages after starting the container: pip install -r requirements.txt
```
### Anaconda (Option 3)
1. The special deep learning libraries required by this project for DCU GPUs can be downloaded and installed from the SourceFind developer community:
- https://developer.sourcefind.cn/tool/
```
DTK driver: dtk2504
python:python3.10
torch:2.4.1
torchvision:0.19.1
triton:3.0.0
vllm:0.6.2
flash-attn:2.6.1
deepspeed:0.14.2
apex:1.4.0
transformers:4.46.3
```
`Tips: the DTK driver, python, torch, and other DCU-related tool versions above must match one another exactly.`
2. Install the other, non-special libraries according to requirements.txt
```
cd /home/yoloe
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
```
## Dataset
`None`
## Training
## Inference
Pretrained weight directory structure:
```
/home/yoloe/
├── pretrain/yoloe-v8l-seg.pt
└── mobileclip_blt.pt
```
### Single Node, Single Card
```
cd /home/yoloe
export HF_ENDPOINT=https://hf-mirror.com
sh infer.sh
```
For more details, refer to [`README_origin`](./README_origin.md) from the upstream project.
## Results
`Input:`
```
--source ultralytics/assets/bus.jpg
--names person dog cat
```
`Output:`
```
ultralytics/assets/bus-output.jpg
```
<div align=center>
<img src="./doc/bus-output.png"/>
</div>
### Accuracy
DCU accuracy is consistent with GPU; inference framework: PyTorch.
## Application Scenarios
### Algorithm Category
`Object detection`
### Key Application Industries
`Manufacturing, e-commerce, healthcare, energy, education`
## Pretrained Weights
GitHub/Hugging Face download links: [yoloe-v8l-seg](https://github.com/ultralytics/assets/releases/download/v8.3.0/yoloe-v8l-seg.pt) and [mobileclip_blt](https://huggingface.co/apple/MobileCLIP-B-LT/blob/main/mobileclip_blt.pt)
GitHub-hosted weights can be downloaded with wget through a mirror, e.g. https://bgithub.xyz.
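For example (a sketch only; it assumes the mirror exposes the same path layout as github.com):
```
# Download the detection weights into pretrain/ via the GitHub mirror
wget https://bgithub.xyz/ultralytics/assets/releases/download/v8.3.0/yoloe-v8l-seg.pt -P pretrain/
```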
## Source Repository and Issue Reporting
- http://developer.sourcefind.cn/codes/modelzoo/yoloe_pytorch.git
## References
- https://github.com/THU-MIG/yoloe.git
# [YOLOE: Real-Time Seeing Anything](https://arxiv.org/abs/2503.07465)
Official PyTorch implementation of **YOLOE**.
<p align="center">
<img src="figures/comparison.svg" width=70%> <br>
Comparison of performance, training cost, and inference efficiency between YOLOE (Ours) and YOLO-Worldv2 in terms of open text prompts.
</p>
[YOLOE: Real-Time Seeing Anything](https://arxiv.org/abs/2503.07465).\
Ao Wang*, Lihao Liu*, Hui Chen, Zijia Lin, Jungong Han, and Guiguang Ding\
[![arXiv](https://img.shields.io/badge/arXiv-2503.07465-b31b1b.svg)](https://arxiv.org/abs/2503.07465) [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/jameslahm/yoloe/tree/main) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/jameslahm/yoloe) [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/zero-shot-object-detection-and-segmentation-with-yoloe.ipynb) [![Hugging Face Collection](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Collection-blue)](https://huggingface.co/collections/jameslahm/yoloe-67d5110aabaefbe129c15917) [![Openbayes Demo](https://img.shields.io/static/v1?label=Demo&message=OpenBayes%E8%B4%9D%E5%BC%8F%E8%AE%A1%E7%AE%97&color=green)](https://openbayes.com/console/public/tutorials/BQhUorEqyVX)
We introduce **YOLOE(ye)**, a highly **efficient**, **unified**, and **open** object detection and segmentation model that, like the human eye, works under different prompt mechanisms, such as *texts*, *visual inputs*, and the *prompt-free paradigm*, with **zero inference and transferring overhead** compared with closed-set YOLOs.
<!-- <p align="center">
<img src="figures/pipeline.svg" width=96%> <br>
</p> -->
<p align="center">
<img src="figures/visualization.svg" width=96%> <br>
</p>
<details>
<summary>
<font size="+1">Abstract</font>
</summary>
Object detection and segmentation are widely employed in computer vision applications, yet conventional models like YOLO series, while efficient and accurate, are limited by predefined categories, hindering adaptability in open scenarios. Recent open-set methods leverage text prompts, visual cues, or prompt-free paradigm to overcome this, but often compromise between performance and efficiency due to high computational demands or deployment complexity. In this work, we introduce YOLOE, which integrates detection and segmentation across diverse open prompt mechanisms within a single highly efficient model, achieving real-time seeing anything. For text prompts, we propose Re-parameterizable Region-Text Alignment (RepRTA) strategy. It refines pretrained textual embeddings via a re-parameterizable lightweight auxiliary network and enhances visual-textual alignment with zero inference and transferring overhead. For visual prompts, we present Semantic-Activated Visual Prompt Encoder (SAVPE). It employs decoupled semantic and activation branches to bring improved visual embedding and accuracy with minimal complexity. For prompt-free scenario, we introduce Lazy Region-Prompt Contrast (LRPC) strategy. It utilizes a built-in large vocabulary and specialized embedding to identify all objects, avoiding costly language model dependency. Extensive experiments show YOLOE's exceptional zero-shot performance and transferability with high inference efficiency and low training cost. Notably, on LVIS, with $3\times$ less training cost and $1.4\times$ inference speedup, YOLOE-v8-S surpasses YOLO-Worldv2-S by 3.5 AP. When transferring to COCO, YOLOE-v8-L achieves 0.6 $AP^b$ and 0.4 $AP^m$ gains over closed-set YOLOv8-L with nearly $4\times$ less training time.
</details>
<p></p>
<p align="center">
<img src="figures/pipeline.svg" width=96%> <br>
</p>
## Performance
### Zero-shot detection evaluation
- *Fixed AP* is reported on LVIS `minival` set with text (T) / visual (V) prompts.
- Training time is measured for text-prompt detection training on 8 Nvidia RTX 4090 GPUs.
- FPS is measured on T4 with TensorRT and iPhone 12 with CoreML, respectively.
- For training data, OG denotes Objects365v1 and GoldG.
- YOLOE can become YOLOs after re-parameterization with **zero inference and transferring overhead**.
| Model | Size | Prompt | Params | Data | Time | FPS | $AP$ | $AP_r$ | $AP_c$ | $AP_f$ | Log |
|---|---|---|---|---|---|---|---|---|---|---|---|
| [YOLOE-v8-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8s-seg.pt) | 640 | T / V | 12M / 13M | OG | 12.0h | 305.8 / 64.3 | 27.9 / 26.2 | 22.3 / 21.3 | 27.8 / 27.7 | 29.0 / 25.7 | [T](./logs/yoloe-v8s-seg) / [V](./logs/yoloe-v8s-seg-vp) |
| [YOLOE-v8-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8m-seg.pt) | 640 | T / V | 27M / 30M | OG | 17.0h | 156.7 / 41.7 | 32.6 / 31.0 | 26.9 / 27.0 | 31.9 / 31.7 | 34.4 / 31.1 | [T](./logs/yoloe-v8m-seg) / [V](./logs/yoloe-v8m-seg-vp) |
| [YOLOE-v8-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8l-seg.pt) | 640 | T / V | 45M / 50M | OG | 22.5h | 102.5 / 27.2 | 35.9 / 34.2 | 33.2 / 33.2 | 34.8 / 34.6 | 37.3 / 34.1 | [T](./logs/yoloe-v8l-seg) / [V](./logs/yoloe-v8l-seg-vp) |
| [YOLOE-11-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11s-seg.pt) | 640 | T / V | 10M / 12M | OG | 13.0h | 301.2 / 73.3 | 27.5 / 26.3 | 21.4 / 22.5 | 26.8 / 27.1 | 29.3 / 26.4 | [T](./logs/yoloe-11s-seg) / [V](./logs/yoloe-11s-seg-vp) |
| [YOLOE-11-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11m-seg.pt) | 640 | T / V | 21M / 27M | OG | 18.5h | 168.3 / 39.2 | 33.0 / 31.4 | 26.9 / 27.1 | 32.5 / 31.9 | 34.5 / 31.7 | [T](./logs/yoloe-11m-seg) / [V](./logs/yoloe-11m-seg-vp) |
| [YOLOE-11-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11l-seg.pt) | 640 | T / V | 26M / 32M | OG | 23.5h | 130.5 / 35.1 | 35.2 / 33.7 | 29.1 / 28.1 | 35.0 / 34.6 | 36.5 / 33.8 | [T](./logs/yoloe-11l-seg) / [V](./logs/yoloe-11l-seg-vp) |
### Zero-shot segmentation evaluation
- The model is the same as above in [Zero-shot detection evaluation](#zero-shot-detection-evaluation).
- *Standard AP<sup>m</sup>* is reported on LVIS `val` set with text (T) / visual (V) prompts.
| Model | Size | Prompt | $AP^m$ | $AP_r^m$ | $AP_c^m$ | $AP_f^m$ |
|---|---|---|---|---|---|---|
| YOLOE-v8-S | 640 | T / V | 17.7 / 16.8 | 15.5 / 13.5 | 16.3 / 16.7 | 20.3 / 18.2 |
| YOLOE-v8-M | 640 | T / V | 20.8 / 20.3 | 17.2 / 17.0 | 19.2 / 20.1 | 24.2 / 22.0 |
| YOLOE-v8-L | 640 | T / V | 23.5 / 22.0 | 21.9 / 16.5 | 21.6 / 22.1 | 26.4 / 24.3 |
| YOLOE-11-S | 640 | T / V | 17.6 / 17.1 | 16.1 / 14.4 | 15.6 / 16.8 | 20.5 / 18.6 |
| YOLOE-11-M | 640 | T / V | 21.1 / 21.0 | 17.2 / 18.3 | 19.6 / 20.6 | 24.4 / 22.6 |
| YOLOE-11-L | 640 | T / V | 22.6 / 22.5 | 19.3 / 20.5 | 20.9 / 21.7 | 26.0 / 24.1 |
### Prompt-free evaluation
- The model is the same as above in [Zero-shot detection evaluation](#zero-shot-detection-evaluation) except the specialized prompt embedding.
- *Fixed AP* is reported on LVIS `minival` set and FPS is measured on an Nvidia T4 GPU with PyTorch.
| Model | Size | Params | $AP$ | $AP_r$ | $AP_c$ | $AP_f$ | FPS | Log |
|---|---|---|---|---|---|---|---|---|
| [YOLOE-v8-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8s-seg-pf.pt) | 640 | 13M | 21.0 | 19.1 | 21.3 | 21.0 | 95.8 | [PF](./logs/yoloe-v8s-seg-pf/) |
| [YOLOE-v8-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8m-seg-pf.pt) | 640 | 29M | 24.7 | 22.2 | 24.5 | 25.3 | 45.9 | [PF](./logs/yoloe-v8m-seg-pf/) |
| [YOLOE-v8-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8l-seg-pf.pt) | 640 | 47M | 27.2 | 23.5 | 27.0 | 28.0 | 25.3 | [PF](./logs/yoloe-v8l-seg-pf/) |
| [YOLOE-11-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11s-seg-pf.pt) | 640 | 11M | 20.6 | 18.4 | 20.2 | 21.3 | 93.0 | [PF](./logs/yoloe-11s-seg-pf/) |
| [YOLOE-11-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11m-seg-pf.pt) | 640 | 24M | 25.5 | 21.6 | 25.5 | 26.1 | 42.5 | [PF](./logs/yoloe-11m-seg-pf/) |
| [YOLOE-11-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11l-seg-pf.pt) | 640 | 29M | 26.3 | 22.7 | 25.8 | 27.5 | 34.9 | [PF](./logs/yoloe-11l-seg-pf/) |
### Downstream transfer on COCO
- During transferring, YOLOE-v8 / YOLOE-11 is **exactly the same** as YOLOv8 / YOLO11.
- For *Linear probing*, only the last conv in the classification head is trainable.
- For *Full tuning*, all parameters are trainable.
| Model | Size | Epochs | $AP^b$ | $AP^b_{50}$ | $AP^b_{75}$ | $AP^m$ | $AP^m_{50}$ | $AP^m_{75}$ | Log |
|---|---|---|---|---|---|---|---|---|---|
| Linear probing | | | | | | | | | |
| [YOLOE-v8-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8s-seg-coco-pe.pt) | 640 | 10 | 35.6 | 51.5 | 38.9 | 30.3 | 48.2 | 32.0 | [LP](./logs/yoloe-v8s-seg-coco-pe/) |
| [YOLOE-v8-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8m-seg-coco-pe.pt) | 640 | 10 | 42.2 | 59.2 | 46.3 | 35.5 | 55.6 | 37.7 | [LP](./logs/yoloe-v8m-seg-coco-pe/) |
| [YOLOE-v8-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8l-seg-coco-pe.pt) | 640 | 10 | 45.4 | 63.3 | 50.0 | 38.3 | 59.6 | 40.8 | [LP](./logs/yoloe-v8l-seg-coco-pe/) |
| [YOLOE-11-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11s-seg-coco-pe.pt) | 640 | 10 | 37.0 | 52.9 | 40.4 | 31.5 | 49.7 | 33.5 | [LP](./logs/yoloe-11s-seg-coco-pe/) |
| [YOLOE-11-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11m-seg-coco-pe.pt) | 640 | 10 | 43.1 | 60.6 | 47.4 | 36.5 | 56.9 | 39.0 | [LP](./logs/yoloe-11m-seg-coco-pe/) |
| [YOLOE-11-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11l-seg-coco-pe.pt) | 640 | 10 | 45.1 | 62.8 | 49.5 | 38.0 | 59.2 | 40.6 | [LP](./logs/yoloe-11l-seg-coco-pe/) |
| Full tuning | | | | | | | | | |
| [YOLOE-v8-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8s-seg-coco.pt) | 640 | 160 | 45.0 | 61.6 | 49.1 | 36.7 | 58.3 | 39.1 | [FT](./logs/yoloe-v8s-seg-coco/) |
| [YOLOE-v8-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8m-seg-coco.pt) | 640 | 80 | 50.4 | 67.0 | 55.2 | 40.9 | 63.7 | 43.5 | [FT](./logs/yoloe-v8m-seg-coco/) |
| [YOLOE-v8-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-v8l-seg-coco.pt) | 640 | 80 | 53.0 | 69.8 | 57.9 | 42.7 | 66.5 | 45.6 | [FT](./logs/yoloe-v8l-seg-coco/) |
| [YOLOE-11-S](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11s-seg-coco.pt) | 640 | 160 | 46.2 | 62.9 | 50.0 | 37.6 | 59.3 | 40.1 | [FT](./logs/yoloe-11s-seg-coco/) |
| [YOLOE-11-M](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11m-seg-coco.pt) | 640 | 80 | 51.3 | 68.3 | 56.0 | 41.5 | 64.8 | 44.3 | [FT](./logs/yoloe-11m-seg-coco/) |
| [YOLOE-11-L](https://huggingface.co/jameslahm/yoloe/blob/main/yoloe-11l-seg-coco.pt) | 640 | 80 | 52.6 | 69.7 | 57.5 | 42.4 | 66.2 | 45.2 | [FT](./logs/yoloe-11l-seg-coco/) |
## Installation
You can also quickly try YOLOE for [prediction](https://colab.research.google.com/drive/1LRFEVarAIVSnIeL_pCPtsFL87FsEe46U?usp=sharing) and [transferring](https://colab.research.google.com/drive/1y-r4y_owfFAfyqbqP2t64H7IqjURkKwe?usp=sharing) using the Colab notebooks.
A `conda` virtual environment is recommended.
```bash
conda create -n yoloe python=3.10 -y
conda activate yoloe
# If you clone this repo, please use this
pip install -r requirements.txt
# Or you can also directly install the repo by this
pip install git+https://github.com/THU-MIG/yoloe.git#subdirectory=third_party/CLIP
pip install git+https://github.com/THU-MIG/yoloe.git#subdirectory=third_party/ml-mobileclip
pip install git+https://github.com/THU-MIG/yoloe.git#subdirectory=third_party/lvis-api
pip install git+https://github.com/THU-MIG/yoloe.git
wget https://docs-assets.developer.apple.com/ml-research/datasets/mobileclip/mobileclip_blt.pt
```
## Demo
If the desired objects are not identified, please set a **smaller** confidence threshold, e.g., for visual prompts with handcrafted shapes or cross-image prompts.
```bash
# Optional for mirror: export HF_ENDPOINT=https://hf-mirror.com
pip install gradio==4.42.0 gradio_image_prompter==0.1.0 fastapi==0.112.2 huggingface-hub==0.26.3 gradio_client==1.3.0 pydantic==2.10.6
python app.py
# Please visit http://127.0.0.1:7860
```
## Prediction
```bash
# Download pretrained models
# Optional for mirror: export HF_ENDPOINT=https://hf-mirror.com
# Please replace the pt file with your desired model
pip install huggingface-hub==0.26.3
huggingface-cli download jameslahm/yoloe yoloe-v8l-seg.pt --local-dir pretrain
```
For yoloe-(v8s/m/l)/(11s/m/l)-seg, models can also be downloaded automatically using `from_pretrained`.
```python
from ultralytics import YOLOE
model = YOLOE.from_pretrained("jameslahm/yoloe-v8l-seg")
```
### Text prompt
```bash
python predict_text_prompt.py \
--source ultralytics/assets/bus.jpg \
--checkpoint pretrain/yoloe-v8l-seg.pt \
--names person dog cat \
--device cuda:0
```
### Visual prompt
```bash
python predict_visual_prompt.py
```
### Prompt free
```bash
python predict_prompt_free.py
```
## Transferring
After pretraining, YOLOE-v8 / YOLOE-11 can be re-parameterized into the same architecture as YOLOv8 / YOLO11, with **zero overhead for transferring**.
### Linear probing
Only the last conv, i.e., the prompt embedding, is trainable.
```bash
python train_pe.py
```
### Full tuning
All parameters are trainable, for better performance.
```bash
# For models with s scale, please change the epochs to 160 for longer training
python train_pe_all.py
```
## Validation
### Data
- Please download LVIS following [here](https://docs.ultralytics.com/zh/datasets/detect/lvis/) or [lvis.yaml](./ultralytics/cfg/datasets/lvis.yaml).
- We use this [`minival.txt`](./tools/lvis/minival.txt) with background images for evaluation.
```bash
# For evaluation with visual prompt, please obtain the referring data.
python tools/generate_lvis_visual_prompt_data.py
```
### Zero-shot evaluation on LVIS
- For text prompts, `python val.py`.
- For visual prompts, `python val_vp.py`
For *Fixed AP*, please refer to the comments in `val.py` and `val_vp.py`, and use `tools/eval_fixed_ap.py` for evaluation.
### Prompt-free evaluation
```bash
python val_pe_free.py
python tools/eval_open_ended.py --json ../datasets/lvis/annotations/lvis_v1_minival.json --pred runs/detect/val/predictions.json --fixed
```
### Downstream transfer on COCO
```bash
python val_coco.py
```
## Training
The training includes three stages:
- YOLOE is trained with text prompts for detection and segmentation for 30 epochs.
- Only the visual prompt encoder (SAVPE) is trained with visual prompts for 2 epochs.
- Only the specialized prompt embedding for the prompt-free scenario is trained for 1 epoch.
### Data
| Images | Raw Annotations | Processed Annotations |
|---|---|---|
| [Objects365v1](https://opendatalab.com/OpenDataLab/Objects365_v1) | [objects365_train.json](https://opendatalab.com/OpenDataLab/Objects365_v1) | [objects365_train_segm.json](https://huggingface.co/datasets/jameslahm/yoloe/blob/main/objects365_train_segm.json) |
| [GQA](https://nlp.stanford.edu/data/gqa/images.zip) | [final_mixed_train_no_coco.json](https://huggingface.co/GLIPModel/GLIP/blob/main/mdetr_annotations/final_mixed_train_no_coco.json) | [final_mixed_train_no_coco_segm.json](https://huggingface.co/datasets/jameslahm/yoloe/blob/main/final_mixed_train_no_coco_segm.json) |
| [Flickr30k](https://shannon.cs.illinois.edu/DenotationGraph/) | [final_flickr_separateGT_train.json](https://huggingface.co/GLIPModel/GLIP/blob/main/mdetr_annotations/final_flickr_separateGT_train.json) | [final_flickr_separateGT_train_segm.json](https://huggingface.co/datasets/jameslahm/yoloe/blob/main/final_flickr_separateGT_train_segm.json) |
For annotations, you can directly use our preprocessed ones or use the following script to obtain the processed annotations with segmentation masks.
```bash
# Generate segmentation data
conda create -n sam2 python==3.10.16
conda activate sam2
pip install -r third_party/sam2/requirements.txt
pip install -e third_party/sam2/
python tools/generate_sam_masks.py --img-path ../datasets/Objects365v1/images/train --json-path ../datasets/Objects365v1/annotations/objects365_train.json --batch
python tools/generate_sam_masks.py --img-path ../datasets/flickr/full_images/ --json-path ../datasets/flickr/annotations/final_flickr_separateGT_train.json
python tools/generate_sam_masks.py --img-path ../datasets/mixed_grounding/gqa/images --json-path ../datasets/mixed_grounding/annotations/final_mixed_train_no_coco.json
# Generate objects365v1 labels
python tools/generate_objects365v1.py
```
Then, please generate the data and embedding cache for training.
```bash
# Generate grounding segmentation cache
python tools/generate_grounding_cache.py --img-path ../datasets/flickr/full_images/ --json-path ../datasets/flickr/annotations/final_flickr_separateGT_train_segm.json
python tools/generate_grounding_cache.py --img-path ../datasets/mixed_grounding/gqa/images --json-path ../datasets/mixed_grounding/annotations/final_mixed_train_no_coco_segm.json
# Generate train label embeddings
python tools/generate_label_embedding.py
python tools/generate_global_neg_cat.py
```
Finally, please download MobileCLIP-B(LT) for the text encoder.
```bash
wget https://docs-assets.developer.apple.com/ml-research/datasets/mobileclip/mobileclip_blt.pt
```
### Text prompt
```bash
# For models with l scale, please change the initialization by referring to the comments in Line 549 in ultralytics/nn/modules/head.py
# If you want to train YOLOE only for detection, you can use `train.py`
python train_seg.py
```
### Visual prompt
```bash
# For visual prompt, because only SAVPE is trained, we can adopt the detection pipeline with less training time
# First, obtain the detection model
python tools/convert_segm2det.py
# Then, train the SAVPE module
python train_vp.py
# After training, please use tools/get_vp_segm.py to add the segmentation head
# python tools/get_vp_segm.py
```
### Prompt free
```bash
# Generate LVIS with single class for evaluation during training
python tools/generate_lvis_sc.py
# Similar to visual prompt, because only the specialized prompt embedding is trained, we can adopt the detection pipeline with less training time
python tools/convert_segm2det.py
python train_pe_free.py
# After training, please use tools/get_pf_free_segm.py to add the segmentation head
# python tools/get_pf_free_segm.py
```
## Export
After re-parameterization, YOLOE-v8 / YOLOE-11 can be exported into the identical format as YOLOv8 / YOLO11, with **zero overhead for inference**.
```bash
pip install onnx coremltools onnxslim
python export.py
```
## Benchmark
- For TensorRT, please refer to `benchmark.sh`; a usage sketch follows this list.
- For CoreML, please use the benchmark tool from [XCode 14](https://developer.apple.com/videos/play/wwdc2022/10027/).
- For prompt-free setting, please refer to `tools/benchmark_pf.py`.
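For example, a hedged sketch of the TensorRT flow, assuming `export.py` has produced an ONNX file at `pretrain/yoloe-v8l-seg.onnx` (the path is illustrative):

```bash
# benchmark.sh takes the model path without the .onnx suffix, builds ${MODEL}.engine with trtexec, and reports timings
bash benchmark.sh pretrain/yoloe-v8l-seg
```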
## Acknowledgement
The code base is built with [ultralytics](https://github.com/ultralytics/ultralytics), [YOLO-World](https://github.com/AILab-CVC/YOLO-World), [MobileCLIP](https://github.com/apple/ml-mobileclip), [lvis-api](https://github.com/lvis-dataset/lvis-api), [CLIP](https://github.com/openai/CLIP), and [GenerateU](https://github.com/FoundationVision/GenerateU).
Thanks for the great implementations!
## Citation
If our code or models help your work, please cite our paper:
```BibTeX
@misc{wang2025yoloerealtimeseeing,
title={YOLOE: Real-Time Seeing Anything},
author={Ao Wang and Lihao Liu and Hui Chen and Zijia Lin and Jungong Han and Guiguang Ding},
year={2025},
eprint={2503.07465},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.07465},
}
```
import torch
import numpy as np
import gradio as gr
import supervision as sv
from scipy.ndimage import binary_fill_holes
from ultralytics import YOLOE
from ultralytics.utils.torch_utils import smart_inference_mode
from ultralytics.models.yolo.yoloe.predict_vp import YOLOEVPSegPredictor
from gradio_image_prompter import ImagePrompter
from huggingface_hub import hf_hub_download
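# Download the requested YOLOE checkpoint from the jameslahm/yoloe Hugging Face repo and prepare it for inference.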
def init_model(model_id, is_pf=False):
filename = f"{model_id}-seg.pt" if not is_pf else f"{model_id}-seg-pf.pt"
path = hf_hub_download(repo_id="jameslahm/yoloe", filename=filename)
model = YOLOE(path)
model.eval()
model.to("cuda" if torch.cuda.is_available() else "cpu")
return model
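# Run YOLOE with the selected prompt mechanism (text, visual, or prompt-free) and return the image annotated with supervision.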
@smart_inference_mode()
def yoloe_inference(image, prompts, target_image, model_id, image_size, conf_thresh, iou_thresh, prompt_type):
model = init_model(model_id)
kwargs = {}
if prompt_type == "Text":
texts = prompts["texts"]
model.set_classes(texts, model.get_text_pe(texts))
elif prompt_type == "Visual":
kwargs = dict(
prompts=prompts,
predictor=YOLOEVPSegPredictor
)
if target_image:
model.predict(source=image, imgsz=image_size, conf=conf_thresh, iou=iou_thresh, return_vpe=True, **kwargs)
model.set_classes(["object0"], model.predictor.vpe)
model.predictor = None # unset VPPredictor
image = target_image
kwargs = {}
elif prompt_type == "Prompt-free":
vocab = model.get_vocab(prompts["texts"])
model = init_model(model_id, is_pf=True)
model.set_vocab(vocab, names=prompts["texts"])
model.model.model[-1].is_fused = True
model.model.model[-1].conf = 0.001
model.model.model[-1].max_det = 1000
results = model.predict(source=image, imgsz=image_size, conf=conf_thresh, iou=iou_thresh, **kwargs)
detections = sv.Detections.from_ultralytics(results[0])
resolution_wh = image.size
thickness = sv.calculate_optimal_line_thickness(resolution_wh=resolution_wh)
text_scale = sv.calculate_optimal_text_scale(resolution_wh=resolution_wh)
labels = [
f"{class_name} {confidence:.2f}"
for class_name, confidence
in zip(detections['class_name'], detections.confidence)
]
annotated_image = image.copy()
annotated_image = sv.MaskAnnotator(
color_lookup=sv.ColorLookup.INDEX,
opacity=0.4
).annotate(scene=annotated_image, detections=detections)
annotated_image = sv.BoxAnnotator(
color_lookup=sv.ColorLookup.INDEX,
thickness=thickness
).annotate(scene=annotated_image, detections=detections)
annotated_image = sv.LabelAnnotator(
color_lookup=sv.ColorLookup.INDEX,
text_scale=text_scale,
smart_position=True
).annotate(scene=annotated_image, detections=detections, labels=labels)
return annotated_image
def app():
with gr.Blocks():
with gr.Row():
with gr.Column():
with gr.Row():
raw_image = gr.Image(type="pil", label="Image", visible=True, interactive=True)
box_image = ImagePrompter(type="pil", label="DrawBox", visible=False, interactive=True)
mask_image = gr.ImageEditor(type="pil", label="DrawMask", visible=False, interactive=True, layers=False, canvas_size=(640, 640))
target_image = gr.Image(type="pil", label="Target Image", visible=False, interactive=True)
yoloe_infer = gr.Button(value="Detect & Segment Objects")
prompt_type = gr.Textbox(value="Text", visible=False)
with gr.Tab("Text") as text_tab:
texts = gr.Textbox(label="Input Texts", value='person,bus', placeholder='person,bus', visible=True, interactive=True)
with gr.Tab("Visual") as visual_tab:
with gr.Row():
visual_prompt_type = gr.Dropdown(choices=["bboxes", "masks"], value="bboxes", label="Visual Type", interactive=True)
visual_usage_type = gr.Radio(choices=["Intra-Image", "Cross-Image"], value="Intra-Image", label="Intra/Cross Image", interactive=True)
with gr.Tab("Prompt-Free") as prompt_free_tab:
gr.HTML(
"""
<p style='text-align: center'>
<b>Prompt-Free Mode is On</b>
</p>
""", show_label=False)
model_id = gr.Dropdown(
label="Model",
choices=[
"yoloe-v8s",
"yoloe-v8m",
"yoloe-v8l",
"yoloe-11s",
"yoloe-11m",
"yoloe-11l",
],
value="yoloe-v8l",
)
image_size = gr.Slider(
label="Image Size",
minimum=320,
maximum=1280,
step=32,
value=640,
)
conf_thresh = gr.Slider(
label="Confidence Threshold",
minimum=0.0,
maximum=1.0,
step=0.05,
value=0.25,
)
iou_thresh = gr.Slider(
label="IoU Threshold",
minimum=0.0,
maximum=1.0,
step=0.05,
value=0.70,
)
with gr.Column():
output_image = gr.Image(type="numpy", label="Annotated Image", visible=True)
def update_text_image_visibility():
return gr.update(value="Text"), gr.update(visible=True), gr.update(visible=False), gr.update(visible=False), gr.update(visible=False)
def update_visual_image_visiblity(visual_prompt_type, visual_usage_type):
if visual_prompt_type == "bboxes":
return gr.update(value="Visual"), gr.update(visible=False), gr.update(visible=True), gr.update(visible=False), gr.update(visible=(visual_usage_type == "Cross-Image"))
elif visual_prompt_type == "masks":
return gr.update(value="Visual"), gr.update(visible=False), gr.update(visible=False), gr.update(visible=True), gr.update(visible=(visual_usage_type == "Cross-Image"))
def update_pf_image_visibility():
return gr.update(value="Prompt-free"), gr.update(visible=True), gr.update(visible=False), gr.update(visible=False), gr.update(visible=False)
text_tab.select(
fn=update_text_image_visibility,
inputs=None,
outputs=[prompt_type, raw_image, box_image, mask_image, target_image]
)
visual_tab.select(
fn=update_visual_image_visiblity,
inputs=[visual_prompt_type, visual_usage_type],
outputs=[prompt_type, raw_image, box_image, mask_image, target_image]
)
prompt_free_tab.select(
fn=update_pf_image_visibility,
inputs=None,
outputs=[prompt_type, raw_image, box_image, mask_image, target_image]
)
def update_visual_prompt_type(visual_prompt_type):
if visual_prompt_type == "bboxes":
return gr.update(visible=True), gr.update(visible=False)
if visual_prompt_type == "masks":
return gr.update(visible=False), gr.update(visible=True)
return gr.update(visible=False), gr.update(visible=False)
def update_visual_usage_type(visual_usage_type):
if visual_usage_type == "Intra-Image":
return gr.update(visible=False)
if visual_usage_type == "Cross-Image":
return gr.update(visible=True)
return gr.update(visible=False)
visual_prompt_type.change(
fn=update_visual_prompt_type,
inputs=[visual_prompt_type],
outputs=[box_image, mask_image]
)
visual_usage_type.change(
fn=update_visual_usage_type,
inputs=[visual_usage_type],
outputs=[target_image]
)
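# Collect the inputs from the active tab (text, boxes, masks, or prompt-free) and build the prompts dict expected by yoloe_inference.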
def run_inference(raw_image, box_image, mask_image, target_image, texts, model_id, image_size, conf_thresh, iou_thresh, prompt_type, visual_prompt_type, visual_usage_type):
# add text/built-in prompts
if prompt_type == "Text" or prompt_type == "Prompt-free":
target_image = None
image = raw_image
if prompt_type == "Prompt-free":
with open('tools/ram_tag_list.txt', 'r') as f:
texts = [x.strip() for x in f.readlines()]
else:
texts = [text.strip() for text in texts.split(',')]
prompts = {
"texts": texts
}
# add visual prompt
elif prompt_type == "Visual":
if visual_usage_type != "Cross-Image":
target_image = None
if visual_prompt_type == "bboxes":
image, points = box_image["image"], box_image["points"]
points = np.array(points)
if len(points) == 0:
gr.Warning("No boxes are provided. No image output.", visible=True)
return gr.update(value=None)
bboxes = np.array([p[[0, 1, 3, 4]] for p in points if p[2] == 2])
prompts = {
"bboxes": bboxes,
"cls": np.array([0] * len(bboxes))
}
elif visual_prompt_type == "masks":
image, masks = mask_image["background"], mask_image["layers"][0]
# image = image.convert("RGB")
masks = np.array(masks.convert("L"))
masks = binary_fill_holes(masks).astype(np.uint8)
masks[masks > 0] = 1
if masks.sum() == 0:
gr.Warning("No masks are provided. No image output.", visible=True)
return gr.update(value=None)
prompts = {
"masks": masks[None],
"cls": np.array([0])
}
return yoloe_inference(image, prompts, target_image, model_id, image_size, conf_thresh, iou_thresh, prompt_type)
yoloe_infer.click(
fn=run_inference,
inputs=[raw_image, box_image, mask_image, target_image, texts, model_id, image_size, conf_thresh, iou_thresh, prompt_type, visual_prompt_type, visual_usage_type],
outputs=[output_image],
)
###################### Examples ##########################
text_examples = gr.Examples(
examples=[[
"ultralytics/assets/bus.jpg",
"person,bus",
"yoloe-v8l",
640,
0.25,
0.7]],
inputs=[raw_image, texts, model_id, image_size, conf_thresh, iou_thresh],
visible=True, cache_examples=False, label="Text Prompt Examples")
box_examples = gr.Examples(
examples=[[
{"image": "ultralytics/assets/bus_box.jpg", "points": [[235, 408, 2, 342, 863, 3]]},
"ultralytics/assets/zidane.jpg",
"yoloe-v8l",
640,
0.2,
0.7,
]],
inputs=[box_image, target_image, model_id, image_size, conf_thresh, iou_thresh],
visible=False, cache_examples=False, label="Box Visual Prompt Examples")
mask_examples = gr.Examples(
examples=[[
{"background": "ultralytics/assets/bus.jpg", "layers": ["ultralytics/assets/bus_mask.png"], "composite": "ultralytics/assets/bus_composite.jpg"},
"ultralytics/assets/zidane.jpg",
"yoloe-v8l",
640,
0.15,
0.7,
]],
inputs=[mask_image, target_image, model_id, image_size, conf_thresh, iou_thresh],
visible=False, cache_examples=False, label="Mask Visual Prompt Examples")
pf_examples = gr.Examples(
examples=[[
"ultralytics/assets/bus.jpg",
"yoloe-v8l",
640,
0.25,
0.7,
]],
inputs=[raw_image, model_id, image_size, conf_thresh, iou_thresh],
visible=False, cache_examples=False, label="Prompt-free Examples")
# Components update
def load_box_example(visual_usage_type):
return (gr.update(visible=True, value={"image": "ultralytics/assets/bus_box.jpg", "points": [[235, 408, 2, 342, 863, 3]]}),
gr.update(visible=(visual_usage_type=="Cross-Image")))
def load_mask_example(visual_usage_type):
return gr.update(visible=True), gr.update(visible=(visual_usage_type=="Cross-Image"))
box_examples.load_input_event.then(
fn=load_box_example,
inputs=visual_usage_type,
outputs=[box_image, target_image]
)
mask_examples.load_input_event.then(
fn=load_mask_example,
inputs=visual_usage_type,
outputs=[mask_image, target_image]
)
# Examples update
def update_text_examples():
return gr.Dataset(visible=True), gr.Dataset(visible=False), gr.Dataset(visible=False), gr.Dataset(visible=False)
def update_pf_examples():
return gr.Dataset(visible=False), gr.Dataset(visible=False), gr.Dataset(visible=False), gr.Dataset(visible=True)
def update_visual_examples(visual_prompt_type):
if visual_prompt_type == "bboxes":
return gr.Dataset(visible=False), gr.Dataset(visible=True), gr.Dataset(visible=False), gr.Dataset(visible=False),
elif visual_prompt_type == "masks":
return gr.Dataset(visible=False), gr.Dataset(visible=False), gr.Dataset(visible=True), gr.Dataset(visible=False),
text_tab.select(
fn=update_text_examples,
inputs=None,
outputs=[text_examples.dataset, box_examples.dataset, mask_examples.dataset, pf_examples.dataset]
)
visual_tab.select(
fn=update_visual_examples,
inputs=[visual_prompt_type],
outputs=[text_examples.dataset, box_examples.dataset, mask_examples.dataset, pf_examples.dataset]
)
prompt_free_tab.select(
fn=update_pf_examples,
inputs=None,
outputs=[text_examples.dataset, box_examples.dataset, mask_examples.dataset, pf_examples.dataset]
)
visual_prompt_type.change(
fn=update_visual_examples,
inputs=[visual_prompt_type],
outputs=[text_examples.dataset, box_examples.dataset, mask_examples.dataset, pf_examples.dataset]
)
visual_usage_type.change(
fn=update_visual_examples,
inputs=[visual_prompt_type],
outputs=[text_examples.dataset, box_examples.dataset, mask_examples.dataset, pf_examples.dataset]
)
gradio_app = gr.Blocks()
with gradio_app:
gr.HTML(
"""
<h1 style='text-align: center'>
<img src="/file=figures/logo.png" width="2.5%" style="display:inline;padding-bottom:4px">
YOLOE: Real-Time Seeing Anything
</h1>
""")
gr.HTML(
"""
<h3 style='text-align: center'>
<a href='https://arxiv.org/abs/2503.07465' target='_blank'>arXiv</a> | <a href='https://github.com/THU-MIG/yoloe' target='_blank'>github</a>
</h3>
""")
gr.Markdown(
"""
We introduce **YOLOE(ye)**, a highly **efficient**, **unified**, and **open** object detection and segmentation model that, like the human eye, works under different prompt mechanisms, such as *texts*, *visual inputs*, and the *prompt-free paradigm*.
"""
)
gr.Markdown(
"""
If the desired objects are not identified, please set a **smaller** confidence threshold, e.g., for visual prompts with handcrafted shapes or cross-image prompts.
"""
)
gr.Markdown(
"""
Drawing **multiple** boxes or handcrafted shapes as visual prompt in an image is also supported, which leads to more accurate prompt.
"""
)
with gr.Row():
with gr.Column():
app()
if __name__ == '__main__':
gradio_app.launch(allowed_paths=["figures"])
#!/bin/bash
# Build a TensorRT engine from ${MODEL}.onnx, then benchmark the saved engine with trtexec.
set -e
set -x
MODEL=$1
trtexec --onnx="${MODEL}.onnx" \
--fp16 \
--saveEngine="${MODEL}.engine" \
--timingCacheFile="${MODEL}.engine.timing.cache" \
--warmUp=500 \
--duration=10 \
--useCudaGraph \
--useSpinWait \
--noDataTransfers > /dev/null
trtexec \
--fp16 \
--loadEngine="${MODEL}.engine" \
--timingCacheFile="${MODEL}.engine.timing.cache" \
--warmUp=500 \
--duration=10 \
--useCudaGraph \
--useSpinWait \
--noDataTransfers
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy
ENV DEBIAN_FRONTEND=noninteractive
# RUN yum update && yum install -y git cmake wget build-essential
# RUN source /opt/dtk-dtk25.04/env.sh
# Install pip dependencies
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
-e .
-e third_party/lvis-api
-e third_party/ml-mobileclip
-e third_party/CLIP
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:latest image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Image is CUDA-optimized for YOLO11 single/multi-GPU training and inference
# Start FROM PyTorch image https://hub.docker.com/r/pytorch/pytorch or nvcr.io/nvidia/pytorch:23.03-py3
FROM pytorch/pytorch:2.5.0-cuda12.4-cudnn9-runtime
# Set environment variables
# Avoid DDP error "MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library" https://github.com/pytorch/pytorch/issues/37377
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1 \
MKL_THREADING_LAYER=GNU \
OMP_NUM_THREADS=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Install linux packages
# g++ required to build 'tflite_support' and 'lap' packages, libusb-1.0-0 required for 'tflite_support' package
# libsm6 required by libqxcb to create QT-based windows for visualization; set 'QT_DEBUG_PLUGINS=1' to test in docker
RUN apt-get update && \
apt-get install -y --no-install-recommends \
gcc git zip unzip wget curl htop libgl1 libglib2.0-0 libpython3-dev gnupg g++ libusb-1.0-0 libsm6 \
&& rm -rf /var/lib/apt/lists/*
# Security updates
# https://security.snyk.io/vuln/SNYK-UBUNTU1804-OPENSSL-3314796
RUN apt upgrade --no-install-recommends -y openssl tar
# Create working directory
WORKDIR /ultralytics
# Copy contents and configure git
COPY . .
RUN sed -i '/^\[http "https:\/\/github\.com\/"\]/,+1d' .git/config
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Install pip packages
RUN python3 -m pip install --upgrade pip wheel
# Note: -cu12 must be used with tensorrt
RUN pip install -e ".[export]" tensorrt-cu12 "albumentations>=1.4.6" comet pycocotools
# Run exports to AutoInstall packages
# Edge TPU export fails the first time so is run twice here
RUN yolo export model=tmp/yolo11n.pt format=edgetpu imgsz=32 || yolo export model=tmp/yolo11n.pt format=edgetpu imgsz=32
RUN yolo export model=tmp/yolo11n.pt format=ncnn imgsz=32
# Requires <= Python 3.10, bug with paddlepaddle==2.5.0 https://github.com/PaddlePaddle/X2Paddle/issues/991
RUN pip install "paddlepaddle>=2.6.0" x2paddle
# Fix error: `np.bool` was a deprecated alias for the builtin `bool` segmentation error in Tests
RUN pip install numpy==1.23.5
# Remove extra build files
RUN rm -rf tmp /root/.config/Ultralytics/persistent_cache.json
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest && sudo docker build -f docker/Dockerfile -t $t . && sudo docker push $t
# Pull and Run with access to all GPUs
# t=ultralytics/ultralytics:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus all $t
# Pull and Run with access to GPUs 2 and 3 (inside container CUDA devices will appear as 0 and 1)
# t=ultralytics/ultralytics:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus '"device=2,3"' $t
# Pull and Run with local directory access
# t=ultralytics/ultralytics:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus all -v "$(pwd)"/shared/datasets:/datasets $t
# Kill all
# sudo docker kill $(sudo docker ps -q)
# Kill all image-based
# sudo docker kill $(sudo docker ps -qa --filter ancestor=ultralytics/ultralytics:latest)
# DockerHub tag update
# t=ultralytics/ultralytics:latest tnew=ultralytics/ultralytics:v6.2 && sudo docker pull $t && sudo docker tag $t $tnew && sudo docker push $tnew
# Clean up
# sudo docker system prune -a --volumes
# Update Ubuntu drivers
# https://www.maketecheasier.com/install-nvidia-drivers-ubuntu/
# DDP test
# python -m torch.distributed.run --nproc_per_node 2 --master_port 1 train.py --epochs 3
# GCP VM from Image
# docker.io/ultralytics/ultralytics:latest
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:latest-arm64 image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Image is aarch64-compatible for Apple M1, M2, M3, Raspberry Pi and other ARM architectures
# Start FROM Ubuntu image https://hub.docker.com/_/ubuntu with "FROM arm64v8/ubuntu:22.04" (deprecated)
# Start FROM Debian image for arm64v8 https://hub.docker.com/r/arm64v8/debian (new)
FROM arm64v8/debian:bookworm-slim
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Install linux packages
# g++ required to build 'tflite_support' and 'lap' packages, libusb-1.0-0 required for 'tflite_support' package
# pkg-config and libhdf5-dev (not included) are needed to build 'h5py==3.11.0' aarch64 wheel required by 'tensorflow'
RUN apt-get update && \
apt-get install -y --no-install-recommends \
python3-pip git zip unzip wget curl htop gcc libgl1 libglib2.0-0 libpython3-dev gnupg g++ libusb-1.0-0 \
&& rm -rf /var/lib/apt/lists/*
# Create working directory
WORKDIR /ultralytics
# Copy contents and configure git
COPY . .
RUN sed -i '/^\[http "https:\/\/github\.com\/"\]/,+1d' .git/config
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Install pip packages
RUN python3 -m pip install --upgrade pip wheel
RUN pip install -e ".[export]"
# Creates a symbolic link to make 'python' point to 'python3'
RUN ln -sf /usr/bin/python3 /usr/bin/python
# Remove extra build files
RUN rm -rf /root/.config/Ultralytics/persistent_cache.json
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-arm64 && sudo docker build --platform linux/arm64 -f docker/Dockerfile-arm64 -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-arm64 && sudo docker run -it --ipc=host $t
# Pull and Run
# t=ultralytics/ultralytics:latest-arm64 && sudo docker pull $t && sudo docker run -it --ipc=host $t
# Pull and Run with local volume mounted
# t=ultralytics/ultralytics:latest-arm64 && sudo docker pull $t && sudo docker run -it --ipc=host -v "$(pwd)"/shared/datasets:/datasets $t
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:latest-conda image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Image is optimized for Ultralytics Anaconda (https://anaconda.org/conda-forge/ultralytics) installation and usage
# Start FROM miniconda3 image https://hub.docker.com/r/continuumio/miniconda3
FROM continuumio/miniconda3:latest
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Install linux packages
RUN apt-get update && \
apt-get install -y --no-install-recommends \
libgl1 \
&& rm -rf /var/lib/apt/lists/*
# Copy contents
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Install conda packages
# mkl required to fix 'OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory'
RUN conda config --set solver libmamba && \
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia && \
conda install -c conda-forge ultralytics mkl
# conda install -c pytorch -c nvidia -c conda-forge pytorch torchvision pytorch-cuda=12.1 ultralytics mkl
# Remove extra build files
RUN rm -rf /root/.config/Ultralytics/persistent_cache.json
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-conda && sudo docker build -f docker/Dockerfile-cpu -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-conda && sudo docker run -it --ipc=host $t
# Pull and Run
# t=ultralytics/ultralytics:latest-conda && sudo docker pull $t && sudo docker run -it --ipc=host $t
# Pull and Run with local volume mounted
# t=ultralytics/ultralytics:latest-conda && sudo docker pull $t && sudo docker run -it --ipc=host -v "$(pwd)"/shared/datasets:/datasets $t
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:latest-cpu image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Image is CPU-optimized for ONNX, OpenVINO and PyTorch YOLO11 deployments
# Use official Python base image for reproducibility (3.11.10 for export and 3.12.6 for inference)
FROM python:3.11.10-slim-bookworm
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Install linux packages
# g++ required to build 'tflite_support' and 'lap' packages, libusb-1.0-0 required for 'tflite_support' package
RUN apt-get update && \
apt-get install -y --no-install-recommends \
python3-pip git zip unzip wget curl htop libgl1 libglib2.0-0 libpython3-dev gnupg g++ libusb-1.0-0 \
&& rm -rf /var/lib/apt/lists/*
# Create working directory
WORKDIR /ultralytics
# Copy contents and configure git
COPY . .
RUN sed -i '/^\[http "https:\/\/github\.com\/"\]/,+1d' .git/config
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Install pip packages
RUN python3 -m pip install --upgrade pip wheel
RUN pip install -e ".[export]" --extra-index-url https://download.pytorch.org/whl/cpu
# Run exports to AutoInstall packages
RUN yolo export model=tmp/yolo11n.pt format=edgetpu imgsz=32
RUN yolo export model=tmp/yolo11n.pt format=ncnn imgsz=32
# Requires Python<=3.10, bug with paddlepaddle==2.5.0 https://github.com/PaddlePaddle/X2Paddle/issues/991
RUN pip install "paddlepaddle>=2.6.0" x2paddle
# Remove extra build files
RUN rm -rf tmp /root/.config/Ultralytics/persistent_cache.json
# Set default command to bash
CMD ["/bin/bash"]
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-cpu && sudo docker build -f docker/Dockerfile-cpu -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-cpu && sudo docker run -it --ipc=host --name NAME $t
# Pull and Run
# t=ultralytics/ultralytics:latest-cpu && sudo docker pull $t && sudo docker run -it --ipc=host --name NAME $t
# Pull and Run with local volume mounted
# t=ultralytics/ultralytics:latest-cpu && sudo docker pull $t && sudo docker run -it --ipc=host -v "$(pwd)"/shared/datasets:/datasets $t
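# ONNX export example (optional) -----------------------------------------------------------------------------------------
# Minimal sketch, assuming the latest-cpu tag above; exports the bundled yolo11n.pt to ONNX inside the container
# t=ultralytics/ultralytics:latest-cpu && sudo docker run -it --ipc=host $t yolo export model=yolo11n.pt format=onnx imgsz=320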
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:jetson-jetpack4 image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Supports JetPack4.x for YOLO11 on Jetson Nano, TX2, Xavier NX, AGX Xavier
# Start FROM https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-cuda
FROM nvcr.io/nvidia/l4t-cuda:10.2.460-runtime
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Add NVIDIA repositories for TensorRT dependencies
RUN wget -q -O - https://repo.download.nvidia.com/jetson/jetson-ota-public.asc | apt-key add - && \
echo "deb https://repo.download.nvidia.com/jetson/common r32.7 main" > /etc/apt/sources.list.d/nvidia-l4t-apt-source.list && \
echo "deb https://repo.download.nvidia.com/jetson/t194 r32.7 main" >> /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
# Install dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
git python3.8 python3.8-dev python3-pip python3-libnvinfer libopenmpi-dev libopenblas-base libomp-dev gcc \
&& rm -rf /var/lib/apt/lists/*
# Create symbolic links for python3.8 and pip3
RUN ln -sf /usr/bin/python3.8 /usr/bin/python3
RUN ln -s /usr/bin/pip3 /usr/bin/pip
# Create working directory
WORKDIR /ultralytics
# Copy contents and configure git
COPY . .
# Strip any [http "https://github.com/"] auth header block (e.g. injected by CI checkout) from .git/config
RUN sed -i '/^\[http "https:\/\/github\.com\/"\]/,+1d' .git/config
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Download onnxruntime-gpu 1.8.0 and tensorrt 8.2.0.6
# Other versions can be seen in https://elinux.org/Jetson_Zoo and https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048
ADD https://nvidia.box.com/shared/static/gjqofg7rkg97z3gc8jeyup6t8n9j8xjw.whl onnxruntime_gpu-1.8.0-cp38-cp38-linux_aarch64.whl
ADD https://forums.developer.nvidia.com/uploads/short-url/hASzFOm9YsJx6VVFrDW1g44CMmv.whl tensorrt-8.2.0.6-cp38-none-linux_aarch64.whl
# Install pip packages
RUN python3 -m pip install --upgrade pip wheel
RUN pip install \
onnxruntime_gpu-1.8.0-cp38-cp38-linux_aarch64.whl \
tensorrt-8.2.0.6-cp38-none-linux_aarch64.whl \
https://github.com/ultralytics/assets/releases/download/v0.0.0/torch-1.11.0a0+gitbc2c6ed-cp38-cp38-linux_aarch64.whl \
https://github.com/ultralytics/assets/releases/download/v0.0.0/torchvision-0.12.0a0+9b5a3fe-cp38-cp38-linux_aarch64.whl
RUN pip install -e ".[export]"
# Remove extra build files
RUN rm -rf *.whl /root/.config/Ultralytics/persistent_cache.json
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-jetson-jetpack4 && sudo docker build --platform linux/arm64 -f docker/Dockerfile-jetson-jetpack4 -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-jetson-jetpack4 && sudo docker run -it --ipc=host $t
# Pull and Run
# t=ultralytics/ultralytics:latest-jetson-jetpack4 && sudo docker pull $t && sudo docker run -it --ipc=host $t
# Pull and Run with NVIDIA runtime
# t=ultralytics/ultralytics:latest-jetson-jetpack4 && sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t
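# TensorRT export example (optional) -------------------------------------------------------------------------------------
# Minimal sketch, assuming a JetPack 4.x host with the NVIDIA container runtime available; builds a TensorRT engine from yolo11n.pt
# t=ultralytics/ultralytics:latest-jetson-jetpack4 && sudo docker run -it --ipc=host --runtime=nvidia $t yolo export model=yolo11n.pt format=engine imgsz=320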
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:jetson-jetpack5 image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Supports JetPack5.x for YOLO11 on Jetson Xavier NX, AGX Xavier, AGX Orin, Orin Nano and Orin NX
# Start FROM https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-pytorch
FROM nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Install linux packages
# g++ required to build 'tflite_support' and 'lap' packages
# libusb-1.0-0 required for 'tflite_support' package when exporting to TFLite
# pkg-config and libhdf5-dev (not included) are needed to build 'h5py==3.11.0' aarch64 wheel required by 'tensorflow'
RUN apt-get update && \
apt-get install -y --no-install-recommends \
gcc git zip unzip wget curl htop libgl1 libglib2.0-0 libpython3-dev gnupg g++ libusb-1.0-0 \
&& rm -rf /var/lib/apt/lists/*
# Create working directory
WORKDIR /ultralytics
# Copy contents and configure git
COPY . .
# Strip any [http "https://github.com/"] auth header block (e.g. injected by CI checkout) from .git/config
RUN sed -i '/^\[http "https:\/\/github\.com\/"\]/,+1d' .git/config
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Remove opencv-python from Ultralytics dependencies as it conflicts with opencv-python installed in base image
RUN sed -i '/opencv-python/d' pyproject.toml
# Download onnxruntime-gpu 1.15.1 for Jetson Linux 35.2.1 (JetPack 5.1). Other versions can be seen in https://elinux.org/Jetson_Zoo#ONNX_Runtime
ADD https://nvidia.box.com/shared/static/mvdcltm9ewdy2d5nurkiqorofz1s53ww.whl onnxruntime_gpu-1.15.1-cp38-cp38-linux_aarch64.whl
# Install pip packages manually for TensorRT compatibility https://github.com/NVIDIA/TensorRT/issues/2567
RUN python3 -m pip install --upgrade pip wheel
RUN pip install onnxruntime_gpu-1.15.1-cp38-cp38-linux_aarch64.whl
RUN pip install -e ".[export]"
# Remove extra build files
RUN rm -rf *.whl /root/.config/Ultralytics/persistent_cache.json
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-jetson-jetpack5 && sudo docker build --platform linux/arm64 -f docker/Dockerfile-jetson-jetpack5 -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-jetson-jetpack5 && sudo docker run -it --ipc=host $t
# Pull and Run
# t=ultralytics/ultralytics:latest-jetson-jetpack5 && sudo docker pull $t && sudo docker run -it --ipc=host $t
# Pull and Run with NVIDIA runtime
# t=ultralytics/ultralytics:latest-jetson-jetpack5 && sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t
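# Benchmark example (optional) -------------------------------------------------------------------------------------------
# Minimal sketch, assuming a JetPack 5.x device with the NVIDIA container runtime; compares yolo11n.pt speed and accuracy across the installed export formats
# t=ultralytics/ultralytics:latest-jetson-jetpack5 && sudo docker run -it --ipc=host --runtime=nvidia $t yolo benchmark model=yolo11n.pt imgsz=160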
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:jetson-jetpack6 image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Supports JetPack6.x for YOLO11 on Jetson AGX Orin, Orin NX and Orin Nano Series
# Start FROM https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-jetpack
FROM nvcr.io/nvidia/l4t-jetpack:r36.3.0
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
# Downloads to user config dir
ADD https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.ttf \
https://github.com/ultralytics/assets/releases/download/v0.0.0/Arial.Unicode.ttf \
/root/.config/Ultralytics/
# Install dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
git python3-pip libopenmpi-dev libopenblas-base libomp-dev \
&& rm -rf /var/lib/apt/lists/*
# Create working directory
WORKDIR /ultralytics
# Copy contents and configure git
COPY . .
# Strip any [http "https://github.com/"] auth header block (e.g. injected by CI checkout) from .git/config
RUN sed -i '/^\[http "https:\/\/github\.com\/"\]/,+1d' .git/config
ADD https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt .
# Download onnxruntime-gpu 1.18.0 from https://elinux.org/Jetson_Zoo and https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048
ADD https://nvidia.box.com/shared/static/48dtuob7meiw6ebgfsfqakc9vse62sg4.whl onnxruntime_gpu-1.18.0-cp310-cp310-linux_aarch64.whl
# Pip install onnxruntime-gpu, torch, torchvision and ultralytics
RUN python3 -m pip install --upgrade pip wheel
RUN pip install \
onnxruntime_gpu-1.18.0-cp310-cp310-linux_aarch64.whl \
https://github.com/ultralytics/assets/releases/download/v0.0.0/torch-2.3.0-cp310-cp310-linux_aarch64.whl \
https://github.com/ultralytics/assets/releases/download/v0.0.0/torchvision-0.18.0a0+6043bc2-cp310-cp310-linux_aarch64.whl
RUN pip install -e ".[export]"
# Remove extra build files
RUN rm -rf *.whl /root/.config/Ultralytics/persistent_cache.json
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-jetson-jetpack6 && sudo docker build --platform linux/arm64 -f docker/Dockerfile-jetson-jetpack6 -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-jetson-jetpack6 && sudo docker run -it --ipc=host $t
# Pull and Run
# t=ultralytics/ultralytics:latest-jetson-jetpack6 && sudo docker pull $t && sudo docker run -it --ipc=host $t
# Pull and Run with NVIDIA runtime
# t=ultralytics/ultralytics:latest-jetson-jetpack6 && sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t
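# Predict on mounted images (optional) -----------------------------------------------------------------------------------
# Minimal sketch, assuming a JetPack 6.x device with the NVIDIA container runtime; the local ./images directory is a hypothetical example path
# t=ultralytics/ultralytics:latest-jetson-jetpack6 && sudo docker run -it --ipc=host --runtime=nvidia -v "$(pwd)"/images:/ultralytics/images $t yolo predict model=yolo11n.pt source=images imgsz=320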
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Builds ultralytics/ultralytics:latest-jupyter image on DockerHub https://hub.docker.com/r/ultralytics/ultralytics
# Image provides JupyterLab interface for interactive YOLO development and includes tutorial notebooks
# Start from Python-based Ultralytics image for full Python environment
FROM ultralytics/ultralytics:latest-python
# Install JupyterLab for interactive development
RUN /usr/local/bin/pip install jupyterlab
# Create persistent data directory structure
RUN mkdir /data
# Configure YOLO directory paths
RUN mkdir /data/datasets && /usr/local/bin/yolo settings datasets_dir="/data/datasets"
RUN mkdir /data/weights && /usr/local/bin/yolo settings weights_dir="/data/weights"
RUN mkdir /data/runs && /usr/local/bin/yolo settings runs_dir="/data/runs"
# Start JupyterLab with tutorial notebook
ENTRYPOINT ["/usr/local/bin/jupyter", "lab", "--allow-root", "--ip=*", "/ultralytics/examples/tutorial.ipynb"]
# Usage Examples -------------------------------------------------------------------------------------------------------
# Build and Push
# t=ultralytics/ultralytics:latest-jupyter && sudo docker build -f docker/Dockerfile-jupyter -t $t . && sudo docker push $t
# Run
# t=ultralytics/ultralytics:latest-jupyter && sudo docker run -it --ipc=host -p 8888:8888 $t
# Pull and Run
# t=ultralytics/ultralytics:latest-jupyter && sudo docker pull $t && sudo docker run -it --ipc=host -p 8888:8888 $t
# Pull and Run with local volume mounted
# t=ultralytics/ultralytics:latest-jupyter && sudo docker pull $t && sudo docker run -it --ipc=host -p 8888:8888 -v "$(pwd)"/datasets:/data/datasets $t
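# Run detached and retrieve the login URL (optional) ---------------------------------------------------------------------
# Minimal sketch, assuming the latest-jupyter tag above; the container name yolo-jupyter is a hypothetical example
# t=ultralytics/ultralytics:latest-jupyter && sudo docker run -d --ipc=host -p 8888:8888 --name yolo-jupyter $t && sudo docker logs -f yolo-jupyter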