Commit af155c51 authored by chenzk

v1.0

---
comments: true
description: Explore the comprehensive Argoverse dataset by Argo AI for 3D tracking, motion forecasting, and stereo depth estimation in autonomous driving research.
keywords: Argoverse dataset, autonomous driving, 3D tracking, motion forecasting, stereo depth estimation, Argo AI, LiDAR point clouds, high-resolution images, HD maps
---
# Argoverse Dataset
The [Argoverse](https://www.argoverse.org/) dataset is a collection of data designed to support research in autonomous driving tasks, such as 3D tracking, motion forecasting, and stereo depth estimation. Developed by Argo AI, the dataset provides a wide range of high-quality sensor data, including high-resolution images, LiDAR point clouds, and map data.
!!! note
The Argoverse dataset `*.zip` file required for training was removed from Amazon S3 after the shutdown of Argo AI by Ford, but we have made it available for manual download on [Google Drive](https://drive.google.com/file/d/1st9qW3BeIwQsnR0t8mRpvbsSWIo16ACi/view?usp=drive_link).
## Key Features
- Argoverse contains over 290K labeled 3D object tracks and 5 million object instances across 1,263 distinct scenes.
- The dataset includes high-resolution camera images, LiDAR point clouds, and richly annotated HD maps.
- Annotations include 3D bounding boxes for objects, object tracks, and trajectory information.
- Argoverse provides multiple subsets for different tasks, such as 3D tracking, motion forecasting, and stereo depth estimation.
## Dataset Structure
The Argoverse dataset is organized into three main subsets:
1. **Argoverse 3D Tracking**: This subset contains 113 scenes with over 290K labeled 3D object tracks, focusing on 3D object tracking tasks. It includes LiDAR point clouds, camera images, and sensor calibration information.
2. **Argoverse Motion Forecasting**: This subset consists of 324K vehicle trajectories collected from 60 hours of driving data, suitable for motion forecasting tasks.
3. **Argoverse Stereo Depth Estimation**: This subset is designed for stereo depth estimation tasks and includes over 10K stereo image pairs with corresponding LiDAR point clouds for ground truth depth estimation.
## Applications
The Argoverse dataset is widely used for training and evaluating [deep learning](https://www.ultralytics.com/glossary/deep-learning-dl) models in autonomous driving tasks such as 3D object tracking, motion forecasting, and stereo depth estimation. The dataset's diverse set of sensor data, object annotations, and map information make it a valuable resource for researchers and practitioners in the field of autonomous driving.
## Dataset YAML
A YAML (YAML Ain't Markup Language) file defines the dataset configuration, including the dataset's paths, classes, and other relevant settings. For the Argoverse dataset, the `Argoverse.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/Argoverse.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/Argoverse.yaml).
!!! example "ultralytics/cfg/datasets/Argoverse.yaml"
```yaml
--8<-- "ultralytics/cfg/datasets/Argoverse.yaml"
```
## Usage
To train a YOLO11n model on the Argoverse dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch) with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="Argoverse.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=Argoverse.yaml model=yolo11n.pt epochs=100 imgsz=640
```
## Sample Data and Annotations
The Argoverse dataset contains a diverse set of sensor data, including camera images, LiDAR point clouds, and HD map information, providing rich context for autonomous driving tasks. Here are some examples of data from the dataset, along with their corresponding annotations:
![Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/argoverse-3d-tracking-sample.avif)
- **Argoverse 3D Tracking**: This image demonstrates an example of 3D object tracking, where objects are annotated with 3D bounding boxes. The dataset provides LiDAR point clouds and camera images to facilitate the development of models for this task.
The example showcases the variety and complexity of the data in the Argoverse dataset and highlights the importance of high-quality sensor data for autonomous driving tasks.
## Citations and Acknowledgments
If you use the Argoverse dataset in your research or development work, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@inproceedings{chang2019argoverse,
title={Argoverse: 3D Tracking and Forecasting with Rich Maps},
author={Chang, Ming-Fang and Lambert, John and Sangkloy, Patsorn and Singh, Jagjeet and Bak, Slawomir and Hartnett, Andrew and Wang, Dequan and Carr, Peter and Lucey, Simon and Ramanan, Deva and others},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={8748--8757},
year={2019}
}
```
We would like to acknowledge Argo AI for creating and maintaining the Argoverse dataset as a valuable resource for the autonomous driving research community. For more information about the Argoverse dataset and its creators, visit the [Argoverse dataset website](https://www.argoverse.org/).
## FAQ
### What is the Argoverse dataset and its key features?
The [Argoverse](https://www.argoverse.org/) dataset, developed by Argo AI, supports autonomous driving research. It includes over 290K labeled 3D object tracks and 5 million object instances across 1,263 distinct scenes. The dataset provides high-resolution camera images, LiDAR point clouds, and annotated HD maps, making it valuable for tasks like 3D tracking, motion forecasting, and stereo depth estimation.
### How can I train an Ultralytics YOLO model using the Argoverse dataset?
To train a YOLO11 model with the Argoverse dataset, use the provided YAML configuration file and the following code:
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="Argoverse.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=Argoverse.yaml model=yolo11n.pt epochs=100 imgsz=640
```
For a detailed explanation of the arguments, refer to the model [Training](../../modes/train.md) page.
### What types of data and annotations are available in the Argoverse dataset?
The Argoverse dataset includes various sensor data types such as high-resolution camera images, LiDAR point clouds, and HD map data. Annotations include 3D bounding boxes, object tracks, and trajectory information. These comprehensive annotations are essential for accurate model training in tasks like 3D object tracking, motion forecasting, and stereo depth estimation.
### How is the Argoverse dataset structured?
The dataset is divided into three main subsets:
1. **Argoverse 3D Tracking**: Contains 113 scenes with over 290K labeled 3D object tracks, focusing on 3D object tracking tasks. It includes LiDAR point clouds, camera images, and sensor calibration information.
2. **Argoverse Motion Forecasting**: Consists of 324K vehicle trajectories collected from 60 hours of driving data, suitable for motion forecasting tasks.
3. **Argoverse Stereo Depth Estimation**: Includes over 10K stereo image pairs with corresponding LiDAR point clouds for ground truth depth estimation.
### Where can I download the Argoverse dataset now that it has been removed from Amazon S3?
The Argoverse dataset `*.zip` file, previously available on Amazon S3, can now be manually downloaded from [Google Drive](https://drive.google.com/file/d/1st9qW3BeIwQsnR0t8mRpvbsSWIo16ACi/view?usp=drive_link).
### What is the YAML configuration file used for with the Argoverse dataset?
A YAML file contains the dataset's paths, classes, and other essential information. For the Argoverse dataset, the configuration file, `Argoverse.yaml`, can be found at the following link: [Argoverse.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/Argoverse.yaml).
For more information about YAML configurations, see our [datasets](../index.md) guide.
---
comments: true
description: Explore the brain tumor detection dataset with MRI/CT images. Essential for training AI models for early diagnosis and treatment planning.
keywords: brain tumor dataset, MRI scans, CT scans, brain tumor detection, medical imaging, AI in healthcare, computer vision, early diagnosis, treatment planning
---
# Brain Tumor Dataset
A brain tumor detection dataset consists of medical images from MRI or CT scans, containing information about brain tumor presence, location, and characteristics. This dataset is essential for training [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) algorithms to automate brain tumor identification, aiding in early diagnosis and treatment planning.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/ogTBBD8McRk"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Brain Tumor Detection using Ultralytics HUB
</p>
## Dataset Structure
The brain tumor dataset is divided into two subsets:
- **Training set**: Consisting of 893 images, each accompanied by corresponding annotations.
- **Testing set**: Comprising 223 images, with annotations paired for each one.
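As a quick check on these counts, the split works out to roughly 80/20. The snippet below only does that arithmetic; the variable names are illustrative and are not part of any Ultralytics API.

```python
# Image counts taken from the split described above.
train_images = 893
test_images = 223

total = train_images + test_images
train_share = train_images / total
test_share = test_images / total

print(f"total images: {total}")
print(f"train share: {train_share:.1%}")
print(f"test share: {test_share:.1%}")
```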
## Applications
The application of brain tumor detection using computer vision enables early diagnosis, treatment planning, and monitoring of tumor progression. By analyzing medical imaging data like MRI or CT scans, computer vision systems assist in accurately identifying brain tumors, aiding in timely medical intervention and personalized treatment strategies.
## Dataset YAML
A YAML (YAML Ain't Markup Language) file defines the dataset configuration, including the dataset's paths, classes, and other relevant settings. For the brain tumor dataset, the `brain-tumor.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/brain-tumor.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/brain-tumor.yaml).
!!! example "ultralytics/cfg/datasets/brain-tumor.yaml"
```yaml
--8<-- "ultralytics/cfg/datasets/brain-tumor.yaml"
```
## Usage
To train a YOLO11n model on the brain tumor dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch) with an image size of 640, utilize the provided code snippets. For a detailed list of available arguments, consult the model's [Training](../../modes/train.md) page.
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="brain-tumor.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=brain-tumor.yaml model=yolo11n.pt epochs=100 imgsz=640
```
!!! example "Inference Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("path/to/best.pt") # load a brain-tumor fine-tuned model
# Inference using the model
results = model.predict("https://ultralytics.com/assets/brain-tumor-sample.jpg")
```
=== "CLI"
```bash
# Start prediction with a finetuned *.pt model
yolo detect predict model='path/to/best.pt' imgsz=640 source="https://ultralytics.com/assets/brain-tumor-sample.jpg"
```
## Sample Images and Annotations
The brain tumor dataset encompasses a wide array of images featuring diverse object categories and intricate scenes. Below are examples of images from the dataset, along with their annotations:
![Brain tumor dataset sample image](https://github.com/ultralytics/docs/releases/download/0/brain-tumor-dataset-sample-image.avif)
- **Mosaiced Image**: Displayed here is a training batch comprising mosaiced dataset images. Mosaicing, a training technique, consolidates multiple images into one, enhancing batch diversity. This approach aids in improving the model's capacity to generalize across various object sizes, aspect ratios, and contexts.
This example highlights the diversity and intricacy of images within the brain tumor dataset, underscoring the advantages of incorporating mosaicing during the training phase.
## Citations and Acknowledgments
The dataset is released under the [AGPL-3.0 License](https://github.com/ultralytics/ultralytics/blob/main/LICENSE).
## FAQ
### What is the structure of the brain tumor dataset available in Ultralytics documentation?
The brain tumor dataset is divided into two subsets: the **training set** consists of 893 images with corresponding annotations, while the **testing set** comprises 223 images with paired annotations. This structured division aids in developing robust and accurate computer vision models for detecting brain tumors. For more information on the dataset structure, visit the [Dataset Structure](#dataset-structure) section.
### How can I train a YOLO11 model on the brain tumor dataset using Ultralytics?
You can train a YOLO11 model on the brain tumor dataset for 100 epochs with an image size of 640 pixels using either Python or the CLI. Examples of both follow:
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="brain-tumor.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=brain-tumor.yaml model=yolo11n.pt epochs=100 imgsz=640
```
For a detailed list of available arguments, refer to the [Training](../../modes/train.md) page.
### What are the benefits of using the brain tumor dataset for AI in healthcare?
Using the brain tumor dataset in AI projects enables early diagnosis and treatment planning for brain tumors. It helps in automating brain tumor identification through computer vision, facilitating accurate and timely medical interventions, and supporting personalized treatment strategies. This application holds significant potential in improving patient outcomes and medical efficiencies.
### How do I perform inference using a fine-tuned YOLO11 model on the brain tumor dataset?
Inference using a fine-tuned YOLO11 model can be performed with either Python or CLI approaches. Here are the examples:
!!! example "Inference Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("path/to/best.pt") # load a brain-tumor fine-tuned model
# Inference using the model
results = model.predict("https://ultralytics.com/assets/brain-tumor-sample.jpg")
```
=== "CLI"
```bash
# Start prediction with a finetuned *.pt model
yolo detect predict model='path/to/best.pt' imgsz=640 source="https://ultralytics.com/assets/brain-tumor-sample.jpg"
```
### Where can I find the YAML configuration for the brain tumor dataset?
The YAML configuration file for the brain tumor dataset can be found at [brain-tumor.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/brain-tumor.yaml). This file includes paths, classes, and additional relevant information necessary for training and evaluating models on this dataset.
---
comments: true
description: Explore the COCO dataset for object detection and segmentation. Learn about its structure, usage, pretrained models, and key features.
keywords: COCO dataset, object detection, segmentation, benchmarking, computer vision, pose estimation, YOLO models, COCO annotations
---
# COCO Dataset
The [COCO](https://cocodataset.org/#home) (Common Objects in Context) dataset is a large-scale object detection, segmentation, and captioning dataset. It is designed to encourage research on a wide variety of object categories and is commonly used for benchmarking [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) models. It is an essential dataset for researchers and developers working on object detection, segmentation, and pose estimation tasks.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/uDrn9QZJ2lk"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Ultralytics COCO Dataset Overview
</p>
## COCO Pretrained Models
{% include "macros/yolo-det-perf.md" %}
## Key Features
- COCO contains 330K images, with 200K images having annotations for object detection, segmentation, and captioning tasks.
- The dataset comprises 80 object categories, including common objects like cars, bicycles, and animals, as well as more specific categories such as umbrellas, handbags, and sports equipment.
- Annotations include object bounding boxes, segmentation masks, and captions for each image.
- COCO provides standardized evaluation metrics like [mean Average Precision](https://www.ultralytics.com/glossary/mean-average-precision-map) (mAP) for object detection, and mean Average [Recall](https://www.ultralytics.com/glossary/recall) (mAR) for segmentation tasks, making it suitable for comparing model performance.
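mAP depends on deciding whether a predicted box matches a ground-truth box, which is done with Intersection over Union (IoU). COCO's primary detection metric averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The following is a minimal sketch of that matching step, not the official COCO evaluation code: `box_iou` is a hypothetical helper name, and boxes are assumed to be `(x1, y1, x2, y2)` pixel corners.

```python
def box_iou(box_a, box_b):
    """Return the IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# The ten IoU thresholds behind the COCO-style mAP@[0.50:0.95] average.
coco_iou_thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Illustrative boxes (made-up values): a prediction slightly offset from its target.
prediction = (10, 10, 50, 50)
ground_truth = (12, 12, 52, 52)

iou = box_iou(prediction, ground_truth)
matched_at = [t for t in coco_iou_thresholds if iou >= t]
print(f"IoU = {iou:.3f}, counts as a match at thresholds {matched_at}")
```

A prediction that counts as correct at IoU 0.50 can still fail at the stricter thresholds, which is why mAP@[0.50:0.95] rewards tighter localization than mAP@0.50 alone.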
## Dataset Structure
The COCO dataset is split into three subsets:
1. **Train2017**: This subset contains 118K images for training object detection, segmentation, and captioning models.
2. **Val2017**: This subset has 5K images used for validation purposes during model training.
3. **Test2017**: This subset consists of 20K images used for testing and benchmarking the trained models. Ground truth annotations for this subset are not publicly available, and the results are submitted to the [COCO evaluation server](https://codalab.lisn.upsaclay.fr/competitions/7384) for performance evaluation.
## Applications
The COCO dataset is widely used for training and evaluating [deep learning](https://www.ultralytics.com/glossary/deep-learning-dl) models in object detection (such as YOLO, Faster R-CNN, and SSD), [instance segmentation](https://www.ultralytics.com/glossary/instance-segmentation) (such as Mask R-CNN), and keypoint detection (such as OpenPose). The dataset's diverse set of object categories, large number of annotated images, and standardized evaluation metrics make it an essential resource for computer vision researchers and practitioners.
## Dataset YAML
A YAML (YAML Ain't Markup Language) file defines the dataset configuration, including the dataset's paths, classes, and other relevant settings. For the COCO dataset, the `coco.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco.yaml).
!!! example "ultralytics/cfg/datasets/coco.yaml"
```yaml
--8<-- "ultralytics/cfg/datasets/coco.yaml"
```
## Usage
To train a YOLO11n model on the COCO dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch) with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="coco.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=coco.yaml model=yolo11n.pt epochs=100 imgsz=640
```
## Sample Images and Annotations
The COCO dataset contains a diverse set of images with various object categories and complex scenes. Here are some examples of images from the dataset, along with their corresponding annotations:
![Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/mosaiced-coco-dataset-sample.avif)
- **Mosaiced Image**: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts.
The example showcases the variety and complexity of the images in the COCO dataset and the benefits of using mosaicing during the training process.
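To make the idea concrete, here is a deliberately simplified sketch of the bookkeeping a mosaic needs: when four images are placed on one canvas, each image's bounding boxes must be shifted by that image's offset. It assumes four equally sized tiles in a fixed 2x2 grid with pixel-coordinate boxes. The mosaic augmentation used in actual training also randomizes the layout and rescales and crops the images, which this sketch omits. `mosaic_boxes` is a hypothetical helper, not an Ultralytics function.

```python
def mosaic_boxes(tile_boxes, tile_w, tile_h):
    """Map per-tile pixel boxes onto a fixed 2x2 mosaic canvas.

    tile_boxes holds four lists, one per tile, ordered top-left, top-right,
    bottom-left, bottom-right. Each box is (class_id, x1, y1, x2, y2) in the
    pixel coordinates of its own tile.
    """
    if len(tile_boxes) != 4:
        raise ValueError("expected exactly four tiles")

    # Top-left corner of each tile on the combined canvas.
    offsets = [(0, 0), (tile_w, 0), (0, tile_h), (tile_w, tile_h)]

    merged = []
    for boxes, (dx, dy) in zip(tile_boxes, offsets):
        for class_id, x1, y1, x2, y2 in boxes:
            merged.append((class_id, x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return merged


# Illustrative labels (made-up values) for four 640x640 tiles.
tiles = [
    [(0, 10, 20, 110, 220)],
    [(1, 5, 5, 50, 50), (2, 300, 100, 400, 200)],
    [],  # a tile with no objects is fine
    [(0, 0, 0, 640, 640)],
]
canvas_boxes = mosaic_boxes(tiles, tile_w=640, tile_h=640)
for box in canvas_boxes:
    print(box)
```

Because every box keeps its class ID and size, the combined sample exposes the model to objects from four scenes at once, which is the source of the added variety per batch.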
## Citations and Acknowledgments
If you use the COCO dataset in your research or development work, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@misc{lin2015microsoft,
title={Microsoft COCO: Common Objects in Context},
author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár},
year={2015},
eprint={1405.0312},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
We would like to acknowledge the COCO Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the COCO dataset and its creators, visit the [COCO dataset website](https://cocodataset.org/#home).
## FAQ
### What is the COCO dataset and why is it important for computer vision?
The [COCO dataset](https://cocodataset.org/#home) (Common Objects in Context) is a large-scale dataset used for [object detection](https://www.ultralytics.com/glossary/object-detection), segmentation, and captioning. It contains 330K images with detailed annotations for 80 object categories, making it essential for benchmarking and training computer vision models. Researchers use COCO due to its diverse categories and standardized evaluation metrics like mean Average [Precision](https://www.ultralytics.com/glossary/precision) (mAP).
### How can I train a YOLO model using the COCO dataset?
To train a YOLO11 model using the COCO dataset, you can use the following code snippets:
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="coco.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=coco.yaml model=yolo11n.pt epochs=100 imgsz=640
```
Refer to the [Training page](../../modes/train.md) for more details on available arguments.
### What are the key features of the COCO dataset?
The COCO dataset includes:
- 330K images, with 200K annotated for object detection, segmentation, and captioning.
- 80 object categories ranging from common items like cars and animals to specific ones like handbags and sports equipment.
- Standardized evaluation metrics for object detection (mAP) and segmentation (mean Average Recall, mAR).
- Sample training batches that illustrate **mosaicing**, a training-time augmentation (not a property of the dataset itself) that helps models generalize across object sizes and contexts.
### Where can I find pretrained YOLO11 models trained on the COCO dataset?
Pretrained YOLO11 models on the COCO dataset can be downloaded from the links provided in the documentation. Examples include:
- [YOLO11n](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt)
- [YOLO11s](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11s.pt)
- [YOLO11m](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11m.pt)
- [YOLO11l](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11l.pt)
- [YOLO11x](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt)
These models vary in size, mAP, and inference speed, providing options for different performance and resource requirements.
### How is the COCO dataset structured and how do I use it?
The COCO dataset is split into three subsets:
1. **Train2017**: 118K images for training.
2. **Val2017**: 5K images for validation during training.
3. **Test2017**: 20K images for benchmarking trained models. Results need to be submitted to the [COCO evaluation server](https://codalab.lisn.upsaclay.fr/competitions/7384) for performance evaluation.
The dataset's YAML configuration file is available at [coco.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco.yaml), which defines paths, classes, and dataset details.
---
comments: true
description: Explore the Ultralytics COCO8 dataset, a versatile and manageable set of 8 images perfect for testing object detection models and training pipelines.
keywords: COCO8, Ultralytics, dataset, object detection, YOLO11, training, validation, machine learning, computer vision
---
# COCO8 Dataset
## Introduction
[Ultralytics](https://www.ultralytics.com/) COCO8 is a small but versatile [object detection](https://www.ultralytics.com/glossary/object-detection) dataset composed of the first 8 images of the COCO train 2017 set, 4 for training and 4 for validation. This dataset is ideal for testing and debugging object detection models, or for experimenting with new detection approaches. With 8 images, it is small enough to be easily manageable, yet diverse enough to test training pipelines for errors and act as a sanity check before training on larger datasets.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/uDrn9QZJ2lk"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Ultralytics COCO Dataset Overview
</p>
This dataset is intended for use with Ultralytics [HUB](https://hub.ultralytics.com/) and [YOLO11](https://github.com/ultralytics/ultralytics).
## Dataset YAML
A YAML (YAML Ain't Markup Language) file defines the dataset configuration, including the dataset's paths, classes, and other relevant settings. For the COCO8 dataset, the `coco8.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco8.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco8.yaml).
!!! example "ultralytics/cfg/datasets/coco8.yaml"
```yaml
--8<-- "ultralytics/cfg/datasets/coco8.yaml"
```
## Usage
To train a YOLO11n model on the COCO8 dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch) with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=coco8.yaml model=yolo11n.pt epochs=100 imgsz=640
```
## Sample Images and Annotations
Here are some examples of images from the COCO8 dataset, along with their corresponding annotations:
<img src="https://github.com/ultralytics/docs/releases/download/0/mosaiced-training-batch-1.avif" alt="Dataset sample image" width="800">
- **Mosaiced Image**: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts.
The example showcases the variety and complexity of the images in the COCO8 dataset and the benefits of using mosaicing during the training process.
## Citations and Acknowledgments
If you use the COCO dataset in your research or development work, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@misc{lin2015microsoft,
title={Microsoft COCO: Common Objects in Context},
author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár},
year={2015},
eprint={1405.0312},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
We would like to acknowledge the COCO Consortium for creating and maintaining this valuable resource for the [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) community. For more information about the COCO dataset and its creators, visit the [COCO dataset website](https://cocodataset.org/#home).
## FAQ
### What is the Ultralytics COCO8 dataset used for?
The Ultralytics COCO8 dataset is a compact yet versatile object detection dataset consisting of the first 8 images from the COCO train 2017 set, with 4 images for training and 4 for validation. It is designed for testing and debugging object detection models and for experimenting with new detection approaches. Despite its small size, COCO8 offers enough diversity to act as a sanity check for your training pipelines before you train on larger datasets. For more details, view the [COCO8 dataset configuration](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco8.yaml).
### How do I train a YOLO11 model using the COCO8 dataset?
To train a YOLO11 model using the COCO8 dataset, you can employ either Python or CLI commands. Here's how you can start:
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=coco8.yaml model=yolo11n.pt epochs=100 imgsz=640
```
For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
### Why should I use Ultralytics HUB for managing my COCO8 training?
Ultralytics HUB is an all-in-one web tool designed to simplify the training and deployment of YOLO models, including the Ultralytics YOLO11 models on the COCO8 dataset. It offers cloud training, real-time tracking, and seamless dataset management. HUB allows you to start training with a single click and avoids the complexities of manual setups. Discover more about [Ultralytics HUB](https://hub.ultralytics.com/) and its benefits.
### What are the benefits of using mosaic augmentation in training with the COCO8 dataset?
Mosaic augmentation, demonstrated in the COCO8 dataset, combines multiple images into a single image during training. This technique increases the variety of objects and scenes in each training batch, improving the model's ability to generalize across different object sizes, aspect ratios, and contexts. This results in a more robust object detection model. For more details, refer to the [training guide](#usage).
### How can I validate my YOLO11 model trained on the COCO8 dataset?
Validation of your YOLO11 model trained on the COCO8 dataset can be performed using the model's validation commands. You can invoke the validation mode via CLI or Python script to evaluate the model's performance using precise metrics. For detailed instructions, visit the [Validation](../../modes/val.md) page.
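As a rough illustration of the Python route, the sketch below wraps validation in a small helper. It assumes the `ultralytics` package is installed and that the metrics object returned by `model.val()` exposes `box.map`, `box.map50`, and `box.map75`; the helper names `summarize_box_metrics` and `validate_on_coco8` are hypothetical and not part of the library.

```python
def summarize_box_metrics(metrics):
    """Collect the headline box metrics from a validation result into a plain dict."""
    return {
        "mAP50-95": float(metrics.box.map),
        "mAP50": float(metrics.box.map50),
        "mAP75": float(metrics.box.map75),
    }


def validate_on_coco8(weights):
    """Validate the given weights on COCO8 (hypothetical helper, not a library function)."""
    # Imported lazily so this helper can be defined without the package installed
    from ultralytics import YOLO

    model = YOLO(weights)
    metrics = model.val(data="coco8.yaml")
    return summarize_box_metrics(metrics)
```

Calling `validate_on_coco8("path/to/best.pt")`, where the path is a placeholder for your own trained weights, would return a dictionary of the three metrics. With only 4 validation images, the numbers are useful as a sanity check rather than as a benchmark.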
---
comments: true
description: Explore the Global Wheat Head Dataset to develop accurate wheat head detection models. Includes training images, annotations, and usage for crop management.
keywords: Global Wheat Head Dataset, wheat head detection, wheat phenotyping, crop management, deep learning, object detection, training datasets
---
# Global Wheat Head Dataset
The [Global Wheat Head Dataset](https://www.global-wheat.com/) is a collection of images designed to support the development of accurate wheat head detection models for applications in wheat phenotyping and crop management. Wheat heads, also known as spikes, are the grain-bearing parts of the wheat plant. Accurate estimation of wheat head density and size is essential for assessing crop health, maturity, and yield potential. The dataset, created by a collaboration of nine research institutes from seven countries, covers multiple growing regions to ensure models generalize well across different environments.
## Key Features
- The dataset contains over 3,000 training images from Europe (France, UK, Switzerland) and North America (Canada).
- It includes approximately 1,000 test images from Australia, Japan, and China.
- Images are outdoor field images, capturing the natural variability in wheat head appearances.
- Annotations include wheat head bounding boxes to support object detection tasks.
## Dataset Structure
The Global Wheat Head Dataset is organized into two main subsets:
1. **Training Set**: This subset contains over 3,000 images from Europe and North America. The images are labeled with wheat head bounding boxes, providing ground truth for training object detection models.
2. **Test Set**: This subset consists of approximately 1,000 images from Australia, Japan, and China. These images are used for evaluating the performance of trained models on unseen genotypes, environments, and observational conditions.
## Applications
The Global Wheat Head Dataset is widely used for training and evaluating [deep learning](https://www.ultralytics.com/glossary/deep-learning-dl) models in wheat head detection tasks. The dataset's diverse set of images, capturing a wide range of appearances, environments, and conditions, makes it a valuable resource for researchers and practitioners in the field of plant phenotyping and crop management.
## Dataset YAML
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the Global Wheat Head Dataset, the `GlobalWheat2020.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/GlobalWheat2020.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/GlobalWheat2020.yaml).
!!! example "ultralytics/cfg/datasets/GlobalWheat2020.yaml"
```yaml
--8<-- "ultralytics/cfg/datasets/GlobalWheat2020.yaml"
```
## Usage
To train a YOLO11n model on the Global Wheat Head Dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch) with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="GlobalWheat2020.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=GlobalWheat2020.yaml model=yolo11n.pt epochs=100 imgsz=640
```
## Sample Data and Annotations
The Global Wheat Head Dataset contains a diverse set of outdoor field images, capturing the natural variability in wheat head appearances, environments, and conditions. Here are some examples of data from the dataset, along with their corresponding annotations:
![Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/wheat-head-detection-sample.avif)
- **Wheat Head Detection**: This image demonstrates an example of wheat head detection, where wheat heads are annotated with bounding boxes. The dataset provides a variety of images to facilitate the development of models for this task.
The example showcases the variety and complexity of the data in the Global Wheat Head Dataset and highlights the importance of accurate wheat head detection for applications in wheat phenotyping and crop management.
## Citations and Acknowledgments
If you use the Global Wheat Head Dataset in your research or development work, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@article{david2020global,
title={Global Wheat Head Detection (GWHD) Dataset: A Large and Diverse Dataset of High-Resolution RGB-Labelled Images to Develop and Benchmark Wheat Head Detection Methods},
author={David, Etienne and Madec, Simon and Sadeghi-Tehran, Pouria and Aasen, Helge and Zheng, Bangyou and Liu, Shouyang and Kirchgessner, Norbert and Ishikawa, Goro and Nagasawa, Koichi and Badhon, Minhajul and others},
journal={arXiv preprint arXiv:2005.02162},
year={2020}
}
```
We would like to acknowledge the researchers and institutions that contributed to the creation and maintenance of the Global Wheat Head Dataset as a valuable resource for the plant phenotyping and crop management research community. For more information about the dataset and its creators, visit the [Global Wheat Head Dataset website](https://www.global-wheat.com/).
## FAQ
### What is the Global Wheat Head Dataset used for?
The Global Wheat Head Dataset is primarily used for developing and training deep learning models aimed at wheat head detection. This is crucial for applications in wheat phenotyping and crop management, allowing for more accurate estimations of wheat head density, size, and overall crop yield potential. Accurate detection methods help in assessing crop health and maturity, essential for efficient crop management.
### How do I train a YOLO11n model on the Global Wheat Head Dataset?
To train a YOLO11n model on the Global Wheat Head Dataset, you can use the following code snippets. Make sure you have the `GlobalWheat2020.yaml` configuration file specifying dataset paths and classes:
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a pre-trained model (recommended for training)
model = YOLO("yolo11n.pt")
# Train the model
results = model.train(data="GlobalWheat2020.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=GlobalWheat2020.yaml model=yolo11n.pt epochs=100 imgsz=640
```
For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
### What are the key features of the Global Wheat Head Dataset?
Key features of the Global Wheat Head Dataset include:
- Over 3,000 training images from Europe (France, UK, Switzerland) and North America (Canada).
- Approximately 1,000 test images from Australia, Japan, and China.
- High variability in wheat head appearances due to different growing environments.
- Detailed annotations with wheat head bounding boxes to aid [object detection](https://www.ultralytics.com/glossary/object-detection) models.
These features facilitate the development of robust models capable of generalization across multiple regions.
### Where can I find the configuration YAML file for the Global Wheat Head Dataset?
The configuration YAML file for the Global Wheat Head Dataset, named `GlobalWheat2020.yaml`, is available on GitHub. You can access it at this [link](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/GlobalWheat2020.yaml). This file contains necessary information about dataset paths, classes, and other configuration details needed for model training in Ultralytics YOLO.
### Why is wheat head detection important in crop management?
Wheat head detection is critical in crop management because it enables accurate estimation of wheat head density and size, which are essential for evaluating crop health, maturity, and yield potential. By leveraging deep learning models trained on datasets like the Global Wheat Head Dataset, farmers and researchers can better monitor and manage crops, leading to improved productivity and optimized resource use in agricultural practices. This technological advancement supports sustainable agriculture and food security initiatives.
For more information on applications of AI in agriculture, visit [AI in Agriculture](https://www.ultralytics.com/solutions/ai-in-agriculture).
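To make the link between detections and head density concrete, here is a minimal sketch of the arithmetic. It assumes you already have one confidence score per detected wheat head for an image and that you know the ground area the image covers; the function names, the 0.25 threshold, the scores, and the plot area are all hypothetical values chosen for illustration.

```python
def count_heads(confidences, conf_threshold=0.25):
    """Count detections whose confidence is at or above the threshold."""
    return sum(1 for c in confidences if c >= conf_threshold)


def heads_per_square_meter(head_count, plot_area_m2):
    """Convert a per-image head count into a density, given the imaged ground area."""
    if plot_area_m2 <= 0:
        raise ValueError("plot_area_m2 must be positive")
    return head_count / plot_area_m2


# Hypothetical confidence scores for the boxes predicted in one field image
confidences = [0.91, 0.84, 0.40, 0.12, 0.77]

head_count = count_heads(confidences)
density = heads_per_square_meter(head_count, plot_area_m2=0.5)
print(head_count, density)  # 4 8.0
```

In practice the imaged area depends on camera height and field of view, so the density is only as reliable as that calibration.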
---
comments: true
description: Learn about dataset formats compatible with Ultralytics YOLO for robust object detection. Explore supported datasets and learn how to convert formats.
keywords: Ultralytics, YOLO, object detection datasets, dataset formats, COCO, dataset conversion, training datasets
---
# Object Detection Datasets Overview
Training a robust and accurate [object detection](https://www.ultralytics.com/glossary/object-detection) model requires a comprehensive dataset. This guide introduces various formats of datasets that are compatible with the Ultralytics YOLO model and provides insights into their structure, usage, and how to convert between different formats.
## Supported Dataset Formats
### Ultralytics YOLO format
The Ultralytics YOLO format is a dataset configuration format that allows you to define the dataset root directory, the relative paths to training/validation/testing image directories or `*.txt` files containing image paths, and a dictionary of class names. Here is an example:
```yaml
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco8 # dataset root dir
train: images/train # train images (relative to 'path') 4 images
val: images/val # val images (relative to 'path') 4 images
test: # test images (optional)
# Classes (80 COCO classes)
names:
  0: person
  1: bicycle
  2: car
  # ...
  77: teddy bear
  78: hair drier
  79: toothbrush
```
Labels for this format should be exported to YOLO format with one `*.txt` file per image. If there are no objects in an image, no `*.txt` file is required. The `*.txt` file should be formatted with one row per object in `class x_center y_center width height` format. Box coordinates must be in **normalized xywh** format (from 0 to 1). If your boxes are in pixels, you should divide `x_center` and `width` by image width, and `y_center` and `height` by image height. Class numbers should be zero-indexed (start with 0).
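The normalization described above can be sketched in a few lines. This example assumes the source boxes are given as pixel corner coordinates `(x_min, y_min, x_max, y_max)`, which is a common export format but not the only one; the helper names and the sample box are hypothetical.

```python
def pixel_xyxy_to_yolo(box, img_w, img_h):
    """Convert a pixel (x_min, y_min, x_max, y_max) box to normalized (x_center, y_center, width, height)."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w  # divide x values by image width
    y_center = (y_min + y_max) / 2 / img_h  # divide y values by image height
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return x_center, y_center, width, height


def format_label_row(class_id, box, img_w, img_h):
    """Build one `class x_center y_center width height` row for a label file."""
    x, y, w, h = pixel_xyxy_to_yolo(box, img_w, img_h)
    return f"{class_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}"


# Hypothetical box for class 0 in a 640x480 image
row = format_label_row(0, (160, 120, 480, 360), img_w=640, img_h=480)
print(row)  # 0 0.500000 0.500000 0.500000 0.500000
```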
<p align="center"><img width="750" src="https://github.com/ultralytics/docs/releases/download/0/two-persons-tie.avif" alt="Example labelled image"></p>
The label file corresponding to the above image contains 2 persons (class `0`) and a tie (class `27`):
<p align="center"><img width="428" src="https://github.com/ultralytics/docs/releases/download/0/two-persons-tie-1.avif" alt="Example label file"></p>
When using the Ultralytics YOLO format, organize your training and validation images and labels as shown in the [COCO8 dataset](coco8.md) example below.
<p align="center"><img width="800" src="https://github.com/ultralytics/docs/releases/download/0/two-persons-tie-2.avif" alt="Example dataset directory structure"></p>
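As a quick way to check a dataset against this layout, the sketch below pairs each image with its expected label file. It assumes parallel `images/<split>` and `labels/<split>` directories under the dataset root; the helper names and the temporary `my_dataset` folder are hypothetical. An image without a label file is not necessarily an error, since images with no objects need no `*.txt` file, so treat the result as a list to review.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def label_path_for(image_path):
    """Map .../images/<split>/<name>.<ext> to .../labels/<split>/<name>.txt."""
    parts = list(image_path.parts)
    # Swap the last "images" path component for "labels"
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return Path(*parts).with_suffix(".txt")


def find_unlabeled_images(dataset_root, split):
    """Return images in a split that have no matching label file."""
    image_dir = Path(dataset_root) / "images" / split
    return [
        p
        for p in sorted(image_dir.iterdir())
        if p.suffix.lower() in IMAGE_SUFFIXES and not label_path_for(p).exists()
    ]


# Build a tiny throwaway dataset to demonstrate the check
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_dataset"
    (root / "images" / "train").mkdir(parents=True)
    (root / "labels" / "train").mkdir(parents=True)
    (root / "images" / "train" / "a.jpg").touch()
    (root / "images" / "train" / "b.jpg").touch()
    (root / "labels" / "train" / "a.txt").write_text("0 0.5 0.5 0.2 0.2\n")
    missing = [p.name for p in find_unlabeled_images(root, "train")]

print(missing)  # ['b.jpg']
```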
## Usage
Here's how you can use these formats to train your model:
!!! example
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=coco8.yaml model=yolo11n.pt epochs=100 imgsz=640
```
## Supported Datasets
Here is a list of the supported datasets and a brief description for each:
- [Argoverse](argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations.
- [COCO](coco.md): Common Objects in Context (COCO) is a large-scale object detection, segmentation, and captioning dataset with 80 object categories.
- [LVIS](lvis.md): A large-scale object detection and instance segmentation dataset with 1203 object categories.
- [COCO8](coco8.md): A smaller subset of the first 4 images from COCO train and COCO val, suitable for quick tests.
- [COCO128](coco.md): A smaller subset of the first 128 images from COCO train and COCO val, suitable for tests.
- [Global Wheat 2020](globalwheat2020.md): A dataset containing images of wheat heads for the Global Wheat Challenge 2020.
- [Objects365](objects365.md): A high-quality, large-scale dataset for object detection with 365 object categories and over 600K annotated images.
- [OpenImagesV7](open-images-v7.md): A comprehensive dataset by Google with 1.7M train images and 42k validation images.
- [SKU-110K](sku-110k.md): A dataset featuring dense object detection in retail environments with over 11K images and 1.7 million bounding boxes.
- [VisDrone](visdrone.md): A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.
- [VOC](voc.md): The Pascal Visual Object Classes (VOC) dataset for object detection and segmentation with 20 object classes and over 11K images.
- [xView](xview.md): A dataset for object detection in overhead imagery with 60 object categories and over 1 million annotated objects.
- [Roboflow 100](roboflow-100.md): A diverse object detection benchmark with 100 datasets spanning seven imagery domains for comprehensive model evaluation.
- [Brain-tumor](brain-tumor.md): A dataset for detecting brain tumors that includes MRI or CT scan images with details on tumor presence, location, and characteristics.
- [African-wildlife](african-wildlife.md): A dataset featuring images of African wildlife, including buffaloes, elephants, rhinos, and zebras.
- [Signature](signature.md): A dataset featuring images of various documents with annotated signatures, supporting document verification and fraud detection research.
### Adding your own dataset
If you have your own dataset and would like to use it for training detection models with Ultralytics YOLO format, ensure that it follows the format specified above under "Ultralytics YOLO format". Convert your annotations to the required format and specify the paths, number of classes, and class names in the YAML configuration file.
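One way to produce such a configuration file is to generate it from a list of class names. The sketch below builds the YAML text with plain string formatting so it has no dependencies; the function name, the `my_dataset` paths, and the `wheat_head` class are hypothetical placeholders. Class names containing characters that are special in YAML (such as `:` or `#`) would need quoting, which this sketch does not handle.

```python
def build_dataset_yaml(root, train, val, class_names, test=""):
    """Return dataset YAML text with paths and a zero-indexed names mapping."""
    lines = [
        f"path: {root}  # dataset root dir",
        f"train: {train}  # train images (relative to 'path')",
        f"val: {val}  # val images (relative to 'path')",
        f"test: {test}".rstrip() + "  # test images (optional)",
        "",
        "names:",
    ]
    # Class indices start at 0 and are indented under `names:`
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


yaml_text = build_dataset_yaml(
    root="../datasets/my_dataset",
    train="images/train",
    val="images/val",
    class_names=["wheat_head"],
)
print(yaml_text)
```

Saving the returned text as, for example, `my_dataset.yaml` gives a file that can be passed through the `data` argument in the training commands shown in the Usage section.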
## Port or Convert Label Formats
### COCO Dataset Format to YOLO Format
You can easily convert labels from the popular COCO dataset format to the YOLO format using the following code snippet:
!!! example
=== "Python"
```python
from ultralytics.data.converter import convert_coco
convert_coco(labels_dir="path/to/coco/annotations/")
```
This conversion tool can be used to convert the COCO dataset or any dataset in the COCO format to the Ultralytics YOLO format.
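To show what such a conversion involves for a single box, here is a simplified, dependency-free sketch. It is not the implementation of `convert_coco`; it only assumes the standard COCO conventions that `bbox` is `[x_min, y_min, width, height]` in pixels and that category IDs may be non-contiguous, and it remaps them to zero-based indices in sorted order. The sample annotation data is hypothetical.

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert COCO [x_min, y_min, width, height] pixels to normalized center xywh."""
    x_min, y_min, w, h = bbox
    return ((x_min + w / 2) / img_w, (y_min + h / 2) / img_h, w / img_w, h / img_h)


def coco_to_yolo_rows(coco):
    """Group YOLO label rows by image file name from a COCO-style dict."""
    # Map possibly non-contiguous category IDs to zero-based class indices
    category_ids = sorted(c["id"] for c in coco["categories"])
    class_index = {cid: i for i, cid in enumerate(category_ids)}
    images = {img["id"]: img for img in coco["images"]}

    rows = {}
    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        x, y, w, h = coco_bbox_to_yolo(ann["bbox"], img["width"], img["height"])
        cls = class_index[ann["category_id"]]
        rows.setdefault(img["file_name"], []).append(f"{cls} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    return rows


# Hypothetical COCO-style annotation data with one image and one box
coco = {
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "person"}, {"id": 3, "name": "car"}],
    "annotations": [{"image_id": 1, "category_id": 3, "bbox": [100, 120, 200, 240]}],
}

rows = coco_to_yolo_rows(coco)
print(rows)  # {'000001.jpg': ['1 0.312500 0.500000 0.312500 0.500000']}
```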
Remember to double-check if the dataset you want to use is compatible with your model and follows the necessary format conventions. Properly formatted datasets are crucial for training successful object detection models.
## FAQ
### What is the Ultralytics YOLO dataset format and how to structure it?
The Ultralytics YOLO format is a structured configuration for defining datasets in your training projects. It involves setting paths to your training, validation, and testing images and corresponding labels. For example:
```yaml
path: ../datasets/coco8 # dataset root directory
train: images/train # training images (relative to 'path')
val: images/val # validation images (relative to 'path')
test: # optional test images
names:
  0: person
  1: bicycle
  2: car
  # ...
```
Labels are saved in `*.txt` files with one file per image, formatted as `class x_center y_center width height` with normalized coordinates. For a detailed guide, see the [COCO8 dataset example](coco8.md).
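A small parser can help catch malformed label rows before training. The sketch below checks only what the format description states: five whitespace-separated fields, a non-negative integer class, and coordinates between 0 and 1. The function name and the sample rows are hypothetical.

```python
def parse_label_row(row):
    """Parse one `class x_center y_center width height` row and validate its ranges."""
    fields = row.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {row!r}")

    class_id = int(fields[0])
    x, y, w, h = (float(value) for value in fields[1:])

    if class_id < 0:
        raise ValueError(f"class index must be zero or greater: {row!r}")
    if not all(0.0 <= value <= 1.0 for value in (x, y, w, h)):
        raise ValueError(f"coordinates must be normalized to [0, 1]: {row!r}")
    return class_id, (x, y, w, h)


# Hypothetical label file contents: one person (class 0) and one car (class 2)
label_text = "0 0.5 0.5 0.25 0.5\n2 0.4 0.3 0.05 0.2\n"
parsed = [parse_label_row(line) for line in label_text.splitlines() if line.strip()]
print(parsed)  # [(0, (0.5, 0.5, 0.25, 0.5)), (2, (0.4, 0.3, 0.05, 0.2))]
```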
### How do I convert a COCO dataset to the YOLO format?
You can convert a COCO dataset to the YOLO format using the Ultralytics conversion tools. Here's a quick method:
```python
from ultralytics.data.converter import convert_coco
convert_coco(labels_dir="path/to/coco/annotations/")
```
This code will convert your COCO annotations to YOLO format, enabling seamless integration with Ultralytics YOLO models. For additional details, visit the [Port or Convert Label Formats](#port-or-convert-label-formats) section.
### Which datasets are supported by Ultralytics YOLO for object detection?
Ultralytics YOLO supports a wide range of datasets, including:
- [Argoverse](argoverse.md)
- [COCO](coco.md)
- [LVIS](lvis.md)
- [COCO8](coco8.md)
- [Global Wheat 2020](globalwheat2020.md)
- [Objects365](objects365.md)
- [OpenImagesV7](open-images-v7.md)
Each dataset page provides detailed information on the structure and usage tailored for efficient YOLO11 training. Explore the full list in the [Supported Datasets](#supported-datasets) section.
### How do I start training a YOLO11 model using my dataset?
To start training a YOLO11 model, ensure your dataset is formatted correctly and the paths are defined in a YAML file. Use the following script to begin training:
!!! example
=== "Python"
```python
from ultralytics import YOLO
model = YOLO("yolo11n.pt") # Load a pretrained model
results = model.train(data="path/to/your_dataset.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
yolo detect train data=path/to/your_dataset.yaml model=yolo11n.pt epochs=100 imgsz=640
```
Refer to the [Usage](#usage) section for more details on utilizing different modes, including CLI commands.
### Where can I find practical examples of using Ultralytics YOLO for object detection?
Ultralytics provides numerous examples and practical guides for using YOLO11 in diverse applications. For a comprehensive overview, visit the [Ultralytics Blog](https://www.ultralytics.com/blog), where you can find case studies, detailed tutorials, and community stories showcasing object detection, segmentation, and more with YOLO11. For specific examples, check the [Predict mode](../../modes/predict.md) documentation.
---
comments: true
description: Discover the LVIS dataset by Facebook AI Research, a benchmark for object detection and instance segmentation with a large, diverse vocabulary. Learn how to utilize it.
keywords: LVIS dataset, object detection, instance segmentation, Facebook AI Research, YOLO, computer vision, model training, LVIS examples
---
# LVIS Dataset
The [LVIS dataset](https://www.lvisdataset.org/) is a large-scale, fine-grained vocabulary-level annotation dataset developed and released by Facebook AI Research (FAIR). It is primarily used as a research benchmark for object detection and [instance segmentation](https://www.ultralytics.com/glossary/instance-segmentation) with a large vocabulary of categories, aiming to drive further advancements in the computer vision field.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/cfTKj96TjSE"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> YOLO World training workflow with LVIS dataset
</p>
<p align="center">
<img width="640" src="https://github.com/ultralytics/docs/releases/download/0/lvis-dataset-example-images.avif" alt="LVIS Dataset example images">
</p>
## Key Features
- LVIS contains 160k images and 2M instance annotations for object detection and segmentation tasks.
- The dataset comprises 1203 object categories, including common objects like cars, bicycles, and animals, as well as more specific categories such as umbrellas, handbags, and sports equipment.
- Annotations include object bounding boxes and segmentation masks for each image.
- LVIS provides standardized evaluation metrics like [mean Average Precision](https://www.ultralytics.com/glossary/mean-average-precision-map) (mAP) for object detection, and mean Average [Recall](https://www.ultralytics.com/glossary/recall) (mAR) for segmentation tasks, making it suitable for comparing model performance.
- LVIS uses exactly the same images as the [COCO](./coco.md) dataset, but with different splits and different annotations.
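Detection metrics such as mAP rest on matching predicted boxes to ground-truth boxes by Intersection over Union (IoU). The sketch below shows that building block for two boxes in `(x_min, y_min, x_max, y_max)` form. It is a minimal illustration, not the official LVIS evaluation code, and the sample boxes are hypothetical.

```python
def box_iou(box_a, box_b):
    """Intersection over Union for two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle, if any
    inter_x1 = max(box_a[0], box_b[0])
    inter_y1 = max(box_a[1], box_b[1])
    inter_x2 = min(box_a[2], box_b[2])
    inter_y2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes yield no intersection
    intersection = max(0.0, inter_x2 - inter_x1) * max(0.0, inter_y2 - inter_y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


prediction = (0, 0, 10, 10)
ground_truth = (5, 0, 15, 10)
print(round(box_iou(prediction, ground_truth), 3))  # 0.333
```

A prediction is typically counted as a true positive when its IoU with a ground-truth box of the same class exceeds a threshold, and mAP averages precision over recall levels, classes, and often several such thresholds.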
## Dataset Structure
The LVIS dataset is split into four subsets:
1. **Train**: This subset contains 100k images for training object detection and segmentation models.
2. **Val**: This subset has 20k images used for validation purposes during model training.
3. **Minival**: This subset is identical to the COCO val2017 set, which has 5k images, and is used for validation during model training.
4. **Test**: This subset consists of 20k images used for testing and benchmarking the trained models. Ground truth annotations for this subset are not publicly available, and the results are submitted to the [LVIS evaluation server](https://eval.ai/web/challenges/challenge-page/675/overview) for performance evaluation.
## Applications
The LVIS dataset is widely used for training and evaluating [deep learning](https://www.ultralytics.com/glossary/deep-learning-dl) models in object detection (such as YOLO, Faster R-CNN, and SSD) and instance segmentation (such as Mask R-CNN). The dataset's diverse set of object categories, large number of annotated images, and standardized evaluation metrics make it an essential resource for computer vision researchers and practitioners.
## Dataset YAML
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the LVIS dataset, the `lvis.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/lvis.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/lvis.yaml).
!!! example "ultralytics/cfg/datasets/lvis.yaml"
```yaml
--8<-- "ultralytics/cfg/datasets/lvis.yaml"
```
## Usage
To train a YOLO11n model on the LVIS dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch) with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="lvis.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=lvis.yaml model=yolo11n.pt epochs=100 imgsz=640
```
## Sample Images and Annotations
The LVIS dataset contains a diverse set of images with various object categories and complex scenes. Here are some examples of images from the dataset, along with their corresponding annotations:
![LVIS Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/lvis-mosaiced-training-batch.avif)
- **Mosaiced Image**: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts.
The example showcases the variety and complexity of the images in the LVIS dataset and the benefits of using mosaicing during the training process.
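To give a feel for how labels follow their images into a mosaic, the sketch below remaps normalized boxes for a deliberately simplified case: four equally sized tiles arranged in a fixed 2x2 grid. Training pipelines typically also randomize where the tiles meet and crop the result, which this sketch ignores; the function names and sample labels are hypothetical.

```python
def remap_box_to_mosaic(box, col, row, grid=2):
    """Map a normalized (x_center, y_center, width, height) box from one tile onto the mosaic canvas."""
    x, y, w, h = box
    # Each tile covers 1/grid of the canvas in both directions
    return ((col + x) / grid, (row + y) / grid, w / grid, h / grid)


def build_mosaic_labels(tiles, grid=2):
    """Merge per-tile labels (row-major tile order) into labels for the whole mosaic."""
    merged = []
    for index, labels in enumerate(tiles):
        row, col = divmod(index, grid)
        for class_id, box in labels:
            merged.append((class_id, remap_box_to_mosaic(box, col, row, grid)))
    return merged


# Hypothetical labels for four images: top-left, top-right, bottom-left, bottom-right
tiles = [
    [(0, (0.5, 0.5, 0.5, 0.5))],
    [],  # an image with no objects
    [(2, (0.5, 0.5, 1.0, 1.0))],
    [(1, (0.25, 0.75, 0.25, 0.25))],
]

for label in build_mosaic_labels(tiles):
    print(label)
```

Every box shrinks by the grid factor and shifts by its tile offset, which is why a mosaiced batch exposes the model to smaller objects in more varied contexts than the original images do.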
## Citations and Acknowledgments
If you use the LVIS dataset in your research or development work, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@inproceedings{gupta2019lvis,
title={LVIS: A Dataset for Large Vocabulary Instance Segmentation},
author={Gupta, Agrim and Dollar, Piotr and Girshick, Ross},
booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
year={2019}
}
```
We would like to acknowledge the LVIS Consortium for creating and maintaining this valuable resource for the [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) community. For more information about the LVIS dataset and its creators, visit the [LVIS dataset website](https://www.lvisdataset.org/).
## FAQ
### What is the LVIS dataset, and how is it used in computer vision?
The [LVIS dataset](https://www.lvisdataset.org/) is a large-scale dataset with fine-grained vocabulary-level annotations developed by Facebook AI Research (FAIR). It is primarily used for object detection and instance segmentation, featuring 1203 object categories and 2 million instance annotations. Researchers and practitioners use it to train and benchmark models like Ultralytics YOLO for advanced computer vision tasks. The dataset's extensive size and diversity make it an essential resource for pushing the boundaries of model performance in detection and segmentation.
### How can I train a YOLO11n model using the LVIS dataset?
To train a YOLO11n model on the LVIS dataset for 100 epochs with an image size of 640, follow the example below. This process utilizes Ultralytics' framework, which offers comprehensive training features.
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="lvis.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=lvis.yaml model=yolo11n.pt epochs=100 imgsz=640
```
For detailed training configurations, refer to the [Training](../../modes/train.md) documentation.
### How does the LVIS dataset differ from the COCO dataset?
The images in the LVIS dataset are the same as those in the [COCO dataset](./coco.md), but the two differ in terms of splitting and annotations. LVIS provides a larger and more detailed vocabulary with 1203 object categories compared to COCO's 80 categories. Additionally, LVIS focuses on annotation completeness and diversity, aiming to push the limits of [object detection](https://www.ultralytics.com/glossary/object-detection) and instance segmentation models by offering more nuanced and comprehensive data.
### Why should I use Ultralytics YOLO for training on the LVIS dataset?
Ultralytics YOLO models, including the latest YOLO11, are optimized for real-time object detection with state-of-the-art [accuracy](https://www.ultralytics.com/glossary/accuracy) and speed. They support a wide range of annotations, such as the fine-grained ones provided by the LVIS dataset, making them ideal for advanced computer vision applications. Moreover, Ultralytics offers seamless integration with various [training](../../modes/train.md), [validation](../../modes/val.md), and [prediction](../../modes/predict.md) modes, ensuring efficient model development and deployment.
### Can I see some sample annotations from the LVIS dataset?
Yes, the LVIS dataset includes a variety of images with diverse object categories and complex scenes. Here is an example of a sample image along with its annotations:
![LVIS Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/lvis-mosaiced-training-batch.avif)
This mosaiced image demonstrates a training batch composed of multiple dataset images combined into one. Mosaicing increases the variety of objects and scenes within each training batch, enhancing the model's ability to generalize across different contexts. For more details on the LVIS dataset, explore the [LVIS dataset documentation](#key-features).
---
comments: true
description: Explore the Objects365 Dataset with 2M images and 30M bounding boxes across 365 categories. Enhance your object detection models with diverse, high-quality data.
keywords: Objects365 dataset, object detection, machine learning, deep learning, computer vision, annotated images, bounding boxes, YOLO11, high-resolution images, dataset configuration
---
# Objects365 Dataset
The [Objects365](https://www.objects365.org/) dataset is a large-scale, high-quality dataset designed to foster object detection research with a focus on diverse objects in the wild. Created by a team of [Megvii](https://en.megvii.com/) researchers, the dataset offers a wide range of high-resolution images with a comprehensive set of annotated bounding boxes covering 365 object categories.
## Key Features
- Objects365 contains 365 object categories, with 2 million images and over 30 million bounding boxes.
- The dataset includes diverse objects in various scenarios, providing a rich and challenging benchmark for object detection tasks.
- Annotations include bounding boxes for objects, making it suitable for training and evaluating object detection models.
- According to the dataset's authors, models pre-trained on Objects365 significantly outperform ImageNet pre-trained models and generalize better to downstream tasks.
## Dataset Structure
The Objects365 dataset is organized into a single set of images with corresponding annotations:
- **Images**: The dataset includes 2 million high-resolution images, each containing a variety of objects across 365 categories.
- **Annotations**: The images are annotated with over 30 million bounding boxes, providing comprehensive ground truth information for object detection tasks.
## Applications
The Objects365 dataset is widely used for training and evaluating deep learning models in object detection tasks. The dataset's diverse set of object categories and high-quality annotations make it a valuable resource for researchers and practitioners in the field of [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv).
## Dataset YAML
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the Objects365 dataset, the `Objects365.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/Objects365.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/Objects365.yaml).
!!! example "ultralytics/cfg/datasets/Objects365.yaml"
```yaml
--8<-- "ultralytics/cfg/datasets/Objects365.yaml"
```
## Usage
To train a YOLO11n model on the Objects365 dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch) with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="Objects365.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=Objects365.yaml model=yolo11n.pt epochs=100 imgsz=640
```
## Sample Data and Annotations
The Objects365 dataset contains a diverse set of high-resolution images with objects from 365 categories, providing rich context for [object detection](https://www.ultralytics.com/glossary/object-detection) tasks. Here are some examples of the images in the dataset:
![Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/objects365-sample-image.avif)
- **Objects365**: This image demonstrates an example of object detection, where objects are annotated with bounding boxes. The dataset provides a wide range of images to facilitate the development of models for this task.
The example showcases the variety and complexity of the data in the Objects365 dataset and highlights the importance of accurate object detection for computer vision applications.
## Citations and Acknowledgments
If you use the Objects365 dataset in your research or development work, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@inproceedings{shao2019objects365,
title={Objects365: A Large-scale, High-quality Dataset for Object Detection},
author={Shao, Shuai and Li, Zeming and Zhang, Tianyuan and Peng, Chao and Yu, Gang and Li, Jing and Zhang, Xiangyu and Sun, Jian},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={8425--8434},
year={2019}
}
```
We would like to acknowledge the team of researchers who created and maintain the Objects365 dataset as a valuable resource for the computer vision research community. For more information about the Objects365 dataset and its creators, visit the [Objects365 dataset website](https://www.objects365.org/).
## FAQ
### What is the Objects365 dataset used for?
The [Objects365 dataset](https://www.objects365.org/) is designed for object detection tasks in [machine learning](https://www.ultralytics.com/glossary/machine-learning-ml) and computer vision. It provides a large-scale, high-quality dataset with 2 million annotated images and 30 million bounding boxes across 365 categories. Leveraging such a diverse dataset helps improve the performance and generalization of object detection models, making it invaluable for research and development in the field.
### How can I train a YOLO11 model on the Objects365 dataset?
To train a YOLO11n model using the Objects365 dataset for 100 epochs with an image size of 640, follow these instructions:
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="Objects365.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=Objects365.yaml model=yolo11n.pt epochs=100 imgsz=640
```
Refer to the [Training](../../modes/train.md) page for a comprehensive list of available arguments.
### Why should I use the Objects365 dataset for my object detection projects?
The Objects365 dataset offers several advantages for object detection tasks:
1. **Diversity**: It includes 2 million images with objects in diverse scenarios, covering 365 categories.
2. **High-quality Annotations**: Over 30 million bounding boxes provide comprehensive ground truth data.
3. **Performance**: The Objects365 authors report that detection models pre-trained on Objects365 outperform their ImageNet-pre-trained counterparts, which suggests better generalization.
### Where can I find the YAML configuration file for the Objects365 dataset?
The YAML configuration file for the Objects365 dataset is available at [Objects365.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/Objects365.yaml). This file contains essential information such as dataset paths and class labels, crucial for setting up your training environment.
### How does the dataset structure of Objects365 enhance object detection modeling?
The [Objects365 dataset](https://www.objects365.org/) is organized with 2 million high-resolution images and comprehensive annotations of over 30 million bounding boxes. This structure ensures a robust dataset for training [deep learning](https://www.ultralytics.com/glossary/deep-learning-dl) models in object detection, offering a wide variety of objects and scenarios. Such diversity and volume help in developing models that are more accurate and capable of generalizing well to real-world applications. For more details on the dataset structure, refer to the [Dataset YAML](#dataset-yaml) section.
---
comments: true
description: Explore the comprehensive Open Images V7 dataset by Google. Learn about its annotations, applications, and use YOLO11 pretrained models for computer vision tasks.
keywords: Open Images V7, Google dataset, computer vision, YOLO11 models, object detection, image segmentation, visual relationships, AI research, Ultralytics
---
# Open Images V7 Dataset
[Open Images V7](https://storage.googleapis.com/openimages/web/index.html) is a large, versatile dataset released by Google. Aimed at advancing research in [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv), it offers roughly 9 million images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/u3pLlgzUeV8"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> <a href="https://www.ultralytics.com/glossary/object-detection">Object Detection</a> using OpenImagesV7 Pretrained Model
</p>
## Open Images V7 Pretrained Models
| Model | size<br><sup>(pixels) | mAP<sup>val<br>50-95 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>A100 TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
| ----------------------------------------------------------------------------------------- | --------------------- | -------------------- | ------------------------------ | ----------------------------------- | ------------------ | ----------------- |
| [YOLOv8n](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n-oiv7.pt) | 640 | 18.4 | 142.4 | 1.21 | 3.5 | 10.5 |
| [YOLOv8s](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-oiv7.pt) | 640 | 27.7 | 183.1 | 1.40 | 11.4 | 29.7 |
| [YOLOv8m](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m-oiv7.pt) | 640 | 33.6 | 408.5 | 2.26 | 26.2 | 80.6 |
| [YOLOv8l](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8l-oiv7.pt) | 640 | 34.9 | 596.9 | 2.43 | 44.1 | 167.4 |
| [YOLOv8x](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8x-oiv7.pt) | 640 | 36.3 | 860.6 | 3.56 | 68.7 | 260.6 |
You can use these pretrained models for inference or fine-tuning as follows.
!!! example "Pretrained Model Usage Example"
=== "Python"
```python
from ultralytics import YOLO
# Load an Open Images Dataset V7 pretrained YOLOv8n model
model = YOLO("yolov8n-oiv7.pt")
# Run prediction
results = model.predict(source="image.jpg")
# Start training from the pretrained checkpoint
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Predict using an Open Images Dataset V7 pretrained model
yolo detect predict source=image.jpg model=yolov8n-oiv7.pt
# Start training from an Open Images Dataset V7 pretrained checkpoint
yolo detect train data=coco8.yaml model=yolov8n-oiv7.pt epochs=100 imgsz=640
```
![Open Images V7 classes visual](https://github.com/ultralytics/docs/releases/download/0/open-images-v7-classes-visual.avif)
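When choosing among the checkpoints in the pretrained models table above, one practical approach is to pick the most accurate model that still fits a latency budget. The sketch below hard-codes the mAP and A100 TensorRT timings from that table; the helper `best_model_within` is hypothetical, and real latency will depend on your hardware and export settings.

```python
# (mAP val 50-95, A100 TensorRT latency in ms), copied from the table above
OIV7_MODELS = {
    "yolov8n-oiv7.pt": (18.4, 1.21),
    "yolov8s-oiv7.pt": (27.7, 1.40),
    "yolov8m-oiv7.pt": (33.6, 2.26),
    "yolov8l-oiv7.pt": (34.9, 2.43),
    "yolov8x-oiv7.pt": (36.3, 3.56),
}


def best_model_within(budget_ms: float):
    """Return the highest-mAP checkpoint whose latency fits the budget, or None."""
    candidates = [(m_ap, name) for name, (m_ap, ms) in OIV7_MODELS.items() if ms <= budget_ms]
    return max(candidates)[1] if candidates else None


print(best_model_within(2.5))  # prints yolov8l-oiv7.pt
```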
## Key Features
- Encompasses ~9M images annotated in various ways to suit multiple computer vision tasks.
- Contains 16M bounding boxes across 600 object classes on 1.9M images. Most of these boxes were hand-drawn by experts, which helps ensure high [precision](https://www.ultralytics.com/glossary/precision).
- Visual relationship annotations totaling 3.3M are available, detailing 1,466 unique relationship triplets, object properties, and human activities.
- V5 introduced segmentation masks for 2.8M objects across 350 classes.
- V6 introduced 675k localized narratives that amalgamate voice, text, and mouse traces highlighting described objects.
- V7 introduced 66.4M point-level labels on 1.4M images, spanning 5,827 classes.
- Encompasses 61.4M image-level labels across a diverse set of 20,638 classes.
- Provides a unified platform for image classification, object detection, relationship detection, [instance segmentation](https://www.ultralytics.com/glossary/instance-segmentation), and multimodal image descriptions.
## Dataset Structure
Open Images V7 is structured in multiple components catering to varied computer vision challenges:
- **Images**: About 9 million images, often showcasing intricate scenes with an average of 8.3 objects per image.
- **Bounding Boxes**: Over 16 million boxes that demarcate objects across 600 categories.
- **Segmentation Masks**: These detail the exact boundary of 2.8M objects across 350 classes.
- **Visual Relationships**: 3.3M annotations indicating object relationships, properties, and actions.
- **Localized Narratives**: 675k descriptions combining voice, text, and mouse traces.
- **Point-Level Labels**: 66.4M labels across 1.4M images, suitable for zero/few-shot [semantic segmentation](https://www.ultralytics.com/glossary/semantic-segmentation).
## Applications
Open Images V7 is a cornerstone for training and evaluating state-of-the-art models in various computer vision tasks. The dataset's broad scope and high-quality annotations make it indispensable for researchers and developers specializing in computer vision.
## Dataset YAML
A YAML (Yet Another Markup Language) file defines the dataset configuration, including the dataset's paths and class names. For Open Images V7, the configuration shown below is the `open-images-v7.yaml` file, located at `ultralytics/cfg/datasets/open-images-v7.yaml` in the Ultralytics repository.
!!! example "ultralytics/cfg/datasets/open-images-v7.yaml"
```yaml
--8<-- "ultralytics/cfg/datasets/open-images-v7.yaml"
```
## Usage
To train a YOLO11n model on the Open Images V7 dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch) with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
!!! warning
The complete Open Images V7 dataset comprises 1,743,042 training images and 41,620 validation images, requiring approximately **561 GB of storage space** upon download.
Executing the commands provided below will trigger an automatic download of the full dataset if it's not already present locally. Before running the example below, make sure to:
- Verify that your device has enough storage capacity.
- Ensure a robust and speedy internet connection.
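The storage check can be automated with the standard library before starting a run. This is a minimal sketch, assuming the dataset will be written to the filesystem that holds the given path; the helper `has_enough_space` is hypothetical, and the 561 GB threshold is the approximate figure quoted above.

```python
import shutil

REQUIRED_GB = 561  # approximate download size quoted in the warning above


def has_enough_space(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if not has_enough_space("."):
    print(f"Less than {REQUIRED_GB} GB free; the Open Images V7 download would likely fail.")
```

Point `path` at the drive where your datasets directory lives, since that may differ from the current working directory.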
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a COCO-pretrained YOLO11n model
model = YOLO("yolo11n.pt")
# Train the model on the Open Images V7 dataset
results = model.train(data="open-images-v7.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Train a COCO-pretrained YOLO11n model on the Open Images V7 dataset
yolo detect train data=open-images-v7.yaml model=yolo11n.pt epochs=100 imgsz=640
```
## Sample Data and Annotations
Illustrations of the dataset help provide insights into its richness:
![Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/oidv7-all-in-one-example-ab.avif)
- **Open Images V7**: This image exemplifies the depth and detail of annotations available, including bounding boxes, relationships, and segmentation masks.
Researchers can gain invaluable insights into the array of computer vision challenges that the dataset addresses, from basic object detection to intricate relationship identification.
## Citations and Acknowledgments
For those employing Open Images V7 in their work, it's prudent to cite the relevant papers and acknowledge the creators:
!!! quote ""
=== "BibTeX"
```bibtex
@article{OpenImages,
author = {Alina Kuznetsova and Hassan Rom and Neil Alldrin and Jasper Uijlings and Ivan Krasin and Jordi Pont-Tuset and Shahab Kamali and Stefan Popov and Matteo Malloci and Alexander Kolesnikov and Tom Duerig and Vittorio Ferrari},
title = {The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale},
year = {2020},
journal = {IJCV}
}
```
A heartfelt acknowledgment goes out to the Google AI team for creating and maintaining the Open Images V7 dataset. For a deep dive into the dataset and its offerings, navigate to the [official Open Images V7 website](https://storage.googleapis.com/openimages/web/index.html).
## FAQ
### What is the Open Images V7 dataset?
Open Images V7 is an extensive and versatile dataset created by Google, designed to advance research in computer vision. It includes image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives, making it ideal for various computer vision tasks such as object detection, segmentation, and relationship detection.
### How do I train a YOLO11 model on the Open Images V7 dataset?
To train a YOLO11 model on the Open Images V7 dataset, you can use both Python and CLI commands. Here's an example of training the YOLO11n model for 100 epochs with an image size of 640:
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a COCO-pretrained YOLO11n model
model = YOLO("yolo11n.pt")
# Train the model on the Open Images V7 dataset
results = model.train(data="open-images-v7.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Train a COCO-pretrained YOLO11n model on the Open Images V7 dataset
yolo detect train data=open-images-v7.yaml model=yolo11n.pt epochs=100 imgsz=640
```
For more details on arguments and settings, refer to the [Training](../../modes/train.md) page.
### What are some key features of the Open Images V7 dataset?
The Open Images V7 dataset includes approximately 9 million images with various annotations:
- **Bounding Boxes**: 16 million bounding boxes across 600 object classes.
- **Segmentation Masks**: Masks for 2.8 million objects across 350 classes.
- **Visual Relationships**: 3.3 million annotations indicating relationships, properties, and actions.
- **Localized Narratives**: 675,000 descriptions combining voice, text, and mouse traces.
- **Point-Level Labels**: 66.4 million labels across 1.4 million images.
- **Image-Level Labels**: 61.4 million labels across 20,638 classes.
### What pretrained models are available for the Open Images V7 dataset?
Ultralytics provides several YOLOv8 pretrained models for the Open Images V7 dataset, each with different sizes and performance metrics:
| Model | size<br><sup>(pixels) | mAP<sup>val<br>50-95 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>A100 TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
| ----------------------------------------------------------------------------------------- | --------------------- | -------------------- | ------------------------------ | ----------------------------------- | ------------------ | ----------------- |
| [YOLOv8n](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n-oiv7.pt) | 640 | 18.4 | 142.4 | 1.21 | 3.5 | 10.5 |
| [YOLOv8s](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-oiv7.pt) | 640 | 27.7 | 183.1 | 1.40 | 11.4 | 29.7 |
| [YOLOv8m](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m-oiv7.pt) | 640 | 33.6 | 408.5 | 2.26 | 26.2 | 80.6 |
| [YOLOv8l](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8l-oiv7.pt) | 640 | 34.9 | 596.9 | 2.43 | 44.1 | 167.4 |
| [YOLOv8x](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8x-oiv7.pt) | 640 | 36.3 | 860.6 | 3.56 | 68.7 | 260.6 |
### What applications can the Open Images V7 dataset be used for?
The Open Images V7 dataset supports a variety of computer vision tasks including:
- **[Image Classification](https://www.ultralytics.com/glossary/image-classification)**
- **Object Detection**
- **Instance Segmentation**
- **Visual Relationship Detection**
- **Multimodal Image Descriptions**
Its comprehensive annotations and broad scope make it suitable for training and evaluating advanced [machine learning](https://www.ultralytics.com/glossary/machine-learning-ml) models, as highlighted in practical use cases detailed in our [applications](#applications) section.
---
comments: true
description: Explore the Roboflow 100 dataset featuring 100 diverse datasets designed to test object detection models across various domains, from healthcare to video games.
keywords: Roboflow 100, Ultralytics, object detection, dataset, benchmarking, machine learning, computer vision, diverse datasets, model evaluation
---
# Roboflow 100 Dataset
Roboflow 100, developed by [Roboflow](https://roboflow.com/?ref=ultralytics) and sponsored by Intel, is a groundbreaking [object detection](../../tasks/detect.md) benchmark. It includes 100 diverse datasets sampled from over 90,000 public datasets. This benchmark is designed to test the adaptability of models to various domains, including healthcare, aerial imagery, and video games.
<p align="center">
<img width="640" src="https://github.com/ultralytics/docs/releases/download/0/roboflow-100-overview.avif" alt="Roboflow 100 Overview">
</p>
## Key Features
- Includes 100 datasets across seven domains: Aerial, Video games, Microscopic, Underwater, Documents, Electromagnetic, and Real World.
- The benchmark comprises 224,714 images across 805 classes, thanks to over 11,170 hours of labeling efforts.
- All images are resized to 640x640 pixels, with a focus on eliminating class ambiguity and filtering out underrepresented classes.
- Annotations include bounding boxes for objects, making it suitable for [training](../../modes/train.md) and evaluating object detection models.
## Dataset Structure
The Roboflow 100 dataset is organized into seven categories, each with a distinct set of datasets, images, and classes:
- **Aerial**: Consists of 7 datasets with a total of 9,683 images, covering 24 distinct classes.
- **Video Games**: Includes 7 datasets, featuring 11,579 images across 88 classes.
- **Microscopic**: Comprises 11 datasets with 13,378 images, spanning 28 classes.
- **Underwater**: Contains 5 datasets, encompassing 18,003 images in 39 classes.
- **Documents**: Consists of 8 datasets with 24,813 images, divided into 90 classes.
- **Electromagnetic**: Made up of 12 datasets, totaling 36,381 images in 41 classes.
- **Real World**: The largest category with 50 datasets, offering 110,615 images across 495 classes.
This structure enables a diverse and extensive testing ground for object detection models, reflecting real-world application scenarios.
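The per-domain figures can be cross-checked with a few lines of arithmetic. In the sketch below, the dataset and class counts add up to the 100 datasets and 805 classes quoted under Key Features, while the image counts sum to 224,452, slightly below the 224,714 headline figure. The source does not explain the gap, so treat the per-domain image counts as approximate.

```python
# (datasets, images, classes) per domain, copied from the list above
RF100_DOMAINS = {
    "Aerial": (7, 9_683, 24),
    "Video Games": (7, 11_579, 88),
    "Microscopic": (11, 13_378, 28),
    "Underwater": (5, 18_003, 39),
    "Documents": (8, 24_813, 90),
    "Electromagnetic": (12, 36_381, 41),
    "Real World": (50, 110_615, 495),
}

# Sum each column across the seven domains
total_datasets, total_images, total_classes = (sum(col) for col in zip(*RF100_DOMAINS.values()))

print(total_datasets, total_images, total_classes)  # prints 100 224452 805
```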
## Benchmarking
Dataset benchmarking evaluates machine learning model performance on specific datasets using standardized metrics like [accuracy](https://www.ultralytics.com/glossary/accuracy), [mean average precision](https://www.ultralytics.com/glossary/mean-average-precision-map) and F1-score.
!!! tip "Benchmarking"
Benchmarking results will be stored in `ultralytics-benchmarks/evaluation.txt`.
!!! example "Benchmarking example"
=== "Python"
```python
import os
import shutil
from pathlib import Path
from ultralytics.utils.benchmarks import RF100Benchmark
# Initialize RF100Benchmark and set API key
benchmark = RF100Benchmark()
benchmark.set_key(api_key="YOUR_ROBOFLOW_API_KEY")
# Parse dataset and define file paths
names, cfg_yamls = benchmark.parse_dataset()
val_log_file = Path("ultralytics-benchmarks") / "validation.txt"
eval_log_file = Path("ultralytics-benchmarks") / "evaluation.txt"
# Run benchmarks on each dataset in RF100
for ind, path in enumerate(cfg_yamls):
    path = Path(path)
    if path.exists():
        # Fix YAML file and run training
        benchmark.fix_yaml(str(path))
        os.system(f"yolo detect train data={path} model=yolo11s.pt epochs=1 batch=16")
        # Run validation and evaluate
        os.system(f"yolo detect val data={path} model=runs/detect/train/weights/best.pt > {val_log_file} 2>&1")
        benchmark.evaluate(str(path), str(val_log_file), str(eval_log_file), ind)
        # Remove the 'runs' directory
        runs_dir = Path.cwd() / "runs"
        shutil.rmtree(runs_dir)
    else:
        print("YAML file path does not exist")
        continue
print("RF100 Benchmarking completed!")
```
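The example above calls `os.system` without checking its exit status and relies on shell redirection for logging. One possible alternative, sketched below with only the standard library, builds the same `yolo` arguments as a list and lets `subprocess.run` write the output to a log file. The helpers `yolo_cmd` and `run_logged` are hypothetical and not part of Ultralytics, and `data.yaml` is a placeholder path.

```python
import subprocess
from pathlib import Path


def yolo_cmd(mode: str, **overrides) -> list[str]:
    """Build an argument list such as ['yolo', 'detect', 'train', 'data=...', ...]."""
    return ["yolo", "detect", mode] + [f"{k}={v}" for k, v in overrides.items()]


def run_logged(cmd: list[str], log_file: Path) -> int:
    """Run `cmd`, writing stdout and stderr to `log_file`; return the exit code."""
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("w") as log:
        return subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT).returncode


train_cmd = yolo_cmd("train", data="data.yaml", model="yolo11s.pt", epochs=1, batch=16)
print(train_cmd)
```

Calling `run_logged(train_cmd, Path("ultralytics-benchmarks") / "train.txt")` would then launch training, assuming the `yolo` CLI is on your `PATH`, and a non-zero return value could be used to skip evaluation for a dataset that failed to train.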
## Applications
Roboflow 100 is invaluable for various applications related to [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) and [deep learning](https://www.ultralytics.com/glossary/deep-learning-dl). Researchers and engineers can use this benchmark to:
- Evaluate the performance of object detection models in a multi-domain context.
- Test the adaptability of models to real-world scenarios beyond common object recognition.
- Benchmark the capabilities of object detection models across diverse datasets, including those in healthcare, aerial imagery, and video games.
For more ideas and inspiration on real-world applications, be sure to check out [our guides on real-world projects](../../guides/index.md).
## Usage
The Roboflow 100 dataset is available on both [GitHub](https://github.com/roboflow/roboflow-100-benchmark) and [Roboflow Universe](https://universe.roboflow.com/roboflow-100?ref=ultralytics).
You can access it directly from the Roboflow 100 GitHub repository. In addition, on Roboflow Universe, you have the flexibility to download individual datasets by simply clicking the export button within each dataset.
## Sample Data and Annotations
Roboflow 100 consists of datasets with diverse images and videos captured from various angles and domains. Here's a look at examples of annotated images in the RF100 benchmark.
<p align="center">
<img width="640" src="https://github.com/ultralytics/docs/releases/download/0/sample-data-annotations.avif" alt="Sample Data and Annotations">
</p>
The diversity in the Roboflow 100 benchmark that can be seen above is a significant advancement from traditional benchmarks which often focus on optimizing a single metric within a limited domain.
## Citations and Acknowledgments
If you use the Roboflow 100 dataset in your research or development work, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@misc{2211.13523,
Author = {Floriana Ciaglia and Francesco Saverio Zuppichini and Paul Guerrie and Mark McQuade and Jacob Solawetz},
Title = {Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark},
Eprint = {arXiv:2211.13523},
}
```
Our thanks go to the Roboflow team and all the contributors for their hard work in creating and sustaining the Roboflow 100 dataset.
If you are interested in exploring more datasets to enhance your object detection and [machine learning](https://www.ultralytics.com/glossary/machine-learning-ml) projects, feel free to visit [our comprehensive dataset collection](../index.md).
## FAQ
### What is the Roboflow 100 dataset, and why is it significant for object detection?
The **Roboflow 100** dataset, developed by [Roboflow](https://roboflow.com/?ref=ultralytics) and sponsored by Intel, is a crucial [object detection](../../tasks/detect.md) benchmark. It features 100 diverse datasets from over 90,000 public datasets, covering domains such as healthcare, aerial imagery, and video games. This diversity ensures that models can adapt to various real-world scenarios, enhancing their robustness and performance.
### How can I use the Roboflow 100 dataset for benchmarking my object detection models?
To use the Roboflow 100 dataset for benchmarking, you can use the `RF100Benchmark` class from the Ultralytics library. Here's a brief example:
!!! example "Benchmarking example"
=== "Python"
```python
import os
import shutil
from pathlib import Path
from ultralytics.utils.benchmarks import RF100Benchmark
# Initialize RF100Benchmark and set API key
benchmark = RF100Benchmark()
benchmark.set_key(api_key="YOUR_ROBOFLOW_API_KEY")
# Parse dataset and define file paths
names, cfg_yamls = benchmark.parse_dataset()
val_log_file = Path("ultralytics-benchmarks") / "validation.txt"
eval_log_file = Path("ultralytics-benchmarks") / "evaluation.txt"
# Run benchmarks on each dataset in RF100
for ind, path in enumerate(cfg_yamls):
    path = Path(path)
    if path.exists():
        # Fix YAML file and run training
        benchmark.fix_yaml(str(path))
        os.system(f"yolo detect train data={path} model=yolo11n.pt epochs=1 batch=16")
        # Run validation and evaluate
        os.system(f"yolo detect val data={path} model=runs/detect/train/weights/best.pt > {val_log_file} 2>&1")
        benchmark.evaluate(str(path), str(val_log_file), str(eval_log_file), ind)
        # Remove 'runs' directory
        runs_dir = Path.cwd() / "runs"
        shutil.rmtree(runs_dir)
    else:
        print("YAML file path does not exist")
        continue
print("RF100 Benchmarking completed!")
```
### Which domains are covered by the Roboflow 100 dataset?
The **Roboflow 100** dataset spans seven domains, each providing unique challenges and applications for [object detection](https://www.ultralytics.com/glossary/object-detection) models:
1. **Aerial**: 7 datasets, 9,683 images, 24 classes
2. **Video Games**: 7 datasets, 11,579 images, 88 classes
3. **Microscopic**: 11 datasets, 13,378 images, 28 classes
4. **Underwater**: 5 datasets, 18,003 images, 39 classes
5. **Documents**: 8 datasets, 24,813 images, 90 classes
6. **Electromagnetic**: 12 datasets, 36,381 images, 41 classes
7. **Real World**: 50 datasets, 110,615 images, 495 classes
This setup allows for extensive and varied testing of models across different real-world applications.
### How do I access and download the Roboflow 100 dataset?
The **Roboflow 100** dataset is accessible on [GitHub](https://github.com/roboflow/roboflow-100-benchmark) and [Roboflow Universe](https://universe.roboflow.com/roboflow-100?ref=ultralytics). You can download the entire dataset from GitHub or select individual datasets on Roboflow Universe using the export button.
### What should I include when citing the Roboflow 100 dataset in my research?
When using the Roboflow 100 dataset in your research, be sure to cite it properly. Here is the recommended citation:
!!! quote ""
=== "BibTeX"
```bibtex
@misc{2211.13523,
Author = {Floriana Ciaglia and Francesco Saverio Zuppichini and Paul Guerrie and Mark McQuade and Jacob Solawetz},
Title = {Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark},
Eprint = {arXiv:2211.13523},
}
```
For more details, you can refer to our [comprehensive dataset collection](../index.md).
---
comments: true
description: Discover the Signature Detection Dataset for training models to identify and verify human signatures in various documents. Perfect for document verification and fraud prevention.
keywords: Signature Detection Dataset, document verification, fraud detection, computer vision, YOLO11, Ultralytics, annotated signatures, training dataset
---
# Signature Detection Dataset
This dataset focuses on detecting handwritten signatures within documents. It includes a variety of document types with annotated signatures, providing valuable insights for applications in document verification and fraud detection. Essential for training [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) algorithms, this dataset aids in identifying signatures in various document formats, supporting research and practical applications in document analysis.
## Dataset Structure
The signature detection dataset is split into two subsets:
- **Training set**: Contains 143 images, each with corresponding annotations.
- **Validation set**: Includes 35 images, each with paired annotations.
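A quick calculation from the counts above shows that the dataset holds 178 images in total, split roughly 80/20 between training and validation:

```python
train_images, val_images = 143, 35  # counts from the list above
total = train_images + val_images

print(total)  # prints 178
print(f"train {train_images / total:.1%}, val {val_images / total:.1%}")  # prints train 80.3%, val 19.7%
```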
## Applications
This dataset can be applied in various computer vision tasks such as object detection, object tracking, and document analysis. Specifically, it can be used to train and evaluate models for identifying signatures in documents, which can have applications in document verification, fraud detection, and archival research. Additionally, it can serve as a valuable resource for educational purposes, enabling students and researchers to study and understand the characteristics and behaviors of signatures in different document types.
## Dataset YAML
A YAML (Yet Another Markup Language) file defines the dataset configuration, including paths and class information. For the signature detection dataset, the `signature.yaml` file is located at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/signature.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/signature.yaml).
!!! example "ultralytics/cfg/datasets/signature.yaml"
```yaml
--8<-- "ultralytics/cfg/datasets/signature.yaml"
```
## Usage
To train a YOLO11n model on the signature detection dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch) with an image size of 640, use the provided code samples. For a comprehensive list of available parameters, refer to the model's [Training](../../modes/train.md) page.
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="signature.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=signature.yaml model=yolo11n.pt epochs=100 imgsz=640
```
!!! example "Inference Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("path/to/best.pt") # load a signature-detection fine-tuned model
# Inference using the model
results = model.predict("https://ultralytics.com/assets/signature-s.mp4", conf=0.75)
```
=== "CLI"
```bash
# Start prediction with a finetuned *.pt model
yolo detect predict model='path/to/best.pt' imgsz=640 source="https://ultralytics.com/assets/signature-s.mp4" conf=0.75
```
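The `conf=0.75` argument sets the minimum confidence a detection needs in order to be kept. The following is a conceptual sketch of that filtering step in plain Python, using made-up detections. It is not how Ultralytics implements it: inside the library the threshold is applied to tensors during post-processing, and the helper `filter_by_confidence` is hypothetical.

```python
def filter_by_confidence(detections, conf=0.75):
    """Keep only detections whose confidence score exceeds `conf`."""
    return [d for d in detections if d["confidence"] > conf]


# Hypothetical raw detections for one frame (boxes as x1, y1, x2, y2 in pixels)
raw = [
    {"box": (40, 300, 220, 360), "confidence": 0.91},
    {"box": (400, 310, 520, 350), "confidence": 0.62},
    {"box": (10, 20, 60, 45), "confidence": 0.78},
]

kept = filter_by_confidence(raw, conf=0.75)
print([d["confidence"] for d in kept])  # prints [0.91, 0.78]
```

A high threshold such as 0.75 trades recall for fewer false positives, which is often a sensible default when a missed signature is cheaper than a spurious one; lower it if faint or partial signatures are being dropped.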
## Sample Images and Annotations
The signature detection dataset comprises a wide variety of images showcasing different document types and annotated signatures. Below are examples of images from the dataset, each accompanied by its corresponding annotations.
![Signature detection dataset sample image](https://github.com/ultralytics/docs/releases/download/0/signature-detection-mosaiced-sample.avif)
- **Mosaiced Image**: Here, we present a training batch consisting of mosaiced dataset images. Mosaicing, a training technique, combines multiple images into one, enriching batch diversity. This method helps enhance the model's ability to generalize across different signature sizes, aspect ratios, and contexts.
This example illustrates the variety and complexity of images in the Signature Detection Dataset, emphasizing the benefits of including mosaicing during the training process.
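To make the idea of mosaicing concrete, the sketch below tiles four equally sized images into a fixed 2x2 grid and shifts each bounding box by the offset of its tile. This is a deliberately simplified illustration using nested lists as images; the mosaic augmentation used during training typically also randomizes the layout, scales, and crops the result. The helper `mosaic_2x2` is hypothetical and not part of Ultralytics.

```python
def mosaic_2x2(samples, tile_w, tile_h):
    """Tile four (image, boxes) samples into a 2x2 grid and shift their boxes.

    Each image is a list of `tile_h` rows with `tile_w` pixels per row;
    boxes are (x1, y1, x2, y2) tuples in pixel coordinates.
    """
    assert len(samples) == 4, "a 2x2 mosaic needs exactly four samples"
    canvas = [[0] * (2 * tile_w) for _ in range(2 * tile_h)]
    merged_boxes = []
    for idx, (image, boxes) in enumerate(samples):
        # Top-left corner of this tile on the canvas
        off_x, off_y = (idx % 2) * tile_w, (idx // 2) * tile_h
        for y, row in enumerate(image):
            canvas[off_y + y][off_x : off_x + tile_w] = row
        # Boxes move by the same offset as their image
        merged_boxes += [(x1 + off_x, y1 + off_y, x2 + off_x, y2 + off_y) for x1, y1, x2, y2 in boxes]
    return canvas, merged_boxes


# Four tiny 4x4 "images", each filled with a distinct value and holding one box
samples = [([[v] * 4 for _ in range(4)], [(1, 1, 3, 3)]) for v in (1, 2, 3, 4)]
canvas, boxes = mosaic_2x2(samples, tile_w=4, tile_h=4)
print(boxes[3])  # prints (5, 5, 7, 7)
```

Because every box is translated together with its source image, the annotations stay aligned with the signatures after the images are combined.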
## Citations and Acknowledgments
The dataset is released under the [AGPL-3.0 License](https://github.com/ultralytics/ultralytics/blob/main/LICENSE).
## FAQ
### What is the Signature Detection Dataset, and how can it be used?
The Signature Detection Dataset is a collection of annotated images aimed at detecting human signatures within various document types. It can be applied in computer vision tasks such as [object detection](https://www.ultralytics.com/glossary/object-detection) and tracking, primarily for document verification, fraud detection, and archival research. This dataset helps train models to recognize signatures in different contexts, making it valuable for both research and practical applications.
### How do I train a YOLO11n model on the Signature Detection Dataset?
To train a YOLO11n model on the Signature Detection Dataset, follow these steps:
1. Download the `signature.yaml` dataset configuration file from [signature.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/signature.yaml).
2. Use the following Python script or CLI command to start training:
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a pretrained model
model = YOLO("yolo11n.pt")
# Train the model
results = model.train(data="signature.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
yolo detect train data=signature.yaml model=yolo11n.pt epochs=100 imgsz=640
```
For more details, refer to the [Training](../../modes/train.md) page.
### What are the main applications of the Signature Detection Dataset?
The Signature Detection Dataset can be used for:
1. **Document Verification**: Automatically verifying the presence and authenticity of human signatures in documents.
2. **Fraud Detection**: Identifying forged or fraudulent signatures in legal and financial documents.
3. **Archival Research**: Assisting historians and archivists in the digital analysis and cataloging of historical documents.
4. **Education**: Supporting academic research and teaching in the fields of computer vision and [machine learning](https://www.ultralytics.com/glossary/machine-learning-ml).
### How can I perform inference using a model trained on the Signature Detection Dataset?
To perform inference using a model trained on the Signature Detection Dataset, follow these steps:
1. Load your fine-tuned model.
2. Use the following Python script or CLI command to perform inference:
!!! example "Inference Example"
=== "Python"
```python
from ultralytics import YOLO
# Load the fine-tuned model
model = YOLO("path/to/best.pt")
# Perform inference
results = model.predict("https://ultralytics.com/assets/signature-s.mp4", conf=0.75)
```
=== "CLI"
```bash
yolo detect predict model='path/to/best.pt' imgsz=640 source="https://ultralytics.com/assets/signature-s.mp4" conf=0.75
```
### What is the structure of the Signature Detection Dataset, and where can I find more information?
The Signature Detection Dataset is divided into two subsets:
- **Training Set**: Contains 143 images with annotations.
- **Validation Set**: Includes 35 images with annotations.
For detailed information, you can refer to the [Dataset Structure](#dataset-structure) section. Additionally, view the complete dataset configuration in the `signature.yaml` file located at [signature.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/signature.yaml).
---
comments: true
description: Explore the SKU-110k dataset of densely packed retail shelf images, perfect for training and evaluating deep learning models in object detection tasks.
keywords: SKU-110k, dataset, object detection, retail shelf images, deep learning, computer vision, model training
---
# SKU-110k Dataset
The [SKU-110k](https://github.com/eg4000/SKU110K_CVPR19) dataset is a collection of densely packed retail shelf images, designed to support research in [object detection](https://www.ultralytics.com/glossary/object-detection) tasks. Developed by Eran Goldman et al., the dataset contains over 110,000 unique stock keeping unit (SKU) categories with densely packed objects, often looking similar or even identical, positioned in close proximity.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/_gRqR-miFPE"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> How to Train YOLOv10 on SKU-110k Dataset using Ultralytics | Retail Dataset
</p>
![Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/densely-packed-retail-shelf.avif)
## Key Features
- SKU-110k contains images of store shelves from around the world, featuring densely packed objects that pose challenges for state-of-the-art object detectors.
- The dataset includes over 110,000 unique SKU categories, providing a diverse range of object appearances.
- Annotations include bounding boxes for objects and SKU category labels.
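To get a feel for how densely packed the shelves are, it can help to count annotated boxes per image. The sketch below is illustrative only: it assumes a headerless CSV layout of `image_name, x1, y1, x2, y2, label, image_width, image_height`, and the sample rows are made up rather than taken from the dataset.

```python
import csv
import io
from collections import Counter

# Made-up rows in the ASSUMED column order:
# image_name, x1, y1, x2, y2, label, image_width, image_height
SAMPLE_CSV = """shelf_0.jpg,208,537,422,814,object,3024,3024
shelf_0.jpg,1268,1923,1365,2209,object,3024,3024
shelf_1.jpg,10,20,110,220,object,2448,3264
"""


def boxes_per_image(csv_text):
    """Count how many bounding boxes are annotated for each image name."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        counts[row[0]] += 1  # first column is assumed to be the image name
    return counts


counts = boxes_per_image(SAMPLE_CSV)
for image_name, n_boxes in sorted(counts.items()):
    print(image_name, n_boxes)
```

On real annotation files, the same counter gives a quick per-image density histogram, which is useful when choosing image size and the maximum number of detections per image.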
## Dataset Structure
The SKU-110k dataset is organized into three main subsets:
1. **Training set**: This subset contains images and annotations used for training object detection models.
2. **Validation set**: This subset consists of images and annotations used for model validation during training.
3. **Test set**: This subset is designed for the final evaluation of trained object detection models.
## Applications
The SKU-110k dataset is widely used for training and evaluating deep learning models in object detection tasks, especially in densely packed scenes such as retail shelf displays. The dataset's diverse set of SKU categories and densely packed object arrangements make it a valuable resource for researchers and practitioners in the field of [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv).
## Dataset YAML
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It specifies the dataset's paths, classes, and other relevant settings. In the case of the SKU-110K dataset, the `SKU-110K.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/SKU-110K.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/SKU-110K.yaml).
!!! example "ultralytics/cfg/datasets/SKU-110K.yaml"
```yaml
--8<-- "ultralytics/cfg/datasets/SKU-110K.yaml"
```
## Usage
To train a YOLO11n model on the SKU-110K dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch) with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="SKU-110K.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=SKU-110K.yaml model=yolo11n.pt epochs=100 imgsz=640
```
## Sample Data and Annotations
The SKU-110k dataset contains a diverse set of retail shelf images with densely packed objects, providing rich context for object detection tasks. Here are some examples of data from the dataset, along with their corresponding annotations:
![Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/densely-packed-retail-shelf-1.avif)
- **Densely packed retail shelf image**: This image demonstrates an example of densely packed objects in a retail shelf setting. Objects are annotated with bounding boxes and SKU category labels.
The example showcases the variety and complexity of the data in the SKU-110k dataset and highlights the importance of high-quality data for object detection tasks.
## Citations and Acknowledgments
If you use the SKU-110k dataset in your research or development work, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@inproceedings{goldman2019dense,
author = {Eran Goldman and Roei Herzig and Aviv Eisenschtat and Jacob Goldberger and Tal Hassner},
title = {Precise Detection in Densely Packed Scenes},
booktitle = {Proc. Conf. Comput. Vision Pattern Recognition (CVPR)},
year = {2019}
}
```
We would like to acknowledge Eran Goldman et al. for creating and maintaining the SKU-110k dataset as a valuable resource for the computer vision research community. For more information about the SKU-110k dataset and its creators, visit the [SKU-110k dataset GitHub repository](https://github.com/eg4000/SKU110K_CVPR19).
## FAQ
### What is the SKU-110k dataset and why is it important for object detection?
The SKU-110k dataset consists of densely packed retail shelf images designed to aid research in object detection tasks. Developed by Eran Goldman et al., it includes over 110,000 unique SKU categories. Its importance lies in its ability to challenge state-of-the-art object detectors with diverse object appearances and close proximity, making it an invaluable resource for researchers and practitioners in computer vision. Learn more about the dataset's structure and applications in our [SKU-110k Dataset](#sku-110k-dataset) section.
### How do I train a YOLO11 model using the SKU-110k dataset?
Training a YOLO11 model on the SKU-110k dataset is straightforward. Here's an example to train a YOLO11n model for 100 epochs with an image size of 640:
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="SKU-110K.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=SKU-110K.yaml model=yolo11n.pt epochs=100 imgsz=640
```
For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
### What are the main subsets of the SKU-110k dataset?
The SKU-110k dataset is organized into three main subsets:
1. **Training set**: Contains images and annotations used for training object detection models.
2. **Validation set**: Consists of images and annotations used for model validation during training.
3. **Test set**: Designed for the final evaluation of trained object detection models.
Refer to the [Dataset Structure](#dataset-structure) section for more details.
### How do I configure the SKU-110k dataset for training?
The SKU-110k dataset configuration is defined in a YAML file, which includes details about the dataset's paths, classes, and other relevant information. The `SKU-110K.yaml` file is maintained at [SKU-110K.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/SKU-110K.yaml). For example, you can train a model using this configuration as shown in our [Usage](#usage) section.
### What are the key features of the SKU-110k dataset in the context of [deep learning](https://www.ultralytics.com/glossary/deep-learning-dl)?
The SKU-110k dataset features images of store shelves from around the world, showcasing densely packed objects that pose significant challenges for object detectors:
- Over 110,000 unique SKU categories
- Diverse object appearances
- Annotations include bounding boxes and SKU category labels
These features make the SKU-110k dataset particularly valuable for training and evaluating deep learning models in object detection tasks. For more details, see the [Key Features](#key-features) section.
### How do I cite the SKU-110k dataset in my research?
If you use the SKU-110k dataset in your research or development work, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@inproceedings{goldman2019dense,
author = {Eran Goldman and Roei Herzig and Aviv Eisenschtat and Jacob Goldberger and Tal Hassner},
title = {Precise Detection in Densely Packed Scenes},
booktitle = {Proc. Conf. Comput. Vision Pattern Recognition (CVPR)},
year = {2019}
}
```
More information about the dataset can be found in the [Citations and Acknowledgments](#citations-and-acknowledgments) section.
---
comments: true
description: Explore the VisDrone Dataset, a large-scale benchmark for drone-based image and video analysis with over 2.6 million annotations for objects like pedestrians and vehicles.
keywords: VisDrone, drone dataset, computer vision, object detection, object tracking, crowd counting, machine learning, deep learning
---
# VisDrone Dataset
The [VisDrone Dataset](https://github.com/VisDrone/VisDrone-Dataset) is a large-scale benchmark created by the AISKYEYE team at the Lab of [Machine Learning](https://www.ultralytics.com/glossary/machine-learning-ml) and Data Mining, Tianjin University, China. It contains carefully annotated ground truth data for various computer vision tasks related to drone-based image and video analysis.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/28JV4rbzklM"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> How to Train Ultralytics YOLO Models on the VisDrone Dataset for Drone Image Analysis
</p>
VisDrone is composed of 288 video clips with 261,908 frames and 10,209 static images, captured by various drone-mounted cameras. The dataset covers a wide range of aspects, including location (14 different cities across China), environment (urban and rural), objects (pedestrians, vehicles, bicycles, etc.), and density (sparse and crowded scenes). The dataset was collected using various drone platforms under different scenarios and weather and lighting conditions. These frames are manually annotated with over 2.6 million bounding boxes of targets such as pedestrians, cars, bicycles, and tricycles. Attributes like scene visibility, object class, and occlusion are also provided for better data utilization.
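As a rough illustration of how such annotations map to YOLO-style labels, the sketch below converts one annotation line into a normalized `(class, x_center, y_center, width, height)` tuple. The field order (`left, top, width, height, score, category, truncation, occlusion`), the meaning of `score == 0` as "ignore this region", and the shift from 1-based to 0-based class ids are assumptions about the detection-task text files; check them against the dataset toolkit before relying on them.

```python
def visdrone_line_to_yolo(line, img_w, img_h):
    """Convert one annotation line to a normalized YOLO-style tuple.

    ASSUMED field order:
    left, top, width, height, score, category, truncation, occlusion
    Returns None for boxes that are assumed to be marked as ignored.
    """
    fields = line.strip().split(",")
    left, top, width, height = (int(v) for v in fields[:4])
    score = int(fields[4])
    category = int(fields[5])

    if score == 0:  # assumption: score 0 flags a region to ignore
        return None

    x_center = (left + width / 2) / img_w
    y_center = (top + height / 2) / img_h
    # assumption: category ids start at 1, YOLO class ids start at 0
    return (category - 1, x_center, y_center, width / img_w, height / img_h)


# Made-up annotation line for a 1000x1000 image
print(visdrone_line_to_yolo("100,200,50,100,1,4,0,0", 1000, 1000))
```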
## Dataset Structure
The VisDrone dataset is organized into five main subsets, each focusing on a specific task:
1. **Task 1**: Object detection in images
2. **Task 2**: Object detection in videos
3. **Task 3**: Single-object tracking
4. **Task 4**: Multi-object tracking
5. **Task 5**: Crowd counting
## Applications
The VisDrone dataset is widely used for training and evaluating deep learning models in drone-based [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) tasks such as object detection, object tracking, and crowd counting. The dataset's diverse set of sensor data, object annotations, and attributes make it a valuable resource for researchers and practitioners in the field of drone-based computer vision.
## Dataset YAML
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It specifies the dataset's paths, classes, and other relevant settings. In the case of the VisDrone dataset, the `VisDrone.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/VisDrone.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/VisDrone.yaml).
!!! example "ultralytics/cfg/datasets/VisDrone.yaml"
```yaml
--8<-- "ultralytics/cfg/datasets/VisDrone.yaml"
```
## Usage
To train a YOLO11n model on the VisDrone dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch) with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="VisDrone.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=VisDrone.yaml model=yolo11n.pt epochs=100 imgsz=640
```
## Sample Data and Annotations
The VisDrone dataset contains a diverse set of images and videos captured by drone-mounted cameras. Here are some examples of data from the dataset, along with their corresponding annotations:
![Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/visdrone-object-detection-sample.avif)
- **Task 1**: [Object detection](https://www.ultralytics.com/glossary/object-detection) in images - This image demonstrates an example of object detection in images, where objects are annotated with bounding boxes. The dataset provides a wide variety of images taken from different locations, environments, and densities to facilitate the development of models for this task.
The example showcases the variety and complexity of the data in the VisDrone dataset and highlights the importance of high-quality sensor data for drone-based computer vision tasks.
## Citations and Acknowledgments
If you use the VisDrone dataset in your research or development work, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@ARTICLE{9573394,
author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Detection and Tracking Meet Drones Challenge},
year={2021},
volume={},
number={},
pages={1-1},
doi={10.1109/TPAMI.2021.3119563}}
```
We would like to acknowledge the AISKYEYE team at the Lab of Machine Learning and [Data Mining](https://www.ultralytics.com/glossary/data-mining), Tianjin University, China, for creating and maintaining the VisDrone dataset as a valuable resource for the drone-based computer vision research community. For more information about the VisDrone dataset and its creators, visit the [VisDrone Dataset GitHub repository](https://github.com/VisDrone/VisDrone-Dataset).
## FAQ
### What is the VisDrone Dataset and what are its key features?
The [VisDrone Dataset](https://github.com/VisDrone/VisDrone-Dataset) is a large-scale benchmark created by the AISKYEYE team at Tianjin University, China. It is designed for various computer vision tasks related to drone-based image and video analysis. Key features include:
- **Composition**: 288 video clips with 261,908 frames and 10,209 static images.
- **Annotations**: Over 2.6 million bounding boxes for objects like pedestrians, cars, bicycles, and tricycles.
- **Diversity**: Collected across 14 cities, in urban and rural settings, under different weather and lighting conditions.
- **Tasks**: Split into five main tasks: object detection in images and videos, single-object and multi-object tracking, and crowd counting.
### How can I use the VisDrone Dataset to train a YOLO11 model with Ultralytics?
To train a YOLO11 model on the VisDrone dataset for 100 epochs with an image size of 640, you can use the following code snippets:
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a pretrained model
model = YOLO("yolo11n.pt")
# Train the model
results = model.train(data="VisDrone.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=VisDrone.yaml model=yolo11n.pt epochs=100 imgsz=640
```
For additional configuration options, please refer to the model [Training](../../modes/train.md) page.
### What are the main subsets of the VisDrone dataset and their applications?
The VisDrone dataset is divided into five main subsets, each tailored for a specific computer vision task:
1. **Task 1**: Object detection in images.
2. **Task 2**: Object detection in videos.
3. **Task 3**: Single-object tracking.
4. **Task 4**: Multi-object tracking.
5. **Task 5**: Crowd counting.
These subsets are widely used for training and evaluating [deep learning](https://www.ultralytics.com/glossary/deep-learning-dl) models in drone-based applications such as surveillance, traffic monitoring, and public safety.
### Where can I find the configuration file for the VisDrone dataset in Ultralytics?
The configuration file for the VisDrone dataset, `VisDrone.yaml`, can be found in the Ultralytics repository at the following link:
[VisDrone.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/VisDrone.yaml).
### How can I cite the VisDrone dataset if I use it in my research?
If you use the VisDrone dataset in your research or development work, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@ARTICLE{9573394,
author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Detection and Tracking Meet Drones Challenge},
year={2021},
volume={},
number={},
pages={1-1},
doi={10.1109/TPAMI.2021.3119563}
}
```
---
comments: true
description: Discover the PASCAL VOC dataset, essential for object detection, segmentation, and classification. Learn key features, applications, and usage tips.
keywords: PASCAL VOC, VOC dataset, object detection, segmentation, classification, YOLO, Faster R-CNN, Mask R-CNN, image annotations, computer vision
---
# VOC Dataset
The [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) (Visual Object Classes) dataset is a well-known object detection, segmentation, and classification dataset. It is designed to encourage research on a wide variety of object categories and is commonly used for benchmarking computer vision models. It is an essential dataset for researchers and developers working on object detection, segmentation, and classification tasks.
## Key Features
- The VOC dataset includes two main challenges: VOC2007 and VOC2012.
- The dataset comprises 20 object categories, including common objects like cars, bicycles, and animals, as well as more specific categories such as boats, sofas, and dining tables.
- Annotations include object bounding boxes and class labels for object detection and classification tasks, and segmentation masks for the segmentation tasks.
- VOC provides standardized evaluation metrics like [mean Average Precision](https://www.ultralytics.com/glossary/mean-average-precision-map) (mAP) for object detection and classification, making it suitable for comparing model performance.
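VOC bounding boxes are stored as pixel corner coordinates in per-image XML files, while YOLO-style labels use normalized center coordinates. The following is a minimal sketch of that conversion using only the standard library. The XML snippet and the two-entry class list are made up for illustration, and the sketch omits details that a full conversion script would handle, such as skipping objects flagged as difficult.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the VOC XML layout (size + object/bndbox elements)
SAMPLE_XML = """<annotation>
  <size><width>500</width><height>400</height></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>100</xmin><ymin>100</ymin><xmax>300</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>"""

CLASS_NAMES = ["cat", "dog"]  # hypothetical subset of the 20 categories


def voc_to_yolo(xml_text, class_names):
    """Return a list of (class_id, x_center, y_center, width, height), all normalized."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    rows = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in class_names:
            continue  # ignore categories outside the chosen class list
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        rows.append(
            (
                class_names.index(name),
                (xmin + xmax) / 2 / img_w,  # normalized box center x
                (ymin + ymax) / 2 / img_h,  # normalized box center y
                (xmax - xmin) / img_w,  # normalized box width
                (ymax - ymin) / img_h,  # normalized box height
            )
        )
    return rows


print(voc_to_yolo(SAMPLE_XML, CLASS_NAMES))
```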
## Dataset Structure
The VOC dataset is split into three subsets:
1. **Train**: This subset contains images for training object detection, segmentation, and classification models.
2. **Validation**: This subset has images used for validation purposes during model training.
3. **Test**: This subset consists of images used for testing and benchmarking the trained models. Ground truth annotations for this subset are not publicly available, and the results are submitted to the [PASCAL VOC evaluation server](http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php) for performance evaluation.
## Applications
The VOC dataset is widely used for training and evaluating [deep learning](https://www.ultralytics.com/glossary/deep-learning-dl) models in object detection (such as YOLO, Faster R-CNN, and SSD), [instance segmentation](https://www.ultralytics.com/glossary/instance-segmentation) (such as Mask R-CNN), and [image classification](https://www.ultralytics.com/glossary/image-classification). The dataset's diverse set of object categories, large number of annotated images, and standardized evaluation metrics make it an essential resource for computer vision researchers and practitioners.
## Dataset YAML
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It specifies the dataset's paths, classes, and other relevant settings. In the case of the VOC dataset, the `VOC.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/VOC.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/VOC.yaml).
!!! example "ultralytics/cfg/datasets/VOC.yaml"
```yaml
--8<-- "ultralytics/cfg/datasets/VOC.yaml"
```
## Usage
To train a YOLO11n model on the VOC dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch) with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="VOC.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=VOC.yaml model=yolo11n.pt epochs=100 imgsz=640
```
## Sample Images and Annotations
The VOC dataset contains a diverse set of images with various object categories and complex scenes. Here are some examples of images from the dataset, along with their corresponding annotations:
![Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/mosaiced-voc-dataset-sample.avif)
- **Mosaiced Image**: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts.
The example showcases the variety and complexity of the images in the VOC dataset and the benefits of using mosaicing during the training process.
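To make the idea of mosaicing concrete, the sketch below shows how labels could be remapped when four equally sized images are tiled into a fixed 2x2 grid. This is a deliberate simplification: mosaic augmentation as typically implemented also picks a random mosaic center and rescales and crops the tiles, none of which is modeled here. The function name `mosaic_boxes` is hypothetical.

```python
# Offsets of each tile's top-left corner in the mosaic (normalized units),
# in the order: top-left, top-right, bottom-left, bottom-right.
TILE_OFFSETS = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]


def mosaic_boxes(tiles):
    """Remap per-tile labels into the coordinate frame of a 2x2 mosaic.

    `tiles` is a list of four label lists. Each label is
    (class_id, x_center, y_center, width, height), normalized to its own tile.
    """
    merged = []
    for (dx, dy), labels in zip(TILE_OFFSETS, tiles):
        for class_id, xc, yc, w, h in labels:
            # Each tile occupies half of the mosaic along both axes,
            # so coordinates and sizes are halved, then shifted by the offset.
            merged.append((class_id, dx + xc / 2, dy + yc / 2, w / 2, h / 2))
    return merged


tiles = [
    [(0, 0.5, 0.5, 0.2, 0.2)],  # top-left image: one object
    [(1, 0.5, 0.5, 1.0, 1.0)],  # top-right image: object fills the tile
    [],  # bottom-left image: no objects
    [(2, 0.25, 0.75, 0.5, 0.5)],  # bottom-right image
]
for label in mosaic_boxes(tiles):
    print(label)
```

Each mosaic therefore carries the objects of all four source images, with every object appearing at half its original scale, which is part of why the technique exposes the model to more small objects per batch.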
## Citations and Acknowledgments
If you use the VOC dataset in your research or development work, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@misc{everingham2010pascal,
title={The PASCAL Visual Object Classes (VOC) Challenge},
author={Mark Everingham and Luc Van Gool and Christopher K. I. Williams and John Winn and Andrew Zisserman},
year={2010},
eprint={0909.5206},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
We would like to acknowledge the PASCAL VOC Consortium for creating and maintaining this valuable resource for the [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) community. For more information about the VOC dataset and its creators, visit the [PASCAL VOC dataset website](http://host.robots.ox.ac.uk/pascal/VOC/).
## FAQ
### What is the PASCAL VOC dataset and why is it important for computer vision tasks?
The [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) (Visual Object Classes) dataset is a renowned benchmark for [object detection](https://www.ultralytics.com/glossary/object-detection), segmentation, and classification in computer vision. It includes comprehensive annotations like bounding boxes, class labels, and segmentation masks across 20 different object categories. Researchers use it widely to evaluate the performance of models like Faster R-CNN, YOLO, and Mask R-CNN due to its standardized evaluation metrics such as mean Average Precision (mAP).
### How do I train a YOLO11 model using the VOC dataset?
To train a YOLO11 model with the VOC dataset, you need the dataset configuration in a YAML file. Here's an example to start training a YOLO11n model for 100 epochs with an image size of 640:
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="VOC.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=VOC.yaml model=yolo11n.pt epochs=100 imgsz=640
```
### What are the primary challenges included in the VOC dataset?
The VOC dataset includes two main challenges: VOC2007 and VOC2012. These challenges test object detection, segmentation, and classification across 20 diverse object categories. Each image is meticulously annotated with bounding boxes, class labels, and segmentation masks. The challenges provide standardized metrics like mAP, facilitating the comparison and benchmarking of different computer vision models.
### How does the PASCAL VOC dataset enhance model benchmarking and evaluation?
The PASCAL VOC dataset enhances model benchmarking and evaluation through its detailed annotations and standardized metrics like mean Average [Precision](https://www.ultralytics.com/glossary/precision) (mAP). These metrics are crucial for assessing the performance of object detection and classification models. The dataset's diverse and complex images ensure comprehensive model evaluation across various real-world scenarios.
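For intuition about what mAP measures, here is a minimal sketch of the two building blocks: intersection over union (IoU), used to decide whether a detection matches a ground-truth box, and average precision (AP) for one class, computed as the area under the precision-recall curve after making precision monotonically non-increasing. mAP is the mean of AP over classes. The function names are illustrative, and real evaluators differ in details such as the interpolation scheme and the handling of difficult objects.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for a single class.

    `is_true_positive[i]` says whether detection i matched a ground-truth box
    (for example, IoU >= 0.5 with a box not already matched).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:  # walk detections from most to least confident
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    # Replace each precision with the best precision at any higher recall
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p  # rectangle under the curve
        prev_recall = r
    return ap


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(average_precision([0.9, 0.8, 0.7], [True, False, True], num_ground_truth=2))
```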
### How do I use the VOC dataset for [semantic segmentation](https://www.ultralytics.com/glossary/semantic-segmentation) in YOLO models?
To use the VOC dataset for semantic segmentation tasks with YOLO models, you need to configure the dataset properly in a YAML file. The YAML file defines paths and classes needed for training segmentation models. Check the VOC dataset YAML configuration file at [VOC.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/VOC.yaml) for detailed setups.
---
comments: true
description: Explore the xView dataset, a rich resource of 1M+ object instances in high-resolution satellite imagery. Enhance detection, learning efficiency, and more.
keywords: xView dataset, overhead imagery, satellite images, object detection, high resolution, bounding boxes, computer vision, TensorFlow, PyTorch, dataset structure
---
# xView Dataset
The [xView](http://xviewdataset.org/) dataset is one of the largest publicly available datasets of overhead imagery, containing images from complex scenes around the world annotated using bounding boxes. The goal of the xView dataset is to accelerate progress in four [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) frontiers:
1. Reduce minimum resolution for detection.
2. Improve learning efficiency.
3. Enable discovery of more object classes.
4. Improve detection of fine-grained classes.
xView builds on the success of challenges like Common Objects in Context (COCO) and aims to leverage computer vision to analyze the growing amount of available imagery from space in order to understand the visual world in new ways and address a range of important applications.
## Key Features
- xView contains over 1 million object instances across 60 classes.
- The dataset has a resolution of 0.3 meters, providing higher resolution imagery than most public satellite imagery datasets.
- xView features a diverse collection of small, rare, fine-grained, and multi-type objects with [bounding box](https://www.ultralytics.com/glossary/bounding-box) annotation.
- The dataset comes with a pre-trained baseline model using the TensorFlow object detection API and an example for [PyTorch](https://www.ultralytics.com/glossary/pytorch).
## Dataset Structure
The xView dataset is composed of satellite images collected from WorldView-3 satellites at a 0.3m ground sample distance. It contains over 1 million objects across 60 classes in over 1,400 km² of imagery.
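The 0.3 m ground sample distance translates directly into how many pixels an object occupies, which explains why many xView objects are small. The sketch below works through that arithmetic. The 4.5 m car length is an assumed example value, and the 640-pixel figure refers to a crop at native resolution, not to an image resized to 640 pixels.

```python
GSD_METERS_PER_PIXEL = 0.3  # ground sample distance stated for xView


def meters_to_pixels(length_m, gsd=GSD_METERS_PER_PIXEL):
    """Approximate number of pixels spanned by a length on the ground."""
    return length_m / gsd


def crop_footprint_m(crop_px, gsd=GSD_METERS_PER_PIXEL):
    """Ground distance covered by one side of a native-resolution crop."""
    return crop_px * gsd


car_length_px = meters_to_pixels(4.5)  # assumed car length of 4.5 m
crop_side_m = crop_footprint_m(640)  # side length of a 640-pixel crop

print(f"A 4.5 m car spans about {car_length_px:.0f} pixels")
print(f"A 640-pixel crop covers about {crop_side_m:.0f} m per side")
```

Downscaling a large satellite image before training shrinks objects further, so the effective pixel size of each object should be checked against the chosen `imgsz`.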
## Applications
The xView dataset is widely used for training and evaluating deep learning models for object detection in overhead imagery. The dataset's diverse set of object classes and high-resolution imagery make it a valuable resource for researchers and practitioners in the field of computer vision, especially for satellite imagery analysis.
## Dataset YAML
A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It specifies the dataset's paths, classes, and other relevant settings. In the case of the xView dataset, the `xView.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/xView.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/xView.yaml).
!!! example "ultralytics/cfg/datasets/xView.yaml"
```yaml
--8<-- "ultralytics/cfg/datasets/xView.yaml"
```
## Usage
To train a model on the xView dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch) with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="xView.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=xView.yaml model=yolo11n.pt epochs=100 imgsz=640
```
## Sample Data and Annotations
The xView dataset contains high-resolution satellite images with a diverse set of objects annotated using bounding boxes. Here are some examples of data from the dataset, along with their corresponding annotations:
![Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/overhead-imagery-object-detection.avif)
- **Overhead Imagery**: This image demonstrates an example of [object detection](https://www.ultralytics.com/glossary/object-detection) in overhead imagery, where objects are annotated with bounding boxes. The dataset provides high-resolution satellite images to facilitate the development of models for this task.
The example showcases the variety and complexity of the data in the xView dataset and highlights the importance of high-quality satellite imagery for object detection tasks.
## Citations and Acknowledgments
If you use the xView dataset in your research or development work, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@misc{lam2018xview,
title={xView: Objects in Context in Overhead Imagery},
author={Darius Lam and Richard Kuzma and Kevin McGee and Samuel Dooley and Michael Laielli and Matthew Klaric and Yaroslav Bulatov and Brendan McCord},
year={2018},
eprint={1802.07856},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
We would like to acknowledge the [Defense Innovation Unit](https://www.diu.mil/) (DIU) and the creators of the xView dataset for their valuable contribution to the computer vision research community. For more information about the xView dataset and its creators, visit the [xView dataset website](http://xviewdataset.org/).
## FAQ
### What is the xView dataset and how does it benefit computer vision research?
The [xView](http://xviewdataset.org/) dataset is one of the largest publicly available collections of high-resolution overhead imagery, containing over 1 million object instances across 60 classes. It is designed to enhance various facets of computer vision research such as reducing the minimum resolution for detection, improving learning efficiency, discovering more object classes, and advancing fine-grained object detection.
### How can I use Ultralytics YOLO to train a model on the xView dataset?
To train a model on the xView dataset using Ultralytics YOLO, follow these steps:
!!! example "Train Example"
=== "Python"
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n.pt") # load a pretrained model (recommended for training)
# Train the model
results = model.train(data="xView.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Start training from a pretrained *.pt model
yolo detect train data=xView.yaml model=yolo11n.pt epochs=100 imgsz=640
```
For detailed arguments and settings, refer to the model [Training](../../modes/train.md) page.
### What are the key features of the xView dataset?
The xView dataset stands out due to its comprehensive set of features:
- Over 1 million object instances across 60 distinct classes.
- High-resolution imagery at 0.3 meters.
- Diverse object types including small, rare, and fine-grained objects, all annotated with bounding boxes.
- Availability of a pre-trained baseline model and examples in [TensorFlow](https://www.ultralytics.com/glossary/tensorflow) and PyTorch.
### What is the dataset structure of xView, and how is it annotated?
The xView dataset comprises high-resolution satellite images collected from WorldView-3 satellites at a 0.3m ground sample distance. It encompasses over 1 million objects across 60 classes in approximately 1,400 km² of imagery. Each object within the dataset is annotated with bounding boxes, making it ideal for training and evaluating [deep learning](https://www.ultralytics.com/glossary/deep-learning-dl) models for object detection in overhead imagery. For a detailed overview, you can look at the dataset structure section [here](#dataset-structure).
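To get a feel for what a 0.3 m ground sample distance means in practice, the arithmetic can be sketched as below. This is only a back-of-the-envelope illustration: it assumes the imagery is cut into square, non-overlapping 640-pixel tiles at native resolution, which is a hypothetical choice for this example and not a description of how the training pipeline preprocesses images.

```python
# Back-of-the-envelope footprint arithmetic for a 0.3 m ground sample distance (GSD).
GSD_M = 0.3           # metres of ground covered by one pixel side (from the text)
TILE_PX = 640         # assumed square tile size, chosen to match imgsz=640
COVERAGE_KM2 = 1_400  # approximate imaged area (from the text)

tile_side_m = TILE_PX * GSD_M                       # ground length of one tile side
tile_area_km2 = (tile_side_m / 1_000) ** 2          # ground area of one tile
total_pixels = COVERAGE_KM2 * 1_000_000 / GSD_M**2  # pixels needed to cover the area
tiles_needed = COVERAGE_KM2 / tile_area_km2         # non-overlapping tiles

print(f"One {TILE_PX}px tile spans about {tile_side_m:.0f} m per side")
print(f"Total pixels: about {total_pixels:.2e}")
print(f"Non-overlapping tiles: about {tiles_needed:,.0f}")
```

Under these assumptions a single tile covers a patch of ground less than 200 m across, which is why many xView objects occupy only a handful of pixels.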
### How do I cite the xView dataset in my research?
If you utilize the xView dataset in your research, please cite the following paper:
!!! quote ""
=== "BibTeX"
```bibtex
@misc{lam2018xview,
title={xView: Objects in Context in Overhead Imagery},
author={Darius Lam and Richard Kuzma and Kevin McGee and Samuel Dooley and Michael Laielli and Matthew Klaric and Yaroslav Bulatov and Brendan McCord},
year={2018},
eprint={1802.07856},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
For more information about the xView dataset, visit the official [xView dataset website](http://xviewdataset.org/).
---
comments: true
description: Explore the Ultralytics Explorer API for dataset exploration with SQL queries, vector similarity search, and semantic search. Learn installation and usage tips.
keywords: Ultralytics, Explorer API, dataset exploration, SQL queries, similarity search, semantic search, Python API, LanceDB, embeddings, data analysis
---
# Ultralytics Explorer API
!!! warning "Community Note ⚠️"
As of **`ultralytics>=8.3.10`**, Ultralytics explorer support has been deprecated. But don't worry! You can now access similar and even enhanced functionality through [Ultralytics HUB](https://hub.ultralytics.com/), our intuitive no-code platform designed to streamline your workflow. With Ultralytics HUB, you can continue exploring, visualizing, and managing your data effortlessly, all without writing a single line of code. Make sure to check it out and take advantage of its powerful features!🚀
## Introduction
<a href="https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/docs/en/datasets/explorer/explorer.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
The Explorer API is a Python API for exploring your datasets. It supports filtering and searching your dataset using SQL queries, vector similarity search and semantic search.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/3VryynorQeo?start=279"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Ultralytics Explorer API Overview
</p>
## Installation
Explorer depends on external libraries for some of its functionality. These are automatically installed on usage. To manually install these dependencies, use the following command:
```bash
pip install ultralytics[explorer]
```
## Usage
```python
from ultralytics import Explorer
# Create an Explorer object
explorer = Explorer(data="coco128.yaml", model="yolo11n.pt")
# Create embeddings for your dataset
explorer.create_embeddings_table()
# Search for similar images to a given image/images
dataframe = explorer.get_similar(img="path/to/image.jpg")
# Or search for similar images to a given index/indices
dataframe = explorer.get_similar(idx=0)
```
!!! note
[Embeddings](https://www.ultralytics.com/glossary/embeddings) table for a given dataset and model pair is only created once and reused. These use [LanceDB](https://lancedb.github.io/lancedb/) under the hood, which scales on-disk, so you can create and reuse embeddings for large datasets like COCO without running out of memory.
To force an update of the embeddings table, pass `force=True` to the `create_embeddings_table` method.
You can directly access the LanceDB table object to perform advanced analysis. Learn more about it in the [Working with Embeddings Table section](#4-working-with-embeddings-table).
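The "create once, reuse, rebuild on `force=True`" behavior described above can be pictured with a small caching sketch. Everything here is hypothetical: `create_table_sketch`, `fake_build`, and the in-process dictionary are illustration-only names, whereas the real table is persisted on disk by LanceDB.

```python
_tables = {}  # hypothetical in-process cache; the real table lives on disk


def create_table_sketch(data, model, build, force=False):
    """Build the table for (data, model) once; rebuild only when force=True."""
    key = (data, model)
    if force or key not in _tables:
        _tables[key] = build(data, model)
    return _tables[key]


calls = []  # records every time the expensive build step actually runs


def fake_build(data, model):
    calls.append((data, model))
    return {"rows": f"embeddings for {data} via {model}"}


first = create_table_sketch("coco128.yaml", "yolo11n.pt", fake_build)
second = create_table_sketch("coco128.yaml", "yolo11n.pt", fake_build)  # reused
third = create_table_sketch("coco128.yaml", "yolo11n.pt", fake_build, force=True)  # rebuilt
print(len(calls))
```

The second call returns the cached object without running the build step again; only the forced call triggers a rebuild.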
## 1. Similarity Search
Similarity search is a technique for finding images similar to a given image. It is based on the idea that similar images have similar embeddings. Once the embeddings table is built, you can run semantic search in any of the following ways:
- On a given index or list of indices in the dataset: `exp.get_similar(idx=[1,10], limit=10)`
- On any image or list of images not in the dataset: `exp.get_similar(img=["path/to/img1", "path/to/img2"], limit=10)`
In case of multiple inputs, the aggregate of their embeddings is used.
You get a pandas dataframe with the `limit` most similar data points to the input, along with their distance in the embedding space. You can use this dataframe to perform further filtering.
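The idea behind this lookup can be sketched in plain Python. This is a conceptual illustration, not the Explorer implementation: it assumes the aggregate of multiple inputs is their element-wise mean and that distance is Euclidean, and `get_similar_sketch` and the toy 2-D vectors are hypothetical.

```python
import math


def mean_vector(vectors):
    """Average several embeddings element-wise into one query vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def get_similar_sketch(table, query_vectors, limit=10):
    """Return (name, distance) pairs for the `limit` rows closest to the query."""
    query = mean_vector(query_vectors)
    scored = [(name, euclidean(vec, query)) for name, vec in table.items()]
    return sorted(scored, key=lambda pair: pair[1])[:limit]


# Toy 2-D "embeddings"; real embeddings have many more dimensions.
table = {
    "bus_1.jpg": [0.9, 0.1],
    "bus_2.jpg": [0.8, 0.2],
    "cat_1.jpg": [0.1, 0.9],
    "dog_1.jpg": [0.2, 0.8],
}

results = get_similar_sketch(table, [[1.0, 0.0], [0.8, 0.2]], limit=2)
for name, dist in results:
    print(f"{name}: {dist:.3f}")
```

The two query vectors are averaged into one point, and the rows are ranked by how close they lie to it, so the two bus images come out on top.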
!!! example "Semantic Search"
=== "Using Images"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo11n.pt")
exp.create_embeddings_table()
similar = exp.get_similar(img="https://ultralytics.com/images/bus.jpg", limit=10)
print(similar.head())
# Search using multiple images
similar = exp.get_similar(
img=["https://ultralytics.com/images/bus.jpg", "https://ultralytics.com/images/bus.jpg"],
limit=10,
)
print(similar.head())
```
=== "Using Dataset Indices"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo11n.pt")
exp.create_embeddings_table()
similar = exp.get_similar(idx=1, limit=10)
print(similar.head())
# Search using multiple indices
similar = exp.get_similar(idx=[1, 10], limit=10)
print(similar.head())
```
### Plotting Similar Images
You can also plot the similar images using the `plot_similar` method. This method takes the same arguments as `get_similar` and plots the similar images in a grid.
!!! example "Plotting Similar Images"
=== "Using Images"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo11n.pt")
exp.create_embeddings_table()
plt = exp.plot_similar(img="https://ultralytics.com/images/bus.jpg", limit=10)
plt.show()
```
=== "Using Dataset Indices"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo11n.pt")
exp.create_embeddings_table()
plt = exp.plot_similar(idx=1, limit=10)
plt.show()
```
## 2. Ask AI (Natural Language Querying)
Ask AI lets you describe in natural language how you want to filter your dataset, so you don't have to be proficient in writing SQL queries. The AI-powered query generator builds the query under the hood. For example, you can say "show me 100 images with exactly one person and 2 dogs. There can be other objects too" and it will internally generate the query and show you those results.
Note: This relies on LLMs under the hood, so the results are probabilistic and may sometimes be wrong.
!!! example "Ask AI"
```python
from ultralytics import Explorer
from ultralytics.data.explorer import plot_query_result
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo11n.pt")
exp.create_embeddings_table()
df = exp.ask_ai("show me 100 images with exactly one person and 2 dogs. There can be other objects too")
print(df.head())
# plot the results
plt = plot_query_result(df)
plt.show()
```
## 3. SQL Querying
You can run SQL queries on your dataset using the `sql_query` method. This method takes a SQL query as input and returns a pandas dataframe with the results.
!!! example "SQL Query"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo11n.pt")
exp.create_embeddings_table()
df = exp.sql_query("WHERE labels LIKE '%person%' AND labels LIKE '%dog%'")
print(df.head())
```
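To see how a `LIKE` filter of this shape selects rows, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in query engine. The `images` table, its columns, and the comma-separated `labels` strings are assumptions made for this illustration and do not reflect how Explorer stores or queries its data.

```python
import sqlite3

# In-memory stand-in for an embeddings table; `labels` is stored as a
# comma-separated string purely for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (im_file TEXT, labels TEXT)")
conn.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [
        ("a.jpg", "person,dog"),
        ("b.jpg", "person,car"),
        ("c.jpg", "dog,frisbee"),
        ("d.jpg", "dog,person,person"),
    ],
)

where_clause = "WHERE labels LIKE '%person%' AND labels LIKE '%dog%'"
rows = conn.execute(f"SELECT im_file FROM images {where_clause} ORDER BY im_file").fetchall()
matches = [row[0] for row in rows]
print(matches)  # files whose labels mention both classes
conn.close()
```

Because `LIKE '%dog%'` is a substring match, it also matches class names that merely contain the text, such as COCO's `hot dog`, so it is worth checking the results when class names overlap.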
### Plotting SQL Query Results
You can also plot the results of a SQL query using the `plot_sql_query` method. This method takes the same arguments as `sql_query` and plots the results in a grid.
!!! example "Plotting SQL Query Results"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo11n.pt")
exp.create_embeddings_table()
# plot the SQL Query
exp.plot_sql_query("WHERE labels LIKE '%person%' AND labels LIKE '%dog%' LIMIT 10")
```
## 4. Working with Embeddings Table
You can also work with the embeddings table directly. Once the embeddings table is created, you can access it through the `Explorer.table` attribute.
!!! tip
Explorer works on [LanceDB](https://lancedb.github.io/lancedb/) tables internally. You can access this table directly through the `Explorer.table` object and run raw queries, push down pre- and post-filters, etc.
```python
from ultralytics import Explorer
exp = Explorer()
exp.create_embeddings_table()
table = exp.table
```
Here are some examples of what you can do with the table:
### Get raw Embeddings
!!! example
```python
from ultralytics import Explorer
exp = Explorer()
exp.create_embeddings_table()
table = exp.table
embeddings = table.to_pandas()["vector"]
print(embeddings)
```
### Advanced Querying with pre- and post-filters
!!! example
```python
from ultralytics import Explorer
exp = Explorer(model="yolo11n.pt")
exp.create_embeddings_table()
table = exp.table
# Dummy embedding
embedding = list(range(256))
rs = table.search(embedding).metric("cosine").where("").limit(10)
```
### Create Vector Index
When using large datasets, you can also create a dedicated vector index for faster querying. This is done using the `create_index` method on the LanceDB table.
```python
table.create_index(num_partitions=..., num_sub_vectors=...)
```
Find more details on the types of vector indices available and their parameters [here](https://lancedb.github.io/lancedb/ann_indexes/#types-of-index). In the future, we will add support for creating vector indices directly from the Explorer API.
## 5. Embeddings Applications
You can use the embeddings table to perform a variety of exploratory analysis. Here are some examples:
### Similarity Index
Explorer comes with a `similarity_index` operation:
- It tries to estimate how similar each data point is with the rest of the dataset.
- It does that by counting how many image embeddings lie closer than `max_dist` to the current image in the generated embedding space, considering `top_k` similar images at a time.
It returns a pandas dataframe with the following columns:
- `idx`: Index of the image in the dataset
- `im_file`: Path to the image file
- `count`: Number of images in the dataset that are closer than `max_dist` to the current image
- `sim_im_files`: List of paths to the `count` similar images
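The counting step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `similarity_index_sketch` is a hypothetical helper, distance is taken to be Euclidean, `top_k` is treated here as an absolute number of neighbours, and the image itself is excluded from its own count.

```python
import math


def similarity_index_sketch(embeddings, max_dist=0.2, top_k=None):
    """For each item, count neighbours closer than `max_dist` among its `top_k` nearest."""
    names = list(embeddings)
    rows = []
    for idx, name in enumerate(names):
        # Distances from this item to every other item, nearest first.
        neighbours = sorted(
            (math.dist(embeddings[name], embeddings[other]), other)
            for other in names
            if other != name
        )
        if top_k is not None:
            neighbours = neighbours[:top_k]
        close = [other for dist, other in neighbours if dist < max_dist]
        rows.append({"idx": idx, "im_file": name, "count": len(close), "sim_im_files": close})
    return rows


# Toy 2-D "embeddings": three near-duplicates and one outlier.
embeddings = {
    "a.jpg": [0.0, 0.0],
    "b.jpg": [0.1, 0.0],
    "c.jpg": [0.0, 0.1],
    "d.jpg": [5.0, 5.0],
}

rows = similarity_index_sketch(embeddings, max_dist=0.2, top_k=2)
for row in rows:
    print(row["im_file"], row["count"], row["sim_im_files"])
```

In this toy data the three clustered images each have two close neighbours, while the outlier has none, which is the kind of signal the `count` column is meant to expose.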
!!! tip
For a given dataset, model, `max_dist`, and `top_k`, the similarity index is generated once and then reused. If your dataset has changed, or you simply need to regenerate the similarity index, you can pass `force=True`.
!!! example "Similarity Index"
```python
from ultralytics import Explorer
exp = Explorer()
exp.create_embeddings_table()
sim_idx = exp.similarity_index()
```
You can use the similarity index to build custom conditions for filtering the dataset. For example, the following code selects the images that have more than 30 similar images in the dataset:
```python
import numpy as np
sim_count = np.array(sim_idx["count"])
sim_idx["im_file"][sim_count > 30]
```
### Visualize Embedding Space
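You can also visualize the embedding space using the plotting tool of your choice. Here is a simple example using matplotlib and scikit-learn:
```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
# `embeddings` is the "vector" column from the "Get raw Embeddings" example above;
# stack its per-row vectors into a 2D array of shape (n_images, n_dims) for PCA
data = np.stack(list(embeddings))
# Reduce dimensions using PCA to 3 components for visualization in 3D
pca = PCA(n_components=3)
reduced_data = pca.fit_transform(data)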
# Create a 3D scatter plot using Matplotlib Axes3D
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")
# Scatter plot
ax.scatter(reduced_data[:, 0], reduced_data[:, 1], reduced_data[:, 2], alpha=0.5)
ax.set_title("3D Scatter Plot of Reduced 256-Dimensional Data (PCA)")
ax.set_xlabel("Component 1")
ax.set_ylabel("Component 2")
ax.set_zlabel("Component 3")
plt.show()
```
Start creating your own CV dataset exploration reports using the Explorer API. For inspiration, check out the
## Apps Built Using Ultralytics Explorer
Try our GUI Demo based on Explorer API
## Coming Soon
- [ ] Merge specific labels from datasets. Example - Import all `person` labels from COCO and `car` labels from Cityscapes
- [ ] Remove images that have a higher similarity index than the given threshold
- [ ] Automatically persist new datasets after merging/removing entries
- [ ] Advanced Dataset Visualizations
## FAQ
### What is the Ultralytics Explorer API used for?
The Ultralytics Explorer API is designed for comprehensive dataset exploration. It allows users to filter and search datasets using SQL queries, vector similarity search, and semantic search. This powerful Python API can handle large datasets, making it ideal for various [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) tasks using Ultralytics models.
### How do I install the Ultralytics Explorer API?
To install the Ultralytics Explorer API along with its dependencies, use the following command:
```bash
pip install ultralytics[explorer]
```
This will automatically install all necessary external libraries for the Explorer API functionality. For additional setup details, refer to the [installation section](#installation) of our documentation.
### How can I use the Ultralytics Explorer API for similarity search?
You can use the Ultralytics Explorer API to perform similarity searches by creating an embeddings table and querying it for similar images. Here's a basic example:
```python
from ultralytics import Explorer
# Create an Explorer object
explorer = Explorer(data="coco128.yaml", model="yolo11n.pt")
explorer.create_embeddings_table()
# Search for similar images to a given image
similar_images_df = explorer.get_similar(img="path/to/image.jpg")
print(similar_images_df.head())
```
For more details, please visit the [Similarity Search section](#1-similarity-search).
### What are the benefits of using LanceDB with Ultralytics Explorer?
LanceDB, used under the hood by Ultralytics Explorer, provides scalable, on-disk embeddings tables. This ensures that you can create and reuse embeddings for large datasets like COCO without running out of memory. These tables are only created once and can be reused, enhancing efficiency in data handling.
### How does the Ask AI feature work in the Ultralytics Explorer API?
The Ask AI feature allows users to filter datasets using natural language queries. This feature leverages LLMs to convert these queries into SQL queries behind the scenes. Here's an example:
```python
from ultralytics import Explorer
# Create an Explorer object
explorer = Explorer(data="coco128.yaml", model="yolo11n.pt")
explorer.create_embeddings_table()
# Query with natural language
query_result = explorer.ask_ai("show me 100 images with exactly one person and 2 dogs. There can be other objects too")
print(query_result.head())
```
For more examples, check out the [Ask AI section](#2-ask-ai-natural-language-querying).
---
comments: true
description: Unlock advanced data exploration with Ultralytics Explorer GUI. Utilize semantic search, run SQL queries, and ask AI for natural language data insights.
keywords: Ultralytics Explorer GUI, semantic search, vector similarity, SQL queries, AI, natural language search, data exploration, machine learning, OpenAI, LLMs
---
# Explorer GUI
!!! warning "Community Note ⚠️"
As of **`ultralytics>=8.3.10`**, Ultralytics explorer support has been deprecated. But don't worry! You can now access similar and even enhanced functionality through [Ultralytics HUB](https://hub.ultralytics.com/), our intuitive no-code platform designed to streamline your workflow. With Ultralytics HUB, you can continue exploring, visualizing, and managing your data effortlessly, all without writing a single line of code. Make sure to check it out and take advantage of its powerful features!🚀
Explorer GUI is like a playground built using the [Ultralytics Explorer API](api.md). It allows you to run semantic/vector similarity search, SQL queries, and even natural language search using our Ask AI feature powered by LLMs.
<p>
<img width="1709" alt="Explorer Dashboard Screenshot 1" src="https://github.com/ultralytics/docs/releases/download/0/explorer-dashboard-screenshot-1.avif">
</p>
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/3VryynorQeo?start=306"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Ultralytics Explorer Dashboard Overview
</p>
### Installation
```bash
pip install ultralytics[explorer]
```
!!! note
The Ask AI feature uses OpenAI, so you'll be prompted to set the OpenAI API key when you first run the GUI.
You can set it like this: `yolo settings openai_api_key="..."`
## Vector Semantic Similarity Search
Semantic search is a technique for finding images similar to a given image. It is based on the idea that similar images will have similar [embeddings](https://www.ultralytics.com/glossary/embeddings). In the UI, you can select one or more images and search for the images similar to them. This can be useful when you want to find images similar to a given image or a set of images that don't perform as expected.
For example:
In this VOC Exploration dashboard, the user selects a couple of airplane images like this:
<p>
<img width="1710" alt="Explorer Dashboard Screenshot 2" src="https://github.com/ultralytics/docs/releases/download/0/explorer-dashboard-screenshot-2.avif">
</p>
After running the similarity search, you should see a result like this:
<p>
<img width="1710" alt="Explorer Dashboard Screenshot 3" src="https://github.com/ultralytics/docs/releases/download/0/explorer-dashboard-screenshot-3.avif">
</p>
## Ask AI
This allows you to describe how you want to filter your dataset using natural language. You don't have to be proficient in writing SQL queries. Our AI-powered query generator will automatically do that under the hood. For example, you can say "show me 100 images with exactly one person and 2 dogs. There can be other objects too" and it will internally generate the query and show you those results. Here is an example result when asked to "Show 10 images with exactly 5 persons":
<p>
<img width="1709" alt="Explorer Dashboard Screenshot 4" src="https://github.com/ultralytics/docs/releases/download/0/explorer-dashboard-screenshot-4.avif">
</p>
Note: This relies on LLMs under the hood, so the results are probabilistic and may sometimes be wrong.
## Run SQL queries on your CV datasets
You can run SQL queries on your dataset to filter it. It also works if you only provide the WHERE clause. The following example query shows only the images that contain at least one person and one dog:
```sql
WHERE labels LIKE '%person%' AND labels LIKE '%dog%'
```
<p>
<img width="1707" alt="Explorer Dashboard Screenshot 5" src="https://github.com/ultralytics/docs/releases/download/0/explorer-dashboard-screenshot-5.avif">
</p>
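One way to picture how a bare `WHERE` clause can still form a valid query is to expand it into a full `SELECT` before execution. The sketch below is an assumption about how such handling might look, not the GUI's actual code: `expand_query` is a hypothetical helper and the table name `table` is a placeholder.

```python
def expand_query(query: str, table: str = "table") -> str:
    """Turn a bare WHERE clause into a full SELECT; pass full SELECTs through."""
    stripped = query.strip()
    upper = stripped.upper()
    if upper.startswith("SELECT"):
        return stripped
    if upper.startswith("WHERE"):
        # Hypothetical default: select every column from the placeholder table.
        return f"SELECT * FROM '{table}' {stripped}"
    raise ValueError("Query must start with SELECT or WHERE")


full = expand_query("WHERE labels LIKE '%person%' AND labels LIKE '%dog%'")
print(full)
```

Rejecting anything that starts with neither keyword keeps the helper limited to read-only filtering in this sketch.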
This is a demo built using the Explorer API. You can use the API to build your own exploratory notebooks or scripts to get insights into your datasets. Learn more about the Explorer API [here](api.md).
## FAQ
### What is Ultralytics Explorer GUI and how do I install it?
Ultralytics Explorer GUI is a powerful interface that unlocks advanced data exploration capabilities using the [Ultralytics Explorer API](api.md). It allows you to run semantic/vector similarity search, SQL queries, and natural language queries using the Ask AI feature powered by [Large Language Models](https://www.ultralytics.com/glossary/large-language-model-llm) (LLMs).
To install the Explorer GUI, you can use pip:
```bash
pip install ultralytics[explorer]
```
Note: To use the Ask AI feature, you'll need to set the OpenAI API key: `yolo settings openai_api_key="..."`.
### How does the semantic search feature in Ultralytics Explorer GUI work?
The semantic search feature in Ultralytics Explorer GUI allows you to find images similar to a given image based on their embeddings. This technique is useful for identifying and exploring images that share visual similarities. To use this feature, select one or more images in the UI and execute a search for similar images. The result will display images that closely resemble the selected ones, facilitating efficient dataset exploration and [anomaly detection](https://www.ultralytics.com/glossary/anomaly-detection).
Learn more about semantic search in the [Vector Semantic Similarity Search](#vector-semantic-similarity-search) section.
### Can I use natural language to filter datasets in Ultralytics Explorer GUI?
Yes, with the Ask AI feature powered by large language models (LLMs), you can filter your datasets using natural language queries. You don't need to be proficient in SQL. For instance, you can ask "Show me 100 images with exactly one person and 2 dogs. There can be other objects too," and the AI will generate the appropriate query under the hood to deliver the desired results.
See an example of a natural language query [here](#ask-ai).
### How do I run SQL queries on datasets using Ultralytics Explorer GUI?
Ultralytics Explorer GUI allows you to run SQL queries directly on your dataset to filter and manage data efficiently. To run a query, navigate to the SQL query section in the GUI and write your query. For example, to show images with at least one person and one dog, you could use:
```sql
WHERE labels LIKE '%person%' AND labels LIKE '%dog%'
```
You can also provide only the WHERE clause, making the querying process more flexible.
For more details, refer to the [SQL Queries Section](#run-sql-queries-on-your-cv-datasets).
### What are the benefits of using Ultralytics Explorer GUI for data exploration?
Ultralytics Explorer GUI enhances data exploration with features like semantic search, SQL querying, and natural language interactions through the Ask AI feature. These capabilities allow users to:
- Efficiently find visually similar images.
- Filter datasets using complex SQL queries.
- Utilize AI to perform natural language searches, eliminating the need for advanced SQL expertise.
These features make it a versatile tool for developers, researchers, and data scientists looking to gain deeper insights into their datasets.
Explore more about these features in the [Explorer GUI Documentation](#explorer-gui).
{
"cells": [
{
"cell_type": "markdown",
"id": "aa923c26-81c8-4565-9277-1cb686e3702e",
"metadata": {
"id": "aa923c26-81c8-4565-9277-1cb686e3702e"
},
"source": [
"# VOC Exploration Example\n",
"<div align=\"center\">\n",
"\n",
" <a href=\"https://ultralytics.com/yolo\" target=\"_blank\">\n",
" <img width=\"1024\", src=\"https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/banner-yolov8.png\"></a>\n",
"\n",
" [中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [Türkçe](https://docs.ultralytics.com/tr/) | [Tiếng Việt](https://docs.ultralytics.com/vi/) | [العربية](https://docs.ultralytics.com/ar/)\n",
"\n",
" <a href=\"https://console.paperspace.com/github/ultralytics/ultralytics\"><img src=\"https://assets.paperspace.io/img/gradient-badge.svg\" alt=\"Run on Gradient\"/></a>\n",
" <a href=\"https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/examples/tutorial.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
" <a href=\"https://www.kaggle.com/models/ultralytics/yolo11\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" alt=\"Open In Kaggle\"></a>\n",
"\n",
"Welcome to the Ultralytics Explorer API notebook! This notebook serves as the starting point for exploring the various resources available to help you get started with using Ultralytics to explore your datasets with the power of semantic search. You get utilities out of the box that allow you to examine specific types of labels using vector search or even SQL queries.\n",
"\n",
"We hope that the resources in this notebook will help you get the most out of Ultralytics. Please browse the Explorer <a href=\"https://docs.ultralytics.com/\">Docs</a> for details, raise an issue on <a href=\"https://github.com/ultralytics/ultralytics\">GitHub</a> for support, and join our <a href=\"https://ultralytics.com/discord\">Discord</a> community for questions and discussions!\n",
"\n",
"Try `yolo explorer` powered by Explorer API\n",
"\n",
"Simply `pip install ultralytics` and run `yolo explorer` in your terminal to run custom queries and semantic search on your datasets right inside your browser!\n",
"\n",
"</div>"
]
},
{
"cell_type": "markdown",
"source": [
"## Ultralytics Explorer support deprecated ⚠️\n",
"\n",
"As of **`ultralytics>=8.3.10`**, Ultralytics explorer support has been deprecated. But don’t worry! You can now access similar and even enhanced functionality through [Ultralytics HUB](https://hub.ultralytics.com/), our intuitive no-code platform designed to streamline your workflow. With Ultralytics HUB, you can continue exploring, visualizing, and managing your data effortlessly, all without writing a single line of code. Make sure to check it out and take advantage of its powerful features!🚀"
],
"metadata": {
"id": "RHe1PX5c7uK2"
},
"id": "RHe1PX5c7uK2"
},
{
"cell_type": "markdown",
"id": "2454d9ba-9db4-4b37-98e8-201ba285c92f",
"metadata": {
"id": "2454d9ba-9db4-4b37-98e8-201ba285c92f"
},
"source": [
"## Setup\n",
"Pip install `ultralytics` and [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) and check software and hardware."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "433f3a4d-a914-42cb-b0b6-be84a84e5e41",
"metadata": {
"id": "433f3a4d-a914-42cb-b0b6-be84a84e5e41"
},
"outputs": [],
"source": [
"%pip install ultralytics[explorer] openai\n",
"import ultralytics\n",
"\n",
"ultralytics.checks()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ae602549-3419-4909-9f82-35cba515483f",
"metadata": {
"id": "ae602549-3419-4909-9f82-35cba515483f"
},
"outputs": [],
"source": [
"from ultralytics import Explorer"
]
},
{
"cell_type": "markdown",
"id": "d8c06350-be8e-45cf-b3a6-b5017bbd943c",
"metadata": {
"id": "d8c06350-be8e-45cf-b3a6-b5017bbd943c"
},
"source": [
"## Similarity search\n",
"Utilize the power of vector similarity search to find the similar data points in your dataset along with their distance in the embedding space. Simply create an embeddings table for the given dataset-model pair. It is only needed once and it is reused automatically.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "334619da-6deb-4b32-9fe0-74e0a79cee20",
"metadata": {
"id": "334619da-6deb-4b32-9fe0-74e0a79cee20"
},
"outputs": [],
"source": [
"exp = Explorer(\"VOC.yaml\", model=\"yolo11n.pt\")\n",
"exp.create_embeddings_table()"
]
},
{
"cell_type": "markdown",
"id": "b6c5e42d-bc7e-4b4c-bde0-643072a2165d",
"metadata": {
"id": "b6c5e42d-bc7e-4b4c-bde0-643072a2165d"
},
"source": [
"Once the embeddings table is built, you can run semantic search in any of the following ways:\n",
"- On a given index / list of indices in the dataset like - `exp.get_similar(idx=[1,10], limit=10)`\n",
"- On any image/ list of images not in the dataset - `exp.get_similar(img=[\"path/to/img1\", \"path/to/img2\"], limit=10)`\n",
"In case of multiple inputs, the aggregate of their embeddings is used.\n",
"\n",
"You get a pandas dataframe with the `limit` number of most similar data points to the input, along with their distance in the embedding space. You can use this dataset to perform further filtering\n",
"<img width=\"1120\" alt=\"Screenshot 2024-01-06 at 9 45 42 PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/7742ac57-e22a-4cea-a0f9-2b2a257483c5\">\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b485f05b-d92d-42bc-8da7-5e361667b341",
"metadata": {
"id": "b485f05b-d92d-42bc-8da7-5e361667b341"
},
"outputs": [],
"source": [
"similar = exp.get_similar(idx=1, limit=10)\n",
"similar.head()"
]
},
{
"cell_type": "markdown",
"id": "acf4b489-2161-4176-a1fe-d1d067d8083d",
"metadata": {
"id": "acf4b489-2161-4176-a1fe-d1d067d8083d"
},
"source": [
"You can also plot the similar samples directly using the `plot_similar` util\n",
"<p>\n",
"\n",
" <img src=\"https://github.com/AyushExel/assets/assets/15766192/a3c9247b-9271-47df-aaa5-36d96c5034b1\" />\n",
"</p>\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9dbfe7d0-8613-4529-adb6-6e0632d7cce7",
"metadata": {
"id": "9dbfe7d0-8613-4529-adb6-6e0632d7cce7"
},
"outputs": [],
"source": [
"exp.plot_similar(idx=6500, limit=20)\n",
"# exp.plot_similar(idx=[100,101], limit=10) # Can also pass list of idxs or imgs"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "260e09bf-4960-4089-a676-cb0e76ff3c0d",
"metadata": {
"id": "260e09bf-4960-4089-a676-cb0e76ff3c0d"
},
"outputs": [],
"source": [
"exp.plot_similar(\n",
" img=\"https://ultralytics.com/images/bus.jpg\", limit=10, labels=False\n",
") # Can also pass any external images"
]
},
{
"cell_type": "markdown",
"id": "faa0b7a7-6318-40e4-b0f4-45a8113bdc3a",
"metadata": {
"id": "faa0b7a7-6318-40e4-b0f4-45a8113bdc3a"
},
"source": [
"<p>\n",
"<img src=\"https://github.com/AyushExel/assets/assets/15766192/8e011195-b0da-43ef-b3cd-5fb6f383037e\">\n",
"\n",
"</p>"
]
},
{
"cell_type": "markdown",
"id": "0cea63f1-71f1-46da-af2b-b1b7d8f73553",
"metadata": {
"id": "0cea63f1-71f1-46da-af2b-b1b7d8f73553"
},
"source": [
"## 2. Ask AI: Search or filter with Natural Language\n",
"You can prompt the Explorer object with the kind of data points you want to see and it'll try to return a dataframe with those. Because it is powered by LLMs, it doesn't always get it right. In that case, it'll return None.\n",
"<p>\n",
"<img width=\"1131\" alt=\"Screenshot 2024-01-07 at 2 34 53 PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/c4a69fd9-e54f-4d6a-aba5-dc9cfae1bc04\">\n",
"\n",
"</p>\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92fb92ac-7f76-465a-a9ba-ea7492498d9c",
"metadata": {
"id": "92fb92ac-7f76-465a-a9ba-ea7492498d9c"
},
"outputs": [],
"source": [
"df = exp.ask_ai(\"show me images containing more than 10 objects with at least 2 persons\")\n",
"df.head(5)"
]
},
{
"cell_type": "markdown",
"id": "f2a7d26e-0ce5-4578-ad1a-b1253805280f",
"metadata": {
"id": "f2a7d26e-0ce5-4578-ad1a-b1253805280f"
},
"source": [
"for plotting these results you can use `plot_query_result` util\n",
"Example:\n",
"```\n",
"plt = plot_query_result(exp.ask_ai(\"show me 10 images containing exactly 2 persons\"))\n",
"Image.fromarray(plt)\n",
"```\n",
"<p>\n",
" <img src=\"https://github.com/AyushExel/assets/assets/15766192/2cb780de-d05b-4412-a526-7f7f0f10e669\">\n",
"\n",
"</p>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1cfab84-9835-4da0-8e9a-42b30cf84511",
"metadata": {
"id": "b1cfab84-9835-4da0-8e9a-42b30cf84511"
},
"outputs": [],
"source": [
"# plot\n",
"from PIL import Image\n",
"\n",
"from ultralytics.data.explorer import plot_query_result\n",
"\n",
"plt = plot_query_result(exp.ask_ai(\"show me 10 images containing exactly 2 persons\"))\n",
"Image.fromarray(plt)"
]
},
{
"cell_type": "markdown",
"id": "35315ae6-d827-40e4-8813-279f97a83b34",
"metadata": {
"id": "35315ae6-d827-40e4-8813-279f97a83b34"
},
"source": [
"## 3. Run SQL queries on your Dataset!\n",
"Sometimes you might want to investigate a certain type of entries in your dataset. For this Explorer allows you to execute SQL queries.\n",
"It accepts either of the formats:\n",
"- Queries beginning with \"WHERE\" will automatically select all columns. This can be thought of as a short-hand query\n",
"- You can also write full queries where you can specify which columns to select\n",
"\n",
"This can be used to investigate model performance and specific data points. For example:\n",
"- let's say your model struggles on images that have humans and dogs. You can write a query like this to select the points that have at least 2 humans AND at least one dog.\n",
"\n",
"You can combine SQL query and semantic search to filter down to specific type of results\n",
"<img width=\"994\" alt=\"Screenshot 2024-01-06 at 9 47 30 PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/92bc3178-c151-4cd5-8007-c76178deb113\">\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8cd1072f-3100-4331-a0e3-4e2f6b1005bf",
"metadata": {
"id": "8cd1072f-3100-4331-a0e3-4e2f6b1005bf"
},
"outputs": [],
"source": [
"table = exp.sql_query(\"WHERE labels LIKE '%person, person%' AND labels LIKE '%dog%' LIMIT 10\")\n",
"table"
]
},
{
"cell_type": "markdown",
"id": "debf8a00-c9f6-448b-bd3b-454cf62f39ab",
"metadata": {
"id": "debf8a00-c9f6-448b-bd3b-454cf62f39ab"
},
"source": [
"Just like similarity search, you also get a util to directly plot the sql queries using `exp.plot_sql_query`\n",
"<img src=\"https://github.com/AyushExel/assets/assets/15766192/f8b66629-8dd0-419e-8f44-9837969ba678\">\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "18b977e7-d048-4b22-b8c4-084a03b04f23",
"metadata": {
"id": "18b977e7-d048-4b22-b8c4-084a03b04f23"
},
"outputs": [],
"source": [
"exp.plot_sql_query(\"WHERE labels LIKE '%person, person%' AND labels LIKE '%dog%' LIMIT 10\", labels=True)"
]
},
{
"cell_type": "markdown",
"id": "f26804c5-840b-4fd1-987f-e362f29e3e06",
"metadata": {
"id": "f26804c5-840b-4fd1-987f-e362f29e3e06"
},
"source": [
"## 3. Working with embeddings Table (Advanced)\n",
"Explorer works on [LanceDB](https://lancedb.github.io/lancedb/) tables internally. You can access this table directly, using `Explorer.table` object and run raw queries, push down pre and post filters, etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ea69260a-3407-40c9-9f42-8b34a6e6af7a",
"metadata": {
"id": "ea69260a-3407-40c9-9f42-8b34a6e6af7a"
},
"outputs": [],
"source": [
"table = exp.table\n",
"table.schema"
]
},
{
"cell_type": "markdown",
"id": "238db292-8610-40b3-9af7-dfd6be174892",
"metadata": {
"id": "238db292-8610-40b3-9af7-dfd6be174892"
},
"source": [
"### Run raw queries\n",
"Vector Search finds the nearest vectors from the database. In a recommendation system or search engine, you can find similar products from the one you searched. In LLM and other AI applications, each data point can be presented by the embeddings generated from some models, it returns the most relevant features.\n",
"\n",
"A search in high-dimensional vector space, is to find K-Nearest-Neighbors (KNN) of the query vector.\n",
"\n",
"Metric\n",
"In LanceDB, a Metric is the way to describe the distance between a pair of vectors. Currently, it supports the following metrics:\n",
"- L2\n",
"- Cosine\n",
"- Dot\n",
"Explorer's similarity search uses L2 by default. You can run queries on tables directly, or use the lance format to build custom utilities to manage datasets. More details on available LanceDB table ops in the [docs](https://lancedb.github.io/lancedb/)\n",
"\n",
"<img width=\"1015\" alt=\"Screenshot 2024-01-06 at 9 48 35 PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/a2ccdaf3-8877-4f70-bf47-8a9bd2bb20c0\">\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d74430fe-5aee-45a1-8863-3f2c31338792",
"metadata": {
"id": "d74430fe-5aee-45a1-8863-3f2c31338792"
},
"outputs": [],
"source": [
"dummy_img_embedding = [i for i in range(256)]\n",
"table.search(dummy_img_embedding).limit(5).to_pandas()"
]
},
{
"cell_type": "markdown",
"id": "587486b4-0d19-4214-b994-f032fb2e8eb5",
"metadata": {
"id": "587486b4-0d19-4214-b994-f032fb2e8eb5"
},
"source": [
"### Inter-conversion to popular data formats"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb2876ea-999b-4eba-96bc-c196ba02c41c",
"metadata": {
"id": "bb2876ea-999b-4eba-96bc-c196ba02c41c"
},
"outputs": [],
"source": [
"df = table.to_pandas()\n",
"pa_table = table.to_arrow()"
]
},
{
"cell_type": "markdown",
"id": "42659d63-ad76-49d6-8dfc-78d77278db72",
"metadata": {
"id": "42659d63-ad76-49d6-8dfc-78d77278db72"
},
"source": [
"### Work with Embeddings\n",
"You can access the raw embedding from lancedb Table and analyse it. The image embeddings are stored in column `vector`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "66d69e9b-046e-41c8-80d7-c0ee40be3bca",
"metadata": {
"id": "66d69e9b-046e-41c8-80d7-c0ee40be3bca"
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"embeddings = table.to_pandas()[\"vector\"].tolist()\n",
"embeddings = np.array(embeddings)"
]
},
{
"cell_type": "markdown",
"id": "e8df0a49-9596-4399-954b-b8ae1fd7a602",
"metadata": {
"id": "e8df0a49-9596-4399-954b-b8ae1fd7a602"
},
"source": [
"### Scatterplot\n",
"One of the preliminary steps in analysing embeddings is by plotting them in 2D space via dimensionality reduction. Let's try an example\n",
"\n",
"<img width=\"646\" alt=\"Screenshot 2024-01-06 at 9 48 58 PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/9e1da25c-face-4426-abc0-2f64a4e4952c\">\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9a150e8-8092-41b3-82f8-2247f8187fc8",
"metadata": {
"id": "d9a150e8-8092-41b3-82f8-2247f8187fc8"
},
"outputs": [],
"source": [
"!pip install scikit-learn --q"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "196079c3-45a9-4325-81ab-af79a881e37a",
"metadata": {
"id": "196079c3-45a9-4325-81ab-af79a881e37a"
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"\n",
"# Reduce dimensions using PCA to 3 components for visualization in 3D\n",
"pca = PCA(n_components=3)\n",
"reduced_data = pca.fit_transform(embeddings)\n",
"\n",
"# Create a 3D scatter plot using Matplotlib's Axes3D\n",
"fig = plt.figure(figsize=(8, 6))\n",
"ax = fig.add_subplot(111, projection=\"3d\")\n",
"\n",
"# Scatter plot\n",
"ax.scatter(reduced_data[:, 0], reduced_data[:, 1], reduced_data[:, 2], alpha=0.5)\n",
"ax.set_title(\"3D Scatter Plot of Reduced 256-Dimensional Data (PCA)\")\n",
"ax.set_xlabel(\"Component 1\")\n",
"ax.set_ylabel(\"Component 2\")\n",
"ax.set_zlabel(\"Component 3\")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "1c843c23-e3f2-490e-8d6c-212fa038a149",
"metadata": {
"id": "1c843c23-e3f2-490e-8d6c-212fa038a149"
},
"source": [
"## 4. Similarity Index\n",
"Here's a simple example of an operation powered by the embeddings table. Explorer comes with a `similarity_index` operation-\n",
"* It tries to estimate how similar each data point is with the rest of the dataset.\n",
"* It does that by counting how many image embeddings lie closer than `max_dist` to the current image in the generated embedding space, considering `top_k` similar images at a time.\n",
"\n",
"For a given dataset, model, `max_dist` & `top_k` the similarity index once generated will be reused. In case, your dataset has changed, or you simply need to regenerate the similarity index, you can pass `force=True`.\n",
"Similar to vector and SQL search, this also comes with a util to directly plot it. Let's look at the plot first\n",
"<img width=\"633\" alt=\"Screenshot 2024-01-06 at 9 49 36 PM\" src=\"https://github.com/AyushExel/assets/assets/15766192/96a9d984-4a72-4784-ace1-428676ee2bdd\">\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "953c2a5f-1b61-4acf-a8e4-ed08547dbafc",
"metadata": {
"id": "953c2a5f-1b61-4acf-a8e4-ed08547dbafc"
},
"outputs": [],
"source": [
"exp.plot_similarity_index(max_dist=0.2, top_k=0.01)"
]
},
{
"cell_type": "markdown",
"id": "28228a9a-b727-45b5-8ca7-8db662c0b937",
"metadata": {
"id": "28228a9a-b727-45b5-8ca7-8db662c0b937"
},
"source": [
"Now let's look at the output of the operation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f4161aaa-20e6-4df0-8e87-d2293ee0530a",
"metadata": {
"id": "f4161aaa-20e6-4df0-8e87-d2293ee0530a"
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"sim_idx = exp.similarity_index(max_dist=0.2, top_k=0.01, force=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b01d5b1a-9adb-4c3c-a873-217c71527c8d",
"metadata": {
"id": "b01d5b1a-9adb-4c3c-a873-217c71527c8d"
},
"outputs": [],
"source": [
"sim_idx"
]
},
{
"cell_type": "markdown",
"id": "22b28e54-4fbb-400e-ad8c-7068cbba11c4",
"metadata": {
"id": "22b28e54-4fbb-400e-ad8c-7068cbba11c4"
},
"source": [
"Let's create a query to see what data points have similarity count of more than 30 and plot images similar to them."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "58d2557b-d401-43cf-937d-4f554c7bc808",
"metadata": {
"id": "58d2557b-d401-43cf-937d-4f554c7bc808"
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"sim_count = np.array(sim_idx[\"count\"])\n",
"sim_idx[\"im_file\"][sim_count > 30]"
]
},
{
"cell_type": "markdown",
"id": "a5ec8d76-271a-41ab-ac74-cf8c0084ba5e",
"metadata": {
"id": "a5ec8d76-271a-41ab-ac74-cf8c0084ba5e"
},
"source": [
"You should see something like this\n",
"<img src=\"https://github.com/AyushExel/assets/assets/15766192/649bc366-ca2d-46ea-bfd9-3097cf575584\">\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3a7b2ee3-9f35-48a2-9c38-38379516f4d2",
"metadata": {
"id": "3a7b2ee3-9f35-48a2-9c38-38379516f4d2"
},
"outputs": [],
"source": [
"exp.plot_similar(idx=[7146, 14035]) # Using avg embeddings of 2 images"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
---
comments: true
description: Discover Ultralytics Explorer for semantic search, SQL queries, vector similarity, and natural language dataset exploration. Enhance your CV datasets effortlessly.
keywords: Ultralytics Explorer, CV datasets, semantic search, SQL queries, vector similarity, dataset visualization, python API, machine learning, computer vision
---
# Ultralytics Explorer
!!! warning "Community Note ⚠️"
    As of **`ultralytics>=8.3.10`**, Ultralytics Explorer support has been deprecated. But don't worry! You can now access similar and even enhanced functionality through [Ultralytics HUB](https://hub.ultralytics.com/), our intuitive no-code platform designed to streamline your workflow. With Ultralytics HUB, you can continue exploring, visualizing, and managing your data effortlessly, all without writing a single line of code. Make sure to check it out and take advantage of its powerful features! 🚀
<p>
<img width="1709" alt="Ultralytics Explorer Screenshot 1" src="https://github.com/ultralytics/docs/releases/download/0/explorer-dashboard-screenshot-1.avif">
</p>
<a href="https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/docs/en/datasets/explorer/explorer.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
Ultralytics Explorer is a tool for exploring CV datasets using semantic search, SQL queries, vector similarity search, and even natural language. It is available as a GUI and as a Python API that exposes the same functionality.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/3VryynorQeo"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Ultralytics Explorer API | Semantic Search, SQL Queries & Ask AI Features
</p>
### Installation of optional dependencies
Explorer depends on external libraries for some of its functionality. These are automatically installed on usage. To manually install these dependencies, use the following command:
```bash
pip install ultralytics[explorer]
```
!!! tip
    Explorer is built on embedding/semantic search and SQL querying and is powered by the [LanceDB](https://lancedb.com/) serverless vector database. Unlike traditional in-memory DBs, it is persisted on disk without sacrificing performance, so you can scale locally to large datasets like COCO without running out of memory.
### Explorer API
This is a Python API for exploring your datasets. It also powers the GUI Explorer. You can use it to create your own exploratory notebooks or scripts to get insights into your datasets.
Learn more about the Explorer API [here](api.md).
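To make the idea of embedding-based similarity search concrete, here is a minimal pure-Python sketch of a nearest-neighbor lookup using L2 distance. It is only an illustration of the concept, not how Explorer or LanceDB is implemented: the `get_similar` helper here is a hypothetical stand-in, and the toy table, file names, and 3-dimensional vectors are all made up for the example.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def get_similar(query, table, limit=2):
    """Return (distance, im_file) pairs for the `limit` rows nearest to `query`."""
    scored = [(l2_distance(query, row["vector"]), row["im_file"]) for row in table]
    scored.sort(key=lambda pair: pair[0])  # nearest first
    return scored[:limit]


# Toy "embeddings table": hypothetical file names with made-up 3-d vectors.
table = [
    {"im_file": "cat_1.jpg", "vector": [0.9, 0.1, 0.0]},
    {"im_file": "cat_2.jpg", "vector": [0.8, 0.2, 0.1]},
    {"im_file": "bus_1.jpg", "vector": [0.0, 0.1, 0.9]},
]

results = get_similar([1.0, 0.0, 0.0], table, limit=2)
for distance, im_file in results:
    print(f"{im_file}: {distance:.3f}")
```

A real vector database avoids scanning every row like this, which is what makes it practical for large datasets.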
## GUI Explorer Usage
The GUI demo runs in your browser, allowing you to create [embeddings](https://www.ultralytics.com/glossary/embeddings) for your dataset, search for similar images, run SQL queries, and perform semantic search. You can launch it with the following command:
```bash
yolo explorer
```
!!! note
    The Ask AI feature uses OpenAI, so you'll be prompted to set your OpenAI API key when you first run the GUI.
    You can set it like this: `yolo settings openai_api_key="..."`
<p>
<img width="1709" alt="Ultralytics Explorer OpenAI Integration" src="https://github.com/ultralytics/docs/releases/download/0/ultralytics-explorer-openai-integration.avif">
</p>
## FAQ
### What is Ultralytics Explorer and how can it help with CV datasets?
Ultralytics Explorer is a powerful tool designed for exploring [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) (CV) datasets through semantic search, SQL queries, vector similarity search, and even natural language. This versatile tool provides both a GUI and a Python API, allowing users to seamlessly interact with their datasets. By leveraging technologies like LanceDB, Ultralytics Explorer ensures efficient, scalable access to large datasets without excessive memory usage. Whether you're performing detailed dataset analysis or exploring data patterns, Ultralytics Explorer streamlines the entire process.
Learn more about the [Explorer API](api.md).
### How do I install the dependencies for Ultralytics Explorer?
To manually install the optional dependencies needed for Ultralytics Explorer, you can use the following `pip` command:
```bash
pip install ultralytics[explorer]
```
These dependencies are essential for the full functionality of semantic search and SQL querying. By including libraries powered by [LanceDB](https://lancedb.com/), the installation ensures that the database operations remain efficient and scalable, even for large datasets like COCO.
### How can I use the GUI version of Ultralytics Explorer?
Using the GUI version of Ultralytics Explorer is straightforward. After installing the necessary dependencies, you can launch the GUI with the following command:
```bash
yolo explorer
```
The GUI provides a user-friendly interface for creating dataset embeddings, searching for similar images, running SQL queries, and conducting semantic searches. Additionally, the integration with OpenAI's Ask AI feature allows you to query datasets using natural language, enhancing the flexibility and ease of use.
For storage and scalability information, check out our [installation instructions](#installation-of-optional-dependencies).
### What is the Ask AI feature in Ultralytics Explorer?
The Ask AI feature in Ultralytics Explorer allows users to interact with their datasets using natural language queries. Powered by OpenAI, this feature enables you to ask complex questions and receive insightful answers without needing to write SQL queries or similar commands. To use this feature, you'll need to set your OpenAI API key the first time you run the GUI:
```bash
yolo settings openai_api_key="YOUR_API_KEY"
```
For more on this feature and how to integrate it, see our [GUI Explorer Usage](#gui-explorer-usage) section.
### Can I run Ultralytics Explorer in Google Colab?
Yes, Ultralytics Explorer can be run in Google Colab, providing a convenient and powerful environment for dataset exploration. You can start by opening the provided Colab notebook, which is pre-configured with all the necessary settings:
<a href="https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/docs/en/datasets/explorer/explorer.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
This setup allows you to explore your datasets fully, taking advantage of Google's cloud resources. Learn more in our [Google Colab Guide](../../integrations/google-colab.md).
---
comments: true
description: Explore Ultralytics' diverse datasets for vision tasks like detection, segmentation, classification, and more. Enhance your projects with high-quality annotated data.
keywords: Ultralytics, datasets, computer vision, object detection, instance segmentation, pose estimation, image classification, multi-object tracking
---
# Datasets Overview
Ultralytics provides support for various datasets to facilitate computer vision tasks such as detection, [instance segmentation](https://www.ultralytics.com/glossary/instance-segmentation), pose estimation, classification, and multi-object tracking. Below is a list of the main Ultralytics datasets, followed by a summary of each computer vision task and the respective datasets.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/YDXKa1EljmU"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Ultralytics Datasets Overview
</p>
## Ultralytics Explorer
!!! warning "Community Note ⚠️"
    As of **`ultralytics>=8.3.10`**, Ultralytics Explorer support has been deprecated. But don't worry! You can now access similar and even enhanced functionality through [Ultralytics HUB](https://hub.ultralytics.com/), our intuitive no-code platform designed to streamline your workflow. With Ultralytics HUB, you can continue exploring, visualizing, and managing your data effortlessly, all without writing a single line of code. Make sure to check it out and take advantage of its powerful features! 🚀
Create [embeddings](https://www.ultralytics.com/glossary/embeddings) for your dataset, search for similar images, run SQL queries, perform semantic search and even search using natural language! You can get started with our GUI app or build your own using the API. Learn more [here](explorer/index.md).
<p>
<img alt="Ultralytics Explorer Screenshot" src="https://github.com/ultralytics/docs/releases/download/0/ultralytics-explorer-screenshot.avif">
</p>
- Try the [GUI Demo](explorer/index.md)
- Learn more about the [Explorer API](explorer/index.md)
## [Object Detection](detect/index.md)
[Bounding box](https://www.ultralytics.com/glossary/bounding-box) object detection is a computer vision technique that involves detecting and localizing objects in an image by drawing a bounding box around each object.
- [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations.
- [COCO](detect/coco.md): Common Objects in Context (COCO) is a large-scale object detection, segmentation, and captioning dataset with 80 object categories.
- [LVIS](detect/lvis.md): A large-scale object detection, segmentation, and captioning dataset with 1203 object categories.
- [COCO8](detect/coco8.md): A smaller subset of the first 4 images from COCO train and COCO val, suitable for quick tests.
- [COCO128](detect/coco.md): A smaller subset of the first 128 images from COCO train and COCO val, suitable for tests.
- [Global Wheat 2020](detect/globalwheat2020.md): A dataset containing images of wheat heads for the Global Wheat Challenge 2020.
- [Objects365](detect/objects365.md): A high-quality, large-scale dataset for object detection with 365 object categories and over 600K annotated images.
- [OpenImagesV7](detect/open-images-v7.md): A comprehensive dataset by Google with 1.7M train images and 42k validation images.
- [SKU-110K](detect/sku-110k.md): A dataset featuring dense object detection in retail environments with over 11K images and 1.7 million bounding boxes.
- [VisDrone](detect/visdrone.md): A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.
- [VOC](detect/voc.md): The Pascal Visual Object Classes (VOC) dataset for object detection and segmentation with 20 object classes and over 11K images.
- [xView](detect/xview.md): A dataset for object detection in overhead imagery with 60 object categories and over 1 million annotated objects.
- [RF100](detect/roboflow-100.md): A diverse object detection benchmark with 100 datasets spanning seven imagery domains for comprehensive model evaluation.
- [Brain-tumor](detect/brain-tumor.md): A dataset for detecting brain tumors includes MRI or CT scan images with details on tumor presence, location, and characteristics.
- [African-wildlife](detect/african-wildlife.md): A dataset featuring images of African wildlife, including buffalo, elephant, rhino, and zebras.
- [Signature](detect/signature.md): A dataset featuring images of various documents with annotated signatures, supporting document verification and fraud detection research.
## [Instance Segmentation](segment/index.md)
Instance segmentation is a computer vision technique that involves identifying and localizing objects in an image at the pixel level.
- [COCO](segment/coco.md): A large-scale dataset designed for object detection, segmentation, and captioning tasks with over 200K labeled images.
- [COCO8-seg](segment/coco8-seg.md): A smaller dataset for instance segmentation tasks, containing a subset of 8 COCO images with segmentation annotations.
- [COCO128-seg](segment/coco.md): A smaller dataset for instance segmentation tasks, containing a subset of 128 COCO images with segmentation annotations.
- [Crack-seg](segment/crack-seg.md): Specifically crafted dataset for detecting cracks on roads and walls, applicable for both object detection and segmentation tasks.
- [Package-seg](segment/package-seg.md): Tailored dataset for identifying packages in warehouses or industrial settings, suitable for both object detection and segmentation applications.
- [Carparts-seg](segment/carparts-seg.md): Purpose-built dataset for identifying vehicle parts, catering to design, manufacturing, and research needs. It serves for both object detection and segmentation tasks.
## [Pose Estimation](pose/index.md)
Pose estimation is a technique used to determine the pose of the object relative to the camera or the world coordinate system.
- [COCO](pose/coco.md): A large-scale dataset with human pose annotations designed for pose estimation tasks.
- [COCO8-pose](pose/coco8-pose.md): A smaller dataset for pose estimation tasks, containing a subset of 8 COCO images with human pose annotations.
- [Tiger-pose](pose/tiger-pose.md): A compact dataset consisting of 263 images focused on tigers, annotated with 12 keypoints per tiger for pose estimation tasks.
- [Hand-Keypoints](pose/hand-keypoints.md): A concise dataset featuring over 26,000 images centered on human hands, annotated with 21 keypoints per hand, designed for pose estimation tasks.
- [Dog-pose](pose/dog-pose.md): A comprehensive dataset featuring approximately 6,000 images focused on dogs, annotated with 24 keypoints per dog, tailored for pose estimation tasks.
## [Classification](classify/index.md)
[Image classification](https://www.ultralytics.com/glossary/image-classification) is a computer vision task that involves categorizing an image into one or more predefined classes or categories based on its visual content.
- [Caltech 101](classify/caltech101.md): A dataset containing images of 101 object categories for image classification tasks.
- [Caltech 256](classify/caltech256.md): An extended version of Caltech 101 with 256 object categories and more challenging images.
- [CIFAR-10](classify/cifar10.md): A dataset of 60K 32x32 color images in 10 classes, with 6K images per class.
- [CIFAR-100](classify/cifar100.md): An extended version of CIFAR-10 with 100 object categories and 600 images per class.
- [Fashion-MNIST](classify/fashion-mnist.md): A dataset consisting of 70,000 grayscale images of 10 fashion categories for image classification tasks.
- [ImageNet](classify/imagenet.md): A large-scale dataset for object detection and image classification with over 14 million images and 20,000 categories.
- [ImageNet-10](classify/imagenet10.md): A smaller subset of ImageNet with 10 categories for faster experimentation and testing.
- [Imagenette](classify/imagenette.md): A smaller subset of ImageNet that contains 10 easily distinguishable classes for quicker training and testing.
- [Imagewoof](classify/imagewoof.md): A more challenging subset of ImageNet containing 10 dog breed categories for image classification tasks.
- [MNIST](classify/mnist.md): A dataset of 70,000 grayscale images of handwritten digits for image classification tasks.
- [MNIST160](classify/mnist.md): The first 8 images of each MNIST category, taken from both the train and test splits, for 160 images in total.
## [Oriented Bounding Boxes (OBB)](obb/index.md)
Oriented Bounding Boxes (OBB) is a method in computer vision for detecting angled objects in images using rotated bounding boxes, often applied to aerial and satellite imagery.
- [DOTA-v2](obb/dota-v2.md): A popular OBB aerial imagery dataset with 1.7 million instances and 11,268 images.
- [DOTA8](obb/dota8.md): A smaller subset of the first 8 images from the DOTAv1 split set, 4 for training and 4 for validation, suitable for quick tests.
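To illustrate what a rotated bounding box is geometrically, the sketch below computes the four corner points of a box from its center, size, and rotation angle. The `(cx, cy, w, h, angle)` parameterization and the `obb_corners` helper are assumptions made for this example; label files for a given dataset may store the corner points directly instead.

```python
import math


def obb_corners(cx, cy, w, h, angle_rad):
    """Return the 4 corners of a box centered at (cx, cy), rotated by angle_rad."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    # Corner offsets of the unrotated box, relative to its center.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset, then translate it back to the box center.
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]


# A 40x20 box rotated by 90 degrees becomes 20 wide and 40 tall.
corners = obb_corners(100.0, 50.0, 40.0, 20.0, math.pi / 2)
for x, y in corners:
    print(f"({x:.1f}, {y:.1f})")
```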
## [Multi-Object Tracking](track/index.md)
Multi-object tracking is a computer vision technique that involves detecting and tracking multiple objects over time in a video sequence.
- [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations for multi-object tracking tasks.
- [VisDrone](detect/visdrone.md): A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.
## Contribute New Datasets
Contributing a new dataset involves several steps to ensure that it aligns well with the existing infrastructure. Below are the necessary steps:
### Steps to Contribute a New Dataset
1. **Collect Images**: Gather the images that belong to the dataset. These could be collected from various sources, such as public databases or your own collection.
2. **Annotate Images**: Annotate these images with bounding boxes, segments, or keypoints, depending on the task.
3. **Export Annotations**: Convert these annotations into the YOLO `*.txt` file format which Ultralytics supports.
4. **Organize Dataset**: Arrange your dataset into the correct folder structure. You should have `train/` and `val/` top-level directories, and within each, an `images/` and `labels/` subdirectory.
    ```
    dataset/
    ├── train/
    │   ├── images/
    │   └── labels/
    └── val/
        ├── images/
        └── labels/
    ```
5. **Create a `data.yaml` File**: In your dataset's root directory, create a `data.yaml` file that describes the dataset, classes, and other necessary information.
6. **Optimize Images (Optional)**: If you want to reduce the size of the dataset for more efficient processing, you can optimize the images using the code below. This is not required, but recommended for smaller dataset sizes and faster download speeds.
7. **Zip Dataset**: Compress the entire dataset folder into a zip file.
8. **Document and PR**: Create a documentation page describing your dataset and how it fits into the existing framework. After that, submit a Pull Request (PR). Refer to [Ultralytics Contribution Guidelines](https://docs.ultralytics.com/help/contributing/) for more details on how to submit a PR.
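Steps 4 and 5 above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the `scaffold_dataset` helper is hypothetical, the class names are placeholders, and the `path`, `train`, `val`, and `names` keys follow the layout commonly used in Ultralytics dataset YAML files, so check an existing dataset YAML before relying on them.

```python
import tempfile
from pathlib import Path


def scaffold_dataset(root, class_names):
    """Create the train/val folder layout and a minimal data.yaml under `root`."""
    root = Path(root)
    for split in ("train", "val"):
        for sub in ("images", "labels"):
            (root / split / sub).mkdir(parents=True, exist_ok=True)

    # Assumed keys: path, train, val, names (class index -> class name).
    lines = [
        f"path: {root.as_posix()}",
        "train: train/images",
        "val: val/images",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]

    yaml_path = root / "data.yaml"
    yaml_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return yaml_path


# Demonstrate in a throwaway temporary directory with placeholder classes.
root = Path(tempfile.mkdtemp()) / "dataset"
yaml_path = scaffold_dataset(root, ["person", "dog"])
print(yaml_path.read_text(encoding="utf-8"))
```

Writing the YAML by hand keeps the sketch dependency-free; a YAML library is a better fit once the file needs more than a few keys.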
### Example Code to Optimize and Zip a Dataset
!!! example "Optimize and Zip a Dataset"

    === "Python"

        ```python
        from pathlib import Path

        from ultralytics.data.utils import compress_one_image
        from ultralytics.utils.downloads import zip_directory

        # Define dataset directory
        path = Path("path/to/dataset")

        # Optimize images in dataset (optional)
        for f in path.rglob("*.jpg"):
            compress_one_image(f)

        # Zip dataset into 'path/to/dataset.zip'
        zip_directory(path)
        ```
By following these steps, you can contribute a new dataset that integrates well with Ultralytics' existing structure.
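Before zipping, it can help to sanity-check that images and labels line up. The sketch below is one possible check, not part of the Ultralytics tooling: `check_split` is a hypothetical helper, it assumes detection-style labels with five whitespace-separated numeric fields per line (class index plus normalized center x, center y, width, height), and it only looks at `*.jpg` files. Images without a label file are reported rather than rejected, since they may be intentional background images.

```python
import tempfile
from pathlib import Path


def check_split(split_dir):
    """Return a list of problems found in one split (e.g. train/ or val/)."""
    problems = []
    for image in sorted((split_dir / "images").glob("*.jpg")):
        label = split_dir / "labels" / f"{image.stem}.txt"
        if not label.exists():
            # May be intentional for background images; reported so you can decide.
            problems.append(f"{image.name}: no label file")
            continue
        for n, line in enumerate(label.read_text().splitlines(), start=1):
            fields = line.split()
            if len(fields) != 5:
                problems.append(f"{label.name}:{n}: expected 5 fields, got {len(fields)}")
            elif any(not 0.0 <= float(v) <= 1.0 for v in fields[1:]):
                problems.append(f"{label.name}:{n}: coordinates must be in [0, 1]")
    return problems


# Build a tiny throwaway split to demonstrate the check (file names are made up).
split = Path(tempfile.mkdtemp()) / "train"
(split / "images").mkdir(parents=True)
(split / "labels").mkdir()
for name in ("a", "b", "c"):
    (split / "images" / f"{name}.jpg").touch()
(split / "labels" / "a.txt").write_text("0 0.5 0.5 0.2 0.2\n")  # valid
(split / "labels" / "b.txt").write_text("1 1.5 0.5 0.2 0.2\n")  # x out of range

problems = check_split(split)
for problem in problems:
    print(problem)
```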
## FAQ
### What datasets does Ultralytics support for [object detection](https://www.ultralytics.com/glossary/object-detection)?
Ultralytics supports a wide variety of datasets for object detection, including:
- [COCO](detect/coco.md): A large-scale object detection, segmentation, and captioning dataset with 80 object categories.
- [LVIS](detect/lvis.md): An extensive dataset with 1203 object categories, designed for more fine-grained object detection and segmentation.
- [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations.
- [VisDrone](detect/visdrone.md): A dataset with object detection and multi-object tracking data from drone-captured imagery.
- [SKU-110K](detect/sku-110k.md): A dataset for dense object detection in retail environments, with over 11K images.
These datasets facilitate training robust models for various object detection applications.
### How do I contribute a new dataset to Ultralytics?
Contributing a new dataset involves several steps:
1. **Collect Images**: Gather images from public databases or personal collections.
2. **Annotate Images**: Apply bounding boxes, segments, or keypoints, depending on the task.
3. **Export Annotations**: Convert annotations into the YOLO `*.txt` format.
4. **Organize Dataset**: Use the folder structure with `train/` and `val/` directories, each containing `images/` and `labels/` subdirectories.
5. **Create a `data.yaml` File**: Include dataset descriptions, classes, and other relevant information.
6. **Optimize Images (Optional)**: Reduce dataset size for efficiency.
7. **Zip Dataset**: Compress the dataset into a zip file.
8. **Document and PR**: Describe your dataset and submit a Pull Request following [Ultralytics Contribution Guidelines](https://docs.ultralytics.com/help/contributing/).
Visit [Contribute New Datasets](#contribute-new-datasets) for a comprehensive guide.
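As an illustration of step 3 for the detection task, the sketch below converts one pixel-space bounding box into a YOLO label line of the form `class x_center y_center width height`, with coordinates normalized by the image size. The function name `to_yolo_line` and the input box layout `(x_min, y_min, x_max, y_max)` are assumptions for this example; segmentation and keypoint labels use different line layouts.

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    # Box center and size, normalized to the 0-1 range by the image dimensions
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # A 200x200 pixel box in a 640x480 image, class index 0
    print(to_yolo_line(0, (100, 200, 300, 400), 640, 480))
```

Each image gets one `*.txt` file with the same stem, containing one such line per object.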
### Why should I use Ultralytics Explorer for my dataset?
Ultralytics Explorer offers powerful features for dataset analysis, including:
- **Embeddings Generation**: Create vector embeddings for images.
- **Semantic Search**: Search for similar images using embeddings or AI.
- **SQL Queries**: Run advanced SQL queries for detailed data analysis.
- **Natural Language Search**: Search using plain language queries for ease of use.
Explore the [Ultralytics Explorer](explorer/index.md) for more information and to try the [GUI Demo](explorer/index.md).
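To make the idea of embedding-based similarity search concrete, here is a conceptual sketch in plain Python. It is not the Explorer API: the function names, the tiny two-dimensional vectors, and the file names are all hypothetical, and real image embeddings have many more dimensions. It only shows the general technique of ranking images by cosine similarity to a query vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, embeddings, limit=2):
    """Return the names of the `limit` embeddings closest to the query vector."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(query, embeddings[name]),
        reverse=True,
    )
    return ranked[:limit]


if __name__ == "__main__":
    # Hypothetical embeddings keyed by image file name
    embeddings = {"a.jpg": [1.0, 0.0], "b.jpg": [0.0, 1.0], "c.jpg": [0.9, 0.1]}
    print(most_similar([1.0, 0.0], embeddings))
```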
### What are the unique features of Ultralytics YOLO models for [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv)?
Ultralytics YOLO models provide several unique features:
- **Real-time Performance**: High-speed inference and training.
- **Versatility**: Suitable for detection, segmentation, classification, and pose estimation tasks.
- **Pretrained Models**: Access to high-performing, pretrained models for various applications.
- **Extensive Community Support**: Active community and comprehensive documentation for troubleshooting and development.
Discover more about YOLO on the [Ultralytics YOLO](https://www.ultralytics.com/yolo) page.
### How can I optimize and zip a dataset using Ultralytics tools?
To optimize and zip a dataset using Ultralytics tools, follow this example code:
!!! example "Optimize and Zip a Dataset"
=== "Python"
```python
from pathlib import Path
from ultralytics.data.utils import compress_one_image
from ultralytics.utils.downloads import zip_directory
# Define dataset directory
path = Path("path/to/dataset")
# Optimize images in dataset (optional)
for f in path.rglob("*.jpg"):
    compress_one_image(f)
# Zip dataset into 'path/to/dataset.zip'
zip_directory(path)
```
Learn more on how to [Optimize and Zip a Dataset](#example-code-to-optimize-and-zip-a-dataset).