"vscode:/vscode.git/clone" did not exist on "bdc0ea1ba5346161c386f39a2414af810ba955e6"
Commit af155c51 authored by chenzk's avatar chenzk
Browse files

v1.0

parents
Pipeline #2732 failed with stages
in 0 seconds
| Format | `format` Argument | Model | Metadata | Arguments |
| ------------------------------------------------- | ----------------- | ----------------------------------------------- | -------- | -------------------------------------------------------------------- |
| [PyTorch](https://pytorch.org/) | - | `{{ model_name or "yolo11n" }}.pt` | ✅ | - |
| [TorchScript](../integrations/torchscript.md) | `torchscript` | `{{ model_name or "yolo11n" }}.torchscript` | ✅ | `imgsz`, `optimize`, `batch` |
| [ONNX](../integrations/onnx.md) | `onnx` | `{{ model_name or "yolo11n" }}.onnx` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `opset`, `batch` |
| [OpenVINO](../integrations/openvino.md) | `openvino` | `{{ model_name or "yolo11n" }}_openvino_model/` | ✅ | `imgsz`, `half`, `int8`, `batch` |
| [TensorRT](../integrations/tensorrt.md) | `engine` | `{{ model_name or "yolo11n" }}.engine` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `workspace`, `int8`, `batch` |
| [CoreML](../integrations/coreml.md) | `coreml` | `{{ model_name or "yolo11n" }}.mlpackage` | ✅ | `imgsz`, `half`, `int8`, `nms`, `batch` |
| [TF SavedModel](../integrations/tf-savedmodel.md) | `saved_model` | `{{ model_name or "yolo11n" }}_saved_model/` | ✅ | `imgsz`, `keras`, `int8`, `batch` |
| [TF GraphDef](../integrations/tf-graphdef.md) | `pb` | `{{ model_name or "yolo11n" }}.pb` | ❌ | `imgsz`, `batch` |
| [TF Lite](../integrations/tflite.md) | `tflite` | `{{ model_name or "yolo11n" }}.tflite` | ✅ | `imgsz`, `half`, `int8`, `batch` |
| [TF Edge TPU](../integrations/edge-tpu.md) | `edgetpu` | `{{ model_name or "yolo11n" }}_edgetpu.tflite` | ✅ | `imgsz` |
| [TF.js](../integrations/tfjs.md) | `tfjs` | `{{ model_name or "yolo11n" }}_web_model/` | ✅ | `imgsz`, `half`, `int8`, `batch` |
| [PaddlePaddle](../integrations/paddlepaddle.md) | `paddle` | `{{ model_name or "yolo11n" }}_paddle_model/` | ✅ | `imgsz`, `batch` |
| [MNN](../integrations/mnn.md) | `mnn` | `{{ model_name or "yolo11n" }}.mnn` | ✅ | `imgsz`, `batch`, `int8`, `half` |
| [NCNN](../integrations/ncnn.md) | `ncnn` | `{{ model_name or "yolo11n" }}_ncnn_model/` | ✅ | `imgsz`, `half`, `batch` |
| [IMX500](../integrations/sony-imx500.md) | `imx` | `{{ model_name or "yolov8n" }}_imx_model/` | ✅ | `imgsz`, `int8` |
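For orientation, the `format` values in this table map directly onto the `export()` call in the Python API; a minimal sketch (the model checkpoint and output name are illustrative):

```python
from ultralytics import YOLO

# Load a pretrained model (any supported .pt checkpoint)
model = YOLO("yolo11n.pt")

# Export to ONNX using arguments from the table above
path = model.export(format="onnx", imgsz=640, half=False, dynamic=False, simplify=True)
print(path)  # path to the exported file, e.g. "yolo11n.onnx"
```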
| Argument | Type | Default | Description |
| --------------- | -------------- | ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `source` | `str` | `'ultralytics/assets'` | Specifies the data source for inference. Can be an image path, video file, directory, URL, or device ID for live feeds. Supports a wide range of formats and sources, enabling flexible application across [different types of input](/modes/predict.md/#inference-sources). |
| `conf` | `float` | `0.25` | Sets the minimum confidence threshold for detections. Objects detected with confidence below this threshold will be disregarded. Adjusting this value can help reduce false positives. |
| `iou` | `float` | `0.7` | [Intersection Over Union](https://www.ultralytics.com/glossary/intersection-over-union-iou) (IoU) threshold for Non-Maximum Suppression (NMS). Lower values result in fewer detections by eliminating overlapping boxes, useful for reducing duplicates. |
| `imgsz` | `int or tuple` | `640` | Defines the image size for inference. Can be a single integer `640` for square resizing or a (height, width) tuple. Proper sizing can improve detection [accuracy](https://www.ultralytics.com/glossary/accuracy) and processing speed. |
| `half` | `bool` | `False` | Enables half-[precision](https://www.ultralytics.com/glossary/precision) (FP16) inference, which can speed up model inference on supported GPUs with minimal impact on accuracy. |
| `device` | `str` | `None` | Specifies the device for inference (e.g., `cpu`, `cuda:0` or `0`). Allows users to select between CPU, a specific GPU, or other compute devices for model execution. |
| `max_det` | `int` | `300` | Maximum number of detections allowed per image. Limits the total number of objects the model can detect in a single inference, preventing excessive outputs in dense scenes. |
| `vid_stride` | `int` | `1` | Frame stride for video inputs. Allows skipping frames in videos to speed up processing at the cost of temporal resolution. A value of 1 processes every frame, higher values skip frames. |
| `stream_buffer` | `bool` | `False` | Determines whether to queue incoming frames for video streams. If `False`, old frames get dropped to accommodate new frames (optimized for real-time applications). If `True`, queues new frames in a buffer, ensuring no frames get skipped, but will introduce latency if inference FPS is lower than stream FPS. |
| `visualize` | `bool` | `False` | Activates visualization of model features during inference, providing insights into what the model is "seeing". Useful for debugging and model interpretation. |
| `augment` | `bool` | `False` | Enables test-time augmentation (TTA) for predictions, potentially improving detection robustness at the cost of inference speed. |
| `agnostic_nms` | `bool` | `False` | Enables class-agnostic Non-Maximum Suppression (NMS), which merges overlapping boxes of different classes. Useful in multi-class detection scenarios where class overlap is common. |
| `classes` | `list[int]` | `None` | Filters predictions to a set of class IDs. Only detections belonging to the specified classes will be returned. Useful for focusing on relevant objects in multi-class detection tasks. |
| `retina_masks` | `bool` | `False` | Returns high-resolution segmentation masks. The returned masks (`masks.data`) will match the original image size if enabled. If disabled, they have the image size used during inference. |
| `embed` | `list[int]` | `None` | Specifies the layers from which to extract feature vectors or [embeddings](https://www.ultralytics.com/glossary/embeddings). Useful for downstream tasks like clustering or similarity search. |
| `project` | `str` | `None` | Name of the project directory where prediction outputs are saved if `save` is enabled. |
| `name` | `str` | `None` | Name of the prediction run. Used for creating a subdirectory within the project folder, where prediction outputs are stored if `save` is enabled. |
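As a rough sketch, these inference arguments are passed directly to the model call or `predict()` method; the source path below is a placeholder:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Run inference with a subset of the arguments documented above
results = model.predict(
    source="path/to/image.jpg",  # placeholder path
    conf=0.25,
    iou=0.7,
    imgsz=640,
    max_det=300,
    classes=[0, 2],  # keep only the listed class IDs
)
for r in results:
    print(r.boxes)  # predicted boxes for each image
```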
| Argument | Type | Default | Description |
| --------- | ------- | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `source` | `str` | `None` | Specifies the source directory for images or videos. Supports file paths and URLs. |
| `persist` | `bool` | `False` | Enables persistent tracking of objects between frames, maintaining IDs across video sequences. |
| `tracker` | `str` | `botsort.yaml` | Specifies the tracking algorithm to use, e.g., `bytetrack.yaml` or `botsort.yaml`. |
| `conf` | `float` | `0.3` | Sets the confidence threshold for detections; lower values allow more objects to be tracked but may include false positives. |
| `iou` | `float` | `0.5` | Sets the [Intersection over Union](https://www.ultralytics.com/glossary/intersection-over-union-iou) (IoU) threshold for filtering overlapping detections. |
| `classes` | `list` | `None` | Filters results by class index. For example, `classes=[0, 2, 3]` only tracks the specified classes. |
| `verbose` | `bool` | `True` | Controls the display of tracking results, providing a visual output of tracked objects. |
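A minimal sketch of how these tracking arguments are typically passed to `track()`; the video path is a placeholder:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Track objects in a video, keeping IDs consistent across frames
results = model.track(
    source="path/to/video.mp4",  # placeholder path
    tracker="bytetrack.yaml",
    conf=0.3,
    iou=0.5,
    persist=True,
)
```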
| Argument | Default | Description |
| ----------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `model` | `None` | Specifies the model file for training. Accepts a path to either a `.pt` pretrained model or a `.yaml` configuration file. Essential for defining the model structure or initializing weights. |
| `data` | `None` | Path to the dataset configuration file (e.g., `coco8.yaml`). This file contains dataset-specific parameters, including paths to training and [validation data](https://www.ultralytics.com/glossary/validation-data), class names, and number of classes. |
| `epochs` | `100` | Total number of training epochs. Each [epoch](https://www.ultralytics.com/glossary/epoch) represents a full pass over the entire dataset. Adjusting this value can affect training duration and model performance. |
| `time` | `None` | Maximum training time in hours. If set, this overrides the `epochs` argument, allowing training to automatically stop after the specified duration. Useful for time-constrained training scenarios. |
| `patience` | `100` | Number of epochs to wait without improvement in validation metrics before early stopping the training. Helps prevent [overfitting](https://www.ultralytics.com/glossary/overfitting) by stopping training when performance plateaus. |
| `batch` | `16` | [Batch size](https://www.ultralytics.com/glossary/batch-size), with three modes: set as an integer (e.g., `batch=16`), auto mode for 60% GPU memory utilization (`batch=-1`), or auto mode with specified utilization fraction (`batch=0.70`). |
| `imgsz` | `640` | Target image size for training. All images are resized to this dimension before being fed into the model. Affects model [accuracy](https://www.ultralytics.com/glossary/accuracy) and computational complexity. |
| `save` | `True` | Enables saving of training checkpoints and final model weights. Useful for resuming training or [model deployment](https://www.ultralytics.com/glossary/model-deployment). |
| `save_period` | `-1` | Frequency of saving model checkpoints, specified in epochs. A value of -1 disables this feature. Useful for saving interim models during long training sessions. |
| `cache` | `False` | Enables caching of dataset images in memory (`True`/`ram`), on disk (`disk`), or disables it (`False`). Improves training speed by reducing disk I/O at the cost of increased memory usage. |
| `device` | `None` | Specifies the computational device(s) for training: a single GPU (`device=0`), multiple GPUs (`device=0,1`), CPU (`device=cpu`), or MPS for Apple silicon (`device=mps`). |
| `workers` | `8` | Number of worker threads for data loading (per `RANK` if Multi-GPU training). Influences the speed of data preprocessing and feeding into the model, especially useful in multi-GPU setups. |
| `project` | `None` | Name of the project directory where training outputs are saved. Allows for organized storage of different experiments. |
| `name` | `None` | Name of the training run. Used for creating a subdirectory within the project folder, where training logs and outputs are stored. |
| `exist_ok` | `False` | If True, allows overwriting of an existing project/name directory. Useful for iterative experimentation without needing to manually clear previous outputs. |
| `pretrained` | `True` | Determines whether to start training from a pretrained model. Can be a boolean value or a string path to a specific model from which to load weights. Enhances training efficiency and model performance. |
| `optimizer` | `'auto'` | Choice of optimizer for training. Options include `SGD`, `Adam`, `AdamW`, `NAdam`, `RAdam`, `RMSProp` etc., or `auto` for automatic selection based on model configuration. Affects convergence speed and stability. |
| `seed` | `0` | Sets the random seed for training, ensuring reproducibility of results across runs with the same configurations. |
| `deterministic` | `True` | Forces deterministic algorithm use, ensuring reproducibility but may affect performance and speed due to the restriction on non-deterministic algorithms. |
| `single_cls` | `False` | Treats all classes in multi-class datasets as a single class during training. Useful for binary classification tasks or when focusing on object presence rather than classification. |
| `classes` | `None` | Specifies a list of class IDs to train on. Useful for filtering out and focusing only on certain classes during training. |
| `rect` | `False` | Enables rectangular training, optimizing batch composition for minimal padding. Can improve efficiency and speed but may affect model accuracy. |
| `cos_lr` | `False` | Utilizes a cosine [learning rate](https://www.ultralytics.com/glossary/learning-rate) scheduler, adjusting the learning rate following a cosine curve over epochs. Helps in managing learning rate for better convergence. |
| `close_mosaic` | `10` | Disables mosaic [data augmentation](https://www.ultralytics.com/glossary/data-augmentation) in the last N epochs to stabilize training before completion. Setting to 0 disables this feature. |
| `resume` | `False` | Resumes training from the last saved checkpoint. Automatically loads model weights, optimizer state, and epoch count, continuing training seamlessly. |
| `amp` | `True` | Enables Automatic [Mixed Precision](https://www.ultralytics.com/glossary/mixed-precision) (AMP) training, reducing memory usage and possibly speeding up training with minimal impact on accuracy. |
| `fraction` | `1.0` | Specifies the fraction of the dataset to use for training. Allows for training on a subset of the full dataset, useful for experiments or when resources are limited. |
| `profile` | `False` | Enables profiling of ONNX and TensorRT speeds during training, useful for optimizing model deployment. |
| `freeze` | `None` | Freezes the first N layers of the model or specified layers by index, reducing the number of trainable parameters. Useful for fine-tuning or [transfer learning](https://www.ultralytics.com/glossary/transfer-learning). |
| `lr0` | `0.01` | Initial learning rate (i.e., `SGD=1E-2`, `Adam=1E-3`). Adjusting this value is crucial for the optimization process, influencing how rapidly model weights are updated. |
| `lrf` | `0.01` | Final learning rate as a fraction of the initial rate = (`lr0 * lrf`), used in conjunction with schedulers to adjust the learning rate over time. |
| `momentum` | `0.937` | Momentum factor for SGD or beta1 for [Adam optimizers](https://www.ultralytics.com/glossary/adam-optimizer), influencing the incorporation of past gradients in the current update. |
| `weight_decay` | `0.0005` | L2 [regularization](https://www.ultralytics.com/glossary/regularization) term, penalizing large weights to prevent overfitting. |
| `warmup_epochs` | `3.0` | Number of epochs for learning rate warmup, gradually increasing the learning rate from a low value to the initial learning rate to stabilize training early on. |
| `warmup_momentum` | `0.8` | Initial momentum for warmup phase, gradually adjusting to the set momentum over the warmup period. |
| `warmup_bias_lr` | `0.1` | Learning rate for bias parameters during the warmup phase, helping stabilize model training in the initial epochs. |
| `box` | `7.5` | Weight of the box loss component in the [loss function](https://www.ultralytics.com/glossary/loss-function), influencing how much emphasis is placed on accurately predicting [bounding box](https://www.ultralytics.com/glossary/bounding-box) coordinates. |
| `cls` | `0.5` | Weight of the classification loss in the total loss function, affecting the importance of correct class prediction relative to other components. |
| `dfl` | `1.5` | Weight of the distribution focal loss, used in certain YOLO versions for fine-grained classification. |
| `pose` | `12.0` | Weight of the pose loss in models trained for pose estimation, influencing the emphasis on accurately predicting pose keypoints. |
| `kobj` | `2.0` | Weight of the keypoint objectness loss in pose estimation models, balancing detection confidence with pose accuracy. |
| `nbs` | `64` | Nominal batch size for normalization of loss. |
| `overlap_mask` | `True` | Determines whether object masks should be merged into a single mask for training, or kept separate for each object. In case of overlap, the smaller mask is overlaid on top of the larger mask during the merge. |
| `mask_ratio` | `4` | Downsample ratio for segmentation masks, affecting the resolution of masks used during training. |
| `dropout` | `0.0` | Dropout rate for regularization in classification tasks, preventing overfitting by randomly omitting units during training. |
| `val` | `True` | Enables validation during training, allowing for periodic evaluation of model performance on a separate dataset. |
| `plots` | `False` | Generates and saves plots of training and validation metrics, as well as prediction examples, providing visual insights into model performance and learning progression. |
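For reference, a minimal training call wiring up a few of the arguments above; the dataset YAML and hyperparameter values are illustrative:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Train with a handful of the arguments documented above
results = model.train(
    data="coco8.yaml",
    epochs=100,
    imgsz=640,
    batch=16,
    optimizer="auto",
    lr0=0.01,
    device=0,  # single GPU; use "cpu" or "mps" where appropriate
)
```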
| Argument | Type | Default | Description |
| ------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `data` | `str` | `None` | Specifies the path to the dataset configuration file (e.g., `coco8.yaml`). This file includes paths to [validation data](https://www.ultralytics.com/glossary/validation-data), class names, and number of classes. |
| `imgsz` | `int` | `640` | Defines the size of input images. All images are resized to this dimension before processing. |
| `batch` | `int` | `16` | Sets the number of images per batch. Use `-1` for AutoBatch, which automatically adjusts based on GPU memory availability. |
| `save_json` | `bool` | `False` | If `True`, saves the results to a JSON file for further analysis or integration with other tools. |
| `save_hybrid` | `bool` | `False` | If `True`, saves a hybrid version of labels that combines original annotations with additional model predictions. |
| `conf` | `float` | `0.001` | Sets the minimum confidence threshold for detections. Detections with confidence below this threshold are discarded. |
| `iou` | `float` | `0.6` | Sets the [Intersection Over Union](https://www.ultralytics.com/glossary/intersection-over-union-iou) (IoU) threshold for Non-Maximum Suppression (NMS). Helps in reducing duplicate detections. |
| `max_det` | `int` | `300` | Limits the maximum number of detections per image. Useful in dense scenes to prevent excessive detections. |
| `half` | `bool` | `True` | Enables half-[precision](https://www.ultralytics.com/glossary/precision) (FP16) computation, reducing memory usage and potentially increasing speed with minimal impact on [accuracy](https://www.ultralytics.com/glossary/accuracy). |
| `device` | `str` | `None` | Specifies the device for validation (`cpu`, `cuda:0`, etc.). Allows flexibility in utilizing CPU or GPU resources. |
| `dnn` | `bool` | `False` | If `True`, uses the [OpenCV](https://www.ultralytics.com/glossary/opencv) DNN module for ONNX model inference, offering an alternative to [PyTorch](https://www.ultralytics.com/glossary/pytorch) inference methods. |
| `plots` | `bool` | `False` | When set to `True`, generates and saves plots of predictions versus ground truth for visual evaluation of the model's performance. |
| `rect` | `bool` | `True` | If `True`, uses rectangular inference for batching, reducing padding and potentially increasing speed and efficiency. |
| `split` | `str` | `val` | Determines the dataset split to use for validation (`val`, `test`, or `train`). Allows flexibility in choosing the data segment for performance evaluation. |
| `project` | `str` | `None` | Name of the project directory where validation outputs are saved. |
| `name` | `str` | `None` | Name of the validation run. Used for creating a subdirectory within the project folder, where validation logs and outputs are stored. |
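A brief sketch of a validation call using several of these arguments; the values mirror the defaults listed above:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Validate on the "val" split of the configured dataset
metrics = model.val(data="coco8.yaml", imgsz=640, conf=0.001, iou=0.6, split="val")
print(metrics.box.map)  # mAP50-95 for detection models
```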
| Argument | Type | Default | Description |
| ------------- | --------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `show` | `bool` | `False` | If `True`, displays the annotated images or videos in a window. Useful for immediate visual feedback during development or testing. |
| `save` | `bool` | `False` or `True` | Enables saving of the annotated images or videos to file. Useful for documentation, further analysis, or sharing results. Defaults to `True` when using the CLI and `False` when used in Python. |
| `save_frames` | `bool` | `False` | When processing videos, saves individual frames as images. Useful for extracting specific frames or for detailed frame-by-frame analysis. |
| `save_txt` | `bool` | `False` | Saves detection results in a text file, following the format `[class] [x_center] [y_center] [width] [height] [confidence]`. Useful for integration with other analysis tools. |
| `save_conf` | `bool` | `False` | Includes confidence scores in the saved text files. Enhances the detail available for post-processing and analysis. |
| `save_crop` | `bool` | `False` | Saves cropped images of detections. Useful for dataset augmentation, analysis, or creating focused datasets for specific objects. |
| `show_labels` | `bool` | `True` | Displays labels for each detection in the visual output. Provides immediate understanding of detected objects. |
| `show_conf` | `bool` | `True` | Displays the confidence score for each detection alongside the label. Gives insight into the model's certainty for each detection. |
| `show_boxes` | `bool` | `True` | Draws bounding boxes around detected objects. Essential for visual identification and location of objects in images or video frames. |
| `line_width` | `None` or `int` | `None` | Specifies the line width of bounding boxes. If `None`, the line width is automatically adjusted based on the image size. Provides visual customization for clarity. |
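These visualization arguments can be combined in a single prediction call, roughly as follows (the source path is a placeholder):

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Save annotated output, label text files, and cropped detections
results = model.predict(
    source="path/to/image.jpg",  # placeholder path
    save=True,
    save_txt=True,
    save_conf=True,
    save_crop=True,
    show_labels=True,
    show_conf=True,
    line_width=2,
)
```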
| Model | size<br><sup>(pixels) | acc<br><sup>top1 | acc<br><sup>top5 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>T4 TensorRT10<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) at 640 |
| -------------------------------------------------------------------------------------------- | --------------------- | ---------------- | ---------------- | ------------------------------ | ----------------------------------- | ------------------ | ------------------------ |
| [YOLO11n-cls](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n-cls.pt) | 224 | 70.0 | 89.4 | 5.0 ± 0.3 | 1.1 ± 0.0 | 1.6 | 3.3 |
| [YOLO11s-cls](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11s-cls.pt) | 224 | 75.4 | 92.7 | 7.9 ± 0.2 | 1.3 ± 0.0 | 5.5 | 12.1 |
| [YOLO11m-cls](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11m-cls.pt) | 224 | 77.3 | 93.9 | 17.2 ± 0.4 | 2.0 ± 0.0 | 10.4 | 39.3 |
| [YOLO11l-cls](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11l-cls.pt) | 224 | 78.3 | 94.3 | 23.2 ± 0.3 | 2.8 ± 0.0 | 12.9 | 49.4 |
| [YOLO11x-cls](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x-cls.pt) | 224 | 79.5 | 94.9 | 41.4 ± 0.9 | 3.8 ± 0.0 | 28.4 | 110.4 |
| Model | size<br><sup>(pixels) | mAP<sup>val<br>50-95 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>T4 TensorRT10<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
| ------------------------------------------------------------------------------------ | --------------------- | -------------------- | ------------------------------ | ----------------------------------- | ------------------ | ----------------- |
| [YOLO11n](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt) | 640 | 39.5 | 56.1 ± 0.8 | 1.5 ± 0.0 | 2.6 | 6.5 |
| [YOLO11s](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11s.pt) | 640 | 47.0 | 90.0 ± 1.2 | 2.5 ± 0.0 | 9.4 | 21.5 |
| [YOLO11m](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11m.pt) | 640 | 51.5 | 183.2 ± 2.0 | 4.7 ± 0.1 | 20.1 | 68.0 |
| [YOLO11l](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11l.pt) | 640 | 53.4 | 238.6 ± 1.4 | 6.2 ± 0.1 | 25.3 | 86.9 |
| [YOLO11x](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt) | 640 | 54.7 | 462.8 ± 6.7 | 11.3 ± 0.2 | 56.9 | 194.9 |
| Model | size<br><sup>(pixels) | mAP<sup>test<br>50 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>T4 TensorRT10<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
| -------------------------------------------------------------------------------------------- | --------------------- | ------------------ | ------------------------------ | ----------------------------------- | ------------------ | ----------------- |
| [YOLO11n-obb](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n-obb.pt) | 1024 | 78.4 | 117.6 ± 0.8 | 4.4 ± 0.0 | 2.7 | 17.2 |
| [YOLO11s-obb](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11s-obb.pt) | 1024 | 79.5 | 219.4 ± 4.0 | 5.1 ± 0.0 | 9.7 | 57.5 |
| [YOLO11m-obb](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11m-obb.pt) | 1024 | 80.9 | 562.8 ± 2.9 | 10.1 ± 0.4 | 20.9 | 183.5 |
| [YOLO11l-obb](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11l-obb.pt) | 1024 | 81.0 | 712.5 ± 5.0 | 13.5 ± 0.6 | 26.2 | 232.0 |
| [YOLO11x-obb](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x-obb.pt) | 1024 | 81.3 | 1408.6 ± 7.7 | 28.6 ± 1.0 | 58.8 | 520.2 |
| Model | size<br><sup>(pixels) | mAP<sup>pose<br>50-95 | mAP<sup>pose<br>50 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>T4 TensorRT10<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
| ---------------------------------------------------------------------------------------------- | --------------------- | --------------------- | ------------------ | ------------------------------ | ----------------------------------- | ------------------ | ----------------- |
| [YOLO11n-pose](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n-pose.pt) | 640 | 50.0 | 81.0 | 52.4 ± 0.5 | 1.7 ± 0.0 | 2.9 | 7.6 |
| [YOLO11s-pose](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11s-pose.pt) | 640 | 58.9 | 86.3 | 90.5 ± 0.6 | 2.6 ± 0.0 | 9.9 | 23.2 |
| [YOLO11m-pose](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11m-pose.pt) | 640 | 64.9 | 89.4 | 187.3 ± 0.8 | 4.9 ± 0.1 | 20.9 | 71.7 |
| [YOLO11l-pose](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11l-pose.pt) | 640 | 66.1 | 89.9 | 247.7 ± 1.1 | 6.4 ± 0.1 | 26.2 | 90.7 |
| [YOLO11x-pose](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x-pose.pt) | 640 | 69.5 | 91.1 | 488.0 ± 13.9 | 12.1 ± 0.2 | 58.8 | 203.3 |
| Model | size<br><sup>(pixels) | mAP<sup>box<br>50-95 | mAP<sup>mask<br>50-95 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>T4 TensorRT10<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
| -------------------------------------------------------------------------------------------- | --------------------- | -------------------- | --------------------- | ------------------------------ | ----------------------------------- | ------------------ | ----------------- |
| [YOLO11n-seg](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n-seg.pt) | 640 | 38.9 | 32.0 | 65.9 ± 1.1 | 1.8 ± 0.0 | 2.9 | 10.4 |
| [YOLO11s-seg](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11s-seg.pt) | 640 | 46.6 | 37.8 | 117.6 ± 4.9 | 2.9 ± 0.0 | 10.1 | 35.5 |
| [YOLO11m-seg](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11m-seg.pt) | 640 | 51.5 | 41.5 | 281.6 ± 1.2 | 6.3 ± 0.1 | 22.4 | 123.3 |
| [YOLO11l-seg](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11l-seg.pt) | 640 | 53.4 | 42.9 | 344.2 ± 3.2 | 7.8 ± 0.2 | 27.6 | 142.2 |
| [YOLO11x-seg](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x-seg.pt) | 640 | 54.7 | 43.8 | 664.5 ± 3.2 | 15.8 ± 0.7 | 62.1 | 319.0 |
---
comments: true
description: Discover FastSAM, a real-time CNN-based solution for segmenting any object in an image. Efficient, competitive, and ideal for various vision tasks.
keywords: FastSAM, Fast Segment Anything Model, Ultralytics, real-time segmentation, CNN, YOLOv8-seg, object segmentation, image processing, computer vision
---
# Fast Segment Anything Model (FastSAM)
The Fast Segment Anything Model (FastSAM) is a novel, real-time CNN-based solution for the Segment Anything task. This task is designed to segment any object within an image based on various possible user interaction prompts. FastSAM significantly reduces computational demands while maintaining competitive performance, making it a practical choice for a variety of vision tasks.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/F7db-EHhxss"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Object Tracking using FastSAM with Ultralytics
</p>
## Model Architecture
![Fast Segment Anything Model (FastSAM) architecture overview](https://github.com/ultralytics/docs/releases/download/0/fastsam-architecture-overview.avif)
## Overview
FastSAM is designed to address the limitations of the [Segment Anything Model (SAM)](sam.md), a heavy [Transformer](https://www.ultralytics.com/glossary/transformer) model with substantial computational resource requirements. The FastSAM decouples the segment anything task into two sequential stages: all-[instance segmentation](https://www.ultralytics.com/glossary/instance-segmentation) and prompt-guided selection. The first stage uses [YOLOv8-seg](../tasks/segment.md) to produce the segmentation masks of all instances in the image. In the second stage, it outputs the region-of-interest corresponding to the prompt.
## Key Features
1. **Real-time Solution:** By leveraging the computational efficiency of CNNs, FastSAM provides a real-time solution for the segment anything task, making it valuable for industrial applications that require quick results.
2. **Efficiency and Performance:** FastSAM offers a significant reduction in computational and resource demands without compromising on performance quality. It achieves comparable performance to SAM but with drastically reduced computational resources, enabling real-time application.
3. **Prompt-guided Segmentation:** FastSAM can segment any object within an image guided by various possible user interaction prompts, providing flexibility and adaptability in different scenarios.
4. **Based on YOLOv8-seg:** FastSAM is based on [YOLOv8-seg](../tasks/segment.md), an object detector equipped with an instance segmentation branch. This allows it to effectively produce the segmentation masks of all instances in an image.
5. **Competitive Results on Benchmarks:** On the object proposal task on MS COCO, FastSAM achieves high scores at a significantly faster speed than [SAM](sam.md) on a single NVIDIA RTX 3090, demonstrating its efficiency and capability.
6. **Practical Applications:** The proposed approach provides a new, practical solution for a large number of vision tasks at very high speed, tens or hundreds of times faster than current methods.
7. **Model Compression Feasibility:** FastSAM demonstrates the feasibility of a path that can significantly reduce the computational effort by introducing an artificial prior into the structure, thus opening new possibilities for large model architectures in general vision tasks.
## Available Models, Supported Tasks, and Operating Modes
This table presents the available models with their specific pre-trained weights, the tasks they support, and their compatibility with different operating modes like [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md), indicated by ✅ emojis for supported modes and ❌ emojis for unsupported modes.
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
| ---------- | ------------------------------------------------------------------------------------------- | -------------------------------------------- | --------- | ---------- | -------- | ------ |
| FastSAM-s | [FastSAM-s.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/FastSAM-s.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ✅ |
| FastSAM-x | [FastSAM-x.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/FastSAM-x.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ✅ |
## Usage Examples
The FastSAM models are easy to integrate into your Python applications. Ultralytics provides a user-friendly Python API and CLI commands to streamline development.
### Predict Usage
To perform [object detection](https://www.ultralytics.com/glossary/object-detection) on an image, use the `predict` method as shown below:
!!! example
=== "Python"
```python
from ultralytics import FastSAM
# Define an inference source
source = "path/to/bus.jpg"
# Create a FastSAM model
model = FastSAM("FastSAM-s.pt") # or FastSAM-x.pt
# Run inference on an image
everything_results = model(source, device="cpu", retina_masks=True, imgsz=1024, conf=0.4, iou=0.9)
# Run inference with bboxes prompt
results = model(source, bboxes=[439, 437, 524, 709])
# Run inference with points prompt
results = model(source, points=[[200, 200]], labels=[1])
# Run inference with texts prompt
results = model(source, texts="a photo of a dog")
# Run inference with bboxes and points and texts prompt at the same time
results = model(source, bboxes=[439, 437, 524, 709], points=[[200, 200]], labels=[1], texts="a photo of a dog")
```
=== "CLI"
```bash
# Load a FastSAM model and segment everything with it
yolo segment predict model=FastSAM-s.pt source=path/to/bus.jpg imgsz=640
```
This snippet demonstrates the simplicity of loading a pre-trained model and running a prediction on an image.
!!! example "FastSAMPredictor example"
This way you can run inference on an image once to obtain all the segment `results`, then run prompt inference multiple times without re-running the model.
=== "Prompt inference"
```python
from ultralytics.models.fastsam import FastSAMPredictor
# Create FastSAMPredictor
overrides = dict(conf=0.25, task="segment", mode="predict", model="FastSAM-s.pt", save=False, imgsz=1024)
predictor = FastSAMPredictor(overrides=overrides)
# Segment everything
everything_results = predictor("ultralytics/assets/bus.jpg")
# Prompt inference
bbox_results = predictor.prompt(everything_results, bboxes=[[200, 200, 300, 300]])
point_results = predictor.prompt(everything_results, points=[200, 200])
text_results = predictor.prompt(everything_results, texts="a photo of a dog")
```
!!! note
All the returned `results` in the above examples are [Results](../modes/predict.md#working-with-results) objects, which allow easy access to the predicted masks and the source image.
### Val Usage
Validation of the model on a dataset can be done as follows:
!!! example
=== "Python"
```python
from ultralytics import FastSAM
# Create a FastSAM model
model = FastSAM("FastSAM-s.pt") # or FastSAM-x.pt
# Validate the model
results = model.val(data="coco8-seg.yaml")
```
=== "CLI"
```bash
# Load a FastSAM model and validate it on the COCO8 example dataset at image size 640
yolo segment val model=FastSAM-s.pt data=coco8-seg.yaml imgsz=640
```
Please note that FastSAM only supports detection and segmentation of a single class of object. This means it will recognize and segment all objects as the same class. Therefore, when preparing the dataset, you need to convert all object category IDs to 0.
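As a hedged sketch, one way to remap YOLO-format label files to a single class before validation; the `labels/` directory path is an assumption about your dataset layout:

```python
from pathlib import Path

# Rewrite every YOLO-format label file so all category IDs become 0 (single class)
labels_dir = Path("path/to/dataset/labels")  # assumed location of the label files
for label_file in labels_dir.rglob("*.txt"):
    lines = [line for line in label_file.read_text().splitlines() if line.strip()]
    remapped = ["0 " + " ".join(line.split()[1:]) for line in lines]
    label_file.write_text("\n".join(remapped) + "\n")
```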
### Track Usage
To perform object tracking on a video, use the `track` method as shown below:
!!! example
=== "Python"
```python
from ultralytics import FastSAM
# Create a FastSAM model
model = FastSAM("FastSAM-s.pt") # or FastSAM-x.pt
# Track with a FastSAM model on a video
results = model.track(source="path/to/video.mp4", imgsz=640)
```
=== "CLI"
```bash
yolo segment track model=FastSAM-s.pt source="path/to/video/file.mp4" imgsz=640
```
## FastSAM Official Usage
FastSAM is also available directly from the [https://github.com/CASIA-IVA-Lab/FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM) repository. Here is a brief overview of the typical steps you might take to use FastSAM:
### Installation
1. Clone the FastSAM repository:
```shell
git clone https://github.com/CASIA-IVA-Lab/FastSAM.git
```
2. Create and activate a Conda environment with Python 3.9:
```shell
conda create -n FastSAM python=3.9
conda activate FastSAM
```
3. Navigate to the cloned repository and install the required packages:
```shell
cd FastSAM
pip install -r requirements.txt
```
4. Install the CLIP model:
```shell
pip install git+https://github.com/ultralytics/CLIP.git
```
### Example Usage
1. Download a [model checkpoint](https://drive.google.com/file/d/1m1sjY4ihXBU1fZXdQ-Xdj-mDltW-2Rqv/view?usp=sharing).
2. Use FastSAM for inference. Example commands:
- Segment everything in an image:
```shell
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg
```
- Segment specific objects using text prompt:
```shell
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --text_prompt "the yellow dog"
```
- Segment objects within a [bounding box](https://www.ultralytics.com/glossary/bounding-box) (provide box coordinates in xywh format):
```shell
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --box_prompt "[570,200,230,400]"
```
- Segment objects near specific points:
```shell
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --point_prompt "[[520,360],[620,300]]" --point_label "[1,0]"
```
Additionally, you can try FastSAM through a [Colab demo](https://colab.research.google.com/drive/1oX14f6IneGGw612WgVlAiy91UHwFAvr9?usp=sharing) or on the [HuggingFace web demo](https://huggingface.co/spaces/An-619/FastSAM) for a visual experience.
## Citations and Acknowledgements
We would like to acknowledge the FastSAM authors for their significant contributions in the field of real-time instance segmentation:
!!! quote ""
=== "BibTeX"
```bibtex
@misc{zhao2023fast,
title={Fast Segment Anything},
author={Xu Zhao and Wenchao Ding and Yongqi An and Yinglong Du and Tao Yu and Min Li and Ming Tang and Jinqiao Wang},
year={2023},
eprint={2306.12156},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
The original FastSAM paper can be found on [arXiv](https://arxiv.org/abs/2306.12156). The authors have made their work publicly available, and the codebase can be accessed on [GitHub](https://github.com/CASIA-IVA-Lab/FastSAM). We appreciate their efforts in advancing the field and making their work accessible to the broader community.
## FAQ
### What is FastSAM and how does it differ from SAM?
FastSAM, short for Fast Segment Anything Model, is a real-time [convolutional neural network](https://www.ultralytics.com/glossary/convolutional-neural-network-cnn) (CNN)-based solution designed to reduce computational demands while maintaining high performance in object segmentation tasks. Unlike the Segment Anything Model (SAM), which uses a heavier Transformer-based architecture, FastSAM leverages [Ultralytics YOLOv8-seg](../tasks/segment.md) for efficient instance segmentation in two stages: all-instance segmentation followed by prompt-guided selection.
### How does FastSAM achieve real-time segmentation performance?
FastSAM achieves real-time segmentation by decoupling the segmentation task into all-instance segmentation with YOLOv8-seg and prompt-guided selection stages. By utilizing the computational efficiency of CNNs, FastSAM offers significant reductions in computational and resource demands while maintaining competitive performance. This dual-stage approach enables FastSAM to deliver fast and efficient segmentation suitable for applications requiring quick results.
### What are the practical applications of FastSAM?
FastSAM is practical for a variety of [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) tasks that require real-time segmentation performance. Applications include:
- Industrial automation for quality control and assurance
- Real-time video analysis for security and surveillance
- Autonomous vehicles for object detection and segmentation
- Medical imaging for precise and quick segmentation tasks
Its ability to handle various user interaction prompts makes FastSAM adaptable and flexible for diverse scenarios.
### How do I use the FastSAM model for inference in Python?
To use FastSAM for inference in Python, you can follow the example below:
```python
from ultralytics import FastSAM
# Define an inference source
source = "path/to/bus.jpg"
# Create a FastSAM model
model = FastSAM("FastSAM-s.pt") # or FastSAM-x.pt
# Run inference on an image
everything_results = model(source, device="cpu", retina_masks=True, imgsz=1024, conf=0.4, iou=0.9)
# Run inference with bboxes prompt
results = model(source, bboxes=[439, 437, 524, 709])
# Run inference with points prompt
results = model(source, points=[[200, 200]], labels=[1])
# Run inference with texts prompt
results = model(source, texts="a photo of a dog")
# Run inference with bboxes and points and texts prompt at the same time
results = model(source, bboxes=[439, 437, 524, 709], points=[[200, 200]], labels=[1], texts="a photo of a dog")
```
For more details on inference methods, check the [Predict Usage](#predict-usage) section of the documentation.
### What types of prompts does FastSAM support for segmentation tasks?
FastSAM supports multiple prompt types for guiding the segmentation tasks:
- **Everything Prompt**: Generates segmentation for all visible objects.
- **Bounding Box (BBox) Prompt**: Segments objects within a specified bounding box.
- **Text Prompt**: Uses a descriptive text to segment objects matching the description.
- **Point Prompt**: Segments objects near specific user-defined points.
This flexibility allows FastSAM to adapt to a wide range of user interaction scenarios, enhancing its utility across different applications. For more information on using these prompts, refer to the [Key Features](#key-features) section.
---
comments: true
description: Discover a variety of models supported by Ultralytics, including YOLOv3 to YOLOv10, NAS, SAM, and RT-DETR for detection, segmentation, and more.
keywords: Ultralytics, supported models, YOLOv3, YOLOv4, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, SAM, NAS, RT-DETR, object detection, image segmentation, classification, pose estimation, multi-object tracking
---
# Models Supported by Ultralytics
Welcome to Ultralytics' model documentation! We offer support for a wide range of models, each tailored to specific tasks like [object detection](../tasks/detect.md), [instance segmentation](../tasks/segment.md), [image classification](../tasks/classify.md), [pose estimation](../tasks/pose.md), and [multi-object tracking](../modes/track.md). If you're interested in contributing your model architecture to Ultralytics, check out our [Contributing Guide](../help/contributing.md).
![Ultralytics YOLO11 Comparison Plots](https://raw.githubusercontent.com/ultralytics/assets/refs/heads/main/yolo/performance-comparison.png)
## Featured Models
Here are some of the key models supported:
1. **[YOLOv3](yolov3.md)**: The third iteration of the YOLO model family, originally by Joseph Redmon, known for its efficient real-time object detection capabilities.
2. **[YOLOv4](yolov4.md)**: A darknet-native update to YOLOv3, released by Alexey Bochkovskiy in 2020.
3. **[YOLOv5](yolov5.md)**: An improved version of the YOLO architecture by Ultralytics, offering better performance and speed trade-offs compared to previous versions.
4. **[YOLOv6](yolov6.md)**: Released by [Meituan](https://about.meituan.com/) in 2022, and in use in many of the company's autonomous delivery robots.
5. **[YOLOv7](yolov7.md)**: Updated YOLO models released in 2022 by the authors of YOLOv4.
6. **[YOLOv8](yolov8.md)**: The latest version of the YOLO family, featuring enhanced capabilities such as [instance segmentation](https://www.ultralytics.com/glossary/instance-segmentation), pose/keypoints estimation, and classification.
7. **[YOLOv9](yolov9.md)**: An experimental model trained on the Ultralytics [YOLOv5](yolov5.md) codebase implementing Programmable Gradient Information (PGI).
8. **[YOLOv10](yolov10.md)**: By Tsinghua University, featuring NMS-free training and efficiency-accuracy driven architecture, delivering state-of-the-art performance and latency.
9. **[YOLO11](yolo11.md) 🚀 NEW**: Ultralytics' latest YOLO models delivering state-of-the-art (SOTA) performance across multiple tasks.
10. **[Segment Anything Model (SAM)](sam.md)**: Meta's original Segment Anything Model (SAM).
11. **[Segment Anything Model 2 (SAM2)](sam-2.md)**: The next generation of Meta's Segment Anything Model (SAM) for videos and images.
12. **[Mobile Segment Anything Model (MobileSAM)](mobile-sam.md)**: MobileSAM for mobile applications, by Kyung Hee University.
13. **[Fast Segment Anything Model (FastSAM)](fast-sam.md)**: FastSAM by Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences.
14. **[YOLO-NAS](yolo-nas.md)**: YOLO Neural Architecture Search (NAS) Models.
15. **[Realtime Detection Transformers (RT-DETR)](rtdetr.md)**: Baidu's PaddlePaddle Realtime Detection [Transformer](https://www.ultralytics.com/glossary/transformer) (RT-DETR) models.
16. **[YOLO-World](yolo-world.md)**: Real-time Open Vocabulary Object Detection models from Tencent AI Lab.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/MWq1UxqTClU?si=nHAW-lYDzrz68jR0"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Run Ultralytics YOLO models in just a few lines of code.
</p>
## Getting Started: Usage Examples
This example provides simple YOLO training and inference examples. For full documentation on these and other [modes](../modes/index.md) see the [Predict](../modes/predict.md), [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md) docs pages.
Note the below example is for YOLOv8 [Detect](../tasks/detect.md) models for [object detection](https://www.ultralytics.com/glossary/object-detection). For additional supported tasks see the [Segment](../tasks/segment.md), [Classify](../tasks/classify.md) and [Pose](../tasks/pose.md) docs.
!!! example
=== "Python"
[PyTorch](https://www.ultralytics.com/glossary/pytorch) pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLO()`, `SAM()`, `NAS()` and `RTDETR()` classes to create a model instance in Python:
```python
from ultralytics import YOLO
# Load a COCO-pretrained YOLOv8n model
model = YOLO("yolov8n.pt")
# Display model information (optional)
model.info()
# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
# Run inference with the YOLOv8n model on the 'bus.jpg' image
results = model("path/to/bus.jpg")
```
=== "CLI"
CLI commands are available to directly run the models:
```bash
# Load a COCO-pretrained YOLOv8n model and train it on the COCO8 example dataset for 100 epochs
yolo train model=yolov8n.pt data=coco8.yaml epochs=100 imgsz=640
# Load a COCO-pretrained YOLOv8n model and run inference on the 'bus.jpg' image
yolo predict model=yolov8n.pt source=path/to/bus.jpg
```
## Contributing New Models
Interested in contributing your model to Ultralytics? Great! We're always open to expanding our model portfolio.
1. **Fork the Repository**: Start by forking the [Ultralytics GitHub repository](https://github.com/ultralytics/ultralytics).
2. **Clone Your Fork**: Clone your fork to your local machine and create a new branch to work on.
3. **Implement Your Model**: Add your model following the coding standards and guidelines provided in our [Contributing Guide](../help/contributing.md).
4. **Test Thoroughly**: Make sure to test your model rigorously, both in isolation and as part of the pipeline.
5. **Create a Pull Request**: Once you're satisfied with your model, create a pull request to the main repository for review.
6. **Code Review & Merging**: After review, if your model meets our criteria, it will be merged into the main repository.
For detailed steps, consult our [Contributing Guide](../help/contributing.md).
## FAQ
### What are the key advantages of using Ultralytics YOLOv8 for object detection?
Ultralytics YOLOv8 offers enhanced capabilities such as real-time object detection, instance segmentation, pose estimation, and classification. Its optimized architecture ensures high-speed performance without sacrificing [accuracy](https://www.ultralytics.com/glossary/accuracy), making it ideal for a variety of applications. YOLOv8 also includes built-in compatibility with popular datasets and models, as detailed on the [YOLOv8 documentation page](../models/yolov8.md).
### How can I train a YOLOv8 model on custom data?
Training a YOLOv8 model on custom data can be easily accomplished using Ultralytics' libraries. Here's a quick example:
!!! example
=== "Python"
```python
from ultralytics import YOLO
# Load a YOLOv8n model
model = YOLO("yolov8n.pt")
# Train the model on custom dataset
results = model.train(data="custom_data.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
yolo train model=yolov8n.pt data='custom_data.yaml' epochs=100 imgsz=640
```
For more detailed instructions, visit the [Train](../modes/train.md) documentation page.
### Which YOLO versions are supported by Ultralytics?
Ultralytics supports a comprehensive range of YOLO (You Only Look Once) versions from YOLOv3 to YOLOv10, along with models like NAS, SAM, and RT-DETR. Each version is optimized for various tasks such as detection, segmentation, and classification. For detailed information on each model, refer to the [Models Supported by Ultralytics](../models/index.md) documentation.
### Why should I use Ultralytics HUB for [machine learning](https://www.ultralytics.com/glossary/machine-learning-ml) projects?
Ultralytics HUB provides a no-code, end-to-end platform for training, deploying, and managing YOLO models. It simplifies complex workflows, enabling users to focus on model performance and application. The HUB also offers cloud training capabilities, comprehensive dataset management, and user-friendly interfaces. Learn more about it on the [Ultralytics HUB](../hub/index.md) documentation page.
### What types of tasks can YOLOv8 perform, and how does it compare to other YOLO versions?
YOLOv8 is a versatile model capable of performing tasks including object detection, instance segmentation, classification, and pose estimation. Compared to earlier versions like YOLOv3 and YOLOv4, YOLOv8 offers significant improvements in speed and accuracy due to its optimized architecture. For a deeper comparison, refer to the [YOLOv8 documentation](../models/yolov8.md) and the [Task pages](../tasks/index.md) for more details on specific tasks.
---
comments: true
description: Discover MobileSAM, a lightweight and fast image segmentation model for mobile applications. Compare its performance with the original SAM and explore its various modes.
keywords: MobileSAM, image segmentation, lightweight model, fast segmentation, mobile applications, SAM, ViT encoder, Tiny-ViT, Ultralytics
---
![MobileSAM Logo](https://raw.githubusercontent.com/ChaoningZhang/MobileSAM/master/assets/logo2.png)
# Mobile Segment Anything (MobileSAM)
The MobileSAM paper is now available on [arXiv](https://arxiv.org/pdf/2306.14289.pdf).
A demonstration of MobileSAM running on a CPU can be accessed at this [demo link](https://huggingface.co/spaces/dhkim2810/MobileSAM). On a Mac i5 CPU, inference takes approximately 3 seconds. On the Hugging Face demo, the interface and lower-performance CPUs contribute to a slower response, but it continues to function effectively.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/yXQPLMrNX2s"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> How to Run Inference with MobileSAM using Ultralytics | Step-by-Step Guide 🎉
</p>
MobileSAM is implemented in various projects including [Grounding-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything), [AnyLabeling](https://github.com/vietanhdev/anylabeling), and [Segment Anything in 3D](https://github.com/Jumpat/SegmentAnythingin3D).
MobileSAM is trained on a single GPU with a 100k dataset (1% of the original images) in less than a day. The code for this training will be made available in the future.
## Available Models, Supported Tasks, and Operating Modes
This table presents the available models with their specific pre-trained weights, the tasks they support, and their compatibility with different operating modes like [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md), indicated by ✅ emojis for supported modes and ❌ emojis for unsupported modes.
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
| ---------- | --------------------------------------------------------------------------------------------- | -------------------------------------------- | --------- | ---------- | -------- | ------ |
| MobileSAM | [mobile_sam.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/mobile_sam.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
## Adapting from SAM to MobileSAM
Since MobileSAM retains the same pipeline as the original SAM, we have incorporated the original's pre-processing, post-processing, and all other interfaces. Consequently, those currently using the original SAM can transition to MobileSAM with minimal effort.
MobileSAM performs comparably to the original SAM and retains the same pipeline except for a change in the image encoder. Specifically, we replace the original heavyweight ViT-H encoder (632M) with a smaller Tiny-ViT (5M). On a single GPU, MobileSAM operates at about 12ms per image: 8ms on the image encoder and 4ms on the mask decoder.
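Because only the image encoder changes, moving from SAM to MobileSAM in the Ultralytics API amounts to a weights swap; a minimal sketch:

```python
from ultralytics import SAM

# The same SAM interface accepts either checkpoint; only the weights file changes
model = SAM("mobile_sam.pt")  # instead of e.g. SAM("sam_b.pt")

# Prompted prediction works exactly as with the original SAM
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
```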
The following table provides a comparison of ViT-based image encoders:
| Image Encoder | Original SAM | MobileSAM |
| ------------- | ------------ | --------- |
| Parameters | 611M | 5M |
| Speed | 452ms | 8ms |
Both the original SAM and MobileSAM utilize the same prompt-guided mask decoder:
| Mask Decoder | Original SAM | MobileSAM |
| ------------ | ------------ | --------- |
| Parameters | 3.876M | 3.876M |
| Speed | 4ms | 4ms |
Here is the comparison of the whole pipeline:
| Whole Pipeline (Enc+Dec) | Original SAM | MobileSAM |
| ------------------------ | ------------ | --------- |
| Parameters | 615M | 9.66M |
| Speed | 456ms | 12ms |
The performance of MobileSAM and the original SAM is demonstrated below using both a point and a box as prompts.
![Image with Point as Prompt](https://github.com/ultralytics/docs/releases/download/0/mask-box.avif)
![Image with Box as Prompt](https://github.com/ultralytics/docs/releases/download/0/mask-box.avif)
With its superior performance, MobileSAM is approximately 5 times smaller and 7 times faster than the current FastSAM. More details are available at the [MobileSAM project page](https://github.com/ChaoningZhang/MobileSAM).
## Testing MobileSAM in Ultralytics
Just like the original SAM, we offer a straightforward testing method in Ultralytics, including modes for both Point and Box prompts.
### Model Download
You can download the model [here](https://github.com/ChaoningZhang/MobileSAM/blob/master/weights/mobile_sam.pt).
### Point Prompt
!!! example
=== "Python"
```python
from ultralytics import SAM
# Load the model
model = SAM("mobile_sam.pt")
# Predict a segment based on a single point prompt
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
# Predict multiple segments based on multiple points prompt
model.predict("ultralytics/assets/zidane.jpg", points=[[400, 370], [900, 370]], labels=[1, 1])
# Predict a segment based on multiple points prompt per object
model.predict("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 1]])
# Predict a segment using both positive and negative prompts.
model.predict("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 0]])
```
### Box Prompt
!!! example
=== "Python"
```python
from ultralytics import SAM
# Load the model
model = SAM("mobile_sam.pt")
# Predict a segment based on a single box prompt
model.predict("ultralytics/assets/zidane.jpg", bboxes=[439, 437, 524, 709])
```
We have implemented `MobileSAM` and `SAM` using the same API. For more usage information, please see the [SAM page](sam.md).
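As an illustration of this shared API, the sketch below swaps the original SAM base weights for MobileSAM without changing any other code; both checkpoints are assumed to be available locally or downloaded automatically:
```python
from ultralytics import SAM

# The same SAM class and the same predict call work for both checkpoints;
# only the weights file changes when moving from SAM to MobileSAM
model = SAM("sam_b.pt")  # original SAM base weights
model = SAM("mobile_sam.pt")  # MobileSAM weights

# Identical prompt-based prediction for either model
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
```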
## Citations and Acknowledgements
If you find MobileSAM useful in your research or development work, please consider citing our paper:
!!! quote ""
=== "BibTeX"
```bibtex
@article{mobile_sam,
title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
journal={arXiv preprint arXiv:2306.14289},
year={2023}
}
```
## FAQ
### What is MobileSAM and how does it differ from the original SAM model?
MobileSAM is a lightweight, fast [image segmentation](https://www.ultralytics.com/glossary/image-segmentation) model designed for mobile applications. It retains the same pipeline as the original SAM but replaces the heavyweight ViT-H encoder (632M parameters) with a smaller Tiny-ViT encoder (5M parameters). As a result, MobileSAM runs at about 12ms per image compared to the original SAM's 456ms, and it is approximately 5 times smaller and 7 times faster than FastSAM. You can learn more about the MobileSAM implementation in various projects [here](https://github.com/ChaoningZhang/MobileSAM).
### How can I test MobileSAM using Ultralytics?
Testing MobileSAM in Ultralytics can be accomplished through straightforward methods. You can use Point and Box prompts to predict segments. Here's an example using a Point prompt:
```python
from ultralytics import SAM
# Load the model
model = SAM("mobile_sam.pt")
# Predict a segment based on a point prompt
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
```
You can also refer to the [Testing MobileSAM](#testing-mobilesam-in-ultralytics) section for more details.
### Why should I use MobileSAM for my mobile application?
MobileSAM is ideal for mobile applications due to its lightweight architecture and fast inference speed. It is approximately 5 times smaller and 7 times faster than FastSAM, and far lighter than the original SAM, making it suitable for environments where computational resources are limited. This efficiency allows mobile devices to perform real-time image segmentation without significant latency. Additionally, MobileSAM supports Ultralytics [Inference](../modes/predict.md) mode, making it straightforward to integrate into mobile-focused workflows.
### How was MobileSAM trained, and is the training code available?
MobileSAM was trained on a single GPU with a 100k dataset (about 1% of the original images) in less than a day. While the training code will be made available in the future, you can currently explore other aspects of MobileSAM in the [MobileSAM GitHub repository](https://github.com/ChaoningZhang/MobileSAM). This repository includes pre-trained weights and implementation details for various applications.
### What are the primary use cases for MobileSAM?
MobileSAM is designed for fast and efficient image segmentation in mobile environments. Primary use cases include:
- **Real-time [object detection](https://www.ultralytics.com/glossary/object-detection) and segmentation** for mobile applications.
- **Low-latency image processing** in devices with limited computational resources.
- **Integration in AI-driven mobile apps** for tasks such as augmented reality (AR) and real-time analytics.
For more detailed use cases and performance comparisons, see the section on [Adapting from SAM to MobileSAM](#adapting-from-sam-to-mobilesam).
---
comments: true
description: Explore Baidu's RT-DETR, a Vision Transformer-based real-time object detector offering high accuracy and adaptable inference speed. Learn more with Ultralytics.
keywords: RT-DETR, Baidu, Vision Transformer, real-time object detection, PaddlePaddle, Ultralytics, pre-trained models, AI, machine learning, computer vision
---
# Baidu's RT-DETR: A Vision [Transformer](https://www.ultralytics.com/glossary/transformer)-Based Real-Time Object Detector
## Overview
Real-Time Detection Transformer (RT-DETR), developed by Baidu, is a cutting-edge end-to-end object detector that provides real-time performance while maintaining high [accuracy](https://www.ultralytics.com/glossary/accuracy). It builds on the NMS-free DETR framework while introducing a convolution-based backbone and an efficient hybrid encoder to achieve real-time speed. RT-DETR efficiently processes multiscale features by decoupling intra-scale interaction and cross-scale fusion. The model is highly adaptable, supporting flexible adjustment of inference speed using different decoder layers without retraining. RT-DETR excels on accelerated backends like CUDA with TensorRT, outperforming many other real-time object detectors.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/SArFQs6CHwk"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Real-Time Detection Transformer (RT-DETR)
</p>
![Model example image](https://github.com/ultralytics/docs/releases/download/0/baidu-rtdetr-model-overview.avif) **Overview of Baidu's RT-DETR.** The RT-DETR model architecture diagram shows the last three stages of the backbone {S3, S4, S5} as the input to the encoder. The efficient hybrid encoder transforms multiscale features into a sequence of image features through intrascale feature interaction (AIFI) and cross-scale feature-fusion module (CCFM). The IoU-aware query selection is employed to select a fixed number of image features to serve as initial object queries for the decoder. Finally, the decoder with auxiliary prediction heads iteratively optimizes object queries to generate boxes and confidence scores ([source](https://arxiv.org/pdf/2304.08069.pdf)).
### Key Features
- **Efficient Hybrid Encoder:** Baidu's RT-DETR uses an efficient hybrid encoder that processes multiscale features by decoupling intra-scale interaction and cross-scale fusion. This unique Vision Transformers-based design reduces computational costs and allows for real-time [object detection](https://www.ultralytics.com/glossary/object-detection).
- **IoU-aware Query Selection:** Baidu's RT-DETR improves object query initialization by utilizing IoU-aware query selection. This allows the model to focus on the most relevant objects in the scene, enhancing the detection accuracy.
- **Adaptable Inference Speed:** Baidu's RT-DETR supports flexible adjustments of inference speed by using different decoder layers without the need for retraining. This adaptability facilitates practical application in various real-time object detection scenarios.
## Pre-trained Models
The Ultralytics Python API provides pre-trained PaddlePaddle RT-DETR models with different scales:
- RT-DETR-L: 53.0% AP on COCO val2017, 114 FPS on T4 GPU
- RT-DETR-X: 54.8% AP on COCO val2017, 74 FPS on T4 GPU
## Usage Examples
This example provides simple RT-DETR training and inference examples. For full documentation on these and other [modes](../modes/index.md) see the [Predict](../modes/predict.md), [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md) docs pages.
!!! example
=== "Python"
```python
from ultralytics import RTDETR
# Load a COCO-pretrained RT-DETR-l model
model = RTDETR("rtdetr-l.pt")
# Display model information (optional)
model.info()
# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
# Run inference with the RT-DETR-l model on the 'bus.jpg' image
results = model("path/to/bus.jpg")
```
=== "CLI"
```bash
# Load a COCO-pretrained RT-DETR-l model and train it on the COCO8 example dataset for 100 epochs
yolo train model=rtdetr-l.pt data=coco8.yaml epochs=100 imgsz=640
# Load a COCO-pretrained RT-DETR-l model and run inference on the 'bus.jpg' image
yolo predict model=rtdetr-l.pt source=path/to/bus.jpg
```
## Supported Tasks and Modes
This table presents the model types, the specific pre-trained weights, the tasks supported by each model, and the various modes ([Train](../modes/train.md), [Val](../modes/val.md), [Predict](../modes/predict.md), [Export](../modes/export.md)) that are supported, indicated by ✅ emojis.
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
| ------------------- | ----------------------------------------------------------------------------------------- | -------------------------------------- | --------- | ---------- | -------- | ------ |
| RT-DETR Large | [rtdetr-l.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/rtdetr-l.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
| RT-DETR Extra-Large | [rtdetr-x.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/rtdetr-x.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
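As a quick illustration of the Validation and Export modes listed above, the following sketch validates an RT-DETR-L model and then exports it; ONNX is chosen here purely as an example format:
```python
from ultralytics import RTDETR

# Load a COCO-pretrained RT-DETR-L model
model = RTDETR("rtdetr-l.pt")

# Validate on the COCO8 example dataset
metrics = model.val(data="coco8.yaml")

# Export the model (ONNX used here as an example format)
model.export(format="onnx")
```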
## Citations and Acknowledgements
If you use Baidu's RT-DETR in your research or development work, please cite the [original paper](https://arxiv.org/abs/2304.08069):
!!! quote ""
=== "BibTeX"
```bibtex
@misc{lv2023detrs,
title={DETRs Beat YOLOs on Real-time Object Detection},
author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
year={2023},
eprint={2304.08069},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
We would like to acknowledge Baidu and the [PaddlePaddle](https://github.com/PaddlePaddle/PaddleDetection) team for creating and maintaining this valuable resource for the [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) community. Their contribution to the field with the development of the Vision Transformers-based real-time object detector, RT-DETR, is greatly appreciated.
## FAQ
### What is Baidu's RT-DETR model and how does it work?
Baidu's RT-DETR (Real-Time Detection Transformer) is an advanced real-time object detector built upon the Vision Transformer architecture. It efficiently processes multiscale features by decoupling intra-scale interaction and cross-scale fusion through its efficient hybrid encoder. By employing IoU-aware query selection, the model focuses on the most relevant objects, enhancing detection accuracy. Its adaptable inference speed, achieved by adjusting decoder layers without retraining, makes RT-DETR suitable for various real-time object detection scenarios. Learn more about RT-DETR features [here](https://arxiv.org/pdf/2304.08069.pdf).
### How can I use the pre-trained RT-DETR models provided by Ultralytics?
You can leverage the Ultralytics Python API to use pre-trained PaddlePaddle RT-DETR models. For instance, to load an RT-DETR-L model pretrained on COCO (achieving 53.0% AP on COCO val2017 and 114 FPS on a T4 GPU), you can utilize the following example:
!!! example
=== "Python"
```python
from ultralytics import RTDETR
# Load a COCO-pretrained RT-DETR-l model
model = RTDETR("rtdetr-l.pt")
# Display model information (optional)
model.info()
# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
# Run inference with the RT-DETR-l model on the 'bus.jpg' image
results = model("path/to/bus.jpg")
```
=== "CLI"
```bash
# Load a COCO-pretrained RT-DETR-l model and train it on the COCO8 example dataset for 100 epochs
yolo train model=rtdetr-l.pt data=coco8.yaml epochs=100 imgsz=640
# Load a COCO-pretrained RT-DETR-l model and run inference on the 'bus.jpg' image
yolo predict model=rtdetr-l.pt source=path/to/bus.jpg
```
### Why should I choose Baidu's RT-DETR over other real-time object detectors?
Baidu's RT-DETR stands out due to its efficient hybrid encoder and IoU-aware query selection, which drastically reduce computational costs while maintaining high accuracy. Its unique ability to adjust inference speed by using different decoder layers without retraining adds significant flexibility. This makes it particularly advantageous for applications requiring real-time performance on accelerated backends like CUDA with TensorRT, outclassing many other real-time object detectors.
### How does RT-DETR support adaptable inference speed for different real-time applications?
Baidu's RT-DETR allows flexible adjustments of inference speed by using different decoder layers without requiring retraining. This adaptability is crucial for scaling performance across various real-time object detection tasks. Whether you need faster processing for lower [precision](https://www.ultralytics.com/glossary/precision) needs or slower, more accurate detections, RT-DETR can be tailored to meet your specific requirements.
### Can I use RT-DETR models with other Ultralytics modes, such as training, validation, and export?
Yes, RT-DETR models are compatible with various Ultralytics modes including training, validation, prediction, and export. You can refer to the respective documentation for detailed instructions on how to utilize these modes: [Train](../modes/train.md), [Val](../modes/val.md), [Predict](../modes/predict.md), and [Export](../modes/export.md). This ensures a comprehensive workflow for developing and deploying your object detection solutions.
---
comments: true
description: Discover SAM 2, the next generation of Meta's Segment Anything Model, supporting real-time promptable segmentation in both images and videos with state-of-the-art performance. Learn about its key features, datasets, and how to use it.
keywords: SAM 2, SAM 2.1, Segment Anything, video segmentation, image segmentation, promptable segmentation, zero-shot performance, SA-V dataset, Ultralytics, real-time segmentation, AI, machine learning
---
!!! tip "SAM 2.1"
We now support the more accurate SAM 2.1 model. Please give it a try!
# SAM 2: Segment Anything Model 2
SAM 2, the successor to Meta's [Segment Anything Model (SAM)](sam.md), is a cutting-edge tool designed for comprehensive object segmentation in both images and videos. It excels in handling complex visual data through a unified, promptable model architecture that supports real-time processing and zero-shot generalization.
![SAM 2 Example Results](https://github.com/ultralytics/docs/releases/download/0/sa-v-dataset.avif)
## Key Features
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/yXQPLMrNX2s"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> How to Run Inference with Meta's SAM2 using Ultralytics | Step-by-Step Guide 🎉
</p>
### Unified Model Architecture
SAM 2 combines the capabilities of image and video segmentation in a single model. This unification simplifies deployment and allows for consistent performance across different media types. It leverages a flexible prompt-based interface, enabling users to specify objects of interest through various prompt types, such as points, bounding boxes, or masks.
### Real-Time Performance
The model achieves real-time inference speeds, processing approximately 44 frames per second. This makes SAM 2 suitable for applications requiring immediate feedback, such as video editing and augmented reality.
### Zero-Shot Generalization
SAM 2 can segment objects it has never encountered before, demonstrating strong zero-shot generalization. This is particularly useful in diverse or evolving visual domains where pre-defined categories may not cover all possible objects.
### Interactive Refinement
Users can iteratively refine the segmentation results by providing additional prompts, allowing for precise control over the output. This interactivity is essential for fine-tuning results in applications like video annotation or medical imaging.
### Advanced Handling of Visual Challenges
SAM 2 includes mechanisms to manage common video segmentation challenges, such as object occlusion and reappearance. It uses a sophisticated memory mechanism to keep track of objects across frames, ensuring continuity even when objects are temporarily obscured or exit and re-enter the scene.
For a deeper understanding of SAM 2's architecture and capabilities, explore the [SAM 2 research paper](https://arxiv.org/abs/2401.12741).
## Performance and Technical Details
SAM 2 sets a new benchmark in the field, outperforming previous models on various metrics:
| Metric | SAM 2 | Previous SOTA |
| ------------------------------------------------------------------------------------------ | ------------- | ------------- |
| **Interactive Video Segmentation** | **Best** | - |
| **Human Interactions Required** | **3x fewer** | Baseline |
| **[Image Segmentation](https://www.ultralytics.com/glossary/image-segmentation) Accuracy** | **Improved** | SAM |
| **Inference Speed** | **6x faster** | SAM |
## Model Architecture
### Core Components
- **Image and Video Encoder**: Utilizes a [transformer](https://www.ultralytics.com/glossary/transformer)-based architecture to extract high-level features from both images and video frames. This component is responsible for understanding the visual content at each timestep.
- **Prompt Encoder**: Processes user-provided prompts (points, boxes, masks) to guide the segmentation task. This allows SAM 2 to adapt to user input and target specific objects within a scene.
- **Memory Mechanism**: Includes a memory encoder, memory bank, and memory attention module. These components collectively store and utilize information from past frames, enabling the model to maintain consistent object tracking over time.
- **Mask Decoder**: Generates the final segmentation masks based on the encoded image features and prompts. In video, it also uses memory context to ensure accurate tracking across frames.
![SAM 2 Architecture Diagram](https://raw.githubusercontent.com/facebookresearch/sam2/refs/heads/main/assets/model_diagram.png)
### Memory Mechanism and Occlusion Handling
The memory mechanism allows SAM 2 to handle temporal dependencies and occlusions in video data. As objects move and interact, SAM 2 records their features in a memory bank. When an object becomes occluded, the model can rely on this memory to predict its position and appearance when it reappears. The occlusion head specifically handles scenarios where objects are not visible, predicting the likelihood of an object being occluded.
### Multi-Mask Ambiguity Resolution
In situations with ambiguity (e.g., overlapping objects), SAM 2 can generate multiple mask predictions. This feature is crucial for accurately representing complex scenes where a single mask might not sufficiently describe the scene's nuances.
## SA-V Dataset
The SA-V dataset, developed for SAM 2's training, is one of the largest and most diverse video segmentation datasets available. It includes:
- **51,000+ Videos**: Captured across 47 countries, providing a wide range of real-world scenarios.
- **600,000+ Mask Annotations**: Detailed spatio-temporal mask annotations, referred to as "masklets," covering whole objects and parts.
- **Dataset Scale**: It features 4.5 times more videos and 53 times more annotations than the previous largest video segmentation datasets, offering unprecedented diversity and complexity.
## Benchmarks
### Video Object Segmentation
SAM 2 has demonstrated superior performance across major video segmentation benchmarks:
| Dataset | J&F | J | F |
| --------------- | ---- | ---- | ---- |
| **DAVIS 2017** | 82.5 | 79.8 | 85.2 |
| **YouTube-VOS** | 81.2 | 78.9 | 83.5 |
### Interactive Segmentation
In interactive segmentation tasks, SAM 2 shows significant efficiency and accuracy:
| Dataset | NoC@90 | AUC |
| --------------------- | ------ | ----- |
| **DAVIS Interactive** | 1.54 | 0.872 |
## Installation
To install SAM 2, use the following command. All SAM 2 models will automatically download on first use.
```bash
pip install ultralytics
```
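To confirm the installation, you can load a SAM 2.1 checkpoint and print its summary; as noted above, the weights download automatically on first use:
```python
from ultralytics import SAM

# Loading a SAM 2.1 checkpoint triggers an automatic download on first use
model = SAM("sam2.1_b.pt")

# Print a model summary to confirm the weights loaded correctly
model.info()
```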
## How to Use SAM 2: Versatility in Image and Video Segmentation
The following table details the available SAM 2 models, their pre-trained weights, supported tasks, and compatibility with different operating modes like [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md).
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
| ------------- | ----------------------------------------------------------------------------------------- | -------------------------------------------- | --------- | ---------- | -------- | ------ |
| SAM 2 tiny | [sam2_t.pt](https://github.com/ultralytics/assets/releases/download/v8.3.0/sam2_t.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
| SAM 2 small | [sam2_s.pt](https://github.com/ultralytics/assets/releases/download/v8.3.0/sam2_s.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
| SAM 2 base | [sam2_b.pt](https://github.com/ultralytics/assets/releases/download/v8.3.0/sam2_b.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
| SAM 2 large | [sam2_l.pt](https://github.com/ultralytics/assets/releases/download/v8.3.0/sam2_l.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
| SAM 2.1 tiny | [sam2.1_t.pt](https://github.com/ultralytics/assets/releases/download/v8.3.0/sam2.1_t.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
| SAM 2.1 small | [sam2.1_s.pt](https://github.com/ultralytics/assets/releases/download/v8.3.0/sam2.1_s.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
| SAM 2.1 base | [sam2.1_b.pt](https://github.com/ultralytics/assets/releases/download/v8.3.0/sam2.1_b.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
| SAM 2.1 large | [sam2.1_l.pt](https://github.com/ultralytics/assets/releases/download/v8.3.0/sam2.1_l.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
### SAM 2 Prediction Examples
SAM 2 can be utilized across a broad spectrum of tasks, including real-time video editing, medical imaging, and autonomous systems. Its ability to segment both static and dynamic visual data makes it a versatile tool for researchers and developers.
#### Segment with Prompts
!!! example "Segment with Prompts"
Use prompts to segment specific objects in images or videos.
=== "Python"
```python
from ultralytics import SAM
# Load a model
model = SAM("sam2.1_b.pt")
# Display model information (optional)
model.info()
# Run inference with bboxes prompt
results = model("path/to/image.jpg", bboxes=[100, 100, 200, 200])
# Run inference with single point
results = model(points=[900, 370], labels=[1])
# Run inference with multiple points
results = model(points=[[400, 370], [900, 370]], labels=[1, 1])
# Run inference with multiple points prompt per object
results = model(points=[[[400, 370], [900, 370]]], labels=[[1, 1]])
# Run inference with negative points prompt
results = model(points=[[[400, 370], [900, 370]]], labels=[[1, 0]])
```
#### Segment Everything
!!! example "Segment Everything"
Segment the entire image or video content without specific prompts.
=== "Python"
```python
from ultralytics import SAM
# Load a model
model = SAM("sam2.1_b.pt")
# Display model information (optional)
model.info()
# Run inference
model("path/to/video.mp4")
```
=== "CLI"
```bash
# Run inference with a SAM 2 model
yolo predict model=sam2.1_b.pt source=path/to/video.mp4
```
#### Segment Video and Track objects
!!! example "Segment Video"
Segment the entire video content with specific prompts and track objects.
=== "Python"
```python
from ultralytics.models.sam import SAM2VideoPredictor
# Create SAM2VideoPredictor
overrides = dict(conf=0.25, task="segment", mode="predict", imgsz=1024, model="sam2_b.pt")
predictor = SAM2VideoPredictor(overrides=overrides)
# Run inference with single point
results = predictor(source="test.mp4", points=[920, 470], labels=1)
# Run inference with multiple points
results = predictor(source="test.mp4", points=[[920, 470], [909, 138]], labels=[1, 1])
# Run inference with multiple points prompt per object
results = predictor(source="test.mp4", points=[[[920, 470], [909, 138]]], labels=[[1, 1]])
# Run inference with negative points prompt
results = predictor(source="test.mp4", points=[[[920, 470], [909, 138]]], labels=[[1, 0]])
```
- This example demonstrates how SAM 2 can be used to segment the entire content of an image or video if no prompts (bboxes/points/masks) are provided.
## SAM 2 comparison vs YOLOv8
Here we compare Meta's smallest SAM 2 model, SAM2-t, with Ultralytics' smallest segmentation model, [YOLOv8n-seg](../tasks/segment.md):
| Model | Size<br><sup>(MB)</sup> | Parameters<br><sup>(M)</sup> | Speed (CPU)<br><sup>(ms/im)</sup> |
| ---------------------------------------------- | ----------------------- | ---------------------------- | --------------------------------- |
| [Meta SAM-b](sam.md) | 375 | 93.7 | 161440 |
| Meta SAM2-b | 162 | 80.8 | 121923 |
| Meta SAM2-t | 78.1 | 38.9 | 85155 |
| [MobileSAM](mobile-sam.md) | 40.7 | 10.1 | 98543 |
| [FastSAM-s](fast-sam.md) with YOLOv8 backbone | 23.7 | 11.8 | 140 |
| Ultralytics [YOLOv8n-seg](../tasks/segment.md) | **6.7** (11.7x smaller) | **3.4** (11.4x less) | **79.5** (1071x faster) |
This comparison shows the order-of-magnitude differences in model size and speed. Whereas SAM presents unique capabilities for automatic segmentation, it is not a direct competitor to YOLOv8 segment models, which are smaller, faster, and more efficient.
Tests were run on a 2023 Apple M2 MacBook with 16GB of RAM using `torch==2.3.1` and `ultralytics==8.3.82`. To reproduce this test:
!!! example
=== "Python"
```python
from ultralytics import ASSETS, SAM, YOLO, FastSAM
# Profile SAM2-t, SAM2-b, SAM-b, MobileSAM
for file in ["sam_b.pt", "sam2_b.pt", "sam2_t.pt", "mobile_sam.pt"]:
model = SAM(file)
model.info()
model(ASSETS)
# Profile FastSAM-s
model = FastSAM("FastSAM-s.pt")
model.info()
model(ASSETS)
# Profile YOLOv8n-seg
model = YOLO("yolov8n-seg.pt")
model.info()
model(ASSETS)
```
## Auto-Annotation: Efficient Dataset Creation
Auto-annotation is a powerful feature of SAM 2, enabling users to generate segmentation datasets quickly and accurately by leveraging pre-trained models. This capability is particularly useful for creating large, high-quality datasets without extensive manual effort.
### How to Auto-Annotate with SAM 2
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/M7xWw4Iodhg"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> Auto Annotation with Meta's Segment Anything 2 Model using Ultralytics | Data Labeling
</p>
To auto-annotate your dataset using SAM 2, follow this example:
!!! example "Auto-Annotation Example"
```python
from ultralytics.data.annotator import auto_annotate
auto_annotate(data="path/to/images", det_model="yolo11x.pt", sam_model="sam2_b.pt")
```
| Argument | Type | Description | Default |
| ------------ | ----------------------- | ------------------------------------------------------------------------------------------------------- | -------------- |
| `data` | `str` | Path to a folder containing images to be annotated. | |
| `det_model` | `str`, optional | Pre-trained YOLO detection model. Defaults to 'yolo11x.pt'. | `'yolo11x.pt'` |
| `sam_model` | `str`, optional | Pre-trained SAM 2 segmentation model. Defaults to 'sam2_b.pt'. | `'sam2_b.pt'` |
| `device` | `str`, optional | Device to run the models on. Defaults to an empty string (CPU or GPU, if available). | |
| `conf` | `float`, optional | Confidence threshold for detection model; default is 0.25. | `0.25` |
| `iou` | `float`, optional | IoU threshold for filtering overlapping boxes in detection results; default is 0.45. | `0.45` |
| `imgsz` | `int`, optional | Input image resize dimension; default is 640. | `640` |
| `max_det` | `int`, optional | Limits detections per image to control outputs in dense scenes. | `300` |
| `classes` | `list`, optional | Filters predictions to specified class IDs, returning only relevant detections. | `None` |
| `output_dir` | `str`, `None`, optional | Directory to save the annotated results. Defaults to a 'labels' folder in the same directory as 'data'. | `None` |
This function facilitates the rapid creation of high-quality segmentation datasets, ideal for researchers and developers aiming to accelerate their projects.
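The optional arguments from the table above can be combined to tune an annotation run. The sketch below is illustrative; the paths are placeholders, and the device, threshold, and class values are arbitrary example choices:
```python
from ultralytics.data.annotator import auto_annotate

# Annotate only class IDs 0 and 2 on the first GPU, with a stricter confidence
# threshold, and write label files to a custom directory (paths are placeholders)
auto_annotate(
    data="path/to/images",
    det_model="yolo11x.pt",
    sam_model="sam2_b.pt",
    device="0",
    conf=0.4,
    classes=[0, 2],
    output_dir="path/to/labels",
)
```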
## Limitations
Despite its strengths, SAM 2 has certain limitations:
- **Tracking Stability**: SAM 2 may lose track of objects during extended sequences or significant viewpoint changes.
- **Object Confusion**: The model can sometimes confuse similar-looking objects, particularly in crowded scenes.
- **Efficiency with Multiple Objects**: Segmentation efficiency decreases when processing multiple objects simultaneously due to the lack of inter-object communication.
- **Detail [Accuracy](https://www.ultralytics.com/glossary/accuracy)**: May miss fine details, especially with fast-moving objects. Additional prompts can partially address this issue, but temporal smoothness is not guaranteed.
## Citations and Acknowledgements
If SAM 2 is a crucial part of your research or development work, please cite it using the following reference:
!!! quote ""
=== "BibTeX"
```bibtex
@article{ravi2024sam2,
title={SAM 2: Segment Anything in Images and Videos},
author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
journal={arXiv preprint},
year={2024}
}
```
We extend our gratitude to Meta AI for their contributions to the AI community with this groundbreaking model and dataset.
## FAQ
### What is SAM 2 and how does it improve upon the original Segment Anything Model (SAM)?
SAM 2, the successor to Meta's [Segment Anything Model (SAM)](sam.md), is a cutting-edge tool designed for comprehensive object segmentation in both images and videos. It excels in handling complex visual data through a unified, promptable model architecture that supports real-time processing and zero-shot generalization. SAM 2 offers several improvements over the original SAM, including:
- **Unified Model Architecture**: Combines image and video segmentation capabilities in a single model.
- **Real-Time Performance**: Processes approximately 44 frames per second, making it suitable for applications requiring immediate feedback.
- **Zero-Shot Generalization**: Segments objects it has never encountered before, useful in diverse visual domains.
- **Interactive Refinement**: Allows users to iteratively refine segmentation results by providing additional prompts.
- **Advanced Handling of Visual Challenges**: Manages common video segmentation challenges like object occlusion and reappearance.
For more details on SAM 2's architecture and capabilities, explore the [SAM 2 research paper](https://arxiv.org/abs/2401.12741).
### How can I use SAM 2 for real-time video segmentation?
SAM 2 can be utilized for real-time video segmentation by leveraging its promptable interface and real-time inference capabilities. Here's a basic example:
!!! example "Segment with Prompts"
Use prompts to segment specific objects in images or videos.
=== "Python"
```python
from ultralytics import SAM
# Load a model
model = SAM("sam2_b.pt")
# Display model information (optional)
model.info()
# Segment with bounding box prompt
results = model("path/to/image.jpg", bboxes=[100, 100, 200, 200])
# Segment with point prompt
results = model("path/to/image.jpg", points=[150, 150], labels=[1])
```
For more comprehensive usage, refer to the [How to Use SAM 2](#how-to-use-sam-2-versatility-in-image-and-video-segmentation) section.
### What datasets are used to train SAM 2, and how do they enhance its performance?
SAM 2 is trained on the SA-V dataset, one of the largest and most diverse video segmentation datasets available. The SA-V dataset includes:
- **51,000+ Videos**: Captured across 47 countries, providing a wide range of real-world scenarios.
- **600,000+ Mask Annotations**: Detailed spatio-temporal mask annotations, referred to as "masklets," covering whole objects and parts.
- **Dataset Scale**: Features 4.5 times more videos and 53 times more annotations than the previous largest video segmentation datasets, offering unprecedented diversity and complexity.
This extensive dataset allows SAM 2 to achieve superior performance across major video segmentation benchmarks and enhances its zero-shot generalization capabilities. For more information, see the [SA-V Dataset](#sa-v-dataset) section.
### How does SAM 2 handle occlusions and object reappearances in video segmentation?
SAM 2 includes a sophisticated memory mechanism to manage temporal dependencies and occlusions in video data. The memory mechanism consists of:
- **Memory Encoder and Memory Bank**: Stores features from past frames.
- **Memory Attention Module**: Utilizes stored information to maintain consistent object tracking over time.
- **Occlusion Head**: Specifically handles scenarios where objects are not visible, predicting the likelihood of an object being occluded.
This mechanism ensures continuity even when objects are temporarily obscured or exit and re-enter the scene. For more details, refer to the [Memory Mechanism and Occlusion Handling](#memory-mechanism-and-occlusion-handling) section.
### How does SAM 2 compare to other segmentation models like YOLOv8?
SAM 2 and Ultralytics YOLOv8 serve different purposes and excel in different areas. While SAM 2 is designed for comprehensive object segmentation with advanced features like zero-shot generalization and real-time performance, YOLOv8 is optimized for speed and efficiency in [object detection](https://www.ultralytics.com/glossary/object-detection) and segmentation tasks. Here's a comparison:
| Model | Size<br><sup>(MB)</sup> | Parameters<br><sup>(M)</sup> | Speed (CPU)<br><sup>(ms/im)</sup> |
| ---------------------------------------------- | ----------------------- | ---------------------------- | --------------------------------- |
| [Meta SAM-b](sam.md) | 375 | 93.7 | 161440 |
| Meta SAM2-b | 162 | 80.8 | 121923 |
| Meta SAM2-t | 78.1 | 38.9 | 85155 |
| [MobileSAM](mobile-sam.md) | 40.7 | 10.1 | 98543 |
| [FastSAM-s](fast-sam.md) with YOLOv8 backbone | 23.7 | 11.8 | 140 |
| Ultralytics [YOLOv8n-seg](../tasks/segment.md) | **6.7** (11.7x smaller) | **3.4** (11.4x less) | **79.5** (1071x faster) |
For more details, see the [SAM 2 comparison vs YOLOv8](#sam-2-comparison-vs-yolov8) section.
---
comments: true
description: Explore the revolutionary Segment Anything Model (SAM) for promptable image segmentation with zero-shot performance. Discover key features, datasets, and usage tips.
keywords: Segment Anything, SAM, image segmentation, promptable segmentation, zero-shot performance, SA-1B dataset, advanced architecture, auto-annotation, Ultralytics, pre-trained models, instance segmentation, computer vision, AI, machine learning
---
# Segment Anything Model (SAM)
Welcome to the frontier of [image segmentation](https://www.ultralytics.com/glossary/image-segmentation) with the Segment Anything Model, or SAM. This revolutionary model has changed the game by introducing promptable image segmentation with real-time performance, setting new standards in the field.
## Introduction to SAM: The Segment Anything Model
The Segment Anything Model, or SAM, is a cutting-edge image segmentation model that allows for promptable segmentation, providing unparalleled versatility in image analysis tasks. SAM forms the heart of the Segment Anything initiative, a groundbreaking project that introduces a novel model, task, and dataset for image segmentation.
SAM's advanced design allows it to adapt to new image distributions and tasks without prior knowledge, a feature known as zero-shot transfer. Trained on the expansive [SA-1B dataset](https://ai.facebook.com/datasets/segment-anything/), which contains more than 1 billion masks spread over 11 million carefully curated images, SAM has displayed impressive zero-shot performance, surpassing previous fully supervised results in many cases.
![Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/sa-1b-dataset-sample.avif) **SA-1B Example images.** Dataset images overlaid with masks from the newly introduced SA-1B dataset. SA-1B contains 11M diverse, high-resolution, licensed, and privacy-protecting images and 1.1B high-quality segmentation masks. These masks were annotated fully automatically by SAM, and as verified by human ratings and numerous experiments, they are of high quality and diversity. Images are grouped by number of masks per image for visualization (there are ∼100 masks per image on average).
## Key Features of the Segment Anything Model (SAM)
- **Promptable Segmentation Task:** SAM was designed with a promptable segmentation task in mind, allowing it to generate valid segmentation masks from any given prompt, such as spatial or text clues identifying an object.
- **Advanced Architecture:** The Segment Anything Model employs a powerful image encoder, a prompt encoder, and a lightweight mask decoder. This unique architecture enables flexible prompting, real-time mask computation, and ambiguity awareness in segmentation tasks.
- **The SA-1B Dataset:** Introduced by the Segment Anything project, the SA-1B dataset features over 1 billion masks on 11 million images. As the largest segmentation dataset to date, it provides SAM with a diverse and large-scale training data source.
- **Zero-Shot Performance:** SAM displays outstanding zero-shot performance across various segmentation tasks, making it a ready-to-use tool for diverse applications with minimal need for [prompt engineering](https://www.ultralytics.com/glossary/prompt-engineering).
For an in-depth look at the Segment Anything Model and the SA-1B dataset, please visit the [Segment Anything website](https://segment-anything.com/) and check out the research paper [Segment Anything](https://arxiv.org/abs/2304.02643).
## Available Models, Supported Tasks, and Operating Modes
This table presents the available models with their specific pre-trained weights, the tasks they support, and their compatibility with different operating modes like [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md), indicated by ✅ emojis for supported modes and ❌ emojis for unsupported modes.
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
| ---------- | ----------------------------------------------------------------------------------- | -------------------------------------------- | --------- | ---------- | -------- | ------ |
| SAM base | [sam_b.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/sam_b.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
| SAM large | [sam_l.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/sam_l.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
## How to Use SAM: Versatility and Power in Image Segmentation
The Segment Anything Model can be employed for a multitude of downstream tasks that go beyond its training data. This includes edge detection, object proposal generation, [instance segmentation](https://www.ultralytics.com/glossary/instance-segmentation), and preliminary text-to-mask prediction. With prompt engineering, SAM can swiftly adapt to new tasks and data distributions in a zero-shot manner, establishing it as a versatile and potent tool for all your image segmentation needs.
### SAM prediction example
!!! example "Segment with prompts"
Segment image with given prompts.
=== "Python"
```python
from ultralytics import SAM
# Load a model
model = SAM("sam_b.pt")
# Display model information (optional)
model.info()
# Run inference with bboxes prompt
results = model("ultralytics/assets/zidane.jpg", bboxes=[439, 437, 524, 709])
# Run inference with single point
results = model(points=[900, 370], labels=[1])
# Run inference with multiple points
results = model(points=[[400, 370], [900, 370]], labels=[1, 1])
# Run inference with multiple points prompt per object
results = model(points=[[[400, 370], [900, 370]]], labels=[[1, 1]])
# Run inference with negative points prompt
results = model(points=[[[400, 370], [900, 370]]], labels=[[1, 0]])
```
!!! example "Segment everything"
Segment the whole image.
=== "Python"
```python
from ultralytics import SAM
# Load a model
model = SAM("sam_b.pt")
# Display model information (optional)
model.info()
# Run inference
model("path/to/image.jpg")
```
=== "CLI"
```bash
# Run inference with a SAM model
yolo predict model=sam_b.pt source=path/to/image.jpg
```
- The logic here is to segment the whole image if you don't pass any prompts (bboxes/points/masks).
!!! example "SAMPredictor example"
This way you can set the image once and run prompt inference multiple times without re-running the image encoder.
=== "Prompt inference"
```python
import cv2

from ultralytics.models.sam import Predictor as SAMPredictor
# Create SAMPredictor
overrides = dict(conf=0.25, task="segment", mode="predict", imgsz=1024, model="mobile_sam.pt")
predictor = SAMPredictor(overrides=overrides)
# Set image
predictor.set_image("ultralytics/assets/zidane.jpg") # set with image file
predictor.set_image(cv2.imread("ultralytics/assets/zidane.jpg")) # set with np.ndarray
results = predictor(bboxes=[439, 437, 524, 709])
# Run inference with single point prompt
results = predictor(points=[900, 370], labels=[1])
# Run inference with multiple points prompt
results = predictor(points=[[400, 370], [900, 370]], labels=[[1, 1]])
# Run inference with negative points prompt
results = predictor(points=[[[400, 370], [900, 370]]], labels=[[1, 0]])
# Reset image
predictor.reset_image()
```
Segment everything with additional args.
=== "Segment everything"
```python
from ultralytics.models.sam import Predictor as SAMPredictor
# Create SAMPredictor
overrides = dict(conf=0.25, task="segment", mode="predict", imgsz=1024, model="mobile_sam.pt")
predictor = SAMPredictor(overrides=overrides)
# Segment with additional args
results = predictor(source="ultralytics/assets/zidane.jpg", crop_n_layers=1, points_stride=64)
```
!!! note
All the returned `results` in the above examples are [Results](../modes/predict.md#working-with-results) objects, which allow easy access to the predicted masks and the source image.
- For more arguments available to `Segment everything`, see the [`Predictor/generate` Reference](../reference/models/sam/predict.md).
## SAM comparison vs YOLOv8
Here we compare Meta's smallest SAM model, SAM-b, with Ultralytics' smallest segmentation model, [YOLOv8n-seg](../tasks/segment.md):
| Model | Size<br><sup>(MB)</sup> | Parameters<br><sup>(M)</sup> | Speed (CPU)<br><sup>(ms/im)</sup> |
| ---------------------------------------------- | ----------------------- | ---------------------------- | --------------------------------- |
| Meta SAM-b | 358 | 94.7 | 51096 |
| [MobileSAM](mobile-sam.md) | 40.7 | 10.1 | 46122 |
| [FastSAM-s](fast-sam.md) with YOLOv8 backbone | 23.7 | 11.8 | 115 |
| Ultralytics [YOLOv8n-seg](../tasks/segment.md) | **6.7** (53.4x smaller) | **3.4** (27.9x less) | **59** (866x faster) |
This comparison shows the order-of-magnitude differences in model size and speed. Whereas SAM presents unique capabilities for automatic segmentation, it is not a direct competitor to YOLOv8 segment models, which are smaller, faster, and more efficient.
Tests were run on a 2023 Apple M2 MacBook with 16GB of RAM. To reproduce this test:
!!! example
=== "Python"
```python
from ultralytics import ASSETS, SAM, YOLO, FastSAM
# Profile SAM-b, MobileSAM
for file in ["sam_b.pt", "mobile_sam.pt"]:
model = SAM(file)
model.info()
model(ASSETS)
# Profile FastSAM-s
model = FastSAM("FastSAM-s.pt")
model.info()
model(ASSETS)
# Profile YOLOv8n-seg
model = YOLO("yolov8n-seg.pt")
model.info()
model(ASSETS)
```
## Auto-Annotation: A Quick Path to Segmentation Datasets
Auto-annotation is a key feature of SAM, allowing users to generate a [segmentation dataset](../datasets/segment/index.md) using a pre-trained detection model. This feature enables rapid and accurate annotation of a large number of images, bypassing the need for time-consuming manual labeling.
### Generate Your Segmentation Dataset Using a Detection Model
To auto-annotate your dataset with the Ultralytics framework, use the `auto_annotate` function as shown below:
!!! example
=== "Python"
```python
from ultralytics.data.annotator import auto_annotate
auto_annotate(data="path/to/images", det_model="yolo11x.pt", sam_model="sam_b.pt")
```
| Argument | Type | Description | Default |
| ------------ | --------------------- | ------------------------------------------------------------------------------------------------------- | -------------- |
| `data` | `str` | Path to a folder containing images to be annotated. | |
| `det_model` | `str`, optional | Pre-trained YOLO detection model. Defaults to 'yolo11x.pt'. | `'yolo11x.pt'` |
| `sam_model` | `str`, optional | Pre-trained SAM segmentation model. Defaults to 'sam_b.pt'. | `'sam_b.pt'` |
| `device` | `str`, optional | Device to run the models on. Defaults to an empty string (CPU or GPU, if available). | |
| `conf` | `float`, optional | Confidence threshold for detection model; default is 0.25. | `0.25` |
| `iou` | `float`, optional | IoU threshold for filtering overlapping boxes in detection results; default is 0.45. | `0.45` |
| `imgsz` | `int`, optional | Input image resize dimension; default is 640. | `640` |
| `max_det` | `int`, optional | Limits detections per image to control outputs in dense scenes. | `300` |
| `classes` | `list`, optional | Filters predictions to specified class IDs, returning only relevant detections. | `None` |
| `output_dir` | `str`, None, optional | Directory to save the annotated results. Defaults to a 'labels' folder in the same directory as 'data'. | `None` |
The `auto_annotate` function takes the path to your images, with optional arguments for specifying the pre-trained detection and SAM segmentation models, the device to run the models on, and the output directory for saving the annotated results.
Auto-annotation with pre-trained models can dramatically cut down the time and effort required for creating high-quality segmentation datasets. This feature is especially beneficial for researchers and developers dealing with large image collections, as it allows them to focus on model development and evaluation rather than manual annotation.
## Citations and Acknowledgements
If you find SAM useful in your research or development work, please consider citing our paper:
!!! quote ""
=== "BibTeX"
```bibtex
@misc{kirillov2023segment,
title={Segment Anything},
author={Alexander Kirillov and Eric Mintun and Nikhila Ravi and Hanzi Mao and Chloe Rolland and Laura Gustafson and Tete Xiao and Spencer Whitehead and Alexander C. Berg and Wan-Yen Lo and Piotr Dollár and Ross Girshick},
year={2023},
eprint={2304.02643},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
We would like to express our gratitude to Meta AI for creating and maintaining this valuable resource for the [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) community.
## FAQ
### What is the Segment Anything Model (SAM) by Ultralytics?
The Segment Anything Model (SAM) by Ultralytics is a revolutionary image segmentation model designed for promptable segmentation tasks. It leverages advanced architecture, including image and prompt encoders combined with a lightweight mask decoder, to generate high-quality segmentation masks from various prompts such as spatial or text cues. Trained on the expansive [SA-1B dataset](https://ai.facebook.com/datasets/segment-anything/), SAM excels in zero-shot performance, adapting to new image distributions and tasks without prior knowledge. Learn more [here](#introduction-to-sam-the-segment-anything-model).
### How can I use the Segment Anything Model (SAM) for image segmentation?
You can use the Segment Anything Model (SAM) for image segmentation by running inference with various prompts such as bounding boxes or points. Here's an example using Python:
```python
from ultralytics import SAM
# Load a model
model = SAM("sam_b.pt")
# Segment with bounding box prompt
model("ultralytics/assets/zidane.jpg", bboxes=[439, 437, 524, 709])
# Segment with points prompt
model("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
# Segment with multiple points prompt
model("ultralytics/assets/zidane.jpg", points=[[400, 370], [900, 370]], labels=[[1, 1]])
# Segment with multiple points prompt per object
model("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 1]])
# Segment with negative points prompt.
model("ultralytics/assets/zidane.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 0]])
```
Alternatively, you can run inference with SAM in the command line interface (CLI):
```bash
yolo predict model=sam_b.pt source=path/to/image.jpg
```
For more detailed usage instructions, visit the [Segmentation section](#sam-prediction-example).
### How do SAM and YOLOv8 compare in terms of performance?
Compared to YOLOv8, SAM models like SAM-b and FastSAM-s are larger and slower but offer unique capabilities for automatic segmentation. For instance, Ultralytics [YOLOv8n-seg](../tasks/segment.md) is 53.4 times smaller and 866 times faster than SAM-b. However, SAM's zero-shot performance makes it highly flexible and efficient in diverse, untrained tasks. Learn more about performance comparisons between SAM and YOLOv8 [here](#sam-comparison-vs-yolov8).
### How can I auto-annotate my dataset using SAM?
Ultralytics' SAM offers an auto-annotation feature that allows generating segmentation datasets using a pre-trained detection model. Here's an example in Python:
```python
from ultralytics.data.annotator import auto_annotate
auto_annotate(data="path/to/images", det_model="yolov8x.pt", sam_model="sam_b.pt")
```
This function takes the path to your images and optional arguments for pre-trained detection and SAM segmentation models, along with device and output directory specifications. For a complete guide, see [Auto-Annotation](#auto-annotation-a-quick-path-to-segmentation-datasets).
### What datasets are used to train the Segment Anything Model (SAM)?
SAM is trained on the extensive [SA-1B dataset](https://ai.facebook.com/datasets/segment-anything/) which comprises over 1 billion masks across 11 million images. SA-1B is the largest segmentation dataset to date, providing high-quality and diverse [training data](https://www.ultralytics.com/glossary/training-data), ensuring impressive zero-shot performance in varied segmentation tasks. For more details, visit the [Dataset section](#key-features-of-the-segment-anything-model-sam).
---
comments: true
description: Discover YOLO-NAS by Deci AI - a state-of-the-art object detection model with quantization support. Explore features, pretrained models, and implementation examples.
keywords: YOLO-NAS, Deci AI, object detection, deep learning, Neural Architecture Search, Ultralytics, Python API, YOLO model, SuperGradients, pretrained models, quantization, AutoNAC
---
# YOLO-NAS
## Overview
Developed by Deci AI, YOLO-NAS is a groundbreaking object detection foundational model. It is the product of advanced Neural Architecture Search technology, meticulously designed to address the limitations of previous YOLO models. With significant improvements in quantization support and [accuracy](https://www.ultralytics.com/glossary/accuracy)-latency trade-offs, YOLO-NAS represents a major leap in object detection.
![Model example image](https://github.com/ultralytics/docs/releases/download/0/yolo-nas-coco-map-metrics.avif) **Overview of YOLO-NAS.** YOLO-NAS employs quantization-aware blocks and selective quantization for optimal performance. The model, when converted to its INT8 quantized version, experiences a minimal precision drop, a significant improvement over other models. These advancements culminate in a superior architecture with unprecedented object detection capabilities and outstanding performance.
### Key Features
- **Quantization-Friendly Basic Block:** YOLO-NAS introduces a new basic block that is friendly to quantization, addressing one of the significant limitations of previous YOLO models.
- **Sophisticated Training and Quantization:** YOLO-NAS leverages advanced training schemes and post-training quantization to enhance performance.
- **AutoNAC Optimization and Pre-training:** YOLO-NAS utilizes AutoNAC optimization and is pre-trained on prominent datasets such as COCO, Objects365, and Roboflow 100. This pre-training makes it extremely suitable for downstream object detection tasks in production environments.
## Pre-trained Models
Experience the power of next-generation object detection with the pre-trained YOLO-NAS models provided by Ultralytics. These models are designed to deliver top-notch performance in terms of both speed and accuracy. Choose from a variety of options tailored to your specific needs:
| Model | mAP | Latency (ms) |
| ---------------- | ----- | ------------ |
| YOLO-NAS S | 47.5 | 3.21 |
| YOLO-NAS M | 51.55 | 5.85 |
| YOLO-NAS L | 52.22 | 7.87 |
| YOLO-NAS S INT-8 | 47.03 | 2.36 |
| YOLO-NAS M INT-8 | 51.0 | 3.78 |
| YOLO-NAS L INT-8 | 52.1 | 4.78 |
Each model variant is designed to offer a balance between [Mean Average Precision](https://www.ultralytics.com/glossary/mean-average-precision-map) (mAP) and latency, helping you optimize your object detection tasks for both performance and speed.
## Usage Examples
Ultralytics has made YOLO-NAS models easy to integrate into your Python applications via our `ultralytics` python package. The package provides a user-friendly Python API to streamline the process.
The following examples show how to use YOLO-NAS models with the `ultralytics` package for inference and validation:
### Inference and Validation Examples
In this example we validate YOLO-NAS-s on the COCO8 dataset.
!!! example
This example provides simple inference and validation code for YOLO-NAS. For handling inference results see [Predict](../modes/predict.md) mode. For using YOLO-NAS with additional modes see [Val](../modes/val.md) and [Export](../modes/export.md). YOLO-NAS models in the `ultralytics` package do not support training.
=== "Python"
[PyTorch](https://www.ultralytics.com/glossary/pytorch) pretrained `*.pt` model files can be passed to the `NAS()` class to create a model instance in Python:
```python
from ultralytics import NAS
# Load a COCO-pretrained YOLO-NAS-s model
model = NAS("yolo_nas_s.pt")
# Display model information (optional)
model.info()
# Validate the model on the COCO8 example dataset
results = model.val(data="coco8.yaml")
# Run inference with the YOLO-NAS-s model on the 'bus.jpg' image
results = model("path/to/bus.jpg")
```
=== "CLI"
CLI commands are available to directly run the models:
```bash
# Load a COCO-pretrained YOLO-NAS-s model and validate its performance on the COCO8 example dataset
yolo val model=yolo_nas_s.pt data=coco8.yaml
# Load a COCO-pretrained YOLO-NAS-s model and run inference on the 'bus.jpg' image
yolo predict model=yolo_nas_s.pt source=path/to/bus.jpg
```
## Supported Tasks and Modes
We offer three variants of the YOLO-NAS models: Small (s), Medium (m), and Large (l). Each variant is designed to cater to different computational and performance needs:
- **YOLO-NAS-s**: Optimized for environments where computational resources are limited but efficiency is key.
- **YOLO-NAS-m**: Offers a balanced approach, suitable for general-purpose [object detection](https://www.ultralytics.com/glossary/object-detection) with higher accuracy.
- **YOLO-NAS-l**: Tailored for scenarios requiring the highest accuracy, where computational resources are less of a constraint.
Below is a detailed overview of each model, including links to their pre-trained weights, the tasks they support, and their compatibility with different operating modes.
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
| ---------- | --------------------------------------------------------------------------------------------- | -------------------------------------- | --------- | ---------- | -------- | ------ |
| YOLO-NAS-s | [yolo_nas_s.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolo_nas_s.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
| YOLO-NAS-m | [yolo_nas_m.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolo_nas_m.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
| YOLO-NAS-l | [yolo_nas_l.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolo_nas_l.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
## Citations and Acknowledgements
If you employ YOLO-NAS in your research or development work, please cite SuperGradients:
!!! quote ""
=== "BibTeX"
```bibtex
@misc{supergradients,
doi = {10.5281/ZENODO.7789328},
url = {https://zenodo.org/record/7789328},
author = {Aharon, Shay and {Louis-Dupont} and {Ofri Masad} and Yurkova, Kate and {Lotem Fridman} and {Lkdci} and Khvedchenya, Eugene and Rubin, Ran and Bagrov, Natan and Tymchenko, Borys and Keren, Tomer and Zhilko, Alexander and {Eran-Deci}},
title = {Super-Gradients},
publisher = {GitHub},
journal = {GitHub repository},
year = {2021},
}
```
We express our gratitude to Deci AI's [SuperGradients](https://github.com/Deci-AI/super-gradients/) team for their efforts in creating and maintaining this valuable resource for the [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) community. We believe YOLO-NAS, with its innovative architecture and superior object detection capabilities, will become a critical tool for developers and researchers alike.
## FAQ
### What is YOLO-NAS and how does it improve over previous YOLO models?
YOLO-NAS, developed by Deci AI, is a state-of-the-art object detection model leveraging advanced Neural Architecture Search (NAS) technology. It addresses the limitations of previous YOLO models by introducing features like quantization-friendly basic blocks and sophisticated training schemes. This results in significant improvements in performance, particularly in environments with limited computational resources. YOLO-NAS also supports quantization, maintaining high accuracy even when converted to its INT8 version, enhancing its suitability for production environments. For more details, see the [Overview](#overview) section.
### How can I integrate YOLO-NAS models into my Python application?
You can easily integrate YOLO-NAS models into your Python application using the `ultralytics` package. Here's a simple example of how to load a pre-trained YOLO-NAS model and perform inference:
```python
from ultralytics import NAS
# Load a COCO-pretrained YOLO-NAS-s model
model = NAS("yolo_nas_s.pt")
# Validate the model on the COCO8 example dataset
results = model.val(data="coco8.yaml")
# Run inference with the YOLO-NAS-s model on the 'bus.jpg' image
results = model("path/to/bus.jpg")
```
For more information, refer to the [Inference and Validation Examples](#inference-and-validation-examples).
### What are the key features of YOLO-NAS and why should I consider using it?
YOLO-NAS introduces several key features that make it a superior choice for object detection tasks:
- **Quantization-Friendly Basic Block:** Enhanced architecture that improves model performance with minimal [precision](https://www.ultralytics.com/glossary/precision) drop post quantization.
- **Sophisticated Training and Quantization:** Employs advanced training schemes and post-training quantization techniques.
- **AutoNAC Optimization and Pre-training:** Utilizes AutoNAC optimization and is pre-trained on prominent datasets like COCO, Objects365, and Roboflow 100.
These features contribute to its high accuracy, efficient performance, and suitability for deployment in production environments. Learn more in the [Key Features](#key-features) section.
### Which tasks and modes are supported by YOLO-NAS models?
YOLO-NAS models support various object detection tasks and modes such as inference, validation, and export. They do not support training. The supported models include YOLO-NAS-s, YOLO-NAS-m, and YOLO-NAS-l, each tailored to different computational capacities and performance needs. For a detailed overview, refer to the [Supported Tasks and Modes](#supported-tasks-and-modes) section.
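For instance, exporting a YOLO-NAS model for deployment can be sketched as follows, assuming the standard Ultralytics `export` API applies to the `NAS` class as indicated in the table above:

```python
from ultralytics import NAS

# Load a COCO-pretrained YOLO-NAS-s model
model = NAS("yolo_nas_s.pt")

# Export the model to ONNX (inference, validation, and export are supported; training is not)
model.export(format="onnx")
```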
### Are there pre-trained YOLO-NAS models available and how do I access them?
Yes, Ultralytics provides pre-trained YOLO-NAS models that you can access directly. These models are pre-trained on datasets like COCO, ensuring high performance in terms of both speed and accuracy. You can download these models using the links provided in the [Pre-trained Models](#pre-trained-models) section. Here are some examples:
- [YOLO-NAS-s](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolo_nas_s.pt)
- [YOLO-NAS-m](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolo_nas_m.pt)
- [YOLO-NAS-l](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolo_nas_l.pt)
---
comments: true
description: Explore the YOLO-World Model for efficient, real-time open-vocabulary object detection using Ultralytics YOLOv8 advancements. Achieve top performance with minimal computation.
keywords: YOLO-World, Ultralytics, open-vocabulary detection, YOLOv8, real-time object detection, machine learning, computer vision, AI, deep learning, model training
---
# YOLO-World Model
The YOLO-World Model introduces an advanced, real-time [Ultralytics](https://www.ultralytics.com/) [YOLOv8](yolov8.md)-based approach for Open-Vocabulary Detection tasks. This innovation enables the detection of any object within an image based on descriptive texts. By significantly lowering computational demands while preserving competitive performance, YOLO-World emerges as a versatile tool for numerous vision-based applications.
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/cfTKj96TjSE"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> YOLO World training workflow on custom dataset
</p>
![YOLO-World Model architecture overview](https://github.com/ultralytics/docs/releases/download/0/yolo-world-model-architecture-overview.avif)
## Overview
YOLO-World tackles the challenges faced by traditional Open-Vocabulary detection models, which often rely on cumbersome [Transformer](https://www.ultralytics.com/glossary/transformer) models requiring extensive computational resources. These models' dependence on pre-defined object categories also restricts their utility in dynamic scenarios. YOLO-World revitalizes the YOLOv8 framework with open-vocabulary detection capabilities, employing vision-[language modeling](https://www.ultralytics.com/glossary/language-modeling) and pre-training on expansive datasets to excel at identifying a broad array of objects in zero-shot scenarios with unmatched efficiency.
## Key Features
1. **Real-time Solution:** Harnessing the computational speed of CNNs, YOLO-World delivers a swift open-vocabulary detection solution, catering to industries in need of immediate results.
2. **Efficiency and Performance:** YOLO-World slashes computational and resource requirements without sacrificing performance, offering a robust alternative to models like SAM but at a fraction of the computational cost, enabling real-time applications.
3. **Inference with Offline Vocabulary:** YOLO-World introduces a "prompt-then-detect" strategy, employing an offline vocabulary to enhance efficiency further. This approach enables custom prompts computed a priori, such as captions or categories, to be encoded and stored as offline vocabulary embeddings, streamlining the detection process.
4. **Powered by YOLOv8:** Built upon [Ultralytics YOLOv8](yolov8.md), YOLO-World leverages the latest advancements in real-time object detection to facilitate open-vocabulary detection with unparalleled accuracy and speed.
5. **Benchmark Excellence:** YOLO-World outperforms existing open-vocabulary detectors, including MDETR and GLIP series, in terms of speed and efficiency on standard benchmarks, showcasing YOLOv8's superior capability on a single NVIDIA V100 GPU.
6. **Versatile Applications:** YOLO-World's innovative approach unlocks new possibilities for a multitude of vision tasks, delivering speed improvements by orders of magnitude over existing methods.
## Available Models, Supported Tasks, and Operating Modes
This section details the models available with their specific pre-trained weights, the tasks they support, and their compatibility with various operating modes such as [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md), denoted by ✅ for supported modes and ❌ for unsupported modes.
!!! note
All the YOLOv8-World weights have been directly migrated from the official [YOLO-World](https://github.com/AILab-CVC/YOLO-World) repository, highlighting their excellent contributions.
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
| --------------- | ------------------------------------------------------------------------------------------------------- | -------------------------------------- | --------- | ---------- | -------- | ------ |
| YOLOv8s-world | [yolov8s-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ❌ |
| YOLOv8s-worldv2 | [yolov8s-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8m-world | [yolov8m-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ❌ |
| YOLOv8m-worldv2 | [yolov8m-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8l-world | [yolov8l-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8l-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ❌ |
| YOLOv8l-worldv2 | [yolov8l-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8l-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8x-world | [yolov8x-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8x-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ❌ |
| YOLOv8x-worldv2 | [yolov8x-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8x-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
## Zero-shot Transfer on COCO Dataset
| Model Type | mAP | mAP50 | mAP75 |
| --------------- | ---- | ----- | ----- |
| yolov8s-world | 37.4 | 52.0 | 40.6 |
| yolov8s-worldv2 | 37.7 | 52.2 | 41.0 |
| yolov8m-world | 42.0 | 57.0 | 45.6 |
| yolov8m-worldv2 | 43.0 | 58.4 | 46.8 |
| yolov8l-world | 45.7 | 61.3 | 49.8 |
| yolov8l-worldv2 | 45.8 | 61.3 | 49.8 |
| yolov8x-world | 47.0 | 63.0 | 51.2 |
| yolov8x-worldv2 | 47.1 | 62.8 | 51.4 |
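As a rough, unofficial sketch of how such zero-shot numbers can be checked, the pretrained weights can be validated directly on the full COCO dataset; the exact figures above may depend on the official evaluation settings:

```python
from ultralytics import YOLOWorld

# Load pretrained YOLOv8s-worldv2 weights
model = YOLOWorld("yolov8s-worldv2.pt")

# Zero-shot validation on the full COCO dataset (downloaded automatically if missing)
metrics = model.val(data="coco.yaml")
print(metrics.box.map, metrics.box.map50, metrics.box.map75)  # mAP, mAP50, mAP75
```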
## Usage Examples
The YOLO-World models are easy to integrate into your Python applications. Ultralytics provides a user-friendly Python API and CLI commands to streamline development.
### Train Usage
!!! tip
We strongly recommend using the `yolov8-worldv2` model for custom training, because it supports deterministic training and is easy to export to other formats such as ONNX and TensorRT (see the export sketch after the training examples below).
[Object detection](https://www.ultralytics.com/glossary/object-detection) is straightforward with the `train` method, as illustrated below:
!!! example
=== "Python"
[PyTorch](https://www.ultralytics.com/glossary/pytorch) pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLOWorld()` class to create a model instance in python:
```python
from ultralytics import YOLOWorld
# Load a pretrained YOLOv8s-worldv2 model
model = YOLOWorld("yolov8s-worldv2.pt")
# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
# Run inference with the YOLOv8s-worldv2 model on the 'bus.jpg' image
results = model("path/to/bus.jpg")
```
=== "CLI"
```bash
# Load a pretrained YOLOv8s-worldv2 model and train it on the COCO8 example dataset for 100 epochs
yolo train model=yolov8s-worldv2.yaml data=coco8.yaml epochs=100 imgsz=640
```
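Since the tip above recommends the `yolov8-worldv2` variants partly for their export support, the trained model can subsequently be exported. A minimal sketch, assuming the standard Ultralytics `export` API and an ONNX target:

```python
from ultralytics import YOLOWorld

# Load trained or pretrained YOLOv8s-worldv2 weights
model = YOLOWorld("yolov8s-worldv2.pt")

# Export to ONNX; a TensorRT export would instead use format="engine" on a GPU machine
model.export(format="onnx")
```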
### Predict Usage
Object detection is straightforward with the `predict` method, as illustrated below:
!!! example
=== "Python"
```python
from ultralytics import YOLOWorld
# Initialize a YOLO-World model
model = YOLOWorld("yolov8s-world.pt") # or select yolov8m/l-world.pt for different sizes
# Execute inference with the YOLOv8s-world model on the specified image
results = model.predict("path/to/image.jpg")
# Show results
results[0].show()
```
=== "CLI"
```bash
# Perform object detection using a YOLO-World model
yolo predict model=yolov8s-world.pt source=path/to/image.jpg imgsz=640
```
This snippet demonstrates the simplicity of loading a pre-trained model and running a prediction on an image.
### Val Usage
Model validation on a dataset is streamlined as follows:
!!! example
=== "Python"
```python
from ultralytics import YOLO
# Create a YOLO-World model
model = YOLO("yolov8s-world.pt") # or select yolov8m/l-world.pt for different sizes
# Conduct model validation on the COCO8 example dataset
metrics = model.val(data="coco8.yaml")
```
=== "CLI"
```bash
# Validate a YOLO-World model on the COCO8 dataset with a specified image size
yolo val model=yolov8s-world.pt data=coco8.yaml imgsz=640
```
### Track Usage
Object tracking with a YOLO-World model on videos or image streams is streamlined as follows:
!!! example
=== "Python"
```python
from ultralytics import YOLO
# Create a YOLO-World model
model = YOLO("yolov8s-world.pt") # or select yolov8m/l-world.pt for different sizes
# Track with a YOLO-World model on a video
results = model.track(source="path/to/video.mp4")
```
=== "CLI"
```bash
# Track with a YOLO-World model on the video with a specified image size
yolo track model=yolov8s-world.pt imgsz=640 source="path/to/video/file.mp4"
```
!!! note
The YOLO-World models provided by Ultralytics come pre-configured with [COCO dataset](../datasets/detect/coco.md) categories as part of their offline vocabulary, enhancing efficiency for immediate application. This integration allows the YOLOv8-World models to directly recognize and predict the 80 standard categories defined in the COCO dataset without requiring additional setup or customization.
### Set prompts
![YOLO-World prompt class names overview](https://github.com/ultralytics/docs/releases/download/0/yolo-world-prompt-class-names-overview.avif)
The YOLO-World framework allows for the dynamic specification of classes through custom prompts, empowering users to tailor the model to their specific needs **without retraining**. This feature is particularly useful for adapting the model to new domains or specific tasks that were not originally part of the [training data](https://www.ultralytics.com/glossary/training-data). By setting custom prompts, users can essentially guide the model's focus towards objects of interest, enhancing the relevance and accuracy of the detection results.
For instance, if your application only requires detecting 'person' and 'bus' objects, you can specify these classes directly:
!!! example
=== "Custom Inference Prompts"
```python
from ultralytics import YOLO
# Initialize a YOLO-World model
model = YOLO("yolov8s-world.pt") # or choose yolov8m/l-world.pt
# Define custom classes
model.set_classes(["person", "bus"])
# Execute prediction for specified categories on an image
results = model.predict("path/to/image.jpg")
# Show results
results[0].show()
```
You can also save a model after setting custom classes. By doing this you create a version of the YOLO-World model that is specialized for your specific use case. This process embeds your custom class definitions directly into the model file, making the model ready to use with your specified classes without further adjustments. Follow these steps to save and load your custom YOLOv8 model:
!!! example
=== "Persisting Models with Custom Vocabulary"
First load a YOLO-World model, set custom classes for it and save it:
```python
from ultralytics import YOLO
# Initialize a YOLO-World model
model = YOLO("yolov8s-world.pt") # or select yolov8m/l-world.pt
# Define custom classes
model.set_classes(["person", "bus"])
# Save the model with the defined offline vocabulary
model.save("custom_yolov8s.pt")
```
After saving, the `custom_yolov8s.pt` model behaves like any other pre-trained YOLOv8 model but with a key difference: it is now optimized to detect only the classes you have defined. This customization can significantly improve detection performance and efficiency for your specific application scenarios.
```python
from ultralytics import YOLO
# Load your custom model
model = YOLO("custom_yolov8s.pt")
# Run inference to detect your custom classes
results = model.predict("path/to/image.jpg")
# Show results
results[0].show()
```
### Benefits of Saving with Custom Vocabulary
- **Efficiency**: Streamlines the detection process by focusing on relevant objects, reducing computational overhead and speeding up inference.
- **Flexibility**: Allows for easy adaptation of the model to new or niche detection tasks without the need for extensive retraining or data collection.
- **Simplicity**: Simplifies deployment by eliminating the need to repeatedly specify custom classes at runtime, making the model directly usable with its embedded vocabulary.
- **Performance**: Enhances detection [accuracy](https://www.ultralytics.com/glossary/accuracy) for specified classes by focusing the model's attention and resources on recognizing the defined objects.
This approach provides a powerful means of customizing state-of-the-art object detection models for specific tasks, making advanced AI more accessible and applicable to a broader range of practical applications.
## Reproduce official results from scratch (Experimental)
### Prepare datasets
- Train data
| Dataset | Type | Samples | Boxes | Annotation Files |
| ----------------------------------------------------------------- | --------- | ------- | ----- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| [Objects365v1](https://opendatalab.com/OpenDataLab/Objects365_v1) | Detection | 609k | 9621k | [objects365_train.json](https://opendatalab.com/OpenDataLab/Objects365_v1) |
| [GQA](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip) | Grounding | 621k | 3681k | [final_mixed_train_no_coco.json](https://huggingface.co/GLIPModel/GLIP/blob/main/mdetr_annotations/final_mixed_train_no_coco.json) |
| [Flickr30k](https://shannon.cs.illinois.edu/DenotationGraph/) | Grounding | 149k | 641k | [final_flickr_separateGT_train.json](https://huggingface.co/GLIPModel/GLIP/blob/main/mdetr_annotations/final_flickr_separateGT_train.json) |
- Val data
| Dataset | Type | Annotation Files |
| ------------------------------------------------------------------------------------------------------- | --------- | ------------------------------------------------------------------------------------------------------ |
| [LVIS minival](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/lvis.yaml) | Detection | [minival.txt](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/lvis.yaml) |
### Launch training from scratch
!!! note
`WorldTrainerFromScratch` is highly customized to allow training YOLO-World models on both detection datasets and grounding datasets simultaneously. For more details, please check out [ultralytics.models.yolo.world.train_world.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/yolo/world/train_world.py).
!!! example
=== "Python"
```python
from ultralytics import YOLOWorld
from ultralytics.models.yolo.world.train_world import WorldTrainerFromScratch
data = dict(
train=dict(
yolo_data=["Objects365.yaml"],
grounding_data=[
dict(
img_path="../datasets/flickr30k/images",
json_file="../datasets/flickr30k/final_flickr_separateGT_train.json",
),
dict(
img_path="../datasets/GQA/images",
json_file="../datasets/GQA/final_mixed_train_no_coco.json",
),
],
),
val=dict(yolo_data=["lvis.yaml"]),
)
model = YOLOWorld("yolov8s-worldv2.yaml")
model.train(data=data, batch=128, epochs=100, trainer=WorldTrainerFromScratch)
```
## Citations and Acknowledgements
We extend our gratitude to the [Tencent AILab Computer Vision Center](https://www.tencent.com/) for their pioneering work in real-time open-vocabulary object detection with YOLO-World:
!!! quote ""
=== "BibTeX"
```bibtex
@article{cheng2024yolow,
title={YOLO-World: Real-Time Open-Vocabulary Object Detection},
author={Cheng, Tianheng and Song, Lin and Ge, Yixiao and Liu, Wenyu and Wang, Xinggang and Shan, Ying},
journal={arXiv preprint arXiv:2401.17270},
year={2024}
}
```
For further reading, the original YOLO-World paper is available on [arXiv](https://arxiv.org/pdf/2401.17270v2.pdf). The project's source code and additional resources can be accessed via their [GitHub repository](https://github.com/AILab-CVC/YOLO-World). We appreciate their commitment to advancing the field and sharing their valuable insights with the community.
## FAQ
### What is the YOLO-World model and how does it work?
The YOLO-World model is an advanced, real-time object detection approach based on the [Ultralytics YOLOv8](yolov8.md) framework. It excels in Open-Vocabulary Detection tasks by identifying objects within an image based on descriptive texts. Using vision-language modeling and pre-training on large datasets, YOLO-World achieves high efficiency and performance with significantly reduced computational demands, making it ideal for real-time applications across various industries.
### How does YOLO-World handle inference with custom prompts?
YOLO-World supports a "prompt-then-detect" strategy, which utilizes an offline vocabulary to enhance efficiency. Custom prompts like captions or specific object categories are pre-encoded and stored as offline vocabulary [embeddings](https://www.ultralytics.com/glossary/embeddings). This approach streamlines the detection process without the need for retraining. You can dynamically set these prompts within the model to tailor it to specific detection tasks, as shown below:
```python
from ultralytics import YOLOWorld
# Initialize a YOLO-World model
model = YOLOWorld("yolov8s-world.pt")
# Define custom classes
model.set_classes(["person", "bus"])
# Execute prediction on an image
results = model.predict("path/to/image.jpg")
# Show results
results[0].show()
```
### Why should I choose YOLO-World over traditional Open-Vocabulary detection models?
YOLO-World provides several advantages over traditional Open-Vocabulary detection models:
- **Real-Time Performance:** It leverages the computational speed of CNNs to offer quick, efficient detection.
- **Efficiency and Low Resource Requirement:** YOLO-World maintains high performance while significantly reducing computational and resource demands.
- **Customizable Prompts:** The model supports dynamic prompt setting, allowing users to specify custom detection classes without retraining.
- **Benchmark Excellence:** It outperforms other open-vocabulary detectors like MDETR and GLIP in both speed and efficiency on standard benchmarks.
### How do I train a YOLO-World model on my dataset?
Training a YOLO-World model on your dataset is straightforward through the provided Python API or CLI commands. Here's how to start training using Python:
```python
from ultralytics import YOLOWorld
# Load a pretrained YOLOv8s-worldv2 model
model = YOLOWorld("yolov8s-worldv2.pt")
# Train the model on the COCO8 dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
```
Or using CLI:
```bash
yolo train model=yolov8s-worldv2.yaml data=coco8.yaml epochs=100 imgsz=640
```
### What are the available pre-trained YOLO-World models and their supported tasks?
Ultralytics offers multiple pre-trained YOLO-World models supporting various tasks and operating modes:
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
| --------------- | ------------------------------------------------------------------------------------------------------- | -------------------------------------- | --------- | ---------- | -------- | ------ |
| YOLOv8s-world | [yolov8s-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ❌ |
| YOLOv8s-worldv2 | [yolov8s-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8m-world | [yolov8m-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ❌ |
| YOLOv8m-worldv2 | [yolov8m-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8l-world | [yolov8l-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8l-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ❌ |
| YOLOv8l-worldv2 | [yolov8l-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8l-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
| YOLOv8x-world | [yolov8x-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8x-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ❌ |
| YOLOv8x-worldv2 | [yolov8x-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8x-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
### How do I reproduce the official results of YOLO-World from scratch?
To reproduce the official results from scratch, you need to prepare the datasets and launch the training using the provided code. The training procedure involves creating a data dictionary and running the `train` method with a custom trainer:
```python
from ultralytics import YOLOWorld
from ultralytics.models.yolo.world.train_world import WorldTrainerFromScratch
data = {
"train": {
"yolo_data": ["Objects365.yaml"],
"grounding_data": [
{
"img_path": "../datasets/flickr30k/images",
"json_file": "../datasets/flickr30k/final_flickr_separateGT_train.json",
},
{
"img_path": "../datasets/GQA/images",
"json_file": "../datasets/GQA/final_mixed_train_no_coco.json",
},
],
},
"val": {"yolo_data": ["lvis.yaml"]},
}
model = YOLOWorld("yolov8s-worldv2.yaml")
model.train(data=data, batch=128, epochs=100, trainer=WorldTrainerFromScratch)
```
---
comments: true
description: Discover YOLO11, the latest advancement in state-of-the-art object detection, offering unmatched accuracy and efficiency for diverse computer vision tasks.
keywords: YOLO11, state-of-the-art object detection, YOLO series, Ultralytics, computer vision, AI, machine learning, deep learning
---
# Ultralytics YOLO11
## Overview
YOLO11 is the latest iteration in the [Ultralytics](https://www.ultralytics.com/) YOLO series of real-time object detectors, redefining what's possible with cutting-edge [accuracy](https://www.ultralytics.com/glossary/accuracy), speed, and efficiency. Building upon the impressive advancements of previous YOLO versions, YOLO11 introduces significant improvements in architecture and training methods, making it a versatile choice for a wide range of [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) tasks.
![Ultralytics YOLO11 Comparison Plots](https://raw.githubusercontent.com/ultralytics/assets/refs/heads/main/yolo/performance-comparison.png)
<p align="center">
<br>
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/-JXwa-WlkU8"
title="YouTube video player" frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
allowfullscreen>
</iframe>
<br>
<strong>Watch:</strong> How to Use Ultralytics YOLO11 for Object Detection and Tracking | How to Benchmark | YOLO11 RELEASED🚀
</p>
## Key Features
- **Enhanced Feature Extraction:** YOLO11 employs an improved backbone and neck architecture, which enhances [feature extraction](https://www.ultralytics.com/glossary/feature-extraction) capabilities for more precise object detection and complex task performance.
- **Optimized for Efficiency and Speed:** YOLO11 introduces refined architectural designs and optimized training pipelines, delivering faster processing speeds and maintaining an optimal balance between accuracy and performance.
- **Greater Accuracy with Fewer Parameters:** With advancements in model design, YOLO11m achieves a higher [mean Average Precision](https://www.ultralytics.com/glossary/mean-average-precision-map) (mAP) on the COCO dataset while using 22% fewer parameters than YOLOv8m, making it computationally efficient without compromising accuracy.
- **Adaptability Across Environments:** YOLO11 can be seamlessly deployed across various environments, including edge devices, cloud platforms, and systems supporting NVIDIA GPUs, ensuring maximum flexibility.
- **Broad Range of Supported Tasks:** Whether it's object detection, instance segmentation, image classification, pose estimation, or oriented object detection (OBB), YOLO11 is designed to cater to a diverse set of computer vision challenges.
## Supported Tasks and Modes
YOLO11 builds upon the versatile model range introduced in YOLOv8, offering enhanced support across various computer vision tasks:
| Model | Filenames | Task | Inference | Validation | Training | Export |
| ----------- | ----------------------------------------------------------------------------------------- | -------------------------------------------- | --------- | ---------- | -------- | ------ |
| YOLO11 | `yolo11n.pt` `yolo11s.pt` `yolo11m.pt` `yolo11l.pt` `yolo11x.pt` | [Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
| YOLO11-seg | `yolo11n-seg.pt` `yolo11s-seg.pt` `yolo11m-seg.pt` `yolo11l-seg.pt` `yolo11x-seg.pt` | [Instance Segmentation](../tasks/segment.md) | ✅ | ✅ | ✅ | ✅ |
| YOLO11-pose | `yolo11n-pose.pt` `yolo11s-pose.pt` `yolo11m-pose.pt` `yolo11l-pose.pt` `yolo11x-pose.pt` | [Pose/Keypoints](../tasks/pose.md) | ✅ | ✅ | ✅ | ✅ |
| YOLO11-obb | `yolo11n-obb.pt` `yolo11s-obb.pt` `yolo11m-obb.pt` `yolo11l-obb.pt` `yolo11x-obb.pt` | [Oriented Detection](../tasks/obb.md) | ✅ | ✅ | ✅ | ✅ |
| YOLO11-cls | `yolo11n-cls.pt` `yolo11s-cls.pt` `yolo11m-cls.pt` `yolo11l-cls.pt` `yolo11x-cls.pt` | [Classification](../tasks/classify.md) | ✅ | ✅ | ✅ | ✅ |
This table provides an overview of the YOLO11 model variants, showcasing their applicability in specific tasks and compatibility with operational modes such as Inference, Validation, Training, and Export. This flexibility makes YOLO11 suitable for a wide range of applications in computer vision, from real-time detection to complex segmentation tasks.
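As a minimal illustration, assuming the task-specific weights listed above, the same `YOLO()` class loads any variant; for example, the segmentation model can be used as follows:

```python
from ultralytics import YOLO

# Load a COCO-pretrained YOLO11n segmentation model
model = YOLO("yolo11n-seg.pt")

# Run inference; results include masks for segmentation models
results = model("path/to/bus.jpg")
results[0].show()
```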
## Performance Metrics
!!! performance
=== "Detection (COCO)"
See [Detection Docs](../tasks/detect.md) for usage examples with these models trained on [COCO](../datasets/detect/coco.md), which include 80 pre-trained classes.
{% filter indent(width=8, first=False, blank=True) %}
{% include "macros/yolo-det-perf.md" %}
{% endfilter %}
=== "Segmentation (COCO)"
See [Segmentation Docs](../tasks/segment.md) for usage examples with these models trained on [COCO](../datasets/segment/coco.md), which include 80 pre-trained classes.
{% filter indent(width=8, first=False, blank=True) %}
{% include "macros/yolo-seg-perf.md" %}
{% endfilter %}
=== "Classification (ImageNet)"
See [Classification Docs](../tasks/classify.md) for usage examples with these models trained on [ImageNet](../datasets/classify/imagenet.md), which include 1000 pre-trained classes.
{% filter indent(width=8, first=False, blank=True) %}
{% include "macros/yolo-cls-perf.md" %}
{% endfilter %}
=== "Pose (COCO)"
See [Pose Estimation Docs](../tasks/pose.md) for usage examples with these models trained on [COCO](../datasets/pose/coco.md), which include 1 pre-trained class, 'person'.
{% filter indent(width=8, first=False, blank=True) %}
{% include "macros/yolo-pose-perf.md" %}
{% endfilter %}
=== "OBB (DOTAv1)"
See [Oriented Detection Docs](../tasks/obb.md) for usage examples with these models trained on [DOTAv1](../datasets/obb/dota-v2.md#dota-v10), which include 15 pre-trained classes.
{% filter indent(width=8, first=False, blank=True) %}
{% include "macros/yolo-obb-perf.md" %}
{% endfilter %}
## Usage Examples
This section provides simple YOLO11 training and inference examples. For full documentation on these and other [modes](../modes/index.md), see the [Predict](../modes/predict.md), [Train](../modes/train.md), [Val](../modes/val.md), and [Export](../modes/export.md) docs pages.
Note that the example below is for YOLO11 [Detect](../tasks/detect.md) models for [object detection](https://www.ultralytics.com/glossary/object-detection). For additional supported tasks, see the [Segment](../tasks/segment.md), [Classify](../tasks/classify.md), [OBB](../tasks/obb.md), and [Pose](../tasks/pose.md) docs.
!!! example
=== "Python"
[PyTorch](https://www.ultralytics.com/glossary/pytorch) pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLO()` class to create a model instance in Python:
```python
from ultralytics import YOLO
# Load a COCO-pretrained YOLO11n model
model = YOLO("yolo11n.pt")
# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
# Run inference with the YOLO11n model on the 'bus.jpg' image
results = model("path/to/bus.jpg")
```
=== "CLI"
CLI commands are available to directly run the models:
```bash
# Load a COCO-pretrained YOLO11n model and train it on the COCO8 example dataset for 100 epochs
yolo train model=yolo11n.pt data=coco8.yaml epochs=100 imgsz=640
# Load a COCO-pretrained YOLO11n model and run inference on the 'bus.jpg' image
yolo predict model=yolo11n.pt source=path/to/bus.jpg
```
## Citations and Acknowledgements
!!! tip "Ultralytics YOLO11 Publication"
Ultralytics has not published a formal research paper for YOLO11 due to the rapidly evolving nature of the models. We focus on advancing the technology and making it easier to use, rather than producing static documentation. For the most up-to-date information on YOLO architecture, features, and usage, please refer to our [GitHub repository](https://github.com/ultralytics/ultralytics) and [documentation](https://docs.ultralytics.com/).
If you use YOLO11 or any other software from this repository in your work, please cite it using the following format:
!!! quote ""
=== "BibTeX"
```bibtex
@software{yolo11_ultralytics,
author = {Glenn Jocher and Jing Qiu},
title = {Ultralytics YOLO11},
version = {11.0.0},
year = {2024},
url = {https://github.com/ultralytics/ultralytics},
orcid = {0000-0001-5950-6979, 0000-0002-7603-6750, 0000-0003-3783-7069},
license = {AGPL-3.0}
}
```
Please note that the DOI is pending and will be added to the citation once it is available. YOLO11 models are provided under [AGPL-3.0](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) and [Enterprise](https://www.ultralytics.com/license) licenses.
## FAQ
### What are the key improvements in Ultralytics YOLO11 compared to previous versions?
Ultralytics YOLO11 introduces several significant advancements over its predecessors. Key improvements include:
- **Enhanced Feature Extraction:** YOLO11 employs an improved backbone and neck architecture, enhancing [feature extraction](https://www.ultralytics.com/glossary/feature-extraction) capabilities for more precise object detection.
- **Optimized Efficiency and Speed:** Refined architectural designs and optimized training pipelines deliver faster processing speeds while maintaining a balance between accuracy and performance.
- **Greater Accuracy with Fewer Parameters:** YOLO11m achieves higher mean Average [Precision](https://www.ultralytics.com/glossary/precision) (mAP) on the COCO dataset with 22% fewer parameters than YOLOv8m, making it computationally efficient without compromising accuracy.
- **Adaptability Across Environments:** YOLO11 can be deployed across various environments, including edge devices, cloud platforms, and systems supporting NVIDIA GPUs.
- **Broad Range of Supported Tasks:** YOLO11 supports diverse computer vision tasks such as object detection, [instance segmentation](https://www.ultralytics.com/glossary/instance-segmentation), image classification, pose estimation, and oriented object detection (OBB).
### How do I train a YOLO11 model for object detection?
Training a YOLO11 model for object detection can be done using Python or CLI commands. Below are examples for both methods:
!!! Example
=== "Python"
```python
from ultralytics import YOLO
# Load a COCO-pretrained YOLO11n model
model = YOLO("yolo11n.pt")
# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
```
=== "CLI"
```bash
# Load a COCO-pretrained YOLO11n model and train it on the COCO8 example dataset for 100 epochs
yolo train model=yolo11n.pt data=coco8.yaml epochs=100 imgsz=640
```
For more detailed instructions, refer to the [Train](../modes/train.md) documentation.
### What tasks can YOLO11 models perform?
YOLO11 models are versatile and support a wide range of computer vision tasks, including:
- **Object Detection:** Identifying and locating objects within an image.
- **Instance Segmentation:** Detecting objects and delineating their boundaries.
- **[Image Classification](https://www.ultralytics.com/glossary/image-classification):** Categorizing images into predefined classes.
- **Pose Estimation:** Detecting and tracking keypoints on human bodies.
- **Oriented Object Detection (OBB):** Detecting objects with rotation for higher precision.
For more information on each task, see the [Detection](../tasks/detect.md), [Instance Segmentation](../tasks/segment.md), [Classification](../tasks/classify.md), [Pose Estimation](../tasks/pose.md), and [Oriented Detection](../tasks/obb.md) documentation.
### How does YOLO11 achieve greater accuracy with fewer parameters?
YOLO11 achieves greater accuracy with fewer parameters through advancements in model design and optimization techniques. The improved architecture allows for efficient feature extraction and processing, resulting in higher mean Average Precision (mAP) on datasets like COCO while using 22% fewer parameters than YOLOv8m. This makes YOLO11 computationally efficient without compromising on accuracy, making it suitable for deployment on resource-constrained devices.
### Can YOLO11 be deployed on edge devices?
Yes, YOLO11 is designed for adaptability across various environments, including edge devices. Its optimized architecture and efficient processing capabilities make it suitable for deployment on edge devices, cloud platforms, and systems supporting NVIDIA GPUs. This flexibility ensures that YOLO11 can be used in diverse applications, from real-time detection on mobile devices to complex segmentation tasks in cloud environments. For more details on deployment options, refer to the [Export](../modes/export.md) documentation.
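As an illustrative sketch, exporting YOLO11n to an edge-friendly format such as TF Lite might look like the following (see the Export docs for the full list of formats and arguments):

```python
from ultralytics import YOLO

# Load a COCO-pretrained YOLO11n model
model = YOLO("yolo11n.pt")

# Export to TF Lite for edge deployment; "ncnn" and "edgetpu" are other edge-oriented options
model.export(format="tflite")
```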