update code

Commit ff793569 authored Nov 18, 2023 by dengjb
20 changed files
--- a/docs/zh_cn/user_guides/semi_det.md
+++ b/docs/zh_cn/user_guides/semi_det.md
+# Semi-supervised Object Detection
+
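+Semi-supervised object detection trains on labeled and unlabeled data at the same time. This reduces the model's dependence on the number of annotated boxes, and it lets large amounts of unlabeled data further improve the model.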
+
+Follow the steps below to perform semi-supervised object detection:
+
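+- [Semi-supervised Object Detection](#semi-supervised-object-detection)
+  - [Prepare and Split the Dataset](#prepare-and-split-the-dataset)
+  - [Configure the Multi-branch Data Pipeline](#configure-the-multi-branch-data-pipeline)
+  - [Configure Semi-supervised Data Loading](#configure-semi-supervised-data-loading)
+  - [Configure the Semi-supervised Model](#configure-the-semi-supervised-model)
+  - [Configure MeanTeacherHook](#configure-meanteacherhook)
+  - [Configure TeacherStudentValLoop](#configure-teacherstudentvalloop)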
+
+## Prepare and Split the Dataset
+
+We provide a dataset download script. By default it downloads the COCO 2017 dataset and extracts it automatically.
+
+```shell
+python tools/misc/download_dataset.py
+```
+
+The extracted dataset directory looks like this:
+
+```plain
+mmdetection
+├── data
+│   ├── coco
+│   │   ├── annotations
+│   │   │   ├── image_info_unlabeled2017.json
+│   │   │   ├── instances_train2017.json
+│   │   │   ├── instances_val2017.json
+│   │   ├── test2017
+│   │   ├── train2017
+│   │   ├── unlabeled2017
+│   │   ├── val2017
+```
+
+Semi-supervised object detection on COCO has two common experimental settings:
+
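+(1) Take a fixed percentage (1%, 2%, 5%, or 10%) of `train2017` as the labeled dataset and use the remaining training images as the unlabeled dataset. Which training images end up labeled has a large effect on semi-supervised results, so performance is evaluated with five-fold cross-validation. We provide a dataset split script: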
+
+```shell
+python tools/misc/split_coco.py
+```
+
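+By default, the script splits `train2017` with labeled-data ratios of 1%, 2%, 5%, and 10%. Each split is repeated 5 times with different random samples for cross-validation. The generated semi-supervised annotation files are named as follows: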
+
+- Labeled dataset annotation file name format: `instances_train2017.{fold}@{percent}.json`
+
+- Unlabeled dataset annotation file name format: `instances_train2017.{fold}@{percent}-unlabeled.json`
+
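+Here, `fold` is the index of the cross-validation split and `percent` is the percentage of labeled data. The directory structure after splitting is as follows: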
+
+```plain
+mmdetection
+├── data
+│   ├── coco
+│   │   ├── annotations
+│   │   │   ├── image_info_unlabeled2017.json
+│   │   │   ├── instances_train2017.json
+│   │   │   ├── instances_val2017.json
+│   │   ├── semi_anns
+│   │   │   ├── instances_train2017.1@1.json
+│   │   │   ├── instances_train2017.1@1-unlabeled.json
+│   │   │   ├── instances_train2017.1@2.json
+│   │   │   ├── instances_train2017.1@2-unlabeled.json
+│   │   │   ├── instances_train2017.1@5.json
+│   │   │   ├── instances_train2017.1@5-unlabeled.json
+│   │   │   ├── instances_train2017.1@10.json
+│   │   │   ├── instances_train2017.1@10-unlabeled.json
+│   │   │   ├── instances_train2017.2@1.json
+│   │   │   ├── instances_train2017.2@1-unlabeled.json
+│   │   ├── test2017
+│   │   ├── train2017
+│   │   ├── unlabeled2017
+│   │   ├── val2017
+```
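The split logic can be sketched in plain Python. This is a minimal sketch under stated assumptions and not the code of `tools/misc/split_coco.py`. The helpers `split_coco_ids` and `split_file_names` are hypothetical. Seeding each fold with `seed_base + fold` is our own choice, made so that every fold is a different but reproducible random split. The file names follow the documented format.

```python
import random


def split_coco_ids(image_ids, percent, fold, seed_base=0):
    """Randomly split image ids into labeled / unlabeled subsets for one fold.

    Hypothetical helper: seeding with seed_base + fold is an assumption made
    so that each fold is a different but reproducible split.
    """
    rng = random.Random(seed_base + fold)
    num_labeled = int(len(image_ids) * percent / 100)
    labeled = set(rng.sample(image_ids, num_labeled))
    unlabeled = [i for i in image_ids if i not in labeled]
    return sorted(labeled), unlabeled


def split_file_names(fold, percent):
    """Annotation file names following the documented naming format."""
    prefix = f'instances_train2017.{fold}@{percent}'
    return f'{prefix}.json', f'{prefix}-unlabeled.json'


image_ids = list(range(1000))  # stand-in for the train2017 image ids
for fold in range(1, 6):
    labeled, unlabeled = split_coco_ids(image_ids, percent=10, fold=fold)
    print(split_file_names(fold, 10), len(labeled), len(unlabeled))
```

Fixing one seed per fold makes the five random splits reproducible across runs.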
+
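+(2) Use `train2017` as the labeled dataset and `unlabeled2017` as the unlabeled dataset. `image_info_unlabeled2017.json` has no `categories` field, so it cannot be used to initialize `CocoDataset`. To fix this, copy the `categories` of `instances_train2017.json` into `image_info_unlabeled2017.json` and save the result as `instances_unlabeled2017.json`. The script is as follows: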
+
+```python
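+from mmengine.fileio import load, dump
+
+# run from the mmdetection root directory
+ann_dir = 'data/coco/annotations/'
+# reuse the category definitions of the labeled training set
+anns_train = load(ann_dir + 'instances_train2017.json')
+anns_unlabeled = load(ann_dir + 'image_info_unlabeled2017.json')
+anns_unlabeled['categories'] = anns_train['categories']
+dump(anns_unlabeled, ann_dir + 'instances_unlabeled2017.json')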
+```
+
+The processed dataset directory looks like this:
+
+```plain
+mmdetection
+├── data
+│   ├── coco
+│   │   ├── annotations
+│   │   │   ├── image_info_unlabeled2017.json
+│   │   │   ├── instances_train2017.json
+│   │   │   ├── instances_unlabeled2017.json
+│   │   │   ├── instances_val2017.json
+│   │   ├── test2017
+│   │   ├── train2017
+│   │   ├── unlabeled2017
+│   │   ├── val2017
+```
+
+## Configure the Multi-branch Data Pipeline
+
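+There are two main approaches to semi-supervised learning:
+[consistency regularization](https://research.nvidia.com/sites/default/files/publications/laine2017iclr_paper.pdf)
+and [pseudo-labeling](https://www.researchgate.net/profile/Dong-Hyun-Lee/publication/280581078_Pseudo-Label_The_Simple_and_Efficient_Semi-Supervised_Learning_Method_for_Deep_Neural_Networks/links/55bc4ada08ae092e9660b776/Pseudo-Label-The-Simple-and-Efficient-Semi-Supervised-Learning-Method-for-Deep-Neural-Networks.pdf).
+Consistency regularization usually needs careful design. Pseudo-labeling is simpler and extends more easily to downstream tasks. We mainly use a pseudo-label-based teacher-student joint training framework for semi-supervised object detection. This framework needs different data pipelines for labeled and unlabeled data.
+
+(1) Data pipeline for labeled data: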
+
+```python
+# pipeline used to augment labeled data,
+# which will be sent to student model for supervised training.
+sup_pipeline = [
+    dict(type='LoadImageFromFile', backend_args=backend_args),
+    dict(type='LoadAnnotations', with_bbox=True),
+    dict(type='RandomResize', scale=scale, keep_ratio=True),
+    dict(type='RandomFlip', prob=0.5),
+    dict(type='RandAugment', aug_space=color_space, aug_num=1),
+    dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)),
+    dict(type='MultiBranch', sup=dict(type='PackDetInputs'))
+]
+```
+
+(2) Data pipelines for unlabeled data:
+
+```python
+# pipeline used to augment unlabeled data weakly,
+# which will be sent to teacher model for predicting pseudo instances.
+weak_pipeline = [
+    dict(type='RandomResize', scale=scale, keep_ratio=True),
+    dict(type='RandomFlip', prob=0.5),
+    dict(
+        type='PackDetInputs',
+        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                   'scale_factor', 'flip', 'flip_direction',
+                   'homography_matrix')),
+]
+
+# pipeline used to augment unlabeled data strongly,
+# which will be sent to student model for unsupervised training.
+strong_pipeline = [
+    dict(type='RandomResize', scale=scale, keep_ratio=True),
+    dict(type='RandomFlip', prob=0.5),
+    dict(
+        type='RandomOrder',
+        transforms=[
+            dict(type='RandAugment', aug_space=color_space, aug_num=1),
+            dict(type='RandAugment', aug_space=geometric, aug_num=1),
+        ]),
+    dict(type='RandomErasing', n_patches=(1, 5), ratio=(0, 0.2)),
+    dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)),
+    dict(
+        type='PackDetInputs',
+        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                   'scale_factor', 'flip', 'flip_direction',
+                   'homography_matrix')),
+]
+
+# pipeline used to augment unlabeled data into different views
+unsup_pipeline = [
+    dict(type='LoadImageFromFile', backend_args=backend_args),
+    dict(type='LoadEmptyAnnotations'),
+    dict(
+        type='MultiBranch',
+        unsup_teacher=weak_pipeline,
+        unsup_student=strong_pipeline,
+    )
+]
+```
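Conceptually, `MultiBranch` sends one loaded sample through several pipelines and stores each result under its branch name. The teacher therefore sees a weakly augmented view and the student a strongly augmented view of the same image. The sketch below shows that idea in plain Python:

- `multi_branch`, `weak_flip`, and `strong_erase` are hypothetical stand-ins; the real transform also packs inputs for the model.
- Deep-copying the sample per branch is our assumption about how branches are kept independent.
- As far as we understand, the `homography_matrix` meta key kept in both unlabeled branches is what lets pseudo boxes predicted on the weak view be mapped onto the strong view.

```python
import copy


def multi_branch(results, **branch_pipelines):
    """Run a copy of one sample through each branch pipeline.

    Each branch gets its own deep copy, so the weak and strong
    augmentations cannot interfere with each other.
    """
    outputs = {}
    for branch, pipeline in branch_pipelines.items():
        branch_results = copy.deepcopy(results)
        for transform in pipeline:
            branch_results = transform(branch_results)
        outputs[branch] = branch_results
    return outputs


# Toy transforms standing in for real augmentations (hypothetical).
def weak_flip(r):
    r['img'] = r['img'][::-1]
    return r


def strong_erase(r):
    r['img'] = [0] + r['img'][1:]
    return r


sample = {'img': [1, 2, 3]}
views = multi_branch(sample,
                     unsup_teacher=[weak_flip],
                     unsup_student=[weak_flip, strong_erase])
print(views)
# {'unsup_teacher': {'img': [3, 2, 1]}, 'unsup_student': {'img': [0, 2, 1]}}
```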
+
+## Configure Semi-supervised Data Loading
+
+(1) Build the semi-supervised dataset. Use `ConcatDataset` to concatenate the labeled and unlabeled datasets.
+
+```python
+labeled_dataset = dict(
+    type=dataset_type,
+    data_root=data_root,
+    ann_file='annotations/instances_train2017.json',
+    data_prefix=dict(img='train2017/'),
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=sup_pipeline)
+
+unlabeled_dataset = dict(
+    type=dataset_type,
+    data_root=data_root,
+    ann_file='annotations/instances_unlabeled2017.json',
+    data_prefix=dict(img='unlabeled2017/'),
+    filter_cfg=dict(filter_empty_gt=False),
+    pipeline=unsup_pipeline)
+
+train_dataloader = dict(
+    batch_size=batch_size,
+    num_workers=num_workers,
+    persistent_workers=True,
+    sampler=dict(
+        type='GroupMultiSourceSampler',
+        batch_size=batch_size,
+        source_ratio=[1, 4]),
+    dataset=dict(
+        type='ConcatDataset', datasets=[labeled_dataset, unlabeled_dataset]))
+```
+
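+(2) Use a multi-source sampler. `GroupMultiSourceSampler` samples from `labeled_dataset` and `unlabeled_dataset` to form batches, and `source_ratio` controls the proportion of labeled and unlabeled data in each batch. `GroupMultiSourceSampler` also ensures that images in the same batch have similar aspect ratios. If you do not need this, use `MultiSourceSampler` instead. The sampling process of `GroupMultiSourceSampler` is illustrated below: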
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/40661020/186149261-8cf28e92-de5c-4c8c-96e1-13558b2e27f7.jpg"/>
+</div>
+
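+In the figure, `sup=1000` means the labeled dataset contains 1000 images. Of these, `sup_h=200` have an aspect ratio greater than or equal to 1 and `sup_w=800` have an aspect ratio less than 1. Likewise, `unsup=9000` means the unlabeled dataset contains 9000 images: `unsup_h=1800` have an aspect ratio greater than or equal to 1 and `unsup_w=7200` have an aspect ratio less than 1. For each batch, `GroupMultiSourceSampler` first picks one group at random, weighted by the overall aspect-ratio distribution of the labeled and unlabeled images. It then samples from the two datasets according to `source_ratio`. As a result, the labeled and unlabeled datasets are repeated a different number of times during training.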
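The two-step procedure (pick an aspect-ratio group, then fill the batch according to `source_ratio`) can be sketched in plain Python. This is a simplified illustration, not the `GroupMultiSourceSampler` source:

- `num_per_source` and `sample_group_batch` are hypothetical helpers.
- Letting the first source absorb any rounding remainder is an assumption.
- Indices are drawn with replacement instead of by cycling through shuffled per-source index lists.

The group sizes come from the figure.

```python
import random


def num_per_source(batch_size, source_ratio):
    """Split one batch across sources in proportion to source_ratio.

    The first source absorbs any rounding remainder (an assumption).
    """
    total = sum(source_ratio)
    counts = [int(batch_size * r / total) for r in source_ratio]
    counts[0] = batch_size - sum(counts[1:])
    return counts


def sample_group_batch(groups, batch_size, source_ratio, rng):
    """Draw one batch whose images all come from the same aspect-ratio group.

    groups: {group_name: [indices_of_source_0, indices_of_source_1, ...]}
    """
    # Pick a group with probability proportional to its total size.
    names = list(groups)
    sizes = [sum(len(src) for src in groups[n]) for n in names]
    group = rng.choices(names, weights=sizes, k=1)[0]
    batch = []
    for src, count in enumerate(num_per_source(batch_size, source_ratio)):
        # Simplification: draw with replacement from this source's group.
        batch += [(src, rng.choice(groups[group][src])) for _ in range(count)]
    return group, batch


# Group sizes from the figure: sup_h / unsup_h and sup_w / unsup_w.
groups = {
    'h': [list(range(200)), list(range(1800))],
    'w': [list(range(800)), list(range(7200))],
}
rng = random.Random(0)
print(num_per_source(5, [1, 4]))  # [1, 4]
group, batch = sample_group_batch(groups, 5, [1, 4], rng)
print(group, batch)
```

With `batch_size=5` and `source_ratio=[1, 4]`, every batch holds 1 labeled and 4 unlabeled images. Each labeled image is therefore drawn about 2.25 times as often as each unlabeled one (1/1000 versus 4/9000 per batch).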
+
+## Configure the Semi-supervised Model
+
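+We choose `Faster R-CNN` as the `detector` for semi-supervised training. Taking the semi-supervised object detection algorithm `SoftTeacher` as an example, the model config can inherit from `_base_/models/faster-rcnn_r50_fpn.py`, with the detector's backbone switched to the `caffe` style.
+Note that, unlike in a supervised training config, `Faster R-CNN` here is the `detector`, which is an attribute of `model` rather than `model` itself. In addition, set `data_preprocessor` to `MultiBranchDataPreprocessor`, which pads and normalizes the images produced by the different data pipelines.
+Finally, configure the parameters for semi-supervised training and testing through `semi_train_cfg` and `semi_test_cfg`.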
+
+```python
+_base_ = [
+    '../_base_/models/faster-rcnn_r50_fpn.py', '../_base_/default_runtime.py',
+    '../_base_/datasets/semi_coco_detection.py'
+]
+
+detector = _base_.model
+detector.data_preprocessor = dict(
+    type='DetDataPreprocessor',
+    mean=[103.530, 116.280, 123.675],
+    std=[1.0, 1.0, 1.0],
+    bgr_to_rgb=False,
+    pad_size_divisor=32)
+detector.backbone = dict(
+    type='ResNet',
+    depth=50,
+    num_stages=4,
+    out_indices=(0, 1, 2, 3),
+    frozen_stages=1,
+    norm_cfg=dict(type='BN', requires_grad=False),
+    norm_eval=True,
+    style='caffe',
+    init_cfg=dict(
+        type='Pretrained',
+        checkpoint='open-mmlab://detectron2/resnet50_caffe'))
+
+model = dict(
+    _delete_=True,
+    type='SoftTeacher',
+    detector=detector,
+    data_preprocessor=dict(
+        type='MultiBranchDataPreprocessor',
+        data_preprocessor=detector.data_preprocessor),
+    semi_train_cfg=dict(
+        freeze_teacher=True,
+        sup_weight=1.0,
+        unsup_weight=4.0,
+        pseudo_label_initial_score_thr=0.5,
+        rpn_pseudo_thr=0.9,
+        cls_pseudo_thr=0.9,
+        reg_pseudo_thr=0.02,
+        jitter_times=10,
+        jitter_scale=0.06,
+        min_pseudo_bbox_wh=(1e-2, 1e-2)),
+    semi_test_cfg=dict(predict_on='teacher'))
+```
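Conceptually, `sup_weight` and `unsup_weight` in `semi_train_cfg` scale two groups of losses before they are summed: the supervised losses (student on labeled data) and the unsupervised losses (student on pseudo-labeled data). The plain-Python sketch below shows that weighting. The helpers are written here for illustration, and the `sup_`/`unsup_` key prefixes are our assumption about how the two loss groups are told apart in the logs. This is not a copy of the `SoftTeacher` code.

```python
def rename_loss_dict(prefix, losses):
    """Prefix every loss name, e.g. 'loss_cls' -> 'sup_loss_cls'."""
    return {prefix + name: value for name, value in losses.items()}


def reweight_loss_dict(losses, weight):
    """Multiply every loss term by the same branch weight."""
    return {name: value * weight for name, value in losses.items()}


def combine_losses(sup_losses, unsup_losses, semi_train_cfg):
    """Merge weighted supervised and unsupervised losses into one dict."""
    sup_weight = semi_train_cfg.get('sup_weight', 1.0)
    unsup_weight = semi_train_cfg.get('unsup_weight', 1.0)
    losses = rename_loss_dict('sup_', reweight_loss_dict(sup_losses, sup_weight))
    losses.update(
        rename_loss_dict('unsup_', reweight_loss_dict(unsup_losses, unsup_weight)))
    return losses


semi_train_cfg = dict(sup_weight=1.0, unsup_weight=4.0)
losses = combine_losses({'loss_cls': 0.5}, {'loss_cls': 0.25}, semi_train_cfg)
print(losses)  # {'sup_loss_cls': 0.5, 'unsup_loss_cls': 1.0}
print(sum(losses.values()))  # 1.5
```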
+
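+Other detectors, such as `RetinaNet` and `Cascade R-CNN`, can also be trained in a semi-supervised manner. Because `SoftTeacher` only supports `Faster R-CNN`, replace it with `SemiBaseDetector` for these detectors, for example: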
+
+```python
+_base_ = [
+    '../_base_/models/retinanet_r50_fpn.py', '../_base_/default_runtime.py',
+    '../_base_/datasets/semi_coco_detection.py'
+]
+
+detector = _base_.model
+
+model = dict(
+    _delete_=True,
+    type='SemiBaseDetector',
+    detector=detector,
+    data_preprocessor=dict(
+        type='MultiBranchDataPreprocessor',
+        data_preprocessor=detector.data_preprocessor),
+    semi_train_cfg=dict(
+        freeze_teacher=True,
+        sup_weight=1.0,
+        unsup_weight=1.0,
+        cls_pseudo_thr=0.9,
+        min_pseudo_bbox_wh=(1e-2, 1e-2)),
+    semi_test_cfg=dict(predict_on='teacher'))
+```
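With `SemiBaseDetector`, the teacher's predictions on the weakly augmented view become pseudo labels for the student only if they pass the thresholds in `semi_train_cfg`. The sketch below shows that filtering in plain Python on `(x1, y1, x2, y2)` boxes. `filter_pseudo_instances` is a hypothetical helper, and the strict inequalities for both the score and the box-size checks are our assumption.

```python
def filter_pseudo_instances(bboxes, scores, labels, score_thr=0.9,
                            min_wh=(1e-2, 1e-2)):
    """Keep teacher predictions that are confident and not degenerate.

    bboxes: list of (x1, y1, x2, y2); score_thr plays the role of
    cls_pseudo_thr and min_wh the role of min_pseudo_bbox_wh.
    """
    kept = []
    for box, score, label in zip(bboxes, scores, labels):
        width, height = box[2] - box[0], box[3] - box[1]
        if score > score_thr and width > min_wh[0] and height > min_wh[1]:
            kept.append((box, score, label))
    return kept


teacher_boxes = [(0, 0, 10, 10), (5, 5, 5.001, 20), (2, 2, 8, 9)]
teacher_scores = [0.95, 0.97, 0.6]
teacher_labels = [1, 2, 1]
# The second box is too thin; the third is not confident enough.
print(filter_pseudo_instances(teacher_boxes, teacher_scores, teacher_labels))
# [((0, 0, 10, 10), 0.95, 1)]
```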
+
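+The results below reuse the semi-supervised training config of `SoftTeacher`, with `batch_size` changed to 2 and `source_ratio` changed to `[1, 1]`. They show supervised and semi-supervised training of `RetinaNet`, `Faster R-CNN`, `Cascade R-CNN`, and `SoftTeacher` on 10% of the COCO training set: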
+
+|      Model       |   Detector    | Backbone | Style | sup-0.1-coco mAP | semi-0.1-coco mAP |
+| :--------------: | :-----------: | :------: | :---: | :--------------: | :---------------: |
+| SemiBaseDetector |   RetinaNet   | R-50-FPN | caffe |       23.5       |       27.7        |
+| SemiBaseDetector | Faster R-CNN  | R-50-FPN | caffe |       26.7       |       28.4        |
+| SemiBaseDetector | Cascade R-CNN | R-50-FPN | caffe |       28.0       |       29.7        |
+|   SoftTeacher    | Faster R-CNN  | R-50-FPN | caffe |       26.7       |       31.1        |
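The gain from semi-supervised training is the difference between the two mAP columns. The snippet below computes it from the table values. The row labels are dictionary keys chosen for readability.

```python
# (supervised mAP, semi-supervised mAP) copied from the table above.
results = {
    'SemiBaseDetector / RetinaNet': (23.5, 27.7),
    'SemiBaseDetector / Faster R-CNN': (26.7, 28.4),
    'SemiBaseDetector / Cascade R-CNN': (28.0, 29.7),
    'SoftTeacher / Faster R-CNN': (26.7, 31.1),
}
# Round to one decimal to hide floating-point noise in the subtraction.
gains = {name: round(semi - sup, 1) for name, (sup, semi) in results.items()}
for name, gain in gains.items():
    print(f'{name}: +{gain} mAP')
```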
+
+## Configure MeanTeacherHook
+
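+The teacher model is usually updated as an exponential moving average (EMA) of the student model, so the teacher improves as the student is optimized. This is configured through `custom_hooks`: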
+
+```python
+custom_hooks = [dict(type='MeanTeacherHook')]
+```
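The EMA update can be sketched on a dictionary of scalar parameters. This illustrates the formula and is not the `MeanTeacherHook` source. We assume the momentum-style form `teacher = (1 - momentum) * teacher + momentum * student`, in which a small `momentum` makes the teacher change slowly. We believe the hook's default is 0.001, but check its docstring. The example uses `momentum=0.5` only so the effect is visible after three steps.

```python
def ema_update(teacher, student, momentum=0.001):
    """Update teacher parameters in place as an EMA of the student.

    teacher = (1 - momentum) * teacher + momentum * student
    """
    for name, value in student.items():
        teacher[name] = (1 - momentum) * teacher[name] + momentum * value
    return teacher


teacher = {'w': 0.0}
student = {'w': 1.0}
for _ in range(3):
    ema_update(teacher, student, momentum=0.5)
print(teacher)  # {'w': 0.875}
```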
+
+## Configure TeacherStudentValLoop
+
+The teacher-student framework contains two models, so you can replace `ValLoop` with `TeacherStudentValLoop` to evaluate the accuracy of both models during training.
+
+```python
+val_cfg = dict(type='TeacherStudentValLoop')
+```
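Conceptually, such a loop runs ordinary validation twice, once with each model's predictions, and keeps both sets of metrics under distinct names. The sketch below assumes two things:

- The model is switched through `semi_test_cfg['predict_on']`, the key used in the model config above.
- Metrics are prefixed with `teacher/` and `student/`.

`teacher_student_val` and the `evaluate` callback are hypothetical stand-ins, and the metric values are placeholders.

```python
def teacher_student_val(evaluate, semi_test_cfg):
    """Run validation once per model and prefix each metric with its source.

    evaluate: hypothetical callable that runs a normal validation pass with
    whichever model semi_test_cfg['predict_on'] currently selects.
    """
    original = semi_test_cfg.get('predict_on')
    metrics = {}
    for branch in ('teacher', 'student'):
        semi_test_cfg['predict_on'] = branch
        for name, value in evaluate().items():
            metrics[f'{branch}/{name}'] = value
    semi_test_cfg['predict_on'] = original  # restore the configured choice
    return metrics


semi_test_cfg = dict(predict_on='teacher')
placeholder = {'teacher': 0.1, 'student': 0.2}  # not real results
print(teacher_student_val(
    lambda: {'bbox_mAP': placeholder[semi_test_cfg['predict_on']]},
    semi_test_cfg))
# {'teacher/bbox_mAP': 0.1, 'student/bbox_mAP': 0.2}
```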
--- a/docs/zh_cn/user_guides/single_stage_as_rpn.md
+++ b/docs/zh_cn/user_guides/single_stage_as_rpn.md
+# Use a Single-stage Detector as RPN
+
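+The Region Proposal Network (RPN) is a submodule of [Faster R-CNN](https://arxiv.org/abs/1506.01497) that generates proposals for the second stage of Faster R-CNN. Most two-stage detectors in MMDetection use [`RPNHead`](../../../mmdet/models/dense_heads/rpn_head.py) as the RPN. However, any single-stage detector can serve as an RPN: its bounding-box predictions can be treated as region proposals and then refined in the R-CNN stage. MMDetection v3.0 therefore supports using single-stage detectors as RPNs.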
+
+The example below walks through the whole process of using the anchor-free single-stage detector [FCOS](../../../configs/fcos/fcos_r50-caffe_fpn_gn-head_1x_coco.py) as the RPN in [Faster R-CNN](../../../configs/faster_rcnn/faster-rcnn_r50_fpn_fcos-rpn_1x_coco.py).
+
+The main steps are as follows:
+
+1. Use `FCOSHead` as the `RPNHead` in Faster R-CNN
+2. Evaluate the proposals
+3. Train the customized Faster R-CNN with a pretrained FCOS
+
+## Use `FCOSHead` as the `RPNHead` in Faster R-CNN
+
+To use `FCOSHead` as the `RPNHead` in Faster R-CNN, create a config file named `configs/faster_rcnn/faster-rcnn_r50_fpn_fcos-rpn_1x_coco.py`. In it, replace the `rpn_head` settings with the `bbox_head` settings of FCOS. Keep the FCOS neck settings, whose strides are `[8, 16, 32, 64, 128]`, and update the `featmap_strides` of `bbox_roi_extractor` to `[8, 16, 32, 64, 128]` to match. To keep the loss from becoming NaN, apply warmup over the first 1000 iterations instead of the first 500, so the learning rate increases more slowly. The config is as follows:
+
+```python
+_base_ = [
+    '../_base_/models/faster-rcnn_r50_fpn.py',
+    '../_base_/datasets/coco_detection.py',
+    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
+]
+model = dict(
+    # copied from configs/fcos/fcos_r50-caffe_fpn_gn-head_1x_coco.py
+    neck=dict(
+        start_level=1,
+        add_extra_convs='on_output',  # use P5
+        relu_before_extra_convs=True),
+    rpn_head=dict(
+        _delete_=True,  # ignore the unused old settings
+        type='FCOSHead',
+        num_classes=1,  # num_classes = 1 for rpn; if num_classes > 1, it is set to 1 automatically in TwoStageDetector
+        in_channels=256,
+        stacked_convs=4,
+        feat_channels=256,
+        strides=[8, 16, 32, 64, 128],
+        loss_cls=dict(
+            type='FocalLoss',
+            use_sigmoid=True,
+            gamma=2.0,
+            alpha=0.25,
+            loss_weight=1.0),
+        loss_bbox=dict(type='IoULoss', loss_weight=1.0),
+        loss_centerness=dict(
+            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
+    roi_head=dict(  # update featmap_strides to match the strides of the neck
+        bbox_roi_extractor=dict(featmap_strides=[8, 16, 32, 64, 128])))
+# learning rate
+param_scheduler = [
+    dict(
+        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0,
+        end=1000),  # increase lr slowly, otherwise the loss becomes NaN
+    dict(
+        type='MultiStepLR',
+        begin=0,
+        end=12,
+        by_epoch=True,
+        milestones=[8, 11],
+        gamma=0.1)
+]
+```
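A few lines of arithmetic show the effect of the longer warmup. The sketch makes two assumptions:

- The warmup is linear: `factor = start_factor + (1 - start_factor) * it / end`, with the end factor of `LinearLR` left at its default of 1.0.
- The base learning rate is 0.02, which we believe is the SGD value in `schedule_1x.py`.

`linear_warmup_factor` is a hypothetical helper.

```python
def linear_warmup_factor(it, start_factor=0.001, end=1000):
    """Learning-rate multiplier at iteration `it` of a linear warmup.

    Assumes the factor rises linearly from start_factor to 1.0 over
    `end` iterations and stays at 1.0 afterwards.
    """
    if it >= end:
        return 1.0
    return start_factor + (1.0 - start_factor) * it / end


base_lr = 0.02  # assumed base learning rate
for it in (0, 250, 500, 1000):
    slow = base_lr * linear_warmup_factor(it, end=1000)
    fast = base_lr * linear_warmup_factor(it, end=500)
    print(f'iter {it}: lr={slow:.5f} (1000-iter warmup) vs {fast:.5f} (500-iter warmup)')
```

At iteration 250, the 1000-iteration warmup reaches about a quarter of the base learning rate. That is roughly half of what a 500-iteration warmup would use at the same point, which is the slower increase described above.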
+
+Then train the customized model with the following command. For more training commands, please refer to [here](train.md).
+
+```shell
+# train with 8 GPUs
+bash tools/dist_train.sh \
+    configs/faster_rcnn/faster-rcnn_r50_fpn_fcos-rpn_1x_coco.py \
+    8 \
+    --work-dir ./work_dirs/faster-rcnn_r50_fpn_fcos-rpn_1x_coco
+```
+
+## Evaluate the Proposals
+
+The quality of the proposals has a significant impact on detector performance, so we also provide a way to evaluate them. As above, create a new config file named `configs/rpn/fcos-rpn_r50_fpn_1x_coco.py` and replace its `rpn_head` settings with the `bbox_head` settings of FCOS.
+
+```python
+_base_ = [
+    '../_base_/models/rpn_r50_fpn.py', '../_base_/datasets/coco_detection.py',
+    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
+]
+val_evaluator = dict(metric='proposal_fast')
+test_evaluator = val_evaluator
+model = dict(
+    # copied from configs/fcos/fcos_r50-caffe_fpn_gn-head_1x_coco.py
+    neck=dict(
+        start_level=1,
+        add_extra_convs='on_output',  # use P5
+        relu_before_extra_convs=True),
+    rpn_head=dict(
+        _delete_=True,  # ignore the unused old settings
+        type='FCOSHead',
+        num_classes=1,  # num_classes = 1 for rpn; if num_classes > 1, it is set to 1 automatically in rpn
+        in_channels=256,
+        stacked_convs=4,
+        feat_channels=256,
+        strides=[8, 16, 32, 64, 128],
+        loss_cls=dict(
+            type='FocalLoss',
+            use_sigmoid=True,
+            gamma=2.0,
+            alpha=0.25,
+            loss_weight=1.0),
+        loss_bbox=dict(type='IoULoss', loss_weight=1.0),
+        loss_centerness=dict(
+            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
+```
+
+Suppose training has produced the checkpoint `./work_dirs/faster-rcnn_r50_fpn_fcos-rpn_1x_coco/epoch_12.pth`. You can then evaluate the quality of the proposals with the following command.
+
+```shell
+# test with 8 GPUs
+bash tools/dist_test.sh \
+    configs/rpn/fcos-rpn_r50_fpn_1x_coco.py \
+    ./work_dirs/faster-rcnn_r50_fpn_fcos-rpn_1x_coco/epoch_12.pth \
+    8
+```
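The `proposal_fast` metric measures how many ground-truth boxes the top-scoring proposals recover. The sketch below computes a simplified recall in plain Python: a ground-truth box counts as recalled if any of the top-N proposals overlaps it with an IoU of at least 0.5. This is our simplification; the actual metric uses one-to-one matching, several IoU thresholds, and larger proposal budgets. `iou` and `recall_at` are hypothetical helpers.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def recall_at(gt_boxes, proposals, num_proposals, iou_thr=0.5):
    """Fraction of GT boxes overlapped by any of the top-N proposals.

    proposals must already be sorted by descending score.
    """
    if not gt_boxes:
        return 0.0
    top = proposals[:num_proposals]
    hits = sum(any(iou(gt, p) >= iou_thr for p in top) for gt in gt_boxes)
    return hits / len(gt_boxes)


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
proposals = [(0, 0, 10, 11), (50, 50, 60, 60), (21, 21, 30, 30)]
for n in (1, 3):
    print(f'recall@{n}:', recall_at(gts, proposals, n))
# recall@1: 0.5
# recall@3: 1.0
```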
+
+## Train the Customized Faster R-CNN with a Pretrained FCOS
+
+Pretraining both speeds up convergence and improves detector performance. This section shows how to use a pretrained FCOS as the RPN to train faster and more accurately. Suppose we want to use `FCOSHead` as the `rpn_head` in Faster R-CNN and start from the pretrained weights of [`fcos_r50-caffe_fpn_gn-head_1x_coco`](https://download.openmmlab.com/mmdetection/v2.0/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco/fcos_r50_caffe_fpn_gn-head_1x_coco-821213aa.pth). The config file `configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_fcos-rpn_1x_coco.py` is shown below. Note that `fcos_r50-caffe_fpn_gn-head_1x_coco` uses the caffe version of ResNet-50, so you must update the pixel mean and std in `data_preprocessor`.
+
+```python
+_base_ = [
+    '../_base_/models/faster-rcnn_r50_fpn.py',
+    '../_base_/datasets/coco_detection.py',
+    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
+]
+model = dict(
+    data_preprocessor=dict(
+        mean=[103.530, 116.280, 123.675],
+        std=[1.0, 1.0, 1.0],
+        bgr_to_rgb=False),
+    backbone=dict(
+        norm_cfg=dict(type='BN', requires_grad=False),
+        style='caffe',
+        init_cfg=None),  # the checkpoint in ``load_from`` contains the weights of backbone
+    neck=dict(
+        start_level=1,
+        add_extra_convs='on_output',  # use P5
+        relu_before_extra_convs=True),
+    rpn_head=dict(
+        _delete_=True,  # ignore the unused old settings
+        type='FCOSHead',
+        num_classes=1,  # num_classes = 1 for rpn; if num_classes > 1, it is set to 1 automatically in TwoStageDetector
+        in_channels=256,
+        stacked_convs=4,
+        feat_channels=256,
+        strides=[8, 16, 32, 64, 128],
+        loss_cls=dict(
+            type='FocalLoss',
+            use_sigmoid=True,
+            gamma=2.0,
+            alpha=0.25,
+            loss_weight=1.0),
+        loss_bbox=dict(type='IoULoss', loss_weight=1.0),
+        loss_centerness=dict(
+            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
+    roi_head=dict(  # update featmap_strides due to the strides in neck
+        bbox_roi_extractor=dict(featmap_strides=[8, 16, 32, 64, 128])))
+load_from = 'https://download.openmmlab.com/mmdetection/v2.0/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco/fcos_r50_caffe_fpn_gn-head_1x_coco-821213aa.pth'
+```
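All three configs change `featmap_strides`. The reason is that the RoI extractor assigns each proposal to one feature level and then reads from the map with that level's stride. The sketch below mimics the level-assignment rule we understand `SingleRoIExtractor` to use: roughly level `floor(log2(sqrt(w * h) / 56))`, clamped to the available levels. `map_roi_levels` here is our own plain-Python version operating on `(x1, y1, x2, y2)` tuples, and `finest_scale=56` is an assumed default.

```python
import math


def map_roi_levels(rois, num_levels, finest_scale=56):
    """Assign each (x1, y1, x2, y2) RoI to a feature-level index.

    Larger RoIs go to coarser levels; indices are clamped to
    [0, num_levels - 1], where num_levels = len(featmap_strides).
    """
    levels = []
    for x1, y1, x2, y2 in rois:
        scale = math.sqrt((x2 - x1) * (y2 - y1))
        level = math.floor(math.log2(scale / finest_scale + 1e-6))
        levels.append(min(max(level, 0), num_levels - 1))
    return levels


featmap_strides = [8, 16, 32, 64, 128]  # must match the FCOS-style neck outputs
rois = [(0, 0, 32, 32), (0, 0, 112, 112), (0, 0, 448, 448), (0, 0, 2000, 2000)]
levels = map_roi_levels(rois, len(featmap_strides))
print([featmap_strides[lvl] for lvl in levels])  # [8, 16, 64, 128]
```

If `featmap_strides` kept the default four strides, RoIs would be pooled with a spatial scale that no longer matches the neck outputs, which is why each config overrides it.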
+
+The training command is as follows.
+
+```shell
+# train with 8 GPUs
+bash tools/dist_train.sh \
+    configs/faster_rcnn/faster-rcnn_r50-caffe_fpn_fcos-rpn_1x_coco.py \
+    8 \
+    --work-dir ./work_dirs/faster-rcnn_r50-caffe_fpn_fcos-rpn_1x_coco
+```
--- a/docs/zh_cn/user_guides/test.md
+++ b/docs/zh_cn/user_guides/test.md
+# Test Existing Models
+
+We provide testing scripts to evaluate an existing model on all supported datasets (COCO, Pascal VOC, Cityscapes, etc.). The following testing environments are supported:
+
+- single-GPU testing
+- CPU testing
+- single-node multi-GPU testing
+- multi-node testing
+
+Choose the appropriate script for your testing environment.
+
+```shell
+# single-GPU testing
+python tools/test.py \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    [--out ${RESULT_FILE}] \
+    [--show]
+
+# CPU testing: disable GPUs and run the single-GPU testing script
+export CUDA_VISIBLE_DEVICES=-1
+python tools/test.py \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    [--out ${RESULT_FILE}] \
+    [--show]
+
+# single-node multi-GPU testing
+bash tools/dist_test.sh \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    ${GPU_NUM} \
+    [--out ${RESULT_FILE}]
+```
+
+`tools/dist_test.sh` also supports multi-node testing, but it relies on PyTorch's [launch utility](https://pytorch.org/docs/stable/distributed.html#launch-utility).
+
+Optional arguments:
+
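+- `RESULT_FILE`: Name of the output results file, in pickle format (`.pkl`). If not specified, the results are not saved to a file.
+- `--show`: If specified, detection results are drawn on the images and shown in a new window. It only works for single-GPU testing and is meant for debugging and visualization. Make sure a GUI is available in your environment; otherwise you may see an error like `cannot connect to X server`.
+- `--show-dir`: If specified, detection results are drawn on the images and saved to the given directory. It only works for single-GPU testing and is meant for debugging and visualization. This option works even if no GUI is available.
+- `--cfg-options`: If specified, the key-value pairs are merged into the config file.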
+
+### Examples
+
+Assume that you have already downloaded the checkpoints to the `checkpoints/` directory.
+
+1. Test RTMDet and visualize the results. Press any key to move on to the next image. Config and checkpoint files are available [here](https://github.com/open-mmlab/mmdetection/tree/main/configs/rtmdet).
+
+   ```shell
+   python tools/test.py \
+       configs/rtmdet/rtmdet_l_8xb32-300e_coco.py \
+       checkpoints/rtmdet_l_8xb32-300e_coco_20220719_112030-5a0be7c4.pth \
+       --show
+   ```
+
+2. Test RTMDet and save the painted images for later visualization. Config and checkpoint files are available [here](https://github.com/open-mmlab/mmdetection/tree/main/configs/rtmdet).
+
+   ```shell
+   python tools/test.py \
+       configs/rtmdet/rtmdet_l_8xb32-300e_coco.py \
+       checkpoints/rtmdet_l_8xb32-300e_coco_20220719_112030-5a0be7c4.pth \
+       --show-dir rtmdet_l_8xb32-300e_coco_results
+   ```
+
+3. Test Faster R-CNN on Pascal VOC without saving the test results, and evaluate the `mAP`. Config and checkpoint files are available [here](../../../configs/pascal_voc).
+
+   ```shell
+   python tools/test.py \
+       configs/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712.py \
+       checkpoints/faster_rcnn_r50_fpn_1x_voc0712_20200624-c9895d40.pth
+   ```
+
+4. Test Mask R-CNN with 8 GPUs and evaluate `bbox` and `mAP`. Config and checkpoint files are available [here](../../../configs/mask_rcnn).
+
+   ```shell
+   ./tools/dist_test.sh \
+       configs/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py \
+       checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth \
+       8 \
+       --out results.pkl
+   ```
+
+5. Test Mask R-CNN with 8 GPUs and evaluate the **classwise** `bbox` and `mAP`. Config and checkpoint files are available [here](../../../configs/mask_rcnn).
+
+   ```shell
+   ./tools/dist_test.sh \
+       configs/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py \
+       checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth \
+       8
+   ```
+
+   This command generates two JSON files: `./work_dirs/coco_instance/test.bbox.json` and `./work_dirs/coco_instance/test.segm.json`.
+
+6. Test Mask R-CNN on COCO test-dev with 8 GPUs and generate JSON files for submission to the official evaluation server. Config and checkpoint files are available [here](../../../configs/mask_rcnn). Replace the original `test_evaluator` and `test_dataloader` with the ones given in the comments of the [config](../../../configs/_base_/datasets/coco_instance.py), then run:
+
+   ```shell
+   ./tools/dist_test.sh \
+       configs/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py \
+       checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth \
+       8
+   ```
+
+   This command generates two JSON files: `mask_rcnn_test-dev_results.bbox.json` and `mask_rcnn_test-dev_results.segm.json`.
+
+7. Test Mask R-CNN on Cityscapes with 8 GPUs, generate txt and png files, and upload them to the official evaluation server. Config and checkpoint files are available [here](../../../configs/cityscapes). Replace the original `test_evaluator` and `test_dataloader` with the ones given in the comments of the [config](../../../configs/_base_/datasets/cityscapes_instance.py), then run:
+
+   ```shell
+   ./tools/dist_test.sh \
+       configs/cityscapes/mask-rcnn_r50_fpn_1x_cityscapes.py \
+       checkpoints/mask_rcnn_r50_fpn_1x_cityscapes_20200227-afe51d5a.pth \
+       8
+   ```
+
+   The generated png and txt files are saved in the `./work_dirs/cityscapes_metric` folder.
+
+### Test without Ground-Truth Annotations
+
+MMDetection can test models without ground-truth annotations by using `CocoDataset`. If your dataset is not in COCO format, convert it first. For VOC or Cityscapes formats, use the scripts in [tools/dataset_converters](https://github.com/open-mmlab/mmdetection/tree/main/tools/dataset_converters). For other formats, use the [images2coco script](https://github.com/open-mmlab/mmdetection/tree/master/tools/dataset_converters/images2coco.py).
+
+```shell
+python tools/dataset_converters/images2coco.py \
+    ${IMG_PATH} \
+    ${CLASSES} \
+    ${OUT} \
+    [--exclude-extensions]
+```
+
+Arguments:
+
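+- `IMG_PATH`: Root path of the images.
+- `CLASSES`: Text file listing the categories, one category per line.
+- `OUT`: Output annotation JSON filename. By default, it is saved in the same directory level as `IMG_PATH`.
+- `--exclude-extensions`: File suffixes to exclude.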
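The conversion produces a COCO-style file with `images` and `categories` but no annotations. The sketch below builds that structure in plain Python. `build_coco_without_gt` and `read_classes` are hypothetical helpers, the image sizes are passed in rather than read from files, and starting the ids at 0 is an assumption that may differ from `images2coco.py`.

```python
import json


def build_coco_without_gt(image_infos, class_names):
    """Build a COCO-style dict with images and categories but no annotations.

    image_infos: list of (file_name, width, height). A real converter would
    read the sizes from the image files; here they are passed in.
    """
    images = [
        dict(id=i, file_name=name, width=w, height=h)
        for i, (name, w, h) in enumerate(image_infos)
    ]
    categories = [dict(id=i, name=name) for i, name in enumerate(class_names)]
    return dict(images=images, annotations=[], categories=categories)


def read_classes(text):
    """Parse a CLASSES file body: one category name per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


classes = read_classes('person\ncar\n')
coco = build_coco_without_gt([('a.jpg', 640, 480), ('b.jpg', 800, 600)], classes)
print(json.dumps(coco, indent=2))
```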
+
+After the conversion, test with the following commands:
+
+```shell
+# single-GPU testing
+python tools/test.py \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    [--show]
+
+# CPU testing: disable GPUs and run the single-GPU testing script
+export CUDA_VISIBLE_DEVICES=-1
+python tools/test.py \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    [--out ${RESULT_FILE}] \
+    [--show]
+
+# single-node multi-GPU testing
+bash tools/dist_test.sh \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    ${GPU_NUM} \
+    [--show]
+```
+
+Assuming the checkpoints from the [model zoo](https://mmdetection.readthedocs.io/en/latest/modelzoo_statistics.html) have been downloaded to the `checkpoints/` folder,
+the following command tests Mask R-CNN on COCO test-dev with 8 GPUs and generates JSON files.
+
+```sh
+./tools/dist_test.sh \
+    configs/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py \
+    checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth \
+    8
+```
+
+This command generates two JSON files: `./work_dirs/coco_instance/test.bbox.json` and `./work_dirs/coco_instance/test.segm.json`.
+
+### Batch Inference
+
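+In test mode, MMDetection supports both single-image inference and batch inference. Single-image testing is the default; you can enable batch testing by changing `batch_size` of `test_dataloader` in the test data config.
+The config can be modified as follows: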
+
+```python
+# other fields of test_dataloader are inherited from the base config
+test_dataloader = dict(batch_size=2)
+```
+
+Alternatively, enable it with `--cfg-options test_dataloader.batch_size=` followed by the desired batch size.
+
+## Test Time Augmentation (TTA)
+
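+Test time augmentation (TTA) is a data augmentation strategy applied at test time. The model runs inference on several augmented versions of the same image, for example flipped and rescaled copies, and the predictions for all of them are merged into a more accurate result. To make TTA easier to use, MMEngine provides the [BaseTTAModel](https://mmengine.readthedocs.io/en/latest/api/generated/mmengine.model.BaseTTAModel.html#mmengine.model.BaseTTAModel) class, which users can extend to implement their own TTA strategies.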
+
+MMDetection provides the [DetTTAModel](../../../mmdet/models/test_time_augs/det_tta.py) class, which inherits from BaseTTAModel.
+
+### Use Case
+
+Using TTA takes two steps. First, add `tta_model` and `tta_pipeline` to the config file:
+
+```python
+tta_model = dict(
+    type='DetTTAModel',
+    tta_cfg=dict(
+        nms=dict(type='nms', iou_threshold=0.5),
+        max_per_img=100))
+
+tta_pipeline = [
+    dict(type='LoadImageFromFile', backend_args=None),
+    dict(
+        type='TestTimeAug',
+        transforms=[
+            [dict(type='Resize', scale=(1333, 800), keep_ratio=True)],
+            [  # It uses 2 flipping transformations (flipping and not flipping).
+                dict(type='RandomFlip', prob=1.),
+                dict(type='RandomFlip', prob=0.)
+            ],
+            [
+                dict(
+                    type='PackDetInputs',
+                    meta_keys=('img_id', 'img_path', 'ori_shape',
+                               'img_shape', 'scale_factor', 'flip',
+                               'flip_direction'))
+            ]
+        ])
+]
+```
+
+Second, pass the `--tta` argument when running the test scripts:
+
+```shell
+# single-GPU testing
+python tools/test.py \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    [--tta]
+
+# CPU testing: disable GPUs and run the single-GPU testing script
+export CUDA_VISIBLE_DEVICES=-1
+python tools/test.py \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    [--out ${RESULT_FILE}] \
+    [--tta]
+
+# multi-GPU testing
+bash tools/dist_test.sh \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    ${GPU_NUM} \
+    [--tta]
+```
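With the `tta_cfg` above, the predictions from all augmented views are merged by NMS with an IoU threshold of 0.5, keeping at most 100 boxes per image. The plain-Python sketch below illustrates that merge step under two assumptions:

- Boxes from each view have already been mapped back to original-image coordinates (for example, un-flipped).
- Suppression only happens between boxes of the same class.

`merge_tta_predictions` and `iou` are hypothetical helpers, not the `DetTTAModel` implementation.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_tta_predictions(views, iou_threshold=0.5, max_per_img=100):
    """Greedy per-class NMS over the detections of all augmented views.

    views: one list per augmented view, each holding (box, score, label)
    tuples whose boxes are already in original-image coordinates.
    """
    dets = sorted((d for view in views for d in view),
                  key=lambda d: d[1], reverse=True)
    keep = []
    for box, score, label in dets:
        # Suppress a box only if a kept box of the same class overlaps it.
        if all(label != k_label or iou(box, k_box) <= iou_threshold
               for k_box, _, k_label in keep):
            keep.append((box, score, label))
            if len(keep) == max_per_img:
                break
    return keep


original_view = [((0, 0, 10, 10), 0.9, 0)]
flipped_view = [((1, 0, 11, 10), 0.8, 0), ((20, 20, 30, 30), 0.7, 1)]
merged = merge_tta_predictions([original_view, flipped_view])
print(merged)  # [((0, 0, 10, 10), 0.9, 0), ((20, 20, 30, 30), 0.7, 1)]
```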
+
+You can also customize the TTA config, for example by adding scaling augmentation:
+
+```python
+tta_model = dict(
+    type='DetTTAModel',
+    tta_cfg=dict(
+        nms=dict(type='nms', iou_threshold=0.5),
+        max_per_img=100))
+
+img_scales = [(1333, 800), (666, 400), (2000, 1200)]
+tta_pipeline = [
+    dict(type='LoadImageFromFile', backend_args=None),
+    dict(
+        type='TestTimeAug',
+        transforms=[
+            [dict(type='Resize', scale=s, keep_ratio=True) for s in img_scales],
+            [
+                dict(type='RandomFlip', prob=1.),
+                dict(type='RandomFlip', prob=0.)
+            ],
+            [
+                dict(
+                    type='PackDetInputs',
+                    meta_keys=('img_id', 'img_path', 'ori_shape',
+                               'img_shape', 'scale_factor', 'flip',
+                               'flip_direction'))
+            ]
+        ])
+]
+```
+
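+This augmentation pipeline first applies 3 multi-scale transformations to the image, then 2 flipping transformations (flipped and not flipped), and finally packs the images into the final results with PackDetInputs.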
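Every scale is combined with every flip option, so this pipeline produces 3 × 2 = 6 augmented views per image. The snippet below enumerates them with `itertools.product`. It mirrors our understanding that `TestTimeAug` takes the Cartesian product of its transform lists, and it represents each view as a small dict rather than a real transformed image.

```python
from itertools import product

img_scales = [(1333, 800), (666, 400), (2000, 1200)]
flip_options = [True, False]  # RandomFlip with prob=1. and prob=0.

# One view per (scale, flip) combination, in the order they are expanded.
views = [dict(scale=s, flip=f) for s, f in product(img_scales, flip_options)]
print(len(views))  # 6
for view in views:
    print(view)
```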
+Here are more TTA use cases for reference:
+
+- [RetinaNet](../../../configs/retinanet/retinanet_tta.py)
+- [CenterNet](../../../configs/centernet/centernet_tta.py)
+- [YOLOX](../../../configs/yolox/yolox_tta.py)
+- [RTMDet](../../../configs/rtmdet/rtmdet_tta.py)
+
+For more advanced usage and the data flow of TTA, please refer to [MMEngine](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/test_time_augmentation.html#data-flow). TTA for instance segmentation will be supported in the future.
--- a/docs/zh_cn/user_guides/test_results_submission.md
+++ b/docs/zh_cn/user_guides/test_results_submission.md
+# Test Results Submission
+
+## Panoptic Segmentation Test Results Submission
+
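+The following sections describe how to generate predictions of a panoptic segmentation model on COCO test-dev and submit them to the [COCO evaluation server](https://competitions.codalab.org/competitions/19507).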
+
+### Prerequisites
+
+- Download the [COCO test dataset images](http://images.cocodataset.org/zips/test2017.zip), the [testing image info](http://images.cocodataset.org/annotations/image_info_test2017.zip), and the [panoptic train/val annotations](http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip), then unzip them. Put `test2017` in `data/coco/`, and put the json files and annotation files in `data/coco/annotations/`.
+
+```shell
+# suppose data/coco/ does not exist
+mkdir -pv data/coco/
+# download test2017
+wget -P data/coco/ http://images.cocodataset.org/zips/test2017.zip
+wget -P data/coco/ http://images.cocodataset.org/annotations/image_info_test2017.zip
+wget -P data/coco/ http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip
+# unzip them
+unzip data/coco/test2017.zip -d data/coco/
+unzip data/coco/image_info_test2017.zip -d data/coco/
+unzip data/coco/panoptic_annotations_trainval2017.zip -d data/coco/
+# remove the zip files (optional)
+rm -rf data/coco/test2017.zip data/coco/image_info_test2017.zip data/coco/panoptic_annotations_trainval2017.zip
+```
+
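+- Run the following code to update the category information in the testing image info. The categories in `image_info_test-dev2017.json` lack the `isthing` attribute, so they must be replaced with the categories from `panoptic_val2017.json`.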
+
+```shell
+python tools/misc/gen_coco_panoptic_test_info.py data/coco/annotations
+```
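The idea behind this step can be sketched with the standard `json` module. This is a hypothetical re-implementation, not `gen_coco_panoptic_test_info.py` itself. `add_isthing_categories` copies the `categories` of `panoptic_val2017.json`, which carry the `isthing` flag, into the test-dev image info. It then writes `panoptic_image_info_test-dev2017.json`, the file name shown in the directory tree.

```python
import json
import os


def add_isthing_categories(ann_dir):
    """Write panoptic_image_info_test-dev2017.json with panoptic categories."""
    with open(os.path.join(ann_dir, 'image_info_test-dev2017.json')) as f:
        test_info = json.load(f)
    with open(os.path.join(ann_dir, 'panoptic_val2017.json')) as f:
        panoptic_val = json.load(f)
    # The panoptic categories carry the `isthing` flag that test-dev lacks.
    test_info['categories'] = panoptic_val['categories']
    out_path = os.path.join(ann_dir, 'panoptic_image_info_test-dev2017.json')
    with open(out_path, 'w') as f:
        json.dump(test_info, f)
    return out_path
```

Calling `add_isthing_categories('data/coco/annotations')` after the downloads above would produce the file in the same directory.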
+
+After the preparation above, your `data` directory structure should look like this:
+
+```text
+data
+`-- coco
+    |-- annotations
+    |   |-- image_info_test-dev2017.json
+    |   |-- image_info_test2017.json
+    |   |-- panoptic_image_info_test-dev2017.json
+    |   |-- panoptic_train2017.json
+    |   |-- panoptic_train2017.zip
+    |   |-- panoptic_val2017.json
+    |   `-- panoptic_val2017.zip
+    `-- test2017
+```
+
+### Inference on COCO test-dev
+
+To run inference on COCO test-dev, first update the settings of `test_dataloader` and `test_evaluator`. You can do this in two ways: 1. in the config file; 2. on the command line.
+
+#### Update Them in the Config File
+
+The relevant settings are at the end of `configs/_base_/datasets/coco_panoptic.py`, as shown below.
+
+```python
+test_dataloader = dict(
+    batch_size=1,
+    num_workers=1,
+    persistent_workers=True,
+    drop_last=False,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=dict(
+        type=dataset_type,
+        data_root=data_root,
+        ann_file='annotations/panoptic_image_info_test-dev2017.json',
+        data_prefix=dict(img='test2017/'),
+        test_mode=True,
+        pipeline=test_pipeline))
+test_evaluator = dict(
+    type='CocoPanopticMetric',
+    format_only=True,
+    ann_file=data_root + 'annotations/panoptic_image_info_test-dev2017.json',
+    outfile_prefix='./work_dirs/coco_panoptic/test')
+```
+
+Either of the following approaches updates the inference settings for COCO test-dev:
+
+Case 1: Directly uncomment the settings in `configs/_base_/datasets/coco_panoptic.py`.
+
+Case 2: Copy the following settings into the config file you are currently using.
+
+```python
+test_dataloader = dict(
+    dataset=dict(
+        ann_file='annotations/panoptic_image_info_test-dev2017.json',
+        data_prefix=dict(img='test2017/', _delete_=True)))
+test_evaluator = dict(
+    format_only=True,
+    ann_file=data_root + 'annotations/panoptic_image_info_test-dev2017.json',
+    outfile_prefix='./work_dirs/coco_panoptic/test')
+```
+
+Then run inference on COCO test-dev with the following command.
+
+```shell
+python tools/test.py \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE}
+```
+
+#### Update Them on the Command Line
+
+The following commands update the relevant settings and run inference on COCO test-dev.
+
+```shell
+# test with one GPU
+CUDA_VISIBLE_DEVICES=0 python tools/test.py \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    --cfg-options \
+    test_dataloader.dataset.ann_file=annotations/panoptic_image_info_test-dev2017.json \
+    test_dataloader.dataset.data_prefix.img=test2017 \
+    test_dataloader.dataset.data_prefix._delete_=True \
+    test_evaluator.format_only=True \
+    test_evaluator.ann_file=data/coco/annotations/panoptic_image_info_test-dev2017.json \
+    test_evaluator.outfile_prefix=${WORK_DIR}/results
+# test with four GPUs
+CUDA_VISIBLE_DEVICES=0,1,3,4 bash tools/dist_test.sh \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    4 \
+    --cfg-options \
+    test_dataloader.dataset.ann_file=annotations/panoptic_image_info_test-dev2017.json \
+    test_dataloader.dataset.data_prefix.img=test2017 \
+    test_dataloader.dataset.data_prefix._delete_=True \
+    test_evaluator.format_only=True \
+    test_evaluator.ann_file=data/coco/annotations/panoptic_image_info_test-dev2017.json \
+    test_evaluator.outfile_prefix=${WORK_DIR}/results
+# test with slurm
+GPUS=8 tools/slurm_test.sh \
+    ${Partition} \
+    ${JOB_NAME} \
+    ${CONFIG_FILE} \
+    ${CHECKPOINT_FILE} \
+    --cfg-options \
+    test_dataloader.dataset.ann_file=annotations/panoptic_image_info_test-dev2017.json \
+    test_dataloader.dataset.data_prefix.img=test2017 \
+    test_dataloader.dataset.data_prefix._delete_=True \
+    test_evaluator.format_only=True \
+    test_evaluator.ann_file=data/coco/annotations/panoptic_image_info_test-dev2017.json \
+    test_evaluator.outfile_prefix=${WORK_DIR}/results
+```
+
+Example: suppose we run inference on `test2017` with a pretrained MaskFormer that uses a ResNet-50 backbone.
+
+```shell
+# test with a single GPU
+CUDA_VISIBLE_DEVICES=0 python tools/test.py \
+    configs/maskformer/maskformer_r50_mstrain_16x1_75e_coco.py \
+    checkpoints/maskformer_r50_mstrain_16x1_75e_coco_20220221_141956-bc2699cb.pth \
+    --cfg-options \
+    test_dataloader.dataset.ann_file=annotations/panoptic_image_info_test-dev2017.json \
+    test_dataloader.dataset.data_prefix.img=test2017 \
+    test_dataloader.dataset.data_prefix._delete_=True \
+    test_evaluator.format_only=True \
+    test_evaluator.ann_file=data/coco/annotations/panoptic_image_info_test-dev2017.json \
+    test_evaluator.outfile_prefix=work_dirs/maskformer/results
+```
+
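+### Rename files and zip results
+
+After inference, the panoptic segmentation results (a JSON file and a directory that stores the masks) will be in `WORK_DIR`. Rename them according to the naming convention described on [COCO's Website](https://cocodataset.org/#upload). Finally, compress the JSON file and the mask directory into a zip file, and rename the zip file according to the same convention. Note that the zip file should **directly** contain these two items.
+
+Commands to rename files and zip results: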
+
+```shell
+# In WORK_DIR, we have the panoptic segmentation results: 'panoptic' and 'results.panoptic.json'.
+cd ${WORK_DIR}
+# replace '[algorithm_name]' with the name of the algorithm you used
+mv ./panoptic ./panoptic_test-dev2017_[algorithm_name]_results
+mv ./results.panoptic.json ./panoptic_test-dev2017_[algorithm_name]_results.json
+zip panoptic_test-dev2017_[algorithm_name]_results.zip -ur panoptic_test-dev2017_[algorithm_name]_results panoptic_test-dev2017_[algorithm_name]_results.json
+```
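If the `zip` command is unavailable, Python's standard `zipfile` module can produce the same package. The sketch below assumes the files in `WORK_DIR` have already been renamed as shown above. `pack_panoptic_results` is a hypothetical helper, and `algorithm_name` stands in for `[algorithm_name]`. Each archive entry is stored with a path relative to `WORK_DIR`, which keeps the JSON file and the mask directory at the top level of the archive.

```python
import zipfile
from pathlib import Path


def pack_panoptic_results(work_dir: str, algorithm_name: str) -> Path:
    """Zip the renamed mask directory and JSON file so both sit at the archive root."""
    root = Path(work_dir)
    stem = f'panoptic_test-dev2017_{algorithm_name}_results'
    mask_dir = root / stem
    json_file = root / f'{stem}.json'
    zip_path = root / f'{stem}.zip'
    with zipfile.ZipFile(zip_path, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname is taken relative to work_dir, so no parent folders end up in the archive
        zf.write(json_file, arcname=json_file.name)
        for path in sorted(mask_dir.rglob('*')):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(root).as_posix())
    return zip_path
```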
--- a/docs/zh_cn/user_guides/tracking_analysis_tools.md
+++ b/docs/zh_cn/user_guides/tracking_analysis_tools.md
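+**We provide many useful tools under the `tools/` directory.**
+
+## MOT test-time parameter search
+
+`tools/analysis_tools/mot/mot_param_search.py` can search the parameters of the `tracker` in MOT models.
+It is used in the same way as `tools/test.py`, but its config is **different**.
+
+Here is an example of how to modify the config:
+
+1. Define the desired evaluation metrics to record.
+
+   For example, you can define the `evaluator` as: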
+
+   ```python
+   test_evaluator=dict(type='MOTChallengeMetric', metric=['HOTA', 'CLEAR', 'Identity'])
+   ```
+
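+   Of course, you can also customize the content of `metric` in `test_evaluator`. You are free to choose one or more of `['HOTA', 'CLEAR', 'Identity']`.
+
+2. Define the parameters to search and their values.
+
+   Suppose you have a `tracker` config as follows: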
+
+   ```python
+   model=dict(
+       tracker=dict(
+           type='BaseTracker',
+           obj_score_thr=0.5,
+           match_iou_thr=0.5
+       )
+   )
+   ```
+
+   If you want to search the parameters of the `tracker`, just change their values to lists, as shown below:
+
+   ```python
+   model=dict(
+       tracker=dict(
+           type='BaseTracker',
+           obj_score_thr=[0.4, 0.5, 0.6],
+           match_iou_thr=[0.4, 0.5, 0.6, 0.7]
+       )
+   )
+   ```
+
+   The script will then test 3 × 4 = 12 cases in total and record the results.
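The count of 12 is the number of combinations of the listed values: 3 values of `obj_score_thr` × 4 values of `match_iou_thr`. The sketch below reproduces that expansion with `itertools.product`. `expand_search_space` is a hypothetical helper for illustration and is not the code inside `mot_param_search.py`.

```python
import itertools


def expand_search_space(tracker_cfg: dict) -> list:
    """Turn list-valued tracker options into one concrete config per combination."""
    keys = [k for k, v in tracker_cfg.items() if isinstance(v, list)]
    fixed = {k: v for k, v in tracker_cfg.items() if k not in keys}
    combos = []
    for values in itertools.product(*(tracker_cfg[k] for k in keys)):
        cfg = dict(fixed)          # non-searched options are copied unchanged
        cfg.update(zip(keys, values))
        combos.append(cfg)
    return combos


tracker = dict(type='BaseTracker',
               obj_score_thr=[0.4, 0.5, 0.6],
               match_iou_thr=[0.4, 0.5, 0.6, 0.7])
candidates = expand_search_space(tracker)
print(len(candidates))  # 12
```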
+
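+## MOT error visualization
+
+`tools/analysis_tools/mot/mot_error_visualize.py` can visualize errors for multiple object tracking.
+
+This script needs inference results as input. By default, **red** bounding boxes denote false positives, **yellow** bounding boxes denote false negatives, and **blue** bounding boxes denote ID switches.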
+
+```shell
+python tools/analysis_tools/mot/mot_error_visualize.py \
+    ${CONFIG_FILE} \
+    --input ${INPUT} \
+    --result-dir ${RESULT_DIR} \
+    [--output-dir ${OUTPUT}] \
+    [--fps ${FPS}] \
+    [--show] \
+    [--backend ${BACKEND}]
+```
+
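+`RESULT_DIR` contains the inference results of all videos; each result is a `txt` file.
+
+Optional arguments:
+
+- `OUTPUT`: Output of the visualized demo. If not specified, `--show` is required to show the video on the fly.
+- `FPS`: FPS of the output video.
+- `--show`: Whether to show the video on the fly.
+- `BACKEND`: The backend used to visualize the bounding boxes. Options are `cv2` and `plt`.
+
+## Browse the dataset
+
+`tools/analysis_tools/mot/browse_dataset.py` can visualize the training dataset to check whether the dataset config is correct.
+
+**Example:**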
+
+```shell
+python tools/analysis_tools/mot/browse_dataset.py ${CONFIG_FILE} [--show-interval ${SHOW_INTERVAL}]
+```
+
+Optional arguments:
+
+- `SHOW_INTERVAL`: The display interval in seconds.
+- `--show`: Whether to show the images on the fly.
--- a/docs/zh_cn/user_guides/tracking_config.md
+++ b/docs/zh_cn/user_guides/tracking_config.md
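+# Learn more about configs
+
+We use Python files as our config system. You can find all the provided config files under `MMDetection/configs`.
+
+We incorporate modular and inheritance design into our config system, which makes it convenient to conduct various experiments. If you want to check a config file, you can run `python tools/misc/print_config.py /PATH/TO/CONFIG` to see the complete config.
+
+## A brief description of a complete config
+
+A complete config usually contains the following primary fields:
+
+`model`: the basic config of a model, which includes modules such as `data_preprocessor`, `detector` and `motion`, as well as `train_cfg`, `test_cfg`, etc.;
+
+`train_dataloader`: the config of the training dataloader, which usually includes `batch_size`, `num_workers`, `sampler`, `dataset`, etc.;
+
+`val_dataloader`: the config of the validation dataloader, similar to that of the training dataloader;
+
+`test_dataloader`: the config of the test dataloader, similar to that of the training dataloader;
+
+`val_evaluator`: the config of the validation evaluator, e.g. `type='MOTChallengeMetric'` for the metrics of MOT tasks;
+
+`test_evaluator`: the config of the test evaluator, similar to that of the validation evaluator;
+
+`train_cfg`: the config of the training loop, e.g. `type='EpochBasedTrainLoop'`;
+
+`val_cfg`: the config of the validation loop, e.g. `type='VideoValLoop'`;
+
+`test_cfg`: the config of the test loop, e.g. `type='VideoTestLoop'`;
+
+`default_hooks`: the config of the default hooks, which include the timer, logger, parameter scheduler, checkpoint, sampler seed and visualization hooks;
+
+`vis_backends`: the config of the visualization backends; `type='LocalVisBackend'` is used by default;
+
+`visualizer`: the config of the visualizer, e.g. `type='TrackLocalVisualizer'` for MOT tasks;
+
+`param_scheduler`: the config of the parameter schedulers, which usually set the learning rate schedulers;
+
+`optim_wrapper`: the config of the optimizer wrapper, which contains optimization-related information such as the optimizer and gradient clipping;
+
+`load_from`: the path of the pretrained model to load;
+
+`resume`: a boolean. If `True`, the checkpoint is loaded from `load_from`, and training resumes from the iteration stored in the checkpoint.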
+
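+## Modify configs through script arguments
+
+When running tasks with `tools/train.py` or `tools/test_tracking.py`, you can specify `--cfg-options` to modify the config in place. Some examples are given below. For more details, please refer to [MMEngine](https://mmengine.readthedocs.io/zh_CN/latest/advanced_tutorials/config.html).
+
+### Update config keys of dict chains
+
+Config options can be specified following the order of the dict keys in the original config. For example, `--cfg-options model.detector.backbone.norm_eval=False` changes all `BN` modules in the model backbone to training mode.
+
+### Update keys inside a list of configs
+
+Some config dicts are composed as a list. For example, the test pipeline `test_dataloader.dataset.pipeline` is a list, i.e. `[dict(type='LoadImageFromFile'), ...]`. If you want to change `LoadImageFromFile` to `LoadImageFromWebcam` in the test pipeline, you can specify `--cfg-options test_dataloader.dataset.pipeline.0.type=LoadImageFromWebcam`.
+
+### Update values of lists/tuples
+
+The value to be updated may be a list or a tuple. For example, you can change the `mean` of `data_preprocessor` by specifying `--cfg-options model.data_preprocessor.mean=[0,0,0]`. Note that no whitespace is allowed inside the specified value.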
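The following simplified, pure-Python sketch shows how a `--cfg-options` entry such as `model.data_preprocessor.mean=[0,0,0]` could be applied to a nested config. In this sketch, dots walk into dicts, integer segments index into lists, and the value is parsed as a Python literal when possible. `apply_cfg_option` is a hypothetical helper, not MMEngine's parser; MMEngine's `DictAction` and `Config.merge_from_dict` handle more cases.

```python
import ast


def apply_cfg_option(cfg: dict, option: str) -> None:
    """Apply one 'a.b.0.c=value' override to a nested dict/list config in place."""
    key_path, raw_value = option.split('=', 1)
    try:
        value = ast.literal_eval(raw_value)  # "[0,0,0]" -> [0, 0, 0], "False" -> False
    except (ValueError, SyntaxError):
        value = raw_value  # bare words such as LoadImageFromWebcam stay strings
    *parents, last = key_path.split('.')
    node = cfg
    for part in parents:
        # integer segments index into lists, other segments are dict keys
        node = node[int(part)] if isinstance(node, list) else node[part]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


cfg = dict(
    model=dict(data_preprocessor=dict(mean=[123.675, 116.28, 103.53])),
    test_dataloader=dict(dataset=dict(pipeline=[dict(type='LoadImageFromFile')])))
apply_cfg_option(cfg, 'model.data_preprocessor.mean=[0,0,0]')
apply_cfg_option(cfg, 'test_dataloader.dataset.pipeline.0.type=LoadImageFromWebcam')
```

On a real command line, the shell splits arguments on whitespace. A value written as `[0, 0, 0]` would therefore reach the parser as three separate arguments, which is why the value above contains no spaces.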
+
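+## Config file structure
+
+There are three basic component types under `configs/_base_`: dataset, model and default runtime. Many methods, such as `SORT` and `DeepSORT`, can be easily constructed with them. Configs composed of components from `_base_` are called primitives.
+
+For all configs under the same folder, it is recommended to have only one primitive config. All other configs should inherit the basic structure from the primitive config. In this way, the maximum inheritance level is 3.
+
+For easy understanding, we recommend that contributors inherit from existing methods. For example, if some modification is made based on `Faster R-CNN`, users may first inherit the basic `Faster R-CNN` structure by specifying `_base_ = '../_base_/models/faster-rcnn_r50-dc5.py'`, and then modify the necessary fields in the config file.
+
+If you are building an entirely new method that does not share its structure with any existing method, you can create a new folder `method_name` under `configs`.
+
+For detailed documentation, please refer to [MMEngine](https://mmengine.readthedocs.io/zh_CN/latest/advanced_tutorials/config.html).
+
+## Config naming style
+
+We name config files in the following style, and contributors are advised to follow the same style.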
+
+`{method}_{module}_{train_cfg}_{train_data}_{test_data}`
+
+`{method}`: method name, e.g. `sort`;
+
+`{module}`: basic modules of the method, e.g. `faster-rcnn_r50_fpn`;
+
+`{train_cfg}`: training config, usually including batch size, number of epochs, etc., e.g. `8xb4-80e`;
+
+`{train_data}`: training dataset, e.g. `mot17halftrain`;
+
+`{test_data}`: test dataset, e.g. `test-mot17halfval`.
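As an example, the `{train_cfg}` value `8xb4-80e` means 8 GPUs × a batch of 4 images per GPU (32 images per iteration), trained for 80 epochs. The sketch below decodes this field from a config name. `parse_train_cfg` is a hypothetical helper. Its regular expression assumes the `{gpus}xb{batch}-{epochs}e` form shown above, so other schedules (for example iteration-based ones) are not handled.

```python
import re

# {gpus}xb{batch_per_gpu}-{epochs}e, e.g. '8xb4-80e'
TRAIN_CFG_PATTERN = re.compile(r'(?P<gpus>\d+)xb(?P<batch>\d+)-(?P<epochs>\d+)e')


def parse_train_cfg(config_name: str) -> dict:
    """Extract GPU count, per-GPU batch size and epochs from a config file name."""
    match = TRAIN_CFG_PATTERN.search(config_name)
    if match is None:
        raise ValueError(f'no {{gpus}}xb{{batch}}-{{epochs}}e field in {config_name!r}')
    gpus, batch, epochs = (int(match[name]) for name in ('gpus', 'batch', 'epochs'))
    return dict(gpus=gpus, batch_per_gpu=batch, total_batch=gpus * batch, epochs=epochs)


info = parse_train_cfg(
    'sort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py')
# info == {'gpus': 8, 'batch_per_gpu': 2, 'total_batch': 16, 'epochs': 4}
```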
+
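+## FAQ
+
+### Ignore some fields in the base configs
+
+Sometimes you can set `_delete_=True` to ignore some fields in the base configs. You can refer to [MMEngine](https://mmengine.readthedocs.io/zh_CN/latest/advanced_tutorials/config.html) for a simple illustration.
+
+### Introduction to the tracking data structure
+
+#### Advantages and new features
+
+In MMDetection tracking tasks, we organize datasets by video and use `TrackDataSample` to describe the dataset information.
+
+Based on this video organization, we provide the transform `UniformRefFrameSample` to sample key frames and reference frames, and use `TransformBroadcaster` for clip training.
+
+To some extent, `TrackDataSample` can be regarded as a wrapper of multiple `DetDataSample`s. It contains `video_data_samples`, a list of `DetDataSample`s, each corresponding to one frame. In addition, its meta information includes the indices of the key frames and of the reference frames, which are used for clip training.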
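A toy data structure can illustrate this relationship. The sketch below is hypothetical: `FrameSample` and `ClipSample` are stand-ins for `DetDataSample` and `TrackDataSample` and do not reproduce their real fields or methods. It only shows a per-frame list plus the key-frame and reference-frame indices that select from it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FrameSample:
    """Toy stand-in for one DetDataSample: a single frame and its boxes."""
    frame_id: int
    bboxes: List[List[float]] = field(default_factory=list)


@dataclass
class ClipSample:
    """Toy stand-in for TrackDataSample: per-frame samples plus key/reference indices."""
    video_data_samples: List[FrameSample]
    key_frames_inds: List[int]
    ref_frames_inds: List[int]

    def __len__(self) -> int:
        return len(self.video_data_samples)

    def key_frames(self) -> List[FrameSample]:
        return [self.video_data_samples[i] for i in self.key_frames_inds]

    def ref_frames(self) -> List[FrameSample]:
        return [self.video_data_samples[i] for i in self.ref_frames_inds]


# a 3-frame clip: frame 1 is the key frame, frames 0 and 2 are references
clip = ClipSample(
    video_data_samples=[FrameSample(frame_id=i) for i in range(3)],
    key_frames_inds=[1],
    ref_frames_inds=[0, 2])
```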
+
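+Thanks to the video-based data organization, an entire video can be tested directly, which is more concise and intuitive. If your GPU memory cannot hold the entire video, we also provide an image-based testing method.
+
+## TODO
+
+Algorithms such as `StrongSORT` and `Mask2Former` only support video-based testing, which is demanding on GPU memory. We will optimize this in the future.
+
+Currently, we do not support joint training with video-based datasets such as the `MOT Challenge dataset` and image-based datasets such as `CrowdHuman` for the `QDTrack` algorithm. We will improve this in the future.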
--- a/docs/zh_cn/user_guides/tracking_dataset_prepare.md
+++ b/docs/zh_cn/user_guides/tracking_dataset_prepare.md
+## Dataset Preparation
+
+This page provides instructions for preparing existing benchmark datasets, including:
+
+- Multiple Object Tracking
+
+  - [MOT Challenge](https://motchallenge.net/)
+  - [CrowdHuman](https://www.crowdhuman.org/)
+
+- Video Instance Segmentation
+
+  - [YouTube-VIS](https://youtube-vos.org/dataset/vis/)
+
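+### 1. Download Datasets
+
+Please download the datasets from the official websites and create a symlink from the root directory of each dataset to `$MMDETECTION/data`.
+
+#### 1.1 Multiple Object Tracking
+
+- For training and testing multiple object tracking tasks, you need one of the MOT Challenge datasets (e.g. MOT17, MOT20); CrowdHuman can serve as a complementary dataset.
+
+- Users in China can download the following datasets at high speed from [OpenDataLab](https://opendatalab.com/):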
+
+  - [MOT17](https://opendatalab.com/MOT17/download)
+  - [MOT20](https://opendatalab.com/MOT20/download)
+  - [CrowdHuman](https://opendatalab.com/CrowdHuman/download)
+
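+#### 1.2 Video Instance Segmentation
+
+- For training and testing video instance segmentation tasks, only one of the YouTube-VIS datasets (e.g. YouTube-VIS 2019, YouTube-VIS 2021) is needed.
+- YouTube-VIS 2019 can be downloaded from [YouTubeVOS](https://codalab.lisn.upsaclay.fr/competitions/6064).
+- YouTube-VIS 2021 can be downloaded from [YouTubeVOS](https://codalab.lisn.upsaclay.fr/competitions/7680).
+
+#### 1.3 Data Structure
+
+If your folder structure is different from the following, you may need to change the corresponding paths in the config files.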
+
+```
+mmdetection
+├── mmdet
+├── tools
+├── configs
+├── data
+│   ├── coco
+│   │   ├── train2017
+│   │   ├── val2017
+│   │   ├── test2017
+│   │   ├── annotations
+│   │
+|   ├── MOT15/MOT16/MOT17/MOT20
+|   |   ├── train
+|   |   |   ├── MOT17-02-DPM
+|   |   |   |   ├── det
+|   │   │   │   ├── gt
+|   │   │   │   ├── img1
+|   │   │   │   ├── seqinfo.ini
+│   │   │   ├── ......
+|   |   ├── test
+|   |   |   ├── MOT17-01-DPM
+|   |   |   |   ├── det
+|   │   │   │   ├── img1
+|   │   │   │   ├── seqinfo.ini
+│   │   │   ├── ......
+│   │
+│   ├── crowdhuman
+│   │   ├── annotation_train.odgt
+│   │   ├── annotation_val.odgt
+│   │   ├── train
+│   │   │   ├── Images
+│   │   │   ├── CrowdHuman_train01.zip
+│   │   │   ├── CrowdHuman_train02.zip
+│   │   │   ├── CrowdHuman_train03.zip
+│   │   ├── val
+│   │   │   ├── Images
+│   │   │   ├── CrowdHuman_val.zip
+│   │
+```
+
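+### 2. Convert Annotations
+
+In this step, you need to convert the official annotations to COCO format. We provide the corresponding scripts, used as follows:
+
+```shell
+# MOT17
+# Other MOT Challenge datasets are processed in the same way as MOT17.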
+python ./tools/dataset_converters/mot2coco.py -i ./data/MOT17/ -o ./data/MOT17/annotations --split-train --convert-det
+python ./tools/dataset_converters/mot2reid.py -i ./data/MOT17/ -o ./data/MOT17/reid --val-split 0.2 --vis-threshold 0.3
+
+# CrowdHuman
+python ./tools/dataset_converters/crowdhuman2coco.py -i ./data/crowdhuman -o ./data/crowdhuman/annotations
+
+# YouTube-VIS 2019
+python ./tools/dataset_converters/youtubevis/youtubevis2coco.py -i ./data/youtube_vis_2019 -o ./data/youtube_vis_2019/annotations --version 2019
+
+# YouTube-VIS 2021
+python ./tools/dataset_converters/youtubevis/youtubevis2coco.py -i ./data/youtube_vis_2021 -o ./data/youtube_vis_2021/annotations --version 2021
+
+```
+
+After running these scripts, the folder structure will be as follows:
+
+```
+mmdetection
+├── mmdet
+├── tools
+├── configs
+├── data
+│   ├── coco
+│   │   ├── train2017
+│   │   ├── val2017
+│   │   ├── test2017
+│   │   ├── annotations
+│   │
+|   ├── MOT15/MOT16/MOT17/MOT20
+|   |   ├── train
+|   |   |   ├── MOT17-02-DPM
+|   |   |   |   ├── det
+|   │   │   │   ├── gt
+|   │   │   │   ├── img1
+|   │   │   │   ├── seqinfo.ini
+│   │   │   ├── ......
+|   |   ├── test
+|   |   |   ├── MOT17-01-DPM
+|   |   |   |   ├── det
+|   │   │   │   ├── img1
+|   │   │   │   ├── seqinfo.ini
+│   │   │   ├── ......
+|   |   ├── annotations
+|   |   ├── reid
+│   │   │   ├── imgs
+│   │   │   ├── meta
+│   │
+│   ├── crowdhuman
+│   │   ├── annotation_train.odgt
+│   │   ├── annotation_val.odgt
+│   │   ├── train
+│   │   │   ├── Images
+│   │   │   ├── CrowdHuman_train01.zip
+│   │   │   ├── CrowdHuman_train02.zip
+│   │   │   ├── CrowdHuman_train03.zip
+│   │   ├── val
+│   │   │   ├── Images
+│   │   │   ├── CrowdHuman_val.zip
+│   │   ├── annotations
+│   │   │   ├── crowdhuman_train.json
+│   │   │   ├── crowdhuman_val.json
+│   │
+│   ├── youtube_vis_2019
+│   │   │── train
+│   │   │   │── JPEGImages
+│   │   │   │── ......
+│   │   │── valid
+│   │   │   │── JPEGImages
+│   │   │   │── ......
+│   │   │── test
+│   │   │   │── JPEGImages
+│   │   │   │── ......
+│   │   │── train.json (the official annotation files)
+│   │   │── valid.json (the official annotation files)
+│   │   │── test.json (the official annotation files)
+│   │   │── annotations (the converted annotation file)
+│   │
+│   ├── youtube_vis_2021
+│   │   │── train
+│   │   │   │── JPEGImages
+│   │   │   │── instances.json (the official annotation files)
+│   │   │   │── ......
+│   │   │── valid
+│   │   │   │── JPEGImages
+│   │   │   │── instances.json (the official annotation files)
+│   │   │   │── ......
+│   │   │── test
+│   │   │   │── JPEGImages
+│   │   │   │── instances.json (the official annotation files)
+│   │   │   │── ......
+│   │   │── annotations (the converted annotation file)
+```
+
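+#### The annotations and reid folders in MOT15/MOT16/MOT17/MOT20
+
+Take the MOT17 dataset as an example; the other datasets have a similar structure.
+
+There are 8 files (4 JSON files and 4 pickle files) in the `data/MOT17/annotations` folder:
+
+`train_cocoformat.json`: JSON file containing the annotations of the MOT17 training set.
+
+`train_detections.pkl`: Pickle file containing the public detections of the MOT17 training set.
+
+`test_cocoformat.json`: JSON file containing the annotations of the MOT17 test set.
+
+`test_detections.pkl`: Pickle file containing the public detections of the MOT17 test set.
+
+`half-train_cocoformat.json`, `half-train_detections.pkl`, `half-val_cocoformat.json` and `half-val_detections.pkl` have meanings similar to `train_cocoformat.json` and `train_detections.pkl`. `half` means that each video in the training set is split into two halves: the first half of each video is labeled as the `half-train` set, and the second half as the `half-val` set.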
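The half split for a single video can be sketched as follows. This is an illustration under an assumption: frames are numbered `1..N` and the first `N // 2` frames go to `half-train`. This guide does not state the exact boundary that `mot2coco.py` uses for odd-length videos, and `split_half` is a hypothetical helper.

```python
from typing import List, Tuple


def split_half(num_frames: int) -> Tuple[List[int], List[int]]:
    """Split frame ids 1..num_frames into a first half (half-train) and a second half (half-val)."""
    frame_ids = list(range(1, num_frames + 1))
    mid = num_frames // 2  # assumption: the extra frame of an odd-length video goes to half-val
    return frame_ids[:mid], frame_ids[mid:]


half_train, half_val = split_half(7)
# half_train == [1, 2, 3], half_val == [4, 5, 6, 7]
```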
+
+The structure of the `data/MOT17/reid` folder is as follows:
+
+```
+reid
+├── imgs
+│   ├── MOT17-02-FRCNN_000002
+│   │   ├── 000000.jpg
+│   │   ├── 000001.jpg
+│   │   ├── ...
+│   ├── MOT17-02-FRCNN_000003
+│   │   ├── 000000.jpg
+│   │   ├── 000001.jpg
+│   │   ├── ...
+├── meta
+│   ├── train_80.txt
+│   ├── val_20.txt
+```
+
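+The `80` in `train_80.txt` means that the training set accounts for 80% of the whole ReID dataset, and the validation set accounts for the remaining 20%.
+
+For training, we provide an annotation list `train_80.txt`. Each line of the list contains a file name and its corresponding ground-truth label, in the following format: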
+
+```
+MOT17-05-FRCNN_000110/000018.jpg 0
+MOT17-13-FRCNN_000146/000014.jpg 1
+MOT17-05-FRCNN_000088/000004.jpg 2
+MOT17-02-FRCNN_000009/000081.jpg 3
+```
+
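+`MOT17-05-FRCNN_000110` denotes the 110th person in the `MOT17-05-FRCNN` video.
+
+For validation, the annotation list `val_20.txt` has the same format as above.
+
+The images in `reid/imgs` are cropped from the raw images in `MOT17/train` according to the corresponding `gt.txt`. The ground-truth label values should be in the range `[0, num_classes - 1]`.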
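A list in this format can be read back as follows. This minimal sketch assumes each line holds a relative image path and an integer label separated by whitespace, as in the sample above. `parse_reid_list` and `count_classes` are hypothetical helpers. The sketch also splits the `video_personID` folder name into the video name and the person index, and checks that labels fall in `[0, num_classes - 1]`.

```python
from typing import List, Tuple


def parse_reid_list(lines: List[str]) -> List[Tuple[str, int]]:
    """Parse 'VIDEO_PERSON/FRAME.jpg LABEL' lines into (path, label) pairs."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        path, label = line.rsplit(maxsplit=1)
        samples.append((path, int(label)))
    return samples


def count_classes(samples: List[Tuple[str, int]]) -> int:
    """Return the number of identities and check labels lie in [0, num_classes - 1]."""
    labels = [label for _, label in samples]
    num_classes = len(set(labels))
    bad = [label for label in labels if not 0 <= label < num_classes]
    if bad:
        raise ValueError(f'labels outside [0, {num_classes - 1}]: {bad}')
    return num_classes


sample_lines = [
    'MOT17-05-FRCNN_000110/000018.jpg 0',
    'MOT17-13-FRCNN_000146/000014.jpg 1',
    'MOT17-05-FRCNN_000088/000004.jpg 2',
    'MOT17-02-FRCNN_000009/000081.jpg 3',
]
samples = parse_reid_list(sample_lines)
folder = samples[0][0].split('/')[0]   # 'MOT17-05-FRCNN_000110'
video, person = folder.rsplit('_', 1)  # 'MOT17-05-FRCNN', '000110'
```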
+
+#### The annotations folder in CrowdHuman
+
+There are 2 JSON files in the `data/crowdhuman/annotations` folder:
+
+`crowdhuman_train.json`: JSON file containing the annotations of the CrowdHuman training set.
+
+`crowdhuman_val.json`: JSON file containing the annotations of the CrowdHuman validation set.
+
+#### The annotations folder in youtube_vis_2019/youtube_vis_2021
+
+There are 3 JSON files in `data/youtube_vis_2019/annotations` or `data/youtube_vis_2021/annotations`:
+
+`youtube_vis_2019_train.json`/`youtube_vis_2021_train.json`: JSON file containing the annotations of the youtube_vis_2019/youtube_vis_2021 training set.
+
+`youtube_vis_2019_valid.json`/`youtube_vis_2021_valid.json`: JSON file containing the annotations of the youtube_vis_2019/youtube_vis_2021 validation set.
+
+`youtube_vis_2019_test.json`/`youtube_vis_2021_test.json`: JSON file containing the annotations of the youtube_vis_2019/youtube_vis_2021 test set.
--- a/docs/zh_cn/user_guides/tracking_interference.md
+++ b/docs/zh_cn/user_guides/tracking_interference.md
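+# Inference
+
+We provide demo scripts to run inference on a given video or on a folder that contains consecutive images. The source code is available [here](https://github.com/open-mmlab/mmdetection/tree/tracking/demo).
+
+If the input is a folder, you need to specify this. Moreover, the images must be named so that they are **easy to sort**, allowing their order to be recovered from the numbers contained in their file names. Currently, only `.jpg`, `.jpeg` and `.png` images are supported.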
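Sorting file names as plain strings puts `img_10.jpg` before `img_2.jpg`, which would scramble the frame order. The sketch below orders files numerically instead, assuming the last number in each file name is the frame index. `order_frames` is a hypothetical helper and may not match the exact sorting done by the demo script.

```python
import re
from typing import List

SUPPORTED_EXTS = ('.jpg', '.jpeg', '.png')


def order_frames(filenames: List[str]) -> List[str]:
    """Keep supported images and sort them by the last number in each file name."""
    def frame_index(name: str) -> int:
        numbers = re.findall(r'\d+', name)
        if not numbers:
            raise ValueError(f'no frame number in {name!r}')
        return int(numbers[-1])

    images = [f for f in filenames if f.lower().endswith(SUPPORTED_EXTS)]
    return sorted(images, key=frame_index)


frames = order_frames(['img_10.jpg', 'img_2.jpg', 'notes.txt', 'img_1.png'])
# frames == ['img_1.png', 'img_2.jpg', 'img_10.jpg']
```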
+
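+## Inference with MOT models
+
+This script can run inference on an input video or image folder with multiple object tracking or video instance segmentation methods.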
+
+```shell
+python demo/mot_demo.py \
+    ${INPUTS} \
+    ${CONFIG_FILE} \
+    [--checkpoint ${CHECKPOINT_FILE}] \
+    [--detector ${DETECTOR_FILE}] \
+    [--reid ${REID_FILE}] \
+    [--score-thr ${SCORE_THR}] \
+    [--device ${DEVICE}] \
+    [--out ${OUTPUT}] \
+    [--show]
+```
+
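+`INPUTS` and `OUTPUT` support both _mp4 video_ and _folder_ formats.
+
+**Important**: `DeepSORT`, `SORT` and `StrongSORT` need to load the weights of the `reid` and the `detector` separately, so we use `--detector` and `--reid` to load them. Other algorithms such as `ByteTrack`, `OCSORT`, `QDTrack`, `MaskTrackRCNN` and `Mask2Former` use `--checkpoint` to load their weights.
+
+Optional arguments:
+
+- `CHECKPOINT_FILE`: The checkpoint file (optional).
+- `DETECTOR_FILE`: The detector checkpoint file (optional).
+- `REID_FILE`: The reid checkpoint file (optional).
+- `SCORE_THR`: The score threshold for bboxes.
+- `DEVICE`: The device used for inference, e.g. `cpu`, `cuda:0` or others.
+- `OUTPUT`: Output of the visualized demo. If not specified, `--show` is required to show the video on the fly.
+- `--show`: Whether to show the video on the fly.
+
+**Examples of running MOT models:**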
+
+```shell
+# Example 1: use --detector without specifying --checkpoint
+python demo/mot_demo.py \
+    demo/demo_mot.mp4 \
+    configs/sort/sort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py \
+    --detector \
+    https://download.openmmlab.com/mmtracking/mot/faster_rcnn/faster-rcnn_r50_fpn_4e_mot17-half-64ee2ed4.pth \
+    --out mot.mp4
+
+# Example 2: use --checkpoint
+python demo/mot_demo.py \
+    demo/demo_mot.mp4 \
+    configs/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py \
+    --checkpoint https://download.openmmlab.com/mmtracking/mot/qdtrack/mot_dataset/qdtrack_faster-rcnn_r50_fpn_4e_mot17_20220315_145635-76f295ef.pth \
+    --out mot.mp4
+```
--- a/docs/zh_cn/user_guides/tracking_train_test_zh_cn.md
+++ b/docs/zh_cn/user_guides/tracking_train_test_zh_cn.md
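+# Learn to train and test
+
+## Train
+
+This section shows how to train existing models on supported datasets.
+The following training environments are supported:
+
+- CPU
+- Single GPU
+- Single node with multiple GPUs
+- Multiple nodes
+
+You can also manage jobs with Slurm.
+
+Important:
+
+- You can change the evaluation interval during training by modifying `train_cfg`, e.g.
+  `train_cfg = dict(val_interval=10)`, which means the model is evaluated every 10 epochs.
+- The default learning rate in all config files is set for 8 GPUs.
+  According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677),
+  if you use a different number of GPUs or images per GPU, you need to set the learning rate proportional to the batch size,
+  e.g. `lr=0.01` for 8 GPUs * 1 img/GPU and `lr=0.04` for 16 GPUs * 2 imgs/GPU.
+- During training, log files and checkpoints are saved to the working directory,
+  which is specified by the CLI argument `--work-dir`. It defaults to `./work_dirs/CONFIG_NAME`.
+- If you want mixed precision training, simply specify the CLI argument `--amp`.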
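Under the linear scaling rule in the second bullet, the learning rate is multiplied by the ratio of the new total batch size to the reference one. This minimal sketch reproduces the example from the list; `scale_lr` is a hypothetical helper.

```python
def scale_lr(base_lr: float, base_gpus: int, base_imgs_per_gpu: int,
             gpus: int, imgs_per_gpu: int) -> float:
    """Linear scaling rule: scale lr by new_total_batch / base_total_batch."""
    base_batch = base_gpus * base_imgs_per_gpu
    new_batch = gpus * imgs_per_gpu
    return base_lr * new_batch / base_batch


# 8 GPUs x 1 img/GPU (batch 8) at lr=0.01 -> 16 GPUs x 2 imgs/GPU (batch 32)
lr = scale_lr(0.01, base_gpus=8, base_imgs_per_gpu=1, gpus=16, imgs_per_gpu=2)
print(round(lr, 6))  # 0.04
```

The same arithmetic gives the `lr=0.08` example in the automatic scaling section of the training guide: 4 × 2 = 8 images become 16 × 4 = 64 images, an 8× increase.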
+
+#### 1. Training on CPU
+
+The model is placed on a CUDA device by default.
+It is placed on the CPU only when no CUDA device is available.
+Therefore, if you want to train the model on CPU, you need to `export CUDA_VISIBLE_DEVICES=-1` first to disable GPU visibility.
+For more details, see [MMEngine](https://github.com/open-mmlab/mmengine/blob/ca282aee9e402104b644494ca491f73d93a9544f/mmengine/runner/runner.py#L849-L850).
+
+```shell
+CUDA_VISIBLE_DEVICES=-1 python tools/train.py ${CONFIG_FILE} [optional arguments]
+```
+
+An example of training the MOT model QDTrack on CPU:
+
+```shell
+CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py
+```
+
+#### 2. Training on a single GPU
+
+If you want to train the model on a single GPU, you can directly use `tools/train.py` as follows.
+
+```shell
+python tools/train.py ${CONFIG_FILE} [optional arguments]
+```
+
+You can use `export CUDA_VISIBLE_DEVICES=$GPU_ID` to select the GPU.
+
+An example of training the MOT model QDTrack on a single GPU:
+
+```shell
+CUDA_VISIBLE_DEVICES=2 python tools/train.py configs/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py
+```
+
+#### 3. Training on a single node with multiple GPUs
+
+We provide `tools/dist_train.sh` to launch training on multiple GPUs.
+The basic usage is as follows.
+
+```shell
+bash ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
+```
+
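+If you want to launch multiple jobs on one machine,
+e.g. 2 jobs of 4-GPU training on a machine with 8 GPUs,
+you need to specify a different port (29500 by default) for each job to avoid communication conflicts.
+
+For example, you can set the ports in the commands as follows.
+
+```shell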
+CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
+CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
+```
+
+An example of training the MOT model QDTrack on a single node with multiple GPUs:
+
+```shell
+bash ./tools/dist_train.sh configs/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py 8
+```
+
+#### 4. Training on multiple nodes
+
+If you launch with multiple machines connected via Ethernet, simply run the following commands:
+
+On the first machine:
+
+```shell
+NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR bash tools/dist_train.sh $CONFIG $GPUS
+```
+
+On the second machine:
+
+```shell
+NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR bash tools/dist_train.sh $CONFIG $GPUS
+```
+
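+Training is usually slow if you do not have a high-speed network such as InfiniBand.
+
+#### 5. Training with Slurm
+
+[Slurm](https://slurm.schedmd.com/) is a good job scheduling system for computing clusters.
+On a cluster managed by Slurm, you can use `slurm_train.sh` to spawn training jobs.
+It supports both single-node and multi-node training.
+
+The basic usage is as follows.
+
+```shell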
+[GPUS=${GPUS}] bash ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
+```
+
+An example of training the MOT model QDTrack with Slurm:
+
+```shell
+GPUS=8 \
+PORT=29501 \
+GPUS_PER_NODE=8 \
+SRUN_ARGS="--quotatype=reserved" \
+bash ./tools/slurm_train.sh \
+mypartition \
+mottrack \
+configs/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py \
+./work_dirs/QDTrack
+```
+
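+## Test
+
+This section shows how to test existing models on supported datasets.
+The following testing environments are supported:
+
+- CPU
+- Single GPU
+- Single node with multiple GPUs
+- Multiple nodes
+
+You can also manage jobs with Slurm.
+
+Important:
+
+- In MOT, some algorithms such as `DeepSORT`, `SORT` and `StrongSORT` need to load the weights of the `reid` and the `detector` separately.
+  Other algorithms such as `ByteTrack`, `OCSORT` and `QDTrack` do not. Therefore, we provide `--checkpoint`, `--detector` and `--reid` to load the weights.
+- We provide two ways to evaluate and test models: video-based testing and image-based testing. Some algorithms such as `StrongSORT` and `Mask2former` only support video-based testing. If your GPU memory cannot hold an entire video, you can switch the testing mode by setting the sampler type.
+  For example,
+  video-based testing: `sampler=dict(type='DefaultSampler', shuffle=False, round_up=False)`
+  image-based testing: `sampler=dict(type='TrackImgSampler')`
+- You can set the path for saving the results by modifying the key `outfile_prefix` in the evaluator,
+  e.g. `val_evaluator = dict(outfile_prefix='results/sort_mot17')`.
+  Otherwise, a temporary file is created and removed after evaluation.
+- If you only want the formatted results without evaluation, you can set `format_only=True`,
+  e.g. `test_evaluator = dict(type='MOTChallengeMetric', metric=['HOTA', 'CLEAR', 'Identity'], outfile_prefix='sort_mot17_results', format_only=True)`
+
+#### 1. Testing on CPU
+
+The model runs on a CUDA device by default.
+It runs on the CPU only when no CUDA device is available.
+Therefore, if you want to test the model on CPU, you need to `export CUDA_VISIBLE_DEVICES=-1` first to disable GPU visibility.
+
+For more details, please refer to [MMEngine](https://github.com/open-mmlab/mmengine/blob/ca282aee9e402104b644494ca491f73d93a9544f/mmengine/runner/runner.py#L849-L850).
+
+```shell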
+CUDA_VISIBLE_DEVICES=-1 python tools/test_tracking.py ${CONFIG_FILE} [optional arguments]
+```
+
+An example of testing the MOT model SORT on CPU:
+
+```shell
+CUDA_VISIBLE_DEVICES=-1 python tools/test_tracking.py configs/sort/sort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py --detector ${CHECKPOINT_FILE}
+```
+
+#### 2. Testing on a single GPU
+
+If you want to test the model on a single GPU, you can directly use `tools/test_tracking.py` as follows.
+
+```shell
+python tools/test_tracking.py ${CONFIG_FILE} [optional arguments]
+```
+
+You can use `export CUDA_VISIBLE_DEVICES=$GPU_ID` to select the GPU.
+
+An example of testing the MOT model QDTrack on a single GPU:
+
+```shell
+CUDA_VISIBLE_DEVICES=2 python tools/test_tracking.py configs/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py --checkpoint ${CHECKPOINT_FILE}
+```
+
+#### 3. Testing on a single node with multiple GPUs
+
+We provide `tools/dist_test_tracking.sh` to launch testing on multiple GPUs.
+The basic usage is as follows.
+
+```shell
+bash ./tools/dist_test_tracking.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
+```
+
+An example of testing the MOT model DeepSORT on a single node with multiple GPUs:
+
+```shell
+bash ./tools/dist_test_tracking.sh configs/deepsort/deepsort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py 8 --detector ${DETECTOR_FILE} --reid ${REID_FILE}
+```
+
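+#### 4. Testing on multiple nodes
+
+You can test on multiple nodes in a similar way to "Training on multiple nodes".
+
+#### 5. Testing with Slurm
+
+On a cluster managed by Slurm, you can use `slurm_test_tracking.sh` to spawn testing jobs.
+It supports both single-node and multi-node testing.
+
+The basic usage is as follows.
+
+```shell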
+[GPUS=${GPUS}] bash tools/slurm_test_tracking.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} [optional arguments]
+```
+
+An example of testing the VIS model Mask2Former with Slurm:
+
+```shell
+GPUS=8 \
+bash tools/slurm_test_tracking.sh \
+mypartition \
+vis \
+configs/mask2former_vis/mask2former_r50_8xb2-8e_youtubevis2021.py \
+--checkpoint ${CHECKPOINT_FILE}
+```
--- a/docs/zh_cn/user_guides/tracking_visualization.md
+++ b/docs/zh_cn/user_guides/tracking_visualization.md
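+# Learn about Visualization
+
+## Local Visualization
+
+This section shows how to visualize detection/tracking results with local visualization tools.
+
+If you want to draw the prediction results, set `draw=True` in `TrackVisualizationHook`, as in the following example.
+
+```python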
+default_hooks = dict(visualization=dict(type='TrackVisualizationHook', draw=True))
+```
+
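+`TrackVisualizationHook` has the following arguments:
+
+- `draw`: Whether to draw the prediction results. If False, nothing is drawn. Defaults to False.
+- `interval`: The visualization interval. Defaults to 30.
+- `score_thr`: The threshold to decide whether to visualize bboxes and masks. Defaults to 0.3.
+- `show`: Whether to display the drawn images. Defaults to False.
+- `wait_time`: The display interval in seconds. Defaults to 0.
+- `test_out_dir`: Directory where the drawn images are saved during testing.
+- `backend_args`: Arguments to instantiate a file client. Defaults to `None`.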
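Combining several of these arguments, a config fragment might look as follows. The values are illustrative assumptions, not recommendations, and `'tracking_vis'` is a hypothetical folder name.

```python
# Config fragment: illustrative values, not recommended defaults.
default_hooks = dict(
    visualization=dict(
        type='TrackVisualizationHook',
        draw=True,        # draw predictions (defaults to False)
        interval=10,      # visualization interval (defaults to 30)
        score_thr=0.5,    # skip boxes/masks scoring below 0.5 (defaults to 0.3)
        show=False,       # do not pop up a window
        test_out_dir='tracking_vis'))  # hypothetical folder for images drawn during testing
```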
+
+In `TrackVisualizationHook`, `TrackLocalVisualizer` is called to implement visualization for MOT and VIS tasks. Details are as follows.
+
+You can find more details about [Visualization](https://github.com/open-mmlab/mmengine/blob/main/docs/zh_cn/advanced_tutorials/visualization.md) and [Hook](https://github.com/open-mmlab/mmengine/blob/main/docs/zh_cn/tutorials/hook.md) in MMEngine.
+
+### Tracking Visualization
+
+We implement visualization for tracking tasks with the class `TrackLocalVisualizer`. You can call it as follows:
+
+```python
+visualizer = dict(type='TrackLocalVisualizer')
+```
+
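+The visualizer has the following arguments:
+
+- `name`: Name of the instance. Defaults to 'visualizer'.
+
+- `image`: The original image to draw on, in RGB format. Defaults to None.
+
+- `vis_backends`: A list of visualization backend configs. Defaults to None.
+
+- `save_dir`: Directory where all storage backends save files. If None, the backends will not save any data.
+
+- `line_width`: The line width of the bounding boxes. Defaults to 3.
+
+- `alpha`: The transparency of bboxes and masks. Defaults to 0.8.
+
+Here is a visualization example of DeepSORT: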
+
+![test_img_89](https://user-images.githubusercontent.com/99722489/186062929-6d0e4663-0d8e-4045-9ec8-67e0e41da876.png)
--- a/docs/zh_cn/user_guides/train.md
+++ b/docs/zh_cn/user_guides/train.md
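+# Train predefined models on standard datasets
+
+MMDetection also provides out-of-the-box tools for training detection models. This section shows how to train a predefined model on standard datasets such as COCO.
+
+### Dataset
+
+Training requires preparing the datasets; see [Dataset Preparation](#%E6%95%B0%E6%8D%AE%E9%9B%86%E5%87%86%E5%A4%87) for details.
+
+**Note**:
+Currently, the config files under `configs/cityscapes` use COCO pretrained weights for initialization. If the network connection is unavailable or slow, you can download the existing models in advance; otherwise, errors may occur at the start of training.
+
+### Learning rate automatic scaling
+
+**Note**: The default learning rate in the config files is set for 8 GPUs with 2 images per GPU (batch size = 8\*2 = 16). This value is set as `auto_scale_lr.base_batch_size` in `configs/_base_/schedules/schedule_1x.py`, and the learning rate is automatically scaled relative to its value at batch size `16`. Meanwhile, so as not to affect other codebases built on mmdet, the flag `auto_scale_lr.enable` is set to `False` by default.
+
+If you want to enable this feature, add the argument `--auto-scale-lr` to the command. Before running the command, check the name of the config file you are going to use, because the config name indicates the default batch size.
+By default, the batch size is `8 x 2 = 16`, e.g. `faster_rcnn_r50_caffe_fpn_90k_coco.py` or `pisa_faster_rcnn_x101_32x4d_fpn_1x_coco.py`. For other batch sizes, you will see `_NxM_` in the config name, e.g. `cornernet_hourglass104_mstest_32x3_210e_coco.py` has a batch size of `32 x 3 = 96`, and `scnet_x101_64x4d_fpn_8x1_20e_coco.py` has a batch size of `8 x 1 = 8`.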
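A regular expression can read the batch size off a config name as described above. `batch_size_from_name` is a hypothetical helper. It assumes the `_NxM_` token is delimited by underscores, so a backbone token such as `32x4d` is not mistaken for it, and it falls back to `8 x 2 = 16` otherwise. MMDetection itself reads `auto_scale_lr.base_batch_size` from the config rather than parsing names, so treat this only as a reading aid.

```python
import re

# '_NxM_' token, e.g. '_32x3_'; '_32x4d_' does not match because 'd' is not a digit
BATCH_TOKEN = re.compile(r'_(\d+)x(\d+)_')


def batch_size_from_name(config_name: str, default: int = 8 * 2) -> int:
    """Return gpus * imgs_per_gpu from an `_NxM_` token, else the default 8 x 2."""
    match = BATCH_TOKEN.search(config_name)
    if match is None:
        return default
    gpus, imgs_per_gpu = map(int, match.groups())
    return gpus * imgs_per_gpu
```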
+
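+**Remember: if you use a config file whose batch size is not the default `16`, check the bottom of the config file for `auto_scale_lr.base_batch_size`. If you cannot find it there, look in the `_base_=[xxx]` files it inherits from. In addition, do not modify these values if you want to use automatic learning rate scaling.**
+
+The basic usage of learning rate automatic scaling is as follows: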
+
+```shell
+python tools/train.py \
+    ${CONFIG_FILE} \
+    --auto-scale-lr \
+    [optional arguments]
+```
+
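+After running the command, the learning rate is automatically scaled according to the number of GPUs and the training batch size, following the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677). For example, if `lr=0.01` with 4 GPUs and 2 images per GPU, then with 16 GPUs and 4 images per GPU the LR is automatically scaled to `lr=0.08`.
+
+If you do not enable this feature, you need to calculate the learning rate manually according to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677) and modify `optimizer.lr` in the config file.
+
+### Training with a single GPU
+
+We provide `tools/train.py` to launch training jobs on a single GPU. The basic usage is as follows: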
+
+```shell
+python tools/train.py \
+    ${CONFIG_FILE} \
+    [optional arguments]
+```
+
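+During training, log files and checkpoints are saved to the working directory, which is specified by `work_dir` in the config file or by the CLI argument `--work-dir`.
+
+By default, the model is evaluated on the validation set after every epoch. The evaluation interval can be specified in the config file:
+
+```python
+# evaluate the model every 12 epochs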
+train_cfg = dict(val_interval=12)
+```
+
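+This tool accepts the following arguments:
+
+- `--work-dir ${WORK_DIR}`: Override the working directory.
+- `--resume`: Automatically resume from the latest checkpoint in the work_dir.
+- `--resume ${CHECKPOINT_FILE}`: Resume from the specified checkpoint file.
+- `--cfg-options 'Key=value'`: Override other settings in the used config.
+
+**Note**:
+Difference between `resume` and `load_from`:
+
+`resume` loads both the model weights and the optimizer state, and also inherits the iteration count from the specified checkpoint, so training does not restart from scratch. `load_from` only loads the model weights, and training starts from scratch; it is often used for fine-tuning. `load_from` is set in the config file, while `resume` is passed as a command-line argument.
+
+### Training with CPU
+
+The process of training on CPU is the same as single GPU training; we just need to disable GPUs before the training process starts.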
+
+```shell
+export CUDA_VISIBLE_DEVICES=-1
+```
+
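+Then run the single GPU training script.
+
+**Note**:
+
+We do not recommend training on CPU because it is too slow. We support this feature to allow users to debug on machines without GPUs.
+
+### Training with multiple GPUs
+
+We provide `tools/dist_train.sh` to launch training on multiple GPUs. The basic usage is as follows: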
+
+```shell
+bash ./tools/dist_train.sh \
+    ${CONFIG_FILE} \
+    ${GPU_NUM} \
+    [optional arguments]
+```
+
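+Optional arguments are the same as those for single GPU training.
+
+#### Launch multiple jobs simultaneously
+
+If you want to launch multiple jobs on one machine, e.g. 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify a different port (29500 by default) for each job to avoid conflicts.
+
+If you use `dist_train.sh` to launch training jobs, you can set the port in the command.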
+
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
+CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
+```
+
+### Training with multiple machines
+
+If you launch with multiple machines connected via Ethernet, you can run the following commands:
+
+On the first machine:
+
+```shell
+NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
+```
+
+On the second machine:
+
+```shell
+NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
+```
+
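+However, training will be very slow if these machines are not connected with a high-speed network.
+
+### Manage jobs with Slurm
+
+Slurm is a common scheduling system for computing clusters. On a cluster managed by Slurm, you can use `slurm_train.sh` to spawn training jobs. It supports both single-node and multi-node training.
+
+The basic usage is as follows: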
+
+```shell
+[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
+```
+
+Below is an example of using 16 GPUs to train Mask R-CNN on a Slurm partition named _dev_, with the `work-dir` set to a shared file system.
+
+```shell
+GPUS=16 ./tools/slurm_train.sh dev mask_r50_1x configs/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py /nfs/xxxx/mask_rcnn_r50_fpn_1x
+```
+
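+You can check [the source code](https://github.com/open-mmlab/mmdetection/blob/main/tools/slurm_train.sh) to review all the arguments and environment variables.
+
+When using Slurm, the port needs to be set in one of the following ways:
+
+1. Set the port through `--cfg-options`. This is highly recommended because it does not change the original config files.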
+
+   ```shell
+   CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} --cfg-options 'dist_params.port=29500'
+   CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --cfg-options 'dist_params.port=29501'
+   ```
+
+2. Modify the config files to set different communication ports.
+
+   In `config1.py`, set:
+
+   ```python
+   dist_params = dict(backend='nccl', port=29500)
+   ```
+
+   In `config2.py`, set:
+
+   ```python
+   dist_params = dict(backend='nccl', port=29501)
+   ```
+
+   Then you can launch two jobs with `config1.py` and `config2.py`.
+
+   ```shell
+   CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
+   CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}
+   ```
+
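+# Train with customized datasets
+
+In this document, you will learn how to run inference, testing and training of predefined models with customized datasets. We use the [balloon dataset](https://github.com/matterport/Mask_RCNN/tree/master/samples/balloon) as an example to describe the whole process.
+
+The basic steps are as follows:
+
+1. Prepare the customized dataset
+2. Prepare a config
+3. Train, test and run inference on the customized dataset.
+
+## Prepare the customized dataset
+
+There are three ways to support a new dataset in MMDetection:
+
+1. Reorganize the dataset into COCO format.
+2. Reorganize the dataset into a middle format.
+3. Implement a new dataset.
+
+We usually recommend the first two methods, which are usually easier than the third.
+
+In this document, we show an example of converting the data into COCO format.
+
+**Note**: Since MMDetection 3.0, datasets and metrics have been decoupled (except CityScapes). Therefore, users can evaluate a model on any dataset with any metric during validation. For example, a model can be evaluated on the COCO dataset with the VOC metric, or on the OpenImages dataset with both the VOC and COCO metrics.
+
+### COCO annotation format
+
+The COCO format for instance segmentation is shown below. All the keys are required; see [here](https://cocodataset.org/#format-data) for more details.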
+
+```json
+{
+    "images": [image],
+    "annotations": [annotation],
+    "categories": [category]
+}
+
+
+image = {
+    "id": int,
+    "width": int,
+    "height": int,
+    "file_name": str,
+}
+
+annotation = {
+    "id": int,
+    "image_id": int,
+    "category_id": int,
+    "segmentation": RLE or [polygon],
+    "area": float,
+    "bbox": [x,y,width,height], # (x, y) are the coordinates of the top-left corner of the bbox
+    "iscrowd": 0 or 1,
+}
+
+categories = [{
+    "id": int,
+    "name": str,
+    "supercategory": str,
+}]
+```
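Before training, it can help to check a converted file against the schema above. The sketch below assumes the annotation file has already been loaded into a Python dict (for example with `json.load`). `check_coco_dict` is a hypothetical helper. It only checks the listed keys, the `image_id`/`category_id` cross-references and the bbox shape, not every rule of the COCO format. It treats `supercategory` as optional because the balloon converter later in this guide does not write it.

```python
REQUIRED_KEYS = {
    'images': {'id', 'width', 'height', 'file_name'},
    'annotations': {'id', 'image_id', 'category_id', 'segmentation',
                    'area', 'bbox', 'iscrowd'},
    # 'supercategory' is treated as optional here
    'categories': {'id', 'name'},
}


def check_coco_dict(coco: dict) -> list:
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        if section not in coco:
            problems.append(f'missing top-level key {section!r}')
            continue
        for i, item in enumerate(coco[section]):
            missing = keys - item.keys()
            if missing:
                problems.append(f'{section}[{i}] is missing {sorted(missing)}')
    image_ids = {img.get('id') for img in coco.get('images', [])}
    category_ids = {cat.get('id') for cat in coco.get('categories', [])}
    for ann in coco.get('annotations', []):
        if ann.get('image_id') not in image_ids:
            problems.append(f"annotation {ann.get('id')} points to an unknown image_id")
        if ann.get('category_id') not in category_ids:
            problems.append(f"annotation {ann.get('id')} points to an unknown category_id")
        bbox = ann.get('bbox', [])
        if len(bbox) != 4 or bbox[2] < 0 or bbox[3] < 0:
            problems.append(f"annotation {ann.get('id')} has a malformed [x, y, width, height] bbox")
    return problems
```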
+
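+Now suppose we use the balloon dataset.
+
+After downloading the data, we need to implement a function to convert the annotation format into COCO format. Then we can use the implemented `CocoDataset` to load the data and perform training and evaluation.
+
+If you take a look at the dataset, you will find that the annotation format is as follows:
+
+```python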
+{'base64_img_data': '',
+ 'file_attributes': {},
+ 'filename': '34020010494_e5cb88e1c4_k.jpg',
+ 'fileref': '',
+ 'regions': {'0': {'region_attributes': {},
+   'shape_attributes': {'all_points_x': [1020,
+     1000,
+     994,
+     1003,
+     1023,
+     1050,
+     1089,
+     1134,
+     1190,
+     1265,
+     1321,
+     1361,
+     1403,
+     1428,
+     1442,
+     1445,
+     1441,
+     1427,
+     1400,
+     1361,
+     1316,
+     1269,
+     1228,
+     1198,
+     1207,
+     1210,
+     1190,
+     1177,
+     1172,
+     1174,
+     1170,
+     1153,
+     1127,
+     1104,
+     1061,
+     1032,
+     1020],
+    'all_points_y': [963,
+     899,
+     841,
+     787,
+     738,
+     700,
+     663,
+     638,
+     621,
+     619,
+     643,
+     672,
+     720,
+     765,
+     800,
+     860,
+     896,
+     942,
+     990,
+     1035,
+     1079,
+     1112,
+     1129,
+     1134,
+     1144,
+     1153,
+     1166,
+     1166,
+     1150,
+     1136,
+     1129,
+     1122,
+     1112,
+     1084,
+     1037,
+     989,
+     963],
+    'name': 'polygon'}}},
+ 'size': 1115004}
+```
+
+The annotation file is in JSON format, where each key holds all the annotations of one image.
+
+The code for converting the balloon dataset into COCO format is as follows.
+
+```python
+import os.path as osp
+
+import mmcv
+
+from mmengine.fileio import dump, load
+from mmengine.utils import track_iter_progress
+
+
+def convert_balloon_to_coco(ann_file, out_file, image_prefix):
+    data_infos = load(ann_file)
+
+    annotations = []
+    images = []
+    obj_count = 0
+    for idx, v in enumerate(track_iter_progress(data_infos.values())):
+        filename = v['filename']
+        img_path = osp.join(image_prefix, filename)
+        height, width = mmcv.imread(img_path).shape[:2]
+
+        images.append(
+            dict(id=idx, file_name=filename, height=height, width=width))
+
+        for _, obj in v['regions'].items():
+            assert not obj['region_attributes']
+            obj = obj['shape_attributes']
+            px = obj['all_points_x']
+            py = obj['all_points_y']
+            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
+            poly = [p for x in poly for p in x]
+
+            x_min, y_min, x_max, y_max = (min(px), min(py), max(px), max(py))
+
+            data_anno = dict(
+                image_id=idx,
+                id=obj_count,
+                category_id=0,
+                bbox=[x_min, y_min, x_max - x_min, y_max - y_min],
+                area=(x_max - x_min) * (y_max - y_min),
+                segmentation=[poly],
+                iscrowd=0)
+            annotations.append(data_anno)
+            obj_count += 1
+
+    coco_format_json = dict(
+        images=images,
+        annotations=annotations,
+        categories=[{
+            'id': 0,
+            'name': 'balloon'
+        }])
+    dump(coco_format_json, out_file)
+
+
+if __name__ == '__main__':
+    convert_balloon_to_coco(ann_file='data/balloon/train/via_region_data.json',
+                            out_file='data/balloon/train/annotation_coco.json',
+                            image_prefix='data/balloon/train')
+    convert_balloon_to_coco(ann_file='data/balloon/val/via_region_data.json',
+                            out_file='data/balloon/val/annotation_coco.json',
+                            image_prefix='data/balloon/val')
+```
+
+With the function above, users can convert the annotation files into COCO-format JSON, then train the model with `CocoDataset` and evaluate it with `CocoMetric`.
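
Before training, it can help to sanity-check the converted file. The sketch below is a minimal check that assumes the COCO layout written by `convert_balloon_to_coco` (`images`, `annotations`, `categories`, with bboxes as `[x, y, width, height]`). `check_coco_annotations` is a hypothetical helper written for this guide, not part of MMDetection.

```python
from typing import Dict, List


def check_coco_annotations(coco: Dict) -> List[str]:
    """Return human-readable problems found in a COCO-style annotation dict."""
    problems = []
    image_ids = {img['id'] for img in coco['images']}
    category_ids = {cat['id'] for cat in coco['categories']}
    seen_ids = set()
    for ann in coco['annotations']:
        ann_id = ann['id']
        if ann_id in seen_ids:
            problems.append(f'duplicate annotation id {ann_id}')
        seen_ids.add(ann_id)
        if ann['image_id'] not in image_ids:
            problems.append(f'annotation {ann_id} refers to unknown image {ann["image_id"]}')
        if ann['category_id'] not in category_ids:
            problems.append(f'annotation {ann_id} has unknown category {ann["category_id"]}')
        _, _, width, height = ann['bbox']  # COCO bboxes are [x, y, width, height]
        if width <= 0 or height <= 0:
            problems.append(f'annotation {ann_id} has a degenerate bbox {ann["bbox"]}')
    return problems
```

Load `data/balloon/train/annotation_coco.json` with `json.load` and pass the result in; an empty list means none of these checks failed.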
+
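+## Prepare the config
+
+The second step is to prepare a config so that the dataset can be loaded. Suppose we want to train Mask R-CNN with FPN on the balloon dataset. Name the config `mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py`, save it under `configs/balloon/`, and fill it with the content below. For details on writing configs, see [Learn about Configs (MMDetection 3.0.0 documentation)](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/config.html#base).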
+
+```python
+# The new config inherits a base config and makes the necessary modifications
+_base_ = '../mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py'
+
+# We also need to change num_classes in the heads to match the number of classes in the dataset
+model = dict(
+    roi_head=dict(
+        bbox_head=dict(num_classes=1), mask_head=dict(num_classes=1)))
+
+# Modify dataset-related settings
+data_root = 'data/balloon/'
+metainfo = {
+    'classes': ('balloon', ),
+    'palette': [
+        (220, 20, 60),
+    ]
+}
+train_dataloader = dict(
+    batch_size=1,
+    dataset=dict(
+        data_root=data_root,
+        metainfo=metainfo,
+        ann_file='train/annotation_coco.json',
+        data_prefix=dict(img='train/')))
+val_dataloader = dict(
+    dataset=dict(
+        data_root=data_root,
+        metainfo=metainfo,
+        ann_file='val/annotation_coco.json',
+        data_prefix=dict(img='val/')))
+test_dataloader = val_dataloader
+
+# Modify evaluator-related settings
+val_evaluator = dict(ann_file=data_root + 'val/annotation_coco.json')
+test_evaluator = val_evaluator
+
+# Initialize from pre-trained Mask R-CNN weights, which can improve performance
+load_from = 'https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'
+
+```
+
+## Train a new model
+
+To train a model with the new config, simply run:
+
+```shell
+python tools/train.py configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py
+```
+
+Refer to [Train predefined models on standard datasets](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/train.html#id1) for more detailed usage.
+
+## Test and inference
+
+To test the trained model, simply run:
+
+```shell
+python tools/test.py configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py work_dirs/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon/epoch_12.pth
+```
+
+Refer to [Test existing models](https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/test.html) for more detailed usage.
--- a/docs/zh_cn/user_guides/useful_hooks.md
+++ b/docs/zh_cn/user_guides/useful_hooks.md
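+# Useful Hooks
+
+MMDetection and MMEngine provide users with a variety of useful hooks, such as `MemoryProfilerHook` and `NumClassCheckHook`.
+This tutorial introduces the functionality and usage of the hooks implemented in MMDetection. For hooks defined in MMEngine, please refer to the [MMEngine hook API documentation](https://github.com/open-mmlab/mmengine/tree/main/docs/en/tutorials/hook.md).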
+
+## CheckInvalidLossHook
+
+## NumClassCheckHook
+
+## MemoryProfilerHook
+
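+[Memory profiler hook](https://github.com/open-mmlab/mmdetection/blob/main/mmdet/engine/hooks/memory_profiler_hook.py)
+records memory information including virtual memory, swap memory, and the memory of the current process. It helps you grasp the system's memory usage and discover hidden memory leaks. To use this hook, first install `memory_profiler` and `psutil` with `pip install memory_profiler psutil`.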
+
+### Usage
+
+To use this hook, add the following code to the config file:
+
+```python
+custom_hooks = [
+    dict(type='MemoryProfilerHook', interval=50)
+]
+```
+
+### Result
+
+During training, you will see messages like the following logged by `MemoryProfilerHook`:
+
+```text
+The system has 250 GB (246360 MB + 9407 MB) of memory and 8 GB (5740 MB + 2452 MB) of swap memory in total. Currently 9407 MB (4.4%) of memory and 2452 MB (29.9%) of swap memory were consumed. And the current training process consumed 5434 MB of memory.
+```
+
+```text
+2022-04-21 08:49:56,881 - mmengine - INFO - Memory information available_memory: 246360 MB, used_memory: 9407 MB, memory_utilization: 4.4 %, available_swap_memory: 5740 MB, used_swap_memory: 2452 MB, swap_memory_utilization: 29.9 %, current_process_memory: 5434 MB
+```
+
+## SetEpochInfoHook
+
+## SyncNormHook
+
+## SyncRandomSizeHook
+
+## YOLOXLrUpdaterHook
+
+## YOLOXModeSwitchHook
+
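+## How to implement a custom hook
+
+In general, there are 22 points between the start and the end of model training where hooks can be executed. We can implement custom hooks that run at different points to perform custom operations during training.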
+
+- global points: `before_run`, `after_run`
+- points in training: `before_train`, `before_train_epoch`, `before_train_iter`, `after_train_iter`, `after_train_epoch`, `after_train`
+- points in validation: `before_val`, `before_val_epoch`, `before_val_iter`, `after_val_iter`, `after_val_epoch`, `after_val`
+- points at testing: `before_test`, `before_test_epoch`, `before_test_iter`, `after_test_iter`, `after_test_epoch`, `after_test`
+- other points: `before_save_checkpoint`, `after_save_checkpoint`
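
To make the order of the training points concrete, the toy sketch below drives a hook through one epoch-based loop. `ToyRunner` and `RecordingHook` are hypothetical stand-ins written for illustration, not MMEngine's `Runner` or `Hook`; the real runner also reaches the validation, testing, and checkpoint points.

```python
from typing import Callable, List

# Training-related hook points, in the order an epoch-based loop reaches them.
TRAIN_POINTS = ('before_run', 'before_train', 'before_train_epoch',
                'before_train_iter', 'after_train_iter', 'after_train_epoch',
                'after_train', 'after_run')


class RecordingHook:
    """Hypothetical hook that records every training hook point it is called at."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def __getattr__(self, name: str) -> Callable:
        # Only reached for attributes not found normally, i.e. the hook points.
        if name in TRAIN_POINTS:
            return lambda runner: self.calls.append(name)
        raise AttributeError(name)


class ToyRunner:
    """Hypothetical epoch-based loop that triggers hooks at each training point."""

    def __init__(self, hooks: List, max_epochs: int, iters_per_epoch: int) -> None:
        self.hooks = hooks
        self.max_epochs = max_epochs
        self.iters_per_epoch = iters_per_epoch

    def call_hook(self, point: str) -> None:
        for hook in self.hooks:
            getattr(hook, point)(self)

    def run(self) -> None:
        self.call_hook('before_run')
        self.call_hook('before_train')
        for _ in range(self.max_epochs):
            self.call_hook('before_train_epoch')
            for _ in range(self.iters_per_epoch):
                self.call_hook('before_train_iter')
                # forward, backward and optimizer step would happen here
                self.call_hook('after_train_iter')
            self.call_hook('after_train_epoch')
        self.call_hook('after_train')
        self.call_hook('after_run')
```

Running `ToyRunner([RecordingHook()], max_epochs=1, iters_per_epoch=2).run()` records the `*_train_iter` pair twice inside a single epoch, between the run-level and train-level points.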
+
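+For example, suppose we want a hook that checks the loss and automatically terminates training when the loss becomes NaN (or infinite). We can split the process into three steps:
+
+1. Implement a new hook that inherits from MMEngine's `Hook` class, and implement the `after_train_iter` method to check whether the loss has become NaN every `n` training iterations.
+2. Register the implemented hook with `@HOOKS.register_module()`, as shown in the code below.
+3. Add `custom_hooks = [dict(type='CheckInvalidLossHook', interval=50)]` to the config file.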
+
+```python
+from typing import Optional
+
+import torch
+from mmengine.hooks import Hook
+from mmengine.runner import Runner
+
+from mmdet.registry import HOOKS
+
+
+@HOOKS.register_module()
+class CheckInvalidLossHook(Hook):
+    """Check invalid loss hook.
+
+    This hook will regularly check whether the loss is valid
+    during training.
+
+    Args:
+        interval (int): Checking interval (every k iterations).
+            Default: 50.
+    """
+
+    def __init__(self, interval: int = 50) -> None:
+        self.interval = interval
+
+    def after_train_iter(self,
+                         runner: Runner,
+                         batch_idx: int,
+                         data_batch: Optional[dict] = None,
+                         outputs: Optional[dict] = None) -> None:
+        """Regularly check whether the loss is valid every n iterations.
+
+        Args:
+            runner (:obj:`Runner`): The runner of the training process.
+            batch_idx (int): The index of the current batch in the train loop.
+            data_batch (dict, Optional): Data from dataloader.
+                Defaults to None.
+            outputs (dict, Optional): Outputs from model. Defaults to None.
+        """
+        if self.every_n_train_iters(runner, self.interval):
+            assert torch.isfinite(outputs['loss']), \
+                runner.logger.info('loss become infinite or NaN!')
+```
+
+Please refer to [Customize runtime settings](../advanced_guides/customize_runtime.md) for more about custom hooks.
--- a/docs/zh_cn/user_guides/useful_tools.md
+++ b/docs/zh_cn/user_guides/useful_tools.md
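+Apart from the training and testing scripts, we provide many useful tools under the `tools/` directory.
+
+## Log Analysis
+
+`tools/analysis_tools/analyze_logs.py` plots loss/mAP curves from the given training log files.
+Run `pip install seaborn` first to install the required dependency.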
+
+```shell
+python tools/analysis_tools/analyze_logs.py plot_curve ${JSON_LOGS} [--keys ${KEYS}] [--eval-interval ${EVALUATION_INTERVAL}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}]
+```
+
+![loss curve image](../../../resources/loss_curve.png)
+
+Examples:
+
+- Plot the classification loss of a run.
+
+  ```shell
+  python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
+  ```
+
+- Plot the classification and regression losses of a run, and save the figure as a pdf.
+
+  ```shell
+  python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf
+  ```
+
+- Compare the bbox mAP of two runs in the same figure.
+
+  ```shell
+  python tools/analysis_tools/analyze_logs.py plot_curve log1.json log2.json --keys bbox_mAP --legend run1 run2
+  ```
+
+- Compute the average training speed.
+
+  ```shell
+  python tools/analysis_tools/analyze_logs.py cal_train_time log.json [--include-outliers]
+  ```
+
+  The output looks like this:
+
+  ```text
+  -----Analyze train time of work_dirs/some_exp/20190611_192040.log.json-----
+  slowest epoch 11, average time is 1.2024
+  fastest epoch 1, average time is 1.1909
+  time std over epochs is 0.0028
+  average iter time: 1.1959 s/iter
+  ```
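
The four statistics in that output can be reproduced by hand. The sketch below assumes each log record is a dict with an `epoch` number and a per-iteration `time` in seconds (both key names are assumptions about the log format), groups the records by epoch, and reports the same quantities. Treating the first iteration of each epoch as an outlier unless `include_outliers` is set is our reading of the `--include-outliers` flag, not the script's code.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List


def train_time_stats(records: Iterable[Dict], include_outliers: bool = False) -> Dict:
    """Summarize per-iteration training times grouped by epoch."""
    per_epoch: Dict[int, List[float]] = defaultdict(list)
    for rec in records:
        per_epoch[rec['epoch']].append(rec['time'])
    if not include_outliers:
        # Drop the first iteration of every epoch, which is often slow.
        per_epoch = {ep: times[1:] for ep, times in per_epoch.items() if len(times) > 1}
    epoch_means = {ep: mean(times) for ep, times in per_epoch.items()}
    all_times = [t for times in per_epoch.values() for t in times]
    return {
        'slowest_epoch': max(epoch_means, key=epoch_means.get),
        'fastest_epoch': min(epoch_means, key=epoch_means.get),
        'time_std_over_epochs': pstdev(list(epoch_means.values())),
        'average_iter_time': mean(all_times),
    }
```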
+
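+## Result Analysis
+
+`tools/analysis_tools/analyze_results.py` calculates the mAP of every single image. Based on comparing the ground-truth boxes with the predicted boxes, it then shows or saves the top-k images with the highest and the lowest scores.
+
+**Usage**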
+
+```shell
+python tools/analysis_tools/analyze_results.py \
+      ${CONFIG} \
+      ${PREDICTION_PATH} \
+      ${SHOW_DIR} \
+      [--show] \
+      [--wait-time ${WAIT_TIME}] \
+      [--topk ${TOPK}] \
+      [--show-score-thr ${SHOW_SCORE_THR}] \
+      [--cfg-options ${CFG_OPTIONS}]
+```
+
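+Description of all arguments:
+
+- `config`: path of the model config file.
+- `prediction_path`: result file in pickle format produced by `tools/test.py`.
+- `show_dir`: directory where images painted with ground-truth and predicted boxes are saved.
+- `--show`: whether to show the painted images. Defaults to `False`.
+- `--wait-time`: interval of showing, in seconds; `0` means blocking.
+- `--topk`: number of saved images with the highest and the lowest `topk` scores after sorting. Defaults to `20` if not specified.
+- `--show-score-thr`: score threshold for showing. Defaults to `0`.
+- `--cfg-options`: if specified, the key-value pairs override the corresponding options in the config file.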
+
+**Examples**:
+Assume that you have obtained a result file in pickle format from `tools/test.py` at `./result.pkl`.
+
+1. Test Faster R-CNN, visualize the results, and save the images to `results/`
+
+```shell
+python tools/analysis_tools/analyze_results.py \
+       configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py \
+       result.pkl \
+       results \
+       --show
+```
+
+2. Test Faster R-CNN with `topk` set to 50, and save the images to `results/`
+
+```shell
+python tools/analysis_tools/analyze_results.py \
+       configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py \
+       result.pkl \
+       results \
+       --topk 50
+```
+
+3. To filter out low-score predictions, specify the `--show-score-thr` argument
+
+```shell
+python tools/analysis_tools/analyze_results.py \
+       configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py \
+       result.pkl \
+       results \
+       --show-score-thr 0.3
+```
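
The ranking behind `--topk` can be pictured as sorting images by their per-image score and keeping both ends of the list. The sketch below assumes the per-image mAP values are already computed and stored in a plain dict. `split_topk` is a hypothetical helper, not a function of `analyze_results.py`.

```python
from typing import Dict, List, Tuple


def split_topk(img_scores: Dict[str, float],
               topk: int = 20) -> Tuple[List[str], List[str]]:
    """Return (best, worst) image names ranked by their per-image score."""
    if topk <= 0:
        return [], []
    ranked = sorted(img_scores, key=img_scores.get, reverse=True)
    best = ranked[:topk]
    worst = ranked[::-1][:topk]  # lowest score first
    return best, worst
```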
+
+## Fusing Results from Multiple Models
+
+`tools/analysis_tools/fuse_results.py` fuses the detection results of multiple models with Weighted Boxes Fusion (WBF). (Currently only the COCO format is supported.)
+
+**Usage**
+
+```shell
+python tools/analysis_tools/fuse_results.py \
+       ${PRED_RESULTS} \
+       [--annotation ${ANNOTATION}] \
+       [--weights ${WEIGHTS}] \
+       [--fusion-iou-thr ${FUSION_IOU_THR}] \
+       [--skip-box-thr ${SKIP_BOX_THR}] \
+       [--conf-type ${CONF_TYPE}] \
+       [--eval-single ${EVAL_SINGLE}] \
+       [--save-fusion-results ${SAVE_FUSION_RESULTS}] \
+       [--out-dir ${OUT_DIR}]
+```
+
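+Description of all arguments:
+
+- `pred-results`: paths of the prediction results of the models (currently only json format is supported).
+- `--annotation`: path of the ground-truth annotation file.
+- `--weights`: fusion weight of each model. By default, every model has a weight of 1.
+- `--fusion-iou-thr`: IoU threshold for a successful match in WBF. Defaults to `0.55`.
+- `--skip-box-thr`: confidence threshold in WBF; bboxes with a confidence below it are discarded. Defaults to `0`.
+- `--conf-type`: how to compute the confidence of a fused bbox. There are four options:
+  - `avg`: the average value (default).
+  - `max`: the maximum value.
+  - `box_and_model_avg`: a weighted average over both boxes and models.
+  - `absent_model_aware_avg`: a weighted average that accounts for absent models.
+- `--eval-single`: whether to evaluate every single model. Defaults to `False`.
+- `--save-fusion-results`: whether to save the fused results. Defaults to `False`.
+- `--out-dir`: directory where the fused results are saved.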
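
To show how `--weights`, `--fusion-iou-thr`, `--skip-box-thr`, and `--conf-type avg` interact, here is a simplified single-image, single-class sketch of the Weighted Boxes Fusion idea. It is an illustrative re-implementation with hypothetical function names, not the code that `fuse_results.py` calls. It also omits per-class grouping and the other confidence types.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_boxes(preds: Sequence[Sequence[Tuple[Box, float]]],
               weights: Sequence[float],
               iou_thr: float = 0.55,
               skip_box_thr: float = 0.0) -> List[Tuple[Box, float]]:
    """Fuse (box, score) predictions from several models for one image and class."""
    # 1. Drop low-confidence boxes, then scale each score by its model weight.
    candidates = [(box, score * w)
                  for model_preds, w in zip(preds, weights)
                  for box, score in model_preds if score >= skip_box_thr]
    candidates.sort(key=lambda item: item[1], reverse=True)

    clusters: List[List[Tuple[Box, float]]] = []
    fused: List[Box] = []
    for box, score in candidates:
        # 2. Match against the current fused boxes; the IoU must exceed iou_thr.
        best_idx, best_iou = -1, iou_thr
        for i, fbox in enumerate(fused):
            overlap = box_iou(box, fbox)
            if overlap > best_iou:
                best_idx, best_iou = i, overlap
        if best_idx < 0:
            clusters.append([(box, score)])
            fused.append(box)
            continue
        # 3. Recompute the fused box as the score-weighted mean of its members.
        members = clusters[best_idx]
        members.append((box, score))
        total = sum(s for _, s in members)
        fused[best_idx] = tuple(
            sum(b[k] * s for b, s in members) / total for k in range(4))

    # 4. 'avg' confidence, rescaled by min(num_models, cluster size) / sum(weights).
    results = []
    for members, fbox in zip(clusters, fused):
        conf = sum(s for _, s in members) / len(members)
        conf *= min(len(weights), len(members)) / sum(weights)
        results.append((fbox, conf))
    return results
```

The rescaling in step 4 lowers the confidence of boxes that only some of the models agree on. For example, a box predicted by just one of two equally weighted models keeps half of its score.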
+
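+**Examples**:
+Assume that you have obtained json result files of three models from `tools/test.py`, at './faster-rcnn_r50-caffe_fpn_1x_coco.json', './retinanet_r50-caffe_fpn_1x_coco.json', and './cascade-rcnn_r50-caffe_fpn_1x_coco.json', and that the ground-truth annotation file is at './annotation.json'.
+
+1. Fuse the predictions of the three models and evaluate the fused result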
+
+```shell
+python tools/analysis_tools/fuse_results.py \
+       ./faster-rcnn_r50-caffe_fpn_1x_coco.json \
+       ./retinanet_r50-caffe_fpn_1x_coco.json \
+       ./cascade-rcnn_r50-caffe_fpn_1x_coco.json \
+       --annotation ./annotation.json \
+       --weights 1 2 3
+```
+
+2. Evaluate each single model as well as the fused result
+
+```shell
+python tools/analysis_tools/fuse_results.py \
+       ./faster-rcnn_r50-caffe_fpn_1x_coco.json \
+       ./retinanet_r50-caffe_fpn_1x_coco.json \
+       ./cascade-rcnn_r50-caffe_fpn_1x_coco.json \
+       --annotation ./annotation.json \
+       --weights 1 2 3 \
+       --eval-single
+```
+
+3. Fuse the predictions of the three models and save the result
+
+```shell
+python tools/analysis_tools/fuse_results.py \
+       ./faster-rcnn_r50-caffe_fpn_1x_coco.json \
+       ./retinanet_r50-caffe_fpn_1x_coco.json \
+       ./cascade-rcnn_r50-caffe_fpn_1x_coco.json \
+       --annotation ./annotation.json \
+       --weights 1 2 3 \
+       --save-fusion-results \
+       --out-dir outputs/fusion
+```
+
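+## Visualization
+
+### Visualize Datasets
+
+`tools/analysis_tools/browse_dataset.py` helps users inspect a detection dataset (both images and annotations) visually, or save the images to a designated directory.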
+
+```shell
+python tools/analysis_tools/browse_dataset.py ${CONFIG} [-h] [--skip-type ${SKIP_TYPE[SKIP_TYPE...]}] [--output-dir ${OUTPUT_DIR}] [--not-show] [--show-interval ${SHOW_INTERVAL}]
+```
+
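+### Visualize Models
+
+First, convert the model to ONNX as described [here](#convert-mmdetection-model-to-onnx-experimental).
+Note that currently only RetinaNet is supported; support for other models will come in later versions.
+The converted model can be visualized with tools such as [Netron](https://github.com/lutzroeder/netron).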
+
+### Visualize Predictions
+
+If you need a lightweight GUI for visualizing the detection results, you can refer to the [DetVisGUI project](https://github.com/Chien-Hung/DetVisGUI/tree/mmdetection).
+
+## Error Analysis
+
+`tools/analysis_tools/coco_error_analysis.py` analyzes the COCO results per category under different criteria, and plots the useful information in charts.
+
+```shell
+python tools/analysis_tools/coco_error_analysis.py ${RESULT} ${OUT_DIR} [-h] [--ann ${ANN}] [--types ${TYPES[TYPES...]}]
+```
+
+Example:
+
+Assume that you have put the [Mask R-CNN checkpoint file](https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_fpn_1x_coco/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth) in the directory 'checkpoint' (for other models, get the checkpoints from the [model zoo](./model_zoo.md)).
+
+To save the bbox results, modify `test_evaluator` as follows:
+
+1. Find the dataset config under `configs/_base_/datasets` that corresponds to the current config.
+2. Replace `test_evaluator` and `test_dataloader` of the original config with the `test_evaluator` and `test_dataloader` from that dataset config.
+3. Run the following command to obtain the bbox or segmentation results in json format.
+
+```shell
+python tools/test.py \
+       configs/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py \
+       checkpoint/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth
+```
+
+1. Get the COCO bbox error results per category, and save the analysis result images to the directory. (In the [config](../../../configs/_base_/datasets/coco_instance.py), the default directory is './work_dirs/coco_instance/test'.)
+
+```shell
+python tools/analysis_tools/coco_error_analysis.py \
+       results.bbox.json \
+       results \
+       --ann=data/coco/annotations/instances_val2017.json
+```
+
+2. Get the COCO segmentation error results per category, and save the analysis result images to the directory.
+
+```shell
+python tools/analysis_tools/coco_error_analysis.py \
+       results.segm.json \
+       results \
+       --ann=data/coco/annotations/instances_val2017.json \
+       --types='segm'
+```
+
+## Model Serving
+
+To serve an `MMDetection` model with [`TorchServe`](https://pytorch.org/serve/), follow these steps:
+
+### 1. Install TorchServe
+
+Assuming you have a `Python` environment with `PyTorch` and `MMDetection` successfully installed, run the following command to install `TorchServe` and its dependencies. For more installation options, refer to the [quick start](https://github.com/pytorch/serve/blob/master/README.md#serve-a-model).
+
+```shell
+python -m pip install torchserve torch-model-archiver torch-workflow-archiver nvgpu
+```
+
+**Note**: If you want to use `TorchServe` in docker, refer to [torchserve docker](https://github.com/pytorch/serve/blob/master/docker/README.md).
+
+### 2. Convert the MMDetection model to TorchServe
+
+```shell
+python tools/deployment/mmdet2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
+--output-folder ${MODEL_STORE} \
+--model-name ${MODEL_NAME}
+```
+
+### 3. Start `TorchServe`
+
+```shell
+torchserve --start --ncs \
+  --model-store ${MODEL_STORE} \
+  --models  ${MODEL_NAME}.mar
+```
+
+### 4. Test the deployment
+
+```shell
+curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/3dogs.jpg
+curl http://127.0.0.1:8080/predictions/${MODEL_NAME} -T 3dogs.jpg
+```
+
+You should get a JSON response like the following:
+
+```json
+[
+  {
+    "class_label": 16,
+    "class_name": "dog",
+    "bbox": [
+      294.63409423828125,
+      203.99111938476562,
+      417.048583984375,
+      281.62744140625
+    ],
+    "score": 0.9987992644309998
+  },
+  {
+    "class_label": 16,
+    "class_name": "dog",
+    "bbox": [
+      404.26019287109375,
+      126.0080795288086,
+      574.5091552734375,
+      293.6662292480469
+    ],
+    "score": 0.9979367256164551
+  },
+  {
+    "class_label": 16,
+    "class_name": "dog",
+    "bbox": [
+      197.2144775390625,
+      93.3067855834961,
+      307.8505554199219,
+      276.7560119628906
+    ],
+    "score": 0.993338406085968
+  }
+]
+```
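
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the `curl -T` call above, which uploads the raw image bytes with an HTTP PUT to `http://127.0.0.1:8080/predictions/${MODEL_NAME}`. It assumes the response has the shape shown above. `predict` and `keep_confident` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Dict, List


def predict(image_path: str, model_name: str,
            host: str = 'http://127.0.0.1:8080') -> List[Dict]:
    """Upload an image to TorchServe and return the decoded JSON detections."""
    with open(image_path, 'rb') as f:
        request = urllib.request.Request(
            f'{host}/predictions/{model_name}', data=f.read(), method='PUT')
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def keep_confident(detections: List[Dict], score_thr: float = 0.5) -> List[Dict]:
    """Keep only detections whose score is at least score_thr."""
    return [det for det in detections if det['score'] >= score_thr]
```

With a running server, `keep_confident(predict('3dogs.jpg', MODEL_NAME), 0.995)` would keep only the two highest-scoring dogs from the response above.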
+
+#### Compare results
+
+You can also use `test_torchserver.py` to compare the results of `TorchServe` and `PyTorch`, and visualize them:
+
+```shell
+python tools/deployment/test_torchserver.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${MODEL_NAME} \
+[--inference-addr ${INFERENCE_ADDR}] [--device ${DEVICE}] [--score-thr ${SCORE_THR}] [--work-dir ${WORK_DIR}]
+```
+
+Example:
+
+```shell
+python tools/deployment/test_torchserver.py \
+demo/demo.jpg \
+configs/yolo/yolov3_d53_8xb8-320-273e_coco.py \
+checkpoint/yolov3_d53_320_273e_coco-421362b6.pth \
+yolov3 \
+--work-dir ./work-dir
+```
+
+### 5. Stop `TorchServe`
+
+```shell
+torchserve --stop
+```
+
+## Model Complexity
+
+`tools/analysis_tools/get_flops.py` computes the FLOPs and the number of parameters of a given model (adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch)).
+
+```shell
+python tools/analysis_tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
+```
+
+You will get a result like this:
+
+```text
+==============================
+Input shape: (3, 1280, 800)
+Flops: 239.32 GFLOPs
+Params: 37.74 M
+==============================
+```
+
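+**Note**: This tool is still experimental and we do not guarantee that the numbers are absolutely correct. You may use the results for simple comparisons, but double-check them before adopting them in technical reports or papers.
+
+1. FLOPs depend on the input shape, while the number of parameters does not. The default input shape is (1, 3, 1280, 800).
+2. Some operators, such as GN and other custom operators, are not counted in the FLOPs. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/2.x/mmcv/cnn/utils/flops_counter.py) for details.
+3. The FLOPs of two-stage detectors depend on the number of proposals.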
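
Point 1 can be checked by hand for a single convolution. The sketch below counts the parameters and the multiply-accumulate operations (MACs) of a 2D convolution with a bias term. It is an illustration, not the counter used by `get_flops.py`. Whether a given tool reports MACs or 2 × MACs as "FLOPs" is a convention worth checking before you compare numbers across tools.

```python
from typing import Tuple


def conv2d_complexity(c_in: int, c_out: int, kernel: int, input_hw: Tuple[int, int],
                      stride: int = 1, padding: int = 0) -> Tuple[int, int]:
    """Return (params, MACs) of a Conv2d with a square kernel and a bias term."""
    h, w = input_hw
    h_out = (h + 2 * padding - kernel) // stride + 1
    w_out = (w + 2 * padding - kernel) // stride + 1
    params = c_out * (c_in * kernel * kernel + 1)  # weights + one bias per output channel
    # Each output element needs c_in * kernel * kernel multiply-accumulates.
    macs = c_out * h_out * w_out * c_in * kernel * kernel
    return params, macs
```

Halving both sides of the input (224 × 224 to 112 × 112) leaves the 1792 parameters of a 3→64 channel 3 × 3 convolution unchanged but cuts its MACs by a factor of four.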
+
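+## Model Conversion
+
+### Convert MMDetection Model to ONNX (experimental)
+
+We provide a script to convert models to the [ONNX](https://github.com/onnx/onnx) format. It also supports comparing the outputs of the PyTorch and ONNX models for verification. For more details, refer to [mmdeploy](https://github.com/open-mmlab/mmdeploy).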
+
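+### Convert MMDetection 1.x Models to MMDetection 2.x
+
+`tools/model_converters/upgrade_model_version.py` upgrades old MMDetection checkpoints to the new version. Note that this script is not guaranteed to work if breaking changes are introduced in newer versions; we recommend using the new checkpoints directly.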
+
+```shell
+python tools/model_converters/upgrade_model_version.py ${IN_FILE} ${OUT_FILE} [-h] [--num-classes NUM_CLASSES]
+```
+
+### Convert RegNet Models to MMDetection
+
+`tools/model_converters/regnet2mmdet.py` converts the keys of pycls pretrained RegNet models to MMDetection style.
+
+```shell
+python tools/model_converters/regnet2mmdet.py ${SRC} ${DST} [-h]
+```
+
+### Convert Detectron ResNet to PyTorch
+
+`tools/model_converters/detectron2pytorch.py` converts the keys of the original Detectron pretrained ResNet models to PyTorch style.
+
+```shell
+python tools/model_converters/detectron2pytorch.py ${SRC} ${DST} ${DEPTH} [-h]
+```
+
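+### Prepare a Model for Publishing
+
+`tools/model_converters/publish_model.py` helps prepare a model for publishing.
+
+Before you upload a model to AWS, you may want to:
+
+1. convert the model weights to CPU tensors;
+2. delete the optimizer states;
+3. compute the hash of the checkpoint file and append the hash id to the filename.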
+
+```shell
+python tools/model_converters/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
+```
+
+Example:
+
+```shell
+python tools/model_converters/publish_model.py work_dirs/faster_rcnn/latest.pth faster_rcnn_r50_fpn_1x_20190801.pth
+```
+
+The final output filename will be `faster_rcnn_r50_fpn_1x_20190801-{hash id}.pth`.
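
Step 3 amounts to hashing the file and appending a short prefix of the digest to its name. The sketch below assumes a SHA-256 digest truncated to 8 hex characters. That choice fits the `-{hash id}` pattern above, but it is an assumption about the script. `hashed_filename` is a hypothetical helper, and steps 1 and 2 (moving tensors to CPU, dropping the optimizer state) are not shown.

```python
import hashlib
import os


def hashed_filename(path: str, num_chars: int = 8) -> str:
    """Return `<stem>-<first num_chars of sha256><ext>` for the file at path."""
    sha256 = hashlib.sha256()
    with open(path, 'rb') as f:
        # Read in 1 MiB chunks so large checkpoints need not fit in memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b''):
            sha256.update(chunk)
    stem, ext = os.path.splitext(path)
    return f'{stem}-{sha256.hexdigest()[:num_chars]}{ext}'
```

`os.path.splitext` is used instead of `str.rstrip('.pth')`, because `rstrip` strips a set of characters rather than a suffix and can eat the end of names such as `model_depth.pth`.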
+
+## Dataset Conversion
+
+`tools/dataset_converters/` contains tools to convert the Cityscapes and Pascal VOC datasets to the COCO format.
+
+```shell
+python tools/dataset_converters/cityscapes.py ${CITYSCAPES_PATH} [-h] [--img-dir ${IMG_DIR}] [--gt-dir ${GT_DIR}] [-o ${OUT_DIR}] [--nproc ${NPROC}]
+python tools/dataset_converters/pascal_voc.py ${DEVKIT_PATH} [-h] [-o ${OUT_DIR}]
+```
+
+## Dataset Download
+
+`tools/misc/download_dataset.py` downloads datasets such as COCO, VOC, and LVIS.
+
+```shell
+python tools/misc/download_dataset.py --dataset-name coco2017
+python tools/misc/download_dataset.py --dataset-name voc2007
+python tools/misc/download_dataset.py --dataset-name lvis
+```
+
+Users in mainland China can also get these datasets from the open-source data platform [OpenDataLab](https://opendatalab.com/?source=OpenMMLab%20GitHub) for a better download experience:
+
+- [COCO2017](https://opendatalab.com/COCO_2017/download?source=OpenMMLab%20GitHub)
+- [VOC2007](https://opendatalab.com/PASCAL_VOC2007/download?source=OpenMMLab%20GitHub)
+- [VOC2012](https://opendatalab.com/PASCAL_VOC2012/download?source=OpenMMLab%20GitHub)
+- [LVIS](https://opendatalab.com/LVIS/download?source=OpenMMLab%20GitHub)
+
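+## Benchmark
+
+### Robustness Benchmark
+
+`tools/analysis_tools/test_robustness.py` and `tools/analysis_tools/robustness_eval.py` help users evaluate the robustness of a model. The core idea comes from [Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming](https://arxiv.org/abs/1907.07484). To learn how to evaluate models on corrupted images, and for a set of standard models on this benchmark, refer to [robustness_benchmarking.md](robustness_benchmarking.md).
+
+### FPS Benchmark
+
+`tools/analysis_tools/benchmark.py` helps users calculate FPS. The measured time covers both the model forward pass and post-processing. To obtain a more accurate value, it currently supports only a single GPU in distributed launch mode.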
+
+```shell
+python -m torch.distributed.launch --nproc_per_node=1 --master_port=${PORT} tools/analysis_tools/benchmark.py \
+    ${CONFIG} \
+    [--checkpoint ${CHECKPOINT}] \
+    [--repeat-num ${REPEAT_NUM}] \
+    [--max-iter ${MAX_ITER}] \
+    [--log-interval ${LOG_INTERVAL}] \
+    --launcher pytorch
+```
+
+Example: assume that you have downloaded the `Faster R-CNN` checkpoint and placed it in `checkpoints/`.
+
+```shell
+python -m torch.distributed.launch --nproc_per_node=1 --master_port=29500 tools/analysis_tools/benchmark.py \
+       configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py \
+       --checkpoint checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
+       --launcher pytorch
+```
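
The FPS value is essentially the number of images processed divided by the wall-clock time. The sketch below times any callable with `time.perf_counter` and skips a few warm-up iterations so one-off costs such as CUDA context creation do not distort the average. The warm-up handling is an assumption for illustration, not a description of `benchmark.py`, and `measure_fps` and `fps_from_durations` are hypothetical names. On a GPU you would also call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels run asynchronously.

```python
import time
from typing import Callable, Sequence


def fps_from_durations(durations: Sequence[float], num_warmup: int = 5) -> float:
    """Images per second from per-image durations, ignoring the first num_warmup."""
    timed = durations[num_warmup:]
    total = sum(timed)
    if not timed or total <= 0:
        raise ValueError('need timed iterations beyond the warm-up')
    return len(timed) / total


def measure_fps(run_once: Callable[[], None], num_iters: int = 50,
                num_warmup: int = 5) -> float:
    """Call run_once (forward + post-processing of one image) repeatedly and time it."""
    durations = []
    for _ in range(num_iters):
        start = time.perf_counter()
        run_once()
        durations.append(time.perf_counter() - start)
    return fps_from_durations(durations, num_warmup)
```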
+
+## Miscellaneous
+
+### Evaluating a Metric
+
+`tools/analysis_tools/eval_metric.py` evaluates a pkl result file according to the evaluation settings in the config.
+
+```shell
+python tools/analysis_tools/eval_metric.py ${CONFIG} ${PKL_RESULTS} [-h] [--format-only] [--eval ${EVAL[EVAL ...]}]
+                      [--cfg-options ${CFG_OPTIONS [CFG_OPTIONS ...]}]
+                      [--eval-options ${EVAL_OPTIONS [EVAL_OPTIONS ...]}]
+```
+
+### Print the Entire Config
+
+`tools/misc/print_config.py` expands all inherited configs and prints the complete config file.
+
+```shell
+python tools/misc/print_config.py ${CONFIG} [-h] [--options ${OPTIONS [OPTIONS...]}]
+```
+
+## Hyper-parameter Optimization
+
+### YOLO Anchor Optimization
+
+`tools/analysis_tools/optimize_anchors.py` provides two methods to optimize YOLO anchors.
+
+One method uses k-means anchor clustering, which originates from [darknet](https://github.com/AlexeyAB/darknet/blob/master/src/detector.c#L1421).
+
+```shell
+python tools/analysis_tools/optimize_anchors.py ${CONFIG} --algorithm k-means --input-shape ${INPUT_SHAPE [WIDTH HEIGHT]} --output-dir ${OUTPUT_DIR}
+```
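
The k-means method clusters the (width, height) pairs of the ground-truth boxes. It uses the IoU between boxes aligned at a common corner as the similarity, so large and small boxes are treated fairly. The sketch below is a simplified stand-alone version of that idea with hypothetical names (`kmeans_anchors`, `wh_iou`). It is not the optimizer that `optimize_anchors.py` uses, and the average best IoU it returns is only analogous to the `Average IOU` value the tool logs.

```python
import random
from typing import List, Sequence, Tuple

WH = Tuple[float, float]  # (width, height) of a box


def wh_iou(a: WH, b: WH) -> float:
    """IoU of two boxes that share the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(whs: Sequence[WH], k: int, max_iters: int = 100,
                   seed: int = 0) -> Tuple[List[WH], float]:
    """Cluster box sizes into k anchors; return (anchors sorted by area, average IoU)."""
    rng = random.Random(seed)
    centers = rng.sample(list(whs), k)
    for _ in range(max_iters):
        # Assign every box to the anchor it overlaps most (distance = 1 - IoU).
        clusters: List[List[WH]] = [[] for _ in range(k)]
        for wh in whs:
            best = max(range(k), key=lambda i: wh_iou(wh, centers[i]))
            clusters[best].append(wh)
        # Move each anchor to the mean size of its cluster (keep it if the cluster is empty).
        new_centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    centers.sort(key=lambda wh: wh[0] * wh[1])
    avg_iou = sum(max(wh_iou(wh, c) for c in centers) for wh in whs) / len(whs)
    return centers, avg_iou
```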
+
+The other method uses differential evolution to optimize the anchors.
+
+```shell
+python tools/analysis_tools/optimize_anchors.py ${CONFIG} --algorithm differential_evolution --input-shape ${INPUT_SHAPE [WIDTH HEIGHT]} --output-dir ${OUTPUT_DIR}
+```
+
+Example:
+
+```shell
+python tools/analysis_tools/optimize_anchors.py configs/yolo/yolov3_d53_8xb8-320-273e_coco.py --algorithm differential_evolution --input-shape 608 608 --device cuda --output-dir work_dirs
+```
+
+You may see output like this:
+
+```text
+loading annotations into memory...
+Done (t=9.70s)
+creating index...
+index created!
+2021-07-19 19:37:20,951 - mmdet - INFO - Collecting bboxes from annotation...
+[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 117266/117266, 15874.5 task/s, elapsed: 7s, ETA:     0s
+
+2021-07-19 19:37:28,753 - mmdet - INFO - Collected 849902 bboxes.
+differential_evolution step 1: f(x)= 0.506055
+differential_evolution step 2: f(x)= 0.506055
+......
+
+differential_evolution step 489: f(x)= 0.386625
+2021-07-19 19:46:40,775 - mmdet - INFO Anchor evolution finish. Average IOU: 0.6133754253387451
+2021-07-19 19:46:40,776 - mmdet - INFO Anchor differential evolution result:[[10, 12], [15, 30], [32, 22], [29, 59], [61, 46], [57, 116], [112, 89], [154, 198], [349, 336]]
+2021-07-19 19:46:40,798 - mmdet - INFO Result saved in work_dirs/anchor_optimize_result.json
+```
+
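+## Confusion Matrix
+
+A confusion matrix is a summary of the detection results.
+`tools/analysis_tools/confusion_matrix.py` analyzes the prediction results and plots a confusion matrix.
+First, run `tools/test.py` to save the `.pkl` prediction results.
+Then run:
+
+```shell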
+python tools/analysis_tools/confusion_matrix.py ${CONFIG}  ${DETECTION_RESULTS}  ${SAVE_DIR} --show
+```
+
+You will get a confusion matrix like this:
+
+![confusion_matrix_example](https://user-images.githubusercontent.com/12907710/140513068-994cdbf4-3a4a-48f0-8fd8-2830d93fd963.png)
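
The matrix is built image by image by matching detections to ground-truth boxes. The simplified sketch below shows the usual bookkeeping with an extra background row and column: a ground truth with no overlapping detection counts as a miss, and a detection that overlaps no ground truth counts as a false positive. The default thresholds and the function name are assumptions for illustration. `confusion_matrix.py` has its own options and matching details.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def update_confusion_matrix(matrix: List[List[int]],
                            gts: Sequence[Tuple[Box, int]],
                            dets: Sequence[Tuple[Box, int, float]],
                            iou_thr: float = 0.5,
                            score_thr: float = 0.3) -> None:
    """Accumulate one image into matrix (rows: ground truth, cols: prediction).

    The last row and column stand for background.
    """
    bg = len(matrix) - 1
    dets = [d for d in dets if d[2] >= score_thr]
    matched = [False] * len(dets)
    for gt_box, gt_label in gts:
        hit = False
        for j, (det_box, det_label, _) in enumerate(dets):
            if box_iou(gt_box, det_box) >= iou_thr:
                matrix[gt_label][det_label] += 1  # correct or confused class
                matched[j] = True
                hit = True
        if not hit:
            matrix[gt_label][bg] += 1  # missed ground truth
    for j, (_, det_label, _) in enumerate(dets):
        if not matched[j]:
            matrix[bg][det_label] += 1  # false positive on background
```

Start from a `(num_classes + 1) x (num_classes + 1)` matrix of zeros and call the function once per image.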
+
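+## COCO Separated & Occluded Mask Metric
+
+Detecting occluded objects remains a challenge for state-of-the-art object detectors.
+We implement the metric proposed in [A Tri-Layer Plugin to Improve Occluded Detection](https://arxiv.org/abs/2210.10046) to compute the recall of separated and occluded objects.
+
+There are two ways to use this metric: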
+
+### Offline evaluation
+
+We provide a script that computes the metric from saved detection results.
+
+First, use `tools/test.py` to save the detection results:
+
+```shell
+python tools/test.py ${CONFIG} ${MODEL_PATH} --out results.pkl
+```
+
+Then run `tools/analysis_tools/coco_occluded_separated_recall.py` to compute the mask recall of separated and occluded objects:
+
+```shell
+python tools/analysis_tools/coco_occluded_separated_recall.py results.pkl --out occluded_separated_recall.json
+```
+
+The output is as follows:
+
+```text
+loading annotations into memory...
+Done (t=0.51s)
+creating index...
+index created!
+processing detection results...
+[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 109.3 task/s, elapsed: 46s, ETA:     0s
+computing occluded mask recall...
+[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5550/5550, 780.5 task/s, elapsed: 7s, ETA:     0s
+COCO occluded mask recall: 58.79%
+COCO occluded mask success num: 3263
+computing separated mask recall...
+[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 3522/3522, 778.3 task/s, elapsed: 5s, ETA:     0s
+COCO separated mask recall: 31.94%
+COCO separated mask success num: 1125
+
+-----------+--------+-------------+
+| mask type | recall | num correct |
+-----------+--------+-------------+
+| occluded  | 58.79% | 3263        |
+| separated | 31.94% | 1125        |
+-----------+--------+-------------+
+Evaluation results have been saved to occluded_separated_recall.json.
+```
+
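+### Online evaluation
+
+We implement `CocoOccludedSeparatedMetric`, which inherits from `CocoMetric`.
+To evaluate the recall of separated and occluded masks during training, simply replace the evaluator type with `CocoOccludedSeparatedMetric` in the config: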
+
+```python
+val_evaluator = dict(
+    type='CocoOccludedSeparatedMetric',  # modify here
+    ann_file=data_root + 'annotations/instances_val2017.json',
+    metric=['bbox', 'segm'],
+    format_only=False)
+test_evaluator = val_evaluator
+```
+
+If you use this metric, please cite the paper:
+
+```latex
+@article{zhan2022triocc,
+    title={A Tri-Layer Plugin to Improve Occluded Detection},
+    author={Zhan, Guanqi and Xie, Weidi and Zisserman, Andrew},
+    journal={British Machine Vision Conference},
+    year={2022}
+}
+```
--- a/docs/zh_cn/user_guides/visualization.md
+++ b/docs/zh_cn/user_guides/visualization.md
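+# Visualization
+
+Before reading this tutorial, we recommend reading MMEngine's [Visualization](https://github.com/open-mmlab/mmengine/blob/main/docs/en/advanced_tutorials/visualization.md) documentation for a first understanding of the definition and usage of `Visualizer`.
+
+In short, `Visualizer` is implemented in MMEngine to meet everyday visualization needs, and it has three main functions:
+
+- It implements common drawing APIs. For example, [`draw_bboxes`](mmengine.visualization.Visualizer.draw_bboxes) draws bounding boxes and [`draw_lines`](mmengine.visualization.Visualizer.draw_lines) draws lines.
+- It supports writing visualization results, learning rate curves, loss curves, and validation accuracy curves to various backends, including the local disk and common deep learning training logging tools such as [TensorBoard](https://www.tensorflow.org/tensorboard) and [Wandb](https://wandb.ai/site).
+- It can be called anywhere in the code to visualize or record intermediate states of the model during training or testing, such as feature maps and validation results.
+
+Based on MMEngine's `Visualizer`, MMDet provides various pre-built visualization tools, which users can use simply by modifying the config.
+
+- The `tools/analysis_tools/browse_dataset.py` script provides dataset visualization: it draws images and the corresponding annotations after the data transforms. See [`browse_dataset.py`](useful_tools.md#Visualization) for details.
+
+- MMEngine implements `LoggerHook`, which uses `Visualizer` to write the learning rate, losses, and evaluation results to the backends set by `Visualizer`. Therefore, by changing the `Visualizer` backend in the config, e.g. to `TensorboardVisBackend` or `WandbVisBackend`, you can log to common training logging tools such as `TensorBoard` or `WandB`, and then use these tools to analyze and monitor the training process.
+
+- MMDet implements `DetVisualizationHook`, which uses `Visualizer` to visualize or store the prediction results of the validation or prediction stage to the backends set by `Visualizer`. Therefore, by changing the `Visualizer` backend in the config, e.g. to `TensorboardVisBackend` or `WandbVisBackend`, you can store the prediction images in `TensorBoard` or `Wandb`.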
+
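+## Configuration
+
+Thanks to the registry mechanism, the behavior of `Visualizer` in MMDet can be set by modifying the config. Usually, the default visualizer config is defined in `configs/_base_/default_runtime.py`; see the [config tutorial](config.md) for details.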
+
+```Python
+vis_backends = [dict(type='LocalVisBackend')]
+visualizer = dict(
+    type='DetLocalVisualizer',
+    vis_backends=vis_backends,
+    name='visualizer')
+```
+
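+From the example above, we can see that the `Visualizer` config consists of two main parts: the `Visualizer` type, and the visualization backends `vis_backends` it uses.
+
+- Users can directly use `DetLocalVisualizer` to visualize labels or predictions for the supported tasks.
+- By default, MMDet sets the visualization backend `vis_backend` to the local backend `LocalVisBackend`, which saves all visualization results and other training information to a local folder.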
+
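+## Storage
+
+MMDet uses the local visualization backend [`LocalVisBackend`](mmengine.visualization.LocalVisBackend) by default. The losses, learning rate, and evaluation accuracy recorded by `LoggerHook`, together with the visualization results from `DetVisualizationHook`, are saved to the `{work_dir}/{config_name}/{time}/{vis_data}` folder by default. MMDet also supports other common visualization backends, such as `TensorboardVisBackend` and `WandbVisBackend`; you only need to change the `vis_backends` type in the config to the corresponding backend. For example, inserting the following code block into the config stores the data in `TensorBoard` and `Wandb`.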
+
+```Python
+# https://mmengine.readthedocs.io/en/latest/api/visualization.html
+_base_.visualizer.vis_backends = [
+    dict(type='LocalVisBackend'),
+    dict(type='TensorboardVisBackend'),
+    dict(type='WandbVisBackend'),
+]
+```
+
+## 绘图
+
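+### Plot prediction results
+
+MMDet mainly uses [`DetVisualizationHook`](mmdet.engine.hooks.DetVisualizationHook) to plot the prediction results of validation and testing. `DetVisualizationHook` is disabled by default, and its default config is as follows.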
+
+```Python
+visualization = dict(  # used to visualize validation and testing results
+    type='DetVisualizationHook',
+    draw=False,
+    interval=1,
+    show=False)
+```
+
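+The following table shows the arguments supported by `DetVisualizationHook`.
+
+| Argument |                                              Description                                               |
+| :------: | :----------------------------------------------------------------------------------------------------: |
+|   draw   |                   Turns `DetVisualizationHook` on and off. It is off by default.                       |
+| interval | When `DetVisualizationHook` is enabled, the interval, in iterations, at which results are stored or shown. |
+|   show   |                      Controls whether to display the validation or test results.                       |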
+
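+To enable `DetVisualizationHook` during training or testing, you only need to modify the config. Taking `configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py` as an example, to draw both annotations and predictions and display the images, modify the config as follows: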
+
+```Python
+visualization = _base_.default_hooks.visualization
+visualization.update(dict(draw=True, show=True))
+```
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/17425982/224883427-1294a7ba-14ab-4d93-9152-55a7b270b1f1.png" height="300"/>
+</div>
+
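+`test.py` provides the `--show` and `--show-dir` arguments to visualize annotations and predictions during testing without modifying the config, which further simplifies the testing process.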
+
+```Shell
+# Show the test results
+python tools/test.py configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py https://download.openmmlab.com/mmdetection/v3.0/rtmdet/rtmdet_tiny_8xb32-300e_coco/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth --show
+
+# Specify where to store the prediction results
+python tools/test.py configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py https://download.openmmlab.com/mmdetection/v3.0/rtmdet/rtmdet_tiny_8xb32-300e_coco/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth --show-dir imgs/
+```
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/17425982/224883427-1294a7ba-14ab-4d93-9152-55a7b270b1f1.png" height="300"/>
+</div>
--- a/mmdet/__init__.py
+++ b/mmdet/__init__.py
+# Copyright (c) OpenMMLab. All rights reserved.
+import mmcv
+import mmengine
+from mmengine.utils import digit_version
+
+from .version import __version__, version_info
+
+mmcv_minimum_version = '2.0.0rc4'
+mmcv_maximum_version = '2.2.0'
+mmcv_version = digit_version(mmcv.__version__)
+
+mmengine_minimum_version = '0.7.1'
+mmengine_maximum_version = '1.0.0'
+mmengine_version = digit_version(mmengine.__version__)
+
+assert (mmcv_version >= digit_version(mmcv_minimum_version)
+        and mmcv_version < digit_version(mmcv_maximum_version)), \
+    f'MMCV=={mmcv.__version__} is used but incompatible. ' \
+    f'Please install mmcv>={mmcv_minimum_version}, <{mmcv_maximum_version}.'
+
+assert (mmengine_version >= digit_version(mmengine_minimum_version)
+        and mmengine_version < digit_version(mmengine_maximum_version)), \
+    f'MMEngine=={mmengine.__version__} is used but incompatible. ' \
+    f'Please install mmengine>={mmengine_minimum_version}, ' \
+    f'<{mmengine_maximum_version}.'
+
+__all__ = ['__version__', 'version_info', 'digit_version']
--- a/mmdet/apis/__init__.py
+++ b/mmdet/apis/__init__.py
+# Copyright (c) OpenMMLab. All rights reserved.
+from .det_inferencer import DetInferencer
+from .inference import (async_inference_detector, inference_detector,
+                        inference_mot, init_detector, init_track_model)
+
+__all__ = [
+    'init_detector', 'async_inference_detector', 'inference_detector',
+    'DetInferencer', 'inference_mot', 'init_track_model'
+]
--- a/mmdet/apis/det_inferencer.py
+++ b/mmdet/apis/det_inferencer.py
+# Copyright (c) OpenMMLab. All rights reserved.
+import copy
+import os.path as osp
+import warnings
+from typing import Dict, Iterable, List, Optional, Sequence, Tuple, Union
+
+import mmcv
+import mmengine
+import numpy as np
+import torch.nn as nn
+from mmcv.transforms import LoadImageFromFile
+from mmengine.dataset import Compose
+from mmengine.fileio import (get_file_backend, isdir, join_path,
+                             list_dir_or_file)
+from mmengine.infer.infer import BaseInferencer, ModelType
+from mmengine.model.utils import revert_sync_batchnorm
+from mmengine.registry import init_default_scope
+from mmengine.runner.checkpoint import _load_checkpoint_to_model
+from mmengine.visualization import Visualizer
+from rich.progress import track
+
+from mmdet.evaluation import INSTANCE_OFFSET
+from mmdet.registry import DATASETS
+from mmdet.structures import DetDataSample
+from mmdet.structures.mask import encode_mask_results, mask2bbox
+from mmdet.utils import ConfigType
+from ..evaluation import get_classes
+
+try:
+    from panopticapi.evaluation import VOID
+    from panopticapi.utils import id2rgb
+except ImportError:
+    id2rgb = None
+    VOID = None
+
+InputType = Union[str, np.ndarray]
+InputsType = Union[InputType, Sequence[InputType]]
+PredType = List[DetDataSample]
+ImgType = Union[np.ndarray, Sequence[np.ndarray]]
+
+IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif',
+                  '.tiff', '.webp')
+
+
+class DetInferencer(BaseInferencer):
+    """Object Detection Inferencer.
+
+    Args:
+        model (str, optional): Path to the config file or the model name
+            defined in metafile. For example, it could be
+            "rtmdet-s" or 'rtmdet_s_8xb32-300e_coco' or
+            "configs/rtmdet/rtmdet_s_8xb32-300e_coco.py".
+            If model is not specified, user must provide the
+            `weights` saved by MMEngine which contains the config string.
+            Defaults to None.
+        weights (str, optional): Path to the checkpoint. If it is not specified
+            and model is a model name of metafile, the weights will be loaded
+            from metafile. Defaults to None.
+        device (str, optional): Device to run inference. If None, the available
+            device will be automatically used. Defaults to None.
+        scope (str, optional): The scope of the model. Defaults to mmdet.
+        palette (str): Color palette used for visualization. The order of
+            priority is palette -> config -> checkpoint. Defaults to 'none'.
+        show_progress (bool): Control whether to display the progress
+            bar during the inference process. Defaults to True.
+    """
+
+    preprocess_kwargs: set = set()
+    forward_kwargs: set = set()
+    visualize_kwargs: set = {
+        'return_vis',
+        'show',
+        'wait_time',
+        'draw_pred',
+        'pred_score_thr',
+        'img_out_dir',
+        'no_save_vis',
+    }
+    postprocess_kwargs: set = {
+        'print_result',
+        'pred_out_dir',
+        'return_datasamples',
+        'no_save_pred',
+    }
+
+    def __init__(self,
+                 model: Optional[Union[ModelType, str]] = None,
+                 weights: Optional[str] = None,
+                 device: Optional[str] = None,
+                 scope: Optional[str] = 'mmdet',
+                 palette: str = 'none',
+                 show_progress: bool = True) -> None:
+        # A global counter tracking the number of images processed, for
+        # naming of the output images
+        self.num_visualized_imgs = 0
+        self.num_predicted_imgs = 0
+        self.palette = palette
+        init_default_scope(scope)
+        super().__init__(
+            model=model, weights=weights, device=device, scope=scope)
+        self.model = revert_sync_batchnorm(self.model)
+        self.show_progress = show_progress
+
+    def _load_weights_to_model(self, model: nn.Module,
+                               checkpoint: Optional[dict],
+                               cfg: Optional[ConfigType]) -> None:
+        """Loading model weights and meta information from cfg and checkpoint.
+
+        Args:
+            model (nn.Module): Model to load weights and meta information.
+            checkpoint (dict, optional): The loaded checkpoint.
+            cfg (Config or ConfigDict, optional): The loaded config.
+        """
+
+        if checkpoint is not None:
+            _load_checkpoint_to_model(model, checkpoint)
+            checkpoint_meta = checkpoint.get('meta', {})
+            # save the dataset_meta in the model for convenience
+            if 'dataset_meta' in checkpoint_meta:
+                # mmdet 3.x, all keys should be lowercase
+                model.dataset_meta = {
+                    k.lower(): v
+                    for k, v in checkpoint_meta['dataset_meta'].items()
+                }
+            elif 'CLASSES' in checkpoint_meta:
+                # < mmdet 3.x
+                classes = checkpoint_meta['CLASSES']
+                model.dataset_meta = {'classes': classes}
+            else:
+                warnings.warn(
+                    'dataset_meta or class names are not saved in the '
+                    'checkpoint\'s meta data, use COCO classes by default.')
+                model.dataset_meta = {'classes': get_classes('coco')}
+        else:
+            warnings.warn('Checkpoint is not loaded, and the inference '
+                          'result is calculated by the randomly initialized '
+                          'model!')
+            warnings.warn('weights is None, use COCO classes by default.')
+            model.dataset_meta = {'classes': get_classes('coco')}
+
+        # Priority:  args.palette -> config -> checkpoint
+        if self.palette != 'none':
+            model.dataset_meta['palette'] = self.palette
+        else:
+            test_dataset_cfg = copy.deepcopy(cfg.test_dataloader.dataset)
+            # lazy init. We only need the metainfo.
+            test_dataset_cfg['lazy_init'] = True
+            metainfo = DATASETS.build(test_dataset_cfg).metainfo
+            cfg_palette = metainfo.get('palette', None)
+            if cfg_palette is not None:
+                model.dataset_meta['palette'] = cfg_palette
+            else:
+                if 'palette' not in model.dataset_meta:
+                    warnings.warn(
+                        'palette does not exist, random is used by default. '
+                        'You can also set the palette to customize.')
+                    model.dataset_meta['palette'] = 'random'
+
+    def _init_pipeline(self, cfg: ConfigType) -> Compose:
+        """Initialize the test pipeline."""
+        pipeline_cfg = cfg.test_dataloader.dataset.pipeline
+
+        # For inference, the key of ``img_id`` is not used.
+        if 'meta_keys' in pipeline_cfg[-1]:
+            pipeline_cfg[-1]['meta_keys'] = tuple(
+                meta_key for meta_key in pipeline_cfg[-1]['meta_keys']
+                if meta_key != 'img_id')
+
+        load_img_idx = self._get_transform_idx(
+            pipeline_cfg, ('LoadImageFromFile', LoadImageFromFile))
+        if load_img_idx == -1:
+            raise ValueError(
+                'LoadImageFromFile is not found in the test pipeline')
+        pipeline_cfg[load_img_idx]['type'] = 'mmdet.InferencerLoader'
+        return Compose(pipeline_cfg)
+
+    def _get_transform_idx(self, pipeline_cfg: ConfigType,
+                           name: Union[str, Tuple[str, type]]) -> int:
+        """Returns the index of the transform in a pipeline.
+
+        If the transform is not found, returns -1.
+        """
+        for i, transform in enumerate(pipeline_cfg):
+            if transform['type'] in name:
+                return i
+        return -1
+
+    def _init_visualizer(self, cfg: ConfigType) -> Optional[Visualizer]:
+        """Initialize visualizers.
+
+        Args:
+            cfg (ConfigType): Config containing the visualizer information.
+
+        Returns:
+            Visualizer or None: Visualizer initialized with config.
+        """
+        visualizer = super()._init_visualizer(cfg)
+        visualizer.dataset_meta = self.model.dataset_meta
+        return visualizer
+
+    def _inputs_to_list(self, inputs: InputsType) -> list:
+        """Preprocess the inputs to a list.
+
+        Preprocess inputs to a list according to its type:
+
+        - list or tuple: return inputs
+        - str:
+            - Directory path: return all files in the directory
+            - other cases: return a list containing the string. The string
+              could be a path to file, a url or other types of string according
+              to the task.
+
+        Args:
+            inputs (InputsType): Inputs for the inferencer.
+
+        Returns:
+            list: List of input for the :meth:`preprocess`.
+        """
+        if isinstance(inputs, str):
+            backend = get_file_backend(inputs)
+            if hasattr(backend, 'isdir') and isdir(inputs):
+                # Backends like HttpsBackend do not implement `isdir`, so only
+                # those backends that implement `isdir` could accept the inputs
+                # as a directory
+                filename_list = list_dir_or_file(
+                    inputs, list_dir=False, suffix=IMG_EXTENSIONS)
+                inputs = [
+                    join_path(inputs, filename) for filename in filename_list
+                ]
+
+        if not isinstance(inputs, (list, tuple)):
+            inputs = [inputs]
+
+        return list(inputs)
+
+    def preprocess(self, inputs: InputsType, batch_size: int = 1, **kwargs):
+        """Process the inputs into a model-feedable format.
+
+        Customize your preprocess by overriding this method. Preprocess should
+        return an iterable object, of which each item will be used as the
+        input of ``model.test_step``.
+
+        ``BaseInferencer.preprocess`` will return an iterable chunked data,
+        which will be used in __call__ like this:
+
+        .. code-block:: python
+
+            def __call__(self, inputs, batch_size=1, **kwargs):
+                chunked_data = self.preprocess(inputs, batch_size, **kwargs)
+                for batch in chunked_data:
+                    preds = self.forward(batch, **kwargs)
+
+        Args:
+            inputs (InputsType): Inputs given by user.
+            batch_size (int): batch size. Defaults to 1.
+
+        Yields:
+            Any: Data processed by the ``pipeline`` and ``collate_fn``.
+        """
+        chunked_data = self._get_chunk_data(inputs, batch_size)
+        yield from map(self.collate_fn, chunked_data)
+
+    def _get_chunk_data(self, inputs: Iterable, chunk_size: int):
+        """Get batch data from inputs.
+
+        Args:
+            inputs (Iterable): An iterable dataset.
+            chunk_size (int): Equivalent to batch size.
+
+        Yields:
+            list: batch data.
+        """
+        inputs_iter = iter(inputs)
+        while True:
+            try:
+                chunk_data = []
+                for _ in range(chunk_size):
+                    inputs_ = next(inputs_iter)
+                    if isinstance(inputs_, dict):
+                        if 'img' in inputs_:
+                            ori_inputs_ = inputs_['img']
+                        else:
+                            ori_inputs_ = inputs_['img_path']
+                        chunk_data.append(
+                            (ori_inputs_,
+                             self.pipeline(copy.deepcopy(inputs_))))
+                    else:
+                        chunk_data.append((inputs_, self.pipeline(inputs_)))
+                yield chunk_data
+            except StopIteration:
+                if chunk_data:
+                    yield chunk_data
+                break
+
+    # TODO: Video and webcam inputs are currently not supported, and
+    #  inference may consume too much memory if the input folder contains
+    #  many images. This will be optimized later.
+    def __call__(
+            self,
+            inputs: InputsType,
+            batch_size: int = 1,
+            return_vis: bool = False,
+            show: bool = False,
+            wait_time: int = 0,
+            no_save_vis: bool = False,
+            draw_pred: bool = True,
+            pred_score_thr: float = 0.3,
+            return_datasamples: bool = False,
+            print_result: bool = False,
+            no_save_pred: bool = True,
+            out_dir: str = '',
+            # by open image task
+            texts: Optional[Union[str, list]] = None,
+            # by open panoptic task
+            stuff_texts: Optional[Union[str, list]] = None,
+            # by GLIP
+            custom_entities: bool = False,
+            **kwargs) -> dict:
+        """Call the inferencer.
+
+        Args:
+            inputs (InputsType): Inputs for the inferencer.
+            batch_size (int): Inference batch size. Defaults to 1.
+            show (bool): Whether to display the visualization results in a
+                popup window. Defaults to False.
+            wait_time (float): The interval of show (s). Defaults to 0.
+            no_save_vis (bool): Whether to force not to save prediction
+                vis results. Defaults to False.
+            draw_pred (bool): Whether to draw predicted bounding boxes.
+                Defaults to True.
+            pred_score_thr (float): Minimum score of bboxes to draw.
+                Defaults to 0.3.
+            return_datasamples (bool): Whether to return results as
+                :obj:`DetDataSample`. Defaults to False.
+            print_result (bool): Whether to print the inference result w/o
+                visualization to the console. Defaults to False.
+            no_save_pred (bool): Whether to force not to save prediction
+                results. Defaults to True.
+            out_dir (str): Dir to save the inference results or
+                visualization. If left as empty, no file will be saved.
+                Defaults to ''.
+            texts (str | list[str]): Text prompts. Defaults to None.
+            stuff_texts (str | list[str]): Stuff text prompts of open
+                panoptic task. Defaults to None.
+            custom_entities (bool): Whether to use custom entities.
+                Defaults to False. Only used in GLIP.
+            **kwargs: Other keyword arguments passed to :meth:`preprocess`,
+                :meth:`forward`, :meth:`visualize` and :meth:`postprocess`.
+                Each key in kwargs should be in the corresponding set of
+                ``preprocess_kwargs``, ``forward_kwargs``, ``visualize_kwargs``
+                and ``postprocess_kwargs``.
+
+        Returns:
+            dict: Inference and visualization results.
+        """
+        (
+            preprocess_kwargs,
+            forward_kwargs,
+            visualize_kwargs,
+            postprocess_kwargs,
+        ) = self._dispatch_kwargs(**kwargs)
+
+        ori_inputs = self._inputs_to_list(inputs)
+
+        if texts is not None and isinstance(texts, str):
+            texts = [texts] * len(ori_inputs)
+        if stuff_texts is not None and isinstance(stuff_texts, str):
+            stuff_texts = [stuff_texts] * len(ori_inputs)
+        if texts is not None:
+            assert len(texts) == len(ori_inputs)
+            for i in range(len(texts)):
+                if isinstance(ori_inputs[i], str):
+                    ori_inputs[i] = {
+                        'text': texts[i],
+                        'img_path': ori_inputs[i],
+                        'custom_entities': custom_entities
+                    }
+                else:
+                    ori_inputs[i] = {
+                        'text': texts[i],
+                        'img': ori_inputs[i],
+                        'custom_entities': custom_entities
+                    }
+        if stuff_texts is not None:
+            assert len(stuff_texts) == len(ori_inputs)
+            for i in range(len(stuff_texts)):
+                ori_inputs[i]['stuff_text'] = stuff_texts[i]
+
+        inputs = self.preprocess(
+            ori_inputs, batch_size=batch_size, **preprocess_kwargs)
+
+        results_dict = {'predictions': [], 'visualization': []}
+        for ori_imgs, data in (track(inputs, description='Inference')
+                               if self.show_progress else inputs):
+            preds = self.forward(data, **forward_kwargs)
+            visualization = self.visualize(
+                ori_imgs,
+                preds,
+                return_vis=return_vis,
+                show=show,
+                wait_time=wait_time,
+                draw_pred=draw_pred,
+                pred_score_thr=pred_score_thr,
+                no_save_vis=no_save_vis,
+                img_out_dir=out_dir,
+                **visualize_kwargs)
+            results = self.postprocess(
+                preds,
+                visualization,
+                return_datasamples=return_datasamples,
+                print_result=print_result,
+                no_save_pred=no_save_pred,
+                pred_out_dir=out_dir,
+                **postprocess_kwargs)
+            results_dict['predictions'].extend(results['predictions'])
+            if results['visualization'] is not None:
+                results_dict['visualization'].extend(results['visualization'])
+        return results_dict
+
+    def visualize(self,
+                  inputs: InputsType,
+                  preds: PredType,
+                  return_vis: bool = False,
+                  show: bool = False,
+                  wait_time: int = 0,
+                  draw_pred: bool = True,
+                  pred_score_thr: float = 0.3,
+                  no_save_vis: bool = False,
+                  img_out_dir: str = '',
+                  **kwargs) -> Union[List[np.ndarray], None]:
+        """Visualize predictions.
+
+        Args:
+            inputs (List[Union[str, np.ndarray]]): Inputs for the inferencer.
+            preds (List[:obj:`DetDataSample`]): Predictions of the model.
+            return_vis (bool): Whether to return the visualization result.
+                Defaults to False.
+            show (bool): Whether to display the image in a popup window.
+                Defaults to False.
+            wait_time (float): The interval of show (s). Defaults to 0.
+            draw_pred (bool): Whether to draw predicted bounding boxes.
+                Defaults to True.
+            pred_score_thr (float): Minimum score of bboxes to draw.
+                Defaults to 0.3.
+            no_save_vis (bool): Whether to force not to save prediction
+                vis results. Defaults to False.
+            img_out_dir (str): Output directory of visualization results.
+                If left as empty, no file will be saved. Defaults to ''.
+
+        Returns:
+            List[np.ndarray] or None: Returns visualization results only if
+            applicable.
+        """
+        if no_save_vis is True:
+            img_out_dir = ''
+
+        if not show and img_out_dir == '' and not return_vis:
+            return None
+
+        if self.visualizer is None:
+            raise ValueError('Visualization needs the "visualizer" term '
+                             'defined in the config, but got None.')
+
+        results = []
+
+        for single_input, pred in zip(inputs, preds):
+            if isinstance(single_input, str):
+                img_bytes = mmengine.fileio.get(single_input)
+                img = mmcv.imfrombytes(img_bytes)
+                img = img[:, :, ::-1]
+                img_name = osp.basename(single_input)
+            elif isinstance(single_input, np.ndarray):
+                img = single_input.copy()
+                img_num = str(self.num_visualized_imgs).zfill(8)
+                img_name = f'{img_num}.jpg'
+            else:
+                raise ValueError('Unsupported input type: '
+                                 f'{type(single_input)}')
+
+            out_file = osp.join(img_out_dir, 'vis',
+                                img_name) if img_out_dir != '' else None
+
+            self.visualizer.add_datasample(
+                img_name,
+                img,
+                pred,
+                show=show,
+                wait_time=wait_time,
+                draw_gt=False,
+                draw_pred=draw_pred,
+                pred_score_thr=pred_score_thr,
+                out_file=out_file,
+            )
+            results.append(self.visualizer.get_image())
+            self.num_visualized_imgs += 1
+
+        return results
+
+    def postprocess(
+        self,
+        preds: PredType,
+        visualization: Optional[List[np.ndarray]] = None,
+        return_datasamples: bool = False,
+        print_result: bool = False,
+        no_save_pred: bool = False,
+        pred_out_dir: str = '',
+        **kwargs,
+    ) -> Dict:
+        """Process the predictions and visualization results from ``forward``
+        and ``visualize``.
+
+        This method should be responsible for the following tasks:
+
+        1. Convert datasamples into a json-serializable dict if needed.
+        2. Pack the predictions and visualization results and return them.
+        3. Dump or log the predictions.
+
+        Args:
+            preds (List[:obj:`DetDataSample`]): Predictions of the model.
+            visualization (List[np.ndarray], optional): Visualized predictions.
+            return_datasamples (bool): Whether to use Datasample to store
+                inference results. If False, dict will be used.
+            print_result (bool): Whether to print the inference result w/o
+                visualization to the console. Defaults to False.
+            no_save_pred (bool): Whether to force not to save prediction
+                results. Defaults to False.
+            pred_out_dir (str): Dir to save the inference results w/o
+                visualization. If left as empty, no file will be saved.
+                Defaults to ''.
+
+        Returns:
+            dict: Inference and visualization results with key ``predictions``
+            and ``visualization``.
+
+            - ``visualization`` (Any): Returned by :meth:`visualize`.
+            - ``predictions`` (dict or DataSample): Returned by
+                :meth:`forward` and processed in :meth:`postprocess`.
+                If ``return_datasamples=False``, it usually should be a
+                json-serializable dict containing only basic data elements such
+                as strings and numbers.
+        """
+        if no_save_pred is True:
+            pred_out_dir = ''
+
+        result_dict = {}
+        results = preds
+        if not return_datasamples:
+            results = []
+            for pred in preds:
+                result = self.pred2dict(pred, pred_out_dir)
+                results.append(result)
+        elif pred_out_dir != '':
+            warnings.warn('Currently does not support saving datasample '
+                          'when return_datasamples is set to True. '
+                          'Prediction results are not saved!')
+        # Add img to the results after printing and dumping
+        result_dict['predictions'] = results
+        if print_result:
+            print(result_dict)
+        result_dict['visualization'] = visualization
+        return result_dict
+
+    # TODO: The data format and fields saved in json need further discussion.
+    #  Maybe should include model name, timestamp, filename, image info etc.
+    def pred2dict(self,
+                  data_sample: DetDataSample,
+                  pred_out_dir: str = '') -> Dict:
+        """Extract elements necessary to represent a prediction into a
+        dictionary.
+
+        It's better to contain only basic data elements such as strings and
+        numbers in order to guarantee it's json-serializable.
+
+        Args:
+            data_sample (:obj:`DetDataSample`): Predictions of the model.
+            pred_out_dir (str): Dir to save the inference results w/o
+                visualization. If left as empty, no file will be saved.
+                Defaults to ''.
+
+        Returns:
+            dict: Prediction results.
+        """
+        is_save_pred = True
+        if pred_out_dir == '':
+            is_save_pred = False
+
+        if is_save_pred and 'img_path' in data_sample:
+            img_path = osp.basename(data_sample.img_path)
+            img_path = osp.splitext(img_path)[0]
+            out_img_path = osp.join(pred_out_dir, 'preds',
+                                    img_path + '_panoptic_seg.png')
+            out_json_path = osp.join(pred_out_dir, 'preds', img_path + '.json')
+        elif is_save_pred:
+            out_img_path = osp.join(
+                pred_out_dir, 'preds',
+                f'{self.num_predicted_imgs}_panoptic_seg.png')
+            out_json_path = osp.join(pred_out_dir, 'preds',
+                                     f'{self.num_predicted_imgs}.json')
+            self.num_predicted_imgs += 1
+
+        result = {}
+        if 'pred_instances' in data_sample:
+            masks = data_sample.pred_instances.get('masks')
+            pred_instances = data_sample.pred_instances.numpy()
+            result = {
+                'labels': pred_instances.labels.tolist(),
+                'scores': pred_instances.scores.tolist()
+            }
+            if 'bboxes' in pred_instances:
+                result['bboxes'] = pred_instances.bboxes.tolist()
+            if masks is not None:
+                if ('bboxes' not in pred_instances
+                        or pred_instances.bboxes.sum() == 0):
+                    # Fake bbox, such as the SOLO.
+                    bboxes = mask2bbox(masks.cpu()).numpy().tolist()
+                    result['bboxes'] = bboxes
+                encode_masks = encode_mask_results(pred_instances.masks)
+                for encode_mask in encode_masks:
+                    if isinstance(encode_mask['counts'], bytes):
+                        encode_mask['counts'] = encode_mask['counts'].decode()
+                result['masks'] = encode_masks
+
+        if 'pred_panoptic_seg' in data_sample:
+            if VOID is None:
+                raise RuntimeError(
+                    'panopticapi is not installed, please install it by: '
+                    'pip install git+https://github.com/cocodataset/'
+                    'panopticapi.git.')
+
+            pan = data_sample.pred_panoptic_seg.sem_seg.cpu().numpy()[0]
+            pan[pan % INSTANCE_OFFSET == len(
+                self.model.dataset_meta['classes'])] = VOID
+            pan = id2rgb(pan).astype(np.uint8)
+
+            if is_save_pred:
+                mmcv.imwrite(pan[:, :, ::-1], out_img_path)
+                result['panoptic_seg_path'] = out_img_path
+            else:
+                result['panoptic_seg'] = pan
+
+        if is_save_pred:
+            mmengine.dump(result, out_json_path)
+
+        return result
--- a/mmdet/apis/inference.py
+++ b/mmdet/apis/inference.py
+# Copyright (c) OpenMMLab. All rights reserved.
+import copy
+import warnings
+from pathlib import Path
+from typing import Optional, Sequence, Union
+
+import numpy as np
+import torch
+import torch.nn as nn
+from mmcv.ops import RoIPool
+from mmcv.transforms import Compose
+from mmengine.config import Config
+from mmengine.dataset import default_collate
+from mmengine.model.utils import revert_sync_batchnorm
+from mmengine.registry import init_default_scope
+from mmengine.runner import load_checkpoint
+
+from mmdet.registry import DATASETS
+from mmdet.utils import ConfigType
+from ..evaluation import get_classes
+from ..registry import MODELS
+from ..structures import DetDataSample, SampleList
+from ..utils import get_test_pipeline_cfg
+
+
+def init_detector(
+    config: Union[str, Path, Config],
+    checkpoint: Optional[str] = None,
+    palette: str = 'none',
+    device: str = 'cuda:0',
+    cfg_options: Optional[dict] = None,
+) -> nn.Module:
+    """Initialize a detector from config file.
+
+    Args:
+        config (str, :obj:`Path`, or :obj:`mmengine.Config`): Config file path,
+            :obj:`Path`, or the config object.
+        checkpoint (str, optional): Checkpoint path. If left as None, the model
+            will not load any weights.
+        palette (str): Color palette used for visualization. The order of
+            priority is the passed ``palette`` -> the palette in the test
+            dataset config -> the palette in the checkpoint's meta data.
+            Currently supports 'coco', 'voc', 'citys' and 'random'.
+            Defaults to 'none'.
+        device (str): The device to put the model on. Defaults to 'cuda:0'.
+        cfg_options (dict, optional): Options to override some settings in
+            the used config.
+
+    Returns:
+        nn.Module: The constructed detector.
+    """
+    if isinstance(config, (str, Path)):
+        config = Config.fromfile(config)
+    elif not isinstance(config, Config):
+        raise TypeError('config must be a filename or Config object, '
+                        f'but got {type(config)}')
+    if cfg_options is not None:
+        config.merge_from_dict(cfg_options)
+    elif 'init_cfg' in config.model.backbone:
+        config.model.backbone.init_cfg = None
+
+    scope = config.get('default_scope', 'mmdet')
+    if scope is not None:
+        init_default_scope(config.get('default_scope', 'mmdet'))
+
+    model = MODELS.build(config.model)
+    model = revert_sync_batchnorm(model)
+    if checkpoint is None:
+        warnings.simplefilter('once')
+        warnings.warn('checkpoint is None, use COCO classes by default.')
+        model.dataset_meta = {'classes': get_classes('coco')}
+    else:
+        checkpoint = load_checkpoint(model, checkpoint, map_location='cpu')
+        # Weights converted from elsewhere may not have meta fields.
+        checkpoint_meta = checkpoint.get('meta', {})
+
+        # save the dataset_meta in the model for convenience
+        if 'dataset_meta' in checkpoint_meta:
+            # mmdet 3.x, all keys should be lowercase
+            model.dataset_meta = {
+                k.lower(): v
+                for k, v in checkpoint_meta['dataset_meta'].items()
+            }
+        elif 'CLASSES' in checkpoint_meta:
+            # < mmdet 3.x
+            classes = checkpoint_meta['CLASSES']
+            model.dataset_meta = {'classes': classes}
+        else:
+            warnings.simplefilter('once')
+            warnings.warn(
+                'dataset_meta or class names are not saved in the '
+                'checkpoint\'s meta data, use COCO classes by default.')
+            model.dataset_meta = {'classes': get_classes('coco')}
+
+    # Priority:  args.palette -> config -> checkpoint
+    if palette != 'none':
+        model.dataset_meta['palette'] = palette
+    else:
+        test_dataset_cfg = copy.deepcopy(config.test_dataloader.dataset)
+        # lazy init. We only need the metainfo.
+        test_dataset_cfg['lazy_init'] = True
+        metainfo = DATASETS.build(test_dataset_cfg).metainfo
+        cfg_palette = metainfo.get('palette', None)
+        if cfg_palette is not None:
+            model.dataset_meta['palette'] = cfg_palette
+        else:
+            if 'palette' not in model.dataset_meta:
+                warnings.warn(
+                    'palette does not exist, random is used by default. '
+                    'You can also set the palette to customize.')
+                model.dataset_meta['palette'] = 'random'
+
+    model.cfg = config  # save the config in the model for convenience
+    model.to(device)
+    model.eval()
+    return model
+
+
+ImagesType = Union[str, np.ndarray, Sequence[str], Sequence[np.ndarray]]
+
+
+def inference_detector(
+    model: nn.Module,
+    imgs: ImagesType,
+    test_pipeline: Optional[Compose] = None,
+    text_prompt: Optional[str] = None,
+    custom_entities: bool = False,
+) -> Union[DetDataSample, SampleList]:
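+    """Run inference on image(s) with the detector.
+
+    Args:
+        model (nn.Module): The loaded detector.
+        imgs (str, ndarray, Sequence[str/ndarray]):
+            Either image files or loaded images.
+        test_pipeline (:obj:`Compose`, optional): Test pipeline. If None,
+            it is built from ``model.cfg``. Defaults to None.
+        text_prompt (str, optional): Text prompt passed to the pipeline
+            as ``text``, for text-conditioned detectors. Defaults to None.
+        custom_entities (bool): Whether ``text_prompt`` consists of custom
+            entity names. Only used when ``text_prompt`` is given.
+            Defaults to False.
+
+    Returns:
+        :obj:`DetDataSample` or list[:obj:`DetDataSample`]:
+        If ``imgs`` is a list or tuple, a list of results of the same
+        length is returned; otherwise the single result is returned
+        directly.
+    """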
+
+    if isinstance(imgs, (list, tuple)):
+        is_batch = True
+    else:
+        imgs = [imgs]
+        is_batch = False
+
+    cfg = model.cfg
+
+    if test_pipeline is None:
+        cfg = cfg.copy()
+        test_pipeline = get_test_pipeline_cfg(cfg)
+        if isinstance(imgs[0], np.ndarray):
+            # Calling this method across libraries will result
+            # in module unregistered error if not prefixed with mmdet.
+            test_pipeline[0].type = 'mmdet.LoadImageFromNDArray'
+
+        test_pipeline = Compose(test_pipeline)
+
+    if model.data_preprocessor.device.type == 'cpu':
+        for m in model.modules():
+            assert not isinstance(
+                m, RoIPool
+            ), 'CPU inference with RoIPool is not supported currently.'
+
+    result_list = []
+    for i, img in enumerate(imgs):
+        # prepare data
+        if isinstance(img, np.ndarray):
+            # TODO: remove img_id.
+            data_ = dict(img=img, img_id=0)
+        else:
+            # TODO: remove img_id.
+            data_ = dict(img_path=img, img_id=0)
+
+        if text_prompt:
+            data_['text'] = text_prompt
+            data_['custom_entities'] = custom_entities
+
+        # build the data pipeline
+        data_ = test_pipeline(data_)
+
+        data_['inputs'] = [data_['inputs']]
+        data_['data_samples'] = [data_['data_samples']]
+
+        # forward the model
+        with torch.no_grad():
+            results = model.test_step(data_)[0]
+
+        result_list.append(results)
+
+    if not is_batch:
+        return result_list[0]
+    else:
+        return result_list
+
+
+# TODO: Awaiting refactoring
+async def async_inference_detector(model, imgs):
+    """Async inference image(s) with the detector.
+
+    Args:
+        model (nn.Module): The loaded detector.
+        imgs (str | ndarray | Sequence[str | ndarray]): Either image
+            files or loaded images.
+
+    Returns:
+        Awaitable detection results.
+    """
+    if not isinstance(imgs, (list, tuple)):
+        imgs = [imgs]
+
+    cfg = model.cfg
+
+    if isinstance(imgs[0], np.ndarray):
+        cfg = cfg.copy()
+        # set loading pipeline type
+        cfg.data.test.pipeline[0].type = 'LoadImageFromNDArray'
+
+    # cfg.data.test.pipeline = replace_ImageToTensor(cfg.data.test.pipeline)
+    test_pipeline = Compose(cfg.data.test.pipeline)
+
+    datas = []
+    for img in imgs:
+        # prepare data
+        if isinstance(img, np.ndarray):
+            # directly add img
+            data = dict(img=img)
+        else:
+            # add information into dict
+            data = dict(img_info=dict(filename=img), img_prefix=None)
+        # build the data pipeline
+        data = test_pipeline(data)
+        datas.append(data)
+
+    for m in model.modules():
+        assert not isinstance(
+            m,
+            RoIPool), 'CPU inference with RoIPool is not supported currently.'
+
+    # We don't restore `torch.is_grad_enabled()` value during concurrent
+    # inference since execution can overlap
+    torch.set_grad_enabled(False)
+    results = await model.aforward_test(data, rescale=True)
+    return results
+
+
+def build_test_pipeline(cfg: ConfigType) -> ConfigType:
+    """Build the test pipeline for MOT/VIS demos.
+
+    In MOT/VIS inference the frames are passed in already loaded, so the
+    original test pipeline must drop "LoadImageFromFile" and
+    "LoadTrackAnnotations".
+
+    Args:
+        cfg (ConfigDict): The loaded config.
+
+    Returns:
+        ConfigType: The new test pipeline.
+    """
+    # Remove "LoadImageFromFile" and "LoadTrackAnnotations": inside the
+    # first transform (the broadcaster that wraps the per-frame
+    # transforms), keep only the `Resize` step.
+    transform_broadcaster = cfg.test_dataloader.dataset.pipeline[0].copy()
+    for transform in transform_broadcaster['transforms']:
+        if transform['type'] == 'Resize':
+            transform_broadcaster['transforms'] = transform
+    # The last transform packs the inputs for the model and is kept as is.
+    pack_track_inputs = cfg.test_dataloader.dataset.pipeline[-1].copy()
+    test_pipeline = Compose([transform_broadcaster, pack_track_inputs])
+
+    return test_pipeline
+
+
+def inference_mot(model: nn.Module, img: np.ndarray, frame_id: int,
+                  video_len: int) -> SampleList:
+    """Run inference on a single video frame with the MOT model.
+
+    Args:
+        model (nn.Module): The loaded MOT model.
+        img (np.ndarray): The loaded frame.
+        frame_id (int): Index of the frame in the video.
+        video_len (int): Total number of frames in the demo video.
+    Returns:
+        SampleList: The tracking data samples.
+    """
+    cfg = model.cfg
+    data = dict(
+        img=[img.astype(np.float32)],
+        frame_id=[frame_id],
+        ori_shape=[img.shape[:2]],
+        img_id=[frame_id + 1],
+        ori_video_length=[video_len])
+
+    test_pipeline = build_test_pipeline(cfg)
+    data = test_pipeline(data)
+
+    if not next(model.parameters()).is_cuda:
+        for m in model.modules():
+            assert not isinstance(
+                m, RoIPool
+            ), 'CPU inference with RoIPool is not supported currently.'
+
+    # forward the model
+    with torch.no_grad():
+        data = default_collate([data])
+        result = model.test_step(data)[0]
+    return result
+
+
+def init_track_model(config: Union[str, Config],
+                     checkpoint: Optional[str] = None,
+                     detector: Optional[str] = None,
+                     reid: Optional[str] = None,
+                     device: str = 'cuda:0',
+                     cfg_options: Optional[dict] = None) -> nn.Module:
+    """Initialize a model from config file.
+
+    Args:
+        config (str or :obj:`mmengine.Config`): Config file path or the config
+            object.
+        checkpoint (Optional[str], optional): Checkpoint path. Defaults to
+            None.
+        detector (Optional[str], optional): Detector checkpoint path, used
+            by some tracking algorithms such as SORT. Defaults to None.
+        reid (Optional[str], optional): ReID checkpoint path, used by
+            some tracking algorithms such as SORT. Defaults to None.
+        device (str, optional): The device that the model inferences on.
+            Defaults to `cuda:0`.
+        cfg_options (Optional[dict], optional): Options to override some
+            settings in the used config. Defaults to None.
+
+    Returns:
+        nn.Module: The constructed model.
+    """
+    if isinstance(config, str):
+        config = Config.fromfile(config)
+    elif not isinstance(config, Config):
+        raise TypeError('config must be a filename or Config object, '
+                        f'but got {type(config)}')
+    if cfg_options is not None:
+        config.merge_from_dict(cfg_options)
+
+    model = MODELS.build(config.model)
+
+    if checkpoint is not None:
+        checkpoint = load_checkpoint(model, checkpoint, map_location='cpu')
+        # Weights converted from elsewhere may not have meta fields.
+        checkpoint_meta = checkpoint.get('meta', {})
+        # save the dataset_meta in the model for convenience
+        if 'dataset_meta' in checkpoint_meta:
+            if 'CLASSES' in checkpoint_meta['dataset_meta']:
+                value = checkpoint_meta['dataset_meta'].pop('CLASSES')
+                checkpoint_meta['dataset_meta']['classes'] = value
+            model.dataset_meta = checkpoint_meta['dataset_meta']
+
+    if detector is not None:
+        assert not (checkpoint and detector), \
+            'Error: checkpoint and detector checkpoint cannot both exist'
+        load_checkpoint(model.detector, detector, map_location='cpu')
+
+    if reid is not None:
+        assert not (checkpoint and reid), \
+            'Error: checkpoint and reid checkpoint cannot both exist'
+        load_checkpoint(model.reid, reid, map_location='cpu')
+
+    # Some methods don't load checkpoints, or their checkpoints don't
+    # contain 'dataset_meta'. VIS needs dataset_meta; MOT does not.
+    if not hasattr(model, 'dataset_meta'):
+        warnings.warn('dataset_meta or class names are missing, '
+                      'using None by default.')
+        model.dataset_meta = {'classes': None}
+
+    model.cfg = config  # save the config in the model for convenience
+    model.to(device)
+    model.eval()
+    return model
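The class-name and palette fallbacks in `init_detector` above are easy to lose track of. The sketch below restates that branch order over plain dicts so it runs without mmdet or a checkpoint. `resolve_dataset_meta`, `resolve_palette` and `COCO_CLASSES_PLACEHOLDER` are hypothetical names introduced for illustration; the placeholder only stands in for `get_classes('coco')`, which returns all 80 COCO class names.

```python
import warnings
from typing import Any, Dict, Optional

# Hypothetical stand-in for mmdet's get_classes('coco'); the real call
# returns all 80 COCO class names, only three are listed here.
COCO_CLASSES_PLACEHOLDER = ('person', 'bicycle', 'car')


def resolve_dataset_meta(
        checkpoint_meta: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Restate init_detector's dataset_meta fallbacks (sketch).

    ``checkpoint_meta=None`` models "no checkpoint given"; ``{}`` models
    a checkpoint whose weights carry no meta fields.
    """
    if checkpoint_meta is None:
        warnings.warn('checkpoint is None, use COCO classes by default.')
        return {'classes': COCO_CLASSES_PLACEHOLDER}
    if 'dataset_meta' in checkpoint_meta:
        # mmdet 3.x checkpoints: normalise all keys to lowercase.
        return {k.lower(): v
                for k, v in checkpoint_meta['dataset_meta'].items()}
    if 'CLASSES' in checkpoint_meta:
        # Checkpoints from before mmdet 3.x keep class names under CLASSES.
        return {'classes': checkpoint_meta['CLASSES']}
    warnings.warn('no class names in checkpoint meta, use COCO classes.')
    return {'classes': COCO_CLASSES_PLACEHOLDER}


def resolve_palette(meta: Dict[str, Any],
                    arg_palette: Any = 'none',
                    cfg_palette: Optional[Any] = None) -> Dict[str, Any]:
    """Apply the palette priority: argument -> config -> checkpoint."""
    meta = dict(meta)  # work on a copy instead of mutating the input
    if arg_palette != 'none':
        meta['palette'] = arg_palette
    elif cfg_palette is not None:
        meta['palette'] = cfg_palette
    elif 'palette' not in meta:
        # No palette anywhere: fall back to random colours.
        meta['palette'] = 'random'
    return meta
```

A palette stored in the checkpoint survives only when neither an explicit argument nor the dataset config provides one, which matches the priority comment in `init_detector`. Returning a new dict instead of writing to `model.dataset_meta` is a choice made only to keep the sketch free of side effects.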
--- a/mmdet/configs/_base_/datasets/coco_detection.py
+++ b/mmdet/configs/_base_/datasets/coco_detection.py
+# Copyright (c) OpenMMLab. All rights reserved.
+from mmcv.transforms import LoadImageFromFile
+from mmengine.dataset.sampler import DefaultSampler
+
+from mmdet.datasets import AspectRatioBatchSampler, CocoDataset
+from mmdet.datasets.transforms import (LoadAnnotations, PackDetInputs,
+                                       RandomFlip, Resize)
+from mmdet.evaluation import CocoMetric
+
+# dataset settings
+dataset_type = CocoDataset
+data_root = 'data/coco/'
+
+# Examples of using a different file client
+# Method 1: simply set the data root and let the file I/O module infer
+# the backend from the path prefix (LMDB and Memcache are not supported yet)
+
+# data_root = 's3://openmmlab/datasets/detection/coco/'
+
+# Method 2: use `backend_args` (`file_client_args` in versions before
+# 3.0.0rc6)
+# backend_args = dict(
+#     backend='petrel',
+#     path_mapping=dict({
+#         './data/': 's3://openmmlab/datasets/detection/',
+#         'data/': 's3://openmmlab/datasets/detection/'
+#     }))
+backend_args = None
+
+train_pipeline = [
+    dict(type=LoadImageFromFile, backend_args=backend_args),
+    dict(type=LoadAnnotations, with_bbox=True),
+    dict(type=Resize, scale=(1333, 800), keep_ratio=True),
+    dict(type=RandomFlip, prob=0.5),
+    dict(type=PackDetInputs)
+]
+test_pipeline = [
+    dict(type=LoadImageFromFile, backend_args=backend_args),
+    dict(type=Resize, scale=(1333, 800), keep_ratio=True),
+    # If you don't have ground-truth annotations, delete this transform
+    dict(type=LoadAnnotations, with_bbox=True),
+    dict(
+        type=PackDetInputs,
+        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                   'scale_factor'))
+]
+train_dataloader = dict(
+    batch_size=2,
+    num_workers=2,
+    persistent_workers=True,
+    sampler=dict(type=DefaultSampler, shuffle=True),
+    batch_sampler=dict(type=AspectRatioBatchSampler),
+    dataset=dict(
+        type=dataset_type,
+        data_root=data_root,
+        ann_file='annotations/instances_train2017.json',
+        data_prefix=dict(img='train2017/'),
+        filter_cfg=dict(filter_empty_gt=True, min_size=32),
+        pipeline=train_pipeline,
+        backend_args=backend_args))
+val_dataloader = dict(
+    batch_size=1,
+    num_workers=2,
+    persistent_workers=True,
+    drop_last=False,
+    sampler=dict(type=DefaultSampler, shuffle=False),
+    dataset=dict(
+        type=dataset_type,
+        data_root=data_root,
+        ann_file='annotations/instances_val2017.json',
+        data_prefix=dict(img='val2017/'),
+        test_mode=True,
+        pipeline=test_pipeline,
+        backend_args=backend_args))
+test_dataloader = val_dataloader
+
+val_evaluator = dict(
+    type=CocoMetric,
+    ann_file=data_root + 'annotations/instances_val2017.json',
+    metric='bbox',
+    format_only=False,
+    backend_args=backend_args)
+test_evaluator = val_evaluator
+
+# To run inference on the test set and format the output results for
+# submission, use the following settings instead:
+# test_dataloader = dict(
+#     batch_size=1,
+#     num_workers=2,
+#     persistent_workers=True,
+#     drop_last=False,
+#     sampler=dict(type=DefaultSampler, shuffle=False),
+#     dataset=dict(
+#         type=dataset_type,
+#         data_root=data_root,
+#         ann_file='annotations/image_info_test-dev2017.json',
+#         data_prefix=dict(img='test2017/'),
+#         test_mode=True,
+#         pipeline=test_pipeline))
+# test_evaluator = dict(
+#     type=CocoMetric,
+#     metric='bbox',
+#     format_only=True,
+#     ann_file=data_root + 'annotations/image_info_test-dev2017.json',
+#     outfile_prefix='./work_dirs/coco_detection/test')
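Both dataloaders above give `ann_file` and `data_prefix` relative to `data_root`. As far as we understand mmengine's `BaseDataset`, relative entries are joined onto `data_root` when the dataset is built, while `CocoMetric` has no `data_root` argument and uses its `ann_file` as given, which is why `val_evaluator` spells out `data_root + ...`. The sketch below approximates that joining rule; `resolve_dataset_paths` and `_is_abs` are hypothetical helpers, and the real code relies on mmengine's file-I/O utilities rather than `posixpath`.

```python
import posixpath
from typing import Dict, Tuple


def _is_abs(path: str) -> bool:
    # Treat URIs such as 's3://...' like absolute local paths.
    return '://' in path or path.startswith('/')


def resolve_dataset_paths(
        data_root: str, ann_file: str,
        data_prefix: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Join relative ann_file / data_prefix entries onto data_root."""
    if ann_file and data_root and not _is_abs(ann_file):
        ann_file = posixpath.join(data_root, ann_file)
    resolved = {}
    for key, prefix in data_prefix.items():
        if data_root and not _is_abs(prefix):
            prefix = posixpath.join(data_root, prefix)
        resolved[key] = prefix
    return ann_file, resolved


ann, prefix = resolve_dataset_paths(
    'data/coco/', 'annotations/instances_val2017.json', {'img': 'val2017/'})
```

Under this rule, a dataset entry written as `ann_file=data_root + 'annotations/...'` is joined twice and resolves to `data/coco/data/coco/annotations/...`, so only the evaluator should carry the `data_root` prefix.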
--- a/mmdet/configs/_base_/datasets/coco_instance.py
+++ b/mmdet/configs/_base_/datasets/coco_instance.py
+# Copyright (c) OpenMMLab. All rights reserved.
+from mmcv.transforms.loading import LoadImageFromFile
+from mmengine.dataset.sampler import DefaultSampler
+
+from mmdet.datasets.coco import CocoDataset
+from mmdet.datasets.samplers.batch_sampler import AspectRatioBatchSampler
+from mmdet.datasets.transforms.formatting import PackDetInputs
+from mmdet.datasets.transforms.loading import LoadAnnotations
+from mmdet.datasets.transforms.transforms import RandomFlip, Resize
+from mmdet.evaluation.metrics.coco_metric import CocoMetric
+
+# dataset settings
+dataset_type = CocoDataset
+data_root = 'data/coco/'
+
+# Examples of using a different file client
+# Method 1: simply set the data root and let the file I/O module infer
+# the backend from the path prefix (LMDB and Memcache are not supported yet)
+
+# data_root = 's3://openmmlab/datasets/detection/coco/'
+
+# Method 2: use `backend_args` (`file_client_args` in versions before
+# 3.0.0rc6)
+# backend_args = dict(
+#     backend='petrel',
+#     path_mapping=dict({
+#         './data/': 's3://openmmlab/datasets/detection/',
+#         'data/': 's3://openmmlab/datasets/detection/'
+#     }))
+backend_args = None
+
+train_pipeline = [
+    dict(type=LoadImageFromFile, backend_args=backend_args),
+    dict(type=LoadAnnotations, with_bbox=True, with_mask=True),
+    dict(type=Resize, scale=(1333, 800), keep_ratio=True),
+    dict(type=RandomFlip, prob=0.5),
+    dict(type=PackDetInputs)
+]
+test_pipeline = [
+    dict(type=LoadImageFromFile, backend_args=backend_args),
+    dict(type=Resize, scale=(1333, 800), keep_ratio=True),
+    # If you don't have ground-truth annotations, delete this transform
+    dict(type=LoadAnnotations, with_bbox=True, with_mask=True),
+    dict(
+        type=PackDetInputs,
+        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                   'scale_factor'))
+]
+train_dataloader = dict(
+    batch_size=2,
+    num_workers=2,
+    persistent_workers=True,
+    sampler=dict(type=DefaultSampler, shuffle=True),
+    batch_sampler=dict(type=AspectRatioBatchSampler),
+    dataset=dict(
+        type=CocoDataset,
+        data_root=data_root,
+        ann_file='annotations/instances_train2017.json',
+        data_prefix=dict(img='train2017/'),
+        filter_cfg=dict(filter_empty_gt=True, min_size=32),
+        pipeline=train_pipeline,
+        backend_args=backend_args))
+val_dataloader = dict(
+    batch_size=1,
+    num_workers=2,
+    persistent_workers=True,
+    drop_last=False,
+    sampler=dict(type=DefaultSampler, shuffle=False),
+    dataset=dict(
+        type=CocoDataset,
+        data_root=data_root,
+        ann_file='annotations/instances_val2017.json',
+        data_prefix=dict(img='val2017/'),
+        test_mode=True,
+        pipeline=test_pipeline,
+        backend_args=backend_args))
+test_dataloader = val_dataloader
+
+val_evaluator = dict(
+    type=CocoMetric,
+    ann_file=data_root + 'annotations/instances_val2017.json',
+    metric=['bbox', 'segm'],
+    format_only=False,
+    backend_args=backend_args)
+test_evaluator = val_evaluator
+
+# To run inference on the test set and format the output results for
+# submission, use the following settings instead:
+# test_dataloader = dict(
+#     batch_size=1,
+#     num_workers=2,
+#     persistent_workers=True,
+#     drop_last=False,
+#     sampler=dict(type=DefaultSampler, shuffle=False),
+#     dataset=dict(
+#         type=CocoDataset,
+#         data_root=data_root,
+#         ann_file='annotations/image_info_test-dev2017.json',
+#         data_prefix=dict(img='test2017/'),
+#         test_mode=True,
+#         pipeline=test_pipeline))
+# test_evaluator = dict(
+#     type=CocoMetric,
+#     metric=['bbox', 'segm'],
+#     format_only=True,
+#     ann_file=data_root + 'annotations/image_info_test-dev2017.json',
+#     outfile_prefix='./work_dirs/coco_instance/test')
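The test pipeline comment says to delete `LoadAnnotations` when no ground-truth annotations are available. A minimal sketch of doing that programmatically, assuming the pipeline is a plain list of dicts: string type names are used so the snippet runs without mmdet installed, whereas in this config the `type` fields hold the imported classes, so a real filter would compare against `LoadAnnotations` itself. `without_transform` is a hypothetical helper.

```python
import copy
from typing import Any, Dict, List

# Plain-dict copy of the instance-segmentation test pipeline; string type
# names replace the imported classes so this runs without mmdet.
test_pipeline: List[Dict[str, Any]] = [
    dict(type='LoadImageFromFile', backend_args=None),
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor')),
]


def without_transform(pipeline: List[Dict[str, Any]],
                      type_name: str) -> List[Dict[str, Any]]:
    """Return a deep copy of `pipeline` with every `type_name` step removed."""
    return [copy.deepcopy(step) for step in pipeline
            if step['type'] != type_name]


# Pipeline for images that have no ground-truth annotations.
inference_pipeline = without_transform(test_pipeline, 'LoadAnnotations')
```

Deep-copying the remaining steps keeps the original `test_pipeline` untouched, so the annotated evaluation setup can still be used alongside the annotation-free one.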