# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.header-logo {
  background-image: url("../image/mmpt-logo.png");
  background-size: 183px 50px;
  height: 50px;
  width: 183px;
}

@media screen and (min-width: 1100px) {
  .header-logo {
    top: -12px;
  }
}

pre {
  white-space: pre;
}

@media screen and (min-width: 2000px) {
  .pytorch-content-left {
    width: 1200px;
    margin-left: 30px;
  }
  article.pytorch-article {
    max-width: 1200px;
  }
  .pytorch-breadcrumbs-wrapper {
    width: 1200px;
  }
  .pytorch-right-menu.scrolling-fixed {
    position: fixed;
    top: 45px;
    left: 1580px;
  }
}

article.pytorch-article section code {
  padding: .2em .4em;
  background-color: #f3f4f7;
  border-radius: 5px;
}

/* Disable the change in tables */
article.pytorch-article section table code {
  padding: unset;
  background-color: unset;
  border-radius: unset;
}

table.autosummary td {
  width: 50%;
}

img.align-center {
  display: block;
  margin-left: auto;
  margin-right: auto;
}

article.pytorch-article p.rubric {
  font-weight: bold;
}
var collapsedSections = ['Advanced Guides', 'Model Zoo', 'Visualization', 'Analysis Tools', 'Deployment', 'Notes'];

$(document).ready(function () {
  $('.model-summary').DataTable({
    "stateSave": false,
    "lengthChange": false,
    "pageLength": 20,
    "order": []
  });
});
{% extends "layout.html" %}
{% block body %}
<h1>Page Not Found</h1>
<p>
The page you are looking for cannot be found.
</p>
<p>
If you just switched documentation versions, it is likely that the page you were on has moved. You can look for it in
the table of contents on the left, or go to <a href="{{ pathto(root_doc) }}">the homepage</a>.
</p>
<p>
If you cannot find the documentation you want, please <a
href="https://github.com/open-mmlab/mmpretrain/issues/new/choose">open an issue</a> to tell us!
</p>
{% endblock %}
.. role:: hidden
    :class: hidden-section
.. currentmodule:: {{ module }}

{{ name | underline }}

.. autoclass:: {{ name }}
    :members:

..
    autogenerated from _templates/autosummary/class.rst
    note it does not have :inherited-members:
.. role:: hidden
    :class: hidden-section
.. currentmodule:: {{ module }}

{{ name | underline }}

.. autoclass:: {{ name }}
    :members:
    :special-members: __call__

..
    autogenerated from _templates/callable.rst
    note it does not have :inherited-members:
.. role:: hidden
    :class: hidden-section
.. currentmodule:: {{ module }}

{{ name | underline }}

.. autoclass:: {{ name }}
    :members: transform

..
    autogenerated from _templates/callable.rst
    note it does not have :inherited-members:
# Convention in MMPretrain
## Model Naming Convention
We follow the convention below to name models. Contributors are advised to follow the same style. The model names are divided into five parts: algorithm info, module information, pretrain information, training information and data information. Logically, different parts are concatenated by underscores `'_'`, and words in the same part are concatenated by dashes `'-'`.
```text
{algorithm info}_{module info}_{pretrain info}_{training info}_{data info}
```
- `algorithm info` (optional): The main algorithm information; it includes the main training algorithm, like MAE, BEiT, etc.
- `module info`: The module information; it usually includes the backbone name, such as resnet, vit, etc.
- `pretrain info` (optional): The pre-trained model information, such as the dataset the pre-trained model was trained on (e.g., ImageNet-21k).
- `training info`: The training information, i.e., the training schedule, including batch size, lr schedule, data augmentation and the like.
- `data info`: The data information; it usually includes the dataset name, input size and so on, such as imagenet, cifar, etc.
### Algorithm information
The main algorithm name to train the model. For example:
- `simclr`
- `mocov2`
- `eva-mae-style`
Models trained by supervised image classification can omit this field.
### Module information
The modules of the model, usually, the backbone must be included in this field, and the neck and head
information can be omitted. For example:
- `resnet50`
- `vit-base-p16`
- `swin-base`
### Pretrain information
If the model is a fine-tuned model from a pre-trained model, we need to record some information of the
pre-trained model. For example:
- The source of the pre-trained model: `fb`, `openai`, etc.
- The method to train the pre-trained model: `clip`, `mae`, `distill`, etc.
- The dataset used for pre-training: `in21k`, `laion2b`, etc. (`in1k` can be omitted.)
- The training duration: `300e`, `1600e`, etc.
Not all information is necessary; select only the information needed to distinguish different pre-trained models.
At the end of this field, use a `-pre` as an identifier, like `mae-in21k-pre`.
### Training information
Training schedule, including training type, `batch size`, `lr schedule`, data augment, special loss functions and so on:
- format `{gpu x batch_per_gpu}`, such as `8xb32`
Training type (mainly seen in transformer networks, such as the `ViT` algorithm, which is usually divided into two training types: pre-training and fine-tuning):
- `ft` : configuration file for fine-tuning
- `pt` : configuration file for pretraining
Training recipe. Usually, only the part that is different from the original paper will be marked. These methods will be arranged in the order `{pipeline aug}-{train aug}-{loss trick}-{scheduler}-{epochs}`.
- `coslr-200e` : use cosine scheduler to train 200 epochs
- `autoaug-mixup-lbs-coslr-50e` : use `autoaug`, `mixup`, `label smooth`, `cosine scheduler` to train 50 epochs
If the model is converted from a third-party repository, such as the official repository, the training information
can be omitted and `3rdparty` is used as an identifier.
### Data information
- `in1k` : `ImageNet1k` dataset, defaults to an input image size of 224x224;
- `in21k` : `ImageNet21k` dataset, also called `ImageNet22k` dataset, defaults to an input image size of 224x224;
- `in1k-384px` : indicates that the input image size is 384x384;
- `cifar100`
### Model Name Example
```text
vit-base-p32_clip-openai-pre_3rdparty_in1k
```
- `vit-base-p32`: The module information
- `clip-openai-pre`: The pre-train information.
  - `clip`: The pre-train method is CLIP.
  - `openai`: The pre-trained model comes from OpenAI.
  - `pre`: The pre-train identifier.
- `3rdparty`: The model is converted from a third-party repository.
- `in1k`: Dataset information. The model is trained on the ImageNet-1k dataset and the input size is `224x224`.
```text
beit_beit-base-p16_8xb256-amp-coslr-300e_in1k
```
- `beit`: The algorithm information
- `beit-base`: The module information, since the backbone is a modified ViT from BEiT, the backbone name is
also `beit`.
- `8xb256-amp-coslr-300e`: The training information.
  - `8xb256`: Use 8 GPUs, and the batch size on each GPU is 256.
  - `amp`: Use automatic mixed-precision training.
  - `coslr`: Use the cosine annealing learning rate scheduler.
  - `300e`: Train for 300 epochs.
- `in1k`: Dataset information. The model is trained on the ImageNet-1k dataset and the input size is `224x224`.
## Config File Naming Convention
The naming of the config file is almost the same as the model name, with several differences:
- The training information is necessary, and cannot be `3rdparty`.
- If the config file only includes backbone settings, with neither head settings nor dataset settings, we name it
`{module info}_headless.py`. This kind of config file is usually used for third-party pre-trained models on large
datasets.
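For example, the config file name of the ResNet-50 model trained on ImageNet-1k with 8 GPUs and a batch size of 32 per GPU (used later in this document) is:
```text
resnet50_8xb32_in1k.py
```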
## Checkpoint Naming Convention
The naming of the weight mainly includes the model name, date and hash value.
```text
{model_name}_{date}-{hash}.pth
```
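For example, a ResNet-50 checkpoint could be named as below, where the date and hash here are hypothetical placeholders:
```text
resnet50_8xb32_in1k_20220101-1a2b3c4d.pth
```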
# Adding New Dataset
You can write a new dataset class inheriting from `BaseDataset`, and overwrite `load_data_list(self)`,
like [CIFAR10](https://github.com/open-mmlab/mmpretrain/blob/main/mmpretrain/datasets/cifar.py) and [ImageNet](https://github.com/open-mmlab/mmpretrain/blob/main/mmpretrain/datasets/imagenet.py).
Typically, this function returns a list, where each sample is a dict, containing necessary data information, e.g., `img` and `gt_label`.
Assume we are going to implement a `Filelist` dataset, which takes filelists for both training and testing. The format of the annotation list is as follows:
```text
000001.jpg 0
000002.jpg 1
```
## 1. Create Dataset Class
We can create a new dataset in `mmpretrain/datasets/filelist.py` to load the data.
```python
from mmpretrain.registry import DATASETS
from .base_dataset import BaseDataset


@DATASETS.register_module()
class Filelist(BaseDataset):

    def load_data_list(self):
        assert isinstance(self.ann_file, str)

        data_list = []
        with open(self.ann_file) as f:
            samples = [x.strip().split(' ') for x in f.readlines()]
            for filename, gt_label in samples:
                img_path = add_prefix(filename, self.img_prefix)
                info = {'img_path': img_path, 'gt_label': int(gt_label)}
                data_list.append(info)
        return data_list
```
## 2. Add to the package
Then add this dataset class in `mmpretrain/datasets/__init__.py`:
```python
from .base_dataset import BaseDataset
...
from .filelist import Filelist
__all__ = [
    'BaseDataset', ..., 'Filelist'
]
```
## 3. Modify Related Config
Then, to use `Filelist` in the config, you can modify it as follows:
```python
train_dataloader = dict(
    ...
    dataset=dict(
        type='Filelist',
        ann_file='image_list.txt',
        pipeline=train_pipeline,
    )
)
```
All the dataset classes inheriting from [`BaseDataset`](https://github.com/open-mmlab/mmpretrain/blob/main/mmpretrain/datasets/base_dataset.py) have **lazy loading** and **memory saving** features; you can refer to the related documents of {external+mmengine:doc}`BaseDataset <advanced_tutorials/basedataset>`.
```{note}
If the dictionary of the data sample contains 'img_path' but not 'img', then the 'LoadImageFromFile' transform must be added in the pipeline.
```
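For instance, since the `Filelist` dataset above only records `img_path`, its training pipeline should start by loading the image; the remaining transforms here are just the common ones shown later in this documentation:
```python
train_pipeline = [
    dict(type='LoadImageFromFile'),  # reads 'img_path' and fills in 'img'
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]
```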
# Customize Evaluation Metrics
## Use metrics in MMPretrain
In MMPretrain, we have provided multiple metrics for both single-label classification and multi-label
classification:
**Single-label Classification**:
- [`Accuracy`](mmpretrain.evaluation.Accuracy)
- [`SingleLabelMetric`](mmpretrain.evaluation.SingleLabelMetric), including precision, recall, f1-score and
support.
**Multi-label Classification**:
- [`AveragePrecision`](mmpretrain.evaluation.AveragePrecision), or AP (mAP).
- [`MultiLabelMetric`](mmpretrain.evaluation.MultiLabelMetric), including precision, recall, f1-score and
support.
To use these metrics during validation and testing, we need to modify the `val_evaluator` and `test_evaluator`
fields in the config file.
Here are several examples:
1. Calculate top-1 and top-5 accuracy during both validation and test.
```python
val_evaluator = dict(type='Accuracy', topk=(1, 5))
test_evaluator = val_evaluator
```
2. Calculate top-1 accuracy, top-5 accuracy, precision and recall during both validation and test.
```python
val_evaluator = [
    dict(type='Accuracy', topk=(1, 5)),
    dict(type='SingleLabelMetric', items=['precision', 'recall']),
]
test_evaluator = val_evaluator
```
3. Calculate mAP (mean AveragePrecision), CP (Class-wise mean Precision), CR (Class-wise mean Recall), CF
(Class-wise mean F1-score), OP (Overall mean Precision), OR (Overall mean Recall) and OF1 (Overall mean
F1-score).
```python
val_evaluator = [
    dict(type='AveragePrecision'),
    dict(type='MultiLabelMetric', average='macro'),  # class-wise mean
    dict(type='MultiLabelMetric', average='micro'),  # overall mean
]
test_evaluator = val_evaluator
```
## Add new metrics
MMPretrain supports customized evaluation metrics for users who pursue higher customization.
You need to create a new file under `mmpretrain/evaluation/metrics`, and implement the new metric there, for example, in `mmpretrain/evaluation/metrics/my_metric.py`. Then create a customized evaluation metric class `MyMetric`, which inherits from MMEngine's [`BaseMetric`](mmengine.evaluator.BaseMetric).
The data format processing method `process` and the metric calculation method `compute_metrics` need to be overridden. Add the class to the `METRICS` registry to make it available as a customized evaluation metric.
```python
from typing import Dict, List, Sequence

from mmengine.evaluator import BaseMetric

from mmpretrain.registry import METRICS


@METRICS.register_module()
class MyMetric(BaseMetric):

    def process(self, data_batch: Sequence[Dict], data_samples: Sequence[Dict]):
        """The processed results should be stored in ``self.results``, which
        will be used to compute the metrics when all batches have been
        processed.

        ``data_batch`` stores the batch data from the dataloader,
        and ``data_samples`` stores the batch outputs from the model.
        """
        ...

    def compute_metrics(self, results: List):
        """Compute the metrics from processed results and return the
        evaluation results."""
        ...
```
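As a concrete illustration, here is one possible way to fill in the skeleton above for a simple top-1 accuracy metric. This is only a sketch: the exact keys available in each data sample dict depend on your task, and `'pred_label'` and `'gt_label'` are assumed here.
```python
from typing import Dict, List, Sequence

from mmengine.evaluator import BaseMetric

from mmpretrain.registry import METRICS


@METRICS.register_module()
class MyMetric(BaseMetric):

    def process(self, data_batch: Sequence[Dict], data_samples: Sequence[Dict]):
        # Keep only what compute_metrics needs for each sample.
        for data_sample in data_samples:
            self.results.append({
                'pred': data_sample['pred_label'],  # assumed key
                'gt': data_sample['gt_label'],  # assumed key
            })

    def compute_metrics(self, results: List) -> Dict:
        # Top-1 accuracy over all collected samples.
        correct = sum(int((res['pred'] == res['gt']).all()) for res in results)
        return {'accuracy': 100.0 * correct / len(results)}
```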
Then, import it in the `mmpretrain/evaluation/metrics/__init__.py` to add it into the `mmpretrain.evaluation` package.
```python
# In mmpretrain/evaluation/metrics/__init__.py
...
from .my_metric import MyMetric
__all__ = [..., 'MyMetric']
```
Finally, use `MyMetric` in the `val_evaluator` and `test_evaluator` field of config files.
```python
val_evaluator = dict(type='MyMetric', ...)
test_evaluator = val_evaluator
```
```{note}
More details can be found in {external+mmengine:doc}`MMEngine Documentation: Evaluation <design/evaluation>`.
```
# Customize Models
In our design, a complete model is defined as a top-level module which contains several model components based on their functionalities.
- model: a top-level module that defines the type of the task, such as `ImageClassifier` for image classification, `MAE` for self-supervised learning, `ImageToImageRetriever` for image retrieval.
- backbone: usually a feature extraction network that records the major differences between models, e.g., `ResNet`, `MobileNet`.
- neck: the component between the backbone and head, e.g., `GlobalAveragePooling`.
- head: the component for specific tasks, e.g., `ClsHead`, `ContrastiveHead`.
- loss: the component in the head for calculating losses, e.g., `CrossEntropyLoss`, `LabelSmoothLoss`.
- target_generator: the component specifically for self-supervised learning tasks, e.g., `VQKD`, `HOGGenerator`.
## Add a new model
Generally, for image classification and retrieval tasks, the pipelines are consistent. However, the pipelines differ among self-supervised learning algorithms, like `MAE` and `BEiT`. Thus, in this section, we will explain how to add your self-supervised learning algorithm.
### Add a new self-supervised learning algorithm
1. Create a new file `mmpretrain/models/selfsup/new_algorithm.py` and implement `NewAlgorithm` in it.
```python
from mmpretrain.registry import MODELS
from .base import BaseSelfSupervisor


@MODELS.register_module()
class NewAlgorithm(BaseSelfSupervisor):

    def __init__(self, backbone, neck=None, head=None, init_cfg=None):
        super().__init__(init_cfg)
        pass

    # The ``extract_feat`` function is defined in BaseSelfSupervisor; you
    # could overwrite it if needed.
    def extract_feat(self, inputs, **kwargs):
        pass

    # The core function to compute the loss.
    def loss(self, inputs, data_samples, **kwargs):
        pass
```
2. Import the new algorithm module in `mmpretrain/models/selfsup/__init__.py`
```python
...
from .new_algorithm import NewAlgorithm

__all__ = [
    ...,
    'NewAlgorithm',
    ...
]
```
3. Use it in your config file.
```python
model = dict(
    type='NewAlgorithm',
    backbone=...,
    neck=...,
    head=...,
    ...
)
```
## Add a new backbone
Here we present how to develop a new backbone component using `ResNet_CIFAR` as an example.
As the input size of CIFAR is 32x32, which is much smaller than the default size of 224x224 in ImageNet, this backbone replaces `kernel_size=7, stride=2` with `kernel_size=3, stride=1` and removes the MaxPooling after the stem layer, to avoid forwarding small feature maps to residual blocks.
The easiest way is to inherit from `ResNet` and only modify the stem layer.
1. Create a new file `mmpretrain/models/backbones/resnet_cifar.py`.
```python
import torch.nn as nn
from mmcv.cnn import build_conv_layer, build_norm_layer

from mmpretrain.registry import MODELS
from .resnet import ResNet


@MODELS.register_module()
class ResNet_CIFAR(ResNet):
    """ResNet backbone for CIFAR.

    short description of the backbone

    Args:
        depth (int): Network depth, from {18, 34, 50, 101, 152}.
        ...
    """

    def __init__(self, depth, deep_stem=False, **kwargs):
        # Call the ResNet init.
        super(ResNet_CIFAR, self).__init__(depth, deep_stem=deep_stem, **kwargs)
        # Other specific initializations.
        assert not self.deep_stem, 'ResNet_CIFAR does not support deep_stem'

    def _make_stem_layer(self, in_channels, base_channels):
        # Override the ResNet method to modify the network structure.
        self.conv1 = build_conv_layer(
            self.conv_cfg,
            in_channels,
            base_channels,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=False)
        self.norm1_name, norm1 = build_norm_layer(
            self.norm_cfg, base_channels, postfix=1)
        self.add_module(self.norm1_name, norm1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Customize the forward method if needed.
        x = self.conv1(x)
        x = self.norm1(x)
        x = self.relu(x)
        outs = []
        for i, layer_name in enumerate(self.res_layers):
            res_layer = getattr(self, layer_name)
            x = res_layer(x)
            if i in self.out_indices:
                outs.append(x)
        # The return value needs to be a tuple with multi-scale outputs from
        # different depths. If you don't need multi-scale features, just wrap
        # the output as a one-item tuple.
        return tuple(outs)

    def init_weights(self):
        # Customize the weight initialization method if needed.
        super().init_weights()

        # Disable the weight initialization if loading a pretrained model.
        if self.init_cfg is not None and self.init_cfg['type'] == 'Pretrained':
            return

        # Usually, we recommend using `init_cfg` to specify weight
        # initialization methods of convolution, linear, or normalization
        # layers. If you have some special needs, do the extra weight
        # initialization here.
        ...
```
```{note}
In the OpenMMLab 2.0 design, the original registry names `BACKBONES`, `NECKS`, `HEADS` and `LOSSES` are replaced by `MODELS`.
```
2. Import the new backbone module in `mmpretrain/models/backbones/__init__.py`.
```python
...
from .resnet_cifar import ResNet_CIFAR

__all__ = [
    ..., 'ResNet_CIFAR'
]
```
3. Modify the correlated settings in your config file.
```python
model = dict(
    ...
    backbone=dict(
        type='ResNet_CIFAR',
        depth=18,
        ...),
    ...
)
```
### Add a new backbone for self-supervised learning
For some self-supervised learning algorithms, the backbones are somewhat different, such as `MAE`, `BEiT`, etc. Their backbones need to deal with `mask` in order to extract features from visible tokens.
Take [MAEViT](mmpretrain.models.selfsup.MAEViT) as an example: we need to overwrite the `forward` function to compute with `mask`. We also define `init_weights` to initialize parameters and `random_masking` to generate the mask for `MAE` pre-training.
```python
from typing import Optional, Tuple

import torch

# Import path is indicative; VisionTransformer is the ViT backbone in mmpretrain.
from ..backbones import VisionTransformer


class MAEViT(VisionTransformer):
    """Vision Transformer for MAE pre-training."""

    def __init__(self, mask_ratio, **kwargs) -> None:
        super().__init__(**kwargs)
        # The position embedding is not learnable during pretraining.
        self.pos_embed.requires_grad = False
        self.mask_ratio = mask_ratio
        self.num_patches = self.patch_resolution[0] * self.patch_resolution[1]

    def init_weights(self) -> None:
        """Initialize position embedding, patch embedding and cls token."""
        super().init_weights()
        # Define what is needed here.
        pass

    def random_masking(
        self,
        x: torch.Tensor,
        mask_ratio: float = 0.75
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        """Generate the mask for MAE pre-training."""
        pass

    def forward(
        self,
        x: torch.Tensor,
        mask: Optional[bool] = True
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        """Generate features for masked images.

        The function supports two kinds of forward behaviors. If ``mask`` is
        ``True``, the function will generate a mask to mask some patches
        randomly and get the hidden features of the visible patches, which
        means the function will be executed as masked image modeling
        pre-training; if ``mask`` is ``None`` or ``False``, the forward
        function will call ``super().forward()``, which extracts features
        from images without the mask.
        """
        if mask is None or mask is False:
            return super().forward(x)
        else:
            B = x.shape[0]
            x = self.patch_embed(x)[0]
            # Add the pos embed w/o the cls token.
            x = x + self.pos_embed[:, 1:, :]

            # Masking: length -> length * mask_ratio
            x, mask, ids_restore = self.random_masking(x, self.mask_ratio)

            # Append the cls token.
            cls_token = self.cls_token + self.pos_embed[:, :1, :]
            cls_tokens = cls_token.expand(B, -1, -1)
            x = torch.cat((cls_tokens, x), dim=1)

            for _, layer in enumerate(self.layers):
                x = layer(x)
            # Use the final norm.
            x = self.norm1(x)

            return (x, mask, ids_restore)
```
## Add a new neck
Here we take `GlobalAveragePooling` as an example. It is a very simple neck without any arguments.
To add a new neck, we mainly implement the `forward` function, which applies some operations on the output from the backbone and forwards the results to the head.
1. Create a new file in `mmpretrain/models/necks/gap.py`.
```python
import torch.nn as nn

from mmpretrain.registry import MODELS


@MODELS.register_module()
class GlobalAveragePooling(nn.Module):

    def __init__(self):
        super().__init__()  # initialize nn.Module before registering submodules
        self.gap = nn.AdaptiveAvgPool2d((1, 1))

    def forward(self, inputs):
        # We regard inputs as a tensor for simplicity.
        outs = self.gap(inputs)
        outs = outs.view(inputs.size(0), -1)
        return outs
```
2. Import the new neck module in `mmpretrain/models/necks/__init__.py`.
```python
...
from .gap import GlobalAveragePooling

__all__ = [
    ..., 'GlobalAveragePooling'
]
```
3. Modify the correlated settings in your config file.
```python
model = dict(
    neck=dict(type='GlobalAveragePooling'),
)
```
## Add a new head
### Based on ClsHead
Here we present how to develop a new head using a simplified `VisionTransformerClsHead` as an example.
To implement a new head, we need to implement a `pre_logits` method for the process before the final classification layer, and a `forward` method.
:::{admonition} Why do we need the `pre_logits` method?
:class: note
In classification tasks, we usually use a linear layer to do the final classification. And sometimes, we need
to obtain the feature before the final classification, which is the output of the `pre_logits` method.
:::
1. Create a new file in `mmpretrain/models/heads/vit_head.py`.
```python
import torch.nn as nn

from mmpretrain.registry import MODELS
from .cls_head import ClsHead


@MODELS.register_module()
class VisionTransformerClsHead(ClsHead):

    def __init__(self, num_classes, in_channels, hidden_dim, **kwargs):
        super().__init__(**kwargs)
        self.in_channels = in_channels
        self.num_classes = num_classes
        self.hidden_dim = hidden_dim

        self.fc1 = nn.Linear(in_channels, hidden_dim)
        self.act = nn.Tanh()
        self.fc2 = nn.Linear(hidden_dim, num_classes)

    def pre_logits(self, feats):
        # The output of the backbone is usually a tuple from multiple depths,
        # and for classification, we only need the final output.
        feat = feats[-1]

        # The final output of VisionTransformer is a tuple of patch tokens
        # and classification tokens. We need the classification tokens here.
        _, cls_token = feat

        # Do all the work except the final classification linear layer.
        return self.act(self.fc1(cls_token))

    def forward(self, feats):
        pre_logits = self.pre_logits(feats)

        # The final classification linear layer.
        cls_score = self.fc2(pre_logits)
        return cls_score
```
2. Import the module in `mmpretrain/models/heads/__init__.py`.
```python
...
from .vit_head import VisionTransformerClsHead

__all__ = [
    ..., 'VisionTransformerClsHead'
]
```
3. Modify the correlated settings in your config file.
```python
model = dict(
    head=dict(
        type='VisionTransformerClsHead',
        ...,
    ))
```
### Based on BaseModule
Here is an example of `MAEPretrainHead`, which is based on `BaseModule` and implemented for the masked image modeling task. It is required to implement the `loss` function to generate the loss, but the other helper functions are optional.
```python
# Copyright (c) OpenMMLab. All rights reserved.
import torch
from mmengine.model import BaseModule

from mmpretrain.registry import MODELS


@MODELS.register_module()
class MAEPretrainHead(BaseModule):
    """Head for MAE Pre-training."""

    def __init__(self,
                 loss: dict,
                 norm_pix: bool = False,
                 patch_size: int = 16) -> None:
        super().__init__()
        self.norm_pix = norm_pix
        self.patch_size = patch_size
        self.loss_module = MODELS.build(loss)

    def patchify(self, imgs: torch.Tensor) -> torch.Tensor:
        """Split images into non-overlapped patches."""
        p = self.patch_size
        assert imgs.shape[2] == imgs.shape[3] and imgs.shape[2] % p == 0

        h = w = imgs.shape[2] // p
        x = imgs.reshape(shape=(imgs.shape[0], 3, h, p, w, p))
        x = torch.einsum('nchpwq->nhwpqc', x)
        x = x.reshape(shape=(imgs.shape[0], h * w, p**2 * 3))
        return x

    def construct_target(self, target: torch.Tensor) -> torch.Tensor:
        """Construct the reconstruction target."""
        target = self.patchify(target)
        if self.norm_pix:
            # Normalize the target image.
            mean = target.mean(dim=-1, keepdim=True)
            var = target.var(dim=-1, keepdim=True)
            target = (target - mean) / (var + 1.e-6)**.5

        return target

    def loss(self, pred: torch.Tensor, target: torch.Tensor,
             mask: torch.Tensor) -> torch.Tensor:
        """Generate the loss."""
        target = self.construct_target(target)
        loss = self.loss_module(pred, target, mask)

        return loss
```
After the implementation, the remaining steps are the same as step 2 and step 3 in [Based on ClsHead](#based-on-clshead).
## Add a new loss
To add a new loss function, we mainly implement the `forward` function in the loss module. We should also register the loss module in `MODELS`. In addition, it is helpful to leverage the decorator `weighted_loss` to weight the loss for each element.
Assuming that we want to mimic a probabilistic distribution generated from another classification model, we implement an L1 loss to fulfill the purpose, as below.
1. Create a new file in `mmpretrain/models/losses/l1_loss.py`.
```python
import torch
import torch.nn as nn

from mmpretrain.registry import MODELS
from .utils import weighted_loss


@weighted_loss
def l1_loss(pred, target):
    assert pred.size() == target.size() and target.numel() > 0
    loss = torch.abs(pred - target)
    return loss


@MODELS.register_module()
class L1Loss(nn.Module):

    def __init__(self, reduction='mean', loss_weight=1.0):
        super(L1Loss, self).__init__()
        self.reduction = reduction
        self.loss_weight = loss_weight

    def forward(self,
                pred,
                target,
                weight=None,
                avg_factor=None,
                reduction_override=None):
        assert reduction_override in (None, 'none', 'mean', 'sum')
        reduction = (
            reduction_override if reduction_override else self.reduction)
        loss = self.loss_weight * l1_loss(
            pred, target, weight, reduction=reduction, avg_factor=avg_factor)
        return loss
```
2. Import the module in `mmpretrain/models/losses/__init__.py`.
```python
...
from .l1_loss import L1Loss

__all__ = [
    ..., 'L1Loss'
]
```
3. Modify the `loss` field in the head config.
```python
model = dict(
    head=dict(
        loss=dict(type='L1Loss', loss_weight=1.0),
    ))
```
Finally, we can combine all the new model components in a config file to create a new model. Because `ResNet_CIFAR` is not a ViT-based backbone, we do not use `VisionTransformerClsHead` here.
```python
model = dict(
    type='ImageClassifier',
    backbone=dict(
        type='ResNet_CIFAR',
        depth=18,
        num_stages=4,
        out_indices=(3, ),
        style='pytorch'),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(
        type='LinearClsHead',
        num_classes=10,
        in_channels=512,
        loss=dict(type='L1Loss', loss_weight=1.0),
        topk=(1, 5),
    ))
```
```{tip}
For convenience, the same model components can be inherited from existing config files; refer to [Learn about configs](../user_guides/config.md) for more details.
```
# Customize Data Pipeline
## Design of Data pipelines
In the [new dataset tutorial](./datasets.md), we learned that the dataset class uses the `load_data_list` method
to initialize the entire dataset, and we save the information of every sample to a dict.
Usually, to save memory, we only load image paths and labels in `load_data_list`, and load the full
image content when we use the images. Moreover, we may want to do some random data augmentation when picking
samples during training. Almost all data loading, pre-processing, and formatting operations can be configured in
MMPretrain by the **data pipeline**.
The data pipeline defines how to process the sample dict when indexing a sample from the dataset. It
consists of a sequence of data transforms. Each data transform takes a dict as input, processes it, and outputs a
dict for the next data transform.
Here is a data pipeline example for ResNet-50 training on ImageNet.
```python
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]
```
All available data transforms in MMPretrain can be found in the [data transforms docs](mmpretrain.datasets.transforms).
## Modify the training/test pipeline
The data pipeline in MMPretrain is pretty flexible. You can control almost every step of the data
preprocessing from the config file but, on the other hand, you may be confused when facing so many options.
Here is a common practice and guidance for image classification tasks.
### Loading
At the beginning of a data pipeline, we usually need to load image data from the file path.
[`LoadImageFromFile`](mmcv.transforms.LoadImageFromFile) is commonly used to do this task.
```python
train_pipeline = [
    dict(type='LoadImageFromFile'),
    ...
]
```
If you want to load data from files with special formats or special locations, you can [implement a new loading
transform](#add-new-data-transforms) and add it at the beginning of the data pipeline.
### Augmentation and other processing
During training, we usually need to do data augmentation to avoid overfitting. During the test, we also need to do
some data processing like resizing and cropping. These data transforms are placed after the loading process.
Here is a simple data augmentation recipe example. It will randomly resize and crop the input image to the
specified scale, and randomly flip the image horizontally with a probability of 0.5.
```python
train_pipeline = [
    ...
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    ...
]
```
Here is a heavy data augmentation recipe example used in [Swin-Transformer](../papers/swin_transformer.md)
training. To align with the official implementation, it specifies `pillow` as the resize backend and `bicubic`
as the resize algorithm. Moreover, it adds [`RandAugment`](mmpretrain.datasets.transforms.RandAugment) and
[`RandomErasing`](mmpretrain.datasets.transforms.RandomErasing) as extra data augmentation methods.
This configuration specifies every detail of the data augmentation, and you can simply copy it into your own
config file to apply the data augmentations of the Swin-Transformer.
```python
bgr_mean = [103.53, 116.28, 123.675]
bgr_std = [57.375, 57.12, 58.395]

train_pipeline = [
    ...
    dict(type='RandomResizedCrop', scale=224, backend='pillow', interpolation='bicubic'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(
        type='RandAugment',
        policies='timm_increasing',
        num_policies=2,
        total_level=10,
        magnitude_level=9,
        magnitude_std=0.5,
        hparams=dict(
            pad_val=[round(x) for x in bgr_mean], interpolation='bicubic')),
    dict(
        type='RandomErasing',
        erase_prob=0.25,
        mode='rand',
        min_area_ratio=0.02,
        max_area_ratio=1 / 3,
        fill_color=bgr_mean,
        fill_std=bgr_std),
    ...
]
```
```{note}
Usually, the data augmentation part in the data pipeline handles only image-wise transforms, not transforms
like image normalization or mixup/cutmix. That is because we can apply image normalization and mixup/cutmix on batched
data to accelerate them. To configure image normalization and mixup/cutmix, please use the [data preprocessor](mmpretrain.models.utils.data_preprocessor).
```
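For reference, a typical `data_preprocessor` configuration might look like the sketch below; the values are the common ImageNet channel statistics, matching the TorchVision example later in this documentation:
```python
data_preprocessor = dict(
    num_classes=1000,
    # Channel statistics applied on batched data on the device.
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,  # convert from the BGR loading order to RGB
)
```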
### Formatting
The formatting is to collect the training data from the data information dict and convert the data to a
model-friendly format.
In most cases, you can simply use [`PackInputs`](mmpretrain.datasets.transforms.PackInputs): it will
convert the image in NumPy array format to a PyTorch tensor, and pack the ground-truth category information and
other meta information as a [`DataSample`](mmpretrain.structures.DataSample).
```python
train_pipeline = [
    ...
    dict(type='PackInputs'),
]
```
## Add new data transforms
1. Write a new data transform in any file, e.g., `my_transform.py`, and place it in
the folder `mmpretrain/datasets/transforms/`. The data transform class needs to inherit
the [`mmcv.transforms.BaseTransform`](mmcv.transforms.BaseTransform) class and override
the `transform` method which takes a dict as input and returns a dict.
```python
from mmcv.transforms import BaseTransform

from mmpretrain.registry import TRANSFORMS


@TRANSFORMS.register_module()
class MyTransform(BaseTransform):

    def transform(self, results):
        # Modify the data information dict `results`.
        return results
```
2. Import the new class in the `mmpretrain/datasets/transforms/__init__.py`.
```python
...
from .my_transform import MyTransform

__all__ = [
    ..., 'MyTransform'
]
```
3. Use it in config files.
```python
train_pipeline = [
    ...
    dict(type='MyTransform'),
    ...
]
```
## Pipeline visualization
After designing data pipelines, you can use the [visualization tools](../useful_tools/dataset_visualization.md) to preview the transformed samples.
# Customize Runtime Settings
The runtime configurations include many helpful functionalities, like checkpoint saving, logger configuration,
etc. In this tutorial, we will introduce how to configure these functionalities.
## Save Checkpoint
The checkpoint saving functionality is a default hook during training, and you can configure it in the
`default_hooks.checkpoint` field.
```{note}
The hook mechanism is widely used in all OpenMMLab libraries. Through hooks, you can plug in many
functionalities without modifying the main execution logic of the runner.
A detailed introduction of hooks can be found in {external+mmengine:doc}`Hooks <tutorials/hook>`.
```
**The default settings**
```python
default_hooks = dict(
    ...
    checkpoint=dict(type='CheckpointHook', interval=1),
    ...
)
```
Here are some usual arguments; all available arguments can be found in the [CheckpointHook](mmengine.hooks.CheckpointHook) docs.
- **`interval`** (int): The saving period. If set to -1, it will never save checkpoints.
- **`by_epoch`** (bool): Whether the **`interval`** is counted by epoch or by iteration. Defaults to `True`.
- **`out_dir`** (str): The root directory to save checkpoints. If not specified, the checkpoints will be saved in the work directory. If specified, the checkpoints will be saved in a sub-folder of **`out_dir`**.
- **`max_keep_ckpts`** (int): The maximum number of checkpoints to keep. In some cases, we want only the latest few checkpoints and would like to delete old ones to save disk space. Defaults to -1, which means unlimited.
- **`save_best`** (str, List[str]): If specified, it will save the checkpoint with the best evaluation result.
Usually, you can simply use `save_best="auto"` to automatically select the evaluation metric.
And if you want more advanced configuration, please refer to the [CheckpointHook docs](tutorials/hook.md#checkpointhook).
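Putting a few of these arguments together, a configuration like the following sketch (using only the arguments listed above) keeps the latest 3 checkpoints plus the best one:
```python
default_hooks = dict(
    checkpoint=dict(
        type='CheckpointHook',
        interval=1,          # save every epoch (by_epoch=True by default)
        max_keep_ckpts=3,    # keep only the latest 3 checkpoints
        save_best='auto'),   # additionally keep the best checkpoint
)
```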
## Load Checkpoint / Resume Training
In config files, you can specify the loading and resuming functionality as below:
```python
# load from which checkpoint
load_from = "Your checkpoint path"
# whether to resume training from the loaded checkpoint
resume = False
```
The `load_from` field can be either a local path or an HTTP path. You can resume training from the checkpoint by
specifying `resume=True`.
```{tip}
You can also enable auto resuming from the latest checkpoint by specifying `load_from=None` and `resume=True`.
Runner will find the latest checkpoint from the work directory automatically.
```
If you are training models with our `tools/train.py` script, you can also use the `--resume` argument to resume
training without modifying the config file manually.
```bash
# Automatically resume from the latest checkpoint.
python tools/train.py configs/resnet/resnet50_8xb32_in1k.py --resume
# Resume from the specified checkpoint.
python tools/train.py configs/resnet/resnet50_8xb32_in1k.py --resume checkpoints/resnet.pth
```
## Randomness Configuration
In the `randomness` field, we provide some options to make the experiment as reproducible as possible.
By default, we do not specify a seed in the config file, and in every experiment, the program will generate a random seed.
**Default settings:**
```python
randomness = dict(seed=None, deterministic=False)
```
To make the experiment more reproducible, you can specify a seed and set `deterministic=True`. The influence
of the `deterministic` option can be found [here](https://pytorch.org/docs/stable/notes/randomness.html#cuda-convolution-benchmarking).
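For example, to make runs as reproducible as possible:
```python
randomness = dict(seed=0, deterministic=True)  # fix the seed and use deterministic ops
```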
## Log Configuration
The log configuration relates to multiple fields.
In the `log_level` field, you can specify the global logging level. See {external+python:ref}`Logging Levels<levels>` for a list of levels.
```python
log_level = 'INFO'
```
In the `default_hooks.logger` field, you can specify the logging interval during training and testing. And all
available arguments can be found in the [LoggerHook docs](tutorials/hook.md#loggerhook).
```python
default_hooks = dict(
    ...
    # Print a log every 100 iterations.
    logger=dict(type='LoggerHook', interval=100),
    ...
)
```
In the `log_processor` field, you can specify the log smoothing method. Usually, we use a window with a length of 10
to smooth the log and output the mean value of all information. If you want to finely configure the smoothing method of
specific information, see the {external+mmengine:doc}`LogProcessor docs <advanced_tutorials/logging>`.
```python
# The default setting, which will smooth the values in training log by a 10-length window.
log_processor = dict(window_size=10)
```
In the `visualizer` field, you can specify multiple backends to save the log information, such as TensorBoard
and WandB. More details can be found in the [Visualizer section](#visualizer).
## Custom Hooks
Many of the above functionalities are implemented by hooks, and you can also plug in other custom hooks by modifying
the `custom_hooks` field. Here are some hooks in MMEngine and MMPretrain that you can use directly, such as:
- [EMAHook](mmpretrain.engine.hooks.EMAHook)
- [SyncBuffersHook](mmengine.hooks.SyncBuffersHook)
- [EmptyCacheHook](mmengine.hooks.EmptyCacheHook)
- [ClassNumCheckHook](mmpretrain.engine.hooks.ClassNumCheckHook)
- ...
For example, EMA (Exponential Moving Average) is widely used in model training, and you can enable it as
below:
below:
```python
custom_hooks = [
    dict(type='EMAHook', momentum=4e-5, priority='ABOVE_NORMAL'),
]
```
## Visualize Validation
The validation visualization functionality is a default hook during validation, and you can configure it in the
`default_hooks.visualization` field.
By default, it is disabled, and you can enable it by specifying `enable=True`. More arguments can be found in
the [VisualizationHook docs](mmpretrain.engine.hooks.VisualizationHook).
```python
default_hooks = dict(
    ...
    visualization=dict(type='VisualizationHook', enable=False),
    ...
)
```
This hook will select some images in the validation dataset, and tag the prediction results on these images
during every validation run. You can use it to watch how the model performance on actual images varies
during training.
In addition, if the images in your validation dataset are small (\<100 pixels), you can rescale them before
visualization by specifying `rescale_factor=2.` or higher.
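For instance, to enable the hook and enlarge small validation images before drawing:
```python
default_hooks = dict(
    visualization=dict(type='VisualizationHook', enable=True, rescale_factor=2.),
)
```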
## Visualizer
The visualizer is used to record all kinds of information during training and test, including logs, images and
scalars. By default, the recorded information will be saved at the `vis_data` folder under the work directory.
**Default settings:**
```python
visualizer = dict(
    type='UniversalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
    ]
)
```
Usually, the most useful function is to save the log and scalars like `loss` to different backends.
For example, to save them to TensorBoard, simply set them as below:
```python
visualizer = dict(
    type='UniversalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
        dict(type='TensorboardVisBackend'),
    ]
)
```
Or save them to WandB as below:
```python
visualizer = dict(
    type='UniversalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
        dict(type='WandbVisBackend'),
    ]
)
```
## Environment Configuration
In the `env_cfg` field, you can configure some low-level parameters, like cuDNN, multi-process, and distributed
communication.
**Please make sure you understand the meaning of these parameters before modifying them.**
```python
env_cfg = dict(
    # Whether to enable cudnn benchmark.
    cudnn_benchmark=False,

    # Set multi-process parameters.
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),

    # Set distributed parameters.
    dist_cfg=dict(backend='nccl'),
)
```
# Customize Training Schedule
In our codebase, [default training schedules](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/schedules) have been provided for common datasets such as CIFAR, ImageNet, etc. If we experiment on these datasets for higher accuracy, or on new methods and datasets, we may need to modify the strategies.
In this tutorial, we will introduce how to modify configs to construct optimizers, use fine-grained parameter-wise configuration, gradient clipping and gradient accumulation, as well as customize learning rate and momentum schedules. Furthermore, we introduce a template for customizing self-implemented optimization methods for the project.
## Customize optimization
We use the `optim_wrapper` field to configure the optimization strategies, including the choice of optimizer, automatic mixed precision training, parameter-wise configurations, and gradient clipping and accumulation. Details are given below.
### Use optimizers supported by PyTorch
We support all the optimizers implemented by PyTorch, and to use them, please change the `optimizer` field of config files.
For example, if you want to use [`SGD`](torch.optim.SGD), the modification in the config file could be as follows. Notice that optimization-related settings should all be wrapped inside `optim_wrapper`.
```python
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.0003, weight_decay=0.0001)
)
```
```{note}
The `type` in the optimizer config is not a constructor but an optimizer name in PyTorch.
Refer to the {external+torch:ref}`List of optimizers supported by PyTorch <optim:algorithms>` for more choices.
```
To modify the learning rate of the model, just modify the `lr` in the config of optimizer.
You can also directly set other arguments according to the [API doc](torch.optim) of PyTorch.
For example, if you want to use [`Adam`](torch.optim.Adam) with settings like `torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)` in PyTorch, you could use the config below:
```python
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(
        type='Adam',
        lr=0.001,
        betas=(0.9, 0.999),
        eps=1e-08,
        weight_decay=0,
        amsgrad=False),
)
```
````{note}
The default type of the `optim_wrapper` field is [`OptimWrapper`](mmengine.optim.OptimWrapper); therefore, you can
usually omit the type field, like:
```python
optim_wrapper = dict(
    optimizer=dict(
        type='Adam',
        lr=0.001,
        betas=(0.9, 0.999),
        eps=1e-08,
        weight_decay=0,
        amsgrad=False))
```
````
### Use AMP training
If we want to use the automatic mixed precision training, we can simply change the type of `optim_wrapper` to `AmpOptimWrapper` in config files.
```python
optim_wrapper = dict(type='AmpOptimWrapper', optimizer=...)
```
Alternatively, for convenience, we can set the `--amp` parameter to turn on the AMP option directly in the `tools/train.py` script. Refer to the [Training tutorial](../user_guides/train.md) for details on starting training.
### Fine-grained parameter-wise configuration
Some models may have parameter-specific settings for optimization, for example, no weight decay for the BatchNorm layers, or different learning rates for different network layers.
To configure them finely, we can use the `paramwise_cfg` argument in `optim_wrapper`.
- **Set different hyper-parameter multipliers for different types of parameters.**
For instance, we can set `norm_decay_mult=0.` in `paramwise_cfg` to change the weight decay of the weights and biases of normalization layers to zero.
```python
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.8, weight_decay=1e-4),
    paramwise_cfg=dict(norm_decay_mult=0.))
```
More types of parameters are supported, as listed below (a combined example follows the list):
- `bias_lr_mult`: Multiplier for the learning rate of biases (excluding the biases of normalization layers and the offsets of deformable convolution layers). Defaults to 1.
- `bias_decay_mult`: Multiplier for the weight decay of biases (excluding the biases of normalization layers and the offsets of deformable convolution layers). Defaults to 1.
- `norm_decay_mult`: Multiplier for the weight decay of the weights and biases of normalization layers. Defaults to 1.
- `flat_decay_mult`: Multiplier for the weight decay of all one-dimensional parameters. Defaults to 1.
- `dwconv_decay_mult`: Multiplier for the weight decay of depth-wise convolution layers. Defaults to 1.
- `bypass_duplicate`: Whether to bypass duplicated parameters. Defaults to `False`.
- `dcn_offset_lr_mult`: Multiplier for the learning rate of deformable convolution layers. Defaults to 1.
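As a combined illustration, the sketch below disables weight decay for normalization layers, biases and all one-dimensional parameters at once, using only the arguments listed above:
```python
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.8, weight_decay=1e-4),
    paramwise_cfg=dict(
        norm_decay_mult=0.,   # no weight decay for normalization layers
        bias_decay_mult=0.,   # no weight decay for biases
        flat_decay_mult=0.))  # no weight decay for 1-D parameters
```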
- **Set different hyper-parameter multipliers for specific parameters.**
MMPretrain can use `custom_keys` in `paramwise_cfg` to specify that different parameters use different learning rates or weight decays.
For example, to set all learning rates and weight decays of `backbone.layer0` to 0, keep the rest of `backbone` the same as the optimizer settings, and set the learning rate of `head` to 0.001 (the base 0.01 multiplied by `lr_mult=0.1`), use the configs below.
```python
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, weight_decay=0.0001),
    paramwise_cfg=dict(
        custom_keys={
            'backbone.layer0': dict(lr_mult=0, decay_mult=0),
            'backbone': dict(lr_mult=1),
            'head': dict(lr_mult=0.1)
        }))
```
### Gradient clipping
During the training process, the loss function may approach a steep region and cause gradient explosion, and gradient clipping is helpful to stabilize the training process. More introduction can be found on [this page](https://paperswithcode.com/method/gradient-clipping).
Currently, we support the `clip_grad` option in `optim_wrapper` for gradient clipping; refer to the [PyTorch documentation](torch.nn.utils.clip_grad_norm_).
Here is an example:
```python
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, weight_decay=0.0001),
    # norm_type: type of the used p-norm, here norm_type is 2.
    clip_grad=dict(max_norm=35, norm_type=2))
```
### Gradient accumulation
When computing resources are limited, the batch size can only be set to a small value, which may affect the performance of models. Gradient accumulation can be used to work around this problem. We support the `accumulative_counts` option in `optim_wrapper` for gradient accumulation.
Here is an example:
```python
train_dataloader = dict(batch_size=64)
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, weight_decay=0.0001),
    accumulative_counts=4)
```
This indicates that during training, back-propagation is performed once every 4 iterations, and the above is equivalent to:
```python
train_dataloader = dict(batch_size=256)
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, weight_decay=0.0001))
```
## Customize parameter schedules
In training, optimization parameters such as learning rate and momentum are usually not fixed but change through iterations or epochs. PyTorch supports several learning rate schedulers, but they are not sufficient for complex strategies. In MMPretrain, we provide `param_scheduler` for better control of different parameter schedules.
### Customize learning rate schedules
Learning rate schedulers are widely used to improve performance. We support most of the PyTorch schedulers, including `ExponentialLR`, `LinearLR`, `StepLR`, `MultiStepLR`, etc.
All available learning rate schedulers can be found {external+mmengine:doc}`here <api/optim>`, and the
names of learning rate schedulers end with `LR`.
- **Single learning rate schedule**
In most cases, we use only one learning rate schedule for simplicity. For instance, [`MultiStepLR`](mmengine.optim.MultiStepLR) is used as the default learning rate schedule for ResNet. Here, `param_scheduler` is a dictionary.
```python
param_scheduler = dict(
    type='MultiStepLR',
    by_epoch=True,
    milestones=[100, 150],
    gamma=0.1)
```
Or, if we want to use the [`CosineAnnealingLR`](mmengine.optim.CosineAnnealingLR) scheduler to decay the learning rate:
```python
param_scheduler = dict(
    type='CosineAnnealingLR',
    by_epoch=True,
    T_max=num_epochs)
```
- **Multiple learning rate schedules**
In some training cases, multiple learning rate schedules are applied for higher accuracy. For example, in the early stage, training tends to be volatile, and warmup is a technique to reduce this volatility.
The learning rate will increase gradually from a small value to the expected value during warmup, and decay afterwards by other schedules.
In MMPretrain, simply combining the desired schedules in `param_scheduler` as a list achieves the warmup strategy.
Here are some examples:
1. Linear warmup during the first 50 iterations.
```python
param_scheduler = [
    # Linear warm-up by iterations.
    dict(type='LinearLR',
         start_factor=0.001,
         by_epoch=False,  # by iterations
         end=50),  # only warm up for the first 50 iterations
    # The main learning rate schedule.
    dict(type='MultiStepLR',
         by_epoch=True,
         milestones=[8, 11],
         gamma=0.1)
]
```
2. Linear warmup and update the lr by iteration during the first 10 epochs.
```python
param_scheduler = [
    # Linear warm-up by epochs in the [0, 10) epochs.
    dict(type='LinearLR',
         start_factor=0.001,
         by_epoch=True,
         end=10,
         convert_to_iter_based=True,  # Update the learning rate by iteration.
    ),
    # Use the CosineAnnealing schedule after 10 epochs.
    dict(type='CosineAnnealingLR', by_epoch=True, begin=10)
]
```
Notice that we use the `begin` and `end` arguments here to assign the valid range, which is [`begin`, `end`), for each schedule. The range unit is defined by the `by_epoch` argument. If not specified, `begin` is 0 and `end` is the max number of epochs or iterations.
If the ranges of all schedules are not continuous, the learning rate will stay constant in the ignored ranges; otherwise, all valid schedulers will be executed in order within a specific stage, which behaves the same as the PyTorch [`ChainedScheduler`](torch.optim.lr_scheduler.ChainedScheduler).
```{tip}
To check that the learning rate curve is as expected, after completing your configuration file, you could use the [optimizer parameter visualization tool](../useful_tools/scheduler_visualization.md) to draw the corresponding learning rate adjustment curve.
```
### Customize momentum schedules
We support using momentum schedulers to modify the optimizer's momentum according to the learning rate, which could make the loss converge faster. The usage is the same as for learning rate schedulers.
All available momentum schedulers can be found {external+mmengine:doc}`here <api/optim>`, and the
names of momentum schedulers end with `Momentum`.
Here is an example:
```python
param_scheduler = [
    # The lr scheduler.
    dict(type='LinearLR', ...),
    # The momentum scheduler.
    dict(type='LinearMomentum',
         start_factor=0.001,
         by_epoch=False,
         begin=0,
         end=1000)
]
```
## Add new optimizers or constructors
```{note}
This part modifies the MMPretrain source code or adds code to the MMPretrain framework; beginners can skip it.
```
### Add new optimizers
In academic research and industrial practice, it may be necessary to use optimization methods not implemented by MMPretrain, and you can add them through the following steps.
1. Implement a New Optimizer
Assume you want to add an optimizer named `MyOptimizer`, which has arguments `a`, `b`, and `c`.
You need to create a new file under `mmpretrain/engine/optimizers`, and implement the new optimizer there, for example, in `mmpretrain/engine/optimizers/my_optimizer.py`:
```python
from torch.optim import Optimizer

from mmpretrain.registry import OPTIMIZERS


@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):

    def __init__(self, a, b, c):
        ...

    def step(self, closure=None):
        ...
```
2. Import the Optimizer
To find the module defined above, it needs to be imported when the program runs.
Import it in `mmpretrain/engine/optimizers/__init__.py` to add it into the `mmpretrain.engine` package.
```python
# In mmpretrain/engine/optimizers/__init__.py
...
from .my_optimizer import MyOptimizer  # 'MyOptimizer' may be any other class name

__all__ = [..., 'MyOptimizer']
```
At runtime, the `mmpretrain.engine` package will be imported automatically, registering `MyOptimizer` at the same time.
3. Specify the Optimizer in Config
Then you can use `MyOptimizer` in the `optim_wrapper.optimizer` field of config files.
```python
optim_wrapper = dict(
    optimizer=dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value))
```
### Add new optimizer constructors
Some models may have parameter-specific settings for optimization, like different weight decay rates for all `BatchNorm` layers.
Although we can already use [the `optim_wrapper.paramwise_cfg` field](#fine-grained-parameter-wise-configuration) to
configure various parameter-specific optimizer settings, it may still not cover your needs.
Of course, you can modify it. By default, we use the [`DefaultOptimWrapperConstructor`](mmengine.optim.DefaultOptimWrapperConstructor)
class to deal with the construction of the optimizer. During the construction, it finely configures the optimizer settings of
different parameters according to `paramwise_cfg`, and it can also serve as a template for new optimizer constructors.
You can override these behaviors by adding new optimizer constructors.
```python
# In mmpretrain/engine/optimizers/my_optim_constructor.py
from mmengine.optim import DefaultOptimWrapperConstructor

from mmpretrain.registry import OPTIM_WRAPPER_CONSTRUCTORS


@OPTIM_WRAPPER_CONSTRUCTORS.register_module()
class MyOptimWrapperConstructor:

    def __init__(self, optim_wrapper_cfg, paramwise_cfg=None):
        ...

    def __call__(self, model):
        ...
```
Here is a specific example: [LearningRateDecayOptimWrapperConstructor](mmpretrain.engine.optimizers.LearningRateDecayOptimWrapperConstructor).
Then, import it and use it almost the same way as in [Add new optimizers](#add-new-optimizers).
1. Import it in the `mmpretrain/engine/optimizers/__init__.py` to add it into the `mmpretrain.engine` package.
```python
# In mmpretrain/engine/optimizers/__init__.py
...
from .my_optim_constructor import MyOptimWrapperConstructor
__all__ = [..., 'MyOptimWrapperConstructor']
```
2. Use `MyOptimWrapperConstructor` in the `optim_wrapper.constructor` field of config files.
```python
optim_wrapper = dict(
    constructor=dict(type='MyOptimWrapperConstructor'),
    optimizer=...,
    paramwise_cfg=...,
)
```
.. role:: hidden
    :class: hidden-section

.. module:: mmpretrain.apis

mmpretrain.apis
===================================

These are some high-level APIs for classification tasks.

.. contents:: mmpretrain.apis
   :depth: 2
   :local:
   :backlinks: top

Model
------------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    list_models
    get_model

Inference
------------------

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: callable.rst

    ImageClassificationInferencer
    ImageRetrievalInferencer
    ImageCaptionInferencer
    VisualQuestionAnsweringInferencer
    VisualGroundingInferencer
    TextToImageRetrievalInferencer
    ImageToTextRetrievalInferencer
    NLVRInferencer
    FeatureExtractor

.. autosummary::
    :toctree: generated
    :nosignatures:

    inference_model
.. role:: hidden
    :class: hidden-section

Data Process
=================

In MMPreTrain, the data process and the dataset are decoupled. The
datasets only define how to get samples' basic information from the file
system. This basic information includes the ground-truth label and the raw
image data / the paths of images. The data process includes data transforms,
data preprocessors and batch augmentations.

- :mod:`Data Transforms <mmpretrain.datasets.transforms>`: Transforms include loading, preprocessing, formatting, etc.
- :mod:`Data Preprocessors <mmpretrain.models.utils.data_preprocessor>`: Processes include collation, normalization, stacking, channel flipping, etc.
- :mod:`Batch Augmentations <mmpretrain.models.utils.batch_augments>`: Batch augmentation involves multiple samples, such as Mixup and CutMix.
.. module:: mmpretrain.datasets.transforms
Data Transforms
--------------------
To prepare the input data, we need to apply some transforms to this basic
information, including loading, preprocessing and formatting. A series of
data transforms makes up a data pipeline. Therefore, you can find a
``pipeline`` argument in the dataset configs, for example:
.. code:: python
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='RandomResizedCrop', scale=224),
dict(type='RandomFlip', prob=0.5, direction='horizontal'),
dict(type='PackInputs'),
]
train_dataloader = dict(
....
dataset=dict(
pipeline=train_pipeline,
....),
....
)
Every item in the pipeline list is one of the following data transform classes. If you want to add a custom data transform class, the tutorial :doc:`Custom Data Pipelines </advanced_guides/pipeline>` will help you.
.. contents::
:depth: 1
:local:
:backlinks: top
Loading and Formatting
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autosummary::
:toctree: generated
:nosignatures:
:template: data_transform.rst
LoadImageFromFile
PackInputs
PackMultiTaskInputs
PILToNumpy
NumpyToPIL
Transpose
Collect
Processing and Augmentation
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autosummary::
:toctree: generated
:nosignatures:
:template: data_transform.rst
Albumentations
CenterCrop
ColorJitter
EfficientNetCenterCrop
EfficientNetRandomCrop
Lighting
Normalize
RandomCrop
RandomErasing
RandomFlip
RandomGrayscale
RandomResize
RandomResizedCrop
Resize
ResizeEdge
BEiTMaskGenerator
SimMIMMaskGenerator
Composed Augmentation
"""""""""""""""""""""
Composed augmentations are methods that compose a series of data
augmentation transforms, such as ``AutoAugment`` and ``RandAugment``.
.. autosummary::
:toctree: generated
:nosignatures:
:template: data_transform.rst
AutoAugment
RandAugment
The above transforms are composed of a group of policies chosen from the
following random transforms:
.. autosummary::
:toctree: generated
:nosignatures:
:template: data_transform.rst
AutoContrast
Brightness
ColorTransform
Contrast
Cutout
Equalize
GaussianBlur
Invert
Posterize
Rotate
Sharpness
Shear
Solarize
SolarizeAdd
Translate
BaseAugTransform
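For example, a typical ``RandAugment`` configuration in a training pipeline looks like the following sketch (based on common MMPreTrain configs; the hyper-parameter values are illustrative):

.. code:: python

    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='RandomResizedCrop', scale=224),
        dict(
            type='RandAugment',
            policies='timm_increasing',  # a predefined policy set
            num_policies=2,              # number of policies applied per image
            total_level=10,
            magnitude_level=9),
        dict(type='PackInputs'),
    ]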
MMCV transforms
^^^^^^^^^^^^^^^
MMCV also provides many transforms that you can use directly in config files. The whole transform list can be found in :external+mmcv:doc:`api/transforms`.
Transform Wrapper
^^^^^^^^^^^^^^^^^
.. autosummary::
:toctree: generated
:nosignatures:
:template: data_transform.rst
MultiView
TorchVision Transforms
^^^^^^^^^^^^^^^^^^^^^^
We also support all the transforms in TorchVision. You can use them like the following examples:

**1. Use some TorchVision transforms wrapped by NumpyToPIL and PILToNumpy (Recommended)**

Wrap the TorchVision transforms between ``dict(type='NumpyToPIL', to_rgb=True)`` and ``dict(type='PILToNumpy', to_bgr=True)``:
.. code:: python
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='NumpyToPIL', to_rgb=True), # from BGR in cv2 to RGB in PIL
    dict(type='torchvision/RandomResizedCrop', size=176),
dict(type='PILToNumpy', to_bgr=True), # from RGB in PIL to BGR in cv2
dict(type='RandomFlip', prob=0.5, direction='horizontal'),
dict(type='PackInputs'),
]
data_preprocessor = dict(
num_classes=1000,
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True, # from BGR in cv2 to RGB in PIL
)
**2. Use TorchVision Augs and ToTensor&Normalize**
Make sure the ``'img'`` has been converted from the BGR NumPy format to the PIL format before it is processed by the TorchVision transforms.
.. code:: python
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='NumpyToPIL', to_rgb=True), # from BGR in cv2 to RGB in PIL
dict(
type='torchvision/RandomResizedCrop',
size=176,
interpolation='bilinear'), # accept str format interpolation mode
dict(type='torchvision/RandomHorizontalFlip', p=0.5),
dict(
type='torchvision/TrivialAugmentWide',
interpolation='bilinear'),
dict(type='torchvision/PILToTensor'),
dict(type='torchvision/ConvertImageDtype', dtype=torch.float),
dict(
type='torchvision/Normalize',
mean=(0.485, 0.456, 0.406),
std=(0.229, 0.224, 0.225),
),
dict(type='torchvision/RandomErasing', p=0.1),
dict(type='PackInputs'),
]
data_preprocessor = dict(num_classes=1000, mean=None, std=None, to_rgb=False) # Normalize in dataset pipeline
**3. Use TorchVision Augs Except ToTensor&Normalize**
.. code:: python
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='NumpyToPIL', to_rgb=True), # from BGR in cv2 to RGB in PIL
dict(type='torchvision/RandomResizedCrop', size=176, interpolation='bilinear'),
dict(type='torchvision/RandomHorizontalFlip', p=0.5),
dict(type='torchvision/TrivialAugmentWide', interpolation='bilinear'),
dict(type='PackInputs'),
]
# here the Normalize parameters are for images in RGB format
data_preprocessor = dict(
num_classes=1000,
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=False,
)
.. module:: mmpretrain.models.utils.data_preprocessor

Data Preprocessors
------------------

The data preprocessor is also a component that processes data before feeding it to the neural network.
Compared with data transforms, the data preprocessor is a module of the classifier,
and it processes a whole batch of data at once, which means it can run on the GPU and use batching to accelerate processing.
The default data preprocessor in MMPreTrain performs the following pre-processing:
1. Move data to the target device.
2. Pad inputs to the maximum size of the current batch.
3. Stack inputs into a batch.
4. Convert inputs from BGR to RGB if the shape of the input is (3, H, W).
5. Normalize images with the defined std and mean.
6. Apply batch augmentations like Mixup and CutMix during training.
You can configure the data preprocessor by the ``data_preprocessor`` field or ``model.data_preprocessor`` field in the config file. Typical usages are as below:
.. code-block:: python
data_preprocessor = dict(
# RGB format normalization parameters
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True, # convert image from BGR to RGB
)
Or define it in the ``model.data_preprocessor`` field as follows:
.. code-block:: python
model = dict(
    backbone=...,
    neck=...,
    head=...,
    data_preprocessor=dict(
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    train_cfg=...,
)
Note that the ``model.data_preprocessor`` has higher priority than ``data_preprocessor``.
.. autosummary::
:toctree: generated
:nosignatures:
ClsDataPreprocessor
SelfSupDataPreprocessor
TwoNormDataPreprocessor
VideoDataPreprocessor
.. module:: mmpretrain.models.utils.batch_augments
Batch Augmentations
^^^^^^^^^^^^^^^^^^^^
The batch augmentation is a component of the data preprocessor. It involves multiple samples and mixes them in some way, such as Mixup and CutMix.
These augmentations are usually only used during training; therefore, we configure them in the ``model.train_cfg`` field in config files.
.. code-block:: python
model = dict(
backbone=...,
neck=...,
head=...,
train_cfg=dict(augments=[
dict(type='Mixup', alpha=0.8),
dict(type='CutMix', alpha=1.0),
]),
)
You can also specify the probability of each batch augmentation with the ``probs`` field.
.. code-block:: python
model = dict(
backbone=...,
neck=...,
head=...,
train_cfg=dict(augments=[
dict(type='Mixup', alpha=0.8),
dict(type='CutMix', alpha=1.0),
], probs=[0.3, 0.7])
)
Here is a list of the batch augmentations that can be used in MMPreTrain.
.. autosummary::
:toctree: generated
:nosignatures:
:template: callable.rst
Mixup
CutMix
ResizeMix
.. role:: hidden
:class: hidden-section
.. module:: mmpretrain.datasets
mmpretrain.datasets
===================================
The ``datasets`` package contains several common datasets for image classification tasks, as well as some dataset wrappers.
.. contents:: mmpretrain.datasets
:depth: 2
:local:
:backlinks: top
Custom Dataset
--------------
.. autoclass:: CustomDataset
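For instance, a minimal config sketch using ``CustomDataset`` with a sub-folder-per-class layout (the paths are placeholders):

.. code-block:: python

    train_dataloader = dict(
        batch_size=32,
        dataset=dict(
            type='CustomDataset',
            data_root='data/my_dataset/train',  # hypothetical path
            with_label=True,  # sub-folder names are used as class names
            pipeline=train_pipeline,
        ),
    )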
ImageNet
--------
.. autoclass:: ImageNet
.. autoclass:: ImageNet21k
CIFAR
-----
.. autoclass:: CIFAR10
.. autoclass:: CIFAR100
MNIST
-----
.. autoclass:: MNIST
.. autoclass:: FashionMNIST
VOC
---
.. autoclass:: VOC
CUB
---
.. autoclass:: CUB
Places205
---------
.. autoclass:: Places205
Retrieval
---------
.. autoclass:: InShop
Base classes
------------
.. autoclass:: BaseDataset
.. autoclass:: MultiLabelDataset
Caltech101
----------------
.. autoclass:: Caltech101
Food101
----------------
.. autoclass:: Food101
DTD
----------------
.. autoclass:: DTD
FGVCAircraft
----------------
.. autoclass:: FGVCAircraft
Flowers102
----------------
.. autoclass:: Flowers102
StanfordCars
----------------
.. autoclass:: StanfordCars
OxfordIIITPet
----------------
.. autoclass:: OxfordIIITPet
SUN397
----------------
.. autoclass:: SUN397
RefCOCO
--------
.. autoclass:: RefCOCO
Dataset Wrappers
----------------
.. autoclass:: KFoldDataset
The dataset wrappers in MMEngine can be used directly in MMPreTrain.
.. list-table::
* - :class:`~mmengine.dataset.ConcatDataset`
- A wrapper of concatenated dataset.
* - :class:`~mmengine.dataset.RepeatDataset`
- A wrapper of repeated dataset.
* - :class:`~mmengine.dataset.ClassBalancedDataset`
- A wrapper of class balanced dataset.
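For example, a config sketch that repeats a dataset 10 times per epoch with ``RepeatDataset``:

.. code-block:: python

    train_dataloader = dict(
        dataset=dict(
            type='RepeatDataset',
            times=10,
            dataset=dict(
                type='CIFAR10',
                data_root='data/cifar10',
                pipeline=train_pipeline,
            ),
        ),
    )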