We provide a script adapted from [MMEngine](https://github.com/open-mmlab/mmengine/blob/main/mmengine/analysis/complexity_analysis.py) to compute the FLOPs and params of a given model.
- `--shape`: Input size; supports one or two values, such as `--shape 256` or `--shape 224 256`. If not set, defaults to `224 224`.
This tool is still experimental, and we do not guarantee that the number is correct. You may use the result for simple comparisons, but double-check it before adopting it in technical reports or papers.
- FLOPs are related to the input shape while parameters are not. The default input shape is (1, 3, 224, 224).
- Some operators, such as custom operators, are not counted in FLOPs. Refer to [`mmengine.analysis.complexity_analysis._DEFAULT_SUPPORTED_FLOP_OPS`](https://github.com/open-mmlab/mmengine/blob/main/mmengine/analysis/complexity_analysis.py) for details.
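As a rough cross-check of the reported numbers, the multiply-add count of a plain convolution can be estimated by hand with the usual formula. This is a simplified sketch; the `conv2d_flops_params` helper below is hypothetical and not part of the tool:

```python
def conv2d_flops_params(c_in, c_out, k, h_out, w_out):
    """Rough multiply-add and parameter counts for a plain Conv2d
    (no bias, no groups): params = k*k*c_in*c_out, FLOPs = params * H * W."""
    params = k * k * c_in * c_out
    flops = params * h_out * w_out
    return flops, params

# A 3x3 conv from 64 to 128 channels on a 56x56 output feature map:
flops, params = conv2d_flops_params(64, 128, 3, 56, 56)
print(flops, params)  # 231211008 73728
```

Such hand estimates are useful for sanity-checking the tool's per-layer breakdown, but remember that the tool may count (or skip) extra operators such as normalization and activation layers.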
MMPretrain provides the `tools/analysis_tools/confusion_matrix.py` tool to calculate and visualize the confusion matrix. For an introduction to the confusion matrix, see [link](https://en.wikipedia.org/wiki/Confusion_matrix).
## Command-line Usage
**Command**:
```shell
python tools/analysis_tools/confusion_matrix.py \
    ${CONFIG_FILE} \
    ${CHECKPOINT} \
    [--show] \
    [--show-path] \
    [--include-values] \
    [--cmap ${CMAP}] \
    [--cfg-options ${CFG_OPTIONS}]
```
**Description of all arguments**:
- `config`: The path of the model config file.
- `checkpoint`: The path of the checkpoint.
- `--show`: Whether to show the matplotlib visualization of the confusion matrix. Defaults to `False`.
- `--show-path`: If `show` is True, the path to save the visualized result.
- `--include-values`: Whether to add values to the visualization result.
- `--cmap`: The color map used for the visualization result. Defaults to `viridis`.
- `--cfg-options`: Modifications to the configuration file, refer to [Learn about Configs](../user_guides/config.md).
- `-o, --output-dir`: The output path for the visualized images. If not specified, it will be set to `''`, which means not to save.
- **`-p, --phase`**: Phase of the dataset to visualize, must be one of `['train', 'val', 'test']`. If not specified, it will be set to `'train'`.
- **`-n, --show-number`**: The number of samples to visualize. If not specified, display all images in the dataset.
- `--show-interval`: The display interval, in seconds.
- **`-m, --mode`**: The display mode, can be one of `['original', 'transformed', 'concat', 'pipeline']`. If not specified, it will be set to `'transformed'`.
- `-r, --rescale-factor`: The image rescale factor, which is useful if the output is too large or too small in the `original` mode.
- `-c, --channel-order`: The channel order of the displayed images, either "BGR" or "RGB". If not specified, it will be set to 'BGR'.
- `--cfg-options`: Modifications to the configuration file, refer to [Learn about Configs](../user_guides/config.md).
```{note}
1. The `-m, --mode` option sets the display mode: show the original pictures, the transformed pictures, or a comparison of both:
   - "original" means to show the images loaded from disk;
   - "transformed" means to show the images after being transformed;
   - "concat" means to show the "original" and "transformed" images stitched together;
   - "pipeline" means to show all the intermediate images throughout the pipeline.
2. The `-r, --rescale-factor` option is set when the label information is too large or too small relative to the picture. For example, when visualizing the CIFAR dataset, since the resolution of the images is very small, `--rescale-factor` can be set to 10.
```
## How to visualize the original image
In **'original'** mode:
```shell
python ./tools/visualization/browse_dataset.py ./configs/resnet/resnet101_8xb16_cifar10.py --phase val --output-dir tmp --mode original --show-number 100 --rescale-factor 10 --channel-order RGB
```
- `--phase val`: Visualize the validation set, can be simplified to `-p val`;
- `--output-dir tmp`: Save the visualization results in the "tmp" folder, can be simplified to `-o tmp`;
- `--mode original`: Visualize the original images, can be simplified to `-m original`;
- `--show-number 100`: Visualize 100 images, can be simplified to `-n 100`;
- `--rescale-factor 10`: Enlarge the images by 10 times, can be simplified to `-r 10`;
- `--channel-order RGB`: Set the channel order of the visualized images to "RGB", can be simplified to `-c RGB`.
### How to plot the confusion matrix for the test result
We provide `tools/analysis_tools/confusion_matrix.py` to enable users to plot the confusion matrix from prediction files.
```shell
python tools/analysis_tools/confusion_matrix.py \
    ${CONFIG} \
    ${RESULT} \
    [--out ${OUT}] \
    [--show] \
    [--show-path ${SHOW_PATH}] \
    [--include-values] \
    [--cmap ${CMAP}] \
    [--cfg-options ${CFG_OPTIONS} ...]
```
Description of all arguments:
- `config`: The config file path.
- `result`: The output result file in pickle format from `tools/test.py`, or a checkpoint file.
- `--out`: The path to save the confusion matrix in pickle format.
- `--show`: Whether to show the confusion matrix plot.
- `--show-path`: The path to save the confusion matrix plot.
- `--include-values`: Whether to show the values in the confusion matrix plot.
- `--cmap`: The color map to plot the confusion matrix.
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file. For more details, please refer to [Learn about Configs](../user_guides/config.md).
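Conceptually, what gets plotted is just a per-class tally of predictions against ground-truth labels. A minimal pure-Python sketch of that computation (a hypothetical helper, not the tool's code):

```python
def confusion_matrix(gt_labels, pred_labels, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for gt, pred in zip(gt_labels, pred_labels):
        matrix[gt][pred] += 1
    return matrix

# Three classes, six samples; off-diagonal entries are misclassifications.
print(confusion_matrix([0, 0, 1, 2, 2, 2], [0, 1, 1, 2, 0, 2], 3))
# [[1, 1, 0], [0, 1, 0], [1, 0, 2]]
```

The tool adds the dataset metadata, normalization and matplotlib rendering on top of this basic counting.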
```{note}
In `tools/test.py`, we support using `--out-item` option to select which kind of results will be saved.
Please ensure the `--out-item` is not specified or `--out-item=pred` to use this tool.
```
We can use `tools/analysis_tools/analyze_results.py` to save the images with the highest scores in successful or failed prediction.
```shell
python tools/analysis_tools/analyze_results.py \
    ${CONFIG} \
    ${RESULT} \
    [--out-dir ${OUT_DIR}] \
    [--topk ${TOPK}] \
    [--rescale-factor ${RESCALE_FACTOR}] \
    [--cfg-options ${CFG_OPTIONS}]
```
**Description of all arguments**:
- `config`: The path of the model config file.
- `result`: Output result file in json/pickle format from `tools/test.py`.
- `--out-dir`: Directory to store output files.
- `--topk`: The number of images in successful or failed predictions with the highest `topk` scores to save. If not specified, it will be set to 20.
- `--rescale-factor`: Image rescale factor, which is useful if the output is too large or too small (too small images may make the prediction annotations illegible).
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file. For more details, please refer to [Learn about Configs](../user_guides/config.md).
```{note}
In `tools/test.py`, we support using `--out-item` option to select which kind of results will be saved.
Please ensure the `--out-item` is not specified or `--out-item=pred` to use this tool.
```

```{note}
Check the official docs for [running TorchServe with docker](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment).

In order to run on GPU, you need to install [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). You can omit the `--gpus` argument in order to run on CPU.

`realpath ./checkpoints` points to the absolute path of "./checkpoints", and you can replace it with the absolute path where you store your torchserve models.
```
[Read the docs](https://github.com/pytorch/serve/blob/master/docs/rest_api.md) about the Inference (8080), Management (8081) and Metrics (8082) APIs.
This tool aims to help the user check the hyper-parameter scheduler of the optimizer (without training). It supports visualizing the "learning rate" and "momentum" curves.
## Introduce the scheduler visualization tool
```bash
python tools/visualization/vis_scheduler.py \
    ${CONFIG_FILE} \
    [-p, --parameter ${PARAMETER_NAME}] \
    [-d, --dataset-size ${DATASET_SIZE}] \
    [-n, --ngpus ${NUM_GPUs}] \
    [-s, --save-path ${SAVE_PATH}] \
    [--title ${TITLE}] \
    [--style ${STYLE}] \
    [--window-size ${WINDOW_SIZE}] \
    [--cfg-options]
```
**Description of all arguments**:
- `config`: The path of a model config file.
- **`-p, --parameter`**: The parameter whose change curve to visualize, chosen from "lr" and "momentum". Defaults to "lr".
- **`-d, --dataset-size`**: The size of the dataset. If set, `build_dataset` will be skipped and `${DATASET_SIZE}` will be used as the size. Defaults to using the function `build_dataset`.
- **`-n, --ngpus`**: The number of GPUs used in training. Defaults to 1.
- **`-s, --save-path`**: The save path of the learning rate curve plot. Defaults to not saving.
- `--title`: Title of the figure. If not set, defaults to the config file name.
- `--style`: Style of the plot. If not set, defaults to `whitegrid`.
- `--window-size`: The shape of the display window. If not specified, it will be set to `12*7`. If used, it must be in the format `'W*H'`.
- `--cfg-options`: Modifications to the configuration file, refer to [Learn about Configs](../user_guides/config.md).
```{note}
Loading annotations may consume much time; you can directly specify the size of the dataset with `-d, --dataset-size` to save time.
```
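To see what the tool is plotting, a step ("MultiStepLR"-style) schedule can be simulated in a few lines. This is a toy sketch assuming epoch-based decay; `step_lr_by_epoch` is a hypothetical helper, not the scheduler implementation:

```python
def step_lr_by_epoch(base_lr, milestones, gamma, max_epochs):
    """Multiply the learning rate by `gamma` at each milestone epoch
    and record the value used in every epoch."""
    lrs, lr = [], base_lr
    for epoch in range(max_epochs):
        if epoch in milestones:
            lr *= gamma
        lrs.append(lr)
    return lrs

# lr starts at 0.1 and decays by 10x at epochs 3 and 6.
lrs = step_lr_by_epoch(0.1, [3, 6], 0.1, 8)
print(lrs)
```

The real tool reads these decay points from `param_scheduler` in the config and draws the resulting curve with matplotlib.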
## How to plot the learning rate curve without training
You can use the following command to plot the step learning rate schedule used in the config `configs/swin_transformer/swin-base_16xb64_in1k.py`:
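Based on the arguments documented above, the command should look along these lines (the `--dataset-size` value here is an assumption corresponding to the ImageNet training-set size of 1,281,167 images):

```shell
python tools/visualization/vis_scheduler.py \
    configs/swin_transformer/swin-base_16xb64_in1k.py \
    --dataset-size 1281167 \
    --ngpus 16
```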
Shape bias measures how much a model relies on shapes, compared to textures, to recognize the semantics in images. For more details,
we recommend interested readers refer to this [paper](https://arxiv.org/abs/2106.07411). MMPretrain provides an off-the-shelf toolbox to
obtain the shape bias of a classification model. You can follow the steps below:
## Prepare the dataset
First, you should download the [cue-conflict](https://github.com/bethgelab/model-vs-human/releases/download/v0.1/cue-conflict.tar.gz) dataset to the `data` folder,
and then unzip it. After that, your `data` folder should have the following structure:
```text
data
├── cue-conflict
│   ├── airplane
│   ├── bear
│   ├── ...
│   └── truck
```
## Modify the config for classification
We run the shape-bias tool on a ViT-base model with masked autoencoder pretraining. Its config file is `configs/mae/benchmarks/vit-base-p16_8xb128-coslr-100e_in1k.py`, and its checkpoint is downloaded from [this link](https://download.openmmlab.com/mmselfsup/1.x/mae/mae_vit-base-p16_8xb512-fp16-coslr-1600e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k_20220825-cf70aa21.pth). Replace the original test_pipeline, test_dataloader and test_evaluation with the following configurations:
Please note that you should make custom modifications to the `csv_dir` and `model_name` above. Here, the modified sample config file is renamed `vit-base-p16_8xb128-coslr-100e_in1k_shape-bias.py` in the folder `configs/mae/benchmarks/`.
## Inference your model with the above modified config file
Then you should run inference with your model on the `cue-conflict` dataset using your modified config file.
```shell
# For PyTorch
bash tools/dist_test.sh $CONFIG $CHECKPOINT
```
**Description of all arguments**:
- `$CONFIG`: The path of your modified config file.
- `$CHECKPOINT`: The path or link of the checkpoint file.
After that, you should obtain a csv file in the `csv_dir` folder named `cue-conflict_model-name_session-1.csv`. Besides this file, you should also download these [csv files](https://github.com/bethgelab/model-vs-human/tree/master/raw-data/cue-conflict) to the
`csv_dir`.
## Plot shape bias
Then we can start to plot the shape bias:
```shell
python tools/analysis_tools/shape_bias.py --csv-dir $CSV_DIR --result-dir $RESULT_DIR --colors $RGB --markers o --plotting-names $YOUR_MODEL_NAME --model-names $YOUR_MODEL_NAME
```
**Description of all arguments**:
- `--csv-dir $CSV_DIR`: The directory where these csv files are saved.
- `--result-dir $RESULT_DIR`: The directory to output the result named `cue-conflict_shape-bias_matrixplot.pdf`.
- `--colors $RGB`: The RGB values, formatted as R G B, e.g. 100 100 100. It can be multiple RGB values if you want to plot the shape bias of several models.
- `--plotting-names $YOUR_MODEL_NAME`: The name of the legend in the shape bias figure; you can set it to your model name. If you want to plot several models, `plotting_names` can be multiple values.
- `--model-names $YOUR_MODEL_NAME`: Should be the same name specified in your config; it can be multiple names if you want to plot the shape bias of several models.

Please note that every three values for `--colors` correspond to one value for `--model-names`. After all of the above steps, you are expected to obtain the following figure.
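For intuition, the shape-bias metric itself reduces to the fraction of shape-based decisions among all shape-or-texture decisions on cue-conflict images. A simplified sketch (a hypothetical helper, not the toolbox's code):

```python
def shape_bias(decisions):
    """`decisions` lists the model's call ('shape', 'texture', or 'neither')
    on each cue-conflict image; shape bias is the share of shape decisions
    among all shape-or-texture decisions."""
    shape = decisions.count('shape')
    texture = decisions.count('texture')
    return shape / (shape + texture)

print(shape_bias(['shape', 'texture', 'shape', 'neither', 'shape']))  # 0.75
```

The plotting script computes this per category from the csv files and places each model on the shape-vs-texture axis of the matrix plot.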
MMPretrain provides the `tools/visualization/vis_tsne.py` tool to visualize the feature embeddings of images by t-SNE. Please install `scikit-learn` (`pip install scikit-learn`) to calculate t-SNE.
- `--checkpoint CHECKPOINT`: The path of the checkpoint file.
- `--work-dir WORK_DIR`: The directory to save logs and visualization images.
- `--test-cfg TEST_CFG`: The path of the t-SNE config file to load the config of the test dataloader.
- `--vis-stage {backbone,neck,pre_logits}`: The visualization stage of the model.
- `--class-idx CLASS_IDX [CLASS_IDX ...]`: The categories used to calculate t-SNE.
- `--max-num-class MAX_NUM_CLASS`: The first N categories to apply t-SNE algorithms. Defaults to 20.
- `--max-num-samples MAX_NUM_SAMPLES`: The maximum number of samples per category. Larger numbers take longer to calculate. Defaults to 100.
- `--cfg-options CFG_OPTIONS [CFG_OPTIONS ...]`: Override some settings in the used config. The key-value pairs in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be like key="[a,b]" or key=a,b. It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]". Note that the quotation marks are necessary and that no white space is allowed.
- `--device DEVICE`: Device used for inference.
- `--legend`: Show the legend of all categories.
- `--show`: Display the result in a graphical window.
- `--n-components N_COMPONENTS`: The dimension of the results.
- `--perplexity PERPLEXITY`: The perplexity is related to the number of nearest neighbors that are used in other manifold learning algorithms.
- `--early-exaggeration EARLY_EXAGGERATION`: Controls how tight natural clusters in the original space are in the embedded space and how much space will be between them.
- `--learning-rate LEARNING_RATE`: The learning rate for t-SNE is usually in the range [10.0, 1000.0]. If the learning rate is too high, the data may look like a ball with any point approximately equidistant from its nearest neighbours. If the learning rate is too low, most points may look compressed in a dense cloud with few outliers.
- `--n-iter N_ITER`: Maximum number of iterations for the optimization. Should be at least 250.
- `--n-iter-without-progress N_ITER_WITHOUT_PROGRESS`: Maximum number of iterations without progress before we abort the optimization.
- `--init INIT`: The initialization method.
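Under the hood, the tool hands the extracted features to scikit-learn's `TSNE`; that step can be reproduced standalone, with random features as a stand-in for backbone outputs:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
feats = rng.normal(size=(30, 64))  # stand-in for 30 backbone feature vectors

# The CLI options above map directly onto these TSNE parameters.
emb = TSNE(n_components=2, perplexity=5.0, init='pca',
           learning_rate=200.0, random_state=0).fit_transform(feats)
print(emb.shape)  # (30, 2)
```

Note that `perplexity` must be smaller than the number of samples, which is why the tool caps the number of classes and samples per class.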
## How to visualize the t-SNE of an image classifier (such as ResNet)
Here are two examples of running t-SNE visualization on ResNet-18 and ResNet-50 models, both trained on the CIFAR-10 dataset:
## How to visualize the t-SNE of a self-supervised model (such as MAE)
Here is an example of running t-SNE visualization on an MAE-ViT-base model, trained on the ImageNet dataset. The input data is from the ImageNet validation set. MAE and some other self-supervised pre-training algorithms do not have test_dataloader information. When analyzing such self-supervised algorithms, you need to add test_dataloader information in the config, or you can use the `--test-cfg` argument to specify a config file.
In MMPretrain, we also provide a tool `tools/misc/verify_dataset.py` to check whether there are any **broken pictures** in the given dataset.
## Introduce the tool
```shell
python tools/misc/verify_dataset.py \
    ${CONFIG} \
    [--out-path ${OUT_PATH}] \
    [--phase ${PHASE}] \
    [--num-process ${NUM_PROCESS}] \
    [--cfg-options ${CFG_OPTIONS}]
```
**Description of all arguments**:
- `config`: The path of the model config file.
- `--out-path`: The path to save the verification result. If not set, defaults to 'brokenfiles.log'.
- `--phase`: Phase of the dataset to verify, accepts "train", "test" and "val". If not set, defaults to "train".
- `--num-process`: The number of processes to use. If not set, defaults to 1.
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file. For more details, please refer to [Learn about Configs](../user_guides/config.md).
## Example
```shell
python tools/misc/verify_dataset.py configs/t2t_vit/t2t-vit-t-14_8xb64_in1k.py --out-path broken_imgs.log --phase val --num-process 8
```
To manage various configurations in a deep-learning experiment, we use a kind of config file to record all of
these configurations. This config system has a modular and inheritance design, and more details can be found in
{external+mmengine:doc}`the tutorial in MMEngine <advanced_tutorials/config>`.
Usually, we use python files as config file. All configuration files are placed under the [`configs`](https://github.com/open-mmlab/mmpretrain/tree/main/configs) folder, and the directory structure is as follows:
If you wish to inspect the config file, you may run `python tools/misc/print_config.py /PATH/TO/CONFIG` to see the complete config.
This article mainly explains the structure of configuration files, and how to modify it based on the existing configuration files. We will take [ResNet50 config file](https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnet/resnet50_8xb32_in1k.py) as an example and explain it line by line.
## Config Structure
There are four kinds of basic component files in the `configs/_base_` folders, namely:
We call the config files in the `_base_` folder _primitive_ config files. You can easily build your training config file by inheriting some primitive config files.
For easy understanding, we use [ResNet50 config file](https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnet/resnet50_8xb32_in1k.py) as an example and comment on each line.
```python
_base_ = [                                    # This config file will inherit all config files in `_base_`.
    '../_base_/models/resnet50.py',           # model settings
    '../_base_/datasets/imagenet_bs32.py',    # data settings
    '../_base_/schedules/imagenet_bs256.py',  # schedule settings
    '../_base_/default_runtime.py',           # runtime settings
]
```
We will explain the four primitive config files separately below.
### Model settings
This primitive config file includes a dict variable `model`, which mainly includes information such as network structure and loss function:
- `type`: The type of model to build. We support several tasks.
  - For image classification tasks, it's usually `ImageClassifier`. You can find more details in the [API documentation](mmpretrain.models.classifiers).
  - For self-supervised learning, there are several `SelfSupervisors`, such as `MoCoV2`, `BEiT`, `MAE`, etc. You can find more details in the [API documentation](mmpretrain.models.selfsup).
  - For image retrieval tasks, it's usually `ImageToImageRetriever`. You can find more details in the [API documentation](mmpretrain.models.retrievers).
Usually, we use the **`type` field** to specify the class of the component and use other fields to pass
the initialization arguments of the class. The {external+mmengine:doc}`registry tutorial <advanced_tutorials/registry>` describes it in detail.
Here, we use the config fields of [`ImageClassifier`](mmpretrain.models.classifiers.ImageClassifier) as an example to
describe the initialization arguments as below:
- `backbone`: The settings of the backbone. The backbone is the main network to extract features of the inputs, like `ResNet`, `Swin Transformer`, `Vision Transformer`, etc. All available backbones can be found in the [API documentation](mmpretrain.models.backbones).
  - For self-supervised learning, some of the backbones are re-implemented; you can find more details in the [API documentation](mmpretrain.models.selfsup).
- `neck`: The settings of the neck. The neck is the intermediate module to connect the backbone and the head, like `GlobalAveragePooling`. All available necks can be found in the [API documentation](mmpretrain.models.necks).
- `head`: The settings of the task head. The head is the task-related component to do a specified task, like image classification or self-supervised training. All available heads can be found in the [API documentation](mmpretrain.models.heads).
  - `loss`: The loss function to optimize, like `CrossEntropyLoss`, `LabelSmoothLoss`, `PixelReconstructionLoss`, etc. All available losses can be found in the [API documentation](mmpretrain.models.losses).
- `data_preprocessor`: The component before the model forwarding to preprocess the inputs. See the [documentation](mmpretrain.models.utils.data_preprocessor) for more details.
- `train_cfg`: The extra settings of `ImageClassifier` during training. In `ImageClassifier`, we mainly use it to specify batch augmentation settings, like `Mixup` and `CutMix`. See the [documentation](mmpretrain.models.utils.batch_augments) for more details.
Following is the model primitive config of the ResNet50 config file in [`configs/_base_/models/resnet50.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/models/resnet50.py):
```python
model = dict(
    type='ImageClassifier',     # The type of the main model (here is for image classification task).
    backbone=dict(
        type='ResNet',          # The type of the backbone module.
        # All fields except `type` come from the __init__ method of class `ResNet`
        # and you can find them from https://mmpretrain.readthedocs.io/en/latest/api/generated/mmpretrain.models.backbones.ResNet.html
        depth=50,
        num_stages=4,
        out_indices=(3, ),
        frozen_stages=-1,
        style='pytorch'),
    neck=dict(type='GlobalAveragePooling'),    # The type of the neck module.
    head=dict(
        type='LinearClsHead',     # The type of the classification head module.
        # All fields except `type` come from the __init__ method of class `LinearClsHead`
        # and you can find them from https://mmpretrain.readthedocs.io/en/latest/api/generated/mmpretrain.models.heads.LinearClsHead.html
        num_classes=1000,
        in_channels=2048,
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
        topk=(1, 5),
    ))
```
### Data settings

This primitive config file includes information to construct the dataloader and evaluator:
- `data_preprocessor`: Model input preprocessing configuration, same as `model.data_preprocessor` but with lower priority.
- `train_evaluator | val_evaluator | test_evaluator`: To build the evaluator or metrics, refer to the [tutorial](mmpretrain.evaluation).
- `train_dataloader | val_dataloader | test_dataloader`: The settings of dataloaders
  - `batch_size`: The batch size of each GPU.
  - `num_workers`: The number of workers to fetch data of each GPU.
  - `sampler`: The settings of the sampler.
  - `persistent_workers`: Whether to keep the workers alive after finishing an epoch.
  - `dataset`: The settings of the dataset.
    - `type`: The type of the dataset; we support `CustomDataset`, `ImageNet` and many other datasets, refer to [documentation](mmpretrain.datasets).
    - `pipeline`: The data transform pipeline. You can find how to design a pipeline in [this tutorial](https://mmpretrain.readthedocs.io/en/latest/tutorials/data_pipeline.html).
Following is the data primitive config of the ResNet50 config in [`configs/_base_/datasets/imagenet_bs32.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/datasets/imagenet_bs32.py):
```python
dataset_type = 'ImageNet'
# preprocessing configuration
data_preprocessor = dict(
    # Input image data channels in 'RGB' order
    mean=[123.675, 116.28, 103.53],    # Input image normalized channel mean in RGB order
    std=[58.395, 57.12, 57.375],       # Input image normalized channel std in RGB order
    to_rgb=True,                       # Whether to flip the channel from BGR to RGB or RGB to BGR
)

train_pipeline = [
    dict(type='LoadImageFromFile'),     # read image
    dict(type='RandomResizedCrop', scale=224),     # Random scaling and cropping
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),   # random horizontal flip
    dict(type='PackInputs'),         # prepare images and labels
]

test_pipeline = [
    dict(type='LoadImageFromFile'),     # read image
    dict(type='ResizeEdge', scale=256, edge='short'),  # Scale the short side to 256
    dict(type='CenterCrop', crop_size=224),     # center crop
    dict(type='PackInputs'),                 # prepare images and labels
]

# Construct training set dataloader
train_dataloader = dict(
    batch_size=32,                     # batchsize per GPU
    num_workers=5,                     # Number of workers to fetch data per GPU
    dataset=dict(                      # training dataset
        type=dataset_type,
        data_root='data/imagenet',
        split='train',
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),   # default sampler
)

# Construct the validation set dataloader
val_dataloader = dict(
    batch_size=32,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root='data/imagenet',
        split='val',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
)

# The settings of the evaluation metrics for validation. We use the top1 and top5 accuracy here.
val_evaluator = dict(type='Accuracy', topk=(1, 5))

test_dataloader = val_dataloader  # The settings of the dataloader for the test dataset, which is the same as val_dataloader
test_evaluator = val_evaluator    # The settings of the evaluation metrics for test, which is the same as val_evaluator
```
```{note}
The data preprocessor can be defined either in the subfield of `model`, or using the standalone `data_preprocessor` definition here. If both exist, the `model.data_preprocessor` configuration is used.
```
### Schedule settings
This primitive config file mainly contains training strategy settings and the settings of training, val and
test loops:
- `optim_wrapper`: The settings of the optimizer wrapper. We use the optimizer wrapper to customize the optimization process.
  - `optimizer`: Supports all `pytorch` optimizers, refer to the relevant {external+mmengine:doc}`MMEngine documentation <tutorials/optim_wrapper>`.
  - `paramwise_cfg`: To set different optimization arguments according to the parameters' type or name, refer to the relevant [learning policy documentation](../advanced_guides/schedule.md).
  - `accumulative_counts`: Optimize parameters after several backward steps instead of one backward step. You can use it to simulate a large batch size with a small batch size.
- `param_scheduler`: Optimizer parameters policy. You can use it to specify learning rate and momentum curves during training. See the {external+mmengine:doc}`documentation <tutorials/param_scheduler>` in MMEngine for more details.
- `train_cfg | val_cfg | test_cfg`: The settings of the training, validation and test loops, refer to the relevant {external+mmengine:doc}`MMEngine documentation <design/runner>`.
Following is the schedule primitive config of the ResNet50 config in [`configs/_base_/schedules/imagenet_bs256.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/schedules/imagenet_bs256.py):

```python
optim_wrapper = dict(
    # Use SGD optimizer to optimize parameters.
    optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001))

# The tuning strategy of the learning rate.
# The 'MultiStepLR' means to use multiple steps policy to schedule the learning rate (LR).
param_scheduler = dict(
    type='MultiStepLR', by_epoch=True, milestones=[30, 60, 90], gamma=0.1)

# Training configuration: iterate 100 epochs, and perform validation after every training epoch.
train_cfg = dict(by_epoch=True, max_epochs=100, val_interval=1)
# Use the default val loop settings.
val_cfg = dict()
# Use the default test loop settings.
test_cfg = dict()

# This schedule is for the total batch size 256.
# If you use a different total batch size, like 512, and enable auto learning rate scaling,
# we will scale up the learning rate to 2 times.
auto_scale_lr = dict(base_batch_size=256)
```
### Runtime settings
This part mainly includes the checkpoint saving strategy, log configuration, training parameters, resuming checkpoint path, working directory, etc.
Here is the runtime primitive config file ['configs/_base_/default_runtime.py'](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/default_runtime.py) used by almost all configs:

```python
# whether to resume training from the loaded checkpoint
resume = False
```
## Inherit and Modify Config File
For easy understanding, we recommend contributors inherit from existing config files. But do not abuse the
inheritance. Usually, we recommend a maximum inheritance level of 3 for all config files.

For example, if your config file is based on ResNet with some other modifications, you can first inherit the
basic ResNet structure, dataset and other training settings by specifying `_base_ = './resnet50_8xb32_in1k.py'`
(a path relative to your config file), and then modify the necessary parameters. As a more specific example,
suppose we want to use almost all the configs in `configs/resnet/resnet50_8xb32_in1k.py`, but add the
`CutMix` train batch augment, change the number of training epochs from 100 to 300, modify when to decay
the learning rate, and modify the dataset path. We can create a new config file
`configs/resnet/resnet50_8xb32-300e_in1k.py` with the content below:
```python
# create this file under 'configs/resnet/' folder
_base_ = './resnet50_8xb32_in1k.py'

# using CutMix batch augment
model = dict(
    train_cfg=dict(
        augments=dict(type='CutMix', alpha=1.0)
    )
)

# trains more epochs
train_cfg = dict(max_epochs=300, val_interval=10)   # Train for 300 epochs, evaluate every 10 epochs
param_scheduler = dict(step=[150, 200, 250])   # The learning rate adjustment has also changed

# Use your own dataset directory
train_dataloader = dict(
    dataset=dict(data_root='mydata/imagenet/train'),
)
val_dataloader = dict(
    batch_size=64,    # No back-propagation during validation, larger batch size can be used
    dataset=dict(data_root='mydata/imagenet/val'),
)
test_dataloader = dict(
    batch_size=64,    # No back-propagation during test, larger batch size can be used
    dataset=dict(data_root='mydata/imagenet/val'),
)
```
### Use intermediate variables in configs
Some intermediate variables are used in the configuration file. The intermediate variables make the configuration file clearer and easier to modify.
For example, `train_pipeline` / `test_pipeline` is the intermediate variable of the data pipeline. We first need to define `train_pipeline` / `test_pipeline`, and then pass them to `train_dataloader` / `test_dataloader`. If you want to modify the size of the input image during training and testing, you need to modify the intermediate variables of `train_pipeline` / `test_pipeline`.
```python
bgr_mean = [103.53, 116.28, 123.675]  # mean in BGR order
```
Sometimes, you need to set `_delete_=True` to ignore some content in the basic configuration file. You can refer to the {external+mmengine:doc}`documentation in MMEngine <advanced_tutorials/config>` for more instructions.

The following is an example. If you want to use a cosine schedule in the above ResNet50 case, just inheriting and directly modifying it will report a `get unexpected keyword 'step'` error, because the `'step'` field of the basic config is reserved in the `param_scheduler` domain information, and you need to add `_delete_=True` to ignore the `param_scheduler` related fields in the basic configuration file:
Sometimes, you may refer to some fields in the `_base_` config, to avoid duplication of definitions. You can refer to {external+mmengine:doc}`MMEngine <advanced_tutorials/config>` for some more instructions.
The following is an example of using auto augment in the training data preprocessing pipeline, refer to [`configs/resnest/resnest50_32xb64_in1k.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnest/resnest50_32xb64_in1k.py). When defining `train_pipeline`, just add the definition file name of auto augment to `_base_`, and then use `_base_.auto_increasing_policies` to reference the variables in the primitive config:
When you use the script "tools/train.py" or "tools/test.py" to submit tasks, or use some other tools, you can directly modify the content of the configuration file by specifying the `--cfg-options` argument.
- Update config keys of dict chains.

  The config options can be specified following the order of the dict keys in the original config.
  For example, `--cfg-options model.backbone.norm_eval=False` changes all BN modules in model backbones to `train` mode.

- Update keys inside a list of configs.

  Some config dicts are composed as a list in your config. For example, the training pipeline `data.train.pipeline` is normally a list,
  e.g. `[dict(type='LoadImageFromFile'), dict(type='TopDownRandomFlip', flip_prob=0.5), ...]`. If you want to change `'flip_prob=0.5'` to `'flip_prob=0.0'` in the pipeline,
  you may specify `--cfg-options data.train.pipeline.1.flip_prob=0.0`.

- Update values of lists/tuples.

  For example, the config file normally sets `val_evaluator = dict(type='Accuracy', topk=(1, 5))`. If you want to change the field `topk`, you may specify `--cfg-options val_evaluator.topk="(1,3)"`. Note that the quotation mark " is necessary to support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.
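The dotted-key syntax above can be illustrated with a small sketch that walks a nested config, treating integer path segments as list indices. This is a simplified stand-in for the actual merging logic in MMEngine:

```python
def set_by_dotted_key(cfg, key, value):
    """Apply one `--cfg-options key=value` pair to a nested config dict,
    interpreting numeric path segments as list indices."""
    parts = key.split('.')
    node = cfg
    for part in parts[:-1]:
        node = node[int(part)] if part.isdigit() else node[part]
    last = parts[-1]
    if last.isdigit():
        node[int(last)] = value
    else:
        node[last] = value

cfg = {'data': {'train': {'pipeline': [
    {'type': 'LoadImageFromFile'},
    {'type': 'RandomFlip', 'flip_prob': 0.5},
]}}}
set_by_dotted_key(cfg, 'data.train.pipeline.1.flip_prob', 0.0)
print(cfg['data']['train']['pipeline'][1]['flip_prob'])  # 0.0
```

The real implementation additionally parses the value strings (numbers, booleans, quoted lists/tuples) before merging them.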
[`CustomDataset`](mmpretrain.datasets.CustomDataset) is a general dataset class for you to use your own datasets. To use `CustomDataset`, you need to organize your dataset files according to the following two formats:
### Subfolder Format
In this format, you only need to re-organize your dataset folder and place all samples in one folder without
creating any annotation files.
For supervised tasks (with `with_label=True`), we use the names of the sub-folders as the category names. As
shown in the example below, `class_x` and `class_y` will be recognized as the category names.
```text
data_prefix/
├── class_x
│   ├── xxx.png
│   ├── xxy.png
│   ├── ...
│   └── xxz.png
└── class_y
    ├── 123.png
    ├── nsdf3.png
    ├── ...
    └── asd932_.png
```
For unsupervised tasks (with `with_label=False`), we directly load all sample files under the specified folder:
```text
data_prefix/
├── folder_1
│   ├── xxx.png
│   ├── xxy.png
│   └── ...
├── 123.png
├── nsdf3.png
└── ...
```
Assume you want to use it as the training dataset; below are the configurations in your config file.
```python
train_dataloader = dict(
    ...
    # Training dataset configurations
    dataset=dict(
        type='CustomDataset',
        data_prefix='path/to/data_prefix',
        with_label=True,  # or False for unsupervised tasks
        pipeline=...,
    )
)
```
```{note}
If you want to use this format, do not specify `ann_file`, or specify `ann_file=''`.
Also note that the subfolder format requires scanning the folder, which may slow down initialization,
especially for large datasets or slow file IO.
```
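For illustration, the subfolder scanning described above can be sketched roughly as below (a simplified sketch, not MMPretrain's actual implementation; the function name is ours):

```python
import os


def scan_subfolder_dataset(data_prefix, with_label=True):
    """Sketch of subfolder-format scanning.

    With `with_label=True`, each sub-folder name becomes a category and its
    files become samples of that category; with `with_label=False`, every
    file under `data_prefix` is collected without labels.
    """
    if with_label:
        classes = sorted(
            d for d in os.listdir(data_prefix)
            if os.path.isdir(os.path.join(data_prefix, d)))
        samples = [(os.path.join(cls, f), label)
                   for label, cls in enumerate(classes)
                   for f in sorted(os.listdir(os.path.join(data_prefix, cls)))]
        return classes, samples
    # Unsupervised: collect every file, keeping paths relative to data_prefix.
    samples = []
    for root, _, files in os.walk(data_prefix):
        for f in sorted(files):
            samples.append(os.path.relpath(os.path.join(root, f), data_prefix))
    return None, samples
```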
### Text Annotation File Format
In this format, we use a text annotation file to store image file paths and the corresponding category
indices.
For supervised tasks (with `with_label=True`), each line of the annotation file should contain the file path
and the category index of one sample, separated by a space, as below.
All these file paths can be absolute paths, or paths relative to the `data_prefix`.
```text
folder_1/xxx.png 0
folder_1/xxy.png 1
123.png 4
nsdf3.png 3
...
```
```{note}
The index numbers of categories start from 0, and the value of ground-truth labels should fall in the range `[0, num_classes - 1]`.
In addition, please use the `classes` field in the dataset settings to specify the name of every category.
```
For unsupervised tasks (with `with_label=False`), each line of the annotation file only needs to contain the
file path of one sample, as below:
```text
folder_1/xxx.png
folder_1/xxy.png
123.png
nsdf3.png
...
```
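A sketch of how such annotation files could be parsed (illustrative only; the function name and signature are ours, not MMPretrain's API):

```python
def parse_ann_file(lines, with_label=True):
    """Parse lines of a text annotation file.

    Each non-empty line is either `path label` (supervised) or just `path`
    (unsupervised). We split on the last space so only the final field is
    treated as the label.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if with_label:
            img_path, label = line.rsplit(' ', 1)
            samples.append((img_path, int(label)))
        else:
            samples.append(line)
    return samples
```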
Assume the entire dataset folder is as below:
```text
data_root
├── meta
│ ├── test.txt # The annotation file for the test dataset
│ ├── train.txt # The annotation file for the training dataset
│ └── val.txt # The annotation file for the validation dataset.
├── train
│ ├── 123.png
│ ├── folder_1
│ │ ├── xxx.png
│ │ └── xxy.png
│ └── nsdf3.png
├── test
└── val
```
Here is an example of the dataset settings in a config file:
```python
# Training dataloader configurations
train_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root='path/to/data_root',  # The common prefix of both `ann_file` and `data_prefix`.
        ann_file='meta/train.txt',      # The path of the annotation file, relative to `data_root`.
        data_prefix='train',            # The prefix of file paths in the `ann_file`, relative to `data_root`.
        with_label=True,                # or False for unsupervised tasks
        classes=['A', 'B', 'C', 'D', ...],  # The name of every category.
        pipeline=...,                   # The transformations to process the dataset samples.
    )
    ...
)
```
```{note}
For a complete example about how to use the `CustomDataset`, please see [How to Pretrain with Custom Dataset](../notes/pretrain_custom_dataset.md).
```
## ImageNet
ImageNet has multiple versions, but the most commonly used one is [ILSVRC 2012](http://www.image-net.org/challenges/LSVRC/2012/). It can be accessed with the following steps.
`````{tabs}
````{group-tab} Download by MIM
MIM supports downloading the ImageNet dataset from [OpenXLab](https://openxlab.org.cn/datasets) and preprocessing it with a single command line.
_You need to register an account at the [OpenXLab official website](https://openxlab.org.cn/datasets) and log in by CLI._
```Bash
# install OpenXLab CLI tools
pip install -U openxlab
# log in to OpenXLab
openxlab login
# download and preprocess by MIM; better to execute in the $MMPreTrain directory
mim download mmpretrain --dataset imagenet1k
```
````
````{group-tab} Download from Official Source
1. Register an account and login to the [download page](http://www.image-net.org/download-images).
2. Find download links for ILSVRC2012 and download the following two files:
- ILSVRC2012_img_train.tar (~138GB)
- ILSVRC2012_img_val.tar (~6.3GB)
3. Untar the downloaded files
````
`````
### The Directory Structure of the ImageNet dataset
We support two ways of organizing the ImageNet dataset: Subfolder Format and Text Annotation File Format.
#### Subfolder Format
We have provided a sample, which you can download and extract from this [link](https://download.openmmlab.com/mmpretrain/datasets/imagenet_1k.zip). The directory structure of the dataset should be as below:
```text
data/imagenet/
├── train/
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   ├── n01440764_10027.JPEG
│   │   ├── n01440764_10029.JPEG
│   │   ├── n01440764_10040.JPEG
│   │   ├── n01440764_10042.JPEG
│   │   ├── n01440764_10043.JPEG
│   │   └── n01440764_10048.JPEG
│   ├── ...
├── val/
│   ├── n01440764
│   │   ├── ILSVRC2012_val_00000293.JPEG
│   │   ├── ILSVRC2012_val_00002138.JPEG
│   │   ├── ILSVRC2012_val_00003014.JPEG
│   │   └── ...
│   ├── ...
```
#### Text Annotation File Format
You can download and untar the meta data from this [link](https://download.openmmlab.com/mmclassification/datasets/imagenet/meta/caffe_ilsvrc12.tar.gz), and re-organize the dataset as below:
```text
data/imagenet/
├── meta/
│   ├── train.txt
│   ├── test.txt
│   └── val.txt
├── train/
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   ├── n01440764_10027.JPEG
│   │   ├── n01440764_10029.JPEG
│   │   ├── n01440764_10040.JPEG
│   │   ├── n01440764_10042.JPEG
│   │   ├── n01440764_10043.JPEG
│   │   └── n01440764_10048.JPEG
│   ├── ...
├── val/
│   ├── ILSVRC2012_val_00000001.JPEG
│   ├── ILSVRC2012_val_00000002.JPEG
│   ├── ILSVRC2012_val_00000003.JPEG
│   ├── ILSVRC2012_val_00000004.JPEG
│   ├── ...
```
### Configuration
Once your dataset is organized in the way described above, you can use the [`ImageNet`](mmpretrain.datasets.ImageNet) dataset with the below configurations:
Some dataset homepage links may be unavailable; in that case, you can download datasets through [OpenXLab](https://openxlab.org.cn/datasets) (e.g. [Stanford Cars](https://openxlab.org.cn/datasets/OpenDataLab/Stanford_Cars)) or [OpenDataLab](https://opendatalab.com/) (e.g. [RefCOCO](https://opendatalab.com/RefCOCO/download)).
## OpenMMLab 2.0 Standard Dataset
In order to facilitate the training of multi-task algorithm models, we unify the dataset interfaces of different tasks. OpenMMLab has formulated the **OpenMMLab 2.0 Dataset Format Specification**. When starting a training task, users can convert their dataset annotations into the specified format, and use the OpenMMLab algorithm libraries to perform training and testing based on the annotation file.
The OpenMMLab 2.0 Dataset Format Specification stipulates that the annotation file must be in `json`, `yaml`/`yml`, or `pickle`/`pkl` format. The dictionary stored in the annotation file must contain the `metainfo` and `data_list` fields. The value of `metainfo` is a dictionary containing the meta information of the dataset; the value of `data_list` is a list, and each element of the list is a dictionary defining one raw data item, which contains one or several training/testing samples.
The following is an example of a JSON annotation file (in this example each raw data contains only one train/test sample):
```
{
    'metainfo':
    {
        'classes': ('cat', 'dog'),  # the category index of 'cat' is 0 and 'dog' is 1.
        ...
    },
    'data_list':
    [
        {
            'img_path': "xxx/xxx_0.jpg",
            'gt_label': 0,
            ...
        },
        {
            'img_path': "xxx/xxx_1.jpg",
            'gt_label': 1,
            ...
        },
        ...
    ]
}
```
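As a hedged sketch, an annotation file in the text format shown earlier could be converted into this structure as follows (the converter function is ours, not part of OpenMMLab):

```python
def to_openmmlab2(ann_lines, classes):
    """Build an OpenMMLab 2.0 style annotation dict from `path label` lines.

    `metainfo` records the category names; each entry of `data_list`
    describes one sample. Illustrative sketch under stated assumptions.
    """
    data_list = []
    for line in ann_lines:
        img_path, label = line.strip().rsplit(' ', 1)
        data_list.append({'img_path': img_path, 'gt_label': int(label)})
    return {'metainfo': {'classes': list(classes)}, 'data_list': data_list}


ann = to_openmmlab2(['xxx/xxx_0.jpg 0', 'xxx/xxx_1.jpg 1'], ['cat', 'dog'])
```

The resulting dict can then be dumped with `json.dump` to produce an annotation file like the example above.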
Assume you want to use the training dataset and the dataset is stored as the below structure:
```text
data
├── annotations
│   ├── train.json
├── train
│   ├── xxx/xxx_0.jpg
│   ├── xxx/xxx_1.jpg
│   ├── ...
```
Then build the dataset from the following configuration:
```python
train_dataloader = dict(
    ...
    dataset=dict(
        type='BaseDataset',
        data_root='data',
        ann_file='annotations/train.json',
        data_prefix='train/',
        pipeline=...,
    )
)
```
## Other Datasets
To find more datasets supported by MMPretrain, and get more configurations of the above datasets, please see the [dataset documentation](mmpretrain.datasets).
To implement your own dataset class for some special formats, please see the [Adding New Dataset](../advanced_guides/datasets.md).
## Dataset Wrappers
The following dataset wrappers are supported by MMEngine; refer to the {external+mmengine:doc}`MMEngine tutorial <advanced_tutorials/basedataset>` to learn how to use them.
For detection tasks, please use MMDetection. First, make sure you have installed [MIM](https://github.com/open-mmlab/mim), which is also a project of OpenMMLab.
```shell
pip install openmim
mim install 'mmdet>=3.0.0rc0'
```
Besides, please refer to MMDetection for [installation](https://mmdetection.readthedocs.io/en/dev-3.x/get_started.html) and [data preparation](https://mmdetection.readthedocs.io/en/dev-3.x/user_guides/dataset_prepare.html).
### Train
After installation, you can run MMDetection with a simple command.
For semantic segmentation task, we use MMSegmentation. First, make sure you have installed [MIM](https://github.com/open-mmlab/mim), which is also a project of OpenMMLab.
```shell
pip install openmim
mim install 'mmsegmentation>=1.0.0rc0'
```
Besides, please refer to MMSegmentation for [installation](https://mmsegmentation.readthedocs.io/en/dev-1.x/get_started.html) and [data preparation](https://mmsegmentation.readthedocs.io/en/dev-1.x/user_guides/2_dataset_prepare.html).
### Train
After installation, you can run MMSegmentation with a simple command.
Here is an example of running inference on an [image](https://github.com/open-mmlab/mmpretrain/raw/main/demo/demo.JPEG) with the ResNet-50 pre-trained classification model.
Usually, the result for every sample is a dictionary. For example, the image classification result is a dictionary containing `pred_label`, `pred_score`, `pred_scores` and `pred_class` as follows:
We also provide a gradio demo for all supported tasks and you can find it in [projects/gradio_demo/launch.py](https://github.com/open-mmlab/mmpretrain/blob/main/projects/gradio_demo/launch.py).
Please install `gradio` with `pip install -U gradio` first.
By default, MMPretrain prefers GPU to CPU. If you want to test a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program.
| `CHECKPOINT_FILE` | The path to the checkpoint file (it can be an HTTP link; you can find checkpoints [here](https://mmpretrain.readthedocs.io/en/latest/modelzoo_statistics.html)). |
| `--work-dir WORK_DIR` | The directory to save the file containing evaluation metrics. |
| `--out OUT` | The path to save the file containing test results. |
| `--out-item OUT_ITEM` | To specify the content of the test results file, and it can be "pred" or "metrics". If "pred", save the outputs of the model for offline evaluation. If "metrics", save the evaluation metrics. Defaults to "pred". |
| `--cfg-options CFG_OPTIONS` | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either `key="[a,b]"` or `key=a,b`. The argument also allows nested list/tuple values, e.g. `key="[(a,b),(c,d)]"`. Note that the quotation marks are necessary and that no white space is allowed. |
| `--show-dir SHOW_DIR` | The directory to save the result visualization images. |
| `--show` | Visualize the prediction result in a window. |
| `--interval INTERVAL` | The interval of samples to visualize. |
| `--wait-time WAIT_TIME` | The display time of every window (in seconds). Defaults to 1. |
| `--no-pin-memory` | Whether to disable the `pin_memory` option in dataloaders. |
| `--tta` | Whether to enable the Test-Time-Aug (TTA). If the config file has `tta_pipeline` and `tta_model` fields, use them to determine the TTA transforms and how to merge the TTA results. Otherwise, use flip TTA by averaging classification score. |
| `--launcher {none,pytorch,slurm,mpi}` | Options for job launcher. |
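The flip-TTA fallback described for `--tta` (averaging the classification scores of the original and horizontally flipped inputs) can be sketched as below; this is an illustrative sketch, and the function name is ours:

```python
def merge_flip_tta(scores, flipped_scores):
    """Average per-class scores from the original and flipped views.

    `scores` and `flipped_scores` are sequences of per-class classification
    scores for the same sample. Sketch of the averaging idea only.
    """
    return [(a + b) / 2 for a, b in zip(scores, flipped_scores)]
```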
## Test with multiple GPUs
We provide a shell script to start a multi-GPU task with `torch.distributed.launch`.
| `CHECKPOINT_FILE` | The path to the checkpoint file (it can be an HTTP link; you can find checkpoints [here](https://mmpretrain.readthedocs.io/en/latest/modelzoo_statistics.html)). |
| `GPU_NUM` | The number of GPUs to be used. |
| `[PY_ARGS]` | The other optional arguments of `tools/test.py`, see [here](#test-with-your-pc). |
You can also specify extra arguments of the launcher by environment variables. For example, change the
communication port of the launcher to 29666 by the below command:
| `PARTITION` | The partition to use in your cluster. |
| `JOB_NAME` | The name of your job, you can name it as you like. |
| `CONFIG_FILE` | The path to the config file. |
| `CHECKPOINT_FILE` | The path to the checkpoint file (it can be an HTTP link; you can find checkpoints [here](https://mmpretrain.readthedocs.io/en/latest/modelzoo_statistics.html)). |
| `[PY_ARGS]` | The other optional arguments of `tools/test.py`, see [here](#test-with-your-pc). |
Here are the environment variables that can be used to configure the Slurm job.
In this tutorial, we will introduce how to use the scripts provided in MMPretrain to start a training task. If
you need them, we also have some practice examples about [how to pretrain with custom dataset](../notes/pretrain_custom_dataset.md)
and [how to finetune with custom dataset](../notes/finetune_custom_dataset.md).
## Train with your PC
You can use `tools/train.py` to train a model on a single machine with a CPU and optionally a GPU.
Here is the full usage of the script:
```shell
python tools/train.py ${CONFIG_FILE} [ARGS]
```
````{note}
By default, MMPretrain prefers GPU to CPU. If you want to train a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program.
| `--work-dir WORK_DIR` | The target folder to save logs and checkpoints. Defaults to a folder with the same name of the config file under `./work_dirs`. |
| `--resume [RESUME]` | Resume training. If a path is specified, resume from it; if not, try to auto-resume from the latest checkpoint. |
| `--auto-scale-lr` | Auto scale the learning rate according to the actual batch size and the original batch size. |
| `--no-pin-memory` | Whether to disable the `pin_memory` option in dataloaders. |
| `--no-persistent-workers` | Whether to disable the `persistent_workers` option in dataloaders. |
| `--cfg-options CFG_OPTIONS` | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either `key="[a,b]"` or `key=a,b`. The argument also allows nested list/tuple values, e.g. `key="[(a,b),(c,d)]"`. Note that the quotation marks are necessary and that no white space is allowed. |
| `--launcher {none,pytorch,slurm,mpi}` | Options for job launcher. |
## Train with multiple GPUs
We provide a shell script to start a multi-GPU task with `torch.distributed.launch`.