We provide a script adapted from [MMEngine](https://github.com/open-mmlab/mmengine/blob/main/mmengine/analysis/complexity_analysis.py) to compute the FLOPs and params of a given model.
- `--shape`: Input size; supports one or two values, such as `--shape 256` or `--shape 224 256`. If not set, defaults to `224 224`.
This tool is still experimental, and we do not guarantee that the number is correct. You may use the result for simple comparisons, but double-check it before adopting it in technical reports or papers.
- FLOPs are related to the input shape while parameters are not. The default input shape is (1, 3, 224, 224).
- Some operators, such as custom operators, are not counted in FLOPs. Refer to [`mmengine.analysis.complexity_analysis._DEFAULT_SUPPORTED_FLOP_OPS`](https://github.com/open-mmlab/mmengine/blob/main/mmengine/analysis/complexity_analysis.py) for details.
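As a rough cross-check of the reported numbers, the multiply-add count of a plain convolution can be estimated by hand with the usual formula. This is a simplified sketch; the `conv2d_flops_params` helper below is hypothetical and not part of the tool:

```python
def conv2d_flops_params(c_in, c_out, k, h_out, w_out):
    """Rough multiply-add and parameter counts for a plain Conv2d
    (no bias, no groups): params = k*k*c_in*c_out, FLOPs = params * H * W."""
    params = k * k * c_in * c_out
    flops = params * h_out * w_out
    return flops, params

# A 3x3 conv from 64 to 128 channels on a 56x56 output feature map:
flops, params = conv2d_flops_params(64, 128, 3, 56, 56)
print(flops, params)  # 231211008 73728
```

Such hand estimates are useful for sanity-checking the tool's per-layer breakdown, but remember that the tool may count (or skip) extra operators such as normalization and activation layers.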
MMPretrain provides the `tools/analysis_tools/confusion_matrix.py` tool to calculate and visualize the confusion matrix. For an introduction to the confusion matrix, see [link](https://en.wikipedia.org/wiki/Confusion_matrix).
## Command-line Usage
**Command**:
```shell
python tools/analysis_tools/confusion_matrix.py \
    ${CONFIG_FILE} \
    ${CHECKPOINT} \
    [--show] \
    [--show-path] \
    [--include-values] \
    [--cmap ${CMAP}] \
    [--cfg-options ${CFG_OPTIONS}]
```
**Description of all arguments**:
- `config`: The path of the model config file.
- `checkpoint`: The path of the checkpoint.
- `--show`: Whether to show the matplotlib visualization of the confusion matrix. Defaults to `False`.
- `--show-path`: If `show` is True, the path to save the visualized result.
- `--include-values`: Whether to add values to the visualization result.
- `--cmap`: The color map used for the visualization result. Defaults to `viridis`.
- `--cfg-options`: Modifications to the configuration file, refer to [Learn about Configs](../user_guides/config.md).
- `-o, --output-dir`: The output path for the visualized images. If not specified, it will be set to `''`, which means not to save.
- **`-p, --phase`**: Phase of the dataset to visualize, must be one of `['train', 'val', 'test']`. If not specified, it will be set to `'train'`.
- **`-n, --show-number`**: The number of samples to visualize. If not specified, display all images in the dataset.
- `--show-interval`: The display interval, in seconds.
- **`-m, --mode`**: The display mode, can be one of `['original', 'transformed', 'concat', 'pipeline']`. If not specified, it will be set to `'transformed'`.
- `-r, --rescale-factor`: The image rescale factor, which is useful if the output is too large or too small in the `original` mode.
- `-c, --channel-order`: The channel order of the displayed images, either "BGR" or "RGB". If not specified, it will be set to 'BGR'.
- `--cfg-options`: Modifications to the configuration file, refer to [Learn about Configs](../user_guides/config.md).
```{note}
1. The `-m, --mode` option sets the display mode: show the original pictures, the transformed pictures, or a comparison of both:
   - "original" means to show the images loaded from disk;
   - "transformed" means to show the images after being transformed;
   - "concat" means to show the "original" and "transformed" images stitched together;
   - "pipeline" means to show all the intermediate images throughout the pipeline.
2. The `-r, --rescale-factor` option is set when the label information is too large or too small relative to the picture. For example, when visualizing the CIFAR dataset, since the resolution of the images is very small, `--rescale-factor` can be set to 10.
```
## How to visualize the original image
In **'original'** mode:
```shell
python ./tools/visualization/browse_dataset.py ./configs/resnet/resnet101_8xb16_cifar10.py --phase val --output-dir tmp --mode original --show-number 100 --rescale-factor 10 --channel-order RGB
```
- `--phase val`: Visualize the validation set, can be simplified to `-p val`;
- `--output-dir tmp`: Save the visualization results in the "tmp" folder, can be simplified to `-o tmp`;
- `--mode original`: Visualize the original images, can be simplified to `-m original`;
- `--show-number 100`: Visualize 100 images, can be simplified to `-n 100`;
- `--rescale-factor 10`: Enlarge the images by 10 times, can be simplified to `-r 10`;
- `--channel-order RGB`: Set the channel order of the visualized images to "RGB", can be simplified to `-c RGB`.
### How to plot the confusion matrix for the test result
We provide `tools/analysis_tools/confusion_matrix.py` to enable users to plot the confusion matrix from prediction files.
```shell
python tools/analysis_tools/confusion_matrix.py \
    ${CONFIG} \
    ${RESULT} \
    [--out ${OUT}] \
    [--show] \
    [--show-path ${SHOW_PATH}] \
    [--include-values] \
    [--cmap ${CMAP}] \
    [--cfg-options ${CFG_OPTIONS} ...]
```
Description of all arguments:
- `config`: The config file path.
- `result`: The output result file in pickle format from `tools/test.py`, or a checkpoint file.
- `--out`: The path to save the confusion matrix in pickle format.
- `--show`: Whether to show the confusion matrix plot.
- `--show-path`: The path to save the confusion matrix plot.
- `--include-values`: Whether to show the values in the confusion matrix plot.
- `--cmap`: The color map to plot the confusion matrix.
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file. For more details, please refer to [Learn about Configs](../user_guides/config.md).
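Conceptually, what gets plotted is just a per-class tally of predictions against ground-truth labels. A minimal pure-Python sketch of that computation (a hypothetical helper, not the tool's code):

```python
def confusion_matrix(gt_labels, pred_labels, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for gt, pred in zip(gt_labels, pred_labels):
        matrix[gt][pred] += 1
    return matrix

# Three classes, six samples; off-diagonal entries are misclassifications.
print(confusion_matrix([0, 0, 1, 2, 2, 2], [0, 1, 1, 2, 0, 2], 3))
# [[1, 1, 0], [0, 1, 0], [1, 0, 2]]
```

The tool adds the dataset metadata, normalization and matplotlib rendering on top of this basic counting.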
```{note}
In `tools/test.py`, we support using `--out-item` option to select which kind of results will be saved.
Please ensure the `--out-item` is not specified or `--out-item=pred` to use this tool.
```
We can use `tools/analysis_tools/analyze_results.py` to save the images with the highest scores in successful or failed prediction.
```shell
python tools/analysis_tools/analyze_results.py \
    ${CONFIG} \
    ${RESULT} \
    [--out-dir ${OUT_DIR}] \
    [--topk ${TOPK}] \
    [--rescale-factor ${RESCALE_FACTOR}] \
    [--cfg-options ${CFG_OPTIONS}]
```
**Description of all arguments**:
- `config`: The path of the model config file.
- `result`: Output result file in json/pickle format from `tools/test.py`.
- `--out-dir`: Directory to store output files.
- `--topk`: The number of images in successful or failed predictions with the highest `topk` scores to save. If not specified, it will be set to 20.
- `--rescale-factor`: Image rescale factor, which is useful if the output is too large or too small (too small images may make the prediction annotations illegible).
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file. For more details, please refer to [Learn about Configs](../user_guides/config.md).
```{note}
In `tools/test.py`, we support using `--out-item` option to select which kind of results will be saved.
Please ensure the `--out-item` is not specified or `--out-item=pred` to use this tool.
```

```{note}
Check the official docs for [running TorchServe with docker](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment).

In order to run on GPU, you need to install [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). You can omit the `--gpus` argument in order to run on CPU.

`realpath ./checkpoints` points to the absolute path of "./checkpoints", and you can replace it with the absolute path where you store your torchserve models.
```
[Read the docs](https://github.com/pytorch/serve/blob/master/docs/rest_api.md) about the Inference (8080), Management (8081) and Metrics (8082) APIs.
This tool aims to help the user check the hyper-parameter scheduler of the optimizer (without training). It supports visualizing the "learning rate" and "momentum" curves.
## Introduce the scheduler visualization tool
```bash
python tools/visualization/vis_scheduler.py \
    ${CONFIG_FILE} \
    [-p, --parameter ${PARAMETER_NAME}] \
    [-d, --dataset-size ${DATASET_SIZE}] \
    [-n, --ngpus ${NUM_GPUs}] \
    [-s, --save-path ${SAVE_PATH}] \
    [--title ${TITLE}] \
    [--style ${STYLE}] \
    [--window-size ${WINDOW_SIZE}] \
    [--cfg-options]
```
**Description of all arguments**:
- `config`: The path of a model config file.
- **`-p, --parameter`**: The parameter whose change curve to visualize, chosen from "lr" and "momentum". Defaults to "lr".
- **`-d, --dataset-size`**: The size of the dataset. If set, `build_dataset` will be skipped and `${DATASET_SIZE}` will be used as the size. Defaults to using the function `build_dataset`.
- **`-n, --ngpus`**: The number of GPUs used in training. Defaults to 1.
- **`-s, --save-path`**: The save path of the learning rate curve plot. Defaults to not saving.
- `--title`: Title of the figure. If not set, defaults to the config file name.
- `--style`: Style of the plot. If not set, defaults to `whitegrid`.
- `--window-size`: The shape of the display window. If not specified, it will be set to `12*7`. If used, it must be in the format `'W*H'`.
- `--cfg-options`: Modifications to the configuration file, refer to [Learn about Configs](../user_guides/config.md).
```{note}
Loading annotations may consume much time; you can directly specify the size of the dataset with `-d, --dataset-size` to save time.
```
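To see what the tool is plotting, a step ("MultiStepLR"-style) schedule can be simulated in a few lines. This is a toy sketch assuming epoch-based decay; `step_lr_by_epoch` is a hypothetical helper, not the scheduler implementation:

```python
def step_lr_by_epoch(base_lr, milestones, gamma, max_epochs):
    """Multiply the learning rate by `gamma` at each milestone epoch
    and record the value used in every epoch."""
    lrs, lr = [], base_lr
    for epoch in range(max_epochs):
        if epoch in milestones:
            lr *= gamma
        lrs.append(lr)
    return lrs

# lr starts at 0.1 and decays by 10x at epochs 3 and 6.
lrs = step_lr_by_epoch(0.1, [3, 6], 0.1, 8)
print(lrs)
```

The real tool reads these decay points from `param_scheduler` in the config and draws the resulting curve with matplotlib.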
## How to plot the learning rate curve without training
You can use the following command to plot the step learning rate schedule used in the config `configs/swin_transformer/swin-base_16xb64_in1k.py`:
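Based on the arguments documented above, the command should look along these lines (the `--dataset-size` value here is an assumption corresponding to the ImageNet training-set size of 1,281,167 images):

```shell
python tools/visualization/vis_scheduler.py \
    configs/swin_transformer/swin-base_16xb64_in1k.py \
    --dataset-size 1281167 \
    --ngpus 16
```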
Shape bias measures how much a model relies on shapes, compared to textures, to recognize the semantics in images. For more details,
we recommend interested readers refer to this [paper](https://arxiv.org/abs/2106.07411). MMPretrain provides an off-the-shelf toolbox to
obtain the shape bias of a classification model. You can follow the steps below:
## Prepare the dataset
First, you should download the [cue-conflict](https://github.com/bethgelab/model-vs-human/releases/download/v0.1/cue-conflict.tar.gz) dataset to the `data` folder,
and then unzip it. After that, your `data` folder should have the following structure:
```text
data
├── cue-conflict
│   ├── airplane
│   ├── bear
│   ├── ...
│   └── truck
```
## Modify the config for classification
We run the shape-bias tool on a ViT-base model with masked autoencoder pretraining. Its config file is `configs/mae/benchmarks/vit-base-p16_8xb128-coslr-100e_in1k.py`, and its checkpoint is downloaded from [this link](https://download.openmmlab.com/mmselfsup/1.x/mae/mae_vit-base-p16_8xb512-fp16-coslr-1600e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k_20220825-cf70aa21.pth). Replace the original test_pipeline, test_dataloader and test_evaluation with the following configurations:
Please note that you should make custom modifications to the `csv_dir` and `model_name` above. Here, the modified sample config file is renamed `vit-base-p16_8xb128-coslr-100e_in1k_shape-bias.py` in the folder `configs/mae/benchmarks/`.
## Inference your model with the above modified config file
Then you should run inference with your model on the `cue-conflict` dataset using your modified config file.
```shell
# For PyTorch
bash tools/dist_test.sh $CONFIG $CHECKPOINT
```
**Description of all arguments**:
- `$CONFIG`: The path of your modified config file.
- `$CHECKPOINT`: The path or link of the checkpoint file.
After that, you should obtain a csv file in the `csv_dir` folder named `cue-conflict_model-name_session-1.csv`. Besides this file, you should also download these [csv files](https://github.com/bethgelab/model-vs-human/tree/master/raw-data/cue-conflict) to the
`csv_dir`.
## Plot shape bias
Then we can start to plot the shape bias:
```shell
python tools/analysis_tools/shape_bias.py --csv-dir $CSV_DIR --result-dir $RESULT_DIR --colors $RGB --markers o --plotting-names $YOUR_MODEL_NAME --model-names $YOUR_MODEL_NAME
```
**Description of all arguments**:
- `--csv-dir $CSV_DIR`: The directory where these csv files are saved.
- `--result-dir $RESULT_DIR`: The directory to output the result named `cue-conflict_shape-bias_matrixplot.pdf`.
- `--colors $RGB`: The RGB values, formatted as R G B, e.g. 100 100 100. It can be multiple RGB values if you want to plot the shape bias of several models.
- `--plotting-names $YOUR_MODEL_NAME`: The name of the legend in the shape bias figure; you can set it to your model name. If you want to plot several models, `plotting_names` can be multiple values.
- `--model-names $YOUR_MODEL_NAME`: Should be the same name specified in your config; it can be multiple names if you want to plot the shape bias of several models.

Please note that every three values for `--colors` correspond to one value for `--model-names`. After all of the above steps, you are expected to obtain the following figure.
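For intuition, the shape-bias metric itself reduces to the fraction of shape-based decisions among all shape-or-texture decisions on cue-conflict images. A simplified sketch (a hypothetical helper, not the toolbox's code):

```python
def shape_bias(decisions):
    """`decisions` lists the model's call ('shape', 'texture', or 'neither')
    on each cue-conflict image; shape bias is the share of shape decisions
    among all shape-or-texture decisions."""
    shape = decisions.count('shape')
    texture = decisions.count('texture')
    return shape / (shape + texture)

print(shape_bias(['shape', 'texture', 'shape', 'neither', 'shape']))  # 0.75
```

The plotting script computes this per category from the csv files and places each model on the shape-vs-texture axis of the matrix plot.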
MMPretrain provides the `tools/visualization/vis_tsne.py` tool to visualize the feature embeddings of images by t-SNE. Please install `scikit-learn` (`pip install scikit-learn`) to calculate t-SNE.
- `--checkpoint CHECKPOINT`: The path of the checkpoint file.
- `--work-dir WORK_DIR`: The directory to save logs and visualization images.
- `--test-cfg TEST_CFG`: The path of the t-SNE config file to load the config of the test dataloader.
- `--vis-stage {backbone,neck,pre_logits}`: The visualization stage of the model.
- `--class-idx CLASS_IDX [CLASS_IDX ...]`: The categories used to calculate t-SNE.
- `--max-num-class MAX_NUM_CLASS`: The first N categories to apply t-SNE algorithms. Defaults to 20.
- `--max-num-samples MAX_NUM_SAMPLES`: The maximum number of samples per category. Larger numbers take longer to calculate. Defaults to 100.
- `--cfg-options CFG_OPTIONS [CFG_OPTIONS ...]`: Override some settings in the used config. The key-value pairs in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be like key="[a,b]" or key=a,b. It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]". Note that the quotation marks are necessary and that no white space is allowed.
- `--device DEVICE`: Device used for inference.
- `--legend`: Show the legend of all categories.
- `--show`: Display the result in a graphical window.
- `--n-components N_COMPONENTS`: The dimension of the results.
- `--perplexity PERPLEXITY`: The perplexity is related to the number of nearest neighbors that are used in other manifold learning algorithms.
- `--early-exaggeration EARLY_EXAGGERATION`: Controls how tight natural clusters in the original space are in the embedded space and how much space will be between them.
- `--learning-rate LEARNING_RATE`: The learning rate for t-SNE is usually in the range [10.0, 1000.0]. If the learning rate is too high, the data may look like a ball with any point approximately equidistant from its nearest neighbours. If the learning rate is too low, most points may look compressed in a dense cloud with few outliers.
- `--n-iter N_ITER`: Maximum number of iterations for the optimization. Should be at least 250.
- `--n-iter-without-progress N_ITER_WITHOUT_PROGRESS`: Maximum number of iterations without progress before we abort the optimization.
- `--init INIT`: The initialization method.
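Under the hood, the tool hands the extracted features to scikit-learn's `TSNE`; that step can be reproduced standalone, with random features as a stand-in for backbone outputs:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
feats = rng.normal(size=(30, 64))  # stand-in for 30 backbone feature vectors

# The CLI options above map directly onto these TSNE parameters.
emb = TSNE(n_components=2, perplexity=5.0, init='pca',
           learning_rate=200.0, random_state=0).fit_transform(feats)
print(emb.shape)  # (30, 2)
```

Note that `perplexity` must be smaller than the number of samples, which is why the tool caps the number of classes and samples per class.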
## How to visualize the t-SNE of an image classifier (such as ResNet)
Here are two examples of running t-SNE visualization on ResNet-18 and ResNet-50 models, both trained on the CIFAR-10 dataset:
## How to visualize the t-SNE of a self-supervised model (such as MAE)
Here is an example of running t-SNE visualization on an MAE-ViT-base model, trained on the ImageNet dataset. The input data is from the ImageNet validation set. MAE and some other self-supervised pre-training algorithms do not have test_dataloader information. When analyzing such self-supervised algorithms, you need to add test_dataloader information in the config, or you can use the `--test-cfg` argument to specify a config file.
In MMPretrain, we also provide a tool `tools/misc/verify_dataset.py` to check whether there are any **broken pictures** in the given dataset.
## Introduce the tool
```shell
python tools/misc/verify_dataset.py \
    ${CONFIG} \
    [--out-path ${OUT_PATH}] \
    [--phase ${PHASE}] \
    [--num-process ${NUM_PROCESS}] \
    [--cfg-options ${CFG_OPTIONS}]
```
**Description of all arguments**:
- `config`: The path of the model config file.
- `--out-path`: The path to save the verification result. If not set, defaults to 'brokenfiles.log'.
- `--phase`: Phase of the dataset to verify, accepts "train", "test" and "val". If not set, defaults to "train".
- `--num-process`: The number of processes to use. If not set, defaults to 1.
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file. For more details, please refer to [Learn about Configs](../user_guides/config.md).
## Example
```shell
python tools/misc/verify_dataset.py configs/t2t_vit/t2t-vit-t-14_8xb64_in1k.py --out-path broken_imgs.log --phase val --num-process 8
```
To manage various configurations in a deep-learning experiment, we use a kind of config file to record all of
these configurations. This config system has a modular and inheritance design, and more details can be found in
{external+mmengine:doc}`the tutorial in MMEngine <advanced_tutorials/config>`.
Usually, we use python files as config file. All configuration files are placed under the [`configs`](https://github.com/open-mmlab/mmpretrain/tree/main/configs) folder, and the directory structure is as follows:
If you wish to inspect the config file, you may run `python tools/misc/print_config.py /PATH/TO/CONFIG` to see the complete config.
This article mainly explains the structure of configuration files, and how to modify it based on the existing configuration files. We will take [ResNet50 config file](https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnet/resnet50_8xb32_in1k.py) as an example and explain it line by line.
## Config Structure
There are four kinds of basic component files in the `configs/_base_` folders, namely:
We call the config files in the `_base_` folder _primitive_ config files. You can easily build your training config file by inheriting some primitive config files.
For easy understanding, we use [ResNet50 config file](https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnet/resnet50_8xb32_in1k.py) as an example and comment on each line.
```python
_base_ = [                                    # This config file will inherit all config files in `_base_`.
    '../_base_/models/resnet50.py',           # model settings
    '../_base_/datasets/imagenet_bs32.py',    # data settings
    '../_base_/schedules/imagenet_bs256.py',  # schedule settings
    '../_base_/default_runtime.py',           # runtime settings
]
```
We will explain the four primitive config files separately below.
### Model settings
This primitive config file includes a dict variable `model`, which mainly includes information such as network structure and loss function:
- `type`: The type of model to build. We support several tasks.
  - For image classification tasks, it's usually `ImageClassifier`. You can find more details in the [API documentation](mmpretrain.models.classifiers).
  - For self-supervised learning, there are several `SelfSupervisors`, such as `MoCoV2`, `BEiT`, `MAE`, etc. You can find more details in the [API documentation](mmpretrain.models.selfsup).
  - For image retrieval tasks, it's usually `ImageToImageRetriever`. You can find more details in the [API documentation](mmpretrain.models.retrievers).
Usually, we use the **`type` field** to specify the class of the component and use other fields to pass
the initialization arguments of the class. The {external+mmengine:doc}`registry tutorial <advanced_tutorials/registry>` describes it in detail.
Here, we use the config fields of [`ImageClassifier`](mmpretrain.models.classifiers.ImageClassifier) as an example to
describe the initialization arguments as below:
- `backbone`: The settings of the backbone. The backbone is the main network to extract features of the inputs, like `ResNet`, `Swin Transformer`, `Vision Transformer`, etc. All available backbones can be found in the [API documentation](mmpretrain.models.backbones).
  - For self-supervised learning, some of the backbones are re-implemented; you can find more details in the [API documentation](mmpretrain.models.selfsup).
- `neck`: The settings of the neck. The neck is the intermediate module to connect the backbone and the head, like `GlobalAveragePooling`. All available necks can be found in the [API documentation](mmpretrain.models.necks).
- `head`: The settings of the task head. The head is the task-related component to do a specified task, like image classification or self-supervised training. All available heads can be found in the [API documentation](mmpretrain.models.heads).
  - `loss`: The loss function to optimize, like `CrossEntropyLoss`, `LabelSmoothLoss`, `PixelReconstructionLoss`, etc. All available losses can be found in the [API documentation](mmpretrain.models.losses).
- `data_preprocessor`: The component before the model forwarding to preprocess the inputs. See the [documentation](mmpretrain.models.utils.data_preprocessor) for more details.
- `train_cfg`: The extra settings of `ImageClassifier` during training. In `ImageClassifier`, we mainly use it to specify batch augmentation settings, like `Mixup` and `CutMix`. See the [documentation](mmpretrain.models.utils.batch_augments) for more details.
Following is the model primitive config of the ResNet50 config file in [`configs/_base_/models/resnet50.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/models/resnet50.py):
```python
model = dict(
    type='ImageClassifier',     # The type of the main model (here is for image classification task).
    backbone=dict(
        type='ResNet',          # The type of the backbone module.
        # All fields except `type` come from the __init__ method of class `ResNet`
        # and you can find them from https://mmpretrain.readthedocs.io/en/latest/api/generated/mmpretrain.models.backbones.ResNet.html
        depth=50,
        num_stages=4,
        out_indices=(3, ),
        frozen_stages=-1,
        style='pytorch'),
    neck=dict(type='GlobalAveragePooling'),    # The type of the neck module.
    head=dict(
        type='LinearClsHead',     # The type of the classification head module.
        # All fields except `type` come from the __init__ method of class `LinearClsHead`
        # and you can find them from https://mmpretrain.readthedocs.io/en/latest/api/generated/mmpretrain.models.heads.LinearClsHead.html
        num_classes=1000,
        in_channels=2048,
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
        topk=(1, 5),
    ))
```
### Data settings

This primitive config file includes information to construct the dataloader and evaluator:
- `data_preprocessor`: Model input preprocessing configuration, same as `model.data_preprocessor` but with lower priority.
- `train_evaluator | val_evaluator | test_evaluator`: To build the evaluator or metrics, refer to the [tutorial](mmpretrain.evaluation).
- `train_dataloader | val_dataloader | test_dataloader`: The settings of dataloaders
  - `batch_size`: The batch size of each GPU.
  - `num_workers`: The number of workers to fetch data of each GPU.
  - `sampler`: The settings of the sampler.
  - `persistent_workers`: Whether to keep the workers alive after finishing an epoch.
  - `dataset`: The settings of the dataset.
    - `type`: The type of the dataset; we support `CustomDataset`, `ImageNet` and many other datasets, refer to [documentation](mmpretrain.datasets).
    - `pipeline`: The data transform pipeline. You can find how to design a pipeline in [this tutorial](https://mmpretrain.readthedocs.io/en/latest/tutorials/data_pipeline.html).
Following is the data primitive config of the ResNet50 config in [`configs/_base_/datasets/imagenet_bs32.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/datasets/imagenet_bs32.py):
```python
dataset_type = 'ImageNet'
# preprocessing configuration
data_preprocessor = dict(
    # Input image data channels in 'RGB' order
    mean=[123.675, 116.28, 103.53],    # Input image normalized channel mean in RGB order
    std=[58.395, 57.12, 57.375],       # Input image normalized channel std in RGB order
    to_rgb=True,                       # Whether to flip the channel from BGR to RGB or RGB to BGR
)

train_pipeline = [
    dict(type='LoadImageFromFile'),     # read image
    dict(type='RandomResizedCrop', scale=224),     # Random scaling and cropping
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),   # random horizontal flip
    dict(type='PackInputs'),         # prepare images and labels
]

test_pipeline = [
    dict(type='LoadImageFromFile'),     # read image
    dict(type='ResizeEdge', scale=256, edge='short'),  # Scale the short side to 256
    dict(type='CenterCrop', crop_size=224),     # center crop
    dict(type='PackInputs'),                 # prepare images and labels
]

# Construct training set dataloader
train_dataloader = dict(
    batch_size=32,                     # batchsize per GPU
    num_workers=5,                     # Number of workers to fetch data per GPU
    dataset=dict(                      # training dataset
        type=dataset_type,
        data_root='data/imagenet',
        split='train',
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),   # default sampler
)

# Construct the validation set dataloader
val_dataloader = dict(
    batch_size=32,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root='data/imagenet',
        split='val',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
)

# The settings of the evaluation metrics for validation. We use the top1 and top5 accuracy here.
val_evaluator = dict(type='Accuracy', topk=(1, 5))

test_dataloader = val_dataloader  # The settings of the dataloader for the test dataset, which is the same as val_dataloader
test_evaluator = val_evaluator    # The settings of the evaluation metrics for test, which is the same as val_evaluator
```
```{note}
The data preprocessor can be defined either in the subfield of `model`, or using the standalone `data_preprocessor` definition here. If both exist, the `model.data_preprocessor` configuration is used.
```
### Schedule settings
This primitive config file mainly contains training strategy settings and the settings of training, val and
test loops:
- `optim_wrapper`: The settings of the optimizer wrapper. We use the optimizer wrapper to customize the optimization process.
  - `optimizer`: Supports all `pytorch` optimizers, refer to the relevant {external+mmengine:doc}`MMEngine documentation <tutorials/optim_wrapper>`.
  - `paramwise_cfg`: To set different optimization arguments according to the parameters' type or name, refer to the relevant [learning policy documentation](../advanced_guides/schedule.md).
  - `accumulative_counts`: Optimize parameters after several backward steps instead of one backward step. You can use it to simulate a large batch size with a small batch size.
- `param_scheduler`: Optimizer parameters policy. You can use it to specify learning rate and momentum curves during training. See the {external+mmengine:doc}`documentation <tutorials/param_scheduler>` in MMEngine for more details.
- `train_cfg | val_cfg | test_cfg`: The settings of the training, validation and test loops, refer to the relevant {external+mmengine:doc}`MMEngine documentation <design/runner>`.
Following is the schedule primitive config of the ResNet50 config in [`configs/_base_/schedules/imagenet_bs256.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/schedules/imagenet_bs256.py):

```python
optim_wrapper = dict(
    # Use SGD optimizer to optimize parameters.
    optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001))

# The tuning strategy of the learning rate.
# The 'MultiStepLR' means to use multiple steps policy to schedule the learning rate (LR).
param_scheduler = dict(
    type='MultiStepLR', by_epoch=True, milestones=[30, 60, 90], gamma=0.1)

# Training configuration: iterate 100 epochs, and perform validation after every training epoch.
train_cfg = dict(by_epoch=True, max_epochs=100, val_interval=1)
# Use the default val loop settings.
val_cfg = dict()
# Use the default test loop settings.
test_cfg = dict()

# This schedule is for the total batch size 256.
# If you use a different total batch size, like 512, and enable auto learning rate scaling,
# we will scale up the learning rate to 2 times.
auto_scale_lr = dict(base_batch_size=256)
```
### Runtime settings
This part mainly includes the checkpoint saving strategy, log configuration, training parameters, resuming checkpoint path, working directory, etc.
Here is the runtime primitive config file ['configs/_base_/default_runtime.py'](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/default_runtime.py) used by almost all configs:

```python
# whether to resume training from the loaded checkpoint
resume = False
```
## Inherit and Modify Config File
For easy understanding, we recommend contributors inherit from existing config files. But do not abuse the
inheritance. Usually, we recommend a maximum inheritance level of 3 for all config files.

For example, if your config file is based on ResNet with some other modifications, you can first inherit the
basic ResNet structure, dataset and other training settings by specifying `_base_ = './resnet50_8xb32_in1k.py'`
(a path relative to your config file), and then modify the necessary parameters. As a more specific example,
suppose we want to use almost all the configs in `configs/resnet/resnet50_8xb32_in1k.py`, but add the
`CutMix` train batch augment, change the number of training epochs from 100 to 300, modify when to decay
the learning rate, and modify the dataset path. We can create a new config file
`configs/resnet/resnet50_8xb32-300e_in1k.py` with the content below:
```python
# create this file under 'configs/resnet/' folder
_base_ = './resnet50_8xb32_in1k.py'

# using CutMix batch augment
model = dict(
    train_cfg=dict(
        augments=dict(type='CutMix', alpha=1.0)
    )
)

# trains more epochs
train_cfg = dict(max_epochs=300, val_interval=10)   # Train for 300 epochs, evaluate every 10 epochs
param_scheduler = dict(step=[150, 200, 250])   # The learning rate adjustment has also changed

# Use your own dataset directory
train_dataloader = dict(
    dataset=dict(data_root='mydata/imagenet/train'),
)
val_dataloader = dict(
    batch_size=64,    # No back-propagation during validation, larger batch size can be used
    dataset=dict(data_root='mydata/imagenet/val'),
)
test_dataloader = dict(
    batch_size=64,    # No back-propagation during test, larger batch size can be used
    dataset=dict(data_root='mydata/imagenet/val'),
)
```
### Use intermediate variables in configs
Some intermediate variables are used in the configuration file. The intermediate variables make the configuration file clearer and easier to modify.
For example, `train_pipeline` / `test_pipeline` is the intermediate variable of the data pipeline. We first need to define `train_pipeline` / `test_pipeline`, and then pass them to `train_dataloader` / `test_dataloader`. If you want to modify the size of the input image during training and testing, you need to modify the intermediate variables of `train_pipeline` / `test_pipeline`.
```python
bgr_mean = [103.53, 116.28, 123.675]  # mean in BGR order
```
Sometimes, you need to set `_delete_=True` to ignore some content in the basic configuration file. You can refer to the {external+mmengine:doc}`documentation in MMEngine <advanced_tutorials/config>` for more instructions.

The following is an example. If you want to use a cosine schedule in the above ResNet50 case, just inheriting and directly modifying it will report a `get unexpected keyword 'step'` error, because the `'step'` field of the basic config is reserved in the `param_scheduler` domain information, and you need to add `_delete_=True` to ignore the `param_scheduler` related fields in the basic configuration file:
Sometimes, you may refer to some fields in the `_base_` config, to avoid duplication of definitions. You can refer to {external+mmengine:doc}`MMEngine <advanced_tutorials/config>` for some more instructions.
The following is an example of using auto augment in the training data preprocessing pipeline, refer to [`configs/resnest/resnest50_32xb64_in1k.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnest/resnest50_32xb64_in1k.py). When defining `train_pipeline`, just add the definition file name of auto augment to `_base_`, and then use `_base_.auto_increasing_policies` to reference the variables in the primitive config:
When you use the script "tools/train.py" or "tools/test.py" to submit tasks, or use some other tools, you can directly modify the content of the configuration file by specifying the `--cfg-options` argument.
- Update config keys of dict chains.

  The config options can be specified following the order of the dict keys in the original config.
  For example, `--cfg-options model.backbone.norm_eval=False` changes all BN modules in model backbones to `train` mode.

- Update keys inside a list of configs.

  Some config dicts are composed as a list in your config. For example, the training pipeline `data.train.pipeline` is normally a list,
  e.g. `[dict(type='LoadImageFromFile'), dict(type='TopDownRandomFlip', flip_prob=0.5), ...]`. If you want to change `'flip_prob=0.5'` to `'flip_prob=0.0'` in the pipeline,
  you may specify `--cfg-options data.train.pipeline.1.flip_prob=0.0`.

- Update values of lists/tuples.

  For example, the config file normally sets `val_evaluator = dict(type='Accuracy', topk=(1, 5))`. If you want to change the field `topk`, you may specify `--cfg-options val_evaluator.topk="(1,3)"`. Note that the quotation mark " is necessary to support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.
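The dotted-key syntax above can be illustrated with a small sketch that walks a nested config, treating integer path segments as list indices. This is a simplified stand-in for the actual merging logic in MMEngine:

```python
def set_by_dotted_key(cfg, key, value):
    """Apply one `--cfg-options key=value` pair to a nested config dict,
    interpreting numeric path segments as list indices."""
    parts = key.split('.')
    node = cfg
    for part in parts[:-1]:
        node = node[int(part)] if part.isdigit() else node[part]
    last = parts[-1]
    if last.isdigit():
        node[int(last)] = value
    else:
        node[last] = value

cfg = {'data': {'train': {'pipeline': [
    {'type': 'LoadImageFromFile'},
    {'type': 'RandomFlip', 'flip_prob': 0.5},
]}}}
set_by_dotted_key(cfg, 'data.train.pipeline.1.flip_prob', 0.0)
print(cfg['data']['train']['pipeline'][1]['flip_prob'])  # 0.0
```

The real implementation additionally parses the value strings (numbers, booleans, quoted lists/tuples) before merging them.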
[`CustomDataset`](mmpretrain.datasets.CustomDataset) is a general dataset class for you to use your own datasets. To use `CustomDataset`, you need to organize your dataset files according to the following two formats:
### Subfolder Format
In this format, you only need to re-organize your dataset folder and place all samples in one folder without
creating any annotation files.
For supervised tasks (with `with_label=True`), we use the names of the sub-folders as the category names. As
shown in the example below, `class_x` and `class_y` will be recognized as the category names.
```text
data_prefix/
├── class_x
│   ├── xxx.png
│   ├── xxy.png
│   ├── ...
│   └── xxz.png
└── class_y
    ├── 123.png
    ├── nsdf3.png
    ├── ...
    └── asd932_.png
```
For unsupervised tasks (with `with_label=False`), we directly load all sample files under the specified folder:
```text
data_prefix/
├── folder_1
│   ├── xxx.png
│   ├── xxy.png
│   └── ...
├── 123.png
├── nsdf3.png
└── ...
```
Assume you want to use it as the training dataset; below are the configurations in your config file.
```python
train_dataloader = dict(
    ...
    # Training dataset configurations
    dataset=dict(
        type='CustomDataset',
        data_prefix='path/to/data_prefix',
        with_label=True,  # or False for unsupervised tasks
        pipeline=...,
    )
)
```
```{note}
If you want to use this format, do not specify `ann_file`, or specify `ann_file=''`.
Also note that the subfolder format requires scanning the folder, which may slow down initialization,
especially for large datasets or slow file IO.
```
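For illustration, the subfolder scanning described above can be sketched roughly as below (a simplified sketch, not MMPretrain's actual implementation; the function name is ours):

```python
import os


def scan_subfolder_dataset(data_prefix, with_label=True):
    """Sketch of subfolder-format scanning.

    With `with_label=True`, each sub-folder name becomes a category and its
    files become samples of that category; with `with_label=False`, every
    file under `data_prefix` is collected without labels.
    """
    if with_label:
        classes = sorted(
            d for d in os.listdir(data_prefix)
            if os.path.isdir(os.path.join(data_prefix, d)))
        samples = [(os.path.join(cls, f), label)
                   for label, cls in enumerate(classes)
                   for f in sorted(os.listdir(os.path.join(data_prefix, cls)))]
        return classes, samples
    # Unsupervised: collect every file, keeping paths relative to data_prefix.
    samples = []
    for root, _, files in os.walk(data_prefix):
        for f in sorted(files):
            samples.append(os.path.relpath(os.path.join(root, f), data_prefix))
    return None, samples
```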
### Text Annotation File Format
In this format, we use a text annotation file to store image file paths and the corresponding category
indices.
For supervised tasks (with `with_label=True`), each line of the annotation file should contain the file path
and the category index of one sample, separated by a space, as below.
All these file paths can be absolute paths, or paths relative to the `data_prefix`.
```text
folder_1/xxx.png 0
folder_1/xxy.png 1
123.png 4
nsdf3.png 3
...
```
```{note}
The index numbers of categories start from 0, and the value of ground-truth labels should fall in the range `[0, num_classes - 1]`.
In addition, please use the `classes` field in the dataset settings to specify the name of every category.
```
For unsupervised tasks (with `with_label=False`), each line of the annotation file only needs to contain the
file path of one sample, as below:
```text
folder_1/xxx.png
folder_1/xxy.png
123.png
nsdf3.png
...
```
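A sketch of how such annotation files could be parsed (illustrative only; the function name and signature are ours, not MMPretrain's API):

```python
def parse_ann_file(lines, with_label=True):
    """Parse lines of a text annotation file.

    Each non-empty line is either `path label` (supervised) or just `path`
    (unsupervised). We split on the last space so only the final field is
    treated as the label.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if with_label:
            img_path, label = line.rsplit(' ', 1)
            samples.append((img_path, int(label)))
        else:
            samples.append(line)
    return samples
```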
Assume the entire dataset folder is as below:
```text
data_root
├── meta
│ ├── test.txt # The annotation file for the test dataset
│ ├── train.txt # The annotation file for the training dataset
│ └── val.txt # The annotation file for the validation dataset.
├── train
│ ├── 123.png
│ ├── folder_1
│ │ ├── xxx.png
│ │ └── xxy.png
│ └── nsdf3.png
├── test
└── val
```
Here is an example of the dataset settings in a config file:
```python
# Training dataloader configurations
train_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root='path/to/data_root',  # The common prefix of both `ann_file` and `data_prefix`.
        ann_file='meta/train.txt',      # The path of the annotation file, relative to `data_root`.
        data_prefix='train',            # The prefix of file paths in the `ann_file`, relative to `data_root`.
        with_label=True,                # or False for unsupervised tasks
        classes=['A', 'B', 'C', 'D', ...],  # The name of every category.
        pipeline=...,                   # The transformations to process the dataset samples.
    )
    ...
)
```
```{note}
For a complete example about how to use the `CustomDataset`, please see [How to Pretrain with Custom Dataset](../notes/pretrain_custom_dataset.md).
```
## ImageNet
ImageNet has multiple versions, but the most commonly used one is [ILSVRC 2012](http://www.image-net.org/challenges/LSVRC/2012/). It can be accessed with the following steps.
`````{tabs}
````{group-tab} Download by MIM
MIM supports downloading the ImageNet dataset from [OpenXLab](https://openxlab.org.cn/datasets) and preprocessing it with a single command line.
_You need to register an account at the [OpenXLab official website](https://openxlab.org.cn/datasets) and log in by CLI._
```Bash
# install OpenXLab CLI tools
pip install -U openxlab
# log in to OpenXLab
openxlab login
# download and preprocess by MIM; better to execute in the $MMPreTrain directory
mim download mmpretrain --dataset imagenet1k
```
````
````{group-tab} Download from Official Source
1. Register an account and login to the [download page](http://www.image-net.org/download-images).
2. Find download links for ILSVRC2012 and download the following two files:
- ILSVRC2012_img_train.tar (~138GB)
- ILSVRC2012_img_val.tar (~6.3GB)
3. Untar the downloaded files
````
`````
### The Directory Structure of the ImageNet dataset
We support two ways of organizing the ImageNet dataset: Subfolder Format and Text Annotation File Format.
#### Subfolder Format
We have provided a sample, which you can download and extract from this [link](https://download.openmmlab.com/mmpretrain/datasets/imagenet_1k.zip). The directory structure of the dataset should be as below:
```text
data/imagenet/
├── train/
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   ├── n01440764_10027.JPEG
│   │   ├── n01440764_10029.JPEG
│   │   ├── n01440764_10040.JPEG
│   │   ├── n01440764_10042.JPEG
│   │   ├── n01440764_10043.JPEG
│   │   └── n01440764_10048.JPEG
│   ├── ...
├── val/
│   ├── n01440764
│   │   ├── ILSVRC2012_val_00000293.JPEG
│   │   ├── ILSVRC2012_val_00002138.JPEG
│   │   ├── ILSVRC2012_val_00003014.JPEG
│   │   └── ...
│   ├── ...
```
#### Text Annotation File Format
You can download and untar the meta data from this [link](https://download.openmmlab.com/mmclassification/datasets/imagenet/meta/caffe_ilsvrc12.tar.gz), and re-organize the dataset as below:
```text
data/imagenet/
├── meta/
│   ├── train.txt
│   ├── test.txt
│   └── val.txt
├── train/
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   ├── n01440764_10027.JPEG
│   │   ├── n01440764_10029.JPEG
│   │   ├── n01440764_10040.JPEG
│   │   ├── n01440764_10042.JPEG
│   │   ├── n01440764_10043.JPEG
│   │   └── n01440764_10048.JPEG
│   ├── ...
├── val/
│   ├── ILSVRC2012_val_00000001.JPEG
│   ├── ILSVRC2012_val_00000002.JPEG
│   ├── ILSVRC2012_val_00000003.JPEG
│   ├── ILSVRC2012_val_00000004.JPEG
│   ├── ...
```
### Configuration
Once your dataset is organized in the way described above, you can use the [`ImageNet`](mmpretrain.datasets.ImageNet) dataset with the below configurations:
Some dataset homepage links may be unavailable; in that case, you can download datasets through [OpenXLab](https://openxlab.org.cn/datasets) (e.g. [Stanford Cars](https://openxlab.org.cn/datasets/OpenDataLab/Stanford_Cars)) or [OpenDataLab](https://opendatalab.com/) (e.g. [RefCOCO](https://opendatalab.com/RefCOCO/download)).
## OpenMMLab 2.0 Standard Dataset
In order to facilitate the training of multi-task algorithm models, we unify the dataset interfaces of different tasks. OpenMMLab has formulated the **OpenMMLab 2.0 Dataset Format Specification**. When starting a training task, users can convert their dataset annotations into the specified format, and use the OpenMMLab algorithm libraries to perform training and testing based on the annotation file.
The OpenMMLab 2.0 Dataset Format Specification stipulates that the annotation file must be in `json`, `yaml`/`yml`, or `pickle`/`pkl` format. The dictionary stored in the annotation file must contain the `metainfo` and `data_list` fields. The value of `metainfo` is a dictionary containing the meta information of the dataset; the value of `data_list` is a list, and each element of the list is a dictionary defining one raw data item, which contains one or several training/testing samples.
The following is an example of a JSON annotation file (in this example each raw data contains only one train/test sample):
```
{
    'metainfo':
    {
        'classes': ('cat', 'dog'),  # the category index of 'cat' is 0 and 'dog' is 1.
        ...
    },
    'data_list':
    [
        {
            'img_path': "xxx/xxx_0.jpg",
            'gt_label': 0,
            ...
        },
        {
            'img_path': "xxx/xxx_1.jpg",
            'gt_label': 1,
            ...
        },
        ...
    ]
}
```
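As a hedged sketch, an annotation file in the text format shown earlier could be converted into this structure as follows (the converter function is ours, not part of OpenMMLab):

```python
def to_openmmlab2(ann_lines, classes):
    """Build an OpenMMLab 2.0 style annotation dict from `path label` lines.

    `metainfo` records the category names; each entry of `data_list`
    describes one sample. Illustrative sketch under stated assumptions.
    """
    data_list = []
    for line in ann_lines:
        img_path, label = line.strip().rsplit(' ', 1)
        data_list.append({'img_path': img_path, 'gt_label': int(label)})
    return {'metainfo': {'classes': list(classes)}, 'data_list': data_list}


ann = to_openmmlab2(['xxx/xxx_0.jpg 0', 'xxx/xxx_1.jpg 1'], ['cat', 'dog'])
```

The resulting dict can then be dumped with `json.dump` to produce an annotation file like the example above.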
Assume you want to use the training dataset and the dataset is stored as the below structure:
```text
data
├── annotations
│   ├── train.json
├── train
│   ├── xxx/xxx_0.jpg
│   ├── xxx/xxx_1.jpg
│   ├── ...
```
Then build the dataset from the following configuration:
```python
train_dataloader = dict(
    ...
    dataset=dict(
        type='BaseDataset',
        data_root='data',
        ann_file='annotations/train.json',
        data_prefix='train/',
        pipeline=...,
    )
)
```
## Other Datasets
To find more datasets supported by MMPretrain, and get more configurations of the above datasets, please see the [dataset documentation](mmpretrain.datasets).
To implement your own dataset class for some special formats, please see the [Adding New Dataset](../advanced_guides/datasets.md).
## Dataset Wrappers
The following dataset wrappers are supported by MMEngine; refer to the {external+mmengine:doc}`MMEngine tutorial <advanced_tutorials/basedataset>` to learn how to use them.
For detection tasks, please use MMDetection. First, make sure you have installed [MIM](https://github.com/open-mmlab/mim), which is also a project of OpenMMLab.
```shell
pip install openmim
mim install 'mmdet>=3.0.0rc0'
```
Besides, please refer to MMDetection for [installation](https://mmdetection.readthedocs.io/en/dev-3.x/get_started.html) and [data preparation](https://mmdetection.readthedocs.io/en/dev-3.x/user_guides/dataset_prepare.html).
### Train
After installation, you can run MMDetection with a simple command.
For semantic segmentation task, we use MMSegmentation. First, make sure you have installed [MIM](https://github.com/open-mmlab/mim), which is also a project of OpenMMLab.
```shell
pip install openmim
mim install 'mmsegmentation>=1.0.0rc0'
```
Besides, please refer to MMSegmentation for [installation](https://mmsegmentation.readthedocs.io/en/dev-1.x/get_started.html) and [data preparation](https://mmsegmentation.readthedocs.io/en/dev-1.x/user_guides/2_dataset_prepare.html).
### Train
After installation, you can run MMSegmentation with a simple command.
Here is an example of running inference on an [image](https://github.com/open-mmlab/mmpretrain/raw/main/demo/demo.JPEG) with the ResNet-50 pre-trained classification model.
Usually, the result for every sample is a dictionary. For example, the image classification result is a dictionary containing `pred_label`, `pred_score`, `pred_scores` and `pred_class` as follows:
We also provide a gradio demo for all supported tasks and you can find it in [projects/gradio_demo/launch.py](https://github.com/open-mmlab/mmpretrain/blob/main/projects/gradio_demo/launch.py).
Please install `gradio` with `pip install -U gradio` first.
By default, MMPretrain prefers GPU to CPU. If you want to test a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program.
| `CHECKPOINT_FILE` | The path to the checkpoint file (it can be an HTTP link; you can find checkpoints [here](https://mmpretrain.readthedocs.io/en/latest/modelzoo_statistics.html)). |
| `--work-dir WORK_DIR` | The directory to save the file containing evaluation metrics. |
| `--out OUT` | The path to save the file containing test results. |
| `--out-item OUT_ITEM` | To specify the content of the test results file, and it can be "pred" or "metrics". If "pred", save the outputs of the model for offline evaluation. If "metrics", save the evaluation metrics. Defaults to "pred". |
| `--cfg-options CFG_OPTIONS` | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either `key="[a,b]"` or `key=a,b`. The argument also allows nested list/tuple values, e.g. `key="[(a,b),(c,d)]"`. Note that the quotation marks are necessary and that no white space is allowed. |
| `--show-dir SHOW_DIR` | The directory to save the result visualization images. |
| `--show` | Visualize the prediction result in a window. |
| `--interval INTERVAL` | The interval of samples to visualize. |
| `--wait-time WAIT_TIME` | The display time of every window (in seconds). Defaults to 1. |
| `--no-pin-memory` | Whether to disable the `pin_memory` option in dataloaders. |
| `--tta` | Whether to enable the Test-Time-Aug (TTA). If the config file has `tta_pipeline` and `tta_model` fields, use them to determine the TTA transforms and how to merge the TTA results. Otherwise, use flip TTA by averaging classification score. |
| `--launcher {none,pytorch,slurm,mpi}` | Options for job launcher. |
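The flip-TTA fallback described for `--tta` (averaging the classification scores of the original and horizontally flipped inputs) can be sketched as below; this is an illustrative sketch, and the function name is ours:

```python
def merge_flip_tta(scores, flipped_scores):
    """Average per-class scores from the original and flipped views.

    `scores` and `flipped_scores` are sequences of per-class classification
    scores for the same sample. Sketch of the averaging idea only.
    """
    return [(a + b) / 2 for a, b in zip(scores, flipped_scores)]
```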
## Test with multiple GPUs
We provide a shell script to start a multi-GPU task with `torch.distributed.launch`.
| `CHECKPOINT_FILE` | The path to the checkpoint file (it can be an HTTP link; you can find checkpoints [here](https://mmpretrain.readthedocs.io/en/latest/modelzoo_statistics.html)). |
| `GPU_NUM` | The number of GPUs to be used. |
| `[PY_ARGS]` | The other optional arguments of `tools/test.py`, see [here](#test-with-your-pc). |
You can also specify extra arguments of the launcher by environment variables. For example, change the
communication port of the launcher to 29666 by the below command:
| `PARTITION` | The partition to use in your cluster. |
| `JOB_NAME` | The name of your job, you can name it as you like. |
| `CONFIG_FILE` | The path to the config file. |
| `CHECKPOINT_FILE` | The path to the checkpoint file (it can be an HTTP link; you can find checkpoints [here](https://mmpretrain.readthedocs.io/en/latest/modelzoo_statistics.html)). |
| `[PY_ARGS]` | The other optional arguments of `tools/test.py`, see [here](#test-with-your-pc). |
Here are the environment variables that can be used to configure the Slurm job.
In this tutorial, we will introduce how to use the scripts provided in MMPretrain to start a training task. If
you need them, we also have some practice examples about [how to pretrain with custom dataset](../notes/pretrain_custom_dataset.md)
and [how to finetune with custom dataset](../notes/finetune_custom_dataset.md).
## Train with your PC
You can use `tools/train.py` to train a model on a single machine with a CPU and optionally a GPU.
Here is the full usage of the script:
```shell
python tools/train.py ${CONFIG_FILE} [ARGS]
```
````{note}
By default, MMPretrain prefers GPU to CPU. If you want to train a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program.
| `--work-dir WORK_DIR` | The target folder to save logs and checkpoints. Defaults to a folder with the same name of the config file under `./work_dirs`. |
| `--resume [RESUME]` | Resume training. If a path is specified, resume from it; if not, try to auto-resume from the latest checkpoint. |
| `--auto-scale-lr` | Auto scale the learning rate according to the actual batch size and the original batch size. |
| `--no-pin-memory` | Whether to disable the `pin_memory` option in dataloaders. |
| `--no-persistent-workers` | Whether to disable the `persistent_workers` option in dataloaders. |
| `--cfg-options CFG_OPTIONS` | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either `key="[a,b]"` or `key=a,b`. The argument also allows nested list/tuple values, e.g. `key="[(a,b),(c,d)]"`. Note that the quotation marks are necessary and that no white space is allowed. |
| `--launcher {none,pytorch,slurm,mpi}` | Options for job launcher. |
## Train with multiple GPUs
We provide a shell script to start a multi-GPU task with `torch.distributed.launch`.