# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.header-logo {
  background-image: url("../image/mmpt-logo.png");
  background-size: 183px 50px;
  height: 50px;
  width: 183px;
}

@media screen and (min-width: 1100px) {
  .header-logo {
    top: -12px;
  }
}

pre {
  white-space: pre;
}

@media screen and (min-width: 2000px) {
  .pytorch-content-left {
    width: 1200px;
    margin-left: 30px;
  }
  article.pytorch-article {
    max-width: 1200px;
  }
  .pytorch-breadcrumbs-wrapper {
    width: 1200px;
  }
  .pytorch-right-menu.scrolling-fixed {
    position: fixed;
    top: 45px;
    left: 1580px;
  }
}

article.pytorch-article section code {
  padding: .2em .4em;
  background-color: #f3f4f7;
  border-radius: 5px;
}

/* Disable the change in tables */
article.pytorch-article section table code {
  padding: unset;
  background-color: unset;
  border-radius: unset;
}

table.autosummary td {
  width: 50%;
}

img.align-center {
  display: block;
  margin-left: auto;
  margin-right: auto;
}

article.pytorch-article p.rubric {
  font-weight: bold;
}
var collapsedSections = ['Advanced Guides', 'Model Zoo', 'Visualization', 'Analysis Tools', 'Deployment', 'Notes'];

$(document).ready(function () {
  $('.model-summary').DataTable({
    "stateSave": false,
    "lengthChange": false,
    "pageLength": 20,
    "order": []
  });
});
{% extends "layout.html" %}
{% block body %}
<h1>Page Not Found</h1>
<p>
The page you are looking for cannot be found.
</p>
<p>
If you just switched documentation versions, it is likely that the page you were on has moved. You can look for it in
the table of contents on the left, or go to <a href="{{ pathto(root_doc) }}">the homepage</a>.
</p>
<p>
If you cannot find the documentation you want, please <a
href="https://github.com/open-mmlab/mmpretrain/issues/new/choose">open an issue</a> to tell us!
</p>
{% endblock %}
.. role:: hidden
    :class: hidden-section
.. currentmodule:: {{ module }}

{{ name | underline }}

.. autoclass:: {{ name }}
    :members:

..
    autogenerated from _templates/autosummary/class.rst
    note it does not have :inherited-members:
.. role:: hidden
    :class: hidden-section
.. currentmodule:: {{ module }}

{{ name | underline }}

.. autoclass:: {{ name }}
    :members:
    :special-members: __call__

..
    autogenerated from _templates/callable.rst
    note it does not have :inherited-members:
.. role:: hidden
    :class: hidden-section
.. currentmodule:: {{ module }}

{{ name | underline }}

.. autoclass:: {{ name }}
    :members: transform

..
    autogenerated from _templates/callable.rst
    note it does not have :inherited-members:
# Convention in MMPretrain
## Model Naming Convention
We follow the convention below to name models. Contributors are advised to follow the same style. The model names are divided into five parts: algorithm info, module information, pretrain information, training information and data information. Logically, different parts are concatenated by underscores `'_'`, and words in the same part are concatenated by dashes `'-'`.
```text
{algorithm info}_{module info}_{pretrain info}_{training info}_{data info}
```
- `algorithm info` (optional): The main algorithm information; it includes the main training algorithm, like MAE, BEiT, etc.
- `module info`: The module information; it usually includes the backbone name, such as resnet, vit, etc.
- `pretrain info` (optional): The pre-trained model information, such as the dataset the pre-trained model was trained on (e.g., ImageNet-21k).
- `training info`: The training information, i.e., the training schedule, including batch size, lr schedule, data augmentation and the like.
- `data info`: The data information; it usually includes the dataset name, input size and so on, such as imagenet, cifar, etc.
### Algorithm information
The main algorithm name to train the model. For example:
- `simclr`
- `mocov2`
- `eva-mae-style`
Models trained by supervised image classification can omit this field.
### Module information
The modules of the model, usually, the backbone must be included in this field, and the neck and head
information can be omitted. For example:
- `resnet50`
- `vit-base-p16`
- `swin-base`
### Pretrain information
If the model is a fine-tuned model from a pre-trained model, we need to record some information of the
pre-trained model. For example:
- The source of the pre-trained model: `fb`, `openai`, etc.
- The method to train the pre-trained model: `clip`, `mae`, `distill`, etc.
- The dataset used for pre-training: `in21k`, `laion2b`, etc. (`in1k` can be omitted.)
- The training duration: `300e`, `1600e`, etc.
Not all information is necessary; select only the information needed to distinguish different pre-trained models.
At the end of this field, use a `-pre` as an identifier, like `mae-in21k-pre`.
### Training information
Training schedule, including training type, `batch size`, `lr schedule`, data augment, special loss functions and so on:
- format `{gpu x batch_per_gpu}`, such as `8xb32`
Training type (mainly seen in transformer networks, such as the `ViT` algorithm, which is usually divided into two training types: pre-training and fine-tuning):
- `ft` : configuration file for fine-tuning
- `pt` : configuration file for pretraining
Training recipe. Usually, only the part that is different from the original paper will be marked. These methods will be arranged in the order `{pipeline aug}-{train aug}-{loss trick}-{scheduler}-{epochs}`.
- `coslr-200e` : use cosine scheduler to train 200 epochs
- `autoaug-mixup-lbs-coslr-50e` : use `autoaug`, `mixup`, `label smooth`, `cosine scheduler` to train 50 epochs
If the model is converted from a third-party repository, such as the official repository, the training information
can be omitted and `3rdparty` is used as an identifier.
### Data information
- `in1k` : `ImageNet1k` dataset, defaults to an input image size of 224x224;
- `in21k` : `ImageNet21k` dataset, also called `ImageNet22k` dataset, defaults to an input image size of 224x224;
- `in1k-384px` : indicates that the input image size is 384x384;
- `cifar100`
### Model Name Example
```text
vit-base-p32_clip-openai-pre_3rdparty_in1k
```
- `vit-base-p32`: The module information
- `clip-openai-pre`: The pre-train information.
  - `clip`: The pre-train method is CLIP.
  - `openai`: The pre-trained model comes from OpenAI.
  - `pre`: The pre-train identifier.
- `3rdparty`: The model is converted from a third-party repository.
- `in1k`: Dataset information. The model is trained on the ImageNet-1k dataset and the input size is `224x224`.
```text
beit_beit-base-p16_8xb256-amp-coslr-300e_in1k
```
- `beit`: The algorithm information
- `beit-base`: The module information, since the backbone is a modified ViT from BEiT, the backbone name is
also `beit`.
- `8xb256-amp-coslr-300e`: The training information.
  - `8xb256`: Use 8 GPUs, and the batch size on each GPU is 256.
  - `amp`: Use automatic mixed-precision training.
  - `coslr`: Use the cosine annealing learning rate scheduler.
  - `300e`: Train for 300 epochs.
- `in1k`: Dataset information. The model is trained on the ImageNet-1k dataset and the input size is `224x224`.
## Config File Naming Convention
The naming of the config file is almost the same as the model name, with several differences:
- The training information is necessary, and cannot be `3rdparty`.
- If the config file only includes backbone settings, with neither head settings nor dataset settings, we name it
`{module info}_headless.py`. This kind of config file is usually used for third-party pre-trained models on large
datasets.
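For example, the config file name of the ResNet-50 model trained on ImageNet-1k with 8 GPUs and a batch size of 32 per GPU (used later in this document) is:
```text
resnet50_8xb32_in1k.py
```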
## Checkpoint Naming Convention
The naming of the weight mainly includes the model name, date and hash value.
```text
{model_name}_{date}-{hash}.pth
```
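For example, a ResNet-50 checkpoint could be named as below, where the date and hash here are hypothetical placeholders:
```text
resnet50_8xb32_in1k_20220101-1a2b3c4d.pth
```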
# Adding New Dataset
You can write a new dataset class inheriting from `BaseDataset`, and overwrite `load_data_list(self)`,
like [CIFAR10](https://github.com/open-mmlab/mmpretrain/blob/main/mmpretrain/datasets/cifar.py) and [ImageNet](https://github.com/open-mmlab/mmpretrain/blob/main/mmpretrain/datasets/imagenet.py).
Typically, this function returns a list, where each sample is a dict, containing necessary data information, e.g., `img` and `gt_label`.
Assume we are going to implement a `Filelist` dataset, which takes filelists for both training and testing. The format of the annotation list is as follows:
```text
000001.jpg 0
000002.jpg 1
```
## 1. Create Dataset Class
We can create a new dataset in `mmpretrain/datasets/filelist.py` to load the data.
```python
from mmpretrain.registry import DATASETS
from .base_dataset import BaseDataset


@DATASETS.register_module()
class Filelist(BaseDataset):

    def load_data_list(self):
        assert isinstance(self.ann_file, str)

        data_list = []
        with open(self.ann_file) as f:
            samples = [x.strip().split(' ') for x in f.readlines()]
            for filename, gt_label in samples:
                img_path = add_prefix(filename, self.img_prefix)
                info = {'img_path': img_path, 'gt_label': int(gt_label)}
                data_list.append(info)
        return data_list
```
## 2. Add to the package
Then add this dataset class in `mmpretrain/datasets/__init__.py`:
```python
from .base_dataset import BaseDataset
...
from .filelist import Filelist
__all__ = [
    'BaseDataset', ..., 'Filelist'
]
```
## 3. Modify Related Config
Then, to use `Filelist` in the config, you can modify it as follows:
```python
train_dataloader = dict(
    ...
    dataset=dict(
        type='Filelist',
        ann_file='image_list.txt',
        pipeline=train_pipeline,
    )
)
```
All the dataset classes inheriting from [`BaseDataset`](https://github.com/open-mmlab/mmpretrain/blob/main/mmpretrain/datasets/base_dataset.py) have **lazy loading** and **memory saving** features; you can refer to the related documents of {external+mmengine:doc}`BaseDataset <advanced_tutorials/basedataset>`.
```{note}
If the dictionary of the data sample contains 'img_path' but not 'img', then the 'LoadImageFromFile' transform must be added in the pipeline.
```
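For instance, since the `Filelist` dataset above only records `img_path`, its training pipeline should start by loading the image; the remaining transforms here are just the common ones shown later in this documentation:
```python
train_pipeline = [
    dict(type='LoadImageFromFile'),  # reads 'img_path' and fills in 'img'
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]
```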
# Customize Evaluation Metrics
## Use metrics in MMPretrain
In MMPretrain, we have provided multiple metrics for both single-label classification and multi-label
classification:
**Single-label Classification**:
- [`Accuracy`](mmpretrain.evaluation.Accuracy)
- [`SingleLabelMetric`](mmpretrain.evaluation.SingleLabelMetric), including precision, recall, f1-score and
support.
**Multi-label Classification**:
- [`AveragePrecision`](mmpretrain.evaluation.AveragePrecision), or AP (mAP).
- [`MultiLabelMetric`](mmpretrain.evaluation.MultiLabelMetric), including precision, recall, f1-score and
support.
To use these metrics during validation and testing, we need to modify the `val_evaluator` and `test_evaluator`
fields in the config file.
Here are several examples:
1. Calculate top-1 and top-5 accuracy during both validation and test.
```python
val_evaluator = dict(type='Accuracy', topk=(1, 5))
test_evaluator = val_evaluator
```
2. Calculate top-1 accuracy, top-5 accuracy, precision and recall during both validation and test.
```python
val_evaluator = [
    dict(type='Accuracy', topk=(1, 5)),
    dict(type='SingleLabelMetric', items=['precision', 'recall']),
]
test_evaluator = val_evaluator
```
3. Calculate mAP (mean AveragePrecision), CP (Class-wise mean Precision), CR (Class-wise mean Recall), CF
(Class-wise mean F1-score), OP (Overall mean Precision), OR (Overall mean Recall) and OF1 (Overall mean
F1-score).
```python
val_evaluator = [
    dict(type='AveragePrecision'),
    dict(type='MultiLabelMetric', average='macro'),  # class-wise mean
    dict(type='MultiLabelMetric', average='micro'),  # overall mean
]
test_evaluator = val_evaluator
```
## Add new metrics
MMPretrain supports customized evaluation metrics for users who pursue higher customization.
You need to create a new file under `mmpretrain/evaluation/metrics`, and implement the new metric there, for example, in `mmpretrain/evaluation/metrics/my_metric.py`. Then create a customized evaluation metric class `MyMetric`, which inherits from MMEngine's [`BaseMetric`](mmengine.evaluator.BaseMetric).
The data format processing method `process` and the metric calculation method `compute_metrics` need to be overridden. Add the class to the `METRICS` registry to make it available as a customized evaluation metric.
```python
from typing import Dict, List, Sequence

from mmengine.evaluator import BaseMetric

from mmpretrain.registry import METRICS


@METRICS.register_module()
class MyMetric(BaseMetric):

    def process(self, data_batch: Sequence[Dict], data_samples: Sequence[Dict]):
        """The processed results should be stored in ``self.results``, which
        will be used to compute the metrics when all batches have been
        processed.

        ``data_batch`` stores the batch data from the dataloader,
        and ``data_samples`` stores the batch outputs from the model.
        """
        ...

    def compute_metrics(self, results: List):
        """Compute the metrics from processed results and return the
        evaluation results."""
        ...
```
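As a concrete illustration, here is one possible way to fill in the skeleton above for a simple top-1 accuracy metric. This is only a sketch: the exact keys available in each data sample dict depend on your task, and `'pred_label'` and `'gt_label'` are assumed here.
```python
from typing import Dict, List, Sequence

from mmengine.evaluator import BaseMetric

from mmpretrain.registry import METRICS


@METRICS.register_module()
class MyMetric(BaseMetric):

    def process(self, data_batch: Sequence[Dict], data_samples: Sequence[Dict]):
        # Keep only what compute_metrics needs for each sample.
        for data_sample in data_samples:
            self.results.append({
                'pred': data_sample['pred_label'],  # assumed key
                'gt': data_sample['gt_label'],  # assumed key
            })

    def compute_metrics(self, results: List) -> Dict:
        # Top-1 accuracy over all collected samples.
        correct = sum(int((res['pred'] == res['gt']).all()) for res in results)
        return {'accuracy': 100.0 * correct / len(results)}
```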
Then, import it in the `mmpretrain/evaluation/metrics/__init__.py` to add it into the `mmpretrain.evaluation` package.
```python
# In mmpretrain/evaluation/metrics/__init__.py
...
from .my_metric import MyMetric
__all__ = [..., 'MyMetric']
```
Finally, use `MyMetric` in the `val_evaluator` and `test_evaluator` field of config files.
```python
val_evaluator = dict(type='MyMetric', ...)
test_evaluator = val_evaluator
```
```{note}
More details can be found in {external+mmengine:doc}`MMEngine Documentation: Evaluation <design/evaluation>`.
```
# Customize Models
In our design, a complete model is defined as a top-level module which contains several model components based on their functionalities.
- model: a top-level module that defines the type of the task, such as `ImageClassifier` for image classification, `MAE` for self-supervised learning, `ImageToImageRetriever` for image retrieval.
- backbone: usually a feature extraction network that records the major differences between models, e.g., `ResNet`, `MobileNet`.
- neck: the component between the backbone and head, e.g., `GlobalAveragePooling`.
- head: the component for specific tasks, e.g., `ClsHead`, `ContrastiveHead`.
- loss: the component in the head for calculating losses, e.g., `CrossEntropyLoss`, `LabelSmoothLoss`.
- target_generator: the component specifically for self-supervised learning tasks, e.g., `VQKD`, `HOGGenerator`.
## Add a new model
Generally, for image classification and retrieval tasks, the pipelines are consistent. However, the pipelines differ among self-supervised learning algorithms, like `MAE` and `BEiT`. Thus, in this section, we will explain how to add your self-supervised learning algorithm.
### Add a new self-supervised learning algorithm
1. Create a new file `mmpretrain/models/selfsup/new_algorithm.py` and implement `NewAlgorithm` in it.
```python
from mmpretrain.registry import MODELS
from .base import BaseSelfSupervisor


@MODELS.register_module()
class NewAlgorithm(BaseSelfSupervisor):

    def __init__(self, backbone, neck=None, head=None, init_cfg=None):
        super().__init__(init_cfg)
        pass

    # The ``extract_feat`` function is defined in BaseSelfSupervisor; you
    # could overwrite it if needed.
    def extract_feat(self, inputs, **kwargs):
        pass

    # The core function to compute the loss.
    def loss(self, inputs, data_samples, **kwargs):
        pass
```
2. Import the new algorithm module in `mmpretrain/models/selfsup/__init__.py`
```python
...
from .new_algorithm import NewAlgorithm

__all__ = [
    ...,
    'NewAlgorithm',
    ...
]
```
3. Use it in your config file.
```python
model = dict(
    type='NewAlgorithm',
    backbone=...,
    neck=...,
    head=...,
    ...
)
```
## Add a new backbone
Here we present how to develop a new backbone component using `ResNet_CIFAR` as an example.
As the input size of CIFAR is 32x32, which is much smaller than the default size of 224x224 in ImageNet, this backbone replaces `kernel_size=7, stride=2` with `kernel_size=3, stride=1` and removes the MaxPooling after the stem layer, to avoid forwarding small feature maps to residual blocks.
The easiest way is to inherit from `ResNet` and only modify the stem layer.
1. Create a new file `mmpretrain/models/backbones/resnet_cifar.py`.
```python
import torch.nn as nn
from mmcv.cnn import build_conv_layer, build_norm_layer

from mmpretrain.registry import MODELS
from .resnet import ResNet


@MODELS.register_module()
class ResNet_CIFAR(ResNet):
    """ResNet backbone for CIFAR.

    short description of the backbone

    Args:
        depth (int): Network depth, from {18, 34, 50, 101, 152}.
        ...
    """

    def __init__(self, depth, deep_stem=False, **kwargs):
        # Call the ResNet init.
        super(ResNet_CIFAR, self).__init__(depth, deep_stem=deep_stem, **kwargs)
        # Other specific initializations.
        assert not self.deep_stem, 'ResNet_CIFAR does not support deep_stem'

    def _make_stem_layer(self, in_channels, base_channels):
        # Override the ResNet method to modify the network structure.
        self.conv1 = build_conv_layer(
            self.conv_cfg,
            in_channels,
            base_channels,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=False)
        self.norm1_name, norm1 = build_norm_layer(
            self.norm_cfg, base_channels, postfix=1)
        self.add_module(self.norm1_name, norm1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Customize the forward method if needed.
        x = self.conv1(x)
        x = self.norm1(x)
        x = self.relu(x)
        outs = []
        for i, layer_name in enumerate(self.res_layers):
            res_layer = getattr(self, layer_name)
            x = res_layer(x)
            if i in self.out_indices:
                outs.append(x)
        # The return value needs to be a tuple with multi-scale outputs from
        # different depths. If you don't need multi-scale features, just wrap
        # the output as a one-item tuple.
        return tuple(outs)

    def init_weights(self):
        # Customize the weight initialization method if needed.
        super().init_weights()

        # Disable the weight initialization if loading a pretrained model.
        if self.init_cfg is not None and self.init_cfg['type'] == 'Pretrained':
            return

        # Usually, we recommend using `init_cfg` to specify weight
        # initialization methods of convolution, linear, or normalization
        # layers. If you have some special needs, do the extra weight
        # initialization here.
        ...
```
```{note}
In the OpenMMLab 2.0 design, the original registry names `BACKBONES`, `NECKS`, `HEADS` and `LOSSES` are replaced by `MODELS`.
```
2. Import the new backbone module in `mmpretrain/models/backbones/__init__.py`.
```python
...
from .resnet_cifar import ResNet_CIFAR

__all__ = [
    ..., 'ResNet_CIFAR'
]
```
3. Modify the correlated settings in your config file.
```python
model = dict(
    ...
    backbone=dict(
        type='ResNet_CIFAR',
        depth=18,
        ...),
    ...
)
```
### Add a new backbone for self-supervised learning
For some self-supervised learning algorithms, the backbones are somewhat different, such as `MAE`, `BEiT`, etc. Their backbones need to deal with `mask` in order to extract features from visible tokens.
Take [MAEViT](mmpretrain.models.selfsup.MAEViT) as an example: we need to overwrite the `forward` function to compute with `mask`. We also define `init_weights` to initialize parameters and `random_masking` to generate the mask for `MAE` pre-training.
```python
from typing import Optional, Tuple

import torch

# Import path is indicative; VisionTransformer is the ViT backbone in mmpretrain.
from ..backbones import VisionTransformer


class MAEViT(VisionTransformer):
    """Vision Transformer for MAE pre-training."""

    def __init__(self, mask_ratio, **kwargs) -> None:
        super().__init__(**kwargs)
        # The position embedding is not learnable during pretraining.
        self.pos_embed.requires_grad = False
        self.mask_ratio = mask_ratio
        self.num_patches = self.patch_resolution[0] * self.patch_resolution[1]

    def init_weights(self) -> None:
        """Initialize position embedding, patch embedding and cls token."""
        super().init_weights()
        # Define what is needed here.
        pass

    def random_masking(
        self,
        x: torch.Tensor,
        mask_ratio: float = 0.75
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        """Generate the mask for MAE pre-training."""
        pass

    def forward(
        self,
        x: torch.Tensor,
        mask: Optional[bool] = True
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        """Generate features for masked images.

        The function supports two kinds of forward behaviors. If ``mask`` is
        ``True``, the function will generate a mask to mask some patches
        randomly and get the hidden features of the visible patches, which
        means the function will be executed as masked image modeling
        pre-training; if ``mask`` is ``None`` or ``False``, the forward
        function will call ``super().forward()``, which extracts features
        from images without the mask.
        """
        if mask is None or mask is False:
            return super().forward(x)
        else:
            B = x.shape[0]
            x = self.patch_embed(x)[0]
            # Add the pos embed w/o the cls token.
            x = x + self.pos_embed[:, 1:, :]

            # Masking: length -> length * mask_ratio
            x, mask, ids_restore = self.random_masking(x, self.mask_ratio)

            # Append the cls token.
            cls_token = self.cls_token + self.pos_embed[:, :1, :]
            cls_tokens = cls_token.expand(B, -1, -1)
            x = torch.cat((cls_tokens, x), dim=1)

            for _, layer in enumerate(self.layers):
                x = layer(x)
            # Use the final norm.
            x = self.norm1(x)

            return (x, mask, ids_restore)
```
## Add a new neck
Here we take `GlobalAveragePooling` as an example. It is a very simple neck without any arguments.
To add a new neck, we mainly implement the `forward` function, which applies some operations on the output from the backbone and forwards the results to the head.
1. Create a new file in `mmpretrain/models/necks/gap.py`.
```python
import torch.nn as nn

from mmpretrain.registry import MODELS


@MODELS.register_module()
class GlobalAveragePooling(nn.Module):

    def __init__(self):
        super().__init__()  # initialize nn.Module before registering submodules
        self.gap = nn.AdaptiveAvgPool2d((1, 1))

    def forward(self, inputs):
        # We regard inputs as a tensor for simplicity.
        outs = self.gap(inputs)
        outs = outs.view(inputs.size(0), -1)
        return outs
```
2. Import the new neck module in `mmpretrain/models/necks/__init__.py`.
```python
...
from .gap import GlobalAveragePooling

__all__ = [
    ..., 'GlobalAveragePooling'
]
```
3. Modify the correlated settings in your config file.
```python
model = dict(
    neck=dict(type='GlobalAveragePooling'),
)
```
## Add a new head
### Based on ClsHead
Here we present how to develop a new head using a simplified `VisionTransformerClsHead` as an example.
To implement a new head, we need to implement a `pre_logits` method for the process before the final classification layer, and a `forward` method.
:::{admonition} Why do we need the `pre_logits` method?
:class: note
In classification tasks, we usually use a linear layer to do the final classification. And sometimes, we need
to obtain the feature before the final classification, which is the output of the `pre_logits` method.
:::
1. Create a new file in `mmpretrain/models/heads/vit_head.py`.
```python
import torch.nn as nn

from mmpretrain.registry import MODELS
from .cls_head import ClsHead


@MODELS.register_module()
class VisionTransformerClsHead(ClsHead):

    def __init__(self, num_classes, in_channels, hidden_dim, **kwargs):
        super().__init__(**kwargs)
        self.in_channels = in_channels
        self.num_classes = num_classes
        self.hidden_dim = hidden_dim

        self.fc1 = nn.Linear(in_channels, hidden_dim)
        self.act = nn.Tanh()
        self.fc2 = nn.Linear(hidden_dim, num_classes)

    def pre_logits(self, feats):
        # The output of the backbone is usually a tuple from multiple depths,
        # and for classification, we only need the final output.
        feat = feats[-1]

        # The final output of VisionTransformer is a tuple of patch tokens
        # and classification tokens. We need the classification tokens here.
        _, cls_token = feat

        # Do all the work except the final classification linear layer.
        return self.act(self.fc1(cls_token))

    def forward(self, feats):
        pre_logits = self.pre_logits(feats)

        # The final classification linear layer.
        cls_score = self.fc2(pre_logits)
        return cls_score
```
2. Import the module in `mmpretrain/models/heads/__init__.py`.
```python
...
from .vit_head import VisionTransformerClsHead

__all__ = [
    ..., 'VisionTransformerClsHead'
]
```
3. Modify the correlated settings in your config file.
```python
model = dict(
    head=dict(
        type='VisionTransformerClsHead',
        ...,
    ))
```
### Based on BaseModule
Here is an example of `MAEPretrainHead`, which is based on `BaseModule` and implemented for the masked image modeling task. It is required to implement the `loss` function to generate the loss, but the other helper functions are optional.
```python
# Copyright (c) OpenMMLab. All rights reserved.
import torch
from mmengine.model import BaseModule

from mmpretrain.registry import MODELS


@MODELS.register_module()
class MAEPretrainHead(BaseModule):
    """Head for MAE Pre-training."""

    def __init__(self,
                 loss: dict,
                 norm_pix: bool = False,
                 patch_size: int = 16) -> None:
        super().__init__()
        self.norm_pix = norm_pix
        self.patch_size = patch_size
        self.loss_module = MODELS.build(loss)

    def patchify(self, imgs: torch.Tensor) -> torch.Tensor:
        """Split images into non-overlapped patches."""
        p = self.patch_size
        assert imgs.shape[2] == imgs.shape[3] and imgs.shape[2] % p == 0

        h = w = imgs.shape[2] // p
        x = imgs.reshape(shape=(imgs.shape[0], 3, h, p, w, p))
        x = torch.einsum('nchpwq->nhwpqc', x)
        x = x.reshape(shape=(imgs.shape[0], h * w, p**2 * 3))
        return x

    def construct_target(self, target: torch.Tensor) -> torch.Tensor:
        """Construct the reconstruction target."""
        target = self.patchify(target)
        if self.norm_pix:
            # Normalize the target image.
            mean = target.mean(dim=-1, keepdim=True)
            var = target.var(dim=-1, keepdim=True)
            target = (target - mean) / (var + 1.e-6)**.5

        return target

    def loss(self, pred: torch.Tensor, target: torch.Tensor,
             mask: torch.Tensor) -> torch.Tensor:
        """Generate the loss."""
        target = self.construct_target(target)
        loss = self.loss_module(pred, target, mask)

        return loss
```
After the implementation, the remaining steps are the same as step 2 and step 3 in [Based on ClsHead](#based-on-clshead).
## Add a new loss
To add a new loss function, we mainly implement the `forward` function in the loss module. We should also register the loss module in `MODELS`. In addition, it is helpful to leverage the decorator `weighted_loss` to weight the loss for each element.
Assuming that we want to mimic a probabilistic distribution generated from another classification model, we implement an L1 loss to fulfill the purpose, as below.
1. Create a new file in `mmpretrain/models/losses/l1_loss.py`.
```python
import torch
import torch.nn as nn

from mmpretrain.registry import MODELS
from .utils import weighted_loss


@weighted_loss
def l1_loss(pred, target):
    assert pred.size() == target.size() and target.numel() > 0
    loss = torch.abs(pred - target)
    return loss


@MODELS.register_module()
class L1Loss(nn.Module):

    def __init__(self, reduction='mean', loss_weight=1.0):
        super(L1Loss, self).__init__()
        self.reduction = reduction
        self.loss_weight = loss_weight

    def forward(self,
                pred,
                target,
                weight=None,
                avg_factor=None,
                reduction_override=None):
        assert reduction_override in (None, 'none', 'mean', 'sum')
        reduction = (
            reduction_override if reduction_override else self.reduction)
        loss = self.loss_weight * l1_loss(
            pred, target, weight, reduction=reduction, avg_factor=avg_factor)
        return loss
```
2. Import the module in `mmpretrain/models/losses/__init__.py`.
```python
...
from .l1_loss import L1Loss

__all__ = [
    ..., 'L1Loss'
]
```
3. Modify the `loss` field in the head config.
```python
model = dict(
    head=dict(
        loss=dict(type='L1Loss', loss_weight=1.0),
    ))
```
Finally, we can combine all the new model components in a config file to create a new model. Because `ResNet_CIFAR` is not a ViT-based backbone, we do not use `VisionTransformerClsHead` here.
```python
model = dict(
    type='ImageClassifier',
    backbone=dict(
        type='ResNet_CIFAR',
        depth=18,
        num_stages=4,
        out_indices=(3, ),
        style='pytorch'),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(
        type='LinearClsHead',
        num_classes=10,
        in_channels=512,
        loss=dict(type='L1Loss', loss_weight=1.0),
        topk=(1, 5),
    ))
```
```{tip}
For convenience, the same model components can be inherited from existing config files; refer to [Learn about configs](../user_guides/config.md) for more details.
```
# Customize Data Pipeline
## Design of Data pipelines
In the [new dataset tutorial](./datasets.md), we learned that the dataset class uses the `load_data_list` method
to initialize the entire dataset, and we save the information of every sample to a dict.
Usually, to save memory, we only load image paths and labels in `load_data_list`, and load the full
image content when we use the images. Moreover, we may want to do some random data augmentation when picking
samples during training. Almost all data loading, pre-processing, and formatting operations can be configured in
MMPretrain by the **data pipeline**.
The data pipeline defines how to process the sample dict when indexing a sample from the dataset. It
consists of a sequence of data transforms. Each data transform takes a dict as input, processes it, and outputs a
dict for the next data transform.
Here is a data pipeline example for ResNet-50 training on ImageNet.
```python
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]
```
All available data transforms in MMPretrain can be found in the [data transforms docs](mmpretrain.datasets.transforms).
## Modify the training/test pipeline
The data pipeline in MMPretrain is pretty flexible. You can control almost every step of the data
preprocessing from the config file but, on the other hand, you may be confused when facing so many options.
Here is a common practice and guidance for image classification tasks.
### Loading
At the beginning of a data pipeline, we usually need to load image data from the file path.
[`LoadImageFromFile`](mmcv.transforms.LoadImageFromFile) is commonly used to do this task.
```python
train_pipeline = [
    dict(type='LoadImageFromFile'),
    ...
]
```
If you want to load data from files with special formats or special locations, you can [implement a new loading
transform](#add-new-data-transforms) and add it at the beginning of the data pipeline.
### Augmentation and other processing
During training, we usually need to do data augmentation to avoid overfitting. During the test, we also need to do
some data processing like resizing and cropping. These data transforms are placed after the loading process.
Here is a simple data augmentation recipe example. It will randomly resize and crop the input image to the
specified scale, and randomly flip the image horizontally with a probability of 0.5.
```python
train_pipeline = [
    ...
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    ...
]
```
Here is a heavy data augmentation recipe example used in [Swin-Transformer](../papers/swin_transformer.md)
training. To align with the official implementation, it specifies `pillow` as the resize backend and `bicubic`
as the resize algorithm. Moreover, it adds [`RandAugment`](mmpretrain.datasets.transforms.RandAugment) and
[`RandomErasing`](mmpretrain.datasets.transforms.RandomErasing) as extra data augmentation methods.
This configuration specifies every detail of the data augmentation, and you can simply copy it into your own
config file to apply the data augmentations of the Swin-Transformer.
```python
bgr_mean = [103.53, 116.28, 123.675]
bgr_std = [57.375, 57.12, 58.395]

train_pipeline = [
    ...
    dict(type='RandomResizedCrop', scale=224, backend='pillow', interpolation='bicubic'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(
        type='RandAugment',
        policies='timm_increasing',
        num_policies=2,
        total_level=10,
        magnitude_level=9,
        magnitude_std=0.5,
        hparams=dict(
            pad_val=[round(x) for x in bgr_mean], interpolation='bicubic')),
    dict(
        type='RandomErasing',
        erase_prob=0.25,
        mode='rand',
        min_area_ratio=0.02,
        max_area_ratio=1 / 3,
        fill_color=bgr_mean,
        fill_std=bgr_std),
    ...
]
```
```{note}
Usually, the data augmentation part in the data pipeline handles only image-wise transforms, not transforms
like image normalization or mixup/cutmix. That is because we can apply image normalization and mixup/cutmix on batched
data to accelerate them. To configure image normalization and mixup/cutmix, please use the [data preprocessor](mmpretrain.models.utils.data_preprocessor).
```
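For reference, a typical `data_preprocessor` configuration might look like the sketch below; the values are the common ImageNet channel statistics, matching the TorchVision example later in this documentation:
```python
data_preprocessor = dict(
    num_classes=1000,
    # Channel statistics applied on batched data on the device.
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,  # convert from the BGR loading order to RGB
)
```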
### Formatting
The formatting is to collect the training data from the data information dict and convert the data to a
model-friendly format.
In most cases, you can simply use [`PackInputs`](mmpretrain.datasets.transforms.PackInputs): it will
convert the image in NumPy array format to a PyTorch tensor, and pack the ground-truth category information and
other meta information as a [`DataSample`](mmpretrain.structures.DataSample).
```python
train_pipeline = [
    ...
    dict(type='PackInputs'),
]
```
## Add new data transforms
1. Write a new data transform in any file, e.g., `my_transform.py`, and place it in
the folder `mmpretrain/datasets/transforms/`. The data transform class needs to inherit
the [`mmcv.transforms.BaseTransform`](mmcv.transforms.BaseTransform) class and override
the `transform` method which takes a dict as input and returns a dict.
```python
from mmcv.transforms import BaseTransform

from mmpretrain.registry import TRANSFORMS


@TRANSFORMS.register_module()
class MyTransform(BaseTransform):

    def transform(self, results):
        # Modify the data information dict `results`.
        return results
```
2. Import the new class in the `mmpretrain/datasets/transforms/__init__.py`.
```python
...
from .my_transform import MyTransform

__all__ = [
    ..., 'MyTransform'
]
```
3. Use it in config files.
```python
train_pipeline = [
    ...
    dict(type='MyTransform'),
    ...
]
```
## Pipeline visualization
After designing data pipelines, you can use the [visualization tools](../useful_tools/dataset_visualization.md) to preview the transformed samples.
# Customize Runtime Settings
The runtime configurations include many helpful functionalities, like checkpoint saving, logger configuration,
etc. In this tutorial, we will introduce how to configure these functionalities.
## Save Checkpoint
The checkpoint saving functionality is a default hook during training, and you can configure it in the
`default_hooks.checkpoint` field.
```{note}
The hook mechanism is widely used in all OpenMMLab libraries. Through hooks, you can plug in many
functionalities without modifying the main execution logic of the runner.
A detailed introduction of hooks can be found in {external+mmengine:doc}`Hooks <tutorials/hook>`.
```
**The default settings**
```python
default_hooks = dict(
    ...
    checkpoint=dict(type='CheckpointHook', interval=1),
    ...
)
```
Here are some usual arguments; all available arguments can be found in the [CheckpointHook](mmengine.hooks.CheckpointHook) docs.
- **`interval`** (int): The saving period. If set to -1, it will never save checkpoints.
- **`by_epoch`** (bool): Whether the **`interval`** is counted by epoch or by iteration. Defaults to `True`.
- **`out_dir`** (str): The root directory to save checkpoints. If not specified, the checkpoints will be saved in the work directory. If specified, the checkpoints will be saved in a sub-folder of **`out_dir`**.
- **`max_keep_ckpts`** (int): The maximum number of checkpoints to keep. In some cases, we want only the latest few checkpoints and would like to delete old ones to save disk space. Defaults to -1, which means unlimited.
- **`save_best`** (str, List[str]): If specified, it will save the checkpoint with the best evaluation result.
Usually, you can simply use `save_best="auto"` to automatically select the evaluation metric.
And if you want more advanced configuration, please refer to the [CheckpointHook docs](tutorials/hook.md#checkpointhook).
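Putting a few of these arguments together, a configuration like the following sketch (using only the arguments listed above) keeps the latest 3 checkpoints plus the best one:
```python
default_hooks = dict(
    checkpoint=dict(
        type='CheckpointHook',
        interval=1,          # save every epoch (by_epoch=True by default)
        max_keep_ckpts=3,    # keep only the latest 3 checkpoints
        save_best='auto'),   # additionally keep the best checkpoint
)
```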
## Load Checkpoint / Resume Training
In config files, you can specify the loading and resuming functionality as below:
```python
# load from which checkpoint
load_from = "Your checkpoint path"
# whether to resume training from the loaded checkpoint
resume = False
```
The `load_from` field can be either a local path or an HTTP path. You can resume training from the checkpoint by
specifying `resume=True`.
```{tip}
You can also enable auto resuming from the latest checkpoint by specifying `load_from=None` and `resume=True`.
Runner will find the latest checkpoint from the work directory automatically.
```
If you are training models with our `tools/train.py` script, you can also use the `--resume` argument to resume
training without modifying the config file manually.
```bash
# Automatically resume from the latest checkpoint.
python tools/train.py configs/resnet/resnet50_8xb32_in1k.py --resume
# Resume from the specified checkpoint.
python tools/train.py configs/resnet/resnet50_8xb32_in1k.py --resume checkpoints/resnet.pth
```
## Randomness Configuration
In the `randomness` field, we provide some options to make the experiment as reproducible as possible.
By default, we do not specify a seed in the config file, and in every experiment, the program will generate a random seed.
**Default settings:**
```python
randomness = dict(seed=None, deterministic=False)
```
To make the experiment more reproducible, you can specify a seed and set `deterministic=True`. The influence
of the `deterministic` option can be found [here](https://pytorch.org/docs/stable/notes/randomness.html#cuda-convolution-benchmarking).
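For example, to make runs as reproducible as possible:
```python
randomness = dict(seed=0, deterministic=True)  # fix the seed and use deterministic ops
```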
## Log Configuration
The log configuration relates to multiple fields.
In the `log_level` field, you can specify the global logging level. See {external+python:ref}`Logging Levels<levels>` for a list of levels.
```python
log_level = 'INFO'
```
In the `default_hooks.logger` field, you can specify the logging interval during training and testing. And all
available arguments can be found in the [LoggerHook docs](tutorials/hook.md#loggerhook).
```python
default_hooks = dict(
    ...
    # Print a log every 100 iterations.
    logger=dict(type='LoggerHook', interval=100),
    ...
)
```
In the `log_processor` field, you can specify the log smoothing method. Usually, we use a window with a length of 10
to smooth the log and output the mean value of all information. If you want to finely configure the smoothing method of
specific information, see the {external+mmengine:doc}`LogProcessor docs <advanced_tutorials/logging>`.
```python
# The default setting, which will smooth the values in training log by a 10-length window.
log_processor = dict(window_size=10)
```
In the `visualizer` field, you can specify multiple backends to save the log information, such as TensorBoard
and WandB. More details can be found in the [Visualizer section](#visualizer).
## Custom Hooks
Many of the above functionalities are implemented by hooks, and you can also plug in other custom hooks by modifying
the `custom_hooks` field. Here are some hooks in MMEngine and MMPretrain that you can use directly, such as:
- [EMAHook](mmpretrain.engine.hooks.EMAHook)
- [SyncBuffersHook](mmengine.hooks.SyncBuffersHook)
- [EmptyCacheHook](mmengine.hooks.EmptyCacheHook)
- [ClassNumCheckHook](mmpretrain.engine.hooks.ClassNumCheckHook)
- ...
For example, EMA (Exponential Moving Average) is widely used in model training, and you can enable it as
below:
below:
```python
custom_hooks = [
    dict(type='EMAHook', momentum=4e-5, priority='ABOVE_NORMAL'),
]
```
## Visualize Validation
The validation visualization functionality is a default hook during validation, and you can configure it in the
`default_hooks.visualization` field.
By default, it is disabled, and you can enable it by specifying `enable=True`. More arguments can be found in
the [VisualizationHook docs](mmpretrain.engine.hooks.VisualizationHook).
```python
default_hooks = dict(
    ...
    visualization=dict(type='VisualizationHook', enable=False),
    ...
)
```
This hook will select some images in the validation dataset, and tag the prediction results on these images
during every validation run. You can use it to watch how the model performance on actual images varies
during training.
In addition, if the images in your validation dataset are small (\<100 pixels), you can rescale them before
visualization by specifying `rescale_factor=2.` or higher.
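For instance, to enable the hook and enlarge small validation images before drawing:
```python
default_hooks = dict(
    visualization=dict(type='VisualizationHook', enable=True, rescale_factor=2.),
)
```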
## Visualizer
The visualizer is used to record all kinds of information during training and test, including logs, images and
scalars. By default, the recorded information will be saved at the `vis_data` folder under the work directory.
**Default settings:**
```python
visualizer = dict(
    type='UniversalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
    ]
)
```
Usually, the most useful function is to save the log and scalars like `loss` to different backends.
For example, to save them to TensorBoard, simply set them as below:
```python
visualizer = dict(
    type='UniversalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
        dict(type='TensorboardVisBackend'),
    ]
)
```
Or save them to WandB as below:
```python
visualizer = dict(
    type='UniversalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
        dict(type='WandbVisBackend'),
    ]
)
```
## Environment Configuration
In the `env_cfg` field, you can configure some low-level parameters, like cuDNN, multi-process, and distributed
communication.
**Please make sure you understand the meaning of these parameters before modifying them.**
```python
env_cfg = dict(
    # Whether to enable cudnn benchmark.
    cudnn_benchmark=False,

    # Set multi-process parameters.
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),

    # Set distributed parameters.
    dist_cfg=dict(backend='nccl'),
)
```
# Customize Training Schedule
In our codebase, [default training schedules](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/schedules) have been provided for common datasets such as CIFAR, ImageNet, etc. If we experiment on these datasets for higher accuracy, or on new methods and datasets, we may need to modify the strategies.
In this tutorial, we will introduce how to modify configs to construct optimizers, use fine-grained parameter-wise configuration, gradient clipping and gradient accumulation, as well as customize learning rate and momentum schedules. Furthermore, we introduce a template for customizing self-implemented optimization methods for the project.
## Customize optimization
We use the `optim_wrapper` field to configure the optimization strategies, including the choice of optimizer, automatic mixed precision training, parameter-wise configurations, and gradient clipping and accumulation. Details are given below.
### Use optimizers supported by PyTorch
We support all the optimizers implemented by PyTorch, and to use them, please change the `optimizer` field of config files.
For example, if you want to use [`SGD`](torch.optim.SGD), the modification in the config file could be as follows. Notice that optimization-related settings should all be wrapped inside `optim_wrapper`.
```python
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.0003, weight_decay=0.0001)
)
```
```{note}
The `type` in the optimizer config is not a constructor but an optimizer name in PyTorch.
Refer to the {external+torch:ref}`List of optimizers supported by PyTorch <optim:algorithms>` for more choices.
```
To modify the learning rate of the model, just modify the `lr` in the config of optimizer.
You can also directly set other arguments according to the [API doc](torch.optim) of PyTorch.
For example, if you want to use [`Adam`](torch.optim.Adam) with settings like `torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)` in PyTorch, you could use the config below:
```python
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(
        type='Adam',
        lr=0.001,
        betas=(0.9, 0.999),
        eps=1e-08,
        weight_decay=0,
        amsgrad=False),
)
```
````{note}
The default type of the `optim_wrapper` field is [`OptimWrapper`](mmengine.optim.OptimWrapper); therefore, you can
usually omit the type field, like:
```python
optim_wrapper = dict(
    optimizer=dict(
        type='Adam',
        lr=0.001,
        betas=(0.9, 0.999),
        eps=1e-08,
        weight_decay=0,
        amsgrad=False))
```
````
### Use AMP training
If we want to use the automatic mixed precision training, we can simply change the type of `optim_wrapper` to `AmpOptimWrapper` in config files.
```python
optim_wrapper = dict(type='AmpOptimWrapper', optimizer=...)
```
Alternatively, for convenience, we can set the `--amp` parameter to turn on the AMP option directly in the `tools/train.py` script. Refer to the [Training tutorial](../user_guides/train.md) for details on starting training.
### Fine-grained parameter-wise configuration
Some models may have parameter-specific settings for optimization, for example, no weight decay for the BatchNorm layers, or different learning rates for different network layers.
To configure them finely, we can use the `paramwise_cfg` argument in `optim_wrapper`.
- **Set different hyper-parameter multipliers for different types of parameters.**
For instance, we can set `norm_decay_mult=0.` in `paramwise_cfg` to change the weight decay of the weights and biases of normalization layers to zero.
```python
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.8, weight_decay=1e-4),
    paramwise_cfg=dict(norm_decay_mult=0.))
```
More types of parameters are supported, as listed below (a combined example follows the list):
- `bias_lr_mult`: Multiplier for the learning rate of biases (excluding the biases of normalization layers and the offsets of deformable convolution layers). Defaults to 1.
- `bias_decay_mult`: Multiplier for the weight decay of biases (excluding the biases of normalization layers and the offsets of deformable convolution layers). Defaults to 1.
- `norm_decay_mult`: Multiplier for the weight decay of the weights and biases of normalization layers. Defaults to 1.
- `flat_decay_mult`: Multiplier for the weight decay of all one-dimensional parameters. Defaults to 1.
- `dwconv_decay_mult`: Multiplier for the weight decay of depth-wise convolution layers. Defaults to 1.
- `bypass_duplicate`: Whether to bypass duplicated parameters. Defaults to `False`.
- `dcn_offset_lr_mult`: Multiplier for the learning rate of deformable convolution layers. Defaults to 1.
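As a combined illustration, the sketch below disables weight decay for normalization layers, biases and all one-dimensional parameters at once, using only the arguments listed above:
```python
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.8, weight_decay=1e-4),
    paramwise_cfg=dict(
        norm_decay_mult=0.,   # no weight decay for normalization layers
        bias_decay_mult=0.,   # no weight decay for biases
        flat_decay_mult=0.))  # no weight decay for 1-D parameters
```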
- **Set different hyper-parameter multipliers for specific parameters.**
MMPretrain can use `custom_keys` in `paramwise_cfg` to specify that different parameters use different learning rates or weight decays.
For example, to set all learning rates and weight decays of `backbone.layer0` to 0, keep the rest of `backbone` the same as the optimizer settings, and set the learning rate of `head` to 0.001 (the base 0.01 multiplied by `lr_mult=0.1`), use the configs below.
```python
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, weight_decay=0.0001),
    paramwise_cfg=dict(
        custom_keys={
            'backbone.layer0': dict(lr_mult=0, decay_mult=0),
            'backbone': dict(lr_mult=1),
            'head': dict(lr_mult=0.1)
        }))
```
### Gradient clipping
During the training process, the loss function may approach a steep region and cause gradient explosion, and gradient clipping is helpful to stabilize the training process. More introduction can be found on [this page](https://paperswithcode.com/method/gradient-clipping).
Currently, we support the `clip_grad` option in `optim_wrapper` for gradient clipping; refer to the [PyTorch documentation](torch.nn.utils.clip_grad_norm_).
Here is an example:
```python
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, weight_decay=0.0001),
    # norm_type: type of the used p-norm, here norm_type is 2.
    clip_grad=dict(max_norm=35, norm_type=2))
```
### Gradient accumulation
When computing resources are limited, the batch size can only be set to a small value, which may affect the performance of models. Gradient accumulation can be used to work around this problem. We support the `accumulative_counts` option in `optim_wrapper` for gradient accumulation.
Here is an example:
```python
train_dataloader = dict(batch_size=64)
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, weight_decay=0.0001),
    accumulative_counts=4)
```
This indicates that during training, back-propagation is performed once every 4 iterations, and the above is equivalent to:
```python
train_dataloader = dict(batch_size=256)
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, weight_decay=0.0001))
```
## Customize parameter schedules
In training, optimization parameters such as learning rate and momentum are usually not fixed but change through iterations or epochs. PyTorch supports several learning rate schedulers, but they are not sufficient for complex strategies. In MMPretrain, we provide `param_scheduler` for better control of different parameter schedules.
### Customize learning rate schedules
Learning rate schedulers are widely used to improve performance. We support most of the PyTorch schedulers, including `ExponentialLR`, `LinearLR`, `StepLR`, `MultiStepLR`, etc.
All available learning rate schedulers can be found {external+mmengine:doc}`here <api/optim>`, and the
names of learning rate schedulers end with `LR`.
- **Single learning rate schedule**
In most cases, we use only one learning rate schedule for simplicity. For instance, [`MultiStepLR`](mmengine.optim.MultiStepLR) is used as the default learning rate schedule for ResNet. Here, `param_scheduler` is a dictionary.
```python
param_scheduler = dict(
    type='MultiStepLR',
    by_epoch=True,
    milestones=[100, 150],
    gamma=0.1)
```
Or, if we want to use the [`CosineAnnealingLR`](mmengine.optim.CosineAnnealingLR) scheduler to decay the learning rate:
```python
param_scheduler = dict(
    type='CosineAnnealingLR',
    by_epoch=True,
    T_max=num_epochs)
```
- **Multiple learning rate schedules**
In some training cases, multiple learning rate schedules are applied for higher accuracy. For example, in the early stage, training tends to be volatile, and warmup is a technique to reduce this volatility.
The learning rate will increase gradually from a small value to the expected value during warmup, and decay afterwards by other schedules.
In MMPretrain, simply combining the desired schedules in `param_scheduler` as a list achieves the warmup strategy.
Here are some examples:
1. Linear warmup during the first 50 iterations.
```python
param_scheduler = [
    # Linear warm-up by iterations.
    dict(type='LinearLR',
         start_factor=0.001,
         by_epoch=False,  # by iterations
         end=50),  # only warm up for the first 50 iterations
    # The main learning rate schedule.
    dict(type='MultiStepLR',
         by_epoch=True,
         milestones=[8, 11],
         gamma=0.1)
]
```
2. Linear warmup and update the lr by iteration during the first 10 epochs.
```python
param_scheduler = [
    # Linear warm-up by epochs in the [0, 10) epochs.
    dict(type='LinearLR',
         start_factor=0.001,
         by_epoch=True,
         end=10,
         convert_to_iter_based=True,  # Update the learning rate by iteration.
    ),
    # Use the CosineAnnealing schedule after 10 epochs.
    dict(type='CosineAnnealingLR', by_epoch=True, begin=10)
]
```
Notice that we use the `begin` and `end` arguments here to assign the valid range, which is [`begin`, `end`), for each schedule. The range unit is defined by the `by_epoch` argument. If not specified, `begin` is 0 and `end` is the max number of epochs or iterations.
If the ranges of all schedules are not continuous, the learning rate will stay constant in the ignored ranges; otherwise, all valid schedulers will be executed in order within a specific stage, which behaves the same as the PyTorch [`ChainedScheduler`](torch.optim.lr_scheduler.ChainedScheduler).
```{tip}
To check that the learning rate curve is as expected, after completing your configuration file, you could use the [optimizer parameter visualization tool](../useful_tools/scheduler_visualization.md) to draw the corresponding learning rate adjustment curve.
```
### Customize momentum schedules
We support using momentum schedulers to modify the optimizer's momentum according to the learning rate, which could make the loss converge faster. The usage is the same as for learning rate schedulers.
All available momentum schedulers can be found {external+mmengine:doc}`here <api/optim>`, and the
names of momentum schedulers end with `Momentum`.
Here is an example:
```python
param_scheduler = [
    # The lr scheduler.
    dict(type='LinearLR', ...),
    # The momentum scheduler.
    dict(type='LinearMomentum',
         start_factor=0.001,
         by_epoch=False,
         begin=0,
         end=1000)
]
```
## Add new optimizers or constructors
```{note}
This part modifies the MMPretrain source code or adds code to the MMPretrain framework; beginners can skip it.
```
### Add new optimizers
In academic research and industrial practice, it may be necessary to use optimization methods not implemented by MMPretrain, and you can add them through the following steps.
1. Implement a New Optimizer
Assume you want to add an optimizer named `MyOptimizer`, which has arguments `a`, `b`, and `c`.
You need to create a new file under `mmpretrain/engine/optimizers`, and implement the new optimizer there, for example, in `mmpretrain/engine/optimizers/my_optimizer.py`:
```python
from torch.optim import Optimizer

from mmpretrain.registry import OPTIMIZERS


@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):

    def __init__(self, a, b, c):
        ...

    def step(self, closure=None):
        ...
```
2. Import the Optimizer
To find the module defined above, it needs to be imported when the program runs.
Import it in `mmpretrain/engine/optimizers/__init__.py` to add it into the `mmpretrain.engine` package.
```python
# In mmpretrain/engine/optimizers/__init__.py
...
from .my_optimizer import MyOptimizer  # 'MyOptimizer' may be any other class name

__all__ = [..., 'MyOptimizer']
```
At runtime, the `mmpretrain.engine` package will be imported automatically, registering `MyOptimizer` at the same time.
3. Specify the Optimizer in Config
Then you can use `MyOptimizer` in the `optim_wrapper.optimizer` field of config files.
```python
optim_wrapper = dict(
    optimizer=dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value))
```
### Add new optimizer constructors
Some models may have parameter-specific settings for optimization, like different weight decay rates for all `BatchNorm` layers.
Although we can already use [the `optim_wrapper.paramwise_cfg` field](#fine-grained-parameter-wise-configuration) to
configure various parameter-specific optimizer settings, it may still not cover your needs.
Of course, you can modify it. By default, we use the [`DefaultOptimWrapperConstructor`](mmengine.optim.DefaultOptimWrapperConstructor)
class to deal with the construction of the optimizer. During the construction, it finely configures the optimizer settings of
different parameters according to `paramwise_cfg`, and it can also serve as a template for new optimizer constructors.
You can override these behaviors by adding new optimizer constructors.
```python
# In mmpretrain/engine/optimizers/my_optim_constructor.py
from mmengine.optim import DefaultOptimWrapperConstructor

from mmpretrain.registry import OPTIM_WRAPPER_CONSTRUCTORS


@OPTIM_WRAPPER_CONSTRUCTORS.register_module()
class MyOptimWrapperConstructor:

    def __init__(self, optim_wrapper_cfg, paramwise_cfg=None):
        ...

    def __call__(self, model):
        ...
```
Here is a specific example: [LearningRateDecayOptimWrapperConstructor](mmpretrain.engine.optimizers.LearningRateDecayOptimWrapperConstructor).
Then, import it and use it almost the same way as in [Add new optimizers](#add-new-optimizers).
1. Import it in the `mmpretrain/engine/optimizers/__init__.py` to add it into the `mmpretrain.engine` package.
```python
# In mmpretrain/engine/optimizers/__init__.py
...
from .my_optim_constructor import MyOptimWrapperConstructor
__all__ = [..., 'MyOptimWrapperConstructor']
```
2. Use `MyOptimWrapperConstructor` in the `optim_wrapper.constructor` field of config files.
```python
optim_wrapper = dict(
    constructor=dict(type='MyOptimWrapperConstructor'),
    optimizer=...,
    paramwise_cfg=...,
)
```
.. role:: hidden
    :class: hidden-section

.. module:: mmpretrain.apis

mmpretrain.apis
===================================

These are some high-level APIs for classification tasks.

.. contents:: mmpretrain.apis
   :depth: 2
   :local:
   :backlinks: top

Model
------------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    list_models
    get_model

Inference
------------------

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: callable.rst

    ImageClassificationInferencer
    ImageRetrievalInferencer
    ImageCaptionInferencer
    VisualQuestionAnsweringInferencer
    VisualGroundingInferencer
    TextToImageRetrievalInferencer
    ImageToTextRetrievalInferencer
    NLVRInferencer
    FeatureExtractor

.. autosummary::
    :toctree: generated
    :nosignatures:

    inference_model
.. role:: hidden
    :class: hidden-section

Data Process
=================

In MMPreTrain, the data process and the dataset are decoupled. The
datasets only define how to get samples' basic information from the file
system. This basic information includes the ground-truth label and the raw
image data / the paths of images. The data process includes data transforms,
data preprocessors and batch augmentations.

- :mod:`Data Transforms <mmpretrain.datasets.transforms>`: Transforms include loading, preprocessing, formatting, etc.
- :mod:`Data Preprocessors <mmpretrain.models.utils.data_preprocessor>`: Processes include collation, normalization, stacking, channel flipping, etc.
- :mod:`Batch Augmentations <mmpretrain.models.utils.batch_augments>`: Batch augmentation involves multiple samples, such as Mixup and CutMix.
.. module:: mmpretrain.datasets.transforms
Data Transforms
--------------------
To prepare the input data, we need to apply some transforms to this basic
information, including loading, preprocessing and formatting. A series of
data transforms makes up a data pipeline. Therefore, you can find a
``pipeline`` argument in the dataset configs, for example:
.. code:: python
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='RandomResizedCrop', scale=224),
dict(type='RandomFlip', prob=0.5, direction='horizontal'),
dict(type='PackInputs'),
]
train_dataloader = dict(
....
dataset=dict(
pipeline=train_pipeline,
....),
....
)
Every item in the pipeline list is one of the following data transform classes. If you want to add a custom data transform class, the tutorial :doc:`Custom Data Pipelines </advanced_guides/pipeline>` will help you.
.. contents::
:depth: 1
:local:
:backlinks: top
Loading and Formatting
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autosummary::
:toctree: generated
:nosignatures:
:template: data_transform.rst
LoadImageFromFile
PackInputs
PackMultiTaskInputs
PILToNumpy
NumpyToPIL
Transpose
Collect
Processing and Augmentation
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autosummary::
:toctree: generated
:nosignatures:
:template: data_transform.rst
Albumentations
CenterCrop
ColorJitter
EfficientNetCenterCrop
EfficientNetRandomCrop
Lighting
Normalize
RandomCrop
RandomErasing
RandomFlip
RandomGrayscale
RandomResize
RandomResizedCrop
Resize
ResizeEdge
BEiTMaskGenerator
SimMIMMaskGenerator
Composed Augmentation
"""""""""""""""""""""
Composed augmentations are methods that compose a series of data
augmentation transforms, such as ``AutoAugment`` and ``RandAugment``.
.. autosummary::
:toctree: generated
:nosignatures:
:template: data_transform.rst
AutoAugment
RandAugment
The above transforms are composed of a group of policies chosen from the
following random transforms:
.. autosummary::
:toctree: generated
:nosignatures:
:template: data_transform.rst
AutoContrast
Brightness
ColorTransform
Contrast
Cutout
Equalize
GaussianBlur
Invert
Posterize
Rotate
Sharpness
Shear
Solarize
SolarizeAdd
Translate
BaseAugTransform
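For example, a typical ``RandAugment`` configuration in a training pipeline looks like the following sketch (based on common MMPreTrain configs; the hyper-parameter values are illustrative):

.. code:: python

    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='RandomResizedCrop', scale=224),
        dict(
            type='RandAugment',
            policies='timm_increasing',  # a predefined policy set
            num_policies=2,              # number of policies applied per image
            total_level=10,
            magnitude_level=9),
        dict(type='PackInputs'),
    ]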
MMCV transforms
^^^^^^^^^^^^^^^
MMCV also provides many transforms that you can use directly in config files. The whole transform list can be found in :external+mmcv:doc:`api/transforms`.
Transform Wrapper
^^^^^^^^^^^^^^^^^
.. autosummary::
:toctree: generated
:nosignatures:
:template: data_transform.rst
MultiView
TorchVision Transforms
^^^^^^^^^^^^^^^^^^^^^^
We also support all the transforms in TorchVision. You can use them like the following examples:

**1. Use some TorchVision transforms wrapped by NumpyToPIL and PILToNumpy (Recommended)**

Wrap the TorchVision transforms between ``dict(type='NumpyToPIL', to_rgb=True)`` and ``dict(type='PILToNumpy', to_bgr=True)``:
.. code:: python
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='NumpyToPIL', to_rgb=True), # from BGR in cv2 to RGB in PIL
    dict(type='torchvision/RandomResizedCrop', size=176),
dict(type='PILToNumpy', to_bgr=True), # from RGB in PIL to BGR in cv2
dict(type='RandomFlip', prob=0.5, direction='horizontal'),
dict(type='PackInputs'),
]
data_preprocessor = dict(
num_classes=1000,
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True, # from BGR in cv2 to RGB in PIL
)
**2. Use TorchVision Augs and ToTensor&Normalize**
Make sure the ``'img'`` has been converted from the BGR NumPy format to the PIL format before it is processed by the TorchVision transforms.
.. code:: python
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='NumpyToPIL', to_rgb=True), # from BGR in cv2 to RGB in PIL
dict(
type='torchvision/RandomResizedCrop',
size=176,
interpolation='bilinear'), # accept str format interpolation mode
dict(type='torchvision/RandomHorizontalFlip', p=0.5),
dict(
type='torchvision/TrivialAugmentWide',
interpolation='bilinear'),
dict(type='torchvision/PILToTensor'),
dict(type='torchvision/ConvertImageDtype', dtype=torch.float),
dict(
type='torchvision/Normalize',
mean=(0.485, 0.456, 0.406),
std=(0.229, 0.224, 0.225),
),
dict(type='torchvision/RandomErasing', p=0.1),
dict(type='PackInputs'),
]
data_preprocessor = dict(num_classes=1000, mean=None, std=None, to_rgb=False) # Normalize in dataset pipeline
**3. Use TorchVision Augs Except ToTensor&Normalize**
.. code:: python
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='NumpyToPIL', to_rgb=True), # from BGR in cv2 to RGB in PIL
dict(type='torchvision/RandomResizedCrop', size=176, interpolation='bilinear'),
dict(type='torchvision/RandomHorizontalFlip', p=0.5),
dict(type='torchvision/TrivialAugmentWide', interpolation='bilinear'),
dict(type='PackInputs'),
]
# here the Normalize parameters are for images in RGB format
data_preprocessor = dict(
num_classes=1000,
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=False,
)
.. module:: mmpretrain.models.utils.data_preprocessor

Data Preprocessors
------------------

The data preprocessor is also a component that processes data before feeding it to the neural network.
Compared with data transforms, the data preprocessor is a module of the classifier,
and it processes a whole batch of data at once, which means it can run on the GPU and use batching to accelerate processing.
The default data preprocessor in MMPreTrain performs the following pre-processing:
1. Move data to the target device.
2. Pad inputs to the maximum size of the current batch.
3. Stack inputs into a batch.
4. Convert inputs from BGR to RGB if the shape of the input is (3, H, W).
5. Normalize images with the defined std and mean.
6. Apply batch augmentations like Mixup and CutMix during training.
You can configure the data preprocessor by the ``data_preprocessor`` field or ``model.data_preprocessor`` field in the config file. Typical usages are as below:
.. code-block:: python
data_preprocessor = dict(
# RGB format normalization parameters
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True, # convert image from BGR to RGB
)
Or define it in the ``model.data_preprocessor`` field as follows:
.. code-block:: python
model = dict(
    backbone=...,
    neck=...,
    head=...,
    data_preprocessor=dict(
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    train_cfg=...,
)
Note that the ``model.data_preprocessor`` has higher priority than ``data_preprocessor``.
.. autosummary::
:toctree: generated
:nosignatures:
ClsDataPreprocessor
SelfSupDataPreprocessor
TwoNormDataPreprocessor
VideoDataPreprocessor
.. module:: mmpretrain.models.utils.batch_augments
Batch Augmentations
^^^^^^^^^^^^^^^^^^^^
The batch augmentation is a component of the data preprocessor. It involves multiple samples and mixes them in some way, such as Mixup and CutMix.
These augmentations are usually only used during training; therefore, we configure them in the ``model.train_cfg`` field in config files.
.. code-block:: python
model = dict(
backbone=...,
neck=...,
head=...,
train_cfg=dict(augments=[
dict(type='Mixup', alpha=0.8),
dict(type='CutMix', alpha=1.0),
]),
)
You can also specify the probability of each batch augmentation with the ``probs`` field.
.. code-block:: python
model = dict(
backbone=...,
neck=...,
head=...,
train_cfg=dict(augments=[
dict(type='Mixup', alpha=0.8),
dict(type='CutMix', alpha=1.0),
], probs=[0.3, 0.7])
)
Here is a list of the batch augmentations that can be used in MMPreTrain.
.. autosummary::
:toctree: generated
:nosignatures:
:template: callable.rst
Mixup
CutMix
ResizeMix
.. role:: hidden
:class: hidden-section
.. module:: mmpretrain.datasets
mmpretrain.datasets
===================================
The ``datasets`` package contains several common datasets for image classification tasks, as well as some dataset wrappers.
.. contents:: mmpretrain.datasets
:depth: 2
:local:
:backlinks: top
Custom Dataset
--------------
.. autoclass:: CustomDataset
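For instance, a minimal config sketch using ``CustomDataset`` with a sub-folder-per-class layout (the paths are placeholders):

.. code-block:: python

    train_dataloader = dict(
        batch_size=32,
        dataset=dict(
            type='CustomDataset',
            data_root='data/my_dataset/train',  # hypothetical path
            with_label=True,  # sub-folder names are used as class names
            pipeline=train_pipeline,
        ),
    )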
ImageNet
--------
.. autoclass:: ImageNet
.. autoclass:: ImageNet21k
CIFAR
-----
.. autoclass:: CIFAR10
.. autoclass:: CIFAR100
MNIST
-----
.. autoclass:: MNIST
.. autoclass:: FashionMNIST
VOC
---
.. autoclass:: VOC
CUB
---
.. autoclass:: CUB
Places205
---------
.. autoclass:: Places205
Retrieval
---------
.. autoclass:: InShop
Base classes
------------
.. autoclass:: BaseDataset
.. autoclass:: MultiLabelDataset
Caltech101
----------------
.. autoclass:: Caltech101
Food101
----------------
.. autoclass:: Food101
DTD
----------------
.. autoclass:: DTD
FGVCAircraft
----------------
.. autoclass:: FGVCAircraft
Flowers102
----------------
.. autoclass:: Flowers102
StanfordCars
----------------
.. autoclass:: StanfordCars
OxfordIIITPet
----------------
.. autoclass:: OxfordIIITPet
SUN397
----------------
.. autoclass:: SUN397
RefCOCO
--------
.. autoclass:: RefCOCO
Dataset Wrappers
----------------
.. autoclass:: KFoldDataset
The dataset wrappers in MMEngine can be used directly in MMPreTrain.
.. list-table::
* - :class:`~mmengine.dataset.ConcatDataset`
- A wrapper of concatenated dataset.
* - :class:`~mmengine.dataset.RepeatDataset`
- A wrapper of repeated dataset.
* - :class:`~mmengine.dataset.ClassBalancedDataset`
- A wrapper of class balanced dataset.
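For example, a config sketch that repeats a dataset 10 times per epoch with ``RepeatDataset``:

.. code-block:: python

    train_dataloader = dict(
        dataset=dict(
            type='RepeatDataset',
            times=10,
            dataset=dict(
                type='CIFAR10',
                data_root='data/cifar10',
                pipeline=train_pipeline,
            ),
        ),
    )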