add part code

Commit 495d9ed9 authored Jun 24, 2025 by limm
20 changed files
--- a/configs/seresnet/seresnext50-32x4d_8xb32_in1k.py
+++ b/configs/seresnet/seresnext50-32x4d_8xb32_in1k.py
+_base_ = [
+    '../_base_/models/seresnext50_32x4d.py',
+    '../_base_/datasets/imagenet_bs32_pil_resize.py',
+    '../_base_/schedules/imagenet_bs256.py', '../_base_/default_runtime.py'
+]
--- a/configs/shufflenet_v1/README.md
+++ b/configs/shufflenet_v1/README.md
+# Shufflenet V1
+
+> [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](https://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_ShuffleNet_An_Extremely_CVPR_2018_paper.html)
+
+<!-- [ALGORITHM] -->
+
+## Abstract
+
+We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g. lower top-1 error (absolute 7.8%) than recent MobileNet on ImageNet classification task, under the computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet achieves ~13x actual speedup over AlexNet while maintaining comparable accuracy.
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/26739999/142575730-dc2f616d-80df-4fb1-93e1-77ebb2b835cf.png" width="70%"/>
+</div>
+
+## How to use it?
+
+<!-- [TABS-BEGIN] -->
+
+**Predict image**
+
+```python
+from mmpretrain import inference_model
+
+predict = inference_model('shufflenet-v1-1x_16xb64_in1k', 'demo/bird.JPEG')
+print(predict['pred_class'])
+print(predict['pred_score'])
+```
+
+**Use the model**
+
+```python
+import torch
+from mmpretrain import get_model
+
+model = get_model('shufflenet-v1-1x_16xb64_in1k', pretrained=True)
+inputs = torch.rand(1, 3, 224, 224)
+out = model(inputs)
+print(type(out))
+# To extract features.
+feats = model.extract_feat(inputs)
+print(type(feats))
+```
+
+**Train/Test Command**
+
+Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset).
+
+Train:
+
+```shell
+python tools/train.py configs/shufflenet_v1/shufflenet-v1-1x_16xb64_in1k.py
+```
+
+Test:
+
+```shell
+python tools/test.py configs/shufflenet_v1/shufflenet-v1-1x_16xb64_in1k.py https://download.openmmlab.com/mmclassification/v0/shufflenet_v1/shufflenet_v1_batch1024_imagenet_20200804-5d6cec73.pth
+```
+
+<!-- [TABS-END] -->
+
+## Models and results
+
+### Image Classification on ImageNet-1k
+
+| Model                          |   Pretrain   | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) |                  Config                   |                                     Download                                     |
+| :----------------------------- | :----------: | :--------: | :-------: | :-------: | :-------: | :---------------------------------------: | :------------------------------------------------------------------------------: |
+| `shufflenet-v1-1x_16xb64_in1k` | From scratch |    1.87    |   0.15    |   68.13   |   87.81   | [config](shufflenet-v1-1x_16xb64_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/shufflenet_v1/shufflenet_v1_batch1024_imagenet_20200804-5d6cec73.pth) \| [log](https://download.openmmlab.com/mmclassification/v0/shufflenet_v1/shufflenet_v1_batch1024_imagenet_20200804-5d6cec73.json) |
+
+## Citation
+
+```bibtex
+@inproceedings{zhang2018shufflenet,
+  title={Shufflenet: An extremely efficient convolutional neural network for mobile devices},
+  author={Zhang, Xiangyu and Zhou, Xinyu and Lin, Mengxiao and Sun, Jian},
+  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
+  pages={6848--6856},
+  year={2018}
+}
+```
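Review note on the README above: the abstract names channel shuffle as one of ShuffleNet's two core operations. The sketch below illustrates the idea on a plain Python list standing in for the channel axis; it is not the backbone code in this repository, and the function name `channel_shuffle` is chosen here for illustration. The channel axis is viewed as a `(groups, channels_per_group)` grid, transposed, and flattened, so that each group in the next grouped convolution receives channels from every previous group.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (illustrative, list-based).

    `channels` is a flat list standing in for the channel axis of a
    feature map; tensor implementations apply the same
    reshape/transpose/flatten to a 4-D tensor.
    """
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    # View the flat channel list as a (groups, per_group) grid.
    grid = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # Read the grid column by column, i.e. transpose and flatten.
    return [grid[g][i] for i in range(per_group) for g in range(groups)]


shuffled = channel_shuffle(list(range(6)), groups=3)
print(shuffled)
```

With six channels in three groups, the first output group holds one channel from each input group.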
--- a/configs/shufflenet_v1/metafile.yml
+++ b/configs/shufflenet_v1/metafile.yml
+Collections:
+  - Name: Shufflenet V1
+    Metadata:
+      Training Data: ImageNet-1k
+      Training Techniques:
+        - SGD with Momentum
+        - Weight Decay
+        - No BN decay
+      Training Resources: 8x 1080 GPUs
+      Epochs: 300
+      Batch Size: 1024
+      Architecture:
+        - Shufflenet V1
+    Paper:
+      URL: https://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_ShuffleNet_An_Extremely_CVPR_2018_paper.html
+      Title: "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices"
+    README: configs/shufflenet_v1/README.md
+    Code:
+      URL: https://github.com/open-mmlab/mmpretrain/blob/v0.15.0/mmcls/models/backbones/shufflenet_v1.py#L152
+      Version: v0.15.0
+
+Models:
+  - Name: shufflenet-v1-1x_16xb64_in1k
+    Metadata:
+      FLOPs: 146000000
+      Parameters: 1870000
+    In Collection: Shufflenet V1
+    Results:
+      - Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 68.13
+          Top 5 Accuracy: 87.81
+        Task: Image Classification
+    Weights: https://download.openmmlab.com/mmclassification/v0/shufflenet_v1/shufflenet_v1_batch1024_imagenet_20200804-5d6cec73.pth
+    Config: configs/shufflenet_v1/shufflenet-v1-1x_16xb64_in1k.py
--- a/configs/shufflenet_v1/shufflenet-v1-1x_16xb64_in1k.py
+++ b/configs/shufflenet_v1/shufflenet-v1-1x_16xb64_in1k.py
+_base_ = [
+    '../_base_/models/shufflenet_v1_1x.py',
+    '../_base_/datasets/imagenet_bs64_pil_resize.py',
+    '../_base_/schedules/imagenet_bs1024_linearlr_bn_nowd.py',
+    '../_base_/default_runtime.py'
+]
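Review note: the `16xb64` field in the config name appears to encode 16 GPUs with 64 samples each, which agrees with `Batch Size: 1024` in the metafile and with the `imagenet_bs1024_*` schedule. The helper below is a hypothetical consistency check, not part of the repository; it assumes every config name carries exactly one `<gpus>xb<batch>` field delimited by `_` or `-`.

```python
import re


def total_batch_size(config_name):
    """Return GPUs x per-GPU batch size parsed from a name like '16xb64'."""
    match = re.search(r"(?:^|[_-])(\d+)xb(\d+)(?=[_-]|$)", config_name)
    if match is None:
        raise ValueError(f"no '<gpus>xb<batch>' field in {config_name!r}")
    gpus, per_gpu = int(match.group(1)), int(match.group(2))
    return gpus * per_gpu


for name in (
    "shufflenet-v1-1x_16xb64_in1k",
    "simclr_resnet50_16xb256-coslr-200e_in1k",
    "seresnext50-32x4d_8xb32_in1k",
):
    print(name, total_batch_size(name))
```

A check like this could be run against each metafile entry to catch a name and `Batch Size` that drift apart.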
--- a/configs/shufflenet_v2/README.md
+++ b/configs/shufflenet_v2/README.md
+# Shufflenet V2
+
+> [ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design](https://openaccess.thecvf.com/content_ECCV_2018/papers/Ningning_Light-weight_CNN_Architecture_ECCV_2018_paper.pdf)
+
+<!-- [ALGORITHM] -->
+
+## Abstract
+
+Currently, the neural network architecture design is mostly guided by the *indirect* metric of computation complexity, i.e., FLOPs. However, the *direct* metric, e.g., speed, also depends on the other factors such as memory access cost and platform characteristics. Thus, this work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs. Based on a series of controlled experiments, this work derives several practical *guidelines* for efficient network design. Accordingly, a new architecture is presented, called *ShuffleNet V2*. Comprehensive ablation experiments verify that our model is the state-of-the-art in terms of speed and accuracy tradeoff.
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/26739999/142576336-e0db2866-3add-44e6-a792-14d4f11bd983.png" width="80%"/>
+</div>
+
+## How to use it?
+
+<!-- [TABS-BEGIN] -->
+
+**Predict image**
+
+```python
+from mmpretrain import inference_model
+
+predict = inference_model('shufflenet-v2-1x_16xb64_in1k', 'demo/bird.JPEG')
+print(predict['pred_class'])
+print(predict['pred_score'])
+```
+
+**Use the model**
+
+```python
+import torch
+from mmpretrain import get_model
+
+model = get_model('shufflenet-v2-1x_16xb64_in1k', pretrained=True)
+inputs = torch.rand(1, 3, 224, 224)
+out = model(inputs)
+print(type(out))
+# To extract features.
+feats = model.extract_feat(inputs)
+print(type(feats))
+```
+
+**Train/Test Command**
+
+Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset).
+
+Train:
+
+```shell
+python tools/train.py configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py
+```
+
+Test:
+
+```shell
+python tools/test.py configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py https://download.openmmlab.com/mmclassification/v0/shufflenet_v2/shufflenet_v2_batch1024_imagenet_20200812-5bf4721e.pth
+```
+
+<!-- [TABS-END] -->
+
+## Models and results
+
+### Image Classification on ImageNet-1k
+
+| Model                          |   Pretrain   | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) |                  Config                   |                                     Download                                     |
+| :----------------------------- | :----------: | :--------: | :-------: | :-------: | :-------: | :---------------------------------------: | :------------------------------------------------------------------------------: |
+| `shufflenet-v2-1x_16xb64_in1k` | From scratch |    2.28    |   0.15    |   69.55   |   88.92   | [config](shufflenet-v2-1x_16xb64_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/shufflenet_v2/shufflenet_v2_batch1024_imagenet_20200812-5bf4721e.pth) \| [log](https://download.openmmlab.com/mmclassification/v0/shufflenet_v2/shufflenet_v2_batch1024_imagenet_20200812-5bf4721e.json) |
+
+## Citation
+
+```bibtex
+@inproceedings{ma2018shufflenet,
+  title={Shufflenet v2: Practical guidelines for efficient cnn architecture design},
+  author={Ma, Ningning and Zhang, Xiangyu and Zheng, Hai-Tao and Sun, Jian},
+  booktitle={Proceedings of the European conference on computer vision (ECCV)},
+  pages={116--131},
+  year={2018}
+}
+```
--- a/configs/shufflenet_v2/metafile.yml
+++ b/configs/shufflenet_v2/metafile.yml
+Collections:
+  - Name: Shufflenet V2
+    Metadata:
+      Training Data: ImageNet-1k
+      Training Techniques:
+        - SGD with Momentum
+        - Weight Decay
+        - No BN decay
+      Training Resources: 8x 1080 GPUs
+      Epochs: 300
+      Batch Size: 1024
+      Architecture:
+        - Shufflenet V2
+    Paper:
+      URL: https://openaccess.thecvf.com/content_ECCV_2018/papers/Ningning_Light-weight_CNN_Architecture_ECCV_2018_paper.pdf
+      Title: "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design"
+    README: configs/shufflenet_v2/README.md
+    Code:
+      URL: https://github.com/open-mmlab/mmpretrain/blob/v0.15.0/mmcls/models/backbones/shufflenet_v2.py#L134
+      Version: v0.15.0
+
+Models:
+  - Name: shufflenet-v2-1x_16xb64_in1k
+    Metadata:
+      FLOPs: 149000000
+      Parameters: 2280000
+    In Collection: Shufflenet V2
+    Results:
+      - Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 69.55
+          Top 5 Accuracy: 88.92
+        Task: Image Classification
+    Weights: https://download.openmmlab.com/mmclassification/v0/shufflenet_v2/shufflenet_v2_batch1024_imagenet_20200812-5bf4721e.pth
+    Config: configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py
--- a/configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py
+++ b/configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py
+_base_ = [
+    '../_base_/models/shufflenet_v2_1x.py',
+    '../_base_/datasets/imagenet_bs64_pil_resize.py',
+    '../_base_/schedules/imagenet_bs1024_linearlr_bn_nowd.py',
+    '../_base_/default_runtime.py'
+]
--- a/configs/simclr/README.md
+++ b/configs/simclr/README.md
+# SimCLR
+
+> [A simple framework for contrastive learning of visual representations](https://arxiv.org/abs/2002.05709)
+
+<!-- [ALGORITHM] -->
+
+## Abstract
+
+This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50.
+
+<div align=center>
+<img  src="https://user-images.githubusercontent.com/36138628/149723851-cf5f309e-d891-454d-90c0-e5337e5a11ed.png" width="400" />
+</div>
+
+## How to use it?
+
+<!-- [TABS-BEGIN] -->
+
+**Predict image**
+
+```python
+from mmpretrain import inference_model
+
+predict = inference_model('resnet50_simclr-200e-pre_8xb512-linear-coslr-90e_in1k', 'demo/bird.JPEG')
+print(predict['pred_class'])
+print(predict['pred_score'])
+```
+
+**Use the model**
+
+```python
+import torch
+from mmpretrain import get_model
+
+model = get_model('simclr_resnet50_16xb256-coslr-200e_in1k', pretrained=True)
+inputs = torch.rand(1, 3, 224, 224)
+out = model(inputs)
+print(type(out))
+# To extract features.
+feats = model.extract_feat(inputs)
+print(type(feats))
+```
+
+**Train/Test Command**
+
+Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset).
+
+Train:
+
+```shell
+python tools/train.py configs/simclr/simclr_resnet50_16xb256-coslr-200e_in1k.py
+```
+
+Test:
+
+```shell
+python tools/test.py configs/simclr/benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-f12c0457.pth
+```
+
+<!-- [TABS-END] -->
+
+## Models and results
+
+### Pretrained models
+
+| Model                                     | Params (M) | Flops (G) |                        Config                        |                                         Download                                         |
+| :---------------------------------------- | :--------: | :-------: | :--------------------------------------------------: | :--------------------------------------------------------------------------------------: |
+| `simclr_resnet50_16xb256-coslr-200e_in1k` |   27.97    |   4.11    | [config](simclr_resnet50_16xb256-coslr-200e_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/simclr_resnet50_16xb256-coslr-200e_in1k_20220825-4d9cce50.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/simclr_resnet50_16xb256-coslr-200e_in1k_20220825-4d9cce50.json) |
+| `simclr_resnet50_16xb256-coslr-800e_in1k` |   27.97    |   4.11    | [config](simclr_resnet50_16xb256-coslr-800e_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/simclr_resnet50_16xb256-coslr-800e_in1k_20220825-85fcc4de.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/simclr_resnet50_16xb256-coslr-800e_in1k_20220825-85fcc4de.json) |
+
+### Image Classification on ImageNet-1k
+
+| Model                                     |                   Pretrain                   | Params (M) | Flops (G) | Top-1 (%) |                   Config                   |                   Download                    |
+| :---------------------------------------- | :------------------------------------------: | :--------: | :-------: | :-------: | :----------------------------------------: | :-------------------------------------------: |
+| `resnet50_simclr-200e-pre_8xb512-linear-coslr-90e_in1k` | [SIMCLR 200-Epochs](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/simclr_resnet50_16xb256-coslr-200e_in1k_20220825-4d9cce50.pth) |   25.56    |   4.11    |   66.90   | [config](benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-f12c0457.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-f12c0457.json) |
+| `resnet50_simclr-800e-pre_8xb512-linear-coslr-90e_in1k` | [SIMCLR 800-Epochs](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/simclr_resnet50_16xb256-coslr-800e_in1k_20220825-85fcc4de.pth) |   25.56    |   4.11    |   69.20   | [config](benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-b80ae1e5.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-b80ae1e5.json) |
+
+## Citation
+
+```bibtex
+@inproceedings{chen2020simple,
+  title={A simple framework for contrastive learning of visual representations},
+  author={Chen, Ting and Kornblith, Simon and Norouzi, Mohammad and Hinton, Geoffrey},
+  booktitle={ICML},
+  year={2020},
+}
+```
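Review note on the README above: the abstract describes a contrastive loss applied to pairs of augmented views, and the configs in this commit set `temperature=0.1` on the `ContrastiveHead`. The sketch below shows the normalized temperature-scaled cross-entropy (NT-Xent) loss from the SimCLR paper in plain Python. It is an illustration under simplifying assumptions (tiny list-based embeddings, no distributed gather), not the head implemented in this repository, and the function names are chosen here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(views_a, views_b, temperature=0.1):
    """Mean contrastive loss over a batch of paired embeddings.

    views_a[i] and views_b[i] are two augmented views of sample i;
    every other embedding in the batch acts as a negative.
    """
    embeddings = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i, anchor in enumerate(embeddings):
        positive = (i + n) % (2 * n)  # index of the other view of sample i
        logits = [cosine(anchor, other) / temperature
                  for j, other in enumerate(embeddings) if j != i]
        pos_logit = cosine(anchor, embeddings[positive]) / temperature
        # Cross-entropy with the positive pair as the target class.
        total += math.log(sum(math.exp(x) for x in logits)) - pos_logit
    return total / (2 * n)


aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is low when the two views of a sample agree and high when a view is closer to another sample's embedding.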
--- a/configs/simclr/benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py
+++ b/configs/simclr/benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py
+_base_ = [
+    '../../_base_/models/resnet50.py',
+    '../../_base_/datasets/imagenet_bs32_pil_resize.py',
+    '../../_base_/schedules/imagenet_lars_coslr_90e.py',
+    '../../_base_/default_runtime.py',
+]
+
+model = dict(
+    backbone=dict(
+        frozen_stages=4,
+        init_cfg=dict(type='Pretrained', checkpoint='', prefix='backbone.')))
+
+# dataset settings
+train_dataloader = dict(batch_size=512)
+
+# runtime settings
+default_hooks = dict(
+    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=3))
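Review note: the benchmark config above only lists the keys it changes (`frozen_stages`, `init_cfg`, `batch_size`, the checkpoint hook) and inherits everything else from its `_base_` files. The sketch below shows one way such a recursive dictionary merge can work. It is a simplified stand-in, not the config loader's actual code: special keys such as `_delete_` are not modeled, and the contents of `base` are hypothetical placeholders rather than the real `_base_` files.

```python
import copy


def merge_config(base, override):
    """Recursively merge `override` into a copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical stand-in for what the `_base_` files might define.
base = {
    "model": {"backbone": {"type": "ResNet", "depth": 50}},
    "train_dataloader": {"batch_size": 32, "num_workers": 5},
}
# Overrides of the kind written in the benchmark config.
override = {
    "model": {"backbone": {"frozen_stages": 4}},
    "train_dataloader": {"batch_size": 512},
}
merged = merge_config(base, override)
print(merged)
```

Nested keys that the override does not mention, such as `depth`, survive the merge, which is why the benchmark config can stay short.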
--- a/configs/simclr/metafile.yml
+++ b/configs/simclr/metafile.yml
+Collections:
+  - Name: SimCLR
+    Metadata:
+      Training Data: ImageNet-1k
+      Training Techniques:
+        - LARS
+      Training Resources: 8x V100 GPUs (b256), 16x A100-80G GPUs (b4096)
+      Architecture:
+        - ResNet
+        - SimCLR
+    Paper:
+      Title: A simple framework for contrastive learning of visual representations
+      URL: https://arxiv.org/abs/2002.05709
+    README: configs/simclr/README.md
+
+Models:
+  - Name: simclr_resnet50_16xb256-coslr-200e_in1k
+    Metadata:
+      Epochs: 200
+      Batch Size: 4096
+      FLOPs: 4109364224
+      Parameters: 27968832
+      Training Data: ImageNet-1k
+    In Collection: SimCLR
+    Results: null
+    Weights: https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/simclr_resnet50_16xb256-coslr-200e_in1k_20220825-4d9cce50.pth
+    Config: configs/simclr/simclr_resnet50_16xb256-coslr-200e_in1k.py
+    Downstream:
+      - resnet50_simclr-200e-pre_8xb512-linear-coslr-90e_in1k
+  - Name: simclr_resnet50_16xb256-coslr-800e_in1k
+    Metadata:
+      Epochs: 800
+      Batch Size: 4096
+      FLOPs: 4109364224
+      Parameters: 27968832
+      Training Data: ImageNet-1k
+    In Collection: SimCLR
+    Results: null
+    Weights: https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/simclr_resnet50_16xb256-coslr-800e_in1k_20220825-85fcc4de.pth
+    Config: configs/simclr/simclr_resnet50_16xb256-coslr-800e_in1k.py
+    Downstream:
+      - resnet50_simclr-800e-pre_8xb512-linear-coslr-90e_in1k
+  - Name: resnet50_simclr-200e-pre_8xb512-linear-coslr-90e_in1k
+    Metadata:
+      Epochs: 90
+      Batch Size: 4096
+      FLOPs: 4109464576
+      Parameters: 25557032
+      Training Data: ImageNet-1k
+    In Collection: SimCLR
+    Results:
+      - Task: Image Classification
+        Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 66.9
+    Weights: https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-f12c0457.pth
+    Config: configs/simclr/benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py
+  - Name: resnet50_simclr-800e-pre_8xb512-linear-coslr-90e_in1k
+    Metadata:
+      Epochs: 90
+      Batch Size: 4096
+      FLOPs: 4109464576
+      Parameters: 25557032
+      Training Data: ImageNet-1k
+    In Collection: SimCLR
+    Results:
+      - Task: Image Classification
+        Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 69.2
+    Weights: https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-b80ae1e5.pth
+    Config: configs/simclr/benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py
--- a/configs/simclr/simclr_resnet50_16xb256-coslr-200e_in1k.py
+++ b/configs/simclr/simclr_resnet50_16xb256-coslr-200e_in1k.py
+_base_ = [
+    '../_base_/datasets/imagenet_bs32_simclr.py',
+    '../_base_/schedules/imagenet_lars_coslr_200e.py',
+    '../_base_/default_runtime.py',
+]
+
+# dataset settings
+train_dataloader = dict(batch_size=256)
+
+# model settings
+model = dict(
+    type='SimCLR',
+    backbone=dict(
+        type='ResNet',
+        depth=50,
+        norm_cfg=dict(type='SyncBN'),
+        zero_init_residual=True),
+    neck=dict(
+        type='NonLinearNeck',  # SimCLR non-linear neck
+        in_channels=2048,
+        hid_channels=2048,
+        out_channels=128,
+        num_layers=2,
+        with_avg_pool=True),
+    head=dict(
+        type='ContrastiveHead',
+        loss=dict(type='CrossEntropyLoss'),
+        temperature=0.1),
+)
+
+# optimizer
+optim_wrapper = dict(
+    type='OptimWrapper',
+    optimizer=dict(type='LARS', lr=4.8, momentum=0.9, weight_decay=1e-6),
+    paramwise_cfg=dict(
+        custom_keys={
+            'bn': dict(decay_mult=0, lars_exclude=True),
+            'bias': dict(decay_mult=0, lars_exclude=True),
+            # bn layer in ResNet block downsample module
+            'downsample.1': dict(decay_mult=0, lars_exclude=True),
+        }))
+
+# runtime settings
+default_hooks = dict(
+    # only keeps the latest 3 checkpoints
+    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=3))
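Review note: `paramwise_cfg.custom_keys` in the config above turns off weight decay for batch-norm and bias parameters. The sketch below illustrates how such key-based overrides could be resolved per parameter. The matching rule (longest key that is a substring of the parameter name wins) is an assumption made for this illustration, the parameter names are hypothetical examples in typical ResNet style, and `lars_exclude` is carried along but not interpreted.

```python
def decay_for(param_name, base_weight_decay, custom_keys):
    """Pick the effective weight decay for one parameter name.

    Assumed rule: the longest key that occurs as a substring of the
    parameter name wins; unmatched parameters keep the base value.
    """
    for key in sorted(custom_keys, key=len, reverse=True):
        if key in param_name:
            return base_weight_decay * custom_keys[key].get("decay_mult", 1.0)
    return base_weight_decay


custom_keys = {
    "bn": {"decay_mult": 0, "lars_exclude": True},
    "bias": {"decay_mult": 0, "lars_exclude": True},
    "downsample.1": {"decay_mult": 0, "lars_exclude": True},
}

for name in (
    "backbone.layer1.0.conv1.weight",
    "backbone.layer1.0.bn1.weight",
    "backbone.layer1.0.downsample.1.weight",
):
    print(name, decay_for(name, 1e-6, custom_keys))
```

The separate `downsample.1` key is needed because that batch-norm layer's name does not contain `bn`.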
--- a/configs/simclr/simclr_resnet50_16xb256-coslr-800e_in1k.py
+++ b/configs/simclr/simclr_resnet50_16xb256-coslr-800e_in1k.py
+_base_ = [
+    '../_base_/datasets/imagenet_bs32_simclr.py',
+    '../_base_/schedules/imagenet_lars_coslr_200e.py',
+    '../_base_/default_runtime.py',
+]
+
+# dataset settings
+train_dataloader = dict(batch_size=256)
+
+# model settings
+model = dict(
+    type='SimCLR',
+    backbone=dict(
+        type='ResNet',
+        depth=50,
+        norm_cfg=dict(type='SyncBN'),
+        zero_init_residual=True),
+    neck=dict(
+        type='NonLinearNeck',  # SimCLR non-linear neck
+        in_channels=2048,
+        hid_channels=2048,
+        out_channels=128,
+        num_layers=2,
+        with_avg_pool=True),
+    head=dict(
+        type='ContrastiveHead',
+        loss=dict(type='CrossEntropyLoss'),
+        temperature=0.1),
+)
+
+# optimizer
+optim_wrapper = dict(
+    type='OptimWrapper',
+    optimizer=dict(type='LARS', lr=4.8, momentum=0.9, weight_decay=1e-6),
+    paramwise_cfg=dict(
+        custom_keys={
+            'bn': dict(decay_mult=0, lars_exclude=True),
+            'bias': dict(decay_mult=0, lars_exclude=True),
+            # bn layer in ResNet block downsample module
+            'downsample.1': dict(decay_mult=0, lars_exclude=True),
+        }))
+
+# learning rate scheduler
+param_scheduler = [
+    dict(
+        type='LinearLR',
+        start_factor=1e-4,
+        by_epoch=True,
+        begin=0,
+        end=10,
+        convert_to_iter_based=True),
+    dict(
+        type='CosineAnnealingLR', T_max=790, by_epoch=True, begin=10, end=800)
+]
+
+# runtime settings
+train_cfg = dict(max_epochs=800)
+default_hooks = dict(
+    # only keeps the latest 3 checkpoints
+    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=3))
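Review note: the `param_scheduler` above combines a 10-epoch linear warmup with cosine annealing over the remaining 790 epochs. The sketch below traces the resulting learning-rate curve at epoch granularity. It makes several assumptions: a minimum learning rate of 0 after annealing, a warmup that reaches the base rate exactly at epoch 10, and per-epoch evaluation even though `convert_to_iter_based=True` updates the warmup every iteration. Exact endpoint handling in the training framework may differ.

```python
import math


def lr_at_epoch(epoch, base_lr=4.8, warmup_epochs=10, max_epochs=800,
                start_factor=1e-4, eta_min=0.0):
    """Approximate learning rate at the start of `epoch`."""
    if epoch < warmup_epochs:
        # Linear warmup from start_factor * base_lr up to base_lr.
        factor = start_factor + (1.0 - start_factor) * epoch / warmup_epochs
        return base_lr * factor
    # Cosine annealing over T_max = max_epochs - warmup_epochs epochs.
    t_max = max_epochs - warmup_epochs
    progress = (epoch - warmup_epochs) / t_max
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 5, 10, 405, 800):
    print(epoch, lr_at_epoch(epoch))
```

The defaults mirror the values in the config (`lr=4.8`, `start_factor=1e-4`, `end=10`, `T_max=790`).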
--- a/configs/simclr/simclr_resnet50_8xb32-coslr-200e_in1k.py
+++ b/configs/simclr/simclr_resnet50_8xb32-coslr-200e_in1k.py
+_base_ = [
+    '../_base_/datasets/imagenet_bs32_simclr.py',
+    '../_base_/schedules/imagenet_lars_coslr_200e.py',
+    '../_base_/default_runtime.py',
+]
+
+# model settings
+model = dict(
+    type='SimCLR',
+    backbone=dict(
+        type='ResNet',
+        depth=50,
+        norm_cfg=dict(type='SyncBN'),
+        zero_init_residual=True),
+    neck=dict(
+        type='NonLinearNeck',  # SimCLR non-linear neck
+        in_channels=2048,
+        hid_channels=2048,
+        out_channels=128,
+        num_layers=2,
+        with_avg_pool=True),
+    head=dict(
+        type='ContrastiveHead',
+        loss=dict(type='CrossEntropyLoss'),
+        temperature=0.1),
+)
+
+# optimizer
+optim_wrapper = dict(
+    type='OptimWrapper',
+    optimizer=dict(type='LARS', lr=0.3, momentum=0.9, weight_decay=1e-6),
+    paramwise_cfg=dict(
+        custom_keys={
+            'bn': dict(decay_mult=0, lars_exclude=True),
+            'bias': dict(decay_mult=0, lars_exclude=True),
+            # bn layer in ResNet block downsample module
+            'downsample.1': dict(decay_mult=0, lars_exclude=True),
+        }))
+
+# runtime settings
+default_hooks = dict(
+    # only keeps the latest 3 checkpoints
+    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=3))
+
+# NOTE: `auto_scale_lr` is for automatically scaling LR
+# based on the actual training batch size.
+auto_scale_lr = dict(base_batch_size=256)
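Review note: the NOTE above says the learning rate is scaled to the actual training batch size. Assuming the usual linear scaling rule, the sketch below shows the arithmetic. The function name `scale_lr` is hypothetical and this is not the framework's own implementation.

```python
def scale_lr(base_lr, base_batch_size, gpus, samples_per_gpu):
    """Scale the learning rate linearly with the total batch size."""
    actual_batch_size = gpus * samples_per_gpu
    return base_lr * actual_batch_size / base_batch_size


# 8 GPUs x 32 samples matches base_batch_size=256, so lr stays at 0.3.
print(scale_lr(0.3, 256, gpus=8, samples_per_gpu=32))
# 16 GPUs x 256 samples gives a 16x larger batch and a 16x larger lr.
print(scale_lr(0.3, 256, gpus=16, samples_per_gpu=256))
```

Under this rule the `lr=0.3` of the 8xb32 config scales to the `lr=4.8` hard-coded in the 16xb256 configs of this commit.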
--- a/configs/simmim/README.md
+++ b/configs/simmim/README.md
+# SimMIM
+
+> [SimMIM: A Simple Framework for Masked Image Modeling](https://arxiv.org/abs/2111.09886)
+
+<!-- [ALGORITHM] -->
+
+## Abstract
+
+This paper presents SimMIM, a simple framework for masked image modeling. We simplify recently proposed related approaches without special designs such as blockwise masking and tokenization via discrete VAE or clustering. To study what lets the masked image modeling task learn good representations, we systematically study the major components in our framework, and find that simple designs of each component have revealed very strong representation learning performance: 1) random masking of the input image with a moderately large masked patch size (e.g., 32) makes a strong pre-text task; 2) predicting raw pixels of RGB values by direct regression performs no worse than the patch classification approaches with complex designs; 3) the prediction head can be as light as a linear layer, with no worse performance than heavier ones. Using ViT-B, our approach achieves 83.8% top-1 fine-tuning accuracy on ImageNet-1K by pre-training also on this dataset, surpassing the previous best approach by +0.6%. When applied on a larger model of about 650 million parameters, SwinV2-H, it achieves 87.1% top-1 accuracy on ImageNet-1K using only ImageNet-1K data. We also leverage this approach to facilitate the training of a 3B model (SwinV2-G), which, with 40× less data than in previous practice, achieves the state-of-the-art on four representative vision benchmarks. The code and models will be publicly available at https://github.com/microsoft/SimMIM.
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/30762564/159404597-ac6d3a44-ee59-4cdc-8f6f-506a7d1b18b6.png" width="70%"/>
+</div>
+
+## How to use it?
+
+<!-- [TABS-BEGIN] -->
+
+**Predict image**
+
+```python
+from mmpretrain import inference_model
+
+predict = inference_model('swin-base-w6_simmim-100e-pre_8xb256-coslr-100e_in1k-192px', 'demo/bird.JPEG')
+print(predict['pred_class'])
+print(predict['pred_score'])
+```
+
+**Use the model**
+
+```python
+import torch
+from mmpretrain import get_model
+
+model = get_model('simmim_swin-base-w6_8xb256-amp-coslr-100e_in1k-192px', pretrained=True)
+inputs = torch.rand(1, 3, 224, 224)
+out = model(inputs)
+print(type(out))
+# To extract features.
+feats = model.extract_feat(inputs)
+print(type(feats))
+```
+
+**Train/Test Command**
+
+Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset).
+
+Train:
+
+```shell
+python tools/train.py configs/simmim/simmim_swin-base-w6_8xb256-amp-coslr-100e_in1k-192px.py
+```
+
+Test:
+
+```shell
+python tools/test.py configs/simmim/benchmarks/swin-base-w6_8xb256-coslr-100e_in1k-192px.py https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/swin-base_ft-8xb256-coslr-100e_in1k/swin-base_ft-8xb256-coslr-100e_in1k_20220829-9cf23aa1.pth
+```
+
+<!-- [TABS-END] -->
+
+## Models and results
+
+### Pretrained models
+
+| Model                                                     | Params (M) | Flops (G) |                            Config                             |                            Download                             |
+| :-------------------------------------------------------- | :--------: | :-------: | :-----------------------------------------------------------: | :-------------------------------------------------------------: |
+| `simmim_swin-base-w6_8xb256-amp-coslr-100e_in1k-192px`    |   89.87    |   18.83   | [config](simmim_swin-base-w6_8xb256-amp-coslr-100e_in1k-192px.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192_20220829-0e15782d.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192_20220829-0e15782d.json) |
+| `simmim_swin-base-w6_16xb128-amp-coslr-800e_in1k-192px`   |   89.87    |   18.83   | [config](simmim_swin-base-w6_16xb128-amp-coslr-800e_in1k-192px.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192_20220916-a0e931ac.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192_20220916-a0e931ac.json) |
+| `simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px` |   199.92   |   55.85   | [config](simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192_20220916-4ad216d3.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192_20220916-4ad216d3.json) |
+
+### Image Classification on ImageNet-1k
+
+| Model                                     |                   Pretrain                   | Params (M) | Flops (G) | Top-1 (%) |                   Config                   |                   Download                    |
+| :---------------------------------------- | :------------------------------------------: | :--------: | :-------: | :-------: | :----------------------------------------: | :-------------------------------------------: |
+| `swin-base-w6_simmim-100e-pre_8xb256-coslr-100e_in1k-192px` | [SIMMIM 100-Epochs](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192_20220829-0e15782d.pth) |   87.75    |   11.30   |   82.70   | [config](benchmarks/swin-base-w6_8xb256-coslr-100e_in1k-192px.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/swin-base_ft-8xb256-coslr-100e_in1k/swin-base_ft-8xb256-coslr-100e_in1k_20220829-9cf23aa1.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/swin-base_ft-8xb256-coslr-100e_in1k/swin-base_ft-8xb256-coslr-100e_in1k_20220829-9cf23aa1.json) |
+| `swin-base-w7_simmim-100e-pre_8xb256-coslr-100e_in1k` | [SIMMIM 100-Epochs](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192_20220829-0e15782d.pth) |   87.77    |   15.47   |   83.50   | [config](benchmarks/swin-base-w7_8xb256-coslr-100e_in1k.py) |                      N/A                      |
+| `swin-base-w6_simmim-800e-pre_8xb256-coslr-100e_in1k-192px` | [SIMMIM 800-Epochs](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192_20220916-a0e931ac.pth) |   87.77    |   15.47   |   83.80   | [config](benchmarks/swin-base-w7_8xb256-coslr-100e_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192/swin-base_ft-8xb256-coslr-100e_in1k-224/swin-base_ft-8xb256-coslr-100e_in1k-224_20221208-155cc6e6.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192/swin-base_ft-8xb256-coslr-100e_in1k-224/swin-base_ft-8xb256-coslr-100e_in1k-224_20221208-155cc6e6.json) |
+| `swin-large-w14_simmim-800e-pre_8xb256-coslr-100e_in1k` | [SIMMIM 800-Epochs](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192_20220916-4ad216d3.pth) |   196.85   |   38.85   |   84.80   | [config](benchmarks/swin-large-w14_8xb256-coslr-100e_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192/swin-large_ft-8xb256-coslr-ws14-100e_in1k-224/swin-large_ft-8xb256-coslr-ws14-100e_in1k-224_20220916-d4865790.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192/swin-large_ft-8xb256-coslr-ws14-100e_in1k-224/swin-large_ft-8xb256-coslr-ws14-100e_in1k-224_20220916-d4865790.json) |
+
+## Citation
+
+```bibtex
+@inproceedings{xie2021simmim,
+  title={SimMIM: A Simple Framework for Masked Image Modeling},
+  author={Xie, Zhenda and Zhang, Zheng and Cao, Yue and Lin, Yutong and Bao, Jianmin and Yao, Zhuliang and Dai, Qi and Hu, Han},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+  year={2022}
+}
+```
--- a/configs/simmim/benchmarks/swin-base-w6_8xb256-coslr-100e_in1k-192px.py
+++ b/configs/simmim/benchmarks/swin-base-w6_8xb256-coslr-100e_in1k-192px.py
+_base_ = [
+    '../../_base_/models/swin_transformer/base_224.py',
+    '../../_base_/datasets/imagenet_bs256_swin_192.py',
+    '../../_base_/default_runtime.py'
+]
+
+# model settings
+model = dict(
+    backbone=dict(
+        img_size=192,
+        drop_path_rate=0.1,
+        stage_cfgs=dict(block_cfgs=dict(window_size=6)),
+        init_cfg=dict(type='Pretrained', checkpoint='', prefix='backbone.')))
+
+# optimizer settings
+optim_wrapper = dict(
+    type='AmpOptimWrapper',
+    optimizer=dict(type='AdamW', lr=5e-3, weight_decay=0.05),
+    clip_grad=dict(max_norm=5.0),
+    constructor='LearningRateDecayOptimWrapperConstructor',
+    paramwise_cfg=dict(
+        layer_decay_rate=0.9,
+        custom_keys={
+            '.norm': dict(decay_mult=0.0),
+            '.bias': dict(decay_mult=0.0),
+            '.absolute_pos_embed': dict(decay_mult=0.0),
+            '.relative_position_bias_table': dict(decay_mult=0.0)
+        }))
+
+# learning rate scheduler
+param_scheduler = [
+    dict(
+        type='LinearLR',
+        start_factor=2.5e-7 / 1.25e-3,
+        by_epoch=True,
+        begin=0,
+        end=20,
+        convert_to_iter_based=True),
+    dict(
+        type='CosineAnnealingLR',
+        T_max=80,
+        eta_min=2.5e-7 * 2048 / 512,
+        by_epoch=True,
+        begin=20,
+        end=100,
+        convert_to_iter_based=True)
+]
+
+# runtime settings
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=100)
+val_cfg = dict()
+test_cfg = dict()
+
+default_hooks = dict(
+    # save checkpoint per epoch.
+    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3),
+    logger=dict(type='LoggerHook', interval=100))
+
+randomness = dict(seed=0)
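Note on the `param_scheduler` above: it combines a 20-epoch linear warmup with 80 epochs of cosine annealing. The sketch below evaluates that schedule in closed form using only the constants from the config. It is an approximation for illustration, not the mmengine implementation: it works at epoch granularity and ignores the iteration-level details introduced by `convert_to_iter_based=True`. The function name `lr_at` is hypothetical.

```python
import math

BASE_LR = 5e-3                   # optimizer lr from the config
START_FACTOR = 2.5e-7 / 1.25e-3  # LinearLR start_factor
WARMUP_EPOCHS = 20               # LinearLR end
T_MAX = 80                       # CosineAnnealingLR T_max
ETA_MIN = 2.5e-7 * 2048 / 512    # CosineAnnealingLR eta_min


def lr_at(epoch: float) -> float:
    """Approximate learning rate at a (possibly fractional) epoch."""
    if epoch < WARMUP_EPOCHS:
        # Linear ramp of the multiplicative factor from START_FACTOR to 1.
        factor = START_FACTOR + (1.0 - START_FACTOR) * epoch / WARMUP_EPOCHS
        return BASE_LR * factor
    # Cosine decay from BASE_LR down to ETA_MIN over T_MAX epochs.
    t = min(epoch - WARMUP_EPOCHS, T_MAX)
    cosine = (1.0 + math.cos(math.pi * t / T_MAX)) / 2.0
    return ETA_MIN + (BASE_LR - ETA_MIN) * cosine


if __name__ == '__main__':
    for epoch in (0, 10, 20, 60, 100):
        print(f'epoch {epoch:3d}: lr = {lr_at(epoch):.3e}')
```

Under these assumptions the warmup starts at the same value the cosine phase ends on, since `5e-3 * (2.5e-7 / 1.25e-3)` and `2.5e-7 * 2048 / 512` both evaluate to about `1e-6`.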
--- a/configs/simmim/benchmarks/swin-base-w7_8xb256-coslr-100e_in1k.py
+++ b/configs/simmim/benchmarks/swin-base-w7_8xb256-coslr-100e_in1k.py
+_base_ = [
+    '../../_base_/models/swin_transformer/base_224.py',
+    '../../_base_/datasets/imagenet_bs256_swin_192.py',
+    '../../_base_/default_runtime.py'
+]
+
+# dataset settings
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='RandomResizedCrop',
+        scale=224,
+        backend='pillow',
+        interpolation='bicubic'),
+    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
+    dict(
+        type='RandAugment',
+        policies='timm_increasing',
+        num_policies=2,
+        total_level=10,
+        magnitude_level=9,
+        magnitude_std=0.5,
+        hparams=dict(pad_val=[104, 116, 124], interpolation='bicubic')),
+    dict(
+        type='RandomErasing',
+        erase_prob=0.25,
+        mode='rand',
+        min_area_ratio=0.02,
+        max_area_ratio=0.3333333333333333,
+        fill_color=[103.53, 116.28, 123.675],
+        fill_std=[57.375, 57.12, 58.395]),
+    dict(type='PackInputs')
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='ResizeEdge',
+        scale=256,
+        edge='short',
+        backend='pillow',
+        interpolation='bicubic'),
+    dict(type='CenterCrop', crop_size=224),
+    dict(type='PackInputs')
+]
+
+train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
+val_dataloader = dict(dataset=dict(pipeline=test_pipeline))
+test_dataloader = val_dataloader
+
+# model settings
+model = dict(
+    backbone=dict(
+        img_size=224,
+        drop_path_rate=0.1,
+        stage_cfgs=dict(block_cfgs=dict(window_size=7)),
+        init_cfg=dict(type='Pretrained', checkpoint='', prefix='backbone.')))
+
+# optimizer settings
+optim_wrapper = dict(
+    type='AmpOptimWrapper',
+    optimizer=dict(type='AdamW', lr=5e-3, weight_decay=0.05),
+    clip_grad=dict(max_norm=5.0),
+    constructor='LearningRateDecayOptimWrapperConstructor',
+    paramwise_cfg=dict(
+        layer_decay_rate=0.9,
+        custom_keys={
+            '.norm': dict(decay_mult=0.0),
+            '.bias': dict(decay_mult=0.0),
+            '.absolute_pos_embed': dict(decay_mult=0.0),
+            '.relative_position_bias_table': dict(decay_mult=0.0)
+        }))
+
+# learning rate scheduler
+param_scheduler = [
+    dict(
+        type='LinearLR',
+        start_factor=2.5e-7 / 1.25e-3,
+        by_epoch=True,
+        begin=0,
+        end=20,
+        convert_to_iter_based=True),
+    dict(
+        type='CosineAnnealingLR',
+        T_max=80,
+        eta_min=2.5e-7 * 2048 / 512,
+        by_epoch=True,
+        begin=20,
+        end=100,
+        convert_to_iter_based=True)
+]
+
+# runtime settings
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=100)
+val_cfg = dict()
+test_cfg = dict()
+
+default_hooks = dict(
+    # save checkpoint per epoch.
+    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3),
+    logger=dict(type='LoggerHook', interval=100))
+
+randomness = dict(seed=0)
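Note on `layer_decay_rate=0.9` above: layer-wise learning rate decay gives earlier layers a smaller learning rate than later ones. The sketch below shows the common form of the rule, where the scale for a layer is `rate ** (num_layers - 1 - layer_id)`. This is an illustration under that assumption. How `LearningRateDecayOptimWrapperConstructor` assigns layer ids to Swin parameters is not shown in this diff, and `layer_lr_scales` and the layer count used here are hypothetical.

```python
def layer_lr_scales(decay_rate: float, num_layers: int) -> list[float]:
    """Per-layer LR multipliers; the last layer keeps the full LR."""
    return [decay_rate ** (num_layers - 1 - layer_id)
            for layer_id in range(num_layers)]


if __name__ == '__main__':
    base_lr = 5e-3  # optimizer lr from the config
    # Hypothetical 4-layer model, just to show the shape of the decay.
    for layer_id, scale in enumerate(layer_lr_scales(0.9, 4)):
        print(f'layer {layer_id}: scale={scale:.3f}, lr={base_lr * scale:.2e}')
```

A lower rate, such as the `0.7` used in the Swin-Large config, shrinks the learning rate of early layers more aggressively.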
--- a/configs/simmim/benchmarks/swin-large-w14_8xb256-coslr-100e_in1k.py
+++ b/configs/simmim/benchmarks/swin-large-w14_8xb256-coslr-100e_in1k.py
+_base_ = [
+    '../../_base_/models/swin_transformer/base_224.py',
+    '../../_base_/datasets/imagenet_bs256_swin_192.py',
+    '../../_base_/default_runtime.py'
+]
+
+# dataset settings
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='RandomResizedCrop',
+        scale=224,
+        backend='pillow',
+        interpolation='bicubic'),
+    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
+    dict(
+        type='RandAugment',
+        policies='timm_increasing',
+        num_policies=2,
+        total_level=10,
+        magnitude_level=9,
+        magnitude_std=0.5,
+        hparams=dict(pad_val=[104, 116, 124], interpolation='bicubic')),
+    dict(
+        type='RandomErasing',
+        erase_prob=0.25,
+        mode='rand',
+        min_area_ratio=0.02,
+        max_area_ratio=0.3333333333333333,
+        fill_color=[103.53, 116.28, 123.675],
+        fill_std=[57.375, 57.12, 58.395]),
+    dict(type='PackInputs')
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='ResizeEdge',
+        scale=256,
+        edge='short',
+        backend='pillow',
+        interpolation='bicubic'),
+    dict(type='CenterCrop', crop_size=224),
+    dict(type='PackInputs')
+]
+
+train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
+val_dataloader = dict(dataset=dict(pipeline=test_pipeline))
+test_dataloader = val_dataloader
+
+# model settings
+model = dict(
+    backbone=dict(
+        arch='large',
+        img_size=224,
+        drop_path_rate=0.2,
+        stage_cfgs=dict(block_cfgs=dict(window_size=14)),
+        pad_small_map=True,
+        init_cfg=dict(type='Pretrained', checkpoint='', prefix='backbone.')),
+    head=dict(in_channels=1536))
+
+# optimizer settings
+optim_wrapper = dict(
+    type='AmpOptimWrapper',
+    optimizer=dict(type='AdamW', lr=5e-3, weight_decay=0.05),
+    clip_grad=dict(max_norm=5.0),
+    constructor='LearningRateDecayOptimWrapperConstructor',
+    paramwise_cfg=dict(
+        layer_decay_rate=0.7,
+        custom_keys={
+            '.norm': dict(decay_mult=0.0),
+            '.bias': dict(decay_mult=0.0),
+            '.absolute_pos_embed': dict(decay_mult=0.0),
+            '.relative_position_bias_table': dict(decay_mult=0.0)
+        }))
+
+# learning rate scheduler
+param_scheduler = [
+    dict(
+        type='LinearLR',
+        start_factor=2.5e-7 / 1.25e-3,
+        by_epoch=True,
+        begin=0,
+        end=20,
+        convert_to_iter_based=True),
+    dict(
+        type='CosineAnnealingLR',
+        T_max=100,
+        eta_min=1e-6,
+        by_epoch=True,
+        begin=20,
+        end=100,
+        convert_to_iter_based=True)
+]
+
+# runtime settings
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=100)
+val_cfg = dict()
+test_cfg = dict()
+
+default_hooks = dict(
+    # save checkpoint per epoch.
+    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3),
+    logger=dict(type='LoggerHook', interval=100))
+
+randomness = dict(seed=0)
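Note on `head=dict(in_channels=1536)` above: Swin doubles the channel width at each of its four stages, so the width seen by the head follows from the embedding width of the chosen `arch`. The sketch assumes the standard Swin embedding widths (128 for base, 192 for large). The helper name `final_stage_channels` is hypothetical.

```python
def final_stage_channels(embed_dims: int, num_stages: int = 4) -> int:
    """Channel width after the last stage, doubling once per later stage."""
    return embed_dims * 2 ** (num_stages - 1)


if __name__ == '__main__':
    print('swin-base :', final_stage_channels(128))
    print('swin-large:', final_stage_channels(192))
```

This is why the large config has to override `in_channels` while inheriting the rest of the model from `base_224.py`.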
--- a/configs/simmim/metafile.yml
+++ b/configs/simmim/metafile.yml
+Collections:
+  - Name: SimMIM
+    Metadata:
+      Training Data: ImageNet-1k
+      Training Techniques:
+        - AdamW
+      Training Resources: 16x A100 GPUs
+      Architecture:
+        - Swin
+    Paper:
+      Title: 'SimMIM: A Simple Framework for Masked Image Modeling'
+      URL: https://arxiv.org/abs/2111.09886
+    README: configs/simmim/README.md
+
+Models:
+  - Name: simmim_swin-base-w6_8xb256-amp-coslr-100e_in1k-192px
+    Metadata:
+      Epochs: 100
+      Batch Size: 2048
+      FLOPs: 18832161792
+      Parameters: 89874104
+      Training Data: ImageNet-1k
+    In Collection: SimMIM
+    Results: null
+    Weights: https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192_20220829-0e15782d.pth
+    Config: configs/simmim/simmim_swin-base-w6_8xb256-amp-coslr-100e_in1k-192px.py
+    Downstream:
+      - swin-base-w6_simmim-100e-pre_8xb256-coslr-100e_in1k-192px
+      - swin-base-w7_simmim-100e-pre_8xb256-coslr-100e_in1k
+  - Name: simmim_swin-base-w6_16xb128-amp-coslr-800e_in1k-192px
+    Metadata:
+      Epochs: 800
+      Batch Size: 2048
+      FLOPs: 18832161792
+      Parameters: 89874104
+      Training Data: ImageNet-1k
+    In Collection: SimMIM
+    Results: null
+    Weights: https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192_20220916-a0e931ac.pth
+    Config: configs/simmim/simmim_swin-base-w6_16xb128-amp-coslr-800e_in1k-192px.py
+    Downstream:
+      - swin-base-w6_simmim-800e-pre_8xb256-coslr-100e_in1k-192px
+  - Name: simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px
+    Metadata:
+      Epochs: 800
+      Batch Size: 2048
+      FLOPs: 55849130496
+      Parameters: 199920372
+      Training Data: ImageNet-1k
+    In Collection: SimMIM
+    Results: null
+    Weights: https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192_20220916-4ad216d3.pth
+    Config: configs/simmim/simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px.py
+    Downstream:
+      - swin-large-w14_simmim-800e-pre_8xb256-coslr-100e_in1k
+  - Name: swin-base-w6_simmim-100e-pre_8xb256-coslr-100e_in1k-192px
+    Metadata:
+      Epochs: 100
+      Batch Size: 2048
+      FLOPs: 11303976960
+      Parameters: 87750176
+      Training Data: ImageNet-1k
+    In Collection: SimMIM
+    Results:
+      - Task: Image Classification
+        Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 82.7
+    Weights: https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/swin-base_ft-8xb256-coslr-100e_in1k/swin-base_ft-8xb256-coslr-100e_in1k_20220829-9cf23aa1.pth
+    Config: configs/simmim/benchmarks/swin-base-w6_8xb256-coslr-100e_in1k-192px.py
+  - Name: swin-base-w7_simmim-100e-pre_8xb256-coslr-100e_in1k
+    Metadata:
+      Epochs: 100
+      Batch Size: 2048
+      FLOPs: 15466852352
+      Parameters: 87768224
+      Training Data: ImageNet-1k
+    In Collection: SimMIM
+    Results:
+      - Task: Image Classification
+        Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 83.5
+    Weights: null
+    Config: configs/simmim/benchmarks/swin-base-w7_8xb256-coslr-100e_in1k.py
+  - Name: swin-base-w6_simmim-800e-pre_8xb256-coslr-100e_in1k-192px
+    Metadata:
+      Epochs: 100
+      Batch Size: 2048
+      FLOPs: 15466852352
+      Parameters: 87768224
+      Training Data: ImageNet-1k
+    In Collection: SimMIM
+    Results:
+      - Task: Image Classification
+        Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 83.8
+    Weights: https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192/swin-base_ft-8xb256-coslr-100e_in1k-224/swin-base_ft-8xb256-coslr-100e_in1k-224_20221208-155cc6e6.pth
+    Config: configs/simmim/benchmarks/swin-base-w7_8xb256-coslr-100e_in1k.py
+  - Name: swin-large-w14_simmim-800e-pre_8xb256-coslr-100e_in1k
+    Metadata:
+      Epochs: 100
+      Batch Size: 2048
+      FLOPs: 38853083136
+      Parameters: 196848316
+      Training Data: ImageNet-1k
+    In Collection: SimMIM
+    Results:
+      - Task: Image Classification
+        Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 84.8
+    Weights: https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192/swin-large_ft-8xb256-coslr-ws14-100e_in1k-224/swin-large_ft-8xb256-coslr-ws14-100e_in1k-224_20220916-d4865790.pth
+    Config: configs/simmim/benchmarks/swin-large-w14_8xb256-coslr-100e_in1k.py
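Note on the metafile above: `FLOPs` and `Parameters` are stored as raw counts, while the README tables report them in G and M. The sketch below converts a few of the metafile values and can be used to check that the two files agree. The values are copied from this diff. The dictionary layout and the helper name `to_table_units` are hypothetical.

```python
# (FLOPs, Parameters) copied from metafile.yml in this diff.
METAFILE = {
    'simmim_swin-base-w6_16xb128-amp-coslr-800e_in1k-192px':
        (18832161792, 89874104),
    'simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px':
        (55849130496, 199920372),
    'swin-base-w6_simmim-100e-pre_8xb256-coslr-100e_in1k-192px':
        (11303976960, 87750176),
    'swin-large-w14_simmim-800e-pre_8xb256-coslr-100e_in1k':
        (38853083136, 196848316),
}


def to_table_units(flops: int, params: int) -> tuple[float, float]:
    """Return (Params in M, Flops in G), rounded as in the README tables."""
    return round(params / 1e6, 2), round(flops / 1e9, 2)


if __name__ == '__main__':
    for name, (flops, params) in METAFILE.items():
        params_m, flops_g = to_table_units(flops, params)
        print(f'{name}: {params_m:.2f} M params, {flops_g:.2f} G flops')
```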
--- a/configs/simmim/simmim_swin-base-w6_16xb128-amp-coslr-100e_in1k-192px.py
+++ b/configs/simmim/simmim_swin-base-w6_16xb128-amp-coslr-100e_in1k-192px.py
+_base_ = 'simmim_swin-base-w6_8xb256-amp-coslr-100e_in1k-192px.py'
+
+# dataset settings: 16 GPUs x 128 samples per GPU (total batch size 2048)
+train_dataloader = dict(batch_size=128)
--- a/configs/simmim/simmim_swin-base-w6_16xb128-amp-coslr-800e_in1k-192px.py
+++ b/configs/simmim/simmim_swin-base-w6_16xb128-amp-coslr-800e_in1k-192px.py
+_base_ = [
+    '../_base_/datasets/imagenet_bs256_simmim_192.py',
+    '../_base_/default_runtime.py',
+]
+
+# model settings
+model = dict(
+    type='SimMIM',
+    backbone=dict(
+        type='SimMIMSwinTransformer',
+        arch='base',
+        img_size=192,
+        stage_cfgs=dict(block_cfgs=dict(window_size=6))),
+    neck=dict(
+        type='SimMIMLinearDecoder', in_channels=128 * 2**3, encoder_stride=32),
+    head=dict(
+        type='SimMIMHead',
+        patch_size=4,
+        loss=dict(type='PixelReconstructionLoss', criterion='L1', channel=3)))
+
+# optimizer wrapper
+optim_wrapper = dict(
+    type='AmpOptimWrapper',
+    optimizer=dict(
+        type='AdamW',
+        lr=1e-4 * 2048 / 512,
+        betas=(0.9, 0.999),
+        weight_decay=0.05),
+    clip_grad=dict(max_norm=5.0),
+    paramwise_cfg=dict(
+        custom_keys={
+            'norm': dict(decay_mult=0.0),
+            'bias': dict(decay_mult=0.0),
+            'absolute_pos_embed': dict(decay_mult=0.),
+            'relative_position_bias_table': dict(decay_mult=0.)
+        }))
+
+# learning rate scheduler
+param_scheduler = [
+    dict(
+        type='LinearLR',
+        start_factor=5e-7 / 1e-4,
+        by_epoch=True,
+        begin=0,
+        end=10,
+        convert_to_iter_based=True),
+    dict(
+        type='MultiStepLR',
+        milestones=[700],
+        by_epoch=True,
+        begin=10,
+        end=800,
+        convert_to_iter_based=True)
+]
+
+# runtime
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=800)
+default_hooks = dict(
+    # only keeps the latest 3 checkpoints
+    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=3))
+
+# NOTE: `auto_scale_lr` is for automatically scaling LR
+# based on the actual training batch size.
+auto_scale_lr = dict(base_batch_size=2048)
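Note on `auto_scale_lr` above: `base_batch_size=2048` is the total batch size the configured learning rate was tuned for (16 GPUs x 128 samples). When automatic scaling is enabled, the learning rate is rescaled linearly by the ratio of the actual total batch size to this base. The sketch below reproduces that arithmetic only. It is not the runner's code, and the helper name `scaled_lr` is hypothetical.

```python
CONFIGURED_LR = 1e-4 * 2048 / 512  # optimizer lr from the config
BASE_BATCH_SIZE = 2048             # auto_scale_lr.base_batch_size


def scaled_lr(num_gpus: int, samples_per_gpu: int,
              lr: float = CONFIGURED_LR,
              base_batch_size: int = BASE_BATCH_SIZE) -> float:
    """Linear scaling rule: lr * (actual batch size / base batch size)."""
    real_batch_size = num_gpus * samples_per_gpu
    return lr * real_batch_size / base_batch_size


if __name__ == '__main__':
    print('16 GPUs x 128:', scaled_lr(16, 128))  # matches the base setup
    print(' 8 GPUs x 128:', scaled_lr(8, 128))   # half the batch, half the lr
```

With the 16 x 128 setup named in the config the ratio is 1, so the learning rate is left unchanged.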