"example/37_permute/run_permute_element_example.inc" did not exist on "185f78449952fa8e28cc4764fd4680058d052ce5"
Commit dff2c686 authored by renzhc's avatar renzhc
Browse files

first commit

parent 8f9dd0ed
Pipeline #1665 canceled with stages
# SEResNet
> [Squeeze-and-Excitation Networks](https://openaccess.thecvf.com/content_cvpr_2018/html/Hu_Squeeze-and-Excitation_Networks_CVPR_2018_paper.html)
<!-- [ALGORITHM] -->
## Abstract
The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251%, surpassing the winning entry of 2016 by a relative improvement of ~25%.
<div align=center>
<img src="https://user-images.githubusercontent.com/26739999/142574668-3464d087-b962-48ba-ad1d-5d6b33c3ba0b.png" width="50%"/>
</div>
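The recalibration described in the abstract is compact enough to sketch directly. Below is a minimal, self-contained PyTorch illustration of an SE block (an illustrative sketch, not mmpretrain's `SELayer`); the reduction ratio of 16 follows the paper's default.
```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Minimal Squeeze-and-Excitation block: squeeze spatial information
    with global average pooling, excite with a two-layer bottleneck MLP,
    then rescale each channel of the input feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))            # squeeze: (N, C)
        w = self.fc(w).view(n, c, 1, 1)   # excite: per-channel gates in (0, 1)
        return x * w                      # channel-wise recalibration


x = torch.rand(1, 64, 56, 56)
print(SEBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```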
## How to use it?
<!-- [TABS-BEGIN] -->
**Predict image**
```python
from mmpretrain import inference_model
predict = inference_model('seresnet50_8xb32_in1k', 'demo/bird.JPEG')
print(predict['pred_class'])
print(predict['pred_score'])
```
**Use the model**
```python
import torch
from mmpretrain import get_model
model = get_model('seresnet50_8xb32_in1k', pretrained=True)
inputs = torch.rand(1, 3, 224, 224)
out = model(inputs)
print(type(out))
# To extract features.
feats = model.extract_feat(inputs)
print(type(feats))
```
**Train/Test Command**
Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset).
Train:
```shell
python tools/train.py configs/seresnet/seresnet50_8xb32_in1k.py
```
Test:
```shell
python tools/test.py configs/seresnet/seresnet50_8xb32_in1k.py https://download.openmmlab.com/mmclassification/v0/se-resnet/se-resnet50_batch256_imagenet_20200804-ae206104.pth
```
<!-- [TABS-END] -->
## Models and results
### Image Classification on ImageNet-1k
| Model | Pretrain | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) | Config | Download |
| :----------------------- | :----------: | :--------: | :-------: | :-------: | :-------: | :---------------------------------: | :------------------------------------------------------------------------------------------: |
| `seresnet50_8xb32_in1k` | From scratch | 28.09 | 4.13 | 77.74 | 93.84 | [config](seresnet50_8xb32_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/se-resnet/se-resnet50_batch256_imagenet_20200804-ae206104.pth) \| [log](https://download.openmmlab.com/mmclassification/v0/se-resnet/se-resnet50_batch256_imagenet_20200708-657b3c36.log.json) |
| `seresnet101_8xb32_in1k` | From scratch | 49.33 | 7.86 | 78.26 | 94.07 | [config](seresnet101_8xb32_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/se-resnet/se-resnet101_batch256_imagenet_20200804-ba5b51d4.pth) \| [log](https://download.openmmlab.com/mmclassification/v0/se-resnet/se-resnet101_batch256_imagenet_20200708-038a4d04.log.json) |
## Citation
```bibtex
@inproceedings{hu2018squeeze,
title={Squeeze-and-excitation networks},
author={Hu, Jie and Shen, Li and Sun, Gang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={7132--7141},
year={2018}
}
```
```yaml
Collections:
- Name: SEResNet
Metadata:
Training Data: ImageNet-1k
Training Techniques:
- SGD with Momentum
- Weight Decay
Training Resources: 8x V100 GPUs
Epochs: 140
Batch Size: 256
Architecture:
- ResNet
Paper:
URL: https://openaccess.thecvf.com/content_cvpr_2018/html/Hu_Squeeze-and-Excitation_Networks_CVPR_2018_paper.html
Title: "Squeeze-and-Excitation Networks"
README: configs/seresnet/README.md
Code:
URL: https://github.com/open-mmlab/mmpretrain/blob/v0.15.0/mmcls/models/backbones/seresnet.py#L58
Version: v0.15.0
Models:
- Name: seresnet50_8xb32_in1k
Metadata:
FLOPs: 4130000000
Parameters: 28090000
In Collection: SEResNet
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 77.74
Top 5 Accuracy: 93.84
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/se-resnet/se-resnet50_batch256_imagenet_20200804-ae206104.pth
Config: configs/seresnet/seresnet50_8xb32_in1k.py
- Name: seresnet101_8xb32_in1k
Metadata:
FLOPs: 7860000000
Parameters: 49330000
In Collection: SEResNet
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 78.26
Top 5 Accuracy: 94.07
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/se-resnet/se-resnet101_batch256_imagenet_20200804-ba5b51d4.pth
    Config: configs/seresnet/seresnet101_8xb32_in1k.py
```
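The `Name` fields in the metafile above are the identifiers that the Python APIs resolve. A quick sanity check, assuming mmpretrain is installed (the exact matching semantics of `list_models` may vary slightly across versions):
```python
from mmpretrain import list_models

# Model names come from the metafile's `Models` entries.
print(list_models('*seresnet*'))
# Expected to include: 'seresnet50_8xb32_in1k', 'seresnet101_8xb32_in1k'
```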
```python
_base_ = [
'../_base_/models/seresnet101.py',
'../_base_/datasets/imagenet_bs32_pil_resize.py',
'../_base_/schedules/imagenet_bs256.py', '../_base_/default_runtime.py'
]
```
```python
_base_ = [
'../_base_/models/seresnet50.py',
'../_base_/datasets/imagenet_bs32_pil_resize.py',
'../_base_/schedules/imagenet_bs256_140e.py',
'../_base_/default_runtime.py'
]
```
```python
_base_ = [
'../_base_/models/seresnext101_32x4d.py',
'../_base_/datasets/imagenet_bs32_pil_resize.py',
'../_base_/schedules/imagenet_bs256.py', '../_base_/default_runtime.py'
]
```
```python
_base_ = [
'../_base_/models/seresnext50_32x4d.py',
'../_base_/datasets/imagenet_bs32_pil_resize.py',
'../_base_/schedules/imagenet_bs256.py', '../_base_/default_runtime.py'
]
```
# ShuffleNet V1
> [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](https://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_ShuffleNet_An_Extremely_CVPR_2018_paper.html)
<!-- [ALGORITHM] -->
## Abstract
We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g. lower top-1 error (absolute 7.8%) than recent MobileNet on ImageNet classification task, under the computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet achieves ~13x actual speedup over AlexNet while maintaining comparable accuracy.
<div align=center>
<img src="https://user-images.githubusercontent.com/26739999/142575730-dc2f616d-80df-4fb1-93e1-77ebb2b835cf.png" width="70%"/>
</div>
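The channel shuffle operation mentioned in the abstract is small enough to show concretely. A minimal PyTorch sketch (illustrative, not the repo's implementation):
```python
import torch


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Channel shuffle from ShuffleNet: reshape to (N, g, C//g, H, W),
    swap the group axes, and flatten back so information can flow across
    the groups of a pointwise group convolution."""
    n, c, h, w = x.shape
    assert c % groups == 0
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)


x = torch.arange(8, dtype=torch.float32).view(1, 8, 1, 1)
print(channel_shuffle(x, groups=2).flatten().tolist())
# [0.0, 4.0, 1.0, 5.0, 2.0, 6.0, 3.0, 7.0]
```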
## How to use it?
<!-- [TABS-BEGIN] -->
**Predict image**
```python
from mmpretrain import inference_model
predict = inference_model('shufflenet-v1-1x_16xb64_in1k', 'demo/bird.JPEG')
print(predict['pred_class'])
print(predict['pred_score'])
```
**Use the model**
```python
import torch
from mmpretrain import get_model
model = get_model('shufflenet-v1-1x_16xb64_in1k', pretrained=True)
inputs = torch.rand(1, 3, 224, 224)
out = model(inputs)
print(type(out))
# To extract features.
feats = model.extract_feat(inputs)
print(type(feats))
```
**Train/Test Command**
Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset).
Train:
```shell
python tools/train.py configs/shufflenet_v1/shufflenet-v1-1x_16xb64_in1k.py
```
Test:
```shell
python tools/test.py configs/shufflenet_v1/shufflenet-v1-1x_16xb64_in1k.py https://download.openmmlab.com/mmclassification/v0/shufflenet_v1/shufflenet_v1_batch1024_imagenet_20200804-5d6cec73.pth
```
<!-- [TABS-END] -->
## Models and results
### Image Classification on ImageNet-1k
| Model | Pretrain | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) | Config | Download |
| :----------------------------- | :----------: | :--------: | :-------: | :-------: | :-------: | :---------------------------------------: | :------------------------------------------------------------------------------: |
| `shufflenet-v1-1x_16xb64_in1k` | From scratch | 1.87 | 0.15 | 68.13 | 87.81 | [config](shufflenet-v1-1x_16xb64_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/shufflenet_v1/shufflenet_v1_batch1024_imagenet_20200804-5d6cec73.pth) \| [log](https://download.openmmlab.com/mmclassification/v0/shufflenet_v1/shufflenet_v1_batch1024_imagenet_20200804-5d6cec73.json) |
## Citation
```bibtex
@inproceedings{zhang2018shufflenet,
title={Shufflenet: An extremely efficient convolutional neural network for mobile devices},
author={Zhang, Xiangyu and Zhou, Xinyu and Lin, Mengxiao and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={6848--6856},
year={2018}
}
```
```yaml
Collections:
- Name: Shufflenet V1
Metadata:
Training Data: ImageNet-1k
Training Techniques:
- SGD with Momentum
- Weight Decay
- No BN decay
Training Resources: 8x 1080 GPUs
Epochs: 300
Batch Size: 1024
Architecture:
- Shufflenet V1
Paper:
URL: https://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_ShuffleNet_An_Extremely_CVPR_2018_paper.html
Title: "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices"
README: configs/shufflenet_v1/README.md
Code:
URL: https://github.com/open-mmlab/mmpretrain/blob/v0.15.0/mmcls/models/backbones/shufflenet_v1.py#L152
Version: v0.15.0
Models:
- Name: shufflenet-v1-1x_16xb64_in1k
Metadata:
FLOPs: 146000000
Parameters: 1870000
In Collection: Shufflenet V1
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 68.13
Top 5 Accuracy: 87.81
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/shufflenet_v1/shufflenet_v1_batch1024_imagenet_20200804-5d6cec73.pth
    Config: configs/shufflenet_v1/shufflenet-v1-1x_16xb64_in1k.py
```
```python
_base_ = [
'../_base_/models/shufflenet_v1_1x.py',
'../_base_/datasets/imagenet_bs64_pil_resize.py',
'../_base_/schedules/imagenet_bs1024_linearlr_bn_nowd.py',
'../_base_/default_runtime.py'
]
```
# ShuffleNet V2
> [ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design](https://openaccess.thecvf.com/content_ECCV_2018/papers/Ningning_Light-weight_CNN_Architecture_ECCV_2018_paper.pdf)
<!-- [ALGORITHM] -->
## Abstract
Currently, the neural network architecture design is mostly guided by the *indirect* metric of computation complexity, i.e., FLOPs. However, the *direct* metric, e.g., speed, also depends on other factors such as memory access cost and platform characteristics. Thus, this work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs. Based on a series of controlled experiments, this work derives several practical *guidelines* for efficient network design. Accordingly, a new architecture is presented, called *ShuffleNet V2*. Comprehensive ablation experiments verify that our model is the state-of-the-art in terms of speed and accuracy tradeoff.
<div align=center>
<img src="https://user-images.githubusercontent.com/26739999/142576336-e0db2866-3add-44e6-a792-14d4f11bd983.png" width="80%"/>
</div>
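The data flow of the ShuffleNet V2 unit follows directly from those guidelines: split the channels into two equal halves, transform one half, concatenate, and shuffle. A minimal sketch with the convolution branch stubbed out (illustrative only; 116 is the stage width of ShuffleNet V2 1x):
```python
import torch


def shufflenet_v2_unit_skeleton(x: torch.Tensor) -> torch.Tensor:
    """Stride-1 ShuffleNet V2 unit data flow: channel split -> branch ->
    concat -> channel shuffle with two groups. The conv branch is replaced
    by identity to keep the sketch minimal."""
    c = x.shape[1] // 2
    left, right = x[:, :c], x[:, c:]   # equal split (guideline: equal widths)
    # right = conv1x1 -> dwconv3x3 -> conv1x1 in the real unit
    out = torch.cat([left, right], dim=1)
    n, ch, h, w = out.shape
    return out.view(n, 2, ch // 2, h, w).transpose(1, 2).reshape(n, ch, h, w)


print(shufflenet_v2_unit_skeleton(torch.rand(1, 116, 28, 28)).shape)
# torch.Size([1, 116, 28, 28])
```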
## How to use it?
<!-- [TABS-BEGIN] -->
**Predict image**
```python
from mmpretrain import inference_model
predict = inference_model('shufflenet-v2-1x_16xb64_in1k', 'demo/bird.JPEG')
print(predict['pred_class'])
print(predict['pred_score'])
```
**Use the model**
```python
import torch
from mmpretrain import get_model
model = get_model('shufflenet-v2-1x_16xb64_in1k', pretrained=True)
inputs = torch.rand(1, 3, 224, 224)
out = model(inputs)
print(type(out))
# To extract features.
feats = model.extract_feat(inputs)
print(type(feats))
```
**Train/Test Command**
Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset).
Train:
```shell
python tools/train.py configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py
```
Test:
```shell
python tools/test.py configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py https://download.openmmlab.com/mmclassification/v0/shufflenet_v2/shufflenet_v2_batch1024_imagenet_20200812-5bf4721e.pth
```
<!-- [TABS-END] -->
## Models and results
### Image Classification on ImageNet-1k
| Model | Pretrain | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) | Config | Download |
| :----------------------------- | :----------: | :--------: | :-------: | :-------: | :-------: | :---------------------------------------: | :------------------------------------------------------------------------------: |
| `shufflenet-v2-1x_16xb64_in1k` | From scratch | 2.28 | 0.15 | 69.55 | 88.92 | [config](shufflenet-v2-1x_16xb64_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/shufflenet_v2/shufflenet_v2_batch1024_imagenet_20200812-5bf4721e.pth) \| [log](https://download.openmmlab.com/mmclassification/v0/shufflenet_v2/shufflenet_v2_batch1024_imagenet_20200812-5bf4721e.json) |
## Citation
```bibtex
@inproceedings{ma2018shufflenet,
title={Shufflenet v2: Practical guidelines for efficient cnn architecture design},
author={Ma, Ningning and Zhang, Xiangyu and Zheng, Hai-Tao and Sun, Jian},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={116--131},
year={2018}
}
```
```yaml
Collections:
- Name: Shufflenet V2
Metadata:
Training Data: ImageNet-1k
Training Techniques:
- SGD with Momentum
- Weight Decay
- No BN decay
Training Resources: 8x 1080 GPUs
Epochs: 300
Batch Size: 1024
Architecture:
- Shufflenet V2
Paper:
URL: https://openaccess.thecvf.com/content_ECCV_2018/papers/Ningning_Light-weight_CNN_Architecture_ECCV_2018_paper.pdf
Title: "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design"
README: configs/shufflenet_v2/README.md
Code:
URL: https://github.com/open-mmlab/mmpretrain/blob/v0.15.0/mmcls/models/backbones/shufflenet_v2.py#L134
Version: v0.15.0
Models:
- Name: shufflenet-v2-1x_16xb64_in1k
Metadata:
FLOPs: 149000000
Parameters: 2280000
In Collection: Shufflenet V2
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 69.55
Top 5 Accuracy: 88.92
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/shufflenet_v2/shufflenet_v2_batch1024_imagenet_20200812-5bf4721e.pth
    Config: configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py
```
```python
_base_ = [
'../_base_/models/shufflenet_v2_1x.py',
'../_base_/datasets/imagenet_bs64_pil_resize.py',
'../_base_/schedules/imagenet_bs1024_linearlr_bn_nowd.py',
'../_base_/default_runtime.py'
]
```
# SimCLR
> [A simple framework for contrastive learning of visual representations](https://arxiv.org/abs/2002.05709)
<!-- [ALGORITHM] -->
## Abstract
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50.
<div align=center>
<img src="https://user-images.githubusercontent.com/36138628/149723851-cf5f309e-d891-454d-90c0-e5337e5a11ed.png" width="400" />
</div>
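The contrastive objective itself is easy to state in code. Below is a minimal sketch of the NT-Xent (InfoNCE) loss over a batch of paired augmented views (illustrative; the `ContrastiveHead` in the configs below realizes this as a cross-entropy loss, and the temperature of 0.1 matches those configs):
```python
import torch
import torch.nn.functional as F


def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1):
    """NT-Xent loss for N positive pairs (z1[i], z2[i]); the other 2N - 2
    embeddings in the batch serve as negatives."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, D) unit vectors
    sim = z @ z.t() / temperature                 # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))             # exclude self-similarity
    targets = torch.arange(2 * n, device=z.device)
    targets = (targets + n) % (2 * n)             # index of each positive view
    return F.cross_entropy(sim, targets)


z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent(z1, z2).item())
```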
## How to use it?
<!-- [TABS-BEGIN] -->
**Predict image**
```python
from mmpretrain import inference_model
predict = inference_model('resnet50_simclr-200e-pre_8xb512-linear-coslr-90e_in1k', 'demo/bird.JPEG')
print(predict['pred_class'])
print(predict['pred_score'])
```
**Use the model**
```python
import torch
from mmpretrain import get_model
model = get_model('simclr_resnet50_16xb256-coslr-200e_in1k', pretrained=True)
inputs = torch.rand(1, 3, 224, 224)
out = model(inputs)
print(type(out))
# To extract features.
feats = model.extract_feat(inputs)
print(type(feats))
```
**Train/Test Command**
Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset).
Train:
```shell
python tools/train.py configs/simclr/simclr_resnet50_16xb256-coslr-200e_in1k.py
```
Test:
```shell
python tools/test.py configs/simclr/benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-f12c0457.pth
```
<!-- [TABS-END] -->
## Models and results
### Pretrained models
| Model | Params (M) | Flops (G) | Config | Download |
| :---------------------------------------- | :--------: | :-------: | :--------------------------------------------------: | :--------------------------------------------------------------------------------------: |
| `simclr_resnet50_16xb256-coslr-200e_in1k` | 27.97 | 4.11 | [config](simclr_resnet50_16xb256-coslr-200e_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/simclr_resnet50_16xb256-coslr-200e_in1k_20220825-4d9cce50.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/simclr_resnet50_16xb256-coslr-200e_in1k_20220825-4d9cce50.json) |
| `simclr_resnet50_16xb256-coslr-800e_in1k` | 27.97 | 4.11 | [config](simclr_resnet50_16xb256-coslr-800e_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/simclr_resnet50_16xb256-coslr-800e_in1k_20220825-85fcc4de.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/simclr_resnet50_16xb256-coslr-800e_in1k_20220825-85fcc4de.json) |
### Image Classification on ImageNet-1k
| Model | Pretrain | Params (M) | Flops (G) | Top-1 (%) | Config | Download |
| :---------------------------------------- | :------------------------------------------: | :--------: | :-------: | :-------: | :----------------------------------------: | :-------------------------------------------: |
| `resnet50_simclr-200e-pre_8xb512-linear-coslr-90e_in1k` | [SIMCLR 200-Epochs](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/simclr_resnet50_16xb256-coslr-200e_in1k_20220825-4d9cce50.pth) | 25.56 | 4.11 | 66.90 | [config](benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-f12c0457.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-f12c0457.json) |
| `resnet50_simclr-800e-pre_8xb512-linear-coslr-90e_in1k` | [SIMCLR 800-Epochs](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/simclr_resnet50_16xb256-coslr-800e_in1k_20220825-85fcc4de.pth) | 25.56 | 4.11 | 69.20 | [config](benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-b80ae1e5.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-b80ae1e5.json) |
## Citation
```bibtex
@inproceedings{chen2020simple,
title={A simple framework for contrastive learning of visual representations},
author={Chen, Ting and Kornblith, Simon and Norouzi, Mohammad and Hinton, Geoffrey},
booktitle={ICML},
year={2020},
}
```
```python
_base_ = [
'../../_base_/models/resnet50.py',
'../../_base_/datasets/imagenet_bs32_pil_resize.py',
'../../_base_/schedules/imagenet_lars_coslr_90e.py',
'../../_base_/default_runtime.py',
]
model = dict(
backbone=dict(
frozen_stages=4,
init_cfg=dict(type='Pretrained', checkpoint='', prefix='backbone.')))
# dataset summary
train_dataloader = dict(batch_size=512)
# runtime settings
default_hooks = dict(
    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=3))
```
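Note that the benchmark config above leaves `init_cfg.checkpoint` empty; it is meant to be pointed at a pretrained checkpoint at launch time. One way to do that without editing the file, assuming the standard MMEngine `--cfg-options` flag exposed by `tools/train.py`, is:
```shell
python tools/train.py \
    configs/simclr/benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py \
    --cfg-options model.backbone.init_cfg.checkpoint=https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/simclr_resnet50_16xb256-coslr-200e_in1k_20220825-4d9cce50.pth
```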
```yaml
Collections:
- Name: SimCLR
Metadata:
Training Data: ImageNet-1k
Training Techniques:
- LARS
Training Resources: 8x V100 GPUs (b256), 16x A100-80G GPUs (b4096)
Architecture:
- ResNet
- SimCLR
Paper:
Title: A simple framework for contrastive learning of visual representations
URL: https://arxiv.org/abs/2002.05709
README: configs/simclr/README.md
Models:
- Name: simclr_resnet50_16xb256-coslr-200e_in1k
Metadata:
Epochs: 200
Batch Size: 4096
FLOPs: 4109364224
Parameters: 27968832
Training Data: ImageNet-1k
In Collection: SimCLR
Results: null
Weights: https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/simclr_resnet50_16xb256-coslr-200e_in1k_20220825-4d9cce50.pth
Config: configs/simclr/simclr_resnet50_16xb256-coslr-200e_in1k.py
Downstream:
- resnet50_simclr-200e-pre_8xb512-linear-coslr-90e_in1k
- Name: simclr_resnet50_16xb256-coslr-800e_in1k
Metadata:
      Epochs: 800
Batch Size: 4096
FLOPs: 4109364224
Parameters: 27968832
Training Data: ImageNet-1k
In Collection: SimCLR
Results: null
Weights: https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/simclr_resnet50_16xb256-coslr-800e_in1k_20220825-85fcc4de.pth
Config: configs/simclr/simclr_resnet50_16xb256-coslr-800e_in1k.py
Downstream:
- resnet50_simclr-800e-pre_8xb512-linear-coslr-90e_in1k
- Name: resnet50_simclr-200e-pre_8xb512-linear-coslr-90e_in1k
Metadata:
Epochs: 90
Batch Size: 4096
FLOPs: 4109464576
Parameters: 25557032
Training Data: ImageNet-1k
In Collection: SimCLR
Results:
- Task: Image Classification
Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 66.9
Weights: https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-200e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-f12c0457.pth
Config: configs/simclr/benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py
- Name: resnet50_simclr-800e-pre_8xb512-linear-coslr-90e_in1k
Metadata:
Epochs: 90
Batch Size: 4096
FLOPs: 4109464576
Parameters: 25557032
Training Data: ImageNet-1k
In Collection: SimCLR
Results:
- Task: Image Classification
Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 69.2
Weights: https://download.openmmlab.com/mmselfsup/1.x/simclr/simclr_resnet50_16xb256-coslr-800e_in1k/resnet50_linear-8xb512-coslr-90e_in1k/resnet50_linear-8xb512-coslr-90e_in1k_20220825-b80ae1e5.pth
    Config: configs/simclr/benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py
```
```python
_base_ = [
'../_base_/datasets/imagenet_bs32_simclr.py',
'../_base_/schedules/imagenet_lars_coslr_200e.py',
'../_base_/default_runtime.py',
]
# dataset settings
train_dataloader = dict(batch_size=256)
# model settings
model = dict(
type='SimCLR',
backbone=dict(
type='ResNet',
depth=50,
norm_cfg=dict(type='SyncBN'),
zero_init_residual=True),
neck=dict(
type='NonLinearNeck', # SimCLR non-linear neck
in_channels=2048,
hid_channels=2048,
out_channels=128,
num_layers=2,
with_avg_pool=True),
head=dict(
type='ContrastiveHead',
loss=dict(type='CrossEntropyLoss'),
temperature=0.1),
)
# optimizer
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='LARS', lr=4.8, momentum=0.9, weight_decay=1e-6),
paramwise_cfg=dict(
custom_keys={
'bn': dict(decay_mult=0, lars_exclude=True),
'bias': dict(decay_mult=0, lars_exclude=True),
# bn layer in ResNet block downsample module
'downsample.1': dict(decay_mult=0, lars_exclude=True),
}))
# runtime settings
default_hooks = dict(
# only keeps the latest 3 checkpoints
    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=3))
```
```python
_base_ = [
'../_base_/datasets/imagenet_bs32_simclr.py',
'../_base_/schedules/imagenet_lars_coslr_200e.py',
'../_base_/default_runtime.py',
]
# model settings
model = dict(
type='SimCLR',
backbone=dict(
type='ResNet',
depth=50,
norm_cfg=dict(type='SyncBN'),
zero_init_residual=True),
neck=dict(
type='NonLinearNeck', # SimCLR non-linear neck
in_channels=2048,
hid_channels=2048,
out_channels=128,
num_layers=2,
with_avg_pool=True),
head=dict(
type='ContrastiveHead',
loss=dict(type='CrossEntropyLoss'),
temperature=0.1),
)
# optimizer
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='LARS', lr=4.8, momentum=0.9, weight_decay=1e-6),
paramwise_cfg=dict(
custom_keys={
'bn': dict(decay_mult=0, lars_exclude=True),
'bias': dict(decay_mult=0, lars_exclude=True),
# bn layer in ResNet block downsample module
'downsample.1': dict(decay_mult=0, lars_exclude=True),
}))
# learning rate scheduler
param_scheduler = [
dict(
type='LinearLR',
start_factor=1e-4,
by_epoch=True,
begin=0,
end=10,
convert_to_iter_based=True),
dict(
type='CosineAnnealingLR', T_max=790, by_epoch=True, begin=10, end=800)
]
# runtime settings
train_cfg = dict(max_epochs=800)
default_hooks = dict(
# only keeps the latest 3 checkpoints
    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=3))
```
```python
_base_ = [
'../_base_/datasets/imagenet_bs32_simclr.py',
'../_base_/schedules/imagenet_lars_coslr_200e.py',
'../_base_/default_runtime.py',
]
# model settings
model = dict(
type='SimCLR',
backbone=dict(
type='ResNet',
depth=50,
norm_cfg=dict(type='SyncBN'),
zero_init_residual=True),
neck=dict(
type='NonLinearNeck', # SimCLR non-linear neck
in_channels=2048,
hid_channels=2048,
out_channels=128,
num_layers=2,
with_avg_pool=True),
head=dict(
type='ContrastiveHead',
loss=dict(type='CrossEntropyLoss'),
temperature=0.1),
)
# optimizer
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='LARS', lr=0.3, momentum=0.9, weight_decay=1e-6),
paramwise_cfg=dict(
custom_keys={
'bn': dict(decay_mult=0, lars_exclude=True),
'bias': dict(decay_mult=0, lars_exclude=True),
# bn layer in ResNet block downsample module
'downsample.1': dict(decay_mult=0, lars_exclude=True),
}))
# runtime settings
default_hooks = dict(
# only keeps the latest 3 checkpoints
checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=3))
# NOTE: `auto_scale_lr` is for automatically scaling LR
# based on the actual training batch size.
auto_scale_lr = dict(base_batch_size=256)
```
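This small-batch variant keeps the base learning rate of 0.3 and anchors `auto_scale_lr` at a total batch size of 256, while the 16xb256 configs above hard-code `lr=4.8`. Both are instances of the same linear scaling rule, sketched here:
```python
# Linear LR scaling rule (a sketch of what auto_scale_lr applies when
# enabled): lr = base_lr * total_batch_size / base_batch_size.
base_lr, base_batch_size = 0.3, 256
for total_batch in (256, 4096):  # 8x32 and 16x256 launches
    print(total_batch, base_lr * total_batch / base_batch_size)
# 256 -> 0.3, 4096 -> 4.8
```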
# SimMIM
> [SimMIM: A Simple Framework for Masked Image Modeling](https://arxiv.org/abs/2111.09886)
<!-- [ALGORITHM] -->
## Abstract
This paper presents SimMIM, a simple framework for masked image modeling. We simplify recently proposed related approaches without special designs such as blockwise masking and tokenization via discrete VAE or clustering. To study what lets the masked image modeling task learn good representations, we systematically study the major components in our framework, and find that simple designs of each component have revealed very strong representation learning performance: 1) random masking of the input image with a moderately large masked patch size (e.g., 32) makes a strong pre-text task; 2) predicting raw pixels of RGB values by direct regression performs no worse than the patch classification approaches with complex designs; 3) the prediction head can be as light as a linear layer, with no worse performance than heavier ones. Using ViT-B, our approach achieves 83.8% top-1 fine-tuning accuracy on ImageNet-1K by pre-training also on this dataset, surpassing the previous best approach by +0.6%. When applied to a larger model of about 650 million parameters, SwinV2-H, it achieves 87.1% top-1 accuracy on ImageNet-1K using only ImageNet-1K data. We also leverage this approach to facilitate the training of a 3B model (SwinV2-G): with 40× less data than in previous practice, we achieve the state-of-the-art on four representative vision benchmarks. The code and models will be publicly available at https://github.com/microsoft/SimMIM.
<div align=center>
<img src="https://user-images.githubusercontent.com/30762564/159404597-ac6d3a44-ee59-4cdc-8f6f-506a7d1b18b6.png" width="70%"/>
</div>
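The masking strategy in finding (1) is simple to sketch. Below, the patch size of 32 comes from the abstract, while the 0.6 mask ratio is an assumption for illustration; the loss follows finding (2), plain regression on the masked pixels:
```python
import torch

PATCH, RATIO = 32, 0.6  # patch size per the abstract; ratio is illustrative


def random_patch_mask(h: int, w: int) -> torch.Tensor:
    """Random masking over a grid of large patches; True marks a masked patch."""
    return torch.rand(h // PATCH, w // PATCH) < RATIO


mask = random_patch_mask(192, 192)                  # 6x6 grid of 32px patches
pred = torch.rand(3, 192, 192)                      # head output: raw RGB regression
target = torch.rand(3, 192, 192)
pixel_mask = mask.repeat_interleave(PATCH, 0).repeat_interleave(PATCH, 1)
loss = (pred - target).abs()[:, pixel_mask].mean()  # L1 on masked pixels only
print(mask.float().mean().item(), loss.item())
```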
## How to use it?
<!-- [TABS-BEGIN] -->
**Predict image**
```python
from mmpretrain import inference_model
predict = inference_model('swin-base-w6_simmim-100e-pre_8xb256-coslr-100e_in1k-192px', 'demo/bird.JPEG')
print(predict['pred_class'])
print(predict['pred_score'])
```
**Use the model**
```python
import torch
from mmpretrain import get_model
model = get_model('simmim_swin-base-w6_8xb256-amp-coslr-100e_in1k-192px', pretrained=True)
inputs = torch.rand(1, 3, 192, 192)  # match the 192px pretraining resolution
out = model(inputs)
print(type(out))
# To extract features.
feats = model.extract_feat(inputs)
print(type(feats))
```
**Train/Test Command**
Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset).
Train:
```shell
python tools/train.py configs/simmim/simmim_swin-base-w6_8xb256-amp-coslr-100e_in1k-192px.py
```
Test:
```shell
python tools/test.py configs/simmim/benchmarks/swin-base-w6_8xb256-coslr-100e_in1k-192px.py https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/swin-base_ft-8xb256-coslr-100e_in1k/swin-base_ft-8xb256-coslr-100e_in1k_20220829-9cf23aa1.pth
```
<!-- [TABS-END] -->
## Models and results
### Pretrained models
| Model | Params (M) | Flops (G) | Config | Download |
| :-------------------------------------------------------- | :--------: | :-------: | :-----------------------------------------------------------: | :-------------------------------------------------------------: |
| `simmim_swin-base-w6_8xb256-amp-coslr-100e_in1k-192px` | 89.87 | 18.83 | [config](simmim_swin-base-w6_8xb256-amp-coslr-100e_in1k-192px.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192_20220829-0e15782d.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192_20220829-0e15782d.json) |
| `simmim_swin-base-w6_16xb128-amp-coslr-800e_in1k-192px` | 89.87 | 18.83 | [config](simmim_swin-base-w6_16xb128-amp-coslr-800e_in1k-192px.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192_20220916-a0e931ac.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192_20220916-a0e931ac.json) |
| `simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px` | 199.92 | 55.85 | [config](simmim_swin-large-w12_16xb128-amp-coslr-800e_in1k-192px.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192_20220916-4ad216d3.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192_20220916-4ad216d3.json) |
### Image Classification on ImageNet-1k
| Model | Pretrain | Params (M) | Flops (G) | Top-1 (%) | Config | Download |
| :---------------------------------------- | :------------------------------------------: | :--------: | :-------: | :-------: | :----------------------------------------: | :-------------------------------------------: |
| `swin-base-w6_simmim-100e-pre_8xb256-coslr-100e_in1k-192px` | [SIMMIM 100-Epochs](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192_20220829-0e15782d.pth) | 87.75 | 11.30 | 82.70 | [config](benchmarks/swin-base-w6_8xb256-coslr-100e_in1k-192px.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/swin-base_ft-8xb256-coslr-100e_in1k/swin-base_ft-8xb256-coslr-100e_in1k_20220829-9cf23aa1.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/swin-base_ft-8xb256-coslr-100e_in1k/swin-base_ft-8xb256-coslr-100e_in1k_20220829-9cf23aa1.json) |
| `swin-base-w7_simmim-100e-pre_8xb256-coslr-100e_in1k` | [SIMMIM 100-Epochs](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192/simmim_swin-base_8xb256-amp-coslr-100e_in1k-192_20220829-0e15782d.pth) | 87.77 | 15.47 | 83.50 | [config](benchmarks/swin-base-w7_8xb256-coslr-100e_in1k.py) | N/A |
| `swin-base-w6_simmim-800e-pre_8xb256-coslr-100e_in1k-192px` | [SIMMIM 800-Epochs](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192_20220916-a0e931ac.pth) | 87.77 | 15.47 | 83.80 | [config](benchmarks/swin-base-w7_8xb256-coslr-100e_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192/swin-base_ft-8xb256-coslr-100e_in1k-224/swin-base_ft-8xb256-coslr-100e_in1k-224_20221208-155cc6e6.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-base_16xb128-amp-coslr-800e_in1k-192/swin-base_ft-8xb256-coslr-100e_in1k-224/swin-base_ft-8xb256-coslr-100e_in1k-224_20221208-155cc6e6.json) |
| `swin-large-w14_simmim-800e-pre_8xb256-coslr-100e_in1k` | [SIMMIM 800-Epochs](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192_20220916-4ad216d3.pth) | 196.85 | 38.85 | 84.80 | [config](benchmarks/swin-large-w14_8xb256-coslr-100e_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192/swin-large_ft-8xb256-coslr-ws14-100e_in1k-224/swin-large_ft-8xb256-coslr-ws14-100e_in1k-224_20220916-d4865790.pth) \| [log](https://download.openmmlab.com/mmselfsup/1.x/simmim/simmim_swin-large_16xb128-amp-coslr-800e_in1k-192/swin-large_ft-8xb256-coslr-ws14-100e_in1k-224/swin-large_ft-8xb256-coslr-ws14-100e_in1k-224_20220916-d4865790.json) |
## Citation
```bibtex
@inproceedings{xie2021simmim,
title={SimMIM: A Simple Framework for Masked Image Modeling},
author={Xie, Zhenda and Zhang, Zheng and Cao, Yue and Lin, Yutong and Bao, Jianmin and Yao, Zhuliang and Dai, Qi and Hu, Han},
booktitle={International Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2022}
}
```
```python
_base_ = [
'../../_base_/models/swin_transformer/base_224.py',
'../../_base_/datasets/imagenet_bs256_swin_192.py',
'../../_base_/default_runtime.py'
]
# model settings
model = dict(
backbone=dict(
img_size=192,
drop_path_rate=0.1,
stage_cfgs=dict(block_cfgs=dict(window_size=6)),
init_cfg=dict(type='Pretrained', checkpoint='', prefix='backbone.')))
# optimizer settings
optim_wrapper = dict(
type='AmpOptimWrapper',
optimizer=dict(type='AdamW', lr=5e-3, weight_decay=0.05),
clip_grad=dict(max_norm=5.0),
constructor='LearningRateDecayOptimWrapperConstructor',
paramwise_cfg=dict(
layer_decay_rate=0.9,
custom_keys={
'.norm': dict(decay_mult=0.0),
'.bias': dict(decay_mult=0.0),
'.absolute_pos_embed': dict(decay_mult=0.0),
'.relative_position_bias_table': dict(decay_mult=0.0)
}))
# learning rate scheduler
param_scheduler = [
dict(
type='LinearLR',
start_factor=2.5e-7 / 1.25e-3,
by_epoch=True,
begin=0,
end=20,
convert_to_iter_based=True),
dict(
type='CosineAnnealingLR',
T_max=80,
eta_min=2.5e-7 * 2048 / 512,
by_epoch=True,
begin=20,
end=100,
convert_to_iter_based=True)
]
# runtime settings
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=100)
val_cfg = dict()
test_cfg = dict()
default_hooks = dict(
# save checkpoint per epoch.
checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3),
logger=dict(type='LoggerHook', interval=100))
randomness = dict(seed=0)
```
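For reference, the `LearningRateDecayOptimWrapperConstructor` with `layer_decay_rate=0.9` above assigns smaller learning rates to layers closer to the input. A sketch of the arithmetic (the 12-layer depth is illustrative, not the actual Swin-B block count):
```python
# Layer-wise LR decay: each layer's LR is base_lr * rate ** (num_layers - i),
# so the deepest layers keep the full base LR of the config (5e-3).
base_lr, rate, num_layers = 5e-3, 0.9, 12
for layer_id in (0, 6, 12):
    print(layer_id, base_lr * rate ** (num_layers - layer_id))
# 0 -> ~1.41e-3, 6 -> ~2.66e-3, 12 -> 5e-3
```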