Add mmclassification-0.24.1 code, remove mmclassification-speed-benchmark

Commit 0fd8347d authored Jan 08, 2023 by unknown
20 changed files
--- a/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v2/metafile.yml
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v2/metafile.yml
+Collections:
+  - Name: MobileNet V2
+    Metadata:
+      Training Data: ImageNet-1k
+      Training Techniques:
+        - SGD with Momentum
+        - Weight Decay
+      Training Resources: 8x V100 GPUs
+      Epochs: 300
+      Batch Size: 256
+      Architecture:
+        - MobileNet V2
+    Paper:
+      URL: https://arxiv.org/abs/1801.04381
+      Title: "MobileNetV2: Inverted Residuals and Linear Bottlenecks"
+    README: configs/mobilenet_v2/README.md
+    Code:
+      URL: https://github.com/open-mmlab/mmclassification/blob/v0.15.0/mmcls/models/backbones/mobilenet_v2.py#L101
+      Version: v0.15.0
+Models:
+  - Name: mobilenet-v2_8xb32_in1k
+    Metadata:
+      FLOPs: 319000000
+      Parameters: 3500000
+    In Collection: MobileNet V2
+    Results:
+      - Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 71.86
+          Top 5 Accuracy: 90.42
+        Task: Image Classification
+    Weights: https://download.openmmlab.com/mmclassification/v0/mobilenet_v2/mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth
+    Config: configs/mobilenet_v2/mobilenet-v2_8xb32_in1k.py
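Note on the model name: `mobilenet-v2_8xb32_in1k` appears to encode the GPU count and the per-GPU batch size, which lines up with the collection metadata above (8x V100 GPUs, Batch Size 256). The sketch below checks that arithmetic. The helper `global_batch_size` and the `<gpus>xb<batch>` pattern are assumptions made for illustration; they are not part of this commit.

```python
import re


def global_batch_size(config_name: str) -> int:
    """Return GPUs x samples-per-GPU parsed from a name like 'mobilenet-v2_8xb32_in1k'.

    Hypothetical helper: assumes the name contains a '_<gpus>xb<batch>_' field.
    """
    match = re.search(r"_(\d+)xb(\d+)_", config_name)
    if match is None:
        raise ValueError(f"no '<gpus>xb<batch>' field in {config_name!r}")
    gpus, per_gpu = (int(group) for group in match.groups())
    return gpus * per_gpu


print(global_batch_size("mobilenet-v2_8xb32_in1k"))  # 256
```

The same helper raises `ValueError` for the deprecated names such as `mobilenet_v2_b32x8_imagenet`, which use a different field order.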
--- a/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v2/mobilenet-v2_8xb32_in1k.py
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v2/mobilenet-v2_8xb32_in1k.py
+_base_ = [
+    '../_base_/models/mobilenet_v2_1x.py',
+    '../_base_/datasets/imagenet_bs32_pil_resize.py',
+    '../_base_/schedules/imagenet_bs256_epochstep.py',
+    '../_base_/default_runtime.py'
+]
+#fp16 = dict(loss_scale=512.)
--- a/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v2/mobilenet_v2_b32x8_imagenet.py
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v2/mobilenet_v2_b32x8_imagenet.py
+_base_ = 'mobilenet-v2_8xb32_in1k.py'
+_deprecation_ = dict(
+    expected='mobilenet-v2_8xb32_in1k.py',
+    reference='https://github.com/open-mmlab/mmclassification/pull/508',
+)
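Note on the alias file: the old name inherits everything from the renamed config through `_base_` and carries a `_deprecation_` dict naming the replacement. How the config loader consumes that dict is not shown in this diff; a minimal sketch of one plausible handling, with a hypothetical `resolve_deprecated` helper, might look like this:

```python
import warnings


def resolve_deprecated(name: str, cfg: dict) -> str:
    """Return the config name to use, warning if `cfg` marks `name` as deprecated.

    Hypothetical helper for illustration only; it is not the loader's actual code.
    """
    deprecation = cfg.get("_deprecation_")
    if deprecation is None:
        return name
    warnings.warn(
        f"{name} is deprecated, use {deprecation['expected']} instead "
        f"(see {deprecation['reference']})",
        DeprecationWarning,
    )
    return deprecation["expected"]


alias_cfg = {
    "_base_": "mobilenet-v2_8xb32_in1k.py",
    "_deprecation_": {
        "expected": "mobilenet-v2_8xb32_in1k.py",
        "reference": "https://github.com/open-mmlab/mmclassification/pull/508",
    },
}
print(resolve_deprecated("mobilenet_v2_b32x8_imagenet.py", alias_cfg))
```

Keeping the alias as a one-line `_base_` reference means the old and new names cannot drift apart.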
--- a/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/README.md
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/README.md
+# MobileNet V3
+> [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)
+<!-- [ALGORITHM] -->
+## Abstract
+We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances. This paper starts the exploration of how automated search algorithms and network design can work together to harness complementary approaches improving the overall state of the art. Through this process we create two new MobileNet models for release: MobileNetV3-Large and MobileNetV3-Small which are targeted for high and low resource use cases. These models are then adapted and applied to the tasks of object detection and semantic segmentation. For the task of semantic segmentation (or any dense pixel prediction), we propose a new efficient segmentation decoder Lite Reduced Atrous Spatial Pyramid Pooling (LR-ASPP). We achieve new state of the art results for mobile classification, detection and segmentation. MobileNetV3-Large is 3.2% more accurate on ImageNet classification while reducing latency by 15% compared to MobileNetV2. MobileNetV3-Small is 4.6% more accurate while reducing latency by 5% compared to MobileNetV2. MobileNetV3-Large detection is 25% faster at roughly the same accuracy as MobileNetV2 on COCO detection. MobileNetV3-Large LR-ASPP is 30% faster than MobileNetV2 R-ASPP at similar accuracy for Cityscapes segmentation.
+<div align=center>
+<img src="https://user-images.githubusercontent.com/26739999/142563801-ef4feacc-ecd7-4d14-a411-8c9d63571749.png" width="70%"/>
+</div>
+## Results and models
+### ImageNet-1k
+|        Model        | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) |                                  Config                                  |                                  Download                                  |
+| :-----------------: | :-------: | :------: | :-------: | :-------: | :----------------------------------------------------------------------: | :------------------------------------------------------------------------: |
+| MobileNetV3-Small\* |   2.54    |   0.06   |   67.66   |   87.41   | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/mobilenet_v3/mobilenet-v3-small_8xb32_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/mobilenet_v3/convert/mobilenet_v3_small-8427ecf0.pth) |
+| MobileNetV3-Large\* |   5.48    |   0.23   |   74.04   |   91.34   | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/mobilenet_v3/mobilenet-v3-large_8xb32_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/mobilenet_v3/convert/mobilenet_v3_large-3ea3c186.pth) |
+*Models with * are converted from [torchvision](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv3.html). The config files of these models are only for validation. We don't ensure these config files' training accuracy and welcome you to contribute your reproduction results.*
+## Citation
+```bibtex
+@inproceedings{Howard_2019_ICCV,
+    author = {Howard, Andrew and Sandler, Mark and Chu, Grace and Chen, Liang-Chieh and Chen, Bo and Tan, Mingxing and Wang, Weijun and Zhu, Yukun and Pang, Ruoming and Vasudevan, Vijay and Le, Quoc V. and Adam, Hartwig},
+    title = {Searching for MobileNetV3},
+    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
+    month = {October},
+    year = {2019}
+}
+```
--- a/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/metafile.yml
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/metafile.yml
+Collections:
+  - Name: MobileNet V3
+    Metadata:
+      Training Data: ImageNet-1k
+      Training Techniques:
+        - RMSprop with Momentum
+        - Weight Decay
+      Training Resources: 8x V100 GPUs
+      Epochs: 600
+      Batch Size: 1024
+      Architecture:
+        - MobileNet V3
+    Paper:
+      URL: https://arxiv.org/abs/1905.02244
+      Title: Searching for MobileNetV3
+    README: configs/mobilenet_v3/README.md
+    Code:
+      URL: https://github.com/open-mmlab/mmclassification/blob/v0.15.0/mmcls/models/backbones/mobilenet_v3.py
+      Version: v0.15.0
+Models:
+  - Name: mobilenet_v3_small_imagenet
+    Metadata:
+      FLOPs: 60000000
+      Parameters: 2540000
+    In Collection: MobileNet V3
+    Results:
+      - Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 67.66
+          Top 5 Accuracy: 87.41
+        Task: Image Classification
+    Weights: https://download.openmmlab.com/mmclassification/v0/mobilenet_v3/convert/mobilenet_v3_small-8427ecf0.pth
+    Config: configs/mobilenet_v3/mobilenet-v3-small_8xb32_in1k.py
+  - Name: mobilenet_v3_large_imagenet
+    Metadata:
+      FLOPs: 230000000
+      Parameters: 5480000
+    In Collection: MobileNet V3
+    Results:
+      - Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 74.04
+          Top 5 Accuracy: 91.34
+        Task: Image Classification
+    Weights: https://download.openmmlab.com/mmclassification/v0/mobilenet_v3/convert/mobilenet_v3_large-3ea3c186.pth
+    Config: configs/mobilenet_v3/mobilenet-v3-large_8xb32_in1k.py
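Note on units: the metafile stores raw counts (`FLOPs: 60000000`, `Parameters: 2540000`), while the README table reports `Flops(G)` and `Params(M)`. A small consistency check, sketched under the assumption that G means 1e9 and M means 1e6 (the helper name is illustrative):

```python
def to_readme_units(flops: int, parameters: int) -> tuple:
    """Convert raw metafile counts to (GFLOPs, millions of parameters), rounded to 2 places."""
    return round(flops / 1e9, 2), round(parameters / 1e6, 2)


# Values copied from the two model entries in this metafile.
print(to_readme_units(60000000, 2540000))   # MobileNetV3-Small
print(to_readme_units(230000000, 5480000))  # MobileNetV3-Large
```

Both results can be compared against the `Params(M)` and `Flops(G)` columns of the README table in the same directory.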
--- a/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/mobilenet-v3-large_8xb32_in1k.py
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/mobilenet-v3-large_8xb32_in1k.py
+# Refer to https://pytorch.org/blog/ml-models-torchvision-v0.9/#classification
+# ----------------------------
+# -[x] auto_augment='imagenet'
+# -[x] batch_size=128 (per gpu)
+# -[x] epochs=600
+# -[x] opt='rmsprop'
+#     -[x] lr=0.064
+#     -[x] eps=0.0316
+#     -[x] alpha=0.9
+#     -[x] weight_decay=1e-05
+#     -[x] momentum=0.9
+# -[x] lr_gamma=0.973
+# -[x] lr_step_size=2
+# -[x] nproc_per_node=8
+# -[x] random_erase=0.2
+# -[x] workers=16 (workers_per_gpu)
+# - modify: RandomErasing use RE-M instead of RE-0
+_base_ = [
+    '../_base_/models/mobilenet_v3_large_imagenet.py',
+    '../_base_/datasets/imagenet_bs32_pil_resize.py',
+    '../_base_/default_runtime.py'
+]
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+policies = [
+    [
+        dict(type='Posterize', bits=4, prob=0.4),
+        dict(type='Rotate', angle=30., prob=0.6)
+    ],
+    [
+        dict(type='Solarize', thr=256 / 9 * 4, prob=0.6),
+        dict(type='AutoContrast', prob=0.6)
+    ],
+    [dict(type='Equalize', prob=0.8),
+     dict(type='Equalize', prob=0.6)],
+    [
+        dict(type='Posterize', bits=5, prob=0.6),
+        dict(type='Posterize', bits=5, prob=0.6)
+    ],
+    [
+        dict(type='Equalize', prob=0.4),
+        dict(type='Solarize', thr=256 / 9 * 5, prob=0.2)
+    ],
+    [
+        dict(type='Equalize', prob=0.4),
+        dict(type='Rotate', angle=30 / 9 * 8, prob=0.8)
+    ],
+    [
+        dict(type='Solarize', thr=256 / 9 * 6, prob=0.6),
+        dict(type='Equalize', prob=0.6)
+    ],
+    [dict(type='Posterize', bits=6, prob=0.8),
+     dict(type='Equalize', prob=1.)],
+    [
+        dict(type='Rotate', angle=10., prob=0.2),
+        dict(type='Solarize', thr=256 / 9, prob=0.6)
+    ],
+    [
+        dict(type='Equalize', prob=0.6),
+        dict(type='Posterize', bits=5, prob=0.4)
+    ],
+    [
+        dict(type='Rotate', angle=30 / 9 * 8, prob=0.8),
+        dict(type='ColorTransform', magnitude=0., prob=0.4)
+    ],
+    [
+        dict(type='Rotate', angle=30., prob=0.4),
+        dict(type='Equalize', prob=0.6)
+    ],
+    [dict(type='Equalize', prob=0.0),
+     dict(type='Equalize', prob=0.8)],
+    [dict(type='Invert', prob=0.6),
+     dict(type='Equalize', prob=1.)],
+    [
+        dict(type='ColorTransform', magnitude=0.4, prob=0.6),
+        dict(type='Contrast', magnitude=0.8, prob=1.)
+    ],
+    [
+        dict(type='Rotate', angle=30 / 9 * 8, prob=0.8),
+        dict(type='ColorTransform', magnitude=0.2, prob=1.)
+    ],
+    [
+        dict(type='ColorTransform', magnitude=0.8, prob=0.8),
+        dict(type='Solarize', thr=256 / 9 * 2, prob=0.8)
+    ],
+    [
+        dict(type='Sharpness', magnitude=0.7, prob=0.4),
+        dict(type='Invert', prob=0.6)
+    ],
+    [
+        dict(
+            type='Shear',
+            magnitude=0.3 / 9 * 5,
+            prob=0.6,
+            direction='horizontal'),
+        dict(type='Equalize', prob=1.)
+    ],
+    [
+        dict(type='ColorTransform', magnitude=0., prob=0.4),
+        dict(type='Equalize', prob=0.6)
+    ],
+    [
+        dict(type='Equalize', prob=0.4),
+        dict(type='Solarize', thr=256 / 9 * 5, prob=0.2)
+    ],
+    [
+        dict(type='Solarize', thr=256 / 9 * 4, prob=0.6),
+        dict(type='AutoContrast', prob=0.6)
+    ],
+    [dict(type='Invert', prob=0.6),
+     dict(type='Equalize', prob=1.)],
+    [
+        dict(type='ColorTransform', magnitude=0.4, prob=0.6),
+        dict(type='Contrast', magnitude=0.8, prob=1.)
+    ],
+    [dict(type='Equalize', prob=0.8),
+     dict(type='Equalize', prob=0.6)],
+]
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='RandomResizedCrop', size=224, backend='pillow'),
+    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
+    dict(type='AutoAugment', policies=policies),
+    dict(
+        type='RandomErasing',
+        erase_prob=0.2,
+        mode='const',
+        min_area_ratio=0.02,
+        max_area_ratio=1 / 3,
+        fill_color=img_norm_cfg['mean']),
+    dict(type='Normalize', **img_norm_cfg),
+    dict(type='ImageToTensor', keys=['img']),
+    dict(type='ToTensor', keys=['gt_label']),
+    dict(type='Collect', keys=['img', 'gt_label'])
+]
+data = dict(
+    samples_per_gpu=128,
+    workers_per_gpu=4,
+    train=dict(pipeline=train_pipeline))
+evaluation = dict(interval=10, metric='accuracy')
+# optimizer
+optimizer = dict(
+    type='RMSprop',
+    lr=0.064,
+    alpha=0.9,
+    momentum=0.9,
+    eps=0.0316,
+    weight_decay=1e-5)
+optimizer_config = dict(grad_clip=None)
+# learning policy
+lr_config = dict(policy='step', step=2, gamma=0.973, by_epoch=True)
+runner = dict(type='EpochBasedRunner', max_epochs=600)
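Note on the schedule: `lr_config = dict(policy='step', step=2, gamma=0.973, by_epoch=True)` together with `lr=0.064` describes a learning rate that is multiplied by 0.973 every 2 epochs. The sketch below assumes the usual step-decay rule, `base_lr * gamma ** (epoch // step)`; it does not model any minimum learning rate or warmup the runner might apply.

```python
def step_lr(epoch: int, base_lr: float = 0.064, step: int = 2, gamma: float = 0.973) -> float:
    """Learning rate at a given epoch under step decay (assumed rule, see lead-in)."""
    return base_lr * gamma ** (epoch // step)


# Print the rate at a few points of the 600-epoch run.
for epoch in (0, 1, 2, 100, 599):
    print(epoch, step_lr(epoch))
```

With `step=2`, epochs 0 and 1 share the base rate and the first decay takes effect at epoch 2.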
--- a/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/mobilenet-v3-small_8xb16_cifar10.py
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/mobilenet-v3-small_8xb16_cifar10.py
+_base_ = [
+    '../_base_/models/mobilenet-v3-small_cifar.py',
+    '../_base_/datasets/cifar10_bs16.py',
+    '../_base_/schedules/cifar10_bs128.py', '../_base_/default_runtime.py'
+]
+lr_config = dict(policy='step', step=[120, 170])
+runner = dict(type='EpochBasedRunner', max_epochs=200)
--- a/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/mobilenet-v3-small_8xb32_in1k.py
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/mobilenet-v3-small_8xb32_in1k.py
+# Refer to https://pytorch.org/blog/ml-models-torchvision-v0.9/#classification
+# ----------------------------
+# -[x] auto_augment='imagenet'
+# -[x] batch_size=128 (per gpu)
+# -[x] epochs=600
+# -[x] opt='rmsprop'
+#     -[x] lr=0.064
+#     -[x] eps=0.0316
+#     -[x] alpha=0.9
+#     -[x] weight_decay=1e-05
+#     -[x] momentum=0.9
+# -[x] lr_gamma=0.973
+# -[x] lr_step_size=2
+# -[x] nproc_per_node=8
+# -[x] random_erase=0.2
+# -[x] workers=16 (workers_per_gpu)
+# - modify: RandomErasing use RE-M instead of RE-0
+_base_ = [
+    '../_base_/models/mobilenet_v3_small_imagenet.py',
+    '../_base_/datasets/imagenet_bs32_pil_resize.py',
+    '../_base_/default_runtime.py'
+]
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+policies = [
+    [
+        dict(type='Posterize', bits=4, prob=0.4),
+        dict(type='Rotate', angle=30., prob=0.6)
+    ],
+    [
+        dict(type='Solarize', thr=256 / 9 * 4, prob=0.6),
+        dict(type='AutoContrast', prob=0.6)
+    ],
+    [dict(type='Equalize', prob=0.8),
+     dict(type='Equalize', prob=0.6)],
+    [
+        dict(type='Posterize', bits=5, prob=0.6),
+        dict(type='Posterize', bits=5, prob=0.6)
+    ],
+    [
+        dict(type='Equalize', prob=0.4),
+        dict(type='Solarize', thr=256 / 9 * 5, prob=0.2)
+    ],
+    [
+        dict(type='Equalize', prob=0.4),
+        dict(type='Rotate', angle=30 / 9 * 8, prob=0.8)
+    ],
+    [
+        dict(type='Solarize', thr=256 / 9 * 6, prob=0.6),
+        dict(type='Equalize', prob=0.6)
+    ],
+    [dict(type='Posterize', bits=6, prob=0.8),
+     dict(type='Equalize', prob=1.)],
+    [
+        dict(type='Rotate', angle=10., prob=0.2),
+        dict(type='Solarize', thr=256 / 9, prob=0.6)
+    ],
+    [
+        dict(type='Equalize', prob=0.6),
+        dict(type='Posterize', bits=5, prob=0.4)
+    ],
+    [
+        dict(type='Rotate', angle=30 / 9 * 8, prob=0.8),
+        dict(type='ColorTransform', magnitude=0., prob=0.4)
+    ],
+    [
+        dict(type='Rotate', angle=30., prob=0.4),
+        dict(type='Equalize', prob=0.6)
+    ],
+    [dict(type='Equalize', prob=0.0),
+     dict(type='Equalize', prob=0.8)],
+    [dict(type='Invert', prob=0.6),
+     dict(type='Equalize', prob=1.)],
+    [
+        dict(type='ColorTransform', magnitude=0.4, prob=0.6),
+        dict(type='Contrast', magnitude=0.8, prob=1.)
+    ],
+    [
+        dict(type='Rotate', angle=30 / 9 * 8, prob=0.8),
+        dict(type='ColorTransform', magnitude=0.2, prob=1.)
+    ],
+    [
+        dict(type='ColorTransform', magnitude=0.8, prob=0.8),
+        dict(type='Solarize', thr=256 / 9 * 2, prob=0.8)
+    ],
+    [
+        dict(type='Sharpness', magnitude=0.7, prob=0.4),
+        dict(type='Invert', prob=0.6)
+    ],
+    [
+        dict(
+            type='Shear',
+            magnitude=0.3 / 9 * 5,
+            prob=0.6,
+            direction='horizontal'),
+        dict(type='Equalize', prob=1.)
+    ],
+    [
+        dict(type='ColorTransform', magnitude=0., prob=0.4),
+        dict(type='Equalize', prob=0.6)
+    ],
+    [
+        dict(type='Equalize', prob=0.4),
+        dict(type='Solarize', thr=256 / 9 * 5, prob=0.2)
+    ],
+    [
+        dict(type='Solarize', thr=256 / 9 * 4, prob=0.6),
+        dict(type='AutoContrast', prob=0.6)
+    ],
+    [dict(type='Invert', prob=0.6),
+     dict(type='Equalize', prob=1.)],
+    [
+        dict(type='ColorTransform', magnitude=0.4, prob=0.6),
+        dict(type='Contrast', magnitude=0.8, prob=1.)
+    ],
+    [dict(type='Equalize', prob=0.8),
+     dict(type='Equalize', prob=0.6)],
+]
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='RandomResizedCrop', size=224, backend='pillow'),
+    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
+    dict(type='AutoAugment', policies=policies),
+    dict(
+        type='RandomErasing',
+        erase_prob=0.2,
+        mode='const',
+        min_area_ratio=0.02,
+        max_area_ratio=1 / 3,
+        fill_color=img_norm_cfg['mean']),
+    dict(type='Normalize', **img_norm_cfg),
+    dict(type='ImageToTensor', keys=['img']),
+    dict(type='ToTensor', keys=['gt_label']),
+    dict(type='Collect', keys=['img', 'gt_label'])
+]
+data = dict(
+    samples_per_gpu=128,
+    workers_per_gpu=4,
+    train=dict(pipeline=train_pipeline))
+evaluation = dict(interval=10, metric='accuracy')
+# optimizer
+optimizer = dict(
+    type='RMSprop',
+    lr=0.064,
+    alpha=0.9,
+    momentum=0.9,
+    eps=0.0316,
+    weight_decay=1e-5)
+optimizer_config = dict(grad_clip=None)
+# learning policy
+lr_config = dict(policy='step', step=2, gamma=0.973, by_epoch=True)
+runner = dict(type='EpochBasedRunner', max_epochs=600)
--- a/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/mobilenet_v3_large_imagenet.py
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/mobilenet_v3_large_imagenet.py
+_base_ = 'mobilenet-v3-large_8xb32_in1k.py'
+_deprecation_ = dict(
+    expected='mobilenet-v3-large_8xb32_in1k.py',
+    reference='https://github.com/open-mmlab/mmclassification/pull/508',
+)
--- a/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/mobilenet_v3_small_cifar.py
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/mobilenet_v3_small_cifar.py
+_base_ = 'mobilenet-v3-small_8xb16_cifar10.py'
+_deprecation_ = dict(
+    expected='mobilenet-v3-small_8xb16_cifar10.py',
+    reference='https://github.com/open-mmlab/mmclassification/pull/508',
+)
--- a/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/mobilenet_v3_small_imagenet.py
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mobilenet_v3/mobilenet_v3_small_imagenet.py
+_base_ = 'mobilenet-v3-small_8xb32_in1k.py'
+_deprecation_ = dict(
+    expected='mobilenet-v3-small_8xb32_in1k.py',
+    reference='https://github.com/open-mmlab/mmclassification/pull/508',
+)
--- a/openmmlab_test/mmclassification-0.24.1/configs/mvit/README.md
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mvit/README.md
+# MViT V2
+> [MViTv2: Improved Multiscale Vision Transformers for Classification and Detection](http://openaccess.thecvf.com//content/CVPR2022/papers/Li_MViTv2_Improved_Multiscale_Vision_Transformers_for_Classification_and_Detection_CVPR_2022_paper.pdf)
+<!-- [ALGORITHM] -->
+## Abstract
+In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video
+classification, as well as object detection. We present an improved version of MViT that incorporates
+decomposed relative positional embeddings and residual pooling connections. We instantiate this architecture
+in five sizes and evaluate it for ImageNet classification, COCO detection and Kinetics video recognition where
+it outperforms prior work. We further compare MViTv2s' pooling attention to window attention mechanisms where
+it outperforms the latter in accuracy/compute. Without bells-and-whistles, MViTv2 has state-of-the-art
+performance in 3 domains: 88.8% accuracy on ImageNet classification, 58.7 boxAP on COCO object detection as
+well as 86.1% on Kinetics-400 video classification.
+<div align=center>
+<img src="https://user-images.githubusercontent.com/26739999/180376227-755243fa-158e-4068-940a-416036519665.png" width="50%"/>
+</div>
+## Results and models
+### ImageNet-1k
+|     Model      |   Pretrain   | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) |                                Config                                |                                Download                                 |
+| :------------: | :----------: | :-------: | :------: | :-------: | :-------: | :------------------------------------------------------------------: | :---------------------------------------------------------------------: |
+| MViTv2-tiny\*  | From scratch |   24.17   |   4.70   |   82.33   |   96.15   | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/mvit/mvitv2-tiny_8xb256_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-tiny_3rdparty_in1k_20220722-db7beeef.pth) |
+| MViTv2-small\* | From scratch |   34.87   |   7.00   |   83.63   |   96.51   | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/mvit/mvitv2-small_8xb256_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-small_3rdparty_in1k_20220722-986bd741.pth) |
+| MViTv2-base\*  | From scratch |   51.47   |  10.20   |   84.34   |   96.86   | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/mvit/mvitv2-base_8xb256_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-base_3rdparty_in1k_20220722-9c4f0a17.pth) |
+| MViTv2-large\* | From scratch |  217.99   |  42.10   |   85.25   |   97.14   | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/mvit/mvitv2-large_8xb256_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-large_3rdparty_in1k_20220722-2b57b983.pth) |
+*Models with * are converted from the [official repo](https://github.com/facebookresearch/mvit). The config files of these models are only for inference. We don't ensure these config files' training accuracy and welcome you to contribute your reproduction results.*
+## Citation
+```bibtex
+@inproceedings{li2021improved,
+  title={MViTv2: Improved multiscale vision transformers for classification and detection},
+  author={Li, Yanghao and Wu, Chao-Yuan and Fan, Haoqi and Mangalam, Karttikeya and Xiong, Bo and Malik, Jitendra and Feichtenhofer, Christoph},
+  booktitle={CVPR},
+  year={2022}
+}
+```
--- a/openmmlab_test/mmclassification-0.24.1/configs/mvit/metafile.yml
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mvit/metafile.yml
+Collections:
+  - Name: MViT V2
+    Metadata:
+      Architecture:
+        - Attention Dropout
+        - Convolution
+        - Dense Connections
+        - GELU
+        - Layer Normalization
+        - Scaled Dot-Product Attention
+        - Attention Pooling
+    Paper:
+      URL: http://openaccess.thecvf.com//content/CVPR2022/papers/Li_MViTv2_Improved_Multiscale_Vision_Transformers_for_Classification_and_Detection_CVPR_2022_paper.pdf
+      Title: 'MViTv2: Improved Multiscale Vision Transformers for Classification and Detection'
+    README: configs/mvit/README.md
+    Code:
+      URL: https://github.com/open-mmlab/mmclassification/blob/v0.24.0/mmcls/models/backbones/mvit.py
+      Version: v0.24.0
+Models:
+  - Name: mvitv2-tiny_3rdparty_in1k
+    In Collection: MViT V2
+    Metadata:
+      FLOPs: 4700000000
+      Parameters: 24173320
+      Training Data:
+        - ImageNet-1k
+    Results:
+    - Dataset: ImageNet-1k
+      Task: Image Classification
+      Metrics:
+        Top 1 Accuracy: 82.33
+        Top 5 Accuracy: 96.15
+    Weights: https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-tiny_3rdparty_in1k_20220722-db7beeef.pth
+    Converted From:
+      Weights: https://dl.fbaipublicfiles.com/mvit/mvitv2_models/MViTv2_T_in1k.pyth
+      Code: https://github.com/facebookresearch/mvit
+    Config: configs/mvit/mvitv2-tiny_8xb256_in1k.py
+  - Name: mvitv2-small_3rdparty_in1k
+    In Collection: MViT V2
+    Metadata:
+      FLOPs: 7000000000
+      Parameters: 34870216
+      Training Data:
+        - ImageNet-1k
+    Results:
+    - Dataset: ImageNet-1k
+      Task: Image Classification
+      Metrics:
+        Top 1 Accuracy: 83.63
+        Top 5 Accuracy: 96.51
+    Weights: https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-small_3rdparty_in1k_20220722-986bd741.pth
+    Converted From:
+      Weights: https://dl.fbaipublicfiles.com/mvit/mvitv2_models/MViTv2_S_in1k.pyth
+      Code: https://github.com/facebookresearch/mvit
+    Config: configs/mvit/mvitv2-small_8xb256_in1k.py
+  - Name: mvitv2-base_3rdparty_in1k
+    In Collection: MViT V2
+    Metadata:
+      FLOPs: 10200000000
+      Parameters: 51472744
+      Training Data:
+        - ImageNet-1k
+    Results:
+    - Dataset: ImageNet-1k
+      Task: Image Classification
+      Metrics:
+        Top 1 Accuracy: 84.34
+        Top 5 Accuracy: 96.86
+    Weights: https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-base_3rdparty_in1k_20220722-9c4f0a17.pth
+    Converted From:
+      Weights: https://dl.fbaipublicfiles.com/mvit/mvitv2_models/MViTv2_B_in1k.pyth
+      Code: https://github.com/facebookresearch/mvit
+    Config: configs/mvit/mvitv2-base_8xb256_in1k.py
+  - Name: mvitv2-large_3rdparty_in1k
+    In Collection: MViT V2
+    Metadata:
+      FLOPs: 42100000000
+      Parameters: 217992952
+      Training Data:
+        - ImageNet-1k
+    Results:
+    - Dataset: ImageNet-1k
+      Task: Image Classification
+      Metrics:
+        Top 1 Accuracy: 85.25
+        Top 5 Accuracy: 97.14
+    Weights: https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-large_3rdparty_in1k_20220722-2b57b983.pth
+    Converted From:
+      Weights: https://dl.fbaipublicfiles.com/mvit/mvitv2_models/MViTv2_L_in1k.pyth
+      Code: https://github.com/facebookresearch/mvit
+    Config: configs/mvit/mvitv2-large_8xb256_in1k.py
--- a/openmmlab_test/mmclassification-0.24.1/configs/mvit/mvitv2-base_8xb256_in1k.py
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mvit/mvitv2-base_8xb256_in1k.py
+_base_ = [
+    '../_base_/models/mvit/mvitv2-base.py',
+    '../_base_/datasets/imagenet_bs64_swin_224.py',
+    '../_base_/schedules/imagenet_bs1024_adamw_swin.py',
+    '../_base_/default_runtime.py'
+]
+# dataset settings
+data = dict(samples_per_gpu=256)
+# schedule settings
+paramwise_cfg = dict(
+    norm_decay_mult=0.0,
+    bias_decay_mult=0.0,
+    custom_keys={
+        '.pos_embed': dict(decay_mult=0.0),
+        '.rel_pos_h': dict(decay_mult=0.0),
+        '.rel_pos_w': dict(decay_mult=0.0)
+    })
+optimizer = dict(lr=0.00025, paramwise_cfg=paramwise_cfg)
+optimizer_config = dict(grad_clip=dict(max_norm=1.0))
+# learning policy
+lr_config = dict(
+    policy='CosineAnnealing',
+    warmup='linear',
+    warmup_iters=70,
+    warmup_by_epoch=True)
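Note on the learning policy: the config requests cosine annealing with a linear warmup over the first 70 epochs (`warmup_iters=70`, `warmup_by_epoch=True`). The sketch below shows the general shape at epoch granularity. `max_epochs`, `warmup_ratio` and `min_lr` are not set in this file and presumably come from the base schedule, so the defaults used here are placeholder assumptions.

```python
import math


def cosine_lr_with_warmup(epoch: float, base_lr: float = 0.00025,
                          warmup_epochs: int = 70, max_epochs: int = 300,
                          warmup_ratio: float = 1e-3, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate with linear warmup (placeholder defaults)."""
    # Cosine annealing from base_lr down to min_lr over max_epochs.
    cosine = min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / max_epochs))
    if epoch >= warmup_epochs:
        return cosine
    # Linear warmup: scale the scheduled rate from warmup_ratio up to 1.
    k = (1 - epoch / warmup_epochs) * (1 - warmup_ratio)
    return cosine * (1 - k)


for epoch in (0, 35, 70, 150, 300):
    print(epoch, cosine_lr_with_warmup(epoch))
```

Under these assumptions the rate starts at `base_lr * warmup_ratio`, rejoins the cosine curve at epoch 70, and reaches `min_lr` at the final epoch.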
--- a/openmmlab_test/mmclassification-0.24.1/configs/mvit/mvitv2-large_8xb256_in1k.py
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mvit/mvitv2-large_8xb256_in1k.py
+_base_ = [
+    '../_base_/models/mvit/mvitv2-large.py',
+    '../_base_/datasets/imagenet_bs64_swin_224.py',
+    '../_base_/schedules/imagenet_bs2048_AdamW.py',
+    '../_base_/default_runtime.py'
+]
+# dataset settings
+data = dict(samples_per_gpu=256)
+# schedule settings
+paramwise_cfg = dict(
+    norm_decay_mult=0.0,
+    bias_decay_mult=0.0,
+    custom_keys={
+        '.pos_embed': dict(decay_mult=0.0),
+        '.rel_pos_h': dict(decay_mult=0.0),
+        '.rel_pos_w': dict(decay_mult=0.0)
+    })
+optimizer = dict(lr=0.00025, paramwise_cfg=paramwise_cfg)
+optimizer_config = dict(grad_clip=dict(max_norm=1.0))
+# learning policy
+lr_config = dict(
+    policy='CosineAnnealing',
+    warmup='linear',
+    warmup_iters=70,
+    warmup_by_epoch=True)
--- a/openmmlab_test/mmclassification-0.24.1/configs/mvit/mvitv2-small_8xb256_in1k.py
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mvit/mvitv2-small_8xb256_in1k.py
+_base_ = [
+    '../_base_/models/mvit/mvitv2-small.py',
+    '../_base_/datasets/imagenet_bs64_swin_224.py',
+    '../_base_/schedules/imagenet_bs2048_AdamW.py',
+    '../_base_/default_runtime.py'
+]
+# dataset settings
+data = dict(samples_per_gpu=256)
+# schedule settings
+paramwise_cfg = dict(
+    norm_decay_mult=0.0,
+    bias_decay_mult=0.0,
+    custom_keys={
+        '.pos_embed': dict(decay_mult=0.0),
+        '.rel_pos_h': dict(decay_mult=0.0),
+        '.rel_pos_w': dict(decay_mult=0.0)
+    })
+optimizer = dict(lr=0.00025, paramwise_cfg=paramwise_cfg)
+optimizer_config = dict(grad_clip=dict(max_norm=1.0))
+# learning policy
+lr_config = dict(
+    policy='CosineAnnealing',
+    warmup='linear',
+    warmup_iters=70,
+    warmup_by_epoch=True)
--- a/openmmlab_test/mmclassification-0.24.1/configs/mvit/mvitv2-tiny_8xb256_in1k.py
+++ b/openmmlab_test/mmclassification-0.24.1/configs/mvit/mvitv2-tiny_8xb256_in1k.py
+_base_ = [
+    '../_base_/models/mvit/mvitv2-tiny.py',
+    '../_base_/datasets/imagenet_bs64_swin_224.py',
+    '../_base_/schedules/imagenet_bs2048_AdamW.py',
+    '../_base_/default_runtime.py'
+]
+# dataset settings
+data = dict(samples_per_gpu=256)
+# schedule settings
+paramwise_cfg = dict(
+    norm_decay_mult=0.0,
+    bias_decay_mult=0.0,
+    custom_keys={
+        '.pos_embed': dict(decay_mult=0.0),
+        '.rel_pos_h': dict(decay_mult=0.0),
+        '.rel_pos_w': dict(decay_mult=0.0)
+    })
+optimizer = dict(lr=0.00025, paramwise_cfg=paramwise_cfg)
+optimizer_config = dict(grad_clip=dict(max_norm=1.0))
+# learning policy
+lr_config = dict(
+    policy='CosineAnnealing',
+    warmup='linear',
+    warmup_iters=70,
+    warmup_by_epoch=True)
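Note on `paramwise_cfg`: all four MViTv2 configs set `decay_mult=0.0` for the keys `.pos_embed`, `.rel_pos_h` and `.rel_pos_w`, which exempts the positional embeddings from weight decay. The sketch below assumes the keys are matched as substrings of the full parameter name; the parameter names are made up for illustration, and `norm_decay_mult` / `bias_decay_mult` are not modelled.

```python
CUSTOM_KEYS = {
    ".pos_embed": dict(decay_mult=0.0),
    ".rel_pos_h": dict(decay_mult=0.0),
    ".rel_pos_w": dict(decay_mult=0.0),
}


def decay_mult_for(param_name: str, custom_keys: dict = CUSTOM_KEYS,
                   default: float = 1.0) -> float:
    """Return the weight-decay multiplier for a parameter (substring match assumed)."""
    # Check longer keys first so the most specific key wins.
    for key in sorted(custom_keys, key=len, reverse=True):
        if key in param_name:
            return custom_keys[key].get("decay_mult", default)
    return default


# Hypothetical parameter names, chosen only to exercise the matching.
for name in ("backbone.pos_embed",
             "backbone.blocks.0.attn.rel_pos_h",
             "backbone.blocks.0.attn.qkv.weight"):
    print(name, decay_mult_for(name))
```

The leading dot in each key means a match requires the name to appear as its own attribute segment, for example `backbone.pos_embed` rather than `backbone.my_pos_embed`.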
--- a/openmmlab_test/mmclassification-0.24.1/configs/poolformer/README.md
+++ b/openmmlab_test/mmclassification-0.24.1/configs/poolformer/README.md
+# PoolFormer
+> [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418)
+<!-- [ALGORITHM] -->
+## Abstract
+Transformers have shown great potential in computer vision tasks. A common belief is their attention-based token mixer module contributes most to their competence. However, recent works show the attention-based module in transformers can be replaced by spatial MLPs and the resulted models still perform quite well. Based on this observation, we hypothesize that the general architecture of the transformers, instead of the specific token mixer module, is more essential to the model's performance. To verify this, we deliberately replace the attention module in transformers with an embarrassingly simple spatial pooling operator to conduct only basic token mixing. Surprisingly, we observe that the derived model, termed as PoolFormer, achieves competitive performance on multiple computer vision tasks. For example, on ImageNet-1K, PoolFormer achieves 82.1% top-1 accuracy, surpassing well-tuned vision transformer/MLP-like baselines DeiT-B/ResMLP-B24 by 0.3%/1.1% accuracy with 35%/52% fewer parameters and 49%/61% fewer MACs. The effectiveness of PoolFormer verifies our hypothesis and urges us to initiate the concept of "MetaFormer", a general architecture abstracted from transformers without specifying the token mixer. Based on the extensive experiments, we argue that MetaFormer is the key player in achieving superior results for recent transformer and MLP-like models on vision tasks. This work calls for more future research dedicated to improving MetaFormer instead of focusing on the token mixer modules. Additionally, our proposed PoolFormer could serve as a starting baseline for future MetaFormer architecture design.
+<div align=center>
+<img src="https://user-images.githubusercontent.com/15921929/144710761-1635f59a-abde-4946-984c-a2c3f22a19d2.png" width="100%"/>
+</div>
+## Results and models
+### ImageNet-1k
+|      Model       | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) |                                  Config                                   |                                   Download                                   |
+| :--------------: | :-------: | :------: | :-------: | :-------: | :-----------------------------------------------------------------------: | :--------------------------------------------------------------------------: |
+| PoolFormer-S12\* |   11.92   |   1.87   |   77.24   |   93.51   | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/poolformer/poolformer-s12_32xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-s12_3rdparty_32xb128_in1k_20220414-f8d83051.pth) |
+| PoolFormer-S24\* |   21.39   |   3.51   |   80.33   |   95.05   | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/poolformer/poolformer-s24_32xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-s24_3rdparty_32xb128_in1k_20220414-d7055904.pth) |
+| PoolFormer-S36\* |   30.86   |   5.15   |   81.43   |   95.45   | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/poolformer/poolformer-s36_32xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-s36_3rdparty_32xb128_in1k_20220414-d78ff3e8.pth) |
+| PoolFormer-M36\* |   56.17   |   8.96   |   82.14   |   95.71   | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/poolformer/poolformer-m36_32xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-m36_3rdparty_32xb128_in1k_20220414-c55e0949.pth) |
+| PoolFormer-M48\* |   73.47   |  11.80   |   82.51   |   95.95   | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/poolformer/poolformer-m48_32xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-m48_3rdparty_32xb128_in1k_20220414-9378f3eb.pth) |
+*Models with \* are converted from the [official repo](https://github.com/sail-sg/poolformer). The config files of these models are only for inference. We do not guarantee the training accuracy of these config files, and welcome contributions of reproduction results.*
+## Citation
+```bibtex
+@article{yu2021metaformer,
+  title={MetaFormer is Actually What You Need for Vision},
+  author={Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng},
+  journal={arXiv preprint arXiv:2111.11418},
+  year={2021}
+}
+```
--- a/openmmlab_test/mmclassification-0.24.1/configs/poolformer/metafile.yml
+++ b/openmmlab_test/mmclassification-0.24.1/configs/poolformer/metafile.yml
+Collections:
+  - Name: PoolFormer
+    Metadata:
+      Training Data: ImageNet-1k
+      Architecture:
+        - Pooling
+        - 1x1 Convolution
+        - LayerScale
+    Paper:
+      URL: https://arxiv.org/abs/2111.11418
+      Title: MetaFormer is Actually What You Need for Vision
+    README: configs/poolformer/README.md
+    Code:
+      Version: v0.22.1
+      URL: https://github.com/open-mmlab/mmclassification/blob/v0.22.1/mmcls/models/backbones/poolformer.py
+Models:
+  - Name: poolformer-s12_3rdparty_32xb128_in1k
+    Metadata:
+      FLOPs: 1871399424
+      Parameters: 11915176
+    In Collection: PoolFormer
+    Results:
+      - Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 77.24
+          Top 5 Accuracy: 93.51
+        Task: Image Classification
+    Weights: https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-s12_3rdparty_32xb128_in1k_20220414-f8d83051.pth
+    Config: configs/poolformer/poolformer-s12_32xb128_in1k.py
+    Converted From:
+      Weights: https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_s12.pth.tar
+      Code: https://github.com/sail-sg/poolformer
+  - Name: poolformer-s24_3rdparty_32xb128_in1k
+    Metadata:
+      Training Data: ImageNet-1k
+      FLOPs: 3510411008
+      Parameters: 21388968
+    In Collection: PoolFormer
+    Results:
+      - Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 80.33
+          Top 5 Accuracy: 95.05
+        Task: Image Classification
+    Weights: https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-s24_3rdparty_32xb128_in1k_20220414-d7055904.pth
+    Config: configs/poolformer/poolformer-s24_32xb128_in1k.py
+    Converted From:
+      Weights: https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_s24.pth.tar
+      Code: https://github.com/sail-sg/poolformer
+  - Name: poolformer-s36_3rdparty_32xb128_in1k
+    Metadata:
+      FLOPs: 5149422592
+      Parameters: 30862760
+    In Collection: PoolFormer
+    Results:
+      - Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 81.43
+          Top 5 Accuracy: 95.45
+        Task: Image Classification
+    Weights: https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-s36_3rdparty_32xb128_in1k_20220414-d78ff3e8.pth
+    Config: configs/poolformer/poolformer-s36_32xb128_in1k.py
+    Converted From:
+      Weights: https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_s36.pth.tar
+      Code: https://github.com/sail-sg/poolformer
+  - Name: poolformer-m36_3rdparty_32xb128_in1k
+    Metadata:
+      Training Data: ImageNet-1k
+      FLOPs: 8960175744
+      Parameters: 56172520
+    In Collection: PoolFormer
+    Results:
+      - Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 82.14
+          Top 5 Accuracy: 95.71
+        Task: Image Classification
+    Weights: https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-m36_3rdparty_32xb128_in1k_20220414-c55e0949.pth
+    Config: configs/poolformer/poolformer-m36_32xb128_in1k.py
+    Converted From:
+      Weights: https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_m36.pth.tar
+      Code: https://github.com/sail-sg/poolformer
+  - Name: poolformer-m48_3rdparty_32xb128_in1k
+    Metadata:
+      FLOPs: 11801805696
+      Parameters: 73473448
+    In Collection: PoolFormer
+    Results:
+      - Dataset: ImageNet-1k
+        Metrics:
+          Top 1 Accuracy: 82.51
+          Top 5 Accuracy: 95.95
+        Task: Image Classification
+    Weights: https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-m48_3rdparty_32xb128_in1k_20220414-9378f3eb.pth
+    Config: configs/poolformer/poolformer-m48_32xb128_in1k.py
+    Converted From:
+      Weights: https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_m48.pth.tar
+      Code: https://github.com/sail-sg/poolformer
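The `FLOPs` and `Parameters` fields in the metafile above are raw counts, while the README table reports the same quantities in G and M. A minimal sketch, assuming plain two-decimal rounding, that checks the two files agree. The helper names `to_table_units` and `mismatches` are hypothetical and not part of mmclassification.

```python
# Raw counts copied from metafile.yml: variant -> (FLOPs, Parameters).
METAFILE = {
    "s12": (1871399424, 11915176),
    "s24": (3510411008, 21388968),
    "s36": (5149422592, 30862760),
    "m36": (8960175744, 56172520),
    "m48": (11801805696, 73473448),
}

# Values copied from the README table: variant -> (Flops(G), Params(M)).
README = {
    "s12": ("1.87", "11.92"),
    "s24": ("3.51", "21.39"),
    "s36": ("5.15", "30.86"),
    "m36": ("8.96", "56.17"),
    "m48": ("11.80", "73.47"),
}


def to_table_units(flops, params):
    """Hypothetical helper: format raw counts as (GFLOPs, millions of params)."""
    return f"{flops / 1e9:.2f}", f"{params / 1e6:.2f}"


def mismatches():
    """Return the variants whose metafile counts do not round to the README values."""
    return [
        name
        for name, counts in METAFILE.items()
        if to_table_units(*counts) != README[name]
    ]


if __name__ == "__main__":
    print(mismatches())
```

An empty list means every row of the table is consistent with the metafile under this rounding assumption.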
--- a/openmmlab_test/mmclassification-0.24.1/configs/poolformer/poolformer-m36_32xb128_in1k.py
+++ b/openmmlab_test/mmclassification-0.24.1/configs/poolformer/poolformer-m36_32xb128_in1k.py
+_base_ = [
+    '../_base_/models/poolformer/poolformer_m36.py',
+    '../_base_/datasets/imagenet_bs128_poolformer_medium_224.py',
+    '../_base_/schedules/imagenet_bs1024_adamw_swin.py',
+    '../_base_/default_runtime.py',
+]
+optimizer = dict(lr=4e-3)
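The config above inherits everything from its `_base_` files and overrides only the learning rate. A minimal sketch of that key-by-key merge, assuming the usual MMCV `Config` behaviour in which a child dict updates the inherited dict rather than replacing it. `merge_cfg` and `total_batch_size` are hypothetical stand-ins, and the base optimizer values are placeholders, not the contents of `imagenet_bs1024_adamw_swin.py`. The second helper assumes the `{gpus}xb{samples per GPU}` naming convention used in the config file name.

```python
import copy
import re


def merge_cfg(base, child):
    """Recursively merge child into a copy of base; child values win."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_cfg(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def total_batch_size(config_name):
    """Parse '<gpus>xb<per-gpu batch>' from a config name and multiply the two."""
    match = re.search(r"_(\d+)xb(\d+)_", config_name)
    if match is None:
        raise ValueError(f"no '<gpus>xb<batch>' field in {config_name!r}")
    gpus, per_gpu = map(int, match.groups())
    return gpus * per_gpu


# Placeholder base values, assumed for illustration only.
base = dict(optimizer=dict(type="AdamW", lr=1e-3, weight_decay=0.05))
# The override from poolformer-m36_32xb128_in1k.py.
child = dict(optimizer=dict(lr=4e-3))

cfg = merge_cfg(base, child)

if __name__ == "__main__":
    print(cfg["optimizer"])
    print(total_batch_size("poolformer-m36_32xb128_in1k"))
```

Only `lr` changes; the other optimizer keys survive from the base. MMCV configs also accept a `_delete_` key to replace an inherited dict instead of merging into it, which this sketch omits.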