v1.0

Commit 68bc58a9 authored May 15, 2024 by chenzk
20 changed files
--- a/LICENSE
+++ b/LICENSE
+Copyright 2022 Google LLC. All rights reserved.
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+   1. Definitions.
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+   END OF TERMS AND CONDITIONS
+   APPENDIX: How to apply the Apache License to your work.
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+   Copyright 2016, The Authors.
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
\ No newline at end of file
--- a/MobileNetv4.py
+++ b/MobileNetv4.py
+#!/usr/bin/python
+# -*- coding: utf-8 -*-
+# @Time : 2024/3/24 10:27
+# @Author : 'IReverser'
+# @FileName: MobileNetv4.py
+# Reference: https://github.com/jaiwei98/MobileNetV4-pytorch
+from typing import Any, Callable, Dict, List, Mapping, Optional, Tuple, Union
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from model_config import MODEL_SPECS
+def make_divisible(
+        value: float,
+        divisor: int,
+        min_value: Optional[float] = None,
+        round_down_protect: bool = True,
+) -> int:
+    """
+    This function is copied from here
+    "https://github.com/tensorflow/models/blob/master/official/vision/modeling/layers/nn_layers.py"
+    This is to ensure that all layers have channels that are divisible by 8.
+    Args:
+        value: A `float` of original value.
+        divisor: An `int` that the returned value must be divisible by.
+        min_value: A `float` minimum value threshold (defaults to `divisor`).
+        round_down_protect: A `bool` indicating whether round down more than 10%
+        will be allowed.
+    Returns:
+        The adjusted value in `int` that is divisible against divisor.
+    """
+    if min_value is None:
+        min_value = divisor
+    new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)
+    # Make sure that round down does not go down by more than 10%.
+    if round_down_protect and new_value < 0.9 * value:
+        new_value += divisor
+    return int(new_value)
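The rounding rule in `make_divisible` is easiest to check on concrete numbers. The sketch below repeats the function with the standard library only and runs it on a few channel counts; the inputs are hypothetical values picked to exercise each branch, not values taken from `model_config.py`.

```python
from typing import Optional


def make_divisible(value: float, divisor: int,
                   min_value: Optional[float] = None,
                   round_down_protect: bool = True) -> int:
    """Round `value` to the nearest multiple of `divisor` (same logic as the diff)."""
    if min_value is None:
        min_value = divisor
    new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)
    # Step up by one divisor if rounding lost more than 10% of the original value.
    if round_down_protect and new_value < 0.9 * value:
        new_value += divisor
    return int(new_value)


# Hypothetical inputs, one per branch of the function.
examples = {
    "nearest multiple": make_divisible(100, 8),                          # 104
    "min_value floor": make_divisible(3, 8),                             # 8
    "10% protection": make_divisible(11, 8),                             # 16
    "protection off": make_divisible(11, 8, round_down_protect=False),   # 8
}
for name, channels in examples.items():
    print(f"{name}: {channels}")
```

The third case shows why the protection exists: 11 would otherwise round down to 8, a loss of more than 10%, so the result is raised to 16.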
+def conv2d(in_channels, out_channels, kernel_size=3, stride=1, groups=1, bias=False, norm=True, act=True):
+    conv = nn.Sequential()
+    padding = (kernel_size - 1) // 2
+    conv.append(nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=bias, groups=groups))
+    if norm:
+        conv.append(nn.BatchNorm2d(out_channels))
+    if act:
+        conv.append(nn.ReLU6())
+    return conv
+class InvertedResidual(nn.Module):
+    def __init__(self, in_channels, out_channels, stride, expand_ratio, act=False, squeeze_excitation=False):
+        super(InvertedResidual, self).__init__()
+        self.stride = stride
+        assert stride in [1, 2]
+        hidden_dim = int(round(in_channels * expand_ratio))
+        self.block = nn.Sequential()
+        if expand_ratio != 1:
+            # Fused expansion: a full 3x3 conv, despite the "exp_1x1" module name.
+            self.block.add_module("exp_1x1", conv2d(in_channels, hidden_dim, kernel_size=3, stride=stride))
+        if squeeze_excitation:
+            self.block.add_module("conv_3x3", conv2d(hidden_dim, hidden_dim, kernel_size=3, stride=stride, groups=hidden_dim))
+        self.block.add_module("res_1x1", conv2d(hidden_dim, out_channels, kernel_size=1, stride=1, act=act))
+        self.use_res_connect = self.stride == 1 and in_channels == out_channels
+    def forward(self, x):
+        if self.use_res_connect:
+            return x + self.block(x)
+        else:
+            return self.block(x)
+class UniversalInvertedBottleneckBlock(nn.Module):
+    def __init__(self, in_channels, out_channels, start_dw_kernel_size, middle_dw_kernel_size, middle_dw_downsample,
+                 stride, expand_ratio):
+        """An inverted bottleneck block with optional depthwises.
+        Referenced from here https://github.com/tensorflow/models/blob/master/official/vision/modeling/layers/nn_blocks.py
+        """
+        super(UniversalInvertedBottleneckBlock, self).__init__()
+        # starting depthwise conv
+        self.start_dw_kernel_size = start_dw_kernel_size
+        if self.start_dw_kernel_size:
+            stride_ = stride if not middle_dw_downsample else 1
+            self._start_dw_ = conv2d(in_channels, in_channels, kernel_size=start_dw_kernel_size, stride=stride_, groups=in_channels, act=False)
+        # expansion with 1x1 convs
+        expand_filters = make_divisible(in_channels * expand_ratio, 8)
+        self._expand_conv = conv2d(in_channels, expand_filters, kernel_size=1)
+        # middle depthwise conv
+        self.middle_dw_kernel_size = middle_dw_kernel_size
+        if self.middle_dw_kernel_size:
+            stride_ = stride if middle_dw_downsample else 1
+            self._middle_dw = conv2d(expand_filters, expand_filters, kernel_size=middle_dw_kernel_size, stride=stride_, groups=expand_filters)
+        # projection with 1x1 convs
+        self._proj_conv = conv2d(expand_filters, out_channels, kernel_size=1, stride=1, act=False)
+        # expand depthwise conv (not used)
+        # _end_dw_kernel_size = 0
+        # self._end_dw = conv2d(out_channels, out_channels, kernel_size=_end_dw_kernel_size, stride=stride, groups=in_channels, act=False)
+    def forward(self, x):
+        if self.start_dw_kernel_size:
+            x = self._start_dw_(x)
+            # print("_start_dw_", x.shape)
+        x = self._expand_conv(x)
+        # print("_expand_conv", x.shape)
+        if self.middle_dw_kernel_size:
+            x = self._middle_dw(x)
+            # print("_middle_dw", x.shape)
+        x = self._proj_conv(x)
+        # print("_proj_conv", x.shape)
+        return x
+class MultiQueryAttentionLayerWithDownSampling(nn.Module):
+    def __init__(self, in_channels, num_heads, key_dim, value_dim, query_h_strides, query_w_strides, kv_strides, dw_kernel_size=3, dropout=0.0):
+        """Multi Query Attention with spatial downsampling.
+        Referenced from here https://github.com/tensorflow/models/blob/master/official/vision/modeling/layers/nn_blocks.py
+        3 parameters are introduced for the spatial downsampling:
+        1. kv_strides: downsampling factor on Key and Values only.
+        2. query_h_strides: vertical strides on Query only.
+        3. query_w_strides: horizontal strides on Query only.
+        This is an optimized version.
+        1. Projections in attention are explicitly written out as 1x1 Conv2D.
+        2. Additional reshapes are introduced to bring up to a 3x speedup.
+        """
+        super(MultiQueryAttentionLayerWithDownSampling, self).__init__()
+        self.num_heads = num_heads
+        self.key_dim = key_dim
+        self.value_dim = value_dim
+        self.query_h_strides = query_h_strides
+        self.query_w_strides = query_w_strides
+        self.kv_strides = kv_strides
+        self.dw_kernel_size = dw_kernel_size
+        self.dropout = dropout
+        self.head_dim = self.key_dim // num_heads
+        if self.query_h_strides > 1 or self.query_w_strides > 1:
+            self._query_downsampling_norm = nn.BatchNorm2d(in_channels)
+        self._query_proj = conv2d(in_channels, self.num_heads * self.key_dim, 1, 1, norm=False, act=False)
+        if self.kv_strides > 1:
+            self._key_dw_conv = conv2d(in_channels, in_channels, dw_kernel_size, kv_strides, groups=in_channels,
+                                       norm=True, act=False)
+            self._value_dw_conv = conv2d(in_channels, in_channels, dw_kernel_size, kv_strides, groups=in_channels,
+                                         norm=True, act=False)
+        self._key_proj = conv2d(in_channels, key_dim, 1, 1, norm=False, act=False)
+        self._value_proj = conv2d(in_channels, key_dim, 1, 1, norm=False, act=False)
+        self._output_proj = conv2d(num_heads * key_dim, in_channels, 1, 1, norm=False, act=False)
+        self.dropout = nn.Dropout(p=dropout)
+    def forward(self, x):
+        bs, seq_len, _, _ = x.size()
+        # print(x.size())
+        if self.query_h_strides > 1 or self.query_w_strides > 1:
+            # Pool the input itself; the stride defaults to the kernel size.
+            q = F.avg_pool2d(x, kernel_size=(self.query_h_strides, self.query_w_strides))
+            q = self._query_downsampling_norm(q)
+            q = self._query_proj(q)
+        else:
+            q = self._query_proj(x)
+        px = q.size(2)
+        q = q.view(bs, self.num_heads, -1, self.key_dim)  # [batch_size, num_heads, seq_len, key_dim]
+        if self.kv_strides > 1:
+            k = self._key_dw_conv(x)
+            k = self._key_proj(k)
+            v = self._value_dw_conv(x)
+            v = self._value_proj(v)
+        else:
+            k = self._key_proj(x)
+            v = self._value_proj(x)
+        k = k.view(bs, 1, self.key_dim, -1)   # [batch_size, 1, key_dim, seq_length]
+        v = v.view(bs, 1, -1, self.key_dim)    # [batch_size, 1, seq_length, key_dim]
+        # calculate attention score
+        # print(q.shape, k.shape, v.shape)
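+        attn_score = torch.matmul(q, k) / (self.head_dim ** 0.5)
+        # Normalize first, then apply dropout to the attention weights
+        # (dropout on the raw logits would distort the softmax).
+        attn_score = F.softmax(attn_score, dim=-1)
+        attn_score = self.dropout(attn_score)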
+        # context = torch.einsum('bnhm,bmv->bnhv', attn_score, v)
+        # print(attn_score.shape, v.shape)
+        context = torch.matmul(attn_score, v)
+        context = context.view(bs, self.num_heads * self.key_dim, px, px)
+        output = self._output_proj(context)
+        # print(output.shape)
+        return output
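The tensor shapes in `MultiQueryAttentionLayerWithDownSampling.forward` can be worked out with integer arithmetic alone. The sketch below is a shape calculator, not an attention implementation: it assumes query strides of 1, applies the standard convolution output-size formula with the `(kernel_size - 1) // 2` padding used by `conv2d`, and reports the per-sample shapes of `q`, `k`, `v`, and the attention matrix. The 24x24 feature map, 4 heads, and `key_dim` of 64 are illustrative values only.

```python
def conv_out_size(size: int, kernel_size: int, stride: int) -> int:
    """Output length of a conv padded with (kernel_size - 1) // 2, as in conv2d."""
    padding = (kernel_size - 1) // 2
    return (size + 2 * padding - kernel_size) // stride + 1


def mqa_shapes(height: int, width: int, num_heads: int, key_dim: int,
               kv_strides: int, dw_kernel_size: int = 3) -> dict:
    """Per-sample shapes of q, k, v and the attention matrix (query strides assumed 1)."""
    q_len = height * width
    if kv_strides > 1:
        # Keys and values pass through a strided depthwise conv first.
        kv_h = conv_out_size(height, dw_kernel_size, kv_strides)
        kv_w = conv_out_size(width, dw_kernel_size, kv_strides)
    else:
        kv_h, kv_w = height, width
    kv_len = kv_h * kv_w
    return {
        "q": (num_heads, q_len, key_dim),
        "k": (1, key_dim, kv_len),      # one shared key head (multi-query)
        "v": (1, kv_len, key_dim),      # one shared value head
        "attn": (num_heads, q_len, kv_len),
    }


# Illustrative sizes: a 24x24 feature map, 4 heads, key_dim of 64.
full = mqa_shapes(24, 24, num_heads=4, key_dim=64, kv_strides=1)
strided = mqa_shapes(24, 24, num_heads=4, key_dim=64, kv_strides=2)
print("attention without kv downsampling:", full["attn"])
print("attention with kv_strides=2:      ", strided["attn"])
```

With `kv_strides=2` the key/value length drops from 576 to 144, so the attention matrix has a quarter as many entries while the query resolution is unchanged.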
+class MNV4layerScale(nn.Module):
+    def __init__(self, init_value):
+        """LayerScale as introduced in CaiT: https://arxiv.org/abs/2103.17239
+        Referenced from here https://github.com/tensorflow/models/blob/master/official/vision/modeling/layers/nn_blocks.py
+        As used in MobileNetV4.
+        Attributes:
+            init_value (float): value to initialize the diagonal matrix of LayerScale.
+        """
+        super(MNV4layerScale, self).__init__()
+        self.init_value = init_value
+    def forward(self, x):
+        gamma = self.init_value * torch.ones(x.size(-1), dtype=x.dtype, device=x.device)
+        return x * gamma
+class MultiHeadSelfAttentionBlock(nn.Module):
+    def __init__(self, in_channels, num_heads, key_dim, value_dim, query_h_strides, query_w_strides,
+                 kv_strides, use_layer_scale, use_multi_query, use_residual=True):
+        super(MultiHeadSelfAttentionBlock, self).__init__()
+        self.query_h_strides = query_h_strides
+        self.query_w_strides = query_w_strides
+        self.kv_strides = kv_strides
+        self.use_layer_scale = use_layer_scale
+        self.use_multi_query = use_multi_query
+        self.use_residual = use_residual
+        self._input_norm = nn.BatchNorm2d(in_channels)
+        if self.use_multi_query:
+            self.multi_query_attention = MultiQueryAttentionLayerWithDownSampling(
+                in_channels, num_heads, key_dim, value_dim, query_h_strides, query_w_strides, kv_strides
+            )
+        else:
+            self.multi_head_attention = nn.MultiheadAttention(in_channels, num_heads, kdim=key_dim)
+        if use_layer_scale:
+            self.layer_scale_init_value = 1e-5
+            self.layer_scale = MNV4layerScale(self.layer_scale_init_value)
+    def forward(self, x):
+        # Not using CPE, skipped
+        # input norm
+        shortcut = x
+        x = self._input_norm(x)
+        # multi query
+        if self.use_multi_query:
+            # print(x.size())
+            x = self.multi_query_attention(x)
+            # print(x.size())
+        else:
+            x = self.multi_head_attention(x, x)
+        # layer scale
+        if self.use_layer_scale:
+            x = self.layer_scale(x)
+        # use residual
+        if self.use_residual:
+            x = x + shortcut
+        return x
+def build_blocks(layer_spec):
+    if not layer_spec.get("block_name"):
+        return nn.Sequential()
+    block_names = layer_spec["block_name"]
+    layers = nn.Sequential()
+    if block_names == "convbn":
+        schema_ = ["in_channels", "out_channels", "kernel_size", "stride"]
+        for i in range(layer_spec["num_blocks"]):
+            args = dict(zip(schema_, layer_spec["block_specs"][i]))
+            layers.add_module(f"convbn_{i}", conv2d(**args))
+    elif block_names == "uib":
+        schema_ = ["in_channels", "out_channels", "start_dw_kernel_size", "middle_dw_kernel_size", "middle_dw_downsample",
+                   "stride", "expand_ratio", "msha"]
+        for i in range(layer_spec["num_blocks"]):
+            args = dict(zip(schema_, layer_spec["block_specs"][i]))
+            msha = args.pop("msha") if "msha" in args else 0
+            layers.add_module(f"uib_{i}", UniversalInvertedBottleneckBlock(**args))
+            if msha:
+                msha_schema_ = [
+                    "in_channels", "num_heads", "key_dim", "value_dim", "query_h_strides", "query_w_strides", "kv_strides",
+                    "use_layer_scale", "use_multi_query", "use_residual"
+                ]
+                args = dict(zip(msha_schema_, [args["out_channels"]] + (msha)))
+                layers.add_module(
+                    f"msha_{i}", MultiHeadSelfAttentionBlock(**args)
+                )
+    elif block_names == "fused_ib":
+        schema_ = ["in_channels", "out_channels", "stride", "expand_ratio", "act"]
+        for i in range(layer_spec["num_blocks"]):
+            args = dict(zip(schema_, layer_spec["block_specs"][i]))
+            layers.add_module(f"fused_ib_{i}", InvertedResidual(**args))
+    else:
+        raise NotImplementedError
+    return layers
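`build_blocks` turns each positional row of `block_specs` into keyword arguments with `dict(zip(schema_, row))`. A minimal sketch of that mechanism follows; the two rows are hypothetical stand-ins for entries in `model_config.py` (which is not shown in this diff), and `row_to_kwargs` is an illustrative helper, not a function from the repository.

```python
UIB_SCHEMA = ["in_channels", "out_channels", "start_dw_kernel_size",
              "middle_dw_kernel_size", "middle_dw_downsample",
              "stride", "expand_ratio", "msha"]


def row_to_kwargs(row):
    """Pair each positional spec value with its schema name.

    zip() stops at the shorter sequence, so a row without the trailing
    attention entry simply produces no "msha" key.
    """
    kwargs = dict(zip(UIB_SCHEMA, row))
    msha = kwargs.pop("msha", None)
    return kwargs, msha


# Hypothetical rows: one plain UIB block, one followed by an attention block.
plain_row = [32, 64, 3, 5, True, 2, 4.0]
attn_row = [160, 160, 3, 3, True, 1, 4.0,
            [4, 64, 64, 1, 1, 2, True, True, True]]

for row in (plain_row, attn_row):
    kwargs, msha = row_to_kwargs(row)
    print(kwargs["in_channels"], "->", kwargs["out_channels"],
          "| attention spec:", msha)
```

Because the optional entry sits last in the schema, rows of seven and eight values can share one schema without any length checks.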
+class MobileNetV4(nn.Module):
+    def __init__(self, model, num_classes=1000):
+        # MobileNetV4ConvSmall  MobileNetV4ConvMedium  MobileNetV4ConvLarge
+        # MobileNetV4HybridMedium  MobileNetV4HybridLarge
+        """Params to initialize MobileNetV4
+        Args:
+            model: name of one of the five supported model variants, as listed in
+            "https://github.com/tensorflow/models/blob/master/official/vision/modeling/backbones/mobilenet.py"
+        """
+        super(MobileNetV4, self).__init__()
+        # print(MODEL_SPECS.keys(), model not in MODEL_SPECS.keys())
+        assert model in MODEL_SPECS.keys()
+        self.model = model
+        self.num_classes = num_classes
+        self.spec = MODEL_SPECS[self.model]
+        # conv0
+        self.conv0 = build_blocks(self.spec["conv0"])
+        # layer1
+        self.layer1 = build_blocks(self.spec["layer1"])
+        # layer2
+        self.layer2 = build_blocks(self.spec["layer2"])
+        # layer3
+        self.layer3 = build_blocks(self.spec["layer3"])
+        # layer4
+        self.layer4 = build_blocks(self.spec["layer4"])
+        # layer5
+        self.layer5 = build_blocks(self.spec["layer5"])
+        # classify [optional]
+        self.fc = nn.Linear(1280, num_classes)
+    def forward(self, x, is_feat=False):
+        x0 = self.conv0(x)
+        x1 = self.layer1(x0)
+        x2 = self.layer2(x1)
+        x3 = self.layer3(x2)
+        x4 = self.layer4(x3)
+        x5 = self.layer5(x4)
+        x5 = F.adaptive_avg_pool2d(x5, 1)
+        out = self.fc(x5.flatten(1))
+        if is_feat:
+            return [x1, x2, x3, x4, x5], out
+        else:
+            return out
+def create_mobilenetv4(model_name: str, num_classes: int = 1000):
+    model = MobileNetV4(model_name, num_classes)
+    return model
+# MNV4ConvSmall, MNV4ConvMedium, MNV4ConvLarge, MNV4HybridMedium, MNV4HybridLarge
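+if __name__ == '__main__':
+    x = torch.rand((2, 3, 224, 224))
+    model = create_mobilenetv4(model_name="MNV4HybridLarge")
+    # is_feat=True is needed to get (features, logits); the default returns logits only.
+    feats, out = model(x, is_feat=True)
+    print("logit: ", out.shape)
+    for index, feat in enumerate(feats):
+        print(f"{index}: ", feat.shape)
+    # Optional per-layer summary; requires the third-party torchsummary package.
+    # from torchsummary import summary
+    # summary(create_mobilenetv4(model_name="MNV4HybridLarge"), (3, 224, 224))
+    # Parameter count in units of 2**20 parameters (a count, not a size in megabytes).
+    print(sum([i.numel() for i in model.parameters()]) / 1024 / 1024, "Mi params")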
--- a/README.md
+++ b/README.md
+# MobileNetv4
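+MobileNetV4, billed as the king of lightweight models: 3.8 ms inference latency on a phone, with high performance across mobile CPUs, DSPs, GPUs, Apple M-series processors, and the Google Pixel Edge TPU.
+## Paper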
+`MobileNetV4 - Universal Models for the Mobile Ecosystem`
+- https://arxiv.org/pdf/2404.10518
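+## Model structure
+The Universal Inverted Bottleneck (UIB) block adds two optional depthwise (DW) convolutions to the inverted bottleneck block, one before the expansion layer and one between the expansion and projection layers. This unifies several important existing blocks: the original IB block, the ConvNext block, and the FFN block in ViT. UIB also introduces a new variant, the extra depthwise IB (ExtraDW) block. MobileMQA is a new attention block optimized for accelerators that delivers more than a 39% inference speedup.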
+<div align=center>
+    <img src="./doc/structure.png"/>
+</div>
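The four blocks that UIB unifies differ only in which of the two optional depthwise convolutions is present. The sketch below makes that mapping explicit; `uib_variant` is an illustrative helper, not part of the repository, and it treats a kernel size of 0 as "omitted", mirroring the truthiness checks in `UniversalInvertedBottleneckBlock`.

```python
def uib_variant(start_dw_kernel_size: int, middle_dw_kernel_size: int) -> str:
    """Name the block that a UIB configuration reduces to.

    A kernel size of 0 means the corresponding depthwise conv is left out.
    """
    has_start = bool(start_dw_kernel_size)
    has_middle = bool(middle_dw_kernel_size)
    if has_start and has_middle:
        return "ExtraDW"        # depthwise conv before and after the expansion
    if has_middle:
        return "IB"             # classic inverted bottleneck
    if has_start:
        return "ConvNext-like"  # spatial mixing before the expansion
    return "FFN"                # two pointwise convs only, as in ViT


for start, middle in [(3, 3), (0, 3), (3, 0), (0, 0)]:
    print(f"start_dw={start}, middle_dw={middle} -> {uib_variant(start, middle)}")
```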
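+## Algorithm
+MobileNetV4 builds the new Universal Inverted Bottleneck (UIB) and Mobile MQA layers from standard components, improves MobileNet with a refined neural architecture search (NAS) recipe, and then combines these with a novel, state-of-the-art distillation technique.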
+<div align=center>
+    <img src="./doc/algorithm.png"/>
+</div>
+## Environment setup
+```
+mv mobilenetv4_pytorch MobileNetv4 # strip the framework-name suffix
+```
+### Docker (option 1)
+```
+docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310
+# Replace <your IMAGE ID> with the ID of the image pulled above; for this image it is c85ed27005f2
+docker run -it --shm-size=32G -v $PWD/MobileNetv4:/home/MobileNetv4 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name mobilenetv4 <your IMAGE ID> bash
+cd /home/MobileNetv4
+pip install -r requirements.txt # requirements.txt
+```
+### Dockerfile (option 2)
+```
+cd MobileNetv4/docker
+docker build --no-cache -t mobilenetv4:latest .
+docker run --shm-size=32G --name mobilenetv4 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v $PWD/../../MobileNetv4:/home/MobileNetv4 -it mobilenetv4 bash
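+# If building the environment from the Dockerfile takes too long, comment out the pip install step inside it and install the Python packages after the container starts: pip install -r requirements.txt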
+```
+### Anaconda (option 3)
+1. The DCU-specific deep learning libraries this project needs can be downloaded from the Guanghe developer community (光合开发者社区):
+- https://developer.hpccube.com/tool/
+```
+DTK driver:dtk24.04
+python:python3.10
+torch:2.1.0
+torchvision:0.16.0
+```
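+`Tip: the versions of the DTK driver, Python, torch, and the other DCU-related tools listed above must match each other exactly.`
+2. Install the remaining, non-DCU-specific libraries from requirements.txt: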
+```
+pip install -r requirements.txt # requirements.txt
+```
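+## Dataset
+The original paper trains on `ImageNet`; the steps here use the `flowers` dataset. A mini [`flowers`](./datasets/flowers/) dataset is included in the project for a quick trial (just unpack it). Download the full dataset from the official page:
+- https://www.kaggle.com/datasets/alxmamaev/flowers-recognition?resource=download
+The data directory is laid out as follows: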
+```
+datasets/flowers
+    ├── train
+    │   ├── daisy
+    │   │   ├── xxx.jpg
+    │   │   └── xxx.jpg
+    │   ├── dandelion
+    │   │   ├── xxx.jpg
+    │   │   └── xxx.jpg
+    │   ├── rose
+    │   │   ├── xxx.jpg
+    │   │   └── xxx.jpg
+    │   ├── sunflower
+    │   │   ├── xxx.jpg
+    │   │   └── xxx.jpg
+    │   └── tulip
+    │       ├── xxx.jpg
+    │       └── xxx.jpg
+    └── val
+        ├── daisy
+        │   ├── xxx.jpg
+        │   └── xxx.jpg
+        ├── dandelion
+        │   ├── xxx.jpg
+        │   └── xxx.jpg
+        ├── rose
+        │   ├── xxx.jpg
+        │   └── xxx.jpg
+        ├── sunflower
+        │   ├── xxx.jpg
+        │   └── xxx.jpg
+        └── tulip
+            ├── xxx.jpg
+            └── xxx.jpg
+```
+## Training
+### Single node, single GPU
+```
+python train.py --data_path "./datasets/flowers" --num_classes 5 --input_size 256 --gpu 0 # params: 2.383641M
+```
+For more details, see the upstream project's [`readme_origin`](./readme_origin.md).
+## Inference
+```
+python predict.py
+# MODEL_PATH = './checkpoints/model_MNV4ConvSmall_seed901_best.pt' # Uses weights trained with MNV4ConvSmall; training outputs are saved under checkpoints.
+```
+## Result
+`Input:`
+```
+"results/6089825811_80f253fbe1.jpg"
+```
+<div align=center>
+    <img src="./doc/6089825811_80f253fbe1.png"/>
+</div>
+`Output:`
+```
+Vertification picture: 6089825811_80f253fbe1.jpg
+Recognition result: daisy
+Recognition confidence: 0.9952988
+```
+### Accuracy
+Trained for a maximum of 300 epochs; inference framework: PyTorch.
+|  device   |  Train_Loss  |  Train_Acc@1  |
+|:---------:|:------:|:------:|
+| DCU Z100L | 0.25923 | 94.460 |
+| GPU V100S | 0.29128 | 90.720 |
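+## Application scenarios
+### Algorithm category
+`Image recognition`
+### Key application industries
+`Manufacturing, e-commerce, healthcare, energy, education`
+## Source repository and issue feedback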
+- http://developer.hpccube.com/codes/modelzoo/mobilenetv4_pytorch.git
+## References
+- https://github.com/Reversev/Deeplearning_pytorch/blob/master/CV_net/MobileNetv4/predict.py
+- https://github.com/tensorflow/models/blob/master/official/vision/modeling/backbones/mobilenet.py
+- https://github.com/jiaowoguanren0615/MobileNetV4/tree/main
+- https://www.jianshu.com/p/992f0ebf656a
--- a/__pycache__/MobileNetv4.cpython-310.pyc
+++ b/__pycache__/MobileNetv4.cpython-310.pyc
--- a/__pycache__/model_config.cpython-310.pyc
+++ b/__pycache__/model_config.cpython-310.pyc
--- a/classes_indices.json
+++ b/classes_indices.json
+{
+    "0": "daisy",
+    "1": "dandelion",
+    "2": "rose",
+    "3": "sunflower",
+    "4": "tulip"
+}
\ No newline at end of file
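The keys and values in `classes_indices.json` line up with the alphabetically sorted class folders under `datasets/flowers/train`. One plausible way to produce such a mapping is sketched below. This is an assumption about how the file could be generated, not a copy of the repository's training script, and `build_class_indices` is a hypothetical helper. The sketch creates a throwaway directory tree so it runs without the dataset.

```python
import json
import tempfile
from pathlib import Path


def build_class_indices(split_dir: Path) -> dict:
    """Map "0", "1", ... to the sorted class-folder names under split_dir."""
    class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    return {str(index): name for index, name in enumerate(class_names)}


# Recreate the folder layout in a temporary directory (no images needed).
with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "datasets" / "flowers" / "train"
    for name in ["tulip", "rose", "daisy", "sunflower", "dandelion"]:
        (train_dir / name).mkdir(parents=True)
    class_indices = build_class_indices(train_dir)

print(json.dumps(class_indices, indent=4))
```

Sorting the folder names keeps the index assignment stable across machines, which matters because the classifier's output index is only meaningful together with this mapping.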
--- a/datasets/flowers/train/daisy/6089825811_80f253fbe1.jpg
+++ b/datasets/flowers/train/daisy/6089825811_80f253fbe1.jpg
--- a/datasets/flowers/train/daisy/6095817094_3a5b1d793d.jpg
+++ b/datasets/flowers/train/daisy/6095817094_3a5b1d793d.jpg
--- a/datasets/flowers/train/dandelion/7132605107_f5e033d725_n.jpg
+++ b/datasets/flowers/train/dandelion/7132605107_f5e033d725_n.jpg
--- a/datasets/flowers/train/dandelion/7132676187_7a4265b16f_n.jpg
+++ b/datasets/flowers/train/dandelion/7132676187_7a4265b16f_n.jpg
--- a/datasets/flowers/train/rose/7409458444_0bfc9a0682_n.jpg
+++ b/datasets/flowers/train/rose/7409458444_0bfc9a0682_n.jpg
--- a/datasets/flowers/train/rose/7419966772_d6c1c22a81.jpg
+++ b/datasets/flowers/train/rose/7419966772_d6c1c22a81.jpg
--- a/datasets/flowers/train/sunflower/7369484298_332f69bd88_n.jpg
+++ b/datasets/flowers/train/sunflower/7369484298_332f69bd88_n.jpg
--- a/datasets/flowers/train/sunflower/7492109308_bbbb982ebe_n.jpg
+++ b/datasets/flowers/train/sunflower/7492109308_bbbb982ebe_n.jpg
--- a/datasets/flowers/train/tulip/9947385346_3a8cacea02_n.jpg
+++ b/datasets/flowers/train/tulip/9947385346_3a8cacea02_n.jpg
--- a/datasets/flowers/train/tulip/9976515506_d496c5e72c.jpg
+++ b/datasets/flowers/train/tulip/9976515506_d496c5e72c.jpg
--- a/datasets/flowers/val/daisy/6089825811_80f253fbe1.jpg
+++ b/datasets/flowers/val/daisy/6089825811_80f253fbe1.jpg
--- a/datasets/flowers/val/daisy/6095817094_3a5b1d793d.jpg
+++ b/datasets/flowers/val/daisy/6095817094_3a5b1d793d.jpg
--- a/datasets/flowers/val/dandelion/7132605107_f5e033d725_n.jpg
+++ b/datasets/flowers/val/dandelion/7132605107_f5e033d725_n.jpg
--- a/datasets/flowers/val/dandelion/7132676187_7a4265b16f_n.jpg
+++ b/datasets/flowers/val/dandelion/7132676187_7a4265b16f_n.jpg