Unverified commit 19a02415 authored by Mashiro, committed by GitHub

[Refactor] Use MODELS registry in mmengine and delete basemodule (#2172)

* change MODELS to mmengine, delete basemodule

* fix unit test

* remove build from cfg

* add comment and rename TARGET_MODELS to registry

* refine cnn docs

* remove unnecessary check

* refine as comment

* refine build_xxx_conv error message

* fix lint

* fix import registry from mmcv

* remove unused file
parent f6fd6c21
......@@ -14,6 +14,8 @@ which can be written in configs or specified via command line arguments.
A simple example is
```python
from mmcv.cnn import build_conv_layer
cfg = dict(type='Conv3d')
layer = build_conv_layer(cfg, in_channels=3, out_channels=8, kernel_size=3)
```
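The sibling builders exported from `mmcv.cnn` follow the same config-driven pattern. A minimal sketch (note that `build_norm_layer` returns a `(name, layer)` tuple, where the name is derived from the layer type and an optional postfix):
```python
from mmcv.cnn import build_activation_layer, build_norm_layer

act = build_activation_layer(dict(type='ReLU'))
# build_norm_layer returns a (name, layer) tuple, e.g. ('bn', nn.BatchNorm2d(8))
name, norm = build_norm_layer(dict(type='BN'), num_features=8)
```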
......@@ -31,9 +33,9 @@ We also allow extending the building methods with custom layers and operators.
1. Write and register your own module.
```python
from mmcv.cnn import UPSAMPLE_LAYERS
from mmengine.registry import MODELS
@UPSAMPLE_LAYERS.register_module()
@MODELS.register_module()
class MyUpsample:
def __init__(self, scale_factor):
......@@ -46,6 +48,8 @@ We also allow extending the building methods with custom layers and operators.
2. Import `MyUpsample` somewhere (e.g., in `__init__.py`) and then use it.
```python
from mmcv.cnn import build_upsample_layer
cfg = dict(type='MyUpsample', scale_factor=2)
layer = build_upsample_layer(cfg)
```
......@@ -57,6 +61,8 @@ We also provide common module bundles to facilitate the network construction.
please refer to the [api](api.html#mmcv.cnn.ConvModule) for details.
```python
from mmcv.cnn import ConvModule
# conv + bn + relu
conv = ConvModule(3, 8, 2, norm_cfg=dict(type='BN'))
# conv + gn + relu
......@@ -72,475 +78,6 @@ conv = ConvModule(
3, 8, 2, norm_cfg=dict(type='BN'), order=('norm', 'conv', 'act'))
```
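The activation is configurable in the same way through the `act_cfg` key; a minimal sketch:
```python
# conv + bn + leaky relu
conv = ConvModule(3, 8, 2, norm_cfg=dict(type='BN'), act_cfg=dict(type='LeakyReLU'))
```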
### Weight initialization
> Implementation details are available at [mmcv/cnn/utils/weight_init.py](../../mmcv/cnn/utils/weight_init.py)
During training, a proper initialization strategy is beneficial for speeding up
training or obtaining higher performance. In MMCV, we provide some commonly used
methods for initializing modules like `nn.Conv2d`, as well as high-level APIs
for initializing models containing one or more modules.
#### Initialization functions
Initialize a `nn.Module` such as `nn.Conv2d`, `nn.Linear` in a functional way.
We provide the following initialization methods.
- constant_init
Initialize module parameters with constant values.
```python
>>> import torch.nn as nn
>>> from mmcv.cnn import constant_init
>>> conv1 = nn.Conv2d(3, 3, 1)
>>> # constant_init(module, val, bias=0)
>>> constant_init(conv1, 1, 0)
>>> conv1.weight
```
- xavier_init
Initialize module parameters with values according to the method
described in [Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010)](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf)
```python
>>> import torch.nn as nn
>>> from mmcv.cnn import xavier_init
>>> conv1 = nn.Conv2d(3, 3, 1)
>>> # xavier_init(module, gain=1, bias=0, distribution='normal')
>>> xavier_init(conv1, distribution='normal')
```
- normal_init
Initialize module parameters with values drawn from a normal distribution.
```python
>>> import torch.nn as nn
>>> from mmcv.cnn import normal_init
>>> conv1 = nn.Conv2d(3, 3, 1)
>>> # normal_init(module, mean=0, std=1, bias=0)
>>> normal_init(conv1, std=0.01, bias=0)
```
- uniform_init
Initialize module parameters with values drawn from a uniform distribution.
```python
>>> import torch.nn as nn
>>> from mmcv.cnn import uniform_init
>>> conv1 = nn.Conv2d(3, 3, 1)
>>> # uniform_init(module, a=0, b=1, bias=0)
>>> uniform_init(conv1, a=0, b=1)
```
- kaiming_init
Initialize module parameters with values according to the method
described in [Delving deep into rectifiers: Surpassing human-level
performance on ImageNet classification - He, K. et al. (2015)](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf)
```python
>>> import torch.nn as nn
>>> from mmcv.cnn import kaiming_init
>>> conv1 = nn.Conv2d(3, 3, 1)
>>> # kaiming_init(module, a=0, mode='fan_out', nonlinearity='relu', bias=0, distribution='normal')
>>> kaiming_init(conv1)
```
- caffe2_xavier_init
The Xavier initialization implemented in Caffe2, which corresponds to `kaiming_uniform_` in PyTorch.
```python
>>> import torch.nn as nn
>>> from mmcv.cnn import caffe2_xavier_init
>>> conv1 = nn.Conv2d(3, 3, 1)
>>> # caffe2_xavier_init(module, bias=0)
>>> caffe2_xavier_init(conv1)
```
- bias_init_with_prob
Initialize the bias of a conv/fc layer according to a given probability, as proposed in [Focal Loss for Dense Object Detection](https://arxiv.org/pdf/1708.02002.pdf).
```python
>>> from mmcv.cnn import bias_init_with_prob
>>> # bias_init_with_prob is proposed in Focal Loss
>>> bias = bias_init_with_prob(0.01)
>>> bias
-4.59511985013459
```
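For reference, the returned value is the bias that makes a sigmoid output equal the given prior probability, i.e. the solution of `sigmoid(b) = p`:
```latex
\sigma(b) = \frac{1}{1 + e^{-b}} = p
\quad\Longrightarrow\quad
b = -\log\frac{1 - p}{p},
\qquad p = 0.01 \Rightarrow b = -\log(99) \approx -4.5951
```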
#### Initializers and configs
On the basis of the initialization methods, we define the corresponding initialization classes and register them to `INITIALIZERS`, so that models can be initialized from configs.
We provide the following initialization classes.
- ConstantInit
- XavierInit
- NormalInit
- UniformInit
- KaimingInit
- Caffe2XavierInit
- PretrainedInit
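Each class is registered under the `type` string used in `init_cfg`, i.e. the class name without the `Init` suffix:
```python
# handled by KaimingInit
init_cfg = dict(type='Kaiming', layer='Conv2d')
# handled by PretrainedInit
init_cfg = dict(type='Pretrained', checkpoint='torchvision://resnet50')
```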
Let us introduce the usage of `initialize` in detail.
1. Initialize model by the `layer` key
If we only define the `layer` key, only the layers listed in `layer` will be initialized.
NOTE: The `layer` key accepts class names of PyTorch modules with `weight` and `bias` attributes, so `MultiheadAttention` is not supported.
- Define the `layer` key to initialize modules with the same configuration.
```python
import torch.nn as nn
from mmcv.cnn import initialize
class FooNet(nn.Module):
def __init__(self):
super().__init__()
self.feat = nn.Conv1d(3, 1, 3)
self.reg = nn.Conv2d(3, 3, 3)
self.cls = nn.Linear(1, 2)
model = FooNet()
init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d', 'Linear'], val=1)
# initialize the whole module with the same configuration
initialize(model, init_cfg)
# model.feat.weight
# Parameter containing:
# tensor([[[1., 1., 1.],
# [1., 1., 1.],
# [1., 1., 1.]]], requires_grad=True)
```
- Define the `layer` key to initialize layers with different configurations.
```python
import torch.nn as nn
from mmcv.cnn.utils import initialize
class FooNet(nn.Module):
def __init__(self):
super().__init__()
self.feat = nn.Conv1d(3, 1, 3)
self.reg = nn.Conv2d(3, 3, 3)
self.cls = nn.Linear(1,2)
model = FooNet()
init_cfg = [dict(type='Constant', layer='Conv1d', val=1),
dict(type='Constant', layer='Conv2d', val=2),
dict(type='Constant', layer='Linear', val=3)]
# nn.Conv1d will be initialized with dict(type='Constant', val=1)
# nn.Conv2d will be initialized with dict(type='Constant', val=2)
# nn.Linear will be initialized with dict(type='Constant', val=3)
initialize(model, init_cfg)
# model.reg.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
```
2. Initialize model by the `override` key
- When initializing a specific part of a model by its attribute name, we can use the `override` key; values in `override` take precedence over those in `init_cfg`.
```python
import torch.nn as nn
from mmcv.cnn import initialize
class FooNet(nn.Module):
def __init__(self):
super().__init__()
self.feat = nn.Conv1d(3, 1, 3)
self.reg = nn.Conv2d(3, 3, 3)
self.cls = nn.Sequential(nn.Conv1d(3, 1, 3), nn.Linear(1,2))
# if we would like to initialize model's weights as 1 and bias as 2
# but weight in `reg` as 3 and bias 4, we can use override key
model = FooNet()
init_cfg = dict(type='Constant', layer=['Conv1d','Conv2d'], val=1, bias=2,
override=dict(type='Constant', name='reg', val=3, bias=4))
# self.feat and self.cls will be initialized with dict(type='Constant', val=1, bias=2)
# The module called 'reg' will be initialized with dict(type='Constant', val=3, bias=4)
initialize(model, init_cfg)
# model.reg.weight
# Parameter containing:
# tensor([[[[3., 3., 3.],
# [3., 3., 3.],
# [3., 3., 3.]],
# ...,
# [[3., 3., 3.],
# [3., 3., 3.],
# [3., 3., 3.]]]], requires_grad=True)
```
- If `layer` is None in `init_cfg`, only the sub-modules named in `override` will be initialized, and `type` and other arguments in `override` can be omitted.
```python
model = FooNet()
init_cfg = dict(type='Constant', val=1, bias=2, override=dict(name='reg'))
# self.feat and self.cls will be initialized by Pytorch
# The module called 'reg' will be initialized with dict(type='Constant', val=1, bias=2)
initialize(model, init_cfg)
# model.reg.weight
# Parameter containing:
# tensor([[[[1., 1., 1.],
# [1., 1., 1.],
# [1., 1., 1.]],
# ...,
# [[1., 1., 1.],
# [1., 1., 1.],
# [1., 1., 1.]]]], requires_grad=True)
```
- If we define neither the `layer` key nor the `override` key, nothing will be initialized.
- Invalid usages of the `override` key
```python
# invalid: override does not have the name key
init_cfg = dict(type='Constant', layer=['Conv1d','Conv2d'],
val=1, bias=2,
override=dict(type='Constant', val=3, bias=4))
# also invalid: override has name and other args but no type
init_cfg = dict(type='Constant', layer=['Conv1d','Conv2d'],
val=1, bias=2,
override=dict(name='reg', val=3, bias=4))
```
3. Initialize model with a pretrained model
```python
import torch.nn as nn
import torchvision.models as models
from mmcv.cnn import initialize
# initialize model with pretrained model
model = models.resnet50()
# model.conv1.weight
# Parameter containing:
# tensor([[[[-6.7435e-03, -2.3531e-02, -9.0143e-03, ..., -2.1245e-03,
# -1.8077e-03, 3.0338e-03],
# [-1.2603e-02, -2.7831e-02, 2.3187e-02, ..., -1.5793e-02,
# 1.1655e-02, 4.5889e-03],
# [-3.7916e-02, 1.2014e-02, 1.3815e-02, ..., -4.2651e-03,
# 1.7314e-02, -9.9998e-03],
# ...,
init_cfg = dict(type='Pretrained',
checkpoint='torchvision://resnet50')
initialize(model, init_cfg)
# model.conv1.weight
# Parameter containing:
# tensor([[[[ 1.3335e-02, 1.4664e-02, -1.5351e-02, ..., -4.0896e-02,
# -4.3034e-02, -7.0755e-02],
# [ 4.1205e-03, 5.8477e-03, 1.4948e-02, ..., 2.2060e-03,
# -2.0912e-02, -3.8517e-02],
# [ 2.2331e-02, 2.3595e-02, 1.6120e-02, ..., 1.0281e-01,
# 6.2641e-02, 5.1977e-02],
# ...,
# initialize the weights of a sub-module with a specific part of a pretrained model by using 'prefix'
model = models.resnet50()
url = 'http://download.openmmlab.com/mmdetection/v2.0/retinanet/'\
'retinanet_r50_fpn_1x_coco/'\
'retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth'
init_cfg = dict(type='Pretrained',
checkpoint=url, prefix='backbone.')
initialize(model, init_cfg)
```
4. Initialize models inherited from BaseModule, Sequential, ModuleList, ModuleDict
`BaseModule` inherits from `torch.nn.Module`, and the only difference between them is that `BaseModule` implements `init_weights()`.
`Sequential` inherits from `BaseModule` and `torch.nn.Sequential`.
`ModuleList` inherits from `BaseModule` and `torch.nn.ModuleList`.
`ModuleDict` inherits from `BaseModule` and `torch.nn.ModuleDict`.
```python
import torch.nn as nn
from mmcv.runner import BaseModule, Sequential, ModuleList, ModuleDict
class FooConv1d(BaseModule):
def __init__(self, init_cfg=None):
super().__init__(init_cfg)
self.conv1d = nn.Conv1d(4, 1, 4)
def forward(self, x):
return self.conv1d(x)
class FooConv2d(BaseModule):
def __init__(self, init_cfg=None):
super().__init__(init_cfg)
self.conv2d = nn.Conv2d(3, 1, 3)
def forward(self, x):
return self.conv2d(x)
# BaseModule
init_cfg = dict(type='Constant', layer='Conv1d', val=0., bias=1.)
model = FooConv1d(init_cfg)
model.init_weights()
# model.conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# Sequential
init_cfg1 = dict(type='Constant', layer='Conv1d', val=0., bias=1.)
init_cfg2 = dict(type='Constant', layer='Conv2d', val=2., bias=3.)
model1 = FooConv1d(init_cfg1)
model2 = FooConv2d(init_cfg2)
seq_model = Sequential(model1, model2)
seq_model.init_weights()
# seq_model[0].conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# seq_model[1].conv2d.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
# inner init_cfg has higher priority
model1 = FooConv1d(init_cfg1)
model2 = FooConv2d(init_cfg2)
init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d'], val=4., bias=5.)
seq_model = Sequential(model1, model2, init_cfg=init_cfg)
seq_model.init_weights()
# seq_model[0].conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# seq_model[1].conv2d.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
# ModuleList
model1 = FooConv1d(init_cfg1)
model2 = FooConv2d(init_cfg2)
modellist = ModuleList([model1, model2])
modellist.init_weights()
# modellist[0].conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# modellist[1].conv2d.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
# inner init_cfg has higher priority
model1 = FooConv1d(init_cfg1)
model2 = FooConv2d(init_cfg2)
init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d'], val=4., bias=5.)
modellist = ModuleList([model1, model2], init_cfg=init_cfg)
modellist.init_weights()
# modellist[0].conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# modellist[1].conv2d.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
# ModuleDict
model1 = FooConv1d(init_cfg1)
model2 = FooConv2d(init_cfg2)
modeldict = ModuleDict(dict(model1=model1, model2=model2))
modeldict.init_weights()
# modeldict['model1'].conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# modeldict['model2'].conv2d.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
# inner init_cfg has higher priority
model1 = FooConv1d(init_cfg1)
model2 = FooConv2d(init_cfg2)
init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d'], val=4., bias=5.)
modeldict = ModuleDict(dict(model1=model1, model2=model2), init_cfg=init_cfg)
modeldict.init_weights()
# modeldict['model1'].conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# modeldict['model2'].conv2d.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
```
### Model Zoo
Besides torchvision pre-trained models, we also provide pre-trained models for the following CNNs:
......
......@@ -129,9 +129,9 @@ Basically, there are two ways to build a module from child or sibling registries
In MMDetection we define:
```python
from mmcv.utils import Registry
from mmcv.cnn import MODELS as MMCV_MODELS
MODELS = Registry('model', parent=MMCV_MODELS)
from mmengine.registry import Registry
from mmengine.registry import MODELS as MMENGINE_MODELS
MODELS = Registry('model', parent=MMENGINE_MODELS)
@MODELS.register_module()
class NetA(nn.Module):
......@@ -142,9 +142,9 @@ Basically, there are two ways to build a module from child or sibling registries
In MMClassification we define:
```python
from mmcv.utils import Registry
from mmcv.cnn import MODELS as MMCV_MODELS
MODELS = Registry('model', parent=MMCV_MODELS)
from mmengine.registry import Registry
from mmengine.registry import MODELS as MMENGINE_MODELS
MODELS = Registry('model', parent=MMENGINE_MODELS)
@MODELS.register_module()
class NetB(nn.Module):
......@@ -173,7 +173,7 @@ Basically, there are two ways to build a module from child or sibling registries
The shared `MODELS` registry in MMEngine is the parent registry for all downstream codebases (the root registry):
```python
from mmcv.cnn import MODELS as MMCV_MODELS
net_a = MMCV_MODELS.build(cfg=dict(type='mmdet.NetA'))
net_b = MMCV_MODELS.build(cfg=dict(type='mmcls.NetB'))
from mmengine.registry import MODELS as MMENGINE_MODELS
net_a = MMENGINE_MODELS.build(cfg=dict(type='mmdet.NetA'))
net_b = MMENGINE_MODELS.build(cfg=dict(type='mmcls.NetB'))
```
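Lookups also fall back upward: a child registry that cannot resolve a type locally searches its parent. A minimal sketch, assuming the child `MODELS` registry and `NetA` defined above, and that `mmcv.cnn` has been imported so its layers are registered in the root:
```python
# NetA is found in the child registry itself; ReLU is not registered
# there, so the lookup falls back to the mmengine root registry
net_a = MODELS.build(cfg=dict(type='NetA'))
relu = MODELS.build(cfg=dict(type='ReLU'))
```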
......@@ -11,6 +11,8 @@
A simple example:
```python
from mmcv.cnn import build_conv_layer
cfg = dict(type='Conv3d')
layer = build_conv_layer(cfg, in_channels=3, out_channels=8, kernel_size=3)
```
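The sibling builders exported from `mmcv.cnn` follow the same config-driven pattern. A minimal sketch (note that `build_norm_layer` returns a `(name, layer)` tuple):
```python
from mmcv.cnn import build_activation_layer, build_norm_layer

act = build_activation_layer(dict(type='ReLU'))
# build_norm_layer returns a (name, layer) tuple, e.g. ('bn', nn.BatchNorm2d(8))
name, norm = build_norm_layer(dict(type='BN'), num_features=8)
```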
......@@ -28,9 +30,9 @@ layer = build_conv_layer(cfg, in_channels=3, out_channels=8, kernel_size=3)
1. Write and register your own module:
```python
from mmcv.cnn import UPSAMPLE_LAYERS
from mmengine.registry import MODELS
@UPSAMPLE_LAYERS.register_module()
@MODELS.register_module()
class MyUpsample:
def __init__(self, scale_factor):
......@@ -43,6 +45,8 @@ layer = build_conv_layer(cfg, in_channels=3, out_channels=8, kernel_size=3)
2. Import `MyUpsample` somewhere (e.g., in `__init__.py`) and then use it:
```python
from mmcv.cnn import build_upsample_layer
cfg = dict(type='MyUpsample', scale_factor=2)
layer = build_upsample_layer(cfg)
```
......@@ -53,6 +57,8 @@ layer = build_conv_layer(cfg, in_channels=3, out_channels=8, kernel_size=3)
The convolution bundle `ConvModule` consists of convolution, normalization, and activation layers; please refer to the [ConvModule api](api.html#mmcv.cnn.ConvModule) for more details.
```python
from mmcv.cnn import ConvModule
# conv + bn + relu
conv = ConvModule(3, 8, 2, norm_cfg=dict(type='BN'))
# conv + gn + relu
......@@ -68,468 +74,6 @@ conv = ConvModule(
3, 8, 2, norm_cfg=dict(type='BN'), order=('norm', 'conv', 'act'))
```
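The activation is configurable in the same way through the `act_cfg` key; a minimal sketch:
```python
# conv + bn + leaky relu
conv = ConvModule(3, 8, 2, norm_cfg=dict(type='BN'), act_cfg=dict(type='LeakyReLU'))
```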
### Weight initialization
> Implementation details are available at [mmcv/cnn/utils/weight_init.py](../../mmcv/cnn/utils/weight_init.py)
During training, a proper initialization strategy is beneficial for speeding up training or obtaining higher performance. In MMCV, we provide some commonly used methods for initializing modules like `nn.Conv2d`, as well as high-level APIs for initializing models containing one or more modules.
#### Initialization functions
Initialize a `nn.Module` such as `nn.Conv2d` or `nn.Linear` in a functional way.
We provide the following initialization methods.
- constant_init
Initialize module parameters with constant values.
```python
>>> import torch.nn as nn
>>> from mmcv.cnn import constant_init
>>> conv1 = nn.Conv2d(3, 3, 1)
>>> # constant_init(module, val, bias=0)
>>> constant_init(conv1, 1, 0)
>>> conv1.weight
```
- xavier_init
Initialize module parameters with values according to the method described in [Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010)](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf)
```python
>>> import torch.nn as nn
>>> from mmcv.cnn import xavier_init
>>> conv1 = nn.Conv2d(3, 3, 1)
>>> # xavier_init(module, gain=1, bias=0, distribution='normal')
>>> xavier_init(conv1, distribution='normal')
```
- normal_init
Initialize module parameters with values drawn from a normal (Gaussian) distribution.
```python
>>> import torch.nn as nn
>>> from mmcv.cnn import normal_init
>>> conv1 = nn.Conv2d(3, 3, 1)
>>> # normal_init(module, mean=0, std=1, bias=0)
>>> normal_init(conv1, std=0.01, bias=0)
```
- uniform_init
Initialize module parameters with values drawn from a uniform distribution.
```python
>>> import torch.nn as nn
>>> from mmcv.cnn import uniform_init
>>> conv1 = nn.Conv2d(3, 3, 1)
>>> # uniform_init(module, a=0, b=1, bias=0)
>>> uniform_init(conv1, a=0, b=1)
```
- kaiming_init
Initialize module parameters with values according to the method described in [Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015)](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf)
```python
>>> import torch.nn as nn
>>> from mmcv.cnn import kaiming_init
>>> conv1 = nn.Conv2d(3, 3, 1)
>>> # kaiming_init(module, a=0, mode='fan_out', nonlinearity='relu', bias=0, distribution='normal')
>>> kaiming_init(conv1)
```
- caffe2_xavier_init
The Xavier initialization implemented in Caffe2, which corresponds to `kaiming_uniform_` in PyTorch.
```python
>>> import torch.nn as nn
>>> from mmcv.cnn import caffe2_xavier_init
>>> conv1 = nn.Conv2d(3, 3, 1)
>>> # caffe2_xavier_init(module, bias=0)
>>> caffe2_xavier_init(conv1)
```
- bias_init_with_prob
Initialize the bias of a conv/fc layer according to a given probability, as proposed in [Focal Loss for Dense Object Detection](https://arxiv.org/pdf/1708.02002.pdf).
```python
>>> from mmcv.cnn import bias_init_with_prob
>>> # bias_init_with_prob is proposed in Focal Loss
>>> bias = bias_init_with_prob(0.01)
>>> bias
-4.59511985013459
```
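For reference, the returned value is the bias that makes a sigmoid output equal the given prior probability, i.e. the solution of `sigmoid(b) = p`:
```latex
\sigma(b) = \frac{1}{1 + e^{-b}} = p
\quad\Longrightarrow\quad
b = -\log\frac{1 - p}{p},
\qquad p = 0.01 \Rightarrow b = -\log(99) \approx -4.5951
```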
#### Initializers and configs
On the basis of the initialization methods, we define the corresponding initialization classes and register them to `INITIALIZERS`, so that models can be initialized from configs.
We provide the following initialization classes:
- ConstantInit
- XavierInit
- NormalInit
- UniformInit
- KaimingInit
- Caffe2XavierInit
- PretrainedInit
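Each class is registered under the `type` string used in `init_cfg`, i.e. the class name without the `Init` suffix:
```python
# handled by KaimingInit
init_cfg = dict(type='Kaiming', layer='Conv2d')
# handled by PretrainedInit
init_cfg = dict(type='Pretrained', checkpoint='torchvision://resnet50')
```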
Let us introduce the usage of `initialize` in detail.
1. Initialize model by the `layer` key
If we only define the `layer` key, only the layers listed in `layer` will be initialized.
NOTE: The `layer` key accepts class names of PyTorch modules with `weight` and `bias` attributes, so `MultiheadAttention` is not supported.
- Define a list in the `layer` key to initialize modules with the same configuration.
```python
import torch.nn as nn
from mmcv.cnn import initialize
class FooNet(nn.Module):
def __init__(self):
super().__init__()
self.feat = nn.Conv1d(3, 1, 3)
self.reg = nn.Conv2d(3, 3, 3)
self.cls = nn.Linear(1, 2)
model = FooNet()
init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d', 'Linear'], val=1)
# initialize the whole module with the same configuration
initialize(model, init_cfg)
# model.feat.weight
# Parameter containing:
# tensor([[[1., 1., 1.],
# [1., 1., 1.],
# [1., 1., 1.]]], requires_grad=True)
```
- Define the `layer` key to initialize layers with different configurations.
```python
import torch.nn as nn
from mmcv.cnn.utils import initialize
class FooNet(nn.Module):
def __init__(self):
super().__init__()
self.feat = nn.Conv1d(3, 1, 3)
self.reg = nn.Conv2d(3, 3, 3)
self.cls = nn.Linear(1,2)
model = FooNet()
init_cfg = [dict(type='Constant', layer='Conv1d', val=1),
dict(type='Constant', layer='Conv2d', val=2),
dict(type='Constant', layer='Linear', val=3)]
# nn.Conv1d will be initialized with dict(type='Constant', val=1)
# nn.Conv2d will be initialized with dict(type='Constant', val=2)
# nn.Linear will be initialized with dict(type='Constant', val=3)
initialize(model, init_cfg)
# model.reg.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
```
2. Initialize model by the `override` key
- When initializing a specific part of a model by its attribute name, we can use the `override` key; values in `override` take precedence over those in `init_cfg`.
```python
import torch.nn as nn
from mmcv.cnn import initialize
class FooNet(nn.Module):
def __init__(self):
super().__init__()
self.feat = nn.Conv1d(3, 1, 3)
self.reg = nn.Conv2d(3, 3, 3)
self.cls = nn.Sequential(nn.Conv1d(3, 1, 3), nn.Linear(1,2))
# if we would like to initialize the model's weights as 1 and biases as 2,
# but weights in `reg` as 3 and biases as 4, we can use the override key
model = FooNet()
init_cfg = dict(type='Constant', layer=['Conv1d','Conv2d'], val=1, bias=2,
override=dict(type='Constant', name='reg', val=3, bias=4))
# self.feat and self.cls will be initialized with dict(type='Constant', val=1, bias=2)
# the module called 'reg' will be initialized with dict(type='Constant', val=3, bias=4)
initialize(model, init_cfg)
# model.reg.weight
# Parameter containing:
# tensor([[[[3., 3., 3.],
# [3., 3., 3.],
# [3., 3., 3.]],
# ...,
# [[3., 3., 3.],
# [3., 3., 3.],
# [3., 3., 3.]]]], requires_grad=True)
```
- If `layer` is None in `init_cfg`, only the sub-modules named in `override` will be initialized, and `type` and other arguments in `override` can be omitted.
```python
model = FooNet()
init_cfg = dict(type='Constant', val=1, bias=2, override=dict(name='reg'))
# self.feat and self.cls will be initialized with PyTorch defaults
# the module called 'reg' will be initialized with dict(type='Constant', val=1, bias=2)
initialize(model, init_cfg)
# model.reg.weight
# Parameter containing:
# tensor([[[[1., 1., 1.],
# [1., 1., 1.],
# [1., 1., 1.]],
# ...,
# [[1., 1., 1.],
# [1., 1., 1.],
# [1., 1., 1.]]]], requires_grad=True)
```
- If we define neither the `layer` key nor the `override` key, nothing will be initialized.
- Invalid usages of the `override` key
```python
# invalid: override does not have the name key
init_cfg = dict(type='Constant', layer=['Conv1d','Conv2d'],
val=1, bias=2,
override=dict(type='Constant', val=3, bias=4))
# also invalid: override has name and other args but no type
init_cfg = dict(type='Constant', layer=['Conv1d','Conv2d'],
val=1, bias=2,
override=dict(name='reg', val=3, bias=4))
```
3. Initialize model with a pretrained model
```python
import torch.nn as nn
import torchvision.models as models
from mmcv.cnn import initialize
# initialize the model with a pretrained model
model = models.resnet50()
# model.conv1.weight
# Parameter containing:
# tensor([[[[-6.7435e-03, -2.3531e-02, -9.0143e-03, ..., -2.1245e-03,
# -1.8077e-03, 3.0338e-03],
# [-1.2603e-02, -2.7831e-02, 2.3187e-02, ..., -1.5793e-02,
# 1.1655e-02, 4.5889e-03],
# [-3.7916e-02, 1.2014e-02, 1.3815e-02, ..., -4.2651e-03,
# 1.7314e-02, -9.9998e-03],
# ...,
init_cfg = dict(type='Pretrained',
checkpoint='torchvision://resnet50')
initialize(model, init_cfg)
# model.conv1.weight
# Parameter containing:
# tensor([[[[ 1.3335e-02, 1.4664e-02, -1.5351e-02, ..., -4.0896e-02,
# -4.3034e-02, -7.0755e-02],
# [ 4.1205e-03, 5.8477e-03, 1.4948e-02, ..., 2.2060e-03,
# -2.0912e-02, -3.8517e-02],
# [ 2.2331e-02, 2.3595e-02, 1.6120e-02, ..., 1.0281e-01,
# 6.2641e-02, 5.1977e-02],
# ...,
# initialize the weights of a sub-module with a specific part of a pretrained model by using 'prefix'
model = models.resnet50()
url = 'http://download.openmmlab.com/mmdetection/v2.0/retinanet/'\
'retinanet_r50_fpn_1x_coco/'\
'retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth'
init_cfg = dict(type='Pretrained',
checkpoint=url, prefix='backbone.')
initialize(model, init_cfg)
```
4. Initialize models inherited from BaseModule, Sequential, ModuleList, ModuleDict
`BaseModule` inherits from `torch.nn.Module`, and the only difference between them is that `BaseModule` implements `init_weights()`.
`Sequential` inherits from `BaseModule` and `torch.nn.Sequential`.
`ModuleList` inherits from `BaseModule` and `torch.nn.ModuleList`.
`ModuleDict` inherits from `BaseModule` and `torch.nn.ModuleDict`.
```python
import torch.nn as nn
from mmcv.runner import BaseModule, Sequential, ModuleList, ModuleDict
class FooConv1d(BaseModule):
def __init__(self, init_cfg=None):
super().__init__(init_cfg)
self.conv1d = nn.Conv1d(4, 1, 4)
def forward(self, x):
return self.conv1d(x)
class FooConv2d(BaseModule):
def __init__(self, init_cfg=None):
super().__init__(init_cfg)
self.conv2d = nn.Conv2d(3, 1, 3)
def forward(self, x):
return self.conv2d(x)
# BaseModule
init_cfg = dict(type='Constant', layer='Conv1d', val=0., bias=1.)
model = FooConv1d(init_cfg)
model.init_weights()
# model.conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# Sequential
init_cfg1 = dict(type='Constant', layer='Conv1d', val=0., bias=1.)
init_cfg2 = dict(type='Constant', layer='Conv2d', val=2., bias=3.)
model1 = FooConv1d(init_cfg1)
model2 = FooConv2d(init_cfg2)
seq_model = Sequential(model1, model2)
seq_model.init_weights()
# seq_model[0].conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# seq_model[1].conv2d.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
# inner init_cfg has higher priority
model1 = FooConv1d(init_cfg1)
model2 = FooConv2d(init_cfg2)
init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d'], val=4., bias=5.)
seq_model = Sequential(model1, model2, init_cfg=init_cfg)
seq_model.init_weights()
# seq_model[0].conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# seq_model[1].conv2d.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
# ModuleList
model1 = FooConv1d(init_cfg1)
model2 = FooConv2d(init_cfg2)
modellist = ModuleList([model1, model2])
modellist.init_weights()
# modellist[0].conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# modellist[1].conv2d.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
# inner init_cfg has higher priority
model1 = FooConv1d(init_cfg1)
model2 = FooConv2d(init_cfg2)
init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d'], val=4., bias=5.)
modellist = ModuleList([model1, model2], init_cfg=init_cfg)
modellist.init_weights()
# modellist[0].conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# modellist[1].conv2d.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
# ModuleDict
model1 = FooConv1d(init_cfg1)
model2 = FooConv2d(init_cfg2)
modeldict = ModuleDict(dict(model1=model1, model2=model2))
modeldict.init_weights()
# modeldict['model1'].conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# modeldict['model2'].conv2d.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
# inner init_cfg has higher priority
model1 = FooConv1d(init_cfg1)
model2 = FooConv2d(init_cfg2)
init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d'], val=4., bias=5.)
modeldict = ModuleDict(dict(model1=model1, model2=model2), init_cfg=init_cfg)
modeldict.init_weights()
# modeldict['model1'].conv1d.weight
# Parameter containing:
# tensor([[[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]]], requires_grad=True)
# modeldict['model2'].conv2d.weight
# Parameter containing:
# tensor([[[[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]],
# ...,
# [[2., 2., 2.],
# [2., 2., 2.],
# [2., 2., 2.]]]], requires_grad=True)
```
### Model Zoo
Besides `torchvision` pre-trained models, we also provide pre-trained models for the following CNNs:
......
......@@ -30,7 +30,7 @@ MMCV uses [registries](https://github.com/open-mmlab/mmcv/blob/master/mmcv/util
Suppose we want to implement a series of dataset converters for converting different data formats to a standard format. We first create a directory named converters as a package, and inside it create converters/builder.py to implement the builder, as follows:
```python
from mmcv.utils import Registry
from mmengine.registry import Registry
# create a registry for converters
CONVERTERS = Registry('converter')
```
......@@ -126,9 +126,9 @@ CONVERTERS = Registry('converter', build_func=build_converter)
In MMDetection we define:
```python
from mmcv.utils import Registry
from mmcv.cnn import MODELS as MMCV_MODELS
MODELS = Registry('model', parent=MMCV_MODELS)
from mmengine.registry import Registry
from mmengine.registry import MODELS as MMENGINE_MODELS
MODELS = Registry('model', parent=MMENGINE_MODELS)
@MODELS.register_module()
class NetA(nn.Module):
......@@ -139,9 +139,9 @@ CONVERTERS = Registry('converter', build_func=build_converter)
In MMClassification we define:
```python
from mmcv.utils import Registry
from mmcv.cnn import MODELS as MMCV_MODELS
MODELS = Registry('model', parent=MMCV_MODELS)
from mmengine.registry import Registry
from mmengine.registry import MODELS as MMENGINE_MODELS
MODELS = Registry('model', parent=MMENGINE_MODELS)
@MODELS.register_module()
class NetB(nn.Module):
......@@ -170,7 +170,7 @@ CONVERTERS = Registry('converter', build_func=build_converter)
The shared `MODELS` registry in MMEngine is the parent registry (the root registry) for all downstream codebases:
```python
from mmcv.cnn import MODELS as MMCV_MODELS
net_a = MMCV_MODELS.build(cfg=dict(type='mmdet.NetA'))
net_b = MMCV_MODELS.build(cfg=dict(type='mmcls.NetB'))
from mmengine.registry import MODELS as MMENGINE_MODELS
net_a = MMENGINE_MODELS.build(cfg=dict(type='mmdet.NetA'))
net_b = MMENGINE_MODELS.build(cfg=dict(type='mmcls.NetB'))
```
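Lookups also fall back upward: a child registry that cannot resolve a type locally searches its parent. A minimal sketch, assuming the child `MODELS` registry and `NetA` defined above, and that `mmcv.cnn` has been imported so its layers are registered in the root:
```python
# NetA is found in the child registry itself; ReLU is not registered
# there, so the lookup falls back to the mmengine root registry
net_a = MODELS.build(cfg=dict(type='NetA'))
relu = MODELS.build(cfg=dict(type='ReLU'))
```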
# Copyright (c) OpenMMLab. All rights reserved.
from .alexnet import AlexNet
# yapf: disable
from .bricks import (ACTIVATION_LAYERS, CONV_LAYERS, NORM_LAYERS,
PADDING_LAYERS, PLUGIN_LAYERS, UPSAMPLE_LAYERS,
ContextBlock, Conv2d, Conv3d, ConvAWS2d, ConvModule,
from .bricks import (ContextBlock, Conv2d, Conv3d, ConvAWS2d, ConvModule,
ConvTranspose2d, ConvTranspose3d, ConvWS2d,
DepthwiseSeparableConvModule, GeneralizedAttention,
HSigmoid, HSwish, Linear, MaxPool2d, MaxPool3d,
......@@ -11,31 +9,19 @@ from .bricks import (ACTIVATION_LAYERS, CONV_LAYERS, NORM_LAYERS,
build_activation_layer, build_conv_layer,
build_norm_layer, build_padding_layer, build_plugin_layer,
build_upsample_layer, conv_ws_2d, is_norm)
from .builder import MODELS, build_model_from_cfg
# yapf: enable
from .resnet import ResNet, make_res_layer
from .utils import (INITIALIZERS, Caffe2XavierInit, ConstantInit, KaimingInit,
NormalInit, PretrainedInit, TruncNormalInit, UniformInit,
XavierInit, bias_init_with_prob, caffe2_xavier_init,
constant_init, fuse_conv_bn, get_model_complexity_info,
initialize, kaiming_init, normal_init, trunc_normal_init,
uniform_init, xavier_init)
from .utils import fuse_conv_bn, get_model_complexity_info
from .vgg import VGG, make_vgg_layer
__all__ = [
'AlexNet', 'VGG', 'make_vgg_layer', 'ResNet', 'make_res_layer',
'constant_init', 'xavier_init', 'normal_init', 'trunc_normal_init',
'uniform_init', 'kaiming_init', 'caffe2_xavier_init',
'bias_init_with_prob', 'ConvModule', 'build_activation_layer',
'build_conv_layer', 'build_norm_layer', 'build_padding_layer',
'build_upsample_layer', 'build_plugin_layer', 'is_norm', 'NonLocal1d',
'NonLocal2d', 'NonLocal3d', 'ContextBlock', 'HSigmoid', 'Swish', 'HSwish',
'GeneralizedAttention', 'ACTIVATION_LAYERS', 'CONV_LAYERS', 'NORM_LAYERS',
'PADDING_LAYERS', 'UPSAMPLE_LAYERS', 'PLUGIN_LAYERS', 'Scale',
'get_model_complexity_info', 'conv_ws_2d', 'ConvAWS2d', 'ConvWS2d',
'fuse_conv_bn', 'DepthwiseSeparableConvModule', 'Linear', 'Conv2d',
'ConvTranspose2d', 'MaxPool2d', 'ConvTranspose3d', 'MaxPool3d', 'Conv3d',
'initialize', 'INITIALIZERS', 'ConstantInit', 'XavierInit', 'NormalInit',
'TruncNormalInit', 'UniformInit', 'KaimingInit', 'PretrainedInit',
'Caffe2XavierInit', 'MODELS', 'build_model_from_cfg'
'ConvModule', 'build_activation_layer', 'build_conv_layer',
'build_norm_layer', 'build_padding_layer', 'build_upsample_layer',
'build_plugin_layer', 'is_norm', 'NonLocal1d', 'NonLocal2d', 'NonLocal3d',
'ContextBlock', 'HSigmoid', 'Swish', 'HSwish', 'GeneralizedAttention',
'Scale', 'conv_ws_2d', 'ConvAWS2d', 'ConvWS2d',
'DepthwiseSeparableConvModule', 'Linear', 'Conv2d', 'ConvTranspose2d',
'MaxPool2d', 'ConvTranspose3d', 'MaxPool3d', 'Conv3d', 'fuse_conv_bn',
'get_model_complexity_info'
]
......@@ -14,8 +14,6 @@ from .non_local import NonLocal1d, NonLocal2d, NonLocal3d
from .norm import build_norm_layer, is_norm
from .padding import build_padding_layer
from .plugin import build_plugin_layer
from .registry import (ACTIVATION_LAYERS, CONV_LAYERS, NORM_LAYERS,
PADDING_LAYERS, PLUGIN_LAYERS, UPSAMPLE_LAYERS)
from .scale import Scale
from .swish import Swish
from .upsample import build_upsample_layer
......@@ -27,9 +25,8 @@ __all__ = [
'build_norm_layer', 'build_padding_layer', 'build_upsample_layer',
'build_plugin_layer', 'is_norm', 'HSigmoid', 'HSwish', 'NonLocal1d',
'NonLocal2d', 'NonLocal3d', 'ContextBlock', 'GeneralizedAttention',
'ACTIVATION_LAYERS', 'CONV_LAYERS', 'NORM_LAYERS', 'PADDING_LAYERS',
'UPSAMPLE_LAYERS', 'PLUGIN_LAYERS', 'Scale', 'ConvAWS2d', 'ConvWS2d',
'conv_ws_2d', 'DepthwiseSeparableConvModule', 'Swish', 'Linear',
'Conv2dAdaptivePadding', 'Conv2d', 'ConvTranspose2d', 'MaxPool2d',
'ConvTranspose3d', 'MaxPool3d', 'Conv3d', 'Dropout', 'DropPath'
'Scale', 'ConvAWS2d', 'ConvWS2d', 'conv_ws_2d',
'DepthwiseSeparableConvModule', 'Swish', 'Linear', 'Conv2dAdaptivePadding',
'Conv2d', 'ConvTranspose2d', 'MaxPool2d', 'ConvTranspose3d', 'MaxPool3d',
'Conv3d', 'Dropout', 'DropPath'
]
......@@ -4,19 +4,18 @@ from typing import Dict
import torch
import torch.nn as nn
import torch.nn.functional as F
from mmcv.utils import TORCH_VERSION, build_from_cfg, digit_version
from .registry import ACTIVATION_LAYERS
from mmengine.registry import MODELS
from mmengine.utils import TORCH_VERSION, digit_version
for module in [
nn.ReLU, nn.LeakyReLU, nn.PReLU, nn.RReLU, nn.ReLU6, nn.ELU,
nn.Sigmoid, nn.Tanh
]:
ACTIVATION_LAYERS.register_module(module=module)
MODELS.register_module(module=module)
@ACTIVATION_LAYERS.register_module(name='Clip')
@ACTIVATION_LAYERS.register_module()
@MODELS.register_module(name='Clip')
@MODELS.register_module()
class Clamp(nn.Module):
"""Clamp activation layer.
......@@ -75,9 +74,9 @@ class GELU(nn.Module):
if (TORCH_VERSION == 'parrots'
or digit_version(TORCH_VERSION) < digit_version('1.4')):
ACTIVATION_LAYERS.register_module(module=GELU)
MODELS.register_module(module=GELU)
else:
ACTIVATION_LAYERS.register_module(module=nn.GELU)
MODELS.register_module(module=nn.GELU)
def build_activation_layer(cfg: Dict) -> nn.Module:
......@@ -92,4 +91,4 @@ def build_activation_layer(cfg: Dict) -> nn.Module:
Returns:
nn.Module: Created activation layer.
"""
return build_from_cfg(cfg, ACTIVATION_LAYERS)
return MODELS.build(cfg)
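A minimal usage sketch of the refactored builder (assuming `mmcv.cnn` still exports `build_activation_layer`):
```python
from mmcv.cnn import build_activation_layer

# 'ReLU' is looked up in mmengine's MODELS registry, where the torch
# activation layers are registered above
relu = build_activation_layer(dict(type='ReLU', inplace=True))
```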
......@@ -2,11 +2,10 @@
from typing import Union
import torch
from mmengine.model.utils import constant_init, kaiming_init
from mmengine.registry import MODELS
from torch import nn
from ..utils import constant_init, kaiming_init
from .registry import PLUGIN_LAYERS
def last_zero_init(m: Union[nn.Module, nn.Sequential]) -> None:
if isinstance(m, nn.Sequential):
......@@ -15,7 +14,7 @@ def last_zero_init(m: Union[nn.Module, nn.Sequential]) -> None:
constant_init(m, val=0)
@PLUGIN_LAYERS.register_module()
@MODELS.register_module()
class ContextBlock(nn.Module):
"""ContextBlock module in GCNet.
......
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Dict, Optional
from mmengine.registry import MODELS
from torch import nn
from .registry import CONV_LAYERS
CONV_LAYERS.register_module('Conv1d', module=nn.Conv1d)
CONV_LAYERS.register_module('Conv2d', module=nn.Conv2d)
CONV_LAYERS.register_module('Conv3d', module=nn.Conv3d)
CONV_LAYERS.register_module('Conv', module=nn.Conv2d)
MODELS.register_module('Conv1d', module=nn.Conv1d)
MODELS.register_module('Conv2d', module=nn.Conv2d)
MODELS.register_module('Conv3d', module=nn.Conv3d)
MODELS.register_module('Conv', module=nn.Conv2d)
def build_conv_layer(cfg: Optional[Dict], *args, **kwargs) -> nn.Module:
......@@ -36,11 +35,15 @@ def build_conv_layer(cfg: Optional[Dict], *args, **kwargs) -> nn.Module:
cfg_ = cfg.copy()
layer_type = cfg_.pop('type')
if layer_type not in CONV_LAYERS:
raise KeyError(f'Unrecognized layer type {layer_type}')
else:
conv_layer = CONV_LAYERS.get(layer_type)
# Switch registry to the target scope. If `conv_layer` cannot be found
# in the registry, fallback to search `conv_layer` in the
# mmengine.MODELS.
with MODELS.switch_scope_and_registry(None) as registry:
conv_layer = registry.get(layer_type)
if conv_layer is None:
raise KeyError(f'Cannot find {layer_type} in registry under scope '
f'name {registry.scope}')
layer = conv_layer(*args, **kwargs, **cfg_)
return layer
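A minimal usage sketch of the refactored `build_conv_layer` (the old config style keeps working, since `Conv1d/2d/3d` are now registered in mmengine's `MODELS`):
```python
from mmcv.cnn import build_conv_layer

# 'Conv2d' is resolved in the current scope first, with a fallback
# to mmengine's MODELS registry
conv = build_conv_layer(
    dict(type='Conv2d'), in_channels=3, out_channels=8, kernel_size=3)
```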
......@@ -3,13 +3,12 @@ import math
from typing import Tuple, Union
import torch
from mmengine.registry import MODELS
from torch import nn
from torch.nn import functional as F
from .registry import CONV_LAYERS
@CONV_LAYERS.register_module()
@MODELS.register_module()
class Conv2dAdaptivePadding(nn.Conv2d):
"""Implementation of 2D convolution in tensorflow with `padding` as "same",
which applies padding to input (if needed) so that input image gets fully
......
......@@ -4,17 +4,17 @@ from typing import Dict, Optional, Tuple, Union
import torch
import torch.nn as nn
from mmengine.model.utils import constant_init, kaiming_init
from mmengine.registry import MODELS
from mmcv.utils import _BatchNorm, _InstanceNorm
from ..utils import constant_init, kaiming_init
from .activation import build_activation_layer
from .conv import build_conv_layer
from .norm import build_norm_layer
from .padding import build_padding_layer
from .registry import PLUGIN_LAYERS
@PLUGIN_LAYERS.register_module()
@MODELS.register_module()
class ConvModule(nn.Module):
"""A conv block that bundles conv/norm/activation layers.
......
......@@ -5,8 +5,7 @@ from typing import Dict, List, Optional, Tuple, Union
import torch
import torch.nn as nn
import torch.nn.functional as F
from .registry import CONV_LAYERS
from mmengine.registry import MODELS
def conv_ws_2d(input: torch.Tensor,
......@@ -25,7 +24,7 @@ def conv_ws_2d(input: torch.Tensor,
return F.conv2d(input, weight, bias, stride, padding, dilation, groups)
@CONV_LAYERS.register_module('ConvWS')
@MODELS.register_module('ConvWS')
class ConvWS2d(nn.Conv2d):
def __init__(self,
......@@ -54,7 +53,7 @@ class ConvWS2d(nn.Conv2d):
self.dilation, self.groups, self.eps)
@CONV_LAYERS.register_module(name='ConvAWS')
@MODELS.register_module(name='ConvAWS')
class ConvAWS2d(nn.Conv2d):
"""AWS (Adaptive Weight Standardization)
......
......@@ -3,9 +3,7 @@ from typing import Any, Dict, Optional
import torch
import torch.nn as nn
from mmcv import build_from_cfg
from .registry import DROPOUT_LAYERS
from mmengine.registry import MODELS
def drop_path(x: torch.Tensor,
......@@ -28,7 +26,7 @@ def drop_path(x: torch.Tensor,
return output
@DROPOUT_LAYERS.register_module()
@MODELS.register_module()
class DropPath(nn.Module):
"""Drop paths (Stochastic Depth) per sample (when applied in main path of
residual blocks).
......@@ -48,7 +46,7 @@ class DropPath(nn.Module):
return drop_path(x, self.drop_prob, self.training)
@DROPOUT_LAYERS.register_module()
@MODELS.register_module()
class Dropout(nn.Dropout):
"""A wrapper for ``torch.nn.Dropout``, We rename the ``p`` of
``torch.nn.Dropout`` to ``drop_prob`` so as to be consistent with
......@@ -66,4 +64,4 @@ class Dropout(nn.Dropout):
def build_dropout(cfg: Dict, default_args: Optional[Dict] = None) -> Any:
"""Builder for drop out layers."""
return build_from_cfg(cfg, DROPOUT_LAYERS, default_args)
return MODELS.build(cfg, default_args=default_args)
......@@ -5,12 +5,11 @@ import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from mmengine.model.utils import kaiming_init
from mmengine.registry import MODELS
from ..utils import kaiming_init
from .registry import PLUGIN_LAYERS
@PLUGIN_LAYERS.register_module()
@MODELS.register_module()
class GeneralizedAttention(nn.Module):
"""GeneralizedAttention module.
......
......@@ -3,11 +3,10 @@ import warnings
import torch
import torch.nn as nn
from mmengine.registry import MODELS
from .registry import ACTIVATION_LAYERS
@ACTIVATION_LAYERS.register_module()
@MODELS.register_module()
class HSigmoid(nn.Module):
"""Hard Sigmoid Module. Apply the hard sigmoid function:
Hsigmoid(x) = min(max((x + bias) / divisor, min_value), max_value)
......
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn
from mmengine.registry import MODELS
from mmcv.utils import TORCH_VERSION, digit_version
from .registry import ACTIVATION_LAYERS
class HSwish(nn.Module):
......@@ -34,6 +34,6 @@ if (TORCH_VERSION == 'parrots'
or digit_version(TORCH_VERSION) < digit_version('1.7')):
# Hardswish is not supported when PyTorch version < 1.6.
# And Hardswish in PyTorch 1.6 does not support inplace.
ACTIVATION_LAYERS.register_module(module=HSwish)
MODELS.register_module(module=HSwish)
else:
ACTIVATION_LAYERS.register_module(module=nn.Hardswish, name='HSwish')
MODELS.register_module(module=nn.Hardswish, name='HSwish')
......@@ -4,10 +4,10 @@ from typing import Dict, Optional
import torch
import torch.nn as nn
from mmengine.model.utils import constant_init, normal_init
from mmengine.registry import MODELS
from ..utils import constant_init, normal_init
from .conv_module import ConvModule
from .registry import PLUGIN_LAYERS
class _NonLocalNd(nn.Module, metaclass=ABCMeta):
......@@ -246,7 +246,7 @@ class NonLocal1d(_NonLocalNd):
self.phi = max_pool_layer
@PLUGIN_LAYERS.register_module()
@MODELS.register_module()
class NonLocal2d(_NonLocalNd):
"""2D Non-local module.
......
......@@ -3,22 +3,22 @@ import inspect
from typing import Dict, Tuple, Union
import torch.nn as nn
from mmengine.registry import MODELS
from mmcv.utils import is_tuple_of
from mmcv.utils.parrots_wrapper import SyncBatchNorm, _BatchNorm, _InstanceNorm
from .registry import NORM_LAYERS
NORM_LAYERS.register_module('BN', module=nn.BatchNorm2d)
NORM_LAYERS.register_module('BN1d', module=nn.BatchNorm1d)
NORM_LAYERS.register_module('BN2d', module=nn.BatchNorm2d)
NORM_LAYERS.register_module('BN3d', module=nn.BatchNorm3d)
NORM_LAYERS.register_module('SyncBN', module=SyncBatchNorm)
NORM_LAYERS.register_module('GN', module=nn.GroupNorm)
NORM_LAYERS.register_module('LN', module=nn.LayerNorm)
NORM_LAYERS.register_module('IN', module=nn.InstanceNorm2d)
NORM_LAYERS.register_module('IN1d', module=nn.InstanceNorm1d)
NORM_LAYERS.register_module('IN2d', module=nn.InstanceNorm2d)
NORM_LAYERS.register_module('IN3d', module=nn.InstanceNorm3d)
MODELS.register_module('BN', module=nn.BatchNorm2d)
MODELS.register_module('BN1d', module=nn.BatchNorm1d)
MODELS.register_module('BN2d', module=nn.BatchNorm2d)
MODELS.register_module('BN3d', module=nn.BatchNorm3d)
MODELS.register_module('SyncBN', module=SyncBatchNorm)
MODELS.register_module('GN', module=nn.GroupNorm)
MODELS.register_module('LN', module=nn.LayerNorm)
MODELS.register_module('IN', module=nn.InstanceNorm2d)
MODELS.register_module('IN1d', module=nn.InstanceNorm1d)
MODELS.register_module('IN2d', module=nn.InstanceNorm2d)
MODELS.register_module('IN3d', module=nn.InstanceNorm3d)
def infer_abbr(class_type):
......@@ -97,10 +97,15 @@ def build_norm_layer(cfg: Dict,
cfg_ = cfg.copy()
layer_type = cfg_.pop('type')
if layer_type not in NORM_LAYERS:
raise KeyError(f'Unrecognized norm type {layer_type}')
norm_layer = NORM_LAYERS.get(layer_type)
# Switch registry to the target scope. If `norm_layer` cannot be found
# in the registry, fallback to search `norm_layer` in the
# mmengine.MODELS.
with MODELS.switch_scope_and_registry(None) as registry:
norm_layer = registry.get(layer_type)
if norm_layer is None:
raise KeyError(f'Cannot find {layer_type} in registry under scope '
f'name {registry.scope}')
abbr = infer_abbr(norm_layer)
assert isinstance(postfix, (int, str))
......
......@@ -2,12 +2,11 @@
from typing import Dict
import torch.nn as nn
from mmengine.registry import MODELS
from .registry import PADDING_LAYERS
PADDING_LAYERS.register_module('zero', module=nn.ZeroPad2d)
PADDING_LAYERS.register_module('reflect', module=nn.ReflectionPad2d)
PADDING_LAYERS.register_module('replicate', module=nn.ReplicationPad2d)
MODELS.register_module('zero', module=nn.ZeroPad2d)
MODELS.register_module('reflect', module=nn.ReflectionPad2d)
MODELS.register_module('replicate', module=nn.ReplicationPad2d)
def build_padding_layer(cfg: Dict, *args, **kwargs) -> nn.Module:
......@@ -28,11 +27,15 @@ def build_padding_layer(cfg: Dict, *args, **kwargs) -> nn.Module:
cfg_ = cfg.copy()
padding_type = cfg_.pop('type')
if padding_type not in PADDING_LAYERS:
raise KeyError(f'Unrecognized padding type {padding_type}.')
else:
padding_layer = PADDING_LAYERS.get(padding_type)
# Switch registry to the target scope. If `padding_layer` cannot be found
# in the registry, fallback to search `padding_layer` in the
# mmengine.MODELS.
with MODELS.switch_scope_and_registry(None) as registry:
padding_layer = registry.get(padding_type)
if padding_layer is None:
raise KeyError(f'Cannot find {padding_type} in registry under scope '
f'name {registry.scope}')
layer = padding_layer(*args, **kwargs, **cfg_)
return layer
......@@ -4,8 +4,7 @@ import platform
from typing import Dict, Tuple, Union
import torch.nn as nn
from .registry import PLUGIN_LAYERS
from mmengine.registry import MODELS
if platform.system() == 'Windows':
import regex as re # type: ignore
......@@ -80,10 +79,15 @@ def build_plugin_layer(cfg: Dict,
cfg_ = cfg.copy()
layer_type = cfg_.pop('type')
if layer_type not in PLUGIN_LAYERS:
raise KeyError(f'Unrecognized plugin type {layer_type}')
plugin_layer = PLUGIN_LAYERS.get(layer_type)
# Switch registry to the target scope. If `plugin_layer` cannot be found
# in the registry, fallback to search `plugin_layer` in the
# mmengine.MODELS.
with MODELS.switch_scope_and_registry(None) as registry:
plugin_layer = registry.get(layer_type)
if plugin_layer is None:
raise KeyError(f'Cannot find {layer_type} in registry under scope '
f'name {registry.scope}')
abbr = infer_abbr(plugin_layer)
assert isinstance(postfix, (int, str))
......