"vscode:/vscode.git/clone" did not exist on "08edcfbb0baa05103c1a451a2c16c52c11bd6529"
Unverified Commit 0efabe96 authored by J-shang's avatar J-shang Committed by GitHub
Browse files

[Model Compression] v2 doc (#4246)

parent 70fcdda6
Pruning V2
==========
Pruning V2 is a refactoring of the old version and provides more powerful features.
Compared with the old version, the iterative pruning process is decoupled from the pruner: the pruner is only responsible for pruning, i.e., generating the masks, once.
What's more, pruning V2 unifies the pruning process and allows a freer combination of pruning components.
The task generator only cares about the pruning effect that should be achieved in each round, and uses a config list to express how to prune in the next step.
The pruner is reset with the model and config list given by the task generator, then generates the masks for the current step.
For a clearer view of the structure, please refer to the figure below.
.. image:: ../../img/pruning_process.png
:target: ../../img/pruning_process.png
:alt:
In V2, a pruning process is usually driven by a pruning scheduler, which contains a specific pruner and a task generator.
But users can also use a pruner directly, as in pruning V1.
For details, please refer to the following tutorials:
.. toctree::
:maxdepth: 2
Pruning Algorithms <v2_pruning_algo>
Pruning Scheduler <v2_scheduler>
Supported Pruning Algorithms in NNI
===================================
NNI provides several pruning algorithms reproduced from the papers. In pruning v2, NNI splits each pruning algorithm into more fine-grained components.
This means users can freely combine components from different algorithms,
or easily replace a step in the original algorithm with a component of their own implementation to build their own pruning algorithm.
Right now, algorithms that generate masks in a single step are implemented as pruners,
and algorithms that schedule sparsity across iterations are implemented as iterative pruners.
**Pruner**
* `Level Pruner <#level-pruner>`__
* `L1 Norm Pruner <#l1-norm-pruner>`__
* `L2 Norm Pruner <#l2-norm-pruner>`__
* `FPGM Pruner <#fpgm-pruner>`__
* `Slim Pruner <#slim-pruner>`__
* `Activation APoZ Rank Pruner <#activation-apoz-rank-pruner>`__
* `Activation Mean Rank Pruner <#activation-mean-rank-pruner>`__
* `Taylor FO Weight Pruner <#taylor-fo-weight-pruner>`__
* `ADMM Pruner <#admm-pruner>`__
**Iterative Pruner**
* `Linear Pruner <#linear-pruner>`__
* `AGP Pruner <#agp-pruner>`__
* `Lottery Ticket Pruner <#lottery-ticket-pruner>`__
* `Simulated Annealing Pruner <#simulated-annealing-pruner>`__
Level Pruner
------------
This is a basic pruner; some papers call it magnitude pruning or fine-grained pruning.
It masks the weights with smaller absolute values in each specified layer, by the ratio configured in the config list.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
pruner = LevelPruner(model, config_list)
masked_model, masks = pruner.compress()
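The returned ``masks`` is a dict keyed by layer name. Below is a minimal sketch for checking the achieved sparsity, assuming the v2 mask layout ``masks[layer_name]['weight']`` with 1 for kept weights and 0 for pruned ones:

.. code-block:: python

    for layer_name, layer_masks in masks.items():
        weight_mask = layer_masks['weight']  # same shape as the layer weight
        sparsity = 1.0 - weight_mask.sum().item() / weight_mask.numel()
        print(f'{layer_name}: achieved sparsity {sparsity:.2f}')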
User configuration for Level Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.LevelPruner
L1 Norm Pruner
--------------
L1 norm pruner computes the l1 norm of the layer weight on the first dimension,
then prunes the weight blocks on this dimension with smaller l1 norm values;
i.e., it computes the l1 norm of the filters in convolution layers as the metric values,
and the l1 norm of the weight rows in linear layers as the metric values.
For more details, please refer to `PRUNING FILTERS FOR EFFICIENT CONVNETS <https://arxiv.org/abs/1608.08710>`__\.
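As an illustration of the metric itself (not NNI's internal code), the per-filter l1 norm of a convolution weight can be computed as follows:

.. code-block:: python

    import torch

    # A Conv2d weight has shape (out_channels, in_channels, kernel_h, kernel_w);
    # the metric is one l1 norm per output filter (the first dimension).
    weight = torch.randn(8, 3, 3, 3)
    l1_norms = weight.abs().sum(dim=(1, 2, 3))
    prune_first = l1_norms.argsort()  # filters with the smallest norms are masked first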
In addition, L1 norm pruner also supports dependency-aware mode.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import L1NormPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L1NormPruner(model, config_list)
masked_model, masks = pruner.compress()
User configuration for L1 Norm Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.L1NormPruner
L2 Norm Pruner
--------------
L2 norm pruner is a variant of L1 norm pruner. It uses the l2 norm as the metric to determine which weight blocks should be pruned.
L2 norm pruner also supports dependency-aware mode.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import L2NormPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L2NormPruner(model, config_list)
masked_model, masks = pruner.compress()
User configuration for L2 Norm Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.L2NormPruner
FPGM Pruner
-----------
FPGM pruner prunes the weight blocks on the first dimension with the smallest geometric median.
FPGM chooses the weight blocks whose contribution is most replaceable by the remaining blocks.
For more details, please refer to `Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration <https://arxiv.org/abs/1811.00250>`__.
FPGM pruner also supports dependency-aware mode.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import FPGMPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = FPGMPruner(model, config_list)
masked_model, masks = pruner.compress()
User configuration for FPGM Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.FPGMPruner
Slim Pruner
-----------
Slim pruner adds sparsity regularization on the scaling factors of batch normalization (BN) layers during training to identify unimportant channels.
The channels with small scaling factor values will be pruned.
For more details, please refer to `Learning Efficient Convolutional Networks through Network Slimming <https://arxiv.org/abs/1708.06519>`__\.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import SlimPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['BatchNorm2d'] }]
pruner = SlimPruner(model, config_list, trainer, optimizer, criterion, training_epochs=1)
masked_model, masks = pruner.compress()
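Here ``trainer``, ``optimizer`` and ``criterion`` are supplied by the user. A minimal sketch of a compatible trainer, adapted from the docstring example in the NNI source (``train_loader`` is assumed to be defined elsewhere):

.. code-block:: python

    import torch

    def trainer(model, optimizer, criterion):
        # Train for one epoch; the pruner calls this function `training_epochs` times.
        model.train()
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        for data, target in train_loader:
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            loss = criterion(model(data), target)
            loss.backward()
            optimizer.step()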
User configuration for Slim Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.SlimPruner
Activation APoZ Rank Pruner
---------------------------
Activation APoZ rank pruner prunes the blocks on the first weight dimension with the smallest importance,
where the importance criterion ``APoZ`` is calculated from the output activations of convolution layers, to achieve a preset level of network sparsity.
The pruning criterion ``APoZ`` is explained in the paper `Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures <https://arxiv.org/abs/1607.03250>`__.
The APoZ of the :math:`c`-th channel in the :math:`i`-th layer is defined as:
:math:`APoZ_{c}^{(i)} = APoZ\left(O_{c}^{(i)}\right)=\frac{\sum_{k}^{N} \sum_{j}^{M} f\left(O_{c, j}^{(i)}(k)=0\right)}{N \times M}`
where :math:`f(\cdot)` is 1 when the condition inside holds and 0 otherwise, :math:`N` is the number of validation examples, and :math:`M` is the dimension of the output feature map :math:`O_{c}^{(i)}`.
Activation APoZ rank pruner also supports dependency-aware mode.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import ActivationAPoZRankPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = ActivationAPoZRankPruner(model, config_list, trainer, optimizer, criterion, training_batches=20)
masked_model, masks = pruner.compress()
User configuration for Activation APoZ Rank Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.ActivationAPoZRankPruner
Activation Mean Rank Pruner
---------------------------
Activation mean rank pruner prunes the blocks on the first weight dimension with the smallest importance,
where the importance criterion ``mean activation`` is calculated from the output activations of convolution layers, to achieve a preset level of network sparsity.
The pruning criterion ``mean activation`` is explained in section 2.2 of the paper `Pruning Convolutional Neural Networks for Resource Efficient Inference <https://arxiv.org/abs/1611.06440>`__.
Activation mean rank pruner also supports dependency-aware mode.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import ActivationMeanRankPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = ActivationMeanRankPruner(model, config_list, trainer, optimizer, criterion, training_batches=20)
masked_model, masks = pruner.compress()
User configuration for Activation Mean Rank Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.ActivationMeanRankPruner
Taylor FO Weight Pruner
-----------------------
Taylor FO weight pruner prunes the blocks on the first weight dimension,
based on importance estimated from the first-order Taylor expansion of the loss with respect to the weights, to achieve a preset level of network sparsity.
The estimated importance is defined in the paper `Importance Estimation for Neural Network Pruning <http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf>`__:
:math:`\widehat{\mathcal{I}}_{\mathcal{S}}^{(1)}(\mathbf{W}) \triangleq \sum_{s \in \mathcal{S}} \mathcal{I}_{s}^{(1)}(\mathbf{W})=\sum_{s \in \mathcal{S}}\left(g_{s} w_{s}\right)^{2}`
where :math:`g_s` and :math:`w_s` are the gradient and the value of parameter :math:`s`, and :math:`\mathcal{S}` is a structural set of parameters (e.g., the parameters of one convolution filter).
Taylor FO weight pruner also supports dependency-aware mode.
What's more, we provide a global-sort mode for this pruner, which is aligned with the paper's implementation.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import TaylorFOWeightPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = TaylorFOWeightPruner(model, config_list, trainer, optimizer, criterion, training_batches=20)
masked_model, masks = pruner.compress()
User configuration for Taylor FO Weight Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.TaylorFOWeightPruner
ADMM Pruner
-----------
Alternating Direction Method of Multipliers (ADMM) is a mathematical optimization technique
that decomposes the original nonconvex problem into two subproblems which can be solved iteratively.
In the weight pruning problem, these two subproblems are solved via 1) gradient descent and 2) Euclidean projection, respectively.
During the process of solving these two subproblems, the weights of the original model will be changed.
Then a fine-grained pruning is applied to prune the model according to the given config list.
This solution framework applies to both non-structured and different variants of structured pruning schemes.
For more details, please refer to `A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers <https://arxiv.org/abs/1804.03294>`__.
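Concretely, writing the training loss as :math:`f(W)` and the set of weights satisfying the sparsity constraint as :math:`S`, one ADMM iteration performs the following updates (a summary of the standard formulation in the paper, where :math:`U` is the scaled dual variable and :math:`\rho` the penalty parameter):

.. math::

    W^{k+1} = \underset{W}{\arg\min} \; f(W) + \frac{\rho}{2}\left\|W - Z^{k} + U^{k}\right\|_{2}^{2} \quad \text{(solved by gradient descent)}

    Z^{k+1} = \underset{Z \in S}{\arg\min} \; \left\|W^{k+1} - Z + U^{k}\right\|_{2}^{2} \quad \text{(Euclidean projection onto } S\text{)}

    U^{k+1} = U^{k} + W^{k+1} - Z^{k+1}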
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import ADMMPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = ADMMPruner(model, config_list, trainer, optimizer, criterion, iterations=10, training_epochs=1)
masked_model, masks = pruner.compress()
User configuration for ADMM Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.ADMMPruner
Linear Pruner
-------------
Linear pruner is an iterative pruner; it increases sparsity evenly from zero across iterations.
For example, if the final sparsity is set to 0.5 and the iteration number is 5, the sparsity used in each iteration is ``[0, 0.1, 0.2, 0.3, 0.4, 0.5]``.
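A one-line sketch of this schedule (illustration only, not NNI's internal implementation):

.. code-block:: python

    final_sparsity, total_iteration = 0.5, 5
    schedule = [final_sparsity * i / total_iteration for i in range(total_iteration + 1)]
    # -> [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]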
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import LinearPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = LinearPruner(model, config_list, pruning_algorithm='l1', total_iteration=10, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
User configuration for Linear Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.LinearPruner
AGP Pruner
----------
This is an iterative pruner, in which the sparsity is increased from an initial sparsity value :math:`s_{i}` (usually 0) to a final sparsity value :math:`s_{f}` over a span of :math:`n` pruning iterations,
starting at training step :math:`t_{0}` and with pruning frequency :math:`\Delta t`:
:math:`s_{t}=s_{f}+\left(s_{i}-s_{f}\right)\left(1-\frac{t-t_{0}}{n \Delta t}\right)^{3} \text { for } t \in\left\{t_{0}, t_{0}+\Delta t, \ldots, t_{0} + n \Delta t\right\}`
For more details, please refer to `To prune, or not to prune: exploring the efficacy of pruning for model compression <https://arxiv.org/abs/1710.01878>`__\.
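The cubic schedule above can be sketched directly in Python (illustration only, not NNI's internal implementation):

.. code-block:: python

    def agp_sparsity(t, s_i=0.0, s_f=0.8, t_0=0, n=10, delta_t=1):
        """Target sparsity at training step t under the cubic AGP schedule."""
        assert t_0 <= t <= t_0 + n * delta_t
        return s_f + (s_i - s_f) * (1 - (t - t_0) / (n * delta_t)) ** 3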
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import AGPPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = AGPPruner(model, config_list, pruning_algorithm='l1', total_iteration=10, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
User configuration for AGP Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.AGPPruner
Lottery Ticket Pruner
---------------------
`The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks <https://arxiv.org/abs/1803.03635>`__\ ,
by Jonathan Frankle and Michael Carbin, provides comprehensive measurement and analysis,
and articulates the *lottery ticket hypothesis*\ : dense, randomly-initialized, feed-forward networks contain subnetworks (*winning tickets*\ ) that
-- when trained in isolation -- reach test accuracy comparable to the original network in a similar number of iterations.
In this paper, the authors use the following process to prune a model, called *iterative pruning*\ :
#. Randomly initialize a neural network :math:`f(x;\theta_0)` (where :math:`\theta_0 \sim \mathcal{D}_{\theta}`).
#. Train the network for :math:`j` iterations, arriving at parameters :math:`\theta_j`.
#. Prune :math:`p\%` of the parameters in :math:`\theta_j`, creating a mask :math:`m`.
#. Reset the remaining parameters to their values in :math:`\theta_0`, creating the winning ticket :math:`f(x;m \odot \theta_0)`.
#. Repeat steps 2, 3, and 4.
If the configured final sparsity is :math:`P` (e.g., 0.8) and there are :math:`n` pruning iterations,
each iteration prunes :math:`1-(1-P)^{1/n}` of the weights that survived the previous round.
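For instance, with a final sparsity of 0.8 and 10 iterations:

.. code-block:: python

    P, n = 0.8, 10
    per_round = 1 - (1 - P) ** (1 / n)  # ~0.149: each round prunes ~14.9% of the surviving weights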
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import LotteryTicketPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = LotteryTicketPruner(model, config_list, pruning_algorithm='l1', total_iteration=10, finetuner=finetuner, reset_weight=True)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
User configuration for Lottery Ticket Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.LotteryTicketPruner
Simulated Annealing Pruner
--------------------------
We implement a guided heuristic search method, the Simulated Annealing (SA) algorithm. As mentioned in the paper, this method enhances guided search with prior experience.
The enhanced SA technique is based on the observation that a DNN layer with a larger number of weights often tolerates a higher degree of compression with less impact on overall accuracy.
* Randomly initialize a pruning rate distribution (sparsities).
* While current_temperature > stop_temperature:

  #. Generate a perturbation to the current distribution.
  #. Perform a fast evaluation on the perturbed distribution.
  #. Accept the perturbation according to the performance and the acceptance probability; if not accepted, return to step 1.
  #. Cool down: current_temperature <- current_temperature * cool_down_rate.
For more details, please refer to `AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates <https://arxiv.org/abs/1907.03141>`__.
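The acceptance step typically follows the Metropolis rule; a schematic sketch (the exact rule used in NNI's implementation may differ):

.. code-block:: python

    import math
    import random

    def accept(delta_score: float, temperature: float) -> bool:
        """Always accept improvements; accept regressions with probability exp(delta / T)."""
        if delta_score > 0:
            return True
        return random.random() < math.exp(delta_score / temperature)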
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import SimulatedAnnealingPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = SimulatedAnnealingPruner(model, config_list, pruning_algorithm='l1', cool_down_rate=0.9, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
User configuration for Simulated Annealing Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.SimulatedAnnealingPruner
Pruning Scheduler
=================
Pruning scheduler is a new feature supported in pruning v2. It brings more flexibility for pruning the model iteratively.
All the built-in iterative pruners (e.g., AGPPruner, SimulatedAnnealingPruner) are based on three abstracted components: pruning scheduler, pruners and task generators.
In addition to using the NNI built-in iterative pruners,
users can directly use the pruning schedulers to customize their own iterative pruning logic.
Workflow of Pruning Scheduler
-----------------------------
In iterative pruning, the final goal is broken down into small goals, and one small goal is completed in each iteration.
For example, increase the sparsity ratio a little in each iteration so that, after several pruning iterations, the continuously pruned model reaches the final overall sparsity;
or fix the overall sparsity, try a different way to allocate sparsity between layers in each iteration, and find the best allocation.
We define a small goal as a ``Task``; it usually includes state inherited from previous iterations (e.g., the pruned model and masks) and a description of the current goal (e.g., a config list that describes how to allocate sparsity).
Details about ``Task`` can be found in this :githublink:`file <nni/algorithms/compression/v2/pytorch/base/scheduler.py>`.
The pruning scheduler handles two main components: a basic pruner and a task generator. The logic of generating a ``Task`` is encapsulated in the task generator.
In an iteration (one pruning step), the pruning scheduler parses the ``Task`` obtained from the task generator,
and resets the pruner with the ``model``, ``masks`` and ``config_list`` parsed from the ``Task``.
Then the pruning scheduler generates the new masks with the pruner. During an iteration, the newly masked model may also undergo speed-up, finetuning and evaluation.
After one iteration is done, the pruning scheduler collects the compact model, new masks and evaluation score, packages them into a ``TaskResult``, and passes it to the task generator.
The iteration process ends when the task generator has no more ``Task``.
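Schematically, the scheduler loop described above looks like the sketch below; ``next_task`` and ``receive_result`` are hypothetical names standing in for the real ``Task``/``TaskResult`` plumbing:

.. code-block:: python

    def schedule(pruner, task_generator, finetuner=None, evaluator=None):
        # Hypothetical driver loop mirroring the workflow described above.
        task = task_generator.next_task()
        while task is not None:
            pruner.reset(task.model, task.config_list)  # reset the pruner from the Task
            compact_model, masks = pruner.compress()    # generate masks for this step
            if finetuner is not None:
                finetuner(compact_model)                # optional finetuning
            score = evaluator(compact_model) if evaluator is not None else None
            task_generator.receive_result(task, compact_model, masks, score)
            task = task_generator.next_task()           # ends when no Task remains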
How to Customize Iterative Pruning
----------------------------------
We use AGP pruning as an example to explain how to implement iterative pruning with a scheduler in NNI.
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import L1NormPruner, PruningScheduler
from nni.algorithms.compression.v2.pytorch.pruning.tools import AGPTaskGenerator
pruner = L1NormPruner(model=None, config_list=None, mode='dependency_aware', dummy_input=torch.rand(10, 3, 224, 224).to(device))
task_generator = AGPTaskGenerator(total_iteration=10, origin_model=model, origin_config_list=config_list, log_dir='.', keep_intermediate_result=True)
scheduler = PruningScheduler(pruner, task_generator, finetuner=finetuner, speed_up=True, dummy_input=dummy_input, evaluator=None, reset_weight=False)
scheduler.compress()
_, model, masks, _, _ = scheduler.get_best_result()
The full script can be found :githublink:`here <examples/model_compress/pruning/v2/scheduler_torch.py>`.
In this example, we use an L1 Norm Pruner in ``dependency_aware`` mode as the basic pruner in each iteration.
Note that we do not need to pass ``model`` and ``config_list`` to the pruner, because in each iteration the ``model`` and ``config_list`` used by the pruner are received from the task generator.
Then we can use ``scheduler`` directly as an iterative pruner. In fact, this is how ``AGPPruner`` is implemented in NNI.
More about Task Generator
-------------------------
The task generator supplies the model that needs to be pruned in each iteration and the corresponding config_list.
For example, ``AGPTaskGenerator`` hands out the model pruned in the previous iteration and computes the sparsity to use in the current iteration.
The ``TaskGenerator`` puts all this pruning information into a ``Task``, and the pruning scheduler retrieves the ``Task`` and runs it.
The pruning result is returned to the ``TaskGenerator`` at the end of each iteration, and the ``TaskGenerator`` decides whether and how to generate the next ``Task``.
The information included in the ``Task`` and ``TaskResult`` can be found :githublink:`here <nni/algorithms/compression/v2/pytorch/base/scheduler.py>`.
A clearer iterative pruning flow chart can be found `here <v2_pruning.rst>`__.
If you want to implement your own task generator, please follow the ``TaskGenerator`` :githublink:`interface <nni/algorithms/compression/v2/pytorch/pruning/tools/base.py>`.
Two main functions should be implemented: ``init_pending_tasks(self) -> List[Task]`` and ``generate_tasks(self, task_result: TaskResult) -> List[Task]``, as sketched below.
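A skeleton of such a subclass (a sketch only; the import paths follow the file locations linked above, and the method bodies are left to the implementer):

.. code-block:: python

    from typing import List

    from nni.algorithms.compression.v2.pytorch.base.scheduler import Task, TaskResult
    from nni.algorithms.compression.v2.pytorch.pruning.tools.base import TaskGenerator

    class MyTaskGenerator(TaskGenerator):
        def init_pending_tasks(self) -> List[Task]:
            # Build and return the first Task(s), e.g. from the origin model and config list.
            raise NotImplementedError

        def generate_tasks(self, task_result: TaskResult) -> List[Task]:
            # Inspect the finished Task's result and decide the next Task(s);
            # returning an empty list ends the iterative pruning process.
            raise NotImplementedError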
Why Use Pruning Scheduler
-------------------------
One of the benefits of using a scheduler for iterative pruning is that users can reach more of NNI's pruning functionality.
For the sake of interface simplicity and faithfulness to the original papers, NNI does not fully expose all the low-level interfaces through the built-in iterative pruners.
For example, resetting the weights to their original values in each iteration is a key point of the lottery ticket pruning algorithm, and this is implemented in ``LotteryTicketPruner``.
To reduce the complexity of the interface, we only support this function in ``LotteryTicketPruner``, not in other pruners.
If users want to reset weights during each iteration of AGP pruning, ``AGPPruner`` cannot do this, but users can easily set ``reset_weight=True`` in ``PruningScheduler`` to achieve it.
What's more, for a customized pruner or task generator, using the scheduler makes it easy to enhance the algorithm.
In addition, users can also customize the scheduling process to implement their own scheduler.
@@ -26,6 +26,7 @@ For details, please refer to the following tutorials:
Overview <Compression/Overview>
Quick Start <Compression/QuickStart>
Pruning <Compression/pruning>
Pruning V2 <Compression/v2_pruning>
Quantization <Compression/quantization>
Utilities <Compression/CompressionUtils>
Advanced Usage <Compression/advanced>
......
@@ -92,6 +92,8 @@ if __name__ == '__main__':
# or the result with the highest score (given by evaluator) will be the best result.
# scheduler = PruningScheduler(pruner, task_generator, finetuner=finetuner, speed_up=True, dummy_input=dummy_input, evaluator=evaluator)
scheduler = PruningScheduler(pruner, task_generator, finetuner=finetuner, speed_up=True, dummy_input=dummy_input, evaluator=None)
scheduler = PruningScheduler(pruner, task_generator, finetuner=finetuner, speed_up=True, dummy_input=dummy_input, evaluator=None, reset_weight=False)
scheduler.compress()
_, model, masks, _, _ = scheduler.get_best_result()
@@ -123,21 +123,21 @@ class BasicPruner(Pruner):
class LevelPruner(BasicPruner):
"""
Parameters
----------
model : torch.nn.Module
Model to be pruned
config_list : List[Dict]
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- op_types : Operation types to prune.
- op_names : Operation names to prune.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
"""
def __init__(self, model: Module, config_list: List[Dict]):
"""
Parameters
----------
model
Model to be pruned
config_list
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- op_types : Operation types to prune.
- op_names : Operation names to prune.
- op_partial_names: An auxiliary field collecting matched op_names in model, then this will convert to op_names.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
"""
super().__init__(model, config_list)
def _validate_config_before_canonical(self, model: Module, config_list: List[Dict]):
@@ -157,36 +157,36 @@ class LevelPruner(BasicPruner):
class NormPruner(BasicPruner):
"""
Parameters
----------
model : torch.nn.Module
Model to be pruned
config_list : List[Dict]
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- op_types : Conv2d and Linear are supported in NormPruner.
- op_names : Operation names to prune.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
p : int
The order of norm.
mode : str
'normal' or 'dependency_aware'.
If prune the model in a dependency-aware way, this pruner will
prune the model according to the norm of weights and the channel-dependency or
group-dependency of the model. In this way, the pruner will force the conv layers
that have dependencies to prune the same channels, so the speedup module can better
harvest the speed benefit from the pruned model. Note that, if set 'dependency_aware'
, the dummy_input cannot be None, because the pruner needs a dummy input to trace the
dependency between the conv layers.
dummy_input : Optional[torch.Tensor]
The dummy input to analyze the topology constraints. Note that, the dummy_input
should on the same device with the model.
"""
def __init__(self, model: Module, config_list: List[Dict], p: int,
mode: str = 'normal', dummy_input: Optional[Tensor] = None):
"""
Parameters
----------
model
Model to be pruned
config_list
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- op_types : Conv2d and Linear are supported in NormPruner.
- op_names : Operation names to prune.
- op_partial_names: An auxiliary field collecting matched op_names in model, then this will convert to op_names.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
p
The order of norm.
mode
'normal' or 'dependency_aware'.
If prune the model in a dependency-aware way, this pruner will
prune the model according to the norm of weights and the channel-dependency or
group-dependency of the model. In this way, the pruner will force the conv layers
that have dependencies to prune the same channels, so the speedup module can better
harvest the speed benefit from the pruned model. Note that, if set 'dependency_aware'
, the dummy_input cannot be None, because the pruner needs a dummy input to trace the
dependency between the conv layers.
dummy_input
The dummy input to analyze the topology constraints. Note that, the dummy_input
should on the same device with the model.
"""
self.p = p
self.mode = mode
self.dummy_input = dummy_input
@@ -217,98 +217,98 @@ class NormPruner(BasicPruner):
class L1NormPruner(NormPruner):
"""
Parameters
----------
model : torch.nn.Module
Model to be pruned
config_list : List[Dict]
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- op_types : Conv2d and Linear are supported in L1NormPruner.
- op_names : Operation names to prune.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
mode : str
'normal' or 'dependency_aware'.
If prune the model in a dependency-aware way, this pruner will
prune the model according to the l1-norm of weights and the channel-dependency or
group-dependency of the model. In this way, the pruner will force the conv layers
that have dependencies to prune the same channels, so the speedup module can better
harvest the speed benefit from the pruned model. Note that, if set 'dependency_aware'
, the dummy_input cannot be None, because the pruner needs a dummy input to trace the
dependency between the conv layers.
dummy_input : Optional[torch.Tensor]
The dummy input to analyze the topology constraints. Note that, the dummy_input
should on the same device with the model.
"""
def __init__(self, model: Module, config_list: List[Dict],
mode: str = 'normal', dummy_input: Optional[Tensor] = None):
"""
Parameters
----------
model
Model to be pruned
config_list
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- op_types : Conv2d and Linear are supported in L1NormPruner.
- op_names : Operation names to prune.
- op_partial_names: An auxiliary field collecting matched op_names in model, then this will convert to op_names.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
mode
'normal' or 'dependency_aware'.
If prune the model in a dependency-aware way, this pruner will
prune the model according to the l1-norm of weights and the channel-dependency or
group-dependency of the model. In this way, the pruner will force the conv layers
that have dependencies to prune the same channels, so the speedup module can better
harvest the speed benefit from the pruned model. Note that, if set 'dependency_aware'
, the dummy_input cannot be None, because the pruner needs a dummy input to trace the
dependency between the conv layers.
dummy_input
The dummy input to analyze the topology constraints. Note that, the dummy_input
should on the same device with the model.
"""
super().__init__(model, config_list, 1, mode, dummy_input)
class L2NormPruner(NormPruner):
"""
Parameters
----------
model : torch.nn.Module
Model to be pruned
config_list : List[Dict]
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- op_types : Conv2d and Linear are supported in L2NormPruner.
- op_names : Operation names to prune.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
mode : str
'normal' or 'dependency_aware'.
If prune the model in a dependency-aware way, this pruner will
prune the model according to the l2-norm of weights and the channel-dependency or
group-dependency of the model. In this way, the pruner will force the conv layers
that have dependencies to prune the same channels, so the speedup module can better
harvest the speed benefit from the pruned model. Note that, if set 'dependency_aware'
, the dummy_input cannot be None, because the pruner needs a dummy input to trace the
dependency between the conv layers.
dummy_input : Optional[torch.Tensor]
The dummy input to analyze the topology constraints. Note that, the dummy_input
should on the same device with the model.
"""
def __init__(self, model: Module, config_list: List[Dict],
mode: str = 'normal', dummy_input: Optional[Tensor] = None):
"""
Parameters
----------
model
Model to be pruned
config_list
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- op_types : Conv2d and Linear are supported in L2NormPruner.
- op_names : Operation names to prune.
- op_partial_names: An auxiliary field collecting matched op_names in model, then this will convert to op_names.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
mode
'normal' or 'dependency_aware'.
If prune the model in a dependency-aware way, this pruner will
prune the model according to the l2-norm of weights and the channel-dependency or
group-dependency of the model. In this way, the pruner will force the conv layers
that have dependencies to prune the same channels, so the speedup module can better
harvest the speed benefit from the pruned model. Note that, if set 'dependency_aware'
, the dummy_input cannot be None, because the pruner needs a dummy input to trace the
dependency between the conv layers.
dummy_input
The dummy input to analyze the topology constraints. Note that, the dummy_input
should on the same device with the model.
"""
super().__init__(model, config_list, 2, mode, dummy_input)
class FPGMPruner(BasicPruner):
"""
Parameters
----------
model : torch.nn.Module
Model to be pruned
config_list : List[Dict]
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- op_types : Conv2d and Linear are supported in FPGMPruner.
- op_names : Operation names to prune.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
mode : str
'normal' or 'dependency_aware'.
If prune the model in a dependency-aware way, this pruner will
prune the model according to the FPGM of weights and the channel-dependency or
group-dependency of the model. In this way, the pruner will force the conv layers
that have dependencies to prune the same channels, so the speedup module can better
harvest the speed benefit from the pruned model. Note that, if set 'dependency_aware'
, the dummy_input cannot be None, because the pruner needs a dummy input to trace the
dependency between the conv layers.
dummy_input : Optional[torch.Tensor]
The dummy input to analyze the topology constraints. Note that, the dummy_input
should on the same device with the model.
"""
def __init__(self, model: Module, config_list: List[Dict],
mode: str = 'normal', dummy_input: Optional[Tensor] = None):
"""
Parameters
----------
model
Model to be pruned
config_list
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- op_types : Conv2d and Linear are supported in FPGMPruner.
- op_names : Operation names to prune.
- op_partial_names: An auxiliary field collecting matched op_names in model, then this will convert to op_names.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
mode
'normal' or 'dependency_aware'.
If prune the model in a dependency-aware way, this pruner will
prune the model according to the FPGM of weights and the channel-dependency or
group-dependency of the model. In this way, the pruner will force the conv layers
that have dependencies to prune the same channels, so the speedup module can better
harvest the speed benefit from the pruned model. Note that, if set 'dependency_aware'
, the dummy_input cannot be None, because the pruner needs a dummy input to trace the
dependency between the conv layers.
dummy_input
The dummy input to analyze the topology constraints. Note that, the dummy_input
should on the same device with the model.
"""
self.mode = mode
self.dummy_input = dummy_input
super().__init__(model, config_list)
@@ -338,57 +338,57 @@ class FPGMPruner(BasicPruner):
class SlimPruner(BasicPruner):
"""
Parameters
----------
model : torch.nn.Module
Model to be pruned
config_list : List[Dict]
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- total_sparsity : This is to specify the total sparsity for all layers in this config,
each layer may have different sparsity.
- max_sparsity_per_layer : Always used with total_sparsity. Limit the max sparsity of each layer.
- op_types : Only BatchNorm2d is supported in SlimPruner.
- op_names : Operation names to prune.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
trainer : Callable[[Module, Optimizer, Callable], None]
A callable function used to train model or just inference. Take model, optimizer, criterion as input.
The model will be trained or inferenced `training_epochs` epochs.
Example::
def trainer(model: Module, optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor]):
training = model.training
model.train(mode=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
# If you don't want to update the model, you can skip `optimizer.step()`, and set train mode False.
optimizer.step()
model.train(mode=training)
optimizer : torch.optim.Optimizer
The optimizer instance used in trainer. Note that this optimizer might be patched during collect data,
so do not use this optimizer in other places.
criterion : Callable[[Tensor, Tensor], Tensor]
The criterion function used in trainer. Take model output and target value as input, and return the loss.
training_epochs : int
The epoch number for training model to sparsify the BN weight.
mode : str
'normal' or 'global'.
If prune the model in a global way, all layer weights with same config will be considered uniformly.
That means a single layer may not reach or exceed the sparsity setting in config,
but the total pruned weights meet the sparsity setting.
"""
def __init__(self, model: Module, config_list: List[Dict], trainer: Callable[[Module, Optimizer, Callable], None],
optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor],
training_epochs: int, scale: float = 0.0001, mode='global'):
"""
Parameters
----------
model
Model to be pruned
config_list
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- total_sparsity : This is to specify the total sparsity for all layers in this config,
each layer may have different sparsity.
- max_sparsity_per_layer : Always used with total_sparsity. Limit the max sparsity of each layer.
- op_types : Only BatchNorm2d is supported in SlimPruner.
- op_names : Operation names to prune.
- op_partial_names: An auxiliary field collecting matched op_names in model, then this will convert to op_names.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
trainer
A callable function used to train model or just inference. Take model, optimizer, criterion as input.
The model will be trained or inferenced `training_epochs` epochs.
Example::
def trainer(model: Module, optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor]):
training = model.training
model.train(mode=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
# If you don't want to update the model, you can skip `optimizer.step()`, and set train mode False.
optimizer.step()
model.train(mode=training)
optimizer
The optimizer instance used in trainer. Note that this optimizer might be patched during collect data,
so do not use this optimizer in other places.
criterion
The criterion function used in trainer. Take model output and target value as input, and return the loss.
training_epochs
The epoch number for training model to sparsify the BN weight.
mode
'normal' or 'global'.
If prune the model in a global way, all layer weights with same config will be considered uniformly.
That means a single layer may not reach or exceed the sparsity setting in config,
but the total pruned weights meet the sparsity setting.
"""
self.mode = mode
self.trainer = trainer
self.optimizer = optimizer
@@ -435,61 +435,61 @@ class SlimPruner(BasicPruner):
class ActivationPruner(BasicPruner):
"""
Parameters
----------
model : torch.nn.Module
Model to be pruned
config_list : List[Dict]
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- op_types : Conv2d and Linear are supported in ActivationPruner.
- op_names : Operation names to prune.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
trainer : Callable[[Module, Optimizer, Callable], None]
A callable function used to train model or just inference. Take model, optimizer, criterion as input.
The model will be trained or inferenced `training_epochs` epochs.
Example::
def trainer(model: Module, optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor]):
training = model.training
model.train(mode=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
# If you don't want to update the model, you can skip `optimizer.step()`, and set train mode False.
optimizer.step()
model.train(mode=training)
optimizer : torch.optim.Optimizer
The optimizer instance used in trainer. Note that this optimizer might be patched during collect data,
so do not use this optimizer in other places.
criterion : Callable[[Tensor, Tensor], Tensor]
The criterion function used in trainer. Take model output and target value as input, and return the loss.
training_batches
The batch number used to collect activations.
mode : str
'normal' or 'dependency_aware'.
If prune the model in a dependency-aware way, this pruner will
prune the model according to the activation-based metrics and the channel-dependency or
group-dependency of the model. In this way, the pruner will force the conv layers
that have dependencies to prune the same channels, so the speedup module can better
harvest the speed benefit from the pruned model. Note that, if set 'dependency_aware'
, the dummy_input cannot be None, because the pruner needs a dummy input to trace the
dependency between the conv layers.
dummy_input : Optional[torch.Tensor]
The dummy input to analyze the topology constraints. Note that, the dummy_input
should on the same device with the model.
"""
def __init__(self, model: Module, config_list: List[Dict], trainer: Callable[[Module, Optimizer, Callable], None],
optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor], training_batches: int, activation: str = 'relu',
mode: str = 'normal', dummy_input: Optional[Tensor] = None):
"""
Parameters
----------
model
Model to be pruned
config_list
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- op_types : Conv2d and Linear are supported in ActivationPruner.
- op_names : Operation names to prune.
- op_partial_names: An auxiliary field collecting matched op_names in model, then this will convert to op_names.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
trainer
A callable function used to train model or just inference. Take model, optimizer, criterion as input.
The model will be trained or inferenced `training_epochs` epochs.
Example::
def trainer(model: Module, optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor]):
training = model.training
model.train(mode=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
# If you don't want to update the model, you can skip `optimizer.step()`, and set train mode False.
optimizer.step()
model.train(mode=training)
optimizer
The optimizer instance used in trainer. Note that this optimizer might be patched during collect data,
so do not use this optimizer in other places.
criterion
The criterion function used in trainer. Take model output and target value as input, and return the loss.
training_batches
The batch number used to collect activations.
mode
'normal' or 'dependency_aware'.
If prune the model in a dependency-aware way, this pruner will
prune the model according to the activation-based metrics and the channel-dependency or
group-dependency of the model. In this way, the pruner will force the conv layers
that have dependencies to prune the same channels, so the speedup module can better
harvest the speed benefit from the pruned model. Note that, if set 'dependency_aware'
, the dummy_input cannot be None, because the pruner needs a dummy input to trace the
dependency between the conv layers.
dummy_input
The dummy input to analyze the topology constraints. Note that, the dummy_input
should on the same device with the model.
"""
self.mode = mode
self.dummy_input = dummy_input
self.trainer = trainer
@@ -553,69 +553,69 @@ class ActivationMeanRankPruner(ActivationPruner):
class TaylorFOWeightPruner(BasicPruner):
"""
Parameters
----------
model : torch.nn.Module
Model to be pruned
config_list : List[Dict]
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- total_sparsity : This is to specify the total sparsity for all layers in this config,
each layer may have different sparsity.
- max_sparsity_per_layer : Always used with total_sparsity. Limit the max sparsity of each layer.
- op_types : Conv2d and Linear are supported in TaylorFOWeightPruner.
- op_names : Operation names to prune.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
trainer : Callable[[Module, Optimizer, Callable], None]
A callable function used to train model or just inference. Take model, optimizer, criterion as input.
The model will be trained or inferenced `training_epochs` epochs.
Example::
def trainer(model: Module, optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor]):
training = model.training
model.train(mode=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
# If you don't want to update the model, you can skip `optimizer.step()`, and set train mode False.
optimizer.step()
model.train(mode=training)
optimizer : torch.optim.Optimizer
The optimizer instance used in trainer. Note that this optimizer might be patched during collect data,
so do not use this optimizer in other places.
criterion : Callable[[Tensor, Tensor], Tensor]
The criterion function used in trainer. Take model output and target value as input, and return the loss.
training_batches : int
The batch number used to collect activations.
mode : str
'normal', 'dependency_aware' or 'global'.
If prune the model in a dependency-aware way, this pruner will
prune the model according to the taylorFO and the channel-dependency or
group-dependency of the model. In this way, the pruner will force the conv layers
that have dependencies to prune the same channels, so the speedup module can better
harvest the speed benefit from the pruned model. Note that, if set 'dependency_aware'
, the dummy_input cannot be None, because the pruner needs a dummy input to trace the
dependency between the conv layers.
If prune the model in a global way, all layer weights with same config will be considered uniformly.
That means a single layer may not reach or exceed the sparsity setting in config,
but the total pruned weights meet the sparsity setting.
dummy_input : Optional[torch.Tensor]
The dummy input to analyze the topology constraints. Note that, the dummy_input
should on the same device with the model.
"""
def __init__(self, model: Module, config_list: List[Dict], trainer: Callable[[Module, Optimizer, Callable], None],
optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor], training_batches: int,
mode: str = 'normal', dummy_input: Optional[Tensor] = None):
"""
Parameters
----------
model
Model to be pruned
config_list
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- total_sparsity : This is to specify the total sparsity for all layers in this config,
each layer may have different sparsity.
- max_sparsity_per_layer : Always used with total_sparsity. Limit the max sparsity of each layer.
- op_types : Conv2d and Linear are supported in TaylorFOWeightPruner.
- op_names : Operation names to prune.
- op_partial_names: An auxiliary field collecting matched op_names in model, then this will convert to op_names.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
trainer
A callable function used to train model or just inference. Take model, optimizer, criterion as input.
The model will be trained or inferenced `training_epochs` epochs.
Example::
def trainer(model: Module, optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor]):
training = model.training
model.train(mode=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
# If you don't want to update the model, you can skip `optimizer.step()`, and set train mode False.
optimizer.step()
model.train(mode=training)
optimizer
The optimizer instance used in trainer. Note that this optimizer might be patched during collect data,
so do not use this optimizer in other places.
criterion
The criterion function used in trainer. Take model output and target value as input, and return the loss.
training_batches
The batch number used to collect activations.
mode
'normal', 'dependency_aware' or 'global'.
If prune the model in a dependency-aware way, this pruner will
prune the model according to the taylorFO and the channel-dependency or
group-dependency of the model. In this way, the pruner will force the conv layers
that have dependencies to prune the same channels, so the speedup module can better
harvest the speed benefit from the pruned model. Note that, if set 'dependency_aware'
, the dummy_input cannot be None, because the pruner needs a dummy input to trace the
dependency between the conv layers.
If prune the model in a global way, all layer weights with same config will be considered uniformly.
That means a single layer may not reach or exceed the sparsity setting in config,
but the total pruned weights meet the sparsity setting.
dummy_input
The dummy input to analyze the topology constraints. Note that, the dummy_input
should on the same device with the model.
"""
self.mode = mode
self.dummy_input = dummy_input
self.trainer = trainer
@@ -674,53 +674,51 @@ class ADMMPruner(BasicPruner):
Only in the final iteration will the masks be generated and applied to the model wrapper.
The original paper refers to: https://arxiv.org/abs/1804.03294.
Parameters
----------
model : torch.nn.Module
Model to be pruned.
config_list : List[Dict]
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- rho : Penalty parameters in ADMM algorithm.
- op_types : Operation types to prune.
- op_names : Operation names to prune.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
trainer : Callable[[Module, Optimizer, Callable], None]
A callable function used to train model or just inference. Take model, optimizer, criterion as input.
The model will be trained or inferenced `training_epochs` epochs.
Example::
def trainer(model: Module, optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor]):
training = model.training
model.train(mode=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
# If you don't want to update the model, you can skip `optimizer.step()`, and set train mode False.
optimizer.step()
model.train(mode=training)
optimizer : torch.optim.Optimizer
The optimizer instance used in trainer. Note that this optimizer might be patched during collect data,
so do not use this optimizer in other places.
criterion : Callable[[Tensor, Tensor], Tensor]
The criterion function used in trainer. Take model output and target value as input, and return the loss.
iterations : int
The total iteration number in admm pruning algorithm.
training_epochs : int
The epoch number for training model in each iteration.
"""
def __init__(self, model: Module, config_list: List[Dict], trainer: Callable[[Module, Optimizer, Callable], None],
optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor], iterations: int, training_epochs: int):
"""
Parameters
----------
model
Model to be pruned.
config_list
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- rho : Penalty parameters in ADMM algorithm. Default: 1e-4.
- op_types : Operation types to prune.
- op_names : Operation names to prune.
- op_partial_names: An auxiliary field collecting matched op_names in model, then this will convert to op_names.
- exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.
trainer
A callable function used to train model or just inference. Take model, optimizer, criterion as input.
The model will be trained or inferenced `training_epochs` epochs.
Example::
def trainer(model: Module, optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor]):
training = model.training
model.train(mode=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
# If you don't want to update the model, you can skip `optimizer.step()`, and set train mode False.
optimizer.step()
model.train(mode=training)
optimizer
The optimizer instance used in trainer. Note that this optimizer might be patched during collect data,
so do not use this optimizer in other places.
criterion
The criterion function used in trainer. Take model output and target value as input, and return the loss.
iterations
The total iteration number in admm pruning algorithm.
training_epochs
The epoch number for training model in each iteration.
"""
self.trainer = trainer
self.optimizer = optimizer
self.criterion = criterion
......
@@ -14,29 +14,29 @@ from .tools import TaskGenerator
class PruningScheduler(BasePruningScheduler):
"""
Parameters
----------
pruner
The pruner used in pruner scheduler.
The scheduler will use `Pruner.reset(model, config_list)` to reset it in each iteration.
task_generator
Used to generate task for each iteration.
finetuner
The finetuner handled all finetune logic, use a pytorch module as input.
speed_up
If set True, speed up the model in each iteration.
dummy_input
If `speed_up` is True, `dummy_input` is required for trace the model in speed up.
evaluator
Evaluate the pruned model and give a score.
If evaluator is None, the best result refers to the latest result.
reset_weight
If set True, the model weight will reset to the origin model weight at the end of each iteration step.
"""
def __init__(self, pruner: Pruner, task_generator: TaskGenerator, finetuner: Callable[[Module], None] = None,
speed_up: bool = False, dummy_input: Tensor = None, evaluator: Optional[Callable[[Module], float]] = None,
reset_weight: bool = False):
"""
Parameters
----------
pruner
The pruner used in pruner scheduler.
The scheduler will use `Pruner.reset(model, config_list)` to reset it in each iteration.
task_generator
Used to generate task for each iteration.
finetuner
The finetuner handled all finetune logic, use a pytorch module as input.
speed_up
If set True, speed up the model in each iteration.
dummy_input
If `speed_up` is True, `dummy_input` is required for trace the model in speed up.
evaluator
Evaluate the pruned model and give a score.
If evaluator is None, the best result refers to the latest result.
reset_weight
If set True, the model weight will reset to the origin model weight at the end of each iteration step.
"""
self.pruner = pruner
self.task_generator = task_generator
self.finetuner = finetuner
......