Unverified commit 19eabd69 authored by chicm-ms, committed by GitHub

Refactor model compression documentation (#2612)

parent 0b9d6ce6
# Customize New Compression Algorithm
```eval_rst
.. contents::
```
In order to simplify the process of writing new compression algorithms, we have designed a simple and flexible programming interface that covers both pruning and quantization. Below, we first demonstrate how to customize a new pruning algorithm and then how to customize a new quantization algorithm.
**Important Note**: to better understand how to customize new pruning/quantization algorithms, users should first understand the framework that supports various pruning algorithms in NNI; see the [Framework overview of model compression](https://nni.readthedocs.io/en/latest/Compressor/Framework.html).
## Customize a new pruning algorithm
Implementing a new pruning algorithm requires implementing a `weight masker` class, which should be a subclass of `WeightMasker`, and a `pruner` class, which should be a subclass of `Pruner`.
An implementation of `weight masker` may look like this:
```python
class MyMasker(WeightMasker):
    def __init__(self, model, pruner):
        super().__init__(model, pruner)
        # You can do some initialization here, such as collecting statistics,
        # if it is necessary for your algorithm to calculate the masks.

    def calc_mask(self, sparsity, wrapper, wrapper_idx=None):
        # calculate the mask based on wrapper.weight, sparsity,
        # and anything else you need
        # mask = ...
        return {'weight_mask': mask}
```
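For concreteness, here is a minimal sketch of a magnitude-based mask computation such a masker might perform; `_level_mask` is a hypothetical helper for illustration, not part of NNI:
```python
import torch

def _level_mask(weight, sparsity):
    # Hypothetical helper: keep the largest-magnitude weights and zero out
    # the `sparsity` fraction with the smallest absolute values.
    num_prune = int(weight.numel() * sparsity)
    if num_prune == 0:
        return torch.ones_like(weight)
    # threshold = largest magnitude among the weights to be pruned
    threshold = torch.topk(weight.abs().view(-1), num_prune, largest=False)[0].max()
    return torch.gt(weight.abs(), threshold).type_as(weight)
```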
You can refer to NNI-provided [weight masker](https://github.com/microsoft/nni/blob/master/src/sdk/pynni/nni/compression/torch/pruning/structured_pruning.py) implementations to implement your own weight masker.
A basic `pruner` looks like this:
```python
class MyPruner(Pruner):
def __init__(self, model, config_list, optimizer):
super().__init__(model, config_list, optimizer)
self.set_wrappers_attribute("if_calculated", False)
# construct a weight masker instance
self.masker = MyMasker(model, self)
def calc_mask(self, wrapper, wrapper_idx=None):
sparsity = wrapper.config['sparsity']
if wrapper.if_calculated:
# Already pruned, do not prune again as a one-shot pruner
return None
else:
# call your masker to actually calcuate the mask for this layer
masks = self.masker.calc_mask(sparsity=sparsity, wrapper=wrapper, wrapper_idx=wrapper_idx)
wrapper.if_calculated = True
return masks
```
Refer to NNI-provided [pruner](https://github.com/microsoft/nni/blob/master/src/sdk/pynni/nni/compression/torch/pruning/one_shot.py) implementations to implement your own pruner class.
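A hypothetical end-to-end usage of the custom pruner above might look like this (a sketch only; `MyModel` and the training loop are placeholders, not NNI APIs):
```python
import torch

model = MyModel()  # placeholder: your torch.nn.Module
config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

pruner = MyPruner(model, config_list, optimizer)
model = pruner.compress()
# train as usual; masks are applied via the hook installed on optimizer.step()
```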
***
## Customize a new quantization algorithm
To write a new quantization algorithm, you can write a class that inherits `nni.compression.torch.Quantizer`. Then, override the member functions with the logic of your algorithm. The member function to override is `quantize_weight`. `quantize_weight` directly returns the quantized weights rather than a mask, because for quantization the quantized weights cannot be obtained by applying a mask.
```python
from nni.compression.torch import Quantizer
class YourQuantizer(Quantizer):
    def __init__(self, model, config_list):
        """
        It is suggested to use the NNI-defined spec for config
        """
        super().__init__(model, config_list)

    def quantize_weight(self, weight, config, **kwargs):
        """
        Subclasses should override this method to quantize weight tensors.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        weight : Tensor
            weight that needs to be quantized
        config : dict
            the configuration for weight quantization
        """
        # Put your code to generate `new_weight` here
        return new_weight

    def quantize_output(self, output, config, **kwargs):
        """
        Subclasses should override this method to quantize output.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        output : Tensor
            output that needs to be quantized
        config : dict
            the configuration for output quantization
        """
        # Put your code to generate `new_output` here
        return new_output

    def quantize_input(self, *inputs, config, **kwargs):
        """
        Subclasses should override this method to quantize input.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        inputs : Tensor
            inputs that need to be quantized
        config : dict
            the configuration for input quantization
        """
        # Put your code to generate `new_input` here
        return new_input

    def update_epoch(self, epoch_num):
        pass

    def step(self):
        """
        Can do some processing based on the model or weights bound
        in the function `bind_model`
        """
        pass
```
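As a concrete illustration, a naive symmetric 8-bit weight quantization inside `quantize_weight` might look like the following sketch (illustrative only; `Naive8BitQuantizer` is not one of NNI's built-in quantizers):
```python
import torch
from nni.compression.torch import Quantizer

class Naive8BitQuantizer(Quantizer):
    """Hypothetical example: symmetric 8-bit quantization of weights."""
    def quantize_weight(self, weight, config, **kwargs):
        scale = weight.abs().max() / 127
        if scale == 0:
            return weight
        # map weights onto 255 integer levels in [-127, 127], then back to float
        return torch.round(weight / scale).clamp(-127, 127) * scale
```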
### Customize backward function
Sometimes it's necessary for a quantization operation to have a customized backward function, such as the [Straight-Through Estimator](https://stackoverflow.com/questions/38361314/the-concept-of-straight-through-estimator-ste); users can customize a backward function as follows:
```python
import torch
from nni.compression.torch.compressor import Quantizer, QuantGrad, QuantType

class ClipGrad(QuantGrad):
    @staticmethod
    def quant_backward(tensor, grad_output, quant_type):
        """
        This method should be overridden by subclasses to provide a customized
        backward function; the default implementation is the Straight-Through Estimator.
        Parameters
        ----------
        tensor : Tensor
            input of the quantization operation
        grad_output : Tensor
            gradient of the output of the quantization operation
        quant_type : QuantType
            the type of quantization; it can be `QuantType.QUANT_INPUT`, `QuantType.QUANT_WEIGHT`
            or `QuantType.QUANT_OUTPUT`, and you can define different behavior for each type.
        Returns
        -------
        tensor
            gradient of the input of the quantization operation
        """
        # for quant_output, set grad to zero if the absolute value of the
        # tensor is larger than 1
        if quant_type == QuantType.QUANT_OUTPUT:
            grad_output[torch.abs(tensor) > 1] = 0
        return grad_output

class YourQuantizer(Quantizer):
    def __init__(self, model, config_list):
        super().__init__(model, config_list)
        # set your customized backward function to overwrite the default one
        self.quant_grad = ClipGrad
```
If you do not customize `QuantGrad`, the default backward is Straight-Through Estimator.
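In other words, the default behavior is equivalent to the following sketch, which passes the gradient through as if quantization were the identity function:
```python
from nni.compression.torch.compressor import QuantGrad

class STEGrad(QuantGrad):
    @staticmethod
    def quant_backward(tensor, grad_output, quant_type):
        # straight-through: ignore the quantization step in the backward pass
        return grad_output
```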
_Coming Soon_ ...
# Framework overview of model compression
```eval_rst
.. contents::
```
The picture below shows the component overview of the model compression framework.
![](../../img/compressor_framework.jpg)
There are three major components/classes in the NNI model compression framework: `Compressor`, `Pruner` and `Quantizer`. Let's look at them in detail one by one.
## Compressor
`Compressor` is the base class for pruners and quantizers. It provides a unified interface for end users, so that pruners and quantizers can be used in the same way. For example, to use a pruner:
```python
from nni.compression.torch import LevelPruner
# ... elided in this diff view (@@ -32,82 +32,25 @@) ...
model = pruner.compress()
# the model will be pruned during training automatically
```
To use a quantizer:
```python
from nni.compression.torch import DoReFaQuantizer

configure_list = [{
    'quant_types': ['weight'],
    'quant_bits': {
        'weight': 8,
    },
    'op_types': ['Conv2d', 'Linear']
}]
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
quantizer = DoReFaQuantizer(model, configure_list, optimizer)
quantizer.compress()
```
View [example code](https://github.com/microsoft/nni/tree/master/examples/model_compress) for more information.
The `Compressor` class provides some utility methods for subclasses and users:
### Set wrapper attribute
@@ -148,130 +91,104 @@ *(content elided in this diff view)*
```python
collector_id = self.pruner.add_activation_collector(collector)
self.pruner.remove_activation_collector(collector_id)
```
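To illustrate, a weight masker that needs activation statistics might use these utilities roughly as follows (a sketch; the collector callback following the standard PyTorch forward-hook signature is an assumption here):
```python
# inside a hypothetical weight masker that holds a reference to its pruner
collected_activation = []

def collector(module, input_, output):
    # store each wrapped layer's output for later importance ranking
    collected_activation.append(output.detach().cpu())

collector_id = self.pruner.add_activation_collector(collector)
# ... run a few inference batches so the hook fires ...
self.pruner.remove_activation_collector(collector_id)
```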
***
## Pruner
A pruner receives `model`, `config_list` and `optimizer` as arguments. It prunes the model per the `config_list` during the training loop by adding a hook on `optimizer.step()`.
The `Pruner` class is a subclass of `Compressor`, so it contains everything in the `Compressor` class plus some components used only for pruning:
### Weight masker
A `weight masker` is the implementation of a pruning algorithm; it can prune a specified layer, wrapped by a `module wrapper`, with a specified sparsity.
### Pruning module wrapper
A `pruning module wrapper` is a module containing:
1. the origin module
2. some buffers used by `calc_mask`
3. a new forward method that applies masks before running the original forward method.
The reasons to use a `module wrapper`:
1. some buffers are needed by `calc_mask` to calculate masks, and these buffers should be registered in the `module wrapper` so that the original modules are not contaminated.
2. a new `forward` method is needed to apply masks to the weight before calling the real `forward` method.
### Pruning hook
A pruning hook is installed on a pruner when the pruner is constructed; it is used to call the pruner's `calc_mask` method when `optimizer.step()` is invoked.
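Conceptually, the wrapper's `forward` does something like the following sketch (names are illustrative; NNI's actual implementation differs in detail):
```python
import torch
import torch.nn as nn

class PrunerModuleWrapper(nn.Module):
    """Illustrative wrapper around a layer, with the mask kept in a buffer."""
    def __init__(self, module):
        super().__init__()
        self.module = module
        # register as a buffer so it travels with the module (e.g. to GPU)
        self.register_buffer('weight_mask', torch.ones_like(module.weight))

    def forward(self, *inputs):
        # apply the mask in place before calling the real forward
        self.module.weight.data.mul_(self.weight_mask)
        return self.module(*inputs)
```
***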
## Quantizer
The `Quantizer` class is also a subclass of `Compressor`. It is used to compress models by reducing the number of bits required to represent weights or activations, which can reduce computation and inference time. It contains:
### Quantization module wrapper
Each module/layer of the model to be quantized is wrapped by a quantization module wrapper, which provides a new `forward` method to quantize the original module's weight, input and output.
### Quantization hook
A quantization hook is installed on a quantizer when it is constructed; it is called at `optimizer.step()`.
### Quantization methods
The `Quantizer` class provides the following methods for subclasses to implement quantization algorithms:
```python
class Quantizer(Compressor):
    """
    Base quantizer for pytorch quantizer
    """
    def quantize_weight(self, weight, wrapper, **kwargs):
        """
        Subclasses should override this method to quantize weight.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        weight : Tensor
            weight that needs to be quantized
        wrapper : QuantizerModuleWrapper
            the wrapper for the origin module
        """
        raise NotImplementedError('Quantizer must overload quantize_weight()')

    def quantize_output(self, output, wrapper, **kwargs):
        """
        Subclasses should override this method to quantize output.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        output : Tensor
            output that needs to be quantized
        wrapper : QuantizerModuleWrapper
            the wrapper for the origin module
        """
        raise NotImplementedError('Quantizer must overload quantize_output()')

    def quantize_input(self, *inputs, wrapper, **kwargs):
        """
        Subclasses should override this method to quantize input.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        inputs : Tensor
            inputs that need to be quantized
        wrapper : QuantizerModuleWrapper
            the wrapper for the origin module
        """
        raise NotImplementedError('Quantizer must overload quantize_input()')
```
***
## Multi-GPU support
On multi-GPU training, buffers and parameters are copied to multiple GPUs every time the `forward` method runs. If buffers and parameters are updated in the `forward` method, an in-place update is needed to ensure the update is effective.
Since `calc_mask` is called in the `optimizer.step` method, which happens after the `forward` method and happens only on one GPU, multi-GPU training is supported naturally.
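For example, a typical multi-GPU flow might be (a sketch; `MyModel` is a placeholder):
```python
import torch.nn as nn
from nni.compression.torch import LevelPruner

model = MyModel()  # placeholder: your torch.nn.Module
pruner = LevelPruner(model, [{'sparsity': 0.5, 'op_types': ['default']}])
model = pruner.compress()
# wrap after compressing: masks live in wrapper buffers, so DataParallel
# replicates them to every GPU on each forward pass
model = nn.DataParallel(model).cuda()
```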
# Supported Pruning Algorithms on NNI
We provide several pruning algorithms that support fine-grained weight pruning and structural filter pruning. **Fine-grained Pruning** generally results in unstructured models, which need specialized hardware or software to speed up the sparse network. **Filter Pruning** achieves acceleration by removing entire filters. We also provide an algorithm to control the **pruning schedule**.

**Fine-grained Pruning**
* [Level Pruner](#level-pruner)

**Filter Pruning**
* [Slim Pruner](#slim-pruner)
* [FPGM Pruner](#fpgm-pruner)
* [L1Filter Pruner](#l1filter-pruner)
* [L2Filter Pruner](#l2filter-pruner)
* [APoZ Rank Pruner](#activationapozrankfilterpruner)
* [Activation Mean Rank Pruner](#activationmeanrankfilterpruner)
* [Taylor FO On Weight Pruner](#taylorfoweightfilterpruner)

**Pruning Schedule**
* [AGP Pruner](#agp-pruner)

**Others**
* [Lottery Ticket Hypothesis](#lottery-ticket-hypothesis)

## Level Pruner
This is a basic one-shot pruner: you can set a target sparsity level (expressed as a fraction; 0.6 means we will prune 60% of the weights).
@@ -51,126 +50,6 @@ *(usage elided in this diff view; it ends with `pruner.compress()`)*
***
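The elided usage follows the standard pattern; in outline (a sketch consistent with the surrounding docs):
```python
from nni.compression.torch import LevelPruner

# prune 80% of the weights in all supported layers
config_list = [{'sparsity': 0.8, 'op_types': ['default']}]
pruner = LevelPruner(model, config_list)
pruner.compress()
```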
## Slim Pruner
This is a one-shot pruner that implements ['Learning Efficient Convolutional Networks through Network Slimming'](https://arxiv.org/pdf/1708.06519.pdf) by Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan and Changshui Zhang.
@@ -208,10 +87,7 @@ *(usage and experiment details elided in this diff view)* The experiments code can be found at [examples/model_compress](https://github.com/microsoft/nni/tree/master/examples/model_compress)
***
## FPGM Pruner
This is a one-shot pruner. FPGM Pruner is an implementation of the paper [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/pdf/1811.00250.pdf).
FPGMPruner prunes filters with the smallest geometric median.
>Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance.
### Usage
Tensorflow code
```python
# ... elided in this diff view (@@ -243,26 +119,14 @@) ...
pruner = FPGMPruner(model, config_list)
pruner.compress()
```
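The elided PyTorch usage follows the same pattern; a sketch:
```python
from nni.compression.torch import FPGMPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = FPGMPruner(model, config_list)
pruner.compress()
```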
#### User configuration for FPGM Pruner
- **sparsity:** the percentage of convolutional filters to be pruned.
- **op_types:** only Conv2d is supported in FPGM Pruner.
***
## L1Filter Pruner
This is a one-shot pruner that implements ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710) by Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf.
@@ -280,7 +144,7 @@ *(earlier steps of the procedure elided in this diff view)*
> 4. A new kernel matrix is created for both the ![](http://latex.codecogs.com/gif.latex?i)th and ![](http://latex.codecogs.com/gif.latex?i+1)th layers, and the remaining kernel
> weights are copied to the new model.
### Usage
PyTorch code
@@ -294,9 +158,9 @@ *(usage code elided in this diff view; it ends with `pruner.compress()`)*
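The elided usage is, in outline (a sketch consistent with the other filter pruners in this document):
```python
from nni.compression.torch import L1FilterPruner
config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
pruner = L1FilterPruner(model, config_list)
pruner.compress()
```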
#### User configuration for L1Filter Pruner
- **sparsity:** the target sparsity the operations will be compressed to.
- **op_types:** only Conv2d is supported in L1Filter Pruner.
### Reproduced Experiment
We implemented one of the experiments in ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710) with **L1FilterPruner**. We pruned **VGG-16** for CIFAR-10 to **VGG-16-pruned-A** as in the paper, in which $64\%$ of the parameters are pruned. Our experiment results are as follows:
@@ -309,11 +173,11 @@ *(experiment results elided in this diff view)* The experiments code can be found at [examples/model_compress](https://github.com/microsoft/nni/tree/master/examples/model_compress)
***
## L2Filter Pruner
This is a structured pruning algorithm that prunes the filters with the smallest L2 norm of the weights. It is implemented as a one-shot pruner.
### Usage
PyTorch code
```python
# ... elided in this diff view (@@ -324,25 +188,22 @@) ...
pruner = L2FilterPruner(model, config_list)
pruner.compress()
```
### User configuration for L2Filter Pruner
- **sparsity:** the target sparsity the operations will be compressed to.
- **op_types:** only Conv2d is supported in L2Filter Pruner.
***
## ActivationAPoZRankFilterPruner
ActivationAPoZRankFilterPruner is a pruner which prunes the filters with the smallest importance criterion `APoZ`, calculated from the output activations of convolution layers, to achieve a preset level of network sparsity. The pruning criterion `APoZ` is explained in the paper [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250).
The APoZ is defined as:
![](../../img/apoz.png)
### Usage
PyTorch code
@@ -360,18 +221,18 @@ *(usage code elided in this diff view)*
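The elided usage presumably mirrors the other activation-rank pruners; a sketch (the `statistics_batch_num` argument is inferred from the sibling pruners shown below):
```python
from nni.compression.torch import ActivationAPoZRankFilterPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1)
pruner.compress()
```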
Note: ActivationAPoZRankFilterPruner is used to prune convolutional layers within deep neural networks, therefore the `op_types` field supports only convolutional layers. You can view the example for more information.
### User configuration for ActivationAPoZRankFilterPruner
- **sparsity:** the percentage of convolutional filters to be pruned.
- **op_types:** only Conv2d is supported in ActivationAPoZRankFilterPruner.
***
## ActivationMeanRankFilterPruner
ActivationMeanRankFilterPruner is a pruner which prunes the filters with the smallest importance criterion `mean activation`, calculated from the output activations of convolution layers, to achieve a preset level of network sparsity. The pruning criterion `mean activation` is explained in section 2.2 of the paper [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440). Other pruning criteria mentioned in this paper will be supported in a future release.
### Usage
PyTorch code
```python
# ... elided in this diff view (@@ -381,7 +242,7 @@) ...
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = ActivationMeanRankFilterPruner(model, config_list, statistics_batch_num=1)
pruner.compress()
```
@@ -389,26 +250,22 @@ *(note elided in this diff view; ActivationMeanRankFilterPruner is used to prune convolutional layers within deep neural networks)*
You can view the example for more information.
### User configuration for ActivationMeanRankFilterPruner
- **sparsity:** the percentage of convolutional filters to be pruned.
- **op_types:** only Conv2d is supported in ActivationMeanRankFilterPruner.
***
## TaylorFOWeightFilterPruner
TaylorFOWeightFilterPruner is a pruner which prunes convolutional layers based on estimated importance calculated from the first-order Taylor expansion on weights, to achieve a preset level of network sparsity. The estimated importance of filters is defined in the paper [Importance Estimation for Neural Network Pruning](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf). Other pruning criteria mentioned in this paper will be supported in a future release.
![](../../img/importance_estimation_sum.png)
### Usage
PyTorch code
```python
# ... elided in this diff view (@@ -418,17 +275,128 @@) ...
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = TaylorFOWeightFilterPruner(model, config_list, statistics_batch_num=1)
pruner.compress()
```
You can view the example for more information.
### User configuration for TaylorFOWeightFilterPruner
- **sparsity:** the percentage of convolutional filters to be pruned.
- **op_types:** currently only Conv2d is supported in TaylorFOWeightFilterPruner.
***
## AGP Pruner
This is an iterative pruner. In [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878), authors Michael Zhu and Suyog Gupta provide an algorithm to prune the weights gradually.
>We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value si (usually 0) to a final sparsity value sf over a span of n pruning steps, starting at training step t0 and with pruning frequency ∆t:
![](../../img/agp_pruner.png)
>The binary weight masks are updated every ∆t steps as the network is trained to gradually increase the sparsity of the network while allowing the network training steps to recover from any pruning-induced loss in accuracy. In our experience, varying the pruning frequency ∆t between 100 and 1000 training steps had a negligible impact on the final model quality. Once the model achieves the target sparsity sf , the weight masks are no longer updated. The intuition behind this sparsity function in equation
### Usage
You can prune all weights from 0% to 80% sparsity over 10 epochs with the code below.
PyTorch code
```python
from nni.compression.torch import AGP_Pruner
config_list = [{
    'initial_sparsity': 0,
    'final_sparsity': 0.8,
    'start_epoch': 0,
    'end_epoch': 10,
    'frequency': 1,
    'op_types': ['default']
}]

# load a pretrained model or train a model before using a pruner
# model = MyModel()
# model.load_state_dict(torch.load('mycheckpoint.pth'))

# AGP pruner prunes the model while fine-tuning it by adding a hook on
# optimizer.step(), so an optimizer is required to prune the model.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)

pruner = AGP_Pruner(model, config_list, optimizer, pruning_algorithm='level')
pruner.compress()
```
The AGP pruner uses the `LevelPruner` algorithm to prune the weights by default; however, you can set the `pruning_algorithm` parameter to other values to use other pruning algorithms:
* `level`: LevelPruner
* `slim`: SlimPruner
* `l1`: L1FilterPruner
* `l2`: L2FilterPruner
* `fpgm`: FPGMPruner
* `taylorfo`: TaylorFOWeightFilterPruner
* `apoz`: ActivationAPoZRankFilterPruner
* `mean_activation`: ActivationMeanRankFilterPruner
You should add the code below to update the epoch number when you finish one epoch in your training code.
PyTorch code
```python
pruner.update_epoch(epoch)
```
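For example, in a hypothetical training loop (`train` is a placeholder for your training function):
```python
for epoch in range(10):
    train(model, optimizer)    # placeholder: one epoch of training
    pruner.update_epoch(epoch) # lets AGP advance its sparsity schedule
```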
You can view the example for more information.
#### User configuration for AGP Pruner
* **initial_sparsity:** the sparsity at which the compressor starts compressing.
* **final_sparsity:** the sparsity at which the compressor finishes compressing.
* **start_epoch:** the epoch number at which the compressor starts compressing (default: 0).
* **end_epoch:** the epoch number at which the compressor finishes compressing.
* **frequency:** compress once every *frequency* epochs (default: 1).
***
## Lottery Ticket Hypothesis
In [The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635), authors Jonathan Frankle and Michael Carbin provide comprehensive measurement and analysis, and articulate the *lottery ticket hypothesis*: dense, randomly-initialized, feed-forward networks contain subnetworks (*winning tickets*) that -- when trained in isolation -- reach test accuracy comparable to the original network in a similar number of iterations.
In this paper, the authors use the following process to prune a model, called *iterative pruning*:
>1. Randomly initialize a neural network f(x;theta_0) (where theta_0 follows D_{theta}).
>2. Train the network for j iterations, arriving at parameters theta_j.
>3. Prune p% of the parameters in theta_j, creating a mask m.
>4. Reset the remaining parameters to their values in theta_0, creating the winning ticket f(x;m*theta_0).
>5. Repeat step 2, 3, and 4.
If the configured final sparsity is P (e.g., 0.8) and there are n rounds of iterative pruning, each round prunes 1-(1-P)^(1/n) of the weights that survive the previous round. For example, with P=0.8 and n=5, each round prunes 1-(0.2)^(1/5), i.e. about 27.5% of the remaining weights.
### Usage
PyTorch code
```python
from nni.compression.torch import LotteryTicketPruner
config_list = [{
    'prune_iterations': 5,
    'sparsity': 0.8,
    'op_types': ['default']
}]
pruner = LotteryTicketPruner(model, config_list, optimizer)
pruner.compress()
for _ in pruner.get_prune_iterations():
    pruner.prune_iteration_start()
    for epoch in range(epoch_num):
        ...
```
The above configuration means that there are 5 rounds of iterative pruning. As the 5 rounds are executed in the same run, LotteryTicketPruner needs `model` and `optimizer` (**note that an `lr_scheduler` should be added if one is used**) to reset their states every time a new prune iteration starts. Please use `get_prune_iterations` to get the pruning iterations, and invoke `prune_iteration_start` at the beginning of each iteration. `epoch_num` should be large enough for model convergence, because the hypothesis is that the performance (accuracy) obtained in later rounds with high sparsity can be comparable with that obtained in the first round.
*Tensorflow version will be supported later.*
#### User configuration for LotteryTicketPruner
* **prune_iterations:** the number of rounds of iterative pruning.
* **sparsity:** The final sparsity when the compression is done.
### Reproduced Experiment
We try to reproduce the experiment result of the fully connected network on MNIST using the same configuration as in the paper. The code can be found [here](https://github.com/microsoft/nni/tree/master/examples/model_compress/lottery_torch_mnist_fc.py). In this experiment, we prune 10 times; for each pruning round, we train the pruned model for 50 epochs.
![](../../img/lottery_ticket_mnist_fc.png)
The above figure shows the result of the fully connected network. `round0-sparsity-0.0` is the performance without pruning. Consistent with the paper, pruning around 80% also obtains performance similar to non-pruning, and converges a little faster. If pruning too much, e.g., more than 94%, the accuracy becomes lower and convergence becomes a little slower. One small difference from the paper: the trend of the data in the paper is clearer than in our reproduction.
@@ -22,4 +22,5 @@ For details, please refer to the following tutorials:
    Automatic Model Compression <Compressor/AutoCompression>
    Model Speedup <Compressor/ModelSpeedup>
    Compression Utilities <Compressor/CompressionUtils>
    Compression Framework <Compressor/Framework>
    Customize Compression Algorithms <Compressor/CustomizeCompressor>