Add network trimming pruning algorithm and fix bias mask(testing) (#1867)

b0c0eb7b · Tang Lang · QuanluZhang · 80a49a10 · b0c0eb7b · b0c0eb7b
Commit b0c0eb7b authored Dec 24, 2019 by Tang Lang Committed by QuanluZhang Dec 24, 2019
12 changed files
--- a/docs/en_US/Compressor/ActivationRankFilterPruner.md
+++ b/docs/en_US/Compressor/ActivationRankFilterPruner.md
+ActivationRankFilterPruner on NNI Compressor
+===
+## 1. Introduction
+ActivationRankFilterPruner is a series of pruners which prune filters according to some importance criterion calculated from the filters' output activations.
+|             Pruner             |       Importance criterion        |                       Reference paper                        |
+| :----------------------------: | :-------------------------------: | :----------------------------------------------------------: |
+| ActivationAPoZRankFilterPruner | APoZ(average percentage of zeros) | [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250) |
+| ActivationMeanRankFilterPruner | mean value of output activations  | [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440) |
+## 2. Pruners
+### ActivationAPoZRankFilterPruner
+Hengyuan Hu, Rui Peng, Yu-Wing Tai and Chi-Keung Tang,
+"[Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250)", ICLR 2016.
+ActivationAPoZRankFilterPruner prunes the filters whose output activations have the highest APoZ (average percentage of zeros), i.e. the filters that are least often active.
+The APoZ is defined as:
+![](../../img/apoz.png)
+### ActivationMeanRankFilterPruner
+Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila and Jan Kautz,
+"[Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440)", ICLR 2017.
+ActivationMeanRankFilterPruner prunes the filters with the smallest mean value of output activations.
+## 3. Usage
+PyTorch code
+```python
+from nni.compression.torch import ActivationAPoZRankFilterPruner
+config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'], 'op_names': ['conv1', 'conv2'] }]
+pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1)
+pruner.compress()
+```
+#### User configuration for ActivationAPoZRankFilterPruner
+- **sparsity:** The target sparsity to which the specified operations are compressed.
+- **op_types:** Only Conv2d is supported in ActivationAPoZRankFilterPruner
+## 4. Experiment
+TODO. 
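The APoZ criterion described in the new `ActivationRankFilterPruner.md` can be illustrated with a minimal sketch. It assumes post-ReLU activations stored as nested Python lists shaped `[batch][channel][height][width]`; the helper name `apoz_per_channel` and the toy values are hypothetical and are not taken from the NNI implementation. In the Network Trimming paper, a channel with a high APoZ is zero for most inputs.

```python
def apoz_per_channel(activations):
    """Return the APoZ (fraction of zero outputs) of each channel.

    `activations` is a nested list shaped [batch][channel][height][width]
    holding the post-ReLU outputs of one convolution layer.
    """
    num_channels = len(activations[0])
    zeros = [0] * num_channels
    totals = [0] * num_channels
    for sample in activations:
        for c, feature_map in enumerate(sample):
            for row in feature_map:
                zeros[c] += sum(1 for v in row if v == 0)
                totals[c] += len(row)
    return [z / t for z, t in zip(zeros, totals)]


# Two samples, two channels, 2x2 feature maps (toy values).
acts = [
    [[[0.0, 1.0], [0.0, 0.0]], [[2.0, 3.0], [1.0, 0.5]]],
    [[[0.0, 0.0], [0.0, 4.0]], [[0.0, 1.0], [2.0, 3.0]]],
]
apoz = apoz_per_channel(acts)
```

Here channel 0 is zero in 6 of its 8 outputs and channel 1 in 1 of 8, so channel 0 is the one that contributes least to later layers.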
--- a/docs/en_US/Compressor/Overview.md
+++ b/docs/en_US/Compressor/Overview.md
@@ -14,10 +14,14 @@ We have provided several compression algorithms, including several pruning and q
 |---|---|
 | [Level Pruner](./Pruner.md#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
 | [AGP Pruner](./Pruner.md#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
-| [L1Filter Pruner](./Pruner.md#l1filter-pruner) | Pruning least important filters in convolution layers(PRUNING FILTERS FOR EFFICIENT CONVNETS)[Reference Paper](https://arxiv.org/abs/1608.08710) |
-| [Slim Pruner](./Pruner.md#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers(Learning Efficient Convolutional Networks through Network Slimming)[Reference Paper](https://arxiv.org/abs/1708.06519) |
 | [Lottery Ticket Pruner](./Pruner.md#agp-pruner) | The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. [Reference Paper](https://arxiv.org/abs/1803.03635)|
 | [FPGM Pruner](./Pruner.md#fpgm-pruner) | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Reference Paper](https://arxiv.org/pdf/1811.00250.pdf)|
+| [L1Filter Pruner](./Pruner.md#l1filter-pruner) | Pruning filters with the smallest L1 norm of weights in convolution layers(PRUNING FILTERS FOR EFFICIENT CONVNETS)[Reference Paper](https://arxiv.org/abs/1608.08710) |
+| [L2Filter Pruner](./Pruner.md#l2filter-pruner) | Pruning filters with the smallest L2 norm of weights in convolution layers |
+| [ActivationAPoZRankFilterPruner](./Pruner.md#ActivationAPoZRankFilterPruner) | Pruning the filters with the highest APoZ (average percentage of zeros) of output activations (Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures) [Reference Paper](https://arxiv.org/abs/1607.03250) |
+| [ActivationMeanRankFilterPruner](./Pruner.md#ActivationMeanRankFilterPruner) | Pruning the filters with the smallest mean value of output activations (Pruning Convolutional Neural Networks for Resource Efficient Inference) [Reference Paper](https://arxiv.org/abs/1611.06440) |
+| [Slim Pruner](./Pruner.md#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers(Learning Efficient Convolutional Networks through Network Slimming)[Reference Paper](https://arxiv.org/abs/1708.06519) |
 **Quantization**

--- a/docs/en_US/Compressor/Pruner.md
+++ b/docs/en_US/Compressor/Pruner.md
@@ -10,7 +10,7 @@ We first sort the weights in the specified layer by their absolute values. And t
 ### Usage
 Tensorflow code
-```
+```python
 from nni.compression.tensorflow import LevelPruner
 config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
 pruner = LevelPruner(model_graph, config_list)
@@ -18,7 +18,7 @@ pruner.compress()
 ```
 PyTorch code
-```
+```python
 from nni.compression.torch import LevelPruner
 config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
 pruner = LevelPruner(model, config_list)
@@ -40,8 +40,6 @@ This is an iterative pruner, In [To prune, or not to prune: exploring the effica
 ### Usage
 You can prune all weight from 0% to 80% sparsity in 10 epoch with the code below.
-First, you should import pruner and add mask to model.
 Tensorflow code
 ```python
 from nni.compression.tensorflow import AGP_Pruner
@@ -71,7 +69,7 @@ pruner = AGP_Pruner(model, config_list)
 pruner.compress()
 ```
-Second, you should add code below to update epoch number when you finish one epoch in your training code.
+You should add the code below to update the epoch number when you finish one epoch in your training code.
 Tensorflow code 
 ```python
@@ -133,13 +131,16 @@ The above configuration means that there are 5 times of iterative pruning. As th
 * **sparsity:** The final sparsity when the compression is done.
 ***
-## FPGM Pruner
+## WeightRankFilterPruner
+WeightRankFilterPruner is a series of pruners that prune the filters with the smallest importance criterion, calculated from the weights in convolution layers, to achieve a preset level of network sparsity.
+### 1, FPGM Pruner
 This is an one-shot pruner, FPGM Pruner is an implementation of paper [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/pdf/1811.00250.pdf)
 >Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance.
-### Usage
+#### Usage
-First, you should import pruner and add mask to model.
 Tensorflow code
 ```python
@@ -163,7 +164,7 @@ pruner.compress()
 ```
 Note: FPGM Pruner is used to prune convolutional layers within deep neural networks, therefore the `op_types` field supports only convolutional layers.
-Second, you should add code below to update epoch number at beginning of each epoch.
+You should add the code below to update the epoch number at the beginning of each epoch.
 Tensorflow code
 ```python
@@ -180,7 +181,7 @@ You can view example for more information
 ***
-## L1Filter Pruner
+### 2, L1Filter Pruner
 This is an one-shot pruner, In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710), authors Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf.
@@ -193,12 +194,16 @@ This is an one-shot pruner, In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https:
 > 1. For each filter ![](http://latex.codecogs.com/gif.latex?F_{i,j}), calculate the sum of its absolute kernel weights![](http://latex.codecogs.com/gif.latex?s_j=\sum_{l=1}^{n_i}\sum|K_l|)
 > 2. Sort the filters by ![](http://latex.codecogs.com/gif.latex?s_j).
 > 3. Prune ![](http://latex.codecogs.com/gif.latex?m) filters with the smallest sum values and their corresponding feature maps. The
->   kernels in the next convolutional layer corresponding to the pruned feature maps are also
+>      kernels in the next convolutional layer corresponding to the pruned feature maps are also
->   removed.
+>        removed.
 > 4. A new kernel matrix is created for both the ![](http://latex.codecogs.com/gif.latex?i)th and ![](http://latex.codecogs.com/gif.latex?i+1)th layers, and the remaining kernel
->   weights are copied to the new model.
+>      weights are copied to the new model.
-```
+#### Usage
+PyTorch code
+```python
 from nni.compression.torch import L1FilterPruner
 config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
 pruner = L1FilterPruner(model, config_list)
@@ -208,7 +213,90 @@ pruner.compress()
 #### User configuration for L1Filter Pruner
 - **sparsity:** This is to specify the sparsity operations to be compressed to
-- **op_types:** Only Conv2d is supported in L1Filter Pruner
+- **op_types:** Only Conv1d and Conv2d are supported in L1Filter Pruner
+***
+### 3, L2Filter Pruner
+This is a structured pruning algorithm that prunes the filters with the smallest L2 norm of the weights.
+#### Usage
+PyTorch code
+```python
+from nni.compression.torch import L2FilterPruner
+config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
+pruner = L2FilterPruner(model, config_list)
+pruner.compress()
+```
+#### User configuration for L2Filter Pruner
+- **sparsity:** The target sparsity to which the specified operations are compressed.
+- **op_types:** Only Conv1d and Conv2d are supported in L2Filter Pruner
+## ActivationRankFilterPruner
+ActivationRankFilterPruner is a series of pruners that prune the filters with the smallest importance criterion, calculated from the output activations of convolution layers, to achieve a preset level of network sparsity.
+### 1, ActivationAPoZRankFilterPruner
+This is a one-shot pruner. ActivationAPoZRankFilterPruner is an implementation of the paper [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250).
+#### Usage
+PyTorch code
+```python
+from nni.compression.torch import ActivationAPoZRankFilterPruner
+config_list = [{
+    'sparsity': 0.5,
+    'op_types': ['Conv2d']
+}]
+pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1)
+pruner.compress()
+```
+Note: ActivationAPoZRankFilterPruner is used to prune convolutional layers within deep neural networks, therefore the `op_types` field supports only convolutional layers.
+You can view the example in `examples/model_compress/APoZ_torch_cifar10.py` for more information.
+#### User configuration for ActivationAPoZRankFilterPruner
+- **sparsity:** The percentage of convolutional filters to be pruned.
+- **op_types:** Only Conv2d is supported in ActivationAPoZRankFilterPruner
+***
+### 2, ActivationMeanRankFilterPruner
+This is a one-shot pruner. ActivationMeanRankFilterPruner is an implementation of the paper [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440).
+#### Usage
+PyTorch code
+```python
+from nni.compression.torch import ActivationMeanRankFilterPruner
+config_list = [{
+    'sparsity': 0.5,
+    'op_types': ['Conv2d']
+}]
+pruner = ActivationMeanRankFilterPruner(model, config_list)
+pruner.compress()
+```
+Note: ActivationMeanRankFilterPruner is used to prune convolutional layers within deep neural networks, therefore the `op_types` field supports only convolutional layers.
+You can view the example in `examples/model_compress/MeanActivation_torch_cifar10.py` for more information.
+#### User configuration for ActivationMeanRankFilterPruner
+- **sparsity:** The percentage of convolutional filters to be pruned.
+- **op_types:** Only Conv2d is supported in ActivationMeanRankFilterPruner
+***
 ## Slim Pruner
@@ -222,7 +310,7 @@ This is an one-shot pruner, In ['Learning Efficient Convolutional Networks throu
 PyTorch code
-```
+```python
 from nni.compression.torch import SlimPruner
 config_list = [{ 'sparsity': 0.8, 'op_types': ['BatchNorm2d'] }]
 pruner = SlimPruner(model, config_list)
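The difference between the L1Filter and L2Filter criteria documented in `Pruner.md` can be sketched as follows. This is a minimal illustration in plain Python, not NNI's implementation: the helper names `filter_norms` and `filter_mask` are hypothetical, each filter is assumed to be already flattened to a list of weights, and the number of pruned filters is assumed to be `int(num_filters * sparsity)`.

```python
import math


def filter_norms(filters, p=1):
    """Return the Lp norm (p = 1 or 2) of each filter's flattened weights."""
    if p == 1:
        return [sum(abs(w) for w in f) for f in filters]
    return [math.sqrt(sum(w * w for w in f)) for f in filters]


def filter_mask(filters, sparsity, p=1):
    """Return 0 for the int(n * sparsity) smallest-norm filters and 1 for the rest."""
    norms = filter_norms(filters, p)
    num_prune = int(len(filters) * sparsity)
    # Indices sorted by ascending norm; the first num_prune are pruned.
    order = sorted(range(len(filters)), key=norms.__getitem__)
    pruned = set(order[:num_prune])
    return [0 if i in pruned else 1 for i in range(len(filters))]


# Four toy filters chosen so that the two criteria disagree.
filters = [[0.5, -0.5], [0.9, 0.0], [1.0, 1.0], [2.0, 0.0]]
l1_mask = filter_mask(filters, sparsity=0.25, p=1)
l2_mask = filter_mask(filters, sparsity=0.25, p=2)
```

With these values the L1 criterion removes the second filter (L1 norm 0.9), while the L2 criterion removes the first (L2 norm about 0.71), which shows that the two pruners can select different filters from the same layer.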

--- a/docs/en_US/Compressor/L1FilterPruner.md
+++ b/docs/en_US/Compressor/L1FilterPruner.md
-L1FilterPruner on NNI Compressor
+WeightRankFilterPruner on NNI Compressor
 ===
 ## 1. Introduction
+WeightRankFilterPruner is a series of pruners which prune filters according to some importance criterion calculated from the filters' weights.
+|     Pruner     |    Importance criterion     |                       Reference paper                        |
+| :------------: | :-------------------------: | :----------------------------------------------------------: |
+| L1FilterPruner |     L1 norm of weights      | [PRUNING FILTERS FOR EFFICIENT CONVNETS](https://arxiv.org/abs/1608.08710) |
+| L2FilterPruner |     L2 norm of weights      |                                                              |
+|   FPGMPruner   | Geometric Median of weights | [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/pdf/1811.00250.pdf) |
+## 2. Pruners
+### L1FilterPruner
 L1FilterPruner is a general structured pruning algorithm for pruning filters in the convolutional layers.
 In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710), authors Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf.
@@ -16,12 +28,26 @@ In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710),
 > 1. For each filter ![](http://latex.codecogs.com/gif.latex?F_{i,j}), calculate the sum of its absolute kernel weights![](http://latex.codecogs.com/gif.latex?s_j=\sum_{l=1}^{n_i}\sum|K_l|)
 > 2. Sort the filters by ![](http://latex.codecogs.com/gif.latex?s_j).
 > 3. Prune ![](http://latex.codecogs.com/gif.latex?m) filters with the smallest sum values and their corresponding feature maps. The
->   kernels in the next convolutional layer corresponding to the pruned feature maps are also
+>      kernels in the next convolutional layer corresponding to the pruned feature maps are also
->   removed.
+>        removed.
 > 4. A new kernel matrix is created for both the ![](http://latex.codecogs.com/gif.latex?i)th and ![](http://latex.codecogs.com/gif.latex?i+1)th layers, and the remaining kernel
->   weights are copied to the new model.
+>      weights are copied to the new model.
+### L2FilterPruner
+L2FilterPruner is similar to L1FilterPruner; the only difference is that the importance criterion is the L2 norm instead of the L1 norm.
+### FPGMPruner
+Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang
+"[Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/abs/1811.00250)", CVPR 2019.
+FPGMPruner prunes the filters that are closest to the geometric median of the filters in a layer, which the paper treats as the most redundant ones.
+ ![](../../img/fpgm_fig1.png)
-## 2. Usage
+## 3. Usage
 PyTorch code
@@ -37,9 +63,9 @@ pruner.compress()
 - **sparsity:** This is to specify the sparsity operations to be compressed to
 - **op_types:** Only Conv2d is supported in L1Filter Pruner
-## 3. Experiment
+## 4. Experiment
-We implemented one of the experiments in ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710), we pruned **VGG-16** for CIFAR-10 to **VGG-16-pruned-A** in the paper, in which $64\%$ parameters are pruned. Our experiments results are as follows:
+We implemented one of the experiments in ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710) with **L1FilterPruner**: we pruned **VGG-16** for CIFAR-10 to the **VGG-16-pruned-A** configuration from the paper, in which $64\%$ of the parameters are pruned. Our experiment results are as follows:
 | Model           | Error(paper/ours) | Parameters      | Pruned   |
 | --------------- | ----------------- | --------------- | -------- |

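The FPGM criterion listed in the table of `L1FilterPruner.md` can be sketched in a few lines. This is a hedged approximation, not NNI's `FPGMPruner` code: it scores each filter by the sum of its Euclidean distances to all filters in the layer and treats the lowest scores as closest to the geometric median. The helper name `fpgm_prune_indices` and the toy filters are hypothetical.

```python
import math


def fpgm_prune_indices(filters, num_prune):
    """Return the indices of the num_prune filters closest to the geometric median.

    Approximation: a filter's score is the sum of its Euclidean distances
    to all filters in the layer; a small score means the filter sits near
    the "centre" of the others and is considered replaceable.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    scores = [sum(dist(f, g) for g in filters) for f in filters]
    order = sorted(range(len(filters)), key=scores.__getitem__)
    return sorted(order[:num_prune])


# Five toy filters, each flattened to two weights.
filters = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.4, 0.4], [3.0, 3.0]]
pruned = fpgm_prune_indices(filters, num_prune=1)
```

Note that the selected filter, `[0.4, 0.4]`, does not have the smallest norm (that is `[0.0, 0.0]`), which is the point the quoted abstract makes about redundancy versus "smaller-norm-less-important".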
--- a/docs/img/apoz.png
+++ b/docs/img/apoz.png
--- a/docs/img/fpgm_fig1.png
+++ b/docs/img/fpgm_fig1.png
--- a/examples/model_compress/APoZ_torch_cifar10.py
+++ b/examples/model_compress/APoZ_torch_cifar10.py
+import math
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torchvision import datasets, transforms
+from nni.compression.torch import ActivationAPoZRankFilterPruner
+from models.cifar10.vgg import VGG
+def train(model, device, train_loader, optimizer):
+    model.train()
+    for batch_idx, (data, target) in enumerate(train_loader):
+        data, target = data.to(device), target.to(device)
+        optimizer.zero_grad()
+        output = model(data)
+        loss = F.cross_entropy(output, target)
+        loss.backward()
+        optimizer.step()
+        if batch_idx % 100 == 0:
+            print('{:2.0f}%  Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
+def test(model, device, test_loader):
+    model.eval()
+    test_loss = 0
+    correct = 0
+    with torch.no_grad():
+        for data, target in test_loader:
+            data, target = data.to(device), target.to(device)
+            output = model(data)
+            test_loss += F.nll_loss(output, target, reduction='sum').item()
+            pred = output.argmax(dim=1, keepdim=True)
+            correct += pred.eq(target.view_as(pred)).sum().item()
+    test_loss /= len(test_loader.dataset)
+    acc = 100 * correct / len(test_loader.dataset)
+    print('Loss: {}  Accuracy: {}%\n'.format(
+        test_loss, acc))
+    return acc
+def main():
+    torch.manual_seed(0)
+    device = torch.device('cuda')
+    train_loader = torch.utils.data.DataLoader(
+        datasets.CIFAR10('./data.cifar10', train=True, download=True,
+                         transform=transforms.Compose([
+                             transforms.Pad(4),
+                             transforms.RandomCrop(32),
+                             transforms.RandomHorizontalFlip(),
+                             transforms.ToTensor(),
+                             transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
+                         ])),
+        batch_size=64, shuffle=True)
+    test_loader = torch.utils.data.DataLoader(
+        datasets.CIFAR10('./data.cifar10', train=False, transform=transforms.Compose([
+            transforms.ToTensor(),
+            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
+        ])),
+        batch_size=200, shuffle=False)
+    model = VGG(depth=16)
+    model.to(device)
+    # Train the base VGG-16 model
+    print('=' * 10 + 'Train the unpruned base model' + '=' * 10)
+    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
+    lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, 160, 0)
+    for epoch in range(160):
+        train(model, device, train_loader, optimizer)
+        test(model, device, test_loader)
+        lr_scheduler.step(epoch)
+    torch.save(model.state_dict(), 'vgg16_cifar10.pth')
+    # Test base model accuracy
+    print('=' * 10 + 'Test on the original model' + '=' * 10)
+    model.load_state_dict(torch.load('vgg16_cifar10.pth'))
+    test(model, device, test_loader)
+    # top1 = 93.51%
+    # Pruning Configuration, in paper 'PRUNING FILTERS FOR EFFICIENT CONVNETS',
+    # Conv_1, Conv_8, Conv_9, Conv_10, Conv_11, Conv_12 are pruned with 50% sparsity, as 'VGG-16-pruned-A'
+    configure_list = [{
+        'sparsity': 0.5,
+        'op_types': ['default'],
+        'op_names': ['feature.0', 'feature.24', 'feature.27', 'feature.30', 'feature.34', 'feature.37']
+    }]
+    # Prune model and test accuracy without fine tuning.
+    print('=' * 10 + 'Test on the pruned model before fine tune' + '=' * 10)
+    pruner = ActivationAPoZRankFilterPruner(model, configure_list)
+    model = pruner.compress()
+    test(model, device, test_loader)
+    # top1 = 88.19%
+    # Fine tune the pruned model for 40 epochs and test accuracy
+    print('=' * 10 + 'Fine tuning' + '=' * 10)
+    optimizer_finetune = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
+    best_top1 = 0
+    for epoch in range(40):
+        pruner.update_epoch(epoch)
+        print('# Epoch {} #'.format(epoch))
+        train(model, device, train_loader, optimizer_finetune)
+        top1 = test(model, device, test_loader)
+        if top1 > best_top1:
+            best_top1 = top1
+            # Export the best model, 'model_path' stores state_dict of the pruned model,
+            # mask_path stores mask_dict of the pruned model
+            pruner.export_model(model_path='pruned_vgg16_cifar10.pth', mask_path='mask_vgg16_cifar10.pth')
+    # Test the exported model
+    print('=' * 10 + 'Test on the pruned model after fine tune' + '=' * 10)
+    new_model = VGG(depth=16)
+    new_model.to(device)
+    new_model.load_state_dict(torch.load('pruned_vgg16_cifar10.pth'))
+    test(new_model, device, test_loader)
+    # top1 = 93.53%
+if __name__ == '__main__':
+    main()
--- a/examples/model_compress/MeanActivation_torch_cifar10.py
+++ b/examples/model_compress/MeanActivation_torch_cifar10.py
+import math
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torchvision import datasets, transforms
+from nni.compression.torch import ActivationMeanRankFilterPruner
+from models.cifar10.vgg import VGG
+def train(model, device, train_loader, optimizer):
+    model.train()
+    for batch_idx, (data, target) in enumerate(train_loader):
+        data, target = data.to(device), target.to(device)
+        optimizer.zero_grad()
+        output = model(data)
+        loss = F.cross_entropy(output, target)
+        loss.backward()
+        optimizer.step()
+        if batch_idx % 100 == 0:
+            print('{:2.0f}%  Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
+def test(model, device, test_loader):
+    model.eval()
+    test_loss = 0
+    correct = 0
+    with torch.no_grad():
+        for data, target in test_loader:
+            data, target = data.to(device), target.to(device)
+            output = model(data)
+            test_loss += F.nll_loss(output, target, reduction='sum').item()
+            pred = output.argmax(dim=1, keepdim=True)
+            correct += pred.eq(target.view_as(pred)).sum().item()
+    test_loss /= len(test_loader.dataset)
+    acc = 100 * correct / len(test_loader.dataset)
+    print('Loss: {}  Accuracy: {}%\n'.format(
+        test_loss, acc))
+    return acc
+def main():
+    torch.manual_seed(0)
+    device = torch.device('cuda')
+    train_loader = torch.utils.data.DataLoader(
+        datasets.CIFAR10('./data.cifar10', train=True, download=True,
+                         transform=transforms.Compose([
+                             transforms.Pad(4),
+                             transforms.RandomCrop(32),
+                             transforms.RandomHorizontalFlip(),
+                             transforms.ToTensor(),
+                             transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
+                         ])),
+        batch_size=64, shuffle=True)
+    test_loader = torch.utils.data.DataLoader(
+        datasets.CIFAR10('./data.cifar10', train=False, transform=transforms.Compose([
+            transforms.ToTensor(),
+            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
+        ])),
+        batch_size=200, shuffle=False)
+    model = VGG(depth=16)
+    model.to(device)
+    # Train the base VGG-16 model
+    print('=' * 10 + 'Train the unpruned base model' + '=' * 10)
+    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
+    lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, 160, 0)
+    for epoch in range(160):
+        train(model, device, train_loader, optimizer)
+        test(model, device, test_loader)
+        lr_scheduler.step(epoch)
+    torch.save(model.state_dict(), 'vgg16_cifar10.pth')
+    # Test base model accuracy
+    print('=' * 10 + 'Test on the original model' + '=' * 10)
+    model.load_state_dict(torch.load('vgg16_cifar10.pth'))
+    test(model, device, test_loader)
+    # top1 = 93.51%
+    # Pruning Configuration, in paper 'PRUNING FILTERS FOR EFFICIENT CONVNETS',
+    # Conv_1, Conv_8, Conv_9, Conv_10, Conv_11, Conv_12 are pruned with 50% sparsity, as 'VGG-16-pruned-A'
+    configure_list = [{
+        'sparsity': 0.5,
+        'op_types': ['default'],
+        'op_names': ['feature.0', 'feature.24', 'feature.27', 'feature.30', 'feature.34', 'feature.37']
+    }]
+    # Prune model and test accuracy without fine tuning.
+    print('=' * 10 + 'Test on the pruned model before fine tune' + '=' * 10)
+    pruner = ActivationMeanRankFilterPruner(model, configure_list)
+    model = pruner.compress()
+    test(model, device, test_loader)
+    # top1 = 88.19%
+    # Fine tune the pruned model for 40 epochs and test accuracy
+    print('=' * 10 + 'Fine tuning' + '=' * 10)
+    optimizer_finetune = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
+    best_top1 = 0
+    for epoch in range(40):
+        pruner.update_epoch(epoch)
+        print('# Epoch {} #'.format(epoch))
+        train(model, device, train_loader, optimizer_finetune)
+        top1 = test(model, device, test_loader)
+        if top1 > best_top1:
+            best_top1 = top1
+            # Export the best model, 'model_path' stores state_dict of the pruned model,
+            # mask_path stores mask_dict of the pruned model
+            pruner.export_model(model_path='pruned_vgg16_cifar10.pth', mask_path='mask_vgg16_cifar10.pth')
+    # Test the exported model
+    print('=' * 10 + 'Test on the pruned model after fine tune' + '=' * 10)
+    new_model = VGG(depth=16)
+    new_model.to(device)
+    new_model.load_state_dict(torch.load('pruned_vgg16_cifar10.pth'))
+    test(new_model, device, test_loader)
+    # top1 = 93.53%
+if __name__ == '__main__':
+    main()
--- a/src/sdk/pynni/nni/compression/torch/builtin_pruners.py
+++ b/src/sdk/pynni/nni/compression/torch/builtin_pruners.py
--- a/src/sdk/pynni/nni/compression/torch/compressor.py
+++ b/src/sdk/pynni/nni/compression/torch/compressor.py
@@ -16,6 +16,7 @@ class LayerInfo:
        self._forward = None
 class Compressor:
    """
    Abstract base PyTorch compressor
@@ -193,10 +194,16 @@ class Pruner(Compressor):
        layer._forward = layer.module.forward
        def new_forward(*inputs):
+            mask = self.calc_mask(layer, config)
            # apply mask to weight
            old_weight = layer.module.weight.data
-            mask = self.calc_mask(layer, config)
+            mask_weight = mask['weight']
-            layer.module.weight.data = old_weight.mul(mask)
+            layer.module.weight.data = old_weight.mul(mask_weight)
+            # apply mask to bias
+            if 'bias' in mask and hasattr(layer.module, 'bias') and layer.module.bias is not None:
+                old_bias = layer.module.bias.data
+                mask_bias = mask['bias']
+                layer.module.bias.data = old_bias.mul(mask_bias)
            # calculate forward
            ret = layer._forward(*inputs)
            return ret
@@ -224,12 +231,14 @@ class Pruner(Compressor):
        for name, m in self.bound_model.named_modules():
            if name == "":
                continue
-            mask = self.mask_dict.get(name)
+            masks = self.mask_dict.get(name)
-            if mask is not None:
+            if masks is not None:
-                mask_sum = mask.sum().item()
+                mask_sum = masks['weight'].sum().item()
-                mask_num = mask.numel()
+                mask_num = masks['weight'].numel()
                _logger.info('Layer: %s  Sparsity: %.2f', name, 1 - mask_sum / mask_num)
-                m.weight.data = m.weight.data.mul(mask)
+                m.weight.data = m.weight.data.mul(masks['weight'])
+                if 'bias' in masks and hasattr(m, 'bias') and m.bias is not None:
+                    m.bias.data = m.bias.data.mul(masks['bias'])
            else:
                _logger.info('Layer: %s  NOT compressed', name)
        torch.save(self.bound_model.state_dict(), model_path)
@@ -258,7 +267,6 @@ class Quantizer(Compressor):
        """
        quantize should overload this method to quantize weight.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        weight : Tensor
@@ -272,7 +280,6 @@ class Quantizer(Compressor):
        """
        quantize should overload this method to quantize output.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        output : Tensor
@@ -286,7 +293,6 @@ class Quantizer(Compressor):
        """
        quantize should overload this method to quantize input.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        inputs : Tensor
@@ -300,7 +306,6 @@ class Quantizer(Compressor):
    def _instrument_layer(self, layer, config):
        """
        Create a wrapper forward function to replace the original one.
        Parameters
        ----------
        layer : LayerInfo
@@ -365,7 +370,6 @@ class QuantGrad(torch.autograd.Function):
        """
        This method should be overrided by subclass to provide customized backward function,
        default implementation is Straight-Through Estimator
        Parameters
        ----------
        tensor : Tensor
@@ -375,7 +379,6 @@ class QuantGrad(torch.autograd.Function):
        quant_type : QuantType
            the type of quantization, it can be `QuantType.QUANT_INPUT`, `QuantType.QUANT_WEIGHT`, `QuantType.QUANT_OUTPUT`,
            you can define different behavior for different types.
        Returns
        -------
        tensor
@@ -399,3 +402,4 @@ def _check_weight(module):
        return isinstance(module.weight.data, torch.Tensor)
    except AttributeError:
        return False
\ No newline at end of file
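The change to `compressor.py` above switches `calc_mask` from returning a single tensor to returning a dict with a `'weight'` entry and an optional `'bias'` entry. The following sketch shows that contract in isolation. It is an illustration only: plain Python lists stand in for tensors, and the helper name `apply_masks` is hypothetical rather than part of NNI.

```python
def apply_masks(params, masks):
    """Multiply each parameter by its mask, element by element.

    `params` and `masks` map the names 'weight' and 'bias' to flat lists.
    A parameter without a mask, or a missing (None) bias, is left unchanged,
    mirroring the guards in the diff above.
    """
    masked = dict(params)
    for name in ('weight', 'bias'):
        if name in masks and params.get(name) is not None:
            masked[name] = [p * m for p, m in zip(params[name], masks[name])]
    return masked


layer = {'weight': [0.2, -1.5, 0.7], 'bias': [0.1, 0.3, -0.4]}
masks = {'weight': [1, 0, 1], 'bias': [1, 0, 1]}
masked = apply_masks(layer, masks)

# A layer created without a bias: only the weight mask is applied.
no_bias = apply_masks({'weight': [0.2, -1.5], 'bias': None},
                      {'weight': [0, 1], 'bias': [0, 1]})

# Sparsity as logged by export_model: 1 - mask_sum / mask_num.
sparsity = 1 - sum(masks['weight']) / len(masks['weight'])
```

Masking the bias together with the weight matters for filter pruning: a filter whose weights are all zero would otherwise still emit its bias as a constant output.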
--- a/src/sdk/pynni/nni/compression/torch/lottery_ticket.py
+++ b/src/sdk/pynni/nni/compression/torch/lottery_ticket.py
@@ -17,6 +17,7 @@ class LotteryTicketPruner(Pruner):
    4. Reset the remaining parameters to their values in theta_0, creating the winning ticket f(x;m*theta_0).
    5. Repeat step 2, 3, and 4.
    """
    def __init__(self, model, config_list, optimizer, lr_scheduler=None, reset_weights=True):
        """
        Parameters
@@ -55,7 +56,8 @@ class LotteryTicketPruner(Pruner):
            assert 'prune_iterations' in config, 'prune_iterations must exist in your config'
            assert 'sparsity' in config, 'sparsity must exist in your config'
            if prune_iterations is not None:
-                assert prune_iterations == config['prune_iterations'], 'The values of prune_iterations must be equal in your config'
+                assert prune_iterations == config[
+                    'prune_iterations'], 'The values of prune_iterations must be equal in your config'
            prune_iterations = config['prune_iterations']
        return prune_iterations
@@ -67,8 +69,8 @@ class LotteryTicketPruner(Pruner):
            if print_mask:
                print('mask: ', mask)
            # calculate current sparsity
-            mask_num = mask.sum().item()
+            mask_num = mask['weight'].sum().item()
-            mask_size = mask.numel()
+            mask_size = mask['weight'].numel()
            print('sparsity: ', 1 - mask_num / mask_size)
        torch.set_printoptions(profile='default')
@@ -84,11 +86,11 @@ class LotteryTicketPruner(Pruner):
            curr_sparsity = self._calc_sparsity(sparsity)
            assert self.mask_dict.get(op_name) is not None
            curr_mask = self.mask_dict.get(op_name)
-            w_abs = weight.abs() * curr_mask
+            w_abs = weight.abs() * curr_mask['weight']
            k = int(w_abs.numel() * curr_sparsity)
            threshold = torch.topk(w_abs.view(-1), k, largest=False).values.max()
            mask = torch.gt(w_abs, threshold).type_as(weight)
-        return mask
+        return {'weight': mask}
    def calc_mask(self, layer, config):
        """

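Note on the hunks above: the `lottery_ticket.py` changes move masks from a bare tensor to a dict keyed by parameter name (`{'weight': mask}`), so callers now index `mask['weight']`. The following is a minimal, dependency-free sketch of that convention, not the NNI implementation. It uses plain Python lists in place of `torch` tensors, and the helper names `magnitude_mask` and `sparsity_of` are hypothetical. The `k == 0` guard is also an addition for illustration and does not appear in the diff.

```python
def magnitude_mask(weights, curr_mask, sparsity):
    """Return a dict mask that zeroes the smallest-magnitude entries.

    weights   : flat list of floats (stand-in for a weight tensor)
    curr_mask : dict with a 'weight' list of 0.0/1.0, same length as weights
    sparsity  : fraction of all entries that should end up masked out
    """
    # Already-pruned entries get magnitude 0, so they are pruned again first.
    w_abs = [abs(w) * m for w, m in zip(weights, curr_mask['weight'])]
    k = int(len(w_abs) * sparsity)
    if k == 0:
        # Illustrative guard (assumption): nothing to prune, keep current mask.
        return {'weight': list(curr_mask['weight'])}
    # The k-th smallest magnitude plays the role of topk(..., largest=False).max().
    threshold = sorted(w_abs)[k - 1]
    mask = [1.0 if w > threshold else 0.0 for w in w_abs]
    return {'weight': mask}


def sparsity_of(mask):
    """Fraction of zero entries in mask['weight']."""
    kept = sum(mask['weight'])
    return 1 - kept / len(mask['weight'])


weights = [0.5, -0.1, 0.3, -0.9, 0.05, 0.7, -0.2, 0.4]
initial = {'weight': [1.0] * len(weights)}

# Two pruning rounds, each starting from the previous round's mask.
round1 = magnitude_mask(weights, initial, sparsity=0.25)
round2 = magnitude_mask(weights, round1, sparsity=0.5)

print(round1['weight'], sparsity_of(round1))
print(round2['weight'], sparsity_of(round2))
```

Keeping the mask as a dict leaves room for additional keys, such as the `'bias'` entry that the slim pruner tests in this commit check alongside `'weight'`.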
--- a/src/sdk/pynni/tests/test_compressor.py
+++ b/src/sdk/pynni/tests/test_compressor.py
@@ -136,12 +136,12 @@ class CompressorTestCase(TestCase):
        model.conv2.weight.data = torch.tensor(w).float()
        layer = torch_compressor.compressor.LayerInfo('conv2', model.conv2)
        masks = pruner.calc_mask(layer, config_list[0])
-        assert all(torch.sum(masks, (1, 2, 3)).numpy() == np.array([45., 45., 45., 45., 0., 0., 45., 45., 45., 45.]))
+        assert all(torch.sum(masks['weight'], (1, 2, 3)).numpy() == np.array([45., 45., 45., 45., 0., 0., 45., 45., 45., 45.]))
        pruner.update_epoch(1)
        model.conv2.weight.data = torch.tensor(w).float()
        masks = pruner.calc_mask(layer, config_list[1])
-        assert all(torch.sum(masks, (1, 2, 3)).numpy() == np.array([45., 45., 0., 0., 0., 0., 0., 0., 45., 45.]))
+        assert all(torch.sum(masks['weight'], (1, 2, 3)).numpy() == np.array([45., 45., 0., 0., 0., 0., 0., 0., 45., 45.]))
    @tf2
    def test_tf_fpgm_pruner(self):
@@ -190,8 +190,8 @@ class CompressorTestCase(TestCase):
        mask1 = pruner.calc_mask(layer1, config_list[0])
        layer2 = torch_compressor.compressor.LayerInfo('conv2', model.conv2)
        mask2 = pruner.calc_mask(layer2, config_list[1])
-        assert all(torch.sum(mask1, (1, 2, 3)).numpy() == np.array([0., 27., 27., 27., 27.]))
+        assert all(torch.sum(mask1['weight'], (1, 2, 3)).numpy() == np.array([0., 27., 27., 27., 27.]))
-        assert all(torch.sum(mask2, (1, 2, 3)).numpy() == np.array([0., 0., 0., 27., 27.]))
+        assert all(torch.sum(mask2['weight'], (1, 2, 3)).numpy() == np.array([0., 0., 0., 27., 27.]))
    def test_torch_slim_pruner(self):
        """
@@ -218,8 +218,10 @@ class CompressorTestCase(TestCase):
        mask1 = pruner.calc_mask(layer1, config_list[0])
        layer2 = torch_compressor.compressor.LayerInfo('bn2', model.bn2)
        mask2 = pruner.calc_mask(layer2, config_list[0])
-        assert all(mask1.numpy() == np.array([0., 1., 1., 1., 1.]))
+        assert all(mask1['weight'].numpy() == np.array([0., 1., 1., 1., 1.]))
-        assert all(mask2.numpy() == np.array([0., 1., 1., 1., 1.]))
+        assert all(mask2['weight'].numpy() == np.array([0., 1., 1., 1., 1.]))
+        assert all(mask1['bias'].numpy() == np.array([0., 1., 1., 1., 1.]))
+        assert all(mask2['bias'].numpy() == np.array([0., 1., 1., 1., 1.]))
        config_list = [{'sparsity': 0.6, 'op_types': ['BatchNorm2d']}]
        model.bn1.weight.data = torch.tensor(w).float()
@@ -230,8 +232,10 @@ class CompressorTestCase(TestCase):
        mask1 = pruner.calc_mask(layer1, config_list[0])
        layer2 = torch_compressor.compressor.LayerInfo('bn2', model.bn2)
        mask2 = pruner.calc_mask(layer2, config_list[0])
-        assert all(mask1.numpy() == np.array([0., 0., 0., 1., 1.]))
+        assert all(mask1['weight'].numpy() == np.array([0., 0., 0., 1., 1.]))
-        assert all(mask2.numpy() == np.array([0., 0., 0., 1., 1.]))
+        assert all(mask2['weight'].numpy() == np.array([0., 0., 0., 1., 1.]))
+        assert all(mask1['bias'].numpy() == np.array([0., 0., 0., 1., 1.]))
+        assert all(mask2['bias'].numpy() == np.array([0., 0., 0., 1., 1.]))
    def test_torch_QAT_quantizer(self):
        model = TorchModel()