"...git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "29ece0db7966a8d25fdd8f314a13096194bc5568"
Unverified commit f5caa193 authored by Guoxin, committed by GitHub

Auto pruners (#2490)

parent a3b0bd7d
...@@ -144,6 +144,10 @@ Within the following table, we summarized the current NNI capabilities, we are g
<li><a href="docs/en_US/Compressor/Pruner.md#agp-pruner">AGP Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#slim-pruner">Slim Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#fpgm-pruner">FPGM Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#netadapt-pruner">NetAdapt Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#simulatedannealing-pruner">SimulatedAnnealing Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#admm-pruner">ADMM Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#autocompress-pruner">AutoCompress Pruner</a></li>
</ul>
<b>Quantization</b>
<ul>
......
...@@ -37,6 +37,10 @@ Pruning algorithms compress the original network by removing redundant weights o
| [ActivationMeanRankFilterPruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#activationmeanrankfilterpruner) | Pruning filters based on the metric that calculates the smallest mean value of output activations |
| [Slim Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers(Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
| [TaylorFO Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#taylorfoweightfilterpruner) | Pruning filters based on the first order taylor expansion on weights(Importance Estimation for Neural Network Pruning) [Reference Paper](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf) |
| [ADMM Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#admm-pruner) | Pruning based on ADMM optimization technique [Reference Paper](https://arxiv.org/abs/1804.03294) |
| [NetAdapt Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#netadapt-pruner) | Automatically simplify a pretrained network to meet the resource budget by iterative pruning [Reference Paper](https://arxiv.org/abs/1804.03230) |
| [SimulatedAnnealing Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#simulatedannealing-pruner) | Automatic pruning with a guided heuristic search method, the Simulated Annealing algorithm [Reference Paper](https://arxiv.org/abs/1907.03141) |
| [AutoCompress Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#autocompress-pruner) | Automatic pruning by iteratively calling SimulatedAnnealing Pruner and ADMM Pruner [Reference Paper](https://arxiv.org/abs/1907.03141) |
### Quantization Algorithms
......
...@@ -17,8 +17,12 @@ We provide several pruning algorithms that support fine-grained weight pruning a
**Pruning Schedule**
* [AGP Pruner](#agp-pruner)
* [NetAdapt Pruner](#netadapt-pruner)
* [SimulatedAnnealing Pruner](#simulatedannealing-pruner)
* [AutoCompress Pruner](#autocompress-pruner)
**Others**
* [ADMM Pruner](#admm-pruner)
* [Lottery Ticket Hypothesis](#lottery-ticket-hypothesis)
## Level Pruner
...@@ -349,6 +353,290 @@ You can view example for more information
***
## NetAdapt Pruner
NetAdapt allows a user to automatically simplify a pretrained network to meet the resource budget.
Given the overall sparsity, NetAdapt automatically generates the sparsity distribution among the different layers by iterative pruning.
For more details, please refer to [NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications](https://arxiv.org/abs/1804.03230).
![](../../img/algo_NetAdapt.png)
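At a high level, each iteration tries pruning every candidate layer by a small per-iteration budget, short-term fine-tunes each candidate, and keeps the one that evaluates best. Below is a minimal, hedged sketch of that greedy loop; `prunable_layers`, `prune_one_layer`, `short_term_fine_tune` and `evaluate` are hypothetical helpers standing in for the pruner's internals, not NNI APIs.
```python
# Hypothetical sketch of the NetAdapt-style greedy loop described above.
def netadapt_sketch(model, prunable_layers, overall_sparsity, sparsity_per_iteration,
                    prune_one_layer, short_term_fine_tune, evaluate):
    pruned_so_far = 0.0
    while pruned_so_far < overall_sparsity:
        candidates = []
        # try pruning each layer by the per-iteration budget ...
        for name in prunable_layers:
            candidate = prune_one_layer(model, name, sparsity_per_iteration)
            short_term_fine_tune(candidate)
            candidates.append((evaluate(candidate), candidate))
        # ... and keep the candidate that evaluates best
        _, model = max(candidates, key=lambda c: c[0])
        pruned_so_far += sparsity_per_iteration
    return model
```
The actual NetAdaptPruner tracks the remaining resource budget in terms of weight counts rather than a flat sparsity counter; see the usage below for the real API.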
#### Usage
PyTorch code
```python
from nni.compression.torch import NetAdaptPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = NetAdaptPruner(model, config_list, short_term_fine_tuner=short_term_fine_tuner, evaluator=evaluator, base_algo='l1', experiment_data_dir='./')
pruner.compress()
```
You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/auto_pruners_torch.py) for more information.
#### User configuration for NetAdapt Pruner
- **sparsity:** The target overall sparsity.
- **op_types:** The operation type to prune. If `base_algo` is `l1` or `l2`, then only `Conv2d` is supported as `op_types`.
- **short_term_fine_tuner:** Function to short-term fine tune the masked model.
This function should include `model` as the only parameter, and fine tune the model for a short term after each pruning iteration.
Example:
```python
>>> def short_term_fine_tuner(model, epoch=3):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> train_loader = ...
>>> criterion = torch.nn.CrossEntropyLoss()
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
>>> model.train()
>>> for _ in range(epoch):
>>> for batch_idx, (data, target) in enumerate(train_loader):
>>> data, target = data.to(device), target.to(device)
>>> optimizer.zero_grad()
>>> output = model(data)
>>> loss = criterion(output, target)
>>> loss.backward()
>>> optimizer.step()
```
- **evaluator:** Function to evaluate the masked model. This function should include `model` as the only parameter, and returns a scalar value.
Example:
```python
>>> def evaluator(model):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> val_loader = ...
>>> model.eval()
>>> correct = 0
>>> with torch.no_grad():
>>> for data, target in val_loader:
>>> data, target = data.to(device), target.to(device)
>>> output = model(data)
>>> # get the index of the max log-probability
>>> pred = output.argmax(dim=1, keepdim=True)
>>> correct += pred.eq(target.view_as(pred)).sum().item()
>>> accuracy = correct / len(val_loader.dataset)
>>> return accuracy
```
- **optimize_mode:** Optimize mode, `maximize` or `minimize`, by default `maximize`.
- **base_algo:** Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`.
Given the sparsity distribution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
- **sparsity_per_iteration:** The sparsity to prune in each iteration. NetAdapt Pruner prunes the model by the same amount in each iteration to meet the resource budget progressively.
- **experiment_data_dir:** Path to save experiment data, including the config_list generated for the base pruning algorithm and the performance of the pruned model.
## SimulatedAnnealing Pruner
We implement a guided heuristic search method, the Simulated Annealing (SA) algorithm, with an enhancement of guided search based on prior experience.
The enhanced SA technique is based on the observation that a DNN layer with a larger number of weights often tolerates a higher degree of compression with less impact on overall accuracy.
- Randomly initialize a pruning rate distribution (sparsities).
- While current_temperature > stop_temperature:
1. Generate a perturbation to the current distribution
2. Perform fast evaluation on the perturbed distribution
3. Accept the perturbation according to the performance and probability; if not accepted, return to step 1
4. Cool down: current_temperature <- current_temperature * cool_down_rate
For more details, please refer to [AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates](https://arxiv.org/abs/1907.03141). A minimal sketch of this annealing loop is shown below.
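Below is a minimal, self-contained sketch of that annealing loop, assuming a hypothetical `evaluate_sparsities` callable in place of the pruner's internal fast evaluation; it omits the rescaling step that keeps the overall sparsity on target.
```python
import math
import random

def simulated_annealing_sketch(init_sparsities, evaluate_sparsities,
                               start_temperature=100, stop_temperature=20,
                               cool_down_rate=0.9, perturbation_magnitude=0.35):
    """Search for a per-layer sparsity distribution by simulated annealing."""
    current = list(init_sparsities)
    current_score = evaluate_sparsities(current)
    temperature = start_temperature
    while temperature > stop_temperature:
        # the perturbation magnitude shrinks as the temperature cools down
        magnitude = perturbation_magnitude * temperature / start_temperature
        candidate = [min(max(s + random.uniform(-magnitude, magnitude), 0.0), 1.0)
                     for s in current]
        candidate_score = evaluate_sparsities(candidate)
        delta = candidate_score - current_score
        # always accept improvements; accept worse candidates with a probability
        # that decreases as the temperature drops
        if delta > 0 or random.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
        temperature *= cool_down_rate
    return current
```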
#### Usage
PyTorch code
```python
from nni.compression.torch import SimulatedAnnealingPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = SimulatedAnnealingPruner(model, config_list, evaluator=evaluator, base_algo='l1', cool_down_rate=0.9, experiment_data_dir='./')
pruner.compress()
```
You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/auto_pruners_torch.py) for more information.
#### User configuration for SimulatedAnnealing Pruner
- **sparsity:** The target overall sparsity.
- **op_types:** The operation type to prune. If `base_algo` is `l1` or `l2`, then only `Conv2d` is supported as `op_types`.
- **evaluator:** Function to evaluate the masked model. This function should include `model` as the only parameter, and returns a scalar value.
Example:
```python
>>> def evaluator(model):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> val_loader = ...
>>> model.eval()
>>> correct = 0
>>> with torch.no_grad():
>>> for data, target in val_loader:
>>> data, target = data.to(device), target.to(device)
>>> output = model(data)
>>> # get the index of the max log-probability
>>> pred = output.argmax(dim=1, keepdim=True)
>>> correct += pred.eq(target.view_as(pred)).sum().item()
>>> accuracy = correct / len(val_loader.dataset)
>>> return accuracy
```
- **optimize_mode:** Optimize mode, `maximize` or `minimize`, by default `maximize`.
- **base_algo:** Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`.
Given the sparsity distribution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
- **start_temperature:** Simulated Annealing related parameter.
- **stop_temperature:** Simulated Annealing related parameter.
- **cool_down_rate:** Simulated Annealing related parameter.
- **perturbation_magnitude:** Initial perturbation magnitude to the sparsities. The magnitude decreases with the current temperature.
- **experiment_data_dir:** Path to save experiment data, including the config_list generated for the base pruning algorithm, the performance of the pruned model and the pruning history.
## AutoCompress Pruner
For each round, AutoCompressPruner prunes the model with the same per-round sparsity so that the rounds compound to the overall sparsity:
1. Generate a sparsity distribution using SimulatedAnnealingPruner
2. Perform ADMM-based structured pruning to generate the pruning result for the next round.
Here we use `speedup` to perform real pruning.
For more details, please refer to [AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates](https://arxiv.org/abs/1907.03141). A short example of the per-round sparsity schedule is shown below.
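The per-round sparsity is chosen so that the rounds compound to the overall target; the small example below mirrors the formula used in `AutoCompressPruner.compress` in this PR.
```python
# After each round the remaining weights shrink by (1 - sparsity_each_round),
# so num_iterations rounds hit the overall target when
# (1 - sparsity_each_round) ** num_iterations == 1 - overall_sparsity.
overall_sparsity = 0.5
num_iterations = 3
sparsity_each_round = 1 - (1 - overall_sparsity) ** (1 / num_iterations)
print(round(sparsity_each_round, 4))                              # ~0.2063
print(round(1 - (1 - sparsity_each_round) ** num_iterations, 4))  # 0.5, the overall target
```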
#### Usage
PyTorch code
```python
from nni.compression.torch import AutoCompressPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = AutoCompressPruner(
model, config_list, trainer=trainer, evaluator=evaluator,
dummy_input=dummy_input, num_iterations=3, optimize_mode='maximize', base_algo='l1',
cool_down_rate=0.9, admm_num_iterations=30, admm_training_epochs=5, experiment_data_dir='./')
pruner.compress()
```
You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/auto_pruners_torch.py) for more information.
#### User configuration for AutoCompress Pruner
- **sparsity:** The target overall sparsity.
- **op_types:** The operation type to prune. If `base_algo` is `l1` or `l2`, then only `Conv2d` is supported as `op_types`.
- **trainer:** Function used for the first subproblem of the ADMM optimization.
Users should write this function as a normal function to train the PyTorch model and include `model, optimizer, criterion, epoch, callback` as function arguments.
Here `callback` acts as an L2 regularizer as presented in formula (7) of the original paper.
The logic of `callback` is implemented inside the Pruner; users are only required to insert `callback()` between `loss.backward()` and `optimizer.step()`.
Example:
```python
>>> def trainer(model, criterion, optimizer, epoch, callback):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> train_loader = ...
>>> model.train()
>>> for batch_idx, (data, target) in enumerate(train_loader):
>>> data, target = data.to(device), target.to(device)
>>> optimizer.zero_grad()
>>> output = model(data)
>>> loss = criterion(output, target)
>>> loss.backward()
>>> # callback should be inserted between loss.backward() and optimizer.step()
>>> if callback:
>>> callback()
>>> optimizer.step()
```
- **evaluator:** Function to evaluate the masked model. This function should include `model` as the only parameter, and returns a scalar value.
Example:
```python
>>> def evaluator(model):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> val_loader = ...
>>> model.eval()
>>> correct = 0
>>> with torch.no_grad():
>>> for data, target in val_loader:
>>> data, target = data.to(device), target.to(device)
>>> output = model(data)
>>> # get the index of the max log-probability
>>> pred = output.argmax(dim=1, keepdim=True)
>>> correct += pred.eq(target.view_as(pred)).sum().item()
>>> accuracy = correct / len(val_loader.dataset)
>>> return accuracy
```
- **dummy_input:** The dummy input for model speedup; users should put it on the right device before passing it in.
- **num_iterations:** The number of overall iterations.
- **optimize_mode:** Optimize mode, `maximize` or `minimize`, by default `maximize`.
- **base_algo:** Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`.
Given the sparsity distribution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
- **start_temperature:** Simulated Annealing related parameter.
- **stop_temperature:** Simulated Annealing related parameter.
- **cool_down_rate:** Simulated Annealing related parameter.
- **perturbation_magnitude:** Initial perturbation magnitude to the sparsities. The magnitude decreases with current temperature.
- **admm_num_iterations:** Number of iterations of ADMM Pruner.
- **admm_training_epochs:** Training epochs of the first optimization subproblem of ADMMPruner.
- **experiment_data_dir:** Path to store temporary experiment data.
## ADMM Pruner
Alternating Direction Method of Multipliers (ADMM) is a mathematical optimization technique
that decomposes the original nonconvex problem into two subproblems which can be solved iteratively. In the weight pruning problem, these two subproblems are solved via 1) the gradient descent algorithm and 2) Euclidean projection, respectively.
During the process of solving these two subproblems, the weights of the original model are changed. A one-shot pruner is then applied to prune the model according to the given config list.
This solution framework applies to both non-structured and different variations of structured pruning schemes.
For more details, please refer to [A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers](https://arxiv.org/abs/1804.03294). A condensed sketch of the ADMM loop is shown below.
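The sketch below mirrors the update order of `ADMMPruner.compress` added in this PR; `train_one_epoch` and `project_to_sparsity` are hypothetical stand-ins for the user-supplied trainer and the pruner's internal Euclidean projection.
```python
import torch

# Condensed sketch of the ADMM optimization loop described above.
def admm_sketch(weights, sparsities, train_one_epoch, project_to_sparsity,
                num_iterations=30, training_epochs=5, rho=1e-4):
    Z = [w.data.clone() for w in weights]        # Z_i^0 = W_i^0
    U = [torch.zeros_like(w) for w in weights]   # U_i^0 = 0
    for _ in range(num_iterations):
        def callback():
            # extra gradient step contributed by the quadratic penalty term
            for w, z, u in zip(weights, Z, U):
                w.data -= rho * (w.data - z + u)
        # subproblem 1: update W by ordinary training plus the callback
        for _ in range(training_epochs):
            train_one_epoch(callback)
        # subproblem 2: update Z by Euclidean projection, then the dual variable U
        for i, (w, s) in enumerate(zip(weights, sparsities)):
            Z[i] = project_to_sparsity(w.data + U[i], s)
            U[i] = U[i] + w.data - Z[i]
    return Z
```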
#### Usage
PyTorch code
```python
from nni.compression.torch import ADMMPruner
config_list = [{
'sparsity': 0.8,
'op_types': ['Conv2d'],
'op_names': ['conv1']
}, {
'sparsity': 0.92,
'op_types': ['Conv2d'],
'op_names': ['conv2']
}]
pruner = ADMMPruner(model, config_list, trainer=trainer, num_iterations=30, training_epochs=5)
pruner.compress()
```
You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/auto_pruners_torch.py) for more information.
#### User configuration for ADMM Pruner
- **sparsity:** The target sparsity to which the specified operations are compressed.
- **op_types:** The operation type to prune. If `base_algo` is `l1` or `l2`, then only `Conv2d` is supported as `op_types`.
- **trainer:** Function used for the first subproblem in the ADMM optimization; note that this is not used for fine-tuning.
Users should write this function as a normal function to train the PyTorch model and include `model, optimizer, criterion, epoch, callback` as function arguments.
Here `callback` acts as an L2 regularizer as presented in formula (7) of the original paper.
The logic of `callback` is implemented inside the Pruner; users are only required to insert `callback()` between `loss.backward()` and `optimizer.step()`.
Example:
```python
>>> def trainer(model, criterion, optimizer, epoch, callback):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> train_loader = ...
>>> model.train()
>>> for batch_idx, (data, target) in enumerate(train_loader):
>>> data, target = data.to(device), target.to(device)
>>> optimizer.zero_grad()
>>> output = model(data)
>>> loss = criterion(output, target)
>>> loss.backward()
>>> # callback should be inserted between loss.backward() and optimizer.step()
>>> if callback:
>>> callback()
>>> optimizer.step()
```
- **num_iterations:** Total number of iterations.
- **training_epochs:** Training epochs of the first subproblem.
- **row:** Penalty parameter for ADMM training.
- **base_algo:** Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`.
Given the sparsity distribution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
## Lottery Ticket Hypothesis
[The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635), by Jonathan Frankle and Michael Carbin, provides comprehensive measurement and analysis, and articulates the *lottery ticket hypothesis*: dense, randomly-initialized, feed-forward networks contain subnetworks (*winning tickets*) that -- when trained in isolation -- reach test accuracy comparable to the original network in a similar number of iterations.
...@@ -396,7 +684,3 @@ We try to reproduce the experiment result of the fully connected network on MNIS
![](../../img/lottery_ticket_mnist_fc.png)
The above figure shows the result of the fully connected network. `round0-sparsity-0.0` is the performance without pruning. Consistent with the paper, pruning around 80% also obtains similar performance compared to no pruning, and converges a little faster. If pruning too much, e.g., more than 94%, the accuracy becomes lower and convergence becomes a little slower. Slightly different from the paper, the trend of the data in the paper is relatively clearer.
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
Examples for automatic pruners
'''
import argparse
import os
import json
import torch
from torch.optim.lr_scheduler import StepLR, MultiStepLR
from torchvision import datasets, transforms, models
from models.mnist.lenet import LeNet
from models.cifar10.vgg import VGG
from nni.compression.torch import L1FilterPruner, SimulatedAnnealingPruner, ADMMPruner, NetAdaptPruner, AutoCompressPruner
from nni.compression.torch import ModelSpeedup
def get_data(args):
'''
get data
'''
    kwargs = {'num_workers': 1, 'pin_memory': True} if torch.cuda.is_available() else {}
if args.dataset == 'mnist':
train_loader = torch.utils.data.DataLoader(
datasets.MNIST(args.data_dir, train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args.batch_size, shuffle=True, **kwargs)
val_loader = torch.utils.data.DataLoader(
datasets.MNIST(args.data_dir, train=False,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args.test_batch_size, shuffle=True, **kwargs)
criterion = torch.nn.NLLLoss()
elif args.dataset == 'cifar10':
normalize = transforms.Normalize(
(0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10(args.data_dir, train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=args.batch_size, shuffle=True, **kwargs)
val_loader = torch.utils.data.DataLoader(
datasets.CIFAR10(args.data_dir, train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=args.batch_size, shuffle=False, **kwargs)
criterion = torch.nn.CrossEntropyLoss()
elif args.dataset == 'imagenet':
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
train_loader = torch.utils.data.DataLoader(
datasets.ImageFolder(os.path.join(args.data_dir, 'train'),
transform=transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
normalize,
])),
batch_size=args.batch_size, shuffle=True, **kwargs)
val_loader = torch.utils.data.DataLoader(
datasets.ImageFolder(os.path.join(args.data_dir, 'val'),
transform=transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
normalize,
])),
batch_size=args.test_batch_size, shuffle=True, **kwargs)
criterion = torch.nn.CrossEntropyLoss()
return train_loader, val_loader, criterion
def train(args, model, device, train_loader, criterion, optimizer, epoch, callback=None):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
# callback should be inserted between loss.backward() and optimizer.step()
if callback:
callback()
optimizer.step()
if batch_idx % args.log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
def test(model, device, criterion, val_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in val_loader:
data, target = data.to(device), target.to(device)
output = model(data)
# sum up batch loss
test_loss += criterion(output, target).item()
# get the index of the max log-probability
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(val_loader.dataset)
accuracy = correct / len(val_loader.dataset)
print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
test_loss, correct, len(val_loader.dataset), 100. * accuracy))
return accuracy
def get_trained_model(args, device, train_loader, val_loader, criterion):
if args.model == 'LeNet':
model = LeNet().to(device)
optimizer = torch.optim.Adadelta(model.parameters(), lr=1)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
for epoch in range(args.pretrain_epochs):
train(args, model, device, train_loader,
criterion, optimizer, epoch)
scheduler.step()
elif args.model == 'vgg16':
model = VGG(depth=16).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
momentum=0.9,
weight_decay=5e-4)
scheduler = MultiStepLR(
optimizer, milestones=[int(args.pretrain_epochs*0.5), int(args.pretrain_epochs*0.75)], gamma=0.1)
for epoch in range(args.pretrain_epochs):
train(args, model, device, train_loader,
criterion, optimizer, epoch)
scheduler.step()
elif args.model == 'resnet18':
model = models.resnet18(pretrained=False, num_classes=10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
momentum=0.9,
weight_decay=5e-4)
scheduler = MultiStepLR(
optimizer, milestones=[int(args.pretrain_epochs*0.5), int(args.pretrain_epochs*0.75)], gamma=0.1)
for epoch in range(args.pretrain_epochs):
train(args, model, device, train_loader,
criterion, optimizer, epoch)
scheduler.step()
elif args.model == 'mobilenet_v2':
        model = models.mobilenet_v2(pretrained=True).to(device)
        # torchvision's pretrained weights are used directly here; still create an
        # optimizer so that the `return model, optimizer` below is well-defined
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
if args.save_model:
torch.save(model.state_dict(), os.path.join(
args.experiment_data_dir, 'model_trained.pth'))
        print('Trained model saved to %s' % args.experiment_data_dir)
return model, optimizer
def get_dummy_input(args, device):
if args.dataset == 'mnist':
dummy_input = torch.randn(
[args.test_batch_size, 1, 28, 28]).to(device)
elif args.dataset in ['cifar10', 'imagenet']:
dummy_input = torch.randn(
[args.test_batch_size, 3, 32, 32]).to(device)
return dummy_input
def main(args):
# prepare dataset
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader, val_loader, criterion = get_data(args)
model, optimizer = get_trained_model(args, device, train_loader, val_loader, criterion)
def short_term_fine_tuner(model, epochs=1):
for epoch in range(epochs):
train(args, model, device, train_loader, criterion, optimizer, epoch)
def trainer(model, optimizer, criterion, epoch, callback):
return train(args, model, device, train_loader, criterion, optimizer, epoch=epoch, callback=callback)
def evaluator(model):
return test(model, device, criterion, val_loader)
# used to save the performance of the original & pruned & finetuned models
result = {}
evaluation_result = evaluator(model)
print('Evaluation result (original model): %s' % evaluation_result)
result['original'] = evaluation_result
# module types to prune, only "Conv2d" supported for channel pruning
if args.base_algo in ['l1', 'l2']:
op_types = ['Conv2d']
elif args.base_algo == 'level':
op_types = ['default']
config_list = [{
'sparsity': args.sparsity,
'op_types': op_types
}]
dummy_input = get_dummy_input(args, device)
if args.pruner == 'L1FilterPruner':
pruner = L1FilterPruner(model, config_list)
elif args.pruner == 'NetAdaptPruner':
pruner = NetAdaptPruner(model, config_list, short_term_fine_tuner=short_term_fine_tuner, evaluator=evaluator,
base_algo=args.base_algo, experiment_data_dir=args.experiment_data_dir)
elif args.pruner == 'ADMMPruner':
# users are free to change the config here
if args.model == 'LeNet':
if args.base_algo in ['l1', 'l2']:
config_list = [{
'sparsity': 0.8,
'op_types': ['Conv2d'],
'op_names': ['conv1']
}, {
'sparsity': 0.92,
'op_types': ['Conv2d'],
'op_names': ['conv2']
}]
elif args.base_algo == 'level':
config_list = [{
'sparsity': 0.8,
'op_names': ['conv1']
}, {
'sparsity': 0.92,
'op_names': ['conv2']
}, {
'sparsity': 0.991,
'op_names': ['fc1']
}, {
'sparsity': 0.93,
'op_names': ['fc2']
}]
else:
raise ValueError('Example only implemented for LeNet.')
pruner = ADMMPruner(model, config_list, trainer=trainer, num_iterations=2, training_epochs=2)
elif args.pruner == 'SimulatedAnnealingPruner':
pruner = SimulatedAnnealingPruner(
model, config_list, evaluator=evaluator, base_algo=args.base_algo,
cool_down_rate=args.cool_down_rate, experiment_data_dir=args.experiment_data_dir)
elif args.pruner == 'AutoCompressPruner':
pruner = AutoCompressPruner(
model, config_list, trainer=trainer, evaluator=evaluator, dummy_input=dummy_input,
num_iterations=3, optimize_mode='maximize', base_algo=args.base_algo,
cool_down_rate=args.cool_down_rate, admm_num_iterations=30, admm_training_epochs=5,
experiment_data_dir=args.experiment_data_dir)
else:
raise ValueError(
"Please use L1FilterPruner, NetAdaptPruner, SimulatedAnnealingPruner, ADMMPruner or AutoCompressPruner in this example.")
# Pruner.compress() returns the masked model
# but for AutoCompressPruner, Pruner.compress() returns directly the pruned model
model_masked = pruner.compress()
evaluation_result = evaluator(model_masked)
print('Evaluation result (masked model): %s' % evaluation_result)
result['pruned'] = evaluation_result
if args.save_model:
pruner.export_model(
os.path.join(args.experiment_data_dir, 'model_masked.pth'), os.path.join(args.experiment_data_dir, 'mask.pth'))
        print('Masked model saved to %s' % args.experiment_data_dir)
if args.fine_tune:
if args.dataset == 'mnist':
optimizer = torch.optim.Adadelta(model_masked.parameters(), lr=1)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
for epoch in range(args.fine_tune_epochs):
train(args, model_masked, device, train_loader, criterion, optimizer, epoch)
scheduler.step()
test(model_masked, device, criterion, val_loader)
elif args.dataset == 'cifar10':
optimizer = torch.optim.SGD(model_masked.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
for epoch in range(args.fine_tune_epochs):
train(args, model_masked, device, train_loader, criterion, optimizer, epoch)
scheduler.step()
test(model_masked, device, criterion, val_loader)
elif args.dataset == 'imagenet':
for epoch in range(args.fine_tune_epochs):
optimizer = torch.optim.SGD(model_masked.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
train(args, model_masked, device, train_loader, criterion, optimizer, epoch)
test(model_masked, device, criterion, val_loader)
evaluation_result = evaluator(model_masked)
print('Evaluation result (fine tuned): %s' % evaluation_result)
result['finetuned'] = evaluation_result
if args.save_model:
pruner.export_model(os.path.join(
args.experiment_data_dir, 'model_fine_tuned.pth'), os.path.join(args.experiment_data_dir, 'mask.pth'))
            print('Fine-tuned model saved to %s' % args.experiment_data_dir)
# model speed up
if args.speed_up and args.pruner != 'AutoCompressPruner':
if args.model == 'LeNet':
model = LeNet().to(device)
elif args.model == 'vgg16':
model = VGG(depth=16).to(device)
elif args.model == 'resnet18':
model = models.resnet18(pretrained=False, num_classes=10).to(device)
elif args.model == 'mobilenet_v2':
model = models.mobilenet_v2(pretrained=False).to(device)
model.load_state_dict(torch.load(os.path.join(args.experiment_data_dir, 'model_fine_tuned.pth')))
masks_file = os.path.join(args.experiment_data_dir, 'mask.pth')
m_speedup = ModelSpeedup(model, dummy_input, masks_file, device)
m_speedup.speedup_model()
evaluation_result = evaluator(model)
print('Evaluation result (speed up model): %s' % evaluation_result)
result['speedup'] = evaluation_result
torch.save(model.state_dict(), os.path.join(args.experiment_data_dir, 'model_speed_up.pth'))
        print('Speed up model saved to %s' % args.experiment_data_dir)
with open(os.path.join(args.experiment_data_dir, 'performance.json'), 'w+') as f:
json.dump(result, f)
if __name__ == '__main__':
def str2bool(v):
if isinstance(v, bool):
return v
if v.lower() in ('yes', 'true', 't', 'y', '1'):
return True
elif v.lower() in ('no', 'false', 'f', 'n', '0'):
return False
else:
raise argparse.ArgumentTypeError('Boolean value expected.')
parser = argparse.ArgumentParser(description='PyTorch Example for SimulatedAnnealingPruner')
parser.add_argument('--pruner', type=str, default='SimulatedAnnealingPruner',
help='pruner to use, L1FilterPruner, NetAdaptPruner, SimulatedAnnealingPruner, ADMMPruner or AutoCompressPruner')
parser.add_argument('--base-algo', type=str, default='l1',
help='base pruning algorithm. level, l1 or l2')
parser.add_argument('--sparsity', type=float, default=0.3,
help='overall target sparsity')
parser.add_argument('--speed-up', type=str2bool, default=False,
help='Whether to speed-up the pruned model')
# param for SimulatedAnnealingPruner
parser.add_argument('--cool-down-rate', type=float, default=0.9,
help='cool down rate')
# param for NetAdaptPruner
parser.add_argument('--sparsity-per-iteration', type=float, default=0.05,
help='sparsity_per_iteration of NetAdaptPruner')
parser.add_argument('--dataset', type=str, default='mnist',
help='dataset to use, mnist, cifar10 or imagenet (default MNIST)')
parser.add_argument('--model', type=str, default='LeNet',
help='model to use, LeNet, vgg16, resnet18 or mobilenet_v2')
parser.add_argument('--fine-tune', type=str2bool, default=True,
help='whether to fine-tune the pruned model')
parser.add_argument('--fine-tune-epochs', type=int, default=10,
help='epochs to fine tune')
parser.add_argument('--data-dir', type=str, default='/datasets/',
help='dataset directory')
parser.add_argument('--experiment-data-dir', type=str, default='./',
help='For saving experiment data')
parser.add_argument('--batch-size', type=int, default=64,
help='input batch size for training (default: 64)')
parser.add_argument('--test-batch-size', type=int, default=64,
help='input batch size for testing (default: 64)')
parser.add_argument('--pretrain-epochs', type=int, default=1,
help='number of epochs to pretrain the model')
parser.add_argument('--log-interval', type=int, default=200,
help='how many batches to wait before logging training status')
parser.add_argument('--save-model', type=str2bool, default=True,
help='For Saving the current Model')
args = parser.parse_args()
if not os.path.exists(args.experiment_data_dir):
os.makedirs(args.experiment_data_dir)
main(args)
import torch
import torch.nn as nn
import torch.nn.functional as F
class LeNet(nn.Module):
def __init__(self):
super(LeNet, self).__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout2d(0.25)
self.dropout2 = nn.Dropout2d(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
x = self.conv2(x)
x = F.relu(x)
x = F.max_pool2d(x, 2)
x = self.dropout1(x)
x = torch.flatten(x, 1)
x = self.fc1(x)
x = F.relu(x)
x = self.dropout2(x)
x = self.fc2(x)
output = F.log_softmax(x, dim=1)
return output
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from .speedup import ModelSpeedup
from .pruning import *
from .quantization import *
from .compressor import Compressor, Pruner, Quantizer
from .speedup import ModelSpeedup
...@@ -346,7 +346,7 @@ class Pruner(Compressor):
config : dict
the configuration for generating the mask
"""
_logger.info("Module detected to compress : %s.", layer.name)
wrapper = PrunerModuleWrapper(layer.module, layer.name, layer.type, config, self)
assert hasattr(layer.module, 'weight'), "module %s does not have 'weight' attribute" % layer.name
# move newly registered buffers to the same device of weight
...@@ -381,7 +381,7 @@ class Pruner(Compressor):
if weight_mask is not None:
mask_sum = weight_mask.sum().item()
mask_num = weight_mask.numel()
_logger.info('Layer: %s Sparsity: %.4f', wrapper.name, 1 - mask_sum / mask_num)
wrapper.module.weight.data = wrapper.module.weight.data.mul(weight_mask)
if bias_mask is not None:
wrapper.module.bias.data = wrapper.module.bias.data.mul(bias_mask)
......
...@@ -7,3 +7,7 @@ from .apply_compression import apply_compression_results
from .one_shot import *
from .agp import *
from .lottery_ticket import LotteryTicketPruner
from .simulated_annealing_pruner import SimulatedAnnealingPruner
from .net_adapt_pruner import NetAdaptPruner
from .admm_pruner import ADMMPruner
from .auto_compress_pruner import AutoCompressPruner
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import logging
import torch
from schema import And, Optional
from ..utils.config_validation import CompressorSchema
from .constants import MASKER_DICT
from .one_shot import OneshotPruner
_logger = logging.getLogger(__name__)
class ADMMPruner(OneshotPruner):
"""
    This is a PyTorch implementation of the ADMM Pruner algorithm.
    Alternating Direction Method of Multipliers (ADMM) is a mathematical optimization technique
    that decomposes the original nonconvex problem into two subproblems which can be solved iteratively.
    In the weight pruning problem, these two subproblems are solved via 1) the gradient descent algorithm and 2) Euclidean projection respectively.
This solution framework applies both to non-structured and different variations of structured pruning schemes.
For more details, please refer to the paper: https://arxiv.org/abs/1804.03294.
"""
def __init__(self, model, config_list, trainer, num_iterations=30, training_epochs=5, row=1e-4, base_algo='l1'):
"""
Parameters
----------
model : torch.nn.module
Model to be pruned
config_list : list
List on pruning configs
trainer : function
Function used for the first subproblem.
Users should write this function as a normal function to train the Pytorch model
and include `model, optimizer, criterion, epoch, callback` as function arguments.
        Here `callback` acts as an L2 regularizer as presented in the formula (7) of the original paper.
The logic of `callback` is implemented inside the Pruner,
users are just required to insert `callback()` between `loss.backward()` and `optimizer.step()`.
Example::
```
>>> def trainer(model, criterion, optimizer, epoch, callback):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> train_loader = ...
>>> model.train()
>>> for batch_idx, (data, target) in enumerate(train_loader):
>>> data, target = data.to(device), target.to(device)
>>> optimizer.zero_grad()
>>> output = model(data)
>>> loss = criterion(output, target)
>>> loss.backward()
>>> # callback should be inserted between loss.backward() and optimizer.step()
>>> if callback:
>>> callback()
>>> optimizer.step()
```
num_iterations : int
Total number of iterations.
training_epochs : int
Training epochs of the first subproblem.
row : float
Penalty parameters for ADMM training.
base_algo : str
Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`. Given the sparsity distribution among the ops,
the assigned `base_algo` is used to decide which filters/channels/weights to prune.
"""
self._base_algo = base_algo
super().__init__(model, config_list)
self._trainer = trainer
self._num_iterations = num_iterations
self._training_epochs = training_epochs
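        # 'row' is the ADMM penalty coefficient (rho in the original paper's augmented Lagrangian)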
self._row = row
self.set_wrappers_attribute("if_calculated", False)
self.masker = MASKER_DICT[self._base_algo](self.bound_model, self)
def validate_config(self, model, config_list):
"""
Parameters
----------
model : torch.nn.module
Model to be pruned
config_list : list
List on pruning configs
"""
if self._base_algo == 'level':
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
Optional('op_types'): [str],
Optional('op_names'): [str],
}], model, _logger)
elif self._base_algo in ['l1', 'l2']:
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
'op_types': ['Conv2d'],
Optional('op_names'): [str]
}], model, _logger)
schema.validate(config_list)
def _projection(self, weight, sparsity):
'''
Return the Euclidean projection of the weight matrix according to the pruning mode.
Parameters
----------
weight : tensor
original matrix
sparsity : float
the ratio of parameters which need to be set to zero
Returns
-------
tensor
the projected matrix
'''
w_abs = weight.abs()
if self._base_algo == 'level':
k = int(weight.numel() * sparsity)
if k == 0:
mask_weight = torch.ones(weight.shape).type_as(weight)
else:
threshold = torch.topk(w_abs.view(-1), k, largest=False)[0].max()
mask_weight = torch.gt(w_abs, threshold).type_as(weight)
elif self._base_algo in ['l1', 'l2']:
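            # structured (filter-level) projection: rank filters by the sum of their
            # absolute weights and zero out the num_prune smallest filters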
filters = weight.size(0)
num_prune = int(filters * sparsity)
if filters < 2 or num_prune < 1:
mask_weight = torch.ones(weight.size()).type_as(weight).detach()
else:
w_abs_structured = w_abs.view(filters, -1).sum(dim=1)
threshold = torch.topk(w_abs_structured.view(-1), num_prune, largest=False)[0].max()
mask_weight = torch.gt(w_abs_structured, threshold)[:, None, None, None].expand_as(weight).type_as(weight)
return weight.data.mul(mask_weight)
def compress(self):
"""
Compress the model with ADMM.
Returns
-------
torch.nn.Module
model with specified modules compressed.
"""
_logger.info('Starting ADMM Compression...')
        # initialize Z, U
# Z_i^0 = W_i^0
# U_i^0 = 0
Z = []
U = []
for wrapper in self.get_modules_wrapper():
z = wrapper.module.weight.data
Z.append(z)
U.append(torch.zeros_like(z))
optimizer = torch.optim.Adam(
self.bound_model.parameters(), lr=1e-3, weight_decay=5e-5)
        # Loss = cross_entropy + L2 regularization + \Sum_{i=1}^N \rho_i ||W_i - Z_i^k + U_i^k||^2
criterion = torch.nn.CrossEntropyLoss()
        # callback function to do additional optimization, refer to the derivatives of Formula (7)
def callback():
for i, wrapper in enumerate(self.get_modules_wrapper()):
wrapper.module.weight.data -= self._row * \
(wrapper.module.weight.data - Z[i] + U[i])
# optimization iteration
for k in range(self._num_iterations):
_logger.info('ADMM iteration : %d', k)
# step 1: optimize W with AdamOptimizer
for epoch in range(self._training_epochs):
self._trainer(self.bound_model, optimizer=optimizer,
criterion=criterion, epoch=epoch, callback=callback)
# step 2: update Z, U
# Z_i^{k+1} = projection(W_i^{k+1} + U_i^k)
# U_i^{k+1} = U^k + W_i^{k+1} - Z_i^{k+1}
for i, wrapper in enumerate(self.get_modules_wrapper()):
z = wrapper.module.weight.data + U[i]
Z[i] = self._projection(z, wrapper.config['sparsity'])
U[i] = U[i] + wrapper.module.weight.data - Z[i]
# apply prune
self.update_mask()
_logger.info('Compression finished.')
return self.bound_model
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import logging
import os
import copy
import torch
from schema import And, Optional
from nni.utils import OptimizeMode
from nni.compression.torch import ModelSpeedup
from ..compressor import Pruner
from ..utils.config_validation import CompressorSchema
from .simulated_annealing_pruner import SimulatedAnnealingPruner
from .admm_pruner import ADMMPruner
_logger = logging.getLogger(__name__)
class AutoCompressPruner(Pruner):
"""
    This is a PyTorch implementation of the AutoCompress pruning algorithm.
    For each round, AutoCompressPruner prunes the model with the same per-round sparsity to achieve the overall sparsity:
1. Generate sparsities distribution using SimualtedAnnealingPruner
2. Perform ADMM-based structured pruning to generate pruning result for the next round.
Here we use 'speedup' to perform real pruning.
For more details, please refer to the paper: https://arxiv.org/abs/1907.03141.
"""
def __init__(self, model, config_list, trainer, evaluator, dummy_input,
num_iterations=3, optimize_mode='maximize', base_algo='l1',
# SimulatedAnnealing related
start_temperature=100, stop_temperature=20, cool_down_rate=0.9, perturbation_magnitude=0.35,
# ADMM related
admm_num_iterations=30, admm_training_epochs=5, row=1e-4,
experiment_data_dir='./'):
"""
Parameters
----------
model : pytorch model
The model to be pruned
config_list : list
Supported keys:
- sparsity : The target overall sparsity.
- op_types : The operation type to prune.
trainer : function
Function used for the first subproblem of ADMM Pruner.
Users should write this function as a normal function to train the Pytorch model
and include `model, optimizer, criterion, epoch, callback` as function arguments.
        Here `callback` acts as an L2 regularizer as presented in the formula (7) of the original paper.
The logic of `callback` is implemented inside the Pruner,
users are just required to insert `callback()` between `loss.backward()` and `optimizer.step()`.
Example::
```
>>> def trainer(model, criterion, optimizer, epoch, callback):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> train_loader = ...
>>> model.train()
>>> for batch_idx, (data, target) in enumerate(train_loader):
>>> data, target = data.to(device), target.to(device)
>>> optimizer.zero_grad()
>>> output = model(data)
>>> loss = criterion(output, target)
>>> loss.backward()
>>> # callback should be inserted between loss.backward() and optimizer.step()
>>> if callback:
>>> callback()
>>> optimizer.step()
```
evaluator : function
function to evaluate the pruned model.
This function should include `model` as the only parameter, and returns a scalar value.
Example::
>>> def evaluator(model):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> val_loader = ...
>>> model.eval()
>>> correct = 0
>>> with torch.no_grad():
>>> for data, target in val_loader:
>>> data, target = data.to(device), target.to(device)
>>> output = model(data)
>>> # get the index of the max log-probability
>>> pred = output.argmax(dim=1, keepdim=True)
>>> correct += pred.eq(target.view_as(pred)).sum().item()
>>> accuracy = correct / len(val_loader.dataset)
>>> return accuracy
dummy_input : pytorch tensor
            The dummy input for ```jit.trace```; users should put it on the right device before passing it in
num_iterations : int
Number of overall iterations
optimize_mode : str
optimize mode, `maximize` or `minimize`, by default `maximize`
base_algo : str
Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`. Given the sparsity distribution among the ops,
the assigned `base_algo` is used to decide which filters/channels/weights to prune.
        start_temperature : float
            Simulated Annealing related parameter
        stop_temperature : float
            Simulated Annealing related parameter
        cool_down_rate : float
            Simulated Annealing related parameter
perturbation_magnitude : float
Initial perturbation magnitude to the sparsities. The magnitude decreases with current temperature
admm_num_iterations : int
Number of iterations of ADMM Pruner
admm_training_epochs : int
Training epochs of the first optimization subproblem of ADMMPruner
row : float
Penalty parameters for ADMM training
experiment_data_dir : string
PATH to store temporary experiment data
"""
# original model
self._model_to_prune = model
self._base_algo = base_algo
self._trainer = trainer
self._evaluator = evaluator
self._dummy_input = dummy_input
self._num_iterations = num_iterations
self._optimize_mode = OptimizeMode(optimize_mode)
# hyper parameters for SA algorithm
self._start_temperature = start_temperature
self._stop_temperature = stop_temperature
self._cool_down_rate = cool_down_rate
self._perturbation_magnitude = perturbation_magnitude
# hyper parameters for ADMM algorithm
self._admm_num_iterations = admm_num_iterations
self._admm_training_epochs = admm_training_epochs
self._row = row
# overall pruning rate
self._sparsity = config_list[0]['sparsity']
self._experiment_data_dir = experiment_data_dir
if not os.path.exists(self._experiment_data_dir):
os.makedirs(self._experiment_data_dir)
def validate_config(self, model, config_list):
"""
Parameters
----------
model : torch.nn.module
Model to be pruned
config_list : list
List on pruning configs
"""
if self._base_algo == 'level':
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
Optional('op_types'): [str],
Optional('op_names'): [str],
}], model, _logger)
elif self._base_algo in ['l1', 'l2']:
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
'op_types': ['Conv2d'],
Optional('op_names'): [str]
}], model, _logger)
schema.validate(config_list)
def calc_mask(self, wrapper, **kwargs):
return None
def compress(self):
"""
Compress the model with AutoCompress.
Returns
-------
torch.nn.Module
model with specified modules compressed.
"""
_logger.info('Starting AutoCompress pruning...')
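        # per-round sparsity is chosen so that num_iterations rounds compound to the
        # overall target: (1 - sparsity_each_round) ** num_iterations == 1 - sparsity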
sparsity_each_round = 1 - pow(1-self._sparsity, 1/self._num_iterations)
for i in range(self._num_iterations):
_logger.info('Pruning iteration: %d', i)
_logger.info('Target sparsity this round: %s',
1-pow(1-sparsity_each_round, i+1))
# SimulatedAnnealingPruner
_logger.info(
'Generating sparsities with SimulatedAnnealingPruner...')
SApruner = SimulatedAnnealingPruner(
model=copy.deepcopy(self._model_to_prune),
config_list=[
{"sparsity": sparsity_each_round, "op_types": ['Conv2d']}],
evaluator=self._evaluator,
optimize_mode=self._optimize_mode,
base_algo=self._base_algo,
start_temperature=self._start_temperature,
stop_temperature=self._stop_temperature,
cool_down_rate=self._cool_down_rate,
perturbation_magnitude=self._perturbation_magnitude,
experiment_data_dir=self._experiment_data_dir)
config_list = SApruner.compress(return_config_list=True)
_logger.info("Generated config_list : %s", config_list)
# ADMMPruner
_logger.info('Performing structured pruning with ADMMPruner...')
ADMMpruner = ADMMPruner(
model=copy.deepcopy(self._model_to_prune),
config_list=config_list,
trainer=self._trainer,
num_iterations=self._admm_num_iterations,
training_epochs=self._admm_training_epochs,
row=self._row,
base_algo=self._base_algo)
ADMMpruner.compress()
ADMMpruner.export_model(os.path.join(self._experiment_data_dir, 'model_admm_masked.pth'), os.path.join(
self._experiment_data_dir, 'mask.pth'))
# use speed up to prune the model before next iteration, because SimulatedAnnealingPruner & ADMMPruner don't take masked models
self._model_to_prune.load_state_dict(torch.load(os.path.join(
self._experiment_data_dir, 'model_admm_masked.pth')))
masks_file = os.path.join(self._experiment_data_dir, 'mask.pth')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
_logger.info('Speeding up models...')
m_speedup = ModelSpeedup(self._model_to_prune, self._dummy_input, masks_file, device)
m_speedup.speedup_model()
evaluation_result = self._evaluator(self._model_to_prune)
_logger.info('Evaluation result of the pruned model in iteration %d: %s', i, evaluation_result)
_logger.info('----------Compression finished--------------')
os.remove(os.path.join(self._experiment_data_dir, 'model_admm_masked.pth'))
os.remove(os.path.join(self._experiment_data_dir, 'mask.pth'))
return self._model_to_prune
def export_model(self, model_path, mask_path=None, onnx_path=None, input_shape=None, device=None):
_logger.info("AutoCompressPruner export directly the pruned model without mask")
torch.save(self._model_to_prune.state_dict(), model_path)
_logger.info('Model state_dict saved to %s', model_path)
if onnx_path is not None:
assert input_shape is not None, 'input_shape must be specified to export onnx model'
# input info needed
if device is None:
device = torch.device('cpu')
input_data = torch.Tensor(*input_shape)
torch.onnx.export(self._model_to_prune, input_data.to(device), onnx_path)
_logger.info('Model in onnx with input shape %s saved to %s', input_data.shape, onnx_path)
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from .one_shot import LevelPruner, L1FilterPruner, L2FilterPruner
PRUNER_DICT = {
'level': LevelPruner,
'l1': L1FilterPruner,
'l2': L2FilterPruner
}
...@@ -29,4 +29,3 @@ class LevelPrunerMasker(WeightMasker):
mask_weight = torch.gt(w_abs, threshold).type_as(weight)
mask = {'weight_mask': mask_weight}
return mask
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import logging
import os
import copy
import json
import torch
from schema import And, Optional
from nni.utils import OptimizeMode
from ..compressor import Pruner
from ..utils.config_validation import CompressorSchema
from ..utils.num_param_counter import get_total_num_weights
from .constants_pruner import PRUNER_DICT
_logger = logging.getLogger(__name__)
class NetAdaptPruner(Pruner):
"""
This is a Pytorch implementation of NetAdapt compression algorithm.
The pruning procedure can be described as follows:
While Res_i > Bud:
1. Con = Res_i - delta_Res
2. for every layer:
Choose Num Filters to prune
Choose which filter to prune
Short-term fine tune the pruned model
3. Pick the best layer to prune
Long-term fine tune
For the details of this algorithm, please refer to the paper: https://arxiv.org/abs/1804.03230
"""
def __init__(self, model, config_list, short_term_fine_tuner, evaluator,
optimize_mode='maximize', base_algo='l1', sparsity_per_iteration=0.05, experiment_data_dir='./'):
"""
Parameters
----------
model : pytorch model
The model to be pruned
config_list : list
Supported keys:
- sparsity : The target overall sparsity.
- op_types : The operation type to prune.
short_term_fine_tuner : function
function to short-term fine tune the masked model.
This function should include `model` as the only parameter,
and fine tune the model for a short term after each pruning iteration.
Example:
>>> def short_term_fine_tuner(model, epoch=3):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> train_loader = ...
>>> criterion = torch.nn.CrossEntropyLoss()
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
>>> model.train()
>>> for _ in range(epoch):
>>> for _, (data, target) in enumerate(train_loader):
>>> data, target = data.to(device), target.to(device)
>>> optimizer.zero_grad()
>>> output = model(data)
>>> loss = criterion(output, target)
>>> loss.backward()
>>> optimizer.step()
evaluator : function
function to evaluate the masked model.
This function should include `model` as the only parameter, and returns a scalar value.
Example::
>>> def evaluator(model):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> val_loader = ...
>>> model.eval()
>>> correct = 0
>>> with torch.no_grad():
>>> for data, target in val_loader:
>>> data, target = data.to(device), target.to(device)
>>> output = model(data)
>>> # get the index of the max log-probability
>>> pred = output.argmax(dim=1, keepdim=True)
>>> correct += pred.eq(target.view_as(pred)).sum().item()
>>> accuracy = correct / len(val_loader.dataset)
>>> return accuracy
optimize_mode : str
optimize mode, `maximize` or `minimize`, by default `maximize`.
base_algo : str
Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`. Given the sparsity distribution among the ops,
the assigned `base_algo` is used to decide which filters/channels/weights to prune.
sparsity_per_iteration : float
sparsity to prune in each iteration
experiment_data_dir : str
PATH to save experiment data,
including the config_list generated for the base pruning algorithm and the performance of the pruned model.
"""
# models used for iterative pruning and evaluation
self._model_to_prune = copy.deepcopy(model)
self._base_algo = base_algo
super().__init__(model, config_list)
self._short_term_fine_tuner = short_term_fine_tuner
self._evaluator = evaluator
self._optimize_mode = OptimizeMode(optimize_mode)
# hyper parameters for NetAdapt algorithm
self._sparsity_per_iteration = sparsity_per_iteration
# overall pruning rate
self._sparsity = config_list[0]['sparsity']
# config_list
self._config_list_generated = []
self._experiment_data_dir = experiment_data_dir
if not os.path.exists(self._experiment_data_dir):
os.makedirs(self._experiment_data_dir)
self._tmp_model_path = os.path.join(self._experiment_data_dir, 'tmp_model.pth')
def validate_config(self, model, config_list):
"""
Parameters
----------
model : torch.nn.module
Model to be pruned
config_list : list
List of pruning configs
"""
if self._base_algo == 'level':
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
Optional('op_types'): [str],
Optional('op_names'): [str],
}], model, _logger)
elif self._base_algo in ['l1', 'l2']:
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
'op_types': ['Conv2d'],
Optional('op_names'): [str]
}], model, _logger)
schema.validate(config_list)
def calc_mask(self, wrapper, **kwargs):
return None
def _update_config_list(self, config_list, op_name, sparsity):
'''
update sparsity of op_name in config_list
'''
config_list_updated = copy.deepcopy(config_list)
for idx, item in enumerate(config_list):
if op_name in item['op_names']:
config_list_updated[idx]['sparsity'] = sparsity
return config_list_updated
# if op_name is not in self._config_list_generated, create a new json item
if self._base_algo in ['l1', 'l2']:
config_list_updated.append(
{'sparsity': sparsity, 'op_types': ['Conv2d'], 'op_names': [op_name]})
elif self._base_algo == 'level':
config_list_updated.append(
{'sparsity': sparsity, 'op_names': [op_name]})
return config_list_updated
def _get_op_num_weights_remained(self, op_name, module):
'''
Get the number of weights remaining after channel pruning with the current sparsity
Returns
-------
int
remaining number of weights of the op
'''
# if op is wrapped by the pruner
for wrapper in self.get_modules_wrapper():
if wrapper.name == op_name:
return wrapper.weight_mask.sum().item()
# if op is not wrapped by the pruner
return module.weight.data.numel()
def _get_op_sparsity(self, op_name):
for config in self._config_list_generated:
if 'op_names' in config and op_name in config['op_names']:
return config['sparsity']
return 0
def _calc_num_related_weights(self, op_name):
'''
Calculate the total number of weights of the op and the next op; applicable only to models without dependencies among ops
Parameters
----------
op_name : str
Returns
-------
int
total number of all the related (current and next) op weights
'''
num_weights = 0
flag_found = False
previous_name = None
previous_module = None
for name, module in self._model_to_prune.named_modules():
if not flag_found and name != op_name and type(module).__name__ in ['Conv2d', 'Linear']:
previous_name = name
previous_module = module
if not flag_found and name == op_name:
_logger.debug("original module found: %s", name)
num_weights = module.weight.data.numel()
# consider related pruning in this op caused by previous op's pruning
if previous_module:
sparsity_previous_op = self._get_op_sparsity(previous_name)
if sparsity_previous_op:
_logger.debug(
"decrease op's weights by %s due to previous op %s's pruning...", sparsity_previous_op, previous_name)
num_weights *= (1-sparsity_previous_op)
flag_found = True
continue
if flag_found and type(module).__name__ in ['Conv2d', 'Linear']:
_logger.debug("related module found: %s", name)
# cross-layer effects of channel/filter pruning are considered here, so only the number of weights remaining after channel pruning is counted
num_weights += self._get_op_num_weights_remained(name, module)
break
_logger.debug("num related weights of op %s : %d", op_name, num_weights)
return num_weights
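# Worked example (illustrative numbers, not taken from the original file):
# for a 3x3 Conv2d(16, 32) followed by a 3x3 Conv2d(32, 64), the first op
# holds 32*16*3*3 = 4608 weights and the next op holds 64*32*3*3 = 18432
# weights before masking. Removing one output filter of the first op drops
# 16*3*3 = 144 of its own weights plus 64*3*3 = 576 weights from the next
# op's input channels, which is why the weights of both ops enter the
# denominator of the per-layer target sparsity computed in compress() below.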
def compress(self):
"""
Compress the model.
Returns
-------
torch.nn.Module
model with specified modules compressed.
"""
_logger.info('Starting NetAdapt Compression...')
pruning_iteration = 0
current_sparsity = 0
delta_num_weights_per_iteration = \
int(get_total_num_weights(self._model_to_prune, ['Conv2d', 'Linear']) * self._sparsity_per_iteration)
# stop condition
while current_sparsity < self._sparsity:
_logger.info('Pruning iteration: %d', pruning_iteration)
# calculate target sparsity of this iteration
target_sparsity = current_sparsity + self._sparsity_per_iteration
# variable to store the info of the best layer found in this iteration
best_op = {}
for wrapper in self.get_modules_wrapper():
_logger.debug("op name : %s", wrapper.name)
_logger.debug("op weights : %d", wrapper.weight_mask.numel())
_logger.debug("op left weights : %d", wrapper.weight_mask.sum().item())
current_op_sparsity = 1 - wrapper.weight_mask.sum().item() / wrapper.weight_mask.numel()
_logger.debug("current op sparsity : %s", current_op_sparsity)
# sparsity that this layer needs to prune to satisfy the requirement
target_op_sparsity = current_op_sparsity + delta_num_weights_per_iteration / self._calc_num_related_weights(wrapper.name)
if target_op_sparsity >= 1:
_logger.info('Layer %s does not have enough remaining weights to prune', wrapper.name)
continue
config_list = self._update_config_list(self._config_list_generated, wrapper.name, target_op_sparsity)
_logger.debug("config_list used : %s", config_list)
pruner = PRUNER_DICT[self._base_algo](copy.deepcopy(self._model_to_prune), config_list)
model_masked = pruner.compress()
# Short-term fine tune the pruned model
self._short_term_fine_tuner(model_masked)
performance = self._evaluator(model_masked)
_logger.info("Layer : %s, evaluation result after short-term fine tuning : %s", wrapper.name, performance)
if not best_op \
or (self._optimize_mode is OptimizeMode.Maximize and performance > best_op['performance']) \
or (self._optimize_mode is OptimizeMode.Minimize and performance < best_op['performance']):
_logger.debug("updating best layer to %s...", wrapper.name)
# find weight mask of this layer
for w in pruner.get_modules_wrapper():
if w.name == wrapper.name:
masks = {'weight_mask': w.weight_mask,
'bias_mask': w.bias_mask}
break
best_op = {
'op_name': wrapper.name,
'sparsity': target_op_sparsity,
'performance': performance,
'masks': masks
}
# save model weights
pruner.export_model(self._tmp_model_path)
if not best_op:
# decrease pruning step
self._sparsity_per_iteration *= 0.5
_logger.info("No more layers to prune, decrease pruning step to %s", self._sparsity_per_iteration)
continue
# Pick the best layer to prune, update iterative information
# update config_list
self._config_list_generated = self._update_config_list(
self._config_list_generated, best_op['op_name'], best_op['sparsity'])
# update weights parameters
self._model_to_prune.load_state_dict(torch.load(self._tmp_model_path))
# update mask of the chosen op
for wrapper in self.get_modules_wrapper():
if wrapper.name == best_op['op_name']:
for k in best_op['masks']:
setattr(wrapper, k, best_op['masks'][k])
break
current_sparsity = target_sparsity
_logger.info('Pruning iteration %d finished, current sparsity: %s', pruning_iteration, current_sparsity)
_logger.info('Layer %s selected with sparsity %s, performance after pruning & short-term fine-tuning : %s',
best_op['op_name'], best_op['sparsity'], best_op['performance'])
pruning_iteration += 1
self._final_performance = best_op['performance']
# load weights parameters
self.load_model_state_dict(torch.load(self._tmp_model_path))
os.remove(self._tmp_model_path)
_logger.info('----------Compression finished--------------')
_logger.info('config_list generated: %s', self._config_list_generated)
_logger.info("Performance after pruning: %s", self._final_performance)
_logger.info("Masked sparsity: %.6f", current_sparsity)
# save best config found and best performance
with open(os.path.join(self._experiment_data_dir, 'search_result.json'), 'w') as jsonfile:
json.dump({
'performance': self._final_performance,
'config_list': json.dumps(self._config_list_generated)
}, jsonfile)
_logger.info('search history and result saved to folder: %s', self._experiment_data_dir)
return self.bound_model
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import logging
import os
import math
import copy
import csv
import json
import numpy as np
from schema import And, Optional
from nni.utils import OptimizeMode
from ..compressor import Pruner
from ..utils.config_validation import CompressorSchema
from .constants_pruner import PRUNER_DICT
_logger = logging.getLogger(__name__)
class SimulatedAnnealingPruner(Pruner):
"""
This is a PyTorch implementation of the Simulated Annealing compression algorithm.
- Randomly initialize a pruning rate distribution (sparsities).
- While current_temperature < stop_temperature:
1. generate a perturbation to the current distribution
2. Perform fast evaluation on the perturbed distribution
3. accept the perturbation according to the performance and acceptance probability; if not accepted, return to step 1
4. cool down, current_temperature <- current_temperature * cool_down_rate
"""
def __init__(self, model, config_list, evaluator, optimize_mode='maximize', base_algo='l1',
start_temperature=100, stop_temperature=20, cool_down_rate=0.9, perturbation_magnitude=0.35, experiment_data_dir='./'):
"""
Parameters
----------
model : pytorch model
The model to be pruned
config_list : list
Supported keys:
- sparsity : The target overall sparsity.
- op_types : The operation type to prune.
evaluator : function
Function to evaluate the pruned model.
This function should take `model` as its only parameter and return a scalar value.
Example::
>>> def evaluator(model):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> val_loader = ...
>>> model.eval()
>>> correct = 0
>>> with torch.no_grad():
>>> for data, target in val_loader:
>>> data, target = data.to(device), target.to(device)
>>> output = model(data)
>>> # get the index of the max log-probability
>>> pred = output.argmax(dim=1, keepdim=True)
>>> correct += pred.eq(target.view_as(pred)).sum().item()
>>> accuracy = correct / len(val_loader.dataset)
>>> return accuracy
optimize_mode : str
optimize mode, `maximize` or `minimize`, by default `maximize`.
base_algo : str
Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`. Given the sparsity distribution among the ops,
the assigned `base_algo` is used to decide which filters/channels/weights to prune.
start_temperature : float
Simulated Annealing related parameter
stop_temperature : float
Simulated Annealing related parameter
cool_down_rate : float
Simulated Annealing related parameter
perturbation_magnitude : float
Initial perturbation magnitude to the sparsities. The magnitude decreases with the current temperature
experiment_data_dir : string
Path to save experiment data,
including the config_list generated for the base pruning algorithm, the performance of the pruned model and the pruning history.
"""
# original model
self._model_to_prune = copy.deepcopy(model)
self._base_algo = base_algo
super().__init__(model, config_list)
self._evaluator = evaluator
self._optimize_mode = OptimizeMode(optimize_mode)
# hyper parameters for SA algorithm
self._start_temperature = start_temperature
self._current_temperature = start_temperature
self._stop_temperature = stop_temperature
self._cool_down_rate = cool_down_rate
self._perturbation_magnitude = perturbation_magnitude
# overall pruning rate
self._sparsity = config_list[0]['sparsity']
# pruning rates of the layers
self._sparsities = None
# init current performance & best performance
self._current_performance = -np.inf
self._best_performance = -np.inf
self._best_config_list = []
self._search_history = []
self._experiment_data_dir = experiment_data_dir
if not os.path.exists(self._experiment_data_dir):
os.makedirs(self._experiment_data_dir)
def validate_config(self, model, config_list):
"""
Parameters
----------
model : torch.nn.module
Model to be pruned
config_list : list
List of pruning configs
"""
if self._base_algo == 'level':
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
Optional('op_types'): [str],
Optional('op_names'): [str],
}], model, _logger)
elif self._base_algo in ['l1', 'l2']:
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
'op_types': ['Conv2d'],
Optional('op_names'): [str]
}], model, _logger)
schema.validate(config_list)
def _sparsities_2_config_list(self, sparsities):
'''
convert sparsities vector into config_list for LevelPruner or L1FilterPruner
Parameters
----------
sparsities : list
list of sparsities
Returns
-------
list of dict
config_list for LevelPruner or L1FilterPruner
'''
config_list = []
sparsities = sorted(sparsities)
self.modules_wrapper = sorted(
self.modules_wrapper, key=lambda wrapper: wrapper.module.weight.data.numel())
# a layer with more weights is assigned a pruning rate no smaller than a layer with fewer weights
for idx, wrapper in enumerate(self.get_modules_wrapper()):
# L1FilterPruner requires op_types to be specified
if self._base_algo in ['l1', 'l2']:
config_list.append(
{'sparsity': sparsities[idx], 'op_types': ['Conv2d'], 'op_names': [wrapper.name]})
elif self._base_algo == 'level':
config_list.append(
{'sparsity': sparsities[idx], 'op_names': [wrapper.name]})
config_list = [val for val in config_list if not math.isclose(val['sparsity'], 0, abs_tol=1e-6)]
return config_list
def _rescale_sparsities(self, sparsities, target_sparsity):
'''
Rescale the sparsities list to satisfy the target overall sparsity
Parameters
----------
sparsities : list
target_sparsity : float
the target overall sparsity
Returns
-------
list
the rescaled sparsities
'''
num_weights = []
for wrapper in self.get_modules_wrapper():
num_weights.append(wrapper.module.weight.data.numel())
num_weights = sorted(num_weights)
sparsities = sorted(sparsities)
total_weights = 0
total_weights_pruned = 0
# calculate the scale
for idx, num_weight in enumerate(num_weights):
total_weights += num_weight
total_weights_pruned += int(num_weight*sparsities[idx])
if total_weights_pruned == 0:
return None
scale = target_sparsity / (total_weights_pruned/total_weights)
# rescale the sparsities
sparsities = np.asarray(sparsities)*scale
return sparsities
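# Numeric sketch (illustrative values, not from the original file): with
# per-layer weight counts [100, 900] and candidate sparsities [0.2, 0.4],
# the overall pruned fraction is (100*0.2 + 900*0.4) / 1000 = 0.38; to hit
# a target overall sparsity of 0.5 the vector is scaled by 0.5 / 0.38,
# giving roughly [0.26, 0.53]. If scaling pushes any entry to 1 or above,
# the caller rejects the vector and samples a new one.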
def _init_sparsities(self):
'''
Generate a sorted sparsities vector
'''
# repeatedly generate a distribution until it satisfies the overall sparsity requirement
_logger.info('Generating sparsities...')
while True:
sparsities = sorted(np.random.uniform(
0, 1, len(self.get_modules_wrapper())))
sparsities = self._rescale_sparsities(
sparsities, target_sparsity=self._sparsity)
if sparsities is not None and sparsities[0] >= 0 and sparsities[-1] < 1:
_logger.info('Initial sparsities generated : %s', sparsities)
self._sparsities = sparsities
break
def _generate_perturbations(self):
'''
Generate perturbation to the current sparsities distribution.
Returns
-------
list
perturbed sparsities
'''
_logger.info("Gererating perturbations to the current sparsities...")
# decrease magnitude with current temperature
magnitude = self._current_temperature / \
self._start_temperature * self._perturbation_magnitude
_logger.info('current perturbation magnitude: %s', magnitude)
while True:
perturbation = np.random.uniform(-magnitude,
magnitude, len(self.get_modules_wrapper()))
sparsities = np.clip(self._sparsities + perturbation, 0, None)
_logger.debug("sparsities before rescalling:%s", sparsities)
sparsities = self._rescale_sparsities(
sparsities, target_sparsity=self._sparsity)
_logger.debug("sparsities after rescalling:%s", sparsities)
if sparsities is not None and sparsities[0] >= 0 and sparsities[-1] < 1:
_logger.info("Sparsities perturbated:%s", sparsities)
return sparsities
def calc_mask(self, wrapper, **kwargs):
return None
def compress(self, return_config_list=False):
"""
Compress the model with Simulated Annealing.
Returns
-------
torch.nn.Module
model with specified modules compressed.
"""
_logger.info('Starting Simulated Annealing Compression...')
# initialize a randomized sparsity distribution
pruning_iteration = 0
self._init_sparsities()
# stop condition
self._current_temperature = self._start_temperature
while self._current_temperature > self._stop_temperature:
_logger.info('Pruning iteration: %d', pruning_iteration)
_logger.info('Current temperature: %d, Stop temperature: %d',
self._current_temperature, self._stop_temperature)
while True:
# generate perturbation
sparsities_perturbated = self._generate_perturbations()
config_list = self._sparsities_2_config_list(
sparsities_perturbated)
_logger.info(
"config_list for Pruner generated: %s", config_list)
# fast evaluation
pruner = PRUNER_DICT[self._base_algo](copy.deepcopy(self._model_to_prune), config_list)
model_masked = pruner.compress()
evaluation_result = self._evaluator(model_masked)
self._search_history.append(
{'sparsity': self._sparsity, 'performance': evaluation_result, 'config_list': config_list})
if self._optimize_mode is OptimizeMode.Minimize:
evaluation_result *= -1
# if better evaluation result, then accept the perturbation
if evaluation_result > self._current_performance:
self._current_performance = evaluation_result
self._sparsities = sparsities_perturbated
# save best performance and best params
if evaluation_result > self._best_performance:
_logger.info('updating best model...')
self._best_performance = evaluation_result
self._best_config_list = config_list
# save the overall best masked model
self.bound_model = model_masked
break
# if not, accept with probability e^(-deltaE/current_temperature)
else:
delta_E = np.abs(evaluation_result -
self._current_performance)
probability = math.exp(-1 * delta_E /
self._current_temperature)
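# e.g. (illustrative numbers): with delta_E = 0.05 and
# current_temperature = 50 this evaluates to exp(-0.001) ≈ 0.999;
# a larger performance drop or a lower temperature both shrink the
# acceptance probability.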
if np.random.uniform(0, 1) < probability:
self._current_performance = evaluation_result
self._sparsities = sparsities_perturbated
break
# cool down
self._current_temperature *= self._cool_down_rate
pruning_iteration += 1
_logger.info('----------Compression finished--------------')
_logger.info('Best performance: %s', self._best_performance)
_logger.info('config_list found : %s',
self._best_config_list)
# save search history
with open(os.path.join(self._experiment_data_dir, 'search_history.csv'), 'w') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=['sparsity', 'performance', 'config_list'])
writer.writeheader()
for item in self._search_history:
writer.writerow({'sparsity': item['sparsity'], 'performance': item['performance'], 'config_list': json.dumps(
item['config_list'])})
# save best config found and best performance
if self._optimize_mode is OptimizeMode.Minimize:
self._best_performance *= -1
with open(os.path.join(self._experiment_data_dir, 'search_result.json'), 'w+') as jsonfile:
json.dump({
'performance': self._best_performance,
'config_list': json.dumps(self._best_config_list)
}, jsonfile)
_logger.info('search history and result saved to folder: %s',
self._experiment_data_dir)
if return_config_list:
return self._best_config_list
return self.bound_model
def get_total_num_weights(model, op_types=['default']):
'''
Calculate the total number of weights of the specified op types in the model
Returns
-------
int
total number of weights of all the ops considered
'''
num_weights = 0
for _, module in model.named_modules():
if module == model:
continue
if 'default' in op_types or type(module).__name__ in op_types:
num_weights += module.weight.data.numel()
return num_weights
\ No newline at end of file
...@@ -9,7 +9,7 @@ import math ...@@ -9,7 +9,7 @@ import math
from unittest import TestCase, main from unittest import TestCase, main
from nni.compression.torch import LevelPruner, SlimPruner, FPGMPruner, L1FilterPruner, \ from nni.compression.torch import LevelPruner, SlimPruner, FPGMPruner, L1FilterPruner, \
L2FilterPruner, AGP_Pruner, ActivationMeanRankFilterPruner, ActivationAPoZRankFilterPruner, \ L2FilterPruner, AGP_Pruner, ActivationMeanRankFilterPruner, ActivationAPoZRankFilterPruner, \
TaylorFOWeightFilterPruner TaylorFOWeightFilterPruner, NetAdaptPruner, SimulatedAnnealingPruner, ADMMPruner, AutoCompressPruner
def validate_sparsity(wrapper, sparsity, bias=False): def validate_sparsity(wrapper, sparsity, bias=False):
masks = [wrapper.weight_mask] masks = [wrapper.weight_mask]
...@@ -113,6 +113,47 @@ prune_config = { ...@@ -113,6 +113,47 @@ prune_config = {
'validators': [ 'validators': [
lambda model: validate_sparsity(model.conv1, 0.5, model.bias) lambda model: validate_sparsity(model.conv1, 0.5, model.bias)
] ]
},
'netadapt': {
'pruner_class': NetAdaptPruner,
'config_list': [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}],
'short_term_fine_tuner': lambda model:model,
'evaluator':lambda model: 0.9,
'validators': []
},
'simulatedannealing': {
'pruner_class': SimulatedAnnealingPruner,
'config_list': [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}],
'evaluator':lambda model: 0.9,
'validators': []
},
'admm': {
'pruner_class': ADMMPruner,
'config_list': [{
'sparsity': 0.5,
'op_types': ['Conv2d'],
}],
'trainer': lambda model, optimizer, criterion, epoch, callback : model,
'validators': [
lambda model: validate_sparsity(model.conv1, 0.5, model.bias)
]
},
'autocompress': {
'pruner_class': AutoCompressPruner,
'config_list': [{
'sparsity': 0.5,
'op_types': ['Conv2d'],
}],
'trainer': lambda model, optimizer, criterion, epoch, callback : model,
'evaluator': lambda model: 0.9,
'dummy_input': torch.randn([64, 1, 28, 28]),
'validators': []
} }
} }
...@@ -127,25 +168,36 @@ class Model(nn.Module): ...@@ -127,25 +168,36 @@ class Model(nn.Module):
def forward(self, x): def forward(self, x):
return self.fc(self.pool(self.bn1(self.conv1(x))).view(x.size(0), -1)) return self.fc(self.pool(self.bn1(self.conv1(x))).view(x.size(0), -1))
def pruners_test(pruner_names=['agp', 'level', 'slim', 'fpgm', 'l1', 'l2', 'taylorfo', 'mean_activation', 'apoz'], bias=True): def pruners_test(pruner_names=['level', 'agp', 'slim', 'fpgm', 'l1', 'l2', 'taylorfo', 'mean_activation', 'apoz', 'netadapt', 'simulatedannealing', 'admm', 'autocompress'], bias=True):
for pruner_name in pruner_names: for pruner_name in pruner_names:
model = Model(bias=bias) print('testing {}...'.format(pruner_name))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Model(bias=bias).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01) optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
config_list = prune_config[pruner_name]['config_list'] config_list = prune_config[pruner_name]['config_list']
x = torch.randn(2, 1, 28, 28) x = torch.randn(2, 1, 28, 28).to(device)
y = torch.tensor([0, 1]).long() y = torch.tensor([0, 1]).long().to(device)
out = model(x) out = model(x)
loss = F.cross_entropy(out, y) loss = F.cross_entropy(out, y)
optimizer.zero_grad() optimizer.zero_grad()
loss.backward() loss.backward()
optimizer.step() optimizer.step()
pruner = prune_config[pruner_name]['pruner_class'](model, config_list, optimizer) if pruner_name == 'netadapt':
pruner = prune_config[pruner_name]['pruner_class'](model, config_list, short_term_fine_tuner=prune_config[pruner_name]['short_term_fine_tuner'], evaluator=prune_config[pruner_name]['evaluator'])
elif pruner_name == 'simulatedannealing':
pruner = prune_config[pruner_name]['pruner_class'](model, config_list, evaluator=prune_config[pruner_name]['evaluator'])
elif pruner_name == 'admm':
pruner = prune_config[pruner_name]['pruner_class'](model, config_list, trainer=prune_config[pruner_name]['trainer'])
elif pruner_name == 'autocompress':
pruner = prune_config[pruner_name]['pruner_class'](model, config_list, trainer=prune_config[pruner_name]['trainer'], evaluator=prune_config[pruner_name]['evaluator'], dummy_input=x)
else:
pruner = prune_config[pruner_name]['pruner_class'](model, config_list, optimizer)
pruner.compress() pruner.compress()
x = torch.randn(2, 1, 28, 28) x = torch.randn(2, 1, 28, 28).to(device)
y = torch.tensor([0, 1]).long() y = torch.tensor([0, 1]).long().to(device)
out = model(x) out = model(x)
loss = F.cross_entropy(out, y) loss = F.cross_entropy(out, y)
optimizer.zero_grad() optimizer.zero_grad()
...@@ -157,14 +209,16 @@ def pruners_test(pruner_names=['agp', 'level', 'slim', 'fpgm', 'l1', 'l2', 'tayl ...@@ -157,14 +209,16 @@ def pruners_test(pruner_names=['agp', 'level', 'slim', 'fpgm', 'l1', 'l2', 'tayl
# when iteration >= statistics_batch_num (default 1) # when iteration >= statistics_batch_num (default 1)
optimizer.step() optimizer.step()
pruner.export_model('./model_tmp.pth', './mask_tmp.pth', './onnx_tmp.pth', input_shape=(2,1,28,28)) pruner.export_model('./model_tmp.pth', './mask_tmp.pth', './onnx_tmp.pth', input_shape=(2,1,28,28), device=device)
for v in prune_config[pruner_name]['validators']: for v in prune_config[pruner_name]['validators']:
v(model) v(model)
os.remove('./model_tmp.pth')
os.remove('./mask_tmp.pth') filePaths = ['./model_tmp.pth', './mask_tmp.pth', './onnx_tmp.pth', './search_history.csv', './search_result.json']
os.remove('./onnx_tmp.pth') for f in filePaths:
if os.path.exists(f):
os.remove(f)
def test_agp(pruning_algorithm): def test_agp(pruning_algorithm):
model = Model() model = Model()
......