Constraint-aware one-shot pruners (#2657)

Unverified Commit ec5af41f authored Sep 21, 2020 by Ningxin Zheng Committed by GitHub Sep 21, 2020
14 changed files
--- a/docs/en_US/Compressor/DependencyAware.md
+++ b/docs/en_US/Compressor/DependencyAware.md
+# Dependency-aware Mode for Filter Pruning
+
+Currently, we have several filter pruning algorithms for convolutional layers: FPGM Pruner, L1Filter Pruner, L2Filter Pruner, Activation APoZ Rank Filter Pruner, Activation Mean Rank Filter Pruner, and Taylor FO On Weight Pruner. In these filter pruning algorithms, the pruner prunes each convolutional layer separately. While pruning a convolutional layer, the algorithm quantifies the importance of each filter based on some specific rule (such as the L1 norm) and prunes the less important filters.
+
+As the [dependency analysis utils](./CompressionUtils.md) show, if the output channels of two convolutional layers (conv1, conv2) are added together, then these two layers have a channel dependency with each other (for more details, please see [Compression Utils](./CompressionUtils.md)). Take the following figure as an example.
+![](../../img/mask_conflict.jpg)
+
+Suppose we prune the first 50% of the output channels (filters) of conv1 and the last 50% of the output channels of conv2. Although both layers have pruned 50% of their filters, the speedup module still needs to add zeros to align the output channels. In this case, we cannot harvest the speed benefit from the model pruning.
+
+
+To better gain the speed benefit of model pruning, we add a dependency-aware mode for the filter pruners. In the dependency-aware mode, the pruner prunes the model based not only on the L1 norm of each filter but also on the topology of the whole network architecture.
+
+In the dependency-aware mode (`dependency_aware` is set to `True`), the pruner will try to prune the same output channels for the layers that have channel dependencies with each other, as shown in the following figure.
+
+![](../../img/dependency-aware.jpg)
+
+Take the dependency-aware mode of L1Filter Pruner as an example. Specifically, for each channel, the pruner calculates the sum of the L1 norms over all the layers in the dependency set. The number of channels that can be pruned jointly across this dependency set is determined by the minimum sparsity of the layers in the set (denoted by `min_sparsity`). According to the L1 norm sum of each channel, the pruner prunes the same `min_sparsity` fraction of channels for all the layers. Next, the pruner additionally prunes `sparsity` - `min_sparsity` of the channels for each convolutional layer based on that layer's own per-channel L1 norms. For example, suppose the output channels of `conv1` and `conv2` are added together and the configured sparsities of `conv1` and `conv2` are 0.3 and 0.2 respectively. In this case, the dependency-aware pruner will:
+
+- First, prune the same 20% of channels for `conv1` and `conv2` according to the L1 norm sum of `conv1` and `conv2`.
+- Second, additionally prune 10% of the channels of `conv1` according to the L1 norm of each channel of `conv1`.
+
+In addition, for convolutional layers that have more than one filter group, the dependency-aware pruner will also try to prune the same number of channels for each filter group. Overall, this pruner prunes the model according to the L1 norm of each filter and tries to meet the topological constraints (channel dependency, etc.) to improve the final speed gain after the speedup process.
+
+In the dependency-aware mode, the pruner will provide a better speed gain from the model pruning.
+
+## Usage
+In this section, we show how to enable the dependency-aware mode for the filter pruners. Currently, only one-shot pruners, such as FPGM Pruner, L1Filter Pruner, L2Filter Pruner, Activation APoZ Rank Filter Pruner, Activation Mean Rank Filter Pruner, and Taylor FO On Weight Pruner, support the dependency-aware mode.
+
+To enable the dependency-aware mode for `L1FilterPruner`:
+```python
+import torch
+from nni.compression.torch import L1FilterPruner
+config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
+# dummy_input is necessary for the dependency_aware mode
+dummy_input = torch.ones(1, 3, 224, 224).cuda()
+pruner = L1FilterPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
+# for L2FilterPruner
+# pruner = L2FilterPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
+# for FPGMPruner
+# pruner = FPGMPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
+# for ActivationAPoZRankFilterPruner
+# pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1, dependency_aware=True, dummy_input=dummy_input)
+# for ActivationMeanRankFilterPruner
+# pruner = ActivationMeanRankFilterPruner(model, config_list, statistics_batch_num=1, dependency_aware=True, dummy_input=dummy_input)
+# for TaylorFOWeightFilterPruner
+# pruner = TaylorFOWeightFilterPruner(model, config_list, statistics_batch_num=1, dependency_aware=True, dummy_input=dummy_input)
+
+pruner.compress()
+```
+
+## Evaluation
+To compare the performance of the pruner with and without the dependency-aware mode, we use L1FilterPruner to prune Mobilenet_v2 with the dependency-aware mode turned on and off. To simplify the experiment, we use uniform pruning, which means we allocate the same sparsity to all convolutional layers in the model.
+We trained a Mobilenet_v2 model on the cifar10 dataset and pruned the model starting from this pretrained checkpoint. The following figure shows the accuracy and FLOPs of the model pruned by different pruners.
+![](../../img/mobilev2_l1_cifar.jpg)
+
+In the figure, `Dependency-aware` represents the L1FilterPruner with the dependency-aware mode enabled, `L1 Filter` is the normal `L1FilterPruner` without the dependency-aware mode, and `No-Dependency` means the pruner only prunes the layers that have no channel dependency with other layers. As the figure shows, when the dependency-aware mode is enabled, the pruner achieves higher accuracy under the same FLOPs.
\ No newline at end of file
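
The two-stage selection described in `DependencyAware.md` above (shared channels first, then per-layer extras) can be sketched in plain Python. This is only an illustration under stated assumptions, not NNI's masker implementation: the function name `dependency_aware_l1_prune`, the toy per-channel norms, and the use of `round` to turn a sparsity into a channel count are all hypothetical.

```python
from typing import Dict, List, Set


def dependency_aware_l1_prune(
    l1_norms: Dict[str, List[float]],
    sparsities: Dict[str, float],
) -> Dict[str, Set[int]]:
    """Return, per layer, the set of output-channel indices to prune.

    `l1_norms` maps every layer of ONE dependency set to its per-channel
    L1 norms; all layers are assumed to have the same number of channels.
    (Hypothetical helper, not part of NNI.)
    """
    names = list(l1_norms)
    n_channels = len(l1_norms[names[0]])
    min_sparsity = min(sparsities[name] for name in names)

    # Stage 1: rank channels by the L1-norm sum across the dependency set
    # and prune the same lowest-ranked channels in every layer.
    summed = [sum(l1_norms[name][c] for name in names) for c in range(n_channels)]
    n_shared = round(min_sparsity * n_channels)
    shared = set(sorted(range(n_channels), key=lambda c: summed[c])[:n_shared])

    pruned = {}
    for name in names:
        # Stage 2: prune (sparsity - min_sparsity) extra channels, ranked
        # by this layer's own L1 norms among the channels still kept.
        n_total = round(sparsities[name] * n_channels)
        remaining = [c for c in range(n_channels) if c not in shared]
        remaining.sort(key=lambda c: l1_norms[name][c])
        pruned[name] = shared | set(remaining[: n_total - n_shared])
    return pruned


# Toy data: conv1 and conv2 are added together, sparsities 0.3 and 0.2.
norms = {
    "conv1": [1.0, 9.0, 8.0, 2.0, 7.0, 6.0, 5.0, 10.0, 3.0, 4.0],
    "conv2": [2.0, 8.0, 9.0, 1.0, 6.0, 7.0, 9.0, 10.0, 8.0, 5.0],
}
pruned = dependency_aware_l1_prune(norms, {"conv1": 0.3, "conv2": 0.2})
print(pruned)
```

With these toy norms, both layers drop channels 0 and 3 (the two smallest L1 norm sums), and `conv1` drops channel 8 on top of that, so the channels kept by `conv2` are a superset of those kept by `conv1` and the pruned positions line up.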
--- a/docs/en_US/Compressor/Pruner.md
+++ b/docs/en_US/Compressor/Pruner.md
@@ -114,7 +114,9 @@ FPGMPruner prune filters with the smallest geometric median.

 ![](../../img/fpgm_fig1.png)

->Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance.
+>Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance. 
+
+We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.

 ### Usage

@@ -154,6 +156,8 @@ This is an one-shot pruner, In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https:
 > 4. A new kernel matrix is created for both the ![](http://latex.codecogs.com/gif.latex?i)th and ![](http://latex.codecogs.com/gif.latex?i+1)th layers, and the remaining kernel
 >      weights are copied to the new model.

+In addition, we also provide a dependency-aware mode for the L1FilterPruner. For more details about the dependency-aware mode, please refer to [dependency-aware mode](./DependencyAware.md).
+
 ### Usage

 PyTorch code
@@ -189,6 +193,8 @@ The experiments code can be found at [examples/model_compress]( https://github.c

 This is a structured pruning algorithm that prunes the filters with the smallest L2 norm of the weights. It is implemented as a one-shot pruner.

+We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
+
 ### Usage

 PyTorch code
@@ -200,6 +206,7 @@ pruner = L2FilterPruner(model, config_list)
 pruner.compress()
 ```

+
 ### User configuration for L2Filter Pruner

 ##### PyTorch
@@ -208,6 +215,7 @@ pruner.compress()
 ```
 ***

+
 ## ActivationAPoZRankFilter Pruner

 ActivationAPoZRankFilter Pruner is a pruner which prunes the filters with the smallest importance criterion `APoZ` calculated from the output activations of convolution layers to achieve a preset level of network sparsity. The pruning criterion `APoZ` is explained in the paper [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250).
@@ -216,6 +224,8 @@ The APoZ is defined as:

 ![](../../img/apoz.png)

+We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
+
 ### Usage

 PyTorch code
@@ -234,6 +244,8 @@ Note: ActivationAPoZRankFilterPruner is used to prune convolutional layers withi

 You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/model_prune_torch.py) for more information.

+
+
 ### User configuration for ActivationAPoZRankFilter Pruner

 ##### PyTorch
@@ -247,6 +259,8 @@ You can view [example](https://github.com/microsoft/nni/blob/master/examples/mod

 ActivationMeanRankFilterPruner is a pruner which prunes the filters with the smallest importance criterion `mean activation` calculated from the output activations of convolution layers to achieve a preset level of network sparsity. The pruning criterion `mean activation` is explained in section 2.2 of the paper[Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440). Other pruning criteria mentioned in this paper will be supported in future release.

+We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
+
 ### Usage

 PyTorch code
@@ -265,6 +279,7 @@ Note: ActivationMeanRankFilterPruner is used to prune convolutional layers withi

 You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/model_prune_torch.py) for more information.

+
 ### User configuration for ActivationMeanRankFilterPruner

 ##### PyTorch
@@ -273,6 +288,7 @@ You can view [example](https://github.com/microsoft/nni/blob/master/examples/mod
 ```
 ***

+
 ## TaylorFOWeightFilter Pruner

 TaylorFOWeightFilter Pruner is a pruner which prunes convolutional layers based on estimated importance calculated from the first order taylor expansion on weights to achieve a preset level of network sparsity. The estimated importance of filters is defined as the paper [Importance Estimation for Neural Network Pruning](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf). Other pruning criteria mentioned in this paper will be supported in future release.
@@ -281,6 +297,8 @@ TaylorFOWeightFilter Pruner is a pruner which prunes convolutional layers based

 ![](../../img/importance_estimation_sum.png)

+We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
+
 ### Usage

 PyTorch code

--- a/docs/en_US/model_compression.rst
+++ b/docs/en_US/model_compression.rst
@@ -17,7 +17,7 @@ For details, please refer to the following tutorials:

    Overview <Compressor/Overview>
    Quick Start <Compressor/QuickStart>
-    Pruners <Compressor/Pruner>
+    Pruning <pruning>
    Quantizers <Compressor/Quantizer>
    Automatic Model Compression <Compressor/AutoCompression>
    Model Speedup <Compressor/ModelSpeedup>

--- a/docs/en_US/pruning.rst
+++ b/docs/en_US/pruning.rst
+#################
+Pruning
+#################
+
+NNI provides several pruning algorithms that support fine-grained weight pruning and structural filter pruning.
+It supports TensorFlow and PyTorch with a unified interface.
+To prune their models, users only need to add several lines to their code.
+For structural filter pruning, NNI also provides a dependency-aware mode. In the dependency-aware mode, the
+filter pruner will get a better speed gain after the speedup.
+
+For details, please refer to the following tutorials:
+
+..  toctree::
+    :maxdepth: 2
+
+    Pruners <Compressor/Pruner>
+    Dependency Aware Mode <Compressor/DependencyAware>
--- a/docs/img/dependency-aware.jpg
+++ b/docs/img/dependency-aware.jpg
--- a/docs/img/mask_conflict.jpg
+++ b/docs/img/mask_conflict.jpg
--- a/docs/img/mobilev2_l1_cifar.jpg
+++ b/docs/img/mobilev2_l1_cifar.jpg
--- a/examples/model_compress/model_prune_torch.py
+++ b/examples/model_compress/model_prune_torch.py
@@ -48,7 +48,7 @@ prune_config = {
        'dataset_name': 'mnist',
        'model_name': 'naive',
        'pruner_class': FPGMPruner,
-        'config_list':[{
+        'config_list': [{
            'sparsity': 0.5,
            'op_types': ['Conv2d']
        }]
@@ -85,6 +85,7 @@ prune_config = {
    }
 }

+
 def get_data_loaders(dataset_name='mnist', batch_size=128):
    assert dataset_name in ['cifar10', 'mnist']

@@ -98,20 +99,23 @@ def get_data_loaders(dataset_name='mnist', batch_size=128):
    train_loader = DataLoader(
        ds_class(
            './data', train=True, download=True,
-            transform=transforms.Compose([transforms.ToTensor(), transforms.Normalize(MEAN, STD)])
+            transform=transforms.Compose(
+                [transforms.ToTensor(), transforms.Normalize(MEAN, STD)])
        ),
        batch_size=batch_size, shuffle=True
    )
    test_loader = DataLoader(
        ds_class(
            './data', train=False, download=True,
-            transform=transforms.Compose([transforms.ToTensor(), transforms.Normalize(MEAN, STD)])
+            transform=transforms.Compose(
+                [transforms.ToTensor(), transforms.Normalize(MEAN, STD)])
        ),
        batch_size=batch_size, shuffle=False
    )

    return train_loader, test_loader

+
 class NaiveModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
@@ -132,6 +136,7 @@ class NaiveModel(torch.nn.Module):
        x = self.fc2(x)
        return x

+
 def create_model(model_name='naive'):
    assert model_name in ['naive', 'vgg16', 'vgg19']

@@ -142,10 +147,18 @@ def create_model(model_name='naive'):
    else:
        return VGG(19)

-def create_pruner(model, pruner_name, optimizer=None):
+
+def create_pruner(model, pruner_name, optimizer=None, dependency_aware=False, dummy_input=None):
    pruner_class = prune_config[pruner_name]['pruner_class']
    config_list = prune_config[pruner_name]['config_list']
-    return pruner_class(model, config_list, optimizer)
+    kw_args = {}
+    if dependency_aware:
+        print('Enable the dependency_aware mode')
+        # note that, not all pruners support the dependency_aware mode
+        kw_args['dependency_aware'] = True
+        kw_args['dummy_input'] = dummy_input
+    pruner = pruner_class(model, config_list, optimizer, **kw_args)
+    return pruner

 def train(model, device, train_loader, optimizer):
    model.train()
@@ -157,7 +170,9 @@ def train(model, device, train_loader, optimizer):
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
-            print('{:2.0f}%  Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
+            print('{:2.0f}%  Loss {}'.format(
+                100 * batch_idx / len(train_loader), loss.item()))
+

 def test(model, device, test_loader):
    model.eval()
@@ -167,7 +182,8 @@ def test(model, device, test_loader):
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
-            test_loss += F.cross_entropy(output, target, reduction='sum').item()
+            test_loss += F.cross_entropy(output,
+                                         target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
@@ -177,20 +193,25 @@ def test(model, device, test_loader):
        test_loss, acc))
    return acc

+
 def main(args):
-    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
+    device = torch.device(
+        'cuda') if torch.cuda.is_available() else torch.device('cpu')
    os.makedirs(args.checkpoints_dir, exist_ok=True)

    model_name = prune_config[args.pruner_name]['model_name']
    dataset_name = prune_config[args.pruner_name]['dataset_name']
    train_loader, test_loader = get_data_loaders(dataset_name, args.batch_size)
+    dummy_input, _ = next(iter(train_loader))
+    dummy_input = dummy_input.to(device)
    model = create_model(model_name).cuda()
    if args.resume_from is not None and os.path.exists(args.resume_from):
        print('loading checkpoint {} ...'.format(args.resume_from))
        model.load_state_dict(torch.load(args.resume_from))
        test(model, device, test_loader)
    else:
-        optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
+        optimizer = torch.optim.SGD(
+            model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
        if args.multi_gpu and torch.cuda.device_count():
            model = nn.DataParallel(model)

@@ -204,17 +225,21 @@ def main(args):

    print('start model pruning...')

-    model_path = os.path.join(args.checkpoints_dir, 'pruned_{}_{}_{}.pth'.format(model_name, dataset_name, args.pruner_name))
-    mask_path = os.path.join(args.checkpoints_dir, 'mask_{}_{}_{}.pth'.format(model_name, dataset_name, args.pruner_name))
+    model_path = os.path.join(args.checkpoints_dir, 'pruned_{}_{}_{}.pth'.format(
+        model_name, dataset_name, args.pruner_name))
+    mask_path = os.path.join(args.checkpoints_dir, 'mask_{}_{}_{}.pth'.format(
+        model_name, dataset_name, args.pruner_name))

    # pruner needs to be initialized from a model not wrapped by DataParallel
    if isinstance(model, nn.DataParallel):
        model = model.module

-    optimizer_finetune = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
+    optimizer_finetune = torch.optim.SGD(
+        model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
    best_top1 = 0

-    pruner = create_pruner(model, args.pruner_name, optimizer_finetune)
+    pruner = create_pruner(model, args.pruner_name,
+                           optimizer_finetune, args.dependency_aware, dummy_input)
    model = pruner.compress()

    if args.multi_gpu and torch.cuda.device_count() > 1:
@@ -231,15 +256,23 @@ def main(args):
            # mask_path stores mask_dict of the pruned model
            pruner.export_model(model_path=model_path, mask_path=mask_path)

+
 if __name__ == '__main__':
    parser = argparse.ArgumentParser()
-    parser.add_argument("--pruner_name", type=str, default="level", help="pruner name")
+    parser.add_argument("--pruner_name", type=str,
+                        default="level", help="pruner name")
    parser.add_argument("--batch_size", type=int, default=256)
-    parser.add_argument("--pretrain_epochs", type=int, default=10, help="training epochs before model pruning")
-    parser.add_argument("--prune_epochs", type=int, default=10, help="training epochs for model pruning")
-    parser.add_argument("--checkpoints_dir", type=str, default="./checkpoints", help="checkpoints directory")
-    parser.add_argument("--resume_from", type=str, default=None, help="pretrained model weights")
-    parser.add_argument("--multi_gpu", action="store_true", help="Use multiple GPUs for training")
-
+    parser.add_argument("--pretrain_epochs", type=int,
+                        default=10, help="training epochs before model pruning")
+    parser.add_argument("--prune_epochs", type=int, default=10,
+                        help="training epochs for model pruning")
+    parser.add_argument("--checkpoints_dir", type=str,
+                        default="./checkpoints", help="checkpoints directory")
+    parser.add_argument("--resume_from", type=str,
+                        default=None, help="pretrained model weights")
+    parser.add_argument("--multi_gpu", action="store_true",
+                        help="Use multiple GPUs for training")
+    parser.add_argument("--dependency_aware", action="store_true", default=False,
+                        help="Whether to enable the dependency_aware mode for the pruner")
    args = parser.parse_args()
    main(args)
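
The reason `create_pruner` in the example above builds `kw_args` only when `--dependency_aware` is given is that, as its comment notes, not every pruner accepts the two extra arguments. A minimal sketch of that pattern, using hypothetical stand-in classes (`PlainPruner`, `FilterPruner`) rather than the real NNI pruners:

```python
class PlainPruner:
    """Hypothetical stand-in for a pruner WITHOUT dependency-aware support."""

    def __init__(self, model, config_list, optimizer=None):
        self.dependency_aware = False


class FilterPruner:
    """Hypothetical stand-in for a pruner that accepts the extra arguments."""

    def __init__(self, model, config_list, optimizer=None,
                 dependency_aware=False, dummy_input=None):
        if dependency_aware and dummy_input is None:
            raise ValueError("dummy_input is required when dependency_aware is set")
        self.dependency_aware = dependency_aware


def create_pruner(pruner_class, model, config_list, optimizer=None,
                  dependency_aware=False, dummy_input=None):
    # Only forward the extra keyword arguments when the mode is requested,
    # so pruners that do not know them keep working in the default case.
    kw_args = {}
    if dependency_aware:
        kw_args["dependency_aware"] = True
        kw_args["dummy_input"] = dummy_input
    return pruner_class(model, config_list, optimizer, **kw_args)


plain = create_pruner(PlainPruner, model=None, config_list=[])
aware = create_pruner(FilterPruner, model=None, config_list=[],
                      dependency_aware=True, dummy_input=object())
print(plain.dependency_aware, aware.dependency_aware)
```

In this sketch, requesting the mode for a class that lacks the parameters fails with a `TypeError` at construction time, which surfaces the unsupported combination immediately instead of silently ignoring the flag.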
--- a/src/sdk/pynni/nni/compression/torch/pruning/__init__.py
+++ b/src/sdk/pynni/nni/compression/torch/pruning/__init__.py
@@ -13,4 +13,3 @@ from .admm_pruner import ADMMPruner
 from .auto_compress_pruner import AutoCompressPruner
 from .sensitivity_pruner import SensitivityPruner
 from .amc import AMCPruner
-
--- a/src/sdk/pynni/nni/compression/torch/pruning/one_shot.py
+++ b/src/sdk/pynni/nni/compression/torch/pruning/one_shot.py
@@ -3,14 +3,19 @@

 import logging
 from schema import And, Optional
+from nni._graph_utils import TorchModuleGraph
+from nni.compression.torch.utils.shape_dependency import ChannelDependency, GroupDependency
 from .constants import MASKER_DICT
 from ..utils.config_validation import CompressorSchema
 from ..compressor import Pruner

-__all__ = ['LevelPruner', 'SlimPruner', 'L1FilterPruner', 'L2FilterPruner', 'FPGMPruner', \
-    'TaylorFOWeightFilterPruner', 'ActivationAPoZRankFilterPruner', 'ActivationMeanRankFilterPruner']

-logger = logging.getLogger('torch pruner')
+__all__ = ['LevelPruner', 'SlimPruner', 'L1FilterPruner', 'L2FilterPruner', 'FPGMPruner',
+           'TaylorFOWeightFilterPruner', 'ActivationAPoZRankFilterPruner', 'ActivationMeanRankFilterPruner']
+
+logger = logging.getLogger(__name__)
+logger.setLevel(logging.INFO)
+

 class OneshotPruner(Pruner):
    """
@@ -35,7 +40,8 @@ class OneshotPruner(Pruner):

        super().__init__(model, config_list, optimizer)
        self.set_wrappers_attribute("if_calculated", False)
-        self.masker = MASKER_DICT[pruning_algorithm](model, self, **algo_kwargs)
+        self.masker = MASKER_DICT[pruning_algorithm](
+            model, self, **algo_kwargs)

    def validate_config(self, model, config_list):
        """
@@ -75,7 +81,8 @@ class OneshotPruner(Pruner):

        sparsity = wrapper.config['sparsity']
        if not wrapper.if_calculated:
-            masks = self.masker.calc_mask(sparsity=sparsity, wrapper=wrapper, wrapper_idx=wrapper_idx)
+            masks = self.masker.calc_mask(
+                sparsity=sparsity, wrapper=wrapper, wrapper_idx=wrapper_idx)

            # masker.calc_mask returns None means calc_mask is not calculated sucessfully, can try later
            if masks is not None:
@@ -84,6 +91,7 @@ class OneshotPruner(Pruner):
        else:
            return None

+
 class LevelPruner(OneshotPruner):
    """
    Parameters
@@ -97,9 +105,11 @@ class LevelPruner(OneshotPruner):
    optimizer: torch.optim.Optimizer
            Optimizer used to train model
    """
+
    def __init__(self, model, config_list, optimizer=None):
        super().__init__(model, config_list, pruning_algorithm='level', optimizer=optimizer)

+
 class SlimPruner(OneshotPruner):
    """
    Parameters
@@ -113,6 +123,7 @@ class SlimPruner(OneshotPruner):
    optimizer: torch.optim.Optimizer
            Optimizer used to train model
    """
+
    def __init__(self, model, config_list, optimizer=None):
        super().__init__(model, config_list, pruning_algorithm='slim', optimizer=optimizer)

@@ -128,9 +139,50 @@ class SlimPruner(OneshotPruner):
        if len(config_list) > 1:
            logger.warning('Slim pruner only supports 1 configuration')

+
 class _StructuredFilterPruner(OneshotPruner):
-    def __init__(self, model, config_list, pruning_algorithm, optimizer=None, **algo_kwargs):
-        super().__init__(model, config_list, pruning_algorithm=pruning_algorithm, optimizer=optimizer, **algo_kwargs)
+    """
+    _StructuredFilterPruner has two ways to calculate the masks
+    for conv layers. In the normal way, the _StructuredFilterPruner
+    will calculate the mask of each layer separately. For example, each
+    conv layer determines which filters should be pruned according to its L1
+    norm. In contrast, in the dependency-aware way, the layers that are in a
+    dependency group will be pruned jointly and these layers will be forced
+    to prune the same channels.
+    """
+
+    def __init__(self, model, config_list, pruning_algorithm, optimizer=None, dependency_aware=False, dummy_input=None, **algo_kwargs):
+        super().__init__(model, config_list, pruning_algorithm=pruning_algorithm,
+                         optimizer=optimizer, **algo_kwargs)
+        self.dependency_aware = dependency_aware
+        # set the dependency-aware switch for the masker
+        self.masker.dependency_aware = dependency_aware
+        self.dummy_input = dummy_input
+        if self.dependency_aware:
+            errmsg = "When dependency_aware is set, the dummy_input should not be None"
+            assert self.dummy_input is not None, errmsg
+            # Get the TorchModuleGraph of the target model
+            # to trace the model, we need to unwrap the wrappers
+            self._unwrap_model()
+            self.graph = TorchModuleGraph(model, dummy_input)
+            self._wrap_model()
+            self.channel_depen = ChannelDependency(
+                traced_model=self.graph.trace)
+            self.group_depen = GroupDependency(traced_model=self.graph.trace)
+            self.channel_depen = self.channel_depen.dependency_sets
+            self.channel_depen = {
+                name: sets for sets in self.channel_depen for name in sets}
+            self.group_depen = self.group_depen.dependency_sets
+
+    def update_mask(self):
+        if not self.dependency_aware:
+            # if we use the normal way to update the mask,
+            # then call the update_mask of the parent class
+            super(_StructuredFilterPruner, self).update_mask()
+        else:
+            # if we update the mask in a dependency-aware way
+            # then we call _dependency_update_mask
+            self._dependency_update_mask()

    def validate_config(self, model, config_list):
        schema = CompressorSchema([{
@@ -141,6 +193,71 @@ class _StructuredFilterPruner(OneshotPruner):

        schema.validate(config_list)

+    def _dependency_calc_mask(self, wrappers, channel_dsets, wrappers_idx=None):
+        """
+        Calculate the masks for the conv layers in the same
+        channel dependency set. All the layers passed in have
+        the same number of channels.
+
+        Parameters
+        ----------
+        wrappers: list
+            The list of the wrappers that are in the same channel dependency
+            set.
+        channel_dsets: list
+            The names of the layers in this channel dependency set.
+        wrappers_idx: list
+            The list of the indexes of the wrappers.
+
+        Returns
+        -------
+        masks: dict
+            A dict object that contains the masks of the layers in this
+            dependency group; the keys are the names of the convolutional layers.
+        """
+        # The number of groups for each conv layer.
+        # Note that this number may be different from the layer's
+        # original number of filter groups.
+        groups = [self.group_depen[_w.name] for _w in wrappers]
+        sparsities = [_w.config['sparsity'] for _w in wrappers]
+        masks = self.masker.calc_mask(
+            sparsities, wrappers, wrappers_idx, channel_dsets=channel_dsets, groups=groups)
+        if masks is not None:
+            # if masks is None, then the mask calculation fails.
+            # for example, in activation related maskers, we should
+            # pass enough batches of data to the model, so that the
+            # masks can be calculated successfully.
+            for _w in wrappers:
+                _w.if_calculated = True
+        return masks
+
+    def _dependency_update_mask(self):
+        """
+        In the original update_mask, the wrapper of each layer will update its
+        own mask according to the sparsity specified in the config_list. However, in
+        the _dependency_update_mask, we may prune several layers at the same
+        time according to the sparsities and the channel/group dependencies.
+        """
+        name2wrapper = {x.name: x for x in self.get_modules_wrapper()}
+        wrapper2index = {x: i for i, x in enumerate(self.get_modules_wrapper())}
+        for wrapper in self.get_modules_wrapper():
+            if wrapper.if_calculated:
+                continue
+            # find all the conv layers that have a channel dependency with this layer
+            # and prune all these layers at the same time.
+            _names = [x for x in self.channel_depen[wrapper.name]]
+            logger.info('Pruning the dependent layers: %s', ','.join(_names))
+            _wrappers = [name2wrapper[name]
+                         for name in _names if name in name2wrapper]
+            _wrapper_idxes = [wrapper2index[_w] for _w in _wrappers]
+
+            masks = self._dependency_calc_mask(
+                _wrappers, _names, wrappers_idx=_wrapper_idxes)
+            if masks is not None:
+                for layer in masks:
+                    for mask_type in masks[layer]:
+                        assert hasattr(
+                            name2wrapper[layer], mask_type), "there is no attribute '%s' in wrapper on %s" % (mask_type, layer)
+                        setattr(name2wrapper[layer], mask_type, masks[layer][mask_type])
+
+
 class L1FilterPruner(_StructuredFilterPruner):
    """
    Parameters
@@ -153,9 +270,23 @@ class L1FilterPruner(_StructuredFilterPruner):
            - op_types : Only Conv2d is supported in L1FilterPruner.
    optimizer: torch.optim.Optimizer
            Optimizer used to train model
+    dependency_aware: bool
+        Whether to prune the model in a dependency-aware way. If it is `True`, this pruner will
+        prune the model according to the l1-norm of weights and the channel-dependency or
+        group-dependency of the model. In this way, the pruner will force the conv layers
+        that have dependencies to prune the same channels, so the speedup module can better
+        harvest the speed benefit from the pruned model. Note that if this flag is set to True,
+        the dummy_input cannot be None, because the pruner needs a dummy input to trace the
+        dependency between the conv layers.
+    dummy_input : torch.Tensor
+        The dummy input to analyze the topology constraints. Note that the dummy_input
+        should be on the same device as the model.
    """
-    def __init__(self, model, config_list, optimizer=None):
-        super().__init__(model, config_list, pruning_algorithm='l1', optimizer=optimizer)
+
+    def __init__(self, model, config_list, optimizer=None, dependency_aware=False, dummy_input=None):
+        super().__init__(model, config_list, pruning_algorithm='l1', optimizer=optimizer,
+                         dependency_aware=dependency_aware, dummy_input=dummy_input)
+

 class L2FilterPruner(_StructuredFilterPruner):
    """
@@ -169,9 +300,23 @@ class L2FilterPruner(_StructuredFilterPruner):
            - op_types : Only Conv2d is supported in L2FilterPruner.
    optimizer: torch.optim.Optimizer
            Optimizer used to train model
+    dependency_aware: bool
+        Whether to prune the model in a dependency-aware way. If it is `True`, this pruner will
+        prune the model according to the l2-norm of weights and the channel-dependency or
+        group-dependency of the model. In this way, the pruner will force the conv layers
+        that have dependencies to prune the same channels, so the speedup module can better
+        harvest the speed benefit from the pruned model. Note that if this flag is set to `True`,
+        the dummy_input cannot be None, because the pruner needs a dummy input to trace the
+        dependency between the conv layers.
+    dummy_input : torch.Tensor
+        The dummy input used to analyze the topology constraints. Note that the dummy_input
+        should be on the same device as the model.
    """
-    def __init__(self, model, config_list, optimizer=None):
-        super().__init__(model, config_list, pruning_algorithm='l2', optimizer=optimizer)
+
+    def __init__(self, model, config_list, optimizer=None, dependency_aware=False, dummy_input=None):
+        super().__init__(model, config_list, pruning_algorithm='l2', optimizer=optimizer,
+                         dependency_aware=dependency_aware, dummy_input=dummy_input)
+

 class FPGMPruner(_StructuredFilterPruner):
    """
@@ -185,9 +330,23 @@ class FPGMPruner(_StructuredFilterPruner):
            - op_types : Only Conv2d is supported in FPGM Pruner.
    optimizer: torch.optim.Optimizer
            Optimizer used to train model
+    dependency_aware: bool
+        Whether to prune the model in a dependency-aware way. If it is `True`, this pruner will
+        prune the model according to its own filter-importance criterion and the channel-dependency or
+        group-dependency of the model. In this way, the pruner will force the conv layers
+        that have dependencies to prune the same channels, so the speedup module can better
+        harvest the speed benefit from the pruned model. Note that if this flag is set to `True`,
+        the dummy_input cannot be None, because the pruner needs a dummy input to trace the
+        dependency between the conv layers.
+    dummy_input : torch.Tensor
+        The dummy input used to analyze the topology constraints. Note that the dummy_input
+        should be on the same device as the model.
    """
-    def __init__(self, model, config_list, optimizer=None):
-        super().__init__(model, config_list, pruning_algorithm='fpgm', optimizer=optimizer)
+
+    def __init__(self, model, config_list, optimizer=None, dependency_aware=False, dummy_input=None):
+        super().__init__(model, config_list, pruning_algorithm='fpgm',
+                         dependency_aware=dependency_aware, dummy_input=dummy_input, optimizer=optimizer)
+

 class TaylorFOWeightFilterPruner(_StructuredFilterPruner):
    """
@@ -201,9 +360,28 @@ class TaylorFOWeightFilterPruner(_StructuredFilterPruner):
            - op_types : Currently only Conv2d is supported in TaylorFOWeightFilterPruner.
    optimizer: torch.optim.Optimizer
            Optimizer used to train model
+    statistics_batch_num: int
+        The number of batches used to collect the statistics.
+    dependency_aware: bool
+        Whether to prune the model in a dependency-aware way. If it is `True`, this pruner will
+        prune the model according to its own filter-importance criterion and the channel-dependency or
+        group-dependency of the model. In this way, the pruner will force the conv layers
+        that have dependencies to prune the same channels, so the speedup module can better
+        harvest the speed benefit from the pruned model. Note that if this flag is set to `True`,
+        the dummy_input cannot be None, because the pruner needs a dummy input to trace the
+        dependency between the conv layers.
+    dummy_input : torch.Tensor
+        The dummy input used to analyze the topology constraints. Note that the dummy_input
+        should be on the same device as the model.
+
    """
-    def __init__(self, model, config_list, optimizer=None, statistics_batch_num=1):
-        super().__init__(model, config_list, pruning_algorithm='taylorfo', optimizer=optimizer, statistics_batch_num=statistics_batch_num)
+
+    def __init__(self, model, config_list, optimizer=None, statistics_batch_num=1,
+                 dependency_aware=False, dummy_input=None):
+        super().__init__(model, config_list, pruning_algorithm='taylorfo',
+                         dependency_aware=dependency_aware, dummy_input=dummy_input,
+                         optimizer=optimizer, statistics_batch_num=statistics_batch_num)
+

 class ActivationAPoZRankFilterPruner(_StructuredFilterPruner):
    """
@@ -217,10 +395,30 @@ class ActivationAPoZRankFilterPruner(_StructuredFilterPruner):
            - op_types : Only Conv2d is supported in ActivationAPoZRankFilterPruner.
    optimizer: torch.optim.Optimizer
            Optimizer used to train model
+    activation: str
+        The activation type.
+    statistics_batch_num: int
+        The number of batches used to collect the activation statistics.
+    dependency_aware: bool
+        Whether to prune the model in a dependency-aware way. If it is `True`, this pruner will
+        prune the model according to its own filter-importance criterion and the channel-dependency or
+        group-dependency of the model. In this way, the pruner will force the conv layers
+        that have dependencies to prune the same channels, so the speedup module can better
+        harvest the speed benefit from the pruned model. Note that if this flag is set to `True`,
+        the dummy_input cannot be None, because the pruner needs a dummy input to trace the
+        dependency between the conv layers.
+    dummy_input : torch.Tensor
+        The dummy input used to analyze the topology constraints. Note that the dummy_input
+        should be on the same device as the model.
+
    """
-    def __init__(self, model, config_list, optimizer=None, activation='relu', statistics_batch_num=1):
-        super().__init__(model, config_list, pruning_algorithm='apoz', optimizer=optimizer, \
-            activation=activation, statistics_batch_num=statistics_batch_num)
+
+    def __init__(self, model, config_list, optimizer=None, activation='relu',
+                 statistics_batch_num=1, dependency_aware=False, dummy_input=None):
+        super().__init__(model, config_list, pruning_algorithm='apoz', optimizer=optimizer,
+                         dependency_aware=dependency_aware, dummy_input=dummy_input,
+                         activation=activation, statistics_batch_num=statistics_batch_num)
+

 class ActivationMeanRankFilterPruner(_StructuredFilterPruner):
    """
@@ -233,8 +431,26 @@ class ActivationMeanRankFilterPruner(_StructuredFilterPruner):
            - sparsity : How much percentage of convolutional filters are to be pruned.
            - op_types : Only Conv2d is supported in ActivationMeanRankFilterPruner.
    optimizer: torch.optim.Optimizer
-            Optimizer used to train model
+            Optimizer used to train model.
+    activation: str
+        The activation type.
+    statistics_batch_num: int
+        The number of batches used to collect the activation statistics.
+    dependency_aware: bool
+        Whether to prune the model in a dependency-aware way. If it is `True`, this pruner will
+        prune the model according to its own filter-importance criterion and the channel-dependency or
+        group-dependency of the model. In this way, the pruner will force the conv layers
+        that have dependencies to prune the same channels, so the speedup module can better
+        harvest the speed benefit from the pruned model. Note that if this flag is set to `True`,
+        the dummy_input cannot be None, because the pruner needs a dummy input to trace the
+        dependency between the conv layers.
+    dummy_input : torch.Tensor
+        The dummy input used to analyze the topology constraints. Note that the dummy_input
+        should be on the same device as the model.
    """
-    def __init__(self, model, config_list, optimizer=None, activation='relu', statistics_batch_num=1):
-        super().__init__(model, config_list, pruning_algorithm='mean_activation', optimizer=optimizer, \
-            activation=activation, statistics_batch_num=statistics_batch_num)
+
+    def __init__(self, model, config_list, optimizer=None, activation='relu',
+                 statistics_batch_num=1, dependency_aware=False, dummy_input=None):
+        super().__init__(model, config_list, pruning_algorithm='mean_activation', optimizer=optimizer,
+                         dependency_aware=dependency_aware, dummy_input=dummy_input,
+                         activation=activation, statistics_batch_num=statistics_batch_num)
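
The docstrings above say that, in dependency-aware mode, conv layers with a channel dependency are forced to prune the same channels. The following is a minimal pure-Python sketch of that idea, not the code in this commit: the helper name `shared_channel_mask`, the choice to sum per-filter scores across the dependent layers, and the sample numbers are all assumptions made for illustration.

```python
def shared_channel_mask(layer_scores, sparsity):
    """Build one 0/1 channel mask shared by all layers in a dependency set.

    layer_scores: one list of per-filter importance scores (e.g. l1-norms)
                  per layer; every list must have the same length.
    sparsity:     fraction of channels to prune, in [0, 1).
    NOTE: hypothetical helper; summing scores across layers is an assumption.
    """
    num_channels = len(layer_scores[0])
    assert all(len(scores) == num_channels for scores in layer_scores)

    # Combine the importance of channel i over every dependent layer.
    combined = [sum(scores[i] for scores in layer_scores)
                for i in range(num_channels)]

    # Prune the channels with the lowest combined importance.
    num_prune = int(num_channels * sparsity)
    order = sorted(range(num_channels), key=lambda i: combined[i])
    pruned = set(order[:num_prune])

    return [0 if i in pruned else 1 for i in range(num_channels)]


# Two layers whose outputs are added together (made-up scores).
conv1_scores = [0.9, 0.1, 0.8, 0.2]
conv2_scores = [0.7, 0.3, 0.1, 0.6]
mask = shared_channel_mask([conv1_scores, conv2_scores], sparsity=0.5)
print(mask)  # the same mask would be applied to both conv1 and conv2
```

Because both layers receive an identical mask, no zero-padding is needed to realign their outputs when the pruned channels are physically removed.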
--- a/src/sdk/pynni/nni/compression/torch/pruning/structured_pruning.py
+++ b/src/sdk/pynni/nni/compression/torch/pruning/structured_pruning.py
--- a/src/sdk/pynni/nni/compression/torch/utils/mask_conflict.py
+++ b/src/sdk/pynni/nni/compression/torch/utils/mask_conflict.py
@@ -290,4 +290,5 @@ class ChannelMaskConflict(MaskFix):
            _logger.info('Pruned Filters after fixing conflict:')
            pruned_filters = set(list(range(ori_channels)))-channel_remain
            _logger.info(str(sorted(pruned_filters)))
+
        return self.masks
--- a/src/sdk/pynni/nni/compression/torch/utils/shape_dependency.py
+++ b/src/sdk/pynni/nni/compression/torch/utils/shape_dependency.py
@@ -484,3 +484,6 @@ class GroupDependency(Dependency):
            for name in self.dependency:
                group = self.dependency[name]
                csv_w.writerow([name, group])
+    @property
+    def dependency_sets(self):
+        return self.dependency
--- a/src/sdk/pynni/tests/test_dependecy_aware.py
+++ b/src/sdk/pynni/tests/test_dependecy_aware.py
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+
+import random
+import unittest
+from unittest import TestCase, main
+import torch
+import torch.nn as nn
+import torchvision.models as models
+import numpy as np
+
+from nni.compression.torch import L1FilterPruner, L2FilterPruner, FPGMPruner, \
+    TaylorFOWeightFilterPruner, ActivationAPoZRankFilterPruner, \
+    ActivationMeanRankFilterPruner
+from nni.compression.torch import ModelSpeedup
+
+unittest.TestLoader.sortTestMethodsUsing = None
+
+MODEL_FILE, MASK_FILE = './model.pth', './mask.pth'
+
+def generate_random_sparsity(model):
+    """
+    Generate a random sparsity for every conv layer in the
+    model.
+    """
+    cfg_list = []
+    for name, module in model.named_modules():
+        if isinstance(module, nn.Conv2d):
+            sparsity = np.random.uniform(0.5, 0.99)
+            cfg_list.append({'op_types': ['Conv2d'], 'op_names': [name],
+                             'sparsity': sparsity})
+    return cfg_list
+
+def generate_random_sparsity_v2(model):
+    """
+    Generate a random sparsity for only some of the conv layers
+    in the model.
+    """
+    cfg_list = []
+    for name, module in model.named_modules():
+        # randomly pick about 50% of the layers
+        if isinstance(module, nn.Conv2d) and random.uniform(0, 1) > 0.5:
+            sparsity = np.random.uniform(0.5, 0.99)
+            cfg_list.append({'op_types': ['Conv2d'], 'op_names': [name],
+                             'sparsity': sparsity})
+    return cfg_list
+
+
+class DependencyawareTest(TestCase):
+    @unittest.skipIf(torch.__version__ < "1.3.0", "not supported")
+    def test_dependency_aware_pruning(self):
+        model_zoo = ['resnet18']
+        pruners = [L1FilterPruner, L2FilterPruner, FPGMPruner, TaylorFOWeightFilterPruner]
+        sparsity = 0.7
+        cfg_list = [{'op_types': ['Conv2d'], 'sparsity':sparsity}]
+        dummy_input = torch.ones(1, 3, 224, 224)
+        for model_name in model_zoo:
+            for pruner in pruners:
+                print('Testing on ', pruner)
+                ori_filters = {}
+                Model = getattr(models, model_name)
+                net = Model(pretrained=True, progress=False)
+                # record the number of the filter of each conv layer
+                for name, module in net.named_modules():
+                    if isinstance(module, nn.Conv2d):
+                        ori_filters[name] = module.out_channels
+
+                # for the pruners that are based on activations, we need to feed
+                # enough data before we call the compress function.
+                optimizer = torch.optim.SGD(net.parameters(), lr=0.0001,
+                                 momentum=0.9,
+                                 weight_decay=4e-5)
+                criterion = torch.nn.CrossEntropyLoss()
+                tmp_pruner = pruner(
+                    net, cfg_list, optimizer, dependency_aware=True, dummy_input=dummy_input)
+                # train on a single batch so that the pruner can collect the
+                # statistics
+                optimizer.zero_grad()
+                out = net(dummy_input)
+                batchsize = dummy_input.size(0)
+                loss = criterion(out, torch.zeros(batchsize, dtype=torch.int64))
+                loss.backward()
+                optimizer.step()
+
+                tmp_pruner.compress()
+                tmp_pruner.export_model(MODEL_FILE, MASK_FILE)
+                # if we want to use the same model, we should unwrap the pruner before the speedup
+                tmp_pruner._unwrap_model()
+                ms = ModelSpeedup(net, dummy_input, MASK_FILE)
+                ms.speedup_model()
+                for name, module in net.named_modules():
+                    if isinstance(module, nn.Conv2d):
+                        expected = int(ori_filters[name] * (1-sparsity))
+                        filter_diff = abs(expected - module.out_channels)
+                        errmsg = '%s Ori: %d, Expected: %d, Real: %d' % (
+                            name, ori_filters[name], expected, module.out_channels)
+
+                        # because we are using the dependency-aware mode, the number of
+                        # filters after speedup should be ori_filters[name] * (1 - sparsity)
+                        print(errmsg)
+                        assert filter_diff <= 1, errmsg
+
+    @unittest.skipIf(torch.__version__ < "1.3.0", "not supported")
+    def test_dependency_aware_random_config(self):
+        model_zoo = ['resnet18']
+        pruners = [L1FilterPruner, L2FilterPruner, FPGMPruner, TaylorFOWeightFilterPruner,
+                   ActivationMeanRankFilterPruner, ActivationAPoZRankFilterPruner]
+        dummy_input = torch.ones(1, 3, 224, 224)
+        for model_name in model_zoo:
+            for pruner in pruners:
+                Model = getattr(models, model_name)
+                cfg_generator = [generate_random_sparsity, generate_random_sparsity_v2]
+                for _generator in cfg_generator:
+                    net = Model(pretrained=True, progress=False)
+                    cfg_list = _generator(net)
+
+                    print('\n\nModel:', model_name)
+                    print('Pruner', pruner)
+                    print('Config_list:', cfg_list)
+                    # for the pruners that are based on activations, we need to feed
+                    # enough data before we call the compress function.
+                    optimizer = torch.optim.SGD(net.parameters(), lr=0.0001,
+                                    momentum=0.9,
+                                    weight_decay=4e-5)
+                    criterion = torch.nn.CrossEntropyLoss()
+                    tmp_pruner = pruner(
+                        net, cfg_list, optimizer, dependency_aware=True, dummy_input=dummy_input)
+                    # train on a single batch so that the pruner can collect the
+                    # statistics
+                    optimizer.zero_grad()
+                    out = net(dummy_input)
+                    batchsize = dummy_input.size(0)
+                    loss = criterion(out, torch.zeros(batchsize, dtype=torch.int64))
+                    loss.backward()
+                    optimizer.step()
+
+                    tmp_pruner.compress()
+                    tmp_pruner.export_model(MODEL_FILE, MASK_FILE)
+                    # if we want to use the same model, we should unwrap the pruner before the speedup
+                    tmp_pruner._unwrap_model()
+                    ms = ModelSpeedup(net, dummy_input, MASK_FILE)
+                    ms.speedup_model()
+
+
+if __name__ == '__main__':
+    main()
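
A note on version guards such as `torch.__version__ < "1.3.0"`: comparing version strings directly is lexicographic, so `"1.10.0" < "1.3.0"` evaluates to `True`. The sketch below shows one possible alternative that compares numeric tuples instead. The helper `version_tuple` is hypothetical and not part of this commit; it assumes the version string starts with `major.minor[.patch]`.

```python
import re


def version_tuple(version):
    """Parse the leading numeric part of a version string.

    Hypothetical helper: '1.10.0+cu113' -> (1, 10, 0), '1.3' -> (1, 3, 0).
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups("0"))


# Lexicographic string comparison gets multi-digit components wrong.
print("1.10.0" < "1.3.0")
# Tuple comparison orders the components numerically.
print(version_tuple("1.10.0") < (1, 3, 0))
```

With such a helper, a guard could be written as `version_tuple(torch.__version__) < (1, 3, 0)`, assuming the installed version string follows the pattern above.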