Commit 9f73153f authored by zhanggzh

add dtk24.04 code

parent eb77376e
Advanced Usage
==============

.. toctree::
   :maxdepth: 2

   Customize Basic Pruner <../tutorials/pruning_customize>
   Customize Quantizer <../tutorials/quantization_customize>
   Customize Scheduled Pruning Process <pruning_scheduler>
   Utilities <compression_utils>

Best Practices
==============

.. toctree::
   :hidden:
   :maxdepth: 2

   Pruning Transformer </tutorials/pruning_bert_glue>
Compression Config Specification
================================
Each sub-config in the config list is a dict, and the scope of each setting (key) is limited to the sub-config it appears in.
If multiple sub-configs cover the same layer, the later ones overwrite the earlier ones.
Common Keys in Config
---------------------
op_types
^^^^^^^^
The type of the layers targeted by this sub-config.
If ``op_names`` is not set in this sub-config, all layers in the model that satisfy the type will be selected.
If ``op_names`` is set in this sub-config, the selected layers should satisfy both type and name.
op_names
^^^^^^^^
The name of the layers targeted by this sub-config.
If ``op_types`` is set in this sub-config, the selected layer should satisfy both type and name.
exclude
^^^^^^^
The ``exclude`` and ``sparsity`` keywords are mutually exclusive and cannot appear in the same sub-config.
If ``exclude`` is set in a sub-config, the layers selected by that sub-config will not be compressed.
Special Keys for Pruning
------------------------
op_partial_names
^^^^^^^^^^^^^^^^
This key may be shared with the quantization config in the future.
This key selects layers to prune whose names contain a common substring. NNI will traverse all module names in the model,
find the names that contain one of the ``op_partial_names``, and append them to ``op_names``.
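As an illustration, the matching behavior can be sketched in plain Python (the module names below are hypothetical, not from any real model):

```python
# Hypothetical module names; op_partial_names matching keeps any name
# that contains one of the given substrings.
all_names = ['backbone.conv1', 'backbone.conv2', 'head.fc1', 'head.fc2']
op_partial_names = ['conv']
op_names = [name for name in all_names if any(p in name for p in op_partial_names)]
# op_names now holds 'backbone.conv1' and 'backbone.conv2'
```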
sparsity_per_layer
^^^^^^^^^^^^^^^^^^
The sparsity ratio applied to each selected layer.
For example, ``sparsity_per_layer: 0.8`` means each selected layer will have 80% of its weight values masked.
If ``layer_1`` (500 parameters) and ``layer_2`` (1000 parameters) are selected in this sub-config,
then 400 parameters of ``layer_1`` and 800 parameters of ``layer_2`` will be masked.
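A minimal sketch of this sub-config and the resulting per-layer mask counts (``layer_1`` and ``layer_2`` are the hypothetical layers from the example above):

```python
config_list = [{
    'op_names': ['layer_1', 'layer_2'],
    'sparsity_per_layer': 0.8
}]

# each selected layer is masked independently at the same ratio
masked_layer_1 = round(500 * config_list[0]['sparsity_per_layer'])   # 400 of 500 parameters
masked_layer_2 = round(1000 * config_list[0]['sparsity_per_layer'])  # 800 of 1000 parameters
```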
total_sparsity
^^^^^^^^^^^^^^
The sparsity ratio of all selected layers taken together, which means the sparsity ratio may no longer be uniform across layers.
For example, ``total_sparsity: 0.8`` means 80% of the parameters in this sub-config will be masked.
If ``layer_1`` (500 parameters) and ``layer_2`` (1000 parameters) are selected in this sub-config,
then ``layer_1`` and ``layer_2`` will have a total of 1200 parameters masked;
how these parameters are distributed between the two layers is determined by the pruning algorithm.
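The same two hypothetical layers under ``total_sparsity`` can be sketched as:

```python
config_list = [{
    'op_names': ['layer_1', 'layer_2'],
    'total_sparsity': 0.8
}]

# 80% of all 1500 parameters are masked in total; the per-layer split
# is left to the pruning algorithm
total_masked = round((500 + 1000) * config_list[0]['total_sparsity'])
```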
sparsity
^^^^^^^^
``sparsity`` is an old config key from pruning v1; it has the same meaning as ``sparsity_per_layer``.
You can still use ``sparsity`` for now, but it will be deprecated in the future.
max_sparsity_per_layer
^^^^^^^^^^^^^^^^^^^^^^
This key is usually used together with ``total_sparsity``. It limits the maximum sparsity ratio of each layer.
In the ``total_sparsity`` example above, 1200 parameters need to be masked, and all parameters in ``layer_1`` might end up masked.
To avoid this situation, ``max_sparsity_per_layer`` can be set to 0.9, which means at most 450 parameters can be masked in ``layer_1``
and at most 900 parameters in ``layer_2``.
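Sketching the cap from the example above (layer sizes are the hypothetical 500 and 1000 parameters):

```python
config_list = [{
    'op_names': ['layer_1', 'layer_2'],
    'total_sparsity': 0.8,
    'max_sparsity_per_layer': 0.9
}]

# the cap bounds how much of each layer can be masked
cap = config_list[0]['max_sparsity_per_layer']
cap_layer_1 = round(500 * cap)    # at most 450 parameters masked in layer_1
cap_layer_2 = round(1000 * cap)   # at most 900 parameters masked in layer_2
```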
Special Keys for Quantization
-----------------------------
quant_types
^^^^^^^^^^^
Currently, NNI supports three kinds of quantization types: 'weight', 'input', and 'output'.
This key can be set as a ``str`` or a ``List[str]``.
Note that 'weight' and 'input' are always quantized together, e.g., ``['input', 'weight']``.
quant_bits
^^^^^^^^^^
The bit width of quantization. When set as a dict, each key is a quantization type from ``quant_types`` and each value is the bit width,
e.g., ``{'weight': 8}``; when set as an int, all quantization types share the same bit width.
quant_start_step
^^^^^^^^^^^^^^^^
A key specific to ``QAT Quantizer``. It disables quantization until the model has been trained for a certain number of steps.
This allows the network to reach a more stable state, where the output quantization ranges do not exclude a significant fraction of values. The default value is 0.
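For instance, a QAT-style sub-config that delays quantization for the first 1000 training steps might look like this (the step count is an arbitrary illustration):

```python
config_list = [{
    'op_types': ['Conv2d'],
    'quant_types': ['input', 'weight'],
    'quant_bits': {'input': 8, 'weight': 8},
    # quantization is disabled until 1000 training steps have run
    'quant_start_step': 1000
}]
```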
Examples
--------
Suppose we want to compress the following model::

    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout2d(0.25)
            self.dropout2 = nn.Dropout2d(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            ...
First, we need to determine where to compress. Use the following config list to specify all ``Conv2d`` modules and the module named ``fc1``::

    config_list = [{'op_types': ['Conv2d']}, {'op_names': ['fc1']}]
Sometimes we may need to compress all modules of a certain type, except for a few special ones.
Writing out all the module names would be laborious in this case; instead, we can use ``exclude`` to quickly specify the compression target modules::

    config_list = [{
        'op_types': ['Conv2d', 'Linear']
    }, {
        'exclude': True,
        'op_names': ['fc2']
    }]
For the model above, these two config lists are equivalent: both select ``conv1``, ``conv2``, and ``fc1`` as compression targets.
Let's take a simple pruning config list example: pruning all ``Conv2d`` modules with 50% sparsity, and pruning ``fc1`` with 80% sparsity::

    config_list = [{
        'op_types': ['Conv2d'],
        'total_sparsity': 0.5
    }, {
        'op_names': ['fc1'],
        'total_sparsity': 0.8
    }]
Then if you want to try model quantization, here is a simple config list example::

    config_list = [{
        'op_types': ['Conv2d'],
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8}
    }, {
        'op_names': ['fc1'],
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8}
    }]
Compression Evaluator
=====================
The ``Evaluator`` is used to package the training and evaluation process for a targeted model.
To explain why NNI needs an ``Evaluator``, let's first look at the general process of model compression in NNI.
In model pruning, some algorithms need to prune according to intermediate variables (gradients, activations, etc.) generated during the training process,
some need to gradually increase or adjust the sparsity of different layers during training,
and some need to adjust the pruning strategy according to performance changes of the model during pruning.
In model quantization, NNI has a quantization-aware training algorithm
that can adjust the scale and zero point required for model quantization from time to time during training,
and it may achieve better performance compared to post-training quantization.
In order to better support the above algorithms' needs and maintain the consistency of the interface,
NNI introduces the ``Evaluator`` as the carrier of the training and evaluation process.
.. note::

   For users prior to NNI v2.8: NNI previously provided APIs like ``trainer``, ``traced_optimizer``, ``criterion``, and ``finetuner``.
   These APIs could be tedious in terms of user experience: users needed to swap the corresponding APIs frequently when switching compression algorithms.
   ``Evaluator`` is an alternative to those interfaces; users only need to create the evaluator once, and it can be used in all compressors.
For users of native PyTorch, :class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>` requires the user to encapsulate the training process as a function that exposes a specified interface,
which brings some complexity. But don't worry: in most cases, this will not change too much of your code.
For users of `PyTorchLightning <https://www.pytorchlightning.ai/>`__, :class:`LightningEvaluator <nni.compression.pytorch.LightningEvaluator>` can be created with only a few lines of code based on your original Lightning code.
Here we give two examples of how to create an ``Evaluator`` for both native PyTorch and PyTorchLightning users.
TorchEvaluator
--------------
:class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>` is for users who work in a native PyTorch environment (if you are using PyTorchLightning, please refer to `LightningEvaluator`_).
:class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>` has six initialization parameters: ``training_func``, ``optimizers``, ``criterion``, ``lr_schedulers``,
``dummy_input``, and ``evaluating_func``.

* ``training_func`` is the training loop used to train the compressed model.
  It is a callable function with six input parameters: ``model``, ``optimizers``,
  ``criterion``, ``lr_schedulers``, ``max_steps``, and ``max_epochs``.
  Please make sure each input argument of ``training_func`` is actually used;
  in particular, ``max_steps`` and ``max_epochs`` must correctly control the duration of training.
* ``optimizers`` is a single traced optimizer or a list of traced optimizers.
  Please make sure to wrap the ``Optimizer`` class with ``nni.trace`` before initializing it.
* ``criterion`` is a callable function used to compute the loss; it has two input parameters, ``input`` and ``target``, and returns a tensor as the loss.
* ``lr_schedulers`` is a single traced scheduler or a list of traced schedulers. As with ``optimizers``,
  please make sure to wrap the ``_LRScheduler`` class with ``nni.trace`` before initializing it.
* ``dummy_input`` is used to trace the model, the same as ``example_inputs``
  in `torch.jit.trace <https://pytorch.org/docs/stable/generated/torch.jit.trace.html?highlight=torch%20jit%20trace#torch.jit.trace>`_.
* ``evaluating_func`` is a callable function used to evaluate the compressed model's performance. Its input is a compressed model and its output is a metric.
  The metric should be a float number or a dict with the key ``default``.

Please refer to :class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>` for more details.
Here is an example of how to initialize a :class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>`.
.. code-block:: python

    from __future__ import annotations
    from typing import Callable, Any

    import torch
    from torch.optim.lr_scheduler import StepLR, _LRScheduler
    from torch.utils.data import DataLoader
    from torchvision import datasets, models

    import nni
    from nni.algorithms.compression.v2.pytorch import TorchEvaluator

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


    def training_func(model: torch.nn.Module, optimizers: torch.optim.Optimizer,
                      criterion: Callable[[Any, Any], torch.Tensor],
                      lr_schedulers: _LRScheduler | None = None, max_steps: int | None = None,
                      max_epochs: int | None = None, *args, **kwargs):
        model.train()

        # prepare data
        imagenet_train_data = datasets.ImageNet(root='data/imagenet', split='train', download=True)
        train_dataloader = DataLoader(imagenet_train_data, batch_size=4, shuffle=True)

        # NNI may change the training duration by setting max_steps or max_epochs.
        # To ensure that NNI can control the training duration,
        # please apply max_steps and max_epochs as constraints in the training loop.
        total_epochs = max_epochs if max_epochs else 20
        total_steps = max_steps if max_steps else 1000000
        current_steps = 0

        # training loop
        for _ in range(total_epochs):
            for inputs, labels in train_dataloader:
                inputs, labels = inputs.to(device), labels.to(device)
                optimizers.zero_grad()
                loss = criterion(model(inputs), labels)
                loss.backward()
                optimizers.step()
                current_steps += 1
                # stop the training loop when total_steps is reached
                if total_steps and current_steps == total_steps:
                    return
            if lr_schedulers:
                lr_schedulers.step()


    def evaluating_func(model: torch.nn.Module):
        model.eval()

        # prepare data
        imagenet_val_data = datasets.ImageNet(root='./data/imagenet', split='val', download=True)
        val_dataloader = DataLoader(imagenet_val_data, batch_size=4, shuffle=False)

        # testing loop
        correct = 0
        with torch.no_grad():
            for inputs, labels in val_dataloader:
                inputs, labels = inputs.to(device), labels.to(device)
                logits = model(inputs)
                preds = torch.argmax(logits, dim=1)
                correct += preds.eq(labels.view_as(preds)).sum().item()
        return correct / len(imagenet_val_data)


    # initialize the model, optimizer, criterion, lr_scheduler, dummy_input
    model = models.resnet18().to(device)

    # please wrap the optimizer class with nni.trace;
    # NNI will use the trace information to re-initialize the optimizer
    optimizer = nni.trace(torch.optim.Adam)(model.parameters(), lr=1e-3)
    criterion = torch.nn.CrossEntropyLoss()

    # please wrap the lr_scheduler class with nni.trace;
    # NNI will use the trace information to re-initialize the lr_scheduler
    lr_scheduler = nni.trace(StepLR)(optimizer, step_size=5, gamma=0.1)
    dummy_input = torch.rand(4, 3, 224, 224).to(device)

    # TorchEvaluator initialization
    evaluator = TorchEvaluator(training_func=training_func, optimizers=optimizer, criterion=criterion,
                               lr_schedulers=lr_scheduler, dummy_input=dummy_input,
                               evaluating_func=evaluating_func)
.. note::

   It is also worth noting that not all arguments of :class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>` must be provided.
   Some compressors only require ``evaluating_func``, as they do not train the model; some compressors only require ``training_func``.
   Please refer to each compressor's documentation to check the required arguments.
   It is fine to provide more arguments than a compressor needs.
A complete example of pruner using :class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>` to compress model can be found :githublink:`here <examples/model_compress/pruning/taylorfo_torch_evaluator.py>`.
LightningEvaluator
------------------
:class:`LightningEvaluator <nni.compression.pytorch.LightningEvaluator>` is for the users who work with PyTorchLightning.
Compared with the original PyTorchLightning code, users only need to modify three parts:
1. Wrap the ``Optimizer`` and ``_LRScheduler`` class with ``nni.trace``.
2. Wrap the ``LightningModule`` class with ``nni.trace``.
3. Wrap the ``LightningDataModule`` class with ``nni.trace``.
Please refer to :class:`LightningEvaluator <nni.compression.pytorch.LightningEvaluator>` for more details.
Here is an example of how to initialize a :class:`LightningEvaluator <nni.compression.pytorch.LightningEvaluator>`.
.. code-block:: python

    import pytorch_lightning as pl
    from pytorch_lightning.loggers import TensorBoardLogger
    import torch
    from torch.optim.lr_scheduler import StepLR
    from torch.utils.data import DataLoader
    from torchmetrics.functional import accuracy
    from torchvision import datasets, models

    import nni
    from nni.algorithms.compression.v2.pytorch import LightningEvaluator


    class SimpleLightningModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.model = models.resnet18()
            self.criterion = torch.nn.CrossEntropyLoss()

        def forward(self, x):
            return self.model(x)

        def training_step(self, batch, batch_idx):
            x, y = batch
            logits = self(x)
            loss = self.criterion(logits, y)
            self.log("train_loss", loss)
            return loss

        def evaluate(self, batch, stage=None):
            x, y = batch
            logits = self(x)
            loss = self.criterion(logits, y)
            preds = torch.argmax(logits, dim=1)
            acc = accuracy(preds, y)

            if stage:
                self.log("default", loss, prog_bar=False)
                self.log(f"{stage}_loss", loss, prog_bar=True)
                self.log(f"{stage}_acc", acc, prog_bar=True)

        def validation_step(self, batch, batch_idx):
            self.evaluate(batch, "val")

        def test_step(self, batch, batch_idx):
            self.evaluate(batch, "test")

        # please pay attention to this function:
        # use nni.trace to trace the optimizer and lr_scheduler classes
        def configure_optimizers(self):
            optimizer = nni.trace(torch.optim.SGD)(
                self.parameters(),
                lr=0.01,
                momentum=0.9,
                weight_decay=5e-4,
            )
            scheduler_dict = {
                "scheduler": nni.trace(StepLR)(
                    optimizer,
                    step_size=5,
                    gamma=0.1
                ),
                "interval": "epoch",
            }
            return {"optimizer": optimizer, "lr_scheduler": scheduler_dict}


    class ImageNetDataModule(pl.LightningDataModule):
        def __init__(self, data_dir: str = "./data/imagenet"):
            super().__init__()
            self.data_dir = data_dir

        def prepare_data(self):
            # download
            datasets.ImageNet(self.data_dir, split='train', download=True)
            datasets.ImageNet(self.data_dir, split='val', download=True)

        def setup(self, stage: str | None = None):
            if stage == "fit" or stage is None:
                self.imagenet_train_data = datasets.ImageNet(root='data/imagenet', split='train')
                self.imagenet_val_data = datasets.ImageNet(root='./data/imagenet', split='val')
            if stage == "test" or stage is None:
                self.imagenet_test_data = datasets.ImageNet(root='./data/imagenet', split='val')
            if stage == "predict" or stage is None:
                self.imagenet_predict_data = datasets.ImageNet(root='./data/imagenet', split='val')

        def train_dataloader(self):
            return DataLoader(self.imagenet_train_data, batch_size=4)

        def val_dataloader(self):
            return DataLoader(self.imagenet_val_data, batch_size=4)

        def test_dataloader(self):
            return DataLoader(self.imagenet_test_data, batch_size=4)

        def predict_dataloader(self):
            return DataLoader(self.imagenet_predict_data, batch_size=4)


    # please wrap the pl.Trainer class with nni.trace;
    # NNI will use the trace information to re-initialize the trainer
    pl_trainer = nni.trace(pl.Trainer)(
        accelerator='auto',
        devices=1,
        max_epochs=1,
        max_steps=50,
        logger=TensorBoardLogger('./lightning_logs', name="resnet"),
    )

    # please wrap the pl.LightningDataModule class with nni.trace;
    # NNI will use the trace information to re-initialize the datamodule
    pl_data = nni.trace(ImageNetDataModule)(data_dir='./data/imagenet')

    evaluator = LightningEvaluator(pl_trainer, pl_data)
.. note::

   In ``LightningModule.configure_optimizers``, users should use traced ``torch.optim.Optimizer`` and traced ``torch.optim._LRScheduler``
   so that NNI can get the initialization parameters of the optimizers and lr_schedulers.

   .. code-block:: python

      class SimpleModel(pl.LightningModule):
          ...

          def configure_optimizers(self):
              optimizers = nni.trace(torch.optim.SGD)(self.parameters(), lr=0.001)
              lr_schedulers = nni.trace(ExponentialLR)(optimizer=optimizers, gamma=0.1)
              return optimizers, lr_schedulers
A complete example of pruner using :class:`LightningEvaluator <nni.compression.pytorch.LightningEvaluator>` to compress model can be found :githublink:`here <examples/model_compress/pruning/taylorfo_lightning_evaluator.py>`.
Analysis Utils for Model Compression
====================================
We provide several easy-to-use tools for users to analyze their model during model compression.
Sensitivity Analysis
--------------------
First, we provide a sensitivity analysis tool (\ **SensitivityAnalysis**\ ) for users to analyze the sensitivity of each convolutional layer in their model. Specifically, SensitivityAnalysis gradually prunes each layer of the model and tests the accuracy of the model at the same time. Note that SensitivityAnalysis only prunes one layer at a time, while the other layers are kept at their original weights. Based on the accuracies of different convolutional layers under different sparsities, we can easily find out which layers the model accuracy is more sensitive to.
Usage
^^^^^
The following code shows the basic usage of SensitivityAnalysis.

.. code-block:: python

    import os

    import torch
    from nni.compression.pytorch.utils.sensitivity_analysis import SensitivityAnalysis

    def val(model):
        model.eval()
        total = 0
        correct = 0
        with torch.no_grad():
            for batchid, (data, label) in enumerate(val_loader):
                data, label = data.cuda(), label.cuda()
                out = model(data)
                _, predicted = out.max(1)
                total += data.size(0)
                correct += predicted.eq(label).sum().item()
        return correct / total

    s_analyzer = SensitivityAnalysis(model=net, val_func=val)
    sensitivity = s_analyzer.analysis(val_args=[net])
    os.makedirs(outdir, exist_ok=True)
    s_analyzer.export(os.path.join(outdir, filename))
Two key parameters of SensitivityAnalysis are ``model`` and ``val_func``. ``model`` is the neural network to be analyzed, and ``val_func`` is the validation function that returns the model accuracy, loss, or another metric on the validation dataset. Because different scenarios may calculate the loss/accuracy in different ways, users should prepare a function that returns the model accuracy/loss on the dataset and pass it to SensitivityAnalysis.
SensitivityAnalysis can export the sensitivity results as a CSV file; usage is shown in the example above.
Furthermore, users can specify the sparsity values used to prune each layer with the optional parameter ``sparsities``.
.. code-block:: python

    s_analyzer = SensitivityAnalysis(model=net, val_func=val, sparsities=[0.25, 0.5, 0.75])

With this setting, SensitivityAnalysis will gradually prune 25%, 50%, and 75% of the weights for each layer and record the model's accuracy at the same time (SensitivityAnalysis only prunes one layer at a time; the other layers are kept at their original weights). If ``sparsities`` is not set, SensitivityAnalysis will use ``numpy.arange(0.1, 1.0, 0.1)`` as the default sparsity values.
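For reference, the default sparsity schedule can be inspected directly (assuming numpy is available):

```python
import numpy as np

# the default schedule used when `sparsities` is not given:
# nine steps, 0.1 through 0.9
default_sparsities = np.arange(0.1, 1.0, 0.1)
```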
Users can also speed up the sensitivity analysis with the ``early_stop_mode`` and ``early_stop_value`` options. By default, SensitivityAnalysis tests the accuracy under all sparsities for each layer. In contrast, when ``early_stop_mode`` and ``early_stop_value`` are set, the sensitivity analysis for a layer stops once the accuracy/loss has met the threshold set by ``early_stop_value``. We support four early stop modes: minimize, maximize, dropped, and raised.

* minimize: The analysis stops when the validation metric returned by the ``val_func`` is lower than ``early_stop_value``.
* maximize: The analysis stops when the validation metric returned by the ``val_func`` is larger than ``early_stop_value``.
* dropped: The analysis stops when the validation metric has dropped by ``early_stop_value``.
* raised: The analysis stops when the validation metric has risen by ``early_stop_value``.
.. code-block:: python

    s_analyzer = SensitivityAnalysis(model=net, val_func=val, sparsities=[0.25, 0.5, 0.75], early_stop_mode='dropped', early_stop_value=0.1)
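The four early stop modes can be sketched as a plain-Python predicate (this is an illustration of the semantics described above, not NNI's implementation; ``baseline`` stands for the metric of the unpruned model):

```python
def should_stop(metric, baseline, mode, early_stop_value):
    # minimize/maximize compare the metric against an absolute threshold;
    # dropped/raised compare the change relative to the baseline metric
    if mode == 'minimize':
        return metric < early_stop_value
    if mode == 'maximize':
        return metric > early_stop_value
    if mode == 'dropped':
        return baseline - metric >= early_stop_value
    if mode == 'raised':
        return metric - baseline >= early_stop_value
    raise ValueError(f'unknown early stop mode: {mode}')
```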
If users only want to analyze several specific convolutional layers, they can specify the target conv layers with the ``specified_layers`` argument of the ``analysis`` function. ``specified_layers`` is a list that consists of the PyTorch module names of the conv layers. For example:
.. code-block:: python

    sensitivity = s_analyzer.analysis(val_args=[net], specified_layers=['Conv1'])
In this example, only the ``Conv1`` layer is analyzed. In addition, users can quickly and easily parallelize the analysis by launching multiple processes and assigning different conv layers of the same model to each process.
Output example
^^^^^^^^^^^^^^
The following lines are an example CSV file exported from SensitivityAnalysis. The first line consists of 'layername' followed by the sparsity list; each sparsity value means how much weight SensitivityAnalysis prunes for each layer. Each line below records the model accuracy when the corresponding layer is pruned at different sparsities. Note that, due to the early stop option, some layers may not have model accuracies/losses under all sparsities, for example, because their accuracy drop has already exceeded the threshold set by the user.
.. code-block:: bash

    layername,0.05,0.1,0.2,0.3,0.4,0.5,0.7,0.85,0.95
    features.0,0.54566,0.46308,0.06978,0.0374,0.03024,0.01512,0.00866,0.00492,0.00184
    features.3,0.54878,0.51184,0.37978,0.19814,0.07178,0.02114,0.00438,0.00442,0.00142
    features.6,0.55128,0.53566,0.4887,0.4167,0.31178,0.19152,0.08612,0.01258,0.00236
    features.8,0.55696,0.54194,0.48892,0.42986,0.33048,0.2266,0.09566,0.02348,0.0056
    features.10,0.55468,0.5394,0.49576,0.4291,0.3591,0.28138,0.14256,0.05446,0.01578
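The exported file is ordinary CSV, so it can be post-processed with the standard library; for example, to pull out each layer's accuracy at 50% sparsity (the data here is the first two rows of the example above, inlined for illustration):

```python
import csv
import io

# first two rows of the example export, inlined for illustration
exported = """layername,0.05,0.1,0.2,0.3,0.4,0.5,0.7,0.85,0.95
features.0,0.54566,0.46308,0.06978,0.0374,0.03024,0.01512,0.00866,0.00492,0.00184
features.3,0.54878,0.51184,0.37978,0.19814,0.07178,0.02114,0.00438,0.00442,0.00142
"""

reader = csv.DictReader(io.StringIO(exported))
# map each layer name to its accuracy at 50% sparsity
acc_at_half = {row['layername']: float(row['0.5']) for row in reader}
```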
.. _topology-analysis:
Topology Analysis
-----------------
We also provide several tools for topology analysis during model compression. These tools help users compress their model better. Because of the complex topology of a network, users often need to spend a lot of effort checking whether a compression configuration is reasonable, so we provide these topology analysis tools to reduce that burden.
ChannelDependency
^^^^^^^^^^^^^^^^^
Complicated models may have residual connections or concat operations. When users prune these models, they need to be careful about the channel-count dependencies between the convolutional layers in the model. Take the following residual block in ResNet18 as an example: the output features of ``layer2.0.conv2`` and ``layer2.0.downsample.0`` are added together, so the number of output channels of ``layer2.0.conv2`` and ``layer2.0.downsample.0`` should be the same, or there may be a tensor shape conflict.
.. image:: ../../img/channel_dependency_example.jpg
   :target: ../../img/channel_dependency_example.jpg
   :alt:
If layers that have a channel dependency are assigned different sparsities (here we only discuss structured pruning by L1FilterPruner/L2FilterPruner), there will be a shape conflict between these layers. Even if the masked pruned model works fine, the pruned model cannot be directly sped up into the final model that runs on devices, because there will be a shape conflict when the model tries to add/concat the outputs of these layers. This tool finds the layers that have channel-count dependencies to help users better prune their model.
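Conceptually, the dependency sets are the connected components over "outputs are added together" relations. A rough plain-Python sketch of the idea (the add relations below are hypothetical and hand-picked; this is not NNI's implementation):

```python
# Union-find over layers whose outputs are element-wise added,
# so each resulting group must keep the same output channel count.
parent = {}

def find(layer):
    parent.setdefault(layer, layer)
    while parent[layer] != layer:
        parent[layer] = parent[parent[layer]]  # path compression
        layer = parent[layer]
    return layer

def union(a, b):
    parent[find(a)] = find(b)

# hypothetical add relations in a ResNet-like graph
adds = [('conv1', 'layer1.0.conv2'), ('layer1.0.conv2', 'layer1.1.conv2')]
for a, b in adds:
    union(a, b)

dependency_sets = {}
for layer in list(parent):
    dependency_sets.setdefault(find(layer), set()).add(layer)
```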
Usage
"""""
.. code-block:: python

    import torch
    from nni.compression.pytorch.utils.shape_dependency import ChannelDependency

    data = torch.ones(1, 3, 224, 224).cuda()
    channel_depen = ChannelDependency(net, data)
    channel_depen.export('dependency.csv')
Output Example
""""""""""""""
The following lines are an example of the output of ChannelDependency for torchvision.models.resnet18. The layers on the same line have output channel dependencies with each other. For example, layer1.1.conv2, conv1, and layer1.0.conv2 have output channel dependencies with each other, which means the output channel (filter) counts of these three layers should be the same; otherwise, the model may have a shape conflict.
.. code-block:: bash

    Dependency Set,Convolutional Layers
    Set 1,layer1.1.conv2,layer1.0.conv2,conv1
    Set 2,layer1.0.conv1
    Set 3,layer1.1.conv1
    Set 4,layer2.0.conv1
    Set 5,layer2.1.conv2,layer2.0.conv2,layer2.0.downsample.0
    Set 6,layer2.1.conv1
    Set 7,layer3.0.conv1
    Set 8,layer3.0.downsample.0,layer3.1.conv2,layer3.0.conv2
    Set 9,layer3.1.conv1
    Set 10,layer4.0.conv1
    Set 11,layer4.0.downsample.0,layer4.1.conv2,layer4.0.conv2
    Set 12,layer4.1.conv1
MaskConflict
^^^^^^^^^^^^
When the masks of different layers in a model conflict (for example, when different sparsities are assigned to layers that have a channel dependency), we can fix the mask conflict with MaskConflict. Specifically, MaskConflict loads the masks exported by the pruners (L1FilterPruner, etc.) and checks whether there is a mask conflict; if so, it sets the conflicting masks to the same value.
.. code-block:: python

    from nni.compression.pytorch.utils.mask_conflict import fix_mask_conflict

    fixed_mask = fix_mask_conflict('./resnet18_mask', net, data)
not_safe_to_prune
^^^^^^^^^^^^^^^^^
If we try to prune a layer whose output tensor is taken as the input of a shape-constraint OP (for example, ``view`` or ``reshape``), such pruning may not be safe. For example, suppose we have a convolutional layer followed by a ``view`` function.
.. code-block:: python

    x = self.conv(x)  # output shape is (batch, 1024, 3, 3)
    x = x.view(-1, 1024)
If the output shape of the pruned conv layer is not divisible by 1024 (for example, (batch, 500, 3, 3)), we may encounter a shape error. We cannot replace such a function that operates directly on the tensor, so we need to be careful when pruning such layers. The function not_safe_to_prune finds all the layers followed by a shape-constraint function. Here is a usage example. If you encounter a shape error when running forward inference on the sped-up model, you can exclude the layers returned by not_safe_to_prune and try again.
.. code-block:: python

    not_safe = not_safe_to_prune(model, dummy_input)
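The divisibility constraint mentioned above can be checked with simple arithmetic; a sketch (the helper name is ours, not an NNI API):

```python
def flatten_compatible(channels, height, width, view_dim=1024):
    # x.view(-1, view_dim) only succeeds when the per-sample element
    # count (channels * height * width) is divisible by view_dim
    return (channels * height * width) % view_dim == 0

ok_before_pruning = flatten_compatible(1024, 3, 3)  # (batch, 1024, 3, 3) -> fine
ok_after_pruning = flatten_compatible(500, 3, 3)    # (batch, 500, 3, 3) -> shape error
```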
.. _flops-counter:
Model FLOPs/Parameters Counter
------------------------------
We provide a model counter for calculating the model FLOPs and parameters. This counter supports calculating the FLOPs/parameters of a normal model without masks, and it can also calculate the FLOPs/parameters of a model with mask wrappers, which helps users easily check model complexity during model compression with NNI. Note that, for structured pruning, we only identify the remaining filters according to the mask, without taking the pruned input channels into consideration, so the calculated FLOPs will be larger than the real number (i.e., the number calculated after Model Speedup).
We support two modes for collecting information about modules. The first mode is ``default``\ , which only collects the information of convolution and linear modules. The second mode is ``full``\ , which also collects the information of other operations. Users can easily use the collected ``results`` for further analysis.
Usage
^^^^^
.. code-block:: python

    from nni.compression.pytorch.utils import count_flops_params

    # given input size (1, 1, 28, 28)
    flops, params, results = count_flops_params(model, (1, 1, 28, 28))

    # given an input tensor with size (1, 1, 28, 28) and switched to full mode
    x = torch.randn(1, 1, 28, 28)
    flops, params, results = count_flops_params(model, (x,), mode='full')  # tuple of tensors as input

    # format the output size to M (i.e., 10^6)
    print(f'FLOPs: {flops/1e6:.3f}M, Params: {params/1e6:.3f}M')

    print(results)
    {
        'conv': {'flops': [60], 'params': [20], 'weight_size': [(5, 3, 1, 1)], 'input_size': [(1, 3, 2, 2)], 'output_size': [(1, 5, 2, 2)], 'module_type': ['Conv2d']},
        'conv2': {'flops': [100], 'params': [30], 'weight_size': [(5, 5, 1, 1)], 'input_size': [(1, 5, 2, 2)], 'output_size': [(1, 5, 2, 2)], 'module_type': ['Conv2d']}
    }
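As a sanity check, the convolution numbers shown above are consistent with the common MAC-based convention, which counts one ``in_channels * kernel_h * kernel_w`` dot product per output element (bias excluded); this hand computation assumes that convention:

```python
def conv_flops(out_c, out_h, out_w, in_c, k_h, k_w):
    # one (in_c * k_h * k_w)-sized dot product per output element
    return out_c * out_h * out_w * (in_c * k_h * k_w)

flops_conv = conv_flops(5, 2, 2, 3, 1, 1)   # the 'conv' entry
flops_conv2 = conv_flops(5, 2, 2, 5, 1, 1)  # the 'conv2' entry
```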