#######################
Use Cases and Solutions
#######################
Different from the tutorials and examples in the rest of the documentation, which show how to use individual features, this part introduces end-to-end scenarios and use cases to help users further understand how NNI can help them. NNI can be widely adopted in various scenarios. We also encourage community contributors to share their AutoML practices, especially their experience using NNI.
Use Cases and Solutions
=======================
.. toctree::
:maxdepth: 2
Automatic Model Tuning (HPO/NAS) <automodel>
Automatic System Tuning (AutoSys) <autosys>
Model Compression <model_compression>
Feature Engineering <feature_engineering>
Performance measurement, comparison and analysis <perf_compare>
Use NNI on Google Colab <NNI_colab_support>
External Repositories and References
====================================
With the authors' permission, we list a set of NNI usage examples and relevant articles.
External Repositories
=====================
* `Hyperparameter Tuning for Matrix Factorization <https://github.com/microsoft/recommenders/blob/master/examples/04_model_select_and_optimize/nni_surprise_svd.ipynb>`__ with NNI
* `scikit-nni <https://github.com/ksachdeva/scikit-nni>`__ Hyper-parameter search for scikit-learn pipelines using NNI
Relevant Articles
=================
* `Cost-effective Hyper-parameter Tuning using AdaptDL with NNI - Feb 23, 2021 <https://medium.com/casl-project/cost-effective-hyper-parameter-tuning-using-adaptdl-with-nni-e55642888761>`__
* `(in Chinese) A summary of NNI new capabilities in NNI 2.0 - Jan 21, 2021 <https://www.msra.cn/zh-cn/news/features/nni-2>`__
* `(in Chinese) A summary of NNI new capabilities in 2019 - Dec 26, 2019 <https://mp.weixin.qq.com/s/7_KRT-rRojQbNuJzkjFMuA>`__
* `Find thy hyper-parameters for scikit-learn pipelines using Microsoft NNI - Nov 6, 2019 <https://towardsdatascience.com/find-thy-hyper-parameters-for-scikit-learn-pipelines-using-microsoft-nni-f1015b1224c1>`__
* `(in Chinese) AutoML tools (Advisor, NNI and Google Vizier) comparison - Aug 05, 2019 <http://gaocegege.com/Blog/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/katib-new#%E6%80%BB%E7%BB%93%E4%B8%8E%E5%88%86%E6%9E%90>`__
* `Hyper Parameter Optimization Comparison <./HpoComparison.rst>`__
* `Neural Architecture Search Comparison <./NasComparison.rst>`__
* `Parallelizing a Sequential Algorithm TPE <./ParallelizingTpeSearch.rst>`__
* `Automatically tuning SVD with NNI <./RecommendersSvd.rst>`__
* `Automatically tuning SPTAG with NNI <./SptagAutoTune.rst>`__
Auto Compression with NNI Experiment
====================================
If you want to compress your model but don't know which compression algorithm to choose, don't know what sparsity is suitable for your model, or simply want to try more possibilities, auto compression may help you.
Users can choose different compression algorithms and define the algorithms' search space; auto compression will then launch an NNI experiment and automatically try different compression algorithms with varying sparsity.
Of course, in addition to the sparsity ratio, users can also introduce other related parameters into the search space.
If you don't know what a search space is or how to write one, `this <./Tutorial/SearchSpaceSpec.rst>`__ is for your reference.
Using auto compression is similar to launching an NNI experiment from Python.
The main differences are as follows:
* Use a generator to help generate the search space object.
* The model to be compressed needs to be provided, and it should already be pre-trained.
* There is no need to set ``trial_command``; instead, ``auto_compress_module`` needs to be provided as the ``AutoCompressionExperiment`` input.
.. note::
Auto compression only supports the TPE, Random Search, Anneal, and Evolution tuners right now.
Generate search space
---------------------
Due to the extensive use of nested search spaces, we recommend using a generator to configure the search space.
The following is an example: use ``add_config()`` to add sub-configurations, then ``dumps()`` the search space dict.
.. code-block:: python
from nni.algorithms.compression.pytorch.auto_compress import AutoCompressionSearchSpaceGenerator
generator = AutoCompressionSearchSpaceGenerator()
generator.add_config('level', [
{
"sparsity": {
"_type": "uniform",
"_value": [0.01, 0.99]
},
'op_types': ['default']
}
])
generator.add_config('qat', [
{
'quant_types': ['weight', 'output'],
'quant_bits': {
'weight': 8,
'output': 8
},
'op_types': ['Conv2d', 'Linear']
}])
search_space = generator.dumps()
Now we support the following pruners and quantizers:
.. code-block:: python
PRUNER_DICT = {
'level': LevelPruner,
'slim': SlimPruner,
'l1': L1FilterPruner,
'l2': L2FilterPruner,
'fpgm': FPGMPruner,
'taylorfo': TaylorFOWeightFilterPruner,
'apoz': ActivationAPoZRankFilterPruner,
'mean_activation': ActivationMeanRankFilterPruner
}
QUANTIZER_DICT = {
'naive': NaiveQuantizer,
'qat': QAT_Quantizer,
'dorefa': DoReFaQuantizer,
'bnn': BNNQuantizer
}
Provide user model for compression
----------------------------------
Users need to inherit ``AbstractAutoCompressionModule`` and override its abstract class functions.
.. code-block:: python
from nni.algorithms.compression.pytorch.auto_compress import AbstractAutoCompressionModule
class AutoCompressionModule(AbstractAutoCompressionModule):
@classmethod
def model(cls) -> nn.Module:
...
return _model
@classmethod
def evaluator(cls) -> Callable[[nn.Module], float]:
...
return _evaluator
Users need to implement at least ``model()`` and ``evaluator()``.
If you use an iterative pruner, you additionally need to implement ``optimizer_factory()``, ``criterion()`` and ``sparsifying_trainer()``.
If you want to finetune the model after compression, you need to implement ``optimizer_factory()``, ``criterion()``, ``post_compress_finetuning_trainer()`` and ``post_compress_finetuning_epochs()``.
``optimizer_factory()`` should return a factory function whose input is an iterable of parameters, i.e. your ``model.parameters()``, and whose output is an optimizer instance.
The two kinds of ``trainer()`` should return a trainer function with input ``model, optimizer, criterion, current_epoch``.
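As a rough sketch of what these additional methods might look like (the SGD hyperparameters, the ``train_loader`` and the loop body are placeholders; exact signatures should follow the abstract interface referenced below):

.. code-block:: python

    import torch
    import torch.nn.functional as F

    class AutoCompressionModule(AbstractAutoCompressionModule):
        ...

        @classmethod
        def optimizer_factory(cls):
            # factory function: model.parameters() -> optimizer instance
            return lambda params: torch.optim.SGD(params, lr=0.01, momentum=0.9)

        @classmethod
        def criterion(cls):
            return F.cross_entropy

        @classmethod
        def sparsifying_trainer(cls):
            # the returned trainer is called as trainer(model, optimizer, criterion, current_epoch)
            def trainer(model, optimizer, criterion, current_epoch):
                model.train()
                for data, target in train_loader:  # placeholder data loader
                    optimizer.zero_grad()
                    loss = criterion(model(data), target)
                    loss.backward()
                    optimizer.step()
            return trainer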
For the full abstract interface, refer to :githublink:`interface.py <nni/algorithms/compression/pytorch/auto_compress/interface.py>`.
For an example implementation of ``AutoCompressionModule``, refer to :githublink:`auto_compress_module.py <examples/model_compress/auto_compress/torch/auto_compress_module.py>`.
Launch NNI experiment
---------------------
Launching is similar to launching an experiment from Python; the differences are that there is no need to set ``trial_command`` and that the user-provided ``AutoCompressionModule`` is passed as the ``AutoCompressionExperiment`` input.
.. code-block:: python
from pathlib import Path
from nni.algorithms.compression.pytorch.auto_compress import AutoCompressionExperiment
from auto_compress_module import AutoCompressionModule
experiment = AutoCompressionExperiment(AutoCompressionModule, 'local')
experiment.config.experiment_name = 'auto compression torch example'
experiment.config.trial_concurrency = 1
experiment.config.max_trial_number = 10
experiment.config.search_space = search_space
experiment.config.trial_code_directory = Path(__file__).parent
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.training_service.use_active_gpu = True
experiment.run(8088)
Model Compression API Reference
===============================
.. contents::
Compressors
-----------
Compressor
^^^^^^^^^^
.. autoclass:: nni.compression.pytorch.compressor.Compressor
:members:
.. autoclass:: nni.compression.pytorch.compressor.Pruner
:members:
.. autoclass:: nni.compression.pytorch.compressor.Quantizer
:members:
Module Wrapper
^^^^^^^^^^^^^^
.. autoclass:: nni.compression.pytorch.compressor.PrunerModuleWrapper
:members:
.. autoclass:: nni.compression.pytorch.compressor.QuantizerModuleWrapper
:members:
Weight Masker
^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.pytorch.pruning.weight_masker.WeightMasker
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.structured_pruning_masker.StructuredWeightMasker
:members:
Pruners
^^^^^^^
.. autoclass:: nni.algorithms.compression.pytorch.pruning.sensitivity_pruner.SensitivityPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.one_shot_pruner.OneshotPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.one_shot_pruner.LevelPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.one_shot_pruner.L1FilterPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.one_shot_pruner.L2FilterPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.one_shot_pruner.FPGMPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.iterative_pruner.IterativePruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.iterative_pruner.SlimPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.iterative_pruner.TaylorFOWeightFilterPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.iterative_pruner.ActivationAPoZRankFilterPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.iterative_pruner.ActivationMeanRankFilterPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.iterative_pruner.AGPPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.iterative_pruner.ADMMPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.auto_compress_pruner.AutoCompressPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.net_adapt_pruner.NetAdaptPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.simulated_annealing_pruner.SimulatedAnnealingPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.lottery_ticket.LotteryTicketPruner
:members:
.. autoclass:: nni.algorithms.compression.pytorch.pruning.transformer_pruner.TransformerHeadPruner
:members:
Quantizers
^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.pytorch.quantization.NaiveQuantizer
:members:
.. autoclass:: nni.algorithms.compression.pytorch.quantization.QAT_Quantizer
:members:
.. autoclass:: nni.algorithms.compression.pytorch.quantization.DoReFaQuantizer
:members:
.. autoclass:: nni.algorithms.compression.pytorch.quantization.BNNQuantizer
:members:
.. autoclass:: nni.algorithms.compression.pytorch.quantization.LsqQuantizer
:members:
.. autoclass:: nni.algorithms.compression.pytorch.quantization.ObserverQuantizer
:members:
Model Speedup
-------------
Quantization Speedup
^^^^^^^^^^^^^^^^^^^^
.. autoclass:: nni.compression.pytorch.quantization_speedup.backend.BaseModelSpeedup
:members:
.. autoclass:: nni.compression.pytorch.quantization_speedup.integrated_tensorrt.ModelSpeedupTensorRT
:members:
.. autoclass:: nni.compression.pytorch.quantization_speedup.calibrator.Calibrator
:members:
Compression Utilities
---------------------
Sensitivity Utilities
^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: nni.compression.pytorch.utils.sensitivity_analysis.SensitivityAnalysis
:members:
Topology Utilities
^^^^^^^^^^^^^^^^^^
.. autoclass:: nni.compression.pytorch.utils.shape_dependency.ChannelDependency
:members:
.. autoclass:: nni.compression.pytorch.utils.shape_dependency.GroupDependency
:members:
.. autoclass:: nni.compression.pytorch.utils.mask_conflict.GroupMaskConflict
:members:
.. autoclass:: nni.compression.pytorch.utils.mask_conflict.ChannelMaskConflict
:members:
Model FLOPs/Parameters Counter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: nni.compression.pytorch.utils.counter.count_flops_params
Customize New Compression Algorithm
===================================
.. contents::
In order to simplify the process of writing new compression algorithms, we have designed a simple and flexible programming interface that covers both pruning and quantization. Below, we first demonstrate how to customize a new pruning algorithm and then demonstrate how to customize a new quantization algorithm.
**Important Note** To better understand how to customize new pruning/quantization algorithms, users should first understand the framework that supports various pruning algorithms in NNI. See `Framework overview of model compression <../Compression/Framework.rst>`__.
Customize a new pruning algorithm
---------------------------------
Implementing a new pruning algorithm requires implementing a ``weight masker`` class, which should be a subclass of ``WeightMasker``, and a ``pruner`` class, which should be a subclass of ``Pruner``.
An implementation of ``weight masker`` may look like this:
.. code-block:: python
class MyMasker(WeightMasker):
def __init__(self, model, pruner):
super().__init__(model, pruner)
# You can do some initialization here, such as collecting some statistics data
# if it is necessary for your algorithms to calculate the masks.
def calc_mask(self, sparsity, wrapper, wrapper_idx=None):
# calculate the masks based on the wrapper.weight, and sparsity,
# and anything else
# mask = ...
return {'weight_mask': mask}
You can refer to the NNI-provided :githublink:`weight masker <nni/algorithms/compression/pytorch/pruning/structured_pruning_masker.py>` implementations when implementing your own weight masker.
A basic ``pruner`` looks like this:
.. code-block:: python
class MyPruner(Pruner):
def __init__(self, model, config_list, optimizer):
super().__init__(model, config_list, optimizer)
self.set_wrappers_attribute("if_calculated", False)
# construct a weight masker instance
self.masker = MyMasker(model, self)
def calc_mask(self, wrapper, wrapper_idx=None):
sparsity = wrapper.config['sparsity']
if wrapper.if_calculated:
# Already pruned, do not prune again as a one-shot pruner
return None
else:
# call your masker to actually calculate the mask for this layer
masks = self.masker.calc_mask(sparsity=sparsity, wrapper=wrapper, wrapper_idx=wrapper_idx)
wrapper.if_calculated = True
return masks
Refer to the NNI-provided :githublink:`pruner <nni/algorithms/compression/pytorch/pruning/one_shot_pruner.py>` implementations when implementing your own pruner class.
----
Customize a new quantization algorithm
--------------------------------------
To write a new quantization algorithm, you can write a class that inherits ``nni.compression.pytorch.Quantizer``. Then, override the member functions with the logic of your algorithm. The member function to override is ``quantize_weight``. ``quantize_weight`` directly returns the quantized weights rather than a mask, because for quantization the quantized weights cannot be obtained by applying a mask.
.. code-block:: python
from nni.compression.pytorch import Quantizer
class YourQuantizer(Quantizer):
def __init__(self, model, config_list):
"""
We suggest you use the NNI-defined spec for config
"""
super().__init__(model, config_list)
def quantize_weight(self, weight, config, **kwargs):
"""
quantize should overload this method to quantize weight tensors.
This method is effectively hooked to :meth:`forward` of the model.
Parameters
----------
weight : Tensor
weight that needs to be quantized
config : dict
the configuration for weight quantization
"""
# Put your code to generate `new_weight` here
return new_weight
def quantize_output(self, output, config, **kwargs):
"""
quantize should overload this method to quantize output.
This method is effectively hooked to :meth:`forward` of the model.
Parameters
----------
output : Tensor
output that needs to be quantized
config : dict
the configuration for output quantization
"""
# Put your code to generate `new_output` here
return new_output
def quantize_input(self, *inputs, config, **kwargs):
"""
quantize should overload this method to quantize input.
This method is effectively hooked to :meth:`forward` of the model.
Parameters
----------
inputs : Tensor
inputs that need to be quantized
config : dict
the configuration for inputs quantization
"""
# Put your code to generate `new_input` here
return new_input
def update_epoch(self, epoch_num):
pass
def step(self):
"""
Can do some processing based on the model or weights bound
in the function bind_model
"""
pass
Customize backward function
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sometimes it's necessary for a quantization operation to have a customized backward function, such as the `Straight-Through Estimator <https://stackoverflow.com/questions/38361314/the-concept-of-straight-through-estimator-ste>`__\ ; users can customize a backward function as follows:
.. code-block:: python
from nni.compression.pytorch.compressor import Quantizer, QuantGrad, QuantType
class ClipGrad(QuantGrad):
@staticmethod
def quant_backward(tensor, grad_output, quant_type):
"""
This method should be overridden by subclasses to provide a customized backward function,
the default implementation is the Straight-Through Estimator
Parameters
----------
tensor : Tensor
input of quantization operation
grad_output : Tensor
gradient of the output of quantization operation
quant_type : QuantType
the type of quantization, it can be `QuantType.INPUT`, `QuantType.WEIGHT`, `QuantType.OUTPUT`,
you can define different behavior for different types.
Returns
-------
tensor
gradient of the input of quantization operation
"""
# for quant_output function, set grad to zero if the absolute value of tensor is larger than 1
if quant_type == QuantType.OUTPUT:
grad_output[torch.abs(tensor) > 1] = 0
return grad_output
class YourQuantizer(Quantizer):
def __init__(self, model, config_list):
super().__init__(model, config_list)
# set your customized backward function to overwrite default backward function
self.quant_grad = ClipGrad
If you do not customize ``QuantGrad``\ , the default backward is Straight-Through Estimator.
*Coming Soon* ...
Dependency-aware Mode for Filter Pruning
========================================
Currently, we have several filter pruning algorithms for convolutional layers: FPGM Pruner, L1Filter Pruner, L2Filter Pruner, Activation APoZ Rank Filter Pruner, Activation Mean Rank Filter Pruner, and Taylor FO On Weight Pruner. In these filter pruning algorithms, the pruner prunes each convolutional layer separately. While pruning a convolutional layer, the algorithm quantifies the importance of each filter based on some specific rule (such as the L1 norm) and prunes the less important filters.
As the `dependency analysis utils <./CompressionUtils.rst>`__ show, if the output channels of two convolutional layers (conv1, conv2) are added together, then these two conv layers have a channel dependency with each other (for more details please see `Compression Utils <./CompressionUtils.rst>`__\ ). Take the following figure as an example.
.. image:: ../../img/mask_conflict.jpg
:target: ../../img/mask_conflict.jpg
:alt:
Suppose we prune the first 50% of the output channels (filters) of conv1 and the last 50% of the output channels of conv2. Although both layers have 50% of their filters pruned, the speedup module still needs to add zeros to align the output channels. In this case, we cannot harvest the speed benefit from the model pruning.
To better gain the speed benefit of model pruning, we add a dependency-aware mode for the filter pruners. In the dependency-aware mode, the pruner prunes the model based not only on the L1 norm of each filter, but also on the topology of the whole network architecture.
In the dependency-aware mode (``dependency_aware`` is set to ``True``), the pruner will try to prune the same output channels for the layers that have channel dependencies with each other, as shown in the following figure.
.. image:: ../../img/dependency-aware.jpg
:target: ../../img/dependency-aware.jpg
:alt:
Take the dependency-aware mode of L1Filter Pruner as an example. Specifically, for each channel, the pruner calculates the sum of the L1 norms (for example) of that channel across all the layers in the dependency set. Obviously, the number of channels that can actually be pruned in this dependency set is determined by the minimum sparsity among the layers in the set (denoted by ``min_sparsity``). According to the L1 norm sum of each channel, the pruner prunes the same ``min_sparsity`` fraction of channels for all the layers. Next, the pruner additionally prunes ``sparsity`` - ``min_sparsity`` channels for each convolutional layer based on its own per-channel L1 norms. For example, suppose the output channels of ``conv1`` and ``conv2`` are added together and the configured sparsities of ``conv1`` and ``conv2`` are 0.3 and 0.2 respectively. In this case, the ``dependency-aware pruner`` will
* First, prune the same 20% of channels for ``conv1`` and ``conv2`` according to the L1 norm sum of ``conv1`` and ``conv2``.
* Second, additionally prune 10% of the channels of ``conv1`` according to the L1 norm of each channel of ``conv1``.
In addition, for convolutional layers that have more than one filter group, the ``dependency-aware pruner`` will also try to prune the same number of channels in each filter group. Overall, this pruner prunes the model according to the L1 norm of each filter while trying to meet the topological constraints (channel dependency, etc.) to improve the final speed gain after the speedup process.
In the dependency-aware mode, the pruner can provide a better speed gain from model pruning.
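As a simplified numerical sketch of the channel-selection logic described above (illustrative only, not NNI's implementation; ``conv1_w`` and ``conv2_w`` stand for the weight tensors of the two dependent layers):

.. code-block:: python

    import torch

    def dependency_aware_channels(conv1_w, conv2_w, sparsity1=0.3, sparsity2=0.2):
        # per-channel (per-filter) L1 norms of each layer in the dependency set
        score1 = conv1_w.abs().sum(dim=(1, 2, 3))
        score2 = conv2_w.abs().sum(dim=(1, 2, 3))
        n = score1.numel()
        # the commonly pruned channels are limited by the minimum sparsity in the set
        min_sparsity = min(sparsity1, sparsity2)
        shared = torch.argsort(score1 + score2)[:round(n * min_sparsity)].tolist()
        # conv1 additionally prunes (sparsity1 - min_sparsity) channels by its own L1 norm
        extra = [i for i in torch.argsort(score1).tolist() if i not in set(shared)]
        extra = extra[:round(n * (sparsity1 - min_sparsity))]
        # channels to prune for conv1 and conv2 respectively
        return shared + extra, shared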
Usage
-----
In this section, we show how to enable the dependency-aware mode for a filter pruner. Currently, only the one-shot pruners, such as FPGM Pruner, L1Filter Pruner, L2Filter Pruner, Activation APoZ Rank Filter Pruner, Activation Mean Rank Filter Pruner, and Taylor FO On Weight Pruner, support the dependency-aware mode.
To enable the dependency-aware mode for ``L1FilterPruner``\ :
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import L1FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
# dummy_input is necessary for the dependency_aware mode
dummy_input = torch.ones(1, 3, 224, 224).cuda()
pruner = L1FilterPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
# for L2FilterPruner
# pruner = L2FilterPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
# for FPGMPruner
# pruner = FPGMPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
# for ActivationAPoZRankFilterPruner
# pruner = ActivationAPoZRankFilterPruner(model, config_list, optimizer, trainer, criterion, sparsifying_training_batches=1, dependency_aware=True, dummy_input=dummy_input)
# for ActivationMeanRankFilterPruner
# pruner = ActivationMeanRankFilterPruner(model, config_list, optimizer, trainer, criterion, sparsifying_training_batches=1, dependency_aware=True, dummy_input=dummy_input)
# for TaylorFOWeightFilterPruner
# pruner = TaylorFOWeightFilterPruner(model, config_list, optimizer, trainer, criterion, sparsifying_training_batches=1, dependency_aware=True, dummy_input=dummy_input)
pruner.compress()
Evaluation
----------
In order to compare the performance of the pruner with and without the dependency-aware mode, we use L1FilterPruner to prune Mobilenet_v2 with the dependency-aware mode turned on and off respectively. To simplify the experiment, we use uniform pruning, which means we allocate the same sparsity to all convolutional layers in the model.
We trained a Mobilenet_v2 model on the CIFAR-10 dataset and pruned the model based on this pretrained checkpoint. The following figure shows the accuracy and FLOPs of the models pruned by the different pruners.
.. image:: ../../img/mobilev2_l1_cifar.jpg
:target: ../../img/mobilev2_l1_cifar.jpg
:alt:
In the figure, ``Dependency-aware`` represents the L1FilterPruner with the dependency-aware mode enabled. ``L1 Filter`` is the normal ``L1FilterPruner`` without the dependency-aware mode, and ``No-Dependency`` means the pruner only prunes the layers that have no channel dependency with other layers. As we can see in the figure, when the dependency-aware mode is enabled, the pruner achieves higher accuracy at the same FLOPs.
Framework overview of model compression
=======================================
.. contents::
The picture below shows an overview of the components of the model compression framework.
.. image:: ../../img/compressor_framework.jpg
:target: ../../img/compressor_framework.jpg
:alt:
There are three major components/classes in the NNI model compression framework: ``Compressor``\ , ``Pruner`` and ``Quantizer``. Let's look at them in detail one by one:
Compressor
----------
Compressor is the base class for pruners and quantizers. It provides a unified interface so that pruners and quantizers can be used in the same way by end users. For example, to use a pruner:
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import LevelPruner
# load a pretrained model or train a model before using a pruner
configure_list = [{
'sparsity': 0.7,
'op_types': ['Conv2d', 'Linear'],
}]
pruner = LevelPruner(model, configure_list)
model = pruner.compress()
# the model is ready for pruning; now start fine-tuning the model,
# and it will be pruned during training automatically
To use a quantizer:
.. code-block:: python
import torch
from nni.algorithms.compression.pytorch.quantization import DoReFaQuantizer
configure_list = [{
'quant_types': ['weight'],
'quant_bits': {
'weight': 8,
},
'op_types':['Conv2d', 'Linear']
}]
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
quantizer = DoReFaQuantizer(model, configure_list, optimizer)
quantizer.compress()
View :githublink:`example code <examples/model_compress>` for more information.
The ``Compressor`` class provides some utility methods for subclasses and users:
Set wrapper attribute
^^^^^^^^^^^^^^^^^^^^^
Sometimes ``calc_mask`` must save some state data, so users can use the ``set_wrappers_attribute`` API to register attributes, just like how buffers are registered in PyTorch modules. These buffers will be registered to the ``module wrapper``, and users can access them through the ``module wrapper``.
In the above example, we use ``set_wrappers_attribute`` to set a buffer ``if_calculated``, which is used as a flag indicating whether the mask of a layer has already been calculated.
Collect data during forward
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sometimes users want to collect some data during the modules' forward method, for example, the mean value of the activation. This can be done by adding a customized collector to the module.
.. code-block:: python
class MyMasker(WeightMasker):
def __init__(self, model, pruner):
super().__init__(model, pruner)
# Set attribute `collected_activation` for all wrappers to store
# activations for each layer
self.pruner.set_wrappers_attribute("collected_activation", [])
self.activation = torch.nn.functional.relu
def collector(wrapper, input_, output):
# The collected activation can be accessed via each wrapper's collected_activation
# attribute
wrapper.collected_activation.append(self.activation(output.detach().cpu()))
self.pruner.hook_id = self.pruner.add_activation_collector(collector)
The collector function will be called each time the forward method runs.
Users can also remove this collector like this:
.. code-block:: python
# Save the collector identifier
collector_id = self.pruner.add_activation_collector(collector)
# When the collector is not used any more, it can be removed using
# the saved collector identifier
self.pruner.remove_activation_collector(collector_id)
----
Pruner
------
A pruner receives ``model`` and ``config_list`` as arguments.
Some pruners, like ``TaylorFOWeightFilterPruner``, prune the model according to the ``config_list`` during the training loop by adding a hook on ``optimizer.step()``.
The Pruner class is a subclass of Compressor, so it contains everything in the Compressor class plus some additional components used only for pruning. It contains:
Weight masker
^^^^^^^^^^^^^
A ``weight masker`` is the implementation of a pruning algorithm; it can prune a specified layer wrapped by a ``module wrapper`` with a specified sparsity.
Pruning module wrapper
^^^^^^^^^^^^^^^^^^^^^^
A ``pruning module wrapper`` is a module containing:
#. the origin module
#. some buffers used by ``calc_mask``
#. a new forward method that applies masks before running the original forward method.
The reasons to use a ``module wrapper`` are listed below; a simplified sketch of such a wrapper follows the list:
#. some buffers are needed by ``calc_mask`` to calculate masks and these buffers should be registered in ``module wrapper`` so that the original modules are not contaminated.
#. a new ``forward`` method is needed to apply masks to weight before calling the real ``forward`` method.
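A simplified, illustrative sketch of what such a wrapper might look like (the actual implementation is ``nni.compression.pytorch.compressor.PrunerModuleWrapper``, which handles more cases):

.. code-block:: python

    import torch
    import torch.nn as nn

    class SimplifiedPrunerWrapper(nn.Module):
        """Illustrative only -- see PrunerModuleWrapper for the real implementation."""
        def __init__(self, module: nn.Module):
            super().__init__()
            self.module = module
            # buffer registered on the wrapper, so the original module is not contaminated
            self.register_buffer('weight_mask', torch.ones_like(module.weight))

        def forward(self, *inputs):
            # apply the mask to the weight before calling the real forward
            self.module.weight.data = self.module.weight.data * self.weight_mask
            return self.module(*inputs)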
Pruning hook
^^^^^^^^^^^^
A pruning hook is installed on a pruner when the pruner is constructed; it is used to call the pruner's ``calc_mask`` method when ``optimizer.step()`` is invoked.
----
Quantizer
---------
The Quantizer class is also a subclass of ``Compressor``. It is used to compress models by reducing the number of bits required to represent weights or activations, which can reduce computation and inference time. It contains:
Quantization module wrapper
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Each module/layer of the model to be quantized is wrapped by a quantization module wrapper, which provides a new ``forward`` method to quantize the original module's weight, input and output.
Quantization hook
^^^^^^^^^^^^^^^^^
A quantization hook is installed on a quantizer when it is constructed; it is called at ``optimizer.step()``.
Quantization methods
^^^^^^^^^^^^^^^^^^^^
The ``Quantizer`` class provides the following methods for subclasses to implement quantization algorithms:
.. code-block:: python
class Quantizer(Compressor):
"""
Base quantizer for pytorch quantizer
"""
def quantize_weight(self, weight, wrapper, **kwargs):
"""
quantize should overload this method to quantize weight.
This method is effectively hooked to :meth:`forward` of the model.
Parameters
----------
weight : Tensor
weight that needs to be quantized
wrapper : QuantizerModuleWrapper
the wrapper for origin module
"""
raise NotImplementedError('Quantizer must overload quantize_weight()')
def quantize_output(self, output, wrapper, **kwargs):
"""
quantize should overload this method to quantize output.
This method is effectively hooked to :meth:`forward` of the model.
Parameters
----------
output : Tensor
output that needs to be quantized
wrapper : QuantizerModuleWrapper
the wrapper for origin module
"""
raise NotImplementedError('Quantizer must overload quantize_output()')
def quantize_input(self, *inputs, wrapper, **kwargs):
"""
quantize should overload this method to quantize input.
This method is effectively hooked to :meth:`forward` of the model.
Parameters
----------
inputs : Tensor
inputs that need to be quantized
wrapper : QuantizerModuleWrapper
the wrapper for origin module
"""
raise NotImplementedError('Quantizer must overload quantize_input()')
----
Multi-GPU support
-----------------
In multi-GPU training, buffers and parameters are copied to each GPU every time the ``forward`` method runs. If buffers and parameters are updated in the ``forward`` method, an ``in-place`` update is needed to ensure the update is effective.
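For example, assuming a hypothetical buffer ``running_sum`` registered through ``set_wrappers_attribute``, an in-place update inside ``forward`` would look like this sketch:

.. code-block:: python

    def forward(self, *inputs):
        output = self.module(*inputs)
        # in-place update (add_) keeps the buffer change effective across GPU replicas;
        # rebinding the attribute (self.running_sum = ...) would not be effective
        self.running_sum.add_(output.detach().abs().mean())
        return output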
Since ``calc_mask`` is called in the ``optimizer.step`` method, which happens after the ``forward`` method and only on one GPU, it naturally supports multi-GPU training.
Model Compression with NNI
==========================
.. contents::
Today's large neural networks have more layers and nodes than ever before, so reducing their storage and computation cost becomes an important topic, especially for applications that require real-time responses. Model compression methods can be used to address these problems.
NNI's model compression toolkit provides state-of-the-art model compression algorithms and strategies to help compress and speed up models. The main features supported by NNI model compression are:
* Support for a variety of popular pruning and quantization algorithms.
* Automating the model pruning and quantization process with state-of-the-art strategies through NNI's powerful auto-tuning capability.
* Speeding up compressed models so that they have lower inference latency and smaller size.
* Friendly and easy-to-use compression utilities that help users gain insight into the compression process and results.
* Concise interfaces for users to implement their own compression algorithms.
Compression Pipeline
--------------------
.. image:: ../../img/compression_flow.jpg
:target: ../../img/compression_flow.jpg
:alt:
The figure above shows the overall model compression pipeline in NNI. To compress a pretrained model, pruning and quantization can be used separately or in combination.
.. note::
NNI's compression algorithms do not by themselves make the model smaller or reduce latency; NNI's speedup tool is what actually compresses the model and reduces latency. To obtain a truly compressed model, users should run `model speedup <./ModelSpeedup.rst>`__. Note that PyTorch and TensorFlow share a unified API interface; currently only the PyTorch version is supported, and TensorFlow support will be provided in the future.
Supported Algorithms
--------------------
The supported algorithms include pruning and quantization algorithms.
Pruning Algorithms
^^^^^^^^^^^^^^^^^^
Pruning algorithms compress the original network by removing redundant weights or channels of layers, which can reduce model complexity and mitigate the over-fitting issue.
.. list-table::
:header-rows: 1
:widths: auto
* - Name
  - Brief Introduction of Algorithm
* - `Level Pruner <Pruner.rst#level-pruner>`__
  - Prunes weights proportionally according to their absolute values.
* - `AGP Pruner <../Compression/Pruner.rst#agp-pruner>`__
  - Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression). `Reference Paper <https://arxiv.org/abs/1710.01878>`__
* - `Lottery Ticket Pruner <../Compression/Pruner.rst#lottery-ticket>`__
  - The pruning process used in "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes the model iteratively. `Reference Paper <https://arxiv.org/abs/1803.03635>`__
* - `FPGM Pruner <../Compression/Pruner.rst#fpgm-pruner>`__
  - Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration. `Reference Paper <https://arxiv.org/pdf/1811.00250.pdf>`__
* - `L1Filter Pruner <../Compression/Pruner.rst#l1filter-pruner>`__
  - Prunes filters with the smallest L1 weight norm in convolutional layers (Pruning Filters for Efficient Convnets). `Reference Paper <https://arxiv.org/abs/1608.08710>`__
* - `L2Filter Pruner <../Compression/Pruner.rst#l2filter-pruner>`__
  - Prunes filters with the smallest L2 weight norm in convolutional layers.
* - `ActivationAPoZRankFilterPruner <../Compression/Pruner.rst#activationapozrankfilter-pruner>`__
  - Prunes filters based on the metric APoZ (Average Percentage of Zeros), which measures the percentage of zeros in the activations of (convolutional) layers. `Reference Paper <https://arxiv.org/abs/1607.03250>`__
* - `ActivationMeanRankFilterPruner <../Compression/Pruner.rst#activationmeanrankfilter-pruner>`__
  - Prunes filters based on the metric that computes the smallest mean value of output activations.
* - `Slim Pruner <../Compression/Pruner.rst#slim-pruner>`__
  - Prunes channels in convolutional layers by pruning the scaling factors in BN layers (Learning Efficient Convolutional Networks through Network Slimming). `Reference Paper <https://arxiv.org/abs/1708.06519>`__
* - `TaylorFO Pruner <../Compression/Pruner.rst#taylorfoweightfilter-pruner>`__
  - Prunes filters based on the first-order Taylor expansion on weights (Importance Estimation for Neural Network Pruning). `Reference Paper <http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf>`__
* - `ADMM Pruner <../Compression/Pruner.rst#admm-pruner>`__
  - Pruning based on the ADMM optimization technique. `Reference Paper <https://arxiv.org/abs/1804.03294>`__
* - `NetAdapt Pruner <../Compression/Pruner.rst#netadapt-pruner>`__
  - Automatically and iteratively prunes a pretrained network to meet the computation resource budget. `Reference Paper <https://arxiv.org/abs/1804.03230>`__
* - `SimulatedAnnealing Pruner <../Compression/Pruner.rst#simulatedannealing-pruner>`__
  - Automatic pruning with a guided heuristic simulated annealing algorithm. `Reference Paper <https://arxiv.org/abs/1907.03141>`__
* - `AutoCompress Pruner <../Compression/Pruner.rst#autocompress-pruner>`__
  - Automatic pruning by iteratively calling SimulatedAnnealing Pruner and ADMM Pruner. `Reference Paper <https://arxiv.org/abs/1907.03141>`__
* - `AMC Pruner <../Compression/Pruner.rst#amc-pruner>`__
  - AMC: AutoML for Model Compression and Acceleration on Mobile Devices. `Reference Paper <https://arxiv.org/pdf/1802.03494.pdf>`__
* - `Transformer Head Pruner <../Compression/Pruner.rst#transformer-head-pruner>`__
  - Pruning of attention heads in transformer models.
Refer to this :githublink:`benchmark <../CommunitySharings/ModelCompressionComparison.rst>` to see how these pruners perform on some benchmark problems.
Quantization Algorithms
^^^^^^^^^^^^^^^^^^^^^^^
Quantization algorithms compress the original network by reducing the number of precision bits required to represent weights or activations, which can reduce computation and inference time.
.. list-table::
:header-rows: 1
:widths: auto
* - Name
  - Brief Introduction of Algorithm
* - `Naive Quantizer <../Compression/Quantizer.rst#naive-quantizer>`__
  - Quantizes weights to 8 bits by default.
* - `QAT Quantizer <../Compression/Quantizer.rst#qat-quantizer>`__
  - Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. `Reference Paper <http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf>`__
* - `DoReFa Quantizer <../Compression/Quantizer.rst#dorefa-quantizer>`__
  - DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. `Reference Paper <https://arxiv.org/abs/1606.06160>`__
* - `BNN Quantizer <../Compression/Quantizer.rst#bnn-quantizer>`__
  - Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. `Reference Paper <https://arxiv.org/abs/1602.02830>`__
* - `LSQ Quantizer <../Compression/Quantizer.rst#lsq-quantizer>`__
  - Learned step size quantization. `Reference Paper <https://arxiv.org/pdf/1902.08153.pdf>`__
* - `Observer Quantizer <../Compression/Quantizer.rst#observer-quantizer>`__
  - Post-training quantization. Uses observers to collect quantization information during calibration.
Model Speedup
-------------
The goal of model compression is to reduce inference latency and model size. However, existing model compression algorithms mainly check the performance (e.g., accuracy) of the compressed model through simulation: for example, pruning algorithms use masks, and the quantized values in quantization algorithms are still stored as 32-bit floats. Given the masks and quantization bits produced by those algorithms, NNI can really speed up the model. The detailed tutorial on mask-based model speedup can be found `here <./ModelSpeedup.rst>`__, and the detailed tutorial on mixed-precision quantization speedup can be found `here <./QuantizationSpeedup.rst>`__.
Compression Utilities
---------------------
The compression utilities include several useful tools that help users understand and analyze the model to be compressed. For example, users can check the sensitivity of each layer to pruning, and easily count the FLOPs and parameters of a model. `Click here <./CompressionUtils.rst>`__ for the complete list of compression utilities.
Advanced Usage
--------------
NNI model compression provides concise interfaces for customizing new compression algorithms. The design philosophy of the interfaces is to wrap the framework-related implementation details and let users focus on the compression logic. Users can learn more about our compression framework and customize new compression algorithms (pruning or quantization) based on it. Moreover, NNI's auto-tuning capability can be leveraged to compress a model automatically. Refer to `here <./advanced.rst>`__ for more details.
Reference and Feedback
----------------------
* `Report a bug <https://github.com/microsoft/nni/issues/new?template=bug-report.rst>`__ for this feature on GitHub;
* `File a feature or improvement request <https://github.com/microsoft/nni/issues/new?template=enhancement.rst>`__ on GitHub;
* Learn more about `Feature Engineering with NNI <../FeatureEngineering/Overview.rst>`__\ ;
* Learn more about `NAS with NNI <../NAS/Overview.rst>`__\ ;
* Learn more about `Hyperparameter Tuning with NNI <../Tuner/BuiltinTuner.rst>`__\ .
Supported Pruning Algorithms on NNI
===================================
We provide several pruning algorithms that support fine-grained weight pruning and structural filter pruning. **Fine-grained pruning** generally results in unstructured models, which need specialized hardware or software to speed up the sparse network. **Filter pruning** achieves acceleration by removing entire filters. Some pruning algorithms use a one-shot method that prunes weights at once based on an importance metric (it is then necessary to fine-tune the model to compensate for the loss of accuracy). Other pruning algorithms prune weights **iteratively** during optimization, which controls the pruning schedule; these include some automatic pruning algorithms.
**One-shot Pruning**
* `Level Pruner <#level-pruner>`__ (fine-grained pruning)
* `Slim Pruner <#slim-pruner>`__
* `FPGM Pruner <#fpgm-pruner>`__
* `L1Filter Pruner <#l1filter-pruner>`__
* `L2Filter Pruner <#l2filter-pruner>`__
* `Activation APoZ Rank Filter Pruner <#activationAPoZRankFilter-pruner>`__
* `Activation Mean Rank Filter Pruner <#activationmeanrankfilter-pruner>`__
* `Taylor FO On Weight Pruner <#taylorfoweightfilter-pruner>`__
**Iterative Pruning**
* `AGP Pruner <#agp-pruner>`__
* `NetAdapt Pruner <#netadapt-pruner>`__
* `SimulatedAnnealing Pruner <#simulatedannealing-pruner>`__
* `AutoCompress Pruner <#autocompress-pruner>`__
* `AMC Pruner <#amc-pruner>`__
* `Sensitivity Pruner <#sensitivity-pruner>`__
* `ADMM Pruner <#admm-pruner>`__
**Others**
* `Lottery Ticket Hypothesis <#lottery-ticket-hypothesis>`__
* `Transformer Head Pruner <#transformer-head-pruner>`__
Level Pruner
------------
This is a basic one-shot pruner: you can set a target sparsity level (expressed as a fraction; 0.6 means 60% of the weight parameters will be pruned).
It first sorts the weights in the specified layer by their absolute values, and then masks the smallest-magnitude weights to zero until the desired sparsity level is reached.
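Conceptually, the per-layer mask computation is equivalent to something like the following sketch (illustrative only, not NNI's actual implementation):

.. code-block:: python

    import torch

    def level_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        # number of weights to prune in this layer
        k = int(weight.numel() * sparsity)
        mask = torch.ones_like(weight)
        if k == 0:
            return mask
        # indices of the k smallest-magnitude weights
        prune_idx = torch.topk(weight.abs().flatten(), k, largest=False).indices
        mask.view(-1)[prune_idx] = 0
        return mask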
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
pruner = LevelPruner(model, config_list)
pruner.compress()
User configuration for Level Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.LevelPruner
**TensorFlow**
.. autoclass:: nni.algorithms.compression.tensorflow.pruning.LevelPruner
Slim Pruner
-----------
This is a one-shot pruner, which adds sparsity regularization on the scaling factors of batch normalization (BN) layers during training to identify unimportant channels. Channels with small scaling factor values will be pruned. For more details, please refer to `'Learning Efficient Convolutional Networks through Network Slimming' <https://arxiv.org/pdf/1708.06519.pdf>`__\.
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import SlimPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['BatchNorm2d'] }]
pruner = SlimPruner(model, config_list, optimizer, trainer, criterion)
pruner.compress()
User configuration for Slim Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.SlimPruner
Reproduced Experiment
^^^^^^^^^^^^^^^^^^^^^
We implemented one of the experiments in `Learning Efficient Convolutional Networks through Network Slimming <https://arxiv.org/pdf/1708.06519.pdf>`__: we pruned ``70%`` of the channels in the **VGGNet** for CIFAR-10, as in the paper, in which ``88.5%`` of the parameters are pruned. Our experiment results are as follows:
.. list-table::
:header-rows: 1
:widths: auto
* - Model
- Error(paper/ours)
- Parameters
- Pruned
* - VGGNet
- 6.34/6.69
- 20.04M
-
* - Pruned-VGGNet
- 6.20/6.34
- 2.03M
- 88.5%
The experiment code can be found at :githublink:`examples/model_compress/pruning/basic_pruners_torch.py <examples/model_compress/pruning/basic_pruners_torch.py>`
.. code-block:: bash
python basic_pruners_torch.py --pruner slim --model vgg19 --sparsity 0.7 --speed-up
----
FPGM Pruner
-----------
This is a one-shot pruner, which prunes filters with the smallest geometric median. FPGM chooses the filters with the most replaceable contribution.
For more details, please refer to `Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration <https://arxiv.org/pdf/1811.00250.pdf>`__.
We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please reference `dependency-aware <./DependencyAware.rst>`__ for more details.
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import FPGMPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = FPGMPruner(model, config_list)
pruner.compress()
User configuration for FPGM Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.FPGMPruner
L1Filter Pruner
---------------
This is a one-shot pruner, which prunes the filters in the **convolution layers**.
..
The procedure of pruning m filters from the ith convolutional layer is as follows:
#. For each filter :math:`F_{i,j}`, calculate the sum of its absolute kernel weights :math:`s_j=\sum_{l=1}^{n_i}\sum|K_l|`.
#. Sort the filters by :math:`s_j`.
#. Prune :math:`m` filters with the smallest sum values and their corresponding feature maps. The
kernels in the next convolutional layer corresponding to the pruned feature maps are also removed.
#. A new kernel matrix is created for both the :math:`i`-th and :math:`i+1`-th layers, and the remaining kernel
weights are copied to the new model.
For more details, please refer to `PRUNING FILTERS FOR EFFICIENT CONVNETS <https://arxiv.org/abs/1608.08710>`__\.
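The per-filter scoring described above can be sketched as follows (illustrative only; NNI's actual masker handles more bookkeeping):

.. code-block:: python

    import torch

    def l1_filter_mask(conv_weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        # conv_weight has shape (out_channels, in_channels, kH, kW)
        # s_j: sum of absolute kernel weights of filter j
        filter_scores = conv_weight.abs().sum(dim=(1, 2, 3))
        num_prune = int(conv_weight.size(0) * sparsity)
        # filters with the smallest sums are pruned
        prune_idx = torch.argsort(filter_scores)[:num_prune]
        mask = torch.ones_like(conv_weight)
        mask[prune_idx] = 0
        return mask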
In addition, we also provide a dependency-aware mode for the L1FilterPruner. For more details about the dependency-aware mode, please reference `dependency-aware mode <./DependencyAware.rst>`__.
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import L1FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L1FilterPruner(model, config_list)
pruner.compress()
User configuration for L1Filter Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.L1FilterPruner
Reproduced Experiment
^^^^^^^^^^^^^^^^^^^^^
We implemented one of the experiments in `PRUNING FILTERS FOR EFFICIENT CONVNETS <https://arxiv.org/abs/1608.08710>`__ with **L1FilterPruner**: we pruned **VGG-16** for CIFAR-10 to **VGG-16-pruned-A** as in the paper, in which ``64%`` of the parameters are pruned. Our experiment results are as follows:
.. list-table::
:header-rows: 1
:widths: auto
* - Model
- Error(paper/ours)
- Parameters
- Pruned
* - VGG-16
- 6.75/6.49
- 1.5x10^7
-
* - VGG-16-pruned-A
- 6.60/6.47
- 5.4x10^6
- 64.0%
The experiment code can be found at :githublink:`examples/model_compress/pruning/basic_pruners_torch.py <examples/model_compress/pruning/basic_pruners_torch.py>`
.. code-block:: bash
python basic_pruners_torch.py --pruner l1filter --model vgg16 --speed-up
----
L2Filter Pruner
---------------
This is a structured pruning algorithm that prunes the filters with the smallest L2 norm of the weights. It is implemented as a one-shot pruner.
We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please reference `dependency-aware <./DependencyAware.rst>`__ for more details.
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import L2FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L2FilterPruner(model, config_list)
pruner.compress()
User configuration for L2Filter Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.L2FilterPruner
----
ActivationAPoZRankFilter Pruner
-------------------------------
ActivationAPoZRankFilter Pruner is a pruner which prunes the filters with the smallest importance criterion ``APoZ`` calculated from the output activations of convolution layers to achieve a preset level of network sparsity. The pruning criterion ``APoZ`` is explained in the paper `Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures <https://arxiv.org/abs/1607.03250>`__.
The APoZ is defined as:
:math:`APoZ_{c}^{(i)} = APoZ\left(O_{c}^{(i)}\right)=\frac{\sum_{k}^{N} \sum_{j}^{M} f\left(O_{c, j}^{(i)}(k)=0\right)}{N \times M}`
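In code, the per-channel APoZ of a convolutional output could be computed roughly as follows (a sketch, not NNI's implementation):

.. code-block:: python

    import torch

    def apoz(activation: torch.Tensor) -> torch.Tensor:
        # activation: (N, C, H, W) post-ReLU output of a convolutional layer
        # returns the average percentage of zeros per channel (shape: (C,))
        return (activation == 0).float().mean(dim=(0, 2, 3))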
We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please reference `dependency-aware <./DependencyAware.rst>`__ for more details.
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import ActivationAPoZRankFilterPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = ActivationAPoZRankFilterPruner(model, config_list, optimizer, trainer, criterion, sparsifying_training_batches=1)
pruner.compress()
Note: ActivationAPoZRankFilterPruner is used to prune convolutional layers within deep neural networks, therefore the ``op_types`` field supports only convolutional layers.
You can view :githublink:`example <examples/model_compress/pruning/basic_pruners_torch.py>` for more information.
User configuration for ActivationAPoZRankFilter Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.ActivationAPoZRankFilterPruner
----
ActivationMeanRankFilter Pruner
-------------------------------
ActivationMeanRankFilterPruner is a pruner which prunes the filters with the smallest importance criterion ``mean activation`` calculated from the output activations of convolution layers to achieve a preset level of network sparsity. The pruning criterion ``mean activation`` is explained in section 2.2 of the paper `Pruning Convolutional Neural Networks for Resource Efficient Inference <https://arxiv.org/abs/1611.06440>`__. Other pruning criteria mentioned in this paper will be supported in a future release.
We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please reference `dependency-aware <./DependencyAware.rst>`__ for more details.
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import ActivationMeanRankFilterPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = ActivationMeanRankFilterPruner(model, config_list, optimizer, trainer, criterion, sparsifying_training_batches=1)
pruner.compress()
Note: ActivationMeanRankFilterPruner is used to prune convolutional layers within deep neural networks, therefore the ``op_types`` field supports only convolutional layers.
You can view :githublink:`example <examples/model_compress/pruning/basic_pruners_torch.py>` for more information.
User configuration for ActivationMeanRankFilterPruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.ActivationMeanRankFilterPruner
----
TaylorFOWeightFilter Pruner
---------------------------
TaylorFOWeightFilter Pruner is a pruner which prunes convolutional layers based on estimated importance calculated from the first-order Taylor expansion on weights to achieve a preset level of network sparsity. The estimated importance of filters is defined in the paper `Importance Estimation for Neural Network Pruning <http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf>`__. Other pruning criteria mentioned in this paper will be supported in a future release.
..
:math:`\widehat{\mathcal{I}}_{\mathcal{S}}^{(1)}(\mathbf{W}) \triangleq \sum_{s \in \mathcal{S}} \mathcal{I}_{s}^{(1)}(\mathbf{W})=\sum_{s \in \mathcal{S}}\left(g_{s} w_{s}\right)^{2}`
We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please reference `dependency-aware <./DependencyAware.rst>`__ for more details.
What's more, we provide a global-sort mode for this pruner, which is aligned with the paper's implementation. Please set the parameter ``global_sort`` to ``True`` when instantiating TaylorFOWeightFilterPruner.
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import TaylorFOWeightFilterPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = TaylorFOWeightFilterPruner(model, config_list, optimizer, trainer, criterion, sparsifying_training_batches=1)
pruner.compress()
User configuration for TaylorFOWeightFilter Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.TaylorFOWeightFilterPruner
----
AGP Pruner
----------
This is an iterative pruner, in which the sparsity is increased from an initial sparsity value :math:`s_i` (usually 0) to a final sparsity value :math:`s_f` over a span of :math:`n` pruning steps, starting at training step :math:`t_{0}` and with pruning frequency :math:`\Delta t`:
:math:`s_{t}=s_{f}+\left(s_{i}-s_{f}\right)\left(1-\frac{t-t_{0}}{n \Delta t}\right)^{3} \text { for } t \in\left\{t_{0}, t_{0}+\Delta t, \ldots, t_{0} + n \Delta t\right\}`
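The schedule above can be transcribed directly, for example (variable names chosen for readability):

.. code-block:: python

    def agp_sparsity(t, s_i, s_f, t0, n, delta_t):
        # sparsity at training step t, for t in {t0, t0 + delta_t, ..., t0 + n * delta_t}
        progress = (t - t0) / (n * delta_t)
        return s_f + (s_i - s_f) * (1 - progress) ** 3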
For more details please refer to `To prune, or not to prune: exploring the efficacy of pruning for model compression <https://arxiv.org/abs/1710.01878>`__\.
Usage
^^^^^
You can prune all weights from 0% to 80% sparsity in 10 epochs with the code below.
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import AGPPruner
config_list = [{
'sparsity': 0.8,
'op_types': ['default']
}]
# load a pretrained model or train a model before using a pruner
# model = MyModel()
# model.load_state_dict(torch.load('mycheckpoint.pth'))
# AGP pruner prunes model while fine tuning the model by adding a hook on
# optimizer.step(), so an optimizer is required to prune the model.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
pruner = AGPPruner(model, config_list, optimizer, trainer, criterion, pruning_algorithm='level')
pruner.compress()
The AGP pruner uses the ``LevelPruner`` algorithm to prune the weights by default; however, you can set the ``pruning_algorithm`` parameter to other values to use other pruning algorithms:
* ``level``\ : LevelPruner
* ``slim``\ : SlimPruner
* ``l1``\ : L1FilterPruner
* ``l2``\ : L2FilterPruner
* ``fpgm``\ : FPGMPruner
* ``taylorfo``\ : TaylorFOWeightFilterPruner
* ``apoz``\ : ActivationAPoZRankFilterPruner
* ``mean_activation``\ : ActivationMeanRankFilterPruner
User configuration for AGP Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.AGPPruner
----
NetAdapt Pruner
---------------
NetAdapt allows a user to automatically simplify a pretrained network to meet the resource budget.
Given the overall sparsity, NetAdapt will automatically generate the sparsities distribution among different layers by iterative pruning.
For more details, please refer to `NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications <https://arxiv.org/abs/1804.03230>`__.
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import NetAdaptPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = NetAdaptPruner(model, config_list, short_term_fine_tuner=short_term_fine_tuner, evaluator=evaluator, base_algo='l1', experiment_data_dir='./')
pruner.compress()
You can view :githublink:`example <examples/model_compress/pruning/auto_pruners_torch.py>` for more information.
User configuration for NetAdapt Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.NetAdaptPruner
SimulatedAnnealing Pruner
-------------------------
We implement a guided heuristic search method, the Simulated Annealing (SA) algorithm, enhanced with guided search based on prior experience.
The enhanced SA technique is based on the observation that a DNN layer with a larger number of weights often tolerates a higher degree of compression with less impact on overall accuracy. The search proceeds roughly as follows (a Python sketch is given after the steps):
* Randomly initialize a pruning rate distribution (sparsities).
* While current_temperature > stop_temperature:
#. Generate a perturbation to the current distribution
#. Perform a fast evaluation on the perturbed distribution
#. Accept the perturbation according to the performance and probability; if not accepted, return to step 1
#. Cool down: current_temperature <- current_temperature * cool_down_rate
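A rough Python sketch of this loop (``perturb`` and ``fast_evaluate`` stand in for the pruner's internal procedures and are not real NNI APIs):

.. code-block:: python

    import math
    import random

    def simulated_annealing(init_sparsities, perturb, fast_evaluate,
                            start_temperature=100.0, stop_temperature=20.0, cool_down_rate=0.9):
        current, current_perf = init_sparsities, fast_evaluate(init_sparsities)
        temperature = start_temperature
        while temperature > stop_temperature:
            candidate = perturb(current, magnitude=temperature)
            candidate_perf = fast_evaluate(candidate)
            delta = candidate_perf - current_perf
            # accept better candidates; accept worse ones with a temperature-dependent probability
            if delta > 0 or random.random() < math.exp(delta / temperature):
                current, current_perf = candidate, candidate_perf
            # cool down
            temperature *= cool_down_rate
        return current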
For more details, please refer to `AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates <https://arxiv.org/abs/1907.03141>`__.
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import SimulatedAnnealingPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = SimulatedAnnealingPruner(model, config_list, evaluator=evaluator, base_algo='l1', cool_down_rate=0.9, experiment_data_dir='./')
pruner.compress()
You can view :githublink:`example <examples/model_compress/pruning/auto_pruners_torch.py>` for more information.
User configuration for SimulatedAnnealing Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.SimulatedAnnealingPruner
AutoCompress Pruner
-------------------
In each round, AutoCompressPruner prunes the model with the same sparsity to achieve the overall target sparsity:
#. Generate a sparsity distribution using SimulatedAnnealingPruner.
#. Perform ADMM-based structured pruning to generate the pruning result for the next round. Here we use ``speedup`` to perform real pruning.
For more details, please refer to `AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates <https://arxiv.org/abs/1907.03141>`__.
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import AutoCompressPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = AutoCompressPruner(
model, config_list, trainer=trainer, evaluator=evaluator,
dummy_input=dummy_input, num_iterations=3, optimize_mode='maximize', base_algo='l1',
cool_down_rate=0.9, admm_num_iterations=30, admm_training_epochs=5, experiment_data_dir='./')
pruner.compress()
You can view :githublink:`example <examples/model_compress/pruning/auto_pruners_torch.py>` for more information.
User configuration for AutoCompress Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.AutoCompressPruner
AMC Pruner
----------
The AMC pruner leverages reinforcement learning to provide the model compression policy.
This learning-based compression policy outperforms conventional rule-based compression policies by achieving a higher compression ratio,
better preserving accuracy, and reducing human labor.
For more details, please refer to `AMC: AutoML for Model Compression and Acceleration on Mobile Devices <https://arxiv.org/pdf/1802.03494.pdf>`__.
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import AMCPruner
config_list = [{
'op_types': ['Conv2d', 'Linear']
}]
pruner = AMCPruner(model, config_list, evaluator, val_loader, flops_ratio=0.5)
pruner.compress()
You can view :githublink:`example <examples/model_compress/pruning/amc/>` for more information.
User configuration for AMC Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.AMCPruner
Reproduced Experiment
^^^^^^^^^^^^^^^^^^^^^
We implemented one of the experiments in `AMC: AutoML for Model Compression and Acceleration on Mobile Devices <https://arxiv.org/pdf/1802.03494.pdf>`__: we pruned **MobileNet** to 50% FLOPs on ImageNet as in the paper. Our experiment results are as follows:
.. list-table::
:header-rows: 1
:widths: auto
* - Model
- Top 1 acc.(paper/ours)
- Top 5 acc. (paper/ours)
- FLOPS
* - MobileNet
- 70.5% / 69.9%
- 89.3% / 89.1%
- 50%
The experiments code can be found at :githublink:`examples/model_compress/pruning/ <examples/model_compress/pruning/amc/>`
ADMM Pruner
-----------
Alternating Direction Method of Multipliers (ADMM) is a mathematical optimization technique
that decomposes the original nonconvex problem into two subproblems which can be solved iteratively. In the weight pruning problem, these two subproblems are solved via 1) a gradient descent algorithm and 2) a Euclidean projection, respectively.
During the process of solving these two subproblems, the weights of the original model will be changed. A one-shot pruner is then applied to prune the model according to the given config list.
This solution framework applies both to non-structured and different variations of structured pruning schemes.
For more details, please refer to `A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers <https://arxiv.org/abs/1804.03294>`__.
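As an illustration of the Euclidean projection subproblem, the sketch below projects a weight tensor onto a sparsity constraint by keeping only the largest-magnitude entries; it is a simplified stand-alone example under that assumption, not the pruner's internal code:
.. code-block:: python

    import torch

    def euclidean_projection(weight, sparsity):
        # The closest point (in Euclidean distance) in the constraint set is
        # obtained by zeroing out the smallest-magnitude entries.
        num_prune = int(weight.numel() * sparsity)
        if num_prune == 0:
            return weight.clone()
        flat = weight.abs().flatten()
        threshold = torch.topk(flat, num_prune, largest=False).values.max()
        return torch.where(weight.abs() > threshold, weight, torch.zeros_like(weight))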
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import ADMMPruner
config_list = [{
'sparsity': 0.8,
'op_types': ['Conv2d'],
'op_names': ['conv1']
}, {
'sparsity': 0.92,
'op_types': ['Conv2d'],
'op_names': ['conv2']
}]
pruner = ADMMPruner(model, config_list, trainer, num_iterations=30, epochs_per_iteration=5)
pruner.compress()
You can view :githublink:`example <examples/model_compress/pruning/auto_pruners_torch.py>` for more information.
User configuration for ADMM Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.ADMMPruner
Lottery Ticket Hypothesis
-------------------------
`The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks <https://arxiv.org/abs/1803.03635>`__\ , by Jonathan Frankle and Michael Carbin, provides comprehensive measurement and analysis, and articulates the *lottery ticket hypothesis*\ : dense, randomly-initialized, feed-forward networks contain subnetworks (*winning tickets*\ ) that -- when trained in isolation -- reach test accuracy comparable to the original network in a similar number of iterations.
In this paper, the authors use the following process to prune a model, called *iterative pruning*\ :
..
#. Randomly initialize a neural network f(x; theta_0) (where theta_0 follows the distribution D_theta).
#. Train the network for j iterations, arriving at parameters theta_j.
#. Prune p% of the parameters in theta_j, creating a mask m.
#. Reset the remaining parameters to their values in theta_0, creating the winning ticket f(x;m*theta_0).
#. Repeat step 2, 3, and 4.
If the configured final sparsity is P (e.g., 0.8) and there are n pruning iterations, each iteration prunes 1-(1-P)^(1/n) of the weights that survived the previous round.
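For instance, with the configuration used below (final sparsity 0.8 over 5 pruning iterations), each round prunes roughly 27.5% of the surviving weights:
.. code-block:: python

    # per-round pruning ratio for final sparsity P over n iterations
    P, n = 0.8, 5
    per_round = 1 - (1 - P) ** (1 / n)
    print(round(per_round, 3))  # ~0.275, i.e. each round prunes ~27.5% of the survivors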
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import LotteryTicketPruner
config_list = [{
'prune_iterations': 5,
'sparsity': 0.8,
'op_types': ['default']
}]
pruner = LotteryTicketPruner(model, config_list, optimizer)
pruner.compress()
for _ in pruner.get_prune_iterations():
pruner.prune_iteration_start()
for epoch in range(epoch_num):
...
The above configuration means that there are 5 pruning iterations. As the 5 iterations are executed in the same run, LotteryTicketPruner needs ``model`` and ``optimizer`` (\ **note that an ``lr_scheduler`` should also be added if one is used**\ ) to reset their states every time a new prune iteration starts. Please use ``get_prune_iterations`` to get the pruning iterations, and invoke ``prune_iteration_start`` at the beginning of each iteration. ``epoch_num`` should be large enough for model convergence, because the hypothesis is that the performance (accuracy) obtained in later rounds with high sparsity can be comparable with that obtained in the first round.
User configuration for LotteryTicket Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.LotteryTicketPruner
Reproduced Experiment
^^^^^^^^^^^^^^^^^^^^^
We tried to reproduce the experiment result of the fully connected network on MNIST using the same configuration as in the paper. The code can be found :githublink:`here <examples/model_compress/pruning/lottery_torch_mnist_fc.py>`. In this experiment, we prune 10 times; for each pruning iteration we train the pruned model for 50 epochs.
.. image:: ../../img/lottery_ticket_mnist_fc.png
:target: ../../img/lottery_ticket_mnist_fc.png
:alt:
The above figure shows the result of the fully connected network. ``round0-sparsity-0.0`` is the performance without pruning. Consistent with the paper, pruning around 80% also obtains performance similar to non-pruning, and converges a little faster. If pruning too much, e.g., larger than 94%, the accuracy becomes lower and convergence becomes a little slower. A slight difference from the paper is that the trend of the data in the paper is clearer than ours.
Sensitivity Pruner
------------------
In each round, SensitivityPruner prunes the model based on each layer's sensitivity to accuracy, until the configured final sparsity of the whole model is reached:
.. code-block:: bash
1. Analyze the sensitivity of each layer in the current state of the model.
2. Prune each layer according to the sensitivity.
For more details, please refer to `Learning both Weights and Connections for Efficient Neural Networks <https://arxiv.org/abs/1506.02626>`__.
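The per-layer sensitivity analysis can be sketched as follows. This is a simplified illustration (magnitude-based masking at a few sparsity levels, with an accuracy ``evaluator`` of the same kind used in the usage example below), not the pruner's actual implementation:
.. code-block:: python

    import copy

    def layer_sensitivity(model, layer_name, evaluator, sparsities=(0.25, 0.5, 0.75)):
        # Temporarily zero the smallest-magnitude weights of one layer at several
        # sparsity levels and record the resulting validation accuracy.
        curve = {}
        for s in sparsities:
            probe = copy.deepcopy(model)
            weight = dict(probe.named_modules())[layer_name].weight.data
            k = int(weight.numel() * s)
            if k > 0:
                threshold = weight.abs().flatten().kthvalue(k).values
                weight.mul_((weight.abs() > threshold).float())
            curve[s] = evaluator(probe)
        return curve  # layers whose accuracy degrades slowly tolerate more pruning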
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import SensitivityPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = SensitivityPruner(model, config_list, finetuner=fine_tuner, evaluator=evaluator)
# eval_args and finetune_args are the parameters passed to the evaluator and finetuner respectively
pruner.compress(eval_args=[model], finetune_args=[model])
User configuration for Sensitivity Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.SensitivityPruner
Transformer Head Pruner
-----------------------
Transformer Head Pruner is a tool designed for pruning attention heads from the models belonging to the `Transformer family <https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf>`__. The following image from `Efficient Transformers: A Survey <https://arxiv.org/pdf/2009.06732.pdf>`__ gives a good overview of the general structure of the Transformer.
.. image:: ../../img/transformer_structure.png
:target: ../../img/transformer_structure.png
:alt:
Typically, each attention layer in Transformer models consists of four weight matrices: three projection matrices for query, key, and value, and an output projection matrix. The outputs of the former three matrices contain the projected results for all heads. Normally, the results are then reshaped so that each head performs its attention computation independently. The final results are concatenated back before being fed into the output projection. Therefore, when an attention head is pruned, the weights corresponding to that head in the three projection matrices are pruned, as are the weights in the output projection corresponding to the head's output. In our implementation, we calculate and apply masks to the four matrices together.
Note: currently, the pruner can only handle models with projection weights written as separate ``Linear`` modules, i.e., it expects four ``Linear`` modules corresponding to the query, key, value, and output projections. Therefore, in the ``config_list``, you should either write ``['Linear']`` for the ``op_types`` field, or write names corresponding to ``Linear`` modules for the ``op_names`` field. For instance, the `Huggingface transformers <https://huggingface.co/transformers/index.html>`_ are supported, but ``torch.nn.Transformer`` is not.
The pruner implements the following algorithm:
.. code-block:: bash
Repeat for each pruning iteration (1 for one-shot pruning):
1. Calculate importance scores for each head in each specified layer using a specific criterion.
2. Sort heads locally or globally, and prune out some heads with lowest scores. The number of pruned heads is determined according to the sparsity specified in the config.
3. If the specified pruning iteration is larger than 1 (iterative pruning), finetune the model for a while before the next pruning iteration.
Currently, the following head sorting criteria are supported:
* "l1_weight": rank heads by the L1-norm of weights of the query, key, and value projection matrices.
* "l2_weight": rank heads by the L2-norm of weights of the query, key, and value projection matrices.
* "l1_activation": rank heads by the L1-norm of their attention computation output.
* "l2_activation": rank heads by the L2-norm of their attention computation output.
* "taylorfo": rank heads by l1 norm of the output of attention computation * gradient for this output. Check more details in `this paper <https://arxiv.org/abs/1905.10650>`__ and `this one <https://arxiv.org/abs/1611.06440>`__.
We support local sorting (i.e., sorting heads within a layer) and global sorting (sorting all heads together), which you can control by setting the ``global_sort`` parameter. Note that if ``global_sort=True`` is passed, all weights must have the same sparsity in the config list. However, this does not mean that each layer will be pruned to the specified sparsity. This sparsity value is interpreted as a global sparsity, and each layer is likely to end up with a different sparsity after pruning by global sort. As a reminder, we found that if global sorting is used, it is usually helpful to use an iterative pruning scheme, interleaving pruning with intermediate finetuning, since global sorting often results in non-uniform sparsity distributions, which make the model more susceptible to forgetting.
In our implementation, we support two ways to group the four weights in the same layer together. You can either pass a nested list containing the names of these modules as the pruner's initialization parameters (usage below), or simply pass a dummy input instead and the pruner will run ``torch.jit.trace`` to group the weights (experimental feature). However, if you would like to assign different sparsity to each layer, you can only use the first option, i.e., passing names of the weights to the pruner (see usage below). Also, note that we require the weights belonging to the same layer to have the same sparsity.
Usage
^^^^^
Suppose we want to prune a BERT model with the Huggingface implementation, which has the following architecture (obtained by calling ``print(model)``). Note that we only show the first of the repeated layers in the encoder's ``ModuleList``.
.. image:: ../../img/huggingface_bert_architecture.png
:target: ../../img/huggingface_bert_architecture.png
:alt:
**Usage Example: one-shot pruning, assigning sparsity 0.5 to the first six layers and sparsity 0.25 to the last six layers (PyTorch code)**. Note that
* Here we specify ``op_names`` in the config list to assign different sparsity to different layers.
* Meanwhile, we pass ``attention_name_groups`` to the pruner so that the pruner may group together the weights belonging to the same attention layer.
* Since in this example we want to do one-shot pruning, the ``num_iterations`` parameter is set to 1, and the parameter ``epochs_per_iteration`` is ignored. If you would like to do iterative pruning instead, you can set the ``num_iterations`` parameter to the number of pruning iterations, and the ``epochs_per_iteration`` parameter to the number of finetuning epochs between two iterations.
* The arguments ``trainer`` and ``optimizer`` are only used when we want to do iterative pruning, or the ranking criterion is ``taylorfo``. Here these two parameters are ignored by the pruner.
* The argument ``forward_runner`` is only used when the ranking criterion is ``l1_activation`` or ``l2_activation``. Here this parameter is ignored by the pruner.
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import TransformerHeadPruner
attention_name_groups = list(zip(["encoder.layer.{}.attention.self.query".format(i) for i in range(12)],
["encoder.layer.{}.attention.self.key".format(i) for i in range(12)],
["encoder.layer.{}.attention.self.value".format(i) for i in range(12)],
["encoder.layer.{}.attention.output.dense".format(i) for i in range(12)]))
kwargs = {"ranking_criterion": "l1_weight",
"global_sort": False,
"num_iterations": 1,
"epochs_per_iteration": 1, # this is ignored when num_iterations = 1
"head_hidden_dim": 64,
"attention_name_groups": attention_name_groups,
"trainer": trainer,
"optimizer": optimizer,
"forward_runner": forward_runner
}
config_list = [{
"sparsity": 0.5,
"op_types": ["Linear"],
"op_names": [x for layer in attention_name_groups[:6] for x in layer] # first six layers
}, {
"sparsity": 0.25,
"op_types": ["Linear"],
"op_names": [x for layer in attention_name_groups[6:] for x in layer] # last six layers
}]
pruner = TransformerHeadPruner(model, config_list, **kwargs)
pruner.compress()
In addition to this usage guide, we provide a more detailed example of pruning BERT (Huggingface implementation) for transfer learning on tasks from the `GLUE benchmark <https://gluebenchmark.com/>`_. Please find it in this :githublink:`page <examples/model_compress/pruning/transformers>`. To run the example, first make sure that the packages ``transformers`` and ``datasets`` are installed. Then, you may start by running the following command:
.. code-block:: bash
./run.sh gpu_id glue_task
By default, the code will download a pretrained BERT language model and then finetune it for several epochs on the downstream GLUE task. Then, the ``TransformerHeadPruner`` will be used to prune heads from each layer by a certain criterion (by default, the code lets the pruner use magnitude ranking and prune out 50% of the heads in each layer in a one-shot manner). Finally, the pruned model will be finetuned on the downstream task for several epochs. You can check the details of pruning from the logs printed out by the example. You can also experiment with different pruning settings by changing the parameters in ``run.sh``, or by directly changing the ``config_list`` in ``transformer_pruning.py``.
User configuration for Transformer Head Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.pytorch.pruning.TransformerHeadPruner
Supported Quantization Algorithms on NNI
========================================
Index of supported quantization algorithms
* `Naive Quantizer <#naive-quantizer>`__
* `QAT Quantizer <#qat-quantizer>`__
* `DoReFa Quantizer <#dorefa-quantizer>`__
* `BNN Quantizer <#bnn-quantizer>`__
* `LSQ Quantizer <#lsq-quantizer>`__
* `Observer Quantizer <#observer-quantizer>`__
Naive Quantizer
---------------
We provide the Naive Quantizer to quantize weights to 8 bits by default. You can use it to test the quantization algorithm without any configuration.
Usage
^^^^^
PyTorch code
.. code-block:: python
model = nni.algorithms.compression.pytorch.quantization.NaiveQuantizer(model).compress()
----
QAT Quantizer
-------------
In `Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference <http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf>`__\ , authors Benoit Jacob and Skirmantas Kligys provide an algorithm to quantize the model with training.
..
We propose an approach that simulates quantization effects in the forward pass of training. Backpropagation still happens as usual, and all weights and biases are stored in floating point so that they can be easily nudged by small amounts. The forward propagation pass however simulates quantized inference as it will happen in the inference engine, by implementing in floating-point arithmetic the rounding behavior of the quantization scheme
* Weights are quantized before they are convolved with the input. If batch normalization (see [17]) is used for the layer, the batch normalization parameters are folded into the weights before quantization.
* Activations are quantized at points where they would be during inference, e.g. after the activation function is applied to a convolutional or fully connected layer's output, or after a bypass connection adds or concatenates the outputs of several layers together such as in ResNets.
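The simulated ("fake") quantization applied during the forward pass can be sketched as follows; this is a generic illustration of the quantize-dequantize operation described above, not NNI's internal kernel:
.. code-block:: python

    import torch

    def fake_quantize(x, scale, zero_point, qmin=0, qmax=255):
        # Quantize to integers, then immediately dequantize, so the forward pass
        # sees the quantization error while training continues in floating point.
        q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
        return (q - zero_point) * scale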
Usage
^^^^^
You can quantize your model to 8 bits with the code below before your training code.
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer
model = Mnist()
config_list = [{
'quant_types': ['weight'],
'quant_bits': {
'weight': 8,
}, # you can just use `int` here because all `quant_types` share the same bit length; see the config for `ReLU6` below.
'op_types':['Conv2d', 'Linear']
}, {
'quant_types': ['output'],
'quant_bits': 8,
'quant_start_step': 7000,
'op_types':['ReLU6']
}]
quantizer = QAT_Quantizer(model, config_list)
quantizer.compress()
You can view the :githublink:`example <examples/model_compress/quantization/QAT_torch_quantizer.py>` for more information.
User configuration for QAT Quantizer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Common configuration needed by compression algorithms can be found at `Specification of config_list <./QuickStart.rst>`__.
Configuration needed by this algorithm:
* **quant_start_step:** int
Disable quantization until the model has been run for a certain number of steps. This allows the network to enter a more stable
state where the activation quantization ranges do not exclude a significant fraction of values. The default value is 0.
Batch normalization folding
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Batch normalization folding is supported by the QAT quantizer. It can be enabled simply by passing the argument ``dummy_input`` to
the quantizer, like:
.. code-block:: python
# assume your model takes an input of shape (1, 1, 28, 28)
# and dummy_input must be on the same device as the model
dummy_input = torch.randn(1, 1, 28, 28)
# pass the dummy_input to the quantizer
quantizer = QAT_Quantizer(model, config_list, dummy_input=dummy_input)
The quantizer will automatically detect Conv-BN patterns and simulate the batch normalization folding process in the training
graph. Note that when the quantization-aware training process is finished, the folded weight/bias will be restored after calling
``quantizer.export_model``.
Quantization dtype and scheme customization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Different backends on different devices use different quantization strategies (i.e., dtype (int or uint) and
scheme (per-tensor or per-channel, symmetric or affine)). The QAT quantizer supports customization of mainstream dtypes and schemes.
There are two ways to set them. One way is setting them globally through a function named ``set_quant_scheme_dtype``, like:
.. code-block:: python
from nni.compression.pytorch.quantization.settings import set_quant_scheme_dtype
# This will set all the quantization of 'input' in 'per_tensor_affine' and 'uint' manner
set_quant_scheme_dtype('input', 'per_tensor_affine', 'uint')
# This will set all the quantization of 'output' in 'per_tensor_symmetric' and 'int' manner
set_quant_scheme_dtype('output', 'per_tensor_symmetric', 'int')
# This will set all the quantization of 'weight' in 'per_channel_symmetric' and 'int' manner
set_quant_scheme_dtype('weight', 'per_channel_symmetric', 'int')
The other way is more detailed. You can customize the dtype and scheme in each quantization config list like:
.. code-block:: python
config_list = [{
'quant_types': ['weight'],
'quant_bits': 8,
'op_types':['Conv2d', 'Linear'],
'quant_dtype': 'int',
'quant_scheme': 'per_channel_symmetric'
}, {
'quant_types': ['output'],
'quant_bits': 8,
'quant_start_step': 7000,
'op_types':['ReLU6'],
'quant_dtype': 'uint',
'quant_scheme': 'per_tensor_affine'
}]
Multi-GPU training
^^^^^^^^^^^^^^^^^^^
The QAT quantizer natively supports multi-GPU training (DataParallel and DistributedDataParallel). Note that the quantizer
instantiation should happen before you wrap your model with DataParallel or DistributedDataParallel. For example:
.. code-block:: python
from torch.nn.parallel import DistributedDataParallel as DDP
from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer
model = define_your_model()
quantizer = QAT_Quantizer(model, config_list, **other_params) # <--- QAT_Quantizer instantiation
model = quantizer.compress()
model = DDP(model)
for i in range(epochs):
train(model)
eval(model)
----
LSQ Quantizer
-------------
In `LEARNED STEP SIZE QUANTIZATION <https://arxiv.org/pdf/1902.08153.pdf>`__\ , authors Steven K. Esser and Jeffrey L. McKinstry provide an algorithm to train the scales with gradients.
..
The authors introduce a novel means to estimate and scale the task loss gradient at each weight and activation layer’s quantizer step size, such that it can be learned in conjunction with other network parameters.
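A minimal sketch of the learned-step-size quantization step (using a straight-through estimator for rounding and the gradient-scale trick from the paper) is given below; it only illustrates the idea and is not the ``LsqQuantizer`` implementation:
.. code-block:: python

    import torch

    def lsq_quantize(v, step_size, q_neg, q_pos, grad_scale):
        # Scale the gradient flowing into the learnable step size without
        # changing its forward value.
        s = (step_size - step_size * grad_scale).detach() + step_size * grad_scale
        v_bar = torch.clamp(v / s, -q_neg, q_pos)
        # Straight-through estimator: round in the forward pass, identity gradient.
        v_hat = (torch.round(v_bar) - v_bar).detach() + v_bar
        return v_hat * s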
Usage
^^^^^
You can add the code below before your training code. Three things must be done:
1. Configure which layers to quantize and which tensors (input/output/weight) of those layers to quantize.
2. Construct the LSQ quantizer.
3. Call the ``compress`` API.
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.quantization import LsqQuantizer
model = Mnist()
configure_list = [{
'quant_types': ['weight', 'input'],
'quant_bits': {
'weight': 8,
'input': 8,
},
'op_names': ['conv1']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8,},
'op_names': ['relu1']
}]
quantizer = LsqQuantizer(model, configure_list, optimizer)
quantizer.compress()
You can view the example :githublink:`examples/model_compress/quantization/LSQ_torch_quantizer.py <examples/model_compress/quantization/LSQ_torch_quantizer.py>` for more information.
User configuration for LSQ Quantizer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Common configuration needed by compression algorithms can be found at `Specification of config_list <./QuickStart.rst>`__.
Configuration needed by this algorithm:
----
DoReFa Quantizer
----------------
In `DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients <https://arxiv.org/abs/1606.06160>`__\ , authors Shuchang Zhou and Yuxin Wu provide an algorithm named DoReFa to quantize the weight, activation and gradients with training.
Usage
^^^^^
To use the DoReFa Quantizer, you can add the code below before your training code.
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.quantization import DoReFaQuantizer
config_list = [{
'quant_types': ['weight'],
'quant_bits': 8,
'op_types': ['default']
}]
quantizer = DoReFaQuantizer(model, config_list)
quantizer.compress()
You can view the example for more information.
User configuration for DoReFa Quantizer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Common configuration needed by compression algorithms can be found at `Specification of config_list <./QuickStart.rst>`__.
Configuration needed by this algorithm:
----
BNN Quantizer
-------------
In `Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 <https://arxiv.org/abs/1602.02830>`__\ , the authors describe the method as follows:
..
We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time. At training-time the binary weights and activations are used for computing the parameters gradients. During the forward pass, BNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations, which is expected to substantially improve power-efficiency.
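The core binarization with a straight-through estimator can be sketched as follows; this illustrates the idea in the paper and is not the ``BNNQuantizer`` internals:
.. code-block:: python

    import torch

    class BinarizeSTE(torch.autograd.Function):
        # Forward: binarize to -1/+1 with the sign function.
        # Backward: straight-through estimator that passes gradients
        # only where |x| <= 1 (hard-tanh clipping).
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return torch.sign(x)

        @staticmethod
        def backward(ctx, grad_output):
            x, = ctx.saved_tensors
            return grad_output * (x.abs() <= 1).float()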
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.quantization import BNNQuantizer
model = VGG_Cifar10(num_classes=10)
configure_list = [{
'quant_bits': 1,
'quant_types': ['weight'],
'op_types': ['Conv2d', 'Linear'],
'op_names': ['features.0', 'features.3', 'features.7', 'features.10', 'features.14', 'features.17', 'classifier.0', 'classifier.3']
}, {
'quant_bits': 1,
'quant_types': ['output'],
'op_types': ['Hardtanh'],
'op_names': ['features.6', 'features.9', 'features.13', 'features.16', 'features.20', 'classifier.2', 'classifier.5']
}]
quantizer = BNNQuantizer(model, configure_list)
model = quantizer.compress()
You can view example :githublink:`examples/model_compress/quantization/BNN_quantizer_cifar10.py <examples/model_compress/quantization/BNN_quantizer_cifar10.py>` for more information.
User configuration for BNN Quantizer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Common configuration needed by compression algorithms can be found at `Specification of config_list <./QuickStart.rst>`__.
Configuration needed by this algorithm:
Experiment
^^^^^^^^^^
We reproduced one of the experiments in `Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 <https://arxiv.org/abs/1602.02830>`__\ : quantizing **VGGNet** for CIFAR-10. Our experiment results are as follows:
.. list-table::
:header-rows: 1
:widths: auto
* - Model
- Accuracy
* - VGGNet
- 86.93%
The experiments code can be found at :githublink:`examples/model_compress/quantization/BNN_quantizer_cifar10.py <examples/model_compress/quantization/BNN_quantizer_cifar10.py>`
Observer Quantizer
------------------
..
Observer quantizer is a framework for post-training quantization. It inserts observers into the places where quantization will happen. During quantization calibration, each observer records all the tensors it 'sees'. These tensors are used to calculate the quantization statistics after calibration.
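A minimal min/max observer, shown below, illustrates how such recorded statistics can be turned into an affine scale and zero point. It is only a sketch; the observers actually used by the quantizer are more elaborate (and partly hard-coded, see the note below):
.. code-block:: python

    class MinMaxObserver:
        # Records the running min/max of the tensors it "sees" and derives an
        # 8-bit affine quantization scale and zero point from them.
        def __init__(self):
            self.min_val, self.max_val = float('inf'), float('-inf')

        def observe(self, x):
            self.min_val = min(self.min_val, x.min().item())
            self.max_val = max(self.max_val, x.max().item())

        def calculate_qparams(self, qmin=0, qmax=255):
            scale = max((self.max_val - self.min_val) / (qmax - qmin), 1e-8)
            zero_point = int(round(qmin - self.min_val / scale))
            return scale, zero_point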
Usage
^^^^^
1. Configure which layers to quantize and which tensors (input/output/weight) of those layers to quantize.
2. Construct the observer quantizer.
3. Run quantization calibration.
4. Call the ``compress`` API to calculate the scale and zero point for each tensor and switch the model to evaluation mode.
PyTorch code
.. code-block:: python
from nni.algorithms.compression.pytorch.quantization import ObserverQuantizer
def calibration(model, calib_loader):
model.eval()
with torch.no_grad():
for data, _ in calib_loader:
model(data)
model = Mnist()
configure_list = [{
'quant_bits': 8,
'quant_types': ['weight', 'input'],
'op_names': ['conv1', 'conv2'],
}, {
'quant_bits': 8,
'quant_types': ['output'],
'op_names': ['relu1', 'relu2'],
}]
quantizer = ObserverQuantizer(model, configure_list)
calibration(model, calib_loader)
model = quantizer.compress()
You can view example :githublink:`examples/model_compress/quantization/observer_quantizer.py <examples/model_compress/quantization/observer_quantizer.py>` for more information.
User configuration for Observer Quantizer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Common configuration needed by compression algorithms can be found at `Specification of config_list <./QuickStart.rst>`__.
.. note::
This quantizer is still under development for now. Some quantizer settings are hard-coded:
- weight observer: per_tensor_symmetric, qint8
- output observer: per_tensor_affine, quint8, reduce_range=True
Other settings (such as quant_type and op_names) can be configured.
About the compress API
^^^^^^^^^^^^^^^^^^^^^^
Before the ``compress`` API is called, the model only records tensors' statistics; no quantization is performed.
After the ``compress`` API is called, the model no longer records tensors' statistics. The quantization scale and zero point are
generated for each tensor and are used to quantize each tensor during inference (we call this evaluation mode).
About calibration
^^^^^^^^^^^^^^^^^
Usually we pick about 100 training/evaluation examples for calibration. If you find the accuracy is a bit low, try
reducing the number of calibration examples.
Quick Start
===========
.. toctree::
:hidden:
Notebook Example <compression_pipeline_example>
Model compression usually consists of three stages: 1) pre-training a model, 2) compressing the model, 3) fine-tuning the model. NNI mainly focuses on the second stage and provides very simple APIs for compressing a model. Follow this guide for a quick look at how easy it is to use NNI to compress a model.
.. A `compression pipeline example <./compression_pipeline_example.rst>`__ with a Jupyter notebook is provided; refer to the code :githublink:`here <examples/notebooks/compression_pipeline_example.ipynb>`.
Model Pruning
-------------
Here we use `level pruner <../Compression/Pruner.rst#level-pruner>`__ as an example to show the usage of pruning in NNI.
Step1. Write configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^
Write a configuration to specify the layers that you want to prune. The following configuration means pruning all the ``default`` ops to sparsity 0.5 while keeping other layers unpruned.
.. code-block:: python
config_list = [{
'sparsity': 0.5,
'op_types': ['default'],
}]
The specification of configuration can be found `here <./Tutorial.rst#specify-the-configuration>`__. Note that different pruners may have their own defined fields in configuration. Please refer to each pruner's `usage <./Pruner.rst>`__ for details, and adjust the configuration accordingly.
Step2. Choose a pruner and compress the model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
First instantiate the chosen pruner with your model and configuration as arguments, then invoke ``compress()`` to compress your model. Note that some algorithms may check gradients during compression, so we may also need to define a trainer, an optimizer, and a criterion and pass them to the pruner.
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import LevelPruner
pruner = LevelPruner(model, config_list)
model = pruner.compress()
Some pruners (e.g., L1FilterPruner, FPGMPruner) prune once, while others (e.g., AGPPruner) prune your model iteratively, with the masks adjusted epoch by epoch during training.
So if a pruner prunes your model iteratively, or needs training or inference to get gradients, you need to pass the fine-tuning logic to the pruner.
For example:
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import AGPPruner
pruner = AGPPruner(model, config_list, optimizer, trainer, criterion, num_iterations=10, epochs_per_iteration=1, pruning_algorithm='level')
model = pruner.compress()
Step3. Export compression result
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
After training, you can export the model weights to a file, and the generated masks to a file as well. Exporting an ONNX model is also supported.
.. code-block:: python
pruner.export_model(model_path='pruned_vgg19_cifar10.pth', mask_path='mask_vgg19_cifar10.pth')
Please refer to the :githublink:`mnist example <examples/model_compress/pruning/naive_prune_torch.py>` for example code.
More examples of pruning algorithms can be found in :githublink:`basic_pruners_torch <examples/model_compress/pruning/basic_pruners_torch.py>` and :githublink:`auto_pruners_torch <examples/model_compress/pruning/auto_pruners_torch.py>`.
Model Quantization
------------------
Here we use the `QAT Quantizer <../Compression/Quantizer.rst#qat-quantizer>`__ as an example to show the usage of quantization in NNI.
Step1. Write configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: python
config_list = [{
'quant_types': ['weight', 'input'],
'quant_bits': {
'weight': 8,
'input': 8,
}, # you can just use `int` here because all `quant_types` share the same bit length; see the config for `ReLU6` below.
'op_types':['Conv2d', 'Linear'],
'quant_dtype': 'int',
'quant_scheme': 'per_channel_symmetric'
}, {
'quant_types': ['output'],
'quant_bits': 8,
'quant_start_step': 7000,
'op_types':['ReLU6'],
'quant_dtype': 'uint',
'quant_scheme': 'per_tensor_affine'
}]
The specification of configuration can be found `here <./Tutorial.rst#quantization-specific-keys>`__.
Step2. Choose a quantizer and compress the model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: python
from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer
quantizer = QAT_Quantizer(model, config_list)
quantizer.compress()
Step3. Export compression result
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
After training and calibration, you can export the model weights to a file, and the generated calibration parameters to a file as well. Exporting an ONNX model is also supported.
.. code-block:: python
calibration_config = quantizer.export_model(model_path, calibration_path, onnx_path, input_shape, device)
Please refer to the :githublink:`mnist example <examples/model_compress/quantization/QAT_torch_quantizer.py>` for example code.
Congratulations! You've compressed your first model via NNI. To go a bit more in depth about model compression in NNI, check out the `Tutorial <./Tutorial.rst>`__.
.. 98b0285bbfe1a01c90b9ba6a9b0d6caa
Quick Start
===========
.. toctree::
:hidden:
Notebook Example <compression_pipeline_example>
Model compression usually consists of three stages: 1) pre-training a model, 2) compressing the model, 3) fine-tuning the model. NNI mainly focuses on the second stage and provides easy-to-use APIs for model compression. Follow this guide for a quick look at how to use NNI to compress a model. For a deeper understanding of the model compression module in NNI, check out the `Tutorial <./Tutorial.rst>`__.
.. A complete model compression pipeline `example <./compression_pipeline_example.rst>`__ in a Jupyter notebook is provided; refer to the :githublink:`code <examples/notebooks/compression_pipeline_example.ipynb>`.
Model Pruning
-------------
Here we use the `level pruner <../Compression/Pruner.rst#level-pruner>`__ as an example to show the usage of pruning in NNI.
Step1. Write a configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Write a configuration to specify the layers that you want to prune. The following configuration prunes all ``default`` ops to sparsity 0.5 while keeping the other layers unpruned.
.. code-block:: python
config_list = [{
'sparsity': 0.5,
'op_types': ['default'],
}]
The specification of configuration can be found `here <./Tutorial.rst#specify-the-configuration>`__. Note that different pruners may have their own configuration fields. Please refer to each pruner's `usage <./Pruner.rst>`__ for details and adjust the configuration accordingly.
Step2. Choose a pruner and compress the model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
First, instantiate the pruner with the model and pass the configuration as an argument, then invoke ``compress()`` to compress the model. Note that some algorithms may check gradients during training, so we may also need to define a trainer, an optimizer, and a criterion and pass them to the pruner.
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import LevelPruner
pruner = LevelPruner(model, config_list)
model = pruner.compress()
Then, train the model with your normal training procedure (e.g., SGD); pruning is transparent during training. Some pruners (e.g., L1FilterPruner, FPGMPruner) prune once at the beginning, and the subsequent training can be seen as fine-tuning. Some pruners (e.g., AGPPruner) prune the model iteratively, modifying the masks step by step during training.
If a pruner prunes iteratively, or training or inference is needed during pruning, you need to pass the fine-tuning logic to the pruner.
For example:
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import AGPPruner
pruner = AGPPruner(model, config_list, optimizer, trainer, criterion, num_iterations=10, epochs_per_iteration=1, pruning_algorithm='level')
model = pruner.compress()
Step3. Export the compression result
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
After training, you can export the model weights to a file and the generated masks to a file as well. Exporting an ONNX model is also supported.
.. code-block:: python
pruner.export_model(model_path='pruned_vgg19_cifar10.pth', mask_path='mask_vgg19_cifar10.pth')
Please refer to the :githublink:`mnist example <examples/model_compress/pruning/naive_prune_torch.py>` for the code.
More examples of pruning algorithms can be found in :githublink:`basic_pruners_torch <examples/model_compress/pruning/basic_pruners_torch.py>` and :githublink:`auto_pruners_torch <examples/model_compress/pruning/auto_pruners_torch.py>`.
Model Quantization
------------------
Here we use the `QAT Quantizer <../Compression/Quantizer.rst#qat-quantizer>`__ as an example to show the usage of quantization in NNI.
Step1. Write a configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: python
config_list = [{
'quant_types': ['weight', 'input'],
'quant_bits': {
'weight': 8,
'input': 8,
}, # you can just use `int` here because all `quant_types` share the same bit length; see the config for `ReLU6` below.
'op_types':['Conv2d', 'Linear'],
'quant_dtype': 'int',
'quant_scheme': 'per_channel_symmetric'
}, {
'quant_types': ['output'],
'quant_bits': 8,
'quant_start_step': 7000,
'op_types':['ReLU6'],
'quant_dtype': 'uint',
'quant_scheme': 'per_tensor_affine'
}]
The specification of configuration can be found `here <./Tutorial.rst#quantization-specific-keys>`__.
Step2. Choose a quantizer and compress the model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: python
from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer
quantizer = QAT_Quantizer(model, config_list)
quantizer.compress()
Step3. Export the compression result
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
After training and calibration, you can export the model weights to a file and the generated calibration parameters to a file as well. Exporting an ONNX model is also supported.
.. code-block:: python
calibration_config = quantizer.export_model(model_path, calibration_path, onnx_path, input_shape, device)
Please refer to the :githublink:`mnist example <examples/model_compress/quantization/QAT_torch_quantizer.py>` for example code.
Congratulations! You've compressed your first model via NNI. To go a bit more in depth about model compression in NNI, check out the `Tutorial <./Tutorial.rst>`__.
Tutorial
========
.. contents::
In this tutorial, we explain the usage of model compression in NNI in more detail.
Setup compression goal
----------------------
Specify the configuration
^^^^^^^^^^^^^^^^^^^^^^^^^
Users can specify the configuration (i.e., ``config_list``\ ) for a compression algorithm. For example, when compressing a model, users may want to specify the sparsity ratio, specify different ratios for different types of operations, exclude certain types of operations, or compress only certain types of operations. For users to express these kinds of requirements, we define a configuration specification. It can be seen as a Python ``list`` object, where each element is a ``dict`` object.
The ``dict``\ s in the ``list`` are applied one by one; that is, the configurations in a latter ``dict`` overwrite the configurations in former ones for the operations that are within the scope of both.
There are different keys in a ``dict``. Some of them are common keys supported by all the compression algorithms:
* **op_types**\ : This is to specify what types of operations to compress. 'default' means following the algorithm's default setting. All supported module types are defined in :githublink:`default_layers.py <nni/compression/pytorch/default_layers.py>` for PyTorch.
* **op_names**\ : This is to specify, by name, which operations to compress. If this field is omitted, operations will not be filtered by it.
* **exclude**\ : Default is False. If this field is True, the operations with the specified types and names will be excluded from the compression.
Some other keys are often specific to a certain algorithm, users can refer to `pruning algorithms <./Pruner.rst>`__ and `quantization algorithms <./Quantizer.rst>`__ for the keys allowed by each algorithm.
To prune all ``Conv2d`` layers with the sparsity of 0.6, the configuration can be written as:
.. code-block:: python
[{
'sparsity': 0.6,
'op_types': ['Conv2d']
}]
To control the sparsity of specific layers, the configuration can be written as:
.. code-block:: python
[{
'sparsity': 0.8,
'op_types': ['default']
},
{
'sparsity': 0.6,
'op_names': ['op_name1', 'op_name2']
},
{
'exclude': True,
'op_names': ['op_name3']
}]
This means: follow the algorithm's default setting for compressed operations with sparsity 0.8, but use sparsity 0.6 for ``op_name1`` and ``op_name2``, and do not compress ``op_name3``.
Quantization specific keys
^^^^^^^^^^^^^^^^^^^^^^^^^^
Besides the keys explained above, if you use quantization algorithms you need to specify more keys in ``config_list``\ , which are explained below.
* **quant_types** : list of string.
Type of quantization you want to apply; currently 'weight', 'input', and 'output' are supported. 'weight' means applying the quantization operation
to the weight parameter of modules. 'input' means applying the quantization operation to the input of the module's forward method. 'output' means applying the quantization operation to the output of the module's forward method, which is often called 'activation' in some papers.
* **quant_bits** : int or dict of {str : int}
Bit length of quantization; the key is the quantization type and the value is the quantization bit length, e.g.
.. code-block:: python
{
quant_bits: {
'weight': 8,
'output': 4,
},
}
When the value is of int type, all quantization types share the same bit length, e.g.
.. code-block:: python
{
quant_bits: 8, # weight or output quantization are all 8 bits
}
* **quant_dtype** : str or dict of {str : str}
Quantization dtype, used to determine the range of quantized values. Two choices are available:
- int: the range is signed
- uint: the range is unsigned
There are two ways to set it. One is a dict whose key is the quantization type and whose value is the quantization dtype, e.g.
.. code-block:: python
{
quant_dtype: {
'weight': 'int',
'output': 'uint',
},
}
The other is a str value; all quantization types then share the same dtype, e.g.
.. code-block:: python
{
'quant_dtype': 'int', # the dtype of weight and output quantization are all 'int'
}
In total there are two kinds of ``quant_dtype`` you can set: 'int' and 'uint'.
* **quant_scheme** : str or dict of {str : str}
Quantization scheme, used to determine the quantization manner. Four choices are available:
- per_tensor_affine: per tensor, asymmetric quantization
- per_tensor_symmetric: per tensor, symmetric quantization
- per_channel_affine: per channel, asymmetric quantization
- per_channel_symmetric: per channel, symmetric quantization
There are two ways to set it. One is a dict whose key is the quantization type and whose value is the quantization scheme, e.g.
.. code-block:: python
{
quant_scheme: {
'weight': 'per_channel_symmetric',
'output': 'per_tensor_affine',
},
}
The other is a str value; all quantization types then share the same quant_scheme, e.g.
.. code-block:: python
{
quant_scheme: 'per_channel_symmetric', # the quant_scheme of weight and output quantization are all 'per_channel_symmetric'
}
In total there are four kinds of ``quant_scheme`` you can set: 'per_tensor_affine', 'per_tensor_symmetric', 'per_channel_affine' and 'per_channel_symmetric'.
The following example shows a more complete ``config_list``\ , it uses ``op_names`` (or ``op_types``\ ) to specify the target layers along with the quantization bits for those layers.
.. code-block:: python
config_list = [{
'quant_types': ['weight'],
'quant_bits': 8,
'op_names': ['conv1'],
'quant_dtype': 'int',
'quant_scheme': 'per_channel_symmetric'
},
{
'quant_types': ['weight'],
'quant_bits': 4,
'quant_start_step': 0,
'op_names': ['conv2'],
'quant_dtype': 'int',
'quant_scheme': 'per_tensor_symmetric'
},
{
'quant_types': ['weight'],
'quant_bits': 3,
'op_names': ['fc1'],
'quant_dtype': 'int',
'quant_scheme': 'per_tensor_symmetric'
},
{
'quant_types': ['weight'],
'quant_bits': 2,
'op_names': ['fc2'],
'quant_dtype': 'int',
'quant_scheme': 'per_channel_symmetric'
}]
In this example, ``op_names`` specifies the layer names, and the four layers will be quantized with different ``quant_bits``.
Export compression result
-------------------------
Export the pruned model
^^^^^^^^^^^^^^^^^^^^^^^
You can easily export the pruned model using the following API. The ``state_dict`` of the sparse model weights will be stored in ``model.pth``\ , which can be loaded by ``torch.load('model.pth')``. Note that the exported ``model.pth`` has the same parameters as the original model, except that the masked weights are zero. ``mask_dict`` stores the binary masks produced by the pruning algorithm, which can be further used to speed up the model.
.. code-block:: python
# export model weights and mask
pruner.export_model(model_path='model.pth', mask_path='mask.pth')
# apply mask to model
from nni.compression.pytorch import apply_compression_results
apply_compression_results(model, mask_file, device)
Export the model in ``onnx`` format (``input_shape`` needs to be specified):
.. code-block:: python
pruner.export_model(model_path='model.pth', mask_path='mask.pth', onnx_path='model.onnx', input_shape=[1, 1, 28, 28])
Export the quantized model
^^^^^^^^^^^^^^^^^^^^^^^^^^
You can export the quantized model directly by using the ``torch.save`` API, and the quantized model can be loaded by ``torch.load`` without any extra modification. The following example shows the normal procedure of saving and loading a quantized model and getting the related parameters in QAT.
.. code-block:: python
# Save quantized model which is generated by using NNI QAT algorithm
torch.save(model.state_dict(), "quantized_model.pth")
# Simulate model loading procedure
# Have to init new model and compress it before loading
qmodel_load = Mnist()
optimizer = torch.optim.SGD(qmodel_load.parameters(), lr=0.01, momentum=0.5)
quantizer = QAT_Quantizer(qmodel_load, config_list, optimizer)
quantizer.compress()
# Load quantized model
qmodel_load.load_state_dict(torch.load("quantized_model.pth"))
# Get scale, zero_point and weight of conv1 in loaded model
conv1 = qmodel_load.conv1
scale = conv1.module.scale
zero_point = conv1.module.zero_point
weight = conv1.module.weight
Speed up the model
------------------
Masks do not provide a real speedup of your model. The model should be sped up based on the exported masks; thus, we provide an API to speed up your model as shown below. After speeding up your model with the exported masks, the model becomes a smaller one with shorter inference latency.
.. code-block:: python
from nni.compression.pytorch import apply_compression_results, ModelSpeedup
dummy_input = torch.randn(config['input_shape']).to(device)
m_speedup = ModelSpeedup(model, dummy_input, masks_file, device)
m_speedup.speedup_model()
Please refer to `here <ModelSpeedup.rst>`__ for a detailed description. The example code for model speedup can be found :githublink:`here <examples/model_compress/pruning/model_speedup.py>`.
Control the Fine-tuning process
-------------------------------
Enhance the fine-tuning process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Knowledge distillation effectively learns a small student model from a large teacher model. Users can enhance the fine-tuning process by utilizing knowledge distillation to improve the performance of the compressed model. Example code can be found :githublink:`here <examples/model_compress/pruning/finetune_kd_torch.py>`.
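A typical distillation objective used during such fine-tuning can be sketched as follows; the temperature ``T`` and the weight ``alpha`` are illustrative values, and this is not necessarily the exact loss used in the linked example:
.. code-block:: python

    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # Soft-target term: KL divergence between softened teacher and student outputs.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction='batchmean') * (T * T)
        # Hard-target term: ordinary cross entropy on the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard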
.. acd3f66ad7c2d82b950568efcba1f175
Advanced Usage
==============
.. toctree::
:maxdepth: 2
Framework <./Framework>
Customize Compression Algorithms <./CustomizeCompressor>
Automatic Model Compression (Beta) <./AutoCompression>
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. Prepare model"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn.functional as F\n",
"\n",
"class NaiveModel(torch.nn.Module):\n",
" def __init__(self):\n",
" super().__init__()\n",
" self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)\n",
" self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)\n",
" self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)\n",
" self.fc2 = torch.nn.Linear(500, 10)\n",
" self.relu1 = torch.nn.ReLU6()\n",
" self.relu2 = torch.nn.ReLU6()\n",
" self.relu3 = torch.nn.ReLU6()\n",
" self.max_pool1 = torch.nn.MaxPool2d(2, 2)\n",
" self.max_pool2 = torch.nn.MaxPool2d(2, 2)\n",
"\n",
" def forward(self, x):\n",
" x = self.relu1(self.conv1(x))\n",
" x = self.max_pool1(x)\n",
" x = self.relu2(self.conv2(x))\n",
" x = self.max_pool2(x)\n",
" x = x.view(-1, x.size()[1:].numel())\n",
" x = self.relu3(self.fc1(x))\n",
" x = self.fc2(x)\n",
" return F.log_softmax(x, dim=1)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# define model, optimizer, criterion, data_loader, trainer, evaluator.\n",
"\n",
"import torch.optim as optim\n",
"from torchvision import datasets, transforms\n",
"from torch.optim.lr_scheduler import StepLR\n",
"\n",
"device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
"\n",
"model = NaiveModel().to(device)\n",
"\n",
"optimizer = optim.Adadelta(model.parameters(), lr=1)\n",
"\n",
"criterion = torch.nn.NLLLoss()\n",
"\n",
"transform=transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])\n",
"train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)\n",
"test_dataset = datasets.MNIST('./data', train=False, transform=transform)\n",
"train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64)\n",
"test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1000)\n",
"\n",
"def trainer(model, optimizer, criterion, epoch):\n",
" model.train()\n",
" for batch_idx, (data, target) in enumerate(train_loader):\n",
" data, target = data.to(device), target.to(device)\n",
" optimizer.zero_grad()\n",
" output = model(data)\n",
" loss = criterion(output, target)\n",
" loss.backward()\n",
" optimizer.step()\n",
" if batch_idx % 100 == 0:\n",
" print('Train Epoch: {} [{}/{} ({:.0f}%)]\\tLoss: {:.6f}'.format(\n",
" epoch, batch_idx * len(data), len(train_loader.dataset),\n",
" 100. * batch_idx / len(train_loader), loss.item()))\n",
"\n",
"def evaluator(model):\n",
" model.eval()\n",
" test_loss = 0\n",
" correct = 0\n",
" with torch.no_grad():\n",
" for data, target in test_loader:\n",
" data, target = data.to(device), target.to(device)\n",
" output = model(data)\n",
" test_loss += F.nll_loss(output, target, reduction='sum').item()\n",
" pred = output.argmax(dim=1, keepdim=True)\n",
" correct += pred.eq(target.view_as(pred)).sum().item()\n",
"\n",
" test_loss /= len(test_loader.dataset)\n",
" acc = 100 * correct / len(test_loader.dataset)\n",
"\n",
" print('\\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\\n'.format(\n",
" test_loss, correct, len(test_loader.dataset), acc))\n",
"\n",
" return acc"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train Epoch: 0 [0/60000 (0%)]\tLoss: 2.313423\n",
"Train Epoch: 0 [6400/60000 (11%)]\tLoss: 0.091786\n",
"Train Epoch: 0 [12800/60000 (21%)]\tLoss: 0.087317\n",
"Train Epoch: 0 [19200/60000 (32%)]\tLoss: 0.036397\n",
"Train Epoch: 0 [25600/60000 (43%)]\tLoss: 0.008173\n",
"Train Epoch: 0 [32000/60000 (53%)]\tLoss: 0.047565\n",
"Train Epoch: 0 [38400/60000 (64%)]\tLoss: 0.122448\n",
"Train Epoch: 0 [44800/60000 (75%)]\tLoss: 0.036732\n",
"Train Epoch: 0 [51200/60000 (85%)]\tLoss: 0.150135\n",
"Train Epoch: 0 [57600/60000 (96%)]\tLoss: 0.109684\n",
"\n",
"Test set: Average loss: 0.0457, Accuracy: 9857/10000 (99%)\n",
"\n",
"Train Epoch: 1 [0/60000 (0%)]\tLoss: 0.020650\n",
"Train Epoch: 1 [6400/60000 (11%)]\tLoss: 0.091525\n",
"Train Epoch: 1 [12800/60000 (21%)]\tLoss: 0.019602\n",
"Train Epoch: 1 [19200/60000 (32%)]\tLoss: 0.027827\n",
"Train Epoch: 1 [25600/60000 (43%)]\tLoss: 0.019414\n",
"Train Epoch: 1 [32000/60000 (53%)]\tLoss: 0.007640\n",
"Train Epoch: 1 [38400/60000 (64%)]\tLoss: 0.051296\n",
"Train Epoch: 1 [44800/60000 (75%)]\tLoss: 0.012038\n",
"Train Epoch: 1 [51200/60000 (85%)]\tLoss: 0.121057\n",
"Train Epoch: 1 [57600/60000 (96%)]\tLoss: 0.015796\n",
"\n",
"Test set: Average loss: 0.0302, Accuracy: 9902/10000 (99%)\n",
"\n",
"Train Epoch: 2 [0/60000 (0%)]\tLoss: 0.009903\n",
"Train Epoch: 2 [6400/60000 (11%)]\tLoss: 0.062256\n",
"Train Epoch: 2 [12800/60000 (21%)]\tLoss: 0.013844\n",
"Train Epoch: 2 [19200/60000 (32%)]\tLoss: 0.014133\n",
"Train Epoch: 2 [25600/60000 (43%)]\tLoss: 0.001051\n",
"Train Epoch: 2 [32000/60000 (53%)]\tLoss: 0.006128\n",
"Train Epoch: 2 [38400/60000 (64%)]\tLoss: 0.032162\n",
"Train Epoch: 2 [44800/60000 (75%)]\tLoss: 0.007687\n",
"Train Epoch: 2 [51200/60000 (85%)]\tLoss: 0.092295\n",
"Train Epoch: 2 [57600/60000 (96%)]\tLoss: 0.006266\n",
"\n",
"Test set: Average loss: 0.0259, Accuracy: 9920/10000 (99%)\n",
"\n"
]
}
],
"source": [
"# pre-train model for 3 epoches.\n",
"\n",
"scheduler = StepLR(optimizer, step_size=1, gamma=0.7)\n",
"\n",
"for epoch in range(0, 3):\n",
" trainer(model, optimizer, criterion, epoch)\n",
" evaluator(model)\n",
" scheduler.step()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"op_name: \n",
"op_type: <class '__main__.NaiveModel'>\n",
"\n",
"op_name: conv1\n",
"op_type: <class 'torch.nn.modules.conv.Conv2d'>\n",
"\n",
"op_name: conv2\n",
"op_type: <class 'torch.nn.modules.conv.Conv2d'>\n",
"\n",
"op_name: fc1\n",
"op_type: <class 'torch.nn.modules.linear.Linear'>\n",
"\n",
"op_name: fc2\n",
"op_type: <class 'torch.nn.modules.linear.Linear'>\n",
"\n",
"op_name: relu1\n",
"op_type: <class 'torch.nn.modules.activation.ReLU6'>\n",
"\n",
"op_name: relu2\n",
"op_type: <class 'torch.nn.modules.activation.ReLU6'>\n",
"\n",
"op_name: relu3\n",
"op_type: <class 'torch.nn.modules.activation.ReLU6'>\n",
"\n",
"op_name: max_pool1\n",
"op_type: <class 'torch.nn.modules.pooling.MaxPool2d'>\n",
"\n",
"op_name: max_pool2\n",
"op_type: <class 'torch.nn.modules.pooling.MaxPool2d'>\n",
"\n"
]
},
{
"data": {
"text/plain": [
"[None, None, None, None, None, None, None, None, None, None]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# show all op_name and op_type in the model.\n",
"\n",
"[print('op_name: {}\\nop_type: {}\\n'.format(name, type(module))) for name, module in model.named_modules()]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([20, 1, 5, 5])\n"
]
}
],
"source": [
"# show the weight size of `conv1`.\n",
"\n",
"print(model.conv1.weight.data.size())"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[[[ 1.5338e-01, -1.1766e-01, -2.6654e-01, -2.9445e-02, -1.4650e-01],\n",
" [-1.8796e-01, -2.9882e-01, 6.9725e-02, 2.1561e-01, 6.5688e-02],\n",
" [ 1.5274e-01, -9.8471e-03, 3.2303e-01, 1.3472e-03, 1.7235e-01],\n",
" [ 1.1804e-01, 2.2535e-01, -8.3370e-02, -3.4553e-02, -1.2529e-01],\n",
" [-6.6012e-02, -2.0272e-02, -1.8797e-01, -4.6882e-02, -8.3206e-02]]],\n",
"\n",
"\n",
" [[[-1.2112e-01, 7.0756e-02, 5.0446e-02, 1.5156e-01, -2.7929e-02],\n",
" [-1.9744e-01, -2.1336e-03, 7.2534e-02, 6.2336e-02, 1.6039e-01],\n",
" [-6.7510e-02, 1.4636e-01, 7.1972e-02, -8.9118e-02, -4.0895e-02],\n",
" [ 2.9499e-02, 2.0788e-01, -1.4989e-01, 1.1668e-01, -2.8503e-01],\n",
" [ 8.1894e-02, -1.4489e-01, -4.2038e-02, -1.2794e-01, -5.0379e-02]]],\n",
"\n",
"\n",
" [[[ 3.8332e-02, -1.4270e-01, -1.9585e-01, 2.2653e-01, 1.0104e-01],\n",
" [-2.7956e-03, -1.4108e-01, -1.4694e-01, -1.3525e-01, 2.6959e-01],\n",
" [ 1.9522e-01, -1.2281e-01, -1.9173e-01, -1.8910e-02, 3.1572e-03],\n",
" [-1.0580e-01, -2.5239e-02, -5.8266e-02, -6.5815e-02, 6.6433e-02],\n",
" [ 8.9601e-02, 7.1189e-02, -2.4255e-01, 1.5746e-01, -1.4708e-01]]],\n",
"\n",
"\n",
" [[[-1.1963e-01, -1.7243e-01, -3.5174e-02, 1.4651e-01, -1.1675e-01],\n",
" [-1.3518e-01, 1.2830e-02, 7.7188e-02, 2.1060e-01, 4.0924e-02],\n",
" [-4.3364e-02, -1.9579e-01, -3.6559e-02, -6.9803e-02, 1.2380e-01],\n",
" [ 7.7321e-02, 3.7590e-02, 8.2935e-02, 2.2878e-01, 2.7859e-03],\n",
" [-1.3601e-01, -2.1167e-01, -2.3195e-01, -1.2524e-01, 1.0073e-01]]],\n",
"\n",
"\n",
" [[[-2.7300e-01, 6.8470e-02, 2.8405e-02, -4.5879e-03, -1.3735e-01],\n",
" [-8.9789e-02, -2.0209e-03, 5.0950e-03, 2.1633e-01, 2.5554e-01],\n",
" [ 5.4389e-02, 1.2262e-01, -1.5514e-01, -1.0416e-01, 1.3606e-01],\n",
" [-1.6794e-01, -2.8876e-02, 2.5900e-02, -2.4261e-02, 1.0923e-01],\n",
" [ 5.2524e-03, -4.4625e-02, -2.1327e-01, -1.7211e-01, -4.4819e-04]]],\n",
"\n",
"\n",
" [[[ 7.2378e-02, 1.5122e-01, -1.2964e-01, 4.9105e-02, -2.1639e-01],\n",
" [ 3.6547e-02, -1.5518e-02, 3.2059e-02, -3.2820e-02, 6.1231e-02],\n",
" [ 1.2514e-01, 8.0623e-02, 1.2686e-02, -1.0074e-01, 2.2836e-02],\n",
" [-2.6842e-02, 2.5578e-02, -2.5877e-01, -1.7808e-01, 7.6966e-02],\n",
" [-4.2424e-02, 4.7006e-02, -1.5486e-02, -4.2686e-02, 4.8482e-02]]],\n",
"\n",
"\n",
" [[[ 1.3081e-01, 9.9530e-02, -1.4729e-01, -1.7665e-01, -1.9757e-01],\n",
" [ 9.6603e-02, 2.2783e-02, 7.8402e-02, -2.8679e-02, 8.5252e-02],\n",
" [-1.5310e-02, 1.1605e-01, -5.8300e-02, 2.4563e-02, 1.7488e-01],\n",
" [ 6.5576e-02, -1.6325e-01, -1.1318e-01, -2.9251e-02, 6.2352e-02],\n",
" [-1.9084e-03, -1.4005e-01, -1.2363e-01, -9.7985e-02, -2.0562e-01]]],\n",
"\n",
"\n",
" [[[ 4.0772e-02, -8.2086e-02, -2.7555e-01, -3.2547e-01, -1.2226e-01],\n",
" [-5.9877e-02, 9.8567e-02, 2.5186e-01, -1.0280e-01, -2.3416e-01],\n",
" [ 8.5760e-02, 1.0896e-01, 1.4898e-01, 2.1579e-01, 8.5297e-02],\n",
" [ 5.4720e-02, -1.7226e-01, -7.2518e-02, 6.7099e-03, -1.6011e-03],\n",
" [-8.9944e-02, 1.7404e-01, -3.6985e-02, 1.8602e-01, 7.2353e-02]]],\n",
"\n",
"\n",
" [[[ 1.6276e-02, -9.6439e-02, -9.6085e-02, -2.4267e-01, -1.8521e-01],\n",
" [ 6.3310e-02, 1.7866e-01, 1.1694e-01, -1.4464e-01, -2.7711e-01],\n",
" [-2.4514e-02, 2.2222e-01, 2.1053e-01, -1.4271e-01, 8.7045e-02],\n",
" [-1.9207e-01, -5.4719e-02, -5.7775e-03, -1.0034e-05, -1.0923e-01],\n",
" [-2.4006e-02, 2.3780e-02, 1.8988e-01, 2.4734e-01, 4.8097e-02]]],\n",
"\n",
"\n",
" [[[ 1.1335e-01, -5.8451e-02, 5.2440e-02, -1.3223e-01, -2.5534e-02],\n",
" [ 9.1323e-02, -6.0707e-02, 2.3524e-01, 2.4992e-01, 8.7842e-02],\n",
" [ 2.9002e-02, 3.5379e-02, -5.9689e-02, -2.8363e-03, 1.8618e-01],\n",
" [-2.9671e-01, 8.1830e-03, 1.1076e-01, -5.4118e-02, -6.1685e-02],\n",
" [-1.7580e-01, -3.4534e-01, -3.9250e-01, -2.7569e-01, -2.6131e-01]]],\n",
"\n",
"\n",
" [[[ 1.1586e-01, -7.5997e-02, -1.4614e-01, 4.8750e-02, 1.8097e-01],\n",
" [-6.7027e-02, -1.4901e-01, -1.5614e-02, -1.0379e-02, 9.5526e-02],\n",
" [-3.2333e-02, -1.5107e-01, -1.9498e-01, 1.0083e-01, 2.2328e-01],\n",
" [-2.0692e-01, -6.3798e-02, -1.2524e-01, 1.9549e-01, 1.9682e-01],\n",
" [-2.1494e-01, 1.0475e-01, -2.4858e-02, -9.7831e-02, 1.1551e-01]]],\n",
"\n",
"\n",
" [[[ 6.3785e-02, -1.8044e-01, -1.0190e-01, -1.3588e-01, 8.5433e-02],\n",
" [ 2.0675e-01, 3.3238e-02, 9.2437e-02, 1.1799e-01, 2.1111e-01],\n",
" [-5.2138e-02, 1.5790e-01, 1.8151e-01, 8.0470e-02, 1.0131e-01],\n",
" [-4.4786e-02, 1.1771e-01, 2.1706e-02, -1.2563e-01, -2.1142e-01],\n",
" [-2.3589e-01, -2.1154e-01, -1.7890e-01, -2.7769e-01, -1.2512e-01]]],\n",
"\n",
"\n",
" [[[ 1.9133e-01, 2.4711e-01, 1.0413e-01, -1.9187e-01, -3.0991e-01],\n",
" [-1.2382e-01, 8.3641e-03, -5.6734e-02, 5.8376e-02, 2.2880e-02],\n",
" [-3.1734e-01, -1.0637e-02, -5.5974e-02, 1.0676e-01, -1.1080e-02],\n",
" [-2.2980e-01, 2.0486e-01, 1.0147e-01, 1.4484e-01, 5.2265e-02],\n",
" [ 7.4410e-02, 2.2806e-02, 8.5137e-02, -2.1809e-01, 3.1704e-02]]],\n",
"\n",
"\n",
" [[[-1.1006e-01, -2.5311e-01, 1.8925e-02, 1.0399e-02, 1.1951e-01],\n",
" [-2.1116e-01, 1.8409e-01, 3.2172e-02, 1.5962e-01, -7.9457e-02],\n",
" [ 1.1059e-01, 9.1966e-02, 1.0777e-01, -9.9132e-02, -4.4586e-02],\n",
" [-8.7919e-02, -3.7283e-02, 9.1275e-02, -3.7412e-02, 3.8875e-02],\n",
" [-4.3558e-02, 1.6196e-01, -4.7944e-03, -1.7560e-02, -1.2593e-01]]],\n",
"\n",
"\n",
" [[[ 7.6976e-02, -3.8627e-02, 1.2610e-01, 1.1994e-01, 2.1706e-03],\n",
" [ 7.4357e-02, 6.7929e-02, 3.1386e-02, 1.4606e-01, 2.1429e-01],\n",
" [-2.6569e-01, -4.2631e-04, -3.6654e-02, -3.0967e-02, -9.4961e-02],\n",
" [-2.0192e-01, -3.5423e-01, -2.5246e-01, -3.5092e-01, -2.4159e-01],\n",
" [ 1.7636e-02, 1.3744e-01, -1.0306e-01, 8.8370e-02, 7.3258e-02]]],\n",
"\n",
"\n",
" [[[ 2.0016e-01, 1.0956e-01, -5.9223e-02, 6.4871e-03, -2.4165e-01],\n",
" [ 5.6283e-02, 1.7276e-01, -2.2316e-01, -1.6699e-01, -7.0742e-02],\n",
" [ 2.6179e-01, -2.5102e-01, -2.0774e-01, -9.6413e-02, 3.4367e-02],\n",
" [-9.1882e-02, -2.9195e-01, -8.7432e-02, 1.0144e-01, -2.0559e-02],\n",
" [-2.5668e-01, -9.8016e-02, 1.1103e-01, -3.0233e-02, 1.1076e-01]]],\n",
"\n",
"\n",
" [[[ 1.0027e-03, -5.7955e-02, -2.1339e-01, -1.6729e-01, -2.0870e-01],\n",
" [ 4.2464e-02, 2.3177e-01, -6.1459e-02, -1.0905e-01, 1.7613e-02],\n",
" [-1.2282e-01, 2.1762e-01, -1.3553e-02, 2.7476e-01, 1.6703e-01],\n",
" [-5.6282e-02, 1.2731e-02, 1.0944e-01, -1.7347e-01, 4.4497e-02],\n",
" [ 5.7346e-02, -5.4657e-02, 4.8718e-02, -2.6221e-02, -2.6933e-02]]],\n",
"\n",
"\n",
" [[[ 6.7697e-02, 1.5692e-01, 2.7050e-01, 1.5936e-02, 1.7659e-01],\n",
" [-2.8899e-02, -1.4866e-01, 3.1838e-02, 1.0903e-01, 1.2292e-01],\n",
" [-1.3608e-01, -4.3198e-03, -9.8925e-02, -4.5599e-02, 1.3452e-01],\n",
" [-5.1435e-02, -2.3815e-01, -2.4151e-01, -4.8556e-02, 1.3825e-01],\n",
" [-1.2823e-01, 8.9324e-03, -1.5313e-01, -2.2933e-01, -3.4081e-02]]],\n",
"\n",
"\n",
" [[[-1.8396e-01, -6.8774e-03, -1.6675e-01, 7.1980e-03, 1.9922e-02],\n",
" [ 1.3416e-01, -1.1450e-01, -1.5277e-01, -6.5713e-02, -9.5435e-02],\n",
" [ 1.5406e-01, -9.1235e-02, -1.0880e-01, -7.1603e-02, -9.5575e-02],\n",
" [ 2.1772e-01, 8.4073e-02, -2.5264e-01, -2.1428e-01, 1.9537e-01],\n",
" [ 1.3124e-01, 7.9532e-02, -2.4044e-01, -1.5717e-01, 1.6562e-01]]],\n",
"\n",
"\n",
" [[[ 1.1849e-01, -5.0517e-03, -1.8900e-01, 1.8093e-02, 6.4660e-02],\n",
" [-1.5309e-01, -2.0106e-01, -8.6551e-02, 5.2692e-03, 1.5448e-01],\n",
" [-3.0727e-01, 4.9703e-02, -4.7637e-02, 2.9111e-01, -1.3173e-01],\n",
" [-8.5167e-02, -1.3540e-01, 2.9235e-01, 3.7895e-03, -9.4651e-02],\n",
" [-6.0694e-02, 9.6936e-02, 1.0533e-01, -6.1769e-02, -1.8086e-01]]]],\n",
" device='cuda:0')\n"
]
}
],
"source": [
"# show the weight of `conv1`.\n",
"\n",
"print(model.conv1.weight.data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Prepare config_list for pruning"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# we will prune 50% weights in `conv1`.\n",
"\n",
"config_list = [{\n",
" 'sparsity': 0.5,\n",
" 'op_types': ['Conv2d'],\n",
" 'op_names': ['conv1']\n",
"}]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. Choose a pruner and pruning"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# use l1filter pruner to prune the model\n",
"\n",
"from nni.algorithms.compression.pytorch.pruning import L1FilterPruner\n",
"\n",
"# Note that if you use a compressor that need you to pass a optimizer,\n",
"# you need a new optimizer instead of you have used above, because NNI might modify the optimizer.\n",
"# And of course this modified optimizer can not be used in finetuning.\n",
"pruner = L1FilterPruner(model, config_list)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"op_name: \n",
"op_type: <class '__main__.NaiveModel'>\n",
"\n",
"op_name: conv1\n",
"op_type: <class 'nni.compression.pytorch.compressor.PrunerModuleWrapper'>\n",
"\n",
"op_name: conv1.module\n",
"op_type: <class 'torch.nn.modules.conv.Conv2d'>\n",
"\n",
"op_name: conv2\n",
"op_type: <class 'torch.nn.modules.conv.Conv2d'>\n",
"\n",
"op_name: fc1\n",
"op_type: <class 'torch.nn.modules.linear.Linear'>\n",
"\n",
"op_name: fc2\n",
"op_type: <class 'torch.nn.modules.linear.Linear'>\n",
"\n",
"op_name: relu1\n",
"op_type: <class 'torch.nn.modules.activation.ReLU6'>\n",
"\n",
"op_name: relu2\n",
"op_type: <class 'torch.nn.modules.activation.ReLU6'>\n",
"\n",
"op_name: relu3\n",
"op_type: <class 'torch.nn.modules.activation.ReLU6'>\n",
"\n",
"op_name: max_pool1\n",
"op_type: <class 'torch.nn.modules.pooling.MaxPool2d'>\n",
"\n",
"op_name: max_pool2\n",
"op_type: <class 'torch.nn.modules.pooling.MaxPool2d'>\n",
"\n"
]
},
{
"data": {
"text/plain": [
"[None, None, None, None, None, None, None, None, None, None, None]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we can find the `conv1` has been wrapped, the origin `conv1` changes to `conv1.module`.\n",
"# the weight of conv1 will modify by `weight * mask` in `forward()`. The initial mask is a `ones_like(weight)` tensor.\n",
"\n",
"[print('op_name: {}\\nop_type: {}\\n'.format(name, type(module))) for name, module in model.named_modules()]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NaiveModel(\n",
" (conv1): PrunerModuleWrapper(\n",
" (module): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))\n",
" )\n",
" (conv2): Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1))\n",
" (fc1): Linear(in_features=800, out_features=500, bias=True)\n",
" (fc2): Linear(in_features=500, out_features=10, bias=True)\n",
" (relu1): ReLU6()\n",
" (relu2): ReLU6()\n",
" (relu3): ReLU6()\n",
" (max_pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (max_pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
")"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# compress the model, the mask will be updated.\n",
"\n",
"pruner.compress()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([20, 1, 5, 5])\n"
]
}
],
"source": [
"# show the mask size of `conv1`\n",
"\n",
"print(model.conv1.weight_mask.size())"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]]], device='cuda:0')\n"
]
}
],
"source": [
"# show the mask of `conv1`\n",
"\n",
"print(model.conv1.weight_mask)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[[[ 1.5338e-01, -1.1766e-01, -2.6654e-01, -2.9445e-02, -1.4650e-01],\n",
" [-1.8796e-01, -2.9882e-01, 6.9725e-02, 2.1561e-01, 6.5688e-02],\n",
" [ 1.5274e-01, -9.8471e-03, 3.2303e-01, 1.3472e-03, 1.7235e-01],\n",
" [ 1.1804e-01, 2.2535e-01, -8.3370e-02, -3.4553e-02, -1.2529e-01],\n",
" [-6.6012e-02, -2.0272e-02, -1.8797e-01, -4.6882e-02, -8.3206e-02]]],\n",
"\n",
"\n",
" [[[-0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [ 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 3.8332e-02, -1.4270e-01, -1.9585e-01, 2.2653e-01, 1.0104e-01],\n",
" [-2.7956e-03, -1.4108e-01, -1.4694e-01, -1.3525e-01, 2.6959e-01],\n",
" [ 1.9522e-01, -1.2281e-01, -1.9173e-01, -1.8910e-02, 3.1572e-03],\n",
" [-1.0580e-01, -2.5239e-02, -5.8266e-02, -6.5815e-02, 6.6433e-02],\n",
" [ 8.9601e-02, 7.1189e-02, -2.4255e-01, 1.5746e-01, -1.4708e-01]]],\n",
"\n",
"\n",
" [[[-0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00]]],\n",
"\n",
"\n",
" [[[-0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [ 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [ 0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [ 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 4.0772e-02, -8.2086e-02, -2.7555e-01, -3.2547e-01, -1.2226e-01],\n",
" [-5.9877e-02, 9.8567e-02, 2.5186e-01, -1.0280e-01, -2.3416e-01],\n",
" [ 8.5760e-02, 1.0896e-01, 1.4898e-01, 2.1579e-01, 8.5297e-02],\n",
" [ 5.4720e-02, -1.7226e-01, -7.2518e-02, 6.7099e-03, -1.6011e-03],\n",
" [-8.9944e-02, 1.7404e-01, -3.6985e-02, 1.8602e-01, 7.2353e-02]]],\n",
"\n",
"\n",
" [[[ 1.6276e-02, -9.6439e-02, -9.6085e-02, -2.4267e-01, -1.8521e-01],\n",
" [ 6.3310e-02, 1.7866e-01, 1.1694e-01, -1.4464e-01, -2.7711e-01],\n",
" [-2.4514e-02, 2.2222e-01, 2.1053e-01, -1.4271e-01, 8.7045e-02],\n",
" [-1.9207e-01, -5.4719e-02, -5.7775e-03, -1.0034e-05, -1.0923e-01],\n",
" [-2.4006e-02, 2.3780e-02, 1.8988e-01, 2.4734e-01, 4.8097e-02]]],\n",
"\n",
"\n",
" [[[ 1.1335e-01, -5.8451e-02, 5.2440e-02, -1.3223e-01, -2.5534e-02],\n",
" [ 9.1323e-02, -6.0707e-02, 2.3524e-01, 2.4992e-01, 8.7842e-02],\n",
" [ 2.9002e-02, 3.5379e-02, -5.9689e-02, -2.8363e-03, 1.8618e-01],\n",
" [-2.9671e-01, 8.1830e-03, 1.1076e-01, -5.4118e-02, -6.1685e-02],\n",
" [-1.7580e-01, -3.4534e-01, -3.9250e-01, -2.7569e-01, -2.6131e-01]]],\n",
"\n",
"\n",
" [[[ 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 6.3785e-02, -1.8044e-01, -1.0190e-01, -1.3588e-01, 8.5433e-02],\n",
" [ 2.0675e-01, 3.3238e-02, 9.2437e-02, 1.1799e-01, 2.1111e-01],\n",
" [-5.2138e-02, 1.5790e-01, 1.8151e-01, 8.0470e-02, 1.0131e-01],\n",
" [-4.4786e-02, 1.1771e-01, 2.1706e-02, -1.2563e-01, -2.1142e-01],\n",
" [-2.3589e-01, -2.1154e-01, -1.7890e-01, -2.7769e-01, -1.2512e-01]]],\n",
"\n",
"\n",
" [[[ 1.9133e-01, 2.4711e-01, 1.0413e-01, -1.9187e-01, -3.0991e-01],\n",
" [-1.2382e-01, 8.3641e-03, -5.6734e-02, 5.8376e-02, 2.2880e-02],\n",
" [-3.1734e-01, -1.0637e-02, -5.5974e-02, 1.0676e-01, -1.1080e-02],\n",
" [-2.2980e-01, 2.0486e-01, 1.0147e-01, 1.4484e-01, 5.2265e-02],\n",
" [ 7.4410e-02, 2.2806e-02, 8.5137e-02, -2.1809e-01, 3.1704e-02]]],\n",
"\n",
"\n",
" [[[-0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 7.6976e-02, -3.8627e-02, 1.2610e-01, 1.1994e-01, 2.1706e-03],\n",
" [ 7.4357e-02, 6.7929e-02, 3.1386e-02, 1.4606e-01, 2.1429e-01],\n",
" [-2.6569e-01, -4.2631e-04, -3.6654e-02, -3.0967e-02, -9.4961e-02],\n",
" [-2.0192e-01, -3.5423e-01, -2.5246e-01, -3.5092e-01, -2.4159e-01],\n",
" [ 1.7636e-02, 1.3744e-01, -1.0306e-01, 8.8370e-02, 7.3258e-02]]],\n",
"\n",
"\n",
" [[[ 2.0016e-01, 1.0956e-01, -5.9223e-02, 6.4871e-03, -2.4165e-01],\n",
" [ 5.6283e-02, 1.7276e-01, -2.2316e-01, -1.6699e-01, -7.0742e-02],\n",
" [ 2.6179e-01, -2.5102e-01, -2.0774e-01, -9.6413e-02, 3.4367e-02],\n",
" [-9.1882e-02, -2.9195e-01, -8.7432e-02, 1.0144e-01, -2.0559e-02],\n",
" [-2.5668e-01, -9.8016e-02, 1.1103e-01, -3.0233e-02, 1.1076e-01]]],\n",
"\n",
"\n",
" [[[ 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [ 0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00]]],\n",
"\n",
"\n",
" [[[-1.8396e-01, -6.8774e-03, -1.6675e-01, 7.1980e-03, 1.9922e-02],\n",
" [ 1.3416e-01, -1.1450e-01, -1.5277e-01, -6.5713e-02, -9.5435e-02],\n",
" [ 1.5406e-01, -9.1235e-02, -1.0880e-01, -7.1603e-02, -9.5575e-02],\n",
" [ 2.1772e-01, 8.4073e-02, -2.5264e-01, -2.1428e-01, 1.9537e-01],\n",
" [ 1.3124e-01, 7.9532e-02, -2.4044e-01, -1.5717e-01, 1.6562e-01]]],\n",
"\n",
"\n",
" [[[ 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00]]]],\n",
" device='cuda:0')\n"
]
}
],
"source": [
"# use a dummy input to apply the sparsify.\n",
"\n",
"model(torch.rand(1, 1, 28, 28).to(device))\n",
"\n",
"# the weights of `conv1` have been sparsified.\n",
"\n",
"print(model.conv1.module.weight.data)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2021-07-26 22:26:05] INFO (nni.compression.pytorch.compressor/MainThread) Model state_dict saved to pruned_naive_mnist_l1filter.pth\n",
"[2021-07-26 22:26:05] INFO (nni.compression.pytorch.compressor/MainThread) Mask dict saved to mask_naive_mnist_l1filter.pth\n"
]
}
],
"source": [
"# export the sparsified model state to './pruned_naive_mnist_l1filter.pth'.\n",
"# export the mask to './mask_naive_mnist_l1filter.pth'.\n",
"\n",
"pruner.export_model(model_path='pruned_naive_mnist_l1filter.pth', mask_path='mask_naive_mnist_l1filter.pth')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4. Speed Up"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NaiveModel(\n",
" (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))\n",
" (conv2): Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1))\n",
" (fc1): Linear(in_features=800, out_features=500, bias=True)\n",
" (fc2): Linear(in_features=500, out_features=10, bias=True)\n",
" (relu1): ReLU6()\n",
" (relu2): ReLU6()\n",
" (relu3): ReLU6()\n",
" (max_pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (max_pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
")\n"
]
}
],
"source": [
"# If you use a wrapped model, don't forget to unwrap it.\n",
"\n",
"pruner._unwrap_model()\n",
"\n",
"# the model has been unwrapped.\n",
"\n",
"print(model)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-1-0f2a9eb92f42>:22: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!\n",
" x = x.view(-1, x.size()[1:].numel())\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) start to speed up the model\n",
"[2021-07-26 22:26:18] INFO (FixMaskConflict/MainThread) {'conv1': 1, 'conv2': 1}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2021-07-26 22:26:18] INFO (FixMaskConflict/MainThread) dim0 sparsity: 0.500000\n",
"[2021-07-26 22:26:18] INFO (FixMaskConflict/MainThread) dim1 sparsity: 0.000000\n",
"[2021-07-26 22:26:18] INFO (FixMaskConflict/MainThread) Dectected conv prune dim\" 0\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) infer module masks...\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for conv1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for relu1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for max_pool1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for conv2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for relu2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for max_pool2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for .aten::view.9\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.jit_translate/MainThread) View Module output size: [-1, 800]\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for fc1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for relu3\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for fc2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for .aten::log_softmax.10\n",
"[2021-07-26 22:26:18] ERROR (nni.compression.pytorch.speedup.jit_translate/MainThread) aten::log_softmax is not Supported! Please report an issue at https://github.com/microsoft/nni. Thanks~\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for .aten::log_softmax.10\n",
"[2021-07-26 22:26:18] WARNING (nni.compression.pytorch.speedup.compressor/MainThread) Note: .aten::log_softmax.10 does not have corresponding mask inference object\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for fc2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the fc2\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for relu3\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the relu3\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for fc1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the fc1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for .aten::view.9\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the .aten::view.9\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for max_pool2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the max_pool2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for relu2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the relu2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for conv2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the conv2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for max_pool1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the max_pool1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for relu1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the relu1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for conv1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the conv1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) resolve the mask conflict\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace compressed modules...\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: conv1, op_type: Conv2d)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: relu1, op_type: ReLU6)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: max_pool1, op_type: MaxPool2d)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: conv2, op_type: Conv2d)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: relu2, op_type: ReLU6)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: max_pool2, op_type: MaxPool2d)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Warning: cannot replace (name: .aten::view.9, op_type: aten::view) which is func type\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: fc1, op_type: Linear)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compress_modules/MainThread) replace linear with new in_features: 800, out_features: 500\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: relu3, op_type: ReLU6)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: fc2, op_type: Linear)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compress_modules/MainThread) replace linear with new in_features: 500, out_features: 10\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Warning: cannot replace (name: .aten::log_softmax.10, op_type: aten::log_softmax) which is func type\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) speedup done\n"
]
}
],
"source": [
"from nni.compression.pytorch import ModelSpeedup\n",
"\n",
"m_speedup = ModelSpeedup(model, dummy_input=torch.rand(10, 1, 28, 28).to(device), masks_file='mask_naive_mnist_l1filter.pth')\n",
"m_speedup.speedup_model()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NaiveModel(\n",
" (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))\n",
" (conv2): Conv2d(10, 50, kernel_size=(5, 5), stride=(1, 1))\n",
" (fc1): Linear(in_features=800, out_features=500, bias=True)\n",
" (fc2): Linear(in_features=500, out_features=10, bias=True)\n",
" (relu1): ReLU6()\n",
" (relu2): ReLU6()\n",
" (relu3): ReLU6()\n",
" (max_pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (max_pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
")\n"
]
}
],
"source": [
"# the `conv1` has been replace from `Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))` to `Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))`\n",
"# and the following layer `conv2` has also changed because the input channel of `conv2` should aware the output channel of `conv1`.\n",
"\n",
"print(model)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train Epoch: 0 [0/60000 (0%)]\tLoss: 0.306930\n",
"Train Epoch: 0 [6400/60000 (11%)]\tLoss: 0.045807\n",
"Train Epoch: 0 [12800/60000 (21%)]\tLoss: 0.049293\n",
"Train Epoch: 0 [19200/60000 (32%)]\tLoss: 0.031464\n",
"Train Epoch: 0 [25600/60000 (43%)]\tLoss: 0.005392\n",
"Train Epoch: 0 [32000/60000 (53%)]\tLoss: 0.005652\n",
"Train Epoch: 0 [38400/60000 (64%)]\tLoss: 0.040619\n",
"Train Epoch: 0 [44800/60000 (75%)]\tLoss: 0.016515\n",
"Train Epoch: 0 [51200/60000 (85%)]\tLoss: 0.092886\n",
"Train Epoch: 0 [57600/60000 (96%)]\tLoss: 0.041380\n",
"\n",
"Test set: Average loss: 0.0257, Accuracy: 9917/10000 (99%)\n",
"\n"
]
}
],
"source": [
"# finetune the model to recover the accuracy.\n",
"\n",
"optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n",
"\n",
"for epoch in range(0, 1):\n",
" trainer(model, optimizer, criterion, epoch)\n",
" evaluator(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5. Prepare config_list for quantization"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"config_list = [{\n",
" 'quant_types': ['weight', 'input'],\n",
" 'quant_bits': {'weight': 8, 'input': 8},\n",
" 'op_names': ['conv1', 'conv2']\n",
"}]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 6. Choose a quantizer and quantizing"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NaiveModel(\n",
" (conv1): QuantizerModuleWrapper(\n",
" (module): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))\n",
" )\n",
" (conv2): QuantizerModuleWrapper(\n",
" (module): Conv2d(10, 50, kernel_size=(5, 5), stride=(1, 1))\n",
" )\n",
" (fc1): Linear(in_features=800, out_features=500, bias=True)\n",
" (fc2): Linear(in_features=500, out_features=10, bias=True)\n",
" (relu1): ReLU6()\n",
" (relu2): ReLU6()\n",
" (relu3): ReLU6()\n",
" (max_pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (max_pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
")"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer\n",
"\n",
"quantizer = QAT_Quantizer(model, config_list, optimizer)\n",
"quantizer.compress()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train Epoch: 0 [0/60000 (0%)]\tLoss: 0.004960\n",
"Train Epoch: 0 [6400/60000 (11%)]\tLoss: 0.036269\n",
"Train Epoch: 0 [12800/60000 (21%)]\tLoss: 0.018744\n",
"Train Epoch: 0 [19200/60000 (32%)]\tLoss: 0.021916\n",
"Train Epoch: 0 [25600/60000 (43%)]\tLoss: 0.003095\n",
"Train Epoch: 0 [32000/60000 (53%)]\tLoss: 0.003947\n",
"Train Epoch: 0 [38400/60000 (64%)]\tLoss: 0.032094\n",
"Train Epoch: 0 [44800/60000 (75%)]\tLoss: 0.017358\n",
"Train Epoch: 0 [51200/60000 (85%)]\tLoss: 0.083886\n",
"Train Epoch: 0 [57600/60000 (96%)]\tLoss: 0.040433\n",
"\n",
"Test set: Average loss: 0.0247, Accuracy: 9917/10000 (99%)\n",
"\n"
]
}
],
"source": [
"# finetune the model for calibration.\n",
"\n",
"for epoch in range(0, 1):\n",
" trainer(model, optimizer, criterion, epoch)\n",
" evaluator(model)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2021-07-26 22:34:41] INFO (nni.compression.pytorch.compressor/MainThread) Model state_dict saved to quantized_naive_mnist_l1filter.pth\n",
"[2021-07-26 22:34:41] INFO (nni.compression.pytorch.compressor/MainThread) Mask dict saved to calibration_naive_mnist_l1filter.pth\n"
]
},
{
"data": {
"text/plain": [
"{'conv1': {'weight_bit': 8,\n",
" 'tracked_min_input': -0.42417848110198975,\n",
" 'tracked_max_input': 2.8212687969207764},\n",
" 'conv2': {'weight_bit': 8,\n",
" 'tracked_min_input': 0.0,\n",
" 'tracked_max_input': 4.246923446655273}}"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# export the sparsified model state to './quantized_naive_mnist_l1filter.pth'.\n",
"# export the calibration config to './calibration_naive_mnist_l1filter.pth'.\n",
"\n",
"quantizer.export_model(model_path='quantized_naive_mnist_l1filter.pth', calibration_path='calibration_naive_mnist_l1filter.pth')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 7. Speed Up"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# speed up with tensorRT\n",
"\n",
"engine = ModelSpeedupTensorRT(model, (32, 1, 28, 28), config=calibration_config, batchsize=32)\n",
"engine.compress()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
#################
Pruning
#################
Pruning is a common technique to compress neural network models.
Pruning methods explore the redundancy in the model weights (parameters) and try to remove the redundant and uncritical weights.
The redundant elements are pruned from the model: their values are zeroed and they are excluded from the back-propagation process.
From the perspective of pruning granularity, fine-grained pruning (unstructured pruning) refers to pruning each individual weight separately.
Coarse-grained pruning (structured pruning) prunes entire groups of weights, such as a convolutional filter.
NNI provides multiple unstructured and structured pruning algorithms.
It supports TensorFlow and PyTorch with a unified interface.
Users only need to add a few lines of code to prune their models; a minimal sketch is shown below.
For structured filter pruning, NNI also provides a dependency-aware mode, in which the
filter pruner achieves a better speed gain after speedup.
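For example, pruning 80% of the weights in a model with the level pruner takes only a few lines (a minimal sketch; ``model`` is assumed to be an existing PyTorch model):
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import LevelPruner
# prune 80% of the weights in all supported layers
config_list = [{'sparsity': 0.8, 'op_types': ['default']}]
pruner = LevelPruner(model, config_list)
pruner.compress()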
For details, please refer to the following tutorials:
.. toctree::
:maxdepth: 2
Pruners <Pruner>
Dependency Aware Mode <DependencyAware>
Model Speedup <ModelSpeedup>
.. 0f2050a973cfb2207984b4e58c4baf28
#################
Pruning
#################
Pruning is a common technique to compress neural network models.
Pruning algorithms explore the redundancy in the model weights (parameters) and try to remove the redundant and uncritical weights,
zeroing their values and making sure they do not take part in the back-propagation process.
From the perspective of pruning granularity, fine-grained pruning (unstructured pruning) refers to pruning each weight separately.
Coarse-grained pruning (structured pruning) prunes entire groups of weights, such as a convolutional filter.
NNI provides multiple unstructured and structured pruning algorithms.
It supports TensorFlow and PyTorch with a unified interface.
Only a few lines of code are needed to compress a model.
For structured filter pruning, NNI also provides a dependency-aware mode. In the dependency-aware mode,
the filter pruner achieves a better speed gain after speedup.
For details, please refer to the following tutorials:
.. toctree::
:maxdepth: 2
Pruners <Pruner>
Dependency Aware Mode <DependencyAware>
Model Speedup <ModelSpeedup>
.. fe32a6de0be31a992afadba5cf6ffe23
#################
Quantization
#################
Quantization refers to compressing a model by reducing the number of bits required to represent weights or activations,
which reduces the computation and the inference time. In the context of deep neural networks, the dominant data
format for model weights is 32-bit float. Many research works have shown that weights and activations
can be represented with 8-bit integers without a significant drop in accuracy. Whether even lower bit widths, such as 4/2/1 bits,
can represent the weights is still a very active research direction.
A quantizer is a quantization algorithm implemented in NNI. NNI provides multiple quantizers as listed below. You can also
create your own quantizer using the NNI model compression interface; a minimal usage sketch is shown below.
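Quantizing the weights and inputs of all ``Conv2d`` layers to 8 bits with the ``QAT_Quantizer`` takes only a few lines (a minimal sketch; ``model`` and ``optimizer`` are assumed to be an existing PyTorch module and its optimizer):
.. code-block:: python
from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer
config_list = [{'quant_types': ['weight', 'input'], 'quant_bits': {'weight': 8, 'input': 8}, 'op_types': ['Conv2d']}]
quantizer = QAT_Quantizer(model, config_list, optimizer)
quantizer.compress()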
.. toctree::
:maxdepth: 2
Quantizers <Quantizer>
Quantization Speedup <QuantizationSpeedup>
Pruning V2
==========
Pruning V2 is a refactoring of the old version and provides more powerful functionality.
Compared with the old version, the iterative pruning process is detached from the pruner, and the pruner is only responsible for pruning and generating the masks once.
What's more, pruning V2 unifies the pruning process and allows a freer combination of pruning components.
The task generator only cares about the pruning effect that should be achieved in each round, and uses a config list to express how to prune in the next step.
The pruner is reset with the model and config list given by the task generator, and then generates the masks for the current step.
For a clearer view of the structure, please refer to the figure below.
.. image:: ../../img/pruning_process.png
:target: ../../img/pruning_process.png
:alt:
In V2, a pruning process is usually driven by a pruning scheduler, which contains a specific pruner and a task generator.
Users can also use a pruner directly, as in pruning V1. A minimal sketch of an iterative pruner, which wires a pruner and a task generator together internally, is shown below.
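This sketch assumes ``model`` and ``finetuner`` are an existing PyTorch model and a user-defined finetuning function; ``LinearPruner`` applies the ``'l1'`` one-shot pruning algorithm in each iteration:
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import LinearPruner
config_list = [{'sparsity': 0.8, 'op_types': ['Conv2d']}]
pruner = LinearPruner(model, config_list, pruning_algorithm='l1', total_iteration=10, finetuner=finetuner)
pruner.compress()
# retrieve the best result found during the iterative process
_, pruned_model, masks, _, _ = pruner.get_best_result()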
For details, please refer to the following tutorials:
.. toctree::
:maxdepth: 2
Pruning Algorithms <v2_pruning_algo>
Pruning Scheduler <v2_scheduler>
Pruning Config List <v2_pruning_config_list>
Supported Pruning Algorithms in NNI
===================================
NNI provides several pruning algorithms reproduced from the papers. In pruning V2, NNI splits a pruning algorithm into more fine-grained components.
This means users can freely combine components from different algorithms,
or easily replace a step in an original algorithm with a component of their own implementation to build their own pruning algorithm.
Right now, algorithms that describe how to generate masks in one step are implemented as pruners,
and algorithms that describe how to schedule sparsity across iterations are implemented as iterative pruners.
**Pruner**
* `Level Pruner <#level-pruner>`__
* `L1 Norm Pruner <#l1-norm-pruner>`__
* `L2 Norm Pruner <#l2-norm-pruner>`__
* `FPGM Pruner <#fpgm-pruner>`__
* `Slim Pruner <#slim-pruner>`__
* `Activation APoZ Rank Pruner <#activation-apoz-rank-pruner>`__
* `Activation Mean Rank Pruner <#activation-mean-rank-pruner>`__
* `Taylor FO Weight Pruner <#taylor-fo-weight-pruner>`__
* `ADMM Pruner <#admm-pruner>`__
* `Movement Pruner <#movement-pruner>`__
**Iterative Pruner**
* `Linear Pruner <#linear-pruner>`__
* `AGP Pruner <#agp-pruner>`__
* `Lottery Ticket Pruner <#lottery-ticket-pruner>`__
* `Simulated Annealing Pruner <#simulated-annealing-pruner>`__
* `Auto Compress Pruner <#auto-compress-pruner>`__
* `AMC Pruner <#amc-pruner>`__
Level Pruner
------------
This is a basic pruner, called magnitude pruning or fine-grained pruning in some papers.
In each specified layer, it masks the weights with smaller absolute values, according to the sparsity ratio configured in the config list.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
pruner = LevelPruner(model, config_list)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/level_pruning_torch.py <examples/model_compress/pruning/v2/level_pruning_torch.py>`
User configuration for Level Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.LevelPruner
L1 Norm Pruner
--------------
L1 norm pruner computes the L1 norm of the layer weight on the first dimension,
then prunes the weight blocks on this dimension with the smaller L1 norm values.
That is, it computes the L1 norm of the filters in a convolution layer as the metric values,
and the L1 norm of the weight rows in a linear layer as the metric values.
For more details, please refer to `PRUNING FILTERS FOR EFFICIENT CONVNETS <https://arxiv.org/abs/1608.08710>`__\.
In addition, L1 norm pruner also supports dependency-aware mode.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import L1NormPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L1NormPruner(model, config_list)
masked_model, masks = pruner.compress()
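The dependency-aware mode can be enabled at construction time; the sketch below assumes the ``mode='dependency_aware'`` and ``dummy_input`` arguments described in the dependency-aware mode tutorial:
.. code-block:: python
import torch
from nni.algorithms.compression.v2.pytorch.pruning import L1NormPruner
config_list = [{'sparsity': 0.8, 'op_types': ['Conv2d']}]
# a dummy input is needed to trace the channel dependencies between layers
pruner = L1NormPruner(model, config_list, mode='dependency_aware', dummy_input=torch.rand(8, 3, 224, 224))
masked_model, masks = pruner.compress()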
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/norm_pruning_torch.py <examples/model_compress/pruning/v2/norm_pruning_torch.py>`
User configuration for L1 Norm Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.L1NormPruner
L2 Norm Pruner
--------------
L2 norm pruner is a variant of L1 norm pruner. It uses the L2 norm as the metric to determine which weight blocks should be pruned.
L2 norm pruner also supports dependency-aware mode.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import L2NormPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L2NormPruner(model, config_list)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/norm_pruning_torch.py <examples/model_compress/pruning/v2/norm_pruning_torch.py>`
User configuration for L2 Norm Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.L2NormPruner
FPGM Pruner
-----------
FPGM pruner prunes the weight blocks on the first dimension that are closest to the geometric median of all blocks.
FPGM thus removes the weight blocks whose contribution is most replaceable by the other blocks.
For more details, please refer to `Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration <https://arxiv.org/abs/1811.00250>`__.
FPGM pruner also supports dependency-aware mode.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import FPGMPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = FPGMPruner(model, config_list)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/fpgm_pruning_torch.py <examples/model_compress/pruning/v2/fpgm_pruning_torch.py>`
User configuration for FPGM Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.FPGMPruner
Slim Pruner
-----------
Slim pruner adds sparsity regularization on the scaling factors of batch normalization (BN) layers during training to identify unimportant channels.
The channels with small scaling factor values will be pruned.
For more details, please refer to `Learning Efficient Convolutional Networks through Network Slimming <https://arxiv.org/abs/1708.06519>`__\.
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import SlimPruner
# make sure you have used nni.trace to wrap the optimizer class before initialize
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{ 'sparsity': 0.8, 'op_types': ['BatchNorm2d'] }]
pruner = SlimPruner(model, config_list, trainer, traced_optimizer, criterion, training_epochs=1)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/slim_pruning_torch.py <examples/model_compress/pruning/v2/slim_pruning_torch.py>`
User configuration for Slim Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.SlimPruner
Activation APoZ Rank Pruner
---------------------------
Activation APoZ rank pruner prunes on the first weight dimension,
using the importance criterion ``APoZ`` (Average Percentage of Zeros), calculated from the output activations of convolution layers, to achieve a preset level of network sparsity; weight blocks whose activations have a higher APoZ (i.e., are zero more often) are considered less important and are pruned first.
The pruning criterion ``APoZ`` is explained in the paper `Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures <https://arxiv.org/abs/1607.03250>`__.
The APoZ is defined as:
:math:`APoZ_{c}^{(i)} = APoZ\left(O_{c}^{(i)}\right)=\frac{\sum_{k}^{N} \sum_{j}^{M} f\left(O_{c, j}^{(i)}(k)=0\right)}{N \times M}`
Activation APoZ rank pruner also supports dependency-aware mode.
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import ActivationAPoZRankPruner
# make sure you have used nni.trace to wrap the optimizer class before initialize
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = ActivationAPoZRankPruner(model, config_list, trainer, traced_optimizer, criterion, training_batches=20)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/activation_pruning_torch.py <examples/model_compress/pruning/v2/activation_pruning_torch.py>`
User configuration for Activation APoZ Rank Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.ActivationAPoZRankPruner
Activation Mean Rank Pruner
---------------------------
Activation mean rank pruner prunes on the first weight dimension,
using the importance criterion ``mean activation``, calculated from the output activations of convolution layers, to achieve a preset level of network sparsity; weight blocks with a smaller mean activation are pruned first.
The pruning criterion ``mean activation`` is explained in section 2.2 of the paper `Pruning Convolutional Neural Networks for Resource Efficient Inference <https://arxiv.org/abs/1611.06440>`__.
Activation mean rank pruner also supports dependency-aware mode.
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import ActivationMeanRankPruner
# make sure you have used nni.trace to wrap the optimizer class before initialize
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = ActivationMeanRankPruner(model, config_list, trainer, traced_optimizer, criterion, training_batches=20)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/activation_pruning_torch.py <examples/model_compress/pruning/v2/activation_pruning_torch.py>`
User configuration for Activation Mean Rank Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.ActivationMeanRankPruner
Taylor FO Weight Pruner
-----------------------
Taylor FO weight pruner prunes on the first weight dimension,
based on an estimated importance calculated from the first-order Taylor expansion on the weights, to achieve a preset level of network sparsity.
The estimated importance is defined in the paper `Importance Estimation for Neural Network Pruning <http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf>`__.
:math:`\widehat{\mathcal{I}}_{\mathcal{S}}^{(1)}(\mathbf{W}) \triangleq \sum_{s \in \mathcal{S}} \mathcal{I}_{s}^{(1)}(\mathbf{W})=\sum_{s \in \mathcal{S}}\left(g_{s} w_{s}\right)^{2}`
Taylor FO weight pruner also supports dependency-aware mode.
We also provide a global-sort mode for this pruner, which is aligned with the implementation in the paper.
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import TaylorFOWeightPruner
# make sure you have used nni.trace to wrap the optimizer class before initialize
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = TaylorFOWeightPruner(model, config_list, trainer, traced_optimizer, criterion, training_batches=20)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/taylorfo_pruning_torch.py <examples/model_compress/pruning/v2/taylorfo_pruning_torch.py>`
User configuration for Taylor FO Weight Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.TaylorFOWeightPruner
ADMM Pruner
-----------
Alternating Direction Method of Multipliers (ADMM) is a mathematical optimization technique that
decomposes the original nonconvex problem into two subproblems that can be solved iteratively.
In the weight pruning problem, these two subproblems are solved via 1) a gradient descent algorithm and 2) a Euclidean projection, respectively.
During the process of solving these two subproblems, the weights of the original model will be changed.
Then a fine-grained pruning is applied to prune the model according to the given config list.
This solution framework applies to both unstructured pruning and different variants of structured pruning schemes.
For more details, please refer to `A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers <https://arxiv.org/abs/1804.03294>`__.
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import ADMMPruner
# make sure you have used nni.trace to wrap the optimizer class before initialize
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = ADMMPruner(model, config_list, trainer, traced_optimizer, criterion, iterations=10, training_epochs=1)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/admm_pruning_torch.py <examples/model_compress/pruning/v2/admm_pruning_torch.py>`
User configuration for ADMM Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.ADMMPruner
Movement Pruner
---------------
Movement pruner is an implementation of movement pruning.
This is a "fine-pruning" algorithm, which means the masks may change during each fine-tuning step.
Each weight element is scored by the negative of the sum, accumulated over the training steps, of the product of the weight and its gradient.
This means that weight elements moving towards zero accumulate negative scores, while weight elements moving away from zero accumulate positive scores.
The weight elements with low scores will be masked during inference.
The following figure from the paper shows the weight pruning by movement pruning.
.. image:: ../../img/movement_pruning.png
:target: ../../img/movement_pruning.png
:alt:
For more details, please refer to `Movement Pruning: Adaptive Sparsity by Fine-Tuning <https://arxiv.org/abs/2005.07683>`__.
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import MovementPruner
# make sure you have used nni.trace to wrap the optimizer class before initialize
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{'op_types': ['Linear'], 'op_partial_names': ['bert.encoder'], 'sparsity': 0.9}]
pruner = MovementPruner(model, config_list, trainer, traced_optimizer, criterion, 10, 3000, 27000)
masked_model, masks = pruner.compress()
User configuration for Movement Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.MovementPruner
Reproduced Experiment
^^^^^^^^^^^^^^^^^^^^^
.. list-table::
:header-rows: 1
:widths: auto
* - Model
- Dataset
- Remaining Weights
- MaP acc.(paper/ours)
- MvP acc.(paper/ours)
* - Bert base
- MNLI - Dev
- 10%
- 77.8% / 73.6%
- 79.3% / 78.8%
Linear Pruner
-------------
Linear pruner is an iterative pruner; it increases the sparsity evenly from zero during the iterations.
For example, if the final sparsity is set to 0.5 and the iteration number is 5, the sparsity used in each iteration is ``[0, 0.1, 0.2, 0.3, 0.4, 0.5]``.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import LinearPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = LinearPruner(model, config_list, pruning_algorithm='l1', total_iteration=10, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/iterative_pruning_torch.py <examples/model_compress/pruning/v2/iterative_pruning_torch.py>`
User configuration for Linear Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.LinearPruner
AGP Pruner
----------
This is an iterative pruner, in which the sparsity is increased from an initial sparsity value :math:`s_{i}` (usually 0) to a final sparsity value :math:`s_{f}` over a span of :math:`n` pruning iterations,
starting at training step :math:`t_{0}` and with pruning frequency :math:`\Delta t`:
:math:`s_{t}=s_{f}+\left(s_{i}-s_{f}\right)\left(1-\frac{t-t_{0}}{n \Delta t}\right)^{3} \text { for } t \in\left\{t_{0}, t_{0}+\Delta t, \ldots, t_{0} + n \Delta t\right\}`
For more details please refer to `To prune, or not to prune: exploring the efficacy of pruning for model compression <https://arxiv.org/abs/1710.01878>`__\.
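As a quick sanity check of the schedule, the sparsity values can be computed directly (a standalone sketch with :math:`t_{0} = 0` and :math:`\Delta t = 1`, not NNI library code):
.. code-block:: python
# standalone sketch: AGP sparsity schedule for s_i = 0, s_f = 0.8, n = 10 pruning iterations
s_i, s_f, n = 0.0, 0.8, 10
schedule = [s_f + (s_i - s_f) * (1 - t / n) ** 3 for t in range(n + 1)]
print([round(s, 3) for s in schedule])  # starts at 0.0, increases quickly, then flattens out towards 0.8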
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import AGPPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = AGPPruner(model, config_list, pruning_algorithm='l1', total_iteration=10, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/iterative_pruning_torch.py <examples/model_compress/pruning/v2/iterative_pruning_torch.py>`
User configuration for AGP Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.AGPPruner
Lottery Ticket Pruner
---------------------
`The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks <https://arxiv.org/abs/1803.03635>`__\ ,
by Jonathan Frankle and Michael Carbin, provides comprehensive measurement and analysis,
and articulates the *lottery ticket hypothesis*\ : dense, randomly-initialized, feed-forward networks contain subnetworks (*winning tickets*\ ) that
-- when trained in isolation -- reach test accuracy comparable to the original network in a similar number of iterations.
In this paper, the authors use the following process to prune a model, called *iterative pruning*\ :
..
#. Randomly initialize a neural network :math:`f(x; \theta_0)` (where :math:`\theta_0 \sim \mathcal{D}_{\theta}`).
#. Train the network for :math:`j` iterations, arriving at parameters :math:`\theta_j`.
#. Prune :math:`p\%` of the parameters in :math:`\theta_j`, creating a mask :math:`m`.
#. Reset the remaining parameters to their values in :math:`\theta_0`, creating the winning ticket :math:`f(x; m \odot \theta_0)`.
#. Repeat steps 2, 3, and 4.
If the configured final sparsity is :math:`P` (e.g., 0.8) and there are :math:`n` pruning iterations,
each iteration prunes :math:`1-(1-P)^{1/n}` of the weights that survive the previous round; a short arithmetic sketch is shown below.
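For instance, with :math:`P = 0.8` and :math:`n = 5` (a standalone arithmetic sketch, not NNI code):
.. code-block:: python
# standalone sketch: per-round prune ratio for final sparsity P = 0.8 over n = 5 rounds
P, n = 0.8, 5
per_round = 1 - (1 - P) ** (1 / n)
print(round(per_round, 4))             # ~0.2752, i.e. each round prunes about 27.5% of the surviving weights
print(round((1 - per_round) ** n, 4))  # ~0.2, i.e. about 20% of the weights remain after 5 rounds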
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import LotteryTicketPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = LotteryTicketPruner(model, config_list, pruning_algorithm='l1', total_iteration=10, finetuner=finetuner, reset_weight=True)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/iterative_pruning_torch.py <examples/model_compress/pruning/v2/iterative_pruning_torch.py>`
User configuration for Lottery Ticket Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.LotteryTicketPruner
Simulated Annealing Pruner
--------------------------
We implement a guided heuristic search method, the Simulated Annealing (SA) algorithm. As mentioned in the paper, this method enhances the guided search based on prior experience.
The enhanced SA technique is based on the observation that a DNN layer with a larger number of weights often tolerates a higher degree of compression with less impact on the overall accuracy.
* Randomly initialize a pruning rate distribution (sparsities).
* While current_temperature < stop_temperature:
#. generate a perturbation to the current distribution
#. perform a fast evaluation on the perturbed distribution
#. accept the perturbation according to the performance and probability; if not accepted, return to step 1
#. cool down: current_temperature <- current_temperature * cool_down_rate
For more details, please refer to `AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates <https://arxiv.org/abs/1907.03141>`__.
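Below is a minimal, self-contained sketch of this annealing loop. It is illustrative only: ``evaluate`` and ``perturb`` are hypothetical placeholders rather than NNI APIs, and the temperature settings are assumed values.

.. code-block:: python

   import math
   import random

   def evaluate(sparsities):
       # Placeholder for a fast evaluation of a model pruned with these layer sparsities.
       return -sum((s - 0.5) ** 2 for s in sparsities)

   def perturb(sparsities, scale=0.05):
       # Randomly perturb each layer's pruning rate, keeping it within a valid range.
       return [min(max(s + random.uniform(-scale, scale), 0.0), 0.95) for s in sparsities]

   current = [random.uniform(0.1, 0.9) for _ in range(5)]  # random initial pruning rates
   current_score = evaluate(current)
   current_temperature, stop_temperature, cool_down_rate = 100.0, 20.0, 0.9

   while current_temperature > stop_temperature:
       candidate = perturb(current)
       score = evaluate(candidate)
       # Accept improvements; accept worse candidates with a temperature-dependent probability.
       if score > current_score or random.random() < math.exp((score - current_score) / current_temperature):
           current, current_score = candidate, score
       current_temperature *= cool_down_rate

   print('searched sparsities:', [round(s, 3) for s in current])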
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import SimulatedAnnealingPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = SimulatedAnnealingPruner(model, config_list, pruning_algorithm='l1', evaluator=evaluator, cool_down_rate=0.9, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
For a detailed example, please refer to :githublink:`examples/model_compress/pruning/v2/simulated_anealing_pruning_torch.py <examples/model_compress/pruning/v2/simulated_anealing_pruning_torch.py>`.
User configuration for Simulated Annealing Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.SimulatedAnnealingPruner
Auto Compress Pruner
--------------------
For a total iteration number :math:`N`, AutoCompressPruner prunes the model that survives the previous iteration by a fixed sparsity ratio (e.g., :math:`1-{(1-0.8)}^{(1/N)}`) in each iteration to achieve the overall sparsity (e.g., :math:`0.8`):
.. code-block:: bash
1. Generate sparsities distribution using SimulatedAnnealingPruner
2. Perform ADMM-based pruning to generate pruning result for the next iteration.
For more details, please refer to `AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates <https://arxiv.org/abs/1907.03141>`__.
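Conceptually, the outer loop can be sketched as below. Both helper functions are hypothetical placeholders used only for illustration; they are not NNI functions.

.. code-block:: python

   # Conceptual sketch of the AutoCompress outer loop; helpers are hypothetical stand-ins.
   def run_simulated_annealing(model, target_sparsity):
       # Placeholder: would search a per-layer sparsity distribution for this iteration.
       return {'layer1': target_sparsity, 'layer2': target_sparsity}

   def run_admm_pruning(model, sparsities):
       # Placeholder: would perform ADMM-based pruning with the searched distribution.
       return model

   model = object()  # stand-in for a real torch.nn.Module
   N, overall_sparsity = 10, 0.8
   per_iteration_ratio = 1 - (1 - overall_sparsity) ** (1 / N)

   for _ in range(N):
       sparsities = run_simulated_annealing(model, per_iteration_ratio)  # step 1
       model = run_admm_pruning(model, sparsities)                        # step 2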
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import AutoCompressPruner
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
admm_params = {
    'trainer': trainer,
    'traced_optimizer': traced_optimizer,
    'criterion': criterion,
    'iterations': 10,
    'training_epochs': 1
}
sa_params = {
    'evaluator': evaluator
}
pruner = AutoCompressPruner(model, config_list, 10, admm_params, sa_params, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
The full script can be found :githublink:`here <examples/model_compress/pruning/v2/auto_compress_pruner.py>`.
User configuration for Auto Compress Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.AutoCompressPruner
AMC Pruner
----------
AMC pruner leverages reinforcement learning to provide the model compression policy.
According to the authors, this learning-based compression policy outperforms conventional rule-based compression policies by achieving a higher compression ratio,
better preserving accuracy, and reducing manual effort.
For more details, please refer to `AMC: AutoML for Model Compression and Acceleration on Mobile Devices <https://arxiv.org/pdf/1802.03494.pdf>`__.
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import AMCPruner
config_list = [{'op_types': ['Conv2d'], 'total_sparsity': 0.5, 'max_sparsity_per_layer': 0.8}]
pruner = AMCPruner(400, model, config_list, dummy_input, evaluator, finetuner=finetuner)
pruner.compress()
The full script can be found :githublink:`here <examples/model_compress/pruning/v2/amc_pruning_torch.py>`.
User configuration for AMC Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.AMCPruner
.. 1ec93e31648291b0c881655304116b50
Pruning (V2)
============

Pruning V2 is a refactoring of the old version and provides more powerful features.
Compared with the old version, the iterative pruning process is decoupled from the pruner: the pruner is only responsible for pruning and generates masks once.
More importantly, V2 unifies the pruning process and allows pruning components to be combined more freely.

The task generator only cares about the pruning effect that should be reached in each round, and uses a config list to describe how to prune in the next step.
The pruner is reset with the model and config list provided by the task generator, and then generates the masks for the current step.
For a clearer view of the architecture, please refer to the figure below.
.. image:: ../../img/pruning_process.png
   :target: ../../img/pruning_process.png
   :alt:
In V2, the pruning process is usually driven by a pruning scheduler, which contains a specific pruner and a task generator.
However, users can also use the pruner directly, as in V1.
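To make this division of labor concrete, here is a purely conceptual sketch of how a scheduler could coordinate a pruner and a task generator. All class and method names below are hypothetical and do not correspond to the NNI API.

.. code-block:: python

   # Conceptual sketch of the scheduler / pruner / task-generator interaction.
   # Every name here is hypothetical, not the NNI API.
   class TaskGenerator:
       def next_config_list(self, round_idx):
           # Decide the pruning target for the next round as a config list.
           return [{'sparsity': 0.1 * (round_idx + 1), 'op_types': ['Conv2d']}]

   class Pruner:
       def reset(self, model, config_list):
           # The scheduler resets the pruner with this round's model and config list.
           self.model, self.config_list = model, config_list

       def generate_masks(self):
           # One-shot mask generation for the current round.
           return {cfg['op_types'][0]: cfg['sparsity'] for cfg in self.config_list}

   def schedule(model, pruner, task_generator, total_rounds):
       for round_idx in range(total_rounds):
           config_list = task_generator.next_config_list(round_idx)
           pruner.reset(model, config_list)
           masks = pruner.generate_masks()
           # A real scheduler would apply the masks, optionally fine-tune,
           # and feed the result back to the task generator.
           print(f'round {round_idx}: masks = {masks}')
       return model

   schedule(object(), Pruner(), TaskGenerator(), total_rounds=3)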
For more details, please refer to the following tutorials:

.. toctree::
   :maxdepth: 1

   Pruning Algorithms <v2_pruning_algo>
   Pruning Scheduler Interface <v2_scheduler>
   Pruning Configuration <v2_pruning_config_list>