Unverified commit 19eabd69 authored by chicm-ms, committed by GitHub

Refactor model compression documentation (#2612)

parent 0b9d6ce6
# Customize New Compression Algorithm
```eval_rst
.. contents::
```
In order to simplify the process of writing new compression algorithms, we have designed a simple and flexible programming interface that covers both pruning and quantization. Below, we first demonstrate how to customize a new pruning algorithm and then demonstrate how to customize a new quantization algorithm.
**Important Note**: To better understand how to customize new pruning/quantization algorithms, users should first understand the framework that supports various pruning algorithms in NNI. See [Framework overview of model compression](https://nni.readthedocs.io/en/latest/Compressor/Framework.html).
## Customize a new pruning algorithm
Implementing a new pruning algorithm requires implementing a `weight masker` class, which should be a subclass of `WeightMasker`, and a `pruner` class, which should be a subclass of `Pruner`.
An implementation of `weight masker` may look like this:
```python
class MyMasker(WeightMasker):
    def __init__(self, model, pruner):
        super().__init__(model, pruner)
        # You can do some initialization here, such as collecting some statistics data
        # if it is necessary for your algorithms to calculate the masks.

    def calc_mask(self, sparsity, wrapper, wrapper_idx=None):
        # calculate the masks based on the wrapper.weight, and sparsity,
        # and anything else
        # mask = ...
        return {'weight_mask': mask}
```
You can refer to NNI-provided [weight masker](https://github.com/microsoft/nni/blob/master/src/sdk/pynni/nni/compression/torch/pruning/structured_pruning.py) implementations when implementing your own weight masker.
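For instance, a simple magnitude-based masker could keep the largest-magnitude weights and mask out the rest until the requested sparsity is reached. The sketch below is illustrative only; the class name is hypothetical and the import path of `WeightMasker` is an assumption (see the linked source for the actual location):

```python
import torch
# import path is an assumption; see the weight masker source linked above for the actual location
from nni.compression.torch.pruning.weight_masker import WeightMasker

class MagnitudeMasker(WeightMasker):
    """Hypothetical masker: keep the largest-magnitude weights, zero out the rest."""
    def calc_mask(self, sparsity, wrapper, wrapper_idx=None):
        # the wrapped layer's weight (attribute layout follows the module wrapper; adjust if needed)
        weight = wrapper.module.weight.data
        num_prune = int(weight.numel() * sparsity)
        if num_prune == 0:
            return {'weight_mask': torch.ones_like(weight)}
        # threshold is the largest absolute value among the weights to be pruned
        threshold = torch.topk(weight.abs().view(-1), num_prune, largest=False)[0].max()
        mask = torch.gt(weight.abs(), threshold).type_as(weight)
        return {'weight_mask': mask}
```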
A basic `pruner` looks like this:
```python
class MyPruner(Pruner):
    def __init__(self, model, config_list, optimizer):
        super().__init__(model, config_list, optimizer)
        self.set_wrappers_attribute("if_calculated", False)
        # construct a weight masker instance
        self.masker = MyMasker(model, self)

    def calc_mask(self, wrapper, wrapper_idx=None):
        sparsity = wrapper.config['sparsity']
        if wrapper.if_calculated:
            # Already pruned, do not prune again as a one-shot pruner
            return None
        else:
            # call your masker to actually calculate the mask for this layer
            masks = self.masker.calc_mask(sparsity=sparsity, wrapper=wrapper, wrapper_idx=wrapper_idx)
            wrapper.if_calculated = True
            return masks
```
You can refer to NNI-provided [pruner](https://github.com/microsoft/nni/blob/master/src/sdk/pynni/nni/compression/torch/pruning/one_shot.py) implementations when implementing your own pruner class.
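Once both classes are defined, the custom pruner can be used like any built-in pruner. A minimal usage sketch, assuming `model` and `optimizer` are already defined and using an example sparsity of 0.5:

```python
# prune all Conv2d layers of the model to 50% sparsity with the pruner defined above
config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
pruner = MyPruner(model, config_list, optimizer)
model = pruner.compress()

# ... train / fine-tune the model; masks are applied automatically ...

# export the pruned weights and the corresponding masks
pruner.export_model(model_path='pruned_model.pth', mask_path='mask.pth')
```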
***
## Customize a new quantization algorithm
To write a new quantization algorithm, you can write a class that inherits `nni.compression.torch.Quantizer` and override its member functions with the logic of your algorithm. The key member function to override is `quantize_weight`, which directly returns the quantized weights rather than a mask, because for quantization the quantized weights cannot be obtained by simply applying a mask.
```python
from nni.compression.torch import Quantizer

class YourQuantizer(Quantizer):
    def __init__(self, model, config_list):
        """
        We suggest you use the NNI-defined spec for config
        """
        super().__init__(model, config_list)

    def quantize_weight(self, weight, config, **kwargs):
        """
        Subclasses should overload this method to quantize weight tensors.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        weight : Tensor
            weight that needs to be quantized
        config : dict
            the configuration for weight quantization
        """
        # Put your code to generate `new_weight` here
        return new_weight

    def quantize_output(self, output, config, **kwargs):
        """
        Subclasses should overload this method to quantize output.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        output : Tensor
            output that needs to be quantized
        config : dict
            the configuration for output quantization
        """
        # Put your code to generate `new_output` here
        return new_output

    def quantize_input(self, *inputs, config, **kwargs):
        """
        Subclasses should overload this method to quantize input.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        inputs : Tensor
            inputs that need to be quantized
        config : dict
            the configuration for input quantization
        """
        # Put your code to generate `new_input` here
        return new_input

    def update_epoch(self, epoch_num):
        pass

    def step(self):
        """
        Can do some processing based on the model or weights bound
        in the function bind_model
        """
        pass
```
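As a concrete illustration, a naive symmetric 8-bit "fake quantization" of weights could be implemented as below. This is not one of NNI's built-in algorithms and the class name is hypothetical; it only shows the expected contract of `quantize_weight` (take a weight tensor, return the quantized tensor):

```python
import torch
from nni.compression.torch import Quantizer

class NaiveQuantizer8bit(Quantizer):
    """Illustrative only: symmetric 8-bit fake quantization of weights."""
    def quantize_weight(self, weight, config, **kwargs):
        qmax = 127
        scale = weight.abs().max() / qmax
        if scale == 0:
            return weight
        # round to integers in [-128, 127], then scale back ("fake quantization")
        quantized = torch.clamp(torch.round(weight / scale), -128, 127)
        return quantized * scale
```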
### Customize backward function
Sometimes it's necessary for a quantization operation to have a customized backward function, for example to apply the [Straight-Through Estimator](https://stackoverflow.com/questions/38361314/the-concept-of-straight-through-estimator-ste). Users can customize a backward function as follows:
```python
import torch

from nni.compression.torch.compressor import Quantizer, QuantGrad, QuantType

class ClipGrad(QuantGrad):
    @staticmethod
    def quant_backward(tensor, grad_output, quant_type):
        """
        This method should be overridden by subclasses to provide a customized backward function;
        the default implementation is the Straight-Through Estimator.
        Parameters
        ----------
        tensor : Tensor
            input of the quantization operation
        grad_output : Tensor
            gradient of the output of the quantization operation
        quant_type : QuantType
            the type of quantization; it can be `QuantType.QUANT_INPUT`, `QuantType.QUANT_WEIGHT` or `QuantType.QUANT_OUTPUT`,
            and you can define different behavior for different types.
        Returns
        -------
        tensor
            gradient of the input of the quantization operation
        """
        # for the quantize_output function, set grad to zero if the absolute value of tensor is larger than 1
        if quant_type == QuantType.QUANT_OUTPUT:
            grad_output[torch.abs(tensor) > 1] = 0
        return grad_output

class YourQuantizer(Quantizer):
    def __init__(self, model, config_list):
        super().__init__(model, config_list)
        # set your customized backward function to overwrite the default backward function
        self.quant_grad = ClipGrad
```
If you do not customize `QuantGrad`, the default backward function is the Straight-Through Estimator.
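For reference, the Straight-Through Estimator simply passes the incoming gradient through unchanged, so the default behavior corresponds roughly to the sketch below (the class name is hypothetical; you do not need to write this yourself, since it is already the default):

```python
from nni.compression.torch.compressor import QuantGrad

class StraightThroughGrad(QuantGrad):
    @staticmethod
    def quant_backward(tensor, grad_output, quant_type):
        # STE: treat the quantization step as the identity function when propagating gradients
        return grad_output
```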
_Coming Soon_ ...
# Framework overview of model compression
```eval_rst
.. contents::
```
The picture below shows an overview of the components of the model compression framework.
![](../../img/compressor_framework.jpg)
There are 3 major components/classes in the NNI model compression framework: `Compressor`, `Pruner` and `Quantizer`. Let's look at them in detail one by one:
## Compressor
`Compressor` is the base class for pruners and quantizers. It provides a unified interface so that pruners and quantizers can be used in the same way. For example, to use a pruner:
```python
from nni.compression.torch import LevelPruner
# a representative configuration: prune the default op types to 80% sparsity
config_list = [{'sparsity': 0.8, 'op_types': ['default']}]
pruner = LevelPruner(model, config_list)
model = pruner.compress()
# the model will be pruned during training automatically
```
To use a quantizer:
```python
from nni.compression.torch import DoReFaQuantizer
configure_list = [{
    'quant_types': ['weight'],
    'quant_bits': {
        'weight': 8,
    },
    'op_types': ['Conv2d', 'Linear']
}]
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
quantizer = DoReFaQuantizer(model, configure_list, optimizer)
quantizer.compress()
```
View [example code](https://github.com/microsoft/nni/tree/master/examples/model_compress) for more information.
The `Compressor` class provides some utility methods for subclasses and users:
### Set wrapper attribute
Sometimes `calc_mask` must save some state data. The `set_wrappers_attribute` API (for example, `self.set_wrappers_attribute("if_calculated", False)` in the pruner example earlier in this document) registers such an attribute on every module wrapper, so the original modules are not polluted.

A collector can also be registered on the pruner to gather data, such as activations, while the wrapped modules run forward, and removed again once it is no longer needed:

```python
collector_id = self.pruner.add_activation_collector(collector)
self.pruner.remove_activation_collector(collector_id)
```
***
## Pruner
A pruner receives `model`, `config_list` and `optimizer` as arguments. It prunes the model according to `config_list` during the training loop by adding a hook on `optimizer.step()`.

The `Pruner` class is a subclass of `Compressor`, so it contains everything in the `Compressor` class plus some components used only for pruning. A pruner is responsible for:

1. Managing and validating the `config_list`.
2. Wrapping the model's layers with `module wrapper`s and adding a hook on `optimizer.step`.
3. Using a `weight masker` to calculate masks for layers while pruning.
4. Exporting the pruned model weights and masks.

From an implementation perspective, a pruner consists of the following components:
### Weight masker
A `weight masker` is the implementation of a pruning algorithm; it can prune a specified layer, wrapped by a `module wrapper`, with a specified sparsity.
### Pruning module wrapper
A `pruning module wrapper` is a module containing:

1. the original module
2. some buffers used by `calc_mask`
3. a new forward method that applies the masks before running the original forward method.

The reasons to use a `module wrapper` (a sketch follows this list):

1. some buffers are needed by `calc_mask` to calculate masks, and these buffers should be registered in the `module wrapper` so that the original modules are not contaminated.
2. a new `forward` method is needed to apply the masks to the weights before calling the real `forward` method.
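For illustration, a pruning module wrapper can be sketched roughly as below. This is a simplified, hypothetical version (NNI's real wrapper registers more buffers and also handles biases); it is meant only to show where the mask buffer lives and how the new `forward` applies it:

```python
import torch

class MyModuleWrapper(torch.nn.Module):
    """Simplified sketch of a pruning module wrapper, not NNI's actual implementation."""
    def __init__(self, module, config, pruner):
        super().__init__()
        self.module = module        # the original module
        self.config = config
        self.pruner = pruner
        # buffer used by calc_mask; registered here so the original module is not polluted
        self.register_buffer("weight_mask", torch.ones_like(module.weight))

    def forward(self, *inputs):
        # apply the mask to the weight in place before delegating to the original forward
        self.module.weight.data.mul_(self.weight_mask)
        return self.module(*inputs)
```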
### Pruning hook
A pruning hook is installed on a pruner when the pruner is constructed; it calls the pruner's `calc_mask` method when `optimizer.step()` is invoked.
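Conceptually, the hook behaves like a patched `optimizer.step` that recomputes masks right after every weight update, roughly as in this sketch (illustrative only; `update_mask` is assumed here to be a helper that calls `calc_mask` on every wrapper):

```python
import types

def install_pruning_hook(optimizer, pruner):
    original_step = optimizer.step

    def patched_step(self, *args, **kwargs):
        output = original_step(*args, **kwargs)
        # recompute masks for every wrapped layer after the weights were updated
        pruner.update_mask()
        return output

    optimizer.step = types.MethodType(patched_step, optimizer)

# install_pruning_hook(optimizer, pruner)  # in NNI this wiring is done by the pruner itself
```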
***
## Quantizer
The `Quantizer` class is also a subclass of `Compressor`. It compresses models by reducing the number of bits required to represent weights or activations, which reduces computation and inference time. It contains:
### Quantization module wrapper
Each module/layer of the model to be quantized is wrapped by a quantization module wrapper, which provides a new `forward` method that quantizes the original module's weight, input and output.
### Quantization hook
A quantization hook is installed on a quantizer when it is constructed; it is called at `optimizer.step()`.
### Quantization methods
The `Quantizer` class provides the following methods for subclasses to implement quantization algorithms:
```python
class Quantizer(Compressor):
    """
    Base quantizer for pytorch quantizer
    """

    def quantize_weight(self, weight, wrapper, **kwargs):
        """
        Subclasses should overload this method to quantize weight.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        weight : Tensor
            weight that needs to be quantized
        wrapper : QuantizerModuleWrapper
            the wrapper for origin module
        """
        raise NotImplementedError('Quantizer must overload quantize_weight()')

    def quantize_output(self, output, wrapper, **kwargs):
        """
        Subclasses should overload this method to quantize output.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        output : Tensor
            output that needs to be quantized
        wrapper : QuantizerModuleWrapper
            the wrapper for origin module
        """
        raise NotImplementedError('Quantizer must overload quantize_output()')

    def quantize_input(self, *inputs, wrapper, **kwargs):
        """
        Subclasses should overload this method to quantize input.
        This method is effectively hooked to :meth:`forward` of the model.
        Parameters
        ----------
        inputs : Tensor
            inputs that need to be quantized
        wrapper : QuantizerModuleWrapper
            the wrapper for origin module
        """
        raise NotImplementedError('Quantizer must overload quantize_input()')

    def update_epoch(self, epoch_num):
        pass

    def step(self):
        """
        Can do some processing based on the model or weights bound
        in the function bind_model
        """
        pass
```
***
## Multi-GPU support
During multi-GPU training, buffers and parameters are copied to each GPU every time the `forward` method runs. If buffers and parameters are updated in the `forward` method, an in-place update is needed to ensure the update is effective.
Since `calc_mask` is called in the `optimizer.step` method, which happens after the `forward` method and only on one GPU, it supports multi-GPU naturally.
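As a small illustration of the in-place requirement, a wrapper that counts its forward calls in a registered buffer would update it like this (a minimal sketch, not NNI code):

```python
import torch

class CountingWrapper(torch.nn.Module):
    """Minimal sketch showing an in-place buffer update inside forward."""
    def __init__(self, module):
        super().__init__()
        self.module = module
        self.register_buffer("forward_calls", torch.zeros(1))

    def forward(self, x):
        self.forward_calls.add_(1)  # in-place update of the registered buffer
        # self.forward_calls = self.forward_calls + 1  # re-binding the attribute may be lost on replicas
        return self.module(x)
```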
For details, please refer to the following tutorials:
Automatic Model Compression <Compressor/AutoCompression>
Model Speedup <Compressor/ModelSpeedup>
Compression Utilities <Compressor/CompressionUtils>
Compression Framework <Compressor/Framework>
Customize Compression Algorithms <Compressor/CustomizeCompressor>