@@ -232,7 +232,7 @@ For NNI on Windows, please refer to [NNI on Windows](docs/en_US/Tutorial/NniOnWi
**Verify installation**
The following example is an experiment built on TensorFlow. Make sure you have **TensorFlow 1.x installed** before running it. Note that **currently TensorFlow 2.0 is NOT supported**.
* Download the examples by cloning the source code.
```{ 'sparsity': 0.8, 'op_types': 'default' }``` means that **all layers with weights will be compressed with the same 0.8 sparsity**. When ```pruner(model)``` is called, the model is compressed with masks; after that you can fine-tune the model as usual, and the **pruned weights that have been masked won't be updated**.
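For reference, a minimal sketch of how this config might be applied (assuming the PyTorch variant; the TensorFlow pruner under ```nni.compression.tensorflow``` follows the same pattern, and ```model``` is an already-built network):

```python
from nni.compression.torch import LevelPruner

# same config as above: every weighted layer is pruned to 0.8 sparsity
config_list = [{ 'sparsity': 0.8, 'op_types': 'default' }]
pruner = LevelPruner(config_list)
pruner(model)  # inject masks; afterwards fine-tune the model as usual
```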
## Then, make this automatic
The previous example manually chose LevelPruner and pruned all layers with the same sparsity. This is obviously sub-optimal because different layers may have different redundancy. Layer sparsity should be carefully tuned to achieve the least model performance degradation, and this can be done with NNI tuners.
The first thing we need to do is design a search space. Here we use a nested search space which covers both choosing the pruning algorithm and optimizing layer sparsity.
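As an illustration only, such a nested space could look roughly like the sketch below (written as a Python dict for readability; the method names, keys, and ranges are made up for this sketch, see [Auto Model Compression](./AutoCompression.md) for the real search space):

```python
# illustrative nested search space: the outer choice picks the pruning
# algorithm, the inner keys tune the sparsity used by that algorithm
search_space = {
    'prune_method': {
        '_type': 'choice',
        '_value': [
            {
                '_name': 'level_pruner',
                'sparsity': {'_type': 'uniform', '_value': [0.5, 0.9]}
            },
            {
                '_name': 'agp_pruner',
                'final_sparsity': {'_type': 'uniform', '_value': [0.5, 0.9]}
            }
        ]
    }
}
```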
NNI provides an easy-to-use toolkit to help users design and use compression algorithms. It supports TensorFlow and PyTorch with a unified interface. To compress their models, users only need to add a few lines to their code. Several popular model compression algorithms are built into NNI. Users can further use NNI's auto tuning power to find the best compressed model, as detailed in [Auto Model Compression](./AutoCompression.md). On the other hand, users can easily customize new compression algorithms using NNI's interface; refer to the tutorial [here](#customize-new-compression-algorithms).
## Supported algorithms
We have provided the following compression algorithms for users, including three pruning algorithms and two quantization algorithms:
|Name|Brief Introduction of Algorithm|
|---|---|
| [Level Pruner](./Pruner.md#level-pruner) | Prunes the specified ratio of each weight tensor based on the absolute values of the weights |
| [AGP Pruner](./Pruner.md#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
| [Sensitivity Pruner](./Pruner.md#sensitivity-pruner) | Learning both Weights and Connections for Efficient Neural Networks. [Reference Paper](https://arxiv.org/abs/1506.02626)|
| [QAT Quantizer](./Quantizer.md#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
| [DoReFa Quantizer](./Quantizer.md#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)|
@@ -48,7 +48,7 @@ from nni.compression.tensorflow import AGP_Pruner
config_list=[{
'initial_sparsity':0,
'final_sparsity':0.8,
'start_epoch':0,
'end_epoch':10,
'frequency':1,
'op_types':'default'
...
...
@@ -62,7 +62,7 @@ from nni.compression.torch import AGP_Pruner
config_list=[{
'initial_sparsity':0,
'final_sparsity':0.8,
'start_epoch':0,
'end_epoch':10,
'frequency':1,
'op_types':'default'
...
...
@@ -86,47 +86,9 @@ You can view example for more information
#### User configuration for AGP Pruner
* **initial_sparsity:** This is to specify the sparsity when the compressor starts to compress
* **final_sparsity:** This is to specify the sparsity when the compressor finishes compressing
* **start_epoch:** This is to specify the epoch number when the compressor starts to compress, starting from epoch 0 by default
* **end_epoch:** This is to specify the epoch number when the compressor finishes compressing
* **frequency:** This is to specify that the compressor compresses once every *frequency* epochs, default frequency=1
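Putting these parameters together, a complete configuration looks like the snippet below (values taken from the examples above; the PyTorch import is shown, the TensorFlow one is analogous):

```python
from nni.compression.torch import AGP_Pruner

config_list = [{
    'initial_sparsity': 0,
    'final_sparsity': 0.8,
    'start_epoch': 0,
    'end_epoch': 10,
    'frequency': 1,
    'op_types': 'default'
}]
pruner = AGP_Pruner(config_list)
```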
***
## Sensitivity Pruner
In [Learning both Weights and Connections for Efficient Neural Networks](https://arxiv.org/abs/1506.02626), author Song Han and his colleagues provide an algorithm to find the sensitivity of each layer and set a pruning threshold for each layer.
>We used the sensitivity results to find each layer’s threshold: for example, the smallest threshold was applied to the most sensitive layer, which is the first convolutional layer... The pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layer’s weights
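As a toy illustration of this rule (the quality parameter value below is made up, not taken from the paper):

```python
import torch

weights = torch.randn(1000)                  # weights of one layer
quality_parameter = 0.5                      # illustrative value only
threshold = quality_parameter * weights.std()
mask = (weights.abs() > threshold).float()   # weights below the threshold are pruned
```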
### Usage
You can prune weights step by step and reach a target sparsity with Sensitivity Pruner using the code below.
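A minimal sketch of what that could look like (the class name ```SensitivityPruner``` and the config keys here are assumptions following the pattern of the other pruners; see [Sensitivity Pruner](./Pruner.md#sensitivity-pruner) for the actual usage):

```python
from nni.compression.torch import SensitivityPruner  # assumed class name

config_list = [{ 'sparsity': 0.8, 'op_types': 'default' }]  # target sparsity
pruner = SensitivityPruner(config_list)
pruner(model)  # prune step by step towards the target sparsity
```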
You can run these examples easily. Take torch pruning for example:
```bash
python main_torch_pruner.py
```
This example uses AGP Pruner. Initializing a pruner needs a user-provided configuration, which can be supplied in two ways:
- By reading ```configure_example.yaml```, which keeps the code clean when your configuration is complicated
- By configuring directly in your code
In our example, we simply configure model compression in our code like this:
```python
from nni.compression.torch import AGP_Pruner

configure_list = [{
    'initial_sparsity': 0,
    'final_sparsity': 0.8,
    'start_epoch': 0,
    'end_epoch': 10,
    'frequency': 1,
    'op_types': 'default'
}]
pruner = AGP_Pruner(configure_list)
```
When ```pruner(model)``` is called, your model is injected with masks as embedded operations. For example, when a layer takes a weight as input, we insert an operation between the weight and the layer; this operation takes the weight as input and outputs a new weight with the mask applied. Thus, the masks are applied whenever the computation goes through these operations. You can fine-tune your model **without** any modifications.
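Conceptually, the injected operation behaves like the standalone sketch below (this is not NNI's actual implementation; a toy ```Conv2d``` layer and a random mask are used just to show the effect):

```python
import torch
import torch.nn.functional as F

conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
mask = (torch.rand_like(conv.weight) > 0.8).float()  # 1 = keep, 0 = pruned

x = torch.randn(1, 3, 32, 32)
# the layer computes with (weight * mask) instead of the raw weight, so the
# masked entries contribute nothing and stay pruned during fine-tuning
out = F.conv2d(x, conv.weight * mask, conv.bias, padding=1)
```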
```python
for epoch in range(10):
    # update_epoch is for the pruner to be aware of epochs, so that it could adjust masks during training
    pruner.update_epoch(epoch)
    print('# Epoch {} #'.format(epoch))
    train(model, device, train_loader, optimizer)
    test(model, device, test_loader)
```
When fine-tuning is finished, the pruned weights are all masked, and you can get the masks like this