The ``Conv2d`` op_type matches all ``torch.nn.Conv2d`` modules in the model; the special ``default`` op_type stands for the module types defined in :githublink:`default_layers.py <nni/compression/pytorch/default_layers.py>` for PyTorch.
Therefore ``{ 'sparsity': 0.5, 'op_types': ['Conv2d'] }``\ means that **all layers with specified op_types will be compressed with the same 0.5 sparsity**. When ``pruner.compress()`` is called, the model is compressed with masks; after that you can fine-tune the model as usual, and the **pruned weights, which have been masked, will not be updated**.
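As a concrete illustration, the following sketch builds such a config list and applies a one-shot filter pruner. It is a minimal sketch: the toy model, the choice of ``L2FilterPruner``, and the exported file names are placeholders rather than part of the tutorial's own example.

.. code-block:: python

   import torch
   import torch.nn as nn
   from nni.algorithms.compression.pytorch.pruning import L2FilterPruner

   # toy model with two convolution layers (placeholder for your own network)
   model = nn.Sequential(
       nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
       nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
       nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
   )

   # prune every Conv2d layer to 50% sparsity
   config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]

   pruner = L2FilterPruner(model, config_list)
   pruner.compress()  # computes the masks and wraps the layers

   # fine-tune as usual here; the masked weights stay zero
   pruner.export_model(model_path='pruned.pth', mask_path='mask.pth')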
Then, make this automatic
-------------------------
The previous example manually chose L2FilterPruner and pruned with a specified sparsity. Different sparsities and different pruners may have different effects on different models, so choosing them by hand is often sub-optimal. This tuning process can be automated with NNI tuners.
The first thing we need to do is to design a search space. Here we use a nested search space that covers both the choice of pruning algorithm and the per-layer sparsities:
.. code-block:: json

   {
       "prune_method": {
           "_type": "choice",
           "_value": [
               {
                   "_name": "agp",
                   "conv0_sparsity": { "_type": "uniform", "_value": [0.1, 0.9] },
                   "conv1_sparsity": { "_type": "uniform", "_value": [0.1, 0.9] }
               },
               {
                   "_name": "level",
                   "conv0_sparsity": { "_type": "uniform", "_value": [0.1, 0.9] },
                   "conv1_sparsity": { "_type": "uniform", "_value": [0.01, 0.9] }
               }
           ]
       }
   }
Then we need to modify our code with a few lines:
.. code-block:: python

   import nni
   from nni.algorithms.compression.pytorch.pruning import *
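Below is a rough sketch of how the sampled parameters might then be consumed; the layer names ``conv0``/``conv1``, the use of ``op_names``, and the final pruner choice are illustrative assumptions rather than the tutorial's exact code.

.. code-block:: python

   # get one point from the search space defined above
   params = nni.get_next_parameter()
   prune_method = params['prune_method']

   config_list = [
       {'sparsity': prune_method['conv0_sparsity'], 'op_names': ['conv0']},
       {'sparsity': prune_method['conv1_sparsity'], 'op_names': ['conv1']},
   ]

   # prune_method['_name'] is either 'agp' or 'level'; pick the pruner accordingly,
   # e.g. LevelPruner(model, config_list) for the one-shot case.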
We provide several pruning algorithms that support fine-grained weight pruning and structural filter pruning. **Fine-grained Pruning** generally results in unstructured models, which need specialized hardware or software to speed up the sparse network. **Filter Pruning** achieves acceleration by removing entire filters. Some pruning algorithms are one-shot: they prune weights once, based on an importance metric. Others control the **pruning schedule**, pruning weights during optimization; this includes several automatic pruning algorithms.
**Fine-grained Pruning**

* `Level Pruner <#level-pruner>`__

**Filter Pruning**

* `Slim Pruner <#slim-pruner>`__
* `FPGM Pruner <#fpgm-pruner>`__
* `L1Filter Pruner <#l1filter-pruner>`__
...
This is a one-shot pruner proposed in `'Learning Efficient Convolutional Networks through Network Slimming' <https://arxiv.org/pdf/1708.06519.pdf>`__ by Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan and Changshui Zhang. It adds sparsity regularization on the scaling factors of batch normalization (BN) layers during training to identify unimportant channels; the channels with small scaling factor values will be pruned.
.. image:: ../../img/slim_pruner.png
   :target: ../../img/slim_pruner.png
   :alt:

..

   Slim Pruner **prunes channels in the convolution layers by masking the corresponding scaling factors in the later BN layers**. L1 regularization is applied to these scaling factors while training, and the scaling factors of the BN layers are **globally ranked** while pruning, so the sparse model can be found automatically for a given sparsity.
Usage
^^^^^
...
     - Parameters
     - Pruned
   * - VGGNet
     - 6.34/6.69
     - 20.04M
     -
   * - Pruned-VGGNet
     - 6.20/6.34
     - 2.03M
     - 88.5%
The experiments code can be found at :githublink:`examples/model_compress/pruning/basic_pruners_torch.py <examples/model_compress/pruning/basic_pruners_torch.py>`
----

FPGM Pruner
-----------

This is a one-shot pruner that prunes filters with the smallest geometric median; FPGM chooses the filters with the most replaceable contribution. It is an implementation of the paper `Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration <https://arxiv.org/pdf/1811.00250.pdf>`__.

..

   Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance.

We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please refer to `dependency-aware <./DependencyAware.rst>`__ for more details.
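A minimal sketch of the dependency-aware mode is shown below; the model, input shape, and sparsity are placeholders, and it assumes the pruner accepts the ``dependency_aware`` and ``dummy_input`` keyword arguments described in the dependency-aware guide.

.. code-block:: python

   import torch
   from torchvision.models import resnet18
   from nni.algorithms.compression.pytorch.pruning import FPGMPruner

   model = resnet18(pretrained=False)  # placeholder model with channel dependencies
   config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]

   # dummy_input lets the pruner trace the graph to find the channel dependencies
   dummy_input = torch.rand(1, 3, 224, 224)
   pruner = FPGMPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
   pruner.compress()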
...
L1Filter Pruner
---------------
This is a one-shot pruner that prunes filters in the **convolution layers**. It was proposed in `PRUNING FILTERS FOR EFFICIENT CONVNETS <https://arxiv.org/abs/1608.08710>`__ by Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf.
.. image:: ../../img/l1filter_pruner.png
   :target: ../../img/l1filter_pruner.png
   :alt:

The procedure of pruning :math:`m` filters from the :math:`i`-th convolutional layer is as follows:
#. For each filter :math:`F_{i,j}`, calculate the sum of its absolute kernel weights :math:`s_j=\sum_{l=1}^{n_i}\sum|K_l|`.
#. Sort the filters by :math:`s_j`.
...
#. A new kernel matrix is created for both the :math:`i`-th and :math:`i+1`-th layers, and the remaining kernel
   weights are copied to the new model.
For more details, please refer to `PRUNING FILTERS FOR EFFICIENT CONVNETS <https://arxiv.org/abs/1608.08710>`__\.
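To make the ranking step concrete, here is a small standalone sketch (independent of NNI) that computes the per-filter score :math:`s_j` for one convolution layer and selects the filters that would be kept; the layer shape and sparsity are arbitrary.

.. code-block:: python

   import torch
   import torch.nn as nn

   conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
   sparsity = 0.5
   num_prune = int(conv.out_channels * sparsity)

   # s_j: sum of absolute kernel weights of each filter (one score per output channel)
   scores = conv.weight.data.abs().sum(dim=(1, 2, 3))

   # the filters with the smallest scores are pruned, the rest are kept
   kept = torch.argsort(scores, descending=True)[:conv.out_channels - num_prune]
   print(sorted(kept.tolist()))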
In addition, we also provide a dependency-aware mode for the L1FilterPruner. For more details about the dependency-aware mode, please refer to `dependency-aware mode <./DependencyAware.rst>`__.
...
- 64.0%
The experiments code can be found at :githublink:`examples/model_compress/pruning/basic_pruners_torch.py <examples/model_compress/pruning/basic_pruners_torch.py>`
We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please refer to `dependency-aware <./DependencyAware.rst>`__ for more details.
...
Note: ActivationAPoZRankFilterPruner is used to prune convolutional layers within deep neural networks, therefore the ``op_types`` field supports only convolutional layers.
You can view :githublink:`example <examples/model_compress/pruning/basic_pruners_torch.py>` for more information.
User configuration for ActivationAPoZRankFilter Pruner
Note: ActivationMeanRankFilterPruner is used to prune convolutional layers within deep neural networks, therefore the ``op_types`` field supports only convolutional layers.
You can view :githublink:`example <examples/model_compress/pruning/basic_pruners_torch.py>` for more information.
User configuration for ActivationMeanRankFilterPruner
We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please refer to `dependency-aware <./DependencyAware.rst>`__ for more details.
...
AGP Pruner
----------
This is an iterative pruner proposed by Michael Zhu and Suyog Gupta in `To prune, or not to prune: exploring the efficacy of pruning for model compression <https://arxiv.org/abs/1710.01878>`__, which prunes the weights gradually: the sparsity is increased from an initial sparsity value :math:`s_i` (usually 0) to a final sparsity value :math:`s_f` over a span of :math:`n` pruning steps, starting at training step :math:`t_{0}` and with pruning frequency :math:`\Delta t`:

:math:`s_{t}=s_{f}+\left(s_{i}-s_{f}\right)\left(1-\frac{t-t_{0}}{n \Delta t}\right)^{3} \text { for } t \in\left\{t_{0}, t_{0}+\Delta t, \ldots, t_{0} + n \Delta t\right\}`
The binary weight masks are updated every :math:`\Delta t` steps as the network is trained, to gradually increase the sparsity of the network while allowing the training steps to recover from any pruning-induced loss in accuracy. In our experience, varying the pruning frequency :math:`\Delta t` between 100 and 1000 training steps had a negligible impact on the final model quality. Once the model achieves the target sparsity :math:`s_f`, the weight masks are no longer updated. The intuition behind this sparsity function is to prune the network rapidly in the initial phase when redundant connections are abundant and to gradually reduce the number of weights being pruned as fewer weights remain.

For more details, please refer to `To prune, or not to prune: exploring the efficacy of pruning for model compression <https://arxiv.org/abs/1710.01878>`__\.
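The schedule itself is easy to evaluate directly; the short standalone snippet below computes it for an illustrative setting (initial sparsity 0, final sparsity 0.8, 10 pruning steps), matching the usage example that follows.

.. code-block:: python

   # evaluate the AGP sparsity schedule s_t for a toy configuration
   s_i, s_f = 0.0, 0.8        # initial and final sparsity
   n, t0, delta_t = 10, 0, 1  # number of pruning steps, start step, pruning frequency

   for k in range(n + 1):
       t = t0 + k * delta_t
       s_t = s_f + (s_i - s_f) * (1 - (t - t0) / (n * delta_t)) ** 3
       print(f"step {t}: sparsity {s_t:.3f}")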
Usage
^^^^^
You can prune all weights from 0% to 80% sparsity in 10 epochs with the code below.
PyTorch code
...
pruner.update_epoch(epoch)
You can view :githublink:`example <examples/model_compress/pruning/basic_pruners_torch.py>` for more information.
User configuration for AGP Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
For more details, please refer to `NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications <https://arxiv.org/abs/1804.03230>`__.
better preserving the accuracy and freeing human labor.
.. image:: ../../img/amc_pruner.jpg
   :target: ../../img/amc_pruner.jpg
   :alt:
For more details, please refer to `AMC: AutoML for Model Compression and Acceleration on Mobile Devices <https://arxiv.org/pdf/1802.03494.pdf>`__.
Usage
...
The above configuration means that there are 5 pruning iterations. As the 5 iterations are executed in the same run, LotteryTicketPruner needs ``model`` and ``optimizer`` (\ **note that ``lr_scheduler`` should also be included if used**\ ) to reset their states every time a new prune iteration starts. Please use ``get_prune_iterations`` to get the pruning iterations, and invoke ``prune_iteration_start`` at the beginning of each iteration. ``epoch_num`` should be large enough for model convergence, because the hypothesis is that the performance (accuracy) obtained in later rounds with high sparsity can be comparable with that obtained in the first round. A rough sketch of this loop structure is shown below.
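In the sketch, the toy model, sparsity values, and the placeholder training step are assumptions for illustration; only the loop structure follows the description above.

.. code-block:: python

   import torch
   import torch.nn as nn
   from nni.algorithms.compression.pytorch.pruning import LotteryTicketPruner

   model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
   optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

   config_list = [{'prune_iterations': 5, 'sparsity': 0.8, 'op_types': ['default']}]
   pruner = LotteryTicketPruner(model, config_list, optimizer)
   pruner.compress()

   for i in pruner.get_prune_iterations():
       pruner.prune_iteration_start()   # rewind weights and update the masks for this round
       for epoch in range(50):          # train long enough for convergence in each round
           pass                         # placeholder: run your usual training step(s) here
       # evaluate and record the accuracy of this round here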
*Tensorflow version will be supported later.*
User configuration for LotteryTicket Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
Reproduced Experiment
^^^^^^^^^^^^^^^^^^^^^
We try to reproduce the experiment result of the fully connected network on MNIST using the same configuration as in the paper. The code can be found :githublink:`here <examples/model_compress/pruning/lottery_torch_mnist_fc.py>`. In this experiment, we prune the model 10 times; after each pruning, we train the pruned model for 50 epochs.
You can use other compression algorithms in the package of ``nni.compression``. The algorithms are implemented in both PyTorch and TensorFlow (partial support on TensorFlow), under ``nni.compression.pytorch`` and ``nni.compression.tensorflow`` respectively. You can refer to `Pruner <./Pruner.rst>`__ and `Quantizer <./Quantizer.rst>`__ for a detailed description of the supported algorithms. Also, if you want to use knowledge distillation, you can refer to `KDExample <../TrialExample/KDExample.rst>`__.
A compression algorithm is first instantiated with a ``config_list`` passed in. The specification of this ``config_list`` will be described later.
Knowledge Distillation (KD) was proposed in `Distilling the Knowledge in a Neural Network <https://arxiv.org/abs/1503.02531>`__\ : the compressed model is trained to mimic a pre-trained, larger model. This training setting is also referred to as "teacher-student", where the large model is the teacher and the small model is the student. KD is often used to fine-tune the pruned model.
.. image:: ../../img/distill.png
   :target: ../../img/distill.png
   :alt:
Usage
^^^^^
...
.. code-block:: python

   # model_s is the (pruned) student model being fine-tuned, model_t the pre-trained teacher
   for batch_idx, (data, target) in enumerate(train_loader):
       data, target = data.to(device), target.to(device)
       optimizer.zero_grad()
       y_s = model_s(data)
       y_t = model_t(data)
       loss_cri = F.cross_entropy(y_s, target)
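The distillation term is then added on top of the hard-label loss. A common soft-target formulation, continuing inside the training loop above, is sketched below; the temperature ``kd_T`` and the weights ``alpha``/``beta`` are illustrative values rather than prescribed ones.

.. code-block:: python

       # still inside the training loop above
       kd_T, alpha, beta = 5.0, 1.0, 0.8   # temperature and loss weights (illustrative)

       # soften both distributions with the temperature, then match them with KL divergence
       p_s = F.log_softmax(y_s / kd_T, dim=1)
       p_t = F.softmax(y_t / kd_T, dim=1)
       loss_kd = F.kl_div(p_s, p_t, reduction='batchmean') * (kd_T ** 2)

       loss = alpha * loss_cri + beta * loss_kd
       loss.backward()
       optimizer.step()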
Note: to fine-tune a pruned model, run :githublink:`basic_pruners_torch.py <examples/model_compress/pruning/basic_pruners_torch.py>` first to get the mask file, then pass the mask path as an argument to the script.
* **model_t** (teacher): the pre-trained, larger model
* **kd_T**: temperature for smoothing the teacher model's output
The complete code can be found `here <https://github.com/microsoft/nni/tree/v1.3/examples/model_compress/knowledge_distill/>`__
You can run these examples easily like this, taking torch pruning for example:
```bash
python model_prune_torch.py
```
This example uses AGP Pruner. Initiating a pruner needs a user-provided configuration, which can be provided in two ways:

- By reading `configure_example.yaml`; this can keep the code clean when your configuration is complicated
- Directly configuring it in your code
In our example, we simply configure model compression in our code like this:
```python
config_list = [{
    'initial_sparsity': 0,
    'final_sparsity': 0.8,
    'start_epoch': 0,
    'end_epoch': 10,
    'frequency': 1,
    'op_types': ['default']
}]
pruner = AGPPruner(config_list)
```
When `pruner(model)` is called, your model is injected with masks as embedded operations. For example, a layer takes a weight as input; we insert an operation between the weight and the layer that takes the weight as input and outputs a new weight with the mask applied. Thus, the masks are applied whenever the computation goes through the operations. You can fine-tune your model **without** any modifications.
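Conceptually, the inserted operation is nothing more than an element-wise multiplication of the weight with its binary mask before the layer uses it; the shapes below are arbitrary:

```python
import torch

weight = torch.randn(8, 4)                 # a layer's weight
mask = (torch.rand(8, 4) > 0.8).float()    # binary mask kept by the pruner

masked_weight = weight * mask              # what the layer actually computes with
```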
```python
for epoch in range(10):
    # update_epoch is for the pruner to be aware of epochs, so that it can adjust masks during training
    pruner.update_epoch(epoch)
    print('# Epoch {} #'.format(epoch))
    train(model, device, train_loader, optimizer)
    test(model, device, test_loader)
```
When fine-tuning is finished, the pruned weights are all masked and you can export the masks like this:
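A typical way to save both the masked weights and the mask file is `export_model`; the file names below are placeholders:

```python
pruner.export_model(model_path='pruned_model.pth', mask_path='mask.pth')
```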
Example for supported automatic pruning algorithms.
In this example, we present the usage of automatic pruners (NetAdapt, AutoCompressPruner). L1, L2, and FPGM pruners are also executed for comparison purposes.
NNI example for fine-tuning the pruned model with KD.
Run basic_pruners_torch.py first to get the masks of the pruned model, then pass the mask file as an argument for model speedup. The compressed model is then used for fine-tuning.
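A sketch of the speedup step is shown below; the network definition, mask file path, and input shape are placeholders for whatever `basic_pruners_torch.py` produced in your run.

```python
import torch
from nni.compression.pytorch import ModelSpeedup

device = torch.device('cpu')
# model: the same network definition that was pruned by basic_pruners_torch.py
dummy_input = torch.rand(1, 3, 32, 32).to(device)   # placeholder input shape
m_speedup = ModelSpeedup(model, dummy_input, 'mask.pth')
m_speedup.speedup_model()
# the physically smaller model can now be fine-tuned (optionally with KD)
```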