Unverified Commit a9dcc006 authored by colorjam, committed by GitHub

Refactor model compression examples (#3326)

parent 5946b4a4
......@@ -6,116 +6,70 @@ It's convenient to implement auto model pruning with NNI compression and NNI tun
First, model compression with NNI
---------------------------------
You can easily compress a model with NNI compression. Take pruning for example, you can prune a pretrained model with LevelPruner like this
You can easily compress a model with NNI compression. Take pruning for example, you can prune a pretrained model with L2FilterPruner like this
.. code-block:: python
from nni.algorithms.compression.pytorch.pruning import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
pruner = LevelPruner(model, config_list)
from nni.algorithms.compression.pytorch.pruning import L2FilterPruner
config_list = [{ 'sparsity': 0.5, 'op_types': ['Conv2d'] }]
pruner = L2FilterPruner(model, config_list)
pruner.compress()
The 'default' op_type stands for the module types defined in :githublink:`default_layers.py <nni/compression/pytorch/default_layers.py>` for pytorch.
The 'Conv2d' op_type stands for the module types defined in :githublink:`default_layers.py <nni/compression/pytorch/default_layers.py>` for pytorch.
Therefore ``{ 'sparsity': 0.8, 'op_types': ['default'] }``\ means that **all layers with specified op_types will be compressed with the same 0.8 sparsity**. When ``pruner.compress()`` called, the model is compressed with masks and after that you can normally fine tune this model and **pruned weights won't be updated** which have been masked.
Therefore ``{ 'sparsity': 0.5, 'op_types': ['Conv2d'] }``\ means that **all layers with the specified op_types will be compressed with the same 0.5 sparsity**. When ``pruner.compress()`` is called, the model is compressed with masks; after that you can fine tune the model as usual, and the **pruned weights (which have been masked) won't be updated**.
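For example, a typical flow after ``pruner.compress()`` is to fine-tune and then export the pruned weights together with the mask file; a minimal sketch, assuming ``train`` and ``test`` are your own training and evaluation routines:
.. code-block:: python

    # fine-tune the masked model as usual; masked weights stay zero
    for epoch in range(10):
        train(model)
    acc = test(model)
    # export the state_dict of the pruned model and the corresponding masks
    pruner.export_model(model_path='pruned_model.pth', mask_path='mask.pth')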
Then, make this automatic
-------------------------
The previous example manually choosed LevelPruner and pruned all layers with the same sparsity, this is obviously sub-optimal because different layers may have different redundancy. Layer sparsity should be carefully tuned to achieve least model performance degradation and this can be done with NNI tuners.
The first thing we need to do is to design a search space, here we use a nested search space which contains choosing pruning algorithm and optimizing layer sparsity.
.. code-block:: json
{
"prune_method": {
"_type": "choice",
"_value": [
{
"_name": "agp",
"conv0_sparsity": {
"_type": "uniform",
"_value": [
0.1,
0.9
]
},
"conv1_sparsity": {
"_type": "uniform",
"_value": [
0.1,
0.9
]
},
},
{
"_name": "level",
"conv0_sparsity": {
"_type": "uniform",
"_value": [
0.1,
0.9
]
},
"conv1_sparsity": {
"_type": "uniform",
"_value": [
0.01,
0.9
]
},
}
]
}
}
Then we need to modify our codes for few lines
The previous example manually chose L2FilterPruner and pruned with a specified sparsity. Different sparsities and different pruners may have different effects on different models. This process can be automated with NNI tuners.
First, modify our code with a few lines
.. code-block:: python
import nni
from nni.algorithms.compression.pytorch.pruning import *
params = nni.get_next_parameter()
conv0_sparsity = params['prune_method']['conv0_sparsity']
conv1_sparsity = params['prune_method']['conv1_sparsity']
# these raw sparsity should be scaled if you need total sparsity constrained
config_list_level = [{ 'sparsity': conv0_sparsity, 'op_name': 'conv0' },
{ 'sparsity': conv1_sparsity, 'op_name': 'conv1' }]
config_list_agp = [{'initial_sparsity': 0, 'final_sparsity': conv0_sparsity,
'start_epoch': 0, 'end_epoch': 3,
'frequency': 1,'op_name': 'conv0' },
{'initial_sparsity': 0, 'final_sparsity': conv1_sparsity,
'start_epoch': 0, 'end_epoch': 3,
'frequency': 1,'op_name': 'conv1' },]
PRUNERS = {'level':LevelPruner(model, config_list_level), 'agp':AGPPruner(model, config_list_agp)}
pruner = PRUNERS(params['prune_method']['_name'])
sparsity = params['sparsity']
pruner_name = params['pruner']
model_name = params['model']
model, pruner = get_model_pruner(model_name, pruner_name, sparsity)
pruner.compress()
... # fine tuning
acc = evaluate(model) # evaluation
train(model) # your code for fine-tuning the model
acc = test(model) # test the fine-tuned model
nni.report_final_result(acc)
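The helper ``get_model_pruner`` is not spelled out above; a possible sketch that maps the tuned parameters to pruner classes, following :githublink:`basic_pruners_torch.py <examples/model_compress/pruning/basic_pruners_torch.py>` (``build_model`` is a placeholder for your own model construction and checkpoint loading):
.. code-block:: python

    from nni.algorithms.compression.pytorch.pruning import (
        SlimPruner, L2FilterPruner, FPGMPruner, ActivationAPoZRankFilterPruner)

    PRUNER_CLS = {'slim': SlimPruner, 'l2filter': L2FilterPruner,
                  'fpgm': FPGMPruner, 'apoz': ActivationAPoZRankFilterPruner}

    def get_model_pruner(model_name, pruner_name, sparsity, optimizer=None):
        model = build_model(model_name)   # placeholder: build and load your pre-trained vgg16/vgg19
        op_types = ['BatchNorm2d'] if pruner_name == 'slim' else ['Conv2d']
        config_list = [{'sparsity': sparsity, 'op_types': op_types}]
        pruner = PRUNER_CLS[pruner_name](model, config_list, optimizer)
        return model, pruner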
Last, define our task and automatically tuning pruning methods with layers sparsity
Then, define a ``config`` file in YAML to automatically tune the model, pruning algorithm and sparsity.
.. code-block:: yaml
authorName: default
experimentName: Auto_Compression
trialConcurrency: 2
maxExecDuration: 100h
maxTrialNum: 500
#choice: local, remote, pai
trainingServicePlatform: local
#choice: true, false
useAnnotation: False
searchSpacePath: search_space.json
searchSpace:
sparsity:
_type: choice
_value: [0.25, 0.5, 0.75]
pruner:
_type: choice
_value: ['slim', 'l2filter', 'fpgm', 'apoz']
model:
_type: choice
_value: ['vgg16', 'vgg19']
trainingService:
platform: local
trialCodeDirectory: .
trialCommand: python3 basic_pruners_torch.py --nni
trialConcurrency: 1
trialGpuNumber: 0
tuner:
#choice: TPE, Random, Anneal...
builtinTunerName: TPE
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
trial:
command: bash run_prune.sh
codeDir: .
gpuNum: 1
name: grid
The full example can be found :githublink:`here <examples/model_compress/pruning/config.yml>`
Finally, start the search via
.. code-block:: bash
nnictl create -c config.yml
Supported Pruning Algorithms on NNI
===================================
We provide several pruning algorithms that support fine-grained weight pruning and structural filter pruning. **Fine-grained Pruning** generally results in unstructured models, which need specialized haredware or software to speed up the sparse network.** Filter Pruning** achieves acceleratation by removing the entire filter. We also provide an algorithm to control the** pruning schedule**.
We provide several pruning algorithms that support fine-grained weight pruning and structural filter pruning. **Fine-grained Pruning** generally results in unstructured models, which need specialized hardware or software to speed up the sparse network. **Filter Pruning** achieves acceleration by removing entire filters. Some pruning algorithms use a one-shot method that prunes weights at once based on an importance metric. Others control the **pruning schedule** and prune weights during optimization, including some automatic pruning algorithms.
**Fine-grained Pruning**
* `Level Pruner <#level-pruner>`__
**Filter Pruning**
* `Slim Pruner <#slim-pruner>`__
* `FPGM Pruner <#fpgm-pruner>`__
* `L1Filter Pruner <#l1filter-pruner>`__
......@@ -21,7 +20,6 @@ We provide several pruning algorithms that support fine-grained weight pruning a
**Pruning Schedule**
* `AGP Pruner <#agp-pruner>`__
* `NetAdapt Pruner <#netadapt-pruner>`__
* `SimulatedAnnealing Pruner <#simulatedannealing-pruner>`__
......@@ -31,7 +29,6 @@ We provide several pruning algorithms that support fine-grained weight pruning a
**Others**
* `ADMM Pruner <#admm-pruner>`__
* `Lottery Ticket Hypothesis <#lottery-ticket-hypothesis>`__
......@@ -45,15 +42,6 @@ We first sort the weights in the specified layer by their absolute values. And t
Usage
^^^^^
Tensorflow code
.. code-block:: python
from nni.algorithms.compression.tensorflow.pruning import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
pruner = LevelPruner(model, config_list)
pruner.compress()
PyTorch code
.. code-block:: python
......@@ -70,26 +58,14 @@ User configuration for Level Pruner
.. autoclass:: nni.algorithms.compression.pytorch.pruning.LevelPruner
Tensorflow
""""""""""
**TensorFlow**
.. autoclass:: nni.algorithms.compression.tensorflow.pruning.LevelPruner
Slim Pruner
-----------
This is an one-shot pruner, In `'Learning Efficient Convolutional Networks through Network Slimming' <https://arxiv.org/pdf/1708.06519.pdf>`__\ , authors Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan and Changshui Zhang.
.. image:: ../../img/slim_pruner.png
:target: ../../img/slim_pruner.png
:alt:
..
Slim Pruner **prunes channels in the convolution layers by masking corresponding scaling factors in the later BN layers**\ , L1 regularization on the scaling factors should be applied in batch normalization (BN) layers while training, scaling factors of BN layers are** globally ranked** while pruning, so the sparse model can be automatically found given sparsity.
This is a one-shot pruner, which adds sparsity regularization on the scaling factors of batch normalization (BN) layers during training to identify unimportant channels. The channels with small scaling factor values will be pruned. For more details, please refer to `'Learning Efficient Convolutional Networks through Network Slimming' <https://arxiv.org/pdf/1708.06519.pdf>`__\.
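The sparsity regularization can be added as an L1 subgradient step on the BN scaling factors after the backward pass, as done by the ``updateBN`` helper in :githublink:`basic_pruners_torch.py <examples/model_compress/pruning/basic_pruners_torch.py>`; a minimal sketch:
.. code-block:: python

    import torch
    import torch.nn as nn

    def update_bn(model, scale=1e-4):
        # L1 sparsity regularization on BN scaling factors, applied after loss.backward()
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.weight.grad.data.add_(scale * torch.sign(m.weight.data))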
Usage
^^^^^
......@@ -124,36 +100,29 @@ We implemented one of the experiments in `Learning Efficient Convolutional Netwo
- Parameters
- Pruned
* - VGGNet
- 6.34/6.40
- 6.34/6.69
- 20.04M
-
* - Pruned-VGGNet
- 6.20/6.26
- 6.20/6.34
- 2.03M
- 88.5%
The experiments code can be found at :githublink:`examples/model_compress/pruning/reproduced/slim_torch_cifar10.py <examples/model_compress/pruning/reproduced/slim_torch_cifar10.py>`
----
FPGM Pruner
-----------
This is an one-shot pruner, FPGM Pruner is an implementation of paper `Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration <https://arxiv.org/pdf/1811.00250.pdf>`__
FPGMPruner prune filters with the smallest geometric median.
The experiment code can be found at :githublink:`examples/model_compress/pruning/basic_pruners_torch.py <examples/model_compress/pruning/basic_pruners_torch.py>`
.. code-block:: bash
.. image:: ../../img/fpgm_fig1.png
:target: ../../img/fpgm_fig1.png
:alt:
python basic_pruners_torch.py --pruner slim --model vgg19 --sparsity 0.7 --speed-up
..
----
Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance.
FPGM Pruner
-----------
This is a one-shot pruner that prunes the filters with the smallest geometric median. FPGM chooses the filters with the most replaceable contribution.
For more details, please refer to `Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration <https://arxiv.org/pdf/1811.00250.pdf>`__.
We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please reference `dependency-aware <./DependencyAware.rst>`__ for more details.
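A sketch of enabling the dependency-aware mode, following the pattern in :githublink:`basic_pruners_torch.py <examples/model_compress/pruning/basic_pruners_torch.py>` (the model and the dummy input shape here are illustrative):
.. code-block:: python

    import torch
    from torchvision.models import vgg16
    from nni.algorithms.compression.pytorch.pruning import FPGMPruner

    model = vgg16()
    config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
    # the dependency-aware mode needs a dummy input to trace channel dependencies
    dummy_input = torch.randn([1, 3, 224, 224])
    pruner = FPGMPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
    pruner.compress()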
......@@ -182,21 +151,11 @@ User configuration for FPGM Pruner
L1Filter Pruner
---------------
This is an one-shot pruner, In `PRUNING FILTERS FOR EFFICIENT CONVNETS <https://arxiv.org/abs/1608.08710>`__\ , authors Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf.
.. image:: ../../img/l1filter_pruner.png
:target: ../../img/l1filter_pruner.png
:alt:
This is a one-shot pruner that prunes filters in the **convolution layers**.
..
L1Filter Pruner prunes filters in the **convolution layers**
The procedure of pruning :math:`m` filters from the :math:`i`-th convolutional layer is as follows:
#. For each filter :math:`F_{i,j}`, calculate the sum of its absolute kernel weights :math:`s_j=\sum_{l=1}^{n_i}\sum|K_l|`.
#. Sort the filters by :math:`s_j`.
......@@ -207,6 +166,9 @@ This is an one-shot pruner, In `PRUNING FILTERS FOR EFFICIENT CONVNETS <https://
#. A new kernel matrix is created for both the :math:`i`-th and :math:`i+1`-th layers, and the remaining kernel
weights are copied to the new model.
For more details, please refer to `PRUNING FILTERS FOR EFFICIENT CONVNETS <https://arxiv.org/abs/1608.08710>`__\.
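As a worked illustration of steps 1-2 on a single (hypothetical) convolution layer:
.. code-block:: python

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
    # step 1: sum of absolute kernel weights per output filter, s_j
    s = conv.weight.detach().abs().sum(dim=(1, 2, 3))   # shape: [128]
    # step 2: the filters with the smallest s_j are the first candidates for pruning
    prune_order = torch.argsort(s)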
In addition, we also provide a dependency-aware mode for the L1FilterPruner. For more details about the dependency-aware mode, please reference `dependency-aware mode <./DependencyAware.rst>`__.
......@@ -252,7 +214,11 @@ We implemented one of the experiments in `PRUNING FILTERS FOR EFFICIENT CONVNETS
- 64.0%
The experiments code can be found at :githublink:`examples/model_compress/pruning/reproduced/L1_torch_cifar10.py <examples/model_compress/pruning/reproduced/L1_torch_cifar10.py>`
The experiment code can be found at :githublink:`examples/model_compress/pruning/basic_pruners_torch.py <examples/model_compress/pruning/basic_pruners_torch.py>`
.. code-block:: bash
python basic_pruners_torch.py --pruner l1filter --model vgg16 --speed-up
----
......@@ -291,10 +257,7 @@ ActivationAPoZRankFilter Pruner is a pruner which prunes the filters with the sm
The APoZ is defined as:
.. image:: ../../img/apoz.png
:target: ../../img/apoz.png
:alt:
:math:`APoZ_{c}^{(i)} = APoZ\left(O_{c}^{(i)}\right)=\frac{\sum_{k}^{N} \sum_{j}^{M} f\left(O_{c, j}^{(i)}(k)=0\right)}{N \times M}`
We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please reference `dependency-aware <./DependencyAware.rst>`__ for more details.
......@@ -316,7 +279,7 @@ PyTorch code
Note: ActivationAPoZRankFilterPruner is used to prune convolutional layers within deep neural networks; therefore, the ``op_types`` field supports only convolutional layers.
You can view :githublink:`example <examples/model_compress/pruning/model_prune_torch.py>` for more information.
You can view :githublink:`example <examples/model_compress/pruning/basic_pruners_torch.py>` for more information.
User configuration for ActivationAPoZRankFilter Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -351,7 +314,7 @@ PyTorch code
Note: ActivationMeanRankFilterPruner is used to prune convolutional layers within deep neural networks; therefore, the ``op_types`` field supports only convolutional layers.
You can view :githublink:`example <examples/model_compress/pruning/model_prune_torch.py>` for more information.
You can view :githublink:`example <examples/model_compress/pruning/basic_pruners_torch.py>` for more information.
User configuration for ActivationMeanRankFilterPruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -369,13 +332,7 @@ TaylorFOWeightFilter Pruner is a pruner which prunes convolutional layers based
..
.. image:: ../../img/importance_estimation_sum.png
:target: ../../img/importance_estimation_sum.png
:alt:
:math:`\widehat{\mathcal{I}}_{\mathcal{S}}^{(1)}(\mathbf{W}) \triangleq \sum_{s \in \mathcal{S}} \mathcal{I}_{s}^{(1)}(\mathbf{W})=\sum_{s \in \mathcal{S}}\left(g_{s} w_{s}\right)^{2}`
We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please reference `dependency-aware <./DependencyAware.rst>`__ for more details.
......@@ -407,24 +364,17 @@ User configuration for TaylorFOWeightFilter Pruner
AGP Pruner
----------
This is an iterative pruner, In `To prune, or not to prune: exploring the efficacy of pruning for model compression <https://arxiv.org/abs/1710.01878>`__\ , authors Michael Zhu and Suyog Gupta provide an algorithm to prune the weight gradually.
..
We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value si (usually 0) to a final sparsity value sf over a span of n pruning steps, starting at training step t0 and with pruning frequency ∆t:
.. image:: ../../img/agp_pruner.png
:target: ../../img/agp_pruner.png
:alt:
This is an iterative pruner, in which the sparsity is increased from an initial value :math:`s_i` (usually 0) to a final value :math:`s_f` over a span of :math:`n` pruning steps, starting at training step :math:`t_{0}` and with pruning frequency :math:`\Delta t`:
:math:`s_{t}=s_{f}+\left(s_{i}-s_{f}\right)\left(1-\frac{t-t_{0}}{n \Delta t}\right)^{3} \text { for } t \in\left\{t_{0}, t_{0}+\Delta t, \ldots, t_{0} + n \Delta t\right\}`
The binary weight masks are updated every :math:`\Delta t` steps as the network is trained, gradually increasing the sparsity of the network while allowing the training steps to recover from any pruning-induced loss in accuracy. In the authors' experience, varying the pruning frequency :math:`\Delta t` between 100 and 1000 training steps had a negligible impact on the final model quality. Once the model achieves the target sparsity :math:`s_f`, the weight masks are no longer updated. The intuition behind this sparsity function is to prune the network rapidly in the initial phase, when redundant connections are abundant, and to gradually reduce the number of weights pruned as fewer weights remain in the network.
For more details, please refer to `To prune, or not to prune: exploring the efficacy of pruning for model compression <https://arxiv.org/abs/1710.01878>`__\.
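The schedule above is easy to compute directly; a small sketch (the function name is illustrative):
.. code-block:: python

    def agp_target_sparsity(t, s_i, s_f, t_0, n, delta_t):
        """Target sparsity at training step t under the cubic AGP schedule."""
        assert t_0 <= t <= t_0 + n * delta_t
        return s_f + (s_i - s_f) * (1 - (t - t_0) / (n * delta_t)) ** 3

    # e.g. ramping from 0% to 80% sparsity over 10 pruning steps
    print(agp_target_sparsity(t=5, s_i=0.0, s_f=0.8, t_0=0, n=10, delta_t=1))  # 0.7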
Usage
^^^^^
You can prune all weight from 0% to 80% sparsity in 10 epoch with the code below.
You can prune all weights from 0% to 80% sparsity in 10 epochs with the code below.
PyTorch code
......@@ -471,7 +421,6 @@ PyTorch code
pruner.update_epoch(epoch)
You can view :githublink:`example <examples/model_compress/pruning/model_prune_torch.py>` for more information.
User configuration for AGP Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -491,11 +440,6 @@ Given the overall sparsity, NetAdapt will automatically generate the sparsities
For more details, please refer to `NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications <https://arxiv.org/abs/1804.03230>`__.
.. image:: ../../img/algo_NetAdapt.png
:target: ../../img/algo_NetAdapt.png
:alt:
Usage
^^^^^
......@@ -610,11 +554,6 @@ This learning-based compression policy outperforms conventional rule-based compr
better preserving the accuracy and freeing human labor.
.. image:: ../../img/amc_pruner.jpg
:target: ../../img/amc_pruner.jpg
:alt:
For more details, please refer to `AMC: AutoML for Model Compression and Acceleration on Mobile Devices <https://arxiv.org/pdf/1802.03494.pdf>`__.
Usage
......@@ -742,7 +681,6 @@ PyTorch code
The above configuration means that there are 5 iterations of pruning. As the 5 pruning iterations are executed in the same run, LotteryTicketPruner needs ``model`` and ``optimizer`` (\ **note that ``lr_scheduler`` should be added too if used**\ ) to reset their states every time a new prune iteration starts. Please use ``get_prune_iterations`` to get the pruning iterations, and invoke ``prune_iteration_start`` at the beginning of each iteration. ``epoch_num`` should be large enough for model convergence, because the hypothesis is that the performance (accuracy) obtained in later rounds with high sparsity can be comparable with that obtained in the first round.
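A sketch of this iteration loop, following the MNIST example referenced below (``model``, ``config_list``, ``optimizer``, ``epoch_num`` and the ``train``/``test`` functions come from your own setup):
.. code-block:: python

    from nni.algorithms.compression.pytorch.pruning import LotteryTicketPruner

    pruner = LotteryTicketPruner(model, config_list, optimizer)
    pruner.compress()
    for i in pruner.get_prune_iterations():
        pruner.prune_iteration_start()
        for epoch in range(epoch_num):
            train(model)          # weights are masked/reset by the pruner at each iteration
        acc = test(model)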
*Tensorflow version will be supported later.*
User configuration for LotteryTicket Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -754,7 +692,7 @@ User configuration for LotteryTicket Pruner
Reproduced Experiment
^^^^^^^^^^^^^^^^^^^^^
We try to reproduce the experiment result of the fully connected network on MNIST using the same configuration as in the paper. The code can be referred :githublink:`here <examples/model_compress/pruning/reproduced/lottery_torch_mnist_fc.py>`. In this experiment, we prune 10 times, for each pruning we train the pruned model for 50 epochs.
We try to reproduce the experiment result of the fully connected network on MNIST using the same configuration as in the paper. The code can be found :githublink:`here <examples/model_compress/pruning/lottery_torch_mnist_fc.py>`. In this experiment, we prune 10 times; for each pruning, we train the pruned model for 50 epochs.
.. image:: ../../img/lottery_ticket_mnist_fc.png
......
......@@ -45,7 +45,7 @@ After training, you get accuracy of the pruned model. You can export model weigh
pruner.export_model(model_path='pruned_vgg19_cifar10.pth', mask_path='mask_vgg19_cifar10.pth')
The complete code of model compression examples can be found :githublink:`here <examples/model_compress/pruning/model_prune_torch.py>`.
Please refer to the :githublink:`mnist example <examples/model_compress/pruning/naive_prune_torch.py>` for a quick start.
Speed up the model
^^^^^^^^^^^^^^^^^^
......@@ -73,15 +73,6 @@ PyTorch code
pruner = LevelPruner(model, config_list)
pruner.compress()
Tensorflow code
.. code-block:: python
from nni.algorithms.compression.tensorflow.pruning import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
pruner = LevelPruner(tf.get_default_graph(), config_list)
pruner.compress()
You can use other compression algorithms in the ``nni.compression`` package. The algorithms are implemented in both PyTorch and TensorFlow (partial support on TensorFlow), under ``nni.compression.pytorch`` and ``nni.compression.tensorflow`` respectively. You can refer to `Pruner <./Pruner.rst>`__ and `Quantizer <./Quantizer.rst>`__ for detailed descriptions of the supported algorithms. If you want to use knowledge distillation, you can refer to `KDExample <../TrialExample/KDExample.rst>`__
A compression algorithm is first instantiated with a ``config_list`` passed in. The specification of this ``config_list`` will be described later.
......
......@@ -4,14 +4,13 @@ Knowledge Distillation on NNI
KnowledgeDistill
----------------
Knowledge distillation support, in `Distilling the Knowledge in a Neural Network <https://arxiv.org/abs/1503.02531>`__\ , the compressed model is trained to mimic a pre-trained, larger model. This training setting is also referred to as "teacher-student", where the large model is the teacher and the small model is the student.
Knowledge Distillation (KD) is proposed in `Distilling the Knowledge in a Neural Network <https://arxiv.org/abs/1503.02531>`__\ , where the compressed model is trained to mimic a pre-trained, larger model. This training setting is also referred to as "teacher-student", where the large model is the teacher and the small model is the student. KD is often used to fine-tune the pruned model.
.. image:: ../../img/distill.png
:target: ../../img/distill.png
:alt:
Usage
^^^^^
......@@ -19,24 +18,29 @@ PyTorch code
.. code-block:: python
from knowledge_distill.knowledge_distill import KnowledgeDistill
kd = KnowledgeDistill(kd_teacher_model, kd_T=5)
alpha = 1
beta = 0.8
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.cross_entropy(output, target)
# you only to add the following line to fine-tune with knowledge distillation
loss = alpha * loss + beta * kd.loss(data=data, student_out=output)
y_s = model_s(data)
y_t = model_t(data)
loss_cri = F.cross_entropy(y_s, target)
# kd loss (kd_T is the distillation temperature, e.g. 4)
p_s = F.log_softmax(y_s/kd_T, dim=1)
p_t = F.softmax(y_t/kd_T, dim=1)
loss_kd = F.kl_div(p_s, p_t, size_average=False) * (kd_T**2) / y_s.shape[0]
# total loss
loss = loss_cri + loss_kd
loss.backward()
User configuration for KnowledgeDistill
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The complete code for fine-tuning the pruned model can be found :githublink:`here <examples/model_compress/pruning/finetune_kd_torch.py>`
.. code-block:: bash
python finetune_kd_torch.py --model [model name] --teacher-model-dir [pretrained checkpoint path] --student-model-dir [pruned checkpoint path] --mask-path [mask file path]
Note: to fine-tune a pruned model, run :githublink:`basic_pruners_torch.py <examples/model_compress/pruning/basic_pruners_torch.py>` first to get the mask file, then pass the mask path as an argument to this script.
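For example, pruning VGG16 with L1FilterPruner and then fine-tuning it with KD could look like the following (the paths assume the default ``./experiment_data`` output directory used by the example scripts):
.. code-block:: bash

    # step 1: prune and export the mask (saved under ./experiment_data by default)
    python basic_pruners_torch.py --pruner l1filter --model vgg16
    # step 2: fine-tune the pruned model with knowledge distillation
    python finetune_kd_torch.py --model vgg16 \
        --teacher-model-dir ./experiment_data/pretrain_cifar10_vgg16.pth \
        --student-model-dir ./experiment_data/pruned_vgg16_cifar10_l1filter.pth \
        --mask-path ./experiment_data/mask_vgg16_cifar10_l1filter.pth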
* **kd_teacher_model:** The pre-trained teacher model
* **kd_T:** Temperature for smoothing teacher model's output
The complete code can be found `here <https://github.com/microsoft/nni/tree/v1.3/examples/model_compress/knowledge_distill/>`__
# AMCPruner Example
This example shows how to use AMCPruner.
## Step 1: Train the model
Run the following command to train a mobilenetv2 model:
```bash
python3 amc_train.py --model_type mobilenetv2 --n_epoch 50
```
After training, the checkpoint file is saved at:
```
logs/mobilenetv2_cifar10_train-run1/ckpt.best.pth
```
## Prune with AMCPruner
Run the following command to prune the model:
```bash
python3 amc_search.py --model_type mobilenetv2 --ckpt logs/mobilenetv2_cifar10_train-run1/ckpt.best.pth
```
When pruning finishes, the pruned model and the mask file are saved at:
```
logs/mobilenetv2_cifar10_r0.5_search-run2
```
## Fine-tune the pruned model
Run `amc_train.py` again with the `--ckpt` and `--mask` arguments to speed up and fine-tune the pruned model:
```bash
python3 amc_train.py --model_type mobilenetv2 --ckpt logs/mobilenetv2_cifar10_r0.5_search-run2/best_model.pth --mask logs/mobilenetv2_cifar10_r0.5_search-run2/best_mask.pth --n_epoch 100
```
# Run model compression examples
You can run these examples easily. Take PyTorch pruning for example:
```bash
python model_prune_torch.py
```
This example uses AGP Pruner. Initializing a pruner needs a user-provided configuration, which can be given in two ways:
- By reading `configure_example.yaml`, which keeps your code clean when the configuration is complicated
- By configuring directly in your code
In our example, we simply configure model compression in the code like this
```python
config_list = [{
'initial_sparsity': 0,
'final_sparsity': 0.8,
'start_epoch': 0,
'end_epoch': 10,
'frequency': 1,
'op_types': ['default']
}]
pruner = AGPPruner(config_list)
```
When `pruner(model)` is called, your model is injected with masks as embedded operations. For example, for a layer that takes a weight as input, we insert an operation between the weight and the layer; this operation takes the weight as input and outputs a new weight with the mask applied. Thus, the masks are applied whenever the computation goes through these operations. You can fine-tune your model **without** any modifications.
```python
for epoch in range(10):
# update_epoch is for pruner to be aware of epochs, so that it could adjust masks during training.
pruner.update_epoch(epoch)
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
```
When fine-tuning is finished, the pruned weights are all masked and you can get the masks like this
```
masks = pruner.mask_list
layer_name = xxx
mask = masks[layer_name]
```
# Run model compression examples
Take PyTorch pruning for example:
```bash
python main_torch_pruner.py
```
This example uses AGP Pruner. Initializing a pruner requires a configuration, which can be provided in either of two ways.
- By reading `configure_example.yaml`, which keeps your code clean when the configuration is complicated
- By configuring directly in your code
In this example, model compression is configured in the code:
```python
config_list = [{
'initial_sparsity': 0,
'final_sparsity': 0.8,
'start_epoch': 0,
'end_epoch': 10,
'frequency': 1,
'op_types': ['default']
}]
pruner = AGPPruner(config_list)
```
When `pruner(model)` is called, mask operations are embedded into the model. For example, for a layer that takes a weight as input, an operation is inserted between the weight and the layer; this operation takes the weight as input and outputs the weight with the mask applied. Thus, whenever the computation goes through these operations, the masks are applied. You can also fine-tune the model **without** any modifications.
```python
for epoch in range(10):
# update_epoch makes the pruner aware of the epoch number, so that it can adjust masks during training.
pruner.update_epoch(epoch)
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
```
After fine-tuning, the pruned weights can be obtained with the following code:
```
masks = pruner.mask_list
layer_name = xxx
mask = masks[layer_name]
```
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
Examples for automatic pruners
Example for supported automatic pruning algorithms.
In this example, we present the usage of automatic pruners (NetAdapt, AutoCompressPruner). L1, L2, FPGM pruners are also executed for comparison purposes.
'''
import argparse
......@@ -62,30 +64,6 @@ def get_data(dataset, data_dir, batch_size, test_batch_size):
])),
batch_size=batch_size, shuffle=False, **kwargs)
criterion = torch.nn.CrossEntropyLoss()
elif dataset == 'imagenet':
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
train_loader = torch.utils.data.DataLoader(
datasets.ImageFolder(os.path.join(data_dir, 'train'),
transform=transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
normalize,
])),
batch_size=batch_size, shuffle=True, **kwargs)
val_loader = torch.utils.data.DataLoader(
datasets.ImageFolder(os.path.join(data_dir, 'val'),
transform=transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
normalize,
])),
batch_size=test_batch_size, shuffle=True, **kwargs)
criterion = torch.nn.CrossEntropyLoss()
return train_loader, val_loader, criterion
......@@ -248,7 +226,6 @@ def main(args):
'op_types': op_types
}]
dummy_input = get_dummy_input(args, device)
if args.pruner == 'L1FilterPruner':
pruner = L1FilterPruner(model, config_list)
elif args.pruner == 'L2FilterPruner':
......
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported basic pruning algorithms.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speed up is required.
You can also try auto_pruners_torch.py to see the usage of some automatic pruning algorithms.
'''
import logging
import argparse
import os
import time
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR, MultiStepLR
from torchvision import datasets, transforms
from models.mnist.lenet import LeNet
from models.cifar10.vgg import VGG
from nni.compression.pytorch.utils.counter import count_flops_params
import nni
from nni.compression.pytorch import apply_compression_results, ModelSpeedup
from nni.algorithms.compression.pytorch.pruning import (
LevelPruner,
SlimPruner,
FPGMPruner,
L1FilterPruner,
L2FilterPruner,
AGPPruner,
ActivationAPoZRankFilterPruner
)
_logger = logging.getLogger('mnist_example')
_logger.setLevel(logging.INFO)
str2pruner = {
'level': LevelPruner,
'l1filter': L1FilterPruner,
'l2filter': L2FilterPruner,
'slim': SlimPruner,
'agp': AGPPruner,
'fpgm': FPGMPruner,
'apoz': ActivationAPoZRankFilterPruner
}
def get_dummy_input(args, device):
if args.dataset == 'mnist':
dummy_input = torch.randn([args.test_batch_size, 1, 28, 28]).to(device)
elif args.dataset in ['cifar10', 'imagenet']:
dummy_input = torch.randn([args.test_batch_size, 3, 32, 32]).to(device)
return dummy_input
def get_pruner(model, pruner_name, device, optimizer=None, dependency_aware=False):
pruner_cls = str2pruner[pruner_name]
if pruner_name == 'level':
config_list = [{
'sparsity': args.sparsity,
'op_types': ['default']
}]
elif pruner_name == 'l1filter':
# Reproduced result in paper 'PRUNING FILTERS FOR EFFICIENT CONVNETS',
# Conv_1, Conv_8, Conv_9, Conv_10, Conv_11, Conv_12 are pruned with 50% sparsity, as 'VGG-16-pruned-A'
config_list = [{
'sparsity': args.sparsity,
'op_types': ['Conv2d'],
'op_names': ['feature.0', 'feature.24', 'feature.27', 'feature.30', 'feature.34', 'feature.37']
}]
elif pruner_name == 'slim':
config_list = [{
'sparsity': args.sparsity,
'op_types': ['BatchNorm2d'],
}]
else:
config_list = [{
'sparsity': args.sparsity,
'op_types': ['Conv2d']
}]
kw_args = {}
if dependency_aware:
dummy_input = get_dummy_input(args, device)
print('Enable the dependency_aware mode')
# note that, not all pruners support the dependency_aware mode
kw_args['dependency_aware'] = True
kw_args['dummy_input'] = dummy_input
pruner = pruner_cls(model, config_list, optimizer, **kw_args)
return pruner
def get_data(dataset, data_dir, batch_size, test_batch_size):
kwargs = {'num_workers': 1, 'pin_memory': True} if torch.cuda.is_available() else {
}
if dataset == 'mnist':
train_loader = torch.utils.data.DataLoader(
datasets.MNIST(data_dir, train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST(data_dir, train=False,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=test_batch_size, shuffle=True, **kwargs)
criterion = torch.nn.NLLLoss()
elif dataset == 'cifar10':
normalize = transforms.Normalize(
(0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10(data_dir, train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10(data_dir, train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=batch_size, shuffle=False, **kwargs)
criterion = torch.nn.CrossEntropyLoss()
return train_loader, test_loader, criterion
def get_model_optimizer_scheduler(args, device, train_loader, test_loader, criterion):
if args.model == 'lenet':
model = LeNet().to(device)
if args.pretrained_model_dir is None:
optimizer = torch.optim.Adadelta(model.parameters(), lr=1)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
elif args.model == 'vgg16':
model = VGG(depth=16).to(device)
if args.pretrained_model_dir is None:
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(
optimizer, milestones=[int(args.pretrain_epochs*0.5), int(args.pretrain_epochs*0.75)], gamma=0.1)
elif args.model == 'vgg19':
model = VGG(depth=19).to(device)
if args.pretrained_model_dir is None:
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(
optimizer, milestones=[int(args.pretrain_epochs*0.5), int(args.pretrain_epochs*0.75)], gamma=0.1)
else:
raise ValueError("model not recognized")
if args.pretrained_model_dir is None:
print('start pre-training...')
best_acc = 0
for epoch in range(args.pretrain_epochs):
train(args, model, device, train_loader, criterion, optimizer, epoch, sparse_bn=True if args.pruner == 'slim' else False)
scheduler.step()
acc = test(args, model, device, criterion, test_loader)
if acc > best_acc:
best_acc = acc
state_dict = model.state_dict()
model.load_state_dict(state_dict)
acc = best_acc
torch.save(state_dict, os.path.join(args.experiment_data_dir, f'pretrain_{args.dataset}_{args.model}.pth'))
print('Trained model saved to %s' % args.experiment_data_dir)
else:
model.load_state_dict(torch.load(args.pretrained_model_dir))
best_acc = test(args, model, device, criterion, test_loader)
# setup a new optimizer for fine-tuning
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(
optimizer, milestones=[int(args.pretrain_epochs*0.5), int(args.pretrain_epochs*0.75)], gamma=0.1)
print('Pretrained model acc:', best_acc)
return model, optimizer, scheduler
def updateBN(model):
for m in model.modules():
if isinstance(m, nn.BatchNorm2d):
m.weight.grad.data.add_(0.0001 * torch.sign(m.weight.data))
def train(args, model, device, train_loader, criterion, optimizer, epoch, sparse_bn=False):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
if sparse_bn:
# L1 regularization on BN layer
updateBN(model)
optimizer.step()
if batch_idx % args.log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
if args.dry_run:
break
def test(args, model, device, criterion, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += criterion(output, target).item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
acc = 100 * correct / len(test_loader.dataset)
print('Test Loss: {} Accuracy: {}%\n'.format(
test_loss, acc))
return acc
def main(args):
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
os.makedirs(args.experiment_data_dir, exist_ok=True)
# prepare model and data
train_loader, test_loader, criterion = get_data(args.dataset, args.data_dir, args.batch_size, args.test_batch_size)
model, optimizer, scheduler = get_model_optimizer_scheduler(args, device, train_loader, test_loader, criterion)
dummy_input = get_dummy_input(args, device)
flops, params, results = count_flops_params(model, dummy_input)
print(f"FLOPs: {flops}, params: {params}")
print('start pruning...')
model_path = os.path.join(args.experiment_data_dir, 'pruned_{}_{}_{}.pth'.format(
args.model, args.dataset, args.pruner))
mask_path = os.path.join(args.experiment_data_dir, 'mask_{}_{}_{}.pth'.format(
args.model, args.dataset, args.pruner))
pruner = get_pruner(model, args.pruner, device, optimizer, args.dependency_aware)
model = pruner.compress()
if args.multi_gpu and torch.cuda.device_count() > 1:
model = nn.DataParallel(model)
if args.test_only:
test(args, model, device, criterion, test_loader)
best_top1 = 0
for epoch in range(args.fine_tune_epochs):
pruner.update_epoch(epoch)
print('# Epoch {} #'.format(epoch))
train(args, model, device, train_loader, criterion, optimizer, epoch)
scheduler.step()
top1 = test(args, model, device, criterion, test_loader)
if top1 > best_top1:
best_top1 = top1
# Export the best model, 'model_path' stores state_dict of the pruned model,
# mask_path stores mask_dict of the pruned model
pruner.export_model(model_path=model_path, mask_path=mask_path)
if args.nni:
nni.report_final_result(best_top1)
if args.speed_up:
# reload the best checkpoint for speed-up
args.pretrained_model_dir = model_path
model, _, _ = get_model_optimizer_scheduler(args, device, train_loader, test_loader, criterion)
model.eval()
apply_compression_results(model, mask_path, device)
# test model speed
start = time.time()
for _ in range(32):
use_mask_out = model(dummy_input)
print('elapsed time when use mask: ', time.time() - start)
m_speedup = ModelSpeedup(model, dummy_input, mask_path, device)
m_speedup.speedup_model()
flops, params, results = count_flops_params(model, dummy_input)
print(f"FLOPs: {flops}, params: {params}")
start = time.time()
for _ in range(32):
use_speedup_out = model(dummy_input)
print('elapsed time when use speedup: ', time.time() - start)
top1 = test(args, model, device, criterion, test_loader)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
# dataset and model
parser.add_argument('--dataset', type=str, default='cifar10',
help='dataset to use, mnist, cifar10 or imagenet')
parser.add_argument('--data-dir', type=str, default='./data/',
help='dataset directory')
parser.add_argument('--model', type=str, default='vgg16',
choices=['LeNet', 'vgg16' ,'vgg19', 'resnet18'],
help='model to use')
parser.add_argument('--pretrained-model-dir', type=str, default=None,
help='path to pretrained model')
parser.add_argument('--pretrain-epochs', type=int, default=160,
help='number of epochs to pretrain the model')
parser.add_argument('--batch-size', type=int, default=128,
help='input batch size for training')
parser.add_argument('--test-batch-size', type=int, default=200,
help='input batch size for testing')
parser.add_argument('--experiment-data-dir', type=str, default='./experiment_data',
help='For saving output checkpoints')
parser.add_argument('--log-interval', type=int, default=100, metavar='N',
help='how many batches to wait before logging training status')
parser.add_argument('--dry-run', action='store_true', default=False,
help='quickly check a single pass')
parser.add_argument('--multi-gpu', action='store_true', default=False,
help='run on multiple GPUs')
parser.add_argument('--test-only', action='store_true', default=False,
help='run test only')
# pruner
parser.add_argument('--sparsity', type=float, default=0.5,
help='target overall sparsity')
parser.add_argument('--dependency-aware', action='store_true', default=False,
help='toggle dependency aware mode')
parser.add_argument('--pruner', type=str, default='l1filter',
choices=['level', 'l1filter', 'l2filter', 'slim', 'agp',
'fpgm', 'apoz'],
help='pruner to use')
# fine-tuning
parser.add_argument('--fine-tune-epochs', type=int, default=160,
help='epochs to fine tune')
# speed-up
parser.add_argument('--speed-up', action='store_true', default=False,
help='whether to speed-up the pruned model')
parser.add_argument('--nni', action='store_true', default=False,
help="whether to tune the pruners using NNi tuners")
args = parser.parse_args()
if args.nni:
params = nni.get_next_parameter()
print(params)
args.sparsity = params['sparsity']
args.pruner = params['pruner']
args.model = params['model']
main(args)
searchSpace:
sparsity:
_type: choice
_value: [0.25, 0.5, 0.75]
pruner:
_type: choice
_value: ['slim', 'l2filter', 'fpgm', 'apoz']
model:
_type: choice
_value: ['vgg16', 'vgg19']
trainingService:
platform: local
trialCodeDirectory: .
trialCommand: python3 basic_pruners_torch.py --nni
trialConcurrency: 1
trialGpuNumber: 0
tuner:
name: grid
AGPruner:
config:
-
start_epoch: 0
end_epoch: 10
frequency: 1
initial_sparsity: 0.05
final_sparsity: 0.8
op_types: ['default']
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for fine-tuning the pruned model with KD.
Run basic_pruners_torch.py first to get the masks of the pruned model. Then pass the mask as an argument for model speed-up. The compressed model is then used for fine-tuning.
'''
import argparse
import os
import time
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR, MultiStepLR
from torchvision import datasets, transforms
from copy import deepcopy
from models.mnist.lenet import LeNet
from models.cifar10.vgg import VGG
from basic_pruners_torch import get_data
import nni
from nni.compression.pytorch import ModelSpeedup, get_dummy_input
class DistillKL(nn.Module):
"""Distilling the Knowledge in a Neural Network"""
def __init__(self, T):
super(DistillKL, self).__init__()
self.T = T
def forward(self, y_s, y_t):
p_s = F.log_softmax(y_s/self.T, dim=1)
p_t = F.softmax(y_t/self.T, dim=1)
loss = F.kl_div(p_s, p_t, size_average=False) * (self.T**2) / y_s.shape[0]
return loss
def get_model_optimizer_scheduler(args, device, test_loader, criterion):
if args.model == 'LeNet':
model = LeNet().to(device)
elif args.model == 'vgg16':
model = VGG(depth=16).to(device)
elif args.model == 'vgg19':
model = VGG(depth=19).to(device)
else:
raise ValueError("model not recognized")
# In this example, we set the architecture of teacher and student to be the same. It is feasible to set a different teacher architecture.
if args.teacher_model_dir is None:
raise NotImplementedError('please load pretrained teacher model first')
else:
model.load_state_dict(torch.load(args.teacher_model_dir))
best_acc = test(args, model, device, criterion, test_loader)
model_t = deepcopy(model)
model_s = deepcopy(model)
if args.student_model_dir is not None:
# load the pruned student model checkpoint
model_s.load_state_dict(torch.load(args.student_model_dir))
dummy_input = get_dummy_input(args, device)
m_speedup = ModelSpeedup(model_s, dummy_input, args.mask_path, device)
m_speedup.speedup_model()
module_list = nn.ModuleList([])
module_list.append(model_s)
module_list.append(model_t)
# setup optimizer for fine-tuning the student model
optimizer = torch.optim.SGD(model_s.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(
optimizer, milestones=[int(args.fine_tune_epochs*0.5), int(args.fine_tune_epochs*0.75)], gamma=0.1)
print('Pretrained teacher model acc:', best_acc)
return module_list, optimizer, scheduler
def train(args, models, device, train_loader, criterion, optimizer, epoch):
# model.train()
model_s = models[0].train()
model_t = models[-1].eval()
cri_cls = criterion
cri_kd = DistillKL(args.kd_T)
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output_s = model_s(data)
output_t = model_t(data)
loss_cls = cri_cls(output_s, target)
loss_kd = cri_kd(output_s, output_t)
loss = loss_cls + loss_kd
loss.backward()
optimizer.step()
if batch_idx % args.log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
if args.dry_run:
break
def test(args, model, device, criterion, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += criterion(output, target).item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
acc = 100 * correct / len(test_loader.dataset)
print('Test Loss: {} Accuracy: {}%\n'.format(
test_loss, acc))
return acc
def main(args):
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
os.makedirs(args.experiment_data_dir, exist_ok=True)
# prepare model and data
train_loader, test_loader, criterion = get_data(args.dataset, args.data_dir, args.batch_size, args.test_batch_size)
models, optimizer, scheduler = get_model_optimizer_scheduler(args, device, test_loader, criterion)
best_top1 = 0
if args.test_only:
test(args, models[0], device, criterion, test_loader)
print('start fine-tuning...')
for epoch in range(args.fine_tune_epochs):
print('# Epoch {} #'.format(epoch))
train(args, models, device, train_loader, criterion, optimizer, epoch)
scheduler.step()
# test student only
top1 = test(args, models[0], device, criterion, test_loader)
if top1 > best_top1:
best_top1 = top1
torch.save(models[0].state_dict(), os.path.join(args.experiment_data_dir, 'model_trained.pth'))
print('Trained model saved to %s' % args.experiment_data_dir)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
# dataset and model
parser.add_argument('--dataset', type=str, default='cifar10',
help='dataset to use, mnist, cifar10 or imagenet')
parser.add_argument('--data-dir', type=str, default='./data/',
help='dataset directory')
parser.add_argument('--model', type=str, default='vgg16',
choices=['LeNet', 'vgg16' ,'vgg19', 'resnet18'],
help='model to use')
parser.add_argument('--teacher-model-dir', type=str, default=None,
help='path to the pretrained teacher model checkpoint')
parser.add_argument('--mask-path', type=str, default=None,
help='path to the pruned student model mask file')
parser.add_argument('--student-model-dir', type=str, default=None,
help='path to the pruned student model checkpoint')
parser.add_argument('--batch-size', type=int, default=128,
help='input batch size for training')
parser.add_argument('--test-batch-size', type=int, default=200,
help='input batch size for testing')
parser.add_argument('--fine-tune-epochs', type=int, default=160,
help='epochs to fine tune')
parser.add_argument('--experiment-data-dir', type=str, default='./experiment_data',
help='For saving output checkpoints')
parser.add_argument('--log-interval', type=int, default=100, metavar='N',
help='how many batches to wait before logging training status')
parser.add_argument('--dry-run', action='store_true', default=False,
help='quickly check a single pass')
parser.add_argument('--test-only', action='store_true', default=False,
help='run test only')
# knowledge distillation
parser.add_argument('--kd_T', type=float, default=4,
help='temperature for KD distillation')
args = parser.parse_args()
main(args)
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for reproducing the Lottery Ticket Hypothesis.
'''
import argparse
import copy
import torch
......
import os
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from models.cifar10.vgg import VGG
import nni
from nni.algorithms.compression.pytorch.pruning import (
LevelPruner,
SlimPruner,
FPGMPruner,
L1FilterPruner,
L2FilterPruner,
AGPPruner,
ActivationMeanRankFilterPruner,
ActivationAPoZRankFilterPruner
)
prune_config = {
'level': {
'dataset_name': 'mnist',
'model_name': 'naive',
'pruner_class': LevelPruner,
'config_list': [{
'sparsity': 0.5,
'op_types': ['default'],
}]
},
'agp': {
'dataset_name': 'mnist',
'model_name': 'naive',
'pruner_class': AGPPruner,
'config_list': [{
'initial_sparsity': 0.,
'final_sparsity': 0.8,
'start_epoch': 0,
'end_epoch': 10,
'frequency': 1,
'op_types': ['Conv2d']
}]
},
'slim': {
'dataset_name': 'cifar10',
'model_name': 'vgg19',
'pruner_class': SlimPruner,
'config_list': [{
'sparsity': 0.7,
'op_types': ['BatchNorm2d']
}]
},
'fpgm': {
'dataset_name': 'mnist',
'model_name': 'naive',
'pruner_class': FPGMPruner,
'config_list': [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
},
'l1filter': {
'dataset_name': 'cifar10',
'model_name': 'vgg16',
'pruner_class': L1FilterPruner,
'config_list': [{
'sparsity': 0.5,
'op_types': ['Conv2d'],
'op_names': ['feature.0', 'feature.24', 'feature.27', 'feature.30', 'feature.34', 'feature.37']
}]
},
'mean_activation': {
'dataset_name': 'cifar10',
'model_name': 'vgg16',
'pruner_class': ActivationMeanRankFilterPruner,
'config_list': [{
'sparsity': 0.5,
'op_types': ['Conv2d'],
'op_names': ['feature.0', 'feature.24', 'feature.27', 'feature.30', 'feature.34', 'feature.37']
}]
},
'apoz': {
'dataset_name': 'cifar10',
'model_name': 'vgg16',
'pruner_class': ActivationAPoZRankFilterPruner,
'config_list': [{
'sparsity': 0.5,
'op_types': ['Conv2d'],
'op_names': ['feature.0', 'feature.24', 'feature.27', 'feature.30', 'feature.34', 'feature.37']
}]
}
}
def get_data_loaders(dataset_name='mnist', batch_size=128):
assert dataset_name in ['cifar10', 'mnist']
if dataset_name == 'cifar10':
ds_class = datasets.CIFAR10 if dataset_name == 'cifar10' else datasets.MNIST
MEAN, STD = (0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)
else:
ds_class = datasets.MNIST
MEAN, STD = (0.1307,), (0.3081,)
train_loader = DataLoader(
ds_class(
'./data', train=True, download=True,
transform=transforms.Compose(
[transforms.ToTensor(), transforms.Normalize(MEAN, STD)])
),
batch_size=batch_size, shuffle=True
)
test_loader = DataLoader(
ds_class(
'./data', train=False, download=True,
transform=transforms.Compose(
[transforms.ToTensor(), transforms.Normalize(MEAN, STD)])
),
batch_size=batch_size, shuffle=False
)
return train_loader, test_loader
class NaiveModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 20, 5, 1)
self.conv2 = nn.Conv2d(20, 50, 5, 1)
self.bn1 = nn.BatchNorm2d(self.conv1.out_channels)
self.bn2 = nn.BatchNorm2d(self.conv2.out_channels)
self.fc1 = nn.Linear(4 * 4 * 50, 500)
self.fc2 = nn.Linear(500, 10)
def forward(self, x):
x = F.relu(self.bn1(self.conv1(x)))
x = F.max_pool2d(x, 2, 2)
x = F.relu(self.bn2(self.conv2(x)))
x = F.max_pool2d(x, 2, 2)
x = x.view(x.size(0), -1)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return x
def create_model(model_name='naive'):
assert model_name in ['naive', 'vgg16', 'vgg19']
if model_name == 'naive':
return NaiveModel()
elif model_name == 'vgg16':
return VGG(16)
else:
return VGG(19)
def create_pruner(model, pruner_name, optimizer=None, dependency_aware=False, dummy_input=None):
pruner_class = prune_config[pruner_name]['pruner_class']
config_list = prune_config[pruner_name]['config_list']
kw_args = {}
if dependency_aware:
print('Enable the dependency_aware mode')
# note that, not all pruners support the dependency_aware mode
kw_args['dependency_aware'] = True
kw_args['dummy_input'] = dummy_input
pruner = pruner_class(model, config_list, optimizer, **kw_args)
return pruner
def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.cross_entropy(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(
100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.cross_entropy(output,
target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
acc = 100 * correct / len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, acc))
return acc
def main(args):
device = torch.device(
'cuda') if torch.cuda.is_available() else torch.device('cpu')
os.makedirs(args.checkpoints_dir, exist_ok=True)
model_name = prune_config[args.pruner_name]['model_name']
dataset_name = prune_config[args.pruner_name]['dataset_name']
train_loader, test_loader = get_data_loaders(dataset_name, args.batch_size)
dummy_input, _ = next(iter(train_loader))
dummy_input = dummy_input.to(device)
model = create_model(model_name).to(device)
if args.resume_from is not None and os.path.exists(args.resume_from):
print('loading checkpoint {} ...'.format(args.resume_from))
model.load_state_dict(torch.load(args.resume_from))
test(model, device, test_loader)
else:
optimizer = torch.optim.SGD(
model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
if args.multi_gpu and torch.cuda.device_count():
model = nn.DataParallel(model)
print('start training')
pretrain_model_path = os.path.join(
args.checkpoints_dir, 'pretrain_{}_{}_{}.pth'.format(model_name, dataset_name, args.pruner_name))
for epoch in range(args.pretrain_epochs):
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
torch.save(model.state_dict(), pretrain_model_path)
print('start model pruning...')
model_path = os.path.join(args.checkpoints_dir, 'pruned_{}_{}_{}.pth'.format(
model_name, dataset_name, args.pruner_name))
mask_path = os.path.join(args.checkpoints_dir, 'mask_{}_{}_{}.pth'.format(
model_name, dataset_name, args.pruner_name))
# pruner needs to be initialized from a model not wrapped by DataParallel
if isinstance(model, nn.DataParallel):
model = model.module
optimizer_finetune = torch.optim.SGD(
model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
best_top1 = 0
pruner = create_pruner(model, args.pruner_name,
optimizer_finetune, args.dependency_aware, dummy_input)
model = pruner.compress()
if args.multi_gpu and torch.cuda.device_count() > 1:
model = nn.DataParallel(model)
for epoch in range(args.prune_epochs):
pruner.update_epoch(epoch)
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer_finetune)
top1 = test(model, device, test_loader)
if top1 > best_top1:
best_top1 = top1
# Export the best model, 'model_path' stores state_dict of the pruned model,
# mask_path stores mask_dict of the pruned model
pruner.export_model(model_path=model_path, mask_path=mask_path)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument("--pruner_name", type=str,
default="level", help="pruner name")
parser.add_argument("--batch_size", type=int, default=256)
parser.add_argument("--pretrain_epochs", type=int,
default=10, help="training epochs before model pruning")
parser.add_argument("--prune_epochs", type=int, default=10,
help="training epochs for model pruning")
parser.add_argument("--checkpoints_dir", type=str,
default="./checkpoints", help="checkpoints directory")
parser.add_argument("--resume_from", type=str,
default=None, help="pretrained model weights")
parser.add_argument("--multi_gpu", action="store_true",
help="Use multiple GPUs for training")
parser.add_argument("--dependency_aware", action="store_true", default=False,
help="If enable the dependency_aware mode for the pruner")
args = parser.parse_args()
main(args)
......@@ -6,6 +6,7 @@ import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from models.cifar10.vgg import VGG
from models.mnist.lenet import LeNet
from nni.compression.pytorch import apply_compression_results, ModelSpeedup
torch.manual_seed(0)
......@@ -15,24 +16,24 @@ compare_results = True
config = {
'apoz': {
'model_name': 'vgg16',
'input_shape': [64, 3, 32, 32],
'masks_file': './checkpoints/mask_vgg16_cifar10_apoz.pth'
'model_name': 'lenet',
'input_shape': [64, 1, 28, 28],
'masks_file': './experiment_data/mask_lenet_mnist_apoz.pth'
},
'l1filter': {
'model_name': 'vgg16',
'input_shape': [64, 3, 32, 32],
'masks_file': './checkpoints/mask_vgg16_cifar10_l1filter.pth'
'masks_file': './experiment_data/mask_vgg16_cifar10_l1filter.pth'
},
'fpgm': {
'model_name': 'vgg16',
'input_shape': [64, 3, 32, 32],
'masks_file': './experiment_data/mask_vgg16_cifar10_fpgm.pth'
},
'slim': {
'model_name': 'vgg19',
'input_shape': [64, 3, 32, 32],
'masks_file': './experiment_data/mask_vgg19_cifar10_slim.pth'
}
}
......@@ -46,9 +47,9 @@ def model_inference(config):
model = VGG(depth=16)
elif config['model_name'] == 'vgg19':
model = VGG(depth=19)
elif config['model_name'] == 'lenet':
model = LeNet()
model.to(device)
model.eval()
......
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for quick start of pruning.
In this example, we use the level pruner to prune LeNet on MNIST.
'''
import argparse
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import (Conv2D, Dense, Dropout, Flatten, MaxPool2D)
from nni.algorithms.compression.tensorflow.pruning import LevelPruner
class LeNet(Model):
"""
LeNet-5 Model with customizable hyper-parameters
"""
def __init__(self, conv_size=3, hidden_size=32, dropout_rate=0.5):
"""
Initialize hyper-parameters.
Parameters
----------
conv_size : int
Kernel size of convolutional layers.
hidden_size : int
Dimensionality of last hidden layer.
dropout_rate : float
Dropout rate between two fully connected (dense) layers, to prevent co-adaptation.
"""
super().__init__()
self.conv1 = Conv2D(filters=32, kernel_size=conv_size, activation='relu')
self.pool1 = MaxPool2D(pool_size=2)
self.conv2 = Conv2D(filters=64, kernel_size=conv_size, activation='relu')
self.pool2 = MaxPool2D(pool_size=2)
self.flatten = Flatten()
self.fc1 = Dense(units=hidden_size, activation='relu')
self.dropout = Dropout(rate=dropout_rate)
self.fc2 = Dense(units=10, activation='softmax')
def call(self, x):
"""Override ``Model.call`` to build LeNet-5 model."""
x = self.conv1(x)
x = self.pool1(x)
x = self.conv2(x)
x = self.pool2(x)
x = self.flatten(x)
x = self.fc1(x)
x = self.dropout(x)
return self.fc2(x)
def get_dataset(dataset_name='mnist'):
assert dataset_name == 'mnist'
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
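# scale pixel values to [0, 1] and add a trailing channel dimension expected by Conv2D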
x_train = x_train[..., tf.newaxis] / 255.0
x_test = x_test[..., tf.newaxis] / 255.0
return (x_train, y_train), (x_test, y_test)
# def create_model(model_name='naive'):
# assert model_name == 'naive'
# return tf.keras.Sequential([
# tf.keras.layers.Conv2D(filters=20, kernel_size=5),
# tf.keras.layers.BatchNormalization(),
# tf.keras.layers.ReLU(),
# tf.keras.layers.MaxPool2D(pool_size=2),
# tf.keras.layers.Conv2D(filters=20, kernel_size=5),
# tf.keras.layers.BatchNormalization(),
# tf.keras.layers.ReLU(),
# tf.keras.layers.MaxPool2D(pool_size=2),
# tf.keras.layers.Flatten(),
# tf.keras.layers.Dense(units=500),
# tf.keras.layers.ReLU(),
# tf.keras.layers.Dense(units=10),
# tf.keras.layers.Softmax()
# ])
def main(args):
train_set, test_set = get_dataset('mnist')
model = LeNet()
print('start training')
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9, decay=1e-4)
model.compile(
optimizer=optimizer,
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
model.fit(
train_set[0],
train_set[1],
batch_size=args.batch_size,
epochs=args.pretrain_epochs,
validation_data=test_set
)
print('start pruning')
optimizer_finetune = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9, decay=1e-4)
# pruning configuration: 'default' op_types applies the target sparsity to every layer type the pruner supports by default
prune_config = [{
'sparsity': args.sparsity,
'op_types': ['default'],
}]
pruner = LevelPruner(model, prune_config)
# pruner = create_pruner(model, args.pruner_name)
model = pruner.compress()
model.compile(
optimizer=optimizer_finetune,
loss='sparse_categorical_crossentropy',
metrics=['accuracy'],
run_eagerly=True # NOTE: Important, model compression does not work in graph mode!
)
# fine-tuning
model.fit(
train_set[0],
train_set[1],
batch_size=args.batch_size,
epochs=args.prune_epochs,
validation_data=test_set
)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--pruner_name', type=str, default='level')
parser.add_argument('--batch-size', type=int, default=256)
parser.add_argument('--pretrain_epochs', type=int, default=10)
parser.add_argument('--prune_epochs', type=int, default=10)
parser.add_argument('--sparsity', type=float, default=0.5)
args = parser.parse_args()
main(args)
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for quick start of pruning.
In this example, we use the level pruner to prune LeNet on MNIST.
'''
import logging
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR
from models.mnist.lenet import LeNet
from nni.algorithms.compression.pytorch.pruning import LevelPruner
import nni
_logger = logging.getLogger('mnist_example')
_logger.setLevel(logging.INFO)
def train(args, model, device, train_loader, optimizer, epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % args.log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
if args.dry_run:
break
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
acc = 100 * correct / len(test_loader.dataset)
print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
test_loss, correct, len(test_loader.dataset), acc))
return acc
def main(args):
torch.manual_seed(args.seed)
use_cuda = not args.no_cuda and torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
train_kwargs = {'batch_size': args.batch_size}
test_kwargs = {'batch_size': args.test_batch_size}
if use_cuda:
cuda_kwargs = {'num_workers': 1,
'pin_memory': True,
'shuffle': True}
train_kwargs.update(cuda_kwargs)
test_kwargs.update(cuda_kwargs)
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])
dataset1 = datasets.MNIST('./data', train=True, download=True,
transform=transform)
dataset2 = datasets.MNIST('./data', train=False,
transform=transform)
train_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
test_loader = torch.utils.data.DataLoader(dataset2, **test_kwargs)
model = LeNet().to(device)
optimizer = optim.Adadelta(model.parameters(), lr=args.lr)
print('start pre-training')
scheduler = StepLR(optimizer, step_size=1, gamma=args.gamma)
for epoch in range(1, args.epochs + 1):
train(args, model, device, train_loader, optimizer, epoch)
test(model, device, test_loader)
scheduler.step()
torch.save(model.state_dict(), "pretrain_mnist_lenet.pt")
print('start pruning')
optimizer_finetune = torch.optim.SGD(model.parameters(), lr=0.01)
# create pruner
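# 'default' op_types applies the target sparsity to every layer type the pruner supports by default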
prune_config = [{
'sparsity': args.sparsity,
'op_types': ['default'],
}]
pruner = LevelPruner(model, prune_config, optimizer_finetune)
model = pruner.compress()
# fine-tuning
best_top1 = 0
for epoch in range(1, args.epochs + 1):
pruner.update_epoch(epoch)
train(args, model, device, train_loader, optimizer_finetune, epoch)
top1 = test(model, device, test_loader)
if top1 > best_top1:
best_top1 = top1
# Export the best model, 'model_path' stores state_dict of the pruned model,
# mask_path stores mask_dict of the pruned model
pruner.export_model(model_path='pruned_mnist_lenet.pt', mask_path='mask_mnist_lenet.pt')
if __name__ == '__main__':
# Training settings
parser = argparse.ArgumentParser(description='PyTorch MNIST Example for model compression')
parser.add_argument('--batch-size', type=int, default=64, metavar='N',
help='input batch size for training (default: 64)')
parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
help='input batch size for testing (default: 1000)')
parser.add_argument('--epochs', type=int, default=10, metavar='N',
help='number of epochs to train (default: 10)')
parser.add_argument('--lr', type=float, default=1.0, metavar='LR',
help='learning rate (default: 1.0)')
parser.add_argument('--gamma', type=float, default=0.7, metavar='M',
help='Learning rate step gamma (default: 0.7)')
parser.add_argument('--no-cuda', action='store_true', default=False,
help='disables CUDA training')
parser.add_argument('--dry-run', action='store_true', default=False,
help='quickly check a single pass')
parser.add_argument('--seed', type=int, default=1, metavar='S',
help='random seed (default: 1)')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',
help='how many batches to wait before logging training status')
parser.add_argument('--sparsity', type=float, default=0.5,
help='target overall sparsity')
args = parser.parse_args()
main(args)
\ No newline at end of file
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.algorithms.compression.pytorch.pruning import L1FilterPruner
from models.cifar10.vgg import VGG
def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.cross_entropy(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.cross_entropy(output, target, reduction='sum').item()  # use the same loss as training; the model outputs raw logits
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
acc = 100 * correct / len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, acc))
return acc
def main():
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=True, download=True,
transform=transforms.Compose([
transforms.Pad(4),
transforms.RandomCrop(32),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=200, shuffle=False)
model = VGG(depth=16)
model.to(device)
# Train the base VGG-16 model
print('=' * 10 + 'Train the unpruned base model' + '=' * 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, 160, 0)
for epoch in range(160):
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
lr_scheduler.step()
torch.save(model.state_dict(), 'vgg16_cifar10.pth')
# Test base model accuracy
print('=' * 10 + 'Test on the original model' + '=' * 10)
model.load_state_dict(torch.load('vgg16_cifar10.pth'))
test(model, device, test_loader)
# top1 = 93.51%
# Pruning Configuration, in paper 'PRUNING FILTERS FOR EFFICIENT CONVNETS',
# Conv_1, Conv_8, Conv_9, Conv_10, Conv_11, Conv_12 are pruned with 50% sparsity, as 'VGG-16-pruned-A'
configure_list = [{
'sparsity': 0.5,
'op_types': ['default'],
'op_names': ['feature.0', 'feature.24', 'feature.27', 'feature.30', 'feature.34', 'feature.37']
}]
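# L1FilterPruner removes entire convolution filters, ranked by the L1 norm of each filter's weights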
# Prune model and test accuracy without fine tuning.
print('=' * 10 + 'Test on the pruned model before fine tune' + '=' * 10)
optimizer_finetune = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
pruner = L1FilterPruner(model, configure_list, optimizer_finetune)
model = pruner.compress()
test(model, device, test_loader)
# top1 = 88.19%
# Fine tune the pruned model for 40 epochs and test accuracy
print('=' * 10 + 'Fine tuning' + '=' * 10)
best_top1 = 0
for epoch in range(40):
pruner.update_epoch(epoch)
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer_finetune)
top1 = test(model, device, test_loader)
if top1 > best_top1:
best_top1 = top1
# Export the best model, 'model_path' stores state_dict of the pruned model,
# mask_path stores mask_dict of the pruned model
pruner.export_model(model_path='pruned_vgg16_cifar10.pth', mask_path='mask_vgg16_cifar10.pth')
# Test the exported model
print('=' * 10 + 'Test on the pruned model after fine tune' + '=' * 10)
new_model = VGG(depth=16)
new_model.to(device)
new_model.load_state_dict(torch.load('pruned_vgg16_cifar10.pth'))
test(new_model, device, test_loader)
# top1 = 93.53%
if __name__ == '__main__':
main()
\ No newline at end of file
import math
import os
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.algorithms.compression.pytorch.pruning import SlimPruner
from models.cifar10.vgg import VGG
def updateBN(model):
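# add the subgradient of an L1 penalty on the BatchNorm scale factors,
# i.e. the channel sparsity regularization used by Network Slimming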
for m in model.modules():
if isinstance(m, nn.BatchNorm2d):
m.weight.grad.data.add_(0.0001 * torch.sign(m.weight.data)) # L1
def train(model, device, train_loader, optimizer, sparse_bn=False):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.cross_entropy(output, target)
loss.backward()
# L1 regularization on BN layer
if sparse_bn:
updateBN(model)
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.cross_entropy(output, target, reduction='sum').item()  # use the same loss as training; the model outputs raw logits
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
acc = 100 * correct / len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, acc))
return acc
def main():
parser = argparse.ArgumentParser("multiple gpu with pruning")
parser.add_argument("--epochs", type=int, default=160)
parser.add_argument("--retrain", default=False, action="store_true")
parser.add_argument("--parallel", default=False, action="store_true")
args = parser.parse_args()
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=True, download=True,
transform=transforms.Compose([
transforms.Pad(4),
transforms.RandomCrop(32),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=200, shuffle=False)
model = VGG(depth=19)
model.to(device)
# Train the base VGG-19 model
if args.retrain:
print('=' * 10 + 'Train the unpruned base model' + '=' * 10)
epochs = args.epochs
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
for epoch in range(epochs):
if epoch in [epochs * 0.5, epochs * 0.75]:
for param_group in optimizer.param_groups:
param_group['lr'] *= 0.1
print("epoch {}".format(epoch))
train(model, device, train_loader, optimizer, True)
test(model, device, test_loader)
torch.save(model.state_dict(), 'vgg19_cifar10.pth')
else:
assert os.path.isfile('vgg19_cifar10.pth'), "can not find checkpoint 'vgg19_cifar10.pth'"
model.load_state_dict(torch.load('vgg19_cifar10.pth'))
# Test base model accuracy
print('=' * 10 + 'Test the original model' + '=' * 10)
test(model, device, test_loader)
# top1 = 93.60%
# Pruning configuration, following the paper 'Learning Efficient Convolutional Networks through Network Slimming'.
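# SlimPruner ranks channels by the magnitude of their BatchNorm scaling factors,
# so the target sparsity is configured on 'BatchNorm2d' layers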
configure_list = [{
'sparsity': 0.7,
'op_types': ['BatchNorm2d'],
}]
# Prune model and test accuracy without fine tuning.
print('=' * 10 + 'Test the pruned model before fine tune' + '=' * 10)
optimizer_finetune = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
pruner = SlimPruner(model, configure_list, optimizer_finetune)
model = pruner.compress()
if args.parallel:
if torch.cuda.device_count() > 1:
print("use {} gpus for pruning".format(torch.cuda.device_count()))
model = nn.DataParallel(model)
# model = nn.DataParallel(model, device_ids=[0, 1])
else:
print("only detect 1 gpu, fall back")
model.to(device)
# Fine tune the pruned model for 40 epochs and test accuracy
print('=' * 10 + 'Fine tuning' + '=' * 10)
best_top1 = 0
for epoch in range(40):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer_finetune)
top1 = test(model, device, test_loader)
if top1 > best_top1:
best_top1 = top1
# Export the best model, 'model_path' stores state_dict of the pruned model,
# mask_path stores mask_dict of the pruned model
pruner.export_model(model_path='pruned_vgg19_cifar10.pth', mask_path='mask_vgg19_cifar10.pth')
# Test the exported model
print('=' * 10 + 'Test the export pruned model after fine tune' + '=' * 10)
new_model = VGG(depth=19)
new_model.to(device)
new_model.load_state_dict(torch.load('pruned_vgg19_cifar10.pth'))
test(new_model, device, test_loader)
# top1 = 93.74%
if __name__ == '__main__':
main()
\ No newline at end of file