"\n# Pruning Quickstart\n\nHere is a three-minute video to get you started with model pruning.\n\n.. youtube:: wKh51Jnr0a8\n :align: center\n\nModel pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.\nThere are three common practices for pruning a DNN model:\n\n#. Pre-training a model -> Pruning the model -> Fine-tuning the pruned model\n#. Pruning a model during training (i.e., pruning aware training) -> Fine-tuning the pruned model\n#. Pruning a model -> Training the pruned model from scratch\n\nNNI supports all of the above pruning practices by working on the key pruning stage.\nFollowing this tutorial for a quick look at how to use NNI to prune a model in a common practice.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparation\n\nIn this tutorial, we use a simple model and pre-trained on MNIST dataset.\nIf you are familiar with defining a model and training in pytorch, you can skip directly to `Pruning Model`_.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\nimport torch.nn.functional as F\nfrom torch.optim import SGD\n\nfrom scripts.compression_mnist_model import TorchModel, trainer, evaluator, device\n\n# define the model\nmodel = TorchModel().to(device)\n\n# show the model structure, note that pruner will wrap the model layer.\nprint(model)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# define the optimizer and criterion for pre-training\n\noptimizer = SGD(model.parameters(), 1e-2)\ncriterion = F.nll_loss\n\n# pre-train and evaluate the model on MNIST dataset\nfor epoch in range(3):\n trainer(model, optimizer, criterion)\n evaluator(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pruning Model\n\nUsing L1NormPruner to prune the model and generate the masks.\nUsually, a pruner requires original model and ``config_list`` as its inputs.\nDetailed about how to write ``config_list`` please refer :doc:`compression config specification <../compression/compression_config_list>`.\n\nThe following `config_list` means all layers whose type is `Linear` or `Conv2d` will be pruned,\nexcept the layer named `fc3`, because `fc3` is `exclude`.\nThe final sparsity ratio for each layer is 50%. The layer named `fc3` will not be pruned.\n\n"
"Pruners usually require `model` and `config_list` as input arguments.\n\n"
]
},
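{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# The original cell defining the config_list is missing here; this is a sketch that matches the\n# description above and follows NNI's pruning config spec (keys: sparsity_per_layer, op_types, op_names, exclude).\nconfig_list = [{\n    'sparsity_per_layer': 0.5,\n    'op_types': ['Linear', 'Conv2d']\n}, {\n    'exclude': True,\n    'op_names': ['fc3']\n}]"
]
},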
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from nni.compression.pytorch.pruning import L1NormPruner\npruner = L1NormPruner(model, config_list)\n\n# show the wrapped model structure, `PrunerModuleWrapper` have wrapped the layers that configured in the config_list.\nprint(model)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# compress the model and generate the masks\n_, masks = pruner.compress()\n# show the masks sparsity\nfor name, mask in masks.items():\n print(name, ' sparsity : ', '{:.2}'.format(mask['weight'].sum() / mask['weight'].numel()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Speedup the original model with masks, note that `ModelSpeedup` requires an unwrapped model.\nThe model becomes smaller after speedup,\nand reaches a higher sparsity ratio because `ModelSpeedup` will propagate the masks across layers.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# need to unwrap the model, if the model is wrapped before speedup\npruner._unwrap_model()\n\n# speedup the model, for more information about speedup, please refer :doc:`pruning_speedup`.\nfrom nni.compression.pytorch.speedup import ModelSpeedup\n\nModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"the model will become real smaller after speedup\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fine-tuning Compacted Model\nNote that if the model has been sped up, you need to re-initialize a new optimizer for fine-tuning.\nBecause speedup will replace the masked big layers with dense small ones.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"optimizer = SGD(model.parameters(), 1e-2)\nfor epoch in range(3):\n trainer(model, optimizer, criterion)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Speedup Model with Mask\n\n## Introduction\n\nPruning algorithms usually use weight masks to simulate the real pruning. Masks can be used\nto check model performance of a specific pruning (or sparsity), but there is no real speedup.\nSince model speedup is the ultimate goal of model pruning, we try to provide a tool to users\nto convert a model to a smaller one based on user provided masks (the masks come from the\npruning algorithms).\n\nThere are two types of pruning. One is fine-grained pruning, it does not change the shape of weights,\nand input/output tensors. Sparse kernel is required to speedup a fine-grained pruned layer.\nThe other is coarse-grained pruning (e.g., channels), shape of weights and input/output tensors usually change due to such pruning.\nTo speedup this kind of pruning, there is no need to use sparse kernel, just replace the pruned layer with smaller one.\nSince the support of sparse kernels in community is limited,\nwe only support the speedup of coarse-grained pruning and leave the support of fine-grained pruning in future.\n\n## Design and Implementation\n\nTo speedup a model, the pruned layers should be replaced, either replaced with smaller layer for coarse-grained mask,\nor replaced with sparse kernel for fine-grained mask. Coarse-grained mask usually changes the shape of weights or input/output tensors,\nthus, we should do shape inference to check are there other unpruned layers should be replaced as well due to shape change.\nTherefore, in our design, there are two main steps: first, do shape inference to find out all the modules that should be replaced;\nsecond, replace the modules.\n\nThe first step requires topology (i.e., connections) of the model, we use ``jit.trace`` to obtain the model graph for PyTorch.\nThe new shape of module is auto-inference by NNI, the unchanged parts of outputs during forward and inputs during backward are prepared for reduct.\nFor each type of module, we should prepare a function for module replacement.\nThe module replacement function returns a newly created module which is smaller.\n\n## Usage\n"
]
},
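{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative sketch only -- this is NOT NNI's internal API, just a minimal example of what a module\n# replacement function can look like: build a smaller Conv2d from the kept channel indices and copy the surviving weights.\nimport torch\nimport torch.nn as nn\n\ndef replace_conv2d(conv: nn.Conv2d, in_kept: list, out_kept: list) -> nn.Conv2d:\n    # in_kept / out_kept: indices of the input / output channels that survive the mask\n    new_conv = nn.Conv2d(len(in_kept), len(out_kept),\n                         kernel_size=conv.kernel_size, stride=conv.stride,\n                         padding=conv.padding, bias=conv.bias is not None)\n    # copy the surviving weights (and bias) into the smaller module\n    new_conv.weight.data = conv.weight.data[out_kept][:, in_kept].clone()\n    if conv.bias is not None:\n        new_conv.bias.data = conv.bias.data[out_kept].clone()\n    return new_conv"
]
},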
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Generate a mask for the model at first.\nWe usually use a NNI pruner to generate the masks then use ``ModelSpeedup`` to compact the model.\nBut in fact ``ModelSpeedup`` is a relatively independent tool, so you can use it independently.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\nfrom scripts.compression_mnist_model import TorchModel, device\n\nmodel = TorchModel().to(device)\n# masks = {layer_name: {'weight': weight_mask, 'bias': bias_mask}}\nconv1_mask = torch.ones_like(model.conv1.weight.data)\n# mask the first three output channels in conv1\nconv1_mask[0: 3] = 0\nmasks = {'conv1': {'weight': conv1_mask}}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show the original model structure.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Roughly test the original model inference speed.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import time\nstart = time.time()\nmodel(torch.rand(128, 1, 28, 28).to(device))\nprint('Original Model - Elapsed Time : ', time.time() - start)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Speedup the model and show the model structure after speedup.\n\n"
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Roughly test the inference speed of the model after speedup.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"start = time.time()\nmodel(torch.rand(128, 1, 28, 28).to(device))\nprint('Speedup Model - Elapsed Time : ', time.time() - start)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the combined usage of ``Pruner`` mask generation with ``ModelSpeedup``,\nplease refer to :doc:`Pruning Quick Start <pruning_quick_start_mnist>`.\n\nNOTE: The current implementation supports PyTorch 1.3.1 or newer.\n\n## Limitations\n\nFor PyTorch we can only replace modules; if functions in ``forward`` need to be replaced,\nour current implementation does not work. One workaround is to turn the function into a PyTorch module,\nas sketched in the next cell.\n\nIf you want to speed up your own model and it is not supported by the current implementation,\nyou need to implement the replacement function for module replacement; contributions are welcome.\n\n"
]
},
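{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# A minimal sketch (not from the original tutorial) of the workaround mentioned above:\n# wrap a function used in forward() into an nn.Module so that speedup can replace it as a module.\nimport torch\nimport torch.nn as nn\n\nclass Flatten(nn.Module):\n    # module version of the torch.flatten call that would otherwise live inside forward()\n    def forward(self, x):\n        return torch.flatten(x, 1)\n\nclass Net(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.conv = nn.Conv2d(1, 8, 3)\n        self.flatten = Flatten()  # used as a module instead of calling torch.flatten inside forward\n        self.fc = nn.Linear(8 * 26 * 26, 10)\n\n    def forward(self, x):\n        return self.fc(self.flatten(self.conv(x)))"
]
},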
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Speedup Results of Examples\n\nThe code of these experiments can be found :githublink:`here <examples/model_compress/pruning/legacy/speedup/model_speedup.py>`.\n\nThese results were measured on the `legacy pruning framework <https://nni.readthedocs.io/en/v2.6/Compression/pruning.html>`_; new results are coming soon.\n\n### slim pruner example\n\nOn one V100 GPU, input tensor: ``torch.randn(64, 3, 32, 32)``.\n\n.. list-table::\n   :header-rows: 1\n   :widths: auto\n\n   * - Times\n     - Mask Latency\n     - Speedup Latency\n   * - 1\n     - 0.01197\n     - 0.005107\n   * - 2\n     - 0.02019\n     - 0.008769\n   * - 4\n     - 0.02733\n     - 0.014809\n   * - 8\n     - 0.04310\n     - 0.027441\n   * - 16\n     - 0.07731\n     - 0.05008\n   * - 32\n     - 0.14464\n     - 0.10027\n\n### fpgm pruner example\n\nOn CPU, input tensor: ``torch.randn(64, 1, 28, 28)``; note that these measurements have a large variance.\n\n.. list-table::\n   :header-rows: 1\n   :widths: auto\n\n   * - Times\n     - Mask Latency\n     - Speedup Latency\n   * - 1\n     - 0.01383\n     - 0.01839\n   * - 2\n     - 0.01167\n     - 0.003558\n   * - 4\n     - 0.01636\n     - 0.01088\n   * - 40\n     - 0.14412\n     - 0.08268\n   * - 40\n     - 1.29385\n     - 0.14408\n   * - 40\n     - 0.41035\n     - 0.46162\n   * - 400\n     - 6.29020\n     - 5.82143\n\n### l1filter pruner example\n\nOn one V100 GPU, input tensor: ``torch.randn(64, 3, 32, 32)``.\n\n.. list-table::\n   :header-rows: 1\n   :widths: auto\n\n   * - Times\n     - Mask Latency\n     - Speedup Latency\n   * - 1\n     - 0.01026\n     - 0.003677\n   * - 2\n     - 0.01657\n     - 0.008161\n   * - 4\n     - 0.02458\n     - 0.020018\n   * - 8\n     - 0.03498\n     - 0.025504\n   * - 16\n     - 0.06757\n     - 0.047523\n   * - 32\n     - 0.10487\n     - 0.086442\n\n### APoZ pruner example\n\nOn one V100 GPU, input tensor: ``torch.randn(64, 3, 32, 32)``.\n\n.. list-table::\n   :header-rows: 1\n   :widths: auto\n\n   * - Times\n     - Mask Latency\n     - Speedup Latency\n   * - 1\n     - 0.01389\n     - 0.004208\n   * - 2\n     - 0.01628\n     - 0.008310\n   * - 4\n     - 0.02521\n     - 0.014008\n   * - 8\n     - 0.03386\n     - 0.023923\n   * - 16\n     - 0.06042\n     - 0.046183\n   * - 32\n     - 0.12421\n     - 0.087113\n\n### SimulatedAnnealing pruner example\n\nIn this experiment, we use the SimulatedAnnealing pruner to prune resnet18 on the cifar10 dataset.\nWe measure the latencies and accuracies of the pruned model under different sparsity ratios, as shown in the following figure.\nThe latency is measured on one V100 GPU and the input tensor is ``torch.randn(128, 3, 32, 32)``.\n\n.. image:: ../../img/SA_latency_accuracy.png\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Customize a new quantization algorithm\n\nTo write a new quantization algorithm, you can write a class that inherits ``nni.compression.pytorch.Quantizer``.\nThen, override the member functions with the logic of your algorithm. The member function to override is ``quantize_weight``.\n``quantize_weight`` directly returns the quantized weights rather than mask, because for quantization the quantized weights cannot be obtained by applying mask.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from nni.compression.pytorch import Quantizer\n\nclass YourQuantizer(Quantizer):\n def __init__(self, model, config_list):\n \"\"\"\n Suggest you to use the NNI defined spec for config\n \"\"\"\n super().__init__(model, config_list)\n\n def quantize_weight(self, weight, config, **kwargs):\n \"\"\"\n quantize should overload this method to quantize weight tensors.\n This method is effectively hooked to :meth:`forward` of the model.\n\n Parameters\n ----------\n weight : Tensor\n weight that needs to be quantized\n config : dict\n the configuration for weight quantization\n \"\"\"\n\n # Put your code to generate `new_weight` here\n new_weight = ...\n return new_weight\n\n def quantize_output(self, output, config, **kwargs):\n \"\"\"\n quantize should overload this method to quantize output.\n This method is effectively hooked to `:meth:`forward` of the model.\n\n Parameters\n ----------\n output : Tensor\n output that needs to be quantized\n config : dict\n the configuration for output quantization\n \"\"\"\n\n # Put your code to generate `new_output` here\n new_output = ...\n return new_output\n\n def quantize_input(self, *inputs, config, **kwargs):\n \"\"\"\n quantize should overload this method to quantize input.\n This method is effectively hooked to :meth:`forward` of the model.\n\n Parameters\n ----------\n inputs : Tensor\n inputs that needs to be quantized\n config : dict\n the configuration for inputs quantization\n \"\"\"\n\n # Put your code to generate `new_input` here\n new_input = ...\n return new_input\n\n def update_epoch(self, epoch_num):\n pass\n\n def step(self):\n \"\"\"\n Can do some processing based on the model or weights binded\n in the func bind_model\n \"\"\"\n pass"
]
},
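{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hypothetical usage sketch of the custom quantizer above (the ``config_list`` keys follow NNI's quantization config spec; ``model`` is assumed to be an existing ``torch.nn.Module``):\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# hypothetical example: apply YourQuantizer to an existing model\nconfig_list = [{\n    'quant_types': ['weight'],\n    'quant_bits': {'weight': 8},\n    'op_types': ['Conv2d', 'Linear']\n}]\nquantizer = YourQuantizer(model, config_list)\nquantizer.compress()"
]
},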
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Customize backward function\n\nSometimes it's necessary for a quantization operation to have a customized backward function,\nsuch as `Straight-Through Estimator <https://stackoverflow.com/questions/38361314/the-concept-of-straight-through-estimator-ste>`__\\ ,\nuser can customize a backward function as follow:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from nni.compression.pytorch.compressor import Quantizer, QuantGrad, QuantType\n\nclass ClipGrad(QuantGrad):\n @staticmethod\n def quant_backward(tensor, grad_output, quant_type):\n \"\"\"\n This method should be overrided by subclass to provide customized backward function,\n default implementation is Straight-Through Estimator\n Parameters\n ----------\n tensor : Tensor\n input of quantization operation\n grad_output : Tensor\n gradient of the output of quantization operation\n quant_type : QuantType\n the type of quantization, it can be `QuantType.INPUT`, `QuantType.WEIGHT`, `QuantType.OUTPUT`,\n you can define different behavior for different types.\n Returns\n -------\n tensor\n gradient of the input of quantization operation\n \"\"\"\n\n # for quant_output function, set grad to zero if the absolute value of tensor is larger than 1\n if quant_type == QuantType.OUTPUT:\n grad_output[tensor.abs() > 1] = 0\n return grad_output\n\nclass _YourQuantizer(Quantizer):\n def __init__(self, model, config_list):\n super().__init__(model, config_list)\n # set your customized backward function to overwrite default backward function\n self.quant_grad = ClipGrad"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you do not customize ``QuantGrad``, the default backward is Straight-Through Estimator. \n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Quantization Quickstart\n\nHere is a four-minute video to get you started with model quantization.\n\n.. youtube:: MSfV7AyfiA4\n :align: center\n\nQuantization reduces model size and speeds up inference time by reducing the number of bits required to represent weights or activations.\n\nIn NNI, both post-training quantization algorithms and quantization-aware training algorithms are supported.\nHere we use `QAT_Quantizer` as an example to show the usage of quantization in NNI.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparation\n\nIn this tutorial, we use a simple model and pre-train on MNIST dataset.\nIf you are familiar with defining a model and training in pytorch, you can skip directly to `Quantizing Model`_.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\nimport torch.nn.functional as F\nfrom torch.optim import SGD\n\nfrom scripts.compression_mnist_model import TorchModel, trainer, evaluator, device, test_trt\n\n# define the model\nmodel = TorchModel().to(device)\n\n# define the optimizer and criterion for pre-training\n\noptimizer = SGD(model.parameters(), 1e-2)\ncriterion = F.nll_loss\n\n# pre-train and evaluate the model on MNIST dataset\nfor epoch in range(3):\n trainer(model, optimizer, criterion)\n evaluator(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Quantizing Model\n\nInitialize a `config_list`.\nDetailed about how to write ``config_list`` please refer :doc:`compression config specification <../compression/compression_config_list>`.\n\n"
"The model has now been wrapped, and quantization targets ('quant_types' setting in `config_list`)\nwill be quantized & dequantized for simulated quantization in the wrapped layers.\nQAT is a training-aware quantizer, it will update scale and zero point during training.\n\n"
]
},
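{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# The original cell defining the config_list and constructing the quantizer is missing here.\n# This is a hedged sketch: the config keys follow NNI's quantization config spec, and the\n# QAT_Quantizer signature (model, config_list, optimizer, dummy_input) may differ across NNI versions.\nconfig_list = [{\n    'quant_types': ['input', 'weight'],\n    'quant_bits': {'input': 8, 'weight': 8},\n    'op_types': ['Conv2d', 'Linear']\n}]\n\nfrom nni.compression.pytorch.quantization import QAT_Quantizer\ndummy_input = torch.rand(32, 1, 28, 28).to(device)\nquantizer = QAT_Quantizer(model, config_list, optimizer, dummy_input)\nquantizer.compress()"
]
},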
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"for epoch in range(3):\n trainer(model, optimizer, criterion)\n evaluator(model)"