Unverified Commit c299e576 authored by J-shang, committed by GitHub

[Doc] add an example for compression config list & add videos in doc (#4890)

parent 64ae3d73
......@@ -18,4 +18,5 @@ sphinx-gallery
sphinx-intl
sphinx-tabs
sphinxcontrib-bibtex
sphinxcontrib-youtube
git+https://github.com/bashtage/sphinx-material@6e0ef82#egg=sphinx_material
......@@ -93,3 +93,59 @@ quant_start_step
Specific key for ``QAT Quantizer``. Disable quantization until the model has been run for a certain number of steps;
this allows the network to enter a more stable state, where output quantization ranges do not exclude a significant fraction of values.
Default value is 0.
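For example, the following config list entry (the bit widths and step count here are illustrative values, not recommendations) delays quantization of ``Conv2d`` modules until 1000 training steps have passed::

    config_list = [{
        'op_types': ['Conv2d'],
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'quant_start_step': 1000
    }]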
Examples
--------
Suppose we want to compress the following model::
    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout2d(0.25)
            self.dropout2 = nn.Dropout2d(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            ...
First, we need to determine where to compress. Use the following config list to specify all ``Conv2d`` modules and the module named ``fc1`` as compression targets::
    config_list = [{'op_types': ['Conv2d']}, {'op_names': ['fc1']}]
Sometimes we may need to compress all modules of a certain type except for a few special ones.
Writing out every module name would be laborious in that case; instead, we can use ``exclude`` to quickly specify the compression target modules::
    config_list = [{
        'op_types': ['Conv2d', 'Linear']
    }, {
        'exclude': True,
        'op_names': ['fc2']
    }]
For the model we want to compress, the above two config lists are equivalent: both select ``conv1``, ``conv2``, and ``fc1`` as compression targets.
Let's take a simple pruning config list as an example: prune all ``Conv2d`` modules with 50% sparsity, and prune ``fc1`` with 80% sparsity::
    config_list = [{
        'op_types': ['Conv2d'],
        'total_sparsity': 0.5
    }, {
        'op_names': ['fc1'],
        'total_sparsity': 0.8
    }]
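A config list like this is then passed to a pruner together with the model. The following is a minimal sketch based on the pruning quickstart tutorial; the import path and two-argument constructor follow that tutorial::

    from nni.compression.pytorch.pruning import L1NormPruner

    # wrap the model and generate masks for the layers selected by config_list
    pruner = L1NormPruner(model, config_list)
    _, masks = pruner.compress()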
Then, if you want to try model quantization, here is a simple config list example::
    config_list = [{
        'op_types': ['Conv2d'],
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8}
    }, {
        'op_names': ['fc1'],
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8}
    }]
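As with pruning, the config list is handed to a quantizer together with the model. A minimal sketch modeled on the quantization quickstart tutorial; ``optimizer`` and ``dummy_input`` are assumed to be defined elsewhere::

    from nni.compression.pytorch.quantization import QAT_Quantizer

    # optimizer and dummy_input are assumed to be created beforehand,
    # as in the quantization quickstart; compress() wraps the selected
    # layers so they simulate quantization during training
    quantizer = QAT_Quantizer(model, config_list, optimizer, dummy_input)
    quantizer.compress()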
......@@ -53,6 +53,7 @@ extensions = [
'sphinx.ext.viewcode',
'sphinx.ext.intersphinx',
'sphinxcontrib.bibtex',
'sphinxcontrib.youtube',
# 'nbsphinx', # nbsphinx has conflicts with sphinx-gallery.
'sphinx.ext.extlinks',
'IPython.sphinxext.ipython_console_highlighting',
......
.. 16ce3c41e8ec5389a2071e6cbe56ccab
.. a6a9f0292afa81c7796304ae7da5afcd
Web UI
======
......
......@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: NNI \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2022-04-20 05:50+0000\n"
"POT-Creation-Date: 2022-05-27 16:52+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
......@@ -127,12 +127,12 @@ msgstr ""
#: ../../source/tutorials/hello_nas.rst:564
#: ../../source/tutorials/hpo_quickstart_pytorch/main.rst:244
#: ../../source/tutorials/hpo_quickstart_pytorch/main.rst:281
#: ../../source/tutorials/pruning_quick_start_mnist.rst:65
#: ../../source/tutorials/pruning_quick_start_mnist.rst:107
#: ../../source/tutorials/pruning_quick_start_mnist.rst:172
#: ../../source/tutorials/pruning_quick_start_mnist.rst:218
#: ../../source/tutorials/pruning_quick_start_mnist.rst:255
#: ../../source/tutorials/pruning_quick_start_mnist.rst:283
#: ../../source/tutorials/pruning_quick_start_mnist.rst:70
#: ../../source/tutorials/pruning_quick_start_mnist.rst:112
#: ../../source/tutorials/pruning_quick_start_mnist.rst:177
#: ../../source/tutorials/pruning_quick_start_mnist.rst:223
#: ../../source/tutorials/pruning_quick_start_mnist.rst:260
#: ../../source/tutorials/pruning_quick_start_mnist.rst:288
msgid "Out:"
msgstr ""
......@@ -347,7 +347,7 @@ msgstr ""
#: ../../source/tutorials/hello_nas.rst:625
#: ../../source/tutorials/hpo_quickstart_pytorch/main.rst:335
#: ../../source/tutorials/pruning_quick_start_mnist.rst:357
#: ../../source/tutorials/pruning_quick_start_mnist.rst:362
msgid "`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_"
msgstr ""
......@@ -648,49 +648,53 @@ msgid "Pruning Quickstart"
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:24
msgid "Here is a three-minute video to get you started with model pruning."
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:29
msgid ""
"Model pruning is a technique to reduce the model size and computation by "
"reducing model weight size or intermediate state size. There are three "
"common practices for pruning a DNN model:"
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:27
#: ../../source/tutorials/pruning_quick_start_mnist.rst:32
msgid "Pre-training a model -> Pruning the model -> Fine-tuning the pruned model"
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:28
#: ../../source/tutorials/pruning_quick_start_mnist.rst:33
msgid ""
"Pruning a model during training (i.e., pruning aware training) -> Fine-"
"tuning the pruned model"
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:29
#: ../../source/tutorials/pruning_quick_start_mnist.rst:34
msgid "Pruning a model -> Training the pruned model from scratch"
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:31
#: ../../source/tutorials/pruning_quick_start_mnist.rst:36
msgid ""
"NNI supports all of the above pruning practices by working on the key "
"pruning stage. Following this tutorial for a quick look at how to use NNI"
" to prune a model in a common practice."
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:37
#: ../../source/tutorials/pruning_quick_start_mnist.rst:42
msgid "Preparation"
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:39
#: ../../source/tutorials/pruning_quick_start_mnist.rst:44
msgid ""
"In this tutorial, we use a simple model and pre-trained on MNIST dataset."
" If you are familiar with defining a model and training in pytorch, you "
"can skip directly to `Pruning Model`_."
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:121
#: ../../source/tutorials/pruning_quick_start_mnist.rst:126
msgid "Pruning Model"
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:123
#: ../../source/tutorials/pruning_quick_start_mnist.rst:128
msgid ""
"Using L1NormPruner to prune the model and generate the masks. Usually, a "
"pruner requires original model and ``config_list`` as its inputs. "
......@@ -699,7 +703,7 @@ msgid ""
"<../compression/compression_config_list>`."
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:127
#: ../../source/tutorials/pruning_quick_start_mnist.rst:132
msgid ""
"The following `config_list` means all layers whose type is `Linear` or "
"`Conv2d` will be pruned, except the layer named `fc3`, because `fc3` is "
......@@ -707,11 +711,11 @@ msgid ""
"named `fc3` will not be pruned."
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:153
#: ../../source/tutorials/pruning_quick_start_mnist.rst:158
msgid "Pruners usually require `model` and `config_list` as input arguments."
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:232
#: ../../source/tutorials/pruning_quick_start_mnist.rst:237
msgid ""
"Speedup the original model with masks, note that `ModelSpeedup` requires "
"an unwrapped model. The model becomes smaller after speedup, and reaches "
......@@ -719,32 +723,32 @@ msgid ""
"across layers."
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:269
#: ../../source/tutorials/pruning_quick_start_mnist.rst:274
msgid "the model will become real smaller after speedup"
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:307
#: ../../source/tutorials/pruning_quick_start_mnist.rst:312
msgid "Fine-tuning Compacted Model"
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:308
#: ../../source/tutorials/pruning_quick_start_mnist.rst:313
msgid ""
"Note that if the model has been sped up, you need to re-initialize a new "
"optimizer for fine-tuning. Because speedup will replace the masked big "
"layers with dense small ones."
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:329
msgid "**Total running time of the script:** ( 0 minutes 58.337 seconds)"
#: ../../source/tutorials/pruning_quick_start_mnist.rst:334
msgid "**Total running time of the script:** ( 1 minutes 30.730 seconds)"
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:344
#: ../../source/tutorials/pruning_quick_start_mnist.rst:349
msgid ""
":download:`Download Python source code: pruning_quick_start_mnist.py "
"<pruning_quick_start_mnist.py>`"
msgstr ""
#: ../../source/tutorials/pruning_quick_start_mnist.rst:350
#: ../../source/tutorials/pruning_quick_start_mnist.rst:355
msgid ""
":download:`Download Jupyter notebook: pruning_quick_start_mnist.ipynb "
"<pruning_quick_start_mnist.ipynb>`"
......@@ -771,3 +775,6 @@ msgstr ""
#~ msgid "**Total running time of the script:** ( 1 minutes 24.393 seconds)"
#~ msgstr ""
#~ msgid "**Total running time of the script:** ( 0 minutes 58.337 seconds)"
#~ msgstr ""
......@@ -53,7 +53,7 @@ Tutorials
.. raw:: html
<div class="sphx-glr-thumbcontainer" tooltip="Quantization reduces model size and speeds up inference time by reducing the number of bits req...">
<div class="sphx-glr-thumbcontainer" tooltip="Here is a four-minute video to get you started with model quantization.">
.. only:: html
......@@ -74,7 +74,7 @@ Tutorials
.. raw:: html
<div class="sphx-glr-thumbcontainer" tooltip="Model pruning is a technique to reduce the model size and computation by reducing model weight ...">
<div class="sphx-glr-thumbcontainer" tooltip="Here is a three-minute video to get you started with model pruning.">
.. only:: html
......
......@@ -15,7 +15,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Pruning Quickstart\n\nModel pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.\nThere are three common practices for pruning a DNN model:\n\n#. Pre-training a model -> Pruning the model -> Fine-tuning the pruned model\n#. Pruning a model during training (i.e., pruning aware training) -> Fine-tuning the pruned model\n#. Pruning a model -> Training the pruned model from scratch\n\nNNI supports all of the above pruning practices by working on the key pruning stage.\nFollowing this tutorial for a quick look at how to use NNI to prune a model in a common practice.\n"
"\n# Pruning Quickstart\n\nHere is a three-minute video to get you started with model pruning.\n\n.. youtube:: wKh51Jnr0a8\n :align: center\n\nModel pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.\nThere are three common practices for pruning a DNN model:\n\n#. Pre-training a model -> Pruning the model -> Fine-tuning the pruned model\n#. Pruning a model during training (i.e., pruning aware training) -> Fine-tuning the pruned model\n#. Pruning a model -> Training the pruned model from scratch\n\nNNI supports all of the above pruning practices by working on the key pruning stage.\nFollowing this tutorial for a quick look at how to use NNI to prune a model in a common practice.\n"
]
},
{
......@@ -165,7 +165,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
"version": "3.8.8"
}
},
"nbformat": 4,
......
......@@ -2,6 +2,11 @@
Pruning Quickstart
==================
Here is a three-minute video to get you started with model pruning.
.. youtube:: wKh51Jnr0a8
   :align: center
Model pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.
There are three common practices for pruning a DNN model:
......
930f8ee2f57b70037e3231152a72606c
\ No newline at end of file
33781311d6344b4aebb94db94a96dfd3
\ No newline at end of file
......@@ -21,6 +21,11 @@
Pruning Quickstart
==================
Here is a three-minute video to get you started with model pruning.
.. youtube:: wKh51Jnr0a8
   :align: center
Model pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.
There are three common practices for pruning a DNN model:
......@@ -31,7 +36,7 @@ There are three common practices for pruning a DNN model:
NNI supports all of the above pruning practices by working on the key pruning stage.
Follow this tutorial for a quick look at how to use NNI to prune a model in a common practice.
.. GENERATED FROM PYTHON SOURCE LINES 17-22
.. GENERATED FROM PYTHON SOURCE LINES 22-27
Preparation
-----------
......@@ -39,7 +44,7 @@ Preparation
In this tutorial, we use a simple model and pre-train it on the MNIST dataset.
If you are familiar with defining a model and training in pytorch, you can skip directly to `Pruning Model`_.
.. GENERATED FROM PYTHON SOURCE LINES 22-35
.. GENERATED FROM PYTHON SOURCE LINES 27-40
.. code-block:: default
......@@ -83,7 +88,7 @@ If you are familiar with defining a model and training in pytorch, you can skip
.. GENERATED FROM PYTHON SOURCE LINES 36-47
.. GENERATED FROM PYTHON SOURCE LINES 41-52
.. code-block:: default
......@@ -108,14 +113,14 @@ If you are familiar with defining a model and training in pytorch, you can skip
.. code-block:: none
Average test loss: 0.5368, Accuracy: 8321/10000 (83%)
Average test loss: 0.3092, Accuracy: 9104/10000 (91%)
Average test loss: 0.2070, Accuracy: 9380/10000 (94%)
Average test loss: 0.4925, Accuracy: 8414/10000 (84%)
Average test loss: 0.2626, Accuracy: 9214/10000 (92%)
Average test loss: 0.2006, Accuracy: 9369/10000 (94%)
.. GENERATED FROM PYTHON SOURCE LINES 48-58
.. GENERATED FROM PYTHON SOURCE LINES 53-63
Pruning Model
-------------
......@@ -128,7 +133,7 @@ The following `config_list` means all layers whose type is `Linear` or `Conv2d`
except the layer named `fc3`, because `fc3` is `exclude`.
The final sparsity ratio for each layer is 50%. The layer named `fc3` will not be pruned.
.. GENERATED FROM PYTHON SOURCE LINES 58-67
.. GENERATED FROM PYTHON SOURCE LINES 63-72
.. code-block:: default
......@@ -148,11 +153,11 @@ The final sparsity ratio for each layer is 50%. The layer named `fc3` will not b
.. GENERATED FROM PYTHON SOURCE LINES 68-69
.. GENERATED FROM PYTHON SOURCE LINES 73-74
Pruners usually require `model` and `config_list` as input arguments.
.. GENERATED FROM PYTHON SOURCE LINES 69-76
.. GENERATED FROM PYTHON SOURCE LINES 74-81
.. code-block:: default
......@@ -198,7 +203,7 @@ Pruners usually require `model` and `config_list` as input arguments.
.. GENERATED FROM PYTHON SOURCE LINES 77-84
.. GENERATED FROM PYTHON SOURCE LINES 82-89
.. code-block:: default
......@@ -227,13 +232,13 @@ Pruners usually require `model` and `config_list` as input arguments.
.. GENERATED FROM PYTHON SOURCE LINES 85-88
.. GENERATED FROM PYTHON SOURCE LINES 90-93
Speed up the original model with masks; note that `ModelSpeedup` requires an unwrapped model.
The model becomes smaller after speedup,
and reaches a higher sparsity ratio because `ModelSpeedup` will propagate the masks across layers.
.. GENERATED FROM PYTHON SOURCE LINES 88-97
.. GENERATED FROM PYTHON SOURCE LINES 93-102
.. code-block:: default
......@@ -258,17 +263,17 @@ and reaches a higher sparsity ratio because `ModelSpeedup` will propagate the ma
aten::log_softmax is not Supported! Please report an issue at https://github.com/microsoft/nni. Thanks~
Note: .aten::log_softmax.12 does not have corresponding mask inference object
/home/nishang/anaconda3/envs/MCM/lib/python3.9/site-packages/torch/_tensor.py:1013: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1640811803361/work/build/aten/src/ATen/core/TensorBody.h:417.)
/home/ningshang/anaconda3/envs/nni-dev/lib/python3.8/site-packages/torch/_tensor.py:1013: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:417.)
return self._grad
.. GENERATED FROM PYTHON SOURCE LINES 98-99
.. GENERATED FROM PYTHON SOURCE LINES 103-104
The model becomes physically smaller after speedup.
.. GENERATED FROM PYTHON SOURCE LINES 99-101
.. GENERATED FROM PYTHON SOURCE LINES 104-106
.. code-block:: default
......@@ -301,14 +306,14 @@ the model will become real smaller after speedup
.. GENERATED FROM PYTHON SOURCE LINES 102-106
.. GENERATED FROM PYTHON SOURCE LINES 107-111
Fine-tuning Compacted Model
---------------------------
Note that if the model has been sped up, you need to re-initialize a new optimizer for fine-tuning.
Because speedup will replace the masked big layers with dense small ones.
.. GENERATED FROM PYTHON SOURCE LINES 106-110
.. GENERATED FROM PYTHON SOURCE LINES 111-115
.. code-block:: default
......@@ -326,7 +331,7 @@ Because speedup will replace the masked big layers with dense small ones.
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 58.337 seconds)
**Total running time of the script:** ( 1 minutes 30.730 seconds)
.. _sphx_glr_download_tutorials_pruning_quick_start_mnist.py:
......
.. 5f266ace988c9ca9e44555fdc497e9ba
.. b743ab67f64dd0a0688a8cb184e0e947
.. note::
:class: sphx-glr-download-link-note
......@@ -14,6 +14,11 @@
Pruning Quickstart
==================
Here is a three-minute video to get you started with model pruning.
.. youtube:: wKh51Jnr0a8
   :align: center
Model pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.
There are three common practices for pruning a DNN model:
......
......@@ -15,7 +15,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Quantization Quickstart\n\nQuantization reduces model size and speeds up inference time by reducing the number of bits required to represent weights or activations.\n\nIn NNI, both post-training quantization algorithms and quantization-aware training algorithms are supported.\nHere we use `QAT_Quantizer` as an example to show the usage of quantization in NNI.\n"
"\n# Quantization Quickstart\n\nHere is a four-minute video to get you started with model quantization.\n\n.. youtube:: MSfV7AyfiA4\n :align: center\n\nQuantization reduces model size and speeds up inference time by reducing the number of bits required to represent weights or activations.\n\nIn NNI, both post-training quantization algorithms and quantization-aware training algorithms are supported.\nHere we use `QAT_Quantizer` as an example to show the usage of quantization in NNI.\n"
]
},
{
......@@ -143,7 +143,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
"version": "3.8.8"
}
},
"nbformat": 4,
......
......@@ -2,6 +2,11 @@
Quantization Quickstart
=======================
Here is a four-minute video to get you started with model quantization.
.. youtube:: MSfV7AyfiA4
   :align: center
Quantization reduces model size and speeds up inference time by reducing the number of bits required to represent weights or activations.
In NNI, both post-training quantization algorithms and quantization-aware training algorithms are supported.
......
bceaf8235b437428267b614af06634a0
\ No newline at end of file
2995cef94c5c6c66a6dfa4b5ff28baea
\ No newline at end of file
......@@ -21,12 +21,17 @@
Quantization Quickstart
=======================
Here is a four-minute video to get you started with model quantization.
.. youtube:: MSfV7AyfiA4
   :align: center
Quantization reduces model size and speeds up inference time by reducing the number of bits required to represent weights or activations.
In NNI, both post-training quantization algorithms and quantization-aware training algorithms are supported.
Here we use `QAT_Quantizer` as an example to show the usage of quantization in NNI.
.. GENERATED FROM PYTHON SOURCE LINES 12-17
.. GENERATED FROM PYTHON SOURCE LINES 17-22
Preparation
-----------
......@@ -34,7 +39,7 @@ Preparation
In this tutorial, we use a simple model and pre-train it on the MNIST dataset.
If you are familiar with defining a model and training in pytorch, you can skip directly to `Quantizing Model`_.
.. GENERATED FROM PYTHON SOURCE LINES 17-37
.. GENERATED FROM PYTHON SOURCE LINES 22-42
.. code-block:: default
......@@ -68,14 +73,14 @@ If you are familiar with defining a model and training in pytorch, you can skip
.. code-block:: none
Average test loss: 0.7073, Accuracy: 7624/10000 (76%)
Average test loss: 0.2776, Accuracy: 9122/10000 (91%)
Average test loss: 0.1907, Accuracy: 9412/10000 (94%)
Average test loss: 0.5901, Accuracy: 8293/10000 (83%)
Average test loss: 0.2469, Accuracy: 9245/10000 (92%)
Average test loss: 0.1586, Accuracy: 9531/10000 (95%)
.. GENERATED FROM PYTHON SOURCE LINES 38-43
.. GENERATED FROM PYTHON SOURCE LINES 43-48
Quantizing Model
----------------
......@@ -83,7 +88,7 @@ Quantizing Model
Initialize a `config_list`.
For details about how to write a ``config_list``, please refer to :doc:`compression config specification <../compression/compression_config_list>`.
.. GENERATED FROM PYTHON SOURCE LINES 43-58
.. GENERATED FROM PYTHON SOURCE LINES 48-63
.. code-block:: default
......@@ -109,11 +114,11 @@ Detailed about how to write ``config_list`` please refer :doc:`compression confi
.. GENERATED FROM PYTHON SOURCE LINES 59-60
.. GENERATED FROM PYTHON SOURCE LINES 64-65
finetuning the model by using QAT
.. GENERATED FROM PYTHON SOURCE LINES 60-65
.. GENERATED FROM PYTHON SOURCE LINES 65-70
.. code-block:: default
......@@ -165,13 +170,13 @@ finetuning the model by using QAT
.. GENERATED FROM PYTHON SOURCE LINES 66-69
.. GENERATED FROM PYTHON SOURCE LINES 71-74
The model has now been wrapped, and quantization targets ('quant_types' setting in `config_list`)
will be quantized & dequantized for simulated quantization in the wrapped layers.
QAT is a training-aware quantizer; it updates the scale and zero point during training.
.. GENERATED FROM PYTHON SOURCE LINES 69-74
.. GENERATED FROM PYTHON SOURCE LINES 74-79
.. code-block:: default
......@@ -190,18 +195,18 @@ QAT is a training-aware quantizer, it will update scale and zero point during tr
.. code-block:: none
Average test loss: 0.1542, Accuracy: 9529/10000 (95%)
Average test loss: 0.1133, Accuracy: 9664/10000 (97%)
Average test loss: 0.0919, Accuracy: 9726/10000 (97%)
Average test loss: 0.1333, Accuracy: 9587/10000 (96%)
Average test loss: 0.1076, Accuracy: 9660/10000 (97%)
Average test loss: 0.0957, Accuracy: 9702/10000 (97%)
.. GENERATED FROM PYTHON SOURCE LINES 75-76
.. GENERATED FROM PYTHON SOURCE LINES 80-81
export model and get calibration_config
.. GENERATED FROM PYTHON SOURCE LINES 76-82
.. GENERATED FROM PYTHON SOURCE LINES 81-87
.. code-block:: default
......@@ -221,16 +226,18 @@ export model and get calibration_config
.. code-block:: none
calibration_config: {'conv1': {'weight_bits': 8, 'weight_scale': tensor([0.0031], device='cuda:0'), 'weight_zero_point': tensor([76.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': -0.4242129623889923, 'tracked_max_input': 2.821486711502075}, 'conv2': {'weight_bits': 8, 'weight_scale': tensor([0.0018], device='cuda:0'), 'weight_zero_point': tensor([113.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 12.42452621459961}, 'fc1': {'weight_bits': 8, 'weight_scale': tensor([0.0011], device='cuda:0'), 'weight_zero_point': tensor([124.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 31.650196075439453}, 'fc2': {'weight_bits': 8, 'weight_scale': tensor([0.0013], device='cuda:0'), 'weight_zero_point': tensor([122.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 25.805370330810547}, 'relu1': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 12.499907493591309}, 'relu2': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 32.0243034362793}, 'relu3': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 26.491384506225586}, 'relu4': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 17.662996292114258}}
INFO:nni.compression.pytorch.compressor:Model state_dict saved to ./log/mnist_model.pth
INFO:nni.compression.pytorch.compressor:Mask dict saved to ./log/mnist_calibration.pth
calibration_config: {'conv1': {'weight_bits': 8, 'weight_scale': tensor([0.0029], device='cuda:0'), 'weight_zero_point': tensor([96.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': -0.4242129623889923, 'tracked_max_input': 2.821486711502075}, 'conv2': {'weight_bits': 8, 'weight_scale': tensor([0.0017], device='cuda:0'), 'weight_zero_point': tensor([101.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 10.014460563659668}, 'fc1': {'weight_bits': 8, 'weight_scale': tensor([0.0012], device='cuda:0'), 'weight_zero_point': tensor([118.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 25.994585037231445}, 'fc2': {'weight_bits': 8, 'weight_scale': tensor([0.0012], device='cuda:0'), 'weight_zero_point': tensor([120.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 21.589195251464844}, 'relu1': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 10.066218376159668}, 'relu2': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 26.317869186401367}, 'relu3': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 21.97711944580078}, 'relu4': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 17.56885528564453}}
.. GENERATED FROM PYTHON SOURCE LINES 83-84
.. GENERATED FROM PYTHON SOURCE LINES 88-89
Build a TensorRT engine to make a real speedup. For more information about speedup, please refer to :doc:`quantization_speedup`.
.. GENERATED FROM PYTHON SOURCE LINES 84-90
.. GENERATED FROM PYTHON SOURCE LINES 89-95
.. code-block:: default
......@@ -250,8 +257,8 @@ build tensorRT engine to make a real speedup, for more information about speedup
.. code-block:: none
Loss: 0.09358334274291992 Accuracy: 97.21%
Inference elapsed_time (whole dataset): 0.04445981979370117s
Loss: 0.09545102081298829 Accuracy: 96.98%
Inference elapsed_time (whole dataset): 0.03549933433532715s
......@@ -259,7 +266,7 @@ build tensorRT engine to make a real speedup, for more information about speedup
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 36.499 seconds)
**Total running time of the script:** ( 1 minutes 45.743 seconds)
.. _sphx_glr_download_tutorials_quantization_quick_start_mnist.py:
......
......@@ -5,10 +5,10 @@
Computation times
=================
**01:04.509** total execution time for **tutorials** files:
**01:45.743** total execution time for **tutorials** files:
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_speedup.py` (``quantization_speedup.py``) | 01:04.509 | 0.0 MB |
| :ref:`sphx_glr_tutorials_quantization_quick_start_mnist.py` (``quantization_quick_start_mnist.py``) | 01:45.743 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_hello_nas.py` (``hello_nas.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
......@@ -22,5 +22,5 @@ Computation times
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_customize.py` (``quantization_customize.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_quick_start_mnist.py` (``quantization_quick_start_mnist.py``) | 00:00.000 | 0.0 MB |
| :ref:`sphx_glr_tutorials_quantization_speedup.py` (``quantization_speedup.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
......@@ -2,6 +2,11 @@
Pruning Quickstart
==================
Here is a three-minute video to get you started with model pruning.
.. youtube:: wKh51Jnr0a8
   :align: center
Model pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.
There are three common practices for pruning a DNN model:
......
......@@ -2,6 +2,11 @@
Quantization Quickstart
=======================
Here is a four-minute video to get you started with model quantization.
.. youtube:: MSfV7AyfiA4
   :align: center
Quantization reduces model size and speeds up inference time by reducing the number of bits required to represent weights or activations.
In NNI, both post-training quantization algorithms and quantization-aware training algorithms are supported.
......