
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/quantization_quick_start_mnist.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_tutorials_quantization_quick_start_mnist.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_quantization_quick_start_mnist.py:


Quantization Quickstart
=======================

Here is a four-minute video to get you started with model quantization.

..  youtube:: MSfV7AyfiA4
    :align: center

Quantization reduces model size and speeds up inference by reducing the number of bits required to represent weights or activations.

NNI supports both post-training quantization algorithms and quantization-aware training algorithms.
Here we use ``QAT_Quantizer`` as an example to show how quantization is used in NNI.
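
To make the size reduction concrete, the following back-of-the-envelope sketch compares the storage needed for the weights of the layers quantized in this tutorial at 32 bits and at 8 bits.
The helper ``weight_bytes`` is hypothetical (it is not part of NNI), the weight counts are derived from the layer shapes of the tutorial model, and biases and any per-layer metadata are ignored.

.. code-block:: python

    import math

    def weight_bytes(num_weights, bits):
        """Hypothetical helper: bytes needed to store num_weights values at the given bit width."""
        return math.ceil(num_weights * bits / 8)

    # Weight counts taken from the layer shapes of the model used in this tutorial
    # (biases are ignored in this rough estimate).
    weights = {
        'conv1': 1 * 6 * 5 * 5,
        'conv2': 6 * 16 * 5 * 5,
        'fc1': 256 * 120,
        'fc2': 120 * 84,
    }

    total = sum(weights.values())
    fp32_size = weight_bytes(total, 32)
    int8_size = weight_bytes(total, 8)
    print(f"weights: {total}, float32: {fp32_size} bytes, int8: {int8_size} bytes")
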

.. GENERATED FROM PYTHON SOURCE LINES 17-22

Preparation
-----------

In this tutorial, we use a simple model and pre-train it on the MNIST dataset.
If you are familiar with defining and training a model in PyTorch, you can skip directly to `Quantizing Model`_.

.. GENERATED FROM PYTHON SOURCE LINES 22-42

.. code-block:: default


    import torch
    import torch.nn.functional as F
    from torch.optim import SGD

    from nni_assets.compression.mnist_model import TorchModel, trainer, evaluator, device, test_trt

    # define the model
    model = TorchModel().to(device)

    # define the optimizer and criterion for pre-training

    optimizer = SGD(model.parameters(), 1e-2)
    criterion = F.nll_loss

    # pre-train and evaluate the model on MNIST dataset
    for epoch in range(3):
        trainer(model, optimizer, criterion)
        evaluator(model)





.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Average test loss: 0.8954, Accuracy: 6995/10000 (70%)
    Average test loss: 0.3259, Accuracy: 9046/10000 (90%)
    Average test loss: 0.2125, Accuracy: 9354/10000 (94%)




.. GENERATED FROM PYTHON SOURCE LINES 43-48

Quantizing Model
----------------

Initialize a ``config_list``.
For details on how to write a ``config_list``, please refer to :doc:`compression config specification <../compression/compression_config_list>`.

.. GENERATED FROM PYTHON SOURCE LINES 48-63

.. code-block:: default


    config_list = [{
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'op_types': ['Conv2d']
    }, {
        'quant_types': ['output'],
        'quant_bits': {'output': 8},
        'op_types': ['ReLU']
    }, {
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'op_names': ['fc1', 'fc2']
    }]
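
As a rough illustration of how these entries select layers, the sketch below resolves the configuration against the layer names and types of the tutorial model.
``resolve_targets`` is a hypothetical helper, not an NNI API, and its matching rule (an entry applies when the layer type is listed in ``op_types`` or the layer name is listed in ``op_names``, with the last matching entry winning) is a simplifying assumption rather than NNI's exact logic.

.. code-block:: python

    # Layer names and types of the model used in this tutorial.
    LAYERS = {
        'conv1': 'Conv2d', 'conv2': 'Conv2d',
        'fc1': 'Linear', 'fc2': 'Linear', 'fc3': 'Linear',
        'relu1': 'ReLU', 'relu2': 'ReLU', 'relu3': 'ReLU', 'relu4': 'ReLU',
        'pool1': 'MaxPool2d', 'pool2': 'MaxPool2d',
    }

    config_list = [{
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'op_types': ['Conv2d']
    }, {
        'quant_types': ['output'],
        'quant_bits': {'output': 8},
        'op_types': ['ReLU']
    }, {
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'op_names': ['fc1', 'fc2']
    }]

    def resolve_targets(layers, config_list):
        """Hypothetical helper: map each selected layer name to its quant_types."""
        targets = {}
        for name, op_type in layers.items():
            for entry in config_list:
                by_type = op_type in entry.get('op_types', [])
                by_name = name in entry.get('op_names', [])
                if by_type or by_name:
                    # Assumption: a later matching entry overrides an earlier one.
                    targets[name] = entry['quant_types']
        return targets

    targets = resolve_targets(LAYERS, config_list)
    for name, quant_types in targets.items():
        print(name, quant_types)

Under this assumption, ``fc3`` and the pooling layers are left untouched because no entry names them or their types.
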








.. GENERATED FROM PYTHON SOURCE LINES 64-65

Fine-tune the model using QAT.

.. GENERATED FROM PYTHON SOURCE LINES 65-70

.. code-block:: default

    from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer
    dummy_input = torch.rand(32, 1, 28, 28).to(device)
    quantizer = QAT_Quantizer(model, config_list, optimizer, dummy_input)
    quantizer.compress()





.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none


    TorchModel(
      (conv1): QuantizerModuleWrapper(
        (module): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
      )
      (conv2): QuantizerModuleWrapper(
        (module): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
      )
      (fc1): QuantizerModuleWrapper(
        (module): Linear(in_features=256, out_features=120, bias=True)
      )
      (fc2): QuantizerModuleWrapper(
        (module): Linear(in_features=120, out_features=84, bias=True)
      )
      (fc3): Linear(in_features=84, out_features=10, bias=True)
      (relu1): QuantizerModuleWrapper(
        (module): ReLU()
      )
      (relu2): QuantizerModuleWrapper(
        (module): ReLU()
      )
      (relu3): QuantizerModuleWrapper(
        (module): ReLU()
      )
      (relu4): QuantizerModuleWrapper(
        (module): ReLU()
      )
      (pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
      (pool2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    )



.. GENERATED FROM PYTHON SOURCE LINES 71-74

The model has now been wrapped, and the quantization targets (the ``quant_types`` setting in ``config_list``)
will be quantized and dequantized in the wrapped layers to simulate quantization.
QAT is a quantization-aware training method: it updates the scale and zero point during training.
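
The quantize-then-dequantize round trip can be sketched in plain Python as follows.
This is a generic asymmetric (affine) scheme written for illustration; the function names are hypothetical, and the exact formulas used inside ``QAT_Quantizer`` may differ.

.. code-block:: python

    def affine_params(x_min, x_max, bits=8):
        """Derive a scale and zero point that map [x_min, x_max] onto [0, 2**bits - 1]."""
        qmin, qmax = 0, 2 ** bits - 1
        # Widen the range if needed so that 0.0 is exactly representable.
        x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
        scale = (x_max - x_min) / (qmax - qmin)
        if scale == 0.0:
            scale = 1.0  # degenerate range: every value is zero
        zero_point = round(qmin - x_min / scale)
        zero_point = max(qmin, min(qmax, zero_point))
        return scale, zero_point

    def fake_quantize(x, scale, zero_point, bits=8):
        """Quantize x to an integer grid, then map it back to a float."""
        qmin, qmax = 0, 2 ** bits - 1
        q = round(x / scale) + zero_point
        q = max(qmin, min(qmax, q))       # clamp to the integer range
        return (q - zero_point) * scale   # dequantize

    values = [-0.5, -0.1, 0.0, 0.3, 1.0]
    scale, zero_point = affine_params(min(values), max(values))
    restored = [fake_quantize(v, scale, zero_point) for v in values]
    for original, approx in zip(values, restored):
        print(f"{original:+.4f} -> {approx:+.4f}")

For values inside the tracked range, each restored value differs from the original by at most half a quantization step, which is the error the model learns to tolerate during fine-tuning.
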

.. GENERATED FROM PYTHON SOURCE LINES 74-79

.. code-block:: default


    for epoch in range(3):
        trainer(model, optimizer, criterion)
        evaluator(model)





.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Average test loss: 0.1858, Accuracy: 9438/10000 (94%)
    Average test loss: 0.1420, Accuracy: 9564/10000 (96%)
    Average test loss: 0.1213, Accuracy: 9632/10000 (96%)




.. GENERATED FROM PYTHON SOURCE LINES 80-81

Export the model and get ``calibration_config``.

.. GENERATED FROM PYTHON SOURCE LINES 81-87

.. code-block:: default

    model_path = "./log/mnist_model.pth"
    calibration_path = "./log/mnist_calibration.pth"
    calibration_config = quantizer.export_model(model_path, calibration_path)

    print("calibration_config: ", calibration_config)





.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    calibration_config:  {'conv1': {'weight_bits': 8, 'weight_scale': tensor([0.0039], device='cuda:0'), 'weight_zero_point': tensor([82.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': -0.4242129623889923, 'tracked_max_input': 2.821486711502075}, 'conv2': {'weight_bits': 8, 'weight_scale': tensor([0.0019], device='cuda:0'), 'weight_zero_point': tensor([127.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 18.87591552734375}, 'fc1': {'weight_bits': 8, 'weight_scale': tensor([0.0010], device='cuda:0'), 'weight_zero_point': tensor([123.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 26.67470932006836}, 'fc2': {'weight_bits': 8, 'weight_scale': tensor([0.0012], device='cuda:0'), 'weight_zero_point': tensor([129.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 21.60409164428711}, 'relu1': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 18.998125076293945}, 'relu2': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 27.000442504882812}, 'relu3': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 22.2519588470459}, 'relu4': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 17.8553524017334}}
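
The printed dictionary is dense, so a small sketch of how one might read it may help.
The snippet copies only the ReLU entries from the output above and computes the width of one quantization step for each tracked range.
``step_size`` is a hypothetical helper, and the uniform-grid formula is an assumption made for illustration; it is not a statement about how the exported ranges are consumed downstream.

.. code-block:: python

    # Subset of the calibration_config printed above (ReLU entries only).
    calibration_subset = {
        'relu1': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 18.998125076293945},
        'relu2': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 27.000442504882812},
        'relu3': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 22.2519588470459},
        'relu4': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 17.8553524017334},
    }

    def step_size(entry, kind='output'):
        """Hypothetical helper: width of one step on a uniform grid over the tracked range."""
        bits = entry[f'{kind}_bits']
        low = entry[f'tracked_min_{kind}']
        high = entry[f'tracked_max_{kind}']
        return (high - low) / (2 ** bits - 1)

    for name, entry in calibration_subset.items():
        low = entry['tracked_min_output']
        high = entry['tracked_max_output']
        print(f"{name}: range=[{low:.3f}, {high:.3f}], step={step_size(entry):.4f}")
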




.. GENERATED FROM PYTHON SOURCE LINES 88-89

Build a TensorRT engine to get a real speedup. For more information about speedup, please refer to :doc:`quantization_speedup`.

.. GENERATED FROM PYTHON SOURCE LINES 89-95

.. code-block:: default


    from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT
    input_shape = (32, 1, 28, 28)
    engine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=32)
    engine.compress()
    test_trt(engine)




.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Loss: 0.12193695755004882  Accuracy: 96.38%
    Inference elapsed_time (whole dataset): 0.036092281341552734s





.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 1 minutes  39.686 seconds)


.. _sphx_glr_download_tutorials_quantization_quick_start_mnist.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example



  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: quantization_quick_start_mnist.py <quantization_quick_start_mnist.py>`



  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: quantization_quick_start_mnist.ipynb <quantization_quick_start_mnist.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_