
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/quantization_quick_start_mnist.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_tutorials_quantization_quick_start_mnist.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_quantization_quick_start_mnist.py:


Quantization Quickstart
=======================

Here is a four-minute video to get you started with model quantization.

..  youtube:: MSfV7AyfiA4
    :align: center

Quantization reduces model size and speeds up inference by reducing the number of bits used to represent weights or activations.
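
As a rough, upper-bound illustration of the size reduction, the following sketch counts the parameters of the ``TorchModel`` used in this tutorial (layer shapes read off the model printout further down) and compares 32-bit floating-point storage with 8-bit storage. It assumes every weight and bias is stored at the same bit width and ignores the small overhead of per-layer scales and zero points. The ``config_list`` in this tutorial leaves some parameters (for example ``fc3``) in floating point, so the actual saving would be smaller.

.. code-block:: python

    # (layer name, number of weight elements, number of bias elements),
    # taken from the TorchModel printout in this tutorial
    layers = [
        ("conv1", 6 * 1 * 5 * 5, 6),
        ("conv2", 16 * 6 * 5 * 5, 16),
        ("fc1", 120 * 256, 120),
        ("fc2", 84 * 120, 84),
        ("fc3", 10 * 84, 10),
    ]

    def storage_bytes(n_params, bits):
        """Bytes needed to store n_params values at the given bit width."""
        return n_params * bits // 8

    n_params = sum(w + b for _, w, b in layers)
    fp32 = storage_bytes(n_params, 32)
    int8 = storage_bytes(n_params, 8)
    print(n_params, fp32, int8, fp32 / int8)  # 44426 177704 44426 4.0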

NNI supports both post-training quantization algorithms and quantization-aware training (QAT) algorithms.
This tutorial uses ``QAT_Quantizer`` as an example to show how to quantize a model in NNI.

.. GENERATED FROM PYTHON SOURCE LINES 17-22

Preparation
-----------

In this tutorial, we use a simple model and pre-train it on the MNIST dataset.
If you are familiar with defining and training a model in PyTorch, you can skip directly to `Quantizing Model`_.

.. GENERATED FROM PYTHON SOURCE LINES 22-42

.. code-block:: default


    import torch
    import torch.nn.functional as F
    from torch.optim import SGD

    from scripts.compression_mnist_model import TorchModel, trainer, evaluator, device, test_trt

    # define the model
    model = TorchModel().to(device)

    # define the optimizer and criterion for pre-training

    optimizer = SGD(model.parameters(), 1e-2)
    criterion = F.nll_loss

    # pre-train and evaluate the model on MNIST dataset
    for epoch in range(3):
        trainer(model, optimizer, criterion)
        evaluator(model)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Average test loss: 0.5901, Accuracy: 8293/10000 (83%)
    Average test loss: 0.2469, Accuracy: 9245/10000 (92%)
    Average test loss: 0.1586, Accuracy: 9531/10000 (95%)


.. GENERATED FROM PYTHON SOURCE LINES 43-48

Quantizing Model
----------------

Initialize a ``config_list``.
For details on how to write a ``config_list``, please refer to :doc:`compression config specification <../compression/compression_config_list>`.

.. GENERATED FROM PYTHON SOURCE LINES 48-63

.. code-block:: default


    config_list = [{
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'op_types': ['Conv2d']
    }, {
        'quant_types': ['output'],
        'quant_bits': {'output': 8},
        'op_types': ['ReLU']
    }, {
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'op_names': ['fc1', 'fc2']
    }]
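
Each entry in ``config_list`` selects layers by type (``op_types``) and/or by name (``op_names``). The following sketch shows which layers of this tutorial's model the entries above would select, under a simplified matching rule assumed here: an entry matches a layer only if every filter it specifies matches, and when several entries match the same layer the last one wins. ``select_entry`` is a hypothetical helper written for illustration, not an NNI API; the configuration specification linked above remains the authoritative reference (for example, ``exclude`` entries are not handled here).

.. code-block:: python

    config_list = [{
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'op_types': ['Conv2d']
    }, {
        'quant_types': ['output'],
        'quant_bits': {'output': 8},
        'op_types': ['ReLU']
    }, {
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'op_names': ['fc1', 'fc2']
    }]

    # (layer name, op type) pairs mirroring the TorchModel layers in this tutorial
    modules = [
        ("conv1", "Conv2d"), ("conv2", "Conv2d"),
        ("fc1", "Linear"), ("fc2", "Linear"), ("fc3", "Linear"),
        ("relu1", "ReLU"), ("relu2", "ReLU"), ("relu3", "ReLU"), ("relu4", "ReLU"),
        ("pool1", "MaxPool2d"), ("pool2", "MaxPool2d"),
    ]

    def select_entry(name, op_type, config_list):
        """Return the last entry whose specified filters all match, or None (simplified rule)."""
        chosen = None
        for entry in config_list:
            if 'op_types' in entry and op_type not in entry['op_types']:
                continue
            if 'op_names' in entry and name not in entry['op_names']:
                continue
            chosen = entry
        return chosen

    # map each selected layer to the bit widths it would be quantized with
    plan = {}
    for name, op_type in modules:
        entry = select_entry(name, op_type, config_list)
        if entry is not None:
            plan[name] = entry['quant_bits']

    for name, bits in plan.items():
        print(name, bits)

With this rule, ``conv1``, ``conv2``, ``fc1``, ``fc2`` and the four ReLU layers are selected, while ``fc3`` and the pooling layers are not. This agrees with the layers that end up wrapped in ``QuantizerModuleWrapper`` after ``quantizer.compress()`` below.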


.. GENERATED FROM PYTHON SOURCE LINES 64-65

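Wrap the model with ``QAT_Quantizer`` to prepare it for quantization-aware fine-tuning.
``dummy_input`` is a random batch with the MNIST input shape (32 images, 1 channel, 28x28 pixels).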

.. GENERATED FROM PYTHON SOURCE LINES 65-70

.. code-block:: default

    from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer
    dummy_input = torch.rand(32, 1, 28, 28).to(device)
    quantizer = QAT_Quantizer(model, config_list, optimizer, dummy_input)
    quantizer.compress()


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none


    TorchModel(
      (conv1): QuantizerModuleWrapper(
        (module): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
      )
      (conv2): QuantizerModuleWrapper(
        (module): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
      )
      (fc1): QuantizerModuleWrapper(
        (module): Linear(in_features=256, out_features=120, bias=True)
      )
      (fc2): QuantizerModuleWrapper(
        (module): Linear(in_features=120, out_features=84, bias=True)
      )
      (fc3): Linear(in_features=84, out_features=10, bias=True)
      (relu1): QuantizerModuleWrapper(
        (module): ReLU()
      )
      (relu2): QuantizerModuleWrapper(
        (module): ReLU()
      )
      (relu3): QuantizerModuleWrapper(
        (module): ReLU()
      )
      (relu4): QuantizerModuleWrapper(
        (module): ReLU()
      )
      (pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
      (pool2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    )


.. GENERATED FROM PYTHON SOURCE LINES 71-74

The model has now been wrapped. In each wrapped layer, the quantization targets (the ``quant_types`` setting in ``config_list``)
are quantized and then dequantized, which simulates quantization while the model still computes in floating point.
QAT is a training-aware quantizer: it updates the scale and zero point during training, so we fine-tune the model below.
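
To make the simulated quantization step concrete, here is a minimal pure-Python sketch of 8-bit affine (asymmetric) fake quantization: a value is mapped to an integer using a scale and zero point, clamped to the representable range, and mapped back to a float. It illustrates the general technique under that assumption and is not NNI's implementation. ``affine_qparams`` and ``fake_quantize`` are hypothetical helpers, and ``QAT_Quantizer`` may differ in details such as how value ranges are tracked during training and how gradients pass through the rounding step.

.. code-block:: python

    def affine_qparams(x_min, x_max, bits=8):
        """Scale and zero point mapping [x_min, x_max] onto the integers [0, 2**bits - 1]."""
        qmax = 2 ** bits - 1
        # widen the range to contain 0.0 so that zero is exactly representable
        x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
        scale = (x_max - x_min) / qmax if x_max > x_min else 1.0
        zero_point = max(0, min(qmax, round(-x_min / scale)))
        return scale, zero_point

    def fake_quantize(values, scale, zero_point, bits=8):
        """Quantize each value to an integer, clamp it, then dequantize back to a float."""
        qmax = 2 ** bits - 1
        result = []
        for v in values:
            q = max(0, min(qmax, round(v / scale) + zero_point))  # quantize and clamp
            result.append((q - zero_point) * scale)               # dequantize
        return result

    # example: a non-negative range, like the ReLU outputs tracked in this tutorial
    scale, zp = affine_qparams(0.0, 10.0)
    print(scale, zp)
    print(fake_quantize([0.0, 2.5, 10.0, 12.0], scale, zp))

Values inside the range come back within half a quantization step (``scale / 2``) of the original, while values outside it, such as ``12.0`` here, are clipped to the range boundary.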

.. GENERATED FROM PYTHON SOURCE LINES 74-79

.. code-block:: default


    for epoch in range(3):
        trainer(model, optimizer, criterion)
        evaluator(model)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Average test loss: 0.1333, Accuracy: 9587/10000 (96%)
    Average test loss: 0.1076, Accuracy: 9660/10000 (97%)
    Average test loss: 0.0957, Accuracy: 9702/10000 (97%)


.. GENERATED FROM PYTHON SOURCE LINES 80-81

Export the model and get the ``calibration_config``.

.. GENERATED FROM PYTHON SOURCE LINES 81-87

.. code-block:: default

    import os

    model_path = "./log/mnist_model.pth"
    calibration_path = "./log/mnist_calibration.pth"
    # make sure the output directory exists before exporting
    os.makedirs("./log", exist_ok=True)
    calibration_config = quantizer.export_model(model_path, calibration_path)

    print("calibration_config: ", calibration_config)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    INFO:nni.compression.pytorch.compressor:Model state_dict saved to ./log/mnist_model.pth
    INFO:nni.compression.pytorch.compressor:Mask dict saved to ./log/mnist_calibration.pth
    calibration_config:  {'conv1': {'weight_bits': 8, 'weight_scale': tensor([0.0029], device='cuda:0'), 'weight_zero_point': tensor([96.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': -0.4242129623889923, 'tracked_max_input': 2.821486711502075}, 'conv2': {'weight_bits': 8, 'weight_scale': tensor([0.0017], device='cuda:0'), 'weight_zero_point': tensor([101.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 10.014460563659668}, 'fc1': {'weight_bits': 8, 'weight_scale': tensor([0.0012], device='cuda:0'), 'weight_zero_point': tensor([118.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 25.994585037231445}, 'fc2': {'weight_bits': 8, 'weight_scale': tensor([0.0012], device='cuda:0'), 'weight_zero_point': tensor([120.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 21.589195251464844}, 'relu1': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 10.066218376159668}, 'relu2': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 26.317869186401367}, 'relu3': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 21.97711944580078}, 'relu4': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 17.56885528564453}}
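
The returned ``calibration_config`` is a plain dictionary keyed by layer name. As an illustration of its structure, the sketch below walks a few entries copied from the output above (the one-element CUDA tensors are replaced by the floats they display, so values such as ``weight_scale`` are rounded) and lists each tracked activation range. ``tracked_ranges`` is a hypothetical helper, and its ``step`` column is a derived quantity assumed for illustration: the width of one quantization level if the tracked range were split evenly into ``2**bits - 1`` intervals. It is not a value reported by NNI, and the TensorRT backend may use these ranges differently.

.. code-block:: python

    # a subset of the calibration_config printed above, tensors replaced by floats
    calibration_config = {
        'conv1': {'weight_bits': 8, 'weight_scale': 0.0029, 'weight_zero_point': 96.0,
                  'input_bits': 8, 'tracked_min_input': -0.4242129623889923,
                  'tracked_max_input': 2.821486711502075},
        'fc1': {'weight_bits': 8, 'weight_scale': 0.0012, 'weight_zero_point': 118.0,
                'input_bits': 8, 'tracked_min_input': 0.0,
                'tracked_max_input': 25.994585037231445},
        'relu1': {'output_bits': 8, 'tracked_min_output': 0.0,
                  'tracked_max_output': 10.066218376159668},
    }

    def tracked_ranges(config):
        """Yield (layer, kind, bits, min, max, step) for every tracked input/output range."""
        for layer, entry in config.items():
            for kind in ('input', 'output'):
                if f'tracked_min_{kind}' in entry:
                    bits = entry[f'{kind}_bits']
                    lo = entry[f'tracked_min_{kind}']
                    hi = entry[f'tracked_max_{kind}']
                    yield layer, kind, bits, lo, hi, (hi - lo) / (2 ** bits - 1)

    for layer, kind, bits, lo, hi, step in tracked_ranges(calibration_config):
        print(f'{layer:6s} {kind:6s} {bits}-bit range [{lo:.3f}, {hi:.3f}] step ~{step:.4f}')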


.. GENERATED FROM PYTHON SOURCE LINES 88-89

Build a TensorRT engine to get a real speedup. For more information about speedup, please refer to :doc:`quantization_speedup`.

.. GENERATED FROM PYTHON SOURCE LINES 89-95

.. code-block:: default


    from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT
    input_shape = (32, 1, 28, 28)
    engine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=32)
    engine.compress()
    test_trt(engine)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Loss: 0.09545102081298829  Accuracy: 96.98%
    Inference elapsed_time (whole dataset): 0.03549933433532715s
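
The elapsed time above is most informative next to a baseline, for example the original PyTorch model run on the same batches. The helper below is a generic sketch for such a measurement; ``time_inference`` and its arguments are hypothetical and not part of NNI. For models on the GPU, passing ``sync=torch.cuda.synchronize`` makes the timer wait for queued CUDA work. Because the timing method inside ``test_trt`` is not shown here, numbers obtained this way are only roughly comparable with its output.

.. code-block:: python

    import time

    def time_inference(run_batch, batches, warmup=1, sync=None):
        """Return the seconds spent calling run_batch(batch) once per batch.

        The first `warmup` batches are run beforehand and excluded from the timing.
        `sync`, if given, is called before starting and before stopping the clock.
        """
        for batch in batches[:warmup]:
            run_batch(batch)
        if sync is not None:
            sync()
        start = time.perf_counter()
        for batch in batches:
            run_batch(batch)
        if sync is not None:
            sync()
        return time.perf_counter() - start

    # Possible usage with the original PyTorch model (not run here), where
    # `batches` is a list of input tensors already on the GPU:
    #
    #     model.eval()
    #     with torch.no_grad():
    #         seconds = time_inference(model, batches, sync=torch.cuda.synchronize)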


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 1 minutes  45.743 seconds)


.. _sphx_glr_download_tutorials_quantization_quick_start_mnist.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: quantization_quick_start_mnist.py <quantization_quick_start_mnist.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: quantization_quick_start_mnist.ipynb <quantization_quick_start_mnist.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_