.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/quantization_quick_start_mnist.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_tutorials_quantization_quick_start_mnist.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_quantization_quick_start_mnist.py:


Quantization Quickstart
=======================

Here is a four-minute video to get you started with model quantization.

.. youtube:: MSfV7AyfiA4
    :align: center

Quantization reduces model size and speeds up inference by reducing the number of bits used to represent weights or activations.

NNI supports both post-training quantization algorithms and quantization-aware training (QAT) algorithms.
Here we use ``QAT_Quantizer`` as an example to show how to use quantization in NNI.

.. GENERATED FROM PYTHON SOURCE LINES 17-22

Preparation
-----------

In this tutorial, we use a simple model and pre-train it on the MNIST dataset.
If you are familiar with defining a model and training it in PyTorch, you can skip directly to `Quantizing Model`_.

.. GENERATED FROM PYTHON SOURCE LINES 22-42

.. code-block:: default


    import torch
    import torch.nn.functional as F
    from torch.optim import SGD

    from nni_assets.compression.mnist_model import TorchModel, trainer, evaluator, device, test_trt

    # define the model
    model = TorchModel().to(device)

    # define the optimizer and criterion for pre-training
    optimizer = SGD(model.parameters(), lr=1e-2)
    criterion = F.nll_loss

    # pre-train and evaluate the model on the MNIST dataset
    for epoch in range(3):
        trainer(model, optimizer, criterion)
        evaluator(model)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Average test loss: 0.8954, Accuracy: 6995/10000 (70%)
    Average test loss: 0.3259, Accuracy: 9046/10000 (90%)
    Average test loss: 0.2125, Accuracy: 9354/10000 (94%)

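
Before moving on to the quantizer, it may help to see what reducing the number of bits does to individual values.
The sketch below uses a common asymmetric (affine) 8-bit scheme, in which a float range ``[x_min, x_max]`` is mapped onto the integer codes 0 to 255 through a ``scale`` and a ``zero_point``.
The helper names ``affine_params``, ``quantize`` and ``dequantize`` are hypothetical, and this is a generic illustration rather than NNI's internal implementation.
The sample endpoints roughly match the ``conv1`` input range that the calibration config reports later in this tutorial.

.. code-block:: python

    # Hypothetical helpers illustrating asymmetric (affine) 8-bit quantization.
    # A generic sketch, not NNI's internal implementation.


    def affine_params(x_min, x_max, bits=8):
        """Return (scale, zero_point) mapping [x_min, x_max] onto [0, 2**bits - 1]."""
        qmax = 2 ** bits - 1
        # Widen the range to include 0.0 so that zero is exactly representable.
        x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
        scale = (x_max - x_min) / qmax if x_max > x_min else 1.0
        zero_point = max(0, min(qmax, int(round(-x_min / scale))))
        return scale, zero_point


    def quantize(x, scale, zero_point, bits=8):
        """Map a float to an integer code, clamped to [0, 2**bits - 1]."""
        qmax = 2 ** bits - 1
        return max(0, min(qmax, int(round(x / scale)) + zero_point))


    def dequantize(q, scale, zero_point):
        """Map an integer code back to an approximate float."""
        return (q - zero_point) * scale


    # Sample values; the endpoints roughly match conv1's tracked input range.
    values = [-0.42, 0.0, 0.7, 1.5, 2.82]
    scale, zp = affine_params(min(values), max(values))
    codes = [quantize(v, scale, zp) for v in values]
    restored = [dequantize(q, scale, zp) for q in codes]  # quantize -> dequantize round trip

    for v, q, r in zip(values, codes, restored):
        print(f"{v:+.4f} -> code {q:3d} -> {r:+.4f}")

The quantize-then-dequantize round trip keeps each value as a float but restricts it to one of 256 levels; for values inside the range, the rounding error is at most ``scale / 2``.
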

.. GENERATED FROM PYTHON SOURCE LINES 43-48

Quantizing Model
----------------

Initialize a ``config_list``.
The configuration below quantizes the inputs and weights of every ``Conv2d`` layer and of the layers named ``fc1`` and ``fc2``, and the outputs of every ``ReLU`` layer, all to 8 bits.
For details on how to write a ``config_list``, please refer to the :doc:`compression config specification <../compression/compression_config_list>`.

.. GENERATED FROM PYTHON SOURCE LINES 48-63

.. code-block:: default


    config_list = [{
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'op_types': ['Conv2d']
    }, {
        'quant_types': ['output'],
        'quant_bits': {'output': 8},
        'op_types': ['ReLU']
    }, {
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'op_names': ['fc1', 'fc2']
    }]


.. GENERATED FROM PYTHON SOURCE LINES 64-65

Fine-tune the model using QAT.
First, create a ``QAT_Quantizer`` from the pre-trained model, the ``config_list``, the optimizer and a dummy input batch with the MNIST input shape, then call ``compress()`` to wrap the target layers.

.. GENERATED FROM PYTHON SOURCE LINES 65-70

.. code-block:: default


    from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer

    # a dummy batch with the MNIST input shape (batch, channel, height, width)
    dummy_input = torch.rand(32, 1, 28, 28).to(device)
    quantizer = QAT_Quantizer(model, config_list, optimizer, dummy_input)
    quantizer.compress()


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    TorchModel(
      (conv1): QuantizerModuleWrapper(
        (module): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
      )
      (conv2): QuantizerModuleWrapper(
        (module): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
      )
      (fc1): QuantizerModuleWrapper(
        (module): Linear(in_features=256, out_features=120, bias=True)
      )
      (fc2): QuantizerModuleWrapper(
        (module): Linear(in_features=120, out_features=84, bias=True)
      )
      (fc3): Linear(in_features=84, out_features=10, bias=True)
      (relu1): QuantizerModuleWrapper(
        (module): ReLU()
      )
      (relu2): QuantizerModuleWrapper(
        (module): ReLU()
      )
      (relu3): QuantizerModuleWrapper(
        (module): ReLU()
      )
      (relu4): QuantizerModuleWrapper(
        (module): ReLU()
      )
      (pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
      (pool2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    )

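
The printed model shows which layers were wrapped: ``conv1``, ``conv2``, ``fc1``, ``fc2`` and the four ``ReLU`` layers, but not ``fc3`` or the pooling layers.
The sketch below reproduces that selection from ``config_list`` with a hypothetical helper ``select_layers``.
It assumes a simple rule, namely that an entry selects a layer when the layer's type is listed in ``op_types`` or its name is listed in ``op_names`` (each entry here uses only one of the two keys).
It mirrors the printed result for this particular config only; it is not NNI's matching code, and the layer list is copied by hand from the printed model.

.. code-block:: python

    # Hypothetical re-creation of how the config_list above selects layers.
    # Assumption: an entry matches a layer if the layer's type is in 'op_types'
    # or its name is in 'op_names'. This is not NNI's actual matching code.

    config_list = [{
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'op_types': ['Conv2d']
    }, {
        'quant_types': ['output'],
        'quant_bits': {'output': 8},
        'op_types': ['ReLU']
    }, {
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'op_names': ['fc1', 'fc2']
    }]

    # (name, type) pairs copied from the printed TorchModel.
    layers = [
        ('conv1', 'Conv2d'), ('conv2', 'Conv2d'),
        ('fc1', 'Linear'), ('fc2', 'Linear'), ('fc3', 'Linear'),
        ('relu1', 'ReLU'), ('relu2', 'ReLU'), ('relu3', 'ReLU'), ('relu4', 'ReLU'),
        ('pool1', 'MaxPool2d'), ('pool2', 'MaxPool2d'),
    ]


    def select_layers(layers, config_list):
        """Map each selected layer name to its {quant_type: bits} settings.

        Assumes 'quant_bits' is a per-type dict, as in this config_list.
        """
        selected = {}
        for name, op_type in layers:
            for entry in config_list:
                if op_type in entry.get('op_types', []) or name in entry.get('op_names', []):
                    selected[name] = {t: entry['quant_bits'][t] for t in entry['quant_types']}
                    break  # no layer here matches more than one entry
        return selected


    for name, bits in select_layers(layers, config_list).items():
        print(name, bits)

Layers that no entry selects, such as ``fc3`` and the pooling layers, simply do not appear in the result, which matches the unwrapped modules in the printed model.
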

.. GENERATED FROM PYTHON SOURCE LINES 71-74

The model has now been wrapped. Only the layers selected by ``config_list`` are wrapped, so ``fc3`` and the pooling layers are left unchanged.
In each wrapped layer, the quantization targets (set by ``quant_types`` in ``config_list``) are quantized and then dequantized, which simulates the effect of quantization while the computation stays in floating point.
QAT is a quantization-aware training algorithm, so it updates the scale and zero point during training.
Fine-tune the wrapped model for three more epochs:

.. GENERATED FROM PYTHON SOURCE LINES 74-79

.. code-block:: default


    for epoch in range(3):
        trainer(model, optimizer, criterion)
        evaluator(model)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Average test loss: 0.1858, Accuracy: 9438/10000 (94%)
    Average test loss: 0.1420, Accuracy: 9564/10000 (96%)
    Average test loss: 0.1213, Accuracy: 9632/10000 (96%)


.. GENERATED FROM PYTHON SOURCE LINES 80-81

Export the model and obtain ``calibration_config``.
``export_model`` saves the model weights to ``model_path`` and the calibration parameters to ``calibration_path``, and returns the calibration parameters as a dict keyed by layer name.

.. GENERATED FROM PYTHON SOURCE LINES 81-87

.. code-block:: default


    import os

    # make sure the output directory exists before saving
    os.makedirs("./log", exist_ok=True)

    model_path = "./log/mnist_model.pth"
    calibration_path = "./log/mnist_calibration.pth"
    calibration_config = quantizer.export_model(model_path, calibration_path)

    print("calibration_config: ", calibration_config)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    calibration_config:  {'conv1': {'weight_bits': 8, 'weight_scale': tensor([0.0039], device='cuda:0'), 'weight_zero_point': tensor([82.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': -0.4242129623889923, 'tracked_max_input': 2.821486711502075}, 'conv2': {'weight_bits': 8, 'weight_scale': tensor([0.0019], device='cuda:0'), 'weight_zero_point': tensor([127.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 18.87591552734375}, 'fc1': {'weight_bits': 8, 'weight_scale': tensor([0.0010], device='cuda:0'), 'weight_zero_point': tensor([123.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 26.67470932006836}, 'fc2': {'weight_bits': 8, 'weight_scale': tensor([0.0012], device='cuda:0'), 'weight_zero_point': tensor([129.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 21.60409164428711}, 'relu1': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 18.998125076293945}, 'relu2': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 27.000442504882812}, 'relu3': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 22.2519588470459}, 'relu4': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 17.8553524017334}}


.. GENERATED FROM PYTHON SOURCE LINES 88-89

Build a TensorRT engine to obtain a real speedup; the simulated quantization used during QAT does not by itself make inference faster.
For more information about speedup, please refer to :doc:`quantization_speedup`.

.. GENERATED FROM PYTHON SOURCE LINES 89-95

.. code-block:: default


    from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT

    input_shape = (32, 1, 28, 28)
    engine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=32)
    engine.compress()
    test_trt(engine)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Loss: 0.12193695755004882 Accuracy: 96.38%
    Inference elapsed_time (whole dataset): 0.036092281341552734s

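
The introduction states that quantization reduces model size, while the runs above report only accuracy and inference time.
As a rough, hypothetical estimate based on the layer shapes printed earlier, the sketch below counts each layer's parameters and compares pure float32 storage with storage in which the weights of the layers configured for weight quantization (``conv1``, ``conv2``, ``fc1``, ``fc2``) take one byte each.
It assumes biases and ``fc3`` stay in float32 and ignores scale/zero-point metadata and TensorRT engine overhead, so it illustrates the arithmetic and is not a measurement of the exported file or the engine.

.. code-block:: python

    # Back-of-the-envelope storage estimate for the MNIST model printed above.
    # Assumptions (not measurements): layers whose weights are quantized in
    # config_list store each weight in 1 byte; biases and the unwrapped fc3 stay
    # in 4-byte float32; scale/zero-point metadata is ignored.


    def conv_params(in_ch, out_ch, k):
        """(weight count, bias count) of a Conv2d with a k x k kernel and a bias."""
        return in_ch * out_ch * k * k, out_ch


    def linear_params(in_features, out_features):
        """(weight count, bias count) of a Linear layer with a bias."""
        return in_features * out_features, out_features


    # Shapes copied from the printed TorchModel.
    layer_params = {
        'conv1': conv_params(1, 6, 5),
        'conv2': conv_params(6, 16, 5),
        'fc1': linear_params(256, 120),
        'fc2': linear_params(120, 84),
        'fc3': linear_params(84, 10),
    }
    # Layers whose 'weight' appears in quant_types in config_list.
    weight_quantized = {'conv1', 'conv2', 'fc1', 'fc2'}

    fp32_bytes = sum(4 * (w + b) for w, b in layer_params.values())
    mixed_bytes = sum(
        (1 if name in weight_quantized else 4) * w + 4 * b
        for name, (w, b) in layer_params.items()
    )

    print(f"all float32: {fp32_bytes} bytes")
    print(f"8-bit weights where configured: {mixed_bytes} bytes")
    print(f"ratio: {fp32_bytes / mixed_bytes:.2f}x")

Under these assumptions the estimate is 177,704 bytes versus 47,654 bytes, roughly 3.7 times smaller; the actual saving depends on how the exported weights or the TensorRT engine store each tensor.
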

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 1 minutes 39.686 seconds)


.. _sphx_glr_download_tutorials_quantization_quick_start_mnist.py:


.. only:: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: quantization_quick_start_mnist.py <quantization_quick_start_mnist.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: quantization_quick_start_mnist.ipynb <quantization_quick_start_mnist.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_