@@ -31,7 +31,6 @@ We further elaborate on the two methods, pruning and quantization, in the follow
NNI provides an easy-to-use toolkit to help users design and use model pruning and quantization algorithms.
To compress their models, users only need to add several lines to their code.
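For example, a minimal pruning workflow looks like the sketch below (assuming the PyTorch ``L1NormPruner`` and the NNI 2.x import path; ``model`` is an existing ``torch.nn.Module``):

.. code-block:: python

    from nni.compression.pytorch.pruning import L1NormPruner

    # prune 50% of the output channels of every Conv2d layer, ranked by L1 norm
    config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]

    pruner = L1NormPruner(model, config_list)
    # compress() only generates masks; it does not yet shrink the network
    _, masks = pruner.compress()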
Some popular model compression algorithms are built into NNI.
Users can further use NNI’s auto-tuning power to find the best compressed model, which is detailed in Auto Model Compression.
On the other hand, users can easily customize their new compression algorithms using NNI’s interface.
There are several core features supported by NNI model compression:
...
@@ -54,7 +53,7 @@ If users want to apply both, a sequential mode is recommended as common practise
.. note::
Note that NNI pruners and quantizers are not meant to physically compact the model; they only simulate the compression effect. In contrast, the NNI speedup tool can truly compress a model by changing the network architecture and therefore reduce latency.
To obtain a truly compact model, users should conduct :doc:`pruning speedup <../tutorials/cp_pruning_speedup>` or :doc:`quantization speedup <../tutorials/cp_quantization_speedup>`.
The interface and APIs are unified for both PyTorch and TensorFlow. Currently, only the PyTorch version is supported; the TensorFlow version will be supported in the future.
...
@@ -69,7 +68,7 @@ Pruning algorithms compress the original network by removing redundant weights o
* - Name
- Brief Introduction of Algorithm
* - :ref:`level-pruner`
- Pruning the specified ratio of weight elements based on the absolute values of the weight elements
* - :ref:`l1-norm-pruner`
- Pruning output channels with the smallest L1 norm of weights (Pruning Filters for Efficient Convnets) `Reference Paper <https://arxiv.org/abs/1608.08710>`__
* - :ref:`l2-norm-pruner`
...
@@ -140,8 +139,8 @@ The following figure shows how NNI prunes and speeds up your models.
:scale: 40%
:alt:
The detailed tutorial of Speedup Model with Mask can be found :doc:`here <../tutorials/cp_pruning_speedup>`.
The detailed tutorial of Speedup Model with Calibration Config can be found :doc:`here <../tutorials/cp_quantization_speedup>`.
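As a rough sketch of the mask-based speedup (assuming ``ModelSpeedup`` from ``nni.compression.pytorch`` and the ``masks`` produced by a pruner; exact argument names may differ between NNI versions):

.. code-block:: python

    import torch
    from nni.compression.pytorch import ModelSpeedup

    # the pruner only simulates sparsity through masks, so unwrap the model first
    pruner._unwrap_model()

    # physically remove the masked channels and shrink the dependent layers
    ModelSpeedup(model, dummy_input=torch.rand(1, 3, 224, 224), masks_file=masks).speedup_model()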
In these pruning algorithms, the pruner will prune each layer separately. While pruning a layer,
the algorithm quantifies the importance of each filter based on specific metrics (such as the L1 norm) and prunes the less important output channels.
We use the pruning of convolutional layers as an example to explain the dependency-aware mode.
As the :ref:`topology analysis utils <topology-analysis>` show, if the output channels of two convolutional layers (conv1, conv2) are added together,
then these two convolutional layers have a channel dependency with each other (for more details, please see :ref:`ChannelDependency <topology-analysis>`).
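A small sketch of such a dependency and of enabling the dependency-aware mode is given below (the toy ``Block`` module is only illustrative; ``mode`` and ``dummy_input`` follow the NNI pruner API):

.. code-block:: python

    import torch
    import torch.nn as nn
    from nni.compression.pytorch.pruning import L1NormPruner

    class Block(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
            self.conv2 = nn.Conv2d(3, 16, 3, padding=1)

        def forward(self, x):
            # conv1 and conv2 outputs are added, so their output channels must be pruned consistently
            return self.conv1(x) + self.conv2(x)

    model = Block()
    config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]

    # dependency-aware mode needs a dummy input to trace the channel dependencies
    pruner = L1NormPruner(model, config_list, mode='dependency_aware',
                          dummy_input=torch.rand(1, 3, 32, 32))
    _, masks = pruner.compress()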
@@ -42,7 +42,7 @@ Using AGP Pruning as an example to explain how to implement an iterative pruning
The full script can be found :githublink:`here <examples/model_compress/pruning/v2/scheduler_torch.py>`.
In this example, we use an L1 Norm Pruner in dependency-aware mode as the basic pruner during each iteration.
Note we do not need to pass ``model`` and ``config_list`` to the pruner, because in each iteration the ``model`` and ``config_list`` used by the pruner are received from the task generator.
Then we can use ``scheduler`` as an iterative pruner directly. In fact, this is the implementation of ``AGPPruner`` in NNI.
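A rough sketch of this workflow, loosely following the referenced ``scheduler_torch.py`` example, is shown below; the import paths, constructor arguments, and the assumed ``finetuner`` callable are assumptions that may differ across NNI versions:

.. code-block:: python

    import torch
    from nni.compression.pytorch.pruning import L1NormPruner
    from nni.compression.pytorch.pruning.basic_scheduler import PruningScheduler
    from nni.compression.pytorch.pruning.tools import AGPTaskGenerator

    dummy_input = torch.rand(1, 3, 32, 32)

    # model and config_list are handed to the task generator, not to the pruner
    task_generator = AGPTaskGenerator(total_iteration=10, origin_model=model,
                                      origin_config_list=config_list)

    # the basic pruner receives model/config_list from the task generator in each iteration
    pruner = L1NormPruner(None, None, mode='dependency_aware', dummy_input=dummy_input)

    # finetuner is assumed to be a user-defined function: finetuner(model) -> None
    scheduler = PruningScheduler(pruner, task_generator, finetuner=finetuner,
                                 speedup=True, dummy_input=dummy_input)
    scheduler.compress()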
"\n# Pruning Quickstart\n\nModel pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.\nIt usually has following paths:\n\n#. Pre-training a model -> Pruning the model -> Fine-tuning the model\n#. Pruning the model aware training -> Fine-tuning the model\n#. Pruning the model -> Pre-training the compact model\n\nNNI supports the above three modes and mainly focuses on the pruning stage.\nFollow this tutorial for a quick look at how to use NNI to prune a model in a common practice.\n"
"\n# Pruning Quickstart\n\nModel pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.\nThere are three common practices for pruning a DNN model:\n\n#. Pre-training a model -> Pruning the model -> Fine-tuning the pruned model\n#. Pruning a model during training (i.e., pruning aware training) -> Fine-tuning the pruned model\n#. Pruning a model -> Training the pruned model from scratch\n\nNNI supports all of the above pruning practices by working on the key pruning stage.\nFollowing this tutorial for a quick look at how to use NNI to prune a model in a common practice.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparation\n\nIn this tutorial, we use a simple model and pre-train on MNIST dataset.\nIf you are familiar with defining a model and training in pytorch, you can skip directly to `Pruning Model`_.\n\n"
"## Preparation\n\nIn this tutorial, we use a simple model and pre-trained on MNIST dataset.\nIf you are familiar with defining a model and training in pytorch, you can skip directly to `Pruning Model`_.\n\n"
]
},
{
...
@@ -51,7 +51,7 @@
"cell_type": "markdown",
"cell_type": "markdown",
"metadata": {},
"metadata": {},
"source": [
"source": [
"## Pruning Model\n\nUsing L1NormPruner pruning the model and generating the masks.\nUsually, pruners require original model and ``config_list`` as parameters.\nDetailed about how to write ``config_list`` please refer :doc:`compression config specification <../compression/compression_config_list>`.\n\nThis `config_list` means all layers whose type is `Linear` or `Conv2d` will be pruned,\nexcept the layer named `fc3`, because `fc3` is `exclude`.\nThe final sparsity ratio for each layer is 50%. The layer named `fc3` will not be pruned.\n\n"
"## Pruning Model\n\nUsing L1NormPruner to prune the model and generate the masks.\nUsually, a pruner requires original model and ``config_list`` as its inputs.\nDetailed about how to write ``config_list`` please refer :doc:`compression config specification <../compression/compression_config_list>`.\n\nThe following `config_list` means all layers whose type is `Linear` or `Conv2d` will be pruned,\nexcept the layer named `fc3`, because `fc3` is `exclude`.\nThe final sparsity ratio for each layer is 50%. The layer named `fc3` will not be pruned.\n\n"
"The model has now been wrapped, and quantization targets ('quant_types' setting in `config_list`)\nwill be quantized & dequantized for simulated quantization in the wrapped layers.\nQAT is a training-aware quantizer, it will update scale and zero point during training.\n\n"
]
},
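{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference (not executed here), the wrapping step typically looks like the sketch below; the ``QAT_Quantizer`` import path and the exact ``config_list`` keys are assumptions based on the NNI quantization API and may differ between NNI versions. ``model`` and ``optimizer`` are the objects defined earlier in this tutorial.\n\n```python\nfrom nni.algorithms.compression.pytorch.quantization import QAT_Quantizer\n\n# assumed config: quantize weights and outputs of Conv2d/Linear layers to 8 bit\nconfig_list = [{\n    'quant_types': ['weight', 'output'],\n    'quant_bits': {'weight': 8, 'output': 8},\n    'op_types': ['Conv2d', 'Linear']\n}]\n\n# QAT needs the training optimizer so it can update scale and zero point during training\nquantizer = QAT_Quantizer(model, config_list, optimizer)\nquantizer.compress()\n```\n"
]
},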
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"for epoch in range(3):\n trainer(model, optimizer, criterion)\n evaluator(model)"