Unverified commit 51d261e7 authored by J-shang, committed by GitHub

Merge pull request #4668 from microsoft/doc-refactor

parents d63a2ea3 b469e1c1
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Speed Up Model with Calibration Config\n\n\n## Introduction\n\nDeep learning network has been computational intensive and memory intensive \nwhich increases the difficulty of deploying deep neural network model. Quantization is a \nfundamental technology which is widely used to reduce memory footprint and speed up inference \nprocess. Many frameworks begin to support quantization, but few of them support mixed precision \nquantization and get real speedup. Frameworks like `HAQ: Hardware-Aware Automated Quantization with Mixed Precision <https://arxiv.org/pdf/1811.08886.pdf>`__\\, only support simulated mixed precision quantization which will \nnot speed up the inference process. To get real speedup of mixed precision quantization and \nhelp people get the real feedback from hardware, we design a general framework with simple interface to allow NNI quantization algorithms to connect different \nDL model optimization backends (e.g., TensorRT, NNFusion), which gives users an end-to-end experience that after quantizing their model \nwith quantization algorithms, the quantized model can be directly speeded up with the connected optimization backend. NNI connects \nTensorRT at this stage, and will support more backends in the future.\n\n\n## Design and Implementation\n\nTo support speeding up mixed precision quantization, we divide framework into two part, frontend and backend. \nFrontend could be popular training frameworks such as PyTorch, TensorFlow etc. Backend could be inference \nframework for different hardwares, such as TensorRT. At present, we support PyTorch as frontend and \nTensorRT as backend. To convert PyTorch model to TensorRT engine, we leverage onnx as intermediate graph \nrepresentation. In this way, we convert PyTorch model to onnx model, then TensorRT parse onnx \nmodel to generate inference engine. \n\n\nQuantization aware training combines NNI quantization algorithm 'QAT' and NNI quantization speedup tool.\nUsers should set config to train quantized model using QAT algorithm(please refer to `NNI Quantization Algorithms <https://nni.readthedocs.io/en/stable/Compression/Quantizer.html>`__\\ ).\nAfter quantization aware training, users can get new config with calibration parameters and model with quantized weight. By passing new config and model to quantization speedup tool, users can get real mixed precision speedup engine to do inference.\n\n\nAfter getting mixed precision engine, users can do inference with input data.\n\n\nNote\n\n\n* Recommend using \"cpu\"(host) as data device(for both inference data and calibration data) since data should be on host initially and it will be transposed to device before inference. If data type is not \"cpu\"(host), this tool will transpose it to \"cpu\" which may increases unnecessary overhead.\n* User can also do post-training quantization leveraging TensorRT directly(need to provide calibration dataset).\n* Not all op types are supported right now. At present, NNI supports Conv, Linear, Relu and MaxPool. More op types will be supported in the following release.\n\n\n## Prerequisite\nCUDA version >= 11.0\n\nTensorRT version >= 7.2\n\nNote\n\n* If you haven't installed TensorRT before or use the old version, please refer to `TensorRT Installation Guide <https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html>`__\\ \n\n## Usage\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\nimport torch.nn.functional as F\nfrom torch.optim import SGD\nfrom scripts.compression_mnist_model import TorchModel, device, trainer, evaluator, test_trt\n\nconfig_list = [{\n 'quant_types': ['input', 'weight'],\n 'quant_bits': {'input': 8, 'weight': 8},\n 'op_names': ['conv1']\n}, {\n 'quant_types': ['output'],\n 'quant_bits': {'output': 8},\n 'op_names': ['relu1']\n}, {\n 'quant_types': ['input', 'weight'],\n 'quant_bits': {'input': 8, 'weight': 8},\n 'op_names': ['conv2']\n}, {\n 'quant_types': ['output'],\n 'quant_bits': {'output': 8},\n 'op_names': ['relu2']\n}]\n\nmodel = TorchModel().to(device)\noptimizer = SGD(model.parameters(), lr=0.01, momentum=0.5)\ncriterion = F.nll_loss\ndummy_input = torch.rand(32, 1, 28,28).to(device)\n\nfrom nni.algorithms.compression.pytorch.quantization import QAT_Quantizer\nquantizer = QAT_Quantizer(model, config_list, optimizer, dummy_input)\nquantizer.compress()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"finetuning the model by using QAT\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"for epoch in range(3):\n trainer(model, optimizer, criterion)\n evaluator(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"export model and get calibration_config\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model_path = \"./log/mnist_model.pth\"\ncalibration_path = \"./log/mnist_calibration.pth\"\ncalibration_config = quantizer.export_model(model_path, calibration_path)\n\nprint(\"calibration_config: \", calibration_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"build tensorRT engine to make a real speed up\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT\n# input_shape = (32, 1, 28, 28)\n# engine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=32)\n# engine.compress()\n# test_trt(engine)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that NNI also supports post-training quantization directly, please refer to complete examples for detail.\n\nFor complete examples please refer to :githublink:`the code <examples/model_compress/quantization/mixed_precision_speedup_mnist.py>`.\n\nFor more parameters about the class 'TensorRTModelSpeedUp', you can refer to `Model Compression API Reference <https://nni.readthedocs.io/en/stable/Compression/CompressionReference.html#quantization-speedup>`__\\.\n\n### Mnist test\n\non one GTX2080 GPU,\ninput tensor: ``torch.randn(128, 1, 28, 28)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - quantization strategy\n - Latency\n - accuracy\n * - all in 32bit\n - 0.001199961\n - 96%\n * - mixed precision(average bit 20.4)\n - 0.000753688\n - 96%\n * - all in 8bit\n - 0.000229869\n - 93.7%\n\n### Cifar10 resnet18 test (train one epoch)\n\non one GTX2080 GPU,\ninput tensor: ``torch.randn(128, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - quantization strategy\n - Latency\n - accuracy\n * - all in 32bit\n - 0.003286268\n - 54.21%\n * - mixed precision(average bit 11.55)\n - 0.001358022\n - 54.78%\n * - all in 8bit\n - 0.000859139\n - 52.81%\n\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Speed up Mixed Precision Quantization Model (experimental)
==========================================================
"""
Speed Up Model with Calibration Config
======================================
Introduction
Usage
-----
Quantization-aware training:
.. code-block:: python
# arrange bit config for QAT algorithm
configure_list = [{
'quant_types': ['weight', 'output'],
'quant_bits': {'weight':8, 'output':8},
'op_names': ['conv1']
}, {
'quant_types': ['output'],
'quant_bits': {'output':8},
'op_names': ['relu1']
}
]
quantizer = QAT_Quantizer(model, configure_list, optimizer)
quantizer.compress()
calibration_config = quantizer.export_model(model_path, calibration_path)
engine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=batch_size)
# build tensorrt inference engine
engine.compress()
# data should be a PyTorch tensor
output, time = engine.inference(data)
Note that NNI also supports post-training quantization directly; please refer to the complete examples for details.
For complete examples please refer to :githublink:`the code <examples/model_compress/quantization/mixed_precision_speedup_mnist.py>`.
For more parameters of the class ``ModelSpeedupTensorRT``, you can refer to `Model Compression API Reference <https://nni.readthedocs.io/en/stable/Compression/CompressionReference.html#quantization-speedup>`__\.
MNIST test
^^^^^^^^^^^^^^^^^^^
on one GTX2080 GPU,
input tensor: ``torch.randn(128, 1, 28, 28)``
.. list-table::
   :header-rows: 1
   :widths: auto

   * - quantization strategy
     - Latency (s)
     - accuracy
   * - all in 32-bit
     - 0.001199961
     - 96%
   * - mixed precision (average bit 20.4)
     - 0.000753688
     - 96%
   * - all in 8-bit
     - 0.000229869
     - 93.7%
CIFAR-10 ResNet18 test (train one epoch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
on one GTX2080 GPU,
input tensor: ``torch.randn(128, 3, 32, 32)``
.. list-table::
   :header-rows: 1
   :widths: auto

   * - quantization strategy
     - Latency (s)
     - accuracy
   * - all in 32-bit
     - 0.003286268
     - 54.21%
   * - mixed precision (average bit 11.55)
     - 0.001358022
     - 54.78%
   * - all in 8-bit
     - 0.000859139
     - 52.81%
"""
# %%
import torch
import torch.nn.functional as F
from torch.optim import SGD
from scripts.compression_mnist_model import TorchModel, device, trainer, evaluator, test_trt
config_list = [{
'quant_types': ['input', 'weight'],
'quant_bits': {'input': 8, 'weight': 8},
'op_names': ['conv1']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['relu1']
}, {
'quant_types': ['input', 'weight'],
'quant_bits': {'input': 8, 'weight': 8},
'op_names': ['conv2']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['relu2']
}]
model = TorchModel().to(device)
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.5)
criterion = F.nll_loss
dummy_input = torch.rand(32, 1, 28, 28).to(device)
from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer
quantizer = QAT_Quantizer(model, config_list, optimizer, dummy_input)
quantizer.compress()
# %%
# fine-tune the model using QAT
for epoch in range(3):
trainer(model, optimizer, criterion)
evaluator(model)
# %%
# export model and get calibration_config
model_path = "./log/mnist_model.pth"
calibration_path = "./log/mnist_calibration.pth"
calibration_config = quantizer.export_model(model_path, calibration_path)
print("calibration_config: ", calibration_config)
# %%
# build the TensorRT engine to get real speedup
# from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT
# input_shape = (32, 1, 28, 28)
# engine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=32)
# engine.compress()
# test_trt(engine)
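#
# A minimal sketch of running inference once the engine is built. ``engine.inference``
# is the call shown in the usage snippet above; the batch here is a hypothetical
# MNIST-shaped tensor kept on "cpu" (host), as recommended in the note:
# data = torch.rand(32, 1, 28, 28)
# output, latency = engine.inference(data)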
# %%
# Note that NNI also supports post-training quantization directly; please refer to the complete examples for details.
#
# For complete examples please refer to :githublink:`the code <examples/model_compress/quantization/mixed_precision_speedup_mnist.py>`.
#
# For more parameters of the class ``ModelSpeedupTensorRT``, you can refer to `Model Compression API Reference <https://nni.readthedocs.io/en/stable/Compression/CompressionReference.html#quantization-speedup>`__\.
#
# MNIST test
# ^^^^^^^^^^
#
# on one GTX2080 GPU,
# input tensor: ``torch.randn(128, 1, 28, 28)``
#
# .. list-table::
#    :header-rows: 1
#    :widths: auto
#
#    * - quantization strategy
#      - Latency (s)
#      - accuracy
#    * - all in 32-bit
#      - 0.001199961
#      - 96%
#    * - mixed precision (average bit 20.4)
#      - 0.000753688
#      - 96%
#    * - all in 8-bit
#      - 0.000229869
#      - 93.7%
#
# CIFAR-10 ResNet18 test (train one epoch)
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# on one GTX2080 GPU,
# input tensor: ``torch.randn(128, 3, 32, 32)``
#
# .. list-table::
#    :header-rows: 1
#    :widths: auto
#
#    * - quantization strategy
#      - Latency (s)
#      - accuracy
#    * - all in 32-bit
#      - 0.003286268
#      - 54.21%
#    * - mixed precision (average bit 11.55)
#      - 0.001358022
#      - 54.78%
#    * - all in 8-bit
#      - 0.000859139
#      - 52.81%
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/quantization_speed_up.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_quantization_speed_up.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_quantization_speed_up.py:
Speed Up Model with Calibration Config
======================================
Introduction
------------
Deep learning networks are computation-intensive and memory-intensive,
which makes deploying deep neural network models difficult. Quantization is a
fundamental technique that is widely used to reduce the memory footprint and speed up the inference
process. Many frameworks have begun to support quantization, but few of them support mixed precision
quantization and deliver real speedup. Frameworks like `HAQ: Hardware-Aware Automated Quantization with Mixed Precision <https://arxiv.org/pdf/1811.08886.pdf>`__ only support simulated mixed precision quantization, which does
not speed up the inference process. To get real speedup from mixed precision quantization and
to give people real feedback from hardware, we designed a general framework with a simple interface that allows NNI quantization algorithms to connect to different
DL model optimization backends (e.g., TensorRT, NNFusion). This gives users an end-to-end experience: after quantizing their model
with a quantization algorithm, the quantized model can be directly sped up with the connected optimization backend. NNI connects to
TensorRT at this stage, and will support more backends in the future.
Design and Implementation
-------------------------
To support speeding up mixed precision quantization, we divide the framework into two parts, frontend and backend.
The frontend can be a popular training framework such as PyTorch or TensorFlow, while the backend can be an inference
framework for different hardware, such as TensorRT. At present, we support PyTorch as the frontend and
TensorRT as the backend. To convert a PyTorch model into a TensorRT engine, we leverage ONNX as the intermediate graph
representation: we convert the PyTorch model to an ONNX model, then TensorRT parses the ONNX
model to generate the inference engine.
Quantization-aware training combines the NNI quantization algorithm 'QAT' and the NNI quantization speedup tool.
Users should set a config to train a quantized model using the QAT algorithm (please refer to `NNI Quantization Algorithms <https://nni.readthedocs.io/en/stable/Compression/Quantizer.html>`__\ ).
After quantization-aware training, users get a new config with calibration parameters and a model with quantized weights. By passing the new config and model to the quantization speedup tool, users get a real mixed precision speedup engine to do inference.
After getting the mixed precision engine, users can run inference with input data.
Note
* Recommend using "cpu"(host) as data device(for both inference data and calibration data) since data should be on host initially and it will be transposed to device before inference. If data type is not "cpu"(host), this tool will transpose it to "cpu" which may increases unnecessary overhead.
* User can also do post-training quantization leveraging TensorRT directly(need to provide calibration dataset).
* Not all op types are supported right now. At present, NNI supports Conv, Linear, Relu and MaxPool. More op types will be supported in the following release.
Prerequisite
------------
CUDA version >= 11.0
TensorRT version >= 7.2
Note
* If you have not installed TensorRT before, or you are using an old version, please refer to the `TensorRT Installation Guide <https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html>`__
Usage
-----
.. GENERATED FROM PYTHON SOURCE LINES 64-96
.. code-block:: default
import torch
import torch.nn.functional as F
from torch.optim import SGD
from scripts.compression_mnist_model import TorchModel, device, trainer, evaluator, test_trt
config_list = [{
'quant_types': ['input', 'weight'],
'quant_bits': {'input': 8, 'weight': 8},
'op_names': ['conv1']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['relu1']
}, {
'quant_types': ['input', 'weight'],
'quant_bits': {'input': 8, 'weight': 8},
'op_names': ['conv2']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['relu2']
}]
model = TorchModel().to(device)
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.5)
criterion = F.nll_loss
dummy_input = torch.rand(32, 1, 28,28).to(device)
from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer
quantizer = QAT_Quantizer(model, config_list, optimizer, dummy_input)
quantizer.compress()
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-02-21 18:53:07] WARNING (nni.algorithms.compression.pytorch.quantization.qat_quantizer/MainThread) op_names ['relu1'] not found in model
[2022-02-21 18:53:07] WARNING (nni.algorithms.compression.pytorch.quantization.qat_quantizer/MainThread) op_names ['relu2'] not found in model
TorchModel(
(conv1): QuantizerModuleWrapper(
(module): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
)
(conv2): QuantizerModuleWrapper(
(module): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
)
(fc1): Linear(in_features=256, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
.. GENERATED FROM PYTHON SOURCE LINES 97-98
fine-tune the model using QAT
.. GENERATED FROM PYTHON SOURCE LINES 98-102
.. code-block:: default
for epoch in range(3):
trainer(model, optimizer, criterion)
evaluator(model)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Average test loss: 0.2524, Accuracy: 9209/10000 (92%)
Average test loss: 0.1711, Accuracy: 9461/10000 (95%)
Average test loss: 0.1037, Accuracy: 9690/10000 (97%)
.. GENERATED FROM PYTHON SOURCE LINES 103-104
export model and get calibration_config
.. GENERATED FROM PYTHON SOURCE LINES 104-110
.. code-block:: default
model_path = "./log/mnist_model.pth"
calibration_path = "./log/mnist_calibration.pth"
calibration_config = quantizer.export_model(model_path, calibration_path)
print("calibration_config: ", calibration_config)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-02-21 18:53:54] INFO (nni.compression.pytorch.compressor/MainThread) Model state_dict saved to ./log/mnist_model.pth
[2022-02-21 18:53:54] INFO (nni.compression.pytorch.compressor/MainThread) Mask dict saved to ./log/mnist_calibration.pth
calibration_config: {'conv1': {'weight_bits': 8, 'weight_scale': tensor([0.0026], device='cuda:0'), 'weight_zero_point': tensor([103.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': -0.4242129623889923, 'tracked_max_input': 2.821486711502075}, 'conv2': {'weight_bits': 8, 'weight_scale': tensor([0.0019], device='cuda:0'), 'weight_zero_point': tensor([116.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 10.175512313842773}}
.. GENERATED FROM PYTHON SOURCE LINES 111-112
build the TensorRT engine to get real speedup
.. GENERATED FROM PYTHON SOURCE LINES 112-119
.. code-block:: default
# from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT
# input_shape = (32, 1, 28, 28)
# engine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=32)
# engine.compress()
# test_trt(engine)
.. GENERATED FROM PYTHON SOURCE LINES 120-171
Note that NNI also supports post-training quantization directly; please refer to the complete examples for details.
For complete examples please refer to :githublink:`the code <examples/model_compress/quantization/mixed_precision_speedup_mnist.py>`.
For more parameters of the class ``ModelSpeedupTensorRT``, you can refer to `Model Compression API Reference <https://nni.readthedocs.io/en/stable/Compression/CompressionReference.html#quantization-speedup>`__\.
MNIST test
^^^^^^^^^^
on one GTX2080 GPU,
input tensor: ``torch.randn(128, 1, 28, 28)``
.. list-table::
   :header-rows: 1
   :widths: auto

   * - quantization strategy
     - Latency (s)
     - accuracy
   * - all in 32-bit
     - 0.001199961
     - 96%
   * - mixed precision (average bit 20.4)
     - 0.000753688
     - 96%
   * - all in 8-bit
     - 0.000229869
     - 93.7%
CIFAR-10 ResNet18 test (train one epoch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
on one GTX2080 GPU,
input tensor: ``torch.randn(128, 3, 32, 32)``
.. list-table::
   :header-rows: 1
   :widths: auto

   * - quantization strategy
     - Latency (s)
     - accuracy
   * - all in 32-bit
     - 0.003286268
     - 54.21%
   * - mixed precision (average bit 11.55)
     - 0.001358022
     - 54.78%
   * - all in 8-bit
     - 0.000859139
     - 52.81%
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 52.798 seconds)
.. _sphx_glr_download_tutorials_quantization_speed_up.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: quantization_speed_up.py <quantization_speed_up.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: quantization_speed_up.ipynb <quantization_speed_up.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
Computation times
=================
**00:24.663** total execution time for **tutorials** files:
**03:24.740** total execution time for **tutorials** files:
+-----------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_nni_experiment.py` (``nni_experiment.py``) | 00:24.662 | 0.0 MB |
+-----------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_nas_quick_start_mnist.py` (``nas_quick_start_mnist.py``) | 00:00.002 | 0.0 MB |
+-----------------------------------------------------------------------------------+-----------+--------+
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_quick_start_mnist.py` (``quantization_quick_start_mnist.py``) | 01:51.644 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_pruning_quick_start_mnist.py` (``pruning_quick_start_mnist.py``) | 01:33.096 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_hello_nas.py` (``hello_nas.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_nasbench_as_dataset.py` (``nasbench_as_dataset.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_nni_experiment.py` (``nni_experiment.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_pruning_customize.py` (``pruning_customize.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_pruning_speed_up.py` (``pruning_speed_up.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_customize.py` (``quantization_customize.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_speed_up.py` (``quantization_speed_up.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
/* Material icon: https://developers.google.com/fonts/docs/material_icons */
/* Icon library: https://fonts.google.com/icons */
.material-icons {
font-family: 'Material Icons';
font-weight: normal;
font-style: normal;
font-size: 24px; /* Preferred icon size */
display: inline-block;
line-height: 1;
text-transform: none;
letter-spacing: normal;
word-wrap: normal;
white-space: nowrap;
direction: ltr;
/* Support for all WebKit browsers. */
-webkit-font-smoothing: antialiased;
/* Support for Safari and Chrome. */
text-rendering: optimizeLegibility;
/* Support for Firefox. */
-moz-osx-font-smoothing: grayscale;
/* Support for IE. */
font-feature-settings: 'liga';
}
/* viewcode link should have left padding */
span.viewcode-link {
padding-left: 0.6rem;
dt.sig-object {
word-wrap: break-word;
}
.class > dt.sig-object {
border-left: none; /* remove left border */
border-top: 0.18rem solid #ec407a; /* this should be matched with theme color. */
}
.function > dt.sig-object {
border-left: none; /* remove left border */
border-top: 0.18rem solid #ec407a; /* this should be matched with theme color. */
}
.exception > dt.sig-object {
border-left: none; /* remove left border */
border-top: 0.18rem solid #ec407a; /* this should be matched with theme color. */
}
/* Padding on parameter list is not needed */
dl.field-list > dt {
padding-left: 0 !important;
}
dl.field-list > dd {
margin-left: 1.5em;
}
/* show headerlink when hover/focus */
.headerlink:focus, .headerlink:hover {
-webkit-transform: translate(0);
transform: translate(0);
opacity: 1;
}
/* logo is too large */
a.md-logo img {
padding: 3px;
.md-nav span.caption {
margin-top: 1.25em;
}
/* citation style */
.citation dt {
padding-right: 1em;
}
/* for release icon, on home page */
.release-icon {
margin-left: 8px;
width: 40px;
}
/* Similar to cardlink, but used in codesnippet in index page. see sphinx_gallery.css */
.codesnippet-card-container {
display: flex;
flex-flow: wrap row;
}
.codesnippet-card.admonition {
border-left: 0;
padding: 0;
margin: .5rem 1rem 1rem 0rem;
width: 100%;
}
/* Controlling the cards in containers only */
.codesnippet-card-container .codesnippet-card.admonition {
width: 47%;
}
@media only screen and (max-width:59.9375em) {
.codesnippet-card-container .codesnippet-card.admonition {
width: 100%;
}
}
.codesnippet-card .codesnippet-card-body {
min-height: 4rem;
position: relative;
padding: 0.9rem 0.9rem 3rem 0.9rem;
}
.codesnippet-card .codesnippet-card-footer {
padding: 0.8rem 0.9rem;
border-top: 1px solid #ddd;
margin: 0 !important;
position: absolute;
bottom: 0;
width: 100%;
}
.codesnippet-card a:not(:hover) {
color: rgba(0, 0, 0, .68);
}
.codesnippet-card-title-container {
margin-top: 0.3rem;
position: relative;
}
.codesnippet-card-title-container h4 {
padding-left: 2.3rem;
line-height: 1.6rem;
height: 1.6rem;
margin-top: 0;
}
.codesnippet-card-icon {
position: absolute;
top: 0;
left: 0;
}
.codesnippet-card-icon img {
max-width: 100%;
max-height: 100%;
/* horizontal and vertical center */
/* https://stackoverflow.com/questions/7273338/how-to-vertically-align-an-image-inside-a-div */
text-align: center;
vertical-align: middle;
position: absolute;
left: 0;
right: 0;
top: 0;
bottom: 0;
margin: auto;
}
.codesnippet-card-icon {
width: 1.6rem;
height: 1.6rem;
padding: 0;
}
.codesnippet-card-link {
position: relative;
}
.codesnippet-card-link .material-icons {
position: absolute;
right: 0;
}
/* fixes reference overlapping issue */
/* This is originally defined to be negative in application_fixes.css */
/* They did that to ensure the header doesn't disappear in jump links */
/* We did this by using scroll-margin-top instead */
dt:target {
margin-top: 0.15rem !important;
padding-top: 0.5rem !important;
scroll-margin-top: 3.5rem;
}
.. Modified from https://raw.githubusercontent.com/sphinx-doc/sphinx/4.x/sphinx/ext/autosummary/templates/autosummary/module.rst
{% if fullname == 'nni' %}
Python API Reference
====================
{% else %}
{{ fullname | escape | underline }}
{% endif %}
.. automodule:: {{ fullname }}
:noindex:
{% block attributes %}
{% if attributes %}
.. rubric:: {{ _('Module Attributes') }}
.. autosummary::
{% for item in attributes %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
{% block functions %}
{% if functions %}
.. rubric:: {{ _('Functions') }}
.. autosummary::
{% for item in functions %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
{% block classes %}
{% if classes %}
.. rubric:: {{ _('Classes') }}
.. autosummary::
{% for item in classes %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
{% block exceptions %}
{% if exceptions %}
.. rubric:: {{ _('Exceptions') }}
.. autosummary::
{% for item in exceptions %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
{% block modules %}
{% if modules %}
.. rubric:: Modules
.. autosummary::
:toctree:
:recursive:
{% for item in modules %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
"""
Hello, NAS!
===========
This is the 101 tutorial of Neural Architecture Search (NAS) on NNI.
In this tutorial, we will search for a neural architecture on the MNIST dataset with the help of the NAS framework of NNI, i.e., *Retiarii*.
We use multi-trial NAS as an example to show how to construct and explore a model space.
There are mainly three crucial components for a neural architecture search task, namely,
* Model search space that defines a set of models to explore.
* A proper strategy as the method to explore this model space.
* A model evaluator that reports the performance of every model in the space.
Currently, PyTorch is the only framework supported by Retiarii, and we have only tested **PyTorch 1.7 to 1.10**.
This tutorial assumes a PyTorch context, but it should also apply to other frameworks, which is part of our future plan.
Define your Model Space
-----------------------
Model space is defined by users to express a set of models that users want to explore, which contains potentially good-performing models.
In this framework, a model space is defined with two parts: a base model and possible mutations on the base model.
"""
# %%
#
# Define Base Model
# ^^^^^^^^^^^^^^^^^
#
# Defining a base model is almost the same as defining a PyTorch (or TensorFlow) model.
# Usually, you only need to replace the code ``import torch.nn as nn`` with
# ``import nni.retiarii.nn.pytorch as nn`` to use our wrapped PyTorch modules.
#
# Below is a very simple example of defining a base model.
import torch
import torch.nn.functional as F
import nni.retiarii.nn.pytorch as nn
from nni.retiarii import model_wrapper
@model_wrapper # this decorator should be put on the outermost level
class Net(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout(0.25)
self.dropout2 = nn.Dropout(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(self.conv2(x), 2)
x = torch.flatten(self.dropout1(x), 1)
x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
output = F.log_softmax(x, dim=1)
return output
# %%
# .. tip:: Always keep in mind that you should use ``import nni.retiarii.nn.pytorch as nn`` and :meth:`nni.retiarii.model_wrapper`.
#    Many mistakes are a result of forgetting one of those.
#    Also, please use ``torch.nn`` for submodules such as ``init``, e.g., ``torch.nn.init`` instead of ``nn.init``.
#
# Define Model Mutations
# ^^^^^^^^^^^^^^^^^^^^^^
#
# A base model is only one concrete model, not a model space. We provide :doc:`API and Primitives </nas/construct_space>`
# for users to express how the base model can be mutated. That is, to build a model space which includes many models.
#
# Based on the above base model, we can define a model space as below.
#
# .. code-block:: diff
#
# @model_wrapper
# class Net(nn.Module):
# def __init__(self):
# super().__init__()
# self.conv1 = nn.Conv2d(1, 32, 3, 1)
# - self.conv2 = nn.Conv2d(32, 64, 3, 1)
# + self.conv2 = nn.LayerChoice([
# + nn.Conv2d(32, 64, 3, 1),
# + DepthwiseSeparableConv(32, 64)
# + ])
# - self.dropout1 = nn.Dropout(0.25)
# + self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75]))
# self.dropout2 = nn.Dropout(0.5)
# - self.fc1 = nn.Linear(9216, 128)
# - self.fc2 = nn.Linear(128, 10)
# + feature = nn.ValueChoice([64, 128, 256])
# + self.fc1 = nn.Linear(9216, feature)
# + self.fc2 = nn.Linear(feature, 10)
#
# def forward(self, x):
# x = F.relu(self.conv1(x))
# x = F.max_pool2d(self.conv2(x), 2)
# x = torch.flatten(self.dropout1(x), 1)
# x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
# output = F.log_softmax(x, dim=1)
# return output
#
# This results in the following code:
class DepthwiseSeparableConv(nn.Module):
def __init__(self, in_ch, out_ch):
super().__init__()
self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, groups=in_ch)
self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
def forward(self, x):
return self.pointwise(self.depthwise(x))
@model_wrapper
class ModelSpace(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
# LayerChoice is used to select a layer between Conv2d and DepthwiseSeparableConv.
self.conv2 = nn.LayerChoice([
nn.Conv2d(32, 64, 3, 1),
DepthwiseSeparableConv(32, 64)
])
# ValueChoice is used to select a dropout rate.
# ValueChoice can be used as a parameter of modules wrapped in `nni.retiarii.nn.pytorch`
# or customized modules wrapped with `@basic_unit`.
self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75])) # choose dropout rate from 0.25, 0.5 and 0.75
self.dropout2 = nn.Dropout(0.5)
feature = nn.ValueChoice([64, 128, 256])
self.fc1 = nn.Linear(9216, feature)
self.fc2 = nn.Linear(feature, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(self.conv2(x), 2)
x = torch.flatten(self.dropout1(x), 1)
x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
output = F.log_softmax(x, dim=1)
return output
model_space = ModelSpace()
model_space
# %%
# This example uses two mutation APIs, ``nn.LayerChoice`` and ``nn.ValueChoice``.
# ``nn.LayerChoice`` takes a list of candidate modules (two in this example), one will be chosen for each sampled model.
# It can be used like a normal PyTorch module.
# ``nn.ValueChoice`` takes a list of candidate values, one will be chosen to take effect for each sampled model.
#
# More detailed API description and usage can be found :doc:`here </nas/construct_space>`.
#
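# As a small illustration (a hedged sketch, not part of the model space above),
# ``nn.ValueChoice`` can also parameterize other arguments of wrapped modules,
# e.g. a hypothetical kernel-size choice:
#
# .. code-block:: python
#
#    conv = nn.Conv2d(3, 8, kernel_size=nn.ValueChoice([3, 5, 7]))
#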
# .. note::
#
# We are actively enriching the mutation APIs to facilitate easy construction of model spaces.
# If the currently supported mutation APIs cannot express your model space,
# please refer to :doc:`this doc </nas/mutator>` for customizing mutators.
#
# Explore the Defined Model Space
# -------------------------------
#
# There are basically two exploration approaches: (1) search by evaluating each sampled model independently,
# which is the search approach in :ref:`multi-trial NAS <multi-trial-nas>`
# and (2) one-shot weight-sharing based search, which is used in one-shot NAS.
# We demonstrate the first approach in this tutorial. Users can refer to :ref:`here <one-shot-nas>` for the second approach.
#
# First, users need to pick a proper exploration strategy to explore the defined model space.
# Second, users need to pick or customize a model evaluator to evaluate the performance of each explored model.
#
# Pick an exploration strategy
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# Retiarii supports many :doc:`exploration strategies </nas/exploration_strategy>`.
#
# Simply choose (i.e., instantiate) an exploration strategy as below.
import nni.retiarii.strategy as strategy
search_strategy = strategy.Random(dedup=True) # dedup=False if deduplication is not wanted
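# %%
# A hedged sketch: other strategies from the linked doc can be swapped in the
# same way, e.g. (assuming your NNI version ships this strategy):
#
# .. code-block:: python
#
#    search_strategy = strategy.RegularizedEvolution()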
# %%
# Pick or customize a model evaluator
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training
# and validating each generated model to obtain the model's performance.
# The performance is sent to the exploration strategy for the strategy to generate better models.
#
# Retiarii has provided :doc:`built-in model evaluators </nas/evaluator>`, but to start with,
# it is recommended to use ``FunctionalEvaluator``, that is, to wrap your own training and evaluation code with one single function.
# This function should receive one single model class and use ``nni.report_final_result`` to report the final score of this model.
#
# An example here creates a simple evaluator that runs on the MNIST dataset, trains for 3 epochs, and reports its validation accuracy.
import nni
from torchvision import transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
def train_epoch(model, device, train_loader, optimizer, epoch):
loss_fn = torch.nn.CrossEntropyLoss()
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = loss_fn(output, target)
loss.backward()
optimizer.step()
if batch_idx % 10 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
def test_epoch(model, device, test_loader):
model.eval()
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
accuracy = 100. * correct / len(test_loader.dataset)
print('\nTest set: Accuracy: {}/{} ({:.0f}%)\n'.format(
correct, len(test_loader.dataset), accuracy))
return accuracy
def evaluate_model(model_cls):
# "model_cls" is a class, need to instantiate
model = model_cls()
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
transf = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = DataLoader(MNIST('data/mnist', download=True, transform=transf), batch_size=64, shuffle=True)
test_loader = DataLoader(MNIST('data/mnist', download=True, train=False, transform=transf), batch_size=64)
for epoch in range(3):
# train the model for one epoch
train_epoch(model, device, train_loader, optimizer, epoch)
# test the model for one epoch
accuracy = test_epoch(model, device, test_loader)
# call report intermediate result. Result can be float or dict
nni.report_intermediate_result(accuracy)
# report final test result
nni.report_final_result(accuracy)
# %%
# Create the evaluator
from nni.retiarii.evaluator import FunctionalEvaluator
evaluator = FunctionalEvaluator(evaluate_model)
# %%
#
# The ``train_epoch`` and ``test_epoch`` here can be any customized function, where users can write their own training recipe.
#
# It is recommended that the ``evaluate_model`` here accepts no additional arguments other than ``model_cls``.
# However, in the :doc:`advanced tutorial </nas/evaluator>`, we will show how to use additional arguments in case you actually need those.
# In the future, we will support mutation on the arguments of evaluators, which is commonly called "hyper-parameter tuning".
#
# Launch an Experiment
# --------------------
#
# After all the above are prepared, it is time to start an experiment to do the model search. An example is shown below.
from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig
exp = RetiariiExperiment(model_space, evaluator, [], search_strategy)
exp_config = RetiariiExeConfig('local')
exp_config.experiment_name = 'mnist_search'
# %%
# The following configurations are useful to control how many trials to run at most / at the same time.
exp_config.max_trial_number = 4 # spawn 4 trials at most
exp_config.trial_concurrency = 2 # will run two trials concurrently
# %%
# Remember to set the following config if you want to use a GPU.
# ``use_active_gpu`` should be set to true if you wish to use an occupied GPU (possibly running a GUI).
exp_config.trial_gpu_number = 1
exp_config.training_service.use_active_gpu = True
# %%
# Launch the experiment. The experiment should take several minutes to finish on a workstation with 2 GPUs.
exp.run(exp_config, 8081)
# %%
# Users can also run Retiarii Experiment with :doc:`different training services </experiment/training_service>`
# besides ``local`` training service.
#
# Visualize the Experiment
# ------------------------
#
# Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment.
# For example, open ``localhost:8081`` in your browser, 8081 is the port that you set in ``exp.run``.
# Please refer to :doc:`here </experiment/webui>` for details.
#
# We support visualizing models with 3rd-party visualization engines (like `Netron <https://netron.app/>`__).
# This can be used by clicking ``Visualization`` in the detail panel of each trial.
# Note that the current visualization is based on `onnx <https://onnx.ai/>`__,
# thus visualization is not feasible if the model cannot be exported into ONNX.
#
# Built-in evaluators (e.g., Classification) will automatically export the model into a file.
# For your own evaluator, you need to save your file into ``$NNI_OUTPUT_DIR/model.onnx`` to make this work.
# For instance,
import os
from pathlib import Path
def evaluate_model_with_visualization(model_cls):
model = model_cls()
# dump the model into an onnx
if 'NNI_OUTPUT_DIR' in os.environ:
dummy_input = torch.zeros(1, 3, 32, 32)
torch.onnx.export(model, (dummy_input, ),
Path(os.environ['NNI_OUTPUT_DIR']) / 'model.onnx')
evaluate_model(model_cls)
# %%
# Relaunch the experiment, and a button will be shown on the WebUI.
#
# .. image:: ../../img/netron_entrance_webui.png
#
# Export Top Models
# -----------------
#
# Users can export top models after the exploration is done using ``export_top_models``.
for model_dict in exp.export_top_models(formatter='dict'):
print(model_dict)
# The output is a JSON object which records the mutation actions of the top model.
# If users want to output the source code of the top model, they can use the graph-based execution engine for the experiment,
# by simply adding the following two lines.
#
# .. code-block:: python
#
# exp_config.execution_engine = 'base'
# export_formatter = 'code'
"""
NNI HPO Quickstart with PyTorch
===============================
This tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.
There is also a :doc:`TensorFlow version<../hpo_quickstart_tensorflow/main>` if you prefer it.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
2. Define hyperparameters' search space.
3. Configure the experiment.
4. Run the experiment.
.. _official PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
"""
# %%
# Step 1: Prepare the model
# -------------------------
# In the first step, we need to prepare the model to be tuned.
#
# The model should be put in a separate script.
# It will be evaluated many times concurrently,
# and possibly will be trained on distributed platforms.
#
# In this tutorial, the model is defined in :doc:`model.py <model>`.
#
# In short, it is a PyTorch model with 3 additional API calls:
#
# 1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
# 2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.
# 3. Use :func:`nni.report_final_result` to report final accuracy.
#
# Please understand the model code before continuing to the next step.
# %%
# Step 2: Define search space
# ---------------------------
# In model code, we have prepared 3 hyperparameters to be tuned:
# *features*, *lr*, and *momentum*.
#
# Here we need to define their *search space* so the tuning algorithm can sample them in the desired range.
#
# Assuming we have the following prior knowledge for these hyperparameters:
#
# 1. *features* should be one of 128, 256, 512, 1024.
# 2. *lr* should be a float between 0.0001 and 0.1, and it follows an exponential distribution.
# 3. *momentum* should be a float between 0 and 1.
#
# In NNI, the space of *features* is called ``choice``;
# the space of *lr* is called ``loguniform``;
# and the space of *momentum* is called ``uniform``.
# You may have noticed that these names are derived from ``numpy.random``.
#
# For full specification of search space, check :doc:`the reference </hpo/search_space>`.
#
# Now we can define the search space as follows:
search_space = {
'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},
'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
'momentum': {'_type': 'uniform', '_value': [0, 1]},
}
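# %%
# For intuition, each trial receives one concrete sample from this space;
# a sampled configuration might look like the following (hypothetical values,
# for illustration only):
#
# .. code-block:: python
#
#    {'features': 256, 'lr': 0.0021, 'momentum': 0.53}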
# %%
# Step 3: Configure the experiment
# --------------------------------
# NNI uses an *experiment* to manage the HPO process.
# The *experiment config* defines how to train the models and how to explore the search space.
#
# In this tutorial we use a *local* mode experiment,
# which means models will be trained on the local machine, without using any special training platform.
from nni.experiment import Experiment
experiment = Experiment('local')
# %%
# Now we start to configure the experiment.
#
# Configure trial code
# ^^^^^^^^^^^^^^^^^^^^
# In NNI, the evaluation of each hyperparameter set is called a *trial*.
# So the model script is called *trial code*.
experiment.config.trial_command = 'python model.py'
experiment.config.trial_code_directory = '.'
# %%
# When ``trial_code_directory`` is a relative path, it is relative to the current working directory.
# To run ``main.py`` from a different path, you can set the trial code directory to ``Path(__file__).parent``.
# (`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__
# is only available in standard Python, not in Jupyter Notebook.)
#
# .. attention::
#
# If you are using Linux system without Conda,
# you may need to change ``"python model.py"`` to ``"python3 model.py"``.
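# %%
# A hedged sketch of the ``Path(__file__).parent`` alternative mentioned above
# (only meaningful in standard Python, not in Jupyter Notebook):
#
# .. code-block:: python
#
#    from pathlib import Path
#    experiment.config.trial_code_directory = Path(__file__).parent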
# %%
# Configure search space
# ^^^^^^^^^^^^^^^^^^^^^^
experiment.config.search_space = search_space
# %%
# Configure tuning algorithm
# ^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here we use :doc:`TPE tuner </hpo/tuners>`.
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
# %%
# Configure how many trials to run
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
# %%
# .. note::
#
# ``max_trial_number`` is set to 10 here for a fast example.
# In the real world it should be set to a larger number.
# With default config TPE tuner requires 20 trials to warm up.
#
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
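# %%
# For example, the optional duration limit mentioned in the note would be set as:
#
# .. code-block:: python
#
#    experiment.config.max_experiment_duration = '1h'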
# %%
# Step 4: Run the experiment
# --------------------------
# Now the experiment is ready. Choose a port and launch it. (Here we use port 8080.)
#
# You can use the web portal to view experiment status: http://localhost:8080.
experiment.run(8080)
# %%
# After the experiment is done
# ----------------------------
# Everything is done and it is safe to exit now. The following are optional.
#
# If you are using standard Python instead of Jupyter Notebook,
# you can add ``input()`` or ``signal.pause()`` to prevent Python from exiting,
# allowing you to view the web portal after the experiment is done.
# input('Press enter to quit')
experiment.stop()
# %%
# :meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,
# so it can be omitted in your code.
#
# After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart web portal.
#
# .. tip::
#
# This example uses :doc:`Python API </reference/experiment>` to create experiment.
#
# You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`.
"""
Port PyTorch Quickstart to NNI
==============================
This is a modified version of `PyTorch quickstart`_.
It can be run directly and will have the exact same result as the original version.
Furthermore, it enables auto-tuning with an NNI *experiment*, which will be detailed later.
It is recommended to run this script directly first to verify the environment.
There are 2 key differences from the original version:
1. In `Get optimized hyperparameters`_ part, it receives generated hyperparameters.
2. In `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.
.. _PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
"""
# %%
import nni
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
# %%
# Hyperparameters to be tuned
# ---------------------------
# These are the hyperparameters that will be tuned.
params = {
'features': 512,
'lr': 0.001,
'momentum': 0,
}
# %%
# Get optimized hyperparameters
# -----------------------------
# If run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.
# But with an NNI *experiment*, it will receive optimized hyperparameters from tuning algorithm.
optimized_params = nni.get_next_parameter()
params.update(optimized_params)
print(params)
# %%
# Load dataset
# ------------
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
# %%
# Build model with hyperparameters
# --------------------------------
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, params['features']),
nn.ReLU(),
nn.Linear(params['features'], params['features']),
nn.ReLU(),
nn.Linear(params['features'], 10)
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
# %%
# Define train and test
# ---------------------
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
model.train()
for batch, (X, y) in enumerate(dataloader):
X, y = X.to(device), y.to(device)
pred = model(X)
loss = loss_fn(pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
return correct
# %%
# Train model and report accuracy
# -------------------------------
# Report accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.
epochs = 5
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train(train_dataloader, model, loss_fn, optimizer)
accuracy = test(test_dataloader, model, loss_fn)
nni.report_intermediate_result(accuracy)
nni.report_final_result(accuracy)
"""
NNI HPO Quickstart with TensorFlow
==================================
This tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
2. Define hyperparameters' search space.
3. Configure the experiment.
4. Run the experiment.
.. _official TensorFlow quickstart: https://www.tensorflow.org/tutorials/quickstart/beginner
"""
# %%
# Step 1: Prepare the model
# -------------------------
# In the first step, we need to prepare the model to be tuned.
#
# The model should be put in a separate script.
# It will be evaluated many times concurrently,
# and possibly will be trained on distributed platforms.
#
# In this tutorial, the model is defined in :doc:`model.py <model>`.
#
# In short, it is a TensorFlow model with 3 additional API calls:
#
# 1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
# 2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.
# 3. Use :func:`nni.report_final_result` to report final accuracy.
#
# Please understand the model code before continuing to the next step.
# %%
# Step 2: Define search space
# ---------------------------
# In model code, we have prepared 4 hyperparameters to be tuned:
# *dense_units*, *activation_type*, *dropout_rate*, and *learning_rate*.
#
# Here we need to define their *search space* so the tuning algorithm can sample them in the desired range.
#
# Assuming we have the following prior knowledge for these hyperparameters:
#
# 1. *dense_units* should be one of 64, 128, 256.
# 2. *activation_type* should be one of 'relu', 'tanh', 'swish', or None.
# 3. *dropout_rate* should be a float between 0.5 and 0.9.
# 4. *learning_rate* should be a float between 0.0001 and 0.1, and it follows an exponential distribution.
#
# In NNI, the space of *dense_units* and *activation_type* is called ``choice``;
# the space of *dropout_rate* is called ``uniform``;
# and the space of *learning_rate* is called ``loguniform``.
# You may have noticed that these names are derived from ``numpy.random``.
#
# For full specification of search space, check :doc:`the reference </hpo/search_space>`.
#
# Now we can define the search space as follows:
search_space = {
'dense_units': {'_type': 'choice', '_value': [64, 128, 256]},
'activation_type': {'_type': 'choice', '_value': ['relu', 'tanh', 'swish', None]},
'dropout_rate': {'_type': 'uniform', '_value': [0.5, 0.9]},
'learning_rate': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
}
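# %%
# For intuition, each trial receives one concrete sample from this space;
# a sampled configuration might look like the following (hypothetical values,
# for illustration only):
#
# .. code-block:: python
#
#    {'dense_units': 128, 'activation_type': 'tanh', 'dropout_rate': 0.6, 'learning_rate': 0.003}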
# %%
# Step 3: Configure the experiment
# --------------------------------
# NNI uses an *experiment* to manage the HPO process.
# The *experiment config* defines how to train the models and how to explore the search space.
#
# In this tutorial we use a *local* mode experiment,
# which means models will be trained on the local machine, without using any special training platform.
from nni.experiment import Experiment
experiment = Experiment('local')
# %%
# Now we start to configure the experiment.
#
# Configure trial code
# ^^^^^^^^^^^^^^^^^^^^
# In NNI, the evaluation of each hyperparameter set is called a *trial*.
# So the model script is called *trial code*.
experiment.config.trial_command = 'python model.py'
experiment.config.trial_code_directory = '.'
# %%
# When ``trial_code_directory`` is a relative path, it is relative to the current working directory.
# To run ``main.py`` from a different path, you can set the trial code directory to ``Path(__file__).parent``.
# (`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__
# is only available in standard Python, not in Jupyter Notebook.)
#
# .. attention::
#
# If you are using Linux system without Conda,
# you may need to change ``"python model.py"`` to ``"python3 model.py"``.
# %%
# Configure search space
# ^^^^^^^^^^^^^^^^^^^^^^
experiment.config.search_space = search_space
# %%
# Configure tuning algorithm
# ^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here we use :doc:`TPE tuner </hpo/tuners>`.
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
# %%
# Configure how many trials to run
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
# %%
# .. note::
#
#    ``max_trial_number`` is set to 10 here for a fast example.
#    In the real world, it should be set to a larger number.
#    With the default config, the TPE tuner requires 20 trials to warm up.
#
#    You may also set ``max_experiment_duration = '1h'`` to limit the running time.
#
#    If neither ``max_trial_number`` nor ``max_experiment_duration`` is set,
#    the experiment will run forever until you press Ctrl-C.
# %%
# Step 4: Run the experiment
# --------------------------
# Now the experiment is ready. Choose a port and launch it. (Here we use port 8080.)
#
# You can use the web portal to view experiment status: http://localhost:8080.
experiment.run(8080)
# %%
# After the experiment is done
# ----------------------------
# Everything is done and it is safe to exit now. The following are optional.
#
# If you are using standard Python instead of Jupyter Notebook,
# you can add ``input()`` or ``signal.pause()`` to prevent Python from exiting,
# allowing you to view the web portal after the experiment is done.
# input('Press enter to quit')
experiment.stop()
# %%
# :meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,
# so it can be omitted in your code.
#
# After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart web portal.
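#
# For example, a hedged sketch (the exact signature may vary across NNI versions; replace the
# placeholder ID with the experiment ID shown in your web portal):
#
# .. code-block:: python
#
#    from nni.experiment import Experiment
#    Experiment.view('YOUR_EXPERIMENT_ID', port=8080)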
#
# .. tip::
#
#    This example uses the :doc:`Python API </reference/experiment>` to create the experiment.
#
#    You can also create and manage experiments with the :doc:`command line tool </reference/nnictl>`.
"""
Port TensorFlow Quickstart to NNI
=================================
This is a modified version of `TensorFlow quickstart`_.
It can be run directly and will have the exact same result as the original version.
Furthermore, it enables auto-tuning with an NNI *experiment*, which will be detailed later.
It is recommended to run this script directly first to verify the environment.

There are 3 key differences from the original version:

1. In the `Get optimized hyperparameters`_ part, it receives generated hyperparameters.
2. In the `(Optional) Report intermediate results`_ part, it reports per-epoch accuracy metrics.
3. In the `Report final result`_ part, it reports final accuracy.

.. _TensorFlow quickstart: https://www.tensorflow.org/tutorials/quickstart/beginner
"""
# %%
import nni
import tensorflow as tf
# %%
# Hyperparameters to be tuned
# ---------------------------
# These are the hyperparameters that will be tuned later.
params = {
'dense_units': 128,
'activation_type': 'relu',
'dropout_rate': 0.2,
'learning_rate': 0.001,
}
# %%
# Get optimized hyperparameters
# -----------------------------
# If run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.
# But with an NNI *experiment*, it will receive optimized hyperparameters from the tuning algorithm.
optimized_params = nni.get_next_parameter()
params.update(optimized_params)
print(params)
# %%
# Load dataset
# ------------
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# %%
# Build model with hyperparameters
# --------------------------------
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(params['dense_units'], activation=params['activation_type']),
tf.keras.layers.Dropout(params['dropout_rate']),
tf.keras.layers.Dense(10)
])
adam = tf.keras.optimizers.Adam(learning_rate=params['learning_rate'])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=adam, loss=loss_fn, metrics=['accuracy'])
# %%
# (Optional) Report intermediate results
# --------------------------------------
# The callback reports per-epoch accuracy to show the learning curve in the web portal.
# You can also leverage the metrics for early stopping with :doc:`NNI assessors </hpo/assessors>`.
#
# This part can be safely skipped and the experiment will work fine.
callback = tf.keras.callbacks.LambdaCallback(
on_epoch_end = lambda epoch, logs: nni.report_intermediate_result(logs['accuracy'])
)
# %%
# Train and evaluate the model
# ---------------------------
model.fit(x_train, y_train, epochs=5, verbose=2, callbacks=[callback])
loss, accuracy = model.evaluate(x_test, y_test, verbose=2)
# %%
# Report final result
# -------------------
# Report final accuracy to NNI so the tuning algorithm can suggest better hyperparameters.
nni.report_final_result(accuracy)
"""
Get started with NAS on MNIST
=============================
"""
# %%
a = (1, 2, 3)
a
# %%
print('hello')
"""
Use NAS Benchmarks as Datasets
==============================
In this tutorial, we show how to use NAS Benchmarks as datasets.
For research purposes, we sometimes want to query the benchmarks for architecture accuracies,
rather than training each architecture from scratch.
NNI provides query tools so that users can easily retrieve the data in NAS benchmarks.
"""
# %%
# Prerequisites
# -------------
# This tutorial assumes that you have already prepared your NAS benchmarks under the cache directory
# (by default, ``~/.cache/nni/nasbenchmark``).
# If you haven't, please follow the data preparation guide in :doc:`/nas/benchmarks`.
#
# As a result, the directory should look like:
import os
os.listdir(os.path.expanduser('~/.cache/nni/nasbenchmark'))
# %%
import pprint
from nni.nas.benchmarks.nasbench101 import query_nb101_trial_stats
from nni.nas.benchmarks.nasbench201 import query_nb201_trial_stats
from nni.nas.benchmarks.nds import query_nds_trial_stats
# %%
# NAS-Bench-101
# -------------
#
# Use the following architecture as an example:
#
# .. image:: ../../img/nas-bench-101-example.png
arch = {
'op1': 'conv3x3-bn-relu',
'op2': 'maxpool3x3',
'op3': 'conv3x3-bn-relu',
'op4': 'conv3x3-bn-relu',
'op5': 'conv1x1-bn-relu',
'input1': [0],
'input2': [1],
'input3': [2],
'input4': [0],
'input5': [0, 3, 4],
'input6': [2, 5]
}
for t in query_nb101_trial_stats(arch, 108, include_intermediates=True):
pprint.pprint(t)
# %%
# An architecture in NAS-Bench-101 may have been trained more than once.
# Each element of the returned generator is a dict containing one of the training results of this trial config
# (architecture + hyper-parameters), including train/valid/test accuracy,
# training time, number of epochs, etc. The results of NAS-Bench-201 and NDS follow similar formats.
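#
# As a hedged sketch, individual fields can be picked out of each dict like below
# (the key names follow the output printed above and may differ across NNI versions):
#
# .. code-block:: python
#
#    for t in query_nb101_trial_stats(arch, 108):
#        # 'train_acc', 'valid_acc' and 'test_acc' are assumed key names;
#        # check the printed output above for the exact schema.
#        print(t.get('train_acc'), t.get('valid_acc'), t.get('test_acc'))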
#
# NAS-Bench-201
# -------------
#
# Use the following architecture as an example:
#
# .. image:: ../../img/nas-bench-201-example.png
arch = {
'0_1': 'avg_pool_3x3',
'0_2': 'conv_1x1',
'1_2': 'skip_connect',
'0_3': 'conv_1x1',
'1_3': 'skip_connect',
'2_3': 'skip_connect'
}
for t in query_nb201_trial_stats(arch, 200, 'cifar100'):
pprint.pprint(t)
# %%
# Intermediate results are also available.
for t in query_nb201_trial_stats(arch, None, 'imagenet16-120', include_intermediates=True):
print(t['config'])
print('Intermediates:', len(t['intermediates']))
# %%
# NDS
# ---
#
# Use the following architecture as an example:
#
# .. image:: ../../img/nas-bench-nds-example.png
#
# Here, ``bot_muls``, ``ds``, ``num_gs``, ``ss`` and ``ws`` stand for "bottleneck multipliers",
# "depths", "number of groups", "strides" and "widths" respectively.
# %%
model_spec = {
'bot_muls': [0.0, 0.25, 0.25, 0.25],
'ds': [1, 16, 1, 4],
'num_gs': [1, 2, 1, 2],
'ss': [1, 1, 2, 2],
'ws': [16, 64, 128, 16]
}
# %%
# Use ``None`` as a wildcard.
for t in query_nds_trial_stats('residual_bottleneck', None, None, model_spec, None, 'cifar10'):
pprint.pprint(t)
# %%
model_spec = {
'bot_muls': [0.0, 0.25, 0.25, 0.25],
'ds': [1, 16, 1, 4],
'num_gs': [1, 2, 1, 2],
'ss': [1, 1, 2, 2],
'ws': [16, 64, 128, 16]
}
for t in query_nds_trial_stats('residual_bottleneck', None, None, model_spec, None, 'cifar10', include_intermediates=True):
pprint.pprint(t['intermediates'][:10])
# %%
model_spec = {'ds': [1, 12, 12, 12], 'ss': [1, 1, 2, 2], 'ws': [16, 24, 24, 40]}
for t in query_nds_trial_stats('residual_basic', 'resnet', 'random', model_spec, {}, 'cifar10'):
pprint.pprint(t)
# %%
# Get the first one.
pprint.pprint(next(query_nds_trial_stats('vanilla', None, None, None, None, None)))
# %%
# Query a specific architecture in the NAS cell search space.
model_spec = {'num_nodes_normal': 5, 'num_nodes_reduce': 5, 'depth': 12, 'width': 32, 'aux': False, 'drop_prob': 0.0}
cell_spec = {
'normal_0_op_x': 'avg_pool_3x3',
'normal_0_input_x': 0,
'normal_0_op_y': 'conv_7x1_1x7',
'normal_0_input_y': 1,
'normal_1_op_x': 'sep_conv_3x3',
'normal_1_input_x': 2,
'normal_1_op_y': 'sep_conv_5x5',
'normal_1_input_y': 0,
'normal_2_op_x': 'dil_sep_conv_3x3',
'normal_2_input_x': 2,
'normal_2_op_y': 'dil_sep_conv_3x3',
'normal_2_input_y': 2,
'normal_3_op_x': 'skip_connect',
'normal_3_input_x': 4,
'normal_3_op_y': 'dil_sep_conv_3x3',
'normal_3_input_y': 4,
'normal_4_op_x': 'conv_7x1_1x7',
'normal_4_input_x': 2,
'normal_4_op_y': 'sep_conv_3x3',
'normal_4_input_y': 4,
'normal_concat': [3, 5, 6],
'reduce_0_op_x': 'avg_pool_3x3',
'reduce_0_input_x': 0,
'reduce_0_op_y': 'dil_sep_conv_3x3',
'reduce_0_input_y': 1,
'reduce_1_op_x': 'sep_conv_3x3',
'reduce_1_input_x': 0,
'reduce_1_op_y': 'sep_conv_3x3',
'reduce_1_input_y': 0,
'reduce_2_op_x': 'skip_connect',
'reduce_2_input_x': 2,
'reduce_2_op_y': 'sep_conv_7x7',
'reduce_2_input_y': 0,
'reduce_3_op_x': 'conv_7x1_1x7',
'reduce_3_input_x': 4,
'reduce_3_op_y': 'skip_connect',
'reduce_3_input_y': 4,
'reduce_4_op_x': 'conv_7x1_1x7',
'reduce_4_input_x': 0,
'reduce_4_op_y': 'conv_7x1_1x7',
'reduce_4_input_y': 5,
'reduce_concat': [3, 6]
}
for t in query_nds_trial_stats('nas_cell', None, None, model_spec, cell_spec, 'cifar10'):
assert t['config']['model_spec'] == model_spec
assert t['config']['cell_spec'] == cell_spec
pprint.pprint(t)
# %%
# Count the number of matched records.
print('NDS (amoeba) count:', len(list(query_nds_trial_stats(None, 'amoeba', None, None, None, None, None))))
"""
Customize Basic Pruner
======================
Users can easily customize a basic pruner in NNI. A large number of basic modules are provided and can be reused.
Following the NNI pruning interface, users only need to focus on the creative parts of their algorithm
without worrying about the other regular modules. In this tutorial, we show how to customize a basic pruner.

Concepts
--------
NNI abstracts the basic pruning process into three steps: collecting data, calculating metrics, and allocating sparsity.
Most pruning algorithms rely on a metric to decide what should be pruned. Taking the L1 norm pruner as an example,
the first step is collecting the model weights, the second step is calculating the L1 norm of the weight per output channel,
and the third step is ranking the L1 norm metric and masking the output channels with small L1 norms.
In the NNI basic pruner, these three steps are implemented as ``DataCollector``, ``MetricsCalculator`` and ``SparsityAllocator``.

- ``DataCollector``: This module takes the pruner as an initialization parameter.
  It gets the relevant information of the model from the pruner,
  and sometimes it also hooks the model to get the input, output or gradient of a layer or a tensor.
  It can also patch the optimizer if some special steps need to be executed before or after ``optimizer.step()``.

- ``MetricsCalculator``: This module takes the data collected by the ``DataCollector``,
  then calculates the metrics. The metric shape is usually reduced from the data shape.
  The ``dim`` taken by ``MetricsCalculator`` specifies which dimensions are kept after calculating metrics.
  E.g., if the collected data shape is (10, 20, 30) and ``dim`` is 1, then dimension 1 is kept
  and the output metric shape is (20,). (See the sketch after this list.)

- ``SparsityAllocator``: This module takes the metrics and generates the masks.
  Different ``SparsityAllocator`` subclasses have different mask generation strategies.
  A common and simple strategy is sorting the metric values, calculating a threshold according to the configured sparsity,
  and masking the positions whose metric values are smaller than the threshold.
  The ``dim`` taken by ``SparsityAllocator`` specifies which dimensions of the weight the metrics correspond to;
  the mask is first generated with the metric shape and then expanded to the weight shape.
  E.g., if the metric shape is (20,), the corresponding layer weight shape is (20, 40), and ``dim`` is 0,
  ``SparsityAllocator`` will first generate a mask with shape (20,), then expand this mask to shape (20, 40).
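
A minimal sketch of the two ``dim`` semantics above, using plain ``torch`` shapes
(illustrative only; the real pruning tools handle this internally):

.. code-block:: python

    import torch

    # MetricsCalculator with dim=1: reduce (10, 20, 30) data to a (20,) metric.
    data = torch.rand(10, 20, 30)
    metric = data.abs().sum(dim=(0, 2))            # shape: (20,)

    # SparsityAllocator with dim=0: expand a (20,) mask to a (20, 40) weight mask.
    mask = (metric > metric.median()).float()      # shape: (20,)
    weight_mask = mask.unsqueeze(1).expand(20, 40) # shape: (20, 40)
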
Simple Example: Customize a Block-L1NormPruner
----------------------------------------------
NNI already has an L1NormPruner, but in order to reproduce the paper and reduce user configuration items,
it only supports pruning layer output channels. In this example, we customize a pruner that supports block
granularity for ``Linear`` layers.

Note that you don't need to implement all three kinds of pruning tools every time;
NNI provides many predefined tools that you can reuse directly to customize your own pruner.
Since this is a tutorial, we show how to define all three kinds of pruning tools.

We customize the pruning tools used by the pruner first.
"""
import torch
from nni.algorithms.compression.v2.pytorch.pruning.basic_pruner import BasicPruner
from nni.algorithms.compression.v2.pytorch.pruning.tools import (
DataCollector,
MetricsCalculator,
SparsityAllocator
)
# This data collector collects the weights of the wrapped modules as data.
# A wrapped module is a module configured in the pruner's config_list.
# This implementation is similar to nni.algorithms.compression.v2.pytorch.pruning.tools.WeightDataCollector.
class WeightDataCollector(DataCollector):
def collect(self):
data = {}
        # get_modules_wrapper returns all the wrappers in the compressor (pruner)
        # as a dict with format {wrapper_name: wrapper};
        # use wrapper.module to get the wrapped module.
for _, wrapper in self.compressor.get_modules_wrapper().items():
data[wrapper.name] = wrapper.module.weight.data
# return {wrapper_name: weight_data}
return data
class BlockNormMetricsCalculator(MetricsCalculator):
def __init__(self, block_sparse_size):
        # Because we keep all dimensions with block granularity, we fix ``dim=None``,
        # which means all dimensions will be kept.
super().__init__(dim=None, block_sparse_size=block_sparse_size)
def calculate_metrics(self, data):
data_length = len(self.block_sparse_size)
reduce_unfold_dims = list(range(data_length, 2 * data_length))
metrics = {}
for name, t in data.items():
            # Unfold t by the block size, then calculate the L1 norm of each block.
for dim, size in enumerate(self.block_sparse_size):
t = t.unfold(dim, size, size)
metrics[name] = t.norm(dim=reduce_unfold_dims, p=1)
# return {wrapper_name: block_metric}
return metrics
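# %%
# A quick illustration (separate from the pruner) of how ``unfold`` produces block views,
# assuming a (4, 8) weight and a 2x2 block:
#
# .. code-block:: python
#
#    t = torch.rand(4, 8)
#    t = t.unfold(0, 2, 2).unfold(1, 2, 2)   # shape: (2, 4, 2, 2)
#    block_l1 = t.norm(dim=[2, 3], p=1)      # shape: (2, 4), one L1 norm per 2x2 block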
# This implementation is similar to nni.algorithms.compression.v2.pytorch.pruning.tools.NormalSparsityAllocator.
class BlockSparsityAllocator(SparsityAllocator):
def __init__(self, pruner, block_sparse_size):
super().__init__(pruner, dim=None, block_sparse_size=block_sparse_size, continuous_mask=True)
def generate_sparsity(self, metrics):
masks = {}
for name, wrapper in self.pruner.get_modules_wrapper().items():
            # wrapper.config['total_sparsity'] is the configured sparsity ratio for this wrapped module
sparsity_rate = wrapper.config['total_sparsity']
# get metric for this wrapped module
metric = metrics[name]
            # mask the metric with the old mask so that already-masked positions never recover;
            # keeping this as-is is fine if you are new to NNI pruning
if self.continuous_mask:
metric *= self._compress_mask(wrapper.weight_mask)
# convert sparsity ratio to prune number
prune_num = int(sparsity_rate * metric.numel())
# calculate the metric threshold
threshold = torch.topk(metric.view(-1), prune_num, largest=False)[0].max()
            # generate the mask, keeping positions whose metric values are greater than the threshold
mask = torch.gt(metric, threshold).type_as(metric)
            # expand the mask to the weight size; a masked block is filled with zeros,
            # a kept block is filled with ones
masks[name] = self._expand_mask(name, mask)
            # merge the new mask with the old mask so that already-masked positions never recover;
            # keeping this as-is is fine if you are new to NNI pruning
if self.continuous_mask:
masks[name]['weight'] *= wrapper.weight_mask
return masks
# %%
# Customize the pruner.
class BlockL1NormPruner(BasicPruner):
def __init__(self, model, config_list, block_sparse_size):
self.block_sparse_size = block_sparse_size
super().__init__(model, config_list)
    # Implementing reset_tools is enough for this pruner.
def reset_tools(self):
if self.data_collector is None:
self.data_collector = WeightDataCollector(self)
else:
self.data_collector.reset()
if self.metrics_calculator is None:
self.metrics_calculator = BlockNormMetricsCalculator(self.block_sparse_size)
if self.sparsity_allocator is None:
self.sparsity_allocator = BlockSparsityAllocator(self, self.block_sparse_size)
# %%
# Try this pruner.
# Define a simple model.
class TestModel(torch.nn.Module):
def __init__(self) -> None:
super().__init__()
self.fc1 = torch.nn.Linear(4, 8)
self.fc2 = torch.nn.Linear(8, 4)
def forward(self, x):
return self.fc2(self.fc1(x))
model = TestModel()
config_list = [{'op_types': ['Linear'], 'total_sparsity': 0.5}]
# Use a 2x2 block granularity.
_, masks = BlockL1NormPruner(model, config_list, [2, 2]).compress()
# show the generated masks
print('fc1 masks:\n', masks['fc1']['weight'])
print('fc2 masks:\n', masks['fc2']['weight'])
# %%
# We have now successfully defined a new pruner with block pruning granularity!
# Note that we omit validation logic in this example (e.g., ``_validate_config_before_canonical``),
# but for a robust implementation, we suggest adding it.