Unverified Commit ec6e6594 authored by J-shang, committed by GitHub

[Doc] fix broken links (#4778)

parent fb0c2734
@@ -24,7 +24,7 @@ and handle the repeated trial-evaluate-feedback loop in the **framework** abstraction
contains two main components: a **benchmark** from the automlbenchmark library, and an **architecture** which defines the search
space and the evaluator. To further clarify, we provide definitions for the terminology used in this document.
-* **tuner**\ : a `tuner or advisor provided by NNI <https://nni.readthedocs.io/en/stable/builtin_tuner.html>`_, or a custom tuner provided by the user.
+* **tuner**\ : a :doc:`tuner or advisor provided by NNI <tuners>`, or a custom tuner provided by the user.
* **task**\ : an abstraction used by automlbenchmark. A task can be thought of as a tuple (dataset, metric). It provides train and test datasets to the frameworks. Then, based on the returned predictions on the test set, the task evaluates the metric (e.g., mse for regression, f1 for classification) and reports the score.
* **benchmark**\ : an abstraction used by automlbenchmark. A benchmark is a set of tasks, along with other external constraints such as time limits.
* **framework**\ : an abstraction used by automlbenchmark. Given a task, a framework solves the proposed regression or classification problem using train data and produces predictions on the test set. In our implementation, each framework is an architecture, which defines a search space. To evaluate a task given by the benchmark on a specific tuner, we let the tuner continuously tune the hyperparameters (by giving it cross-validation score on the train data as feedback) until the time or trial limit is reached. Then, the architecture is retrained on the entire train set using the best set of hyperparameters.
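The **framework** evaluation procedure above is easiest to see as a loop. The following is a rough Python sketch rather than the benchmark's actual code: the ``architecture`` and ``task`` methods are stand-ins for the automlbenchmark plumbing, while ``generate_parameters`` and ``receive_trial_result`` are the real NNI tuner API:

.. code-block:: python

    import time

    def evaluate_framework(tuner, architecture, task, time_limit):
        """Trial-evaluate-feedback loop, then retrain with the best hyperparameters."""
        best_score, best_params = float('-inf'), None
        deadline = time.time() + time_limit
        trial_id = 0
        while time.time() < deadline:
            params = tuner.generate_parameters(trial_id)          # tuner proposes hyperparameters
            score = architecture.cross_val_score(task.train_data, params)
            tuner.receive_trial_result(trial_id, params, score)   # cross-validation score as feedback
            if score > best_score:
                best_score, best_params = score, params
            trial_id += 1
        model = architecture.retrain(task.train_data, best_params)  # retrain on the entire train set
        return task.evaluate(model.predict(task.test_data))         # task scores the test predictions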
@@ -148,7 +148,7 @@ By default, the script runs the specified tuners against the specified benchmark
all tuners simultaneously in the background, set the "serialize" flag to false in ``runbenchmark_nni.sh``.
Note: the SMAC tuner, DNGO tuner, and the BOHB advisor have to be manually installed before running benchmarks on them.
-Please refer to `this page <https://nni.readthedocs.io/en/stable/Tuner/BuiltinTuner.html?highlight=nni>`_ for more details
+Please refer to :doc:`this page <tuners>` for more details
on installation.
Run customized benchmarks on existing tuners
@@ -163,10 +163,10 @@ Run benchmarks on custom tuners
You may also use the benchmark to compare a custom tuner written by yourself with the NNI built-in tuners. To use custom
tuners, first make sure that the tuner inherits from ``nni.tuner.Tuner`` and correctly implements the required APIs. For
-more information on implementing a custom tuner, please refer to `here <https://nni.readthedocs.io/en/stable/Tuner/CustomizeTuner.html>`_.
+more information on implementing a custom tuner, please refer to :doc:`here <custom_algorithm>`.
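For orientation, a minimal tuner satisfying this interface could look like the sketch below; ``update_search_space``, ``generate_parameters``, and ``receive_trial_result`` are the required ``nni.tuner.Tuner`` methods, while the random-choice sampling is only an illustrative placeholder strategy:

.. code-block:: python

    import random
    from nni.tuner import Tuner

    class RandomChoiceTuner(Tuner):
        """Toy tuner: samples uniformly from 'choice' entries of the search space."""

        def __init__(self):
            self.search_space = {}

        def update_search_space(self, search_space):
            # called with the search space definition when the experiment starts
            self.search_space = search_space

        def generate_parameters(self, parameter_id, **kwargs):
            # return one hyperparameter configuration for the requested trial
            return {name: random.choice(spec['_value'])
                    for name, spec in self.search_space.items()
                    if spec['_type'] == 'choice'}

        def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
            # `value` is the trial's final score; a real tuner would learn from it
            pass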
Next, perform the following steps:
-#. Install the custom tuner via the command ``nnictl algo register``. Check `this document <https://nni.readthedocs.io/en/stable/Tutorial/Nnictl.html>`_ for details.
+#. Install the custom tuner via the command ``nnictl algo register``. Check :doc:`this document <../reference/nnictl>` for details.
#. In ``./nni/frameworks.yaml``\ , add a new framework extending the base framework NNI. Make sure that the parameter ``tuner_type`` corresponds to the "builtinName" of the tuner installed in step 1 (a sketch follows this list).
#. Run the following command
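To make step 2 concrete, a hypothetical ``frameworks.yaml`` entry could extend the base ``NNI`` framework roughly as follows; only ``extends`` and ``tuner_type`` are taken from the text above, and every other detail is an illustrative assumption rather than the file's actual schema:

.. code-block:: yaml

    # hypothetical entry in ./nni/frameworks.yaml
    RandomChoiceTuner:
      extends: NNI                      # inherit the base NNI framework definition
      params:
        tuner_type: RandomChoiceTuner   # must match the "builtinName" registered in step 1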
...
@@ -15,7 +15,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# SpeedUp Model with Calibration Config\n\n\n## Introduction\n\nDeep learning network has been computational intensive and memory intensive \nwhich increases the difficulty of deploying deep neural network model. Quantization is a \nfundamental technology which is widely used to reduce memory footprint and speedup inference \nprocess. Many frameworks begin to support quantization, but few of them support mixed precision \nquantization and get real speedup. Frameworks like `HAQ: Hardware-Aware Automated Quantization with Mixed Precision <https://arxiv.org/pdf/1811.08886.pdf>`__\\, only support simulated mixed precision quantization which will \nnot speedup the inference process. To get real speedup of mixed precision quantization and \nhelp people get the real feedback from hardware, we design a general framework with simple interface to allow NNI quantization algorithms to connect different \nDL model optimization backends (e.g., TensorRT, NNFusion), which gives users an end-to-end experience that after quantizing their model \nwith quantization algorithms, the quantized model can be directly speeded up with the connected optimization backend. NNI connects \nTensorRT at this stage, and will support more backends in the future.\n\n\n## Design and Implementation\n\nTo support speeding up mixed precision quantization, we divide framework into two part, frontend and backend. \nFrontend could be popular training frameworks such as PyTorch, TensorFlow etc. Backend could be inference \nframework for different hardwares, such as TensorRT. At present, we support PyTorch as frontend and \nTensorRT as backend. To convert PyTorch model to TensorRT engine, we leverage onnx as intermediate graph \nrepresentation. In this way, we convert PyTorch model to onnx model, then TensorRT parse onnx \nmodel to generate inference engine. \n\n\nQuantization aware training combines NNI quantization algorithm 'QAT' and NNI quantization speedup tool.\nUsers should set config to train quantized model using QAT algorithm(please refer to `NNI Quantization Algorithms <https://nni.readthedocs.io/en/stable/Compression/Quantizer.html>`__\\ ).\nAfter quantization aware training, users can get new config with calibration parameters and model with quantized weight. By passing new config and model to quantization speedup tool, users can get real mixed precision speedup engine to do inference.\n\n\nAfter getting mixed precision engine, users can do inference with input data.\n\n\nNote\n\n\n* Recommend using \"cpu\"(host) as data device(for both inference data and calibration data) since data should be on host initially and it will be transposed to device before inference. If data type is not \"cpu\"(host), this tool will transpose it to \"cpu\" which may increases unnecessary overhead.\n* User can also do post-training quantization leveraging TensorRT directly(need to provide calibration dataset).\n* Not all op types are supported right now. At present, NNI supports Conv, Linear, Relu and MaxPool. 
More op types will be supported in the following release.\n\n\n## Prerequisite\nCUDA version >= 11.0\n\nTensorRT version >= 7.2\n\nNote\n\n* If you haven't installed TensorRT before or use the old version, please refer to `TensorRT Installation Guide <https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html>`__\\ \n\n## Usage\n" "\n# SpeedUp Model with Calibration Config\n\n\n## Introduction\n\nDeep learning network has been computational intensive and memory intensive \nwhich increases the difficulty of deploying deep neural network model. Quantization is a \nfundamental technology which is widely used to reduce memory footprint and speedup inference \nprocess. Many frameworks begin to support quantization, but few of them support mixed precision \nquantization and get real speedup. Frameworks like `HAQ: Hardware-Aware Automated Quantization with Mixed Precision <https://arxiv.org/pdf/1811.08886.pdf>`__\\, only support simulated mixed precision quantization which will \nnot speedup the inference process. To get real speedup of mixed precision quantization and \nhelp people get the real feedback from hardware, we design a general framework with simple interface to allow NNI quantization algorithms to connect different \nDL model optimization backends (e.g., TensorRT, NNFusion), which gives users an end-to-end experience that after quantizing their model \nwith quantization algorithms, the quantized model can be directly speeded up with the connected optimization backend. NNI connects \nTensorRT at this stage, and will support more backends in the future.\n\n\n## Design and Implementation\n\nTo support speeding up mixed precision quantization, we divide framework into two part, frontend and backend. \nFrontend could be popular training frameworks such as PyTorch, TensorFlow etc. Backend could be inference \nframework for different hardwares, such as TensorRT. At present, we support PyTorch as frontend and \nTensorRT as backend. To convert PyTorch model to TensorRT engine, we leverage onnx as intermediate graph \nrepresentation. In this way, we convert PyTorch model to onnx model, then TensorRT parse onnx \nmodel to generate inference engine. \n\n\nQuantization aware training combines NNI quantization algorithm 'QAT' and NNI quantization speedup tool.\nUsers should set config to train quantized model using QAT algorithm(please refer to :doc:`NNI Quantization Algorithms <../compression/quantizer>` ).\nAfter quantization aware training, users can get new config with calibration parameters and model with quantized weight. By passing new config and model to quantization speedup tool, users can get real mixed precision speedup engine to do inference.\n\n\nAfter getting mixed precision engine, users can do inference with input data.\n\n\nNote\n\n\n* Recommend using \"cpu\"(host) as data device(for both inference data and calibration data) since data should be on host initially and it will be transposed to device before inference. If data type is not \"cpu\"(host), this tool will transpose it to \"cpu\" which may increases unnecessary overhead.\n* User can also do post-training quantization leveraging TensorRT directly(need to provide calibration dataset).\n* Not all op types are supported right now. At present, NNI supports Conv, Linear, Relu and MaxPool. 
More op types will be supported in the following release.\n\n\n## Prerequisite\nCUDA version >= 11.0\n\nTensorRT version >= 7.2\n\nNote\n\n* If you haven't installed TensorRT before or use the old version, please refer to `TensorRT Installation Guide <https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html>`__\\ \n\n## Usage\n"
]
},
{
@@ -87,7 +87,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that NNI also supports post-training quantization directly, please refer to complete examples for detail.\n\nFor complete examples please refer to :githublink:`the code <examples/model_compress/quantization/mixed_precision_speedup_mnist.py>`.\n\nFor more parameters about the class 'TensorRTModelSpeedUp', you can refer to `Model Compression API Reference <https://nni.readthedocs.io/en/stable/Compression/CompressionReference.html#quantization-speedup>`__\\.\n\n### Mnist test\n\non one GTX2080 GPU,\ninput tensor: ``torch.randn(128, 1, 28, 28)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - quantization strategy\n - Latency\n - accuracy\n * - all in 32bit\n - 0.001199961\n - 96%\n * - mixed precision(average bit 20.4)\n - 0.000753688\n - 96%\n * - all in 8bit\n - 0.000229869\n - 93.7%\n\n### Cifar10 resnet18 test (train one epoch)\n\non one GTX2080 GPU,\ninput tensor: ``torch.randn(128, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - quantization strategy\n - Latency\n - accuracy\n * - all in 32bit\n - 0.003286268\n - 54.21%\n * - mixed precision(average bit 11.55)\n - 0.001358022\n - 54.78%\n * - all in 8bit\n - 0.000859139\n - 52.81%\n\n" "Note that NNI also supports post-training quantization directly, please refer to complete examples for detail.\n\nFor complete examples please refer to :githublink:`the code <examples/model_compress/quantization/mixed_precision_speedup_mnist.py>`.\n\nFor more parameters about the class 'TensorRTModelSpeedUp', you can refer to :doc:`Model Compression API Reference <../reference/compression/quantization_speedup>`.\n\n### Mnist test\n\non one GTX2080 GPU,\ninput tensor: ``torch.randn(128, 1, 28, 28)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - quantization strategy\n - Latency\n - accuracy\n * - all in 32bit\n - 0.001199961\n - 96%\n * - mixed precision(average bit 20.4)\n - 0.000753688\n - 96%\n * - all in 8bit\n - 0.000229869\n - 93.7%\n\n### Cifar10 resnet18 test (train one epoch)\n\non one GTX2080 GPU,\ninput tensor: ``torch.randn(128, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - quantization strategy\n - Latency\n - accuracy\n * - all in 32bit\n - 0.003286268\n - 54.21%\n * - mixed precision(average bit 11.55)\n - 0.001358022\n - 54.78%\n * - all in 8bit\n - 0.000859139\n - 52.81%\n\n"
]
}
],
...
@@ -30,7 +30,7 @@ model to generate inference engine.
Quantization aware training combines NNI quantization algorithm 'QAT' and NNI quantization speedup tool.
-Users should set config to train quantized model using QAT algorithm(please refer to `NNI Quantization Algorithms <https://nni.readthedocs.io/en/stable/Compression/Quantizer.html>`__\ ).
+Users should set config to train quantized model using QAT algorithm(please refer to :doc:`NNI Quantization Algorithms <../compression/quantizer>` ).
After quantization aware training, users can get new config with calibration parameters and model with quantized weight. By passing new config and model to quantization speedup tool, users can get real mixed precision speedup engine to do inference.
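As a rough end-to-end sketch of this flow, assuming the ``QAT_Quantizer`` and ``ModelSpeedupTensorRT`` classes as exposed in recent NNI 2.x releases (class names, import paths, the toy model, and the config values here are illustrative and may differ across versions):

.. code-block:: python

    import torch
    import torch.nn as nn
    from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer
    from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT

    # a toy model and optimizer; any PyTorch model with Conv2d/Linear layers works
    model = nn.Sequential(nn.Conv2d(1, 8, 3), nn.ReLU(), nn.Flatten(),
                          nn.Linear(8 * 26 * 26, 10)).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # configure which layers and bit-widths to quantize, then wrap the model for QAT
    config_list = [{
        'quant_types': ['input', 'weight'],
        'quant_bits': {'input': 8, 'weight': 8},
        'op_types': ['Conv2d', 'Linear'],
    }]
    quantizer = QAT_Quantizer(model, config_list, optimizer)
    quantizer.compress()

    # ... quantization-aware training loop on your dataset goes here ...

    # export quantized weights plus the calibration parameters tracked during QAT
    calibration_config = quantizer.export_model('model.pth', 'calibration.pth')

    # build a real mixed-precision TensorRT engine and run inference with it;
    # input data stays on "cpu" (host), as the note above recommends
    engine = ModelSpeedupTensorRT(model, input_shape=(32, 1, 28, 28),
                                  config=calibration_config, batchsize=32)
    engine.compress()
    output, latency = engine.inference(torch.randn(32, 1, 28, 28))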
@@ -119,7 +119,7 @@ test_trt(engine)
#
# For complete examples please refer to :githublink:`the code <examples/model_compress/quantization/mixed_precision_speedup_mnist.py>`.
#
-# For more parameters about the class 'TensorRTModelSpeedUp', you can refer to `Model Compression API Reference <https://nni.readthedocs.io/en/stable/Compression/CompressionReference.html#quantization-speedup>`__\.
+# For more parameters about the class 'TensorRTModelSpeedUp', you can refer to :doc:`Model Compression API Reference <../reference/compression/quantization_speedup>`.
#
# Mnist test
# ^^^^^^^^^^
...
-dc4ece0a2ec69cd5d8b3e7cd9ddeff20
\ No newline at end of file
+2404b8d0c3958a0191b77bbe882456e4
\ No newline at end of file
@@ -49,7 +49,7 @@ model to generate inference engine.
Quantization aware training combines NNI quantization algorithm 'QAT' and NNI quantization speedup tool.
-Users should set config to train quantized model using QAT algorithm(please refer to `NNI Quantization Algorithms <https://nni.readthedocs.io/en/stable/Compression/Quantizer.html>`__\ ).
+Users should set config to train quantized model using QAT algorithm(please refer to :doc:`NNI Quantization Algorithms <../compression/quantizer>` ).
After quantization aware training, users can get new config with calibration parameters and model with quantized weight. By passing new config and model to quantization speedup tool, users can get real mixed precision speedup engine to do inference.
@@ -174,9 +174,9 @@ finetuning the model by using QAT
.. code-block:: none
-    Average test loss: 0.3444, Accuracy: 9141/10000 (91%)
-    Average test loss: 0.1325, Accuracy: 9599/10000 (96%)
-    Average test loss: 0.0980, Accuracy: 9700/10000 (97%)
+    Average test loss: 0.5386, Accuracy: 8619/10000 (86%)
+    Average test loss: 0.1553, Accuracy: 9521/10000 (95%)
+    Average test loss: 0.1001, Accuracy: 9686/10000 (97%)
@@ -207,7 +207,7 @@ export model and get calibration_config
.. code-block:: none
-    calibration_config: {'conv1': {'weight_bits': 8, 'weight_scale': tensor([0.0029], device='cuda:0'), 'weight_zero_point': tensor([121.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': -0.4242129623889923, 'tracked_max_input': 2.821486711502075}, 'conv2': {'weight_bits': 8, 'weight_scale': tensor([0.0015], device='cuda:0'), 'weight_zero_point': tensor([109.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 7.498777389526367}, 'fc1': {'weight_bits': 8, 'weight_scale': tensor([0.0009], device='cuda:0'), 'weight_zero_point': tensor([125.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 13.905810356140137}, 'fc2': {'weight_bits': 8, 'weight_scale': tensor([0.0012], device='cuda:0'), 'weight_zero_point': tensor([118.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 12.378301620483398}, 'relu1': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 7.626255035400391}, 'relu2': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 14.335213661193848}, 'relu3': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 12.815309524536133}, 'relu4': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 11.077027320861816}}
+    calibration_config: {'conv1': {'weight_bits': 8, 'weight_scale': tensor([0.0029], device='cuda:0'), 'weight_zero_point': tensor([98.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': -0.4242129623889923, 'tracked_max_input': 2.821486711502075}, 'conv2': {'weight_bits': 8, 'weight_scale': tensor([0.0017], device='cuda:0'), 'weight_zero_point': tensor([124.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 8.848002433776855}, 'fc1': {'weight_bits': 8, 'weight_scale': tensor([0.0010], device='cuda:0'), 'weight_zero_point': tensor([134.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 14.64758586883545}, 'fc2': {'weight_bits': 8, 'weight_scale': tensor([0.0013], device='cuda:0'), 'weight_zero_point': tensor([121.], device='cuda:0'), 'input_bits': 8, 'tracked_min_input': 0.0, 'tracked_max_input': 15.807988166809082}, 'relu1': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 9.041301727294922}, 'relu2': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 15.143928527832031}, 'relu3': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 16.151935577392578}, 'relu4': {'output_bits': 8, 'tracked_min_output': 0.0, 'tracked_max_output': 11.749024391174316}}
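For intuition on these fields: ``weight_scale`` and ``weight_zero_point`` parameterize the usual affine quantization mapping for weights, while the ``tracked_min``/``tracked_max`` ranges let the backend derive scales for activations. A minimal sketch of the mapping, assuming standard 8-bit affine quantization (which is what these fields conventionally parameterize; the example values come from the 'conv1' entry above):

.. code-block:: python

    import torch

    def affine_quantize(x, scale, zero_point, bits=8):
        # q = clamp(round(x / scale) + zero_point, 0, 2**bits - 1)
        q = torch.round(x / scale) + zero_point
        return torch.clamp(q, 0, 2 ** bits - 1)

    # e.g. quantizing a weight tensor with conv1's exported parameters
    w_q = affine_quantize(torch.randn(8, 1, 3, 3), scale=0.0029, zero_point=98.0)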
@@ -237,8 +237,8 @@ build tensorRT engine to make a real speedup
.. code-block:: none
-    Loss: 0.09857580718994141 Accuracy: 96.96%
-    Inference elapsed_time (whole dataset): 0.044492483139038086s
+    Loss: 0.10061546401977539 Accuracy: 96.83%
+    Inference elapsed_time (whole dataset): 0.04322671890258789s
@@ -249,7 +249,7 @@ Note that NNI also supports post-training quantization directly, please refer to
For complete examples please refer to :githublink:`the code <examples/model_compress/quantization/mixed_precision_speedup_mnist.py>`.
-For more parameters about the class 'TensorRTModelSpeedUp', you can refer to `Model Compression API Reference <https://nni.readthedocs.io/en/stable/Compression/CompressionReference.html#quantization-speedup>`__\.
+For more parameters about the class 'TensorRTModelSpeedUp', you can refer to :doc:`Model Compression API Reference <../reference/compression/quantization_speedup>`.
Mnist test
^^^^^^^^^^
@@ -300,7 +300,7 @@ input tensor: ``torch.randn(128, 3, 32, 32)``
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 0 minutes 59.208 seconds)
+**Total running time of the script:** ( 1 minutes 4.509 seconds)
.. _sphx_glr_download_tutorials_quantization_speedup.py:
...
@@ -5,10 +5,12 @@
Computation times
=================
-**02:04.499** total execution time for **tutorials** files:
+**01:04.509** total execution time for **tutorials** files:
+-----------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorials_hello_nas.py` (``hello_nas.py``) | 02:04.499 | 0.0 MB |
+| :ref:`sphx_glr_tutorials_quantization_speedup.py` (``quantization_speedup.py``) | 01:04.509 | 0.0 MB |
++-----------------------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_tutorials_hello_nas.py` (``hello_nas.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_nasbench_as_dataset.py` (``nasbench_as_dataset.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
@@ -22,5 +24,3 @@ Computation times
+-----------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_quick_start_mnist.py` (``quantization_quick_start_mnist.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorials_quantization_speedup.py` (``quantization_speedup.py``) | 00:00.000 | 0.0 MB |
-+-----------------------------------------------------------------------------------------------------+-----------+--------+
@@ -30,7 +30,7 @@ model to generate inference engine.
Quantization aware training combines NNI quantization algorithm 'QAT' and NNI quantization speedup tool.
-Users should set config to train quantized model using QAT algorithm(please refer to `NNI Quantization Algorithms <https://nni.readthedocs.io/en/stable/Compression/Quantizer.html>`__\ ).
+Users should set config to train quantized model using QAT algorithm(please refer to :doc:`NNI Quantization Algorithms <../compression/quantizer>` ).
After quantization aware training, users can get new config with calibration parameters and model with quantized weight. By passing new config and model to quantization speedup tool, users can get real mixed precision speedup engine to do inference.
@@ -119,7 +119,7 @@ test_trt(engine)
#
# For complete examples please refer to :githublink:`the code <examples/model_compress/quantization/mixed_precision_speedup_mnist.py>`.
#
-# For more parameters about the class 'TensorRTModelSpeedUp', you can refer to `Model Compression API Reference <https://nni.readthedocs.io/en/stable/Compression/CompressionReference.html#quantization-speedup>`__\.
+# For more parameters about the class 'TensorRTModelSpeedUp', you can refer to :doc:`Model Compression API Reference <../reference/compression/quantization_speedup>`.
#
# Mnist test
# ^^^^^^^^^^
...