Commit 51d261e7 authored by J-shang, committed by GitHub

Merge pull request #4668 from microsoft/doc-refactor

parents d63a2ea3 b469e1c1
Model Compression with NNI
==========================
.. toctree::
:hidden:
:maxdepth: 2
Pruning <pruning>
Quantization <quantization>
Config Specification <compression_config_list>
Advanced Usage <advanced_usage>
.. Using rubric to prevent the section heading from being included in the toc
.. rubric:: Overview
Deep neural networks (DNNs) have achieved great success in many tasks, such as computer vision, natural language processing, and speech processing.
However, typical neural networks are both computationally expensive and energy-intensive,
which makes them difficult to deploy on devices with low computation resources or with strict latency requirements.
Therefore, a natural thought is to perform model compression to reduce the model size and accelerate model training/inference without significantly losing performance.
Model compression techniques can be divided into two categories: pruning and quantization.
Pruning methods explore the redundancy in the model weights and try to remove/prune the redundant and uncritical weights.
Quantization compresses models by reducing the number of bits required to represent weights or activations.
We further elaborate on these two methods in the following chapters; the figure below also visualizes the difference between them.
.. image:: ../../img/prune_quant.jpg
:target: ../../img/prune_quant.jpg
:scale: 40%
:alt:
NNI provides an easy-to-use toolkit to help users design and use model pruning and quantization algorithms.
To compress their models, users only need to add several lines to their code (a minimal sketch follows the feature list below).
Several popular model compression algorithms are built into NNI.
Users can further use NNI's auto-tuning power to find the best compressed model, which is detailed in Auto Model Compression.
On the other hand, users can easily customize new compression algorithms using NNI's interface.
There are several core features supported by NNI model compression:
* Support many popular pruning and quantization algorithms.
* Automate model pruning and quantization process with state-of-the-art strategies and NNI's auto tuning power.
* Speed up a compressed model to make it have lower inference latency and also make it smaller.
* Provide friendly and easy-to-use compression utilities for users to dive into the compression process and results.
* Concise interface for users to customize their own compression algorithms.
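As a quick taste of the interface, the sketch below shows roughly what "several lines" look like for pruning (using the built-in ``L1NormPruner``; the config values are only illustrative, see the pruning quickstart tutorial for a complete walkthrough):

.. code-block:: python

   import torch.nn as nn
   from nni.algorithms.compression.v2.pytorch.pruning import L1NormPruner

   # Any plain PyTorch model works; a tiny CNN keeps the example self-contained.
   model = nn.Sequential(
       nn.Conv2d(3, 16, 3, padding=1),
       nn.ReLU(),
       nn.Conv2d(16, 32, 3, padding=1),
   )

   # Mask 50% of the weights of every Conv2d layer.
   config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]

   pruner = L1NormPruner(model, config_list)
   masked_model, masks = pruner.compress()  # weights are zeroed via masks, not yet physically removed

The masks produced here are later consumed by the speedup tool (described below) to actually shrink the network.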
.. rubric:: Compression Pipeline
.. image:: ../../img/compression_flow.jpg
:target: ../../img/compression_flow.jpg
:alt:
The overall compression pipeline in NNI is shown above. For compressing a pretrained model, pruning and quantization can be used alone or in combination.
If users want to apply both, a sequential mode (pruning first, then quantizing the pruned model) is recommended as common practice.
.. note::

   NNI pruners and quantizers are not meant to physically compact the model; they simulate the compression effect, whereas the NNI speedup tool can truly compress the model by changing the network architecture and therefore reduce latency.
   To obtain a truly compact model, users should run :doc:`pruning speedup <../tutorials/pruning_speed_up>` or :doc:`quantization speedup <../tutorials/quantization_speed_up>`.
   The interface and APIs are unified for both PyTorch and TensorFlow. Currently only the PyTorch version is supported; the TensorFlow version will be supported in the future.
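For example, the sequential practice can be sketched as follows (pruning with ``L1NormPruner``, then quantization-aware training with ``QAT_Quantizer``; the config lists and the omitted fine-tuning steps are placeholders, not a complete recipe):

.. code-block:: python

   import torch
   import torch.nn as nn
   from nni.algorithms.compression.v2.pytorch.pruning import L1NormPruner
   from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer

   model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 32, 3, padding=1))

   # Stage 1: pruning -- generate masks for 50% of the Conv2d weights.
   pruner = L1NormPruner(model, [{'sparsity': 0.5, 'op_types': ['Conv2d']}])
   _, masks = pruner.compress()
   pruner._unwrap_model()
   # ... physically shrink the masked model with the speedup tool (see "Model Speedup" below)
   # ... and fine-tune it to recover accuracy ...

   # Stage 2: quantization-aware training on the pruned, fine-tuned model.
   optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
   quant_config = [{'quant_types': ['weight'], 'quant_bits': {'weight': 8}, 'op_types': ['Conv2d']}]
   quantizer = QAT_Quantizer(model, quant_config, optimizer)
   quantizer.compress()
   # ... run the normal training loop, then export with quantizer.export_model(...) ...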
.. rubric:: Supported Pruning Algorithms
Pruning algorithms compress the original network by removing redundant weights or channels of layers, which can reduce model complexity and mitigate the over-fitting issue.
.. list-table::
   :header-rows: 1
   :widths: auto
* - Name
- Brief Introduction of Algorithm
* - :ref:`level-pruner`
- Pruning the specified ratio on each weight based on absolute values of weights
* - :ref:`l1-norm-pruner`
- Pruning output channels with the smallest L1 norm of weights (Pruning Filters for Efficient Convnets) `Reference Paper <https://arxiv.org/abs/1608.08710>`__
* - :ref:`l2-norm-pruner`
- Pruning output channels with the smallest L2 norm of weights
* - :ref:`fpgm-pruner`
- Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration `Reference Paper <https://arxiv.org/abs/1811.00250>`__
* - :ref:`slim-pruner`
- Pruning output channels by pruning scaling factors in BN layers (Learning Efficient Convolutional Networks through Network Slimming) `Reference Paper <https://arxiv.org/abs/1708.06519>`__
* - :ref:`activation-apoz-rank-pruner`
- Pruning output channels based on the metric APoZ (average percentage of zeros) which measures the percentage of zeros in activations of (convolutional) layers. `Reference Paper <https://arxiv.org/abs/1607.03250>`__
* - :ref:`activation-mean-rank-pruner`
- Pruning output channels based on the metric that calculates the smallest mean value of output activations
* - :ref:`taylor-fo-weight-pruner`
- Pruning filters based on the first-order Taylor expansion on weights (Importance Estimation for Neural Network Pruning) `Reference Paper <http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf>`__
* - :ref:`admm-pruner`
- Pruning based on ADMM optimization technique `Reference Paper <https://arxiv.org/abs/1804.03294>`__
* - :ref:`linear-pruner`
- The sparsity ratio increases linearly over the pruning rounds; in each round, a basic pruner is used to prune the model.
* - :ref:`agp-pruner`
- Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) `Reference Paper <https://arxiv.org/abs/1710.01878>`__
* - :ref:`lottery-ticket-pruner`
- The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. `Reference Paper <https://arxiv.org/abs/1803.03635>`__
* - :ref:`simulated-annealing-pruner`
- Automatic pruning with a guided heuristic search method, Simulated Annealing algorithm `Reference Paper <https://arxiv.org/abs/1907.03141>`__
* - :ref:`auto-compress-pruner`
- Automatic pruning by iteratively calling SimulatedAnnealing Pruner and ADMM Pruner `Reference Paper <https://arxiv.org/abs/1907.03141>`__
* - :ref:`amc-pruner`
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices `Reference Paper <https://arxiv.org/abs/1802.03494>`__
* - :ref:`movement-pruner`
- Movement Pruning: Adaptive Sparsity by Fine-Tuning `Reference Paper <https://arxiv.org/abs/2005.07683>`__
.. rubric:: Supported Quantization Algorithms
Quantization algorithms compress the original network by reducing the number of bits required to represent weights or activations, which can reduce the computations and the inference time.
.. list-table::
   :header-rows: 1
   :widths: auto
* - Name
- Brief Introduction of Algorithm
* - :ref:`naive-quantizer`
- Quantize weights to default 8 bits
* - :ref:`qat-quantizer`
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. `Reference Paper <http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf>`__
* - :ref:`dorefa-quantizer`
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. `Reference Paper <https://arxiv.org/abs/1606.06160>`__
* - :ref:`bnn-quantizer`
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. `Reference Paper <https://arxiv.org/abs/1602.02830>`__
* - :ref:`lsq-quantizer`
- Learned step size quantization. `Reference Paper <https://arxiv.org/pdf/1902.08153.pdf>`__
* - :ref:`observer-quantizer`
- Post-training quantization. Collect quantization information during calibration with observers.
.. rubric:: Model Speedup
The final goal of model compression is to reduce inference latency and model size.
However, existing model compression algorithms mainly use simulation to check the performance (e.g., accuracy) of the compressed model:
pruning algorithms use masks, and quantization algorithms still store quantized values in float32.
Given the output masks and quantization bits produced by those algorithms, NNI can really speed up the model.
The following figure shows how NNI prunes and speeds up your models.
.. image:: ../../img/pipeline_compress.jpg
:target: ../../img/pipeline_compress.jpg
:scale: 40%
:alt:
The detailed tutorial of Speed Up Model with Mask can be found :doc:`here <../tutorials/pruning_speed_up>`.
The detailed tutorial of Speed Up Model with Calibration Config can be found :doc:`here <../tutorials/quantization_speed_up>`.
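As an illustration, the mask-based speedup step might look like the following sketch (the masks come from any v2 pruner, and the dummy input shape must match your model):

.. code-block:: python

   import torch
   import torch.nn as nn
   from nni.algorithms.compression.v2.pytorch.pruning import L1NormPruner
   from nni.compression.pytorch.speedup import ModelSpeedup

   model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 32, 3, padding=1))

   pruner = L1NormPruner(model, [{'sparsity': 0.5, 'op_types': ['Conv2d']}])
   _, masks = pruner.compress()
   pruner._unwrap_model()  # detach the mask wrappers before rewriting the graph

   # Replace masked-out channels with a genuinely smaller network.
   ModelSpeedup(model, torch.rand(1, 3, 32, 32), masks).speedup_model()
   print(model)  # the Conv2d layers now have fewer channels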
.. attention::
   NNI's model pruning framework has been upgraded to a more powerful version (named pruning v2 before nni v2.6).
   The old version (`named pruning before nni v2.6 <https://nni.readthedocs.io/en/v2.6/Compression/pruning.html>`_) will be out of maintenance.
   If for some reason you have to use the old pruning, note that v2.6 is the last nni version to support it.
Model Compression
=================
.. toctree::
:hidden:
:maxdepth: 2
Pruning <pruning>
Quantization <quantization>
Config Specification <compression_config_list>
Advanced Usage <advanced_usage>
Deep neural networks (DNNs) have achieved great success in many fields. However, typical neural networks are
computationally and energy intensive, which makes them hard to deploy on devices with limited computation resources or with strict latency requirements.
Users can further leverage NNI's auto-tuning capability to find the best compressed model,
which is described in detail in the auto model compression part.
On the other hand, users can customize new compression algorithms through NNI's interface.
Pruner Reference
================
Basic Pruner
------------
.. _level-pruner:
Level Pruner
^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.LevelPruner
.. _l1-norm-pruner:
L1 Norm Pruner
^^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.L1NormPruner
.. _l2-norm-pruner:
L2 Norm Pruner
^^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.L2NormPruner
.. _fpgm-pruner:
FPGM Pruner
^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.FPGMPruner
.. _slim-pruner:
Slim Pruner
^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.SlimPruner
.. _activation-apoz-rank-pruner:
Activation APoZ Rank Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.ActivationAPoZRankPruner
.. _activation-mean-rank-pruner:
Activation Mean Rank Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.ActivationMeanRankPruner
.. _taylor-fo-weight-pruner:
Taylor FO Weight Pruner
^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.TaylorFOWeightPruner
.. _admm-pruner:
ADMM Pruner
^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.ADMMPruner
Scheduled Pruners
-----------------
.. _linear-pruner:
Linear Pruner
^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.LinearPruner
.. _agp-pruner:
AGP Pruner
^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.AGPPruner
.. _lottery-ticket-pruner:
Lottery Ticket Pruner
^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.LotteryTicketPruner
.. _simulated-annealing-pruner:
Simulated Annealing Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.SimulatedAnnealingPruner
.. _auto-compress-pruner:
Auto Compress Pruner
^^^^^^^^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.AutoCompressPruner
.. _amc-pruner:
AMC Pruner
^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.AMCPruner
Other Pruner
------------
.. _movement-pruner:
Movement Pruner
^^^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.MovementPruner
Model Pruning with NNI
======================
Pruning is a common technique to compress neural network models.
The pruning methods explore the redundancy in the model weights (parameters) and try to remove/prune the redundant and uncritical weights.
The redundant elements are pruned from the model: their values are zeroed out, and we make sure they don't take part in the back-propagation process.
The following concepts can help you understand pruning in NNI.
.. Using rubric to prevent the section heading from being included in the toc
.. rubric:: Pruning Target
The pruning target is where we apply sparsity.
Most pruning methods prune the weights to reduce the model size and accelerate inference.
Other pruning methods also apply sparsity to the inputs, outputs or intermediate states to reduce the inference latency.
NNI supports pruning module weights right now, and will support other pruning targets in the future.
.. rubric:: Basic Pruner
A basic pruner generates the masks for each pruning target (weights) for a given sparsity ratio.
It usually takes a model and a config list as input arguments, then generates the masks for the model.
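For instance, a basic pruner might be driven like the sketch below (``L1NormPruner`` with a two-entry config list; the entries are illustrative, see the config specification document for the full schema):

.. code-block:: python

   import torch.nn as nn
   from nni.algorithms.compression.v2.pytorch.pruning import L1NormPruner

   model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

   # The config list tells the pruner which targets to mask and how sparse they should be.
   config_list = [
       {'sparsity': 0.8, 'op_types': ['Linear']},  # 80% sparsity on Linear weights ...
       {'exclude': True, 'op_names': ['2']},       # ... but leave the final classifier untouched
   ]

   pruner = L1NormPruner(model, config_list)
   _, masks = pruner.compress()
   for name, mask in masks.items():
       print(name, float(mask['weight'].sum() / mask['weight'].numel()))  # remaining-weight ratio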
.. rubric:: Scheduled Pruner
Scheduled pruner decides how to allocate sparsity ratio to each pruning targets, it also handles the pruning speed up and finetuning logic.
From the implementation logic, the scheduled pruner is a combination of pruning scheduler, basic pruner and task generator.
Task generator only cares about the pruning effect that should be achieved in each round, and uses a config list to express how to pruning.
Basic pruner will reset with the model and config list given by task generator then generate the masks.
For a clearer structure vision, please refer to the figure below.
.. image:: ../../img/pruning_process.png
:target: ../../img/pruning_process.png
:scale: 80%
:align: center
:alt:
For more information about the scheduled pruning process, please refer to :doc:`Pruning Scheduler <pruning_scheduler>`.
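Conceptually, the scheduling loop looks something like the toy sketch below (plain Python mimicking the idea only, not NNI's actual implementation): the task generator proposes a config list for each round, a basic pruner is reset with it, and the resulting masks feed the next round.

.. code-block:: python

   def task_generator(round_idx, total_rounds, final_sparsity):
       """Propose the per-round target sparsity; here it simply grows linearly."""
       sparsity = final_sparsity * (round_idx + 1) / total_rounds
       return [{'sparsity': sparsity, 'op_types': ['Conv2d']}]

   def scheduled_pruning(model, basic_pruner_cls, total_rounds=5, final_sparsity=0.5):
       masks = None
       for round_idx in range(total_rounds):
           config_list = task_generator(round_idx, total_rounds, final_sparsity)
           pruner = basic_pruner_cls(model, config_list)  # the basic pruner is reset every round
           model, masks = pruner.compress()               # generate this round's masks
           pruner._unwrap_model()
           # ... speed up and/or fine-tune the masked model here before the next round ...
       return model, masks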
.. rubric:: Granularity
Fine-grained pruning or unstructured pruning refers to pruning each individual weight separately.
Coarse-grained pruning or structured pruning prunes an entire group of weights, such as a convolutional filter.
:ref:`level-pruner` is the only fine-grained pruner in NNI; all other pruners prune the output channels of weights.
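A tiny tensor-level illustration of the difference, assuming a ``Conv2d`` weight of shape ``(out_channels, in_channels, kH, kW)``:

.. code-block:: python

   import torch

   weight = torch.randn(8, 4, 3, 3)  # (out_channels, in_channels, kH, kW)

   # Fine-grained (unstructured): zero individual weights, e.g. the 50% smallest by magnitude.
   threshold = weight.abs().flatten().median()
   fine_mask = (weight.abs() > threshold).float()

   # Coarse-grained (structured): zero whole output channels, e.g. keep the 4 filters with largest L1 norm.
   l1_per_filter = weight.abs().sum(dim=(1, 2, 3))
   channel_mask = torch.zeros(8)
   channel_mask[l1_per_filter.topk(4).indices] = 1.0
   coarse_mask = channel_mask.view(8, 1, 1, 1).expand_as(weight)

   print(fine_mask.mean(), coarse_mask.mean())  # both roughly 0.5, but with very different structure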
.. _dependency-awareode-for-output-channel-pruning:
.. rubric:: Dependency-aware Mode for Output Channel Pruning
Currently, we support the ``dependency aware`` mode in several pruners: :ref:`l1-norm-pruner`, :ref:`l2-norm-pruner`, :ref:`fpgm-pruner`,
:ref:`activation-apoz-rank-pruner`, :ref:`activation-mean-rank-pruner`, :ref:`taylor-fo-weight-pruner`.
In these pruning algorithms, the pruner prunes each layer separately. While pruning a layer,
the algorithm quantifies the importance of each filter based on some specific rule (such as the L1 norm), and prunes the less important output channels.
We use pruning convolutional layers as an example to explain the ``dependency aware`` mode.
As the :doc:`dependency analysis utils <./compression_utils>` show, if the output channels of two convolutional layers (conv1, conv2) are added together,
then these two convolutional layers have a channel dependency with each other (for more details please see :doc:`Compression Utils <./compression_utils>`).
Take the following figure as an example.
.. image:: ../../img/mask_conflict.jpg
:target: ../../img/mask_conflict.jpg
:scale: 80%
:align: center
:alt:
Suppose we prune the first 50% of the output channels (filters) of conv1, and the last 50% of the output channels of conv2.
Although both layers have 50% of their filters pruned, the speedup module still needs to add zeros to align the output channels.
In this case, we cannot harvest the speed benefit from the model pruning.
To better gain the speed benefit of model pruning, we add a dependency-aware mode for the pruners that can prune output channels.
In the dependency-aware mode, the pruner prunes the model not only based on the metric of each output channel, but also on the topology of the whole network architecture.
In the dependency-aware mode (``dependency_aware`` is set to ``True``), the pruner will try to prune the same output channels for the layers that have channel dependencies with each other, as shown in the following figure.
.. image:: ../../img/dependency-aware.jpg
:target: ../../img/dependency-aware.jpg
:scale: 80%
:align: center
:alt:
Take the dependency-aware mode of :ref:`l1-norm-pruner` as an example.
Specifically, for each channel, the pruner calculates the sum of the L1 norms (for example) over all the layers in the dependency set.
Obviously, the number of channels that can actually be pruned in this dependency set in the end is determined by the minimum sparsity among the layers in the set (denoted by ``min_sparsity``).
According to the L1 norm sum of each channel, the pruner prunes the same ``min_sparsity`` fraction of channels for all the layers.
Next, the pruner additionally prunes ``sparsity`` - ``min_sparsity`` channels for each convolutional layer based on its own per-channel L1 norm.
For example, suppose the output channels of ``conv1`` and ``conv2`` are added together and the configured sparsities of ``conv1`` and ``conv2`` are 0.3 and 0.2 respectively.
In this case, the dependency-aware pruner will

* First, prune the same 20% of channels for ``conv1`` and ``conv2`` according to the L1 norm sum of ``conv1`` and ``conv2``.
* Second, additionally prune 10% of the channels for ``conv1`` according to the L1 norm of each channel of ``conv1``.

In addition, for the convolutional layers that have more than one filter group,
the dependency-aware pruner will also try to prune the same number of channels for each filter group.
Overall, this pruner prunes the model according to the L1 norm of each filter and tries to meet the topological constraints (channel dependencies, etc.) to improve the final speed gain after the speedup process.
In the dependency-aware mode, the pruner will provide a better speed gain from the model pruning.
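The selection rule can be sketched in plain PyTorch as follows (an illustration of the idea only, not NNI's implementation), using the ``conv1``/``conv2`` example with sparsities 0.3 and 0.2:

.. code-block:: python

   import torch

   out_channels = 10
   w1 = torch.randn(out_channels, 4, 3, 3)  # conv1 weight
   w2 = torch.randn(out_channels, 4, 3, 3)  # conv2 weight
   sparsity1, sparsity2 = 0.3, 0.2
   min_sparsity = min(sparsity1, sparsity2)

   # Step 1: prune the same `min_sparsity` fraction of channels in both layers,
   # ranked by the summed L1 norm across the dependency set.
   l1_sum = w1.abs().sum(dim=(1, 2, 3)) + w2.abs().sum(dim=(1, 2, 3))
   n_common = int(out_channels * min_sparsity)
   common_pruned = l1_sum.argsort()[:n_common].tolist()  # pruned in both conv1 and conv2

   # Step 2: conv1 additionally prunes (sparsity1 - min_sparsity) of its own channels,
   # ranked by its own per-channel L1 norm among the remaining channels.
   remaining = [c for c in range(out_channels) if c not in common_pruned]
   n_extra = int(out_channels * (sparsity1 - min_sparsity))
   own_l1 = w1.abs().sum(dim=(1, 2, 3))
   extra_pruned = sorted(remaining, key=lambda c: own_l1[c].item())[:n_extra]

   print('pruned in both:', common_pruned, ' pruned only in conv1:', extra_pruned)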
.. toctree::
:hidden:
:maxdepth: 2
Quickstart <../tutorials/cp_pruning_quick_start_mnist>
Pruner <pruner>
Speed Up <../tutorials/cp_pruning_speed_up>
Model Quantization with NNI
===========================
Quantization refers to compressing models by reducing the number of bits required to represent weights or activations,
which can reduce the computations and the inference time. In the context of deep neural networks, the major numerical
A quantizer is the implementation of a quantization algorithm in NNI. NNI provides multiple quantizers, as listed below. You can also
create your own quantizer using the NNI model compression interface.
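A minimal sketch of driving a built-in quantizer (here ``QAT_Quantizer``; the config entries are illustrative, see the quantizer reference and the quickstart for details):

.. code-block:: python

   import torch
   import torch.nn as nn
   from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer

   model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 32, 3, padding=1))
   optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

   # Quantize weights and output activations of all Conv2d layers to 8 bits.
   config_list = [{
       'quant_types': ['weight', 'output'],
       'quant_bits': {'weight': 8, 'output': 8},
       'op_types': ['Conv2d'],
   }]

   quantizer = QAT_Quantizer(model, config_list, optimizer)
   quantizer.compress()
   # ... run the usual training loop (quantization-aware training),
   # then export the calibration config with quantizer.export_model(...) ...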
.. toctree::
:hidden:
:maxdepth: 2
Quickstart <../tutorials/cp_quantization_quick_start_mnist>
Quantizer <quantizer>
Speed Up <../tutorials/cp_quantization_speed_up>
Quantizer Reference
===================
.. _naive-quantizer:
Naive Quantizer
^^^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.pytorch.quantization.NaiveQuantizer
.. _qat-quantizer:
QAT Quantizer
^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.pytorch.quantization.QAT_Quantizer
.. _dorefa-quantizer:
DoReFa Quantizer
^^^^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.pytorch.quantization.DoReFaQuantizer
.. _bnn-quantizer:
BNN Quantizer
^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.pytorch.quantization.BNNQuantizer
.. _lsq-quantizer:
LSQ Quantizer
^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.pytorch.quantization.LsqQuantizer
.. _observer-quantizer:
Observer Quantizer
^^^^^^^^^^^^^^^^^^
.. autoclass:: nni.algorithms.compression.pytorch.quantization.ObserverQuantizer
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import re
import subprocess
import sys
sys.path.insert(0, os.path.abspath('../..'))
extensions = [
'sphinx_gallery.gen_gallery',
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.intersphinx',
'sphinx.ext.mathjax',
'sphinxarg4nni.ext',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'sphinxcontrib.bibtex',
# 'nbsphinx', # nbsphinx has conflicts with sphinx-gallery.
'sphinx.ext.extlinks',
'IPython.sphinxext.ipython_console_highlighting',
# Custom extensions in extension/ folder.
'tutorial_links', # this has to be after sphinx-gallery
'inplace_translation',
'cardlinkitem',
'codesnippetcard',
'patch_docutils',
'patch_autodoc',
]
# Autosummary related settings
autosummary_imported_members = True
autosummary_ignore_module_all = False
# Auto-generate stub files before building docs
autosummary_generate = True
# Add mock modules
autodoc_mock_imports = [
'apex', 'nni_node', 'tensorrt', 'pycuda', 'nn_meter', 'azureml',
'ConfigSpace', 'ConfigSpaceNNI', 'smac', 'statsmodels', 'pybnn',
]
# Some of our modules cannot generate summary
autosummary_mock_imports = [
'nni.retiarii.codegen.tensorflow',
'nni.nas.benchmarks.nasbench101.db_gen',
'nni.tools.jupyter_extension.management',
] + autodoc_mock_imports
autodoc_typehints = 'description'
autodoc_typehints_description_target = 'documented'
autodoc_inherit_docstrings = False
# Bibliography files
bibtex_bibfiles = ['refs.bib']
# Add a heading to bibliography
bibtex_footbibliography_header = '.. rubric:: Bibliography'
# Set bibliography style
bibtex_default_style = 'plain'
# Sphinx gallery examples
sphinx_gallery_conf = {
'default_thumb_file': os.path.join(os.path.dirname(__file__), '../img/thumbnails/nni_icon_blue.png'),
}
# Some tutorials might need to appear more than once in toc.
# In this list, we make source/target tutorial pairs.
# Each "source" tutorial rst will be copied to "target" tutorials.
# The anchors will be replaced to avoid duplicate labels.
# Target should start with ``cp_`` to be properly ignored in git.
tutorials_copy_list = [
# The global quickstart
('tutorials/hpo_quickstart_pytorch/main.rst', 'tutorials/hpo_quickstart_pytorch/cp_global_quickstart_hpo.rst'),
('tutorials/hello_nas.rst', 'tutorials/cp_global_quickstart_nas.rst'),
('tutorials/pruning_quick_start_mnist.rst', 'tutorials/cp_global_quickstart_compression.rst'),
# Others in full-scale materials
('tutorials/hello_nas.rst', 'tutorials/cp_hello_nas_quickstart.rst'),
('tutorials/pruning_quick_start_mnist.rst', 'tutorials/cp_pruning_quick_start_mnist.rst'),
('tutorials/pruning_speed_up.rst', 'tutorials/cp_pruning_speed_up.rst'),
('tutorials/quantization_quick_start_mnist.rst', 'tutorials/cp_quantization_quick_start_mnist.rst'),
('tutorials/quantization_speed_up.rst', 'tutorials/cp_quantization_speed_up.rst'),
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['../templates']
######################
Examples
######################
.. toctree::
:maxdepth: 2
MNIST<./TrialExample/MnistExamples>
Cifar10<./TrialExample/Cifar10Examples>
Scikit-learn<./TrialExample/SklearnExamples>
GBDT<./TrialExample/GbdtExample>
Pix2pix<./TrialExample/Pix2pixExample>
######################
Examples
######################
.. toctree::
:maxdepth: 2
MNIST<./TrialExample/MnistExamples>
Cifar10<./TrialExample/Cifar10Examples>
Scikit-learn<./TrialExample/SklearnExamples>
GBDT<./TrialExample/GbdtExample>
Pix2pix<./TrialExample/Pix2pixExample>
AdaptDL Training Service
========================
Now NNI supports running experiments on `AdaptDL <https://github.com/petuum/adaptdl>`__, which is a resource-adaptive deep learning training and scheduling framework. With the AdaptDL training service, your trial program will run as an AdaptDL job in a Kubernetes cluster.
AdaptDL aims to make distributed deep learning easy and efficient in dynamic-resource environments such as shared clusters and the cloud.
Prerequisite
------------
Before starting to use the NNI AdaptDL training service, you should have a Kubernetes cluster, either on-premises or `Azure Kubernetes Service (AKS) <https://azure.microsoft.com/en-us/services/kubernetes-service/>`__, and an Ubuntu machine on which `kubeconfig <https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/>`__ is set up to connect to your Kubernetes cluster.
#. A **Kubernetes** cluster using Kubernetes 1.14 or later with storage. Follow this guideline to set up Kubernetes `on Azure <https://azure.microsoft.com/en-us/services/kubernetes-service/>`__\ , or `on-premise <https://kubernetes.io/docs/setup/>`__ with `cephfs <https://kubernetes.io/docs/concepts/storage/storage-classes/#ceph-rbd>`__\ , or `microk8s with storage add-on enabled <https://microk8s.io/docs/addons>`__.
#. Helm install **AdaptDL Scheduler** to your Kubernetes cluster. Follow this `guideline <https://adaptdl.readthedocs.io/en/latest/installation/install-adaptdl.html>`__ to setup AdaptDL scheduler.
#. Prepare a **kubeconfig** file, which will be used by NNI to interact with your Kubernetes API server. By default, NNI manager will use ``$(HOME)/.kube/config`` as kubeconfig file's path. You can also specify other kubeconfig files by setting the **KUBECONFIG** environment variable. Refer this `guideline <https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig>`__ to learn more about kubeconfig.
#. If your NNI trial job needs GPU resource, you should follow this `guideline <https://github.com/NVIDIA/k8s-device-plugin>`__ to configure **Nvidia device plugin for Kubernetes**.
#. (Optional) Prepare a **NFS server** and export a general purpose mount as external storage.
#. Install **NNI**.
Verify the Prerequisites
^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash

   nnictl --version
   # Expected: <version_number>

.. code-block:: bash

   kubectl version
   # Expected that the kubectl client version matches the server version.

.. code-block:: bash

   kubectl api-versions | grep adaptdl
   # Expected: adaptdl.petuum.com/v1
Usage
-----
We have a CIFAR10 example that fully leverages the AdaptDL scheduler under :githublink:`examples/trials/cifar10_pytorch` folder. (:githublink:`main_adl.py <examples/trials/cifar10_pytorch/main_adl.py>` and :githublink:`config_adl.yaml <examples/trials/cifar10_pytorch/config_adl.yaml>`)
Here is a template configuration specification to use AdaptDL as a training service.
.. code-block:: yaml
authorName: default
experimentName: minimal_adl
trainingServicePlatform: adl
nniManagerIp: 10.1.10.11
logCollection: http
tuner:
builtinTunerName: GridSearch
searchSpacePath: search_space.json
trialConcurrency: 2
maxTrialNum: 2
trial:
adaptive: false # optional.
image: <image_tag>
imagePullSecrets: # optional
- name: stagingsecret
codeDir: .
command: python main.py
gpuNum: 1
cpuNum: 1 # optional
memorySize: 8Gi # optional
nfs: # optional
server: 10.20.41.55
path: /
containerMountPath: /nfs
checkpoint: # optional
storageClass: dfs
storageSize: 1Gi
.. warning::
This configuration is written following the specification of `legacy experiment configuration <https://nni.readthedocs.io/en/v2.6/Tutorial/ExperimentConfig.html>`__. It is still supported, and will be updated to the latest version in future release.
The following explains the configuration fields of AdaptDL training service.
* **trainingServicePlatform**\ : Choose ``adl`` to use the Kubernetes cluster with AdaptDL scheduler.
* **nniManagerIp**\ : *Required* to get the correct info and metrics back from the cluster, for ``adl`` training service.
* **storageClass**\ : check `Kubernetes storage documentation <https://kubernetes.io/docs/concepts/storage/storage-classes/>`__ for how to use the appropriate ``storageClass``.
* **storageSize**\ : this value should be large enough to fit your model's checkpoints, or it could cause "disk quota exceeded" error.
More Features
-------------
NFS Storage
^^^^^^^^^^^
Use cases:
* If your training trials depend on a large dataset, you may want to download it onto the NFS first,
and mount it so that it can be shared across multiple trials.
* The storage for containers are ephemeral and the trial containers will be deleted after a trial's lifecycle is over.
In short, there is no limit on how a trial reads from or writes to the NFS storage, so you may use it flexibly as per your needs.
Monitor via Log Stream
^^^^^^^^^^^^^^^^^^^^^^
Follow the log streaming of a certain trial:
according to the following approach.
Monitor via TensorBoard
^^^^^^^^^^^^^^^^^^^^^^^
In the context of NNI, an experiment has multiple trials.
For easy comparison across trials for a model tuning process,
AML Training Service
====================
To run your trials on `AzureML <https://azure.microsoft.com/en-us/services/machine-learning/>`__, you can use the AML training service. The AML training service can programmatically submit runs to the AzureML platform and collect their metrics.
Prerequisite
------------
1. Create an Azure account/subscription using this `link <https://azure.microsoft.com/en-us/free/services/machine-learning/>`__. If you already have an Azure account/subscription, skip this step.
2. Install the Azure CLI on your machine, follow the install guide `here <https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest>`__.
3. Authenticate to your Azure subscription from the CLI. To authenticate interactively, open a command line or terminal and use the following command:
.. code-block:: bash
az login
4. Log into your Azure account with a web browser and create a Machine Learning resource. You will need to choose a resource group and specify a workspace name. Then download ``config.json``, which will be used later.
.. image:: ../../img/aml_workspace.png
5. Create an AML cluster as the compute target.
.. image:: ../../img/aml_cluster.png
6. Open a command line and install AML package environment.
.. code-block:: bash
python3 -m pip install azureml
python3 -m pip install azureml-sdk
Usage
-----
We show an example configuration here with YAML (Python configuration should be similar).
.. code-block:: yaml
trialConcurrency: 1
maxTrialNumber: 10
...
trainingService:
platform: aml
dockerImage: msranni/nni
subscriptionId: ${your subscription ID}
resourceGroup: ${your resource group}
workspaceName: ${your workspace name}
computeTarget: ${your compute target}
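For reference, a rough Python equivalent might look like the sketch below (field names follow the snake_case convention of ``nni.experiment``; fill in your own subscription, resource group, workspace and compute target):

.. code-block:: python

   from nni.experiment import Experiment

   experiment = Experiment('aml')
   experiment.config.trial_command = 'python3 model.py'
   experiment.config.trial_code_directory = '.'
   experiment.config.trial_concurrency = 1
   experiment.config.max_trial_number = 10

   experiment.config.training_service.docker_image = 'msranni/nni'
   experiment.config.training_service.subscription_id = '<your subscription ID>'
   experiment.config.training_service.resource_group = '<your resource group>'
   experiment.config.training_service.workspace_name = '<your workspace name>'
   experiment.config.training_service.compute_target = '<your compute target>'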
Configuration References
------------------------
Compared with :doc:`local` and :doc:`remote`, the AML training service supports the following additional configurations.
.. list-table::
:header-rows: 1
:widths: auto
* - Field name
- Description
* - dockerImage
- Required field. The docker image name used in job. If you don't want to build your own, NNI has provided a docker image `msranni/nni <https://hub.docker.com/r/msranni/nni>`__, which is up-to-date with every NNI release.
* - subscriptionId
- Required field. The subscription id of your account, can be found in ``config.json`` described above.
* - resourceGroup
- Required field. The resource group of your account, can be found in ``config.json`` described above.
* - workspaceName
- Required field. The workspace name of your account, can be found in ``config.json`` described above.
* - computeTarget
- Required field. The compute cluster name you want to use in your AML workspace. See `reference <https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target>`__ and Step 5 above.
* - maxTrialNumberPerGpu
- Optional field. Default 1. Used to specify the maximum number of concurrent trials on a GPU device.
* - useActiveGpu
- Optional field. Default false. Used to specify whether to use a GPU if there is another process. By default, NNI will use the GPU only if there is no other active process in the GPU. See :doc:`local` for details.
Monitor your trial on the cloud by using AML studio
---------------------------------------------------
To see your trial job's detailed status on the cloud, visit the studio of the workspace you created in the steps above. Once the job completes, go to the **Outputs + logs** tab. There you can see a ``70_driver_log.txt`` file; this file contains the standard output of a run and can be useful when you're debugging remote runs in the cloud. Learn more about AML from `here <https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-hello-world>`__.
Customize a Training Service
============================
Overview
--------
Experiment Management
=====================
An experiment can be created with the command line tool ``nnictl`` or the Python API. NNI provides both the command line tool ``nnictl`` and the web portal to manage experiments, e.g., creating, stopping, resuming, deleting, ranking, and comparing them.
Management with ``nnictl``
--------------------------
The experiment management ability of ``nnictl`` is almost equivalent to that of :doc:`./webui`. Users can refer to :doc:`../reference/nnictl` for detailed usage. It is highly recommended when visualization is not well supported in your environment (e.g., no GUI on your machine).
Management with web portal
--------------------------
Experiment management on the web portal gives a quick overview of all the experiments on the user's machine. Users can easily switch to one experiment from this page. Users can refer to the :ref:`exp-manage-webportal` page for details. Experiment management on the web portal is still under intensive development to bring more user-friendly features.
FrameworkController Training Service
====================================
NNI supports running experiments using `FrameworkController <https://github.com/Microsoft/frameworkcontroller>`__,
called frameworkcontroller mode.
FrameworkController is built to orchestrate all kinds of applications on Kubernetes,
so you don't need to install Kubeflow or framework-specific operators like tf-operator or pytorch-operator.
You can now use FrameworkController as the training service to run NNI experiments.
Prerequisite for on-premises Kubernetes Service
-----------------------------------------------
1. A **Kubernetes** cluster using Kubernetes 1.8 or later.
Follow this `guideline <https://kubernetes.io/docs/setup/>`__ to set up Kubernetes.
2. Prepare a **kubeconfig** file, which will be used by NNI to interact with your Kubernetes API server.
By default, NNI manager will use ``~/.kube/config`` as kubeconfig file's path.
You can also specify other kubeconfig files by setting the **KUBECONFIG** environment variable.
Refer this `guideline <https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig>`__
to learn more about kubeconfig.
3. If your NNI trial job needs GPU resource, you should follow this `guideline <https://github.com/NVIDIA/k8s-device-plugin>`__
to configure **Nvidia device plugin for Kubernetes**.
4. Prepare an **NFS server** and export a general purpose mount
(we recommend exporting your NFS server path with the ``root_squash`` option,
otherwise permission issues may arise when NNI copies files to NFS;
refer to this `page <https://linux.die.net/man/5/exports>`__ to learn what the root_squash option is),
or **Azure File Storage**.
5. Install **NFS client** on the machine where you install NNI and run nnictl to create experiment.
Run this command to install NFSv4 client:
.. code-block:: bash
apt install nfs-common
6. Install **NNI**:
.. code-block:: bash
python -m pip install nni
Prerequisite for Azure Kubernetes Service
-----------------------------------------
1. NNI supports FrameworkController based on Azure Kubernetes Service;
follow the `guideline <https://azure.microsoft.com/en-us/services/kubernetes-service/>`__ to set up Azure Kubernetes Service.
2. Install `Azure CLI <https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest>`__ and **kubectl**.
Use ``az login`` to set azure account, and connect kubectl client to AKS,
refer this `guideline <https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough#connect-to-the-cluster>`__.
3. Follow the `guideline <https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal>`__
to create an Azure File storage account.
If you use Azure Kubernetes Service, NNI needs Azure Storage Service to store code files and the output files.
4. To access Azure storage service, NNI needs the access key of the storage account,
and NNI uses `Azure Key Vault <https://azure.microsoft.com/en-us/services/key-vault/>`__ Service to protect your private key.
Set up Azure Key Vault Service and add a secret to Key Vault to store the access key of the Azure storage account.
Follow this `guideline <https://docs.microsoft.com/en-us/azure/key-vault/quick-create-cli>`__ to store the access key.
Setup FrameworkController
-------------------------
Follow the `guideline <https://github.com/Microsoft/frameworkcontroller/tree/master/example/run>`__
to set up FrameworkController in the Kubernetes cluster; NNI supports FrameworkController in the stateful set mode.
If your cluster enforces authorization, you need to create a service account with granted permissions for FrameworkController,
and then pass the name of the FrameworkController service account to the NNI experiment config
(`reference <https://github.com/Microsoft/frameworkcontroller/tree/master/example/run#run-by-kubernetes-statefulset>`__).
Design
------
Please refer to the design of the `Kubeflow training service <KubeflowMode.rst>`__;
the FrameworkController training service pipeline is similar.
Example
-------
The FrameworkController config format is:
.. code-block:: python
from nni.experiment import (
Experiment,
FrameworkAttemptCompletionPolicy,
FrameworkControllerRoleConfig,
K8sNfsConfig,
)
experiment = Experiment('frameworkcontroller')
experiment.config.trial_code_directory = '.'
experiment.config.search_space = search_space
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
experiment.config.training_service.storage = K8sNfsConfig()
experiment.config.training_service.storage.server = '10.20.30.40'
experiment.config.training_service.storage.path = '/mnt/nfs/nni'
experiment.config.training_service.task_roles = [FrameworkControllerRoleConfig()]
experiment.config.training_service.task_roles[0].name = 'worker'
experiment.config.training_service.task_roles[0].task_number = 1
experiment.config.training_service.task_roles[0].command = 'python3 model.py'
experiment.config.training_service.task_roles[0].gpuNumber = 1
experiment.config.training_service.task_roles[0].cpuNumber = 1
experiment.config.training_service.task_roles[0].memorySize = '4g'
experiment.config.training_service.task_roles[0].framework_attempt_completion_policy = \
FrameworkAttemptCompletionPolicy(min_failed_task_count = 1, min_succeed_task_count = 1)
If you use Azure Kubernetes Service, you should set storage config as follows:
.. code-block:: python
experiment.config.training_service.storage = K8sAzureStorageConfig()
experiment.config.training_service.storage.azure_account = 'your_storage_account_name'
experiment.config.training_service.storage.azure_share = 'your_azure_share_name'
experiment.config.training_service.storage.key_vault_name = 'your_vault_name'
experiment.config.training_service.storage.key_vault_key = 'your_secret_name'
If you set `ServiceAccount <https://github.com/microsoft/frameworkcontroller/tree/master/example/run#prerequisite>`__ in your k8s,
please set ``serviceAccountName`` in your config:
.. code-block:: python
experiment.config.training_service.service_account_name = 'frameworkcontroller'
The trial's config format for NNI frameworkcontroller mode is a simplified version of FrameworkController's official config;
you can refer to the `TensorFlow example of FrameworkController
<https://github.com/microsoft/frameworkcontroller/blob/master/example/framework/scenario/tensorflow/ps/cpu/tensorflowdistributedtrainingwithcpu.yaml>`__
for a deeper understanding.
Once it's ready, run:
.. code-block:: python
experiment.run(8080)
Notice: in frameworkcontroller mode,
the NNI manager will start a REST server and listen on a port which is your NNI web portal's port plus 1.
For example, if your web portal port is ``8080``, the REST server will listen on ``8081``
to receive metrics from trial jobs running in Kubernetes.
So you should enable TCP port ``8081`` in your firewall rules to allow incoming traffic.
Hybrid Training Service
=======================
The hybrid training service aggregates different types of computation resources into a virtually unified resource pool in which trial jobs are dispatched. It collects all of the user's available computation resources to jointly work on an AutoML task, and it is flexible enough to switch among different types of computation resources. For example, NNI could submit trial jobs to multiple remote machines and AML simultaneously.
Prerequisite
------------
NNI supports :doc:`./local`, :doc:`./remote`, :doc:`./openpai`, :doc:`./aml`, :doc:`./kubeflow`, and :doc:`./frameworkcontroller` for the hybrid training service. Before starting an experiment using the hybrid training service, users should first set up their chosen (sub) training services (e.g., remote training service) according to each training service's own document page.
Usage
-----
Unlike other training services (e.g., ``platform: remote`` in the remote training service), there is no dedicated keyword for the hybrid training service; users can simply list the configurations of their chosen training services under the ``trainingService`` field. Below is an example of a hybrid training service containing the remote and local training services in an experiment configuration yaml.
.. code-block:: yaml
# the experiment config yaml file
...
trainingService:
- platform: remote
machineList:
- host: 127.0.0.1 # your machine's IP address
user: bob
password: bob
- platform: local
...
A complete example configuration file can be found in :githublink:`examples/trials/mnist-pytorch/config_hybrid.yml`.
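A rough Python sketch of the same hybrid setup is shown below (class names from ``nni.experiment``; the exact constructor behaviour and field names should be double-checked against the experiment configuration reference):

.. code-block:: python

   from nni.experiment import Experiment, RemoteMachineConfig

   # 'remote' + 'local' together form a hybrid training service.
   experiment = Experiment(['remote', 'local'])

   machine = RemoteMachineConfig()
   machine.host = '127.0.0.1'  # your machine's IP address
   machine.user = 'bob'
   machine.password = 'bob'
   experiment.config.training_service[0].machine_list = [machine]

   # ... configure search space, tuner and trial command as usual, then experiment.run(8080) ...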
Kubeflow Training Service
=========================
Now NNI supports running experiments on `Kubeflow <https://github.com/kubeflow/kubeflow>`__, called kubeflow mode.
Before starting to use NNI kubeflow mode, you should have a Kubernetes cluster,
either on-premises or `Azure Kubernetes Service (AKS) <https://azure.microsoft.com/en-us/services/kubernetes-service/>`__,
and an Ubuntu machine on which `kubeconfig <https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/>`__
is set up to connect to your Kubernetes cluster.
If you are not familiar with Kubernetes, `here <https://kubernetes.io/docs/tutorials/kubernetes-basics/>`__ is a good start.
In kubeflow mode, your trial program will run as a Kubeflow job in the Kubernetes cluster.
Prerequisite for on-premises Kubernetes Service
-----------------------------------------------
1. A **Kubernetes** cluster using Kubernetes 1.8 or later.
Follow this `guideline <https://kubernetes.io/docs/setup/>`__ to set up Kubernetes.
2. Download, set up, and deploy **Kubeflow** to your Kubernetes cluster.
Follow this `guideline <https://www.kubeflow.org/docs/started/getting-started/>`__ to setup Kubeflow.
3. Prepare a **kubeconfig** file, which will be used by NNI to interact with your Kubernetes API server.
By default, NNI manager will use ``~/.kube/config`` as kubeconfig file's path.
You can also specify other kubeconfig files by setting the **KUBECONFIG** environment variable.
Refer this `guideline <https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig>`__
to learn more about kubeconfig.
4. If your NNI trial job needs GPU resource, you should follow this `guideline <https://github.com/NVIDIA/k8s-device-plugin>`__
to configure **Nvidia device plugin for Kubernetes**.
5. Prepare an **NFS server** and export a general purpose mount
(we recommend exporting your NFS server path with the ``root_squash`` option,
otherwise permission issues may arise when NNI copies files to NFS;
refer to this `page <https://linux.die.net/man/5/exports>`__ to learn what the root_squash option is),
or **Azure File Storage**.
6. Install **NFS client** on the machine where you install NNI and run nnictl to create experiment.
Run this command to install NFSv4 client:
.. code-block:: bash
apt install nfs-common
7. Install **NNI**:
.. code-block:: bash
python -m pip install nni
Prerequisite for Azure Kubernetes Service
-----------------------------------------
1. NNI supports Kubeflow based on Azure Kubernetes Service;
follow the `guideline <https://azure.microsoft.com/en-us/services/kubernetes-service/>`__ to set up Azure Kubernetes Service.
2. Install `Azure CLI <https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest>`__ and **kubectl**.
Use ``az login`` to set azure account, and connect kubectl client to AKS,
refer this `guideline <https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough#connect-to-the-cluster>`__.
3. Deploy Kubeflow on Azure Kubernetes Service, follow the `guideline <https://www.kubeflow.org/docs/started/getting-started/>`__.
4. Follow the `guideline <https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal>`__
to create an Azure File storage account.
If you use Azure Kubernetes Service, NNI needs Azure Storage Service to store code files and the output files.
5. To access Azure storage service, NNI needs the access key of the storage account,
and NNI uses `Azure Key Vault <https://azure.microsoft.com/en-us/services/key-vault/>`__ Service to protect your private key.
Set up Azure Key Vault Service and add a secret to Key Vault to store the access key of the Azure storage account.
Follow this `guideline <https://docs.microsoft.com/en-us/azure/key-vault/quick-create-cli>`__ to store the access key.
Design
------
.. image:: ../../img/kubeflow_training_design.png
:target: ../../img/kubeflow_training_design.png
:alt:
Kubeflow training service instantiates a Kubernetes rest client to interact with your K8s cluster's API server.
For each trial, we will upload all the files in your local ``trial_code_directory``
together with NNI-generated files like parameter.cfg into a storage volume.
Right now we support two kinds of storage volumes:
`NFS <https://en.wikipedia.org/wiki/Network_File_System>`__
and `Azure File Storage <https://azure.microsoft.com/en-us/services/storage/files/>`__;
you should configure the storage volume in the experiment config.
After files are prepared, Kubeflow training service will call K8S rest API to create Kubeflow jobs
(`tf-operator <https://github.com/kubeflow/tf-operator>`__ job
or `pytorch-operator <https://github.com/kubeflow/pytorch-operator>`__ job)
in K8S, and mount your storage volume into the job's pod.
Output files of Kubeflow job, like stdout, stderr, trial.log or model files, will also be copied back to the storage volumn.
NNI will show the storage volumn's URL for each trial in web portal, to allow user browse the log files and job's output files.
Supported operator
------------------
NNI only supports Kubeflow's tf-operator and pytorch-operator; other operators are not tested.
Users can set the operator type in the experiment config.
The setting of tf-operator:
.. code-block:: yaml
config.training_service.operator = 'tf-operator'
The setting of pytorch-operator:
.. code-block:: yaml
config.training_service.operator = 'pytorch-operator'
Users who choose tf-operator can set ``ps`` and ``worker`` roles in the trial config, as sketched below.
Users who choose pytorch-operator can set ``master`` and ``worker`` roles instead.
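Below is a minimal sketch of the role settings for tf-operator. It assumes the same ``KubeflowRoleConfig``
fields used in the full pytorch-operator example later on this page; the commands and replica counts are
placeholders only.

.. code-block:: python

   from nni.experiment import Experiment, KubeflowRoleConfig

   experiment = Experiment('kubeflow')
   experiment.config.training_service.operator = 'tf-operator'

   # Worker role: replicas that run the trial command.
   experiment.config.training_service.worker = KubeflowRoleConfig()
   experiment.config.training_service.worker.replicas = 2
   experiment.config.training_service.worker.command = 'python3 model.py'
   experiment.config.training_service.worker.code_directory = '.'

   # Parameter-server role used by tf-operator.
   experiment.config.training_service.ps = KubeflowRoleConfig()
   experiment.config.training_service.ps.replicas = 1
   experiment.config.training_service.ps.command = 'python3 model.py'
   experiment.config.training_service.ps.code_directory = '.'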
Supported storage type
----------------------
NNI supports NFS and Azure Storage for storing code and output files;
users can set the storage type in the config file along with the corresponding settings.
The setting for NFS storage is as follows:
.. code-block:: python
config.training_service.storage = K8sNfsConfig(
server = '10.20.30.40', # your NFS server IP
path = '/mnt/nfs/nni' # your NFS server export path
)
If you use Azure storage, you should set ``storage`` in your config as follows:
.. code-block:: python

   config.training_service.storage = K8sAzureStorageConfig(
       azure_account='your_azure_account_name',  # your storage account name
       azure_share='your_azure_share_name',      # your file share name
       key_vault_name='your_vault_name',         # your Azure Key Vault name
       key_vault_key='your_secret_name'          # the secret that stores the access key
   )
Run an experiment
-----------------
Use :doc:`PyTorch quickstart </tutorials/hpo_quickstart_pytorch/main>` as an example.
It is a PyTorch job and uses Kubeflow's pytorch-operator.
The experiment config looks like this:
.. code-block:: python

   from nni.experiment import Experiment, K8sNfsConfig, KubeflowRoleConfig

   experiment = Experiment('kubeflow')
   experiment.config.search_space = search_space
   experiment.config.tuner.name = 'TPE'
   experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
   experiment.config.max_trial_number = 10
   experiment.config.trial_concurrency = 2

   experiment.config.training_service.operator = 'pytorch-operator'
   experiment.config.training_service.api_version = 'v1alpha2'

   experiment.config.training_service.storage = K8sNfsConfig()
   experiment.config.training_service.storage.server = '10.20.30.40'  # your NFS server IP
   experiment.config.training_service.storage.path = '/mnt/nfs/nni'   # your NFS server export path

   experiment.config.training_service.worker = KubeflowRoleConfig()
   experiment.config.training_service.worker.replicas = 2
   experiment.config.training_service.worker.command = 'python3 model.py'
   experiment.config.training_service.worker.gpu_number = 1
   experiment.config.training_service.worker.cpu_number = 1
   experiment.config.training_service.worker.memory_size = '4g'
   experiment.config.training_service.worker.code_directory = '.'
   experiment.config.training_service.worker.docker_image = 'msranni/nni:latest'  # default

   experiment.config.training_service.master = KubeflowRoleConfig()
   experiment.config.training_service.master.replicas = 1
   experiment.config.training_service.master.command = 'python3 model.py'
   experiment.config.training_service.master.gpu_number = 0
   experiment.config.training_service.master.cpu_number = 1
   experiment.config.training_service.master.memory_size = '4g'
   experiment.config.training_service.master.code_directory = '.'
Once it's ready, run:
.. code-block:: python
experiment.run(8080)
NNI will create a Kubeflow PyTorchJob for each trial,
and the job name format is something like ``nni_exp_{experiment_id}_trial_{trial_id}``.
You can see the Kubeflow jobs created by NNI in your Kubernetes dashboard.
.. note::
   In Kubeflow mode, the NNI manager starts a REST server that listens on a port equal to your NNI web portal's port plus 1.
   For example, if your web portal port is ``8080``, the REST server listens on ``8081`` to receive metrics from trial jobs running in Kubernetes.
   You should therefore open TCP port ``8081`` in your firewall rules to allow incoming traffic.
Once a trial job is completed, you can go to the NNI web portal's overview page (like ``http://localhost:8080/oview``)
to check the trials' information.
Local Training Service
======================
With local training service, the whole experiment (e.g., tuning algorithms, trials) runs on a single machine, i.e., the user's dev machine. The generated trials run on this machine following the ``trialConcurrency`` setting in the configuration YAML file. If trials use GPUs, local training service allocates the required number of GPUs to each trial, like a resource scheduler.
Prerequisite
------------
We recommend going through the quick start first, as this page only explains the configuration of local training service, which is one part of the experiment configuration YAML file.
Usage
-----
.. code-block:: yaml
# the experiment config yaml file
...
trainingService:
platform: local
useActiveGpu: false # optional
...
There are other supported fields for local training service, such as ``maxTrialNumberPerGpu`` and ``gpuIndices``, for running multiple trials concurrently on one GPU or for running trials on a subset of the GPUs on your machine. Please refer to :ref:`reference-local-config-label` in the reference for detailed usage.
.. note::
   Users should set **useActiveGpu** to ``true`` if the local machine has GPUs and your trials use GPUs, but the generated trials keep waiting. This is usually the case when you are using a graphical OS like Windows 10 or Ubuntu desktop.
Now we explain how local training service works with different configurations of ``trialGpuNumber`` and ``trialConcurrency``. Suppose the local machine has 4 GPUs. With ``trialGpuNumber: 1`` and ``trialConcurrency: 4``, 4 trials run on this machine concurrently, each using 1 GPU. With ``trialGpuNumber: 2`` and ``trialConcurrency: 2``, 2 trials run concurrently, each using 2 GPUs. Which GPU is allocated to which trial is decided by local training service; users do not need to worry about it. An example configuration is shown below.
.. code-block:: yaml
...
trialGpuNumber: 1
trialConcurrency: 4
...
trainingService:
platform: local
useActiveGpu: false
A complete example configuration file can be found :githublink:`examples/trials/mnist-pytorch/config.yml`.
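If you launch experiments from Python rather than YAML, the same settings can be expressed roughly as below.
This is only a sketch, under the assumption that the local training service fields mirror the YAML keys in snake_case.

.. code-block:: python

   from nni.experiment import Experiment

   experiment = Experiment('local')
   experiment.config.trial_gpu_number = 1
   experiment.config.trial_concurrency = 4
   experiment.config.training_service.use_active_gpu = False
   # Optional fields mentioned above (values are only illustrative):
   # experiment.config.training_service.max_trial_number_per_gpu = 2
   # experiment.config.training_service.gpu_indices = [0, 1]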
OpenPAI Training Service
========================
NNI supports running an experiment on `OpenPAI <https://github.com/Microsoft/pai>`__. OpenPAI manages computing resources and is optimized for deep learning. Through Docker technology, computing hardware is decoupled from software, so it's easy to run distributed jobs, switch between different deep learning frameworks, or run other kinds of jobs in consistent environments.
Prerequisite
------------
1. Before starting to use OpenPAI training service, you should have an account to access an `OpenPAI <https://github.com/Microsoft/pai>`__ cluster. See `here <https://github.com/Microsoft/pai#how-to-deploy>`__ if you don't have any OpenPAI account and want to deploy an OpenPAI cluster. Please note that, on OpenPAI, your trial program will run in Docker containers.
2. Get a token. Open the OpenPAI web portal and click the ``My profile`` button at the top right.
.. image:: ../../img/pai_profile.jpg
:scale: 80%
   Click the ``copy`` button on the page to copy a JWT token.
.. image:: ../../img/pai_token.jpg
:scale: 67%
3. Mount the NFS storage to your local machine. If you don't know where to find the NFS storage, please click the ``Submit job`` button on the web portal.
.. image:: ../../img/pai_job_submission_page.jpg
:scale: 50%
   Find the data management section on the job submission page.
.. image:: ../../img/pai_data_management_page.jpg
:scale: 33%
   ``Preview container paths`` shows the NFS host and path that OpenPAI provides. You need to mount the corresponding host and path to your local machine first; then NNI can use OpenPAI's NFS storage to upload data and code to, or download output from, the OpenPAI cluster. To mount the storage, use the ``mount`` command, for example:
.. code-block:: bash
sudo mount -t nfs4 gcr-openpai-infra02:/pai/data /local/mnt
   After this, the storage that appears as the ``/data`` folder inside the container is available under the ``/local/mnt`` folder on your local machine. Please keep in mind that ``localStorageMountPoint`` should be set to ``/local/mnt`` in this case.
4. Get OpenPAI's storage config name and ``containerStorageMountPoint``. They can also be found in the data management section on the job submission page. Please find the ``Name`` and ``Path`` of your ``Team share storage``. They should be put into ``storageConfigName`` and ``containerStorageMountPoint``. For example:
.. code-block:: yaml
storageConfigName: confignfs-data
containerStorageMountPoint: /mnt/confignfs-data
Usage
-----
We show an example configuration here in YAML; a rough Python equivalent is sketched after it.
.. code-block:: yaml
trialGpuNumber: 0
trialConcurrency: 1
...
trainingService:
platform: openpai
host: http://123.123.123.123
username: ${your user name}
token: ${your token}
dockerImage: msranni/nni
trialCpuNumber: 1
trialMemorySize: 8GB
storageConfigName: confignfs-data
localStorageMountPoint: /local/mnt
containerStorageMountPoint: /mnt/confignfs-data
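The Python equivalent below is only a sketch, under the assumption that the OpenPAI training service fields
mirror the YAML keys in snake_case; the host, user name, and token values are placeholders.

.. code-block:: python

   from nni.experiment import Experiment

   experiment = Experiment('openpai')
   experiment.config.trial_gpu_number = 0
   experiment.config.trial_concurrency = 1
   experiment.config.training_service.host = 'http://123.123.123.123'  # placeholder
   experiment.config.training_service.username = 'your_user_name'      # placeholder
   experiment.config.training_service.token = 'your_token'             # placeholder
   experiment.config.training_service.docker_image = 'msranni/nni'
   experiment.config.training_service.trial_cpu_number = 1
   experiment.config.training_service.trial_memory_size = '8gb'
   experiment.config.training_service.storage_config_name = 'confignfs-data'
   experiment.config.training_service.local_storage_mount_point = '/local/mnt'
   experiment.config.training_service.container_storage_mount_point = '/mnt/confignfs-data'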
Once the configuration is complete, run ``nnictl`` or use Python to launch the experiment. NNI will start spawning trials on the specified OpenPAI platform.
The job name format is something like ``nni_exp_{experiment_id}_trial_{trial_id}``. You can see the jobs created by NNI on the OpenPAI cluster's web portal, like:
.. image:: ../../img/nni_pai_joblist.jpg
.. note:: For OpenPAI training service, NNI starts an additional REST server that listens on a port equal to your NNI WebUI's port plus 1. For example, if your WebUI port is ``8080``, the REST server listens on ``8081`` to receive metrics from trial jobs running on OpenPAI. You should therefore open TCP port ``8081`` in your firewall rules to allow incoming traffic.
Once a trial job is completed, you can go to the NNI WebUI's overview page (like ``http://localhost:8080/oview``) to check the trial's information. For example, you can expand a trial's information in the trial list view and click the logPath link, like:
.. image:: ../../img/nni_webui_joblist.png
:scale: 30%
Configuration References
------------------------
Compared with :doc:`local` and :doc:`remote`, OpenPAI training service supports the following additional configurations.
.. list-table::
   :header-rows: 1
   :widths: auto

   * - Field name
     - Description
   * - username
     - Required field. User name of the OpenPAI platform.
   * - token
     - Required field. Authentication key of the OpenPAI platform.
   * - host
     - Required field. The host of the OpenPAI platform. It's the URI of PAI's job submission page, like ``10.10.5.1``. The default protocol in NNI is HTTPS. If your PAI cluster has disabled HTTPS, please use a URI in the ``http://10.10.5.1`` format.
   * - trialCpuNumber
     - Optional field. Should be a positive number based on your trial program's CPU requirement. If it's not set in the trial configuration, it should be set in the config specified by the ``openpaiConfig`` or ``openpaiConfigFile`` field.
   * - trialMemorySize
     - Optional field. Should be in a format like ``2gb`` based on your trial program's memory requirement. If it's not set in the trial configuration, it should be set in the config specified by the ``openpaiConfig`` or ``openpaiConfigFile`` field.
   * - dockerImage
     - Optional field. In OpenPAI training service, your trial program will be scheduled by OpenPAI to run in a `Docker container <https://www.docker.com/>`__. This key is used to specify the Docker image used to create the container in which your trial will run. Upon every NNI release, we build and publish `a docker image <https://hub.docker.com/r/msranni/nni>`__. You can either use this image directly in your config file, or build your own image. If it's not set in the trial configuration, it should be set in the config specified by the ``openpaiConfig`` or ``openpaiConfigFile`` field.
   * - virtualCluster
     - Optional field. Set the virtual cluster of OpenPAI. If omitted, the job will run on the ``default`` virtual cluster.
   * - localStorageMountPoint
     - Required field. Set the mount path on the machine where you start the experiment.
   * - containerStorageMountPoint
     - Optional field. Set the mount path inside the container used in OpenPAI.
   * - storageConfigName
     - Optional field. Set the storage name used in OpenPAI. If it's not set in the trial configuration, it should be set in the config specified by the ``openpaiConfig`` or ``openpaiConfigFile`` field.
   * - openpaiConfigFile
     - Optional field. Set the file path of the OpenPAI job configuration; the file is in YAML format. If users set ``openpaiConfigFile`` in NNI's configuration file, there is no need to specify the fields ``storageConfigName``, ``virtualCluster``, ``dockerImage``, ``trialCpuNumber``, ``trialGpuNumber``, and ``trialMemorySize``; these fields will take their values from the config file specified by ``openpaiConfigFile``.
   * - openpaiConfig
     - Optional field. Similar to ``openpaiConfigFile``, but instead of referencing an external file, this field embeds the content directly in NNI's config YAML.
.. note::
   #. The job name in OpenPAI's configuration file will be replaced by a new job name created by NNI; the name format is ``nni_exp_{experiment_id}_trial_{trial_id}``.
   #. If users set multiple taskRoles in OpenPAI's configuration file, NNI will wrap all of these taskRoles and start multiple tasks in one trial job. Users should ensure that only one taskRole reports metrics to NNI; otherwise there might be conflict errors.
Data management
---------------
Before using NNI to start your experiment, users should mount the corresponding data path on the nniManager machine. OpenPAI has its own storage (NFS, Azure Blob, etc.), and the storage used in OpenPAI is mounted into the container when a job starts. Users choose a storage in OpenPAI via the ``storageConfigName`` field, mount that storage on the nniManager machine, and set the ``localStorageMountPoint`` field in the configuration file. NNI generates bash files and copies the data in ``trialCodeDirectory`` to the ``localStorageMountPoint`` folder, then starts the trial job. The data in ``localStorageMountPoint`` is synced to the OpenPAI storage and mounted into OpenPAI's container. The data path in the container is set by ``containerStorageMountPoint``; NNI enters this folder first and then runs scripts to start the trial job.
Version check
-------------
NNI has supported a version check feature since version 0.6. It is a policy to ensure that the version of NNIManager is consistent with the version of trialKeeper, and to avoid errors caused by version incompatibility.
Check policy:
#. NNIManager before v0.6 could run any version of trialKeeper; trialKeeper supports backward compatibility.
#. Since version 0.6, the NNIManager version should be the same as the trialKeeper version. For example, if the NNIManager version is 0.6, the trialKeeper version should be 0.6 too.
#. Note that the version check only compares the first two fields of the version. For example, NNIManager v0.6.1 can work with trialKeeper v0.6 or trialKeeper v0.6.2, but cannot work with trialKeeper v0.5.1 or trialKeeper v0.7.
If you cannot run your experiment and want to know whether it is caused by the version check, you can check the WebUI, where an error message about the version check will be shown.
.. image:: ../../img/webui-img/experimentError.png
:scale: 80%
NNI Experiment
==============
An NNI experiment is a unit of one tuning process. For example, it can be one run of hyper-parameter tuning on a specific search space, one run of neural architecture search on a search space, or one run of automatic model compression toward a user-specified goal on latency and accuracy. Usually, the tuning process requires many trials to explore feasible and potentially good-performing models. Thus, an important component of an NNI experiment is the **training service**, which is a unified interface to abstract diverse computation resources (e.g., local machine, remote servers, AKS). Users can easily run the tuning process on their preferred computation resource and platform. On the other hand, an NNI experiment provides a **WebUI** to visualize the tuning process for users.
While developing a DNN model, users need to manage the tuning process: creating an experiment, adjusting an experiment, killing or rerunning a trial in an experiment, and dumping experiment data for customized analysis. Users may also create a new experiment for comparison, or run experiments concurrently for new model development tasks. Thus, NNI provides **experiment management** functionality. Users can use :doc:`../reference/nnictl` to interact with experiments.
The relationship of the components in an NNI experiment is illustrated in the following figure. Hyper-parameter optimization (HPO), neural architecture search (NAS), and model compression are three key features in NNI that help users develop and tune their models. Training service provides the ability to run trials in parallel on available computation resources. The WebUI visualizes the tuning process. *nnictl* is for managing experiments.
.. image:: ../../img/experiment_arch.png
:scale: 80 %
:align: center
Before reading the following content, we recommend going through the quick start first.
.. toctree::
:maxdepth: 2
Training Services <training_service>
Web Portal <web_portal>
Experiment Management <exp_management>