Unverified Commit b52f7756 authored by liuzhe-lz, committed by GitHub

HPO doc (#4579)

parent 88ffe908
...@@ -23,6 +23,8 @@ Tuner

.. autoclass:: nni.algorithms.hpo.tpe_tuner.TpeTuner
   :members:

.. autoclass:: nni.algorithms.hpo.tpe_tuner.TpeArguments

.. autoclass:: nni.algorithms.hpo.random_tuner.RandomTuner
   :members:
......
.. d5351e951811dcaeeda7f270427187fd
Builtin Assessors
=================

To save computing resources, NNI supports an early stopping policy, implemented through an interface called **Assessor**.

An Assessor receives intermediate results from a trial and uses the specified algorithm to decide whether the trial should be terminated. Once a trial meets the early stopping condition (which means the Assessor judges that the final result will not be good), the Assessor terminates the trial and marks its status as `EARLY_STOPPED`.

Here is an experimental result of MNIST in "maximize" mode using the "Curvefitting" Assessor. You can see that the Assessor successfully **early stopped** many trials whose hyperparameter combinations performed poorly. Using an Assessor, better results can be obtained with the same computing resources.

Implemented code: :githublink:`config_assessor.yml <examples/trials/mnist-pytorch/config_assessor.yml>`

.. image:: ../img/Assessor.png

.. toctree::
   :maxdepth: 1

   Overview <./Assessor/BuiltinAssessor>
   Medianstop <./Assessor/MedianstopAssessor>
   Curvefitting <./Assessor/CurvefittingAssessor>
.. 10b9097fcfec13f98bb6914b40bd0337
Builtin Tuners
==============

To adapt machine learning and deep learning models to different tasks and problems, hyperparameters need to be tuned, and automated tuning relies on good tuning algorithms. NNI provides state-of-the-art tuning algorithms out of the box, together with easy-to-use APIs.

In NNI, a tuning algorithm is called a "tuner". The tuner sends hyperparameters to trials, receives the run results to evaluate the performance of each set of hyperparameters, and then sends the next set of hyperparameters to a new trial.

The following table briefly introduces the tuning algorithms built into NNI. Click a tuner's name to see its installation requirements, recommended scenarios, example configuration files, and other details. `This article <../CommunitySharings/HpoComparison.rst>`__ compares the performance of different tuners in several scenarios.

.. list-table::
   :header-rows: 1
   :widths: auto

   * - Tuner
     - Brief Introduction of Algorithm
   * - `TPE <./TpeTuner.rst>`__
     - Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then choose new hyperparameters to test based on this model. `Reference paper <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__
   * - `Random Search <./RandomTuner.rst>`__
     - Random search has shown surprisingly good performance in hyperparameter optimization. We recommend using random search as a baseline when no prior knowledge about the hyperparameter distribution is available. `Reference paper <http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>`__
   * - `Anneal <./AnnealTuner.rst>`__
     - This simple annealing algorithm begins by sampling from the prior, then gradually moves toward sampling points that actually perform well. It is a variant of random search that leverages the smoothness of the response surface. In this implementation the annealing rate is not adaptive.
   * - `Naive Evolution <./EvolutionTuner.rst>`__
     - Naive Evolution comes from Large-Scale Evolution of Image Classifiers. It randomly generates a population based on the search space, selects the better candidates in each generation, and mutates them to produce the next generation. Naive Evolution requires many trials to reach the optimum, but it is very simple and easy to extend. `Reference paper <https://arxiv.org/pdf/1703.01041.pdf>`__
   * - `SMAC <./SmacTuner.rst>`__
     - SMAC is a sequential model-based optimization (SMBO) method. It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces random forests into SMBO to handle categorical parameters. NNI's SMAC tuner wraps `SMAC3 <https://github.com/automl/SMAC3>`__ on GitHub. `Reference paper <https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf>`__

       Note: SMAC requires installing its dependencies with ``pip install nni[SMAC]`` and does not currently support Windows.
   * - `Batch <./BatchTuner.rst>`__
     - Batch tuner lets users provide several sets of configurations directly and runs one trial for each configuration.
   * - `Grid Search <./GridsearchTuner.rst>`__
     - Grid search exhaustively enumerates all hyperparameter combinations in the search space.
   * - `Hyperband <./HyperbandAdvisor.rst>`__
     - Hyperband tries to explore as many hyperparameter combinations as possible with limited resources. The idea is to generate many configurations, run each for a short time, discard the worse half, keep running the better half, and repeat for several rounds. `Reference paper <https://arxiv.org/pdf/1603.06560.pdf>`__
   * - `Metis <./MetisTuner.rst>`__
     - Most tuning tools only predict the optimal configuration, while Metis has two outputs: (a) the current prediction of the optimal configuration, and (b) a suggestion for the next trial. Most tools assume the training set contains no noisy data, but Metis tells you whether a particular hyperparameter needs to be re-sampled. `Reference paper <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__
   * - `BOHB <./BohbAdvisor.rst>`__
     - BOHB is a follow-up work to Hyperband. Hyperband does not use the results of existing trials when generating new configurations, while this algorithm does. In BOHB, HB stands for Hyperband and BO stands for Bayesian Optimization. BOHB builds multiple TPE models to generate new configurations from finished trials. `Reference paper <https://arxiv.org/abs/1807.01774>`__
   * - `GP <./GPTuner.rst>`__
     - GP Tuner is a sequential model-based optimization (SMBO) method that uses a Gaussian process as the surrogate. `Reference paper <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__
   * - `PBT <./PBTTuner.rst>`__
     - PBT Tuner is a simple asynchronous optimization algorithm that, within a fixed computational budget, effectively and jointly optimizes a population of models and their hyperparameters to maximize performance. `Reference paper <https://arxiv.org/abs/1711.09846v1>`__
   * - `DNGO <./DngoTuner.rst>`__
     - DNGO is a sequential model-based optimization (SMBO) method that uses neural networks (instead of Gaussian processes) to model the function distributions required by Bayesian optimization.

.. toctree::
   :maxdepth: 1

   TPE <Tuner/TpeTuner>
   Random Search <Tuner/RandomTuner>
   Anneal <Tuner/AnnealTuner>
   Naive Evolution <Tuner/EvolutionTuner>
   SMAC <Tuner/SmacTuner>
   Metis Tuner <Tuner/MetisTuner>
   Batch Tuner <Tuner/BatchTuner>
   Grid Search <Tuner/GridsearchTuner>
   GP Tuner <Tuner/GPTuner>
   Network Morphism <Tuner/NetworkmorphismTuner>
   Hyperband <Tuner/HyperbandAdvisor>
   BOHB <Tuner/BohbAdvisor>
   PBT Tuner <Tuner/PBTTuner>
   DNGO Tuner <Tuner/DngoTuner>
...@@ -79,6 +79,10 @@ autosummary_mock_imports = [
    'nni.tools.jupyter_extension.management',
] + autodoc_mock_imports

autodoc_typehints = 'description'
autodoc_typehints_description_target = 'documented'
autodoc_inherit_docstrings = False

# Bibliography files
bibtex_bibfiles = ['refs.bib']
......
.. d19a00598b8eca71c825d80c0a7106f2
######################
Examples
######################
.. toctree::
:maxdepth: 2
MNIST<./TrialExample/MnistExamples>
Cifar10<./TrialExample/Cifar10Examples>
Scikit-learn<./TrialExample/SklearnExamples>
GBDT<./TrialExample/GbdtExample>
Pix2pix<./TrialExample/Pix2pixExample>
\ No newline at end of file
###########################
Hyperparameter Optimization
###########################
.. toctree::
:maxdepth: 2
TensorBoard Integration <tensorboard>
Implement Custom Tuners and Assessors <custom_algorithm>
Install Custom or 3rd-party Tuners and Assessors <custom_algorithm_installation>
Tuner Benchmark <hpo_benchmark>
Assessor: Early Stopping
========================

In order to save on computing resources, NNI supports an early stopping policy and has an interface called **Assessor** to do this job.

...@@ -9,11 +9,11 @@ Here is an experimental result of MNIST after using the 'Curvefitting' Assessor

Implemented code directory: :githublink:`config_assessor.yml <examples/trials/mnist-pytorch/config_assessor.yml>`

.. image:: ../../img/Assessor.png

.. toctree::
   :maxdepth: 1

   Overview <../Assessor/BuiltinAssessor>
   Medianstop <../Assessor/MedianstopAssessor>
   Curvefitting <../Assessor/CurvefittingAssessor>
Customize Tuner
===============

NNI provides state-of-the-art tuning algorithms as built-in tuners. NNI also supports building a tuner by yourself to meet your tuning demands.

...@@ -123,3 +123,68 @@ Write a more advanced automl algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The methods above are usually enough to write a general tuner. However, users may also want more methods, for example, intermediate results and trials' state (e.g., the methods in assessor), in order to have a more powerful automl algorithm. Therefore, we have another concept called ``advisor`` which directly inherits from ``MsgDispatcherBase`` in :githublink:`msg_dispatcher_base.py <nni/runtime/msg_dispatcher_base.py>`. Please refer to `here <CustomizeAdvisor.rst>`__ for how to write a customized advisor.
Customize Assessor
==================
NNI supports building an assessor by yourself to meet your tuning demands.
If you want to implement a customized Assessor, there are three things to do:
#. Inherit the base Assessor class
#. Implement the ``assess_trial`` function
#. Configure your customized Assessor in experiment YAML config file
**1. Inherit the base Assessor class**
.. code-block:: python

   from nni.assessor import Assessor

   class CustomizedAssessor(Assessor):
       def __init__(self, *args, **kwargs):
           ...
**2. Implement the assess_trial function**
.. code-block:: python

   from nni.assessor import Assessor, AssessResult

   class CustomizedAssessor(Assessor):
       def __init__(self, *args, **kwargs):
           ...

       def assess_trial(self, trial_history):
           """
           Decide whether a trial should be stopped early. Must be overridden.

           trial_history: a list of intermediate result objects reported by the trial.

           Returns AssessResult.Good or AssessResult.Bad.
           """
           # Implement your early-stopping logic here.
           ...
**3. Configure your customized Assessor in experiment YAML config file**
NNI needs to locate your customized Assessor class and instantiate it, so you need to specify the location of the customized Assessor class and pass literal values as parameters to its ``__init__`` constructor.
.. code-block:: yaml

   assessor:
     codeDir: /home/abc/myassessor
     classFileName: my_customized_assessor.py
     className: CustomizedAssessor
     # Any parameter that needs to be passed to your Assessor's __init__ constructor
     # can be specified in the optional classArgs field, for example
     classArgs:
       arg1: value1
Please note that in step **2**, the object ``trial_history`` is exactly the object that the trial sends to the Assessor through the SDK function ``report_intermediate_result``.

The working directory of your assessor is ``<home>/nni-experiments/<experiment_id>/log``, which can be retrieved with the environment variable ``NNI_LOG_DIRECTORY``.

For more detailed examples, see:
* :githublink:`medianstop-assessor <nni/algorithms/hpo/medianstop_assessor.py>`
* :githublink:`curvefitting-assessor <nni/algorithms/hpo/curvefitting_assessor/>`
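Putting the pieces together, below is a minimal sketch of a threshold-based assessor. The ``failure_threshold`` and ``min_history`` arguments and the stopping rule are purely illustrative and not part of NNI; refer to the built-in assessors linked above for the exact signatures used in practice.

.. code-block:: python

   from nni.assessor import Assessor, AssessResult

   class ThresholdAssessor(Assessor):
       """Illustrative assessor: stop a trial whose latest metric falls below a threshold."""

       def __init__(self, failure_threshold=0.5, min_history=3):
           self.failure_threshold = failure_threshold
           self.min_history = min_history

       def assess_trial(self, trial_history):
           # Wait until enough intermediate results have been reported.
           if len(trial_history) < self.min_history:
               return AssessResult.Good
           # Terminate the trial if its most recent intermediate result is below the threshold.
           if trial_history[-1] < self.failure_threshold:
               return AssessResult.Bad
           return AssessResult.Good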
...@@ -28,8 +28,8 @@ classification tasks, the metric "auc" and "logloss" were used for evaluation, w

After the script finishes, the final scores of each tuner are summarized in the file ``results[time]/reports/performances.txt``.
Since the file is large, we only show the following screenshot and summarize other important statistics instead.

.. image:: ../../img/hpo_benchmark/performances.png
   :target: ../../img/hpo_benchmark/performances.png
   :alt:

When the results are parsed, the tuners are also ranked based on their final performance. The following three tables show
...@@ -154,52 +154,52 @@ To view the same data in another way, for each tuner, we present the average ran

Besides these reports, our script also generates two graphs for each fold of each task: one graph presents the best score received by each tuner until trial x, and another graph shows the score that each tuner receives in trial x. These two graphs can give some information regarding how the tuners are "converging" to their final solution. We found that for "nnismall", tuners on the random forest model with search space defined in ``/examples/trials/benchmarking/automlbenchmark/nni/extensions/NNI/architectures/run_random_forest.py`` generally converge to the final solution after 40 to 60 trials. As there are too many graphs to include in a single report (96 graphs in total), we only present 10 graphs here.
.. image:: ../../img/hpo_benchmark/car_fold1_1.jpg
   :target: ../../img/hpo_benchmark/car_fold1_1.jpg
   :alt:

.. image:: ../../img/hpo_benchmark/car_fold1_2.jpg
   :target: ../../img/hpo_benchmark/car_fold1_2.jpg
   :alt:
The previous two graphs are generated for fold 1 of the task "car". In the first graph, we observe that most tuners find a relatively good solution within 40 trials. In this experiment, among all tuners, the DNGOTuner converges fastest to the best solution (within 10 trials). Its best score improved three times over the entire experiment. In the second graph, we observe that most tuners have their scores fluctuate between 0.8 and 1 throughout the experiment. However, it seems that the Anneal tuner (green line) is more unstable (having more fluctuations), while the GPTuner has a more stable pattern. This may be interpreted as the Anneal tuner exploring more aggressively than the GPTuner, so its scores for different trials vary a lot. Regardless, although this pattern can to some extent hint at a tuner's position on the explore-exploit tradeoff, it is not a comprehensive evaluation of a tuner's effectiveness.
.. image:: ../../img/hpo_benchmark/christine_fold0_1.jpg
   :target: ../../img/hpo_benchmark/christine_fold0_1.jpg
   :alt:

.. image:: ../../img/hpo_benchmark/christine_fold0_2.jpg
   :target: ../../img/hpo_benchmark/christine_fold0_2.jpg
   :alt:

.. image:: ../../img/hpo_benchmark/cnae-9_fold0_1.jpg
   :target: ../../img/hpo_benchmark/cnae-9_fold0_1.jpg
   :alt:

.. image:: ../../img/hpo_benchmark/cnae-9_fold0_2.jpg
   :target: ../../img/hpo_benchmark/cnae-9_fold0_2.jpg
   :alt:

.. image:: ../../img/hpo_benchmark/credit-g_fold1_1.jpg
   :target: ../../img/hpo_benchmark/credit-g_fold1_1.jpg
   :alt:

.. image:: ../../img/hpo_benchmark/credit-g_fold1_2.jpg
   :target: ../../img/hpo_benchmark/credit-g_fold1_2.jpg
   :alt:

.. image:: ../../img/hpo_benchmark/titanic_2_fold1_1.jpg
   :target: ../../img/hpo_benchmark/titanic_2_fold1_1.jpg
   :alt:

.. image:: ../../img/hpo_benchmark/titanic_2_fold1_2.jpg
   :target: ../../img/hpo_benchmark/titanic_2_fold1_2.jpg
   :alt:
###########################
Hyperparameter Optimization
###########################
.. raw:: html
<script>
const parts = window.location.href.split('/');
if (parts.pop() === 'index.html') {
window.location.replace(parts.join('/') + '/overview.html')
}
</script>
.. toctree::
:maxdepth: 2
Overview <overview>
Search Space <search_space>
Tuners <tuners>
Assessors <assessors>
Advanced Usage <advanced_toctree.rst>
:orphan:
NNI Annotation
==============
......
Hyperparameter Optimization Overview
====================================
Auto hyperparameter optimization (HPO), or auto tuning, is one of the key features of NNI.
Introduction to HPO
-------------------
In machine learning, a hyperparameter is a parameter whose value is used to control the learning process [1]_,
and HPO is the problem of choosing a set of optimal hyperparameters for a learning algorithm [2]_.
.. [1] https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)
.. [2] https://en.wikipedia.org/wiki/Hyperparameter_optimization
The following code snippet demonstrates a naive HPO process:
.. code-block:: python
best_hyperparameters = None
best_accuracy = 0
for learning_rate in [0.1, 0.01, 0.001, 0.0001]:
for momentum in [i / 10 for i in range(10)]:
for activation_type in ['relu', 'tanh', 'sigmoid']:
model = build_model(activation_type)
train_model(model, learning_rate, momentum)
accuracy = evaluate_model(model)
if accuracy > best_accuracy:
best_accuracy = accuracy
best_hyperparameters = (learning_rate, momentum, activation_type)
print('Best hyperparameters:', best_hyperparameters)
You may have noticed that this example will train 4×10×3 = 120 models in total.
Since it consumes so many computing resources, you may want to:

1. Find the best set of hyperparameters with fewer iterations.
2. Train the models on distributed platforms.
3. Have a portal to monitor and control the process.

NNI will do all of these for you.
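As a taste of what this looks like, here is a minimal sketch of running the same search through NNI's Python experiment API. The trial script name, port, and trial numbers are illustrative only; the config fields are covered in the articles referenced below.

.. code-block:: python

   from nni.experiment import Experiment

   # Describe the search space instead of hard-coding nested loops.
   search_space = {
       'learning_rate': {'_type': 'choice', '_value': [0.1, 0.01, 0.001, 0.0001]},
       'momentum': {'_type': 'uniform', '_value': [0.0, 0.9]},
       'activation_type': {'_type': 'choice', '_value': ['relu', 'tanh', 'sigmoid']},
   }

   experiment = Experiment('local')                      # run trials on the local machine
   experiment.config.trial_command = 'python train.py'   # your training script (illustrative name)
   experiment.config.trial_code_directory = '.'
   experiment.config.search_space = search_space
   experiment.config.tuner.name = 'TPE'                  # a tuner picks promising candidates
   experiment.config.tuner.class_args = {'optimize_mode': 'maximize'}
   experiment.config.max_trial_number = 30               # far fewer than the 120 brute-force runs
   experiment.config.trial_concurrency = 2

   experiment.run(8080)                                  # web portal at http://localhost:8080

Inside ``train.py``, the trial would read its hyperparameters with ``nni.get_next_parameter()`` and report its accuracy with ``nni.report_final_result()``.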
Key Features of NNI HPO
-----------------------
Tuning Algorithms
^^^^^^^^^^^^^^^^^
NNI provides *tuners* to speed up the process of finding the best hyperparameter set.

A tuner, or a tuning algorithm, decides the order in which hyperparameter sets are evaluated.
Based on the results of historical hyperparameter sets, an efficient tuner can predict where the best hyperparameters are located,
and find them in far fewer attempts.
The naive example above evaluates all possible hyperparameter sets in constant order, ignoring the historical results.
This is the brute-force tuning algorithm called *grid search*.
NNI has out-of-the-box support for a variety of popular tuners.
These include naive algorithms like random search and grid search, Bayesian-based algorithms like TPE and SMAC,
RL-based algorithms like PPO, and many more.
Main article: :doc:`tuners`
Training Platforms
^^^^^^^^^^^^^^^^^^
If you are not interested in distributed platforms, you can simply run NNI HPO on your current computer,
just like any ordinary Python library.

When you want to leverage more computing resources, NNI provides built-in integration with training platforms,
from simple on-premise servers to scalable commercial clouds.

With NNI you can write one piece of model code and concurrently evaluate hyperparameter sets on your local machine, SSH servers,
Kubernetes-based clusters, the AzureML service, and much more.
Main article: (FIXME: link to training_services)
Web UI
^^^^^^
NNI provides a web portal to monitor training progress, to visualize hyperparameter performance,
to manually customize hyperparameters, and to manage multiple HPO experiments.
(FIXME: image and link)
Tutorials
---------
To start using NNI HPO, choose the tutorial of your favorite framework:
* PyTorch MNIST tutorial
* :doc:`TensorFlow MNIST tutorial </tutorials/hpo_quickstart_tensorflow/main>`
Extra Features
--------------
After you are familiar with basic usage, you can explore more HPO features:
* :doc:`Assessor: Early stop non-optimal models <assessors>`
* :doc:`nnictl: Use command line tool to create and manage experiments </reference/nnictl>`
* :doc:`Custom tuner: Implement your own tuner <custom_algorithm>`
* :doc:`Tensorboard support <tensorboard>`
* :doc:`Tuner benchmark <hpo_benchmark>`
* :doc:`NNI Annotation (legacy) <nni_annotation>`
Tuner: Tuning Algorithms
========================

The tuner decides which hyperparameter sets will be evaluated. It is the most important part of NNI HPO.

A tuner works in the following steps:

1. Initialize with a search space.
2. Generate hyperparameter sets from the search space.
3. Send hyperparameters to trials.
4. Receive evaluation results.
5. Update internal states according to the results.
6. Go to step 2, until the experiment ends.
NNI has out-of-the-box support for many popular tuning algorithms.
They should be sufficient to cover most typical machine learning scenarios.
However, if you have a very specific demand, or if you have designed an algorithm yourself,
you can also implement your own tuner: :doc:`custom_algorithm`
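To make these steps concrete, here is a minimal sketch of what a tuner looks like under the ``nni.tuner.Tuner`` interface (see :doc:`custom_algorithm` for the authoritative guide); the random-choice logic is purely illustrative.

.. code-block:: python

   import random
   from nni.tuner import Tuner

   class RandomChoiceTuner(Tuner):
       """Illustrative tuner: randomly samples 'choice'-type search space entries."""

       def update_search_space(self, search_space):
           # Step 1: initialize with the search space of the experiment.
           self.search_space = search_space

       def generate_parameters(self, parameter_id, **kwargs):
           # Steps 2-3: generate one hyperparameter set; NNI sends it to a trial.
           return {
               name: random.choice(spec['_value'])
               for name, spec in self.search_space.items()
               if spec['_type'] == 'choice'
           }

       def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
           # Steps 4-5: receive the trial's final metric and update internal state.
           # A real tuner would use `value` to guide the next generate_parameters call.
           pass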
Common Usage
------------
All built-in tuners have similar usage.
To use a built-in tuner, you need to specify its name and arguments in the experiment config,
and provide a standard :doc:`search_space`.
Some tuners, like SMAC and DNGO, have extra dependencies that need to be installed separately.
Please check each tuner's reference page for what arguments it supports and whether it needs extra dependencies.
For a general example, the random tuner can be configured as follows:
.. code-block:: python
config.search_space = {
'x': {'_type': 'uniform', '_value': [0, 1]}
}
config.tuner.name = 'Random'
config.tuner.class_args = {'seed': 0}
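Switching tuners only changes the name and arguments. For example, a sketch selecting TPE with an explicit optimize mode (see ``TpeArguments`` in the reference for the supported arguments):

.. code-block:: python

   config.tuner.name = 'TPE'
   config.tuner.class_args = {'optimize_mode': 'maximize'}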
Full List
---------
.. list-table::
   :header-rows: 1

...@@ -14,61 +49,52 @@ The following table briefly describes the built-in tuners provided by NNI. Click

   * - Tuner
     - Brief Introduction of Algorithm
   * - `TPE <../autotune_ref.html#nni.algorithms.hpo.tpe_tuner.TpeTuner>`_
     - The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model. `Reference Paper <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__
   * - `Random Search <../autotune_ref.html#nni.algorithms.hpo.random_tuner.RandomTuner>`_
     - Random Search for Hyper-Parameter Optimization shows that random search can be surprisingly simple and effective. We suggest using Random Search as the baseline when there is no knowledge about the prior distribution of hyper-parameters. `Reference Paper <http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>`__
   * - `Anneal <../autotune_ref.html#nni.algorithms.hpo.hyperopt_tuner.HyperoptTuner>`_
     - This simple annealing algorithm begins by sampling from the prior, but tends over time to sample from points closer and closer to the best ones observed. It is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
   * - `Naive Evolution <../autotune_ref.html#nni.algorithms.hpo.evolution_tuner.EvolutionTuner>`_
     - Naive Evolution comes from Large-Scale Evolution of Image Classifiers. It randomly initializes a population based on the search space. For each generation, it chooses the better ones and applies some mutation (e.g., changing a hyperparameter, adding/removing one layer) to them to get the next generation. Naive Evolution requires many trials to work, but it is very simple and easy to extend with new features. `Reference paper <https://arxiv.org/pdf/1703.01041.pdf>`__
   * - `SMAC <../autotune_ref.html#nni.algorithms.hpo.smac_tuner.SMACTuner>`_
     - SMAC is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO in order to handle categorical parameters. The SMAC supported by NNI is a wrapper on the SMAC3 GitHub repo.

       Notice: SMAC needs to be installed with the ``pip install nni[SMAC]`` command. `Reference Paper, <https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf>`__ `GitHub Repo <https://github.com/automl/SMAC3>`__
   * - `Batch <../autotune_ref.html#nni.algorithms.hpo.batch_tuner.BatchTuner>`_
     - Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After all the configurations are finished, the experiment is done. Batch tuner only supports the choice type in the search space spec.
   * - `Grid Search <../autotune_ref.html#nni.algorithms.hpo.gridsearch_tuner.GridSearchTuner>`_
     - Grid Search performs an exhaustive search through the search space.
   * - `Hyperband <../autotune_ref.html#nni.algorithms.hpo.hyperband_advisor.Hyperband>`_
     - Hyperband tries to use limited resources to explore as many configurations as possible and returns the most promising ones as a final result. The basic idea is to generate many configurations and run them for a small number of trials. The half least-promising configurations are thrown out, and the remaining ones are further trained along with a selection of new configurations. The size of these populations is sensitive to resource constraints (e.g., allotted search time). `Reference Paper <https://arxiv.org/pdf/1603.06560.pdf>`__
   * - `Metis <../autotune_ref.html#nni.algorithms.hpo.metis_tuner.MetisTuner>`_
     - Metis offers the following benefits when it comes to tuning parameters: while most tools only predict the optimal configuration, Metis gives you two outputs: (a) the current prediction of the optimal configuration, and (b) a suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter. `Reference Paper <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__
   * - `BOHB <../autotune_ref.html#nni.algorithms.hpo.bohb_advisor.BOHB>`_
     - BOHB is a follow-up work to Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. In the name BOHB, HB means Hyperband and BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models; a proportion of new configurations are generated through these models. `Reference Paper <https://arxiv.org/abs/1807.01774>`__
   * - `GP <../autotune_ref.html#nni.algorithms.hpo.gp_tuner.GPTuner>`_
     - Gaussian Process Tuner is a sequential model-based optimization (SMBO) approach with a Gaussian Process as the surrogate. `Reference Paper <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__, `Github Repo <https://github.com/fmfn/BayesianOptimization>`__
   * - `PBT <../autotune_ref.html>`_
     - PBT Tuner is a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. `Reference Paper <https://arxiv.org/abs/1711.09846v1>`__
   * - `DNGO <../autotune_ref.html>`_
     - DNGO uses neural networks as an alternative to GPs to model distributions over functions in Bayesian optimization.
Comparison
----------

These articles have compared built-in tuners' performance on some different tasks:

:doc:`hpo_benchmark_stats`

:doc:`/CommunitySharings/HpoComparison`
Advanced Features
=================
.. toctree::
:maxdepth: 2
Write a New Tuner <Tuner/CustomizeTuner>
Write a New Assessor <Assessor/CustomizeAssessor>
Write a New Advisor <Tuner/CustomizeAdvisor>
Install Customized Algorithms as Builtin Tuners/Assessors/Advisors <Tutorial/InstallCustomizedAlgos>
.. aa9e6234ae4a578e6e74efcdc521f119
Advanced Features
=================

.. toctree::
   :maxdepth: 2

   Write a New Tuner <Tuner/CustomizeTuner>
   Write a New Assessor <Assessor/CustomizeAssessor>
   Write a New Advisor <Tuner/CustomizeAdvisor>
   Install Customized Algorithms as Builtin Tuners/Assessors/Advisors <Tutorial/InstallCustomizedAlgos>
#############################
Auto (Hyper-parameter) Tuning
#############################
Auto tuning is one of the key features provided by NNI; a main application scenario is
hyper-parameter tuning. Tuning specifically applies to trial code. We provide many popular
auto tuning algorithms (called Tuners) and some early stopping algorithms (called Assessors).
NNI supports running trials on various training platforms, for example, on a local machine,
on several servers in a distributed manner, or on platforms such as OpenPAI, Kubernetes, etc.
Other key features of NNI, such as model compression and feature engineering, can also be further
enhanced by auto tuning, which we'll describe when introducing those features.
NNI is highly extensible; advanced users can customize their own Tuner, Assessor, and Training Service
according to their needs.
.. toctree::
:maxdepth: 2
Write Trial <TrialExample/Trials>
Tuners <builtin_tuner>
Assessors <builtin_assessor>
Training Platform <training_services>
Examples <examples>
WebUI <Tutorial/WebUI>
How to Debug <Tutorial/HowToDebug>
Advanced <hpo_advanced>
HPO Benchmarks <hpo_benchmark>