Commit 463c0f78 authored by liuzhe-lz, committed by GitHub

Update random & tpe & grid search tuner doc (#4339)

parent c9ddce99
Batch Tuner on NNI Batch Tuner on NNI
================== ==================
1. Introduction Introduction
--------------- ------------
Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type ``choice`` in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__. Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type ``choice`` in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__.
Suggested scenario: If the configurations you want to try have been decided, you can list them in the SearchSpace file (using ``choice``\ ) and run them using the batch tuner. Suggested scenario: If the configurations you want to try have been decided, you can list them in the SearchSpace file (using ``choice``\ ) and run them using the batch tuner.
2. Usage Usage
-------- -----
Example Configuration Example Configuration
^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^
...@@ -18,7 +18,7 @@ Example Configuration ...@@ -18,7 +18,7 @@ Example Configuration
# config.yml # config.yml
tuner: tuner:
builtinTunerName: BatchTuner name: BatchTuner
:raw-html:`<br>` :raw-html:`<br>`
...@@ -39,4 +39,4 @@ Note that the search space for BatchTuner should look like: ...@@ -39,4 +39,4 @@ Note that the search space for BatchTuner should look like:
} }
} }
The search space file should include the high-level key ``combine_params``. The type of params in the search space must be ``choice`` and the ``values`` must include all the combined params values. The search space file should include the high-level key ``combine_params``. The type of params in the search space must be ``choice`` and the ``values`` must include all the combined params values.
\ No newline at end of file
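The ``combine_params`` layout described above can be sketched in Python. This is a minimal illustration, not code from NNI: the helper ``build_batch_search_space`` and the parameter names are hypothetical, and the ``_type``/``_value`` keys are assumed to follow the usual NNI search space convention.

```python
import json

# Hypothetical configurations to run; names and values are illustrative only.
configs = [
    {"optimizer": "Adam", "learning_rate": 0.001},
    {"optimizer": "Adam", "learning_rate": 0.01},
    {"optimizer": "SGD", "learning_rate": 0.01},
]


def build_batch_search_space(combinations):
    """Wrap explicit configurations under the top-level ``combine_params`` key.

    Assumes the search space file uses ``_type``/``_value`` entries.
    """
    return {"combine_params": {"_type": "choice", "_value": list(combinations)}}


search_space = build_batch_search_space(configs)
# The printed JSON is what would be saved as the search space file.
print(json.dumps(search_space, indent=2))
```

Each entry of ``_value`` is one complete configuration, so the experiment would run exactly as many trials as there are entries.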
BOHB Advisor on NNI BOHB Advisor on NNI
=================== ===================
1. Introduction Introduction
--------------- ------------
BOHB is a robust and efficient hyperparameter tuning algorithm mentioned in `this reference paper <https://arxiv.org/abs/1807.01774>`__. BO is an abbreviation for "Bayesian Optimization" and HB is an abbreviation for "Hyperband". BOHB is a robust and efficient hyperparameter tuning algorithm mentioned in `this reference paper <https://arxiv.org/abs/1807.01774>`__. BO is an abbreviation for "Bayesian Optimization" and HB is an abbreviation for "Hyperband".
...@@ -46,8 +46,8 @@ best and worst configurations, respectively, to model the two densities. ...@@ -46,8 +46,8 @@ best and worst configurations, respectively, to model the two densities.
Note that we also sample a constant fraction named **random fraction** of the configurations uniformly at random. Note that we also sample a constant fraction named **random fraction** of the configurations uniformly at random.
2. Workflow Workflow
----------- --------
.. image:: ../../img/bohb_6.jpg .. image:: ../../img/bohb_6.jpg
...@@ -66,8 +66,8 @@ The sampling procedure (using Multidimensional KDE to guide selection) is summar ...@@ -66,8 +66,8 @@ The sampling procedure (using Multidimensional KDE to guide selection) is summar
:alt: :alt:
3. Usage Usage
-------- -----
Installation Installation
^^^^^^^^^^^^ ^^^^^^^^^^^^
...@@ -133,8 +133,8 @@ To use BOHB, you should add the following spec in your experiment's YAML config ...@@ -133,8 +133,8 @@ To use BOHB, you should add the following spec in your experiment's YAML config
*Please note that the float type currently only supports decimal representations. You have to use 0.333 instead of 1/3 and 0.001 instead of 1e-3.* *Please note that the float type currently only supports decimal representations. You have to use 0.333 instead of 1/3 and 0.001 instead of 1e-3.*
4. File Structure File Structure
----------------- --------------
The advisor has a lot of different files, functions, and classes. Here, we will only give most of those files a brief introduction: The advisor has a lot of different files, functions, and classes. Here, we will only give most of those files a brief introduction:
...@@ -142,8 +142,8 @@ The advisor has a lot of different files, functions, and classes. Here, we will ...@@ -142,8 +142,8 @@ The advisor has a lot of different files, functions, and classes. Here, we will
* ``bohb_advisor.py`` Definition of BOHB, handles interaction with the dispatcher, including generating new trials and processing results. Also includes the implementation of the HB (Hyperband) part. * ``bohb_advisor.py`` Definition of BOHB, handles interaction with the dispatcher, including generating new trials and processing results. Also includes the implementation of the HB (Hyperband) part.
* ``config_generator.py`` Includes the implementation of the BO (Bayesian Optimization) part. The function *get_config* can generate new configurations based on BO; the function *new_result* will update the model with the new result. * ``config_generator.py`` Includes the implementation of the BO (Bayesian Optimization) part. The function *get_config* can generate new configurations based on BO; the function *new_result* will update the model with the new result.
5. Experiment Experiment
------------- ----------
MNIST with BOHB MNIST with BOHB
^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^
......
...@@ -30,7 +30,7 @@ Currently, we support the following algorithms: ...@@ -30,7 +30,7 @@ Currently, we support the following algorithms:
* - `Batch tuner <#Batch>`__ * - `Batch tuner <#Batch>`__
- Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in search space spec. - Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in search space spec.
* - `Grid Search <#GridSearch>`__ * - `Grid Search <#GridSearch>`__
- Grid Search performs an exhaustive search through a manually specified subset of the hyperparameter space defined in the searchspace file. Note that the only acceptable types of search space are choice, quniform, randint. - Grid Search performs an exhaustive search through the search space.
* - `Hyperband <#Hyperband>`__ * - `Hyperband <#Hyperband>`__
- Hyperband tries to use limited resources to explore as many configurations as possible and returns the most promising ones as a final result. The basic idea is to generate many configurations and run them for a small number of trials. The half least-promising configurations are thrown out, the remaining are further trained along with a selection of new configurations. The size of these populations is sensitive to resource constraints (e.g. allotted search time). `Reference Paper <https://arxiv.org/pdf/1603.06560.pdf>`__ - Hyperband tries to use limited resources to explore as many configurations as possible and returns the most promising ones as a final result. The basic idea is to generate many configurations and run them for a small number of trials. The half least-promising configurations are thrown out, the remaining are further trained along with a selection of new configurations. The size of these populations is sensitive to resource constraints (e.g. allotted search time). `Reference Paper <https://arxiv.org/pdf/1603.06560.pdf>`__
* - `Network Morphism <#NetworkMorphism>`__ * - `Network Morphism <#NetworkMorphism>`__
...@@ -49,7 +49,7 @@ Currently, we support the following algorithms: ...@@ -49,7 +49,7 @@ Currently, we support the following algorithms:
Usage of Built-in Tuners Usage of Built-in Tuners
------------------------ ------------------------
Using a built-in tuner provided by the NNI SDK requires one to declare the **builtinTunerName** and **classArgs** in the ``config.yml`` file. In this part, we will introduce each tuner along with information about usage and suggested scenarios, classArg requirements, and an example configuration. Using a built-in tuner provided by the NNI SDK requires one to declare the **name** and **classArgs** in the ``config.yml`` file. In this part, we will introduce each tuner along with information about usage and suggested scenarios, classArg requirements, and an example configuration.
Note: Please follow the format when you write your ``config.yml`` file. Some built-in tuners have dependencies that need to be installed using ``pip install nni[<tuner>]``; for example, SMAC's dependencies can be installed using ``pip install nni[SMAC]``. Note: Please follow the format when you write your ``config.yml`` file. Some built-in tuners have dependencies that need to be installed using ``pip install nni[<tuner>]``; for example, SMAC's dependencies can be installed using ``pip install nni[SMAC]``.
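The key rename shown in this diff (``builtinTunerName`` becoming ``name``) can be applied to an already-parsed config with a small helper. This is only a sketch: ``migrate_tuner_config`` is a hypothetical function, not part of NNI, and it assumes the config has been loaded into plain Python dictionaries.

```python
def migrate_tuner_config(config):
    """Return a copy of ``config`` with ``tuner.builtinTunerName`` renamed to ``tuner.name``.

    Hypothetical helper; the input dictionary is left unmodified.
    """
    migrated = dict(config)
    if "tuner" in migrated:
        tuner = dict(migrated["tuner"])
        # Only rename when the old key is present and the new one is not.
        if "builtinTunerName" in tuner and "name" not in tuner:
            tuner["name"] = tuner.pop("builtinTunerName")
        migrated["tuner"] = tuner
    return migrated


old_config = {
    "tuner": {
        "builtinTunerName": "TPE",
        "classArgs": {"optimize_mode": "maximize"},
    }
}
new_config = migrate_tuner_config(old_config)
print(new_config)
```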
...@@ -62,7 +62,7 @@ TPE ...@@ -62,7 +62,7 @@ TPE
Built-in Tuner Name: **TPE** Built-in Tuner Name: **TPE**
TPE, as a black-box optimization method, can be used in various scenarios and shows good performance in general, especially when you have limited computation resources and can only try a small number of trials. In a large number of experiments, we found that TPE is far better than Random Search. `Detailed Description <./HyperoptTuner.rst>`__ TPE, as a black-box optimization method, can be used in various scenarios and shows good performance in general, especially when you have limited computation resources and can only try a small number of trials. In a large number of experiments, we found that TPE is far better than Random Search. `Detailed Description <./TpeTuner.rst>`__
:raw-html:`<br>` :raw-html:`<br>`
...@@ -75,7 +75,7 @@ Random Search ...@@ -75,7 +75,7 @@ Random Search
Built-in Tuner Name: **Random** Built-in Tuner Name: **Random**
Random search is suggested when each trial does not take very long (e.g., each trial can be completed very quickly, or early stopped by the assessor), and you have enough computational resources. It's also useful if you want to uniformly explore the search space. Random Search can be considered a baseline search algorithm. `Detailed Description <./HyperoptTuner.rst>`__ Random search is suggested when each trial does not take very long (e.g., each trial can be completed very quickly, or early stopped by the assessor), and you have enough computational resources. It's also useful if you want to uniformly explore the search space. Random Search can be considered a baseline search algorithm. `Detailed Description <./RandomTuner.rst>`__
:raw-html:`<br>` :raw-html:`<br>`
...@@ -144,8 +144,6 @@ Grid Search ...@@ -144,8 +144,6 @@ Grid Search
Built-in Tuner Name: **Grid Search** Built-in Tuner Name: **Grid Search**
Note that the only acceptable types within the search space are ``choice``\ , ``quniform``\ , and ``randint``.
This is suggested when the search space is small. It's suggested when it is feasible to exhaustively sweep the whole search space. `Detailed Description <./GridsearchTuner.rst>`__ This is suggested when the search space is small. It's suggested when it is feasible to exhaustively sweep the whole search space. `Detailed Description <./GridsearchTuner.rst>`__
:raw-html:`<br>` :raw-html:`<br>`
......
**How To** - Customize Your Own Advisor **How To** - Customize Your Own Advisor
=========================================== =======================================
*Warning: API is subject to change in future releases.* *Warning: API is subject to change in future releases.*
......
DNGO on NNI DNGO on NNI
=========== ===========
1. Introduction Introduction
--------------- ------------
2. Usage Usage
-------- -----
Installation Installation
^^^^^^^^^^^^ ^^^^^^^^^^^^
...@@ -25,6 +25,6 @@ Example Configuration ...@@ -25,6 +25,6 @@ Example Configuration
# config.yml # config.yml
tuner: tuner:
builtinTunerName: DNGOTuner name: DNGOTuner
classArgs: classArgs:
optimize_mode: maximize optimize_mode: maximize
\ No newline at end of file
...@@ -2,13 +2,13 @@ Naive Evolution Tuners on NNI ...@@ -2,13 +2,13 @@ Naive Evolution Tuners on NNI
============================= =============================
1. Introduction Introduction
--------------- ------------
Naive Evolution comes from `Large-Scale Evolution of Image Classifiers <https://arxiv.org/pdf/1703.01041.pdf>`__. It randomly initializes a population based on the search space. For each generation, it chooses better ones and does some mutation (e.g., changes a hyperparameter, adds/removes one layer, etc.) on them to get the next generation. Naive Evolution requires many trials to work, but it is very simple and easy to extend with new features. Naive Evolution comes from `Large-Scale Evolution of Image Classifiers <https://arxiv.org/pdf/1703.01041.pdf>`__. It randomly initializes a population based on the search space. For each generation, it chooses better ones and does some mutation (e.g., changes a hyperparameter, adds/removes one layer, etc.) on them to get the next generation. Naive Evolution requires many trials to work, but it is very simple and easy to extend with new features.
2. Usage Usage
-------- -----
classArgs Requirements classArgs Requirements
^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^
...@@ -26,7 +26,7 @@ Example Configuration ...@@ -26,7 +26,7 @@ Example Configuration
# config.yml # config.yml
tuner: tuner:
builtinTunerName: Evolution name: Evolution
classArgs: classArgs:
optimize_mode: maximize optimize_mode: maximize
population_size: 100 population_size: 100
......
GP Tuner on NNI GP Tuner on NNI
=============== ===============
1. Introduction Introduction
--------------- ------------
Bayesian optimization works by constructing a posterior distribution of functions (a Gaussian Process) that best describes the function you want to optimize. As the number of observations grows, the posterior distribution improves, and the algorithm becomes more certain of which regions in parameter space are worth exploring and which are not. Bayesian optimization works by constructing a posterior distribution of functions (a Gaussian Process) that best describes the function you want to optimize. As the number of observations grows, the posterior distribution improves, and the algorithm becomes more certain of which regions in parameter space are worth exploring and which are not.
...@@ -12,8 +12,8 @@ Note that the only acceptable types within the search space are ``randint``\ , ` ...@@ -12,8 +12,8 @@ Note that the only acceptable types within the search space are ``randint``\ , `
This optimization approach is described in Section 3 of `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__. This optimization approach is described in Section 3 of `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__.
2. Usage Usage
-------- -----
classArgs requirements classArgs requirements
^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^
...@@ -35,7 +35,7 @@ Example Configuration ...@@ -35,7 +35,7 @@ Example Configuration
# config.yml # config.yml
tuner: tuner:
builtinTunerName: GPTuner name: GPTuner
classArgs: classArgs:
optimize_mode: maximize optimize_mode: maximize
utility: 'ei' utility: 'ei'
...@@ -45,4 +45,4 @@ Example Configuration ...@@ -45,4 +45,4 @@ Example Configuration
alpha: 1e-6 alpha: 1e-6
cold_start_num: 10 cold_start_num: 10
selection_num_warm_up: 100000 selection_num_warm_up: 100000
selection_num_starting_points: 250 selection_num_starting_points: 250
\ No newline at end of file
...@@ -4,21 +4,22 @@ Grid Search on NNI ...@@ -4,21 +4,22 @@ Grid Search on NNI
Grid Search Grid Search
----------- -----------
1. Introduction Introduction
--------------- ------------
Grid Search performs an exhaustive search through a manually specified subset of the hyperparameter space defined in the searchspace file. Grid Search performs an exhaustive search through a search space.
Note that the only acceptable types within the search space are ``choice``\ , ``quniform``\ , and ``randint``. For uniform and normal distributed parameters, grid search tuner samples them at progressively decreased intervals.
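One way to picture sampling a continuous parameter "at progressively decreased intervals" is a grid that is refined level by level. The sketch below is an assumption made for illustration and is not taken from the NNI implementation: ``progressive_grid`` and ``scale`` are hypothetical names, and halving the spacing at each level is only one plausible refinement rule.

```python
from fractions import Fraction


def progressive_grid(levels):
    """Yield points in (0, 1), halving the grid spacing at each level.

    Level 1 yields 1/2; level 2 yields 1/4 and 3/4; and so on.
    """
    for level in range(1, levels + 1):
        denominator = 2 ** level
        # Odd numerators are the points not already visited at coarser levels.
        for numerator in range(1, denominator, 2):
            yield Fraction(numerator, denominator)


def scale(point, low, high):
    """Map a point from the unit interval onto the range [low, high]."""
    return low + float(point) * (high - low)


points = list(progressive_grid(3))
# Example: place the unit-interval points on a hypothetical range [0.0, 10.0].
values = [scale(p, 0.0, 10.0) for p in points]
print(values)
```

A coarse sweep finishes first, and each later level fills in the gaps left by the previous one, so the search can be stopped at any level with even coverage.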
2. Usage Usage
-------- -----
Grid search tuner has no argument.
Example Configuration Example Configuration
^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml .. code-block:: yaml
# config.yml
tuner: tuner:
builtinTunerName: GridSearch name: GridSearch
\ No newline at end of file
Hyperband on NNI Hyperband on NNI
================ ================
1. Introduction Introduction
--------------- ------------
`Hyperband <https://arxiv.org/pdf/1603.06560.pdf>`__ is a popular autoML algorithm. The basic idea of Hyperband is to create several buckets, each having ``n`` randomly generated hyperparameter configurations, each configuration using ``r`` resources (e.g., epoch number, batch number). After the ``n`` configurations are finished, it chooses the top ``n/eta`` configurations and runs them using increased ``r*eta`` resources. Finally, it chooses the best configuration it has found so far. `Hyperband <https://arxiv.org/pdf/1603.06560.pdf>`__ is a popular autoML algorithm. The basic idea of Hyperband is to create several buckets, each having ``n`` randomly generated hyperparameter configurations, each configuration using ``r`` resources (e.g., epoch number, batch number). After the ``n`` configurations are finished, it chooses the top ``n/eta`` configurations and runs them using increased ``r*eta`` resources. Finally, it chooses the best configuration it has found so far.
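The processing of a single bucket described above can be sketched as follows. This is a simplified illustration, not NNI's advisor code: ``run_bucket``, ``sample_config``, and ``evaluate`` are hypothetical names, the objective is a toy function that ignores the budget, and larger scores are assumed to be better.

```python
import random


def run_bucket(n, r, eta, sample_config, evaluate):
    """Run one Hyperband bucket: keep the top n/eta configs, multiply r by eta."""
    configs = [sample_config() for _ in range(n)]
    history = []  # (number of configs, resources per config) for each round
    while True:
        # Score every surviving configuration with the current budget r.
        scored = sorted(
            ((evaluate(c, r), c) for c in configs),
            key=lambda pair: pair[0],
            reverse=True,
        )
        history.append((len(configs), r))
        keep = len(configs) // eta
        if keep == 0:
            best_score, best_config = scored[0]
            return best_config, best_score, history
        configs = [c for _, c in scored[:keep]]
        r *= eta


rng = random.Random(0)
best_config, best_score, history = run_bucket(
    n=9,
    r=1,
    eta=3,
    sample_config=lambda: rng.uniform(0.0, 1.0),
    # Toy objective peaking at 0.3; a real trial would train for r resources.
    evaluate=lambda config, budget: -(config - 0.3) ** 2,
)
print(history)
```

With ``n=9``, ``r=1``, and ``eta=3`` the bucket runs three rounds: nine configurations with one unit of resource, three with three units, and one with nine.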
2. Implementation with full parallelism Implementation with full parallelism
--------------------------------------- ------------------------------------
First, this is an example of how to write an autoML algorithm based on MsgDispatcherBase, rather than Tuner and Assessor. Hyperband is implemented in this way because it integrates the functions of both Tuner and Assessor, thus, we call it Advisor. First, this is an example of how to write an autoML algorithm based on MsgDispatcherBase, rather than Tuner and Assessor. Hyperband is implemented in this way because it integrates the functions of both Tuner and Assessor, thus, we call it Advisor.
...@@ -31,8 +31,8 @@ Or if you want to set ``exec_mode`` with ``serial`` according to the original al ...@@ -31,8 +31,8 @@ Or if you want to set ``exec_mode`` with ``serial`` according to the original al
If you want to reproduce these results, refer to the example under ``examples/trials/benchmarking/`` for details. If you want to reproduce these results, refer to the example under ``examples/trials/benchmarking/`` for details.
3. Usage Usage
-------- -----
Config file Config file
^^^^^^^^^^^ ^^^^^^^^^^^
...@@ -138,8 +138,8 @@ Example Configuration ...@@ -138,8 +138,8 @@ Example Configuration
R: 60 R: 60
eta: 3 eta: 3
4. Future improvements Future improvements
---------------------- -------------------
The current implementation of Hyperband can be further improved by supporting a simple early stop algorithm since it's possible that not all the configurations in the top ``n/eta`` perform well. Any unpromising configurations should be stopped early. The current implementation of Hyperband can be further improved by supporting a simple early stop algorithm since it's possible that not all the configurations in the top ``n/eta`` perform well. Any unpromising configurations should be stopped early.
......
TPE, Random Search, Anneal Tuners on NNI Anneal Tuner
======================================== ============
TPE Introduction
---
1. Introduction
---------------
The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model. The TPE approach models P(x|y) and P(y) where x represents hyperparameters and y the associated evaluation metric. P(x|y) is modeled by transforming the generative process of hyperparameters, replacing the distributions of the configuration prior with non-parametric densities. This optimization approach is described in detail in `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__.
Parallel TPE optimization
^^^^^^^^^^^^^^^^^^^^^^^^^
TPE approaches are run asynchronously in order to make use of multiple compute nodes and to avoid wasting time waiting for trial evaluations to complete. The original algorithm design was optimized for sequential computation, so its performance degrades when TPE is used with high trial concurrency. We have optimized this case using the Constant Liar algorithm. For the principles behind this optimization, please refer to our `research blog <../CommunitySharings/ParallelizingTpeSearch.rst>`__.
2. Usage
--------
To use TPE, you should add the following spec in your experiment's YAML config file:
.. code-block:: yaml
tuner:
builtinTunerName: TPE
classArgs:
optimize_mode: maximize
parallel_optimize: True
constant_liar_type: min
classArgs requirements
^^^^^^^^^^^^^^^^^^^^^^
* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
* **parallel_optimize** (*bool, optional, default = False*\ ) - If True, TPE will use the Constant Liar algorithm to optimize parallel hyperparameter tuning. Otherwise, TPE will not discriminate between sequential or parallel situations.
* **constant_liar_type** (*min or max or mean, optional, default = min*\ ) - The type of constant liar to use, determined from the values taken by y at X. There are three possible values: min{Y}, max{Y}, and mean{Y}.
Note: We have optimized the parallelism of TPE for large-scale trial concurrency. For the principles behind this optimization and how to turn it on, please refer to `TPE document <./HyperoptTuner.rst>`__.
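The idea behind ``constant_liar_type`` can be sketched as follows: trials that are still running are given a placeholder ("lie") metric computed from the finished trials, so the model can account for them before their real results arrive. This is an illustrative sketch, not NNI's TPE code; ``constant_liar_value`` and ``augmented_history`` are hypothetical helpers.

```python
from statistics import mean


def constant_liar_value(observed, liar_type="min"):
    """Return the placeholder metric: min{Y}, max{Y}, or mean{Y} of finished trials."""
    if not observed:
        raise ValueError("at least one finished trial is required")
    if liar_type == "min":
        return min(observed)
    if liar_type == "max":
        return max(observed)
    if liar_type == "mean":
        return mean(observed)
    raise ValueError(f"unknown liar_type: {liar_type!r}")


def augmented_history(finished, pending, liar_type="min"):
    """Combine finished (params, metric) pairs with pending params given the lie."""
    lie = constant_liar_value([metric for _, metric in finished], liar_type)
    return list(finished) + [(params, lie) for params in pending]


# Hypothetical trial records; the parameter name ``lr`` is illustrative.
finished = [({"lr": 0.1}, 0.8), ({"lr": 0.01}, 0.9)]
pending = [{"lr": 0.05}]
print(augmented_history(finished, pending, liar_type="min"))
```

The next suggestion would then be generated from the augmented history, which discourages proposing points too close to trials that are already in flight.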
Example Configuration
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
# config.yml
tuner:
builtinTunerName: TPE
classArgs:
optimize_mode: maximize
RandomSearch
------------ ------------
1. Introduction
---------------
`Random Search for Hyper-Parameter Optimization <http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>`__ shows that Random Search can be surprisingly effective despite its simplicity. We suggest using Random Search as a baseline when no knowledge about the prior distribution of hyper-parameters is available.
2. Usage
--------
Example Configuration
.. code-block:: yaml
# config.yml
tuner:
builtinTunerName: Random
Anneal on NNI
-------------
1. Introduction
---------------
This simple annealing algorithm begins by sampling from the prior but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive. This simple annealing algorithm begins by sampling from the prior but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
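The behaviour described above can be sketched for a single continuous parameter. This is a toy illustration and not NNI's Anneal tuner: ``anneal_search`` is a hypothetical function, the prior is assumed to be uniform, and the fixed ``shrink`` factor stands in for the non-adaptive annealing rate.

```python
import random


def anneal_search(objective, low, high, trials, shrink=0.9, seed=0):
    """Maximize ``objective`` on [low, high] with a window that narrows over time."""
    rng = random.Random(seed)
    best_x, best_y = None, float("-inf")
    for t in range(trials):
        if best_x is None:
            # First trial: sample from the (uniform) prior.
            x = rng.uniform(low, high)
        else:
            # Later trials: sample near the best point; the window shrinks
            # at a fixed rate that does not depend on the results.
            half_width = 0.5 * (high - low) * shrink ** t
            x = rng.uniform(max(low, best_x - half_width),
                            min(high, best_x + half_width))
        y = objective(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y


def toy_objective(x):
    """Smooth toy response surface with its maximum at x = 2."""
    return -(x - 2.0) ** 2


best_x, best_y = anneal_search(toy_objective, 0.0, 10.0, trials=200)
print(best_x, best_y)
```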
2. Usage Usage
-------- -----
classArgs Requirements classArgs Requirements
^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^
* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics. * **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
Example Configuration Example Configuration
^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^
...@@ -92,6 +21,6 @@ Example Configuration ...@@ -92,6 +21,6 @@ Example Configuration
# config.yml # config.yml
tuner: tuner:
builtinTunerName: Anneal name: Anneal
classArgs: classArgs:
optimize_mode: maximize optimize_mode: maximize
\ No newline at end of file
Metis Tuner on NNI Metis Tuner on NNI
================== ==================
1. Introduction Introduction
--------------- ------------
`Metis <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__ offers several benefits over other tuning algorithms. While most tools only predict the optimal configuration, Metis gives you two outputs, a prediction for the optimal configuration and a suggestion for the next trial. No more guess work! `Metis <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__ offers several benefits over other tuning algorithms. While most tools only predict the optimal configuration, Metis gives you two outputs, a prediction for the optimal configuration and a suggestion for the next trial. No more guess work!
...@@ -23,8 +23,8 @@ Note that the only acceptable types within the search space are ``quniform``\ , ...@@ -23,8 +23,8 @@ Note that the only acceptable types within the search space are ``quniform``\ ,
More details can be found in our `paper <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__. More details can be found in our `paper <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__.
2. Usage Usage
-------- -----
classArgs requirements classArgs requirements
^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^
...@@ -38,6 +38,6 @@ Example Configuration ...@@ -38,6 +38,6 @@ Example Configuration
# config.yml # config.yml
tuner: tuner:
builtinTunerName: MetisTuner name: MetisTuner
classArgs: classArgs:
optimize_mode: maximize optimize_mode: maximize
\ No newline at end of file
Network Morphism Tuner on NNI Network Morphism Tuner on NNI
============================= =============================
1. Introduction Introduction
--------------- ------------
`Autokeras <https://arxiv.org/abs/1806.10282>`__ is a popular autoML tool using Network Morphism. The basic idea of Autokeras is to use Bayesian Regression to estimate the metric of the Neural Network Architecture. Each time, it generates several child networks from father networks. Then it uses a naïve Bayesian regression to estimate its metric value from the history of trained results of network and metric value pairs. Next, it chooses the child which has the best, estimated performance and adds it to the training queue. Inspired by the work of Autokeras and referring to its `code <https://github.com/jhfjhfj1/autokeras>`__\ , we implemented our Network Morphism method on the NNI platform. `Autokeras <https://arxiv.org/abs/1806.10282>`__ is a popular autoML tool using Network Morphism. The basic idea of Autokeras is to use Bayesian Regression to estimate the metric of the Neural Network Architecture. Each time, it generates several child networks from father networks. Then it uses a naïve Bayesian regression to estimate its metric value from the history of trained results of network and metric value pairs. Next, it chooses the child which has the best, estimated performance and adds it to the training queue. Inspired by the work of Autokeras and referring to its `code <https://github.com/jhfjhfj1/autokeras>`__\ , we implemented our Network Morphism method on the NNI platform.
If you want to know more about network morphism trial usage, please see the :githublink:`Readme.md <examples/trials/network_morphism/README.rst>`. If you want to know more about network morphism trial usage, please see the :githublink:`Readme.md <examples/trials/network_morphism/README.rst>`.
2. Usage Usage
-------- -----
Installation Installation
^^^^^^^^^^^^ ^^^^^^^^^^^^
...@@ -36,7 +36,7 @@ To use Network Morphism, you should modify the following spec in your ``config.y ...@@ -36,7 +36,7 @@ To use Network Morphism, you should modify the following spec in your ``config.y
tuner: tuner:
#choice: NetworkMorphism #choice: NetworkMorphism
builtinTunerName: NetworkMorphism name: NetworkMorphism
classArgs: classArgs:
#choice: maximize, minimize #choice: maximize, minimize
optimize_mode: maximize optimize_mode: maximize
...@@ -134,8 +134,8 @@ If you want to save and load the **best model**\ , the following methods are rec ...@@ -134,8 +134,8 @@ If you want to save and load the **best model**\ , the following methods are rec
model_id = "" # id of the model you want to reuse model_id = "" # id of the model you want to reuse
loaded_model = torch.load("model-{}.pt".format(model_id)) loaded_model = torch.load("model-{}.pt".format(model_id))
3. File Structure File Structure
----------------- --------------
The tuner has a lot of different files, functions, and classes. Here, we will give most of those files only a brief introduction: The tuner has a lot of different files, functions, and classes. Here, we will give most of those files only a brief introduction:
...@@ -164,8 +164,8 @@ The tuner has a lot of different files, functions, and classes. Here, we will gi ...@@ -164,8 +164,8 @@ The tuner has a lot of different files, functions, and classes. Here, we will gi
* ``metric.py`` some metric classes including Accuracy and MSE. * ``metric.py`` some metric classes including Accuracy and MSE.
* ``utils.py`` is the example search network architectures for the ``cifar10`` dataset, using Keras. * ``utils.py`` is the example search network architectures for the ``cifar10`` dataset, using Keras.
4. The Network Representation Json Example The Network Representation Json Example
------------------------------------------ ---------------------------------------
Here is an example of the intermediate representation JSON file we defined, which is passed from the tuner to the trial in the architecture search procedure. Users can call the "json_to_graph()" function in the trial code to build a PyTorch or Keras model from this JSON file. Here is an example of the intermediate representation JSON file we defined, which is passed from the tuner to the trial in the architecture search procedure. Users can call the "json_to_graph()" function in the trial code to build a PyTorch or Keras model from this JSON file.
...@@ -293,7 +293,7 @@ You can consider the model to be a `directed acyclic graph <https://en.wikipedia ...@@ -293,7 +293,7 @@ You can consider the model to be a `directed acyclic graph <https://en.wikipedia
* *
For other layers, the numbering follows the format: its node input id (or id list) and node output id. For other layers, the numbering follows the format: its node input id (or id list) and node output id.
5. TODO TODO
------- ----
As a next step, we will change the API from a fixed network generator to a network generator with more available operators. We will use ONNX instead of JSON as the intermediate representation spec in the future. As a next step, we will change the API from a fixed network generator to a network generator with more available operators. We will use ONNX instead of JSON as the intermediate representation spec in the future.
PBT Tuner on NNI PBT Tuner on NNI
================ ================
1. Introduction Introduction
--------------- ------------
Population Based Training (PBT) comes from `Population Based Training of Neural Networks <https://arxiv.org/abs/1711.09846v1>`__. It's a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. Population Based Training (PBT) comes from `Population Based Training of Neural Networks <https://arxiv.org/abs/1711.09846v1>`__. It's a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training.
...@@ -15,8 +15,8 @@ Population Based Training (PBT) comes from `Population Based Training of Neural ...@@ -15,8 +15,8 @@ Population Based Training (PBT) comes from `Population Based Training of Neural
PBTTuner initializes a population with several trials (i.e., ``population_size``\ ). There are four steps in the above figure, and each trial runs only one step at a time. The length of one step is controlled by the trial code, e.g., one epoch. When a trial starts, it loads a checkpoint specified by PBTTuner, runs one step, saves a checkpoint to a directory specified by PBTTuner, and exits. The trials in a population run steps synchronously; that is, the ``(i+1)``\ -th step can start only after all the trials finish the ``i``\ -th step. Exploitation and exploration of PBT are executed between two consecutive steps. PBTTuner initializes a population with several trials (i.e., ``population_size``\ ). There are four steps in the above figure, and each trial runs only one step at a time. The length of one step is controlled by the trial code, e.g., one epoch. When a trial starts, it loads a checkpoint specified by PBTTuner, runs one step, saves a checkpoint to a directory specified by PBTTuner, and exits. The trials in a population run steps synchronously; that is, the ``(i+1)``\ -th step can start only after all the trials finish the ``i``\ -th step. Exploitation and exploration of PBT are executed between two consecutive steps.
2. Usage Usage
-------- -----
Provide checkpoint directory Provide checkpoint directory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...@@ -62,7 +62,7 @@ Below is an exmaple of PBTTuner configuration in experiment config file. **Note ...@@ -62,7 +62,7 @@ Below is an exmaple of PBTTuner configuration in experiment config file. **Note
# config.yml # config.yml
tuner: tuner:
builtinTunerName: PBTTuner name: PBTTuner
classArgs: classArgs:
optimize_mode: maximize optimize_mode: maximize
all_checkpoint_dir: /the/path/to/store/checkpoints all_checkpoint_dir: /the/path/to/store/checkpoints
...@@ -77,4 +77,4 @@ Example Configuration ...@@ -77,4 +77,4 @@ Example Configuration
tuner: tuner:
builtinTunerName: PBTTuner builtinTunerName: PBTTuner
classArgs: classArgs:
optimize_mode: maximize optimize_mode: maximize
\ No newline at end of file
Random Tuner
============
Introduction
------------
The paper `Random Search for Hyper-Parameter Optimization <http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>`__ shows that random search can be surprisingly effective despite its simplicity.
We suggest using Random Search as a baseline when no knowledge about the prior distribution of hyper-parameters is available.
Usage
-----
Example Configuration
.. code-block:: yaml
tuner:
name: Random
classArgs:
seed: 100 # optional
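The effect of the optional ``seed`` can be illustrated with a minimal sketch. The ``_type`` / ``_value`` layout follows the NNI search space spec, but the ``sample`` helper and the parameter names are hypothetical and are not the code of the Random tuner itself; the sketch only handles ``choice`` and ``uniform``.

```python
import random

# Hypothetical search space in the NNI "_type"/"_value" layout.
search_space = {
    "lr": {"_type": "uniform", "_value": [0.001, 0.1]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]},
}


def sample(space, rng):
    """Draw one configuration; only 'choice' and 'uniform' are handled here."""
    params = {}
    for name, spec in space.items():
        if spec["_type"] == "choice":
            params[name] = rng.choice(spec["_value"])
        elif spec["_type"] == "uniform":
            low, high = spec["_value"]
            params[name] = rng.uniform(low, high)
        else:
            raise ValueError(f"unsupported type: {spec['_type']}")
    return params


# Two generators built from the same seed produce the same configuration.
first = sample(search_space, random.Random(100))
again = sample(search_space, random.Random(100))
```

With a fixed seed, the sequence of sampled configurations is reproducible across runs, which is convenient when random search serves as a baseline.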
...@@ -2,16 +2,16 @@ SMAC Tuner on NNI ...@@ -2,16 +2,16 @@ SMAC Tuner on NNI
================= =================
1. Introduction Introduction
--------------- ------------
`SMAC <https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf>`__ is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO in order to handle categorical parameters. The SMAC supported by nni is a wrapper on `the SMAC3 github repo <https://github.com/automl/SMAC3>`__. `SMAC <https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf>`__ is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO in order to handle categorical parameters. The SMAC supported by nni is a wrapper on `the SMAC3 github repo <https://github.com/automl/SMAC3>`__.
Note that SMAC on nni only supports a subset of the types in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__\ : ``choice``\ , ``randint``\ , ``uniform``\ , ``loguniform``\ , and ``quniform``. Note that SMAC on nni only supports a subset of the types in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__\ : ``choice``\ , ``randint``\ , ``uniform``\ , ``loguniform``\ , and ``quniform``.
2. Usage Usage
-------- -----
Installation Installation
^^^^^^^^^^^^ ^^^^^^^^^^^^
...@@ -35,6 +35,6 @@ Example Configuration ...@@ -35,6 +35,6 @@ Example Configuration
# config.yml # config.yml
tuner: tuner:
builtinTunerName: SMAC name: SMAC
classArgs: classArgs:
optimize_mode: maximize optimize_mode: maximize
\ No newline at end of file
TPE Tuner
=========
Introduction
------------
The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach.
SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements,
and then subsequently choose new hyperparameters to test based on this model.
The TPE approach models P(x|y) and P(y), where x represents hyperparameters and y the associated evaluation metric.
P(x|y) is modeled by transforming the generative process of hyperparameters,
replacing the distributions of the configuration prior with non-parametric densities.
This optimization approach is described in detail in `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__.
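The idea can be illustrated with a deliberately simplified one-dimensional sketch. It is not NNI's implementation: the Gaussian kernel density estimate, the fixed ``bandwidth``, and the helper names ``kde`` and ``tpe_suggest`` are all assumptions made for illustration. Finished trials are split into a "good" and a "bad" group by loss, a density is fitted to each group, and the candidate with the highest ratio of good density to bad density is suggested.

```python
import math
import random


def kde(points, bandwidth):
    """Return a Gaussian kernel density estimate over 1-D points."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


def tpe_suggest(history, n_good, n_candidates, bandwidth=0.1, seed=0):
    """Suggest the next x from (x, loss) pairs; lower loss is better."""
    if not 0 < n_good < len(history):
        raise ValueError("need at least one good and one bad trial")
    rng = random.Random(seed)
    ranked = sorted(history, key=lambda trial: trial[1])
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    l, g = kde(good, bandwidth), kde(bad, bandwidth)
    # Draw candidates near good trials and keep the one with the best density ratio.
    candidates = [rng.gauss(rng.choice(good), bandwidth) for _ in range(n_candidates)]
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))


# Toy objective with its minimum at x = 0.3, evaluated on a coarse grid.
history = [(i / 10, (i / 10 - 0.3) ** 2) for i in range(11)]
suggestion = tpe_suggest(history, n_good=3, n_candidates=24)
```

On this toy history the suggestion lands close to the region where the low-loss trials cluster.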
Parallel TPE optimization
^^^^^^^^^^^^^^^^^^^^^^^^^
In practice, TPE is often run asynchronously in order to make use of multiple compute nodes and to avoid wasting time waiting for trial evaluations to complete.
However, the original algorithm was designed for sequential computation.
If TPE is used with high concurrency, its performance degrades.
We have optimized this case using the Constant Liar algorithm.
For the principles behind this optimization, please refer to our `research blog <../CommunitySharings/ParallelizingTpeSearch.rst>`__.
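The core of the Constant Liar idea can be sketched as follows: trials that are still running are given a placeholder ("lie") result derived from the finished trials, so that the model can propose further points without waiting. This is a minimal sketch that assumes results are losses (lower is better); the ``constant_liar`` helper is hypothetical, and how NNI feeds the lie into its model is not shown here.

```python
from statistics import mean


def constant_liar(finished_losses, liar_type):
    """Return the placeholder loss assigned to a trial that is still running."""
    if liar_type is None or not finished_losses:
        return None  # no lie available or requested
    if liar_type == "best":
        return min(finished_losses)
    if liar_type == "worst":
        return max(finished_losses)
    if liar_type == "mean":
        return mean(finished_losses)
    raise ValueError(f"unknown liar type: {liar_type!r}")


finished = [1.0, 2.0, 6.0]             # losses of completed trials
pending = [{"lr": 0.01}, {"lr": 0.1}]  # hypothetical running trials
# Treat every pending trial as if it had already returned the lie.
augmented = finished + [constant_liar(finished, "mean")] * len(pending)
```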
Usage
-----
To use TPE, you should add the following spec in your experiment's YAML config file:
.. code-block:: yaml
## minimal config ##
tuner:
name: TPE
classArgs:
optimize_mode: minimize
.. code-block:: yaml
## advanced config ##
tuner:
name: TPE
classArgs:
optimize_mode: maximize
seed: 12345
tpe_args:
constant_liar_type: 'mean'
n_startup_jobs: 10
n_ei_candidates: 20
linear_forgetting: 100
prior_weight: 0
gamma: 0.5
classArgs
^^^^^^^^^
.. list-table::
:widths: 10 20 10 60
:header-rows: 1
* - Field
- Type
- Default
- Description
* - ``optimize_mode``
- ``'minimize' | 'maximize'``
- ``'minimize'``
- Whether to minimize or maximize trial metrics.
* - ``seed``
- ``int | null``
- ``null``
- The random seed.
* - ``tpe_args.constant_liar_type``
- ``'best' | 'worst' | 'mean' | null``
- ``'best'``
- The TPE algorithm itself does not support parallel tuning. This parameter specifies how to optimize for trial_concurrency > 1. How each liar works is explained in section 6.1 of the paper.
In general, ``best`` suits a small number of trials and ``worst`` suits a large number of trials.
* - ``tpe_args.n_startup_jobs``
- ``int``
- ``20``
- The first N hyper-parameters are generated fully randomly for warming up.
If the search space is large, you can increase this value. Or if max_trial_number is small, you may want to decrease it.
* - ``tpe_args.n_ei_candidates``
- ``int``
- ``24``
- For each iteration, TPE evaluates the expected improvement (EI) for N sets of parameters and chooses the best one (loosely speaking).
* - ``tpe_args.linear_forgetting``
- ``int``
- ``25``
- TPE lowers the weights of old trials. This controls how many iterations it takes for a trial to start decaying.
* - ``tpe_args.prior_weight``
- ``float``
- ``1.0``
- TPE treats the user-provided search space as a prior.
When generating new trials, it also incorporates the prior into the trial history by transforming the search space into
one trial configuration (i.e., each parameter of this configuration takes the mean of its candidate range).
Here, prior_weight determines the weight of this trial configuration among the historical trial configurations.
With prior weight 1.0, the search space is treated as one good trial.
For example, "normal(0, 1)" is effectively equal to a trial with x = 0 that has yielded a good result.
* - ``tpe_args.gamma``
- ``float``
- ``0.25``
- Controls how many trials are considered "good".
The number is calculated as "min(gamma * sqrt(N), linear_forgetting)".
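The ``gamma`` formula above can be checked with a small sketch. The helper name ``n_good_trials`` is hypothetical, and rounding up to a whole number of trials is an assumption, since the formula is stated without rounding. The defaults are the ones listed in the table.

```python
import math


def n_good_trials(n_finished, gamma=0.25, linear_forgetting=25):
    """Number of trials treated as "good" after n_finished trials."""
    # Rounding up is an assumption; the text only gives the unrounded formula.
    return min(math.ceil(gamma * math.sqrt(n_finished)), linear_forgetting)


# The count grows with sqrt(N) until it is capped by linear_forgetting.
counts = {n: n_good_trials(n) for n in (16, 100, 40000)}
```

Because the count grows with the square root of N, only a small fraction of a long history is considered good, and ``linear_forgetting`` caps it for very long runs.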
...@@ -20,6 +20,12 @@ Tuner ...@@ -20,6 +20,12 @@ Tuner
.. autoclass:: nni.tuner.Tuner .. autoclass:: nni.tuner.Tuner
:members: :members:
.. autoclass:: nni.algorithms.hpo.tpe_tuner.TpeTuner
:members:
.. autoclass:: nni.algorithms.hpo.random_tuner.RandomTuner
:members:
.. autoclass:: nni.algorithms.hpo.hyperopt_tuner.HyperoptTuner .. autoclass:: nni.algorithms.hpo.hyperopt_tuner.HyperoptTuner
:members: :members:
......
...@@ -10,7 +10,9 @@ Tuner receives metrics from `Trial` to evaluate the performance of a specific pa ...@@ -10,7 +10,9 @@ Tuner receives metrics from `Trial` to evaluate the performance of a specific pa
:maxdepth: 1 :maxdepth: 1
Overview <Tuner/BuiltinTuner> Overview <Tuner/BuiltinTuner>
TPE / Random Search / Anneal <Tuner/HyperoptTuner> TPE <Tuner/TpeTuner>
Random Search <Tuner/RandomTuner>
Anneal <Tuner/HyperoptTuner>
Naive Evolution <Tuner/EvolutionTuner> Naive Evolution <Tuner/EvolutionTuner>
SMAC <Tuner/SmacTuner> SMAC <Tuner/SmacTuner>
Metis Tuner <Tuner/MetisTuner> Metis Tuner <Tuner/MetisTuner>
......