NNI provides state-of-the-art tuning algorithm in builtin-tuners. NNI supports to build a tuner by yourself for tuning demand.
NNI provides state-of-the-art tuning algorithm in builtin-tuners. NNI supports to build a tuner by yourself for tuning demand.
...
@@ -125,7 +128,7 @@ Write a more advanced automl algorithm
...
@@ -125,7 +128,7 @@ Write a more advanced automl algorithm
The methods above are usually enough to write a general tuner. However, users may also want more methods, for example, intermediate results, trials' state (e.g., the methods in assessor), in order to have a more powerful automl algorithm. Therefore, we have another concept called ``advisor`` which directly inherits from ``MsgDispatcherBase`` in :githublink:`msg_dispatcher_base.py <nni/runtime/msg_dispatcher_base.py>`. Please refer to `here <CustomizeAdvisor.rst>`__ for how to write a customized advisor.
The methods above are usually enough to write a general tuner. However, users may also want more methods, for example, intermediate results, trials' state (e.g., the methods in assessor), in order to have a more powerful automl algorithm. Therefore, we have another concept called ``advisor`` which directly inherits from ``MsgDispatcherBase`` in :githublink:`msg_dispatcher_base.py <nni/runtime/msg_dispatcher_base.py>`. Please refer to `here <CustomizeAdvisor.rst>`__ for how to write a customized advisor.
Customize Assessor
Customize Assessor
==================
------------------
NNI supports to build an assessor by yourself for tuning demand.
NNI supports to build an assessor by yourself for tuning demand.
NNI provides a lot of `builtin tuners <../Tuner/BuiltinTuner.rst>`_, `advisors <../Tuner/HyperbandAdvisor.rst>`__ and `assessors <../Assessor/BuiltinAssessor.rst>`__ can be used directly for Hyper Parameter Optimization, and some extra algorithms can be registered via ``nnictl algo register --meta <path_to_meta_file>`` after NNI is installed. You can check builtin algorithms via ``nnictl algo list`` command.
NNI provides a lot of :doc:`builtin tuners <tuners>`, and :doc:`assessors <assessors>` can be used directly for Hyper Parameter Optimization, and some extra algorithms can be registered via ``nnictl algo register --meta <path_to_meta_file>`` after NNI is installed. You can check builtin algorithms via ``nnictl algo list`` command.
NNI also provides the ability to build your own customized tuners, advisors and assessors. To use the customized algorithm, users can simply follow the spec in experiment config file to properly reference the algorithm, which has been illustrated in the tutorials of `customized tuners <../Tuner/CustomizeTuner.rst>`_ / `advisors <../Tuner/CustomizeAdvisor.rst>`__ / `assessors <../Assessor/CustomizeAssessor.rst>`__.
NNI also provides the ability to build your own customized tuners, advisors and assessors. To use the customized algorithm, users can simply follow the spec in experiment config file to properly reference the algorithm, which has been illustrated in the tutorials of :doc:`customized algorithms <custom_algorithm>`.
NNI also allows users to install the customized algorithm as a builtin algorithm, in order for users to use the algorithm in the same way as NNI builtin tuners/advisors/assessors. More importantly, it becomes much easier for users to share or distribute their implemented algorithm to others. Customized tuners/advisors/assessors can be installed into NNI as builtin algorithms, once they are installed into NNI, you can use your customized algorithms the same way as builtin tuners/advisors/assessors in your experiment configuration file. For example, you built a customized tuner and installed it into NNI using a builtin name ``mytuner``, then you can use this tuner in your configuration file like below:
NNI also allows users to install the customized algorithm as a builtin algorithm, in order for users to use the algorithm in the same way as NNI builtin tuners/advisors/assessors. More importantly, it becomes much easier for users to share or distribute their implemented algorithm to others. Customized tuners/advisors/assessors can be installed into NNI as builtin algorithms, once they are installed into NNI, you can use your customized algorithms the same way as builtin tuners/advisors/assessors in your experiment configuration file. For example, you built a customized tuner and installed it into NNI using a builtin name ``mytuner``, then you can use this tuner in your configuration file like below:
...
@@ -18,19 +15,15 @@ NNI also allows users to install the customized algorithm as a builtin algorithm
...
@@ -18,19 +15,15 @@ NNI also allows users to install the customized algorithm as a builtin algorithm
tuner:
tuner:
builtinTunerName: mytuner
builtinTunerName: mytuner
Register customized algorithms as builtin tuners, assessors and advisors
Register customized algorithms like builtin tuners, assessors and advisors
All that needs to be modified is to delete ``NNI Package :: tuner`` metadata in ``setup.py`` and add a meta file mentioned in `4. Prepare meta file`_. Then you can follow `Register customized algorithms as builtin tuners, assessors and advisors`_ to register your customized algorithms.
All that needs to be modified is to delete ``NNI Package :: tuner`` metadata in ``setup.py`` and add a meta file mentioned in `4. Prepare meta file`_.
Then you can follow `Register customized algorithms like builtin tuners, assessors and advisors`_ to register your customized algorithms.
Example: Register a customized tuner as a builtin tuner
Example: Register a customized tuner as a builtin tuner
Take the first line as an example. ``dropout_rate`` is defined as a variable whose prior distribution is a uniform distribution with a range from ``0.1`` to ``0.5``.
Take the first line as an example.
``dropout_rate`` is defined as a variable whose prior distribution is a uniform distribution with a range from ``0.1`` to ``0.5``.
.. attention::
.. note:: In the `experiment configuration (V2) schema <ExperimentConfig.rst>`_, NNI supports defining the search space directly in the configuration file, detailed usage can be found `here <QuickStart.rst#step-2-define-the-search-space>`__. When using Python API, users can write the search space in the Python file, refer `here <HowToLaunchFromPython.rst>`__.
The available sampling strategies within a search space depend on the tuner you want to use.
We list the supported types for each built-in tuner :ref:`below <hpo-space-support>`.
Note that the available sampling strategies within a search space depend on the tuner you want to use. We list the supported types for each builtin tuner below. For a customized tuner, you don't have to follow our convention and you will have the flexibility to define any type you want.
For a customized tuner, you don't have to follow our convention and you will have the flexibility to define any type you want.
Types
Types
-----
-----
All types of sampling strategies and their parameter are listed here:
All types of sampling strategies and their parameter are listed here:
choice
^^^^^^
.. code-block:: python
{"_type": "choice", "_value": options}
* The variable's value is one of the options. Here ``options`` should be a list of **numbers** or a list of **strings**. Using arbitrary objects as members of this list (like sublists, a mixture of numbers and strings, or null values) should work in most cases, but may trigger undefined behaviors.
* ``options`` can also be a nested sub-search-space, this sub-search-space takes effect only when the corresponding element is chosen. The variables in this sub-search-space can be seen as conditional variables. Here is an simple :githublink:`example of nested search space definition <examples/trials/mnist-nested-search-space/search_space.json>`. If an element in the options list is a dict, it is a sub-search-space, and for our built-in tuners you have to add a ``_name`` key in this dict, which helps you to identify which element is chosen. Accordingly, here is a :githublink:`sample <examples/trials/mnist-nested-search-space/sample.json>` which users can get from nni with nested search space definition. See the table below for the tuners which support nested search spaces.
randint
^^^^^^^
.. code-block:: python
*
{"_type": "randint", "_value": [lower, upper]}
``{"_type": "choice", "_value": options}``
* Choosing a random integer between ``lower`` (inclusive) and ``upper`` (exclusive).
* Note: Different tuners may interpret ``randint`` differently. Some (e.g., TPE, GridSearch) treat integers from lower
to upper as unordered ones, while others respect the ordering (e.g., SMAC). If you want all the tuners to respect
the ordering, please use ``quniform`` with ``q=1``.
* The variable's value is one of the options. Here ``options`` should be a list of **numbers** or a list of **strings**. Using arbitrary objects as members of this list (like sublists, a mixture of numbers and strings, or null values) should work in most cases, but may trigger undefined behaviors.
uniform
* ``options`` can also be a nested sub-search-space, this sub-search-space takes effect only when the corresponding element is chosen. The variables in this sub-search-space can be seen as conditional variables. Here is an simple :githublink:`example of nested search space definition <examples/trials/mnist-nested-search-space/search_space.json>`. If an element in the options list is a dict, it is a sub-search-space, and for our built-in tuners you have to add a ``_name`` key in this dict, which helps you to identify which element is chosen. Accordingly, here is a :githublink:`sample <examples/trials/mnist-nested-search-space/sample.json>` which users can get from nni with nested search space definition. See the table below for the tuners which support nested search spaces.
* Choosing a random integer between ``lower`` (inclusive) and ``upper`` (exclusive).
* The variable value is uniformly sampled between low and high.
* Note: Different tuners may interpret ``randint`` differently. Some (e.g., TPE, GridSearch) treat integers from lower
* When optimizing, this variable is constrained to a two-sided interval.
to upper as unordered ones, while others respect the ordering (e.g., SMAC). If you want all the tuners to respect
the ordering, please use ``quniform`` with ``q=1``.
*
quniform
``{"_type": "uniform", "_value": [low, high]}``
^^^^^^^^
.. code-block:: python
* The variable value is uniformly sampled between low and high.
{"_type": "quniform", "_value": [low, high, q]}
* When optimizing, this variable is constrained to a two-sided interval.
*
* The variable value is determined using ``clip(round(uniform(low, high) / q) * q, low, high)``\ , where the clip operation is used to constrain the generated value within the bounds. For example, for ``_value`` specified as [0, 10, 2.5], possible values are [0, 2.5, 5.0, 7.5, 10.0]; For ``_value`` specified as [2, 10, 5], possible values are [2, 5, 10].
* Suitable for a discrete value with respect to which the objective is still somewhat "smooth", but which should be bounded both above and below. If you want to uniformly choose an integer from a range [low, high], you can write ``_value`` like this: ``[low, high, 1]``.
loguniform
^^^^^^^^^^
* The variable value is determined using ``clip(round(uniform(low, high) / q) * q, low, high)``\ , where the clip operation is used to constrain the generated value within the bounds. For example, for ``_value`` specified as [0, 10, 2.5], possible values are [0, 2.5, 5.0, 7.5, 10.0]; For ``_value`` specified as [2, 10, 5], possible values are [2, 5, 10].
.. code-block:: python
* Suitable for a discrete value with respect to which the objective is still somewhat "smooth", but which should be bounded both above and below. If you want to uniformly choose an integer from a range [low, high], you can write ``_value`` like this: ``[low, high, 1]``.
* The variable value is drawn from a range [low, high] according to a loguniform distribution like exp(uniform(log(low), log(high))), so that the logarithm of the return value is uniformly distributed.
* When optimizing, this variable is constrained to be positive.
* The variable value is drawn from a range [low, high] according to a loguniform distribution like exp(uniform(log(low), log(high))), so that the logarithm of the return value is uniformly distributed.
qloguniform
* When optimizing, this variable is constrained to be positive.
* The variable value is determined using ``clip(round(loguniform(low, high) / q) * q, low, high)``\ , where the clip operation is used to constrain the generated value within the bounds.
* The variable value is determined using ``clip(round(loguniform(low, high) / q) * q, low, high)``\ , where the clip operation is used to constrain the generated value within the bounds.
* Suitable for a discrete variable with respect to which the objective is "smooth" and gets smoother with the size of the value, but which should be bounded both above and below.
* Suitable for a discrete variable with respect to which the objective is "smooth" and gets smoother with the size of the value, but which should be bounded both above and below.
*
normal
``{"_type": "normal", "_value": [mu, sigma]}``
^^^^^^
.. code-block:: python
* The variable value is a real value that's normally-distributed with mean mu and standard deviation sigma. When optimizing, this is an unconstrained variable.
{"_type": "normal", "_value": [mu, sigma]}
*
* The variable value is a real value that's normally-distributed with mean mu and standard deviation sigma. When optimizing, this is an unconstrained variable.
* The variable value is determined using ``round(normal(mu, sigma) / q) * q``
.. code-block:: python
* Suitable for a discrete variable that probably takes a value around mu, but is fundamentally unbounded.
*
{"_type": "qnormal", "_value": [mu, sigma, q]}
``{"_type": "lognormal", "_value": [mu, sigma]}``
* The variable value is determined using ``round(normal(mu, sigma) / q) * q``
* Suitable for a discrete variable that probably takes a value around mu, but is fundamentally unbounded.
* The variable value is drawn according to ``exp(normal(mu, sigma))`` so that the logarithm of the return value is normally distributed. When optimizing, this variable is constrained to be positive.
* The variable value is determined using ``round(exp(normal(mu, sigma)) / q) * q``
* The variable value is drawn according to ``exp(normal(mu, sigma))`` so that the logarithm of the return value is normally distributed. When optimizing, this variable is constrained to be positive.
* Suitable for a discrete variable with respect to which the objective is smooth and gets smoother with the size of the variable, which is bounded from one side.
qlognormal
^^^^^^^^^^
.. code-block:: python
{"_type": "qlognormal", "_value": [mu, sigma, q]}
* The variable value is determined using ``round(exp(normal(mu, sigma)) / q) * q``
* Suitable for a discrete variable with respect to which the objective is smooth and gets smoother with the size of the variable, which is bounded from one side.
.. _hpo-space-support:
Search Space Types Supported by Each Tuner
Search Space Types Supported by Each Tuner
------------------------------------------
------------------------------------------
...
@@ -124,68 +156,87 @@ Search Space Types Supported by Each Tuner
...
@@ -124,68 +156,87 @@ Search Space Types Supported by Each Tuner
@@ -260,12 +329,12 @@ Search Space Types Supported by Each Tuner
...
@@ -260,12 +329,12 @@ Search Space Types Supported by Each Tuner
Known Limitations:
Known Limitations:
* GP Tuner, Metis Tuner and DNGO tuner support only **numerical values** in search space
(``choice`` type values can be no-numerical with other tuners, e.g. string values).
Both GP Tuner and Metis Tuner use Gaussian Process Regressor(GPR).
GPR make predictions based on a kernel function and the 'distance' between different points,
it's hard to get the true distance between no-numerical values.
*
* Note that for nested search space:
GP Tuner, Metis Tuner and DNGO tuner support only **numerical values** in search space (\ ``choice`` type values can be no-numerical with other tuners, e.g. string values). Both GP Tuner and Metis Tuner use Gaussian Process Regressor(GPR). GPR make predictions based on a kernel function and the 'distance' between different points, it's hard to get the true distance between no-numerical values.
*
Note that for nested search space:
* Only Random Search/TPE/Anneal/Evolution/Grid Search tuner supports nested search space
* Only TPE/Random/Grid Search/Anneal/Evolution tuners support nested search space.
You can launch a tensorboard process cross one or multi trials within webui since NNI v2.2. This feature supports local training service and reuse mode training service with shared storage for now, and will support more scenarios in later nni version.
Preparation
-----------
Make sure tensorboard installed in your environment. If you never used tensorboard, here are getting start tutorials for your reference, `tensorboard with tensorflow <https://www.tensorflow.org/tensorboard/get_started>`__, `tensorboard with pytorch <https://pytorch.org/tutorials/recipes/recipes/tensorboard_with_pytorch.html>`__.
Use WebUI Launch Tensorboard
----------------------------
1. Save Logs
^^^^^^^^^^^^
NNI will automatically fetch the ``tensorboard`` subfolder under trial's output folder as tensorboard logdir. So in trial's source code, you need to save the tensorboard logs under ``NNI_OUTPUT_DIR/tensorboard``. This log path can be joined as:
Like compare, select the trials you want to combine to launch the tensorboard at first, then click the ``Tensorboard`` button.
.. image:: ../../img/Tensorboard_1.png
:target: ../../img/Tensorboard_1.png
:alt:
After click the ``OK`` button in the pop-up box, you will jump to the tensorboard portal.
.. image:: ../../img/Tensorboard_2.png
:target: ../../img/Tensorboard_2.png
:alt:
You can see the ``SequenceID-TrialID`` on the tensorboard portal.
.. image:: ../../img/Tensorboard_3.png
:target: ../../img/Tensorboard_3.png
:alt:
3. Stop All
^^^^^^^^^^^^
If you want to open the portal you have already launched, click the tensorboard id. If you don't need the tensorboard anymore, click ``Stop all tensorboard`` button.
- This simple annealing algorithm begins by sampling from the prior, but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on the random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
- This simple annealing algorithm begins by sampling from the prior, but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on the random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
- Naive Evolution comes from Large-Scale Evolution of Image Classifiers. It randomly initializes a population-based on search space. For each generation, it chooses better ones and does some mutation (e.g., change a hyperparameter, add/remove one layer) on them to get the next generation. Naïve Evolution requires many trials to work, but it's very simple and easy to expand new features. `Reference paper <https://arxiv.org/pdf/1703.01041.pdf>`__
- Naive Evolution comes from Large-Scale Evolution of Image Classifiers. It randomly initializes a population-based on search space. For each generation, it chooses better ones and does some mutation (e.g., change a hyperparameter, add/remove one layer) on them to get the next generation. Naïve Evolution requires many trials to work, but it's very simple and easy to expand new features. `Reference paper <https://arxiv.org/pdf/1703.01041.pdf>`__
- SMAC is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO, in order to handle categorical parameters. The SMAC supported by NNI is a wrapper on the SMAC3 GitHub repo.
- SMAC is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO, in order to handle categorical parameters. The SMAC supported by NNI is a wrapper on the SMAC3 GitHub repo.
Notice, SMAC needs to be installed by ``pip install nni[SMAC]`` command. `Reference Paper, <https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf>`__ `GitHub Repo <https://github.com/automl/SMAC3>`__
Notice, SMAC needs to be installed by ``pip install nni[SMAC]`` command. `Reference Paper, <https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf>`__ `GitHub Repo <https://github.com/automl/SMAC3>`__
- Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in search space spec.
- Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in search space spec.
- Hyperband tries to use limited resources to explore as many configurations as possible and returns the most promising ones as a final result. The basic idea is to generate many configurations and run them for a small number of trials. The half least-promising configurations are thrown out, the remaining are further trained along with a selection of new configurations. The size of these populations is sensitive to resource constraints (e.g. allotted search time). `Reference Paper <https://arxiv.org/pdf/1603.06560.pdf>`__
- Hyperband tries to use limited resources to explore as many configurations as possible and returns the most promising ones as a final result. The basic idea is to generate many configurations and run them for a small number of trials. The half least-promising configurations are thrown out, the remaining are further trained along with a selection of new configurations. The size of these populations is sensitive to resource constraints (e.g. allotted search time). `Reference Paper <https://arxiv.org/pdf/1603.06560.pdf>`__
- Metis offers the following benefits when it comes to tuning parameters: While most tools only predict the optimal configuration, Metis gives you two outputs: (a) current prediction of optimal configuration, and (b) suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter. `Reference Paper <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__
- Metis offers the following benefits when it comes to tuning parameters: While most tools only predict the optimal configuration, Metis gives you two outputs: (a) current prediction of optimal configuration, and (b) suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter. `Reference Paper <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__
- BOHB is a follow-up work to Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. For the name BOHB, HB means Hyperband, BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models, a proportion of new configurations are generated through these models. `Reference Paper <https://arxiv.org/abs/1807.01774>`__
- BOHB is a follow-up work to Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. For the name BOHB, HB means Hyperband, BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models, a proportion of new configurations are generated through these models. `Reference Paper <https://arxiv.org/abs/1807.01774>`__
- Gaussian Process Tuner is a sequential model-based optimization (SMBO) approach with Gaussian Process as the surrogate. `Reference Paper <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__, `Github Repo <https://github.com/fmfn/BayesianOptimization>`__
- Gaussian Process Tuner is a sequential model-based optimization (SMBO) approach with Gaussian Process as the surrogate. `Reference Paper <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__, `Github Repo <https://github.com/fmfn/BayesianOptimization>`__
- PBT Tuner is a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. `Reference Paper <https://arxiv.org/abs/1711.09846v1>`__
- PBT Tuner is a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. `Reference Paper <https://arxiv.org/abs/1711.09846v1>`__