Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type ``choice`` in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__.
Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type ``choice`` in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__.
Suggested scenario: If the configurations you want to try have been decided, you can list them in the SearchSpace file (using ``choice``\ ) and run them using the batch tuner.
Suggested scenario: If the configurations you want to try have been decided, you can list them in the SearchSpace file (using ``choice``\ ) and run them using the batch tuner.
2. Usage
Usage
--------
-----
Example Configuration
Example Configuration
^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^
...
@@ -18,7 +18,7 @@ Example Configuration
...
@@ -18,7 +18,7 @@ Example Configuration
# config.yml
# config.yml
tuner:
tuner:
builtinTunerName: BatchTuner
name: BatchTuner
:raw-html:`<br>`
:raw-html:`<br>`
...
@@ -39,4 +39,4 @@ Note that the search space for BatchTuner should look like:
...
@@ -39,4 +39,4 @@ Note that the search space for BatchTuner should look like:
}
}
}
}
The search space file should include the high-level key ``combine_params``. The type of params in the search space must be ``choice`` and the ``values`` must include all the combined params values.
The search space file should include the high-level key ``combine_params``. The type of params in the search space must be ``choice`` and the ``values`` must include all the combined params values.
BOHB is a robust and efficient hyperparameter tuning algorithm mentioned in `this reference paper <https://arxiv.org/abs/1807.01774>`__. BO is an abbreviation for "Bayesian Optimization" and HB is an abbreviation for "Hyperband".
BOHB is a robust and efficient hyperparameter tuning algorithm mentioned in `this reference paper <https://arxiv.org/abs/1807.01774>`__. BO is an abbreviation for "Bayesian Optimization" and HB is an abbreviation for "Hyperband".
...
@@ -46,8 +46,8 @@ best and worst configurations, respectively, to model the two densities.
...
@@ -46,8 +46,8 @@ best and worst configurations, respectively, to model the two densities.
Note that we also sample a constant fraction named **random fraction** of the configurations uniformly at random.
Note that we also sample a constant fraction named **random fraction** of the configurations uniformly at random.
2. Workflow
Workflow
-----------
--------
.. image:: ../../img/bohb_6.jpg
.. image:: ../../img/bohb_6.jpg
...
@@ -66,8 +66,8 @@ The sampling procedure (using Multidimensional KDE to guide selection) is summar
...
@@ -66,8 +66,8 @@ The sampling procedure (using Multidimensional KDE to guide selection) is summar
:alt:
:alt:
3. Usage
Usage
--------
-----
Installation
Installation
^^^^^^^^^^^^
^^^^^^^^^^^^
...
@@ -133,8 +133,8 @@ To use BOHB, you should add the following spec in your experiment's YAML config
...
@@ -133,8 +133,8 @@ To use BOHB, you should add the following spec in your experiment's YAML config
*Please note that the float type currently only supports decimal representations. You have to use 0.333 instead of 1/3 and 0.001 instead of 1e-3.*
*Please note that the float type currently only supports decimal representations. You have to use 0.333 instead of 1/3 and 0.001 instead of 1e-3.*
4. File Structure
File Structure
-----------------
--------------
The advisor has a lot of different files, functions, and classes. Here, we will only give most of those files a brief introduction:
The advisor has a lot of different files, functions, and classes. Here, we will only give most of those files a brief introduction:
...
@@ -142,8 +142,8 @@ The advisor has a lot of different files, functions, and classes. Here, we will
...
@@ -142,8 +142,8 @@ The advisor has a lot of different files, functions, and classes. Here, we will
* ``bohb_advisor.py`` Definition of BOHB, handles interaction with the dispatcher, including generating new trials and processing results. Also includes the implementation of the HB (Hyperband) part.
* ``bohb_advisor.py`` Definition of BOHB, handles interaction with the dispatcher, including generating new trials and processing results. Also includes the implementation of the HB (Hyperband) part.
* ``config_generator.py`` Includes the implementation of the BO (Bayesian Optimization) part. The function *get_config* can generate new configurations based on BO; the function *new_result* will update the model with the new result.
* ``config_generator.py`` Includes the implementation of the BO (Bayesian Optimization) part. The function *get_config* can generate new configurations based on BO; the function *new_result* will update the model with the new result.
@@ -30,7 +30,7 @@ Currently, we support the following algorithms:
...
@@ -30,7 +30,7 @@ Currently, we support the following algorithms:
* - `Batch tuner <#Batch>`__
* - `Batch tuner <#Batch>`__
- Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in search space spec.
- Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in search space spec.
* - `Grid Search <#GridSearch>`__
* - `Grid Search <#GridSearch>`__
- Grid Search performs an exhaustive searching through a manually specified subset of the hyperparameter space defined in the searchspace file. Note that the only acceptable types of search space are choice, quniform, randint.
- Grid Search performs an exhaustive searching through the search space.
* - `Hyperband <#Hyperband>`__
* - `Hyperband <#Hyperband>`__
- Hyperband tries to use limited resources to explore as many configurations as possible and returns the most promising ones as a final result. The basic idea is to generate many configurations and run them for a small number of trials. The half least-promising configurations are thrown out, the remaining are further trained along with a selection of new configurations. The size of these populations is sensitive to resource constraints (e.g. allotted search time). `Reference Paper <https://arxiv.org/pdf/1603.06560.pdf>`__
- Hyperband tries to use limited resources to explore as many configurations as possible and returns the most promising ones as a final result. The basic idea is to generate many configurations and run them for a small number of trials. The half least-promising configurations are thrown out, the remaining are further trained along with a selection of new configurations. The size of these populations is sensitive to resource constraints (e.g. allotted search time). `Reference Paper <https://arxiv.org/pdf/1603.06560.pdf>`__
* - `Network Morphism <#NetworkMorphism>`__
* - `Network Morphism <#NetworkMorphism>`__
...
@@ -49,7 +49,7 @@ Currently, we support the following algorithms:
...
@@ -49,7 +49,7 @@ Currently, we support the following algorithms:
Usage of Built-in Tuners
Usage of Built-in Tuners
------------------------
------------------------
Using a built-in tuner provided by the NNI SDK requires one to declare the **builtinTunerName** and **classArgs** in the ``config.yml`` file. In this part, we will introduce each tuner along with information about usage and suggested scenarios, classArg requirements, and an example configuration.
Using a built-in tuner provided by the NNI SDK requires one to declare the **name** and **classArgs** in the ``config.yml`` file. In this part, we will introduce each tuner along with information about usage and suggested scenarios, classArg requirements, and an example configuration.
Note: Please follow the format when you write your ``config.yml`` file. Some built-in tuners have dependencies that need to be installed using ``pip install nni[<tuner>]``, like SMAC's dependencies can be installed using ``pip install nni[SMAC]``.
Note: Please follow the format when you write your ``config.yml`` file. Some built-in tuners have dependencies that need to be installed using ``pip install nni[<tuner>]``, like SMAC's dependencies can be installed using ``pip install nni[SMAC]``.
...
@@ -62,7 +62,7 @@ TPE
...
@@ -62,7 +62,7 @@ TPE
Built-in Tuner Name: **TPE**
Built-in Tuner Name: **TPE**
TPE, as a black-box optimization, can be used in various scenarios and shows good performance in general. Especially when you have limited computation resources and can only try a small number of trials. From a large amount of experiments, we found that TPE is far better than Random Search. `Detailed Description <./HyperoptTuner.rst>`__
TPE, as a black-box optimization, can be used in various scenarios and shows good performance in general. Especially when you have limited computation resources and can only try a small number of trials. From a large amount of experiments, we found that TPE is far better than Random Search. `Detailed Description <./TpeTuner.rst>`__
:raw-html:`<br>`
:raw-html:`<br>`
...
@@ -75,7 +75,7 @@ Random Search
...
@@ -75,7 +75,7 @@ Random Search
Built-in Tuner Name: **Random**
Built-in Tuner Name: **Random**
Random search is suggested when each trial does not take very long (e.g., each trial can be completed very quickly, or early stopped by the assessor), and you have enough computational resources. It's also useful if you want to uniformly explore the search space. Random Search can be considered a baseline search algorithm. `Detailed Description <./HyperoptTuner.rst>`__
Random search is suggested when each trial does not take very long (e.g., each trial can be completed very quickly, or early stopped by the assessor), and you have enough computational resources. It's also useful if you want to uniformly explore the search space. Random Search can be considered a baseline search algorithm. `Detailed Description <./RandomTuner.rst>`__
:raw-html:`<br>`
:raw-html:`<br>`
...
@@ -144,8 +144,6 @@ Grid Search
...
@@ -144,8 +144,6 @@ Grid Search
Built-in Tuner Name: **Grid Search**
Built-in Tuner Name: **Grid Search**
Note that the only acceptable types within the search space are ``choice``\ , ``quniform``\ , and ``randint``.
This is suggested when the search space is small. It's suggested when it is feasible to exhaustively sweep the whole search space. `Detailed Description <./GridsearchTuner.rst>`__
This is suggested when the search space is small. It's suggested when it is feasible to exhaustively sweep the whole search space. `Detailed Description <./GridsearchTuner.rst>`__
Naive Evolution comes from `Large-Scale Evolution of Image Classifiers <https://arxiv.org/pdf/1703.01041.pdf>`__. It randomly initializes a population based on the search space. For each generation, it chooses better ones and does some mutation (e.g., changes a hyperparameter, adds/removes one layer, etc.) on them to get the next generation. Naive Evolution requires many trials to works but it's very simple and it's easily expanded with new features.
Naive Evolution comes from `Large-Scale Evolution of Image Classifiers <https://arxiv.org/pdf/1703.01041.pdf>`__. It randomly initializes a population based on the search space. For each generation, it chooses better ones and does some mutation (e.g., changes a hyperparameter, adds/removes one layer, etc.) on them to get the next generation. Naive Evolution requires many trials to works but it's very simple and it's easily expanded with new features.
Bayesian optimization works by constructing a posterior distribution of functions (a Gaussian Process) that best describes the function you want to optimize. As the number of observations grows, the posterior distribution improves, and the algorithm becomes more certain of which regions in parameter space are worth exploring and which are not.
Bayesian optimization works by constructing a posterior distribution of functions (a Gaussian Process) that best describes the function you want to optimize. As the number of observations grows, the posterior distribution improves, and the algorithm becomes more certain of which regions in parameter space are worth exploring and which are not.
...
@@ -12,8 +12,8 @@ Note that the only acceptable types within the search space are ``randint``\ , `
...
@@ -12,8 +12,8 @@ Note that the only acceptable types within the search space are ``randint``\ , `
This optimization approach is described in Section 3 of `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__.
This optimization approach is described in Section 3 of `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__.
`Hyperband <https://arxiv.org/pdf/1603.06560.pdf>`__ is a popular autoML algorithm. The basic idea of Hyperband is to create several buckets, each having ``n`` randomly generated hyperparameter configurations, each configuration using ``r`` resources (e.g., epoch number, batch number). After the ``n`` configurations are finished, it chooses the top ``n/eta`` configurations and runs them using increased ``r*eta`` resources. At last, it chooses the best configuration it has found so far.
`Hyperband <https://arxiv.org/pdf/1603.06560.pdf>`__ is a popular autoML algorithm. The basic idea of Hyperband is to create several buckets, each having ``n`` randomly generated hyperparameter configurations, each configuration using ``r`` resources (e.g., epoch number, batch number). After the ``n`` configurations are finished, it chooses the top ``n/eta`` configurations and runs them using increased ``r*eta`` resources. At last, it chooses the best configuration it has found so far.
2. Implementation with full parallelism
Implementation with full parallelism
---------------------------------------
------------------------------------
First, this is an example of how to write an autoML algorithm based on MsgDispatcherBase, rather than Tuner and Assessor. Hyperband is implemented in this way because it integrates the functions of both Tuner and Assessor, thus, we call it Advisor.
First, this is an example of how to write an autoML algorithm based on MsgDispatcherBase, rather than Tuner and Assessor. Hyperband is implemented in this way because it integrates the functions of both Tuner and Assessor, thus, we call it Advisor.
...
@@ -31,8 +31,8 @@ Or if you want to set ``exec_mode`` with ``serial`` according to the original al
...
@@ -31,8 +31,8 @@ Or if you want to set ``exec_mode`` with ``serial`` according to the original al
If you want to reproduce these results, refer to the example under ``examples/trials/benchmarking/`` for details.
If you want to reproduce these results, refer to the example under ``examples/trials/benchmarking/`` for details.
3. Usage
Usage
--------
-----
Config file
Config file
^^^^^^^^^^^
^^^^^^^^^^^
...
@@ -138,8 +138,8 @@ Example Configuration
...
@@ -138,8 +138,8 @@ Example Configuration
R: 60
R: 60
eta: 3
eta: 3
4. Future improvements
Future improvements
----------------------
-------------------
The current implementation of Hyperband can be further improved by supporting a simple early stop algorithm since it's possible that not all the configurations in the top ``n/eta`` perform well. Any unpromising configurations should be stopped early.
The current implementation of Hyperband can be further improved by supporting a simple early stop algorithm since it's possible that not all the configurations in the top ``n/eta`` perform well. Any unpromising configurations should be stopped early.
The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model. The TPE approach models P(x|y) and P(y) where x represents hyperparameters and y the associated evaluation matric. P(x|y) is modeled by transforming the generative process of hyperparameters, replacing the distributions of the configuration prior with non-parametric densities. This optimization approach is described in detail in `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__.
Parallel TPE optimization
^^^^^^^^^^^^^^^^^^^^^^^^^
TPE approaches were actually run asynchronously in order to make use of multiple compute nodes and to avoid wasting time waiting for trial evaluations to complete. The original algorithm design was optimized for sequential computation. If we were to use TPE with much concurrency, its performance will be bad. We have optimized this case using the Constant Liar algorithm. For these principles of optimization, please refer to our `research blog <../CommunitySharings/ParallelizingTpeSearch.rst>`__.
2. Usage
--------
To use TPE, you should add the following spec in your experiment's YAML config file:
.. code-block:: yaml
tuner:
builtinTunerName: TPE
classArgs:
optimize_mode: maximize
parallel_optimize: True
constant_liar_type: min
classArgs requirements
^^^^^^^^^^^^^^^^^^^^^^
* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', tuners will try to maximize metrics. If 'minimize', tuner will try to minimize metrics.
* **parallel_optimize** (*bool, optional, default = False*\ ) - If True, TPE will use the Constant Liar algorithm to optimize parallel hyperparameter tuning. Otherwise, TPE will not discriminate between sequential or parallel situations.
* **constant_liar_type** (*min or max or mean, optional, default = min*\ ) - The type of constant liar to use, will logically be determined on the basis of the values taken by y at X. There are three possible values, min{Y}, max{Y}, and mean{Y}.
Note: We have optimized the parallelism of TPE for large-scale trial concurrency. For the principle of optimization or turn-on optimization, please refer to `TPE document <./HyperoptTuner.rst>`__.
Example Configuration
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
# config.yml
tuner:
builtinTunerName: TPE
classArgs:
optimize_mode: maximize
RandomSearch
------------
------------
1. Introduction
---------------
In `Random Search for Hyper-Parameter Optimization <http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>`__ we show that Random Search might be surprisingly effective despite its simplicity. We suggest using Random Search as a baseline when no knowledge about the prior distribution of hyper-parameters is available.
2. Usage
--------
Example Configuration
.. code-block:: yaml
# config.yml
tuner:
builtinTunerName: Random
Anneal on NNI
-------------
1. Introduction
---------------
This simple annealing algorithm begins by sampling from the prior but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
This simple annealing algorithm begins by sampling from the prior but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
2. Usage
Usage
--------
-----
classArgs Requirements
classArgs Requirements
^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^
* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
`Metis <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__ offers several benefits over other tuning algorithms. While most tools only predict the optimal configuration, Metis gives you two outputs, a prediction for the optimal configuration and a suggestion for the next trial. No more guess work!
`Metis <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__ offers several benefits over other tuning algorithms. While most tools only predict the optimal configuration, Metis gives you two outputs, a prediction for the optimal configuration and a suggestion for the next trial. No more guess work!
...
@@ -23,8 +23,8 @@ Note that the only acceptable types within the search space are ``quniform``\ ,
...
@@ -23,8 +23,8 @@ Note that the only acceptable types within the search space are ``quniform``\ ,
More details can be found in our `paper <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__.
More details can be found in our `paper <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__.
Population Based Training (PBT) comes from `Population Based Training of Neural Networks <https://arxiv.org/abs/1711.09846v1>`__. It's a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training.
Population Based Training (PBT) comes from `Population Based Training of Neural Networks <https://arxiv.org/abs/1711.09846v1>`__. It's a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training.
...
@@ -15,8 +15,8 @@ Population Based Training (PBT) comes from `Population Based Training of Neural
...
@@ -15,8 +15,8 @@ Population Based Training (PBT) comes from `Population Based Training of Neural
PBTTuner initializes a population with several trials (i.e., ``population_size``\ ). There are four steps in the above figure, each trial only runs by one step. How long is one step is controlled by trial code, e.g., one epoch. When a trial starts, it loads a checkpoint specified by PBTTuner and continues to run one step, then saves checkpoint to a directory specified by PBTTuner and exits. The trials in a population run steps synchronously, that is, after all the trials finish the ``i``\ -th step, the ``(i+1)``\ -th step can be started. Exploitation and exploration of PBT are executed between two consecutive steps.
PBTTuner initializes a population with several trials (i.e., ``population_size``\ ). There are four steps in the above figure, each trial only runs by one step. How long is one step is controlled by trial code, e.g., one epoch. When a trial starts, it loads a checkpoint specified by PBTTuner and continues to run one step, then saves checkpoint to a directory specified by PBTTuner and exits. The trials in a population run steps synchronously, that is, after all the trials finish the ``i``\ -th step, the ``(i+1)``\ -th step can be started. Exploitation and exploration of PBT are executed between two consecutive steps.
2. Usage
Usage
--------
-----
Provide checkpoint directory
Provide checkpoint directory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
@@ -62,7 +62,7 @@ Below is an exmaple of PBTTuner configuration in experiment config file. **Note
...
@@ -62,7 +62,7 @@ Below is an exmaple of PBTTuner configuration in experiment config file. **Note
In `Random Search for Hyper-Parameter Optimization <http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>`__ we show that Random Search might be surprisingly effective despite its simplicity.
We suggest using Random Search as a baseline when no knowledge about the prior distribution of hyper-parameters is available.
`SMAC <https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf>`__ is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO in order to handle categorical parameters. The SMAC supported by nni is a wrapper on `the SMAC3 github repo <https://github.com/automl/SMAC3>`__.
`SMAC <https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf>`__ is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO in order to handle categorical parameters. The SMAC supported by nni is a wrapper on `the SMAC3 github repo <https://github.com/automl/SMAC3>`__.
Note that SMAC on nni only supports a subset of the types in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__\ : ``choice``\ , ``randint``\ , ``uniform``\ , ``loguniform``\ , and ``quniform``.
Note that SMAC on nni only supports a subset of the types in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__\ : ``choice``\ , ``randint``\ , ``uniform``\ , ``loguniform``\ , and ``quniform``.
The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach.
SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements,
and then subsequently choose new hyperparameters to test based on this model.
The TPE approach models P(x|y) and P(y) where x represents hyperparameters and y the associated evaluation matric.
P(x|y) is modeled by transforming the generative process of hyperparameters,
replacing the distributions of the configuration prior with non-parametric densities.
This optimization approach is described in detail in `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__.
Parallel TPE optimization
^^^^^^^^^^^^^^^^^^^^^^^^^
TPE approaches were actually run asynchronously in order to make use of multiple compute nodes and to avoid wasting time waiting for trial evaluations to complete.
The original algorithm design was optimized for sequential computation.
If we were to use TPE with much concurrency, its performance will be bad.
We have optimized this case using the Constant Liar algorithm.
For these principles of optimization, please refer to our `research blog <../CommunitySharings/ParallelizingTpeSearch.rst>`__.
Usage
-----
To use TPE, you should add the following spec in your experiment's YAML config file:
.. code-block:: yaml
## minimal config ##
tuner:
name: TPE
classArgs:
optimize_mode: minimize
.. code-block:: yaml
## advanced config ##
tuner:
name: TPE
classArgs:
optimize_mode: maximize
seed: 12345
tpe_args:
constant_liar_type: 'mean'
n_startup_jobs: 10
n_ei_candidates: 20
linear_forgetting: 100
prior_weight: 0
gamma: 0.5
classArgs
^^^^^^^^^
.. list-table::
:widths: 10 20 10 60
:header-rows: 1
* - Field
- Type
- Default
- Description
* - ``optimize_mode``
- ``'minimize' | 'maximize'``
- ``'minimize'``
- Whether to minimize or maximize trial metrics.
* - ``seed``
- ``int | null``
- ``null``
- The random seed.
* - ``tpe_args.constant_liar_type``
- ``'best' | 'worst' | 'mean' | null``
- ``'best'``
- TPE algorithm itself does not support parallel tuning. This parameter specifies how to optimize for trial_concurrency > 1. How each liar works is explained in paper's section 6.1.
In general ``best`` suit for small trial number and ``worst`` suit for large trial number.
* - ``tpe_args.n_startup_jobs``
- ``int``
- ``20``
- The first N hyper-parameters are generated fully randomly for warming up.
If the search space is large, you can increase this value. Or if max_trial_number is small, you may want to decrease it.
* - ``tpe_args.n_ei_candidates``
- ``int``
- ``24``
- For each iteration TPE samples EI for N sets of parameters and choose the best one. (loosely speaking)
* - ``tpe_args.linear_forgetting``
- ``int``
- ``25``
- TPE will lower the weights of old trials. This controls how many iterations it takes for a trial to start decay.
* - ``tpe_args.prior_weight``
- ``float``
- ``1.0``
- TPE treats user provided search space as prior.
When generating new trials, it also incorporates the prior in trial history by transforming the search space to
one trial configuration (i.e., each parameter of this configuration chooses the mean of its candidate range).
Here, prior_weight determines the weight of this trial configuration in the history trial configurations.
With prior weight 1.0, the search space is treated as one good trial.
For example, "normal(0, 1)" effectly equals to a trial with x = 0 which has yielded good result.
* - ``tpe_args.gamma``
- ``float``
- ``0.25``
- Controls how many trials are considered "good".
The number is calculated as "min(gamma * sqrt(N), linear_forgetting)".