"llm/vscode:/vscode.git/clone" did not exist on "f00757db2784510506f52c629624ebe040183692"
HyperoptTuner.rst 4.01 KB
Newer Older
1
TPE, Random Search, Anneal Tuners on NNI
========================================

TPE
---

1. Introduction
---------------

The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then choose new hyperparameters to test based on this model. TPE models P(x|y) and P(y), where x represents hyperparameters and y the associated evaluation metric. P(x|y) is modeled by transforming the generative process of hyperparameters, replacing the distributions of the configuration prior with non-parametric densities. This optimization approach is described in detail in `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__.
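
As a rough sketch of the model, following the paper's convention of minimizing a loss :math:`y`: the observed metrics are split at a quantile :math:`y^{*}`, and two non-parametric densities are estimated.

.. math::

   p(x \mid y) =
   \begin{cases}
   \ell(x), & y < y^{*} \\
   g(x), & y \ge y^{*}
   \end{cases}

New candidates are drawn from :math:`\ell(x)` and ranked by the ratio :math:`\ell(x)/g(x)`; the paper shows that maximizing this ratio is equivalent to maximizing expected improvement under the model.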

Parallel TPE optimization
^^^^^^^^^^^^^^^^^^^^^^^^^

TPE is often run asynchronously in order to make use of multiple compute nodes and to avoid wasting time waiting for trial evaluations to complete. The original algorithm, however, was designed for sequential computation, and its performance degrades when TPE is used with high concurrency. We have optimized this case using the Constant Liar algorithm, as sketched below. For the principles behind the optimization, please refer to our `research blog <../CommunitySharings/ParallelizingTpeSearch.rst>`__.
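
The sketch below illustrates the Constant Liar idea in plain Python. It is only an illustration, not NNI's implementation; ``fit_and_propose`` is a hypothetical stand-in for the underlying TPE step.

.. code-block:: python

   # Constant Liar sketch: while trials are still running, temporarily assign
   # them a fake ("lie") metric derived from finished trials, so the model does
   # not keep proposing near-identical configurations under high concurrency.
   from statistics import mean


   def constant_lie(finished_metrics, liar_type="min"):
       """Pick the fake metric assigned to unfinished trials."""
       if liar_type == "min":
           return min(finished_metrics)
       if liar_type == "max":
           return max(finished_metrics)
       return mean(finished_metrics)


   def suggest_next(finished, running_params, liar_type, fit_and_propose):
       """finished: list of (params, metric); running_params: list of params."""
       history = list(finished)
       if finished and running_params:
           lie = constant_lie([m for _, m in finished], liar_type)
           # Pretend every running trial already returned the lie value.
           history += [(p, lie) for p in running_params]
       # fit_and_propose fits the TPE densities on the (augmented) history
       # and returns the next configuration to try.
       return fit_and_propose(history)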

2. Usage
--------

To use TPE, add the following spec to your experiment's YAML config file:

.. code-block:: yaml

   tuner:
     builtinTunerName: TPE
     classArgs:
       optimize_mode: maximize
       parallel_optimize: True
       constant_liar_type: min

classArgs requirements
^^^^^^^^^^^^^^^^^^^^^^

* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.

* **parallel_optimize** (*bool, optional, default = False*\ ) - If True, TPE uses the Constant Liar algorithm to optimize parallel hyperparameter tuning. Otherwise, TPE does not distinguish between sequential and parallel settings.

* **constant_liar_type** (*min or max or mean, optional, default = min*\ ) - The type of constant liar to use. The 'lie' assigned to unfinished trials is derived from the metric values Y observed so far; the three possible choices are min{Y}, max{Y}, and mean{Y}.

Note: We have optimized the parallelism of TPE for large-scale trial concurrency. For the principle of the optimization and how to enable it, see the Parallel TPE optimization section above.

Example Configuration
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

   # config.yml
   tuner:
     builtinTunerName: TPE
     classArgs:
       optimize_mode: maximize

Random Search
-------------

1. Introduction
---------------

`Random Search for Hyper-Parameter Optimization <http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>`__ shows that random search can be surprisingly effective despite its simplicity. We suggest using Random Search as a baseline when no knowledge about the prior distribution of hyper-parameters is available.
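
For concreteness, here is a minimal sketch of what random search does: every hyperparameter is drawn independently from its prior, with no model of past results. The search-space keys and helper below are hypothetical and do not reflect NNI's API.

.. code-block:: python

   import math
   import random

   # Hypothetical search space, in the spirit of an NNI search_space.json.
   SEARCH_SPACE = {
       "lr": ("loguniform", 1e-5, 1e-1),
       "batch_size": ("choice", [16, 32, 64, 128]),
   }


   def random_sample(space):
       """Draw each hyperparameter independently from its prior distribution."""
       params = {}
       for name, (kind, *args) in space.items():
           if kind == "choice":
               params[name] = random.choice(args[0])
           elif kind == "loguniform":
               low, high = args
               params[name] = 10 ** random.uniform(math.log10(low), math.log10(high))
       return params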

2. Usage
--------

Example Configuration
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

   # config.yml
   tuner:
     builtinTunerName: Random

Anneal
------

1. Introduction
---------------

This simple annealing algorithm begins by sampling from the prior but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
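
As a toy illustration of that idea for a single numeric hyperparameter (a sketch only, not NNI's implementation), the sampling window narrows around the best value seen so far at a fixed, non-adaptive rate:

.. code-block:: python

   import random


   def anneal_sample(best_x, low, high, trial_index, shrink=0.9):
       """Sample near the best value so far; the window narrows as trials accumulate."""
       if best_x is None or trial_index == 0:
           # Early on, fall back to sampling from the prior (here: uniform).
           return random.uniform(low, high)
       width = (high - low) * (shrink ** trial_index)
       lo = max(low, best_x - width / 2)
       hi = min(high, best_x + width / 2)
       return random.uniform(lo, hi)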

2. Usage
--------

classArgs Requirements
^^^^^^^^^^^^^^^^^^^^^^

* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.

Example Configuration
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

   # config.yml
   tuner:
     builtinTunerName: Anneal
     classArgs:
       optimize_mode: maximize