Update random & tpe & grid search tuner doc (#4339)

Unverified Commit 463c0f78 authored Dec 08, 2021 by liuzhe-lz Committed by GitHub Dec 08, 2021
18 changed files
--- a/docs/en_US/Tuner/BatchTuner.rst
+++ b/docs/en_US/Tuner/BatchTuner.rst
 Batch Tuner on NNI
 ==================
-1. Introduction
+Introduction
---------------
+------------
 Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type ``choice`` in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__.
 Suggested scenario: If the configurations you want to try have been decided, you can list them in the SearchSpace file (using ``choice``\ ) and run them using the batch tuner.
-2. Usage
+Usage
--------
+-----
 Example Configuration
 ^^^^^^^^^^^^^^^^^^^^^
@@ -18,7 +18,7 @@ Example Configuration
   # config.yml
   tuner:
-     builtinTunerName: BatchTuner
+     name: BatchTuner
 :raw-html:`<br>`
@@ -39,4 +39,4 @@ Note that the search space for BatchTuner should look like:
       }
   }
 The search space file should include the high-level key ``combine_params``. The type of params in the search space must be ``choice`` and the ``values`` must include all the combined params values.
\ No newline at end of file
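
To make the ``combine_params`` layout concrete, here is a minimal Python sketch. It assumes the standard NNI ``_type``/``_value`` search-space keys. The parameter names (``optimizer``, ``learning_rate``) and the helper ``batch_configs`` are hypothetical: they only mimic how a batch tuner hands out each listed configuration exactly once, and this is not NNI's implementation.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical BatchTuner search space: one top-level ``combine_params`` key of
# type ``choice`` whose values are complete configurations.
SEARCH_SPACE: Dict[str, Any] = {
    "combine_params": {
        "_type": "choice",
        "_value": [
            {"optimizer": "Adam", "learning_rate": 0.0001},
            {"optimizer": "Adam", "learning_rate": 0.001},
            {"optimizer": "SGD", "learning_rate": 0.01},
        ],
    }
}


def batch_configs(search_space: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every listed configuration exactly once, in order (hypothetical helper)."""
    spec = search_space["combine_params"]
    if spec["_type"] != "choice":
        raise ValueError("BatchTuner only supports the 'choice' type")
    for params in spec["_value"]:
        yield dict(params)  # copy, so a trial cannot mutate the search space


configs: List[Dict[str, Any]] = list(batch_configs(SEARCH_SPACE))
print(len(configs))  # 3: the experiment is done after these three trials
```

Because every configuration is listed explicitly, the number of trials equals the length of the ``_value`` list.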
--- a/docs/en_US/Tuner/BohbAdvisor.rst
+++ b/docs/en_US/Tuner/BohbAdvisor.rst
 BOHB Advisor on NNI
 ===================
-1. Introduction
+Introduction
---------------
+------------
 BOHB is a robust and efficient hyperparameter tuning algorithm mentioned in `this reference paper <https://arxiv.org/abs/1807.01774>`__. BO is an abbreviation for "Bayesian Optimization" and HB is an abbreviation for "Hyperband".
@@ -46,8 +46,8 @@ best and worst configurations, respectively, to model the two densities.
 Note that we also sample a constant fraction named **random fraction** of the configurations uniformly at random.
-2. Workflow
+Workflow
-----------
+--------
 .. image:: ../../img/bohb_6.jpg
@@ -66,8 +66,8 @@ The sampling procedure (using Multidimensional KDE to guide selection) is summar
   :alt: 
-3. Usage
+Usage
--------
+-----
 Installation
 ^^^^^^^^^^^^
@@ -133,8 +133,8 @@ To use BOHB, you should add the following spec in your experiment's YAML config
 *Please note that the float type currently only supports decimal representations. You have to use 0.333 instead of 1/3 and 0.001 instead of 1e-3.*
-4. File Structure
+File Structure
-----------------
+--------------
 The advisor has many files, functions, and classes. Here, we give a brief introduction to the most important files:
@@ -142,8 +142,8 @@ The advisor has a lot of different files, functions, and classes. Here, we will
 * ``bohb_advisor.py`` Definition of BOHB, handles interaction with the dispatcher, including generating new trials and processing results. Also includes the implementation of the HB (Hyperband) part.
 * ``config_generator.py`` Includes the implementation of the BO (Bayesian Optimization) part. The function *get_config* can generate new configurations based on BO; the function *new_result* will update the model with the new result.
-5. Experiment
+Experiment
-------------
+----------
 MNIST with BOHB
 ^^^^^^^^^^^^^^^

--- a/docs/en_US/Tuner/BuiltinTuner.rst
+++ b/docs/en_US/Tuner/BuiltinTuner.rst
@@ -30,7 +30,7 @@ Currently, we support the following algorithms:
   * - `Batch tuner <#Batch>`__
     - Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in search space spec.
   * - `Grid Search <#GridSearch>`__
-     - Grid Search performs an exhaustive searching through a manually specified subset of the hyperparameter space defined in the searchspace file. Note that the only acceptable types of search space are choice, quniform, randint.
+     - Grid Search performs an exhaustive search through the search space.
   * - `Hyperband <#Hyperband>`__
     - Hyperband tries to use limited resources to explore as many configurations as possible and returns the most promising ones as a final result. The basic idea is to generate many configurations and run them for a small number of trials. The half least-promising configurations are thrown out, the remaining are further trained along with a selection of new configurations. The size of these populations is sensitive to resource constraints (e.g. allotted search time). `Reference Paper <https://arxiv.org/pdf/1603.06560.pdf>`__
   * - `Network Morphism <#NetworkMorphism>`__
@@ -49,7 +49,7 @@ Currently, we support the following algorithms:
 Usage of Built-in Tuners
 ------------------------
-Using a built-in tuner provided by the NNI SDK requires one to declare the  **builtinTunerName** and **classArgs** in the ``config.yml`` file. In this part, we will introduce each tuner along with information about usage and suggested scenarios, classArg requirements, and an example configuration.
+Using a built-in tuner provided by the NNI SDK requires one to declare the **name** and **classArgs** in the ``config.yml`` file. In this part, we will introduce each tuner along with information about usage and suggested scenarios, classArg requirements, and an example configuration.
 Note: Please follow the format when you write your ``config.yml`` file. Some built-in tuners have dependencies that need to be installed using ``pip install nni[<tuner>]``, like SMAC's dependencies can be installed using ``pip install nni[SMAC]``.
@@ -62,7 +62,7 @@ TPE
   Built-in Tuner Name: **TPE**
-TPE, as a black-box optimization, can be used in various scenarios and shows good performance in general. Especially when you have limited computation resources and can only try a small number of trials. From a large amount of experiments, we found that TPE is far better than Random Search. `Detailed Description <./HyperoptTuner.rst>`__
+TPE, as a black-box optimization algorithm, can be used in various scenarios and shows good performance in general, especially when you have limited computation resources and can only try a small number of trials. From a large number of experiments, we found that TPE performs far better than Random Search. `Detailed Description <./TpeTuner.rst>`__
 :raw-html:`<br>`
@@ -75,7 +75,7 @@ Random Search
   Built-in Tuner Name: **Random**
-Random search is suggested when each trial does not take very long (e.g., each trial can be completed very quickly, or early stopped by the assessor), and you have enough computational resources. It's also useful if you want to uniformly explore the search space. Random Search can be considered a baseline search algorithm. `Detailed Description <./HyperoptTuner.rst>`__
+Random search is suggested when each trial does not take very long (e.g., each trial can be completed very quickly, or early stopped by the assessor), and you have enough computational resources. It's also useful if you want to uniformly explore the search space. Random Search can be considered a baseline search algorithm. `Detailed Description <./RandomTuner.rst>`__
 :raw-html:`<br>`
@@ -144,8 +144,6 @@ Grid Search
   Built-in Tuner Name: **Grid Search**
-Note that the only acceptable types within the search space are ``choice``\ , ``quniform``\ , and ``randint``.
 Grid Search is suggested when the search space is small enough that it is feasible to sweep it exhaustively. `Detailed Description <./GridsearchTuner.rst>`__
 :raw-html:`<br>`

--- a/docs/en_US/Tuner/CustomizeAdvisor.rst
+++ b/docs/en_US/Tuner/CustomizeAdvisor.rst
 **How To** - Customize Your Own Advisor
-===========================================
+=======================================
 *Warning: API is subject to change in future releases.*

--- a/docs/en_US/Tuner/DngoTuner.rst
+++ b/docs/en_US/Tuner/DngoTuner.rst
 DNGO on NNI
 ===========
-1. Introduction
+Introduction
---------------
+------------
-2. Usage
+Usage
--------
+-----
 Installation
 ^^^^^^^^^^^^
@@ -25,6 +25,6 @@ Example Configuration
   # config.yml
   tuner:
-     builtinTunerName: DNGOTuner
+     name: DNGOTuner
     classArgs:
       optimize_mode: maximize
\ No newline at end of file
--- a/docs/en_US/Tuner/EvolutionTuner.rst
+++ b/docs/en_US/Tuner/EvolutionTuner.rst
@@ -2,13 +2,13 @@ Naive Evolution Tuners on NNI
 =============================
-1. Introduction
+Introduction
---------------
+------------
 Naive Evolution comes from `Large-Scale Evolution of Image Classifiers <https://arxiv.org/pdf/1703.01041.pdf>`__. It randomly initializes a population based on the search space. For each generation, it chooses the better individuals and mutates them (e.g., changes a hyperparameter, adds/removes one layer) to produce the next generation. Naive Evolution requires many trials to work, but it is very simple and easy to extend with new features.
-2. Usage
+Usage
--------
+-----
 classArgs Requirements
 ^^^^^^^^^^^^^^^^^^^^^^
@@ -26,7 +26,7 @@ Example Configuration
   # config.yml
   tuner:
-     builtinTunerName: Evolution
+     name: Evolution
     classArgs:
       optimize_mode: maximize
       population_size: 100
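
The generation loop described in the introduction can be sketched in plain Python. This is a toy illustration under stated assumptions, not NNI's Evolution tuner: the two-parameter search space and the objective are hypothetical, "choosing better ones" is modeled as keeping the better of two randomly drawn individuals (binary tournament), survivors are the best ``population_size`` individuals among parents and children (elitism), and mutation re-samples one hyperparameter. Only ``population_size`` mirrors the classArg above.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
Scored = Tuple[Config, float]

# Hypothetical search space: each hyperparameter is sampled uniformly from (low, high).
SPACE: Dict[str, Tuple[float, float]] = {"x": (-5.0, 5.0), "y": (-5.0, 5.0)}


def random_config(rng: random.Random) -> Config:
    """Sample one individual for the initial population."""
    return {name: rng.uniform(low, high) for name, (low, high) in SPACE.items()}


def mutate(parent: Config, rng: random.Random) -> Config:
    """Copy the parent and re-sample one randomly chosen hyperparameter."""
    child = dict(parent)
    name = rng.choice(sorted(SPACE))
    low, high = SPACE[name]
    child[name] = rng.uniform(low, high)
    return child


def naive_evolution(evaluate: Callable[[Config], float], population_size: int,
                    generations: int, seed: int = 0) -> Scored:
    """Maximize ``evaluate``; return the best (configuration, score) pair found."""
    rng = random.Random(seed)
    population: List[Scored] = []
    for _ in range(population_size):
        cfg = random_config(rng)
        population.append((cfg, evaluate(cfg)))
    for _ in range(generations):
        children: List[Scored] = []
        for _ in range(population_size):
            a, b = rng.sample(population, 2)   # draw two individuals at random
            parent = a if a[1] >= b[1] else b  # keep the better one
            child = mutate(parent[0], rng)
            children.append((child, evaluate(child)))
        # Survivors: the best population_size individuals among parents and children.
        population = sorted(population + children, key=lambda p: p[1],
                            reverse=True)[:population_size]
    return max(population, key=lambda p: p[1])


# Toy objective with its maximum (0.0) at x = y = 0.
best_cfg, best_score = naive_evolution(lambda c: -(c["x"] ** 2 + c["y"] ** 2),
                                       population_size=20, generations=30)
print(best_cfg, best_score)
```

Every child costs one trial, which is why the method needs many trials: here 20 initial individuals plus 20 children per generation.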

--- a/docs/en_US/Tuner/GPTuner.rst
+++ b/docs/en_US/Tuner/GPTuner.rst
 GP Tuner on NNI
 ===============
-1. Introduction
+Introduction
---------------
+------------
 Bayesian optimization works by constructing a posterior distribution of functions (a Gaussian Process) that best describes the function you want to optimize. As the number of observations grows, the posterior distribution improves, and the algorithm becomes more certain of which regions in parameter space are worth exploring and which are not.
@@ -12,8 +12,8 @@ Note that the only acceptable types within the search space are ``randint``\ , `
 This optimization approach is described in Section 3 of `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__.
-2. Usage
+Usage
--------
+-----
 classArgs requirements
 ^^^^^^^^^^^^^^^^^^^^^^
@@ -35,7 +35,7 @@ Example Configuration
   # config.yml
   tuner:
-     builtinTunerName: GPTuner
+     name: GPTuner
     classArgs:
       optimize_mode: maximize
       utility: 'ei'
@@ -45,4 +45,4 @@ Example Configuration
       alpha: 1e-6
       cold_start_num: 10
       selection_num_warm_up: 100000
       selection_num_starting_points: 250
\ No newline at end of file
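
The ``utility: 'ei'`` setting above selects expected improvement (EI) as the acquisition function. Below is a minimal sketch of the standard closed form for maximization, assuming the Gaussian Process posterior at a candidate point has mean ``mu`` and standard deviation ``sigma``. Treating ``xi`` as a required improvement margin follows common Bayesian-optimization libraries and is an assumption about how GP Tuner uses it; the candidate values are hypothetical.

```python
import math
from typing import Dict, Tuple


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """Closed-form EI for maximization under a Gaussian posterior N(mu, sigma^2).

    ``best`` is the best metric observed so far; ``xi`` >= 0 demands a margin of
    improvement, shifting the search toward exploration.
    """
    improvement = mu - best - xi
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))              # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return improvement * cdf + sigma * pdf


# Hypothetical candidates: (posterior mean, posterior std) of the metric.
candidates: Dict[str, Tuple[float, float]] = {"a": (0.80, 0.05), "b": (0.80, 0.20), "c": (0.70, 0.01)}
best_seen = 0.80
scores = {name: expected_improvement(mu, sd, best_seen) for name, (mu, sd) in candidates.items()}
print(max(scores, key=scores.get))  # b: same mean as "a" but more uncertainty
```

EI rewards both a high predicted mean and high uncertainty, which is how the posterior decides "which regions in parameter space are worth exploring".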
--- a/docs/en_US/Tuner/GridsearchTuner.rst
+++ b/docs/en_US/Tuner/GridsearchTuner.rst
@@ -4,21 +4,22 @@ Grid Search on NNI
 Grid Search
 -----------
-1. Introduction
+Introduction
---------------
+------------
-Grid Search performs an exhaustive search through a manually specified subset of the hyperparameter space defined in the searchspace file. 
+Grid Search performs an exhaustive search through a search space.
-Note that the only acceptable types within the search space are ``choice``\ , ``quniform``\ , and ``randint``.
+For uniformly and normally distributed parameters, the grid search tuner samples them at progressively decreasing intervals.
-2. Usage
+Usage
--------
+-----
+The grid search tuner takes no arguments.
 Example Configuration
 ^^^^^^^^^^^^^^^^^^^^^
 .. code-block:: yaml
-   # config.yml
   tuner:
-     builtinTunerName: GridSearch
+     name: GridSearch
\ No newline at end of file
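
One way to sample a continuous parameter at progressively decreasing intervals is to halve the grid spacing in each round and emit only the points that are new at that resolution. The sketch below is a hypothetical illustration of that idea, not NNI's code: the exact point order, whether endpoints are included, and how ``normal`` parameters are handled are all assumptions. Here a ``normal`` parameter reuses the same quantile grid through the inverse CDF (``statistics.NormalDist.inv_cdf``), so its points are densest around the mean.

```python
from fractions import Fraction
from statistics import NormalDist
from typing import Iterator, List


def unit_grid(max_depth: int) -> Iterator[Fraction]:
    """Yield points in (0, 1) at progressively halved intervals.

    Depth 1 yields 1/2, depth 2 adds 1/4 and 3/4, depth 3 adds 1/8, 3/8, 5/8, 7/8.
    Points already produced at a coarser depth are never repeated.
    """
    for depth in range(1, max_depth + 1):
        denominator = 2 ** depth
        for numerator in range(1, denominator, 2):  # only odd numerators are new
            yield Fraction(numerator, denominator)


def uniform_grid(low: float, high: float, max_depth: int) -> List[float]:
    """Map the unit grid linearly onto [low, high]."""
    return [low + float(q) * (high - low) for q in unit_grid(max_depth)]


def normal_grid(mu: float, sigma: float, max_depth: int) -> List[float]:
    """Map the unit grid through the inverse normal CDF."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(float(q)) for q in unit_grid(max_depth)]


print(uniform_grid(0.0, 8.0, 3))  # [4.0, 2.0, 6.0, 1.0, 3.0, 5.0, 7.0]
```

Because coarse points come first, stopping the experiment early still leaves an evenly spread, if coarse, sweep of the range.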
--- a/docs/en_US/Tuner/HyperbandAdvisor.rst
+++ b/docs/en_US/Tuner/HyperbandAdvisor.rst
 Hyperband on NNI
 ================
-1. Introduction
+Introduction
---------------
+------------
 `Hyperband <https://arxiv.org/pdf/1603.06560.pdf>`__ is a popular autoML algorithm. The basic idea of Hyperband is to create several buckets, each having ``n`` randomly generated hyperparameter configurations, each configuration using ``r`` resources (e.g., epoch number, batch number). After the ``n`` configurations are finished, it chooses the top ``n/eta`` configurations and runs them using increased ``r*eta`` resources. At last, it chooses the best configuration it has found so far.
-2. Implementation with full parallelism
+Implementation with full parallelism
---------------------------------------
+------------------------------------
 First, this is an example of how to write an autoML algorithm based on MsgDispatcherBase, rather than Tuner and Assessor. Hyperband is implemented in this way because it integrates the functions of both Tuner and Assessor, thus, we call it Advisor.
@@ -31,8 +31,8 @@ Or if you want to set ``exec_mode`` with ``serial`` according to the original al
 If you want to reproduce these results, refer to the example under ``examples/trials/benchmarking/`` for details.
-3. Usage
+Usage
--------
+-----
 Config file
 ^^^^^^^^^^^
@@ -138,8 +138,8 @@ Example Configuration
       R: 60
       eta: 3
-4. Future improvements
+Future improvements
----------------------
+-------------------
 The current implementation of Hyperband can be further improved by supporting a simple early stop algorithm since it's possible that not all the configurations in the top ``n/eta`` perform well. Any unpromising configurations should be stopped early.
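
The bucket arithmetic can be made concrete with a short sketch. It follows the bracket formulas from the Hyperband paper linked above (``s_max = floor(log_eta(R))`` and a budget of ``(s_max + 1) * R`` per bucket). NNI's advisor may round differently or schedule buckets in parallel, so treat this as an illustration of the idea rather than the advisor's exact behavior; ``hyperband_schedule`` is a hypothetical helper name.

```python
import math
from typing import List, Tuple


def hyperband_schedule(R: int, eta: int) -> List[List[Tuple[int, float]]]:
    """Return each bucket's successive-halving rounds as (n_i, r_i) pairs.

    n_i is the number of configurations run in round i, and r_i is the resource
    (e.g. epochs) given to each of them; only the top n_i // eta move on.
    """
    s_max = 0
    while eta ** (s_max + 1) <= R:  # integer floor(log_eta(R)), avoids float log error
        s_max += 1
    budget = (s_max + 1) * R        # resource budget of each bucket
    buckets: List[List[Tuple[int, float]]] = []
    for s in range(s_max, -1, -1):
        n = math.ceil(budget / R * eta ** s / (s + 1))  # configurations to start with
        r = R / eta ** s                                # resource per configuration
        buckets.append([(n // eta ** i, r * eta ** i) for i in range(s + 1)])
    return buckets


for bucket in hyperband_schedule(R=81, eta=3):
    print(bucket)
```

With ``R=81`` and ``eta=3``, the most aggressive bucket starts 81 configurations on 1 unit of resource each and keeps the top third in every round: ``(81, 1), (27, 3), (9, 9), (3, 27), (1, 81)``. The last bucket simply runs 5 configurations with the full resource ``R``.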

--- a/docs/en_US/Tuner/HyperoptTuner.rst
+++ b/docs/en_US/Tuner/HyperoptTuner.rst
-TPE, Random Search, Anneal Tuners on NNI
+Anneal Tuner
-========================================
+============
-TPE
+Introduction
---
-1. Introduction
---------------
-The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model. The TPE approach models P(x|y) and P(y) where x represents hyperparameters and y the associated evaluation matric. P(x|y) is modeled by transforming the generative process of hyperparameters, replacing the distributions of the configuration prior with non-parametric densities. This optimization approach is described in detail in `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__. 
-Parallel TPE optimization
-^^^^^^^^^^^^^^^^^^^^^^^^^
-TPE approaches were actually run asynchronously in order to make use of multiple compute nodes and to avoid wasting time waiting for trial evaluations to complete. The original algorithm design was optimized for sequential computation. If we were to use TPE with much concurrency, its performance will be bad. We have optimized this case using the Constant Liar algorithm. For these principles of optimization, please refer to our `research blog <../CommunitySharings/ParallelizingTpeSearch.rst>`__.
-2. Usage
--------
- To use TPE, you should add the following spec in your experiment's YAML config file:
-.. code-block:: yaml
-   tuner:
-     builtinTunerName: TPE
-     classArgs:
-       optimize_mode: maximize
-       parallel_optimize: True
-       constant_liar_type: min
-classArgs requirements
-^^^^^^^^^^^^^^^^^^^^^^
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', tuners will try to maximize metrics. If 'minimize', tuner will try to minimize metrics.
-* **parallel_optimize** (*bool, optional, default = False*\ ) - If True, TPE will use the Constant Liar algorithm to optimize parallel hyperparameter tuning. Otherwise, TPE will not discriminate between sequential or parallel situations.
-* **constant_liar_type** (*min or max or mean, optional, default = min*\ ) - The type of constant liar to use, will logically be determined on the basis of the values taken by y at X. There are three possible values, min{Y}, max{Y}, and mean{Y}.
-Note: We have optimized the parallelism of TPE for large-scale trial concurrency. For the principle of optimization or turn-on optimization, please refer to `TPE document <./HyperoptTuner.rst>`__.
-Example Configuration
-^^^^^^^^^^^^^^^^^^^^^
-.. code-block:: yaml
-   # config.yml
-   tuner:
-     builtinTunerName: TPE
-     classArgs:
-       optimize_mode: maximize
-RandomSearch
 ------------
-1. Introduction
---------------
-In `Random Search for Hyper-Parameter Optimization <http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>`__ we show that Random Search might be surprisingly effective despite its simplicity. We suggest using Random Search as a baseline when no knowledge about the prior distribution of hyper-parameters is available.
-2. Usage
--------
-Example Configuration
-.. code-block:: yaml
-   # config.yml
-   tuner:
-     builtinTunerName: Random
-Anneal on NNI
-------------
-1. Introduction
---------------
 This simple annealing algorithm begins by sampling from the prior but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
-2. Usage
+Usage
--------
+-----
 classArgs Requirements
 ^^^^^^^^^^^^^^^^^^^^^^
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
 Example Configuration
 ^^^^^^^^^^^^^^^^^^^^^
@@ -92,6 +21,6 @@ Example Configuration
   # config.yml
   tuner:
-     builtinTunerName: Anneal
+     name: Anneal
     classArgs:
       optimize_mode: maximize
\ No newline at end of file
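
The behavior described in the introduction (start from the prior, then concentrate sampling around the best point seen, at a fixed rather than adaptive rate) can be sketched for a single ``uniform`` parameter. This toy version is an assumption-laden illustration, not the Anneal tuner's code: it minimizes a loss, always centers on the single best point, and shrinks the sampling window as ``1 / (1 + t * shrink_coef)``, where ``shrink_coef`` is a hypothetical parameter.

```python
import random
from typing import Callable, List, Tuple


def anneal_search(objective: Callable[[float], float], low: float, high: float,
                  n_trials: int, shrink_coef: float = 0.1,
                  seed: int = 0) -> Tuple[float, float]:
    """Minimize ``objective`` over [low, high]; return the best (x, loss) pair seen."""
    rng = random.Random(seed)
    history: List[Tuple[float, float]] = []
    for t in range(n_trials):
        if not history:
            x = rng.uniform(low, high)  # first trial: sample from the (uniform) prior
        else:
            best_x, _ = min(history, key=lambda h: h[1])
            # Non-adaptive schedule: the window depends only on the trial count t.
            width = (high - low) / (1.0 + t * shrink_coef)
            x = rng.uniform(max(low, best_x - width / 2), min(high, best_x + width / 2))
        history.append((x, objective(x)))
    return min(history, key=lambda h: h[1])


best_x, best_loss = anneal_search(lambda x: (x - 2.0) ** 2, low=-10.0, high=10.0, n_trials=200)
print(best_x, best_loss)
```

The fixed schedule is what makes this "simple": it relies on the response surface being smooth near the best point instead of adapting the window to the observed losses.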
--- a/docs/en_US/Tuner/MetisTuner.rst
+++ b/docs/en_US/Tuner/MetisTuner.rst
 Metis Tuner on NNI
 ==================
-1. Introduction
+Introduction
---------------
+------------
 `Metis <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__ offers several benefits over other tuning algorithms. While most tools only predict the optimal configuration, Metis gives you two outputs: a prediction for the optimal configuration and a suggestion for the next trial. No more guesswork!
@@ -23,8 +23,8 @@ Note that the only acceptable types within the search space are ``quniform``\ ,
 More details can be found in our `paper <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__.
-2. Usage
+Usage
--------
+-----
 classArgs requirements
 ^^^^^^^^^^^^^^^^^^^^^^
@@ -38,6 +38,6 @@ Example Configuration
   # config.yml
   tuner:
-     builtinTunerName: MetisTuner
+     name: MetisTuner
     classArgs:
       optimize_mode: maximize
\ No newline at end of file
--- a/docs/en_US/Tuner/NetworkmorphismTuner.rst
+++ b/docs/en_US/Tuner/NetworkmorphismTuner.rst
 Network Morphism Tuner on NNI
 =============================
-1. Introduction
+Introduction
---------------
+------------
 `Autokeras <https://arxiv.org/abs/1806.10282>`__ is a popular autoML tool using Network Morphism. The basic idea of Autokeras is to use Bayesian Regression to estimate the metric of a neural network architecture. Each time, it generates several child networks from parent networks. Then it uses a naïve Bayesian regression to estimate each child's metric value from the history of trained network and metric value pairs. Next, it chooses the child with the best estimated performance and adds it to the training queue. Inspired by the work of Autokeras and referring to its `code <https://github.com/jhfjhfj1/autokeras>`__\ , we implemented our Network Morphism method on the NNI platform.
 If you want to know more about network morphism trial usage, please see the :githublink:`Readme.md <examples/trials/network_morphism/README.rst>`.
-2. Usage
+Usage
--------
+-----
 Installation
 ^^^^^^^^^^^^
@@ -36,7 +36,7 @@ To use Network Morphism, you should modify the following spec in your ``config.y
   tuner:
     #choice: NetworkMorphism
-     builtinTunerName: NetworkMorphism
+     name: NetworkMorphism
     classArgs:
       #choice: maximize, minimize
       optimize_mode: maximize
@@ -134,8 +134,8 @@ If you want to save and load the **best model**\ , the following methods are rec
   model_id = "" # id of the model you want to reuse
   loaded_model = torch.load("model-{}.pt".format(model_id))
-3. File Structure
+File Structure
-----------------
+--------------
 The tuner has many files, functions, and classes. Here, we give a brief introduction to the most important files:
@@ -164,8 +164,8 @@ The tuner has a lot of different files, functions, and classes. Here, we will gi
 * ``metric.py`` some metric classes including Accuracy and MSE.
 * ``utils.py`` is the example search network architectures for the ``cifar10`` dataset, using Keras.
-4. The Network Representation Json Example
+The Network Representation Json Example
------------------------------------------
+---------------------------------------
 Here is an example of the intermediate representation JSON file we defined, which is passed from the tuner to the trial in the architecture search procedure. Users can call the "json_to_graph()" function in the trial code to build a PyTorch or Keras model from this JSON file.
@@ -293,7 +293,7 @@ You can consider the model to be a `directed acyclic graph <https://en.wikipedia
  * 
    For else layers, the numbering follows the format: its node input id (or id list) and node output id.
-5. TODO
+TODO
-------
+----
 As a next step, we will change the API from a fixed network generator to a network generator with more available operators. In the future, we will also use ONNX instead of JSON as the intermediate representation spec.
--- a/docs/en_US/Tuner/PBTTuner.rst
+++ b/docs/en_US/Tuner/PBTTuner.rst
 PBT Tuner on NNI
 ================
-1. Introduction
+Introduction
---------------
+------------
 Population Based Training (PBT) comes from `Population Based Training of Neural Networks <https://arxiv.org/abs/1711.09846v1>`__. It's a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. 
@@ -15,8 +15,8 @@ Population Based Training (PBT) comes from `Population Based Training of Neural
 PBTTuner initializes a population with several trials (i.e., ``population_size``\ ). The figure above shows four steps; each trial runs only one step. The length of a step is controlled by the trial code, e.g., one epoch. When a trial starts, it loads a checkpoint specified by PBTTuner, runs one step, saves a checkpoint to a directory specified by PBTTuner, and exits. The trials in a population run steps synchronously: the ``(i+1)``\ -th step starts only after all trials have finished the ``i``\ -th step. Exploitation and exploration of PBT are executed between two consecutive steps.
-2. Usage
+Usage
--------
+-----
 Provide checkpoint directory
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -62,7 +62,7 @@ Below is an exmaple of PBTTuner configuration in experiment config file. **Note
   # config.yml
   tuner:
-     builtinTunerName: PBTTuner
+     name: PBTTuner
     classArgs:
       optimize_mode: maximize
       all_checkpoint_dir: /the/path/to/store/checkpoints
@@ -77,4 +77,4 @@ Example Configuration
   tuner:
     builtinTunerName: PBTTuner
     classArgs:
       optimize_mode: maximize
\ No newline at end of file
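
The exploitation and exploration that happen between two consecutive steps can be sketched as follows. This is a minimal illustration under assumptions, not PBTTuner's code: it uses truncation selection (the bottom ``fraction`` of trials copies the checkpoint and hyperparameters of a random top trial) and perturbs each copied hyperparameter by a multiplicative ``factor``. ``Member``, ``fraction``, and ``factor`` are hypothetical names for this sketch.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Member:
    """One trial in the population (hypothetical structure for this sketch)."""
    hparams: Dict[str, float]
    checkpoint: str     # checkpoint the next step should load
    score: float = 0.0  # metric reported after the last step (higher is better)


def exploit_and_explore(population: List[Member], fraction: float = 0.2,
                        factor: float = 0.2, seed: int = 0) -> List[Member]:
    """Build the population for the next step from the results of the current one."""
    rng = random.Random(seed)
    ranked = sorted(population, key=lambda m: m.score, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top = ranked[:k]
    survivors = ranked[:-k]  # everyone except the bottom k keeps its own state
    replaced: List[Member] = []
    for _ in range(k):
        donor = rng.choice(top)  # exploit: copy a top trial's checkpoint and hyperparameters
        hparams = {name: value * rng.choice([1.0 - factor, 1.0 + factor])  # explore: perturb
                   for name, value in donor.hparams.items()}
        replaced.append(Member(hparams, checkpoint=donor.checkpoint))
    return survivors + replaced


population = [Member({"lr": 0.1}, f"ckpt/{i}", score=float(i)) for i in range(5)]
next_population = exploit_and_explore(population)
print([(m.checkpoint, m.hparams) for m in next_population])
```

Copying a checkpoint rather than restarting from scratch is what lets PBT discover a schedule of hyperparameters over the course of training.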
--- a/docs/en_US/Tuner/RandomTuner.rst
+++ b/docs/en_US/Tuner/RandomTuner.rst
+Random Tuner
+============
+Introduction
+------------
+`Random Search for Hyper-Parameter Optimization <http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>`__ shows that Random Search can be surprisingly effective despite its simplicity.
+We suggest using Random Search as a baseline when no knowledge about the prior distribution of hyper-parameters is available.
+Usage
+-----
+Example Configuration
+^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: yaml
+   tuner:
+     name: Random
+     classArgs:
+       seed: 100  # optional
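
Random search draws every parameter independently from the distribution declared in the search space. The sketch below covers four search-space types under the assumption that ``randint`` excludes its upper bound; the search space and helper names are hypothetical, and this is not NNI's RandomTuner. It also illustrates why the optional ``seed`` is useful: two runs with the same seed propose identical trials.

```python
import math
import random
from typing import Any, Dict, List

SearchSpace = Dict[str, Dict[str, Any]]


def sample_config(space: SearchSpace, rng: random.Random) -> Dict[str, Any]:
    """Draw one configuration, sampling every parameter independently."""
    config: Dict[str, Any] = {}
    for name, spec in space.items():
        kind, args = spec["_type"], spec["_value"]
        if kind == "choice":
            config[name] = rng.choice(args)
        elif kind == "randint":
            config[name] = rng.randrange(args[0], args[1])  # assumed: upper bound excluded
        elif kind == "uniform":
            config[name] = rng.uniform(args[0], args[1])
        elif kind == "loguniform":
            # Uniform in log space, so each order of magnitude is equally likely.
            config[name] = math.exp(rng.uniform(math.log(args[0]), math.log(args[1])))
        else:
            raise ValueError(f"type {kind!r} is not handled in this sketch")
    return config


def random_search(space: SearchSpace, n_trials: int, seed: int) -> List[Dict[str, Any]]:
    rng = random.Random(seed)  # one generator per experiment, seeded once
    return [sample_config(space, rng) for _ in range(n_trials)]


# Hypothetical search space for illustration.
SPACE: SearchSpace = {
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]},
    "lr": {"_type": "loguniform", "_value": [1e-4, 1e-1]},
    "num_layers": {"_type": "randint", "_value": [1, 5]},
}
trials = random_search(SPACE, n_trials=5, seed=100)
print(trials)
```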
--- a/docs/en_US/Tuner/SmacTuner.rst
+++ b/docs/en_US/Tuner/SmacTuner.rst
@@ -2,16 +2,16 @@ SMAC Tuner on NNI
 =================
-1. Introduction
+Introduction
---------------
+------------
 `SMAC <https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf>`__ is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO in order to handle categorical parameters. The SMAC supported by nni is a wrapper on `the SMAC3 github repo <https://github.com/automl/SMAC3>`__.
 Note that SMAC on nni only supports a subset of the types in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__\ : ``choice``\ , ``randint``\ , ``uniform``\ , ``loguniform``\ , and ``quniform``.
-2. Usage
+Usage
--------
+-----
 Installation
 ^^^^^^^^^^^^
@@ -35,6 +35,6 @@ Example Configuration
   # config.yml
   tuner:
-     builtinTunerName: SMAC
+     name: SMAC
     classArgs:
       optimize_mode: maximize
\ No newline at end of file
--- a/docs/en_US/Tuner/TpeTuner.rst
+++ b/docs/en_US/Tuner/TpeTuner.rst
+TPE Tuner
+=========
+Introduction
+------------
+The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach.
+SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements,
+and then subsequently choose new hyperparameters to test based on this model.
+The TPE approach models P(x|y) and P(y), where x represents hyperparameters and y the associated evaluation metric.
+P(x|y) is modeled by transforming the generative process of hyperparameters,
+replacing the distributions of the configuration prior with non-parametric densities.
+This optimization approach is described in detail in `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__.
+Parallel TPE optimization
+^^^^^^^^^^^^^^^^^^^^^^^^^
+In practice, TPE is run asynchronously in order to make use of multiple compute nodes and to avoid wasting time waiting for trial evaluations to complete.
+However, the original algorithm was designed for sequential computation.
+If TPE is used with high concurrency, its performance degrades.
+We have optimized this case using the Constant Liar algorithm.
+For these principles of optimization, please refer to our `research blog <../CommunitySharings/ParallelizingTpeSearch.rst>`__.
+Usage
+-----
+To use TPE, you should add the following spec in your experiment's YAML config file:
+.. code-block:: yaml
+    ## minimal config ##
+    tuner:
+      name: TPE
+      classArgs:
+        optimize_mode: minimize
+.. code-block:: yaml
+    ## advanced config ##
+    tuner:
+      name: TPE
+      classArgs:
+        optimize_mode: maximize
+        seed: 12345
+        tpe_args:
+          constant_liar_type: 'mean'
+          n_startup_jobs: 10
+          n_ei_candidates: 20
+          linear_forgetting: 100
+          prior_weight: 0
+          gamma: 0.5
+classArgs
+^^^^^^^^^
+.. list-table::
+    :widths: 10 20 10 60
+    :header-rows: 1
+    * - Field
+      - Type
+      - Default
+      - Description
+    * - ``optimize_mode``
+      - ``'minimize' | 'maximize'``
+      - ``'minimize'``
+      - Whether to minimize or maximize trial metrics.
+    * - ``seed``
+      - ``int | null``
+      - ``null``
+      - The random seed.
+    * - ``tpe_args.constant_liar_type``
+      - ``'best' | 'worst' | 'mean' | null``
+      - ``'best'``
+      - The TPE algorithm itself does not support parallel tuning. This parameter specifies how to optimize when ``trial_concurrency`` > 1. How each liar works is explained in Section 6.1 of the paper.
+        In general, ``best`` suits a small number of trials and ``worst`` suits a large number of trials.
+    * - ``tpe_args.n_startup_jobs``
+      - ``int``
+      - ``20``
+      - The first N hyperparameter sets are generated fully randomly for warming up.
+        If the search space is large, you can increase this value; if max_trial_number is small, you may want to decrease it.
+    * - ``tpe_args.n_ei_candidates``
+      - ``int``
+      - ``24``
+      - Loosely speaking, in each iteration TPE samples N candidate sets of parameters and chooses the one with the best EI.
+    * - ``tpe_args.linear_forgetting``
+      - ``int``
+      - ``25``
+      - TPE lowers the weights of old trials. This controls how many iterations it takes before a trial's weight starts to decay.
+    * - ``tpe_args.prior_weight``
+      - ``float``
+      - ``1.0``
+      - TPE treats user provided search space as prior.
+        When generating new trials, it also incorporates the prior in trial history by transforming the search space to
+        one trial configuration (i.e., each parameter of this configuration chooses the mean of its candidate range).
+        Here, prior_weight determines the weight of this trial configuration in the history trial configurations.
+        With prior weight 1.0, the search space is treated as one good trial.
+        For example, "normal(0, 1)" is effectively equivalent to a trial with x = 0 that has yielded a good result.
+    * - ``tpe_args.gamma``
+      - ``float``
+      - ``0.25``
+      - Controls how many trials are considered "good".
+        The number is calculated as "min(gamma * sqrt(N), linear_forgetting)".
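
The ``gamma`` and ``constant_liar_type`` arguments can be illustrated with a small sketch. Under assumptions, it shows (1) how running trials receive a fake "liar" loss so parallel suggestions account for them, and (2) how observed trials are split into a "good" group of ``min(gamma * sqrt(N), linear_forgetting)`` members, rounded up here (the rounding is an assumption). TPE then fits one density l(x) to the good group and one g(x) to the rest and favors candidates with a high l(x)/g(x); that fitting step is omitted. Losses are minimized, matching ``optimize_mode: minimize``; the helper names are hypothetical and this is not NNI's TpeTuner code.

```python
import math
from typing import List, Optional, Tuple

Trial = Tuple[float, Optional[float]]  # (parameter value, loss or None while still running)


def apply_constant_liar(trials: List[Trial], liar_type: str = "best") -> List[Tuple[float, float]]:
    """Give running trials a fake loss; with no finished trials there is nothing to lie with."""
    finished = [loss for _, loss in trials if loss is not None]
    if not finished:
        return []
    if liar_type == "best":
        lie = min(finished)  # losses are minimized, so the best is the smallest
    elif liar_type == "worst":
        lie = max(finished)
    elif liar_type == "mean":
        lie = sum(finished) / len(finished)
    else:
        raise ValueError(f"unknown liar type {liar_type!r}")
    return [(x, lie if loss is None else loss) for x, loss in trials]


def split_good_bad(history: List[Tuple[float, float]], gamma: float = 0.25,
                   linear_forgetting: int = 25) -> Tuple[List[float], List[float]]:
    """Split parameter values into the "good" group (lowest losses) and the rest."""
    n_good = min(math.ceil(gamma * math.sqrt(len(history))), linear_forgetting)
    ranked = sorted(history, key=lambda t: t[1])  # stable sort keeps ties in order
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    return good, bad


trials = [(0.1, 0.9), (0.5, 0.2), (0.7, 0.4), (0.3, None), (0.9, 0.6)]
good, bad = split_good_bad(apply_constant_liar(trials, "worst"))
print(good, bad)  # [0.5] [0.7, 0.9, 0.1, 0.3]
```

With only five observations, ``gamma * sqrt(N)`` rounds up to a single "good" trial, so early on the good density rests on very few points; this is one reason for the random warm-up phase (``n_startup_jobs``).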
--- a/docs/en_US/autotune_ref.rst
+++ b/docs/en_US/autotune_ref.rst
@@ -20,6 +20,12 @@ Tuner
 ..  autoclass:: nni.tuner.Tuner
    :members:
+..  autoclass:: nni.algorithms.hpo.tpe_tuner.TpeTuner
+    :members:
+..  autoclass:: nni.algorithms.hpo.random_tuner.RandomTuner
+    :members:
 ..  autoclass:: nni.algorithms.hpo.hyperopt_tuner.HyperoptTuner
    :members:

--- a/docs/en_US/builtin_tuner.rst
+++ b/docs/en_US/builtin_tuner.rst
@@ -10,7 +10,9 @@ Tuner receives metrics from `Trial` to evaluate the performance of a specific pa
    :maxdepth: 1
    Overview <Tuner/BuiltinTuner>
-    TPE / Random Search / Anneal <Tuner/HyperoptTuner>
+    TPE <Tuner/TpeTuner>
+    Random Search <Tuner/RandomTuner>
+    Anneal <Tuner/HyperoptTuner>
    Naive Evolution <Tuner/EvolutionTuner>
    SMAC <Tuner/SmacTuner>
    Metis Tuner <Tuner/MetisTuner>