Update tuner doc style (#4351)

Commit cf7032a5 authored Dec 15, 2021 by liuzhe-lz, committed by GitHub Dec 15, 2021
20 changed files
--- a/docs/en_US/Tuner/HyperoptTuner.rst
+++ b/docs/en_US/Tuner/HyperoptTuner.rst
 Anneal Tuner
 ============
-Introduction
------------
 This simple annealing algorithm begins by sampling from the prior but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
 Usage

--- a/docs/en_US/Tuner/BatchTuner.rst
+++ b/docs/en_US/Tuner/BatchTuner.rst
-Batch Tuner on NNI
+Batch Tuner
-==================
+===========
-Introduction
------------
 Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type ``choice`` in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__.
-Suggested scenario: If the configurations you want to try have been decided, you can list them in the SearchSpace file (using ``choice``\ ) and run them using the batch tuner.
+Suggested scenario: If the configurations you want to try have been decided, you can list them in the SearchSpace file (using ``choice``) and run them using the batch tuner.
 Usage
 -----
@@ -20,8 +17,6 @@ Example Configuration
   tuner:
     name: BatchTuner
-:raw-html:`<br>`
 Note that the search space for BatchTuner should look like:
 .. code-block:: json

--- a/docs/en_US/Tuner/BohbAdvisor.rst
+++ b/docs/en_US/Tuner/BohbAdvisor.rst
-BOHB Advisor on NNI
+BOHB Advisor
-===================
+============
-Introduction
------------
 BOHB is a robust and efficient hyperparameter tuning algorithm mentioned in `this reference paper <https://arxiv.org/abs/1807.01774>`__. BO is an abbreviation for "Bayesian Optimization" and HB is an abbreviation for "Hyperband".
@@ -81,16 +78,16 @@ BOHB advisor requires the `ConfigSpace <https://github.com/automl/ConfigSpace>`_
 classArgs Requirements
 ^^^^^^^^^^^^^^^^^^^^^^
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', tuners will try to maximize metrics. If 'minimize', tuner will try to minimize metrics.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will try to maximize metrics. If 'minimize', tuner will try to minimize metrics.
-* **min_budget** (*int, optional, default = 1*\ ) - The smallest budget to assign to a trial job, (budget can be the number of mini-batches or epochs). Needs to be positive.
+* **min_budget** (*int, optional, default = 1*) - The smallest budget to assign to a trial job, (budget can be the number of mini-batches or epochs). Needs to be positive.
-* **max_budget** (*int, optional, default = 3*\ ) - The largest budget to assign to a trial job, (budget can be the number of mini-batches or epochs). Needs to be larger than min_budget.
+* **max_budget** (*int, optional, default = 3*) - The largest budget to assign to a trial job, (budget can be the number of mini-batches or epochs). Needs to be larger than min_budget.
-* **eta** (*int, optional, default = 3*\ ) - In each iteration, a complete run of sequential halving is executed. In it, after evaluating each configuration on the same subset size, only a fraction of 1/eta of them 'advances' to the next round. Must be greater or equal to 2.
+* **eta** (*int, optional, default = 3*) - In each iteration, a complete run of sequential halving is executed. In it, after evaluating each configuration on the same subset size, only a fraction of 1/eta of them 'advances' to the next round. Must be greater or equal to 2.
-* **min_points_in_model**\ (*int, optional, default = None*\ ): number of observations to start building a KDE. Default 'None' means dim+1; when the number of completed trials in this budget is equal to or larger than ``max{dim+1, min_points_in_model}``\ , BOHB will start to build a KDE model of this budget then use said KDE model to guide configuration selection. Needs to be positive. (dim means the number of hyperparameters in search space)
+* **min_points_in_model** (*int, optional, default = None*): number of observations to start building a KDE. Default 'None' means dim+1; when the number of completed trials in this budget is equal to or larger than ``max{dim+1, min_points_in_model}``, BOHB will start to build a KDE model of this budget then use said KDE model to guide configuration selection. Needs to be positive. (dim means the number of hyperparameters in search space)
-* **top_n_percent**\ (*int, optional, default = 15*\ ): percentage (between 1 and 99) of the observations which are considered good. Good points and bad points are used for building KDE models. For example, if you have 100 observed trials and top_n_percent is 15, then the top 15% of points will be used for building the good points models "l(x)". The remaining 85% of points will be used for building the bad point models "g(x)".
+* **top_n_percent** (*int, optional, default = 15*): percentage (between 1 and 99) of the observations which are considered good. Good points and bad points are used for building KDE models. For example, if you have 100 observed trials and top_n_percent is 15, then the top 15% of points will be used for building the good points models "l(x)". The remaining 85% of points will be used for building the bad point models "g(x)".
-* **num_samples**\ (*int, optional, default = 64*\ ): number of samples to optimize EI (default 64). In this case, we will sample "num_samples" points and compare the result of l(x)/g(x). Then we will return the one with the maximum l(x)/g(x) value as the next configuration if the optimize_mode is ``maximize``. Otherwise, we return the smallest one.
+* **num_samples** (*int, optional, default = 64*): number of samples to optimize EI (default 64). In this case, we will sample "num_samples" points and compare the result of l(x)/g(x). Then we will return the one with the maximum l(x)/g(x) value as the next configuration if the optimize_mode is ``maximize``. Otherwise, we return the smallest one.
-* **random_fraction**\ (*float, optional, default = 0.33*\ ): fraction of purely random configurations that are sampled from the prior without the model.
+* **random_fraction** (*float, optional, default = 0.33*): fraction of purely random configurations that are sampled from the prior without the model.
-* **bandwidth_factor**\ (*float, optional, default = 3.0*\ ): to encourage diversity, the points proposed to optimize EI are sampled from a 'widened' KDE where the bandwidth is multiplied by this factor. We suggest using the default value if you are not familiar with KDE.
+* **bandwidth_factor** (*float, optional, default = 3.0*): to encourage diversity, the points proposed to optimize EI are sampled from a 'widened' KDE where the bandwidth is multiplied by this factor. We suggest using the default value if you are not familiar with KDE.
-* **min_bandwidth**\ (*float, optional, default = 0.001*\ ): to keep diversity, even when all (good) samples have the same value for one of the parameters, a minimum bandwidth (default: 1e-3) is used instead of zero. We suggest using the default value if you are not familiar with KDE.
+* **min_bandwidth** (*float, optional, default = 0.001*): to keep diversity, even when all (good) samples have the same value for one of the parameters, a minimum bandwidth (default: 1e-3) is used instead of zero. We suggest using the default value if you are not familiar with KDE.
 *Please note that the float type currently only supports decimal representations. You have to use 0.333 instead of 1/3 and 0.001 instead of 1e-3.*
@@ -119,16 +116,16 @@ To use BOHB, you should add the following spec in your experiment's YAML config
 **classArgs Requirements:**
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', tuners will try to maximize metrics. If 'minimize', tuner will try to minimize metrics.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will try to maximize metrics. If 'minimize', tuner will try to minimize metrics.
-* **min_budget** (*int, optional, default = 1*\ ) - The smallest budget to assign to a trial job, (budget can be the number of mini-batches or epochs). Needs to be positive.
+* **min_budget** (*int, optional, default = 1*) - The smallest budget to assign to a trial job, (budget can be the number of mini-batches or epochs). Needs to be positive.
-* **max_budget** (*int, optional, default = 3*\ ) - The largest budget to assign to a trial job, (budget can be the number of mini-batches or epochs). Needs to be larger than min_budget.
+* **max_budget** (*int, optional, default = 3*) - The largest budget to assign to a trial job, (budget can be the number of mini-batches or epochs). Needs to be larger than min_budget.
-* **eta** (*int, optional, default = 3*\ ) - In each iteration, a complete run of sequential halving is executed. In it, after evaluating each configuration on the same subset size, only a fraction of 1/eta of them 'advances' to the next round. Must be greater or equal to 2.
+* **eta** (*int, optional, default = 3*) - In each iteration, a complete run of sequential halving is executed. In it, after evaluating each configuration on the same subset size, only a fraction of 1/eta of them 'advances' to the next round. Must be greater or equal to 2.
-* **min_points_in_model**\ (*int, optional, default = None*\ ): number of observations to start building a KDE. Default 'None' means dim+1; when the number of completed trials in this budget is equal to or larger than ``max{dim+1, min_points_in_model}``\ , BOHB will start to build a KDE model of this budget then use said KDE model to guide configuration selection. Needs to be positive. (dim means the number of hyperparameters in search space)
+* **min_points_in_model** (*int, optional, default = None*): number of observations to start building a KDE. Default 'None' means dim+1; when the number of completed trials in this budget is equal to or larger than ``max{dim+1, min_points_in_model}``, BOHB will start to build a KDE model of this budget then use said KDE model to guide configuration selection. Needs to be positive. (dim means the number of hyperparameters in search space)
-* **top_n_percent**\ (*int, optional, default = 15*\ ): percentage (between 1 and 99) of the observations which are considered good. Good points and bad points are used for building KDE models. For example, if you have 100 observed trials and top_n_percent is 15, then the top 15% of points will be used for building the good points models "l(x)". The remaining 85% of points will be used for building the bad point models "g(x)".
+* **top_n_percent** (*int, optional, default = 15*): percentage (between 1 and 99) of the observations which are considered good. Good points and bad points are used for building KDE models. For example, if you have 100 observed trials and top_n_percent is 15, then the top 15% of points will be used for building the good points models "l(x)". The remaining 85% of points will be used for building the bad point models "g(x)".
-* **num_samples**\ (*int, optional, default = 64*\ ): number of samples to optimize EI (default 64). In this case, we will sample "num_samples" points and compare the result of l(x)/g(x). Then we will return the one with the maximum l(x)/g(x) value as the next configuration if the optimize_mode is ``maximize``. Otherwise, we return the smallest one.
+* **num_samples** (*int, optional, default = 64*): number of samples to optimize EI (default 64). In this case, we will sample "num_samples" points and compare the result of l(x)/g(x). Then we will return the one with the maximum l(x)/g(x) value as the next configuration if the optimize_mode is ``maximize``. Otherwise, we return the smallest one.
-* **random_fraction**\ (*float, optional, default = 0.33*\ ): fraction of purely random configurations that are sampled from the prior without the model.
+* **random_fraction** (*float, optional, default = 0.33*): fraction of purely random configurations that are sampled from the prior without the model.
-* **bandwidth_factor**\ (*float, optional, default = 3.0*\ ): to encourage diversity, the points proposed to optimize EI are sampled from a 'widened' KDE where the bandwidth is multiplied by this factor. We suggest using the default value if you are not familiar with KDE.
+* **bandwidth_factor** (*float, optional, default = 3.0*): to encourage diversity, the points proposed to optimize EI are sampled from a 'widened' KDE where the bandwidth is multiplied by this factor. We suggest using the default value if you are not familiar with KDE.
-* **min_bandwidth**\ (*float, optional, default = 0.001*\ ): to keep diversity, even when all (good) samples have the same value for one of the parameters, a minimum bandwidth (default: 1e-3) is used instead of zero. We suggest using the default value if you are not familiar with KDE.
+* **min_bandwidth** (*float, optional, default = 0.001*): to keep diversity, even when all (good) samples have the same value for one of the parameters, a minimum bandwidth (default: 1e-3) is used instead of zero. We suggest using the default value if you are not familiar with KDE.
 * **config_space** (*str, optional*): directly use a .pcs file serialized by `ConfigSpace <https://automl.github.io/ConfigSpace/>`__ in "pcs new" format. In this case, the search space file (if provided in the config) will be ignored. Note that this path must be absolute; relative paths are currently not supported.
 *Please note that the float type currently only supports decimal representations. You have to use 0.333 instead of 1/3 and 0.001 instead of 1e-3.*
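The constraints in the list above can be checked mechanically. The following is a minimal sketch, not NNI's own validation code: the helper name ``check_bohb_class_args`` and its return shape are hypothetical, and only the defaults and rules stated in the list are encoded.

```python
def check_bohb_class_args(args, dim):
    """Apply the documented defaults and check the documented constraints.

    `args` holds user-supplied classArgs; `dim` is the number of
    hyperparameters in the search space. Returns the completed arguments
    and the number of finished trials needed before a KDE model is built.
    (Hypothetical helper, for illustration only.)
    """
    defaults = {
        "optimize_mode": "maximize", "min_budget": 1, "max_budget": 3,
        "eta": 3, "min_points_in_model": None, "top_n_percent": 15,
        "num_samples": 64, "random_fraction": 0.33,
        "bandwidth_factor": 3.0, "min_bandwidth": 0.001,
    }
    cfg = {**defaults, **args}

    if cfg["optimize_mode"] not in ("maximize", "minimize"):
        raise ValueError("optimize_mode must be 'maximize' or 'minimize'")
    if cfg["min_budget"] <= 0:
        raise ValueError("min_budget needs to be positive")
    if cfg["max_budget"] <= cfg["min_budget"]:
        raise ValueError("max_budget needs to be larger than min_budget")
    if cfg["eta"] < 2:
        raise ValueError("eta must be greater than or equal to 2")
    if not 1 <= cfg["top_n_percent"] <= 99:
        raise ValueError("top_n_percent must be between 1 and 99")

    min_points = cfg["min_points_in_model"]
    if min_points is None:
        min_points = dim + 1              # documented meaning of 'None'
    elif min_points <= 0:
        raise ValueError("min_points_in_model needs to be positive")

    # A KDE model is built once this many trials finished in a budget.
    kde_threshold = max(dim + 1, min_points)
    return cfg, kde_threshold


cfg, kde_threshold = check_bohb_class_args({"max_budget": 27}, dim=4)
print(cfg["eta"], kde_threshold)
```

Running such a check before launching an experiment surfaces an invalid combination (for example ``max_budget`` not larger than ``min_budget``) early, instead of partway through a run.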

--- a/docs/en_US/Tuner/BuiltinTuner.rst
+++ b/docs/en_US/Tuner/BuiltinTuner.rst
--- a/docs/en_US/Tuner/CustomizeAdvisor.rst
+++ b/docs/en_US/Tuner/CustomizeAdvisor.rst
@@ -36,7 +36,7 @@ Similar to tuner and assessor. NNI needs to locate your customized Advisor class
     classArgs:
       arg1: value1
-**Note that** The working directory of your advisor is ``<home>/nni-experiments/<experiment_id>/log``\ , which can be retrieved with environment variable ``NNI_LOG_DIRECTORY``.
+**Note that** The working directory of your advisor is ``<home>/nni-experiments/<experiment_id>/log``, which can be retrieved with environment variable ``NNI_LOG_DIRECTORY``.
 Example
 -------

--- a/docs/en_US/Tuner/CustomizeTuner.rst
+++ b/docs/en_US/Tuner/CustomizeTuner.rst
 Customize-Tuner
 ===============
-Customize Tuner
---------------
 NNI provides state-of-the-art tuning algorithms as builtin tuners. It also lets you build your own tuner to meet specific tuning needs.
 If you want to implement your own tuning algorithm, you can implement a customized Tuner. There are three things to do:
@@ -81,7 +78,7 @@ If the you implement the ``generate_parameters`` like this:
       # your code implements here.
       return {"dropout": 0.3, "learning_rate": 0.4}
-It means your Tuner will always generate parameters ``{"dropout": 0.3, "learning_rate": 0.4}``. Then Trial will receive ``{"dropout": 0.3, "learning_rate": 0.4}`` by calling API ``nni.get_next_parameter()``. Once the trial ends with a result (normally some kind of metrics), it can send the result to Tuner by calling API ``nni.report_final_result()``\ , for example ``nni.report_final_result(0.93)``. Then your Tuner's ``receive_trial_result`` function will receied the result like：
+It means your Tuner will always generate parameters ``{"dropout": 0.3, "learning_rate": 0.4}``. Then Trial will receive ``{"dropout": 0.3, "learning_rate": 0.4}`` by calling API ``nni.get_next_parameter()``. Once the trial ends with a result (normally some kind of metrics), it can send the result to Tuner by calling API ``nni.report_final_result()``, for example ``nni.report_final_result(0.93)``. Then your Tuner's ``receive_trial_result`` function will receive the result like:
 .. code-block:: python
@@ -89,7 +86,7 @@ It means your Tuner will always generate parameters ``{"dropout": 0.3, "learning
   parameters = {"dropout": 0.3, "learning_rate": 0.4}
   value = 0.93
-**Note that** The working directory of your tuner is ``<home>/nni-experiments/<experiment_id>/log``\ , which can be retrieved with environment variable ``NNI_LOG_DIRECTORY``\ , therefore, if you want to access a file (e.g., ``data.txt``\ ) in the directory of your own tuner, you cannot use ``open('data.txt', 'r')``. Instead, you should use the following:
+**Note that** The working directory of your tuner is ``<home>/nni-experiments/<experiment_id>/log``, which can be retrieved with environment variable ``NNI_LOG_DIRECTORY``, therefore, if you want to access a file (e.g., ``data.txt``) in the directory of your own tuner, you cannot use ``open('data.txt', 'r')``. Instead, you should use the following:
 .. code-block:: python

--- a/docs/en_US/Tuner/DngoTuner.rst
+++ b/docs/en_US/Tuner/DngoTuner.rst
-DNGO on NNI
+DNGO Tuner
-===========
+==========
-Introduction
------------
 Usage
 -----
@@ -13,7 +10,7 @@ Installation
 classArgs requirements
 ^^^^^^^^^^^^^^^^^^^^^^
-* **optimize_mode** (*'maximize' or 'minimize'*\ ) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
+* **optimize_mode** (*'maximize' or 'minimize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
 * **sample_size** (*int, default = 1000*) - Number of samples to select in each iteration. The best one will be picked from the samples as the next trial.
 * **trials_per_update** (*int, default = 20*) - Number of trials to collect before updating the model.
 * **num_epochs_per_training** (*int, default = 500*) - Number of epochs to train DNGO model.

--- a/docs/en_US/Tuner/EvolutionTuner.rst
+++ b/docs/en_US/Tuner/EvolutionTuner.rst
-Naive Evolution Tuners on NNI
+Naive Evolution Tuner
-=============================
+=====================
-Introduction
------------
 Naive Evolution comes from `Large-Scale Evolution of Image Classifiers <https://arxiv.org/pdf/1703.01041.pdf>`__. It randomly initializes a population based on the search space. For each generation, it chooses the better individuals and mutates them (e.g., changes a hyperparameter, adds/removes one layer, etc.) to get the next generation. Naive Evolution requires many trials to work, but it is very simple and easily extended with new features.
@@ -14,10 +10,10 @@ classArgs Requirements
 ^^^^^^^^^^^^^^^^^^^^^^
 * 
-  **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+  **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
 * 
-  **population_size** (*int value (should > 0), optional, default = 20*\ ) - the initial size of the population (trial num) in the evolution tuner. It's suggested that ``population_size`` be much larger than ``concurrency`` so users can get the most out of the algorithm (and at least ``concurrency``\ , or the tuner will fail on its first generation of parameters).
+  **population_size** (*int value (should > 0), optional, default = 20*) - the initial size of the population (trial num) in the evolution tuner. It's suggested that ``population_size`` be much larger than ``concurrency`` so users can get the most out of the algorithm (and at least ``concurrency``, or the tuner will fail on its first generation of parameters).
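The select-and-mutate loop described for Naive Evolution can be sketched on a toy problem. This is an illustration under stated assumptions, not NNI's implementation: the function ``naive_evolution``, the keep-the-better-half rule, the ``choice``-only search space, and the toy fitness function are all hypothetical.

```python
import random


def naive_evolution(search_space, fitness, population_size=20,
                    generations=10, seed=0):
    """Toy evolution over a {name: [candidate values]} search space.

    Returns the best configuration found and the best fitness seen
    in each generation (higher fitness is better).
    """
    rng = random.Random(seed)
    names = list(search_space)

    def random_config():
        return {name: rng.choice(search_space[name]) for name in names}

    def mutate(config):
        child = dict(config)
        name = rng.choice(names)            # change a single hyperparameter
        child[name] = rng.choice(search_space[name])
        return child

    # Random initialization based on the search space.
    population = [random_config() for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)   # best first
        history.append(fitness(population[0]))
        # Assumed selection rule: keep the better half unchanged ...
        survivors = population[: max(1, population_size // 2)]
        # ... and refill the population with mutated copies of survivors.
        children = [mutate(rng.choice(survivors))
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness), history


# Hypothetical search space and fitness, standing in for a real trial.
space = {"dropout": [0.1, 0.3, 0.5, 0.7], "learning_rate": [0.001, 0.01, 0.1]}


def toy_fitness(config):
    return -(abs(config["dropout"] - 0.3) + abs(config["learning_rate"] - 0.01))


best, history = naive_evolution(space, toy_fitness, population_size=20)
print(best, history[-1])
```

Because the survivors are carried over unchanged, the best fitness never decreases from one generation to the next in this sketch.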
 Example Configuration
 ^^^^^^^^^^^^^^^^^^^^^

--- a/docs/en_US/Tuner/GPTuner.rst
+++ b/docs/en_US/Tuner/GPTuner.rst
-GP Tuner on NNI
+GP Tuner
-===============
+========
-Introduction
------------
 Bayesian optimization works by constructing a posterior distribution of functions (a Gaussian Process) that best describes the function you want to optimize. As the number of observations grows, the posterior distribution improves, and the algorithm becomes more certain of which regions in parameter space are worth exploring and which are not.
 GP Tuner is designed to minimize the number of steps required to find a combination of parameters that is close to the optimal combination. To do so, this method uses a proxy optimization problem (finding the maximum of the acquisition function) that, albeit still a hard problem, is cheaper (in the computational sense) to solve, and it's amenable to common tools. Therefore, Bayesian Optimization is suggested for situations where sampling the function to be optimized is very expensive.
-Note that the only acceptable types within the search space are ``randint``\ , ``uniform``\ , ``quniform``\ ,  ``loguniform``\ , ``qloguniform``\ , and numerical ``choice``.
+Note that the only acceptable types within the search space are ``randint``, ``uniform``, ``quniform``, ``loguniform``, ``qloguniform``, and numerical ``choice``.
 This optimization approach is described in Section 3 of `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__.
@@ -18,15 +15,15 @@ Usage
 classArgs requirements
 ^^^^^^^^^^^^^^^^^^^^^^
-* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*\ ) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
-* **utility** (*'ei', 'ucb' or 'poi', optional, default = 'ei'*\ ) - The utility function (acquisition function). 'ei', 'ucb', and 'poi' correspond to 'Expected Improvement', 'Upper Confidence Bound', and 'Probability of Improvement', respectively.
+* **utility** (*'ei', 'ucb' or 'poi', optional, default = 'ei'*) - The utility function (acquisition function). 'ei', 'ucb', and 'poi' correspond to 'Expected Improvement', 'Upper Confidence Bound', and 'Probability of Improvement', respectively.
-* **kappa** (*float, optional, default = 5*\ ) - Used by the 'ucb' utility function. The bigger ``kappa`` is, the more exploratory the tuner will be.
+* **kappa** (*float, optional, default = 5*) - Used by the 'ucb' utility function. The bigger ``kappa`` is, the more exploratory the tuner will be.
-* **xi** (*float, optional, default = 0*\ ) - Used by the 'ei' and 'poi' utility functions. The bigger ``xi`` is, the more exploratory the tuner will be.
+* **xi** (*float, optional, default = 0*) - Used by the 'ei' and 'poi' utility functions. The bigger ``xi`` is, the more exploratory the tuner will be.
-* **nu** (*float, optional, default = 2.5*\ ) - Used to specify the Matern kernel. The smaller nu, the less smooth the approximated function is.
+* **nu** (*float, optional, default = 2.5*) - Used to specify the Matern kernel. The smaller nu, the less smooth the approximated function is.
-* **alpha** (*float, optional, default = 1e-6*\ ) - Used to specify the Gaussian Process Regressor. Larger values correspond to an increased noise level in the observations.
+* **alpha** (*float, optional, default = 1e-6*) - Used to specify the Gaussian Process Regressor. Larger values correspond to an increased noise level in the observations.
-* **cold_start_num** (*int, optional, default = 10*\ ) - Number of random explorations to perform before the Gaussian Process. Random exploration can help by diversifying the exploration space.
+* **cold_start_num** (*int, optional, default = 10*) - Number of random explorations to perform before the Gaussian Process. Random exploration can help by diversifying the exploration space.
-* **selection_num_warm_up** (*int, optional, default = 1e5*\ ) - Number of random points to evaluate when getting the point which maximizes the acquisition function.
+* **selection_num_warm_up** (*int, optional, default = 1e5*) - Number of random points to evaluate when getting the point which maximizes the acquisition function.
-* **selection_num_starting_points** (*int, optional, default = 250*\ ) - Number of times to run L-BFGS-B from a random starting point after the warmup.
+* **selection_num_starting_points** (*int, optional, default = 250*) - Number of times to run L-BFGS-B from a random starting point after the warmup.
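To make the roles of ``utility``, ``kappa``, and ``xi`` concrete, here are the textbook forms of the three acquisition functions for a maximization problem. This is a sketch that assumes the standard definitions; the source does not show how NNI implements them, and the function name ``utility`` and its signature are hypothetical.

```python
import math


def _norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def utility(kind, mean, std, y_max, kappa=5.0, xi=0.0):
    """Score one candidate point from the GP posterior.

    mean, std: posterior mean and standard deviation (std must be > 0).
    y_max: best metric observed so far.
    """
    if kind == "ucb":
        # Larger kappa rewards uncertain (high-std) points more.
        return mean + kappa * std
    z = (mean - y_max - xi) / std
    if kind == "ei":
        return (mean - y_max - xi) * _norm_cdf(z) + std * _norm_pdf(z)
    if kind == "poi":
        return _norm_cdf(z)
    raise ValueError(f"unknown utility: {kind}")


print(utility("ucb", mean=1.0, std=2.0, y_max=0.0))
print(utility("poi", mean=1.0, std=1.0, y_max=1.0))
```

In these formulas a larger ``xi`` lowers the score of points whose predicted mean is only slightly above ``y_max``, which pushes the search toward less explored regions.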
 Example Configuration
 ^^^^^^^^^^^^^^^^^^^^^

--- a/docs/en_US/Tuner/GridsearchTuner.rst
+++ b/docs/en_US/Tuner/GridsearchTuner.rst
-Grid Search on NNI
+Grid Search Tuner
-==================
+=================
-Grid Search
-----------
-Introduction
------------
 Grid Search performs an exhaustive search through a search space.
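As a rough illustration of what an exhaustive search means, the sketch below enumerates every combination of a small search space made only of discrete candidate lists. The helper ``grid_points`` and the example space are hypothetical; this is not NNI's implementation.

```python
from itertools import product


def grid_points(search_space):
    """Yield every combination of a {name: [candidate values]} search space."""
    names = list(search_space)
    for values in product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical search space: 2 x 3 = 6 configurations in total.
space = {"dropout": [0.3, 0.5], "learning_rate": [0.001, 0.01, 0.1]}
points = list(grid_points(space))
print(len(points))
print(points[0])
```

The number of configurations is the product of the list lengths, so the cost grows quickly as hyperparameters are added.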

--- a/docs/en_US/Tuner/HyperbandAdvisor.rst
+++ b/docs/en_US/Tuner/HyperbandAdvisor.rst
-Hyperband on NNI
+Hyperband Advisor
-================
+=================
-Introduction
------------
 `Hyperband <https://arxiv.org/pdf/1603.06560.pdf>`__ is a popular autoML algorithm. The basic idea of Hyperband is to create several buckets, each having ``n`` randomly generated hyperparameter configurations, each configuration using ``r`` resources (e.g., epoch number, batch number). After the ``n`` configurations are finished, it chooses the top ``n/eta`` configurations and runs them using increased ``r*eta`` resources. At last, it chooses the best configuration it has found so far.
@@ -15,7 +12,7 @@ Second, this implementation fully leverages Hyperband's internal parallelism. Sp
 Alternatively, set ``exec_mode`` to ``serial`` to follow the original algorithm. In this mode, the next bucket starts strictly after the current bucket finishes.
-``parallelism`` mode may lead to multiple unfinished buckets, and there is at most one unfinished bucket under ``serial`` mode. The advantage of ``parallelism`` mode is to make full use of resources, which may reduce the experiment duration multiple times. The following two pictures are the results of quick verification using `nas-bench-201 <../NAS/Benchmarks.rst>`__\ , picture above is in ``parallelism`` mode, picture below is in ``serial`` mode.
+``parallelism`` mode may lead to multiple unfinished buckets, and there is at most one unfinished bucket under ``serial`` mode. The advantage of ``parallelism`` mode is to make full use of resources, which may reduce the experiment duration multiple times. The following two pictures are the results of quick verification using `nas-bench-201 <../NAS/Benchmarks.rst>`__, picture above is in ``parallelism`` mode, picture below is in ``serial`` mode.
 .. image:: ../../img/hyperband_parallelism.png
@@ -54,15 +51,15 @@ To use Hyperband, you should add the following spec in your experiment's YAML co
       #choice: serial, parallelism
       exec_mode: parallelism
-Note that once you use Advisor, you are not allowed to add a Tuner and Assessor spec in the config file. If you use Hyperband, among the hyperparameters (i.e., key-value pairs) received by a trial, there will be one more key called ``TRIAL_BUDGET`` defined by user. **By using this ``TRIAL_BUDGET``\ , the trial can control how long it runs**.
+Note that once you use Advisor, you are not allowed to add a Tuner and Assessor spec in the config file. If you use Hyperband, among the hyperparameters (i.e., key-value pairs) received by a trial, there will be one more key called ``TRIAL_BUDGET`` defined by user. **By using this ``TRIAL_BUDGET``, the trial can control how long it runs**.
-For ``report_intermediate_result(metric)`` and ``report_final_result(metric)`` in your trial code, **\ ``metric`` should be either a number or a dict which has a key ``default`` with a number as its value**. This number is the one you want to maximize or minimize, for example, accuracy or loss.
+For ``report_intermediate_result(metric)`` and ``report_final_result(metric)`` in your trial code, **``metric`` should be either a number or a dict which has a key ``default`` with a number as its value**. This number is the one you want to maximize or minimize, for example, accuracy or loss.
 ``R`` and ``eta`` are the parameters of Hyperband that you can change. ``R`` means the maximum trial budget that can be allocated to a configuration. Here, trial budget could mean the number of epochs or mini-batches. This ``TRIAL_BUDGET`` should be used by the trial to control how long it runs. Refer to the example under ``examples/trials/mnist-advisor/`` for details.
 ``eta`` means ``n/eta`` configurations from ``n`` configurations will survive and rerun using more budgets.
-Here is a concrete example of ``R=81`` and ``eta=3``\ :
+Here is a concrete example of ``R=81`` and ``eta=3``:
 .. list-table::
   :header-rows: 1
@@ -120,10 +117,10 @@ classArgs requirements
 ^^^^^^^^^^^^^^^^^^^^^^
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
-* **R** (*int, optional, default = 60*\ ) - the maximum budget given to a trial (could be the number of mini-batches or epochs). Each trial should use TRIAL_BUDGET to control how long they run.
+* **R** (*int, optional, default = 60*) - the maximum budget given to a trial (could be the number of mini-batches or epochs). Each trial should use TRIAL_BUDGET to control how long they run.
-* **eta** (*int, optional, default = 3*\ ) - ``(eta-1)/eta`` is the proportion of discarded trials.
+* **eta** (*int, optional, default = 3*) - ``(eta-1)/eta`` is the proportion of discarded trials.
-* **exec_mode** (*serial or parallelism, optional, default = parallelism*\ ) - If 'parallelism', the tuner will try to use available resources to start new bucket immediately. If 'serial', the tuner will only start new bucket after the current bucket is done.
+* **exec_mode** (*serial or parallelism, optional, default = parallelism*) - If 'parallelism', the tuner will try to use available resources to start new bucket immediately. If 'serial', the tuner will only start new bucket after the current bucket is done.
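The ``n/eta`` and ``r*eta`` rule for a single bucket can be traced with a few lines of Python. This is a minimal sketch of the halving arithmetic only; the function name ``successive_halving_schedule`` is hypothetical, and how NNI sizes and schedules its buckets is not shown here.

```python
def successive_halving_schedule(n, r, eta, max_budget):
    """Return [(num_configs, budget_per_config), ...] for one bucket.

    n: initial number of configurations, r: initial budget per
    configuration, max_budget: the cap ``R`` on a single trial's budget.
    """
    rounds = []
    while n >= 1 and r <= max_budget:
        rounds.append((n, r))
        n //= eta   # only the top n/eta configurations survive
        r *= eta    # survivors rerun with eta times more budget
    return rounds


schedule = successive_halving_schedule(n=81, r=1, eta=3, max_budget=81)
for num_configs, budget in schedule:
    print(num_configs, budget)
```

With ``eta=3``, two thirds of the configurations are discarded at each round, which matches the ``(eta-1)/eta`` proportion given for ``eta``.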
 Example Configuration
 ^^^^^^^^^^^^^^^^^^^^^

--- a/docs/en_US/Tuner/MetisTuner.rst
+++ b/docs/en_US/Tuner/MetisTuner.rst
-Metis Tuner on NNI
+Metis Tuner
-==================
+===========
-Introduction
------------
 `Metis <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__ offers several benefits over other tuning algorithms. While most tools only predict the optimal configuration, Metis gives you two outputs, a prediction for the optimal configuration and a suggestion for the next trial. No more guess work!
@@ -19,7 +16,7 @@ Metis belongs to the class of sequential model-based optimization (SMBO) algorit
 * 
  It identifies the next hyper-parameter candidate. This is achieved by inferring the potential information gain of exploration, exploitation, and resampling.
-Note that the only acceptable types within the search space are ``quniform``\ , ``uniform``\ , ``randint``\ , and numerical ``choice``.
+Note that the only acceptable types within the search space are ``quniform``, ``uniform``, ``randint``, and numerical ``choice``.
 More details can be found in our `paper <https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/>`__.
@@ -29,7 +26,7 @@ Usage
 classArgs requirements
 ^^^^^^^^^^^^^^^^^^^^^^
-* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*\ ) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
 Example Configuration
 ^^^^^^^^^^^^^^^^^^^^^

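The note in the Metis diff restricts the search space to ``quniform``, ``uniform``, ``randint``, and numerical ``choice``. As a minimal sketch of what that restriction means in practice, the helper below checks a search space written in the usual ``_type``/``_value`` form. The function name ``unsupported_entries`` and the example parameters are hypothetical and are not part of NNI.

```python
# Types the Metis note lists as acceptable (choice must be numerical).
METIS_TYPES = {"quniform", "uniform", "randint", "choice"}


def unsupported_entries(search_space):
    """Return the names of parameters that fall outside the Metis note."""
    bad = []
    for name, spec in search_space.items():
        if spec["_type"] not in METIS_TYPES:
            bad.append(name)
        elif spec["_type"] == "choice" and not all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in spec["_value"]
        ):
            # A choice over strings (or other non-numbers) is not numerical.
            bad.append(name)
    return bad


# Hypothetical search space used only for illustration.
search_space = {
    "learning_rate": {"_type": "uniform", "_value": [0.0001, 0.1]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]},
    "hidden_size": {"_type": "quniform", "_value": [64, 512, 64]},
    "num_layers": {"_type": "randint", "_value": [1, 5]},
    "optimizer": {"_type": "choice", "_value": ["sgd", "adam"]},
}

print(unsupported_entries(search_space))  # ['optimizer']
```

Only ``optimizer`` is flagged here, because its ``choice`` values are strings rather than numbers.
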
--- a/docs/en_US/Tuner/NetworkmorphismTuner.rst
+++ b/docs/en_US/Tuner/NetworkmorphismTuner.rst
-Network Morphism Tuner on NNI
+Network Morphism Tuner
-=============================
+======================
-Introduction
+`Autokeras <https://arxiv.org/abs/1806.10282>`__ is a popular AutoML tool using Network Morphism. The basic idea of Autokeras is to use Bayesian Regression to estimate the metric of the Neural Network Architecture. Each time, it generates several child networks from father networks. Then it uses a naïve Bayesian regression to estimate its metric value from the history of trained results of network and metric value pairs. Next, it chooses the child which has the best estimated performance and adds it to the training queue. Inspired by the work of Autokeras and referring to its `code <https://github.com/jhfjhfj1/autokeras>`__, we implemented our Network Morphism method on the NNI platform.
------------
-`Autokeras <https://arxiv.org/abs/1806.10282>`__ is a popular AutoML tool using Network Morphism. The basic idea of Autokeras is to use Bayesian Regression to estimate the metric of the Neural Network Architecture. Each time, it generates several child networks from father networks. Then it uses a naïve Bayesian regression to estimate its metric value from the history of trained results of network and metric value pairs. Next, it chooses the child which has the best estimated performance and adds it to the training queue. Inspired by the work of Autokeras and referring to its `code <https://github.com/jhfjhfj1/autokeras>`__\ , we implemented our Network Morphism method on the NNI platform.
 If you want to know more about network morphism trial usage, please see the :githublink:`Readme.md <examples/trials/network_morphism/README.rst>`.
@@ -19,11 +16,11 @@ NetworkMorphism requires :githublink:`PyTorch <examples/trials/network_morphism/
 classArgs Requirements
 ^^^^^^^^^^^^^^^^^^^^^^
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
-* **task** (*('cv'), optional, default = 'cv'*\ ) - The domain of the experiment. For now, this tuner only supports the computer vision (CV) domain.
+* **task** (*('cv'), optional, default = 'cv'*) - The domain of the experiment. For now, this tuner only supports the computer vision (CV) domain.
-* **input_width** (*int, optional, default = 32*\ ) - input image width
+* **input_width** (*int, optional, default = 32*) - input image width
-* **input_channel** (*int, optional, default = 3*\ ) - input image channel
+* **input_channel** (*int, optional, default = 3*) - input image channel
-* **n_output_node** (*int, optional, default = 10*\ ) - number of classes
+* **n_output_node** (*int, optional, default = 10*) - number of classes
@@ -56,7 +53,7 @@ Example Configuration
   # config.yml
   tuner:
-     builtinTunerName: NetworkMorphism
+     name: NetworkMorphism
       classArgs:
         optimize_mode: maximize
         task: cv
@@ -89,7 +86,7 @@ In the training procedure, it generates a JSON file which represents a Network G
   # report the final accuracy to NNI
   nni.report_final_result(best_acc)
-If you want to save and load the **best model**\ , the following methods are recommended.
+If you want to save and load the **best model**, the following methods are recommended.
 .. code-block:: python
@@ -276,19 +273,19 @@ You can consider the model to be a `directed acyclic graph <https://en.wikipedia
  * 
-    For ``StubConv (StubConv1d, StubConv2d, StubConv3d)``\ , the numbering follows the format: its node input id (or id list), node output id, input_channel, filters, kernel_size, stride, and padding.
+    For ``StubConv (StubConv1d, StubConv2d, StubConv3d)``, the numbering follows the format: its node input id (or id list), node output id, input_channel, filters, kernel_size, stride, and padding.
  * 
-    For ``StubDense``\ , the numbering follows the format: its node input id (or id list), node output id, input_units, and units.
+    For ``StubDense``, the numbering follows the format: its node input id (or id list), node output id, input_units, and units.
  * 
-    For ``StubBatchNormalization (StubBatchNormalization1d, StubBatchNormalization2d, StubBatchNormalization3d)``\ ,  the numbering follows the format: its node input id (or id list), node output id, and number of features.
+    For ``StubBatchNormalization (StubBatchNormalization1d, StubBatchNormalization2d, StubBatchNormalization3d)``, the numbering follows the format: its node input id (or id list), node output id, and number of features.
  * 
-    For ``StubDropout(StubDropout1d, StubDropout2d, StubDropout3d)``\ , the numbering follows the format: its node input id (or id list), node output id, and dropout rate.
+    For ``StubDropout(StubDropout1d, StubDropout2d, StubDropout3d)``, the numbering follows the format: its node input id (or id list), node output id, and dropout rate.
  * 
-    For ``StubPooling (StubPooling1d, StubPooling2d, StubPooling3d)``\ , the numbering follows the format: its node input id (or id list), node output id, kernel_size, stride, and padding.
+    For ``StubPooling (StubPooling1d, StubPooling2d, StubPooling3d)``, the numbering follows the format: its node input id (or id list), node output id, kernel_size, stride, and padding.
  * 
    For other layers, the numbering follows the format: its node input id (or id list) and node output id.

--- a/docs/en_US/Tuner/PBTTuner.rst
+++ b/docs/en_US/Tuner/PBTTuner.rst
-PBT Tuner on NNI
+PBT Tuner
-================
+=========
-Introduction
------------
 Population Based Training (PBT) comes from `Population Based Training of Neural Networks <https://arxiv.org/abs/1711.09846v1>`__. It's a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. 
@@ -13,7 +9,7 @@ Population Based Training (PBT) comes from `Population Based Training of Neural
   :alt: 
-PBTTuner initializes a population with several trials (i.e., ``population_size``\ ). The figure above shows four steps, and each trial runs only one step at a time. The length of a step is controlled by the trial code (e.g., one epoch). When a trial starts, it loads a checkpoint specified by PBTTuner, runs one step, saves a checkpoint to a directory specified by PBTTuner, and exits. The trials in a population run steps synchronously; that is, the ``(i+1)``\ -th step starts only after all trials have finished the ``i``\ -th step. Exploitation and exploration of PBT are executed between two consecutive steps.
+PBTTuner initializes a population with several trials (i.e., ``population_size``). The figure above shows four steps, and each trial runs only one step at a time. The length of a step is controlled by the trial code (e.g., one epoch). When a trial starts, it loads a checkpoint specified by PBTTuner, runs one step, saves a checkpoint to a directory specified by PBTTuner, and exits. The trials in a population run steps synchronously; that is, the ``(i+1)``-th step starts only after all trials have finished the ``i``-th step. Exploitation and exploration of PBT are executed between two consecutive steps.
 Usage
 -----
@@ -21,7 +17,7 @@ Usage
 Provide checkpoint directory
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Since some trials need to load other trials' checkpoints, users should provide a directory (i.e., ``all_checkpoint_dir``\ ) which is accessible by every trial. This is easy in local mode: users can use the default directory or specify any directory on the local machine. For other training services, users should follow `the document of those training services <../TrainingService/Overview.rst>`__ to provide a directory in shared storage, such as NFS or Azure Storage.
+Since some trials need to load other trials' checkpoints, users should provide a directory (i.e., ``all_checkpoint_dir``) which is accessible by every trial. This is easy in local mode: users can use the default directory or specify any directory on the local machine. For other training services, users should follow `the document of those training services <../TrainingService/Overview.rst>`__ to provide a directory in shared storage, such as NFS or Azure Storage.
 Modify your trial code
 ^^^^^^^^^^^^^^^^^^^^^^
@@ -47,11 +43,11 @@ The complete example code can be found :githublink:`here <examples/trials/mnist-
 classArgs requirements
 ^^^^^^^^^^^^^^^^^^^^^^
-* **optimize_mode** (*'maximize' or 'minimize'*\ ) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **optimize_mode** (*'maximize' or 'minimize'*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
-* **all_checkpoint_dir** (*str, optional, default = None*\ ) - Directory for trials to load and save checkpoints. If not specified, the directory is "~/nni/checkpoint/\ :raw-html:`<exp-id>`\ ". Note that if the experiment is not in local mode, users should provide a path in shared storage that can be accessed by all the trials.
+* **all_checkpoint_dir** (*str, optional, default = None*) - Directory for trials to load and save checkpoints. If not specified, the directory is "~/nni/checkpoint/\ :raw-html:`<exp-id>`\ ". Note that if the experiment is not in local mode, users should provide a path in shared storage that can be accessed by all the trials.
-* **population_size** (*int, optional, default = 10*\ ) - Number of trials in a population. Each step has this number of trials. In our implementation, one step runs each trial for a specific number of training epochs set by the user.
+* **population_size** (*int, optional, default = 10*) - Number of trials in a population. Each step has this number of trials. In our implementation, one step runs each trial for a specific number of training epochs set by the user.
-* **factors** (*tuple, optional, default = (1.2, 0.8)*\ ) - Factors for perturbation of hyperparameters.
+* **factors** (*tuple, optional, default = (1.2, 0.8)*) - Factors for perturbation of hyperparameters.
-* **fraction** (*float, optional, default = 0.2*\ ) - Fraction for selecting bottom and top trials.
+* **fraction** (*float, optional, default = 0.2*) - Fraction for selecting bottom and top trials.
 Experiment config
 ^^^^^^^^^^^^^^^^^
@@ -75,6 +71,6 @@ Example Configuration
   # config.yml
   tuner:
-     builtinTunerName: PBTTuner
+     name: PBTTuner
     classArgs:
       optimize_mode: maximize
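
The ``factors`` and ``fraction`` arguments above control the exploit/explore phase that runs between two consecutive steps. The following is a minimal sketch of that idea, not NNI's implementation: it assumes every hyperparameter is numeric, represents each trial as a plain dict, and uses a hypothetical function name ``exploit_and_explore``. Trials in the bottom ``fraction`` copy the checkpoint of a randomly chosen top trial (exploit) and take that trial's hyperparameters scaled by one of ``factors`` (explore).

```python
import random


def exploit_and_explore(population, fraction=0.2, factors=(1.2, 0.8),
                        maximize=True, rng=None):
    """Illustrative PBT step: bottom trials copy and perturb top trials.

    Each trial is a dict with "params", "score", and "checkpoint" keys.
    This layout is an assumption made for the sketch.
    """
    rng = rng or random.Random()
    # Best trials first.
    ranked = sorted(population, key=lambda t: t["score"], reverse=maximize)
    cutoff = max(1, int(len(ranked) * fraction))
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: continue from the winner's saved checkpoint.
        loser["checkpoint"] = winner["checkpoint"]
        # Explore: perturb each copied hyperparameter by a random factor.
        loser["params"] = {
            name: value * rng.choice(factors)
            for name, value in winner["params"].items()
        }
    return ranked


# Hypothetical population of 10 trials, matching the default population_size.
population = [
    {"params": {"lr": 0.01 * (i + 1)}, "score": float(i), "checkpoint": f"ckpt-{i}"}
    for i in range(10)
]
ranked = exploit_and_explore(population, rng=random.Random(0))
for trial in ranked[-2:]:
    print(trial["checkpoint"], trial["params"])
```

With the defaults, the two lowest-scoring trials out of ten are replaced, which is why the checkpoint directory must be readable by every trial.
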
--- a/docs/en_US/Tuner/RandomTuner.rst
+++ b/docs/en_US/Tuner/RandomTuner.rst
 Random Tuner
 ============
-Introduction
------------
 In `Random Search for Hyper-Parameter Optimization <http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>`__, the authors show that Random Search can be surprisingly effective despite its simplicity.
 We suggest using Random Search as a baseline when no knowledge about the prior distribution of hyper-parameters is available.

--- a/docs/en_US/Tuner/SmacTuner.rst
+++ b/docs/en_US/Tuner/SmacTuner.rst
-SMAC Tuner on NNI
+SMAC Tuner
-=================
+==========
-Introduction
------------
 `SMAC <https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf>`__ is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO in order to handle categorical parameters. The SMAC supported by nni is a wrapper on `the SMAC3 github repo <https://github.com/automl/SMAC3>`__.
-Note that SMAC on nni only supports a subset of the types in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__\ : ``choice``\ , ``randint``\ , ``uniform``\ , ``loguniform``\ , and ``quniform``.
+Note that SMAC on nni only supports a subset of the types in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__: ``choice``, ``randint``, ``uniform``, ``loguniform``, and ``quniform``.
 Usage
 -----
@@ -25,8 +20,8 @@ SMAC has dependencies that need to be installed by following command before the
 classArgs requirements
 ^^^^^^^^^^^^^^^^^^^^^^
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
-* **config_dedup** (*True or False, optional, default = False*\ ) - If True, the tuner will not generate a configuration that has already been generated. If False, a configuration may be generated twice, but this is rare for a relatively large search space.
+* **config_dedup** (*True or False, optional, default = False*) - If True, the tuner will not generate a configuration that has already been generated. If False, a configuration may be generated twice, but this is rare for a relatively large search space.
 Example Configuration
 ^^^^^^^^^^^^^^^^^^^^^

--- a/docs/en_US/Tuner/TpeTuner.rst
+++ b/docs/en_US/Tuner/TpeTuner.rst
 TPE Tuner
 =========
-Introduction
------------
 The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach.
 SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements,
 and then subsequently choose new hyperparameters to test based on this model.

--- a/docs/en_US/builtin_tuner.rst
+++ b/docs/en_US/builtin_tuner.rst
@@ -12,7 +12,7 @@ Tuner receives metrics from `Trial` to evaluate the performance of a specific pa
    Overview <Tuner/BuiltinTuner>
    TPE <Tuner/TpeTuner>
    Random Search <Tuner/RandomTuner>
-    Anneal <Tuner/HyperoptTuner>
+    Anneal <Tuner/AnnealTuner>
    Naive Evolution <Tuner/EvolutionTuner>
    SMAC <Tuner/SmacTuner>
    Metis Tuner <Tuner/MetisTuner>

--- a/docs/zh_CN/Tuner/AnnealTuner.rst
+++ b/docs/zh_CN/Tuner/AnnealTuner.rst
+../../en_US/Tuner/AnnealTuner.rst
\ No newline at end of file
--- a/docs/zh_CN/Tuner/HyperoptTuner.rst
+++ b/docs/zh_CN/Tuner/HyperoptTuner.rst
-../../en_US/Tuner/HyperoptTuner.rst
\ No newline at end of file