Fix doc build warning (#1799)

* fix doc build warnings
* update docstring guide
* fix doc build warning #2
* remove typing.Dict
* update
* fix dead link
* remove deprecated docs
* fix missing link warning
* fix link issue after merge
* fix docstring indentation warning
* remove trial.py
* revert commit for deadlink of outdated docs
* fix pylint error

659480f2 · Yan Ni · xuehui · ac6f420f · ac6f420f · ac6f420f
Commit 659480f2 authored Dec 12, 2019 by Yan Ni Committed by xuehui Dec 12, 2019
18 changed files
--- a/docs/en_US/AdvancedFeature/AdvancedNas.md
+++ b/docs/en_US/AdvancedFeature/AdvancedNas.md
-# Tutorial for Advanced Neural Architecture Search
-Currently many of the NAS algorithms leverage the technique of **weight sharing** among trials to accelerate its training process. For example, [ENAS][1] delivers 1000x effiency with '_parameter sharing between child models_', compared with the previous [NASNet][2] algorithm. Other NAS algorithms such as [DARTS][3], [Network Morphism][4], and [Evolution][5] is also leveraging, or has the potential to leverage weight sharing.
-This is a tutorial on how to enable weight sharing in NNI.
-## Weight Sharing among trials
-Currently we recommend sharing weights through NFS (Network File System), which supports sharing files across machines, and is light-weighted, (relatively) efficient. We also welcome contributions from the community on more efficient techniques.
-### Weight Sharing through NFS file
-With the NFS setup (see below), trial code can share model weight through loading & saving files. Here we recommend that user feed the tuner with the storage path:
-```yaml
-tuner:
-  codeDir: path/to/customer_tuner
-  classFileName: customer_tuner.py
-  className: CustomerTuner
-  classArgs:
-    ...
-    save_dir_root: /nfs/storage/path/
-```
-And let tuner decide where to save & load weights and feed the paths to trials through `nni.get_next_parameters()`:
-<img src="https://user-images.githubusercontent.com/23273522/51817667-93ebf080-2306-11e9-8395-b18b322062bc.png" alt="drawing" width="700"/>
- For example, in tensorflow:
-```python
-# save models
-saver = tf.train.Saver()
-saver.save(sess, os.path.join(params['save_path'], 'model.ckpt'))
-# load models
-tf.init_from_checkpoint(params['restore_path'])
-```
-where `'save_path'` and `'restore_path'` in hyper-parameter can be managed by the tuner.
-### NFS Setup
-NFS follows the Client-Server Architecture, with an NFS server providing physical storage, trials on the remote machine with an NFS client can read/write those files in the same way that they access local files.
-#### NFS Server
-An NFS server can be any machine as long as it can provide enough physical storage, and network connection with **remote machine** for NNI trials. Usually you can choose one of the remote machine as NFS Server.
-On Ubuntu, install NFS server through `apt-get`:
-```bash
-sudo apt-get install nfs-kernel-server
-```
-Suppose `/tmp/nni/shared` is used as the physical storage, then run:
-```bash
-mkdir -p /tmp/nni/shared
-sudo echo "/tmp/nni/shared *(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports
-sudo service nfs-kernel-server restart
-```
-You can check if the above directory is successfully exported by NFS using `sudo showmount -e localhost`
-#### NFS Client
-For a trial on remote machine able to access shared files with NFS, an NFS client needs to be installed. For example, on Ubuntu:
-```bash
-sudo apt-get install nfs-common
-```
-Then create & mount the mounted directory of shared files:
-```bash
-mkdir -p /mnt/nfs/nni/
-sudo mount -t nfs 10.10.10.10:/tmp/nni/shared /mnt/nfs/nni
-```
-where `10.10.10.10` should be replaced by the real IP of NFS server machine in practice.
-## Asynchronous Dispatcher Mode for trial dependency control
-The feature of weight sharing enables trials from different machines, in which most of the time **read after write** consistency must be assured. After all, the child model should not load parent model before parent trial finishes training. To deal with this, users can enable **asynchronous dispatcher mode** with `multiThread: true` in `config.yml` in NNI, where the dispatcher assign a tuner thread each time a `NEW_TRIAL` request comes in, and the tuner thread can decide when to submit a new trial by blocking and unblocking the thread itself. For example:
-```python
-    def generate_parameters(self, parameter_id):
-        self.thread_lock.acquire()
-        indiv = # configuration for a new trial
-        self.events[parameter_id] = threading.Event()
-        self.thread_lock.release()
-        if indiv.parent_id is not None:
-            self.events[indiv.parent_id].wait()
-    def receive_trial_result(self, parameter_id, parameters, reward):
-        self.thread_lock.acquire()
-        # code for processing trial results
-        self.thread_lock.release()
-        self.events[parameter_id].set()
-```
-## Examples
-For details, please refer to this [simple weight sharing example](https://github.com/Microsoft/nni/tree/master/test/async_sharing_test). We also provided a [practice example](https://github.com/Microsoft/nni/tree/master/examples/trials/weight_sharing/ga_squad) for reading comprehension, based on previous [ga_squad](https://github.com/Microsoft/nni/tree/master/examples/trials/ga_squad) example.
-[1]: https://arxiv.org/abs/1802.03268
-[2]: https://arxiv.org/abs/1707.07012
-[3]: https://arxiv.org/abs/1806.09055
-[4]: https://arxiv.org/abs/1806.10282
-[5]: https://arxiv.org/abs/1703.01041
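Review note on the removed `AdvancedNas.md`: the dispatcher snippet in the deleted tutorial is pseudocode (`indiv = # configuration for a new trial`). The following is a minimal, self-contained sketch of the same read-after-write idea, where a child trial's tuner thread blocks until its parent has reported a result. `ParentAwareTuner` and its `parents` mapping are hypothetical names chosen for illustration; the class does not derive from NNI's tuner base class and is not taken from the NNI code base.

```python
import threading


class ParentAwareTuner:
    """Toy stand-in for a tuner; not the NNI Tuner base class."""

    def __init__(self, parents):
        # parents maps parameter_id -> parent parameter_id (or None).
        self.parents = parents
        self.thread_lock = threading.Lock()
        self.events = {}
        self.finished = []

    def _event_for(self, parameter_id):
        # Create events lazily so a child can wait on a parent whose
        # generate_parameters call has not run yet.
        with self.thread_lock:
            return self.events.setdefault(parameter_id, threading.Event())

    def generate_parameters(self, parameter_id):
        self._event_for(parameter_id)
        parent_id = self.parents.get(parameter_id)
        if parent_id is not None:
            # Block this tuner thread until the parent reports a result.
            self._event_for(parent_id).wait()
        return {"parameter_id": parameter_id, "parent_id": parent_id}

    def receive_trial_result(self, parameter_id, parameters, reward):
        with self.thread_lock:
            self.finished.append(parameter_id)
        # Wake every thread waiting on this trial.
        self._event_for(parameter_id).set()


tuner = ParentAwareTuner({0: None, 1: 0})
child = threading.Thread(target=tuner.generate_parameters, args=(1,))
child.start()                                   # waits for trial 0
tuner.generate_parameters(0)                    # no parent, returns at once
tuner.receive_trial_result(0, {}, reward=1.0)   # unblocks the child thread
child.join(timeout=5)
```

Creating each event through `setdefault` under the lock avoids a `KeyError` when the child thread asks for its parent's event before the parent's own `generate_parameters` call has run.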
--- a/docs/en_US/AdvancedFeature/GeneralNasInterfaces.md
+++ b/docs/en_US/AdvancedFeature/GeneralNasInterfaces.md
-# NNI Programming Interface for Neural Architecture Search (NAS)
-_*This is an **experimental feature**. Currently, we only implemented the general NAS programming interface. Weight sharing will be supported in the following releases._
-Automatic neural architecture search is taking an increasingly important role on finding better models. Recent research works have proved the feasibility of automatic NAS, and also found some models that could beat manually designed and tuned models. Some of representative works are [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5]. There are new innovations keeping emerging. However, it takes great efforts to implement those algorithms, and it is hard to reuse code base of one algorithm for implementing another.
-To facilitate NAS innovations (e.g., design/implement new NAS models, compare different NAS models side-by-side), an easy-to-use and flexible programming interface is crucial.
-<a name="ProgInterface"></a>
-## Programming interface
- A new programming interface for designing and searching for a model is often demanded in two scenarios. 1) When designing a neural network, the designer may have multiple choices for a layer, sub-model, or connection, and not sure which one or a combination performs the best. It would be appealing to have an easy way to express the candidate layers/sub-models they want to try. 2) For the researchers who are working on automatic NAS, they want to have an unified way to express the search space of neural architectures. And making unchanged trial code adapted to different searching algorithms.
- We designed a simple and flexible programming interface based on [NNI annotation](../Tutorial/AnnotationSpec.md). It is elaborated through examples below.
-### Example: choose an operator for a layer
-When designing the following model there might be several choices in the fourth layer that may make this model perform well. In the script of this model, we can use annotation for the fourth layer as shown in the figure. In this annotation, there are five fields in total:
-![](../../img/example_layerchoice.png)
-* __layer_choice__: It is a list of function calls, each function should have defined in user's script or imported libraries. The input arguments of the function should follow the format: `def XXX(inputs, arg2, arg3, ...)`, where inputs is a list with two elements. One is the list of `fixed_inputs`, and the other is a list of the chosen inputs from `optional_inputs`. `conv` and `pool` in the figure are examples of function definition. For the function calls in this list, no need to write the first argument (i.e., input). Note that only one of the function calls are chosen for this layer.
-* __fixed_inputs__: It is a list of variables, the variable could be an output tensor from a previous layer. The variable could be `layer_output` of another `nni.mutable_layer` before this layer, or other python variables before this layer. All the variables in this list will be fed into the chosen function in `layer_choice` (as the first element of the input list).
-* __optional_inputs__: It is a list of variables, the variable could be an output tensor from a previous layer. The variable could be `layer_output` of another `nni.mutable_layer` before this layer, or other python variables before this layer. Only `optional_input_size` variables will be fed into the chosen function in `layer_choice` (as the second element of the input list).
-* __optional_input_size__: It indicates how many inputs are chosen from `input_candidates`. It could be a number or a range. A range [1,3] means it chooses 1, 2, or 3 inputs.
-* __layer_output__: The name of the output(s) of this layer, in this case it represents the return of the function call in `layer_choice`. This will be a variable name that can be used in the following python code or `nni.mutable_layer`.
-There are two ways to write annotation for this example. For the upper one, input of the function calls is `[[],[out3]]`. For the bottom one, input is `[[out3],[]]`.
-__Debugging__: We provided an `nnictl trial codegen` command to help debugging your code of NAS programming on NNI. If your trial with trial_id `XXX` in your experiment `YYY` is failed, you could run `nnictl trial codegen YYY --trial_id XXX` to generate an executable code for this trial under your current directory. With this code, you can directly run the trial command without NNI to check why this trial is failed. Basically, this command is to compile your trial code and replace the NNI NAS code with the real chosen layers and inputs.
-### Example: choose input connections for a layer
-Designing connections of layers is critical for making a high performance model. With our provided interface, users could annotate which connections a layer takes (as inputs). They could choose several ones from a set of connections. Below is an example which chooses two inputs from three candidate inputs for `concat`. Here `concat` always takes the output of its previous layer using `fixed_inputs`.
-![](../../img/example_connectchoice.png)
-### Example: choose both operators and connections
-In this example, we choose one from the three operators and choose two connections for it. As there are multiple variables in inputs, we call `concat` at the beginning of the functions.
-![](../../img/example_combined.png)
-### Example: [ENAS][1] macro search space
-To illustrate the convenience of the programming interface, we use the interface to implement the trial code of "ENAS + macro search space". The left figure is the macro search space in ENAS paper.
-![](../../img/example_enas.png)
-## Unified NAS search space specification
-After finishing the trial code through the annotation above, users have implicitly specified the search space of neural architectures in the code. Based on the code, NNI will automatically generate a search space file which could be fed into tuning algorithms. This search space file follows the following JSON format.
-```javascript
-{
-    "mutable_1": {
-        "_type": "mutable_layer",
-        "_value": {
-            "layer_1": {
-                "layer_choice": ["conv(ch=128)", "pool", "identity"],
-                "optional_inputs": ["out1", "out2", "out3"],
-                "optional_input_size": 2
-            },
-            "layer_2": {
-                ...
-            }
-        }
-    }
-}
-```
-Accordingly, a specified neural architecture (generated by tuning algorithm) is expressed as follows:
-```javascript
-{
-    "mutable_1": {
-        "layer_1": {
-            "chosen_layer": "pool",
-            "chosen_inputs": ["out1", "out3"]
-        },
-        "layer_2": {
-            ...
-        }
-    }
-}
-```
-With the specification of the format of search space and architecture (choice) expression, users are free to implement various (general) tuning algorithms for neural architecture search on NNI. One future work is to provide a general NAS algorithm.
-## Support of One-Shot NAS
-One-Shot NAS is a popular approach to find good neural architecture within a limited time and resource budget. Basically, it builds a full graph based on the search space, and uses gradient descent to at last find the best subgraph. There are different training approaches, such as [training subgraphs (per mini-batch)][1], [training full graph through dropout][6], [training with architecture weights (regularization)][3]. 
-NNI has supported the general NAS as demonstrated above. From users' point of view, One-Shot NAS and NAS have the same search space specification, thus, they could share the same programming interface as demonstrated above, just different training modes. NNI provides four training modes:
-**\*classic_mode\***: this mode is described [above](#ProgInterface), in this mode, each subgraph runs as a trial job. To use this mode, you should enable NNI annotation and specify a tuner for nas in experiment config file. [Here](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas) is an example to show how to write trial code and the config file. And [here](https://github.com/microsoft/nni/tree/master/examples/tuners/random_nas_tuner) is a simple tuner for nas.
-**\*enas_mode\***: following the training approach in [enas paper][1]. It builds the full graph based on neural architrecture search space, and only activate one subgraph that generated by the controller for each mini-batch. [Detailed Description](#ENASMode). (currently only supported on tensorflow). 
-To use enas_mode, you should add one more field in the `trial` config as shown below.
-```diff
-trial:
-    command: your command to run the trial
-    codeDir: the directory where the trial's code is located
-    gpuNum: the number of GPUs that one trial job needs
-+   #choice: classic_mode, enas_mode, oneshot_mode
-+   nasMode: enas_mode
-```
-Similar to classic_mode, in enas_mode you need to specify a tuner for nas, as it also needs to receive subgraphs from tuner (or controller using the terminology in the paper). Since this trial job needs to receive multiple subgraphs from tuner, each one for a mini-batch, two lines need to be added to the trial code to receive the next subgraph (i.e., `nni.training_update`) and report the result of the current subgraph. Below is an example:
-```python
-for _ in range(num):
-    # here receives and enables a new subgraph
-    """@nni.training_update(tf=tf, session=self.session)"""
-    loss, _ = self.session.run([loss_op, train_op])
-    # report the loss of this mini-batch
-    """@nni.report_final_result(loss)"""
-```
-Here, `nni.training_update` is to do some update on the full graph. In enas_mode, the update means receiving a subgraph and enabling it on the next mini-batch. While in darts_mode, the update means training the architecture weights (details in darts_mode). In enas_mode, you need to pass the imported tensorflow package to `tf` and the session to `session`.
-**\*oneshot_mode\***: following the training approach in [this paper][6]. Different from enas_mode which trains the full graph by training large numbers of subgraphs, in oneshot_mode the full graph is built and dropout is added to candidate inputs and also added to candidate ops' outputs. Then this full graph is trained like other DL models. [Detailed Description](#OneshotMode). (currently only supported on tensorflow). 
-To use oneshot_mode, you should add one more field in the `trial` config as shown below. In this mode, though there is no need to use tuner, you still need to specify a tuner (any tuner) in the config file for now. Also, no need to add `nni.training_update` in this mode, because no special processing (or update) is needed during training.
-```diff
-trial:
-    command: your command to run the trial
-    codeDir: the directory where the trial's code is located
-    gpuNum: the number of GPUs that one trial job needs
-+   #choice: classic_mode, enas_mode, oneshot_mode
-+   nasMode: oneshot_mode
-```
-**\*darts_mode\***: following the training approach in [this paper][3]. It is similar to oneshot_mode. There are two differences, one is that darts_mode only add architecture weights to the outputs of candidate ops, the other is that it trains model weights and architecture weights in an interleaved manner. [Detailed Description](#DartsMode).
-To use darts_mode, you should add one more field in the `trial` config as shown below. In this mode, though there is no need to use tuner, you still need to specify a tuner (any tuner) in the config file for now.
-```diff
-trial:
-    command: your command to run the trial
-    codeDir: the directory where the trial's code is located
-    gpuNum: the number of GPUs that one trial job needs
-+   #choice: classic_mode, enas_mode, oneshot_mode
-+   nasMode: darts_mode
-```
-When using darts_mode, you need to call `nni.training_update` as shown below when architecture weights should be updated. Updating architecture weights need `loss` for updating the weights as well as the training data (i.e., `feed_dict`) for it.
-```python
-for _ in range(num):
-    # here trains the architecture weights
-    """@nni.training_update(tf=tf, session=self.session, loss=loss, feed_dict=feed_dict)"""
-    loss, _ = self.session.run([loss_op, train_op])
-```
-**Note:** for enas_mode, oneshot_mode, and darts_mode, NNI only works on the training phase. They also have their own inference phase which is not handled by NNI. For enas_mode, the inference phase is to generate new subgraphs through the controller. For oneshot_mode, the inference phase is sampling new subgraphs randomly and choosing good ones. For darts_mode, the inference phase is pruning a proportion of candidates ops based on architecture weights.
-<a name="ENASMode"></a>
-### enas_mode
-In enas_mode, the compiled trial code builds the full graph (rather than subgraph), it receives a chosen architecture and training this architecture on the full graph for a mini-batch, then request another chosen architecture. It is supported by [NNI multi-phase](./MultiPhase.md).
-Specifically, for trials using tensorflow, we create and use tensorflow variable as signals, and tensorflow conditional functions to control the search space (full-graph) to be more flexible, which means it can be changed into different sub-graphs (multiple times) depending on these signals. [Here](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas/enas_mode) is an example for enas_mode.
-<a name="OneshotMode"></a>
-### oneshot_mode
-Below is the figure to show where dropout is added to the full graph for one layer in `nni.mutable_layers`, input 1-k are candidate inputs, the four ops are candidate ops.
-![](../../img/oneshot_mode.png)
-As suggested in the [paper][6], a dropout method is implemented to the inputs for every layer. The dropout rate is set to r^(1/k), where 0 < r < 1 is a hyper-parameter of the model (default to be 0.01) and k is number of optional inputs for a specific layer. The higher the fan-in, the more likely each possible input is to be dropped out. However, the probability of dropping out all optional_inputs of a layer is kept constant regardless of its fan-in. Suppose r = 0.05. If a layer has k = 2 optional_inputs then each one will independently be dropped out with probability 0.051/2 ≈ 0.22 and will be retained with probability 0.78. If a layer has k = 7 optional_inputs then each one will independently be dropped out with probability 0.051/7 ≈ 0.65 and will be retained with probability 0.35. In both cases, the probability of dropping out all of the layer's optional_inputs is 5%. The outputs of candidate ops are dropped out through the same way. [Here](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas/oneshot_mode) is an example for oneshot_mode.
-<a name="DartsMode"></a>
-### darts_mode
-Below is the figure to show where architecture weights are added to the full graph for one layer in `nni.mutable_layers`, output of each candidate op is multiplied by a weight which is called architecture weight.
-![](../../img/darts_mode.png)
-In `nni.training_update`, tensorflow MomentumOptimizer is used to train the architecture weights based on the pass `loss` and `feed_dict`. [Here](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas/darts_mode) is an example for darts_mode.
-### [__TODO__] Multiple trial jobs for One-Shot NAS
-One-Shot NAS usually has only one trial job with the full graph. However, running multiple such trial jobs leads to benefits. For example, in enas_mode multiple trial jobs could share the weights of the full graph to speedup the model training (or converge). Some One-Shot approaches are not stable, running multiple trial jobs increase the possibility of finding better models.
-NNI natively supports running multiple such trial jobs. The figure below shows how multiple trial jobs run on NNI.
-![](../../img/one-shot_training.png)
-=============================================================
-## System design of NAS on NNI
-### Basic flow of experiment execution
-NNI's annotation compiler transforms the annotated trial code to the code that could receive architecture choice and build the corresponding model (i.e., graph). The NAS search space can be seen as a full graph (here, full graph means enabling all the provided operators and connections to build a graph), the architecture chosen by the tuning algorithm is a subgraph in it. By default, the compiled trial code only builds and executes the subgraph.
-![](../../img/nas_on_nni.png)
-The above figure shows how the trial code runs on NNI. `nnictl` processes user trial code to generate a search space file and compiled trial code. The former is fed to tuner, and the latter is used to run trials. 
-[Simple example of NAS on NNI](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas).
-### [__TODO__] Weight sharing
-Sharing weights among chosen architectures (i.e., trials) could speedup model search. For example, properly inheriting weights of completed trials could speedup the converge of new trials. One-Shot NAS (e.g., ENAS, Darts) is more aggressive, the training of different architectures (i.e., subgraphs) shares the same copy of the weights in full graph.
-![](../../img/nas_weight_share.png)
-We believe weight sharing (transferring) plays a key role on speeding up NAS, while finding efficient ways of sharing weights is still a hot research topic. We provide a key-value store for users to store and load weights. Tuners and Trials use a provided KV client lib to access the storage.
-Example of weight sharing on NNI.
-## General tuning algorithms for NAS
-Like hyperparameter tuning, a relatively general algorithm for NAS is required. The general programming interface makes this task easier to some extent. We have an [RL tuner based on PPO algorithm](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/ppo_tuner) for NAS. We expect efforts from community to design and implement better NAS algorithms.
-## [__TODO__] Export best neural architecture and code
-After the NNI experiment is done, users could run `nnictl experiment export --code` to export the trial code with the best neural architecture.
-## Conclusion and Future work
-There could be different NAS algorithms and execution modes, but they could be supported with the same programming interface as demonstrated above.
-There are many interesting research topics in this area, both system and machine learning.
-[1]: https://arxiv.org/abs/1802.03268
-[2]: https://arxiv.org/abs/1707.07012
-[3]: https://arxiv.org/abs/1806.09055
-[4]: https://arxiv.org/abs/1806.10282
-[5]: https://arxiv.org/abs/1703.01041
-[6]: http://proceedings.mlr.press/v80/bender18a/bender18a.pdf
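Review note on the removed `GeneralNasInterfaces.md`: the oneshot_mode paragraph states a per-input dropout rate of r^(1/k), and its worked figures appear as `0.051/2` and `0.051/7`, which read most naturally as 0.05^(1/2) and 0.05^(1/7). A small sketch of that arithmetic follows; `input_drop_rate` is a hypothetical helper written for this note, not an NNI function.

```python
def input_drop_rate(r: float, k: int) -> float:
    """Per-input dropout probability r**(1/k) for a layer with k optional inputs."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return r ** (1.0 / k)


for k in (2, 7):
    p = input_drop_rate(0.05, k)
    # p ** k is the probability that all k independent inputs are dropped.
    print(f"k={k}: drop each input with p={p:.2f}, drop all with p={p ** k:.2f}")
```

With r = 0.05 this gives roughly 0.22 for k = 2 and 0.65 for k = 7, matching the figures in the removed text, and the probability of dropping all k inputs stays at 0.05 in both cases.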
--- a/docs/en_US/CommunitySharings/community_sharings.rst
+++ b/docs/en_US/CommunitySharings/community_sharings.rst
@@ -12,3 +12,4 @@ In addtion to the official tutorilas and examples, we encourage community contri
    Neural Architecture Search Comparison <NasComparision>
    Hyper-parameter Tuning Algorithm Comparsion <HpoComparision>
    Parallelizing Optimization for TPE <ParallelizingTpeSearch>
+    Automatically tune systems with NNI <TuningSystems>
--- a/docs/en_US/Compressor/AutoCompression.md
+++ b/docs/en_US/Compressor/AutoCompression.md
@@ -84,7 +84,7 @@ config_list_agp = [{'initial_sparsity': 0, 'final_sparsity': conv0_sparsity,
                   {'initial_sparsity': 0, 'final_sparsity': conv1_sparsity,
                    'start_epoch': 0, 'end_epoch': 3,
                    'frequency': 1,'op_name': 'conv1' },]
-PRUNERS = {'level':LevelPruner(model, config_list_level)，'agp':AGP_Pruner(model, config_list_agp)}
+PRUNERS = {'level':LevelPruner(model, config_list_level), 'agp':AGP_Pruner(model, config_list_agp)}
 pruner = PRUNERS(params['prune_method']['_name'])
 pruner.compress()
 ... # fine tuning
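
Review note on this hunk: the commit replaces the full-width comma in the `PRUNERS` literal, but the unchanged context line still reads `PRUNERS(params['prune_method']['_name'])`. Since `PRUNERS` is built as a dict, that call would presumably raise `TypeError`, and a subscript lookup looks like what was intended. The sketch below illustrates the lookup with stub classes; `LevelPrunerStub`, `AgpPrunerStub`, and the toy `model` and `params` values are hypothetical stand-ins, not the NNI pruner classes.

```python
class LevelPrunerStub:
    """Hypothetical stand-in; not nni's LevelPruner."""

    def __init__(self, model, config_list):
        self.model = model
        self.config_list = config_list

    def compress(self):
        return f"level-pruned {self.model}"


class AgpPrunerStub(LevelPrunerStub):
    """Hypothetical stand-in; not nni's AGP_Pruner."""

    def compress(self):
        return f"agp-pruned {self.model}"


model = "toy-model"
params = {"prune_method": {"_name": "agp"}}
PRUNERS = {
    "level": LevelPrunerStub(model, []),
    "agp": AgpPrunerStub(model, []),
}

# Index the dict with square brackets; calling it would raise TypeError.
pruner = PRUNERS[params["prune_method"]["_name"]]
result = pruner.compress()
```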

--- a/docs/en_US/NAS/NasInterface.md
+++ b/docs/en_US/NAS/NasInterface.md
@@ -2,8 +2,6 @@
 We are trying to support various NAS algorithms with unified programming interface, and it's still in experimental stage. It means the current programing interface might be updated in future.
-*previous [NAS annotation](../AdvancedFeature/GeneralNasInterfaces.md) interface will be deprecated soon.*
 ## Programming interface for user model
 The programming interface of designing and searching a model is often demanded in two scenarios.
@@ -100,7 +98,7 @@ trainer.export(file='./chosen_arch')
 Different trainers could have different input arguments depending on their algorithms. Please refer to [each trainer's code](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch) for detailed arguments. After training, users could export the best one of the found models through `trainer.export()`. No need to start an NNI experiment through `nnictl`.
-The supported trainers can be found [here](./Overview.md#supported-one-shot-nas-algorithms). A very simple example using NNI NAS API can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/simple/train.py).
+The supported trainers can be found [here](Overview.md#supported-one-shot-nas-algorithms). A very simple example using NNI NAS API can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/simple/train.py).
 ### Classic distributed search

--- a/docs/en_US/NAS/Overview.md
+++ b/docs/en_US/NAS/Overview.md
@@ -97,8 +97,6 @@ python3 retrain.py --arc-checkpoint ../pdarts/checkpoints/epoch_2.json
 NOTE, we are trying to support various NAS algorithms with unified programming interface, and it's in very experimental stage. It means the current programing interface may be updated in future.
-*previous [NAS annotation](../AdvancedFeature/GeneralNasInterfaces.md) interface will be deprecated soon.*
 ### Programming interface
 The programming interface of designing and searching a model is often demanded in two scenarios.

--- a/docs/en_US/Release.md
+++ b/docs/en_US/Release.md
@@ -63,7 +63,7 @@
    - Support Auto-Feature generator & selection    -Issue#877  -PR #1387
         + Provide auto feature interface
         + Tuner based on beam search
-         + [Add Pakdd example](./examples/trials/auto-feature-engineering/README.md)
+         + [Add Pakdd example](https://github.com/microsoft/nni/tree/master/examples/trials/auto-feature-engineering)
    - Add a parallel algorithm to improve the performance of TPE with large concurrency.  -PR #1052
    - Support multiphase for hyperband    -PR #1257
@@ -91,9 +91,9 @@
 * Documentation
    - Update the docs structure  -Issue #1231
-    - [Multi phase document improvement](./docs/en_US/AdvancedFeature/MultiPhase.md)   -Issue #1233  -PR #1242
+    - [Multi phase document improvement](AdvancedFeature/MultiPhase.md)   -Issue #1233  -PR #1242
         + Add configuration example
-    - [WebUI description improvement](./docs/en_US/Tutorial/WebUI.md)  -PR #1419
+    - [WebUI description improvement](Tutorial/WebUI.md)  -PR #1419
 ### Bug fix

--- a/docs/en_US/TrialExample/RocksdbExamples.md
+++ b/docs/en_US/TrialExample/RocksdbExamples.md
@@ -8,9 +8,9 @@ The performance of RocksDB is highly contingent on its tuning. However, because
 This example illustrates how to use NNI to search the best configuration of RocksDB for a `fillrandom` benchmark supported by a benchmark tool `db_bench`, which is an official benchmark tool provided by RocksDB itself. Therefore, before running this example, please make sure NNI is installed and [`db_bench`](https://github.com/facebook/rocksdb/wiki/Benchmarking-tools) is in your `PATH`. Please refer to [here](../Tutorial/QuickStart.md) for detailed information about installation and preparing of NNI environment, and [here](https://github.com/facebook/rocksdb/blob/master/INSTALL.md) for compiling RocksDB as well as `db_bench`.
-We also provide a simple script [`db_bench_installation.sh`](../../../examples/trials/systems/rocksdb-fillrandom/db_bench_installation.sh) helping to compile and install `db_bench` as well as its dependencies on Ubuntu. Installing RocksDB on other systems can follow the same procedure.
+We also provide a simple script [`db_bench_installation.sh`](https://github.com/microsoft/nni/tree/master/examples/trials/systems/rocksdb-fillrandom/db_bench_installation.sh) helping to compile and install `db_bench` as well as its dependencies on Ubuntu. Installing RocksDB on other systems can follow the same procedure.
-*code directory: [`example/trials/systems/rocksdb-fillrandom`](../../../examples/trials/systems/rocksdb-fillrandom)*
+*code directory: [`example/trials/systems/rocksdb-fillrandom`](https://github.com/microsoft/nni/tree/master/examples/trials/systems/rocksdb-fillrandom)*
 ## Experiment setup
@@ -39,7 +39,7 @@ In this example, the search space is specified by a `search_space.json` file as
 }
 ```
-*code directory: [`example/trials/systems/rocksdb-fillrandom/search_space.json`](../../../examples/trials/systems/rocksdb-fillrandom/search_space.json)*
+*code directory: [`example/trials/systems/rocksdb-fillrandom/search_space.json`](https://github.com/microsoft/nni/tree/master/examples/trials/systems/rocksdb-fillrandom/search_space.json)*
 ### Benchmark code
@@ -48,7 +48,7 @@ Benchmark code should receive a configuration from NNI manager, and report the c
 * Use `nni.get_next_parameter()` to get next system configuration.
 * Use `nni.report_final_result(metric)` to report the benchmark result.
-*code directory: [`example/trials/systems/rocksdb-fillrandom/main.py`](../../../examples/trials/systems/rocksdb-fillrandom/main.py)*
+*code directory: [`example/trials/systems/rocksdb-fillrandom/main.py`](https://github.com/microsoft/nni/tree/master/examples/trials/systems/rocksdb-fillrandom/main.py)*
 ### Config file
@@ -56,11 +56,11 @@ One could start a NNI experiment with a config file. A config file for NNI is a
 Here is an example of tuning RocksDB with SMAC algorithm:
-*code directory: [`example/trials/systems/rocksdb-fillrandom/config_smac.yml`](../../../examples/trials/systems/rocksdb-fillrandom/config_smac.yml)*
+*code directory: [`example/trials/systems/rocksdb-fillrandom/config_smac.yml`](https://github.com/microsoft/nni/tree/master/examples/trials/systems/rocksdb-fillrandom/config_smac.yml)*
 Here is an example of tuning RocksDB with TPE algorithm:
-*code directory: [`example/trials/systems/rocksdb-fillrandom/config_tpe.yml`](../../../examples/trials/systems/rocksdb-fillrandom/config_tpe.yml)*
+*code directory: [`example/trials/systems/rocksdb-fillrandom/config_tpe.yml`](https://github.com/microsoft/nni/tree/master/examples/trials/systems/rocksdb-fillrandom/config_tpe.yml)*
 Other tuners can be easily adopted in the same way. Please refer to [here](../Tuner/BuiltinTuner.md) for more information.
@@ -87,7 +87,7 @@ We ran these two examples on the same machine with following details:
 The detailed experiment results are shown in the below figure. Horizontal axis is sequential order of trials. Vertical axis is the metric, write OPS in this example. Blue dots represent trials for tuning RocksDB with SMAC tuner, and orange dots stand for trials for tuning RocksDB with TPE tuner. 
-![image](../../../examples/trials/systems/rocksdb-fillrandom/plot.png)
+![image](https://github.com/microsoft/nni/tree/master/examples/trials/systems/rocksdb-fillrandom/plot.png)
 Following table lists the best trials and corresponding parameters and metric obtained by the two tuners. Unsurprisingly, both of them found the same optimal configuration for `fillrandom` benchmark.
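
Reviewer's sketch for the RocksDB example documented above: the benchmark flow it describes (get a configuration with `nni.get_next_parameter()`, run `db_bench`, report with `nni.report_final_result(metric)`) might look roughly like this. It is a minimal sketch under stated assumptions, not the contents of `main.py`. The helpers `build_command` and `parse_ops_per_sec` are hypothetical, as is the assumption that every tuned parameter maps directly to a `--name=value` flag and that the benchmark prints a `<number> ops/sec` figure.

```python
import re
import subprocess


def build_command(params, benchmark="fillrandom"):
    """Turn a parameter dict into a db_bench command line (hypothetical helper)."""
    args = ["db_bench", "--benchmarks={}".format(benchmark)]
    # Assumption: each tuned parameter is passed as a --name=value flag.
    for name, value in sorted(params.items()):
        args.append("--{}={}".format(name, value))
    return args


def parse_ops_per_sec(output):
    """Return the first '<number> ops/sec' figure found in the output text."""
    match = re.search(r"(\d+(?:\.\d+)?)\s+ops/sec", output)
    if match is None:
        raise ValueError("no ops/sec figure found in benchmark output")
    return float(match.group(1))


def main():
    # Imported here so the helpers above can be used without NNI installed.
    import nni

    params = nni.get_next_parameter()
    result = subprocess.run(
        build_command(params), capture_output=True, text=True, check=True
    )
    # Search both streams, since the sketch does not assume where the figure is printed.
    nni.report_final_result(parse_ops_per_sec(result.stdout + result.stderr))
```

A trial script would call `main()` once per trial. Keeping command construction and output parsing in separate pure functions makes them checkable without a RocksDB installation.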

--- a/docs/en_US/Tuner/BuiltinTuner.md
+++ b/docs/en_US/Tuner/BuiltinTuner.md
@@ -43,7 +43,7 @@ TPE, as a black-box optimization, can be used in various scenarios and shows goo
 * **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
-Note: We have optimized the parallelism of TPE for large-scale trial-concurrency. For the principle of optimization or turn-on optimization, please refer to [TPE document](HyperoptTuner.md).
+Note: We have optimized the parallelism of TPE for large-scale trial-concurrency. For the principle of optimization or turn-on optimization, please refer to [TPE document](./HyperoptTuner.md).
 **Usage example:**

--- a/docs/en_US/Tuner/CustomizeAdvisor.md
+++ b/docs/en_US/Tuner/CustomizeAdvisor.md
@@ -35,4 +35,4 @@ advisor:
 ## Example
-Here we provide an [example](../../../examples/tuners/mnist_keras_customized_advisor).
+Here we provide an [example](https://github.com/microsoft/nni/tree/master/examples/tuners/mnist_keras_customized_advisor).
--- a/docs/en_US/Tutorial/Contributing.md
+++ b/docs/en_US/Tutorial/Contributing.md
@@ -43,6 +43,8 @@ A person looking to contribute can take up an issue by claiming it as a comment/
 * For docstrings, please refer to [numpydoc docstring guide](https://numpydoc.readthedocs.io/en/latest/format.html) and [pandas docstring guide](https://python-sprints.github.io/pandas/guide/pandas_docstring.html)
    * For function docstring, **description**, **Parameters**, and **Returns**/**Yields** are mandatory.
    * For class docstring, **description**, **Attributes** are mandatory.
+    * For docstring to describe `dict`, which is commonly used in our hyper-param format description, please refer to [RiboKit : Doc Standards - Internal Guideline on Writing Standards](https://ribokit.github.io/docs/text/)
 ## Documentation
 Our documentation is built with [sphinx](http://sphinx-doc.org/), supporting [Markdown](https://guides.github.com/features/mastering-markdown/) and [reStructuredText](http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html) format. All our documentations are placed under [docs/en_US](https://github.com/Microsoft/nni/tree/master/docs).

--- a/docs/en_US/examples.rst
+++ b/docs/en_US/examples.rst
@@ -10,3 +10,4 @@ Examples
    Scikit-learn<./TrialExample/SklearnExamples>
    EvolutionSQuAD<./TrialExample/SquadEvolutionExamples>
    GBDT<./TrialExample/GbdtExample>
+    RocksDB <./TrialExample/RocksdbExamples>
--- a/docs/en_US/feature_engineering.rst
+++ b/docs/en_US/feature_engineering.rst
-###################
 Feature Engineering
-###################
+===================
 We are glad to announce the alpha release for Feature Engineering toolkit on top of NNI,
 it's still in the experiment phase which might evolve based on usage feedback.

--- a/docs/en_US/nas.rst
+++ b/docs/en_US/nas.rst
@@ -20,6 +20,6 @@ For details, please refer to the following tutorials:
    Overview <NAS/Overview>
    NAS Interface <NAS/NasInterface>
-    ENAS <NAS/Overview>
+    ENAS <NAS/ENAS>
-    DARTS <NAS/Overview>
+    DARTS <NAS/DARTS>
    P-DARTS <NAS/Overview>
--- a/docs/en_US/reference.rst
+++ b/docs/en_US/reference.rst
@@ -10,3 +10,4 @@ References
    Configuration<Tutorial/ExperimentConfig>
    Search Space <Tutorial/SearchSpaceSpec>
    TrainingService <TrainingService/HowToImplementTrainingService>
+    Framework Library <SupportedFramework_Library>
--- a/docs/en_US/training_services.rst
+++ b/docs/en_US/training_services.rst
@@ -2,6 +2,7 @@ Introduction to NNI Training Services
 =====================================
 ..  toctree::
+    Overview <./TrainingService/SupportTrainingService>
    Local<./TrainingService/LocalMode>
    Remote<./TrainingService/RemoteMachineMode>
    OpenPAI<./TrainingService/PaiMode>

--- a/src/sdk/pynni/nni/batch_tuner/batch_tuner.py
+++ b/src/sdk/pynni/nni/batch_tuner/batch_tuner.py
@@ -24,13 +24,15 @@ class BatchTuner(Tuner):
    Examples
    --------
    The search space only be accepted like:
-    ```
-    {
+        ::
-        'combine_params': { '_type': 'choice',
+            {'combine_params':
+                { '_type': 'choice',
                            '_value': '[{...}, {...}, {...}]',
                }
            }
-    ```
    """
    def __init__(self):
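
Reviewer's sketch for the `BatchTuner` docstring above: the single-key search space it describes can be assembled programmatically. This is an illustrative sketch only. The helper name `make_batch_search_space` is hypothetical, and it assumes `_value` holds a list of configuration dicts, whereas the docstring shows it as a quoted placeholder.

```python
import json


def make_batch_search_space(configs):
    """Wrap explicit configurations in the 'combine_params' layout (hypothetical helper)."""
    configs = list(configs)
    if not configs:
        raise ValueError("at least one configuration is required")
    return {
        "combine_params": {
            "_type": "choice",
            # Assumption: each entry is one complete configuration to try.
            "_value": configs,
        }
    }


space = make_batch_search_space([
    {"optimizer": "Adam", "learning_rate": 0.001},
    {"optimizer": "SGD", "learning_rate": 0.01},
])
print(json.dumps(space, indent=4))
```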

--- a/src/sdk/pynni/nni/msg_dispatcher_base.py
+++ b/src/sdk/pynni/nni/msg_dispatcher_base.py
@@ -163,12 +163,15 @@ class MsgDispatcherBase(Recoverable):
        raise NotImplementedError('handle_initialize not implemented')
    def handle_request_trial_jobs(self, data):
-        """The message dispatcher is demanded to generate `data` trial jobs.
+        """The message dispatcher is demanded to generate ``data`` trial jobs.
-        These trial jobs should be sent via `send(CommandType.NewTrialJob, json_tricks.dumps(parameter))`,
+        These trial jobs should be sent via ``send(CommandType.NewTrialJob, json_tricks.dumps(parameter))``,
-        where `parameter` will be received by NNI Manager and eventually accessible to trial jobs as "next parameter".
+        where ``parameter`` will be received by NNI Manager and eventually accessible to trial jobs as "next parameter".
-        Semantically, message dispatcher should do this `send` exactly `data` times.
+        Semantically, message dispatcher should do this ``send`` exactly ``data`` times.
        The JSON sent by this method should follow the format of
+        ::
            {
                "parameter_id": 42
                "parameters": {
@@ -176,6 +179,7 @@ class MsgDispatcherBase(Recoverable):
                },
                "parameter_source": "algorithm" // optional
            }
        Parameters
        ----------
        data: int
@@ -211,6 +215,7 @@ class MsgDispatcherBase(Recoverable):
    def handle_report_metric_data(self, data):
        """Called when metric data is reported or new parameters are requested (for multiphase).
        When new parameters are requested, this method should send a new parameter.
        Parameters
        ----------
        data: dict
@@ -219,6 +224,7 @@ class MsgDispatcherBase(Recoverable):
            `REQUEST_PARAMETER` is used to request new parameters for multiphase trial job. In this case,
            the dict will contain additional keys: `trial_job_id`, `parameter_index`. Refer to `msg_dispatcher.py`
            as an example.
        Raises
        ------
        ValueError
@@ -228,6 +234,7 @@ class MsgDispatcherBase(Recoverable):
    def handle_trial_end(self, data):
        """Called when the state of one of the trials is changed
        Parameters
        ----------
        data: dict
@@ -235,5 +242,6 @@ class MsgDispatcherBase(Recoverable):
            trial_job_id: the id generated by training service.
            event: the job’s state.
            hyper_params: the string that is sent by message dispatcher during the creation of trials.
        """
        raise NotImplementedError('handle_trial_end not implemented')
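
Reviewer's sketch for the `handle_request_trial_jobs` docstring above: it asks the dispatcher to send exactly `data` messages in the documented format. The snippet below illustrates that contract under stated assumptions. It uses the standard `json` module rather than `json_tricks`, and `send` here is a hypothetical one-argument callable standing in for the real `send(CommandType.NewTrialJob, ...)` call. `make_new_trial_payload`, `request_trial_jobs`, and `next_parameters` are illustrative names, not NNI APIs. Note that the example in the docstring omits a comma after `"parameter_id": 42` and uses a `//` comment, so it is not strict JSON as written.

```python
import json


def make_new_trial_payload(parameter_id, parameters, parameter_source="algorithm"):
    """Serialize one trial's parameters in the documented message layout."""
    return json.dumps({
        "parameter_id": parameter_id,
        "parameters": parameters,
        "parameter_source": parameter_source,  # optional per the docstring
    })


def request_trial_jobs(count, next_parameters, send):
    """Call `send` exactly `count` times, once per generated trial job."""
    for parameter_id in range(count):
        send(make_new_trial_payload(parameter_id, next_parameters(parameter_id)))


# Usage: collect the payloads in a list instead of sending them to NNI Manager.
sent = []
request_trial_jobs(3, lambda i: {"batch_size": 16 * (i + 1)}, sent.append)
```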