You can use other compression algorithms in the ``nni.compression`` package. The algorithms are implemented in both PyTorch and TensorFlow (with partial support for TensorFlow), under ``nni.compression.pytorch`` and ``nni.compression.tensorflow`` respectively. You can refer to `Pruner <./Pruner.rst>`__ and `Quantizer <./Quantizer.rst>`__ for a detailed description of the supported algorithms. If you want to use knowledge distillation, refer to `KDExample <../TrialExample/KDExample.rst>`__.
A compression algorithm is first instantiated with a ``config_list`` passed in. The specification of this ``config_list`` will be described later.
The function call ``pruner.compress()`` modifies the user-defined model (in TensorFlow the model can be obtained with ``tf.get_default_graph()``\ , while in PyTorch the model is the defined model class) by inserting masks into it. Then when you run the model, the masks take effect. The masks can be adjusted at runtime by the algorithms.
Note that ``pruner.compress`` simply adds masks to the model weights; it does not include fine-tuning logic. If you want to fine-tune the compressed model, you need to write the fine-tuning logic yourself after calling ``pruner.compress``.
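For a concrete picture of this workflow, here is a minimal sketch using ``LevelPruner`` (the exact import path may differ across NNI versions, and ``model`` is assumed to be your defined PyTorch model):

.. code-block:: python

   from nni.algorithms.compression.pytorch.pruning import LevelPruner

   # prune 50% of the weights in all default op types
   config_list = [{'sparsity': 0.5, 'op_types': ['default']}]
   pruner = LevelPruner(model, config_list)
   model = pruner.compress()  # masks are inserted; add your own fine-tuning afterwards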
Specification of ``config_list``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
APIs for Updating Fine Tuning Status
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Some compression algorithms use epochs to control the progress of compression (e.g. `AGP <../Compression/Pruner.rst#agp-pruner>`__\ ), and some algorithms need to do something after every minibatch. Therefore, we provide two more APIs for users to invoke: ``pruner.update_epoch(epoch)`` and ``pruner.step()``.
``update_epoch`` should be invoked in every epoch, while ``step`` should be invoked after each minibatch. Note that most algorithms do not require calling these two APIs; please refer to each algorithm's document for details. For the algorithms that do not need them, calling them is allowed but has no effect.
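As a sketch, the two calls fit into an ordinary fine-tuning loop as follows (``model``, ``pruner``, ``train_loader``, ``optimizer``, ``criterion`` and ``num_epochs`` are assumed to be defined elsewhere):

.. code-block:: python

   for epoch in range(num_epochs):
       pruner.update_epoch(epoch)             # once per epoch
       for data, target in train_loader:
           optimizer.zero_grad()
           loss = criterion(model(data), target)
           loss.backward()
           optimizer.step()
           pruner.step()                      # once per minibatch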
You can also refer to the examples in ``/examples/feature_engineering/gbdt_selector/``.
**Requirement of fit FuncArgs**
...

*
   **num_boost_round** (int, required) - the number of boosting rounds. For details, refer to `here <https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.train.html#lightgbm.train>`__.
**Requirement of get_selected_features FuncArgs**
* **topk** (int, required) - the top-k most important features you want to select.
The algorithm in GradientFeatureSelector comes from `Feature Gradients: Scalable Feature Selection via Discrete Relaxation <https://arxiv.org/pdf/1908.10382.pdf>`__.
GradientFeatureSelector, a gradient-based search algorithm
for feature selection.
...
*
   **device** (str, optional, default = 'cpu') - 'cpu' to run on CPU and 'cuda' to run on GPU. Runs much faster on GPU.
**Requirement of fit FuncArgs**

...
*
   **groups** (array-like, optional, default = None) - groups of columns that must be selected as a unit, e.g. [0, 0, 1, 2] specifies that the first two columns are part of a group. Its shape is [n_features].
**Requirement of get_selected_features FuncArgs**
For now, the ``get_selected_features`` function has no parameters.
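Putting the pieces together, here is a hedged usage sketch (the import path follows NNI's feature-engineering examples and may differ across versions; ``X_train`` and ``y_train`` are assumed to be defined):

.. code-block:: python

   from nni.feature_engineering.gradient_selector import FeatureGradientSelector

   selector = FeatureGradientSelector(n_features=10)  # keep the 10 best features
   selector.fit(X_train, y_train)
   print(selector.get_selected_features())            # indices of selected columns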
...
def sample_final(self):
    return self.sample_search()  # use the same logic here. you can do something different
The complete example of random mutator can be found :githublink:`here <nni/nas/pytorch/mutator.py>`.
For advanced usage, e.g., if users want to manipulate the way modules in ``LayerChoice`` are executed, they can inherit ``BaseMutator`` and overwrite ``on_forward_layer_choice`` and ``on_forward_input_choice``\ , which are the callback implementations of ``LayerChoice`` and ``InputChoice`` respectively. Users can still use the property ``mutables`` to get all ``LayerChoice`` and ``InputChoice`` in the model code. For details, please refer to the :githublink:`reference <nni/nas/pytorch/>` to learn more.
.. tip::

   A useful application of random mutator is for debugging. Use ...
NAS-Bench-101 contains 423,624 unique neural networks, combined with 4 variations in number of epochs (4, 12, 36, 108), each of which is trained 3 times. It is a cell-wise search space, which constructs and stacks a cell by enumerating DAGs with at most 7 operators and no more than 9 connections. All operators can be chosen from ``CONV3X3_BN_RELU``\ , ``CONV1X1_BN_RELU`` and ``MAXPOOL3X3``\ , except the first operator (always ``INPUT``\ ) and the last operator (always ``OUTPUT``\ ).
...
NAS-Bench-201
-------------
* `Paper link <https://arxiv.org/abs/2001.00326>`__
* `Open-source API <https://github.com/D-X-Y/NAS-Bench-201>`__
* `Implementations <https://github.com/D-X-Y/AutoDL-Projects>`__
NAS-Bench-201 is a cell-wise search space that views nodes as tensors and edges as operators. The search space contains all possible densely-connected DAGs with 4 nodes, resulting in 15,625 candidates in total. Each operator (i.e., edge) is selected from a pre-defined operator set (\ ``NONE``\ , ``SKIP_CONNECT``\ , ``CONV_1X1``\ , ``CONV_3X3`` and ``AVG_POOL_3X3``\ ). Training approaches vary in the dataset used (CIFAR-10, CIFAR-100, ImageNet) and the number of epochs scheduled (12 and 200). Each combination of architecture and training approach is repeated 1-3 times with different random seeds.
...
NDS
---
* `Paper link <https://arxiv.org/abs/1905.13214>`__
* `Open-source <https://github.com/facebookresearch/nds>`__
*On Network Design Spaces for Visual Recognition* released trial statistics of over 100,000 configurations (models + hyper-parameters) sampled from multiple model families, including vanilla (a feedforward network loosely inspired by VGG), ResNet and ResNeXt (residual basic block and residual bottleneck block), and NAS cells (following popular designs from NASNet, Amoeba, PNAS, ENAS and DARTS). Most configurations are trained only once with a fixed seed, except a few that are trained twice or three times.
In this work, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of the prioritized path, which refers to the architecture candidates exhibiting superior performance during training. Distilling knowledge from the prioritized paths boosts the training of subnetworks. Since the prioritized paths change on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop. The discovered architectures achieve superior performance compared to the recent `MobileNetV3 <https://arxiv.org/abs/1905.02244>`__ and `EfficientNet <https://arxiv.org/abs/1905.11946>`__ families under aligned settings.
...
Data Preparation
----------------
You need to first download `ImageNet-2012 <http://www.image-net.org/>`__ to the folder ``./data/imagenet`` and move the validation set to the subfolder ``./data/imagenet/val``. To move the validation set, you could use `the following script <https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh>`__.
Put the ImageNet data in ``./data``. It should look like the following:
...
Quick Start
-----------
1. Search
^^^^^^^^^
First, build the environment for searching.
...
The searched architectures need to be retrained to obtain the final model. The final model is saved in ``.pth.tar`` format. Retraining code will be released soon.
2. Retrain
^^^^^^^^^^^
To train the searched architectures, you need to configure the parameter ``MODEL_SELECTION`` to specify the model's FLOPs. To specify which model to train, you should add ``MODEL_SELECTION`` in ``./configs/retrain.yaml``. You can select one from [14, 43, 112, 287, 481, 604], which stand for models with different FLOPs (in millions).
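For instance, a hypothetical snippet of ``./configs/retrain.yaml`` (only the key discussed here is shown; the real file contains other options):

.. code-block:: yaml

   MODEL_SELECTION: 43   # choose from [14, 43, 112, 287, 481, 604] (MFLOPs)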
...
- PPO Tuner is a reinforcement learning tuner based on the PPO algorithm. `Reference Paper <https://arxiv.org/abs/1707.06347>`__
...
   * - Name
     - Brief Introduction of Algorithm
   * - `ENAS <ENAS.rst>`__
     - `Efficient Neural Architecture Search via Parameter Sharing <https://arxiv.org/abs/1802.03268>`__. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. It uses parameter sharing between child models to achieve fast speed and excellent performance.
   * - `DARTS <DARTS.rst>`__
     - `DARTS: Differentiable Architecture Search <https://arxiv.org/abs/1806.09055>`__ introduces a novel algorithm for differentiable network architecture search on bilevel optimization.
   * - `P-DARTS <PDARTS.rst>`__
     - `Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation <https://arxiv.org/abs/1904.12760>`__ is based on DARTS. It introduces an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure.
   * - `SPOS <SPOS.rst>`__
     - `Single Path One-Shot Neural Architecture Search with Uniform Sampling <https://arxiv.org/abs/1904.00420>`__ constructs a simplified supernet trained with a uniform path sampling method and applies an evolutionary algorithm to efficiently search for the best-performing architectures.
   * - `CDARTS <CDARTS.rst>`__
     - `Cyclic Differentiable Architecture Search <https://arxiv.org/pdf/2006.10724.pdf>`__ builds a cyclic feedback mechanism between the search and evaluation networks. It introduces a cyclic differentiable architecture search framework which integrates the two networks into a unified architecture.
   * - `ProxylessNAS <Proxylessnas.rst>`__
     - `ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware <https://arxiv.org/abs/1812.00332>`__. It removes the proxy and directly learns architectures for large-scale target tasks and target hardware platforms.
   * - `TextNAS <TextNAS.rst>`__
     - `TextNAS: A Neural Architecture Search Space tailored for Text Representation <https://arxiv.org/pdf/1912.10729.pdf>`__. It is a neural architecture search algorithm tailored for text representation.
   * - `Cream <Cream.rst>`__
     - `Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search <https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf>`__. It is a new NAS algorithm distilling prioritized paths in the search space, without using evolutionary algorithms. It achieves competitive performance on ImageNet, especially for small models (e.g., <200M FLOPs).
One-shot algorithms run **standalone without nnictl**. NNI supports both PyTorch and TensorFlow 2.x.
The implementation in NNI is based on the `official implementation <https://github.com/mit-han-lab/ProxylessNAS>`__. The official implementation supports two training approaches, gradient descent and RL-based, and supports different targeted hardware, including 'mobile', 'cpu', 'gpu8' and 'flops'. Our current implementation in NNI supports the gradient descent training approach, but does not yet support different hardware. Complete support is ongoing.
Below we describe the implementation details. Like other one-shot NAS algorithms in NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. So that users can flexibly define their own search space and use the built-in ProxylessNAS training approach, we put the specified search space in the :githublink:`example code <examples/nas/proxylessnas>` using the :githublink:`NNI NAS interface <nni/algorithms/nas/pytorch/proxylessnas>`.
Currently, only ENAS and DARTS support visualization. The examples of `ENAS <./ENAS.rst>`__ and `DARTS <./DARTS.rst>`__ demonstrate how to enable visualization in your code, namely, adding this before ``trainer.train()``\ :
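The call in question looks like the sketch below (following NNI's NAS visualization usage; ``trainer`` is assumed to be an ENAS or DARTS trainer instance):

.. code-block:: python

   trainer.enable_visualization()
   trainer.train()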
Generally, a search space describes candidate architectures from which users want to find the best one. Different search algorithms, whether classic NAS or one-shot NAS, can be applied to the search space. NNI provides APIs to unify the expression of neural architecture search spaces.
...
# ... same ...
return output
Input choice can be thought of as a callable module that receives a list of tensors and outputs the concatenation/sum/mean of some of them (sum by default), or ``None`` if none is selected. Like layer choices, input choices should be initialized in ``__init__`` and called in ``forward``. This is to allow search algorithms to identify these choices and do necessary preparations.
``LayerChoice`` and ``InputChoice`` are both **mutables**. Mutable means "changeable". As opposed to traditional deep learning layers/modules, which have fixed operation types once defined, models with mutables are essentially a series of possible models.
Users can specify a **key** for each mutable. By default, NNI will assign one for you that is globally unique, but in case users want to share choices (for example, there are two ``LayerChoice``\ s with the same candidate operations and you want them to have the same choice, i.e., if the first one chooses the i-th op, the second one also chooses the i-th op), they can give them the same key. The key marks the identity of this choice and will be used in the dumped checkpoint. So if you want to increase the readability of your exported architecture, manually assigning keys to each mutable would be a good idea. For advanced usage of mutables (e.g., ``LayerChoice`` and ``InputChoice``\ ), see `Mutables <./NasReference.rst>`__.
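A sketch of key sharing, assuming the ``nni.nas.pytorch.mutables`` API (both mutables below always receive the same choice):

.. code-block:: python

   import torch.nn as nn
   from nni.nas.pytorch import mutables

   class TwinBranches(nn.Module):
       def __init__(self):
           super().__init__()
           candidates = lambda: [nn.Conv2d(16, 16, 3, padding=1), nn.Identity()]
           # the shared key ties the two choices together
           self.branch1 = mutables.LayerChoice(candidates(), key='shared_op')
           self.branch2 = mutables.LayerChoice(candidates(), key='shared_op')

       def forward(self, x):
           return self.branch1(x) + self.branch2(x)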
With the search space defined, the next step is searching for the best model from it. Please refer to `classic NAS algorithms <./ClassicNas.rst>`__ and `one-shot NAS algorithms <./NasGuide.rst>`__ for how to search from your defined search space.
`Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__ is a new framework to support neural architecture search and hyper-parameter tuning. It allows users to express various search spaces with high flexibility, to reuse many SOTA search algorithms, and to leverage system-level optimizations to speed up the search process. This framework provides the following new user experiences.
* Search space can be expressed directly in user model code. A tuning space can be expressed along with defining a model.
* Neural architecture candidates and hyper-parameter candidates are supported in a more user-friendly manner in an experiment.
* The experiment can be launched directly from Python code.
*We are working on migrating* `our previous NAS framework <../Overview.rst>`__ *to the Retiarii framework. Thus, this feature is still experimental. We recommend users to try the new framework and provide valuable feedback for us to improve it. The old framework is still supported for now.*
.. contents::
There are mainly two steps to start an experiment for your neural architecture search task. First, define the model space you want to explore. Second, choose a search method to explore your defined model space.
Define your Model Space
-----------------------
A model space is defined by users to express a set of models that they want to explore and believe includes good-performing models. In this framework, a model space is defined with two parts: a base model and possible mutations on the base model.
Define Base Model
^^^^^^^^^^^^^^^^^
Defining a base model is almost the same as defining a PyTorch (or TensorFlow) model. There are only two small differences.
* Replace the code ``import torch.nn as nn`` with ``import nni.retiarii.nn.pytorch as nn`` for PyTorch modules, such as ``nn.Conv2d`` and ``nn.ReLU``.
* Some **user-defined** modules should be decorated with ``@blackbox_module``. For example, user-defined modules used in ``LayerChoice`` should be decorated. Users can refer to `here <#blackbox-module>`__ for detailed usage instructions for ``@blackbox_module``.
Below is a very simple example of defining a base model; it is almost the same as defining a PyTorch model.
.. code-block:: python

   import torch.nn.functional as F
   import nni.retiarii.nn.pytorch as nn

   class MyModule(nn.Module):
       def __init__(self):
           super().__init__()
           self.conv = nn.Conv2d(32, 1, 5)
           self.pool = nn.MaxPool2d(kernel_size=2)

       def forward(self, x):
           return self.pool(self.conv(x))

   class Model(nn.Module):
       def __init__(self):
           super().__init__()
           self.mymodule = MyModule()

       def forward(self, x):
           return F.relu(self.mymodule(x))
Users can refer to :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>` and :githublink:`Mnasnet base model <test/retiarii_test/mnasnet/base_mnasnet.py>` for more complicated examples.
Define Model Mutations
^^^^^^^^^^^^^^^^^^^^^^
A base model is only one concrete model, not a model space. We provide APIs and primitives for users to express how the base model can be mutated, i.e., a model space that includes many models.
**Express mutations in an inlined manner**
For ease of use and backward compatibility, we provide some APIs for users to easily express possible mutations after defining a base model. The APIs can be used just like PyTorch modules.

* ``nn.LayerChoice``. It allows users to put several candidate operations (e.g., PyTorch modules), one of which is chosen in each explored model. *Note that if a candidate is a user-defined module, it should be decorated as a `blackbox module <#blackbox-module>`__. In the following example, ``ops.PoolBN`` and ``ops.SepConv`` should be decorated.*
.. code-block:: python

   # import nni.retiarii.nn.pytorch as nn
   # declared in `__init__`
   self.layer = nn.LayerChoice([
       ops.PoolBN('max', channels, 3, stride, 1),
       ops.SepConv(channels, channels, 3, stride, 1),
       nn.Identity()
   ])

   # invoked in `forward` function
   out = self.layer(x)
* ``nn.InputChoice``. It is mainly for choosing (or trying) different connections. It takes several tensors and chooses ``n_chosen`` tensors from them.
.. code-block:: python

   # import nni.retiarii.nn.pytorch as nn
   # declared in `__init__`
   self.input_switch = nn.InputChoice(n_chosen=1)

   # invoked in `forward` function, choose one from the three
   out = self.input_switch([tensor1, tensor2, tensor3])
* ``nn.ValueChoice``. It is for choosing one value from some candidate values. It can only be used as an input argument of the modules in ``nn.modules`` and of user-defined modules decorated with ``@blackbox_module``. *Note that it has not been officially supported yet.*
A detailed API description and usage can be found `here <./ApiReference.rst>`__\ . Examples of using these APIs can be found in the :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>`.
**Express mutations with mutators**
Though easy to use, inline mutations have limited expressiveness; some model spaces cannot be expressed with them. To improve expressiveness and flexibility, we provide primitives for users to write a *Mutator*, which expresses how to mutate the base model more flexibly. A mutator stands above the base model and thus has the full ability to edit the model.

Users can instantiate several mutators as sketched below; the mutators will be applied to the base model sequentially, one after another, to sample a new model.
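A sketch of instantiating mutators (``BlockMutator`` and the labels are illustrative; define your own mutator as described next):

.. code-block:: python

   applied_mutators = []
   applied_mutators.append(BlockMutator('mutable_0'))
   applied_mutators.append(BlockMutator('mutable_1'))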
``BlockMutator`` is defined by users to express how to mutate the base model. A user-defined mutator should inherit the ``Mutator`` class and implement the mutation logic in the member function ``mutate``, as in the sketch below.
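A sketch of such a mutator (``candidate_op_list`` and the ``type``/``params`` fields of its entries are assumed user-defined conventions):

.. code-block:: python

   from nni.retiarii import Mutator

   class BlockMutator(Mutator):
       def __init__(self, target: str):
           super().__init__()
           self.target = target

       def mutate(self, model):
           # find the node(s) labeled with `target` in the graph IR
           nodes = model.get_nodes_by_label(self.target)
           for node in nodes:
               chosen_op = self.choice(candidate_op_list)
               node.update_operation(chosen_op.type, chosen_op.params)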
The input of ``mutate`` is the graph IR of the base model (please refer to `here <./ApiReference.rst>`__ for the format and APIs of the IR); users can mutate the graph with its member functions (e.g., ``get_nodes_by_label``, ``update_operation``). The mutation operations can be combined with the API ``self.choice`` in order to express a set of possible mutations. In the above example, the node's operation can be changed to any operation from ``candidate_op_list``.
Use a placeholder to make mutation easier: ``nn.Placeholder``. If you want to mutate a subgraph or node of your model, you can define a placeholder in the model to represent the subgraph or node. Then, use a mutator to replace the placeholder with real modules.
.. code-block:: python

   ph = nn.Placeholder(label='mutable_0',
       related_info={
           'kernel_size_options': [1, 3, 5],
           'n_layer_options': [1, 2, 3, 4],
           'exp_ratio': exp_ratio,
           'stride': stride
       }
   )
``label`` is used by the mutator to identify this placeholder; ``related_info`` is the information required by the mutator. As ``related_info`` is a dict, it can include any information that users want to pass to the user-defined mutator. The complete example code can be found in the :githublink:`Mnasnet base model <test/retiarii_test/mnasnet/base_mnasnet.py>`.
Explore the Defined Model Space
-------------------------------
After the model space is defined, it is time to explore it. Users can choose a proper search and training approach to explore the model space.
Create a Trainer and Exploration Strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**Classic search approach:**

In this approach, a trainer is used to train each explored model, while a strategy is used to sample the models. Both a trainer and a strategy are required to explore the model space.
**Oneshot (weight-sharing) search approach:**

In this approach, users only need a one-shot trainer, because this trainer takes charge of both search and training.
The following table lists the available trainers and strategies.
.. list-table::
   :header-rows: 1
   :widths: auto

   * - Trainer
     - Strategy
     - Oneshot Trainer
   * - PyTorchImageClassificationTrainer
     - TPEStrategy
     - DartsTrainer
   * - PyTorchMultiModelTrainer
     - RandomStrategy
     - EnasTrainer
   * -
     -
     - ProxylessTrainer
   * -
     -
     - SinglePathTrainer (RandomTrainer)
Their usage and API documentation can be found `here <./ApiReference.rst>`__\ .

Here is a simple example of using a trainer and a strategy.
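The sketch below follows NNI's Retiarii examples of this period; the import paths and trainer arguments are assumptions and may differ across versions:

.. code-block:: python

   from nni.retiarii.trainer import PyTorchImageClassificationTrainer
   from nni.retiarii.strategies import TPEStrategy

   trainer = PyTorchImageClassificationTrainer(
       base_model,                                    # defined earlier
       dataset_cls='MNIST',
       dataset_kwargs={'root': 'data/mnist', 'download': True},
       dataloader_kwargs={'batch_size': 32},
       optimizer_kwargs={'lr': 1e-3},
       trainer_kwargs={'max_epochs': 1})
   simple_strategy = TPEStrategy()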
Users can refer to `this document <./WriteTrainer.rst>`__ for how to write a new trainer, and refer to `this document <./WriteStrategy.rst>`__ for how to write a new strategy.
Set up an Experiment
^^^^^^^^^^^^^^^^^^^^
After all the above are prepared, it is time to start an experiment to do the model search. We designed a unified interface for users to start their experiments. An example is shown below.
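A sketch following the Retiarii examples of this era (class and field names are assumed from that API; adjust to your NNI version):

.. code-block:: python

   from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig

   exp = RetiariiExperiment(base_model, trainer, applied_mutators, simple_strategy)
   exp_config = RetiariiExeConfig('local')
   exp_config.experiment_name = 'mnist_search'
   exp_config.trial_concurrency = 2
   exp_config.max_trial_number = 10
   exp.run(exp_config, 8081)   # 8081 is the web UI port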
This code starts an NNI experiment. Note that if inline mutation is used, ``applied_mutators`` should be ``None``.
The complete code of a simple MNIST example can be found :githublink:`here <test/retiarii_test/mnist/test.py>`.
Visualize your experiment
^^^^^^^^^^^^^^^^^^^^^^^^^
Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment. For example, open ``localhost:8081`` in your browser, where 8081 is the port that you set in ``exp.run``. Please refer to `here <../../Tutorial/WebUI.rst>`__ for details. If users are using a oneshot trainer, they can refer to `here <../Visualization.rst>`__ for how to visualize their experiments.
Export the best model found in your experiment
----------------------------------------------
If you are using the *classic search approach*, you can simply find the best model on the WebUI.

If you are using the *oneshot (weight-sharing) search approach*, you can invoke ``exp.export_top_models`` to output the several best models found in the experiment.
Advanced and FAQ
----------------
.. _blackbox-module:
**Blackbox Module**
To understand the decorator ``blackbox_module``, we first briefly explain how our framework works: it converts the user-defined model to a graph representation (called graph IR), and each instantiated module is converted to a subgraph. Then user-defined mutations are applied to the graph to generate new graphs. Each new graph is then converted back to PyTorch code and executed. ``@blackbox_module`` here means that the module will not be converted to a subgraph but instead becomes a single graph node. That is, the module will not be unfolded any more. Users should/can decorate a user-defined module class in the following cases:
* When a module class cannot be successfully converted to a subgraph due to some implementation issues. For example, our framework currently does not support ad-hoc loops; if there is an ad-hoc loop in a module's ``forward``, the class should be decorated as a blackbox module. The following ``MyModule`` should be decorated.
.. code-block:: python

   @blackbox_module
   class MyModule(nn.Module):
       def __init__(self):
           ...
       def forward(self, x):
           for i in range(10):  # <- ad-hoc loop
               ...
* The candidate ops in ``LayerChoice`` should be decorated as blackbox modules. For example, in ``self.op = nn.LayerChoice([Op1(...), Op2(...), Op3(...)])``, ``Op1``, ``Op2`` and ``Op3`` should be decorated if they are user-defined modules.
* When users want to use ``ValueChoice`` in a module's input argument, the module should be decorated as a blackbox module. For example, in ``self.conv = MyConv(kernel_size=nn.ValueChoice([1, 3, 5]))``, ``MyConv`` should be decorated.
* If no mutation is targeted on a module, this module *can be* decorated as a blackbox module.
Trainers are necessary to evaluate the performance of newly explored models. In the NAS scenario, this further divides into two use cases:
1. **Classic trainers**: trainers that are used to train and evaluate one single model.
2. **One-shot trainers**: trainers that handle training and searching simultaneously, from an end-to-end perspective.
Classic trainers
----------------
All classic trainers need to inherit ``nni.retiarii.trainer.BaseTrainer``, implement the ``fit`` method, and be decorated with ``@register_trainer`` if they are intended to be used together with Retiarii. The decorator serializes the trainer and its arguments to fit the requirements of NNI.

The init function of a trainer should take the model as its first argument, and the rest of the arguments should be named (``*args`` and ``**kwargs`` may not work as expected) and JSON-serializable. This means that, currently, passing a complex object like ``torchvision.datasets.ImageNet()`` is not supported. Trainers should use NNI's standard API to communicate with tuning algorithms. This includes ``nni.report_intermediate_result`` for periodic metrics and ``nni.report_final_result`` for final metrics. An illustrative sketch follows.
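A hedged sketch of a classic trainer (the decorator's import path is assumed; ``train_one_epoch`` and ``evaluate`` are hypothetical user-defined helpers):

.. code-block:: python

   import nni
   from nni.retiarii import register_trainer
   from nni.retiarii.trainer import BaseTrainer

   @register_trainer
   class MyTrainer(BaseTrainer):
       def __init__(self, model, learning_rate=1e-3, epochs=2):
           super().__init__()
           self.model = model              # model first; the rest JSON-serializable
           self.learning_rate = learning_rate
           self.epochs = epochs

       def fit(self):
           for epoch in range(self.epochs):
               train_one_epoch(self.model, self.learning_rate)       # user-defined
               nni.report_intermediate_result(evaluate(self.model))  # user-defined
           nni.report_final_result(evaluate(self.model))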
One-shot trainers should inherit ``nni.retiarii.trainer.BaseOneShotTrainer``, which is basically the same as ``BaseTrainer`` but with one extra method, ``export()``, which is expected to return the searched best architecture.
Writing a one-shot trainer is very different from writing a classic trainer. First of all, there are no more restrictions on the init method's arguments; any Python arguments are acceptable. Secondly, the model fed into a one-shot trainer might be a model with Retiarii-specific modules, such as ``LayerChoice`` and ``InputChoice``. Such a model cannot directly forward-propagate, and the trainer needs to decide how to handle those modules.
A typical example is ``DartsTrainer``, where learnable parameters are used to combine multiple choices in ``LayerChoice``. Retiarii provides easy-to-use utility functions for module-replacement purposes, namely ``replace_layer_choice`` and ``replace_input_choice``. A simplified example is as follows:
.. code-block:: python

   from nni.retiarii.trainer import BaseOneShotTrainer
   from nni.retiarii.trainer.pytorch.utils import replace_layer_choice, replace_input_choice
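The continuation below is a hedged sketch rather than NNI's exact ``DartsTrainer``: each ``LayerChoice`` is replaced by a module that mixes candidate outputs with learnable architecture weights (the ``replace_layer_choice`` call signature is an assumption):

.. code-block:: python

   import torch
   import torch.nn as nn

   class DartsLayerChoice(nn.Module):
       def __init__(self, layer_choice):
           super().__init__()
           # collect candidate modules; every nn.Module supports named_children()
           self.op_choices = nn.ModuleDict(dict(layer_choice.named_children()))
           self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.op_choices)))

       def forward(self, x):
           weights = torch.softmax(self.alpha, dim=-1)
           return sum(w * op(x) for w, op in zip(weights, self.op_choices.values()))

   class MyDartsTrainer(BaseOneShotTrainer):
       def __init__(self, model):          # any Python arguments are acceptable here
           self.model = model
           # swap every LayerChoice in the model for the differentiable version
           replace_layer_choice(self.model, DartsLayerChoice)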
...
Hyperparameter Tuning
^^^^^^^^^^^^^^^^^^^^^
This is a core and basic feature of NNI; we provide many popular `automatic tuning algorithms <Tuner/BuiltinTuner.rst>`__ (i.e., tuners) and `early stop algorithms <Assessor/BuiltinAssessor.rst>`__ (i.e., assessors). You can follow the `Quick Start <Tutorial/QuickStart.rst>`__ to tune your model (or system). Basically, you follow the three steps above and then start an NNI experiment.