[Retiarii] refactor of NAS doc and make python engine default (#3785)

Co-authored-by: Scarlett Li <39592018+scarlett2018@users.noreply.github.com> Co-authored-by: kvartet <48014605+kvartet@users.noreply.github.com>

Unverified Commit 4146c715 authored Jun 13, 2021 by QuanluZhang Committed by GitHub Jun 13, 2021
20 changed files
--- a/docs/en_US/NAS/Advanced.rst
+++ b/docs/en_US/NAS/Advanced.rst
-Customize a NAS Algorithm
-=========================
-
-Extend the Ability of One-Shot Trainers
---------------------------------------
-
-Users might want to do multiple things if they are using the trainers on real tasks, for example, distributed training, half-precision training, logging periodically, writing tensorboard, dumping checkpoints and so on. As mentioned previously, some trainers do have support for some of the items listed above; others might not. Generally, there are two recommended ways to add anything you want to an existing trainer: inherit an existing trainer and override, or copy an existing trainer and modify.
-
-Either way, you are walking into the scope of implementing a new trainer. Basically, implementing a one-shot trainer is no different from any traditional deep learning trainer, except that a new concept called mutator will reveal itself. So that the implementation will be different in at least two places:
-
-
-* Initialization
-
-.. code-block:: python
-
-   model = Model()
-   mutator = MyMutator(model)
-
-
-* Training
-
-.. code-block:: python
-
-   for _ in range(epochs):
-       for x, y in data_loader:
-           mutator.reset()  # reset all the choices in model
-           out = model(x)  # like traditional model
-           loss = criterion(out, y)
-           loss.backward()
-           # no difference below
-
-To demonstrate what mutators are for, we need to know how one-shot NAS normally works. Usually, one-shot NAS "co-optimize model weights and architecture weights". It repeatedly: sample an architecture or combination of several architectures from the supernet, train the chosen architectures like traditional deep learning model, update the trained parameters to the supernet, and use the metrics or loss as some signal to guide the architecture sampler. The mutator, is the architecture sampler here, often defined to be another deep-learning model. Therefore, you can treat it as any model, by defining parameters in it and optimizing it with optimizers. One mutator is initialized with exactly one model. Once a mutator is binded to a model, it cannot be rebinded to another model.
-
-``mutator.reset()`` is the core step. That's where all the choices in the model are finalized. The reset result will be always effective, until the next reset flushes the data. After the reset, the model can be seen as a traditional model to do forward-pass and backward-pass.
-
-Finally, mutators provide a method called ``mutator.export()`` that export a dict with architectures to the model. Note that currently this dict this a mapping from keys of mutables to tensors of selection. So in order to dump to json, users need to convert the tensors explicitly into python list.
-
-Meanwhile, NNI provides some useful tools so that users can implement trainers more easily. See `Trainers <./NasReference.rst>`__ for details.
-
-Implement New Mutators
----------------------
-
-To start with, here is the pseudo-code that demonstrates what happens on ``mutator.reset()`` and ``mutator.export()``.
-
-.. code-block:: python
-
-   def reset(self):
-       self.apply_on_model(self.sample_search())
-
-.. code-block:: python
-
-   def export(self):
-       return self.sample_final()
-
-On reset, a new architecture is sampled with ``sample_search()`` and applied on the model. Then the model is trained for one or more steps in search phase. On export, a new architecture is sampled with ``sample_final()`` and **do nothing to the model**. This is either for checkpoint or exporting the final architecture.
-
-The requirements of return values of ``sample_search()`` and ``sample_final()`` are the same: a mapping from mutable keys to tensors. The tensor can be either a BoolTensor (true for selected, false for negative), or a FloatTensor which applies weight on each candidate. The selected branches will then be computed (in ``LayerChoice``\ , modules will be called; in ``InputChoice``\ , it's just tensors themselves), and reduce with the reduction operation specified in the choices. For most algorithms only worrying about the former part, here is an example of your mutator implementation.
-
-.. code-block:: python
-
-   class RandomMutator(Mutator):
-       def __init__(self, model):
-           super().__init__(model)  # don't forget to call super
-           # do something else
-
-       def sample_search(self):
-           result = dict()
-           for mutable in self.mutables:  # this is all the mutable modules in user model
-               # mutables share the same key will be de-duplicated
-               if isinstance(mutable, LayerChoice):
-                   # decided that this mutable should choose `gen_index`
-                   gen_index = np.random.randint(mutable.length)
-                   result[mutable.key] = torch.tensor([i == gen_index for i in range(mutable.length)], 
-                                                      dtype=torch.bool)
-               elif isinstance(mutable, InputChoice):
-                   if mutable.n_chosen is None:  # n_chosen is None, then choose any number
-                       result[mutable.key] = torch.randint(high=2, size=(mutable.n_candidates,)).view(-1).bool()
-                   # else do something else
-           return result
-
-       def sample_final(self):
-           return self.sample_search()  # use the same logic here. you can do something different
-
-The complete example of random mutator can be found :githublink:`here <nni/nas/pytorch/mutator.py>`.
-
-For advanced usages, e.g., users want to manipulate the way modules in ``LayerChoice`` are executed, they can inherit ``BaseMutator``\ , and overwrite ``on_forward_layer_choice`` and ``on_forward_input_choice``\ , which are the callback implementation of ``LayerChoice`` and ``InputChoice`` respectively. Users can still use property ``mutables`` to get all ``LayerChoice`` and ``InputChoice`` in the model code. For details, please refer to :githublink:`reference <nni/nas/pytorch/>` here to learn more.
-
-.. tip::
-    A useful application of random mutator is for debugging. Use
-
-    .. code-block:: python
-
-        mutator = RandomMutator(model)
-        mutator.reset()
-
-    will immediately set one possible candidate in the search space as the active one.
-
-Implemented a Distributed NAS Tuner
-----------------------------------
-
-Before learning how to write a distributed NAS tuner, users should first learn how to write a general tuner. read `Customize Tuner <../Tuner/CustomizeTuner.rst>`__ for tutorials.
-
-When users call "\ `nnictl ss_gen <../Tutorial/Nnictl.rst>`__\ " to generate search space file, a search space file like this will be generated:
-
-.. code-block:: json
-
-   {
-       "key_name": {
-           "_type": "layer_choice",
-           "_value": ["op1_repr", "op2_repr", "op3_repr"]
-       },
-       "key_name": {
-           "_type": "input_choice",
-           "_value": {
-               "candidates": ["in1_key", "in2_key", "in3_key"],
-               "n_chosen": 1
-           }
-       }
-   }
-
-This is the exact search space tuners will receive in ``update_search_space``. It's then tuners' responsibility to interpret the search space and generate new candidates in ``generate_parameters``. A valid "parameters" will be in the following format:
-
-.. code-block:: json
-
-   {
-       "key_name": {
-           "_value": "op1_repr",
-           "_idx": 0
-       },
-       "key_name": {
-           "_value": ["in2_key"],
-           "_idex": [1]
-       }
-   }
-
-Send it through ``generate_parameters``\ , and the tuner would look like any HPO tuner. Refer to `SPOS <./SPOS.rst>`__ example code for an example.
--- a/docs/en_US/NAS/retiarii/ApiReference.rst
+++ b/docs/en_US/NAS/retiarii/ApiReference.rst
@@ -75,8 +75,8 @@ Oneshot Trainers
 ..  autoclass:: nni.retiarii.oneshot.pytorch.SinglePathTrainer
    :members:

-Strategies
----------
+Exploration Strategies
+----------------------

 ..  autoclass:: nni.retiarii.strategy.Random
    :members:
@@ -90,6 +90,9 @@ Strategies
 ..  autoclass:: nni.retiarii.strategy.TPEStrategy
    :members:

+..  autoclass:: nni.retiarii.strategy.PolicyBasedRL
+    :members:
+
 Retiarii Experiments
 --------------------

@@ -98,3 +101,8 @@ Retiarii Experiments

 ..  autoclass:: nni.retiarii.experiment.pytorch.RetiariiExeConfig
    :members:
+
+Utilities
+---------
+
+..  autofunction:: nni.retiarii.serialize
\ No newline at end of file
--- a/docs/en_US/NAS/Benchmarks.rst
+++ b/docs/en_US/NAS/Benchmarks.rst
@@ -9,7 +9,7 @@ NAS Benchmarks
 Introduction
 ------------

-To imporve the reproducibility of NAS algorithms as well as reducing computing resource requirements, researchers proposed a series of NAS benchmarks such as `NAS-Bench-101 <https://arxiv.org/abs/1902.09635>`__\ , `NAS-Bench-201 <https://arxiv.org/abs/2001.00326>`__\ , `NDS <https://arxiv.org/abs/1905.13214>`__\ , etc. NNI provides a query interface for users to acquire these benchmarks. Within just a few lines of code, researcher are able to evaluate their NAS algorithms easily and fairly by utilizing these benchmarks.
+To improve the reproducibility of NAS algorithms and reduce computing resource requirements, researchers have proposed a series of NAS benchmarks such as `NAS-Bench-101 <https://arxiv.org/abs/1902.09635>`__\ , `NAS-Bench-201 <https://arxiv.org/abs/2001.00326>`__\ , `NDS <https://arxiv.org/abs/1905.13214>`__\ , etc. NNI provides a query interface for users to acquire these benchmarks. With just a few lines of code, researchers can evaluate their NAS algorithms easily and fairly by utilizing these benchmarks.

 Prerequisites
 -------------

--- a/docs/en_US/NAS/CDARTS.rst
+++ b/docs/en_US/NAS/CDARTS.rst
-CDARTS
-======
-
-Introduction
------------
-
-`CDARTS <https://arxiv.org/pdf/2006.10724.pdf>`__ builds a cyclic feedback mechanism between the search and evaluation networks. First, the search network generates an initial topology for evaluation, so that the weights of the evaluation network can be optimized. Second, the architecture topology in the search network is further optimized by the label supervision in classification, as well as the regularization from the evaluation network through feature distillation. Repeating the above cycle results in a joint optimization of the search and evaluation networks, and thus enables the evolution of the topology to fit the final evaluation network.
-
-In implementation of ``CdartsTrainer``\ , it first instantiates two models and two mutators (one for each). The first model is the so-called "search network", which is mutated with a ``RegularizedDartsMutator`` -- a mutator with subtle differences with ``DartsMutator``. The second model is the "evaluation network", which is mutated with a discrete mutator that leverages the previous search network mutator, to sample a single path each time. Trainers train models and mutators alternatively. Users can refer to `paper <https://arxiv.org/pdf/2006.10724.pdf>`__ if they are interested in more details on these trainers and mutators.
-
-Reproduction Results
--------------------
-
-This is CDARTS based on the NNI platform, which currently supports CIFAR10 search and retrain. ImageNet search and retrain should also be supported, and we provide corresponding interfaces. Our reproduced results on NNI are slightly lower than the paper, but much higher than the original DARTS. Here we show the results of three independent experiments on CIFAR10.
-
-.. list-table::
-   :header-rows: 1
-   :widths: auto
-
-   * - Runs
-     - Paper
-     - NNI
-   * - 1
-     - 97.52
-     - 97.44
-   * - 2
-     - 97.53
-     - 97.48
-   * - 3
-     - 97.58
-     - 97.56
-
-
-Examples
--------
-
-`Example code <https://github.com/microsoft/nni/tree/master/examples/nas/legacy/cdarts>`__
-
-.. code-block:: bash
-
-   # In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
-   git clone https://github.com/Microsoft/nni.git
-
-   # install apex for distributed training.
-   git clone https://github.com/NVIDIA/apex
-   cd apex
-   python setup.py install --cpp_ext --cuda_ext
-
-   # search the best architecture
-   cd examples/nas/legacy/cdarts
-   bash run_search_cifar.sh
-
-   # train the best architecture.
-   bash run_retrain_cifar.sh
-
-Reference
---------
-
-PyTorch
-^^^^^^^
-
-..  autoclass:: nni.algorithms.nas.pytorch.cdarts.CdartsTrainer
-    :members:
-
-..  autoclass:: nni.algorithms.nas.pytorch.cdarts.RegularizedDartsMutator
-    :members:
-
-..  autoclass:: nni.algorithms.nas.pytorch.cdarts.DartsDiscreteMutator
-    :members:
-
-..  autoclass:: nni.algorithms.nas.pytorch.cdarts.RegularizedMutatorParallel
-    :members:
--- a/docs/en_US/NAS/ClassicNas.rst
+++ b/docs/en_US/NAS/ClassicNas.rst
-.. role:: raw-html(raw)
-   :format: html
-
-
-Classic NAS Algorithms
-======================
-
-In classic NAS algorithms, each architecture is trained as a trial and the NAS algorithm acts as a tuner. Thus, this training mode naturally fits within the NNI hyper-parameter tuning framework, where Tuner generates new architecture for the next trial and trials run in the training service.
-
-Quick Start
-----------
-
-The following example shows how to use classic NAS algorithms. You can see it is quite similar to NNI hyper-parameter tuning.
-
-.. code-block:: python
-
-   model = Net()
-
-   # get the chosen architecture from tuner and apply it on model
-   get_and_apply_next_architecture(model)
-   train(model)  # your code for training the model
-   acc = test(model)  # test the trained model
-   nni.report_final_result(acc)  # report the performance of the chosen architecture
-
-First, instantiate the model. Search space has been defined in this model through ``LayerChoice`` and ``InputChoice``. After that, user should invoke ``get_and_apply_next_architecture(model)`` to settle down to a specific architecture. This function receives the architecture from tuner (i.e., the classic NAS algorithm) and applies the architecture to ``model``. At this point, ``model`` becomes a specific architecture rather than a search space. Then users are free to train this model just like training a normal PyTorch model. After get the accuracy of this model, users should invoke ``nni.report_final_result(acc)`` to report the result to the tuner.
-
-At this point, trial code is ready. Then, we can prepare an NNI experiment, i.e., search space file and experiment config file. Different from NNI hyper-parameter tuning, search space file is automatically generated from the trial code by running the command (the detailed usage of this command can be found `here <../Tutorial/Nnictl.rst>`__\ ):
-
-``nnictl ss_gen --trial_command="the command for running your trial code"``
-
-A file named ``nni_auto_gen_search_space.json`` is generated by this command. Then put the path of the generated search space in the field ``searchSpacePath`` of the experiment config file. The other fields of the config file can be filled by referring `this tutorial <../Tutorial/QuickStart.rst>`__.
-
-Currently, we only support :githublink:`PPO Tuner <examples/tuners/random_nas_tuner>` for classic NAS. More classic NAS algorithms will be supported soon.
-
-The complete examples can be found :githublink:`here <examples/nas/legacy/classic_nas>` for PyTorch and :githublink:`here <examples/nas/legacy/classic_nas-tf>` for TensorFlow.
-
-Standalone mode for easy debugging
----------------------------------
-
-We support a standalone mode for easy debugging, where you can directly run the trial command without launching an NNI experiment. This is for checking whether your trial code can correctly run. The first candidate(s) are chosen for ``LayerChoice`` and ``InputChoice`` in this standalone mode.
-
-:raw-html:`<a name="regulaized-evolution-tuner"></a>`
-
-Regularized Evolution Tuner
---------------------------
-
-This is a tuner geared for NNI’s Neural Architecture Search (NAS) interface. It uses the `evolution algorithm <https://arxiv.org/pdf/1802.01548.pdf>`__.
-
-The tuner first randomly initializes the number of ``population`` models and evaluates them. After that, every time to produce a new architecture, the tuner randomly chooses the number of ``sample`` architectures from ``population``\ , then mutates the best model in ``sample``\ , the parent model, to produce the child model. The mutation includes the hidden mutation and the op mutation. The hidden state mutation consists of replacing a hidden state with another hidden state from within the cell, subject to the constraint that no loops are formed. The op mutation behaves like the hidden state mutation as far as replacing one op with another op from the op set. Note that keeping the child model the same as its parent is not allowed. After evaluating the child model, it is added to the tail of the ``population``\ , then pops the front one.
-
-Note that **trial concurrency should be less than the population of the model**\ , otherwise NO_MORE_TRIAL exception will be raised.
-
-The whole procedure is summarized by the pseudocode below.
-
-
-.. image:: ../../img/EvoNasTuner.png
-   :target: ../../img/EvoNasTuner.png
-   :alt: 
-
--- a/docs/en_US/NAS/Cream.rst
+++ b/docs/en_US/NAS/Cream.rst
-Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
-=======================================================================================
-
-* `Paper <https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf>`__
-* `Models-Google Drive <https://drive.google.com/drive/folders/1NLGAbBF9bA1IUAxKlk2VjgRXhr6RHvRW?usp=sharing>`__
-* `Models-Baidu Disk (PWD: wqw6) <https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g>`__
-* `BibTex <https://scholar.googleusercontent.com/scholar.bib?q=info:ICWVXc_SsKAJ:scholar.google.com/&output=citation&scisdr=CgUmooXfEMfTi0cV5aU:AAGBfm0AAAAAX7sQ_aXoamdKRaBI12tAVN8REq1VKNwM&scisig=AAGBfm0AAAAAX7sQ_RdYtp6BSro3zgbXVJU2MCgsG730&scisf=4&ct=citation&cd=-1&hl=ja>`__
-
-In this work, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training. Distilling knowledge from the prioritized paths is able to boost the training of subnetworks. Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop. The discovered architectures achieve superior performance compared to the recent `MobileNetV3 <https://arxiv.org/abs/1905.02244>`__ and `EfficientNet <https://arxiv.org/abs/1905.11946>`__ families under aligned settings.
-
-.. image:: https://raw.githubusercontent.com/microsoft/Cream/main/demo/intro.jpg
-
-Reproduced Results
------------------
-
-Top-1 Accuracy on ImageNet. The top-1 accuracy of Cream search algorithm surpasses MobileNetV3 and EfficientNet-B0/B1 on ImageNet.
-The training with 16 Gpus is a little bit superior than 8 Gpus, as below.
-
-.. list-table::
-   :header-rows: 1
-   :widths: auto
-
-   * - Model (M Flops)
-     - 8Gpus
-     - 16Gpus
-   * - 14M
-     - 53.7
-     - 53.8
-   * - 43M
-     - 65.8
-     - 66.5
-   * - 114M
-     - 72.1
-     - 72.8
-   * - 287M
-     - 76.7
-     - 77.6
-   * - 481M
-     - 78.9
-     - 79.2
-   * - 604M
-     - 79.4
-     - 80.0
-
-
-
-.. image:: ../../img/cream_flops100.jpg
-   :scale: 50%
-
-.. image:: ../../img/cream_flops600.jpg
-   :scale: 50%
-
-Examples
--------
-
-`Example code <https://github.com/microsoft/nni/tree/master/examples/nas/legacy/cream>`__
-
-Please run the following scripts in the example folder.
-
-Data Preparation
----------------
-
-You need to first download the `ImageNet-2012 <http://www.image-net.org/>`__ to the folder ``./data/imagenet`` and move the validation set to the subfolder ``./data/imagenet/val``. To move the validation set, you cloud use `the following script <https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh>`__ .
-
-Put the imagenet data in ``./data``. It should be like following:
-
-.. code-block:: bash
-
-   ./data/imagenet/train
-   ./data/imagenet/val
-   ...
-
-Quick Start
-----------
-
-1. Search
-^^^^^^^^^
-
-First, build environments for searching.
-
-.. code-block:: bash
-
-   pip install -r ./requirements
-
-   git clone https://github.com/NVIDIA/apex.git
-   cd apex
-   python setup.py install --cpp_ext --cuda_ext
-
-To search for an architecture, you need to configure the parameters ``FLOPS_MINIMUM`` and ``FLOPS_MAXIMUM`` to specify the desired model flops, such as [0,600]MB flops. You can specify the flops interval by changing these two parameters in ``./configs/train.yaml``
-
-.. code-block:: bash
-
-   FLOPS_MINIMUM: 0 # Minimum Flops of Architecture
-   FLOPS_MAXIMUM: 600 # Maximum Flops of Architecture
-
-For example, if you expect to search an architecture with model flops <= 200M, please set the ``FLOPS_MINIMUM`` and ``FLOPS_MAXIMUM`` to be ``0`` and ``200``.
-
-After you specify the flops of the architectures you would like to search, you can search an architecture now by running:
-
-.. code-block:: bash
-
-   python -m torch.distributed.launch --nproc_per_node=8 ./train.py --cfg ./configs/train.yaml
-
-The searched architectures need to be retrained and obtain the final model. The final model is saved in ``.pth.tar`` format. Retraining code will be released soon.
-
-2. Retrain
-^^^^^^^^^^^
-
-To train searched architectures, you need to configure the parameter ``MODEL_SELECTION`` to specify the model Flops. To specify which model to train, you should add ``MODEL_SELECTION`` in ``./configs/retrain.yaml``. You can select one from [14,43,112,287,481,604], which stands for different Flops(MB).
-
-.. code-block:: bash
-
-   MODEL_SELECTION: 43 # Retrain 43m model
-   MODEL_SELECTION: 481 # Retrain 481m model
-   ......
-
-To train random architectures, you need specify ``MODEL_SELECTION`` to ``-1`` and configure the parameter ``INPUT_ARCH``\ :
-
-.. code-block:: bash
-
-   MODEL_SELECTION: -1 # Train random architectures
-   INPUT_ARCH: [[0], [3], [3, 3], [3, 1, 3], [3, 3, 3, 3], [3, 3, 3], [0]] # Random Architectures
-   ......
-
-After adding ``MODEL_SELECTION`` in ``./configs/retrain.yaml``\ , you need to use the following command to train the model.
-
-.. code-block:: bash
-
-   python -m torch.distributed.launch --nproc_per_node=8 ./retrain.py --cfg ./configs/retrain.yaml
-
-3. Test
-^^^^^^^^^
-
-To test our trained of models, you need to use ``MODEL_SELECTION`` in ``./configs/test.yaml`` to specify which model to test.
-
-.. code-block:: bash
-
-   MODEL_SELECTION: 43 # test 43m model
-   MODEL_SELECTION: 481 # test 470m model
-   ......
-
-After specifying the flops of the model, you need to write the path to the resume model in ``./test.sh``.
-
-.. code-block:: bash
-
-   RESUME_PATH: './43.pth.tar'
-   RESUME_PATH: './481.pth.tar'
-   ......
-
-We provide 14M/43M/114M/287M/481M/604M pretrained models in `google drive <https://drive.google.com/drive/folders/1CQjyBryZ4F20Rutj7coF8HWFcedApUn2>`__ or `[Models-Baidu Disk (password: wqw6)] <https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g>`__ .
-
-After downloading the pretrained models and adding ``MODEL_SELECTION`` and ``RESUME_PATH`` in './configs/test.yaml', you need to use the following command to test the model.
-
-.. code-block:: bash
-
-   python -m torch.distributed.launch --nproc_per_node=8 ./test.py --cfg ./configs/test.yaml
--- a/docs/en_US/NAS/DARTS.rst
+++ b/docs/en_US/NAS/DARTS.rst
@@ -56,11 +56,8 @@ Reference
 PyTorch
 ^^^^^^^

-..  autoclass:: nni.algorithms.nas.pytorch.darts.DartsTrainer
-    :members:
-
-..  autoclass:: nni.algorithms.nas.pytorch.darts.DartsMutator
-    :members:
+..  autoclass:: nni.retiarii.oneshot.pytorch.DartsTrainer
+    :noindex:

 Limitations
 -----------

--- a/docs/en_US/NAS/ENAS.rst
+++ b/docs/en_US/NAS/ENAS.rst
@@ -39,8 +39,5 @@ Reference
 PyTorch
 ^^^^^^^

-.. autoclass:: nni.algorithms.nas.pytorch.enas.EnasTrainer
-    :members:
-
-.. autoclass:: nni.algorithms.nas.pytorch.enas.EnasMutator
-    :members:
+..  autoclass:: nni.retiarii.oneshot.pytorch.EnasTrainer
+    :noindex:
--- a/docs/en_US/NAS/ExecutionEngines.rst
+++ b/docs/en_US/NAS/ExecutionEngines.rst
+Execution Engines
+=================
+
+An execution engine runs a Retiarii experiment. NNI supports three execution engines. Users can choose a specific engine according to how they define model mutations and whether they need cross-model optimizations.
+
+* **Pure-python execution engine** is the default engine. It supports model spaces expressed with the `inline mutation API <./MutationPrimitives.rst>`__.
+
+* **Graph-based execution engine** supports the use of `inline mutation APIs <./MutationPrimitives.rst>`__ and model spaces represented by `mutators <./Mutators.rst>`__. It requires the user's model to be parsed by `TorchScript <https://pytorch.org/docs/stable/jit.html>`__.
+
+* **CGO execution engine** has the same requirements and capabilities as the **Graph-based execution engine**, and additionally enables cross-model optimizations, which make model space exploration faster.
+
+Pure-python Execution Engine
+----------------------------
+
+The pure-python execution engine is the default engine. We recommend that users who are new to NNI NAS keep using it. It only acts on the inline mutation APIs and does not touch the rest of the user model, so it places minimal requirements on the user model.
+
+Only one step is needed to use this engine:
+
+1. Decorate the whole PyTorch model class with ``@nni.retiarii.model_wrapper``.
+
+.. note:: You should always use ``super().__init__()`` instead of ``super(MyNetwork, self).__init__()`` in the PyTorch model, because the latter one has issues with model wrapper.
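
The note above can be illustrated without NNI. The sketch below is a minimal, pure-Python illustration that assumes the wrapper decorator returns a subclass of the decorated class; ``toy_model_wrapper``, ``Base``, ``GoodNetwork`` and ``BadNetwork`` are hypothetical names, not NNI APIs, and the real ``model_wrapper`` may work differently. Under that assumption, the class name is rebound to the wrapper, so the two-argument form of ``super`` keeps resolving to the decorated class's own ``__init__``.

```python
def toy_model_wrapper(cls):
    """Hypothetical stand-in for a wrapper decorator: returns a subclass of ``cls``."""

    class Wrapper(cls):
        def __init__(self, *args, **kwargs):
            # Record the constructor arguments, then defer to the wrapped class.
            self.init_args = (args, kwargs)
            super().__init__(*args, **kwargs)

    Wrapper.__name__ = cls.__name__
    return Wrapper


class Base:
    def __init__(self):
        self.ready = True


@toy_model_wrapper
class GoodNetwork(Base):
    def __init__(self):
        # Zero-argument super() is tied to the class being defined,
        # so it still reaches Base.__init__ after decoration.
        super().__init__()


@toy_model_wrapper
class BadNetwork(Base):
    def __init__(self):
        # The name ``BadNetwork`` now refers to the wrapper subclass, so this
        # looks up the parent of the wrapper, which is this very class.
        super(BadNetwork, self).__init__()


good = GoodNetwork()

try:
    BadNetwork()
    bad_outcome = "ok"
except RecursionError:
    bad_outcome = "RecursionError"
```

In this sketch ``GoodNetwork()`` constructs normally, while ``BadNetwork()`` calls its own ``__init__`` repeatedly until Python raises ``RecursionError``.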
+
+Graph-based Execution Engine
+----------------------------
+
+The graph-based execution engine converts the user-defined model to a graph representation (called graph IR) using `TorchScript <https://pytorch.org/docs/stable/jit.html>`__. Each instantiated module in the model is converted to a subgraph. Mutations are then applied to the graph to generate new graphs, and each new graph is converted back to PyTorch code and executed on the user-specified training service.
+
+Users may find ``@basic_unit`` helpful in some cases. A module decorated with ``@basic_unit`` is not converted to a subgraph. Instead, it is converted to a single graph node and treated as a basic unit.
+
+``@basic_unit`` is usually used in the following cases:
+
+* When users want to tune the initialization parameters of a module using ``ValueChoice``, they should decorate the module with ``@basic_unit``. For example, in ``self.conv = MyConv(kernel_size=nn.ValueChoice([1, 3, 5]))``, ``MyConv`` should be decorated.
+
+* When a module cannot be successfully parsed to a subgraph, decorate the module with ``@basic_unit``. The parse failure could be due to complex control flow. Currently Retiarii does not support ad hoc loops, so if there is an ad hoc loop in a module's ``forward``, the class should be decorated with ``@basic_unit``. For example, the following ``MyModule`` should be decorated.
+
+  .. code-block:: python
+
+    @basic_unit
+    class MyModule(nn.Module):
+      def __init__(self):
+        ...
+      def forward(self, x):
+        for i in range(10): # <- adhoc loop
+          ...
+
+* Some inline mutation APIs require the modules they handle to be decorated with ``@basic_unit``. For example, a user-defined module that is provided to ``LayerChoice`` as a candidate op should be decorated.
+
+Three steps are needed to use the graph-based execution engine.
+
+1. Remove ``@nni.retiarii.model_wrapper`` if there is any in your model.
+2. Add ``config.execution_engine = 'base'`` to ``RetiariiExeConfig``. The default value of ``execution_engine`` is ``'py'``, which means the pure-python execution engine.
+3. Add ``@basic_unit`` when necessary following the above guidelines.
+
+The graph-based execution engine can export the source code of top models by running ``exp.export_top_models(formatter='code')``.
+
+CGO Execution Engine
+--------------------
+
+CGO execution engine does cross-model optimizations based on the graph-based execution engine. This execution engine will be `released in v2.4 <https://github.com/microsoft/nni/issues/3813>`__.
--- a/docs/en_US/NAS/ExplorationStrategies.rst
+++ b/docs/en_US/NAS/ExplorationStrategies.rst
+Exploration Strategies for Multi-trial NAS
+==========================================
+
+Usage of Exploration Strategy
+-----------------------------
+
+To use an exploration strategy, users simply instantiate an exploration strategy and pass the instantiated object to ``RetiariiExperiment``. Below is a simple example.
+
+.. code-block:: python
+
+  import nni.retiarii.strategy as strategy
+
+  exploration_strategy = strategy.Random(dedup=True)  # dedup=False if deduplication is not wanted
+
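
To make the ``dedup`` option above more concrete, here is a minimal pure-Python sketch of random sampling with optional deduplication. It is not NNI's implementation: ``toy_random_strategy``, the dictionary-based ``model_space`` and the ``budget`` and ``seed`` parameters are all assumptions made for illustration.

```python
import random


def toy_random_strategy(model_space, budget, dedup=True, seed=0):
    """Yield up to ``budget`` models sampled uniformly from ``model_space``.

    ``model_space`` maps a choice name to its list of candidates (a hypothetical
    stand-in for a real model space). With ``dedup=True`` no model is yielded twice.
    """
    rng = random.Random(seed)

    # Number of distinct models in the space, used to stop once it is exhausted.
    total = 1
    for candidates in model_space.values():
        total *= len(candidates)

    seen = set()
    produced = 0
    while produced < budget:
        if dedup and len(seen) == total:
            break  # every distinct model has already been produced
        sample = {name: rng.choice(cands) for name, cands in model_space.items()}
        fingerprint = tuple(sorted(sample.items()))
        if dedup and fingerprint in seen:
            continue  # skip duplicates and draw again
        seen.add(fingerprint)
        produced += 1
        yield sample


space = {"conv": ["3x3", "5x5"], "activation": ["relu", "gelu"]}
unique_models = list(toy_random_strategy(space, budget=10, dedup=True))
repeated_models = list(toy_random_strategy(space, budget=10, dedup=False))
```

With deduplication the sketch stops after the four distinct models in ``space`` have been produced, while without it all ten requested samples are returned and repeats are allowed.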
+Supported Exploration Strategies
+--------------------------------
+
+NNI provides the following exploration strategies for multi-trial NAS. Users can also `customize new exploration strategies <./WriteStrategy.rst>`__.
+
+.. list-table::
+   :header-rows: 1
+   :widths: auto
+
+   * - Name
+     - Brief Introduction of Algorithm
+   * - `Random Strategy <./ApiReference.rst#nni.retiarii.strategy.Random>`__
+     - Randomly sampling new model(s) from user defined model space. (``nni.retiarii.strategy.Random``)
+   * - `Grid Search <./ApiReference.rst#nni.retiarii.strategy.GridSearch>`__
+     - Sampling new model(s) from user defined model space using grid search algorithm. (``nni.retiarii.strategy.GridSearch``)
+   * - `Regularized Evolution <./ApiReference.rst#nni.retiarii.strategy.RegularizedEvolution>`__
+     - Generating new model(s) from generated models using `regularized evolution algorithm <https://arxiv.org/abs/1802.01548>`__ . (``nni.retiarii.strategy.RegularizedEvolution``)
+   * - `TPE Strategy <./ApiReference.rst#nni.retiarii.strategy.TPEStrategy>`__
+     - Sampling new model(s) from user defined model space using `TPE algorithm <https://papers.nips.cc/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf>`__ . (``nni.retiarii.strategy.TPEStrategy``)
+   * - `RL Strategy <./ApiReference.rst#nni.retiarii.strategy.PolicyBasedRL>`__
+     - It uses `PPO algorithm <https://arxiv.org/abs/1707.06347>`__ to sample new model(s) from user defined model space. (``nni.retiarii.strategy.PolicyBasedRL``)
\ No newline at end of file
--- a/docs/en_US/NAS/FBNet.rst
+++ b/docs/en_US/NAS/FBNet.rst
 FBNet
 ======

+.. note:: This one-shot NAS is still implemented under NNI NAS 1.0, and will `be migrated to Retiarii framework in v2.4 <https://github.com/microsoft/nni/issues/3814>`__.
+
 For the mobile application of facial landmark, based on the basic architecture of PFLD model, we have applied the FBNet (Block-wise DNAS) to design a concise model with the trade-off between latency and accuracy. References are listed as below:


@@ -148,4 +150,4 @@ The checkpoints of pre-trained supernet and subnet are offered as below:

 * `Supernet <https://drive.google.com/file/d/1TCuWKq8u4_BQ84BWbHSCZ45N3JGB9kFJ/view?usp=sharing>`__
 * `Subnet <https://drive.google.com/file/d/160rkuwB7y7qlBZNM3W_T53cb6MQIYHIE/view?usp=sharing>`__
-* `ONNX model <https://drive.google.com/file/d/1s-v-aOiMv0cqBspPVF3vSGujTbn_T_Uo/view?usp=sharing>`__
+* `ONNX model <https://drive.google.com/file/d/1s-v-aOiMv0cqBspPVF3vSGujTbn_T_Uo/view?usp=sharing>`__
\ No newline at end of file
--- a/docs/en_US/NAS/ModelEvaluators.rst
+++ b/docs/en_US/NAS/ModelEvaluators.rst
+Model Evaluators
+================
+
+A model evaluator is for training and validating each generated model.
+
+Usage of Model Evaluator
+------------------------
+
+In multi-trial NAS, a sampled model should be executable on a remote machine or a training platform (e.g., AzureML, OpenPAI). Thus, both the model and its model evaluator must be correctly serialized. For NNI to serialize the model evaluator correctly, users should apply ``serialize`` to some of their functions and objects.
+
+.. _serializer:
+
+`serialize <./ApiReference.rst#utilities>`__ enables re-instantiation of the model evaluator in another process or on another machine. It is implemented by recording the initialization parameters of the evaluator that the user instantiated.
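The idea of recording initialization parameters can be sketched in plain Python. This is only an illustration of the principle, not NNI's implementation: ``Recorded``, ``record``, and the stand-in ``Compose``/``Normalize`` classes are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class Recorded:
    """Hypothetical stand-in for a serialized object: a class plus its init arguments."""
    cls: type
    args: Tuple[Any, ...] = ()
    kwargs: Dict[str, Any] = field(default_factory=dict)

    def instantiate(self) -> Any:
        # Rebuild nested recordings first, then call the class constructor.
        args = [_build(a) for a in self.args]
        kwargs = {k: _build(v) for k, v in self.kwargs.items()}
        return self.cls(*args, **kwargs)


def _build(value: Any) -> Any:
    """Recursively turn recordings (also inside lists/tuples) into real objects."""
    if isinstance(value, Recorded):
        return value.instantiate()
    if isinstance(value, (list, tuple)):
        return type(value)(_build(v) for v in value)
    return value


def record(cls: type, *args: Any, **kwargs: Any) -> Recorded:
    """Record how to construct ``cls`` instead of constructing it right away."""
    return Recorded(cls, args, kwargs)


# Stand-in classes so the sketch runs without torchvision.
class Normalize:
    def __init__(self, mean, std):
        self.mean, self.std = mean, std


class Compose:
    def __init__(self, transforms):
        self.transforms = transforms


# Only classes and primitive arguments are stored, so the recording could be
# shipped to another process and instantiated there.
spec = record(Compose, [record(Normalize, (0.1307,), (0.3081,))])
pipeline = spec.instantiate()
```

Because the recording holds only a class reference and plain arguments, nested non-primitive arguments have to be recorded as well, which is why the wrapping is applied recursively.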
+
+The evaluator-related APIs provided by Retiarii, such as ``pl.Classification`` and ``pl.DataLoader``, already support serialization, so there is no need to apply ``serialize`` to them. Users should apply ``serialize`` manually in the following case.
+
+If the initialization parameters of the evaluator APIs (e.g., ``pl.Classification``, ``pl.DataLoader``) are not primitive types (e.g., ``int``, ``str``), ``serialize`` should be applied to them. If the initialization parameters of those parameters are not primitive types either, ``serialize`` should be applied to them as well. In short, ``serialize`` should be applied recursively where necessary.
+
+In the example below, ``transforms.Compose``, ``transforms.ToTensor``, ``transforms.Normalize``, and ``MNIST`` are serialized manually using ``serialize``. ``serialize`` takes a class ``cls`` as its first argument; the following arguments are the arguments for initializing this class. ``serialize`` is not applied to ``pl.Classification`` because, as an API provided by NNI, it is already serializable.
+
+.. code-block:: python
+
+  import nni.retiarii.evaluator.pytorch.lightning as pl
+  from nni.retiarii import serialize
+  from torchvision import transforms
+  from torchvision.datasets import MNIST
+
+  # serialize takes the class itself (not an instance), followed by its init arguments
+  transform = serialize(transforms.Compose, [serialize(transforms.ToTensor), serialize(transforms.Normalize, (0.1307,), (0.3081,))])
+  train_dataset = serialize(MNIST, root='data/mnist', train=True, download=True, transform=transform)
+  test_dataset = serialize(MNIST, root='data/mnist', train=False, download=True, transform=transform)
+  evaluator = pl.Classification(train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
+                                val_dataloaders=pl.DataLoader(test_dataset, batch_size=100),
+                                max_epochs=10)
+
+Supported Model Evaluators
+--------------------------
+
+NNI provides some commonly used model evaluators for users' convenience. If these model evaluators do not meet users' requirements, they can customize new model evaluators following the tutorial `here <./WriteTrainer.rst>`__.
+
+..  autoclass:: nni.retiarii.evaluator.pytorch.lightning.Classification
+    :noindex:
+
+..  autoclass:: nni.retiarii.evaluator.pytorch.lightning.Regression
+    :noindex:
--- a/docs/en_US/NAS/MutationPrimitives.rst
+++ b/docs/en_US/NAS/MutationPrimitives.rst
+Mutation Primitives
+===================
+
+To let users easily express a model space within their PyTorch/TensorFlow model, NNI provides the inline mutation APIs shown below.
+
+* `nn.LayerChoice <./ApiReference.rst#nni.retiarii.nn.pytorch.LayerChoice>`__. It allows users to provide several candidate operations (e.g., PyTorch modules); one of them is chosen in each explored model.
+
+  .. code-block:: python
+
+    # import nni.retiarii.nn.pytorch as nn
+    # declared in `__init__` method
+    self.layer = nn.LayerChoice([
+      ops.PoolBN('max', channels, 3, stride, 1),
+      ops.SepConv(channels, channels, 3, stride, 1),
+      nn.Identity()
+    ])
+    # invoked in `forward` method
+    out = self.layer(x)
+
+* `nn.InputChoice <./ApiReference.rst#nni.retiarii.nn.pytorch.InputChoice>`__. It is mainly for choosing (or trying) different connections. It takes several tensors and chooses ``n_chosen`` tensors from them.
+
+  .. code-block:: python
+
+    # import nni.retiarii.nn.pytorch as nn
+    # declared in `__init__` method
+    self.input_switch = nn.InputChoice(n_chosen=1)
+    # invoked in `forward` method, choose one from the three
+    out = self.input_switch([tensor1, tensor2, tensor3])
+
+* `nn.ValueChoice <./ApiReference.rst#nni.retiarii.nn.pytorch.ValueChoice>`__. It is for choosing one value from some candidate values. It can only be used as an input argument of basic units, that is, modules in ``nni.retiarii.nn.pytorch`` and user-defined modules decorated with ``@basic_unit``.
+
+  .. code-block:: python
+
+    # import nni.retiarii.nn.pytorch as nn
+    # used in `__init__` method
+    self.conv = nn.Conv2d(XX, XX, kernel_size=nn.ValueChoice([1, 3, 5]))
+    self.op = MyOp(nn.ValueChoice([0, 1]), nn.ValueChoice([-1, 1]))
+
+* `nn.Repeat <./ApiReference.rst#nni.retiarii.nn.pytorch.Repeat>`__. It repeats a block a variable number of times.
+
+* `nn.Cell <./ApiReference.rst#nni.retiarii.nn.pytorch.Cell>`__. `This cell structure is popularly used in NAS literature <https://arxiv.org/abs/1611.01578>`__. Specifically, the cell consists of multiple "nodes". Each node is a sum of multiple operators. Each operator is chosen from user-specified candidates and takes one input, chosen from the previous nodes and the predecessors. A predecessor is an input of the cell. The output of the cell is the concatenation of some of the nodes in the cell (currently all the nodes).
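The data flow of such a cell can be sketched without any framework. The snippet below is a toy illustration under stated assumptions, not the ``nn.Cell`` API: ``CANDIDATE_OPS`` and ``run_cell`` are hypothetical names, vectors are plain Python lists, and operators and inputs are sampled at random rather than by an exploration strategy.

```python
import random
from typing import Callable, Dict, List

Vector = List[float]

# Hypothetical candidate operators; real candidates would be PyTorch modules.
CANDIDATE_OPS: Dict[str, Callable[[Vector], Vector]] = {
    "identity": lambda v: list(v),
    "double": lambda v: [2.0 * x for x in v],
    "negate": lambda v: [-x for x in v],
}


def run_cell(predecessors: List[Vector], num_nodes: int, ops_per_node: int,
             rng: random.Random) -> Vector:
    """Evaluate one randomly sampled cell and return the concatenated node outputs."""
    states = list(predecessors)  # inputs visible to the next node
    nodes: List[Vector] = []
    op_names = sorted(CANDIDATE_OPS)
    for _ in range(num_nodes):
        total = [0.0] * len(predecessors[0])
        for _ in range(ops_per_node):
            op = CANDIDATE_OPS[rng.choice(op_names)]  # operator chosen from the candidates
            source = rng.choice(states)               # one input: a predecessor or an earlier node
            total = [t + o for t, o in zip(total, op(source))]  # a node sums its operators
        states.append(total)
        nodes.append(total)
    # The cell output is the concatenation of all nodes.
    return [x for node in nodes for x in node]


output = run_cell([[1.0, 2.0], [3.0, 4.0]], num_nodes=3, ops_per_node=2,
                  rng=random.Random(0))
```

With two-dimensional inputs and three nodes, the concatenated output has six entries, one pair per node.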
\ No newline at end of file
--- a/docs/en_US/NAS/retiarii/Advanced.rst
+++ b/docs/en_US/NAS/retiarii/Advanced.rst
-Advanced Tutorial
-=================
-
-Pure-python execution engine (experimental)
-------------------------------------------
-
-If you are experiencing issues with TorchScript, or the generated model code by Retiarii, there is another execution engine called Pure-python execution engine which doesn't need the code-graph conversion. This should generally not affect models and strategies in most cases, but customized mutation might not be supported.
-
-This will come as the default execution engine in future version of Retiarii.
-
-Three steps are needed to enable this engine now.
-
-1. Add ``@nni.retiarii.model_wrapper`` decorator outside the whole PyTorch model.
-2. Add ``config.execution_engine = 'py'`` to ``RetiariiExeConfig``.
-3. If you need to export top models, formatter needs to be set to ``dict``. Exporting ``code`` won't work with this engine.
-
-.. note:: You should always use ``super().__init__()` instead of ``super(MyNetwork, self).__init__()`` in the PyTorch model, because the latter one has issues with model wrapper.
-
-``@basic_unit`` and ``serializer``
----------------------------------
-
-.. _serializer:
-
-``@basic_unit`` and ``serialize`` can be viewed as some kind of serializer. They are designed for making the whole model (including training) serializable to be executed on another process or machine.
-
-**@basic_unit** annotates that a module is a basic unit, i.e, no need to understand the details of this module. The effect is that it prevents Retiarii to parse this module. To understand this, we first briefly explain how Retiarii works: it converts user-defined model to a graph representation (called graph IR) using `TorchScript <https://pytorch.org/docs/stable/jit.html>`__, each instantiated module in the model is converted to a subgraph. Then mutations are applied to the graph to generate new graphs. Each new graph is then converted back to PyTorch code and executed. ``@basic_unit`` here means the module will not be converted to a subgraph, instead, it is converted to a single graph node as a basic unit. That is, the module will not be unfolded anymore. When the module is not unfolded, mutations on initialization parameters of this module becomes easier.
-
-``@basic_unit`` is usually used in the following cases:
-
-* When users want to tune initialization parameters of a module using ``ValueChoice``, then decorate the module with ``@basic_unit``. For example, ``self.conv = MyConv(kernel_size=nn.ValueChoice([1, 3, 5]))``, here ``MyConv`` should be decorated.
-
-* When a module cannot be successfully parsed to a subgraph, decorate the module with ``@basic_unit``. The parse failure could be due to complex control flow. Currently Retiarii does not support adhoc loop, if there is adhoc loop in a module's forward, this class should be decorated as serializable module. For example, the following ``MyModule`` should be decorated.
-
-  .. code-block:: python
-
-    @basic_unit
-    class MyModule(nn.Module):
-      def __init__(self):
-        ...
-      def forward(self, x):
-        for i in range(10): # <- adhoc loop
-          ...
-
-* Some inline mutation APIs require their handled module to be decorated with ``@basic_unit``. For example, user-defined module that is provided to ``LayerChoice`` as a candidate op should be decorated.
-
-**serialize** is mainly used for serializing model training logic. It enables re-instantiation of model evaluator in another process or machine. Re-instantiation is necessary because most of time model and evaluator should be sent to training services. ``serialize`` is implemented by recording the initialization parameters of user instantiated evaluator.
-
-The evaluator related APIs provided by Retiarii have already supported serialization, for example ``pl.Classification``, ``pl.DataLoader``, no need to apply ``serialize`` on them. In the following case users should use ``serialize`` API manually.
-
-If the initialization parameters of the evaluator APIs (e.g., ``pl.Classification``, ``pl.DataLoader``) are not primitive types (e.g., ``int``, ``string``), they should be applied with  ``serialize``. If those parameters' initialization parameters are not primitive types, ``serialize`` should also be applied. In a word, ``serialize`` should be applied recursively if necessary.
-
-
 Express Mutations with Mutators
-------------------------------
+===============================

-Besides inline mutations which have been demonstrated `here <./Tutorial.rst>`__, Retiarii provides a more general approach to express a model space: *Mutator*. Inline mutations APIs are also implemented with mutator, which can be seen as a special case of model mutation.
+Besides the inline mutation APIs demonstrated `here <./MutationPrimitives.rst>`__, NNI provides a more general approach to express a model space, i.e., *Mutator*, to cover more complex model spaces. Those inline mutation APIs are also implemented with mutator in the underlying system, which can be seen as a special case of model mutation.

 .. note:: Mutator and inline mutation APIs cannot be used together.

@@ -68,7 +16,7 @@ A mutator is a piece of logic to express how to mutate a given model. Users are
 ``BlockMutator`` is defined by users to express how to mutate the base model. 

 Write a mutator
-^^^^^^^^^^^^^^^
+---------------

 User-defined mutator should inherit ``Mutator`` class, and implement mutation logic in the member function ``mutate``.

@@ -101,7 +49,7 @@ Use placehoder to make mutation easier: ``nn.Placeholder``. If you want to mutat
    stride=stride
  )

-``label`` is used by mutator to identify this placeholder. The other parameters are the information that are required by mutator. They can be accessed from ``node.operation.parameters`` as a dict, it could include any information that users want to put to pass it to user defined mutator. The complete example code can be found in :githublink:`Mnasnet base model <examples/nas/multi-trial/mnasnet/base_mnasnet.py>`.
+``label`` is used by the mutator to identify this placeholder. The other parameters carry the information required by the mutator. They can be accessed from ``node.operation.parameters`` as a dict, which can include any information that users want to pass to the user-defined mutator. The complete example code can be found in :githublink:`Mnasnet base model <examples/nas/multi-trial/mnasnet/base_mnasnet.py>`.

 Starting an experiment is almost the same as using inline mutation APIs. The only difference is that the applied mutators should be passed to ``RetiariiExperiment``. Below is a simple example.


--- a/docs/en_US/NAS/NasGuide.rst
+++ b/docs/en_US/NAS/NasGuide.rst
-One-shot NAS algorithms
-=======================
-
-Besides `classic NAS algorithms <./ClassicNas.rst>`__\ , users also apply more advanced one-shot NAS algorithms to find better models from a search space. There are lots of related works about one-shot NAS algorithms, such as `SMASH <https://arxiv.org/abs/1708.05344>`__\ , `ENAS <https://arxiv.org/abs/1802.03268>`__\ , `DARTS <https://arxiv.org/abs/1808.05377>`__\ , `FBNet <https://arxiv.org/abs/1812.03443>`__\ , `ProxylessNAS <https://arxiv.org/abs/1812.00332>`__\ , `SPOS <https://arxiv.org/abs/1904.00420>`__\ , `Single-Path NAS <https://arxiv.org/abs/1904.02877>`__\ ,  `Understanding One-shot <http://proceedings.mlr.press/v80/bender18a>`__ and `GDAS <https://arxiv.org/abs/1910.04465>`__. One-shot NAS algorithms usually build a supernet containing every candidate in the search space as its subnetwork, and in each step, a subnetwork or combination of several subnetworks is trained.
-
-Currently, several one-shot NAS methods are supported on NNI. For example, ``DartsTrainer``\ , which uses SGD to train architecture weights and model weights iteratively, and ``ENASTrainer``\ , which `uses a controller to train the model <https://arxiv.org/abs/1802.03268>`__. New and more efficient NAS trainers keep emerging in research community and some will be implemented in future releases of NNI.
-
-Search with One-shot NAS Algorithms
-----------------------------------
-
-Each one-shot NAS algorithm implements a trainer, for which users can find usage details in the description of each algorithm. Here is a simple example, demonstrating how users can use ``EnasTrainer``.
-
-.. code-block:: python
-
-   # this is exactly same as traditional model training
-   model = Net()
-   dataset_train = CIFAR10(root="./data", train=True, download=True, transform=train_transform)
-   dataset_valid = CIFAR10(root="./data", train=False, download=True, transform=valid_transform)
-   criterion = nn.CrossEntropyLoss()
-   optimizer = torch.optim.SGD(model.parameters(), 0.05, momentum=0.9, weight_decay=1.0E-4)
-
-   # use NAS here
-   def top1_accuracy(output, target):
-       # this is the function that computes the reward, as required by ENAS algorithm
-       batch_size = target.size(0)
-       _, predicted = torch.max(output.data, 1)
-       return (predicted == target).sum().item() / batch_size
-
-   def metrics_fn(output, target):
-       # metrics function receives output and target and computes a dict of metrics
-       return {"acc1": top1_accuracy(output, target)}
-
-   from nni.algorithms.nas.pytorch import enas
-   trainer = enas.EnasTrainer(model,
-                              loss=criterion,
-                              metrics=metrics_fn,
-                              reward_function=top1_accuracy,
-                              optimizer=optimizer,
-                              batch_size=128
-                              num_epochs=10,  # 10 epochs
-                              dataset_train=dataset_train,
-                              dataset_valid=dataset_valid,
-                              log_frequency=10)  # print log every 10 steps
-   trainer.train()  # training
-   trainer.export(file="model_dir/final_architecture.json")  # export the final architecture to file
-
-``model`` is the one with `user defined search space <./WriteSearchSpace.rst>`__. Then users should prepare training data and model evaluation metrics. To search from the defined search space, a one-shot algorithm is instantiated, called trainer (e.g., EnasTrainer). The trainer exposes a few arguments that you can customize. For example, the loss function, the metrics function, the optimizer, and the datasets. These should satisfy most usage requirements and we do our best to make sure our built-in trainers work on as many models, tasks, and datasets as possible.
-
-**Note that** when using one-shot NAS algorithms, there is no need to start an NNI experiment. Users can directly run this Python script (i.e., ``train.py``\ ) through ``python3 train.py`` without ``nnictl``. After training, users can export the best one of the found models through ``trainer.export()``.
-
-Each trainer in NNI has its targeted scenario and usage. Some trainers have the assumption that the task is a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps). Most trainers do not have support for distributed training: they won't wrap your model with ``DataParallel`` or ``DistributedDataParallel`` to do that. So after a few tryouts, if you want to actually use the trainers on your very customized applications, you might need to `customize your trainer <./Advanced.rst#extend-the-ability-of-one-shot-trainers>`__.
-
-Furthermore, one-shot NAS can be visualized with our NAS UI. `See more details. <./Visualization.rst>`__
-
-Retrain with Exported Architecture
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-After the search phase, it's time to train the found architecture. Unlike many open-source NAS algorithms who write a whole new model specifically for retraining. We found that the search model and retraining model are usually very similar, and therefore you can construct your final model with the exact same model code. For example
-
-.. code-block:: python
-
-   model = Net()
-   apply_fixed_architecture(model, "model_dir/final_architecture.json")
-
-The JSON is simply a mapping from mutable keys to choices. Choices can be expressed in:
-
-
-* A string: select the candidate with corresponding name.
-* A number: select the candidate with corresponding index.
-* A list of string: select the candidates with corresponding names.
-* A list of number: select the candidates with corresponding indices.
-* A list of boolean values: a multi-hot array.
-
-For example,
-
-.. code-block:: json
-
-   {
-       "LayerChoice1": "conv5x5",
-       "LayerChoice2": 6,
-       "InputChoice3": ["layer1", "layer3"],
-       "InputChoice4": [1, 2],
-       "InputChoice5": [false, true, false, false, true]
-   }
-
-After applying, the model is then fixed and ready for final training. The model works as a single model, and unused parameters and modules are pruned.
-
-Also, refer to `DARTS <./DARTS.rst>`__ for code exemplifying retraining.
--- a/docs/en_US/NAS/NasReference.rst
+++ b/docs/en_US/NAS/NasReference.rst
-NAS Reference
-=============
-
-.. contents::
-
-Mutables
--------
-
-..  autoclass:: nni.nas.pytorch.mutables.Mutable
-    :members:
-
-..  autoclass:: nni.nas.pytorch.mutables.LayerChoice
-    :members:
-
-..  autoclass:: nni.nas.pytorch.mutables.InputChoice
-    :members:
-
-..  autoclass:: nni.nas.pytorch.mutables.MutableScope
-    :members:
-
-Utilities
-^^^^^^^^^
-
-..  autofunction:: nni.nas.pytorch.utils.global_mutable_counting
-
-Mutators
--------
-
-..  autoclass:: nni.nas.pytorch.base_mutator.BaseMutator
-    :members:
-
-..  autoclass:: nni.nas.pytorch.mutator.Mutator
-    :members:
-
-Random Mutator
-^^^^^^^^^^^^^^
-
-..  autoclass:: nni.algorithms.nas.pytorch.random.RandomMutator
-    :members:
-
-Utilities
-^^^^^^^^^
-
-..  autoclass:: nni.nas.pytorch.utils.StructuredMutableTreeNode
-    :members:
-
-Trainers
--------
-
-Trainer
-^^^^^^^
-
-..  autoclass:: nni.nas.pytorch.base_trainer.BaseTrainer
-    :members:
-
-..  autoclass:: nni.nas.pytorch.trainer.Trainer
-    :members:
-
-Retrain
-^^^^^^^
-
-..  autofunction:: nni.nas.pytorch.fixed.apply_fixed_architecture
-
-..  autoclass:: nni.nas.pytorch.fixed.FixedArchitecture
-    :members:
-
-Distributed NAS
-^^^^^^^^^^^^^^^
-
-..  autofunction:: nni.algorithms.nas.pytorch.classic_nas.get_and_apply_next_architecture
-
-..  autoclass:: nni.algorithms.nas.pytorch.classic_nas.mutator.ClassicMutator
-    :members:
-
-Callbacks
-^^^^^^^^^
-
-..  autoclass:: nni.nas.pytorch.callbacks.Callback
-    :members:
-
-..  autoclass:: nni.nas.pytorch.callbacks.LRSchedulerCallback
-    :members:
-
-..  autoclass:: nni.nas.pytorch.callbacks.ArchitectureCheckpoint
-    :members:
-
-..  autoclass:: nni.nas.pytorch.callbacks.ModelCheckpoint
-    :members:
-
-Utilities
-^^^^^^^^^
-
-..  autoclass:: nni.nas.pytorch.utils.AverageMeterGroup
-    :members:
-
-..  autoclass:: nni.nas.pytorch.utils.AverageMeter
-    :members:
-
-..  autofunction:: nni.nas.pytorch.utils.to_device
--- a/docs/en_US/NAS/OneshotTrainer.rst
+++ b/docs/en_US/NAS/OneshotTrainer.rst
+One-shot NAS
+============
+
+Before reading this tutorial, we highly recommend that you first go through the tutorial on how to `define a model space <./QuickStart.rst#define-your-model-space>`__.
+
+Model Search with One-shot Trainer
+----------------------------------
+
+With a defined model space, users can explore the space in two ways. One is to use a strategy and a single-architecture evaluator, as demonstrated `here <./QuickStart.rst#explore-the-defined-model-space>`__. The other is to use a one-shot trainer, which consumes far fewer computational resources than the first. This tutorial focuses on the one-shot approach. Its principle is to combine all the models in a model space into one big model (usually called a super-model or super-graph). The trainer handles search, training, and testing together by training and evaluating this big model.
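The principle of combining a model space into one super-model with shared weights can be illustrated with a toy sketch. It is not how NNI's one-shot trainers are implemented: ``SuperModel`` is a hypothetical class, each candidate operation is reduced to a single scalar weight, and "training" is a manual weight update.

```python
import itertools
import random
from typing import Dict, List


class SuperModel:
    """Toy super-model: every layer keeps one scalar weight per candidate op."""

    def __init__(self, space: List[List[str]]) -> None:
        self.space = space
        self.weights: List[Dict[str, float]] = [
            {name: 1.0 for name in candidates} for candidates in space
        ]

    def sample(self, rng: random.Random) -> List[str]:
        # Pick one candidate per layer; the result names a single sub-model.
        return [rng.choice(candidates) for candidates in self.space]

    def forward(self, x: float, arch: List[str]) -> float:
        # Only the chosen candidate of each layer is executed.
        for layer_weights, choice in zip(self.weights, arch):
            x = layer_weights[choice] * x
        return x


space = [["conv3x3", "conv5x5"], ["maxpool", "avgpool", "skip"]]
supernet = SuperModel(space)
all_archs = [list(a) for a in itertools.product(*space)]  # 2 * 3 = 6 sub-models

# Updating one shared weight changes every sub-model that uses "conv3x3".
supernet.weights[0]["conv3x3"] = 0.5
outputs = {tuple(a): supernet.forward(2.0, a) for a in all_archs}
sampled = supernet.sample(random.Random(0))
```

All six sub-models live in one set of weights, so a single training run over the super-model updates parameters that many candidate architectures share.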
+
+We list the supported one-shot trainers here:
+
+* DARTS trainer
+* ENAS trainer
+* ProxylessNAS trainer
+* Single-path (random) trainer
+
+See the `API reference <./ApiReference.rst>`__ for detailed usage. Here, we show an example of using the DARTS trainer manually.
+
+.. code-block:: python
+
+  from nni.retiarii.oneshot.pytorch import DartsTrainer
+  trainer = DartsTrainer(
+      model=model,
+      loss=criterion,
+      metrics=lambda output, target: accuracy(output, target, topk=(1,)),
+      optimizer=optim,
+      num_epochs=args.epochs,
+      dataset=dataset_train,
+      batch_size=args.batch_size,
+      log_frequency=args.log_frequency,
+      unrolled=args.unrolled
+  )
+  trainer.fit()
+  final_architecture = trainer.export()
+
+**Format of the exported architecture.** TBD.
--- a/docs/en_US/NAS/Overview.rst
+++ b/docs/en_US/NAS/Overview.rst
-Neural Architecture Search (NAS) on NNI
+Retiarii for Neural Architecture Search
 =======================================

+.. Note:: NNI's latest NAS support is entirely based on the Retiarii framework. Users who are still on an `early version using NNI NAS v1.0 <https://nni.readthedocs.io/en/v2.2/nas.html>`__ should migrate their work to Retiarii as soon as possible.
+
 .. contents::

+Motivation
+----------
+
+Automatic neural architecture search is playing an increasingly important role in finding better models. Recent research has proven the feasibility of automatic NAS and has led to models that beat many manually designed and tuned models. Representative works include `NASNet <https://arxiv.org/abs/1707.07012>`__\ , `ENAS <https://arxiv.org/abs/1802.03268>`__\ , `DARTS <https://arxiv.org/abs/1806.09055>`__\ , `Network Morphism <https://arxiv.org/abs/1806.10282>`__\ , and `Evolution <https://arxiv.org/abs/1703.01041>`__. In addition, new innovations continue to emerge.
+
+However, it is pretty hard to use existing NAS work to help develop common DNN models. Therefore, we designed `Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__, a novel NAS/HPO framework, and implemented it in NNI. It helps users easily construct a model space (or search space, tuning space), and utilize existing NAS algorithms. The framework also facilitates NAS innovation and is used to design new NAS algorithms.
+
 Overview
 --------

-Automatic neural architecture search is taking an increasingly important role in finding better models. Recent research has proved the feasibility of automatic NAS and has lead to models that beat many manually designed and tuned models. Some representative works are `NASNet <https://arxiv.org/abs/1707.07012>`__\ , `ENAS <https://arxiv.org/abs/1802.03268>`__\ , `DARTS <https://arxiv.org/abs/1806.09055>`__\ , `Network Morphism <https://arxiv.org/abs/1806.10282>`__\ , and `Evolution <https://arxiv.org/abs/1703.01041>`__. Further, new innovations keep emerging.
-
-However, it takes a great effort to implement NAS algorithms, and it's hard to reuse the code base of existing algorithms for new ones. To facilitate NAS innovations (e.g., the design and implementation of new NAS models, the comparison of different NAS models side-by-side, etc.), an easy-to-use and flexible programming interface is crucial.
+There are three key characteristics of the Retiarii framework:

-With this motivation, our ambition is to provide a unified architecture in NNI, accelerate innovations on NAS, and apply state-of-the-art algorithms to real-world problems faster.
+* Simple APIs are provided for defining model search space within PyTorch/TensorFlow model.
+* SOTA NAS algorithms are built-in to be used for exploring model search space.
+* System-level optimizations are implemented for speeding up the exploration.

-With the unified interface, there are two different modes for architecture search. `One <#supported-one-shot-nas-algorithms>`__ is the so-called one-shot NAS, where a super-net is built based on a search space and one-shot training is used to generate a good-performing child model. `The other <#supported-classic-nas-algorithms>`__ is the traditional search-based approach, where each child model within the search space runs as an independent trial. We call it classic NAS.
+There are two types of model space exploration approaches: **Multi-trial NAS** and **One-shot NAS**. Multi-trial NAS trains each sampled model in the model space independently, while One-shot NAS samples models from a super-model. After constructing the model space, users can use either exploration approach to explore it.

-NNI also provides dedicated `visualization tool <#nas-visualization>`__ for users to check the status of the neural architecture search process.

-Supported Classic NAS Algorithms
--------------------------------
+Multi-trial NAS
+---------------

-The procedure of classic NAS algorithms is similar to hyper-parameter tuning, users use ``nnictl`` to start experiments and each model runs as a trial. The difference is that search space file is automatically generated from user model (with search space in it) by running ``nnictl ss_gen``. The following table listed supported tuning algorihtms for classic NAS mode. More algorihtms will be supported in future release.
+Multi-trial NAS means each model sampled from the model space is trained independently. A typical multi-trial NAS is `NASNet <https://arxiv.org/abs/1707.07012>`__. The algorithm that samples models from the model space is called an exploration strategy. NNI supports the following exploration strategies for multi-trial NAS.

 .. list-table::
   :header-rows: 1
   :widths: auto

-   * - Name
+   * - Exploration Strategy Name
     - Brief Introduction of Algorithm
-   * - :githublink:`Random Search <examples/tuners/random_nas_tuner>`
-     - Randomly pick a model from search space
-   * - `PPO Tuner <../Tuner/BuiltinTuner.rst#PPO-Tuner>`__
-     - PPO Tuner is a Reinforcement Learning tuner based on PPO algorithm. `Reference Paper <https://arxiv.org/abs/1707.06347>`__
+   * - Random Strategy
+     - Randomly samples new model(s) from the user-defined model space. (``nni.retiarii.strategy.Random``)
+   * - Grid Search
+     - Samples new model(s) from the user-defined model space using the grid search algorithm. (``nni.retiarii.strategy.GridSearch``)
+   * - Regularized Evolution
+     - Generates new model(s) from previously generated models using the `regularized evolution algorithm <https://arxiv.org/abs/1802.01548>`__. (``nni.retiarii.strategy.RegularizedEvolution``)
+   * - TPE Strategy
+     - Samples new model(s) from the user-defined model space using the `TPE algorithm <https://papers.nips.cc/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf>`__. (``nni.retiarii.strategy.TPEStrategy``)
+   * - RL Strategy
+     - Uses the `PPO algorithm <https://arxiv.org/abs/1707.06347>`__ to sample new model(s) from the user-defined model space. (``nni.retiarii.strategy.PolicyBasedRL``)


-Please refer to `here <ClassicNas.rst>`__ for the usage of classic NAS algorithms.
+Please refer to `here <./multi_trial_nas.rst>`__ for detailed usage of multi-trial NAS.

-Supported One-shot NAS Algorithms
---------------------------------
+One-shot NAS
+------------

-NNI currently supports the one-shot NAS algorithms listed below and is adding more. Users can reproduce an algorithm or use it on their own dataset. We also encourage users to implement other algorithms with `NNI API <#use-nni-api>`__\ , to benefit more people.
+One-shot NAS means building the model space into a super-model, training the super-model with weight sharing, and then sampling models from the super-model to find the best one. `DARTS <https://arxiv.org/abs/1806.09055>`__ is a typical one-shot NAS.
+Below are the supported one-shot NAS algorithms. More one-shot NAS algorithms will be supported soon.

 .. list-table::
   :header-rows: 1
   :widths: auto

-   * - Name
+   * - One-shot Algorithm Name
     - Brief Introduction of Algorithm
   * - `ENAS <ENAS.rst>`__
     - `Efficient Neural Architecture Search via Parameter Sharing <https://arxiv.org/abs/1802.03268>`__. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. It uses parameter sharing between child models to achieve fast speed and excellent performance.
   * - `DARTS <DARTS.rst>`__
     - `DARTS: Differentiable Architecture Search <https://arxiv.org/abs/1806.09055>`__ introduces a novel algorithm for differentiable network architecture search on bilevel optimization.
-   * - `P-DARTS <PDARTS.rst>`__
-     - `Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation <https://arxiv.org/abs/1904.12760>`__ is based on DARTS. It introduces an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure.
   * - `SPOS <SPOS.rst>`__
     - `Single Path One-Shot Neural Architecture Search with Uniform Sampling <https://arxiv.org/abs/1904.00420>`__ constructs a simplified supernet trained with a uniform path sampling method and applies an evolutionary algorithm to efficiently search for the best-performing architectures.
-   * - `CDARTS <CDARTS.rst>`__
-     - `Cyclic Differentiable Architecture Search <https://arxiv.org/pdf/2006.10724.pdf>`__ builds a cyclic feedback mechanism between the search and evaluation networks. It introduces a cyclic differentiable architecture search framework which integrates the two networks into a unified architecture.
   * - `ProxylessNAS <Proxylessnas.rst>`__
     - `ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware <https://arxiv.org/abs/1812.00332>`__. It removes proxy, directly learns the architectures for large-scale target tasks and target hardware platforms.
-   * - `FBNet <FBNet.rst>`__
-     - `FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search <https://arxiv.org/abs/1812.03443>`__. It is a block-wise differentiable neural network architecture search method with the hardware-aware constraint.
-   * - `TextNAS <TextNAS.rst>`__
-     - `TextNAS: A Neural Architecture Search Space tailored for Text Representation <https://arxiv.org/pdf/1912.10729.pdf>`__. It is a neural architecture search algorithm tailored for text representation.
-   * - `Cream <Cream.rst>`__
-     - `Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search <https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf>`__. It is a new NAS algorithm that distills prioritized paths in the search space without using evolutionary algorithms. It achieves competitive performance on ImageNet, especially for small models (e.g., <200 M FLOPs).
-
-
-One-shot algorithms run **standalone without nnictl**. NNI supports both PyTorch and TensorFlow 2.x.
-
-Here are some common dependencies to run the examples. PyTorch 1.2 or above is required to use ``BoolTensor``.
-
-
-* tensorboard
-* PyTorch 1.2+
-* git
-
-Please refer to `here <NasGuide.rst>`__ for the usage of one-shot NAS algorithms.
-
-One-shot NAS can be visualized with our visualization tool. Learn more details `here <./Visualization.rst>`__.

-Search Space Zoo
----------------
-
-NNI provides some predefined search spaces that can be easily reused. By stacking the extracted cells, users can quickly reproduce those NAS models.
-
-Search Space Zoo contains the following NAS cells:
-
-
-* `DartsCell <./SearchSpaceZoo.rst#DartsCell>`__
-* `ENAS micro <./SearchSpaceZoo.rst#ENASMicroLayer>`__
-* `ENAS macro <./SearchSpaceZoo.rst#ENASMacroLayer>`__
-* `NAS Bench 201 <./SearchSpaceZoo.rst#nas-bench-201>`__
-
-Using NNI API to Write Your Search Space
----------------------------------------
-
-The programming interface of designing and searching a model is often demanded in two scenarios.
-
-
-#. When designing a neural network, there may be multiple operation choices on a layer, sub-model, or connection, and it's undetermined which one or combination performs best. So, it needs an easy way to express the candidate layers or sub-models.
-#. When applying NAS on a neural network, it needs a unified way to express the search space of architectures, so that it doesn't need to update trial code for different search algorithms.
-
-To use NNI NAS, we suggest that users first go through `the tutorial of NAS API for building search space <./WriteSearchSpace.rst>`__.
-
-NAS Visualization
-----------------
-
-To help users track the process and status of how the model is searched under specified search space, we developed a visualization tool. It visualizes search space as a super-net and shows importance of subnets and layers/operations, as well as how the importance changes along with the search process. Please refer to `the document of NAS visualization <./Visualization.rst>`__ for how to use it.
+Please refer to `here <one_shot_nas.rst>`__ for detailed usage of one-shot NAS algorithms.

 Reference and Feedback
 ----------------------

-
-* To `report a bug <https://github.com/microsoft/nni/issues/new?template=bug-report.rst>`__ for this feature in GitHub;
-* To `file a feature or improvement request <https://github.com/microsoft/nni/issues/new?template=enhancement.rst>`__ for this feature in GitHub.
+* `Quick Start <./QuickStart.rst>`__;
+* `Construct Your Model Space <./construct_space.rst>`__;
+* `Retiarii: A Deep Learning Exploratory-Training Framework <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__;
+* To `report a bug <https://github.com/microsoft/nni/issues/new?template=bug-report.rst>`__ for this feature in GitHub;
+* To `file a feature or improvement request <https://github.com/microsoft/nni/issues/new?template=enhancement.rst>`__ for this feature in GitHub.
--- a/docs/en_US/NAS/PDARTS.rst
+++ b/docs/en_US/NAS/PDARTS.rst
-P-DARTS
-=======
-
-Examples
--------
-
-:githublink:`Example code <examples/nas/legacy/pdarts>`
-
-.. code-block:: bash
-
-   # In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
-   git clone https://github.com/Microsoft/nni.git
-
-   # search the best architecture
-   cd examples/nas/legacy/pdarts
-   python3 search.py
-
-   # train the best architecture; it's the same process as darts.
-   cd ../darts
-   python3 retrain.py --arc-checkpoint ../pdarts/checkpoints/epoch_2.json
--- a/docs/en_US/NAS/Proxylessnas.rst
+++ b/docs/en_US/NAS/Proxylessnas.rst
@@ -9,7 +9,7 @@ The paper `ProxylessNAS: Direct Neural Architecture Search on Target Task and Ha
 Usage
 -----

-To use the ProxylessNAS training/searching approach, users need to specify the search space in their model using the `NNI NAS interface <NasGuide.rst>`__\ , e.g., ``LayerChoice``\ , ``InputChoice``. After defining and instantiating the model, the remaining work can be left to ProxylessNasTrainer by instantiating the trainer and passing the model to it.
+To use the ProxylessNAS training/searching approach, users need to specify the search space in their model using the `NNI NAS interface <./MutationPrimitives.rst>`__\ , e.g., ``LayerChoice``\ , ``InputChoice``. After defining and instantiating the model, the remaining work can be left to ProxylessNasTrainer by instantiating the trainer and passing the model to it.

 .. code-block:: python