Commit c993f767 authored by Yuge Zhang, committed by GitHub

Add SPOS docs and improve NAS doc structure (#1907)

* darts mutator docs

* fix docs

* update

* add docs for SPOS

* index SPOS

* restore workers
# DARTS
## Introduction
The paper [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Their method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent.
The authors' code optimizes the network weights and the architecture weights alternately in mini-batches. They further explore using second-order optimization (unrolling) instead of first-order optimization to improve performance.
Implementation on NNI is based on the [official implementation](https://github.com/quark0/darts) and a [popular 3rd-party repo](https://github.com/khanrc/pt.darts). DARTS on NNI is designed to be general for any search space. A CNN search space tailored to CIFAR10, the same as in the original paper, is implemented as a use case of DARTS.
## Reproduction Results
The above-mentioned example is meant to reproduce the results in the paper; we conducted experiments with both first-order and second-order optimization. Due to time limits, we retrained *only the best architecture* derived from the search phase and repeated the experiment *only once*. Our results are currently on par with the results reported in the paper. We will add more results later when ready.
| | In paper | Reproduction |
| ---------------------- | ------------- | ------------ |
| First order (CIFAR10) | 3.00 +/- 0.14 | 2.78 |
| Second order (CIFAR10) | 2.76 +/- 0.09 | 2.89 |
## Examples
### CNN Search Space
[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/darts)
```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git
# search the best architecture
cd examples/nas/darts
python3 search.py
# train the best architecture
python3 retrain.py --arc-checkpoint ./checkpoints/epoch_49.json
```
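For reference, the search phase can also be wired up directly in Python. The snippet below is a minimal sketch of constructing `DartsTrainer` (parameter names follow the reference documentation below); the toy model, metrics function, and hyper-parameter values are illustrative assumptions rather than the settings used in the example.

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

from nni.nas.pytorch import mutables
from nni.nas.pytorch.darts import DartsTrainer


class ToyNet(nn.Module):
    """A tiny network with a single LayerChoice, only to illustrate the wiring."""
    def __init__(self):
        super().__init__()
        self.conv = mutables.LayerChoice([
            nn.Conv2d(3, 16, 3, padding=1),   # candidate op 1
            nn.Conv2d(3, 16, 5, padding=2),   # candidate op 2
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        x = self.pool(self.conv(x))
        return self.fc(x.view(x.size(0), -1))


def accuracy(logits, targets):
    # metrics callable: receives logits and labels, returns a dict of metrics
    return {"acc": (logits.argmax(dim=-1) == targets).float().mean().item()}


transform = transforms.ToTensor()
dataset_train = datasets.CIFAR10("./data", train=True, download=True, transform=transform)
dataset_valid = datasets.CIFAR10("./data", train=False, download=True, transform=transform)

model = ToyNet()
trainer = DartsTrainer(
    model,
    loss=nn.CrossEntropyLoss(),
    metrics=accuracy,
    optimizer=torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9),
    num_epochs=2,                 # illustrative; the example trains for far more epochs
    dataset_train=dataset_train,  # split internally into weight/architecture training sets
    dataset_valid=dataset_valid,
    batch_size=64,
    unrolled=False,               # set True for second-order optimization
)
trainer.train()
```

In the actual example, `search.py` builds the CIFAR10 search-space CNN instead of a toy network and exports an architecture checkpoint per epoch, which is what `retrain.py --arc-checkpoint` consumes.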
## Reference
### PyTorch
```eval_rst
.. autoclass:: nni.nas.pytorch.darts.DartsTrainer
:members:
.. automethod:: __init__
.. autoclass:: nni.nas.pytorch.darts.DartsMutator
:members:
```
# ENAS
## Introduction
The paper [Efficient Neural Architecture Search via Parameter Sharing](https://arxiv.org/abs/1802.03268) uses parameter sharing between child models to accelerate the NAS process. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss.
Implementation on NNI is based on the [official implementation in Tensorflow](https://github.com/melodyguan/enas), including a general-purpose reinforcement-learning controller and a trainer that trains the target network and this controller alternately. Following the paper, we have also implemented the macro and micro search spaces on CIFAR10 to demonstrate how to use these trainers. Since the code to train from scratch on NNI is not ready yet, reproduction results are currently unavailable.
## Examples
### CIFAR10 Macro/Micro Search Space
[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/enas)
```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git
# search the best architecture
cd examples/nas/enas
# search in macro search space
python3 search.py --search-for macro
# search in micro search space
python3 search.py --search-for micro
# view more options for search
python3 search.py -h
```
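Similarly to DARTS, an ENAS search can be launched from Python. Below is a minimal sketch of constructing `EnasTrainer` with an accuracy-based reward (parameter names follow the reference documentation below); the toy model, fake datasets, and hyper-parameters are illustrative assumptions, not those used in the example.

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

from nni.nas.pytorch import mutables
from nni.nas.pytorch.enas import EnasTrainer


class ToyNet(nn.Module):
    """A toy network with one LayerChoice; the real example uses the CIFAR10 macro/micro spaces."""
    def __init__(self):
        super().__init__()
        self.op = mutables.LayerChoice([
            nn.Conv2d(3, 8, 3, padding=1),
            nn.Conv2d(3, 8, 5, padding=2),
        ])
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))

    def forward(self, x):
        return self.head(self.op(x))


def accuracy(logits, targets):
    return {"acc": (logits.argmax(dim=-1) == targets).float().mean().item()}


def reward(logits, targets):
    # reward fed to the RL controller: plain accuracy here
    return (logits.argmax(dim=-1) == targets).float().mean()


transform = transforms.ToTensor()
dataset_train = datasets.FakeData(size=256, image_size=(3, 32, 32), num_classes=10, transform=transform)
dataset_valid = datasets.FakeData(size=64, image_size=(3, 32, 32), num_classes=10, transform=transform)

model = ToyNet()
trainer = EnasTrainer(
    model,
    loss=nn.CrossEntropyLoss(),
    metrics=accuracy,
    reward_function=reward,
    optimizer=torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9),
    num_epochs=2,
    dataset_train=dataset_train,
    dataset_valid=dataset_valid,
    batch_size=32,
)
trainer.train()
```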
## Reference
### PyTorch
```eval_rst
.. autoclass:: nni.nas.pytorch.enas.EnasTrainer
:members:
.. automethod:: __init__
.. autoclass:: nni.nas.pytorch.enas.EnasMutator
:members:
.. automethod:: __init__
```
With this motivation, our ambition is to provide a unified architecture in NNI, to accelerate innovations on NAS, and apply state-of-the-art algorithms to real-world problems faster.
With [the unified interface](./NasInterface.md), there are two different modes for architecture search. [One](#supported-one-shot-nas-algorithms) is the so-called one-shot NAS, where a super-net is built based on the search space and one-shot training is used to generate a good-performing child model. [The other](./NasInterface.md#classic-distributed-search) is the traditional search approach, where each child model in the search space runs as an independent trial, the performance result is sent to the tuner, and the tuner generates a new child model.
* [Supported One-shot NAS Algorithms](#supported-one-shot-nas-algorithms)
* [Classic Distributed NAS with NNI experiment](./NasInterface.md#classic-distributed-search)
## Supported One-shot NAS Algorithms
NNI currently supports the NAS algorithms listed below and is adding more. Users can reproduce an algorithm or use it on their own dataset. We also encourage users to implement other algorithms with [NNI API](#use-nni-api), to benefit more people.
|Name|Brief Introduction of Algorithm|
|---|---|
| [ENAS](ENAS.md) | [Efficient Neural Architecture Search via Parameter Sharing](https://arxiv.org/abs/1802.03268). In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. It uses parameter sharing between child models to achieve fast speed and excellent performance. |
| [DARTS](DARTS.md) | [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) introduces a novel algorithm for differentiable network architecture search on bilevel optimization. |
| [P-DARTS](PDARTS.md) | [Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation](https://arxiv.org/abs/1904.12760) is based on DARTS. It introduces an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. |
| [SPOS](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with a uniform path sampling method, and applies an evolutionary algorithm to efficiently search for the best-performing architectures. |
One-shot algorithms run **standalone without nnictl**. Only the PyTorch version has been implemented; TensorFlow 2.x will be supported in a future release.
Here are some common dependencies to run the examples. PyTorch needs to be above 1.2 to use ``BoolTensor``.
* NNI 1.2+
* tensorboard
* PyTorch 1.2+
* git
## Use NNI API
NOTE: we are trying to support various NAS algorithms with a unified programming interface, and it is in a very experimental stage. The current programming interface may be updated in the future.
The programming interface of designing and searching a model is often demanded in two scenarios.
1. When designing a neural network, there may be multiple operation choices on a layer, sub-model, or connection, and it's undetermined which one or which combination performs best. So, an easy way to express the candidate layers or sub-models is needed.
2. When applying NAS on a neural network, a unified way to express the search space of architectures is needed, so that trial code doesn't have to be updated for different search algorithms.
The NNI proposed API is [here](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch). And [here](https://github.com/microsoft/nni/tree/master/examples/nas/naive) is an example of a NAS implementation based on the NNI proposed interface.
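As a concrete illustration of both scenarios, the sketch below uses the `LayerChoice` and `InputChoice` mutables from the proposed API to express candidate operations and candidate connections inside an ordinary PyTorch module; the toy cell itself is an illustrative assumption, not code from the examples.

```python
import torch.nn as nn
from nni.nas.pytorch import mutables


class SearchCell(nn.Module):
    """A cell whose operation and skip connection are left for the NAS algorithm to decide."""
    def __init__(self, channels):
        super().__init__()
        # candidate operations for this layer (scenario 1)
        self.op = mutables.LayerChoice([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.MaxPool2d(3, stride=1, padding=1),
        ], key="cell_op")
        # choose at most one of two candidate inputs, e.g. a skip connection (scenario 2)
        self.skip = mutables.InputChoice(n_candidates=2, n_chosen=1, key="cell_skip")

    def forward(self, x, prev):
        out = self.op(x)
        skip = self.skip([x, prev])
        return out if skip is None else out + skip
```

A model expressed this way can be handed to any of the one-shot trainers above, or searched with the classic distributed approach, without changing the trial code.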
[1]: https://arxiv.org/abs/1802.03268
[2]: https://arxiv.org/abs/1707.07012
# P-DARTS
## Examples
[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/pdarts)
```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git
# search the best architecture
cd examples/nas/pdarts
python3 search.py
# train the best architecture, it's the same process as darts.
cd ../darts
python3 retrain.py --arc-checkpoint ../pdarts/checkpoints/epoch_2.json
```
# Single Path One-Shot (SPOS)
## Introduction
Proposed in [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420), this one-shot NAS method addresses the difficulty of training one-shot NAS models by constructing a simplified supernet trained with a uniform path sampling method, so that all underlying architectures (and their weights) get trained fully and equally. An evolutionary algorithm is then applied to efficiently search for the best-performing architectures without any fine-tuning.
Implementation on NNI is based on the [official repo](https://github.com/megvii-model/SinglePathOneShot). We implement a trainer that trains the supernet, and an evolution tuner that leverages the power of the NNI framework to speed up the evolutionary search phase.
## Examples
Here is a use case, which uses the search space in the paper and shows how to use a FLOPs limit to perform uniform sampling.
[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/spos)
### Requirements
NVIDIA DALI >= 0.16 is needed as we use DALI to accelerate the data loading of ImageNet. [Installation guide](https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/installation.html)
Download the flops lookup table from [here](https://1drv.ms/u/s!Am_mmG2-KsrnajesvSdfsq_cN48?e=aHVppN) (maintained by [Megvii](https://github.com/megvii-model)).
Put `op_flops_dict.pkl` and `checkpoint-150000.pth.tar` (if you don't want to retrain the supernet) under `data` directory.
Prepare ImageNet in the standard format (follow the script [here](https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4)). Linking it to `data/imagenet` will be more convenient.
After preparation, it's expected to have the following code structure:
```
spos
├── architecture_final.json
├── blocks.py
├── config_search.yml
├── data
│ ├── imagenet
│ │ ├── train
│ │ └── val
│ └── op_flops_dict.pkl
├── dataloader.py
├── network.py
├── readme.md
├── scratch.py
├── supernet.py
├── tester.py
├── tuner.py
└── utils.py
```
### Step 1. Train Supernet
```
python supernet.py
```
This will export the checkpoint to the `checkpoints` directory for the next step.
NOTE: The data loading used in the official repo is [slightly different from usual](https://github.com/megvii-model/SinglePathOneShot/issues/5), as they use BGR tensors and intentionally keep the values between 0 and 255 to align with their own DL framework. The option `--spos-preprocessing` will simulate the original behavior and enable you to use the pretrained checkpoints.
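For reference, the following is a minimal sketch of roughly what `supernet.py` wires together: a supernet with mutables, a `SPOSSupernetTrainingMutator` configured with a FLOPs function for FLOPs-constrained uniform sampling, and a `SPOSSupernetTrainer` (a GPU is required by the trainer). The toy supernet, FLOPs values, and data loaders are illustrative assumptions; the real example uses the ShuffleNet-based supernet, `op_flops_dict.pkl`, and DALI-based ImageNet loaders, and the FLOPs bound parameter names below are assumed from the example rather than documented here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from nni.nas.pytorch import mutables
from nni.nas.pytorch.spos import SPOSSupernetTrainer, SPOSSupernetTrainingMutator


class ToySupernet(nn.Module):
    """A toy supernet with a single block choice, just to illustrate the wiring."""
    def __init__(self):
        super().__init__()
        self.block = mutables.LayerChoice([
            nn.Conv2d(3, 16, 3, padding=1),
            nn.Conv2d(3, 16, 5, padding=2),
        ], key="block0")
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))

    def forward(self, x):
        return self.head(self.block(x))


def toy_flops_func(candidate):
    # `candidate` maps mutable keys to sampled one-hot choices; a real implementation would
    # look the chosen op up in op_flops_dict.pkl. The constants below are purely illustrative.
    return 100 if candidate["block0"][0] else 180


def accuracy(logits, targets):
    return {"acc": (logits.argmax(dim=-1) == targets).float().mean().item()}


transform = transforms.ToTensor()
train_loader = DataLoader(datasets.FakeData(size=256, image_size=(3, 32, 32), num_classes=10,
                                            transform=transform), batch_size=32)
valid_loader = DataLoader(datasets.FakeData(size=64, image_size=(3, 32, 32), num_classes=10,
                                            transform=transform), batch_size=32)

model = ToySupernet()
mutator = SPOSSupernetTrainingMutator(
    model,
    flops_func=toy_flops_func,
    flops_lb=90,    # FLOPs lower/upper bounds; these parameter names are assumed from the example
    flops_ub=200,
)
trainer = SPOSSupernetTrainer(
    model,
    loss=nn.CrossEntropyLoss(),
    metrics=accuracy,
    optimizer=torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9),
    num_epochs=1,
    train_loader=train_loader,
    valid_loader=valid_loader,
    mutator=mutator,
)
trainer.train()
```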
### Step 2. Evolution Search
Single Path One-Shot leverages an evolutionary algorithm to search for the best architecture. The tester, which is responsible for testing the sampled architecture, recalculates all the batch norm statistics on a subset of training images, and evaluates the architecture on the full validation set.
In order to make the tuner aware of the FLOPs limit and able to calculate FLOPs, we created a new tuner called `EvolutionWithFlops` in `tuner.py`, which inherits from the evolution tuner in the SDK.
To have a search space ready for NNI framework, first run
```
nnictl ss_gen -t "python tester.py"
```
This will generate a file called `nni_auto_gen_search_space.json`, which is a serialized representation of your search space.
By default, it will use `checkpoint-150000.pth.tar` downloaded previously. If you want to use the checkpoint you trained yourself in the last step, specify `--checkpoint` in the command in `config_search.yml`.
Then search with evolution tuner.
```
nnictl create --config config_search.yml
```
The final architecture exported from every epoch of evolution can be found in `checkpoints` under the working directory of your tuner, which, by default, is `$HOME/nni/experiments/your_experiment_id/log`.
### Step 3. Train from Scratch
```
python scratch.py
```
By default, it will use `architecture_final.json`. This architecture is provided by the official repo (converted into NNI format). You can use any architecture (e.g., the architecture found in step 2) with `--fixed-arc` option.
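If you want to load a searched architecture into the network in your own code rather than through `scratch.py`, a hedged sketch is shown below; it assumes the `apply_fixed_architecture` helper from the NNI NAS SDK and the supernet class name from the example's `network.py`, neither of which is shown in this doc.

```python
# Hedged sketch: both imported names below are assumptions not documented on this page.
from nni.nas.pytorch.fixed import apply_fixed_architecture  # helper assumed from the NNI NAS SDK
from network import ShuffleNetV2OneShot                     # supernet class assumed from the example's network.py

model = ShuffleNetV2OneShot()
# Fix every mutable according to the exported JSON (architecture_final.json or one found in step 2).
apply_fixed_architecture(model, "architecture_final.json")
# From here on, `model` behaves like an ordinary PyTorch network and can be trained from scratch.
```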
## Reference
### PyTorch
```eval_rst
.. autoclass:: nni.nas.pytorch.spos.SPOSEvolution
:members:
.. automethod:: __init__
.. autoclass:: nni.nas.pytorch.spos.SPOSSupernetTrainer
:members:
.. automethod:: __init__
.. autoclass:: nni.nas.pytorch.spos.SPOSSupernetTrainingMutator
:members:
.. automethod:: __init__
```
## Known Limitations
* Block search only. Channel search is not supported yet.
* Only GPU version is provided here.
## Current Reproduction Results
Reproduction is still in progress. Due to the gap between the official release and the original paper, we compare our current results with both the official repo (our own run) and the paper.
* The evolution phase is almost aligned with the official repo. Our evolution algorithm shows a converging trend and reaches ~65% accuracy at the end of the search. Nevertheless, this result is not on par with the paper. For details, please refer to [this issue](https://github.com/megvii-model/SinglePathOneShot/issues/6).
* The retrain phase is not aligned. Our retraining code, which uses the architecture released by the authors, reaches 72.14% accuracy, still leaving a gap to the 73.61% achieved by the official release and the 74.3% reported in the original paper.
For details, please refer to the following tutorials:
NAS Interface <NAS/NasInterface>
ENAS <NAS/ENAS>
DARTS <NAS/DARTS>
P-DARTS <NAS/PDARTS>
SPOS <NAS/SPOS>
[Documentation](https://nni.readthedocs.io/en/latest/NAS/DARTS.html)
[Documentation](https://nni.readthedocs.io/en/latest/NAS/ENAS.html)
This is a naive example that demonstrates how to use the NNI interface to implement a NAS search space.
[Documentation](https://nni.readthedocs.io/en/latest/NAS/PDARTS.html)
# Single Path One-Shot Neural Architecture Search with Uniform Sampling
[Documentation](https://nni.readthedocs.io/en/latest/NAS/SPOS.html)
Single Path One-Shot by Megvii Research. [Paper link](https://arxiv.org/abs/1904.00420). [Official repo](https://github.com/megvii-model/SinglePathOneShot).
Block search only. Channel search is not supported yet.
Only GPU version is provided here.
## Preparation
### Requirements
* PyTorch >= 1.2
* NVIDIA DALI >= 0.16 as we use DALI to accelerate the data loading of ImageNet. [Installation guide](https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/installation.html)
### Data
Download the flops lookup table from [here](https://1drv.ms/u/s!Am_mmG2-KsrnajesvSdfsq_cN48?e=aHVppN).
Put `op_flops_dict.pkl` and `checkpoint-150000.pth.tar` (if you don't want to retrain the supernet) under `data` directory.
Prepare ImageNet in the standard format (follow the script [here](https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4)). Linking it to `data/imagenet` will be more convenient.
After preparation, it's expected to have the following code structure:
```
spos
├── architecture_final.json
├── blocks.py
├── config_search.yml
├── data
│   ├── imagenet
│   │   ├── train
│   │   └── val
│   └── op_flops_dict.pkl
├── dataloader.py
├── network.py
├── readme.md
├── scratch.py
├── supernet.py
├── tester.py
├── tuner.py
└── utils.py
```
## Step 1. Train Supernet
```
python supernet.py
```
This will export the checkpoint to the `checkpoints` directory for the next step.
NOTE: The data loading used in the official repo is [slightly different from usual](https://github.com/megvii-model/SinglePathOneShot/issues/5), as they use BGR tensors and intentionally keep the values between 0 and 255 to align with their own DL framework. The option `--spos-preprocessing` will simulate the original behavior and enable you to use the pretrained checkpoints.
## Step 2. Evolution Search
Single Path One-Shot leverages an evolutionary algorithm to search for the best architecture. The tester, which is responsible for testing the sampled architecture, recalculates all the batch norm statistics on a subset of training images, and evaluates the architecture on the full validation set.
To have a search space ready for NNI framework, first run
```
nnictl ss_gen -t "python tester.py"
```
This will generate a file called `nni_auto_gen_search_space.json`, which is a serialized representation of your search space.
By default, it will use `checkpoint-150000.pth.tar` downloaded previously. If you want to use the checkpoint you trained yourself in the last step, specify `--checkpoint` in the command in `config_search.yml`.
Then search with evolution tuner.
```
nnictl create --config config_search.yml
```
The final architecture exported from every epoch of evolution can be found in `checkpoints` under the working directory of your tuner, which, by default, is `$HOME/nni/experiments/your_experiment_id/log`.
## Step 3. Train from Scratch
```
python scratch.py
```
By default, it will use `architecture_final.json`. This architecture is provided by the official repo (converted into NNI format). You can use any architecture (e.g., the architecture found in step 2) with `--fixed-arc` option.
## Current Reproduction Results
Reproduction is still in progress. Due to the gap between the official release and the original paper, we compare our current results with both the official repo (our own run) and the paper.
* The evolution phase is almost aligned with the official repo. Our evolution algorithm shows a converging trend and reaches ~65% accuracy at the end of the search. Nevertheless, this result is not on par with the paper. For details, please refer to [this issue](https://github.com/megvii-model/SinglePathOneShot/issues/6).
* The retrain phase is not aligned. Our retraining code, which uses the architecture released by the authors, reaches 72.14% accuracy, still leaving a gap to the 73.61% achieved by the official release and the 74.3% reported in the original paper.
class DartsMutator(Mutator):
"""
Connects the model in a DARTS (differentiable) way.
An extra connection is automatically inserted for each LayerChoice; when this connection is selected, there is no
op on this LayerChoice (namely a ``ZeroOp``), in which case every element in the exported choice list is ``false``
(not chosen).
All input choices will be fully connected in the search phase. On exporting, an input choice will choose inputs based
on the keys in ``choose_from``. If a key is the key of a LayerChoice, the top logit of that LayerChoice
will join the competition of the input choice against the other logits. Otherwise, the logit is assumed to be 0.
It's possible to cut branches by setting the corresponding position of the ``choices`` parameter to ``-inf``. After softmax, the
value would be 0. The framework will ignore 0 values and not connect. Note that the gradient at the ``-inf`` location will
be 0. Since manipulations with ``-inf`` can produce ``nan``, you need to handle the gradient update phase carefully.
Attributes
----------
choices : ParameterDict
Dict that maps keys of LayerChoices to weighted-connection float tensors.
"""
def __init__(self, model):
super().__init__(model)
self.choices = nn.ParameterDict()
class DartsTrainer(Trainer):
optimizer, num_epochs, dataset_train, dataset_valid,
mutator=None, batch_size=64, workers=4, device=None, log_frequency=None,
callbacks=None, arc_learning_rate=3.0E-4, unrolled=False):
"""
Initialize a DartsTrainer.
Parameters
----------
model : nn.Module
PyTorch model to be trained.
loss : callable
Receives logits and ground truth label, returns a loss tensor.
metrics : callable
Receives logits and ground truth label, returns a dict of metrics.
optimizer : Optimizer
The optimizer used for optimizing the model.
num_epochs : int
Number of epochs planned for training.
dataset_train : Dataset
Dataset for training. Will be split for training weights and architecture weights.
dataset_valid : Dataset
Dataset for testing.
mutator : DartsMutator
Use this if you are customizing your own DartsMutator. By default, a DartsMutator will be instantiated.
batch_size : int
Batch size.
workers : int
Workers for data loading.
device : torch.device
``torch.device("cpu")`` or ``torch.device("cuda")``.
log_frequency : int
Step count per logging.
callbacks : list of Callback
list of callbacks to trigger at events.
arc_learning_rate : float
Learning rate of architecture parameters.
unrolled : bool
``True`` if using second-order optimization, else first-order optimization.
"""
super().__init__(model, mutator if mutator is not None else DartsMutator(model),
loss, metrics, optimizer, num_epochs, dataset_train, dataset_valid,
batch_size, workers, device, log_frequency, callbacks)
class EnasMutator(Mutator):
def __init__(self, model, lstm_size=64, lstm_num_layers=1, tanh_constant=1.5, cell_exit_extra_step=False,
skip_target=0.4, branch_bias=0.25):
"""
Initialize a EnasMutator.
Parameters
----------
model : nn.Module
PyTorch model.
lstm_size : int
Controller LSTM hidden units.
lstm_num_layers : int
Number of layers for stacked LSTM.
tanh_constant : float
Logits will be equal to ``tanh_constant * tanh(logits)``. ``tanh`` is not applied if this value is ``None``.
cell_exit_extra_step : bool
If true, the RL controller will perform an extra step at the exit of each MutableScope, dump the hidden state
and mark it as the hidden state of this MutableScope. This is to align with the original implementation of the paper.
skip_target : float
Target probability that a skip connection will appear.
branch_bias : float
Manual bias applied to make some operations more likely to be chosen.
Currently this is implemented with a hardcoded match rule that aligns with the original repo.
If a mutable has ``reduce`` in its key, all its op choices
that contain ``conv`` in their typename will initially receive a bias of ``+self.branch_bias``, while the others
receive a bias of ``-self.branch_bias``.
"""
super().__init__(model)
self.lstm_size = lstm_size
self.lstm_num_layers = lstm_num_layers
class EnasTrainer(Trainer):
mutator=None, batch_size=64, workers=4, device=None, log_frequency=None, callbacks=None,
entropy_weight=0.0001, skip_weight=0.8, baseline_decay=0.999,
mutator_lr=0.00035, mutator_steps_aggregate=20, mutator_steps=50, aux_weight=0.4):
"""
Initialize an EnasTrainer.
Parameters
----------
model : nn.Module
PyTorch model to be trained.
loss : callable
Receives logits and ground truth label, returns a loss tensor.
metrics : callable
Receives logits and ground truth label, returns a dict of metrics.
reward_function : callable
Receives logits and ground truth label, returns a tensor, which will be fed to the RL controller as the reward.
optimizer : Optimizer
The optimizer used for optimizing the model.
num_epochs : int
Number of epochs planned for training.
dataset_train : Dataset
Dataset for training. Will be split for training weights and architecture weights.
dataset_valid : Dataset
Dataset for testing.
mutator : EnasMutator
Use when customizing your own mutator or a mutator with customized parameters.
batch_size : int
Batch size.
workers : int
Workers for data loading.
device : torch.device
``torch.device("cpu")`` or ``torch.device("cuda")``.
log_frequency : int
Step count per logging.
callbacks : list of Callback
list of callbacks to trigger at events.
entropy_weight : float
Weight of sample entropy loss.
skip_weight : float
Weight of skip penalty loss.
baseline_decay : float
Decay factor of baseline. New baseline will be equal to ``baseline_decay * baseline_old + reward * (1 - baseline_decay)``.
mutator_lr : float
Learning rate for RL controller.
mutator_steps_aggregate : int
Number of steps that will be aggregated into one mini-batch for RL controller.
mutator_steps : int
Number of mini-batches for each epoch of RL controller learning.
aux_weight : float
Weight of auxiliary head loss. ``aux_weight * aux_loss`` will be added to total loss.
"""
super().__init__(model, mutator if mutator is not None else EnasMutator(model),
loss, metrics, optimizer, num_epochs, dataset_train, dataset_valid,
batch_size, workers, device, log_frequency, callbacks)
class SPOSEvolution(Tuner):
Parameters
----------
result : dict
Chosen architectures to be exported.
""" """
os.makedirs("checkpoints", exist_ok=True) os.makedirs("checkpoints", exist_ok=True)
for i, cand in enumerate(result): for i, cand in enumerate(result):
class SPOSSupernetTrainingMutator(RandomMutator):
Parameters
----------
model : nn.Module
PyTorch model.
flops_func : callable
Callable that takes a candidate from ``sample_search`` and returns its FLOPs. When ``flops_func``
is None, functions related to flops will be deactivated.
class SPOSSupernetTrainer(Trainer):
optimizer, num_epochs, train_loader, valid_loader,
mutator=None, batch_size=64, workers=4, device=None, log_frequency=None,
callbacks=None):
"""
Parameters
----------
model : nn.Module
Model with mutables.
mutator : Mutator
A mutator object that has been initialized with the model.
loss : callable
Called with logits and targets. Returns a loss tensor.
metrics : callable
Returns a dict that maps metrics keys to metrics data.
optimizer : Optimizer
Optimizer that optimizes the model.
num_epochs : int
Number of epochs of training.
train_loader : iterable
Data loader for training. Raises ``StopIteration`` when one epoch is exhausted.
valid_loader : iterable
Data loader for validation. Raises ``StopIteration`` when one epoch is exhausted.
batch_size : int
Batch size.
workers : int
Number of threads for data preprocessing. Not used by this trainer. May be removed in the future.
device : torch.device
Device object. Either ``torch.device("cuda")`` or ``torch.device("cpu")``. When ``None``, the trainer will
automatically detect GPU and select GPU first.
log_frequency : int
Number of mini-batches to log metrics.
callbacks : list of Callback
Callbacks to plug into the trainer. See Callbacks.
"""
assert torch.cuda.is_available()
super().__init__(model, mutator if mutator is not None else SPOSSupernetTrainingMutator(model),
loss, metrics, optimizer, num_epochs, None, None,
class Trainer(BaseTrainer):
workers : int
Number of workers used in data preprocessing.
device : torch.device
Device object. Either ``torch.device("cuda")`` or ``torch.device("cpu")``. When ``None``, the trainer will
automatically detect GPU and select GPU first.
log_frequency : int
Number of mini-batches to log metrics.