Unverified Commit cef9babd authored by Yuge Zhang's avatar Yuge Zhang Committed by GitHub

[Doc] NAS (#4584)

parent ad5aff39
.. 60cb924d0ec522b7709acf4f8cff3f16
.. b1551bf7ef0c652ee5078598183fda45
####################
Python API Reference
......@@ -9,6 +9,5 @@ Python API Reference
:maxdepth: 1
Auto Tuning <autotune_ref>
NAS <NAS/ApiReference>
Model Compression <Compression/CompressionReference>
Python API <Tutorial/HowToLaunchFromPython>
\ No newline at end of file
......@@ -40,7 +40,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
".. tip:: Always keep in mind that you should use ``import nni.retiarii.nn.pytorch as nn`` and :meth:`nni.retiarii.model_wrapper`.\n Many mistakes are a result of forgetting one of those.\n Also, please use ``torch.nn`` for submodules of ``nn.init``, e.g., ``torch.nn.init`` instead of ``nn.init``.\n\n### Define Model Mutations\n\nA base model is only one concrete model not a model space. We provide :doc:`API and Primitives </NAS/MutationPrimitives>`\nfor users to express how the base model can be mutated. That is, to build a model space which includes many models.\n\nBased on the above base model, we can define a model space as below.\n\n.. code-block:: diff\n\n @model_wrapper\n class Net(nn.Module):\n def __init__(self):\n super().__init__()\n self.conv1 = nn.Conv2d(1, 32, 3, 1)\n - self.conv2 = nn.Conv2d(32, 64, 3, 1)\n + self.conv2 = nn.LayerChoice([\n + nn.Conv2d(32, 64, 3, 1),\n + DepthwiseSeparableConv(32, 64)\n + ])\n - self.dropout1 = nn.Dropout(0.25)\n + self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75]))\n self.dropout2 = nn.Dropout(0.5)\n - self.fc1 = nn.Linear(9216, 128)\n - self.fc2 = nn.Linear(128, 10)\n + feature = nn.ValueChoice([64, 128, 256])\n + self.fc1 = nn.Linear(9216, feature)\n + self.fc2 = nn.Linear(feature, 10)\n\n def forward(self, x):\n x = F.relu(self.conv1(x))\n x = F.max_pool2d(self.conv2(x), 2)\n x = torch.flatten(self.dropout1(x), 1)\n x = self.fc2(self.dropout2(F.relu(self.fc1(x))))\n output = F.log_softmax(x, dim=1)\n return output\n\nThis results in the following code:\n\n"
".. tip:: Always keep in mind that you should use ``import nni.retiarii.nn.pytorch as nn`` and :meth:`nni.retiarii.model_wrapper`.\n Many mistakes are a result of forgetting one of those.\n Also, please use ``torch.nn`` for submodules of ``nn.init``, e.g., ``torch.nn.init`` instead of ``nn.init``.\n\n### Define Model Mutations\n\nA base model is only one concrete model not a model space. We provide :doc:`API and Primitives </nas/construct_space>`\nfor users to express how the base model can be mutated. That is, to build a model space which includes many models.\n\nBased on the above base model, we can define a model space as below.\n\n.. code-block:: diff\n\n @model_wrapper\n class Net(nn.Module):\n def __init__(self):\n super().__init__()\n self.conv1 = nn.Conv2d(1, 32, 3, 1)\n - self.conv2 = nn.Conv2d(32, 64, 3, 1)\n + self.conv2 = nn.LayerChoice([\n + nn.Conv2d(32, 64, 3, 1),\n + DepthwiseSeparableConv(32, 64)\n + ])\n - self.dropout1 = nn.Dropout(0.25)\n + self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75]))\n self.dropout2 = nn.Dropout(0.5)\n - self.fc1 = nn.Linear(9216, 128)\n - self.fc2 = nn.Linear(128, 10)\n + feature = nn.ValueChoice([64, 128, 256])\n + self.fc1 = nn.Linear(9216, feature)\n + self.fc2 = nn.Linear(feature, 10)\n\n def forward(self, x):\n x = F.relu(self.conv1(x))\n x = F.max_pool2d(self.conv2(x), 2)\n x = torch.flatten(self.dropout1(x), 1)\n x = self.fc2(self.dropout2(F.relu(self.fc1(x))))\n output = F.log_softmax(x, dim=1)\n return output\n\nThis results in the following code:\n\n"
]
},
{
......@@ -58,7 +58,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This example uses two mutation APIs, ``nn.LayerChoice`` and ``nn.ValueChoice``.\n``nn.LayerChoice`` takes a list of candidate modules (two in this example), one will be chosen for each sampled model.\nIt can be used like normal PyTorch module.\n``nn.ValueChoice`` takes a list of candidate values, one will be chosen to take effect for each sampled model.\n\nMore detailed API description and usage can be found :doc:`here </NAS/construct_space>`.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>We are actively enriching the mutation APIs, to facilitate easy construction of model space.\n If the currently supported mutation APIs cannot express your model space,\n please refer to :doc:`this doc </NAS/Mutators>` for customizing mutators.</p></div>\n\n## Explore the Defined Model Space\n\nThere are basically two exploration approaches: (1) search by evaluating each sampled model independently,\nwhich is the search approach in multi-trial NAS and (2) one-shot weight-sharing based search, which is used in one-shot NAS.\nWe demonstrate the first approach in this tutorial. Users can refer to :doc:`here </NAS/OneshotTrainer>` for the second approach.\n\nFirst, users need to pick a proper exploration strategy to explore the defined model space.\nSecond, users need to pick or customize a model evaluator to evaluate the performance of each explored model.\n\n### Pick an exploration strategy\n\nRetiarii supports many :doc:`exploration strategies </NAS/ExplorationStrategies>`.\n\nSimply choosing (i.e., instantiate) an exploration strategy as below.\n\n"
"This example uses two mutation APIs, ``nn.LayerChoice`` and ``nn.ValueChoice``.\n``nn.LayerChoice`` takes a list of candidate modules (two in this example), one will be chosen for each sampled model.\nIt can be used like a normal PyTorch module.\n``nn.ValueChoice`` takes a list of candidate values, one will be chosen to take effect for each sampled model.\n\nMore detailed API description and usage can be found :doc:`here </nas/construct_space>`.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>We are actively enriching the mutation APIs, to facilitate easy construction of model space.\n    If the currently supported mutation APIs cannot express your model space,\n    please refer to :doc:`this doc </nas/mutator>` for customizing mutators.</p></div>\n\n## Explore the Defined Model Space\n\nThere are basically two exploration approaches: (1) search by evaluating each sampled model independently,\nwhich is the search approach in `multi-trial NAS <multi-trial-nas>`\nand (2) one-shot weight-sharing based search, which is used in one-shot NAS.\nWe demonstrate the first approach in this tutorial. Users can refer to `here <one-shot-nas>` for the second approach.\n\nFirst, users need to pick a proper exploration strategy to explore the defined model space.\nSecond, users need to pick or customize a model evaluator to evaluate the performance of each explored model.\n\n### Pick an exploration strategy\n\nRetiarii supports many :doc:`exploration strategies </nas/exploration_strategy>`.\n\nSimply choose (i.e., instantiate) an exploration strategy as below.\n\n"
]
},
{
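For intuition about the size of the space defined above: the two-way ``LayerChoice`` plus the two three-way ``ValueChoice``s yield 2 × 3 × 3 = 18 concrete models. The following is a framework-free sketch of enumerating and sampling such a space (plain Python with illustrative names; NNI's actual sampling lives inside its exploration strategies):

```python
import itertools
import random

# Choice axes mirroring the mutated model space in the diff above:
# conv2 has 2 candidate modules, dropout1 has 3 rates, feature has 3 widths.
space = {
    "conv2": ["Conv2d(32, 64, 3, 1)", "DepthwiseSeparableConv(32, 64)"],
    "dropout1": [0.25, 0.5, 0.75],
    "feature": [64, 128, 256],
}

# Enumerate every concrete model configuration in the space.
all_models = [dict(zip(space, combo))
              for combo in itertools.product(*space.values())]
print(len(all_models))  # 18

# Sampling one configuration is what an exploration strategy
# does once per trial in multi-trial NAS.
sample = random.choice(all_models)
```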
......@@ -76,7 +76,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pick or customize a model evaluator\n\nIn the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training and validating each generated model to obtain the model's performance. The performance is sent to the exploration strategy for the strategy to generate better models.\n\nRetiarii has provided :doc:`built-in model evaluators </NAS/ModelEvaluators>`, but to start with, it is recommended to use ``FunctionalEvaluator``, that is, to wrap your own training and evaluation code with one single function. This function should receive one single model class and uses ``nni.report_final_result`` to report the final score of this model.\n\nAn example here creates a simple evaluator that runs on MNIST dataset, trains for 2 epochs, and reports its validation accuracy.\n\n"
"### Pick or customize a model evaluator\n\nIn the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training\nand validating each generated model to obtain the model's performance.\nThe performance is sent to the exploration strategy for the strategy to generate better models.\n\nRetiarii has provided :doc:`built-in model evaluators </nas/evaluator>`, but to start with,\nit is recommended to use ``FunctionalEvaluator``, that is, to wrap your own training and evaluation code with one single function.\nThis function should receive one single model class and use ``nni.report_final_result`` to report the final score of this model.\n\nAn example here creates a simple evaluator that runs on the MNIST dataset, trains for 2 epochs, and reports its validation accuracy.\n\n"
]
},
{
......@@ -112,7 +112,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The ``train_epoch`` and ``test_epoch`` here can be any customized function, where users can write their own training recipe.\n\nIt is recommended that the :doc:``evaluate_model`` here accepts no additional arguments other than ``model_cls``.\nHowever, in the `advanced tutorial </NAS/ModelEvaluators>`, we will show how to use additional arguments in case you actually need those.\nIn future, we will support mutation on the arguments of evaluators, which is commonly called \"Hyper-parmeter tuning\".\n\n## Launch an Experiment\n\nAfter all the above are prepared, it is time to start an experiment to do the model search. An example is shown below.\n\n"
"The ``train_epoch`` and ``test_epoch`` here can be any customized function, where users can write their own training recipe.\n\nIt is recommended that the ``evaluate_model`` here accepts no additional arguments other than ``model_cls``.\nHowever, in the `advanced tutorial </nas/evaluator>`, we will show how to use additional arguments in case you actually need those.\nIn the future, we will support mutation on the arguments of evaluators, which is commonly called \"Hyper-parameter tuning\".\n\n## Launch an Experiment\n\nAfter all the above are prepared, it is time to start an experiment to do the model search. An example is shown below.\n\n"
]
},
{
......@@ -159,7 +159,7 @@
},
"outputs": [],
"source": [
"exp_config.trial_gpu_number = 1\nexp_config.training_service.use_active_gpu = False"
"exp_config.trial_gpu_number = 1\nexp_config.training_service.use_active_gpu = True"
]
},
{
......@@ -184,7 +184,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Users can also run Retiarii Experiment with :doc:`different training services <../training_services>` besides ``local`` training service.\n\n## Visualize the Experiment\n\nUsers can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment.\nFor example, open ``localhost:8081`` in your browser, 8081 is the port that you set in ``exp.run``.\nPlease refer to :doc:`here <../Tutorial/WebUI>` for details.\n\nWe support visualizing models with 3rd-party visualization engines (like `Netron <https://netron.app/>`__).\nThis can be used by clicking ``Visualization`` in detail panel for each trial.\nNote that current visualization is based on `onnx <https://onnx.ai/>`__ ,\nthus visualization is not feasible if the model cannot be exported into onnx.\n\nBuilt-in evaluators (e.g., Classification) will automatically export the model into a file.\nFor your own evaluator, you need to save your file into ``$NNI_OUTPUT_DIR/model.onnx`` to make this work.\nFor instance,\n\n"
"Users can also run a Retiarii Experiment with :doc:`different training services </experiment/training_service>`\nbesides the ``local`` training service.\n\n## Visualize the Experiment\n\nUsers can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment.\nFor example, open ``localhost:8081`` in your browser; 8081 is the port that you set in ``exp.run``.\nPlease refer to :doc:`here </experiment/webui>` for details.\n\nWe support visualizing models with 3rd-party visualization engines (like `Netron <https://netron.app/>`__).\nThis can be used by clicking ``Visualization`` in the detail panel for each trial.\nNote that the current visualization is based on `onnx <https://onnx.ai/>`__,\nthus visualization is not feasible if the model cannot be exported into onnx.\n\nBuilt-in evaluators (e.g., Classification) will automatically export the model into a file.\nFor your own evaluator, you need to save your file into ``$NNI_OUTPUT_DIR/model.onnx`` to make this work.\nFor instance,\n\n"
]
},
{
......
......@@ -66,7 +66,7 @@ class Net(nn.Module):
# Define Model Mutations
# ^^^^^^^^^^^^^^^^^^^^^^
#
# A base model is only one concrete model not a model space. We provide :doc:`API and Primitives </NAS/MutationPrimitives>`
# A base model is only one concrete model not a model space. We provide :doc:`API and Primitives </nas/construct_space>`
# for users to express how the base model can be mutated. That is, to build a model space which includes many models.
#
# Based on the above base model, we can define a model space as below.
......@@ -150,20 +150,21 @@ model_space
# It can be used like a normal PyTorch module.
# ``nn.ValueChoice`` takes a list of candidate values, one will be chosen to take effect for each sampled model.
#
# More detailed API description and usage can be found :doc:`here </NAS/construct_space>`.
# More detailed API description and usage can be found :doc:`here </nas/construct_space>`.
#
# .. note::
#
# We are actively enriching the mutation APIs to facilitate easy construction of model spaces.
# If the currently supported mutation APIs cannot express your model space,
# please refer to :doc:`this doc </NAS/Mutators>` for customizing mutators.
# please refer to :doc:`this doc </nas/mutator>` for customizing mutators.
#
# Explore the Defined Model Space
# -------------------------------
#
# There are basically two exploration approaches: (1) search by evaluating each sampled model independently,
# which is the search approach in multi-trial NAS and (2) one-shot weight-sharing based search, which is used in one-shot NAS.
# We demonstrate the first approach in this tutorial. Users can refer to :doc:`here </NAS/OneshotTrainer>` for the second approach.
# which is the search approach in :ref:`multi-trial NAS <multi-trial-nas>`
# and (2) one-shot weight-sharing based search, which is used in one-shot NAS.
# We demonstrate the first approach in this tutorial. Users can refer to :ref:`here <one-shot-nas>` for the second approach.
#
# First, users need to pick a proper exploration strategy to explore the defined model space.
# Second, users need to pick or customize a model evaluator to evaluate the performance of each explored model.
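The loop described above — the strategy samples a model, the evaluator scores it, and the score flows back to the strategy — can be sketched without any NNI machinery. ``toy_evaluator`` below is a stand-in for real training and validation:

```python
import random

def toy_evaluator(config):
    # Stand-in for real training + validation: a deterministic toy score.
    return config["feature"] / 256 - config["dropout"]

def explore(candidates, budget, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(budget):
        model = rng.choice(candidates)    # (1) strategy samples a model
        score = toy_evaluator(model)      # (2) evaluator trains/validates it
        history.append((score, model))    # (3) score feeds back to the strategy
    return max(history, key=lambda pair: pair[0])

candidates = [{"feature": f, "dropout": d}
              for f in (64, 128, 256) for d in (0.25, 0.5, 0.75)]
best_score, best_model = explore(candidates, budget=5)
```

A smarter strategy would use ``history`` to bias later samples toward promising regions; random search simply ignores it.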
......@@ -171,7 +172,7 @@ model_space
# Pick an exploration strategy
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# Retiarii supports many :doc:`exploration strategies </NAS/ExplorationStrategies>`.
# Retiarii supports many :doc:`exploration strategies </nas/exploration_strategy>`.
#
# Simply choose (i.e., instantiate) an exploration strategy as below.
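The ``dedup=True`` flag used below makes random search skip configurations it has already tried. A minimal illustration of deduplicated random sampling (plain Python; not NNI's actual implementation):

```python
import random

def random_dedup(candidates, budget, seed=0):
    """Yield up to `budget` distinct candidates in random order."""
    rng = random.Random(seed)
    seen = set()
    picks = []
    while len(picks) < budget and len(seen) < len(candidates):
        choice = rng.choice(candidates)
        if choice in seen:        # dedup: already explored, sample again
            continue
        seen.add(choice)
        picks.append(choice)
    return picks

picks = random_dedup([64, 128, 256], budget=10)
print(sorted(picks))  # [64, 128, 256] -- all distinct, space exhausted
```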
......@@ -182,9 +183,13 @@ search_strategy = strategy.Random(dedup=True) # dedup=False if deduplication is
# Pick or customize a model evaluator
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training and validating each generated model to obtain the model's performance. The performance is sent to the exploration strategy for the strategy to generate better models.
# In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training
# and validating each generated model to obtain the model's performance.
# The performance is sent to the exploration strategy for the strategy to generate better models.
#
# Retiarii has provided :doc:`built-in model evaluators </NAS/ModelEvaluators>`, but to start with, it is recommended to use ``FunctionalEvaluator``, that is, to wrap your own training and evaluation code with one single function. This function should receive one single model class and uses ``nni.report_final_result`` to report the final score of this model.
# Retiarii has provided :doc:`built-in model evaluators </nas/evaluator>`, but to start with,
# it is recommended to use ``FunctionalEvaluator``, that is, to wrap your own training and evaluation code with one single function.
# This function should receive one single model class and use ``nni.report_final_result`` to report the final score of this model.
#
# An example here creates a simple evaluator that runs on the MNIST dataset, trains for 2 epochs, and reports its validation accuracy.
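The contract just described — one function that receives a model class, trains and validates it, and reports exactly one final number — can be sketched as follows. The model and metric here are dummies; in real use the body would train on MNIST and call ``nni.report_final_result``:

```python
reported = []

def report_final_result(metric):
    # Stand-in for nni.report_final_result in this sketch.
    reported.append(metric)

class DummyModel:
    """Placeholder for a sampled model class handed in by the framework."""
    def validate(self):
        return 0.97  # pretend validation accuracy after 2 epochs

def evaluate_model(model_cls):
    # The evaluator receives a model *class* and instantiates it itself;
    # the actual training loop is elided in this sketch.
    model = model_cls()
    report_final_result(model.validate())  # exactly one final score

evaluate_model(DummyModel)
print(reported)  # [0.97]
```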
......@@ -266,7 +271,7 @@ evaluator = FunctionalEvaluator(evaluate_model)
# The ``train_epoch`` and ``test_epoch`` here can be any customized function, where users can write their own training recipe.
#
# It is recommended that the ``evaluate_model`` here accepts no additional arguments other than ``model_cls``.
# However, in the `advanced tutorial </NAS/ModelEvaluators>`, we will show how to use additional arguments in case you actually need those.
# However, in the `advanced tutorial </nas/evaluator>`, we will show how to use additional arguments in case you actually need those.
# In the future, we will support mutation on the arguments of evaluators, which is commonly called "Hyper-parameter tuning".
#
# Launch an Experiment
......@@ -290,7 +295,7 @@ exp_config.trial_concurrency = 2 # will run two trials concurrently
# ``use_active_gpu`` should be set true if you wish to use an occupied GPU (possibly running a GUI).
exp_config.trial_gpu_number = 1
exp_config.training_service.use_active_gpu = False
exp_config.training_service.use_active_gpu = True
# %%
# Launch the experiment. The experiment should take several minutes to finish on a workstation with 2 GPUs.
......@@ -298,14 +303,15 @@ exp_config.training_service.use_active_gpu = False
exp.run(exp_config, 8081)
# %%
# Users can also run Retiarii Experiment with :doc:`different training services <../training_services>` besides ``local`` training service.
# Users can also run a Retiarii Experiment with :doc:`different training services </experiment/training_service>`
# besides ``local`` training service.
#
# Visualize the Experiment
# ------------------------
#
# Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment.
# For example, open ``localhost:8081`` in your browser; 8081 is the port that you set in ``exp.run``.
# Please refer to :doc:`here <../Tutorial/WebUI>` for details.
# Please refer to :doc:`here </experiment/webui>` for details.
#
# We support visualizing models with 3rd-party visualization engines (like `Netron <https://netron.app/>`__).
# This can be used by clicking ``Visualization`` in the detail panel for each trial.
......
49ae2fd144f8c845a18b778edf168636
\ No newline at end of file
6b66fe7afb47bb8f9a4124c8083e2930
\ No newline at end of file
......@@ -97,7 +97,7 @@ Below is a very simple example of defining a base model.
Define Model Mutations
^^^^^^^^^^^^^^^^^^^^^^
A base model is only one concrete model not a model space. We provide :doc:`API and Primitives </NAS/MutationPrimitives>`
A base model is only one concrete model not a model space. We provide :doc:`API and Primitives </nas/construct_space>`
for users to express how the base model can be mutated. That is, to build a model space which includes many models.
Based on the above base model, we can define a model space as below.
......@@ -205,27 +205,28 @@ This results in the following code:
.. GENERATED FROM PYTHON SOURCE LINES 148-177
.. GENERATED FROM PYTHON SOURCE LINES 148-178
This example uses two mutation APIs, ``nn.LayerChoice`` and ``nn.ValueChoice``.
``nn.LayerChoice`` takes a list of candidate modules (two in this example), one will be chosen for each sampled model.
It can be used like a normal PyTorch module.
``nn.ValueChoice`` takes a list of candidate values, one will be chosen to take effect for each sampled model.
More detailed API description and usage can be found :doc:`here </NAS/construct_space>`.
More detailed API description and usage can be found :doc:`here </nas/construct_space>`.
.. note::
We are actively enriching the mutation APIs to facilitate easy construction of model spaces.
If the currently supported mutation APIs cannot express your model space,
please refer to :doc:`this doc </NAS/Mutators>` for customizing mutators.
please refer to :doc:`this doc </nas/mutator>` for customizing mutators.
Explore the Defined Model Space
-------------------------------
There are basically two exploration approaches: (1) search by evaluating each sampled model independently,
which is the search approach in multi-trial NAS and (2) one-shot weight-sharing based search, which is used in one-shot NAS.
We demonstrate the first approach in this tutorial. Users can refer to :doc:`here </NAS/OneshotTrainer>` for the second approach.
which is the search approach in :ref:`multi-trial NAS <multi-trial-nas>`
and (2) one-shot weight-sharing based search, which is used in one-shot NAS.
We demonstrate the first approach in this tutorial. Users can refer to :ref:`here <one-shot-nas>` for the second approach.
First, users need to pick a proper exploration strategy to explore the defined model space.
Second, users need to pick or customize a model evaluator to evaluate the performance of each explored model.
......@@ -233,11 +234,11 @@ Second, users need to pick or customize a model evaluator to evaluate the perfor
Pick an exploration strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Retiarii supports many :doc:`exploration strategies </NAS/ExplorationStrategies>`.
Retiarii supports many :doc:`exploration strategies </nas/exploration_strategy>`.
Simply choose (i.e., instantiate) an exploration strategy as below.
.. GENERATED FROM PYTHON SOURCE LINES 177-181
.. GENERATED FROM PYTHON SOURCE LINES 178-182
.. code-block:: default
......@@ -255,26 +256,31 @@ Simply choosing (i.e., instantiate) an exploration strategy as below.
.. code-block:: none
[2022-02-22 18:55:27] INFO (hyperopt.utils/MainThread) Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.
[2022-02-22 18:55:27] INFO (hyperopt.fmin/MainThread) Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.
[2022-02-28 14:01:11] INFO (hyperopt.utils/MainThread) Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.
[2022-02-28 14:01:11] INFO (hyperopt.fmin/MainThread) Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.
/home/yugzhan/miniconda3/envs/cu102/lib/python3.8/site-packages/ray/autoscaler/_private/cli_logger.py:57: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
warnings.warn(
.. GENERATED FROM PYTHON SOURCE LINES 182-190
.. GENERATED FROM PYTHON SOURCE LINES 183-195
Pick or customize a model evaluator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training and validating each generated model to obtain the model's performance. The performance is sent to the exploration strategy for the strategy to generate better models.
In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training
and validating each generated model to obtain the model's performance.
The performance is sent to the exploration strategy for the strategy to generate better models.
Retiarii has provided :doc:`built-in model evaluators </NAS/ModelEvaluators>`, but to start with, it is recommended to use ``FunctionalEvaluator``, that is, to wrap your own training and evaluation code with one single function. This function should receive one single model class and uses ``nni.report_final_result`` to report the final score of this model.
Retiarii has provided :doc:`built-in model evaluators </nas/evaluator>`, but to start with,
it is recommended to use ``FunctionalEvaluator``, that is, to wrap your own training and evaluation code with one single function.
This function should receive one single model class and use ``nni.report_final_result`` to report the final score of this model.
An example here creates a simple evaluator that runs on the MNIST dataset, trains for 2 epochs, and reports its validation accuracy.
.. GENERATED FROM PYTHON SOURCE LINES 190-258
.. GENERATED FROM PYTHON SOURCE LINES 195-263
.. code-block:: default
......@@ -353,11 +359,11 @@ An example here creates a simple evaluator that runs on MNIST dataset, trains fo
.. GENERATED FROM PYTHON SOURCE LINES 259-260
.. GENERATED FROM PYTHON SOURCE LINES 264-265
Create the evaluator
.. GENERATED FROM PYTHON SOURCE LINES 260-264
.. GENERATED FROM PYTHON SOURCE LINES 265-269
.. code-block:: default
......@@ -372,12 +378,12 @@ Create the evaluator
.. GENERATED FROM PYTHON SOURCE LINES 265-275
.. GENERATED FROM PYTHON SOURCE LINES 270-280
The ``train_epoch`` and ``test_epoch`` here can be any customized function, where users can write their own training recipe.
It is recommended that the ``evaluate_model`` here accepts no additional arguments other than ``model_cls``.
However, in the `advanced tutorial </NAS/ModelEvaluators>`, we will show how to use additional arguments in case you actually need those.
However, in the `advanced tutorial </nas/evaluator>`, we will show how to use additional arguments in case you actually need those.
In the future, we will support mutation on the arguments of evaluators, which is commonly called "Hyper-parameter tuning".
Launch an Experiment
......@@ -385,7 +391,7 @@ Launch an Experiment
After all the above are prepared, it is time to start an experiment to do the model search. An example is shown below.
.. GENERATED FROM PYTHON SOURCE LINES 276-282
.. GENERATED FROM PYTHON SOURCE LINES 281-287
.. code-block:: default
......@@ -402,11 +408,11 @@ After all the above are prepared, it is time to start an experiment to do the mo
.. GENERATED FROM PYTHON SOURCE LINES 283-284
.. GENERATED FROM PYTHON SOURCE LINES 288-289
The following configurations are useful to control how many trials to run at most / at the same time.
.. GENERATED FROM PYTHON SOURCE LINES 284-288
.. GENERATED FROM PYTHON SOURCE LINES 289-293
.. code-block:: default
......@@ -421,18 +427,18 @@ The following configurations are useful to control how many trials to run at mos
.. GENERATED FROM PYTHON SOURCE LINES 289-291
.. GENERATED FROM PYTHON SOURCE LINES 294-296
Remember to set the following config if you want to use a GPU.
``use_active_gpu`` should be set true if you wish to use an occupied GPU (possibly running a GUI).
.. GENERATED FROM PYTHON SOURCE LINES 291-295
.. GENERATED FROM PYTHON SOURCE LINES 296-300
.. code-block:: default
exp_config.trial_gpu_number = 1
exp_config.training_service.use_active_gpu = False
exp_config.training_service.use_active_gpu = True
......@@ -441,11 +447,11 @@ Remember to set the following config if you want to GPU.
.. GENERATED FROM PYTHON SOURCE LINES 296-297
.. GENERATED FROM PYTHON SOURCE LINES 301-302
Launch the experiment. The experiment should take several minutes to finish on a workstation with 2 GPUs.
.. GENERATED FROM PYTHON SOURCE LINES 297-300
.. GENERATED FROM PYTHON SOURCE LINES 302-305
.. code-block:: default
......@@ -462,34 +468,35 @@ Launch the experiment. The experiment should take several minutes to finish on a
.. code-block:: none
[2022-02-22 18:55:28] INFO (nni.experiment/MainThread) Creating experiment, Experiment ID: 68a4xl2o
[2022-02-22 18:55:28] INFO (nni.experiment/MainThread) Connecting IPC pipe...
[2022-02-22 18:55:28] INFO (nni.experiment/MainThread) Starting web server...
[2022-02-22 18:55:29] INFO (nni.experiment/MainThread) Setting up...
[2022-02-22 18:55:30] INFO (nni.runtime.msg_dispatcher_base/Thread-3) Dispatcher started
[2022-02-22 18:55:30] INFO (nni.retiarii.experiment.pytorch/MainThread) Web UI URLs: http://127.0.0.1:8081 http://10.190.172.35:8081 http://192.168.49.1:8081 http://172.17.0.1:8081
[2022-02-22 18:55:30] INFO (nni.retiarii.experiment.pytorch/MainThread) Start strategy...
[2022-02-22 18:55:30] INFO (root/MainThread) Successfully update searchSpace.
[2022-02-22 18:55:30] INFO (nni.retiarii.strategy.bruteforce/MainThread) Random search running in fixed size mode. Dedup: on.
[2022-02-22 18:57:50] INFO (nni.retiarii.experiment.pytorch/Thread-4) Stopping experiment, please wait...
[2022-02-22 18:57:50] INFO (nni.retiarii.experiment.pytorch/MainThread) Strategy exit
[2022-02-22 18:57:50] INFO (nni.retiarii.experiment.pytorch/MainThread) Waiting for experiment to become DONE (you can ctrl+c if there is no running trial jobs)...
[2022-02-22 18:57:51] INFO (nni.runtime.msg_dispatcher_base/Thread-3) Dispatcher exiting...
[2022-02-22 18:57:51] INFO (nni.retiarii.experiment.pytorch/Thread-4) Experiment stopped
[2022-02-28 14:01:13] INFO (nni.experiment/MainThread) Creating experiment, Experiment ID: dt84p16a
[2022-02-28 14:01:13] INFO (nni.experiment/MainThread) Connecting IPC pipe...
[2022-02-28 14:01:14] INFO (nni.experiment/MainThread) Starting web server...
[2022-02-28 14:01:15] INFO (nni.experiment/MainThread) Setting up...
[2022-02-28 14:01:15] INFO (nni.runtime.msg_dispatcher_base/Thread-3) Dispatcher started
[2022-02-28 14:01:15] INFO (nni.retiarii.experiment.pytorch/MainThread) Web UI URLs: http://127.0.0.1:8081 http://10.190.172.35:8081 http://192.168.49.1:8081 http://172.17.0.1:8081
[2022-02-28 14:01:15] INFO (nni.retiarii.experiment.pytorch/MainThread) Start strategy...
[2022-02-28 14:01:15] INFO (root/MainThread) Successfully update searchSpace.
[2022-02-28 14:01:15] INFO (nni.retiarii.strategy.bruteforce/MainThread) Random search running in fixed size mode. Dedup: on.
[2022-02-28 14:05:16] INFO (nni.retiarii.experiment.pytorch/Thread-4) Stopping experiment, please wait...
[2022-02-28 14:05:16] INFO (nni.retiarii.experiment.pytorch/MainThread) Strategy exit
[2022-02-28 14:05:16] INFO (nni.retiarii.experiment.pytorch/MainThread) Waiting for experiment to become DONE (you can ctrl+c if there is no running trial jobs)...
[2022-02-28 14:05:17] INFO (nni.runtime.msg_dispatcher_base/Thread-3) Dispatcher exiting...
[2022-02-28 14:05:17] INFO (nni.retiarii.experiment.pytorch/Thread-4) Experiment stopped
.. GENERATED FROM PYTHON SOURCE LINES 301-318
.. GENERATED FROM PYTHON SOURCE LINES 306-324
Users can also run Retiarii Experiment with :doc:`different training services <../training_services>` besides ``local`` training service.
Users can also run a Retiarii Experiment with :doc:`different training services </experiment/training_service>`
besides ``local`` training service.
Visualize the Experiment
------------------------
Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment.
For example, open ``localhost:8081`` in your browser; 8081 is the port that you set in ``exp.run``.
Please refer to :doc:`here <../Tutorial/WebUI>` for details.
Please refer to :doc:`here </experiment/webui>` for details.
We support visualizing models with 3rd-party visualization engines (like `Netron <https://netron.app/>`__).
This can be used by clicking ``Visualization`` in the detail panel for each trial.
......@@ -500,7 +507,7 @@ Built-in evaluators (e.g., Classification) will automatically export the model i
For your own evaluator, you need to save your file into ``$NNI_OUTPUT_DIR/model.onnx`` to make this work.
For instance,
.. GENERATED FROM PYTHON SOURCE LINES 318-332
.. GENERATED FROM PYTHON SOURCE LINES 324-338
.. code-block:: default
......@@ -525,7 +532,7 @@ For instance,
.. GENERATED FROM PYTHON SOURCE LINES 333-341
.. GENERATED FROM PYTHON SOURCE LINES 339-347
Relaunch the experiment, and a button is shown on WebUI.
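The ``$NNI_OUTPUT_DIR/model.onnx`` convention above can be honored from any custom evaluator. A minimal sketch of resolving that path (stdlib only; the actual export, e.g. via ``torch.onnx.export``, is omitted because it needs torch and a concrete model):

```python
import os
from pathlib import Path

def onnx_output_path():
    # NNI sets NNI_OUTPUT_DIR per trial; fall back to the current
    # directory when running outside an NNI experiment.
    out_dir = Path(os.environ.get("NNI_OUTPUT_DIR", "."))
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / "model.onnx"

path = onnx_output_path()
# In a real evaluator one would now call something like
# torch.onnx.export(model, dummy_input, path)  -- requires torch installed.
```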
......@@ -536,7 +543,7 @@ Export Top Models
Users can export top models after the exploration is done using ``export_top_models``.
.. GENERATED FROM PYTHON SOURCE LINES 341-353
.. GENERATED FROM PYTHON SOURCE LINES 347-359
.. code-block:: default
......@@ -562,7 +569,7 @@ Users can export top models after the exploration is done using ``export_top_mod
.. code-block:: none
{'model_1': '1', 'model_2': 0.5, 'model_3': 256}
{'model_1': '0', 'model_2': 0.25, 'model_3': 128}
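The exported result is a plain dict mapping each mutation's auto-generated label to its chosen candidate. A sketch of consuming the sample above (which label corresponds to which choice is our reading of this particular model space):

```python
# Sample export from the run above: layer choice index, dropout rate, feature dim.
exported = {'model_1': '0', 'model_2': 0.25, 'model_3': 128}

conv_choice = int(exported['model_1'])  # index into the LayerChoice candidates
dropout_rate = exported['model_2']      # chosen from [0.25, 0.5, 0.75]
feature_dim = exported['model_3']       # chosen from [64, 128, 256]
```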
......@@ -570,7 +577,7 @@ Users can export top models after the exploration is done using ``export_top_mod
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 2 minutes 24.722 seconds)
**Total running time of the script:** ( 4 minutes 6.818 seconds)
.. _sphx_glr_download_tutorials_hello_nas.py:
......
......@@ -22,7 +22,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\nThis tutorial assumes that you have already prepared your NAS benchmarks under cache directory\n(by default, ``~/.cache/nni/nasbenchmark``).\nIf you haven't, please follow the data preparation guide in :doc:`../NAS/Benchmarks`.\n\nAs a result, the directory should look like:\n\n"
"## Prerequisites\nThis tutorial assumes that you have already prepared your NAS benchmarks under cache directory\n(by default, ``~/.cache/nni/nasbenchmark``).\nIf you haven't, please follow the data preparation guide in :doc:`/nas/benchmarks`.\n\nAs a result, the directory should look like:\n\n"
]
},
{
......
......@@ -13,7 +13,7 @@ NNI has provided query tools so that users can easily retrieve the data
# -------------
# This tutorial assumes that you have already prepared your NAS benchmarks under cache directory
# (by default, ``~/.cache/nni/nasbenchmark``).
# If you haven't, please follow the data preparation guide in :doc:`../NAS/Benchmarks`.
# If you haven't, please follow the data preparation guide in :doc:`/nas/benchmarks`.
#
# As a result, the directory should look like:
......
651df59829f535210b4b58cc03027731
\ No newline at end of file
715de24d20c57f3639033f6f10376c21
\ No newline at end of file
......@@ -32,7 +32,7 @@ Prerequisites
-------------
This tutorial assumes that you have already prepared your NAS benchmarks under cache directory
(by default, ``~/.cache/nni/nasbenchmark``).
If you haven't, please follow the data preparation guide in :doc:`../NAS/Benchmarks`.
If you haven't, please follow the data preparation guide in :doc:`/nas/benchmarks`.
As a result, the directory should look like:
......@@ -116,7 +116,7 @@ Use the following architecture as an example:
.. code-block:: none
[2022-02-22 18:52:29] INFO (nni.nas.benchmarks.utils/MainThread) "/home/yugzhan/.cache/nni/nasbenchmark/nasbench101-209f5694.db" already exists. Checking hash.
[2022-02-28 13:48:51] INFO (nni.nas.benchmarks.utils/MainThread) "/home/yugzhan/.cache/nni/nasbenchmark/nasbench101-209f5694.db" already exists. Checking hash.
{'config': {'arch': {'input1': [0],
'input2': [1],
'input3': [2],
......@@ -260,7 +260,7 @@ Use the following architecture as an example:
.. code-block:: none
[2022-02-22 18:52:36] INFO (nni.nas.benchmarks.utils/MainThread) "/home/yugzhan/.cache/nni/nasbenchmark/nasbench201-b2b60732.db" already exists. Checking hash.
[2022-02-28 13:49:09] INFO (nni.nas.benchmarks.utils/MainThread) "/home/yugzhan/.cache/nni/nasbenchmark/nasbench201-b2b60732.db" already exists. Checking hash.
{'config': {'arch': {'0_1': 'avg_pool_3x3',
'0_2': 'conv_1x1',
'0_3': 'conv_1x1',
......@@ -436,7 +436,7 @@ Use none as a wildcard.
.. code-block:: none
[2022-02-22 18:52:47] INFO (nni.nas.benchmarks.utils/MainThread) "/home/yugzhan/.cache/nni/nasbenchmark/nds-5745c235.db" already exists. Checking hash.
[2022-02-28 13:49:36] INFO (nni.nas.benchmarks.utils/MainThread) "/home/yugzhan/.cache/nni/nasbenchmark/nds-5745c235.db" already exists. Checking hash.
{'best_test_acc': 90.48,
'best_train_acc': 96.356,
'best_train_loss': 0.116,
......@@ -803,7 +803,7 @@ Count number.
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 25.047 seconds)
**Total running time of the script:** ( 1 minutes 2.214 seconds)
.. _sphx_glr_download_tutorials_nasbench_as_dataset.py:
......
......@@ -5,10 +5,10 @@
Computation times
=================
**02:24.722** total execution time for **tutorials** files:
**04:06.818** total execution time for **tutorials** files:
+-------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_hello_nas.py` (``hello_nas.py``) | 02:24.722 | 0.0 MB |
| :ref:`sphx_glr_tutorials_hello_nas.py` (``hello_nas.py``) | 04:06.818 | 0.0 MB |
+-------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_nasbench_as_dataset.py` (``nasbench_as_dataset.py``) | 00:00.000 | 0.0 MB |
+-------------------------------------------------------------------------------+-----------+--------+
......
......@@ -47,3 +47,8 @@ nav.md-tabs .md-tabs__item:not(:last-child) .md-tabs__link:after {
.md-nav span.caption {
margin-top: 1.25em;
}
/* citation style */
.citation dt {
padding-right: 1em;
}
......@@ -66,7 +66,7 @@ class Net(nn.Module):
# Define Model Mutations
# ^^^^^^^^^^^^^^^^^^^^^^
#
# A base model is only one concrete model not a model space. We provide :doc:`API and Primitives </NAS/MutationPrimitives>`
# A base model is only one concrete model, not a model space. We provide :doc:`API and Primitives </nas/construct_space>`
# for users to express how the base model can be mutated. That is, to build a model space which includes many models.
#
# Based on the above base model, we can define a model space as below.
......@@ -150,20 +150,21 @@ model_space
# It can be used like normal PyTorch module.
# ``nn.ValueChoice`` takes a list of candidate values; one will be chosen to take effect for each sampled model.
#
# More detailed API description and usage can be found :doc:`here </NAS/construct_space>`.
# More detailed API description and usage can be found :doc:`here </nas/construct_space>`.
#
# .. note::
#
# We are actively enriching the mutation APIs, to facilitate easy construction of model space.
# If the currently supported mutation APIs cannot express your model space,
# please refer to :doc:`this doc </NAS/Mutators>` for customizing mutators.
# please refer to :doc:`this doc </nas/mutator>` for customizing mutators.
#
# Explore the Defined Model Space
# -------------------------------
#
# There are basically two exploration approaches: (1) search by evaluating each sampled model independently,
# which is the search approach in multi-trial NAS and (2) one-shot weight-sharing based search, which is used in one-shot NAS.
# We demonstrate the first approach in this tutorial. Users can refer to :doc:`here </NAS/OneshotTrainer>` for the second approach.
# which is the search approach in :ref:`multi-trial NAS <multi-trial-nas>`
# and (2) one-shot weight-sharing based search, which is used in one-shot NAS.
# We demonstrate the first approach in this tutorial. Users can refer to :ref:`here <one-shot-nas>` for the second approach.
#
# First, users need to pick a proper exploration strategy to explore the defined model space.
# Second, users need to pick or customize a model evaluator to evaluate the performance of each explored model.
......@@ -171,7 +172,7 @@ model_space
# Pick an exploration strategy
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# Retiarii supports many :doc:`exploration strategies </NAS/ExplorationStrategies>`.
# Retiarii supports many :doc:`exploration strategies </nas/exploration_strategy>`.
#
# Simply choose (i.e., instantiate) an exploration strategy as below.
......@@ -182,9 +183,13 @@ search_strategy = strategy.Random(dedup=True) # dedup=False if deduplication is
# Pick or customize a model evaluator
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training and validating each generated model to obtain the model's performance. The performance is sent to the exploration strategy for the strategy to generate better models.
# In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training
# and validating each generated model to obtain the model's performance.
# The performance is sent to the exploration strategy for the strategy to generate better models.
#
# Retiarii has provided :doc:`built-in model evaluators </NAS/ModelEvaluators>`, but to start with, it is recommended to use ``FunctionalEvaluator``, that is, to wrap your own training and evaluation code with one single function. This function should receive one single model class and uses ``nni.report_final_result`` to report the final score of this model.
# Retiarii has provided :doc:`built-in model evaluators </nas/evaluator>`, but to start with,
# it is recommended to use ``FunctionalEvaluator``, that is, to wrap your own training and evaluation code with one single function.
# This function should receive one single model class and use ``nni.report_final_result`` to report the final score of this model.
#
# An example here creates a simple evaluator that runs on MNIST dataset, trains for 2 epochs, and reports its validation accuracy.
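A minimal skeleton of such a function (the full MNIST example is elided in this diff; everything below other than ``model_cls`` and ``nni.report_final_result`` is placeholder):

```python
def evaluate_model(model_cls):
    # The evaluator receives a model *class*; instantiate it first.
    model = model_cls()
    accuracy = 0.0
    for epoch in range(2):
        # train_epoch(model, ...) and test_epoch(model, ...) stand in for
        # your own training/validation recipe; test_epoch would update
        # `accuracy` with the validation result.
        pass
    # nni.report_final_result(accuracy)  # report the final score to NNI
    return accuracy
```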
......@@ -266,7 +271,7 @@ evaluator = FunctionalEvaluator(evaluate_model)
# The ``train_epoch`` and ``test_epoch`` here can be any customized function, where users can write their own training recipe.
#
# It is recommended that the ``evaluate_model`` here accept no additional arguments other than ``model_cls``.
# However, in the `advanced tutorial </NAS/ModelEvaluators>`, we will show how to use additional arguments in case you actually need those.
# However, in the :doc:`advanced tutorial </nas/evaluator>`, we will show how to use additional arguments in case you actually need those.
# In the future, we will support mutation on the arguments of evaluators, which is commonly called "hyper-parameter tuning".
#
# Launch an Experiment
......@@ -290,7 +295,7 @@ exp_config.trial_concurrency = 2 # will run two trials concurrently
# ``use_active_gpu`` should be set true if you wish to use an occupied GPU (possibly running a GUI).
exp_config.trial_gpu_number = 1
exp_config.training_service.use_active_gpu = False
exp_config.training_service.use_active_gpu = True
# %%
# Launch the experiment. The experiment should take several minutes to finish on a workstation with 2 GPUs.
......@@ -298,14 +303,15 @@ exp_config.training_service.use_active_gpu = False
exp.run(exp_config, 8081)
# %%
# Users can also run Retiarii Experiment with :doc:`different training services <../training_services>` besides ``local`` training service.
# Users can also run Retiarii Experiment with :doc:`different training services </experiment/training_service>`
# besides ``local`` training service.
#
# Visualize the Experiment
# ------------------------
#
# Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment.
# For example, open ``localhost:8081`` in your browser; 8081 is the port that you set in ``exp.run``.
# Please refer to :doc:`here <../Tutorial/WebUI>` for details.
# Please refer to :doc:`here </experiment/webui>` for details.
#
# We support visualizing models with 3rd-party visualization engines (like `Netron <https://netron.app/>`__).
# This can be used by clicking ``Visualization`` in detail panel for each trial.
......
......@@ -13,7 +13,7 @@ NNI has provided query tools so that users can easily retrieve the data
# -------------
# This tutorial assumes that you have already prepared your NAS benchmarks under cache directory
# (by default, ``~/.cache/nni/nasbenchmark``).
# If you haven't, please follow the data preparation guide in :doc:`../NAS/Benchmarks`.
# If you haven't, please follow the data preparation guide in :doc:`/nas/benchmarks`.
#
# As a result, the directory should look like:
......
......@@ -695,15 +695,17 @@ class GraphConverter:
class GraphConverterWithShape(GraphConverter):
"""
Convert a pytorch model to nni ir along with input/output shape info.
Based ir acquired through `torch.jit.script`
and shape info acquired through `torch.jit.trace`.
Known issues
------------
1. `InputChoice` and `ValueChoice` not supported yet.
2. Currently random inputs are fed while tracing layerchoice.
If forward path of candidates depends on input data, then wrong path will be traced.
This will result in incomplete shape info.
The base IR is acquired through ``torch.jit.script``,
and shape info is acquired through ``torch.jit.trace``.
.. warning::
Known issues:
1. ``InputChoice`` and ``ValueChoice`` are not supported yet.
2. Currently random inputs are fed while tracing layerchoice.
If forward path of candidates depends on input data, then wrong path will be traced.
This will result in incomplete shape info.
"""
def convert_module(self, script_module, module, module_name, ir_model, dummy_input):
module.eval()
......
......@@ -41,7 +41,9 @@ class LightningModule(pl.LightningModule):
Trainer = nni.trace(pl.Trainer)
Trainer.__doc__ = 'Traced version of ``pytorch_lightning.Trainer``.'
DataLoader = nni.trace(torch_data.DataLoader)
DataLoader.__doc__ = 'Traced version of ``torch.utils.data.DataLoader``.'
@nni.trace
class Lightning(Evaluator):
......
......@@ -43,6 +43,9 @@ from ..strategy.utils import dry_run_for_formatted_search_space
_logger = logging.getLogger(__name__)
__all__ = ['RetiariiExeConfig', 'RetiariiExperiment']
@dataclass(init=False)
class RetiariiExeConfig(ConfigBase):
experiment_name: Optional[str] = None
......@@ -376,6 +379,8 @@ class RetiariiExperiment(Experiment):
For one-shot algorithms, only top-1 is supported. For others, ``optimize_mode`` and ``formatter`` are
available for customization.
Parameters
----------
top_k : int
How many models are intended to be exported.
optimize_mode : str
......
......@@ -32,8 +32,8 @@ class Mutator:
Mutates graphs in model to generate new model.
`Mutator` class will be used in two places:
1. Inherit `Mutator` to implement graph mutation logic.
2. Use `Mutator` subclass to implement NAS strategy.
1. Inherit `Mutator` to implement graph mutation logic.
2. Use `Mutator` subclass to implement NAS strategy.
In scenario 1, the subclass should implement the `Mutator.mutate()` interface with `Mutator.choice()`.
In scenario 2, strategy should use constructor or `Mutator.bind_sampler()` to initialize subclass,
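The two scenarios can be sketched as a skeleton (an illustration only; `mutate`, `choice`, and `bind_sampler` follow the names above, everything else is placeholder):

```python
class BlockMutator:  # stands in for a `Mutator` subclass (scenario 1)
    def __init__(self, candidates):
        self.candidates = candidates
        self._sampler = None

    def bind_sampler(self, sampler):
        # Scenario 2: a strategy attaches its own sampling function.
        self._sampler = sampler
        return self

    def choice(self, candidates):
        # In the real Mutator the bound sampler decides; default: first candidate.
        return self._sampler(candidates) if self._sampler else candidates[0]

    def mutate(self, model):
        # Graph mutation logic would go here, driven by self.choice().
        chosen = self.choice(self.candidates)
        return (model, chosen)
```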
......
......@@ -21,7 +21,9 @@ class LayerChoice(Mutable):
"""
Layer choice selects one of the ``candidates``, then applies it to inputs and returns results.
Layer choice does not allow itself to be nested.
It allows users to put several candidate operations (e.g., PyTorch modules), one of which is chosen in each explored model.
*New in v2.2:* Layer choice can be nested.
Parameters
----------
......@@ -42,6 +44,21 @@ class LayerChoice(Mutable):
Deprecated. A list of all candidate modules in the layer choice module.
``list(layer_choice)`` is recommended, which will serve the same purpose.
Examples
--------
::
# import nni.retiarii.nn.pytorch as nn
# declared in `__init__` method
self.layer = nn.LayerChoice([
ops.PoolBN('max', channels, 3, stride, 1),
ops.SepConv(channels, channels, 3, stride, 1),
nn.Identity()
])
# invoked in `forward` method
out = self.layer(x)
Notes
-----
``candidates`` can be a list of modules or an ordered dict of named modules, for example,
......@@ -150,6 +167,10 @@ class LayerChoice(Mutable):
return list(self)
def forward(self, x):
"""
The forward of layer choice is simply running the first candidate module.
It shouldn't be called directly by users in most cases.
"""
warnings.warn('You should not run forward of this module directly.')
return self._first_module(x)
......@@ -168,6 +189,10 @@ ReductionType = Literal['mean', 'concat', 'sum', 'none']
class InputChoice(Mutable):
"""
Input choice selects ``n_chosen`` inputs from ``choose_from`` (contains ``n_candidates`` keys).
It is mainly for choosing (or trying) different connections. It takes several tensors and chooses ``n_chosen`` tensors from them.
When specific inputs are chosen, ``InputChoice`` will become :class:`ChosenInputs`.
Use ``reduction`` to specify how chosen inputs are reduced into one output. A few options are:
* ``none``: do nothing and return the list directly.
......@@ -189,6 +214,16 @@ class InputChoice(Mutable):
Prior distribution used in random sampling.
label : str
Identifier of the input choice.
Examples
--------
::
# import nni.retiarii.nn.pytorch as nn
# declared in `__init__` method
self.input_switch = nn.InputChoice(n_chosen=1)
# invoked in `forward` method, choose one from the three
out = self.input_switch([tensor1, tensor2, tensor3])
"""
@classmethod
......@@ -230,6 +265,10 @@ class InputChoice(Mutable):
return self._label
def forward(self, candidate_inputs: List[torch.Tensor]) -> torch.Tensor:
"""
The forward of input choice simply returns the first item of ``candidate_inputs``.
It shouldn't be called directly by users in most cases.
"""
warnings.warn('You should not run forward of this module directly.')
return candidate_inputs[0]
......@@ -260,6 +299,9 @@ class ChosenInputs(nn.Module):
self.reduction = reduction
def forward(self, candidate_inputs):
"""
Compute the reduced input based on ``chosen`` and ``reduction``.
"""
return self._tensor_reduction(self.reduction, [candidate_inputs[i] for i in self.chosen])
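The reduction semantics can be illustrated with plain Python lists standing in for tensors (a sketch only; the real implementation operates on ``torch.Tensor``):

```python
def reduce_inputs(reduction, tensors):
    # Mirrors the documented options: 'none' returns the list as-is,
    # 'sum'/'mean' combine element-wise, 'concat' joins the inputs
    # (plain list concatenation stands in for torch.cat here).
    if reduction == 'none':
        return tensors
    if reduction == 'sum':
        return [sum(vals) for vals in zip(*tensors)]
    if reduction == 'mean':
        return [sum(vals) / len(tensors) for vals in zip(*tensors)]
    if reduction == 'concat':
        return [v for t in tensors for v in t]
    raise ValueError(f'unknown reduction: {reduction}')
```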
def _tensor_reduction(self, reduction_type, tensor_list):
......@@ -661,11 +703,13 @@ ValueChoiceOrAny = TypeVar('ValueChoiceOrAny', ValueChoiceX, Any)
class ValueChoice(ValueChoiceX, Mutable):
"""
ValueChoice is to choose one from ``candidates``.
ValueChoice chooses one value from ``candidates``. The most common use cases are:
In most use scenarios, ValueChoice should be passed to the init parameters of a serializable module. For example,
* Used as input arguments of :class:`~nni.retiarii.basic_unit`
(i.e., modules in ``nni.retiarii.nn.pytorch`` and user-defined modules decorated with ``@basic_unit``).
* Used as input arguments of evaluator (*new in v2.7*).
.. code-block:: python
It can be used in parameters of operators: ::
class Net(nn.Module):
def __init__(self):
......@@ -675,37 +719,83 @@ class ValueChoice(ValueChoiceX, Mutable):
def forward(self, x):
return self.conv(x)
In case, you want to search a parameter that is used repeatedly, this is also possible by sharing the same value choice instance.
(Sharing the label should have the same effect.) For example,
Or evaluator: ::
.. code-block:: python
def train_and_evaluate(model_cls, learning_rate):
...
class Net(nn.Module):
def __init__(self):
super().__init__()
hidden_dim = nn.ValueChoice([128, 512])
self.fc = nn.Sequential(
nn.Linear(64, hidden_dim),
nn.Linear(hidden_dim, 10)
)
# the following code has the same effect.
# self.fc = nn.Sequential(
# nn.Linear(64, nn.ValueChoice([128, 512], label='dim')),
# nn.Linear(nn.ValueChoice([128, 512], label='dim'), 10)
# )
self.evaluator = FunctionalEvaluator(train_and_evaluate, learning_rate=nn.ValueChoice([1e-3, 1e-2, 1e-1]))
def forward(self, x):
return self.fc(x)
Value choice supports arithmetic operators, which is particularly useful when searching for a network width multiplier: ::
Note that ValueChoice should be used directly. Transformations like ``nn.Linear(32, nn.ValueChoice([64, 128]) * 2)``
are not supported.
# init
scale = nn.ValueChoice([1.0, 1.5, 2.0])
self.conv1 = nn.Conv2d(3, round(scale * 16))
self.conv2 = nn.Conv2d(round(scale * 16), round(scale * 64))
self.conv3 = nn.Conv2d(round(scale * 64), round(scale * 256))
Another common use case is to initialize the values to choose from in init and call the module in forward to get the chosen value.
Usually, this is used to pass a mutable value to a functional API like ``torch.xxx`` or ``nn.functional.xxx```.
For example,
# forward
return self.conv3(self.conv2(self.conv1(x)))
.. code-block:: python
Or when kernel size and padding are coupled so as to keep the output size constant: ::
# init
ks = nn.ValueChoice([3, 5, 7])
self.conv = nn.Conv2d(3, 16, kernel_size=ks, padding=(ks - 1) // 2)
# forward
return self.conv(x)
Or when several layers are concatenated for a final layer. ::
# init
self.linear1 = nn.Linear(3, nn.ValueChoice([1, 2, 3], label='a'))
self.linear2 = nn.Linear(3, nn.ValueChoice([4, 5, 6], label='b'))
self.final = nn.Linear(nn.ValueChoice([1, 2, 3], label='a') + nn.ValueChoice([4, 5, 6], label='b'), 2)
# forward
return self.final(torch.cat([self.linear1(x), self.linear2(x)], 1))
Some advanced operators are also provided, such as :meth:`ValueChoice.max` and :meth:`ValueChoice.cond`.
.. tip::
All the APIs have an optional argument called ``label``;
mutations with the same label will share the same choice. A typical example: ::
self.net = nn.Sequential(
nn.Linear(10, nn.ValueChoice([32, 64, 128], label='hidden_dim')),
nn.Linear(nn.ValueChoice([32, 64, 128], label='hidden_dim'), 3)
)
Sharing the same value choice instance has a similar effect. ::
class Net(nn.Module):
def __init__(self):
super().__init__()
hidden_dim = nn.ValueChoice([128, 512])
self.fc = nn.Sequential(
nn.Linear(64, hidden_dim),
nn.Linear(hidden_dim, 10)
)
.. warning::
It may look as if a specific candidate has been chosen (e.g., when ``ValueChoice`` is put
as a parameter of ``nn.Conv2d``), but this is in fact syntactic sugar, because the basic units
and evaluators do all the underlying work. That means, you cannot assume that ``ValueChoice``
can be used in the same way as its candidates. For example, the following usage will NOT work: ::
self.blocks = []
for i in range(nn.ValueChoice([1, 2, 3])):
self.blocks.append(Block())
# NOTE: instead you should probably write
# self.blocks = nn.Repeat(Block(), (1, 3))
Another use case is to initialize the values to choose from in init and call the module in forward to get the chosen value.
Usually, this is used to pass a mutable value to a functional API like ``torch.xxx`` or ``nn.functional.xxx``.
For example, ::
class Net(nn.Module):
def __init__(self):
......@@ -747,6 +837,10 @@ class ValueChoice(ValueChoiceX, Mutable):
return self._label
def forward(self):
"""
The forward of value choice simply returns the first value of ``candidates``.
It shouldn't be called directly by users in most cases.
"""
warnings.warn('You should not run forward of this module directly.')
return self.candidates[0]
......@@ -785,4 +879,8 @@ class Placeholder(nn.Module):
super().__init__()
def forward(self, x):
"""
The forward of placeholder is not meaningful; it returns the input directly.
"""
return x