"Use the following architecture as an example:\n",
"\n",
""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"arch = {\n",
" 'op1': 'conv3x3-bn-relu',\n",
" 'op2': 'maxpool3x3',\n",
" 'op3': 'conv3x3-bn-relu',\n",
" 'op4': 'conv3x3-bn-relu',\n",
" 'op5': 'conv1x1-bn-relu',\n",
" 'input1': [0],\n",
" 'input2': [1],\n",
" 'input3': [2],\n",
" 'input4': [0],\n",
" 'input5': [0, 3, 4],\n",
" 'input6': [2, 5]\n",
"}\n",
"for t in query_nb101_trial_stats(arch, 108, include_intermediates=True):\n",
" pprint.pprint(t)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An architecture of NAS-Bench-101 could be trained more than once. Each element of the returned generator is a dict which contains one of the training results of this trial config (architecture + hyper-parameters) including train/valid/test accuracy, training time, number of epochs, etc. The results of NAS-Bench-201 and NDS follow similar formats."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## NAS-Bench-201"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the following architecture as an example:\n",
"\n",
""
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"arch = {\n",
" '0_1': 'avg_pool_3x3',\n",
" '0_2': 'conv_1x1',\n",
" '1_2': 'skip_connect',\n",
" '0_3': 'conv_1x1',\n",
" '1_3': 'skip_connect',\n",
" '2_3': 'skip_connect'\n",
"}\n",
"for t in query_nb201_trial_stats(arch, 200, 'cifar100'):\n",
" pprint.pprint(t)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Intermediate results are also available."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"for t in query_nb201_trial_stats(arch, None, 'imagenet16-120', include_intermediates=True):\n",
"Use the following architecture as an example:<br>\n",
"\n",
"\n",
"Here, `bot_muls`, `ds`, `num_gs`, `ss` and `ws` stand for \"bottleneck multipliers\", \"depths\", \"number of groups\", \"strides\" and \"widths\" respectively."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"model_spec = {\n",
" 'bot_muls': [0.0, 0.25, 0.25, 0.25],\n",
" 'ds': [1, 16, 1, 4],\n",
" 'num_gs': [1, 2, 1, 2],\n",
" 'ss': [1, 1, 2, 2],\n",
" 'ws': [16, 64, 128, 16]\n",
"}\n",
"# Use none as a wildcard\n",
"for t in query_nds_trial_stats('residual_bottleneck', None, None, model_spec, None, 'cifar10'):\n",
" pprint.pprint(t)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"model_spec = {\n",
" 'bot_muls': [0.0, 0.25, 0.25, 0.25],\n",
" 'ds': [1, 16, 1, 4],\n",
" 'num_gs': [1, 2, 1, 2],\n",
" 'ss': [1, 1, 2, 2],\n",
" 'ws': [16, 64, 128, 16]\n",
"}\n",
"for t in query_nds_trial_stats('residual_bottleneck', None, None, model_spec, None, 'cifar10', include_intermediates=True):\n",
The paper `DARTS: Differentiable Architecture Search <https://arxiv.org/abs/1806.09055>`__ addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Their method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent.
The authors' code optimizes the network weights and the architecture weights alternately in mini-batches. They further explore the possibility of using second-order optimization (unrolling) instead of first-order optimization to improve performance.
The implementation on NNI is based on the `official implementation <https://github.com/quark0/darts>`__ and a `popular 3rd-party repo <https://github.com/khanrc/pt.darts>`__. DARTS on NNI is designed to be general for arbitrary search spaces. A CNN search space tailored for CIFAR10, the same as in the original paper, is implemented as a use case of DARTS.
Reproduction Results
--------------------
The above-mentioned example is meant to reproduce the results in the paper; we run experiments with both first-order and second-order optimization. Due to the time limit, we retrain *only the best architecture* derived from the search phase and we repeat the experiment *only once*. Our results are currently on par with the results reported in the paper. We will add more results later when they are ready.
The paper `Efficient Neural Architecture Search via Parameter Sharing <https://arxiv.org/abs/1802.03268>`__ uses parameter sharing between child models to accelerate the NAS process. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile, the model corresponding to the selected subgraph is trained to minimize a canonical cross-entropy loss.
The implementation on NNI is based on the `official implementation in Tensorflow <https://github.com/melodyguan/enas>`__\ , including a general-purpose reinforcement learning controller and a trainer that trains the target network and this controller alternately. Following the paper, we have also implemented the macro and micro search spaces on CIFAR10 to demonstrate how to use these trainers. Since the code to train from scratch on NNI is not ready yet, reproduction results are currently unavailable.
.. note:: This one-shot NAS is still implemented under NNI NAS 1.0, and will `be migrated to Retiarii framework in v2.4 <https://github.com/microsoft/nni/issues/3814>`__.
For the mobile application of facial landmark detection, we have applied FBNet (block-wise DNAS) on top of the basic architecture of the PFLD model to design a concise model with a good trade-off between latency and accuracy. References are listed below:
* `PFLD: A Practical Facial Landmark Detector <https://arxiv.org/abs/1902.10859>`__
FBNet is a block-wise differentiable NAS method (block-wise DNAS), where the best candidate building blocks can be chosen by using Gumbel-Softmax random sampling and differentiable training. At each layer (or stage) to be searched, the diverse candidate blocks are placed side by side (similar in spirit to structural re-parameterization), which leads to sufficient pre-training of the supernet. The pre-trained supernet is then sampled to obtain a subnet, which is fine-tuned to achieve better performance.
.. image:: ../../img/fbnet.png
:target: ../../img/fbnet.png
:alt:
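The block-wise mixing described above can be illustrated with a rough sketch. The ``MixedBlock`` class below is only an illustration of Gumbel-Softmax mixing over candidate blocks; it is not the code used in this example:

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixedBlock(nn.Module):
        """A sketch of one searchable layer in block-wise DNAS (illustrative only).

        Candidate blocks are placed side by side; their outputs are mixed with
        Gumbel-Softmax weights over the architecture parameters, so the block
        choice stays differentiable while training the supernet.
        """
        def __init__(self, candidate_blocks):
            super().__init__()
            self.candidates = nn.ModuleList(candidate_blocks)
            # one architecture parameter (logit) per candidate block
            self.alpha = nn.Parameter(torch.zeros(len(candidate_blocks)))

        def forward(self, x, temperature=5.0):
            # sample soft (differentiable) one-hot weights with Gumbel-Softmax
            weights = F.gumbel_softmax(self.alpha, tau=temperature)
            return sum(w * block(x) for w, block in zip(weights, self.candidates))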
PFLD is a lightweight facial landmark model for real-time applications. The architecture of PFLD is first simplified for acceleration by using the stem block of PeleeNet, average pooling with depthwise convolution, and the eSE module.
To achieve a better trade-off between latency and accuracy, FBNet is then applied to the simplified PFLD to search for the best block at each specific layer. The search space is based on the FBNet space and optimized for mobile deployment by using average pooling with depthwise convolution, the eSE module, etc.
Experiments
------------
To verify the effectiveness of FBNet applied to PFLD, we choose an open-source dataset with 106 landmark points as the benchmark:
* `Grand Challenge of 106-Point Facial Landmark Localization <https://arxiv.org/abs/1905.03469>`__
The baseline model is denoted as MobileNet-V3 PFLD (`Reference baseline <https://github.com/Hsintao/pfld_106_face_landmarks>`__), and the searched model is denoted as Subnet. The experimental results are listed below, where the latency is tested on a Qualcomm 625 CPU (ARMv8):
Please run the following scripts in the example directory.
The Python dependencies used here are listed below:
.. code-block:: bash
numpy==1.18.5
opencv-python==4.5.1.48
torch==1.6.0
torchvision==0.7.0
onnx==1.8.1
onnx-simplifier==0.3.5
onnxruntime==1.7.0
Data Preparation
-----------------
First, download the `106points dataset <https://drive.google.com/file/d/1I7QdnLxAlyG2Tq3L66QYzGhiBEoVfzKo/view?usp=sharing>`__ to the path ``./data/106points``. The dataset includes the train set and the test set:
.. code-block:: bash
./data/106points/train_data/imgs
./data/106points/train_data/list.txt
./data/106points/test_data/imgs
./data/106points/test_data/list.txt
Quick Start
-----------
1. Search
^^^^^^^^^^
Based on the architecture of the simplified PFLD, the multi-stage search space and the hyper-parameters for searching should first be configured to construct the supernet. For example:
.. code-block:: python
from lib.builder import search_space
from lib.ops import PRIMITIVES
from lib.supernet import PFLDInference, AuxiliaryNet
from nni.algorithms.nas.pytorch.fbnet import LookUpTable, NASConfig
# configuration of hyper-parameters
# search_space defines the multi-stage search space
After creating the supernet with the specified search space and hyper-parameters, we can run the command below to start searching and training the supernet:
The ONNX model is saved as ``./output/subnet.onnx``, which can be further converted to a mobile inference engine by using `MNN <https://github.com/alibaba/MNN>`__.
The checkpoints of the pre-trained supernet and subnet are provided below:
A hypermodule is a (PyTorch) module which contains many architecture/hyper-parameter candidates for the module. By using hypermodules in a user-defined model, NNI will help users automatically find the best architecture/hyper-parameters of the hypermodules for this model. This follows the design philosophy of Retiarii that users write a DNN model as a space.
Several hypermodules have been proposed in the NAS community, such as AutoActivation and AutoDropout. Some of them are implemented in the Retiarii framework.
.. TODO: this file will be merged with API reference in future.
To make it easy for users to express a model space within their PyTorch/TensorFlow model, NNI provides the inline mutation APIs shown below.
We show the most common use case here. For advanced usages, please see `reference <./ApiReference.rst>`__.
.. note:: We are actively adding more mutation primitives. If you have any suggestions, feel free to `ask here <https://github.com/microsoft/nni/issues>`__.
``nn.LayerChoice``
""""""""""""""""""
API reference: :class:`nni.retiarii.nn.pytorch.LayerChoice`
It allows users to put several candidate operations (e.g., PyTorch modules), from which one is chosen in each explored model.
.. code-block:: python
# import nni.retiarii.nn.pytorch as nn
# declared in `__init__` method
self.layer = nn.LayerChoice([
ops.PoolBN('max', channels, 3, stride, 1),
ops.SepConv(channels, channels, 3, stride, 1),
nn.Identity()
])
# invoked in `forward` method
out = self.layer(x)
``nn.InputChoice``
""""""""""""""""""
API reference: :class:`nni.retiarii.nn.pytorch.InputChoice`
It is mainly for choosing (or trying) different connections. It takes several tensors and chooses ``n_chosen`` tensors from them.
.. code-block:: python
# import nni.retiarii.nn.pytorch as nn
# declared in `__init__` method
self.input_switch = nn.InputChoice(n_chosen=1)
# invoked in `forward` method, choose one from the three
out = self.input_switch([tensor1, tensor2, tensor3])
``nn.ValueChoice``
""""""""""""""""""
API reference: :class:`nni.retiarii.nn.pytorch.ValueChoice`
It is for choosing one value from some candidate values. The most common use cases are:
* Used as input arguments of :class:`nni.retiarii.basic_unit` (i.e., modules in ``nni.retiarii.nn.pytorch`` and user-defined modules decorated with ``@basic_unit``).
* Used as input arguments of evaluator (*new in v2.7*).
Some advanced operators are also provided, such as ``nn.ValueChoice.max`` and ``nn.ValueChoice.cond``. See reference of :class:`nni.retiarii.nn.pytorch.ValueChoice` for more details.
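For instance, a minimal sketch of the first use case, choosing the number of output channels of a convolution (the candidate values are arbitrary):

.. code-block:: python

    # import nni.retiarii.nn.pytorch as nn
    # declared in `__init__` method
    # the number of output channels of this convolution is chosen from the candidates
    self.conv = nn.Conv2d(3, nn.ValueChoice([16, 32, 64]), kernel_size=3)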
.. tip::
All the APIs have an optional argument called ``label``; mutations with the same label will share the same choice. A typical example is:
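The snippet below is a minimal sketch of label sharing (assuming ``nni.retiarii.nn.pytorch`` is imported as ``nn``; the layer sizes are arbitrary):

.. code-block:: python

    # the two ValueChoices share the label 'hidden_dim',
    # so they always take the same value in any sampled model
    self.fc1 = nn.Linear(32, nn.ValueChoice([64, 128, 256], label='hidden_dim'))
    self.fc2 = nn.Linear(nn.ValueChoice([64, 128, 256], label='hidden_dim'), 10)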
It looks as if a specific candidate has been chosen (e.g., the way you can put a ``ValueChoice`` directly as a parameter of a basic unit such as ``nn.Conv2d``), but in fact it is only syntactic sugar, because the basic units and evaluators do all the underlying work. That means you cannot assume that a ``ValueChoice`` can be used in the same way as its candidates. For example, the following usage will NOT work:
.. code-block:: python
self.blocks = []
for i in range(nn.ValueChoice([1, 2, 3])):
self.blocks.append(Block())
# NOTE: instead you should probably write
# self.blocks = nn.Repeat(Block(), (1, 3))
``nn.Repeat``
"""""""""""""
API reference: :class:`nni.retiarii.nn.pytorch.Repeat`
Repeat a block a variable number of times.
.. code-block:: python
# import nni.retiarii.nn.pytorch as nn
# used in `__init__` method
# Block() will be deep copied and repeated 3 times
self.blocks = nn.Repeat(Block(), 3)
# Block() will be repeated 1, 2, or 3 times
self.blocks = nn.Repeat(Block(), (1, 3))
# Can be used together with layer choice.
# With deep copy, the 3 layers will have the same label, thus share the choice.
self.blocks = nn.Repeat(nn.LayerChoice([Block(), nn.Identity()]), (1, 3))  # candidate list is illustrative
``nn.Cell``
"""""""""""
API reference: :class:`nni.retiarii.nn.pytorch.Cell`
This cell structure is popularly used in the `NAS literature <https://arxiv.org/abs/1611.01578>`__. At a high level, the literature often uses the following glossary.
.. list-table::
:widths: 25 75
* - Cell
- A cell consists of several nodes.
* - Node
- A node is the **sum** of several operators.
* - Operator
- Each operator is independently chosen from a list of user-specified candidate operators.
* - Operator's input
- Each operator has one input, chosen from previous nodes as well as predecessors.
* - Predecessors
- Input of cell. A cell can have multiple predecessors. Predecessors are sent to *preprocessor* for preprocessing.
* - Cell's output
- Output of the cell. Usually a concatenation of several nodes (possibly all nodes) in the cell. The cell's output, along with the predecessors, is sent to the *postprocessor* for postprocessing.
* - Preprocessor
- Extra preprocessing applied to the predecessors. Usually used for shape alignment (e.g., when predecessors have different shapes). By default, it does nothing.
* - Postprocessor
- Extra postprocessing applied to the cell's output. Usually used to chain cells with multiple predecessors
(e.g., when the next cell wants to have the outputs of both this cell and the previous cell as its input). By default, it directly uses this cell's output.
Example usages:
.. code-block:: python
# import nni.retiarii.nn.pytorch as nn
# used in `__init__` method
# Choose between conv2d and maxpool2d.
# The cell has 4 nodes, 1 op per node, and 2 predecessors.
# (The candidate ops and argument values below are illustrative.)
self.cell = nn.Cell([nn.Conv2d(32, 32, 3, padding=1), nn.MaxPool2d(3, stride=1, padding=1)],
                    num_nodes=4, num_ops_per_node=1, num_predecessors=2)
.. attention:: NNI's latest NAS support is based on the Retiarii framework; users who are still on the `early version using NNI NAS v1.0 <https://nni.readthedocs.io/en/v2.2/nas.html>`__ should migrate their work to Retiarii as soon as possible.
.. contents::
Motivation
----------
Automatic neural architecture search is playing an increasingly important role in finding better models. Recent research has proven the feasibility of automatic NAS and has led to models that beat many manually designed and tuned models. Representative works include `NASNet <https://arxiv.org/abs/1707.07012>`__\ , `ENAS <https://arxiv.org/abs/1802.03268>`__\ , `DARTS <https://arxiv.org/abs/1806.09055>`__\ , `Network Morphism <https://arxiv.org/abs/1806.10282>`__\ , and `Evolution <https://arxiv.org/abs/1703.01041>`__. In addition, new innovations continue to emerge.
However, it is pretty hard to use existing NAS work to help develop common DNN models. Therefore, we designed `Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__, a novel NAS/HPO framework, and implemented it in NNI. It helps users easily construct a model space (or search space, tuning space), and utilize existing NAS algorithms. The framework also facilitates NAS innovation and is used to design new NAS algorithms.
Overview
--------
There are three key characteristics of the Retiarii framework:
* Simple APIs are provided for defining model search space within PyTorch/TensorFlow model.
* SOTA NAS algorithms are built-in to be used for exploring model search space.
* System-level optimizations are implemented for speeding up the exploration.
There are two types of model space exploration approaches: **Multi-trial NAS** and **One-shot NAS**. Multi-trial NAS trains each sampled model in the model space independently, while one-shot NAS samples models from a super model. After constructing the model space, users can use either exploration approach to explore the model space.
Multi-trial NAS
---------------
Multi-trial NAS means each sampled model from model space is trained independently. A typical multi-trial NAS is `NASNet <https://arxiv.org/abs/1707.07012>`__. The algorithm to sample models from model space is called exploration strategy. NNI has supported the following exploration strategies for multi-trial NAS.
.. list-table::
:header-rows: 1
:widths: auto
* - Exploration Strategy Name
- Brief Introduction of Algorithm
* - Random Strategy
- Randomly sampling new model(s) from user defined model space. (``nni.retiarii.strategy.Random``)
* - Grid Search
- Sampling new model(s) from user defined model space using grid search algorithm. (``nni.retiarii.strategy.GridSearch``)
* - Regularized Evolution
- Generating new model(s) from generated models using `regularized evolution algorithm <https://arxiv.org/abs/1802.01548>`__ . (``nni.retiarii.strategy.RegularizedEvolution``)
* - TPE Strategy
- Sampling new model(s) from user defined model space using `TPE algorithm <https://papers.nips.cc/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf>`__ . (``nni.retiarii.strategy.TPEStrategy``)
* - RL Strategy
- It uses `PPO algorithm <https://arxiv.org/abs/1707.06347>`__ to sample new model(s) from user defined model space. (``nni.retiarii.strategy.PolicyBasedRL``)
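For instance, an exploration strategy can be instantiated in a single line and later passed to the experiment. A minimal sketch using the classes listed above:

.. code-block:: python

    import nni.retiarii.strategy as strategy

    # pick one exploration strategy, e.g., random search
    search_strategy = strategy.Random()
    # alternatives listed above:
    # search_strategy = strategy.GridSearch()
    # search_strategy = strategy.RegularizedEvolution()
    # search_strategy = strategy.TPEStrategy()
    # search_strategy = strategy.PolicyBasedRL()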
Please refer to `here <./multi_trial_nas.rst>`__ for detailed usage of multi-trial NAS.
One-shot NAS
------------
One-shot NAS means building the model space into a super-model, training the super-model with weight sharing, and then sampling models from the super-model to find the best one. `DARTS <https://arxiv.org/abs/1806.09055>`__ is a typical one-shot NAS algorithm.
Below are the supported one-shot NAS algorithms. More one-shot NAS algorithms will be supported soon.
.. list-table::
:header-rows: 1
:widths: auto
* - One-shot Algorithm Name
- Brief Introduction of Algorithm
* - `ENAS <ENAS.rst>`__
- `Efficient Neural Architecture Search via Parameter Sharing <https://arxiv.org/abs/1802.03268>`__. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. It uses parameter sharing between child models to achieve fast speed and excellent performance.
* - `DARTS <DARTS.rst>`__
- `DARTS: Differentiable Architecture Search <https://arxiv.org/abs/1806.09055>`__ introduces a novel algorithm for differentiable network architecture search on bilevel optimization.
* - `SPOS <SPOS.rst>`__
- `Single Path One-Shot Neural Architecture Search with Uniform Sampling <https://arxiv.org/abs/1904.00420>`__ constructs a simplified supernet trained with a uniform path sampling method and applies an evolutionary algorithm to efficiently search for the best-performing architectures.
* - `ProxylessNAS <Proxylessnas.rst>`__
- `ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware <https://arxiv.org/abs/1812.00332>`__. It removes the proxy and directly learns the architectures for large-scale target tasks and target hardware platforms.
Please refer to `here <one_shot_nas.rst>`__ for detailed usage of one-shot NAS algorithms.
Reference and Feedback
----------------------
* `Quick Start <./QuickStart.rst>`__;
* `Construct Your Model Space <./construct_space.rst>`__;
* `Retiarii: A Deep Learning Exploratory-Training Framework <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__;
* To `report a bug <https://github.com/microsoft/nni/issues/new?template=bug-report.rst>`__ for this feature in GitHub;
* To `file a feature or improvement request <https://github.com/microsoft/nni/issues/new?template=enhancement.rst>`__ for this feature in GitHub.
The paper `ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware <https://arxiv.org/pdf/1812.00332.pdf>`__ removes the proxy and directly learns the architectures for large-scale target tasks and target hardware platforms. The authors address the high memory consumption issue of differentiable NAS and reduce the computational cost to the same level as regular training, while still allowing a large candidate set. Please refer to the paper for details.
Usage
-----
To use the ProxylessNAS training/searching approach, users need to specify the search space in their model using the `NNI NAS interface <./MutationPrimitives.rst>`__\ , e.g., ``LayerChoice``\ , ``InputChoice``. After defining and instantiating the model, the remaining work can be left to ProxylessNasTrainer by instantiating the trainer and passing the model to it.
The complete example code can be found :githublink:`here <examples/nas/oneshot/proxylessnas>`.
**Input arguments of ProxylessNasTrainer**
* **model** (*PyTorch model, required*\ ) - The model that users want to tune/search. It has mutables to specify search space.
* **metrics** (*PyTorch module, required*\ ) - The main term of the loss function for model training. It receives logits and ground-truth labels and returns a loss tensor.
* **optimizer** (*PyTorch Optimizer, required*\) - The optimizer used for optimizing the model.
* **num_epochs** (*int, optional, default = 120*\ ) - The number of epochs to train/search.
* **dataset** (*PyTorch dataset, required*\ ) - Dataset for training. Will be split for training weights and architecture weights.
* **warmup_epochs** (*int, optional, default = 0*\ ) - The number of warmup epochs.
* **workers** (*int, optional, default = 4*\ ) - Workers for data loading.
* **device** (*device, optional, default = 'cpu'*\ ) - The device(s) used for training/searching. The trainer applies data parallelism to the model for users.
* **arc_learning_rate** (*float, optional, default = 1e-3*\ ) - The learning rate of the architecture parameters optimizer.
* **grad_reg_loss_type** (*'mul#log', 'add#linear', or None, optional, default = 'add#linear'*\ ) - Regularization type for adding a hardware-related loss term. The trainer will not apply loss regularization when ``grad_reg_loss_type`` is set to ``None``.
* **grad_reg_loss_params** (*dict, optional, default = None*\ ) - Regularization params. 'alpha' and 'beta' are required when ``grad_reg_loss_type`` is 'mul#log'; 'lambda' is required when ``grad_reg_loss_type`` is 'add#linear'.
* **applied_hardware** (*string, optional, default = None*\ ) - The target hardware used to constrain the model's latency. Latency is predicted by `Microsoft nn-Meter <https://github.com/microsoft/nn-Meter>`__.
* **dummy_input** (*tuple, optional, default = (1, 3, 224, 224)*\ ) - The dummy input shape when applied to the target hardware.
* **ref_latency** (*float, optional, default = 65.0*\ ) - Reference latency value in the applied hardware (ms).
Implementation
--------------
The implementation on NNI is based on the `official implementation <https://github.com/mit-han-lab/ProxylessNAS>`__. The official implementation supports two training approaches: gradient descent and RL-based. Our current implementation on NNI supports the gradient descent training approach. Complete support of ProxylessNAS is ongoing.
The official implementation supports different target hardware, including 'mobile', 'cpu', 'gpu8', and 'flops'. In the NNI repo, hardware latency prediction is supported by `Microsoft nn-Meter <https://github.com/microsoft/nn-Meter>`__. nn-Meter is an accurate inference latency predictor for DNN models on diverse edge devices. nn-Meter currently supports four hardware platforms: *'cortexA76cpu_tflite21'*, *'adreno640gpu_tflite21'*, *'adreno630gpu_tflite21'*, and *'myriadvpu_openvino2019r2'*. Users can find more information about nn-Meter on its website. More hardware will be supported in the future. Users can find more details about applying ``nn-Meter`` `here <./HardwareAwareNAS.rst>`__.
Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. For users to flexibly define their own search space and use built-in ProxylessNAS training approach, we put the specified search space in :githublink:`example code <examples/nas/oneshot/proxylessnas>` using :githublink:`NNI NAS interface <nni/retiarii/oneshot/pytorch/proxyless>`.
.. image:: ../../img/proxylessnas.png
:target: ../../img/proxylessnas.png
:alt:
The ProxylessNAS training approach is composed of ProxylessLayerChoice and ProxylessNasTrainer. ProxylessLayerChoice instantiates a MixedOp for each mutable (i.e., LayerChoice) and manages the architecture weights in the MixedOp. **For DataParallel**\ , architecture weights should be included in the user model. Specifically, in the ProxylessNAS implementation, we add the MixedOp to the corresponding mutable (i.e., LayerChoice) as a member variable. The ProxylessLayerChoice class also exposes two member functions, ``resample`` and ``finalize_grad``, for the trainer to control the training of architecture weights.
ProxylessNasMutator also implements the forward logic of the mutables (i.e., LayerChoice).
Reproduce Results
-----------------
To reproduce the result, we first run the search. We found that although it runs many epochs, the chosen architecture converges within the first several epochs. This is probably caused by the hyper-parameters or the implementation; we are working on it.
In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training and validating each generated model to obtain the model's performance. The performance is sent to the exploration strategy for the strategy to generate better models.
Retiarii provides `built-in model evaluators <./ModelEvaluators.rst>`__, but to start with, it is recommended to use ``FunctionalEvaluator``, that is, to wrap your own training and evaluation code in a single function. This function should receive a single model class and use ``nni.report_final_result`` to report the final score of this model.
An example here creates a simple evaluator that runs on MNIST dataset, trains for 2 epochs, and reports its validation accuracy.
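A minimal sketch of such an evaluator is shown below (``train_epoch`` and ``test_epoch`` are user-defined helpers; the dataset path and hyper-parameters are illustrative):

.. code-block:: python

    import nni
    import torch
    from torch.utils.data import DataLoader
    from torchvision import transforms
    from torchvision.datasets import MNIST
    from nni.retiarii.evaluator import FunctionalEvaluator

    def evaluate_model(model_cls):
        # "model_cls" is a class, so it needs to be instantiated first
        model = model_cls()
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        model.to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        transform = transforms.ToTensor()
        train_loader = DataLoader(MNIST('data/mnist', download=True, transform=transform),
                                  batch_size=64, shuffle=True)
        test_loader = DataLoader(MNIST('data/mnist', train=False, download=True, transform=transform),
                                 batch_size=64)
        for epoch in range(2):
            # train_epoch / test_epoch are customized functions (see below)
            train_epoch(model, device, train_loader, optimizer, epoch)
            accuracy = test_epoch(model, device, test_loader)
        # report the final validation accuracy of this model
        nni.report_final_result(accuracy)

    # wrap the function into an evaluator
    evaluator = FunctionalEvaluator(evaluate_model)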
The ``train_epoch`` and ``test_epoch`` here can be any customized function, where users can write their own training recipe. See :githublink:`examples/nas/multi-trial/mnist/search.py` for the full example.
It is recommended that ``evaluate_model`` accepts no additional arguments other than ``model_cls``. However, in the `advanced tutorial <./ModelEvaluators.rst>`__, we will show how to use additional arguments in case you actually need them. In the future, we will support mutation on the arguments of evaluators, which is commonly called "hyper-parameter tuning".
Launch an Experiment
--------------------
After all the above are prepared, it is time to start an experiment to do the model search. An example is shown below.
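A sketch of launching the experiment is shown below, assuming ``model_space``, ``evaluator``, and ``search_strategy`` have been created as in the previous sections (the experiment name and trial numbers are illustrative):

.. code-block:: python

    from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig

    # model_space, evaluator and search_strategy are assumed to be defined as above
    exp = RetiariiExperiment(model_space, evaluator, [], search_strategy)
    exp_config = RetiariiExeConfig('local')
    exp_config.experiment_name = 'mnist_search'
    exp_config.trial_concurrency = 2
    exp_config.max_trial_number = 20
    exp.run(exp_config, 8081)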
The complete code of this example can be found :githublink:`here <examples/nas/multi-trial/mnist/search.py>`. Users can also run Retiarii Experiment with `different training services <../training_services.rst>`__ besides ``local`` training service.
Visualize the Experiment
------------------------
Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment. For example, open ``localhost:8081`` in your browser, where 8081 is the port that you set in ``exp.run``. Please refer to `here <../Tutorial/WebUI.rst>`__ for details.
We support visualizing models with 3rd-party visualization engines (like `Netron <https://netron.app/>`__). This can be used by clicking ``Visualization`` in the detail panel of each trial. Note that the current visualization is based on `onnx <https://onnx.ai/>`__\ , so visualization is not feasible if the model cannot be exported into onnx.
Built-in evaluators (e.g., Classification) will automatically export the model into a file. For your own evaluator, you need to save your file into ``$NNI_OUTPUT_DIR/model.onnx`` to make this work. For instance,
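For your own evaluator, a sketch of the export might look like this (the dummy input shape is an assumption and must match your model):

.. code-block:: python

    import os
    import torch

    def evaluate_model(model_cls):
        model = model_cls()
        # export the model to $NNI_OUTPUT_DIR/model.onnx so the WebUI can visualize it
        onnx_path = os.path.join(os.environ.get('NNI_OUTPUT_DIR', '.'), 'model.onnx')
        dummy_input = torch.randn(1, 3, 224, 224)  # assumed input shape, adjust to your model
        torch.onnx.export(model, dummy_input, onnx_path)
        # ... training and evaluation code as usual ...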
Users can export top models after the exploration is done using ``export_top_models``.
.. code-block:: python
for model_code in exp.export_top_models(formatter='dict'):
print(model_code)
The output is a JSON object which records the mutation actions of the top model. If users want to output the source code of the top model, they can use the graph-based execution engine for the experiment, by simply adding the following two lines.
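Those two lines typically look like the following (a sketch; ``exp_config`` is the experiment configuration object created when launching the experiment):

.. code-block:: python

    exp_config.execution_engine = 'base'
    export_formatter = 'code'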
`Single Path One-Shot Neural Architecture Search with Uniform Sampling <https://arxiv.org/abs/1904.00420>`__ proposes a one-shot NAS method that addresses the difficulties of training one-shot NAS models by constructing a simplified supernet trained with a uniform path sampling method, so that all underlying architectures (and their weights) are trained fully and equally. An evolutionary algorithm is then applied to efficiently search for the best-performing architectures without any fine-tuning.
The implementation on NNI is based on the `official repo <https://github.com/megvii-model/SinglePathOneShot>`__. We implement a trainer that trains the supernet and an evolution tuner that leverages the power of the NNI framework to speed up the evolutionary search phase.
Examples
--------
Here is a use case, which uses the search space in the paper. However, we apply a latency limit instead of a FLOPs limit in the architecture search phase.
Prepare ImageNet in the standard format (follow the script `here <https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4>`__\ ). Linking it to ``data/imagenet`` will be more convenient.
Download the checkpoint file from `here <https://1drv.ms/u/s!Am_mmG2-KsrnajesvSdfsq_cN48?e=aHVppN>`__ (maintained by `Megvii <https://github.com/megvii-model>`__\ ) if you don't want to retrain the supernet.
Put ``checkpoint-150000.pth.tar`` under ``data`` directory.
After preparation, it's expected to have the following code structure:
.. code-block:: bash
spos
├── architecture_final.json
├── blocks.py
├── data
│ ├── imagenet
│ │ ├── train
│ │ └── val
│ └── checkpoint-150000.pth.tar
├── network.py
├── readme.md
├── supernet.py
├── evaluation.py
├── search.py
└── utils.py
Step 1. Train Supernet
^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash
python supernet.py
This will export the checkpoint to the ``checkpoints`` directory for the next step.
NOTE: The data loading used in the official repo is `slightly different from usual <https://github.com/megvii-model/SinglePathOneShot/issues/5>`__\ , as they intentionally use BGR tensors and keep the values between 0 and 255 to align with their own DL framework. The option ``--spos-preprocessing`` will simulate the original behavior and enable you to use the pretrained checkpoints.
Step 2. Evolution Search
^^^^^^^^^^^^^^^^^^^^^^^^
Single Path One-Shot leverages evolution algorithm to search for the best architecture. In the paper, the search module, which is responsible for testing the sampled architecture, recalculates all the batch norm for a subset of training images, and evaluates the architecture on the full validation set.
In this example, we have an incomplete implementation of the evolution search. The example only supports training from scratch; inheriting weights from a pretrained supernet is not supported yet. To search with the regularized evolution strategy, run
.. code-block:: bash
python search.py
The final architecture exported from every epoch of evolution can be found in ``trials`` under the working directory of your tuner, which, by default, is ``$HOME/nni-experiments/your_experiment_id/trials``.
Step 3. Train for Evaluation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash
python evaluation.py
By default, it will use ``architecture_final.json``. This architecture is provided by the official repo (converted into NNI format). You can use any architecture (e.g., the architecture found in step 2) with ``--fixed-arc`` option.
* Block search only. Channel search is not supported yet.
* In the search phase, training from scratch is required. Inheriting weights from the supernet is not supported yet.
Current Reproduction Results
----------------------------
Reproduction is still in progress. Due to the gap between the official release and the original paper, we compare our current results with the official repo (our own run) and the paper.
* The evolution phase is almost aligned with the official repo. Our evolution algorithm shows a converging trend and reaches ~65% accuracy at the end of the search. Nevertheless, this result is not on par with the paper. For details, please refer to `this issue <https://github.com/megvii-model/SinglePathOneShot/issues/6>`__.
* The retraining phase is not aligned. Our retraining code, which uses the architecture released by the authors, reaches 72.14% accuracy, still leaving a gap to the 73.61% reported by the official release and the 74.3% reported in the original paper.
NNI provides powerful APIs for users to easily express a model space (or search space). First, users can use mutation primitives (e.g., ValueChoice, LayerChoice) to inline a space in their model. Second, NNI provides a simple interface for users to customize new mutators for expressing more complicated model spaces. In most cases, the mutation primitives are enough to express users' model spaces.
In multi-trial NAS, users need a model evaluator to evaluate the performance of each sampled model, and an exploration strategy to sample models from the defined model space. Here, users can use the model evaluators provided by NNI or write their own model evaluator. They can simply choose an exploration strategy, and advanced users can also customize new exploration strategies. For a simple example of how to run a multi-trial NAS experiment, please refer to the `Quick Start <./QuickStart.rst>`__.
One-shot NAS algorithms leverage weight sharing among models in the neural architecture search space to train a supernet, and use this supernet to guide the selection of better models. This type of algorithm greatly reduces the computational resources required compared to independently training each model from scratch (which we call "multi-trial NAS"). NNI supports many popular one-shot NAS algorithms, as follows.