`SPTAG <https://github.com/microsoft/SPTAG>`__ (Space Partition Tree And Graph) is a library for large-scale approximate nearest neighbor search over vectors, released by `Microsoft Research (MSR) <https://www.msra.cn/>`__ and `Microsoft Bing <https://www.bing.com/>`__.

The library assumes that samples are represented as vectors and that vectors can be compared by L2 distance or cosine distance. The vectors returned for a query vector are those with the smallest L2 distance or cosine distance from the query vector.

SPTAG provides two methods: kd-tree with relative neighborhood graph (SPTAG-KDT) and balanced k-means tree with relative neighborhood graph (SPTAG-BKT). SPTAG-KDT is advantageous in index-building cost, while SPTAG-BKT is advantageous in search accuracy on very high-dimensional data.

SPTAG exposes tens of parameters that can be tuned for specific scenarios or datasets, and NNI is a great tool for tuning those parameters automatically. The authors of SPTAG tried NNI for auto-tuning and found well-performing parameters easily, so they shared their practice of tuning SPTAG with NNI in their documentation `here <https://github.com/microsoft/SPTAG/blob/master/docs/Parameters.md>`__. Please refer to it for a detailed tutorial.
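To make the idea concrete, below is a minimal sketch of what such a tuning trial could look like. The parameter names in the comment and the ``build_index`` / ``compute_recall`` helpers are hypothetical placeholders, not SPTAG's actual API; refer to the SPTAG parameter documentation linked above for the real names.

.. code-block:: python

   import nni

   def run_trial(train_vectors, query_vectors):
       # Hypothetical sketch: parameter names and helper functions below are
       # placeholders, not SPTAG's actual API.
       params = nni.get_next_parameter()            # e.g., {'tree_number': 2, 'neighborhood_size': 32}
       index = build_index(train_vectors, params)   # build the ANN index (placeholder)
       recall = compute_recall(index, query_vectors)  # evaluate search accuracy (placeholder)
       nni.report_final_result(recall)              # tell the tuner how good this configuration is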
In the **trial** part, if you want to use GPU to perform the architecture search, change ``trialGpuNum`` from ``0`` to ``1``. You may also need to increase ``maxTrialNumber`` and ``maxExperimentDuration``, depending on how long you are willing to wait for the search result.
As we can see, this function is essentially a compiler that converts the internal model DAG configuration ``graph`` (which will be introduced in the ``Model configuration format`` section) into a TensorFlow computation graph.
.. code-block:: python

   topology = graph.is_topology()
performs a topological sort on the internal graph representation, and the code inside the loop:
.. code-block:: python

   for _, topo_i in enumerate(topology):
       ...
performs the actual conversion, mapping each layer to a part of the TensorFlow computation graph.
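To make this pattern concrete, here is a hedged, self-contained sketch of such a compile loop. It is an illustration, not the tutorial's actual code: ``build_op`` is a hypothetical callback that maps one layer to a framework op, and the layer dicts follow the configuration format described in the next section.

.. code-block:: python

   def topological_order(layers):
       """Yield layer indices so that every layer comes after all of its inputs."""
       visited, order = set(), []

       def visit(i):
           if i in visited or layers[i]['is_delete']:
               return
           visited.add(i)
           for j in layers[i]['input']:
               visit(j)
           order.append(i)

       for i in range(len(layers)):
           visit(i)
       return order

   def compile_graph(layers, build_op):
       """Map each layer to an op, feeding it the ops built for its input layers."""
       tensors = {}
       for idx in topological_order(layers):
           layer = layers[idx]
           inputs = [tensors[i] for i in layer['input']]
           tensors[idx] = build_op(layer, inputs)  # build_op: hypothetical per-layer builder
       return tensors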
3.3 The tuner
^^^^^^^^^^^^^

The tuner is much simpler than the trial; the two actually share the same ``graph.py``. In addition, the tuner has a ``customer_tuner.py``, whose most important class is ``CustomerTuner``:
.. code-block:: python

   class CustomerTuner(Tuner):
       # ......
       def generate_parameters(self, parameter_id):
           """Returns a set of trial graph config, as a serializable object.
           parameter_id : int
           """
           if len(self.population) <= 0:
               logger.debug("The population is empty.")
               raise Exception('The population is empty')
           pos = -1
           for i in range(len(self.population)):
               if self.population[i].result is None:
                   pos = i
                   break
           if pos != -1:
               indiv = copy.deepcopy(self.population[pos])
               self.population.pop(pos)
               temp = json.loads(graph_dumps(indiv.config))
           else:
               random.shuffle(self.population)
               if self.population[0].result > self.population[1].result:
                   self.population[0] = self.population[1]
               indiv = copy.deepcopy(self.population[0])
               self.population.pop(1)
               indiv.mutation()
               graph = indiv.config
               temp = json.loads(graph_dumps(graph))
           # ......
As we can see, the overridden method ``generate_parameters`` implements a fairly naive mutation algorithm. The lines:
.. code-block:: python

   if self.population[0].result > self.population[1].result:
       self.population[0] = self.population[1]
   indiv = copy.deepcopy(self.population[0])
control the mutation process: the tuner always takes two random individuals from the population, keeping and mutating only the one with the better result.
3.4 Model configuration format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the model configuration, which is passed from the tuner to the trial during the architecture search procedure.
.. code-block:: json

   {
       "max_layer_num": 50,
       "layers": [
           {
               "input_size": 0,
               "type": 3,
               "output_size": 1,
               "input": [],
               "size": "x",
               "output": [4, 5],
               "is_delete": false
           },
           {
               "input_size": 0,
               "type": 3,
               "output_size": 1,
               "input": [],
               "size": "y",
               "output": [4, 5],
               "is_delete": false
           },
           {
               "input_size": 1,
               "type": 4,
               "output_size": 0,
               "input": [6],
               "size": "x",
               "output": [],
               "is_delete": false
           },
           {
               "input_size": 1,
               "type": 4,
               "output_size": 0,
               "input": [5],
               "size": "y",
               "output": [],
               "is_delete": false
           },
           {"Comment": "More layers will be here for actual graphs."}
       ]
   }
Every model configuration has a ``layers`` section, which is a JSON list of layer definitions. Each layer definition is itself a JSON object, where:

* ``type`` is the type of the layer. The values 0, 1, 2, 3 and 4 correspond to attention, self-attention, RNN, input and output layers, respectively.
* ``size`` is the length of the output. ``"x"`` and ``"y"`` correspond to document length and question length, respectively.
* ``input_size`` is the number of inputs the layer has.
* ``input`` lists the indices of the layers taken as input by this layer.
* ``output`` lists the indices of the layers that use this layer's output as their input.
* ``is_delete`` indicates whether the layer has been deleted (and is thus no longer available).
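To make the format concrete, here is a small sketch that loads such a configuration and checks that the ``input``/``output`` indices are mutually consistent. The ``check_config`` helper is an illustration of the format only, not code from the tutorial.

.. code-block:: python

   import json

   def check_config(config_text):
       """Validate that input/output edges of a model configuration agree."""
       layers = json.loads(config_text)['layers']
       for idx, layer in enumerate(layers):
           if layer.get('is_delete'):
               continue  # deleted layers stay in the list but are ignored
           assert len(layer['input']) == layer['input_size']
           # every edge should be recorded on both of its endpoints
           for src in layer['input']:
               assert idx in layers[src]['output'], f'layer {src} missing output edge to {idx}'
           for dst in layer['output']:
               assert idx in layers[dst]['input'], f'layer {dst} missing input edge from {idx}'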
"\n# Searching in DARTS search space\n\nIn this tutorial, we demonstrate how to search in the famous model space proposed in `DARTS`_.\n\nThrough this process, you will learn:\n\n* How to use the built-in model spaces from NNI's model space hub.\n* How to use one-shot exploration strategies to explore a model space.\n* How to customize evaluators to achieve the best performance.\n\nIn the end, we get a strong-performing model on CIFAR-10 dataset, which achieves up to 97.28% accuracy.\n\n.. attention::\n\n Running this tutorial requires a GPU.\n If you don't have one, you can set ``gpus`` in :class:`~nni.retiarii.evaluator.pytorch.Classification` to be 0,\n but do note that it will be much slower.\n\n\n## Use a pre-searched DARTS model\n\nSimilar to [the beginner tutorial of PyTorch](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html)_,\nwe begin with CIFAR-10 dataset, which is a image classification dataset of 10 categories.\nThe images in CIFAR-10 are of size 3x32x32, i.e., RGB-colored images of 32x32 pixels in size.\n\nWe first load the CIFAR-10 dataset with torchvision.\n"
"<div class=\"alert alert-info\"><h4>Note</h4><p>If you are to use multi-trial strategies, wrapping CIFAR10 with :func:`nni.trace` and\n use DataLoader from ``nni.retiarii.evaluator.pytorch`` (instead of ``torch.utils.data``) are mandatory.\n Otherwise, it's optional.</p></div>\n\nNNI presents many built-in model spaces, along with many *pre-searched models* in :doc:`model space hub </nas/space_hub>`,\nwhich are produced by most popular NAS literatures.\nA pre-trained model is a saved network that was previously trained on a large dataset like CIFAR-10 or ImageNet.\nYou can easily load these models as a starting point, validate their performances, and finetune them if you need.\n\nIn this tutorial, we choose one from `DARTS`_ search space, which is natively trained on our target dataset, CIFAR-10,\nso as to save the tedious steps of finetuning.\n\n.. tip::\n\n Finetuning a pre-searched model on other datasets is no different from finetuning *any model*.\n We recommend reading\n [this tutorial of object detection finetuning](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html)_\n if you want to know how finetuning is generally done in PyTorch.\n\n"
.. code-block:: python

   from nni.retiarii.hub.pytorch import DARTS as DartsSpace

   darts_v2_model = DartsSpace.load_searched_model('darts-v2', pretrained=True, download=True)

   def evaluate_model(model, cuda=False):
       device = torch.device('cuda' if cuda else 'cpu')
       model.to(device)
       model.eval()
       with torch.no_grad():
           correct = total = 0
           for inputs, targets in valid_loader:
               inputs, targets = inputs.to(device), targets.to(device)
               logits = model(inputs)
               _, predict = torch.max(logits, 1)
               correct += (predict == targets).sum().cpu().item()
               total += targets.size(0)
       print('Accuracy:', correct / total)
       return correct / total

   evaluate_model(darts_v2_model, cuda=True)  # Set this to false if there's no GPU.
The journey of using a pre-searched model could end here. Or, if you are interested,
we can go a step further and search a model within the :class:`~nni.retiarii.hub.pytorch.DARTS` space on our own.

Use the DARTS model space
-------------------------

The model space provided in `DARTS`_ originated from `NASNet <https://arxiv.org/abs/1707.07012>`__,
where the full model is constructed by repeatedly stacking a single computational unit (called a **cell**).
There are two types of cells within a network. The first type is called a *normal cell*, and the second a *reduction cell*.
The key difference between them is that the reduction cell downsamples the input feature map and decreases its resolution.
Normal and reduction cells are stacked alternately, as shown in the following figure.

.. image:: ../../img/nasnet_cell_stack.png

A cell takes the outputs of the two previous cells as inputs and contains a collection of *nodes*.
Each node takes two previous nodes within the same cell (or the two cell inputs),
applies an *operator* (e.g., convolution or max-pooling) to each input,
and sums the outputs of the operators as the output of the node.
The output of the cell is the concatenation of all the nodes that are never used as inputs of another node.
Readers can refer to `NDS <https://arxiv.org/pdf/1905.13214.pdf>`__ or `ENAS <https://arxiv.org/abs/1802.03268>`__ for more details.

We illustrate an example of cells in the following figure.

.. image:: ../../img/nasnet_cell.png

The search space proposed in the `DARTS`_ paper introduced two modifications to the original space
in `NASNet <https://arxiv.org/abs/1707.07012>`__.

Firstly, the operator candidates have been narrowed down to seven:

- Max pooling 3x3
- Average pooling 3x3
- Skip connect (Identity)
- Separable convolution 3x3
- Separable convolution 5x5
- Dilated convolution 3x3
- Dilated convolution 5x5

Secondly, the output of the cell is the concatenation of **all the nodes within the cell**.

As the search space is based on cells, once the normal and reduction cells have been fixed, we can stack them any number of times.
To save search cost, the common practice is to reduce the number of filters (i.e., channels) and the number of stacked cells
during the search phase, and increase them back when training the final searched architecture.

.. note::

   `DARTS`_ is one of those papers that innovate in both search space and search strategy.
   In this tutorial, we will search on the **model space** provided by DARTS with the **search strategy** proposed by DARTS.
   We refer to them as *DARTS model space* (``DartsSpace``) and *DARTS strategy* (``DartsStrategy``), respectively.
   This does NOT imply that the :class:`~nni.retiarii.hub.pytorch.DARTS` space and
   :class:`~nni.retiarii.strategy.DARTS` strategy have to be used together.
   You can always explore the DARTS space with another search strategy, or use your own strategy to search a different model space.

In the following example, we initialize a :class:`~nni.retiarii.hub.pytorch.DARTS`
model space, with 16 initial filters and 8 stacked cells.
The network is specialized for the CIFAR-10 dataset with 32x32 input resolution.

The :class:`~nni.retiarii.hub.pytorch.DARTS` model space here is provided by the :doc:`model space hub </nas/space_hub>`,
where we have supported multiple popular model spaces for plug-and-play use.

.. tip::

   The model space here can be replaced with any space provided in the hub,
   or even customized spaces built from scratch.
.. code-block:: python

   model_space = DartsSpace(
       width=16,        # the initial filters (channel number) for the model
       num_cells=8,     # the number of stacked cells in total
       dataset='cifar'  # to give a hint about input resolution, here is 32x32
   )
Search on the model space
-------------------------

.. warning::

   Please set ``fast_dev_run`` to False to reproduce our claimed results.
   Otherwise, only a few mini-batches will be run.
.. code-block:: python

   fast_dev_run = True
Evaluator
^^^^^^^^^

To begin exploring the model space, one first needs an evaluator to provide the criterion of a "good model".
As we are searching on the CIFAR-10 dataset, we can use :class:`~nni.retiarii.evaluator.pytorch.Classification`
as a starting point.

Note that for a typical setup of NAS, model search should be done on the validation set, and the evaluation of the final searched model
should be done on the test set. However, as CIFAR-10 has no test dataset (only 50k train + 10k valid),
we have to split the original training set into a training set and a validation set.
The train/val split recommended by the `DARTS`_ strategy is 1:1.
"### Strategy\n\nWe will use `DARTS`_ (Differentiable ARchiTecture Search) as the search strategy to explore the model space.\n:class:`~nni.retiarii.strategy.DARTS` strategy belongs to the category of `one-shot strategy <one-shot-nas>`.\nThe fundamental differences between One-shot strategies and `multi-trial strategies <multi-trial-nas>` is that,\none-shot strategy combines search with model training into a single run.\nCompared to multi-trial strategies, one-shot NAS doesn't need to iteratively spawn new trials (i.e., models),\nand thus saves the excessive cost of model training.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>It's worth mentioning that one-shot NAS also suffers from multiple drawbacks despite its computational efficiency.\n We recommend\n [Weight-Sharing Neural Architecture Search: A Battle to Shrink the Optimization Gap](https://arxiv.org/abs/2008.01475)_\n and\n [How Does Supernet Help in Neural Architecture Search?](https://arxiv.org/abs/2010.08219)_ for interested readers.</p></div>\n\n:class:`~nni.retiarii.strategy.DARTS` strategy is provided as one of NNI's :doc:`built-in search strategies </nas/exploration_strategy>`.\nUsing it can be as simple as one line of code.\n\n"
]
},
.. code-block:: python

   from nni.retiarii.strategy import DARTS as DartsStrategy

   strategy = DartsStrategy()
.. tip:: The ``DartsStrategy`` here can be replaced by any search strategy, even a multi-trial one.

If you want to know how the DARTS strategy works, here is a brief version.
Under the hood, DARTS converts the cell into a densely connected graph and puts operators on the edges (see the following figure).
Since the operators are not decided yet, every edge is a weighted mixture of multiple operators (multiple colors in the figure).
DARTS then learns to assign the optimal "color" to each edge during network training.
It finally selects one "color" for each edge and drops the redundant edges.
The weights on the edges are called *architecture weights*.

.. image:: ../../img/darts_illustration.png

.. tip:: It's NOT reflected in the figure that, in the DARTS model space, exactly two inputs are kept for every node.

Launch experiment
^^^^^^^^^^^^^^^^^

We then come to the step of launching the experiment.
This step is similar to what we have done in the :doc:`beginner tutorial <hello_nas>`,
except that the ``execution_engine`` argument should be set to ``oneshot``.
".. tip::\n\n The search process can be visualized with tensorboard. For example::\n\n tensorboard --logdir=./lightning_logs\n\n Then, open the browser and go to http://localhost:6006/ to monitor the search process.\n\n .. image:: ../../img/darts_search_process.png\n\nWe can then retrieve the best model found by the strategy with ``export_top_models``.\nHere, the retrieved model is a dict (called *architecture dict*) describing the selected normal cell and reduction cell.\n\n"
"The cell can be visualized with the following code snippet\n(copied and modified from [DARTS visualization](https://github.com/quark0/darts/blob/master/cnn/visualize.py)_).\n\n"
]
},
.. code-block:: python

   import io
   import graphviz
   import matplotlib.pyplot as plt
   from PIL import Image

   def plot_single_cell(arch_dict, cell_name):
       g = graphviz.Digraph(
           node_attr=dict(style='filled', shape='rect', align='center'),
           format='png'
       )
       g.body.extend(['rankdir=LR'])

       g.node('c_{k-2}', fillcolor='darkseagreen2')
       g.node('c_{k-1}', fillcolor='darkseagreen2')
       assert len(arch_dict) % 2 == 0

       for i in range(2, 6):
           g.node(str(i), fillcolor='lightblue')

       for i in range(2, 6):
           for j in range(2):
               op = arch_dict[f'{cell_name}/op_{i}_{j}']
               from_ = arch_dict[f'{cell_name}/input_{i}_{j}']
               if from_ == 0:
                   u = 'c_{k-2}'
               elif from_ == 1:
                   u = 'c_{k-1}'
               else:
                   u = str(from_)
               v = str(i)
               g.edge(u, v, label=op, fillcolor='gray')

       g.node('c_{k}', fillcolor='palegoldenrod')
       for i in range(2, 6):
           g.edge(str(i), 'c_{k}', fillcolor='gray')

       g.attr(label=f'{cell_name.capitalize()} cell')

       image = Image.open(io.BytesIO(g.pipe()))
       return image

   def plot_double_cells(arch_dict):
       image1 = plot_single_cell(arch_dict, 'normal')
       image2 = plot_single_cell(arch_dict, 'reduce')
       height_ratio = max(image1.size[1] / image1.size[0], image2.size[1] / image2.size[0])
       _, axs = plt.subplots(1, 2, figsize=(20, 10 * height_ratio))
       axs[0].imshow(image1)
       axs[1].imshow(image2)
       axs[0].axis('off')
       axs[1].axis('off')
       plt.show()

   plot_double_cells(exported_arch)
.. warning::

   The cell above is obtained via ``fast_dev_run`` (i.e., running only 1 mini-batch).

When ``fast_dev_run`` is turned off, we get a model with the following architecture,
where you might notice the interesting fact that around half of the operations selected are ``sep_conv_3x3``.
"## Retrain the searched model\n\nWhat we have got in the last step, is only a cell structure.\nTo get a final usable model with trained weights, we need to construct a real model based on this structure,\nand then fully train it.\n\nTo construct a fixed model based on the architecture dict exported from the experiment,\nwe can use :func:`nni.retiarii.fixed_arch`. Under the with-context, we will creating a fixed model based on ``exported_arch``,\ninstead of creating a space.\n\n"
"We then train the model on full CIFAR-10 training dataset, and evaluate it on the original CIFAR-10 validation dataset.\n\n"
]
},
.. code-block:: python

   train_loader = DataLoader(train_data, batch_size=96, num_workers=6)  # Use the original training data
The validation data loader can be reused.
.. code-block:: python

   valid_loader
We must create a new evaluator here because a different data split is used.
Also, we should prevent the underlying PyTorch-Lightning implementation of the :class:`~nni.retiarii.evaluator.pytorch.Classification`
evaluator from loading the wrong checkpoint.
.. code-block:: python

   max_epochs = 100

   evaluator = Classification(
       learning_rate=1e-3,
       weight_decay=1e-4,
       train_dataloaders=train_loader,
       val_dataloaders=valid_loader,
       max_epochs=max_epochs,
       gpus=1,
       export_onnx=False,         # Disable ONNX export for this experiment
       fast_dev_run=fast_dev_run  # Should be false for fully training
   )

   evaluator.fit(final_model)
.. note::

   When ``fast_dev_run`` is turned off, we achieve a validation accuracy of 89.69% after training for 100 epochs.

Reproduce results in DARTS paper
--------------------------------

After a brief walkthrough of the search + retrain process with a one-shot strategy,
we now fill the gap between our result (89.69%) and the results in the `DARTS`_ paper.
The difference is that we didn't introduce some extra training tricks, including `DropPath <https://arxiv.org/pdf/1605.07648v4.pdf>`__,
auxiliary loss, gradient clipping, and augmentations like `Cutout <https://arxiv.org/pdf/1708.04552v2.pdf>`__.
The authors also train a deeper (20 cells) and wider (36 filters) network for a longer time (600 epochs).
Here we reproduce these tricks to get results comparable with the DARTS paper.

Evaluator
^^^^^^^^^

To implement these tricks, we first need to rewrite a few parts of the evaluator.

When working with one-shot strategies, evaluators need to be implemented in the style of :ref:`PyTorch-Lightning <lightning-evaluator>`;
the full tutorial can be found in :doc:`/nas/evaluator`.
Put briefly, the core part of writing a new evaluator is to write a new LightningModule.
`LightningModule <https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html>`__ is a concept in
PyTorch-Lightning that organizes the model training process into a list of functions, such as
``training_step``, ``validation_step``, ``configure_optimizers``, etc.
Since we are merely adding a few ingredients to :class:`~nni.retiarii.evaluator.pytorch.Classification`,
we can simply inherit :class:`~nni.retiarii.evaluator.pytorch.ClassificationModule`, the underlying LightningModule
behind :class:`~nni.retiarii.evaluator.pytorch.Classification`.
This could look intimidating at first, but most of it is just plug-and-play tricks whose details you don't need to know.
.. code-block:: python

   import torch
   from nni.retiarii.evaluator.pytorch import ClassificationModule

   class DartsClassificationModule(ClassificationModule):
       def __init__(
           self,
           learning_rate: float = 0.001,
           weight_decay: float = 0.,
           auxiliary_loss_weight: float = 0.4,
           max_epochs: int = 600
       ):
           self.auxiliary_loss_weight = auxiliary_loss_weight
           # Training length will be used in LR scheduler
           self.max_epochs = max_epochs
           super().__init__(learning_rate=learning_rate, weight_decay=weight_decay, export_onnx=False)

       def configure_optimizers(self):
           """Customized optimizer with momentum, as well as a scheduler."""
           optimizer = torch.optim.SGD(
               self.parameters(),
               momentum=0.9,
               lr=self.hparams.learning_rate,
               weight_decay=self.hparams.weight_decay
           )
           return {
               'optimizer': optimizer,
               'lr_scheduler': torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, self.max_epochs, eta_min=1e-3)
           }

       def training_step(self, batch, batch_idx):
           """Training step, customized with auxiliary loss."""
           x, y = batch
           if self.auxiliary_loss_weight:
               y_hat, y_aux = self(x)
               loss_main = self.criterion(y_hat, y)
               loss_aux = self.criterion(y_aux, y)
               self.log('train_loss_main', loss_main)
               self.log('train_loss_aux', loss_aux)
               loss = loss_main + self.auxiliary_loss_weight * loss_aux
           else:
               y_hat = self(x)
               loss = self.criterion(y_hat, y)
           self.log('train_loss', loss, prog_bar=True)
           for name, metric in self.metrics.items():
               self.log('train_' + name, metric(y_hat, y), prog_bar=True)
           return loss

       def on_train_epoch_start(self):
           # Set drop path probability before every epoch. This has no effect if drop path is not enabled in model.
           self.model.set_drop_path_prob(self.model.drop_path_prob * self.current_epoch / self.max_epochs)

           # Logging learning rate at the beginning of every epoch
           self.log('lr', self.trainer.optimizers[0].param_groups[0]['lr'])
The full evaluator is written as follows.
It simply wraps everything (except the model space and search strategy, of course) in a single object.
:class:`~nni.retiarii.evaluator.pytorch.Lightning` here is a special type of evaluator.
Don't forget to use the train/val data split specialized for search (1:1) here.
"### Strategy\n\n:class:`~nni.retiarii.strategy.DARTS` strategy is created with gradient clip turned on.\nIf you are familiar with PyTorch-Lightning, you might aware that gradient clipping can be enabled in Lightning trainer.\nHowever, enabling gradient clip in the trainer above won't work, because the underlying\nimplementation of :class:`~nni.retiarii.strategy.DARTS` strategy is based on\n[manual optimization](https://pytorch-lightning.readthedocs.io/en/stable/common/optimization.html)_.\n\n"
]
},
.. code-block:: python

   strategy = DartsStrategy(gradient_clip_val=5.)
Launch experiment
^^^^^^^^^^^^^^^^^

Then we use the newly created evaluator and strategy to launch the experiment again.

.. warning::

   ``model_space`` has to be re-instantiated because of a known limitation:
   one model space instance can't be reused across multiple experiments.
"### Retrain\n\nWhen retraining,\nwe extend the original dataloader to introduce another trick called [Cutout](https://arxiv.org/pdf/1708.04552v2.pdf)_.\nCutout is a data augmentation technique that randomly masks out rectangular regions in images.\nIn CIFAR-10, the typical masked size is 16x16 (the image sizes are 32x32 in the dataset).\n\n"
"We then create the final model based on the new exported architecture.\nThis time, auxiliary loss and drop path probability is enabled.\n\nFollowing the same procedure as paper, we also increase the number of filters to 36, and number of cells to 20,\nso as to reasonably increase the model size and boost the performance.\n\n"
"When ``fast_dev_run`` is turned off, after retraining, the architecture yields a top-1 accuracy of 97.12%.\nIf we take the best snapshot throughout the retrain process,\nthere is a chance that the top-1 accuracy will be 97.28%.\n\n<img src=\"file://../../img/darts_val_acc.png\">\n\nIn the figure, the orange line is the validation accuracy curve after training for 600 epochs,\nwhile the red line corresponding the previous version in this tutorial before adding all the training tricks and\nonly trains for 100 epochs.\n\nThe results outperforms \"DARTS (first order) + cutout\" in `DARTS`_ paper, which is only 97.00\u00b10.14%.\nIt's even comparable with \"DARTS (second order) + cutout\" in the paper (97.24\u00b10.09%),\nthough we didn't implement the second order version.\nThe implementation of second order DARTS is in our future plan, and we also welcome your contribution.\n\n"
"\n# Hello, NAS!\n\nThis is the 101 tutorial of Neural Architecture Search (NAS) on NNI.\nIn this tutorial, we will search for a neural architecture on MNIST dataset with the help of NAS framework of NNI, i.e., *Retiarii*.\nWe use multi-trial NAS as an example to show how to construct and explore a model space.\n\nThere are mainly three crucial components for a neural architecture search task, namely,\n\n* Model search space that defines a set of models to explore.\n* A proper strategy as the method to explore this model space.\n* A model evaluator that reports the performance of every model in the space.\n\nCurrently, PyTorch is the only supported framework by Retiarii, and we have only tested **PyTorch 1.7 to 1.10**.\nThis tutorial assumes PyTorch context but it should also apply to other frameworks, which is in our future plan.\n\n## Define your Model Space\n\nModel space is defined by users to express a set of models that users want to explore, which contains potentially good-performing models.\nIn this framework, a model space is defined with two parts: a base model and possible mutations on the base model.\n"
]
},
Define Base Model
^^^^^^^^^^^^^^^^^

Defining a base model is almost the same as defining a PyTorch (or TensorFlow) model.
Usually, you only need to replace the code ``import torch.nn as nn`` with
``import nni.retiarii.nn.pytorch as nn`` to use our wrapped PyTorch modules.

Below is a very simple example of defining a base model.
.. code-block:: python

   import torch
   import torch.nn.functional as F
   import nni.retiarii.nn.pytorch as nn
   from nni.retiarii import model_wrapper


   @model_wrapper  # this decorator should be put on the outermost class
   class Net(nn.Module):
       def __init__(self):
           super().__init__()
           self.conv1 = nn.Conv2d(1, 32, 3, 1)
           self.conv2 = nn.Conv2d(32, 64, 3, 1)
           self.dropout1 = nn.Dropout(0.25)
           self.dropout2 = nn.Dropout(0.5)
           self.fc1 = nn.Linear(9216, 128)
           self.fc2 = nn.Linear(128, 10)

       def forward(self, x):
           x = F.relu(self.conv1(x))
           x = F.max_pool2d(self.conv2(x), 2)
           x = torch.flatten(self.dropout1(x), 1)
           x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
           output = F.log_softmax(x, dim=1)
           return output
.. tip:: Always keep in mind that you should use ``import nni.retiarii.nn.pytorch as nn`` and :meth:`nni.retiarii.model_wrapper`.
   Many mistakes are a result of forgetting one of those.
   Also, use ``torch.nn`` for submodules such as ``init``, e.g., ``torch.nn.init`` instead of ``nn.init``.

Define Model Mutations
^^^^^^^^^^^^^^^^^^^^^^

A base model is only one concrete model, not a model space. We provide :doc:`APIs and primitives </nas/construct_space>`
for users to express how the base model can be mutated, that is, to build a model space which includes many models.

Based on the above base model, we can define a model space as below.

.. code-block:: diff

    @model_wrapper
    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
   -        self.conv2 = nn.Conv2d(32, 64, 3, 1)
   +        self.conv2 = nn.LayerChoice([
   +            nn.Conv2d(32, 64, 3, 1),
   +            DepthwiseSeparableConv(32, 64)
   +        ])
   -        self.dropout1 = nn.Dropout(0.25)
   +        self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75]))
            self.dropout2 = nn.Dropout(0.5)
   -        self.fc1 = nn.Linear(9216, 128)
   -        self.fc2 = nn.Linear(128, 10)
   +        feature = nn.ValueChoice([64, 128, 256])
   +        self.fc1 = nn.Linear(9216, feature)
   +        self.fc2 = nn.Linear(feature, 10)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.max_pool2d(self.conv2(x), 2)
            x = torch.flatten(self.dropout1(x), 1)
            x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
            output = F.log_softmax(x, dim=1)
            return output

This results in the following code:
.. code-block:: python

   class DepthwiseSeparableConv(nn.Module):
       def __init__(self, in_ch, out_ch):
           super().__init__()
           self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, groups=in_ch)
           self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

       def forward(self, x):
           return self.pointwise(self.depthwise(x))


   @model_wrapper
   class ModelSpace(nn.Module):
       def __init__(self):
           super().__init__()
           self.conv1 = nn.Conv2d(1, 32, 3, 1)
           # LayerChoice is used to select a layer between Conv2d and DwConv.
           self.conv2 = nn.LayerChoice([
               nn.Conv2d(32, 64, 3, 1),
               DepthwiseSeparableConv(32, 64)
           ])
           # ValueChoice is used to select a dropout rate.
           # ValueChoice can be used as parameter of modules wrapped in `nni.retiarii.nn.pytorch`
           # or customized modules wrapped with `@basic_unit`.
           self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75]))  # choose dropout rate from 0.25, 0.5 and 0.75
           self.dropout2 = nn.Dropout(0.5)
           feature = nn.ValueChoice([64, 128, 256])
           self.fc1 = nn.Linear(9216, feature)
           self.fc2 = nn.Linear(feature, 10)

       def forward(self, x):
           x = F.relu(self.conv1(x))
           x = F.max_pool2d(self.conv2(x), 2)
           x = torch.flatten(self.dropout1(x), 1)
           x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
           output = F.log_softmax(x, dim=1)
           return output


   model_space = ModelSpace()
   model_space
This example uses two mutation APIs:
:class:`nn.LayerChoice <nni.retiarii.nn.pytorch.LayerChoice>` and
:class:`nn.ValueChoice <nni.retiarii.nn.pytorch.ValueChoice>`.
:class:`nn.LayerChoice <nni.retiarii.nn.pytorch.LayerChoice>`
takes a list of candidate modules (two in this example); one will be chosen for each sampled model.
It can be used like a normal PyTorch module.
:class:`nn.ValueChoice <nni.retiarii.nn.pytorch.ValueChoice>` takes a list of candidate values;
one will be chosen to take effect in each sampled model.

More detailed API descriptions and usage can be found :doc:`here </nas/construct_space>`.

.. note::

   We are actively enriching the mutation APIs to facilitate easy construction of model spaces.
   If the currently supported mutation APIs cannot express your model space,
   please refer to :doc:`this doc </nas/mutator>` for customizing mutators.

Explore the Defined Model Space
-------------------------------

There are basically two exploration approaches: (1) searching by evaluating each sampled model independently,
which is the approach used in :ref:`multi-trial NAS <multi-trial-nas>`,
and (2) one-shot weight-sharing based search, which is used in one-shot NAS.
We demonstrate the first approach in this tutorial. Users can refer to :ref:`here <one-shot-nas>` for the second approach.

First, users need to pick a proper exploration strategy to explore the defined model space.
Second, users need to pick or customize a model evaluator to evaluate the performance of each explored model.

Pick an exploration strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Retiarii supports many :doc:`exploration strategies </nas/exploration_strategy>`.

Simply choose (i.e., instantiate) an exploration strategy, as below.
.. code-block:: python

   import nni.retiarii.strategy as strategy

   search_strategy = strategy.Random(dedup=True)  # dedup=False if deduplication is not wanted
Pick or customize a model evaluator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training
and validating each generated model to obtain the model's performance.
The performance is sent back to the exploration strategy so that the strategy can generate better models.

Retiarii provides :doc:`built-in model evaluators </nas/evaluator>`, but to start with,
it is recommended to use :class:`FunctionalEvaluator <nni.retiarii.evaluator.FunctionalEvaluator>`,
that is, to wrap your own training and evaluation code in a single function.
This function should receive a single model class and use :func:`nni.report_final_result` to report the final score of this model.

The example here creates a simple evaluator that runs on the MNIST dataset, trains for 3 epochs, and reports its validation accuracy.
.. code-block:: python

   import nni

   from torchvision import transforms
   from torchvision.datasets import MNIST
   from torch.utils.data import DataLoader


   def train_epoch(model, device, train_loader, optimizer, epoch):
       loss_fn = torch.nn.CrossEntropyLoss()
       model.train()
       for batch_idx, (data, target) in enumerate(train_loader):
           data, target = data.to(device), target.to(device)
           optimizer.zero_grad()
           output = model(data)
           loss = loss_fn(output, target)
           loss.backward()
           optimizer.step()
           if batch_idx % 10 == 0:
               print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                   epoch, batch_idx * len(data), len(train_loader.dataset),
                   100. * batch_idx / len(train_loader), loss.item()))


   def test_epoch(model, device, test_loader):
       model.eval()
       correct = 0
       with torch.no_grad():
           for data, target in test_loader:
               data, target = data.to(device), target.to(device)
               output = model(data)
               pred = output.argmax(dim=1, keepdim=True)
               correct += pred.eq(target.view_as(pred)).sum().item()

       accuracy = 100. * correct / len(test_loader.dataset)

       print('\nTest set: Accuracy: {}/{} ({:.0f}%)\n'.format(
           correct, len(test_loader.dataset), accuracy))

       return accuracy


   def evaluate_model(model_cls):
       # "model_cls" is a class, need to instantiate
       model = model_cls()

       device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
       model.to(device)

       optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
       transf = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
       train_loader = DataLoader(MNIST('data/mnist', download=True, transform=transf), batch_size=64, shuffle=True)
       test_loader = DataLoader(MNIST('data/mnist', download=True, train=False, transform=transf), batch_size=64)

       for epoch in range(3):
           # train the model for one epoch
           train_epoch(model, device, train_loader, optimizer, epoch)
           # test the model for one epoch
           accuracy = test_epoch(model, device, test_loader)
           # call report intermediate result. Result can be float or dict
           nni.report_intermediate_result(accuracy)

       # report final test result
       nni.report_final_result(accuracy)
"The ``train_epoch`` and ``test_epoch`` here can be any customized function,\nwhere users can write their own training recipe.\n\nIt is recommended that the ``evaluate_model`` here accepts no additional arguments other than ``model_cls``.\nHowever, in the :doc:`advanced tutorial </nas/evaluator>`, we will show how to use additional arguments in case you actually need those.\nIn future, we will support mutation on the arguments of evaluators, which is commonly called \"Hyper-parmeter tuning\".\n\n## Launch an Experiment\n\nAfter all the above are prepared, it is time to start an experiment to do the model search. An example is shown below.\n\n"
"The following configurations are useful to control how many trials to run at most / at the same time.\n\n"
]
},
.. code-block:: python

   exp_config.max_trial_number = 4   # spawn 4 trials at most
   exp_config.trial_concurrency = 2  # will run two trials concurrently
Remember to set the following config if you want to use a GPU.
``use_active_gpu`` should be set true if you wish to use an occupied GPU (possibly running a GUI).
"Launch the experiment. The experiment should take several minutes to finish on a workstation with 2 GPUs.\n\n"
]
},
.. code-block:: python

   exp.run(exp_config, 8081)
Users can also run a Retiarii experiment with :doc:`different training services </experiment/training_service/overview>`
besides the ``local`` training service.

Visualize the Experiment
------------------------

Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment.
For example, open ``localhost:8081`` in your browser; 8081 is the port that you set in ``exp.run``.
Please refer to :doc:`here </experiment/web_portal/web_portal>` for details.

We support visualizing models with third-party visualization engines (like `Netron <https://netron.app/>`__).
This can be used by clicking ``Visualization`` in the detail panel of each trial.
Note that the current visualization is based on `onnx <https://onnx.ai/>`__,
thus visualization is not feasible if the model cannot be exported into ONNX.

Built-in evaluators (e.g., Classification) will automatically export the model into a file.
For your own evaluator, you need to save your model into ``$NNI_OUTPUT_DIR/model.onnx`` to make this work.
For instance,
.. code-block:: python

   import os
   from pathlib import Path


   def evaluate_model_with_visualization(model_cls):
       model = model_cls()
       # dump the model into an onnx
       if 'NNI_OUTPUT_DIR' in os.environ:
           dummy_input = torch.zeros(1, 3, 32, 32)
           torch.onnx.export(model, (dummy_input, ),
                             Path(os.environ['NNI_OUTPUT_DIR']) / 'model.onnx')
       evaluate_model(model_cls)
Relaunch the experiment, and a button is shown on the web portal.

.. image:: ../../img/netron_entrance_webui.png

Export Top Models
-----------------

Users can export the top models after the exploration is done, using ``export_top_models``.
.. code-block:: python

   for model_dict in exp.export_top_models(formatter='dict'):
       print(model_dict)
The output is a JSON object that records the mutation actions of the top model.
If users want to output the source code of the top model,
they can use the :ref:`graph-based execution engine <graph-based-execution-engine>` for the experiment,
by simply adding the following two lines.
"\n# Port PyTorch Quickstart to NNI\nThis is a modified version of `PyTorch quickstart`_.\n\nIt can be run directly and will have the exact same result as original version.\n\nFurthermore, it enables the ability of auto tuning with an NNI *experiment*, which will be detailed later.\n\nIt is recommended to run this script directly first to verify the environment.\n\nThere are 2 key differences from the original version:\n\n1. In `Get optimized hyperparameters`_ part, it receives generated hyperparameters.\n2. In `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.\n\n"
"## Get optimized hyperparameters\nIf run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.\nBut with an NNI *experiment*, it will receive optimized hyperparameters from tuning algorithm.\n\n"