@@ -24,7 +24,7 @@ To use ProxylessNAS training/searching approach, users need to specify search sp
...
trainer.train()
trainer.export(args.arch_path)
-The complete example code can be found :githublink:`here <examples/nas/proxylessnas>`.
+The complete example code can be found :githublink:`here <examples/nas/oneshot/proxylessnas>`.
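For orientation, the sketch below shows how such a trainer is typically wired up. The constructor arguments are illustrative placeholders rather than the exact ``ProxylessNasTrainer`` signature; please follow the linked example for the authoritative usage.

.. code-block:: python

    import torch
    from nni.algorithms.nas.pytorch.proxylessnas import ProxylessNasTrainer

    model = MySearchSpace()  # user-defined search space model (placeholder name)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)

    # Argument names below are illustrative; check the example code for the real signature.
    trainer = ProxylessNasTrainer(model,
                                  model_optim=optimizer,
                                  train_loader=train_loader,
                                  valid_loader=valid_loader,
                                  device=torch.device('cuda'))
    trainer.train()
    trainer.export(args.arch_path)  # `args` comes from argparse in the example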
**Input arguments of ProxylessNasTrainer**
...
@@ -56,7 +56,7 @@ Implementation
The implementation on NNI is based on the `official implementation <https://github.com/mit-han-lab/ProxylessNAS>`__. The official implementation supports two training approaches, gradient descent and RL-based, and different targeted hardware, including 'mobile', 'cpu', 'gpu8', and 'flops'. Our current implementation on NNI supports the gradient descent training approach but does not yet support different hardware; complete support is ongoing.
-Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. For users to flexibly define their own search space and use built-in ProxylessNAS training approach, we put the specified search space in :githublink:`example code <examples/nas/proxylessnas>` using :githublink:`NNI NAS interface <nni/algorithms/nas/pytorch/proxylessnas>`.
+Below we describe the implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. So that users can flexibly define their own search space and use the built-in ProxylessNAS training approach, we put the specified search space in the :githublink:`example code <examples/nas/oneshot/proxylessnas>` using the :githublink:`NNI NAS interface <nni/algorithms/nas/pytorch/proxylessnas>`.
-DartsCell is extracted from :githublink:`CNN model <examples/nas/darts>`. A DartsCell is a directed acyclic graph containing an ordered sequence of N nodes and each node stands for a latent representation (e.g. feature map in a convolutional network). Directed edges from Node 1 to Node 2 are associated with some operations that transform Node 1 and the result is stored on Node 2. The `Candidate operators <#predefined-operations-darts>`__ between nodes is predefined and unchangeable. One edge represents an operation that chosen from the predefined ones to be applied to the starting node of the edge. One cell contains two input nodes, a single output node, and other ``n_node`` nodes. The input nodes are defined as the cell outputs in the previous two layers. The output of the cell is obtained by applying a reduction operation (e.g. concatenation) to all the intermediate nodes. To make the search space continuous, the categorical choice of a particular operation is relaxed to a softmax over all possible operations. By adjusting the weight of softmax on every node, the operation with the highest probability is chosen to be part of the final structure. A CNN model can be formed by stacking several cells together, which builds a search space. Note that, in DARTS paper all cells in the model share the same structure.
+DartsCell is extracted from the :githublink:`CNN model <examples/nas/oneshot/darts>`. A DartsCell is a directed acyclic graph containing an ordered sequence of N nodes, and each node stands for a latent representation (e.g. a feature map in a convolutional network). A directed edge from Node 1 to Node 2 is associated with some operation that transforms Node 1, and the result is stored on Node 2. The `Candidate operators <#predefined-operations-darts>`__ between nodes are predefined and unchangeable. One edge represents an operation chosen from the predefined ones to be applied to the starting node of the edge. One cell contains two input nodes, a single output node, and ``n_node`` other nodes. The input nodes are defined as the cell outputs of the previous two layers. The output of the cell is obtained by applying a reduction operation (e.g. concatenation) to all the intermediate nodes. To make the search space continuous, the categorical choice of a particular operation is relaxed to a softmax over all possible operations. By adjusting the softmax weights on every node, the operation with the highest probability is chosen to be part of the final structure. A CNN model can be formed by stacking several cells together, which builds the search space. Note that in the DARTS paper all cells in the model share the same structure.
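For intuition, the toy module below illustrates the softmax relaxation on a single edge: the edge output is a softmax-weighted sum over all candidate operations. It is a simplified sketch, not NNI's actual ``DartsCell`` implementation.

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixedEdge(nn.Module):
        """Toy DARTS relaxation: output = softmax-weighted sum of candidate ops."""
        def __init__(self, candidate_ops):
            super().__init__()
            self.ops = nn.ModuleList(candidate_ops)
            # one architecture weight (alpha) per candidate operation
            self.alpha = nn.Parameter(torch.zeros(len(candidate_ops)))

        def forward(self, x):
            weights = F.softmax(self.alpha, dim=-1)
            return sum(w * op(x) for w, op in zip(weights, self.ops))

After search, the operation with the largest ``alpha`` on each edge is kept in the final structure.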
One structure in the Darts search space is shown below. Note that NNI merges the last one of the four intermediate nodes and the output node.
...
@@ -82,7 +82,7 @@ All supported operators for Darts are listed below.
ENASMicroLayer
--------------
-This layer is extracted from the model designed :githublink:`here <examples/nas/enas>`. A model contains several blocks that share the same architecture. A block is made up of some normal layers and reduction layers, ``ENASMicroLayer`` is a unified implementation of the two types of layers. The only difference between the two layers is that reduction layers apply all operations with ``stride=2``.
+This layer is extracted from the model designed :githublink:`here <examples/nas/oneshot/enas>`. A model contains several blocks that share the same architecture. A block is made up of some normal layers and reduction layers; ``ENASMicroLayer`` is a unified implementation of the two types of layers. The only difference between the two is that reduction layers apply all operations with ``stride=2``.
ENAS Micro employs a DAG with N nodes in one cell, where the nodes represent local computations and the edges represent the flow of information between the N nodes. One cell contains two input nodes and a single output node. The following nodes choose two previous nodes as input, apply two operations from the `predefined ones <#predefined-operations-enas>`__, and then sum the results as the output of the node. For example, Node 4 chooses Node 1 and Node 3 as inputs, applies ``MaxPool`` and ``AvgPool`` to the inputs respectively, and then sums them as the output of Node 4. Nodes that do not serve as input to any other node are viewed as outputs of the layer. If there are multiple output nodes, the model calculates the average of these nodes as the layer output.
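As a rough illustration (not the actual ``ENASMicroLayer`` code), a single node of this kind could be sketched as follows once its two input nodes and two operations have been chosen.

.. code-block:: python

    import torch.nn as nn

    class ToyEnasNode(nn.Module):
        """Simplified ENAS-micro node: pick two previous nodes, apply one
        candidate operation to each, and sum the results."""
        def __init__(self, op1, op2, input_idx1, input_idx2):
            super().__init__()
            self.op1, self.op2 = op1, op2
            self.idx1, self.idx2 = input_idx1, input_idx2

        def forward(self, prev_nodes):
            # prev_nodes: list of tensors produced by earlier nodes in the cell
            return self.op1(prev_nodes[self.idx1]) + self.op2(prev_nodes[self.idx2])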
@@ -88,7 +88,7 @@ Use placehoder to make mutation easier: ``nn.Placeholder``. If you want to mutat
stride=stride
)
-``label`` is used by mutator to identify this placeholder. The other parameters are the information that are required by mutator. They can be accessed from ``node.operation.parameters`` as a dict, it could include any information that users want to put to pass it to user defined mutator. The complete example code can be found in :githublink:`Mnasnet base model <test/retiarii_test/mnasnet/base_mnasnet.py>`.
+``label`` is used by the mutator to identify this placeholder. The other parameters are the information required by the mutator. They can be accessed from ``node.operation.parameters`` as a dict and can include any information that users want to pass to their user-defined mutator. The complete example code can be found in the :githublink:`Mnasnet base model <examples/nas/multi-trial/mnasnet/base_mnasnet.py>`.
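As an illustration, a user-defined mutator might locate the placeholder by its label and read these parameters roughly as sketched below. The class name ``BlockMutator`` and the helpers used here (``get_nodes_by_label``, ``self.choice``) are assumptions for the sketch; refer to the Mnasnet example for the exact logic.

.. code-block:: python

    from nni.retiarii import Mutator

    class BlockMutator(Mutator):
        """Sketch of a mutator that finds a placeholder by label and uses
        the information stored in ``node.operation.parameters``."""
        def __init__(self, target_label):
            super().__init__()
            self.target_label = target_label

        def mutate(self, model):
            for node in model.get_nodes_by_label(self.target_label):
                params = node.operation.parameters   # dict set on the placeholder
                kernel_size = self.choice([3, 5, 7])  # sample one mutation choice
                # ... build a replacement operation from `params` and `kernel_size`
                # and update the graph (see the Mnasnet example for the full logic)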
Starting an experiment is almost the same as using inline mutation APIs. The only difference is that the applied mutators should be passed to ``RetiariiExperiment``. Below is a simple example.
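A rough sketch of such a setup is shown below; the strategy, configuration fields, and mutator labels are illustrative and may differ from the actual examples.

.. code-block:: python

    import nni.retiarii.strategy as strategy
    from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig

    applied_mutators = [BlockMutator('mutable_0'), BlockMutator('mutable_1')]  # labels are illustrative

    exp = RetiariiExperiment(base_model, trainer, applied_mutators, strategy.Random())
    exp_config = RetiariiExeConfig('local')
    exp_config.experiment_name = 'mnasnet_search'
    exp_config.trial_concurrency = 2
    exp_config.max_trial_number = 20
    exp.run(exp_config, 8081)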
@@ -63,7 +63,7 @@ Below is a very simple example of defining a base model, it is almost the same a
The above example also shows how to use ``@basic_unit``. ``@basic_unit`` is decorated on a user-defined module to tell Retiarii that there will be no mutation within this module, so Retiarii can treat it as a basic unit (i.e., as a blackbox). It is useful when (1) users want to mutate the initialization parameters of this module, or (2) Retiarii fails to parse this module due to complex control flow (e.g., ``for``, ``while``). A more detailed description of ``@basic_unit`` can be found `here <./Advanced.rst>`__.
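As a minimal sketch (the import path of ``basic_unit`` is assumed here), a decorated module could look like this:

.. code-block:: python

    import torch.nn as nn
    from nni.retiarii import basic_unit  # import path assumed

    @basic_unit
    class MySpecialConv(nn.Module):
        """Treated as a blackbox by Retiarii: its internals are not mutated,
        but its constructor arguments (e.g. ``kernel_size``) can be."""
        def __init__(self, in_ch, out_ch, kernel_size):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)

        def forward(self, x):
            return self.conv(x)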
-Users can refer to :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>` and :githublink:`Mnasnet base model <test/retiarii_test/mnasnet/base_mnasnet.py>` for more complicated examples.
+Users can refer to :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>` and :githublink:`Mnasnet base model <examples/nas/multi-trial/mnasnet/base_mnasnet.py>` for more complicated examples.
Define Model Mutations
^^^^^^^^^^^^^^^^^^^^^^
...
@@ -195,7 +195,7 @@ After all the above are prepared, it is time to start an experiment to do the mo
-The complete code of a simple MNIST example can be found :githublink:`here <test/retiarii_test/mnist/test.py>`.
+The complete code of a simple MNIST example can be found :githublink:`here <examples/nas/multi-trial/mnist/search.py>`.
**Local Debug Mode**: When running an experiment, it is easy to hit trivial errors in the trial code, such as a shape mismatch or an undefined variable. To quickly fix these kinds of errors, we provide a local debug mode which applies the mutators once locally and runs only that generated model. To use local debug mode, users can simply invoke the API ``debug_mutated_model(base_model, trainer, applied_mutators)``.
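A minimal sketch of using it is below; the import path is an assumption, so adjust it to wherever ``debug_mutated_model`` lives in your NNI version.

.. code-block:: python

    from nni.retiarii import debug_mutated_model  # import path assumed

    # Apply the mutators once locally and train the single generated model,
    # so that shape mismatches and similar errors surface immediately.
    debug_mutated_model(base_model, trainer, applied_mutators)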