Unverified commit 6aaca5f7, authored by QuanluZhang, committed by GitHub

refactor of nas examples (#3513)

parent 26207d15
@@ -34,7 +34,7 @@ This is CDARTS based on the NNI platform, which currently supports CIFAR10 search…

Examples
--------
-`Example code <https://github.com/microsoft/nni/tree/master/examples/nas/cdarts>`__
+`Example code <https://github.com/microsoft/nni/tree/master/examples/nas/legacy/cdarts>`__
.. code-block:: bash

@@ -47,7 +47,7 @@ Examples

   python setup.py install --cpp_ext --cuda_ext
   # search the best architecture
-   cd examples/nas/cdarts
+   cd examples/nas/legacy/cdarts
   bash run_search_cifar.sh
   # train the best architecture.
@@ -32,7 +32,7 @@ A file named ``nni_auto_gen_search_space.json`` is generated by this command.

Currently, we only support :githublink:`PPO Tuner <examples/tuners/random_nas_tuner>` for classic NAS. More classic NAS algorithms will be supported soon.
-The complete examples can be found :githublink:`here <examples/nas/classic_nas>` for PyTorch and :githublink:`here <examples/nas/classic_nas-tf>` for TensorFlow.
+The complete examples can be found :githublink:`here <examples/nas/legacy/classic_nas>` for PyTorch and :githublink:`here <examples/nas/legacy/classic_nas-tf>` for TensorFlow.
Standalone mode for easy debugging
----------------------------------
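The hunk header above mentions the command that generates ``nni_auto_gen_search_space.json``. A minimal sketch of that step, assuming it is ``nnictl ss_gen`` with the ``--trial_command`` and ``--trial_dir`` flags (check ``nnictl ss_gen --help`` for the exact interface of the installed NNI version; the trial script name is illustrative):

.. code-block:: bash

   # generate the search space file from the trial code
   # (flag names are assumptions; see `nnictl ss_gen --help` for the exact interface)
   nnictl ss_gen --trial_command="python3 mnist.py" --trial_dir=./
   # nni_auto_gen_search_space.json is then referenced as the search space file
   # in the experiment config, and the experiment is started as usual
   nnictl create --config config.yml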
@@ -53,7 +53,7 @@ Training with 16 GPUs is slightly better than with 8 GPUs, as shown below.

Examples
--------
-`Example code <https://github.com/microsoft/nni/tree/master/examples/nas/cream>`__
+`Example code <https://github.com/microsoft/nni/tree/master/examples/nas/legacy/cream>`__
Please run the following scripts in the example folder.
@@ -36,7 +36,7 @@ Examples

CNN Search Space
^^^^^^^^^^^^^^^^
-:githublink:`Example code <examples/nas/darts>`
+:githublink:`Example code <examples/nas/oneshot/darts>`
.. code-block:: bash

@@ -44,7 +44,7 @@ CNN Search Space

   git clone https://github.com/Microsoft/nni.git
   # search the best architecture
-   cd examples/nas/darts
+   cd examples/nas/oneshot/darts
   python3 search.py
   # train the best architecture
@@ -14,7 +14,7 @@ Examples

CIFAR10 Macro/Micro Search Space
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-:githublink:`Example code <examples/nas/enas>`
+:githublink:`Example code <examples/nas/oneshot/enas>`
.. code-block:: bash

@@ -22,7 +22,7 @@ CIFAR10 Macro/Micro Search Space

   git clone https://github.com/Microsoft/nni.git
   # search the best architecture
-   cd examples/nas/enas
+   cd examples/nas/oneshot/enas
   # search in macro search space
   python3 search.py --search-for macro
@@ -4,7 +4,7 @@ P-DARTS

Examples
--------
-:githublink:`Example code <examples/nas/pdarts>`
+:githublink:`Example code <examples/nas/legacy/pdarts>`
.. code-block:: bash

@@ -12,7 +12,7 @@ Examples

   git clone https://github.com/Microsoft/nni.git
   # search the best architecture
-   cd examples/nas/pdarts
+   cd examples/nas/legacy/pdarts
   python3 search.py
   # train the best architecture; the process is the same as for darts.
@@ -24,7 +24,7 @@ To use the ProxylessNAS training/searching approach, users need to specify the search space…

   trainer.train()
   trainer.export(args.arch_path)
-The complete example code can be found :githublink:`here <examples/nas/proxylessnas>`.
+The complete example code can be found :githublink:`here <examples/nas/oneshot/proxylessnas>`.
**Input arguments of ProxylessNasTrainer**

@@ -56,7 +56,7 @@ Implementation

The implementation on NNI is based on the `official implementation <https://github.com/mit-han-lab/ProxylessNAS>`__. The official implementation supports two training approaches, gradient descent and RL-based, and supports different target hardware, including 'mobile', 'cpu', 'gpu8', and 'flops'. Our current implementation on NNI supports the gradient descent training approach but does not yet support different hardware targets; complete support is ongoing.
-Below we describe the implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: a *search space* and a *training approach*. So that users can flexibly define their own search space and use the built-in ProxylessNAS training approach, we put the specified search space in the :githublink:`example code <examples/nas/proxylessnas>` using the :githublink:`NNI NAS interface <nni/algorithms/nas/pytorch/proxylessnas>`.
+Below we describe the implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: a *search space* and a *training approach*. So that users can flexibly define their own search space and use the built-in ProxylessNAS training approach, we put the specified search space in the :githublink:`example code <examples/nas/oneshot/proxylessnas>` using the :githublink:`NNI NAS interface <nni/algorithms/nas/pytorch/proxylessnas>`.
.. image:: ../../img/proxylessnas.png
   :target: ../../img/proxylessnas.png
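The two context lines kept above (``trainer.train()`` / ``trainer.export(...)``) are the tail of the search script. A hedged sketch of the surrounding setup follows; the constructor keyword names are assumptions, and the authoritative list is the "Input arguments of ProxylessNasTrainer" section kept as context above:

.. code-block:: python

   import torch
   from nni.algorithms.nas.pytorch.proxylessnas import ProxylessNasTrainer

   # `model`, `train_loader` and `valid_loader` are assumed to be built by the user,
   # as in the example under examples/nas/oneshot/proxylessnas
   device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
   optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)

   # keyword names below are illustrative assumptions, not the exact signature
   trainer = ProxylessNasTrainer(model,
                                 model_optim=optimizer,
                                 train_loader=train_loader,
                                 valid_loader=valid_loader,
                                 device=device)
   trainer.train()
   trainer.export('final_arch.json')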
@@ -13,7 +13,7 @@ Examples

Here is a use case: it reproduces the search space in the paper and shows how to use a FLOPs limit to perform uniform sampling.
-:githublink:`Example code <examples/nas/spos>`
+:githublink:`Example code <examples/nas/oneshot/spos>`
Requirements
^^^^^^^^^^^^
@@ -8,7 +8,7 @@ Search Space Zoo

DartsCell
---------
-DartsCell is extracted from the :githublink:`CNN model <examples/nas/darts>`. A DartsCell is a directed acyclic graph containing an ordered sequence of N nodes, where each node stands for a latent representation (e.g., a feature map in a convolutional network). A directed edge from Node 1 to Node 2 is associated with an operation that transforms Node 1, and the result is stored on Node 2. The `candidate operators <#predefined-operations-darts>`__ between nodes are predefined and unchangeable. One edge represents an operation chosen from the predefined ones and applied to the starting node of the edge. One cell contains two input nodes, a single output node, and ``n_node`` other nodes. The input nodes are defined as the cell outputs of the previous two layers. The output of the cell is obtained by applying a reduction operation (e.g., concatenation) to all the intermediate nodes. To make the search space continuous, the categorical choice of a particular operation is relaxed to a softmax over all possible operations. By adjusting the softmax weights on every node, the operation with the highest probability is chosen to be part of the final structure. A CNN model can be formed by stacking several cells together, which builds a search space. Note that in the DARTS paper, all cells in the model share the same structure.
+DartsCell is extracted from the :githublink:`CNN model <examples/nas/oneshot/darts>`. A DartsCell is a directed acyclic graph containing an ordered sequence of N nodes, where each node stands for a latent representation (e.g., a feature map in a convolutional network). A directed edge from Node 1 to Node 2 is associated with an operation that transforms Node 1, and the result is stored on Node 2. The `candidate operators <#predefined-operations-darts>`__ between nodes are predefined and unchangeable. One edge represents an operation chosen from the predefined ones and applied to the starting node of the edge. One cell contains two input nodes, a single output node, and ``n_node`` other nodes. The input nodes are defined as the cell outputs of the previous two layers. The output of the cell is obtained by applying a reduction operation (e.g., concatenation) to all the intermediate nodes. To make the search space continuous, the categorical choice of a particular operation is relaxed to a softmax over all possible operations. By adjusting the softmax weights on every node, the operation with the highest probability is chosen to be part of the final structure. A CNN model can be formed by stacking several cells together, which builds a search space. Note that in the DARTS paper, all cells in the model share the same structure.
One structure in the Darts search space is shown below. Note that NNI merges the last of the four intermediate nodes with the output node.
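The "softmax over all possible operations" mentioned above is the standard DARTS continuous relaxation: for an edge from node *i* to node *j*, with candidate operation set :math:`\mathcal{O}` and architecture weights :math:`\alpha^{(i,j)}`, the mixed output is

.. math::

   \bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}} \frac{\exp\left(\alpha_o^{(i,j)}\right)}{\sum_{o' \in \mathcal{O}} \exp\left(\alpha_{o'}^{(i,j)}\right)} \, o(x)

After the search, the operation with the largest weight on each edge is kept, which is the "operation with the highest probability" described above.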
@@ -82,7 +82,7 @@ All supported operators for Darts are listed below.

ENASMicroLayer
--------------
-This layer is extracted from the model designed :githublink:`here <examples/nas/enas>`. A model contains several blocks that share the same architecture. A block is made up of some normal layers and reduction layers; ``ENASMicroLayer`` is a unified implementation of the two types of layers. The only difference between the two is that reduction layers apply all operations with ``stride=2``.
+This layer is extracted from the model designed :githublink:`here <examples/nas/oneshot/enas>`. A model contains several blocks that share the same architecture. A block is made up of some normal layers and reduction layers; ``ENASMicroLayer`` is a unified implementation of the two types of layers. The only difference between the two is that reduction layers apply all operations with ``stride=2``.
ENAS Micro employs a DAG with N nodes in one cell, where the nodes represent local computations and the edges represent the flow of information between the N nodes. One cell contains two input nodes and a single output node. Each following node chooses two previous nodes as input, applies two operations from the `predefined ones <#predefined-operations-enas>`__, and adds the results as the output of this node. For example, Node 4 chooses Node 1 and Node 3 as inputs, applies ``MaxPool`` and ``AvgPool`` on them respectively, and sums the results as the output of Node 4. Nodes that are not used as input by any other node are viewed as outputs of the layer. If there are multiple output nodes, the model calculates the average of these nodes as the layer output.
@@ -57,7 +57,7 @@ Examples

Search Space
^^^^^^^^^^^^
-:githublink:`Example code <examples/nas/textnas>`
+:githublink:`Example code <examples/nas/legacy/textnas>`
.. code-block:: bash

@@ -65,7 +65,7 @@ Search Space

   git clone https://github.com/Microsoft/nni.git
   # search the best architecture
-   cd examples/nas/textnas
+   cd examples/nas/legacy/textnas
   # view more options for search
   python3 search.py -h

@@ -83,7 +83,7 @@ retrain

   git clone https://github.com/Microsoft/nni.git
   # search the best architecture
-   cd examples/nas/textnas
+   cd examples/nas/legacy/textnas
   # default to retrain on sst-2
   sh run_retrain.sh
@@ -88,7 +88,7 @@ Use a placeholder to make mutation easier: ``nn.Placeholder``. If you want to mutate…

      stride=stride
   )
-``label`` is used by the mutator to identify this placeholder. The other parameters are the information required by the mutator; they can be accessed from ``node.operation.parameters`` as a dict and can include any information that users want to pass to a user-defined mutator. The complete example code can be found in the :githublink:`Mnasnet base model <test/retiarii_test/mnasnet/base_mnasnet.py>`.
+``label`` is used by the mutator to identify this placeholder. The other parameters are the information required by the mutator; they can be accessed from ``node.operation.parameters`` as a dict and can include any information that users want to pass to a user-defined mutator. The complete example code can be found in the :githublink:`Mnasnet base model <examples/nas/multi-trial/mnasnet/base_mnasnet.py>`.
Starting an experiment is almost the same as using inline mutation APIs. The only difference is that the applied mutators should be passed to ``RetiariiExperiment``. Below is a simple example.
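Only the last two lines of the placeholder snippet survive as context above. Here is a hedged sketch of what a full ``nn.Placeholder`` declaration and its consumption by a mutator could look like, assuming ``nn`` refers to ``nni.retiarii.nn.pytorch``; the option names are illustrative, and the real ones live in the linked Mnasnet base model:

.. code-block:: python

   import nni.retiarii.nn.pytorch as nn

   # a placeholder for a block whose concrete structure is decided by a mutator;
   # every keyword except `label` is free-form information handed to the mutator
   # (the option names below are illustrative; see base_mnasnet.py for the real ones)
   ph = nn.Placeholder(
       label='mutable_0',
       kernel_size_options=[1, 3, 5],
       n_layer_options=[1, 2, 3, 4],
       exp_ratio=6,
       stride=2,
   )

   # inside a user-defined Mutator, the same information comes back as a dict:
   #   params = node.operation.parameters
   #   params['label'], params['kernel_size_options'], ...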
@@ -63,7 +63,7 @@ Below is a very simple example of defining a base model; it is almost the same as…

The above example also shows how to use ``@basic_unit``. ``@basic_unit`` is decorated on a user-defined module to tell Retiarii that there will be no mutation within this module, so Retiarii can treat it as a basic unit (i.e., as a black box). It is useful when (1) users want to mutate the initialization parameters of this module, or (2) Retiarii fails to parse this module due to complex control flow (e.g., ``for``, ``while``). A more detailed description of ``@basic_unit`` can be found `here <./Advanced.rst>`__.
-Users can refer to the :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>` and the :githublink:`Mnasnet base model <test/retiarii_test/mnasnet/base_mnasnet.py>` for more complicated examples.
+Users can refer to the :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>` and the :githublink:`Mnasnet base model <examples/nas/multi-trial/mnasnet/base_mnasnet.py>` for more complicated examples.
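A minimal sketch of the ``@basic_unit`` usage described in the paragraph above, assuming the decorator is exposed as ``nni.retiarii.basic_unit``; the module itself is illustrative:

.. code-block:: python

   import torch
   import torch.nn as torch_nn
   from nni.retiarii import basic_unit  # assumed import path for the decorator

   @basic_unit
   class MyOp(torch_nn.Module):
       """Treated as an opaque unit: Retiarii does not trace or mutate its internals."""

       def __init__(self, hidden_dim):
           super().__init__()
           self.linear = torch_nn.Linear(hidden_dim, hidden_dim)

       def forward(self, x):
           # complex control flow is fine here, because the module is a black box
           for _ in range(2):
               x = torch.relu(self.linear(x))
           return x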
Define Model Mutations
^^^^^^^^^^^^^^^^^^^^^^

@@ -195,7 +195,7 @@ After all the above are prepared, it is time to start an experiment to do the model…

   exp_config.training_service.use_active_gpu = False
   exp.run(exp_config, 8081)
-The complete code of a simple MNIST example can be found :githublink:`here <test/retiarii_test/mnist/test.py>`.
+The complete code of a simple MNIST example can be found :githublink:`here <examples/nas/multi-trial/mnist/search.py>`.
**Local Debug Mode**: When running an experiment, it is easy to hit trivial errors in trial code, such as a shape mismatch or an undefined variable. To quickly fix these kinds of errors, we provide a local debug mode, which locally applies the mutators once and runs only that generated model. To use local debug mode, users can simply invoke the API `debug_mutated_model(base_model, trainer, applied_mutators)`.
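The hunk above keeps only the last two lines of the experiment setup. A hedged sketch of the full setup follows, assuming ``RetiariiExperiment`` and ``RetiariiExeConfig`` come from ``nni.retiarii.experiment.pytorch`` and a local training service is used; ``base_model``, ``trainer``, ``applied_mutators`` and ``strategy`` are assumed to be defined as described earlier, and the concrete numbers are placeholders:

.. code-block:: python

   from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig

   # base_model, trainer (the trial evaluator), applied_mutators and strategy are
   # assumed to be defined as described in the preceding sections
   exp = RetiariiExperiment(base_model, trainer, applied_mutators, strategy)

   exp_config = RetiariiExeConfig('local')
   exp_config.experiment_name = 'mnist_search'   # placeholder name
   exp_config.trial_concurrency = 2              # placeholder values
   exp_config.max_trial_number = 20
   exp_config.training_service.use_active_gpu = False
   exp.run(exp_config, 8081)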