Unverified commit 4146c715, authored by QuanluZhang, committed by GitHub

[Retiarii] refactor of NAS doc and make python engine default (#3785)

Co-authored-by: Scarlett Li <39592018+scarlett2018@users.noreply.github.com>
Co-authored-by: kvartet <48014605+kvartet@users.noreply.github.com>
Parent commit: 0247be5e
Quick Start of Retiarii on NNI
==============================
*This is a pre-release; its interfaces may be subject to minor changes. The roadmap of this feature is: experimental in V2.0 -> alpha version in V2.1 -> beta version in V2.2 -> official release in V2.3. Feel free to give us your comments and suggestions.*
`Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__ is a new framework to support neural architecture search and hyper-parameter tuning. It allows users to express various search spaces with high flexibility, to reuse many SOTA search algorithms, and to leverage system-level optimizations to speed up the search process. This framework provides the following new user experiences.
* Search space can be expressed directly in user model code. A tuning space can be expressed while defining a model.
* Neural architecture candidates and hyper-parameter candidates are supported in a more friendly way within an experiment.
* The experiment can be launched directly from python code.
.. note:: `Our previous NAS framework <../Overview.rst>`__ is still supported for now, but will be migrated to the Retiarii framework in V2.3.
.. contents::
In this quick start tutorial, we use multi-trial NAS as an example to show how to construct and explore a model space. There are mainly three crucial components for a neural architecture search task, namely,
* Model search space that defines the set of models to explore.
* A proper strategy as the method to explore this search space.
* A model evaluator that reports the performance of a given model.
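Before diving into the APIs, it may help to see the division of labor among these three components in plain Python. Below is a framework-free sketch of the multi-trial NAS loop; all names, the candidate configurations, and the toy scoring function are hypothetical, not NNI APIs.

```python
import random

# Toy illustration (not NNI API): the three components of multi-trial NAS.

# 1. Model space: here just a list of candidate configurations.
model_space = [
    {"kernel_size": k, "hidden_dim": h}
    for k in (3, 5)
    for h in (32, 64, 128)
]

# 2. Exploration strategy: random sampling without replacement.
def random_strategy(space, budget):
    return random.sample(space, k=min(budget, len(space)))

# 3. Model evaluator: returns a (made-up) score for a configuration.
def evaluate(config):
    return 1.0 / config["kernel_size"] + config["hidden_dim"] / 1000

# The NAS loop: explore the space and keep the best model found so far.
random.seed(0)
trials = [(evaluate(c), c) for c in random_strategy(model_space, budget=4)]
best_score, best_config = max(trials, key=lambda t: t[0])
print(best_config)
```

In a real experiment, NNI runs the evaluator in separate trial processes and feeds the metrics back to the strategy; the loop structure is the same.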
One-shot NAS tutorial can be found `here <./OneshotTrainer.rst>`__.
.. note:: Currently, PyTorch is the only framework supported by Retiarii, and we have only tested it with **PyTorch 1.6 and 1.7**. This documentation assumes a PyTorch context, but support for other frameworks is on our roadmap.
Define your Model Space
-----------------------

Below is a very simple example of defining a base model. It is almost the same as defining a PyTorch (or TensorFlow) model.
import torch.nn.functional as F
import nni.retiarii.nn.pytorch as nn
from nni.retiarii import model_wrapper
@basic_unit
class BasicBlock(nn.Module):
    def __init__(self, const):
        super().__init__()
        self.const = const
    def forward(self, x):
        return self.pool(self.conv(x))
@model_wrapper  # this decorator should be put on the outermost PyTorch module
class Model(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, x):
        return F.relu(self.convpool(self.mymodule(x)))
The above example also shows how to use ``@basic_unit``. ``@basic_unit`` is decorated on a user-defined module to tell Retiarii that there will be no mutation within this module, so Retiarii can treat it as a basic unit (i.e., as a blackbox). It is useful when (1) users want to mutate the initialization parameters of this module, or (2) Retiarii fails to parse this module due to complex control flow (e.g., ``for``, ``while``). A more detailed description of ``@basic_unit`` can be found `here <./Advanced.rst>`__.
Users can refer to :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>` and :githublink:`Mnasnet base model <examples/nas/multi-trial/mnasnet/base_mnasnet.py>` for more complicated examples.
Define Model Mutations
^^^^^^^^^^^^^^^^^^^^^^
A base model is only one concrete model, not a model space. We provide APIs and primitives for users to express how the base model can be mutated into a model space that contains many models.
We provide some APIs as shown below for users to easily express possible mutations after defining a base model. The APIs can be used just like PyTorch module. This approach is also called inline mutations.
* ``nn.LayerChoice``. It allows users to put several candidate operations (e.g., PyTorch modules); one of them is chosen in each explored model. Note that if a candidate is a user-defined module, it should be decorated as a `basic unit <./Advanced.rst>`__ with ``@basic_unit``.
All the APIs have an optional argument called ``label``\ ; mutations with the same label will share the same choice. A typical example is,

.. code-block:: python
nn.Linear(nn.ValueChoice([32, 64, 128], label='hidden_dim'), 3)
)
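To illustrate what label sharing means, here is a toy, framework-free sketch (``ValueChoiceSketch`` and the sampler below are hypothetical stand-ins, not the NNI implementation): two choices carrying the same label always resolve to the same sampled value.

```python
import random

# Toy sketch (not the NNI implementation): choices with the same label
# must resolve to the same sampled value.
class ValueChoiceSketch:
    def __init__(self, candidates, label):
        self.candidates = candidates
        self.label = label

def sample(choices, rng):
    # One decision per label: same label -> same chosen value.
    decisions = {}
    resolved = []
    for c in choices:
        if c.label not in decisions:
            decisions[c.label] = rng.choice(c.candidates)
        resolved.append(decisions[c.label])
    return resolved

rng = random.Random(0)
hidden_out = ValueChoiceSketch([32, 64, 128], label='hidden_dim')
hidden_in = ValueChoiceSketch([32, 64, 128], label='hidden_dim')
out_dim, in_dim = sample([hidden_out, hidden_in], rng)
assert out_dim == in_dim  # shared label, shared decision
```

This is why a layer's output dimension and the next layer's input dimension can be tied together by giving both value choices the label ``'hidden_dim'``.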
Detailed API description and usage can be found `here <./ApiReference.rst>`__\. Example of using these APIs can be found in :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>`. We are actively enriching the set of inline mutation APIs, to make it easier to express a new search space. Please refer to `here <./construct_space.rst>`__ for more tutorials about how to express complex model spaces.
Explore the Defined Model Space
-------------------------------
There are basically two exploration approaches: (1) search by evaluating each sampled model independently and (2) one-shot weight-sharing based search. We demonstrate the first approach below in this tutorial. Users can refer to `here <./OneshotTrainer.rst>`__ for the second approach.
Users can choose a proper exploration strategy to explore the model space, and use a chosen or user-defined model evaluator to evaluate the performance of each sampled model.
Pick an exploration strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Retiarii supports many `exploration strategies <./ExplorationStrategies.rst>`__.
Choosing (i.e., instantiating) an exploration strategy is simple, for example,
.. code-block:: python
   import nni.retiarii.strategy as strategy

   search_strategy = strategy.Random(dedup=True)  # dedup=False if deduplication is not wanted
Detailed descriptions and usages of available strategies can be found `here <./ApiReference.rst>`__.
Pick or write a model evaluator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In the NAS process, the exploration strategy repeatedly generates new models. A model evaluator is for training and validating each generated model. The obtained performance of a generated model is collected and sent to the exploration strategy for generating better models.

The model evaluator should correctly reflect the use case of the model and the optimization goal. For example, on a classification task, an <input, label> dataset is needed, the loss function could be cross entropy, and the optimized metric could be accuracy. On a regression task, the optimized metric could be mean squared error.
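To make the evaluator's role concrete, here is a toy, framework-free sketch (the dataset, the candidate "models", and the ``accuracy`` helper are hypothetical, not the NNI evaluator API): an evaluator simply maps a candidate model to a metric that the strategy can consume.

```python
# Toy sketch (not the NNI evaluator API): a classification "evaluator"
# that scores a candidate model by accuracy on a held-out set.
def accuracy(predict, dataset):
    correct = sum(1 for x, label in dataset if predict(x) == label)
    return correct / len(dataset)

# Hypothetical dataset of <input, label> pairs and two candidate "models".
dataset = [(0, 0), (1, 1), (2, 0), (3, 1)]
even_odd_model = lambda x: x % 2
always_zero_model = lambda x: 0

# The strategy would receive these metrics and use them to pick what to
# sample next.
scores = {
    "even_odd": accuracy(even_odd_model, dataset),
    "always_zero": accuracy(always_zero_model, dataset),
}
print(scores)
```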
In the context of PyTorch, Retiarii has provided two built-in model evaluators, designed for simple use cases: classification and regression. These two evaluators are built upon the awesome library PyTorch-Lightning.
......@@ -172,7 +152,7 @@ As the model evaluator is running in another process (possibly in some remote ma
Detailed descriptions and usages of model evaluators can be found `here <./ApiReference.rst>`__ .
If the built-in model evaluators do not meet your requirement, or you already wrote the training code and just want to use it, you can follow `the guide to write a new model evaluator <./WriteTrainer.rst>`__ .
.. note:: In case you want to run the model evaluator locally for debugging purposes, you can directly run the evaluator via ``evaluator._execute(Net)`` (note that it has to be ``Net``, not ``Net()``). However, this API is currently internal and subject to change.
After all the above are prepared, it is time to start an experiment to do the model search. An example is shown below.

.. code-block:: python
   exp_config.training_service.use_active_gpu = False
   exp.run(exp_config, 8081)
The complete code of a simple MNIST example can be found :githublink:`here <examples/nas/multi-trial/mnist/search.py>`. Users can also run Retiarii Experiment on `different training services <../training_services.rst>`__ besides ``local`` training service.
Visualize the Experiment
------------------------
Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment. For example, open ``localhost:8081`` in your browser, where 8081 is the port that you set in ``exp.run``. Please refer to `here <../../Tutorial/WebUI.rst>`__ for details.
Export Top Models
-----------------
Users can export top models after the exploration is done using ``export_top_models``.
.. code-block:: python
   for model_code in exp.export_top_models(formatter='dict'):
       print(model_code)
Reference
---------
PyTorch
^^^^^^^
.. autoclass:: nni.retiarii.oneshot.pytorch.SinglePathTrainer
:noindex:
Known Limitations
-----------------
.. role:: raw-html(raw)
:format: html
Search Space Zoo
================
DartsCell
---------
DartsCell is extracted from :githublink:`CNN model <examples/nas/oneshot/darts>`. A DartsCell is a directed acyclic graph containing an ordered sequence of N nodes, where each node stands for a latent representation (e.g., a feature map in a convolutional network). Directed edges from Node 1 to Node 2 are associated with some operations that transform Node 1, and the result is stored in Node 2. The `candidate operators <#predefined-operations-darts>`__ between nodes are predefined and unchangeable. One edge represents an operation chosen from the predefined ones to be applied to the starting node of the edge. One cell contains two input nodes, a single output node, and ``n_node`` other nodes. The input nodes are defined as the cell outputs of the previous two layers. The output of the cell is obtained by applying a reduction operation (e.g., concatenation) to all the intermediate nodes. To make the search space continuous, the categorical choice of a particular operation is relaxed to a softmax over all possible operations. By adjusting the softmax weights on every node, the operation with the highest probability is chosen to be part of the final structure. A CNN model can be formed by stacking several cells together, which builds a search space. Note that in the DARTS paper all cells in the model share the same structure.
One structure in the DARTS search space is shown below. Note that NNI merges the last one of the four intermediate nodes and the output node.
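The continuous relaxation described above can be sketched numerically; the operation names and architecture weights below are illustrative values, not taken from a real search.

```python
import math

# Numerical sketch of the DARTS relaxation: each edge holds one
# architecture weight per candidate operation, and the categorical
# choice is relaxed to a softmax over those weights.
ops = ["max_pool_3x3", "sep_conv_3x3", "skip_connect"]
alpha = [0.1, 1.2, -0.3]  # hypothetical learned architecture weights

exp = [math.exp(a) for a in alpha]
total = sum(exp)
probs = [e / total for e in exp]

# During search, the edge output is the probs-weighted mixture of the
# op outputs; after search, the op with the highest probability is kept.
chosen = ops[probs.index(max(probs))]
print(chosen)  # sep_conv_3x3, since alpha=1.2 dominates
```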
.. image:: ../../img/NAS_Darts_cell.svg
:target: ../../img/NAS_Darts_cell.svg
:alt:
The predefined operators are shown `here <#predefined-operations-darts>`__.
.. autoclass:: nni.nas.pytorch.search_space_zoo.DartsCell
:members:
Example code
^^^^^^^^^^^^
:githublink:`example code <examples/nas/search_space_zoo/darts_example.py>`
.. code-block:: bash
git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
# search the best structure
python3 darts_example.py
:raw-html:`<a name="predefined-operations-darts"></a>`
Candidate operators
^^^^^^^^^^^^^^^^^^^
All supported operators for Darts are listed below.
*
  MaxPool / AvgPool

  * MaxPool: Call ``torch.nn.MaxPool2d``. This operation applies a 2D max pooling over all input channels. Its parameters ``kernel_size=3`` and ``padding=1`` are fixed. The pooling result passes through a BatchNorm2d and is then returned.
  * AvgPool: Call ``torch.nn.AvgPool2d``. This operation applies a 2D average pooling over all input channels. Its parameters ``kernel_size=3`` and ``padding=1`` are fixed. The pooling result passes through a BatchNorm2d and is then returned.

  MaxPool / AvgPool with ``kernel_size=3`` and ``padding=1`` followed by BatchNorm2d:

  .. autoclass:: nni.nas.pytorch.search_space_zoo.darts_ops.PoolBN

*
  SkipConnect

  There is no operation between two nodes. Call ``torch.nn.Identity`` to forward what it gets to the output.

*
  Zero operation

  There is no connection between two nodes.

*
  DilConv3x3 / DilConv5x5

  :raw-html:`<a name="DilConv"></a>`\ DilConv3x3: (dilated) depthwise separable conv. It is a 3x3 depthwise convolution with ``C_in`` groups, followed by a 1x1 pointwise convolution. It reduces the number of parameters. Input is first passed through ReLU, then DilConv, and finally BatchNorm2d. **Note that the operation is not a dilated convolution, but we follow the convention in NAS papers to name it DilConv.** The 3x3 DilConv has parameters ``kernel_size=3`` and ``padding=1``\ , and the 5x5 DilConv has parameters ``kernel_size=5`` and ``padding=4``.

  .. autoclass:: nni.nas.pytorch.search_space_zoo.darts_ops.DilConv

*
  SepConv3x3 / SepConv5x5

  Composed of two DilConvs applied sequentially, with fixed ``kernel_size=3``\ , ``padding=1`` or ``kernel_size=5``\ , ``padding=2``.

  .. autoclass:: nni.nas.pytorch.search_space_zoo.darts_ops.SepConv
ENASMicroLayer
--------------
This layer is extracted from the model designed :githublink:`here <examples/nas/oneshot/enas>`. A model contains several blocks that share the same architecture. A block is made up of some normal layers and reduction layers; ``ENASMicroLayer`` is a unified implementation of the two types of layers. The only difference between the two is that reduction layers apply all operations with ``stride=2``.

ENAS Micro employs a DAG with N nodes in one cell, where the nodes represent local computations and the edges represent the flow of information between the N nodes. One cell contains two input nodes and a single output node. Each following node chooses two previous nodes as input and applies two operations from the `predefined ones <#predefined-operations-enas>`__\ , then sums the results as the output of this node. For example, Node 4 chooses Node 1 and Node 3 as inputs, applies ``MaxPool`` and ``AvgPool`` on them respectively, and then sums the results as the output of Node 4. Nodes that do not serve as input for any other node are viewed as outputs of the layer. If there are multiple output nodes, the model calculates the average of these nodes as the layer output.
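The node computation described above can be sketched on scalars (real nodes hold tensors, and the controller makes the choices); all the choices below are hypothetical.

```python
# Toy sketch of one ENAS-micro cell on scalars.
# Each node picks two previous nodes, applies one op to each, and sums.
ops = {"identity": lambda v: v, "double": lambda v: 2 * v}

node1, node2 = 1.0, 3.0  # the two input nodes
# (input node, op) pairs chosen by the controller -- hypothetical choices.
node3 = ops["identity"](node1) + ops["double"](node2)   # 1 + 6 = 7
node4 = ops["double"](node1) + ops["identity"](node3)   # 2 + 7 = 9

# Node 3 feeds Node 4, so only Node 4 is "loose"; loose nodes are
# averaged to form the layer output.
loose_nodes = [node4]
output = sum(loose_nodes) / len(loose_nodes)
print(output)  # 9.0
```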
The ENAS micro search space is shown below.
.. image:: ../../img/NAS_ENAS_micro.svg
:target: ../../img/NAS_ENAS_micro.svg
:alt:
The predefined operators can be seen `here <#predefined-operations-enas>`__.
.. autoclass:: nni.nas.pytorch.search_space_zoo.ENASMicroLayer
:members:
The reduction layer is made up of two conv operations followed by BatchNorm. Each of them outputs ``C_out//2`` channels, and the two outputs are concatenated along the channel dimension. The convolutions have ``kernel_size=1`` and ``stride=2``\ , and they perform alternate sampling on the input to reduce the resolution without loss of information. This layer is wrapped in ``ENASMicroLayer``.
Example code
^^^^^^^^^^^^
:githublink:`example code <examples/nas/search_space_zoo/enas_micro_example.py>`
.. code-block:: bash
git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
# search the best cell structure
python3 enas_micro_example.py
:raw-html:`<a name="predefined-operations-enas"></a>`
Candidate operators
^^^^^^^^^^^^^^^^^^^
All supported operators for ENAS micro search are listed below.
*
  MaxPool / AvgPool

  * MaxPool: Call ``torch.nn.MaxPool2d``. This operation applies a 2D max pooling over all input channels followed by BatchNorm2d. Its parameters are fixed to ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.
  * AvgPool: Call ``torch.nn.AvgPool2d``. This operation applies a 2D average pooling over all input channels followed by BatchNorm2d. Its parameters are fixed to ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.

  .. autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.Pool

*
  SepConv

  * SepConvBN3x3: ReLU followed by a `DilConv <#DilConv>`__ and BatchNorm. Convolution parameters are ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.
  * SepConvBN5x5: The same operation as the previous one, but with kernel size 5 and padding 2.

  .. autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.SepConvBN

*
  SkipConnect

  Call ``torch.nn.Identity`` to connect directly to the next cell.
ENASMacroLayer
--------------
In macro search, the controller makes two decisions for each layer: i) the `operation <#macro-operations>`__ to perform on the result of the previous layer, and ii) which previous layer to connect to for skip connections. ENAS uses a controller to design the whole model architecture instead of one of its components. The output of an operation is concatenated with the tensor of the layer chosen for SkipConnect. NNI provides `predefined operators <#macro-operations>`__ for macro search, which are listed in `Candidate operators <#macro-operations>`__.
Part of one structure in the ENAS macro search space is shown below.
.. image:: ../../img/NAS_ENAS_macro.svg
:target: ../../img/NAS_ENAS_macro.svg
:alt:
.. autoclass:: nni.nas.pytorch.search_space_zoo.ENASMacroLayer
:members:
To describe the whole search space, NNI provides a model, which is built by stacking the layers.
.. autoclass:: nni.nas.pytorch.search_space_zoo.ENASMacroGeneralModel
:members:
Example code
^^^^^^^^^^^^
:githublink:`example code <examples/nas/search_space_zoo/enas_macro_example.py>`
.. code-block:: bash
git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
# search the best cell structure
python3 enas_macro_example.py
:raw-html:`<a name="macro-operations"></a>`
Candidate operators
^^^^^^^^^^^^^^^^^^^
All supported operators for ENAS macro search are listed below.
*
  ConvBranch

  All input first passes into a StdConv, which is made up of a 1x1 conv followed by BatchNorm2d and ReLU. Then the intermediate result goes through one of the operations listed below. The final result is calculated through a BatchNorm2d and ReLU as post-processing.

  * Separable Conv3x3: If ``separable=True``\ , the cell uses a `SepConv <#DilConv>`__ instead of a normal conv operation. SepConv's parameters are ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.
  * Separable Conv5x5: SepConv's parameters are ``kernel_size=5``\ , ``stride=1`` and ``padding=2``.
  * Normal Conv3x3: If ``separable=False``\ , the cell uses a normal conv operation with ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.
  * Normal Conv5x5: Conv's parameters are ``kernel_size=5``\ , ``stride=1`` and ``padding=2``.

  .. autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.ConvBranch

*
  PoolBranch

  All input first passes into a StdConv, which is made up of a 1x1 conv followed by BatchNorm2d and ReLU. Then the intermediate result goes through a pooling operation followed by BatchNorm.

  * AvgPool: Call ``torch.nn.AvgPool2d``. This operation applies a 2D average pooling over all input channels. Its parameters are fixed to ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.
  * MaxPool: Call ``torch.nn.MaxPool2d``. This operation applies a 2D max pooling over all input channels. Its parameters are fixed to ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.

  .. autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.PoolBranch
NAS-Bench-201
-------------
NAS-Bench-201 defines a unified search space, which is algorithm agnostic. The predefined skeleton consists of a stack of cells that share the same architecture. Every cell contains four nodes, and a DAG is formed by connecting edges among them, where a node represents the sum of feature maps and an edge stands for an operation transforming a tensor from the source node to the target node. The predefined candidate operators can be found in `Candidate operators <#nas-bench-201-reference>`__.
The search space of NAS-Bench-201 is shown below.
.. image:: ../../img/NAS_Bench_201.svg
:target: ../../img/NAS_Bench_201.svg
:alt:
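A quick consistency check on the size of this space: with four nodes there are six ordered node pairs (edges), and each edge independently picks one of the five candidate operations.

```python
# Back-of-the-envelope size of the NAS-Bench-201 cell space: a cell has
# 4 nodes, hence 6 node pairs (edges) in the DAG, and each edge picks
# one of the 5 candidate operations independently.
n_nodes = 4
n_edges = n_nodes * (n_nodes - 1) // 2  # 6 edges
n_ops = 5
n_architectures = n_ops ** n_edges
print(n_architectures)  # 15625 candidate cells
```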
.. autoclass:: nni.nas.pytorch.nasbench201.NASBench201Cell
:members:
Example code
^^^^^^^^^^^^
:githublink:`example code <examples/nas/search_space_zoo/nas_bench_201.py>`
.. code-block:: bash
git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
# for structure searching
python3 nas_bench_201.py
:raw-html:`<a name="nas-bench-201-reference"></a>`
Candidate operators
^^^^^^^^^^^^^^^^^^^
All supported operators for NAS Bench 201 are listed below.
*
  AvgPool

  If the number of input channels is not equal to the number of output channels, the input first passes through a ``ReLUConvBN`` layer with ``kernel_size=1``\ , ``stride=1``\ , ``padding=0``\ , and ``dilation=1``.

  Call ``torch.nn.AvgPool2d``. This operation applies a 2D average pooling over all input channels followed by BatchNorm2d. Its parameters are fixed to ``kernel_size=3`` and ``padding=1``.

  .. autoclass:: nni.nas.pytorch.nasbench201.nasbench201_ops.Pooling
     :members:

*
  Conv

  * Conv1x1: Consists of a sequence of ReLU, ``nn.Conv2d`` and BatchNorm. The conv operation's parameters are fixed to ``kernel_size=1``\ , ``padding=0``\ , and ``dilation=1``.
  * Conv3x3: Consists of a sequence of ReLU, ``nn.Conv2d`` and BatchNorm. The conv operation's parameters are fixed to ``kernel_size=3``\ , ``padding=1``\ , and ``dilation=1``.

  .. autoclass:: nni.nas.pytorch.nasbench201.nasbench201_ops.ReLUConvBN
     :members:

*
  SkipConnect

  Call ``torch.nn.Identity`` to connect directly to the next cell.

*
  Zeroize

  Generate zero tensors indicating there is no connection from the source node to the target node.

  .. autoclass:: nni.nas.pytorch.nasbench201.nasbench201_ops.Zero
     :members:
TextNAS
=======
Introduction
------------
This is the implementation of the TextNAS algorithm proposed in the paper `TextNAS: A Neural Architecture Search Space tailored for Text Representation <https://arxiv.org/pdf/1912.10729.pdf>`__. TextNAS is a neural architecture search algorithm tailored for text representation. More specifically, TextNAS is based on a novel search space consisting of operators widely adopted to solve various NLP tasks, and it also supports multi-path ensemble within a single network to balance the width and depth of the architecture.
The search space of TextNAS contains:
* 1-D convolutional operator with filter size 1, 3, 5, 7
* recurrent operator (bi-directional GRU)
* self-attention operator
* pooling operator (max/average)
Following the ENAS algorithm, TextNAS also utilizes parameter sharing to accelerate the search speed and adopts a reinforcement-learning controller for the architecture sampling and generation. Please refer to the paper for more details of TextNAS.
Preparation
-----------
Prepare the word vectors and the SST dataset, and organize them in the data directory as shown below:
.. code-block:: bash
textnas
├── data
│ ├── sst
│ │ └── trees
│ │ ├── dev.txt
│ │ ├── test.txt
│ │ └── train.txt
│ └── glove.840B.300d.txt
├── dataloader.py
├── model.py
├── ops.py
├── README.md
├── search.py
└── utils.py
The following link might be helpful for finding and downloading the corresponding dataset:
* `GloVe: Global Vectors for Word Representation <https://nlp.stanford.edu/projects/glove/>`__
* `glove.840B.300d.txt <http://nlp.stanford.edu/data/glove.840B.300d.zip>`__
* `Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank <https://nlp.stanford.edu/sentiment/>`__
* `trainDevTestTrees_PTB.zip <https://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip>`__
Examples
--------
Search Space
^^^^^^^^^^^^
:githublink:`Example code <examples/nas/legacy/textnas>`
.. code-block:: bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git
# search the best architecture
cd examples/nas/legacy/textnas
# view more options for search
python3 search.py -h
After each search epoch, 10 sampled architectures will be tested directly. Their performances are expected to be 40% - 42% after 10 epochs.
By default, 20 sampled architectures will be exported into ``checkpoints`` directory for next step.
Retrain
^^^^^^^
.. code-block:: bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git
# enter the code folder
cd examples/nas/legacy/textnas
# default to retrain on sst-2
sh run_retrain.sh
Reference
---------
TextNAS directly uses EnasTrainer, please refer to `ENAS <./ENAS.rst>`__ for the trainer APIs.
NAS Visualization (Experimental)
================================
Built-in Trainers Support
-------------------------
Currently, only ENAS and DARTS support visualization. Examples of `ENAS <./ENAS.rst>`__ and `DARTS <./DARTS.rst>`__ have demonstrated how to enable visualization in your code, namely, adding this before ``trainer.train()``\ :
.. code-block:: python
trainer.enable_visualization()
This will create a directory ``logs/<current_time_stamp>`` in your working folder, in which you will find two files ``graph.json`` and ``log``.
You don't have to wait until your program finishes to launch NAS UI, but it's important that these two files have already been created. Launch NAS UI with
.. code-block:: bash
nnictl webui nas --logdir logs/<current_time_stamp> --port <port>
Visualize a Customized Trainer
------------------------------
If you are interested in how to customize a trainer, please read this `doc <./Advanced.rst#extend-the-ability-of-one-shot-trainers>`__.
You need to make two modifications to an existing trainer to enable visualization:
#. Export your graph before training, with
.. code-block:: python
vis_graph = self.mutator.graph(inputs)
# `inputs` is a dummy input to your model. For example, torch.randn((1, 3, 32, 32)).cuda()
# If your model has multiple inputs, it should be a tuple.
with open("/path/to/your/logdir/graph.json", "w") as f:
json.dump(vis_graph, f)
#. Log the choices you've made. You can do it once per epoch, once per mini-batch, or at whatever frequency you'd like.
.. code-block:: python
def __init__(self):
# ...
self.status_writer = open("/path/to/your/logdir/log", "w") # create a writer
def train(self):
# ...
print(json.dumps(self.mutator.status()), file=self.status_writer, flush=True) # dump a record of status
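The ``log`` file written above is simply newline-delimited JSON, one ``mutator.status()`` record per line; a minimal reader might look like this (the record shape shown is hypothetical).

```python
import io
import json

# The status log is newline-delimited JSON; one record per dump.
# Simulate a log file with two (hypothetical) status snapshots.
log = io.StringIO(
    '{"LayerChoice1": [0.2, 0.8]}\n'
    '{"LayerChoice1": [0.6, 0.4]}\n'
)
records = [json.loads(line) for line in log if line.strip()]
print(len(records))  # 2 status snapshots
```

NAS UI tails this file in the same spirit, which is why flushing after each ``print`` matters.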
If you are implementing a customized trainer inheriting ``Trainer``, we have provided ``enable_visualization()`` and ``_write_graph_status()`` for ease of use. All you need to do is call ``trainer.enable_visualization()`` before training starts, and ``trainer._write_graph_status()`` each time you want to log. Remember that both of these APIs are experimental and subject to change in the future.
Last but not least, invoke NAS UI with
.. code-block:: bash
nnictl webui nas --logdir /path/to/your/logdir
NAS UI Preview
--------------
.. image:: ../../img/nasui-1.png
:target: ../../img/nasui-1.png
:alt:
.. image:: ../../img/nasui-2.png
:target: ../../img/nasui-2.png
:alt:
Limitations
-----------
* NAS visualization only works with PyTorch >=1.4. We've tested it on PyTorch 1.3.1 and it doesn't work.
* We rely on PyTorch support for tensorboard for graph export, which relies on ``torch.jit``. It will not work if your model doesn't support ``jit``.
* There are known performance issues when loading a moderate-size graph with many op choices (like DARTS search space).
Feedback
--------
NAS UI is currently experimental. We welcome your feedback. `Here <https://github.com/microsoft/nni/pull/2085>`__ we have listed all the to-do items of NAS UI in the future. Feel free to comment (or `submit a new issue <https://github.com/microsoft/nni/issues/new?template=enhancement.rst>`__\ ) if you have other suggestions.
One-shot Experiments on Retiarii
================================
Before reading this tutorial, we highly recommend you to first go through the tutorial of how to `define a model space <./Tutorial.rst#define-your-model-space>`__.
Model Search with One-shot Trainer
----------------------------------
With a defined model space, users can explore the space in two ways. One is using a strategy and a single-arch evaluator as demonstrated `here <./Tutorial.rst#explore-the-defined-model-space>`__. The other is using a one-shot trainer, which consumes much less computational resource than the first one. In this tutorial we focus on the one-shot approach. The principle of the one-shot approach is to combine all the models in a model space into one big model (usually called super-model or super-graph), and then handle search, training, and testing jointly by training and evaluating this one big model.
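The weight-sharing idea behind the super-model can be sketched in a few lines of plain Python (all module names and "weights" below are toy stand-ins, not the actual trainer machinery): every candidate architecture indexes into one shared parameter store, so updating a parameter through one sampled path immediately affects every other path that reuses it.

```python
# Toy sketch of weight sharing in a super-model.
shared_weights = {"conv3x3": 1.0, "conv5x5": 2.0, "fc": 0.5}

def path_output(arch, x):
    # An "architecture" is just the list of shared modules it uses.
    for name in arch:
        x = x * shared_weights[name]
    return x

arch_a = ["conv3x3", "fc"]
arch_b = ["conv5x5", "fc"]

# An update to the shared "fc" weight made while training arch_a is
# immediately visible to arch_b.
shared_weights["fc"] = 0.8
assert path_output(arch_a, 1.0) == 1.0 * 1.0 * 0.8
assert path_output(arch_b, 1.0) == 1.0 * 2.0 * 0.8
```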
We list the supported one-shot trainers here:
* DARTS trainer
* ENAS trainer
* ProxylessNAS trainer
* Single-path (random) trainer
See `API reference <./ApiReference.rst>`__ for detailed usages. Here, we show an example to use DARTS trainer manually.
.. code-block:: python
from nni.retiarii.oneshot.pytorch import DartsTrainer
trainer = DartsTrainer(
model=model,
loss=criterion,
metrics=lambda output, target: accuracy(output, target, topk=(1,)),
optimizer=optim,
num_epochs=args.epochs,
dataset=dataset_train,
batch_size=args.batch_size,
log_frequency=args.log_frequency,
unrolled=args.unrolled
)
trainer.fit()
final_architecture = trainer.export()
**Format of the exported architecture.** TBD.
One-shot experiment can be visualized with NAS UI, please refer to `here <../Visualization.rst>`__ for the usage guidance. Note that NAS visualization is under intensive development.
Customize a New One-shot Trainer
--------------------------------
One-shot trainers should inherit ``nni.retiarii.oneshot.BaseOneShotTrainer``, and need to implement ``fit()`` (used to conduct the fitting and searching process) and ``export()`` method (used to return the searched best architecture).
......
Write A Search Space
====================
Generally, a search space describes candidate architectures from which users want to find the best one. Different search algorithms, whether classic NAS or one-shot NAS, can be applied on the search space. NNI provides APIs to unify the expression of neural architecture search spaces.
A search space can be built on a base model. This is also a common practice when a user wants to apply NAS on an existing model. Take `MNIST on PyTorch <https://github.com/pytorch/examples/blob/master/mnist/main.py>`__ as an example. Note that NNI provides the same APIs for expressing search space on PyTorch and TensorFlow.
.. code-block:: python
from nni.nas.pytorch import mutables
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = mutables.LayerChoice([
nn.Conv2d(1, 32, 3, 1),
nn.Conv2d(1, 32, 5, 3)
]) # try 3x3 kernel and 5x5 kernel
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout2d(0.25)
self.dropout2 = nn.Dropout2d(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
# ... same as original ...
return output
The example above adds an option of choosing conv5x5 at conv1. The modification is as simple as declaring a ``LayerChoice`` with the original conv3x3 and a new conv5x5 as its parameter. That's it! You don't have to modify the forward function in any way. You can imagine conv1 as any other module without NAS.
What about the possible connections between layers? This can be done using ``InputChoice``. To allow a skip connection in the MNIST example, we add a third layer, conv3. In the following example, a possible connection from conv2 is added to the output of conv3.
.. code-block:: python
from nni.nas.pytorch import mutables
class Net(nn.Module):
def __init__(self):
# ... same ...
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.conv3 = nn.Conv2d(64, 64, 1, 1)
# declaring that there is exactly one candidate to choose from
# search strategy will choose one or None
self.skipcon = mutables.InputChoice(n_candidates=1)
# ... same ...
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
x = self.conv2(x)
x0 = self.skipcon([x]) # choose one or none from [x]
x = self.conv3(x)
if x0 is not None: # skipconnection is open
x += x0
x = F.max_pool2d(x, 2)
# ... same ...
return output
Input choice can be thought of as a callable module that receives a list of tensors and outputs the concatenation/sum/mean of some of them (sum by default), or ``None`` if none is selected. Like layer choices, input choices should be initialized in ``__init__`` and called in ``forward``. This is to allow search algorithms to identify these choices and do necessary preparations.
``LayerChoice`` and ``InputChoice`` are both **mutables**. Mutable means "changeable". As opposed to traditional deep learning layers/modules, which have fixed operation types once defined, models with mutables are essentially a series of possible models.
Users can specify a **key** for each mutable. By default, NNI assigns one that is globally unique, but users can give two mutables the same key if they want them to share the choice (for example, two ``LayerChoice``\ s with the same candidate operations, where if the first one chooses the i-th op, the second one also chooses the i-th op). The key marks the identity of the choice and is used in the dumped checkpoint, so manually assigning keys to mutables is a good way to increase the readability of your exported architecture. For advanced usage of mutables (e.g., ``LayerChoice`` and ``InputChoice``\ ), see `Mutables <./NasReference.rst>`__.
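To illustrate the key-sharing semantics, here is a stdlib-only sketch of how a keyed decision is resolved; it is a simplification of NNI's actual behavior, and the function name is an assumption:

```python
# Sketch: mutables with the same key share one decision entry, so they
# always resolve to the candidate at the same index (simplified from NNI).
def resolve_choice(decisions, key, candidates):
    return candidates[decisions[key]]

decisions = {'shared_conv': 1}  # the strategy chose index 1 for this key
# two LayerChoice-like mutables declared with key='shared_conv'
first = resolve_choice(decisions, 'shared_conv', ['conv3x3', 'conv5x5'])
second = resolve_choice(decisions, 'shared_conv', ['conv3x3', 'conv5x5'])
assert first == second == 'conv5x5'  # same key -> same choice
```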
With search space defined, the next step is searching for the best model from it. Please refer to `classic NAS algorithms <./ClassicNas.rst>`__ and `one-shot NAS algorithms <./NasGuide.rst>`__ for how to search from your defined search space.
Customize Exploration Strategy
==============================
If users want to implement a new exploration strategy, they can easily customize one following the interface provided by NNI. Specifically, users should inherit the base strategy class ``BaseStrategy`` and implement the member function ``run``. This member function takes ``base_model`` and ``applied_mutators`` as its input arguments. It can simply apply the user-specified mutators in ``applied_mutators`` onto ``base_model`` to generate a new model. When a mutator is applied, it should be bound with a sampler (e.g., ``RandomSampler``). Every sampler implements the ``choice`` function, which chooses value(s) from candidate values. The ``choice`` functions invoked in mutators are executed with the sampler.
Below is a very simple random strategy, which makes the choices completely random.
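The full code lives in the NNI repository; the core idea can be sketched with the standard library alone. The class names mirror the text, but the bodies below are simplified assumptions, and a real strategy inherits ``BaseStrategy``:

```python
import random

class RandomSampler:
    """Implements choice() by picking uniformly among the candidates."""
    def choice(self, candidates, *args):
        return random.choice(candidates)

class RandomStrategy:  # a real strategy inherits BaseStrategy
    def run(self, base_model, applied_mutators, budget=10):
        sampler = RandomSampler()
        models = []
        for _ in range(budget):  # the real loop checks the resource budget
            model = base_model
            for mutator in applied_mutators:
                mutator.bind_sampler(sampler)  # choice() calls go to sampler
                model = mutator.apply(model)
            models.append(model)  # the real strategy submits each model for training
        return models
```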
......
#####################
Construct Model Space
#####################
NNI provides powerful APIs for users to easily express a model space (or search space). First, users can use mutation primitives (e.g., ``ValueChoice``, ``LayerChoice``) to inline a space in their model. Second, NNI provides a simple interface for customizing new mutators to express more complicated model spaces. In most cases, the mutation primitives are expressive enough for users' model spaces.
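To see what "inlining a space" buys you, here is a stdlib-only sketch of a ``ValueChoice``-like primitive and how the resulting space can be enumerated. The class and function here are simplified stand-ins, not NNI's real API:

```python
# Sketch: a ValueChoice-like primitive records its candidates so the
# surrounding space can be enumerated (NNI's nn.ValueChoice is richer).
import itertools

class ValueChoice:
    def __init__(self, candidates):
        self.candidates = candidates

def enumerate_space(choices):
    """Yield every assignment of candidates, one per named choice."""
    names = list(choices)
    for combo in itertools.product(*(choices[n].candidates for n in names)):
        yield dict(zip(names, combo))

space = {'kernel': ValueChoice([3, 5]), 'hidden': ValueChoice([64, 128])}
assert len(list(enumerate_space(space))) == 4  # 2 x 2 combinations
```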
.. toctree::
:maxdepth: 1
Mutation Primitives <MutationPrimitives>
Customize Mutators <Mutators>
Multi-trial NAS
===============
In multi-trial NAS, users need a model evaluator to evaluate the performance of each sampled model, and an exploration strategy to sample models from the defined model space. Users can use the model evaluators provided by NNI or write their own, and can simply pick one of the provided exploration strategies. Advanced users can also customize new exploration strategies. For a simple example of running a multi-trial NAS experiment, please refer to `Quick Start <./QuickStart.rst>`__.
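The interplay of the three components can be sketched in plain Python. The dict-based space and the function names below are illustrative assumptions; in NNI the space is an ``nn.Module`` and the loop is driven by the experiment:

```python
import random

def multi_trial_search(model_space, evaluate, budget):
    """Toy multi-trial loop: an exploration strategy (random here) samples
    models from the space; a model evaluator scores each sampled model."""
    best_sample, best_score = None, float('-inf')
    for _ in range(budget):
        # exploration strategy: sample one candidate per choice
        sample = {name: random.choice(ops) for name, ops in model_space.items()}
        score = evaluate(sample)            # model evaluator
        if score > best_score:
            best_sample, best_score = sample, score
    return best_sample, best_score

space = {'conv1': ['conv3x3', 'conv5x5'], 'hidden': [64, 128]}
# toy evaluator: pretend a bigger hidden size is always better
best, score = multi_trial_search(space, lambda s: s['hidden'], budget=20)
```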
.. toctree::
:maxdepth: 1
Model Evaluators <ModelEvaluators>
Customize Model Evaluator <WriteTrainer>
Exploration Strategies <ExplorationStrategies>
Customize Exploration Strategies <WriteStrategy>
Execution Engines <ExecutionEngines>
One-shot NAS
============
One-shot NAS algorithms leverage weight sharing among models in a neural architecture search space to train a supernet, and use this supernet to guide the selection of better models. This type of algorithm greatly reduces the computational resource required compared to independently training each model from scratch (which we call "Multi-trial NAS"). NNI supports the popular one-shot NAS algorithms listed below.
.. toctree::
:maxdepth: 1
Quick Start <NasGuide>
Run One-shot NAS <OneshotTrainer>
ENAS <ENAS>
DARTS <DARTS>
P-DARTS <PDARTS>
SPOS <SPOS>
CDARTS <CDARTS>
ProxylessNAS <Proxylessnas>
FBNet <FBNet>
TextNAS <TextNAS>
Cream <Cream>
Customize one-shot NAS <WriteOneshot>
#################
Retiarii Overview
#################
`Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__ is a deep learning framework that supports exploratory training on a neural network model space, rather than on a single neural network model.
Exploratory training with Retiarii allows users to express various search spaces for **Neural Architecture Search** and **Hyper-Parameter Tuning** with high flexibility.
Like the previous NAS and HPO support, the new framework retains the ability to reuse SOTA search algorithms and to leverage system-level optimizations to speed up the search process.
Follow the instructions below to start your journey with Retiarii.
.. toctree::
:maxdepth: 2
Quick Start <Tutorial>
Write a Model Evaluator <WriteTrainer>
One-shot NAS <OneshotTrainer>
Advanced Tutorial <Advanced>
Customize a New Strategy <WriteStrategy>
Retiarii APIs <ApiReference>
......@@ -41,8 +41,6 @@ Currently, we support the following algorithms:
- BOHB is a follow-up work to Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. For the name BOHB, HB means Hyperband, BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models, a proportion of new configurations are generated through these models. `Reference Paper <https://arxiv.org/abs/1807.01774>`__
* - `GP Tuner <#GPTuner>`__
- Gaussian Process Tuner is a sequential model-based optimization (SMBO) approach with Gaussian Process as the surrogate. `Reference Paper <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__\ , `Github Repo <https://github.com/fmfn/BayesianOptimization>`__
* - `PPO Tuner <#PPOTuner>`__
- PPO Tuner is a Reinforcement Learning tuner based on PPO algorithm. `Reference Paper <https://arxiv.org/abs/1707.06347>`__
* - `PBT Tuner <#PBTTuner>`__
- PBT Tuner is a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. `Reference Paper <https://arxiv.org/abs/1711.09846v1>`__
* - `DNGO Tuner <#DNGOTuner>`__
......
PPO Tuner on NNI
================
PPOTuner
--------
This is a tuner geared for NNI's Neural Architecture Search (NAS) interface. It uses the `ppo algorithm <https://arxiv.org/abs/1707.06347>`__. The implementation inherits the main logic of the ppo2 OpenAI implementation `here <https://github.com/openai/baselines/tree/master/baselines/ppo2>`__ and is adapted for the NAS scenario.
We successfully tuned the mnist-nas example, with the following result:
.. Note:: we are refactoring this example to the latest NAS interface, will publish the example codes after the refactor.
.. image:: ../../img/ppo_mnist.png
:target: ../../img/ppo_mnist.png
:alt:
We also tuned :githublink:`the macro search space for image classification in the enas paper <examples/nas/legacy/classic_nas>` (with a limited epoch number for each trial, i.e., 8 epochs), which is implemented using the NAS interface and tuned with PPOTuner. Here is Figure 7 from the `enas paper <https://arxiv.org/pdf/1802.03268.pdf>`__, showing what the search space looks like:
.. image:: ../../img/enas_search_space.png
:target: ../../img/enas_search_space.png
:alt:
The figure above shows the chosen architecture. Each square is a layer whose operation is chosen from 6 options. Each dashed line is a skip connection; each square layer can choose 0 or 1 skip connection, getting the output from a previous layer. **Note that** in the original macro search space each square layer could choose any number of skip connections, while our implementation only allows 0 or 1.
The results are shown in the figure below (see the experimental config :githublink:`here <examples/nas/legacy/classic_nas/config_ppo.yml>`):
.. image:: ../../img/ppo_cifar10.png
:target: ../../img/ppo_cifar10.png
:alt:
......@@ -20,5 +20,4 @@ Tuner receives metrics from `Trial` to evaluate the performance of a specific pa
Network Morphism <Tuner/NetworkmorphismTuner>
Hyperband <Tuner/HyperbandAdvisor>
BOHB <Tuner/BohbAdvisor>
PPO Tuner <Tuner/PPOTuner>
PBT Tuner <Tuner/PBTTuner>
#############################################
Retiarii for Neural Architecture Search (NAS)
#############################################
Automatic neural architecture search is playing an increasingly important role in finding better models.
Recent research has proven the feasibility of automatic NAS and has found models that beat manually designed ones.
......@@ -10,20 +10,24 @@ However, it takes great efforts to implement NAS algorithms, and it is hard to r
To facilitate NAS innovations (e.g., design and implement new NAS models, compare different NAS models side-by-side),
an easy-to-use and flexible programming interface is crucial.
Thus, we designed `Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__, a deep learning framework that supports exploratory training on a neural network model space, rather than on a single neural network model.
Exploratory training with Retiarii allows users to express various search spaces for *Neural Architecture Search* and *Hyper-Parameter Tuning* with high flexibility.
Some frequently used terminologies in this document:
* *Model search space*: a set of models from which the best model is explored/searched; sometimes shortened to *search space* or *model space*.
* *Exploration strategy*: the algorithm that is used to explore a model search space.
* *Model evaluator*: it is used to train a model and evaluate the model's performance.
Follow the instructions below to start your journey with Retiarii.
.. toctree::
:maxdepth: 2
Overview <NAS/Overview>
Write A Search Space <NAS/WriteSearchSpace>
Classic NAS <NAS/ClassicNas>
Quick Start <NAS/QuickStart>
Construct Model Space <NAS/construct_space>
Multi-trial NAS <NAS/multi_trial_nas>
One-shot NAS <NAS/one_shot_nas>
Retiarii NAS (Alpha) <NAS/retiarii/retiarii_index>
Customize a NAS Algorithm <NAS/Advanced>
NAS Visualization <NAS/Visualization>
Search Space Zoo <NAS/SearchSpaceZoo>
NAS Benchmarks <NAS/Benchmarks>
API Reference <NAS/NasReference>
NAS API References <NAS/ApiReference>
......@@ -7,5 +7,5 @@ Python API Reference
:maxdepth: 1
Auto Tune <autotune_ref>
NAS <NAS/NasReference>
NAS <NAS/ApiReference>
Compression <Compression/CompressionReference>
......@@ -53,5 +53,6 @@ if __name__ == '__main__':
exp_config.trial_concurrency = 2
exp_config.max_trial_number = 10
exp_config.training_service.use_active_gpu = False
exp_config.execution_engine = 'base'
exp.run(exp_config, 8097)
......@@ -10,8 +10,8 @@ from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST
# uncomment this for python execution engine
# @model_wrapper
# comment the following line for graph-based execution engine
@model_wrapper
class Net(nn.Module):
def __init__(self, hidden_size):
super().__init__()
......@@ -43,10 +43,6 @@ if __name__ == '__main__':
val_dataloaders=pl.DataLoader(test_dataset, batch_size=100),
max_epochs=2)
# uncomment the following two lines to debug a generated model
#debug_mutated_model(base_model, trainer, [])
#exit(0)
simple_strategy = strategy.Random()
exp = RetiariiExperiment(base_model, trainer, [], simple_strategy)
......@@ -56,11 +52,11 @@ if __name__ == '__main__':
exp_config.trial_concurrency = 2
exp_config.max_trial_number = 2
exp_config.training_service.use_active_gpu = False
export_formatter = 'code'
export_formatter = 'dict'
# uncomment this for python execution engine
# exp_config.execution_engine = 'py'
# export_formatter = 'dict'
# uncomment this for graph-based execution engine
# exp_config.execution_engine = 'base'
# export_formatter = 'code'
exp.run(exp_config, 8081 + random.randint(0, 100))
print('Final model:')
......