update doc for NNI NAS interface (#1781)

Unverified Commit fbe2586d authored Nov 26, 2019 by QuanluZhang Committed by GitHub Nov 26, 2019
6 changed files
--- a/docs/en_US/NAS/NasInterface.md
+++ b/docs/en_US/NAS/NasInterface.md
+# NNI NAS Programming Interface
+
+We are working to support various NAS algorithms through a unified programming interface, which is still at an experimental stage. The current programming interface may therefore be updated in the future.
+
+*The previous [NAS annotation](../AdvancedFeature/GeneralNasInterfaces.md) interface will be deprecated soon.*
+
+## Programming interface for user model
+
+A programming interface for designing and searching a model is typically needed in two scenarios.
+
+1. When designing a neural network, there may be multiple candidate operations for a layer, sub-model, or connection, and it is not known in advance which one, or which combination, performs best. Users therefore need an easy way to express the candidate layers or sub-models.
+2. When applying NAS to a neural network, users need a unified way to express the search space of architectures, so that trial code does not have to be changed for different search algorithms.
+
+
+To express a neural architecture search space in user code, we provide the following APIs (taking PyTorch as an example):
+
+```python
+# in PyTorch module class
+def __init__(self):
+    ...
+    # Choose one ``op`` from the candidate ``ops``; for PyTorch each ``op`` is a module.
+    # op_candidates: a list of modules for PyTorch, or a list of Keras layers for TensorFlow.
+    # key: the name of this ``LayerChoice`` instance
+    self.one_layer = nni.nas.pytorch.LayerChoice([
+        PoolBN('max', channels, 3, stride, 1, affine=False),
+        PoolBN('avg', channels, 3, stride, 1, affine=False),
+        FactorizedReduce(channels, channels, affine=False),
+        SepConv(channels, channels, 3, stride, 1, affine=False),
+        DilConv(channels, channels, 3, stride, 2, 2, affine=False)],
+        key="layer_name")
+    ...
+
+def forward(self, x):
+    ...
+    out = self.one_layer(x)
+    ...
+```
+This lets users specify multiple candidate operations for a layer; one of them is chosen in the end. `key` is the identifier of the layer, and it can be used to share a choice between multiple `LayerChoice` instances. For example, if two `LayerChoice` instances have the same candidate operations and you want them to make the same choice (i.e., if the first one chooses the `i`th op, the second one also chooses the `i`th op), give them the same key.
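
The key-sharing behavior described above can be illustrated with a small, framework-free sketch. It does not use NNI: `ToyLayerChoice` and `decide_by_key` are hypothetical names invented for this illustration, and plain Python callables stand in for PyTorch modules. The only assumption is that one decision is stored per key, so instances with the same key end up with the same index.

```python
import random


class ToyLayerChoice:
    """Hypothetical stand-in for a layer choice; not NNI's implementation."""

    def __init__(self, candidates, key):
        self.candidates = list(candidates)
        self.key = key

    def __call__(self, x, decisions):
        # Look up the chosen index by key and apply that candidate.
        return self.candidates[decisions[self.key]](x)


def decide_by_key(choices, rng):
    """Draw one index per distinct key.

    Assumes choices that share a key have the same number of candidates.
    """
    decisions = {}
    for choice in choices:
        if choice.key not in decisions:
            decisions[choice.key] = rng.randrange(len(choice.candidates))
    return decisions


# Plain functions stand in for candidate operations.
ops = [lambda x: x + 1, lambda x: x * 2, lambda x: -x]
first = ToyLayerChoice(ops, key="shared")
second = ToyLayerChoice(ops, key="shared")
third = ToyLayerChoice(ops, key="independent")

decisions = decide_by_key([first, second, third], random.Random(0))
print(decisions)
```

Because `first` and `second` share the key `"shared"`, they always apply the same candidate, while `third` is decided independently.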
+
+```python
+def __init__(self):
+    ...
+    # choose ``n_chosen`` from ``n_candidates`` inputs.
+    # n_candidates: the number of candidate inputs
+    # n_chosen: the number of chosen inputs
+    # key: the name of this ``InputChoice`` instance
+    self.input_switch = nni.nas.pytorch.InputChoice(
+        n_candidates=3,
+        n_chosen=1,
+        key="switch_name")
+    ...
+
+def forward(self, x):
+    ...
+    out = self.input_switch([in_tensor1, in_tensor2, in_tensor3])
+    ...
+```
+`InputChoice` is a PyTorch module. At initialization it needs meta-information, for example how many inputs to choose from how many candidates, and the name of this `InputChoice` instance. The actual candidate input tensors are only available in the `forward` function, where the `InputChoice` instance is called with them.
+
+Some [NAS trainers](#one-shot-training-mode) need to know the source layer of the input tensors, so `InputChoice` takes an additional argument, `choose_from`, that indicates the source layer of each candidate input. `choose_from` is a list of strings; each element is the `key` of a `LayerChoice` or `InputChoice`, or the name of a module (refer to [the code](https://github.com/microsoft/nni/blob/master/src/sdk/pynni/nni/nas/pytorch/mutables.py) for more details).
+
+
+Besides `LayerChoice` and `InputChoice`, we also provide `MutableScope`, which lets users label a sub-network and thus gives NAS trainers more semantic information (e.g., the structure of the network). Here is an example:
+```python
+class Cell(mutables.MutableScope):
+    def __init__(self, scope_name):
+        super().__init__(scope_name)
+        self.layer1 = nni.nas.pytorch.LayerChoice(...)
+        self.layer2 = nni.nas.pytorch.LayerChoice(...)
+        self.layer3 = nni.nas.pytorch.LayerChoice(...)
+        ...
+```
+The three `LayerChoice` instances (`layer1`, `layer2`, `layer3`) are included in the `MutableScope` named `scope_name`. A NAS trainer can retrieve this hierarchical structure.
+
+
+## Two training modes
+
+After writing your model with the search space embedded using the above APIs, the next step is to find the best model in the search space. There are two training modes: [one-shot training mode](#one-shot-training-mode) and [classic distributed search](#classic-distributed-search).
+
+### One-shot training mode
+
+Similar to optimizers for deep learning models, the procedure of finding the best model in the search space can be viewed as a kind of optimization process, which we call a `NAS trainer`. Several NAS trainers exist, for example `DartsTrainer`, which uses SGD to train architecture weights and model weights iteratively, and `ENASTrainer`, which uses a controller to train the model. New and more efficient NAS trainers keep emerging in the research community.
+
+NNI provides some popular NAS trainers. To use one, initialize a trainer after the model is defined:
+
+```python
+# create a DartsTrainer
+trainer = DartsTrainer(model,
+                       loss=criterion,
+                       metrics=lambda output, target: accuracy(output, target, topk=(1,)),
+                       optimizer=optim,
+                       num_epochs=args.epochs,
+                       dataset_train=dataset_train,
+                       dataset_valid=dataset_valid,)
+# finding the best model from search space
+trainer.train()
+# export the best found model
+trainer.export(file='./chosen_arch')
+```
+
+Different trainers may take different input arguments depending on their algorithms. Please refer to [each trainer's code](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch) for detailed arguments. After training, users can export the best model found through `trainer.export()`. There is no need to start an NNI experiment through `nnictl`.
+
+The supported trainers can be found [here](./Overview.md#supported-one-shot-nas-algorithms). A very simple example using NNI NAS API can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/simple/train.py).
+
+The complete example code can be found [here]().
+
+### Classic distributed search
+
+Neural architecture search was originally executed by running each child model independently as a trial job. We also support this search approach, which fits naturally into the NNI hyper-parameter tuning framework, where the tuner generates a child model for the next trial and trials run on the training service.
+
+To use this mode, there is no need to change the search space expressed with the NNI NAS API (i.e., `LayerChoice`, `InputChoice`, `MutableScope`). After the model is initialized, apply the function `get_and_apply_next_architecture` to the model. One-shot NAS trainers are not used in this mode. Here is a simple example:
+```python
+class Net(nn.Module):
+    # defined model with LayerChoice and InputChoice
+    ...
+
+model = Net()
+# get the chosen architecture from tuner and apply it on model
+get_and_apply_next_architecture(model)
+# your code for training the model
+train(model)
+# test the trained model
+acc = test(model)
+# report the performance of the chosen architecture
+nni.report_final_result(acc)
+```
+
+The search space should be generated automatically and sent to the tuner. Because the search space is embedded in user code with the NNI NAS API, users can use "[nnictl ss_gen](../Tutorial/Nnictl.md)" to generate the search space file. Then put the path of the generated search space file in the `searchSpacePath` field of `config.yml`. The other fields in `config.yml` can be filled in by referring to [this tutorial](../Tutorial/QuickStart.md).
+
+You could use [NNI tuners](../Tuner/BuiltinTuner.md) to do the search.
+
+We support a standalone mode for easy debugging, in which you can run the trial command directly without launching an NNI experiment. This is for checking whether your trial code runs correctly. In standalone mode, the first candidate(s) are chosen for `LayerChoice` and `InputChoice`.
+
+The complete example code can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/classic_nas/config_nas.yml).
+
+## Programming interface for NAS algorithm
+
+We also provide a simple interface for users to implement a new NAS trainer on NNI.
+
+### Implement a new NAS trainer on NNI
+
+To implement a new NAS trainer, users basically only need to implement two classes by inheriting from `BaseMutator` and `BaseTrainer`, respectively.
+
+In `BaseMutator`, users need to override `on_forward_layer_choice` and `on_forward_input_choice`, which implement `LayerChoice` and `InputChoice` respectively. Users can use the `mutables` property to get all `LayerChoice` and `InputChoice` instances in the model code. Users then need to implement a new trainer, which instantiates the new mutator and implements the training logic. For details, please read [the code](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch) and the supported trainers, for example, [DartsTrainer](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch/darts).
+
+### Implement an NNI tuner for NAS
+
+An NNI tuner for NAS takes the automatically generated search space. The search space format of `LayerChoice` and `InputChoice` is shown below:
+```json
+{
+    "key_name": {
+        "_type": "layer_choice",
+        "_value": ["op1_repr", "op2_repr", "op3_repr"]
+    },
+    "key_name": {
+        "_type": "input_choice",
+        "_value": {
+            "candidates": ["in1_key", "in2_key", "in3_key"],
+            "n_chosen": 1
+        }
+    }
+}
+```
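
To make the format concrete, the following sketch counts how many architectures a search space in this format describes. It is an illustration rather than part of NNI: `count_architectures` is a hypothetical helper, the keys `conv_choice` and `skip_choice` are made up (distinct keys are used because a JSON object should not repeat a key), and it assumes that an `input_choice` selects an unordered subset of exactly `n_chosen` candidates.

```python
import math

# Example search space in the format shown above (keys are made up).
search_space = {
    "conv_choice": {
        "_type": "layer_choice",
        "_value": ["op1_repr", "op2_repr", "op3_repr"],
    },
    "skip_choice": {
        "_type": "input_choice",
        "_value": {
            "candidates": ["in1_key", "in2_key", "in3_key"],
            "n_chosen": 1,
        },
    },
}


def count_architectures(space):
    """Hypothetical helper: number of distinct architectures in the space."""
    total = 1
    for spec in space.values():
        if spec["_type"] == "layer_choice":
            # One candidate operation is picked per layer choice.
            total *= len(spec["_value"])
        elif spec["_type"] == "input_choice":
            # Assumption: an unordered subset of n_chosen candidates is picked.
            value = spec["_value"]
            total *= math.comb(len(value["candidates"]), value["n_chosen"])
        else:
            raise ValueError(f"unknown choice type: {spec['_type']}")
    return total


print(count_architectures(search_space))  # 3 ops * C(3, 1) inputs = 9
```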
+
+Correspondingly, the generated architecture is in the following format:
+```json
+{
+    "key_name": {
+        "_value": "op1_repr",
+        "_idx": 0
+    },
+    "key_name": {
+        "_value": ["in2_key"],
+        "_idx": [1]
+    }
+}
+```
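
As a rough illustration of what a tuner has to produce, the sketch below draws a random architecture from a search space and emits it with `_value` and `_idx` entries, following the layer-choice entry above. It is not NNI's tuner API: `sample_architecture` is a hypothetical helper, the keys `conv_choice` and `skip_choice` are made up, and uniform random sampling is only one possible strategy.

```python
import random

# Example search space (keys are made up for illustration).
search_space = {
    "conv_choice": {
        "_type": "layer_choice",
        "_value": ["op1_repr", "op2_repr", "op3_repr"],
    },
    "skip_choice": {
        "_type": "input_choice",
        "_value": {
            "candidates": ["in1_key", "in2_key", "in3_key"],
            "n_chosen": 1,
        },
    },
}


def sample_architecture(space, rng):
    """Hypothetical helper: pick one architecture uniformly at random."""
    arch = {}
    for key, spec in space.items():
        if spec["_type"] == "layer_choice":
            idx = rng.randrange(len(spec["_value"]))
            arch[key] = {"_value": spec["_value"][idx], "_idx": idx}
        elif spec["_type"] == "input_choice":
            candidates = spec["_value"]["candidates"]
            n_chosen = spec["_value"]["n_chosen"]
            # Sample n_chosen distinct positions and keep them in order.
            chosen = sorted(rng.sample(range(len(candidates)), n_chosen))
            arch[key] = {
                "_value": [candidates[i] for i in chosen],
                "_idx": chosen,
            }
        else:
            raise ValueError(f"unknown choice type: {spec['_type']}")
    return arch


print(sample_architecture(search_space, random.Random(0)))
```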
\ No newline at end of file
--- a/docs/en_US/NAS/Overview.md
+++ b/docs/en_US/NAS/Overview.md
@@ -6,18 +6,56 @@ However, it takes great efforts to implement NAS algorithms, and it is hard to r

 With this motivation, our ambition is to provide a unified architecture in NNI, to accelerate innovations on NAS, and apply state-of-art algorithms on real world problems faster.

-## Supported algorithms
+With [the unified interface](./NasInterface.md), there are two modes of architecture search. [The first](#supported-one-shot-nas-algorithms) is so-called one-shot NAS, where a super-net is built from the search space and one-shot training is used to produce a well-performing child model. [The other](./ClassicNas.md) is the traditional search approach, where each child model in the search space runs as an independent trial, its performance result is sent to the tuner, and the tuner generates a new child model.
+
+* [Supported One-shot NAS Algorithms](#supported-one-shot-nas-algorithms)
+* [Classic Distributed NAS with NNI experiment](./NasInterface.md#classic-distributed-search)
+* [NNI NAS Programming Interface](./NasInterface.md)
+
+## Supported One-shot NAS Algorithms

 NNI currently supports the NAS algorithms below and is adding more. Users can reproduce an algorithm or use it on their own dataset. We also encourage users to implement other algorithms with the [NNI API](#use-nni-api), to benefit more people.

-Note, these algorithms run standalone without nnictl, and supports PyTorch only.
+|Name|Brief Introduction of Algorithm|
+|---|---|
+| [ENAS](#enas) | Efficient Neural Architecture Search via Parameter Sharing [Reference Paper][1] |
+| [DARTS](#darts) | DARTS: Differentiable Architecture Search [Reference Paper][3] |
+| [P-DARTS](#p-darts) | Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation [Reference Paper](https://arxiv.org/abs/1904.12760)|
+
+Note that these algorithms run **standalone without nnictl** and support PyTorch only. TensorFlow 2.0 will be supported in a future release.

 ### Dependencies

-* Install latest NNI
+* NNI 1.2+
+* tensorboard
 * PyTorch 1.2+
 * git

+### ENAS
+
+[Efficient Neural Architecture Search via Parameter Sharing][1]. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. It uses parameter sharing between child models to achieve fast speed and excellent performance.
+
+#### Usage
+
+ENAS in NNI is still under development, and we currently support only the search phase for the macro and micro search spaces on CIFAR10. Training from scratch and the search space on PTB are not finished yet.
+
+```bash
+# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
+git clone https://github.com/Microsoft/nni.git
+
+# search the best architecture
+cd examples/nas/enas
+
+# search in macro search space
+python3 search.py --search-for macro
+
+# search in micro search space
+python3 search.py --search-for micro
+
+# view more options for search
+python3 search.py -h
+```
+
 ### DARTS

 The main contribution of [DARTS: Differentiable Architecture Search][3] on algorithm is to introduce a novel algorithm for differentiable network architecture search on bilevel optimization.
@@ -57,7 +95,7 @@ python3 retrain.py --arc-checkpoint ../pdarts/checkpoints/epoch_2.json

 ## Use NNI API

-NOTE, we are trying to support various NAS algorithms with unified programming interface, and it's in very experimental stage. It means the current programing interface may be updated significantly.
+NOTE: we are trying to support various NAS algorithms with a unified programming interface, and it is at a very experimental stage. This means the current programming interface may be updated in the future.

 *previous [NAS annotation](../AdvancedFeature/GeneralNasInterfaces.md) interface will be deprecated soon.*


--- a/docs/en_US/feature_engineering.rst
+++ b/docs/en_US/feature_engineering.rst
+###################
+Feature Engineering
+###################
+
+We are glad to announce the alpha release of the Feature Engineering toolkit on top of NNI.
+It is still in the experimental phase and may evolve based on usage feedback.
+We'd like to invite you to use it, give feedback, and even contribute.
+
+For details, please refer to the following tutorials:
+
+..  toctree::
+    :maxdepth: 2
+
+    Overview <FeatureEngineering/Overview>
+    GradientFeatureSelector <FeatureEngineering/GradientFeatureSelector>
+    GBDTSelector <FeatureEngineering/GBDTSelector>
--- a/docs/en_US/model_compression.rst
+++ b/docs/en_US/model_compression.rst
+#################
+Model Compression
+#################
+
+NNI provides an easy-to-use toolkit to help users design and use compression algorithms.
+It supports TensorFlow and PyTorch with a unified interface.
+To compress their models, users only need to add several lines to their code.
+Some popular model compression algorithms are built into NNI.
+Users can further use NNI's auto-tuning power to find the best compressed model,
+which is detailed in Auto Model Compression.
+In addition, users can easily customize new compression algorithms using NNI's interface.
+
+For details, please refer to the following tutorials:
+
+..  toctree::
+    :maxdepth: 2
+
+    Overview <Compressor/Overview>
+    Level Pruner <Compressor/Pruner>
+    AGP Pruner <Compressor/Pruner>
+    L1Filter Pruner <Compressor/L1FilterPruner>
+    Slim Pruner <Compressor/SlimPruner>
+    Lottery Ticket Pruner <Compressor/LotteryTicketHypothesis>
+    FPGM Pruner <Compressor/Pruner>
+    Naive Quantizer <Compressor/Quantizer>
+    QAT Quantizer <Compressor/Quantizer>
+    DoReFa Quantizer <Compressor/Quantizer>
+    Automatic Model Compression <Compressor/AutoCompression>
--- a/docs/en_US/nas.rst
+++ b/docs/en_US/nas.rst
+#################
+NAS Algorithms
+#################
+
+Automatic neural architecture search is playing an increasingly important role in finding better models.
+Recent research has demonstrated the feasibility of automatic NAS and has found some models that can beat manually designed and tuned models.
+Some representative works are NASNet, ENAS, DARTS, Network Morphism, and Evolution. New innovations keep emerging.
+
+However, implementing NAS algorithms takes great effort, and it is hard to reuse the code base of existing algorithms in new ones.
+To facilitate NAS innovations (e.g., designing and implementing new NAS models, comparing different NAS models side by side),
+an easy-to-use and flexible programming interface is crucial.
+
+With this motivation, our ambition is to provide a unified architecture in NNI,
+to accelerate innovations in NAS and apply state-of-the-art algorithms to real-world problems faster.
+
+For details, please refer to the following tutorials:
+
+..  toctree::
+    :maxdepth: 2
+
+    Overview <NAS/Overview>
+    NAS Interface <NAS/NasInterface>
+    ENAS <NAS/Overview>
+    DARTS <NAS/Overview>
+    P-DARTS <NAS/Overview>
--- a/docs/en_US/tutorials.rst
+++ b/docs/en_US/tutorials.rst
@@ -9,6 +9,9 @@ Tutorials
    Write Trial <TrialExample/Trials>
    Tuners <tuners>
    Assessors <assessors>
+    NAS (Beta) <nas>
+    Model Compression (Beta) <model_compression>
+    Feature Engineering (Beta) <feature_engineering>
    WebUI <Tutorial/WebUI>
    Training Platform <training_services>
    How to use docker <Tutorial/HowToUseDocker>