Neural Architecture Search Interface (#1103)

* Dev nas interface -- document (#1049)
* nas interface doc
* Dev nas compile -- code generator (#1067)
* finish code for parsing mutable_layers annotation and testcode
* Dev nas interface -- update figures (#1070)
  update figs
* update searchspace_generator (#1071)
* GeneralNasInterfaces.md: Fix a typo (#1079)
  Signed-off-by: Ce Gao <gaoce@caicloud.io>
* add NAS example and fix bugs (#1083)
  update searchspace_generator, add example, update NAS example
* fix bugs (#1108)
* Remove NAS example (#1116)
  remove example
* update (#1119)
* Dev nas interface2 (#1121)
  update doc
* Fix comment for pr of nas (#1122)
  resolve comment

Commit fe338861 authored May 28, 2019 by Zejun Lin Committed by xuehui May 28, 2019
13 changed files
--- a/docs/en_US/GeneralNasInterfaces.md
+++ b/docs/en_US/GeneralNasInterfaces.md
+# General Programming Interface for Neural Architecture Search
+Automatic neural architecture search (NAS) is playing an increasingly important role in finding better models. Recent research has demonstrated the feasibility of automatic NAS and has found models that beat manually designed and tuned ones. Representative works include [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5], and new innovations keep emerging. However, implementing these algorithms takes great effort, and it is hard to reuse the code base of one algorithm to implement another.
+To facilitate NAS innovations (e.g., designing and implementing new NAS models, or comparing different NAS models side by side), an easy-to-use and flexible programming interface is crucial.
+## Programming interface
+ A new programming interface for designing and searching for a model is often demanded in two scenarios. 1) When designing a neural network, the designer may have multiple choices for a layer, sub-model, or connection, and may not be sure which one, or which combination, performs best. It would be appealing to have an easy way to express the candidate layers/sub-models they want to try. 2) Researchers working on automatic NAS want a unified way to express the search space of neural architectures, so that unchanged trial code can be adapted to different search algorithms.
+ We designed a simple and flexible programming interface based on [NNI annotation](./AnnotationSpec.md). It is elaborated through examples below.
+ ### Example: choose an operator for a layer
+When designing the following model, there might be several choices for the fourth layer that could make the model perform well. In the script of this model, we can annotate the fourth layer as shown in the figure. The annotation has five fields in total:
+![](../img/example_layerchoice.png)
+* __layer_choice__: A list of function calls. Each function should be defined in the user's script or in an imported library. The function's arguments should follow the format `def XXX(inputs, arg2, arg3, ...)`, where `inputs` is a list with two elements: the list of `fixed_inputs`, and the list of inputs chosen from `optional_inputs`. `conv` and `pool` in the figure are examples of such function definitions. For the function calls in this list, there is no need to write the first argument (i.e., `inputs`). Note that only one of the function calls is chosen for this layer.
+* __fixed_inputs__: A list of variables. A variable could be an output tensor from a previous layer, the `layer_output` of another nni.mutable_layer before this layer, or any other Python variable defined before this layer. All the variables in this list are fed into the chosen function in `layer_choice` (as the first element of the `inputs` list).
+* __optional_inputs__: A list of variables. A variable could be an output tensor from a previous layer, the `layer_output` of another nni.mutable_layer before this layer, or any other Python variable defined before this layer. Only `optional_input_size` of these variables are fed into the chosen function in `layer_choice` (as the second element of the `inputs` list).
+* __optional_input_size__: How many inputs are chosen from `optional_inputs`. It could be a number or a range. A range [1,3] means choosing 1, 2, or 3 inputs.
+* __layer_output__: The name of the output(s) of this layer; in this case it represents the return value of the chosen function call in `layer_choice`. It is a variable name that can be used in the subsequent Python code or in later nni.mutable_layer(s).
+There are two ways to write the annotation for this example. For the upper one, the `inputs` of the function calls is `[[],[out3]]`. For the bottom one, `inputs` is `[[out3],[]]`.
+### Example: choose input connections for a layer
+Designing the connections between layers is critical for building a high-performance model. With the provided interface, users can annotate which connections a layer takes as inputs, choosing several from a set of candidates. Below is an example that chooses two inputs from three candidate inputs for `concat`. Here `concat` always takes the output of its previous layer through `fixed_inputs`.
+![](../img/example_connectchoice.png)
+### Example: choose both operators and connections
+In this example, we choose one of the three operators and choose two connections for it. As there are multiple variables in `inputs`, we call `concat` at the beginning of the functions.
+![](../img/example_combined.png)
+### Example: [ENAS][1] macro search space
+To illustrate the convenience of the programming interface, we use it to implement the trial code of "ENAS + macro search space". The left figure shows the macro search space from the ENAS paper.
+![](../img/example_enas.png)
+## Unified NAS search space specification
+After finishing the trial code with the annotations above, users have implicitly specified the search space of neural architectures in the code. Based on the code, NNI will automatically generate a search space file that can be fed into tuning algorithms. The search space file uses the following `json` format.
+```json
+{
+    "mutable_1": {
+        "layer_1": {
+            "layer_choice": ["conv(ch=128)", "pool", "identity"],
+            "optional_inputs": ["out1", "out2", "out3"],
+            "optional_input_size": 2
+        },
+        "layer_2": {
+            ...
+        }
+    }
+}
+```
+Accordingly, a specified neural architecture (generated by tuning algorithm) is expressed as follows:
+```json
+{
+    "mutable_1": {
+        "layer_1": {
+            "chosen_layer": "pool",
+            "chosen_inputs": ["out1", "out3"]
+        },
+        "layer_2": {
+            ...
+        }
+    }
+}
+```
+With the format of the search space and of the architecture (choice) expression specified, users are free to implement various (general) tuning algorithms for neural architecture search on NNI. One future work is to provide a general NAS algorithm.
+=============================================================
+## Neural architecture search on NNI
+### Basic flow of experiment execution
+NNI's annotation compiler transforms the annotated trial code into code that can receive an architecture choice and build the corresponding model (i.e., graph). The NAS search space can be seen as a full graph (here, a full graph means enabling all the provided operators and connections to build a graph), and the architecture chosen by the tuning algorithm is a subgraph of it. By default, the compiled trial code only builds and executes the subgraph.
+![](../img/nas_on_nni.png)
+The above figure shows how the trial code runs on NNI. `nnictl` processes user trial code to generate a search space file and compiled trial code. The former is fed to the tuner, and the latter is used to run trials.
+[__TODO__] Simple example of NAS on NNI.
+### Weight sharing
+Sharing weights among chosen architectures (i.e., trials) could speed up model search. For example, properly inheriting the weights of completed trials could speed up the convergence of new trials. One-Shot NAS (e.g., ENAS, DARTS) is more aggressive: the training of different architectures (i.e., subgraphs) shares the same copy of the weights in the full graph.
+![](../img/nas_weight_share.png)
+We believe weight sharing (transferring) plays a key role in speeding up NAS, while finding efficient ways of sharing weights is still a hot research topic. We provide a key-value store for users to store and load weights. Tuners and trials use a provided KV client library to access the storage.
+[__TODO__] Example of weight sharing on NNI.
+### Support of One-Shot NAS
+One-Shot NAS is a popular approach to finding a good neural architecture within a limited time and resource budget. Basically, it builds a full graph based on the search space and uses gradient descent to eventually find the best subgraph. There are different training approaches, such as [training subgraphs (per mini-batch)][1], [training full graph through dropout][6], and [training with architecture weights (regularization)][3]. Here we focus on the first approach, i.e., training subgraphs (ENAS).
+With the same annotated trial code, users can choose One-Shot NAS as the execution mode on NNI. Specifically, the compiled trial code builds the full graph (rather than the subgraph demonstrated above), receives a chosen architecture, trains this architecture on the full graph for a mini-batch, and then requests another chosen architecture. This is supported by [NNI multi-phase](./multiPhase.md). We support this training approach because training a subgraph is very fast, so rebuilding the graph every time a subgraph is trained would incur too much overhead.
+![](../img/one-shot_training.png)
+The design of One-Shot NAS on NNI is shown in the above figure. One-Shot NAS usually has only one trial job with the full graph. NNI supports running multiple such trial jobs, each of which runs independently. As One-Shot NAS is not stable, running multiple instances helps find a better model. Moreover, trial jobs are also able to synchronize weights while running (i.e., there is only one copy of the weights, as in asynchronous parameter-server mode). This may speed up convergence.
+[__TODO__] Example of One-Shot NAS on NNI.
+## General tuning algorithms for NAS
+Like hyperparameter tuning, NAS requires a relatively general algorithm. The general programming interface makes this task easier to some extent. We have an RL-based tuner algorithm for NAS from our contributors. We expect efforts from the community to design and implement better NAS algorithms.
+[__TODO__] More tuning algorithms for NAS.
+## Export best neural architecture and code
+[__TODO__] After the NNI experiment is done, users could run `nnictl experiment export --code` to export the trial code with the best neural architecture.
+## Conclusion and Future work
+There could be different NAS algorithms and execution modes, but they could be supported with the same programming interface as demonstrated above. 
+There are many interesting research topics in this area, both system and machine learning. 
+[1]: https://arxiv.org/abs/1802.03268
+[2]: https://arxiv.org/abs/1707.07012
+[3]: https://arxiv.org/abs/1806.09055
+[4]: https://arxiv.org/abs/1806.10282
+[5]: https://arxiv.org/abs/1703.01041
+[6]: http://proceedings.mlr.press/v80/bender18a/bender18a.pdf
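The search space and architecture formats documented in `GeneralNasInterfaces.md` can be exercised with a small sketch. The code below draws one architecture from a search space by uniform random sampling. Random sampling and the helper name `sample_architecture` are assumptions made for illustration (they are not the tuner shipped with NNI), and only an integer `optional_input_size` is handled, not a range.

```python
import random

# Search space in the documented format; the values are copied from the doc's example.
search_space = {
    "mutable_1": {
        "layer_1": {
            "layer_choice": ["conv(ch=128)", "pool", "identity"],
            "optional_inputs": ["out1", "out2", "out3"],
            "optional_input_size": 2,
        }
    }
}


def sample_architecture(space, rng):
    """Pick one layer and `optional_input_size` inputs for every mutable layer.

    Hypothetical helper: uniform random sampling stands in for a real tuner.
    """
    arch = {}
    for block_name, block in space.items():
        arch[block_name] = {}
        for layer_name, layer in block.items():
            candidates = layer["optional_inputs"]
            picked = set(rng.sample(candidates, layer["optional_input_size"]))
            arch[block_name][layer_name] = {
                "chosen_layer": rng.choice(layer["layer_choice"]),
                # Keep the chosen inputs in the order they were declared.
                "chosen_inputs": [name for name in candidates if name in picked],
            }
    return arch


architecture = sample_architecture(search_space, random.Random(0))
print(architecture)
```

The result has the same nesting as the architecture expression shown in the document: one `chosen_layer` string and one `chosen_inputs` list per mutable layer.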
--- a/docs/img/example_combined.png
+++ b/docs/img/example_combined.png
--- a/docs/img/example_connectchoice.png
+++ b/docs/img/example_connectchoice.png
--- a/docs/img/example_enas.png
+++ b/docs/img/example_enas.png
--- a/docs/img/example_layerchoice.png
+++ b/docs/img/example_layerchoice.png
--- a/docs/img/nas_on_nni.png
+++ b/docs/img/nas_on_nni.png
--- a/docs/img/nas_weight_share.png
+++ b/docs/img/nas_weight_share.png
--- a/docs/img/one-shot_training.png
+++ b/docs/img/one-shot_training.png
--- a/examples/trials/NAS/README.md
+++ b/examples/trials/NAS/README.md
+ **Run Neural Network Architecture Search in NNI**	
+ ===	
+Now we have an NAS example, [NNI-NAS-Example](https://github.com/Crysple/NNI-NAS-Example), that runs on NNI using the NAS interface from our contributors.
+Thanks to our lovely contributors.
+More people are welcome to join us!
--- a/src/sdk/pynni/nni/smartparam.py
+++ b/src/sdk/pynni/nni/smartparam.py
@@ -36,7 +36,8 @@ __all__ = [
    'qnormal',
    'lognormal',
    'qlognormal',
-    'function_choice'
+    'function_choice',
+    'mutable_layer'
 ]
@@ -78,6 +79,9 @@ if trial_env_vars.NNI_PLATFORM is None:
    def function_choice(*funcs, name=None):
        return random.choice(funcs)()
+    def mutable_layer():
+        raise RuntimeError('Cannot call nni.mutable_layer in this mode')
 else:
    def choice(options, name=None, key=None):
@@ -113,6 +117,42 @@ else:
    def function_choice(funcs, name=None, key=None):
        return funcs[_get_param(key)]()
+    def mutable_layer(
+            mutable_id,
+            mutable_layer_id,
+            funcs,
+            funcs_args,
+            fixed_inputs,
+            optional_inputs,
+            optional_input_size=0):
+        '''execute the chosen function and inputs.
+        Below is an example of chosen function and inputs:
+        {
+            "mutable_id": {
+                "mutable_layer_id": {
+                    "chosen_layer": "pool",
+                    "chosen_inputs": ["out1", "out3"]
+                }
+            }
+        }
+        Parameters:
+        ---------------
+        mutable_id: the name of this mutable_layer block (which could have multiple mutable layers)
+        mutable_layer_id: the name of a mutable layer in this block
+        funcs: dict of function calls
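+        funcs_args: dict mapping each function call to its keyword arguments
+        fixed_inputs: list of inputs that are always passed to the chosen function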
+        optional_inputs: dict of optional inputs
+        optional_input_size: number of candidate inputs to be chosen
+        '''
+        mutable_block = _get_param(mutable_id)
+        chosen_layer = mutable_block[mutable_layer_id]["chosen_layer"]
+        chosen_inputs = mutable_block[mutable_layer_id]["chosen_inputs"]
+        real_chosen_inputs = [optional_inputs[input_name] for input_name in chosen_inputs]
+        layer_out = funcs[chosen_layer]([fixed_inputs, real_chosen_inputs], **funcs_args[chosen_layer])
+        return layer_out
    def _get_param(key):
        if trial._params is None:
            trial.get_next_parameter()
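
The dispatch performed by `mutable_layer` can be tried outside NNI with a standalone sketch. Everything below is illustrative: `get_param` is a hypothetical stand-in for the internal `_get_param`, the layer functions `scale` and `total` and the dictionary keys are made up, and keyword arguments are unpacked with `**`, which is an assumption about the intended behaviour when `funcs_args` maps each call to a dict of keyword arguments.

```python
# Hypothetical architecture choice, in the format described in the docstring.
chosen_architecture = {
    "mutable_block_1": {
        "mutable_layer_0": {
            "chosen_layer": "scale(factor=3)",
            "chosen_inputs": ["b"],
        }
    }
}


def get_param(key):
    """Stand-in for NNI's internal parameter lookup (assumption)."""
    return chosen_architecture[key]


def mutable_layer(mutable_id, mutable_layer_id, funcs, funcs_args,
                  fixed_inputs, optional_inputs, optional_input_size=0):
    """Call the chosen function with [fixed_inputs, chosen_inputs]."""
    choice = get_param(mutable_id)[mutable_layer_id]
    chosen = choice["chosen_layer"]
    chosen_inputs = [optional_inputs[name] for name in choice["chosen_inputs"]]
    return funcs[chosen]([fixed_inputs, chosen_inputs], **funcs_args[chosen])


def scale(inputs, factor):
    # `inputs` follows the documented convention: [fixed_inputs, chosen_inputs].
    fixed, optional = inputs
    return factor * (sum(fixed) + sum(optional))


def total(inputs):
    fixed, optional = inputs
    return sum(fixed) + sum(optional)


a, b, c = 1, 2, 4
out = mutable_layer(
    "mutable_block_1", "mutable_layer_0",
    funcs={"scale(factor=3)": scale, "total()": total},
    funcs_args={"scale(factor=3)": {"factor": 3}, "total()": {}},
    fixed_inputs=[a],
    optional_inputs={"b": b, "c": c},
    optional_input_size=1,
)
print(out)  # 3 * (1 + 2) = 9
```

Note that each layer function unpacks `inputs` into its fixed and optional parts before using them, since it receives a two-element list rather than a single value.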

--- a/tools/nni_annotation/code_generator.py
+++ b/tools/nni_annotation/code_generator.py
@@ -25,6 +25,94 @@ from nni_cmd.common_utils import print_warning
 # pylint: disable=unidiomatic-typecheck
+def parse_annotation_mutable_layers(code, lineno):
+    """Parse the string of mutable layers in annotation.
+    Return a list of AST Assign nodes, one per mutable layer
+    code: annotation string (excluding '@')
+    lineno: line number of the annotation, used to name the mutable block
+    """
+    module = ast.parse(code)
+    assert type(module) is ast.Module, 'internal error #1'
+    assert len(module.body) == 1, 'Annotation mutable_layers contains more than one expression'
+    assert type(module.body[0]) is ast.Expr, 'Annotation is not expression'
+    call = module.body[0].value
+    nodes = []
+    mutable_id = 'mutable_block_' + str(lineno)
+    mutable_layer_cnt = 0
+    for arg in call.args:
+        fields = {'layer_choice': False,
+                  'fixed_inputs': False,
+                  'optional_inputs': False,
+                  'optional_input_size': False,
+                  'layer_output': False}
+        for k, value in zip(arg.keys, arg.values):
+            if k.id == 'layer_choice':
+                assert not fields['layer_choice'], 'Duplicated field: layer_choice'
+                assert type(value) is ast.List, 'Value of layer_choice should be a list'
+                call_funcs_keys = []
+                call_funcs_values = []
+                call_kwargs_values = []
+                for call in value.elts:
+                    assert type(call) is ast.Call, 'Element in layer_choice should be function call'
+                    call_name = astor.to_source(call).strip()
+                    call_funcs_keys.append(ast.Str(s=call_name))
+                    call_funcs_values.append(call.func)
+                    assert not call.args, 'Number of args without keyword should be zero'
+                    kw_args = []
+                    kw_values = []
+                    for kw in call.keywords:
+                        kw_args.append(ast.Str(s=kw.arg))
+                        kw_values.append(kw.value)
+                    call_kwargs_values.append(ast.Dict(keys=kw_args, values=kw_values))
+                call_funcs = ast.Dict(keys=call_funcs_keys, values=call_funcs_values)
+                call_kwargs = ast.Dict(keys=call_funcs_keys, values=call_kwargs_values)
+                fields['layer_choice'] = True
+            elif k.id == 'fixed_inputs':
+                assert not fields['fixed_inputs'], 'Duplicated field: fixed_inputs'
+                assert type(value) is ast.List, 'Value of fixed_inputs should be a list'
+                fixed_inputs = value
+                fields['fixed_inputs'] = True
+            elif k.id == 'optional_inputs':
+                assert not fields['optional_inputs'], 'Duplicated field: optional_inputs'
+                assert type(value) is ast.List, 'Value of optional_inputs should be a list'
+                var_names = [ast.Str(s=astor.to_source(var).strip()) for var in value.elts]
+                optional_inputs = ast.Dict(keys=var_names, values=value.elts)
+                fields['optional_inputs'] = True
+            elif k.id == 'optional_input_size':
+                assert not fields['optional_input_size'], 'Duplicated field: optional_input_size'
+                assert type(value) is ast.Num, 'Value of optional_input_size should be a number'
+                optional_input_size = value
+                fields['optional_input_size'] = True
+            elif k.id == 'layer_output':
+                assert not fields['layer_output'], 'Duplicated field: layer_output'
+                assert type(value) is ast.Name, 'Value of layer_output should be ast.Name type'
+                layer_output = value
+                fields['layer_output'] = True
+            else:
+                raise AssertionError('Unexpected field in mutable layer')
+        # make call for this mutable layer
+        assert fields['layer_choice'], 'layer_choice must exist'
+        assert fields['layer_output'], 'layer_output must exist'
+        mutable_layer_id = 'mutable_layer_' + str(mutable_layer_cnt)
+        mutable_layer_cnt += 1
+        target_call_attr = ast.Attribute(value=ast.Name(id='nni', ctx=ast.Load()), attr='mutable_layer', ctx=ast.Load())
+        target_call_args = [ast.Str(s=mutable_id),
+                            ast.Str(s=mutable_layer_id),
+                            call_funcs,
+                            call_kwargs]
+        if fields['fixed_inputs']:
+            target_call_args.append(fixed_inputs)
+        else:
+            target_call_args.append(ast.NameConstant(value=None))
+        if fields['optional_inputs']:
+            target_call_args.append(optional_inputs)
+            assert fields['optional_input_size'], 'optional_input_size must exist when optional_inputs exists'
+            target_call_args.append(optional_input_size)
+        else:
+            target_call_args.append(ast.NameConstant(value=None))
+        target_call = ast.Call(func=target_call_attr, args=target_call_args, keywords=[])
+        node = ast.Assign(targets=[layer_output], value=target_call)
+        nodes.append(node)
+    return nodes
 def parse_annotation(code):
    """Parse an annotation string.
@@ -235,6 +323,9 @@ class Transformer(ast.NodeTransformer):
                or string.startswith('@nni.get_next_parameter('):
            return parse_annotation(string[1:])  # expand annotation string to code
+        if string.startswith('@nni.mutable_layers('):
+            return parse_annotation_mutable_layers(string[1:], node.lineno)
        if string.startswith('@nni.variable(') \
                or string.startswith('@nni.function_choice('):
            self.stack[-1] = string[1:]  # mark that the next expression is annotated
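
The parser above relies on the fact that a `mutable_layers` annotation is syntactically valid Python: each layer is a dict display whose keys are bare names. The sketch below shows this with the standard `ast` module only. The annotation text and the helper name `list_fields` are made up for illustration, and the sketch only reports node types rather than generating code.

```python
import ast

# Hypothetical annotation text, with the leading '@' already stripped.
ANNOTATION = """nni.mutable_layers(
{
    layer_choice: [conv(ch=128), pool(), identity()],
    optional_inputs: [out1, out2],
    optional_input_size: 1,
    layer_output: out3
}
)"""


def list_fields(code):
    """Return, per mutable layer, a dict of field name -> AST node type name."""
    module = ast.parse(code)
    call = module.body[0].value  # the nni.mutable_layers(...) call
    layers = []
    for arg in call.args:  # each positional argument is an ast.Dict
        fields = {}
        for key, value in zip(arg.keys, arg.values):
            # Keys are bare names, so they parse as ast.Name nodes.
            fields[key.id] = type(value).__name__
        layers.append(fields)
    return layers


layers = list_fields(ANNOTATION)
for name, node_type in layers[0].items():
    print(name, node_type)
```

Because the field names parse as `ast.Name` nodes, the parser can read them through `k.id` and check the type of each value before building the `nni.mutable_layer` call.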

--- a/tools/nni_annotation/search_space_generator.py
+++ b/tools/nni_annotation/search_space_generator.py
@@ -38,7 +38,8 @@ _ss_funcs = [
    'qnormal',
    'lognormal',
    'qlognormal',
-    'function_choice'
+    'function_choice',
+    'mutable_layer'
 ]
@@ -50,6 +51,18 @@ class SearchSpaceGenerator(ast.NodeTransformer):
        self.search_space = {}
        self.last_line = 0  # last parsed line, useful for error reporting
+    def generate_mutable_layer_search_space(self, args):
+        mutable_block = args[0].s
+        mutable_layer = args[1].s
+        if mutable_block not in self.search_space:
+            self.search_space[mutable_block] = dict()
+        self.search_space[mutable_block][mutable_layer] = {
+            'layer_choice': [key.s for key in args[2].keys],
+            'optional_inputs': [key.s for key in args[5].keys],
+            'optional_input_size': args[6].n
+        }
    def visit_Call(self, node):  # pylint: disable=invalid-name
        self.generic_visit(node)
@@ -68,6 +81,10 @@ class SearchSpaceGenerator(ast.NodeTransformer):
        self.last_line = node.lineno
+        if func == 'mutable_layer':
+            self.generate_mutable_layer_search_space(node.args)
+            return node
        if node.keywords:
            # there is a `name` argument
            assert len(node.keywords) == 1, 'Smart parameter has keyword argument other than "name"'
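
The way `generate_mutable_layer_search_space` accumulates several layers under one block can be sketched with plain values instead of AST nodes. The helper name `add_mutable_layer`, the block name `mutable_block_12`, and the layer entries are hypothetical; the sketch only mirrors the nesting of the resulting search space.

```python
def add_mutable_layer(search_space, block, layer, layer_choice,
                      optional_inputs, optional_input_size):
    """Record one mutable layer under its block, creating the block if needed."""
    search_space.setdefault(block, {})[layer] = {
        "layer_choice": list(layer_choice),
        "optional_inputs": list(optional_inputs),
        "optional_input_size": optional_input_size,
    }
    return search_space


space = {}
choices = ["add_one()", "add_two()"]
add_mutable_layer(space, "mutable_block_12", "mutable_layer_0", choices, ["images"], 1)
add_mutable_layer(space, "mutable_block_12", "mutable_layer_1", choices, ["layer_1_out"], 1)
print(space)
```

Both layers end up under the same block key, which matches the `mutable_1` / `layer_1` / `layer_2` nesting of the documented search space file.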

--- a/tools/nni_annotation/testcase/mutable_layer_usercode/simple.py
+++ b/tools/nni_annotation/testcase/mutable_layer_usercode/simple.py
+import time
+def add_one(inputs):
+    return inputs + 1
+def add_two(inputs):
+    return inputs + 2
+def add_three(inputs):
+    return inputs + 3
+def add_four(inputs):
+    return inputs + 4
+def main():
+    images = 5
+    """@nni.mutable_layers(
+    {
+        layer_choice: [add_one(), add_two(), add_three(), add_four()],
+        optional_inputs: [images],
+        optional_input_size: 1,
+        layer_output: layer_1_out
+    },
+    {
+        layer_choice: [add_one(), add_two(), add_three(), add_four()],
+        optional_inputs: [layer_1_out],
+        optional_input_size: 1,
+        layer_output: layer_2_out
+    },
+    {
+        layer_choice: [add_one(), add_two(), add_three(), add_four()],
+        optional_inputs: [layer_1_out, layer_2_out],
+        optional_input_size: 1,
+        layer_output: layer_3_out
+    }
+    )"""
+    """@nni.report_intermediate_result(layer_1_out)"""
+    time.sleep(2)
+    """@nni.report_intermediate_result(layer_2_out)"""
+    time.sleep(2)
+    """@nni.report_intermediate_result(layer_3_out)"""
+    time.sleep(2)
+    layer_3_out = layer_3_out + 10
+    """@nni.report_final_result(layer_3_out)"""
+if __name__ == '__main__':
+    main()