Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type `choice` in the [search space spec](SearchSpaceSpec.md).
Suggested scenario: If the configurations you want to try have been decided, you can list them in the search space file (using `choice`) and run them using the batch tuner.
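For instance, a search space that lists three hand-picked configurations might look like the sketch below; the hyper-parameter names are illustrative, and the high-level `combine_params` key follows the batch tuner's search space convention:

```json
{
    "combine_params": {
        "_type": "choice",
        "_value": [
            {"optimizer": "Adam", "learning_rate": 0.0001},
            {"optimizer": "Adam", "learning_rate": 0.001},
            {"optimizer": "SGD", "learning_rate": 0.01}
        ]
    }
}
```

The batch tuner then runs each listed configuration exactly once, in order.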
NNI provides state-of-the-art algorithms as built-in assessors and makes them easy to use. Below is a brief overview of NNI's current built-in Assessors:
Note: Click the **Assessor's name** to get the Assessor's installation requirements, suggested scenario and usage example. The link to a detailed description of the algorithm is at the end of the suggested scenario of each Assessor.
Currently we support the following Assessors:
...
...
Note: Please follow the format when you write your `config.yml` file.
**Suggested scenario**
It is applicable to a wide range of performance curves, and thus can be used in various scenarios to speed up the tuning progress. [Detailed Description](./MedianstopAssessor.md)
**Requirement of classArg**
...
...
**Suggested scenario**
It is applicable to a wide range of performance curves, and thus can be used in various scenarios to speed up the tuning progress. Even better, it is able to handle and assess curves with similar performance. [Detailed Description](./CurvefittingAssessor.md)
NNI provides state-of-the-art tuning algorithms as built-in tuners and makes them easy to use. Below is a brief summary of NNI's current built-in Tuners:
Note: Click the **Tuner's name** to get the Tuner's installation requirements, suggested scenario and usage example. The link to a detailed description of the algorithm is at the end of the suggested scenario of each tuner. Here is an [article](./CommunitySharings/HpoComparision.md) about the comparison of different Tuners on several problems.
Currently we support the following algorithms:
...
...
Note: Please follow the format when you write your `config.yml` file.
**Suggested scenario**
TPE, as a black-box optimization, can be used in various scenarios and shows good performance in general, especially when you have limited computation resources and can only try a small number of trials. In a large number of experiments, we have found that TPE is far better than Random Search. [Detailed Description](./HyperoptTuner.md)
**Requirement of classArg**
...
...
**Suggested scenario**
Random search is suggested when each trial does not take too long (e.g., each trial can be completed very soon, or is stopped early by the assessor), and you have enough computation resources. It is also useful when you want to uniformly explore the search space. Random Search can be considered a baseline search algorithm. [Detailed Description](./HyperoptTuner.md)
**Requirement of classArg**
...
...
**Suggested scenario**
Anneal is suggested when each trial does not take too long and you have enough computation resources (almost the same as Random Search), or when the variables in the search space can be sampled from some prior distribution. [Detailed Description](./HyperoptTuner.md)
**Requirement of classArg**
...
...
**Suggested scenario**
Its requirement for computation resources is relatively high. Specifically, it requires a large initial population to avoid falling into a local optimum. If your trial is short or leverages an assessor, this tuner is a good choice. It is especially suggested when your trial code supports weight transfer, i.e., the trial can inherit the converged weights from its parent(s). This can greatly speed up the training progress. [Detailed Description](./EvolutionTuner.md)
Similar to TPE, SMAC is also a black-box tuner that can be tried in various scenarios, and is suggested when computation resources are limited. It is optimized for discrete hyperparameters, and is thus suggested when most of your hyperparameters are discrete. [Detailed Description](./SmacTuner.md)
**Requirement of classArg**
...
...
**Suggested scenario**
If the configurations you want to try have been decided, you can list them in the search space file (using `choice`) and run them using the batch tuner. [Detailed Description](./BatchTuner.md)
**Usage example**
...
...
Note that the only acceptable types of search space are `choice`, `quniform` and `qloguniform`. **The number `q` in `quniform` and `qloguniform` has a special meaning (different from the spec in [search space spec](./SearchSpaceSpec.md)): it means the number of values that will be sampled evenly from the range between `low` and `high`.**
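For example, a grid search space mixing the three acceptable types might look like the following sketch (the parameter names are illustrative); here `q = 4` requests four evenly sampled values:

```json
{
    "dropout_rate": {"_type": "quniform", "_value": [0.1, 0.5, 4]},
    "learning_rate": {"_type": "qloguniform", "_value": [0.0001, 0.1, 4]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]}
}
```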
It is suggested when the search space is small; in that case it is feasible to exhaustively sweep the whole search space. [Detailed Description](./GridsearchTuner.md)
**Usage example**
...
...
**Suggested scenario**
It is suggested when you have limited computation resources but a relatively large search space. It performs well in scenarios where the intermediate result (e.g., accuracy) can reflect the quality of the final result to some extent. [Detailed Description](./HyperbandAdvisor.md)
**Requirement of classArg**
...
...
NetworkMorphism requires [pyTorch](https://pytorch.org/get-started/locally), so users should install it first.
**Suggested scenario**
It is suggested when you want to apply deep learning methods to your task (your own dataset) but have no idea how to choose or design a network. You can modify the [example](https://github.com/Microsoft/nni/tree/master/examples/trials/network_morphism/cifar10/cifar10_keras.py) to fit your own dataset and your own data augmentation method. You can also change the batch size, learning rate or optimizer. This tuner is feasible for finding a good network architecture for different tasks. Currently, it only supports the computer vision domain. [Detailed Description](./NetworkmorphismTuner.md)
**Requirement of classArg**
...
...
Metis Tuner requires [sklearn](https://scikit-learn.org/), so users should install it first.
**Suggested scenario**
Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as random search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](https://github.com/Microsoft/nni/tree/master/examples/trials/auto-gbdt/search_space_metis.json) about the use of Metis. Users only need to send the final result, such as `accuracy`, to the tuner by calling the NNI SDK, as sketched below. [Detailed Description](./MetisTuner.md)
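A minimal sketch of that reporting step is shown below; `train_and_evaluate` is a hypothetical user-defined function standing in for the actual trial logic:

```python
import nni

# Receive the hyperparameters chosen by the tuner for this trial.
params = nni.get_next_parameter()

# Train and evaluate with these hyperparameters (user-defined, assumed here).
accuracy = train_and_evaluate(params)

# Send the final result back to the tuner through the NNI SDK.
nni.report_final_result(accuracy)
```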
Similar to Hyperband, it is suggested when you have limited computation resources but a relatively large search space. It performs well in scenarios where the intermediate result (e.g., accuracy) can reflect the quality of the final result to some extent. In this case, it may converge to a better configuration thanks to its use of Bayesian optimization. [Detailed Description](./BohbAdvisor.md)
In this tutorial, we first introduce a GitHub repo, [Recommenders](https://github.com/Microsoft/Recommenders). It is a repository that provides examples and best practices for building recommendation systems, offered as Jupyter notebooks. It contains various models that are popular and widely deployed in recommendation systems. To provide a complete end-to-end experience, each example is presented in terms of five key tasks, as shown below:
- [Prepare Data](https://github.com/Microsoft/Recommenders/blob/master/notebooks/01_prepare_data/README.md): Preparing and loading data for each recommender algorithm.
- [Model](https://github.com/Microsoft/Recommenders/blob/master/notebooks/02_model/README.md): Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares ([ALS](https://spark.apache.org/docs/latest/api/python/_modules/pyspark/ml/recommendation.html#ALS)) or eXtreme Deep Factorization Machines ([xDeepFM](https://arxiv.org/abs/1803.05170)).
- [Evaluate](https://github.com/Microsoft/Recommenders/blob/master/notebooks/03_evaluate/README.md): Evaluating algorithms with offline metrics.
- [Model Select and Optimize](https://github.com/Microsoft/Recommenders/blob/master/notebooks/04_model_select_and_optimize/README.md): Tuning and optimizing hyperparameters for recommender models.
- [Operationalize](https://github.com/Microsoft/Recommenders/blob/master/notebooks/05_operationalize/README.md): Operationalizing models in a production environment on Azure.
The fourth task is tuning and optimizing the model's hyperparameters; this is where NNI can help. To give a concrete example of NNI tuning the models in Recommenders, let's demonstrate with the [SVD](https://github.com/Microsoft/Recommenders/blob/master/notebooks/02_model/surprise_svd_deep_dive.ipynb) model and the Movielens100k data. There are more than 10 hyperparameters to be tuned in this model.
[This Jupyter notebook](https://github.com/Microsoft/Recommenders/blob/master/notebooks/04_model_select_and_optimize/nni_surprise_svd.ipynb) provided by Recommenders is a very detailed step-by-step tutorial for this example. It uses different built-in tuning algorithms in NNI, including `Annealing`, `SMAC`, `Random Search`, `TPE`, `Hyperband`, `Metis` and `Evolution`, and finally compares the results of the different tuning algorithms. Please go through this notebook to learn how to use NNI to tune the SVD model; you can then use NNI to tune other models in Recommenders.
When the duration of an experiment reaches the maximum duration, nniManager will not create new trials, but existing trials will continue running unless the user manually stops the experiment. The relevant config fields are sketched below.
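The maximum duration is controlled by the `maxExecDuration` field in `config.yml`; a minimal sketch with illustrative values:

```yaml
maxExecDuration: 1h   # no new trials are created after one hour
maxTrialNum: 100      # the experiment also stops after 100 trials
```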
### Could not stop an experiment using `nnictl stop`
If you upgrade your NNI or delete some config files of NNI while an experiment is running, this kind of issue may happen because of the loss of config files. You can use `ps -ef | grep node` to find the PID of your experiment, and use `kill -9 {pid}` to kill it manually.
### Could not get `default metric` in WebUI of virtual machines
Configure the network mode to bridge mode or another mode that makes the virtual machine accessible from an external machine, and make sure the port of the virtual machine is not blocked by the firewall.
...
...
Being unable to open the WebUI may have the following reasons:
* http://127.0.0.1, http://172.17.0.1 and http://10.0.0.15 refer to localhost. If you start your experiment on a server or remote machine, you can replace the IP with your server IP to view the WebUI, like http://[your_server_ip]:8080
* If you still can't see the WebUI after using the server IP, you can check the proxy and firewall settings of your machine, or use the browser on the machine where you started your NNI experiment.
* Another reason may be that your experiment failed and NNI could not get the experiment information. You can check the NNIManager log in the following directory: ~/nni/experiment/[your_experiment_id]/log/nnimanager.log
Automatic neural architecture search is playing an increasingly important role in finding better models. Recent research has proved the feasibility of automatic NAS and has found models that beat manually designed and tuned ones. Representative works include [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5], and new innovations keep emerging. However, it takes great effort to implement those algorithms, and it is hard to reuse the code base of one algorithm to implement another.
To facilitate NAS innovations (e.g., designing/implementing new NAS models, comparing different NAS models side-by-side), an easy-to-use and flexible programming interface is crucial.
## Programming interface
...
...
We designed a simple and flexible programming interface based on [NNI annotation](./AnnotationSpec.md). It is elaborated through examples below.
### Example: choose an operator for a layer
When designing the following model, there might be several choices in the fourth layer that could make this model perform well. In the script of this model, we can use an annotation for the fourth layer as shown in the figure. In this annotation, there are five fields in total:

* __layer_choice__: It is a list of function calls; each function should be defined in the user's script or in imported libraries. The input arguments of the function should follow the format `def XXX(inputs, arg2, arg3, ...)`, where inputs is a list with two elements: one is the list of `fixed_inputs`, and the other is a list of the chosen inputs from `optional_inputs`. `conv` and `pool` in the figure are examples of function definitions. For the function calls in this list, there is no need to write the first argument (i.e., inputs). Note that only one of the function calls is chosen for this layer.
* __fixed_inputs__: It is a list of variables, where a variable could be an output tensor from a previous layer. It could be the `layer_output` of another `nni.mutable_layer` before this layer, or another python variable defined before this layer. All the variables in this list will be fed into the chosen function in `layer_choice` (as the first element of the input list).
* __optional_inputs__: It is a list of variables, where a variable could be an output tensor from a previous layer. It could be the `layer_output` of another `nni.mutable_layer` before this layer, or another python variable defined before this layer. Only `optional_input_size` variables will be fed into the chosen function in `layer_choice` (as the second element of the input list).
* __optional_input_size__: It indicates how many inputs are chosen from `optional_inputs`. It could be a number or a range. A range [1,3] means it chooses 1, 2, or 3 inputs.
* __layer_output__: The name of the output(s) of this layer; in this case it represents the return value of the chosen function call in `layer_choice`. This will be a variable name that can be used in the following python code or `nni.mutable_layer`(s).
There are two ways to write the annotation for this example. For the upper one, the input of the function calls is `[[],[out3]]`. For the bottom one, the input is `[[out3],[]]`.
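To make the signature convention concrete, below is a hedged sketch of two operator functions that could appear in `layer_choice`; `conv` and `pool` are hypothetical stand-ins for the functions in the figure, and the exact annotation syntax is specified in [NNI annotation](./AnnotationSpec.md):

```python
import tensorflow as tf

# inputs is a two-element list: [fixed_inputs, chosen optional_inputs].
# Each function concatenates whatever tensors it receives before its op.
def conv(inputs, size=3):
    x = tf.concat(inputs[0] + inputs[1], axis=-1)
    return tf.layers.conv2d(x, filters=32, kernel_size=size, padding='same')

def pool(inputs, size=2):
    x = tf.concat(inputs[0] + inputs[1], axis=-1)
    return tf.layers.max_pooling2d(x, pool_size=size, strides=size)
```

With such definitions, `layer_choice` could list `conv(size=3)`, `conv(size=5)` and `pool(size=2)`, omitting the first inputs argument as described above.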
__Debugging__: We provide an `nnictl trial codegen` command to help debug your NAS programming code on NNI. If the trial with trial_id `XXX` in your experiment `YYY` fails, you can run `nnictl trial codegen YYY --trial_id XXX` to generate executable code for this trial under your current directory. With this code, you can run the trial command directly without NNI to check why the trial failed. Basically, this command compiles your trial code and replaces the NNI NAS code with the actual chosen layers and inputs.
...
...
Designing connections of layers is critical for making a high-performance model.
### Example: choose both operators and connections
In this example, we choose one from the three operators and choose two connections for it. As there are multiple variables in inputs, we call `concat` at the beginning of the functions.

...
...

## Unified NAS search space specification
After finishing the trial code through the annotations above, users have implicitly specified the search space of neural architectures in the code. Based on the code, NNI will automatically generate a search space file that can be fed into tuning algorithms. This search space file uses the following JSON format.
```json
{
...
...
}
```
With this specification of the format of the search space and the architecture (choice) expression, users are free to implement various (general) tuning algorithms for neural architecture search on NNI. One future work is to provide a general NAS algorithm.
With the same annotated trial code, users could choose One-Shot NAS as the execution mode on NNI.

The design of One-Shot NAS on NNI is shown in the above figure. One-Shot NAS usually has only one trial job with the full graph. NNI supports running multiple such trial jobs, each of which runs independently. As One-Shot NAS is not stable, running multiple instances helps find better models. Moreover, trial jobs are also able to synchronize weights during running (i.e., there is only one copy of the weights, like asynchronous parameter-server mode). This may speed up convergence.
Example of One-Shot NAS on NNI.
...
...
## Conclusion and Future work
There could be different NAS algorithms and execution modes, but they could be supported with the same programming interface as demonstrated above.
There are many interesting research topics in this area, both in systems and in machine learning.
Currently we support installation on Linux, Mac and Windows (local, remote and pai mode).
You can also install NNI in a Docker image. Please follow the instructions [here](https://github.com/Microsoft/nni/tree/master/deployment/docker/README.md) to build the NNI Docker image. The NNI Docker image can also be retrieved from Docker Hub through the command `docker pull msranni/nni:latest`.
## **Installation on Windows**
When you use PowerShell to run a script for the first time, you need to **run PowerShell as administrator** with this command:
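The command is most likely the standard execution-policy unlock shown below; this is an assumption based on common PowerShell setup, so check the full installation guide for the exact invocation:

```powershell
# Allow locally created scripts to run (requires an administrator PowerShell).
Set-ExecutionPolicy -ExecutionPolicy Unrestricted
```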
All types of sampling strategies and their parameters are listed here:
* {"_type":"randint","_value":[lower, upper]}
* For now, we implement the "randint" distribution with "quniform", which means the variable value is a value like round(uniform(lower, upper)). The type of the chosen value is float. If you want to use an integer value, please convert it explicitly (see the sketch below).
* {"_type":"uniform","_value":[low, high]}
* This means the variable value is a value uniformly distributed between low and high.
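As noted for "randint" above, the explicit integer conversion in trial code could look like this sketch (`num_layers` is an illustrative parameter name):

```python
import nni

params = nni.get_next_parameter()
# "randint" currently yields a float like round(uniform(lower, upper));
# cast explicitly when an integer is required.
num_layers = int(params["num_layers"])
```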
...
...
Known Limitations:
* Note that in the Grid Search Tuner, for users' convenience, the definitions of `quniform` and `qloguniform` change: here, q specifies the number of values that will be sampled. Details are listed as follows (a worked example follows this list):
* Type 'quniform' will receive three values [low, high, q], where [low, high] specifies a range and 'q' specifies the number of values that will be sampled evenly. Note that q should be at least 2. It will be sampled in a way that the first sampled value is 'low', and each of the following values is (high-low)/q larger than the value in front of it.
* Type 'qloguniform' behaves like 'quniform' except that it will first change the range to [log(low), log(high)], sample, and then change the sampled value back.
* Note that Metis Tuner only supports numerical `choice` now.
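As a worked example of the `quniform` rule above, consider the assumed entry below (`hidden_units` is an illustrative name): with low = 0, high = 10 and q = 5, the sampled values would be 0, 2, 4, 6, 8, each (10-0)/5 = 2 larger than the previous one, starting from low.

```json
{
    "hidden_units": {"_type": "quniform", "_value": [0, 10, 5]}
}
```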
Weight assignment (transfer) plays a key role in speeding up NAS, while finding an efficient way to share weights is still a hot research topic. NNI provides a key-value store for saving and loading weights; Tuners and Trials use a KV client library to access the store.
[**To be implemented**] Example of weight sharing on NNI.
### Support of One-Shot NAS
One-Shot NAS is a popular approach to finding a good neural architecture within a limited time and resource budget. Essentially, it builds a full graph based on the search space and uses gradient descent to ultimately find the best subgraph. There are different training approaches, such as [training subgraphs (per mini-batch)](https://arxiv.org/abs/1802.03268), [training the full graph through dropout](http://proceedings.mlr.press/v80/bender18a/bender18a.pdf), and [training with architecture weights (regularization)](https://arxiv.org/abs/1806.09055). Here we focus on the first approach, i.e., training subgraphs (ENAS).