In classic NAS algorithms, each architecture is trained as a trial and the NAS algorithm acts as a tuner. Thus, this training mode naturally fits within the NNI hyper-parameter tuning framework, where the tuner generates a new architecture for the next trial and trials run in the training service.
## Quick Start
The following example shows how to use classic NAS algorithms. You can see it is quite similar to NNI hyper-parameter tuning.
```python
import nni
from nni.nas.pytorch.classic_nas import get_and_apply_next_architecture

model = Net()
# get the chosen architecture from tuner and apply it on model
get_and_apply_next_architecture(model)
train(model)  # your code for training the model
acc = test(model)  # test the trained model
nni.report_final_result(acc)  # report the performance of the chosen architecture
```
First, instantiate the model. The search space has been defined in this model through `LayerChoice` and `InputChoice`. After that, users should invoke `get_and_apply_next_architecture(model)` to settle on a specific architecture. This function receives the architecture from the tuner (i.e., the classic NAS algorithm) and applies it to `model`. At this point, `model` becomes a specific architecture rather than a search space. Users are then free to train this model just like a normal PyTorch model. After obtaining the accuracy of this model, users should invoke `nni.report_final_result(acc)` to report the result to the tuner.
At this point, the trial code is ready. Next, we prepare an NNI experiment, i.e., a search space file and an experiment config file. Unlike NNI hyper-parameter tuning, the search space file is automatically generated from the trial code by running the following command (the detailed usage of this command can be found [here](../Tutorial/Nnictl.md)):
`nnictl ss_gen --trial_command="the command for running your trial code"`
A file named `nni_auto_gen_search_space.json` is generated by this command. Then put the path of the generated search space in the field `searchSpacePath` of the experiment config file. The other fields of the config file can be filled in by referring to [this tutorial](../Tutorial/QuickStart.md).
Currently, we only support [PPO Tuner](../Tuner/BuiltinTuner.md) and [random tuner](https://github.com/microsoft/nni/tree/master/examples/tuners/random_nas_tuner) for classic NAS. More classic NAS algorithms will be supported soon.
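To make the division of labor between tuner and trial concrete, here is a minimal, NNI-free sketch of the loop that a random tuner drives. The `SEARCH_SPACE` dictionary, `sample_architecture`, and `fake_trial` are hypothetical stand-ins rather than NNI APIs, and the scores are made up; in a real experiment the tuner runs inside NNI and each trial is a separate job.

```python
import random

# Hypothetical search space: each key names a choice, each value lists its candidates.
SEARCH_SPACE = {
    "conv1": ["conv3x3", "conv5x5"],
    "skipcon": ["none", "skip"],
}


def sample_architecture(space, rng):
    """Tuner side: pick one candidate per choice uniformly at random."""
    return {key: rng.choice(candidates) for key, candidates in space.items()}


def fake_trial(architecture):
    """Trial side: stand-in for train + test; returns a made-up score."""
    score = 0.90
    if architecture["conv1"] == "conv5x5":
        score += 0.02
    if architecture["skipcon"] == "skip":
        score += 0.01
    return score


def random_search(space, num_trials, seed=0):
    """Run `num_trials` independent trials and keep the best architecture seen."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(space, rng)
        score = fake_trial(arch)  # a real trial would report this to the tuner
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


best_arch, best_score = random_search(SEARCH_SPACE, num_trials=20)
print(best_arch, round(best_score, 2))
```

Each iteration corresponds to one trial: the tuner proposes an architecture, the trial evaluates it, and only the final score flows back.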
The complete examples can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/classic_nas) for PyTorch and [here](https://github.com/microsoft/nni/tree/master/examples/nas/classic_nas-tf) for TensorFlow.
## Standalone mode for easy debugging
We support a standalone mode for easy debugging, where you can directly run the trial command without launching an NNI experiment. This is for checking whether your trial code can correctly run. The first candidate(s) are chosen for `LayerChoice` and `InputChoice` in this standalone mode.
Besides [classic NAS algorithms](./ClassicNas.md), users can also apply more advanced one-shot NAS algorithms to find better models from a search space. There is a lot of related work on one-shot NAS algorithms, such as [SMASH][8], [ENAS][2], [DARTS][1], [FBNet][3], [ProxylessNAS][4], [SPOS][5], [Single-Path NAS][6], [Understanding One-shot][7] and [GDAS][9]. One-shot NAS algorithms usually build a supernet containing every candidate in the search space as its subnetwork, and in each step, a subnetwork or combination of several subnetworks is trained.
```eval_rst
.. contents::

.. Note:: The APIs are in an experimental stage. The current programming interface is subject to change.
```

Modern Neural Architecture Search (NAS) methods usually incorporate [three dimensions][1]: search space, search strategy, and performance estimation strategy. Search space often contains a limited number of neural network architectures to explore, while the search strategy samples architectures from search space, gets estimations of their performance, and evolves itself. Ideally, the search strategy should find the best architecture in the search space and report it to users. After users obtain the "best architecture", many methods use a "retrain step", which trains the network with the same pipeline as any traditional model.
## Implement a Search Space
Assuming we have a baseline model, how do we enable NAS on it? Taking [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example, the code might look like this:
```python
import torch.nn as nn
import torch.nn.functional as F
from nni.nas.pytorch import mutables


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = mutables.LayerChoice([
            nn.Conv2d(1, 32, 3, 1),
            # stride 1 with padding=1 keeps the 5x5 output the same size as the 3x3 output
            nn.Conv2d(1, 32, 5, 1, padding=1)
        ])  # try 3x3 kernel and 5x5 kernel
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        # ... same as original ...
        return output
```
The example above adds an option of choosing conv5x5 at conv1. The modification is as simple as declaring a `LayerChoice` with the original conv3x3 and a new conv5x5 as its parameter. That's it! You don't have to modify the forward function in any way. You can imagine conv1 as any other module without NAS.
So how about the possibilities of connections? This can be done using `InputChoice`. To allow for a skip connection on the MNIST example, we add another layer called conv3. In the following example, a possible connection from conv2 is added to the output of conv3.
```python
from nni.nas.pytorch import mutables


class Net(nn.Module):
    def __init__(self):
        # ... same ...
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.conv3 = nn.Conv2d(64, 64, 1, 1)
        # declaring that there is exactly one candidate to choose from
        # search strategy will choose one or None
        self.skipcon = mutables.InputChoice(n_candidates=1)
        # ... same ...

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x0 = self.skipcon([x])  # choose one or none from [x]
        x = self.conv3(x)
        if x0 is not None:  # skip connection is open
            x += x0
        x = F.max_pool2d(x, 2)
        # ... same ...
        return output
```
Input choice can be thought of as a callable module that receives a list of tensors and outputs the concatenation/sum/mean of some of them (sum by default), or `None` if none is selected. Like layer choices, input choices should be **initialized in `__init__` and called in `forward`**. We will see later that this is to allow search algorithms to identify these choices and do necessary preparations.
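The reduction behavior described above can be illustrated with a small, framework-free sketch. This is a toy model of the semantics only, not NNI's `InputChoice` implementation: `input_choice`, its `mask` argument, and the use of plain lists in place of tensors are all assumptions made for illustration.

```python
def input_choice(inputs, mask, reduction="sum"):
    """Toy model of an input choice over lists of floats (stand-ins for tensors).

    `mask[i]` says whether candidate i is selected.
    """
    chosen = [x for x, keep in zip(inputs, mask) if keep]
    if not chosen:
        return None  # nothing selected
    if reduction == "concat":
        return [v for x in chosen for v in x]
    summed = [sum(vals) for vals in zip(*chosen)]  # element-wise sum
    if reduction == "sum":
        return summed
    if reduction == "mean":
        return [v / len(chosen) for v in summed]
    raise ValueError(f"unknown reduction: {reduction}")


a, b = [1.0, 2.0], [3.0, 4.0]
print(input_choice([a, b], [True, True]))           # [4.0, 6.0]
print(input_choice([a, b], [True, False], "mean"))  # [1.0, 2.0]
print(input_choice([a, b], [False, False]))         # None
```

Because the result may be `None`, the caller has to guard against it, which is exactly what the `if x0 is not None` check in the `forward` function does.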
`LayerChoice` and `InputChoice` are both **mutables**. Mutable means "changeable". As opposed to traditional deep learning layers/modules, which have fixed operation types once defined, models with mutables are essentially a series of possible models.
Users can specify a **key** for each mutable. By default, NNI will assign one for you that is globally unique, but in case users want to share choices (for example, there are two `LayerChoice`s with the same candidate operations and you want them to have the same choice, i.e., if first one chooses the i-th op, the second one also chooses the i-th op), they can give them the same key. The key marks the identity for this choice and will be used in the dumped checkpoint. So if you want to increase the readability of your exported architecture, manually assigning keys to each mutable would be a good idea. For advanced usage on mutables, see [Mutables](./NasReference.md).
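The effect of sharing a key can be sketched without NNI. In the following toy example, `ToyChoice` and `sample_by_key` are hypothetical helpers rather than NNI classes; the point is only that decisions are stored per key, so two choices with the same key always resolve to the same candidate index.

```python
import random


class ToyChoice:
    """Hypothetical stand-in for a mutable: a key plus a list of candidate names."""

    def __init__(self, candidates, key):
        self.candidates = candidates
        self.key = key


def sample_by_key(choices, rng):
    """Sample one index per *key*, so choices sharing a key get the same index."""
    decisions = {}
    for choice in choices:
        if choice.key not in decisions:
            decisions[choice.key] = rng.randrange(len(choice.candidates))
    return decisions


ops = ["conv3x3", "conv5x5", "maxpool"]
layers = [
    ToyChoice(ops, key="shared_op"),  # these two share one decision
    ToyChoice(ops, key="shared_op"),
    ToyChoice(ops, key="last_op"),    # independent decision
]
decisions = sample_by_key(layers, random.Random(0))
selected = [layer.candidates[decisions[layer.key]] for layer in layers]
print(decisions)
print(selected)
```

The `decisions` dictionary also shows why readable keys help: it is the kind of key-to-choice mapping that ends up in an exported architecture.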
## Use a Search Algorithm
With a search space defined, there are at least two ways users can search it. One runs NAS in a distributed fashion, which can be as naive as enumerating all the architectures and training each one from scratch, or can involve more advanced techniques. Since training many different architectures is known to be expensive, another family of methods, called one-shot NAS (for example, [SMASH][8], [ENAS][2], [DARTS][1], [FBNet][3], [ProxylessNAS][4], [SPOS][5], [Single-Path NAS][6], [Understanding One-shot][7] and [GDAS][9]), builds a supernet containing every candidate in the search space as its subnetwork, and in each step trains a subnetwork or a combination of several subnetworks.
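To see why the naive approach becomes expensive, consider a sketch that enumerates every architecture in a small search space. The `search_space` dictionary and `enumerate_architectures` are hypothetical and independent of NNI; the number of architectures is the product of the number of candidates per choice, and each one would have to be trained from scratch.

```python
from itertools import product

# Hypothetical search space: three independent choices.
search_space = {
    "conv1": ["conv3x3", "conv5x5"],
    "conv2": ["conv3x3", "conv5x5"],
    "skipcon": [False, True],
}


def enumerate_architectures(space):
    """Yield every architecture as a dict mapping each choice key to one candidate."""
    keys = list(space)
    for combo in product(*(space[k] for k in keys)):
        yield dict(zip(keys, combo))


architectures = list(enumerate_architectures(search_space))
print(len(architectures))  # 2 * 2 * 2 = 8
print(architectures[0])
```

Adding one more binary choice doubles the count, which is the cost that one-shot methods try to avoid by sharing weights inside a supernet.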
Currently, several one-shot NAS methods are supported on NNI. Examples include `DartsTrainer`, which uses SGD to train architecture weights and model weights iteratively, and `EnasTrainer`, which [uses a controller to train the model][2]. New and more efficient NAS trainers keep emerging in the research community, and some will be implemented in future releases of NNI.
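As a rough illustration of what "architecture weights" means in DARTS, the following sketch shows the continuous relaxation described in the DARTS paper: each choice outputs a softmax-weighted sum of all candidate operations, and the candidate with the largest weight is kept after the search. It is a scalar toy example under that assumption, not the code of `DartsTrainer`; `mixed_op`, `candidate_ops`, and the `alphas` values are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, candidate_ops, alphas):
    """Weighted sum of all candidate outputs, weighted by softmax(alphas)."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, candidate_ops))


# Two toy "operations" on a scalar input.
candidate_ops = [lambda x: 2.0 * x, lambda x: x + 1.0]

# Equal architecture weights give a plain average of the candidates.
print(mixed_op(3.0, candidate_ops, [0.0, 0.0]))  # 0.5 * 6.0 + 0.5 * 4.0 = 5.0

# Made-up weights standing in for the result of a search:
# the candidate with the largest weight is the one that is kept.
trained_alphas = [0.2, 1.3]
best_index = max(range(len(trained_alphas)), key=lambda i: trained_alphas[i])
print(best_index)  # 1
```

In the actual algorithm, the `alphas` and the weights inside each operation are updated in alternating gradient steps, which is the iterative training mentioned above.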
### One-Shot NAS
Each one-shot NAS algorithm implements a trainer, for which users can find usage details in the description of each algorithm. Here is a simple example, demonstrating how users can use `EnasTrainer`.
```python
# ... (model, dataset, loss, and optimizer setup elided in the source) ...

def metrics_fn(output, target):  # the `def` line is elided in the source; this name is a placeholder
    # metrics function receives output and target and computes a dict of metrics
    return {"acc1": top1_accuracy(output, target)}

from nni.nas.pytorch import enas
trainer = enas.EnasTrainer(model,
                           # ... remaining arguments elided in the source ...
                           )
trainer.train()  # training
trainer.export(file="model_dir/final_architecture.json")  # export the final architecture to file
```
Users can directly run their training file through `python3 train.py` without `nnictl`. After training, users can export the best one of the found models through `trainer.export()`.
`model` is the one with [user defined search space](./WriteSearchSpace.md). Then users should prepare training data and model evaluation metrics. To search from the defined search space, a one-shot algorithm is instantiated, called trainer (e.g., EnasTrainer). The trainer exposes a few arguments that you can customize. For example, the loss function, the metrics function, the optimizer, and the datasets. These should satisfy most usage requirements and we do our best to make sure our built-in trainers work on as many models, tasks, and datasets as possible.
Normally, the trainer exposes a few arguments that you can customize, for example, the loss function, the metrics function, the optimizer, and the datasets. These should satisfy most usage needs, and we do our best to make sure our built-in trainers work on as many models, tasks, and datasets as possible. But there is no guarantee. For example, some trainers assume that the task is a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps); most trainers do not support distributed training: they won't wrap your model with `DataParallel` or `DistributedDataParallel` to do that. So after a few tryouts, if you want to actually use the trainers on your very customized applications, you might need to [customize your trainer](./Advanced.md#extend-the-ability-of-one-shot-trainers).
Furthermore, one-shot NAS can be visualized with our NAS UI. [See more details.](./Visualization.md)
### Distributed NAS
Neural architecture search was originally executed by running each child model independently as a trial job. We also support this searching approach, and it naturally fits within the NNI hyper-parameter tuning framework, where Tuner generates child models for the next trial and trials run in the training service.
To use this mode, there is no need to change the search space expressed with the NNI NAS API (i.e., `LayerChoice`, `InputChoice`, `MutableScope`). After the model is initialized, apply the function `get_and_apply_next_architecture` on the model. One-shot NAS trainers are not used in this mode. Here is a simple example:
```python
model = Net()
# get the chosen architecture from tuner and apply it on model
get_and_apply_next_architecture(model)
train(model)  # your code for training the model
acc = test(model)  # test the trained model
nni.report_final_result(acc)  # report the performance of the chosen architecture
```
The search space should be generated and sent to the tuner. As with the NNI NAS API, the search space is embedded in the user code. Users can use "[nnictl ss_gen](../Tutorial/Nnictl.md)" to generate the search space file. Then put the path of the generated search space in the field `searchSpacePath` of `config.yml`. The other fields in `config.yml` can be filled in by referring to [this tutorial](../Tutorial/QuickStart.md).
You can use the [NNI tuners](../Tuner/BuiltinTuner.md) to do the search. Currently, only PPO Tuner supports NAS search spaces.
**Note that** when using one-shot NAS algorithms, there is no need to start an NNI experiment. Users can directly run this Python script (i.e., `train.py`) through `python3 train.py` without `nnictl`. After training, users can export the best one of the found models through `trainer.export()`.
We support a standalone mode for easy debugging, where you can directly run the trial command without launching an NNI experiment. This is for checking whether your trial code can correctly run. The first candidate(s) are chosen for `LayerChoice` and `InputChoice` in this standalone mode.
Each trainer in NNI has its targeted scenario and usage. Some trainers have the assumption that the task is a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps). Most trainers do not have support for distributed training: they won't wrap your model with `DataParallel` or `DistributedDataParallel` to do that. So after a few tryouts, if you want to actually use the trainers on your very customized applications, you might need to [customize your trainer](./Advanced.md#extend-the-ability-of-one-shot-trainers).
A complete example can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/classic_nas/config_nas.yml).
Automatic neural architecture search is taking an increasingly important role in finding better models. Recent research has shown the feasibility of automatic NAS and has led to models that beat many manually designed and tuned models. Some representative works are [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5]. Further, new innovations keep emerging.
However, it takes a great effort to implement NAS algorithms, and it's hard to reuse the code base of existing algorithms for new ones. To facilitate NAS innovations (e.g., the design and implementation of new NAS models, the comparison of different NAS models side-by-side, etc.), an easy-to-use and flexible programming interface is crucial.
With this motivation, our ambition is to provide a unified architecture in NNI, accelerate innovations on NAS, and apply state-of-the-art algorithms to real-world problems faster.
With the unified interface, there are two different modes for architecture search. [One](#supported-one-shot-nas-algorithms) is the so-called one-shot NAS, where a super-net is built based on a search space and one-shot training is used to generate a good-performing child model. [The other](#supported-classic-nas-algorithms) is the traditional search-based approach, where each child model within the search space runs as an independent trial; the performance result is then sent to the tuner, which generates a new child model. We call it classic NAS.
NNI also provides a dedicated [visualization tool](#nas-visualization) for users to check the status of the neural architecture search process.
## Supported Classic NAS Algorithms
The procedure of classic NAS algorithms is similar to hyper-parameter tuning: users use `nnictl` to start experiments and each model runs as a trial. The difference is that the search space file is automatically generated from the user model (with the search space in it) by running `nnictl ss_gen`. The following table lists the supported tuning algorithms for classic NAS mode. More algorithms will be supported in future releases.
|Name|Brief Introduction of Algorithm|
|---|---|
| [Random Search](https://github.com/microsoft/nni/tree/master/examples/tuners/random_nas_tuner) | Randomly pick a model from search space |
| [PPO Tuner](https://nni.readthedocs.io/en/latest/Tuner/BuiltinTuner.html#PPOTuner) | PPO Tuner is a Reinforcement Learning tuner based on PPO algorithm. [Reference Paper](https://arxiv.org/abs/1707.06347) |
Please refer to [here](ClassicNas.md) for the usage of classic NAS algorithms.
## Supported One-shot NAS Algorithms
NNI currently supports the one-shot NAS algorithms listed below and is adding more. Users can reproduce an algorithm or use it on their own dataset. We also encourage users to implement other algorithms with the [NNI API](#using-the-nni-api), to benefit more people.
|Name|Brief Introduction of Algorithm|
|---|---|
| [ENAS](ENAS.md) | [Efficient Neural Architecture Search via Parameter Sharing](https://arxiv.org/abs/1802.03268). In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. It uses parameter sharing between child models to achieve fast speed and excellent performance. |
| [DARTS](DARTS.md) | [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) introduces a novel algorithm for differentiable network architecture search on bilevel optimization. |
| [P-DARTS](PDARTS.md) | [Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation](https://arxiv.org/abs/1904.12760) is based on DARTS. It introduces an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. |
| [SPOS](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with a uniform path sampling method and applies an evolutionary algorithm to efficiently search for the best-performing architectures. |
| [CDARTS](CDARTS.md) | [Cyclic Differentiable Architecture Search](https://arxiv.org/abs/****) builds a cyclic feedback mechanism between the search and evaluation networks. It introduces a cyclic differentiable architecture search framework which integrates the two networks into a unified architecture. |
| [ProxylessNAS](Proxylessnas.md) | [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332). It removes the proxy and directly learns the architectures for large-scale target tasks and target hardware platforms. |
| [TextNAS](TextNAS.md) | [TextNAS: A Neural Architecture Search Space tailored for Text Representation](https://arxiv.org/pdf/1912.10729.pdf). It is a neural architecture search algorithm tailored for text representation. |
One-shot algorithms run **standalone without nnictl**. NNI supports both PyTorch and TensorFlow 2.x.
Here are some common dependencies to run the examples. PyTorch needs to be above 1.2 to use ``BoolTensor``.
...
* PyTorch 1.2+
* git
One-shot NAS can be visualized with our visualization tool. Learn more details [here](./Visualization.md).
Please refer to [here](NasGuide.md) for the usage of one-shot NAS algorithms.
## Supported Distributed NAS Algorithms
|Name|Brief Introduction of Algorithm|
|---|---|
| [SPOS's 2nd stage](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with a uniform path sampling method, and applies an evolutionary algorithm to efficiently search for the best-performing architectures.|
```eval_rst
.. Note:: SPOS is a two-stage algorithm, whose first stage is one-shot and the second stage is distributed, leveraging the result of the first stage as a checkpoint.
```
## Using the NNI API
The programming interface of designing and searching a model is often demanded in two scenarios.
1. When designing a neural network, there may be multiple operation choices on a layer, sub-model, or connection, and it's undetermined which one or combination performs best. So, it needs an easy way to express the candidate layers or sub-models.
2. When applying NAS on a neural network, it needs a unified way to express the search space of architectures, so that it doesn't need to update trial code for different search algorithms.
[Here](./NasGuide.md) is the user guide to get started with using NAS on NNI.
To use NNI NAS, we suggest that users first go through [the tutorial of NAS API for building search space](./WriteSearchSpace.md).
The NAS feature provided by NNI has two key components: APIs for expressing the search space and NAS training approaches. The former is for users to easily specify a class of models (i.e., the candidate models specified by the search space) which may perform well. The latter is for users to easily apply state-of-the-art NAS training approaches on their own model.
Here we use a simple example to demonstrate how to tune your model architecture with the NNI NAS APIs step by step. The complete code of this example can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/naive).
## Write your model with NAS APIs
Instead of writing a concrete neural model, you can write a class of neural models using two of the NAS API library functions, `LayerChoice` and `InputChoice`. For example, if you think either of two options might work in the first convolution layer, then you can choose one of them using `LayerChoice`, as shown by `self.conv1` in the code. Similarly, the second convolution layer `self.conv2` also chooses one from two options. Up to this point, four candidate neural networks are specified. `self.skipconnect` uses `InputChoice` to specify two choices: adding a skip connection or not.
For a detailed description of `LayerChoice` and `InputChoice`, please refer to [the NAS guide](NasGuide.md).
## Choose a NAS trainer
After the model is instantiated, it is time to train the model using a NAS trainer. Different trainers use different approaches to search for the best one from a class of neural models that you specified. NNI provides several popular NAS training approaches such as DARTS and ENAS. Here we use `DartsTrainer` in the example below. After the trainer is instantiated, invoke `trainer.train()` to do the search.
```python
trainer = DartsTrainer(net,
                       loss=criterion,
                       metrics=accuracy,
                       optimizer=optimizer,
                       num_epochs=2,
                       dataset_train=dataset_train,
                       dataset_valid=dataset_valid,
                       batch_size=64,
                       log_frequency=10)
trainer.train()
```
## Export the best model
After the search (i.e., `trainer.train()`) is done, to get the best performing model we simply call `trainer.export("final_arch.json")` to export the found neural architecture to a file.
## NAS visualization
We are working on NAS visualization and will release this feature soon.
## Retrain the exported best model
It is simple to retrain the found (exported) neural architecture. Step one, instantiate the model you defined above. Step two, invoke `apply_fixed_architecture` on the model. The model then becomes the found (exported) one. Afterward, you can use traditional training to train this model.
Generally, a search space describes candidate architectures from which users want to find the best one. Different search algorithms, whether classic NAS or one-shot NAS, can be applied to the search space. NNI provides APIs to unify the expression of neural architecture search spaces.
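The export-then-fix round trip can be pictured with a small sketch that uses only the standard library. Everything here is an assumption made for illustration: the JSON layout (one candidate index per key, `null` for "nothing chosen"), the `CANDIDATES` table, and the helpers `export_architecture` and `apply_fixed` are hypothetical and do not reproduce NNI's actual export format or its `apply_fixed_architecture` function.

```python
import json
import os
import tempfile

# Hypothetical search space: candidate names per choice key.
CANDIDATES = {
    "conv1": ["conv3x3", "conv5x5"],
    "skipcon": ["skip"],
}


def export_architecture(decisions, path):
    """Write the chosen candidate index for each key (None = nothing chosen)."""
    with open(path, "w") as f:
        json.dump(decisions, f)


def apply_fixed(candidates, path):
    """Load the decisions and resolve every choice to a single fixed candidate."""
    with open(path) as f:
        decisions = json.load(f)
    fixed = {}
    for key, options in candidates.items():
        index = decisions[key]
        fixed[key] = None if index is None else options[index]
    return fixed


with tempfile.TemporaryDirectory() as tmp:
    arch_file = os.path.join(tmp, "final_arch.json")
    export_architecture({"conv1": 1, "skipcon": None}, arch_file)
    fixed_model = apply_fixed(CANDIDATES, arch_file)

print(fixed_model)  # {'conv1': 'conv5x5', 'skipcon': None}
```

After this step no choices remain, which is why the resulting model can be trained with an ordinary training loop.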
A search space can be built on a base model. This is also a common practice when a user wants to apply NAS on an existing model. Take [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example. Note that NNI provides the same APIs for expressing search space on PyTorch and TensorFlow.
```python
import torch.nn as nn
import torch.nn.functional as F
from nni.nas.pytorch import mutables


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = mutables.LayerChoice([
            nn.Conv2d(1, 32, 3, 1),
            # stride 1 with padding=1 keeps the 5x5 output the same size as the 3x3 output
            nn.Conv2d(1, 32, 5, 1, padding=1)
        ])  # try 3x3 kernel and 5x5 kernel
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        # ... same as original ...
        return output
```
The example above adds an option of choosing conv5x5 at conv1. The modification is as simple as declaring a `LayerChoice` with the original conv3x3 and a new conv5x5 as its parameter. That's it! You don't have to modify the forward function in any way. You can imagine conv1 as any other module without NAS.
So how about the possibilities of connections? This can be done using `InputChoice`. To allow for a skip connection on the MNIST example, we add another layer called conv3. In the following example, a possible connection from conv2 is added to the output of conv3.
```python
from nni.nas.pytorch import mutables


class Net(nn.Module):
    def __init__(self):
        # ... same ...
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.conv3 = nn.Conv2d(64, 64, 1, 1)
        # declaring that there is exactly one candidate to choose from
        # search strategy will choose one or None
        self.skipcon = mutables.InputChoice(n_candidates=1)
        # ... same ...

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x0 = self.skipcon([x])  # choose one or none from [x]
        x = self.conv3(x)
        if x0 is not None:  # skip connection is open
            x += x0
        x = F.max_pool2d(x, 2)
        # ... same ...
        return output
```
Input choice can be thought of as a callable module that receives a list of tensors and outputs the concatenation/sum/mean of some of them (sum by default), or `None` if none is selected. Like layer choices, input choices should be **initialized in `__init__` and called in `forward`**. This is to allow search algorithms to identify these choices and do necessary preparations.
`LayerChoice` and `InputChoice` are both **mutables**. Mutable means "changeable". As opposed to traditional deep learning layers/modules, which have fixed operation types once defined, models with mutables are essentially a series of possible models.
Users can specify a **key** for each mutable. By default, NNI will assign one for you that is globally unique, but in case users want to share choices (for example, there are two `LayerChoice`s with the same candidate operations and you want them to have the same choice, i.e., if first one chooses the i-th op, the second one also chooses the i-th op), they can give them the same key. The key marks the identity for this choice and will be used in the dumped checkpoint. So if you want to increase the readability of your exported architecture, manually assigning keys to each mutable would be a good idea. For advanced usage on mutables (e.g., `LayerChoice` and `InputChoice`), see [Mutables](./NasReference.md).
With search space defined, the next step is searching for the best model from it. Please refer to [classic NAS algorithms](./ClassicNas.md) and [one-shot NAS algorithms](./NasGuide.md) for how to search from your defined search space.
One-shot NAS algorithms leverage weight sharing among models in the neural architecture search space to train a supernet, and use this supernet to guide the selection of better models. This type of algorithm greatly reduces the computational resources required compared to independently training each model from scratch (which we call "Classic NAS"). NNI supports many popular one-shot NAS algorithms.
Note that the only acceptable types within the search space are `layer_choice` and `input_choice`. For `input_choice`, `n_chosen` can only be 0, 1, or [0, 1]. The search space file for NAS is usually generated automatically through the command [`nnictl ss_gen`](../Tutorial/Nnictl.md).
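These constraints can be checked mechanically before starting an experiment. The sketch below is a hypothetical helper, not part of NNI; it assumes each entry of the search space is a dict with `_type` and `_value` fields and that `n_chosen` sits inside the `_value` of an `input_choice`, which may differ from the file that `nnictl ss_gen` actually produces.

```python
def validate_search_space(space):
    """Return a list of problems found in a NAS search space dict (empty if valid)."""
    problems = []
    for key, spec in space.items():
        kind = spec.get("_type")
        if kind == "layer_choice":
            continue
        if kind == "input_choice":
            n_chosen = spec.get("_value", {}).get("n_chosen")
            if n_chosen not in (0, 1, [0, 1]):
                problems.append(
                    f"{key}: n_chosen must be 0, 1, or [0, 1], got {n_chosen!r}"
                )
        else:
            problems.append(f"{key}: unsupported type {kind!r}")
    return problems


# Assumed layout; the "lr" entry is a hyper-parameter type that NAS tuners reject.
space = {
    "conv1": {"_type": "layer_choice", "_value": ["conv3x3", "conv5x5"]},
    "skipcon": {
        "_type": "input_choice",
        "_value": {"candidates": ["conv2"], "n_chosen": [0, 1]},
    },
    "lr": {"_type": "uniform", "_value": [0.001, 0.1]},
}
print(validate_search_space(space))  # ["lr: unsupported type 'uniform'"]
```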