Commit d2c610a1 authored by Yuge Zhang, committed by GitHub

Update guide and reference of NAS (#1972)

parent d8388957
@@ -46,16 +46,12 @@ bash run_retrain_cifar.sh
.. autoclass:: nni.nas.pytorch.cdarts.CdartsTrainer
    :members:

.. autoclass:: nni.nas.pytorch.cdarts.RegularizedDartsMutator
    :members:

.. autoclass:: nni.nas.pytorch.cdarts.DartsDiscreteMutator
    :members:

.. autoclass:: nni.nas.pytorch.cdarts.RegularizedMutatorParallel
    :members:
```
@@ -43,8 +43,10 @@ python3 retrain.py --arc-checkpoint ./checkpoints/epoch_49.json
.. autoclass:: nni.nas.pytorch.darts.DartsTrainer
    :members:

.. autoclass:: nni.nas.pytorch.darts.DartsMutator
    :members:
```
## Limitations
* DARTS doesn't support DataParallel and needs to be customized in order to support DistributedDataParallel.
@@ -37,10 +37,6 @@ python3 search.py -h
.. autoclass:: nni.nas.pytorch.enas.EnasTrainer
    :members:

.. autoclass:: nni.nas.pytorch.enas.EnasMutator
    :members:
```
# Guide: Using NAS on NNI
```eval_rst
.. contents::
.. Note:: The APIs are in an experimental stage. The current programming interface is subject to change.
```
![](../../img/nas_abstract_illustration.png)
Modern Neural Architecture Search (NAS) methods usually incorporate [three dimensions][1]: a search space, a search strategy, and a performance estimation strategy. The search space defines a limited set of neural network architectures to explore, while the search strategy samples architectures from the search space, obtains estimates of their performance, and updates itself accordingly. Ideally, the search strategy finds the best architecture in the search space and reports it to users. After users obtain such a "best architecture", many methods add a "retrain step", which trains the network with the same pipeline as any traditional model.
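To make the interaction between a search space and a search strategy concrete, here is a small illustrative sketch (the operation names are hypothetical, not NNI API): a tiny two-layer search space is the Cartesian product of per-layer candidates, and the most naive search strategy simply enumerates it.

```python
import itertools

# hypothetical candidate operations for two layers of a small network
ops_layer1 = ["conv3x3", "conv5x5", "maxpool", "identity"]
ops_layer2 = ["conv3x3", "sep_conv", "skip"]

# the search space is the Cartesian product of the per-layer choices;
# a brute-force strategy would train and estimate every candidate
search_space = list(itertools.product(ops_layer1, ops_layer2))
print(len(search_space))  # 4 * 3 = 12 candidate architectures
```

Real search spaces are astronomically larger, which is exactly why smarter search strategies and cheaper performance estimation are needed.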
## Implement a Search Space
Assuming we already have a baseline model, what do we need to do to empower it with NAS? Taking [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example, the code might look like this:
```python
import torch.nn as nn
import torch.nn.functional as F
from nni.nas.pytorch import mutables

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = mutables.LayerChoice([
            nn.Conv2d(1, 32, 3, 1),
            nn.Conv2d(1, 32, 5, 3)
        ])  # try 3x3 kernel and 5x5 kernel
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        # ... same as original ...
        return output
```
The example above adds an option of choosing a conv5x5 at conv1. The modification is as simple as declaring a `LayerChoice` with the original conv3x3 and a new conv5x5 as its parameters. That's it! You don't have to modify the forward function in any way. You can treat conv1 as any other module without NAS.
So what about the possible connections? This can be done with `InputChoice`. To allow for a skip connection in the MNIST example, we add another layer called conv3. In the following example, a possible connection from conv2 is added to the output of conv3.
```python
from nni.nas.pytorch import mutables

class Net(nn.Module):
    def __init__(self):
        # ... same ...
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.conv3 = nn.Conv2d(64, 64, 1, 1)
        # declaring that there is exactly one candidate to choose from
        # search strategy will choose one or None
        self.skipcon = mutables.InputChoice(n_candidates=1)
        # ... same ...

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x0 = self.skipcon([x])  # choose one or none from [x]
        x = self.conv3(x)
        if x0 is not None:  # skip connection is open
            x += x0
        x = F.max_pool2d(x, 2)
        # ... same ...
        return output
```
An input choice can be thought of as a callable module that receives a list of tensors and outputs the concatenation/sum/mean of some of them (sum by default), or `None` if none is selected. Like layer choices, input choices should be **initialized in `__init__` and called in `forward`**. We will see later that this allows search algorithms to identify these choices and do the necessary preparation.
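The reduction behavior described above can be sketched in plain Python. This is a hypothetical, framework-free stand-in for what an input choice computes, treating tensors as lists of numbers, not the actual NNI implementation:

```python
def reduce_inputs(candidates, mask, reduction="sum"):
    """Keep the candidates where mask is True, then reduce them.

    candidates: list of equal-length lists of numbers (stand-ins for tensors)
    mask: list of booleans, one per candidate
    """
    chosen = [c for c, m in zip(candidates, mask) if m]
    if not chosen:
        return None  # nothing selected
    if reduction == "sum":
        return [sum(vals) for vals in zip(*chosen)]
    if reduction == "mean":
        return [sum(vals) / len(chosen) for vals in zip(*chosen)]
    if reduction == "concat":
        return [v for c in chosen for v in c]
    raise ValueError(reduction)
```

For instance, `reduce_inputs([[1, 2], [10, 20]], [True, True])` gives `[11, 22]`, while an all-`False` mask gives `None`, mirroring the skip-connection example above.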
`LayerChoice` and `InputChoice` are both **mutables**. Mutable means "changeable". As opposed to traditional deep learning layers/modules, which have a fixed operation type once defined, models with mutables are essentially a series of possible models.
Users can specify a **key** for each mutable. By default, NNI assigns one that is globally unique, but users who want to share choices can give several mutables the same key. For example, if there are two `LayerChoice`s with the same candidate operations and you want them to make the same choice (i.e., if the first one chooses the i-th op, the second one also chooses the i-th op), give them the same key. The key marks the identity of the choice and is used in dumped checkpoints, so manually assigning keys to each mutable is a good way to increase the readability of your exported architecture. For advanced usage of mutables, see [Mutables](./NasReference.md#mutables).
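The key mechanism can be illustrated with a small framework-free sketch. The class and helper names below are hypothetical; real NNI mutables draw their default keys from an internal global counter:

```python
import itertools

_counter = itertools.count(1)

class FakeMutable:
    """Hypothetical stand-in for a mutable that mimics default key assignment."""
    def __init__(self, key=None):
        # auto-assign a globally unique key when none is given
        self.key = key if key is not None else f"mutable_{next(_counter)}"

def distinct_choices(mutables):
    """Mutables sharing a key collapse into a single decision point."""
    return sorted({m.key for m in mutables})

a = FakeMutable(key="shared_conv")
b = FakeMutable(key="shared_conv")   # same key -> forced to make the same choice as `a`
c = FakeMutable()                    # auto key, independent choice
```

`distinct_choices([a, b, c])` contains only two keys, so a search strategy would make two decisions here, not three.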
## Use a Search Algorithm
Depending on how the search space is explored and how trials are spawned, there are at least two ways to do the search. One runs NAS in a distributed manner, which can be as naive as enumerating all the architectures and training each one from scratch, or can leverage more advanced techniques, such as [SMASH][8], [ENAS][2], [DARTS][1], [FBNet][3], [ProxylessNAS][4], [SPOS][5], [Single-Path NAS][6], [Understanding One-shot][7] and [GDAS][9]. Since training many different architectures is known to be expensive, another family of methods, called one-shot NAS, builds a supernet that contains every candidate in the search space as a subnetwork, and in each step trains a subnetwork or a combination of several subnetworks.
Several one-shot NAS methods are currently supported on NNI, for example, `DartsTrainer`, which uses SGD to train architecture weights and model weights iteratively, and `ENASTrainer`, which [uses a controller to train the model][2]. New and more efficient NAS trainers keep emerging in the research community.
### One-Shot NAS
Each one-shot NAS algorithm implements a trainer, whose detailed usage can be found in the description of that algorithm. Here is a simple example demonstrating how to use `EnasTrainer`.
```python
import torch
import torch.nn as nn
from torchvision.datasets import CIFAR10

# this is exactly the same as traditional model training
# (train_transform / valid_transform are assumed to be defined elsewhere)
model = Net()
dataset_train = CIFAR10(root="./data", train=True, download=True, transform=train_transform)
dataset_valid = CIFAR10(root="./data", train=False, download=True, transform=valid_transform)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), 0.05, momentum=0.9, weight_decay=1.0E-4)

# use NAS here
def top1_accuracy(output, target):
    # this is the function that computes the reward, as required by the ENAS algorithm
    batch_size = target.size(0)
    _, predicted = torch.max(output.data, 1)
    return (predicted == target).sum().item() / batch_size

def metrics_fn(output, target):
    # metrics function receives output and target and computes a dict of metrics
    return {"acc1": top1_accuracy(output, target)}

from nni.nas.pytorch import enas
trainer = enas.EnasTrainer(model,
                           loss=criterion,
                           metrics=metrics_fn,
                           reward_function=top1_accuracy,
                           optimizer=optimizer,
                           batch_size=128,
                           num_epochs=10,  # 10 epochs
                           dataset_train=dataset_train,
                           dataset_valid=dataset_valid,
                           log_frequency=10)  # print log every 10 steps
trainer.train()  # training
trainer.export(file="model_dir/final_architecture.json")  # export the final architecture to file
```
Users can directly run their training file with `python3 train.py`, without `nnictl`. After training, users can export the best found model through `trainer.export()`.
Normally, a trainer exposes a few arguments that you can customize, for example, the loss function, the metrics function, the optimizer, and the datasets. These should satisfy most needs, and we do our best to make sure the built-in trainers work on as many models, tasks, and datasets as possible, but there is no guarantee. For example, some trainers assume the task is classification; some trainers may have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps); most trainers do not support distributed training: they won't wrap your model with `DataParallel` or `DistributedDataParallel`. So after a few tryouts, if you want to use the trainers on a very customized application, you may soon need to [customize your trainer](#extend-the-ability-of-one-shot-trainers).
### Distributed NAS
Neural architecture search was originally executed by running each child model independently as a trial job. We also support this search approach, and it fits naturally into the NNI hyper-parameter tuning framework, where the tuner generates a child model for the next trial and trials run in the training service.
To use this mode, there is no need to change the search space expressed with NNI NAS API (i.e., `LayerChoice`, `InputChoice`, `MutableScope`). After the model is initialized, apply the function `get_and_apply_next_architecture` on the model. One-shot NAS trainers are not used in this mode. Here is a simple example:
```python
import nni
from nni.nas.pytorch.classic_nas import get_and_apply_next_architecture

model = Net()
# get the chosen architecture from tuner and apply it on model
get_and_apply_next_architecture(model)
train(model)  # your code for training the model
acc = test(model)  # test the trained model
nni.report_final_result(acc)  # report the performance of the chosen architecture
```
The search space should be generated and sent to the tuner. Since with the NNI NAS API the search space is embedded in user code, users can use "[nnictl ss_gen](../Tutorial/Nnictl.md)" to generate a search space file. Then, put the path of the generated search space in the `searchSpacePath` field of `config.yml`. The other fields in `config.yml` can be filled in by referring to [this tutorial](../Tutorial/QuickStart.md).
You can use [NNI tuners](../Tuner/BuiltinTuner.md) to do the search. Currently, only the PPO Tuner supports the NAS search space.
We support a standalone mode for easy debugging, in which you can directly run the trial command without launching an NNI experiment, to check whether your trial code runs correctly. The first candidate(s) are chosen for `LayerChoice` and `InputChoice` in standalone mode.
A complete example can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/classic_nas/config_nas.yml).
### Retrain with Exported Architecture
After the search phase, it's time to train the found architecture. Unlike many open-source NAS algorithms, which write a whole new model specifically for retraining, we found that the search model and the retrain model are usually very similar, so you can construct your final model with the exact same model code. For example:
```python
from nni.nas.pytorch.fixed import apply_fixed_architecture

model = Net()
apply_fixed_architecture(model, "model_dir/final_architecture.json")
```
The JSON is simply a mapping from mutable keys to a one-hot or multi-hot representation of choices. For example:
```json
{
    "LayerChoice1": [false, true, false, false],
    "InputChoice2": [true, true, false]
}
```
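As a sketch of how such a checkpoint can be interpreted (a minimal, hypothetical stand-in for what `apply_fixed_architecture` does internally, with invented helper names):

```python
import json

# the exported architecture: mutable key -> one-hot / multi-hot mask
exported = json.loads("""
{
    "LayerChoice1": [false, true, false, false],
    "InputChoice2": [true, true, false]
}
""")

def chosen_indices(mask):
    # a one-hot/multi-hot mask maps to the indices of the selected candidates
    return [i for i, selected in enumerate(mask) if selected]

choices = {key: chosen_indices(mask) for key, mask in exported.items()}
```

So `LayerChoice1` keeps only its second candidate (index 1), and `InputChoice2` reduces inputs 0 and 1.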
After applying it, the model is fixed and ready for final training. Although it works as a single model, it might contain more parameters than expected. This has pros and cons: the good side is that you can directly load the checkpoint dumped from the supernet during the search phase and start retraining from there; however, the model also carries redundant parameters, which may cause problems when counting the number of parameters in the model. For deeper reasons and possible workarounds, see [Trainers](./NasReference.md#retrain).
Also refer to [DARTS](./DARTS.md) for example code of retraining.
## Customize a Search Algorithm
### Extend the Ability of One-Shot Trainers
Users might want to do multiple things when using the trainers on real tasks, for example, distributed training, half-precision training, periodic logging, writing to TensorBoard, dumping checkpoints, and so on. As mentioned previously, some trainers support some of these items; others might not. Generally, there are two recommended ways to add what you want to an existing trainer: inherit an existing trainer and override its methods, or copy an existing trainer and modify it.
Either way, you are walking into the scope of implementing a new trainer. Basically, implementing a one-shot trainer is no different from implementing any traditional deep learning trainer, except that a new concept called a mutator will reveal itself. The implementation differs in at least two places:
* Initialization
```python
model = Model()
mutator = MyMutator(model)
```
* Training
```python
for _ in range(epochs):
    for x, y in data_loader:
        mutator.reset()  # reset all the choices in model
        out = model(x)  # like traditional model
        loss = criterion(out, y)
        loss.backward()
        # no difference below
```
To demonstrate what mutators are for, we need to know how one-shot NAS normally works. Usually, one-shot NAS co-optimizes model weights and architecture weights. It repeatedly samples an architecture (or a combination of several architectures) from the supernet, trains the chosen architecture like a traditional deep learning model, updates the trained parameters to the supernet, and uses the metrics or loss as a signal to guide the architecture sampler. The mutator is the architecture sampler here, often defined as another deep learning model. Therefore, you can treat it as you would any model: define parameters in it and optimize them with optimizers. One mutator is initialized with exactly one model, and once a mutator is bound to a model, it cannot be rebound to another model.
`mutator.reset()` is the core step: that's where all the choices in the model are finalized. The result of a reset remains effective until the next reset flushes it. After the reset, the model can be used as a traditional model for forward and backward passes.
Finally, mutators provide a method called `mutator.export()` that exports a dict describing the architecture of the model. Note that currently this dict is a mapping from keys of mutables to tensors of selections, so in order to dump it to JSON, users need to explicitly convert the tensors into Python lists.
Meanwhile, NNI provides some useful tools so that users can implement trainers more easily. See [Trainers](./NasReference.md#trainers) for details.
### Implement New Mutators
To start with, here is the pseudo-code that demonstrates what happens on `mutator.reset()` and `mutator.export()`.
```python
def reset(self):
    self.apply_on_model(self.sample_search())
```
```python
def export(self):
    return self.sample_final()
```
On reset, a new architecture is sampled with `sample_search()` and applied to the model; the model is then trained for one or more steps of the search phase. On export, a new architecture is sampled with `sample_final()` and **nothing is applied to the model**. This is either for checkpointing or for exporting the final architecture.
The requirements on the return values of `sample_search()` and `sample_final()` are the same: a mapping from mutable keys to tensors. Each tensor can be either a BoolTensor (true for selected, false for not selected) or a FloatTensor that applies a weight to each candidate. The selected branches are then computed (in `LayerChoice`, the modules are called; in `InputChoice`, the inputs are just the tensors themselves) and reduced with the reduction operation specified in the choices. Since most algorithms only need to worry about the sampling part, here is an example mutator implementation.
```python
import numpy as np
import torch
from nni.nas.pytorch.mutables import LayerChoice, InputChoice
from nni.nas.pytorch.mutator import Mutator

class RandomMutator(Mutator):
    def __init__(self, model):
        super().__init__(model)  # don't forget to call super
        # do something else

    def sample_search(self):
        result = dict()
        for mutable in self.mutables:  # this is all the mutable modules in user model
            # mutables sharing the same key will be de-duplicated
            if isinstance(mutable, LayerChoice):
                # decide that this mutable should choose `gen_index`
                gen_index = np.random.randint(mutable.length)
                result[mutable.key] = torch.tensor([i == gen_index for i in range(mutable.length)],
                                                   dtype=torch.bool)
            elif isinstance(mutable, InputChoice):
                if mutable.n_chosen is None:  # n_chosen is None, then choose any number
                    result[mutable.key] = torch.randint(high=2, size=(mutable.n_candidates,)).view(-1).bool()
                # else do something else
        return result

    def sample_final(self):
        return self.sample_search()  # use the same logic here. you can do something different
```
The complete example of random mutator can be found [here](https://github.com/microsoft/nni/blob/master/src/sdk/pynni/nni/nas/pytorch/random/mutator.py).
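To make the two return-value conventions (BoolTensor vs. FloatTensor) concrete, here is a framework-free sketch using plain lists instead of tensors; the function name is hypothetical and only illustrates how a hard mask and a soft weighting would be reduced over candidate outputs:

```python
def reduce_candidates(outputs, selection):
    """outputs: list of equal-length lists (stand-ins for candidate output tensors)
    selection: list of bools (hard choice) or list of floats (soft weighting)"""
    if all(isinstance(s, bool) for s in selection):
        # BoolTensor-style: sum of the selected branches only
        chosen = [o for o, s in zip(outputs, selection) if s]
        return [sum(vals) for vals in zip(*chosen)]
    # FloatTensor-style: weighted sum over all branches (DARTS-like relaxation)
    return [sum(w * v for w, v in zip(selection, vals)) for vals in zip(*outputs)]

hard = reduce_candidates([[1.0, 2.0], [3.0, 4.0]], [True, False])   # only branch 0
soft = reduce_candidates([[1.0, 2.0], [3.0, 4.0]], [0.25, 0.75])    # blend of both
```

Here `hard` is `[1.0, 2.0]` and `soft` is `[2.5, 3.5]`, i.e. `0.25 * [1, 2] + 0.75 * [3, 4]`.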
For advanced usage, e.g., if users want to manipulate the way modules in `LayerChoice` are executed, they can inherit `BaseMutator` and overwrite `on_forward_layer_choice` and `on_forward_input_choice`, which are the callback implementations of `LayerChoice` and `InputChoice` respectively. Users can still use the property `mutables` to get all `LayerChoice` and `InputChoice` in the model code. For details, please refer to [the reference](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch).
```eval_rst
.. tip::
    A useful application of the random mutator is debugging. Running

    .. code-block:: python

        mutator = RandomMutator(model)
        mutator.reset()

    will immediately set one possible candidate in the search space as the active one.
```
### Implement a Distributed NAS Tuner
Before learning how to write a distributed NAS tuner, users should first learn how to write a general tuner. Read [Customize Tuner](../Tuner/CustomizeTuner.md) for a tutorial.
When users call "[nnictl ss_gen](../Tutorial/Nnictl.md)" to generate search space file, a search space file like this will be generated:
```json
{
    "key_name_1": {
        "_type": "layer_choice",
        "_value": ["op1_repr", "op2_repr", "op3_repr"]
    },
    "key_name_2": {
        "_type": "input_choice",
        "_value": {
            "candidates": ["in1_key", "in2_key", "in3_key"],
            "n_chosen": 1
        }
    }
}
```
This is the exact search space tuners receive in `update_search_space`. It is then the tuner's responsibility to interpret the search space and generate new candidates in `generate_parameters`. Valid "parameters" are in the following format:
```json
{
    "key_name_1": {
        "_value": "op1_repr",
        "_idx": 0
    },
    "key_name_2": {
        "_value": ["in2_key"],
        "_idx": [1]
    }
}
```
Return it from `generate_parameters`, and the tuner will look like any HPO tuner. Refer to the [SPOS](./SPOS.md) example code for an example.
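A minimal sketch of such a tuner's `generate_parameters` logic under these two formats is shown below. It deterministically picks the first candidates; a real tuner would sample or learn which ones to propose. The function and key names (`generate_parameters_naive`, `layer_1`, `skip_1`) are hypothetical:

```python
def generate_parameters_naive(search_space):
    """Map an ss_gen-style search space to one candidate architecture."""
    params = {}
    for key, spec in search_space.items():
        if spec["_type"] == "layer_choice":
            # pick the first operation for every layer choice
            params[key] = {"_value": spec["_value"][0], "_idx": 0}
        elif spec["_type"] == "input_choice":
            # pick the first n_chosen candidate inputs
            candidates = spec["_value"]["candidates"]
            n = spec["_value"]["n_chosen"]
            params[key] = {"_value": candidates[:n], "_idx": list(range(n))}
    return params

space = {
    "layer_1": {"_type": "layer_choice", "_value": ["op1_repr", "op2_repr", "op3_repr"]},
    "skip_1": {"_type": "input_choice",
               "_value": {"candidates": ["in1_key", "in2_key", "in3_key"], "n_chosen": 1}},
}
params = generate_parameters_naive(space)
```

Incidentally, always choosing the first candidates is exactly the behavior of the standalone debugging mode described earlier.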
[1]: https://arxiv.org/abs/1808.05377
[2]: https://arxiv.org/abs/1802.03268
[3]: https://arxiv.org/abs/1812.03443
[4]: https://arxiv.org/abs/1812.00332
[5]: https://arxiv.org/abs/1904.00420
[6]: https://arxiv.org/abs/1904.02877
[7]: http://proceedings.mlr.press/v80/bender18a
[8]: https://arxiv.org/abs/1708.05344
[9]: https://arxiv.org/abs/1910.04465
# NNI NAS Programming Interface
We are trying to support various NAS algorithms with a unified programming interface, which is still in an experimental stage. This means the current programming interface might change in the future.
## Programming interface for user model
A programming interface for designing and searching over a model is often needed in two scenarios.
1. When designing a neural network, there may be multiple operation choices on a layer, sub-model, or connection, and it's undetermined which one or combination performs best. So, it needs an easy way to express the candidate layers or sub-models.
2. When applying NAS on a neural network, it needs a unified way to express the search space of architectures, so that trial code need not be updated for different search algorithms.
For expressing neural architecture search space in user code, we provide the following APIs (take PyTorch as example):
```python
# in PyTorch module class
def __init__(self):
    ...
    # choose one ``op`` from ``ops``, for PyTorch this is a module.
    # op_candidates: for PyTorch ``ops`` is a list of modules, for TensorFlow it is a list of keras layers.
    # key: the name of this ``LayerChoice`` instance
    self.one_layer = nni.nas.pytorch.LayerChoice([
        PoolBN('max', channels, 3, stride, 1, affine=False),
        PoolBN('avg', channels, 3, stride, 1, affine=False),
        FactorizedReduce(channels, channels, affine=False),
        SepConv(channels, channels, 3, stride, 1, affine=False),
        DilConv(channels, channels, 3, stride, 2, 2, affine=False)],
        key="layer_name")
    ...

def forward(self, x):
    ...
    out = self.one_layer(x)
    ...
```
This lets users specify multiple candidate operations for a layer, of which one will be chosen in the end. `key` is the identifier of the layer; it can be used to share a choice between multiple `LayerChoice`s. For example, if there are two `LayerChoice`s with the same candidate operations and you want them to make the same choice (i.e., if the first one chooses the `i`-th op, the second one also chooses the `i`-th op), give them the same key.
```python
def __init__(self):
    ...
    # choose ``n_chosen`` from ``n_candidates`` inputs.
    # n_candidates: the number of candidate inputs
    # n_chosen: the number of chosen inputs
    # key: the name of this ``InputChoice`` instance
    self.input_switch = nni.nas.pytorch.InputChoice(
        n_candidates=3,
        n_chosen=1,
        key="switch_name")
    ...

def forward(self, x):
    ...
    out = self.input_switch([in_tensor1, in_tensor2, in_tensor3])
    ...
```
`InputChoice` is a PyTorch module. At init, it needs meta information: for example, how many inputs to choose from how many candidates, and the name of this `InputChoice` instance. The real candidate input tensors can only be obtained in the `forward` function, where the `InputChoice` module created in `__init__` (e.g., `self.input_switch`) is called with the real candidate input tensors.
Some [NAS trainers](#one-shot-training-mode) need to know the source layers of the input tensors; thus, `InputChoice` has an input argument `choose_from` to indicate the source layer of each candidate input. `choose_from` is a list of strings, where each element is the `key` of a `LayerChoice` or `InputChoice`, or the name of a module (refer to [the code](https://github.com/microsoft/nni/blob/master/src/sdk/pynni/nni/nas/pytorch/mutables.py) for more details).
Besides `LayerChoice` and `InputChoice`, we also provide `MutableScope`, which allows users to label a sub-network and thus provide more semantic information (e.g., the structure of the network) to NAS trainers. Here is an example:
```python
class Cell(mutables.MutableScope):
    def __init__(self, scope_name):
        super().__init__(scope_name)
        self.layer1 = nni.nas.pytorch.LayerChoice(...)
        self.layer2 = nni.nas.pytorch.LayerChoice(...)
        self.layer3 = nni.nas.pytorch.LayerChoice(...)
        ...
```
The three `LayerChoice`s (`layer1`, `layer2`, `layer3`) are included in the `MutableScope` named `scope_name`. A NAS trainer can obtain this hierarchical structure.
## Two training modes
After writing your model with search space embedded in the model using the above APIs, the next step is finding the best model from the search space. There are two training modes: [one-shot training mode](#one-shot-training-mode) and [classic distributed search](#classic-distributed-search).
### One-shot training mode
Similar to the optimizers of deep learning models, the procedure of finding the best model from a search space can be viewed as a type of optimization process; we call it a `NAS trainer`. There are several NAS trainers, for example, `DartsTrainer`, which uses SGD to train architecture weights and model weights iteratively, and `ENASTrainer`, which uses a controller to train the model. New and more efficient NAS trainers keep emerging in the research community.
NNI provides some popular NAS trainers. To use a NAS trainer, users can initialize one after the model is defined:
```python
# create a DartsTrainer
trainer = DartsTrainer(model,
                       loss=criterion,
                       metrics=lambda output, target: accuracy(output, target, topk=(1,)),
                       optimizer=optim,
                       num_epochs=args.epochs,
                       dataset_train=dataset_train,
                       dataset_valid=dataset_valid)
# finding the best model from search space
trainer.train()
# export the best found model
trainer.export(file='./chosen_arch')
```
Different trainers have different input arguments depending on their algorithms. Please refer to [each trainer's code](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch) for detailed arguments. After training, users can export the best found model through `trainer.export()`. There is no need to start an NNI experiment through `nnictl`.
The supported trainers can be found [here](Overview.md#supported-one-shot-nas-algorithms). A very simple example using NNI NAS API can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/simple/train.py).
### Classic distributed search
Neural architecture search was originally executed by running each child model independently as a trial job. We also support this search approach, and it fits naturally into the NNI hyper-parameter tuning framework, where the tuner generates a child model for the next trial and trials run in the training service.
To use this mode, there is no need to change the search space expressed with the NNI NAS API (i.e., `LayerChoice`, `InputChoice`, `MutableScope`). After the model is initialized, apply the function `get_and_apply_next_architecture` on the model. One-shot NAS trainers are not used in this mode. Here is a simple example:
```python
class Net(nn.Module):
    # model defined with LayerChoice and InputChoice
    ...

model = Net()
# get the chosen architecture from tuner and apply it on model
get_and_apply_next_architecture(model)
# your code for training the model
train(model)
# test the trained model
acc = test(model)
# report the performance of the chosen architecture
nni.report_final_result(acc)
```
The search space should be automatically generated and sent to the tuner. Since with the NNI NAS API the search space is embedded in user code, users can use "[nnictl ss_gen](../Tutorial/Nnictl.md)" to generate a search space file. Then, put the path of the generated search space in the `searchSpacePath` field of `config.yml`. The other fields in `config.yml` can be filled in by referring to [this tutorial](../Tutorial/QuickStart.md).
You could use [NNI tuners](../Tuner/BuiltinTuner.md) to do the search.
We support a standalone mode for easy debugging, in which you can directly run the trial command without launching an NNI experiment, to check whether your trial code runs correctly. The first candidate(s) are chosen for `LayerChoice` and `InputChoice` in standalone mode.
The complete example code can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/classic_nas/config_nas.yml).
## Programming interface for NAS algorithm
We also provide a simple interface for users to implement a new NAS trainer on NNI.
### Implement a new NAS trainer on NNI
To implement a new NAS trainer, users basically only need to implement two classes, inheriting `BaseMutator` and `BaseTrainer` respectively.
In `BaseMutator`, users need to overwrite `on_forward_layer_choice` and `on_forward_input_choice`, which are the implementations of `LayerChoice` and `InputChoice` respectively. Users can use the property `mutables` to get all `LayerChoice` and `InputChoice` in the model code. Then users need to implement a new trainer, which instantiates the new mutator and implements the training logic. For details, please read [the code](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch) and the supported trainers, for example, [DartsTrainer](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch/darts).
### Implement an NNI tuner for NAS
An NNI tuner for NAS takes the auto-generated search space. The search space format for `LayerChoice` and `InputChoice` is shown below:
```json
{
    "key_name_1": {
        "_type": "layer_choice",
        "_value": ["op1_repr", "op2_repr", "op3_repr"]
    },
    "key_name_2": {
        "_type": "input_choice",
        "_value": {
            "candidates": ["in1_key", "in2_key", "in3_key"],
            "n_chosen": 1
        }
    }
}
```
Correspondingly, the generated architecture is in the following format:
```json
{
    "key_name_1": {
        "_value": "op1_repr",
        "_idx": 0
    },
    "key_name_2": {
        "_value": ["in2_key"],
        "_idx": [1]
    }
}
```
# NAS Reference
```eval_rst
.. contents::
```
## Mutables
```eval_rst
.. autoclass:: nni.nas.pytorch.mutables.Mutable
    :members:

.. autoclass:: nni.nas.pytorch.mutables.LayerChoice
    :members:

.. autoclass:: nni.nas.pytorch.mutables.InputChoice
    :members:

.. autoclass:: nni.nas.pytorch.mutables.MutableScope
    :members:
```
### Utilities
```eval_rst
.. autofunction:: nni.nas.pytorch.utils.global_mutable_counting
```
## Mutators
```eval_rst
.. autoclass:: nni.nas.pytorch.base_mutator.BaseMutator
    :members:

.. autoclass:: nni.nas.pytorch.mutator.Mutator
    :members:
```
### Random Mutator
```eval_rst
.. autoclass:: nni.nas.pytorch.random.RandomMutator
    :members:
```
### Utilities
```eval_rst
.. autoclass:: nni.nas.pytorch.utils.StructuredMutableTreeNode
    :members:
```
## Trainers
### Trainer
```eval_rst
.. autoclass:: nni.nas.pytorch.base_trainer.BaseTrainer
    :members:

.. autoclass:: nni.nas.pytorch.trainer.Trainer
    :members:
```
### Retrain
```eval_rst
.. autofunction:: nni.nas.pytorch.fixed.apply_fixed_architecture

.. autoclass:: nni.nas.pytorch.fixed.FixedArchitecture
    :members:
```
### Distributed NAS
```eval_rst
.. autofunction:: nni.nas.pytorch.classic_nas.get_and_apply_next_architecture

.. autoclass:: nni.nas.pytorch.classic_nas.mutator.ClassicMutator
    :members:
```
### Callbacks
```eval_rst
.. autoclass:: nni.nas.pytorch.callbacks.Callback
    :members:

.. autoclass:: nni.nas.pytorch.callbacks.LRSchedulerCallback
    :members:

.. autoclass:: nni.nas.pytorch.callbacks.ArchitectureCheckpoint
    :members:

.. autoclass:: nni.nas.pytorch.callbacks.ModelCheckpoint
    :members:
```
### Utilities
```eval_rst
.. autoclass:: nni.nas.pytorch.utils.AverageMeterGroup
    :members:

.. autoclass:: nni.nas.pytorch.utils.AverageMeter
    :members:

.. autofunction:: nni.nas.pytorch.utils.to_device
```
@@ -6,11 +6,7 @@ However, it takes great efforts to implement NAS algorithms, and it is hard to r
With this motivation, our ambition is to provide a unified architecture in NNI, to accelerate innovations on NAS, and to apply state-of-the-art algorithms to real-world problems faster.
With [the unified interface](./NasInterface.md), there are two different modes for the architecture search. [One](#supported-one-shot-nas-algorithms) is the so-called one-shot NAS, where a super-net is built based on search space, and using one shot training to generate good-performing child model. [The other](./NasInterface.md#classic-distributed-search) is the traditional searching approach, where each child model in search space runs as an independent trial, the performance result is sent to tuner and the tuner generates new child model. With the unified interface, there are two different modes for the architecture search. [One](#supported-one-shot-nas-algorithms) is the so-called one-shot NAS, where a super-net is built based on search space, and using one shot training to generate good-performing child model. [The other](#supported-distributed-nas-algorithms) is the traditional searching approach, where each child model in search space runs as an independent trial, the performance result is sent to tuner and the tuner generates new child model.
* [Supported One-shot NAS Algorithms](#supported-one-shot-nas-algorithms)
* [Classic Distributed NAS with NNI experiment](./NasInterface.md#classic-distributed-search)
* [NNI NAS Programming Interface](./NasInterface.md)
## Supported One-shot NAS Algorithms
...@@ -33,18 +29,26 @@ Here are some common dependencies to run the examples. PyTorch needs to be above
* PyTorch 1.2+
* git

## Supported Distributed NAS Algorithms

|Name|Brief Introduction of Algorithm|
|---|---|
| [SPOS](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with a uniform path sampling method, and applies an evolutionary algorithm to efficiently search for the best-performing architectures. |

```eval_rst
.. Note:: SPOS is a two-stage algorithm: the first stage is one-shot and the second stage is distributed, leveraging the result of the first stage as a checkpoint.
```

## Use NNI API

The programming interface of designing and searching a model is often demanded in two scenarios.

1. When designing a neural network, there may be multiple operation choices on a layer, sub-model, or connection, and it is undetermined which one or which combination performs best, so an easy way to express the candidate layers or sub-models is needed.
2. When applying NAS on a neural network, a unified way to express the search space of architectures is needed, so that trial code does not need to be updated for different search algorithms.

[Here](./NasGuide.md) is a user guide to get started with using NAS on NNI.
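The first scenario above can be sketched in plain Python. This is an illustration of the "candidate ops plus external decision" idea behind `LayerChoice`, not NNI's actual API (the real class lives in `nni.nas.pytorch.mutables` and wraps PyTorch modules):

```python
class LayerChoiceSketch:
    """Illustrative stand-in for LayerChoice: holds candidate operations,
    and a mutator/tuner decides which one is active."""
    def __init__(self, candidates):
        self.candidates = candidates   # e.g. conv3x3, conv5x5, skip-connect
        self.chosen = 0                # decision supplied externally

    def __call__(self, x):
        # Forward only through the currently chosen candidate.
        return self.candidates[self.chosen](x)

choice = LayerChoiceSketch([lambda x: x + 1, lambda x: x * 2])
choice.chosen = 1                      # the mutator picks the second candidate
print(choice(10))  # 20
```

In real trial code, the decision is not set by hand: a mutator (one-shot mode) or the tuner (classic mode) supplies it, which is exactly why the search space needs a unified expression.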
## Reference and Feedback

[1]: https://arxiv.org/abs/1802.03268
[2]: https://arxiv.org/abs/1707.07012
...@@ -52,9 +56,5 @@ NNI proposed API is [here](https://github.com/microsoft/nni/tree/master/src/sdk/
[4]: https://arxiv.org/abs/1806.10282
[5]: https://arxiv.org/abs/1703.01041

* To [report a bug](https://github.com/microsoft/nni/issues/new?template=bug-report.md) for this feature in GitHub;
* To [file a feature or improvement request](https://github.com/microsoft/nni/issues/new?template=enhancement.md) for this feature in GitHub.
...@@ -93,17 +93,11 @@ By default, it will use `architecture_final.json`. This architecture is provided
.. autoclass:: nni.nas.pytorch.spos.SPOSEvolution
   :members:

.. autoclass:: nni.nas.pytorch.spos.SPOSSupernetTrainer
   :members:

.. autoclass:: nni.nas.pytorch.spos.SPOSSupernetTrainingMutator
   :members:
```

## Known Limitations
......
...@@ -17,7 +17,8 @@ For details, please refer to the following tutorials:
.. toctree::
   Overview <NAS/Overview>
   Guide <NAS/NasGuide>
   API Reference <NAS/NasReference>
   ENAS <NAS/ENAS>
   DARTS <NAS/DARTS>
   P-DARTS <NAS/PDARTS>
......
...@@ -13,7 +13,12 @@ logger = logging.getLogger(__name__)
class BaseMutator(nn.Module):
    """
    A mutator is responsible for mutating a graph by obtaining the search space from the network and implementing
    callbacks that are called in ``forward`` in mutables.

    Parameters
    ----------
    model : nn.Module
        PyTorch model to apply mutator on.
    """

    def __init__(self, model):
...@@ -52,9 +57,19 @@ class BaseMutator(nn.Module):
    @property
    def mutables(self):
        """
        A generator of all modules inheriting :class:`~nni.nas.pytorch.mutables.Mutable`.
        Modules are yielded in the order that they are defined in ``__init__``.
        For mutables whose keys appear multiple times, only the first occurrence is yielded.
        """
        return self._structured_mutables

    def forward(self, *inputs):
        """
        Warnings
        --------
        Don't call forward of a mutator.
        """
        raise RuntimeError("Forward is undefined for mutators.")

    def __setattr__(self, name, value):
...@@ -70,6 +85,7 @@ class BaseMutator(nn.Module):
        Parameters
        ----------
        mutable_scope : MutableScope
            The mutable scope that is entered.
        """
        pass

...@@ -80,6 +96,7 @@ class BaseMutator(nn.Module):
        Parameters
        ----------
        mutable_scope : MutableScope
            The mutable scope that is exited.
        """
        pass

...@@ -90,12 +107,14 @@ class BaseMutator(nn.Module):
        Parameters
        ----------
        mutable : LayerChoice
            Module whose forward is called.
        inputs : list of torch.Tensor
            The arguments of its forward function.

        Returns
        -------
        tuple of torch.Tensor and torch.Tensor
            Output tensor and mask.
        """
        raise NotImplementedError

...@@ -106,12 +125,14 @@ class BaseMutator(nn.Module):
        Parameters
        ----------
        mutable : InputChoice
            Mutable that is called.
        tensor_list : list of torch.Tensor
            The arguments mutable is called with.

        Returns
        -------
        tuple of torch.Tensor and torch.Tensor
            Output tensor and mask.
        """
        raise NotImplementedError

...@@ -123,5 +144,6 @@ class BaseMutator(nn.Module):
        Returns
        -------
        dict
            Mappings from mutable keys to decisions.
        """
        raise NotImplementedError
...@@ -8,16 +8,33 @@ class BaseTrainer(ABC):
    @abstractmethod
    def train(self):
        """
        Override the method to train.
        """
        raise NotImplementedError

    @abstractmethod
    def validate(self):
        """
        Override the method to validate.
        """
        raise NotImplementedError

    @abstractmethod
    def export(self, file):
        """
        Override the method to export to file.

        Parameters
        ----------
        file : str
            File path to export to.
        """
        raise NotImplementedError

    @abstractmethod
    def checkpoint(self):
        """
        Override to dump a checkpoint.
        """
        raise NotImplementedError
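A concrete trainer fills in these four methods. The sketch below mirrors the abstract interface above with a stand-in base class (rather than importing from nni) and a toy subclass, just to show the contract; the real trainers (DARTS, ENAS, SPOS) implement the same hooks with actual training logic:

```python
from abc import ABC, abstractmethod

class BaseTrainer(ABC):
    # Stand-in mirroring the abstract interface above; normally this is
    # imported from nni.nas.pytorch.
    @abstractmethod
    def train(self): ...

    @abstractmethod
    def validate(self): ...

    @abstractmethod
    def export(self, file): ...

    @abstractmethod
    def checkpoint(self): ...

class ConstantTrainer(BaseTrainer):
    """A toy trainer that satisfies the contract without real training."""
    def __init__(self):
        self.epochs_run = 0

    def train(self):
        self.epochs_run += 1          # one call = one training epoch

    def validate(self):
        return {"acc": 1.0}           # dummy validation metrics

    def export(self, file):
        with open(file, "w") as f:    # dump the chosen architecture
            f.write("{}")

    def checkpoint(self):
        return {"epochs_run": self.epochs_run}

trainer = ConstantTrainer()
trainer.train()
print(trainer.checkpoint())  # {'epochs_run': 1}
```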
...@@ -11,6 +11,9 @@ _logger = logging.getLogger(__name__)
class Callback:
    """
    Callback provides an easy way to react to events like the beginning and end of epochs.
    """

    def __init__(self):
        self.model = None
...@@ -18,14 +21,42 @@ class Callback:
        self.trainer = None

    def build(self, model, mutator, trainer):
        """
        Callback needs to be built with model, mutator and trainer, to get updates from them.

        Parameters
        ----------
        model : nn.Module
            Model to be trained.
        mutator : nn.Module
            Mutator that mutates the model.
        trainer : BaseTrainer
            Trainer that is to call the callback.
        """
        self.model = model
        self.mutator = mutator
        self.trainer = trainer

    def on_epoch_begin(self, epoch):
        """
        Implement this to do something at the beginning of an epoch.

        Parameters
        ----------
        epoch : int
            Epoch number, starting from 0.
        """
        pass

    def on_epoch_end(self, epoch):
        """
        Implement this to do something at the end of an epoch.

        Parameters
        ----------
        epoch : int
            Epoch number, starting from 0.
        """
        pass

    def on_batch_begin(self, epoch):
...@@ -36,6 +67,14 @@ class Callback:
class LRSchedulerCallback(Callback):
    """
    Calls scheduler on every epoch end.

    Parameters
    ----------
    scheduler : LRScheduler
        Scheduler to be called.
    """

    def __init__(self, scheduler, mode="epoch"):
        super().__init__()
        assert mode == "epoch"
...@@ -43,28 +82,54 @@ class LRSchedulerCallback(Callback):
        self.mode = mode

    def on_epoch_end(self, epoch):
        """
        Call ``self.scheduler.step()`` on epoch end.
        """
        self.scheduler.step()
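The epoch-end hook pattern can be sketched without PyTorch. The trainer loop and the decaying scheduler below are illustrative stand-ins, not NNI's or PyTorch's actual classes:

```python
class MockScheduler:
    """Stand-in for a torch LRScheduler: decays lr by 10x per step."""
    def __init__(self, lr):
        self.lr = lr

    def step(self):
        self.lr *= 0.1

class LRSchedulerCallbackSketch:
    def __init__(self, scheduler):
        self.scheduler = scheduler

    def on_epoch_end(self, epoch):
        # The trainer invokes this hook once per finished epoch.
        self.scheduler.step()

scheduler = MockScheduler(lr=0.1)
callback = LRSchedulerCallbackSketch(scheduler)
for epoch in range(2):            # simplified trainer loop
    callback.on_epoch_end(epoch)
print(round(scheduler.lr, 4))  # 0.001
```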
class ArchitectureCheckpoint(Callback):
    """
    Calls ``trainer.export()`` on every epoch end.

    Parameters
    ----------
    checkpoint_dir : str
        Location to save checkpoints.
    """

    def __init__(self, checkpoint_dir):
        super().__init__()
        self.checkpoint_dir = checkpoint_dir
        os.makedirs(self.checkpoint_dir, exist_ok=True)

    def on_epoch_end(self, epoch):
        """
        Dump to ``/checkpoint_dir/epoch_{number}.json`` on epoch end.
        """
        dest_path = os.path.join(self.checkpoint_dir, "epoch_{}.json".format(epoch))
        _logger.info("Saving architecture to %s", dest_path)
        self.trainer.export(dest_path)


class ModelCheckpoint(Callback):
    """
    Saves the model state dict on every epoch end.

    Parameters
    ----------
    checkpoint_dir : str
        Location to save checkpoints.
    """

    def __init__(self, checkpoint_dir):
        super().__init__()
        self.checkpoint_dir = checkpoint_dir
        os.makedirs(self.checkpoint_dir, exist_ok=True)

    def on_epoch_end(self, epoch):
        """
        Dump to ``/checkpoint_dir/epoch_{number}.pth.tar`` on every epoch end.
        ``DataParallel`` objects will have their inner modules exported.
        """
        if isinstance(self.model, nn.DataParallel):
            state_dict = self.model.module.state_dict()
        else:
......
...@@ -127,10 +127,6 @@ class RegularizedMutatorParallel(DistributedDataParallel):
class DartsDiscreteMutator(Mutator):
    """
    A mutator that applies the final sampling result of a parent mutator on another model to train.

    Parameters
    ----------
...@@ -139,6 +135,7 @@ class DartsDiscreteMutator(Mutator):
    parent_mutator : Mutator
        The mutator that provides the ``sample_final`` method, which will be called to get the architecture.
    """

    def __init__(self, model, parent_mutator):
        super().__init__(model)
        self.__dict__["parent_mutator"] = parent_mutator  # avoid parameters being included
......
...@@ -32,14 +32,8 @@ class InteractiveKLLoss(nn.Module):
class CdartsTrainer(object):
    """
    CDARTS trainer.

    Parameters
    ----------
...@@ -99,6 +93,12 @@ class CdartsTrainer(object):
    share_module : bool
        ``True`` if sharing the stem and auxiliary heads, else not sharing these modules.
    """

    def __init__(self, model_small, model_large, criterion, loaders, samplers, logger=None,
                 regular_coeff=5, regular_ratio=0.2, warmup_epochs=2, fix_head=True,
                 epochs=32, steps_per_epoch=None, loss_alpha=2, loss_T=2, distributed=True,
                 log_frequency=10, grad_clip=5.0, interactive_type='kl', output_path='./outputs',
                 w_lr=0.2, w_momentum=0.9, w_weight_decay=3e-4, alpha_lr=0.2, alpha_weight_decay=1e-4,
                 nasnet_lr=0.2, local_rank=0, share_module=True):
        if logger is None:
            logger = logging.getLogger(__name__)
        train_loader, valid_loader = loaders
......
...@@ -22,12 +22,21 @@ INPUT_CHOICE = "input_choice"
def get_and_apply_next_architecture(model):
    """
    Wrapper of :class:`~nni.nas.pytorch.classic_nas.mutator.ClassicMutator` to make it more meaningful,
    similar to ``get_next_parameter`` for HPO.

    It will generate the search space based on ``model``.
    If env ``NNI_GEN_SEARCH_SPACE`` exists, this is in dry-run mode for
    generating the search space for the experiment.
    If not, there are still two modes: one is NNI experiment mode, where users
    use ``nnictl`` to start an experiment; the other is standalone mode,
    where users directly run the trial command, and this mode chooses the first
    one(s) for each LayerChoice and InputChoice.

    Parameters
    ----------
    model : nn.Module
        User's model with search space (e.g., LayerChoice, InputChoice) embedded in it.
    """
    ClassicMutator(model)

...@@ -36,23 +45,15 @@ class ClassicMutator(Mutator):
    """
    This mutator is to apply the architecture chosen from the tuner.
    It implements the forward function of LayerChoice and InputChoice,
    to only activate the chosen ones.

    Parameters
    ----------
    model : nn.Module
        User's model with search space (e.g., LayerChoice, InputChoice) embedded in it.
    """

    def __init__(self, model):
        super(ClassicMutator, self).__init__(model)
        self._chosen_arch = {}
        self._search_space = self._generate_search_space()
...@@ -114,9 +115,15 @@ class ClassicMutator(Mutator):
        return torch.tensor(multihot_list, dtype=torch.bool)  # pylint: disable=not-callable

    def sample_search(self):
        """
        See :meth:`sample_final`.
        """
        return self.sample_final()

    def sample_final(self):
        """
        Convert the chosen arch and apply it on the model.
        """
        assert set(self._chosen_arch.keys()) == set(self._search_space.keys()), \
            "Unmatched keys, expected keys '{}' from search space, found '{}'.".format(self._search_space.keys(),
                                                                                      self._chosen_arch.keys())
......
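The standalone-mode default described above (with no tuner, pick the first candidate of every choice) can be sketched as follows. The search-space dict shape here is illustrative, not NNI's exact serialization format:

```python
def choose_first(search_space):
    """Build a one-hot decision dict that activates the first candidate
    of every LayerChoice/InputChoice-style entry."""
    decisions = {}
    for key, candidates in search_space.items():
        # One boolean per candidate; only index 0 is switched on.
        decisions[key] = [i == 0 for i in range(len(candidates))]
    return decisions

space = {"conv_choice": ["3x3", "5x5", "7x7"], "input_choice": ["a", "b"]}
print(choose_first(space))
# {'conv_choice': [True, False, False], 'input_choice': [True, False]}
```

In NNI experiment mode, the tuner replaces this default by sending a real decision per trial.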
...@@ -15,12 +15,8 @@ logger = logging.getLogger(__name__)
class DartsTrainer(Trainer):
    """
    DARTS trainer.

    Parameters
    ----------
...@@ -55,6 +51,10 @@ class DartsTrainer(Trainer):
    unrolled : bool
        ``True`` if using second-order optimization, else first-order optimization.
    """

    def __init__(self, model, loss, metrics,
                 optimizer, num_epochs, dataset_train, dataset_valid,
                 mutator=None, batch_size=64, workers=4, device=None, log_frequency=None,
                 callbacks=None, arc_learning_rate=3.0E-4, unrolled=False):
        super().__init__(model, mutator if mutator is not None else DartsMutator(model),
                         loss, metrics, optimizer, num_epochs, dataset_train, dataset_valid,
                         batch_size, workers, device, log_frequency, callbacks)
......
...@@ -28,11 +28,8 @@ class StackedLSTMCell(nn.Module):
class EnasMutator(Mutator):
    """
    A mutator that mutates the graph with RL.

    Parameters
    ----------
...@@ -60,6 +57,9 @@ class EnasMutator(Mutator):
    entropy_reduction : str
        Can be one of ``sum`` and ``mean``. How the entropy of multi-input-choice is reduced.
    """

    def __init__(self, model, lstm_size=64, lstm_num_layers=1, tanh_constant=1.5, cell_exit_extra_step=False,
                 skip_target=0.4, temperature=None, branch_bias=0.25, entropy_reduction="sum"):
        super().__init__(model)
        self.lstm_size = lstm_size
        self.lstm_num_layers = lstm_num_layers
......
...@@ -16,14 +16,8 @@ logger = logging.getLogger(__name__)
class EnasTrainer(Trainer):
    """
    ENAS trainer.

    Parameters
    ----------
...@@ -74,6 +68,12 @@ class EnasTrainer(Trainer):
    test_arc_per_epoch : int
        How many architectures are chosen for direct test after each epoch.
    """

    def __init__(self, model, loss, metrics, reward_function,
                 optimizer, num_epochs, dataset_train, dataset_valid,
                 mutator=None, batch_size=64, workers=4, device=None, log_frequency=None, callbacks=None,
                 entropy_weight=0.0001, skip_weight=0.8, baseline_decay=0.999, child_steps=500,
                 mutator_lr=0.00035, mutator_steps_aggregate=20, mutator_steps=50, aux_weight=0.4,
                 test_arc_per_epoch=1):
        super().__init__(model, mutator if mutator is not None else EnasMutator(model),
                         loss, metrics, optimizer, num_epochs, dataset_train, dataset_valid,
                         batch_size, workers, device, log_frequency, callbacks)
......
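The ``baseline_decay`` parameter above suggests an exponential-moving-average baseline for the REINFORCE reward. The sketch below illustrates that idea under the assumption of a standard EMA update; the exact rule inside ``EnasTrainer`` may differ:

```python
def update_baseline(baseline, reward, decay=0.999):
    # Exponential moving average: the baseline drifts slowly toward the
    # latest reward, reducing the variance of the policy gradient.
    return decay * baseline + (1 - decay) * reward

baseline = 0.0
for reward in [0.5, 0.6, 0.7]:   # rewards from sampled child architectures
    baseline = update_baseline(baseline, reward)
advantage = 0.7 - baseline       # reward minus baseline drives the RL update
```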
...@@ -10,10 +10,8 @@ from nni.nas.pytorch.mutator import Mutator
class FixedArchitecture(Mutator):
    """
    Fixed architecture mutator that always selects a certain graph.

    Parameters
    ----------
...@@ -22,8 +20,10 @@ class FixedArchitecture(Mutator):
    fixed_arc : str or dict
        Path to the architecture checkpoint (a string), or preloaded architecture object (a dict).
    strict : bool
        Force everything that appears in ``fixed_arc`` to be used at least once.
    """

    def __init__(self, model, fixed_arc, strict=True):
        super().__init__(model)
        self._fixed_arc = fixed_arc
...@@ -35,9 +35,15 @@ class FixedArchitecture(Mutator):
        raise RuntimeError("Missing keys in fixed architecture: {}.".format(mutable_keys - fixed_arc_keys))

    def sample_search(self):
        """
        Always returns the fixed architecture.
        """
        return self._fixed_arc

    def sample_final(self):
        """
        Always returns the fixed architecture.
        """
        return self._fixed_arc

...@@ -66,6 +72,7 @@ def apply_fixed_architecture(model, fixed_arc_path):
    Returns
    -------
    FixedArchitecture
        Mutator that is responsible for fixing the graph.
    """
    if isinstance(fixed_arc_path, str):
......
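The fixed-architecture behavior can be sketched in a few lines: whichever phase asks for an architecture, the same stored decision dict is returned. The key names and mask shapes below are made up for illustration; real checkpoints are JSON files exported by a trainer:

```python
class FixedArchitectureSketch:
    """Illustrative stand-in for FixedArchitecture: both sampling
    methods return the same preloaded decision dict."""
    def __init__(self, fixed_arc):
        self._fixed_arc = fixed_arc

    def sample_search(self):
        return self._fixed_arc

    def sample_final(self):
        return self._fixed_arc

# Hypothetical decision dict: one boolean mask per mutable key.
arc = {"layer_1": [True, False, False], "skip_2": [False, True]}
mutator = FixedArchitectureSketch(arc)
print(mutator.sample_search() == mutator.sample_final())  # True
```

This is what makes retraining deterministic: the search-phase choices are frozen, and the model always runs the same sub-graph.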