Unverified Commit 5afc83ec authored by QuanluZhang's avatar QuanluZhang Committed by GitHub

[retiarii] doc update (#3432)

Advanced Tutorial
=================
This document has two parts. The first part explains the design decisions behind ``@basic_unit`` and ``serialize``. The second part is a tutorial on how to write a model space with mutators.
``@basic_unit`` and ``serialize``
----------------------------------
.. _serializer:
``@basic_unit`` and ``serialize`` can both be viewed as serializers. They are designed to make the whole model (including its training logic) serializable, so that it can be executed on another process or machine.
**@basic_unit** annotates that a module is a basic unit, i.e., there is no need to understand the details of this module. The effect is that it prevents Retiarii from parsing this module. To understand this, we first briefly explain how Retiarii works: it converts the user-defined model to a graph representation (called graph IR) using `TorchScript <https://pytorch.org/docs/stable/jit.html>`__; each instantiated module in the model is converted to a subgraph. Then mutations are applied to the graph to generate new graphs. Each new graph is converted back to PyTorch code and executed. ``@basic_unit`` means the module will not be converted to a subgraph; instead, it is converted to a single graph node as a basic unit. That is, the module will not be unfolded any further. When a module is not unfolded, mutations on its initialization parameters become easier.
``@basic_unit`` is usually used in the following cases:
* When users want to tune the initialization parameters of a module using ``ValueChoice``, the module should be decorated with ``@basic_unit``. For example, in ``self.conv = MyConv(kernel_size=nn.ValueChoice([1, 3, 5]))``, ``MyConv`` should be decorated.
* When a module cannot be successfully parsed into a subgraph, decorate it with ``@basic_unit``. The parse failure is usually due to complex control flow: Retiarii currently does not support ad-hoc loops, so if there is an ad-hoc loop in a module's ``forward``, the module should be decorated with ``@basic_unit``. For example, the following ``MyModule`` should be decorated.
.. code-block:: python

    @basic_unit
    class MyModule(nn.Module):
        def __init__(self):
            ...

        def forward(self, x):
            for i in range(10):  # <- ad-hoc loop
                ...
* Some inline mutation APIs require the modules they handle to be decorated with ``@basic_unit``. For example, a user-defined module provided to ``LayerChoice`` as a candidate op should be decorated.
**serialize** is mainly used for serializing model training logic. It enables re-instantiation of the model evaluator in another process or machine. Re-instantiation is necessary because most of the time the model and evaluator are sent to a training service. ``serialize`` is implemented by recording the initialization parameters of the user-instantiated evaluator.
The evaluator-related APIs provided by Retiarii (e.g., ``pl.Classification``, ``pl.DataLoader``) already support serialization, so there is no need to apply ``serialize`` to them. In the following case, users should apply the ``serialize`` API manually.
If the initialization parameters of the evaluator APIs (e.g., ``pl.Classification``, ``pl.DataLoader``) are not primitive types (e.g., ``int``, ``str``), ``serialize`` should be applied to them. If those parameters' own initialization parameters are not primitive types either, ``serialize`` should be applied to them as well. In short, ``serialize`` should be applied recursively when necessary.
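The idea of "serialization by recording initialization parameters" can be sketched in a few lines of plain Python. This is an illustrative toy, not Retiarii's actual implementation; ``to_spec``, ``from_spec``, and the ``Classification`` stand-in below are all hypothetical names:

```python
import json

class Classification:
    """Hypothetical stand-in for an evaluator such as pl.Classification."""
    def __init__(self, lr, max_epochs):
        self.lr = lr
        self.max_epochs = max_epochs

REGISTRY = {"Classification": Classification}

def to_spec(cls, *args, **kwargs):
    # Record the class name and init arguments instead of the live object.
    return {"cls": cls.__name__, "args": list(args), "kwargs": kwargs}

def from_spec(spec):
    # Re-instantiate on another process/machine from the recorded parameters.
    return REGISTRY[spec["cls"]](*spec["args"], **spec["kwargs"])

spec = to_spec(Classification, lr=0.001, max_epochs=10)
payload = json.dumps(spec)            # works because all arguments are primitives
evaluator = from_spec(json.loads(payload))
```

An argument that is itself a complex object would make the ``json.dumps`` step fail, which is exactly why non-primitive arguments must be wrapped (recursively) into specs of their own.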
Express Mutations with Mutators
-------------------------------
Besides inline mutations, which have been demonstrated `here <./Tutorial.rst>`__, Retiarii provides a more general approach to express a model space: *Mutator*. The inline mutation APIs are also implemented with mutators and can be seen as special cases of model mutation.
.. note:: Mutator and inline mutation APIs cannot be used together.
A mutator is a piece of logic to express how to mutate a given model. Users are free to write their own mutators. Then a model space is expressed with a base model and a list of mutators. A model in the model space is sampled by applying the mutators on the base model one after another. An example is shown below.
.. code-block:: python

    applied_mutators = []
    applied_mutators.append(BlockMutator('mutable_0'))
    applied_mutators.append(BlockMutator('mutable_1'))
``BlockMutator`` is defined by users to express how to mutate the base model.
Write a mutator
^^^^^^^^^^^^^^^
A user-defined mutator should inherit the ``Mutator`` class and implement the mutation logic in the member function ``mutate``.
.. code-block:: python

    from typing import List

    from nni.retiarii import Mutator

    class BlockMutator(Mutator):
        def __init__(self, target: str, candidates: List):
            super(BlockMutator, self).__init__()
            self.target = target
            self.candidate_op_list = candidates

        def mutate(self, model):
            nodes = model.get_nodes_by_label(self.target)
            for node in nodes:
                chosen_op = self.choice(self.candidate_op_list)
                node.update_operation(chosen_op.type, chosen_op.params)
The input of ``mutate`` is the graph IR (Intermediate Representation) of the base model (please refer to `here <./ApiReference.rst>`__ for the format and APIs of the IR); users can mutate the graph using its member functions (e.g., ``get_nodes_by_label``, ``update_operation``). The mutation operations can be combined with the API ``self.choice`` to express a set of possible mutations. In the above example, the node's operation can be changed to any operation in ``candidate_op_list``.
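To make the sampling semantics concrete, here is a plain-Python sketch, independent of Retiarii's real classes: a mutator's ``choice`` call consumes one decision from a shared random sampler, and the mutators are applied to the model one after another. ``ToyMutator`` and ``sample_model`` are hypothetical names, and the "model" is just a dict:

```python
import random

class ToyMutator:
    """Hypothetical mutator: replaces the op stored under `target` in a dict-based model."""
    def __init__(self, target, candidates):
        self.target = target
        self.candidates = candidates
        self.rng = random.Random()

    def choice(self, candidates):
        # Counterpart of Retiarii's `self.choice`: one sampled decision.
        return self.rng.choice(candidates)

    def mutate(self, model):
        model[self.target] = self.choice(self.candidates)

def sample_model(base_model, mutators, seed=0):
    rng = random.Random(seed)
    model = dict(base_model)          # mutate a copy; the base model stays intact
    for mutator in mutators:          # mutators are applied one after another
        mutator.rng = rng
        mutator.mutate(model)
    return model

base = {"mutable_0": "conv3x3", "mutable_1": "conv3x3"}
mutators = [ToyMutator("mutable_0", ["conv3x3", "conv5x5"]),
            ToyMutator("mutable_1", ["maxpool", "avgpool"])]
sampled = sample_model(base, mutators, seed=0)
```

Each distinct sequence of ``choice`` outcomes yields one model in the space, which is exactly how a base model plus a list of mutators defines a model space.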
Use a placeholder to make mutation easier: ``nn.Placeholder``. If you want to mutate a subgraph or node of your model, you can define a placeholder in the model to represent that subgraph or node. Then, use a mutator to turn the placeholder into real modules.
.. code-block:: python

    ph = nn.Placeholder(
        label='mutable_0',
        kernel_size_options=[1, 3, 5],
        n_layer_options=[1, 2, 3, 4],
        exp_ratio=exp_ratio,
        stride=stride
    )
``label`` is used by the mutator to identify this placeholder. The other parameters are the information required by the mutator. They can be accessed from ``node.operation.parameters`` as a dict, which can include any information that users want to pass to the user-defined mutator. The complete example code can be found in :githublink:`Mnasnet base model <test/retiarii_test/mnasnet/base_mnasnet.py>`.
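As a toy illustration (plain Python with hypothetical names and concrete placeholder values) of how a mutator might consume the parameters attached to a placeholder:

```python
# The placeholder's extra keyword arguments end up in node.operation.parameters
# as a plain dict; the values below are made up for illustration.
params = {
    "kernel_size_options": [1, 3, 5],
    "n_layer_options": [1, 2, 3, 4],
    "exp_ratio": 6,
    "stride": 2,
}

def describe_block(params, kernel_size, n_layer):
    # A real mutator would obtain kernel_size and n_layer via self.choice(...)
    # on the *_options lists; here the chosen values are passed in explicitly.
    assert kernel_size in params["kernel_size_options"]
    assert n_layer in params["n_layer_options"]
    return {"kernel_size": kernel_size, "n_layer": n_layer,
            "exp_ratio": params["exp_ratio"], "stride": params["stride"]}

block = describe_block(params, kernel_size=3, n_layer=2)
```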
Starting an experiment is almost the same as using inline mutation APIs. The only difference is that the applied mutators should be passed to ``RetiariiExperiment``. Below is a simple example.
.. code-block:: python

    exp = RetiariiExperiment(base_model, trainer, applied_mutators, simple_strategy)
    exp_config = RetiariiExeConfig('local')
    exp_config.experiment_name = 'mnasnet_search'
    exp_config.trial_concurrency = 2
    exp_config.max_trial_number = 10
    exp_config.training_service.use_active_gpu = False
    exp.run(exp_config, 8081)
.. autoclass:: nni.retiarii.Operation
   :members:

Evaluators
----------

.. autoclass:: nni.retiarii.evaluator.FunctionalEvaluator
   :members:
One-shot Experiments on Retiarii
================================
Before reading this tutorial, we highly recommend you first go through the tutorial on how to `define a model space <./Tutorial.rst#define-your-model-space>`__.
Model Search with One-shot Trainer
----------------------------------
With a defined model space, users can explore it in two ways. One is using a strategy and a single-arch evaluator, as demonstrated `here <./Tutorial.rst#explore-the-defined-model-space>`__. The other is using a one-shot trainer, which consumes far fewer computational resources than the first approach. This tutorial focuses on the one-shot approach. Its principle is to combine all the models in a model space into one big model (usually called a super-model or super-graph); by training and evaluating this big model, the one-shot trainer takes charge of search, training, and testing at once.
We list the supported one-shot trainers here:
* DARTS trainer
* ENAS trainer
* ProxylessNAS trainer
* Single-path (random) trainer
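To build intuition for the super-model idea, here is a stdlib-only toy sketch (not Retiarii code; ``SuperLayer`` is a hypothetical name) of the single-path (random) approach: the super-model holds every candidate op per layer, and each training step activates one randomly sampled path through it:

```python
import random

class SuperLayer:
    """Holds all candidate ops of one layer; one candidate is active per step."""
    def __init__(self, candidates):
        self.candidates = candidates   # name -> callable
        self.active = None

    def sample(self, rng):
        self.active = rng.choice(sorted(self.candidates))

    def __call__(self, x):
        return self.candidates[self.active](x)

layers = [
    SuperLayer({"double": lambda x: 2 * x, "square": lambda x: x * x}),
    SuperLayer({"inc": lambda x: x + 1, "neg": lambda x: -x}),
]

rng = random.Random(42)
for step in range(3):                 # each "training step" exercises one random path
    for layer in layers:
        layer.sample(rng)
    out = 3
    for layer in layers:
        out = layer(out)
```

In a real one-shot trainer, the candidates share and update weights inside one network, so evaluating a sampled path approximates evaluating the corresponding standalone model.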
See `API reference <./ApiReference.rst>`__ for detailed usage. Here we show an example of using the DARTS trainer manually.
.. code-block:: python

    from nni.retiarii.oneshot.pytorch import DartsTrainer

    trainer = DartsTrainer(
        model=model,
        loss=criterion,
        metrics=lambda output, target: accuracy(output, target, topk=(1,)),
        optimizer=optim,
        num_epochs=args.epochs,
        dataset=dataset_train,
        batch_size=args.batch_size,
        log_frequency=args.log_frequency,
        unrolled=args.unrolled
    )
    trainer.fit()
    final_architecture = trainer.export()
**Format of the exported architecture.** TBD.
One-shot experiments can be visualized with the NAS UI; please refer to `here <../Visualization.rst>`__ for usage guidance. Note that NAS visualization is under intensive development.
Customize a New One-shot Trainer
--------------------------------
One-shot trainers should inherit ``nni.retiarii.oneshot.BaseOneShotTrainer`` and implement the ``fit()`` method (which conducts the fitting and searching process) and the ``export()`` method (which returns the searched best architecture).
Writing a one-shot trainer is very different from writing a single-arch evaluator. First, there are no restrictions on the arguments of the init method; any Python arguments are acceptable. Second, the model fed into a one-shot trainer may contain Retiarii-specific modules, such as ``LayerChoice`` and ``InputChoice``. Such a model cannot directly forward-propagate, so the trainer needs to decide how to handle those modules.
A typical example is ``DartsTrainer``, where learnable parameters are used to combine multiple choices in ``LayerChoice``. Retiarii provides easy-to-use utility functions for module replacement, namely ``replace_layer_choice`` and ``replace_input_choice``. A simplified example is as follows:
.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    from nni.retiarii.oneshot import BaseOneShotTrainer
    from nni.retiarii.oneshot.pytorch import replace_layer_choice, replace_input_choice


    class DartsLayerChoice(nn.Module):
        def __init__(self, layer_choice):
            super(DartsLayerChoice, self).__init__()
            self.name = layer_choice.key
            self.op_choices = nn.ModuleDict(layer_choice.named_children())
            self.alpha = nn.Parameter(torch.randn(len(self.op_choices)) * 1e-3)

        def forward(self, *args, **kwargs):
            op_results = torch.stack([op(*args, **kwargs) for op in self.op_choices.values()])
            alpha_shape = [-1] + [1] * (len(op_results.size()) - 1)
            return torch.sum(op_results * F.softmax(self.alpha, -1).view(*alpha_shape), 0)


    class DartsTrainer(BaseOneShotTrainer):
        def __init__(self, model, loss, metrics, optimizer):
            self.model = model
            self.loss = loss
            self.metrics = metrics
            self.num_epochs = 10
            self.nas_modules = []
            replace_layer_choice(self.model, DartsLayerChoice, self.nas_modules)
            ...  # init dataloaders and optimizers

        def fit(self):
            for i in range(self.num_epochs):
                for (trn_X, trn_y), (val_X, val_y) in zip(self.train_loader, self.valid_loader):
                    self.train_architecture(val_X, val_y)
                    self.train_model_weight(trn_X, trn_y)

        @torch.no_grad()
        def export(self):
            result = dict()
            for name, module in self.nas_modules:
                if name not in result:
                    result[name] = select_best_of_module(module)  # e.g., pick the op with the largest alpha
            return result
The full code of ``DartsTrainer`` is available in the Retiarii source code. Please check :githublink:`DartsTrainer <nni/retiarii/oneshot/pytorch/darts.py>`.
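The architecture-weight combination inside ``DartsLayerChoice.forward`` boils down to a softmax-weighted sum of the candidate ops' outputs. A stdlib-only sketch of just that arithmetic (illustrative, not Retiarii code):

```python
import math

def softmax(xs):
    m = max(xs)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def darts_combine(candidate_outputs, alpha):
    # Continuous relaxation: output = sum_i softmax(alpha)_i * op_i(x)
    weights = softmax(alpha)
    return sum(w * o for w, o in zip(weights, candidate_outputs))

outs = [1.0, 3.0]        # results of two candidate ops on the same input
alpha = [0.0, 0.0]       # equal architecture weights -> plain average
mixed = darts_combine(outs, alpha)   # -> 2.0
```

Because the weights always sum to one, the mixed output stays within the range of the candidate outputs, and gradients with respect to ``alpha`` can steer the choice during training.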
Neural Architecture Search with Retiarii (Alpha)
================================================

*This is a pre-release; its interfaces may be subject to minor changes. The roadmap of this feature is: experimental in V2.0 -> alpha version in V2.1 -> beta version in V2.2 -> official release in V2.3. Feel free to give us your comments and suggestions.*

`Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__ is a new framework to support neural architecture search and hyper-parameter tuning. It allows users to express various search spaces with high flexibility, to reuse many SOTA search algorithms, and to leverage system-level optimizations to speed up the search process. This framework provides the following new user experiences.

* Search space can be expressed directly in user model code. A tuning space can be expressed while defining a model.
* Neural architecture candidates and hyper-parameter candidates are better supported in an experiment.
* The experiment can be launched directly from Python code.

.. note:: `Our previous NAS framework <../Overview.rst>`__ is still supported for now, but will be migrated to the Retiarii framework in V2.3.

.. contents::

There are mainly three crucial components in a neural architecture search task, namely,

* A model search space that defines the set of models to explore.
* A proper strategy as the method to explore this search space.
* A model evaluator that reports the performance of a given model.

.. note:: Currently, PyTorch is the only framework supported by Retiarii, and we have only tested it with **PyTorch 1.6 and 1.7**. This documentation assumes a PyTorch context, but it should also apply to other frameworks, which is in our future plan.
Define your Model Space
-----------------------

A model space is defined by users to express a set of models that users want to explore, which contains potentially good-performing models. In this framework, a model space is defined with two parts: a base model and possible mutations on the base model.

Define Base Model
^^^^^^^^^^^^^^^^^

Defining a base model is almost the same as defining a PyTorch (or TensorFlow) model. Usually, you only need to replace the code ``import torch.nn as nn`` with ``import nni.retiarii.nn.pytorch as nn`` to use our wrapped PyTorch modules.

Below is a very simple example of defining a base model; it is almost the same as defining a PyTorch model.

.. code-block:: python

    import torch.nn.functional as F
    import nni.retiarii.nn.pytorch as nn

    @basic_unit
    class BasicBlock(nn.Module):
        def __init__(self, const):
            super().__init__()
            self.const = const

        def forward(self, x):
            return x + self.const

    class ConvPool(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(32, 1, 5)  # possibly mutate this conv
            self.pool = nn.MaxPool2d(kernel_size=2)

        def forward(self, x):
            return self.pool(self.conv(x))

    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            self.convpool = ConvPool()
            self.mymodule = BasicBlock(2.)

        def forward(self, x):
            return F.relu(self.convpool(self.mymodule(x)))

The above example also shows how to use ``@basic_unit``. ``@basic_unit`` is applied to a user-defined module to tell Retiarii that there will be no mutation within this module, so Retiarii can treat it as a basic unit (i.e., as a blackbox). It is useful when (1) users want to mutate the initialization parameters of this module, or (2) Retiarii fails to parse this module due to complex control flow (e.g., ``for``, ``while``). A more detailed description of ``@basic_unit`` can be found `here <./Advanced.rst>`__.

Users can refer to :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>` and :githublink:`Mnasnet base model <test/retiarii_test/mnasnet/base_mnasnet.py>` for more complicated examples.
Define Model Mutations
^^^^^^^^^^^^^^^^^^^^^^

A base model is only one concrete model, not a model space. We provide APIs and primitives for users to express how the base model can be mutated, i.e., a model space which includes many models.

**Express mutations in an inlined manner**

We provide some APIs, as shown below, for users to easily express possible mutations after defining a base model. The APIs can be used just like PyTorch modules. This approach is also called inline mutations.

* ``nn.LayerChoice``. It allows users to put several candidate operations (e.g., PyTorch modules); one of them is chosen in each explored model. Note that if the candidate is a user-defined module, it should be decorated as a `basic unit <./Advanced.rst>`__ with ``@basic_unit``. In the following example, ``ops.PoolBN`` and ``ops.SepConv`` should be decorated.

  .. code-block:: python

      # import nni.retiarii.nn.pytorch as nn
      # declared in `__init__` method
      self.layer = nn.LayerChoice([
          ops.PoolBN('max', channels, 3, stride, 1),
          ops.SepConv(channels, channels, 3, stride, 1),
          nn.Identity()
      ])
      # invoked in `forward` method
      out = self.layer(x)

* ``nn.InputChoice``. It is mainly for choosing (or trying) different connections. It takes several tensors and chooses ``n_chosen`` tensors from them.

  .. code-block:: python

      # import nni.retiarii.nn.pytorch as nn
      # declared in `__init__` method
      self.input_switch = nn.InputChoice(n_chosen=1)
      # invoked in `forward` method, choose one from the three
      out = self.input_switch([tensor1, tensor2, tensor3])
* ``nn.ValueChoice``. It is for choosing one value from some candidate values. It can only be used as an input argument of basic units, that is, modules in ``nni.retiarii.nn.pytorch`` and user-defined modules decorated with ``@basic_unit``.

  .. code-block:: python

      # import nni.retiarii.nn.pytorch as nn
      # used in `__init__` method
      self.conv = nn.Conv2d(XX, XX, kernel_size=nn.ValueChoice([1, 3, 5]))
      self.op = MyOp(nn.ValueChoice([0, 1]), nn.ValueChoice([-1, 1]))
All the APIs have an optional argument called ``label``; mutations with the same label will share the same choice. A typical example is,

.. code-block:: python

    self.net = nn.Sequential(
        nn.Linear(10, nn.ValueChoice([32, 64, 128], label='hidden_dim')),
        nn.Linear(nn.ValueChoice([32, 64, 128], label='hidden_dim'), 3)
    )

Detailed API description and usage can be found `here <./ApiReference.rst>`__. Examples of using these APIs can be found in :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>`. We are actively enriching the set of inline mutations to make it easier to express a new search space.
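Conceptually, sharing a choice across mutations with the same label works like memoizing one sample per label. A plain-Python sketch of just this idea (hypothetical ``ChoiceSampler``, not Retiarii internals):

```python
import random

class ChoiceSampler:
    """Samples one value per label; repeated labels reuse the first sample."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.memo = {}

    def value_choice(self, candidates, label):
        if label not in self.memo:
            self.memo[label] = self.rng.choice(candidates)
        return self.memo[label]

sampler = ChoiceSampler(seed=0)
out_dim = sampler.value_choice([32, 64, 128], label='hidden_dim')
in_dim = sampler.value_choice([32, 64, 128], label='hidden_dim')
# Both linear layers see the same hidden dimension, so every sampled
# network in the space is well-formed.
```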
If the inline mutation APIs are not enough for your scenario, you can refer to `defining model space using mutators <./Advanced.rst#express-mutations-with-mutators>`__ to write more complex model spaces.
Explore the Defined Model Space
-------------------------------

There are basically two exploration approaches: (1) searching by evaluating each sampled model independently, and (2) one-shot weight-sharing based search. We demonstrate the first approach in this tutorial. Users can refer to `here <./OneshotTrainer.rst>`__ for the second approach.

Users can choose a proper search strategy to explore the model space, and use a chosen or user-defined model evaluator to evaluate the performance of each sampled model.

Choose a search strategy
^^^^^^^^^^^^^^^^^^^^^^^^

Retiarii currently supports the following search strategies:

* Grid search: enumerate all the possible models defined in the space.
* Random: randomly pick models from the search space.
* Regularized evolution: a genetic algorithm that explores the space based on inheritance and mutation.
Choosing (i.e., instantiating) a search strategy is very easy. An example is as follows,

.. code-block:: python

    import nni.retiarii.strategy as strategy

    search_strategy = strategy.Random(dedup=True)  # dedup=False if deduplication is not wanted

Detailed descriptions and usages of available strategies can be found `here <./ApiReference.rst>`__.
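The effect of deduplicated random search can be illustrated with a stdlib-only toy (hypothetical ``random_search`` helper, not the real strategy): sample architectures at random but never evaluate the same one twice.

```python
import random

def random_search(space, evaluate, budget, seed=0, dedup=True):
    """Sample up to `budget` architectures; with dedup, skip repeats."""
    rng = random.Random(seed)
    seen = set()
    best = None
    for _ in range(budget):
        arch = tuple(rng.choice(options) for options in space)
        if dedup and arch in seen:
            continue                  # already evaluated, save the budget
        seen.add(arch)
        score = evaluate(arch)
        if best is None or score > best[1]:
            best = (arch, score)
    return best

space = [("conv3x3", "conv5x5"), (16, 32, 64)]   # toy 2-dimensional space
score = lambda arch: len(arch[0]) + arch[1]      # toy stand-in for accuracy
best_arch, best_score = random_search(space, score, budget=20)
```

In a real experiment the expensive part is ``evaluate`` (a full model training), which is why skipping duplicates matters.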
Choose or write a model evaluator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the NAS process, the search strategy repeatedly generates new models. A model evaluator is for training and validating each generated model. The obtained performance of a generated model is collected and sent to the search strategy for generating better models.

The model evaluator should correctly identify the use case of the model and the optimization goal. For example, on a classification task, an <input, label> dataset is needed, the loss function could be cross entropy, and the optimized metric could be accuracy. On a regression task, the optimized metric could be mean squared error.

In the context of PyTorch, Retiarii has provided two built-in model evaluators, designed for simple use cases: classification and regression. These two evaluators are built upon the awesome library PyTorch-Lightning.

The example here creates a simple evaluator that runs on the MNIST dataset, trains for 10 epochs, and reports its validation accuracy.
.. code-block:: python

    from nni.retiarii import serialize
    from torchvision import transforms

    transform = serialize(transforms.Compose, [serialize(transforms.ToTensor()), serialize(transforms.Normalize, (0.1307,), (0.3081,))])
    train_dataset = serialize(MNIST, root='data/mnist', train=True, download=True, transform=transform)
    test_dataset = serialize(MNIST, root='data/mnist', train=False, download=True, transform=transform)
    evaluator = pl.Classification(train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
                                  val_dataloaders=pl.DataLoader(test_dataset, batch_size=100),
                                  max_epochs=10)

As the model evaluator is running in another process (possibly on some remote machines), the defined evaluator, along with all its parameters, needs to be correctly serialized. For example, users should use the dataloader that has already been wrapped as a serializable class defined in ``nni.retiarii.evaluator.pytorch.lightning``. For the arguments used in the dataloader, recursive serialization needs to be done, until the arguments are simple types like int, str, and float.

Detailed descriptions and usages of model evaluators can be found `here <./ApiReference.rst>`__.

If the built-in model evaluators do not meet your requirements, or you have already written the training code and just want to use it, you can follow `the guide to write a new evaluator <./WriteTrainer.rst>`__.
.. note:: In case you want to run the model evaluator locally for debugging purposes, you can directly run the evaluator via ``evaluator._execute(Net)`` (note that it has to be ``Net``, not ``Net()``). However, this API is currently internal and subject to change.
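The distinction between ``Net`` and ``Net()`` matters because the evaluator may instantiate the model class itself, possibly several times. A dependency-free sketch of this pattern (``run_twice`` is a hypothetical helper, not an NNI API):

```python
# Illustrates why an evaluator takes the class ``Net`` rather than an
# instance ``Net()``: it constructs fresh models itself, one per run.
# ``run_twice`` is a hypothetical helper, not an NNI API.
class Net:
    def __init__(self):
        self.trained = False  # a freshly constructed model is untrained

def run_twice(model_cls):
    results = []
    for _ in range(2):
        model = model_cls()  # a brand-new, untrained model every time
        results.append(model.trained)
        model.trained = True  # "training" one instance does not leak into the next run
    return results

print(run_twice(Net))  # [False, False]
```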
.. warning:: Mutation on the parameters of the model evaluator (known as hyper-parameter tuning) is currently not supported, but will be supported in the future.

.. warning:: To use PyTorch-Lightning with Retiarii, currently you need to install PyTorch-Lightning v1.1.x (v1.2 is not supported).
Launch an Experiment
--------------------
After all the above are prepared, it is time to start an experiment to do the model search. An example is shown below.
.. code-block:: python

    exp = RetiariiExperiment(base_model, trainer, None, simple_strategy)
    exp_config = RetiariiExeConfig('local')
    exp_config.experiment_name = 'mnasnet_search'
    exp_config.trial_concurrency = 2
    exp_config.training_service.use_active_gpu = False
    exp.run(exp_config, 8081)
This code starts an NNI experiment. Note that if inlined mutation is used, ``applied_mutators`` should be ``None``.
The complete code of a simple MNIST example can be found :githublink:`here <test/retiarii_test/mnist/test.py>`.
Visualize the Experiment
------------------------
Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment. For example, open ``localhost:8081`` in your browser, where 8081 is the port that you set in ``exp.run``. Please refer to `here <../../Tutorial/WebUI.rst>`__ for details. If you are using a one-shot trainer, please refer to `here <../Visualization.rst>`__ for how to visualize the experiment.
Export the best model found in your experiment
----------------------------------------------
If you are using the *classic search approach*, you can simply find the best model on the WebUI.
If you are using the *oneshot (weight-sharing) search approach*, you can invoke ``exp.export_top_models`` to output the several best models found in the experiment.
Customize A New Model Evaluator
===============================
A model evaluator is necessary to evaluate the performance of newly explored models. A model evaluator usually includes the training, validating and testing of a single model. We provide two ways for users to write a new model evaluator, which are demonstrated below respectively.
With FunctionalEvaluator
------------------------
The simplest way to customize a new evaluator is with functional APIs, which is very easy when training code is already available. Users only need to write a fit function that wraps everything. This function takes one positional argument (``model_cls``) and possible keyword arguments. The keyword arguments (other than ``model_cls``) are fed to ``FunctionalEvaluator`` as its initialization parameters. In this way, users get everything under their control, but expose less information to the framework and thus fewer opportunities for possible optimization. An example is as below:
.. code-block:: python

    import nni

    from nni.retiarii.evaluator import FunctionalEvaluator
    from nni.retiarii.experiment.pytorch import RetiariiExperiment

    def fit(model_cls, dataloader):
        model = model_cls()
        train(model, dataloader)
        acc = test(model, dataloader)
        nni.report_final_result(acc)

    evaluator = FunctionalEvaluator(fit, dataloader=DataLoader(foo, bar))
    experiment = RetiariiExperiment(base_model, evaluator, mutators, strategy)
.. note:: Due to a limitation of our current implementation, the ``fit`` function should be put in another Python file instead of in the main file. This limitation will be fixed in a future release.
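The mechanics of this functional pattern can be sketched without NNI installed. ``SketchFunctionalEvaluator`` below is a hypothetical stand-in, not the real ``FunctionalEvaluator``; it only shows how init-time keyword arguments are replayed into the fit function for each candidate model:

```python
# Hypothetical stand-in for ``FunctionalEvaluator`` to show the pattern:
# keyword arguments given at construction time are replayed into the fit
# function together with each candidate model class.
class SketchFunctionalEvaluator:
    def __init__(self, fit_fn, **kwargs):
        self.fit_fn = fit_fn
        self.kwargs = kwargs  # e.g. dataloader=...

    def evaluate(self, model_cls):
        return self.fit_fn(model_cls, **self.kwargs)

def fit(model_cls, dataloader):
    model = model_cls()  # instantiate the candidate architecture
    return f'trained {type(model).__name__} on {dataloader}'

class TinyNet:
    pass

evaluator = SketchFunctionalEvaluator(fit, dataloader='mnist-loader')
print(evaluator.evaluate(TinyNet))  # trained TinyNet on mnist-loader
```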
With PyTorch-Lightning
----------------------
It's recommended to write training code in PyTorch-Lightning style, that is, to write a LightningModule that defines all elements needed for training (e.g., loss function, optimizer) and to define a trainer that takes (optional) dataloaders to execute the training. Before that, please read the `documentation of PyTorch-Lightning <https://pytorch-lightning.readthedocs.io/>`__ to learn the basic concepts and components it provides.
In practice, a new training module in Retiarii should inherit ``nni.retiarii.evaluator.pytorch.lightning.LightningModule``, which has a ``set_model`` method that will be called after ``__init__`` to save the candidate model (generated by a strategy) as ``self.model``. The rest of the process (like ``training_step``) should be the same as writing any other lightning module. Evaluators should also communicate with strategies via two API calls (``nni.report_intermediate_result`` for periodical metrics and ``nni.report_final_result`` for final metrics), added in ``on_validation_epoch_end`` and ``teardown`` respectively.
An example is as follows:
Then, users need to wrap everything (including the LightningModule, trainer and dataloaders) into a ``Lightning`` object, and pass this object into a Retiarii experiment:

.. code-block:: python

    lightning = pl.Lightning(...,  # the LightningModule and trainer (definitions elided here)
                             train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
                             val_dataloaders=pl.DataLoader(test_dataset, batch_size=100))
    experiment = RetiariiExperiment(base_model, lightning, mutators, strategy)
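The ``set_model`` contract mentioned above can be sketched in isolation. The class below is a hypothetical simplification, not the real ``nni.retiarii.evaluator.pytorch.lightning.LightningModule``: training elements are fixed at ``__init__`` time, while the candidate model is attached afterwards by the framework.

```python
# Hypothetical simplification of the ``set_model`` contract (not the real
# LightningModule): the framework constructs the module first, then attaches
# the candidate architecture afterwards.
class SketchLightningModule:
    def __init__(self, criterion='cross_entropy'):
        self.criterion = criterion  # training elements known at init time
        self.model = None           # candidate architecture attached later

    def set_model(self, model_cls):
        # accept a class (instantiate it) or an already-built instance
        self.model = model_cls() if isinstance(model_cls, type) else model_cls

class CandidateNet:
    pass

module = SketchLightningModule()
module.set_model(CandidateNet)
print(type(module.model).__name__)  # CandidateNet
```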
One-shot trainers
-----------------
One-shot trainers should inherit ``nni.retiarii.oneshot.BaseOneShotTrainer``, and need to implement ``fit()`` (used to conduct the fitting and searching process) and ``export()`` (used to return the searched best architecture).
Writing a one-shot trainer is very different from writing classic evaluators. First of all, there are no restrictions on init method arguments; any Python arguments are acceptable. Secondly, the model fed into one-shot trainers might be a model with Retiarii-specific modules, such as LayerChoice and InputChoice. Such a model cannot forward-propagate directly, and trainers need to decide how to handle those modules.

A typical example is DartsTrainer, where learnable parameters are used to combine multiple choices in LayerChoice. Retiarii provides easy-to-use utility functions for module-replacement purposes, namely ``replace_layer_choice`` and ``replace_input_choice``. A simplified example is as follows:
.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    from nni.retiarii.oneshot import BaseOneShotTrainer
    from nni.retiarii.oneshot.pytorch import replace_layer_choice, replace_input_choice


    class DartsLayerChoice(nn.Module):
        def __init__(self, layer_choice):
            super(DartsLayerChoice, self).__init__()
            self.name = layer_choice.key
            self.op_choices = nn.ModuleDict(layer_choice.named_children())
            self.alpha = nn.Parameter(torch.randn(len(self.op_choices)) * 1e-3)

        def forward(self, *args, **kwargs):
            # stack all candidate outputs and combine them with softmax(alpha) weights
            op_results = torch.stack([op(*args, **kwargs) for op in self.op_choices.values()])
            alpha_shape = [-1] + [1] * (len(op_results.size()) - 1)
            return torch.sum(op_results * F.softmax(self.alpha, -1).view(*alpha_shape), 0)


    class DartsTrainer(BaseOneShotTrainer):
        def __init__(self, model, loss, metrics, optimizer):
            self.model = model
            self.loss = loss
            self.metrics = metrics
            self.num_epochs = 10

            self.nas_modules = []
            replace_layer_choice(self.model, DartsLayerChoice, self.nas_modules)

            ...  # init dataloaders and optimizers

        def fit(self):
            for i in range(self.num_epochs):
                # alternate between architecture updates (on validation data)
                # and weight updates (on training data)
                for (trn_X, trn_y), (val_X, val_y) in zip(self.train_loader, self.valid_loader):
                    self.train_architecture(val_X, val_y)
                    self.train_model_weight(trn_X, trn_y)

        @torch.no_grad()
        def export(self):
            result = dict()
            for name, module in self.nas_modules:
                if name not in result:
                    result[name] = select_best_of_module(module)
            return result
The full code of DartsTrainer is available in the Retiarii source code. Please have a look at :githublink:`nni/retiarii/trainer/pytorch/darts.py`.
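The weighted combination in ``DartsLayerChoice.forward`` is just a softmax mixture over candidate outputs, and ``export`` amounts to picking the candidate with the largest architecture weight. A dependency-free sketch of both ideas, with scalars standing in for tensors and hypothetical candidate names:

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical architecture parameters (alpha) and candidate-op outputs
alpha = {'conv3x3': 1.2, 'conv5x5': 0.1, 'maxpool': -0.5}
op_results = {'conv3x3': 2.0, 'conv5x5': 3.0, 'maxpool': 1.0}

weights = dict(zip(alpha, softmax(list(alpha.values()))))
mixed = sum(weights[name] * op_results[name] for name in alpha)  # like forward()
exported = max(alpha, key=alpha.get)                             # like export()
print(exported)  # conv3x3
```

During search, gradients flow into ``alpha`` through the softmax mixture; at export time, only the argmax survives.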
Retiarii Overview
=================

.. toctree::
   :maxdepth: 2

   Quick Start <Tutorial>
   Write a Model Evaluator <WriteTrainer>
   One-shot NAS <OneshotTrainer>
   Advanced Tutorial <Advanced>
   Customize a New Strategy <WriteStrategy>
   Retiarii APIs <ApiReference>
For details, please refer to the following tutorials:

.. toctree::

   Write A Search Space <NAS/WriteSearchSpace>
   Classic NAS <NAS/ClassicNas>
   One-shot NAS <NAS/one_shot_nas>
   Retiarii NAS (Alpha) <NAS/retiarii/retiarii_index>
   Customize a NAS Algorithm <NAS/Advanced>
   NAS Visualization <NAS/Visualization>
   Search Space Zoo <NAS/SearchSpaceZoo>