Defining a base model is almost the same as defining a PyTorch (or TensorFlow) model. There are only two small differences.
* Replace the code ``import torch.nn as nn`` with ``import nni.retiarii.nn.pytorch as nn`` for PyTorch modules, such as ``nn.Conv2d``, ``nn.ReLU``.
* Some **user-defined** modules should be decorated with ``@basic_unit``. For example, a user-defined module used in ``LayerChoice`` should be decorated. Refer to `here <#serialize-module>`__ for detailed usage instructions of ``@basic_unit``.
Below is a very simple example of defining a base model; it is almost the same as defining a PyTorch model.
...
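To make the pattern concrete, a base model might look like the following minimal sketch (the layer shapes here are illustrative assumptions, not taken from the original example):

.. code-block:: python

   import torch
   import torch.nn.functional as F
   import nni.retiarii.nn.pytorch as nn

   class Net(nn.Module):
       def __init__(self):
           super().__init__()
           # note: `nn` here is nni.retiarii.nn.pytorch, not torch.nn
           self.conv1 = nn.Conv2d(1, 32, 3)
           self.fc = nn.Linear(32 * 13 * 13, 10)

       def forward(self, x):
           x = F.max_pool2d(F.relu(self.conv1(x)), 2)
           return self.fc(torch.flatten(x, 1))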
For easy usability and also backward compatibility, we provide some APIs for users to easily express possible mutations after defining a base model. The APIs can be used just like PyTorch modules.
* ``nn.LayerChoice``. It allows users to put several candidate operations (e.g., PyTorch modules), from which one is chosen in each explored model. *Note that if a candidate is a user-defined module, it should be decorated as a `serialize module <#serialize-module>`__. In the following example, ``ops.PoolBN`` and ``ops.SepConv`` should be decorated.*
.. code-block:: python
...
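As a sketch of how ``nn.LayerChoice`` is typically declared (the ``ops.PoolBN`` and ``ops.SepConv`` constructor signatures below are illustrative assumptions):

.. code-block:: python

   # declared in `__init__`; one candidate is chosen in each explored model
   self.op = nn.LayerChoice([
       ops.PoolBN('max', channels, 3, stride, 1),
       ops.SepConv(channels, channels, 3, stride, 1),
       nn.Identity()
   ])
   # invoked in `forward` like an ordinary module
   out = self.op(x)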
# invoked in `forward` function, choose one from the three
out = self.input_switch([tensor1, tensor2, tensor3])
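For context, the ``input_switch`` used above would be declared with ``nn.InputChoice`` in ``__init__`` (a sketch; choosing one out of three is an assumption matching the snippet):

.. code-block:: python

   # declared in `__init__`: choose 1 tensor out of 3 candidate inputs
   self.input_switch = nn.InputChoice(n_candidates=3, n_chosen=1)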
* ``nn.ValueChoice``. It is for choosing one value from some candidate values. It can only be used as an input argument of the modules in ``nn.modules`` and of ``@basic_unit``-decorated user-defined modules.
.. code-block:: python
...

Use placeholder to make mutation easier: ``nn.Placeholder``.
.. code-block:: python

   ph = nn.Placeholder(
       label='mutable_0',
       kernel_size_options=[1, 3, 5],
       n_layer_options=[1, 2, 3, 4],
       exp_ratio=exp_ratio,
       stride=stride
   )
``label`` is used by the mutator to identify this placeholder. The other parameters are the information required by the mutator; they can be accessed from ``node.operation.parameters`` as a dict, which can include any information that users want to pass to a user-defined mutator. The complete example code can be found in :githublink:`Mnasnet base model <test/retiarii_test/mnasnet/base_mnasnet.py>`.
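For instance, a user-defined mutator could locate this placeholder by its label and read those parameters (a hedged sketch; the ``BlockMutator`` class and its mutation logic are hypothetical):

.. code-block:: python

   from nni.retiarii import Mutator

   class BlockMutator(Mutator):
       def __init__(self, target: str):
           super().__init__()
           self.target = target

       def mutate(self, model):
           # find the placeholder node(s) by the label given to nn.Placeholder
           for node in model.get_nodes_by_label(self.target):
               params = node.operation.parameters  # the kwargs passed to nn.Placeholder
               kernel_size = self.choice(params['kernel_size_options'])
               n_layer = self.choice(params['n_layer_options'])
               ...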
Explore the Defined Model Space
-------------------------------
After the model space is defined, it is time to explore it. Users can choose a proper search strategy and model evaluator to explore the model space.
Create an Evaluator and Exploration Strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**Classic search approach:**
In this approach, the model evaluator trains and tests each explored model, while the strategy samples models from the space. Both an evaluator and a strategy are required to explore the model space. We recommend PyTorch-Lightning for writing the full evaluation process.
**Oneshot (weight-sharing) search approach:**
In this approach, users only need a oneshot trainer, because it takes charge of search, training, and testing.
The following table lists the available evaluators and strategies.
.. list-table::
   :header-rows: 1
   :widths: auto

   * - Evaluator
     - Strategy
     - Oneshot Trainer
   * - Classification
...
Their usage and API documentation can be found `here <./ApiReference>`__.
Here is a simple example of using an evaluator and a strategy.
.. code-block:: python

   import nni.retiarii.evaluator.pytorch.lightning as pl
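A sketch of how an evaluator and a strategy could be created together (the datasets and hyperparameters below are placeholders):

.. code-block:: python

   import nni.retiarii.strategy as strategy
   import nni.retiarii.evaluator.pytorch.lightning as pl

   # evaluator: trains and tests each sampled model
   evaluator = pl.Classification(train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
                                 val_dataloaders=pl.DataLoader(test_dataset, batch_size=100),
                                 max_epochs=10)
   # strategy: samples models from the defined space
   search_strategy = strategy.Random()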
.. Note:: For NNI to capture the dataset and dataloader and distribute them across different runs, please wrap your dataset with ``serialize`` and use ``pl.DataLoader`` instead of ``torch.utils.data.DataLoader``. See the ``basic_unit`` section below for details.
Users can refer to the `API reference <./ApiReference.rst>`__ for detailed usage of evaluators, to "`write a trainer <./WriteTrainer.rst>`__" for how to write a new evaluator, and to `this document <./WriteStrategy.rst>`__ for how to write a new strategy.
Set up an Experiment
^^^^^^^^^^^^^^^^^^^^
...
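For the classic approach, an experiment could be assembled and launched roughly like this (a sketch; the config field names are assumptions based on ``RetiariiExeConfig``):

.. code-block:: python

   from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig

   exp = RetiariiExperiment(base_model, evaluator, applied_mutators, search_strategy)
   exp_config = RetiariiExeConfig('local')
   exp_config.experiment_name = 'example_search'
   exp_config.max_trial_number = 20
   exp_config.trial_concurrency = 2
   exp.run(exp_config, 8081)  # launch on port 8081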
Advanced and FAQ
----------------
.. _serialize-module:
**Serialize Module**
To understand the decorator ``basic_unit``, we first briefly explain how our framework works: it converts a user-defined model to a graph representation (called graph IR), where each instantiated module is converted to a subgraph. User-defined mutations are then applied to the graph to generate new graphs, and each new graph is converted back to PyTorch code and executed. ``@basic_unit`` means the module will not be converted to a subgraph; instead, it is converted to a single graph node and will not be unfolded further. Users should/can decorate a user-defined module class in the following cases:
* When a module class cannot be successfully converted to a subgraph due to implementation issues. For example, our framework currently does not support ad-hoc loops; if there is an ad-hoc loop in a module's ``forward``, the class should be decorated as a serializable module. The following ``MyModule`` should be decorated.
.. code-block:: python

   @basic_unit
   class MyModule(nn.Module):
       def __init__(self):
           ...
       def forward(self, x):
           ...
           for i in range(10):  # <- ad-hoc loop
               ...
* The candidate ops in ``LayerChoice`` should be decorated as serializable modules. For example, in ``self.op = nn.LayerChoice([Op1(...), Op2(...), Op3(...)])``, ``Op1``, ``Op2``, and ``Op3`` should be decorated if they are user-defined modules.
* When users want to use ``ValueChoice`` in a module's input argument, the module should be decorated as a serializable module. For example, in ``self.conv = MyConv(kernel_size=nn.ValueChoice([1, 3, 5]))``, ``MyConv`` should be decorated.
* If no mutation targets a module, the module *can be* decorated as a serializable module.
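The cases above can be sketched together (``MyConv`` is a hypothetical user-defined module):

.. code-block:: python

   @basic_unit
   class MyConv(nn.Module):
       def __init__(self, kernel_size):
           super().__init__()
           self.conv = nn.Conv2d(3, 3, kernel_size, padding=kernel_size // 2)

       def forward(self, x):
           return self.conv(x)

   # inside another module's `__init__`: because MyConv is decorated,
   # it may be used as a LayerChoice candidate or take a ValueChoice argument
   self.conv = MyConv(kernel_size=nn.ValueChoice([1, 3, 5]))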
Evaluators/Trainers are necessary to evaluate the performance of newly explored models. In the NAS scenario, this further divides into two use cases:
1. **Single-arch evaluators**: evaluators that are used to train and evaluate one single model.
2. **One-shot trainers**: trainers that handle training and searching simultaneously, from an end-to-end perspective.
Single-arch evaluators
----------------------
With FunctionalEvaluator
^^^^^^^^^^^^^^^^^^^^^^^^
The simplest way to customize a new evaluator is with functional APIs, which is very easy when training code is already available. Users only need to write a fit function that wraps everything. This function takes one positional argument (the model) and possible keyword arguments. In this way, users get everything under their control, but expose less information to the framework and thus leave fewer opportunities for possible optimization. An example is as follows:
.. code-block:: python

   from nni.retiarii.evaluator import FunctionalEvaluator
   from nni.retiarii.experiment.pytorch import RetiariiExperiment
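A sketch of how these imports fit together; the ``fit`` function and its training/testing helpers are hypothetical user code:

.. code-block:: python

   def fit(model_cls):
       # instantiate the candidate model class generated by the strategy
       model = model_cls()
       train_model(model)            # hypothetical user-defined training loop
       accuracy = test_model(model)  # hypothetical user-defined testing
       nni.report_final_result(accuracy)

   evaluator = FunctionalEvaluator(fit)
   experiment = RetiariiExperiment(base_model, evaluator, mutators, strategy)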
It's recommended to write training code in PyTorch-Lightning style, that is, to write a LightningModule that defines all elements needed for training (e.g., loss function, optimizer) and to define a trainer that takes (optional) dataloaders to execute the training. Before that, please read the `document of PyTorch-lightning <https://pytorch-lightning.readthedocs.io/>`__ to learn the basic concepts and components provided by PyTorch-lightning.
In practice, a new training module in NNI should inherit ``nni.retiarii.evaluator.pytorch.lightning.LightningModule``, which has a ``set_model`` method that will be called after ``__init__`` to save the candidate model (generated by the strategy) as ``self.model``. The rest of the process (like ``training_step``) should be the same as writing any other lightning module. Evaluators should also communicate with strategies via two API calls (``nni.report_intermediate_result`` for periodical metrics and ``nni.report_final_result`` for final metrics), added in ``on_validation_epoch_end`` and ``teardown`` respectively.
An example is as follows:
.. code-block:: python

   from nni.retiarii.evaluator.pytorch.lightning import LightningModule  # please import this one

   @basic_unit
   class AutoEncoder(LightningModule):
       def __init__(self):
           super().__init__()
           ...
Then, users need to wrap everything (including LightningModule, trainer and dataloaders) into a ``Lightning`` object, and pass this object into a Retiarii experiment.
.. code-block:: python

   import nni.retiarii.evaluator.pytorch.lightning as pl
   from nni.retiarii.experiment.pytorch import RetiariiExperiment

   lightning = pl.Lightning(AutoEncoder(),
...
One-shot trainers should inherit ``nni.retiarii.oneshot.BaseOneShotTrainer``, and need to implement the ``fit()`` method (used to conduct the fitting and searching process) and the ``export()`` method (used to return the searched best architecture).
Writing a one-shot trainer is very different from writing classic evaluators. First of all, there are no more restrictions on init method arguments; any Python arguments are acceptable. Secondly, the model fed into one-shot trainers might be a model with Retiarii-specific modules, such as LayerChoice and InputChoice. Such a model cannot directly forward-propagate, and trainers need to decide how to handle those modules.
A typical example is DartsTrainer, where learnable parameters are used to combine multiple choices in LayerChoice. Retiarii provides easy-to-use utility functions for module-replacement purposes, namely ``replace_layer_choice`` and ``replace_input_choice``. A simplified example is as follows:
.. code-block:: python

   from nni.retiarii.oneshot import BaseOneShotTrainer
   from nni.retiarii.oneshot.pytorch import replace_layer_choice, replace_input_choice
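A hedged sketch of the pattern (simplified from a DARTS-style trainer; the training loop is elided and the helper logic is illustrative, not the library's exact implementation):

.. code-block:: python

   import torch
   import torch.nn as nn
   import torch.nn.functional as F

   class DartsLayerChoice(nn.Module):
       """Replaces a LayerChoice: combines all candidates with learnable weights."""
       def __init__(self, layer_choice):
           super().__init__()
           self.op_choices = nn.ModuleDict(layer_choice.named_children())
           self.alpha = nn.Parameter(torch.randn(len(self.op_choices)) * 1e-3)

       def forward(self, *args, **kwargs):
           results = torch.stack([op(*args, **kwargs) for op in self.op_choices.values()])
           shape = [-1] + [1] * (results.dim() - 1)
           return torch.sum(results * F.softmax(self.alpha, -1).view(shape), 0)

   class MyOneShotTrainer(BaseOneShotTrainer):
       def __init__(self, model, loss, optimizer):
           self.model = model
           self.nas_modules = []
           # swap every LayerChoice in the model for the differentiable version above
           replace_layer_choice(self.model, DartsLayerChoice, self.nas_modules)
           ...

       def fit(self):
           # alternate architecture updates and weight updates (details elided)
           ...

       @torch.no_grad()
       def export(self):
           # pick the candidate op with the largest architecture weight
           return {name: list(module.op_choices)[module.alpha.argmax().item()]
                   for name, module in self.nas_modules}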