Create a Trainer and Exploration Strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**Classic search approach:**
In this approach, the trainer trains each explored model, while the strategy samples models from the model space. Both a trainer and a strategy are required to explore the model space. We recommend PyTorch-Lightning for writing the full training process.
**Oneshot (weight-sharing) search approach:**
In this approach, users only need a one-shot trainer, because this trainer takes charge of both search and training.
In the following table, we list the available trainers and strategies.
.. list-table::
   :header-rows: 1

   * - Trainer
     - Strategy
     - Oneshot Trainer
   * - Classification
     - TPEStrategy
     - DartsTrainer
   * - Regression
     - RandomStrategy
     - EnasTrainer
   * -
Here is a simple example of using trainer and strategy.
.. Note:: For NNI to capture the dataset and dataloader and distribute them across different runs, please wrap your dataset with ``blackbox`` and use ``pl.DataLoader`` instead of ``torch.utils.data.DataLoader``. See the ``blackbox_module`` section below for details.

Users can refer to the `API reference <./ApiReference.rst>`__ for detailed usage of trainers, to `write a trainer <./WriteTrainer.rst>`__ for how to write a new trainer, and to `this document <./WriteStrategy.rst>`__ for how to write a new strategy.
Trainers are necessary to evaluate the performance of newly explored models. In the NAS scenario, this further divides into two use cases:
1. **Single-arch trainers**: trainers that are used to train and evaluate one single model.
2. **One-shot trainers**: trainers that handle training and searching simultaneously, from an end-to-end perspective.
Single-arch trainers
--------------------
With PyTorch-Lightning
^^^^^^^^^^^^^^^^^^^^^^
It's recommended to write training code in PyTorch-Lightning style, that is, to write a LightningModule that defines all elements needed for training (e.g., loss function, optimizer) and to define a trainer that takes (optional) dataloaders to execute the training. Before that, please read the `document of PyTorch-Lightning <https://pytorch-lightning.readthedocs.io/>`__ to learn the basic concepts and components it provides.
In practice, a new training module written for NNI should inherit ``nni.retiarii.trainer.pytorch.lightning.LightningModule``, which has a ``set_model`` method that will be called after ``__init__`` to save the candidate model (generated by the strategy) as ``self.model``. The rest of the process (like ``training_step``) should be the same as writing any other lightning module. Trainers should also communicate with strategies via two API calls (``nni.report_intermediate_result`` for periodical metrics and ``nni.report_final_result`` for final metrics), added in ``on_validation_epoch_end`` and ``teardown`` respectively.
An example is as follows:
.. code-block:: python

   from nni.retiarii.trainer.pytorch.lightning import LightningModule  # please import this one
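   import nni
   import torch
   import torch.nn.functional as F

   # A minimal sketch following the description above; the concrete loss,
   # optimizer and metric names are illustrative assumptions, not the
   # official example.
   class MyModule(LightningModule):
       def __init__(self, learning_rate=1e-3):
           super().__init__()
           # ``self.model`` is filled in later by ``set_model`` with the
           # candidate model generated by the strategy.
           self.learning_rate = learning_rate

       def training_step(self, batch, batch_idx):
           x, y = batch
           return F.cross_entropy(self.model(x), y)

       def validation_step(self, batch, batch_idx):
           x, y = batch
           self.log('val_loss', F.cross_entropy(self.model(x), y))

       def configure_optimizers(self):
           return torch.optim.Adam(self.parameters(), lr=self.learning_rate)

       def on_validation_epoch_end(self):
           # periodical metric, reported to the exploration strategy
           nni.report_intermediate_result(self.trainer.callback_metrics['val_loss'].item())

       def teardown(self, stage):
           if stage == 'fit':
               # final metric, reported once training finishes
               nni.report_final_result(self.trainer.callback_metrics['val_loss'].item())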
Then, users need to wrap everything (including the LightningModule, trainer, and dataloaders) into a ``Lightning`` object, and pass this object into a Retiarii experiment.
.. code-block:: python

   import nni.retiarii.trainer.pytorch.lightning as pl
   from nni.retiarii.experiment.pytorch import RetiariiExperiment
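   # A sketch of the wrapping step described above; ``MyModule`` is the module
   # from the previous example, while the datasets, ``base_model`` and
   # ``my_strategy`` are user-side placeholders. The ``Lightning``/``Trainer``
   # argument names here are assumptions for illustration.
   train_dataset = ...  # wrapped with ``blackbox`` (see the note above)
   test_dataset = ...

   lightning = pl.Lightning(MyModule(learning_rate=1e-3),
                            pl.Trainer(max_epochs=10),
                            train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
                            val_dataloaders=pl.DataLoader(test_dataset, batch_size=100))
   experiment = RetiariiExperiment(base_model, lightning, strategy=my_strategy)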
There is another way to customize a new trainer with functional APIs, which provides more flexibility. Users only need to write a fit function that wraps everything. This function takes one positional argument (the model) and possibly keyword arguments. In this way, users get everything under their control, but expose less information to the framework, and thus fewer opportunities for possible optimization. An example is as below:
.. code-block:: python

   from nni.retiarii.trainer import FunctionalTrainer
   from nni.retiarii.experiment.pytorch import RetiariiExperiment
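   import nni

   # A sketch of the functional API described above; ``train``, ``test``,
   # ``DataLoader(foo, bar)``, ``base_model`` and ``my_strategy`` are
   # user-side placeholders.
   def fit(model, dataloader):
       train(model, dataloader)       # user-defined training loop
       acc = test(model, dataloader)  # user-defined evaluation
       nni.report_final_result(acc)   # report the final metric to the strategy

   trainer = FunctionalTrainer(fit, dataloader=DataLoader(foo, bar))
   experiment = RetiariiExperiment(base_model, trainer, strategy=my_strategy)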
One-shot trainers
-----------------

One-shot trainers should inherit ``nni.retiarii.trainer.BaseOneShotTrainer``, and need to implement the ``fit()`` method (which conducts the fitting and searching process) and the ``export()`` method (which returns the searched best architecture).
Writing a one-shot trainer is very different from writing a single-arch trainer. First of all, there are no more restrictions on the arguments of the init method; any Python arguments are acceptable. Secondly, the model fed into one-shot trainers might be a model with Retiarii-specific modules, such as LayerChoice and InputChoice. Such a model cannot directly forward-propagate, and trainers need to decide how to handle those modules.
A typical example is DartsTrainer, where learnable parameters are used to combine candidate choices during search. Retiarii provides utility functions for this kind of module replacement, namely ``replace_layer_choice`` and ``replace_input_choice``.
.. code-block:: python

   from nni.retiarii.trainer.pytorch import BaseOneShotTrainer
   from nni.retiarii.trainer.pytorch.utils import replace_layer_choice, replace_input_choice
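   # A skeletal sketch following the description above; the sampling and
   # training logic are omitted, and all names below are illustrative
   # assumptions rather than a built-in trainer.
   class RandomTrainer(BaseOneShotTrainer):
       def __init__(self, model, optimizer, loss, train_loader, epochs=1):
           self.model = model
           self.optimizer = optimizer
           self.loss = loss
           self.train_loader = train_loader
           self.epochs = epochs

       def fit(self):
           # Replace LayerChoice / InputChoice instances (e.g., via
           # ``replace_layer_choice`` / ``replace_input_choice``) with modules
           # that can forward-propagate, then run an ordinary training loop.
           ...

       def export(self):
           # Return the searched best architecture, e.g., a dict mapping each
           # choice's label to the selected candidate.
           ...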