To enable the CGO execution engine, you need to follow these steps:
.. code-block:: python

    config.training_service.machine_list = [rm_conf]
    exp.run(config, 8099)
The CGO execution engine only supports PyTorch-Lightning trainers that inherit :class:`nni.retiarii.evaluator.pytorch.cgo.evaluator.MultiModelSupervisedLearningModule`.
For a trial running multiple models, trainers inheriting :class:`nni.retiarii.evaluator.pytorch.cgo.evaluator.MultiModelSupervisedLearningModule` can handle the multiple outputs from the merged model for training, test and validation.

We have already implemented two trainers: :class:`nni.retiarii.evaluator.pytorch.cgo.evaluator.Classification` and :class:`nni.retiarii.evaluator.pytorch.cgo.evaluator.Regression`.
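For instance, a CGO-compatible evaluator can be constructed as in the hedged sketch below. The dataset objects (``train_dataset``, ``test_dataset``) are assumed to have been created and traced beforehand, the keyword arguments follow the built-in Lightning evaluators and may differ slightly across NNI versions, and the ``execution_engine`` configuration field is an assumption based on the Retiarii experiment config.

.. code-block:: python

    import nni.retiarii.evaluator.pytorch.lightning as pl
    from nni.retiarii.evaluator.pytorch.cgo import evaluator as cgo

    # use the CGO-aware Classification trainer in place of pl.Classification
    evaluator = cgo.Classification(
        train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
        val_dataloaders=pl.DataLoader(test_dataset, batch_size=100),
        max_epochs=1,
    )

    # assumption: the experiment configuration exposes an ``execution_engine`` field
    config.execution_engine = 'cgo'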
Supported Exploration Strategies
--------------------------------
NNI provides a set of built-in exploration strategies for multi-trial NAS. Users can also `customize new exploration strategies <./WriteStrategy.rst>`__.
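As a minimal sketch, a built-in strategy can be instantiated and later passed to the experiment. ``Random`` is used here for illustration; the set of available strategies and their arguments may vary across NNI versions.

.. code-block:: python

    import nni.retiarii.strategy as strategy

    # pick a built-in exploration strategy; ``dedup=True`` avoids sampling duplicate models
    search_strategy = strategy.Random(dedup=True)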
Evaluators with PyTorch-Lightning
---------------------------------
Use Built-in Evaluators
^^^^^^^^^^^^^^^^^^^^^^^
NNI provides some commonly used model evaluators for users' convenience. These evaluators are built upon the awesome library PyTorch-Lightning.
We recommend reading the `serialization tutorial <./Serialization.rst>`__ before using these evaluators. A few notes to summarize the tutorial:
1. ``pl.DataLoader`` should be used in place of ``torch.utils.data.DataLoader``.
2. The datasets used in data-loader should be decorated with ``nni.trace`` recursively.
For example,
.. code-block:: python

    import nni.retiarii.evaluator.pytorch.lightning as pl
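    import nni
    from torchvision import transforms
    from torchvision.datasets import MNIST

    # Hedged sketch (not copied from the official example): hand a traced MNIST
    # dataset to the built-in Classification evaluator through pl.DataLoader,
    # following the two notes above. Keyword argument names follow the built-in
    # Lightning evaluators and may differ slightly across NNI versions.
    train_dataset = nni.trace(MNIST)('data/mnist', train=True, download=True,
                                     transform=nni.trace(transforms.ToTensor)())
    test_dataset = nni.trace(MNIST)('data/mnist', train=False, download=True,
                                    transform=nni.trace(transforms.ToTensor)())

    evaluator = pl.Classification(
        train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
        val_dataloaders=pl.DataLoader(test_dataset, batch_size=100),
        max_epochs=10,
    )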
If these built-in model evaluators do not meet your requirements, you can customize new model evaluators following the tutorial `here <./WriteTrainer.rst>`__.
Another approach is to write training code in PyTorch-Lightning style, that is, to write a LightningModule that defines all elements needed for training (e.g., loss function, optimizer) and to define a trainer that takes (optional) dataloaders to execute the training. Before that, please read the `document of PyTorch-lightning <https://pytorch-lightning.readthedocs.io/>`__ to learn the basic concepts and components provided by PyTorch-lightning.
In practice, a new training module in Retiarii should inherit ``nni.retiarii.evaluator.pytorch.lightning.LightningModule``, which has a ``set_model`` method that will be called after ``__init__`` to save the candidate model (generated by the strategy) as ``self.model``. The rest of the process (like ``training_step``) should be the same as writing any other lightning module. Evaluators should also communicate with strategies via two API calls (``nni.report_intermediate_result`` for periodical metrics and ``nni.report_final_result`` for final metrics), added in ``on_validation_epoch_end`` and ``teardown`` respectively.
An example is as follows:
.. code-block:: python

    from nni.retiarii.evaluator.pytorch.lightning import LightningModule  # please import this one
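    import nni
    import torch
    import torch.nn.functional as F

    # Hedged sketch, not the official reference implementation: a minimal
    # classification module following the conventions described above. The
    # candidate model is injected by the framework via ``set_model`` and is
    # available as ``self.model``; the metric name 'val_acc' is our own choice.
    @nni.trace
    class MyLightningModule(LightningModule):
        def __init__(self, learning_rate=1e-3):
            super().__init__()
            self.learning_rate = learning_rate

        def forward(self, x):
            return self.model(x)  # ``self.model`` is set by the framework

        def training_step(self, batch, batch_idx):
            x, y = batch
            return F.cross_entropy(self(x), y)

        def validation_step(self, batch, batch_idx):
            x, y = batch
            acc = (self(x).argmax(dim=-1) == y).float().mean()
            self.log('val_acc', acc)

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=self.learning_rate)

        def on_validation_epoch_end(self):
            # periodically report the metric to the exploration strategy
            nni.report_intermediate_result(self.trainer.callback_metrics['val_acc'].item())

        def teardown(self, stage):
            if stage == 'fit':
                # report the final metric when training finishes
                nni.report_final_result(self.trainer.callback_metrics['val_acc'].item())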
.. attention:: NNI's latest NAS support is based on the Retiarii framework. Users who are still on the `early version using NNI NAS v1.0 <https://nni.readthedocs.io/en/v2.2/nas.html>`__ should migrate their work to Retiarii as soon as possible.
.. note:: Currently, PyTorch is the only framework supported by Retiarii, and we have only tested **PyTorch 1.6 to 1.9**. This documentation assumes a PyTorch context, but it should also apply to other frameworks, which is part of our future plan.
Pick or customize a model evaluator
-----------------------------------
In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training and validating each generated model to obtain the model's performance. The performance is sent to the exploration strategy for the strategy to generate better models.
Retiarii has provided `built-in model evaluators <./ModelEvaluators.rst>`__, but to start with, it is recommended to use ``FunctionalEvaluator``, that is, to wrap your own training and evaluation code with one single function. This function should receive one single model class and use ``nni.report_final_result`` to report the final score of this model.
An example here creates a simple evaluator that runs on the MNIST dataset, trains for 2 epochs, and reports its validation accuracy.
.. code-block:: python

    import nni
    import nni.retiarii.evaluator.pytorch.lightning as pl  # used for pl.DataLoader in the full example

    def evaluate_model(model_cls):
        # instantiate the candidate model produced by the exploration strategy
        model = model_cls()
        for epoch in range(2):
            # ``train_epoch`` / ``test_epoch`` are user-defined helpers, see the full example linked below
            train_epoch(model, ...)
            accuracy = test_epoch(model, ...)
            # call report intermediate result. Result can be float or dict
            nni.report_intermediate_result(accuracy)
        # report final test result
        nni.report_final_result(accuracy)
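    # Wrap the function into a FunctionalEvaluator so that it can be passed to the
    # experiment. The import path below follows the Retiarii evaluator package and
    # should be double-checked against your NNI version.
    from nni.retiarii.evaluator import FunctionalEvaluator
    evaluator = FunctionalEvaluator(evaluate_model)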
Detailed descriptions and usages of model evaluators can be found `here <./ApiReference.rst>`__.
If the built-in model evaluators do not meet your requirements, or you have already written the training code and just want to use it, you can follow `the guide to write a new model evaluator <./WriteTrainer.rst>`__.
The ``train_epoch`` and ``test_epoch`` here can be any customized function, where users can write their own training recipe. See :githublink:`examples/nas/multi-trial/mnist/search.py` for the full example.
It is recommended that the ``evaluate_model`` here accepts no additional arguments other than ``model_cls``. However, in the `advanced tutorial <./ModelEvaluators.rst>`__, we will show how to use additional arguments in case you actually need them. In the future, we will support mutation on the arguments of evaluators, which is commonly called "hyper-parameter tuning".
Launch an Experiment
--------------------
After all the above are prepared, it is time to start an experiment to do the model search.
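Below is a minimal sketch of launching a Retiarii experiment, assuming a base model space ``base_model``, an ``evaluator`` and a ``search_strategy`` have been created as described above; the experiment name, trial budget and concurrency are only illustrative.

.. code-block:: python

    from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig

    exp = RetiariiExperiment(base_model, evaluator, [], search_strategy)
    exp_config = RetiariiExeConfig('local')
    exp_config.experiment_name = 'mnist_search'
    exp_config.max_trial_number = 24
    exp_config.trial_concurrency = 2
    exp.run(exp_config, 8081)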
Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment. For example, open ``localhost:8081`` in your browser, where 8081 is the port that you set in ``exp.run``. Please refer to `here <../Tutorial/WebUI.rst>`__ for details.
We support visualizing models with third-party visualization engines (like `Netron <https://netron.app/>`__). This can be used by clicking ``Visualization`` in the detail panel for each trial. Note that the current visualization is based on `onnx <https://onnx.ai/>`__, so visualization is not feasible if the model cannot be exported into onnx. Built-in evaluators (e.g., Classification) will automatically export the model into a file. For your own evaluator, you need to save your model into ``$NNI_OUTPUT_DIR/model.onnx`` to make this work.
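For example, a custom evaluator could export the candidate model itself, as in the hedged sketch below; the dummy input shape is an assumption for an MNIST-like model.

.. code-block:: python

    import os
    import torch

    def evaluate_model(model_cls):
        model = model_cls()
        # export to the location the WebUI expects for visualization
        dummy_input = torch.randn(1, 1, 28, 28)
        onnx_path = os.path.join(os.environ.get('NNI_OUTPUT_DIR', '.'), 'model.onnx')
        torch.onnx.export(model, dummy_input, onnx_path)
        ...  # training / evaluation as usual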
Serialization
-------------

In multi-trial NAS, a sampled model should be able to be executed on a remote machine or a training platform (e.g., AzureML, OpenPAI). "Serialization" enables re-instantiation of the model evaluator in another process or machine; therefore, both the model and its model evaluator should be correctly serialized. To make NNI correctly serialize the model evaluator, users should apply ``nni.trace`` on some of their functions and objects. API references can be found in :func:`nni.trace`.
Serialization is implemented as a combination of `json-tricks <https://json-tricks.readthedocs.io/en/latest/>`_ and `cloudpickle <https://github.com/cloudpipe/cloudpickle>`_. Essentially, it is json-tricks, an enhanced version of Python JSON that can handle serialization of numpy arrays, date/times, decimals, fractions, and so on. The difference lies in the handling of class instances. Json-tricks deals with class instances via ``__dict__`` and ``__class__``, which in most of our cases is not reliable (e.g., datasets, dataloaders). Instead, our serialization deals with class instances in two ways:
1. If the class / factory that creates the object is decorated with ``nni.trace``, we can serialize the class / factory function, along with the parameters, such that the instance can be re-instantiated.
2. Otherwise, cloudpickle is used to serialize the object into a binary.
Our recommendation is: unless you are absolutely certain that serializing the object into binary causes no problem and no extra burden, always add ``nni.trace``. In most cases, it is cleaner and enables possibilities such as mutation of parameters (to be supported in the future).
.. warning::

    **What will happen if I forget to "trace" my objects?**

    It is likely that the program can still run. NNI will try to serialize the untraced object into a binary. It might fail in complicated cases (e.g., circular dependency). Even if it succeeds, the result might be a substantially large object. For example, if you forget to add ``nni.trace`` on ``MNIST``, the MNIST dataset object will be serialized into binary, which will be dozens of megabytes because the object has the whole 60k images stored inside. You might see warnings and even errors when running experiments. To avoid such issues, the easiest way is to always remember to add ``nni.trace`` to non-primitive objects.
To trace a function or class, users can use the decorator like this:
.. code-block:: python

    @nni.trace
    class MyClass:
        ...
Inline tracing, which traces the object instantly at instantiation or function invocation, is also acceptable: ``nni.trace(MyClass)(parameters)``.
Assuming a class ``cls`` is already traced, when it is serialized, its class type along with its initialization parameters will be dumped. As the parameters are possibly class instances themselves (if not primitive types like ``int`` and ``str``), their serialization poses a similar problem. We recommend decorating them with ``nni.trace`` as well. In other words, ``nni.trace`` should be applied recursively if necessary.
Below is an example, where ``transforms.Compose``, ``transforms.Normalize``, and ``MNIST`` are traced manually using ``nni.trace``. ``nni.trace`` takes a class / function as its argument, and returns a wrapped class / function that has the same behavior as the original. The usage of the wrapped class / function is also identical to the original one, except that the arguments are recorded. There is no need to apply ``nni.trace`` to ``pl.Classification`` and ``pl.DataLoader`` because they are already traced.
.. code-block:: python

    import nni
    import nni.retiarii.evaluator.pytorch.lightning as pl
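    from torchvision import transforms
    from torchvision.datasets import MNIST

    # Hedged sketch reconstructing the example described above (argument values such
    # as the normalization statistics and batch size are illustrative assumptions).
    transform = nni.trace(transforms.Compose)([
        nni.trace(transforms.ToTensor)(),
        nni.trace(transforms.Normalize)((0.1307,), (0.3081,)),
    ])
    train_dataset = nni.trace(MNIST)('data/mnist', train=True, download=True, transform=transform)
    test_dataset = nni.trace(MNIST)('data/mnist', train=False, download=True, transform=transform)

    # pl.Classification and pl.DataLoader are already traced, so no extra nni.trace is needed
    evaluator = pl.Classification(
        train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
        val_dataloaders=pl.DataLoader(test_dataset, batch_size=100),
        max_epochs=10,
    )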
**What's the relationship between model_wrapper, basic_unit and nni.trace?**
They are fundamentally different. ``model_wrapper`` is used to wrap a base model (search space), ``basic_unit`` annotates a module as a primitive, and ``nni.trace`` enables serialization of general objects. Although they share similar underlying implementations, do keep in mind that you will experience errors if you mix them up.
.. seealso:: Please refer to API reference of :meth:`nni.retiarii.model_wrapper`, :meth:`nni.retiarii.basic_unit`, and :meth:`nni.trace`.
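As a hedged illustration (the module, model and dataset below are made up for this example), the three decorators are applied at different places:

.. code-block:: python

    import nni
    import torch.nn as nn
    from nni.retiarii import model_wrapper, basic_unit
    from torchvision.datasets import MNIST

    @basic_unit            # annotate a module as a primitive (not mutated internally)
    class MyOp(nn.Module):
        def forward(self, x):
            return x * 2

    @model_wrapper         # wrap the base model, i.e., the search space
    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.op = MyOp()

        def forward(self, x):
            return self.op(x)

    # enable serialization of a general object used by the evaluator
    dataset = nni.trace(MNIST)('data/mnist', download=True)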
In multi-trial NAS, users need a model evaluator to evaluate the performance of each sampled model, and an exploration strategy to sample models from the defined model space. Users can either use the model evaluators provided by NNI or write their own, and can simply choose an exploration strategy. Advanced users can also customize new exploration strategies. For a simple example of how to run a multi-trial NAS experiment, please refer to `Quick Start <./QuickStart.rst>`__.
Detailed API descriptions and usage instructions can be found `here <./ApiReference.rst>`__. An example of using these APIs is :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>`. We are actively enriching the inline mutation APIs to make it easier to express new search spaces. Please refer to `here <./construct_space.rst>`__ for more tutorials on expressing complex model spaces.