To enable the CGO execution engine, you need to follow these steps:
.. code-block:: python

    config.training_service.machine_list = [rm_conf]
    exp.run(config, 8099)
The CGO execution engine only supports PyTorch-Lightning trainers that inherit :class:`nni.retiarii.evaluator.pytorch.cgo.evaluator.MultiModelSupervisedLearningModule`.
For a trial running multiple models, trainers inheriting :class:`nni.retiarii.evaluator.pytorch.cgo.evaluator.MultiModelSupervisedLearningModule` can handle the multiple outputs from the merged model for training, test and validation.

We have already implemented two trainers: :class:`nni.retiarii.evaluator.pytorch.cgo.evaluator.Classification` and :class:`nni.retiarii.evaluator.pytorch.cgo.evaluator.Regression`.
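For instance, a CGO-compatible evaluator can be constructed as in the hedged sketch below. The dataset objects (``train_dataset``, ``test_dataset``) are assumed to have been created and traced beforehand, the keyword arguments follow the built-in Lightning evaluators and may differ slightly across NNI versions, and the ``execution_engine`` configuration field is an assumption based on the Retiarii experiment config.

.. code-block:: python

    import nni.retiarii.evaluator.pytorch.lightning as pl
    from nni.retiarii.evaluator.pytorch.cgo import evaluator as cgo

    # use the CGO-aware Classification trainer in place of pl.Classification
    evaluator = cgo.Classification(
        train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
        val_dataloaders=pl.DataLoader(test_dataset, batch_size=100),
        max_epochs=1,
    )

    # assumption: the experiment configuration exposes an ``execution_engine`` field
    config.execution_engine = 'cgo'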
Supported Exploration Strategies
--------------------------------
NNI provides a set of built-in exploration strategies for multi-trial NAS. Users can also `customize new exploration strategies <./WriteStrategy.rst>`__.
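As a minimal sketch, a built-in strategy can be instantiated and later passed to the experiment. ``Random`` is used here for illustration; the set of available strategies and their arguments may vary across NNI versions.

.. code-block:: python

    import nni.retiarii.strategy as strategy

    # pick a built-in exploration strategy; ``dedup=True`` avoids sampling duplicate models
    search_strategy = strategy.Random(dedup=True)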
Evaluators with PyTorch-Lightning
---------------------------------
Use Built-in Evaluators
^^^^^^^^^^^^^^^^^^^^^^^
NNI provides some commonly used model evaluators for users' convenience. These evaluators are built upon the awesome library PyTorch-Lightning.
We recommend reading the `serialization tutorial <./Serialization.rst>`__ before using these evaluators. A few notes to summarize the tutorial:
1. ``pl.DataLoader`` should be used in place of ``torch.utils.data.DataLoader``.
2. The datasets used in data-loader should be decorated with ``nni.trace`` recursively.
For example,
.. code-block:: python

    import nni.retiarii.evaluator.pytorch.lightning as pl
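    import nni
    from torchvision import transforms
    from torchvision.datasets import MNIST

    # Hedged sketch (not copied from the official example): hand a traced MNIST
    # dataset to the built-in Classification evaluator through pl.DataLoader,
    # following the two notes above. Keyword argument names follow the built-in
    # Lightning evaluators and may differ slightly across NNI versions.
    train_dataset = nni.trace(MNIST)('data/mnist', train=True, download=True,
                                     transform=nni.trace(transforms.ToTensor)())
    test_dataset = nni.trace(MNIST)('data/mnist', train=False, download=True,
                                    transform=nni.trace(transforms.ToTensor)())

    evaluator = pl.Classification(
        train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
        val_dataloaders=pl.DataLoader(test_dataset, batch_size=100),
        max_epochs=10,
    )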
If these built-in model evaluators do not meet your requirements, you can customize new model evaluators following the tutorial `here <./WriteTrainer.rst>`__.
Another approach is to write training code in PyTorch-Lightning style, that is, to write a LightningModule that defines all elements needed for training (e.g., loss function, optimizer) and to define a trainer that takes (optional) dataloaders to execute the training. Before that, please read the `document of PyTorch-lightning <https://pytorch-lightning.readthedocs.io/>`__ to learn the basic concepts and components provided by PyTorch-lightning.
In practice, a new training module in Retiarii should inherit ``nni.retiarii.evaluator.pytorch.lightning.LightningModule``, which has a ``set_model`` method that will be called after ``__init__`` to save the candidate model (generated by the strategy) as ``self.model``. The rest of the process (like ``training_step``) should be the same as writing any other lightning module. Evaluators should also communicate with strategies via two API calls (``nni.report_intermediate_result`` for periodical metrics and ``nni.report_final_result`` for final metrics), added in ``on_validation_epoch_end`` and ``teardown`` respectively.
An example is as follows:
.. code-block:: python

    from nni.retiarii.evaluator.pytorch.lightning import LightningModule  # please import this one
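    import nni
    import torch
    import torch.nn.functional as F

    # Hedged sketch, not the official reference implementation: a minimal
    # classification module following the conventions described above. The
    # candidate model is injected by the framework via ``set_model`` and is
    # available as ``self.model``; the metric name 'val_acc' is our own choice.
    @nni.trace
    class MyLightningModule(LightningModule):
        def __init__(self, learning_rate=1e-3):
            super().__init__()
            self.learning_rate = learning_rate

        def forward(self, x):
            return self.model(x)  # ``self.model`` is set by the framework

        def training_step(self, batch, batch_idx):
            x, y = batch
            return F.cross_entropy(self(x), y)

        def validation_step(self, batch, batch_idx):
            x, y = batch
            acc = (self(x).argmax(dim=-1) == y).float().mean()
            self.log('val_acc', acc)

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=self.learning_rate)

        def on_validation_epoch_end(self):
            # periodically report the metric to the exploration strategy
            nni.report_intermediate_result(self.trainer.callback_metrics['val_acc'].item())

        def teardown(self, stage):
            if stage == 'fit':
                # report the final metric when training finishes
                nni.report_final_result(self.trainer.callback_metrics['val_acc'].item())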
.. attention:: NNI's latest NAS support is based on the Retiarii framework. Users who are still on the `early version using NNI NAS v1.0 <https://nni.readthedocs.io/en/v2.2/nas.html>`__ should migrate their work to Retiarii as soon as possible.
.. note:: Currently, PyTorch is the only framework supported by Retiarii, and we have only tested **PyTorch 1.6 to 1.9**. This documentation assumes a PyTorch context, but it should also apply to other frameworks, which is part of our future plan.
Pick or customize a model evaluator
-----------------------------------
In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training and validating each generated model to obtain the model's performance. The performance is sent to the exploration strategy for the strategy to generate better models.
Retiarii has provided `built-in model evaluators <./ModelEvaluators.rst>`__, but to start with, it is recommended to use ``FunctionalEvaluator``, that is, to wrap your own training and evaluation code with one single function. This function should receive one single model class and use ``nni.report_final_result`` to report the final score of this model.
An example here creates a simple evaluator that runs on the MNIST dataset, trains for 2 epochs, and reports its validation accuracy.
.. code-block:: python

    import nni
    import nni.retiarii.evaluator.pytorch.lightning as pl  # used for pl.DataLoader in the full example

    def evaluate_model(model_cls):
        # instantiate the candidate model produced by the exploration strategy
        model = model_cls()
        for epoch in range(2):
            # ``train_epoch`` / ``test_epoch`` are user-defined helpers, see the full example linked below
            train_epoch(model, ...)
            accuracy = test_epoch(model, ...)
            # call report intermediate result. Result can be float or dict
            nni.report_intermediate_result(accuracy)
        # report final test result
        nni.report_final_result(accuracy)
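    # Wrap the function into a FunctionalEvaluator so that it can be passed to the
    # experiment. The import path below follows the Retiarii evaluator package and
    # should be double-checked against your NNI version.
    from nni.retiarii.evaluator import FunctionalEvaluator
    evaluator = FunctionalEvaluator(evaluate_model)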
Detailed descriptions and usages of model evaluators can be found `here <./ApiReference.rst>`__.
If the built-in model evaluators do not meet your requirements, or you have already written the training code and just want to use it, you can follow `the guide to write a new model evaluator <./WriteTrainer.rst>`__.
The ``train_epoch`` and ``test_epoch`` here can be any customized function, where users can write their own training recipe. See :githublink:`examples/nas/multi-trial/mnist/search.py` for the full example.
It is recommended that the ``evaluate_model`` here accepts no additional arguments other than ``model_cls``. However, in the `advanced tutorial <./ModelEvaluators.rst>`__, we will show how to use additional arguments in case you actually need them. In the future, we will support mutation on the arguments of evaluators, which is commonly called "hyper-parameter tuning".
Launch an Experiment
--------------------
After all the above are prepared, it is time to start an experiment to do the model search.
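Below is a minimal sketch of launching a Retiarii experiment, assuming a base model space ``base_model``, an ``evaluator`` and a ``search_strategy`` have been created as described above; the experiment name, trial budget and concurrency are only illustrative.

.. code-block:: python

    from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig

    exp = RetiariiExperiment(base_model, evaluator, [], search_strategy)
    exp_config = RetiariiExeConfig('local')
    exp_config.experiment_name = 'mnist_search'
    exp_config.max_trial_number = 24
    exp_config.trial_concurrency = 2
    exp.run(exp_config, 8081)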
Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment. For example, open ``localhost:8081`` in your browser, where 8081 is the port that you set in ``exp.run``. Please refer to `here <../Tutorial/WebUI.rst>`__ for details.
We support visualizing models with third-party visualization engines (like `Netron <https://netron.app/>`__). This can be used by clicking ``Visualization`` in the detail panel for each trial. Note that the current visualization is based on `onnx <https://onnx.ai/>`__, so visualization is not feasible if the model cannot be exported into onnx. Built-in evaluators (e.g., Classification) will automatically export the model into a file. For your own evaluator, you need to save your model into ``$NNI_OUTPUT_DIR/model.onnx`` to make this work.
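For example, a custom evaluator could export the candidate model itself, as in the hedged sketch below; the dummy input shape is an assumption for an MNIST-like model.

.. code-block:: python

    import os
    import torch

    def evaluate_model(model_cls):
        model = model_cls()
        # export to the location the WebUI expects for visualization
        dummy_input = torch.randn(1, 1, 28, 28)
        onnx_path = os.path.join(os.environ.get('NNI_OUTPUT_DIR', '.'), 'model.onnx')
        torch.onnx.export(model, dummy_input, onnx_path)
        ...  # training / evaluation as usual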
Serialization
-------------

In multi-trial NAS, a sampled model should be able to be executed on a remote machine or a training platform (e.g., AzureML, OpenPAI). "Serialization" enables re-instantiation of the model evaluator in another process or machine; therefore, both the model and its model evaluator should be correctly serialized. To make NNI correctly serialize the model evaluator, users should apply ``nni.trace`` on some of their functions and objects. API references can be found in :func:`nni.trace`.
Serialization is implemented as a combination of `json-tricks <https://json-tricks.readthedocs.io/en/latest/>`_ and `cloudpickle <https://github.com/cloudpipe/cloudpickle>`_. Essentially, it is json-tricks, an enhanced version of Python JSON that can handle serialization of numpy arrays, date/times, decimals, fractions, and so on. The difference lies in the handling of class instances. Json-tricks deals with class instances via ``__dict__`` and ``__class__``, which in most of our cases is not reliable (e.g., datasets, dataloaders). Instead, our serialization deals with class instances in two ways:
1. If the class / factory that creates the object is decorated with ``nni.trace``, we can serialize the class / factory function, along with the parameters, such that the instance can be re-instantiated.
2. Otherwise, cloudpickle is used to serialize the object into a binary.
Our recommendation is: unless you are absolutely certain that serializing the object into binary causes no problem and no extra burden, always add ``nni.trace``. In most cases, it is cleaner and enables possibilities such as mutation of parameters (to be supported in the future).
.. warning::

    **What will happen if I forget to "trace" my objects?**

    It is likely that the program can still run. NNI will try to serialize the untraced object into a binary. It might fail in complicated cases (e.g., circular dependency). Even if it succeeds, the result might be a substantially large object. For example, if you forget to add ``nni.trace`` on ``MNIST``, the MNIST dataset object will be serialized into binary, which will be dozens of megabytes because the object has the whole 60k images stored inside. You might see warnings and even errors when running experiments. To avoid such issues, the easiest way is to always remember to add ``nni.trace`` to non-primitive objects.
To trace a function or class, users can use the decorator like this:
.. code-block:: python

    @nni.trace
    class MyClass:
        ...
Inline tracing, which traces the object instantly at instantiation or function invocation, is also acceptable: ``nni.trace(MyClass)(parameters)``.
Assuming a class ``cls`` is already traced, when it is serialized, its class type along with its initialization parameters will be dumped. As the parameters are possibly class instances themselves (if not primitive types like ``int`` and ``str``), their serialization poses a similar problem. We recommend decorating them with ``nni.trace`` as well. In other words, ``nni.trace`` should be applied recursively if necessary.
Below is an example, where ``transforms.Compose``, ``transforms.Normalize``, and ``MNIST`` are traced manually using ``nni.trace``. ``nni.trace`` takes a class / function as its argument, and returns a wrapped class / function that has the same behavior as the original. The usage of the wrapped class / function is also identical to the original one, except that the arguments are recorded. There is no need to apply ``nni.trace`` to ``pl.Classification`` and ``pl.DataLoader`` because they are already traced.
.. code-block:: python

    import nni
    import nni.retiarii.evaluator.pytorch.lightning as pl
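    from torchvision import transforms
    from torchvision.datasets import MNIST

    # Hedged sketch reconstructing the example described above (argument values such
    # as the normalization statistics and batch size are illustrative assumptions).
    transform = nni.trace(transforms.Compose)([
        nni.trace(transforms.ToTensor)(),
        nni.trace(transforms.Normalize)((0.1307,), (0.3081,)),
    ])
    train_dataset = nni.trace(MNIST)('data/mnist', train=True, download=True, transform=transform)
    test_dataset = nni.trace(MNIST)('data/mnist', train=False, download=True, transform=transform)

    # pl.Classification and pl.DataLoader are already traced, so no extra nni.trace is needed
    evaluator = pl.Classification(
        train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
        val_dataloaders=pl.DataLoader(test_dataset, batch_size=100),
        max_epochs=10,
    )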
**What's the relationship between model_wrapper, basic_unit and nni.trace?**
They are fundamentally different. ``model_wrapper`` is used to wrap a base model (search space), ``basic_unit`` annotates a module as a primitive, and ``nni.trace`` enables serialization of general objects. Although they share similar underlying implementations, do keep in mind that you will experience errors if you mix them up.
.. seealso:: Please refer to API reference of :meth:`nni.retiarii.model_wrapper`, :meth:`nni.retiarii.basic_unit`, and :meth:`nni.trace`.
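As a hedged illustration (the module, model and dataset below are made up for this example), the three decorators are applied at different places:

.. code-block:: python

    import nni
    import torch.nn as nn
    from nni.retiarii import model_wrapper, basic_unit
    from torchvision.datasets import MNIST

    @basic_unit            # annotate a module as a primitive (not mutated internally)
    class MyOp(nn.Module):
        def forward(self, x):
            return x * 2

    @model_wrapper         # wrap the base model, i.e., the search space
    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.op = MyOp()

        def forward(self, x):
            return self.op(x)

    # enable serialization of a general object used by the evaluator
    dataset = nni.trace(MNIST)('data/mnist', download=True)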
In multi-trial NAS, users need a model evaluator to evaluate the performance of each sampled model, and an exploration strategy to sample models from the defined model space. Users can either use the model evaluators provided by NNI or write their own, and can simply choose an exploration strategy. Advanced users can also customize new exploration strategies. For a simple example of how to run a multi-trial NAS experiment, please refer to `Quick Start <./QuickStart.rst>`__.
Detailed API descriptions and usage instructions can be found `here <./ApiReference.rst>`__. An example of using these APIs is :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>`. We are actively enriching the inline mutation APIs to make it easier to express new search spaces. Please refer to `here <./construct_space.rst>`__ for more tutorials on expressing complex model spaces.