Unverified Commit 9a1fb17b authored by QuanluZhang, committed by GitHub

support tf2 NAS with non-weight-sharing mode (#2541)

parent eee2f532
# Classic NAS Algorithms
In classic NAS algorithms, each architecture is trained as a trial and the NAS algorithm acts as a tuner. Thus, this training mode naturally fits within the NNI hyper-parameter tuning framework, where the tuner generates a new architecture for the next trial and trials run in the training service.
## Quick Start
The following example shows how to use classic NAS algorithms. You can see it is quite similar to NNI hyper-parameter tuning.
```python
model = Net()
# get the chosen architecture from tuner and apply it on model
get_and_apply_next_architecture(model)
train(model) # your code for training the model
acc = test(model) # test the trained model
nni.report_final_result(acc) # report the performance of the chosen architecture
```
First, instantiate the model. The search space has been defined in this model through `LayerChoice` and `InputChoice`. After that, users should invoke `get_and_apply_next_architecture(model)` to settle on a specific architecture. This function receives the architecture from the tuner (i.e., the classic NAS algorithm) and applies it to `model`. At this point, `model` becomes a specific architecture rather than a search space. Then users are free to train this model just like a normal PyTorch model. After getting the accuracy of this model, users should invoke `nni.report_final_result(acc)` to report the result to the tuner.
At this point, the trial code is ready. Then, we can prepare an NNI experiment, i.e., a search space file and an experiment config file. Different from NNI hyper-parameter tuning, the search space file is automatically generated from the trial code by running the following command (its detailed usage can be found [here](../Tutorial/Nnictl.md)):
`nnictl ss_gen --trial_command="the command for running your trial code"`
A file named `nni_auto_gen_search_space.json` is generated by this command. Then put the path of the generated search space in the field `searchSpacePath` of the experiment config file. The other fields of the config file can be filled by referring to [this tutorial](../Tutorial/QuickStart.md).
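For reference, the generated file uses the same `_type`/`_value` layout as other NNI search spaces, with one entry per `LayerChoice`/`InputChoice`. A sketch of what it may contain is shown below; the key names and candidate labels are illustrative and depend on your model.
```python
import json

# Illustrative content of nni_auto_gen_search_space.json for a model with one
# LayerChoice (two candidate ops) and one InputChoice; keys are auto-generated
# from the mutables' keys unless you assign them manually.
search_space = {
    "LayerChoice1": {"_type": "layer_choice", "_value": ["0", "1"]},
    "InputChoice2": {"_type": "input_choice",
                     "_value": {"candidates": ["0", "1"], "n_chosen": 1}},
}
print(json.dumps(search_space, sort_keys=True, indent=2))
```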
Currently, we only support [PPO Tuner](../Tuner/BuiltinTuner.md) and [random tuner](https://github.com/microsoft/nni/tree/master/examples/tuners/random_nas_tuner) for classic NAS. More classic NAS algorithms will be supported soon.
The complete examples can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/classic_nas) for PyTorch and [here](https://github.com/microsoft/nni/tree/master/examples/nas/classic_nas-tf) for TensorFlow.
## Standalone mode for easy debugging
We support a standalone mode for easy debugging, where you can directly run the trial command without launching an NNI experiment. This is for checking whether your trial code can correctly run. The first candidate(s) are chosen for `LayerChoice` and `InputChoice` in this standalone mode.
# One-shot NAS Algorithms
```eval_rst
.. contents::

.. Note:: The APIs are in an experimental stage. The current programming interface is subject to change.
```

Besides [classic NAS algorithms](./ClassicNas.md), users can also apply more advanced one-shot NAS algorithms to find better models from a search space. There are lots of related works about one-shot NAS algorithms, such as [SMASH][8], [ENAS][2], [DARTS][1], [FBNet][3], [ProxylessNAS][4], [SPOS][5], [Single-Path NAS][6], [Understanding One-shot][7] and [GDAS][9]. One-shot NAS algorithms usually build a supernet containing every candidate in the search space as its subnetwork, and in each step, a subnetwork or a combination of several subnetworks is trained.
![](../../img/nas_abstract_illustration.png)
Modern Neural Architecture Search (NAS) methods usually incorporate [three dimensions][1]: search space, search strategy, and performance estimation strategy. Search space often contains a limited number of neural network architectures to explore, while the search strategy samples architectures from search space, gets estimations of their performance, and evolves itself. Ideally, the search strategy should find the best architecture in the search space and report it to users. After users obtain the "best architecture", many methods use a "retrain step", which trains the network with the same pipeline as any traditional model.
## Implement a Search Space
Assuming we've got a baseline model, what should we do to be empowered with NAS? Take [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example, the code might look like this:
```python
import torch.nn as nn
import torch.nn.functional as F
from nni.nas.pytorch import mutables
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = mutables.LayerChoice([
nn.Conv2d(1, 32, 3, 1),
nn.Conv2d(1, 32, 5, 3)
]) # try 3x3 kernel and 5x5 kernel
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout2d(0.25)
self.dropout2 = nn.Dropout2d(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
# ... same as original ...
return output
```
The example above adds an option of choosing conv5x5 at conv1. The modification is as simple as declaring a `LayerChoice` with the original conv3x3 and a new conv5x5 as its parameter. That's it! You don't have to modify the forward function in any way. You can imagine conv1 as any other module without NAS.
What about choosing among possible connections? This can be done using `InputChoice`. To allow for a skip connection in the MNIST example, we add another layer, conv3. In the following example, a possible connection from conv2 is added to the output of conv3.
```python
from nni.nas.pytorch import mutables
class Net(nn.Module):
def __init__(self):
# ... same ...
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.conv3 = nn.Conv2d(64, 64, 1, 1)
# declaring that there is exactly one candidate to choose from
# search strategy will choose one or None
self.skipcon = mutables.InputChoice(n_candidates=1)
# ... same ...
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
x = self.conv2(x)
x0 = self.skipcon([x]) # choose one or none from [x]
x = self.conv3(x)
if x0 is not None: # skipconnection is open
x += x0
x = F.max_pool2d(x, 2)
# ... same ...
return output
```
Input choice can be thought of as a callable module that receives a list of tensors and outputs the concatenation/sum/mean of some of them (sum by default), or `None` if none is selected. Like layer choices, input choices should be **initialized in `__init__` and called in `forward`**. We will see later that this is to allow search algorithms to identify these choices and do necessary preparations.
`LayerChoice` and `InputChoice` are both **mutables**. Mutable means "changeable". As opposed to traditional deep learning layers/modules which have fixed operation types once defined, models with mutables are essentially a series of possible models.
Users can specify a **key** for each mutable. By default, NNI will assign one for you that is globally unique, but in case users want to share choices (for example, there are two `LayerChoice`s with the same candidate operations and you want them to have the same choice, i.e., if first one chooses the i-th op, the second one also chooses the i-th op), they can give them the same key. The key marks the identity for this choice and will be used in the dumped checkpoint. So if you want to increase the readability of your exported architecture, manually assigning keys to each mutable would be a good idea. For advanced usage on mutables, see [Mutables](./NasReference.md).
## Use a Search Algorithm
With a search space defined, there are at least two ways users can do the search. One runs NAS distributedly, which can be as naive as enumerating all the architectures and training each one from scratch, or can leverage more advanced techniques, such as [SMASH][8], [ENAS][2], [DARTS][1], [FBNet][3], [ProxylessNAS][4], [SPOS][5], [Single-Path NAS][6], [Understanding One-shot][7] and [GDAS][9]. Since training many different architectures is known to be expensive, another family of methods, called one-shot NAS, builds a supernet containing every candidate in the search space as its subnetwork, and in each step, a subnetwork or combination of several subnetworks is trained.
Currently, several one-shot NAS methods are supported on NNI. For example, `DartsTrainer`, which uses SGD to train architecture weights and model weights iteratively, and `EnasTrainer`, which [uses a controller to train the model][2]. New and more efficient NAS trainers keep emerging in the research community and some will be implemented in future releases of NNI.

## Search with One-shot NAS Algorithms

Each one-shot NAS algorithm implements a trainer, for which users can find usage details in the description of each algorithm. Here is a simple example, demonstrating how users can use `EnasTrainer`.

...@@ -100,7 +25,7 @@ def top1_accuracy(output, target):
def metrics_fn(output, target):
    # metrics function receives output and target and computes a dict of metrics
    return {"acc1": top1_accuracy(output, target)}

from nni.nas.pytorch import enas
trainer = enas.EnasTrainer(model,
...@@ -117,35 +42,13 @@ trainer.train()  # training
trainer.export(file="model_dir/final_architecture.json")  # export the final architecture to file
```
`model` is the one with the [user defined search space](./WriteSearchSpace.md). Then users should prepare training data and model evaluation metrics. To search from the defined search space, a one-shot algorithm is instantiated, called a trainer (e.g., `EnasTrainer`). The trainer exposes a few arguments that you can customize, for example, the loss function, the metrics function, the optimizer, and the datasets. These should satisfy most usage requirements, and we do our best to make sure our built-in trainers work on as many models, tasks, and datasets as possible, but there is no guarantee.

**Note that** when using one-shot NAS algorithms, there is no need to start an NNI experiment. Users can directly run this Python script (i.e., `train.py`) through `python3 train.py` without `nnictl`. After training, users can export the best one of the found models through `trainer.export()`.

Each trainer in NNI has its targeted scenario and usage. Some trainers have the assumption that the task is a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps). Most trainers do not support distributed training: they won't wrap your model with `DataParallel` or `DistributedDataParallel` for you. So after a few tryouts, if you want to actually use the trainers on your very customized applications, you might need to [customize your trainer](./Advanced.md#extend-the-ability-of-one-shot-trainers).

Furthermore, one-shot NAS can be visualized with our NAS UI. [See more details.](./Visualization.md)
### Retrain with Exported Architecture
......
# Neural Architecture Search (NAS) on NNI
```eval_rst
.. contents::
```
## Overview
Automatic neural architecture search is taking an increasingly important role in finding better models. Recent research has proved the feasibility of automatic NAS and has led to models that beat many manually designed and tuned models. Some representative works are [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5]. Further, new innovations keep emerging.

However, it takes a great effort to implement NAS algorithms, and it's hard to reuse the code base of existing algorithms for new ones. To facilitate NAS innovations (e.g., the design and implementation of new NAS models, the comparison of different NAS models side-by-side, etc.), an easy-to-use and flexible programming interface is crucial.

With this motivation, our ambition is to provide a unified architecture in NNI, accelerate innovations on NAS, and apply state-of-the-art algorithms to real-world problems faster.

With the unified interface, there are two different modes for architecture search. [One](#supported-one-shot-nas-algorithms) is the so-called one-shot NAS, where a super-net is built based on a search space and one-shot training is used to generate a good-performing child model. [The other](#supported-classic-nas-algorithms) is the traditional search-based approach, where each child model within the search space runs as an independent trial. We call it classic NAS.
NNI also provides a dedicated [visualization tool](#nas-visualization) for users to check the status of the neural architecture search process.
## Supported Classic NAS Algorithms
The procedure of classic NAS algorithms is similar to hyper-parameter tuning: users use `nnictl` to start experiments and each model runs as a trial. The difference is that the search space file is automatically generated from the user model (with the search space embedded in it) by running `nnictl ss_gen`. The following table lists the tuning algorithms supported for classic NAS mode. More algorithms will be supported in future releases.
|Name|Brief Introduction of Algorithm|
|---|---|
| [Random Search](https://github.com/microsoft/nni/tree/master/examples/tuners/random_nas_tuner) | Randomly pick a model from search space |
| [PPO Tuner](https://nni.readthedocs.io/en/latest/Tuner/BuiltinTuner.html#PPOTuner) | PPO Tuner is a Reinforcement Learning tuner based on PPO algorithm. [Reference Paper](https://arxiv.org/abs/1707.06347) |
Please refer to [here](ClassicNas.md) for the usage of classic NAS algorithms.
## Supported One-shot NAS Algorithms

NNI currently supports the one-shot NAS algorithms listed below and is adding more. Users can reproduce an algorithm or use it on their own dataset. We also encourage users to implement other algorithms with [NNI API](#use-nni-api), to benefit more people.

|Name|Brief Introduction of Algorithm|
|---|---|
| [ENAS](https://nni.readthedocs.io/en/latest/NAS/ENAS.html) | [Efficient Neural Architecture Search via Parameter Sharing](https://arxiv.org/abs/1802.03268). In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. It uses parameter sharing between child models to achieve fast speed and excellent performance. |
| [DARTS](https://nni.readthedocs.io/en/latest/NAS/DARTS.html) | [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) introduces a novel algorithm for differentiable network architecture search on bilevel optimization. |
| [P-DARTS](https://nni.readthedocs.io/en/latest/NAS/PDARTS.html) | [Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation](https://arxiv.org/abs/1904.12760) is based on DARTS. It introduces an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. |
| [SPOS](https://nni.readthedocs.io/en/latest/NAS/SPOS.html) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with a uniform path sampling method and applies an evolutionary algorithm to efficiently search for the best-performing architectures. |
| [CDARTS](https://nni.readthedocs.io/en/latest/NAS/CDARTS.html) | [Cyclic Differentiable Architecture Search](https://arxiv.org/abs/****) builds a cyclic feedback mechanism between the search and evaluation networks. It introduces a cyclic differentiable architecture search framework which integrates the two networks into a unified architecture. |
| [ProxylessNAS](https://nni.readthedocs.io/en/latest/NAS/Proxylessnas.html) | [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332). It removes proxy, directly learns the architectures for large-scale target tasks and target hardware platforms. |
| [TextNAS](https://nni.readthedocs.io/en/latest/NAS/TextNAS.html) | [TextNAS: A Neural Architecture Search Space tailored for Text Representation](https://arxiv.org/pdf/1912.10729.pdf). It is a neural architecture search algorithm tailored for text representation. |

One-shot algorithms run **standalone without nnictl**. NNI supports both PyTorch and Tensorflow 2.X.

Here are some common dependencies to run the examples. PyTorch needs to be above 1.2 to use ``BoolTensor``.

...@@ -30,26 +49,19 @@ Here are some common dependencies to run the examples. PyTorch needs to be above
* PyTorch 1.2+
* git
Please refer to [here](NasGuide.md) for the usage of one-shot NAS algorithms.

One-shot NAS can be visualized with our visualization tool. Learn more details [here](./Visualization.md).

## Supported Distributed NAS Algorithms

|Name|Brief Introduction of Algorithm|
|---|---|
| [SPOS's 2nd stage](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with a uniform path sampling method, and applies an evolutionary algorithm to efficiently search for the best-performing architectures.|

```eval_rst
.. Note:: SPOS is a two-stage algorithm, whose first stage is one-shot and the second stage is distributed, leveraging the result of the first stage as a checkpoint.
```
## Using NNI API to Write Your Search Space

The programming interface of designing and searching a model is often demanded in two scenarios.

1. When designing a neural network, there may be multiple operation choices on a layer, sub-model, or connection, and it's undetermined which one or combination performs best. So, it needs an easy way to express the candidate layers or sub-models.
2. When applying NAS on a neural network, it needs a unified way to express the search space of architectures, so that it doesn't need to update trial code for different search algorithms.

For using NNI NAS, we suggest that users first go through [the tutorial of NAS API for building search space](./WriteSearchSpace.md).

## NAS Visualization
......
# NAS Quick Start
The NAS feature provided by NNI has two key components: APIs for expressing the search space and NAS training approaches. The former is for users to easily specify a class of models (i.e., the candidate models specified by the search space) which may perform well. The latter is for users to easily apply state-of-the-art NAS training approaches on their own model.
Here we use a simple example to demonstrate how to tune your model architecture with the NNI NAS APIs step by step. The complete code of this example can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/naive).
## Write your model with NAS APIs
Instead of writing a concrete neural model, you can write a class of neural models using two NAS API library functions, `LayerChoice` and `InputChoice`. For example, if you think either of two options might work in the first convolution layer, then you can choose one of them using `LayerChoice`, as shown by `self.conv1` in the code. Similarly, the second convolution layer `self.conv2` also chooses one from two options. Up to this point, four candidate neural networks are specified. `self.skipconnect` uses `InputChoice` to specify two choices, adding a skip connection or not.
```python
import torch.nn as nn
from nni.nas.pytorch.mutables import LayerChoice, InputChoice
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = LayerChoice([nn.Conv2d(3, 6, 3, padding=1), nn.Conv2d(3, 6, 5, padding=2)])
self.pool = nn.MaxPool2d(2, 2)
self.conv2 = LayerChoice([nn.Conv2d(6, 16, 3, padding=1), nn.Conv2d(6, 16, 5, padding=2)])
self.conv3 = nn.Conv2d(16, 16, 1)
self.skipconnect = InputChoice(n_candidates=1)
self.bn = nn.BatchNorm2d(16)
self.gap = nn.AdaptiveAvgPool2d(4)
self.fc1 = nn.Linear(16 * 4 * 4, 120)
self.fc2 = nn.Linear(120, 84)
self.fc3 = nn.Linear(84, 10)
```
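The snippet above only shows `__init__`; the forward pass uses these members just like ordinary modules. A plausible `forward` for this network is sketched below (mirroring the TensorFlow port of the same example added in this commit); the exact version lives in the linked complete example.
```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    # ... __init__ as above ...

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x0 = F.relu(self.conv2(x))
        x1 = F.relu(self.conv3(x0))
        x0 = self.skipconnect([x0])   # returns the tensor, or None if not chosen
        if x0 is not None:            # the skip connection is kept
            x1 += x0
        x = self.pool(self.bn(x1))
        x = self.gap(x).view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
```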
For a detailed description of `LayerChoice` and `InputChoice`, please refer to [the NAS guide](NasGuide.md)
## Choose a NAS trainer
After the model is instantiated, it is time to train the model using a NAS trainer. Different trainers use different approaches to search for the best one from a class of neural models that you specified. NNI provides several popular NAS training approaches such as DARTS and ENAS. Here we use `DartsTrainer` in the example below. After the trainer is instantiated, invoke `trainer.train()` to do the search.
```python
trainer = DartsTrainer(net,
loss=criterion,
metrics=accuracy,
optimizer=optimizer,
num_epochs=2,
dataset_train=dataset_train,
dataset_valid=dataset_valid,
batch_size=64,
log_frequency=10)
trainer.train()
```
## Export the best model
After the search (i.e., `trainer.train()`) is done, to get the best performing model we simply call `trainer.export("final_arch.json")` to export the found neural architecture to a file.
## NAS visualization
We are working on NAS visualization and will release this feature soon.
## Retrain the exported best model
It is simple to retrain the found (exported) neural architecture. Step one, instantiate the model you defined above. Step two, invoke `apply_fixed_architecture` to the model. Then the model becomes the found (exported) one. Afterward, you can use traditional training to train this model.
```python
model = Net()
apply_fixed_architecture(model, "final_arch.json")
```
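Putting the two steps together, a minimal retrain sketch might look like the following. `Net` is the search-space model defined earlier; the import path of `apply_fixed_architecture` follows NNI's PyTorch NAS package layout and should be treated as an assumption; `train_loader` and `num_epochs` come from your own training script.
```python
import torch
import torch.nn.functional as F
from nni.nas.pytorch.fixed import apply_fixed_architecture  # assumed import path

model = Net()
apply_fixed_architecture(model, "final_arch.json")  # fix every LayerChoice/InputChoice to the exported decision

# from here on, train exactly like an ordinary PyTorch model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
for epoch in range(num_epochs):            # num_epochs / train_loader: defined elsewhere in your script
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(inputs), targets)
        loss.backward()
        optimizer.step()
```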
# Write A Search Space
Generally, a search space describes candidate architectures from which users want to find the best one. Different search algorithms, whether classic NAS or one-shot NAS, can be applied to the search space. NNI provides APIs to unify the expression of a neural architecture search space.
A search space can be built on a base model. This is also a common practice when a user wants to apply NAS on an existing model. Take [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example. Note that NNI provides the same APIs for expressing search space on PyTorch and TensorFlow.
```python
import torch.nn as nn
import torch.nn.functional as F
from nni.nas.pytorch import mutables
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = mutables.LayerChoice([
nn.Conv2d(1, 32, 3, 1),
nn.Conv2d(1, 32, 5, 3)
]) # try 3x3 kernel and 5x5 kernel
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout2d(0.25)
self.dropout2 = nn.Dropout2d(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
# ... same as original ...
return output
```
The example above adds an option of choosing conv5x5 at conv1. The modification is as simple as declaring a `LayerChoice` with the original conv3x3 and a new conv5x5 as its parameter. That's it! You don't have to modify the forward function in any way. You can imagine conv1 as any other module without NAS.
What about choosing among possible connections? This can be done using `InputChoice`. To allow for a skip connection in the MNIST example, we add another layer, conv3. In the following example, a possible connection from conv2 is added to the output of conv3.
```python
from nni.nas.pytorch import mutables
class Net(nn.Module):
def __init__(self):
# ... same ...
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.conv3 = nn.Conv2d(64, 64, 1, 1)
# declaring that there is exactly one candidate to choose from
# search strategy will choose one or None
self.skipcon = mutables.InputChoice(n_candidates=1)
# ... same ...
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
x = self.conv2(x)
x0 = self.skipcon([x]) # choose one or none from [x]
x = self.conv3(x)
if x0 is not None: # skipconnection is open
x += x0
x = F.max_pool2d(x, 2)
# ... same ...
return output
```
Input choice can be thought of as a callable module that receives a list of tensors and outputs the concatenation/sum/mean of some of them (sum by default), or `None` if none is selected. Like layer choices, input choices should be **initialized in `__init__` and called in `forward`**. This is to allow search algorithms to identify these choices and do necessary preparations.
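As a further illustration, an input choice can also select among several real candidate inputs. Below is a minimal sketch, assuming the same `nni.nas.pytorch.mutables` API used above; the layer sizes are illustrative.
```python
import torch.nn as nn
from nni.nas.pytorch import mutables

class MergeExample(nn.Module):
    def __init__(self):
        super().__init__()
        self.branch_a = nn.Conv2d(16, 16, 3, padding=1)
        self.branch_b = nn.Conv2d(16, 16, 5, padding=2)
        # select one of the two branch outputs; with the default
        # reduction="sum", the selected tensor(s) are summed
        self.merge = mutables.InputChoice(n_candidates=2, n_chosen=1)

    def forward(self, x):
        out = self.merge([self.branch_a(x), self.branch_b(x)])
        # fall back to the input if the search algorithm selects nothing
        return out if out is not None else x
```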
`LayerChoice` and `InputChoice` are both **mutables**. Mutable means "changeable". As opposed to traditional deep learning layers/modules which have fixed operation types once defined, models with mutables are essentially a series of possible models.
Users can specify a **key** for each mutable. By default, NNI will assign one for you that is globally unique, but in case users want to share choices (for example, there are two `LayerChoice`s with the same candidate operations and you want them to have the same choice, i.e., if first one chooses the i-th op, the second one also chooses the i-th op), they can give them the same key. The key marks the identity for this choice and will be used in the dumped checkpoint. So if you want to increase the readability of your exported architecture, manually assigning keys to each mutable would be a good idea. For advanced usage on mutables (e.g., `LayerChoice` and `InputChoice`), see [Mutables](./NasReference.md).
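For example, two layer choices that must always pick the same kernel size can simply share a key. A sketch follows; the module and key names are illustrative.
```python
import torch.nn as nn
from nni.nas.pytorch import mutables

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        # both choices share the key "kernel_size", so the search algorithm
        # always picks the same index for them (both 3x3 or both 5x5)
        self.conv_a = mutables.LayerChoice([
            nn.Conv2d(16, 16, 3, padding=1),
            nn.Conv2d(16, 16, 5, padding=2),
        ], key="kernel_size")
        self.conv_b = mutables.LayerChoice([
            nn.Conv2d(16, 16, 3, padding=1),
            nn.Conv2d(16, 16, 5, padding=2),
        ], key="kernel_size")

    def forward(self, x):
        return self.conv_b(self.conv_a(x))
```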
With search space defined, the next step is searching for the best model from it. Please refer to [classic NAS algorithms](./ClassicNas.md) and [one-shot NAS algorithms](./NasGuide.md) for how to search from your defined search space.
One-shot NAS Algorithms
=======================
One-shot NAS algorithms leverage weight sharing among models in the neural architecture search space to train a supernet, and use this supernet to guide the selection of better models. This type of algorithm greatly reduces the computational resources required compared to independently training each model from scratch (which we call "classic NAS"). NNI supports the popular one-shot NAS algorithms listed below.
.. toctree::
:maxdepth: 1
Quick Start <NasGuide>
ENAS <ENAS>
DARTS <DARTS>
P-DARTS <PDARTS>
SPOS <SPOS>
CDARTS <CDARTS>
ProxylessNAS <Proxylessnas>
TextNAS <TextNAS>
...@@ -421,7 +421,7 @@ tuner:
> Built-in Tuner Name: **PPOTuner**

Note that the only acceptable types within the search space are `layer_choice` and `input_choice`. For `input_choice`, `n_chosen` can only be 0, 1, or [0, 1]. Note that the search space file for NAS is usually automatically generated through the command [`nnictl ss_gen`](../Tutorial/Nnictl.md).

**Suggested scenario**
......
...@@ -18,15 +18,9 @@ For details, please refer to the following tutorials:
:maxdepth: 2

Overview <NAS/Overview>
Write A Search Space <NAS/WriteSearchSpace>
Classic NAS <NAS/ClassicNas>
One-shot NAS <NAS/one_shot_nas>
Customize a NAS Algorithm <NAS/Advanced>
NAS Visualization <NAS/Visualization>
API Reference <NAS/NasReference>
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 100h
maxTrialNum: 1000
#choice: local, remote, pai
trainingServicePlatform: local
#please use `nnictl ss_gen` to generate search space file first
searchSpacePath: nni_auto_gen_search_space.json
useAnnotation: False
tuner:
builtinTunerName: PPOTuner
classArgs:
optimize_mode: maximize
trial:
command: python3 train.py
codeDir: .
gpuNum: 0
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
#please use `nnictl ss_gen` to generate search space file first
searchSpacePath: nni_auto_gen_search_space.json
useAnnotation: False
tuner:
codeDir: ../../tuners/random_nas_tuner
classFileName: random_nas_tuner.py
className: RandomNASTuner
trial:
command: python3 train.py
codeDir: .
gpuNum: 0
import argparse
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import (AveragePooling2D, BatchNormalization, Conv2D, Dense, MaxPool2D)
from tensorflow.keras.losses import Reduction, SparseCategoricalCrossentropy
from tensorflow.keras.optimizers import SGD
import nni
from nni.nas.tensorflow.mutables import LayerChoice, InputChoice
from nni.nas.tensorflow.classic_nas import get_and_apply_next_architecture
tf.get_logger().setLevel('ERROR')
class Net(Model):
def __init__(self):
super().__init__()
self.conv1 = LayerChoice([
Conv2D(6, 3, padding='same', activation='relu'),
Conv2D(6, 5, padding='same', activation='relu'),
])
self.pool = MaxPool2D(2)
self.conv2 = LayerChoice([
Conv2D(16, 3, padding='same', activation='relu'),
Conv2D(16, 5, padding='same', activation='relu'),
])
self.conv3 = Conv2D(16, 1)
self.skipconnect = InputChoice(n_candidates=2, n_chosen=1)
self.bn = BatchNormalization()
self.gap = AveragePooling2D(2)
self.fc1 = Dense(120, activation='relu')
self.fc2 = Dense(84, activation='relu')
self.fc3 = Dense(10)
def call(self, x):
bs = x.shape[0]
t = self.conv1(x)
x = self.pool(t)
x0 = self.conv2(x)
x1 = self.conv3(x0)
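# the InputChoice below picks either the conv2 output (a skip connection) or None;
# when None is chosen, the residual addition that follows is skipped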
x0 = self.skipconnect([x0, None])
if x0 is not None:
x1 += x0
x = self.pool(self.bn(x1))
x = self.gap(x)
x = tf.reshape(x, [bs, -1])
x = self.fc1(x)
x = self.fc2(x)
x = self.fc3(x)
return x
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
def loss(model, x, y, training):
# training=training is needed only if there are layers with different
# behavior during training versus inference (e.g. Dropout).
y_ = model(x, training=training)
return loss_object(y_true=y, y_pred=y_)
def grad(model, inputs, targets):
with tf.GradientTape() as tape:
loss_value = loss(model, inputs, targets, training=True)
return loss_value, tape.gradient(loss_value, model.trainable_variables)
def train(net, train_dataset, optimizer, num_epochs):
train_loss_results = []
train_accuracy_results = []
for epoch in range(num_epochs):
epoch_loss_avg = tf.keras.metrics.Mean()
epoch_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
for x, y in train_dataset:
loss_value, grads = grad(net, x, y)
optimizer.apply_gradients(zip(grads, net.trainable_variables))
epoch_loss_avg.update_state(loss_value)
epoch_accuracy.update_state(y, net(x, training=True))
train_loss_results.append(epoch_loss_avg.result())
train_accuracy_results.append(epoch_accuracy.result())
if epoch % 1 == 0:
print("Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}".format(epoch,
epoch_loss_avg.result(),
epoch_accuracy.result()))
def test(model, test_dataset):
test_accuracy = tf.keras.metrics.Accuracy()
for (x, y) in test_dataset:
# training=False is needed only if there are layers with different
# behavior during training versus inference (e.g. Dropout).
logits = model(x, training=False)
prediction = tf.argmax(logits, axis=1, output_type=tf.int32)
test_accuracy(prediction, y)
print("Test set accuracy: {:.3%}".format(test_accuracy.result()))
return test_accuracy.result()
if __name__ == '__main__':
# Training settings
parser = argparse.ArgumentParser(description='TensorFlow CIFAR-10 Example')
parser.add_argument('--epochs', type=int, default=10, metavar='N',
help='number of epochs to train (default: 10)')
args, _ = parser.parse_known_args()
cifar10 = tf.keras.datasets.cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
split = int(len(x_train) * 0.9)
dataset_train = tf.data.Dataset.from_tensor_slices((x_train[:split], y_train[:split])).batch(64)
dataset_valid = tf.data.Dataset.from_tensor_slices((x_train[split:], y_train[split:])).batch(64)
dataset_test = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(64)
net = Net()
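# ask the tuner (or the standalone-mode fallback) for an architecture and fix the
# LayerChoice/InputChoice modules in `net` accordingly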
get_and_apply_next_architecture(net)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
train(net, dataset_train, optimizer, args.epochs)
acc = test(net, dataset_test)
nni.report_final_result(acc.numpy())
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from .mutator import get_and_apply_next_architecture
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import json
import logging
import os
import sys
import tensorflow as tf
import nni
from nni.env_vars import trial_env_vars
from nni.nas.tensorflow.mutables import LayerChoice, InputChoice, MutableScope
from nni.nas.tensorflow.mutator import Mutator
logger = logging.getLogger(__name__)
NNI_GEN_SEARCH_SPACE = "NNI_GEN_SEARCH_SPACE"
LAYER_CHOICE = "layer_choice"
INPUT_CHOICE = "input_choice"
def get_and_apply_next_architecture(model):
"""
Wrapper of :class:`~nni.nas.tensorflow.classic_nas.mutator.ClassicMutator` to make it more meaningful,
similar to ``get_next_parameter`` for HPO.
It will generate a search space based on ``model``.
If env ``NNI_GEN_SEARCH_SPACE`` exists, this is in dry-run mode for
generating the search space for the experiment.
If not, there are still two modes: one is NNI experiment mode, where users
use ``nnictl`` to start an experiment; the other is standalone mode,
where users directly run the trial command. Standalone mode chooses the first
one(s) for each LayerChoice and InputChoice.
Parameters
----------
model : tensorflow.keras.Model
User's model with search space (e.g., LayerChoice, InputChoice) embedded in it.
"""
ClassicMutator(model)
class ClassicMutator(Mutator):
"""
This mutator is to apply the architecture chosen from tuner.
It implements the forward function of LayerChoice and InputChoice,
to only activate the chosen ones.
Parameters
----------
model : tensorflow.keras.Model
User's model with search space (e.g., LayerChoice, InputChoice) embedded in it.
"""
def __init__(self, model):
super(ClassicMutator, self).__init__(model)
self._chosen_arch = {}
self._search_space = self._generate_search_space()
if NNI_GEN_SEARCH_SPACE in os.environ:
# dry run for only generating search space
self._dump_search_space(os.environ[NNI_GEN_SEARCH_SPACE])
sys.exit(0)
if trial_env_vars.NNI_PLATFORM is None:
logger.warning("This is in standalone mode, the chosen are the first one(s).")
self._chosen_arch = self._standalone_generate_chosen()
else:
# get chosen arch from tuner
self._chosen_arch = nni.get_next_parameter()
if self._chosen_arch is None:
if trial_env_vars.NNI_PLATFORM == "unittest":
# happens if NNI_PLATFORM is intentionally set, e.g., in UT
logger.warning("`NNI_PLATFORM` is set but `param` is None. Falling back to standalone mode.")
self._chosen_arch = self._standalone_generate_chosen()
else:
raise RuntimeError("Chosen architecture is None. This may be a platform error.")
self.reset()
def _sample_layer_choice(self, mutable, idx, value, search_space_item):
"""
Convert layer choice to tensor representation.
Parameters
----------
mutable : Mutable
idx : int
Index of the candidate to be selected from the list.
value : str
The verbose representation of the selected value.
search_space_item : list
The list for corresponding search space.
"""
# doesn't support multihot for layer choice yet
assert 0 <= idx < len(mutable) and search_space_item[idx] == value, \
"Index '{}' in search space '{}' is not '{}'".format(idx, search_space_item, value)
mask = tf.one_hot(idx, len(mutable))
return tf.cast(tf.reshape(mask, [-1]), tf.bool)
def _sample_input_choice(self, mutable, idx, value, search_space_item):
"""
Convert input choice to tensor representation.
Parameters
----------
mutable : Mutable
idx : list of int
Indices of the selected candidates.
value : list of str
The verbose representations of the selected values.
search_space_item : list
The list for corresponding search space.
"""
candidate_repr = search_space_item["candidates"]
multihot_list = [False] * mutable.n_candidates
for i, v in zip(idx, value):
assert 0 <= i < mutable.n_candidates and candidate_repr[i] == v, \
"Index '{}' in search space '{}' is not '{}'".format(i, candidate_repr, v)
assert not multihot_list[i], "'{}' is selected twice in '{}', which is not allowed.".format(i, idx)
multihot_list[i] = True
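# e.g., idx=[0, 2] with 3 candidates yields the multi-hot mask [True, False, True]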
return tf.cast(multihot_list, tf.bool) # pylint: disable=not-callable
def sample_search(self):
"""
See :meth:`sample_final`.
"""
return self.sample_final()
def sample_final(self):
"""
Convert the chosen arch and apply it on model.
"""
assert set(self._chosen_arch.keys()) == set(self._search_space.keys()), \
"Unmatched keys, expected keys '{}' from search space, found '{}'.".format(self._search_space.keys(),
self._chosen_arch.keys())
result = dict()
for mutable in self.mutables:
if isinstance(mutable, (LayerChoice, InputChoice)):
assert mutable.key in self._chosen_arch, \
"Expected '{}' in chosen arch, but not found.".format(mutable.key)
data = self._chosen_arch[mutable.key]
assert isinstance(data, dict) and "_value" in data and "_idx" in data, \
"'{}' is not a valid choice.".format(data)
if isinstance(mutable, LayerChoice):
result[mutable.key] = self._sample_layer_choice(mutable, data["_idx"], data["_value"],
self._search_space[mutable.key]["_value"])
elif isinstance(mutable, InputChoice):
result[mutable.key] = self._sample_input_choice(mutable, data["_idx"], data["_value"],
self._search_space[mutable.key]["_value"])
elif isinstance(mutable, MutableScope):
logger.info("Mutable scope '%s' is skipped during parsing choices.", mutable.key)
else:
raise TypeError("Unsupported mutable type: '%s'." % type(mutable))
return result
def _standalone_generate_chosen(self):
"""
Generate the chosen architecture for standalone mode,
i.e., choose the first one(s) for LayerChoice and InputChoice.
::
{ key_name: {"_value": "conv1",
"_idx": 0} }
{ key_name: {"_value": ["in1"],
"_idx": [0]} }
Returns
-------
dict
the chosen architecture
"""
chosen_arch = {}
for key, val in self._search_space.items():
if val["_type"] == LAYER_CHOICE:
choices = val["_value"]
chosen_arch[key] = {"_value": choices[0], "_idx": 0}
elif val["_type"] == INPUT_CHOICE:
choices = val["_value"]["candidates"]
n_chosen = val["_value"]["n_chosen"]
if n_chosen is None:
n_chosen = len(choices)
chosen_arch[key] = {"_value": choices[:n_chosen], "_idx": list(range(n_chosen))}
else:
raise ValueError("Unknown key '%s' and value '%s'." % (key, val))
return chosen_arch
def _generate_search_space(self):
"""
Generate search space from mutables.
Here is the search space format:
::
{ key_name: {"_type": "layer_choice",
"_value": ["conv1", "conv2"]} }
{ key_name: {"_type": "input_choice",
"_value": {"candidates": ["in1", "in2"],
"n_chosen": 1}} }
Returns
-------
dict
the generated search space
"""
search_space = {}
for mutable in self.mutables:
# for now we only generate flattened search space
if isinstance(mutable, LayerChoice):
key = mutable.key
val = mutable.names
search_space[key] = {"_type": LAYER_CHOICE, "_value": val}
elif isinstance(mutable, InputChoice):
key = mutable.key
search_space[key] = {"_type": INPUT_CHOICE,
"_value": {"candidates": mutable.choose_from,
"n_chosen": mutable.n_chosen}}
elif isinstance(mutable, MutableScope):
logger.info("Mutable scope '%s' is skipped during generating search space.", mutable.key)
else:
raise TypeError("Unsupported mutable type: '%s'." % type(mutable))
return search_space
def _dump_search_space(self, file_path):
with open(file_path, "w") as ss_file:
json.dump(self._search_space, ss_file, sort_keys=True, indent=2)
...@@ -2,6 +2,7 @@
# Licensed under the MIT license.
import logging
from collections import OrderedDict
from tensorflow.keras import Model
...@@ -77,6 +78,18 @@ class MutableScope(Mutable):
class LayerChoice(Mutable):
def __init__(self, op_candidates, reduction='sum', return_mask=False, key=None):
super().__init__(key=key)
self.names = []
if isinstance(op_candidates, OrderedDict):
for name, _ in op_candidates.items():
assert name not in ["length", "reduction", "return_mask", "_key", "key", "names"], \
"Please don't use a reserved name '{}' for your module.".format(name)
self.names.append(name)
elif isinstance(op_candidates, list):
for i, _ in enumerate(op_candidates):
self.names.append(str(i))
else:
raise TypeError("Unsupported op_candidates type: {}".format(type(op_candidates)))
self.length = len(op_candidates)
self.choices = op_candidates
self.reduction = reduction
......
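A usage sketch (not part of the diff above) for the `OrderedDict` form of `op_candidates` introduced here; candidate names such as "conv3x3"/"conv5x5" are illustrative and end up in the generated search space in place of the positional indices "0"/"1".
```python
from collections import OrderedDict
from tensorflow.keras.layers import Conv2D
from nni.nas.tensorflow.mutables import LayerChoice

# declare a TF LayerChoice with named candidates; the names replace positional
# indices in the generated search space and the exported architecture
conv = LayerChoice(OrderedDict([
    ('conv3x3', Conv2D(16, 3, padding='same', activation='relu')),
    ('conv5x5', Conv2D(16, 5, padding='same', activation='relu')),
]), key='first_conv')
```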
...@@ -5,7 +5,8 @@
functions for sampling from hidden state
"""
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from .util import fc
......
...@@ -5,7 +5,8 @@
the main model of policy/value network
"""
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from .util import initialize, get_session
......
...@@ -5,7 +5,8 @@
build policy/value network from model
"""
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from .distri import CategoricalPdType
from .util import lstm_model, fc, observation_placeholder, adjust_shape
......
...@@ -9,7 +9,8 @@ import os
import random
import multiprocessing
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from gym.spaces import Discrete, Box, MultiDiscrete
def set_global_seeds(i):
......
authorName: nni
experimentName: default_test
maxExecDuration: 10m
maxTrialNum: 1
trialConcurrency: 1
searchSpacePath: nni-nas-search-space.json
tuner:
builtinTunerName: PPOTuner
classArgs:
optimize_mode: maximize
trial:
command: python3 mnist.py --epochs 1
codeDir: ../../../examples/nas/classic_nas
gpuNum: 0
useAnnotation: false
multiPhase: false
multiThread: false
trainingServicePlatform: local