Merge pull request #233 from microsoft/master

merge master

Unverified Commit aa316742 authored Feb 21, 2020 by SparkSnail Committed by GitHub Feb 21, 2020
20 changed files
--- a/docs/en_US/model_compression.rst
+++ b/docs/en_US/model_compression.rst
@@ -16,13 +16,8 @@ For details, please refer to the following tutorials:
    :maxdepth: 2

    Overview <Compressor/Overview>
-    Level Pruner <Compressor/Pruner>
-    AGP Pruner <Compressor/Pruner>
-    L1Filter Pruner <Compressor/l1filterpruner>
-    Slim Pruner <Compressor/SlimPruner>
-    Lottery Ticket Pruner <Compressor/LotteryTicketHypothesis>
-    FPGM Pruner <Compressor/Pruner>
-    Naive Quantizer <Compressor/Quantizer>
-    QAT Quantizer <Compressor/Quantizer>
-    DoReFa Quantizer <Compressor/Quantizer>
+    Quick Start <Compressor/QuickStart>
+    Pruners <pruners>
+    Quantizers <quantizers>
+    Model Speedup <Compressor/ModelSpeedup>
    Automatic Model Compression <Compressor/AutoCompression>
--- a/docs/en_US/nas.rst
+++ b/docs/en_US/nas.rst
-##############
-NAS Algorithms
-##############
+##########################
+Neural Architecture Search
+##########################

 Automatic neural architecture search is taking an increasingly important role on finding better models.
-Recent research works have proved the feasibility of automatic NAS, and also found some models that could beat manually designed and tuned models.
-Some of representative works are NASNet, ENAS, DARTS, Network Morphism, and Evolution. There are new innovations keeping emerging.
+Recent research works have proved the feasibility of automatic NAS, and also found some models that could beat manually tuned models.
+Some of representative works are NASNet, ENAS, DARTS, Network Morphism, and Evolution. Moreover, new innovations keep emerging.

-However, it takes great efforts to implement NAS algorithms, and it is hard to reuse code base of existing algorithms in new one.
+However, it takes great efforts to implement NAS algorithms, and it is hard to reuse code base of existing algorithms in a new one.
 To facilitate NAS innovations (e.g., design and implement new NAS models, compare different NAS models side-by-side),
 an easy-to-use and flexible programming interface is crucial.

-With this motivation, our ambition is to provide a unified architecture in NNI,
+Therefore, we provide a unified interface for NAS,
 to accelerate innovations on NAS, and apply state-of-art algorithms on real world problems faster.
-
 For details, please refer to the following tutorials:

 ..  toctree::
    :maxdepth: 2

    Overview <NAS/Overview>
-    NAS Interface <NAS/NasInterface>
+    Quick Start <NAS/QuickStart>
+    Tutorial <NAS/NasGuide>
    ENAS <NAS/ENAS>
    DARTS <NAS/DARTS>
    P-DARTS <NAS/PDARTS>
    SPOS <NAS/SPOS>
    CDARTS <NAS/CDARTS>
+    ProxylessNAS <NAS/Proxylessnas>
+    Customize a NAS Algorithm <NAS/Advanced>
+    API Reference <NAS/NasReference>
--- a/docs/en_US/pruners.rst
+++ b/docs/en_US/pruners.rst
+############################
+Supported Pruning Algorithms
+############################
+
+..  toctree::
+    :maxdepth: 1
+
+    Level Pruner <Compressor/Pruner>
+    AGP Pruner <Compressor/Pruner>
+    Lottery Ticket Pruner <Compressor/LotteryTicketHypothesis>
+    FPGM Pruner <Compressor/Pruner>
+    L1Filter Pruner <Compressor/l1filterpruner>
+    L2Filter Pruner <Compressor/Pruner>
+    ActivationAPoZRankFilterPruner <Compressor/Pruner>
+    ActivationMeanRankFilterPruner <Compressor/Pruner>
+    Slim Pruner <Compressor/SlimPruner>
--- a/docs/en_US/quantizers.rst
+++ b/docs/en_US/quantizers.rst
+#################################
+Supported Quantization Algorithms
+#################################
+
+..  toctree::
+    :maxdepth: 1
+
+    Naive Quantizer <Compressor/Quantizer>
+    QAT Quantizer <Compressor/Quantizer>
+    DoReFa Quantizer <Compressor/Quantizer>
+    BNN Quantizer <Compressor/Quantizer>
\ No newline at end of file
--- a/docs/en_US/reference.rst
+++ b/docs/en_US/reference.rst
@@ -2,12 +2,11 @@ References
 ==================

 ..  toctree::
-    :maxdepth: 3
+    :maxdepth: 2

-    Command Line <Tutorial/Nnictl>
-    Python API <sdk_reference>
-    Annotation <Tutorial/AnnotationSpec>
-    Configuration<Tutorial/ExperimentConfig>
+    nnictl Commands <Tutorial/Nnictl>
+    Experiment Configuration <Tutorial/ExperimentConfig>
    Search Space <Tutorial/SearchSpaceSpec>
-    TrainingService <TrainingService/HowToImplementTrainingService>
-    Framework Library <SupportedFramework_Library>
+    NNI Annotation <Tutorial/AnnotationSpec>
+    SDK API References <sdk_reference>
+    Supported Framework Library <SupportedFramework_Library>
--- a/docs/en_US/sdk_reference.rst
+++ b/docs/en_US/sdk_reference.rst
-###########################
+####################
 Python API Reference
-###########################
+####################

-Trial
------------------------
-..  autofunction:: nni.get_next_parameter
-..  autofunction:: nni.get_current_parameter
-..  autofunction:: nni.report_intermediate_result
-..  autofunction:: nni.report_final_result
-..  autofunction:: nni.get_experiment_id
-..  autofunction:: nni.get_trial_id
-..  autofunction:: nni.get_sequence_id

+..  toctree::
+    :maxdepth: 1

-Tuner
------------------------
-..  autoclass:: nni.tuner.Tuner
-    :members:
-
-..  autoclass:: nni.hyperopt_tuner.hyperopt_tuner.HyperoptTuner
-    :members:
-
-..  autoclass:: nni.evolution_tuner.evolution_tuner.EvolutionTuner
-    :members:
-
-..  autoclass:: nni.smac_tuner.SMACTuner
-    :members:
-
-..  autoclass:: nni.gridsearch_tuner.GridSearchTuner
-    :members:
-
-..  autoclass:: nni.networkmorphism_tuner.networkmorphism_tuner.NetworkMorphismTuner
-    :members:
-
-..  autoclass:: nni.metis_tuner.metis_tuner.MetisTuner
-    :members:
-
-..  autoclass:: nni.ppo_tuner.PPOTuner
-    :members:
-
-..  autoclass:: nni.batch_tuner.batch_tuner.BatchTuner
-    :members:
-
-..  autoclass:: nni.gp_tuner.gp_tuner.GPTuner
-    :members:
-
-Assessor
------------------------
-..  autoclass:: nni.assessor.Assessor
-    :members:
-
-..  autoclass:: nni.assessor.AssessResult
-    :members:
-
-..  autoclass:: nni.curvefitting_assessor.CurvefittingAssessor
-    :members:
-
-..  autoclass:: nni.medianstop_assessor.MedianstopAssessor
-    :members:
-
-
-Advisor
------------------------
-..  autoclass:: nni.msg_dispatcher_base.MsgDispatcherBase
-    :members:
-
-..  autoclass:: nni.hyperband_advisor.hyperband_advisor.Hyperband
-    :members:
-
-..  autoclass:: nni.bohb_advisor.bohb_advisor.BOHB
-    :members:
+    Auto Tune <autotune_ref>
+    NAS <NAS/NasReference>
\ No newline at end of file
--- a/docs/en_US/tuners.rst
+++ b/docs/en_US/tuners.rst
-#################
-Tuners
-#################
-
-NNI provides an easy way to adopt an approach to set up parameter tuning algorithms, we call them **Tuner**.
-
-Tuner receives metrics from `Trial` to evaluate the performance of a specific parameters/architecture configures. And tuner sends next hyper-parameter or architecture configure to Trial.
-
-In NNI, we support two approaches to set the tuner: first is directly use builtin tuner provided by nni sdk, second is customize a tuner file by yourself. We also have Advisor that combines the functinality of Tuner & Assessor.
-
-For details, please refer to the following tutorials:
-
-..  toctree::
-    :maxdepth: 2
-
-    Builtin Tuners <builtin_tuner>
-    Customized Tuners <Tuner/CustomizeTuner>
-    Customized Advisor <Tuner/CustomizeAdvisor>
--- a/docs/en_US/tutorials.rst
+++ b/docs/en_US/tutorials.rst
-######################
-Tutorials
-######################
-
-..  toctree::
-    :maxdepth: 2
-
-    Installation <Tutorial/Installation>
-    Write Trial <TrialExample/Trials>
-    Tuners <tuners>
-    Assessors <assessors>
-    NAS (Beta) <nas>
-    Model Compression (Beta) <model_compression>
-    Feature Engineering (Beta) <feature_engineering>
-    WebUI <Tutorial/WebUI>
-    Training Platform <training_services>
-    How to use docker <Tutorial/HowToUseDocker>
-    advanced
-    Debug HowTo <Tutorial/HowToDebug>
-    NNI on Windows <Tutorial/NniOnWindows>
\ No newline at end of file
--- a/docs/img/nas_abstract_illustration.png
+++ b/docs/img/nas_abstract_illustration.png
--- a/docs/img/pai_data_management_page.jpg
+++ b/docs/img/pai_data_management_page.jpg
--- a/docs/img/pai_job_submission_page.jpg
+++ b/docs/img/pai_job_submission_page.jpg
--- a/docs/img/pai_token_button.jpg
+++ b/docs/img/pai_token_button.jpg
--- a/docs/img/pai_token_profile.jpg
+++ b/docs/img/pai_token_profile.jpg
--- a/docs/img/proxylessnas.png
+++ b/docs/img/proxylessnas.png
--- a/docs/zh_CN/Assessor/BuiltinAssessor.md
+++ b/docs/zh_CN/Assessor/BuiltinAssessor.md
@@ -8,7 +8,7 @@ NNI 提供了先进的调优算法，使用上也很简单。 下面是内置 As

 | Assessor                          | Brief Introduction of Algorithm |
 | --------------------------------- | ------------------------------- |
-| [**Medianstop**](#MedianStop)     | Medianstop is a simple early stopping rule. A Trial X is stopped early if its best objective value at step S is clearly lower than the median of the step-S values of all completed Trials. [Reference Paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf) |
+| [**Medianstop**](#MedianStop)     | Medianstop is a simple early stopping rule. A Trial X is stopped early if its best objective value by step S is lower than the median of the running averages of all completed Trials' objectives over their first S steps. [Reference Paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf) |
 | [**Curvefitting**](#Curvefitting) | Curve Fitting Assessor is an LPA (learning, predicting, assessing) algorithm. A Trial X is stopped early at step S if its predicted performance is worse than that of the best-performing Trial. The algorithm fits the accuracy curve with 12 kinds of curves. [Reference Paper](http://aad.informatik.uni-freiburg.de/papers/15-IJCAI-Extrapolation_of_Learning_Curves.pdf) |

 ## Usage

--- a/docs/zh_CN/Assessor/MedianstopAssessor.md
+++ b/docs/zh_CN/Assessor/MedianstopAssessor.md
@@ -2,4 +2,4 @@

 ## Median Stop

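-Medianstop is a simple early stopping rule for Trials; see the [paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf). A Trial X is stopped early if its best objective value at step S is clearly lower than the median of the step-S values of all completed Trials.
\ No newline at end of file
+Medianstop is a simple early stopping rule for Trials; see the [paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf). A Trial X is stopped early if its best objective value by step S is lower than the median of the running averages of all completed Trials' objectives over their first S steps.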
\ No newline at end of file
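The revised Medianstop wording in the hunk above can be read as the rule sketched below. This is a minimal illustration, not NNI's implementation: the function name `should_stop` and its arguments are hypothetical, and it assumes that a larger objective value is better.

```python
from statistics import median


def should_stop(trial_history, completed_histories):
    """Return True if the running trial should be stopped early.

    trial_history: objective values reported by the running trial (steps 1..S).
    completed_histories: one list of objective values per completed trial.
    Assumption: a larger objective value is better.
    """
    step = len(trial_history)
    if step == 0:
        return False  # nothing reported yet
    # Running average over the first S steps of every completed trial
    # that ran for at least S steps.
    averages = [sum(h[:step]) / step for h in completed_histories if len(h) >= step]
    if not averages:
        return False  # nothing to compare against yet
    # Stop when the trial's best value so far is below the median average.
    return max(trial_history) < median(averages)


completed = [[0.5, 0.6, 0.7], [0.4, 0.5, 0.6], [0.6, 0.7, 0.8]]
print(should_stop([0.3, 0.4], completed))
print(should_stop([0.3, 0.6], completed))
```

Comparing against running averages rather than single step-S values makes the rule less sensitive to one noisy report from a completed trial.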
--- a/docs/zh_CN/Compressor/QuickStart.md
+++ b/docs/zh_CN/Compressor/QuickStart.md
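+# Quick Start to Compress a Model
+
+NNI provides simple APIs for compressing a model. Compression covers both pruning and quantization algorithms. They are used in the same way, so Slim Pruner serves as the example here. The complete code of this example is [here](https://github.com/microsoft/nni/blob/master/examples/model_compress/slim_torch_cifar10.py).
+
+## Write configuration
+
+Write a configuration to specify the layers you want to prune. The following configuration prunes all `BatchNorm2d` layers to a sparsity of 0.7 and leaves the other layers unpruned.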
+
+```python
+configure_list = [{
+    'sparsity': 0.7,
+    'op_types': ['BatchNorm2d'],
+}]
+```
+
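+The configuration specification is described [here](Overview.md#user-configuration-for-a-compression-algorithm). Note that different Pruners may define their own configuration fields, for example `start_epoch` in AGP Pruner. Refer to each Pruner's [usage](Overview.md#supported-algorithms) for details and adjust the configuration accordingly.
+
+## Choose a compression algorithm
+
+Choose a Pruner to prune your model. First instantiate the Pruner with your model and the configuration as arguments, then call `compress()` to compress the model.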
+
+```python
+pruner = SlimPruner(model, configure_list)
+model = pruner.compress()
+```
+
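+Then train the model with your usual training approach (e.g., SGD); pruning is applied transparently during training. Some Pruners prune once at the beginning, so the training that follows can be seen as fine-tuning. Other Pruners prune the model iteratively and adjust the masks step by step during training.
+
+## Export compression result
+
+After training, you get the accuracy of the pruned model. You can export the model weights to a file, and the generated masks to a file as well. Exporting an ONNX model is also supported.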
+
+```python
+pruner.export_model(model_path='pruned_vgg19_cifar10.pth', mask_path='mask_vgg19_cifar10.pth')
+```
+
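+## Speed up the model
+
+Masks alone do not actually speed up the model. The speedup has to be derived from the exported masks, so NNI provides an API for it. After calling `apply_compression_results` on the model, the model becomes smaller and its inference latency shorter.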
+
+```python
+from nni.compression.torch import apply_compression_results
+apply_compression_results(model, 'mask_vgg19_cifar10.pth')
+```
+
+Refer to [this page](ModelSpeedup.md) for details.
\ No newline at end of file
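To illustrate why a mask by itself does not shrink a model, here is a small sketch of what a sparsity of 0.7 means. It is not NNI code: `magnitude_mask` is a hypothetical helper, plain Python lists stand in for tensors, and pruning by smallest magnitude is an assumption made for the illustration.

```python
def magnitude_mask(values, sparsity):
    """Build a 0/1 mask that zeroes the `sparsity` fraction of values
    with the smallest magnitude (count rounded to the nearest integer)."""
    num_pruned = round(len(values) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    pruned = set(order[:num_pruned])
    return [0 if i in pruned else 1 for i in range(len(values))]


# Ten hypothetical BatchNorm scaling factors.
scales = [0.9, 0.05, 0.4, 0.01, 0.7, 0.2, 0.03, 0.6, 0.1, 0.08]
mask = magnitude_mask(scales, 0.7)
masked = [s * m for s, m in zip(scales, mask)]

print(mask)         # seven entries are 0, three are 1
print(len(masked))  # still 10 values: the masked layer has not become smaller
```

The masked list has the same length as the original, which is why a separate speedup step is needed to actually remove the pruned channels.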
--- a/docs/zh_CN/FeatureEngineering/Overview.md
+++ b/docs/zh_CN/FeatureEngineering/Overview.md
@@ -6,11 +6,14 @@
 - [GradientFeatureSelector](./GradientFeatureSelector.md)
 - [GBDTSelector](./GBDTSelector.md)

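+These Selectors are suitable for structured (tabular) data; they do not apply to image, speech, or text data.
 
-# How to use
+In addition, these Selectors only perform feature selection. If you want to 1) generate high-order combined features with NNI while doing feature selection, or 2) use distributed resources, try [this example](https://github.com/microsoft/nni/tree/master/examples/feature_engineering/auto-feature-engineering).
+
+## How to use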

 ```python
-from nni.feature_engineering.gradient_selector import GradientFeatureSelector
+from nni.feature_engineering.gradient_selector import FeatureGradientSelector
 # from nni.feature_engineering.gbdt_selector import GBDTSelector

 # load data
@@ -18,7 +21,7 @@ from nni.feature_engineering.gradient_selector import GradientFeatureSelector
 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

 # initialize a Selector
-fgs = GradientFeatureSelector(...)
+fgs = FeatureGradientSelector(...)
 # fit data
 fgs.fit(X_train, y_train)
 # get the important features
@@ -30,7 +33,7 @@ print(fgs.get_selected_features(...))

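 To use a built-in Selector, first `import` the feature selector and `initialize` it. Then call the Selector's `fit` function to pass in the data. After that, call `get_selected_features` to get the important features. Function parameters may differ between Selectors, so check the documentation before using one.

-# How to customize
+## How to customize?

 NNI provides _state-of-the-art_ feature selection algorithms as built-in Selectors. NNI also supports building your own feature Selector.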

@@ -238,7 +241,7 @@ print("Pipeline Score: ", pipeline.score(X_train, y_train))

 ```

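-# Benchmark
+## Benchmark

 `Baseline` means no feature selection: the data is passed directly to LogisticRegression. In this benchmark, only 10% of the training data is used as test data. For GradientFeatureSelector, only the top 20 features are used. The metric is the mean accuracy on the given test data and labels.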

@@ -255,7 +258,7 @@ print("Pipeline Score: ", pipeline.score(X_train, y_train))

 The code is in `/examples/feature_engineering/gradient_feature_selector/benchmark_test.py`.

-## **Reference and Feedback**
+## Reference and Feedback
 * [Report a bug](https://github.com/microsoft/nni/issues/new?template=bug-report.md) for this feature on GitHub;
 * [File a feature or improvement request](https://github.com/microsoft/nni/issues/new?template=enhancement.md) on GitHub;
 * Learn more about [Neural Architecture Search in NNI](https://github.com/microsoft/nni/blob/master/docs/zh_CN/NAS/Overview.md);

--- a/docs/zh_CN/NAS/Advanced.md
+++ b/docs/zh_CN/NAS/Advanced.md
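+# Customize a NAS Algorithm
+
+## Extend the Ability of One-Shot Trainers
+
+Using a Trainer on a real task usually takes more than the search itself: distributed training, low-precision training, periodic logging, writing to TensorBoard, saving checkpoints, and so on. As mentioned previously, some Trainers support some of these features. There are two ways to add a feature to an existing Trainer: inherit the Trainer and override, or copy the Trainer and modify it.
+
+Either way, you end up implementing a new Trainer. Basically, implementing a One-Shot Trainer is the same as implementing an ordinary deep learning Trainer, except for the new concept of a Mutator. The implementation therefore differs in two places:
+
+* Initialization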
+
+```python
+model = Model()
+mutator = MyMutator(model)
+```
+
+* Training
+
+```python
+for _ in range(epochs):
+    for x, y in data_loader:
+        mutator.reset()  # reset all the Choices in the model
+        out = model(x)  # same as an ordinary model
+        loss = criterion(out, y)
+        loss.backward()
+        # no difference below
+```
+
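+To show what a Mutator is for, we first need to look at how One-Shot NAS works. One-Shot NAS usually optimizes model weights and architecture weights together. It repeatedly samples an architecture, or a combination of several architectures, from the supernet, trains it like an ordinary deep learning model, updates the trained parameters back into the supernet, and uses the metrics or loss as a signal to guide the architecture sampling. The Mutator is the architecture sampler here, and is often another deep learning model. You can therefore treat it like any model: define parameters in it and optimize them with an optimizer. A Mutator is initialized with exactly one model. Once bound to a model, it cannot be rebound to another.
+
+`mutator.reset()` is the key step. It finalizes all the Choices in the model. The result of a reset stays in effect until the next reset flushes it. After a reset, the model can be treated as an ordinary model for forward and backward passes.
+
+Finally, a Mutator provides a method called `mutator.export()` that exports the architecture of the model as a dict. Note that this dict currently maps Mutable keys to selection tensors. To store it as JSON, users need to convert the tensors into Python lists explicitly.
+
+Meanwhile, NNI provides tools that make implementing a Trainer easier. See [Trainer](./NasReference.md#trainers) for details.
+
+## Implement New Mutators
+
+The following pseudo-code shows what happens in `mutator.reset()` and `mutator.export()`.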
+
+```python
+def reset(self):
+    self.apply_on_model(self.sample_search())
+```
+
+```python
+def export(self):
+    return self.sample_final()
+```
+
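+On reset, a new architecture is sampled with `sample_search()` and applied to the model. The model is then trained for one or more steps of the search. On export, a new architecture is sampled with `sample_final()` and **nothing is done to the model**. This is used for checkpoints or for exporting the final architecture.
+
+`sample_search()` and `sample_final()` have the same requirement on their return value: a mapping from Mutable keys to tensors. A tensor can be a BoolTensor (true for selected, false for not selected), or a FloatTensor that applies a weight to each candidate. The selected branches are then computed (for `LayerChoice`, the modules are called; for `InputChoice`, the input tensors themselves are used) and reduced with the reduction operation specified in the Choice. Most algorithms only need to care about the first part; here is an example Mutator implementation.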
+
+```python
+class RandomMutator(Mutator):
+    def __init__(self, model):
+        super().__init__(model)  # don't forget to call super
+        # do something else
+
+    def sample_search(self):
+        result = dict()
+        for mutable in self.mutables:  # all the Mutable modules in the user model
+            # Mutables that share the same key are de-duplicated
+            if isinstance(mutable, LayerChoice):
+                # decide that this Mutable chooses `gen_index`
+                gen_index = np.random.randint(mutable.length)
+                result[mutable.key] = torch.tensor([i == gen_index for i in range(mutable.length)],
+                                                   dtype=torch.bool)
+            elif isinstance(mutable, InputChoice):
+                if mutable.n_chosen is None:  # n_chosen is None: any number of candidates may be chosen
+                    result[mutable.key] = torch.randint(high=2, size=(mutable.n_candidates,)).view(-1).bool()
+                # else: do something else
+        return result
+
+    def sample_final(self):
+        return self.sample_search()  # same logic here; you can do something different
+```
+
+The complete random Mutator example is [here](https://github.com/microsoft/nni/blob/master/src/sdk/pynni/nni/nas/pytorch/random/mutator.py).
+
+For advanced usage, e.g., to control how the modules in a `LayerChoice` are executed, inherit `BaseMutator` and override `on_forward_layer_choice` and `on_forward_input_choice`, which are the callback implementations for `LayerChoice` and `InputChoice` respectively. The `mutables` property still gives you all the `LayerChoice` and `InputChoice` instances in the model. For details, [see here](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch).
+
+```eval_rst
+.. tip::
+    A random Mutator is useful for debugging. Using
+
+    .. code-block:: python
+
+        mutator = RandomMutator(model)
+        mutator.reset()
+
+    immediately activates one candidate from the search space.
+```
+
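+## Implement a Distributed NAS Tuner
+
+Before learning how to write a One-Shot NAS Tuner, first learn how to write a general Tuner. Read the [Customize Tuner](../Tuner/CustomizeTuner.md) tutorial.
+
+Calling "[nnictl ss_gen](../Tutorial/Nnictl.md)" generates a search space file like this: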
+
+```json
+{
+    "key_name": {
+        "_type": "layer_choice",
+        "_value": ["op1_repr", "op2_repr", "op3_repr"]
+    },
+    "key_name": {
+        "_type": "input_choice",
+        "_value": {
+            "candidates": ["in1_key", "in2_key", "in3_key"],
+            "n_chosen": 1
+        }
+    }
+}
+```
+
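+This is the search space the Tuner receives in `update_search_space`. The Tuner is responsible for parsing the search space and generating new candidates in `generate_parameters`. Valid "parameters" have the following format: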
+
+```json
+{
+    "key_name": {
+        "_value": "op1_repr",
+        "_idx": 0
+    },
+    "key_name": {
+        "_value": ["in2_key"],
+        "_idx": [1]
+    }
+}
+```
+
+Send it through `generate_parameters`, just like an ordinary hyper-parameter tuning Tuner. See the [SPOS](./SPOS.md) example code.
\ No newline at end of file
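The Tuner contract described in the last section of `Advanced.md` (parse the search space, then emit parameters with `_value` and `_idx`) can be sketched with a random sampler. This is an illustration under stated assumptions, not an NNI Tuner: the free function `generate_parameters`, the keys `conv_choice` and `skip_choice`, and the rule used when `n_chosen` is `None` are all hypothetical.

```python
import random


def generate_parameters(search_space, rng=random):
    """Sample one architecture from a search space in the format shown above."""
    params = {}
    for key, spec in search_space.items():
        if spec["_type"] == "layer_choice":
            # Pick exactly one candidate operation.
            idx = rng.randrange(len(spec["_value"]))
            params[key] = {"_value": spec["_value"][idx], "_idx": idx}
        elif spec["_type"] == "input_choice":
            candidates = spec["_value"]["candidates"]
            n_chosen = spec["_value"]["n_chosen"]
            if n_chosen is None:
                # Assumption: no fixed count means each candidate is kept
                # independently with probability 1/2.
                idx = [i for i in range(len(candidates)) if rng.random() < 0.5]
            else:
                idx = sorted(rng.sample(range(len(candidates)), n_chosen))
            params[key] = {"_value": [candidates[i] for i in idx], "_idx": idx}
        else:
            raise ValueError("unsupported _type: " + str(spec["_type"]))
    return params


search_space = {
    "conv_choice": {
        "_type": "layer_choice",
        "_value": ["op1_repr", "op2_repr", "op3_repr"],
    },
    "skip_choice": {
        "_type": "input_choice",
        "_value": {"candidates": ["in1_key", "in2_key", "in3_key"], "n_chosen": 1},
    },
}
print(generate_parameters(search_space, random.Random(0)))
```

A real Tuner would replace the random draws with its own search strategy and keep the same output format.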
--- a/docs/zh_CN/NAS/NasGuide.md
+++ b/docs/zh_CN/NAS/NasGuide.md
+# Guide: Using NAS on NNI
+
+```eval_rst
+.. contents::
+
+.. Note:: These APIs are in an early experimental stage. The current interface is subject to change.
+```
+
+![](../../img/nas_abstract_illustration.png)
+
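+Modern Neural Architecture Search (NAS) methods usually involve [three dimensions](https://arxiv.org/abs/1808.05377): search space, search strategy, and performance estimation strategy. The search space is usually a finite set of neural network architectures to explore, while the search strategy samples architectures from the search space, estimates their performance, and evolves itself. Ideally, the search strategy finds the best architecture in the search space and reports it to the user. After obtaining this "best architecture", many methods have a "retrain" step that trains it like an ordinary neural network model.
+
+## Implement a Search Space
+
+Suppose you already have a baseline model. How do you improve it with NAS? Taking [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example, the code looks like this: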
+
+```python
+import torch.nn as nn
+import torch.nn.functional as F
+from nni.nas.pytorch import mutables
+
+class Net(nn.Module):
+    def __init__(self):
+        super(Net, self).__init__()
+        self.conv1 = mutables.LayerChoice([
+            nn.Conv2d(1, 32, 3, 1),
+            nn.Conv2d(1, 32, 5, 3)
+        ])  # try 3x3 kernel and 5x5 kernel
+        self.conv2 = nn.Conv2d(32, 64, 3, 1)
+        self.dropout1 = nn.Dropout2d(0.25)
+        self.dropout2 = nn.Dropout2d(0.5)
+        self.fc1 = nn.Linear(9216, 128)
+        self.fc2 = nn.Linear(128, 10)
+
+    def forward(self, x):
+        x = self.conv1(x)
+        x = F.relu(x)
+        # ... same as original ...
+        return output
+```
+
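+The example above adds a conv5x5 option at conv1. The change is as simple as declaring a `LayerChoice` with the original conv3x3 and a new conv5x5 as its arguments. That's it! The forward function does not need to change. You can think of conv1 as any other module without NAS.
+
+What about possible connections? These are expressed with `InputChoice`. To allow a skip connection in the MNIST example, we add another layer called conv3. In the following example, a possible connection from conv2 is added to the output of conv3.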
+
+```python
+from nni.nas.pytorch import mutables
+
+class Net(nn.Module):
+    def __init__(self):
+        # ... same ...
+        self.conv2 = nn.Conv2d(32, 64, 3, 1)
+        self.conv3 = nn.Conv2d(64, 64, 1, 1)
+        # declare a choice with exactly one candidate;
+        # the search strategy picks it or nothing
+        self.skipcon = mutables.InputChoice(n_candidates=1)
+        # ... same ...
+
+    def forward(self, x):
+        x = self.conv1(x)
+        x = F.relu(x)
+        x = self.conv2(x)
+        x0 = self.skipcon([x])  # choose one or none from [x]
+        x = self.conv3(x)
+        if x0 is not None:  # the skip connection is open
+            x += x0
+        x = F.max_pool2d(x, 2)
+        # ... same ...
+        return output
+```
+
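+An Input Choice can be thought of as a callable module that receives a list of tensors and outputs the concatenation, sum, or mean of some of them (sum by default), or `None` if nothing is selected. Like Layer Choices, Input Choices must be **initialized in `__init__` and called in `forward`**. Later examples show how this lets search algorithms identify these Choices and make the necessary preparations.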
+
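+`LayerChoice` and `InputChoice` are both **Mutables**. Mutable means "changeable". Unlike traditional deep learning layers and models, which are fixed once defined, a module that uses Mutables is a set of possible models.
+
+Users can specify a **key** for each Mutable. By default, NNI assigns a globally unique one, but if you want to share a Choice (for example, two `LayerChoice`s have the same candidate operations and should make the same choice, i.e., if the first chooses the i-th operation, the second also chooses the i-th), give them the same key. The key identifies the Choice and is used in the saved checkpoint. To make the exported architecture more readable, assign a descriptive key to each Mutable. For advanced usage, see [Mutable](./NasReference.md#mutables).
+
+## Use a Search Algorithm
+
+Depending on how the search space is explored and how Trials are spawned, there are at least two ways to run the search. One is to run NAS in a distributed fashion, which can be as naive as enumerating all the architectures and training each from scratch, or can leverage more advanced techniques such as [SMASH](https://arxiv.org/abs/1708.05344), [ENAS](https://arxiv.org/abs/1802.03268), [DARTS](https://arxiv.org/abs/1808.05377), [FBNet](https://arxiv.org/abs/1812.03443), [ProxylessNAS](https://arxiv.org/abs/1812.00332), [SPOS](https://arxiv.org/abs/1904.00420), [Single-Path NAS](https://arxiv.org/abs/1904.02877), [Understanding One-shot](http://proceedings.mlr.press/v80/bender18a) and [GDAS](https://arxiv.org/abs/1910.04465). Since training many different architectures is expensive, another family of methods, called One-Shot NAS, builds a supernet that contains every candidate network in the search space and trains one or a few subnetworks in each step.
+
+Currently, NNI supports several One-Shot methods. For example, `DartsTrainer` uses SGD to train architecture weights and model weights alternately, and `EnasTrainer` [uses a Controller to train the model](https://arxiv.org/abs/1802.03268). New and more efficient NAS Trainers keep emerging in the research community.
+
+### One-Shot NAS
+
+Each One-Shot NAS algorithm implements a Trainer; details can be found in the description of each algorithm. Here is a simple example of how to use `EnasTrainer`.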
+
+```python
+# this is exactly the same as ordinary model training
+model = Net()
+dataset_train = CIFAR10(root="./data", train=True, download=True, transform=train_transform)
+dataset_valid = CIFAR10(root="./data", train=False, download=True, transform=valid_transform)
+criterion = nn.CrossEntropyLoss()
+optimizer = torch.optim.SGD(model.parameters(), 0.05, momentum=0.9, weight_decay=1.0E-4)
+
+# use NAS here
+def top1_accuracy(output, target):
+    # ENAS uses this function to compute the reward
+    batch_size = target.size(0)
+    _, predicted = torch.max(output.data, 1)
+    return (predicted == target).sum().item() / batch_size
+
+def metrics_fn(output, target):
+    # the metrics function receives output and target and computes a dict of metrics
+    return {"acc1": top1_accuracy(output, target)}
+
+from nni.nas.pytorch import enas
+trainer = enas.EnasTrainer(model,
+                           loss=criterion,
+                           metrics=metrics_fn,
+                           reward_function=top1_accuracy,
+                           optimizer=optimizer,
+                           batch_size=128,
+                           num_epochs=10,  # 10 epochs
+                           dataset_train=dataset_train,
+                           dataset_valid=dataset_valid,
+                           log_frequency=10)  # print a log every 10 steps
+trainer.train()  # training
+trainer.export(file="model_dir/final_architecture.json")  # export the final architecture to a file
+```
+
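+Users can start training directly with `python3 train.py`, without `nnictl`. After training, the best model found can be exported with `trainer.export()`.
+
+Normally, a Trainer exposes a few arguments you can customize, such as the loss function, the metrics function, the optimizer, and the datasets. These cover most needs, and NNI does its best to make the built-in Trainers work with as many models, tasks, and datasets as possible. But full coverage is not guaranteed. For example, some Trainers assume a classification task; some Trainers define "epoch" differently (e.g., an ENAS epoch is some child steps plus some Controller steps); most Trainers do not support distributed training and will not wrap your model with `DataParallel` or `DistributedDataParallel`. So after a few tryouts, if you want to use a Trainer in your own customized application, you may need to [customize the Trainer](#extend-the-ability-of-one-shot-trainers).
+
+### Distributed NAS
+
+Neural architecture search was originally carried out by running each child model independently as a Trial job. NNI supports this approach as well, and it fits naturally into NNI's hyper-parameter tuning framework: the Tuner generates a child model for each Trial, which runs on the training service.
+
+To use this mode, there is no need to change the search space defined with the NNI NAS API (i.e., `LayerChoice`, `InputChoice`, `MutableScope`). After the model is initialized, call `get_and_apply_next_architecture` on it. One-Shot NAS Trainers are not used in this mode. A simple example: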
+
+```python
+model = Net()
+
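+# get the chosen architecture from the Tuner and apply it to the model
+get_and_apply_next_architecture(model)
+train(model)  # your code for training the model
+acc = test(model)  # test the trained model
+nni.report_final_result(acc)  # report the performance of the chosen architecture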
+```
+
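+The search space has to be generated and sent to the Tuner. With the NNI NAS API, the search space is embedded in the user code, so use "[nnictl ss_gen](../Tutorial/Nnictl.md)" to generate the search space file. Then put the path of the generated file in the `searchSpacePath` field of `config.yml`. For the other fields in `config.yml`, refer to the [tutorial](../Tutorial/QuickStart.md).
+
+You can use an [NNI Tuner](../Tuner/BuiltinTuner.md) to do the search. Currently, only PPO Tuner supports NAS search spaces.
+
+For easy debugging, a standalone mode is supported, in which you run the Trial command directly without launching an NNI Experiment. This lets you check whether the Trial code runs correctly. In standalone mode, `LayerChoice` and `InputChoice` choose the first candidate(s).
+
+A complete example is available [here](https://github.com/microsoft/nni/tree/master/examples/nas/classic_nas/config_nas.yml).
+
+### Retrain with Exported Architecture
+
+After the search phase, it is time to train the architecture that was found. Many open-source NAS algorithms write a whole new model specifically for retraining. We found that the search model and the retraining model are usually very similar, so the exact same model code can be used for the final model. For example: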
+
+```python
+model = Net()
+apply_fixed_architecture(model, "model_dir/final_architecture.json")
+```
+
+The JSON file maps each Mutable key to a representation of its Choice. For example:
+
+```json
+{
+    "LayerChoice1": [false, true, false, false],
+    "InputChoice2": [true, true, false]
+}
+```
+
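+After the architecture is applied, the model is fixed and ready for final training. It works as a single model, although it may contain more parameters than expected. This has pros and cons. On the plus side, you can load the checkpoint dumped from the supernet during the search phase and start retraining from there. However, the model also carries redundant parameters, which can make parameter counts inaccurate. For the deeper reasons and possible workarounds, see [Trainer](./NasReference.md#retrain).
+
+See also [DARTS](./DARTS.md) for retraining code.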
\ No newline at end of file
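The exported architecture JSON shown in `NasGuide.md` can be inspected without NNI. The sketch below decodes each boolean mask into the indices of the selected candidates. The helper name `chosen_indices` is hypothetical, and the code assumes boolean masks as in the example; with weighted (float) entries it would report every non-zero position.

```python
import json


def chosen_indices(architecture_json):
    """Map each Mutable key to the indices of its selected candidates."""
    architecture = json.loads(architecture_json)
    return {
        key: [i for i, picked in enumerate(mask) if picked]
        for key, mask in architecture.items()
    }


exported = '{"LayerChoice1": [false, true, false, false], "InputChoice2": [true, true, false]}'
decoded = chosen_indices(exported)
print(decoded)
```

Here `LayerChoice1` selects its second candidate (one-hot), while `InputChoice2` selects its first two inputs (multi-hot).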