Merge branch 'master' of https://github.com/microsoft/nni

Commit ba8dccd6 authored Jun 23, 2019 by suiguoxin
20 changed files
--- a/docs/zh_CN/CommunitySharings/HpoComparision.md
+++ b/docs/zh_CN/CommunitySharings/HpoComparision.md
+# Comparison of Hyperparameter Optimization Algorithms
+
+*Anonymous author*
+
+A comparison of hyperparameter optimization (HPO) algorithms on several problems.
+
+The following HPO algorithms are compared:
+
+- [Random Search](../BuiltinTuner.md)
+- [Grid Search](../BuiltinTuner.md)
+- [Evolution](../BuiltinTuner.md)
+- [Anneal](../BuiltinTuner.md)
+- [Metis](../BuiltinTuner.md)
+- [TPE](../BuiltinTuner.md)
+- [SMAC](../BuiltinTuner.md)
+- [HyperBand](../BuiltinTuner.md)
+- [BOHB](../BuiltinTuner.md)
+
+All algorithms were run in NNI's local environment.
+
+Environment:
+
+    OS: Linux Ubuntu 16.04 LTS
+    CPU: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz 2600 MHz
+    Memory: 112 GB
+    NNI Version: v0.7
+    NNI mode (local|pai|remote): local
+    Python version: 3.6
+    Virtual environment used: Conda
+    Running in Docker: no
+
+
+## AutoGBDT Example
+
+### Problem Description
+
+A non-convex hyperparameter search problem: [AutoGBDT](../gbdt_example.md).
+
+### Search Space
+
+```json
+{
+  "num_leaves": {
+    "_type": "choice",
+    "_value": [10, 12, 14, 16, 18, 20, 22, 24, 28, 32, 48, 64, 96, 128]
+  },
+  "learning_rate": {
+    "_type": "choice",
+    "_value": [0.00001, 0.0001, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5]
+  },
+  "max_depth": {
+    "_type": "choice",
+    "_value": [-1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 28, 32, 48, 64, 96, 128]
+  },
+  "feature_fraction": {
+    "_type": "choice",
+    "_value": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
+  },
+  "bagging_fraction": {
+    "_type": "choice",
+    "_value": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
+  },
+  "bagging_freq": {
+    "_type": "choice",
+    "_value": [1, 2, 4, 8, 10, 12, 14, 16]
+  }
+}
+```
+
+The search space contains 1,204,224 configurations in total. The maximum number of Trials is set to 1000, with a time limit of 48 hours.
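The total of 1,204,224 can be checked by multiplying the number of candidate values of each `choice` parameter. The sketch below is only an illustration: the counts are read off the `_value` lists in the search space above, and `search_space_size` is a hypothetical helper, not part of NNI.

```python
from math import prod  # available in Python 3.8+

# Number of candidate values per hyperparameter, counted from the
# "_value" lists of the AutoGBDT search space.
choices_per_param = {
    "num_leaves": 14,
    "learning_rate": 8,
    "max_depth": 21,
    "feature_fraction": 8,
    "bagging_fraction": 8,
    "bagging_freq": 8,
}


def search_space_size(counts):
    """Size of a grid made only of independent `choice` parameters."""
    return prod(counts.values())


total = search_space_size(choices_per_param)
coverage = 1000 / total  # fraction of the grid that 1000 Trials can visit
print(total, f"{coverage:.4%}")
```

With at most 1000 Trials, each run visits well under 0.1% of the grid.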
+
+### Results
+
+| Algorithm | Best loss | Average of best 5 losses | Average of best 10 losses |
+| ------------- | ------------ | ------------- | ------------- |
+| Random Search | 0.418854     | 0.420352      | 0.421553      |
+| Random Search | 0.417364     | 0.420024      | 0.420997      |
+| Random Search | 0.417861     | 0.419744      | 0.420642      |
+| Grid Search   | 0.498166     | 0.498166      | 0.498166      |
+| Evolution     | 0.409887     | 0.409887      | 0.409887      |
+| Evolution     | 0.413620     | 0.413875      | 0.414067      |
+| Evolution     | 0.409887     | 0.409887      | 0.409887      |
+| Anneal        | 0.414877     | 0.417289      | 0.418281      |
+| Anneal        | 0.409887     | 0.409887      | 0.410118      |
+| Anneal        | 0.413683     | 0.416949      | 0.417537      |
+| Metis         | 0.416273     | 0.420411      | 0.422380      |
+| Metis         | 0.420262     | 0.423175      | 0.424816      |
+| Metis         | 0.421027     | 0.424172      | 0.425714      |
+| TPE           | 0.414478     | 0.414478      | 0.414478      |
+| TPE           | 0.415077     | 0.417986      | 0.418797      |
+| TPE           | 0.415077     | 0.417009      | 0.418053      |
+| SMAC          | **0.408386** | **0.408386**  | **0.408386**  |
+| SMAC          | 0.414012     | 0.414012      | 0.414012      |
+| SMAC          | **0.408386** | **0.408386**  | **0.408386**  |
+| BOHB          | 0.410464     | 0.415319      | 0.417755      |
+| BOHB          | 0.418995     | 0.420268      | 0.422604      |
+| BOHB          | 0.415149     | 0.418072      | 0.418932      |
+| HyperBand     | 0.414065     | 0.415222      | 0.417628      |
+| HyperBand     | 0.416807     | 0.417549      | 0.418828      |
+| HyperBand     | 0.415550     | 0.415977      | 0.417186      |
+
+Metis runs very slowly because its Gaussian process computation has O(n^3) complexity, so only 300 Trials were executed for it.
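The article does not say how the three result columns were derived. One plausible reading, assumed here, is that they are the means of the 1, 5, and 10 lowest final losses of an experiment. A minimal sketch under that assumption follows; `summarize_losses` is a hypothetical helper and the loss values are made up for illustration.

```python
def summarize_losses(losses, ks=(1, 5, 10)):
    """Return the mean of the k smallest losses for each k in ks."""
    if not losses:
        raise ValueError("at least one loss is required")
    ordered = sorted(losses)  # lower loss is better
    summary = {}
    for k in ks:
        best_k = ordered[:k]  # uses fewer than k values if the list is short
        summary[k] = sum(best_k) / len(best_k)
    return summary


# Hypothetical final losses of one experiment (illustrative values only).
example_losses = [0.45, 0.42, 0.47, 0.41, 0.43, 0.44,
                  0.46, 0.48, 0.40, 0.49, 0.50, 0.51]
summary = summarize_losses(example_losses)
print(summary)
```

Under this reading, a row whose three columns are identical means that at least ten Trials reached the same best loss.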
+
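+## RocksDB 'fillrandom' and 'readrandom' Benchmarks
+
+### Problem Description
+
+[DB_Bench](https://github.com/facebook/rocksdb/wiki/Benchmarking-tools) is the tool used to benchmark [RocksDB](https://rocksdb.org/) performance. It has many parameters to tune.
+
+`DB_Bench` performance depends on the machine configuration and the installation method. Here, `DB_Bench` was run on Linux, with RocksDB installed as a shared library.
+
+#### Machine Configuration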
+
+    RocksDB:    version 6.1
+    CPU:        6 * Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
+    CPUCache:   35840 KB
+    Keys:       16 bytes each
+    Values:     100 bytes each (50 bytes after compression)
+    Entries:    1000000
+    
+
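+#### Storage Performance
+
+**Latency**: each IO request takes some time to complete; the average of this time is the average latency. Several factors affect it, including network connection quality and hard disk IO performance.
+
+**IOPS**: **IO operations per second**, i.e. the number of *read or write operations* that can be completed in one second.
+
+**IO size**: **the size of each IO request**. Depending on the operating system and the application or service that needs disk access, a request reads or writes a certain amount of data at a time.
+
+**Throughput (in MB/s) = average IO size x IOPS**
+
+IOPS reflects online processing capability, so we use IOPS as the metric in the experiments.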
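As a worked instance of the throughput formula, the sketch below converts an IOPS figure into MB/s. It rests on two assumptions that the article does not state: each operation moves exactly one 16-byte key plus one 100-byte value (the sizes from the machine configuration), and 1 MB is 10^6 bytes. The IOPS value is the highest fillrandom result reported later in this file. `throughput_mb_per_s` is a hypothetical helper.

```python
def throughput_mb_per_s(avg_io_size_bytes, iops):
    """Throughput in MB/s = average IO size x IOPS (1 MB taken as 10**6 bytes)."""
    return avg_io_size_bytes * iops / 1_000_000


# Assumption: one operation moves one 16-byte key plus one 100-byte value.
AVG_IO_SIZE_BYTES = 16 + 100
best_fillrandom_iops = 491136  # highest fillrandom IOPS in the results (SMAC)

mb_s = throughput_mb_per_s(AVG_IO_SIZE_BYTES, best_fillrandom_iops)
print(f"{mb_s:.2f} MB/s")
```

Because the IO size is fixed for a given benchmark, ranking tuners by IOPS and ranking them by throughput give the same order.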
+
+### Search Space
+
+```json
+{
+  "max_background_compactions": {
+    "_type": "quniform",
+    "_value": [1, 256, 1]
+  },
+  "block_size": {
+    "_type": "quniform",
+    "_value": [1, 500000, 1]
+  },
+  "write_buffer_size": {
+    "_type": "quniform",
+    "_value": [1, 130000000, 1]
+  },
+  "max_write_buffer_number": {
+    "_type": "quniform",
+    "_value": [1, 128, 1]
+  },
+  "min_write_buffer_number_to_merge": {
+    "_type": "quniform",
+    "_value": [1, 32, 1]
+  },
+  "level0_file_num_compaction_trigger": {
+    "_type": "quniform",
+    "_value": [1, 256, 1]
+  },
+  "level0_slowdown_writes_trigger": {
+    "_type": "quniform",
+    "_value": [1, 1024, 1]
+  },
+  "level0_stop_writes_trigger": {
+    "_type": "quniform",
+    "_value": [1, 1024, 1]
+  },
+  "cache_size": {
+    "_type": "quniform",
+    "_value": [1, 30000000, 1]
+  },
+  "compaction_readahead_size": {
+    "_type": "quniform",
+    "_value": [1, 30000000, 1]
+  },
+  "new_table_reader_for_compaction_inputs": {
+    "_type": "randint",
+    "_value": [1]
+  }
+}
+```
+
+The search space is enormous (about 10^40 configurations), so the maximum number of Trials is set to 100 to limit resource usage.
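A rough count of the search space can be sketched as follows. It assumes that a `quniform` range `[low, high, 1]` contributes `high - low + 1` integer values and that the single `randint` parameter contributes one value; `count_values` is a hypothetical helper. Under these assumptions the product comes out a few orders of magnitude above 10^40, so the exact exponent depends on how the ranges are counted, but the conclusion that exhaustive search is infeasible does not.

```python
from math import log10

# (low, high) bounds of each quniform parameter; the step q is 1 everywhere.
quniform_ranges = {
    "max_background_compactions": (1, 256),
    "block_size": (1, 500000),
    "write_buffer_size": (1, 130000000),
    "max_write_buffer_number": (1, 128),
    "min_write_buffer_number_to_merge": (1, 32),
    "level0_file_num_compaction_trigger": (1, 256),
    "level0_slowdown_writes_trigger": (1, 1024),
    "level0_stop_writes_trigger": (1, 1024),
    "cache_size": (1, 30000000),
    "compaction_readahead_size": (1, 30000000),
}


def count_values(low, high, step=1):
    """Number of grid points in [low, high] with the given step (assumed counting rule)."""
    return (high - low) // step + 1


size = 1
for low, high in quniform_ranges.values():
    size *= count_values(low, high)

magnitude = int(log10(size))  # order of magnitude of the search space
print(f"about 10^{magnitude} configurations")
```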
+
+### Results
+
+#### fillrandom Benchmark
+
+| Model | Highest IOPS (repeat 1) | Highest IOPS (repeat 2) | Highest IOPS (repeat 3) |
+| --------- | --------------- | --------------- | --------------- |
+| Random    | 449901          | 427620          | 477174          |
+| Anneal    | 461896          | 467150          | 437528          |
+| Evolution | 436755          | 389956          | 389790          |
+| TPE       | 378346          | 482316          | 468989          |
+| SMAC      | 491067          | 490472          | **491136**      |
+| Metis     | 444920          | 457060          | 454438          |
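To compare tuners across the three repeats, it can help to look at the mean and the spread (max minus min) of each row. The sketch below does that with the numbers copied from the fillrandom table above; the choice of mean and spread as summary statistics is an assumption of this example, not something the article prescribes.

```python
from statistics import mean

# Highest IOPS per repeat, copied from the fillrandom table.
fillrandom_iops = {
    "Random": [449901, 427620, 477174],
    "Anneal": [461896, 467150, 437528],
    "Evolution": [436755, 389956, 389790],
    "TPE": [378346, 482316, 468989],
    "SMAC": [491067, 490472, 491136],
    "Metis": [444920, 457060, 454438],
}


def summarize(runs):
    """Mean and max-min spread of the repeated runs of one tuner."""
    return {"mean": mean(runs), "spread": max(runs) - min(runs)}


stats = {name: summarize(runs) for name, runs in fillrandom_iops.items()}
ranked = sorted(stats, key=lambda name: stats[name]["mean"], reverse=True)
for name in ranked:
    s = stats[name]
    print(f"{name:<10} mean={s['mean']:>10.0f} spread={s['spread']:>7}")
```

On these numbers SMAC has both the highest mean and the smallest spread, while TPE varies the most between repeats.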
+
+Figure:
+
+![](../../img/hpo_rocksdb_fillrandom.png)
+
+#### readrandom Benchmark
+
+| Model | Highest IOPS (repeat 1) | Highest IOPS (repeat 2) | Highest IOPS (repeat 3) |
+| --------- | --------------- | --------------- | --------------- |
+| Random    | 2276157         | 2285301         | 2275142         |
+| Anneal    | 2286330         | 2282229         | 2284012         |
+| Evolution | 2286524         | 2283673         | 2283558         |
+| TPE       | 2287366         | 2282865         | 2281891         |
+| SMAC      | 2270874         | 2284904         | 2282266         |
+| Metis     | **2287696**     | 2283496         | 2277701         |
+
+Figure:
+
+![](../../img/hpo_rocksdb_readrandom.png)
\ No newline at end of file
--- a/docs/zh_CN/CommunitySharings/NasComparision.md
+++ b/docs/zh_CN/CommunitySharings/NasComparision.md
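+# Comparison of Neural Architecture Search Algorithms
+
+*Anonymous author*
+
+Train and compare NAS (neural architecture search) models, including Autokeras, DARTS, ENAS, and NAO.
+
+Links to the source code: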
+
+- Autokeras: <https://github.com/jhfjhfj1/autokeras>
+
+- DARTS: <https://github.com/quark0/darts>
+
+- ENAS: <https://github.com/melodyguan/enas>
+
+- NAO: <https://github.com/renqianluo/NAO>
+
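+## Experiment Description
+
+To avoid algorithms merely overfitting to **CIFAR-10**, we also compared them on five other datasets: Fashion-MNIST, CIFAR-100, OUI-Adience-Age, ImageNet-10-1 (a subset of ImageNet), and ImageNet-10-2 (another subset of ImageNet). ImageNet-10-1 and ImageNet-10-2 were each built by sampling a subset of 10 different class labels from ImageNet.
+
+| Dataset | Training set size | Number of classes | Description |
+|:--------------------------------------------------------------------------------------- | ------- | ----- | ----------------------------------------------------------- |
+| [Fashion-MNIST](https://github.com/zalandoresearch/fashion-mnist) | 60,000 | 10 | T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot. |
+| [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) | 50,000 | 10 | Airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. |
+| [CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html) | 50,000 | 100 | Similar to CIFAR-10, but with 100 classes in total and 600 images per class. |
+| [OUI-Adience-Age](https://talhassner.github.io/home/projects/Adience/Adience-data.html) | 26,580 | 8 | 8 age-group classes (0-2, 4-6, 8-13, 15-20, 25-32, 38-43, 48-53, 60-). |
+| [ImageNet-10-1](http://www.image-net.org/) | 9,750 | 10 | Coffee mug, computer keyboard, dining table, wardrobe, lawn mower, microphone, swing, sewing machine, odometer, and gas pump. |
+| [ImageNet-10-2](http://www.image-net.org/) | 9,750 | 10 | Drum, banjo, whistle, grand piano, violin, organ, acoustic guitar, trombone, flute, and sax. |
+
+The fine-tuning method in the source code was left unchanged. To fit each task, we modified the parts of the source code that set the model's input image size and number of output classes.
+
+For every NAS method, the search time and the retraining time were both **two days**. All results are based on **three repeated experiments**. The evaluation machine has one Nvidia Tesla P100 GPU, 112 GB of memory, and a 2.60 GHz CPU (Intel E5-2690).
+
+NAO requires too many computing resources, so only NAO-WS, which provides a pipeline script, was used.
+
+For AutoKeras, version 0.2.18 was used, as it was the latest version when the experiments started.
+
+## NAS Results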
+
+| NAS             | AutoKeras (%) | ENAS (macro) (%) | ENAS (micro) (%) | DARTS (%) | NAO-WS (%) |
+| --------------- |:-------------:|:----------------:|:----------------:|:---------:|:----------:|
+| Fashion-MNIST   |     91.84     |      95.44       |      95.53       | **95.74** |   95.20    |
+| CIFAR-10        |     75.78     |      95.68       |    **96.16**     |   94.23   |   95.64    |
+| CIFAR-100       |     43.61     |      78.13       |      78.84       | **79.74** |   75.75    |
+| OUI-Adience-Age |     63.20     |    **80.34**     |      78.55       |   76.83   |   72.96    |
+| ImageNet-10-1   |     61.80     |      77.07       |      79.80       | **80.48** |   77.20    |
+| ImageNet-10-2   |     37.20     |      58.13       |      56.47       |   60.53   | **61.20**  |
+
+Unfortunately, we could not reproduce all the results reported in the papers.
+
+Best or average results reported in the papers:
+
+| NAS       | AutoKeras(%) | ENAS (macro) (%) | ENAS (micro) (%) |   DARTS (%)    | NAO-WS (%)  |
+| --------- | ------------ |:----------------:|:----------------:|:--------------:|:-----------:|
+| CIFAR-10  | 88.56(best)  |   96.13(best)    |   97.11(best)    | 97.17(average) | 96.47(best) |
+
+AutoKeras performs relatively poorly on all datasets, due to the random factors in its algorithm.
+
+For ENAS, ENAS (macro) performs well on OUI-Adience-Age, and ENAS (micro) performs well on CIFAR-10.
+
+DARTS achieves good results on some datasets but shows relatively large variance on others. The difference among its three runs reaches 5.37% (absolute) on OUI-Adience-Age and 4.36% (absolute) on ImageNet-10-1.
+
+NAO-WS performs well on ImageNet-10-2 but very poorly on OUI-Adience-Age.
+
+## References
+
+1. Jin, Haifeng, Qingquan Song, and Xia Hu. "Efficient neural architecture search with network morphism." *arXiv preprint arXiv:1806.10282* (2018).
+
+2. Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "Darts: Differentiable architecture search." arXiv preprint arXiv:1806.09055 (2018).
+
+3. Pham, Hieu, et al. "Efficient Neural Architecture Search via Parameters Sharing." international conference on machine learning (2018): 4092-4101.
+
+4. Luo, Renqian, et al. "Neural Architecture Optimization." neural information processing systems (2018): 7827-7838.
\ No newline at end of file
--- a/docs/zh_CN/GeneralNasInterfaces.md
+++ b/docs/zh_CN/GeneralNasInterfaces.md
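-# General Programming Interface for Neural Architecture Search
+# General Programming Interface for Neural Architecture Search (Beta)
+
+*This is an experimental feature. Currently only the general NAS programming interface is implemented. Weight sharing and one-shot NAS based on this interface will be supported in upcoming releases.*

 Automatic neural architecture search (NAS) plays an increasingly important role in finding better models. Recent research has proven the feasibility of automatic NAS and has found models that outperform manually designed and tuned ones. Representative algorithms include [NASNet](https://arxiv.org/abs/1707.07012), [ENAS](https://arxiv.org/abs/1802.03268), [DARTS](https://arxiv.org/abs/1806.09055), [Network Morphism](https://arxiv.org/abs/1806.10282), and [Evolution](https://arxiv.org/abs/1703.01041). New algorithms keep emerging. However, implementing these algorithms takes great effort, and it is hard to reuse the code base of one algorithm to implement another.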

@@ -24,6 +26,8 @@

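 This example can be annotated in two ways. For the example above, the input functions take the form `[[], [out3]]`. For the example below, the inputs take the form `[[out3], []]`.

+**Debugging**: the `nnictl trial codegen` command helps debug the NAS programming interface. If Trial `XXX` in Experiment `YYY` fails, run `nnictl trial codegen YYY --trial_id XXX` to generate executable code for that Trial in the current directory. Running this code lets you debug why the Trial failed without NNI. The command compiles the Trial code and replaces NNI's NAS code with the layers and inputs that were actually chosen.
+
 ### Example: choose input connections for a layer

 Designing the connections between layers is critical for building a high-performance model. With this interface, you can choose which connections a layer takes as inputs, picking several from a set of candidates. The example below chooses two inputs for the `concat` function from three candidate inputs. `concat` also always takes the output of its previous layer through `fixed_inputs`.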
@@ -91,9 +95,9 @@ NNI 的 Annotation 编译器会将 Trial 代码转换为可以接收架构选择

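 The figure above shows how Trial code runs on NNI. `nnictl` processes the Trial code and generates a search space file and compiled Trial code. The former is fed to the Tuner, and the latter is used when the Trial runs.

-[**TODO**] A simple example of NAS on NNI.
+[A simple example of using NAS](https://github.com/microsoft/nni/tree/v0.8/examples/trials/mnist-nas).

-### Weight sharing
+### [**TODO**] Weight sharing

 Sharing weights among chosen architectures (i.e. Trials) can speed up model search. For example, properly inheriting the weights of completed Trials can speed up the convergence of new Trials. One-shot NAS (e.g. ENAS, DARTS) is more aggressive: the training of different architectures (i.e. subgraphs) shares the same copy of the weights in the full graph.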

@@ -101,9 +105,9 @@ NNI 的 Annotation 编译器会将 Trial 代码转换为可以接收架构选择

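 Weight sharing (transfer) plays a key role in speeding up NAS, and finding efficient ways of sharing weights is still a hot research topic. NNI provides a key-value store for storing and loading weights. Tuners and Trials use a KV client library to access the store.

-[**TODO**] Example of weight sharing on NNI.
+Example of weight sharing on NNI.

-### Support for One-Shot NAS
+### [**TODO**] Support for One-Shot NAS

 One-shot NAS is a popular approach to finding a good neural architecture within a limited time and resource budget. In essence, it builds a full graph based on the search space and uses gradient descent to eventually find the best subgraph. There are different training approaches, such as [training subgraphs (per mini-batch)](https://arxiv.org/abs/1802.03268), [training full graph through dropout](http://proceedings.mlr.press/v80/bender18a/bender18a.pdf), and [training with architecture weights (regularization)](https://arxiv.org/abs/1806.09055). Here we focus on the first approach, training subgraphs (ENAS).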

@@ -113,17 +117,17 @@ One-Shot NAS 是流行的，能在有限的时间和资源预算内找到较好

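 The design of one-shot NAS is shown in the figure above. One-shot NAS usually has only one Trial job with the full graph. NNI supports running multiple such Trial jobs, each running independently. Because one-shot NAS is not stable, running multiple instances helps find a better model. Moreover, Trial jobs can also synchronize weights at runtime (i.e. there is only one copy of the weights, as in asynchronous parameter-server mode), which may speed up convergence.

-[**TODO**] Example of one-shot NAS on NNI.
+Example of one-shot NAS.

-## General tuning algorithms for NAS
+## [**TODO**] General tuning algorithms for NAS

 Like hyperparameter tuning, NAS also needs relatively general algorithms. The general programming interface makes this easier. Contributors have provided an RL-based tuning algorithm for NAS. We expect community efforts to design and implement better NAS tuning algorithms.

-[**TODO**] More tuning algorithms for NAS.
+General tuning algorithms for NAS.

-## Export the best neural architecture and code
+## [**TODO**] Export the best neural architecture and code

-[**TODO**] After the Experiment is done, you can run `nnictl experiment export --code` to export the Trial code with the best neural architecture.
+After the Experiment is done, you can run `nnictl experiment export --code` to export the Trial code with the best neural architecture.

 ## Conclusion and future work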


--- a/docs/zh_CN/NniOnWindows.md
+++ b/docs/zh_CN/NniOnWindows.md
@@ -4,7 +4,7 @@

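 ## **Installation on Windows**

-See [Installation](Installation.md#installation-on-windows) for details.
+See the [installation documentation](Installation.md) for details.

 After installation, use the **config_windows.yml** configuration to start an Experiment for validation.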


--- a/docs/zh_CN/PAIMode.md
+++ b/docs/zh_CN/PAIMode.md
-# **Run an Experiment on OpenPAI**
-
-NNI supports running an Experiment on [OpenPAI](https://github.com/Microsoft/pai) (pai for short); this is called pai mode. Before using NNI's pai mode, you need an account on an [OpenPAI](https://github.com/Microsoft/pai) cluster. If you do not have one, see [here](https://github.com/Microsoft/pai#how-to-deploy) to deploy a cluster. In pai mode, Trial programs run in containers created by Docker.
-
-## Set Up the Environment
-
-Install NNI by following the [guide](QuickStart.md).
-
-## Run an Experiment
-
-Take `examples/trials/mnist-annotation` as an example. The NNI YAML config file is as follows:
-
-```yaml
-authorName: your_name
-experimentName: auto_mnist
-# number of Trials to run concurrently
-trialConcurrency: 2
-# maximum duration of the Experiment
-maxExecDuration: 3h
-# empty means run forever
-maxTrialNum: 100
-# choice: local, remote, pai
-trainingServicePlatform: pai
-# choice: true, false
-useAnnotation: true
-tuner:
-  builtinTunerName: TPE
-  classArgs:
-    optimize_mode: maximize
-trial:
-  command: python3 mnist.py
-  codeDir: ~/nni/examples/trials/mnist-annotation
-  gpuNum: 0
-  cpuNum: 1
-  memoryMB: 8196
-  image: openpai/pai.example.tensorflow
-  dataDir: hdfs://10.1.1.1:9000/nni
-  outputDir: hdfs://10.1.1.1:9000/nni
-# configuration for accessing the OpenPAI cluster
-paiConfig:
-  userName: your_pai_nni_user
-  passWord: your_pai_password
-  host: 10.1.1.1
-```
-
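-Note: to run in pai mode, set `trainingServicePlatform: pai` in the YAML file.
-
-Compared with local mode and [remote machine mode](RemoteMachineMode.md), the Trial configuration in pai mode has additional keys:
-
-* cpuNum
-    * Required. The CPU requirement of the Trial program; must be a positive number.
-* memoryMB
-    * Required. The memory requirement of the Trial program; must be a positive number.
-* image
-    * Required. In pai mode, Trial programs are scheduled by OpenPAI to run in [Docker containers](https://www.docker.com/). This key specifies the Docker image used for the Trial's container.
-    * A prebuilt NNI Docker image, [msranni/nni](https://hub.docker.com/r/msranni/nni/), is available on [Docker Hub](https://hub.docker.com/). It contains all the Python packages, Node modules, and JavaScript needed to start an NNI Experiment. The Dockerfile used to build this image is [here](https://github.com/Microsoft/nni/tree/master/deployment/docker/Dockerfile). You can use this image directly or build your own based on it.
-* dataDir
-    * Optional. The HDFS data directory from which the Trial downloads data. The format should be hdfs://{your HDFS host}:9000/{your data directory}
-* outputDir
-    * Optional. The HDFS output directory of the Trial. After a Trial finishes (successfully or not), its stdout and stderr are automatically copied by NNI to this directory. The format should be hdfs://{your HDFS host}:9000/{your output directory}
-* virtualCluster
-    * Optional. Sets OpenPAI's virtualCluster, i.e. the virtual cluster. If omitted, the default virtual cluster is used.
-* shmMB
-    * Optional. Sets OpenPAI's shmMB, i.e. the shared memory of the Docker container.
-
-After completing and saving the NNI Experiment config file (e.g. as exp_pai.yml), run the following command: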
-
-    nnictl create --config exp_pai.yml
-    
-
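-to start the Experiment in pai mode. NNI creates an OpenPAI job for each Trial, with a job name of the form `nni_exp_{experiment_id}_trial_{trial_id}`. You can see the jobs created by NNI on the OpenPAI cluster's web portal, for example: ![](../img/nni_pai_joblist.jpg)
-
-Note: in pai mode, NNIManager starts a RESTful server that listens on the port of the NNI web server plus 1. For example, if the web port is `8080`, the RESTful server listens on `8081` to receive metrics from Trial jobs running in Kubernetes. You therefore need to allow incoming TCP traffic on port `8081` in your firewall.
-
-Once a Trial job completes, you can view its information on the overview page of the NNI web UI (e.g. http://localhost:8080/oview).
-
-Expand a Trial's information on the Trial list page and click the logPath link shown below: ![](../img/nni_webui_joblist.jpg)
-
-This opens the HDFS web UI, where you can browse the Trial's output files: ![](../img/nni_trial_hdfs_output.jpg)
-
-The output directory contains three files: stderr, stdout, and trial.log.
-
-To save other Trial outputs, such as model files, to HDFS, write them to the directory given by `NNI_OUTPUT_DIR` in your Trial code. The NNI SDK copies the files in `NNI_OUTPUT_DIR` from the Trial's container to HDFS.
-
-If you run into any problems when using pai mode, please create an issue on [NNI GitHub](https://github.com/Microsoft/nni).
-
-## Version Check
-
-Since version 0.6, NNI supports version checking. It ensures that NNIManager and trialKeeper have matching versions and avoids compatibility errors.  
-Check policy:
-
-1. NNIManager before 0.6 can run with any version of trialKeeper; trialKeeper is backward compatible.
-2. Since NNIManager 0.6, its version must match the trialKeeper version. For example, if NNIManager is version 0.6, trialKeeper must also be version 0.6.
-3. Note that only the first two digits of the version are checked. For example, NNIManager 0.6.1 works with trialKeeper 0.6 or 0.6.2, but not with trialKeeper 0.5.1 or 0.7.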
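The three rules of the version check policy can be sketched as a small comparison on the first two version components. This is only an illustration of the policy as worded above: `major_minor` and `versions_compatible` are hypothetical names, and NNI's actual implementation may differ.

```python
def major_minor(version):
    """Return the first two components of a version string, e.g. '0.6.1' -> (0, 6)."""
    parts = version.lstrip("v").split(".")
    return tuple(int(part) for part in parts[:2])


def versions_compatible(manager_version, keeper_version):
    """Apply the check policy to an NNIManager / trialKeeper version pair."""
    manager = major_minor(manager_version)
    if manager < (0, 6):
        # Rule 1: an NNIManager older than 0.6 runs with any trialKeeper.
        return True
    # Rules 2 and 3: from 0.6 on, only the first two digits must match.
    return manager == major_minor(keeper_version)


for keeper in ("0.6", "0.6.2", "0.5.1", "0.7"):
    print("0.6.1", keeper, versions_compatible("0.6.1", keeper))
```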
-
-If the Experiment cannot run and you are not sure whether a version mismatch is the cause, check the web UI for a related error message.  
-![](../img/version_check.png)
\ No newline at end of file
--- a/docs/zh_CN/PaiMode.md
+++ b/docs/zh_CN/PaiMode.md
@@ -33,7 +33,7 @@ trial:
  gpuNum: 0
  cpuNum: 1
  memoryMB: 8196
-  image: openpai/pai.example.tensorflow
+  image: msranni/nni:latest
  dataDir: hdfs://10.1.1.1:9000/nni
  outputDir: hdfs://10.1.1.1:9000/nni
 # 配置访问的 OpenPAI 集群

--- a/docs/zh_CN/Release.md
+++ b/docs/zh_CN/Release.md
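 # Change Log

+# Release 0.8 - 6/4/2019
+
+## Major Features
+
+* Support NNI on Windows for OpenPAI and remote mode
+    * NNI can run in OpenPAI mode on Windows
+    * NNI can run in remote mode on Windows
+* Advanced GPU features
+    * In local and remote mode, multiple Trials can run on the same GPU.
+    * Trials can also run on a GPU that is already running non-NNI jobs
+* Support the Kubeflow v1beta2 operator
+    * Support Kubeflow TFJob/PyTorchJob v1beta2
+* [General NAS programming interface](./GeneralNasInterfaces.md)
+    * Provides a NAS programming interface that makes it easy to express the neural architecture search space through NNI Annotation
+    * Provides the new command `nnictl trial codegen` for debugging the NAS code generation
+    * Provides a tutorial on the NAS programming interface, an example of NAS on MNIST, and a customizable random Tuner for NAS
+* Support restoring the state of the Tuner and Advisor when resuming an Experiment
+    * When an Experiment is resumed, the Tuner and Advisor import the data of finished Trials.
+* Web UI
+    * Improved design for copying Trial parameters
+    * Support the 'randint' type in the hyper-parameter graph
+    * Use ComponentUpdate to avoid unnecessary refreshes
+
+## Bug Fixes and Other Changes
+
+* Fix the inconsistent command-line style of `nnictl update`
+* SMAC Tuner supports importing data
+* Support the Experiment status transition from ERROR back to RUNNING
+* Fix table bugs
+* Optimize the nested search space
+* Optimize the 'randint' type and support a lower bound
+* [Comparison of hyperparameter tuning algorithms](./CommunitySharings/HpoComparision.md)
+* [Comparison of NAS algorithms](./CommunitySharings/NasComparision.md)
+* [Practice on Recommenders](./CommunitySharings/NniPracticeSharing/RecommendersSvd.md)
+
 ## Release 0.7 - 4/29/2018

 ### Major Features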

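 * [Support NNI on Windows](./WindowsLocalMode.md)
-  * NNI can run in local mode on Windows
+    * NNI can run in local mode on Windows
 * [Support the new Advisor: BOHB](./BohbAdvisor.md)
-  * Support the new BOHB Advisor, a robust and efficient hyperparameter tuning algorithm that combines the advantages of Bayesian optimization and Hyperband
+    * Support the new BOHB Advisor, a robust and efficient hyperparameter tuning algorithm that combines the advantages of Bayesian optimization and Hyperband
 * [Support importing and exporting Experiment data through nnictl](./Nnictl.md#experiment)
-  * Generate an analysis report after the Experiment has finished
-  * Support importing previous tuning data into the Tuner and Advisor
+    * Generate an analysis report after the Experiment has finished
+    * Support importing previous tuning data into the Tuner and Advisor
 * [Designate GPUs for NNI Trial jobs](./ExperimentConfig.md#localConfig)
-  * Specify GPUs for Trial jobs through the gpuIndices configuration. If gpuIndices is set in the Experiment config file, only the specified GPUs are used for NNI Trial jobs.
+    * Specify GPUs for Trial jobs through the gpuIndices configuration. If gpuIndices is set in the Experiment config file, only the specified GPUs are used for NNI Trial jobs.
 * Web UI improvements
-  * Show metrics in decimal format on the Web UI
-  * Add hints about multi-phase training
-  * Support copying hyperparameters as a Python dict
-  * Support passing the data of early-stopped Trials to the Tuner.
+    * Show metrics in decimal format on the Web UI
+    * Add hints about multi-phase training
+    * Support copying hyperparameters as a Python dict
+    * Support passing the data of early-stopped Trials to the Tuner.
 * More friendly error messages for nnictl
-  * Provide more meaningful error messages for YAML file format errors
+    * Provide more meaningful error messages for YAML file format errors

 ### Bug Fixes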

@@ -31,12 +66,12 @@

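 ### Major Features

-* [Version check](https://github.com/Microsoft/nni/blob/master/docs/zh_CN/PaiMode.md#version-check)
-  * Check whether the versions of nniManager and trialKeeper match
+* [Version check](https://github.com/Microsoft/nni/blob/master/docs/en_US/PaiMode.md#version-check)
+    * Check whether the versions of nniManager and trialKeeper match
 * [Early-stopped jobs can also report final metrics](https://github.com/Microsoft/nni/issues/776)
-  * If includeIntermediateResults is true, the last intermediate result of a Trial that was stopped early by the Assessor is sent to the Tuner as its final result. The default value of includeIntermediateResults is false.
+    * If includeIntermediateResults is true, the last intermediate result of a Trial that was stopped early by the Assessor is sent to the Tuner as its final result. The default value of includeIntermediateResults is false.
 * [Separate Tuner/Assessor](https://github.com/Microsoft/nni/issues/841)
-  * Add two pipes to separate the messages of the Tuner and the Assessor
+    * Add two pipes to separate the messages of the Tuner and the Assessor
 * Make the log collection feature configurable
 * Add an intermediate result view for all Trials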

@@ -101,15 +136,15 @@
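 #### Improved Training Platforms

 * [FrameworkController training platform](./FrameworkControllerMode.md): support running Experiments with FrameworkController on Kubernetes.
-  * FrameworkController is a very general controller on Kubernetes that can run distributed jobs for various machine learning frameworks, such as TensorFlow, PyTorch, and MXNet.
-  * NNI defines a unified and simple specification for jobs.
-  * An MNIST example showing how to use FrameworkController.
+    * FrameworkController is a very general controller on Kubernetes that can run distributed jobs for various machine learning frameworks, such as TensorFlow, PyTorch, and MXNet.
+    * NNI defines a unified and simple specification for jobs.
+    * An MNIST example showing how to use FrameworkController.

 #### Improved User Experience

 * Better log support for OpenPAI, Kubeflow, and FrameworkController modes.
-  * The improved logging architecture sends the Trial's stdout/stderr to the NNI manager via HTTP POST. The NNI manager stores the Trial's stdout/stderr messages in a local log file.
-  * Show links to Trial logs on the Web UI.
+    * The improved logging architecture sends the Trial's stdout/stderr to the NNI manager via HTTP POST. The NNI manager stores the Trial's stdout/stderr messages in a local log file.
+    * Show links to Trial logs on the Web UI.
 * Support displaying the final result as key-value pairs.

 ## Release 0.4.1 - 12/14/2018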
@@ -150,19 +185,19 @@
 ### Major Features

 * [Kubeflow training service](./KubeflowMode.md)
-  * Support tf-operator
-  * [Distributed Trial example](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-distributed/dist_mnist.py) on Kubeflow
+    * Support tf-operator
+    * [Distributed Trial example](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-distributed/dist_mnist.py) on Kubeflow
 * [Grid search Tuner](GridsearchTuner.md)
 * [Hyperband Tuner](HyperbandAdvisor.md)
 * Support running NNI Experiments on Mac
 * Web UI
-  * Support the hyperband Tuner
-  * Remove the tensorboard button
-  * Show Experiment error messages
-  * Show line numbers in the search space and Trial configuration
-  * Support searching by a specified Trial id
-  * Show the Trial's hdfsLogPath
-  * Download Experiment parameters
+    * Support the hyperband Tuner
+    * Remove the tensorboard button
+    * Show Experiment error messages
+    * Show line numbers in the search space and Trial configuration
+    * Support searching by a specified Trial id
+    * Show the Trial's hdfsLogPath
+    * Download Experiment parameters

 ### Others

@@ -170,22 +205,22 @@
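 * Update the Docker file to add the pytorch library
 * Refactor the 'nnictl stop' process to send SIGTERM to the NNI manager process instead of calling the stop RESTful API.
 * Bug fixes for the OpenPAI training service
-  * Support an IP configuration (nniManagerIp) for the NNI manager in the OpenPAI cluster config file, to fix the issue where the user's machine has no eth0 device.
-  * The limit on the number of files in codeDir is changed to 1000, to prevent users from mistakenly specifying the root directory.
-  * Remove the useless 'metrics is empty' message from the stdout log of OpenPAI jobs. Only useful messages are printed when new metrics are recorded, to reduce confusion when users check OpenPAI Trial output.
-  * Add a timestamp at the start of the Trial keeper.
+    * Support an IP configuration (nniManagerIp) for the NNI manager in the OpenPAI cluster config file, to fix the issue where the user's machine has no eth0 device.
+    * The limit on the number of files in codeDir is changed to 1000, to prevent users from mistakenly specifying the root directory.
+    * Remove the useless 'metrics is empty' message from the stdout log of OpenPAI jobs. Only useful messages are printed when new metrics are recorded, to reduce confusion when users check OpenPAI Trial output.
+    * Add a timestamp at the start of the Trial keeper.

 ## Release 0.3.0 - 11/2/2018

 ### NNICTL New Features and Updates

 * Support running multiple Experiments simultaneously.
-  
-  Before v0.3, NNI supported running only one Experiment at a time. From this release on, users can run multiple Experiments simultaneously. Each Experiment needs a unique port; the first Experiment uses the default port, as in previous versions. Specify a unique port for every other Experiment:
-  
-  ```bash
-  nnictl create --port 8081 --config <config file path>
-  ```
+    
+    Before v0.3, NNI supported running only one Experiment at a time. From this release on, users can run multiple Experiments simultaneously. Each Experiment needs a unique port; the first Experiment uses the default port, as in previous versions. Specify a unique port for every other Experiment:
+    
+    ```bash
+    nnictl create --port 8081 --config <config file path>
+    ```

 * Support updating the maximum number of Trials. Run `nnictl update --help` for details, or see [NNICTL](Nnictl.md) for the full usage.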

@@ -194,18 +229,18 @@
 * <span style="color:red"><strong>Breaking change</strong></span>: nni.get_parameters() is renamed to nni.get_next_parameter(). Examples from earlier releases cannot run on v0.3; clone the NNI repository again to get the new examples. If you use NNI in your own code, update it accordingly.

 * New API **nni.get_sequence_id()**. Each Trial job is assigned a unique sequence number, which can be retrieved through the nni.get_sequence_id() API.
-  
-  ```bash
-  git clone -b v0.3 https://github.com/Microsoft/nni.git
-  ```
+    
+    ```bash
+    git clone -b v0.3 https://github.com/Microsoft/nni.git
+    ```

 * The **nni.report_final_result(result)** API supports more data types for the result parameter.
-  
-  Accepted types:
-  
-  * int
-  * float
-  * A dict containing a 'default' key, whose value must be an int or a float. The dict may contain any other key-value pairs.
+    
+    Accepted types:
+    
+    * int
+    * float
+    * A dict containing a 'default' key, whose value must be an int or a float. The dict may contain any other key-value pairs.
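The accepted result types listed above can be expressed as a small validity check. The sketch below is a hypothetical helper written for illustration; `is_valid_final_result` is not part of the NNI SDK, and rejecting `bool` values is a choice of this sketch rather than documented NNI behavior.

```python
def is_valid_final_result(result):
    """Return True if `result` is an int, a float, or a dict with a numeric 'default'."""

    def is_number(value):
        # bool is a subclass of int in Python; this sketch chooses to reject it.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    if is_number(result):
        return True
    if isinstance(result, dict):
        # Extra keys are allowed as long as 'default' holds a number.
        return "default" in result and is_number(result["default"])
    return False


print(is_valid_final_result(0.93))
print(is_valid_final_result({"default": 0.93, "loss": 0.2}))
print(is_valid_final_result({"loss": 0.2}))
```

A Trial could run such a check before calling `nni.report_final_result`, so that a malformed result fails locally instead of inside the Experiment.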

 ### New Tuner Support

@@ -214,10 +249,10 @@
 ### New Examples

 * Public NNI Docker image:
-  
-  ```bash
-  docker pull msranni/nni:latest
-  ```
+    
+    ```bash
+    docker pull msranni/nni:latest
+    ```

 * New Trial example: [NNI Sklearn example](https://github.com/Microsoft/nni/tree/master/examples/trials/sklearn)

@@ -234,14 +269,14 @@
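 ### Major Features

 * Support the [OpenPAI](https://github.com/Microsoft/pai) (aka pai) training service (see [here](./PaiMode.md) for how to submit NNI jobs in OpenPAI mode)
-  * Support the pai-mode training service. NNI Trials can be dispatched to run on an OpenPAI cluster
-  * NNI Trial output (including logs and model files) is copied to OpenPAI's HDFS.
+    * Support the pai-mode training service. NNI Trials can be dispatched to run on an OpenPAI cluster
+    * NNI Trial output (including logs and model files) is copied to OpenPAI's HDFS.
 * Support the [SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) Tuner (see [here](SmacTuner.md) for how to use the SMAC Tuner)
-  * [SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces random forests into SMBO in order to handle categorical parameters. NNI's SMAC support is a wrapper around [SMAC3](https://github.com/automl/SMAC3).
+    * [SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces random forests into SMBO in order to handle categorical parameters. NNI's SMAC support is a wrapper around [SMAC3](https://github.com/automl/SMAC3).
 * Support installing NNI in [conda](https://conda.io/docs/index.html) and Python virtual environments.
 * Others
-  * Update the ga squad example and related documentation
-  * User experience improvements and bug fixes
+    * Update the ga squad example and related documentation
+    * User experience improvements and bug fixes

 ### Known Issues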

@@ -254,20 +289,20 @@
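 ### Major Features

 * Installation and deployment
-  * Support pip install and source code install
-  * Support training on the local machine (including multiple GPU cards) and on multiple remote machines
+    * Support pip install and source code install
+    * Support training on the local machine (including multiple GPU cards) and on multiple remote machines
 * Tuners, Assessors, and Trials
-  * Supported AutoML algorithms include: hyperopt_tpe, hyperopt_annealing, hyperopt_random, and evolution_tuner.
-  * Supported Assessor (early stopping) algorithms include: medianstop.
-  * Provide a Python API for customizing Tuners and Assessors
-  * Provide a Python API for wrapping Trial code so that it can run on NNI
+    * Supported AutoML algorithms include: hyperopt_tpe, hyperopt_annealing, hyperopt_random, and evolution_tuner.
+    * Supported Assessor (early stopping) algorithms include: medianstop.
+    * Provide a Python API for customizing Tuners and Assessors
+    * Provide a Python API for wrapping Trial code so that it can run on NNI
 * Experiment
-  * Provide the command-line tool 'nnictl' for managing Experiments
-  * Provide a web UI for viewing and managing Experiments
+    * Provide the command-line tool 'nnictl' for managing Experiments
+    * Provide a web UI for viewing and managing Experiments
 * Continuous integration
-  * Support continuous integration with [travis-ci](https://github.com/travis-ci) on Ubuntu
+    * Support continuous integration with [travis-ci](https://github.com/travis-ci) on Ubuntu
 * Others
-  * Support simple GPU job scheduling
+    * Support simple GPU job scheduling

 ### Known Issues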


--- a/docs/zh_CN/Trials.md
+++ b/docs/zh_CN/Trials.md
@@ -149,7 +149,7 @@ export NNI_TRIAL_SEQ_ID=1
 export MULTI_PHASE=false
 export CUDA_VISIBLE_DEVICES=
 eval python3 mnist.py 2>/home/user_name/nni/experiments/$experiment_id$/trials/$trial_id$/stderr
-echo $? `date +%s000` >/home/user_name/nni/experiments/$experiment_id$/trials/$trial_id$/.nni/state
+echo $? `date +%s%3N` >/home/user_name/nni/experiments/$experiment_id$/trials/$trial_id$/.nni/state
 ```
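The changed line swaps `date +%s000` for `date +%s%3N`. With GNU `date`, `%s` is the number of seconds since the epoch and `%3N` is the first three digits of the nanosecond field, so the old form always ended in `000` while the new form records real milliseconds. The sketch below mirrors the two formats in Python; the function names are made up for illustration and are not part of NNI.

```python
import time


def millis_with_padded_zeros(now_ns):
    """Mimic `date +%s000`: whole seconds with three literal zeros appended."""
    return (now_ns // 1_000_000_000) * 1000


def millis_with_real_fraction(now_ns):
    """Mimic `date +%s%3N`: seconds followed by the truncated millisecond part."""
    return now_ns // 1_000_000


now_ns = time.time_ns()  # integer nanoseconds since the epoch (Python 3.7+)
print(millis_with_padded_zeros(now_ns), millis_with_real_fraction(now_ns))
```

Both values have the same number of digits, so anything that parses the state file as a millisecond timestamp keeps working; only the resolution changes.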

 ### Other modes
@@ -166,4 +166,4 @@ echo $? `date +%s000` >/home/user_name/nni/experiments/$experiment_id$/trials/$t
 * [Find the best optimizer for CIFAR-10 classification](Cifar10Examples.md)
 * [How to tune SciKit-learn parameters on NNI](SklearnExamples.md)
 * [Automatic model architecture search for reading comprehension.](SquadEvolutionExamples.md)
-* [How to tune GBDT on NNI](GbdtExample.md)
\ No newline at end of file
+* [How to tune GBDT on NNI](GbdtExample.md)
--- a/docs/zh_CN/advanced.rst
+++ b/docs/zh_CN/advanced.rst
@@ -3,4 +3,5 @@

 ..  toctree::
    Multi-phase<MultiPhase>
-    Advanced NAS<AdvancedNas>
\ No newline at end of file
+    Advanced NAS<AdvancedNas>
+    NAS programming interface<GeneralNasInterfaces>
\ No newline at end of file
--- a/docs/zh_CN/builtin_assessor.rst
+++ b/docs/zh_CN/builtin_assessor.rst
@@ -4,6 +4,6 @@
 ..  toctree::
    :maxdepth: 1

-    Introduction<BuiltinAssessors>
+    Introduction<BuiltinAssessor>
    Medianstop<MedianstopAssessor>
    Curvefitting<CurvefittingAssessor>
\ No newline at end of file
--- a/examples/experiment_config.yml
+++ b/examples/experiment_config.yml
-authorName: 
-experimentName: 
-trialConcurrency: 
-maxExecDuration: 
-maxTrialNum: 
+authorName:
+experimentName:
+trialConcurrency:
+maxExecDuration:
+maxTrialNum:
 #choice: local, remote
-trainingServicePlatform: 
-searchSpacePath: 
+trainingServicePlatform:
+searchSpacePath:
 #choice: true, false
-useAnnotation: 
+useAnnotation:
 tuner:
  #choice: TPE, Random, Anneal, Evolution
-  builtinTunerName: 
+  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
-    optimize_mode: 
+    optimize_mode:
 assessor:
  #choice: Medianstop
-  builtinAssessorName: 
+  builtinAssessorName:
  classArgs:
    #choice: maximize, minimize
-    optimize_mode: 
+    optimize_mode:
 trial:
-  command: 
-  codeDir: 
-  gpuNum: 
+  command:
+  codeDir:
+  gpuNum:
 #machineList can be empty if the platform is local
 machineList:
-  - ip: 
-    port: 
-    username: 
+  - ip:
+    port:
+    username:
    passwd: 
\ No newline at end of file
--- a/examples/trials/README.md
+++ b/examples/trials/README.md
 # How to write a Trial running on NNI?

-*A Trial receives the hyper-parameter/architecture configuration from the Tuner, and sends intermediate results to the Assessor and the final result to the Tuner.* 
+*A Trial receives the hyper-parameter/architecture configuration from the Tuner, and sends intermediate results to the Assessor and the final result to the Tuner.*

 So when users want to write a Trial running on NNI, they should:

@@ -140,9 +140,9 @@ def train(args, params):

    _, acc = model.evaluate(x_test, y_test, verbose=0)

-...    
+...
 ```
-**4) Send final result**  
+**4) Send final result**

 Use `nni.report_final_result` to send the final result to the Tuner. Please note line **15** in the following code.

@@ -162,7 +162,7 @@ def train(args, params):

    _, acc = model.evaluate(x_test, y_test, verbose=0)
    nni.report_final_result(acc)
-...    
+...
 ```

 Here is the complete example:

--- a/examples/trials/auto-gbdt/main.py
+++ b/examples/trials/auto-gbdt/main.py
@@ -3,9 +3,9 @@
 #
 # MIT License
 #
-# Permission is hereby granted, free of charge, 
+# Permission is hereby granted, free of charge,
 # to any person obtaining a copy of this software and associated
-# documentation files (the "Software"), 
+# documentation files (the "Software"),
 # to deal in the Software without restriction, including without limitation
 # the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
 # to permit persons to whom the Software is furnished to do so, subject to the following conditions:
@@ -88,7 +88,7 @@ def run(lgb_train, lgb_eval, params, X_test, y_test):
    # predict
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)

-    # eval 
+    # eval
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
    print('The rmse of prediction is:', rmse)


--- a/examples/trials/cifar10_pytorch/README.md
+++ b/examples/trials/cifar10_pytorch/README.md
@@ -2,5 +2,5 @@ This example requires pytorch.
 pytorch install package should be chosen based on python version and cuda version.

 Here is an example of the environment python==3.5 and cuda == 8.0, then using the following commands to install pytorch:
-python3 -m pip install http://download.pytorch.org/whl/cu80/torch-0.4.1-cp35-cp35m-linux_x86_64.whl 
+python3 -m pip install http://download.pytorch.org/whl/cu80/torch-0.4.1-cp35-cp35m-linux_x86_64.whl
 python3 -m pip install torchvision
\ No newline at end of file
--- a/examples/trials/cifar10_pytorch/main.py
+++ b/examples/trials/cifar10_pytorch/main.py
@@ -95,7 +95,7 @@ def prepare(args):
    if args['optimizer'] == 'Adam':
        optimizer = optim.Adam(net.parameters(), lr=args['lr'])
    if args['optimizer'] == 'Adamax':
-        optimizer = optim.Adam(net.parameters(), lr=args['lr'])       
+        optimizer = optim.Adam(net.parameters(), lr=args['lr'])


 # Training

--- a/examples/trials/ga_squad/README.md
+++ b/examples/trials/ga_squad/README.md
@@ -239,7 +239,7 @@ class CustomerTuner(Tuner):
            indiv.mutation()
            graph = indiv.config
            temp =  json.loads(graph_dumps(graph))
-    
+
    # ......
 ```


--- a/examples/trials/kaggle-tgs-salt/README.md
+++ b/examples/trials/kaggle-tgs-salt/README.md
 ## 33rd place solution code for Kaggle [TGS Salt Identification Chanllenge](https://www.kaggle.com/c/tgs-salt-identification-challenge)

-This example shows how to enable AutoML for competition code by running it on NNI without any code change. 
+This example shows how to enable AutoML for competition code by running it on NNI without any code change.
 To run this code on NNI, firstly you need to run it standalone, then configure the config.yml and:
 ```
 nnictl create --config config.yml
@@ -18,7 +18,7 @@ Stage 1:

 Train fold 0-3 for 100 epochs, for each fold, train 3 models:
 ```
-python3 train.py --ifolds 0 --epochs 100 --model_name UNetResNetV4 
+python3 train.py --ifolds 0 --epochs 100 --model_name UNetResNetV4
 python3 train.py --ifolds 0 --epochs 100 --model_name UNetResNetV5 --layers 50
 python3 train.py --ifolds 0 --epochs 100 --model_name UNetResNetV6
 ```
@@ -28,7 +28,7 @@ Stage 2:
 Fine tune stage 1 models for 300 epochs with cosine annealing lr scheduler:

 ```
-python3 train.py --ifolds 0 --epochs 300 --lrs cosine --lr 0.001 --min_lr 0.0001 --model_name UNetResNetV4 
+python3 train.py --ifolds 0 --epochs 300 --lrs cosine --lr 0.001 --min_lr 0.0001 --model_name UNetResNetV4
 ```

 Stage 3:

--- a/examples/trials/kaggle-tgs-salt/augmentation.py
+++ b/examples/trials/kaggle-tgs-salt/augmentation.py
@@ -165,7 +165,7 @@ def test_transform():
        RandomHFlipWithMask(),
        RandomVFlipWithMask(),
        RandomRotateWithMask([0, 90, 180, 270]),
-        #RandomRotateWithMask(15), 
+        #RandomRotateWithMask(15),
        RandomResizedCropWithMask(768, scale=(0.81, 1))
    ])


--- a/examples/trials/kaggle-tgs-salt/focal_loss.py
+++ b/examples/trials/kaggle-tgs-salt/focal_loss.py
@@ -33,7 +33,7 @@ class FocalLoss2d(nn.Module):

    def forward(self, logit, target, class_weight=None, type='sigmoid'):
        target = target.view(-1, 1).long()
-        
+
        if type=='sigmoid':
            if class_weight is None:
                class_weight = [1]*2 #[0.5, 0.5]

--- a/examples/trials/kaggle-tgs-salt/loader.py
+++ b/examples/trials/kaggle-tgs-salt/loader.py
@@ -40,11 +40,11 @@ class ImageDataset(data.Dataset):

        self.train_mode = train_mode
        self.meta = meta
-    
+
        self.img_ids = meta[ID_COLUMN].values
        self.salt_exists = meta['salt_exists'].values
        self.is_train = meta['is_train'].values
-        
+
        if self.train_mode:
            self.mask_filenames = meta[Y_COLUMN].values

@@ -207,7 +207,7 @@ def get_train_loaders(ifold, batch_size=8, dev_mode=False, pad_mode='edge', meta

    val_set = ImageDataset(True, val_meta,
                            augment_with_target=img_mask_aug_val,
-                            image_augment=None, 
+                            image_augment=None,
                            image_transform=get_image_transform(pad_mode),
                            mask_transform=get_mask_transform(pad_mode))
    val_loader = data.DataLoader(val_set, batch_size=batch_size, shuffle=False, num_workers=4, collate_fn=val_set.collate_fn)
@@ -221,7 +221,7 @@ def get_test_loader(batch_size=16, index=0, dev_mode=False, pad_mode='edge'):
    if dev_mode:
        test_meta = test_meta.iloc[:10]
    test_set = ImageDataset(False, test_meta,
-                            image_augment=None if pad_mode == 'resize' else transforms.Pad((13,13,14,14), padding_mode=pad_mode), 
+                            image_augment=None if pad_mode == 'resize' else transforms.Pad((13,13,14,14), padding_mode=pad_mode),
                            image_transform=get_tta_transforms(index, pad_mode))
    test_loader = data.DataLoader(test_set, batch_size=batch_size, shuffle=False, num_workers=4, collate_fn=test_set.collate_fn, drop_last=False)
    test_loader.num = len(test_set)
@@ -236,13 +236,13 @@ def get_depth_tensor(pad_mode):

    if depth_channel_tensor is not None:
        return depth_channel_tensor
-    
+
    depth_tensor = None

    if pad_mode == 'resize':
        depth_tensor = np.zeros((H, W))
        for row, const in enumerate(np.linspace(0, 1, H)):
-            depth_tensor[row, :] = const 
+            depth_tensor[row, :] = const
    else:
        depth_tensor = np.zeros((ORIG_H, ORIG_W))
        for row, const in enumerate(np.linspace(0, 1, ORIG_H)):