Unverified Commit 755e313f authored by chicm-ms, committed by GitHub

Merge pull request #2664 from microsoft/v1.7

V1.7 merge back to master
parents 51aebf18 a38df504
...@@ -30,7 +30,7 @@ jobs:
python3 -m pip install torch==1.5.0+cpu torchvision==0.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html --user
python3 -m pip install tensorflow==2.2.0 --user
python3 -m pip install keras==2.4.2 --user
python3 -m pip install gym onnx peewee thop --user
python3 -m pip install sphinx==1.8.3 sphinx-argparse==0.2.5 sphinx-markdown-tables==0.0.9 sphinx-rtd-theme==0.4.2 sphinxcontrib-websupport==1.1.0 recommonmark==0.5.0 nbsphinx --user
sudo apt-get install swig -y
nnictl package install --name=SMAC
...@@ -59,6 +59,7 @@ jobs:
python3 -m pip install --upgrade pip setuptools --user
python3 -m pip install pylint==2.3.1 astroid==2.2.5 --user
python3 -m pip install coverage --user
python3 -m pip install thop --user
echo "##vso[task.setvariable variable=PATH]${HOME}/.local/bin:${PATH}"
displayName: 'Install python tools'
- script: |
......
...@@ -29,6 +29,11 @@ RUN DEBIAN_FRONTEND=noninteractive && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

#
# make python point to python3
#
RUN cp /usr/bin/python3 /usr/bin/python

#
# update pip
#
...@@ -69,6 +74,13 @@ RUN python3 -m pip --no-cache-dir install pandas==0.23.4 lightgbm==2.2.2
#
RUN python3 -m pip --no-cache-dir install nni

#
# install AML packages
#
RUN python3 -m pip --no-cache-dir install azureml
RUN python3 -m pip --no-cache-dir install azureml-sdk

ENV PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/root/.local/bin:/usr/bin:/bin:/sbin
WORKDIR /root
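As a quick sanity check after these additions, one might build and probe the image locally; the `nni:local` tag below is hypothetical (NNI also publishes a prebuilt image as `msranni/nni`):

```bash
# Build the image and verify the python alias and the NNI CLI (hypothetical local tag)
docker build -t nni:local .
docker run --rm nni:local python --version   # should report Python 3.x via the copied binary
docker run --rm nni:local nnictl --version   # confirms the nni package is installed
```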
...@@ -129,5 +129,6 @@ from nni.compression.torch.utils.counter import count_flops_params
# Given input size (1, 1, 28, 28)
flops, params = count_flops_params(model, (1, 1, 28, 28))
# Format output size to M (i.e., 10^6)
print(f'FLOPs: {flops/1e6:.3f}M, Params: {params/1e6:.3f}M')
```
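For context, a self-contained sketch of the counter's usage might look like the following; the toy model is illustrative only, and `count_flops_params` is the function shown in the snippet above:

```python
import torch.nn as nn
from nni.compression.torch.utils.counter import count_flops_params

# A toy single-channel 28x28 classifier, just to have something to measure.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 28 * 28, 10),
)

flops, params = count_flops_params(model, (1, 1, 28, 28))
print(f'FLOPs: {flops / 1e6:.3f}M, Params: {params / 1e6:.3f}M')
```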
...@@ -7,6 +7,9 @@
Example Usages <BenchmarksExample>
```

## Introduction

To improve the reproducibility of NAS algorithms and to reduce computing resource requirements, researchers have proposed a series of NAS benchmarks such as [NAS-Bench-101](https://arxiv.org/abs/1902.09635), [NAS-Bench-201](https://arxiv.org/abs/2001.00326), [NDS](https://arxiv.org/abs/1905.13214), etc. NNI provides a query interface for users to acquire these benchmarks. Within just a few lines of code, researchers are able to evaluate their NAS algorithms easily and fairly by utilizing these benchmarks.
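As a taste of the query interface, a sketch along the lines of the example notebook might look like this (assuming the databases are prepared as described below; the function name and the edge-to-operator architecture encoding follow NNI's NAS-Bench-201 documentation):

```python
import pprint
from nni.nas.benchmarks.nasbench201 import query_nb201_trial_stats

# A NAS-Bench-201 cell, encoded as edge -> operator.
arch = {
    '0_1': 'avg_pool_3x3', '0_2': 'conv_1x1', '1_2': 'skip_connect',
    '0_3': 'conv_1x1', '1_3': 'skip_connect', '2_3': 'skip_connect',
}
# Iterate over all recorded trials of this architecture (200 epochs, CIFAR-100).
for t in query_nb201_trial_stats(arch, 200, 'cifar100'):
    pprint.pprint(t)
```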
## Prerequisites

* Please prepare a folder to hold all the benchmark databases. By default, it can be found at `${HOME}/.nni/nasbenchmark`. You can place it anywhere you like, and specify it in `NASBENCHMARK_DIR` before importing NNI.
...@@ -14,33 +17,17 @@
## Data Preparation

To avoid storage and legality issues, we do not provide any prepared databases. Please follow the steps below.

1. Clone NNI to your machine and enter the `examples/nas/benchmarks` directory.

```
git clone -b ${NNI_VERSION} https://github.com/microsoft/nni
cd nni/examples/nas/benchmarks
```

Replace `${NNI_VERSION}` with a released version name or branch name, e.g., `v1.7`.

2. Install dependencies via `pip3 install -r xxx.requirements.txt`. `xxx` can be `nasbench101`, `nasbench201` or `nds`.
3. Generate the database via `./xxx.sh`. The directory that stores the benchmark files can be configured with the `NASBENCHMARK_DIR` environment variable, which defaults to `~/.nni/nasbenchmark`. Note that the NAS-Bench-201 dataset will be downloaded from Google Drive.

Please make sure there is at least 10GB of free disk space, and note that the conversion process can take several hours to complete.
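Putting the steps together for NAS-Bench-101, a typical session (using the `v1.7` branch as an example) would be:

```bash
git clone -b v1.7 https://github.com/microsoft/nni
cd nni/examples/nas/benchmarks
pip3 install -r nasbench101.requirements.txt
./nasbench101.sh   # writes nasbench101.db to ${NASBENCHMARK_DIR:-~/.nni/nasbenchmark}
```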
......
{
"cells": [
{
"cell_type": "markdown",
...@@ -53,6 +30,14 @@
"## NAS-Bench-101"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the following architecture as an example:<br>\n",
"![nas-101](../../img/nas-bench-101-example.png)"
]
},
{
"cell_type": "code",
"execution_count": 2,
...@@ -82,6 +67,13 @@
" pprint.pprint(t)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An architecture of NAS-Bench-101 could be trained more than once. Each element of the returned generator is a dict which contains one of the training results of this trial config (architecture + hyper-parameters) including train/valid/test accuracy, training time, number of epochs, etc. The results of NAS-Bench-201 and NDS follow similar formats."
]
},
{
"cell_type": "markdown",
"metadata": {},
...@@ -89,6 +81,14 @@
"## NAS-Bench-201"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the following architecture as an example:<br>\n",
"![nas-201](../../img/nas-bench-201-example.png)"
]
},
{
"cell_type": "code",
"execution_count": 3,
...@@ -120,6 +120,16 @@
"## NDS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the following architecture as an example:<br>\n",
"![nds](../../img/nas-bench-nds-example.png)\n",
"\n",
"Here, `bot_muls`, `ds`, `num_gs`, `ss` and `ws` stand for \"bottleneck multipliers\", \"depths\", \"number of groups\", \"strides\" and \"widths\" respectively."
]
},
{
"cell_type": "code",
"execution_count": 4,
...@@ -273,5 +283,28 @@
"print('Elapsed time: ', time.time() - ti, 'seconds')"
]
}
],
"metadata": {
"language_info": {
"name": "python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"version": "3.6.10-final"
},
"orig_nbformat": 2,
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"npconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": 3,
"kernelspec": {
"name": "python361064bitnnilatestcondabff8d66a619a4d26af34fe0fe687c7b0",
"display_name": "Python 3.6.10 64-bit ('nnilatest': conda)"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
# ChangeLog
## Release 1.7 - 7/8/2020
### Major Features
#### Training Service
* Support the AML (Azure Machine Learning) platform as an NNI training service.
* OpenPAI jobs can be reused. When a trial completes, the OpenPAI job won't stop but will wait for the next trial. [Refer to the reuse flag in the OpenPAI config](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/TrainingService/PaiMode.md#openpai-configurations).
* [Support ignoring files and folders in code directory with .nniignore when uploading code directory to training service](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/TrainingService/Overview.md#how-to-use-training-service).
#### Neural Architecture Search (NAS)
* [Provide NAS Open Benchmarks (NasBench101, NasBench201, NDS) with friendly APIs](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/NAS/Benchmarks.md).
* [Support Classic NAS (i.e., non-weight-sharing mode) on TensorFlow 2.X](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/NAS/ClassicNas.md).
#### Model Compression
* Improve Model Speedup: track more dependencies among layers, automatically resolve mask conflicts, and support speedup of pruned ResNet.
* Added new pruners, including three auto model pruning algorithms: [NetAdapt Pruner](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/Compressor/Pruner.md#netadapt-pruner), [SimulatedAnnealing Pruner](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/Compressor/Pruner.md#simulatedannealing-pruner), [AutoCompress Pruner](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/Compressor/Pruner.md#autocompress-pruner), and [ADMM Pruner](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/Compressor/Pruner.md#admm-pruner).
* Added a [model sensitivity analysis tool](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/Compressor/CompressionUtils.md) to help users find each layer's sensitivity to pruning.
* [Easy FLOPs calculation for model compression and NAS](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/Compressor/CompressionUtils.md#model-flops-parameters-counter).
* Update lottery ticket pruner to export the winning ticket.
#### Examples
* Automatically optimize tensor operators on NNI with a new [customized tuner OpEvo](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/TrialExample/OpEvoExamples.md).
#### Built-in tuners/assessors/advisors
* [Allow customized tuners/assessor/advisors to be installed as built-in algorithms](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/Tutorial/InstallCustomizedAlgos.md).
#### WebUI
* Friendlier visualization of nested search spaces.
* Show trial's dict keys in hyper-parameter graph.
* Enhancements to trial duration display.
#### Others
* Provide a utility function to merge parameters received from NNI.
* Support setting `paiStorageConfigName` in PAI mode.
### Documentation
* Improve [documentation for model compression](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/Compressor/Overview.md).
* Improve [documentation](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/NAS/Benchmarks.md) and [examples](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/NAS/BenchmarksExample.ipynb) for NAS benchmarks.
* Improve [documentation for the AzureML training service](https://github.com/microsoft/nni/blob/v1.7/docs/en_US/TrainingService/AMLMode.md).
* Migrate the homepage to Read the Docs.
### Bug Fixes
* Fix a bug for model graphs with shared nn.Module.
* Fix Node.js OOM during `make build`.
* Fix NASUI bugs.
* Fix update issues for duration and intermediate-result charts.
* Fix minor WebUI table style issues.
## Release 1.6 - 5/26/2020
### Major Features
......
...@@ -5,12 +5,22 @@ NNI supports running an experiment on [AML](https://azure.microsoft.com/en-us/se
## Setup environment

Step 1. Install NNI, following the install guide [here](../Tutorial/QuickStart.md).

Step 2. Create an Azure account/subscription using this [link](https://azure.microsoft.com/en-us/free/services/machine-learning/). If you already have an Azure account/subscription, skip this step.

Step 3. Install the Azure CLI on your machine, following the install guide [here](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest).

Step 4. Authenticate to your Azure subscription from the CLI. To authenticate interactively, open a command line or terminal and use the following command:
```
az login
```

Step 5. Log into your Azure account with a web browser and create a Machine Learning resource. You will need to choose a resource group and specify a workspace name. Then download `config.json`, which will be used later.
![](../../img/aml_workspace.png)

Step 6. Create an AML cluster as the computeTarget.
![](../../img/aml_cluster.png)

Step 7. Open a command line and install the AML package environment.
```
python3 -m pip install azureml --user
python3 -m pip install azureml-sdk --user
```
...@@ -52,9 +62,9 @@ Note: You should set `trainingServicePlatform: aml` in NNI config YAML file if y
Compared with [LocalMode](LocalMode.md), trial configuration in aml mode has these additional keys:
* computeTarget
  * required key. The compute cluster name you want to use in your AML workspace. See Step 6.
* image
  * required key. The docker image name used in the job. The image `msranni/nni` of this example only supports GPU computeTargets.

amlConfig:
* subscriptionId
...@@ -64,3 +74,15 @@ amlConfig:
* workspaceName
  * the workspaceName of your account

The required information for amlConfig can be found in the `config.json` downloaded in Step 5.
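For orientation, a minimal `config_aml.yml` might combine these keys as sketched below; the values are placeholders, and the surrounding trial fields (`command`, `codeDir`) follow the standard NNI config layout rather than being prescribed by this page:

```yaml
trainingServicePlatform: aml
trial:
  command: python3 mnist.py
  codeDir: .
  computeTarget: ${your compute cluster name}  # the AML cluster created in Step 6
  image: msranni/nni                           # GPU-only image, see the note above
amlConfig:
  subscriptionId: ${your subscription ID}      # all three values appear in config.json from Step 5
  resourceGroup: ${your resource group}
  workspaceName: ${your workspace name}
```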
Run the following commands to start the example experiment:
```
git clone -b ${NNI_VERSION} https://github.com/microsoft/nni
cd nni/examples/trials/mnist-tfv1
# modify config_aml.yml ...
nnictl create --config config_aml.yml
```
Replace `${NNI_VERSION}` with a released version name or branch name, e.g., `v1.7`.
...@@ -12,7 +12,7 @@ If the computing resource customers try to use is not listed above, NNI provides
Training service needs to be chosen and configured properly in the experiment configuration YAML file. Users can refer to the document of each training service for how to write the configuration. Also, [reference](../Tutorial/ExperimentConfig) provides more details on the specification of the experiment configuration file.

Next, users should prepare the code directory, which is specified as `codeDir` in the config file. Please note that in non-local mode, the code directory will be uploaded to the remote machine or cluster before the experiment. Therefore, we limit the number of files to 2000 and the total size to 300MB. If the code directory contains too many files, users can choose which files and subfolders should be excluded by adding a `.nniignore` file that works like a `.gitignore` file. For more details on how to write this file, see [this example](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-tfv1/.nniignore) and the [git documentation](https://git-scm.com/docs/gitignore#_pattern_format).
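As an illustration, a `.nniignore` with hypothetical patterns might exclude datasets and checkpoints from the upload:

```
# hypothetical patterns; the syntax follows .gitignore
data/
checkpoints/
*.ckpt
logs/
```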
In case users intend to use large files in their experiment (like large-scale datasets) and they are not using local mode, they can either: 1) download the data before each trial launches by putting it into the trial command; or 2) use a shared storage that is accessible to worker nodes. Usually, training platforms are equipped with shared storage, and NNI allows users to easily use them. Refer to the docs of each built-in training service for details.
......
...@@ -4,8 +4,13 @@
Abundant applications raise the demand for training and inference of deep neural networks (DNNs) efficiently on diverse hardware platforms, ranging from cloud servers to embedded devices. Moreover, computational graph-level optimization of deep neural networks, like tensor operator fusion, may introduce new tensor operators. Thus, manually optimized tensor operators provided by hardware-specific libraries have limitations in terms of supporting new hardware platforms or supporting new operators, so automatically optimizing tensor operators on diverse hardware platforms is essential for large-scale deployment and application of deep learning technologies in real-world problems.

Tensor operator optimization is substantially a combinatorial optimization problem. The objective function is the performance of a tensor operator on a specific hardware platform, which should be maximized with respect to the hyper-parameters of the corresponding device code, such as how to tile a matrix or whether to unroll a loop. Unlike many typical problems of this type, such as the travelling salesman problem, the objective function of tensor operator optimization is a black box and expensive to sample. One has to compile device code with a specific configuration and run it on real hardware to get the corresponding performance metric. Therefore, a desired method for optimizing tensor operators should find the best configuration with as few samples as possible.

The expensive objective function makes solving the tensor operator optimization problem with traditional combinatorial optimization methods, such as simulated annealing and evolutionary algorithms, almost impossible. Although these algorithms inherently support combinatorial search spaces, they do not take sample-efficiency into account, so thousands of samples or even more are usually needed, which is unacceptable when tuning tensor operators in production environments. On the other hand, sequential model-based optimization (SMBO) methods are proven sample-efficient for optimizing black-box functions with continuous search spaces. However, when optimizing functions with combinatorial search spaces, SMBO methods are not as sample-efficient as their continuous counterparts, because there is a lack of prior assumptions about the objective functions, such as continuity and differentiability in the case of continuous search spaces. For example, if one could assume an objective function with a continuous search space is infinitely differentiable, a Gaussian process with a radial basis function (RBF) kernel could be used to model the objective function. In this way, a sample provides not only a single value at a point but also the local properties of the objective function in its neighborhood, or even global properties, which results in high sample-efficiency. In contrast, SMBO methods for combinatorial optimization suffer from poor sample-efficiency due to the lack of proper prior assumptions and of surrogate models which can leverage them.

OpEvo was recently proposed for solving this challenging problem. It efficiently explores the search spaces of tensor operators by introducing a topology-aware mutation operation based on a q-random walk distribution to leverage the topological structures over the search spaces. Following this example, you can use OpEvo to tune three representative types of tensor operators selected from two popular neural networks, BERT and AlexNet. Three comparison baselines, AutoTVM, G-BFS and N-A2C, are also provided. Please refer to [OpEvo: An Evolutionary Method for Tensor Operator Optimization](https://arxiv.org/abs/2006.05664) for a detailed explanation of these algorithms.
## Environment Setup
...@@ -26,53 +31,79 @@ For tuning the operators of matrix multiplication, please run below commands fro
# (N, K) x (K, M) represents a matrix of shape (N, K) multiplying a matrix of shape (K, M)

# (512, 1024) x (1024, 1024)
# tuning with OpEvo
nnictl create --config experiments/mm/N512K1024M1024/config_opevo.yml
# tuning with G-BFS
nnictl create --config experiments/mm/N512K1024M1024/config_gbfs.yml
# tuning with N-A2C
nnictl create --config experiments/mm/N512K1024M1024/config_na2c.yml
# tuning with AutoTVM
OP=matmul STEP=512 N=512 M=1024 K=1024 P=NN ./run.sh

# (512, 1024) x (1024, 4096)
# tuning with OpEvo
nnictl create --config experiments/mm/N512K1024M4096/config_opevo.yml
# tuning with G-BFS
nnictl create --config experiments/mm/N512K1024M4096/config_gbfs.yml
# tuning with N-A2C
nnictl create --config experiments/mm/N512K1024M4096/config_na2c.yml
# tuning with AutoTVM
OP=matmul STEP=512 N=512 M=1024 K=4096 P=NN ./run.sh

# (512, 4096) x (4096, 1024)
# tuning with OpEvo
nnictl create --config experiments/mm/N512K4096M1024/config_opevo.yml
# tuning with G-BFS
nnictl create --config experiments/mm/N512K4096M1024/config_gbfs.yml
# tuning with N-A2C
nnictl create --config experiments/mm/N512K4096M1024/config_na2c.yml
# tuning with AutoTVM
OP=matmul STEP=512 N=512 M=4096 K=1024 P=NN ./run.sh
```
For tuning the operators of batched matrix multiplication, please run the following commands from `/root`:

```bash
# batched matrix with batch size 960 and shape of matrix (128, 128) multiplies batched matrix with batch size 960 and shape of matrix (128, 64)
# tuning with OpEvo
nnictl create --config experiments/bmm/B960N128K128M64PNN/config_opevo.yml
# tuning with AutoTVM
OP=batch_matmul STEP=512 B=960 N=128 K=128 M=64 P=NN ./run.sh

# batched matrix with batch size 960 and shape of matrix (128, 128) is transposed first and then multiplies batched matrix with batch size 960 and shape of matrix (128, 64)
# tuning with OpEvo
nnictl create --config experiments/bmm/B960N128K128M64PTN/config_opevo.yml
# tuning with AutoTVM
OP=batch_matmul STEP=512 B=960 N=128 K=128 M=64 P=TN ./run.sh

# batched matrix with batch size 960 and shape of matrix (128, 64) is transposed first and then right multiplies batched matrix with batch size 960 and shape of matrix (128, 64)
# tuning with OpEvo
nnictl create --config experiments/bmm/B960N128K64M128PNT/config_opevo.yml
# tuning with AutoTVM
OP=batch_matmul STEP=512 B=960 N=128 K=64 M=128 P=NT ./run.sh
```
For tuning the operators of 2D convolution, please run the following commands from `/root`:

```bash
# image tensor of shape (512, 3, 227, 227) convolves with kernel tensor of shape (64, 3, 11, 11) with stride 4 and padding 0
# tuning with OpEvo
nnictl create --config experiments/conv/N512C3HW227F64K11ST4PD0/config_opevo.yml
# tuning with AutoTVM
OP=convfwd_direct STEP=512 N=512 C=3 H=227 W=227 F=64 K=11 ST=4 PD=0 ./run.sh

# image tensor of shape (512, 64, 27, 27) convolves with kernel tensor of shape (192, 64, 5, 5) with stride 1 and padding 2
# tuning with OpEvo
nnictl create --config experiments/conv/N512C64HW27F192K5ST1PD2/config_opevo.yml
# tuning with AutoTVM
OP=convfwd_direct STEP=512 N=512 C=64 H=27 W=27 F=192 K=5 ST=1 PD=2 ./run.sh
```
Please note that G-BFS and N-A2C are only designed for tuning tiling schemes for multiplying matrices whose numbers of rows and columns are powers of 2, so they are not compatible with other types of configuration spaces and thus are not eligible for tuning the operators of batched matrix multiplication and 2D convolution. Here, AutoTVM is implemented by its authors in the TVM project, so its tuning results are printed on the screen rather than reported to the NNI manager. Port 8080 of the container is bound to the same port on the host, so one can access the NNI Web UI through `host_ip_addr:8080` and monitor the tuning process as in the screenshot below.
<img src="../../../examples/trials/systems/opevo/screenshot.png" />
## Citing OpEvo

If you feel OpEvo is helpful, please consider citing the paper as follows:
```
@misc{gao2020opevo,
title={OpEvo: An Evolutionary Method for Tensor Operator Optimization},
......
...@@ -14,4 +14,5 @@ nbsphinx
schema
tensorboard
scikit-learn==0.20
thop
https://download.pytorch.org/whl/cpu/torch-1.3.1%2Bcpu-cp37-cp37m-linux_x86_64.whl
nasbench_full.tfrecord
a.pth
data.zip
nds_data
tensorflow==1.15.2
tqdm
peewee
git+https://github.com/google-research/nasbench
#!/bin/bash
set -e

if [ -z "${NASBENCHMARK_DIR}" ]; then
    NASBENCHMARK_DIR=~/.nni/nasbenchmark
fi

echo "Downloading NAS-Bench-101..."
if [ -f "nasbench_full.tfrecord" ]; then
    echo "nasbench_full.tfrecord found. Skip download."
else
    wget https://storage.googleapis.com/nasbench/nasbench_full.tfrecord
fi

echo "Generating database..."
rm -f ${NASBENCHMARK_DIR}/nasbench101.db ${NASBENCHMARK_DIR}/nasbench101.db-journal
mkdir -p ${NASBENCHMARK_DIR}
python -m nni.nas.benchmarks.nasbench101.db_gen nasbench_full.tfrecord
rm -f nasbench_full.tfrecord
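As implied by the check at the top of the script, the output location can be overridden per invocation, for example:

```bash
# Generate nasbench101.db into a custom directory instead of ~/.nni/nasbenchmark
NASBENCHMARK_DIR=/data/nasbenchmark ./nasbench101.sh
```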