Unverified commit 403195f0, authored by Yuge Zhang, committed by GitHub

Merge branch 'master' into nn-meter

parents 99aa8226 a7278d2d
<!-- BEGIN MICROSOFT SECURITY.MD V0.0.5 BLOCK -->
## Security

Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, covering [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).

If you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below.

## Reporting Security Issues

**Please do not report security vulnerabilities through public GitHub issues.**

Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report).

If you prefer to submit without logging in, send an email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc).

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

* Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
* Full paths of the source file(s) related to the manifestation of the issue
* The location of the affected source code (tag/branch/commit or direct URL)
* Any special configuration required to reproduce the issue
* Step-by-step instructions to reproduce the issue
* Proof-of-concept or exploit code (if possible)
* Impact of the issue, including how an attacker might exploit it

This information will help us triage your report more quickly.

If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs.

## Preferred Languages

We prefer all communications to be in English.

## Policy

Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd).
<!-- END MICROSOFT SECURITY.MD BLOCK -->
# Recommended: some less commonly used modules/examples depend on these packages.
-f https://download.pytorch.org/whl/torch_stable.html
tensorflow
keras
torch == 1.6.0
torchvision == 0.7.0
pytorch-lightning >= 1.1.1
onnx
peewee
graphviz
gym
tianshou >= 0.4.1
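# To install everything above in one shot, save this list to a file and run
# pip install with it (the filename below is illustrative, not from the repo):
#   pip install -r requirements-extra.txt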
Auto Completion for nnictl Commands
===================================
NNI's command line tool **nnictl** supports auto-completion, i.e., you can complete an nnictl command by pressing the ``tab`` key.
For example, if the current command is
.. code-block:: bash
nnictl cre
By pressing the ``tab`` key, it will be completed to
.. code-block:: bash
nnictl create
For now, auto-completion is not enabled by default if you install NNI through ``pip``, and it only works on Linux with the bash shell. If you want to enable this feature on your computer, please follow the steps below:
Step 1. Download ``bash-completion``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash
cd ~
wget https://raw.githubusercontent.com/microsoft/nni/{nni-version}/tools/bash-completion
Here, ``{nni-version}`` should be replaced by the version of NNI, e.g., ``master``, ``v2.3``. You can also check the latest ``bash-completion`` script :githublink:`here <tools/bash-completion>`.
Step 2. Install the script
^^^^^^^^^^^^^^^^^^^^^^^^^^
If you are running as root and want to install this script for all users:
.. code-block:: bash
install -m644 ~/bash-completion /usr/share/bash-completion/completions/nnictl
If you just want to install this script for yourself:
.. code-block:: bash
mkdir -p ~/.bash_completion.d
install -m644 ~/bash-completion ~/.bash_completion.d/nnictl
echo '[[ -f ~/.bash_completion.d/nnictl ]] && source ~/.bash_completion.d/nnictl' >> ~/.bash_completion
Step 3. Reopen your terminal
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Reopen your terminal and you should be able to use the auto-completion feature. Enjoy!
Step 4. Uninstall
^^^^^^^^^^^^^^^^^
If you want to uninstall this feature, just revert the changes in the steps above.
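For example, if you used the per-user installation above, reverting might look like this (also delete the ``source`` line that Step 2 appended to ``~/.bash_completion``):

.. code-block:: bash

   # per-user installation
   rm ~/.bash_completion.d/nnictl

   # system-wide installation (run as root)
   rm /usr/share/bash-completion/completions/nnictl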
@@ -15,7 +15,6 @@ Use Cases and Solutions
Feature Engineering <feature_engineering>
Performance measurement, comparison and analysis <perf_compare>
Use NNI on Google Colab <NNI_colab_support>
-Auto Completion for nnictl Commands <AutoCompletion>
External Repositories and References
====================================
...
@@ -140,9 +140,6 @@ Topology Utilities
.. autoclass:: nni.compression.pytorch.utils.shape_dependency.GroupDependency
:members:
-.. autoclass:: nni.compression.pytorch.utils.mask_conflict.CatMaskPadding
-:members:
.. autoclass:: nni.compression.pytorch.utils.mask_conflict.GroupMaskConflict
:members:
...
@@ -50,6 +50,10 @@ CUDA version >= 11.0
TensorRT version >= 7.2
+Note
+* If you haven't installed TensorRT before or are using an old version, please refer to the `TensorRT Installation Guide <https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html>`__
Usage
-----
quantization aware training:
...
@@ -94,10 +94,10 @@ And you could reference the examples in ``/examples/feature_engineering/gradient...
*
-**X** (array-like, require) - The training input samples which shape = [n_samples, n_features]
+**X** (array-like, required) - The training input samples with shape = [n_samples, n_features]. ``np.ndarray`` recommended.
*
-**y** (array-like, require) - The target values (class labels in classification, real numbers in regression) which shape = [n_samples].
+**y** (array-like, required) - The target values (class labels in classification, real numbers in regression) with shape = [n_samples]. ``np.ndarray`` recommended.
*
**groups** (array-like, optional, default = None) - Groups of columns that must be selected as a unit. e.g. [0, 0, 1, 2] specifies that the first two columns are part of a group. Its shape is [n_features].
...
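As context for the parameter documentation above, here is a minimal sketch of how these arguments are typically passed to the gradient feature selector (the synthetic data and the ``n_features`` value are illustrative; see the examples under ``examples/feature_engineering/gradient_feature_selector`` for canonical usage):

.. code-block:: python

   import numpy as np
   from nni.feature_engineering.gradient_selector import FeatureGradientSelector

   X = np.random.randn(100, 4)         # [n_samples, n_features]
   y = np.random.randint(0, 2, 100)    # [n_samples], class labels
   selector = FeatureGradientSelector(n_features=2)
   selector.fit(X, y)                  # "groups" could be supplied here as well
   print(selector.get_selected_features())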
Hypermodules
============
A hypermodule is a (PyTorch) module that contains many architecture/hyperparameter candidates for that module. By using hypermodules in a user-defined model, NNI will help users automatically find the best architecture/hyperparameters of the hypermodules for this model. This follows the design philosophy of Retiarii that users write a DNN model as a space.
Some hypermodules have been proposed in the NAS community, such as AutoActivation and AutoDropout. Some of them are implemented in the Retiarii framework.
.. autoclass:: nni.retiarii.nn.pytorch.AutoActivation
:members:
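A minimal sketch of how a hypermodule might be dropped into a model space (illustrative only; depending on the NNI version, the model class may also need the ``@model_wrapper`` decorator, and a Retiarii experiment is required to actually explore the candidates):

.. code-block:: python

   import nni.retiarii.nn.pytorch as nn

   class Net(nn.Module):
       def __init__(self):
           super().__init__()
           self.fc1 = nn.Linear(784, 128)
           self.act = nn.AutoActivation()  # NNI searches over candidate activation functions
           self.fc2 = nn.Linear(128, 10)

       def forward(self, x):
           return self.fc2(self.act(self.fc1(x)))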
@@ -12,7 +12,7 @@ In this quick start tutorial, we use multi-trial NAS as an example to show how t...
One-shot NAS tutorial can be found `here <./OneshotTrainer.rst>`__.
-.. note:: Currently, PyTorch is the only supported framework by Retiarii, and we have only tested with **PyTorch 1.6 and 1.7**. This documentation assumes PyTorch context but it should also apply to other frameworks, that is in our future plan.
+.. note:: Currently, PyTorch is the only framework supported by Retiarii, and we have only tested with **PyTorch 1.6 to 1.9**. This documentation assumes a PyTorch context, but it should also apply to other frameworks, which is in our future plan.
Define your Model Space
-----------------------
@@ -180,7 +180,9 @@ The complete code of a simple MNIST example can be found :githublink:`here <exam...
Visualize the Experiment
------------------------
-Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment. For example, open ``localhost:8081`` in your browser, where 8081 is the port that you set in ``exp.run``. Please refer to `here <../../Tutorial/WebUI.rst>`__ for details.
+Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment. For example, open ``localhost:8081`` in your browser, where 8081 is the port that you set in ``exp.run``. Please refer to `here <../Tutorial/WebUI.rst>`__ for details.
+We support visualizing models with third-party visualization engines (like `Netron <https://netron.app/>`__). This can be used by clicking ``Visualization`` in the detail panel for each trial. Note that the current visualization is based on `onnx <https://onnx.ai/>`__. Built-in evaluators (e.g., Classification) will automatically export the model into a file; for your own evaluator, you need to save your model into ``$NNI_OUTPUT_DIR/model.onnx`` to make this work.
Export Top Models
-----------------
...
@@ -24,6 +24,8 @@ The simplest way to customize a new evaluator is with functional APIs, which is...
.. note:: Due to our current implementation limitation, the ``fit`` function should be put in another python file instead of putting it in the main file. This limitation will be fixed in future release.
+.. note:: When using customized evaluators, if you want to visualize models, you need to export your model and save it into ``$NNI_OUTPUT_DIR/model.onnx`` in your evaluator.
With PyTorch-Lightning
----------------------
...
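As a companion to the note added above, a hedged sketch of what that export could look like inside a custom evaluator (the model and input shape are placeholders, not from the original doc):

.. code-block:: python

   import os
   import torch

   def export_for_visualization(model: torch.nn.Module) -> None:
       # Save the ONNX model where the NNI WebUI looks for it.
       out_dir = os.environ.get('NNI_OUTPUT_DIR', '.')
       dummy_input = torch.randn(1, 3, 32, 32)  # placeholder input shape
       torch.onnx.export(model, dummy_input, os.path.join(out_dir, 'model.onnx'))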
@@ -8,4 +8,5 @@ NNI provides powerful APIs for users to easily express model space (or search sp...
:maxdepth: 1
Mutation Primitives <MutationPrimitives>
-Customize Mutators <Mutators>
\ No newline at end of file
+Customize Mutators <Mutators>
+Hypermodule Lib <Hypermodules>
\ No newline at end of file
@@ -54,7 +54,7 @@ For each experiment, the user only needs to define a search space and update a f...
Step 2: `Update model codes <TrialExample/Trials.rst>`__
-Step 3: `Define Experiment <Tutorial/ExperimentConfig.rst>`__
+Step 3: `Define Experiment <reference/experiment_config.rst>`__
...
Pix2pix example
=================
Overview
--------
`Pix2pix <https://arxiv.org/abs/1611.07004>`__ is a conditional generative adversarial network (conditional GAN) framework proposed by Isola et al. in 2016 for solving image-to-image translation problems. This framework performs well on a wide range of image generation problems. In the original paper, the authors demonstrate how to use pix2pix to solve the following image translation problems: 1) labels to street scene; 2) labels to facade; 3) BW to color; 4) aerial to map; 5) day to night; and 6) edges to photo. If you are interested, please read more on the `official project page <https://phillipi.github.io/pix2pix/>`__. In this example, we use pix2pix to introduce how to use NNI for tuning conditional GANs.
**Goals**
^^^^^^^^^^^^^
Although GANs are known to be able to generate high-resolution realistic images, they are generally fragile and difficult to optimize; mode collapse can happen during training due to improper optimization settings, loss formulation, model architecture, weight initialization, or even data augmentation patterns. The goal of this tutorial is to leverage NNI's hyperparameter tuning tools to automatically find a good setting for these important factors.
In this example, we aim to select the following hyperparameters automatically:
* ``ngf``: number of generator filters in the last conv layer
* ``ndf``: number of discriminator filters in the first conv layer
* ``netG``: generator architecture
* ``netD``: discriminator architecture
* ``norm``: normalization type
* ``init_type``: weight initialization method
* ``lr``: initial learning rate for adam
* ``beta1``: momentum term of adam
* ``lr_policy``: learning rate policy
* ``gan_mode``: type of GAN objective
* ``lambda_L1``: weight of L1 loss in the generator objective
**Experiments**
^^^^^^^^^^^^^^^^^^^^
Preparations
^^^^^^^^^^^^
This example requires the GPU version of PyTorch. The PyTorch build should be chosen based on your system, Python version, and CUDA version.
Please refer to the detailed instructions for installing `PyTorch <https://pytorch.org/get-started/locally/>`__.
Next, run the following shell script to clone the repository maintained by the original authors of pix2pix. This example relies on the implementations in this repository.
.. code-block:: bash
./setup.sh
Pix2pix with NNI
^^^^^^^^^^^^^^^^^
**Search Space**
We summarize the ranges of values for each hyperparameter mentioned above into a single search space JSON object.
.. code-block:: json
{
"ngf": {"_type":"choice","_value":[16, 32, 64, 128, 256]},
"ndf": {"_type":"choice","_value":[16, 32, 64, 128, 256]},
"netG": {"_type":"choice","_value":["resnet_9blocks", "unet_256"]},
"netD": {"_type":"choice","_value":["basic", "pixel", "n_layers"]},
"norm": {"_type":"choice","_value":["batch", "instance", "none"]},
"init_type": {"_type":"choice","_value":["xavier", "normal", "kaiming", "orthogonal"]},
"lr":{"_type":"choice","_value":[0.0001, 0.0002, 0.0005, 0.001, 0.005, 0.01, 0.1]},
"beta1":{"_type":"uniform","_value":[0, 1]},
"lr_policy": {"_type":"choice","_value":["linear", "step", "plateau", "cosine"]},
"gan_mode": {"_type":"choice","_value":["vanilla", "lsgan", "wgangp"]} ,
"lambda_L1": {"_type":"choice","_value":[1, 5, 10, 100, 250, 500]}
}
Starting from v2.0, the search space is directly included in the config. Please find the example here: :githublink:`config.yml <examples/trials/pix2pix-pytorch/config.yml>`
**Trial**
To experiment on this set of hyperparameters using NNI, we have to write trial code, which receives a set of parameter settings from NNI, trains a generator and a discriminator using these parameters, and then reports the final scores back to NNI. In the experiment, NNI repeatedly calls this trial code, passing in different sets of hyperparameter settings. It is important that the following three lines are incorporated in the trial code (a skeleton follows after this list):
* Use ``nni.get_next_parameter()`` to get next hyperparameter set.
* (Optional) Use ``nni.report_intermediate_result(score)`` to report the intermediate result after finishing each epoch.
* Use ``nni.report_final_result(score)`` to report the final result before the trial ends.
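A hedged skeleton of how those three calls fit together (``train_one_epoch`` and ``evaluate`` are placeholders, not functions from the example repository):

.. code-block:: python

   import nni

   params = nni.get_next_parameter()          # 1. receive a hyperparameter set from NNI

   for epoch in range(200):
       train_one_epoch(params)                # placeholder: one pix2pix training epoch
       score = evaluate()                     # placeholder: e.g., L1 loss on the test set
       nni.report_intermediate_result(score)  # 2. optional per-epoch report

   nni.report_final_result(score)             # 3. final report before the trial ends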
Implemented code directory: :githublink:`pix2pix.py <examples/trials/pix2pix-pytorch/pix2pix.py>`
Some notes on the implementation:
* The trial code for this example is adapted from the `repository maintained by the authors of Pix2pix and CycleGAN <https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix>`__ . You can also use your previous code directly. Please refer to `How to define a trial <Trials.rst>`__ for modifying the code.
* By default, the code uses the dataset "facades". It also supports the datasets "night2day", "edges2handbags", "edges2shoes", and "maps".
* For "facades", 200 epochs are enough for the model to converge to a point where the difference between models trained with different hyperparameters are salient enough for evaluation. If you are using other datasets, please consider increasing the ``n_epochs`` and ``n_epochs_decay`` parameters by either passing them as arguments when calling ``pix2pix.py`` in the config file (discussed below) or changing the ``pix2pix.py`` directly. Also, for "facades", 200 epochs are enought for the final training, while the number may vary for other datasets.
* In this example, we use L1 loss on the test set as the score to report to NNI. Although L1 is by no means a comprehensive measure of image generation performance, at most times it makes sense for evaluating pix2pix models with similar architectural setup. In this example, for the hyperparameters we experiment on, a higher L1 score generally indicates a higher generation performance.
**Config**
Here is the example config of running this experiment on local (with a single GPU):
code directory: :githublink:`examples/trials/pix2pix-pytorch/config.yml <examples/trials/pix2pix-pytorch/config.yml>`
To have a full glance on our implementation, check: :githublink:`examples/trials/pix2pix-pytorch/ <examples/trials/pix2pix-pytorch>`
Launch the experiment
^^^^^^^^^^^^^^^^^^^^^
We are ready for the experiment; let's now **run the config.yml file from your command line to start the experiment**.
.. code-block:: bash
nnictl create --config nni/examples/trials/pix2pix-pytorch/config.yml
Collecting the Results
^^^^^^^^^^^^^^^^^^^^^^
By default, our trial code saves the final trained model for each trial in the ``checkpoints/`` directory inside the trial directory of the NNI experiment. ``latest_net_G.pth`` and ``latest_net_D.pth`` correspond to the saved checkpoints of the generator and the discriminator.
To make it easier to run inference and see the generated images, we also incorporate a simple inference code here: :githublink:`test.py <examples/trials/pix2pix-pytorch/test.py>`
To use the code, run the following command:
.. code-block:: bash
python3 test.py -c CHECKPOINT -p PARAMETER_CFG -d DATASET_NAME -o OUTPUT_DIR
``CHECKPOINT`` is the directory containing the checkpoints (e.g., the ``checkpoints/`` directory in the trial directory). ``PARAMETER_CFG`` is the ``parameter.cfg`` file generated by NNI that records the hyperparameter settings. This file can be found in the trial directory created by NNI.
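For instance, an invocation could look like the following (the experiment and trial IDs are placeholders that depend on your run; ``~/nni-experiments`` is typically the experiment directory):

.. code-block:: bash

   python3 test.py -c ~/nni-experiments/EXP_ID/trials/TRIAL_ID/checkpoints \
                   -p ~/nni-experiments/EXP_ID/trials/TRIAL_ID/parameter.cfg \
                   -d facades -o ./pix2pix_outputs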
Results and Discussions
^^^^^^^^^^^^^^^^^^^^^^^
Following the previous steps, we ran the example for 40 trials using the TPE tuner. We found the best-performing parameters on the "facades" dataset to be the following set.
.. code-block:: json
{
"ngf": 16,
"ndf": 128,
"netG": "unet_256",
"netD": "pixel",
"norm": "none",
"init_type": "normal",
"lr": 0.0002,
"beta1": 0.6954,
"lr_policy": "step",
"gan_mode": "lsgan",
"lambda_L1": 500
}
Meanwhile, we compare the results with a model trained using the following default empirical hyperparameter settings:
.. code-block:: json
{
"ngf": 128,
"ndf": 128,
"netG": "unet_256",
"netD": "basic",
"norm": "batch",
"init_type": "xavier",
"lr": 0.0002,
"beta1": 0.5,
"lr_policy": "linear",
"gan_mode": "lsgan",
"lambda_L1": 100
}
We can observe that for the learning rate (0.0002), the generator architecture (U-Net), and the GAN objective (LSGAN), the two results agree with each other. This is also consistent with the widely accepted practice on this dataset. Meanwhile, the hyperparameters "beta1", "lambda_L1", "ngf", and "ndf" are slightly changed in NNI's found solution to fit the target dataset. We found that the parameters searched by NNI outperform the empirical parameters on the facades dataset both in terms of L1 loss and the visual quality of the images. While the searched hyperparameters achieve an L1 loss of 0.3317 on the facades test set, the empirical hyperparameters only achieve an L1 loss of 0.4148. The following image shows some sample input-output pairs from the facades test set produced by the model with hyperparameters tuned by NNI.
.. image:: ../../img/pix2pix_pytorch_facades.png
:target: ../../img/pix2pix_pytorch_facades.png
:alt:
@@ -67,8 +67,8 @@ Our documentation is built with :githublink:`sphinx <docs>`.
* Before submitting the documentation change, please **build homepage locally**: ``cd docs/en_US && make html``, then you can see all the built documentation webpages under the folder ``docs/en_US/_build/html``. It's also highly recommended taking care of **every WARNING** during the build, which is very likely the signal of a **dead link** or other annoying issues.
*
-For links, please consider using **relative paths** first. However, if the documentation is written in Markdown format, and:
+For links, please consider using **relative paths** first. However, if the documentation is written in reStructuredText format, and:
* It's an image link which needs to be formatted with embedded HTML grammar, please use a global URL like ``https://user-images.githubusercontent.com/44491713/51381727-e3d0f780-1b4f-11e9-96ab-d26b9198ba65.png``, which can be automatically generated by dragging a picture onto the `Github Issue <https://github.com/Microsoft/nni/issues/new>`__ box.
-* It cannot be re-formatted by sphinx, such as source code, please use its global URL. For source code that links to our github repo, please use URLs rooted at ``https://github.com/Microsoft/nni/tree/v2.3/`` (:githublink:`mnist.py <examples/trials/mnist-pytorch/mnist.py>` for example).
+* It cannot be re-formatted by sphinx, such as source code, please use its global URL. For source code that links to our github repo, please use URLs rooted at ``https://github.com/Microsoft/nni/tree/master/`` (:githublink:`mnist.py <examples/trials/mnist-pytorch/mnist.py>` for example).
@@ -20,6 +20,7 @@ This document describes the rules to write the config file, and provides some ex...
* `versionCheck <#versioncheck>`__
* `debug <#debug>`__
* `maxTrialNum <#maxtrialnum>`__
+* `maxTrialDuration <#maxtrialduration>`__
* `trainingServicePlatform <#trainingserviceplatform>`__
* `searchSpacePath <#searchspacepath>`__
* `useAnnotation <#useannotation>`__
@@ -254,7 +255,7 @@ maxExecDuration
Optional. String. Default: 999d.
-**maxExecDuration** specifies the max duration time of an experiment. The unit of the time is {**s**\ **m**\ , **h**\ , **d**\ }, which means {*seconds*\ , *minutes*\ , *hours*\ , *days*\ }.
+**maxExecDuration** specifies the max duration of an experiment. The unit of the time is {**s**\ , **m**\ , **h**\ , **d**\ }, which means {*seconds*\ , *minutes*\ , *hours*\ , *days*\ }.
Note: maxExecDuration sets the duration of an experiment, not of a trial job. When the experiment reaches the max duration, it will not stop, but it can no longer submit new trial jobs.
@@ -279,6 +280,13 @@ Optional. Integer between 1 and 99999. Default: 99999.
Specifies the max number of trial jobs created by NNI, including succeeded and failed jobs.
+maxTrialDuration
+^^^^^^^^^^^^^^^^
+Optional. String. Default: 999d.
+**maxTrialDuration** specifies the max duration of each trial job. The unit of the time is {**s**\ , **m**\ , **h**\ , **d**\ }, which means {*seconds*\ , *minutes*\ , *hours*\ , *days*\ }. If the current trial job reaches the max duration, it will be stopped.
trainingServicePlatform
^^^^^^^^^^^^^^^^^^^^^^^
...
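To make the new field concrete, a small illustrative excerpt from a legacy-schema config file (values are arbitrary):

.. code-block:: yaml

   maxExecDuration: 2h     # the experiment stops submitting new trials after 2 hours
   maxTrialDuration: 10m   # any single trial running longer than 10 minutes is stopped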
@@ -166,6 +166,10 @@ nnictl resume
- False
-
- set foreground mode, print log content to terminal
+* - --experiment_dir, -e
+- False
+-
+- Resume experiment from an external folder; specify the full path of the experiment folder
@@ -218,6 +222,10 @@ nnictl view
- False
-
- Rest port of the experiment you want to view
+* - --experiment_dir, -e
+- False
+-
+- View experiment from an external folder; specify the full path of the experiment folder
...
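A hypothetical invocation of the new flag (the experiment ID and path are placeholders):

.. code-block:: bash

   nnictl resume EXPERIMENT_ID --experiment_dir /path/to/experiment/folder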
@@ -4,7 +4,7 @@ QuickStart
Installation
------------
-We currently support Linux, macOS, and Windows. Ubuntu 16.04 or higher, macOS 10.14.1, and Windows 10.1809 are tested and supported. Simply run the following ``pip install`` in an environment that has ``python >= 3.6``.
+Currently, NNI supports running on Linux, macOS and Windows. Ubuntu 16.04 or higher, macOS 10.14.1, and Windows 10.1809 are tested and supported. Simply run the following ``pip install`` in an environment that has ``python >= 3.6``.
Linux and macOS
^^^^^^^^^^^^^^^
@@ -20,21 +20,17 @@ Windows
python -m pip install --upgrade nni
-.. Note:: For Linux and macOS, ``--user`` can be added if you want to install NNI in your home directory; this does not require any special privileges.
+.. Note:: For Linux and macOS, ``--user`` can be added if you want to install NNI in your home directory, which does not require any special privileges.
.. Note:: If there is an error like ``Segmentation fault``, please refer to the :doc:`FAQ <FAQ>`.
-.. Note:: For the system requirements of NNI, please refer to :doc:`Install NNI on Linux & Mac <InstallationLinux>` or :doc:`Windows <InstallationWin>`.
+.. Note:: For the system requirements of NNI, please refer to :doc:`Install NNI on Linux & Mac <InstallationLinux>` or :doc:`Windows <InstallationWin>`. If you want to use docker, refer to :doc:`HowToUseDocker <HowToUseDocker>`.
-Enable NNI Command-line Auto-Completion (Optional)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-After the installation, you may want to enable the auto-completion feature for **nnictl** commands. Please refer to this `tutorial <../CommunitySharings/AutoCompletion.rst>`__.
"Hello World" example on MNIST
------------------------------
-NNI is a toolkit to help users run automated machine learning experiments. It can automatically do the cyclic process of getting hyperparameters, running trials, testing results, and tuning hyperparameters. Here, we'll show how to use NNI to help you find the optimal hyperparameters for a MNIST model.
+NNI is a toolkit to help users run automated machine learning experiments. It can automatically do the cyclic process of getting hyperparameters, running trials, testing results, and tuning hyperparameters. Here, we'll show how to use NNI to help you find the optimal hyperparameters on the MNIST dataset.
Here is an example script to train a CNN on the MNIST dataset **without NNI**:
@@ -63,9 +59,9 @@ Here is an example script to train a CNN on the MNIST dataset **without NNI**:
}
main(params)
-The above code can only try one set of parameters at a time; if we want to tune learning rate, we need to manually modify the hyperparameter and start the trial again and again.
+The above code can only try one set of parameters at a time. If you want to tune the learning rate, you need to manually modify the hyperparameter and start the trial again and again.
-NNI is born to help the user do tuning jobs; the NNI working process is presented below:
+NNI is born to help users tune jobs, whose working process is presented below:
.. code-block:: text
@@ -80,26 +76,20 @@ NNI is born to help the user do tuning jobs; the NNI working process is presente...
6: Stop the experiment
7: return hyperparameter value with best final result
-If you want to use NNI to automatically train your model and find the optimal hyper-parameters, you need to do three changes based on your code:
-Three steps to start an experiment
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-**Step 1**: Write a ``Search Space`` file in JSON, including the ``name`` and the ``distribution`` (discrete-valued or continuous-valued) of all the hyperparameters you need to search.
-.. code-block:: diff
-   - params = {'batch_size': 32, 'hidden_size': 128, 'lr': 0.001, 'momentum': 0.5}
-   + {
-   +     "batch_size": {"_type":"choice", "_value": [16, 32, 64, 128]},
-   +     "hidden_size":{"_type":"choice","_value":[128, 256, 512, 1024]},
-   +     "lr":{"_type":"choice","_value":[0.0001, 0.001, 0.01, 0.1]},
-   +     "momentum":{"_type":"uniform","_value":[0, 1]}
-   + }
-*Example:* :githublink:`search_space.json <examples/trials/mnist-pytorch/search_space.json>`
-**Step 2**\ : Modify your ``Trial`` file to get the hyperparameter set from NNI and report the final result to NNI.
+.. note::
+   If you want to use NNI to automatically train your model and find the optimal hyper-parameters, there are two approaches:
+
+   1. Write a config file and start the experiment from the command line.
+   2. Config and launch the experiment directly from a Python file.
+
+   In this part, we will focus on the first approach. For the second approach, please refer to `this tutorial <HowToLaunchFromPython.rst>`__.
+Step 1: Modify the ``Trial`` Code
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Modify your ``Trial`` file to get the hyperparameter set from NNI and report the final results to NNI.
.. code-block:: diff
@@ -128,55 +118,83 @@ Three steps to start an experiment
*Example:* :githublink:`mnist.py <examples/trials/mnist-pytorch/mnist.py>`
-**Step 3**\ : Define a ``config`` file in YAML which declares the ``path`` to the search space and trial files. It also gives other information such as the tuning algorithm, max trial number, and max duration arguments.
+Step 2: Define the Search Space
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Define a ``Search Space`` in a YAML file, including the ``name`` and the ``distribution`` (discrete-valued or continuous-valued) of all the hyperparameters you want to search.
.. code-block:: yaml
-   authorName: default
-   experimentName: example_mnist
-   trialConcurrency: 1
-   maxExecDuration: 1h
-   maxTrialNum: 10
-   trainingServicePlatform: local
-   # The path to Search Space
-   searchSpacePath: search_space.json
-   useAnnotation: false
-   tuner:
-     builtinTunerName: TPE
-   # The path and the running command of trial
-   trial:
-     command: python3 mnist.py
-     codeDir: .
-     gpuNum: 0
+   searchSpace:
+     batch_size:
+       _type: choice
+       _value: [16, 32, 64, 128]
+     hidden_size:
+       _type: choice
+       _value: [128, 256, 512, 1024]
+     lr:
+       _type: choice
+       _value: [0.0001, 0.001, 0.01, 0.1]
+     momentum:
+       _type: uniform
+       _value: [0, 1]
+*Example:* :githublink:`config_detailed.yml <examples/trials/mnist-pytorch/config_detailed.yml>`
+You can also write your search space in a JSON file and specify the file path in the configuration. For a detailed tutorial on how to write the search space, please see `here <SearchSpaceSpec.rst>`__.
+Step 3: Config the Experiment
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In addition to the search space defined in `step 2 <step-2-define-the-search-space>`__, you need to configure the experiment in the YAML file. It specifies the key information of the experiment, such as the trial files, the tuning algorithm, the max trial number, and the max duration.
+.. code-block:: yaml
+   experimentName: MNIST                  # An optional name to distinguish the experiments
+   trialCommand: python3 mnist.py         # NOTE: change "python3" to "python" if you are using Windows
+   trialConcurrency: 2                    # Run 2 trials concurrently
+   maxTrialNumber: 10                     # Generate at most 10 trials
+   maxExperimentDuration: 1h              # Stop generating trials after 1 hour
+   tuner:                                 # Configure the tuning algorithm
+     name: TPE
+     classArgs:                           # Algorithm specific arguments
+       optimize_mode: maximize
+   trainingService:                       # Configure the training platform
+     platform: local
+The experiment config reference can be found `here <../reference/experiment_config.rst>`__.
.. _nniignore:
-.. Note:: If you are planning to use remote machines or clusters as your :doc:`training service <../TrainingService/Overview>`, to avoid too much pressure on network, we limit the number of files to 2000 and total size to 300MB. If your codeDir contains too many files, you can choose which files and subfolders should be excluded by adding a ``.nniignore`` file that works like a ``.gitignore`` file. For more details on how to write this file, see the `git documentation <https://git-scm.com/docs/gitignore#_pattern_format>`__.
+.. Note:: If you are planning to use remote machines or clusters as your :doc:`training service <../TrainingService/Overview>`, to avoid too much pressure on the network, NNI limits the number of files to 2000 and the total size to 300MB. If your codeDir contains too many files, you can choose which files and subfolders should be excluded by adding a ``.nniignore`` file that works like a ``.gitignore`` file. For more details on how to write this file, see the `git documentation <https://git-scm.com/docs/gitignore#_pattern_format>`__.
-*Example:* :githublink:`config.yml <examples/trials/mnist-pytorch/config.yml>` and :githublink:`.nniignore <examples/trials/mnist-pytorch/.nniignore>`
+*Example:* :githublink:`config_detailed.yml <examples/trials/mnist-pytorch/config_detailed.yml>` and :githublink:`.nniignore <examples/trials/mnist-pytorch/.nniignore>`
All the code above is already prepared and stored in :githublink:`examples/trials/mnist-pytorch/ <examples/trials/mnist-pytorch>`.
+Step 4: Launch the Experiment
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Linux and macOS
-^^^^^^^^^^^^^^^
+***************
-Run the **config.yml** file from your command line to start an MNIST experiment.
+Run the **config_detailed.yml** file from your command line to start the experiment.
.. code-block:: bash
-   nnictl create --config nni/examples/trials/mnist-pytorch/config.yml
+   nnictl create --config nni/examples/trials/mnist-pytorch/config_detailed.yml
Windows
-^^^^^^^
+*******
-Run the **config_windows.yml** file from your command line to start an MNIST experiment.
+Change ``python3`` to ``python`` in the ``trialCommand`` field of the **config_detailed.yml** file, and run the **config_detailed.yml** file from your command line to start the experiment.
.. code-block:: bash
-   nnictl create --config nni\examples\trials\mnist-pytorch\config_windows.yml
+   nnictl create --config nni\examples\trials\mnist-pytorch\config_detailed.yml
-.. Note:: If you're using NNI on Windows, you probably need to change ``python3`` to ``python`` in the config.yml file or use the config_windows.yml file to start the experiment.
.. Note:: ``nnictl`` is a command line tool that can be used to control experiments, such as start/stop/resume an experiment, start/stop NNIBoard, etc. Click :doc:`here <Nnictl>` for more usage of ``nnictl``.
@@ -208,24 +226,25 @@ Wait for the message ``INFO: Successfully started experiment!`` in the command l...
8. nnictl --help get help information about nnictl
-----------------------------------------------------------------------
-If you prepared ``trial``\ , ``search space``\ , and ``config`` according to the above steps and successfully created an NNI job, NNI will automatically tune the optimal hyper-parameters and run different hyper-parameter sets for each trial according to the requirements you set. You can clearly see its progress through the NNI WebUI.
+If you prepared ``trial``\ , ``search space``\ , and ``config`` according to the above steps and successfully created an NNI job, NNI will automatically tune the optimal hyper-parameters and run different hyper-parameter sets for each trial according to the defined search space. You can clearly see its progress through the WebUI.
-WebUI
------
+Step 5: View the Experiment
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
-After you start your experiment in NNI successfully, you can find a message in the command-line interface that tells you the ``Web UI url`` like this:
+After starting the experiment successfully, you can find a message in the command-line interface that tells you the ``Web UI url`` like this:
.. code-block:: text
The Web UI urls are: [Your IP]:8080
-Open the ``Web UI url`` (here it's ``[Your IP]:8080``\ ) in your browser; you can view detailed information about the experiment and all the submitted trial jobs as shown below. If you cannot open the WebUI link in your terminal, please refer to the `FAQ <FAQ.rst>`__.
+Open the ``Web UI url`` (here it's ``[Your IP]:8080``\ ) in your browser; you can view detailed information about the experiment and all the submitted trial jobs as shown below. If you cannot open the WebUI link in your terminal, please refer to the `FAQ <FAQ.rst#could-not-open-webui-link>`__.
-View overview page
-^^^^^^^^^^^^^^^^^^
+View Overview Page
+******************
-Information about this experiment will be shown in the WebUI, including the experiment trial profile and search space message. NNI also supports downloading this information and the parameters through the **Experiment summary** button.
+Information about this experiment will be shown in the WebUI, including the experiment profile and search space message. NNI also supports downloading this information and the parameters through the **Experiment summary** button.
.. image:: ../../img/webui-img/full-oview.png
@@ -233,11 +252,10 @@ Information about this experiment will be shown in the WebUI, including the expe...
:alt: overview
-View trials detail page
-^^^^^^^^^^^^^^^^^^^^^^^
-We could see best trial metrics and hyper-parameter graph in this page. And the table content includes more columns when you click the button ``Add/Remove columns``.
+View Trials Detail Page
+***********************
+You can see the best trial metrics and the hyper-parameter graph on this page. The table includes more columns when you click the ``Add/Remove columns`` button.
.. image:: ../../img/webui-img/full-detail.png
@@ -245,9 +263,8 @@ We could see best trial metrics and hyper-parameter graph in this page. And the...
:alt: detail
-View experiments management page
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+View Experiments Management Page
+********************************
On the ``All experiments`` page, you can see all the experiments on your machine.
@@ -255,22 +272,18 @@ On the ``All experiments`` page, you can see all the experiments on your machine...
:target: ../../img/webui-img/managerExperimentList/expList.png
:alt: Experiments list
-More detail please refer `the doc <./WebUI.rst>`__.
+For more detailed usage of the WebUI, please refer to `this doc <./WebUI.rst>`__.
Related Topic
-------------
-* `Launch Tensorboard on WebUI <Tensorboard.rst>`__
-* `Try different Tuners <../Tuner/BuiltinTuner.rst>`__
-* `Try different Assessors <../Assessor/BuiltinAssessor.rst>`__
-* `How to use command line tool nnictl <Nnictl.rst>`__
-* `How to write a trial <../TrialExample/Trials.rst>`__
+* `How to debug? <HowToDebug.rst>`__
+* `How to write a trial? <../TrialExample/Trials.rst>`__
+* `How to try different Tuners? <../Tuner/BuiltinTuner.rst>`__
+* `How to try different Assessors? <../Assessor/BuiltinAssessor.rst>`__
+* `How to run an experiment on the different training platforms? <../training_services.rst>`__
+* `How to use Annotation? <AnnotationSpec.rst>`__
+* `How to use the command line tool nnictl? <Nnictl.rst>`__
+* `How to launch Tensorboard on WebUI? <Tensorboard.rst>`__
* `How to run an experiment on local (with multiple GPUs)? <../TrainingService/LocalMode.rst>`__
* `How to run an experiment on multiple machines? <../TrainingService/RemoteMachineMode.rst>`__
* `How to run an experiment on OpenPAI? <../TrainingService/PaiMode.rst>`__
* `How to run an experiment on Kubernetes through Kubeflow? <../TrainingService/KubeflowMode.rst>`__
* `How to run an experiment on Kubernetes through FrameworkController? <../TrainingService/FrameworkControllerMode.rst>`__
* `How to run an experiment on Kubernetes through AdaptDL? <../TrainingService/AdaptDLMode.rst>`__
@@ -7,14 +7,13 @@ Search Space
Overview
--------
-In NNI, tuner will sample parameters/architecture according to the search space, which is defined as a json file.
+In NNI, the tuner will sample parameters/architectures according to the search space.
To define a search space, users should define the name of the variable, the type of sampling strategy and its parameters.
-* An example of a search space definition is as follow:
+* An example of a search space definition in a JSON file is as follows:
-.. code-block:: yaml
+.. code-block:: json
{
"dropout_rate": {"_type": "uniform", "_value": [0.1, 0.5]},
@@ -24,7 +23,9 @@ To define a search space, users should define the name of the variable, the type...
"learning_rate": {"_type": "uniform", "_value": [0.0001, 0.1]}
}
-Take the first line as an example. ``dropout_rate`` is defined as a variable whose priori distribution is a uniform distribution with a range from ``0.1`` to ``0.5``.
+Take the first line as an example. ``dropout_rate`` is defined as a variable whose prior distribution is a uniform distribution with a range from ``0.1`` to ``0.5``.
+.. note:: In the `experiment configuration (V2) schema <ExperimentConfig.rst>`_, NNI supports defining the search space directly in the configuration file; detailed usage can be found `here <QuickStart.rst#step-2-define-the-search-space>`__. When using the Python API, users can write the search space in the Python file; refer to `here <HowToLaunchFromPython.rst>`__.
Note that the available sampling strategies within a search space depend on the tuner you want to use. We list the supported types for each builtin tuner below. For a customized tuner, you don't have to follow our convention and you will have the flexibility to define any type you want.
@@ -38,7 +39,7 @@ All types of sampling strategies and their parameter are listed here:
``{"_type": "choice", "_value": options}``
-* The variable's value is one of the options. Here ``options`` should be a list of numbers or a list of strings. Using arbitrary objects as members of this list (like sublists, a mixture of numbers and strings, or null values) should work in most cases, but may trigger undefined behaviors.
+* The variable's value is one of the options. Here ``options`` should be a list of **numbers** or a list of **strings**. Using arbitrary objects as members of this list (like sublists, a mixture of numbers and strings, or null values) should work in most cases, but may trigger undefined behaviors.
* ``options`` can also be a nested sub-search-space; this sub-search-space takes effect only when the corresponding element is chosen. The variables in this sub-search-space can be seen as conditional variables. Here is a simple :githublink:`example of nested search space definition <examples/trials/mnist-nested-search-space/search_space.json>`. If an element in the options list is a dict, it is a sub-search-space, and for our built-in tuners you have to add a ``_name`` key in this dict, which helps you to identify which element is chosen. Accordingly, here is a :githublink:`sample <examples/trials/mnist-nested-search-space/sample.json>` which users can get from nni with a nested search space definition. See the table below for the tuners which support nested search spaces, and see the illustrative fragment below for the ``_name`` convention.
*
...
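To make the nested-sub-search-space convention described above concrete, here is an illustrative fragment (not taken from the repository) in which the inner variables take effect only when their ``_name`` option is chosen:

.. code-block:: json

   {
     "optimizer": {"_type": "choice", "_value": [
       {"_name": "sgd", "momentum": {"_type": "uniform", "_value": [0, 1]}},
       {"_name": "adam", "beta1": {"_type": "uniform", "_value": [0.9, 0.999]}}
     ]}
   }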