Commit 3ec26b40 authored by liuzhe-lz, committed by GitHub

Merge master into dev-retiarii (#3178)

parent d165905d
Contributing to Neural Network Intelligence (NNI)
=================================================
Great!! We are always on the lookout for more contributors to our code base.
Firstly, if you are unsure or afraid of anything, just ask or submit the issue or pull request anyways. You won't be yelled at for giving your best effort. The worst that can happen is that you'll be politely asked to change something. We appreciate any sort of contributions and don't want a wall of rules to get in the way of that.
However, for those who want a bit more guidance on the best way to contribute to the project, read on. This document covers the points we look for in your contributions, raising the chances that they are reviewed and merged quickly.
Looking for a quickstart? Get acquainted with our `Get Started <QuickStart.rst>`__ guide.
There are a few simple guidelines that you need to follow before providing your hacks.
Raising Issues
--------------
When raising issues, please specify the following:
* Setup details, filled in clearly as specified in the issue template, so that the reviewer can check them.
* A scenario where the issue occurred (with details on how to reproduce it).
* Errors and log messages that are displayed by the software.
* Any other details that might be useful.
Submit Proposals for New Features
---------------------------------
* There is always something more that is required to make NNI suit your use cases better. Feel free to join the discussion on new features or raise a PR with your proposed change.
* Fork the repository under your own GitHub handle. After cloning the repository, add, commit, push and squash (if necessary) your changes with detailed commit messages to your fork, from where you can proceed to make a pull request.
Contributing to Source Code and Bug Fixes
-----------------------------------------
Provide PRs with appropriate tags for bug fixes or enhancements to the source code. Follow the correct naming conventions and code styles when you work on the code, and try to address all code review comments along the way.
If you are looking for how to develop and debug the NNI source code, refer to the `How to set up NNI developer environment doc <./SetupNniDeveloperEnvironment.rst>`__ file in the ``docs`` folder.
Similarly for `Quick Start <QuickStart.rst>`__. For everything else, refer to `NNI Home page <http://nni.readthedocs.io>`__.
Solve Existing Issues
---------------------
Head over to `issues <https://github.com/Microsoft/nni/issues>`__ to find issues where help is needed from contributors. You can look for issues tagged with 'good-first-issue' or 'help-wanted' to contribute to.
A contributor can take up an issue by claiming it in a comment or by assigning their GitHub ID to it. If there is no PR or update on the issue for a week, the issue reopens for anyone to take up again. High-priority issues and regressions are an exception: they need a response within a day or so.
Code Styles & Naming Conventions
--------------------------------
* We follow `PEP8 <https://www.python.org/dev/peps/pep-0008/>`__ for Python code and naming conventions. Please adhere to it when making a pull request or a change. Linters such as ``flake8`` or ``pylint`` can help.
* We also follow `NumPy Docstring Style <https://www.sphinx-doc.org/en/master/usage/extensions/example_numpy.html#example-numpy>`__ for Python Docstring Conventions. During the `documentation building <Contributing.rst#documentation>`__\ , we use `sphinx.ext.napoleon <https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html>`__ to generate Python API documentation from Docstring.
* For docstrings, please refer to `numpydoc docstring guide <https://numpydoc.readthedocs.io/en/latest/format.html>`__ and `pandas docstring guide <https://python-sprints.github.io/pandas/guide/pandas_docstring.html>`__
* For function docstring, **description**, **Parameters**, and **Returns** or **Yields** are mandatory.
* For class docstring, **description** and **Attributes** are mandatory.
* For docstring to describe ``dict``, which is commonly used in our hyper-param format description, please refer to RiboKit Doc Standards
* `Internal Guideline on Writing Standards <https://ribokit.github.io/docs/text/>`__
Documentation
-------------
Our documentation is built with :githublink:`sphinx <docs>`.
* Before submitting a documentation change, please **build the homepage locally**: ``cd docs/en_US && make html``. You can then see all the built documentation pages under the folder ``docs/en_US/_build/html``. It is also highly recommended to take care of **every WARNING** during the build, which very likely signals a **dead link** or another annoying issue.
* For links, please consider using **relative paths** first. However, if the documentation is written in Markdown format, and:

  * It's an image link which needs to be formatted with embedded HTML, please use a global URL like ``https://user-images.githubusercontent.com/44491713/51381727-e3d0f780-1b4f-11e9-96ab-d26b9198ba65.png``, which can be generated automatically by dragging the picture onto the `GitHub Issue <https://github.com/Microsoft/nni/issues/new>`__ box.
  * It cannot be re-formatted by sphinx, such as source code, please use its global URL. For source code that links to our GitHub repo, please use URLs rooted at ``https://github.com/Microsoft/nni/tree/v1.9/`` (:githublink:`mnist.py <examples/trials/mnist-tfv1/mnist.py>` for example).

Experiment Config Reference
===========================
A config file is needed when creating an experiment. The path of the config file is provided to ``nnictl``.
The config file is in YAML format.
This document describes the rules to write the config file, and provides some examples and templates.
* `Experiment Config Reference <#experiment-config-reference>`__
* `Template <#template>`__
* `Configuration Spec <#configuration-spec>`__
* `authorName <#authorname>`__
* `experimentName <#experimentname>`__
* `trialConcurrency <#trialconcurrency>`__
* `maxExecDuration <#maxexecduration>`__
* `versionCheck <#versioncheck>`__
* `debug <#debug>`__
* `maxTrialNum <#maxtrialnum>`__
* `trainingServicePlatform <#trainingserviceplatform>`__
* `searchSpacePath <#searchspacepath>`__
* `useAnnotation <#useannotation>`__
* `multiThread <#multithread>`__
* `nniManagerIp <#nnimanagerip>`__
* `logDir <#logdir>`__
* `logLevel <#loglevel>`__
* `logCollection <#logcollection>`__
* `tuner <#tuner>`__
* `builtinTunerName <#builtintunername>`__
* `codeDir <#codedir>`__
* `classFileName <#classfilename>`__
* `className <#classname>`__
* `classArgs <#classargs>`__
* `gpuIndices <#gpuindices>`__
* `includeIntermediateResults <#includeintermediateresults>`__
* `assessor <#assessor>`__
* `builtinAssessorName <#builtinassessorname>`__
* `codeDir <#codedir-1>`__
* `classFileName <#classfilename-1>`__
* `className <#classname-1>`__
* `classArgs <#classargs-1>`__
* `advisor <#advisor>`__
* `builtinAdvisorName <#builtinadvisorname>`__
* `codeDir <#codedir-2>`__
* `classFileName <#classfilename-2>`__
* `className <#classname-2>`__
* `classArgs <#classargs-2>`__
* `gpuIndices <#gpuindices-1>`__
* `trial <#trial>`__
* `localConfig <#localconfig>`__
* `gpuIndices <#gpuindices-2>`__
* `maxTrialNumPerGpu <#maxtrialnumpergpu>`__
* `useActiveGpu <#useactivegpu>`__
* `machineList <#machinelist>`__
* `ip <#ip>`__
* `port <#port>`__
* `username <#username>`__
* `passwd <#passwd>`__
* `sshKeyPath <#sshkeypath>`__
* `passphrase <#passphrase>`__
* `gpuIndices <#gpuindices-3>`__
* `maxTrialNumPerGpu <#maxtrialnumpergpu-1>`__
* `useActiveGpu <#useactivegpu-1>`__
* `preCommand <#preCommand>`__
* `kubeflowConfig <#kubeflowconfig>`__
* `operator <#operator>`__
* `storage <#storage>`__
* `nfs <#nfs>`__
* `keyVault <#keyvault>`__
* `azureStorage <#azurestorage>`__
* `uploadRetryCount <#uploadretrycount>`__
* `paiConfig <#paiconfig>`__
* `userName <#username>`__
* `password <#password>`__
* `token <#token>`__
* `host <#host>`__
* `reuse <#reuse>`__
* `Examples <#examples>`__
* `Local mode <#local-mode>`__
* `Remote mode <#remote-mode>`__
* `PAI mode <#pai-mode>`__
* `Kubeflow mode <#kubeflow-mode>`__
* `Kubeflow with azure storage <#kubeflow-with-azure-storage>`__
Template
--------
* **Light weight (without Annotation and Assessor)**

.. code-block:: yaml

   authorName:
   experimentName:
   trialConcurrency:
   maxExecDuration:
   maxTrialNum:
   #choice: local, remote, pai, kubeflow
   trainingServicePlatform:
   searchSpacePath:
   #choice: true, false, default: false
   useAnnotation:
   #choice: true, false, default: false
   multiThread:
   tuner:
     #choice: TPE, Random, Anneal, Evolution
     builtinTunerName:
     classArgs:
       #choice: maximize, minimize
       optimize_mode:
     gpuIndices:
   trial:
     command:
     codeDir:
     gpuNum:
   #machineList can be empty if the platform is local
   machineList:
     - ip:
       port:
       username:
       passwd:

* **Use Assessor**

.. code-block:: yaml

   authorName:
   experimentName:
   trialConcurrency:
   maxExecDuration:
   maxTrialNum:
   #choice: local, remote, pai, kubeflow
   trainingServicePlatform:
   searchSpacePath:
   #choice: true, false, default: false
   useAnnotation:
   #choice: true, false, default: false
   multiThread:
   tuner:
     #choice: TPE, Random, Anneal, Evolution
     builtinTunerName:
     classArgs:
       #choice: maximize, minimize
       optimize_mode:
     gpuIndices:
   assessor:
     #choice: Medianstop
     builtinAssessorName:
     classArgs:
       #choice: maximize, minimize
       optimize_mode:
   trial:
     command:
     codeDir:
     gpuNum:
   #machineList can be empty if the platform is local
   machineList:
     - ip:
       port:
       username:
       passwd:

* **Use Annotation**

.. code-block:: yaml

   authorName:
   experimentName:
   trialConcurrency:
   maxExecDuration:
   maxTrialNum:
   #choice: local, remote, pai, kubeflow
   trainingServicePlatform:
   #choice: true, false, default: false
   useAnnotation:
   #choice: true, false, default: false
   multiThread:
   tuner:
     #choice: TPE, Random, Anneal, Evolution
     builtinTunerName:
     classArgs:
       #choice: maximize, minimize
       optimize_mode:
     gpuIndices:
   assessor:
     #choice: Medianstop
     builtinAssessorName:
     classArgs:
       #choice: maximize, minimize
       optimize_mode:
   trial:
     command:
     codeDir:
     gpuNum:
   #machineList can be empty if the platform is local
   machineList:
     - ip:
       port:
       username:
       passwd:

Configuration Spec
------------------
authorName
^^^^^^^^^^
Required. String.
The name of the author who creates the experiment.
*TBD: add default value.*
experimentName
^^^^^^^^^^^^^^
Required. String.
The name of the experiment created.
*TBD: add default value.*
trialConcurrency
^^^^^^^^^^^^^^^^
Required. Integer between 1 and 99999.
Specifies the maximum number of trial jobs that run simultaneously.
If trialGpuNum is bigger than the number of free GPUs, and the number of trial jobs running simultaneously cannot reach **trialConcurrency**, some trial jobs are put into a queue to wait for GPU allocation.
maxExecDuration
^^^^^^^^^^^^^^^
Optional. String. Default: 999d.
**maxExecDuration** specifies the maximum duration of an experiment. The time unit is one of {**s**\ , **m**\ , **h**\ , **d**\ }, which mean {*seconds*\ , *minutes*\ , *hours*\ , *days*\ }.
Note: **maxExecDuration** limits the duration of the experiment, not of a trial job. When the experiment reaches the maximum duration, it does not stop, but it can no longer submit new trial jobs.
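
As a small illustration of the unit suffixes, the fragment below caps an experiment at two hours. The value ``2h`` and the values in the comment are arbitrary examples, not recommendations.

.. code-block:: yaml

   # After 2 hours the experiment keeps running but submits no new trial jobs.
   maxExecDuration: 2h
   # Other documented units, for example: 30s (seconds), 45m (minutes), 7d (days)
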
versionCheck
^^^^^^^^^^^^
Optional. Bool. Default: true.
NNI checks the version of the nniManager process against the version of trialKeeper on the remote, pai and kubernetes platforms. To disable the version check, set versionCheck to false.
debug
^^^^^
Optional. Bool. Default: false.
Debug mode will set versionCheck to false and set logLevel to be 'debug'.
maxTrialNum
^^^^^^^^^^^
Optional. Integer between 1 and 99999. Default: 99999.
Specifies the max number of trial jobs created by NNI, including succeeded and failed jobs.
trainingServicePlatform
^^^^^^^^^^^^^^^^^^^^^^^
Required. String.
Specifies the platform to run the experiment, including **local**\ , **remote**\ , **pai**\ , **kubeflow**\ , **frameworkcontroller**.

* **local** runs an experiment on the local Ubuntu machine.
* **remote** submits trial jobs to remote Ubuntu machines. The **machineList** field should be filled in to set up the SSH connection to the remote machines.
* **pai** submits trial jobs to `OpenPAI <https://github.com/Microsoft/pai>`__ of Microsoft. For more details of pai configuration, please refer to `Guide to PAI Mode <../TrainingService/PaiMode.rst>`__
* **kubeflow** submits trial jobs to `kubeflow <https://www.kubeflow.org/docs/about/kubeflow/>`__\ . NNI supports kubeflow based on normal kubernetes and `azure kubernetes <https://azure.microsoft.com/en-us/services/kubernetes-service/>`__. For details, please refer to `Kubeflow Docs <../TrainingService/KubeflowMode.rst>`__
* **adl** submits trial jobs to `AdaptDL <https://www.kubeflow.org/docs/about/kubeflow/>`__\ . NNI supports AdaptDL on a Kubernetes cluster. For details, please refer to `AdaptDL Docs <../TrainingService/AdaptDLMode.rst>`__
* TODO: explain frameworkcontroller.

searchSpacePath
^^^^^^^^^^^^^^^
Optional. Path to existing file.
Specifies the path of the search space file, which should be a valid path on the local Linux machine.
**searchSpacePath** may be omitted only when ``useAnnotation=True``.
useAnnotation
^^^^^^^^^^^^^
Optional. Bool. Default: false.
Use annotation to analyze the trial code and generate the search space.
Note: if **useAnnotation** is true, the searchSpacePath field should be removed.
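
The relationship between **useAnnotation** and **searchSpacePath** can be sketched as two alternative fragments. The path is the illustrative one used in the examples of this document.

.. code-block:: yaml

   # Alternative 1: the search space is read from a file.
   useAnnotation: false
   searchSpacePath: /nni/search_space.json

.. code-block:: yaml

   # Alternative 2: the search space is generated from annotations in the
   # trial code, so searchSpacePath is left out.
   useAnnotation: true
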
multiThread
^^^^^^^^^^^
Optional. Bool. Default: false.
Enable multi-thread mode for dispatcher. If multiThread is enabled, dispatcher will start a thread to process each command from NNI Manager.
nniManagerIp
^^^^^^^^^^^^
Optional. String. Default: eth0 device IP.
Sets the IP address of the machine on which the NNI manager process runs. This field is optional. If it is not set, the IP of the eth0 device is used instead.
Note: run ``ifconfig`` on the NNI manager's machine to check whether the eth0 device exists. If it does not, setting **nniManagerIp** explicitly is recommended.
logDir
^^^^^^
Optional. Path to a directory. Default: ``<user home directory>/nni-experiments``.
Configures the directory to store logs and data of the experiment.
logLevel
^^^^^^^^
Optional. String. Default: ``info``.
Sets log level for the experiment. Available log levels are: ``trace``\ , ``debug``\ , ``info``\ , ``warning``\ , ``error``\ , ``fatal``.
logCollection
^^^^^^^^^^^^^
Optional. ``http`` or ``none``. Default: ``none``.
Sets the way logs are collected on the remote, pai, kubeflow and frameworkcontroller platforms. There are two options. With ``http``\ , trial keeper posts log content back through HTTP requests, which may slow down log processing in trialKeeper. With ``none``\ , trial keeper does not post log content back and only posts job metrics. If your log content is too big, consider setting this parameter to ``none``.
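
The three logging fields can be combined as in the following sketch. The directory is a hypothetical path chosen for illustration; the other values are taken from the options listed above.

.. code-block:: yaml

   # Hypothetical directory; the default is <user home directory>/nni-experiments.
   logDir: /home/user/nni-experiments
   # One of: trace, debug, info, warning, error, fatal
   logLevel: debug
   # Post trial log content back through HTTP requests (default: none).
   logCollection: http
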
tuner
^^^^^
Required.
Specifies the tuner algorithm of the experiment. There are two ways to set the tuner. One way is to use a tuner provided by the NNI sdk (built-in tuners), in which case you need to set **builtinTunerName** and **classArgs**. The other way is to use your own tuner file, in which case **codeDir**\ , **classFileName**\ , **className** and **classArgs** are needed. *Users must choose exactly one way.*
builtinTunerName
^^^^^^^^^^^^^^^^
Required if using built-in tuners. String.
Specifies the name of system tuner, NNI sdk provides different tuners introduced `here <../Tuner/BuiltinTuner.rst>`__.
codeDir
^^^^^^^
Required if using customized tuners. Path relative to the location of config file.
Specifies the directory of tuner code.
classFileName
^^^^^^^^^^^^^
Required if using customized tuners. File path relative to **codeDir**.
Specifies the name of tuner file.
className
^^^^^^^^^
Required if using customized tuners. String.
Specifies the name of tuner class.
classArgs
^^^^^^^^^
Optional. Key-value pairs. Default: empty.
Specifies the arguments of tuner algorithm. Please refer to `this file <../Tuner/BuiltinTuner.rst>`__ for the configurable arguments of each built-in tuner.
gpuIndices
^^^^^^^^^^
Optional. String. Default: empty.
Specifies the GPUs that can be used by the tuner process. Single or multiple GPU indices can be specified. Multiple GPU indices are separated by comma ``,``. For example, ``1``\ , or ``0,1,3``. If the field is not set, no GPU will be visible to tuner (by setting ``CUDA_VISIBLE_DEVICES`` to be an empty string).
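
A minimal sketch of a built-in tuner that is allowed to see two GPUs is shown below. The indices are arbitrary, and the value is quoted here only as a precaution so that YAML keeps it as a string.

.. code-block:: yaml

   tuner:
     builtinTunerName: TPE
     classArgs:
       optimize_mode: maximize
     # GPUs 0 and 1 are visible to the tuner process; leaving the field out
     # hides all GPUs from the tuner.
     gpuIndices: "0,1"
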
includeIntermediateResults
^^^^^^^^^^^^^^^^^^^^^^^^^^
Optional. Bool. Default: false.
If **includeIntermediateResults** is true, the last intermediate result of the trial that is early stopped by assessor is sent to tuner as final result.
assessor
^^^^^^^^
Specifies the assessor algorithm to run an experiment. Similar to tuners, there are two ways to set the assessor. One way is to use an assessor provided by the NNI sdk, in which case users need to set **builtinAssessorName** and **classArgs**. The other way is to use your own assessor file, in which case users need to set **codeDir**\ , **classFileName**\ , **className** and **classArgs**. *Users must choose exactly one way.*
By default, there is no assessor enabled.
builtinAssessorName
^^^^^^^^^^^^^^^^^^^
Required if using built-in assessors. String.
Specifies the name of built-in assessor, NNI sdk provides different assessors introduced `here <../Assessor/BuiltinAssessor.rst>`__.
codeDir
^^^^^^^
Required if using customized assessors. Path relative to the location of config file.
Specifies the directory of assessor code.
classFileName
^^^^^^^^^^^^^
Required if using customized assessors. File path relative to **codeDir**.
Specifies the name of assessor file.
className
^^^^^^^^^
Required if using customized assessors. String.
Specifies the name of assessor class.
classArgs
^^^^^^^^^
Optional. Key-value pairs. Default: empty.
Specifies the arguments of assessor algorithm.
advisor
^^^^^^^
Optional.
Specifies the advisor algorithm of the experiment. Similar to tuners and assessors, there are two ways to specify the advisor. One way is to use an advisor provided by the NNI sdk, in which case **builtinAdvisorName** and **classArgs** need to be set. The other way is to use your own advisor file, in which case **codeDir**\ , **classFileName**\ , **className** and **classArgs** need to be set.
When advisor is enabled, settings of tuners and assessors will be bypassed.
builtinAdvisorName
^^^^^^^^^^^^^^^^^^
Specifies the name of a built-in advisor. NNI sdk provides `BOHB <../Tuner/BohbAdvisor.md>`__ and `Hyperband <../Tuner/HyperbandAdvisor.rst>`__.
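
As a sketch, an experiment that uses a built-in advisor might contain a section like the one below. The use of ``optimize_mode`` under **classArgs** is an assumption carried over from the tuner examples; check the linked advisor pages for the arguments each advisor actually accepts.

.. code-block:: yaml

   advisor:
     #choice: BOHB, Hyperband
     builtinAdvisorName: Hyperband
     classArgs:
       # Assumed argument, mirroring the tuner examples in this document.
       optimize_mode: maximize
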
codeDir
^^^^^^^
Required if using customized advisors. Path relative to the location of config file.
Specifies the directory of advisor code.
classFileName
^^^^^^^^^^^^^
Required if using customized advisors. File path relative to **codeDir**.
Specifies the name of advisor file.
className
^^^^^^^^^
Required if using customized advisors. String.
Specifies the name of advisor class.
classArgs
^^^^^^^^^
Optional. Key-value pairs. Default: empty.
Specifies the arguments of advisor.
gpuIndices
^^^^^^^^^^
Optional. String. Default: empty.
Specifies the GPUs that can be used. Single or multiple GPU indices can be specified. Multiple GPU indices are separated by comma ``,``. For example, ``1``\ , or ``0,1,3``. If the field is not set, no GPU will be visible to tuner (by setting ``CUDA_VISIBLE_DEVICES`` to be an empty string).
trial
^^^^^
Required. Key-value pairs.
In local and remote mode, the following keys are required.

* **command**\ : Required string. Specifies the command to run the trial process.
* **codeDir**\ : Required string. Specifies the directory of your own trial file. This directory will be automatically uploaded in remote mode.
* **gpuNum**\ : Optional integer. Specifies the number of GPUs to run the trial process. Default value is 0.

In PAI mode, the following keys are required.

* **command**\ : Required string. Specifies the command to run the trial process.
* **codeDir**\ : Required string. Specifies the directory of your own trial file. Files in the directory will be uploaded in PAI mode.
* **gpuNum**\ : Required integer. Specifies the number of GPUs to run the trial process. Default value is 0.
* **cpuNum**\ : Required integer. Specifies the number of CPUs to be used in the pai container.
* **memoryMB**\ : Required integer. Sets the memory size to be used in the pai container, in megabytes.
* **image**\ : Required string. Sets the image to be used in pai.
* **authFile**\ : Optional string. Used for a Docker registry that requires authentication for image pull in PAI. `Reference <https://github.com/microsoft/pai/blob/2ea69b45faa018662bc164ed7733f6fdbb4c42b3/docs/faq.rst#q-how-to-use-private-docker-registry-job-image-when-submitting-an-openpai-job>`__.
* **shmMB**\ : Optional integer. Shared memory size of the container.
* **portList**\ : List of key-value pairs with ``label``\ , ``beginAt``\ , ``portNumber``. See `job tutorial of PAI <https://github.com/microsoft/pai/blob/master/docs/job_tutorial.rst>`__ for details.

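
The PAI keys listed above could be combined as in the following sketch. All values are placeholders, and the ``portList`` entry only shows the three documented keys; their exact semantics are described in the PAI job tutorial.

.. code-block:: yaml

   trial:
     command: python3 main.py
     codeDir: .
     gpuNum: 1
     cpuNum: 2
     memoryMB: 8192
     image: msranni/nni:latest
     # Optional keys; the values below are placeholders.
     shmMB: 1024
     portList:
       - label: test
         beginAt: 8080
         portNumber: 2
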
In Kubeflow mode, the following keys are required.

* **codeDir**\ : The local directory where the code files are located.
* **ps**\ : An optional configuration for kubeflow's tensorflow-operator, which includes

  * **replicas**\ : The replica number of the **ps** role.
  * **command**\ : The run script in **ps**\ 's container.
  * **gpuNum**\ : The number of GPUs to be used in the **ps** container.
  * **cpuNum**\ : The number of CPUs to be used in the **ps** container.
  * **memoryMB**\ : The memory size of the container.
  * **image**\ : The image to be used in **ps**.

* **worker**\ : An optional configuration for kubeflow's tensorflow-operator.

  * **replicas**\ : The replica number of the **worker** role.
  * **command**\ : The run script in **worker**\ 's container.
  * **gpuNum**\ : The number of GPUs to be used in the **worker** container.
  * **cpuNum**\ : The number of CPUs to be used in the **worker** container.
  * **memoryMB**\ : The memory size of the container.
  * **image**\ : The image to be used in **worker**.

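
A trial section that sets both roles might look like the sketch below. The replica counts, resource sizes and the script name are placeholders chosen for illustration.

.. code-block:: yaml

   trial:
     codeDir: .
     ps:
       replicas: 1
       command: python3 mnist.py
       gpuNum: 0
       cpuNum: 1
       memoryMB: 4096
       image: msranni/nni:latest
     worker:
       replicas: 2
       command: python3 mnist.py
       gpuNum: 1
       cpuNum: 1
       memoryMB: 8192
       image: msranni/nni:latest
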
localConfig
^^^^^^^^^^^
Optional in local mode. Key-value pairs.
Only applicable if **trainingServicePlatform** is set to ``local``\ ; otherwise there should be no **localConfig** section in the configuration file.
gpuIndices
^^^^^^^^^^
Optional. String. Default: none.
Specifies designated GPU devices for NNI. If it is set, only the specified GPU devices are used for NNI trial jobs. Single or multiple GPU indices can be specified. Multiple GPU indices should be separated with comma (\ ``,``\ ), such as ``1`` or ``0,1,3``. By default, all GPUs available will be used.
maxTrialNumPerGpu
^^^^^^^^^^^^^^^^^
Optional. Integer. Default: 1.
Specifies the maximum number of concurrent trials on a GPU device.
useActiveGpu
^^^^^^^^^^^^
Optional. Bool. Default: false.
Specifies whether to use a GPU that is already running another process. By default, NNI uses a GPU only if there is no other active process on it. If **useActiveGpu** is set to true, NNI uses the GPU regardless of other processes. This field is not applicable for NNI on Windows.
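
Taken together, the three **localConfig** fields could be set as in this sketch. The values are arbitrary, and ``gpuIndices`` is quoted only as a precaution so that YAML keeps it as a string.

.. code-block:: yaml

   trainingServicePlatform: local
   localConfig:
     # Only GPUs 0 and 1 are used for trial jobs.
     gpuIndices: "0,1"
     # Up to two trials may share one GPU.
     maxTrialNumPerGpu: 2
     # Use a GPU even if another process is active on it.
     useActiveGpu: true
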
machineList
^^^^^^^^^^^
Required in remote mode. A list of key-value pairs with the following keys.
ip
^^
Required. IP address or host name that is accessible from the current machine.
The IP address or host name of remote machine.
port
^^^^
Optional. Integer. Valid port. Default: 22.
The SSH port used to connect to the machine.
username
^^^^^^^^
Required if authenticating with username/password. String.
The account of the remote machine.
passwd
^^^^^^
Required if authenticating with username/password. String.
Specifies the password of the account.
sshKeyPath
^^^^^^^^^^
Required if authenticating with an SSH key. Path to private key file.
If users use an SSH key to log in to the remote machine, **sshKeyPath** should be a valid path to an SSH key file.
*Note: if users set passwd and sshKeyPath simultaneously, NNI will try passwd first.*
passphrase
^^^^^^^^^^
Optional. String.
Used to protect the SSH key. It can be empty if the key has no passphrase.
gpuIndices
^^^^^^^^^^
Optional. String. Default: none.
Specifies designated GPU devices for NNI. If it is set, only the specified GPU devices are used for NNI trial jobs. Single or multiple GPU indices can be specified. Multiple GPU indices should be separated with comma (\ ``,``\ ), such as ``1`` or ``0,1,3``. By default, all GPUs available will be used.
maxTrialNumPerGpu
^^^^^^^^^^^^^^^^^
Optional. Integer. Default: 1.
Specifies the maximum number of concurrent trials on a GPU device.
useActiveGpu
^^^^^^^^^^^^
Optional. Bool. Default: false.
Specifies whether to use a GPU that is already running another process. By default, NNI uses a GPU only if there is no other active process on it. If **useActiveGpu** is set to true, NNI uses the GPU regardless of other processes. This field is not applicable for NNI on Windows.
preCommand
^^^^^^^^^^
Optional. String.
Specifies the pre-command that is executed before the remote machine executes other commands. Users can configure the experiment environment on the remote machine by setting **preCommand**. If multiple commands need to be executed, use ``&&`` to connect them, such as ``preCommand: command1 && command2 && ...``.
**Note**\ : Because **preCommand** is executed before other commands each time, setting a **preCommand** that makes changes to the system, e.g. ``mkdir`` or ``touch``, is strongly discouraged.
remoteConfig
^^^^^^^^^^^^
Optional field in remote mode. Users can set per-machine information in the ``machineList`` field, and set global configuration for remote mode in this field.
reuse
^^^^^
Optional. Bool. default: ``false``. It's an experimental feature.
If it is true, NNI reuses remote jobs to run as many trials as possible, which saves the time of creating new jobs. Users need to make sure that each trial can run independently in the same job, for example by avoiding loading checkpoints from previous trials.
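
A minimal sketch of turning on this experimental option for remote mode is shown below. The ``machineList`` entries are configured separately and are omitted here.

.. code-block:: yaml

   trainingServicePlatform: remote
   remoteConfig:
     # Experimental: run several trials in the same remote job.
     reuse: true
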
kubeflowConfig
^^^^^^^^^^^^^^
operator
^^^^^^^^
Required. String. Has to be ``tf-operator`` or ``pytorch-operator``.
Specifies the kubeflow operator to be used. NNI supports ``tf-operator`` in the current version.
storage
^^^^^^^
Optional. String. Default: ``nfs``.
Specifies the storage type of kubeflow, including ``nfs`` and ``azureStorage``.
nfs
^^^
Required if using nfs. Key-value pairs.
* **server** is the host of the nfs server.
* **path** is the mounted path of nfs.
keyVault
^^^^^^^^
Required if using azure storage. Key-value pairs.
Set **keyVault** to store the private key of your azure storage account. Refer to https://docs.microsoft.com/en-us/azure/key-vault/key-vault-manage-with-cli2.

* **vaultName** is the value of ``--vault-name`` used in the az command.
* **name** is the value of ``--name`` used in the az command.

azureStorage
^^^^^^^^^^^^
Required if using azure storage. Key-value pairs.
Set azure storage account to store code files.
* **accountName** is the name of the azure storage account.
* **azureShare** is the share of the azure file storage.
uploadRetryCount
^^^^^^^^^^^^^^^^
Required if using azure storage. Integer between 1 and 99999.
If uploading files to azure storage fails, NNI retries the upload. This field specifies the number of attempts to re-upload files.
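
The azure storage fields could be combined as in the following sketch. The names are the placeholder values used in the examples of this document, the retry count is arbitrary, and placing ``uploadRetryCount`` directly under ``kubeflowConfig`` is an assumption based on how the fields are grouped in this reference.

.. code-block:: yaml

   kubeflowConfig:
     operator: tf-operator
     storage: azureStorage
     keyVault:
       vaultName: Contoso-Vault
       name: AzureStorageAccountKey
     azureStorage:
       accountName: storage
       azureShare: share01
     # Assumed position; retry a failed upload up to 5 times.
     uploadRetryCount: 5
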
paiConfig
^^^^^^^^^
userName
^^^^^^^^
Required. String.
The user name of your pai account.
password
^^^^^^^^
Required if using password authentication. String.
The password of the pai account.
token
^^^^^
Required if using token authentication. String.
Personal access token that can be retrieved from PAI portal.
host
^^^^
Required. String.
The hostname or IP address of PAI.
reuse
^^^^^
Optional. Bool. default: ``false``. It's an experimental feature.
If it is true, NNI reuses OpenPAI jobs to run as many trials as possible, which saves the time of creating new jobs. Users need to make sure that each trial can run independently in the same job, for example by avoiding loading checkpoints from previous trials.
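
As a sketch of token authentication, a ``paiConfig`` section might look like the one below. The user name, token and host are placeholders.

.. code-block:: yaml

   trainingServicePlatform: pai
   paiConfig:
     userName: test
     # Placeholder; use the personal access token retrieved from the PAI portal.
     token: "YOUR_PAI_TOKEN"
     host: 10.10.10.10
     # Experimental: run several trials in the same OpenPAI job.
     reuse: true
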
Examples
--------
Local mode
^^^^^^^^^^
If users want to run trial jobs on the local machine and use annotation to generate the search space, they could use the following config:

.. code-block:: yaml

   authorName: test
   experimentName: test_experiment
   trialConcurrency: 3
   maxExecDuration: 1h
   maxTrialNum: 10
   #choice: local, remote, pai, kubeflow
   trainingServicePlatform: local
   #choice: true, false
   useAnnotation: true
   tuner:
     #choice: TPE, Random, Anneal, Evolution
     builtinTunerName: TPE
     classArgs:
       #choice: maximize, minimize
       optimize_mode: maximize
   trial:
     command: python3 mnist.py
     codeDir: /nni/mnist
     gpuNum: 0

You can add assessor configuration.

.. code-block:: yaml

   authorName: test
   experimentName: test_experiment
   trialConcurrency: 3
   maxExecDuration: 1h
   maxTrialNum: 10
   #choice: local, remote, pai, kubeflow
   trainingServicePlatform: local
   searchSpacePath: /nni/search_space.json
   #choice: true, false
   useAnnotation: false
   tuner:
     #choice: TPE, Random, Anneal, Evolution
     builtinTunerName: TPE
     classArgs:
       #choice: maximize, minimize
       optimize_mode: maximize
   assessor:
     #choice: Medianstop
     builtinAssessorName: Medianstop
     classArgs:
       #choice: maximize, minimize
       optimize_mode: maximize
   trial:
     command: python3 mnist.py
     codeDir: /nni/mnist
     gpuNum: 0

Or you could specify your own tuner and assessor files as follows:

.. code-block:: yaml

   authorName: test
   experimentName: test_experiment
   trialConcurrency: 3
   maxExecDuration: 1h
   maxTrialNum: 10
   #choice: local, remote, pai, kubeflow
   trainingServicePlatform: local
   searchSpacePath: /nni/search_space.json
   #choice: true, false
   useAnnotation: false
   tuner:
     codeDir: /nni/tuner
     classFileName: mytuner.py
     className: MyTuner
     classArgs:
       #choice: maximize, minimize
       optimize_mode: maximize
   assessor:
     codeDir: /nni/assessor
     classFileName: myassessor.py
     className: MyAssessor
     classArgs:
       #choice: maximize, minimize
       optimize_mode: maximize
   trial:
     command: python3 mnist.py
     codeDir: /nni/mnist
     gpuNum: 0

Remote mode
^^^^^^^^^^^
To run trial jobs on remote machines, users could specify the remote machine information in the following format:

.. code-block:: yaml

   authorName: test
   experimentName: test_experiment
   trialConcurrency: 3
   maxExecDuration: 1h
   maxTrialNum: 10
   #choice: local, remote, pai, kubeflow
   trainingServicePlatform: remote
   searchSpacePath: /nni/search_space.json
   #choice: true, false
   useAnnotation: false
   tuner:
     #choice: TPE, Random, Anneal, Evolution
     builtinTunerName: TPE
     classArgs:
       #choice: maximize, minimize
       optimize_mode: maximize
   trial:
     command: python3 mnist.py
     codeDir: /nni/mnist
     gpuNum: 0
   #machineList can be empty if the platform is local
   machineList:
     - ip: 10.10.10.10
       port: 22
       username: test
       passwd: test
     - ip: 10.10.10.11
       port: 22
       username: test
       passwd: test
     - ip: 10.10.10.12
       port: 22
       username: test
       sshKeyPath: /nni/sshkey
       passphrase: qwert
       # Pre-command will be executed before the remote machine executes other commands.
       # Below is an example of specifying python environment.
       # If you want to execute multiple commands, please use "&&" to connect them.
       # preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
       # preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
       preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH

PAI mode
^^^^^^^^

.. code-block:: yaml

   authorName: test
   experimentName: nni_test1
   trialConcurrency: 1
   maxExecDuration: 500h
   maxTrialNum: 1
   #choice: local, remote, pai, kubeflow
   trainingServicePlatform: pai
   searchSpacePath: search_space.json
   #choice: true, false
   useAnnotation: false
   tuner:
     #choice: TPE, Random, Anneal, Evolution, BatchTuner
     #SMAC (SMAC should be installed through nnictl)
     builtinTunerName: TPE
     classArgs:
       #choice: maximize, minimize
       optimize_mode: maximize
   trial:
     command: python3 main.py
     codeDir: .
     gpuNum: 4
     cpuNum: 2
     memoryMB: 10000
     #The docker image to run NNI job on pai
     image: msranni/nni:latest
   paiConfig:
     #The username to login pai
     userName: test
     #The password to login pai
     passWord: test
     #The host of restful server of pai
     host: 10.10.10.10

Kubeflow mode
^^^^^^^^^^^^^
Kubeflow with NFS storage.
.. code-block:: yaml

   authorName: default
   experimentName: example_mni
   trialConcurrency: 1
   maxExecDuration: 1h
   maxTrialNum: 1
   #choice: local, remote, pai, kubeflow
   trainingServicePlatform: kubeflow
   searchSpacePath: search_space.json
   #choice: true, false
   useAnnotation: false
   tuner:
     #choice: TPE, Random, Anneal, Evolution
     builtinTunerName: TPE
     classArgs:
       #choice: maximize, minimize
       optimize_mode: maximize
   trial:
     codeDir: .
     worker:
       replicas: 1
       command: python3 mnist.py
       gpuNum: 0
       cpuNum: 1
       memoryMB: 8192
       image: msranni/nni:latest
   kubeflowConfig:
     operator: tf-operator
     nfs:
       server: 10.10.10.10
       path: /var/nfs/general
Kubeflow with Azure storage
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml

   authorName: default
   experimentName: example_mni
   trialConcurrency: 1
   maxExecDuration: 1h
   maxTrialNum: 1
   #choice: local, remote, pai, kubeflow
   trainingServicePlatform: kubeflow
   searchSpacePath: search_space.json
   #choice: true, false
   useAnnotation: false
   #nniManagerIp: 10.10.10.10
   tuner:
     #choice: TPE, Random, Anneal, Evolution
     builtinTunerName: TPE
     classArgs:
       #choice: maximize, minimize
       optimize_mode: maximize
   assessor:
     builtinAssessorName: Medianstop
     classArgs:
       optimize_mode: maximize
   trial:
     codeDir: .
     worker:
       replicas: 1
       command: python3 mnist.py
       gpuNum: 0
       cpuNum: 1
       memoryMB: 4096
       image: msranni/nni:latest
   kubeflowConfig:
     operator: tf-operator
     keyVault:
       vaultName: Contoso-Vault
       name: AzureStorageAccountKey
     azureStorage:
       accountName: storage
       azureShare: share01
FAQ
===
This page collects frequently asked questions and their answers.
tmp folder is full
^^^^^^^^^^^^^^^^^^
nnictl uses the tmp folder as temporary storage when it copies the files under codeDir during experiment creation.
If you see an error like the one below, try cleaning up the **tmp** folder first.
..
OSError: [Errno 28] No space left on device
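Before cleaning up, it can help to confirm that the temporary folder really is short on space. The sketch below uses only the Python standard library; it assumes that the folder reported by ``tempfile.gettempdir()`` is the same tmp folder nnictl uses, which may not hold on every system.

```python
import shutil
import tempfile


def free_space_mb(path):
    """Return the free space, in MiB, of the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / (1024 * 1024)


# Assumption: nnictl copies files into the system temp directory.
tmp_dir = tempfile.gettempdir()  # usually /tmp on Linux
print(f"{tmp_dir}: {free_space_mb(tmp_dir):.0f} MiB free")
```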
Cannot get trials' metrics in OpenPAI mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In OpenPAI training mode, NNI Manager starts a REST server that listens on port 51189 to receive metrics reported by trials running in the OpenPAI cluster. If you don't see any metrics in the WebUI in OpenPAI mode, check the machine where NNI Manager runs and make sure port 51189 is open in its firewall rules.
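One way to test the port from another machine is to try opening a TCP connection to it. The following is a minimal sketch using the standard library; ``port_is_reachable`` is an illustrative helper, and a successful connection only shows that something is listening and reachable, not that metrics are being delivered.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Replace 127.0.0.1 with the IP of the machine running NNI Manager.
    print(port_is_reachable("127.0.0.1", 51189))
```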
Segmentation Fault (core dumped) when installing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: text
make: *** [install-XXX] Segmentation fault (core dumped)
Please try the following solutions in turn:
* Update or reinstall your current Python's pip, for example ``python3 -m pip install -U pip``
* Install NNI with the ``--no-cache-dir`` flag, for example ``python3 -m pip install nni --no-cache-dir``
Job management error: getIPV4Address() failed because os.networkInterfaces().eth0 is undefined.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Your machine doesn't have an eth0 device. Please set `nniManagerIp <ExperimentConfig.rst>`__ in your config file manually.
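If you are unsure which address to use for ``nniManagerIp``, the sketch below makes a best-effort guess at the IP of the default network interface. It is a generic standard-library trick, not something NNI provides; the probe address is arbitrary, and on multi-homed machines the result may not be the address your trials can reach.

```python
import socket


def guess_local_ip(probe_host="10.255.255.255", probe_port=1):
    """Best-effort guess of the IP address of the default network interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only selects a route.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route found
    finally:
        sock.close()


print("nniManagerIp:", guess_local_ip())
```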
Exceed the MaxDuration but didn't stop
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When the experiment reaches its maximum duration, nniManager stops creating new trials, but existing trials keep running unless the user manually stops the experiment.
Could not stop an experiment using ``nnictl stop``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you upgrade NNI or delete some of NNI's config files while an experiment is running, this kind of issue may happen because the config file is lost. You can use ``ps -ef | grep node`` to find the PID of your experiment and ``kill -9 {pid}`` to kill it manually.
Could not get ``default metric`` in webUI of virtual machines
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Configure the network mode as bridge mode, or another mode that makes the virtual machine's host accessible from external machines, and make sure the virtual machine's port is not blocked by a firewall.
Could not open webUI link
^^^^^^^^^^^^^^^^^^^^^^^^^
There are several possible reasons why the WebUI cannot be opened:
* ``http://127.0.0.1``\ , ``http://172.17.0.1`` and ``http://10.0.0.15`` refer to localhost. If you start your experiment on a server or remote machine, replace the IP with your server's IP to view the WebUI, like ``http://[your_server_ip]:8080``
* If you still can't see the WebUI after using the server IP, check the proxy and firewall settings of your machine, or use a browser on the machine where you started your NNI experiment.
* Another possible reason is that your experiment has failed, so NNI cannot get the experiment information. You can check the NNIManager log at ``~/nni-experiments/[your_experiment_id]/log/nnimanager.log``
Restful server start failed
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Probably it's a problem with your network config. Here is a checklist.
* You might need to link ``127.0.0.1`` with ``localhost``. Add a line ``127.0.0.1 localhost`` to ``/etc/hosts``.
* It's also possible that you have set some proxy config. Check your environment for variables like ``HTTP_PROXY`` or ``HTTPS_PROXY`` and unset them if they are set.
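The proxy check from the list above can be scripted. The sketch below lists proxy-related variables that are currently set; including the lowercase spellings is an assumption based on common convention, not something stated by NNI.

```python
import os

# Lowercase variants are included by assumption; many tools honor both spellings.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy")


def active_proxy_settings(environ=None):
    """Return the proxy-related variables that are set and non-empty."""
    env = os.environ if environ is None else environ
    return {name: env[name] for name in PROXY_VARS if env.get(name)}


for name, value in active_proxy_settings().items():
    print(f"{name}={value}  (try: unset {name})")
```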
NNI on Windows problems
^^^^^^^^^^^^^^^^^^^^^^^
Please refer to `NNI on Windows <InstallationWin.rst>`__
More FAQ issues
^^^^^^^^^^^^^^^
`NNI Issues with FAQ labels <https://github.com/microsoft/nni/labels/FAQ>`__
Help us improve
^^^^^^^^^^^^^^^
Please search https://github.com/Microsoft/nni/issues to see whether other people have already reported the problem, and create a new issue if none exists.
**How to Debug in NNI**
===========================
Overview
--------
There are three parts that might have logs in NNI: nnimanager, dispatcher and trial. They are introduced briefly here; for more information, please refer to `Overview <../Overview.rst>`__.
* **NNI controller**\ : NNI controller (nnictl) is the nni command-line tool that is used to manage experiments (e.g., start an experiment).
* **nnimanager**\ : nnimanager is the core of NNI, whose log is important when the whole experiment fails (e.g., no webUI or training service fails)
* **Dispatcher**\ : Dispatcher calls the methods of **Tuner** and **Assessor**. Logs of dispatcher are related to the tuner or assessor code.
* **Tuner**\ : Tuner is an AutoML algorithm, which generates a new configuration for the next try. A new trial will run with this configuration.
* **Assessor**\ : Assessor analyzes trial's intermediate results (e.g., periodically evaluated accuracy on test dataset) to tell whether this trial can be early stopped or not.
* **Trial**\ : Trial code is the code you write to run your experiment, which is an individual attempt at applying a new configuration (e.g., a set of hyperparameter values, a specific neural architecture).
Where is the log
----------------
There are three kinds of logs in NNI. When creating a new experiment, you can set the log level to debug by adding ``--debug``. In addition, you can set a more specific log level in your configuration file with the
``logLevel`` keyword. Available log levels are: ``trace``\ , ``debug``\ , ``info``\ , ``warning``\ , ``error``\ , ``fatal``.
NNI controller
^^^^^^^^^^^^^^
All possible errors that happen when launching an NNI experiment can be found here.
You can use ``nnictl log stderr`` to find error information. For more options please refer to `NNICTL <Nnictl.rst>`__
Experiment Root Directory
^^^^^^^^^^^^^^^^^^^^^^^^^
Every experiment has a root folder, which is shown in the top-right corner of the webUI. If the webUI fails, you can assemble the path yourself by replacing ``experiment_id`` in ``~/nni-experiments/experiment_id/`` with your actual experiment ID. The experiment ID is displayed when you run ``nnictl create ...`` to create a new experiment.
..
For flexibility, we also offer a ``logDir`` option in your configuration, which specifies the directory to store all experiments (defaults to ``~/nni-experiments``\ ). Please refer to `Configuration <ExperimentConfig.rst>`__ for more details.
Under that directory, there is another directory named ``log``\ , where ``nnimanager.log`` and ``dispatcher.log`` are placed.
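The directory layout described above can be turned into a small path helper. This is a sketch only: ``experiment_log_paths`` is a hypothetical function, and ``egchD4qy`` is a sample experiment ID.

```python
from pathlib import Path


def experiment_log_paths(experiment_id, log_dir="~/nni-experiments"):
    """Return the expected nnimanager.log and dispatcher.log paths."""
    log_root = Path(log_dir).expanduser() / experiment_id / "log"
    return {
        "nnimanager": log_root / "nnimanager.log",
        "dispatcher": log_root / "dispatcher.log",
    }


# "egchD4qy" is just a sample experiment ID.
for name, path in experiment_log_paths("egchD4qy").items():
    status = "found" if path.exists() else "missing"
    print(f"{name}: {path} ({status})")
```

If you set ``logDir`` in your configuration, pass the same value as ``log_dir``.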
Trial Root Directory
^^^^^^^^^^^^^^^^^^^^
Usually in the webUI, you can click ``+`` on the left of each trial to expand it and see that trial's log path.
In addition, the experiment root directory contains another directory named ``trials``\ , which stores all the trials.
Every trial has a unique id as its directory name. In this directory, a file named ``stderr`` records trial error and another named ``trial.log`` records this trial's log.
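When an experiment has many trials, scanning the ``trials`` directory can be quicker than expanding each trial in the webUI. The sketch below lists trials whose ``stderr`` file is non-empty. It assumes the layout described above; a non-empty ``stderr`` is only a hint, since some libraries write warnings there even when a trial succeeds.

```python
from pathlib import Path


def trials_with_stderr(experiment_root):
    """Return IDs of trials whose stderr file exists and is non-empty."""
    trials_dir = Path(experiment_root).expanduser() / "trials"
    if not trials_dir.is_dir():
        return []
    return sorted(
        trial.name
        for trial in trials_dir.iterdir()
        if (trial / "stderr").is_file() and (trial / "stderr").stat().st_size > 0
    )


# "egchD4qy" is just a sample experiment ID.
print(trials_with_stderr("~/nni-experiments/egchD4qy"))
```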
Different kinds of errors
-------------------------
There are different kinds of errors. However, they can be divided into three categories based on their severity. So when nni fails, check each part sequentially.
Generally, if webUI is started successfully, there is a ``Status`` in the ``Overview`` tab, serving as a possible indicator of what kind of error happens. Otherwise you should check manually.
**NNI** Fails
^^^^^^^^^^^^^^^^^
This is the most serious error. When this happens, the whole experiment fails and no trial will be run. Usually this might be related to some installation problem.
When this happens, you should check ``nnictl``\ 's error output file ``stderr`` (i.e., nnictl log stderr) and then the ``nnimanager``\ 's log to find if there is any error.
**Dispatcher** Fails
^^^^^^^^^^^^^^^^^^^^^^^^
Usually, for new users of NNI, a dispatcher failure means that the tuner has failed. You can check the dispatcher's log to see what happened. For built-in tuners, common errors include an invalid search space (an unsupported type of search space) or an inconsistency between the initializing args in the configuration file and the args of the tuner's actual ``__init__`` function.
Take the latter situation as an example. If you write a customized tuner whose ``__init__`` function has an argument called ``optimize_mode``\ , which you do not provide in your configuration file, NNI will fail to run your tuner, so the experiment fails. You can see errors in the webUI like:
.. image:: ../../img/dispatcher_error.jpg
:target: ../../img/dispatcher_error.jpg
:alt:
Here we can see it is a dispatcher error. So we can check dispatcher's log, which might look like:
.. code-block:: text

   [2019-02-19 19:36:45] DEBUG (nni.main/MainThread) START
   [2019-02-19 19:36:47] ERROR (nni.main/MainThread) __init__() missing 1 required positional arguments: 'optimize_mode'
   Traceback (most recent call last):
     File "/usr/lib/python3.7/site-packages/nni/__main__.py", line 202, in <module>
       main()
     File "/usr/lib/python3.7/site-packages/nni/__main__.py", line 164, in main
       args.tuner_args)
     File "/usr/lib/python3.7/site-packages/nni/__main__.py", line 81, in create_customized_class_instance
       instance = class_constructor(**class_args)
   TypeError: __init__() missing 1 required positional arguments: 'optimize_mode'.
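The failure in the log above can be reproduced without NNI. The sketch below uses a stand-in class (``DemoTuner`` here is illustrative and does not subclass NNI's tuner base class) and mirrors the ``class_constructor(**class_args)`` call from the traceback: when ``classArgs`` is missing from the configuration, the constructor receives no keyword arguments and raises ``TypeError``.

```python
class DemoTuner:
    """Stand-in for a customized tuner; a real one would subclass NNI's Tuner."""

    def __init__(self, optimize_mode):
        self.optimize_mode = optimize_mode


def create_instance(class_constructor, class_args):
    # Mirrors the call shown in the traceback: classArgs become keyword args.
    return class_constructor(**class_args)


try:
    create_instance(DemoTuner, {})  # classArgs omitted in the config file
except TypeError as err:
    print("dispatcher would fail with:", err)

# Supplying optimize_mode in classArgs resolves the error.
tuner = create_instance(DemoTuner, {"optimize_mode": "maximize"})
print(tuner.optimize_mode)
```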
**Trial** Fails
^^^^^^^^^^^^^^^^^^^
In this situation, NNI can still run and create new trials.
It means your trial code (which is run by NNI) fails. This kind of error is strongly related to your trial code. Please check trial's log to fix any possible errors shown there.
A common example is running the mnist example without installing TensorFlow. The trial code then raises an import error (it tries to import TensorFlow, which is not installed), and thus every trial fails.
.. image:: ../../img/trial_error.jpg
:target: ../../img/trial_error.jpg
:alt:
As the screenshot shows, every trial has a log path, where you can find the trial's log and stderr.
In addition to experiment-level debugging, NNI also provides the capability to debug a single trial without starting the entire experiment. Refer to `standalone mode <../TrialExample/Trials#standalone-mode-for-debugging>`__ for more information about debugging a single trial's code.
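Import errors like the missing TensorFlow case described earlier can be caught before an experiment is launched. The sketch below checks whether the top-level modules your trial code needs are importable; the module list is an example and should be adapted to your own trial.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Adjust the list to whatever your trial code imports.
missing = missing_modules(["tensorflow", "nni"])
if missing:
    print("Install before running trials:", ", ".join(missing))
```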
**How to Use Docker in NNI**
================================
Overview
--------
`Docker <https://www.docker.com/>`__ is a tool that makes it easier for users to deploy and run applications on their own operating system by starting containers. Docker is not a virtual machine: it does not create a virtual operating system. Instead, it lets different applications share the same OS kernel while isolating them from each other in containers.
Users can start NNI experiments using Docker. NNI also provides an official Docker image `msranni/nni <https://hub.docker.com/r/msranni/nni>`__ on Docker Hub.
Using Docker in local machine
-----------------------------
Step 1: Installation of Docker
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Before you start using Docker for NNI experiments, you should install Docker on your local machine. `See here <https://docs.docker.com/install/linux/docker-ce/ubuntu/>`__.
Step 2: Start a Docker container
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you have installed Docker on your local machine, you can start a Docker container to run NNI examples. Because NNI starts a web UI process inside the container that listens on a port, you need to specify a port mapping between your host machine and the Docker container so that the web UI is accessible from outside the container. By visiting the host IP address and port, you are redirected to the web UI process running in the container.
For example, you could start a new Docker container with the following command:
.. code-block:: bash
docker run -i -t -p [hostPort]:[containerPort] [image]
``-i:`` Start the container in interactive mode.
``-t:`` Allocate a terminal (pseudo-TTY) for the container.
``-p:`` Port mapping; map a host port to a container port.
For more information about Docker commands, please `refer to this <https://docs.docker.com/v17.09/edge/engine/reference/run/>`__.
Note:
.. code-block:: text

   NNI only supports Ubuntu and macOS systems in local mode for the moment; please use the correct Docker image type. If you want to use a GPU in a Docker container, please use nvidia-docker.
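If you launch containers from a script, the ``docker run`` invocation from Step 2 can be assembled as an argument list. The sketch below only builds and prints the command; ``docker_run_command`` is a hypothetical helper, and port 8080 is used because it appears in the Web UI URLs elsewhere in this documentation.

```python
import shlex


def docker_run_command(image, host_port, container_port, detach=False):
    """Build the argument list for `docker run` with one port mapping."""
    # -i keeps stdin open, -t allocates a terminal, -d runs in the background.
    flags = "-dit" if detach else "-it"
    return ["docker", "run", flags, "-p", f"{host_port}:{container_port}", image]


cmd = docker_run_command("msranni/nni", host_port=8080, container_port=8080)
# Pass `cmd` to subprocess.run(...) to actually start the container.
print(" ".join(shlex.quote(arg) for arg in cmd))
```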
Step 3: Run NNI in a Docker container
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you start a Docker image using NNI's official image ``msranni/nni``\ , you can directly start NNI experiments by using the ``nnictl`` command. Our official image has NNI's running environment and basic python and deep learning frameworks preinstalled.
If you start your own Docker image, you may need to install the NNI package first; please refer to `NNI installation <InstallationLinux.rst>`__.
If you want to run NNI's official examples, you may need to clone the NNI repo in GitHub using
.. code-block:: bash
git clone https://github.com/Microsoft/nni.git
then you can enter ``nni/examples/trials`` to start an experiment.
After you prepare NNI's environment, you can start a new experiment using the ``nnictl`` command. `See here <QuickStart.rst>`__.
Using Docker on a remote platform
---------------------------------
NNI supports starting experiments in `remoteTrainingService <../TrainingService/RemoteMachineMode.rst>`__\ , and running trial jobs on remote machines. As Docker can start an independent Ubuntu system as an SSH server, a Docker container can be used as the remote machine in NNI's remote mode.
Step 1: Setting a Docker environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You should install Docker on your remote machine first; please `refer to this <https://docs.docker.com/install/linux/docker-ce/ubuntu/>`__.
To make sure your Docker container can be connected by NNI experiments, you should build your own Docker image to set an SSH server or use images with an SSH configuration. If you want to use a Docker container as an SSH server, you should configure the SSH password login or private key login; please `refer to this <https://docs.docker.com/engine/examples/running_ssh_service/>`__.
Note:
.. code-block:: text
NNI's official image msranni/nni does not support SSH servers for the time being; you should build your own Docker image with an SSH configuration or use other images as a remote server.
Step 2: Start a Docker container on a remote machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
An SSH server needs a port, so you need to expose the container's SSH port to NNI as the connection port. For example, if your container's SSH port is ``A``, map it to another port ``B`` on your remote host machine. NNI connects to port ``B`` as the SSH port, the host machine forwards the connection from port ``B`` to port ``A``, and NNI can then reach your Docker container.
For example, you could start your Docker container using the following commands:
.. code-block:: bash
docker run -dit -p [hostPort]:[containerPort] [image]
The ``containerPort`` is the SSH port used in your Docker container and the ``hostPort`` is your host machine's port exposed to NNI. You can set your NNI config file to connect to ``hostPort``, and the connection will be forwarded to your Docker container.
For more information about Docker commands, please `refer to this <https://docs.docker.com/v17.09/edge/engine/reference/run/>`__.
Note:
.. code-block:: text

   If you use your own Docker image as a remote server, please make sure that this image has a basic python environment and an NNI SDK runtime environment. If you want to use a GPU in a Docker container, please use nvidia-docker.
Step 3: Run NNI experiments
^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can set your config file to use the remote platform and set the ``machineList`` configuration to connect to your Docker SSH server; `refer to this <../TrainingService/RemoteMachineMode.rst>`__. Note that you should set the correct ``port``\ , ``username``\ , and ``passwd`` or ``sshKeyPath`` of your host machine.
``port:`` The host machine's port, mapping to Docker's SSH port.
``username:`` The username of the Docker container.
``passwd:`` The password of the Docker container.
``sshKeyPath:`` The path of the private key of the Docker container.
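The relationship between the Docker port mapping and the ``machineList`` entry can be sketched as follows. ``remote_docker_setup`` is a hypothetical helper, host port ``10022`` is an arbitrary example, and the password key is spelled ``passwd`` as in the remote mode configuration example.

```python
def remote_docker_setup(host_ip, host_port, container_ssh_port, username, passwd):
    """Pair the docker port mapping with the matching machineList entry."""
    port_mapping = f"{host_port}:{container_ssh_port}"  # value for `docker run -p`
    machine = {
        "ip": host_ip,
        "port": host_port,  # NNI connects to the host port, not the container's
        "username": username,
        "passwd": passwd,
    }
    return port_mapping, machine


# 10022 is an arbitrary free host port chosen for illustration.
mapping, machine = remote_docker_setup("10.10.10.10", 10022, 22, "test", "test")
print(mapping)
print(machine)
```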
After the configuration of the config file, you could start an experiment, `refer to this <QuickStart.rst>`__.
**How to install customized algorithms as builtin tuners, assessors and advisors**
======================================================================================
Overview
--------
NNI provides a lot of `builtin tuners <../Tuner/BuiltinTuner.md>`__\ , `advisors <../Tuner/HyperbandAdvisor.md>`__ and `assessors <../Assessor/BuiltinAssessor.rst>`__ that can be used directly for hyperparameter optimization, and some extra algorithms can be installed via ``nnictl package install --name <name>`` after NNI is installed. You can list these extra algorithms with the ``nnictl package list`` command.
NNI also provides the ability to build your own customized tuners, advisors and assessors. To use the customized algorithm, users can simply follow the spec in experiment config file to properly reference the algorithm, which has been illustrated in the tutorials of `customized tuners <../Tuner/CustomizeTuner.md>`__\ /\ `advisors <../Tuner/CustomizeAdvisor.md>`__\ /\ `assessors <../Assessor/CustomizeAssessor.rst>`__.
NNI also allows users to install a customized algorithm as a builtin algorithm, so that it can be used in the same way as NNI's builtin tuners/advisors/assessors. More importantly, this makes it much easier to share or distribute the algorithm to others. Once a customized tuner/advisor/assessor is installed into NNI as a builtin algorithm, you can reference it in your experiment configuration file just like any other builtin algorithm. For example, if you built a customized tuner and installed it into NNI under the builtin name ``mytuner``\ , you can use this tuner in your configuration file as follows:
.. code-block:: yaml

   tuner:
     builtinTunerName: mytuner
Install customized algorithms as builtin tuners, assessors and advisors
-----------------------------------------------------------------------
You can follow the steps below to build a customized tuner/assessor/advisor and install it into NNI as a builtin algorithm.
1. Create a customized tuner/assessor/advisor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Refer to the following instructions to create one:
* `customized tuner <../Tuner/CustomizeTuner.rst>`__
* `customized assessor <../Assessor/CustomizeAssessor.rst>`__
* `customized advisor <../Tuner/CustomizeAdvisor.rst>`__
2. (Optional) Create a validator to validate classArgs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NNI provides a ``ClassArgsValidator`` interface that lets authors of customized algorithms validate the classArgs parameters in the experiment configuration file, which are passed to the constructors of customized algorithms.
The ``ClassArgsValidator`` interface is defined as:
.. code-block:: python

   class ClassArgsValidator(object):
       def validate_class_args(self, **kwargs):
           """
           The classArgs fields in experiment configuration are packed as a dict and
           passed to validator as kwargs.
           """
           pass
For example, you can implement your validator as follows:
.. code-block:: python

   from schema import Schema, Optional
   from nni import ClassArgsValidator

   class MedianstopClassArgsValidator(ClassArgsValidator):
       def validate_class_args(self, **kwargs):
           Schema({
               Optional('optimize_mode'): self.choices('optimize_mode', 'maximize', 'minimize'),
               Optional('start_step'): self.range('start_step', int, 0, 9999),
           }).validate(kwargs)
The validator will be invoked before the experiment starts to check whether the classArgs fields are valid for your customized algorithms.
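To see what such a check amounts to without installing ``schema`` or NNI, here is a dependency-free sketch of the same two rules. ``SimpleClassArgsValidator`` is a stand-in written for illustration; it does not inherit from NNI's ``ClassArgsValidator``, it raises ``ValueError`` rather than a schema error, and it assumes the ``start_step`` bounds are inclusive.

```python
class SimpleClassArgsValidator:
    """Dependency-free stand-in that mirrors the validate_class_args() hook."""

    def validate_class_args(self, **kwargs):
        # Both fields are optional, so each is checked only when present.
        if "optimize_mode" in kwargs:
            mode = kwargs["optimize_mode"]
            if mode not in ("maximize", "minimize"):
                raise ValueError(f"optimize_mode must be maximize or minimize, got {mode!r}")
        if "start_step" in kwargs:
            step = kwargs["start_step"]
            # Assumption: the range 0..9999 is inclusive at both ends.
            if not isinstance(step, int) or not 0 <= step <= 9999:
                raise ValueError(f"start_step must be an int in [0, 9999], got {step!r}")


validator = SimpleClassArgsValidator()
validator.validate_class_args(optimize_mode="maximize", start_step=5)  # passes
try:
    validator.validate_class_args(optimize_mode="fastest")
except ValueError as err:
    print("rejected:", err)
```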
3. Prepare package installation source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To be installed as builtin tuners, assessors and advisors, customized algorithms need to be packaged as an installable source that the ``pip`` command can recognize; under the hood, NNI calls ``pip`` to install the package.
Besides being a common pip source, the package needs to provide meta information in the ``classifiers`` field.
The format of the classifiers field is as follows:
.. code-block:: text

   NNI Package :: <type> :: <builtin name> :: <full class name of tuner> :: <full class name of class args validator>
* ``type``\ : type of algorithms, could be one of ``tuner``\ , ``assessor``\ , ``advisor``
* ``builtin name``\ : builtin name used in experiment configuration file
* ``full class name of tuner``\ : tuner class name, including its module name, for example: ``demo_tuner.DemoTuner``
* ``full class name of class args validator``\ : class args validator class name, including its module name, for example: ``demo_tuner.MyClassArgsValidator``
The following is an example of classifiers in the package's ``setup.py``\ :
.. code-block:: python

   classifiers = [
       'Programming Language :: Python :: 3',
       'License :: OSI Approved :: MIT License',
       'Operating System :: ',
       'NNI Package :: tuner :: demotuner :: demo_tuner.DemoTuner :: demo_tuner.MyClassArgsValidator'
   ],
Once you have the meta info in ``setup.py``\ , you can build your pip installation source in either of the following ways:
* Run ``python setup.py develop`` from the package directory; this command builds the directory as a pip installation source.
* Run ``python setup.py bdist_wheel`` from the package directory; this command builds a whl file, which is a pip installation source.
NNI looks for the classifier that starts with ``NNI Package`` to retrieve the package meta information while the package is being installed with the ``nnictl package install <source>`` command.
Refer to the `customized tuner example <../Tuner/InstallCustomizedTuner.rst>`__ for a full example.
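To illustrate how the ``NNI Package`` classifier encodes the meta information, the sketch below splits such a classifier into its fields. This is not NNI's actual parsing code; ``parse_nni_classifier`` and the field names are hypothetical, and the sketch assumes all four fields after the prefix are present.

```python
PREFIX = "NNI Package"
FIELDS = ("type", "builtin_name", "class_name", "validator_class_name")


def parse_nni_classifier(classifiers):
    """Return the metadata dict from the first 'NNI Package' classifier, or None."""
    for classifier in classifiers:
        parts = [part.strip() for part in classifier.split("::")]
        # Assumption: the prefix is followed by exactly four fields.
        if parts[0] == PREFIX and len(parts) == len(FIELDS) + 1:
            return dict(zip(FIELDS, parts[1:]))
    return None


classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
    "NNI Package :: tuner :: demotuner :: demo_tuner.DemoTuner :: demo_tuner.MyClassArgsValidator",
]
print(parse_nni_classifier(classifiers))
```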
4. Install customized algorithms package into NNI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If your installation source is prepared as a directory with ``python setup.py develop``\ , you can install the package with the following command:
``nnictl package install <installation source directory>``
For example:
``nnictl package install nni/examples/tuners/customized_tuner/``
If your installation source is prepared as a whl file with ``python setup.py bdist_wheel``\ , you can install the package with the following command:
``nnictl package install <whl file path>``
For example:
``nnictl package install nni/examples/tuners/customized_tuner/dist/demo_tuner-0.1-py3-none-any.whl``
5. Use the installed builtin algorithms in experiment
-----------------------------------------------------
Once your customized algorithm is installed, you can use it in the experiment configuration file the same way as other builtin tuners/assessors/advisors, for example:
.. code-block:: yaml

   tuner:
     builtinTunerName: demotuner
     classArgs:
       #choice: maximize, minimize
       optimize_mode: maximize
Manage packages using ``nnictl package``
--------------------------------------------
List installed packages
^^^^^^^^^^^^^^^^^^^^^^^
Run the following command to list the installed packages:
.. code-block:: bash

   nnictl package list
   +-----------------+------------+-----------+----------------------+------------------------------------------+
   | Name            | Type       | Installed | Class Name           | Module Name                              |
   +-----------------+------------+-----------+----------------------+------------------------------------------+
   | demotuner       | tuners     | Yes       | DemoTuner            | demo_tuner                               |
   | SMAC            | tuners     | No        | SMACTuner            | nni.smac_tuner.smac_tuner                |
   | PPOTuner        | tuners     | No        | PPOTuner             | nni.ppo_tuner.ppo_tuner                  |
   | BOHB            | advisors   | Yes       | BOHB                 | nni.bohb_advisor.bohb_advisor            |
   +-----------------+------------+-----------+----------------------+------------------------------------------+
Run the following command to list all packages, including the builtin packages that cannot be uninstalled.
.. code-block:: bash

   nnictl package list --all
   +-----------------+------------+-----------+----------------------+------------------------------------------+
   | Name            | Type       | Installed | Class Name           | Module Name                              |
   +-----------------+------------+-----------+----------------------+------------------------------------------+
   | TPE             | tuners     | Yes       | HyperoptTuner        | nni.hyperopt_tuner.hyperopt_tuner        |
   | Random          | tuners     | Yes       | HyperoptTuner        | nni.hyperopt_tuner.hyperopt_tuner        |
   | Anneal          | tuners     | Yes       | HyperoptTuner        | nni.hyperopt_tuner.hyperopt_tuner        |
   | Evolution       | tuners     | Yes       | EvolutionTuner       | nni.evolution_tuner.evolution_tuner      |
   | BatchTuner      | tuners     | Yes       | BatchTuner           | nni.batch_tuner.batch_tuner              |
   | GridSearch      | tuners     | Yes       | GridSearchTuner      | nni.gridsearch_tuner.gridsearch_tuner    |
   | NetworkMorphism | tuners     | Yes       | NetworkMorphismTuner | nni.networkmorphism_tuner.networkmo...   |
   | MetisTuner      | tuners     | Yes       | MetisTuner           | nni.metis_tuner.metis_tuner              |
   | GPTuner         | tuners     | Yes       | GPTuner              | nni.gp_tuner.gp_tuner                    |
   | PBTTuner        | tuners     | Yes       | PBTTuner             | nni.pbt_tuner.pbt_tuner                  |
   | SMAC            | tuners     | No        | SMACTuner            | nni.smac_tuner.smac_tuner                |
   | PPOTuner        | tuners     | No        | PPOTuner             | nni.ppo_tuner.ppo_tuner                  |
   | Medianstop      | assessors  | Yes       | MedianstopAssessor   | nni.medianstop_assessor.medianstop_...   |
   | Curvefitting    | assessors  | Yes       | CurvefittingAssessor | nni.curvefitting_assessor.curvefitt...   |
   | Hyperband       | advisors   | Yes       | Hyperband            | nni.hyperband_advisor.hyperband_adv...   |
   | BOHB            | advisors   | Yes       | BOHB                 | nni.bohb_advisor.bohb_advisor            |
   +-----------------+------------+-----------+----------------------+------------------------------------------+
Uninstall package
^^^^^^^^^^^^^^^^^
Run the following command to uninstall an installed package:
``nnictl package uninstall <builtin name>``
For example:
``nnictl package uninstall demotuner``
Install on Linux & Mac
======================
Installation
------------
Installation on Linux and macOS follows the same instructions, given below.
Install NNI through pip
^^^^^^^^^^^^^^^^^^^^^^^
Prerequisite: ``python 64-bit >= 3.6``
.. code-block:: bash
python3 -m pip install --upgrade nni
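The prerequisite (a 64-bit Python, version 3.6 or newer) can be checked from the interpreter itself. The sketch below is one way to do so with the standard library; ``meets_prerequisite`` is an illustrative helper, not an NNI utility.

```python
import struct
import sys


def meets_prerequisite(version_info=sys.version_info, pointer_bits=struct.calcsize("P") * 8):
    """Return True for a 64-bit interpreter running Python 3.6 or newer."""
    return tuple(version_info[:2]) >= (3, 6) and pointer_bits == 64


print(sys.version.split()[0], "ok" if meets_prerequisite() else "not supported")
```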
Install NNI through source code
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you are interested in a specific code version or the latest one, you can install NNI from source code.
Prerequisites: ``python 64-bit >=3.6``\ , ``git``\ , ``wget``
.. code-block:: bash
git clone -b v1.9 https://github.com/Microsoft/nni.git
cd nni
./install.sh
Use NNI in a docker image
^^^^^^^^^^^^^^^^^^^^^^^^^
You can also install NNI in a docker image. Please follow the instructions :githublink:`here <deployment/docker/README.rst>` to build an NNI docker image. The NNI docker image can also be retrieved from Docker Hub through the command ``docker pull msranni/nni:latest``.
Verify installation
-------------------
The following example is built on TensorFlow 1.x. Make sure **TensorFlow 1.x is used** when running it.
*
Download the examples by cloning the source code.
.. code-block:: bash
git clone -b v1.9 https://github.com/Microsoft/nni.git
*
Run the MNIST example.
.. code-block:: bash
nnictl create --config nni/examples/trials/mnist-tfv1/config.yml
*
Wait for the message ``INFO: Successfully started experiment!`` in the command line. This message indicates that your experiment has been successfully started. You can explore the experiment using the ``Web UI url``.
.. code-block:: text

   INFO: Starting restful server...
   INFO: Successfully started Restful server!
   INFO: Setting local config...
   INFO: Successfully set local config!
   INFO: Starting experiment...
   INFO: Successfully started experiment!
   -----------------------------------------------------------------------
   The experiment id is egchD4qy
   The Web UI urls are: http://223.255.255.1:8080 http://127.0.0.1:8080
   -----------------------------------------------------------------------
   You can use these commands to get more information about the experiment
   -----------------------------------------------------------------------
            commands                       description
   1. nnictl experiment show        show the information of experiments
   2. nnictl trial ls               list all of trial jobs
   3. nnictl top                    monitor the status of running experiments
   4. nnictl log stderr             show stderr log content
   5. nnictl log stdout             show stdout log content
   6. nnictl stop                   stop an experiment
   7. nnictl trial kill             kill a trial job by id
   8. nnictl --help                 get help information about nnictl
   -----------------------------------------------------------------------
* Open the ``Web UI url`` in your browser to view detailed information about the experiment and all the submitted trial jobs, as shown below. `Here <../Tutorial/WebUI.rst>`__ are more Web UI pages.
.. image:: ../../img/webui_overview_page.png
:target: ../../img/webui_overview_page.png
:alt: overview
.. image:: ../../img/webui_trialdetail_page.png
:target: ../../img/webui_trialdetail_page.png
:alt: detail
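If you start experiments from a script, the experiment ID and Web UI URLs can be pulled out of the startup message shown in the verification steps above. The sketch below parses a copy of that sample text with regular expressions; ``parse_startup_message`` is a hypothetical helper, and the exact wording of the message may differ between NNI versions.

```python
import re

# Excerpt of the sample startup message from the verification steps.
SAMPLE_OUTPUT = """\
INFO: Successfully started experiment!
The experiment id is egchD4qy
The Web UI urls are: http://223.255.255.1:8080 http://127.0.0.1:8080
"""


def parse_startup_message(text):
    """Extract the experiment ID and Web UI URLs from nnictl's startup text."""
    id_match = re.search(r"The experiment id is (\S+)", text)
    url_line = re.search(r"The Web UI urls are:(.*)", text)
    urls = re.findall(r"https?://\S+", url_line.group(1)) if url_line else []
    return (id_match.group(1) if id_match else None), urls


experiment_id, urls = parse_startup_message(SAMPLE_OUTPUT)
print(experiment_id, urls)
```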
System requirements
-------------------
Due to potential programming changes, the minimum system requirements of NNI may change over time.
Linux
^^^^^
.. list-table::
   :header-rows: 1
   :widths: auto

   * -
     - Recommended
     - Minimum
   * - **Operating System**
     - Ubuntu 16.04 or above
     -
   * - **CPU**
     - Intel® Core™ i5 or AMD Phenom™ II X3 or better
     - Intel® Core™ i3 or AMD Phenom™ X3 8650
   * - **GPU**
     - NVIDIA® GeForce® GTX 660 or better
     - NVIDIA® GeForce® GTX 460
   * - **Memory**
     - 6 GB RAM
     - 4 GB RAM
   * - **Storage**
     - 30 GB available hard drive space
     -
   * - **Internet**
     - Broadband internet connection
     -
   * - **Resolution**
     - 1024 x 768 minimum display resolution
     -
macOS
^^^^^
.. list-table::
   :header-rows: 1
   :widths: auto

   * -
     - Recommended
     - Minimum
   * - **Operating System**
     - macOS 10.14.1 or above
     -
   * - **CPU**
     - Intel® Core™ i7-4770 or better
     - Intel® Core™ i5-760 or better
   * - **GPU**
     - AMD Radeon™ R9 M395X or better
     - NVIDIA® GeForce® GT 750M or AMD Radeon™ R9 M290 or better
   * - **Memory**
     - 8 GB RAM
     - 4 GB RAM
   * - **Storage**
     - 70GB available space SSD
     - 70GB available space 7200 RPM HDD
   * - **Internet**
     - Broadband internet connection
     -
   * - **Resolution**
     - 1024 x 768 minimum display resolution
     -
Further reading
---------------
* `Overview <../Overview.rst>`__
* `Use command line tool nnictl <Nnictl.rst>`__
* `Use NNIBoard <WebUI.rst>`__
* `Define search space <SearchSpaceSpec.rst>`__
* `Config an experiment <ExperimentConfig.rst>`__
* `How to run an experiment on local (with multiple GPUs)? <../TrainingService/LocalMode.rst>`__
* `How to run an experiment on multiple machines? <../TrainingService/RemoteMachineMode.rst>`__
* `How to run an experiment on OpenPAI? <../TrainingService/PaiMode.rst>`__
* `How to run an experiment on Kubernetes through Kubeflow? <../TrainingService/KubeflowMode.rst>`__
* `How to run an experiment on Kubernetes through FrameworkController? <../TrainingService/FrameworkControllerMode.rst>`__
* `How to run an experiment on Kubernetes through AdaptDL? <../TrainingService/AdaptDLMode.rst>`__
Install on Windows
==================
Prerequisites
-------------
*
Python 3.6 (or above) 64-bit. `Anaconda <https://www.anaconda.com/products/individual>`__ or `Miniconda <https://docs.conda.io/en/latest/miniconda.html>`__ is highly recommended to manage multiple Python environments on Windows.
*
If Python is newly installed, install `Microsoft C++ Build Tools <https://visualstudio.microsoft.com/visual-cpp-build-tools/>`__ so that NNI dependencies such as ``scikit-learn`` can be built.
.. code-block:: bat
pip install cython wheel
*
git for verifying installation.
Install NNI
-----------
In most cases, you can install and upgrade NNI from pip package. It's easy and fast.
If you are interested in a specific or the latest code version, you can install NNI from source code.
If you want to contribute to NNI, refer to `setup development environment <SetupNniDeveloperEnvironment.rst>`__.
*
From pip package
.. code-block:: bat
python -m pip install --upgrade nni
*
From source code
.. code-block:: bat
git clone -b v1.9 https://github.com/Microsoft/nni.git
cd nni
powershell -ExecutionPolicy Bypass -file install.ps1
Verify installation
-------------------
The following example is built on TensorFlow 1.x. Make sure **TensorFlow 1.x is used** when running it.
*
Clone examples within source code.
.. code-block:: bat
git clone -b v1.9 https://github.com/Microsoft/nni.git
*
Run the MNIST example.
.. code-block:: bat
nnictl create --config nni\examples\trials\mnist-tfv1\config_windows.yml
Note: If you are familiar with other frameworks, you can choose the corresponding example under ``examples\trials``. Change the trial command from ``python3`` to ``python`` in each example YAML, since a default installation provides ``python.exe``\ , not ``python3.exe``.
*
Wait for the message ``INFO: Successfully started experiment!`` in the command line. This message indicates that your experiment has been successfully started. You can explore the experiment using the ``Web UI url``.
.. code-block:: text
INFO: Starting restful server...
INFO: Successfully started Restful server!
INFO: Setting local config...
INFO: Successfully set local config!
INFO: Starting experiment...
INFO: Successfully started experiment!
-----------------------------------------------------------------------
The experiment id is egchD4qy
The Web UI urls are: http://223.255.255.1:8080 http://127.0.0.1:8080
-----------------------------------------------------------------------
You can use these commands to get more information about the experiment
-----------------------------------------------------------------------
commands description
1. nnictl experiment show show the information of experiments
2. nnictl trial ls list all of trial jobs
3. nnictl top monitor the status of running experiments
4. nnictl log stderr show stderr log content
5. nnictl log stdout show stdout log content
6. nnictl stop stop an experiment
7. nnictl trial kill kill a trial job by id
8. nnictl --help get help information about nnictl
-----------------------------------------------------------------------
* Open the ``Web UI url`` in your browser to view detailed information about the experiment and all the submitted trial jobs, as shown below. More Web UI pages are described `here <../Tutorial/WebUI.rst>`__.
.. image:: ../../img/webui_overview_page.png
:target: ../../img/webui_overview_page.png
:alt: overview
.. image:: ../../img/webui_trialdetail_page.png
:target: ../../img/webui_trialdetail_page.png
:alt: detail
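The startup message printed by ``nnictl create`` has a regular layout, so a script can pick the experiment id and the Web UI URLs out of it. The following is a minimal sketch based only on the sample output shown earlier in this section; the helper name ``parse_startup_message`` is hypothetical, and the patterns are assumptions that may not hold for other NNI versions.

.. code-block:: python

   import re


   def parse_startup_message(text):
       """Return (experiment_id, urls) found in nnictl's startup message.

       Hypothetical helper: the patterns mirror the sample output only.
       """
       id_match = re.search(r"The experiment id is (\S+)", text)
       urls = re.findall(r"https?://\S+", text)
       experiment_id = id_match.group(1) if id_match else None
       return experiment_id, urls

If the message does not contain an experiment id, the first element of the result is ``None`` and the URL list is empty.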
System requirements
-------------------
Below are the minimum system requirements for NNI on Windows. Windows 10 1809 is well tested and recommended. Due to potential programming changes, the minimum system requirements for NNI may change over time.
.. list-table::
:header-rows: 1
:widths: auto
* -
- Recommended
- Minimum
* - **Operating System**
- Windows 10 1809 or above
-
* - **CPU**
- Intel® Core™ i5 or AMD Phenom™ II X3 or better
- Intel® Core™ i3 or AMD Phenom™ X3 8650
* - **GPU**
- NVIDIA® GeForce® GTX 660 or better
- NVIDIA® GeForce® GTX 460
* - **Memory**
- 6 GB RAM
- 4 GB RAM
* - **Storage**
- 30 GB available hard drive space
-
* - **Internet**
- Broadband internet connection
-
* - **Resolution**
- 1024 x 768 minimum display resolution
-
FAQ
---
simplejson failed when installing NNI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Make sure a C++ 14.0 compiler is installed.
..
building 'simplejson._speedups' extension error: [WinError 3] The system cannot find the path specified
Trial failed with missing DLL in command line or PowerShell
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This error is caused by missing LIBIFCOREMD.DLL and LIBMMD.DLL and a failure to install SciPy. Using Anaconda or Miniconda with Python (64-bit) can solve it.
..
ImportError: DLL load failed
Trial failed on webUI
^^^^^^^^^^^^^^^^^^^^^
Please check the trial's stderr log file for more details. Two possible causes are:
* forgetting to change the trial command ``python3`` to ``python`` in each experiment YAML.
* forgetting to install experiment dependencies such as TensorFlow, Keras and so on.
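The first cause can be checked before launching an experiment. The following is a minimal sketch, not part of NNI, that uses the standard library to report which interpreter names resolve on ``PATH``; the helper name ``available_interpreters`` is hypothetical.

.. code-block:: python

   import shutil


   def available_interpreters(names=("python", "python3")):
       """Map each interpreter name to its resolved path, or None if not on PATH."""
       return {name: shutil.which(name) for name in names}

If ``python3`` maps to ``None`` on your machine, the trial command in the experiment YAML should use ``python`` instead.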
Fail to use BOHB on Windows
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Make sure a C++ 14.0 compiler is installed when trying to run ``nnictl package install --name=BOHB`` to install the dependencies.
Not supported tuner on Windows
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SMAC is not supported currently; for the specific reason refer to this `GitHub issue <https://github.com/automl/SMAC3/issues/483>`__.
Use Windows as a remote worker
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Refer to `Remote Machine mode <../TrainingService/RemoteMachineMode.rst>`__.
Segmentation fault (core dumped) when installing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Refer to `FAQ <FAQ.rst>`__.
Further reading
---------------
* `Overview <../Overview.rst>`__
* `Use command line tool nnictl <Nnictl.rst>`__
* `Use NNIBoard <WebUI.rst>`__
* `Define search space <SearchSpaceSpec.rst>`__
* `Config an experiment <ExperimentConfig.rst>`__
* `How to run an experiment on local (with multiple GPUs)? <../TrainingService/LocalMode.rst>`__
* `How to run an experiment on multiple machines? <../TrainingService/RemoteMachineMode.rst>`__
* `How to run an experiment on OpenPAI? <../TrainingService/PaiMode.rst>`__
* `How to run an experiment on Kubernetes through Kubeflow? <../TrainingService/KubeflowMode.rst>`__
* `How to run an experiment on Kubernetes through FrameworkController? <../TrainingService/FrameworkControllerMode.rst>`__
.. role:: raw-html(raw)
:format: html
nnictl
======
Introduction
------------
**nnictl** is a command line tool that can be used to control experiments, for example to start, stop, or resume an experiment, or to start or stop NNIBoard.
Commands
--------
nnictl supports the following commands:
* `nnictl create <#create>`__
* `nnictl resume <#resume>`__
* `nnictl view <#view>`__
* `nnictl stop <#stop>`__
* `nnictl update <#update>`__
* `nnictl trial <#trial>`__
* `nnictl top <#top>`__
* `nnictl experiment <#experiment>`__
* `nnictl platform <#platform>`__
* `nnictl config <#config>`__
* `nnictl log <#log>`__
* `nnictl webui <#webui>`__
* `nnictl tensorboard <#tensorboard>`__
* `nnictl package <#package>`__
* `nnictl ss_gen <#ss_gen>`__
* `nnictl --version <#version>`__
Manage an experiment
^^^^^^^^^^^^^^^^^^^^
:raw-html:`<a name="create"></a>`
nnictl create
^^^^^^^^^^^^^
*
Description
You can use this command to create a new experiment, using the configuration specified in config file.
After this command completes successfully, the context is set to this experiment, which means that the commands you issue afterwards are associated with this experiment unless you explicitly change the context (not supported yet).
*
Usage
.. code-block:: bash
nnictl create [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - --config, -c
- True
-
- YAML configure file of the experiment
* - --port, -p
- False
-
- the port of restful server
* - --debug, -d
- False
-
- set debug mode
* - --foreground, -f
- False
-
- set foreground mode, print log content to terminal
*
Examples
..
create a new experiment with the default port: 8080
.. code-block:: bash
nnictl create --config nni/examples/trials/mnist-tfv1/config.yml
..
create a new experiment with specified port 8088
.. code-block:: bash
nnictl create --config nni/examples/trials/mnist-tfv1/config.yml --port 8088
..
create a new experiment with specified port 8088 and debug mode
.. code-block:: bash
nnictl create --config nni/examples/trials/mnist-tfv1/config.yml --port 8088 --debug
Note:
.. code-block:: text
Debug mode will disable version check function in Trialkeeper.
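When ``nnictl create`` is launched from a script, the options in the table above map directly onto an argument list. The following is a minimal sketch under that assumption; the helper name ``nnictl_create_args`` is hypothetical and only uses the flags documented in this section.

.. code-block:: python

   def nnictl_create_args(config, port=None, debug=False, foreground=False):
       """Build the argument list for ``nnictl create`` (hypothetical helper)."""
       args = ["nnictl", "create", "--config", config]
       if port is not None:
           args += ["--port", str(port)]
       if debug:
           args.append("--debug")
       if foreground:
           args.append("--foreground")
       return args

The resulting list can be passed to ``subprocess.run(args, check=True)`` from the Python standard library.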
:raw-html:`<a name="resume"></a>`
nnictl resume
^^^^^^^^^^^^^
*
Description
You can use this command to resume a stopped experiment.
*
Usage
.. code-block:: bash
nnictl resume [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- True
-
- The id of the experiment you want to resume
* - --port, -p
- False
-
- Rest port of the experiment you want to resume
* - --debug, -d
- False
-
- set debug mode
* - --foreground, -f
- False
-
- set foreground mode, print log content to terminal
*
Example
..
resume an experiment with specified port 8088
.. code-block:: bash
nnictl resume [experiment_id] --port 8088
:raw-html:`<a name="view"></a>`
nnictl view
^^^^^^^^^^^
*
Description
You can use this command to view a stopped experiment.
*
Usage
.. code-block:: bash
nnictl view [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- True
-
- The id of the experiment you want to view
* - --port, -p
- False
-
- Rest port of the experiment you want to view
*
Example
..
view an experiment with specified port 8088
.. code-block:: bash
nnictl view [experiment_id] --port 8088
:raw-html:`<a name="stop"></a>`
nnictl stop
^^^^^^^^^^^
*
Description
You can use this command to stop a running experiment or multiple experiments.
*
Usage
.. code-block:: bash
nnictl stop [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- The id of the experiment you want to stop
* - --port, -p
- False
-
- Rest port of the experiment you want to stop
* - --all, -a
- False
-
- Stop all of experiments
*
Details & Examples
#.
If no id is specified and an experiment is running, nnictl stops the running experiment; otherwise it prints an error message.
.. code-block:: bash
nnictl stop
#.
If an id is specified and it matches a running experiment, nnictl stops that experiment; otherwise it prints an error message.
.. code-block:: bash
nnictl stop [experiment_id]
#.
If there is a port specified, and an experiment is running on that port, the experiment will be stopped.
.. code-block:: bash
nnictl stop --port 8080
#.
Use ``nnictl stop --all`` to stop all experiments.
.. code-block:: bash
nnictl stop --all
#.
If the id ends with \*, nnictl will stop all experiments whose ids match the pattern.
#. If the id does not exist but matches the prefix of one experiment id, nnictl will stop the matched experiment.
#. If the id does not exist but matches the prefix of multiple experiment ids, nnictl will give the id information of the matched experiments.
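The id matching rules above can be summarised in a few lines of code. The following sketch is an illustration only, not nnictl's actual implementation; the function name ``resolve_experiment_ids`` is hypothetical, and it assumes that a trailing ``*`` means "every id that starts with the preceding text".

.. code-block:: python

   def resolve_experiment_ids(query, running_ids):
       """Return the experiment ids selected by ``query`` (illustrative only)."""
       if query.endswith("*"):
           # Wildcard: select every id sharing the prefix before the '*'.
           prefix = query[:-1]
           return [i for i in running_ids if i.startswith(prefix)]
       if query in running_ids:
           # Exact match.
           return [query]
       matches = [i for i in running_ids if i.startswith(query)]
       if len(matches) > 1:
           # Ambiguous prefix: report the candidates instead of stopping any.
           raise ValueError(f"'{query}' is ambiguous, candidates: {', '.join(matches)}")
       return matches

An empty result means that no running experiment matches the given id.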
:raw-html:`<a name="update"></a>`
nnictl update
^^^^^^^^^^^^^
*
**nnictl update searchspace**
*
Description
You can use this command to update an experiment's search space.
*
Usage
.. code-block:: bash
nnictl update searchspace [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment you want to set
* - --filename, -f
- True
-
- the file storing your new search space
*
Example
``update experiment's new search space with file dir 'examples/trials/mnist-tfv1/search_space.json'``
.. code-block:: bash
nnictl update searchspace [experiment_id] --filename examples/trials/mnist-tfv1/search_space.json
*
**nnictl update concurrency**
*
Description
You can use this command to update an experiment's concurrency.
*
Usage
.. code-block:: bash
nnictl update concurrency [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment you want to set
* - --value, -v
- True
-
- the number of allowed concurrent trials
*
Example
..
update experiment's concurrency
.. code-block:: bash
nnictl update concurrency [experiment_id] --value [concurrency_number]
*
**nnictl update duration**
*
Description
You can use this command to update an experiment's duration.
*
Usage
.. code-block:: bash
nnictl update duration [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment you want to set
* - --value, -v
- True
-
- Strings like '1m' for one minute or '2h' for two hours. SUFFIX may be 's' for seconds, 'm' for minutes, 'h' for hours or 'd' for days.
*
Example
..
update experiment's duration
.. code-block:: bash
nnictl update duration [experiment_id] --value [duration]
*
**nnictl update trialnum**
*
Description
You can use this command to update an experiment's maxtrialnum.
*
Usage
.. code-block:: bash
nnictl update trialnum [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment you want to set
* - --value, -v
- True
-
- the new number of maxtrialnum you want to set
*
Example
..
update experiment's trial num
.. code-block:: bash
nnictl update trialnum [experiment_id] --value [trial_num]
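The value accepted by ``nnictl update duration`` combines a number with one of the suffixes ``s``, ``m``, ``h`` or ``d``. The following is a minimal sketch of how such a string can be converted to seconds; the function name ``duration_to_seconds`` is hypothetical, and the restriction to whole numbers is an assumption rather than documented nnictl behaviour.

.. code-block:: python

   import re

   # Seconds per unit for the suffixes listed in the options table.
   _UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


   def duration_to_seconds(text):
       """Convert a string such as '1m' or '2h' into a number of seconds."""
       match = re.fullmatch(r"(\d+)([smhd])", text.strip())
       if match is None:
           raise ValueError(f"unsupported duration: {text!r}")
       amount, suffix = match.groups()
       return int(amount) * _UNIT_SECONDS[suffix]

For example, ``duration_to_seconds('2h')`` returns ``7200``.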
:raw-html:`<a name="trial"></a>`
nnictl trial
^^^^^^^^^^^^
*
**nnictl trial ls**
*
Description
You can use this command to show trial information. Note that if ``head`` or ``tail`` is set, only completed trials will be listed.
*
Usage
.. code-block:: bash
nnictl trial ls
nnictl trial ls --head 10
nnictl trial ls --tail 10
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment you want to set
* - --head
- False
-
- the number of items to be listed with the highest default metric
* - --tail
- False
-
- the number of items to be listed with the lowest default metric
*
**nnictl trial kill**
*
Description
You can use this command to kill a trial job.
*
Usage
.. code-block:: bash
nnictl trial kill [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- Experiment ID of the trial
* - --trial_id, -T
- True
-
- ID of the trial you want to kill.
*
Example
..
kill trial job
.. code-block:: bash
nnictl trial kill [experiment_id] --trial_id [trial_id]
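The ``--head`` and ``--tail`` options of ``nnictl trial ls`` select trials by their default metric. The following sketch illustrates that selection on plain Python data; it is not nnictl's implementation, the function name ``select_trials`` is hypothetical, and the input is assumed to contain only completed trials as ``(trial_id, default_metric)`` pairs.

.. code-block:: python

   def select_trials(trials, count, highest=True):
       """Return ``count`` trials with the highest (or lowest) default metric."""
       ranked = sorted(trials, key=lambda item: item[1], reverse=highest)
       return ranked[:count]

Calling it with ``highest=True`` corresponds to ``--head``, and ``highest=False`` to ``--tail``.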
:raw-html:`<a name="top"></a>`
nnictl top
^^^^^^^^^^
*
Description
Monitor all running experiments.
*
Usage
.. code-block:: bash
nnictl top
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment you want to set
* - --time, -t
- False
-
- The interval at which the experiment status is updated, in seconds. The default value is 3 seconds.
:raw-html:`<a name="experiment"></a>`
Manage experiment information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
*
**nnictl experiment show**
*
Description
Show the information of experiment.
*
Usage
.. code-block:: bash
nnictl experiment show
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment you want to set
*
**nnictl experiment status**
*
Description
Show the status of experiment.
*
Usage
.. code-block:: bash
nnictl experiment status
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment you want to set
*
**nnictl experiment list**
*
Description
Show the information of all the (running) experiments.
*
Usage
.. code-block:: bash
nnictl experiment list [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - --all
- False
-
- list all of experiments
*
**nnictl experiment delete**
*
Description
Delete one or all experiments, including logs, results, environment information, and cache. Use it to delete useless experiment results or to save disk space.
*
Usage
.. code-block:: bash
nnictl experiment delete [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment
* - --all
- False
-
- delete all of experiments
*
**nnictl experiment export**
*
Description
You can use this command to export the reward and hyperparameters of trial jobs to a CSV or JSON file.
*
Usage
.. code-block:: bash
nnictl experiment export [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment
* - --filename, -f
- True
-
- File path of the output file
* - --type
- True
-
- Type of the output file; only "csv" and "json" are supported
* - --intermediate, -i
- False
-
- Are intermediate results included
*
Examples
..
export all trial data in an experiment as json format
.. code-block:: bash
nnictl experiment export [experiment_id] --filename [file_path] --type json --intermediate
*
**nnictl experiment import**
*
Description
You can use this command to import several prior or supplementary trial hyperparameters & results for NNI hyperparameter tuning. The data are fed to the tuning algorithm (e.g., tuner or advisor).
*
Usage
.. code-block:: bash
nnictl experiment import [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- The id of the experiment you want to import data into
* - --filename, -f
- True
-
- a file with data you want to import in json format
*
Details
NNI lets users import their own data. Please express the data in the correct format; an example is shown below:
.. code-block:: json
[
{"parameter": {"x": 0.5, "y": 0.9}, "value": 0.03},
{"parameter": {"x": 0.4, "y": 0.8}, "value": 0.05},
{"parameter": {"x": 0.3, "y": 0.7}, "value": 0.04}
]
Every element in the top-level list is a sample. For our built-in tuners/advisors, each sample should have at least two keys: ``parameter`` and ``value``. The ``parameter`` must match this experiment's search space, that is, all the keys (or hyperparameters) in ``parameter`` must match the keys in the search space. Otherwise, the tuner/advisor may behave unpredictably. The ``value`` should follow the same rule as the input of ``nni.report_final_result``\ , that is, either a number or a dict with a key named ``default``. For your customized tuner/advisor, the file could have any JSON content depending on how you implement the corresponding methods (e.g., ``import_data``\ ).
You can also use `nnictl experiment export <#export>`__ to export a valid JSON file including previous experiment trial hyperparameters and results.
Currently, the following tuners and advisor support importing data:
.. code-block:: yaml
builtinTunerName: TPE, Anneal, GridSearch, MetisTuner
builtinAdvisorName: BOHB
*If you want to import data to the BOHB advisor, it is suggested that you add "TRIAL_BUDGET" to parameter as NNI does; otherwise, BOHB will use max_budget as "TRIAL_BUDGET". Here is an example:*
.. code-block:: json
[
{"parameter": {"x": 0.5, "y": 0.9, "TRIAL_BUDGET": 27}, "value": 0.03}
]
*
Examples
..
import data to a running experiment
.. code-block:: bash
nnictl experiment import [experiment_id] -f experiment_data.json
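Because a mismatch between the imported samples and the search space can make a tuner behave unpredictably, it can help to check the file before running ``nnictl experiment import``. The following is a minimal sketch of such a check, following the rules in the details above; it is not part of NNI, the function name ``check_import_samples`` is hypothetical, and allowing an extra ``TRIAL_BUDGET`` key is an assumption made to accommodate the BOHB example.

.. code-block:: python

   from numbers import Number


   def check_import_samples(samples, search_space_keys):
       """Return a list of problems found in ``samples``; empty means none found."""
       problems = []
       expected = set(search_space_keys)
       for index, sample in enumerate(samples):
           parameter = sample.get("parameter")
           if not isinstance(parameter, dict):
               problems.append(f"sample {index}: missing 'parameter'")
           elif set(parameter) - {"TRIAL_BUDGET"} != expected:
               problems.append(f"sample {index}: keys do not match the search space")
           value = sample.get("value")
           # bool is a Number subclass in Python, so exclude it explicitly.
           is_number = isinstance(value, Number) and not isinstance(value, bool)
           is_metric_dict = isinstance(value, dict) and "default" in value
           if not (is_number or is_metric_dict):
               problems.append(f"sample {index}: 'value' must be a number or a dict with 'default'")
       return problems

The samples can be loaded with ``json.load`` from the file that is later passed to ``--filename``.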
*
**nnictl experiment save**
*
Description
Save nni experiment metadata and code data.
*
Usage
.. code-block:: bash
nnictl experiment save [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- True
-
- The id of the experiment you want to save
* - --path, -p
- False
-
- the folder path to store nni experiment data, default current working directory
* - --saveCodeDir, -s
- False
-
- save codeDir data of the experiment, default False
*
Examples
..
save an experiment
.. code-block:: bash
nnictl experiment save [experiment_id] --saveCodeDir
*
**nnictl experiment load**
*
Description
Load an nni experiment.
*
Usage
.. code-block:: bash
nnictl experiment load [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - --path, -p
- True
-
- the file path of nni package
* - --codeDir, -c
- True
-
- the path of codeDir for loaded experiment, this path will also put the code in the loaded experiment package
* - --logDir, -l
- False
-
- the path of logDir for loaded experiment
* - --searchSpacePath, -s
- True
-
- the path of search space file for loaded experiment, this path contains file name. Default in $codeDir/search_space.json
*
Examples
..
load an experiment
.. code-block:: bash
nnictl experiment load --path [path] --codeDir [codeDir]
:raw-html:`<a name="platform"></a>`
Manage platform information
^^^^^^^^^^^^^^^^^^^^^^^^^^^
*
**nnictl platform clean**
*
Description
Use this command to clean up the disk on a target platform. The provided YAML file includes the information of the target platform, and it follows the same schema as the NNI configuration file.
*
Note
If the target platform is being used by other users, this command may cause unexpected errors for them.
*
Usage
.. code-block:: bash
nnictl platform clean [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - --config
- True
-
- the path of the YAML config file used when creating an experiment
:raw-html:`<a name="config"></a>`
nnictl config show
^^^^^^^^^^^^^^^^^^
*
Description
Display the current context information.
*
Usage
.. code-block:: bash
nnictl config show
:raw-html:`<a name="log"></a>`
Manage log
^^^^^^^^^^
*
**nnictl log stdout**
*
Description
Show the stdout log content.
*
Usage
.. code-block:: bash
nnictl log stdout [options]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment you want to set
* - --head, -h
- False
-
- show head lines of stdout
* - --tail, -t
- False
-
- show tail lines of stdout
* - --path, -p
- False
-
- show the path of stdout file
*
Example
..
Show the tail of stdout log content
.. code-block:: bash
nnictl log stdout [experiment_id] --tail [lines_number]
*
**nnictl log stderr**
*
Description
Show the stderr log content.
*
Usage
.. code-block:: bash
nnictl log stderr [options]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment you want to set
* - --head, -h
- False
-
- show head lines of stderr
* - --tail, -t
- False
-
- show tail lines of stderr
* - --path, -p
- False
-
- show the path of stderr file
*
**nnictl log trial**
*
Description
Show trial log path.
*
Usage
.. code-block:: bash
nnictl log trial [options]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- Experiment ID of the trial
* - --trial_id, -T
- False
-
- ID of the trial whose log path is to be found; required when id is not empty.
:raw-html:`<a name="webui"></a>`
Manage webui
^^^^^^^^^^^^
*
**nnictl webui url**
*
Description
Show an experiment's webui url
*
Usage
.. code-block:: bash
nnictl webui url [options]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- Experiment ID
:raw-html:`<a name="tensorboard"></a>`
Manage tensorboard
^^^^^^^^^^^^^^^^^^
*
**nnictl tensorboard start**
*
Description
Start the tensorboard process.
*
Usage
.. code-block:: bash
nnictl tensorboard start
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment you want to set
* - --trial_id, -T
- False
-
- ID of the trial
* - --port
- False
- 6006
- The port of the tensorboard process
*
Detail
#. For the moment, nnictl supports the tensorboard function on local and remote platforms; other platforms will be supported later.
#. If you want to use tensorboard, you need to write your tensorboard log data to the path given by the environment variable [NNI_OUTPUT_DIR].
#. In local mode, nnictl will set --logdir=[NNI_OUTPUT_DIR] directly and start a tensorboard process.
#. In remote mode, nnictl will first create an SSH client to copy log data from the remote machine to a local temp directory, and then start a tensorboard process on your local machine. Note that nnictl copies the log data only once when you run the command; to see later tensorboard results, run the nnictl tensorboard command again.
#. If there is only one trial job, you don't need to set trial id. If there are multiple trial jobs running, you should set the trial id, or you could use [nnictl tensorboard start --trial_id all] to map --logdir to all trial log paths.
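In trial code, the log location described above can be derived from the ``NNI_OUTPUT_DIR`` environment variable. The following is a minimal sketch; the helper name ``tensorboard_log_dir``, the ``tensorboard`` subdirectory, and the fallback to the current directory when the variable is unset are assumptions made for illustration.

.. code-block:: python

   import os


   def tensorboard_log_dir(fallback="."):
       """Return (and create) a log directory under NNI_OUTPUT_DIR.

       Falls back to ``fallback`` when the trial is run outside NNI
       and the environment variable is not set (an assumption).
       """
       base = os.environ.get("NNI_OUTPUT_DIR", fallback)
       log_dir = os.path.join(base, "tensorboard")
       os.makedirs(log_dir, exist_ok=True)
       return log_dir

The returned path can then be handed to whichever summary writer your framework provides.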
*
**nnictl tensorboard stop**
*
Description
Stop all of the tensorboard processes.
*
Usage
.. code-block:: bash
nnictl tensorboard stop
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - id
- False
-
- ID of the experiment you want to set
:raw-html:`<a name="package"></a>`
Manage package
^^^^^^^^^^^^^^
*
**nnictl package install**
*
Description
Install a package (customized algorithms or nni provided algorithms) as builtin tuner/assessor/advisor.
*
Usage
.. code-block:: bash
nnictl package install --name <package name>
The available ``<package name>`` can be checked via ``nnictl package list`` command.
or
.. code-block:: bash
nnictl package install <installation source>
Reference `Install customized algorithms <InstallCustomizedAlgos.rst>`__ to prepare the installation source.
*
Example
..
Install SMAC tuner
.. code-block:: bash
nnictl package install --name SMAC
..
Install a customized tuner
.. code-block:: bash
nnictl package install nni/examples/tuners/customized_tuner/dist/demo_tuner-0.1-py3-none-any.whl
*
**nnictl package show**
*
Description
Show the detailed information of specified packages.
*
Usage
.. code-block:: bash
nnictl package show <package name>
*
Example
.. code-block:: bash
nnictl package show SMAC
*
**nnictl package list**
*
Description
List the installed/all packages.
*
Usage
.. code-block:: bash
nnictl package list [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - --all
- False
-
- List all packages
*
Example
..
List installed packages
.. code-block:: bash
nnictl package list
..
List all packages
.. code-block:: bash
nnictl package list --all
*
**nnictl package uninstall**
*
Description
Uninstall a package.
*
Usage
.. code-block:: bash
nnictl package uninstall <package name>
*
Example
Uninstall SMAC package
.. code-block:: bash
nnictl package uninstall SMAC
:raw-html:`<a name="ss_gen"></a>`
Generate search space
^^^^^^^^^^^^^^^^^^^^^
*
**nnictl ss_gen**
*
Description
Generate search space from user trial code which uses NNI NAS APIs.
*
Usage
.. code-block:: bash
nnictl ss_gen [OPTIONS]
*
Options
.. list-table::
:header-rows: 1
:widths: auto
* - Name, shorthand
- Required
- Default
- Description
* - --trial_command
- True
-
- The command of the trial code
* - --trial_dir
- False
- ./
- The directory of the trial code
* - --file
- False
- nni_auto_gen_search_space.json
- The file for storing generated search space
*
Example
..
Generate a search space
.. code-block:: bash
nnictl ss_gen --trial_command="python3 mnist.py" --trial_dir=./ --file=ss.json
:raw-html:`<a name="version"></a>`
Check NNI version
^^^^^^^^^^^^^^^^^
*
**nnictl --version**
*
Description
Describe the current version of NNI installed.
*
Usage
.. code-block:: bash
nnictl --version
QuickStart
==========
Installation
------------
We currently support Linux, macOS, and Windows. Ubuntu 16.04 or higher, macOS 10.14.1, and Windows 10.1809 are tested and supported. Simply run the following ``pip install`` in an environment that has ``python >= 3.6``.
Linux and macOS
^^^^^^^^^^^^^^^
.. code-block:: bash
python3 -m pip install --upgrade nni
Windows
^^^^^^^
.. code-block:: bash
python -m pip install --upgrade nni
.. Note:: For Linux and macOS, ``--user`` can be added if you want to install NNI in your home directory; this does not require any special privileges.
.. Note:: If there is an error like ``Segmentation fault``, please refer to the :doc:`FAQ <FAQ>`.
.. Note:: For the system requirements of NNI, please refer to :doc:`Install NNI on Linux & Mac <InstallationLinux>` or :doc:`Windows <InstallationWin>`.
Enable NNI Command-line Auto-Completion (Optional)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
After the installation, you may want to enable the auto-completion feature for **nnictl** commands. Please refer to this `tutorial <../CommunitySharings/AutoCompletion.rst>`__.
"Hello World" example on MNIST
------------------------------
NNI is a toolkit to help users run automated machine learning experiments. It can automatically do the cyclic process of getting hyperparameters, running trials, testing results, and tuning hyperparameters. Here, we'll show how to use NNI to help you find the optimal hyperparameters for a MNIST model.
Here is an example script to train a CNN on the MNIST dataset **without NNI**\ :
.. code-block:: python

   def run_trial(params):
       # Input data
       mnist = input_data.read_data_sets(params['data_dir'], one_hot=True)
       # Build network
       mnist_network = MnistNetwork(channel_1_num=params['channel_1_num'],
                                    channel_2_num=params['channel_2_num'],
                                    conv_size=params['conv_size'],
                                    hidden_size=params['hidden_size'],
                                    pool_size=params['pool_size'],
                                    learning_rate=params['learning_rate'])
       mnist_network.build_network()

       test_acc = 0.0
       with tf.Session() as sess:
           # Train network
           mnist_network.train(sess, mnist)
           # Evaluate network
           test_acc = mnist_network.evaluate(mnist)

   if __name__ == '__main__':
       params = {'data_dir': '/tmp/tensorflow/mnist/input_data',
                 'dropout_rate': 0.5,
                 'channel_1_num': 32,
                 'channel_2_num': 64,
                 'conv_size': 5,
                 'pool_size': 2,
                 'hidden_size': 1024,
                 'learning_rate': 1e-4,
                 'batch_num': 2000,
                 'batch_size': 32}
       run_trial(params)

If you want to see the full implementation, please refer to :githublink:`examples/trials/mnist-tfv1/mnist_before.py <examples/trials/mnist-tfv1/mnist_before.py>`.
The above code can only try one set of parameters at a time; if we want to tune the learning rate, we need to manually modify the hyperparameter and start the trial again and again.
NNI is designed to help users with tuning jobs; its working process is presented below:
.. code-block:: text

   input: search space, trial code, config file
   output: one optimal hyperparameter configuration

   1: For t = 0, 1, 2, ..., maxTrialNum,
   2:     hyperparameter = choose a set of parameters from search space
   3:     final result = run_trial_and_evaluate(hyperparameter)
   4:     report final result to NNI
   5:     If the upper time limit is reached,
   6:         Stop the experiment
   7: return hyperparameter value with best final result

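The working process above can be made concrete with a small, self-contained loop. The following sketch is an illustration only and does not use NNI: it replaces the tuner with plain random choice, leaves out the time limit, and the names ``tune`` and ``run_trial_and_evaluate`` are hypothetical.

.. code-block:: python

   import random


   def tune(search_space, run_trial_and_evaluate, max_trial_num, seed=0):
       """Try ``max_trial_num`` random configurations and return the best one."""
       rng = random.Random(seed)
       best_params, best_result = None, float("-inf")
       for _ in range(max_trial_num):
           # Step 2: choose a set of parameters from the search space.
           params = {name: rng.choice(values) for name, values in search_space.items()}
           # Steps 3 and 4: run the trial and record its final result.
           result = run_trial_and_evaluate(params)
           if result > best_result:
               best_params, best_result = params, result
       # Step 7: return the configuration with the best final result.
       return best_params, best_result

In a real experiment NNI performs the choosing and bookkeeping, and a tuner uses earlier results to pick the next configuration instead of sampling blindly.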
If you want to use NNI to automatically train your model and find the optimal hyperparameters, you need to make three changes to your code:
Three steps to start an experiment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**Step 1**\ : Write a ``Search Space`` file in JSON, including the ``name`` and the ``distribution`` (discrete-valued or continuous-valued) of all the hyperparameters you need to search.
.. code-block:: diff
- params = {'data_dir': '/tmp/tensorflow/mnist/input_data', 'dropout_rate': 0.5, 'channel_1_num': 32, 'channel_2_num': 64,
- 'conv_size': 5, 'pool_size': 2, 'hidden_size': 1024, 'learning_rate': 1e-4, 'batch_num': 2000, 'batch_size': 32}
+ {
+ "dropout_rate":{"_type":"uniform","_value":[0.5, 0.9]},
+ "conv_size":{"_type":"choice","_value":[2,3,5,7]},
+ "hidden_size":{"_type":"choice","_value":[124, 512, 1024]},
+ "batch_size": {"_type":"choice", "_value": [1, 4, 8, 16, 32]},
+ "learning_rate":{"_type":"choice","_value":[0.0001, 0.001, 0.01, 0.1]}
+ }
*Example:* :githublink:`search_space.json <examples/trials/mnist-tfv1/search_space.json>`
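Before starting an experiment, it can help to check that the search space file has the expected shape. The sketch below is a hypothetical helper (``check_search_space`` is not an NNI function) that only verifies that every entry carries ``_type`` and a non-empty ``_value`` list; it does not validate the types themselves.

```python
import json

SEARCH_SPACE_JSON = """
{
    "dropout_rate": {"_type": "uniform", "_value": [0.5, 0.9]},
    "conv_size": {"_type": "choice", "_value": [2, 3, 5, 7]},
    "hidden_size": {"_type": "choice", "_value": [124, 512, 1024]},
    "batch_size": {"_type": "choice", "_value": [1, 4, 8, 16, 32]},
    "learning_rate": {"_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1]}
}
"""


def check_search_space(space):
    """Return a list of problems; an empty list means the basic shape is fine."""
    problems = []
    for name, spec in space.items():
        if not isinstance(spec, dict):
            problems.append(f"{name}: expected an object")
            continue
        missing = {"_type", "_value"} - set(spec)
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
        elif not isinstance(spec["_value"], list) or not spec["_value"]:
            problems.append(f"{name}: _value must be a non-empty list")
    return problems


space = json.loads(SEARCH_SPACE_JSON)
problems = check_search_space(space)
print(problems or "search space looks well-formed")
```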
**Step 2**\ : Modify your ``Trial`` file to get the hyperparameter set from NNI and report the final result to NNI.
.. code-block:: diff
+ import nni
def run_trial(params):
mnist = input_data.read_data_sets(params['data_dir'], one_hot=True)
mnist_network = MnistNetwork(channel_1_num=params['channel_1_num'], channel_2_num=params['channel_2_num'], conv_size=params['conv_size'], hidden_size=params['hidden_size'], pool_size=params['pool_size'], learning_rate=params['learning_rate'])
mnist_network.build_network()
with tf.Session() as sess:
mnist_network.train(sess, mnist)
test_acc = mnist_network.evaluate(mnist)
+ nni.report_final_result(test_acc)
if __name__ == '__main__':
- params = {'data_dir': '/tmp/tensorflow/mnist/input_data', 'dropout_rate': 0.5, 'channel_1_num': 32, 'channel_2_num': 64,
- 'conv_size': 5, 'pool_size': 2, 'hidden_size': 1024, 'learning_rate': 1e-4, 'batch_num': 2000, 'batch_size': 32}
+ params = nni.get_next_parameter()
run_trial(params)
*Example:* :githublink:`mnist.py <examples/trials/mnist-tfv1/mnist.py>`
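Note that the search space in Step 1 does not list every key that ``run_trial`` reads (for example ``data_dir`` and ``channel_1_num``). One common pattern, sketched here as an assumption rather than as the exact code of the example, is to start from default values and overlay the tuned ones. ``get_next_parameter_stub`` is a hypothetical stand-in for ``nni.get_next_parameter()`` so that the snippet runs without NNI.

```python
def default_params():
    # Defaults copied from the untuned script earlier in this section.
    return {
        "data_dir": "/tmp/tensorflow/mnist/input_data",
        "dropout_rate": 0.5,
        "channel_1_num": 32,
        "channel_2_num": 64,
        "conv_size": 5,
        "pool_size": 2,
        "hidden_size": 1024,
        "learning_rate": 1e-4,
        "batch_num": 2000,
        "batch_size": 32,
    }


def get_next_parameter_stub():
    # Hypothetical stand-in for nni.get_next_parameter(); in a real trial
    # the tuner would supply these values.
    return {"dropout_rate": 0.7, "conv_size": 3, "hidden_size": 512,
            "batch_size": 16, "learning_rate": 0.001}


params = default_params()
params.update(get_next_parameter_stub())  # tuned values override defaults
print(params)
```

With this pattern, keys outside the search space keep their defaults and ``run_trial(params)`` still finds everything it needs.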
**Step 3**\ : Define a ``config`` file in YAML which declares the ``path`` to the search space and trial files. It also gives other information such as the tuning algorithm, max trial number, and max duration arguments.
.. code-block:: yaml
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
trainingServicePlatform: local
# The path to Search Space
searchSpacePath: search_space.json
useAnnotation: false
tuner:
builtinTunerName: TPE
# The path and the running command of trial
trial:
command: python3 mnist.py
codeDir: .
gpuNum: 0
.. Note:: If you are planning to use remote machines or clusters as your :doc:`training service <../TrainingService/Overview>`, to avoid too much pressure on network, we limit the number of files to 2000 and total size to 300MB. If your codeDir contains too many files, you can choose which files and subfolders should be excluded by adding a ``.nniignore`` file that works like a ``.gitignore`` file. For more details on how to write this file, see the `git documentation <https://git-scm.com/docs/gitignore#_pattern_format>`__.
*Example:* :githublink:`config.yml <examples/trials/mnist-tfv1/config.yml>` :githublink:`.nniignore <examples/trials/mnist-tfv1/.nniignore>`
All the code above is already prepared and stored in :githublink:`examples/trials/mnist-tfv1/ <examples/trials/mnist-tfv1>`.
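The file-count and size limits mentioned in the note on ``.nniignore`` can be checked locally before submitting to a remote training service. The sketch below is an assumption-laden illustration: ``code_dir_usage`` and ``within_limits`` are hypothetical helpers, 1MB is taken as 1024 * 1024 bytes, the comparison is inclusive, and ``.nniignore`` patterns are not applied.

```python
import os
import tempfile

MAX_FILES = 2000                 # file-count limit quoted in the note
MAX_BYTES = 300 * 1024 * 1024    # 300MB, assuming 1MB = 1024 * 1024 bytes


def code_dir_usage(code_dir):
    """Return (file_count, total_bytes) for everything under code_dir."""
    count, total = 0, 0
    for root, _dirs, files in os.walk(code_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                count += 1
                total += os.path.getsize(path)
    return count, total


def within_limits(code_dir):
    count, total = code_dir_usage(code_dir)
    return count <= MAX_FILES and total <= MAX_BYTES


# Small self-check on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "mnist.py"), "w") as f:
        f.write("print('hello')\n")
    usage = code_dir_usage(tmp)
    ok = within_limits(tmp)
print(usage, ok)
```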
Linux and macOS
^^^^^^^^^^^^^^^
Run the **config.yml** file from your command line to start an MNIST experiment.
.. code-block:: bash
nnictl create --config nni/examples/trials/mnist-tfv1/config.yml
Windows
^^^^^^^
Run the **config_windows.yml** file from your command line to start an MNIST experiment.
.. code-block:: bash
nnictl create --config nni\examples\trials\mnist-tfv1\config_windows.yml
.. Note:: If you're using NNI on Windows, you probably need to change ``python3`` to ``python`` in the config.yml file or use the config_windows.yml file to start the experiment.
.. Note:: ``nnictl`` is a command line tool that can be used to control experiments, such as start/stop/resume an experiment, start/stop NNIBoard, etc. Click :doc:`here <Nnictl>` for more usage of ``nnictl``.
Wait for the message ``INFO: Successfully started experiment!`` in the command line, which indicates that your experiment has started successfully. The full output should look like this:
.. code-block:: text
INFO: Starting restful server...
INFO: Successfully started Restful server!
INFO: Setting local config...
INFO: Successfully set local config!
INFO: Starting experiment...
INFO: Successfully started experiment!
-----------------------------------------------------------------------
The experiment id is egchD4qy
The Web UI urls are: [Your IP]:8080
-----------------------------------------------------------------------
You can use these commands to get more information about the experiment
-----------------------------------------------------------------------
commands description
1. nnictl experiment show show the information of experiments
2. nnictl trial ls list all of trial jobs
3. nnictl top monitor the status of running experiments
4. nnictl log stderr show stderr log content
5. nnictl log stdout show stdout log content
6. nnictl stop stop an experiment
7. nnictl trial kill kill a trial job by id
8. nnictl --help get help information about nnictl
-----------------------------------------------------------------------
If you prepared ``trial``\ , ``search space``\ , and ``config`` according to the above steps and successfully created an NNI job, NNI will automatically search for the optimal hyperparameters, running a different hyperparameter set for each trial according to the requirements you set. You can follow its progress through the NNI WebUI.
WebUI
-----
After you start your experiment in NNI successfully, you can find a message in the command-line interface that tells you the ``Web UI url`` like this:
.. code-block:: text
The Web UI urls are: [Your IP]:8080
Open the ``Web UI url`` (Here it's: ``[Your IP]:8080``\ ) in your browser; you can view detailed information about the experiment and all the submitted trial jobs as shown below. If you cannot open the WebUI link in your terminal, please refer to the `FAQ <FAQ.rst>`__.
View summary page
^^^^^^^^^^^^^^^^^
Click the "Overview" tab.
Information about this experiment will be shown in the WebUI, including the experiment trial profile and the search space. NNI also supports downloading this information and the parameters through the **Download** button. You can download the experiment results at any time while the experiment is running, or wait until it has finished.
.. image:: ../../img/QuickStart1.png
:target: ../../img/QuickStart1.png
:alt:
The top 10 trials will be listed on the Overview page. You can browse all the trials on the "Trials Detail" page.
.. image:: ../../img/QuickStart2.png
:target: ../../img/QuickStart2.png
:alt:
View trials detail page
^^^^^^^^^^^^^^^^^^^^^^^
Click the "Default Metric" tab to see the point graph of all trials. Hover to see specific default metrics and search space messages.
.. image:: ../../img/QuickStart3.png
:target: ../../img/QuickStart3.png
:alt:
Click the "Hyper Parameter" tab to see the parallel graph.
* You can select the percentage to see the top trials.
* Choose two axes to swap their positions.
.. image:: ../../img/QuickStart4.png
:target: ../../img/QuickStart4.png
:alt:
Click the "Trial Duration" tab to see the bar graph.
.. image:: ../../img/QuickStart5.png
:target: ../../img/QuickStart5.png
:alt:
Below is the status of all trials. Specifically:
* Trial detail: trial's id, duration, start time, end time, status, accuracy, and search space file.
* If you run on the OpenPAI platform, you can also see the hdfsLogPath.
* Kill: you can kill a job that has the ``Running`` status.
* Search: you can search for a specific trial.
.. image:: ../../img/QuickStart6.png
:target: ../../img/QuickStart6.png
:alt:
* Intermediate Result Graph
.. image:: ../../img/QuickStart7.png
:target: ../../img/QuickStart7.png
:alt:
Related Topic
-------------
* `Try different Tuners <../Tuner/BuiltinTuner.rst>`__
* `Try different Assessors <../Assessor/BuiltinAssessor.rst>`__
* `How to use command line tool nnictl <Nnictl.rst>`__
* `How to write a trial <../TrialExample/Trials.rst>`__
* `How to run an experiment on local (with multiple GPUs)? <../TrainingService/LocalMode.rst>`__
* `How to run an experiment on multiple machines? <../TrainingService/RemoteMachineMode.rst>`__
* `How to run an experiment on OpenPAI? <../TrainingService/PaiMode.rst>`__
* `How to run an experiment on Kubernetes through Kubeflow? <../TrainingService/KubeflowMode.rst>`__
* `How to run an experiment on Kubernetes through FrameworkController? <../TrainingService/FrameworkControllerMode.rst>`__
* `How to run an experiment on Kubernetes through AdaptDL? <../TrainingService/AdaptDLMode.rst>`__
.. role:: raw-html(raw)
:format: html
Search Space
============
Overview
--------
In NNI, the tuner samples parameters or architectures according to the search space, which is defined in a JSON file.
To define a search space, users should define the name of each variable, the type of sampling strategy, and its parameters.
* An example of a search space definition is as follows:
.. code-block:: yaml
{
"dropout_rate": {"_type": "uniform", "_value": [0.1, 0.5]},
"conv_size": {"_type": "choice", "_value": [2, 3, 5, 7]},
"hidden_size": {"_type": "choice", "_value": [124, 512, 1024]},
"batch_size": {"_type": "choice", "_value": [50, 250, 500]},
"learning_rate": {"_type": "uniform", "_value": [0.0001, 0.1]}
}
Take the first line as an example. ``dropout_rate`` is defined as a variable whose prior distribution is a uniform distribution with a range from ``0.1`` to ``0.5``.
Note that the available sampling strategies within a search space depend on the tuner you want to use. We list the supported types for each builtin tuner below. For a customized tuner, you don't have to follow our convention and you will have the flexibility to define any type you want.
Types
-----
All types of sampling strategies and their parameters are listed here:
*
``{"_type": "choice", "_value": options}``
* The variable's value is one of the options. Here ``options`` should be a list of numbers or a list of strings. Using arbitrary objects as members of this list (like sublists, a mixture of numbers and strings, or null values) should work in most cases, but may trigger undefined behaviors.
* ``options`` can also be a nested sub-search-space, which takes effect only when the corresponding element is chosen. The variables in this sub-search-space can be seen as conditional variables. Here is a simple :githublink:`example of nested search space definition <examples/trials/mnist-nested-search-space/search_space.json>`. If an element in the options list is a dict, it is a sub-search-space, and for our built-in tuners you have to add a ``_name`` key to this dict, which helps you identify which element is chosen. Accordingly, here is a :githublink:`sample <examples/trials/mnist-nested-search-space/sample.json>` that users can get from NNI with a nested search space definition. See the table below for the tuners that support nested search spaces.
*
``{"_type": "randint", "_value": [lower, upper]}``
* Choosing a random integer between ``lower`` (inclusive) and ``upper`` (exclusive).
* Note: Different tuners may interpret ``randint`` differently. Some (e.g., TPE, GridSearch) treat integers from lower
to upper as unordered ones, while others respect the ordering (e.g., SMAC). If you want all the tuners to respect
the ordering, please use ``quniform`` with ``q=1``.
*
``{"_type": "uniform", "_value": [low, high]}``
* The variable value is uniformly sampled between low and high.
* When optimizing, this variable is constrained to a two-sided interval.
*
``{"_type": "quniform", "_value": [low, high, q]}``
* The variable value is determined using ``clip(round(uniform(low, high) / q) * q, low, high)``\ , where the clip operation is used to constrain the generated value within the bounds. For example, for ``_value`` specified as [0, 10, 2.5], possible values are [0, 2.5, 5.0, 7.5, 10.0]; for ``_value`` specified as [2, 10, 5], possible values are [2, 5, 10].
* Suitable for a discrete value with respect to which the objective is still somewhat "smooth", but which should be bounded both above and below. If you want to uniformly choose an integer from a range [low, high], you can write ``_value`` like this: ``[low, high, 1]``.
*
``{"_type": "loguniform", "_value": [low, high]}``
* The variable value is drawn from a range [low, high] according to a loguniform distribution like exp(uniform(log(low), log(high))), so that the logarithm of the return value is uniformly distributed.
* When optimizing, this variable is constrained to be positive.
*
``{"_type": "qloguniform", "_value": [low, high, q]}``
* The variable value is determined using ``clip(round(loguniform(low, high) / q) * q, low, high)``\ , where the clip operation is used to constrain the generated value within the bounds.
* Suitable for a discrete variable with respect to which the objective is "smooth" and gets smoother with the size of the value, but which should be bounded both above and below.
*
``{"_type": "normal", "_value": [mu, sigma]}``
* The variable value is a real value that's normally-distributed with mean mu and standard deviation sigma. When optimizing, this is an unconstrained variable.
*
``{"_type": "qnormal", "_value": [mu, sigma, q]}``
* The variable value is determined using ``round(normal(mu, sigma) / q) * q``
* Suitable for a discrete variable that probably takes a value around mu, but is fundamentally unbounded.
*
``{"_type": "lognormal", "_value": [mu, sigma]}``
* The variable value is drawn according to ``exp(normal(mu, sigma))`` so that the logarithm of the return value is normally distributed. When optimizing, this variable is constrained to be positive.
*
``{"_type": "qlognormal", "_value": [mu, sigma, q]}``
* The variable value is determined using ``round(exp(normal(mu, sigma)) / q) * q``
* Suitable for a discrete variable with respect to which the objective is smooth and gets smoother with the size of the variable, which is bounded from one side.
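The formulas listed above can be tried out with the standard library. This is a minimal sketch, not the tuners' own sampling code: the function names simply mirror the ``_type`` values, and Python's built-in ``round`` (which rounds halves to even) is assumed to be an acceptable stand-in for whatever rounding a given tuner uses.

```python
import math
import random


def clip(x, low, high):
    # Constrain x to the closed interval [low, high].
    return max(low, min(high, x))


def quniform(low, high, q):
    return clip(round(random.uniform(low, high) / q) * q, low, high)


def loguniform(low, high):
    # The logarithm of the returned value is uniformly distributed.
    return math.exp(random.uniform(math.log(low), math.log(high)))


def qloguniform(low, high, q):
    return clip(round(loguniform(low, high) / q) * q, low, high)


def qnormal(mu, sigma, q):
    return round(random.gauss(mu, sigma) / q) * q


def qlognormal(mu, sigma, q):
    return round(math.exp(random.gauss(mu, sigma)) / q) * q


random.seed(0)
values_a = {quniform(0, 10, 2.5) for _ in range(1000)}
values_b = {quniform(2, 10, 5) for _ in range(1000)}
print(sorted(values_a), sorted(values_b))
```

The two sets printed at the end correspond to the ``quniform`` examples given in the list: every sampled value falls on the grid described there.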
Search Space Types Supported by Each Tuner
------------------------------------------
.. list-table::
:header-rows: 1
:widths: auto
* -
- choice
- choice(nested)
- randint
- uniform
- quniform
- loguniform
- qloguniform
- normal
- qnormal
- lognormal
- qlognormal
* - TPE Tuner
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
* - Random Search Tuner
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
* - Anneal Tuner
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
* - Evolution Tuner
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
* - SMAC Tuner
- :raw-html:`&#10003;`
-
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
-
-
-
-
-
* - Batch Tuner
- :raw-html:`&#10003;`
-
-
-
-
-
-
-
-
-
-
* - Grid Search Tuner
- :raw-html:`&#10003;`
-
- :raw-html:`&#10003;`
-
- :raw-html:`&#10003;`
-
-
-
-
-
-
* - Hyperband Advisor
- :raw-html:`&#10003;`
-
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
* - Metis Tuner
- :raw-html:`&#10003;`
-
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
-
-
-
-
-
-
* - GP Tuner
- :raw-html:`&#10003;`
-
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
- :raw-html:`&#10003;`
-
-
-
-
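The support table above can also be written as a lookup structure, which makes it easy to check a search space against a tuner before launching an experiment. In this sketch the dictionary keys are the display names from the table, not ``builtinTunerName`` values, and ``unsupported_types`` is a hypothetical helper.

```python
ALL_TYPES = ["choice", "randint", "uniform", "quniform", "loguniform",
             "qloguniform", "normal", "qnormal", "lognormal", "qlognormal"]

# Transcribed from the table; nested choice is tracked separately below.
SUPPORTED = {
    "TPE": set(ALL_TYPES),
    "Random Search": set(ALL_TYPES),
    "Anneal": set(ALL_TYPES),
    "Evolution": set(ALL_TYPES),
    "SMAC": {"choice", "randint", "uniform", "quniform", "loguniform"},
    "Batch": {"choice"},
    "Grid Search": {"choice", "randint", "quniform"},
    "Hyperband": set(ALL_TYPES),
    "Metis": {"choice", "randint", "uniform", "quniform"},
    "GP": {"choice", "randint", "uniform", "quniform", "loguniform", "qloguniform"},
}
NESTED_CHOICE = {"TPE", "Random Search", "Anneal", "Evolution"}


def unsupported_types(tuner, search_space):
    """Return the names of variables whose _type the tuner does not support."""
    allowed = SUPPORTED[tuner]
    return sorted(name for name, spec in search_space.items()
                  if spec["_type"] not in allowed)


space = {
    "dropout_rate": {"_type": "uniform", "_value": [0.1, 0.5]},
    "conv_size": {"_type": "choice", "_value": [2, 3, 5, 7]},
    "learning_rate": {"_type": "uniform", "_value": [0.0001, 0.1]},
}
print(unsupported_types("Grid Search", space))
print(unsupported_types("TPE", space))
```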
Known Limitations:
*
GP Tuner and Metis Tuner support only **numerical values** in the search space (\ ``choice`` type values can be non-numerical with other tuners, e.g. string values). Both GP Tuner and Metis Tuner use a Gaussian Process Regressor (GPR). GPR makes predictions based on a kernel function and the 'distance' between different points, and it is hard to define a true distance between non-numerical values.
*
Note that for nested search space:
* Only the Random Search, TPE, Anneal, and Evolution tuners support nested search spaces.
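To make the nested case concrete, here is a small sketch of a nested ``choice`` and one way to draw a sample from it. The layer and option names are illustrative, ``sample`` is a hypothetical helper that handles only ``choice``, and the shape of the returned dict (the chosen ``_name`` plus the sampled conditional variables) is an assumption based on the description in the Types section.

```python
import random

nested_space = {
    "layer0": {
        "_type": "choice",
        "_value": [
            {"_name": "Empty"},
            {"_name": "Conv",
             "kernel_size": {"_type": "choice", "_value": [1, 2, 3, 5]}},
            {"_name": "Max_pool",
             "pooling_size": {"_type": "choice", "_value": [2, 3, 5]}},
        ],
    }
}


def sample(space):
    """Sample one configuration; only `choice` is handled in this sketch."""
    result = {}
    for name, spec in space.items():
        chosen = random.choice(spec["_value"])
        if isinstance(chosen, dict):
            # A dict option is a sub-search-space: keep its _name and
            # sample the conditional variables that live inside it.
            inner = {k: v for k, v in chosen.items() if k != "_name"}
            chosen = {"_name": chosen["_name"], **sample(inner)}
        result[name] = chosen
    return result


random.seed(1)
config = sample(nested_space)
print(config)
```

The ``_name`` key in the result is what lets trial code tell which option was picked and therefore which conditional variables are present.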
Setup NNI development environment
=================================
The NNI development environment supports Ubuntu 16.04 (or above) and Windows 10 with 64-bit Python 3.
Installation
------------
The installation steps are similar to installing from source code, except that the installation links to the code directory so that code changes take effect in the installed package as easily as possible.
1. Clone source code
^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash
git clone https://github.com/Microsoft/nni.git
Note: if you want to contribute code back, you need to fork the NNI repo to your own account and clone from there.
2. Install from source code
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Ubuntu
^^^^^^
.. code-block:: bash
make dev-easy-install
Windows
^^^^^^^
.. code-block:: bat
powershell -ExecutionPolicy Bypass -file install.ps1 -Development
3. Check if the environment is ready
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Now, you can try to start an experiment to check if your environment is ready.
For example, run the command
.. code-block:: bash
nnictl create --config examples/trials/mnist-tfv1/config.yml
Then open the WebUI to check that everything is OK.
4. Reload changes
^^^^^^^^^^^^^^^^^
Python
^^^^^^
Nothing to do, the code is already linked to package folders.
TypeScript
^^^^^^^^^^
* If ``src/nni_manager`` is changed, run ``yarn watch`` under this folder. It will watch and build the code continually. ``nnictl`` needs to be restarted to reload the NNI manager.
* If ``src/webui`` is changed, run ``yarn dev``\ , which will run a mock API server and a webpack dev server simultaneously. Use ``EXPERIMENT`` environment variable (e.g., ``mnist-tfv1-running``\ ) to specify the mock data being used. Built-in mock experiments are listed in ``src/webui/mock``. An example of the full command is ``EXPERIMENT=mnist-tfv1-running yarn dev``.
* If ``src/nasui`` is changed, run ``yarn start`` under the corresponding folder. The web UI will refresh automatically if code is changed. There is also a mock API server that is useful when developing. It can be launched via ``node server.js``.
5. Submit Pull Request
^^^^^^^^^^^^^^^^^^^^^^
All changes are merged into the master branch from your forked repo. The description of the pull request must be meaningful and useful.
We will review the changes as soon as possible. Once they pass review, we will merge them into the master branch.
For more contribution guidelines and coding styles, you can refer to the `contributing document <Contributing.rst>`__.
WebUI
=====
View summary page
-----------------
Click the tab "Overview".
* On the overview tab, you can see the experiment information and status, as well as the performance of the top trials. To see the config and search space, click the "Config" and "Search space" buttons on the right.
.. image:: ../../img/webui-img/full-oview.png
:target: ../../img/webui-img/full-oview.png
:alt:
* If your experiment has many trials, you can change the refresh interval here.
.. image:: ../../img/webui-img/refresh-interval.png
:target: ../../img/webui-img/refresh-interval.png
:alt:
* You can review and download the experiment results and nni-manager/dispatcher log files from the "Download" button.
.. image:: ../../img/webui-img/download.png
:target: ../../img/webui-img/download.png
:alt:
* You can change some experiment configurations here, such as maxExecDuration, maxTrialNum and trial concurrency.
.. image:: ../../img/webui-img/edit-experiment-param.png
:target: ../../img/webui-img/edit-experiment-param.png
:alt:
* You can click the exclamation point in the error box to see a log message if the experiment's status is an error.
.. image:: ../../img/webui-img/log-error.png
:target: ../../img/webui-img/log-error.png
:alt:
.. image:: ../../img/webui-img/review-log.png
:target: ../../img/webui-img/review-log.png
:alt:
* You can click "About" to see the version and report any questions.
View job default metric
-----------------------
* Click the tab "Default Metric" to see the point graph of all trials. Hover to see its specific default metric and search space message.
.. image:: ../../img/webui-img/default-metric.png
:target: ../../img/webui-img/default-metric.png
:alt:
* Click the switch named "optimization curve" to see the experiment's optimization curve.
.. image:: ../../img/webui-img/best-curve.png
:target: ../../img/webui-img/best-curve.png
:alt:
View hyper parameter
--------------------
Click the tab "Hyper Parameter" to see the parallel graph.
* You can add/remove axes and drag to swap axes on the chart.
* You can select the percentage to see top trials.
.. image:: ../../img/webui-img/hyperPara.png
:target: ../../img/webui-img/hyperPara.png
:alt:
View Trial Duration
-------------------
Click the tab "Trial Duration" to see the bar graph.
.. image:: ../../img/webui-img/trial_duration.png
:target: ../../img/webui-img/trial_duration.png
:alt:
View Trial Intermediate Result Graph
------------------------------------
Click the tab "Intermediate Result" to see the line graph.
.. image:: ../../img/webui-img/trials_intermeidate.png
:target: ../../img/webui-img/trials_intermeidate.png
:alt:
The trial may have many intermediate results in the training process. In order to see the trend of some trials more clearly, we set a filtering function for the intermediate result graph.
You may find that trials get better or worse at a particular intermediate result, which indicates that it is an important and relevant one. To take a closer look at that point, enter its corresponding X-value at #Intermediate, then input the range of metrics for this intermediate result. In the picture below, we choose the No. 4 intermediate result and set the range of metrics to 0.8-1.
.. image:: ../../img/webui-img/filter-intermediate.png
:target: ../../img/webui-img/filter-intermediate.png
:alt:
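The filtering described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the WebUI's code: ``filter_trials`` is a hypothetical helper, the trial data is made up, and the 1-based index and inclusive bounds are assumptions.

```python
def filter_trials(intermediate_results, index, low, high):
    """Keep trials whose metric at the given intermediate result lies in [low, high].

    `index` is 1-based here to mirror the "#Intermediate" field; that choice,
    and the inclusive bounds, are assumptions of this sketch.
    """
    kept = {}
    for trial_id, metrics in intermediate_results.items():
        # Trials that have not yet reported this intermediate result are skipped.
        if len(metrics) >= index and low <= metrics[index - 1] <= high:
            kept[trial_id] = metrics
    return kept


# Made-up intermediate results for three trials.
results = {
    "trial_a": [0.31, 0.55, 0.72, 0.85, 0.91],
    "trial_b": [0.20, 0.33, 0.41, 0.52, 0.60],
    "trial_c": [0.40, 0.62],
}
selected = filter_trials(results, index=4, low=0.8, high=1.0)
print(sorted(selected))
```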
View trials status
------------------
Click the tab "Trials Detail" to see the status of all trials. Specifically:
* Trial detail: trial's id, trial's duration, start time, end time, status, accuracy, and search space file.
.. image:: ../../img/webui-img/detail-local.png
:target: ../../img/webui-img/detail-local.png
:alt:
* Use the "Add column" button to select which columns to show in the table. If you run an experiment whose final result is a dict, you can see its other keys in the table. You can choose the "Intermediate count" column to watch a trial's progress.
.. image:: ../../img/webui-img/addColumn.png
:target: ../../img/webui-img/addColumn.png
:alt:
* If you want to compare some trials, you can select them and then click "Compare" to see the results.
.. image:: ../../img/webui-img/select-trial.png
:target: ../../img/webui-img/select-trial.png
:alt:
.. image:: ../../img/webui-img/compare.png
:target: ../../img/webui-img/compare.png
:alt:
* You can search for a specific trial by its ID, status, trial number, or parameters.
.. image:: ../../img/webui-img/search-trial.png
:target: ../../img/webui-img/search-trial.png
:alt:
* You can use the button named "Copy as python" to copy the trial's parameters.
.. image:: ../../img/webui-img/copyParameter.png
:target: ../../img/webui-img/copyParameter.png
:alt:
* If you run on the OpenPAI or Kubeflow platform, you can also see the nfs log.
.. image:: ../../img/webui-img/detail-pai.png
:target: ../../img/webui-img/detail-pai.png
:alt:
* Intermediate Result Graph: you can see the default metric in this graph by clicking the intermediate button.
.. image:: ../../img/webui-img/intermediate.png
:target: ../../img/webui-img/intermediate.png
:alt:
* Kill: you can kill a job whose status is running.
.. image:: ../../img/webui-img/kill-running.png
:target: ../../img/webui-img/kill-running.png
:alt:
Python API Reference of Auto Tune
=================================
.. contents::
Trial
-----
.. autofunction:: nni.get_next_parameter
.. autofunction:: nni.get_current_parameter
.. autofunction:: nni.report_intermediate_result
.. autofunction:: nni.report_final_result
.. autofunction:: nni.get_experiment_id
.. autofunction:: nni.get_trial_id
.. autofunction:: nni.get_sequence_id
Tuner
-----
.. autoclass:: nni.tuner.Tuner
:members:
.. autoclass:: nni.algorithms.hpo.hyperopt_tuner.hyperopt_tuner.HyperoptTuner
:members:
.. autoclass:: nni.algorithms.hpo.evolution_tuner.evolution_tuner.EvolutionTuner
:members:
.. autoclass:: nni.algorithms.hpo.smac_tuner.SMACTuner
:members:
.. autoclass:: nni.algorithms.hpo.gridsearch_tuner.GridSearchTuner
:members:
.. autoclass:: nni.algorithms.hpo.networkmorphism_tuner.networkmorphism_tuner.NetworkMorphismTuner
:members:
.. autoclass:: nni.algorithms.hpo.metis_tuner.metis_tuner.MetisTuner
:members:
.. autoclass:: nni.algorithms.hpo.ppo_tuner.PPOTuner
:members:
.. autoclass:: nni.algorithms.hpo.batch_tuner.batch_tuner.BatchTuner
:members:
.. autoclass:: nni.algorithms.hpo.gp_tuner.gp_tuner.GPTuner
:members:
Assessor
--------
.. autoclass:: nni.assessor.Assessor
:members:
.. autoclass:: nni.assessor.AssessResult
:members:
.. autoclass:: nni.algorithms.hpo.curvefitting_assessor.CurvefittingAssessor
:members:
.. autoclass:: nni.algorithms.hpo.medianstop_assessor.MedianstopAssessor
:members:
Advisor
-------
.. autoclass:: nni.runtime.msg_dispatcher_base.MsgDispatcherBase
:members:
.. autoclass:: nni.algorithms.hpo.hyperband_advisor.hyperband_advisor.Hyperband
:members:
.. autoclass:: nni.algorithms.hpo.bohb_advisor.bohb_advisor.BOHB
:members:
Utilities
---------
.. autofunction:: nni.utils.merge_parameter
......@@ -12,12 +12,10 @@
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
from recommonmark.transform import AutoStructify
from recommonmark.parser import CommonMarkParser
import os
import subprocess
import sys
sys.path.insert(0, os.path.abspath('../../src/sdk/pynni'))
sys.path.insert(1, os.path.abspath('../../src/sdk/pycli'))
sys.path.insert(0, os.path.abspath('../..'))
# -- Project information ---------------------------------------------------
......@@ -43,12 +41,12 @@ release = 'v1.9'
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.mathjax',
'sphinx_markdown_tables',
'sphinxarg.ext',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'sphinx.ext.intersphinx',
'nbsphinx',
'sphinx.ext.extlinks',
]
# Add mock modules
......@@ -59,12 +57,7 @@ templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_parsers = {
'.md': CommonMarkParser
}
source_suffix = ['.rst', '.md']
source_suffix = ['.rst']
# The master toctree document.
master_doc = 'contents'
......@@ -197,12 +190,14 @@ epub_title = project
# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']
# external links (for github code)
# Reference the code via :githublink:`path/to/your/example/code.py`
git_commit_id = subprocess.check_output(['git', 'rev-parse', 'HEAD']).decode().strip()
extlinks = {
'githublink': ('https://github.com/microsoft/nni/blob/' + git_commit_id + '/%s', 'Github link: ')
}
# -- Extension configuration -------------------------------------------------
def setup(app):
app.add_config_value('recommonmark_config', {
'enable_eval_rst': True,
'enable_auto_toc_tree': False,
}, True)
app.add_transform(AutoStructify)
app.add_stylesheet('css/custom.css')
......@@ -17,6 +17,7 @@ Neural Network Intelligence
Feature Engineering <feature_engineering>
References <reference>
Use Cases and Solutions <CommunitySharings/community_sharings>
Research and Publications <ResearchPublications>
FAQ <Tutorial/FAQ>
How to Contribute <contribution>
Changelog <Release>
......@@ -9,4 +9,4 @@ Advanced Features
Write a New Advisor <Tuner/CustomizeAdvisor>
Write a New Training Service <TrainingService/HowToImplementTrainingService>
Install Customized Algorithms as Builtin Tuners/Assessors/Advisors <Tutorial/InstallCustomizedAlgos>
How to install customized tuner as a builtin tuner <Tuner/InstallCustomizedTuner.md>
How to install customized tuner as a builtin tuner <Tuner/InstallCustomizedTuner>
NNI Client
==========
NNI client is a Python API for ``nnictl``\ , which implements the most commonly used commands. Users can use this API to control their experiments, collect experiment results, and conduct advanced analyses based on those results directly in Python code instead of using the command line. Here is an example:
.. code-block:: python
from nni.experiment import Experiment
# create an experiment instance
exp = Experiment()
# start an experiment, then connect the instance to this experiment
# you can also use `resume_experiment`, `view_experiment` or `connect_experiment`
# only one of them should be called in one instance
exp.start_experiment('nni/examples/trials/mnist-pytorch/config.yml', port=9090)
# update the experiment's concurrency
exp.update_concurrency(3)
# get some information about the experiment
print(exp.get_experiment_status())
print(exp.get_job_statistics())
print(exp.list_trial_jobs())
# stop the experiment, then disconnect the instance from the experiment.
exp.stop_experiment()
References
----------
.. autoclass:: nni.experiment.Experiment
:members:
.. autoclass:: nni.experiment.TrialJob
:members:
.. autoclass:: nni.experiment.TrialHyperParameters
:members:
.. autoclass:: nni.experiment.TrialMetricData
:members:
.. autoclass:: nni.experiment.TrialResult
:members:
sphinx==1.8.3
sphinx==3.3.1
sphinx-argparse==0.2.5
sphinx-markdown-tables==0.0.9
sphinx-rtd-theme==0.4.2
sphinxcontrib-websupport==1.1.0
recommonmark==0.5.0
pygments==2.7.1
hyperopt
json_tricks
......
import argparse
import m2r
import os
import re
import shutil
from pathlib import Path
def single_line_process(line):
    if line == ' .. contents::':
        return '.. contents::'
    # https://github.com/sphinx-doc/sphinx/issues/3921
    line = re.sub(r'(`.*? <.*?>`)_', r'\1__', line)
    # inline emphasis
    line = re.sub(r'\*\*\\ (.*?)\\ \*\*', r' **\1** ', line)
    line = re.sub(r'\*(.*?)\\ \*', r'*\1*', line)
    line = re.sub(r'\*\*(.*?) \*\*', r'**\1** ', line)
    line = re.sub(r'\\\*\\\*(.*?)\*\*', r'**\1**', line)
    line = re.sub(r'\\\*\\\*(.*?)\*\*\\ ', r'**\1**', line)
    line = line.replace(r'\* - `\**', r'* - `**')
    line = re.sub(r'\\\* \*\*(.*?)\*\* \(\\\*\s*(.*?)\s*\*\\ \)', r'* \1 (\2)', line)
    line = re.sub(r'\<(.*)\.md(\>|#)', r'<\1.rst\2', line)
    line = re.sub(r'`\*\*(.*?)\*\* <#(.*?)>`__', r'`\1 <#\2>`__', line)
    line = re.sub(r'\*\* (classArgs|stop|FLOPS.*?|pruned.*?|large.*?|path|preCommand|2D.*?|codeDirectory|ps|worker|Tuner|Assessor)\*\*',
                  r' **\1**', line)
    line = line.replace('.. code-block:::: bash', '.. code-block:: bash')
    line = line.replace('raw-html-m2r', 'raw-html')
    line = line.replace('[toc]', '.. toctree::')
    # image
    line = re.sub(r'\:raw\-html\:`\<img src\=\"(.*?)\" style\=\"zoom\: ?(\d+)\%\;\" \/\>`', r'\n.. image:: \1\n :scale: \2%', line)
    # special case (per line handling)
    line = line.replace('Nb = |Db|', r'Nb = \|Db\|')
    line = line.replace(' Here is just a small list of libraries ', '\nHere is just a small list of libraries ')
    line = line.replace(' Find the data management region in job submission page.', 'Find the data management region in job submission page.')
    line = line.replace('Tuner/InstallCustomizedTuner.md', 'Tuner/InstallCustomizedTuner')
    line = line.replace('&#10003;', ':raw-html:`&#10003;`')
    line = line.replace(' **builtinTunerName** and** classArgs**', '**builtinTunerName** and **classArgs**')
    line = line.replace('`\ ``nnictl ss_gen`` <../Tutorial/Nnictl.rst>`__', '`nnictl ss_gen <../Tutorial/Nnictl.rst>`__')
    line = line.replace('**Step 1. Install NNI, follow the install guide `here <../Tutorial/QuickStart.rst>`__.**',
                        '**Step 1. Install NNI, follow the install guide** `here <../Tutorial/QuickStart.rst>`__.')
    line = line.replace('*Please refer to `here ', 'Please refer to `here ')
    # line = line.replace('\* **optimize_mode** ', '* **optimize_mode** ')
    if line == '~' * len(line):
        line = '^' * len(line)
    return line
def special_case_replace(full_text):
    replace_pairs = {}
    replace_pairs['PyTorch\n"""""""'] = '**PyTorch**'
    replace_pairs['Search Space\n============'] = '.. role:: raw-html(raw)\n :format: html\n\nSearch Space\n============'
    for file in os.listdir(Path(__file__).parent / 'patches'):
        with open(Path(__file__).parent / 'patches' / file) as f:
            r, s = f.read().split('%%%%%%\n')
            replace_pairs[r] = s
    for r, s in replace_pairs.items():
        full_text = full_text.replace(r, s)
    return full_text
def process_table(content):
    content = content.replace('------ |', '------|')
    lines = []
    for line in content.split('\n'):
        if line.startswith(' |'):
            line = line[2:]
        lines.append(line)
    return '\n'.join(lines)
def process_github_link(line):
    line = re.sub(r'`(\\ ``)?([^`]*?)(``)? \<(.*?)(blob|tree)/v1.9/(.*?)\>`__', r':githublink:`\2 <\6>`', line)
    if 'githublink' in line:
        line = re.sub(r'\*Example: (.*)\*', r'*Example:* \1', line)
    line = line.replace('https://nni.readthedocs.io/en/latest', '')
    return line
for root, dirs, files in os.walk('en_US'):
    root = Path(root)
    for file in files:
        if not file.endswith('.md') or file == 'Release_v1.0.md':
            continue
        with open(root / file) as f:
            md_content = f.read()
        if file == 'Nnictl.md':
            md_content = process_table(md_content)
        out = m2r.convert(md_content)
        lines = out.split('\n')
        if lines[0] == '':
            lines = lines[1:]
        # remove code-block eval_rst
        i = 0
        while i < len(lines):
            line = lines[i]
            if line.strip() == '.. code-block:: eval_rst':
                space_count = line.index('.')
                lines[i] = lines[i + 1] = None
                if i > 0 and lines[i - 1]:
                    lines[i] = ''  # blank line
                i += 2
                # dedent the body of the former eval_rst block
                while i < len(lines) and (lines[i].startswith(' ' * (space_count + 3)) or lines[i] == ''):
                    lines[i] = lines[i][space_count + 3:]
                    i += 1
            elif line.strip() == '.. code-block' or line.strip() == '.. code-block::':
                # a resulting '.. code-block:::: bash' is normalized in single_line_process
                lines[i] += ':: bash'
                i += 1
            else:
                i += 1
        lines = [l for l in lines if l is not None]
        lines = list(map(single_line_process, lines))
        if file != 'Release.md':
            # githublink
            lines = list(map(process_github_link, lines))
        out = '\n'.join(lines)
        out = special_case_replace(out)
        with open(root / (Path(file).stem + '.rst'), 'w') as f:
            f.write(out)
        # back it up and remove
        moved_root = Path('archive_en_US') / root.relative_to('en_US')
        # parents=True: an intermediate directory may not exist yet if it held no .md files
        moved_root.mkdir(parents=True, exist_ok=True)
        shutil.move(root / file, moved_root / file)
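The per-line fix-ups in `single_line_process` are easier to follow when a few are isolated and run on sample input. The sketch below is illustrative and not part of the commit: the helper names `fix_link_suffix`, `md_link_to_rst`, and `fix_underline` are hypothetical, while the regular expressions and the underline rule are copied from the script above.

```python
import re


def fix_link_suffix(line):
    # Turn a named hyperlink (`text <url>`_) into an anonymous one (`text <url>`__).
    # See https://github.com/sphinx-doc/sphinx/issues/3921
    return re.sub(r'(`.*? <.*?>`)_', r'\1__', line)


def md_link_to_rst(line):
    # Point link targets ending in .md (optionally followed by #anchor) at .rst instead.
    return re.sub(r'\<(.*)\.md(\>|#)', r'<\1.rst\2', line)


def fix_underline(line):
    # Swap a heading underline made only of '~' for one made of '^'.
    if line == '~' * len(line):
        return '^' * len(line)
    return line


if __name__ == '__main__':
    samples = [
        'See `Get Started <QuickStart.rst>`_ guide.',
        '`Guide <../Tutorial/QuickStart.md>`__',
        '~~~~',
    ]
    for sample in samples:
        print(fix_underline(md_link_to_rst(fix_link_suffix(sample))))
```

The link-suffix substitution is not idempotent: a link that already ends in `__` gains a third underscore, so it is only safe on text that has not yet been through this step.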