Unverified commit 0663218b authored by SparkSnail, committed by GitHub

Merge pull request #163 from Microsoft/master

merge master
parents 6c9360a5 cf983800
@@ -66,6 +66,7 @@ NNI_VERSION_TEMPLATE = 999.0.0-developing
build:
	#$(_INFO) Building NNI Manager $(_END)
	cd src/nni_manager && $(NNI_YARN) && $(NNI_YARN) build
+	cp -rf src/nni_manager/config src/nni_manager/dist/
	#$(_INFO) Building WebUI $(_END)
	cd src/webui && $(NNI_YARN) && $(NNI_YARN) build
......
@@ -60,7 +60,8 @@ NNI (Neural Network Intelligence) 是自动机器学习(AutoML)的工具包
<li><a href="docs/zh_CN/Builtin_Tuner.md#NetworkMorphism">Network Morphism</a></li>
<li><a href="examples/tuners/enas_nni/README_zh_CN.md">ENAS</a></li>
<li><a href="docs/zh_CN/Builtin_Tuner.md#NetworkMorphism#MetisTuner">Metis Tuner</a></li>
+<li><a href="docs/zh_CN/Builtin_Tuner.md#BOHB">BOHB</a></li>
</ul>
<a href="docs/zh_CN/Builtin_Assessors.md#assessor">Assessor(评估器)</a>
<ul>
<li><a href="docs/zh_CN/Builtin_Assessors.md#Medianstop">Median Stop</a></li>
@@ -69,7 +70,7 @@ NNI (Neural Network Intelligence) 是自动机器学习(AutoML)的工具包
</td>
<td>
<ul>
-<li><a href="docs/zh_CN/tutorial_1_CR_exp_local_api.md">本地计算机</a></li>
+<li><a href="docs/zh_CN/LocalMode.md">本地计算机</a></li>
<li><a href="docs/zh_CN/RemoteMachineMode.md">远程计算机</a></li>
<li><a href="docs/zh_CN/PAIMode.md">OpenPAI</a></li>
<li><a href="docs/zh_CN/KubeflowMode.md">Kubeflow</a></li>
@@ -193,7 +194,7 @@ NNI (Neural Network Intelligence) 是自动机器学习(AutoML)的工具包
## **教程**
-* [在本机运行 Experiment (支持多 GPU 卡)](docs/zh_CN/tutorial_1_CR_exp_local_api.md)
+* [在本机运行 Experiment (支持多 GPU 卡)](docs/zh_CN/LocalMode.md)
* [在多机上运行 Experiment](docs/zh_CN/RemoteMachineMode.md)
* [在 OpenPAI 上运行 Experiment](docs/zh_CN/PAIMode.md)
* [在 Kubeflow 上运行 Experiment。](docs/zh_CN/KubeflowMode.md)
......
@@ -171,3 +171,46 @@ jobs:
      fi
    condition: eq( variables['upload_package'], 'true')
    displayName: 'upload nni package to pypi/testpypi'
- job: 'Build_upload_nni_windows'
  dependsOn: version_number_validation
  condition: succeeded()
  pool:
    vmImage: 'vs2017-win2016'
  strategy:
    matrix:
      Python36:
        PYTHON_VERSION: '3.6'
  steps:
  - script: |
      python -m pip install --upgrade pip setuptools --user
      python -m pip install twine --user
    displayName: 'Install twine'
  - script: |
      cd deployment/pypi
      if [ $(build_type) = 'prerelease' ]
      then
        # NNI build scripts (powershell) use the branch tag as the package version number
        git tag $(build_version)
        echo 'building prerelease package...'
        powershell.exe ./install.ps1 -version_ts $True
      else
        echo 'building release package...'
        powershell.exe ./install.ps1
      fi
    condition: eq( variables['upload_package'], 'true')
    displayName: 'build nni bdist_wheel'
  - script: |
      cd deployment/pypi
      if [ $(build_type) = 'prerelease' ]
      then
        echo 'uploading prerelease package to testpypi...'
        python -m twine upload -u $(testpypi_user) -p $(testpypi_pwd) --repository-url https://test.pypi.org/legacy/ dist/*
      else
        echo 'uploading release package to pypi...'
        python -m twine upload -u $(pypi_user) -p $(pypi_pwd) dist/*
      fi
    condition: eq( variables['upload_package'], 'true')
    displayName: 'upload nni package to pypi/testpypi'
\ No newline at end of file
# Python Package Index (PyPI) for NNI

This is the PyPI build and upload tool for the NNI project.

## **For Linux**

* __Prepare environment__

Before building and uploading the NNI package, make sure the OS and tools below are available.

```
Ubuntu 16.04 LTS
make
wget
Python >= 3.5
Pip
Node.js
Yarn
```

* __How to build__

```bash
make
```

* __How to upload__

**upload for testing**

```bash
TWINE_REPOSITORY_URL=https://test.pypi.org/legacy/ make upload
```

You may need to input the account and password of https://test.pypi.org during this process.

**upload for release**

```bash
make upload
```

You may need to input the account and password of https://pypi.org during this process.
## **For Windows**
* __Prepare environment__
Before building and uploading the NNI package, make sure the OS and tools below are available.
```
Windows 10
powershell
Python >= 3.5
Pip
Node.js
Yarn
tar
```
* __How to build__
```bash
powershell ./install.ps1
```
* __How to upload__
**upload for testing**
```bash
powershell ./upload.ps1
```
You may need to input the account and password of https://test.pypi.org during this process.
**upload for release**
```bash
powershell ./upload.ps1 -test $False
```
You may need to input the account and password of https://pypi.org during this process.
\ No newline at end of file
$CWD = $PWD
$OS_SPEC = "windows"
Remove-Item $CWD\build -Recurse -Force
Remove-Item $CWD\dist -Recurse -Force
Remove-Item $CWD\nni -Recurse -Force
Remove-Item $CWD\nni.egg-info -Recurse -Force
Remove-Item $CWD\node-$OS_SPEC-x64 -Recurse -Force
\ No newline at end of file
param([bool]$version_ts=$false)
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
$CWD = $PWD
$OS_SPEC = "windows"
$WHEEL_SPEC = "win_amd64"
$TIME_STAMP = date -u "+%y%m%d%H%M"
$NNI_VERSION_VALUE = git describe --tags --abbrev=0
# To include time stamp in version value, run:
# ./install.ps1 -version_ts $True
if($version_ts){
    $NNI_VERSION_VALUE = "$NNI_VERSION_VALUE.$TIME_STAMP"
}
$NNI_VERSION_TEMPLATE = "999.0.0-developing"
python -m pip install --user --upgrade setuptools wheel
$nodeUrl = "https://aka.ms/nni/nodejs-download/win64"
$NNI_NODE_ZIP = "$CWD\node-$OS_SPEC-x64.zip"
$NNI_NODE_FOLDER = "$CWD\node-$OS_SPEC-x64"
(New-Object Net.WebClient).DownloadFile($nodeUrl, $NNI_NODE_ZIP)
if(Test-Path $NNI_NODE_FOLDER){
    Remove-Item $NNI_NODE_FOLDER -Recurse -Force
}
New-Item $NNI_NODE_FOLDER -ItemType Directory
cmd /c tar -xf $NNI_NODE_ZIP -C $NNI_NODE_FOLDER --strip-components 1
cd $CWD\..\..\src\nni_manager
yarn
yarn build
cd $CWD\..\..\src\webui
yarn
yarn build
if(Test-Path $CWD\nni){
    Remove-Item $CWD\nni -Recurse -Force
}
Copy-Item $CWD\..\..\src\nni_manager\dist $CWD\nni -Recurse
Copy-Item $CWD\..\..\src\webui\build $CWD\nni\static -Recurse
Copy-Item $CWD\..\..\src\nni_manager\package.json $CWD\nni
(Get-Content $CWD\nni\package.json).replace($NNI_VERSION_TEMPLATE, $NNI_VERSION_VALUE) | Set-Content $CWD\nni\package.json
cd $CWD\nni
yarn --prod
cd $CWD
(Get-Content setup.py).replace($NNI_VERSION_TEMPLATE, $NNI_VERSION_VALUE) | Set-Content setup.py
python setup.py bdist_wheel -p $WHEEL_SPEC
\ No newline at end of file
@@ -27,10 +27,15 @@ if os_type == 'Linux':
     os_name = 'POSIX :: Linux'
 elif os_type == 'Darwin':
     os_name = 'MacOS'
+elif os_type == 'Windows':
+    os_name = 'Microsoft :: Windows'
 else:
     raise NotImplementedError('current platform {} not supported'.format(os_type))

 data_files = [('bin', ['node-{}-x64/bin/node'.format(os_type.lower())])]
+if os_type == 'Windows':
+    data_files = [('.\Scripts', ['node-{}-x64/node.exe'.format(os_type.lower())])]
 for (dirpath, dirnames, filenames) in walk('./nni'):
     files = [path.normpath(path.join(dirpath, filename)) for filename in filenames]
     data_files.append((path.normpath(dirpath), files))
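Outside the diff, the `data_files` walk in the hunk above can be sketched as a standalone snippet; the directory layout and file names here are hypothetical stand-ins for the real `./nni` tree:

```python
# Standalone sketch of the data_files walk above (hypothetical file names).
import tempfile
from os import makedirs, path, walk

with tempfile.TemporaryDirectory() as root:
    # Build a tiny fake ./nni tree to walk over.
    nni_dir = path.join(root, 'nni')
    makedirs(path.join(nni_dir, 'common'))
    for name in ['main.js', path.join('common', 'utils.js')]:
        open(path.join(nni_dir, name), 'w').close()

    # Same logic as setup.py: one (target_dir, [files]) entry per directory.
    data_files = []
    for (dirpath, dirnames, filenames) in walk(nni_dir):
        files = [path.normpath(path.join(dirpath, filename)) for filename in filenames]
        data_files.append((path.normpath(dirpath), files))

print(len(data_files))  # 2: the nni/ root and nni/common/
```

Each entry maps a target install directory to the list of files it should receive, which is the shape `setuptools.setup(data_files=...)` expects.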
@@ -69,7 +74,8 @@ setuptools.setup(
         'json_tricks',
         'numpy',
         'scipy',
-        'coverage'
+        'coverage',
+        'colorama'
     ],
     classifiers = [
         'Programming Language :: Python :: 3',
......
param([bool]$test=$true)
python -m pip install --user --upgrade twine
if($test){
    python -m twine upload --repository-url https://test.pypi.org/legacy/ dist/*
}
else{
    python -m twine upload dist/*
}
\ No newline at end of file
@@ -394,10 +394,10 @@ machineList:

* __localConfig__

-  __localConfig__ is applicable only if __trainingServicePlatform__ is set to ```local```, otherwise there should not be __localConfig__ section in configuration file.
+  __localConfig__ is applicable only if __trainingServicePlatform__ is set to `local`; otherwise there should not be a __localConfig__ section in the configuration file.

* __gpuIndices__

-  __gpuIndices__ is used to specify designated GPU devices for NNI, if it is set, only the specified GPU devices are used for NNI trial jobs. Single or multiple GPU indices can be specified, multiple GPU indices are seperated by comma(,), such as ```1``` or ```0,1,3```.
+  __gpuIndices__ is used to specify designated GPU devices for NNI. If it is set, only the specified GPU devices are used for NNI trial jobs. Single or multiple GPU indices can be specified; multiple GPU indices are separated by commas, such as `1` or `0,1,3`.

* __machineList__

@@ -431,7 +431,7 @@ machineList:

* __gpuIndices__

-  __gpuIndices__ is used to specify designated GPU devices for NNI on this remote machine, if it is set, only the specified GPU devices are used for NNI trial jobs. Single or multiple GPU indices can be specified, multiple GPU indices are seperated by comma(,), such as ```1``` or ```0,1,3```.
+  __gpuIndices__ is used to specify designated GPU devices for NNI on this remote machine. If it is set, only the specified GPU devices are used for NNI trial jobs. Single or multiple GPU indices can be specified; multiple GPU indices are separated by commas, such as `1` or `0,1,3`.

* __kubeflowConfig__:
......
# Installation of NNI

-Currently we only support installation on Linux & Mac.
+Currently we support installation on Linux, Mac and Windows.

-## **Installation**
+## **Installation on Linux & Mac**

* __Install NNI through pip__

@@ -13,7 +13,7 @@ Currently we only support installation on Linux & Mac.

* __Install NNI through source code__

-Prerequisite: `python >=3.5, git, wget`
+Prerequisite: `python >= 3.5`, `git`, `wget`

```bash
git clone -b v0.6 https://github.com/Microsoft/nni.git
cd nni
```

@@ -24,6 +24,29 @@ Currently we only support installation on Linux & Mac.

You can also install NNI in a docker image. Please follow the instructions [here](https://github.com/Microsoft/nni/tree/master/deployment/docker/README.md) to build NNI docker image. The NNI docker image can also be retrieved from Docker Hub through the command `docker pull msranni/nni:latest`.
## **Installation on Windows**
* __Install NNI through pip__
Prerequisite: `python >= 3.5`
```bash
python -m pip install --upgrade nni
```
* __Install NNI through source code__
Prerequisite: `python >=3.5`, `git`, `powershell`
When you use PowerShell to run a script for the first time, you need to run PowerShell as Administrator with this command:
```bash
Set-ExecutionPolicy -ExecutionPolicy Unrestricted
```
Then you can install NNI as administrator or as the current user as follows:
```bash
git clone https://github.com/Microsoft/nni.git
cd nni
powershell ./install.ps1
```
## **System requirements**

Below are the minimum system requirements for NNI on Linux. Due to potential programming changes, the minimum system requirements for NNI may change over time.

@@ -50,6 +73,18 @@ Below are the minimum system requirements for NNI on macOS. Due to potential programming changes, the minimum system requirements for NNI may change over time.
|**Internet**|Broadband internet connection|
|**Resolution**|1024 x 768 minimum display resolution|
Below are the minimum system requirements for NNI on Windows. Due to potential programming changes, the minimum system requirements for NNI may change over time.

||Minimum Requirements|Recommended Specifications|
|---|---|---|
|**Operating System**|Windows 10|Windows 10|
|**CPU**|Intel® Core™ i3 or AMD Phenom™ X3 8650|Intel® Core™ i5 or AMD Phenom™ II X3 or better|
|**GPU**|NVIDIA® GeForce® GTX 460|NVIDIA® GeForce® GTX 660 or better|
|**Memory**|4 GB RAM|6 GB RAM|
|**Storage**|30 GB available hard drive space|30 GB available hard drive space|
|**Internet**|Broadband internet connection|Broadband internet connection|
|**Resolution**|1024 x 768 minimum display resolution|1024 x 768 minimum display resolution|
## Further reading

* [Overview](Overview.md)
......
@@ -287,35 +287,17 @@ Debug mode will disable version check function in Trialkeeper.

|Name, shorthand|Required|Default|Description|
|------|------|------|------|
-|id| False| |ID of the trial to be killed|
-|--experiment, -E| True| |Experiment id of the trial|
+|id| False| |Experiment ID of the trial|
+|--trial_id, -T| True| |ID of the trial you want to kill|

* Example

> kill trial job

```bash
-nnictl trial [trial_id] --vexperiment [experiment_id]
+nnictl trial kill [experiment_id] --trial_id [trial_id]
```
* __nnictl trial export__
* Description
You can use this command to export reward & hyper-parameter of trial jobs to a csv file.
* Usage
```bash
nnictl trial export [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment |
|--file| True| |File path of the output csv file |
<a name="top"></a>

![](https://placehold.it/15/1589F0/000000?text=+) `nnictl top`

@@ -388,6 +370,92 @@ Debug mode will disable version check function in Trialkeeper.
nnictl experiment list
```
<a name="export"></a>
* __nnictl experiment export__
* Description
You can use this command to export reward & hyper-parameter of trial jobs to a csv or json file.
* Usage
```bash
nnictl experiment export [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment |
|--file| True| |File path of the output file |
|--type| True| |Type of output file, only supports "csv" and "json"|
* Examples
> export all trial data in an experiment as json format
```bash
nnictl experiment export [experiment_id] --file [file_path] --type json
```
* __nnictl experiment import__
* Description
You can use this command to import several prior or supplementary trial hyperparameters & results for NNI hyperparameter tuning. The data are fed to the tuning algorithm (e.g., tuner or advisor).
* Usage
```bash
nnictl experiment import [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------|------|
|id| False| |The id of the experiment you want to import data into|
|--file, -f| True| |a file with data you want to import in json format|
* Details
NNI allows users to import their own data; please express the data in the correct format. An example is shown below:
```json
[
{"parameter": {"x": 0.5, "y": 0.9}, "value": 0.03},
{"parameter": {"x": 0.4, "y": 0.8}, "value": 0.05},
{"parameter": {"x": 0.3, "y": 0.7}, "value": 0.04}
]
```
Every element in the top-level list is a sample. For our built-in tuners/advisors, each sample should have at least two keys: `parameter` and `value`. The `parameter` must match this experiment's search space; that is, all the keys (or hyperparameters) in `parameter` must match the keys in the search space. Otherwise, the tuner/advisor may have unpredictable behavior. `value` should follow the same rule as the input of `nni.report_final_result`, that is, either a number or a dict with a key named `default`. For your customized tuner/advisor, the file could have any json content depending on how you implement the corresponding methods (e.g., `import_data`).
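As a quick sanity check, the format rules above can be expressed in a few lines of Python; `validate_import_data` is a hypothetical helper written for illustration, not part of nni:

```python
# Hypothetical validator for the import-data format described above:
# each sample needs a 'parameter' dict matching the search space keys,
# and a 'value' that is either a number or a dict with a 'default' key.
def validate_import_data(samples, search_space_keys):
    for sample in samples:
        assert set(sample['parameter'].keys()) == set(search_space_keys), \
            'parameter keys must match the search space'
        value = sample['value']
        assert isinstance(value, (int, float)) or \
            (isinstance(value, dict) and 'default' in value), \
            "value must be a number or a dict with a 'default' key"
    return True

samples = [
    {"parameter": {"x": 0.5, "y": 0.9}, "value": 0.03},
    {"parameter": {"x": 0.4, "y": 0.8}, "value": 0.05},
]
print(validate_import_data(samples, ['x', 'y']))  # True
```

Note that advisors such as BOHB may expect an extra key like "TRIAL_BUDGET" in `parameter` (see below), so the exact key set depends on the tuner/advisor in use.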
You also can use [nnictl experiment export](#export) to export a valid json file including previous experiment trial hyperparameters and results.
Currently, the following tuners and advisors support importing data:
```yml
builtinTunerName: TPE, Anneal, GridSearch, MetisTuner
builtinAdvisorName: BOHB
```
*If you want to import data to the BOHB advisor, you are suggested to add "TRIAL_BUDGET" in the parameter, as NNI does; otherwise, BOHB will use max_budget as "TRIAL_BUDGET". Here is an example:*
```json
[
{"parameter": {"x": 0.5, "y": 0.9, "TRIAL_BUDGET": 27}, "value": 0.03}
]
```
* Examples
> import data to a running experiment
```bash
nnictl experiment import [experiment_id] -f experiment_data.json
```
<a name="config"></a>

![](https://placehold.it/15/1589F0/000000?text=+) `nnictl config show`
@@ -470,8 +538,8 @@ Debug mode will disable version check function in Trialkeeper.

|Name, shorthand|Required|Default|Description|
|------|------|------|------|
-|id| False| |ID of the trial to be found the log path|
-|--experiment, -E| False| |Experiment ID of the trial, required when id is not empty.|
+|id| False| |Experiment ID of the trial|
+|--trial_id, -T| False| |ID of the trial whose log path is to be found; required when id is not empty.|
<a name="webui"></a>

![](https://placehold.it/15/1589F0/000000?text=+) `Manage webui`
@@ -498,7 +566,7 @@ Debug mode will disable version check function in Trialkeeper.

|Name, shorthand|Required|Default|Description|
|------|------|------|------|
|id| False| |ID of the experiment you want to set|
-|--trialid| False| |ID of the trial|
+|--trial_id, -T| False| |ID of the trial|
|--port| False| 6006|The port of the tensorboard process|

* Detail

@@ -507,7 +575,7 @@ Debug mode will disable version check function in Trialkeeper.
2. If you want to use tensorboard, you need to write your tensorboard log data to the environment variable [NNI_OUTPUT_DIR] path.
3. In local mode, nnictl will set --logdir=[NNI_OUTPUT_DIR] directly and start a tensorboard process.
4. In remote mode, nnictl will first create an SSH client to copy log data from the remote machine to a local temp directory, and then start a tensorboard process on your local machine. Note that nnictl copies the log data only once when you run the command; if you want to see later tensorboard results, you should execute the nnictl tensorboard command again.
-5. If there is only one trial job, you don't need to set trialid. If there are multiple trial jobs running, you should set the trialid, or you could use [nnictl tensorboard start --trialid all] to map --logdir to all trial log paths.
+5. If there is only one trial job, you don't need to set the trial id. If there are multiple trial jobs running, you should set the trial id, or you could use [nnictl tensorboard start --trial_id all] to map --logdir to all trial log paths.
* __nnictl tensorboard stop__

* Description
......
-## Create multi-phase experiment
+## What is multi-phase experiment

-Typically each trial job gets single set of configuration (e.g. hyper parameters) from tuner and do some kind of experiment, let's say train a model with that hyper parameter and reports its result to tuner. Sometimes you may want to train multiple models within one trial job to share information between models or saving system resource by creating less trial jobs, for example:
+Typically each trial job gets a single configuration (e.g., hyperparameters) from the tuner, tries this configuration, reports its result, and then exits. But sometimes a trial job may want to request multiple configurations from the tuner. We find this is a very compelling feature. For example:

-1. Train multiple models sequentially in one trial job, so that later models can leverage the weights or other information of prior models and may use different hyper parameters.
-2. Train large amount of models on limited system resource, combine multiple models together to save system resource to create large amount of trial jobs.
-3. Any other scenario that you would like to train multiple models with different hyper parameters in one trial job, be aware that if you allocate multiple GPUs to a trial job and you train multiple models concurrently within on trial job, you need to allocate GPU resource properly by your trial code.
-
-In above cases, you can leverage NNI multi-phase experiment to train multiple models with different hyper parameters within each trial job.
-
-Multi-phase experiments refer to experiments whose trial jobs request multiple hyper parameters from tuner and report multiple final results to NNI.
-
-To use multi-phase experiment, please follow below steps:
-
-1. Implement nni.multi_phase.MultiPhaseTuner. For example, this [ENAS tuner](https://github.com/countif/enas_nni/blob/master/nni/examples/tuners/enas/nni_controller_ptb.py) is a multi-phase Tuner which implements nni.multi_phase.MultiPhaseTuner. While implementing your MultiPhaseTuner, you may want to use the trial_job_id parameter of generate_parameters method to generate hyper parameters for each trial job.
-1. Set `multiPhase` field to `true`, and configure your tuner implemented in step 1 as customized tuner in configuration file, for example:
-```yaml
-...
-multiPhase: true
-tuner:
-  codeDir: tuners/enas
-  classFileName: nni_controller_ptb.py
-  className: ENASTuner
-  classArgs:
-    say_hello: "hello"
-...
-```
-1. Invoke nni.get_next_parameter() API for multiple times as needed in a trial, for example:
+1. Job launch takes tens of seconds on some training platforms. If a configuration takes only around a minute to finish, running only one configuration in a trial job would be very inefficient. An appealing alternative is for a trial job to request a configuration, finish it, then request another configuration and run it. In the extreme case, a trial job can run an unlimited number of configurations. If you set concurrency to, for example, 6, there would be 6 __long running__ jobs that keep trying different configurations.
+2. Some types of models have to be trained phase by phase, where the configuration of the next phase depends on the results of previous phase(s). For example, to find the best quantization for a model, the training procedure often goes as follows: the auto-quantization algorithm (i.e., the tuner in NNI) chooses a bit width (e.g., 16 bits); a trial job gets this configuration, trains the model for some epochs, and reports the result (e.g., accuracy). The algorithm receives the result and decides whether to change 16 bits to 8 bits, or back to 32 bits. This process is repeated for a configured number of times.
+
+The above cases can be supported by the same feature, i.e., multi-phase execution. To support them, a trial job should be able to request multiple configurations from the tuner. The tuner is aware of whether two configuration requests come from the same trial job or different ones. Also, in multi-phase mode a trial job can report multiple final results.
+
+Note that `nni.get_next_parameter()` and `nni.report_final_result()` should be called sequentially: __call the former, then call the latter; and repeat this pattern__. If `nni.get_next_parameter()` is called multiple times consecutively and then `nni.report_final_result()` is called once, the result is associated with the last configuration, i.e., the one retrieved from the last get_next_parameter call. There is therefore no result associated with the previous get_next_parameter calls, which may break some multi-phase algorithms.
+
+## Create multi-phase experiment
+
+### Write trial code which leverages multi-phase:
+
+__1. Update trial code__
+
+It is pretty simple to use multi-phase in trial code; an example is shown below:
```python
# ...
for i in range(5):
    # get parameter from tuner
    tuner_param = nni.get_next_parameter()
    # ...
    # report final result somewhere for the parameter retrieved above
    nni.report_final_result()
    # ...
# ...
```
__2. Modify experiment configuration__
To enable multi-phase, you should also add `multiPhase: true` to your experiment YAML configuration file. If this line is not added, `nni.get_next_parameter()` would always return the same configuration. For all the built-in tuners/advisors, you can use multi-phase in your trial code without modifying the tuner/advisor spec in the YAML configuration file.
### Write a tuner that leverages multi-phase:
Before writing a multi-phase tuner, we highly suggest you go through [Customize Tuner](https://nni.readthedocs.io/en/latest/Customize_Tuner.html). Different from writing a normal tuner, your tuner needs to inherit from `MultiPhaseTuner` (in nni.multi_phase_tuner). The key difference between `Tuner` and `MultiPhaseTuner` is that the methods in MultiPhaseTuner are aware of an additional piece of information, namely `trial_job_id`. With this information, the tuner knows which trial is requesting a configuration and which trial is reporting a result. This provides enough flexibility for your tuner to deal with different trials and different phases. For example, you may want to use the trial_job_id parameter of the generate_parameters method to generate hyperparameters for a specific trial job.
Of course, to use your multi-phase tuner, __you should add `multiPhase: true` to your experiment YAML configuration file__.
[ENAS tuner](https://github.com/countif/enas_nni/blob/master/nni/examples/tuners/enas/nni_controller_ptb.py) is an example of a multi-phase tuner.
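The pairing behavior described above (each reported result is associated with the configuration most recently requested by that trial) can be illustrated with a self-contained stub; `StubMultiPhaseTuner` is invented for illustration and is not nni's real API:

```python
# Self-contained illustration of the multi-phase call pattern.
# A stub stands in for nni's tuner-facing API; this is NOT the real nni module.
class StubMultiPhaseTuner:
    """Hands out configurations and pairs each reported result with
    the configuration most recently requested by that trial."""
    def __init__(self, configs):
        self.configs = list(configs)
        self.history = []          # (config, result) pairs
        self._pending = None       # last config handed out, awaiting a result

    def get_next_parameter(self):
        self._pending = self.configs.pop(0)
        return self._pending

    def report_final_result(self, result):
        # The result is associated with the last requested configuration.
        self.history.append((self._pending, result))

tuner = StubMultiPhaseTuner([{'lr': 0.1}, {'lr': 0.01}, {'lr': 0.001}])

# A multi-phase trial: request, evaluate, report -- and repeat.
for _ in range(3):
    params = tuner.get_next_parameter()
    result = params['lr'] * 2  # pretend evaluation
    tuner.report_final_result(result)

print(len(tuner.history))  # 3 paired (config, result) records
```

Calling `get_next_parameter` twice before reporting would overwrite `_pending`, leaving the first configuration with no associated result, which mirrors why the sequential calling pattern matters.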
The commit also updates the following images:

* docs/img/accuracy.png (26.8 KB → 26.2 KB)
* docs/img/hyperPara.png (84.7 KB → 76.4 KB)
* docs/img/trial_duration.png (25.9 KB → 24.5 KB)
* docs/img/webui-img/detail-local.png (53 KB → 50.3 KB)
* docs/img/webui-img/detail-pai.png (34.5 KB → 22.1 KB)
* docs/img/webui-img/over1.png (67.1 KB → 61.9 KB)
* docs/img/webui-img/over2.png (35.3 KB → 29.9 KB)