Commit 05913424 authored by suiguoxin

Merge branch 'master' into quniform-tuners

parents e3c8552f 1dab3118
...@@ -167,6 +167,7 @@ dev-install-python-modules:
#$(_INFO) Installing Python SDK $(_END)
mkdir -p build
ln -sf ../src/sdk/pynni/nni build/nni
ln -sf ../src/sdk/pynni/nnicli build/nnicli
ln -sf ../tools/nni_annotation build/nni_annotation
ln -sf ../tools/nni_cmd build/nni_cmd
ln -sf ../tools/nni_trial_tool build/nni_trial_tool
...
...@@ -34,6 +34,10 @@ jobs:
cd test
PATH=$HOME/.local/bin:$PATH python3 metrics_test.py
displayName: 'Trial job metrics test'
- script: |
cd test
PATH=$HOME/.local/bin:$PATH python3 cli_test.py
displayName: 'nnicli test'
- job: 'basic_test_pr_macOS'
pool:
...@@ -61,3 +65,7 @@ jobs:
cd test
PATH=$HOME/Library/Python/3.7/bin:$PATH python3 tuner_test.py
displayName: 'Built-in tuners / assessors tests'
- script: |
cd test
PATH=$HOME/Library/Python/3.7/bin:$PATH python3 cli_test.py
displayName: 'nnicli test'
...@@ -38,6 +38,7 @@ build:
cd $(CWD)../../src/webui && $(NNI_YARN) && $(NNI_YARN) build
rm -rf $(CWD)nni
cp -r $(CWD)../../src/nni_manager/dist $(CWD)nni
cp -r $(CWD)../../src/nni_manager/config $(CWD)nni
cp -r $(CWD)../../src/webui/build $(CWD)nni/static
cp $(CWD)../../src/nni_manager/package.json $(CWD)nni
sed -ie 's/$(NNI_VERSION_TEMPLATE)/$(NNI_VERSION_VALUE)/' $(CWD)nni/package.json
...
...@@ -53,13 +53,16 @@ setuptools.setup(
long_description_content_type = 'text/markdown',
license = 'MIT',
url = 'https://github.com/Microsoft/nni',
packages = setuptools.find_packages('../../tools') \
+ setuptools.find_packages('../../src/sdk/pynni', exclude=['tests']) \
+ setuptools.find_packages('../../src/sdk/pycli'),
package_dir = {
'nni_annotation': '../../tools/nni_annotation',
'nni_cmd': '../../tools/nni_cmd',
'nni_trial_tool': '../../tools/nni_trial_tool',
'nni_gpu_tool': '../../tools/nni_gpu_tool',
'nni': '../../src/sdk/pynni/nni',
'nnicli': '../../src/sdk/pycli/nnicli'
},
package_data = {'nni': ['**/requirements.txt']},
python_requires = '>=3.5',
...
# Parallelizing a Sequential Algorithm: TPE
TPE approaches were actually run asynchronously in order to make use of multiple compute nodes and to avoid wasting time waiting for trial evaluations to complete. For the TPE approach, the so-called constant liar approach was used: each time a candidate point x∗ was proposed, a temporary fake fitness value y was assigned to it, until the evaluation completed and reported the actual loss f(x∗).
## Introduction and Problems
### Sequential Model-based Global Optimization
Sequential Model-Based Global Optimization (SMBO) algorithms have been used in many applications where evaluation of the fitness function is expensive. In an application where the true fitness function f: X → R is costly to evaluate, model-based algorithms approximate f with a surrogate that is cheaper to evaluate. Typically the inner loop of an SMBO algorithm is the numerical optimization of this surrogate, or of some transformation of it. The point x∗ that maximizes the surrogate (or its transformation) becomes the proposal for where the true function f should be evaluated. This active-learning-like algorithm template is summarized in the figure below. SMBO algorithms differ in the criterion they optimize to obtain x∗ given a model (or surrogate) of f, and in how they model f from the observation history H.
![](../../img/parallel_tpe_search4.PNG)
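To make the template concrete, below is a minimal Python sketch of the generic SMBO loop; `make_surrogate`, `maximize_criterion`, and the initial design are placeholders for whatever surrogate and acquisition criterion an implementation chooses, not NNI APIs.

```python
def smbo(f, make_surrogate, maximize_criterion, init_points, budget):
    """Generic SMBO loop: fit a surrogate to the history H, maximize the
    acquisition criterion to propose x*, evaluate the expensive f, repeat."""
    history = [(x, f(x)) for x in init_points]       # H: observed (x, y) pairs
    for _ in range(budget):
        model = make_surrogate(history)              # cheap approximation of f
        x_star = maximize_criterion(model)           # inner-loop optimization
        history.append((x_star, f(x_star)))          # expensive true evaluation
    return min(history, key=lambda pair: pair[1])    # best observed (x, y)
```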
The algorithms in this work optimize the criterion of Expected Improvement (EI). Other criteria have been suggested, such as Probability of Improvement, minimizing the Conditional Entropy of the Minimizer, and bandit-based criteria. We chose the EI criterion for TPE because it is intuitive and has been shown to work well in a variety of settings. Expected improvement is the expectation, under some model M of f: X → R, that f(x) will exceed (negatively) some threshold y∗:
![](../../img/parallel_tpe_search_ei.PNG)
Since the direct calculation of p(y|x) is expensive, the TPE approach instead models p(y|x) through p(x|y) and p(y). TPE defines p(x|y) using two densities:
![](../../img/parallel_tpe_search_tpe.PNG)
where l(x) is the density formed from the observations {x(i)} whose corresponding loss f(x(i)) was less than y∗, and g(x) is the density formed from the remaining observations. The TPE algorithm depends on a y∗ that is larger than the best observed f(x), so that some points can be used to form l(x). It chooses y∗ to be some quantile γ of the observed y values, so that p(y < y∗) = γ, but no specific model for p(y) is necessary. The tree-structured form of l and g makes it easy to draw many candidates according to l and evaluate them according to g(x)/l(x). On each iteration, the algorithm returns the candidate x∗ with the greatest EI.
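As an illustration (not NNI's internal implementation), the following sketch approximates one TPE iteration with kernel density estimators from scikit-learn; the quantile `gamma`, the bandwidth, and the candidate count are illustrative choices.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def tpe_propose(observed_x, observed_y, gamma=0.25, n_candidates=100):
    """Split the history at the gamma-quantile y*, fit l(x) and g(x),
    and return the candidate maximizing l(x)/g(x), i.e. the greatest EI."""
    observed_x = np.asarray(observed_x)          # shape (n, d)
    observed_y = np.asarray(observed_y)          # shape (n,)
    y_star = np.quantile(observed_y, gamma)      # threshold so p(y < y*) = gamma
    l = KernelDensity(bandwidth=0.5).fit(observed_x[observed_y < y_star])
    g = KernelDensity(bandwidth=0.5).fit(observed_x[observed_y >= y_star])
    candidates = l.sample(n_candidates, random_state=0)  # draw candidates from l
    scores = l.score_samples(candidates) - g.score_samples(candidates)
    return candidates[np.argmax(scores)]         # argmax of log l(x) - log g(x)
```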
Here is a simulation of the TPE algorithm in a two-dimensional search space, where the background color encodes the objective value. It can be seen that TPE balances exploration and exploitation well. (Black indicates the points sampled in the current round; yellow indicates points sampled in earlier rounds.)
![](../../img/parallel_tpe_search1.gif)
**Since EI is a deterministic function of x given the model state, the point that maximizes EI is fixed for a given state.** As shown in the figure below, the blue triangle is the point most likely to be sampled in this state.
![](../../img/parallel_tpe_search_ei2.PNG)
TPE performs well when used sequentially, but if we run it with large concurrency, **a large number of points are produced from the same EI state**; such overly concentrated points reduce the exploration ability of the tuner and waste resources.
Here is the simulation with `concurrency=60`; the phenomenon is clearly visible.
![](../../img/parallel_tpe_search2.gif)
## Research solution
### Approximated q-EI Maximization
The multi-point criterion presented below can potentially be used to deliver an additional design of experiments in one step, through the resolution of the following optimization problem:
![](../../img/parallel_tpe_search_qEI.PNG)
However, the computation of q-EI becomes intensive as q increases. Our survey found four popular greedy strategies that approximate the solution of this problem while avoiding its numerical cost.
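To see why, note that q-EI is an expectation over the joint posterior of the q responses; a straightforward route is Monte-Carlo estimation, as in this sketch (the posterior mean vector and covariance matrix over the batch are assumed to come from the surrogate; the numbers below are toy inputs).

```python
import numpy as np

def q_ei_monte_carlo(mu, cov, y_best, n_samples=10000):
    """Estimate q-EI = E[max(y_best - min_i Y_i, 0)] for a batch of q points,
    assuming the surrogate posterior over their responses is Y ~ N(mu, cov)."""
    rng = np.random.default_rng(0)
    samples = rng.multivariate_normal(mu, cov, size=n_samples)  # (n_samples, q)
    improvement = np.maximum(y_best - samples.min(axis=1), 0.0)
    return improvement.mean()

# Two correlated candidates near the current best value 1.0 (toy numbers).
print(q_ei_monte_carlo(mu=[1.2, 1.1], cov=[[0.4, 0.1], [0.1, 0.3]], y_best=1.0))
```

The number of samples needed for a stable estimate grows with q, which is the cost the greedy strategies below sidestep.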
#### Solution 1: Believing the OK Predictor: the Kriging Believer (KB) Heuristic Strategy
The Kriging Believer strategy replaces the conditional knowledge about the responses at the sites chosen in the last iterations by deterministic values equal to the expectation of the Kriging predictor. Keeping the same notation as before, the strategy can be summed up as follows:
![](../../img/parallel_tpe_search_kb.PNG)
This sequential strategy delivers a q-point design and is computationally affordable since it relies on the analytically known EI, optimized in d dimensions. However, there is a risk of failure: believing an OK predictor that overshoots the observed data may lead to a sequence that stays trapped in a non-optimal region for many iterations. We now propose a second strategy that reduces this risk.
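A minimal sketch of the KB loop, assuming placeholder helpers `fit_model` (returning a model with a `predict` method giving the Kriging mean) and `ei_argmax` (maximizing EI under a model); neither is an NNI API.

```python
def kriging_believer(history, q, fit_model, ei_argmax):
    """Propose a q-point batch: after each EI maximization, 'believe' the
    model's own mean prediction as if it were the observed response."""
    fake_history = list(history)
    batch = []
    for _ in range(q):
        model = fit_model(fake_history)
        x_next = ei_argmax(model)
        batch.append(x_next)
        fake_history.append((x_next, model.predict(x_next)))  # believed value
    return batch
```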
#### Solution 2: The Constant Liar (CL) Heuristic Strategy
Let us now consider a sequential strategy in which the metamodel is updated (still without hyperparameter re-estimation) at each iteration with a value L fixed exogenously by the user, here called a "lie". The strategy referred to as the Constant Liar consists of lying with the same value L at every iteration: maximize EI (i.e., find x_{n+1}), update the model as if y(x_{n+1}) = L, and so on, always with the same L ∈ R:
![](../../img/parallel_tpe_search_cl.PNG)
L should logically be determined on the basis of the values taken by y at X. Three values, min{Y}, mean{Y}, and max{Y}, are considered here. **The larger L is, the more explorative the algorithm will be, and vice versa.**
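Under the same placeholder helpers as the KB sketch above, the Constant Liar loop differs in a single line: the stand-in response is a fixed lie L instead of the model's own prediction.

```python
def constant_liar(history, q, fit_model, ei_argmax, liar_type="min"):
    """Propose a q-point batch, updating the model after each proposal with a
    constant lie L chosen from the observed responses Y."""
    ys = [y for _, y in history]
    lie = {"min": min(ys), "max": max(ys), "mean": sum(ys) / len(ys)}[liar_type]
    fake_history = list(history)
    batch = []
    for _ in range(q):
        model = fit_model(fake_history)
        x_next = ei_argmax(model)
        batch.append(x_next)
        fake_history.append((x_next, lie))  # always the same lie L
    return batch
```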
We simulated the strategy above. The following figure shows the result of using the mean-value liar to maximize q-EI. The sampled points are now noticeably more scattered.
![](../../img/parallel_tpe_search3.gif)
## Experiment
### Branin-Hoo
The optimization strategies presented in the last section are now compared on the Branin-Hoo function, a classical test case in global optimization.
![](../../img/parallel_tpe_search_branin.PNG)
The recommended values of a, b, c, r, s and t are: a = 1, b = 5.1/(4π²), c = 5/π, r = 6, s = 10 and t = 1/(8π). This function has three global minimizers, approximately (-3.14, 12.27), (3.14, 2.27), and (9.42, 2.47).
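For reference, here is the Branin-Hoo function with the recommended constants written out in Python:

```python
import numpy as np

def branin(x1, x2):
    """Branin-Hoo test function; all three global minima evaluate to ~0.3979."""
    a, b, c = 1.0, 5.1 / (4 * np.pi ** 2), 5.0 / np.pi
    r, s, t = 6.0, 10.0, 1.0 / (8 * np.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1 - t) * np.cos(x1) + s

print(branin(np.pi, 2.275))  # ~0.397887, one of the three global minimizers
```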
Next is a comparison of the q-EI associated with the first q points (q ∈ [1,10]) given by the constant liar strategies (min and max), by 2000 q-point designs uniformly drawn for every q, and by 2000 q-point LHS designs taken at random for every q.
![](../../img/parallel_tpe_search_result.PNG)
As can be seen in the figure, CL[max] and CL[min] offer very good q-EI results compared to random designs, especially for small values of q.
### Gaussian Mixture Model function
We also compared the cases of using and not using parallel optimization. A two-dimensional multimodal Gaussian mixture distribution is used for the simulation; the following are our results:
||concurrency=80|concurrency=60|concurrency=40|concurrency=20|concurrency=10|
|---|---|---|---|---|---|
|Without parallel optimization|avg = 0.4841 <br> var = 0.1953|avg = 0.5155 <br> var = 0.2219|avg = 0.5773 <br> var = 0.2570|avg = 0.4680 <br> var = 0.1994| avg = 0.2774 <br> var = 0.1217|
|With parallel optimization|avg = 0.2132 <br> var = 0.0700|avg = 0.2177<br>var = 0.0796| avg = 0.1835 <br> var = 0.0533|avg = 0.1671 <br> var = 0.0413|avg = 0.1918 <br> var = 0.0697|
Note: The total number of samples per test is 240 (to ensure equal budgets). Each configuration was repeated 1000 times; the reported values are the average and variance of the best result over those 1000 runs.
## References
[1] James Bergstra, Rémi Bardenet, Yoshua Bengio, Balázs Kégl. "Algorithms for Hyper-Parameter Optimization". [Link](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf)
[2] Meng-Hiot Lim, Yew-Soon Ong. "Computational Intelligence in Expensive Optimization Problems". [Link](https://link.springer.com/content/pdf/10.1007%2F978-3-642-10701-6.pdf)
[3] Christopher M. Bishop. "Pattern Recognition and Machine Learning". Springer, 2006. [Link](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf)
...@@ -10,3 +10,4 @@ In addition to the official tutorials and examples, we encourage community contri
NNI in Recommenders <RecommendersSvd>
Neural Architecture Search Comparison <NasComparision>
Hyper-parameter Tuning Algorithm Comparison <HpoComparision>
Parallelizing Optimization for TPE <ParallelizingTpeSearch>
...@@ -198,6 +198,8 @@ Trial configuration in kubeflow mode has the following configuration keys:
* image
* Required key. In kubeflow mode, your trial program will be scheduled by Kubernetes to run in a [Pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/). This key is used to specify the Docker image used to create the pod in which your trial program will run.
* We have already built a Docker image [msranni/nni](https://hub.docker.com/r/msranni/nni/) on [Docker Hub](https://hub.docker.com/). It contains NNI python packages, Node modules and javascript artifact files required to start an experiment, and all of NNI's dependencies. The Dockerfile used to build this image can be found [here](https://github.com/Microsoft/nni/tree/master/deployment/docker/Dockerfile). You can either use this image directly in your config file, or build your own image based on it.
* privateRegistryAuthPath
* Optional field. Specifies the path of a `config.json` file that holds an authorization token for a Docker registry, used to pull images from a private registry. [Reference](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/).
* apiVersion
* Required key. The API version of your Kubeflow.
* ps (optional). This config section is used to configure the TensorFlow parameter server role.
...
...@@ -32,8 +32,6 @@ trial:
cpuNum: 1
memoryMB: 8196
image: msranni/nni:latest
dataDir: hdfs://10.1.1.1:9000/nni
outputDir: hdfs://10.1.1.1:9000/nni
# Configuration to access OpenPAI Cluster
paiConfig:
userName: your_pai_nni_user
...@@ -51,14 +49,12 @@ Compared with [LocalMode](LocalMode.md) and [RemoteMachineMode](RemoteMachineMod
* image
* Required key. In pai mode, your trial program will be scheduled by OpenPAI to run in a [Docker container](https://www.docker.com/). This key is used to specify the Docker image used to create the container in which your trial will run.
* We have already built a Docker image [msranni/nni](https://hub.docker.com/r/msranni/nni/) on [Docker Hub](https://hub.docker.com/). It contains NNI python packages, Node modules and javascript artifact files required to start an experiment, and all of NNI's dependencies. The Dockerfile used to build this image can be found [here](https://github.com/Microsoft/nni/tree/master/deployment/docker/Dockerfile). You can either use this image directly in your config file, or build your own image based on it.
* dataDir
* Optional key. It specifies the HDFS data directory for the trial to download data. The format should be something like hdfs://{your HDFS host}:9000/{your data directory}
* outputDir
* Optional key. It specifies the HDFS output directory for the trial. Once the trial is completed (whether it succeeds or fails), the trial's stdout and stderr will be copied to this directory by the NNI SDK automatically. The format should be something like hdfs://{your HDFS host}:9000/{your output directory}
* virtualCluster
* Optional key. Set the virtualCluster of OpenPAI. If omitted, the job will run on the default virtual cluster.
* shmMB
* Optional key. Set the shmMB configuration of OpenPAI, which sets the shared memory size for one task in the task role.
* authFile
* Optional key. Set the auth file path for a private registry when using pai mode. [Reference](https://github.com/microsoft/pai/blob/2ea69b45faa018662bc164ed7733f6fdbb4c42b3/docs/faq.md#q-how-to-use-private-docker-registry-job-image-when-submitting-an-openpai-job).
Once the NNI experiment config file is complete, save it (for example, as exp_pai.yml) and run the following command
```
...@@ -80,9 +76,10 @@ And you will be redirected to HDFS web portal to browse the output files of that
You can see there are three files in the output folder: stderr, stdout, and trial.log
## data management
If your training data is not too large, it can be put into codeDir and NNI will upload it to HDFS, or you can build your own Docker image containing the data. If you have a large dataset, it is not appropriate to put the data in codeDir; you can instead follow the [guidance](https://github.com/microsoft/pai/blob/master/docs/user/storage.md) to mount the data folder in the container.
If you also want to save the trial's other output into HDFS, like model files, you can use the environment variable `NNI_OUTPUT_DIR` in your trial code to save your own output files, and the NNI SDK will copy all the files in `NNI_OUTPUT_DIR` from the trial's container to HDFS; the target path is `hdfs://host:port/{username}/nni/{experiments}/{experimentId}/trials/{trialId}/nnioutput`
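For example, a trial could save a model file like this (a minimal sketch; the filename is hypothetical, and the fallback to the current directory is only for running the script outside NNI):

```python
import os

# NNI sets NNI_OUTPUT_DIR in the trial's environment; files written there
# are copied from the trial's container to HDFS by the NNI SDK.
output_dir = os.environ.get('NNI_OUTPUT_DIR', '.')
model_path = os.path.join(output_dir, 'model.bin')   # hypothetical filename
with open(model_path, 'wb') as f:
    f.write(b'serialized model bytes go here')       # write your real model here
```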
## version check
NNI supports a version check feature since version 0.6. It is a policy to ensure that the version of NNIManager is consistent with trialKeeper, and to avoid errors caused by version incompatibility.
...@@ -92,4 +89,4 @@ Check policy:
3. Note that the version check feature only checks the first two digits of the version. For example, NNIManager v0.6.1 could use trialKeeper v0.6 or trialKeeper v0.6.2, but could not use trialKeeper v0.5.1 or trialKeeper v0.7.
If you cannot run your experiment and want to know whether it is caused by the version check, you can check your WebUI, where an error message about the version check will be shown.
![](../../img/version_check.png)
\ No newline at end of file
...@@ -14,7 +14,7 @@ Currently we support the following algorithms:
|[__Naïve Evolution__](#Evolution)|Naïve Evolution comes from Large-Scale Evolution of Image Classifiers. It randomly initializes a population based on the search space. For each generation, it chooses the better ones and does some mutation (e.g., change a hyperparameter, add/remove one layer) on them to get the next generation. Naïve Evolution requires many trials to work, but it's very simple and easy to extend with new features. [Reference paper](https://arxiv.org/pdf/1703.01041.pdf)|
|[__SMAC__](#SMAC)|SMAC is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO, in order to handle categorical parameters. The SMAC supported by NNI is a wrapper on the SMAC3 GitHub repo. Note that SMAC needs to be installed by the `nnictl package` command. [Reference Paper,](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) [GitHub Repo](https://github.com/automl/SMAC3)|
|[__Batch tuner__](#Batch)|Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in the search space spec.|
|[__Grid Search__](#GridSearch)|Grid Search performs an exhaustive search through a manually specified subset of the hyperparameter space defined in the search space file. Note that the only acceptable types of search space are choice, quniform, randint.|
|[__Hyperband__](#Hyperband)|Hyperband tries to use limited resources to explore as many configurations as possible, and finds out the promising ones to get the final result. The basic idea is to generate many configurations, run them with a small trial budget to find the promising ones, and then further train those promising candidates to select several even more promising ones. [Reference Paper](https://arxiv.org/pdf/1603.06560.pdf)|
|[__Network Morphism__](#NetworkMorphism)|Network Morphism provides functions to automatically search for the architecture of deep learning models. Every child network inherits the knowledge from its parent network and morphs into diverse types of networks, including changes of depth, width, and skip-connections. Next, it estimates the value of a child network using the historic architecture and metric pairs. Then it selects the most promising one to train. [Reference Paper](https://arxiv.org/abs/1806.10282)|
|[__Metis Tuner__](#MetisTuner)|Metis offers the following benefits when it comes to tuning parameters: While most tools only predict the optimal configuration, Metis gives you two outputs: (a) current prediction of the optimal configuration, and (b) suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter. [Reference Paper](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/)|
...@@ -42,6 +42,8 @@ TPE, as a black-box optimization, can be used in various scenarios and shows goo
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
Note: We have optimized the parallelism of TPE for large-scale trial concurrency. For the principle of the optimization and how to enable it, please refer to the [TPE document](HyperoptTuner.md).
**Usage example:**
```yaml
...@@ -211,7 +213,7 @@ The search space file including the high-level key `combine_params`. The type of
**Suggested scenario**
Note that the only acceptable types of search space are `choice`, `quniform`, `randint`.
It is suggested when the search space is small and it is feasible to exhaustively sweep the whole search space. [Detailed Description](./GridsearchTuner.md)
...
...@@ -5,10 +5,33 @@ TPE, Random Search, Anneal Tuners on NNI
The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model. The TPE approach models P(x|y) and P(y), where x represents hyperparameters and y the associated evaluation metric. P(x|y) is modeled by transforming the generative process of hyperparameters, replacing the distributions of the configuration prior with non-parametric densities. This optimization approach is described in detail in [Algorithms for Hyper-Parameter Optimization](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf).
### Parallel TPE optimization
TPE approaches are actually run asynchronously in order to make use of multiple compute nodes and to avoid wasting time waiting for trial evaluations to complete. However, the algorithm was originally designed for sequential optimization, so when TPE is used with large concurrency its performance degrades. We have mitigated this using the Constant Liar algorithm. For the principle of the optimization, please refer to our [research blog](../CommunitySharings/ParallelizingTpeSearch.md).
### Usage
To use TPE, you should add the following spec in your experiment's YAML config file:
```yaml
tuner:
builtinTunerName: TPE
classArgs:
optimize_mode: maximize
parallel_optimize: True
constant_liar_type: min
```
**Requirement of classArg**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
* **parallel_optimize** (*bool, optional, default = False*) - If True, TPE will use the Constant Liar algorithm to optimize parallel hyperparameter tuning. Otherwise, TPE will not discriminate between sequential and parallel situations.
* **constant_liar_type** (*min or max or mean, optional, default = min*) - The type of constant liar to use. The lie is determined from the values taken by y at X, corresponding to min{Y}, max{Y}, and mean{Y} respectively (see the sketch below).
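Concretely, the three liar types map to a single constant derived from the metrics observed so far, roughly as in this sketch (illustrative, not the tuner's internal code):

```python
import statistics

def constant_lie(observed_metrics, constant_liar_type="min"):
    """Pick the lie from the metrics observed so far: min{Y}, max{Y}, or mean{Y}."""
    return {
        "min": min(observed_metrics),
        "max": max(observed_metrics),
        "mean": statistics.mean(observed_metrics),
    }[constant_liar_type]
```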
## Random Search
[Random Search for Hyper-Parameter Optimization](http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf) shows that Random Search can be surprisingly simple and effective. We suggest using Random Search as a baseline when there is no knowledge about the prior distribution of hyper-parameters.
## Anneal
This simple annealing algorithm begins by sampling from the prior, but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
\ No newline at end of file
...@@ -266,7 +266,7 @@ machineList:
* __gpuNum__
__gpuNum__ specifies the number of GPUs used to run the tuner process. The value of this field should be a positive number. If the field is not set, NNI will not set `CUDA_VISIBLE_DEVICES` in the script (that is, it will not control the visibility of GPUs for the trial command through `CUDA_VISIBLE_DEVICES`) and will not manage GPU resources.
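As a quick illustration of the difference (assuming a CUDA-aware framework that honors the variable): an unset `CUDA_VISIBLE_DEVICES` leaves all GPUs visible, while an empty string hides them all.

```python
import os

# Unset variable -> the process can enumerate every GPU on the machine;
# an empty string ('') would make no GPU visible at all.
visible = os.environ.get('CUDA_VISIBLE_DEVICES')
print('all GPUs visible' if visible is None else f'restricted to: {visible!r}')
```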
Note: users can only specify one way to set the tuner, for example, {tunerName, optimizationMode} or {tunerCommand, tunerCwd}, and cannot set both.
...
...@@ -114,9 +114,16 @@ Debug mode will disable version check function in Trialkeeper.
* Usage
```bash
nnictl stop [Options]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------|------|
|id| False| |The id of the experiment you want to stop|
|--port, -p| False| |Rest port of the experiment you want to stop|
* Details & Examples * Details & Examples
1. If there is no id specified, and there is an experiment running, stop the running experiment, or print error message. 1. If there is no id specified, and there is an experiment running, stop the running experiment, or print error message.
...@@ -131,15 +138,21 @@ Debug mode will disable version check function in Trialkeeper.
nnictl stop [experiment_id]
```
3. If a port is specified and an experiment is running on that port, the experiment will be stopped.
```bash
nnictl stop --port 8080
```
4. Users could use 'nnictl stop all' to stop all experiments.
```bash
nnictl stop all
```
5. If the id ends with *, nnictl will stop all experiments whose ids match the prefix.
6. If the id does not exist but matches the prefix of an experiment id, nnictl will stop the matched experiment.
7. If the id does not exist but matches the prefixes of multiple experiment ids, nnictl will print the matching id information.
<a name="update"></a>
...@@ -704,4 +717,4 @@ Debug mode will disable version check function in Trialkeeper.
```bash
nnictl --version
```
\ No newline at end of file
...@@ -73,6 +73,11 @@ All types of sampling strategies and their parameters are listed here:
* Which means the variable value is a value like round(exp(normal(mu, sigma)) / q) * q
* Suitable for a discrete variable with respect to which the objective is smooth and gets smoother with the size of the variable, which is bounded from one side.
* {"_type":"mutable_layer","_value":{mutable_layer_infomation}}
* Type for [Neural Architecture Search Space][1]. Value is also a dictionary, which contains key-value pairs representing respectively name and search space of each mutable_layer.
* For now, users can only use this type of search space with annotation, which means that there is no need to define a json file for search space since it will be automatically generated according to the annotation in trial code.
* For detailed usage, please refer to [General NAS Interfaces][1].
## Search Space Types Supported by Each Tuner
| | choice | randint | uniform | quniform | loguniform | qloguniform | normal | qnormal | lognormal | qlognormal |
...@@ -83,7 +88,7 @@ All types of sampling strategies and their parameters are listed here:
| Evolution Tuner | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; |
| SMAC Tuner | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | | | | | |
| Batch Tuner | &#10003; | | | | | | | | | |
| Grid Search Tuner | &#10003; | &#10003; | | &#10003; | | | | | | |
| Hyperband Advisor | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; |
| Metis Tuner | &#10003; | &#10003; | &#10003; | &#10003; | | | | | | |
| GP Tuner | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | | | | |
...@@ -91,12 +96,6 @@ All types of sampling strategies and their parameters are listed here:
Known Limitations:
* Note that in the Grid Search Tuner, for users' convenience, the definitions of `quniform` and `qloguniform` change, where q here specifies the number of values that will be sampled. Details are listed as follows:
* Type 'quniform' receives three values [low, high, q], where [low, high] specifies a range and 'q' specifies the number of values that will be sampled evenly. Note that q should be at least 2. Sampling starts at 'low', and each of the following values is (high-low)/q larger than the value before it.
* Type 'qloguniform' behaves like 'quniform' except that it first changes the range to [log(low), log(high)], samples, and then changes the sampled value back.
* Note that the Metis Tuner only supports numerical `choice` now
* Note that for nested search spaces:
...@@ -104,3 +103,5 @@ Known Limitations:
* Only the Random Search/TPE/Anneal/Evolution tuners support nested search space
* We do not support nested search space "Hyper Parameter" in visualization now; the enhancement is being considered in [#1110](https://github.com/microsoft/nni/issues/1110), and any suggestions, discussions, or contributions are warmly welcomed
[1]: ../AdvancedFeature/GeneralNasInterfaces.md
\ No newline at end of file