Unverified Commit 26b71c40 authored by J-shang's avatar J-shang Committed by GitHub

Hyperband Refactor (#3040)

parent a3108caf
......@@ -247,6 +247,7 @@ This is suggested when you have limited computational resources but have a relat
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
* **R** (*int, optional, default = 60*) - the maximum budget given to a trial (could be the number of mini-batches or epochs). Each trial should use TRIAL_BUDGET to control how long they run.
* **eta** (*int, optional, default = 3*) - `(eta-1)/eta` is the proportion of discarded trials.
* **exec_mode** (*serial or parallelism, optional, default = parallelism*) - If 'parallelism', the tuner will try to use available resources to start a new bucket immediately. If 'serial', the tuner will only start a new bucket after the current bucket is done.
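For intuition about how `R` and `eta` interact, the bracket layout Hyperband derives from them can be sketched as below. This is an illustrative helper, not NNI's internal code; the function name and return shape are made up for this sketch.

```python
import math

def hyperband_brackets(R=60, eta=3):
    """Enumerate Hyperband brackets for the given R and eta.

    Returns a list of brackets; each bracket is a list of
    (trials_kept, budget_per_trial) rounds, so each round keeps
    roughly 1/eta of the previous round's trials.
    """
    s_max = int(math.log(R, eta))  # number of halvings in the widest bracket
    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))  # initial trials
        rounds = []
        for i in range(s + 1):
            n_i = n // eta ** i            # trials kept in round i
            r_i = R // eta ** (s - i)      # TRIAL_BUDGET given to each
            rounds.append((n_i, r_i))
        brackets.append(rounds)
    return brackets

for rounds in hyperband_brackets():
    print(rounds)
```

With the defaults `R = 60`, `eta = 3` this yields four brackets, from the most exploratory (27 trials at budget 2) down to a single round of 4 trials at the full budget of 60.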
**Example Configuration:**
......
......@@ -7,7 +7,17 @@ Hyperband on NNI
## 2. Implementation with full parallelism
First, this is an example of how to write an autoML algorithm based on MsgDispatcherBase rather than on Tuner and Assessor. Hyperband is implemented this way because it integrates the functions of both Tuner and Assessor; thus, we call it an Advisor.
Second, this implementation fully leverages Hyperband's internal parallelism. Specifically, the next bucket does not wait strictly for the current bucket to finish; it starts as soon as resources are available. To use full parallelism, set `exec_mode` to `parallelism`. To follow the original algorithm instead, set `exec_mode` to `serial`; in this mode, the next bucket starts strictly after the current bucket is done.
`parallelism` mode may lead to multiple unfinished buckets, while `serial` mode has at most one unfinished bucket. The advantage of `parallelism` mode is that it makes full use of resources, which may shorten the experiment duration severalfold. The following two pictures show the results of a quick verification using [nas-bench-201](../NAS/Benchmarks.md); the upper picture is `parallelism` mode, the lower picture is `serial` mode.
![parallelism mode](../../img/hyperband_parallelism.png "parallelism mode")
![serial mode](../../img/hyperband_serial.png "serial mode")
If you want to reproduce these results, refer to the example under `examples/trials/benchmarking/` for details.
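As a back-of-the-envelope illustration of why `parallelism` mode shortens experiments (this is not NNI's actual scheduler, just simple lower bounds), the sketch below measures time in budget units for `R = 60`, `eta = 3`, and a trial concurrency of 2. The bracket table is written out by hand from those defaults.

```python
import math

# Bracket layout for R = 60, eta = 3, written out by hand:
# each inner list is one bucket, each pair is (trials, budget per trial).
BRACKETS = [
    [(27, 2), (9, 6), (3, 20), (1, 60)],
    [(12, 6), (4, 20), (1, 60)],
    [(6, 20), (2, 60)],
    [(4, 60)],
]

def serial_lower_bound(brackets, concurrency):
    # serial mode: buckets run one after another, and within a bucket
    # each round waits for the previous one to finish
    total = 0
    for rounds in brackets:
        for n, r in rounds:
            total += math.ceil(n / concurrency) * r
    return total

def parallel_lower_bound(brackets, concurrency):
    # parallelism mode can backfill idle slots with trials from the
    # next bucket; the best case spreads the total work evenly
    work = sum(n * r for rounds in brackets for n, r in rounds)
    return math.ceil(work / concurrency)

print(serial_lower_bound(BRACKETS, 2))    # 534
print(parallel_lower_bound(BRACKETS, 2))  # 460
```

Even in this crude model, serial execution wastes slots whenever a late round has fewer surviving trials than the concurrency allows, which is where `parallelism` mode recovers time.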
## 3. Usage
To use Hyperband, you should add the following spec in your experiment's YAML config file:
......@@ -23,6 +33,8 @@ advisor:
eta: 3
#choice: maximize, minimize
optimize_mode: maximize
#choice: serial, parallelism
exec_mode: parallelism
```
Note that once you use an Advisor, you are not allowed to add a Tuner or Assessor spec in the config file. If you use Hyperband, the hyperparameters (i.e., key-value pairs) received by a trial contain one extra key called `TRIAL_BUDGET`, defined by Hyperband. **By using this `TRIAL_BUDGET`, the trial can control how long it runs**.
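The TRIAL_BUDGET contract can be sketched as below. The `nni` calls are replaced by stand-ins (the fake accuracy curve and the plain return value) so the pattern is runnable in isolation; a real trial would call `nni.get_next_parameter()` to receive `params` and `nni.report_final_result()` at the end.

```python
def run_trial(params):
    budget = params.pop('TRIAL_BUDGET')  # key injected by the Hyperband advisor
    accuracy = 0.0
    for epoch in range(budget):
        # stand-in for one epoch of training on the remaining `params`
        accuracy = 1.0 - 0.5 ** (epoch + 1)
    return accuracy  # a real trial reports this via nni.report_final_result

print(run_trial({'lr': 0.01, 'TRIAL_BUDGET': 3}))  # → 0.875
```

A trial that honors `TRIAL_BUDGET` this way lets Hyperband run cheap low-budget rounds first and promote only the promising configurations to larger budgets.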
......
authorName: default
experimentName: example_mnist_hyperband
trialConcurrency: 2
maxExecDuration: 100h
maxTrialNum: 10000
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
advisor:
#choice: Hyperband, BOHB
builtinAdvisorName: Hyperband
classArgs:
#R: the maximum trial budget (could be the number of mini-batches or epochs) that can be
# allocated to a trial. Each trial should use TRIAL_BUDGET to control how long it runs.
R: 60
#eta: (eta-1)/eta is the proportion of discarded trials
eta: 3
#choice: maximize, minimize
optimize_mode: maximize
#choice: serial, parallelism
exec_mode: serial
trial:
command: python3 main.py
codeDir: .
gpuNum: 0
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""
A test for Hyperband using nasbench201, so the dependencies for nasbench201 need to be installed first.
"""
import argparse
import logging
import random
import time
import nni
from nni.utils import merge_parameter
from nni.nas.benchmarks.nasbench201 import query_nb201_trial_stats
logger = logging.getLogger('test_hyperband')
def main(args):
r = args.pop('TRIAL_BUDGET')
dataset = [t for t in query_nb201_trial_stats(args, 200, 'cifar100', include_intermediates=True)]
test_acc = random.choice(dataset)['intermediates'][r - 1]['ori_test_acc'] / 100
time.sleep(random.randint(0, 10))
nni.report_final_result(test_acc)
logger.debug('Final result is %g', test_acc)
logger.debug('Send final result done.')
def get_params():
parser = argparse.ArgumentParser(description='Hyperband Test')
parser.add_argument("--0_1", type=str, default='none')
parser.add_argument("--0_2", type=str, default='none')
parser.add_argument("--0_3", type=str, default='none')
parser.add_argument("--1_2", type=str, default='none')
parser.add_argument("--1_3", type=str, default='none')
parser.add_argument("--2_3", type=str, default='none')
parser.add_argument("--TRIAL_BUDGET", type=int, default=200)
args, _ = parser.parse_known_args()
return args
if __name__ == '__main__':
try:
# get parameters from the tuner
tuner_params = nni.get_next_parameter()
logger.debug(tuner_params)
params = vars(merge_parameter(get_params(), tuner_params))
print(params)
main(params)
except Exception as exception:
logger.exception(exception)
raise
{
"0_1": {"_type": "choice", "_value": ["none", "skip_connect", "conv_1x1", "conv_3x3", "avg_pool_3x3"]},
"0_2": {"_type": "choice", "_value": ["none", "skip_connect", "conv_1x1", "conv_3x3", "avg_pool_3x3"]},
"0_3": {"_type": "choice", "_value": ["none", "skip_connect", "conv_1x1", "conv_3x3", "avg_pool_3x3"]},
"1_2": {"_type": "choice", "_value": ["none", "skip_connect", "conv_1x1", "conv_3x3", "avg_pool_3x3"]},
"1_3": {"_type": "choice", "_value": ["none", "skip_connect", "conv_1x1", "conv_3x3", "avg_pool_3x3"]},
"2_3": {"_type": "choice", "_value": ["none", "skip_connect", "conv_1x1", "conv_3x3", "avg_pool_3x3"]}
}
......@@ -19,6 +19,8 @@ advisor:
eta: 3
#choice: maximize, minimize
optimize_mode: maximize
#choice: serial, parallelism
exec_mode: parallelism
trial:
command: python3 mnist.py
codeDir: .
......
......@@ -19,6 +19,8 @@ advisor:
eta: 3
#choice: maximize, minimize
optimize_mode: maximize
#choice: serial, parallelism
exec_mode: parallelism
trial:
command: python3 mnist.py
codeDir: .
......
......@@ -19,6 +19,8 @@ advisor:
eta: 3
#choice: maximize, minimize
optimize_mode: maximize
#choice: serial, parallelism
exec_mode: parallelism
trial:
command: python3 mnist.py
codeDir: .
......