Fix multiphase issue with gridsearch tuner and batch tuner (#1539)

* Fix multiphase issue with gridsearch tuner and batch tuner

Unverified Commit 7246593f authored Sep 16, 2019 by chicm-ms Committed by GitHub Sep 16, 2019
11 changed files
--- a/docs/en_US/AdvancedFeature/MultiPhase.md
+++ b/docs/en_US/AdvancedFeature/MultiPhase.md
@@ -8,8 +8,6 @@ Typically each trial job gets a single configuration (e.g., hyperparameters) fro

 The above cases can be supported by the same feature, i.e., multi-phase execution. To support those cases, basically a trial job should be able to request multiple configurations from tuner. Tuner is aware of whether two configuration requests are from the same trial job or different ones. Also in multi-phase a trial job can report multiple final results.

-Note that, `nni.get_next_parameter()` and `nni.report_final_result()` should be called sequentially: __call the former one, then call the later one; and repeat this pattern__. If `nni.get_next_parameter()` is called multiple times consecutively, and then `nni.report_final_result()` is called once, the result is associated to the last configuration, which is retrieved from the last get_next_parameter call. So there is no result associated to previous get_next_parameter calls, and it may cause some multi-phase algorithm broken.
-
 ## Create multi-phase experiment

 ### Write trial code which leverages multi-phase:
@@ -23,6 +21,9 @@ It is pretty simple to use multi-phase in trial code, an example is shown below:
    for i in range(5):
        # get parameter from tuner
        tuner_param = nni.get_next_parameter()
+        # nni.get_next_parameter() returns None when the tuner cannot generate any more hyper parameters.
+        if tuner_param is None:
+            break

        # consume the params
        # ...
@@ -32,6 +33,10 @@ It is pretty simple to use multi-phase in trial code, an example is shown below:
    # ...
    ```

+In multi-phase experiments, each call to `nni.get_next_parameter()` returns a new hyper parameter configuration generated by the tuner; the trial code then consumes this configuration and reports its final result. `nni.get_next_parameter()` and `nni.report_final_result()` should be called sequentially: __call the former, then call the latter, and repeat this pattern__. If `nni.get_next_parameter()` is called multiple times consecutively and `nni.report_final_result()` is then called once, the result is associated with the last configuration, i.e., the one retrieved by the last `get_next_parameter` call. No result is then associated with the earlier `get_next_parameter` calls, which may break some multi-phase algorithms.
+
+Note that `nni.get_next_parameter()` returns None if the tuner cannot generate any more hyper parameters.
+
 __2. Experiment configuration__

 To enable multi-phase, you should also add `multiPhase: true` in your experiment YAML configure file. If this line is not added, `nni.get_next_parameter()` would always return the same configuration.
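
The get/report pattern described in the documentation change above can be sketched without a running NNI experiment. This is a minimal illustration, not the NNI implementation: `make_stub_api` is a hypothetical stand-in for the two `nni` calls, the `lr` key and the metric formula are invented for the example, and only the control flow (request, check for None, report, repeat) mirrors the diff.

```python
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, float]


def run_multi_phase_trial(
    get_next_parameter: Callable[[], Optional[Params]],
    report_final_result: Callable[[float], None],
    max_phases: int = 5,
) -> int:
    """Request up to max_phases configurations and stop early on None."""
    phases_done = 0
    for _ in range(max_phases):
        params = get_next_parameter()
        if params is None:
            # The tuner has nothing left to offer, so leave the loop.
            break
        # Hypothetical objective standing in for real training.
        metric = 1.0 - params["lr"]
        # Exactly one report per configuration keeps results and
        # configurations paired one to one.
        report_final_result(metric)
        phases_done += 1
    return phases_done


def make_stub_api(
    configs: List[Params],
) -> Tuple[Callable[[], Optional[Params]], Callable[[float], None], List[float]]:
    """Hypothetical stand-in for nni.get_next_parameter / nni.report_final_result."""
    pending = list(configs)
    reported: List[float] = []

    def get_next_parameter() -> Optional[Params]:
        # Mimics an exhausted grid or batch tuner by returning None.
        return pending.pop(0) if pending else None

    def report_final_result(metric: float) -> None:
        reported.append(metric)

    return get_next_parameter, report_final_result, reported


get_param, report, reported = make_stub_api([{"lr": 0.1}, {"lr": 0.01}, {"lr": 0.001}])
phases = run_multi_phase_trial(get_param, report)
print(phases, len(reported))  # prints "3 3"
```

With three configurations available and five phases requested, the loop ends after the third report instead of consuming a missing configuration.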

--- a/src/nni_manager/core/nnimanager.ts
+++ b/src/nni_manager/core/nnimanager.ts
@@ -691,8 +691,11 @@ class NNIManager implements Manager {
                };
                this.log.info(`updateTrialJob: job id: ${tunerCommand.trial_job_id}, form: ${JSON.stringify(trialJobForm)}`);
                await this.trainingService.updateTrialJob(tunerCommand.trial_job_id, trialJobForm);
+                if (tunerCommand['parameters'] !== null) {
+                    // parameters field is null if no more hyper parameters can be generated by tuner.
                    await this.dataStore.storeTrialJobEvent(
                        'ADD_HYPERPARAMETER', tunerCommand.trial_job_id, content, undefined);
+                }
                break;
            case NO_MORE_TRIAL_JOBS:
                if (!['ERROR', 'STOPPING', 'STOPPED'].includes(this.status.status)) {

--- a/src/sdk/pynni/nni/msg_dispatcher.py
+++ b/src/sdk/pynni/nni/msg_dispatcher.py
@@ -22,6 +22,7 @@ import logging
 from collections import defaultdict
 import json_tricks

+from nni import NoMoreTrialError
 from .protocol import CommandType, send
 from .msg_dispatcher_base import MsgDispatcherBase
 from .assessor import AssessResult
@@ -144,7 +145,10 @@ class MsgDispatcher(MsgDispatcherBase):
            assert data['trial_job_id'] is not None
            assert data['parameter_index'] is not None
            param_id = _create_parameter_id()
+            try:
                param = self.tuner.generate_parameters(param_id, trial_job_id=data['trial_job_id'])
+            except NoMoreTrialError:
+                param = None
            send(CommandType.SendTrialJobParameter, _pack_parameter(param_id, param, trial_job_id=data['trial_job_id'], parameter_index=data['parameter_index']))
        else:
            raise ValueError('Data type not supported: {}'.format(data['type']))
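
The dispatcher change above can be illustrated in isolation. The sketch below rests on stated assumptions: `NoMoreTrialError` is redefined locally rather than imported from `nni`, `ExhaustibleTuner` and `handle_parameter_request` are hypothetical names, and the standard `json` module replaces `json_tricks`. It shows only how an exhausted tuner ends up as a `null` `parameters` field in the message sent on to the manager.

```python
import json
from typing import Any, Dict, List, Optional


class NoMoreTrialError(Exception):
    """Local stand-in for the exception the diff imports from nni."""


class ExhaustibleTuner:
    """Hypothetical tuner with a finite candidate list, like grid search or batch."""

    def __init__(self, candidates: List[Dict[str, Any]]) -> None:
        self._pending = list(candidates)

    def generate_parameters(
        self, parameter_id: int, trial_job_id: Optional[str] = None
    ) -> Dict[str, Any]:
        if not self._pending:
            raise NoMoreTrialError("no more parameters to generate")
        return self._pending.pop(0)


def handle_parameter_request(
    tuner: ExhaustibleTuner, parameter_id: int, trial_job_id: str, parameter_index: int
) -> str:
    """Build the reply to a trial's request for another configuration."""
    try:
        params = tuner.generate_parameters(parameter_id, trial_job_id=trial_job_id)
    except NoMoreTrialError:
        # Exhaustion is reported to the trial as None, not as a crash.
        params = None
    return json.dumps({
        "parameter_id": parameter_id,
        "parameters": params,  # None is serialized as JSON null
        "trial_job_id": trial_job_id,
        "parameter_index": parameter_index,
    })


tuner = ExhaustibleTuner([{"batch_size": 32}])
first = json.loads(handle_parameter_request(tuner, 0, "trial-a", 0))
second_raw = handle_parameter_request(tuner, 1, "trial-a", 1)
second = json.loads(second_raw)
print(first["parameters"], second["parameters"])
```

Because None serializes to `null`, a receiver can tell "no configuration" apart from a real one with a single null check, which is the kind of guard the `nnimanager.ts` hunk adds before storing an `ADD_HYPERPARAMETER` event.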

--- a/src/sdk/pynni/nni/trial.py
+++ b/src/sdk/pynni/nni/trial.py
@@ -43,7 +43,8 @@ _sequence_id = platform.get_sequence_id()


 def get_next_parameter():
-    """Returns a set of (hyper-)paremeters generated by Tuner."""
+    """Returns a set of (hyper-)parameters generated by Tuner.
+    Returns None if no more (hyper-)parameters can be generated by Tuner."""
    global _params
    _params = platform.get_next_parameter()
    if _params is None:

--- a/test/config_test/multi_phase/multi_phase.py
+++ b/test/config_test/multi_phase/multi_phase.py
@@ -4,5 +4,8 @@ import nni
 if __name__ == '__main__':
    for i in range(5):
        hyper_params = nni.get_next_parameter()
+        print('hyper_params:[{}]'.format(hyper_params))
+        if hyper_params is None:
+            break
        nni.report_final_result(0.1*i)
        time.sleep(3)
--- a/test/pipelines-it-local-windows.yml
+++ b/test/pipelines-it-local-windows.yml
@@ -18,7 +18,7 @@ jobs:
    displayName: 'generate config files'
  - script: |
      cd test
-      python config_test.py --ts local --local_gpu --exclude smac,bohb,multi_phase_batch,multi_phase_grid
+      python config_test.py --ts local --local_gpu --exclude smac,bohb
    displayName: 'Examples and advanced features tests on local machine'
  - script: |
      cd test

--- a/test/pipelines-it-local.yml
+++ b/test/pipelines-it-local.yml
@@ -31,7 +31,7 @@ jobs:
    displayName: 'Built-in tuners / assessors tests'
  - script: |
      cd test
-      PATH=$HOME/.local/bin:$PATH python3 config_test.py --ts local --local_gpu --exclude multi_phase_batch,multi_phase_grid
+      PATH=$HOME/.local/bin:$PATH python3 config_test.py --ts local --local_gpu
    displayName: 'Examples and advanced features tests on local machine'
  - script: |
      cd test

--- a/test/pipelines-it-pai-windows.yml
+++ b/test/pipelines-it-pai-windows.yml
@@ -65,5 +65,5 @@ jobs:
      python --version
      python generate_ts_config.py --ts pai --pai_host $(pai_host) --pai_user $(pai_user) --pai_pwd $(pai_pwd) --vc $(pai_virtual_cluster) --nni_docker_image $(docker_image) --data_dir $(data_dir) --output_dir $(output_dir) --nni_manager_ip $(nni_manager_ip)

-      python config_test.py --ts pai --exclude multi_phase,smac,bohb,multi_phase_batch,multi_phase_grid
+      python config_test.py --ts pai --exclude multi_phase,smac,bohb
    displayName: 'Examples and advanced features tests on pai'
\ No newline at end of file
--- a/test/pipelines-it-pai.yml
+++ b/test/pipelines-it-pai.yml
@@ -76,6 +76,6 @@ jobs:
      python3 generate_ts_config.py --ts pai --pai_host $(pai_host) --pai_user $(pai_user) --pai_pwd $(pai_pwd) --vc $(pai_virtual_cluster) \
      --nni_docker_image $TEST_IMG --data_dir $(data_dir) --output_dir $(output_dir) --nni_manager_ip $(nni_manager_ip)

-      PATH=$HOME/.local/bin:$PATH python3 config_test.py --ts pai --exclude multi_phase_batch,multi_phase_grid
+      PATH=$HOME/.local/bin:$PATH python3 config_test.py --ts pai
      PATH=$HOME/.local/bin:$PATH python3 metrics_test.py
    displayName: 'integration test'
--- a/test/pipelines-it-remote-windows.yml
+++ b/test/pipelines-it-remote-windows.yml
@@ -39,7 +39,7 @@ jobs:
      cd test
      python generate_ts_config.py --ts remote --remote_user $(docker_user) --remote_host $(remote_host) --remote_port $(Get-Content port) --remote_pwd $(docker_pwd) --nni_manager_ip $(nni_manager_ip)
      Get-Content training_service.yml
-      python config_test.py --ts remote --exclude cifar10,smac,bohb,multi_phase_batch,multi_phase_grid
+      python config_test.py --ts remote --exclude cifar10,smac,bohb
    displayName: 'integration test'
  - task: SSH@0
    inputs:

--- a/test/pipelines-it-remote.yml
+++ b/test/pipelines-it-remote.yml
@@ -53,7 +53,7 @@ jobs:
      python3 generate_ts_config.py --ts remote --remote_user $(docker_user) --remote_host $(remote_host) \
      --remote_port $(cat port) --remote_pwd $(docker_pwd) --nni_manager_ip $(nni_manager_ip)
      cat training_service.yml
-      PATH=$HOME/.local/bin:$PATH python3 config_test.py --ts remote --exclude cifar10,multi_phase_batch,multi_phase_grid
+      PATH=$HOME/.local/bin:$PATH python3 config_test.py --ts remote --exclude cifar10
      PATH=$HOME/.local/bin:$PATH python3 metrics_test.py
    displayName: 'integration test'
  - task: SSH@0