Commit fcbb0ea3 authored by Shufan Huang's avatar Shufan Huang Committed by chicm-ms

Handling import data in Tuner/Advisor (#992)

Handling import data in Tuner/Advisor
parent 26c195e9
@@ -298,24 +298,6 @@ Debug mode will disable version check function in Trialkeeper.
nnictl trial [trial_id] --experiment [experiment_id]
```
* __nnictl trial export__
* Description
You can use this command to export reward & hyper-parameter of trial jobs to a csv file.
* Usage
```bash
nnictl trial export [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment |
|--file| True| |File path of the output csv file |
<a name="top"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `nnictl top`
@@ -388,6 +370,92 @@ Debug mode will disable version check function in Trialkeeper.
nnictl experiment list
```
<a name="export"></a>
* __nnictl experiment export__
* Description
You can use this command to export the reward & hyper-parameters of trial jobs to a csv or json file.
* Usage
```bash
nnictl experiment export [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment |
|--file| True| |File path of the output file |
|--type| True| |Type of output file, only support "csv" and "json"|
* Examples
> export all trial data of an experiment in json format
```bash
nnictl experiment export [experiment_id] --file [file_path] --type json
```
* __nnictl experiment import__
* Description
You can use this command to import prior or supplementary trial hyperparameters & results for NNI hyperparameter tuning. The data are fed to the tuning algorithm (i.e., the tuner or advisor).
* Usage
```bash
nnictl experiment import [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------|------|
|id| False| |The id of the experiment you want to import data into|
|--file, -f| True| |Path of the json file containing the data to import|
* Details
NNI allows users to import their own data; it must be expressed in the correct format. An example is shown below:
```json
[
{"parameter": {"x": 0.5, "y": 0.9}, "value": 0.03},
{"parameter": {"x": 0.4, "y": 0.8}, "value": 0.05},
{"parameter": {"x": 0.3, "y": 0.7}, "value": 0.04}
]
```
Every element in the top-level list is a sample. For our built-in tuners/advisors, each sample should have at least two keys: `parameter` and `value`. The `parameter` must match this experiment's search space: all the keys (or hyperparameters) in `parameter` must match the keys in the search space; otherwise, the tuner/advisor may behave unpredictably. `value` should follow the same rule as the input of `nni.report_final_result`: either a number or a dict with a key named `default`. For your customized tuner/advisor, the file could have any json content, depending on how you implement the corresponding methods (e.g., `import_data`); a minimal sketch is given after the examples below.
You can also use [nnictl experiment export](#export) to export a valid json file containing trial hyperparameters and results from a previous experiment.
Currently, the following tuners and advisors support importing data:
```yml
builtinTunerName: TPE, Anneal, GridSearch, MetisTuner
builtinAdvisorName: BOHB
```
*If you want to import data to the BOHB advisor, you are suggested to add "TRIAL_BUDGET" in the parameter as NNI does; otherwise, BOHB will use max_budget as "TRIAL_BUDGET". Here is an example:*
```json
[
{"parameter": {"x": 0.5, "y": 0.9, "TRIAL_BUDGET": 27}, "value": 0.03}
]
```
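For a customized tuner or advisor, a minimal `import_data` might simply replay each imported sample through its normal update path. The sketch below is illustrative only: the class name is hypothetical and the other required `Tuner` methods are omitted.
```python
from nni.tuner import Tuner

class MyCustomTuner(Tuner):
    def import_data(self, data):
        # 'data' is the parsed json list: [{"parameter": ..., "value": ...}, ...]
        for i, trial_info in enumerate(data):
            params = trial_info["parameter"]
            value = trial_info["value"]
            # Feed each prior sample back through the tuner's usual update path.
            self.receive_trial_result("import_%d" % i, params, value)
```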
* Examples
> import data to a running experiment
```bash
nnictl experiment import [experiment_id] -f experiment_data.json
```
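> a typical workflow: export trial data from one experiment, then import it into a new one (the file name is illustrative)
```bash
nnictl experiment export [experiment_id] --file data.json --type json
nnictl experiment import [new_experiment_id] -f data.json
```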
<a name="config"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `nnictl config show`
@@ -94,4 +94,7 @@ class BatchTuner(Tuner):
return self.values[self.count]
def receive_trial_result(self, parameter_id, parameters, value):
pass
\ No newline at end of file
pass
def import_data(self, data):
pass
@@ -573,3 +573,35 @@ class BOHB(MsgDispatcherBase):
def handle_add_customized_trial(self, data):
pass
def handle_import_data(self, data):
"""Import additional data for tuning
Parameters
----------
data:
a list of dictionaries, each of which has at least two keys, 'parameter' and 'value'
Raises
------
AssertionError
data doesn't have the required keys 'parameter' and 'value'
"""
_completed_num = 0
for trial_info in data:
logger.info("Importing data, current processing progress %s / %s" %(_completed_num), len(data))
_completed_num += 1
assert "parameter" in trial_info
_params = trial_info["parameter"]
assert "value" in trial_info
_value = trial_info['value']
# Default "TRIAL_BUDGET" to the maximum budget when the imported sample does not carry one.
if _KEY not in _params:
_params[_KEY] = self.max_budget
logger.info("Set \"TRIAL_BUDGET\" value to %s (max budget)" % self.max_budget)
# The config generator minimizes loss, so negate the value when maximizing.
if self.optimize_mode is OptimizeMode.Maximize:
reward = -_value
else:
reward = _value
_budget = _params[_KEY]
self.cg.new_result(loss=reward, budget=_budget, parameters=_params, update_model=True)
logger.info("Successfully imported tuning data to BOHB advisor.")
@@ -34,7 +34,6 @@ from nni.tuner import Tuner
from nni.utils import extract_scalar_reward
from .. import parameter_expressions
@unique
class OptimizeMode(Enum):
"""Optimize Mode class
@@ -299,3 +298,6 @@ class EvolutionTuner(Tuner):
indiv = Individual(config=params, result=reward)
self.population.append(indiv)
def import_data(self, data):
pass
@@ -24,14 +24,17 @@ gridsearch_tuner.py including:
import copy
import numpy as np
import logging
import nni
from nni.tuner import Tuner
from nni.utils import convert_dict2tuple
TYPE = '_type'
CHOICE = 'choice'
VALUE = '_value'
logger = logging.getLogger('grid_search_AutoML')
class GridSearchTuner(Tuner):
'''
@@ -51,6 +54,7 @@ class GridSearchTuner(Tuner):
def __init__(self):
self.count = -1
self.expanded_search_space = []
self.supplement_data = dict()
def json2paramater(self, ss_spec):
'''
@@ -135,9 +139,31 @@ class GridSearchTuner(Tuner):
def generate_parameters(self, parameter_id):
self.count += 1
if self.count > len(self.expanded_search_space)-1:
raise nni.NoMoreTrialError('no more parameters now.')
return self.expanded_search_space[self.count]
while (self.count <= len(self.expanded_search_space)-1):
_params_tuple = convert_dict2tuple(self.expanded_search_space[self.count])
if _params_tuple in self.supplement_data:
self.count += 1
else:
return self.expanded_search_space[self.count]
raise nni.NoMoreTrialError('no more parameters now.')
def receive_trial_result(self, parameter_id, parameters, value):
pass
def import_data(self, data):
"""Import additional data for tuning
Parameters
----------
data:
a list of dictionaries, each of which has at least two keys, 'parameter' and 'value'
"""
_completed_num = 0
for trial_info in data:
logger.info("Importing data, current processing progress %s / %s" %(_completed_num), len(data))
_completed_num += 1
assert "parameter" in trial_info
_params = trial_info["parameter"]
_params_tuple = convert_dict2tuple(_params)
self.supplement_data[_params_tuple] = True
logger.info("Successfully import data to grid search tuner.")
@@ -419,3 +419,6 @@ class Hyperband(MsgDispatcherBase):
def handle_add_customized_trial(self, data):
pass
def handle_import_data(self, data):
pass
@@ -172,6 +172,7 @@ class HyperoptTuner(Tuner):
self.json = None
self.total_data = {}
self.rval = None
self.supplement_data_num = 0
def _choose_tuner(self, algorithm_name):
"""
@@ -353,3 +354,27 @@ class HyperoptTuner(Tuner):
# remove '_index' from json2parameter and save params-id
total_params = json2parameter(self.json, parameter)
return total_params
def import_data(self, data):
"""Import additional data for tuning
Parameters
----------
data:
a list of dictionaries, each of which has at least two keys, 'parameter' and 'value'
"""
_completed_num = 0
for trial_info in data:
logger.info("Importing data, current processing progress %s / %s" %(_completed_num), len(data))
_completed_num += 1
# Random search keeps no history, so imported data is simply ignored.
if self.algorithm_name == 'random_search':
return
assert "parameter" in trial_info
_params = trial_info["parameter"]
assert "value" in trial_info
_value = trial_info['value']
self.supplement_data_num += 1
_parameter_id = '_'.join(["ImportData", str(self.supplement_data_num)])
self.total_data[_parameter_id] = _params
self.receive_trial_result(parameter_id=_parameter_id, parameters=_params, value=_value)
logger.info("Successfully import data to TPE/Anneal tuner.")
@@ -96,7 +96,7 @@ class MetisTuner(Tuner):
self.samples_x = []
self.samples_y = []
self.samples_y_aggregation = []
self.history_parameters = []
self.total_data = []
self.space = None
self.no_resampling = no_resampling
self.no_candidates = no_candidates
@@ -107,6 +107,7 @@ class MetisTuner(Tuner):
self.exploration_probability = exploration_probability
self.minimize_constraints_fun = None
self.minimize_starting_points = None
self.supplement_data_num = 0
def update_search_space(self, search_space):
@@ -392,15 +393,35 @@
# ===== STEP 7: If current optimal hyperparameter occurs in the history or exploration probability is less than the threshold, take next config as exploration step =====
outputs = self._pack_output(lm_current['hyperparameter'])
ap = random.uniform(0, 1)
if outputs in self.history_parameters or ap<=self.exploration_probability:
if outputs in self.total_data or ap<=self.exploration_probability:
if next_candidate is not None:
outputs = self._pack_output(next_candidate['hyperparameter'])
else:
random_parameter = _rand_init(x_bounds, x_types, 1)[0]
outputs = self._pack_output(random_parameter)
self.history_parameters.append(outputs)
self.total_data.append(outputs)
return outputs
def import_data(self, data):
"""Import additional data for tuning
Parameters
----------
data:
a list of dictionaries, each of which has at least two keys, 'parameter' and 'value'
"""
_completed_num = 0
for trial_info in data:
logger.info("Importing data, current processing progress %s / %s" %(_completed_num), len(data))
_completed_num += 1
assert "parameter" in trial_info
_params = trial_info["parameter"]
assert "value" in trial_info
_value = trial_info['value']
self.supplement_data_num += 1
_parameter_id = '_'.join(["ImportData", str(self.supplement_data_num)])
self.total_data.append(_params)
self.receive_trial_result(parameter_id=_parameter_id, parameters=_params, value=_value)
logger.info("Successfully import data to metis tuner.")
def _rand_with_constraints(x_bounds, x_types):
outputs = None
@@ -154,6 +154,9 @@ class MultiPhaseMsgDispatcher(MsgDispatcherBase):
self.tuner.trial_end(json_tricks.loads(data['hyper_params'])['parameter_id'], data['event'] == 'SUCCEEDED', trial_job_id)
return True
def handle_import_data(self, data):
pass
def _handle_intermediate_metric_data(self, data):
if data['type'] != 'PERIODICAL':
return True
@@ -95,3 +95,6 @@ class MultiPhaseTuner(Recoverable):
def _on_error(self):
pass
def import_data(self, data):
pass
@@ -307,3 +307,7 @@ class NetworkMorphismTuner(Tuner):
if item["model_id"] == model_id:
return item["metric_value"]
return None
def import_data(self, data):
pass
@@ -24,7 +24,7 @@ class Recoverable:
def load_checkpoint(self):
pass
def save_checkpont(self):
def save_checkpoint(self):
pass
def get_checkpoint_path(self):
@@ -261,3 +261,6 @@ class SMACTuner(Tuner):
params.append(self.convert_loguniform_categorical(challenger.get_dictionary()))
cnt += 1
return params
def import_data(self, data):
pass
@@ -24,6 +24,8 @@ from .env_vars import dispatcher_env_vars
def extract_scalar_reward(value, scalar_key='default'):
"""
Extract scalar reward from trial result.
Raises
------
RuntimeError
@@ -35,9 +37,20 @@ def extract_scalar_reward(value, scalar_key='default'):
elif isinstance(value, dict) and scalar_key in value and isinstance(value[scalar_key], (float, int)):
reward = value[scalar_key]
else:
raise RuntimeError('Incorrect final result: the final result for %s should be float/int, or a dict which has a key named "default" whose value is float/int.' % str(self.__class__))
raise RuntimeError('Incorrect final result: the final result should be float/int, or a dict which has a key named "default" whose value is float/int.')
return reward
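# Usage sketch (values illustrative, based on the branches above):
#   extract_scalar_reward(0.9)                          -> 0.9
#   extract_scalar_reward({"default": 0.9, "acc": 0.8}) -> 0.9
#   extract_scalar_reward("bad")                        -> raises RuntimeError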
def convert_dict2tuple(value):
"""
Recursively convert a dict into a tuple of sorted (key, value) pairs so it becomes hashable. Note: the input dict is modified in place.
"""
if isinstance(value, dict):
for _keys in value:
value[_keys] = convert_dict2tuple(value[_keys])
return tuple(sorted(value.items()))
else:
return value
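# Usage sketch (illustrative): nested dicts become sorted, hashable tuples.
#   convert_dict2tuple({"y": {"z": 2}, "x": 1})
#   -> (("x", 1), ("y", (("z", 2),)))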
def init_dispatcher_logger():
""" Initialize dispatcher logging configuration"""
logger_file_path = 'dispatcher.log'
......