Unverified Commit a1016a6c authored by liuzhe-lz's avatar liuzhe-lz Committed by GitHub

New experiment config (#3138)

parent 05534f37
@@ -6,6 +6,7 @@ References
    nnictl Commands <Tutorial/Nnictl>
    Experiment Configuration <Tutorial/ExperimentConfig>
+   Experiment Configuration V2 <reference/experiment_config>
    Search Space <Tutorial/SearchSpaceSpec>
    NNI Annotation <Tutorial/AnnotationSpec>
    SDK API References <sdk_reference>
......
===========================
Experiment Config Reference
===========================
This is the detailed reference of experiment config fields.
For a quick start guide, see the tutorial instead. [TODO]
Notes
=====
1. This document lists field names as separate words.
   They should be written in ``snake_case`` for the Python library ``nni.experiment``, and in ``camelCase`` for YAML files.
2. In this document, the type of each field is expressed in `Python type hint <https://docs.python.org/3/library/typing.html>`__ format.
   Therefore JSON objects are called ``dict`` and arrays are called ``list``.
.. _Path:
.. _directory:
3. Some fields take a path to a file or directory.
   Unless otherwise noted, both absolute and relative paths are supported, and ``~`` can be used for the home directory.

   - When written in a YAML file, relative paths are relative to the directory containing that file.
   - When assigned in Python code, relative paths are relative to the current working directory.
   - All relative paths are converted to absolute when loading a YAML file into a Python class, and when saving a Python class to a YAML file.

4. Setting a field to ``None`` or ``null`` is equivalent to not setting the field.
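As an illustration of note 1, a field name can be converted between the two spellings with a small helper (a sketch for illustration only, not part of ``nni`` itself):

```python
def to_camel_case(name: str) -> str:
    """Convert a snake_case field name (Python) to camelCase (YAML)."""
    first, *rest = name.split('_')
    return first + ''.join(word.title() for word in rest)

def to_snake_case(name: str) -> str:
    """Convert a camelCase field name (YAML) back to snake_case (Python)."""
    return ''.join('_' + c.lower() if c.isupper() else c for c in name)

print(to_camel_case('trial_gpu_number'))       # trialGpuNumber
print(to_snake_case('maxExperimentDuration'))  # max_experiment_duration
```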
ExperimentConfig
================
experiment name
---------------
Mnemonic name of the experiment. This will be shown in web UI and nnictl.
type: ``Optional[str]``
search space file
-----------------
Path_ to a JSON file containing the search space.
type: ``Optional[str]``
The search space format is determined by the tuner. The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.html>`__.
Mutually exclusive with `search space`_.
search space
------------
Search space object.
type: ``Optional[Any]``
The format is determined by the tuner. The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.html>`__.
Note that ``None`` means "no such field", so an empty search space should be written as ``{}``.
Mutually exclusive with `search space file`_.
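For example, a search space in the common built-in tuner format is a JSON-style dict mapping parameter names to sampling specifications (shown here in Python; see the search space spec linked above for the full set of ``_type`` values):

```python
import json

# A small search space in the common built-in tuner format:
# each entry gives a sampling type and its candidate values/range.
search_space = {
    'batch_size': {'_type': 'choice', '_value': [16, 32, 64]},
    'learning_rate': {'_type': 'loguniform', '_value': [1e-5, 1e-1]},
}

# The same object can be stored in a JSON file and referenced via
# `search space file` instead; the two fields are mutually exclusive.
print(json.dumps(search_space))
```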
trial command
-------------
Command(s) to launch trial.
type: ``str``
Bash will be used on Linux and macOS. PowerShell will be used on Windows.
trial code directory
--------------------
`Path`_ to the directory containing trial source files.
type: ``str``
default: ``"."``
All files in this directory will be sent to the training machine, unless there is a ``.nniignore`` file. [TODO:link]
trial concurrency
-----------------
Specify how many trials should be run concurrently.
type: ``int``
The real concurrency also depends on hardware resources and may be less than this value.
trial gpu number
----------------
Number of GPUs used by each trial.
type: ``Optional[int]``
If set to zero, trials will have no access to any GPU.
If not specified, trials will be created and scheduled as if they do not use GPU,
but they can still access all GPUs on the training machine.
max experiment duration
-----------------------
Limit the duration of this experiment if specified.
type: ``Optional[str]``
format: ``number + s|m|h|d``
examples: ``"10m"``, ``"0.5h"``
When time runs out, the experiment will stop creating trials but continue to serve web UI.
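The duration string can be read mechanically; a minimal parser for the ``number + s|m|h|d`` format (an illustrative sketch, not the parser NNI uses) might look like:

```python
# Convert a duration string like "10m" or "0.5h" to seconds.
# Assumes the documented format: a number followed by s, m, h, or d.
_UNITS = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400}

def duration_to_seconds(value: str) -> float:
    number, unit = value[:-1], value[-1]
    return float(number) * _UNITS[unit]

print(duration_to_seconds('10m'))   # 600.0
print(duration_to_seconds('0.5h'))  # 1800.0
```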
max trial number
----------------
Limit the number of trials to create if specified.
type: ``Optional[int]``
When the budget runs out, the experiment will stop creating trials but continue to serve web UI.
nni manager ip
--------------
IP of current machine, used by training machines to access NNI manager. Not used in local mode.
type: ``Optional[str]``
If not specified, this will be the default IPv4 address of outgoing connection.
use annotation
--------------
Enable `annotation <../Tutorial/AnnotationSpec.html>`__.
type: ``bool``
default: ``False``
When using annotation, `search space`_ and `search space file`_ should not be specified manually.
debug
-----
Enable debug mode.
type: ``bool``
default: ``False``
When enabled, logging will be more verbose and some internal validations will be loosened.
log level
---------
Set log level of whole system.
type: ``Optional[str]``
values: ``"trace"``, ``"debug"``, ``"info"``, ``"warning"``, ``"error"``, ``"fatal"``
Defaults to ``"info"``, or ``"debug"`` when the `debug`_ option is enabled.
Most modules of NNI are affected by this value, including the NNI manager, tuner, training service, etc.
The exception is the trial, whose log level is directly managed by trial code.
For Python modules, ``"trace"`` acts as ``logging.DEBUG`` and ``"fatal"`` acts as ``logging.CRITICAL``.
experiment working directory
----------------------------
Specify the `directory`_ to place log, checkpoint, metadata, and other run-time stuff.
type: ``Optional[str]``
By default uses ``~/nni-experiments``.
NNI will create a subdirectory named by experiment ID, so it is safe to use the same directory for multiple experiments.
tuner gpu indices
-----------------
Limit the GPUs visible to tuner, assessor, and advisor.
type: ``Optional[Union[list[int], str]]``
This will be used as the ``CUDA_VISIBLE_DEVICES`` environment variable of the tuner process.
Because the tuner, assessor, and advisor run in the same process, this option affects them all.
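The list form is ultimately serialized to the comma-separated string that ``CUDA_VISIBLE_DEVICES`` expects; a standalone sketch of that conversion (mirroring the ``_convert_gpu_indices`` helper added in this PR):

```python
from typing import List, Optional, Union

def gpu_indices_to_env(indices: Optional[Union[List[int], str]]) -> Optional[str]:
    """Render a `gpu indices` value as a CUDA_VISIBLE_DEVICES string.

    Accepts either a list of ints or an already-formatted string;
    None means "do not restrict visible GPUs".
    """
    if indices is None:
        return None
    if isinstance(indices, str):
        return indices
    return ','.join(str(idx) for idx in indices)

print(gpu_indices_to_env([0, 1, 3]))  # 0,1,3
```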
tuner
-----
Specify the tuner [TODO:link]
type: Optional `AlgorithmConfig`_
assessor
--------
Specify the assessor [TODO:link]
type: Optional `AlgorithmConfig`_
advisor
-------
Specify the advisor [TODO:link]
type: Optional `AlgorithmConfig`_
training service
----------------
Specify `training service <../TrainingService/Overview.html>`__.
type: `TrainingServiceConfig`_
AlgorithmConfig
===============
[TODO:short description]
name
----
Name of built-in or registered [TODO:link] algorithm.
type: ``str`` for built-in and registered algorithm, ``None`` for custom algorithm
class name
----------
Qualified class name of custom algorithm.
type: ``str`` for custom algorithm, ``None`` for built-in and registered algorithm
example: ``"my_tuner.MyTuner"``
code directory
--------------
`Path`_ to directory containing the custom algorithm class.
type: ``Optional[str]`` for custom algorithm, ``None`` for built-in and registered algorithm
If not specified, the `class name`_ will be looked up in Python's `module search path <https://docs.python.org/3/tutorial/modules.html#the-module-search-path>`__.
class args
----------
Keyword arguments passed to algorithm class' constructor.
type: ``Optional[dict[str, Any]]``
See the algorithm's documentation for supported values.
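A custom algorithm is located by its qualified `class name`_; conceptually, loading it resembles the following sketch (the ``my_tuner.MyTuner`` example above is hypothetical, so this demo resolves a standard-library class instead):

```python
import importlib

def load_algorithm_class(qualified_name: str):
    """Resolve 'package.module.ClassName' to a class object.

    When `code directory` is set, it would be added to sys.path first;
    otherwise the normal module search path is used, as documented.
    """
    module_name, class_name = qualified_name.rsplit('.', 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Demonstrate with a class that is guaranteed to exist:
cls = load_algorithm_class('collections.OrderedDict')
instance = cls(a=1)  # `class args` are passed as keyword arguments
print(type(instance).__name__)  # OrderedDict
```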
TrainingServiceConfig
=====================
One of the following:
- `LocalConfig`_
- `RemoteConfig`_
- `OpenPaiConfig`_
LocalConfig
===========
Detailed `here <../TrainingService/LocalMode.html>`__.
platform
--------
Constant string ``"local"``.
use active gpu
--------------
Specify whether NNI should submit trials to GPUs occupied by other tasks.
type: ``bool``
If you are using a desktop system with a GUI, set this to ``True``.
.. TODO: need to discuss default value
max trial number per gpu
------------------------
Specify how many trials can share one GPU.
type: ``int``
default: ``1``
gpu indices
-----------
Limit the GPUs visible to trial processes.
type: ``Optional[Union[list[int], str]]``
If `trial gpu number`_ is less than the length of this value, only a subset will be visible to each trial.
This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable.
RemoteConfig
============
Detailed `here <../TrainingService/RemoteMachineMode.html>`__.
platform
--------
Constant string ``"remote"``.
machine list
------------
List of training machines.
type: list of `RemoteMachineConfig`_
reuse mode
----------
Enable reuse mode. [TODO]
type: ``bool``
RemoteMachineConfig
===================
host
----
IP or hostname (domain name) of the machine.
type: ``str``
port
----
SSH service port.
type: ``int``
default: ``22``
user
----
Login user name.
type: ``str``
password
--------
Login password.
type: ``Optional[str]``
If not specified, `ssh key file`_ will be used instead.
ssh key file
------------
`Path`_ to ssh key file (identity file).
type: ``str``
default: ``"~/.ssh/id_rsa"``
Only used when `password`_ is not specified.
ssh passphrase
--------------
Passphrase of SSH identity file.
type: ``Optional[str]``
use active gpu
--------------
Specify whether NNI should submit trials to GPUs occupied by other tasks.
type: ``bool``
max trial number per gpu
------------------------
Specify how many trials can share one GPU.
type: ``int``
default: ``1``
gpu indices
-----------
Limit the GPUs visible to trial processes.
type: ``Optional[Union[list[int], str]]``
If `trial gpu number`_ is less than the length of this value, only a subset will be visible to each trial.
This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable.
trial prepare command
---------------------
Command(s) to run before launching each trial.
type: ``Optional[str]``
This is useful when preparation steps vary across machines.
OpenPaiConfig
=============
Detailed `here <../TrainingService/PaiMode.html>`__.
platform
--------
Constant string ``"openpai"``.
host
----
Hostname of OpenPAI service.
type: ``str``
username
--------
OpenPAI user name.
type: ``str``
token
-----
OpenPAI user token.
type: ``str``
This can be found in your OpenPAI user settings page.
trial cpu number
----------------
Number of CPUs used by each trial.
type: ``int``
default: ``1``
trial memory size
-----------------
Memory used by each trial.
type: ``str``
examples: ``"1gb"``, ``"512mb"``
docker image
------------
Name and tag of docker image to run the trials.
type: ``str``
default: ``"msranni/nni:latest"``
reuse mode
----------
Enable reuse mode.
type: ``bool``
default: ``False``
nni manager storage mount point
-------------------------------
`Mount point <path>`_ of the storage service (typically NFS) on the current machine.
type: ``str``
container storage mount point
-----------------------------
Mount point of storage service (typically NFS) in docker container.
type: ``str``
This must be an absolute path.
open pai config
---------------
Embedded OpenPAI config file.
type: ``Optional[Dict[str, Any]]``
open pai config file
--------------------
`Path`_ to OpenPAI config file.
@@ -3,3 +3,9 @@
 from .common import *
 from .local import *
+from .remote import *
+from .openpai import *
+from .aml import *
+from .kubeflow import *
+from .frameworkcontroller import *
+from .adl import *
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from dataclasses import dataclass
from .common import TrainingServiceConfig
__all__ = ['AdlConfig']
@dataclass(init=False)
class AdlConfig(TrainingServiceConfig):
platform: str = 'adl'
docker_image: str = 'msranni/nni:latest'
_validation_rules = {
'platform': lambda value: (value == 'adl', 'cannot be modified')
}
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from dataclasses import dataclass
from .common import TrainingServiceConfig
__all__ = ['AmlConfig']
@dataclass(init=False)
class AmlConfig(TrainingServiceConfig):
platform: str = 'aml'
subscription_id: str
resource_group: str
workspace_name: str
compute_target: str
docker_image: str = 'msranni/nni:latest'
_validation_rules = {
'platform': lambda value: (value == 'aml', 'cannot be modified')
}
@@ -58,11 +58,6 @@ class ConfigBase:
             value = Path(value).expanduser()
             if not value.is_absolute():
                 value = _base_path / value
-            # convert nested dict to config type
-            if isinstance(value, dict):
-                cls = util.strip_optional(field.type)
-                if isinstance(cls, type) and issubclass(cls, ConfigBase):
-                    value = cls(**value, _base_path=_base_path)
             setattr(self, field.name, value)
         if kwargs:
             cls = type(self).__name__
......
@@ -53,7 +53,7 @@ class ExperimentConfig(ConfigBase):
     trial_command: str
     trial_code_directory: PathLike = '.'
     trial_concurrency: int
-    trial_gpu_number: int = 0
+    trial_gpu_number: Optional[int] = None
     max_experiment_duration: Optional[str] = None
     max_trial_number: Optional[int] = None
     nni_manager_ip: Optional[str] = None
@@ -68,10 +68,13 @@ class ExperimentConfig(ConfigBase):
     training_service: TrainingServiceConfig

     def __init__(self, training_service_platform: Optional[str] = None, **kwargs):
-        super().__init__(**kwargs)
+        kwargs = util.case_insensitive(kwargs)
         if training_service_platform is not None:
-            assert 'training_service' not in kwargs
-            self.training_service = util.training_service_config_factory(training_service_platform)
+            assert 'trainingservice' not in kwargs
+            kwargs['trainingservice'] = util.training_service_config_factory(training_service_platform)
+        elif isinstance(kwargs.get('trainingservice'), dict):
+            kwargs['trainingservice'] = util.training_service_config_factory(**kwargs['trainingservice'])
+        super().__init__(**kwargs)

     def validate(self, initialized_tuner: bool = False) -> None:
         super().validate()
@@ -79,6 +82,9 @@ class ExperimentConfig(ConfigBase):
             _validate_for_exp(self)
         else:
             _validate_for_nnictl(self)
+        if self.trial_gpu_number and hasattr(self.training_service, 'use_active_gpu'):
+            if self.training_service.use_active_gpu is None:
+                raise ValueError('Please set "use_active_gpu"')

     ## End of public API ##
......
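The net effect of these changes: ``trial_gpu_number`` now defaults to ``None`` (schedule as if no GPU is used), and when GPUs are requested on a platform that distinguishes occupied GPUs, ``use_active_gpu`` must be set explicitly. A standalone sketch of that validation rule (simplified; real field handling lives in ``ConfigBase``):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LocalService:  # stand-in for LocalConfig
    use_active_gpu: Optional[bool] = None

def validate_gpu_settings(trial_gpu_number: Optional[int], service) -> None:
    # Mirrors the check added to ExperimentConfig.validate():
    # requesting GPUs without deciding use_active_gpu is an error.
    if trial_gpu_number and hasattr(service, 'use_active_gpu'):
        if service.use_active_gpu is None:
            raise ValueError('Please set "use_active_gpu"')

validate_gpu_settings(None, LocalService())                  # fine: no GPU requested
validate_gpu_settings(1, LocalService(use_active_gpu=True))  # fine: decision was made
```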
@@ -61,49 +61,115 @@ def to_v1_yaml(config: ExperimentConfig, skip_nnictl: bool = False) -> Dict[str,
     data['trial'] = {
         'command': data.pop('trialCommand'),
         'codeDir': data.pop('trialCodeDirectory'),
-        'gpuNum': data.pop('trialGpuNumber', '')
     }
+    if 'trialGpuNumber' in data:
+        data['trial']['gpuNum'] = data.pop('trialGpuNumber')
     if ts['platform'] == 'local':
         data['localConfig'] = {
-            'useActiveGpu': ts['useActiveGpu'],
+            'useActiveGpu': ts.get('useActiveGpu', False),
             'maxTrialNumPerGpu': ts['maxTrialNumberPerGpu']
         }
-        if ts.get('gpuIndices') is not None:
-            data['localConfig']['gpuIndices'] = ','.join(str(idx) for idx in ts['gpuIndices'])
+        if 'gpuIndices' in ts:
+            data['localConfig']['gpuIndices'] = _convert_gpu_indices(ts['gpuIndices'])
     elif ts['platform'] == 'remote':
-        print(ts)
         data['remoteConfig'] = {'reuse': ts['reuseMode']}
         data['machineList'] = []
         for machine in ts['machineList']:
-            machine = {
-                'ip': machine['host'],
-                'username': machine['user'],
-                'passwd': machine['password'],
-                'sshKeyPath': machine['sshKeyFile'],
-                'passphrase': machine['sshPassphrase'],
-                'gpuIndices': _convert_gpu_indices(machine['gpuIndices']),
-                'maxTrialNumPerGpu': machine['maxTrialNumPerGpu'],
-                'useActiveGpu': machine['useActiveGpu'],
-                'preCommand': machine['trialPrepareCommand']
+            machine_v1 = {
+                'ip': machine.get('host'),
+                'port': machine.get('port'),
+                'username': machine.get('user'),
+                'passwd': machine.get('password'),
+                'sshKeyPath': machine.get('sshKeyFile'),
+                'passphrase': machine.get('sshPassphrase'),
+                'gpuIndices': _convert_gpu_indices(machine.get('gpuIndices')),
+                'maxTrialNumPerGpu': machine.get('maxTrialNumPerGpu'),
+                'useActiveGpu': machine.get('useActiveGpu'),
+                'preCommand': machine.get('trialPrepareCommand')
             }
+            machine_v1 = {k: v for k, v in machine_v1.items() if v is not None}
+            data['machineList'].append(machine_v1)
     elif ts['platform'] == 'pai':
-        data['trial']['cpuNum'] = ts['trialCpuNumber']
-        data['trial']['memoryMB'] = util.parse_size(ts['trialMemorySize'])
-        data['trial']['image'] = ts['docker_image']
+        data['trial']['image'] = ts['dockerImage']
+        data['trial']['nniManagerNFSMountPath'] = ts['localStorageMountPoint']
+        data['trial']['containerNFSMountPath'] = ts['containerStorageMountPoint']
         data['paiConfig'] = {
             'userName': ts['username'],
             'token': ts['token'],
-            'host': 'https://' + ts['host'],
+            'host': ts['host'],
             'reuse': ts['reuseMode']
         }
+        if 'openpaiConfigFile' in ts:
+            data['paiConfig']['paiConfigPath'] = ts['openpaiConfigFile']
+        elif 'openpaiConfig' in ts:
+            conf_file = NamedTemporaryFile('w', delete=False)
+            json.dump(ts['openpaiConfig'], conf_file, indent=4)
+            data['paiConfig']['paiConfigPath'] = conf_file.name
+    elif ts['platform'] == 'aml':
+        data['trial']['image'] = ts['dockerImage']
+        data['amlConfig'] = dict(ts)
+        data['amlConfig'].pop('platform')
+        data['amlConfig'].pop('dockerImage')
+    elif ts['platform'] == 'kubeflow':
+        data['trial'].pop('command')
+        data['trial'].pop('gpuNum')
+        data['kubeflowConfig'] = dict(ts['storage'])
+        data['kubeflowConfig']['operator'] = ts['operator']
+        data['kubeflowConfig']['apiVersion'] = ts['apiVersion']
+        data['trial']['worker'] = _convert_kubeflow_role(ts['worker'])
+        if ts.get('parameterServer') is not None:
+            if ts['operator'] == 'tf-operator':
+                data['trial']['ps'] = _convert_kubeflow_role(ts['parameterServer'])
+            else:
+                data['trial']['master'] = _convert_kubeflow_role(ts['parameterServer'])
+    elif ts['platform'] == 'frameworkcontroller':
+        data['trial'].pop('command')
+        data['trial'].pop('gpuNum')
+        data['frameworkcontrollerConfig'] = dict(ts['storage'])
+        data['frameworkcontrollerConfig']['serviceAccountName'] = ts['serviceAccountName']
+        data['trial']['taskRoles'] = [_convert_fxctl_role(r) for r in ts['taskRoles']]
+    elif ts['platform'] == 'adl':
+        data['trial']['image'] = ts['dockerImage']
     return data

 def _convert_gpu_indices(indices):
     return ','.join(str(idx) for idx in indices) if indices is not None else None

+def _convert_kubeflow_role(data):
+    return {
+        'replicas': data['replicas'],
+        'command': data['command'],
+        'gpuNum': data['gpuNumber'],
+        'cpuNum': data['cpuNumber'],
+        'memoryMB': util.parse_size(data['memorySize']),
+        'image': data['dockerImage']
+    }

+def _convert_fxctl_role(data):
+    return {
+        'name': data['name'],
+        'taskNum': data['taskNumber'],
+        'command': data['command'],
+        'gpuNum': data['gpuNumber'],
+        'cpuNum': data['cpuNumber'],
+        'memoryMB': util.parse_size(data['memorySize']),
+        'image': data['dockerImage'],
+        'frameworkAttemptCompletionPolicy': {
+            'minFailedTaskCount': data['attemptCompletionMinFailedTasks'],
+            'minSucceededTaskCount': data['attemptCompletionMinSucceededTasks']
+        }
+    }

 def to_cluster_metadata(config: ExperimentConfig) -> List[Dict[str, Any]]:
     experiment_config = to_v1_yaml(config, skip_nnictl=True)
@@ -135,9 +201,19 @@ def to_cluster_metadata(config: ExperimentConfig) -> List[Dict[str, Any]]:
         ret.append(request_data)
     elif config.training_service.platform == 'openpai':
-        pai_config_data = dict()
-        pai_config_data['pai_config'] = experiment_config['paiConfig']
-        ret.append(pai_config_data)
+        ret.append({'pai_config': experiment_config['paiConfig']})
+    elif config.training_service.platform == 'aml':
+        ret.append({'aml_config': experiment_config['amlConfig']})
+    elif config.training_service.platform == 'kubeflow':
+        ret.append({'kubeflow_config': experiment_config['kubeflowConfig']})
+    elif config.training_service.platform == 'frameworkcontroller':
+        ret.append({'frameworkcontroller_config': experiment_config['frameworkcontrollerConfig']})
+    elif config.training_service.platform == 'adl':
+        pass
     else:
         raise RuntimeError('Unsupported training service ' + config.training_service.platform)
@@ -159,7 +235,7 @@ def to_rest_json(config: ExperimentConfig) -> Dict[str, Any]:
     if config.search_space is not None:
         request_data['searchSpace'] = json.dumps(config.search_space)
-    else:
+    elif config.search_space_file is not None:
         request_data['searchSpace'] = Path(config.search_space_file).read_text()
     request_data['trainingServicePlatform'] = experiment_config.get('trainingServicePlatform')
......
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from dataclasses import dataclass
from typing import List, Optional
from .base import ConfigBase
from .common import TrainingServiceConfig
from . import util
__all__ = [
'FrameworkControllerConfig',
'FrameworkControllerRoleConfig',
'FrameworkControllerNfsConfig',
'FrameworkControllerAzureStorageConfig'
]
@dataclass(init=False)
class _FrameworkControllerStorageConfig(ConfigBase):
storage: str
server: Optional[str] = None
path: Optional[str] = None
azure_account: Optional[str] = None
azure_share: Optional[str] = None
key_vault: Optional[str] = None
key_vault_secret: Optional[str] = None
@dataclass(init=False)
class FrameworkControllerNfsConfig(ConfigBase):
storage: str = 'nfs'
server: str
path: str
@dataclass(init=False)
class FrameworkControllerAzureStorageConfig(ConfigBase):
storage: str = 'azureStorage'
azure_account: str
azure_share: str
key_vault: str
key_vault_secret: str
@dataclass(init=False)
class FrameworkControllerRoleConfig(ConfigBase):
name: str
docker_image: str = 'msranni/nni:latest'
task_number: int
command: str
gpu_number: int
cpu_number: int
memory_size: str
attempt_completion_min_failed_tasks: int
attempt_completion_min_succeeded_tasks: int
@dataclass(init=False)
class FrameworkControllerConfig(TrainingServiceConfig):
platform: str = 'frameworkcontroller'
service_account_name: str
storage: _FrameworkControllerStorageConfig
task_roles: List[FrameworkControllerRoleConfig]
def __init__(self, **kwargs):
kwargs = util.case_insensitive(kwargs)
kwargs['storage'] = util.load_config(_FrameworkControllerStorageConfig, kwargs.get('storage'))
kwargs['taskroles'] = util.load_config(FrameworkControllerRoleConfig, kwargs.get('taskroles'))
super().__init__(**kwargs)
_validation_rules = {
'platform': lambda value: (value == 'frameworkcontroller', 'cannot be modified')
}
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from dataclasses import dataclass
from typing import Optional
from .base import ConfigBase
from .common import TrainingServiceConfig
from . import util
__all__ = ['KubeflowConfig', 'KubeflowRoleConfig', 'KubeflowNfsConfig', 'KubeflowAzureStorageConfig']
@dataclass(init=False)
class _KubeflowStorageConfig(ConfigBase):
storage: str
server: Optional[str] = None
path: Optional[str] = None
azure_account: Optional[str] = None
azure_share: Optional[str] = None
key_vault: Optional[str] = None
key_vault_secret: Optional[str] = None
@dataclass(init=False)
class KubeflowNfsConfig(_KubeflowStorageConfig):
storage: str = 'nfs'
server: str
path: str
@dataclass(init=False)
class KubeflowAzureStorageConfig(ConfigBase):
storage: str = 'azureStorage'
azure_account: str
azure_share: str
key_vault: str
key_vault_secret: str
@dataclass(init=False)
class KubeflowRoleConfig(ConfigBase):
replicas: int
command: str
gpu_number: int
cpu_number: int
memory_size: str
docker_image: str = 'msranni/nni:latest'
@dataclass(init=False)
class KubeflowConfig(TrainingServiceConfig):
platform: str = 'kubeflow'
operator: str
api_version: str
storage: _KubeflowStorageConfig
worker: KubeflowRoleConfig
parameter_server: Optional[KubeflowRoleConfig] = None
def __init__(self, **kwargs):
kwargs = util.case_insensitive(kwargs)
kwargs['storage'] = util.load_config(_KubeflowStorageConfig, kwargs.get('storage'))
kwargs['worker'] = util.load_config(KubeflowRoleConfig, kwargs.get('worker'))
kwargs['parameterserver'] = util.load_config(KubeflowRoleConfig, kwargs.get('parameterserver'))
super().__init__(**kwargs)
_validation_rules = {
'platform': lambda value: (value == 'kubeflow', 'cannot be modified'),
'operator': lambda value: value in ['tf-operator', 'pytorch-operator']
}
@@ -11,7 +11,7 @@ __all__ = ['LocalConfig']
 @dataclass(init=False)
 class LocalConfig(TrainingServiceConfig):
     platform: str = 'local'
-    use_active_gpu: bool
+    use_active_gpu: Optional[bool] = None
     max_trial_number_per_gpu: int = 1
     gpu_indices: Optional[Union[List[int], str]] = None
......
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional
from .base import PathLike
from .common import TrainingServiceConfig
from . import util
__all__ = ['OpenpaiConfig']
@dataclass(init=False)
class OpenpaiConfig(TrainingServiceConfig):
platform: str = 'openpai'
host: str
username: str
token: str
docker_image: str = 'msranni/nni:latest'
local_storage_mount_point: PathLike
container_storage_mount_point: str
reuse_mode: bool = False
openpai_config: Optional[Dict[str, Any]] = None
openpai_config_file: Optional[PathLike] = None
_canonical_rules = {
'host': lambda value: 'https://' + value if '://' not in value else value, # type: ignore
'local_storage_mount_point': util.canonical_path,
'openpai_config_file': util.canonical_path
}
_validation_rules = {
'platform': lambda value: (value == 'openpai', 'cannot be modified'),
'local_storage_mount_point': lambda value: Path(value).is_dir(),
'container_storage_mount_point': lambda value: (Path(value).is_absolute(), 'is not absolute'),
'openpai_config_file': lambda value: Path(value).is_file()
}
def validate(self) -> None:
super().validate()
if self.openpai_config is not None and self.openpai_config_file is not None:
raise ValueError('openpai_config and openpai_config_file can only be set one')
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Union
from .base import ConfigBase, PathLike
from .common import TrainingServiceConfig
from . import util
__all__ = ['RemoteConfig', 'RemoteMachineConfig']
@dataclass(init=False)
class RemoteMachineConfig(ConfigBase):
host: str
port: int = 22
user: str
password: Optional[str] = None
ssh_key_file: Optional[PathLike] = None
ssh_passphrase: Optional[str] = None
use_active_gpu: bool = False
max_trial_number_per_gpu: int = 1
gpu_indices: Optional[Union[List[int], str]] = None
trial_prepare_command: Optional[str] = None
_canonical_rules = {
'ssh_key_file': util.canonical_path,
'gpu_indices': lambda value: [int(idx) for idx in value.split(',')] if isinstance(value, str) else value,
}
_validation_rules = {
'port': lambda value: 0 < value < 65536,
'max_trial_number_per_gpu': lambda value: value > 0,
'gpu_indices': lambda value: all(idx >= 0 for idx in value) and len(value) == len(set(value))
}
def validate(self):
super().validate()
if self.password is None and not Path(self.ssh_key_file).is_file():
raise ValueError(f'Password is not provided and cannot find SSH key file "{self.ssh_key_file}"')
@dataclass(init=False)
class RemoteConfig(TrainingServiceConfig):
platform: str = 'remote'
reuse_mode: bool = False
machine_list: List[RemoteMachineConfig]
def __init__(self, **kwargs):
kwargs = util.case_insensitive(kwargs)
kwargs['machinelist'] = util.load_config(RemoteMachineConfig, kwargs.get('machinelist'))
super().__init__(**kwargs)
_validation_rules = {
'platform': lambda value: (value == 'remote', 'cannot be modified')
}
@@ -8,12 +8,15 @@ Miscellaneous utility functions.
 import math
 import os.path
 from pathlib import Path
-from typing import Optional, Union
+from typing import Any, Dict, Optional, Union

 PathLike = Union[Path, str]

-def case_insensitive(key: str) -> str:
-    return key.lower().replace('_', '')
+def case_insensitive(key_or_kwargs: Union[str, Dict[str, Any]]) -> Union[str, Dict[str, Any]]:
+    if isinstance(key_or_kwargs, str):
+        return key_or_kwargs.lower().replace('_', '')
+    else:
+        return {key.lower().replace('_', ''): value for key, value in key_or_kwargs.items()}

 def camel_case(key: str) -> str:
     words = key.split('_')
@@ -26,13 +29,20 @@ def canonical_path(path: Optional[PathLike]) -> Optional[str]:
 def count(*values) -> int:
     return sum(value is not None and value is not False for value in values)

-def training_service_config_factory(platform: str):  # -> TrainingServiceConfig
+def training_service_config_factory(platform: str, **kwargs):  # -> TrainingServiceConfig
     from .common import TrainingServiceConfig
     for cls in TrainingServiceConfig.__subclasses__():
         if cls.platform == platform:
-            return cls()
+            return cls(**kwargs)
     raise ValueError(f'Unrecognized platform {platform}')

+def load_config(Type, value):
+    if isinstance(value, list):
+        return [load_config(Type, item) for item in value]
+    if isinstance(value, dict):
+        return Type(**value)
+    return value

 def strip_optional(type_hint):
     return type_hint.__args__[0] if str(type_hint).startswith('typing.Optional[') else type_hint
......
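The new ``load_config`` helper recursively turns plain dicts (and lists of dicts) into config objects; its behavior can be reproduced standalone, using a plain dataclass in place of a ``ConfigBase`` subclass:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MachineConfig:  # stand-in for RemoteMachineConfig
    host: str
    port: int = 22
    user: Optional[str] = None

def load_config(Type, value):
    # Same shape as the helper added in this PR: recurse into lists,
    # construct the config type from dicts, pass everything else through.
    if isinstance(value, list):
        return [load_config(Type, item) for item in value]
    if isinstance(value, dict):
        return Type(**value)
    return value

machines = load_config(MachineConfig, [{'host': 'a'}, {'host': 'b', 'port': 2222}])
print(machines[1].port)  # 2222
```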
@@ -28,7 +28,7 @@ import json
 import requests

 __all__ = [
-    'ExternalExperiment',
+    'LegacyExperiment',
     'TrialResult',
     'TrialMetricData',
     'TrialHyperParameters',
@@ -228,7 +228,7 @@ class TrialJob:
             .format(self.trialJobId, self.status, self.hyperParameters, self.logPath,
                     self.startTime, self.endTime, self.finalMetricData, self.stderrPath)

-class ExternalExperiment:
+class LegacyExperiment:
     def __init__(self):
         self._endpoint = None
         self._exp_id = None
......
@@ -9,6 +9,7 @@ import random
 import time
 import tempfile
 from subprocess import Popen, check_call, CalledProcessError, PIPE, STDOUT
+from nni.experiment.config import ExperimentConfig, convert
 from nni.tools.annotation import expand_annotations, generate_search_space
 from nni.tools.package_utils import get_builtin_module_class_name
 import nni_node
@@ -591,6 +592,11 @@ def create_experiment(args):
         print_error('Please set correct config path!')
         exit(1)
     experiment_config = get_yml_content(config_path)
+    try:
+        config = ExperimentConfig(**experiment_config)
+        experiment_config = convert.to_v1_yaml(config)
+    except Exception:
+        pass
     try:
         validate_all_content(experiment_config, config_path)
     except Exception as e:
...
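The nnictl change in this hunk opportunistically parses the loaded YAML as a new-style (V2) config and converts it to the legacy (V1) schema, silently falling back to the original dict when parsing fails so the existing V1 validator still handles it. A toy, self-contained sketch of that fallback pattern (the dataclass fields and `to_v1_yaml` mapping below are invented for illustration; the real `ExperimentConfig` and `convert.to_v1_yaml` live in `nni.experiment.config`):

```python
from dataclasses import dataclass

@dataclass
class ExperimentConfigV2:
    # Hypothetical V2 schema: two fields, snake_case names.
    experiment_name: str
    max_trial_number: int

def to_v1_yaml(cfg: ExperimentConfigV2) -> dict:
    # Map V2 snake_case fields onto the legacy camelCase V1 schema.
    return {'experimentName': cfg.experiment_name,
            'maxTrialNum': cfg.max_trial_number}

def normalize(raw: dict) -> dict:
    # Try V2 first; on any failure (unknown or missing fields raise TypeError)
    # return the dict untouched so the legacy validator can process it.
    try:
        return to_v1_yaml(ExperimentConfigV2(**raw))
    except Exception:
        return raw

print(normalize({'experiment_name': 'demo', 'max_trial_number': 4}))
print(normalize({'authorName': 'x'}))  # not V2, returned unchanged
```

The broad `except Exception: pass` keeps the change backward compatible: V1 files fail the V2 constructor and flow through the original code path unmodified.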
@@ -62,7 +62,9 @@ jobs:
 - script: |
     cd test
-    python3 -m pytest ut
+    python3 -m pytest ut --ignore=ut/sdk/test_pruners.py --ignore=ut/sdk/test_compressor_tf.py
+    python3 -m pytest ut/sdk/test_pruners.py
+    python3 -m pytest ut/sdk/test_compressor_tf.py
   displayName: Python unit test
 - script: |
...
@@ -143,8 +143,8 @@ testCases:
   config:
     maxTrialNum: 4
     trialConcurrency: 4
-  launchCommand: python3 -c 'from nni.experiment import ExternalExperiment as Experiment; exp = Experiment(); exp.start_experiment("$configFile")'
-  stopCommand: python3 -c 'from nni.experiment import ExternalExperiment as Experiment; exp = Experiment(); exp.connect_experiment("http://localhost:8080/"); exp.stop_experiment()'
+  launchCommand: python3 -c 'from nni.experiment import LegacyExperiment as Experiment; exp = Experiment(); exp.start_experiment("$configFile")'
+  stopCommand: python3 -c 'from nni.experiment import LegacyExperiment as Experiment; exp = Experiment(); exp.connect_experiment("http://localhost:8080/"); exp.stop_experiment()'
   validator:
     class: NnicliValidator
   platform: linux darwin
...
@@ -110,8 +110,8 @@ testCases:
   config:
     maxTrialNum: 4
     trialConcurrency: 4
-  launchCommand: python3 -c 'from nni.experiment import ExternalExperiment as Experiment; exp = Experiment(); exp.start_experiment("$configFile")'
-  stopCommand: python3 -c 'from nni.experiment import ExternalExperiment as Experiment; exp = Experiment(); exp.connect_experiment("http://localhost:8080/"); exp.stop_experiment()'
+  launchCommand: python3 -c 'from nni.experiment import LegacyExperiment as Experiment; exp = Experiment(); exp.start_experiment("$configFile")'
+  stopCommand: python3 -c 'from nni.experiment import LegacyExperiment as Experiment; exp = Experiment(); exp.connect_experiment("http://localhost:8080/"); exp.stop_experiment()'
   validator:
     class: NnicliValidator
   platform: linux darwin
...
@@ -47,8 +47,8 @@ testCases:
   config:
     maxTrialNum: 4
     trialConcurrency: 4
-  launchCommand: python3 -c 'from nni.experiment import ExternalExperiment as Experiment; exp = Experiment(); exp.start_experiment("$configFile")'
-  stopCommand: python3 -c 'from nni.experiment import ExternalExperiment as Experiment; exp = Experiment(); exp.connect_experiment("http://localhost:8080/"); exp.stop_experiment()'
+  launchCommand: python3 -c 'from nni.experiment import LegacyExperiment as Experiment; exp = Experiment(); exp.start_experiment("$configFile")'
+  stopCommand: python3 -c 'from nni.experiment import LegacyExperiment as Experiment; exp = Experiment(); exp.connect_experiment("http://localhost:8080/"); exp.stop_experiment()'
   validator:
     class: NnicliValidator
   platform: linux darwin
...