Defaults to "info" or "debug", depending on ``debug`` option. When debug mode is enabled, Loglevel is set to "debug", otherwise, Loglevel is set to "info".
Most modules of NNI will be affected by this value, including NNI manager, tuner, training service, etc.
When time runs out, the experiment will stop creating trials but continue to serve WebUI.
The exception is trial, whose logging level is directly managed by trial code.
For Python modules, "trace" acts as logging level 0 and "fatal" acts as ``logging.CRITICAL``.
maxTrialNumber
* - experimentWorkingDirectory
--------------
- ``Optional[str]``
- Specify the :ref:`directory <path>` to place log, checkpoint, metadata, and other run-time stuff.
Limit the number of trials to create if specified.
By default uses ``~/nni-experiments``.
NNI will create a subdirectory named by experiment ID, so it is safe to use the same directory for multiple experiments.
type: ``Optional[int]``
* - tunerGpuIndices
When the budget runs out, the experiment will stop creating trials but continue to serve WebUI.
- ``Optional[list[int] | str | int]``
- Limit the GPUs visible to tuner, assessor, and advisor.
This will be the ``CUDA_VISIBLE_DEVICES`` environment variable of tuner process.
maxTrialDuration
Because tuner, assessor, and advisor run in the same process, this option will affect them all.
---------------------
* - tuner
Limit the duration of trial job if specified.
- ``Optional[AlgorithmConfig]``
- Specify the tuner.
type: ``Optional[str]``
The built-in tuners can be found `here <../builtin_tuner.rst>`__ and you can follow `this tutorial <../Tuner/CustomizeTuner.rst>`__ to customize a new tuner.
format: ``number + s|m|h|d``
* - assessor
- ``Optional[AlgorithmConfig]``
examples: ``"10m"``, ``"0.5h"``
- Specify the assessor.
The built-in assessors can be found `here <../builtin_assessor.rst>`__ and you can follow `this tutorial <../Assessor/CustomizeAssessor.rst>`__ to customize a new assessor.
When time runs out, the current trial job will stop.
* - advisor
- ``Optional[AlgorithmConfig]``
nniManagerIp
- Specify the advisor.
------------
NNI provides two built-in advisors: `BOHB <../Tuner/BohbAdvisor.rst>`__ and `Hyperband <../Tuner/HyperbandAdvisor.rst>`__, and you can follow `this tutorial <../Tuner/CustomizeAdvisor.rst>`__ to customize a new advisor.
IP of the current machine, used by training machines to access NNI manager. Not used in local mode.
* - trainingService
- ``TrainingServiceConfig``
type: ``Optional[str]``
- Specify the `training service <../TrainingService/Overview.rst>`__.
If not specified, IPv4 address of ``eth0`` will be used.
* - sharedStorage
- ``Optional[SharedStorageConfig]``
Except for the local mode, it is highly recommended to set this field manually.
- Configure the shared storage, detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__.
Defaults to "info" or "debug", depending on `debug`_ option. When debug mode is enabled, Loglevel is set to "debug", otherwise, Loglevel is set to "info".
Most modules of NNI will be affected by this value, including NNI manager, tuner, training service, etc.
The exception is trial, whose logging level is directly managed by trial code.
For Python modules, "trace" acts as logging level 0 and "fatal" acts as ``logging.CRITICAL``.
experimentWorkingDirectory
--------------------------

Specify the :ref:`directory <path>` to place logs, checkpoints, metadata, and other run-time files.

type: ``Optional[str]``

By default, ``~/nni-experiments`` is used.

NNI will create a subdirectory named by experiment ID, so it is safe to use the same directory for multiple experiments.

tunerGpuIndices
---------------

Limit the GPUs visible to tuner, assessor, and advisor.

type: ``Optional[list[int] | str | int]``

This will be used as the ``CUDA_VISIBLE_DEVICES`` environment variable of the tuner process.

Because tuner, assessor, and advisor run in the same process, this option affects all of them.
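Since the field accepts a list of indices, a comma-separated string, or a single integer, the following sketch shows the alternative YAML forms (the unused forms are commented out so the snippet stays a single valid document):

.. code-block:: yaml

   # list form: expose GPU 0 and GPU 1 to the tuner process
   tunerGpuIndices: [0, 1]

   # comma-separated string form
   # tunerGpuIndices: "0,1"

   # single index form (one GPU only)
   # tunerGpuIndices: 0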
tuner
-----

Specify the tuner.

type: Optional `AlgorithmConfig`_

The built-in tuners can be found `here <../builtin_tuner.rst>`__ and you can follow `this tutorial <../Tuner/CustomizeTuner.rst>`__ to customize a new tuner.
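For example, a built-in tuner is selected by its ``name``, with constructor arguments passed via ``classArgs`` (both fields are described under `AlgorithmConfig`_ below). The sketch assumes the built-in TPE tuner and its ``optimize_mode`` argument; check the tuner's own documentation for the ``classArgs`` it supports:

.. code-block:: yaml

   tuner:
     name: TPE
     classArgs:
       optimize_mode: maximize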
assessor
--------

Specify the assessor.

type: Optional `AlgorithmConfig`_

The built-in assessors can be found `here <../builtin_assessor.rst>`__ and you can follow `this tutorial <../Assessor/CustomizeAssessor.rst>`__ to customize a new assessor.

advisor
-------

Specify the advisor.

type: Optional `AlgorithmConfig`_

NNI provides two built-in advisors: `BOHB <../Tuner/BohbAdvisor.rst>`__ and `Hyperband <../Tuner/HyperbandAdvisor.rst>`__, and you can follow `this tutorial <../Tuner/CustomizeAdvisor.rst>`__ to customize a new advisor.

trainingService
---------------

Specify the `training service <../TrainingService/Overview.rst>`__.

type: `TrainingServiceConfig`_

sharedStorage
-------------

Configure the shared storage; detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__.

type: Optional `SharedStorageConfig`_
AlgorithmConfig
^^^^^^^^^^^^^^^

For customized algorithms, there are two ways to describe them:

1. Register the customized algorithm so that it can be used like a built-in one. (preferred)

2. Specify code directory and class name directly.

name
----

Name of the built-in or registered algorithm.

type: ``str`` for the built-in and registered algorithm, ``None`` for other customized algorithms.

className
---------

Qualified class name of a customized algorithm that is not registered.

type: ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms.

example: ``"my_tuner.MyTuner"``

codeDirectory
-------------

`Path`_ to the directory containing the customized algorithm class.

type: ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms.

classArgs
---------

Keyword arguments passed to the algorithm class' constructor.

type: ``Optional[dict[str, Any]]``

See the algorithm's document for supported values.
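As a sketch of the second way above, a customized (unregistered) tuner could be described by ``className`` and ``codeDirectory``; the class and directory names below are hypothetical:

.. code-block:: yaml

   tuner:
     className: my_tuner.MyTuner     # hypothetical qualified class name
     codeDirectory: ./my_tuner       # directory containing my_tuner.py
     classArgs:
       optimize_mode: maximize       # passed to MyTuner's constructor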
TrainingServiceConfig
^^^^^^^^^^^^^^^^^^^^^

One of the following:

- `LocalConfig`_
- `RemoteConfig`_
- `OpenpaiConfig`_
- `AmlConfig`_
- `DlcConfig`_
- `HybridConfig`_

For `Kubeflow <../TrainingService/KubeflowMode.rst>`_, `FrameworkController <../TrainingService/FrameworkControllerMode.rst>`_, and `AdaptDL <../TrainingService/AdaptDLMode.rst>`_ training platforms, it is suggested to use the `v1 config schema <../Tutorial/ExperimentConfig.rst>`_ for now.
LocalConfig
-----------

Detailed usage can be found `here <../TrainingService/LocalMode.rst>`__.

platform
""""""""

Constant string ``"local"``.

useActiveGpu
""""""""""""

Specify whether NNI should submit trials to GPUs occupied by other tasks.

type: ``Optional[bool]``

Must be set when `trialGpuNumber`_ is greater than zero.

The following processes can make a GPU "active":

- non-NNI CUDA programs
- graphical desktop
- trials submitted by other NNI instances, if you have more than one NNI experiment running at the same time
- other users' CUDA programs, if you are using a shared server

If you are using a graphical OS like Windows 10 or Ubuntu desktop, set this field to ``True``; otherwise, the GUI will prevent NNI from launching any trial.

When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously.

maxTrialNumberPerGpu
""""""""""""""""""""

Specify how many trials can share one GPU.

type: ``int``

default: ``1``

gpuIndices
""""""""""

Limit the GPUs visible to trial processes.

type: ``Optional[list[int] | str | int]``

If `trialGpuNumber`_ is less than the length of this value, only a subset will be visible to each trial.

This will be used as the ``CUDA_VISIBLE_DEVICES`` environment variable.
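Putting these fields together, a local training service section might look like the following sketch; the GPU indices and sharing limit are illustrative:

.. code-block:: yaml

   trainingService:
     platform: local
     useActiveGpu: false        # set to true on a machine with a graphical desktop
     maxTrialNumberPerGpu: 2    # allow two trials to share one GPU
     gpuIndices: [0, 1]         # only GPU 0 and GPU 1 are visible to trials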
RemoteConfig
------------

Detailed usage can be found `here <../TrainingService/RemoteMachineMode.rst>`__.

host
****

IP or hostname (domain name) of the machine.

type: ``str``

port
****

SSH service port.

type: ``int``

default: ``22``

user
****

Login user name.

type: ``str``

password
********

Login password of the remote machine.

type: ``Optional[str]``

If not specified, `sshKeyFile`_ will be used instead.

sshKeyFile
**********

`Path`_ to the SSH key file (identity file).

type: ``Optional[str]``

Only used when `password`_ is not specified.

sshPassphrase
*************

Passphrase of the SSH identity file.

type: ``Optional[str]``

useActiveGpu
************

Specify whether NNI should submit trials to GPUs occupied by other tasks.

type: ``bool``

default: ``False``

Must be set when `trialGpuNumber`_ is greater than zero.

The following processes can make a GPU "active":

- non-NNI CUDA programs
- graphical desktop
- trials submitted by other NNI instances, if you have more than one NNI experiment running at the same time
- other users' CUDA programs, if you are using a shared server

If your remote machine runs a graphical OS like Ubuntu desktop, set this field to ``True``; otherwise, the GUI will prevent NNI from launching any trial.

When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously.

maxTrialNumberPerGpu
********************

Specify how many trials can share one GPU.

type: ``int``

default: ``1``

gpuIndices
**********

Limit the GPUs visible to trial processes.

type: ``Optional[list[int] | str | int]``

If `trialGpuNumber`_ is less than the length of this value, only a subset will be visible to each trial.

This will be used as the ``CUDA_VISIBLE_DEVICES`` environment variable.

pythonPath
**********

Specify a Python environment.

type: ``Optional[str]``

This path will be inserted at the front of ``PATH``. Here are some examples:

- (linux) pythonPath: ``/opt/python3.7/bin``
- (windows) pythonPath: ``C:/Python37``

If you are working with Anaconda, there is some difference: on Windows, you also have to add ``../script`` and ``../Library/bin``, separated by ``;``.
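Putting it together, a remote training service section might look like the sketch below. The ``machineList`` wrapper and the host values shown here are assumptions for illustration; this excerpt only documents the per-machine fields:

.. code-block:: yaml

   trainingService:
     platform: remote
     machineList:                    # assumed wrapper: one entry per training machine
       - host: 192.0.2.10            # placeholder address
         port: 22
         user: alice
         sshKeyFile: ~/.ssh/id_rsa   # used because no password is given
         useActiveGpu: false
         maxTrialNumberPerGpu: 1
         pythonPath: /opt/python3.7/bin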
OpenpaiConfig
-------------

An example can be found `here <https://github.com/microsoft/pai/blob/master/docs/manual/cluster-user/examples/hello-world-job.yaml>`__.
AmlConfig
---------

Detailed usage can be found `here <../TrainingService/AMLMode.rst>`__.

platform
""""""""

Constant string ``"aml"``.

dockerImage
"""""""""""

Name and tag of the Docker image used to run the trials.

type: ``str``

default: ``"msranni/nni:latest"``

subscriptionId
""""""""""""""

Azure subscription ID.

type: ``str``

resourceGroup
"""""""""""""

Azure resource group name.

type: ``str``

workspaceName
"""""""""""""

Azure workspace name.

type: ``str``

computeTarget
"""""""""""""

AML compute cluster name.

type: ``str``
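A corresponding training service section could look like this sketch; the Azure resource names are placeholders you would replace with your own:

.. code-block:: yaml

   trainingService:
     platform: aml
     dockerImage: msranni/nni:latest
     subscriptionId: 00000000-0000-0000-0000-000000000000   # placeholder
     resourceGroup: my-resource-group
     workspaceName: my-aml-workspace
     computeTarget: my-gpu-cluster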
DlcConfig
---------

Detailed usage can be found `here <../TrainingService/DlcMode.rst>`__.

platform
""""""""

Constant string ``"dlc"``.

type
""""

Job spec type.

type: ``str``

default: ``"worker"``

image
"""""

Name and tag of the Docker image used to run the trials.

type: ``str``

jobType
"""""""

PAI-DLC training job type, ``"TFJob"`` or ``"PyTorchJob"``.

type: ``str``

podCount
""""""""

Pod count to run a single training job.

type: ``str``

ecsSpec
"""""""

Training server config spec string.

type: ``str``

region
""""""

The region where the PAI-DLC public cluster is located.

type: ``str``

nasDataSourceId
"""""""""""""""

The NAS data source ID configured on the PAI-DLC side.

type: ``str``

accessKeyId
"""""""""""

The ``accessKeyId`` of your cloud account.

type: ``str``

accessKeySecret
"""""""""""""""

The ``accessKeySecret`` of your cloud account.

type: ``str``

localStorageMountPoint
""""""""""""""""""""""

The mount point of the NAS on the PAI-DSW server.

type: ``str``

default: ``/home/admin/workspace/``

containerStorageMountPoint
""""""""""""""""""""""""""

The mount point of the NAS on the PAI-DLC side.

type: ``str``

default: ``/root/data/``
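For reference, a PAI-DLC training service section might be sketched as follows; the image, instance spec, region, and credential values are placeholders:

.. code-block:: yaml

   trainingService:
     platform: dlc
     type: worker
     image: registry.example.com/my-team/my-image:latest   # placeholder image
     jobType: TFJob
     podCount: 1
     ecsSpec: ecs.gn6i-c4g1.xlarge                          # placeholder instance spec
     region: cn-hangzhou                                    # placeholder region
     nasDataSourceId: my-nas-datasource-id
     accessKeyId: my-access-key-id
     accessKeySecret: my-access-key-secret
     localStorageMountPoint: /home/admin/workspace/
     containerStorageMountPoint: /root/data/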
HybridConfig
------------

Currently only `LocalConfig`_, `RemoteConfig`_, `OpenpaiConfig`_, and `AmlConfig`_ are supported. Detailed usage can be found `here <../TrainingService/HybridMode.rst>`__.

type: list of `TrainingServiceConfig`_
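Because the hybrid training service is simply a list of training service configurations, it can be written as a YAML sequence. The sketch below combines a local and a remote platform; machine details are placeholders and the ``machineList`` wrapper is the same assumption as in the remote example above:

.. code-block:: yaml

   trainingService:
     - platform: local
       useActiveGpu: false
     - platform: remote
       machineList:
         - host: 192.0.2.10
           user: alice
           sshKeyFile: ~/.ssh/id_rsa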
SharedStorageConfig
^^^^^^^^^^^^^^^^^^^

Detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__.

nfsConfig
---------

storageType
"""""""""""

Constant string ``"NFS"``.

localMountPoint
"""""""""""""""

The path where the storage has been mounted, or will be mounted, on the local machine.

type: ``str``

If the path does not exist, it will be created automatically. It is recommended to use an absolute path, e.g. ``/tmp/nni-shared-storage``.

remoteMountPoint
""""""""""""""""

The path where the storage will be mounted on the remote machine.

type: ``str``

If the path does not exist, it will be created automatically. It is recommended to use a relative path, e.g. ``./nni-shared-storage``.

localMounted
""""""""""""

Specify how the shared storage is mounted on the local machine.

type: ``str``

``usermount`` means the user has already mounted this storage on ``localMountPoint``. ``nnimount`` means NNI will try to mount this storage on ``localMountPoint``. ``nomount`` means the storage will not be mounted on the local machine (partial storage support will be added in the future).

nfsServer
"""""""""

NFS server host.

type: ``str``

exportedDirectory
"""""""""""""""""

Exported directory of the NFS server, detailed `here <https://www.ibm.com/docs/en/aix/7.2?topic=system-nfs-exporting-mounting>`_.

type: ``str``
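A shared storage section using NFS might therefore look like the following sketch; the server address and exported directory are placeholders:

.. code-block:: yaml

   sharedStorage:
     storageType: NFS
     localMountPoint: /tmp/nni-shared-storage
     remoteMountPoint: ./nni-shared-storage
     localMounted: nnimount          # let NNI mount the storage locally
     nfsServer: 192.0.2.20           # placeholder NFS server address
     exportedDirectory: /exported/nni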
azureBlobConfig
---------------

storageType
"""""""""""

Constant string ``"AzureBlob"``.

localMountPoint
"""""""""""""""

The path where the storage has been mounted, or will be mounted, on the local machine.

type: ``str``

If the path does not exist, it will be created automatically. It is recommended to use an absolute path, e.g. ``/tmp/nni-shared-storage``.

remoteMountPoint
""""""""""""""""

The path where the storage will be mounted on the remote machine.

type: ``str``

If the path does not exist, it will be created automatically. It is recommended to use a relative path, e.g. ``./nni-shared-storage``.

Note that the directory must be empty when using AzureBlob.

localMounted
""""""""""""

Specify how the shared storage is mounted on the local machine.

type: ``str``

``usermount`` means the user has already mounted this storage on ``localMountPoint``. ``nnimount`` means NNI will try to mount this storage on ``localMountPoint``. ``nomount`` means the storage will not be mounted on the local machine (partial storage support will be added in the future).

storageAccountName
""""""""""""""""""

Azure storage account name.

type: ``str``