Commit deef0c42 authored by liuzhe-lz, committed by GitHub

Update docs to v2 config (#3966)

parent 03a02232
@@ -46,39 +46,28 @@ Use ``examples/trials/mnist-pytorch`` as an example. The NNI config YAML file's

.. code-block:: yaml

   searchSpaceFile: search_space.json
   trialCommand: python3 mnist.py
   trialConcurrency: 1
   maxTrialNumber: 10
   tuner:
     name: TPE
     classArgs:
       optimize_mode: maximize
   trainingService:
     platform: aml
     dockerImage: msranni/nni
     subscriptionId: ${your subscription ID}
     resourceGroup: ${your resource group}
     workspaceName: ${your workspace name}
     computeTarget: ${your compute target}
Note: You should set ``platform: aml`` in the NNI config YAML file if you want to start an experiment in AML mode.

Compared with `LocalMode <LocalMode.rst>`__\ , the training service configuration in AML mode has these additional keys:

* dockerImage

  * Required key. The Docker image name used in the job. NNI provides the image ``msranni/nni`` for running AML jobs.
@@ -103,7 +92,7 @@ amlConfig:

  * Required key. The compute cluster name you want to use in your AML workspace (`reference <https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target>`__). See Step 6.

* maxTrialNumberPerGpu

  * Optional key, default 1. Specifies the maximum number of concurrent trials on one GPU device.
......
@@ -8,7 +8,7 @@ In this tutorial, we will use the example in [nni/examples/trials/mnist-pytorch]

Before starts

You have an implementation of an MNIST classifier using convolutional layers; the Python code is similar to ``mnist.py``.

..
@@ -37,15 +37,13 @@ to get hyper-parameters' values assigned by tuner. ``tuner_params`` is an object

1.3 Report NNI results: Use the API ``nni.report_intermediate_result(accuracy)`` to send ``accuracy`` to the assessor, and the API ``nni.report_final_result(accuracy)`` to send ``accuracy`` to the tuner. A minimal sketch of these calls is shown after the note below.

**NOTE**\ :

.. code-block:: bash

   accuracy - The `accuracy` could be any Python object, but if you use NNI built-in tuner/assessor, `accuracy` should be a numerical variable (e.g. float, int).
   tuner - The tuner will generate the next parameters/architecture based on the exploration history (final results of all trials).
   assessor - The assessor will decide which trial should stop early based on the performance history of the trial (intermediate results of one trial).

..
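
To make these calls concrete, here is a minimal, hedged sketch of a trial script. The real ``mnist.py`` trains a CNN; the hyper-parameter names and the fake accuracy below are illustrative only.

.. code-block:: python

   import nni

   def run_trial(params):
       """Train and evaluate a model with the hyper-parameters chosen by the tuner."""
       best_accuracy = 0.0
       for epoch in range(10):
           # ... train one epoch and evaluate; here we fake an accuracy value ...
           accuracy = min(1.0, params['lr'] * (epoch + 1))
           nni.report_intermediate_result(accuracy)   # seen by the assessor
           best_accuracy = max(best_accuracy, accuracy)
       nni.report_final_result(best_accuracy)          # seen by the tuner

   if __name__ == '__main__':
       # Hyper-parameters assigned by the tuner, sampled from search_space.json.
       tuner_params = nni.get_next_parameter()
       params = {'lr': 0.001, 'batch_size': 32}        # defaults, overridden by the tuner
       params.update(tuner_params)
       run_trial(params)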
@@ -71,16 +69,6 @@ Refer to `define search space <../Tutorial/SearchSpaceSpec.rst>`__ to learn more

..
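
For reference, a search space file for the mnist-pytorch example might look roughly like the following. This is a hedged sketch: the parameter names and ranges are illustrative, not the exact content of the shipped ``search_space.json``.

.. code-block:: python

   import json

   # Illustrative search space; each entry maps a hyper-parameter to a sampling
   # strategy ("_type") and its candidate values or range ("_value").
   search_space = {
       "lr": {"_type": "loguniform", "_value": [0.0001, 0.1]},
       "momentum": {"_type": "uniform", "_value": [0, 1]},
       "batch_size": {"_type": "choice", "_value": [16, 32, 64, 128]},
   }

   with open('search_space.json', 'w') as f:
       json.dump(search_space, f, indent=4)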
To run an experiment in NNI, you only need:
@@ -96,11 +84,11 @@ To run an experiment in NNI, you only needed:

You can download the NNI source code; a set of examples can be found in ``nni/examples``. Run ``ls nni/examples/trials`` to see all the trial examples.

Let's use a simple trial example, e.g. mnist, provided by NNI. After you clone the NNI source, the examples are under ``~/nni/examples``; run ``ls ~/nni/examples/trials`` to see all the trial examples. You can simply execute the following command to run the NNI mnist example:

.. code-block:: bash

   python ~/nni/examples/trials/mnist-pytorch/mnist.py

This command will be filled in the YAML config file below. Please refer to `here <../TrialExample/Trials.rst>`__ for how to write your own trial.
@@ -110,53 +98,42 @@ This command will be filled in the YAML configure file below. Please refer to `h

.. code-block:: bash

   tuner:
     name: TPE
     classArgs:
       optimize_mode: maximize

*name* specifies a tuner in NNI, *classArgs* are the arguments passed to the tuner (the spec of built-in tuners can be found `here <../Tuner/BuiltinTuner.rst>`__\ ), and *optimize_mode* indicates whether you want to maximize or minimize your trial's result.

**Prepare config file**\ : Since you already know which trial code you are going to run and which tuner you are going to use, it is time to prepare the YAML config file. NNI provides a demo config file for each trial example; run ``cat ~/nni/examples/trials/mnist-pytorch/config.yml`` to see it. Its content is basically as shown below:
.. code-block:: yaml

   experimentName: local training service example

   searchSpaceFile: ~/nni/examples/trials/mnist-pytorch/search_space.json
   trialCommand: python3 mnist.py
   trialCodeDirectory: ~/nni/examples/trials/mnist-pytorch
   trialGpuNumber: 0
   trialConcurrency: 1
   maxExperimentDuration: 3h
   maxTrialNumber: 10

   trainingService:
     platform: local

   tuner:
     name: TPE
     classArgs:
       optimize_mode: maximize

Here *trialCommand* is the command to run a trial and *trialCodeDirectory* is where the trial code is; the command will be executed in this directory. We should also specify how many GPUs a trial requires via *trialGpuNumber*.
With all these steps done, we can run the experiment with the following command:

.. code-block:: bash

   nnictl create --config ~/nni/examples/trials/mnist-pytorch/config.yml

You can refer to `here <../Tutorial/Nnictl.rst>`__ for more usage of the *nnictl* command line tool.
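
As an aside, recent NNI releases can also launch an experiment from Python instead of a YAML file. The snippet below is only a rough, hedged sketch based on the ``nni.experiment`` API; the constructor signature and attribute names may differ between NNI versions, so treat it as an assumption and check the Python API reference for your release.

.. code-block:: python

   from nni.experiment import Experiment

   # Assumed Python counterpart of the YAML config above; field names may vary by NNI version.
   experiment = Experiment('local')
   experiment.config.experiment_name = 'local training service example'
   experiment.config.trial_command = 'python3 mnist.py'
   experiment.config.trial_code_directory = '~/nni/examples/trials/mnist-pytorch'
   experiment.config.search_space_file = '~/nni/examples/trials/mnist-pytorch/search_space.json'
   experiment.config.trial_concurrency = 1
   experiment.config.max_trial_number = 10
   experiment.config.tuner.name = 'TPE'
   experiment.config.tuner.class_args = {'optimize_mode': 'maximize'}

   # Start the experiment and serve the WebUI on port 8080.
   experiment.run(8080)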
@@ -169,29 +146,29 @@ The experiment has been running now. Other than *nnictl*\ , NNI also provides We
Using multiple local GPUs to speed up search
--------------------------------------------

The following steps assume that you have 4 NVIDIA GPUs installed locally and PyTorch with CUDA support. The demo enables 4 concurrent trial jobs, and each trial job uses 1 GPU.

**Prepare config file**\ : NNI provides a demo configuration file for the setting above; run ``cat ~/nni/examples/trials/mnist-pytorch/config_detailed.yml`` to see it. ``trialConcurrency`` and ``trialGpuNumber`` are different from the basic config file:
.. code-block:: bash

   ...
   trialGpuNumber: 1
   trialConcurrency: 4
   ...
   trainingService:
     platform: local
     useActiveGpu: false # set to "true" if you are using graphical OS like Windows 10 and Ubuntu desktop
We can run the experiment with the following command:

.. code-block:: bash

   nnictl create --config ~/nni/examples/trials/mnist-pytorch/config_detailed.yml

You can use the *nnictl* command line tool or the WebUI to trace the training progress. The *nvidia-smi* command line tool can also help you monitor GPU usage during training.
@@ -50,85 +50,84 @@ You could use the following configuration in your NNI's config file:

.. code-block:: yaml

   localStorageMountPoint: /local/mnt

**Step 4. Get OpenPAI's storage config name and localStorageMountPoint**

The ``Team share storage`` field is the storage configuration used to specify storage values in OpenPAI. You can get the ``storageConfigName`` and ``containerStorageMountPoint`` fields from ``Team share storage``\ , for example:

.. code-block:: yaml

   storageConfigName: confignfs-data
   containerStorageMountPoint: /mnt/confignfs-data
Run an experiment
-----------------

Use ``examples/trials/mnist-pytorch`` as an example. The NNI config YAML file's content is like:
.. code-block:: yaml

   searchSpaceFile: search_space.json
   trialCommand: python3 mnist.py
   trialGpuNumber: 0
   trialConcurrency: 1
   maxTrialNumber: 10

   tuner:
     name: TPE
     classArgs:
       optimize_mode: maximize

   trainingService:
     platform: openpai
     host: http://123.123.123.123
     username: ${your user name}
     token: ${your token}
     dockerImage: msranni/nni
     trialCpuNumber: 1
     trialMemorySize: 8GB
     storageConfigName: ${your storage config name}
     localStorageMountPoint: ${NFS mount point on local machine}
     containerStorageMountPoint: ${NFS mount point inside Docker container}
Note: You should set ``platform: openpai`` in the NNI config YAML file if you want to start an experiment in OpenPAI mode. The host field in the configuration file is PAI's job submission page URI, like ``10.10.5.1``\ ; the default protocol in NNI is HTTPS. If your PAI cluster has disabled HTTPS, please use the URI in the ``http://10.10.5.1`` format.

OpenPAI configurations
^^^^^^^^^^^^^^^^^^^^^^

Compared with `LocalMode <LocalMode.rst>`__ and `RemoteMachineMode <RemoteMachineMode.rst>`__\ , the ``trainingService`` configuration in OpenPAI mode has the following additional keys:
*
  username

  Required key. User name of the OpenPAI platform.

*
  token

  Required key. Authentication key of the OpenPAI platform.

*
  host

  Required key. The host of the OpenPAI platform. It is OpenPAI's job submission page URI, like ``10.10.5.1``\ ; the default protocol in NNI is HTTPS. If your OpenPAI cluster has disabled HTTPS, please use the URI in the ``http://10.10.5.1`` format.
*
  trialCpuNumber

  Optional key. Should be a positive number based on your trial program's CPU requirement. If it is not set in the trial configuration, it should be set in the config specified in the ``openpaiConfig`` or ``openpaiConfigFile`` field.

*
  trialMemorySize

  Optional key. Should be in a format like ``2gb``\ , based on your trial program's memory requirement. If it is not set in the trial configuration, it should be set in the config specified in the ``openpaiConfig`` or ``openpaiConfigFile`` field.

*
  dockerImage

  Optional key. In OpenPAI mode, your trial program will be scheduled by OpenPAI to run in a `Docker container <https://www.docker.com/>`__. This key is used to specify the Docker image used to create the container in which your trial will run.

  We have already built a Docker image :githublink:`nnimsra/nni <deployment/docker/Dockerfile>`. You can either use this image directly in your config file, or build your own image based on it. If it is not set in the trial configuration, it should be set in the config specified in the ``openpaiConfig`` or ``openpaiConfigFile`` field.
.. cannot find :githublink:`nnimsra/nni <deployment/docker/Dockerfile>`
@@ -138,30 +137,31 @@ Compared with `LocalMode <LocalMode.rst>`__ and `RemoteMachineMode <RemoteMachin

  Optional key. Set the virtualCluster of OpenPAI. If omitted, the job will run on the default virtual cluster.
*
  localStorageMountPoint

  Required key. Set the mount path on the machine where you run nnictl.

*
  containerStorageMountPoint

  Required key. Set the mount path in the container used in OpenPAI.

*
  storageConfigName

  Optional key. Set the storage name used in OpenPAI. If it is not set in the trial configuration, it should be set in the config specified in the ``openpaiConfig`` or ``openpaiConfigFile`` field.

*
  openpaiConfigFile

  Optional key. Set the file path of the OpenPAI job configuration; the file is in YAML format.

  If users set ``openpaiConfigFile`` in NNI's configuration file, there is no need to specify the fields ``storageConfigName``\ , ``virtualCluster``\ , ``dockerImage``\ , ``trialCpuNumber``\ , ``trialGpuNumber``\ , ``trialMemorySize`` in the configuration. These fields will use the values from the config file specified by ``openpaiConfigFile``.

*
  openpaiConfig

  Optional key. Similar to ``openpaiConfigFile``\ , but instead of referencing an external file, you embed the content directly into NNI's config YAML.
Note:

@@ -172,32 +172,6 @@ Compared with `LocalMode <LocalMode.rst>`__ and `RemoteMachineMode <RemoteMachin

#.
   If users set multiple taskRoles in OpenPAI's configuration file, NNI will wrap all of these taskRoles and start multiple tasks in one trial job. Users should ensure that only one taskRole reports metrics to NNI, otherwise there might be conflict errors.
Once you finish filling in the NNI experiment config file and save it (for example, as exp_pai.yml), run the following command:
.. code-block:: bash
......
@@ -95,94 +95,45 @@ e.g. there are three machines, which can be logged in with username and password

Install and run NNI on one of those three machines or another machine, which has network access to them.

Use ``examples/trials/mnist-pytorch`` as the example. Below is the content of ``examples/trials/mnist-pytorch/config_remote.yml``\ :
.. code-block:: yaml

   searchSpaceFile: search_space.json
   trialCommand: python3 mnist.py
   trialCodeDirectory: . # default value, can be omitted
   trialGpuNumber: 0
   trialConcurrency: 4
   maxTrialNumber: 20

   tuner:
     name: TPE
     classArgs:
       optimize_mode: maximize

   trainingService:
     platform: remote
     machineList:
       - host: 192.0.2.1
         user: alice
         ssh_key_file: ~/.ssh/id_rsa
       - host: 192.0.2.2
         port: 10022
         user: bob
         password: bob123
         pythonPath: /usr/bin

Files in ``trialCodeDirectory`` will be uploaded to remote machines automatically. You can run the command below on Windows, Linux, or macOS to spawn trials on remote Linux machines:
.. code-block:: bash

   nnictl create --config examples/trials/mnist-pytorch/config_remote.yml
Configure python environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, commands and scripts are executed in the default environment on the remote machine. If there are multiple Python virtual environments on your remote machine and you want to run the experiment in a specific environment, use **pythonPath** to specify a Python environment on that remote machine.

For example, with anaconda you can specify:
.. code-block:: yaml

   pythonPath: /home/bob/.conda/envs/ENV-NAME/bin
@@ -190,36 +190,27 @@ More support variable type you could reference `here <../Tutorial/SearchSpaceSpe

In the config file, you can configure the following settings:

* Experiment setting: ``trialConcurrency``\ , ``trialGpuNumber``\ , etc.
* Platform setting: ``trainingService``\ , etc.
* Path setting: ``searchSpaceFile``\ , ``trialCodeDirectory``\ , etc.
* Algorithm setting: the selected ``tuner`` algorithm, the tuner's ``optimize_mode``\ , etc.

A config.yml example is as follows:
.. code-block:: yaml

   experimentName: auto-gbdt example

   searchSpaceFile: search_space.json
   trialCommand: python3 main.py
   trialGpuNumber: 0
   trialConcurrency: 1
   maxTrialNumber: 10

   trainingService:
     platform: local

   tuner:
     name: TPE #choice: TPE, Random, Anneal, Evolution, BatchTuner, etc
     classArgs:
       optimize_mode: minimize
Run this experiment with the following command:
......
@@ -59,7 +59,7 @@ Benchmark code should receive a configuration from NNI manager, and report the c

Config file
^^^^^^^^^^^

One can start an NNI experiment with a config file. A config file for NNI is a ``yaml`` file usually including experiment settings (\ ``trialConcurrency``\ , ``trialGpuNumber``\ , etc.), platform settings (\ ``trainingService``\ ), path settings (\ ``searchSpaceFile``\ , ``trialCodeDirectory``\ , etc.) and tuner settings (\ ``tuner``\ , ``tuner optimize_mode``\ , etc.). Please refer to `here <../Tutorial/QuickStart.rst>`__ for more information.

Here is an example of tuning RocksDB with the SMAC algorithm:
......
@@ -67,27 +67,26 @@ Modify ``nni/examples/trials/ga_squad/config.yml``\ , here is the default config

.. code-block:: yaml

   experimentName: ga-squad example
   trialCommand: python3 trial.py
   trialCodeDirectory: ~/nni/examples/trials/ga_squad
   trialGpuNumber: 0
   trialConcurrency: 1
   maxTrialNumber: 10
   maxExperimentDuration: 1h

   searchSpace: {} # hard-coded in tuner

   tuner:
     className: customer_tuner.CustomerTuner
     codeDirectory: ~/nni/examples/tuners/ga_customer_tuner
     classArgs:
       optimize_mode: maximize

   trainingService:
     platform: local
If you want to use a GPU to perform the architecture search, change ``trialGpuNumber`` from ``0`` to ``1``. You need to increase ``maxTrialNumber`` and ``maxExperimentDuration``\ , according to how long you want to wait for the search result.
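
The config above points NNI at a customized tuner through ``className`` and ``codeDirectory``. As a rough, hedged sketch (not the actual ``CustomerTuner`` shipped with the example), a customized tuner implements the ``nni.tuner.Tuner`` interface along these lines:

.. code-block:: python

   from nni.tuner import Tuner


   class MyGATuner(Tuner):
       """Illustrative skeleton of a customized tuner; the real CustomerTuner
       evolves a population of network graphs instead of plain dicts."""

       def __init__(self, optimize_mode='maximize'):
           self.optimize_mode = optimize_mode
           self.history = {}

       def update_search_space(self, search_space):
           # ga_squad hard-codes its search space inside the tuner, so nothing to do here.
           pass

       def generate_parameters(self, parameter_id, **kwargs):
           # Return the next configuration to try, e.g. a mutated individual.
           params = {'layer_num': 2, 'hidden_size': 64}  # placeholder values
           self.history[parameter_id] = params
           return params

       def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
           # `value` is what the trial reported via nni.report_final_result().
           print(f'trial {parameter_id} finished with result {value}')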
2.3 submit this job
^^^^^^^^^^^^^^^^^^^

@@ -96,73 +95,15 @@ In the **trial** part, if you want to use GPU to perform the architecture search

nnictl create --config ~/nni/examples/trials/ga_squad/config.yml
3. Technical details about the trial
------------------------------------
3.1 How does it work
^^^^^^^^^^^^^^^^^^^^

The evolution-algorithm-based architecture for question answering has two different parts, just like any other example: the trial and the tuner.

3.2 The trial
^^^^^^^^^^^^^

The trial has a lot of different files, functions and classes. Here we will only give a brief introduction to most of those files:
@@ -224,7 +165,7 @@ performs topological sorting on the internal graph representation, and the code

performs the actual conversion that maps each layer to a part of the TensorFlow computation graph.
3.3 The tuner
^^^^^^^^^^^^^

The tuner is much simpler than the trial. They actually share the same ``graph.py``. Besides, the tuner has a ``customer_tuner.py``\ , the most important class in which is ``CustomerTuner``\ :
@@ -272,7 +213,7 @@ As we can see, the overloaded method ``generate_parameters`` implements a pretty

controls the mutation process. It will always take two random individuals in the population, keeping and mutating only the one with the better result.
3.4 Model configuration format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is an example of the model configuration, which is passed from the tuner to the trial in the architecture search procedure.
......