===========================
Experiment Config Reference
===========================

A config file is required to create an experiment. This document describes the rules for writing a config file and provides some examples.

.. Note::

    1. This document lists field names in ``camelCase``. When these fields are used through the NNI Python APIs (e.g., ``nni.experiment``), the field names should be converted to ``snake_case``.

    2. In this document, field types are written as `Python type hints <https://docs.python.org/3.10/library/typing.html>`_. Therefore JSON objects are called ``dict`` and arrays are called ``list``.

    .. _path: 

    3. Some fields take a path to a file or directory. Unless otherwise noted, both absolute and relative paths are supported, and ``~`` will be expanded to the home directory.

       - When written in the YAML file, relative paths are relative to the directory containing that file.
       - When assigned in Python code, relative paths are relative to the current working directory.
       - All relative paths are converted to absolute paths when loading a YAML file into a Python class, and when saving a Python class to a YAML file.

    4. Setting a field to ``None`` or ``null`` is equivalent to not setting the field.
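
As a sketch of rules 3 and 4, the fragment below assumes a hypothetical config file stored at ``/home/alice/exp/config.yaml``. The file location and the field values are illustrative only.

.. code-block:: yaml

    # Hypothetical location of this file: /home/alice/exp/config.yaml
    searchSpaceFile: search_space.json             # relative to this file: /home/alice/exp/search_space.json
    trialCodeDirectory: ../trial_code              # relative to this file: /home/alice/trial_code
    experimentWorkingDirectory: ~/nni-experiments  # "~" is expanded to the home directory
    maxTrialNumber: null                           # equivalent to omitting maxTrialNumber

If the same relative paths were assigned from Python code instead, they would be resolved against the current working directory.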

Examples
========

Local Mode
^^^^^^^^^^

.. code-block:: yaml

    experimentName: MNIST
    searchSpaceFile: search_space.json
    trialCommand: python mnist.py
    trialCodeDirectory: .
    trialGpuNumber: 1
    trialConcurrency: 2
    maxExperimentDuration: 24h
    maxTrialNumber: 100
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: local
      useActiveGpu: True

Local Mode (Inline Search Space)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    searchSpace:
      batch_size:
        _type: choice
        _value: [16, 32, 64]
      learning_rate:
        _type: loguniform
        _value: [0.0001, 0.1]
    trialCommand: python mnist.py
    trialGpuNumber: 1
    trialConcurrency: 2
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: local
      useActiveGpu: True

Remote Mode
^^^^^^^^^^^

.. code-block:: yaml

    experimentName: MNIST
    searchSpaceFile: search_space.json
    trialCommand: python mnist.py
    trialCodeDirectory: .
    trialGpuNumber: 1
    trialConcurrency: 2
    maxExperimentDuration: 24h
    maxTrialNumber: 100
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: remote
      machineList:
        - host: 11.22.33.44
          user: alice
          password: xxxxx
        - host: my.domain.com
          user: bob
          sshKeyFile: ~/.ssh/id_rsa

Reference
=========

ExperimentConfig
^^^^^^^^^^^^^^^^

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description
    
    * - experimentName
      - ``str``, optional
      - Mnemonic name of the experiment, which will be shown in WebUI and nnictl.

    * - searchSpaceFile
      - ``str``, optional
      - Path_ to the JSON file containing the search space.
        The search space format is determined by the tuner. The common format for built-in tuners is documented :doc:`here </hpo/search_space>`.
        Mutually exclusive with ``searchSpace``.

    * - searchSpace
      - ``JSON``, optional
      - Search space object.
        The format is determined by the tuner. The common format for built-in tuners is documented :doc:`here </hpo/search_space>`.
        Note that ``None`` means "no such field", so an empty search space should be written as ``{}``.
        Mutually exclusive with ``searchSpaceFile``.

    * - trialCommand
      - ``str``
      - Command to launch a trial.
        The command will be executed in bash on Linux and macOS, and in PowerShell on Windows.
        Use ``python3`` on Linux and macOS, and ``python`` on Windows.

    * - trialCodeDirectory
      - ``str``, optional
      - Default: ``"."``. `Path`_ to the directory containing trial source files.
        All files in this directory will be sent to the training machine, unless they are excluded by the ``.nniignore`` file.
        (See :ref:`nniignore <nniignore>` for details.)

    * - trialConcurrency
      - ``int``
      - Specify how many trials should be run concurrently.
        The real concurrency also depends on hardware resources and may be less than this value.

    * - trialGpuNumber
      - ``int`` or ``None``, optional
      - Default: ``None``. This field might have slightly different meanings for various training services,
        especially when set to ``0`` or ``None``.
        See the :doc:`training service documentation </experiment/training_service/overview>` for details.

        In local mode, setting the field to ``0`` will prevent trials from accessing GPUs (by setting ``CUDA_VISIBLE_DEVICES`` to an empty value).
        When set to ``None``, trials will be created and scheduled as if they did not use GPUs,
        but they can still use all GPU resources if they want.

    * - maxExperimentDuration
      - ``str``, optional
      - Limit the duration of this experiment if specified. The duration is unlimited if not set.
        Format: ``number + s|m|h|d``.
        Examples: ``"10m"``, ``"0.5h"``.
        When time runs out, the experiment will stop creating trials but continue to serve WebUI.

    * - maxTrialNumber
      - ``int``, optional
      - Limit the number of trials to create if specified. The trial number is unlimited if not set.
        When the budget runs out, the experiment will stop creating trials but continue to serve WebUI.

    * - maxTrialDuration
      - ``str``, optional
      - Limit the duration of each trial job if specified. The duration is unlimited if not set.
        Format: ``number + s|m|h|d``.
        Examples: ``"10m"``, ``"0.5h"``.
        When time runs out, the current trial job will stop.

    * - nniManagerIp
      - ``str``, optional
      - Default: default connection chosen by system. IP of the current machine, used by training machines to access NNI manager. Not used in local mode.
        Except in local mode, it is highly recommended to set this field manually.

    * - useAnnotation
      - ``bool``, optional
      - Default: ``False``. Enable :doc:`annotation </hpo/nni_annotation>`.
        When using annotation, ``searchSpace`` and ``searchSpaceFile`` should not be specified manually.

    * - debug
      - ``bool``, optional
      - Default: ``False``. Enable debug mode.
        When enabled, logging will be more verbose and some internal validation will be loosened.

    * - logLevel
      - ``str``, optional
      - Default: ``"info"`` or ``"debug"``, depending on the ``debug`` option. Set the log level of the whole system.
        Values: ``"trace"``, ``"debug"``, ``"info"``, ``"warning"``, ``"error"``, ``"fatal"``.
        When debug mode is enabled, the log level defaults to ``"debug"``; otherwise, it defaults to ``"info"``.
        Most modules of NNI will be affected by this value, including NNI manager, tuner, training service, etc.
        The exception is trial, whose logging level is directly managed by trial code.
        For Python modules, ``"trace"`` acts as logging level 0 and ``"fatal"`` acts as ``logging.CRITICAL``.

    * - experimentWorkingDirectory
      - ``str``, optional
      - Default: ``~/nni-experiments``.
        Specify the :ref:`directory <path>` in which to store logs, checkpoints, metadata, and other run-time files.
        NNI will create a subdirectory named after the experiment ID, so it is safe to use the same directory for multiple experiments.

    * - tunerGpuIndices
      - ``list[int]`` or ``str`` or ``int``, optional
      - Limit the GPUs visible to tuner, assessor, and advisor.
        This will be the ``CUDA_VISIBLE_DEVICES`` environment variable of the tuner process.
        Because tuner, assessor, and advisor run in the same process, this option will affect them all.

    * - tuner
      - ``AlgorithmConfig``, optional
      - Specify the tuner.
        The built-in tuners can be found :doc:`here </hpo/tuners>` and you can follow :doc:`this tutorial </hpo/custom_algorithm>` to customize a new tuner.

    * - assessor
      - ``AlgorithmConfig``, optional
      - Specify the assessor.
        The built-in assessors can be found :doc:`here </hpo/assessors>` and you can follow :doc:`this tutorial </hpo/custom_algorithm>` to customize a new assessor.

    * - advisor
      - ``AlgorithmConfig``, optional
      - Specify the advisor.
        NNI provides two built-in advisors: :class:`BOHB <nni.algorithms.hpo.bohb_advisor.BOHB>` and :class:`Hyperband <nni.algorithms.hpo.hyperband_advisor.Hyperband>`.

    * - trainingService
      - ``TrainingServiceConfig``
      - Specify the :doc:`training service </experiment/training_service/overview>`.

    * - sharedStorage
      - ``SharedStorageConfig``, optional
      - Configure the shared storage. Detailed usage can be found :doc:`here </experiment/training_service/shared_storage>`.
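
The following sketch combines several of the optional fields above in a local-mode config. All values are illustrative, and the ``Medianstop`` assessor with its ``optimize_mode`` argument is an assumption about the built-in assessors; check the assessor documentation before relying on it.

.. code-block:: yaml

    experimentName: MNIST
    searchSpaceFile: search_space.json
    trialCommand: python3 mnist.py        # "python" on Windows
    trialCodeDirectory: .
    trialConcurrency: 2
    trialGpuNumber: 0                     # local mode: trials cannot access GPUs
    maxExperimentDuration: 0.5h           # format: number + s|m|h|d
    maxTrialDuration: 10m
    maxTrialNumber: 50
    logLevel: warning
    experimentWorkingDirectory: ~/nni-experiments
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    assessor:
      name: Medianstop                    # assumed built-in assessor name
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: local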

AlgorithmConfig
^^^^^^^^^^^^^^^

``AlgorithmConfig`` describes a tuner / assessor / advisor algorithm.

For customized algorithms, there are two ways to describe them:

1. :doc:`Register the algorithm </hpo/custom_algorithm_installation>` to use it like a built-in one. (preferred)

2. Specify the code directory and class name directly.

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description
    
    * - name
      - ``str`` or ``None``, optional
      - Default: ``None``. Name of the built-in or registered algorithm.
        ``str`` for built-in and registered algorithms, ``None`` for other customized algorithms.

    * - className
      - ``str`` or ``None``, optional
      - Default: ``None``. Qualified class name of a customized algorithm that is not registered.
        ``None`` for built-in and registered algorithms, ``str`` for other customized algorithms.
        Example: ``"my_tuner.MyTuner"``

    * - codeDirectory
      - ``str`` or ``None``, optional
      - Default: ``None``. Path_ to the directory containing the customized algorithm class.
        ``None`` for built-in and registered algorithms, ``str`` for other customized algorithms.

    * - classArgs
      - ``dict[str, Any]``, optional
      - Keyword arguments passed to the algorithm class's constructor.
        See the algorithm's documentation for supported values.
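
The two ways of describing an algorithm can be sketched as follows. ``TPE`` is taken from the examples earlier in this document. A built-in or registered algorithm is referenced by ``name``:

.. code-block:: yaml

    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize

A customized algorithm that is not registered is referenced by ``className`` and ``codeDirectory`` instead. In this fragment, ``my_tuner.MyTuner``, the ``./tuners`` directory, and the ``optimize_mode`` argument are hypothetical:

.. code-block:: yaml

    tuner:
      className: my_tuner.MyTuner   # hypothetical class MyTuner in my_tuner.py
      codeDirectory: ./tuners       # hypothetical directory containing my_tuner.py
      classArgs:
        optimize_mode: maximize     # passed as a keyword argument to the constructor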

TrainingServiceConfig
^^^^^^^^^^^^^^^^^^^^^

One of the following:

- `LocalConfig`_
- `RemoteConfig`_
- `OpenpaiConfig`_
- `AmlConfig`_
- `DlcConfig`_
- `HybridConfig`_
- :doc:`FrameworkControllerConfig </experiment/training_service/frameworkcontroller>`
- :doc:`KubeflowConfig </experiment/training_service/kubeflow>`

.. _reference-local-config-label:

LocalConfig
-----------

An introduction to the local training service can be found in :doc:`/experiment/training_service/local`.

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - platform
      - ``"local"``
      -
    
    * - useActiveGpu
      - ``bool``, optional
      - Default: ``False``. Specify whether NNI should submit trials to GPUs occupied by other tasks.
        Must be set when ``trialGpuNumber`` is greater than zero.
        The following processes can make a GPU "active":

          - non-NNI CUDA programs
          - graphical desktop
          - trials submitted by other NNI instances, if you have more than one NNI experiment running at the same time
          - other users' CUDA programs, if you are using a shared server

        If you are using a graphical OS like Windows 10 or Ubuntu desktop, set this field to ``True``; otherwise, the GUI will prevent NNI from launching any trial.
        When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously.

    * - maxTrialNumberPerGpu
      - ``int``, optional
      - Default: ``1``. Specify how many trials can share one GPU.

    * - gpuIndices
      - ``list[int]`` or ``str`` or ``int``, optional
      - Limit the GPUs visible to trial processes.
        If ``trialGpuNumber`` is less than the length of this value, only a subset will be visible to each trial.
        This will be used as the ``CUDA_VISIBLE_DEVICES`` environment variable.
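
A minimal sketch of the GPU-related fields in local mode, assuming a machine with at least two GPUs. The values are illustrative only.

.. code-block:: yaml

    trialGpuNumber: 1            # each trial requests one GPU
    trialConcurrency: 4
    trainingService:
      platform: local
      useActiveGpu: false        # do not submit trials to GPUs occupied by other tasks
      maxTrialNumberPerGpu: 2    # up to two trials may share one GPU
      gpuIndices: [0, 1]         # only GPU 0 and GPU 1 are made visible to trials

Because ``trialGpuNumber`` is smaller than the length of ``gpuIndices``, each trial sees only a subset of the listed GPUs.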

.. _reference-remote-config-label:

RemoteConfig
------------

Detailed usage can be found :doc:`/experiment/training_service/remote`.

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - platform
      - ``"remote"``
      -

    * - machineList
      - ``List[RemoteMachineConfig]``
      - List of training machines.

    * - reuseMode
      - ``bool``, optional
      - Default: ``True``. Enable :ref:`reuse mode <training-service-reuse>`.

RemoteMachineConfig
"""""""""""""""""""

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - host
      - ``str``
      - IP or hostname (domain name) of the machine.

    * - port
      - ``int``, optional
      - Default: ``22``. SSH service port.

    * - user
      - ``str``
      - Login user name.

    * - password
      - ``str``, optional
      - Login password. If not specified, ``sshKeyFile`` will be used instead.
    
    * - sshKeyFile
      - ``str``, optional
      - `Path`_ to the SSH key file (identity file).
        Only used when ``password`` is not specified.

    * - sshPassphrase
      - ``str``, optional
      - Passphrase of the SSH identity file.

    * - useActiveGpu
      - ``bool``, optional
      - Default: ``False``. Specify whether NNI should submit trials to GPUs occupied by other tasks.
        Must be set when ``trialGpuNumber`` is greater than zero.
        The following processes can make a GPU "active":

          - non-NNI CUDA programs
          - graphical desktop
          - trials submitted by other NNI instances, if you have more than one NNI experiment running at the same time
          - other users' CUDA programs, if you are using a shared server

        If your remote machine runs a graphical OS like Ubuntu desktop, set this field to ``True``; otherwise, the GUI will prevent NNI from launching any trial.
        When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously.

    * - maxTrialNumberPerGpu
      - ``int``, optional
      - Default: ``1``. Specify how many trials can share one GPU.

    * - gpuIndices
      - ``list[int]`` or ``str`` or ``int``, optional
      - Limit the GPUs visible to trial processes.
        If ``trialGpuNumber`` is less than the length of this value, only a subset will be visible to each trial.
        This will be used as the ``CUDA_VISIBLE_DEVICES`` environment variable.

    * - pythonPath
      - ``str``, optional
      - Specify a Python environment.
        This path will be inserted at the front of ``PATH``. Here are some examples:

          - (Linux) pythonPath: ``/opt/python3.7/bin``
          - (Windows) pythonPath: ``C:/Python37``

        If you are working with Anaconda, the setup is slightly different. On Windows, you also have to add ``../Scripts`` and ``../Library/bin``, separated by ``;``. Examples are as below:

          - (Linux Anaconda) pythonPath: ``/home/yourname/anaconda3/envs/myenv/bin/``
          - (Windows Anaconda) pythonPath: ``C:/Users/yourname/.conda/envs/myenv``; ``C:/Users/yourname/.conda/envs/myenv/Scripts``; ``C:/Users/yourname/.conda/envs/myenv/Library/bin``

        This is useful if the preparation steps vary between machines.
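
As a sketch, a remote training service entry that uses key-based login and the optional per-machine fields might look like the fragment below. The host, port, passphrase, and paths are placeholders.

.. code-block:: yaml

    trainingService:
      platform: remote
      reuseMode: true
      machineList:
        - host: my.domain.com
          port: 2222                       # placeholder for a non-default SSH port
          user: bob
          sshKeyFile: ~/.ssh/id_rsa        # used because no password is given
          sshPassphrase: my-passphrase     # placeholder
          useActiveGpu: false
          maxTrialNumberPerGpu: 1
          gpuIndices: [0, 1]
          pythonPath: /opt/python3.7/bin   # inserted at the front of PATH on this machine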

OpenpaiConfig
-------------

Detailed usage can be found :doc:`here </experiment/training_service/openpai>`.

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - platform
      - ``"openpai"``
      -
    
    * - host
      - ``str``
      - Hostname of OpenPAI service.
        This may include ``https://`` or ``http://`` prefix.
        HTTPS will be used by default.

    * - username
      - ``str``
      - OpenPAI user name.

    * - token
      - ``str``
      - OpenPAI user token.
        This can be found in your OpenPAI user settings page.

    * - trialCpuNumber
      - ``int``
      - Specify the CPU number of each trial to be used in OpenPAI container.

    * - trialMemorySize
      - ``str``
      - Specify the memory size of each trial to be used in OpenPAI container.
        Format: ``number + tb|gb|mb|kb``.
        Examples: ``"8gb"``, ``"8192mb"``.

    * - storageConfigName
      - ``str``
      - Specify the storage name used in OpenPAI.

    * - dockerImage
      - ``str``, optional
      - Default: ``"msranni/nni:latest"``. Name and tag of docker image to run the trials.

    * - localStorageMountPoint
      - ``str``
      - :ref:`Mount point <path>` of storage service (typically NFS) on the local machine.

    * - containerStorageMountPoint
      - ``str``
      - Mount point of storage service (typically NFS) in docker container.
        This must be an absolute path.

    * - reuseMode
      - ``bool``, optional
      - Default: ``True``. Enable :ref:`reuse mode <training-service-reuse>`.

    * - openpaiConfig
      - ``JSON``, optional
      - Embedded OpenPAI config file.

    * - openpaiConfigFile
      - ``str``, optional
      - `Path`_ to OpenPAI config file.
        An example can be found `here <https://github.com/microsoft/pai/blob/master/docs/manual/cluster-user/examples/hello-world-job.yaml>`__.
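
A minimal sketch of an OpenPAI training service section built from the fields above. The host address, credentials, storage name, and mount points are placeholders to be replaced with the values of your own cluster.

.. code-block:: yaml

    trainingService:
      platform: openpai
      host: https://openpai.example.com           # placeholder host
      username: alice                             # placeholder user name
      token: my-openpai-token                     # placeholder, from the OpenPAI user settings page
      trialCpuNumber: 1
      trialMemorySize: 8gb                        # format: number + tb|gb|mb|kb
      storageConfigName: my-storage               # placeholder storage name
      dockerImage: msranni/nni:latest
      localStorageMountPoint: /mnt/nfs/nni        # placeholder mount point on the local machine
      containerStorageMountPoint: /mnt/data/nni   # placeholder; must be an absolute path
      reuseMode: true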

AmlConfig
---------

Detailed usage can be found :doc:`here </experiment/training_service/aml>`.

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - platform
      - ``"aml"``
      -

    * - dockerImage
      - ``str``, optional
      - Default: ``"msranni/nni:latest"``. Name and tag of docker image to run the trials.

    * - subscriptionId
      - ``str``
      - Azure subscription ID.

    * - resourceGroup
      - ``str``
      - Azure resource group name.

    * - workspaceName
      - ``str``
      - Azure workspace name.

    * - computeTarget
      - ``str``
      - AML compute cluster name.
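
A minimal sketch of an AML training service section. Every identifier below is a placeholder for the corresponding value of your own Azure account.

.. code-block:: yaml

    trainingService:
      platform: aml
      dockerImage: msranni/nni:latest
      subscriptionId: 00000000-0000-0000-0000-000000000000   # placeholder subscription ID
      resourceGroup: my-resource-group                       # placeholder
      workspaceName: my-workspace                            # placeholder
      computeTarget: my-compute-cluster                      # placeholder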

DlcConfig
---------

Detailed usage can be found :doc:`here </experiment/training_service/paidlc>`.

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - platform
      - ``"dlc"``
      -

    * - type
      - ``str``, optional
      - Default: ``"Worker"``. Job spec type.

    * - image
      - ``str``
      - Name and tag of docker image to run the trials.

    * - jobType
      - ``str``, optional
      - Default: ``"TFJob"``. PAI-DLC training job type, ``"TFJob"`` or ``"PyTorchJob"``.

    * - podCount
      - ``str``
      - Pod count to run a single training job.

    * - ecsSpec
      - ``str``
      - Training server config spec string.

    * - region
      - ``str``
      - The region where the PAI-DLC public cluster is located.

    * - nasDataSourceId
      - ``str``
      - The NAS data source ID configured on the PAI-DLC side.

    * - ossDataSourceId
      - ``str``, optional
      - The OSS data source ID configured on the PAI-DLC side.

    * - accessKeyId
      - ``str``
      - The accessKeyId of your cloud account.

    * - accessKeySecret
      - ``str``
      - The accessKeySecret of your cloud account.

    * - localStorageMountPoint
      - ``str``
      - The mount point of the NAS on the PAI-DSW server. The default is ``/home/admin/workspace/``.

    * - containerStorageMountPoint
      - ``str``
      - The mount point of the NAS on the PAI-DLC side. The default is ``/root/data/``.
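
A minimal sketch of a PAI-DLC training service section. The image name, ECS spec, region, data source ID, and access keys are placeholders, and writing ``podCount`` as a bare number is an assumption; consult the PAI-DLC usage document linked above for real values.

.. code-block:: yaml

    trainingService:
      platform: dlc
      type: Worker
      image: my-registry/my-training-image:latest   # placeholder image name and tag
      jobType: PyTorchJob                           # "TFJob" or "PyTorchJob"
      podCount: 1                                   # assumption: a plain count
      ecsSpec: my-ecs-spec                          # placeholder spec string
      region: my-region                             # placeholder region
      nasDataSourceId: my-nas-data-source-id        # placeholder
      accessKeyId: my-access-key-id                 # placeholder
      accessKeySecret: my-access-key-secret         # placeholder
      localStorageMountPoint: /home/admin/workspace/
      containerStorageMountPoint: /root/data/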

HybridConfig
------------

Currently, only `LocalConfig`_, `RemoteConfig`_, `OpenpaiConfig`_ and `AmlConfig`_ are supported. Detailed usage can be found :doc:`here </experiment/training_service/hybrid>`.
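
A minimal sketch of a hybrid setup, assuming that ``trainingService`` is written as a list of training service configs in hybrid mode. The list form is not described in this section, so confirm it against the hybrid usage document; the host and credentials are placeholders.

.. code-block:: yaml

    trainingService:
      # assumption: hybrid mode takes a list of supported configs
      - platform: local
      - platform: remote
        machineList:
          - host: 11.22.33.44
            user: alice
            password: xxxxx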

.. _reference-sharedstorage-config-label:

SharedStorageConfig
^^^^^^^^^^^^^^^^^^^

Detailed usage can be found :doc:`here </experiment/training_service/shared_storage>`.

NfsConfig
---------

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - storageType
      - ``"NFS"``
      -

    * - localMountPoint
      - ``str``
      - The path where the storage has been or will be mounted on the local machine.
        If the path does not exist, it will be created automatically. An absolute path is recommended, e.g. ``/tmp/nni-shared-storage``.

    * - remoteMountPoint
      - ``str``
      - The path where the storage will be mounted on the remote machine.
        If the path does not exist, it will be created automatically. A relative path is recommended, e.g. ``./nni-shared-storage``.

    * - localMounted
      - ``str``
      - Specify who mounts the shared storage on the local machine, and whether it is mounted at all.
        Values: ``"usermount"``, ``"nnimount"``, ``"nomount"``.
        ``usermount`` means the user has already mounted this storage on ``localMountPoint``. ``nnimount`` means NNI will try to mount this storage on ``localMountPoint``. ``nomount`` means the storage will not be mounted on the local machine; partial storages will be supported in the future.

    * - nfsServer
      - ``str``
      - NFS server host.

    * - exportedDirectory
      - ``str``
      - Exported directory of the NFS server, detailed `here <https://www.ibm.com/docs/en/aix/7.2?topic=system-nfs-exporting-mounting>`_.
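
A minimal sketch of an NFS shared storage section. The server address and exported directory are placeholders; the mount points reuse the example paths from the table above.

.. code-block:: yaml

    sharedStorage:
      storageType: NFS
      localMountPoint: /tmp/nni-shared-storage    # absolute path on the local machine
      remoteMountPoint: ./nni-shared-storage      # relative path on the remote machine
      localMounted: nnimount                      # let NNI try to mount the storage locally
      nfsServer: 10.10.10.10                      # placeholder NFS server host
      exportedDirectory: /exported/nni            # placeholder exported directory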

AzureBlobConfig
---------------

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - storageType
      - ``"AzureBlob"``
      -

    * - localMountPoint
      - ``str``
      - The path where the storage has been or will be mounted on the local machine.
        If the path does not exist, it will be created automatically. An absolute path is recommended, e.g. ``/tmp/nni-shared-storage``.

    * - remoteMountPoint
      - ``str``
      - The path where the storage will be mounted on the remote machine.
        If the path does not exist, it will be created automatically. A relative path is recommended, e.g. ``./nni-shared-storage``.
        Note that the directory must be empty when using AzureBlob.

    * - localMounted
      - ``str``
      - Specify who mounts the shared storage on the local machine, and whether it is mounted at all.
        Values: ``"usermount"``, ``"nnimount"``, ``"nomount"``.
        ``usermount`` means the user has already mounted this storage on ``localMountPoint``. ``nnimount`` means NNI will try to mount this storage on ``localMountPoint``. ``nomount`` means the storage will not be mounted on the local machine; partial storages will be supported in the future.

    * - storageAccountName
      - ``str``
      - Azure storage account name.

    * - storageAccountKey
      - ``str``
      - Azure storage account key.

    * - containerName
      - ``str``
      - AzureBlob container name.
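
A minimal sketch of an Azure Blob shared storage section. The account name, account key, and container name are placeholders; the mount points reuse the example paths from the table above.

.. code-block:: yaml

    sharedStorage:
      storageType: AzureBlob
      localMountPoint: /tmp/nni-shared-storage    # absolute path on the local machine
      remoteMountPoint: ./nni-shared-storage      # must be an empty directory for AzureBlob
      localMounted: nnimount                      # let NNI try to mount the storage locally
      storageAccountName: mystorageaccount        # placeholder
      storageAccountKey: my-storage-account-key   # placeholder
      containerName: my-container                 # placeholder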