===========================
Experiment Config Reference
===========================

A config file is needed when creating an experiment. This document describes the rules for writing a config file and provides some examples.

.. Note::

    1. This document lists field names with ``camelCase``. If users use these fields in the pythonic way with NNI Python APIs (e.g., ``nni.experiment``), the field names should be converted to ``snake_case``.

    2. In this document, field types are written as `Python type hints <https://docs.python.org/3.10/library/typing.html>`_. Therefore JSON objects are called ``dict`` and arrays are called ``list``.

    .. _path: 

    3. Some fields take a path to a file or directory. Unless otherwise noted, both absolute and relative paths are supported, and ``~`` will be expanded to the home directory.

       - When written in the YAML file, relative paths are relative to the directory containing that file.
       - When assigned in Python code, relative paths are relative to the current working directory.
       - All relative paths are converted to absolute paths when loading a YAML file into a Python class, and when saving a Python class to a YAML file.

    4. Setting a field to ``None`` or ``null`` is equivalent to not setting the field.

.. contents:: Contents
   :local:
   :depth: 3
 

Examples
========

Local Mode
^^^^^^^^^^

.. code-block:: yaml

    experimentName: MNIST
    searchSpaceFile: search_space.json
    trialCommand: python mnist.py
    trialCodeDirectory: .
    trialGpuNumber: 1
    trialConcurrency: 2
    maxExperimentDuration: 24h
    maxTrialNumber: 100
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: local
      useActiveGpu: True

Local Mode (Inline Search Space)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    searchSpace:
      batch_size:
        _type: choice
        _value: [16, 32, 64]
      learning_rate:
        _type: loguniform
        _value: [0.0001, 0.1]
    trialCommand: python mnist.py
    trialGpuNumber: 1
    trialConcurrency: 2
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: local
      useActiveGpu: True

Remote Mode
^^^^^^^^^^^

.. code-block:: yaml

    experimentName: MNIST
    searchSpaceFile: search_space.json
    trialCommand: python mnist.py
    trialCodeDirectory: .
    trialGpuNumber: 1
    trialConcurrency: 2
    maxExperimentDuration: 24h
    maxTrialNumber: 100
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: remote
      machineList:
        - host: 11.22.33.44
          user: alice
          password: xxxxx
        - host: my.domain.com
          user: bob
          sshKeyFile: ~/.ssh/id_rsa

Reference
=========

ExperimentConfig
^^^^^^^^^^^^^^^^

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description
    
    * - experimentName
      - ``str``, optional
      - Mnemonic name of the experiment, which will be shown in WebUI and nnictl.

    * - searchSpaceFile
      - ``str``, optional
      - Path_ to the JSON file containing the search space.
        The search space format is determined by the tuner. The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__.
        Mutually exclusive with ``searchSpace``.

    * - searchSpace
      - ``JSON``, optional
      - Search space object.
        The format is determined by the tuner. The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__.
        Note that ``None`` means "no such field", so an empty search space should be written as ``{}``.
        Mutually exclusive with ``searchSpaceFile``.

    * - trialCommand
      - ``str``
      - Command to launch trial.
        The command will be executed in bash on Linux and macOS, and in PowerShell on Windows.
        Note: use ``python3`` on Linux and macOS, and ``python`` on Windows.

    * - trialCodeDirectory
      - ``str``, optional
      - Default: ``"."``. `Path`_ to the directory containing trial source files.
        All files in this directory will be sent to the training machine, unless they are listed in the ``.nniignore`` file.
        (See :ref:`nniignore <nniignore>` for details.)

    * - trialConcurrency
      - ``int``
      - Specify how many trials should be run concurrently.
        The real concurrency also depends on hardware resources and may be less than this value.

    * - trialGpuNumber
      - ``int`` or ``None``, optional
      - Default: None. This field might have slightly different meanings for various training services,
        especially when set to ``0`` or ``None``.
        See `training service's document <../training_services.rst>`__ for details.

        In local mode, setting the field to ``0`` will prevent trials from accessing GPUs (by setting an empty ``CUDA_VISIBLE_DEVICES``).
        When set to ``None``, trials will be created and scheduled as if they did not use GPUs,
        but they can still use all GPU resources if they want.

    * - maxExperimentDuration
      - ``str``, optional
      - Limit the duration of this experiment if specified. The duration is unlimited if not set.
        Format: ``number + s|m|h|d``.
        Examples: ``"10m"``, ``"0.5h"``.
        When time runs out, the experiment will stop creating trials but continue to serve WebUI.

    * - maxTrialNumber
      - ``int``, optional
      - Limit the number of trials to create if specified. The trial number is unlimited if not set.
        When the budget runs out, the experiment will stop creating trials but continue to serve WebUI.

    * - maxTrialDuration
      - ``str``, optional
      - Limit the duration of trial job if specified. The duration is unlimited if not set.
        Format: ``number + s|m|h|d``.
        Examples: ``"10m"``, ``"0.5h"``.
        When time runs out, the current trial job will stop.

    * - nniManagerIp
      - ``str``, optional
      - Default: default connection chosen by system. IP of the current machine, used by training machines to access NNI manager. Not used in local mode.
        Except for the local mode, it is highly recommended to set this field manually.

    * - useAnnotation
      - ``bool``, optional
      - Default: ``False``. Enable `annotation <../Tutorial/AnnotationSpec.rst>`__.
        When using annotation, ``searchSpace`` and ``searchSpaceFile`` should not be specified manually.

    * - debug
      - ``bool``, optional
      - Default: ``False``. Enable debug mode.
        When enabled, logging will be more verbose and some internal validation will be loosened.

    * - logLevel
      - ``str``, optional
      - Default: ``info`` or ``debug``, depending on ``debug`` option. Set log level of the whole system.
        Values: ``"trace"``, ``"debug"``, ``"info"``, ``"warning"``, ``"error"``, ``"fatal"``.
        When debug mode is enabled, ``logLevel`` defaults to ``"debug"``; otherwise it defaults to ``"info"``.
        Most modules of NNI will be affected by this value, including NNI manager, tuner, training service, etc.
        The exception is trial, whose logging level is directly managed by trial code.
        For Python modules, "trace" acts as logging level 0 and "fatal" acts as ``logging.CRITICAL``.

    * - experimentWorkingDirectory
      - ``str``, optional
      - Default: ``~/nni-experiments``.
        Specify the :ref:`directory <path>` to place logs, checkpoints, metadata, and other run-time files.
        NNI will create a subdirectory named after the experiment ID, so it is safe to use the same directory for multiple experiments.

    * - tunerGpuIndices
      - ``list[int]`` or ``str`` or ``int``, optional
      - Limit the GPUs visible to tuner, assessor, and advisor.
        This will be the ``CUDA_VISIBLE_DEVICES`` environment variable of tuner process.
        Because tuner, assessor, and advisor run in the same process, this option will affect them all.

    * - tuner
      - ``AlgorithmConfig``, optional
      - Specify the tuner.
        The built-in tuners can be found `here <../builtin_tuner.rst>`__ and you can follow `this tutorial <../Tuner/CustomizeTuner.rst>`__ to customize a new tuner.

    * - assessor
      - ``AlgorithmConfig``, optional
      - Specify the assessor.
        The built-in assessors can be found `here <../builtin_assessor.rst>`__ and you can follow `this tutorial <../Assessor/CustomizeAssessor.rst>`__ to customize a new assessor.

    * - advisor
      - ``AlgorithmConfig``, optional
      - Specify the advisor.
        NNI provides two built-in advisors: `BOHB <../Tuner/BohbAdvisor.rst>`__ and `Hyperband <../Tuner/HyperbandAdvisor.rst>`__, and you can follow `this tutorial <../Tuner/CustomizeAdvisor.rst>`__ to customize a new advisor.

    * - trainingService
      - ``TrainingServiceConfig``
      - Specify the `training service <../TrainingService/Overview.rst>`__.

    * - sharedStorage
      - ``SharedStorageConfig``, optional
      - Configure the shared storage. Detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__.
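
As an illustration of the optional fields in this table, the following sketch extends the local-mode example with duration limits, a log level, a working directory, and a GPU restriction for the tuner process. All values are placeholders chosen for this example, not recommendations.

.. code-block:: yaml

    experimentName: MNIST
    searchSpaceFile: search_space.json
    trialCommand: python mnist.py
    trialConcurrency: 2
    maxExperimentDuration: 1.5h   # format: number + s|m|h|d
    maxTrialDuration: 30m         # a single trial job stops after 30 minutes
    maxTrialNumber: 100
    logLevel: warning
    experimentWorkingDirectory: ~/nni-experiments
    tunerGpuIndices: 0            # illustrative GPU index for the tuner process
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: local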

AlgorithmConfig
^^^^^^^^^^^^^^^

``AlgorithmConfig`` describes a tuner / assessor / advisor algorithm.

For customized algorithms, there are two ways to describe them:

  1. `Register the algorithm <../Tutorial/InstallCustomizedAlgos.rst>`__ to use it like a built-in one. (preferred)

  2. Specify code directory and class name directly.

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description
    
    * - name
      - ``str`` or ``None``, optional
      - Default: None. Name of the built-in or registered algorithm.
        ``str`` for the built-in and registered algorithm, ``None`` for other customized algorithms.

    * - className
      - ``str`` or ``None``, optional
      - Default: None. Qualified class name of an unregistered customized algorithm.
        ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms.
        Example: ``"my_tuner.MyTuner"``.

    * - codeDirectory
      - ``str`` or ``None``, optional
      - Default: None. Path_ to the directory containing the customized algorithm class.
        ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms.

    * - classArgs
      - ``dict[str, Any]``, optional
      - Keyword arguments passed to the algorithm class's constructor.
        See the algorithm's documentation for supported values.
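
The second way of describing a customized algorithm (code directory plus class name) might look like the sketch below. The directory ``./my_tuner_dir`` and the ``optimize_mode`` argument are hypothetical; they assume a file ``my_tuner.py`` in that directory defining a class ``MyTuner`` whose constructor accepts that keyword.

.. code-block:: yaml

    tuner:
      # "name" is omitted because the algorithm is not registered
      className: my_tuner.MyTuner
      codeDirectory: ./my_tuner_dir   # hypothetical directory containing my_tuner.py
      classArgs:
        optimize_mode: maximize       # hypothetical constructor keyword argument

For a built-in or registered algorithm, ``name`` is set instead and ``className`` and ``codeDirectory`` are left out.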

TrainingServiceConfig
^^^^^^^^^^^^^^^^^^^^^

One of the following:

- `LocalConfig`_
- `RemoteConfig`_
- `OpenpaiConfig`_
- `AmlConfig`_
- `DlcConfig`_
- `HybridConfig`_

For `Kubeflow <../TrainingService/KubeflowMode.rst>`_, `FrameworkController <../TrainingService/FrameworkControllerMode.rst>`_, and `AdaptDL <../TrainingService/AdaptDLMode.rst>`_ training platforms, it is suggested to use `v1 config schema <../Tutorial/ExperimentConfig.rst>`_ for now.

.. _reference-local-config-label:

LocalConfig
-----------

An introduction to the local training service can be found in :doc:`../experiment/local`.

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - platform
      - ``"local"``
      -
    
    * - useActiveGpu
      - ``bool``, optional
      - Default: ``False``. Specify whether NNI should submit trials to GPUs occupied by other tasks.
        Must be set when ``trialGpuNumber`` is greater than zero.
        The following processes can make a GPU "active":

          - non-NNI CUDA programs
          - graphical desktop
          - trials submitted by other NNI instances, if you have more than one NNI experiment running at the same time
          - other users' CUDA programs, if you are using a shared server

        If you are using a graphical OS like Windows 10 or Ubuntu desktop, set this field to ``True``; otherwise, the GUI will prevent NNI from launching any trial.
        When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously.

    * - maxTrialNumberPerGpu
      - ``int``, optional
      - Default: ``1``. Specify how many trials can share one GPU.

    * - gpuIndices
      - ``list[int]`` or ``str`` or ``int``, optional
      - Limit the GPUs visible to trial processes.
        If ``trialGpuNumber`` is less than the length of this value, only a subset will be visible to each trial.
        This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable.
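
A minimal sketch of how these fields combine, assuming a machine with at least two GPUs; the indices and counts are illustrative only. Here each trial requests one GPU, trials are restricted to GPUs 0 and 1, and up to two trials may share each GPU.

.. code-block:: yaml

    trialGpuNumber: 1
    trainingService:
      platform: local
      useActiveGpu: false       # must be set because trialGpuNumber > 0
      maxTrialNumberPerGpu: 2   # up to two trials may share one GPU
      gpuIndices: [0, 1]        # only these GPUs are made visible to trials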

.. _reference-remote-config-label:

RemoteConfig
------------

Detailed usage can be found :doc:`../experiment/remote`.

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - platform
      - ``"remote"``
      -

    * - machineList
      - ``list[RemoteMachineConfig]``
      - List of training machines.

    * - reuseMode
      - ``bool``, optional
      - Default: ``True``. Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__.

RemoteMachineConfig
"""""""""""""""""""

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - host
      - ``str``
      - IP or hostname (domain name) of the machine.

    * - port
      - ``int``, optional
      - Default: ``22``. SSH service port.

    * - user
      - ``str``
      - Login user name.

    * - password
      - ``str``, optional
      - If not specified, ``sshKeyFile`` will be used instead.
    
    * - sshKeyFile
      - ``str``, optional
      - `Path`_ to the SSH identity file.
        Only used when ``password`` is not specified.

    * - sshPassphrase
      - ``str``, optional
      - Passphrase of SSH identity file.

    * - useActiveGpu
      - ``bool``, optional
      - Default: ``False``. Specify whether NNI should submit trials to GPUs occupied by other tasks.
        Must be set when ``trialGpuNumber`` is greater than zero.
        The following processes can make a GPU "active":

          - non-NNI CUDA programs
          - graphical desktop
          - trials submitted by other NNI instances, if you have more than one NNI experiment running at the same time
          - other users' CUDA programs, if you are using a shared server
  
        If your remote machine runs a graphical OS like Ubuntu desktop, set this field to ``True``; otherwise, the GUI will prevent NNI from launching any trial.
        When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously.

    * - maxTrialNumberPerGpu
      - ``int``, optional
      - Default: ``1``. Specify how many trials can share one GPU.

    * - gpuIndices
      - ``list[int]`` or ``str`` or ``int``, optional
      - Limit the GPUs visible to trial processes.
        If ``trialGpuNumber`` is less than the length of this value, only a subset will be visible to each trial.
        This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable.

    * - pythonPath
      - ``str``, optional
      - Specify a Python environment.
        This path will be inserted at the front of PATH. Here are some examples: 

          - (linux) pythonPath: ``/opt/python3.7/bin``
          - (windows) pythonPath: ``C:/Python37``

        If you are using Anaconda, the setup differs slightly. On Windows, you also have to add the ``Scripts`` and ``Library/bin`` subdirectories, separated by ``;``. Examples are as below:

          - (linux anaconda) pythonPath: ``/home/yourname/anaconda3/envs/myenv/bin/``
          - (windows anaconda) pythonPath: ``C:/Users/yourname/.conda/envs/myenv``; ``C:/Users/yourname/.conda/envs/myenv/Scripts``; ``C:/Users/yourname/.conda/envs/myenv/Library/bin``

        This is useful if preparation steps vary between machines.
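
A sketch of a single remote machine entry that uses key-based login and a dedicated Python environment. The host, port, user, passphrase, and GPU index are placeholders for illustration.

.. code-block:: yaml

    trainingService:
      platform: remote
      reuseMode: true
      machineList:
        - host: my.domain.com
          port: 2222                       # non-default SSH port (default is 22)
          user: bob
          sshKeyFile: ~/.ssh/id_rsa        # used because no password is given
          sshPassphrase: xxxxx             # passphrase of the identity file
          useActiveGpu: false
          maxTrialNumberPerGpu: 1
          gpuIndices: [0]
          pythonPath: /opt/python3.7/bin   # inserted at the front of PATH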

OpenpaiConfig
-------------

Detailed usage can be found `here <../TrainingService/PaiMode.rst>`__.

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - platform
      - ``"openpai"``
      -
    
    * - host
      - ``str``
      - Hostname of OpenPAI service.
        This may include ``https://`` or ``http://`` prefix.
        HTTPS will be used by default.

    * - username
      - ``str``
      - OpenPAI user name.

    * - token
      - ``str``
      - OpenPAI user token.
        This can be found in your OpenPAI user settings page.

    * - trialCpuNumber
      - ``int``
      - Specify the CPU number of each trial to be used in OpenPAI container.

    * - trialMemorySize
      - ``str``
      - Specify the memory size of each trial to be used in OpenPAI container.
        Format: ``number + tb|gb|mb|kb``.
        Examples: ``"8gb"``, ``"8192mb"``.

    * - storageConfigName
      - ``str``
      - Specify the storage name used in OpenPAI.

    * - dockerImage
      - ``str``, optional
      - Default: ``"msranni/nni:latest"``. Name and tag of docker image to run the trials.

    * - localStorageMountPoint
      - ``str``
      - :ref:`Mount point <path>` of storage service (typically NFS) on the local machine.

    * - containerStorageMountPoint
      - ``str``
      - Mount point of storage service (typically NFS) in docker container.
        This must be an absolute path.

    * - reuseMode
      - ``bool``, optional
      - Default: ``True``. Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__.

    * - openpaiConfig
      - ``JSON``, optional
      - Embedded OpenPAI config file.

    * - openpaiConfigFile
      - ``str``, optional
      - `Path`_ to OpenPAI config file.
        An example can be found `here <https://github.com/microsoft/pai/blob/master/docs/manual/cluster-user/examples/hello-world-job.yaml>`__.
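
The required fields above could be filled in roughly as follows. The host name, user name, token, storage name, and mount points are hypothetical placeholders and must be replaced with the values of an actual OpenPAI cluster.

.. code-block:: yaml

    trainingService:
      platform: openpai
      host: https://pai.example.com            # placeholder host
      username: alice
      token: xxxxx                             # from the OpenPAI user settings page
      trialCpuNumber: 2
      trialMemorySize: 8gb                     # format: number + tb|gb|mb|kb
      storageConfigName: my-storage            # placeholder storage name
      dockerImage: msranni/nni:latest
      localStorageMountPoint: /mnt/pai-storage # placeholder local mount point
      containerStorageMountPoint: /mnt/data    # must be an absolute path
      reuseMode: true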

AmlConfig
---------

Detailed usage can be found `here <../TrainingService/AMLMode.rst>`__.

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - platform
      - ``"aml"``
      -

    * - dockerImage
      - ``str``, optional
      - Default: ``"msranni/nni:latest"``. Name and tag of docker image to run the trials.

    * - subscriptionId
      - ``str``
      - Azure subscription ID.

    * - resourceGroup
      - ``str``
      - Azure resource group name.

    * - workspaceName
      - ``str``
      - Azure workspace name.

    * - computeTarget
      - ``str``
      - AML compute cluster name.
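
For orientation, an AML training service section might be sketched as below. The subscription ID, resource group, workspace, and compute target names are made-up placeholders.

.. code-block:: yaml

    trainingService:
      platform: aml
      dockerImage: msranni/nni:latest
      subscriptionId: 00000000-0000-0000-0000-000000000000   # placeholder
      resourceGroup: my-resource-group                       # placeholder
      workspaceName: my-workspace                            # placeholder
      computeTarget: my-compute-cluster                      # placeholder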

DlcConfig
---------

Detailed usage can be found `here <../TrainingService/DlcMode.rst>`__.

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - platform
      - ``"dlc"``
      -
    
    * - type
      - ``str``, optional
      - Default: ``"Worker"``. Job spec type.

    * - image
      - ``str``
      - Name and tag of docker image to run the trials.

    * - jobType
      - ``str``, optional
      - Default: ``"TFJob"``. PAI-DLC training job type, ``"TFJob"`` or ``"PyTorchJob"``.

    * - podCount
      - ``str``
      - Pod count to run a single training job.

    * - ecsSpec
      - ``str``
      - Training server config spec string.

    * - region
      - ``str``
      - The region where the PAI-DLC public cluster is located.

    * - nasDataSourceId
      - ``str``
      - The NAS data source ID configured on the PAI-DLC side.

    * - ossDataSourceId
      - ``str``
      - The OSS data source ID configured on the PAI-DLC side. This field is optional.

    * - accessKeyId
      - ``str``
      - The accessKeyId of your cloud account.

    * - accessKeySecret
      - ``str``
      - The accessKeySecret of your cloud account.

    * - localStorageMountPoint
      - ``str``
      - The mount point of the NAS on the PAI-DSW server. Default: ``/home/admin/workspace/``.

    * - containerStorageMountPoint
      - ``str``
      - The mount point of the NAS on the PAI-DLC side. Default: ``/root/data/``.

HybridConfig
------------

Currently, only `LocalConfig`_, `RemoteConfig`_, `OpenpaiConfig`_ and `AmlConfig`_ are supported. Detailed usage can be found `here <../TrainingService/HybridMode.rst>`__.

SharedStorageConfig
^^^^^^^^^^^^^^^^^^^

Detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__.

nfsConfig
---------

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - storageType
      - ``"NFS"``
      -

    * - localMountPoint
      - ``str``
      - The path where the storage has been or will be mounted on the local machine.
        If the path does not exist, it will be created automatically. An absolute path is recommended, e.g. ``/tmp/nni-shared-storage``.

    * - remoteMountPoint
      - ``str``
      - The path where the storage will be mounted on the remote machine.
        If the path does not exist, it will be created automatically. A relative path is recommended, e.g. ``./nni-shared-storage``.

    * - localMounted
      - ``str``
      - Specify how the shared storage is mounted on the local machine.
        Values: ``"usermount"``, ``"nnimount"``, ``"nomount"``.
        ``usermount`` means the user has already mounted this storage on ``localMountPoint``. ``nnimount`` means NNI will try to mount this storage on ``localMountPoint``. ``nomount`` means the storage will not be mounted on the local machine; support for partial storages is planned for the future.

    * - nfsServer
      - ``str``
      - NFS server host.

    * - exportedDirectory
      - ``str``
      - Exported directory of NFS server, detailed `here <https://www.ibm.com/docs/en/aix/7.2?topic=system-nfs-exporting-mounting>`_.
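
Put together, an NFS shared storage section might look like the following sketch. The server address and exported directory are placeholders; the mount points reuse the example paths given in the table.

.. code-block:: yaml

    sharedStorage:
      storageType: NFS
      localMountPoint: /tmp/nni-shared-storage   # absolute path on the local machine
      remoteMountPoint: ./nni-shared-storage     # relative path on the remote machine
      localMounted: nnimount                     # let NNI try to mount the storage locally
      nfsServer: 10.0.0.10                       # placeholder NFS server host
      exportedDirectory: /exported/nni           # placeholder exported directory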

azureBlobConfig
---------------

.. list-table::
    :widths: 10 10 80
    :header-rows: 1

    * - Field Name
      - Type
      - Description

    * - storageType
      - ``"AzureBlob"``
      -

    * - localMountPoint
      - ``str``
      - The path where the storage has been or will be mounted on the local machine.
        If the path does not exist, it will be created automatically. An absolute path is recommended, e.g. ``/tmp/nni-shared-storage``.

    * - remoteMountPoint
      - ``str``
      - The path where the storage will be mounted on the remote machine.
        If the path does not exist, it will be created automatically. A relative path is recommended, e.g. ``./nni-shared-storage``.
        Note that the directory must be empty when using AzureBlob.

    * - localMounted
      - ``str``
      - Specify how the shared storage is mounted on the local machine.
        Values: ``"usermount"``, ``"nnimount"``, ``"nomount"``.
        ``usermount`` means the user has already mounted this storage on ``localMountPoint``. ``nnimount`` means NNI will try to mount this storage on ``localMountPoint``. ``nomount`` means the storage will not be mounted on the local machine; support for partial storages is planned for the future.

    * - storageAccountName
      - ``str``
      - Azure storage account name.

    * - storageAccountKey
      - ``str``
      - Azure storage account key.

    * - containerName
      - ``str``
      - AzureBlob container name.
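
An Azure Blob shared storage section could be sketched in the same way. The account name, key, and container name are placeholders, and the remote mount directory is assumed to be empty, as the table requires.

.. code-block:: yaml

    sharedStorage:
      storageType: AzureBlob
      localMountPoint: /tmp/nni-shared-storage
      remoteMountPoint: ./nni-shared-storage   # must be an empty directory
      localMounted: nnimount
      storageAccountName: mystorageaccount     # placeholder
      storageAccountKey: xxxxx                 # placeholder
      containerName: my-container              # placeholder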