===========================
Experiment Config Reference
===========================

A config file is needed when creating an experiment. This document describes the rules to write a config file and provides some examples.

.. Note::

    1. This document lists field names in ``camelCase``. When these fields are used in the pythonic way through the NNI Python API (e.g., ``nni.experiment``), the field names should be converted to ``snake_case``.

    2. In this document, field types are written as `Python type hints <https://docs.python.org/3.10/library/typing.html>`_. Therefore JSON objects are called ``dict`` and arrays are called ``list``.

    .. _path: 

    3. Some fields take a path to a file or directory. Unless otherwise noted, both absolute path and relative path are supported, and ``~`` will be expanded to the home directory.

       - When written in the YAML file, relative paths are relative to the directory containing that file.
       - When assigned in Python code, relative paths are relative to the current working directory.
       - All relative paths are converted to absolute paths when a YAML file is loaded into a Python class, and when a Python class is saved to a YAML file.

    4. Setting a field to ``None`` or ``null`` is equivalent to not setting the field.
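
As an illustration of the path and ``null`` rules in the note above, assume a hypothetical config file saved at ``/home/alice/mnist/config.yml``. The comments show how each value would be interpreted; the file location and values are assumptions for this example, and a complete config would also need fields such as `trialCommand`_.

.. code-block:: yaml

    # Hypothetical file location: /home/alice/mnist/config.yml
    searchSpaceFile: search_space.json             # resolved to /home/alice/mnist/search_space.json
    trialCodeDirectory: .                          # resolved to /home/alice/mnist
    experimentWorkingDirectory: ~/nni-experiments  # "~" is expanded to the home directory
    experimentName: null                           # same as not setting experimentName at all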

.. contents:: Contents
   :local:
   :depth: 3
 
Examples
========

Local Mode
^^^^^^^^^^

.. code-block:: yaml

    experimentName: MNIST
    searchSpaceFile: search_space.json
    trialCommand: python mnist.py
    trialCodeDirectory: .
    trialGpuNumber: 1
    trialConcurrency: 2
    maxExperimentDuration: 24h
    maxTrialNumber: 100
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: local
      useActiveGpu: True

Local Mode (Inline Search Space)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    searchSpace:
      batch_size:
        _type: choice
        _value: [16, 32, 64]
      learning_rate:
        _type: loguniform
        _value: [0.0001, 0.1]
    trialCommand: python mnist.py
    trialGpuNumber: 1
    trialConcurrency: 2
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: local
      useActiveGpu: True

Remote Mode
^^^^^^^^^^^

.. code-block:: yaml

    experimentName: MNIST
    searchSpaceFile: search_space.json
    trialCommand: python mnist.py
    trialCodeDirectory: .
    trialGpuNumber: 1
    trialConcurrency: 2
    maxExperimentDuration: 24h
    maxTrialNumber: 100
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: remote
      machineList:
        - host: 11.22.33.44
          user: alice
          password: xxxxx
        - host: my.domain.com
          user: bob
          sshKeyFile: ~/.ssh/id_rsa

Reference
=========

ExperimentConfig
^^^^^^^^^^^^^^^^

experimentName
--------------

Mnemonic name of the experiment, which will be shown in WebUI and nnictl.

type: ``Optional[str]``


searchSpaceFile
---------------

Path_ to the JSON file containing the search space.

type: ``Optional[str]``

The search space format is determined by the tuner. The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__.

Mutually exclusive with `searchSpace`_.


searchSpace
-----------

Search space object.

type: ``Optional[JSON]``

The format is determined by the tuner. The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__.

Note that ``None`` means "no such field", so an empty search space should be written as ``{}``.

Mutually exclusive with `searchSpaceFile`_.
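
A minimal sketch of the two mutually exclusive forms; the file name ``search_space.json`` is taken from the examples above. Exactly one of them should be used:

.. code-block:: yaml

    # Option 1: load the search space from a JSON file (path relative to this config file)
    searchSpaceFile: search_space.json

    # Option 2: embed it inline instead. An empty search space must be written as {},
    # because "searchSpace: null" would mean the field is not set at all.
    # searchSpace: {}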


trialCommand
------------

Command to launch a trial.

type: ``str``

The command will be executed in bash on Linux and macOS, and in PowerShell on Windows.

Note: use ``python3`` on Linux and macOS, and ``python`` on Windows.
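
For example, the trial command for the MNIST script used in the examples above could be written as follows; only the interpreter name changes between platforms:

.. code-block:: yaml

    # Linux and macOS: the command is executed in bash
    trialCommand: python3 mnist.py

    # Windows: the command is executed in PowerShell, so use this line instead
    # trialCommand: python mnist.py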


trialCodeDirectory
------------------

`Path`_ to the directory containing trial source files.

type: ``str``

default: ``"."``

All files in this directory will be sent to the training machine, unless they are excluded by the ``.nniignore`` file.
(See :ref:`nniignore <nniignore>` for details.)


trialConcurrency
----------------

Specify how many trials should be run concurrently.

type: ``int``

The real concurrency also depends on hardware resources and may be less than this value.


trialGpuNumber
--------------

Number of GPUs used by each trial.

type: ``Optional[int]``

This field may have slightly different meanings in different training services,
especially when set to ``0`` or ``None``.
See the `training service documentation <../training_services.rst>`__ for details.

In local mode, setting the field to ``0`` prevents trials from accessing any GPU (by setting an empty ``CUDA_VISIBLE_DEVICES``).
When set to ``None``, trials are created and scheduled as if they did not use GPUs,
but they can still use all GPU resources if they want.
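
A sketch of a local-mode fragment that keeps trials off the GPU entirely; the remaining trial fields are omitted for brevity:

.. code-block:: yaml

    trialGpuNumber: 0     # local mode: trials get an empty CUDA_VISIBLE_DEVICES
    trainingService:
      platform: local     # useActiveGpu is only required when trialGpuNumber > 0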


maxExperimentDuration
---------------------

Limit the duration of this experiment if specified.

type: ``Optional[str]``

format: ``number + s|m|h|d``

examples: ``"10m"``, ``"0.5h"``

When time runs out, the experiment will stop creating trials but continue to serve WebUI.


maxTrialNumber
--------------

Limit the number of trials to create if specified.

type: ``Optional[int]``

When the budget runs out, the experiment will stop creating trials but continue to serve WebUI.
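
The two budgets can be combined. In this sketch (values chosen arbitrarily), NNI stops creating new trials once either 90 minutes have passed or 50 trials have been created, whichever happens first, and the WebUI keeps running afterwards:

.. code-block:: yaml

    maxExperimentDuration: 1.5h   # format: number + s|m|h|d
    maxTrialNumber: 50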


nniManagerIp
------------

IP address of the current machine, used by training machines to access the NNI manager. Not used in local mode.

type: ``Optional[str]``

If not specified, the IPv4 address of ``eth0`` will be used.

Except for the local mode, it is highly recommended to set this field manually.
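
A sketch for remote mode, where the training machines need to reach back to the NNI manager. The address ``10.0.0.5`` is a hypothetical IP of the machine running the experiment; it should be replaced with an address the training machines can actually reach:

.. code-block:: yaml

    nniManagerIp: 10.0.0.5        # hypothetical address of the machine running NNI
    trainingService:
      platform: remote
      machineList:
        - host: 11.22.33.44
          user: alice
          sshKeyFile: ~/.ssh/id_rsa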


useAnnotation
-------------

Enable `annotation <../Tutorial/AnnotationSpec.rst>`__.

type: ``bool``

default: ``False``

When using annotation, `searchSpace`_ and `searchSpaceFile`_ should not be specified manually.
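
A minimal fragment with annotation enabled. It assumes the trial script (``mnist.py`` here, as in the examples above) already carries the annotations, so neither search space field is set:

.. code-block:: yaml

    useAnnotation: true
    # searchSpace / searchSpaceFile intentionally omitted when using annotation
    trialCommand: python3 mnist.py
    trialCodeDirectory: .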


debug
-----

Enable debug mode.

type: ``bool``

default: ``False``

When enabled, logging will be more verbose and some internal validation will be loosened.


logLevel
--------

Set the log level of the whole system.

type: ``Optional[str]``

values: ``"trace"``, ``"debug"``, ``"info"``, ``"warning"``, ``"error"``, ``"fatal"``

Defaults to ``"info"`` or ``"debug"``, depending on the `debug`_ option: when debug mode is enabled, the log level is set to ``"debug"``; otherwise, it is set to ``"info"``.

Most modules of NNI will be affected by this value, including NNI manager, tuner, training service, etc.

The exception is the trial, whose logging level is managed directly by the trial code.

For Python modules, ``"trace"`` acts as logging level 0 and ``"fatal"`` acts as ``logging.CRITICAL``.
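
A sketch of how the two options might be combined, assuming that an explicitly set ``logLevel`` takes the place of the default derived from `debug`_:

.. code-block:: yaml

    debug: true       # on its own, this makes the log level default to "debug"
    logLevel: trace   # explicit value (assumed to take precedence); Python modules treat "trace" as level 0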


experimentWorkingDirectory
--------------------------

Specify the :ref:`directory <path>` in which to place logs, checkpoints, metadata, and other run-time files.

type: ``Optional[str]``

Defaults to ``~/nni-experiments``.

NNI will create a subdirectory named after the experiment ID, so it is safe to use the same directory for multiple experiments.
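
For example, to keep experiment data on a separate disk, one might point this field at another location; ``/data/nni-experiments`` below is a hypothetical path:

.. code-block:: yaml

    # Each experiment gets its own subdirectory named after its experiment ID,
    # so several experiments can safely share this directory.
    experimentWorkingDirectory: /data/nni-experiments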


tunerGpuIndices
---------------

Limit the GPUs visible to the tuner, assessor, and advisor.

type: ``Optional[list[int] | str | int]``

This will be used as the ``CUDA_VISIBLE_DEVICES`` environment variable of the tuner process.

Because the tuner, assessor, and advisor run in the same process, this option affects all of them.
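
The field accepts a list, a string, or an integer. The sketch below restricts the tuner process to GPUs 0 and 1; the commented string form assumes the usual comma-separated ``CUDA_VISIBLE_DEVICES`` syntax:

.. code-block:: yaml

    tunerGpuIndices: [0, 1]        # list form
    # tunerGpuIndices: "0,1"       # string form (assumed comma-separated)
    # tunerGpuIndices: 0           # integer form: a single GPU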


tuner
-----

Specify the tuner. 

type: Optional `AlgorithmConfig`_

The built-in tuners can be found `here <../builtin_tuner.rst>`__ and you can follow `this tutorial <../Tuner/CustomizeTuner.rst>`__ to customize a new tuner.


assessor
--------

Specify the assessor. 

type: Optional `AlgorithmConfig`_

The built-in assessors can be found `here <../builtin_assessor.rst>`__ and you can follow `this tutorial <../Assessor/CustomizeAssessor.rst>`__ to customize a new assessor.
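
A built-in assessor is configured the same way as the built-in tuner in the examples above. This sketch assumes the built-in Medianstop assessor and its ``optimize_mode`` argument; the assessor's own document lists the arguments it supports:

.. code-block:: yaml

    assessor:
      name: Medianstop
      classArgs:
        optimize_mode: maximize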


advisor
-------

Specify the advisor. 

type: Optional `AlgorithmConfig`_

NNI provides two built-in advisors: `BOHB <../Tuner/BohbAdvisor.rst>`__ and `Hyperband <../Tuner/HyperbandAdvisor.rst>`__, and you can follow `this tutorial <../Tuner/CustomizeAdvisor.rst>`__ to customize a new advisor.


trainingService
---------------

Specify the `training service <../TrainingService/Overview.rst>`__.

type: `TrainingServiceConfig`_


sharedStorage
-------------

Configure the shared storage. Detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__.

type: Optional `SharedStorageConfig`_


AlgorithmConfig
^^^^^^^^^^^^^^^

``AlgorithmConfig`` describes a tuner / assessor / advisor algorithm.

For customized algorithms, there are two ways to describe them:

  1. `Register the algorithm <../Tutorial/InstallCustomizedAlgos.rst>`__ to use it like a built-in one (preferred).

  2. Specify the code directory and class name directly.


name
----

Name of the built-in or registered algorithm.

type: ``str`` for built-in and registered algorithms, ``None`` for other customized algorithms.


className
---------

Qualified class name of an unregistered customized algorithm.

type: ``None`` for built-in and registered algorithms, ``str`` for other customized algorithms.

example: ``"my_tuner.MyTuner"``


codeDirectory
-------------

`Path`_ to the directory containing the customized algorithm class.

type: ``None`` for built-in and registered algorithms, ``str`` for other customized algorithms.


classArgs
---------

Keyword arguments passed to the algorithm class's constructor.

type: ``Optional[dict[str, Any]]``

See the algorithm's documentation for supported values.
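
A sketch of the second way of describing a customized algorithm (code directory plus class name). The directory ``./my_tuner_dir`` and the module file ``my_tuner.py`` defining ``MyTuner`` are hypothetical, and ``classArgs`` must match whatever that class's constructor actually accepts:

.. code-block:: yaml

    tuner:
      className: my_tuner.MyTuner     # module my_tuner, class MyTuner (hypothetical)
      codeDirectory: ./my_tuner_dir   # directory containing my_tuner.py
      classArgs:
        optimize_mode: maximize       # passed to MyTuner's constructor as a keyword argument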


TrainingServiceConfig
^^^^^^^^^^^^^^^^^^^^^

One of the following:

- `LocalConfig`_
- `RemoteConfig`_
- :ref:`OpenpaiConfig <openpai-class>`
- `AmlConfig`_
- `HybridConfig`_

For `Kubeflow <../TrainingService/KubeflowMode.rst>`_, `FrameworkController <../TrainingService/FrameworkControllerMode.rst>`_, and `AdaptDL <../TrainingService/AdaptDLMode.rst>`_ training platforms, it is suggested to use `v1 config schema <../Tutorial/ExperimentConfig.rst>`_ for now.


LocalConfig
-----------

Detailed usage can be found `here <../TrainingService/LocalMode.rst>`__.

platform
""""""""

Constant string ``"local"``.


useActiveGpu
""""""""""""

Specify whether NNI should submit trials to GPUs occupied by other tasks.

type: ``Optional[bool]``

Must be set when `trialGpuNumber`_ is greater than zero.

The following processes can make a GPU "active":

  - non-NNI CUDA programs
  - graphical desktop
  - trials submitted by other NNI instances, if you have more than one NNI experiment running at the same time
  - other users' CUDA programs, if you are using a shared server

If you are using a graphical OS like Windows 10 or Ubuntu desktop, set this field to ``True``; otherwise, the GUI will prevent NNI from launching any trial.

When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously.


maxTrialNumberPerGpu
""""""""""""""""""""

Specify how many trials can share one GPU.

type: ``int``

default: ``1``


gpuIndices
""""""""""

Limit the GPUs visible to trial processes.

type: ``Optional[list[int] | str | int]``

If `trialGpuNumber`_ is less than the length of this value, only a subset will be visible to each trial.

This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable.
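
Putting the local-mode fields together, a sketch (values chosen arbitrarily) that lets up to two trials share each of GPUs 0 and 1, even if those GPUs are already used by other processes such as a desktop GUI:

.. code-block:: yaml

    trialGpuNumber: 1
    trainingService:
      platform: local
      useActiveGpu: true        # required here because trialGpuNumber > 0
      maxTrialNumberPerGpu: 2   # up to two trials per GPU
      gpuIndices: [0, 1]        # trials only see GPUs 0 and 1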


RemoteConfig
------------

Detailed usage can be found `here <../TrainingService/RemoteMachineMode.rst>`__.

platform
""""""""

Constant string ``"remote"``.


machineList
"""""""""""

List of training machines.

type: list of `RemoteMachineConfig`_


reuseMode
"""""""""

Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__.

type: ``bool``


RemoteMachineConfig
"""""""""""""""""""

host
****

IP or hostname (domain name) of the machine.

type: ``str``


port
****

SSH service port.

type: ``int``

default: ``22``


user
****

Login user name.

type: ``str``


password
********

Login password.

type: ``Optional[str]``

If not specified, `sshKeyFile`_ will be used instead.


sshKeyFile
**********

`Path`_ to the SSH key file (identity file).

type: ``Optional[str]``

Only used when `password`_ is not specified.


sshPassphrase
*************

Passphrase of SSH identity file.

type: ``Optional[str]``


useActiveGpu
************

Specify whether NNI should submit trials to GPUs occupied by other tasks.

type: ``bool``

default: ``False``

Must be set when `trialGpuNumber`_ is greater than zero.

The following processes can make a GPU "active":

  - non-NNI CUDA programs
  - graphical desktop
  - trials submitted by other NNI instances, if you have more than one NNI experiment running at the same time
  - other users' CUDA programs, if you are using a shared server

If your remote machine runs a graphical OS like Ubuntu desktop, set this field to ``True``; otherwise, the GUI will prevent NNI from launching any trial.

When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously.


maxTrialNumberPerGpu
********************

Specify how many trials can share one GPU.

type: ``int``

default: ``1``


gpuIndices
**********

Limit the GPUs visible to trial processes.

type: ``Optional[list[int] | str | int]``

If `trialGpuNumber`_ is less than the length of this value, only a subset will be visible to each trial.

This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable.


pythonPath
**********

Specify a Python environment.

type: ``Optional[str]``

This path will be inserted at the front of ``PATH``. Here are some examples:

    - (linux) pythonPath: ``/opt/python3.7/bin``
    - (windows) pythonPath: ``C:/Python37``

If you are using Anaconda, the setting is slightly different. On Windows, you also have to add ``../Scripts`` and ``../Library/bin``, separated by ``;``. Examples are as below:

    - (linux anaconda) pythonPath: ``/home/yourname/anaconda3/envs/myenv/bin/``
    - (windows anaconda) pythonPath: ``C:/Users/yourname/.conda/envs/myenv;C:/Users/yourname/.conda/envs/myenv/Scripts;C:/Users/yourname/.conda/envs/myenv/Library/bin``

This is useful if the preparation steps differ across machines.
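
A sketch of a remote training service that uses the per-machine options above. The host, port, user name, passphrase, and Anaconda path are placeholders, and reuse mode is switched on only to show where the field goes:

.. code-block:: yaml

    trialGpuNumber: 1
    trainingService:
      platform: remote
      reuseMode: true
      machineList:
        - host: my.domain.com
          port: 2222                  # hypothetical non-default SSH port
          user: bob
          sshKeyFile: ~/.ssh/id_rsa
          sshPassphrase: xxxxx        # passphrase of the identity file, if it has one
          useActiveGpu: false         # must be set because trialGpuNumber > 0
          maxTrialNumberPerGpu: 1
          gpuIndices: [0]
          pythonPath: /home/bob/anaconda3/envs/myenv/bin/   # inserted at the front of PATH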

.. _openpai-class:

OpenpaiConfig
-------------

Detailed usage can be found `here <../TrainingService/PaiMode.rst>`__.

platform
""""""""

Constant string ``"openpai"``.


host
""""

Hostname of OpenPAI service.

type: ``str``

This may include an ``https://`` or ``http://`` prefix.

HTTPS will be used by default.


username
""""""""

OpenPAI user name.

type: ``str``


token
"""""

OpenPAI user token.

type: ``str``

This can be found in your OpenPAI user settings page.


trialCpuNumber
""""""""""""""

Number of CPUs allocated to each trial in the OpenPAI container.

type: ``int``


trialMemorySize
"""""""""""""""

Memory size allocated to each trial in the OpenPAI container.

type: ``str``

format: ``number + tb|gb|mb|kb``

examples: ``"8gb"``, ``"8192mb"``


storageConfigName
"""""""""""""""""

Specify the name of the storage config used in OpenPAI.

type: ``str``


dockerImage
"""""""""""

Name and tag of the Docker image used to run the trials.

type: ``str``

default: ``"msranni/nni:latest"``


localStorageMountPoint
""""""""""""""""""""""

:ref:`Mount point <path>` of the storage service (typically NFS) on the local machine.

type: ``str``


containerStorageMountPoint
""""""""""""""""""""""""""

Mount point of the storage service (typically NFS) in the Docker container.

type: ``str``

This must be an absolute path.


reuseMode
"""""""""

Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__.

type: ``bool``

default: ``False``


openpaiConfig
"""""""""""""

Embedded OpenPAI config file.

type: ``Optional[JSON]``


openpaiConfigFile
"""""""""""""""""

`Path`_ to OpenPAI config file.

type: ``Optional[str]``

An example can be found `here <https://github.com/microsoft/pai/blob/master/docs/manual/cluster-user/examples/hello-world-job.yaml>`__.
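
A sketch of an OpenPAI training service combining the fields above. The host, credentials, storage config name, and mount points are placeholders to be replaced with the values of your own cluster:

.. code-block:: yaml

    trialGpuNumber: 1
    trainingService:
      platform: openpai
      host: https://pai.example.com              # hypothetical OpenPAI address
      username: alice
      token: xxxxx                               # from the OpenPAI user settings page
      trialCpuNumber: 4
      trialMemorySize: 8gb
      storageConfigName: my-nfs-storage          # hypothetical storage config name
      dockerImage: msranni/nni:latest
      localStorageMountPoint: /mnt/nfs/nni       # placeholder local mount point
      containerStorageMountPoint: /mnt/nfs/nni   # must be an absolute path
      reuseMode: false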


AmlConfig
---------

Detailed usage can be found `here <../TrainingService/AMLMode.rst>`__.


platform
""""""""

Constant string ``"aml"``.


dockerImage
"""""""""""

Name and tag of the Docker image used to run the trials.

type: ``str``

default: ``"msranni/nni:latest"``


subscriptionId
""""""""""""""

Azure subscription ID.

type: ``str``


resourceGroup
"""""""""""""

Azure resource group name.

type: ``str``


workspaceName
"""""""""""""

Azure workspace name.

type: ``str``


computeTarget
"""""""""""""

AML compute cluster name.

type: ``str``
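
A sketch of an AML training service using the fields above. All Azure identifiers below are placeholders:

.. code-block:: yaml

    trainingService:
      platform: aml
      dockerImage: msranni/nni:latest
      subscriptionId: 00000000-0000-0000-0000-000000000000   # placeholder subscription ID
      resourceGroup: my-resource-group
      workspaceName: my-workspace
      computeTarget: my-compute-cluster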


HybridConfig
------------

Currently only `LocalConfig`_, `RemoteConfig`_, :ref:`OpenpaiConfig <openpai-class>`, and `AmlConfig`_ are supported. Detailed usage can be found `here <../TrainingService/HybridMode.rst>`__.

type: list of `TrainingServiceConfig`_
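
Since the type is a list of training service configs, a hybrid setup can be sketched by giving ``trainingService`` a list of platforms. The machine details are placeholders, and the exact format should be checked against the detailed usage linked above:

.. code-block:: yaml

    trainingService:
      - platform: local
        useActiveGpu: false
      - platform: remote
        machineList:
          - host: 11.22.33.44
            user: alice
            sshKeyFile: ~/.ssh/id_rsa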


SharedStorageConfig
^^^^^^^^^^^^^^^^^^^

Detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__.


nfsConfig
---------

storageType
"""""""""""

Constant string ``"NFS"``.


localMountPoint
"""""""""""""""

The path where the storage has been or will be mounted on the local machine.

type: ``str``

If the path does not exist, it will be created automatically. An absolute path is recommended, e.g. ``/tmp/nni-shared-storage``.


remoteMountPoint
""""""""""""""""

The path where the storage will be mounted on the remote machine.

type: ``str``

If the path does not exist, it will be created automatically. A relative path is recommended, e.g. ``./nni-shared-storage``.


localMounted
""""""""""""

Specify whether the shared storage is mounted on the local machine, and by whom.

type: ``str``

values: ``"usermount"``, ``"nnimount"``, ``"nomount"``

- ``usermount``: the user has already mounted this storage on ``localMountPoint``.
- ``nnimount``: NNI will try to mount this storage on ``localMountPoint``.
- ``nomount``: the storage will not be mounted on the local machine. Partial storages will be supported in the future.


nfsServer
"""""""""

NFS server host.

type: ``str``


exportedDirectory
"""""""""""""""""

Exported directory of NFS server, detailed `here <https://www.ibm.com/docs/en/aix/7.2?topic=system-nfs-exporting-mounting>`_.

type: ``str``
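
A sketch of an NFS shared storage config in which NNI mounts the storage locally. The server address and exported directory are hypothetical:

.. code-block:: yaml

    sharedStorage:
      storageType: NFS
      localMountPoint: /tmp/nni-shared-storage
      remoteMountPoint: ./nni-shared-storage
      localMounted: nnimount                # NNI mounts the storage on localMountPoint
      nfsServer: 10.0.0.10                  # hypothetical NFS server
      exportedDirectory: /exports/nni       # hypothetical exported directory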


azureBlobConfig
---------------

storageType
"""""""""""

Constant string ``"AzureBlob"``.


localMountPoint
"""""""""""""""

The path where the storage has been or will be mounted on the local machine.

type: ``str``

If the path does not exist, it will be created automatically. An absolute path is recommended, e.g. ``/tmp/nni-shared-storage``.


remoteMountPoint
""""""""""""""""

The path where the storage will be mounted on the remote machine.

type: ``str``

If the path does not exist, it will be created automatically. A relative path is recommended, e.g. ``./nni-shared-storage``.

Note that this directory must be empty when using AzureBlob.


localMounted
""""""""""""

Specify whether the shared storage is mounted on the local machine, and by whom.

type: ``str``

values: ``"usermount"``, ``"nnimount"``, ``"nomount"``

- ``usermount``: the user has already mounted this storage on ``localMountPoint``.
- ``nnimount``: NNI will try to mount this storage on ``localMountPoint``.
- ``nomount``: the storage will not be mounted on the local machine. Partial storages will be supported in the future.


storageAccountName
""""""""""""""""""

Azure storage account name.

type: ``str``


storageAccountKey
"""""""""""""""""

Azure storage account key.

type: ``Optional[str]``

If ``storageAccountKey`` is not set, you should first log in with ``az login`` using the Azure CLI, and set `resourceGroupName`_.


resourceGroupName
"""""""""""""""""

Resource group that AzureBlob container belongs to.

type: ``Optional[str]``

Required if ``storageAccountKey`` is not set.

containerName
"""""""""""""

AzureBlob container name.

type: ``str``
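
A sketch of an AzureBlob shared storage config that omits ``storageAccountKey`` and therefore relies on ``az login`` together with `resourceGroupName`_. The account, resource group, and container names are placeholders:

.. code-block:: yaml

    sharedStorage:
      storageType: AzureBlob
      localMountPoint: /tmp/nni-shared-storage
      remoteMountPoint: ./nni-shared-storage   # must be an empty directory
      localMounted: usermount                  # already mounted by the user
      storageAccountName: mystorageaccount
      resourceGroupName: my-resource-group     # needed because storageAccountKey is not set
      containerName: my-container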