===========================
Experiment Config Reference
===========================

A config file is needed when creating an experiment. This document describes the rules for writing a config file and provides some examples.

.. Note::

    1. This document lists field names in ``camelCase``. When these fields are used through the NNI Python APIs (e.g., ``nni.experiment``), the field names should be converted to ``snake_case``.

    2. In this document, the types of fields are formatted as `Python type hints <https://docs.python.org/3.10/library/typing.html>`_. Therefore, JSON objects are called ``dict`` and arrays are called ``list``.

    .. _path: 

    3. Some fields take a path to a file or directory. Unless otherwise noted, both absolute and relative paths are supported, and ``~`` will be expanded to the home directory.

       - When written in the YAML file, relative paths are relative to the directory containing that file.
       - When assigned in Python code, relative paths are relative to the current working directory.
       - All relative paths are converted to absolute paths when a YAML file is loaded into a Python class, and when a Python class is saved to a YAML file.

    4. Setting a field to ``None`` or ``null`` is equivalent to not setting the field.

.. contents:: Contents
   :local:
   :depth: 3
 

Examples
========

Local Mode
^^^^^^^^^^

.. code-block:: yaml

    experimentName: MNIST
    searchSpaceFile: search_space.json
    trialCommand: python mnist.py
    trialCodeDirectory: .
    trialGpuNumber: 1
    trialConcurrency: 2
    maxExperimentDuration: 24h
    maxTrialNumber: 100
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: local
      useActiveGpu: True

Local Mode (Inline Search Space)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    searchSpace:
      batch_size:
        _type: choice
        _value: [16, 32, 64]
      learning_rate:
        _type: loguniform
        _value: [0.0001, 0.1]
    trialCommand: python mnist.py
    trialGpuNumber: 1
    trialConcurrency: 2
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: local
      useActiveGpu: True

Remote Mode
^^^^^^^^^^^

.. code-block:: yaml

    experimentName: MNIST
    searchSpaceFile: search_space.json
    trialCommand: python mnist.py
    trialCodeDirectory: .
    trialGpuNumber: 1
    trialConcurrency: 2
    maxExperimentDuration: 24h
    maxTrialNumber: 100
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: remote
      machineList:
        - host: 11.22.33.44
          user: alice
          password: xxxxx
        - host: my.domain.com
          user: bob
          sshKeyFile: ~/.ssh/id_rsa

Reference
=========

ExperimentConfig
^^^^^^^^^^^^^^^^

experimentName
--------------

Mnemonic name of the experiment, which will be shown in WebUI and nnictl.

type: ``Optional[str]``


searchSpaceFile
---------------

Path_ to the JSON file containing the search space.

type: ``Optional[str]``

The search space format is determined by the tuner. The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__.

Mutually exclusive with `searchSpace`_.


searchSpace
-----------

Search space object.

type: ``Optional[JSON]``

The format is determined by the tuner. The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__.

Note that ``None`` means "no such field", so an empty search space should be written as ``{}``.

Mutually exclusive with `searchSpaceFile`_.
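
A minimal sketch of the two mutually exclusive forms, reusing the ``search_space.json`` file name from the examples above; a real config should contain only one of them:

.. code-block:: yaml

    # Option 1: load the search space from a JSON file
    searchSpaceFile: search_space.json

    # Option 2 (use instead of option 1): an explicitly empty inline search space
    # searchSpace: {}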


trialCommand
------------

Command to launch a trial.

type: ``str``

The command will be executed in bash on Linux and macOS, and in PowerShell on Windows.

Note that the Python interpreter should be invoked as ``python3`` on Linux and macOS, and as ``python`` on Windows.
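
For example, the same trial script could be launched as follows on each platform (``mnist.py`` is taken from the examples above):

.. code-block:: yaml

    # Linux and macOS (executed in bash)
    trialCommand: python3 mnist.py

    # Windows (executed in PowerShell): use this line instead
    # trialCommand: python mnist.py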


trialCodeDirectory
------------------

`Path`_ to the directory containing trial source files.

type: ``str``

default: ``"."``

All files in this directory will be sent to the training machine, except those excluded by the ``.nniignore`` file.
(See :ref:`nniignore <nniignore>` for details.)
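
As an illustration of the path rules in the note at the top of this document, assume (hypothetically) that the config file is saved as ``/home/alice/mnist/config.yml`` and the trial code lives in a ``src`` subdirectory:

.. code-block:: yaml

    # Relative paths in a YAML file are resolved against the directory of that file
    trialCodeDirectory: src              # resolved to /home/alice/mnist/src
    searchSpaceFile: search_space.json   # resolved to /home/alice/mnist/search_space.json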


trialConcurrency
----------------

Specify how many trials should be run concurrently.

type: ``int``

The real concurrency also depends on hardware resources and may be less than this value.


trialGpuNumber
--------------

Number of GPUs used by each trial.

type: ``Optional[int]``

This field might have slightly different meanings for various training services,
especially when set to ``0`` or ``None``.
See the `training service documentation <../training_services.rst>`__ for details.

In local mode, setting the field to ``0`` will prevent trials from accessing any GPU (``CUDA_VISIBLE_DEVICES`` is set to an empty value).
When set to ``None``, trials will be created and scheduled as if they did not use a GPU,
but they can still use all GPU resources if they want.
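
A minimal local-mode sketch in which trials are kept off the GPU entirely:

.. code-block:: yaml

    trialGpuNumber: 0    # in local mode, trials get an empty CUDA_VISIBLE_DEVICES
    trainingService:
      platform: local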


maxExperimentDuration
---------------------

Limit the duration of this experiment if specified.

type: ``Optional[str]``

format: ``number + s|m|h|d``

examples: ``"10m"``, ``"0.5h"``

When time runs out, the experiment will stop creating trials but will continue to serve the WebUI.


maxTrialNumber
--------------

Limit the number of trials to create if specified.

type: ``Optional[int]``

When the budget runs out, the experiment will stop creating trials but will continue to serve the WebUI.


maxTrialDuration
---------------------

Limit the duration of each trial job if specified.

type: ``Optional[str]``

format: ``number + s|m|h|d``

examples: ``"10m"``, ``"0.5h"``

When a trial job runs longer than this limit, it will be stopped.
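
The three budget fields can be combined. In the sketch below (values are illustrative), the experiment stops creating trials as soon as either experiment-level budget runs out, and any single trial running longer than 30 minutes is stopped:

.. code-block:: yaml

    maxExperimentDuration: 24h   # experiment-level time budget
    maxTrialNumber: 100          # experiment-level trial-count budget
    maxTrialDuration: 0.5h       # per-trial time limit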


nniManagerIp
------------

IP of the current machine, used by training machines to access the NNI manager. Not used in local mode.

type: ``Optional[str]``

If not specified, the IPv4 address of ``eth0`` will be used.

Except for the local mode, it is highly recommended to set this field manually.
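
For example, in remote mode the field could be set to an address of this machine that the training machines can reach (``10.0.0.5`` is a hypothetical placeholder; the machine entry is copied from the remote example above):

.. code-block:: yaml

    nniManagerIp: 10.0.0.5
    trainingService:
      platform: remote
      machineList:
        - host: 11.22.33.44
          user: alice
          password: xxxxx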


useAnnotation
-------------

Enable `annotation <../Tutorial/AnnotationSpec.rst>`__.

type: ``bool``

default: ``False``

When using annotation, `searchSpace`_ and `searchSpaceFile`_ should not be specified manually.


debug
-----

Enable debug mode.

type: ``bool``

default: ``False``

When enabled, logging will be more verbose and some internal validation will be loosened.


logLevel
--------

Set the log level of the whole system.

type: ``Optional[str]``

values: ``"trace"``, ``"debug"``, ``"info"``, ``"warning"``, ``"error"``, ``"fatal"``

Defaults to ``"info"`` or ``"debug"``, depending on the `debug`_ option: when debug mode is enabled, the log level is set to ``"debug"``; otherwise, it is set to ``"info"``.

Most modules of NNI will be affected by this value, including the NNI manager, tuner, training service, etc.

The exception is the trial, whose logging level is managed directly by the trial code.

For Python modules, ``"trace"`` acts as logging level 0 and ``"fatal"`` acts as ``logging.CRITICAL``.
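
A short sketch of how the two options interact. With ``debug`` enabled and ``logLevel`` left unset, the log level would default to ``"debug"``; setting ``logLevel`` explicitly is assumed here to take precedence over that default:

.. code-block:: yaml

    debug: true
    logLevel: trace   # explicit value instead of the "debug" default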


experimentWorkingDirectory
--------------------------

Specify the :ref:`directory <path>` in which to place logs, checkpoints, metadata, and other run-time files.

type: ``Optional[str]``

Defaults to ``~/nni-experiments``.

NNI will create a subdirectory named after the experiment ID, so it is safe to use the same directory for multiple experiments.
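
For instance, several experiments can share one working directory (``~/my-nni-experiments`` is a hypothetical path; ``~`` is expanded to the home directory):

.. code-block:: yaml

    experimentWorkingDirectory: ~/my-nni-experiments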


tunerGpuIndices
---------------

Limit the GPUs visible to the tuner, assessor, and advisor.

type: ``Optional[list[int] | str | int]``

This will be used as the ``CUDA_VISIBLE_DEVICES`` environment variable of the tuner process.

Because the tuner, assessor, and advisor run in the same process, this option affects all of them.
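
A minimal sketch that restricts the tuner process to two GPUs. Only the list form is shown; the type hint also allows a string or a single integer, whose exact accepted format is not described here:

.. code-block:: yaml

    tunerGpuIndices: [0, 1]   # tuner, assessor, and advisor see only GPUs 0 and 1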


tuner
-----

Specify the tuner. 

type: Optional `AlgorithmConfig`_

The built-in tuners can be found `here <../builtin_tuner.rst>`__ and you can follow `this tutorial <../Tuner/CustomizeTuner.rst>`__ to customize a new tuner.


assessor
--------

Specify the assessor. 

type: Optional `AlgorithmConfig`_

The built-in assessors can be found `here <../builtin_assessor.rst>`__ and you can follow `this tutorial <../Assessor/CustomizeAssessor.rst>`__ to customize a new assessor.


advisor
-------

Specify the advisor. 

type: Optional `AlgorithmConfig`_

NNI provides two built-in advisors: `BOHB <../Tuner/BohbAdvisor.rst>`__ and `Hyperband <../Tuner/HyperbandAdvisor.rst>`__, and you can follow `this tutorial <../Tuner/CustomizeAdvisor.rst>`__ to customize a new advisor.


trainingService
---------------

Specify the `training service <../TrainingService/Overview.rst>`__.

type: `TrainingServiceConfig`_


sharedStorage
-------------

Configure the shared storage. Detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__.

type: Optional `SharedStorageConfig`_


AlgorithmConfig
^^^^^^^^^^^^^^^

``AlgorithmConfig`` describes a tuner / assessor / advisor algorithm.

For customized algorithms, there are two ways to describe them:

  1. `Register the algorithm <../Tutorial/InstallCustomizedAlgos.rst>`__ to use it like a built-in one (preferred).

  2. Specify the code directory and class name directly.


name
----

Name of the built-in or registered algorithm.

type: ``str`` for built-in and registered algorithms, ``None`` for other customized algorithms.


className
---------

Qualified class name of a customized algorithm that is not registered.

type: ``None`` for built-in and registered algorithms, ``str`` for other customized algorithms.

example: ``"my_tuner.MyTuner"``


codeDirectory
-------------

`Path`_ to the directory containing the customized algorithm class.

type: ``None`` for built-in and registered algorithms, ``str`` for other customized algorithms.


classArgs
---------

Keyword arguments passed to the algorithm class's constructor.

type: ``Optional[dict[str, Any]]``

See the algorithm's documentation for supported values.
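
A sketch of the second way of describing a customized algorithm (code directory plus class name). ``my_tuner_dir``, ``my_tuner.MyTuner``, and the ``optimize_mode`` argument are hypothetical and must match your own code:

.. code-block:: yaml

    tuner:
      codeDirectory: ./my_tuner_dir   # hypothetical directory containing my_tuner.py
      className: my_tuner.MyTuner     # module my_tuner, class MyTuner
      classArgs:
        optimize_mode: maximize       # passed to MyTuner's constructor as a keyword argument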


TrainingServiceConfig
^^^^^^^^^^^^^^^^^^^^^

One of the following:

- `LocalConfig`_
- `RemoteConfig`_
- :ref:`OpenpaiConfig <openpai-class>`
- `AmlConfig`_
- `HybridConfig`_

For `Kubeflow <../TrainingService/KubeflowMode.rst>`_, `FrameworkController <../TrainingService/FrameworkControllerMode.rst>`_, and `AdaptDL <../TrainingService/AdaptDLMode.rst>`_ training platforms, it is suggested to use `v1 config schema <../Tutorial/ExperimentConfig.rst>`_ for now.


LocalConfig
-----------

Detailed usage can be found `here <../TrainingService/LocalMode.rst>`__.

platform
""""""""

Constant string ``"local"``.


useActiveGpu
""""""""""""

Specify whether NNI should submit trials to GPUs occupied by other tasks.

type: ``Optional[bool]``

Must be set when `trialGpuNumber`_ is greater than zero.

The following processes can make a GPU "active":

  - non-NNI CUDA programs
  - graphical desktop
  - trials submitted by other NNI instances, if you have more than one NNI experiment running at the same time
  - other users' CUDA programs, if you are using a shared server
  
If you are using a graphical OS like Windows 10 or Ubuntu desktop, set this field to ``True``; otherwise, the GUI will prevent NNI from launching any trial.

When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously.


maxTrialNumberPerGpu
""""""""""""""""""""

Specify how many trials can share one GPU.

type: ``int``

default: ``1``


gpuIndices
""""""""""

Limit the GPUs visible to trial processes.

type: ``Optional[list[int] | str | int]``

If `trialGpuNumber`_ is less than the length of this value, only a subset will be visible to each trial.

This will be used as the ``CUDA_VISIBLE_DEVICES`` environment variable.
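
A minimal local training service sketch combining the fields above (the GPU indices are illustrative):

.. code-block:: yaml

    trialGpuNumber: 1
    trainingService:
      platform: local
      useActiveGpu: false       # must be set because trialGpuNumber > 0
      maxTrialNumberPerGpu: 2   # up to two trials may share one GPU
      gpuIndices: [0, 1]        # trials only see GPUs 0 and 1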


RemoteConfig
------------

Detailed usage can be found `here <../TrainingService/RemoteMachineMode.rst>`__.

platform
""""""""

Constant string ``"remote"``.


machineList
"""""""""""

List of training machines.

type: list of `RemoteMachineConfig`_


reuseMode
"""""""""

Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__.

type: ``bool``


RemoteMachineConfig
"""""""""""""""""""

host
****

IP or hostname (domain name) of the machine.

type: ``str``


port
****

SSH service port.

type: ``int``

default: ``22``


user
****

Login user name.

type: ``str``


password
********

Login password.

type: ``Optional[str]``

If not specified, `sshKeyFile`_ will be used instead.


sshKeyFile
**********

`Path`_ to the SSH key file (identity file).

type: ``Optional[str]``

Only used when `password`_ is not specified.


sshPassphrase
*************

Passphrase of the SSH identity file.

type: ``Optional[str]``


useActiveGpu
************

Specify whether NNI should submit trials to GPUs occupied by other tasks.

type: ``bool``

default: ``False``

Must be set when `trialGpuNumber`_ is greater than zero.

The following processes can make a GPU "active":

  - non-NNI CUDA programs
  - graphical desktop
  - trials submitted by other NNI instances, if you have more than one NNI experiment running at the same time
  - other users' CUDA programs, if you are using a shared server
  
If your remote machine runs a graphical OS like Ubuntu desktop, set this field to ``True``; otherwise, the GUI will prevent NNI from launching any trial.

When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously.


maxTrialNumberPerGpu
********************

Specify how many trials can share one GPU.

type: ``int``

default: ``1``


gpuIndices
**********

Limit the GPUs visible to trial processes.

type: ``Optional[list[int] | str | int]``

If `trialGpuNumber`_ is less than the length of this value, only a subset will be visible to each trial.

This will be used as the ``CUDA_VISIBLE_DEVICES`` environment variable.


pythonPath
**********

Specify a Python environment.

type: ``Optional[str]``

This path will be inserted at the front of ``PATH``. Here are some examples:

    - (linux) pythonPath: ``/opt/python3.7/bin``
    - (windows) pythonPath: ``C:/Python37``

If you are using Anaconda, there is a difference: on Windows, you also have to add ``../Scripts`` and ``../Library/bin``, separated by ``;``. Examples are as follows:

    - (linux anaconda) pythonPath: ``/home/yourname/anaconda3/envs/myenv/bin/``
    - (windows anaconda) pythonPath: ``C:/Users/yourname/.conda/envs/myenv;C:/Users/yourname/.conda/envs/myenv/Scripts;C:/Users/yourname/.conda/envs/myenv/Library/bin``

This is useful if the preparation steps differ across machines.
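
A sketch of a single remote machine entry that uses most of the fields above; the host, user, key path, passphrase, and Python path are placeholders taken from the examples in this document:

.. code-block:: yaml

    trainingService:
      platform: remote
      machineList:
        - host: my.domain.com
          port: 22
          user: bob
          sshKeyFile: ~/.ssh/id_rsa
          sshPassphrase: xxxxx             # passphrase of the identity file, if it has one
          useActiveGpu: false
          maxTrialNumberPerGpu: 1
          gpuIndices: [0]
          pythonPath: /opt/python3.7/bin   # inserted at the front of PATH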

.. _openpai-class:

OpenpaiConfig
-------------

Detailed usage can be found `here <../TrainingService/PaiMode.rst>`__.

platform
""""""""

Constant string ``"openpai"``.


host
""""

Hostname of the OpenPAI service.

type: ``str``

This may include an ``https://`` or ``http://`` prefix.

HTTPS will be used by default.


username
""""""""

OpenPAI user name.

type: ``str``


token
"""""

OpenPAI user token.

type: ``str``

This can be found in your OpenPAI user settings page.


trialCpuNumber
""""""""""""""

Specify the number of CPUs for each trial in the OpenPAI container.

type: ``int``


trialMemorySize
"""""""""""""""

Specify the memory size for each trial in the OpenPAI container.

type: ``str``

format: ``number + tb|gb|mb|kb``

examples: ``"8gb"``, ``"8192mb"``


storageConfigName
"""""""""""""""""

Specify the name of the storage configuration used in OpenPAI.

type: ``str``


dockerImage
"""""""""""

Name and tag of the docker image used to run the trials.

type: ``str``

default: ``"msranni/nni:latest"``


localStorageMountPoint
""""""""""""""""""""""

:ref:`Mount point <path>` of the storage service (typically NFS) on the local machine.

type: ``str``


containerStorageMountPoint
""""""""""""""""""""""""""

Mount point of the storage service (typically NFS) in the docker container.

type: ``str``

This must be an absolute path.


reuseMode
"""""""""

Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__.

type: ``bool``

default: ``False``


openpaiConfig
"""""""""""""

Embedded OpenPAI config file.

type: ``Optional[JSON]``


openpaiConfigFile
"""""""""""""""""

`Path`_ to the OpenPAI config file.

type: ``Optional[str]``

An example can be found `here <https://github.com/microsoft/pai/blob/master/docs/manual/cluster-user/examples/hello-world-job.yaml>`__.
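
A sketch of an OpenPAI training service built from the fields above. All values are placeholders (the host, storage config name, and mount points are hypothetical), and which of these fields are required is not stated here:

.. code-block:: yaml

    trainingService:
      platform: openpai
      host: https://pai.example.com
      username: alice
      token: xxxxx                               # from the OpenPAI user settings page
      trialCpuNumber: 2
      trialMemorySize: 8gb
      storageConfigName: my-nfs-storage
      dockerImage: msranni/nni:latest
      localStorageMountPoint: /mnt/nfs/nni       # local mount of the storage service
      containerStorageMountPoint: /mnt/nfs/nni   # must be an absolute path
      reuseMode: false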


AmlConfig
---------

Detailed usage can be found `here <../TrainingService/AMLMode.rst>`__.


platform
""""""""

Constant string ``"aml"``.


dockerImage
"""""""""""

Name and tag of the docker image used to run the trials.

type: ``str``

default: ``"msranni/nni:latest"``


subscriptionId
""""""""""""""

Azure subscription ID.

type: ``str``


resourceGroup
"""""""""""""

Azure resource group name.

type: ``str``


workspaceName
"""""""""""""

Azure workspace name.

type: ``str``


computeTarget
"""""""""""""

AML compute cluster name.

type: ``str``
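
Putting the AML fields together, a training service section might look like the sketch below. It assumes the experiment-level ``trainingService`` field. Every Azure value is a hypothetical placeholder, to be replaced with your own subscription, resource group, workspace and compute cluster. ``dockerImage`` is shown with its default value and could be left out.

.. code-block:: yaml

    trainingService:
      platform: aml
      dockerImage: msranni/nni:latest                 # default value, may be omitted
      subscriptionId: "<your-subscription-id>"        # hypothetical placeholder
      resourceGroup: "<your-resource-group>"          # hypothetical placeholder
      workspaceName: "<your-workspace-name>"          # hypothetical placeholder
      computeTarget: "<your-compute-cluster>"         # hypothetical placeholder
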


HybridConfig
------------

Currently, only `LocalConfig`_, `RemoteConfig`_, :ref:`OpenpaiConfig <openpai-class>` and `AmlConfig`_ are supported. Detailed usage can be found `here <../TrainingService/HybridMode.rst>`__.

type: list of `TrainingServiceConfig`_
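
As an illustration, a hybrid experiment could list several training services under ``trainingService``. The sketch below combines the local machine with an AML cluster configured with the AML fields described above. The list form follows the type given here. The Azure values are hypothetical placeholders.

.. code-block:: yaml

    trainingService:
      - platform: local
      - platform: aml
        dockerImage: msranni/nni:latest
        subscriptionId: "<your-subscription-id>"      # hypothetical placeholder
        resourceGroup: "<your-resource-group>"        # hypothetical placeholder
        workspaceName: "<your-workspace-name>"        # hypothetical placeholder
        computeTarget: "<your-compute-cluster>"       # hypothetical placeholder
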


SharedStorageConfig
^^^^^^^^^^^^^^^^^^^

Detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__.


nfsConfig
---------

storageType
"""""""""""

Constant string ``"NFS"``.


localMountPoint
"""""""""""""""

The path where the storage has been or will be mounted on the local machine.

type: ``str``

If the path does not exist, it will be created automatically. An absolute path is recommended, e.g. ``/tmp/nni-shared-storage``.


remoteMountPoint
""""""""""""""""

The path where the storage will be mounted on the remote machine.

type: ``str``

If the path does not exist, it will be created automatically. A relative path is recommended, e.g. ``./nni-shared-storage``.


localMounted
""""""""""""

Specifies whether the shared storage is mounted on the local machine, and by whom.

type: ``str``

values: ``"usermount"``, ``"nnimount"``, ``"nomount"``

- ``"usermount"``: the user has already mounted this storage on ``localMountPoint``.
- ``"nnimount"``: NNI will try to mount this storage on ``localMountPoint``.
- ``"nomount"``: the storage will not be mounted on the local machine. Support for partial storages is planned for the future.


nfsServer
"""""""""

NFS server host.

type: ``str``


exportedDirectory
"""""""""""""""""

Exported directory of the NFS server. See `here <https://www.ibm.com/docs/en/aix/7.2?topic=system-nfs-exporting-mounting>`__ for details.

type: ``str``
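
The NFS fields combine into the minimal sketch below. It assumes the section sits under an experiment-level ``sharedStorage`` field. The server host and exported directory are hypothetical placeholders. The mount points follow the recommendations above: an absolute local path and a relative remote path. ``nnimount`` asks NNI to mount the storage locally.

.. code-block:: yaml

    sharedStorage:
      storageType: NFS
      localMountPoint: /tmp/nni-shared-storage        # created automatically if missing
      remoteMountPoint: ./nni-shared-storage          # created automatically if missing
      localMounted: nnimount                          # NNI mounts the storage locally
      nfsServer: "<nfs-server-host>"                  # hypothetical placeholder
      exportedDirectory: "<exported-directory>"       # hypothetical placeholder
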


azureBlobConfig
---------------

storageType
"""""""""""

Constant string ``"AzureBlob"``.


localMountPoint
"""""""""""""""

The path where the storage has been or will be mounted on the local machine.

type: ``str``

If the path does not exist, it will be created automatically. An absolute path is recommended, e.g. ``/tmp/nni-shared-storage``.


remoteMountPoint
""""""""""""""""

The path where the storage will be mounted on the remote machine.

type: ``str``

If the path does not exist, it will be created automatically. A relative path is recommended, e.g. ``./nni-shared-storage``.

Note that the directory must be empty when using AzureBlob. 


localMounted
""""""""""""

Specifies whether the shared storage is mounted on the local machine, and by whom.

type: ``str``

values: ``"usermount"``, ``"nnimount"``, ``"nomount"``

- ``"usermount"``: the user has already mounted this storage on ``localMountPoint``.
- ``"nnimount"``: NNI will try to mount this storage on ``localMountPoint``.
- ``"nomount"``: the storage will not be mounted on the local machine. Support for partial storages is planned for the future.


storageAccountName
""""""""""""""""""

Azure storage account name.

type: ``str``


storageAccountKey
"""""""""""""""""

Azure storage account key.

type: ``Optional[str]``

If ``storageAccountKey`` is not set, you must first log in with the Azure CLI (``az login``) and set `resourceGroupName`_.


resourceGroupName
"""""""""""""""""

Resource group that the AzureBlob container belongs to.

type: ``Optional[str]``

Required if ``storageAccountKey`` is not set.
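
To keep the account key out of the config file, you can omit ``storageAccountKey`` and supply ``resourceGroupName`` instead, after running ``az login``. The partial sketch below shows only the authentication-related fields and assumes the experiment-level ``sharedStorage`` field. The account and resource group names are hypothetical placeholders.

.. code-block:: yaml

    # Run `az login` with the Azure CLI before starting the experiment.
    sharedStorage:
      storageType: AzureBlob
      # ... mount point, localMounted and containerName fields omitted ...
      storageAccountName: "<your-storage-account>"    # hypothetical placeholder
      resourceGroupName: "<your-resource-group>"      # required because no key is set
      # storageAccountKey is intentionally left unset
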

containerName
"""""""""""""

AzureBlob container name.

type: ``str``
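
Combining the fields above, an AzureBlob shared storage section that authenticates with an account key might be sketched as follows. It assumes the experiment-level ``sharedStorage`` field. The account name, key and container are hypothetical placeholders. As noted under ``remoteMountPoint``, the directory must be empty when using AzureBlob.

.. code-block:: yaml

    sharedStorage:
      storageType: AzureBlob
      localMountPoint: /tmp/nni-shared-storage        # created automatically if missing
      remoteMountPoint: ./nni-shared-storage          # must be empty for AzureBlob
      localMounted: nnimount                          # NNI mounts the storage locally
      storageAccountName: "<your-storage-account>"    # hypothetical placeholder
      storageAccountKey: "<your-storage-account-key>" # hypothetical placeholder
      containerName: "<your-container>"               # hypothetical placeholder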