# Experiment config reference

A config file is required to create an experiment; its path is passed to nnictl.
The config file is written in YAML format and must be valid YAML.
This document describes the rules for writing a config file and provides templates and examples.

 - [Template](#Template) (templates of a config file)
 - [Configuration spec](#Configuration) (specification of every attribute in the config file)
 - [Examples](#Examples) (example config files)

<a name="Template"></a>
## Template
* __Lightweight (without Annotation and Assessor)__

```
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote, pai, kubeflow
trainingServicePlatform: 
searchSpacePath: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
trial:
  command: 
  codeDir: 
  gpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
```

* __Use Assessor__

```
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote, pai, kubeflow
trainingServicePlatform: 
searchSpacePath: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
assessor:
  #choice: Medianstop
  builtinAssessorName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
trial:
  command: 
  codeDir: 
  gpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
```

* __Use Annotation__

```
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote, pai, kubeflow
trainingServicePlatform: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
assessor:
  #choice: Medianstop
  builtinAssessorName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
trial:
  command: 
  codeDir: 
  gpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
```
<a name="Configuration"></a>
## Configuration spec
* __authorName__
  * Description  
            
    __authorName__ is the name of the author who creates the experiment.

    TBD: add default value
	 
* __experimentName__
  * Description
  
    __experimentName__ is the name of the experiment.

    TBD: add default value
	
* __trialConcurrency__
  * Description
    
    __trialConcurrency__ specifies the maximum number of trial jobs that run simultaneously.

    Note: if the trial __gpuNum__ is larger than the number of free GPUs, fewer than __trialConcurrency__ trial jobs can run at the same time, and the remaining trial jobs are put into a queue until GPUs are allocated to them.
	 
* __maxExecDuration__
  * Description
    
    __maxExecDuration__ specifies the maximum duration of an experiment. The value is a number followed by a unit from {__s__, __m__, __h__, __d__}, which stand for {_seconds_, _minutes_, _hours_, _days_}, e.g. `1h` or `500h`.

    Note: __maxExecDuration__ limits the duration of the whole experiment, not of a single trial job. When the experiment reaches the maximum duration, it does not stop, but it can no longer submit new trial jobs.
	
* __maxTrialNum__
  * Description

    __maxTrialNum__ specifies the maximum number of trial jobs created by NNI, including both succeeded and failed jobs.
	 
* __trainingServicePlatform__
  * Description
      
    __trainingServicePlatform__ specifies the platform on which to run the experiment: one of {__local__, __remote__, __pai__, __kubeflow__}.

    * __local__ runs the experiment on the local Ubuntu machine.

    * __remote__ submits trial jobs to remote Ubuntu machines. The __machineList__ field must be filled in so that NNI can set up SSH connections to the remote machines.

    * __pai__ submits trial jobs to Microsoft's [OpenPAI](https://github.com/Microsoft/pai). For more details on PAI configuration, refer to [PAIModeDoc](./PAIMode.md).

    * __kubeflow__ submits trial jobs to [Kubeflow](https://www.kubeflow.org/docs/about/kubeflow/). NNI supports Kubeflow on both plain Kubernetes and [Azure Kubernetes Service](https://azure.microsoft.com/en-us/services/kubernetes-service/).
	
* __searchSpacePath__
  * Description
    
    __searchSpacePath__ specifies the path of the search space file, which must be a valid path on the local Linux machine.

    Note: if __useAnnotation__ is set to true, the __searchSpacePath__ field should be removed.
* __useAnnotation__
  * Description
   
    __useAnnotation__ specifies whether NNI analyzes annotations in the trial code to generate the search space.

    Note: if __useAnnotation__ is set to true, the __searchSpacePath__ field should be removed.

* __nniManagerIp__
  * Description
   
    __nniManagerIp__ sets the IP address of the machine on which the NNI manager process runs. This field is optional; if it is not set, the IP address of the eth0 device is used.

    Note: run `ifconfig` on the NNI manager's machine to check whether the eth0 device exists. If it does not, we recommend setting __nniManagerIp__ explicitly.

* __logDir__
  * Description

    __logDir__ configures the directory in which the logs and data of the experiment are stored. The default value is `<user home directory>/nni/experiment`.

* __logLevel__
  * Description

    __logLevel__ sets the log level of the experiment. Available log levels are `trace`, `debug`, `info`, `warning`, `error` and `fatal`. The default value is `info`.


* __tuner__
  * Description
  
    __tuner__ specifies the tuner algorithm of the experiment. There are two ways to set the tuner. One is to use a built-in tuner provided by the NNI SDK, which requires __builtinTunerName__ and __classArgs__. The other is to use your own tuner file, which requires __codeDir__, __classFileName__, __className__ and __classArgs__.

  * __builtinTunerName__ and __classArgs__
    * __builtinTunerName__

      __builtinTunerName__ specifies the name of a built-in tuner. The NNI SDK provides the following tuners: {__TPE__, __Random__, __Anneal__, __Evolution__, __BatchTuner__, __GridSearch__}.
    * __classArgs__

      __classArgs__ specifies the arguments of the tuner algorithm. If __builtinTunerName__ is one of {__TPE__, __Random__, __Anneal__, __Evolution__}, __optimize_mode__ must be set.
  * __codeDir__, __classFileName__, __className__ and __classArgs__
    * __codeDir__

      __codeDir__ specifies the directory of the tuner code.
    * __classFileName__

      __classFileName__ specifies the name of the tuner file.
    * __className__

      __className__ specifies the name of the tuner class.
    * __classArgs__

      __classArgs__ specifies the arguments of the tuner algorithm.
  * __gpuNum__

    __gpuNum__ specifies the number of GPUs used to run the tuner process. The value of this field should be a non-negative integer.

    Note: only one of the two ways of setting the tuner can be used: either {__builtinTunerName__, __classArgs__} or {__codeDir__, __classFileName__, __className__, __classArgs__}, not both.

* __assessor__

  * Description

    __assessor__ specifies the assessor algorithm of the experiment. There are two ways to set the assessor. One is to use a built-in assessor provided by the NNI SDK, which requires __builtinAssessorName__ and __classArgs__. The other is to use your own assessor file, which requires __codeDir__, __classFileName__, __className__ and __classArgs__.

  * __builtinAssessorName__ and __classArgs__
    * __builtinAssessorName__

      __builtinAssessorName__ specifies the name of a built-in assessor. The NNI SDK provides one assessor: {__Medianstop__}.
    * __classArgs__

      __classArgs__ specifies the arguments of the assessor algorithm.
  * __codeDir__, __classFileName__, __className__ and __classArgs__
    * __codeDir__

      __codeDir__ specifies the directory of the assessor code.
    * __classFileName__

      __classFileName__ specifies the name of the assessor file.
    * __className__

      __className__ specifies the name of the assessor class.
    * __classArgs__

      __classArgs__ specifies the arguments of the assessor algorithm.
  * __gpuNum__

    __gpuNum__ specifies the number of GPUs used to run the assessor process. The value of this field should be a non-negative integer.

    Note: only one of the two ways of setting the assessor can be used: either {__builtinAssessorName__, __classArgs__} or {__codeDir__, __classFileName__, __className__, __classArgs__}, not both. If you do not want to use an assessor, leave the __assessor__ field empty.
* __trial(local, remote)__
  * __command__

    __command__ specifies the command to run the trial process.
  * __codeDir__

    __codeDir__ specifies the directory of your own trial files.
  * __gpuNum__

    __gpuNum__ specifies the number of GPUs used to run the trial process. The default value is 0.

* __trial(pai)__
  * __command__

    __command__ specifies the command to run the trial process.
  * __codeDir__

    __codeDir__ specifies the directory of your own trial files.
  * __gpuNum__

    __gpuNum__ specifies the number of GPUs used to run the trial process. The default value is 0.
  * __cpuNum__

    __cpuNum__ is the number of CPUs to be used in the PAI container.
  * __memoryMB__

    __memoryMB__ sets the memory size, in MB, to be used in the PAI container.

  * __image__

    __image__ sets the Docker image to be used on PAI.

  * __dataDir__

    __dataDir__ is the HDFS directory of the data to be used, in the format `hdfs://host:port/directory`.

  * __outputDir__

    __outputDir__ is the HDFS output directory to be used on PAI, in the format `hdfs://host:port/directory`. After a job finishes, its stdout and stderr files are stored in this directory.
  


* __trial(kubeflow)__
  
  * __codeDir__
    
    __codeDir__ is the local directory where the code files are located.
  
  * __ps(optional)__
    
    __ps__ is the configuration of the parameter server (ps) role for Kubeflow's TensorFlow operator.
    * __replicas__

      __replicas__ is the number of replicas of the __ps__ role.

    * __command__

      __command__ is the command to run in the __ps__ container.

    * __gpuNum__

      __gpuNum__ sets the number of GPUs to be used in the __ps__ container.

    * __cpuNum__

      __cpuNum__ sets the number of CPUs to be used in the __ps__ container.

    * __memoryMB__

      __memoryMB__ sets the memory size, in MB, of the container.

    * __image__

      __image__ sets the image to be used in __ps__.

  * __worker__
    
    __worker__ is the configuration of the worker role for Kubeflow's TensorFlow operator.
    * __replicas__

      __replicas__ is the number of replicas of the __worker__ role.

    * __command__

      __command__ is the command to run in the __worker__ container.

    * __gpuNum__

      __gpuNum__ sets the number of GPUs to be used in the __worker__ container.

    * __cpuNum__

      __cpuNum__ sets the number of CPUs to be used in the __worker__ container.

    * __memoryMB__

      __memoryMB__ sets the memory size, in MB, of the container.

    * __image__

      __image__ sets the image to be used in __worker__.



* __machineList__
 
    __machineList__ must be set if __trainingServicePlatform__ is __remote__; otherwise it can be empty.

  * __ip__

    __ip__ is the IP address of the remote machine.

  * __port__

    __port__ is the SSH port used to connect to the machine.

    Note: if __port__ is left empty, the default value 22 is used.

  * __username__

    __username__ is the account name on the remote machine.

  * __passwd__

    __passwd__ specifies the password of the account.

  * __sshKeyPath__

    If you log in to the remote machine with an SSH key, set __sshKeyPath__ to the path of a valid SSH key file.

    Note: if both __passwd__ and __sshKeyPath__ are set, NNI tries __passwd__.

  * __passphrase__

    __passphrase__ is the passphrase that protects the SSH key. It can be left empty if the key has no passphrase.

* __kubeflowConfig__

  * __operator__

    __operator__ specifies the Kubeflow operator to be used. In the current version, NNI supports __tf-operator__.

  * __storage__

    __storage__ specifies the storage type of Kubeflow: one of {__nfs__, __azureStorage__}. This field is optional and defaults to __nfs__. If the config uses Azure storage, this field must be set.

  * __nfs__

    __server__ is the host of the NFS server.

    __path__ is the mounted path of the NFS share.

  * __keyVault__

    If you use Azure Kubernetes Service, set __keyVault__ to store the private key of your Azure storage account. Refer to: https://docs.microsoft.com/en-us/azure/key-vault/key-vault-manage-with-cli2

    * __vaultName__

      __vaultName__ is the value of ```--vault-name``` used in the az command.

    * __name__

      __name__ is the value of ```--name``` used in the az command.

  * __azureStorage__

    If you use Azure Kubernetes Service, set an Azure storage account to store code files.

    * __accountName__

      __accountName__ is the name of the Azure storage account.

    * __azureShare__

      __azureShare__ is the share of the Azure file storage.

* __paiConfig__

  * __userName__

    __userName__ is the user name of your PAI account.

  * __passWord__

    __passWord__ is the password of your PAI account.

  * __host__

    __host__ is the host of the PAI RESTful server.
    
    
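The note under __trialConcurrency__ can be illustrated with a simplified model. The sketch below is a hypothetical helper, not NNI's actual scheduler: it assumes every trial requests the same trial __gpuNum__, that a GPU is never shared by two trials, and it ignores GPUs used by the tuner or assessor.

```python
def runnable_trials(trial_concurrency: int, trial_gpu_num: int, free_gpus: int) -> int:
    """Estimate how many trials can run at the same time (simplified model).

    Hypothetical helper: assumes each trial needs `trial_gpu_num` dedicated
    GPUs and that trials which cannot get GPUs wait in a queue.
    """
    if trial_concurrency < 1:
        raise ValueError("trialConcurrency must be at least 1")
    if trial_gpu_num < 0 or free_gpus < 0:
        raise ValueError("GPU counts must be non-negative")
    if trial_gpu_num == 0:
        # CPU-only trials are limited by trialConcurrency alone.
        return trial_concurrency
    return min(trial_concurrency, free_gpus // trial_gpu_num)


def queued_trials(trial_concurrency: int, trial_gpu_num: int, free_gpus: int) -> int:
    """Number of trialConcurrency slots that must wait for GPU allocation."""
    return trial_concurrency - runnable_trials(trial_concurrency, trial_gpu_num, free_gpus)


# trialConcurrency: 3, trial gpuNum: 2, 4 free GPUs
print(runnable_trials(3, 2, 4))  # 2
print(queued_trials(3, 2, 4))    # 1
```

Under this model, with `trialConcurrency: 3`, a trial __gpuNum__ of 2 and 4 free GPUs, only two trials run at once and the third waits in the queue.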

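The defaults described for __logDir__ and __logLevel__ can be expressed as a small function. This is a hypothetical sketch that operates on an already-parsed config (a plain `dict`); `resolve_log_settings` is an illustrative name, not part of nnictl.

```python
import os

# Log levels accepted by logLevel, ordered from most to least verbose.
LOG_LEVELS = ("trace", "debug", "info", "warning", "error", "fatal")


def resolve_log_settings(config: dict) -> tuple:
    """Return (logDir, logLevel), filling in the documented defaults.

    Hypothetical helper: `config` is the parsed YAML config as a dict.
    """
    # Default logDir: <user home directory>/nni/experiment
    log_dir = config.get("logDir") or os.path.join(os.path.expanduser("~"), "nni", "experiment")
    log_level = config.get("logLevel") or "info"
    if log_level not in LOG_LEVELS:
        raise ValueError(f"unknown logLevel {log_level!r}; expected one of {LOG_LEVELS}")
    return log_dir, log_level


print(resolve_log_settings({"logLevel": "debug"}))
```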
<a name="Examples"></a>        
## Examples
* __local mode__

  If you want to run trial jobs on the local machine and use annotations to generate the search space, you can use the following config:

```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai, kubeflow
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
```
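The config above sets `useAnnotation: true` and therefore omits __searchSpacePath__. A minimal sketch of that rule follows, assuming the config has already been parsed into a `dict`. `check_search_space_source` is a hypothetical helper, and treating a missing search space file as an error when annotations are off is an assumption, since this document does not state it explicitly.

```python
def check_search_space_source(config: dict) -> None:
    """Raise ValueError if useAnnotation and searchSpacePath are combined wrongly.

    Hypothetical check. The first rule comes from the spec; the second
    (a search space file is needed without annotations) is an assumption.
    """
    use_annotation = config.get("useAnnotation") is True
    has_path = bool(config.get("searchSpacePath"))
    if use_annotation and has_path:
        raise ValueError("remove searchSpacePath when useAnnotation is true")
    if not use_annotation and not has_path:
        raise ValueError("set searchSpacePath when useAnnotation is false")


# The local annotation example above, as a parsed dict (trimmed).
local_annotation = {"trainingServicePlatform": "local", "useAnnotation": True}
check_search_space_source(local_annotation)  # passes: no searchSpacePath
```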

  To use an assessor, add an __assessor__ section to the config file. The following config also specifies the search space with __searchSpacePath__ instead of annotations:

```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai, kubeflow
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
assessor:
  #choice: Medianstop
  builtinAssessorName: Medianstop
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
```
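Both local configs above set `maxExecDuration: 1h`. The sketch below converts such a value to seconds, assuming the format is a non-negative integer immediately followed by one of the documented units {s, m, h, d}. `duration_to_seconds` is a hypothetical helper, and NNI's own parser may accept other forms.

```python
import re

# Seconds per unit documented for maxExecDuration.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 60 * 60, "d": 24 * 60 * 60}


def duration_to_seconds(value: str) -> int:
    """Convert a maxExecDuration string such as "1h" or "500h" to seconds.

    Hypothetical helper: assumes an integer immediately followed by one
    unit from {s, m, h, d}; anything else is rejected.
    """
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"invalid maxExecDuration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit]


print(duration_to_seconds("1h"))    # 3600
print(duration_to_seconds("500h"))  # 1800000
```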

  Alternatively, you can specify your own tuner and assessor files as follows:

```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai, kubeflow
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  codeDir: /nni/tuner
  classFileName: mytuner.py
  className: MyTuner
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
assessor:
  codeDir: /nni/assessor
  classFileName: myassessor.py
  className: MyAssessor
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
```
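The rule that a tuner or assessor is set in exactly one of the two ways can be checked as below. This is a hypothetical validator written against this document's spec (the built-in names and the __optimize_mode__ requirement are taken from it), not NNI's own schema check.

```python
# Names listed in this document's Configuration spec.
BUILTIN_TUNERS = {"TPE", "Random", "Anneal", "Evolution", "BatchTuner", "GridSearch"}
BUILTIN_ASSESSORS = {"Medianstop"}
# Built-in tuners whose classArgs must contain optimize_mode.
NEEDS_OPTIMIZE_MODE = {"TPE", "Random", "Anneal", "Evolution"}
CUSTOM_KEYS = ("codeDir", "classFileName", "className")


def check_algorithm_section(section: dict, kind: str) -> str:
    """Validate a tuner or assessor section and return "builtin" or "custom".

    Hypothetical validator: exactly one of the two ways may be used.
    """
    if kind not in ("tuner", "assessor"):
        raise ValueError("kind must be 'tuner' or 'assessor'")
    builtin_key = "builtinTunerName" if kind == "tuner" else "builtinAssessorName"
    builtin_names = BUILTIN_TUNERS if kind == "tuner" else BUILTIN_ASSESSORS
    uses_builtin = builtin_key in section
    uses_custom = any(key in section for key in CUSTOM_KEYS)

    if uses_builtin == uses_custom:
        raise ValueError(f"{kind}: use either {builtin_key} or {CUSTOM_KEYS}, not both or neither")
    if uses_custom:
        missing = [key for key in CUSTOM_KEYS if not section.get(key)]
        if missing:
            raise ValueError(f"{kind}: missing {missing}")
        return "custom"

    name = section[builtin_key]
    if name not in builtin_names:
        raise ValueError(f"{kind}: unknown built-in name {name!r}")
    # An empty `classArgs:` in YAML parses as None, so fall back to {}.
    class_args = section.get("classArgs") or {}
    if kind == "tuner" and name in NEEDS_OPTIMIZE_MODE:
        if class_args.get("optimize_mode") not in ("maximize", "minimize"):
            raise ValueError(f"{name}: classArgs.optimize_mode must be maximize or minimize")
    return "builtin"


# The custom tuner from the config above.
my_tuner = {"codeDir": "/nni/tuner", "classFileName": "mytuner.py", "className": "MyTuner",
            "classArgs": {"optimize_mode": "maximize"}, "gpuNum": 0}
print(check_algorithm_section(my_tuner, "tuner"))  # custom
```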

* __remote mode__

  If you run trial jobs on remote machines, specify the remote machine information in the following format:

```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai, kubeflow
trainingServicePlatform: remote
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.10.10.10
    port: 22
    username: test
    passwd: test
  - ip: 10.10.10.11
    port: 22
    username: test
    passwd: test
  - ip: 10.10.10.12
    port: 22
    username: test
    sshKeyPath: /nni/sshkey
    passphrase: qwert
```
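The third machine above authenticates with __sshKeyPath__ and __passphrase__ instead of __passwd__. The following sketch summarizes how an entry is interpreted under the machineList spec: __port__ defaults to 22, and __passwd__ is used when both __passwd__ and __sshKeyPath__ are set. `ssh_settings` is a hypothetical helper, and requiring __ip__ and __username__ is an assumption.

```python
def ssh_settings(machine: dict) -> dict:
    """Summarize how one machineList entry would be used (hypothetical sketch).

    Mirrors the spec: port defaults to 22 when empty, and passwd is tried
    when both passwd and sshKeyPath are set.
    """
    for key in ("ip", "username"):  # assumed to be required
        if not machine.get(key):
            raise ValueError(f"machineList entry is missing {key}")
    port = machine.get("port") or 22
    if machine.get("passwd"):
        auth = {"method": "password", "passwd": machine["passwd"]}
    elif machine.get("sshKeyPath"):
        # passphrase may be empty if the key is not protected.
        auth = {"method": "key", "sshKeyPath": machine["sshKeyPath"],
                "passphrase": machine.get("passphrase") or None}
    else:
        raise ValueError(f"{machine['ip']}: set passwd or sshKeyPath")
    return {"host": machine["ip"], "port": int(port), "username": machine["username"], **auth}


third = {"ip": "10.10.10.12", "port": 22, "username": "test",
         "sshKeyPath": "/nni/sshkey", "passphrase": "qwert"}
print(ssh_settings(third)["method"])  # key
```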

* __pai mode__

```
authorName: test
experimentName: nni_test1
trialConcurrency: 1
maxExecDuration: 500h
maxTrialNum: 1
#choice: local, remote, pai, kubeflow
trainingServicePlatform: pai
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 main.py 
  codeDir: .
  gpuNum: 4
  cpuNum: 2
  memoryMB: 10000
  #The docker image to run NNI job on pai
  image: msranni/nni:latest
  #The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
  dataDir: hdfs://10.11.12.13:9000/test
  #The hdfs directory to store output data generated by NNI, format 'hdfs://host:port/directory'
  outputDir: hdfs://10.11.12.13:9000/test
paiConfig:
  #The username to login pai
  userName: test
  #The password to login pai
  passWord: test
  #The host of restful server of pai
  host: 10.10.10.10
```
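__dataDir__ and __outputDir__ use the format `hdfs://host:port/directory`. A minimal check of that format with Python's standard `urllib.parse` is sketched below. `check_hdfs_dir` is a hypothetical helper, and NNI or PAI may validate these paths differently.

```python
from urllib.parse import urlparse


def check_hdfs_dir(value: str, field: str) -> str:
    """Check that value looks like 'hdfs://host:port/directory'; return the directory.

    Hypothetical helper based on the format given in the pai example.
    """
    parsed = urlparse(value)
    if parsed.scheme != "hdfs":
        raise ValueError(f"{field}: expected an hdfs:// URI, got {value!r}")
    if not parsed.hostname or parsed.port is None:
        raise ValueError(f"{field}: host and port are required in {value!r}")
    if parsed.path in ("", "/"):
        raise ValueError(f"{field}: a directory is required in {value!r}")
    return parsed.path


print(check_hdfs_dir("hdfs://10.11.12.13:9000/test", "dataDir"))  # /test
```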

* __kubeflow mode__

The following config uses NFS as the storage of Kubeflow:

```
authorName: default
experimentName: example_mni
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 1
#choice: local, remote, pai, kubeflow
trainingServicePlatform: kubeflow
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  codeDir: .
  worker:
    replicas: 1
    command: python3 mnist.py
    gpuNum: 0
    cpuNum: 1
    memoryMB: 8192
    image: msranni/nni:latest
kubeflowConfig:
  operator: tf-operator
  nfs:
    server: 10.10.10.10
    path: /var/nfs/general
```
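The config above defines a single __worker__ replica. Reading each role under __trial__ as one container per replica, the sketch below adds up the GPUs, CPUs and memory that a single trial requests. `trial_resources` is a hypothetical helper, and defaulting __replicas__ to 1 when it is missing is an assumption.

```python
ROLES = ("ps", "worker")  # ps is optional


def trial_resources(trial: dict) -> dict:
    """Sum the resources one Kubeflow trial requests across its roles.

    Hypothetical sketch: assumes each replica of a role runs in its own
    container with the role's gpuNum, cpuNum and memoryMB.
    """
    totals = {"gpuNum": 0, "cpuNum": 0, "memoryMB": 0}
    for role in ROLES:
        spec = trial.get(role)
        if spec is None:
            continue  # e.g. no ps section
        replicas = spec.get("replicas", 1)
        for key in totals:
            totals[key] += replicas * spec.get(key, 0)
    return totals


nfs_trial = {"codeDir": ".", "worker": {"replicas": 1, "command": "python3 mnist.py",
                                        "gpuNum": 0, "cpuNum": 1, "memoryMB": 8192,
                                        "image": "msranni/nni:latest"}}
print(trial_resources(nfs_trial))  # {'gpuNum': 0, 'cpuNum': 1, 'memoryMB': 8192}
```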

The following config uses Azure storage for Kubeflow:

```
authorName: default
experimentName: example_mni
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 1
#choice: local, remote, pai, kubeflow
trainingServicePlatform: kubeflow
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
#nniManagerIp: 10.10.10.10
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
assessor:
  builtinAssessorName: Medianstop
  classArgs:
    optimize_mode: maximize
  gpuNum: 0
trial:
  codeDir: .
  worker:
    replicas: 1
    command: python3 mnist.py
    gpuNum: 0
    cpuNum: 1
    memoryMB: 4096
    image: msranni/nni:latest
kubeflowConfig:
  operator: tf-operator
  storage: azureStorage
  keyVault:
    vaultName: Contoso-Vault
    name: AzureStorageAccountKey
  azureStorage:
    accountName: storage
    azureShare: share01
```
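The storage rules from the __kubeflowConfig__ spec can be sketched as follows: __storage__ defaults to __nfs__, NFS needs __server__ and __path__, and Azure storage needs both __keyVault__ and __azureStorage__. `check_kubeflow_storage` is a hypothetical helper, not part of NNI.

```python
def check_kubeflow_storage(kubeflow_config: dict) -> str:
    """Return the storage type after checking its required sub-sections.

    Hypothetical helper: storage is optional and defaults to nfs; Azure
    storage needs keyVault (vaultName, name) and azureStorage
    (accountName, azureShare).
    """
    storage = kubeflow_config.get("storage") or "nfs"
    required = {
        "nfs": {"nfs": ("server", "path")},
        "azureStorage": {"keyVault": ("vaultName", "name"),
                         "azureStorage": ("accountName", "azureShare")},
    }
    if storage not in required:
        raise ValueError(f"unknown storage type {storage!r}")
    for section, keys in required[storage].items():
        values = kubeflow_config.get(section) or {}
        missing = [key for key in keys if not values.get(key)]
        if missing:
            raise ValueError(f"{section} is missing {', '.join(missing)}")
    return storage


azure_config = {"operator": "tf-operator", "storage": "azureStorage",
                "keyVault": {"vaultName": "Contoso-Vault", "name": "AzureStorageAccountKey"},
                "azureStorage": {"accountName": "storage", "azureShare": "share01"}}
print(check_kubeflow_storage(azure_config))  # azureStorage
```

Under these rules, a config that uses __keyVault__ and __azureStorage__ must also set `storage: azureStorage`; otherwise __nfs__ is assumed and the check fails because the __nfs__ section is missing.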