# Experiment config reference

To create a new NNI experiment, prepare a config file on your local machine and pass the path of this file to nnictl.
The config file is written in YAML format and must be well-formed.
This document describes the rules for writing a config file and provides some examples and templates.
## Template
* __Lightweight (without Annotation and Assessor)__
```
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote, pai
trainingServicePlatform: 
searchSpacePath: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
trial:
  command: 
  codeDir: 
  gpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
```
* __Use Assessor__
```
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote, pai
trainingServicePlatform: 
searchSpacePath: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
assessor:
  #choice: Medianstop
  builtinAssessorName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
trial:
  command: 
  codeDir: 
  gpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
```
* __Use Annotation__
```
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote, pai
trainingServicePlatform: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
assessor:
  #choice: Medianstop
  builtinAssessorName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
trial:
  command: 
  codeDir: 
  gpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
```
## Configuration
* __authorName__
  * Description

    __authorName__ is the name of the author who creates the experiment.

    TBD: add default value

* __experimentName__
  * Description

    __experimentName__ is the name of the experiment you create.

    TBD: add default value
	
* __trialConcurrency__
  * Description

    __trialConcurrency__ specifies the maximum number of trial jobs that run simultaneously.

        Note: if the trial gpuNum is larger than the number of free GPUs on your machine, so that the number of simultaneously running trial jobs cannot reach trialConcurrency, some trial jobs are put into a queue to wait for GPU allocation.

* __maxExecDuration__
  * Description

    __maxExecDuration__ specifies the maximum duration of an experiment. The time unit is one of {__s__, __m__, __h__, __d__}, meaning {_seconds_, _minutes_, _hours_, _days_}.

* __maxTrialNum__
  * Description

    __maxTrialNum__ specifies the maximum number of trial jobs created by NNI, including succeeded and failed jobs.
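
  As a rough illustration of the three limits described above, the fragment below uses made-up values (they are not defaults) and follows the key names and duration format used elsewhere in this document.

  ```yaml
  # Illustrative values only; none of these are defaults.
  trialConcurrency: 2   # at most 2 trial jobs run at the same time
  maxExecDuration: 2h   # a number followed by a unit: s, m, h or d
  maxTrialNum: 20       # counts both succeeded and failed trial jobs
  ```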
	 
* __trainingServicePlatform__
  * Description

    __trainingServicePlatform__ specifies the platform on which to run the experiment, one of {__local__, __remote__, __pai__}.

    * __local__ mode means you run the experiment on your local Linux machine.

    * __remote__ mode means you submit trial jobs to remote Linux machines. If you set the platform to remote, you must complete the __machineList__ field.

    * __pai__ mode means you submit trial jobs to Microsoft's [OpenPai](https://github.com/Microsoft/pai). For more details on pai configuration, please refer to [PAIModeDoc](./PAIMode.md).
	
* __searchSpacePath__
  * Description

    __searchSpacePath__ specifies the path of the search space file you want to use, which must be a valid path on your local Linux machine.

        Note: if you set useAnnotation=true, remove the searchSpacePath field or leave it empty.

* __useAnnotation__
  * Description

    __useAnnotation__ specifies whether to use annotation to analyze your code and generate the search space.

        Note: if you set useAnnotation=true, do not set searchSpacePath.
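
  The two notes above describe mutually exclusive settings. A minimal sketch of both options follows; the path is the illustrative one used in the examples of this document, not a required location.

  ```yaml
  # Option A: the search space is read from a file.
  searchSpacePath: /nni/search_space.json
  useAnnotation: false

  # Option B: the search space is generated from annotations in the trial code,
  # so searchSpacePath is removed. To use it, delete Option A and uncomment:
  # useAnnotation: true
  ```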
		
* __tuner__
  * Description

    __tuner__ specifies the tuner algorithm used to run the experiment. There are two ways to set the tuner. One way is to use a tuner provided by the NNI SDK, in which case you only need to set __builtinTunerName__ and __classArgs__. The other way is to use your own tuner file, in which case you need to set __codeDir__, __classFileName__, __className__ and __classArgs__.
  * __builtinTunerName__ and __classArgs__
    * __builtinTunerName__

      __builtinTunerName__ specifies the name of the built-in tuner you want to use. The NNI SDK provides the following tuners: {__TPE__, __Random__, __Anneal__, __Evolution__, __BatchTuner__, __GridSearch__}.
    * __classArgs__

      __classArgs__ specifies the arguments of the tuner algorithm. If __builtinTunerName__ is in {__TPE__, __Random__, __Anneal__, __Evolution__}, you should set __optimize_mode__.
  * __codeDir__, __classFileName__, __className__ and __classArgs__
    * __codeDir__

      __codeDir__ specifies the directory of the tuner code.
    * __classFileName__

      __classFileName__ specifies the name of the tuner file.
    * __className__

      __className__ specifies the name of the tuner class.
    * __classArgs__

      __classArgs__ specifies the arguments of the tuner algorithm.
  * __gpuNum__

    __gpuNum__ specifies the number of GPUs used to run the tuner process. The value of this field should be a positive number.

        Note: you can set the tuner in only one way: either {builtinTunerName, classArgs} or {codeDir, classFileName, className, classArgs}, but not both.
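
  To make the role of __optimize_mode__ concrete, here is a hedged sketch of two built-in tuner settings. The second variant assumes that __classArgs__ can be left out for a tuner outside {TPE, Random, Anneal, Evolution}; this document implies that but does not state it, so treat it as an assumption.

  ```yaml
  # Built-in tuner that needs optimize_mode.
  tuner:
    builtinTunerName: TPE
    classArgs:
      optimize_mode: maximize   # or: minimize
    gpuNum: 0

  # Assumed variant for a tuner that is not in {TPE, Random, Anneal, Evolution}.
  # Use it instead of the block above, never together with it.
  # tuner:
  #   builtinTunerName: GridSearch
  #   gpuNum: 0
  ```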

* __assessor__

  * Description

    __assessor__ specifies the assessor algorithm used to run the experiment. There are two ways to set the assessor. One way is to use an assessor provided by the NNI SDK, in which case you only need to set __builtinAssessorName__ and __classArgs__. The other way is to use your own assessor file, in which case you need to set __codeDir__, __classFileName__, __className__ and __classArgs__.
  * __builtinAssessorName__ and __classArgs__
    * __builtinAssessorName__

      __builtinAssessorName__ specifies the name of the built-in assessor you want to use. The NNI SDK provides the following assessor: {__Medianstop__}.
    * __classArgs__

      __classArgs__ specifies the arguments of the assessor algorithm.
  * __codeDir__, __classFileName__, __className__ and __classArgs__
    * __codeDir__

      __codeDir__ specifies the directory of the assessor code.
    * __classFileName__

      __classFileName__ specifies the name of the assessor file.
    * __className__

      __className__ specifies the name of the assessor class.
    * __classArgs__

      __classArgs__ specifies the arguments of the assessor algorithm.
  * __gpuNum__

    __gpuNum__ specifies the number of GPUs used to run the assessor process. The value of this field should be a positive number.

        Note: you can set the assessor in only one way: either {builtinAssessorName, classArgs} or {codeDir, classFileName, className, classArgs}, but not both. If you do not want to use an assessor, leave assessor empty or remove it from your config file. Default value is 0.
* __trial__
  * __command__

    __command__ specifies the command used to run the trial process.
  * __codeDir__

    __codeDir__ specifies the directory of your own trial file.
  * __gpuNum__

    __gpuNum__ specifies the number of GPUs used to run your trial process. Default value is 0.
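
  The trial __gpuNum__ interacts with __trialConcurrency__, as the note under __trialConcurrency__ explains. A small sketch with illustrative values: on a hypothetical machine with only 2 free GPUs, this configuration could not start 3 trial jobs at once, so by that note some trial jobs would wait in a queue for GPU allocation.

  ```yaml
  trialConcurrency: 3          # up to 3 trial jobs may run at the same time
  trial:
    command: python3 mnist.py  # command that starts one trial
    codeDir: /nni/mnist        # directory containing the trial code
    gpuNum: 1                  # GPUs requested by each trial job
  ```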
* __machineList__

  __machineList__ must be set if __trainingServicePlatform__ is remote; otherwise it can be empty.
  * __ip__

    __ip__ is the IP address of your remote machine.
  * __port__

    __port__ is the SSH port used to connect to the machine.

        Note: if you leave port empty, the default value is 22.
  * __username__

    __username__ is the account you use on the remote machine.
  * __passwd__

    __passwd__ specifies the password of your account.

  * __sshKeyPath__

    If you want to use an SSH key to log in to the remote machine, set __sshKeyPath__ in the config file. __sshKeyPath__ is the path of the SSH key file, which must be valid.

        Note: if you set passwd and sshKeyPath simultaneously, NNI will try passwd.

  * __passphrase__

    __passphrase__ is used to protect the SSH key, and can be empty if the key has no passphrase.
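
  The fields above combine into two login styles. The fragment below is a sketch with placeholder addresses and credentials; port 2222 is an arbitrary illustrative value, and it assumes that omitting __port__ is treated the same as leaving it empty.

  ```yaml
  trainingServicePlatform: remote
  machineList:
    # Password login; port is omitted, so the default SSH port 22 should apply.
    - ip: 10.10.10.10
      username: test
      passwd: test
    # Key-based login on a non-default port; passphrase is only needed if the key has one.
    - ip: 10.10.10.11
      port: 2222
      username: test
      sshKeyPath: /nni/sshkey
      passphrase: qwert
  ```

  Setting only one of passwd or sshKeyPath per machine avoids relying on the rule that NNI tries passwd when both are present.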


## Examples
* __local mode__

  If you want to run your trial jobs on your local machine and use annotation to generate the search space, you could use the following config:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
```

  If you want to use an assessor, add an assessor configuration to your file:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
assessor:
  #choice: Medianstop
  builtinAssessorName: Medianstop
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
```

  Or you could specify your own tuner and assessor files as follows:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  codeDir: /nni/tuner
  classFileName: mytuner.py
  className: MyTuner
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
assessor:
  codeDir: /nni/assessor
  classFileName: myassessor.py
  className: MyAssessor
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
```

* __remote mode__

  If you want to run trial jobs on remote machines, you could specify the remote machine information in the following format:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.10.10.10
    port: 22
    username: test
    passwd: test
  - ip: 10.10.10.11
    port: 22
    username: test
    passwd: test
  - ip: 10.10.10.12
    port: 22
    username: test
    sshKeyPath: /nni/sshkey
    passphrase: qwert
```