Experiment config reference
===

To create a new NNI experiment, prepare a config file on your local machine and pass the path of this file to nnictl.
The config file is written in YAML format and must be well-formed.
This document describes the rules for writing the config file and provides templates and examples.

## Template
* __Lightweight (without Annotation and Assessor)__
```
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote
trainingServicePlatform: 
searchSpacePath: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
trial:
  command: 
  codeDir: 
  gpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
```
* __Use Assessor__
```
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote
trainingServicePlatform: 
searchSpacePath: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
assessor:
  #choice: Medianstop
  builtinAssessorName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
trial:
  command: 
  codeDir: 
  gpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
```
* __Use Annotation__
```
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote
trainingServicePlatform: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
assessor:
  #choice: Medianstop
  builtinAssessorName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
trial:
  command: 
  codeDir: 
  gpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
```
## Configuration
* __authorName__
  * Description

    __authorName__ is the name of the author who creates the experiment.

    TBD: add default value

* __experimentName__
  * Description

    __experimentName__ is the name of the experiment you create.

    TBD: add default value

* __trialConcurrency__
  * Description

    __trialConcurrency__ specifies the maximum number of trial jobs that run simultaneously.

    Note: if the trial __gpuNum__ is larger than the number of free GPUs on your machine, so that the number of simultaneously running trial jobs cannot reach trialConcurrency, some trial jobs are put into a queue to wait for GPU allocation.
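
    As a hedged illustration of the queueing behaviour described in the note, the fragment below uses hypothetical values and assumes a machine with 4 free GPUs, where each running trial holds its GPUs exclusively. Each trial asks for 2 GPUs, so at most 2 trials can hold GPUs at once; under the rule in the note, the remaining trials would wait in the queue even though trialConcurrency is 4.

    ```
    # Hypothetical values; assumes a local machine with 4 free GPUs.
    trialConcurrency: 4   # upper bound on simultaneously running trial jobs
    trial:
      gpuNum: 2           # GPUs per trial: 4 free GPUs / 2 per trial = at most 2 running trials
    ```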
	 
* __maxExecDuration__
  * Description

    __maxExecDuration__ specifies the maximum duration of an experiment. The value is a number followed by a time unit from {__s__, __m__, __h__, __d__}, meaning {_seconds_, _minutes_, _hours_, _days_}.
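
    A few values written in this format, as a sketch (the numbers are arbitrary, and only one maxExecDuration line would appear in a real config):

    ```
    # <number><unit>, where unit is one of s, m, h, d
    maxExecDuration: 1h      # one hour
    # maxExecDuration: 30m   # thirty minutes
    # maxExecDuration: 2d    # two days
    ```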
	
* __maxTrialNum__
  * Description

    __maxTrialNum__ specifies the maximum number of trial jobs created by NNI, including both succeeded and failed jobs.
	 
* __trainingServicePlatform__
  * Description

    __trainingServicePlatform__ specifies the platform on which the experiment runs, one of {__local__, __remote__}.

    * __local__ mode runs the experiment on your local Linux machine.

    * __remote__ mode submits trial jobs to remote Linux machines. If you set the platform to remote, you must also complete the __machineList__ field.
	
* __searchSpacePath__
  * Description

    __searchSpacePath__ specifies the path of the search space file you want to use, which must be a valid path on your local Linux machine.

    Note: if you set useAnnotation=true, remove the searchSpacePath field or leave it empty.

* __useAnnotation__
  * Description

    __useAnnotation__ specifies whether to use annotation to analyze your code and generate the search space.

    Note: if you set useAnnotation=true, do not set searchSpacePath.
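
    The notes on searchSpacePath and useAnnotation describe two mutually exclusive setups. A minimal sketch of each follows; the file path is a hypothetical placeholder.

    ```
    # Setup A: search space read from a file, annotation disabled
    searchSpacePath: /nni/search_space.json
    useAnnotation: false
    ```

    ```
    # Setup B: search space generated from annotations; searchSpacePath is omitted
    useAnnotation: true
    ```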
		
* __tuner__
  * Description

    __tuner__ specifies the tuner algorithm used to run the experiment. There are two ways to set the tuner. One is to use a tuner provided by the NNI SDK, in which case you only need to set __builtinTunerName__ and __classArgs__. The other is to use your own tuner file, in which case you need to set __codeDir__, __classFileName__, __className__ and __classArgs__.
  * __builtinTunerName__ and __classArgs__
    * __builtinTunerName__

      __builtinTunerName__ specifies the name of the built-in tuner you want to use. The NNI SDK provides four tuners: {__TPE__, __Random__, __Anneal__, __Evolution__}.
    * __classArgs__

      __classArgs__ specifies the arguments of the tuner algorithm.
  * __codeDir__, __classFileName__, __className__ and __classArgs__
    * __codeDir__

      __codeDir__ specifies the directory of the tuner code.
    * __classFileName__

      __classFileName__ specifies the name of the tuner file.
    * __className__

      __className__ specifies the name of the tuner class.
    * __classArgs__

      __classArgs__ specifies the arguments of the tuner algorithm.
  * __gpuNum__

    __gpuNum__ specifies the number of GPUs used to run the tuner process. The value of this field should be a non-negative number.

    Note: you can set the tuner in only one way. For example, you can set either {builtinTunerName, classArgs} or {codeDir, classFileName, className, classArgs}, but not both.
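
    The note can be read as "pick exactly one form". A sketch of the built-in form follows, with the fields that belong only to the custom form called out in comments (values are illustrative):

    ```
    tuner:
      builtinTunerName: TPE
      classArgs:
        optimize_mode: maximize
      gpuNum: 0
      # Do not add codeDir, classFileName or className here;
      # those fields select the custom-tuner form instead.
    ```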

* __assessor__

  * Description

    __assessor__ specifies the assessor algorithm used to run the experiment. There are two ways to set the assessor. One is to use an assessor provided by the NNI SDK, in which case you only need to set __builtinAssessorName__ and __classArgs__. The other is to use your own assessor file, in which case you need to set __codeDir__, __classFileName__, __className__ and __classArgs__.
  * __builtinAssessorName__ and __classArgs__
    * __builtinAssessorName__

      __builtinAssessorName__ specifies the name of the built-in assessor you want to use. The NNI SDK provides one assessor: {__Medianstop__}.
    * __classArgs__

      __classArgs__ specifies the arguments of the assessor algorithm.
  * __codeDir__, __classFileName__, __className__ and __classArgs__
    * __codeDir__

      __codeDir__ specifies the directory of the assessor code.
    * __classFileName__

      __classFileName__ specifies the name of the assessor file.
    * __className__

      __className__ specifies the name of the assessor class.
    * __classArgs__

      __classArgs__ specifies the arguments of the assessor algorithm.
  * __gpuNum__

    __gpuNum__ specifies the number of GPUs used to run the assessor process. The value of this field should be a non-negative number. Default value is 0.

    Note: you can set the assessor in only one way. For example, you can set either {builtinAssessorName, classArgs} or {codeDir, classFileName, className, classArgs}, but not both. If you do not want to use an assessor, leave the assessor field empty or remove it from your config file.

* __trial__
  * __command__

    __command__ specifies the command used to run the trial process.
  * __codeDir__

    __codeDir__ specifies the directory of your own trial file.
  * __gpuNum__

    __gpuNum__ specifies the number of GPUs used to run your trial process. Default value is 0.

* __machineList__

  __machineList__ must be set if __trainingServicePlatform__ is remote; otherwise it can be left empty.
  * __ip__

    __ip__ is the IP address of your remote machine.
  * __port__

    __port__ is the ssh port used to connect to the machine.

    Note: if you leave port empty, the default value is 22.
  * __username__

    __username__ is the account you use.
  * __passwd__

    __passwd__ specifies the password of your account.
  * __sshKeyPath__

    If you want to use an ssh key to log in to the remote machine, set __sshKeyPath__ in the config file. __sshKeyPath__ is the path of the ssh key file, which must be valid.

    Note: if you set passwd and sshKeyPath simultaneously, NNI will try passwd.
  * __passphrase__

    __passphrase__ is used to protect the ssh key, and can be empty if you don't have a passphrase.
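
  As a sketch of the two login options described for __machineList__, the fragment below lists one machine that logs in with a password and one that logs in with an ssh key. Addresses, account names, and paths are hypothetical placeholders. The second entry leaves port empty, which according to the port note falls back to 22.

  ```
  machineList:
    # Password login
    - ip: 10.10.10.10
      port: 22
      username: test
      passwd: test
    # ssh key login; port left empty, so the default (22) applies
    - ip: 10.10.10.11
      port:
      username: test
      sshKeyPath: /nni/sshkey
      passphrase: qwert
  ```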


## Examples
* __local mode__

  To run your trial jobs on your local machine and use annotation to generate the search space, you can use the following config:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
```

  To use an assessor, add an assessor configuration to your file:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
assessor:
  #choice: Medianstop
  builtinAssessorName: Medianstop
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
```

  Alternatively, you can specify your own tuner and assessor files as follows:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  codeDir: /nni/tuner
  classFileName: mytuner.py
  className: MyTuner
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
assessor:
  codeDir: /nni/assessor
  classFileName: myassessor.py
  className: MyAssessor
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
```

* __remote mode__

  To run trial jobs on remote machines, specify the remote machine information in the following format:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: remote
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.10.10.10
    port: 22
    username: test
    passwd: test
  - ip: 10.10.10.11
    port: 22
    username: test
    passwd: test
  - ip: 10.10.10.12
    port: 22
    username: test
    sshKeyPath: /nni/sshkey
    passphrase: qwert
```