Experiment config reference
===

To create a new nni experiment, you need to prepare a config file on your local machine and provide its path to nnictl.
The config file is written in YAML format and must be well formed.
This document describes the rules for writing a config file, and provides some examples and templates.
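
Once the config file is ready, its path is passed to nnictl when creating the experiment. For example (the file name `exp.yaml` below is just an illustration):
```
nnictl create --config exp.yaml
```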
## Template
* __lightweight (without Annotation and Assessor)__
```
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote
trainingServicePlatform: 
searchSpacePath: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  tunerName: 
  #choice: Maximize, Minimize
  optimizationMode: 
  tunerGpuNum: 
trial:
  trialCommand: 
  trialCodeDir: 
  trialGpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
```
* __Use Assessor__
```
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote
trainingServicePlatform: 
searchSpacePath: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  tunerName: 
  #choice: Maximize, Minimize
  optimizationMode: 
  tunerGpuNum: 
assessor:
  #choice: Medianstop
  assessorName: 
  #choice: Maximize, Minimize
  optimizationMode: 
  assessorGpuNum: 
trial:
  trialCommand: 
  trialCodeDir: 
  trialGpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
```
* __Use Annotation__
```
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote
trainingServicePlatform: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  tunerName: 
  #choice: Maximize, Minimize
  optimizationMode: 
  tunerGpuNum: 
assessor:
  #choice: Medianstop
  assessorName: 
  #choice: Maximize, Minimize
  optimizationMode: 
  assessorGpuNum: 
trial:
  trialCommand: 
  trialCodeDir: 
  trialGpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
```
## Configuration
* __authorName__
  * Description

    __authorName__ is the name of the author who creates the experiment.

* __experimentName__
  * Description

    __experimentName__ is the name of the experiment you create.
	
* __trialConcurrency__
  * Description

    __trialConcurrency__ specifies the maximum number of trial jobs running simultaneously.

        Note: if trialGpuNum is larger than the number of free GPUs on your machine, and the number of trial jobs running simultaneously cannot reach trialConcurrency, some trial jobs will be put into a queue to wait for GPU allocation.
	 
* __maxExecDuration__
  * Description

    __maxExecDuration__ specifies the maximum duration of an experiment. The unit of the time is {__s__, __m__, __h__, __d__}, which means {_seconds_, _minutes_, _hours_, _days_}.
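
    For instance (illustrative value), a 90-minute cap would be written as:
```
maxExecDuration: 90m
```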
	
* __maxTrialNum__
  * Description

    __maxTrialNum__ specifies the maximum number of trial jobs created by nni, including succeeded and failed jobs.
	 
* __trainingServicePlatform__
  * Description

    __trainingServicePlatform__ specifies the platform on which to run the experiment, including {__local__, __remote__}.
    * __local__ mode means you run an experiment on your local Linux machine.

    * __remote__ mode means you submit trial jobs to remote Linux machines. If you set the platform to remote, you should also complete the __machineList__ field.
	
* __searchSpacePath__
  * Description

    __searchSpacePath__ specifies the path of the search space file you want to use, which should be a valid path on your local Linux machine.

        Note: if you set useAnnotation=true, you should remove the searchSpacePath field or leave it empty.
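
    As an illustration, a search space file is a JSON document mapping parameter names to sampling strategies (the parameter names below are made up for this example):
```
{
    "learning_rate": {"_type": "uniform", "_value": [0.0001, 0.1]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]}
}
```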
* __useAnnotation__
  * Description

    __useAnnotation__ specifies whether to use annotation to analyze your code and generate the search space.

        Note: if you set useAnnotation=true, you should not set searchSpacePath.
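
    For context, the annotation approach embeds the search space directly in the trial code as specially formatted string literals. A minimal sketch (the exact annotation syntax may vary between nni versions, and `learning_rate` is a made-up parameter):

```python
# Trial code sketch using nni annotations. The string literals below are
# plain no-op expressions to Python, but the nni annotation preprocessor
# parses them to build the search space and inject sampled values.
'''@nni.variable(nni.choice(0.01, 0.1), name=learning_rate)'''
learning_rate = 0.01  # default value, replaced by nni when annotation is enabled

accuracy = 1.0 - learning_rate  # stand-in for real training and evaluation
'''@nni.report_final_result(accuracy)'''
print(accuracy)  # prints 0.99 when run as plain Python
```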
		
* __tuner__
  * Description

    __tuner__ specifies the tuner algorithm used to run the experiment. There are two ways to set the tuner: one is to use a tuner provided by the nni sdk, in which case you only need to set __tunerName__ and __optimizationMode__; the other is to use your own tuner file, in which case you need to set __tunerCommand__ and __tunerCwd__.
  * __tunerName__ and __optimizationMode__
    * __tunerName__

      __tunerName__ specifies the name of the built-in tuner you want to use; the nni sdk provides four tuners, including {__TPE__, __Random__, __Anneal__, __Evolution__}.
    * __optimizationMode__

      __optimizationMode__ specifies the optimization mode of the tuner algorithm, including {__Maximize__, __Minimize__}.
  * __tunerCommand__ and __tunerCwd__
    * __tunerCommand__

      __tunerCommand__ specifies the command used to run your own tuner file, for example {__python3 mytuner.py__}.
    * __tunerCwd__

      __tunerCwd__ specifies the working directory of your own tuner, i.e. the directory containing your tuner file.
  * __tunerGpuNum__

      __tunerGpuNum__ specifies the number of GPUs used to run the tuner process. The value of this field should be a nonnegative integer.

          Note: you can only specify one way to set the tuner: either {tunerName, optimizationMode} or {tunerCommand, tunerCwd}, but not both.
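
  The two mutually exclusive ways of configuring the tuner look like this (values and paths are illustrative):
```
# Either a built-in tuner:
tuner:
  tunerName: TPE
  optimizationMode: Maximize
  tunerGpuNum: 0

# Or a custom tuner (hypothetical file and path):
tuner:
  tunerCommand: python3 mytuner.py
  tunerCwd: /path/to/tuner
  tunerGpuNum: 0
```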

* __assessor__
  * Description

    __assessor__ specifies the assessor algorithm used to run the experiment. There are two ways to set the assessor: one is to use an assessor provided by the nni sdk, in which case you only need to set __assessorName__ and __optimizationMode__; the other is to use your own assessor file, in which case you need to set __assessorCommand__ and __assessorCwd__.
  * __assessorName__ and __optimizationMode__
    * __assessorName__

      __assessorName__ specifies the name of the built-in assessor you want to use; the nni sdk provides one assessor, {__Medianstop__}.
    * __optimizationMode__

      __optimizationMode__ specifies the optimization mode of the assessor algorithm, including {__Maximize__, __Minimize__}.
  * __assessorCommand__ and __assessorCwd__
    * __assessorCommand__

      __assessorCommand__ specifies the command used to run your own assessor file, for example {__python3 myassessor.py__}.
    * __assessorCwd__

      __assessorCwd__ specifies the working directory of your own assessor, i.e. the directory containing your assessor file.
  * __assessorGpuNum__

      __assessorGpuNum__ specifies the number of GPUs used to run the assessor process. The value of this field should be a nonnegative integer.

          Note: you can only specify one way to set the assessor: either {assessorName, optimizationMode} or {assessorCommand, assessorCwd}, but not both. If you do not want to use an assessor, leave the assessor field empty or remove it from your config file.
* __trial__
  * __trialCommand__

      __trialCommand__ specifies the command to run the trial process.
  * __trialCodeDir__

      __trialCodeDir__ specifies the directory containing your trial code.
  * __trialGpuNum__

      __trialGpuNum__ specifies the number of GPUs used to run your trial process.
* __machineList__

     __machineList__ should be set if __trainingServicePlatform__ is remote; otherwise it can be left empty.
  * __ip__

    __ip__ is the IP address of your remote machine.
  * __port__

    __port__ is the SSH port used to connect to the machine.

        Note: if you leave port empty, the default value is 22.
  * __username__

    __username__ is the account name used to log in to the machine.
  * __passwd__

    __passwd__ specifies the password of your account.

## Examples
* __local mode__

  If you want to run trial jobs on your local machine and use annotation to generate the search space, you could use the following config:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution
  tunerName: TPE
  #choice: Maximize, Minimize
  optimizationMode: Maximize
  tunerGpuNum: 0
trial:
  trialCommand: python3 mnist.py
  trialCodeDir: /nni/mnist
  trialGpuNum: 0
```

  If you want to use an assessor, you could add the assessor configuration to your config file:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  tunerName: TPE
  #choice: Maximize, Minimize
  optimizationMode: Maximize
  tunerGpuNum: 0
assessor:
  #choice: Medianstop
  assessorName: Medianstop
  #choice: Maximize, Minimize
  optimizationMode: Maximize
  assessorGpuNum: 0
trial:
  trialCommand: python3 mnist.py
  trialCodeDir: /nni/mnist
  trialGpuNum: 0
```

  Or you could specify your own tuner and assessor files as follows:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  tunerCommand: python3 mytuner.py
  tunerCwd: /nni/tuner
  tunerGpuNum: 0
assessor:
  assessorCommand: python3 myassessor.py
  assessorCwd: /nni/assessor
  assessorGpuNum: 0
trial:
  trialCommand: python3 mnist.py
  trialCodeDir: /nni/mnist
  trialGpuNum: 0
```

* __remote mode__

If you want to run trial jobs on remote machines, you could specify the remote machine information in the following format:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: remote
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  tunerName: TPE
  #choice: Maximize, Minimize
  optimizationMode: Maximize
  tunerGpuNum: 0
trial:
  trialCommand: python3 mnist.py
  trialCodeDir: /nni/mnist
  trialGpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.10.10.10
    port: 22
    username: test
    passwd: test
  - ip: 10.10.10.11
    port: 22
    username: test
    passwd: test
  - ip: 10.10.10.12
    port: 22
    username: test
    passwd: test
```