Commit ad88e3a3 authored by SparkSnail, committed by GitHub

add document for nnictl (#230)

* fix nnictl bug

* fix install.sh

* add desc for Dockerfile.build.base

* update document for Dockerfile

* update

* refactor port detect

* update

* refactor NNICTLDOC.md

* add document for pai and nnictl

* add default value for port
parent 42d8cbda
......@@ -12,7 +12,7 @@ experimentName:
trialConcurrency:
maxExecDuration:
maxTrialNum:
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform:
searchSpacePath:
#choice: true, false
......@@ -42,7 +42,7 @@ experimentName:
trialConcurrency:
maxExecDuration:
maxTrialNum:
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform:
searchSpacePath:
#choice: true, false
......@@ -79,7 +79,7 @@ experimentName:
trialConcurrency:
maxExecDuration:
maxTrialNum:
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform:
#choice: true, false
useAnnotation:
......@@ -146,6 +146,8 @@ machineList:
* __remote__ mode means you submit trial jobs to remote Linux machines. If you set the platform to remote, you must also fill in the __machineList__ field.
* __pai__ mode means you submit trial jobs to Microsoft's [OpenPAI](https://github.com/Microsoft/pai). For more details on pai configuration, please refer to [PAIModeDoc](./PAIMode.md).
* __searchSpacePath__
* Description
......@@ -268,7 +270,7 @@ experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
......@@ -292,7 +294,7 @@ experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
......@@ -324,7 +326,7 @@ experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
......@@ -360,7 +362,7 @@ experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: remote
searchSpacePath: /nni/search_space.json
#choice: true, false
......
......@@ -62,7 +62,7 @@ maxExecDuration: 3h
# empty means never stop
maxTrialNum: 100
# choice: local, remote
# choice: local, remote, pai
trainingServicePlatform: local
# choice: true, false
......
......@@ -14,6 +14,7 @@ nnictl trial
nnictl experiment
nnictl config
nnictl log
nnictl webui
```
### Manage an experiment
* __nnictl create__
......@@ -33,7 +34,7 @@ nnictl log
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --config, -c| True| |YAML configuration file of the experiment|
| --port, -p | False| |the port of the RESTful server|
* __nnictl resume__
......@@ -56,11 +57,20 @@ nnictl log
* __nnictl stop__
* Description
You can use this command to stop a running experiment.
You can use this command to stop a running experiment or multiple experiments.
* Usage
nnictl stop
nnictl stop [id]
* Detail
1. If an id is specified and it matches a running experiment, nnictl stops that experiment; otherwise it prints an error message.
2. If no id is specified and an experiment is running, nnictl stops the running experiment; otherwise it prints an error message.
3. If the id ends with *, nnictl stops all experiments whose ids match the pattern.
4. If the id does not exist but matches the prefix of an experiment id, nnictl stops the matched experiment.
5. If the id does not exist but matches the prefix of multiple experiment ids, nnictl prints the id information of the matches.
6. You can use 'nnictl stop all' to stop all experiments.
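The matching rules above can be sketched as a small function. This is only an illustration of the documented behaviour, not nnictl's actual code: the name `resolve_stop_targets` and the use of `ValueError` for the "print an error / print id information" cases are hypothetical, and treating "no id given while several experiments are running" as an error is an assumption.

```python
from typing import List, Optional


def resolve_stop_targets(arg: Optional[str], running: List[str]) -> List[str]:
    """Return the experiment ids that `nnictl stop [id]` would stop.

    Raises ValueError in the cases where the rules say nnictl prints an
    error message or only prints id information.
    """
    if arg is None:
        # Rule 2. Erroring out when several experiments run is an assumption.
        if len(running) == 1:
            return list(running)
        raise ValueError("no id given and not exactly one running experiment")
    if arg == "all":
        # Rule 6: stop everything.
        return list(running)
    if arg.endswith("*"):
        # Rule 3: trailing * acts as a prefix wildcard.
        prefix = arg[:-1]
        return [exp for exp in running if exp.startswith(prefix)]
    if arg in running:
        # Rule 1: exact match.
        return [arg]
    matches = [exp for exp in running if exp.startswith(arg)]
    if len(matches) == 1:
        # Rule 4: unique prefix.
        return matches
    if matches:
        # Rule 5: ambiguous prefix, report the candidates instead of stopping.
        raise ValueError("ambiguous id, candidates: " + ", ".join(matches))
    raise ValueError("no running experiment matches " + repr(arg))
```

Checking for an exact match before the prefix match keeps a full id from ever being reported as ambiguous.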
* __nnictl update__
......@@ -78,6 +88,7 @@ nnictl log
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --filename, -f| True| |the file storing your new search space|
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl update concurrency__
* Description
......@@ -93,6 +104,7 @@ nnictl log
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --value, -v| True| |the number of allowed concurrent trials|
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl update duration__
* Description
......@@ -108,6 +120,7 @@ nnictl log
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --value, -v| True| |the experiment duration will be NUMBER seconds. SUFFIX may be 's' for seconds (the default), 'm' for minutes, 'h' for hours or 'd' for days.|
| --id, -i| False| |ID of the experiment you want to set|
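To make the NUMBER/SUFFIX format of `--value` concrete, here is a minimal sketch of how such a string could be converted to seconds. The helper name `parse_duration` is hypothetical and this is not taken from the nnictl source; it only follows the description in the table (no suffix means seconds).

```python
import re

# Seconds per unit for the suffixes listed in the option description.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value: str) -> int:
    """Convert strings such as '90', '30m', '1h' or '2d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd]?)", value.strip())
    if match is None:
        raise ValueError("expected NUMBER followed by an optional s/m/h/d suffix")
    number, suffix = match.groups()
    # A missing suffix defaults to seconds.
    return int(number) * _UNIT_SECONDS[suffix or "s"]
```

For example, `parse_duration("1h")` gives 3600 and `parse_duration("90")` gives 90.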
* __nnictl trial__
......@@ -120,6 +133,12 @@ nnictl log
nnictl trial ls
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl trial kill__
* Description
......@@ -133,6 +152,7 @@ nnictl log
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --trialid, -t| True| |ID of the trial you want to kill.|
| --id, -i| False| |ID of the experiment you want to set|
......@@ -147,6 +167,36 @@ nnictl log
nnictl experiment show
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl experiment status__
* Description
Show the status of experiment.
* Usage
nnictl experiment status
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl experiment list__
* Description
Show the id and start time of all running experiments.
* Usage
nnictl experiment list
* __nnictl config show__
......@@ -176,6 +226,7 @@ nnictl log
| --head, -h| False| |show head lines of stdout|
| --tail, -t| False| |show tail lines of stdout|
| --path, -p| False| |show the path of stdout file|
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl log stderr__
* Description
......@@ -193,6 +244,7 @@ nnictl log
| --head, -h| False| |show head lines of stderr|
| --tail, -t| False| |show tail lines of stderr|
| --path, -p| False| |show the path of stderr file|
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl log trial__
* Description
......@@ -209,3 +261,19 @@ nnictl log
| ------ | ------ | ------ |------ |
| --id, -I| False| |the id of trial|
### Manage webui
* __nnictl webui url__
* Description
Show the URLs of the experiment.
* Usage
nnictl webui url
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --id, -i| False| |ID of the experiment you want to set|
\ No newline at end of file
......@@ -34,7 +34,7 @@ trialConcurrency: 2
maxExecDuration: 3h
# empty means never stop
maxTrialNum: 100
# choice: local, remote
# choice: local, remote, pai
trainingServicePlatform: local
# choice: true, false
useAnnotation: true
......
......@@ -67,7 +67,7 @@ def detect_port(port):
    socket_test = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    try:
        socket_test.connect(('127.0.0.1', int(port)))
        socket_test.shutdown(2)
        socket_test.close()
        return True
    except:
        return False
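For comparison, the same port probe can be written so that the socket is always released and no bare `except` is needed. This is a sketch of an alternative, not the code in this commit; the `host` and `timeout` parameters are additions made for illustration.

```python
import socket


def detect_port(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    # The with-block closes the socket on every path, including failures.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise,
        # so a refused connection does not raise.
        return probe.connect_ex((host, int(port))) == 0
```

A `True` result means the port is already in use by a listener, which is the condition the caller wants to detect before starting the RESTful server.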
......@@ -92,12 +92,12 @@ pai_config_schema = {
machine_list_schima = {
Optional('machineList'):[Or({
'ip': str,
'port': And(int, lambda x: 0 < x < 65535),
Optional('port'): And(int, lambda x: 0 < x < 65535),
'username': str,
'passwd': str
},{
'ip': str,
'port': And(int, lambda x: 0 < x < 65535),
Optional('port'): And(int, lambda x: 0 < x < 65535),
'username': str,
'sshKeyPath': os.path.exists,
Optional('passphrase'): str
......
......@@ -97,6 +97,11 @@ def validate_common_content(experiment_config):
            experiment_config['maxExecDuration'] = '999d'
        if experiment_config.get('maxTrialNum') is None:
            experiment_config['maxTrialNum'] = 99999
        if experiment_config['trainingServicePlatform'] == 'remote':
            for index in range(len(experiment_config['machineList'])):
                if experiment_config['machineList'][index].get('port') is None:
                    experiment_config['machineList'][index]['port'] = 22
    except Exception as exception:
        raise Exception(exception)
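The default-port step in this hunk can be read in isolation as follows. This is a sketch under stated assumptions: the function name `fill_default_ports` and the constant `DEFAULT_SSH_PORT` are hypothetical, and only the value 22 and the `remote` check come from the diff.

```python
DEFAULT_SSH_PORT = 22  # value taken from the diff; the constant name is hypothetical


def fill_default_ports(experiment_config):
    """Set port 22 on every remote machine entry that omits 'port'."""
    if experiment_config.get("trainingServicePlatform") == "remote":
        # Iterate over the entries directly instead of indexing by position.
        for machine in experiment_config.get("machineList", []):
            if machine.get("port") is None:
                machine["port"] = DEFAULT_SSH_PORT
    return experiment_config
```

With this default in place, a `machineList` entry that gives only `ip`, `username` and `passwd` is treated as listening on port 22, which is why the schema change makes `port` optional.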
......