Unverified commit ad88e3a3, authored by SparkSnail, committed by GitHub

add document for nnictl (#230)

* fix nnictl bug

* fix install.sh

* add desc for Dockerfile.build.base

* update document for Dockerfile

* update

* refactor port detect

* update

* refactor NNICTLDOC.md

* add document for pai and nnictl

* add default value for port
parent 42d8cbda
...@@ -12,7 +12,7 @@ experimentName: ...@@ -12,7 +12,7 @@ experimentName:
trialConcurrency: trialConcurrency:
maxExecDuration: maxExecDuration:
maxTrialNum: maxTrialNum:
#choice: local, remote #choice: local, remote, pai
trainingServicePlatform: trainingServicePlatform:
searchSpacePath: searchSpacePath:
#choice: true, false #choice: true, false
...@@ -42,7 +42,7 @@ experimentName: ...@@ -42,7 +42,7 @@ experimentName:
trialConcurrency: trialConcurrency:
maxExecDuration: maxExecDuration:
maxTrialNum: maxTrialNum:
#choice: local, remote #choice: local, remote, pai
trainingServicePlatform: trainingServicePlatform:
searchSpacePath: searchSpacePath:
#choice: true, false #choice: true, false
...@@ -79,7 +79,7 @@ experimentName: ...@@ -79,7 +79,7 @@ experimentName:
trialConcurrency: trialConcurrency:
maxExecDuration: maxExecDuration:
maxTrialNum: maxTrialNum:
#choice: local, remote #choice: local, remote, pai
trainingServicePlatform: trainingServicePlatform:
#choice: true, false #choice: true, false
useAnnotation: useAnnotation:
...@@ -146,6 +146,8 @@ machineList: ...@@ -146,6 +146,8 @@ machineList:
* __remote__ mode means you submit trial jobs to remote linux machines. If you set platform as remote, you should complete __machineList__ field. * __remote__ mode means you submit trial jobs to remote linux machines. If you set platform as remote, you should complete __machineList__ field.
* __pai__ mode means you submit trial jobs to Microsoft's [OpenPAI](https://github.com/Microsoft/pai). For details on pai configuration, see [PAIModeDoc](./PAIMode.md).
* __searchSpacePath__ * __searchSpacePath__
* Description * Description
...@@ -268,7 +270,7 @@ experimentName: test_experiment ...@@ -268,7 +270,7 @@ experimentName: test_experiment
trialConcurrency: 3 trialConcurrency: 3
maxExecDuration: 1h maxExecDuration: 1h
maxTrialNum: 10 maxTrialNum: 10
#choice: local, remote #choice: local, remote, pai
trainingServicePlatform: local trainingServicePlatform: local
#choice: true, false #choice: true, false
useAnnotation: true useAnnotation: true
...@@ -292,7 +294,7 @@ experimentName: test_experiment ...@@ -292,7 +294,7 @@ experimentName: test_experiment
trialConcurrency: 3 trialConcurrency: 3
maxExecDuration: 1h maxExecDuration: 1h
maxTrialNum: 10 maxTrialNum: 10
#choice: local, remote #choice: local, remote, pai
trainingServicePlatform: local trainingServicePlatform: local
searchSpacePath: /nni/search_space.json searchSpacePath: /nni/search_space.json
#choice: true, false #choice: true, false
...@@ -324,7 +326,7 @@ experimentName: test_experiment ...@@ -324,7 +326,7 @@ experimentName: test_experiment
trialConcurrency: 3 trialConcurrency: 3
maxExecDuration: 1h maxExecDuration: 1h
maxTrialNum: 10 maxTrialNum: 10
#choice: local, remote #choice: local, remote, pai
trainingServicePlatform: local trainingServicePlatform: local
searchSpacePath: /nni/search_space.json searchSpacePath: /nni/search_space.json
#choice: true, false #choice: true, false
...@@ -360,7 +362,7 @@ experimentName: test_experiment ...@@ -360,7 +362,7 @@ experimentName: test_experiment
trialConcurrency: 3 trialConcurrency: 3
maxExecDuration: 1h maxExecDuration: 1h
maxTrialNum: 10 maxTrialNum: 10
#choice: local, remote #choice: local, remote, pai
trainingServicePlatform: remote trainingServicePlatform: remote
searchSpacePath: /nni/search_space.json searchSpacePath: /nni/search_space.json
#choice: true, false #choice: true, false
......
...@@ -62,7 +62,7 @@ maxExecDuration: 3h ...@@ -62,7 +62,7 @@ maxExecDuration: 3h
# empty means never stop # empty means never stop
maxTrialNum: 100 maxTrialNum: 100
# choice: local, remote # choice: local, remote, pai
trainingServicePlatform: local trainingServicePlatform: local
# choice: true, false # choice: true, false
......
...@@ -14,6 +14,7 @@ nnictl trial ...@@ -14,6 +14,7 @@ nnictl trial
nnictl experiment nnictl experiment
nnictl config nnictl config
nnictl log nnictl log
nnictl webui
``` ```
### Manage an experiment ### Manage an experiment
* __nnictl create__ * __nnictl create__
...@@ -33,7 +34,7 @@ nnictl log ...@@ -33,7 +34,7 @@ nnictl log
| Name, shorthand | Required|Default | Description | | Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ | | ------ | ------ | ------ |------ |
| --config, -c| True| |yaml configure file of the experiment| | --config, -c| True| |yaml configure file of the experiment|
| --port, -p | False| |the port of restful server|
* __nnictl resume__ * __nnictl resume__
...@@ -56,11 +57,20 @@ nnictl log ...@@ -56,11 +57,20 @@ nnictl log
* __nnictl stop__ * __nnictl stop__
* Description * Description
You can use this command to stop a running experiment. You can use this command to stop a running experiment or multiple experiments.
* Usage * Usage
nnictl stop nnictl stop [id]
* Detail
1. If an id is specified and it matches a running experiment, nnictl stops that experiment; otherwise it prints an error message.
2. If no id is specified and an experiment is running, nnictl stops the running experiment; otherwise it prints an error message.
3. If the id ends with *, nnictl stops all experiments whose ids match the pattern.
4. If the id does not exist but matches the prefix of one experiment id, nnictl stops the matched experiment.
5. If the id does not exist but is a prefix of multiple experiment ids, nnictl prints the id information.
6. Use 'nnictl stop all' to stop all experiments.
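The six rules above can be read as a small id-resolution procedure. The following is a minimal sketch of that reading, not nnictl's actual implementation: `resolve_stop_targets` and its arguments are hypothetical names, and it assumes that "no id specified" only succeeds when exactly one experiment is running, which the text does not state explicitly.

```python
def resolve_stop_targets(requested_id, running_ids):
    """Return the ids a stop request would select (hypothetical helper).

    Raises ValueError where the documentation says nnictl prints an
    error message or id information instead of stopping anything.
    """
    running_ids = list(running_ids)
    if requested_id is None:
        # Rule 2 (assumption): unambiguous only with exactly one experiment.
        if len(running_ids) == 1:
            return running_ids
        raise ValueError('no id given and %d experiments running' % len(running_ids))
    if requested_id == 'all':
        # Rule 6: stop every running experiment.
        return running_ids
    if requested_id.endswith('*'):
        # Rule 3: a trailing * selects every id sharing the prefix.
        prefix = requested_id[:-1]
        matches = [i for i in running_ids if i.startswith(prefix)]
        if not matches:
            raise ValueError('no running experiment matches ' + requested_id)
        return matches
    if requested_id in running_ids:
        # Rule 1: exact match.
        return [requested_id]
    matches = [i for i in running_ids if i.startswith(requested_id)]
    if len(matches) == 1:
        # Rule 4: unique prefix match.
        return matches
    if matches:
        # Rule 5: ambiguous prefix, report the candidates.
        raise ValueError('ambiguous id; candidates: ' + ', '.join(matches))
    raise ValueError('no running experiment matches ' + requested_id)
```

Checking the exact match before the prefix match keeps rule 1 from being shadowed when one id happens to be a prefix of another.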
* __nnictl update__ * __nnictl update__
...@@ -78,6 +88,7 @@ nnictl log ...@@ -78,6 +88,7 @@ nnictl log
| Name, shorthand | Required|Default | Description | | Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ | | ------ | ------ | ------ |------ |
| --filename, -f| True| |the file storing your new search space| | --filename, -f| True| |the file storing your new search space|
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl update concurrency__ * __nnictl update concurrency__
* Description * Description
...@@ -93,6 +104,7 @@ nnictl log ...@@ -93,6 +104,7 @@ nnictl log
| Name, shorthand | Required|Default | Description | | Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ | | ------ | ------ | ------ |------ |
| --value, -v| True| |the number of allowed concurrent trials| | --value, -v| True| |the number of allowed concurrent trials|
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl update duration__ * __nnictl update duration__
* Description * Description
...@@ -108,6 +120,7 @@ nnictl log ...@@ -108,6 +120,7 @@ nnictl log
| Name, shorthand | Required|Default | Description | | Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ | | ------ | ------ | ------ |------ |
| --value, -v| True| |the experiment duration will be NUMBER seconds. SUFFIX may be 's' for seconds (the default), 'm' for minutes, 'h' for hours or 'd' for days.| | --value, -v| True| |the experiment duration will be NUMBER seconds. SUFFIX may be 's' for seconds (the default), 'm' for minutes, 'h' for hours or 'd' for days.|
| --id, -i| False| |ID of the experiment you want to set|
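To make the NUMBER/SUFFIX format of `--value` concrete, here is a minimal sketch of how such a value could be converted to seconds. `duration_to_seconds` is a hypothetical helper written for illustration; it is not taken from nnictl, and it assumes NUMBER is a non-negative integer.

```python
import re

# Seconds per unit for the suffixes listed in the option description.
_UNIT_SECONDS = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400}


def duration_to_seconds(value):
    """Convert strings such as '90', '30m', '1h' or '2d' to seconds."""
    match = re.fullmatch(r'(\d+)([smhd]?)', value.strip())
    if match is None:
        raise ValueError('invalid duration: %r' % value)
    number, suffix = match.groups()
    # A missing suffix falls back to seconds, the documented default.
    return int(number) * _UNIT_SECONDS[suffix or 's']
```

For example, `duration_to_seconds('1h')` gives 3600, the same duration as the `maxExecDuration: 1h` setting used in the sample configurations.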
* __nnictl trial__ * __nnictl trial__
...@@ -120,6 +133,12 @@ nnictl log ...@@ -120,6 +133,12 @@ nnictl log
nnictl trial ls nnictl trial ls
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl trial kill__ * __nnictl trial kill__
* Description * Description
...@@ -133,6 +152,7 @@ nnictl log ...@@ -133,6 +152,7 @@ nnictl log
| Name, shorthand | Required|Default | Description | | Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ | | ------ | ------ | ------ |------ |
| --trialid, -t| True| |ID of the trial you want to kill.| | --trialid, -t| True| |ID of the trial you want to kill.|
| --id, -i| False| |ID of the experiment you want to set|
...@@ -147,6 +167,36 @@ nnictl log ...@@ -147,6 +167,36 @@ nnictl log
nnictl experiment show nnictl experiment show
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl experiment status__
* Description
Show the status of the experiment.
* Usage
nnictl experiment status
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl experiment list__
* Description
Show the id and start time of all running experiments.
* Usage
nnictl experiment list
* __nnictl config show__ * __nnictl config show__
...@@ -176,6 +226,7 @@ nnictl log ...@@ -176,6 +226,7 @@ nnictl log
| --head, -h| False| |show head lines of stdout| | --head, -h| False| |show head lines of stdout|
| --tail, -t| False| |show tail lines of stdout| | --tail, -t| False| |show tail lines of stdout|
| --path, -p| False| |show the path of stdout file| | --path, -p| False| |show the path of stdout file|
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl log stderr__ * __nnictl log stderr__
* Description * Description
...@@ -193,6 +244,7 @@ nnictl log ...@@ -193,6 +244,7 @@ nnictl log
| --head, -h| False| |show head lines of stderr| | --head, -h| False| |show head lines of stderr|
| --tail, -t| False| |show tail lines of stderr| | --tail, -t| False| |show tail lines of stderr|
| --path, -p| False| |show the path of stderr file| | --path, -p| False| |show the path of stderr file|
| --id, -i| False| |ID of the experiment you want to set|
* __nnictl log trial__ * __nnictl log trial__
* Description * Description
...@@ -209,3 +261,19 @@ nnictl log ...@@ -209,3 +261,19 @@ nnictl log
| ------ | ------ | ------ |------ | | ------ | ------ | ------ |------ |
| --id, -I| False| |the id of trial| | --id, -I| False| |the id of trial|
### Manage webui
* __nnictl webui url__
* Description
Show the URLs of the experiment.
* Usage
nnictl webui url
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --id, -i| False| |ID of the experiment you want to set|
\ No newline at end of file
...@@ -34,7 +34,7 @@ trialConcurrency: 2 ...@@ -34,7 +34,7 @@ trialConcurrency: 2
maxExecDuration: 3h maxExecDuration: 3h
# empty means never stop # empty means never stop
maxTrialNum: 100 maxTrialNum: 100
# choice: local, remote # choice: local, remote, pai
trainingServicePlatform: local trainingServicePlatform: local
# choice: true, false # choice: true, false
useAnnotation: true useAnnotation: true
......
...@@ -67,7 +67,7 @@ def detect_port(port): ...@@ -67,7 +67,7 @@ def detect_port(port):
socket_test = socket.socket(socket.AF_INET,socket.SOCK_STREAM) socket_test = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
try: try:
socket_test.connect(('127.0.0.1', int(port))) socket_test.connect(('127.0.0.1', int(port)))
socket_test.shutdown(2) socket_test.close()
return True return True
except: except:
return False return False
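Review note on the hunk above: after this change the socket is closed on success, but not when `connect` raises, and the bare `except:` also swallows unrelated errors. A possible alternative is sketched below; it is a suggestion rather than the committed code, and the `host` and `timeout` parameters are additions that do not exist in the original function.

```python
import socket


def detect_port(port, host='127.0.0.1', timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    # The context manager closes the socket on both success and failure.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, int(port)))
            return True
        except OSError:
            # Refused connections and timeouts are both OSError subclasses.
            return False
```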
...@@ -92,12 +92,12 @@ pai_config_schema = { ...@@ -92,12 +92,12 @@ pai_config_schema = {
machine_list_schima = { machine_list_schima = {
Optional('machineList'):[Or({ Optional('machineList'):[Or({
'ip': str, 'ip': str,
'port': And(int, lambda x: 0 < x < 65535), Optional('port'): And(int, lambda x: 0 < x < 65535),
'username': str, 'username': str,
'passwd': str 'passwd': str
},{ },{
'ip': str, 'ip': str,
'port': And(int, lambda x: 0 < x < 65535), Optional('port'): And(int, lambda x: 0 < x < 65535),
'username': str, 'username': str,
'sshKeyPath': os.path.exists, 'sshKeyPath': os.path.exists,
Optional('passphrase'): str Optional('passphrase'): str
......
...@@ -97,6 +97,11 @@ def validate_common_content(experiment_config): ...@@ -97,6 +97,11 @@ def validate_common_content(experiment_config):
experiment_config['maxExecDuration'] = '999d' experiment_config['maxExecDuration'] = '999d'
if experiment_config.get('maxTrialNum') is None: if experiment_config.get('maxTrialNum') is None:
experiment_config['maxTrialNum'] = 99999 experiment_config['maxTrialNum'] = 99999
if experiment_config['trainingServicePlatform'] == 'remote':
for index in range(len(experiment_config['machineList'])):
if experiment_config['machineList'][index].get('port') is None:
experiment_config['machineList'][index]['port'] = 22
except Exception as exception: except Exception as exception:
raise Exception(exception) raise Exception(exception)
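The default-port logic added in this hunk can be exercised in isolation. The sketch below restates it as a standalone function for illustration; `fill_default_ports` and `DEFAULT_SSH_PORT` are hypothetical names, and unlike the hunk it tolerates a missing `machineList` instead of raising.

```python
DEFAULT_SSH_PORT = 22  # the value the hunk assigns when 'port' is absent


def fill_default_ports(experiment_config):
    """Give every remote machine without a port the default SSH port."""
    if experiment_config.get('trainingServicePlatform') != 'remote':
        return experiment_config
    for machine in experiment_config.get('machineList', []):
        # Mirrors the hunk: only a missing (or None) port is replaced.
        if machine.get('port') is None:
            machine['port'] = DEFAULT_SSH_PORT
    return experiment_config
```

This pairs with the schema change that makes `port` optional in `machineList`: a configuration that omits the port validates and then ends up with port 22.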
......