add document for nnictl (#230)

* fix nnictl bug
* fix install.sh
* add desc for Dockerfile.build.base
* update document for Dockerfile
* update
* refactor port detect
* update
* refactor NNICTLDOC.md
* add document for pai and nnictl
* add default value for port

Unverified Commit ad88e3a3 authored Oct 17, 2018 by SparkSnail Committed by GitHub Oct 17, 2018
7 changed files
--- a/docs/ExperimentConfig.md
+++ b/docs/ExperimentConfig.md
@@ -12,7 +12,7 @@ experimentName:
 trialConcurrency: 
 maxExecDuration: 
 maxTrialNum: 
-#choice: local, remote
+#choice: local, remote, pai
 trainingServicePlatform: 
 searchSpacePath: 
 #choice: true, false
@@ -42,7 +42,7 @@ experimentName:
 trialConcurrency: 
 maxExecDuration: 
 maxTrialNum: 
-#choice: local, remote
+#choice: local, remote, pai
 trainingServicePlatform: 
 searchSpacePath: 
 #choice: true, false
@@ -79,7 +79,7 @@ experimentName:
 trialConcurrency: 
 maxExecDuration: 
 maxTrialNum: 
-#choice: local, remote
+#choice: local, remote, pai
 trainingServicePlatform: 
 #choice: true, false
 useAnnotation: 
@@ -146,6 +146,8 @@ machineList:
 	* __remote__ mode means you submit trial jobs to remote linux machines. If you set platform as remote, you should complete __machineList__ field.  
+	* __pai__ mode means you submit trial jobs to Microsoft's [OpenPAI](https://github.com/Microsoft/pai) platform. For more details on pai configuration, please refer to [PAIModeDoc](./PAIMode.md).
 * __searchSpacePath__
  * Description
@@ -268,7 +270,7 @@ experimentName: test_experiment
 trialConcurrency: 3
 maxExecDuration: 1h
 maxTrialNum: 10
-#choice: local, remote
+#choice: local, remote, pai
 trainingServicePlatform: local
 #choice: true, false
 useAnnotation: true
@@ -292,7 +294,7 @@ experimentName: test_experiment
 trialConcurrency: 3
 maxExecDuration: 1h
 maxTrialNum: 10
-#choice: local, remote
+#choice: local, remote, pai
 trainingServicePlatform: local
 searchSpacePath: /nni/search_space.json
 #choice: true, false
@@ -324,7 +326,7 @@ experimentName: test_experiment
 trialConcurrency: 3
 maxExecDuration: 1h
 maxTrialNum: 10
-#choice: local, remote
+#choice: local, remote, pai
 trainingServicePlatform: local
 searchSpacePath: /nni/search_space.json
 #choice: true, false
@@ -360,7 +362,7 @@ experimentName: test_experiment
 trialConcurrency: 3
 maxExecDuration: 1h
 maxTrialNum: 10
-#choice: local, remote
+#choice: local, remote, pai
 trainingServicePlatform: remote
 searchSpacePath: /nni/search_space.json
 #choice: true, false

--- a/docs/GetStarted.md
+++ b/docs/GetStarted.md
@@ -62,7 +62,7 @@ maxExecDuration: 3h
 # empty means never stop
 maxTrialNum: 100
-# choice: local, remote  
+# choice: local, remote, pai
 trainingServicePlatform: local
 # choice: true, false  

--- a/docs/NNICTLDOC.md
+++ b/docs/NNICTLDOC.md
@@ -14,6 +14,7 @@ nnictl trial
 nnictl experiment
 nnictl config
 nnictl log
+nnictl webui
 ```
 ### Manage an experiment
 * __nnictl create__ 
@@ -33,7 +34,7 @@ nnictl log
      | Name, shorthand | Required|Default | Description |
      | ------ | ------ | ------ |------ |
      | --config, -c|  True| |yaml configure file of the experiment|
+      | --port, -p  |  False| |port of the RESTful server|
 * __nnictl resume__
@@ -56,11 +57,20 @@ nnictl log
 * __nnictl stop__
  * Description
-		  You can use this command to stop a running experiment.
+		You can use this command to stop a running experiment or multiple experiments.
  * Usage
-        nnictl stop 
+        nnictl stop [id]
+  * Detail
+        1. If an id is specified and it matches a running experiment, nnictl stops that experiment; otherwise it prints an error message.
+        2. If no id is specified and an experiment is running, nnictl stops that experiment; otherwise it prints an error message.
+        3. If the id ends with *, nnictl stops all experiments whose ids match the pattern.
+        4. If the id does not match exactly but is the prefix of exactly one experiment id, nnictl stops that experiment.
+        5. If the id does not match exactly but is a prefix of several experiment ids, nnictl prints information about the matching ids.
+        6. Use 'nnictl stop all' to stop all experiments.
 * __nnictl update__
@@ -78,6 +88,7 @@ nnictl log
           | Name, shorthand | Required|Default | Description |
           | ------ | ------ | ------ |------ |
         | --filename, -f|  True| |the file storing your new search space|
+         | --id, -i|  False| |ID of the experiment you want to set|
 	* __nnictl update concurrency__  
        * Description
@@ -93,6 +104,7 @@ nnictl log
            | Name, shorthand | Required|Default | Description |
            | ------ | ------ | ------ |------ |
           | --value, -v|  True| |the number of allowed concurrent trials|
+           | --id, -i|  False| |ID of the experiment you want to set|
     * __nnictl update duration__  
        * Description
@@ -108,6 +120,7 @@ nnictl log
          | Name, shorthand | Required|Default | Description |
          | ------ | ------ | ------ |------ |
          | --value, -v|  True| |the experiment duration will be NUMBER seconds. SUFFIX may be 's' for seconds (the default), 'm' for minutes, 'h' for hours or 'd' for days.|
+          | --id, -i|  False| |ID of the experiment you want to set|
 * __nnictl trial__
@@ -120,6 +133,12 @@ nnictl log
           nnictl trial ls
+      Options:
+      | Name, shorthand | Required|Default | Description |
+      | ------ | ------ | ------ |------ |
+    | --id, -i|  False| |ID of the experiment whose trials you want to list|
  * __nnictl trial kill__
      * Description
@@ -133,6 +152,7 @@ nnictl log
          | Name, shorthand | Required|Default | Description |
          | ------ | ------ | ------ |------ |
         | --trialid, -t|  True| |ID of the trial you want to kill.| 
+         | --id, -i|  False| |ID of the experiment the trial belongs to|
@@ -147,6 +167,36 @@ nnictl log
 	     nnictl experiment show
+      Options:
+        | Name, shorthand | Required|Default | Description |
+        | ------ | ------ | ------ |------ |
+      | --id, -i|  False| |ID of the experiment you want to show|
+* __nnictl experiment status__
+  * Description
+	     Show the status of the experiment.
+   * Usage
+	     nnictl experiment status
+      Options:
+      | Name, shorthand | Required|Default | Description |
+      | ------ | ------ | ------ |------ |
+     | --id, -i|  False| |ID of the experiment whose status you want to show|
+* __nnictl experiment list__
+  * Description
+	     Show the id and start time of all running experiments.
+   * Usage
+	     nnictl experiment list
 * __nnictl config show__
@@ -176,6 +226,7 @@ nnictl log
     | --head, -h| False| |show head lines of stdout|
     | --tail, -t|  False| |show tail lines of stdout|
 	   | --path, -p|  False| |show the path of stdout file|
+     | --id, -i|  False| |ID of the target experiment|
 * __nnictl log stderr__
  * Description
@@ -193,6 +244,7 @@ nnictl log
    | --head, -h| False| |show head lines of stderr|
    | --tail, -t|  False| |show tail lines of stderr|
 	  | --path, -p|  False| |show the path of stderr file|
+    | --id, -i|  False| |ID of the target experiment|
 * __nnictl log trial__
  * Description
@@ -209,3 +261,19 @@ nnictl log
      | ------ | ------ | ------ |------ |
    | --id, -I| False| |the id of trial|
+### Manage webui
+* __nnictl webui url__
+   * Description
+	     Show the URLs of the experiment's web UI.
+   * Usage
+		    nnictl webui url
+    	Options:
+       | Name, shorthand | Required|Default | Description |
+       | ------ | ------ | ------ |------ |
+     | --id, -i|  False| |ID of the target experiment|
\ No newline at end of file
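The six id-matching rules added for `nnictl stop [id]` can be read as a small resolution function. The sketch below is hypothetical and is not nnictl's actual code: the name `resolve_stop_targets`, the plain list of running ids, and the choice to treat rule 2 as unambiguous only when exactly one experiment is running are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_stop_targets(
    arg: Optional[str], running_ids: Sequence[str]
) -> Tuple[List[str], str]:
    """Return (ids_to_stop, message) for a hypothetical `nnictl stop [id]` call."""
    ids = list(running_ids)
    if arg is None:
        # Rule 2: no id given. Assumption: only unambiguous when exactly one runs.
        if len(ids) == 1:
            return ids, ""
        if not ids:
            return [], "error: no experiment is running"
        return [], "error: several experiments are running: " + ", ".join(ids)
    if arg == "all":
        # Rule 6: stop every running experiment.
        return ids, ""
    if arg.endswith("*"):
        # Rule 3: a trailing * is treated as a prefix wildcard.
        prefix = arg[:-1]
        matched = [i for i in ids if i.startswith(prefix)]
        return matched, "" if matched else "error: no experiment matches " + arg
    if arg in ids:
        # Rule 1: exact match.
        return [arg], ""
    # Rules 4 and 5: fall back to prefix matching.
    matched = [i for i in ids if i.startswith(arg)]
    if len(matched) == 1:
        return matched, ""
    if matched:
        # Ambiguous prefix: report candidates instead of stopping anything.
        return [], "ambiguous id, candidates: " + ", ".join(matched)
    return [], "error: no experiment matches " + arg
```

For example, `resolve_stop_targets("b", ["a1", "b2"])` resolves the unique prefix to `["b2"]`, while `resolve_stop_targets("a", ["a1", "a2"])` stops nothing and lists both candidates.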
--- a/docs/RemoteMachineMode.md
+++ b/docs/RemoteMachineMode.md
@@ -34,7 +34,7 @@ trialConcurrency: 2
 maxExecDuration: 3h
 # empty means never stop
 maxTrialNum: 100
-# choice: local, remote  
+# choice: local, remote, pai
 trainingServicePlatform: local
 # choice: true, false  
 useAnnotation: true

--- a/tools/nnicmd/common_utils.py
+++ b/tools/nnicmd/common_utils.py
@@ -67,7 +67,7 @@ def detect_port(port):
    socket_test = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    try:
        socket_test.connect(('127.0.0.1', int(port)))
-        socket_test.shutdown(2)
+        socket_test.close()
        return True
    except:
        return False
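The change above swaps `shutdown(2)` for `close()`, but the socket is still not explicitly closed when `connect` raises, and the bare `except` also swallows unrelated exceptions such as `KeyboardInterrupt`. A minimal sketch of the same check that closes the socket on every path, assuming the intent is "return True if something already accepts connections on 127.0.0.1:port"; the one-second timeout is an assumption, not part of the original.

```python
import socket


def detect_port(port):
    """Return True if a TCP server already accepts connections on 127.0.0.1:port."""
    # The with-block closes the socket whether or not connect() succeeds.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)  # assumption: a short timeout suits a local check
        try:
            sock.connect(("127.0.0.1", int(port)))
        except OSError:
            # Refused or timed out: nothing is listening on that port.
            return False
        return True
```

Catching only `OSError` keeps the "port is free" answer for network failures while letting a non-numeric `port` surface as a `ValueError` from `int(port)`.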
--- a/tools/nnicmd/config_schema.py
+++ b/tools/nnicmd/config_schema.py
@@ -92,12 +92,12 @@ pai_config_schema = {
 machine_list_schima = {
 Optional('machineList'):[Or({
    'ip': str,
-    'port': And(int, lambda x: 0 < x < 65535),
+    Optional('port'): And(int, lambda x: 0 < x < 65535),
    'username': str,
    'passwd': str
    },{
    'ip': str,
-    'port': And(int, lambda x: 0 < x < 65535),
+    Optional('port'): And(int, lambda x: 0 < x < 65535),
    'username': str,
    'sshKeyPath': os.path.exists,
    Optional('passphrase'): str
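With `port` wrapped in `Optional`, a `machineList` entry may now omit it, but when present it must still be an int satisfying `0 < x < 65535` (note that this bound rejects 65535, which is itself a valid TCP port). The stdlib-only sketch below mirrors the two `Or` branches, password login or SSH-key login, to show which entries the schema is meant to accept. `validate_machine` is a hypothetical name, and this approximates rather than reproduces the `schema` package's exact semantics.

```python
import os


def _port_ok(entry):
    # 'port' is optional; when present it must be an int in (0, 65535),
    # matching the bound used in config_schema.py.
    if "port" not in entry:
        return True
    port = entry["port"]
    return isinstance(port, int) and 0 < port < 65535


def validate_machine(entry):
    """Return True if entry matches either machineList variant."""
    keys = set(entry)
    # Variant 1: password login; only 'port' may be added.
    pw_required = {"ip", "username", "passwd"}
    if pw_required <= keys <= pw_required | {"port"}:
        return _port_ok(entry) and all(isinstance(entry[k], str) for k in pw_required)
    # Variant 2: SSH key login; 'port' and 'passphrase' are optional.
    key_required = {"ip", "username", "sshKeyPath"}
    if key_required <= keys <= key_required | {"port", "passphrase"}:
        str_keys = {"ip", "username"} | ({"passphrase"} & keys)
        return (
            _port_ok(entry)
            and all(isinstance(entry[k], str) for k in str_keys)
            and isinstance(entry["sshKeyPath"], str)
            and os.path.exists(entry["sshKeyPath"])
        )
    # Unknown or missing keys match neither variant.
    return False
```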

--- a/tools/nnicmd/launcher_utils.py
+++ b/tools/nnicmd/launcher_utils.py
@@ -97,6 +97,11 @@ def validate_common_content(experiment_config):
            experiment_config['maxExecDuration'] = '999d'
        if experiment_config.get('maxTrialNum') is None:
            experiment_config['maxTrialNum'] = 99999
+        if experiment_config['trainingServicePlatform'] == 'remote':
+            for index in range(len(experiment_config['machineList'])):
+                if experiment_config['machineList'][index].get('port') is None:
+                    experiment_config['machineList'][index]['port'] = 22
    except Exception as exception:
        raise Exception(exception)
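The new block in `validate_common_content` fills in `port: 22`, the standard SSH port, for every remote machine that omits it. A standalone sketch of that defaulting step, assuming the config has already been parsed into a dict; `apply_default_ssh_port` is a hypothetical name. Unlike the original loop, it iterates over the entries directly and uses `.get`, so a remote config without `machineList` is left unchanged instead of raising a `KeyError`.

```python
def apply_default_ssh_port(experiment_config, default_port=22):
    """Set port=default_port on remote machines that do not specify one (in place)."""
    if experiment_config.get("trainingServicePlatform") == "remote":
        for machine in experiment_config.get("machineList", []):
            # Only fill the gap; an explicitly configured port is kept.
            if machine.get("port") is None:
                machine["port"] = default_port
    return experiment_config
```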