- 25 Apr, 2019 1 commit
chicm-ms authored
-
- 18 Apr, 2019 1 commit
chicm-ms authored
* Refactoring local training service
* Designated GPU for local training service
* RemoteMachine designated GPU configuration
-
- 14 Mar, 2019 1 commit
SparkSnail authored
An SSH client can hold only a limited number of open channels per connection. If trialConcurrency is set too high, the SSH client executes commands over the same connection so frequently that it exceeds this limit and fails with `Error: (SSH) Channel open failure: open failed`. Refactor the code so that each connection has a maximum trial concurrency: when the number of trials reaches the limit for an SSH connection, a new SSH connection is created to execute trial commands.
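The idea in the commit message above can be illustrated with a minimal sketch. This is not the code from the commit: `PooledConnection`, `ConnectionPool`, and `max_channels_per_conn` are hypothetical names, and no real SSH library is used. The sketch only models the bookkeeping, assuming each trial command occupies one channel until it is released.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PooledConnection:
    """Hypothetical stand-in for one SSH connection and its open-channel count."""
    conn_id: int
    open_channels: int = 0


class ConnectionPool:
    """Caps the channels used per connection; opens a new connection when all are full."""

    def __init__(self, max_channels_per_conn: int) -> None:
        if max_channels_per_conn < 1:
            raise ValueError("max_channels_per_conn must be at least 1")
        self.max_channels_per_conn = max_channels_per_conn
        self.connections: List[PooledConnection] = []

    def acquire(self) -> PooledConnection:
        """Reserve one channel, reusing the first connection with spare capacity."""
        for conn in self.connections:
            if conn.open_channels < self.max_channels_per_conn:
                conn.open_channels += 1
                return conn
        # Every existing connection is at its limit: model opening a new one.
        conn = PooledConnection(conn_id=len(self.connections), open_channels=1)
        self.connections.append(conn)
        return conn

    def release(self, conn: PooledConnection) -> None:
        """Return one channel to the connection once the trial command finishes."""
        if conn.open_channels <= 0:
            raise RuntimeError("release called on a connection with no open channels")
        conn.open_channels -= 1


if __name__ == "__main__":
    pool = ConnectionPool(max_channels_per_conn=2)
    used = [pool.acquire().conn_id for _ in range(5)]
    print(used)  # [0, 0, 1, 1, 2]
```

With a limit of two channels per connection, five concurrent trials spread over three connections instead of overloading one. A real implementation would also need to close idle connections and handle connection failures, which this sketch leaves out.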
-
- 29 Jan, 2019 1 commit
SparkSnail authored
* fix remote bug
* add document
* add document
* update
* update
* update
* update
* fix remote issue
* fix forEach
* update doc according to comments
* update
* update
* update
* remove 'any more'
* add base version for remote-log
* change launcher.py
* test
* basic version
* debug
* debug
* basic work version
* fix code
* update disable_log
* remove unused line
* add disable log in kubernetesTrainingService
* add detect frameworkcontroller
* fix comment
* update
* update
* fix kubernetesData
* debug
* debug
* debug
* fix comment
* fix conflict
* remove local temp files
* revert launcher.py
* update code by comments
* remove disableLog
* remove disable Log
* set timeout for cleanup
* fix code by comments
* update variable names
* add comments
* add delay function
* update
* update
* update by comments
* add in remote script path
* rename variables
* update variable name
* add mkdir -p for subfolder
-
- 14 Sep, 2018 1 commit
fishyds authored
* Merge latest code changes into Github Master
* temporary modification for travis
* temporary modification for travis
-
- 20 Aug, 2018 1 commit
Deshui Yu authored