- 20 Jun, 2019 1 commit
-
-
demianzhang authored
* fix local and remote training services tslint
-
- 19 Jun, 2019 1 commit
-
-
Hongarc authored
-
- 27 Mar, 2019 1 commit
-
-
SparkSnail authored
-
- 22 Mar, 2019 1 commit
-
-
SparkSnail authored
If the user sets remoteloggingType in the config file, log content will not be transmitted from trialkeeper.
-
- 15 Mar, 2019 1 commit
-
-
SparkSnail authored
Check the nni version in trialkeeper to make sure the version of trialkeeper is consistent with trainingService. Add a debug mode in the config file.
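A minimal sketch of the kind of consistency check this commit describes. The function names and the comparison rule (exact match after stripping whitespace and a leading "v") are assumptions for illustration, not the code from the commit.

```python
def normalize_version(version: str) -> str:
    """Strip surrounding whitespace and a leading 'v' (assumed convention)."""
    return version.strip().lstrip("vV")


def versions_consistent(trial_keeper_version: str,
                        training_service_version: str) -> bool:
    """Return True when both components report the same version.

    Hypothetical helper: the real check may compare only part of the
    version string or treat a mismatch differently in debug mode.
    """
    return (normalize_version(trial_keeper_version)
            == normalize_version(training_service_version))
```

A caller could log an error and stop the trial when `versions_consistent(...)` returns False.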
-
- 25 Feb, 2019 2 commits
-
-
SparkSnail authored
trial_keeper uses port 50070 to connect to the webhdfs server, and PAI uses a mapping method to map port 50070 to port 5070 to visit the restful server. This method is risky because PAI may not support this kind of mapping in later releases. Now use the Pylon path (/webhdfs/api/v1) instead of port 50070 in the webhdfs client of trial_keeper; the path is transmitted in trainingService. In this PR, we have these changes:
1. Change to use the webhdfs path instead of port 50070 in the hdfs client.
2. Change to use the new hdfs package "PythonWebHDFS", which I built to support Pylon. You can use the "sparksnail/nni:dev-pai" image to test the new function with the PAI trainingService.
3. Update some variable names according to review comments.
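To illustrate the change from a fixed port to a path, here is a rough sketch of building a WebHDFS request URL from a path prefix such as the Pylon path mentioned above. The helper name, the host value, and the use of plain HTTP are assumptions; this is not the PythonWebHDFS API.

```python
from urllib.parse import urlencode


def build_webhdfs_url(host, webhdfs_path, hdfs_file_path, operation, user=None):
    """Build a WebHDFS REST URL that goes through a path prefix, not a port.

    host           -- hypothetical gateway host name
    webhdfs_path   -- path prefix, e.g. "/webhdfs/api/v1" from the commit message
    hdfs_file_path -- file path inside HDFS
    operation      -- WebHDFS operation name, e.g. "GETFILESTATUS" or "OPEN"
    user           -- optional value for the "user.name" query parameter
    """
    prefix = "/" + webhdfs_path.strip("/")
    query = {"op": operation}
    if user:
        query["user.name"] = user
    return "http://{}{}/{}?{}".format(
        host, prefix, hdfs_file_path.lstrip("/"), urlencode(query))
```

Because the prefix is passed in as a parameter, the training service can hand it to the trial keeper instead of both sides hard-coding a port.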
-
fishyds authored
* Fix a race condition bug that caused the Trial Job cancel status to be stored incorrectly
-
- 17 Dec, 2018 1 commit
-
-
fishyds authored
* [PAI training service] codeDir files upload improvement * Create full local temp folder * Organize the folder structure for experiment and trial files
-
- 20 Nov, 2018 1 commit
-
-
fishyds authored
* Kubeflow TrainingService support, v1 (#373)
1. Create a new Training Service: the kubeflow training service, which uses 'kubectl' and the kubeflow tfjobs CRD to submit and manage jobs
2. Update the nni python SDK to support the new kubeflow platform
3. Update the nni python SDK's get_sequence_id() implementation to read the NNI_TRIAL_SEQ_ID env variable instead of reading the .nni/sequence_id file
4. This version only supports the TensorFlow operator. Support for more operators will be added in future versions
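Item 3 can be sketched as follows. The function name `read_sequence_id` and the fallback default are assumptions for illustration; only the NNI_TRIAL_SEQ_ID variable name comes from the commit message.

```python
import os


def read_sequence_id(default: int = 0) -> int:
    """Read the trial sequence id from the NNI_TRIAL_SEQ_ID env variable.

    Hypothetical stand-in for the SDK function: returning `default` when
    the variable is unset is an assumption, not documented behavior.
    """
    raw = os.environ.get("NNI_TRIAL_SEQ_ID")
    if raw is None:
        return default
    return int(raw)
```

Reading from the environment avoids depending on a file under .nni being present inside the trial container.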
-
- 12 Nov, 2018 1 commit
-
-
fishyds authored
* Change base image from devel to runtime, to reduce docker image size * Support running multiple experiments for PAI * Fix a bug caused by a recursive reference between paiRestServer and paiTrainingService
-
- 05 Nov, 2018 1 commit
-
-
Gems Guo authored
-
- 02 Nov, 2018 1 commit
-
-
Gems Guo authored
-
- 31 Oct, 2018 3 commits
- 17 Oct, 2018 1 commit
-
-
fishyds authored
Fix paiTrainingService breakage caused by the OpenPAI API upgrade
-
- 12 Oct, 2018 1 commit
-
-
chicm-ms authored
* Pull latest code (#2) * webui logpath and document (#135) * Add webui document and logpath as a href * fix tslint * fix comments by Chengmin * Pai training service bug fix and enhancement (#136) * Add NNI installation scripts * Update pai script, update NNI_out_dir * Update NNI dir in nni sdk local.py * Create .nni folder in nni sdk local.py * Add check before creating .nni folder * Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT * Improve annotation (#138) * Improve annotation * Minor bugfix * Selectively install through pip (#139) Selectively install through pip * update setup.py * fix paiTrainingService bugs (#137) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * Add documentation for NNI PAI mode experiment (#141) * Add documentation for NNI PAI mode * Fix typo based on PR comments * Exit with subprocess return code of trial keeper * Remove additional exit code * Fix typo based on PR comments * update doc for smac tuner (#140) * Revert "Selectively install through pip (#139)" due to potential pip install issue (#142) * Revert "Selectively install through pip (#139)" This reverts commit 1d174836. 
* Add exit code of subprocess for trial_keeper * Update README, add link to PAImode doc * fix bug (#147) * Refactor nnictl and add config_pai.yml (#144) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * add config_pai.yml * refactor nnictl create logic and add colorful print * fix nnictl stop logic * add annotation for config_pai.yml * add document for start experiment * fix config.yml * fix document * Fix trial keeper wrongly exit issue (#152) * Fix trial keeper bug, use actual exitcode to exit rather than 1 * Fix bug of table sort (#145) * Update doc for PAIMode and v0.2 release notes (#153) * Update v0.2 documentation regarding release note and PAI training service * Update document to describe NNI docker image * Bug fix for SQuAD example tuner. (#134)
* Update Makefile (#151) * test * update setup.py * update Makefile and install.sh * revert setup.py * change color * update doc * update doc * fix auto-completion's extra space * update Makefile * update webui * Update doc image (#163) * update doc * trivial * trivial * trivial * trivial * trivial * trivial * update image * update image size * Update ga squad (#104) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * update readme * sklearn examples (#169) * fix nnictl bug * fix install.sh * add sklearn-regression example * add sklearn classification * update sklearn * update example * remove additional code * Update batch tuner (#158) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * update readme * update batch tuner * Quickly fix cascading search space bug in tuner (#156) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * update readme * quickly fix cascading searchspace bug in tuner * Add iterative search space example (#119) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * update readme * add iterative search space example * update * update readme * change name * Add api nni.get_sequence_id() * Add sequence_id to TrialJobDetail
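The trial keeper exit-code fix (#152) in this merge, "use actual exitcode to exit rather than 1", might look roughly like the sketch below. The `run_trial` helper is hypothetical and only shows the idea of propagating the child's return code.

```python
import subprocess
import sys


def run_trial(command):
    """Run the trial command and return its real exit code.

    Hypothetical helper: the wrapper reports whatever the child process
    returned instead of collapsing every failure into 1.
    """
    process = subprocess.Popen(command)
    return process.wait()


# Usage (assumed entry point): pass the child's code on to the caller.
# sys.exit(run_trial(["python3", "trial.py"]))
```

Propagating the real code lets the training service tell a failed trial apart from a wrapper error.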
-
- 29 Sep, 2018 1 commit
-
-
fishyds authored
* webui logpath and document (#135) * Add webui document and logpath as a href * fix tslint * fix comments by Chengmin * Pai training service bug fix and enhancement (#136) * Add NNI installation scripts * Update pai script, update NNI_out_dir * Update NNI dir in nni sdk local.py * Create .nni folder in nni sdk local.py * Add check before creating .nni folder * Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT * Improve annotation (#138) * Improve annotation * Minor bugfix * Selectively install through pip (#139) Selectively install through pip * update setup.py * fix paiTrainingService bugs (#137) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * Add documentation for NNI PAI mode experiment (#141) * Add documentation for NNI PAI mode * Fix typo based on PR comments * Exit with subprocess return code of trial keeper * Remove additional exit code * Fix typo based on PR comments * update doc for smac tuner (#140) * Revert "Selectively install through pip (#139)" due to potential pip install issue (#142) * Revert "Selectively install through pip (#139)" This reverts commit 1d174836. * Add exit code of subprocess for trial_keeper * Update README, add link to PAImode doc
-
- 28 Sep, 2018 1 commit
-
-
fishyds authored
* Add NNI installation scripts * Update pai script, update NNI_out_dir * Update NNI dir in nni sdk local.py * Create .nni folder in nni sdk local.py * Add check before creating .nni folder * Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT
-
- 27 Sep, 2018 1 commit
-
-
fishyds authored
* PAI Training service implementation
1. Implement PAITrainingService
2. Add trial-keeper python module, and modify setup.py to install the module
3. Add PAItrainingService rest server to collect metrics from PAI container.
-