- 08 Jan, 2019 1 commit
fishyds authored
* Fix a race condition in trial_keeper when reading logs from a pipe
-
- 02 Jan, 2019 1 commit
fishyds authored
* [Logging architecture refactor] Remove unused metrics-related code in nni trial_tools, support kubeflow mode for logging architecture refactor (#551)
-
- 29 Dec, 2018 1 commit
fishyds authored
* Remove unused log code, rename some classes in nni sdk and trial_tools
* Fix the regression bug that local/remote mode doesn't work
-
- 20 Dec, 2018 1 commit
fishyds authored
* Update nnictl.py: fix the issue that nnictl --version doesn't work when installed via pip
* Update kubeflow training service document (#494)
* Remove kubectl related document, add messages for kubeconfig
* Add design section for kubeflow training service
* Move the image files for PAI training service doc into img folder
* Update KubeflowMode.md (#498): small terms change
* [V0.4.1 bug fix] Cannot run kubeflow training service due to trial_keeper change (#503)
* Update kubeflow training service document
* Fix a bug that kubeflow trial job cannot run
* Upgrade version number (#499)
* [V0.4.1 bug fix] Support reading K8S config from KUBECONFIG environment variable (#507)
* Add KUBECONFIG env variable support
* In main.ts, throw cached error to make sure nnictl can show the error in stderr
-
- 17 Dec, 2018 1 commit
fishyds authored
* [PAI training service] codeDir files upload improvement
* Create full local temp folder
* Organize the folder structure for experiment and trial files
-
- 29 Nov, 2018 1 commit
fishyds authored
* [Trial keeper refactor] refactor trial keeper stdout output
-
- 20 Nov, 2018 1 commit
fishyds authored
* Kubeflow TrainingService support, v1 (#373)
  1. Create new Training Service: kubeflow training service, use 'kubectl' and kubeflow tfjobs CRD to submit and manage jobs
  2. Update nni python SDK to support new kubeflow platform
  3. Update nni python SDK's get_sequence_id() implementation to read the NNI_TRIAL_SEQ_ID env variable instead of reading the .nni/sequence_id file
  4. This version only supports the TensorFlow operator. Will add support for more operators in future versions
-
- 12 Nov, 2018 1 commit
fishyds authored
* Change base image from devel to runtime, to reduce docker image size
* Support running multiple experiments for PAI
* Fix a bug regarding the recursive reference between paiRestServer and paiTrainingService
-
- 05 Nov, 2018 1 commit
Gems Guo authored
-