- 30 Nov, 2018 5 commits
-
-
SparkSnail authored
1. Add kubeflow to the experiment config document 2. Add AKS to the kubeflow document
-
fishyds authored
* [Kubeflow training service] fix bug that wrongly split kube delete cmd into 2 lines * Adjust white space
-
Zejun Lin authored
* fix bug * add docs
-
Lijiao authored
* Support showing 2 logPath values * fix lint * Update trial status color
-
Scarlett Li authored
* update doc for "write trial" * fix link * issue 414
-
- 29 Nov, 2018 5 commits
-
-
fishyds authored
* Kubeflow training service documentation, v1 * Fix typos based on comments
-
SparkSnail authored
1. Refactor the nnictl information shown on validation errors. 2. Make kubernetesServer optional.
-
fishyds authored
* [Trial keeper refactor] refactor trial keeper stdout output
-
Lijiao authored
-
fishyds authored
* Add codeDir file count validation for setClusterConfig * fix a small bug if find command is not installed * Remove codeDir validation for local training service * Remove useless import
-
- 28 Nov, 2018 5 commits
-
-
chicm-ms authored
* Fix trial job start time * updates * updates
-
Lijiao authored
* Fix bug * fix lint
-
Matei13 authored
-
fishyds authored
* [PAI training service] Support virtual cluster config * fix a small bug to convert virtualCluster to string
-
SparkSnail authored
Support AKS in the kubeflow training service. Support setting nniManagerIp via nnictl.
-
- 27 Nov, 2018 4 commits
-
-
Lijiao authored
-
fishyds authored
* fix bugs due to ts.tailstream (#273)
-
Yan Ni authored
* update Makefile for mac support, wait for aka.ms support * refix Makefile for colorful echo * update Makefile with shorturl * fix false fail on mac webui * fix cross os remote tmpdir issue * add readonly to RemoteMachineTrainingService.remoteOS * fix var name for PR 386
-
chicm-ms authored
* Rest retrieve multiple final results for multiphase job * updates
-
- 25 Nov, 2018 2 commits
-
-
QuanluZhang authored
-
QuanluZhang authored
* add one more trial job status, EARLY_STOPPED * fix datastore/nnimanager/mockeddatastore. test/webui/metrics_reader not done. USER_TO_CANCEL * fix bug * modifications based on Deshui's comments * fix bug * fix bug in remote mode
-
- 23 Nov, 2018 4 commits
-
-
SparkSnail authored
Add nniManagerIp to nnictl, the pai TrainingService, and the kubeflow TrainingService. If users set nniManagerIp, pai and kubeflow use this IP instead of calling the getIPV4() function. The Web UI also uses this nniManagerIp.
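The fallback described in this commit can be sketched roughly as follows. This is an illustrative Python stand-in, not the project's code: `resolve_nni_manager_ip` and `detect_local_ipv4` are hypothetical names, and the detection routine only plays the role that getIPV4() has in the commit message.

```python
import socket


def detect_local_ipv4():
    """Hypothetical stand-in for an auto-detection routine such as getIPV4()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only selects a local address.
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]
    except OSError:
        # No usable route: fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


def resolve_nni_manager_ip(configured_ip=None, detect=detect_local_ipv4):
    """Prefer the user-configured IP; auto-detect only when none is set."""
    return configured_ip if configured_ip else detect()
```

Passing the detector in as a parameter keeps the precedence rule easy to check without touching the network.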
-
fishyds authored
* Adjust sleep position for sdk_test.py * Exit dispatcher process on receiving the Terminate command * Add comment for sleep change in sdk_test.py
-
SparkSnail authored
In nnictl, make classArgs optional, because some tuners and assessors do not require classArgs.
-
fishyds authored
* Use different output folder for ps and worker * Add cuda_visible_devices env var if gpuNum is 0
-
- 22 Nov, 2018 5 commits
-
-
Yan Ni authored
* add gpuNum check for local TS * set CUDA_VISIBLE_DEVICES to empty string when gpuNum is 0 * remove redundant code
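A minimal sketch of the second item, assuming the trial process receives its environment as a dictionary. The helper name `build_trial_env` is hypothetical and is not taken from the repository; the only behavior borrowed from the commit is that CUDA_VISIBLE_DEVICES becomes an empty string when gpuNum is 0, which hides all GPUs from CUDA.

```python
import os


def build_trial_env(gpu_num, base_env=None):
    """Hypothetical helper: copy the environment and hide GPUs when none are requested."""
    env = dict(os.environ if base_env is None else base_env)
    if gpu_num == 0:
        # An empty value means CUDA sees no devices at all.
        env["CUDA_VISIBLE_DEVICES"] = ""
    return env
```

The returned dictionary could then be passed as the `env` argument of `subprocess.Popen` when launching a trial.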
-
fishyds authored
[Kubeflow training service] Update kubeflow exp job config schema to support distributed training (#387) * Support distributed training on tf-operator, for worker and ps * Update validation rule for kubeflow config * small code refactor adjustment for private methods * Use different output folder for ps and worker
-
chicm-ms authored
* Asynchronous dispatcher * updates * updates * updates * updates
-
Lijiao authored
-
Zejun Lin authored
* fix sdk's unittest and add medianstop, batchtuner to ci * fix sdk's unittest and add medianstop, batchtuner to ci * remove debug info * update azure-pipelines * remove useless code * add some checks * fix pylint * update ci test * update ci
-
- 20 Nov, 2018 2 commits
-
-
The Gitter Badger authored
-
fishyds authored
* Kubeflow TrainingService support, v1 (#373) 1. Create new Training Service: kubeflow training service, use 'kubectl' and kubeflow tfjobs CRD to submit and manage jobs 2. Update nni python SDK to support new kubeflow platform 3. Update nni python SDK's get_sequence_id() implementation, read NNI_TRIAL_SEQ_ID env variable, instead of reading .nni/sequence_id file 4. This version only supports the TensorFlow operator. Will add more operators' support in future versions
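Item 3 of this commit message can be illustrated with a short sketch. It assumes NNI_TRIAL_SEQ_ID holds a decimal integer; the function name `read_trial_sequence_id` and the default value are hypothetical and do not reproduce the SDK's actual implementation.

```python
import os


def read_trial_sequence_id(default=0):
    """Hypothetical helper: take the trial sequence id from the environment."""
    raw = os.environ.get("NNI_TRIAL_SEQ_ID")
    if raw is None:
        # Variable not set, e.g. when running outside a trial job.
        return default
    return int(raw)
```

Reading an environment variable avoids depending on a file path inside the trial's working directory, which is presumably why the commit moved away from the .nni/sequence_id file.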
-
- 19 Nov, 2018 2 commits
- 16 Nov, 2018 2 commits
-
-
SparkSnail authored
Fix "nnictl stop"
-
Zejun Lin authored
* add gridsearch tuner * add gridsearchtuner * add gridsearchtuner * add gridsearchtuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch and pylint
-
- 15 Nov, 2018 1 commit
-
-
Lijiao authored
-
- 14 Nov, 2018 3 commits