- 28 Nov, 2018 5 commits
chicm-ms authored
* Fix trial job start time * updates * updates
-
Lijiao authored
* Fix bug * fix lint
-
Matei13 authored
-
fishyds authored
* [PAI training service] Support virtual cluster config * fix a small bug to convert virtualCluster to string
-
SparkSnail authored
Support AKS for Kubeflow training service; support setting nniManagerIp in nnictl
-
- 27 Nov, 2018 4 commits
Lijiao authored
-
fishyds authored
* fix bugs due to ts.tailstream (#273)
-
Yan Ni authored
* update Makefile for mac support, wait for aka.ms support
* refix Makefile for colorful echo
* update Makefile with shorturl
* fix false fail on mac webui
* fix cross os remote tmpdir issue
* add readonly to RemoteMachineTrainingService.remoteOS
* fix var name for PR 386
-
chicm-ms authored
* REST: retrieve multiple final results for multiphase job * updates
-
- 25 Nov, 2018 2 commits
QuanluZhang authored
-
QuanluZhang authored
* add one more trial job status, EARLY_STOPPED
* fix datastore/nnimanager/mockeddatastore. test/webui/metrics_reader not done. USER_TO_CANCEL
* fix bug
* modifications based on Deshui's comments
* fix bug
* fix bug in remote mode
-
- 23 Nov, 2018 4 commits
SparkSnail authored
Add nniManagerIp to nnictl, the PAI TrainingService and the Kubeflow TrainingService. If users set nniManagerIp, PAI and Kubeflow use this IP instead of calling the getIPV4() function. The Web UI also uses this nniManagerIp.
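The fallback rule in this commit (prefer the configured IP, otherwise auto-detect) can be sketched as follows. This is a minimal illustration only. `resolve_manager_ip` is a hypothetical helper, and the `get_ipv4` callable stands in for the real getIPV4() function. It is not the actual NNI implementation.

```python
from typing import Callable, Optional


def resolve_manager_ip(nni_manager_ip: Optional[str],
                       get_ipv4: Callable[[], str]) -> str:
    """Hypothetical helper: pick the IP that trials use to reach NNI manager."""
    # A non-empty, user-configured nniManagerIp takes precedence.
    if nni_manager_ip:
        return nni_manager_ip
    # Otherwise fall back to the automatically detected IPv4 address.
    return get_ipv4()


print(resolve_manager_ip("10.0.0.5", lambda: "192.168.1.2"))  # 10.0.0.5
print(resolve_manager_ip(None, lambda: "192.168.1.2"))        # 192.168.1.2
```

Injecting the detector as a callable keeps the sketch testable without touching real network interfaces.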
-
fishyds authored
* Adjust sleep position for sdk_test.py * Exit dispatcher process if it receives Terminate command * Add comment for sleep change in sdk_test.py
-
SparkSnail authored
In nnictl, classArgs is no longer required. It is now optional, because some tuners and assessors do not need classArgs.
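Treating classArgs as optional amounts to "no classArgs means no constructor arguments". A minimal sketch, assuming a hypothetical `instantiate_builtin` helper and a hypothetical `DummyTuner` class (neither is nnictl's actual code):

```python
from typing import Any, Dict, Type


class DummyTuner:
    """Hypothetical tuner whose constructor arguments all have defaults."""

    def __init__(self, optimize_mode: str = "maximize") -> None:
        self.optimize_mode = optimize_mode


def instantiate_builtin(cls: Type[Any], section: Dict[str, Any]) -> Any:
    """Hypothetical helper: build a tuner/assessor from its config section."""
    # A missing (or null) classArgs means "call the constructor with no args"
    # instead of failing validation.
    class_args = section.get("classArgs") or {}
    return cls(**class_args)


print(instantiate_builtin(DummyTuner, {}).optimize_mode)  # maximize
print(instantiate_builtin(DummyTuner, {"classArgs": {"optimize_mode": "minimize"}}).optimize_mode)  # minimize
```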
-
fishyds authored
* Use different output folder for ps and worker * Add CUDA_VISIBLE_DEVICES env var if gpuNum is 0
-
- 22 Nov, 2018 5 commits
Yan Ni authored
* add gpuNum check for local TS * set CUDA_VISIBLE_DEVICES to empty string when gpuNum is 0 * remove redundant code
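Setting CUDA_VISIBLE_DEVICES to an empty string hides every GPU from CUDA applications, so a trial with gpuNum 0 runs on CPU only. A minimal sketch of building a trial's environment under that rule, assuming a hypothetical `build_trial_env` helper and a list of free GPU indices. The ValueError checks are illustrative assumptions, not the local training service's actual validation.

```python
import os
from typing import Dict, Mapping, Sequence


def build_trial_env(gpu_num: int, free_gpus: Sequence[int],
                    base_env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper: copy base_env and set CUDA_VISIBLE_DEVICES."""
    if gpu_num < 0:
        raise ValueError("gpuNum must be non-negative")
    env = dict(base_env)
    if gpu_num == 0:
        # Empty string: no GPU is visible to the trial process.
        env["CUDA_VISIBLE_DEVICES"] = ""
    else:
        if gpu_num > len(free_gpus):
            raise ValueError("not enough free GPUs for gpuNum")
        # Expose only the first gpu_num free devices, comma-separated.
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in free_gpus[:gpu_num])
    return env


print(repr(build_trial_env(0, [0, 1], base_env={})["CUDA_VISIBLE_DEVICES"]))  # ''
print(build_trial_env(2, [3, 1, 0], base_env={})["CUDA_VISIBLE_DEVICES"])     # 3,1
```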
-
fishyds authored
[Kubeflow training service] Update kubeflow exp job config schema to support distributed training (#387)
* Support distributed training on tf-operator, for worker and ps
* Update validation rule for kubeflow config
* small code refactor adjustment for private methods
* Use different output folder for ps and worker
-
chicm-ms authored
* Asynchronous dispatcher * updates * updates * updates * updates
-
Lijiao authored
-
Zejun Lin authored
* fix sdk's unittest and add medianstop, batchtuner to ci * fix sdk's unittest and add medianstop, batchtuner to ci * remove debug info * update azure-pipelines * remove useless code * add some checks * fix pylint * update ci test * update ci
-
- 20 Nov, 2018 2 commits
The Gitter Badger authored
-
fishyds authored
* Kubeflow TrainingService support, v1 (#373)
1. Create a new training service, the Kubeflow training service, which uses 'kubectl' and the Kubeflow TFJob CRD to submit and manage jobs
2. Update the NNI Python SDK to support the new Kubeflow platform
3. Update the NNI Python SDK's get_sequence_id() implementation to read the NNI_TRIAL_SEQ_ID environment variable instead of the .nni/sequence_id file
4. This version only supports the TensorFlow operator; support for more operators will be added in future versions
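Reading the sequence ID from the environment rather than from a file can be sketched as follows. This assumes the training service exports NNI_TRIAL_SEQ_ID as a decimal integer before the trial starts. `read_sequence_id` is a hypothetical name, not the SDK's get_sequence_id() source.

```python
import os
from typing import Mapping


def read_sequence_id(environ: Mapping[str, str] = os.environ) -> int:
    """Hypothetical helper: return the trial's sequence ID from the environment."""
    # Assumed to be set by the training service; a KeyError means it was not.
    return int(environ["NNI_TRIAL_SEQ_ID"])


print(read_sequence_id({"NNI_TRIAL_SEQ_ID": "7"}))  # 7
```

Passing the mapping explicitly lets the function be exercised without mutating the real process environment.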
-
- 19 Nov, 2018 2 commits
- 16 Nov, 2018 2 commits
SparkSnail authored
Fix "nnictl stop"
-
Zejun Lin authored
* add gridsearch tuner * add gridsearchtuner * add gridsearchtuner * add gridsearchtuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch and pylint
-
- 15 Nov, 2018 1 commit
Lijiao authored
-
- 14 Nov, 2018 4 commits
- 13 Nov, 2018 5 commits
Scarlett Li authored
- Updated document for "write a trial" related fixes per Quanlu's feedback; - Fix wrong links in Get started per Meng's feedback.
-
SparkSnail authored
Remove "RUN python3 -m pip --no-cache-dir install torch torchvision"
-
SparkSnail authored
1. Set scikit-learn==0.20.0 in Dockerfile
2. Update readme.md of Dockerfile
3. Add PyTorch 0.4.1
4. Add description for 'nnictl stop all'
-
gongwuji authored
* update local demo doc and configuration * change folder name * Update tutorial_1_CR_exp_local_api.md no need to have a new training file * Delete mnist_gpu.py no need to have a new training file * Update config_gpu.yml no need to have a new training file * add PyTorch to Dockerfile
-
gongwuji authored
* update local demo doc and configuration * change folder name * Update tutorial_1_CR_exp_local_api.md no need to have a new training file * Delete mnist_gpu.py no need to have a new training file * Update config_gpu.yml no need to have a new training file
-
- 12 Nov, 2018 4 commits
QuanluZhang authored
* update makefile * update launcher.py to fix the problem of finding main.js * remove duplicated lib
-
fishyds authored
* Change base image from devel to runtime, to reduce docker image size * Support running multiple experiments for PAI * Fix a bug caused by a recursive reference between paiRestServer and paiTrainingService
-
QuanluZhang authored
* update doc for docker image * update
-
noklam authored
* Update nnictl.py * modify help message for nnictl stop