Commits · d2f597a63d855913f4a739812892633e9f4d2f49 · OpenDAS / nni

28 Nov, 2018 5 commits
- Fix trial start time (#408) · d2f597a6
  chicm-ms authored Nov 28, 2018
```
* Fix trial job start time

* updates

* updates
```
  d2f597a6
- Fix bug of webui's table (#407) · 8dc2a975
  Lijiao authored Nov 28, 2018
```
* Fix bug

* fix lint
```
  8dc2a975
- Correct typo (#402) · 194af6f0
  Matei13 authored Nov 28, 2018
  
  194af6f0
- [PAI training service] Support virtualCluster configuration (#401) · 154bcc55
  fishyds authored Nov 28, 2018
```
* [PAI training service] Support virtual cluster config

* fix a small bug to convert virtualCluster to string
```
  154bcc55
- Support Azure k8s (#383) · 21a2bb0b
  SparkSnail authored Nov 28, 2018
```
Support AKS for the Kubeflow training service
Support nnictl set nniManagerIp
```
  21a2bb0b
27 Nov, 2018 4 commits

Fix bugs and update webui doc (#397) · 56b46003
Lijiao authored Nov 27, 2018

56b46003
Merge v0.2 branch back to master for PR #273 (#400) · b2b4f458
fishyds authored Nov 27, 2018
```
* fix bugs due to ts.tailstream (#273)
```
b2b4f458

mac support with local, remote & pai mode (#386) · 101b02ff

Yan Ni authored Nov 27, 2018

* update Makefile for mac support, wait for aka.ms support

* refix Makefile for colorful echo

* update Makefile with shorturl

* fix false fail on mac webui

* fix cross os remote tmpdir issue

* add readonly to RemoteMachineTrainingService.remoteOS

* fix var name for PR 386

101b02ff

Multi final metrics (#377) · 694bb539
chicm-ms authored Nov 27, 2018
```
* Rest retrieve multiple final results for multiphase job

* updates
```
694bb539

25 Nov, 2018 2 commits

add NO_MORE_TRIAL state in experiment (#389) · e577bafd
QuanluZhang authored Nov 26, 2018

e577bafd

Fix trialjobstate (#385) · c4d1aefe

QuanluZhang authored Nov 26, 2018

* add one more trial job status, EARLY_STOPPED

* fix datastore/nnimanager/mockeddatastore. test/webui/metrics_reader not done. USER_TO_CANCEL

* fix bug

* modifications based on Deshui's comments

* fix bug

* fix bug in remote mode

c4d1aefe

23 Nov, 2018 4 commits

Add nniManagerIp in nnictl and trainingService (#393) · c2a4ce6c

SparkSnail authored Nov 23, 2018

Add nniManagerIp to nnictl, the PAI TrainingService and the Kubeflow TrainingService.
If users set nniManagerIp, PAI and Kubeflow use this IP instead of calling the getIPV4() function.
The Web UI also uses this nniManagerIp.
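
The fallback described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: `resolve_nni_manager_ip` and `detect_local_ipv4` are hypothetical names, and the detection helper merely stands in for whatever getIPV4() does.

```python
import socket
from typing import Optional


def detect_local_ipv4() -> str:
    """Best-effort guess of this host's IPv4 address (hypothetical stand-in for getIPV4())."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # Connecting a UDP socket sends no packets; it only picks a local interface.
            sock.connect(("10.255.255.255", 1))
            return sock.getsockname()[0]
    except OSError:
        # No usable route: fall back to loopback.
        return "127.0.0.1"


def resolve_nni_manager_ip(configured_ip: Optional[str]) -> str:
    """Prefer the user-supplied nniManagerIp; otherwise auto-detect."""
    if configured_ip:
        return configured_ip
    return detect_local_ipv4()
```

An explicit setting matters on multi-homed machines, where auto-detection may pick an interface that remote trial jobs cannot reach.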

c2a4ce6c

Move the call of experimentDoneCleanUp into stopExperiment() method (#390) · cb7c7ff0

fishyds authored Nov 23, 2018

* Adjust sleep position for sdk_test.py

* Exit dispatcher process if a Terminate command is received

* Add comment for sleep change in sdk_test.py

cb7c7ff0

NNICTL set classArgs as optional (#374) · 851955e6

SparkSnail authored Nov 23, 2018

In nnictl, classArgs is no longer required; it is now optional because some tuners and assessors do not need classArgs.
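
A minimal sketch of what treating classArgs as optional might look like when normalizing a tuner or assessor section. The function name `normalize_component` is hypothetical, and every key other than `classArgs` is illustrative; this is not nnictl's actual validation code.

```python
from typing import Any, Dict


def normalize_component(section: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a tuner/assessor section with classArgs defaulted to {}."""
    normalized = dict(section)
    class_args = normalized.get("classArgs")
    if class_args is None:
        # classArgs omitted: treat it as "no arguments" instead of rejecting the config.
        normalized["classArgs"] = {}
    elif not isinstance(class_args, dict):
        raise TypeError("classArgs must be a mapping when provided")
    return normalized
```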

851955e6

[Kubeflow Training Service] Explicitly set cuda_visible_devices env var (#388) · 28e26ae9
fishyds authored Nov 23, 2018
```
* Use different output folder for ps and worker

* Add cuda_visible_devices env var if gpuNum is 0
```
28e26ae9

22 Nov, 2018 5 commits

add gpuNum check for local TS (#378) · 1df750e2

Yan Ni authored Nov 22, 2018

* add gpuNum check for local TS

* set CUDA_VISIBLE_DEVICES to empty string when gpuNum is 0

* remove redundant code
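
The behaviour named in these bullets can be illustrated with a short sketch. It assumes a hypothetical helper `build_trial_env` and a caller-supplied list of free GPU ids; the actual check in the local training service may differ.

```python
import os
from typing import Dict, List, Mapping, Optional


def build_trial_env(
    gpu_num: int,
    free_gpu_ids: List[int],
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build the environment for a trial process (hypothetical helper)."""
    if gpu_num < 0:
        raise ValueError("gpuNum must be non-negative")
    if gpu_num > len(free_gpu_ids):
        raise ValueError("gpuNum exceeds the number of available GPUs")
    env = dict(os.environ if base_env is None else base_env)
    # With gpu_num == 0 the join yields "", which hides every GPU from CUDA.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in free_gpu_ids[:gpu_num])
    return env
```

Setting the variable to an empty string, rather than leaving it unset, keeps a CPU-only trial from silently grabbing a GPU.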

1df750e2

[Kubeflow training service] Update kubeflow exp job config schema to support... · e341df81

fishyds authored Nov 22, 2018

[Kubeflow training service] Update kubeflow exp job config schema to support distributed training (#387)

* Support distributed training on tf-operator, for worker and ps

* Update validation rule for kubeflow config

* small code refactor adjustment for private methods

* Use different output folder for ps and worker

e341df81

Asynchronous dispatcher (#372) · a5d614de
chicm-ms authored Nov 22, 2018
```
* Asynchronous dispatcher

* updates

* updates

* updates

* updates
```
a5d614de
Show intermediate result (#384) · 8d63b108
Lijiao authored Nov 22, 2018

8d63b108

Update ci with new built-in tuner and assessor (#359) · 7035f3e7

Zejun Lin authored Nov 22, 2018

* fix sdk's unittest and add medianstop, batchtuner to ci

* fix sdk's unittest and add medianstop, batchtuner to ci

* remove debug info

* update azure-pipelines

* remove useless code

* add some checks

* fix pylint

* update ci test

* update ci

7035f3e7

20 Nov, 2018 2 commits

Add Gitter badge (#376) · 76277dbd
The Gitter Badger authored Nov 20, 2018

76277dbd

[Kubeflow Training Service] V1, merge from kubeflow branch to master branch (#382) · 806afeb6

fishyds authored Nov 20, 2018

* Kubeflow TrainingService support, v1 (#373)

1. Create a new Training Service: the kubeflow training service, which uses 'kubectl' and the kubeflow tfjobs CRD to submit and manage jobs
2. Update the nni python SDK to support the new kubeflow platform
3. Update the nni python SDK's get_sequence_id() implementation to read the NNI_TRIAL_SEQ_ID env variable instead of the .nni/sequence_id file
4. This version only supports the Tensorflow operator. Support for more operators will be added in future versions
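
Item 3 above amounts to something like the following sketch. Only the variable name NNI_TRIAL_SEQ_ID comes from the commit message; the `default` fallback is an assumption made for illustration, and the real SDK may handle a missing variable differently.

```python
import os


def get_sequence_id(default: int = 0) -> int:
    """Read the trial sequence id from the environment instead of a file."""
    raw = os.environ.get("NNI_TRIAL_SEQ_ID")
    if raw is None or raw.strip() == "":
        # Assumption: fall back to a default when the variable is absent.
        return default
    return int(raw)
```

Reading from the environment avoids depending on a shared file path, which is convenient when trials run inside containers.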

806afeb6

19 Nov, 2018 2 commits
- Update README.md (#371) · b749266d
  fishyds authored Nov 19, 2018
  
  b749266d
- Add more tooltips in default metric graph (#370) · 9cc234b2
  Lijiao authored Nov 19, 2018
```
* Add more tooltip in default metric graph and fix bug

* update
```
  9cc234b2
16 Nov, 2018 2 commits

Fix nni stop (#368) · 8f716170
SparkSnail authored Nov 16, 2018
```
Fix "nnictl stop"
```
8f716170

add gridsearch tuner (#364) · d6c07948

Zejun Lin authored Nov 16, 2018

* add gridsearch tuner

* add gridsearchtuner

* add gridsearchtuner

* add gridsearchtuner

* update gridsearch tuner

* update gridsearch tuner

* update gridsearch tuner

* update gridsearch tuner

* update gridsearch tuner

* update gridsearch tuner

* update gridsearch tuner

* update gridsearch and pylint

d6c07948

15 Nov, 2018 1 commit
- Support hyper-band (#358) · f253576f
  Lijiao authored Nov 15, 2018
  
  f253576f
14 Nov, 2018 4 commits
- update tutorial for remote machine as well (#367) · 1e9cd5fe
  gongwuji authored Nov 14, 2018
  
  1e9cd5fe
- add more details for remote mode docs (#365) · e45db625
  gongwuji authored Nov 14, 2018
  
  e45db625
- add more details for remote mode docs (#366) · 9f62bf6a
  gongwuji authored Nov 14, 2018
  
  9f62bf6a
- Fix the issue#211: WebUI does not support search for a specific Trial (#355) · 5b24f046
  Lijiao authored Nov 14, 2018
```
* Fix the issue#211: WebUI does not support search for a specific Trial

* delete unused code

* Update

* default 20
```
  5b24f046
13 Nov, 2018 5 commits

Updated document for "write a trial" related fixes. (#351) · 9380e68c

Scarlett Li authored Nov 13, 2018

- Updated document for "write a trial" related fixes per Quanlu's feedback;
- Fix wrong links in Get started per Meng's feedback.

9380e68c

Quick fix Docker (#363) · 183763ef
SparkSnail authored Nov 13, 2018
```
Remove "RUN python3 -m pip --no-cache-dir install torch torchvision"
```
183763ef

Add Pytorch and set sklearn version in Dockerfile (#346) · e3901253

SparkSnail authored Nov 13, 2018

1. Set scikit-learn==0.20.0 in Dockerfile
2. Update readme.md of Dockerfile
3. Add PyTorch 0.4.1
4. Add description for 'nnictl stop all'

e3901253

add PyTorch to Dockerfile (#362) · 7508c87d

gongwuji authored Nov 13, 2018

* update local demo doc and configuration

* change folder name

* Update tutorial_1_CR_exp_local_api.md

no need to have a new training file

* Delete mnist_gpu.py

no need to have a new training file

* Update config_gpu.yml

no need to have a new training file

* add PyTorch to Dockerfile

7508c87d

update local demo doc and configuration (#344) · 95a8f93e

gongwuji authored Nov 13, 2018

* update local demo doc and configuration

* change folder name

* Update tutorial_1_CR_exp_local_api.md

no need to have a new training file

* Delete mnist_gpu.py

no need to have a new training file

* Update config_gpu.yml

no need to have a new training file

95a8f93e

12 Nov, 2018 4 commits

update makefile (#350) · b345da07

QuanluZhang authored Nov 12, 2018

* update makefile

* update launcher.py to fix the problem of finding main.js

* remove duplicated lib

b345da07

[PAI training service] Support running multiple PAI experiment (#348) · b1d4c129

fishyds authored Nov 12, 2018

* Change base image from devel to runtime, to reduce docker image size

* Support running multiple experiment for PAI

* Fix a bug caused by a recursive reference between paiRestServer and paiTrainingService

b1d4c129

update doc for docker image (#353) · 35e0832b
QuanluZhang authored Nov 12, 2018
```
* update doc for docker image

* update
```
35e0832b
Update nnictl.py (#347) · 48b91c45
noklam authored Nov 12, 2018
```
* Update nnictl.py

* modify help message for nnictl stop
```
48b91c45