Commits · c265903e23196dfd32322f1f86b02d9315eb6dd4 · OpenDAS / nni

07 Dec, 2018 1 commit
- Support kubeflow pytorch-operator (#406) · c265903e
  SparkSnail authored Dec 07, 2018
```
1.Support pytorch-operator
2.remove unsupported operator
```
05 Dec, 2018 1 commit
- [V0.4 Release] Kubeflow training service: Remove unused kubernetesServer config entry (#444) · 311d3da6
  fishyds authored Dec 04, 2018
```
* Remove unused kubernetesServer config entry in config file and schema validation
```
30 Nov, 2018 1 commit
- [Kubeflow training service] fix bug that wrongly split kube delete cmd into 2 lines (#425) · 5426cfe8
  fishyds authored Nov 30, 2018
```
* [Kubeflow training service] fix bug that wrongly split kube delete cmd into 2 lines

* Adjust white space
```
29 Nov, 2018 1 commit
- Add codeDir file count validation for setClusterConfig (#409) · cf3d434f
  fishyds authored Nov 29, 2018
```
* Add codeDir file count validation for setClusterConfig

* fix a small bug if find command is not installed

* Remove codeDir validation for local training service

* Remove useless import
```
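A codeDir file-count check of this kind can be sketched as follows. This is a minimal Python illustration under stated assumptions, not NNI's implementation: the names `count_files`, `validate_code_dir`, and `MAX_CODE_DIR_FILES` are hypothetical, the limit value is made up, and the tree is walked with `os.walk` instead of the `find` command, which also avoids depending on `find` being installed.

```python
import os

# Hypothetical limit for illustration; the real threshold is not given in this commit.
MAX_CODE_DIR_FILES = 1000


def count_files(code_dir: str) -> int:
    """Count non-directory entries under code_dir, recursively."""
    total = 0
    for _root, _dirs, files in os.walk(code_dir):
        total += len(files)
    return total


def validate_code_dir(code_dir: str, limit: int = MAX_CODE_DIR_FILES) -> int:
    """Raise ValueError if code_dir is missing or holds more than `limit` files.

    Returns the file count when validation passes.
    """
    if not os.path.isdir(code_dir):
        raise ValueError(f"codeDir {code_dir!r} is not a directory")
    n = count_files(code_dir)
    if n > limit:
        raise ValueError(
            f"codeDir {code_dir!r} contains {n} files, more than the limit of {limit}"
        )
    return n
```

Rejecting an oversized codeDir up front gives the user a clear error before any upload to the cluster is attempted.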

28 Nov, 2018 1 commit
- Support Azure k8s (#383) · 21a2bb0b
  SparkSnail authored Nov 28, 2018
```
Support AKS in kubeflow training service
Support nnictl set nniManagerIp
```

25 Nov, 2018 1 commit
- Fix trialjobstate (#385) · c4d1aefe
  QuanluZhang authored Nov 26, 2018
```
* add one more trial job status, EARLY_STOPPED

* fix datastore/nnimanager/mockeddatastore. test/webui/metrics_reader not done. USER_TO_CANCEL

* fix bug

* modifications based on Deshui's comments

* fix bug

* fix bug in remote mode
```
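A status set that includes the new `EARLY_STOPPED` state might look like the Python sketch below. Only `EARLY_STOPPED` is named in this commit; the other member names, the `TERMINAL_STATUSES` grouping, and the `is_terminal` helper are assumptions for illustration. NNI's manager defines its statuses in TypeScript, and the actual list may differ.

```python
from enum import Enum


class TrialJobStatus(Enum):
    # Assumed member names; only EARLY_STOPPED comes from the commit message.
    WAITING = "WAITING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    USER_CANCELED = "USER_CANCELED"
    SYS_CANCELED = "SYS_CANCELED"
    EARLY_STOPPED = "EARLY_STOPPED"  # the status this commit adds


# Statuses after which a trial is finished and will not run again (assumption).
TERMINAL_STATUSES = frozenset({
    TrialJobStatus.SUCCEEDED,
    TrialJobStatus.FAILED,
    TrialJobStatus.USER_CANCELED,
    TrialJobStatus.SYS_CANCELED,
    TrialJobStatus.EARLY_STOPPED,
})


def is_terminal(status: TrialJobStatus) -> bool:
    """Return True if the trial has finished, whether or not it succeeded."""
    return status in TERMINAL_STATUSES
```

Keeping early stopping as its own status, rather than folding it into a cancel state, lets the datastore and Web UI tell a tuner-initiated stop apart from a user cancellation.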

23 Nov, 2018 3 commits
- Add nniManagerIp in nnictl and trainingService (#393) · c2a4ce6c
  SparkSnail authored Nov 23, 2018
```
Add nniManagerIp in nnictl, pai TrainingService and kubeflow TrainingService.
If users set nniManagerIp, pai and kubeflow will use this IP instead of calling the getIPV4() function.
Web UI will also use this nniManagerIp.
```
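The fallback described in this commit (use the configured `nniManagerIp` if present, otherwise detect a local IPv4 address) can be sketched in Python. NNI's manager implements it in TypeScript with its own `getIPV4()` helper; the function names below are hypothetical, and the UDP-socket technique for finding the outbound IPv4 address is a common substitute, not necessarily what `getIPV4()` does.

```python
import socket
from typing import Optional


def detect_ipv4() -> str:
    """Best-effort guess at this host's outbound IPv4 address.

    connect() on a UDP socket sends no packets; it only asks the OS which
    local address it would route through. Falls back to loopback on error.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.connect(("8.8.8.8", 80))
            return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"


def resolve_nni_manager_ip(configured: Optional[str]) -> str:
    """Prefer the user-configured nniManagerIp; otherwise detect one."""
    if configured:
        return configured
    return detect_ipv4()
```

An explicit setting matters on clusters such as PAI or Kubeflow, where trials must call back to the manager and an auto-detected address may not be reachable from inside the cluster.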

- Move the call of experimentDoneCleanUp into stopExperiment() method (#390) · cb7c7ff0
  fishyds authored Nov 23, 2018
```
* Adjust sleep position for sdk_test.py

* Exit dispatcher process on receiving Terminate command

* Add comment for sleep change in sdk_test.py
```
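Exiting the dispatcher on a Terminate command follows a common command-loop pattern, sketched below. Everything here is hypothetical: the `run_dispatcher` function, the `(command_type, payload)` tuple shape, and the command names are illustrative, and NNI's real dispatcher reads commands from a pipe rather than from a list.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical command shape: (command_type, payload).
Command = Tuple[str, str]


def run_dispatcher(commands: Iterable[Command],
                   handle: Callable[[Command], None]) -> List[str]:
    """Handle commands in order and stop as soon as 'Terminate' arrives.

    Returns the types of the commands that were handled.
    """
    handled = []
    for cmd_type, payload in commands:
        if cmd_type == "Terminate":
            break  # leave the loop so the dispatcher process can exit cleanly
        handle((cmd_type, payload))
        handled.append(cmd_type)
    return handled
```

Commands queued after Terminate are ignored, so the process does not keep running after the experiment has stopped.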

- [Kubeflow Training Service] Explicitly set cuda_visible_devices env var (#388) · 28e26ae9
  fishyds authored Nov 23, 2018
```
* Use different output folder for ps and worker

* Add cuda_visible_devices env var if gpuNum is 0
```
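Setting `CUDA_VISIBLE_DEVICES` explicitly matters because a container that requested no GPU may still see devices, depending on how the node's container runtime is configured. A minimal sketch of building a trial's environment, assuming a hypothetical `build_trial_env` helper: an empty `CUDA_VISIBLE_DEVICES` hides all GPUs from CUDA, which is the assumed intent when `gpuNum` is 0.

```python
from typing import Dict


def build_trial_env(gpu_num: int, base_env: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of base_env, hiding all GPUs when gpu_num is 0.

    An empty CUDA_VISIBLE_DEVICES makes CUDA report no devices, so a
    CPU-only trial cannot grab a GPU by accident. When gpu_num > 0 the
    variable is left to the scheduler or container runtime (assumption).
    """
    env = dict(base_env)  # copy so the caller's mapping is not mutated
    if gpu_num == 0:
        env["CUDA_VISIBLE_DEVICES"] = ""
    return env
```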

22 Nov, 2018 1 commit
- [Kubeflow training service] Update kubeflow exp job config schema to support... · e341df81
  fishyds authored Nov 22, 2018
```
[Kubeflow training service] Update kubeflow exp job config schema to support distributed training (#387)

* Support distributed training on tf-operator, for worker and ps

* Update validation rule for kubeflow config

* small code refactor adjustment for private methods

* Use different output folder for ps and worker
```
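Distributed training on tf-operator means the submitted TFJob carries separate PS and Worker replica specs, and this commit gives each role its own output folder. The Python sketch below builds such a manifest as a dict. The `tfReplicaSpecs` layout follows the Kubeflow TFJob shape as we understand it; the `apiVersion`, the `OUTPUT_DIR` variable, the `/output/<role>` path convention, and the helper names are assumptions for illustration, not what NNI actually generates.

```python
from typing import Dict, List


def replica_spec(role: str, replicas: int, image: str, command: List[str]) -> Dict:
    """One tfReplicaSpecs entry; each role gets its own output folder."""
    # Hypothetical per-role path so PS and Worker outputs do not overwrite each other.
    output_dir = f"/output/{role.lower()}"
    return {
        "replicas": replicas,
        "restartPolicy": "Never",
        "template": {
            "spec": {
                "containers": [{
                    "name": "tensorflow",  # container name tf-operator looks for by default
                    "image": image,
                    "command": command,
                    "env": [{"name": "OUTPUT_DIR", "value": output_dir}],  # hypothetical variable
                }],
            },
        },
    }


def build_tfjob(name: str, image: str, worker_replicas: int,
                ps_replicas: int, command: List[str]) -> Dict:
    """Assemble a TFJob manifest with PS and Worker replicas."""
    return {
        "apiVersion": "kubeflow.org/v1alpha2",  # assumption: depends on the CRD version installed
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "PS": replica_spec("PS", ps_replicas, image, command),
                "Worker": replica_spec("Worker", worker_replicas, image, command),
            },
        },
    }
```

Because `kubectl` accepts JSON manifests, the dict can be written out with `json.dump` and submitted with `kubectl create -f`.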

20 Nov, 2018 1 commit
- [Kubeflow Training Service] V1, merge from kubeflow branch to master branch (#382) · 806afeb6
  fishyds authored Nov 20, 2018
```
* Kubeflow TrainingService support, v1 (#373)

1. Create new Training Service: kubeflow training service, use 'kubectl' and kubeflow tfjobs CRD to submit and manage jobs
2. Update nni python SDK to support new kubeflow platform
3. Update nni python SDK's get_sequence_id() implementation, read NNI_TRIAL_SEQ_ID env variable, instead of reading .nni/sequence_id file
4. This version only supports TensorFlow operator. Support for more operators will be added in future versions
```
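Item 3 replaces a file read with an environment-variable read. A minimal sketch of that pattern follows; the function name `read_sequence_id` and the `-1` default when the variable is unset are assumptions, not the SDK's actual behavior.

```python
import os
from typing import Mapping, Optional


def read_sequence_id(env: Optional[Mapping[str, str]] = None, default: int = -1) -> int:
    """Return the trial's sequence ID from the NNI_TRIAL_SEQ_ID variable.

    `env` defaults to os.environ; passing a mapping makes the function easy
    to test. `default` is returned when the variable is unset (hypothetical).
    """
    env = os.environ if env is None else env
    value = env.get("NNI_TRIAL_SEQ_ID")
    return int(value) if value is not None else default
```

An environment variable can be set by whichever training service launches the trial, so no `.nni/sequence_id` file has to exist in the trial's working directory.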