Commits · 95d194781105297ba87e3e643cbc08cc4765a96d · OpenDAS / nni

08 Jan, 2019 1 commit

Fix a race condition issue in trial_keeper for reading log from pipe (#578) · 95d19478

fishyds authored Jan 08, 2019

* Fix a race condition issue in trial_keeper for reading log from pipe

95d19478

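The commit title names the problem but not the fix. As a hedged illustration only (not the actual trial_keeper code), one common way to avoid losing the tail of a child process's log output is to drain the pipe on a reader thread and join that thread after the process exits. `run_and_capture` and `_drain` are hypothetical names.

```python
import subprocess
import sys
import threading


def _drain(pipe, sink):
    # Read line by line until EOF, which arrives once the child closes stdout.
    for line in iter(pipe.readline, ''):
        sink.append(line)
    pipe.close()


def run_and_capture(cmd):
    """Hypothetical helper: run cmd and return (exit_code, stdout_lines)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines = []
    reader = threading.Thread(target=_drain, args=(proc.stdout, lines))
    reader.start()
    returncode = proc.wait()
    # Join the reader so lines still buffered in the pipe are not dropped
    # when the process exits before the reader has caught up.
    reader.join()
    return returncode, lines


if __name__ == '__main__':
    code, out = run_and_capture([sys.executable, '-c', 'print("a"); print("b")'])
    print(code, out)
```

Reading on a separate thread also keeps the pipe from filling up and blocking the child while the parent waits.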
02 Jan, 2019 1 commit

[Logging architecture refactor] Remove unused metrics related code in nni... · 37354dff

fishyds authored Jan 02, 2019

[Logging architecture refactor] Remove unused metrics related code in nni trial_tools, support kubeflow mode for logging architecture refactor (#551)

* Remove unused metrics related code in nni trial_tools, support kubeflow mode for logging architecture refactor

37354dff

29 Dec, 2018 1 commit

NNI logging architecture improvement (#539) · cb83ac0f

fishyds authored Dec 29, 2018

* Remove unused log code; rename some classes in the nni SDK and trial_tools

* Fix the regression bug that local/remote mode doesn't work

cb83ac0f

20 Dec, 2018 1 commit

[V0.4.1 Release] Merge v0.4.1 branch back to Master (#509) · ff834cea

fishyds authored Dec 20, 2018

* Update nnictl.py

Fix the issue that nnictl --version does not work when NNI is installed via pip

* Update kubeflow training service document (#494)

* Remove kubectl-related documentation; add messages about kubeconfig
* Add design section for kubeflow training service
* Move the image files for PAI training service doc into img folder.

* Update KubeflowMode.md (#498)

Update KubeflowMode.md: minor terminology changes

* [V0.4.1 bug fix] Cannot run kubeflow training service due to trial_keeper change (#503)

* Update kubeflow training service document

* Fix a bug that prevented kubeflow trial jobs from running

* upgrade version number (#499)

* [V0.4.1 bug fix] Support read K8S config from KUBECONFIG environment variable (#507)

* Add KUBECONFIG environment variable support

* In main.ts, rethrow the caught error so that nnictl can show it on stderr

ff834cea
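#507 reads the Kubernetes config location from the KUBECONFIG environment variable. A minimal Python sketch of that lookup order, assuming the common kubectl convention of falling back to ~/.kube/config; the real change lives in the TypeScript training service, and `resolve_kubeconfig` is a hypothetical name.

```python
import os


def resolve_kubeconfig(environ=None):
    """Hypothetical helper: return the kubeconfig path to load."""
    env = os.environ if environ is None else environ
    value = env.get('KUBECONFIG', '')
    if value:
        # KUBECONFIG may hold several paths joined by os.pathsep; kubectl merges
        # them, but this sketch simply uses the first non-empty entry.
        first = next((p for p in value.split(os.pathsep) if p), None)
        if first:
            return first
    # Conventional default location used by kubectl.
    return os.path.join(os.path.expanduser('~'), '.kube', 'config')
```

Passing the environment as a parameter keeps the lookup easy to test without mutating os.environ.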

17 Dec, 2018 1 commit

[PAITrainingService] Improve uploading codeDir efficiency (#479) · 9397b6f6

fishyds authored Dec 17, 2018

* [PAI training service] codeDir files upload improvement

* Create full local temp folder

* Organize the folder structure for experiment and trial files

9397b6f6

29 Nov, 2018 1 commit

Trial keeper refactor (#411) · 2b126039

fishyds authored Nov 29, 2018

* [Trial keeper refactor] refactor trial keeper stdout output

2b126039

20 Nov, 2018 1 commit

[Kubeflow Training Service] V1, merge from kubeflow branch to master branch (#382) · 806afeb6

fishyds authored Nov 20, 2018

* Kubeflow TrainingService support, v1 (#373)

1. Create a new training service, the kubeflow training service, which uses 'kubectl' and the kubeflow tfjobs CRD to submit and manage jobs
2. Update the nni Python SDK to support the new kubeflow platform
3. Update the nni Python SDK's get_sequence_id() implementation to read the NNI_TRIAL_SEQ_ID environment variable instead of the .nni/sequence_id file
4. This version supports only the TensorFlow operator; support for more operators will be added in future versions

806afeb6
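Item 3 moves the trial's sequence ID from a file to an environment variable. A hedged sketch of the environment-variable read follows; `read_trial_seq_id` is a hypothetical name, and returning -1 when the variable is unset is an assumption for illustration, not documented nni SDK behaviour.

```python
import os


def read_trial_seq_id(environ=None):
    """Hypothetical sketch: read the trial sequence ID from NNI_TRIAL_SEQ_ID.

    The -1 fallback for a missing variable is an assumption of this example.
    """
    env = os.environ if environ is None else environ
    value = env.get('NNI_TRIAL_SEQ_ID')
    return int(value) if value is not None else -1
```

Reading from the environment means a trial running in a remote container (PAI, kubeflow) needs no shared .nni directory, only the variable set by the training service.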

12 Nov, 2018 1 commit

[PAI training service] Support running multiple PAI experiment (#348) · b1d4c129

fishyds authored Nov 12, 2018

* Change base image from devel to runtime, to reduce docker image size

* Support running multiple experiments on PAI

* Fix a bug caused by a circular reference between paiRestServer and paiTrainingService

b1d4c129

05 Nov, 2018 1 commit

Unify the names of Python modules · e3872ba1

Gems Guo authored Nov 05, 2018

e3872ba1

18 Oct, 2018 1 commit

Add exception handler in trial_keeper (#235) · b29b7e55

SparkSnail authored Oct 18, 2018

* add exception handling in trial_keeper.py

b29b7e55

30 Sep, 2018 1 commit

Fix trial keeper exiting with the wrong exit code (#152) · f0d1f62f

fishyds authored Sep 30, 2018

* Fix trial keeper bug: exit with the actual exit code rather than 1

f0d1f62f

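Propagating the real exit code matters because the caller decides whether a trial succeeded from that status. A minimal sketch, assuming the trial command is passed on the command line; `run_trial` is hypothetical and this is not the trial_keeper source.

```python
import subprocess
import sys


def run_trial(cmd):
    """Hypothetical wrapper: run the trial command and return its exit code."""
    completed = subprocess.run(cmd)
    # On POSIX, a child killed by a signal reports a negative returncode.
    return completed.returncode


if __name__ == '__main__' and len(sys.argv) > 1:
    # Propagate the child's real status instead of a hard-coded 1.
    sys.exit(run_trial(sys.argv[1:]))
```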
29 Sep, 2018 2 commits

Merge branch V0.2 to Master (#143) · 2a28a578

fishyds authored Sep 29, 2018

* webui logpath and document (#135)

* Add webui document and show logpath as an href link

* fix tslint

* fix comments by Chengmin

* Pai training service bug fix and enhancement (#136)

* Add NNI installation scripts

* Update pai script, update NNI_out_dir

* Update NNI dir in nni sdk local.py

* Create .nni folder in nni sdk local.py

* Add check before creating .nni folder

* Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT

* Improve annotation (#138)

* Improve annotation

* Minor bugfix

* Selectively install through pip (#139)

Selectively install through pip

* update setup.py

* fix paiTrainingService bugs (#137)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* Make outputDir and dataDir optional

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* Add documentation for NNI PAI mode experiment (#141)

* Add documentation for NNI PAI mode

* Fix typo based on PR comments

* Exit with subprocess return code of trial keeper

* Remove additional exit code

* Fix typo based on PR comments

* update doc for smac tuner (#140)

* Revert "Selectively install through pip (#139)" due to potential pip install issue (#142)

* Revert "Selectively install through pip (#139)"

This reverts commit 1d174836.

* Add exit code of subprocess for trial_keeper

* Update README, add link to PAImode doc

2a28a578
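The "Set timeout for HDFSUtility exists function" change does not say how the timeout is enforced. One common pattern, shown here in Python as an assumption (the actual HDFS utility was TypeScript), is to run the check on a worker thread and raise if it misses a deadline. `call_with_timeout` and `slow_exists` are hypothetical names.

```python
import concurrent.futures
import time


def call_with_timeout(fn, timeout_s, *args):
    """Hypothetical helper: return fn(*args), or raise TimeoutError after timeout_s."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Normalise to the builtin exception (the two are distinct before Python 3.11).
        raise TimeoutError(f'call did not finish within {timeout_s}s') from None
    finally:
        # Return immediately; a hung call keeps running in its worker thread.
        pool.shutdown(wait=False)


def slow_exists(path, delay_s=0.5):
    """Stand-in for a remote existence check that may be slow (hypothetical)."""
    time.sleep(delay_s)
    return True
```

The timeout bounds how long the caller waits, not the underlying request; the abandoned call still finishes in the background.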

Revert "Selectively install through pip (#139)" due to potential pip install issue (#142) · 9d88f1b6

fishyds authored Sep 29, 2018

* Revert "Selectively install through pip (#139)"

This reverts commit 1d174836.

* Add exit code of subprocess for trial_keeper

* Update README, add link to PAImode doc

9d88f1b6

27 Sep, 2018 1 commit

PAI Training Service implementation (#128) · d3506e34

fishyds authored Sep 27, 2018

* PAI Training service implementation
1. Implement PAITrainingService
2. Add a trial-keeper Python module, and modify setup.py to install the module
3. Add a PAITrainingService REST server to collect metrics from the PAI container

d3506e34