- 02 Mar, 2020 1 commit
-
-
George Cheng authored
* skeleton of dlts training service (#1844)
* Hello, DLTS!
* Revert version
* Remove fs-extra
* Add some default cluster config
* schema
* fix
* Optional cluster (default to `.default`). Depends on DLWorkspace#837
* fix
* fix
* optimize gpu type
* No more copy
* Format
* Code clean up
* Issue fix
* Add optional fields in config
* Issue fix
* Lint
* Lint
* Validate email, password and team
* Doc
* Doc fix
* Set TMPDIR
* Use metadata instead of gpu_capacity
* Cancel paused DLTS job
* workaround lint rules
* pylint
* doc

Co-authored-by: QuanluZhang <z.quanluzhang@gmail.com>
-
- 23 Dec, 2019 1 commit
-
-
SparkSnail authored
-
- 11 Dec, 2019 1 commit
-
-
chicm-ms authored
* enable eslint
* remove tslint
-
- 10 Dec, 2019 1 commit
-
-
chicm-ms authored
* update eslint rules
* auto fix eslint
* manually fix eslint (#1833)
-
- 25 Nov, 2019 1 commit
-
-
liuzhe-lz authored
-
- 24 Jun, 2019 1 commit
-
-
chicm-ms authored
* Refactor multiphase interface
* Implement multiphase on PAI
* update multiphase doc
-
- 20 Jun, 2019 1 commit
-
-
demianzhang authored
* fix local and remote training services tslint
-
- 19 Jun, 2019 1 commit
-
-
Hongarc authored
-
- 20 Nov, 2018 1 commit
-
-
fishyds authored
* Kubeflow TrainingService support, v1 (#373)
  1. Create new Training Service: kubeflow training service, which uses 'kubectl' and the kubeflow tfjobs CRD to submit and manage jobs
  2. Update nni python SDK to support the new kubeflow platform
  3. Update nni python SDK's get_sequence_id() implementation to read the NNI_TRIAL_SEQ_ID env variable instead of reading the .nni/sequence_id file
  4. This version only supports the TensorFlow operator. Support for more operators will be added in future versions
-
- 12 Nov, 2018 1 commit
-
-
fishyds authored
* Change base image from devel to runtime, to reduce docker image size
* Support running multiple experiments for PAI
* Fix a bug caused by a recursive reference between paiRestServer and paiTrainingService
-
- 27 Sep, 2018 1 commit
-
-
fishyds authored
* PAI Training service implementation
  1. Implement PAITrainingService
  2. Add trial-keeper python module, and modify setup.py to install the module
  3. Add PAITrainingService rest server to collect metrics from PAI container
-
- 14 Sep, 2018 1 commit
-
-
fishyds authored
* Merge latest code changes into Github Master
* temporary modification for travis
* temporary modification for travis
-
- 24 Aug, 2018 1 commit
-
-
Deshui Yu authored
-
- 20 Aug, 2018 1 commit
-
-
Deshui Yu authored
-