1. 20 Dec, 2018 1 commit
    • [V0.4.1 Release] Merge v0.4.1 branch back to Master (#509) · ff834cea
      fishyds authored
      * Update nnictl.py
      
      Fix the issue where nnictl --version doesn't work when installed via pip
      
      * Update kubeflow training service document (#494)
      
      * Remove kubectl related document, add messages for kubeconfig
      * Add design section for kubeflow training service
      * Move the image files for PAI training service doc into img folder.
      
      * Update KubeflowMode.md (#498)
      
      Update KubeflowMode.md, small terms change
      
      * [V0.4.1 bug fix] Cannot run kubeflow training service due to trial_keeper change (#503)
      
      * Update kubeflow training service document
      
      * fix a bug where the kubeflow trial job cannot run
      
      * upgrade version number (#499)
      
      * [V0.4.1 bug fix] Support read K8S config from KUBECONFIG environment variable (#507)
      
      * Add KUBECONFIG env variable support
      
      * In main.ts, throw cached error to make sure nnictl can show the error in stderr
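The KUBECONFIG lookup mentioned above can be sketched as follows. This is a minimal illustration in Python, not the code from this commit (which is not shown in this log): it assumes the usual kubectl convention that KUBECONFIG may hold a list of paths separated by the OS path separator, with ~/.kube/config as the fallback. The function name `resolve_kubeconfig_paths` and its parameters are hypothetical.

```python
import os


def resolve_kubeconfig_paths(environ=None, home=None):
    """Return the kubeconfig file paths to try, in order.

    Hypothetical helper: `environ` and `home` are injectable only so the
    behaviour can be exercised without touching the real environment.
    """
    env = os.environ if environ is None else environ
    raw = env.get("KUBECONFIG", "")
    # KUBECONFIG may list several files; drop empty entries.
    paths = [p for p in raw.split(os.pathsep) if p]
    if paths:
        return paths
    # Assumed fallback: the default kubectl location under the home directory.
    home_dir = home if home is not None else os.path.expanduser("~")
    return [os.path.join(home_dir, ".kube", "config")]
```

Calling `resolve_kubeconfig_paths()` with no arguments reads the real environment; passing a dict makes the lookup easy to test.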
  2. 14 Dec, 2018 1 commit
  3. 13 Dec, 2018 1 commit
  4. 07 Dec, 2018 1 commit
  5. 05 Dec, 2018 1 commit
  6. 30 Nov, 2018 1 commit
  7. 29 Nov, 2018 1 commit
  8. 28 Nov, 2018 1 commit
  9. 25 Nov, 2018 1 commit
    • Fix trialjobstate (#385) · c4d1aefe
      QuanluZhang authored
      * add one more trial job status, EARLY_STOPPED
      
      * fix datastore/nnimanager/mockeddatastore. test/webui/metrics_reader not done. USER_TO_CANCEL
      
      * fix bug
      
      * modifications based on Deshui's comments
      
      * fix bug
      
      * fix bug in remote mode
  10. 23 Nov, 2018 3 commits
  11. 22 Nov, 2018 1 commit
    • [Kubeflow training service] Update kubeflow exp job config schema to support distributed training (#387) · e341df81
      fishyds authored
      
      * Support distributed training on tf-operator, for worker and ps
      
      * Update validation rule for kubeflow config
      
      * small code refactor adjustment for private methods
      
      * Use different output folder for ps and worker
  12. 20 Nov, 2018 1 commit
    • [Kubeflow Training Service] V1, merge from kubeflow branch to master branch (#382) · 806afeb6
      fishyds authored
      * Kubeflow TrainingService support, v1 (#373)
      
      1. Create new Training Service: kubeflow training service, use 'kubectl' and kubeflow tfjobs CRD to submit and manage jobs
      2. Update nni python SDK to support new kubeflow platform
      3. Update nni python SDK's get_sequence_id() implementation, read NNI_TRIAL_SEQ_ID env variable, instead of reading .nni/sequence_id file
      4. This version only supports the TensorFlow operator. Support for more operators will be added in future versions
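Item 3 above, reading the sequence ID from the NNI_TRIAL_SEQ_ID environment variable instead of a file, might look roughly like the sketch below. It is an illustration only, not the SDK's actual code: the `environ` parameter is a hypothetical addition for testability, and the sketch assumes the variable holds a decimal integer.

```python
import os


def get_sequence_id(environ=None):
    """Return the trial sequence ID read from NNI_TRIAL_SEQ_ID.

    Assumption: the variable is set by the training service and holds an
    integer. A missing variable raises KeyError; a non-numeric value
    raises ValueError.
    """
    env = os.environ if environ is None else environ
    return int(env["NNI_TRIAL_SEQ_ID"])
```

Reading from the environment avoids depending on a .nni/sequence_id file being present in the trial's working directory, which matters when jobs run in containers.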