1. 28 Nov, 2018 1 commit
  2. 25 Nov, 2018 1 commit
    • Fix trialjobstate (#385) · c4d1aefe
      QuanluZhang authored
      * add one more trial job status, EARLY_STOPPED
      
      * fix datastore/nnimanager/mockeddatastore; test/webui/metrics_reader not done yet; USER_TO_CANCEL
      
      * fix bug
      
      * modifications based on Deshui's comments
      
      * fix bug
      
      * fix bug in remote mode
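A trial job status set that includes the new EARLY_STOPPED value might be modeled as below. This is a minimal Python sketch, not NNI's actual code. Apart from EARLY_STOPPED, the status names, the `FINAL_STATUSES` set and the `is_final` helper are assumptions made for illustration.

```python
from enum import Enum


class TrialJobStatus(Enum):
    """Hypothetical trial job states; only EARLY_STOPPED comes from this commit."""
    WAITING = "WAITING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    USER_CANCELED = "USER_CANCELED"
    SYS_CANCELED = "SYS_CANCELED"
    EARLY_STOPPED = "EARLY_STOPPED"  # status added by this commit


# Assumption: every state except WAITING and RUNNING ends the trial.
FINAL_STATUSES = frozenset(TrialJobStatus) - {TrialJobStatus.WAITING, TrialJobStatus.RUNNING}


def is_final(status: TrialJobStatus) -> bool:
    """Return True if a trial in this status will not run again."""
    return status in FINAL_STATUSES
```

A dedicated value lets callers tell a deliberately stopped trial apart from a failed one while still treating both as finished.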
  3. 23 Nov, 2018 3 commits
  4. 22 Nov, 2018 1 commit
    • [Kubeflow training service] Update kubeflow exp job config schema to support distributed training (#387) · e341df81
      fishyds authored
      
      * Support distributed training on tf-operator, for worker and ps
      
      * Update validation rule for kubeflow config
      
      * Small code refactoring of private methods
      
      * Use different output folder for ps and worker
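One way a trial could keep parameter-server (ps) and worker outputs apart is to derive its output folder from the replica's role. The sketch below assumes the role and index are read from the `TF_CONFIG` environment variable that tf-operator sets in each replica. The `output_dir_for_replica` helper, the `<base>/<role>_<index>` layout and the host names in the example are hypothetical, not the layout NNI actually uses.

```python
import json
import os
from pathlib import Path
from typing import Optional


def output_dir_for_replica(base_dir: str, tf_config_json: Optional[str] = None) -> Path:
    """Return a per-role output folder such as <base_dir>/worker_0 (hypothetical layout)."""
    if tf_config_json is None:
        # tf-operator injects TF_CONFIG into each replica; default to a single worker if absent.
        tf_config_json = os.environ.get("TF_CONFIG", '{"task": {"type": "worker", "index": 0}}')
    task = json.loads(tf_config_json).get("task", {})
    role = task.get("type", "worker")  # e.g. "ps" or "worker"
    index = int(task.get("index", 0))  # replica index within its role
    return Path(base_dir) / f"{role}_{index}"


if __name__ == "__main__":
    # Illustrative cluster spec: one ps replica and two worker replicas.
    cfg = json.dumps({
        "cluster": {"ps": ["ps-0:2222"], "worker": ["worker-0:2222", "worker-1:2222"]},
        "task": {"type": "ps", "index": 0},
    })
    print(output_dir_for_replica("/tmp/nni-output", cfg))
```

Including the index as well as the role keeps two workers from writing to the same folder.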
  5. 20 Nov, 2018 1 commit
    • [Kubeflow Training Service] V1, merge from kubeflow branch to master branch (#382) · 806afeb6
      fishyds authored
      * Kubeflow TrainingService support, v1 (#373)
      
      1. Create a new training service, the Kubeflow training service, which uses 'kubectl' and the Kubeflow tfjobs CRD to submit and manage jobs
      2. Update the nni Python SDK to support the new Kubeflow platform
      3. Update the nni Python SDK's get_sequence_id() implementation to read the NNI_TRIAL_SEQ_ID environment variable instead of the .nni/sequence_id file
      4. This version only supports the TensorFlow operator; support for more operators will be added in future versions
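Item 3 can be pictured with the sketch below, where the sequence ID comes from an environment variable rather than a file. It is a standalone Python illustration, not the SDK's own code; returning -1 when `NNI_TRIAL_SEQ_ID` is unset is an assumption.

```python
import os


def get_sequence_id() -> int:
    """Read the trial's sequence ID from NNI_TRIAL_SEQ_ID (sketch; the -1 fallback is assumed)."""
    value = os.environ.get("NNI_TRIAL_SEQ_ID")
    if value is None:
        return -1  # assumption: no sequence ID outside a managed trial
    return int(value)


if __name__ == "__main__":
    os.environ["NNI_TRIAL_SEQ_ID"] = "7"  # simulate the value a training service would set
    print(get_sequence_id())
```

One plausible benefit of the environment variable is that the trial no longer needs a .nni/sequence_id file in its working directory, which is convenient when it runs inside a container.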