1. 23 Nov, 2018 2 commits
  2. 22 Nov, 2018 1 commit
      [Kubeflow training service] Update kubeflow exp job config schema to support distributed training (#387) · e341df81
      fishyds authored
      
      * Support distributed training on tf-operator for worker and ps (parameter server) replicas
      
      * Update validation rules for kubeflow config
      
      * Small refactor of private methods
      
      * Use different output folder for ps and worker
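The distributed-training change (#387) targets tf-operator's TFJob resource. In a TFJob, parameter-server (`PS`) and `Worker` replicas are declared as separate groups. The sketch below builds a manifest in which ps and worker write to different output folders. It assumes the `kubeflow.org/v1alpha2` TFJob schema, with replica groups under `spec.tfReplicaSpecs`. The `build_tfjob` helper and the output-folder layout are hypothetical and are not NNI's actual code.

```python
import json


def build_tfjob(name, image, command, output_root, ps_replicas, worker_replicas):
    """Return a TFJob manifest dict with separate PS and Worker replica groups.

    Hypothetical helper; assumes the kubeflow.org/v1alpha2 TFJob schema.
    """

    def replica_spec(role, replicas):
        # Hypothetical layout: each role gets its own output folder,
        # e.g. <output_root>/ps and <output_root>/worker, so their logs never collide.
        output_dir = f"{output_root}/{role.lower()}"
        shell_cmd = f"mkdir -p {output_dir} && {command} > {output_dir}/stdout 2>&1"
        return {
            "replicas": replicas,
            "template": {
                "spec": {
                    "containers": [
                        {
                            # tf-operator looks for the TensorFlow container by the name "tensorflow".
                            "name": "tensorflow",
                            "image": image,
                            "command": ["sh", "-c", shell_cmd],
                        }
                    ]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1alpha2",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "PS": replica_spec("PS", ps_replicas),
                "Worker": replica_spec("Worker", worker_replicas),
            }
        },
    }


if __name__ == "__main__":
    manifest = build_tfjob(
        "nni-trial-example", "tensorflow/tensorflow:1.10.0",
        "python3 mnist.py", "/tmp/nni/trial-example", ps_replicas=1, worker_replicas=2,
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and submitted with `kubectl create -f <file>`. Whether NNI used this exact schema version and folder naming is an assumption.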
  3. 20 Nov, 2018 1 commit
      [Kubeflow Training Service] V1, merge from kubeflow branch to master branch (#382) · 806afeb6
      fishyds authored
      * Kubeflow TrainingService support, v1 (#373)
      
      1. Create a new training service, the Kubeflow training service, which uses 'kubectl' and the Kubeflow tfjobs CRD to submit and manage jobs
      2. Update the NNI Python SDK to support the new Kubeflow platform
      3. Update the implementation of get_sequence_id() in the NNI Python SDK so that it reads the NNI_TRIAL_SEQ_ID environment variable instead of the .nni/sequence_id file
      4. This version supports only the TensorFlow operator; support for more operators will be added in future versions
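Item 3 changes where `get_sequence_id()` gets its value. The sketch below shows the described behavior but is not NNI's source. It reads `NNI_TRIAL_SEQ_ID` from the environment and converts it to an integer. The `-1` fallback for an unset variable is an assumption made for illustration.

```python
import os


def get_sequence_id():
    """Return this trial's sequence ID, read from the NNI_TRIAL_SEQ_ID env variable.

    Sketch only; the -1 fallback for a missing variable is an assumption.
    """
    value = os.environ.get("NNI_TRIAL_SEQ_ID")
    return int(value) if value is not None else -1


if __name__ == "__main__":
    os.environ["NNI_TRIAL_SEQ_ID"] = "7"
    print(get_sequence_id())  # prints 7
```

An environment variable can be set directly in a Kubernetes container spec. A trial running in a pod therefore does not need a local `.nni/sequence_id` file to be present on its filesystem.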