1. 05 Aug, 2019 1 commit
  2. 01 Aug, 2019 1 commit
  3. 30 Jul, 2019 1 commit
  4. 24 Jun, 2019 1 commit
  5. 20 Jun, 2019 1 commit
  6. 19 Jun, 2019 2 commits
  7. 23 May, 2019 1 commit
  8. 17 Apr, 2019 1 commit
  9. 11 Apr, 2019 2 commits
  10. 27 Mar, 2019 1 commit
  11. 22 Mar, 2019 1 commit
  12. 20 Mar, 2019 1 commit
  13. 15 Mar, 2019 1 commit
    • Support version check of nni (#807) · d0b22fc7
      SparkSnail authored
      Check the nni version in trialkeeper to make sure the trialkeeper version is consistent with trainingService.
      Add a debug mode to the config file.
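The consistency check described in this commit could look roughly like the sketch below. The commit message does not say how the two versions are compared, so the function names, the optional "v" prefix, and the choice to compare only the leading major.minor components are assumptions made for illustration, not the actual trialkeeper code.

```python
import re


def parse_version(text):
    """Extract the leading numeric components, e.g. 'v0.5.2' -> (0, 5, 2)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(text))
    return tuple(int(part) for part in match.group(1).split("."))


def versions_consistent(trial_keeper_version, training_service_version, depth=2):
    """Return True when the first `depth` components of both versions match.

    Comparing only major.minor is an assumption; a stricter check could
    compare the full tuples instead.
    """
    keeper = parse_version(trial_keeper_version)[:depth]
    service = parse_version(training_service_version)[:depth]
    return keeper == service
```

A caller would typically log an error or abort the trial when versions_consistent(...) returns False.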
  14. 25 Feb, 2019 2 commits
    • Support webhdfs path in python hdfs client (#722) · 8c4c0ef2
      SparkSnail authored
      trial_keeper uses port 50070 to connect to the webhdfs server, and PAI uses a mapping method to map port 50070 to port 5070 to reach the RESTful server. This approach is risky because PAI may not support this kind of mapping in later releases. The webhdfs client of trial_keeper now uses the Pylon path (/webhdfs/api/v1) instead of port 50070; the path is passed in from trainingService.
      This PR makes the following changes:
      
      1. Use the webhdfs path instead of port 50070 in the hdfs client.
      2. Switch to a new hdfs package, "PythonWebHDFS", which I built to support Pylon. You can try the new functionality with the "sparksnail/nni:dev-pai" image when testing the pai trainingService.
      3. Rename some variables according to review comments.
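As a rough illustration of the change above, the sketch below builds a WebHDFS request URL from a path prefix rather than from a port number. The helper name, the host value, and the parameter layout are hypothetical; this is not the PythonWebHDFS package's API, only a minimal example of addressing the service through the /webhdfs/api/v1 path named in the commit message.

```python
from urllib.parse import quote, urlencode


def build_webhdfs_url(host, hdfs_path, op, webhdfs_path="/webhdfs/api/v1", user=None):
    """Build a WebHDFS URL that goes through a path prefix instead of a port.

    `webhdfs_path` defaults to the Pylon path from the commit message; the
    function name and signature are assumptions made for this sketch.
    """
    params = {"op": op}
    if user:
        params["user.name"] = user  # standard WebHDFS query parameter
    prefix = webhdfs_path.rstrip("/")
    file_part = "/" + hdfs_path.lstrip("/")
    return "http://{}{}{}?{}".format(host, prefix, quote(file_part), urlencode(params))
```

For example, build_webhdfs_url("pai-master", "/user/nni/trial.log", "OPEN") addresses the file through the path prefix and contains no port number, so it does not depend on the 50070 port mapping.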
    • Fix a race condition bug that does not store Trial Job cancel status correctly (#707) · 9a3a75c8
      fishyds authored
      * Fix a race condition bug that does not store Trial Job cancel status correctly
  15. 25 Jan, 2019 2 commits
    • [PAI bug fixing] Fix the incorrect PAI webhdfs endpoint path (#653) · 6bc12de0
      fishyds authored
      
      * Fix PAI webhdfs api endpoint
    • Refactoring nnimanager log (#652) · 6d591989
      chicm-ms authored
      * Pull code (#22)
      
      * Support distributed job for frameworkcontroller (#612)
      
      support distributed job for frameworkcontroller
      
      * Multiphase doc (#519)
      
      * multiPhase doc
      
      * updates
      
      * updates
      
      * Add time parser for 'nnictl update duration' (#632)
      
      The current nnictl update duration command supports only seconds; add a parser so that it accepts the units {s, m, h, d}.
      
      * fix experiment state bug (#629)
      
      * update top README.md (#622)
      
      * Update README.md
      
      * update (#634)
      
      * Integration tests refactoring (#625)
      
      * Integration test refactoring (#21) (#616)
      
      * Integration test refactoring (#21)
      
      * Refactoring integration tests
      
      * test metrics
      
      * update azure pipeline
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * update trigger
      
      * Integration test refactoring (#618)
      
      * updates
      
      * updates
      
      * update pipeline (#619)
      
      * update pipeline
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * test pipeline (#623)
      
      * test pipeline
      
      * updates
      
      * updates
      
      * updates
      
      * Update integration test (#624)
      
      * Update integration test
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * Revert "Pull code (#22)"
      
      This reverts commit 62fc165ad7b2ba724eead3b99f010aa34491e2c7.
      
      * Update nnimanager logs
      
      * updates
      
      * Update README.md
      
      * Revert "Update README.md"
      
      This reverts commit bc67061160e5d57305a6e7fb63d491d12d0e9002.
      
      * updates
      
      * updates
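The time parser mentioned in this commit (#632, for 'nnictl update duration') can be sketched as follows. The function name, the requirement that a unit suffix is always present, and the rejection of any other input are assumptions for illustration; the actual nnictl parser may behave differently.

```python
import re

# Seconds per unit for the suffixes listed in the commit message.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Convert a string such as '30m' or '2h' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError("expected <number><s|m|h|d>, got {!r}".format(text))
    value, unit = match.groups()
    return int(value) * UNIT_SECONDS[unit]
```

With this sketch, parse_duration("2h") yields 7200 and parse_duration("1d") yields 86400, while input without a recognized unit raises ValueError.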
  16. 04 Jan, 2019 1 commit
  17. 03 Jan, 2019 1 commit
  18. 29 Dec, 2018 1 commit
  19. 19 Dec, 2018 1 commit
  20. 17 Dec, 2018 1 commit
  21. 10 Dec, 2018 1 commit
  22. 07 Dec, 2018 1 commit
  23. 30 Nov, 2018 1 commit
  24. 29 Nov, 2018 1 commit
  25. 28 Nov, 2018 1 commit
  26. 25 Nov, 2018 1 commit
    • Fix trialjobstate (#385) · c4d1aefe
      QuanluZhang authored
      * add one more trial job status, EARLY_STOPPED
      
      * fix datastore/nnimanager/mockeddatastore. test/webui/metrics_reader not done. USER_TO_CANCEL
      
      * fix bug
      
      * modifications based on Deshui's comments
      
      * fix bug
      
      * fix bug in remote mode
  27. 23 Nov, 2018 1 commit
    • Add nniManagerIp in nnictl and trainingService (#393) · c2a4ce6c
      SparkSnail authored
      Add nniManagerIp to nnictl, the pai TrainingService, and the kubeflow TrainingService.
      If users set nniManagerIp, pai and kubeflow use this IP instead of calling the getIPV4() function.
      The Web UI also uses this nniManagerIp.
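The fallback rule described here, prefer the user-supplied nniManagerIp and otherwise auto-detect, can be sketched as below. The helper names are hypothetical, and the default detector is a simple stand-in for getIPV4(), whose real implementation is not shown in this commit message.

```python
import socket


def detect_ipv4():
    """Stand-in for getIPV4(): resolve this host's name to an IPv4 address."""
    return socket.gethostbyname(socket.gethostname())


def resolve_manager_ip(nni_manager_ip=None, detector=detect_ipv4):
    """Return the configured nniManagerIp if set, else an auto-detected address."""
    if nni_manager_ip:
        return nni_manager_ip
    return detector()
```

Passing the detector as a parameter keeps the rule easy to test without depending on the local network setup.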
  28. 20 Nov, 2018 1 commit
    • [Kubeflow Training Service] V1, merge from kubeflow branch to master branch (#382) · 806afeb6
      fishyds authored
      * Kubeflow TrainingService support, v1 (#373)
      
      1. Create a new Training Service, the kubeflow training service, which uses 'kubectl' and the kubeflow tfjobs CRD to submit and manage jobs
      2. Update the nni python SDK to support the new kubeflow platform
      3. Update the nni python SDK's get_sequence_id() implementation to read the NNI_TRIAL_SEQ_ID env variable instead of the .nni/sequence_id file
      4. This version only supports the TensorFlow operator. Support for more operators will be added in future versions
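Item 3 above, reading the sequence id from the NNI_TRIAL_SEQ_ID environment variable instead of a file, might look like the sketch below. Only the variable name comes from the commit message; the function name and the choice to raise an error when the variable is missing are assumptions, not the SDK's actual behavior.

```python
import os


def read_trial_sequence_id(environ=os.environ):
    """Return the trial sequence id from NNI_TRIAL_SEQ_ID as an integer.

    Hypothetical helper: raises RuntimeError when the variable is unset
    rather than falling back to the old .nni/sequence_id file.
    """
    raw = environ.get("NNI_TRIAL_SEQ_ID")
    if raw is None:
        raise RuntimeError("NNI_TRIAL_SEQ_ID is not set")
    return int(raw)
```

Taking the environment mapping as a parameter lets the helper be exercised with a plain dict in tests.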
  29. 12 Nov, 2018 1 commit
  30. 05 Nov, 2018 2 commits
  31. 02 Nov, 2018 2 commits
  32. 31 Oct, 2018 3 commits