1. 15 Mar, 2019 1 commit
    • Support version check of nni (#807) · d0b22fc7
      SparkSnail authored
      Check the nni version in trialkeeper to make sure the trialkeeper version is consistent with the trainingService version.
      Add a debug mode to the config file.
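
A minimal sketch of the kind of consistency check this commit describes, assuming both components expose a plain version string. The function name `versions_match` and the normalization rule (ignoring surrounding whitespace and a leading "v") are hypothetical and are not taken from the NNI code base.

```python
def versions_match(trial_keeper_version: str, training_service_version: str) -> bool:
    """Return True when both components report the same version.

    Assumption: versions are compared as strings after trimming
    whitespace and an optional leading "v" (so "v0.5.2" equals "0.5.2").
    """
    def normalize(version: str) -> str:
        return version.strip().lstrip("v")

    return normalize(trial_keeper_version) == normalize(training_service_version)
```

A caller could log a warning or abort the trial when `versions_match` returns False; which of the two the real implementation does is not stated in the commit message.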
  2. 14 Mar, 2019 1 commit
    • Fix ssh connection error (#829) · de9e2842
      SparkSnail authored
      An SSH client allows only a limited number of open channels per connection. If trialConcurrency is set too high, the SSH client executes commands over the same connection so frequently that it fails with "Error: (SSH) Channel open failure: open failed".
      Refactor the code so that each connection has a maximum trial concurrency. When the number of trials reaches that per-connection limit, a new SSH connection is created to execute trial commands.
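
The refactoring described above can be sketched as a small pool that caps the number of trials per connection and opens another connection once every existing one is full. This is an illustration only: the class names, the `connect` factory, and the default cap of 5 are assumptions, and the actual change lives in NNI's remote machine training service rather than in Python.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PooledConnection:
    """One SSH connection plus the number of trials currently using it."""
    client: object
    active_trials: int = 0


@dataclass
class ConnectionPool:
    # Hypothetical factory that opens and returns a new SSH connection.
    connect: Callable[[], object]
    # Assumed cap; the real limit depends on the SSH server configuration.
    max_trials_per_connection: int = 5
    connections: List[PooledConnection] = field(default_factory=list)

    def acquire(self) -> PooledConnection:
        """Return a connection with spare capacity, opening a new one if all are full."""
        for conn in self.connections:
            if conn.active_trials < self.max_trials_per_connection:
                conn.active_trials += 1
                return conn
        conn = PooledConnection(client=self.connect(), active_trials=1)
        self.connections.append(conn)
        return conn

    def release(self, conn: PooledConnection) -> None:
        """Mark one trial on this connection as finished."""
        conn.active_trials = max(0, conn.active_trials - 1)
```

Reusing a connection until it is full keeps the number of open connections small, while the cap keeps each connection below the point where channel-open failures start.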
  3. 25 Feb, 2019 1 commit
  4. 29 Jan, 2019 1 commit
    • Migrate remote log (#655) · 9d3d926b
      SparkSnail authored
      * fix remote bug
      
      * add document
      
      * add document
      
      * update
      
      * update
      
      * update
      
      * update
      
      * fix remote issue
      
      * fix forEach
      
      * update doc according to comments
      
      * update
      
      * update
      
      * update
      
      * remove 'any more'
      
      * add base version for remote-log
      
      * change launcher.py
      
      * test
      
      * basic version
      
      * debug
      
      * debug
      
      * basic work version
      
      * fix code
      
      * update disable_log
      
      * remove unused line
      
      * add disable log in kubernetesTrainingService
      
      * add detect frameworkcontroller
      
      * fix comment
      
      * update
      
      * update
      
      * fix kubernetesData
      
      * debug
      
      * debug
      
      * debug
      
      * fix comment
      
      * fix conflict
      
      * remove local temp files
      
      * revert launcher.py
      
      * update code by comments
      
      * remove disableLog
      
      * remove disable Log
      
      * set timeout for cleanup
      
      * fix code by comments
      
      * update variable names
      
      * add comments
      
      * add delay function
      
      * update
      
      * update
      
      * update by comments
      
      * add  in remote script path
      
      * rename variables
      
      * update variable name
      
      * add mkdir -p for subfolder
  5. 25 Jan, 2019 1 commit
    • Refactoring nnimanager log (#652) · 6d591989
      chicm-ms authored
      * Pull code (#22)
      
      * Support distributed job for frameworkcontroller (#612)
      
      support distributed job for frameworkcontroller
      
      * Multiphase doc (#519)
      
      * multiPhase doc
      
      * updates
      
      * updates
      
      * Add time parser for 'nnictl update duration' (#632)
      
      Currently 'nnictl update duration' only supports seconds; add a parser for this command to support the units {s, m, h, d}
      
      * fix experiment state bug (#629)
      
      * update top README.md (#622)
      
      * Update README.md
      
      * update (#634)
      
      * Integration tests refactoring (#625)
      
      * Integration test refactoring (#21) (#616)
      
      * Integration test refactoring (#21)
      
      * Refactoring integration tests
      
      * test metrics
      
      * update azure pipeline
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * update trigger
      
      * Integration test refactoring (#618)
      
      * updates
      
      * updates
      
      * update pipeline (#619)
      
      * update pipeline
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * test pipeline (#623)
      
      * test pipeline
      
      * updates
      
      * updates
      
      * updates
      
      * Update integration test (#624)
      
      * Update integration test
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * Revert "Pull code (#22)"
      
      This reverts commit 62fc165ad7b2ba724eead3b99f010aa34491e2c7.
      
      * Update nnimanager logs
      
      * updates
      
      * Update README.md
      
      * Revert "Update README.md"
      
      This reverts commit bc67061160e5d57305a6e7fb63d491d12d0e9002.
      
      * updates
      
      * updates
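
The time parser for 'nnictl update duration' (#632) listed in the commit message above could look roughly like the following. It is a sketch under stated assumptions, not the nnictl implementation: the function name `parse_duration` is hypothetical, and treating a bare number as seconds is an assumption based on the note that the command previously accepted seconds only.

```python
import re

# Seconds per supported unit: s, m, h, d.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert strings such as "90", "30s", "2m", "1h" or "1d" to seconds."""
    match = re.fullmatch(r"(\d+)([smhd]?)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    # Assumption: a number without a unit is interpreted as seconds.
    return int(value) * _UNIT_SECONDS[unit or "s"]
```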
  6. 04 Jan, 2019 1 commit
  7. 29 Nov, 2018 1 commit
  8. 27 Nov, 2018 1 commit
    • mac support with local, remote & pai mode (#386) · 101b02ff
      Yan Ni authored
      * update Makefile for mac support, wait for aka.ms support
      
      * refix Makefile for colorful echo
      
      * update Makefile with shorturl
      
      * fix false fail on mac webui
      
      * fix cross os remote tmpdir issue
      
      * add readonly to RemoteMachineTrainingService.remoteOS
      
      * fix var name for PR 386
  9. 25 Nov, 2018 1 commit
    • Fix trialjobstate (#385) · c4d1aefe
      QuanluZhang authored
      * add one more trial job status, EARLY_STOPPED
      
      * fix datastore/nnimanager/mockeddatastore. test/webui/metrics_reader not done. USER_TO_CANCEL
      
      * fix bug
      
      * modifications based on Deshui's comments
      
      * fix bug
      
      * fix bug in remote mode
  10. 20 Nov, 2018 1 commit
    • [Kubeflow Training Service] V1, merge from kubeflow branch to master branch (#382) · 806afeb6
      fishyds authored
      * Kubeflow TrainingService support, v1 (#373)
      
      1. Create a new Training Service, the kubeflow training service, which uses 'kubectl' and the kubeflow tfjobs CRD to submit and manage jobs
      2. Update the nni python SDK to support the new kubeflow platform
      3. Update the nni python SDK's get_sequence_id() implementation to read the NNI_TRIAL_SEQ_ID env variable instead of the .nni/sequence_id file
      4. This version only supports the Tensorflow operator. Support for more operators will be added in future versions
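
Item 3 above replaces a file read with an environment lookup. A minimal sketch of that idea follows; the helper name `read_sequence_id` and the fallback value used when the variable is unset are assumptions, and only the variable name NNI_TRIAL_SEQ_ID comes from the commit message.

```python
import os


def read_sequence_id(default: int = 0) -> int:
    """Return the trial sequence id from the NNI_TRIAL_SEQ_ID env variable.

    Assumption: fall back to `default` when the variable is not set.
    """
    return int(os.environ.get("NNI_TRIAL_SEQ_ID", default))
```

Reading from the environment avoids depending on a file inside the trial's working directory, which matters when the trial runs in a container launched by another platform.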
  11. 02 Nov, 2018 1 commit
  12. 16 Oct, 2018 1 commit
  13. 12 Oct, 2018 2 commits
    • Add api nni.get_sequence_id() (#203) · 1388d763
      chicm-ms authored
      * Pull latest code (#2)
      
      * webui logpath and document (#135)
      
      * Add webui document and logpath as a href
      
      * fix tslint
      
      * fix comments by Chengmin
      
      * Pai training service bug fix and enhancement (#136)
      
      * Add NNI installation scripts
      
      * Update pai script, update NNI_out_dir
      
      * Update NNI dir in nni sdk local.py
      
      * Create .nni folder in nni sdk local.py
      
      * Add check before creating .nni folder
      
      * Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT
      
      * Improve annotation (#138)
      
      * Improve annotation
      
      * Minor bugfix
      
      * Selectively install through pip (#139)
      
      Selectively install through pip 
      * update setup.py
      
      * fix paiTrainingService bugs (#137)
      
      * fix nnictl bug
      
      * add hdfs host validation
      
      * fix bugs
      
      * fix dockerfile
      
      * fix install.sh
      
      * update install.sh
      
      * fix dockerfile
      
      * Set timeout for HDFSUtility exists function
      
      * remove unused TODO
      
      * fix sdk
      
      * add optional for outputDir and dataDir
      
      * refactor dockerfile.base
      
      * Remove unused import in hdfsclientUtility
      
      * Add documentation for NNI PAI mode experiment (#141)
      
      * Add documentation for NNI PAI mode
      
      * Fix typo based on PR comments
      
      * Exit with subprocess return code of trial keeper
      
      * Remove additional exit code
      
      * Fix typo based on PR comments
      
      * update doc for smac tuner (#140)
      
      * Revert "Selectively install through pip (#139)" due to potential pip install issue (#142)
      
      * Revert "Selectively install through pip (#139)"
      
      This reverts commit 1d174836.
      
      * Add exit code of subprocess for trial_keeper
      
      * Update README, add link to PAImode doc
      
      * fix bug (#147)
      
      * Refactor nnictl and add config_pai.yml (#144)
      
      * fix nnictl bug
      
      * add hdfs host validation
      
      * fix bugs
      
      * fix dockerfile
      
      * fix install.sh
      
      * update install.sh
      
      * fix dockerfile
      
      * Set timeout for HDFSUtility exists function
      
      * remove unused TODO
      
      * fix sdk
      
      * add optional for outputDir and dataDir
      
      * refactor dockerfile.base
      
      * Remove unused import in hdfsclientUtility
      
      * add config_pai.yml
      
      * refactor nnictl create logic and add colorful print
      
      * fix nnictl stop logic
      
      * add annotation for config_pai.yml
      
      * add document for start experiment
      
      * fix config.yml
      
      * fix document
      
      * Fix trial keeper wrongly exit issue (#152)
      
      * Fix trial keeper bug, use actual exitcode to exit rather than 1
      
      * Fix bug of table sort (#145)
      
      * Update doc for PAIMode and v0.2 release notes (#153)
      
      * Update v0.2 documentation regards to release note and PAI training service
      
      * Update document to describe NNI docker image
      
      * Bug fix for SQuAD example tuner. (#134)
      
      * Update Makefile (#151)
      
      * test
      
      * update setup.py
      
      * update Makefile and install.sh
      
      * revert setup.py
      
      * change color
      
      * update doc
      
      * update doc
      
      * fix auto-completion's extra space
      
      * update Makefile
      
      * update webui
      
      * Update doc image (#163)
      
      * update doc
      
      * trivial
      
      * trivial
      
      * trivial
      
      * trivial
      
      * trivial
      
      * trivial
      
      * update image
      
      * update image size
      
      * Update ga squad (#104)
      
      * update readme in ga_squad
      
      * update readme
      
      * fix typo
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * update readme
      
      * sklearn examples (#169)
      
      * fix nnictl bug
      
      * fix install.sh
      
      * add sklearn-regression example
      
      * add sklearn classification
      
      * update sklearn
      
      * update example
      
      * remove additional code
      
      * Update batch tuner (#158)
      
      * update readme in ga_squad
      
      * update readme
      
      * fix typo
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * update readme
      
      * update batch tuner
      
      * Quickly fix cascading search space bug in tuner (#156)
      
      * update readme in ga_squad
      
      * update readme
      
      * fix typo
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * update readme
      
      * quickly fix cascading searchspace bug in tuner
      
      * Add iterative search space example (#119)
      
      * update readme in ga_squad
      
      * update readme
      
      * fix typo
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * update readme
      
      * add iterative search space example
      
      * update
      
      * update readme
      
      * change name
      
      * Add api nni.get_sequence_id()
      
      * Add sequence_id to TrialJobDetail
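
Several entries in the commit message above (#141, #152) concern the trial keeper exiting with the subprocess's actual exit code rather than a fixed 1. The pattern can be sketched as below; `run_trial` is a hypothetical name and this is not the trial_keeper source.

```python
import subprocess
import sys
from typing import List


def run_trial(command: List[str]) -> int:
    """Run the trial command and return its real exit code."""
    completed = subprocess.run(command)
    return completed.returncode


if __name__ == "__main__":
    # Propagate the child's exit code instead of a hard-coded value.
    sys.exit(run_trial(sys.argv[1:]) if len(sys.argv) > 1 else 0)
```

Propagating the real code lets the platform that launched the trial distinguish a failed trial from a successful one.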
    • Fix OpenPAI training service failed issue after multiphase training code merged (#206) · f4ee9f8a
      fishyds authored
      * fix parameter file name issue for multi-phase training
      
      * Updated based on comments
  14. 08 Oct, 2018 1 commit
    • Multi-phase training service (#148) · 39085789
      chicm-ms authored
      * Dev enas  - multi-phase hyper parameters support (#96)
      
      * Multi-phase support
      
      * Updates
      
      * Updates
      
      * updates
      
      * updates
      
      * updates
      
      * Merge master to dev-enas (#117)
      
      * Multi-phase support
      
      * update document (#92)
      
      * Edit readme.md
      
      * updated a word
      
      * Update GetStarted.md
      
      * Update GetStarted.md
      
      * refactor readme, getstarted and write your trial md.
      
      * Update README.md
      
      * Update WriteYourTrial.md
      
      * Update WriteYourTrial.md
      
      * Update WriteYourTrial.md
      
      * Update WriteYourTrial.md
      
      * Fix nnictl bugs and add new feature (#75)
      
      * fix nnictl bug
      
      * fix nnictl create bug
      
      * add experiment status logic
      
      * add more information for nnictl
      
      * fix Evolution Tuner bug
      
      * refactor code
      
      * fix code in updater.py
      
      * fix nnictl --help
      
      * fix classArgs bug
      
      * update check response.status_code logic
      
      * Updates
      
      * remove Buffer warning (#100)
      
      * update readme in ga_squad
      
      * update readme
      
      * fix typo
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Updates
      
      * updates
      
      * updates
      
      * updates
      
      * Add support for debugging mode
      
      * fix setup.py (#115)
      
      * Add DAG model configuration format for SQuAD example.
      
      * Explain config format for SQuAD QA model.
      
      * Add more detailed introduction about the evolution algorithm.
      
      * Merge master to dev-enas (#118)
      
      * update document (#92)
      
      * Edit readme.md
      
      * updated a word
      
      * Update GetStarted.md
      
      * Update GetStarted.md
      
      * refactor readme, getstarted and write your trial md.
      
      * Update README.md
      
      * Update WriteYourTrial.md
      
      * Update WriteYourTrial.md
      
      * Update WriteYourTrial.md
      
      * Update WriteYourTrial.md
      
      * Fix nnictl bugs and add new feature (#75)
      
      * fix nnictl bug
      
      * fix nnictl create bug
      
      * add experiment status logic
      
      * add more information for nnictl
      
      * fix Evolution Tuner bug
      
      * refactor code
      
      * fix code in updater.py
      
      * fix nnictl --help
      
      * fix classArgs bug
      
      * update check response.status_code logic
      
      * remove Buffer warning (#100)
      
      * update readme in ga_squad
      
      * update readme
      
      * fix typo
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Add support for debugging mode
      
      * fix setup.py (#115)
      
      * Add DAG model configuration format for SQuAD example.
      
      * Explain config format for SQuAD QA model.
      
      * Add more detailed introduction about the evolution algorithm.
      
      * Fix install.sh and add trial log path (#109)
      
      * fix nnictl bug
      
      * fix nnictl create bug
      
      * add experiment status logic
      
      * add more information for nnictl
      
      * fix Evolution Tuner bug
      
      * refactor code
      
      * fix code in updater.py
      
      * fix nnictl --help
      
      * fix classArgs bug
      
      * update check response.status_code logic
      
      * show trial log path
      
      * update document
      
      * fix install.sh
      
      * set default value for maxTrialNum and maxExecDuration
      
      * fix nnictl
      
      * support multiPhase (#127)
      
      * fix nnictl bug
      
      * support multiPhase
      
      * Fix multiphase datastore problem (#125)
      
      * Fix multiphase datastore problem
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * Pull latest code (#2)
      
      * webui logpath and document (#135)
      
      * Add webui document and logpath as a href
      
      * fix tslint
      
      * fix comments by Chengmin
      
      * Pai training service bug fix and enhancement (#136)
      
      * Add NNI installation scripts
      
      * Update pai script, update NNI_out_dir
      
      * Update NNI dir in nni sdk local.py
      
      * Create .nni folder in nni sdk local.py
      
      * Add check before creating .nni folder
      
      * Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT
      
      * Improve annotation (#138)
      
      * Improve annotation
      
      * Minor bugfix
      
      * Selectively install through pip (#139)
      
      Selectively install through pip 
      * update setup.py
      
      * fix paiTrainingService bugs (#137)
      
      * fix nnictl bug
      
      * add hdfs host validation
      
      * fix bugs
      
      * fix dockerfile
      
      * fix install.sh
      
      * update install.sh
      
      * fix dockerfile
      
      * Set timeout for HDFSUtility exists function
      
      * remove unused TODO
      
      * fix sdk
      
      * add optional for outputDir and dataDir
      
      * refactor dockerfile.base
      
      * Remove unused import in hdfsclientUtility
      
      * Add documentation for NNI PAI mode experiment (#141)
      
      * Add documentation for NNI PAI mode
      
      * Fix typo based on PR comments
      
      * Exit with subprocess return code of trial keeper
      
      * Remove additional exit code
      
      * Fix typo based on PR comments
      
      * update doc for smac tuner (#140)
      
      * Revert "Selectively install through pip (#139)" due to potential pip install issue (#142)
      
      * Revert "Selectively install through pip (#139)"
      
      This reverts commit 1d174836.
      
      * Add exit code of subprocess for trial_keeper
      
      * Update README, add link to PAImode doc
      
      * fix bug (#147)
      
      * Refactor nnictl and add config_pai.yml (#144)
      
      * fix nnictl bug
      
      * add hdfs host validation
      
      * fix bugs
      
      * fix dockerfile
      
      * fix install.sh
      
      * update install.sh
      
      * fix dockerfile
      
      * Set timeout for HDFSUtility exists function
      
      * remove unused TODO
      
      * fix sdk
      
      * add optional for outputDir and dataDir
      
      * refactor dockerfile.base
      
      * Remove unused import in hdfsclientUtility
      
      * add config_pai.yml
      
      * refactor nnictl create logic and add colorful print
      
      * fix nnictl stop logic
      
      * add annotation for config_pai.yml
      
      * add document for start experiment
      
      * fix config.yml
      
      * fix document
      
      * Fix trial keeper wrongly exit issue (#152)
      
      * Fix trial keeper bug, use actual exitcode to exit rather than 1
      
      * Fix bug of table sort (#145)
      
      * Update doc for PAIMode and v0.2 release notes (#153)
      
      * Update v0.2 documentation regards to release note and PAI training service
      
      * Update document to describe NNI docker image
      
      * Bug fix for SQuAD example tuner. (#134)
      
      * Update Makefile (#151)
      
      * test
      
      * update setup.py
      
      * update Makefile and install.sh
      
      * revert setup.py
      
      * change color
      
      * update doc
      
      * update doc
      
      * fix auto-completion's extra space
      
      * update Makefile
      
      * update webui
      
      * Update doc image (#163)
      
      * update doc
      
      * trivial
      
      * trivial
      
      * trivial
      
      * trivial
      
      * trivial
      
      * trivial
      
      * update image
      
      * update image size
      
      * Update ga squad (#104)
      
      * update readme in ga_squad
      
      * update readme
      
      * fix typo
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * update readme
      
      * sklearn examples (#169)
      
      * fix nnictl bug
      
      * fix install.sh
      
      * add sklearn-regression example
      
      * add sklearn classification
      
      * update sklearn
      
      * update example
      
      * remove additional code
      
      * Update batch tuner (#158)
      
      * update readme in ga_squad
      
      * update readme
      
      * fix typo
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * update readme
      
      * update batch tuner
      
      * Quickly fix cascading search space bug in tuner (#156)
      
      * update readme in ga_squad
      
      * update readme
      
      * fix typo
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * update readme
      
      * quickly fix cascading searchspace bug in tuner
      
      * Add iterative search space example (#119)
      
      * update readme in ga_squad
      
      * update readme
      
      * fix typo
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * update readme
      
      * add iterative search space example
      
      * update
      
      * update readme
      
      * change name
      
      * updates
      
      * updates
      
      * Updates CI
      
      * updates
  15. 27 Sep, 2018 1 commit
    • PAI Training Service implementation (#128) · d3506e34
      fishyds authored
      * PAI Training service implementation
      1. Implement PAITrainingService
      2. Add trial-keeper python module, and modify setup.py to install the module
      3. Add PAItrainingService rest server to collect metrics from PAI container.
  16. 14 Sep, 2018 1 commit
  17. 07 Sep, 2018 1 commit
  18. 24 Aug, 2018 1 commit
  19. 20 Aug, 2018 1 commit