- 30 Apr, 2020 1 commit
Chi Song authored
To support Windows nodes in remote mode, this PR adds a layer of commands (osCommands) that handles the differences between Windows and Unix-like operating systems. To share code, ShellExecutor is added to enrich the original SshClient class. I will implement the Windows versions of the commands in the next phase. This pattern can be extended to Local or other platforms in the future, so I moved the related code to a common folder for sharing.
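The following is a minimal TypeScript sketch of the osCommands idea. The names `OsCommands`, `LinuxCommands`, `WindowsCommands` and `createFolderCommand` are hypothetical and are not the classes in this PR. The `ShellExecutor` shown here is also a simplified stand-in rather than the real class. The sketch shows that each OS subclass only builds command strings, so the code that runs those commands never has to branch on the OS.

```typescript
// Hypothetical sketch: OS-specific command builders behind one interface.
abstract class OsCommands {
  // Build a command that creates a folder (including parents).
  abstract createFolder(path: string): string;
  // Build a command that removes a folder recursively.
  abstract removeFolder(path: string): string;
  // Join path segments with the remote OS separator.
  abstract joinPath(...segments: string[]): string;
}

class LinuxCommands extends OsCommands {
  createFolder(path: string): string {
    return `mkdir -p '${path}'`;
  }
  removeFolder(path: string): string {
    return `rm -rf '${path}'`;
  }
  joinPath(...segments: string[]): string {
    return segments.join("/");
  }
}

class WindowsCommands extends OsCommands {
  createFolder(path: string): string {
    // cmd.exe's mkdir creates intermediate folders when command extensions are enabled (the default).
    return `mkdir "${path}"`;
  }
  removeFolder(path: string): string {
    return `rmdir /s /q "${path}"`;
  }
  joinPath(...segments: string[]): string {
    return segments.join("\\");
  }
}

// Simplified stand-in for an executor: it picks the command set once, so callers never branch on OS.
class ShellExecutor {
  private readonly commands: OsCommands;
  constructor(isWindows: boolean) {
    this.commands = isWindows ? new WindowsCommands() : new LinuxCommands();
  }
  createFolderCommand(...segments: string[]): string {
    return this.commands.createFolder(this.commands.joinPath(...segments));
  }
}

const unixExecutor = new ShellExecutor(false);
console.log(unixExecutor.createFolderCommand("/tmp", "nni", "exp1")); // mkdir -p '/tmp/nni/exp1'
```

Adding another platform only requires another `OsCommands` subclass. This matches the stated plan to reuse the pattern for Local or other platforms.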
- 26 Apr, 2020 1 commit
Chi Song authored
Add shell support for SSH connections, so that remote scripts can be started with the user's environment. Minor fixes: 1. Fix gpu_metrics_collector to support pyenv. pyenv creates one more process, so the original pgrep code always found extra processes and could not start gpu_metrics_collector. 2. Fix the NASUI failure in dev-install-node-modules by creating the subfolder every time. 3. Fix the Makefile to reduce mis-created links, and other minor issues. 4. Add node --watch for nni_manager for a better dev experience.
- 15 Jan, 2020 1 commit
chicm-ms authored
- 18 Dec, 2019 1 commit
chicm-ms authored
* Fix local system as remote machine issue #1852
- 11 Dec, 2019 1 commit
chicm-ms authored
* enable eslint * remove tslint
- 10 Dec, 2019 1 commit
chicm-ms authored
* update eslint rules * auto fix eslint * manually fix eslint (#1833)
- 25 Nov, 2019 1 commit
liuzhe-lz authored
- 22 Nov, 2019 2 commits
SparkSnail authored
SparkSnail authored
- 21 Nov, 2019 1 commit
liuzhe-lz authored
* fix gpu script permission issue * make gpu tool local to user
- 08 Nov, 2019 1 commit
chicm-ms authored
- 31 Oct, 2019 1 commit
SparkSnail authored
- 26 Sep, 2019 1 commit
liuzhe-lz authored
* Refactor web UI to support incremental metric loading * refactor * Remove host job * Move sequence ID to NNI manager * implement incremental loading
- 14 Aug, 2019 1 commit
Guoxin authored
* squash commits in v1.0 first round bug bash
- 12 Aug, 2019 1 commit
suiguoxin authored
- 02 Aug, 2019 1 commit
SparkSnail authored
- 20 Jun, 2019 1 commit
demianzhang authored
* fix local and remote training services tslint
- 19 Jun, 2019 1 commit
Hongarc authored
- 03 Jun, 2019 1 commit
SparkSnail authored
- 28 May, 2019 1 commit
SparkSnail authored
- 27 May, 2019 1 commit
demianzhang authored
* test python * test python36 * debug python * debug python * debug * python version * test python * debug * install nni * install nni * test powershell * debug python * test * test python * use python * test python * test python * test * update * test powershell * debug python * debug python * debug python * debug powershell * debug * debug * debug install.ps1 * add continueOnError: true * debug * debug * update * update * add unittest * test node * update * update joi * debug joi * add joi * debug joi * Update install * update * update * add unittest * add convert command * add example * fix windows commands * debug * fix tensorflow version * fix pipeline * update * add gpu logic in windows * update * update * debug * fix commands * fix commands * update * update * Fix comments * update * fix kill command * fix package.json * Update package.json * Refactor runScript * Fix bug * Fix comments * Fix execKill * Update * Update * Add unittest back * Rollback install node * Fix gpu memory * Update * Rollback check process * Update mnist-hyperband.test.yml * Update pipelines-it-local-windows.yml * Update uninstall.ps1 * Fix virtual environment * Fix tar * Fix isAlive * change gpu index logic * test gpu index * fix pipeline * add cifar10 * fix cifar10 * remove gpu in cifar10 * test mnist gpu * update * debug * Fix comments * debug * Update install.ps1 * debug * update gpu metrics shell * debug * debug * debug * debug * debug * debug sigbreak * Preinstall node-pre-gyp * Update Installation.md * Update Installation.md * Remove install node-pre-gyp * use taskkill to stop node process * use ctl+c event to stop process * add sigtrem signal in stop logic * add ctl+break command * Update isAlive * debug sigterm * Update pypi readme * Update * fix stop logic * fix pipeline, add cifar10 * revert mnist, remove gpu * Fix virtualenv * Fix comments * Update * Update * Fix install * Update install.ps1 * Update install.ps1 * Fix comments * Fix virtualenv install * Update * Update * Fix comments * Update * Update install.ps1 * Update * Update localTrainingService.ts * Update * Update * Update * Update * Update * Update util.ts * Update utils.ts * Fix system slash * Update tmp dir * Fix system slash * Use python3 in remote * Write tar command to file * Update tar * Update * Update * Fix stop * Update StopSignal type * Add removeTrialJobMetricListener * remove Listeners * Update listener * Update * Use Temp dir * Use Temp dir * Add remote windows pipeline * Update pipelines-it-remote-windows.yml * Update * remote build wheel * Update pipelines-it-remote-windows.yml * debug * debug * Use docker source install * Update * Update * Rollback remote build wheel * Use self node and yarn * Fix docker source install * Rollback Makefile * Upgrade docker pip * Update * Update * Remote build wheel * Use inline runOptions * Hide wget output * Add continueOnError * Update * Update * Update * Upgrade pip * Add chmod * Update * debug * Update * Use pscp * Update * Download putty * Update * Update * Update * Update * Update * Update * Update * Update * Update * debug * exclude metis * Refactor pathJoin * Update * debug metis * debug metis * Update * Update dependency * Fix comments * Update * Fix tslint * Fix comments * Fix comments * add doc * Fix comments * Update * Update doc
- 25 Apr, 2019 1 commit
chicm-ms authored
- 22 Apr, 2019 1 commit
demianzhang authored
- 18 Apr, 2019 1 commit
chicm-ms authored
* Refactoring local training service * Designated GPU for local training service * RemoteMachine designated GPU configuration
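The following sketch illustrates what designating GPUs could mean for scheduling. It assumes a hypothetical comma-separated index string such as "0,2" and a hypothetical list of currently idle GPU indices. A trial is placed only on GPUs that are both designated and idle, and it waits when not enough such GPUs are available. The function names are illustrative and are not the training service's actual API.

```typescript
// Hypothetical sketch: restrict GPU scheduling to user-designated indices.
function parseGpuIndices(spec: string | undefined): Set<number> | undefined {
  // No designation means every GPU on the machine may be used.
  if (spec === undefined || spec.trim() === "") {
    return undefined;
  }
  const indices = spec.split(",").map((s) => Number.parseInt(s.trim(), 10));
  if (indices.some((i) => Number.isNaN(i) || i < 0)) {
    throw new Error(`invalid GPU index list: ${spec}`);
  }
  return new Set(indices);
}

// Pick `count` GPUs that are idle and, if a designation exists, designated.
function selectGpus(
  idleGpus: number[],
  designated: Set<number> | undefined,
  count: number,
): number[] | undefined {
  const eligible = idleGpus.filter((i) => designated === undefined || designated.has(i));
  // undefined means the trial has to wait until enough eligible GPUs are free.
  return eligible.length >= count ? eligible.slice(0, count) : undefined;
}

const designatedGpus = parseGpuIndices("0,2");
console.log(selectGpus([0, 1, 2, 3], designatedGpus, 2)); // [ 0, 2 ]
console.log(selectGpus([1, 3], designatedGpus, 1)); // undefined
```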
- 01 Apr, 2019 1 commit
SparkSnail authored
- 27 Mar, 2019 1 commit
SparkSnail authored
- 22 Mar, 2019 1 commit
SparkSnail authored
If the user sets remoteloggingType in the config file, log content will not be transmitted from trialkeeper.
- 15 Mar, 2019 1 commit
SparkSnail authored
Check the NNI version in trialkeeper to make sure the trialkeeper version is consistent with the trainingService. Add a debug mode in the config file.
- 14 Mar, 2019 1 commit
SparkSnail authored
The SSH client has a maximum number of open channels per connection. If trialConcurrency is set too high, the SSH client executes commands over SSH so frequently that it hits the error "Error: (SSH) Channel open failure: open failed". This change refactors the code so that each connection has a maximum trial concurrency; when the number of trials reaches the limit of an SSH connection, a new SSH connection is created to execute trial commands.
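The following is a minimal sketch of the per-connection limit described above. `SshConnection`, `SshConnectionPool` and `maxTrialsPerConnection` are hypothetical names. The `openConnection` method only creates a placeholder record here; a real implementation would open an SSH client at that point.

```typescript
// Hypothetical placeholder for an SSH client connection and its trial count.
interface SshConnection {
  id: number;
  activeTrials: number;
}

class SshConnectionPool {
  private readonly connections: SshConnection[] = [];
  private readonly maxTrialsPerConnection: number;
  private nextId = 0;

  // maxTrialsPerConnection keeps each connection below the server's channel limit.
  constructor(maxTrialsPerConnection: number) {
    this.maxTrialsPerConnection = maxTrialsPerConnection;
  }

  // Reuse a connection with spare capacity, or open a new one when all are full.
  acquire(): SshConnection {
    let conn = this.connections.find((c) => c.activeTrials < this.maxTrialsPerConnection);
    if (conn === undefined) {
      conn = this.openConnection();
      this.connections.push(conn);
    }
    conn.activeTrials += 1;
    return conn;
  }

  // Called when a trial finishes so its slot can be reused.
  release(conn: SshConnection): void {
    if (conn.activeTrials > 0) {
      conn.activeTrials -= 1;
    }
  }

  get size(): number {
    return this.connections.length;
  }

  private openConnection(): SshConnection {
    // A real implementation would connect an SSH client here.
    return { id: this.nextId++, activeTrials: 0 };
  }
}

const pool = new SshConnectionPool(2);
const ids = [pool.acquire(), pool.acquire(), pool.acquire()].map((c) => c.id);
console.log(ids, pool.size); // [ 0, 0, 1 ] 2
```

The sketch fills existing connections before it opens new ones, which keeps the number of SSH connections as small as the concurrency allows.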
- 25 Feb, 2019 2 commits
SparkSnail authored
* add trialkeeper_stdout and trialkeeper_stderr * fix nnictl set remote nniManagerIP
fishyds authored
* Fix a race condition bug that does not store Trial Job cancel status correctly
- 29 Jan, 2019 1 commit
SparkSnail authored
* fix remote bug * add document * add document * update * update * update * update * fix remote issue * fix forEach * update doc according to comments * update * update * update * remove 'any more' * add base version for remote-log * change launcher.py * test * basic version * debug * debug * basic work version * fix code * update disable_log * remove unused line * add diable log in kubernetesTrainingService * add detect frameworkcontroller * fix comment * update * update * fix kubernetesData * debug * debug * debug * fix comment * fix conflict * remove local temp files * revert launcher.py * update code by comments * remove disableLog * remove disable Log * set timeout for cleanup * fix code by comments * update variable names * add comments * add delay function * update * update * update by comments * add in remote script path * rename variables * update variable name * add mkdir -p for subfolder
- 25 Jan, 2019 1 commit
chicm-ms authored
* Pull code (#22) * Support distributed job for frameworkcontroller (#612) support distributed job for frameworkcontroller * Multiphase doc (#519) * multiPhase doc * updates * updates * Add time parser for 'nnictl update duration' (#632) Current nnictl update duration only support seconds unit, add a parser for this command to support {s, m, h, d} * fix experiment state bug (#629) * update top README.md (#622) * Update README.md * update (#634) * Integration tests refactoring (#625) * Integration test refactoring (#21) (#616) * Integration test refactoring (#21) * Refactoring integration tests * test metrics * update azure pipeline * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * update trigger * Integration test refactoring (#618) * updates * updates * update pipeline (#619) * update pipeline * updates * updates * updates * updates * updates * test pipeline (#623) * test pipeline * updates * updates * updates * Update integration test (#624) * Update integration test * updates * updates * updates * updates * updates * updates * Revert "Pull code (#22)" This reverts commit 62fc165ad7b2ba724eead3b99f010aa34491e2c7. * Update nnimanager logs * updates * Update README.md * Revert "Update README.md" This reverts commit bc67061160e5d57305a6e7fb63d491d12d0e9002. * updates * updates
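The duration parser from #632 can be sketched as follows. nnictl itself is written in Python, so this TypeScript version only illustrates the idea. The accepted format is an assumption: an integer followed by an optional s, m, h or d suffix, with a bare number read as seconds.

```typescript
// Seconds per unit for the {s, m, h, d} suffixes mentioned in #632.
const UNIT_SECONDS: Record<string, number> = { s: 1, m: 60, h: 3600, d: 86400 };

// Convert strings such as "30", "45m", "2h" or "1d" into seconds.
// Assumption: a bare number is interpreted as seconds.
function parseDuration(text: string): number {
  const match = /^(\d+)([smhd]?)$/.exec(text.trim());
  if (match === null) {
    throw new Error(`invalid duration: ${text}`);
  }
  const value = Number.parseInt(match[1], 10);
  const unit = match[2] === "" ? "s" : match[2];
  return value * UNIT_SECONDS[unit];
}

console.log(parseDuration("90"), parseDuration("45m"), parseDuration("2h"), parseDuration("1d"));
// 90 2700 7200 86400
```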
- 04 Jan, 2019 1 commit
SparkSnail authored
Trial jobs could not be stopped on the remote machine when the experiment was stopped, because async/await does not work as expected inside forEach; see https://codeburst.io/javascript-async-await-with-foreach-b6ba62bbf404.
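The following snippet reproduces the forEach pitfall and one common fix. `stopTrial` is a hypothetical stand-in that simulates an asynchronous remote kill with a timer. The forEach version returns before any trial has been stopped. The `Promise.all` version waits for every stop to finish.

```typescript
// Hypothetical stand-in for an asynchronous remote "kill trial" call.
function stopTrial(id: string, stopped: string[]): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(() => {
      stopped.push(id);
      resolve();
    }, 10);
  });
}

// Broken: forEach ignores the promises returned by the async callback,
// so this function resolves before any trial has actually been stopped.
async function stopAllWithForEach(ids: string[], stopped: string[]): Promise<void> {
  ids.forEach(async (id) => {
    await stopTrial(id, stopped);
  });
}

// Fixed: wait for every stop operation before returning.
async function stopAll(ids: string[], stopped: string[]): Promise<void> {
  await Promise.all(ids.map((id) => stopTrial(id, stopped)));
}

async function main(): Promise<void> {
  const early: string[] = [];
  await stopAllWithForEach(["t1", "t2"], early);
  console.log("forEach:", early.length); // forEach: 0

  const done: string[] = [];
  await stopAll(["t1", "t2"], done);
  console.log("Promise.all:", done.length); // Promise.all: 2
}

void main();
```

If trials must be stopped one at a time, a `for...of` loop with `await` inside the loop body also waits correctly.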
- 03 Jan, 2019 1 commit
Shinai Yang (FA TALENT) authored
- 29 Nov, 2018 1 commit
fishyds authored
* Add codeDir file count validation for setClusterConfig * fix a small bug if find command is not installed * Remove codeDir validation for local training service * Remove useless import
- 27 Nov, 2018 1 commit
Yan Ni authored
* update Makefile for mac support, wait for aka.ms support * refix Makefile for colorful echo * update Makefile with shorturl * fix false fail on mac webui * fix cross os remote tmpdir issue * add readonly to RemoteMachineTrainingService.remoteOS * fix var name for PR 386
- 25 Nov, 2018 1 commit
QuanluZhang authored
* add one more trial job status, EARLY_STOPPED * fix datastore/nnimanager/mockeddatastore. test/webui/metrics_reader not done. USER_TO_CANCEL * fix bug * modifications based on Deshui's comments * fix bug * fix bug in remote mode
- 20 Nov, 2018 1 commit
fishyds authored
* Kubeflow TrainingService support, v1 (#373) 1. Create a new training service, the kubeflow training service, which uses 'kubectl' and the kubeflow tfjobs CRD to submit and manage jobs 2. Update the nni Python SDK to support the new kubeflow platform 3. Update the nni Python SDK's get_sequence_id() implementation to read the NNI_TRIAL_SEQ_ID env variable instead of the .nni/sequence_id file 4. This version only supports the TensorFlow operator; support for more operators will be added in future versions
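The following sketch shows the environment-variable lookup from item 3. The real change is in NNI's Python SDK, so this TypeScript version is illustrative only. It takes the environment as a plain object so it is easy to test. Returning undefined when NNI_TRIAL_SEQ_ID is missing is an assumption, not documented SDK behavior.

```typescript
// Environment shape compatible with Node's process.env, passed in explicitly.
type Env = Record<string, string | undefined>;

// Read the trial sequence ID from NNI_TRIAL_SEQ_ID instead of a file on disk.
// Assumption: a missing variable means "not running inside a trial" and yields undefined.
function getSequenceId(env: Env): number | undefined {
  const raw = env["NNI_TRIAL_SEQ_ID"];
  if (raw === undefined) {
    return undefined;
  }
  const id = Number.parseInt(raw, 10);
  if (Number.isNaN(id)) {
    throw new Error(`NNI_TRIAL_SEQ_ID is not an integer: ${raw}`);
  }
  return id;
}

console.log(getSequenceId({ NNI_TRIAL_SEQ_ID: "7" })); // 7
console.log(getSequenceId({})); // undefined
```

In Node.js, calling `getSequenceId(process.env)` would read the real environment.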
- 02 Nov, 2018 1 commit
chicm-ms authored