Unverified Commit 358efb26 authored by Yan Ni's avatar Yan Ni Committed by GitHub
Browse files

Dev weight sharing (#568) (#576)

* Dev weight sharing (#568)

* add pycharm project files to .gitignore list

* update pylintrc to conform vscode settings

* fix RemoteMachineMode for wrong trainingServicePlatform

* simple weight sharing

* update gitignore file

* change tuner codedir to relative path

* add python cache files to gitignore list

* move extract scalar reward logic from dispatcher to tuner

* update tuner code corresponding to last commit

* update doc for receive_trial_result api change

* add numpy to package whitelist of pylint

* distinguish param value from return reward for tuner.extract_scalar_reward

* update pylintrc

* add comments to dispatcher.handle_report_metric_data

* update install for mac support

* fix root mode bug on Makefile

* Quick fix bug: nnictl port value error (#245)

* fix port bug

* Dev exp stop more (#221)

* Exp stop refactor (#161)

* Update RemoteMachineMode.md (#63)

* Remove unused classes for SQuAD QA example.

* Remove more unused functions for SQuAD QA example.

* Fix default dataset config.

* Add Makefile README (#64)

* update document (#92)

* Edit readme.md

* updated a word

* Update GetStarted.md

* Update GetStarted.md

* refact readme, getstarted and write your trial md.

* Update README.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Fix nnictl bugs and add new feature (#75)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* remove Buffer warning (#100)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* Add support for debugging mode

* fix setup.py (#115)

* Add DAG model configuration format for SQuAD example.

* Explain config format for SQuAD QA model.

* Add more detailed introduction about the evolution algorithm.

* Fix install.sh add add trial log path (#109)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* show trial log path

* update document

* fix install.sh

* set default vallue for maxTrialNum and maxExecDuration

* fix nnictl

* Dev smac (#116)

* support package install (#91)

* fix nnictl bug

* support package install

* update

* update package install logic

* Fix package install issue (#95)

* fix nnictl bug

* fix pakcage install

* support SMAC as a tuner on nni (#81)

* update doc

* update doc

* update doc

* update hyperopt installation

* update doc

* update doc

* update description in setup.py

* update setup.py

* modify encoding

* encoding

* add encoding

* remove pymc3

* update doc

* update builtin tuner spec

* support smac in sdk, fix logging issue

* support smac tuner

* add optimize_mode

* update config in nnictl

* add __init__.py

* update smac

* update import path

* update setup.py: remove entry_point

* update rest server validation

* fix bug in nnictl launcher

* support classArgs: optimize_mode

* quick fix bug

* test travis

* add dependency

* add dependency

* add dependency

* add dependency

* create smac python package

* fix trivial points

* optimize import of tuners, modify nnictl accordingly

* fix bug: incorrect algorithm_name

* trivial refactor

* for debug

* support virtual

* update doc of SMAC

* update smac requirements

* update requirements

* change debug mode

* update doc

* update doc

* refactor based on comments

* fix comments

* modify example config path to relative path and increase maxTrialNum (#94)

* modify example config path to relative path and increase maxTrialNum

* add document

* support conda (#90) (#110)

* support install from venv and travis CI

* support install from venv and travis CI

* support install from venv and travis CI

* support conda

* support conda

* modify example config path to relative path and increase maxTrialNum

* undo messy commit

* undo messy commit

* Support pip install as root (#77)

* Typo on #58 (#122)

* PAI Training Service implementation (#128)

* PAI Training service implementation
**1. Implement PAITrainingService
**2. Add trial-keeper python module, and modify setup.py to install the module
**3. Add PAItrainingService rest server to collect metrics from PAI container.

* fix datastore for multiple final result (#129)

* Update NNI v0.2 release notes (#132)

Update NNI v0.2 release notes

* Update setup.py Makefile and documents (#130)

* update makefile and setup.py

* update makefile and setup.py

* update document

* update document

* Update Makefile no travis

*  update doc

*  update doc

* fix convert from ss to pcs (#133)

* Fix bugs about webui (#131)

* Fix webui bugs

* Fix tslint

* webui logpath and document (#135)

* Add webui document and logpath as a href

* fix tslint

* fix comments by Chengmin

* Pai training service bug fix and enhancement (#136)

* Add NNI installation scripts

* Update pai script, update NNI_out_dir

* Update NNI dir in nni sdk local.py

* Create .nni folder in nni sdk local.py

* Add check before creating .nni folder

* Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT

* Improve annotation (#138)

* Improve annotation

* Minor bugfix

* Selectively install through pip (#139)

Selectively install through pip 
* update setup.py

* fix paiTrainingService bugs (#137)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* Add documentation for NNI PAI mode experiment (#141)

* Add documentation for NNI PAI mode

* Fix typo based on PR comments

* Exit with subprocess return code of trial keeper

* Remove additional exit code

* Fix typo based on PR comments

* update doc for smac tuner (#140)

* Revert "Selectively install through pip (#139)" due to potential pip install issue (#142)

* Revert "Selectively install through pip (#139)"

This reverts commit 1d174836.

* Add exit code of subprocess for trial_keeper

* Update README, add link to PAImode doc

* Merge branch V0.2 to Master (#143)

* webui logpath and document (#135)

* Add webui document and logpath as a href

* fix tslint

* fix comments by Chengmin

* Pai training service bug fix and enhancement (#136)

* Add NNI installation scripts

* Update pai script, update NNI_out_dir

* Update NNI dir in nni sdk local.py

* Create .nni folder in nni sdk local.py

* Add check before creating .nni folder

* Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT

* Improve annotation (#138)

* Improve annotation

* Minor bugfix

* Selectively install through pip (#139)

Selectively install through pip 
* update setup.py

* fix paiTrainingService bugs (#137)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* Add documentation for NNI PAI mode experiment (#141)

* Add documentation for NNI PAI mode

* Fix typo based on PR comments

* Exit with subprocess return code of trial keeper

* Remove additional exit code

* Fix typo based on PR comments

* update doc for smac tuner (#140)

* Revert "Selectively install through pip (#139)" due to potential pip install issue (#142)

* Revert "Selectively install through pip (#139)"

This reverts commit 1d174836.

* Add exit code of subprocess for trial_keeper

* Update README, add link to PAImode doc

* fix bug (#147)

* Refactor nnictl and add config_pai.yml (#144)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* add config_pai.yml

* refactor nnictl create logic and add colorful print

* fix nnictl stop logic

* add annotation for config_pai.yml

* add document for start experiment

* fix config.yml

* fix document

* Fix trial keeper wrongly exit issue (#152)

* Fix trial keeper bug, use actual exitcode to exit rather than 1

* Fix bug of table sort (#145)

* Update doc for PAIMode and v0.2 release notes (#153)

* Update v0.2 documentation regards to release note and PAI training service

* Update document to describe NNI docker image

* fix antd (#159)

* refactor experiment stopping logic

* support change concurrency

* remove trialJobs.ts

* trivial changes

* fix bugs

* fix bug

* support updating maxTrialNum

* Modify IT scripts for supporting multiple experiments

* Update ci (#175)

* Update RemoteMachineMode.md (#63)

* Remove unused classes for SQuAD QA example.

* Remove more unused functions for SQuAD QA example.

* Fix default dataset config.

* Add Makefile README (#64)

* update document (#92)

* Edit readme.md

* updated a word

* Update GetStarted.md

* Update GetStarted.md

* refact readme, getstarted and write your trial md.

* Update README.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Fix nnictl bugs and add new feature (#75)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* remove Buffer warning (#100)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* Add support for debugging mode

* modify CI cuz of refracting exp stop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* file saving

* fix issues from code merge

* remove $(INSTALL_PREFIX)/nni/nni_manager before install

* fix indent

* fix merge issue

* socket close

* update port

* fix merge error

* modify ci logic in nnimanager

* fix ci

* fix bug

* change suspended to done

* update ci (#229)

* update ci

* update ci

* update ci (#232)

* update ci

* update ci

* update azure-pipelines

* update azure-pipelines

* update ci (#233)

* update ci

* update ci

* update azure-pipelines

* update azure-pipelines

* update azure-pipelines

* run.py (#238)

* Nnupdate ci (#239)

* run.py

* test ci

* Nnupdate ci (#240)

* run.py

* test ci

* test ci

* Udci (#241)

* run.py

* test ci

* test ci

* test ci

* update ci (#242)

* run.py

* test ci

* test ci

* test ci

* update ci

* revert install.sh (#244)

* run.py

* test ci

* test ci

* test ci

* update ci

* revert install.sh

* add comments

* remove assert

* trivial change

* trivial change

* update Makefile (#246)

* update Makefile

* update Makefile

* quick fix for ci (#248)

* add update trialNum and fix bugs (#261)

* Add builtin tuner to CI (#247)

* update Makefile

* update Makefile

* add builtin-tuner test

* add builtin-tuner test

* refractor ci

* update azure.yml

* add built-in tuner test

* fix bugs

* Doc refactor (#258)

* doc refactor

* image name refactor

* Refactor nnictl to support listing stopped experiments. (#256)

Refactor nnictl to support listing stopped experiments.

* Show experiment parameters more beautifully (#262)

* fix error on example of RemoteMachineMode (#269)

* add pycharm project files to .gitignore list

* update pylintrc to conform vscode settings

* fix RemoteMachineMode for wrong trainingServicePlatform

* Update docker file to use latest nni release (#263)

* fix bug about execDuration and endTime (#270)

* fix bug about execDuration and endTime

* modify time interval to 30 seconds

* refactor based on Gems's suggestion

* for triggering ci

* Refactor dockerfile (#264)

* refactor Dockerfile

* Support nnictl tensorboard (#268)

support tensorboard

* Sdk update (#272)

* Rename get_parameters to get_next_parameter

* annotations add get_next_parameter

* updates

* updates

* updates

* updates

* updates

* add experiment log path to experiment profile (#276)

* refactor extract reward from dict by tuner

* update Makefile for mac support, wait for aka.ms support

* refix Makefile for colorful echo

* unversion config.yml with machine information

* sync graph.py between tuners & trial of ga_squad

* sync graph.py between tuners & trial of ga_squad

* copy weight shared ga_squad under weight_sharing folder

* mv ga_squad code back to master

* simple tuner & trial ready

* Fix nnictl multiThread option

* weight sharing with async dispatcher simple example ready

* update for ga_squad

* fix bug

* modify multihead attention name

* add min_layer_num to Graph

* fix bug

* update share id calc

* fix bug

* add save logging

* fix ga_squad tuner bug

* sync bug fix for ga_squad tuner

* fix same hash_id bug

* add lock to simple tuner in weight sharing

* Add readme to simple weight sharing

* update

* update

* add paper link

* update

* reformat with autopep8

* add documentation for weight sharing

* test for weight sharing

* delete irrelevant files

* move details of weight sharing in to code comments

* Dev weight sharing update doc (#577)

* add pycharm project files to .gitignore list

* update pylintrc to conform vscode settings

* fix RemoteMachineMode for wrong trainingServicePlatform

* simple weight sharing

* update gitignore file

* change tuner codedir to relative path

* add python cache files to gitignore list

* move extract scalar reward logic from dispatcher to tuner

* update tuner code corresponding to last commit

* update doc for receive_trial_result api change

* add numpy to package whitelist of pylint

* distinguish param value from return reward for tuner.extract_scalar_reward

* update pylintrc

* add comments to dispatcher.handle_report_metric_data

* update install for mac support

* fix root mode bug on Makefile

* Quick fix bug: nnictl port value error (#245)

* fix port bug

* Dev exp stop more (#221)

* Exp stop refactor (#161)

* Update RemoteMachineMode.md (#63)

* Remove unused classes for SQuAD QA example.

* Remove more unused functions for SQuAD QA example.

* Fix default dataset config.

* Add Makefile README (#64)

* update document (#92)

* Edit readme.md

* updated a word

* Update GetStarted.md

* Update GetStarted.md

* refact readme, getstarted and write your trial md.

* Update README.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Fix nnictl bugs and add new feature (#75)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* remove Buffer warning (#100)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* Add support for debugging mode

* fix setup.py (#115)

* Add DAG model configuration format for SQuAD example.

* Explain config format for SQuAD QA model.

* Add more detailed introduction about the evolution algorithm.

* Fix install.sh add add trial log path (#109)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* show trial log path

* update document

* fix install.sh

* set default vallue for maxTrialNum and maxExecDuration

* fix nnictl

* Dev smac (#116)

* support package install (#91)

* fix nnictl bug

* support package install

* update

* update package install logic

* Fix package install issue (#95)

* fix nnictl bug

* fix pakcage install

* support SMAC as a tuner on nni (#81)

* update doc

* update doc

* update doc

* update hyperopt installation

* update doc

* update doc

* update description in setup.py

* update setup.py

* modify encoding

* encoding

* add encoding

* remove pymc3

* update doc

* update builtin tuner spec

* support smac in sdk, fix logging issue

* support smac tuner

* add optimize_mode

* update config in nnictl

* add __init__.py

* update smac

* update import path

* update setup.py: remove entry_point

* update rest server validation

* fix bug in nnictl launcher

* support classArgs: optimize_mode

* quick fix bug

* test travis

* add dependency

* add dependency

* add dependency

* add dependency

* create smac python package

* fix trivial points

* optimize import of tuners, modify nnictl accordingly

* fix bug: incorrect algorithm_name

* trivial refactor

* for debug

* support virtual

* update doc of SMAC

* update smac requirements

* update requirements

* change debug mode

* update doc

* update doc

* refactor based on comments

* fix comments

* modify example config path to relative path and increase maxTrialNum (#94)

* modify example config path to relative path and increase maxTrialNum

* add document

* support conda (#90) (#110)

* support install from venv and travis CI

* support install from venv and travis CI

* support install from venv and travis CI

* support conda

* support conda

* modify example config path to relative path and increase maxTrialNum

* undo messy commit

* undo messy commit

* Support pip install as root (#77)

* Typo on #58 (#122)

* PAI Training Service implementation (#128)

* PAI Training service implementation
**1. Implement PAITrainingService
**2. Add trial-keeper python module, and modify setup.py to install the module
**3. Add PAItrainingService rest server to collect metrics from PAI container.

* fix datastore for multiple final result (#129)

* Update NNI v0.2 release notes (#132)

Update NNI v0.2 release notes

* Update setup.py Makefile and documents (#130)

* update makefile and setup.py

* update makefile and setup.py

* update document

* update document

* Update Makefile no travis

*  update doc

*  update doc

* fix convert from ss to pcs (#133)

* Fix bugs about webui (#131)

* Fix webui bugs

* Fix tslint

* webui logpath and document (#135)

* Add webui document and logpath as a href

* fix tslint

* fix comments by Chengmin

* Pai training service bug fix and enhancement (#136)

* Add NNI installation scripts

* Update pai script, update NNI_out_dir

* Update NNI dir in nni sdk local.py

* Create .nni folder in nni sdk local.py

* Add check before creating .nni folder

* Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT

* Improve annotation (#138)

* Improve annotation

* Minor bugfix

* Selectively install through pip (#139)

Selectively install through pip 
* update setup.py

* fix paiTrainingService bugs (#137)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* Add documentation for NNI PAI mode experiment (#141)

* Add documentation for NNI PAI mode

* Fix typo based on PR comments

* Exit with subprocess return code of trial keeper

* Remove additional exit code

* Fix typo based on PR comments

* update doc for smac tuner (#140)

* Revert "Selectively install through pip (#139)" due to potential pip install issue (#142)

* Revert "Selectively install through pip (#139)"

This reverts commit 1d174836.

* Add exit code of subprocess for trial_keeper

* Update README, add link to PAImode doc

* Merge branch V0.2 to Master (#143)

* webui logpath and document (#135)

* Add webui document and logpath as a href

* fix tslint

* fix comments by Chengmin

* Pai training service bug fix and enhancement (#136)

* Add NNI installation scripts

* Update pai script, update NNI_out_dir

* Update NNI dir in nni sdk local.py

* Create .nni folder in nni sdk local.py

* Add check before creating .nni folder

* Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT

* Improve annotation (#138)

* Improve annotation

* Minor bugfix

* Selectively install through pip (#139)

Selectively install through pip 
* update setup.py

* fix paiTrainingService bugs (#137)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* Add documentation for NNI PAI mode experiment (#141)

* Add documentation for NNI PAI mode

* Fix typo based on PR comments

* Exit with subprocess return code of trial keeper

* Remove additional exit code

* Fix typo based on PR comments

* update doc for smac tuner (#140)

* Revert "Selectively install through pip (#139)" due to potential pip install issue (#142)

* Revert "Selectively install through pip (#139)"

This reverts commit 1d174836.

* Add exit code of subprocess for trial_keeper

* Update README, add link to PAImode doc

* fix bug (#147)

* Refactor nnictl and add config_pai.yml (#144)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* add config_pai.yml

* refactor nnictl create logic and add colorful print

* fix nnictl stop logic

* add annotation for config_pai.yml

* add document for start experiment

* fix config.yml

* fix document

* Fix trial keeper wrongly exit issue (#152)

* Fix trial keeper bug, use actual exitcode to exit rather than 1

* Fix bug of table sort (#145)

* Update doc for PAIMode and v0.2 release notes (#153)

* Update v0.2 documentation regards to release note and PAI training service

* Update document to describe NNI docker image

* fix antd (#159)

* refactor experiment stopping logic

* support change concurrency

* remove trialJobs.ts

* trivial changes

* fix bugs

* fix bug

* support updating maxTrialNum

* Modify IT scripts for supporting multiple experiments

* Update ci (#175)

* Update RemoteMachineMode.md (#63)

* Remove unused classes for SQuAD QA example.

* Remove more unused functions for SQuAD QA example.

* Fix default dataset config.

* Add Makefile README (#64)

* update document (#92)

* Edit readme.md

* updated a word

* Update GetStarted.md

* Update GetStarted.md

* refact readme, getstarted and write your trial md.

* Update README.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Fix nnictl bugs and add new feature (#75)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* remove Buffer warning (#100)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* Add support for debugging mode

* modify CI cuz of refracting exp stop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* file saving

* fix issues from code merge

* remove $(INSTALL_PREFIX)/nni/nni_manager before install

* fix indent

* fix merge issue

* socket close

* update port

* fix merge error

* modify ci logic in nnimanager

* fix ci

* fix bug

* change suspended to done

* update ci (#229)

* update ci

* update ci

* update ci (#232)

* update ci

* update ci

* update azure-pipelines

* update azure-pipelines

* update ci (#233)

* update ci

* update ci

* update azure-pipelines

* update azure-pipelines

* update azure-pipelines

* run.py (#238)

* Nnupdate ci (#239)

* run.py

* test ci

* Nnupdate ci (#240)

* run.py

* test ci

* test ci

* Udci (#241)

* run.py

* test ci

* test ci

* test ci

* update ci (#242)

* run.py

* test ci

* test ci

* test ci

* update ci

* revert install.sh (#244)

* run.py

* test ci

* test ci

* test ci

* update ci

* revert install.sh

* add comments

* remove assert

* trivial change

* trivial change

* update Makefile (#246)

* update Makefile

* update Makefile

* quick fix for ci (#248)

* add update trialNum and fix bugs (#261)

* Add builtin tuner to CI (#247)

* update Makefile

* update Makefile

* add builtin-tuner test

* add builtin-tuner test

* refractor ci

* update azure.yml

* add built-in tuner test

* fix bugs

* Doc refactor (#258)

* doc refactor

* image name refactor

* Refactor nnictl to support listing stopped experiments. (#256)

Refactor nnictl to support listing stopped experiments.

* Show experiment parameters more beautifully (#262)

* fix error on example of RemoteMachineMode (#269)

* add pycharm project files to .gitignore list

* update pylintrc to conform vscode settings

* fix RemoteMachineMode for wrong trainingServicePlatform

* Update docker file to use latest nni release (#263)

* fix bug about execDuration and endTime (#270)

* fix bug about execDuration and endTime

* modify time interval to 30 seconds

* refactor based on Gems's suggestion

* for triggering ci

* Refactor dockerfile (#264)

* refactor Dockerfile

* Support nnictl tensorboard (#268)

support tensorboard

* Sdk update (#272)

* Rename get_parameters to get_next_parameter

* annotations add get_next_parameter

* updates

* updates

* updates

* updates

* updates

* add experiment log path to experiment profile (#276)

* refactor extract reward from dict by tuner

* update Makefile for mac support, wait for aka.ms support

* refix Makefile for colorful echo

* unversion config.yml with machine information

* sync graph.py between tuners & trial of ga_squad

* sync graph.py between tuners & trial of ga_squad

* copy weight shared ga_squad under weight_sharing folder

* mv ga_squad code back to master

* simple tuner & trial ready

* Fix nnictl multiThread option

* weight sharing with async dispatcher simple example ready

* update for ga_squad

* fix bug

* modify multihead attention name

* add min_layer_num to Graph

* fix bug

* update share id calc

* fix bug

* add save logging

* fix ga_squad tuner bug

* sync bug fix for ga_squad tuner

* fix same hash_id bug

* add lock to simple tuner in weight sharing

* Add readme to simple weight sharing

* update

* update

* add paper link

* update

* reformat with autopep8

* add documentation for weight sharing

* test for weight sharing

* delete irrelevant files

* move details of weight sharing in to code comments

* add example section

* Dev weight sharing update (#579)

* add pycharm project files to .gitignore list

* update pylintrc to conform vscode settings

* fix RemoteMachineMode for wrong trainingServicePlatform

* simple weight sharing

* update gitignore file

* change tuner codedir to relative path

* add python cache files to gitignore list

* move extract scalar reward logic from dispatcher to tuner

* update tuner code corresponding to last commit

* update doc for receive_trial_result api change

* add numpy to package whitelist of pylint

* distinguish param value from return reward for tuner.extract_scalar_reward

* update pylintrc

* add comments to dispatcher.handle_report_metric_data

* update install for mac support

* fix root mode bug on Makefile

* Quick fix bug: nnictl port value error (#245)

* fix port bug

* Dev exp stop more (#221)

* Exp stop refactor (#161)

* Update RemoteMachineMode.md (#63)

* Remove unused classes for SQuAD QA example.

* Remove more unused functions for SQuAD QA example.

* Fix default dataset config.

* Add Makefile README (#64)

* update document (#92)

* Edit readme.md

* updated a word

* Update GetStarted.md

* Update GetStarted.md

* refact readme, getstarted and write your trial md.

* Update README.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Fix nnictl bugs and add new feature (#75)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* remove Buffer warning (#100)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* Add support for debugging mode

* fix setup.py (#115)

* Add DAG model configuration format for SQuAD example.

* Explain config format for SQuAD QA model.

* Add more detailed introduction about the evolution algorithm.

* Fix install.sh add add trial log path (#109)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* show trial log path

* update document

* fix install.sh

* set default vallue for maxTrialNum and maxExecDuration

* fix nnictl

* Dev smac (#116)

* support package install (#91)

* fix nnictl bug

* support package install

* update

* update package install logic

* Fix package install issue (#95)

* fix nnictl bug

* fix pakcage install

* support SMAC as a tuner on nni (#81)

* update doc

* update doc

* update doc

* update hyperopt installation

* update doc

* update doc

* update description in setup.py

* update setup.py

* modify encoding

* encoding

* add encoding

* remove pymc3

* update doc

* update builtin tuner spec

* support smac in sdk, fix logging issue

* support smac tuner

* add optimize_mode

* update config in nnictl

* add __init__.py

* update smac

* update import path

* update setup.py: remove entry_point

* update rest server validation

* fix bug in nnictl launcher

* support classArgs: optimize_mode

* quick fix bug

* test travis

* add dependency

* add dependency

* add dependency

* add dependency

* create smac python package

* fix trivial points

* optimize import of tuners, modify nnictl accordingly

* fix bug: incorrect algorithm_name

* trivial refactor

* for debug

* support virtual

* update doc of SMAC

* update smac requirements

* update requirements

* change debug mode

* update doc

* update doc

* refactor based on comments

* fix comments

* modify example config path to relative path and increase maxTrialNum (#94)

* modify example config path to relative path and increase maxTrialNum

* add document

* support conda (#90) (#110)

* support install from venv and travis CI

* support install from venv and travis CI

* support install from venv and travis CI

* support conda

* support conda

* modify example config path to relative path and increase maxTrialNum

* undo messy commit

* undo messy commit

* Support pip install as root (#77)

* Typo on #58 (#122)

* PAI Training Service implementation (#128)

* PAI Training service implementation
**1. Implement PAITrainingService
**2. Add trial-keeper python module, and modify setup.py to install the module
**3. Add PAItrainingService rest server to collect metrics from PAI container.

* fix datastore for multiple final result (#129)

* Update NNI v0.2 release notes (#132)

Update NNI v0.2 release notes

* Update setup.py Makefile and documents (#130)

* update makefile and setup.py

* update makefile and setup.py

* update document

* update document

* Update Makefile no travis

*  update doc

*  update doc

* fix convert from ss to pcs (#133)

* Fix bugs about webui (#131)

* Fix webui bugs

* Fix tslint

* webui logpath and document (#135)

* Add webui document and logpath as a href

* fix tslint

* fix comments by Chengmin

* Pai training service bug fix and enhancement (#136)

* Add NNI installation scripts

* Update pai script, update NNI_out_dir

* Update NNI dir in nni sdk local.py

* Create .nni folder in nni sdk local.py

* Add check before creating .nni folder

* Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT

* Improve annotation (#138)

* Improve annotation

* Minor bugfix

* Selectively install through pip (#139)

Selectively install through pip 
* update setup.py

* fix paiTrainingService bugs (#137)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* Add documentation for NNI PAI mode experiment (#141)

* Add documentation for NNI PAI mode

* Fix typo based on PR comments

* Exit with subprocess return code of trial keeper

* Remove additional exit code

* Fix typo based on PR comments

* update doc for smac tuner (#140)

* Revert "Selectively install through pip (#139)" due to potential pip install issue (#142)

* Revert "Selectively install through pip (#139)"

This reverts commit 1d174836.

* Add exit code of subprocess for trial_keeper

* Update README, add link to PAImode doc

* Merge branch V0.2 to Master (#143)

* webui logpath and document (#135)

* Add webui document and logpath as a href

* fix tslint

* fix comments by Chengmin

* Pai training service bug fix and enhancement (#136)

* Add NNI installation scripts

* Update pai script, update NNI_out_dir

* Update NNI dir in nni sdk local.py

* Create .nni folder in nni sdk local.py

* Add check before creating .nni folder

* Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT

* Improve annotation (#138)

* Improve annotation

* Minor bugfix

* Selectively install through pip (#139)

Selectively install through pip 
* update setup.py

* fix paiTrainingService bugs (#137)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* Add documentation for NNI PAI mode experiment (#141)

* Add documentation for NNI PAI mode

* Fix typo based on PR comments

* Exit with subprocess return code of trial keeper

* Remove additional exit code

* Fix typo based on PR comments

* update doc for smac tuner (#140)

* Revert "Selectively install through pip (#139)" due to potential pip install issue (#142)

* Revert "Selectively install through pip (#139)"

This reverts commit 1d174836.

* Add exit code of subprocess for trial_keeper

* Update README, add link to PAImode doc

* fix bug (#147)

* Refactor nnictl and add config_pai.yml (#144)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* add config_pai.yml

* refactor nnictl create logic and add colorful print

* fix nnictl stop logic

* add annotation for config_pai.yml

* add document for start experiment

* fix config.yml

* fix document

* Fix trial keeper wrongly exit issue (#152)

* Fix trial keeper bug, use actual exitcode to exit rather than 1

* Fix bug of table sort (#145)

* Update doc for PAIMode and v0.2 release notes (#153)

* Update v0.2 documentation regards to release note and PAI training service

* Update document to describe NNI docker image

* fix antd (#159)

* refactor experiment stopping logic

* support change concurrency

* remove trialJobs.ts

* trivial changes

* fix bugs

* fix bug

* support updating maxTrialNum

* Modify IT scripts for supporting multiple experiments

* Update ci (#175)

* Update RemoteMachineMode.md (#63)

* Remove unused classes for SQuAD QA example.

* Remove more unused functions for SQuAD QA example.

* Fix default dataset config.

* Add Makefile README (#64)

* update document (#92)

* Edit readme.md

* updated a word

* Update GetStarted.md

* Update GetStarted.md

* refact readme, getstarted and write your trial md.

* Update README.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Fix nnictl bugs and add new feature (#75)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* remove Buffer warning (#100)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* Add support for debugging mode

* modify CI cuz of refracting exp stop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* file saving

* fix issues from code merge

* remove $(INSTALL_PREFIX)/nni/nni_manager before install

* fix indent

* fix merge issue

* socket close

* update port

* fix merge error

* modify ci logic in nnimanager

* fix ci

* fix bug

* change suspended to done

* update ci (#229)

* update ci

* update ci

* update ci (#232)

* update ci

* update ci

* update azure-pipelines

* update azure-pipelines

* update ci (#233)

* update ci

* update ci

* update azure-pipelines

* update azure-pipelines

* update azure-pipelines

* run.py (#238)

* Nnupdate ci (#239)

* run.py

* test ci

* Nnupdate ci (#240)

* run.py

* test ci

* test ci

* Udci (#241)

* run.py

* test ci

* test ci

* test ci

* update ci (#242)

* run.py

* test ci

* test ci

* test ci

* update ci

* revert install.sh (#244)

* run.py

* test ci

* test ci

* test ci

* update ci

* revert install.sh

* add comments

* remove assert

* trivial change

* trivial change

* update Makefile (#246)

* update Makefile

* update Makefile

* quick fix for ci (#248)

* add update trialNum and fix bugs (#261)

* Add builtin tuner to CI (#247)

* update Makefile

* update Makefile

* add builtin-tuner test

* add builtin-tuner test

* refractor ci

* update azure.yml

* add built-in tuner test

* fix bugs

* Doc refactor (#258)

* doc refactor

* image name refactor

* Refactor nnictl to support listing stopped experiments. (#256)

Refactor nnictl to support listing stopped experiments.

* Show experiment parameters more beautifully (#262)

* fix error on example of RemoteMachineMode (#269)

* add pycharm project files to .gitignore list

* update pylintrc to conform vscode settings

* fix RemoteMachineMode for wrong trainingServicePlatform

* Update docker file to use latest nni release (#263)

* fix bug about execDuration and endTime (#270)

* fix bug about execDuration and endTime

* modify time interval to 30 seconds

* refactor based on Gems's suggestion

* for triggering ci

* Refactor dockerfile (#264)

* refactor Dockerfile

* Support nnictl tensorboard (#268)

support tensorboard

* Sdk update (#272)

* Rename get_parameters to get_next_parameter

* annotations add get_next_parameter

* updates

* updates

* updates

* updates

* updates

* add experiment log path to experiment profile (#276)

* refactor extract reward from dict by tuner

* update Makefile for mac support, wait for aka.ms support

* refix Makefile for colorful echo

* unversion config.yml with machine information

* sync graph.py between tuners & trial of ga_squad

* sync graph.py between tuners & trial of ga_squad

* copy weight shared ga_squad under weight_sharing folder

* mv ga_squad code back to master

* simple tuner & trial ready

* Fix nnictl multiThread option

* weight sharing with async dispatcher simple example ready

* update for ga_squad

* fix bug

* modify multihead attention name

* add min_layer_num to Graph

* fix bug

* update share id calc

* fix bug

* add save logging

* fix ga_squad tuner bug

* sync bug fix for ga_squad tuner

* fix same hash_id bug

* add lock to simple tuner in weight sharing

* Add readme to simple weight sharing

* update

* update

* add paper link

* update

* reformat with autopep8

* add documentation for weight sharing

* test for weight sharing

* delete irrelevant files

* move details of weight sharing in to code comments

* add example section

* update weight sharing tutorial

* Dev weight sharing (#581)

* add pycharm project files to .gitignore list

* update pylintrc to conform vscode settings

* fix RemoteMachineMode for wrong trainingServicePlatform

* simple weight sharing

* update gitignore file

* change tuner codedir to relative path

* add python cache files to gitignore list

* move extract scalar reward logic from dispatcher to tuner

* update tuner code corresponding to last commit

* update doc for receive_trial_result api change

* add numpy to package whitelist of pylint

* distinguish param value from return reward for tuner.extract_scalar_reward

* update pylintrc

* add comments to dispatcher.handle_report_metric_data

* update install for mac support

* fix root mode bug on Makefile

* Quick fix bug: nnictl port value error (#245)

* fix port bug

* Dev exp stop more (#221)

* Exp stop refactor (#161)

* Update RemoteMachineMode.md (#63)

* Remove unused classes for SQuAD QA example.

* Remove more unused functions for SQuAD QA example.

* Fix default dataset config.

* Add Makefile README (#64)

* update document (#92)

* Edit readme.md

* updated a word

* Update GetStarted.md

* Update GetStarted.md

* refact readme, getstarted and write your trial md.

* Update README.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Fix nnictl bugs and add new feature (#75)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* remove Buffer warning (#100)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* Add support for debugging mode

* fix setup.py (#115)

* Add DAG model configuration format for SQuAD example.

* Explain config format for SQuAD QA model.

* Add more detailed introduction about the evolution algorithm.

* Fix install.sh add add trial log path (#109)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* show trial log path

* update document

* fix install.sh

* set default vallue for maxTrialNum and maxExecDuration

* fix nnictl

* Dev smac (#116)

* support package install (#91)

* fix nnictl bug

* support package install

* update

* update package install logic

* Fix package install issue (#95)

* fix nnictl bug

* fix pakcage install

* support SMAC as a tuner on nni (#81)

* update doc

* update doc

* update doc

* update hyperopt installation

* update doc

* update doc

* update description in setup.py

* update setup.py

* modify encoding

* encoding

* add encoding

* remove pymc3

* update doc

* update builtin tuner spec

* support smac in sdk, fix logging issue

* support smac tuner

* add optimize_mode

* update config in nnictl

* add __init__.py

* update smac

* update import path

* update setup.py: remove entry_point

* update rest server validation

* fix bug in nnictl launcher

* support classArgs: optimize_mode

* quick fix bug

* test travis

* add dependency

* add dependency

* add dependency

* add dependency

* create smac python package

* fix trivial points

* optimize import of tuners, modify nnictl accordingly

* fix bug: incorrect algorithm_name

* trivial refactor

* for debug

* support virtual

* update doc of SMAC

* update smac requirements

* update requirements

* change debug mode

* update doc

* update doc

* refactor based on comments

* fix comments

* modify example config path to relative path and increase maxTrialNum (#94)

* modify example config path to relative path and increase maxTrialNum

* add document

* support conda (#90) (#110)

* support install from venv and travis CI

* support install from venv and travis CI

* support install from venv and travis CI

* support conda

* support conda

* modify example config path to relative path and increase maxTrialNum

* undo messy commit

* undo messy commit

* Support pip install as root (#77)

* Typo on #58 (#122)

* PAI Training Service implementation (#128)

* PAI Training service implementation
**1. Implement PAITrainingService
**2. Add trial-keeper python module, and modify setup.py to install the module
**3. Add PAItrainingService rest server to collect metrics from PAI container.

* fix datastore for multiple final result (#129)

* Update NNI v0.2 release notes (#132)

Update NNI v0.2 release notes

* Update setup.py Makefile and documents (#130)

* update makefile and setup.py

* update makefile and setup.py

* update document

* update document

* Update Makefile no travis

*  update doc

*  update doc

* fix convert from ss to pcs (#133)

* Fix bugs about webui (#131)

* Fix webui bugs

* Fix tslint

* webui logpath and document (#135)

* Add webui document and logpath as a href

* fix tslint

* fix comments by Chengmin

* Pai training service bug fix and enhancement (#136)

* Add NNI installation scripts

* Update pai script, update NNI_out_dir

* Update NNI dir in nni sdk local.py

* Create .nni folder in nni sdk local.py

* Add check before creating .nni folder

* Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT

* Improve annotation (#138)

* Improve annotation

* Minor bugfix

* Selectively install through pip (#139)

Selectively install through pip 
* update setup.py

* fix paiTrainingService bugs (#137)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* Add documentation for NNI PAI mode experiment (#141)

* Add documentation for NNI PAI mode

* Fix typo based on PR comments

* Exit with subprocess return code of trial keeper

* Remove additional exit code

* Fix typo based on PR comments

* update doc for smac tuner (#140)

* Revert "Selectively install through pip (#139)" due to potential pip install issue (#142)

* Revert "Selectively install through pip (#139)"

This reverts commit 1d174836.

* Add exit code of subprocess for trial_keeper

* Update README, add link to PAImode doc

* Merge branch V0.2 to Master (#143)

* webui logpath and document (#135)

* Add webui document and logpath as a href

* fix tslint

* fix comments by Chengmin

* Pai training service bug fix and enhancement (#136)

* Add NNI installation scripts

* Update pai script, update NNI_out_dir

* Update NNI dir in nni sdk local.py

* Create .nni folder in nni sdk local.py

* Add check before creating .nni folder

* Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT

* Improve annotation (#138)

* Improve annotation

* Minor bugfix

* Selectively install through pip (#139)

Selectively install through pip 
* update setup.py

* fix paiTrainingService bugs (#137)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* Add documentation for NNI PAI mode experiment (#141)

* Add documentation for NNI PAI mode

* Fix typo based on PR comments

* Exit with subprocess return code of trial keeper

* Remove additional exit code

* Fix typo based on PR comments

* update doc for smac tuner (#140)

* Revert "Selectively install through pip (#139)" due to potential pip install issue (#142)

* Revert "Selectively install through pip (#139)"

This reverts commit 1d174836.

* Add exit code of subprocess for trial_keeper

* Update README, add link to PAImode doc

* fix bug (#147)

* Refactor nnictl and add config_pai.yml (#144)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* add config_pai.yml

* refactor nnictl create logic and add colorful print

* fix nnictl stop logic

* add annotation for config_pai.yml

* add document for start experiment

* fix config.yml

* fix document

* Fix trial keeper wrongly exit issue (#152)

* Fix trial keeper bug, use actual exitcode to exit rather than 1

* Fix bug of table sort (#145)

* Update doc for PAIMode and v0.2 release notes (#153)

* Update v0.2 documentation regards to release note and PAI training service

* Update document to describe NNI docker image

* fix antd (#159)

* refactor experiment stopping logic

* support change concurrency

* remove trialJobs.ts

* trivial changes

* fix bugs

* fix bug

* support updating maxTrialNum

* Modify IT scripts for supporting multiple experiments

* Update ci (#175)

* Update RemoteMachineMode.md (#63)

* Remove unused classes for SQuAD QA example.

* Remove more unused functions for SQuAD QA example.

* Fix default dataset config.

* Add Makefile README (#64)

* update document (#92)

* Edit readme.md

* updated a word

* Update GetStarted.md

* Update GetStarted.md

* refact readme, getstarted and write your trial md.

* Update README.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Fix nnictl bugs and add new feature (#75)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* remove Buffer warning (#100)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* Add support for debugging mode

* modify CI cuz of refracting exp stop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* update CI for expstop

* file saving

* fix issues from code merge

* remove $(INSTALL_PREFIX)/nni/nni_manager before install

* fix indent

* fix merge issue

* socket close

* update port

* fix merge error

* modify ci logic in nnimanager

* fix ci

* fix bug

* change suspended to done

* update ci (#229)

* update ci

* update ci

* update ci (#232)

* update ci

* update ci

* update azure-pipelines

* update azure-pipelines

* update ci (#233)

* update ci

* update ci

* update azure-pipelines

* update azure-pipelines

* update azure-pipelines

* run.py (#238)

* Nnupdate ci (#239)

* run.py

* test ci

* Nnupdate ci (#240)

* run.py

* test ci

* test ci

* Udci (#241)

* run.py

* test ci

* test ci

* test ci

* update ci (#242)

* run.py

* test ci

* test ci

* test ci

* update ci

* revert install.sh (#244)

* run.py

* test ci

* test ci

* test ci

* update ci

* revert install.sh

* add comments

* remove assert

* trivial change

* trivial change

* update Makefile (#246)

* update Makefile

* update Makefile

* quick fix for ci (#248)

* add update trialNum and fix bugs (#261)

* Add builtin tuner to CI (#247)

* update Makefile

* update Makefile

* add builtin-tuner test

* add builtin-tuner test

* refractor ci

* update azure.yml

* add built-in tuner test

* fix bugs

* Doc refactor (#258)

* doc refactor

* image name refactor

* Refactor nnictl to support listing stopped experiments. (#256)

Refactor nnictl to support listing stopped experiments.

* Show experiment parameters more beautifully (#262)

* fix error on example of RemoteMachineMode (#269)

* add pycharm project files to .gitignore list

* update pylintrc to conform vscode settings

* fix RemoteMachineMode for wrong trainingServicePlatform

* Update docker file to use latest nni release (#263)

* fix bug about execDuration and endTime (#270)

* fix bug about execDuration and endTime

* modify time interval to 30 seconds

* refactor based on Gems's suggestion

* for triggering ci

* Refactor dockerfile (#264)

* refactor Dockerfile

* Support nnictl tensorboard (#268)

support tensorboard

* Sdk update (#272)

* Rename get_parameters to get_next_parameter

* annotations add get_next_parameter

* updates

* updates

* updates

* updates

* updates

* add experiment log path to experiment profile (#276)

* refactor extract reward from dict by tuner

* update Makefile for mac support, wait for aka.ms support

* refix Makefile for colorful echo

* unversion config.yml with machine information

* sync graph.py between tuners & trial of ga_squad

* sync graph.py between tuners & trial of ga_squad

* copy weight shared ga_squad under weight_sharing folder

* mv ga_squad code back to master

* simple tuner & trial ready

* Fix nnictl multiThread option

* weight sharing with async dispatcher simple example ready

* update for ga_squad

* fix bug

* modify multihead attention name

* add min_layer_num to Graph

* fix bug

* update share id calc

* fix bug

* add save logging

* fix ga_squad tuner bug

* sync bug fix for ga_squad tuner

* fix same hash_id bug

* add lock to simple tuner in weight sharing

* Add readme to simple weight sharing

* update

* update

* add paper link

* update

* reformat with autopep8

* add documentation for weight sharing

* test for weight sharing

* delete irrelevant files

* move details of weight sharing in to code comments

* add example section

* update weight sharing tutorial

* fix divide by zero risk

* update tuner thread exception handling

* fix bug for async test
parent e6eb6eab
...@@ -97,6 +97,7 @@ class MsgDispatcher(MsgDispatcherBase): ...@@ -97,6 +97,7 @@ class MsgDispatcher(MsgDispatcherBase):
def handle_request_trial_jobs(self, data): def handle_request_trial_jobs(self, data):
# data: number or trial jobs # data: number or trial jobs
ids = [_create_parameter_id() for _ in range(data)] ids = [_create_parameter_id() for _ in range(data)]
_logger.debug("requesting for generating params of {}".format(ids))
params_list = self.tuner.generate_multiple_parameters(ids) params_list = self.tuner.generate_multiple_parameters(ids)
for i, _ in enumerate(params_list): for i, _ in enumerate(params_list):
......
...@@ -19,10 +19,14 @@ ...@@ -19,10 +19,14 @@
# ================================================================================================== # ==================================================================================================
#import json_tricks #import json_tricks
import os
import logging import logging
import json_tricks import os
from queue import Queue
import sys
from multiprocessing.dummy import Pool as ThreadPool from multiprocessing.dummy import Pool as ThreadPool
import json_tricks
from .common import init_logger, multi_thread_enabled from .common import init_logger, multi_thread_enabled
from .recoverable import Recoverable from .recoverable import Recoverable
from .protocol import CommandType, receive from .protocol import CommandType, receive
...@@ -34,6 +38,7 @@ class MsgDispatcherBase(Recoverable): ...@@ -34,6 +38,7 @@ class MsgDispatcherBase(Recoverable):
def __init__(self): def __init__(self):
if multi_thread_enabled(): if multi_thread_enabled():
self.pool = ThreadPool() self.pool = ThreadPool()
self.thread_results = []
def run(self): def run(self):
"""Run the tuner. """Run the tuner.
...@@ -49,7 +54,11 @@ class MsgDispatcherBase(Recoverable): ...@@ -49,7 +54,11 @@ class MsgDispatcherBase(Recoverable):
if command is None or command is CommandType.Terminate: if command is None or command is CommandType.Terminate:
break break
if multi_thread_enabled(): if multi_thread_enabled():
self.pool.map_async(self.handle_request, [(command, data)]) result = self.pool.map_async(self.handle_request_thread, [(command, data)])
self.thread_results.append(result)
if any([thread_result.ready() and not thread_result.successful() for thread_result in self.thread_results]):
_logger.debug('Caught thread exception')
break
else: else:
self.handle_request((command, data)) self.handle_request((command, data))
...@@ -59,6 +68,16 @@ class MsgDispatcherBase(Recoverable): ...@@ -59,6 +68,16 @@ class MsgDispatcherBase(Recoverable):
_logger.info('Terminated by NNI manager') _logger.info('Terminated by NNI manager')
def handle_request_thread(self, request):
if multi_thread_enabled():
try:
self.handle_request(request)
except Exception as e:
_logger.exception(str(e))
raise
else:
pass
def handle_request(self, request): def handle_request(self, request):
command, data = request command, data = request
......
...@@ -48,6 +48,7 @@ class Tuner(Recoverable): ...@@ -48,6 +48,7 @@ class Tuner(Recoverable):
result = [] result = []
for parameter_id in parameter_id_list: for parameter_id in parameter_id_list:
try: try:
_logger.debug("generating param for {}".format(parameter_id))
res = self.generate_parameters(parameter_id) res = self.generate_parameters(parameter_id)
except nni.NoMoreTrialError: except nni.NoMoreTrialError:
return result return result
......
authorName: default
experimentName: example_weight_sharing
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
#choice: true, false
useAnnotation: false
multiThread: true
tuner:
codeDir: .
classFileName: simple_tuner.py
className: SimpleTuner
trial:
command: python3 main.py
codeDir: .
gpuNum: 0
machineList:
- ip: 10.10.10.10
username: bob
passwd: bob123
- ip: 10.10.10.11
username: bob
passwd: bob123
"""
Test code for weight sharing
need NFS setup and mounted as `/mnt/nfs/nni`
"""
import hashlib
import os
import random
import time
import nni
def generate_rand_file(fl_name):
"""
generate random file and write to `fl_name`
"""
fl_size = random.randint(1024, 102400)
fl_dir = os.path.split(fl_name)[0]
if not os.path.exists(fl_dir):
os.makedirs(fl_dir)
with open(fl_name, 'wb') as fout:
fout.write(os.urandom(fl_size))
def check_sum(fl_name, tid=None):
"""
compute checksum for generated file of `fl_name`
"""
hasher = hashlib.md5()
with open(fl_name, 'rb') as fin:
for chunk in iter(lambda: fin.read(4096), b""):
hasher.update(chunk)
ret = hasher.hexdigest()
if tid is not None:
ret = ret + str(tid)
return ret
if __name__ == '__main__':
nfs_path = '/mnt/nfs/nni/test'
params = nni.get_next_parameter()
print(params)
if params['id'] == 0:
model_file = os.path.join(nfs_path, str(params['id']), 'model.dat')
generate_rand_file(model_file)
time.sleep(10)
nni.report_final_result({
'checksum': check_sum(model_file, tid=params['id']),
'path': model_file
})
else:
model_file = params['prev_path']
time.sleep(10)
nni.report_final_result({
'checksum': check_sum(model_file, tid=params['prev_id'])
})
"""
SimpleTuner for Weight Sharing
"""
import logging
from threading import Event, Lock
from nni.tuner import Tuner
_logger = logging.getLogger('WeightSharingTuner')
class SimpleTuner(Tuner):
"""
simple tuner, test for weight sharing
"""
def __init__(self):
super(SimpleTuner, self).__init__()
self.trial_meta = {}
self.f_id = None # father
self.sig_event = Event()
self.thread_lock = Lock()
def generate_parameters(self, parameter_id):
if self.f_id is None:
self.thread_lock.acquire()
self.f_id = parameter_id
self.trial_meta[parameter_id] = {
'prev_id': 0,
'id': parameter_id,
'checksum': None,
'path': '',
}
_logger.info('generate parameter for father trial %s' %
parameter_id)
self.thread_lock.release()
return {
'prev_id': 0,
'id': parameter_id,
}
else:
self.sig_event.wait()
self.thread_lock.acquire()
self.trial_meta[parameter_id] = {
'id': parameter_id,
'prev_id': self.f_id,
'prev_path': self.trial_meta[self.f_id]['path']
}
self.thread_lock.release()
return self.trial_meta[parameter_id]
def receive_trial_result(self, parameter_id, parameters, reward):
self.thread_lock.acquire()
if parameter_id == self.f_id:
self.trial_meta[parameter_id]['checksum'] = reward['checksum']
self.trial_meta[parameter_id]['path'] = reward['path']
self.sig_event.set()
else:
if reward['checksum'] != self.trial_meta[self.f_id]['checksum']:
raise ValueError("Inconsistency in weight sharing: {} != {}".format(
reward['checksum'], self.trial_meta[self.f_id]['checksum']))
self.thread_lock.release()
def update_search_space(self, search_space):
pass
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment