Multi-phase training service (#148) (39085789) · Commits · OpenDAS / nni

Commit 39085789 authored Oct 08, 2018 by chicm-ms, committed by GitHub Oct 08, 2018

Multi-phase training service (#148)

* Dev enas  - multi-phase hyper parameters support (#96)

* Multi-phase support

* Updates

* Updates

* updates

* updates

* updates

* Merge master to dev-enas (#117)

* Multi-phase support

* update document (#92)

* Edit readme.md

* updated a word

* Update GetStarted.md

* Update GetStarted.md

* refact readme, getstarted and write your trial md.

* Update README.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Fix nnictl bugs and add new feature (#75)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* Updates

* remove Buffer warning (#100)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* Updates

* updates

* updates

* updates

* Add support for debugging mode

* fix setup.py (#115)

* Add DAG model configuration format for SQuAD example.

* Explain config format for SQuAD QA model.

* Add more detailed introduction about the evolution algorithm.

* Merge master to dev-enas (#118)

* update document (#92)

* Edit readme.md

* updated a word

* Update GetStarted.md

* Update GetStarted.md

* refact readme, getstarted and write your trial md.

* Update README.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Update WriteYourTrial.md

* Fix nnictl bugs and add new feature (#75)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* remove Buffer warning (#100)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* Add support for debugging mode

* fix setup.py (#115)

* Add DAG model configuration format for SQuAD example.

* Explain config format for SQuAD QA model.

* Add more detailed introduction about the evolution algorithm.

* Fix install.sh and add trial log path (#109)

* fix nnictl bug

* fix nnictl create bug

* add experiment status logic

* add more information for nnictl

* fix Evolution Tuner bug

* refactor code

* fix code in updater.py

* fix nnictl --help

* fix classArgs bug

* update check response.status_code logic

* show trial log path

* update document

* fix install.sh

* set default value for maxTrialNum and maxExecDuration

* fix nnictl

* support multiPhase (#127)

* fix nnictl bug

* support multiPhase

* Fix multiphase datastore problem (#125)

* Fix multiphase datastore problem

* updates

* updates

* updates

* updates

* Pull latest code (#2)

* webui logpath and document (#135)

* Add webui document and logpath as a href

* fix tslint

* fix comments by Chengmin

* Pai training service bug fix and enhancement (#136)

* Add NNI installation scripts

* Update pai script, update NNI_out_dir

* Update NNI dir in nni sdk local.py

* Create .nni folder in nni sdk local.py

* Add check before creating .nni folder

* Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT

* Improve annotation (#138)

* Improve annotation

* Minor bugfix

* Selectively install through pip (#139)

* update setup.py

* fix paiTrainingService bugs (#137)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* Add documentation for NNI PAI mode experiment (#141)

* Add documentation for NNI PAI mode

* Fix typo based on PR comments

* Exit with subprocess return code of trial keeper

* Remove additional exit code

* Fix typo based on PR comments

* update doc for smac tuner (#140)

* Revert "Selectively install through pip (#139)" due to potential pip install issue (#142)

* Revert "Selectively install through pip (#139)"

This reverts commit 1d174836.

* Add exit code of subprocess for trial_keeper

* Update README, add link to PAImode doc

* fix bug (#147)

* Refactor nnictl and add config_pai.yml (#144)

* fix nnictl bug

* add hdfs host validation

* fix bugs

* fix dockerfile

* fix install.sh

* update install.sh

* fix dockerfile

* Set timeout for HDFSUtility exists function

* remove unused TODO

* fix sdk

* add optional for outputDir and dataDir

* refactor dockerfile.base

* Remove unused import in hdfsclientUtility

* add config_pai.yml

* refactor nnictl create logic and add colorful print

* fix nnictl stop logic

* add annotation for config_pai.yml

* add document for start experiment

* fix config.yml

* fix document

* Fix trial keeper wrongly exit issue (#152)

* Fix trial keeper bug, use actual exitcode to exit rather than 1

* Fix bug of table sort (#145)

* Update doc for PAIMode and v0.2 release notes (#153)

* Update v0.2 documentation regards to release note and PAI training service

* Update document to describe NNI docker image

* Bug fix for SQuAD example tuner. (#134)

* Update Makefile (#151)

* test

* update setup.py

* update Makefile and install.sh

* revert setup.py

* change color

* update doc

* update doc

* fix auto-completion's extra space

* update Makefile

* update webui

* Update doc image (#163)

* update doc

* trivial

* trivial

* trivial

* trivial

* trivial

* trivial

* update image

* update image size

* Update ga squad (#104)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* update readme

* sklearn examples (#169)

* fix nnictl bug

* fix install.sh

* add sklearn-regression example

* add sklearn classification

* update sklearn

* update example

* remove additional code

* Update batch tuner (#158)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* update readme

* update batch tuner

* Quickly fix cascading search space bug in tuner (#156)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* update readme

* quickly fix cascading searchspace bug in tuner

* Add iterative search space example (#119)

* update readme in ga_squad

* update readme

* fix typo

* Update README.md

* Update README.md

* Update README.md

* update readme

* add iterative search space example

* update

* update readme

* change name

* updates

* updates

* Updates CI

* updates

parent 6ef65117


@@ -54,7 +54,7 @@ def run():
         if trial > current_trial:
             current_trial = trial
             print('Trial #%d done' % trial)
+    subprocess.run(['nnictl', 'log', 'stderr'])
     assert tuner_status == 'DONE' and assessor_status == 'DONE', 'Failed to finish in 1 min'
 
     ss1 = json.load(open('search_space.json'))


@@ -29,6 +29,7 @@ Optional('maxExecDuration'): Regex(r'^[1-9][0-9]*[s|m|h|d]$'),
 Optional('maxTrialNum'): And(int, lambda x: 1 <= x <= 99999),
 'trainingServicePlatform': And(str, lambda x: x in ['remote', 'local', 'pai']),
 Optional('searchSpacePath'): os.path.exists,
+Optional('multiPhase'): bool,
 'useAnnotation': bool,
 'tuner': Or({
     'builtinTunerName': Or('TPE', 'Random', 'Anneal', 'Evolution', 'SMAC', 'BatchTuner'),
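
Review note: the added `Optional('multiPhase'): bool` entry means the key may be left out of the experiment config, but when present it has to be a real boolean. The sketch below mimics that rule with the standard library only. `validate_multi_phase` is a hypothetical helper written for illustration and is not part of nnictl, which relies on its schema definition for this check.

```python
def validate_multi_phase(config: dict) -> dict:
    """Hypothetical stand-in for the schema rule Optional('multiPhase'): bool."""
    if 'multiPhase' in config and not isinstance(config['multiPhase'], bool):
        raise ValueError(
            "'multiPhase' must be true or false, got %r" % (config['multiPhase'],)
        )
    return config


# The key may be omitted entirely ...
validate_multi_phase({'useAnnotation': False})
# ... or supplied as a boolean.
validate_multi_phase({'useAnnotation': False, 'multiPhase': True})
```

Under this rule a quoted value such as `'true'` is rejected, so the config file should carry an unquoted boolean.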


@@ -114,6 +114,8 @@ def set_pai_config(experiment_config, port):
     if not response or not response.status_code == 200:
         if response is not None:
             err_message = response.text
+            with open(STDERR_FULL_PATH, 'a+') as fout:
+                fout.write(json.dumps(json.loads(err_message), indent=4, sort_keys=True, separators=(',', ':')))
         return False, err_message
 
     #set trial_config
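
Review note: the two added lines append the REST error body to the stderr log as indented JSON with sorted keys. The sketch below shows what that `json.dumps` call produces. `format_error_body` is a hypothetical helper, and the fallback for non-JSON bodies is an assumption added here; the commit calls `json.loads` directly, which raises if the body is not valid JSON.

```python
import json


def format_error_body(err_message: str) -> str:
    """Pretty-print a JSON error body (hypothetical helper, not from the commit)."""
    try:
        parsed = json.loads(err_message)
    except ValueError:
        # Assumption: a non-JSON body is logged as-is instead of raising.
        return err_message
    # Same formatting arguments as the added line in the hunk above.
    return json.dumps(parsed, indent=4, sort_keys=True, separators=(',', ':'))


print(format_error_body('{"error": "bad config", "code": 400}'))
```

With `separators=(',', ':')` there is no space after the colon, so each key and value print as `"code":400` on their own indented line.
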
@@ -128,6 +130,8 @@ def set_experiment(experiment_config, mode, port):
     request_data['maxExecDuration'] = experiment_config['maxExecDuration']
     request_data['maxTrialNum'] = experiment_config['maxTrialNum']
     request_data['searchSpace'] = experiment_config.get('searchSpace')
+    if experiment_config.get('multiPhase'):
+        request_data['multiPhase'] = experiment_config.get('multiPhase')
     request_data['tuner'] = experiment_config['tuner']
     if 'assessor' in experiment_config:
         request_data['assessor'] = experiment_config['assessor']
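
Review note: in `set_experiment`, `multiPhase` is copied into the request only when it is truthy, so a config that omits the key and one that sets it to `false` send the same payload. A minimal sketch of that behavior follows. `build_request_data` is a hypothetical, trimmed-down stand-in for the real function and leaves out the HTTP call and the remaining fields.

```python
def build_request_data(experiment_config: dict) -> dict:
    """Hypothetical, reduced version of the payload assembly in set_experiment."""
    request_data = {
        'maxExecDuration': experiment_config['maxExecDuration'],
        'maxTrialNum': experiment_config['maxTrialNum'],
        'searchSpace': experiment_config.get('searchSpace'),
    }
    # Forward the flag only when it is set and true, as in the hunk above.
    if experiment_config.get('multiPhase'):
        request_data['multiPhase'] = experiment_config.get('multiPhase')
    request_data['tuner'] = experiment_config['tuner']
    return request_data


base = {
    'maxExecDuration': '1h',
    'maxTrialNum': 10,
    'tuner': {'builtinTunerName': 'TPE'},
}
with_flag = build_request_data(dict(base, multiPhase=True))
without_flag = build_request_data(dict(base, multiPhase=False))
```

Here `with_flag` contains a `multiPhase` entry and `without_flag` does not, which suggests the receiving side needs to treat a missing key as "multi-phase disabled".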
