Commits · 4c11b84b1c7360aecff7c4a679d7e05076ffc19d · ModelZoo / ResNet50_tensorflow

28 Mar, 2019 3 commits

Added benchmark test and convergence test for the NCF model (#6318) · 4c11b84b

Shining Sun authored Mar 28, 2019

* initial commit

* bug fix

* Move build_stats from common to keras main, because it is only applicable in keras

* remove trailing blank line

* add test for synth data

* add kwargs to init

* add kwargs to function invocation

* correctly pass kwargs

* debug

* debug

* debug

* fix super init

* bug fix

* fix local_flags

* fix import

* bug fix

* fix log_steps flag

* bug fix

* bug fix: add missing return value

* resolve double-defined flags

* lint fix

* move log_steps flag to benchmark flag

* fix lint

* lint fix

* lint fix

* try flag core default values

* bug fix

* bug fix

* bug fix

* debug

* debug

* remove debug prints

* rename benchmark methods

* flag bug fix for synth benchmark

4c11b84b

Add a README for ResNet keras (#6293) · f5bb2af2

Shining Sun authored Mar 28, 2019

* Initial commit

* Finished

* bug fix

* bug fix

* bug fix

* Resolve review comments

* Typo fix

* resolve comments

* fix number error

* Resolve comments

f5bb2af2

Add trivial Keras model (#6460) · b09685fe
Haoyu Zhang authored Mar 27, 2019

b09685fe

27 Mar, 2019 1 commit
- Add Keras ResNet model tests in legacy graph mode (#6444) · 7dbbca9d
  Haoyu Zhang authored Mar 26, 2019
  
  7dbbca9d
26 Mar, 2019 1 commit
- Move distribution strategy creation before creating any ops, which is (#6435) · b3594a83
  Yuefeng Zhou authored Mar 25, 2019
```
required by multi-node collective ops in eager mode.
```
  b3594a83
25 Mar, 2019 1 commit
- Add/Modify tests to track Tensorboard overhead and improve performance of accuracy test. (#6434) · 94f9c2cc
  Haoyu Zhang authored Mar 25, 2019
  
  94f9c2cc
22 Mar, 2019 1 commit
- Disable Tensorboard callback by default (#6424) · 8d5d36e0
  Haoyu Zhang authored Mar 22, 2019
  
  8d5d36e0
20 Mar, 2019 1 commit
- Added thread tuning and tweaked tests to improve Keras model performance (#6396) · 7b5606a5
  Haoyu Zhang authored Mar 19, 2019
  
  7b5606a5
19 Mar, 2019 2 commits
- Add config to enable XLA in TF 2.0 (#6406) · dba24007
  Haoyu Zhang authored Mar 19, 2019
  
  dba24007
- Add the option to run Keras resnet model on multiple workers. (#6368) · 3024bde6
  Soroush Radpour authored Mar 19, 2019
  
  3024bde6
13 Mar, 2019 1 commit
- Add fp16 to 8 gpu fp16 tests. (#6353) · ba0a6f60
  Toby Boyd authored Mar 12, 2019
  
  ba0a6f60
12 Mar, 2019 1 commit
- xla to bs=128 for num_gpu=8 (#6351) · 19daade4
  Toby Boyd authored Mar 12, 2019
```
* xla to bs=128 for num_gpu=8

* remove todo
```
  19daade4
07 Mar, 2019 1 commit
- Add fp16 to keras benchmarks (#6314) · 258b77cc
  Reed authored Mar 06, 2019
  
  258b77cc
06 Mar, 2019 1 commit
- Mixed precision support (#6309) · e4a046e7
  Reed authored Mar 06, 2019
```
* Mixed precision support

* Add TODOs
```
  e4a046e7
02 Mar, 2019 1 commit
- fix resnet breakage and add keras end-to-end tests (#6295) · 8367cf6d
  Taylor Robie authored Mar 02, 2019
```
* fix resnet breakage and add keras end-to-end tests

* delint

* address PR comments
```
  8367cf6d
01 Mar, 2019 2 commits

Keras-fy NCF Model (#6092) · 048e5bff

Shining Sun authored Mar 01, 2019

* tmp commit

* tmp commit

* first attempt (without eval)

* Bug fixes

* bug fixes

* training done

* Loss NAN, no eval

* Loss weight problem solved

* resolve the NAN loss problem

* Problem solved. Clean up needed

* Added a todo

* Remove debug prints

* Extract get_optimizer to ncf_common

* Move metrics computation back to neumf; use DS.scope api

* Extract DS.scope code to utils

* lint fixes

* Move obtaining DS above producer.start to avoid race condition

* move pt 1

* move pt 2

* Update the run script

* Wrap keras_model related code into functions

* Update the doc for softmax_logitfy and change the method name

* Resolve PR comments

* working version with: eager, DS, batch and no masks

* Remove git conflict indicator

* move reshape to neumf_model

* working version, not converge

* converged

* fix a test

* more lint fix

* more lint fix

* more lint fixes

* more lint fix

* Removed unused imports

* fix test

* dummy commit for kicking off checks

* fix lint issue

* dummy input to kick off checks

* dummy input to kick off checks

* add collective to dist strat

* addressed review comments

* add a doc string

048e5bff

Add Keras XLA Tests (#6286) · fa9ed456

Haoyu Zhang authored Mar 01, 2019

* Added XLA test with a monkey-patched op to avoid OOM

* Added doc strings in Keras benchmarks to avoid Lint error

fa9ed456

28 Feb, 2019 1 commit
- Add benchmarks for thread tuning. (#6283) · 54dffe2e
  Yuefeng Zhou authored Feb 28, 2019
```
* Add benchmarks for thread tuning.

* Address comment.

* Add a comment.
```
  54dffe2e
25 Feb, 2019 1 commit
- Add root_data_dir to constructor of Resnet50KerasBenchmarkSynth and... · 338088df
  Dong Lin authored Feb 25, 2019
```
Add root_data_dir to constructor of Resnet50KerasBenchmarkSynth and Resnet50KerasBenchmarkReal (#6259)
```
  338088df
22 Feb, 2019 3 commits
- Set data_dir to cifar-10-batches-bin in keras_cifar_benchmark.py (#6251) · da1d3e60
  Dong Lin authored Feb 22, 2019
  
  da1d3e60
- Add kwargs to make the benchmark class constructor forward compatible. (#6246) · 5f4d34fc
  Dong Lin authored Feb 21, 2019
```
This is needed to avoid breaking benchmark execution if PerfZero provides more
named arguments before the benchmark class constructor is updated.
```
  5f4d34fc
- Allow user to specify root_data_dir in the benchmark class constructor (#6213) · 5c6fa148
  Dong Lin authored Feb 21, 2019
```
* Allow user to specify root_data_dir in the benchmark class constructor

* Address comments
```
  5c6fa148
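Commit #6246 says the benchmark constructors accept extra keyword arguments. This lets PerfZero pass new named arguments before the benchmark class itself is updated. Below is a minimal sketch of that pattern, assuming a plain Python class. `HypotheticalKerasBenchmark` and the `new_flag` argument are illustrative names only. They are not the repository's actual benchmark class or a real PerfZero argument. The `root_data_dir` parameter mirrors #6213, and the `cifar-10-batches-bin` subfolder mirrors #6251.

```python
import os


class HypotheticalKerasBenchmark(object):
  """Illustrative benchmark class (hypothetical, not from the repository).

  Accepting **kwargs lets the constructor tolerate named arguments that a
  newer harness passes before this class has been updated to use them.
  """

  def __init__(self, root_data_dir=None, **kwargs):
    # Derive the dataset path from the caller-supplied root directory.
    self.data_dir = (os.path.join(root_data_dir, 'cifar-10-batches-bin')
                     if root_data_dir else None)
    # Keep unknown named arguments instead of failing with a TypeError.
    self.extra_args = dict(kwargs)


# A harness passing an argument this class does not define still works.
bench = HypotheticalKerasBenchmark(root_data_dir='/data', new_flag=True)
print(bench.data_dir)
print(bench.extra_args)  # {'new_flag': True}
```

Without `**kwargs` in the signature, the same call would raise `TypeError` for the unexpected keyword argument. That is the breakage the commit message describes.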
21 Feb, 2019 1 commit
- Add flag to enable XLA in Keras models (#6240) · 4571d3fa
  Haoyu Zhang authored Feb 21, 2019
```
* Add flag to enable XLA in Keras models

* Fix lint errors (some of them are old errors)
```
  4571d3fa
19 Feb, 2019 1 commit
- Pass datasets_num_private_threads flag into Keras resnet model. (#6211) · ad386df5
  Yuefeng Zhou authored Feb 18, 2019
  
  ad386df5
15 Feb, 2019 1 commit
- added data_dir. (#6205) · 078575a1
  Toby Boyd authored Feb 14, 2019
  
  078575a1
14 Feb, 2019 3 commits
- Fix lint issues (#6204) · 68b724ce
  Toby Boyd authored Feb 14, 2019
  
  68b724ce
- Move test to under accuracy class. (#6202) · 5d3a7d04
  Toby Boyd authored Feb 14, 2019
  
  5d3a7d04
- Workaround for memory issue in eager mode. (#6197) · ae699073
  Yuefeng Zhou authored Feb 13, 2019
```
* Workaround for memory issue in eager mode.

* Add a TODO

* Fix typo

* Address comments

* remove patch which appears hacky.

* fix typo
```
  ae699073
13 Feb, 2019 2 commits
- Do not toggle eager if tf 2.0 is used. (#6188) · e334f3e2
  Toby Boyd authored Feb 13, 2019
  
  e334f3e2
- Add a flag to specify distribution strategies. (#6185) · 79b57a3f
  Yuefeng Zhou authored Feb 12, 2019
```
* Add a flag to specify distribution strategies.

* Fix a small error.

* Address comments.

* Address comments.

* Fix typos.
```
  79b57a3f
12 Feb, 2019 2 commits

Add model_dir to all tests to avoid "resource not found error". (#6143) · f788046c

Toby Boyd authored Feb 12, 2019

* fix test benchmark_graph_1_gpu_no_dist_strat failing

- Failure only occurs when all 1_gpu tests are run
together with the error:
tensorflow.python.framework.errors_impl.NotFoundError:
Resource localhost/logdir:/tmp/cifar10_model/
N10tensorflow22SummaryWriterInterfaceE does not exist.
[Op:WriteScalarSummary] name: epoch_loss/

Another fix might be to generate a different model_dir
in the core code, but that has other draw backs such as
restarting from the checkpoint.

* Model_dir for all tests.

f788046c

add validation_freq which matches estimator settings. (#6180) · 7e056690
Toby Boyd authored Feb 11, 2019
```
- Modest speedup for CIFAR-10
- Slightly greater speedup expected for ImageNet ResNet50.
```
7e056690

11 Feb, 2019 1 commit
- Fix accuracy name (#6179) · 27e86174
  Toby Boyd authored Feb 11, 2019
  
  27e86174
09 Feb, 2019 1 commit

Add pure synthetic data to keras resnet model. (#6174) · 05383c7b

Yuefeng Zhou authored Feb 08, 2019

* Add pure synthetic data to keras resnet model.

* Add imports.

* Address comments.

* update comment

* Undo set up synthetic data for real data path.

* update comment

* Address comment

* Remove trailing whitespaces.

* s/make_data_set_iterator/make_dataset_iterator/

05383c7b

08 Feb, 2019 1 commit
- Revert "Revert "tf_upgrade_v2 on resnet and utils folders. (#6154)" (#6162)" (#6167) · b2c9e3f5
  Goldie Gadde authored Feb 08, 2019
```
This reverts commit 57e07520.
```
  b2c9e3f5
06 Feb, 2019 1 commit
- Revert "tf_upgrade_v2 on resnet and utils folders. (#6154)" (#6162) · 57e07520
  Goldie Gadde authored Feb 06, 2019
```
This reverts commit d6b2b83c.
```
  57e07520
05 Feb, 2019 1 commit

tf_upgrade_v2 on resnet and utils folders. (#6154) · d6b2b83c

Goldie Gadde authored Feb 05, 2019

* Add resnet56 short tests. (#6101)

* Add resnet56 short tests.
- created base benchmark module
- renamed accuracy test class to contain the word Accuracy
which will result in a need to update all the jobs
and a loss of history but is worth it.
- short tests are mostly copied from shining with oss refactor

* Address feedback.

* Move flag_methods to init
- Address setting default flags repeatedly.

* Rename accuracy tests.

* Lint errors resolved.

* fix model_dir set to flags.data_dir.

* fixed not fully pulling out flag_methods.

* Use core mirrored strategy in official models (#6126)

* Imagenet short tests (#6132)

* Add short imagenet tests (taken from seemuch)
- also rename to match go forward naming

* fix method name

* Update doc strings.

* Fix gpu number.

* points default data_dir to child folder. (#6131)

Failed test is python2 and was a kokoro failure

* Imagenet short tests (#6136)

* Add short imagenet tests (taken from seemuch)
- also rename to match go forward naming

* fix method name

* Update doc strings.

* Fix gpu number.

* Add fill_objects

* fixed calling wrong class in super.

* fix lint issue.

* Flag (#6121)

* Fix the turn_off_ds flag problem

* add param names to all args

* Export benchmark stats using tf.test.Benchmark.report_benchmark() (#6103)

* Export benchmark stats using tf.test.Benchmark.report_benchmark()

* Fix python style using pyformat

* Typos. (#6120)

* log verbosity=2 logs every epoch no progress bars (#6142)

* tf_upgrade_v2 on resnet and utils folder.

* tf_upgrade_v2 on resnet and utils folder.

d6b2b83c

03 Feb, 2019 1 commit
- log verbosity=2 logs every epoch no progress bars (#6142) · 722f345e
  Toby Boyd authored Feb 02, 2019
  
  722f345e
01 Feb, 2019 2 commits
- Export benchmark stats using tf.test.Benchmark.report_benchmark() (#6103) · a1bd019e
  Dong Lin authored Feb 01, 2019
```
* Export benchmark stats using tf.test.Benchmark.report_benchmark()

* Fix python style using pyformat
```
  a1bd019e
- Flag (#6121) · ed6c805a
  Shining Sun authored Feb 01, 2019
```
* Fix the turn_off_ds flag problem

* add param names to all args
```
  ed6c805a