Commits · 03027e33beccbfa3fd77c4c04e7ec68ff09edd5c · ModelZoo / ResNet50_tensorflow

15 May, 2019 3 commits

Fix lint errors. (#6788) · 03027e33
Rachel Lim authored May 15, 2019

03027e33

Set the --clone_model_in_keras_dist_strat to None. (#6781) · 2d4cfad0

Igor authored May 15, 2019

* Set the --clone_model_in_keras_dist_strat to None. Remove the separate no_cloning benchmarks and add a couple of cloning ones. Fix the learning rate schedule to cache its ops per graph.

2d4cfad0
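
The "cache its ops per graph" fix in #6781 can be illustrated with a minimal pure-Python sketch. This is not the code from the commit: `FakeGraph`, `CachedSchedule`, and `op_for` are hypothetical names, and the "op" is a plain dictionary standing in for whatever the real schedule builds inside a graph. The point is only that each graph gets its own cached op, so an op built in one graph is never reused in another.

```python
import weakref


class FakeGraph:
    """Hypothetical stand-in for a computation graph object."""


class CachedSchedule:
    """Builds the learning-rate 'op' once per graph and reuses it afterwards."""

    def __init__(self, base_lr):
        self.base_lr = base_lr
        self.build_count = 0
        # Weak keys let a cache entry disappear once its graph is garbage collected.
        self._ops_by_graph = weakref.WeakKeyDictionary()

    def _build_op(self, graph):
        # Placeholder for the (assumed expensive) op construction inside `graph`.
        self.build_count += 1
        return {"graph_id": id(graph), "lr": self.base_lr}

    def op_for(self, graph):
        op = self._ops_by_graph.get(graph)
        if op is None:
            op = self._build_op(graph)
            self._ops_by_graph[graph] = op
        return op


schedule = CachedSchedule(base_lr=0.1)
g1, g2 = FakeGraph(), FakeGraph()
first = schedule.op_for(g1)
again = schedule.op_for(g1)   # served from the cache, no rebuild
other = schedule.op_for(g2)   # different graph, so a new op is built
```

Keying the cache on the graph object rather than storing a single op on the schedule is the design choice being sketched here.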

Adds keras imagenet benchmarks which use tf.data's `experimental_slack` option. (#6744) · 6aa6bac5

Rachel Lim authored May 15, 2019

* Added 'tfdata_exp' version of all benchmarks which set
FLAGS.tf_data_experimental_slack = True. Renamed
`data_prefetch_with_slack` to `data_delay_prefetch` (haoyu's change)
to make the names more distinct.

* Add flag to resnet input pipeline and surface through
keras_imagenet_main.py

6aa6bac5

11 May, 2019 1 commit

Add FP16 to transformer with benchmark tests. (#6756) · b7e97bec

Toby Boyd authored May 10, 2019

* Add FP16 and benchmarks.

* add missing run and report.

* Add loss_scale as option not included with dtype.

* move loss_scale validation under dtype conditional.

* add loss_scale to flags tested.

b7e97bec
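
Why a `loss_scale` option accompanies FP16 can be shown with a small standard-library sketch. It is not taken from the transformer code: the gradient value and the scale of 128 are assumptions chosen for illustration, and `to_fp16` is a hypothetical helper that rounds a Python float to half precision via `struct`.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


LOSS_SCALE = 128.0   # assumed value, for illustration only
true_grad = 1e-8     # smaller than the smallest positive fp16 value (about 6e-8)

# Without scaling, the gradient underflows to zero when stored in fp16.
unscaled = to_fp16(true_grad)

# Scaling the loss scales the gradient too, which keeps it representable.
scaled = to_fp16(true_grad * LOSS_SCALE)

# Unscaling happens in higher precision (a Python float here).
recovered = scaled / LOSS_SCALE

print(unscaled, scaled, recovered)
```

The recovered value is only approximately equal to the original gradient, because the scaled value still lands in the coarse subnormal range of fp16.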

10 May, 2019 5 commits
- Fix trivial model to work properly with fp16 (#6760) · c0b31c51
  Haoyu Zhang authored May 10, 2019
```
* Fix trivial model to work properly with fp16

* Add comment on manual casting
```
  c0b31c51
- Minimize variables and computation in trivial model (#6759) · 5e876e6e
  Haoyu Zhang authored May 10, 2019
```
Previously we had one dense layer in trivial model. The weight was [224*224*3, num_classes].
Using two dense layers, the weights are [224*224*3, 1] and [1, num_classes].
```
  5e876e6e
- Do not report accuracy metrics for benchmark tests (#6757) · cfa37aab
  Haoyu Zhang authored May 10, 2019
```
* Do not report metrics in performance benchmarks

* Rename flag
```
  cfa37aab
- Fix broken test in V2 (#6755) · bae940dc
  Haoyu Zhang authored May 10, 2019
  
  bae940dc
- Use LR schedule ops instead of LR callback for tweaked tests (#6745) · 4b4dbad1
  Haoyu Zhang authored May 09, 2019
```
* Modified tweaked tests to use tensor learning rate
```
  4b4dbad1
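
The weight shapes quoted in "Minimize variables and computation in trivial model (#6759)" above imply a large reduction in parameters. The arithmetic below is a sketch under one assumption: `NUM_CLASSES = 1000` is an illustrative value that the commit message does not state. Biases are ignored because the message mentions only weights.

```python
INPUT_SIZE = 224 * 224 * 3   # flattened image, from the commit message
NUM_CLASSES = 1000           # assumption, not stated in the commit message


def dense_weights(in_features, out_features):
    """Number of entries in a dense layer's weight matrix (bias excluded)."""
    return in_features * out_features


# Before: one dense layer with weights [224*224*3, num_classes].
single_layer = dense_weights(INPUT_SIZE, NUM_CLASSES)

# After: two dense layers with weights [224*224*3, 1] and [1, num_classes].
two_layers = dense_weights(INPUT_SIZE, 1) + dense_weights(1, NUM_CLASSES)

print(single_layer, two_layers)
```

With these numbers the weight count drops from 150,528,000 to 151,528.
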
09 May, 2019 1 commit

Use TensorFlow ops for Keras LearningRateSchedule (#6739) · 9d38e894

Haoyu Zhang authored May 09, 2019

* Add learning rate tensor. This makes training slower

* Improve LearningRateSchedule with better efficiency

* Fix lint error

* Replace constant definition with existing one

9d38e894

07 May, 2019 1 commit
- Use flags to define collective ops when initializing MirroredStrategy (#6724) · f5073f49
  Haoyu Zhang authored May 07, 2019
  
  f5073f49
06 May, 2019 1 commit
- Fix ResNet model convergence problem (#6721) · a182abc1
  Haoyu Zhang authored May 06, 2019
  
  a182abc1
04 May, 2019 1 commit

Enable CuDNN BatchNorm spatial persistent by default (#6710) · 58deb059

Haoyu Zhang authored May 03, 2019

* Enable CuDNN BatchNorm spatial persistent by default; Remove 2nd zero padding layer

* Apply scale=False and fused=True consistently to BatchNorm layers

* Undo remove padding layer

* Replace zero padding with padding attribute in max pooling for better performance

* Resolve comments

* Revert "Replace zero padding with padding attribute in max pooling for better performance"

This reverts commit ad49db057c800ecac008eec1057005bd2c08ac73.

58deb059

03 May, 2019 1 commit
- Add graph rewrite convergence benchmark (#6712) · 0a96c7b4
  Reed authored May 02, 2019
  
  0a96c7b4
02 May, 2019 1 commit
- Add graph rewrite benchmarks (#6708) · e172ac82
  Reed authored May 02, 2019
  
  e172ac82
01 May, 2019 1 commit

Add --fp16_implementation option. (#6703) · b691578c

Reed authored May 01, 2019

This option allows the new tf.train.experimental.enable_mixed_precision_graph_rewrite() function to be used for fp16, instead of manual casts.

b691578c

30 Apr, 2019 1 commit
- Eval every 10 epochs to better match estimator tests. (#6696) · 3ee027fb
  Toby Boyd authored Apr 30, 2019
  
  3ee027fb
29 Apr, 2019 3 commits

Replace per_device with per_replica and PerDevice with PerReplica, because the... · b00783d7

Igor authored Apr 29, 2019

Replace per_device with per_replica and PerDevice with PerReplica, because the PerDevice concept was renamed and doesn't exist anymore. (#6693)

* Replace per_device with per_replica and PerDevice with PerReplica, because the PerDevice concept was renamed and doesn't exist anymore.

b00783d7

bug fix (#6695) · 0f6f656f
Shining Sun authored Apr 29, 2019
```
* bug fix

* bug fix
```
0f6f656f

Add benchmarks with the --cloning flag to Resnet and NFC. (#6675) · af47736d

Igor authored Apr 29, 2019

* Add benchmarks with the --cloning flag to Resnet and NFC.

* Renamed cloning to clone_model_in_keras_dist_strat. Dropped a few tests that aren't essential.

* Fixed up the formatting after renaming the flag to a much longer name. Thanks, lint.

* Fixed the lint error in nfc_common.py

af47736d

26 Apr, 2019 2 commits

Combined imagenet and cifar-10 estimator tests (#6672) · acc6f6d7

Toby Boyd authored Apr 26, 2019

* Combined imagenet and cifar-10 benchmarks

* Comments and epochs_between_evals.

* Added tuned tests and cleaned up benchmark flags

* Fix names.

* Return results and add images/sec hook.

* updated doc strings for return values.

* 128 to 256 batch for FP16 test

* added more doc strings to fix lint.

acc6f6d7

Add num_packs flag for MirroredStrategy's cross device ops. (#6676) · 4a1fba0b

Ayush Dubey authored Apr 26, 2019

* Add num_packs flag for MirroredStrategy's cross device ops.

* fix parens

* Fix lint errors and make all_reduce_alg more robust.

* Set default num_packs to 1

4a1fba0b

25 Apr, 2019 1 commit
- Revert "Specify NCCL as the all reduce algorithm (#6662)" (#6671) · a7338771
  Haoyu Zhang authored Apr 25, 2019
```
Reason: test failures because contrib is not available in V2

This reverts commit 325dd761.
```
  a7338771
24 Apr, 2019 5 commits
- Add top_1 accuracy check. (#6663) · ff5cef9a
  Toby Boyd authored Apr 24, 2019
  
  ff5cef9a
- Added none check for output_dir (#6664) · 05b9122f
  Shining Sun authored Apr 24, 2019
```
* Added none check for output_dir

* Change double quote to single
```
  05b9122f
- Specify NCCL as the all reduce algorithm (#6662) · 325dd761
  Haoyu Zhang authored Apr 24, 2019
  
  325dd761
- Add tests to track 8 GPU fp16 performance in legacy graph mode (#6653) · 4ad73a1c
  Haoyu Zhang authored Apr 23, 2019
  
  4ad73a1c
- Add experimental tf.data sleep tuning for better performance (#6634) · 50dfb31d
  Haoyu Zhang authored Apr 23, 2019
```
* Introduce a short sleep before ds.prefetch in tf.data.

* Further limit dataset threads to reduce CPU contention

* Tuned dataset sleep time

* Rename dataset sleep flag; enable it only for Keras Graph mode
```
  50dfb31d
23 Apr, 2019 2 commits
- Small word tweak (#6650) · 4698a41e
  Toby Boyd authored Apr 23, 2019
```
* Small word tweak

* Few more tweaks
```
  4698a41e
- Update README.md (#6612) · 9d299984
  Usama Muneeb authored Apr 23, 2019
```
Added additional information on using the `SavedModel` for prediction purposes.
```
  9d299984
22 Apr, 2019 1 commit
- Use tf.image.resize_with_crop_or_pad (#6632) · 7772cb1d
  Toby Boyd authored Apr 22, 2019
  
  7772cb1d
18 Apr, 2019 1 commit
- Update logic to rescale L2 loss in distribution strategy (#6601) · c33d3ef4
  Haoyu Zhang authored Apr 17, 2019
  
  c33d3ef4
17 Apr, 2019 4 commits

Added unit tests keras cifar and imagenet (#6535) · 2ae6d37a

Shining Sun authored Apr 17, 2019

* before moving test cases to the base class

* Added tests for keras cifar and keras imagenet

* fix cifar10_test

* add blank lines

* fix lint errors

* fix lint

* Resolve comments

* Modified two resnet keras tests

* Tests passed

* Remove keras_test_base

* Remove gpu from the no-dist tests

2ae6d37a

Revert "Set input layer `batch_size` in multi-replica mode" (#6598) · 4f3cc31c

rxsang authored Apr 17, 2019

* Revert "Set input layer `batch_size` in multi-replica mode (#6578)"

This reverts commit f1a59682.

* Rename variables.

4f3cc31c

tf.estimator.train_and_evaluate doesn't return anything in multi-worker case. (#6582) · 20b19b61

Yuefeng Zhou authored Apr 17, 2019

* Update resnet_run_loop.py

* Update resnet_run_loop.py

* Update resnet_run_loop.py

* Update resnet_run_loop.py

* Update resnet_run_loop.py

20b19b61

Set input layer `batch_size` in multi-replica mode (#6578) · f1a59682
rxsang authored Apr 16, 2019

f1a59682

12 Apr, 2019 2 commits
- Move metrics info from extras to metrics field in test_log.proto (#6548) · 645202b1
  Dong Lin authored Apr 11, 2019
  
  645202b1
- reduce test batch size for 1 GPU no_dist_strat (#6564) · 17ef6405
  Taylor Robie authored Apr 11, 2019
  
  17ef6405
11 Apr, 2019 1 commit

Ensure static shapes when enabling XLA in Resnet Keras model in graph mode. (#6558) · e08b6286

rxsang authored Apr 10, 2019

* Revert "Revert " Ensure static shapes when enabling XLA in Resnet Keras model (#6508)" (#6517)"

This reverts commit cc9eef76.

* Set `batch_size` to keras.Input in non-eager mode.

Eager mode currently has an OOM problem.

* Add comments for enable_eager flag.

* Always set drop_remainder=True.

* Only set drop_remainder=True for XLA.

e08b6286
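
The link between `drop_remainder=True` and static shapes in #6558 can be sketched without TensorFlow. The `batch` function below is a hypothetical pure-Python stand-in for batching a dataset, not the tf.data implementation. It shows that keeping the final partial batch produces a batch of a different size, while dropping it leaves every batch with the same, statically known size.

```python
def batch(items, batch_size, drop_remainder):
    """Split `items` into consecutive batches of `batch_size` elements."""
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    # Optionally discard a final batch that is smaller than `batch_size`.
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


examples = list(range(10))

sizes_kept = [len(b) for b in batch(examples, 4, drop_remainder=False)]
sizes_dropped = [len(b) for b in batch(examples, 4, drop_remainder=True)]

print(sizes_kept)     # last batch is smaller than the rest
print(sizes_dropped)  # every batch has exactly 4 elements
```

The cost is that the trailing examples are skipped, which may be why the final commit in the series enables the option only for XLA.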

10 Apr, 2019 1 commit

Refactored ResNet code and added additional architectures. (#6316) · e9359d00

Vighnesh Birodkar authored Apr 10, 2019

* Refactored ResNet code and added additional architectures.

* Added numerical layer names instead of alphabetical.

* Change dash to underscore.

* Corrected return statement.

* Use conv_strides argument.

* Set classes=10

* Use partial to reduce code duplication.

e9359d00