Commits · dc4c5f1acbd085a6cd92330451736b44d60ce915 · ModelZoo / ResNet50_tensorflow

01 Aug, 2019 1 commit

Merged commit includes the following changes: (#7354) · dc4c5f1a

Haoyu Zhang authored Aug 01, 2019

261171038  by gjn<gjn@google.com>:

    Remove weight_decay_rate 0 early exit check

    Removing this code path should be fine since it was not actually doing
    what it was meant to do. Since weight_decay_rate is a tensor, the
    equality check only compared the id of the object to 0, which should
    never be true. Evaluating a tensor is also not what we want to do at
    this point in the code. Thus it should be fine to simply remove this
    code.

--
261169862  by haoyuzhang<haoyuzhang@google.com>:

    Internal change

261153520  by haoyuzhang<haoyuzhang@google.com>:

    Internal change

261140302  by hongkuny<hongkuny@google.com>:

    Clean up

--

PiperOrigin-RevId: 261171038

dc4c5f1a
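The reasoning in change 261171038 above (an equality check on a tensor that only compares object identity) can be illustrated without TensorFlow. The sketch below is a minimal stand-in, not the removed code: `GraphTensorStandIn` is a hypothetical class that, like the tensor described in the commit message, does not define value-based equality.

```python
class GraphTensorStandIn:
    """Hypothetical stand-in for a symbolic tensor with no value-based __eq__."""

    def __init__(self, value):
        # The value the tensor would hold at run time; `==` never looks at it.
        self.value = value


weight_decay_rate = GraphTensorStandIn(0.0)

# With no __eq__ override, Python falls back to identity comparison,
# so comparing the object to the integer 0 is False even though the
# wrapped value is 0.0. An early exit guarded by this check never runs.
early_exit_taken = (weight_decay_rate == 0)
print(early_exit_taken)
```

Under this assumption the guarded branch is dead code, which is consistent with the commit's conclusion that removing it does not change behavior.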

31 Jul, 2019 1 commit
- Change to experimental_run_tf_function. (#7344) · a552e76a
  Toby Boyd authored Jul 31, 2019
  
  a552e76a
24 Jul, 2019 1 commit
- Returning an object causes the program to exit with a non-zero code. (#7294) · 9fb1a1b6
  Soroush Radpour authored Jul 24, 2019
  
  9fb1a1b6
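The title of #7294 above describes standard Python behavior: when an object that is neither `None` nor an integer reaches `sys.exit`, the interpreter prints it to stderr and exits with status 1. The sketch below demonstrates this in a child interpreter. It assumes the program's entry point forwards the return value of `main` to `sys.exit`; the `exit_code_for` helper and the dictionary contents are hypothetical.

```python
import subprocess
import sys


def exit_code_for(snippet):
    """Run `snippet` in a fresh interpreter and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        stderr=subprocess.DEVNULL,  # hide the object that sys.exit prints
    )
    return result.returncode


# Passing a non-integer object (here a dict of stats) to sys.exit is
# treated as a failure.
returns_object = exit_code_for("import sys; sys.exit({'accuracy': 0.76})")

# Passing None is treated as success.
returns_none = exit_code_for("import sys; sys.exit(None)")

print(returns_object, returns_none)
```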
23 Jul, 2019 1 commit

Single execution path tests for ResNet50, ResNet56, NCF, and Shakespeare LSTM. (#7276) · 9d8c9aa4

Toby Boyd authored Jul 23, 2019

* Add force_run_distributed tests.

* Added enable_eager

* r/force_run_distributed/force_v2_in_keras_compile

* Adding force_v2 tests and FLAGs.

* Rename method to avoid conflict.

* Add cpu force_v2 tests.

* fix lint, wrap line.

* change to force_v2_in_keras_compile

* Update method name.

* Lower mlperf target to 0.736.

9d8c9aa4

19 Jul, 2019 1 commit

Merged commit includes the following changes: (#7263) · c5a4978d

Jing Li authored Jul 19, 2019

* Merged commit includes the following changes:
258867180  by jingli<jingli@google.com>:

    Add new folders for upcoming reorg in model garden.

--
258893811  by hongkuny<hongkuny@google.com>:

    Adds summaries for metrics, allowing metrics inside keras.model.

--
258893048  by isaprykin<isaprykin@google.com>:

    Remove the `cloning` argument to `compile()`.

    Keras models are distributed by cloning in graph mode and without cloning in eager mode as of the change # 258652546.

--
258881002  by hongkuny<hongkuny@google.com>:

    Fix lint.

--
258874998  by hongkuny<hongkuny@google.com>:

    Internal

--
258872662  by hongkuny<hongkuny@google.com>:

    Fix doc

--

PiperOrigin-RevId: 258867180

* Create __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

c5a4978d

21 Jun, 2019 1 commit

NCF XLA and Eager tests with a refactor of resnet flags to make this cleaner. (#7067) · a68f65f8

Toby Boyd authored Jun 21, 2019

* XLA FP32 and first test

* More XLA benchmarks FP32.

* Add eager to NCF and refactor resnet.

* fix v2_0 calls and more flag refactor.

* Remove extra flag args.

* 90 epoch default

* add return

* remove xla not used by estimator.

* Remove duplicate run_eagerly.

* fix flag defaults.

* Remove fp16_implementation flag option.

* Remove stop early on mlperf test.

* remove unneeded args.

* load flags from keras mains.

a68f65f8

20 Jun, 2019 1 commit

Improve performance of Keras ResNet models when not using distribution strategy (#7055) · cf3c2407

Haoyu Zhang authored Jun 20, 2019

* Do not set learning phase when skipping eval

* Do not set learning phase in no dist strat case

* Added device placement, tweaked benchmarks

* Added tweaked benchmarks for Cifar

* Fix device scope

* Fix lint

* Add explicit GPU placement flag

* Also run accuracy test with explicit GPU placement

* Added doc string

cf3c2407

19 Jun, 2019 1 commit

Add XLA to transformer (#7048) · 269581dc

Toby Boyd authored Jun 19, 2019



* set default steps to 300K.

* Log flags to perfzero.

* Add XLA support to transformer

- Moved config logic to keras_utils
- Added enable_xla flag to _performance flags
- Did not refactor enable_xla flag from keras resnet due to
  reliance on calling FLAGs in estimator keras and that is
  a needed refactor for another time.

* fix g3 lint complaint.

* Refactor set config into keras_utils.

* Move flags out of main.

* pipe through enable_xla

* Update official/transformer/v2/misc.py
Co-Authored-By: Reed <reedwm@google.com>

269581dc

10 Jun, 2019 1 commit
- Code cleanup. (#6989) · f7a44074
  rxsang authored Jun 10, 2019
  
  f7a44074
06 Jun, 2019 1 commit

Have each model provide a default loss scale. (#6930) · 42a8af1d

Reed authored Jun 06, 2019

Before, there was a global default loss scale for all models. Currently, only resnet uses loss scaling, but this will be useful once more models support it.

42a8af1d
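One way to read #6930 above is that each model passes its own fallback when resolving the loss scale, instead of relying on a single global default. The sketch below is an assumption about the shape of such a helper, not the repository's code: the function name, its parameters, and the value 128 are all illustrative.

```python
def resolve_loss_scale(flag_value, default_for_fp16, dtype="fp16"):
    """Hypothetical helper: pick the loss scale for one model.

    flag_value: loss scale given explicitly by the user, or None.
    default_for_fp16: the default supplied by the calling model.
    dtype: "fp16" or "fp32" (illustrative string values).
    """
    if flag_value is not None:
        return flag_value        # an explicit user choice always wins
    if dtype == "fp16":
        return default_for_fp16  # per-model default, not a global one
    return 1                     # fp32 training needs no loss scaling


# A model that wants a default of 128 under fp16 (illustrative value).
print(resolve_loss_scale(None, default_for_fp16=128))
print(resolve_loss_scale(None, default_for_fp16=128, dtype="fp32"))
print(resolve_loss_scale(256, default_for_fp16=128))
```

With this design, a new model that adopts loss scaling only needs to supply its own `default_for_fp16`.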

31 May, 2019 1 commit
- Support pure eager execution in ResNet50 (#6929) · f6c2d9f8
  Haoyu Zhang authored May 30, 2019
```
* Support pure eager execution in ResNet50

* Use smaller batch size
```
  f6c2d9f8
23 May, 2019 2 commits

Fix non dist strat case. (#6867) · 68650c42
rxsang authored May 23, 2019

68650c42

Add enable_get_next_as_optional flag. (#6858) · 272a2baa

rxsang authored May 22, 2019

* Add enable_get_next_as_optional flag.

* Set enable_get_next_as_optional to strategy.

* Add comments to explain the flag.

* Remove trailing whitespace.

* Remove trailing space.

272a2baa

15 May, 2019 1 commit

Adds keras imagenet benchmarks which use tf.data's `experimental_slack` option. (#6744) · 6aa6bac5

Rachel Lim authored May 15, 2019

* Added 'tfdata_exp' version of all benchmarks which set
FLAGS.tf_data_experimental_slack = True. Renamed
`data_prefetch_with_slack` to `data_delay_prefetch` (haoyu's change)
to make the names more distinct.

* Add flag to resnet input pipeline and surface through
keras_imagenet_main.py

6aa6bac5

10 May, 2019 2 commits
- Fix trivial model to work properly with fp16 (#6760) · c0b31c51
  Haoyu Zhang authored May 10, 2019
```
* Fix trivial model to work properly with fp16

* Add comment on manual casting
```
  c0b31c51
- Do not report accuracy metrics for benchmark tests (#6757) · cfa37aab
  Haoyu Zhang authored May 10, 2019
```
* Do not report metrics in performance benchmarks

* Rename flag
```
  cfa37aab
09 May, 2019 1 commit

Use TensorFlow ops for Keras LearningRateSchedule (#6739) · 9d38e894

Haoyu Zhang authored May 09, 2019

* Add learning rate tensor. This makes training slower

* Improve LearningRateSchedule with better efficiency

* Fix lint error

* Replace constant definition with existing one

9d38e894

07 May, 2019 1 commit
- Use flags to define collective ops when initializing MirroredStrategy (#6724) · f5073f49
  Haoyu Zhang authored May 07, 2019
  
  f5073f49
04 May, 2019 1 commit

Enable CuDNN BatchNorm spatial persistent by default (#6710) · 58deb059

Haoyu Zhang authored May 03, 2019

* Enable CuDNN BatchNorm spatial persistent by default; Remove 2nd zero padding layer

* Apply scale=False and fused=True consistently to BatchNorm layers

* Undo remove padding layer

* Replace zero padding with padding attribute in max pooling for better performance

* Resolve comments

* Revert "Replace zero padding with padding attribute in max pooling for better performance"

This reverts commit ad49db057c800ecac008eec1057005bd2c08ac73.

58deb059

29 Apr, 2019 1 commit

Add benchmarks with the --cloning flag to Resnet and NCF. (#6675) · af47736d

Igor authored Apr 29, 2019

* Add benchmarks with the --cloning flag to Resnet and NCF.

* Renamed cloning to clone_model_in_keras_dist_strat. Dropped a few tests that aren't essential.

* Fixed up the formatting after re-naming the flag to a much longer name. Thanks, lint.

* Fixed the lint error in nfc_common.py

af47736d

25 Apr, 2019 1 commit
- Revert "Specify NCCL as the all reduce algorithm (#6662)" (#6671) · a7338771
  Haoyu Zhang authored Apr 25, 2019
```
Reason: test failures because contrib is not available in V2

This reverts commit 325dd761.
```
  a7338771
24 Apr, 2019 2 commits
- Specify NCCL as the all reduce algorithm (#6662) · 325dd761
  Haoyu Zhang authored Apr 24, 2019
  
  325dd761
- Add experimental tf.data sleep tuning for better performance (#6634) · 50dfb31d
  Haoyu Zhang authored Apr 23, 2019
```
* Introduce a short sleep before ds.prefetch in tf.data.

* Further limit dataset threads to reduce CPU contention

* Tuned dataset sleep time

* Rename dataset sleep flag; enable it only for Keras Graph mode
```
  50dfb31d
17 Apr, 2019 2 commits
- Revert "Set input layer `batch_size` in multi-replica mode" (#6598) · 4f3cc31c
  rxsang authored Apr 17, 2019
```
* Revert "Set input layer `batch_size` in multi-replica mode (#6578)"

This reverts commit f1a59682.

* Rename variables.
```
  4f3cc31c
- Set input layer `batch_size` in multi-replica mode (#6578) · f1a59682
  rxsang authored Apr 16, 2019
  
  f1a59682
11 Apr, 2019 1 commit

Ensure static shapes when enabling XLA in Resnet Keras model in graph mode. (#6558) · e08b6286

rxsang authored Apr 10, 2019

* Revert "Revert " Ensure static shapes when enabling XLA in Resnet Keras model (#6508)" (#6517)"

This reverts commit cc9eef76.

* Set `batch_size` to keras.Input in non-eager mode.

Eager mode currently has an OOM problem.

* Add comments for enable_eager flag.

* Always set drop_remainder=True.

* Only set drop_remainder=True for XLA.

e08b6286
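The `drop_remainder=True` setting in #6558 above relates to static shapes because a final partial batch makes the batch dimension vary from step to step. The sketch below shows the effect in plain Python rather than with `tf.data`; `batch_sizes` is a hypothetical helper, and the example counts are illustrative.

```python
def batch_sizes(num_examples, batch_size, drop_remainder):
    """Return the size of each batch produced from `num_examples` items."""
    full_batches, leftover = divmod(num_examples, batch_size)
    sizes = [batch_size] * full_batches
    if leftover and not drop_remainder:
        sizes.append(leftover)  # the final short batch breaks uniformity
    return sizes


# Without dropping, the last batch is smaller, so the batch dimension
# cannot be treated as a compile-time constant.
print(batch_sizes(10, 4, drop_remainder=False))

# With dropping, every batch has the same size.
print(batch_sizes(10, 4, drop_remainder=True))
```

The trade-off is that the leftover examples are skipped, which may explain why the final change enables the setting only for XLA.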

08 Apr, 2019 1 commit

Add DS support for NCF keras (#6447) · 1255d5b9

Shining Sun authored Apr 08, 2019

* add ds support for ncf

* remove comments for in_top_k

* avoid expanding the input layers

* resolve comments and fix lint

* Added some comments in code and fix lint

* fix lint

* add some documentation

* add tensorflow imports

1255d5b9

05 Apr, 2019 1 commit

Add profiler callback for Keras models (#6528) · 3f94db4e

Haoyu Zhang authored Apr 04, 2019

* Add profiler callback for Keras models

* Update build stats to identify time callback by type

* Add warning message when both TensorBoard and profiler callbacks are used

3f94db4e

03 Apr, 2019 4 commits
- Add dynamic loss scaling support (#6518) · 17e923da
  Reed authored Apr 03, 2019
  
  17e923da
- Revert " Ensure static shapes when enabling XLA in Resnet Keras model (#6508)" (#6517) · cc9eef76
  Haoyu Zhang authored Apr 03, 2019
```
Reason: breaks the 1-GPU nightly test.

This reverts commit 371645fc.
```
  cc9eef76
- Add flag to enable Xprof (#6352) · 154d3ffa
  Haoyu Zhang authored Apr 02, 2019
  
  154d3ffa
- Fix Resnet XLA with multi-GPUs (#6510) · 6dea4846
  rxsang authored Apr 02, 2019
```
Don't pass `batch_size` to keras.layers.Input in the DS multi-replica case. There is currently a bug on the Keras side that causes an incompatible batch size error.
```
  6dea4846
02 Apr, 2019 1 commit

Ensure static shapes when enabling XLA in Resnet Keras model (#6508) · 371645fc

rxsang authored Apr 02, 2019

* Update resnet_model.py

* Ensure static shapes when enabling XLA.

* Define `drop_remainder` as a variable.

* Handles per_replica_batch_size in non-XLA mode

* Remove trailing whitespace.

371645fc

28 Mar, 2019 1 commit
- Add trivial Keras model (#6460) · b09685fe
  Haoyu Zhang authored Mar 27, 2019
  
  b09685fe
26 Mar, 2019 1 commit
- Move distribution strategy creation before creating any ops, which is (#6435) · b3594a83
  Yuefeng Zhou authored Mar 25, 2019
```
required by multi-node collective ops in eager mode.
```
  b3594a83
22 Mar, 2019 1 commit
- Disable Tensorboard callback by default (#6424) · 8d5d36e0
  Haoyu Zhang authored Mar 22, 2019
  
  8d5d36e0
20 Mar, 2019 1 commit
- Added thread tuning and tweaked tests to improve Keras model performance (#6396) · 7b5606a5
  Haoyu Zhang authored Mar 19, 2019
  
  7b5606a5
19 Mar, 2019 2 commits
- Add config to enable XLA in TF 2.0 (#6406) · dba24007
  Haoyu Zhang authored Mar 19, 2019
  
  dba24007
- Add the option to run Keras resnet model on multiple workers. (#6368) · 3024bde6
  Soroush Radpour authored Mar 19, 2019
  
  3024bde6
06 Mar, 2019 1 commit
- Mixed precision support (#6309) · e4a046e7
  Reed authored Mar 06, 2019
```
* Mixed precision support

* Add TODOs
```
  e4a046e7