Commits · a2a8ff8280ff8b2bcf4580c30ce62b0b3f44370b · ModelZoo / ResNet50_tensorflow

16 Aug, 2019 2 commits

Show flags when --help is specified in resnet. · a2a8ff82

Reed authored Aug 16, 2019

a2a8ff82

Add multi-worker benchmarks to Keras ResNet model. · ff6c3b1e

Ayush Dubey authored Aug 16, 2019

Also add `worker_hosts` and `task_index` flags. These flags enable running the
model over multiple hosts by passing the cluster information via the command line.

Setting `TF_CONFIG` will continue to work.

PiperOrigin-RevId: 263825245

ff6c3b1e
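
A minimal sketch of how the `worker_hosts` and `task_index` flags from the commit above could map onto `TF_CONFIG`. The helper name `build_tf_config`, the comma-separated `host:port` format, the host names, and the rule that an already-set `TF_CONFIG` takes precedence are all assumptions for illustration; the commit message does not describe the model's actual handling.

```python
import json
import os


def build_tf_config(worker_hosts: str, task_index: int) -> str:
    """Return a TF_CONFIG JSON string for a worker-only cluster.

    `worker_hosts` is assumed to be a comma-separated list of host:port strings.
    """
    workers = [h.strip() for h in worker_hosts.split(",") if h.strip()]
    if not 0 <= task_index < len(workers):
        raise ValueError(
            f"task_index {task_index} out of range for {len(workers)} workers"
        )
    config = {
        "cluster": {"worker": workers},
        "task": {"type": "worker", "index": task_index},
    }
    return json.dumps(config)


# Hypothetical two-host cluster; this process is the second worker.
tf_config = build_tf_config("host1:2222,host2:2222", task_index=1)

# Assumption: an explicitly exported TF_CONFIG is left untouched,
# so existing setups that rely on it keep working.
os.environ.setdefault("TF_CONFIG", tf_config)
```

Every worker would be started with the same `worker_hosts` value and a different `task_index`.
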

06 Aug, 2019 1 commit

[ResNet / NCF] Test force V1 path and allow V2 path as default (#7383) · 97622ffc

Toby Boyd authored Aug 05, 2019

* force_v2_in_keras_compile FLAG defaults to None and added separate temp path.

* Switch to force testing the v1 path, not the v2 path.

* Rename function force_v1_path.

97622ffc

05 Aug, 2019 1 commit

Remove layout_off tests and related utils. (#7359) · 6545cb3c

Toby Boyd authored Aug 05, 2019

6545cb3c

02 Aug, 2019 1 commit

Merged commit includes the following changes: (#7365) · 1921a3b5

Haoyu Zhang authored Aug 02, 2019

261339941  by haoyuzhang<haoyuzhang@google.com>:

    Own library functions in Keras ResNet models, and remove dependencies on v1 Estimator version of ResNet models.

    Most dependencies that the Keras version has are related to data input pipelines. Created dedicated files (cifar_preprocessing.py, imagenet_preprocessing.py) to collect all logic handling Cifar and ImageNet data input function.

--
261339166  by haoyuzhang<haoyuzhang@google.com>:

    Internal change

261317601  by akuegel<akuegel@google.com>:

    Internal change

261218818  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

PiperOrigin-RevId: 261339941

1921a3b5

01 Aug, 2019 1 commit

Merged commit includes the following changes: (#7354) · dc4c5f1a

Haoyu Zhang authored Aug 01, 2019

261171038  by gjn<gjn@google.com>:

    Remove weight_decay_rate 0 early exit check

    Removing this code path should be fine since this was actually not doing
    what it meant to do. Since weight_decay_rate is actually a tensor, the
    equality check was only looking at the id of the object and comparing to
    0. This should never be true. Evaluating a tensor is also not what we
    want to do at this point of the code. Thus it should be fine to simply
    remove this code.

--
261169862  by haoyuzhang<haoyuzhang@google.com>:

    Internal change

261153520  by haoyuzhang<haoyuzhang@google.com>:

    Internal change

261140302  by hongkuny<hongkuny@google.com>:

    Clean up

--

PiperOrigin-RevId: 261171038

dc4c5f1a
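
The `weight_decay_rate` note in the commit above turns on how `==` behaves for an object that does not overload equality. A small sketch of that pitfall, using a hypothetical `FakeTensor` class as a stand-in for a symbolic tensor (it is not a TensorFlow type, and the removed code itself is not shown in the log):

```python
class FakeTensor:
    """Hypothetical stand-in for a symbolic tensor.

    It carries no concrete value when the graph is built and, by assumption,
    does not overload `==`.
    """

    def __init__(self, name):
        self.name = name


weight_decay_rate = FakeTensor("weight_decay_rate")

# Without an overloaded __eq__, Python falls back to identity comparison,
# so comparing the object with the integer 0 is always False.
skipped = weight_decay_rate == 0
print(skipped)  # False, whatever value the tensor would hold at run time
```

An early exit guarded by such a check never fires, which matches the reasoning given for removing it.
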

31 Jul, 2019 1 commit

Change to experimental_run_tf_function. (#7344) · a552e76a

Toby Boyd authored Jul 31, 2019

a552e76a

24 Jul, 2019 1 commit

Returning an object causes the program to exit with a non-zero code. (#7294) · 9fb1a1b6

Soroush Radpour authored Jul 24, 2019

9fb1a1b6
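
The behaviour named in the commit title above can be reproduced with plain Python. The sketch assumes the entry point forwards the return value of `main()` to `sys.exit()`; the commit log does not show the actual call site, and the returned dictionary is a made-up example.

```python
import subprocess
import sys

# Child program: main() returns a non-None, non-integer object and the
# result is handed to sys.exit(), i.e. the pattern `sys.exit(main())`.
child = """
import sys

def main():
    return {"accuracy": 0.76}  # hypothetical stats object

sys.exit(main())
"""

result = subprocess.run(
    [sys.executable, "-c", child], capture_output=True, text=True
)

# sys.exit() prints a non-integer object to stderr and uses exit status 1.
print(result.returncode)  # 1
```

Returning `None` from `main()` under the same pattern gives exit status 0.
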
23 Jul, 2019 1 commit

Single execution path tests for ResNet50, ResNet56, NCF, and Shakespeare LSTM. (#7276) · 9d8c9aa4

Toby Boyd authored Jul 23, 2019

* Add force_run_distributed tests.

* Added enable_eager

* r/force_run_distributed/force_v2_in_keras_compile

* Adding force_v2 tests and FLAGs.

* Rename method to avoid conflict.

* Add cpu force_v2 tests.

* fix lint, wrap line.

* change to force_v2_in_keras_compile

* Update method name.

* Lower mlperf target to 0.736.

9d8c9aa4

19 Jul, 2019 1 commit

Merged commit includes the following changes: (#7263) · c5a4978d

Jing Li authored Jul 19, 2019

* Merged commit includes the following changes:
258867180  by jingli<jingli@google.com>:

    Add new folders for upcoming reorg in model garden.

--
258893811  by hongkuny<hongkuny@google.com>:

    Adds summaries for metrics, allowing metrics inside keras.model.

--
258893048  by isaprykin<isaprykin@google.com>:

    Remove the `cloning` argument to `compile()`.

    Keras models are distributed by cloning in graph mode and without cloning in eager mode as of the change # 258652546.

--
258881002  by hongkuny<hongkuny@google.com>:

    Fix lint.

--
258874998  by hongkuny<hongkuny@google.com>:

    Internal

--
258872662  by hongkuny<hongkuny@google.com>:

    Fix doc

--

PiperOrigin-RevId: 258867180

* Create __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

c5a4978d

21 Jun, 2019 1 commit

NCF XLA and Eager tests with a refactor of resnet flags to make this cleaner. (#7067) · a68f65f8

Toby Boyd authored Jun 21, 2019

* XLA FP32 and first test

* More XLA benchmarks FP32.

* Add eager to NCF and refactor resnet.

* fix v2_0 calls and more flag refactor.

* Remove extra flag args.

* 90 epoch default

* add return

* remove xla not used by estimator.

* Remove duplicate run_eagerly.

* fix flag defaults.

* Remove fp16_implementation flag option.

* Remove stop early on mlperf test.

* remove unneeded args.

* load flags from keras mains.

a68f65f8

20 Jun, 2019 1 commit

Improve performance of Keras ResNet models when not using distribution strategy (#7055) · cf3c2407

Haoyu Zhang authored Jun 20, 2019

* Do not set learning phase when skipping eval

* Do not set learning phase in no dist strat case

* Added device placement, tweaked benchmarks

* Added tweaked benchmarks for Cifar

* Fix device scope

* Fix lint

* Add explicit GPU placement flag

* Also run accuracy test with explicit GPU placement

* Added doc string

cf3c2407

19 Jun, 2019 1 commit

Add XLA to transformer (#7048) · 269581dc

Toby Boyd authored Jun 19, 2019

* set default steps to 300K.

* Log flags to perfzero.

* Add XLA support to transformer

- Moved config logic to keras_utils
- Added enable_xla flag to _performance flags
- Did not refactor enable_xla flag from keras resnet due to
  reliance on calling FLAGs in estimator keras and that is
  a needed refactor for another time.

* fix g3 lint complaint.

* Refactor set config into keras_utils.

* Move flags out of main.

* pipe through enable_xla

* Update official/transformer/v2/misc.py

Co-Authored-By: Reed <reedwm@google.com>

269581dc

10 Jun, 2019 1 commit

Code cleanup. (#6989) · f7a44074

rxsang authored Jun 10, 2019

f7a44074

06 Jun, 2019 1 commit

Have each model provide a default loss scale. (#6930) · 42a8af1d

Reed authored Jun 06, 2019

Before, there was a global default loss scale for all models. Currently, only resnet uses loss scaling, but this will be useful once more models support it.

42a8af1d
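
One way to picture the per-model default described in the commit above is a small resolver that prefers an explicit flag and otherwise falls back to the value the model supplies. The function name `resolve_loss_scale`, its parameters, and the numbers are hypothetical and are not taken from the repository.

```python
from typing import Optional


def resolve_loss_scale(
    flag_value: Optional[float], dtype: str, model_default_fp16: float
) -> float:
    """Pick the loss scale: explicit flag first, then a per-model default."""
    if flag_value is not None:
        return float(flag_value)  # user override wins
    if dtype == "fp32":
        return 1.0  # a scale of 1 leaves the loss unchanged
    return float(model_default_fp16)  # each model passes its own default


# Hypothetical values, for illustration only.
print(resolve_loss_scale(None, "fp16", model_default_fp16=128))  # 128.0
print(resolve_loss_scale(None, "fp32", model_default_fp16=128))  # 1.0
print(resolve_loss_scale(256, "fp16", model_default_fp16=128))   # 256.0
```

Because the default is an argument rather than a global constant, a model that later adds loss scaling can pass its own value without affecting the others.
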

31 May, 2019 1 commit

Support pure eager execution in ResNet50 (#6929) · f6c2d9f8

Haoyu Zhang authored May 30, 2019

* Support pure eager execution in ResNet50

* Use smaller batch size

f6c2d9f8

23 May, 2019 2 commits

Fix non dist strat case. (#6867) · 68650c42

rxsang authored May 23, 2019

68650c42

Add enable_get_next_as_optional flag. (#6858) · 272a2baa

rxsang authored May 22, 2019

* Add enable_get_next_as_optional flag.

* Set enable_get_next_as_optional to strategy.

* Add comments to explain the flag.

* Remove trailing whitespace.

* Remove trailing space.

272a2baa

15 May, 2019 1 commit

Adds keras imagenet benchmarks which use tf.data's `experimental_slack` option. (#6744) · 6aa6bac5

Rachel Lim authored May 15, 2019

* Added 'tfdata_exp' version of all benchmarks which set
FLAGS.tf_data_experimental_slack = True. Renamed
`data_prefetch_with_slack` to `data_delay_prefetch` (haoyu's change)
to make the names more distinct.

* Add flag to resnet input pipeline and surface through
keras_imagenet_main.py

6aa6bac5

10 May, 2019 2 commits

Fix trivial model to work properly with fp16 (#6760) · c0b31c51

Haoyu Zhang authored May 10, 2019

* Fix trivial model to work properly with fp16

* Add comment on manual casting

c0b31c51

Do not report accuracy metrics for benchmark tests (#6757) · cfa37aab

Haoyu Zhang authored May 10, 2019

* Do not report metrics in performance benchmarks

* Rename flag

cfa37aab

09 May, 2019 1 commit

Use TensorFlow ops for Keras LearningRateSchedule (#6739) · 9d38e894

Haoyu Zhang authored May 09, 2019

* Add learning rate tensor. This makes training slower

* Improve LearningRateSchedule with better efficiency

* Fix lint error

* Replace constant definition with existing one

9d38e894

07 May, 2019 1 commit

Use flags to define collective ops when initializing MirroredStrategy (#6724) · f5073f49

Haoyu Zhang authored May 07, 2019

f5073f49

04 May, 2019 1 commit

Enable CuDNN BatchNorm spatial persistent by default (#6710) · 58deb059

Haoyu Zhang authored May 03, 2019

* Enable CuDNN BatchNorm spatial persistent by default; Remove 2nd zero padding layer

* Apply scale=False and fused=True consistently to BatchNorm layers

* Undo remove padding layer

* Replace zero padding with padding attribute in max pooling for better performance

* Resolve comments

* Revert "Replace zero padding with padding attribute in max pooling for better performance"

This reverts commit ad49db057c800ecac008eec1057005bd2c08ac73.

58deb059

29 Apr, 2019 1 commit

Add benchmarks with the --cloning flag to Resnet and NCF. (#6675) · af47736d

Igor authored Apr 29, 2019

* Add benchmarks with the --cloning flag to Resnet and NCF.

* Renamed cloning to clone_model_in_keras_dist_strat. Dropped a few tests that aren't essential.

* Fixed up the formatting after renaming the flag to a much longer name. Thanks, lint.

* Fixed the lint error in nfc_common.py

af47736d

25 Apr, 2019 1 commit

Revert "Specify NCCL as the all reduce algorithm (#6662)" (#6671) · a7338771

Haoyu Zhang authored Apr 25, 2019

Reason: test failures because contrib is not available in V2

This reverts commit 325dd761.

a7338771

24 Apr, 2019 2 commits

Specify NCCL as the all reduce algorithm (#6662) · 325dd761

Haoyu Zhang authored Apr 24, 2019

325dd761

Add experimental tf.data sleep tuning for better performance (#6634) · 50dfb31d

Haoyu Zhang authored Apr 23, 2019

* Introduce a short sleep before ds.prefetch in tf.data.

* Further limit dataset threads to reduce CPU contention

* Tuned dataset sleep time

* Rename dataset sleep flag; enable it only for Keras Graph mode

50dfb31d

17 Apr, 2019 2 commits

Revert "Set input layer `batch_size` in multi-replica mode" (#6598) · 4f3cc31c

rxsang authored Apr 17, 2019

* Revert "Set input layer `batch_size` in multi-replica mode (#6578)"

This reverts commit f1a59682.

* Rename variables.

4f3cc31c

Set input layer `batch_size` in multi-replica mode (#6578) · f1a59682

rxsang authored Apr 16, 2019

f1a59682

11 Apr, 2019 1 commit

Ensure static shapes when enabling XLA in Resnet Keras model in graph mode. (#6558) · e08b6286

rxsang authored Apr 10, 2019

* Revert "Revert " Ensure static shapes when enabling XLA in Resnet Keras model (#6508)" (#6517)"

This reverts commit cc9eef76.

* Set `batch_size` to keras.Input in non-eager mode.

Eager mode currently has an OOM problem.

* Add comments for enable_eager flag.

* Always set drop_remainder=True.

* Only set drop_remainder=True for XLA.

e08b6286
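
The `drop_remainder` bullets in the commit above are about keeping the batch dimension fixed so that XLA sees static shapes. A pure-Python sketch of what dropping the remainder changes; `batch_sizes` is a hypothetical helper, not part of the model's input pipeline.

```python
def batch_sizes(num_examples, batch_size, drop_remainder):
    """Return the size of each batch produced from `num_examples` items."""
    full, rest = divmod(num_examples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_remainder:
        sizes.append(rest)  # final partial batch has a different shape
    return sizes


print(batch_sizes(10, 4, drop_remainder=False))  # [4, 4, 2]
print(batch_sizes(10, 4, drop_remainder=True))   # [4, 4]
```

With the remainder dropped, every batch has the same leading dimension, at the cost of discarding up to `batch_size - 1` examples per pass. That cost may be why the final bullet limits the setting to the XLA case, although the log does not state the reason.
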

08 Apr, 2019 1 commit

Add DS support for NCF keras (#6447) · 1255d5b9

Shining Sun authored Apr 08, 2019

* add ds support for ncf

* remove comments for in_top_k

* avoid expanding the input layers

* resolve comments and fix lint

* Added some comments in code and fix lint

* fix lint

* add some documentation

* add tensorflow imports

1255d5b9

05 Apr, 2019 1 commit

Add profiler callback for Keras models (#6528) · 3f94db4e

Haoyu Zhang authored Apr 04, 2019

* Add profiler callback for Keras models

* Update build stats to identify time callback by type

* Add warning message when both TensorBoard and profiler callbacks are used

3f94db4e

03 Apr, 2019 4 commits

Add dynamic loss scaling support (#6518) · 17e923da

Reed authored Apr 03, 2019

17e923da

Revert "Ensure static shapes when enabling XLA in Resnet Keras model (#6508)" (#6517) · cc9eef76

Haoyu Zhang authored Apr 03, 2019

Reason: breaks the 1-GPU nightly test.

This reverts commit 371645fc.

cc9eef76

Add flag to enable Xprof (#6352) · 154d3ffa

Haoyu Zhang authored Apr 02, 2019

154d3ffa

Fix Resnet XLA with multi-GPUs (#6510) · 6dea4846

rxsang authored Apr 02, 2019

Don't pass `batch_size` to keras.layers.Input in the DS multi-replica case. There is currently a bug on the Keras side that causes a batch-size incompatibility error.

6dea4846

02 Apr, 2019 1 commit

Ensure static shapes when enabling XLA in Resnet Keras model (#6508) · 371645fc

rxsang authored Apr 02, 2019

* Update resnet_model.py

* Ensure static shapes when enabling XLA.

* Define `drop_remainder` as a variable.

* Handles per_replica_batch_size in non-XLA mode

* Remove trailing whitespace.

371645fc

28 Mar, 2019 1 commit

Add trivial Keras model (#6460) · b09685fe

Haoyu Zhang authored Mar 27, 2019

b09685fe

26 Mar, 2019 1 commit

Move distribution strategy creation before creating any ops, which is (#6435) · b3594a83

Yuefeng Zhou authored Mar 25, 2019

required by multi-node collective ops in eager mode.

b3594a83