- 11 Oct, 2019 2 commits
Hongkun Yu authored
* Revert "Update tf.contrib.data to tf.data.experimental. (#7650)". This reverts commit faf4bbb3.
* Revert the corresponding changes under research/.

Derek Murray authored

- 09 Sep, 2019 1 commit
Reed Wanderman-Milne authored
--stop_threshold, --num_gpu, --hooks, --export_dir, and --distribution_strategy are no longer exposed by models that do not use them.
PiperOrigin-RevId: 268032080

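The mechanism behind this series of flag cleanups can be sketched as follows; `define_performance` and its keyword arguments are illustrative, not the repo's exact helper. The idea is that each model opts in to only the flags it actually reads, so unused flags never appear in its --help output:

```python
from absl import flags


def define_performance(stop_threshold=False, num_gpus=False, hooks=False):
  """Defines only the flags that a given model actually consumes."""
  if stop_threshold:
    flags.DEFINE_float(
        "stop_threshold", None,
        "Stop training early once this target accuracy is reached.")
  if num_gpus:
    flags.DEFINE_integer("num_gpus", 1, "Number of GPUs to use.")
  if hooks:
    flags.DEFINE_list("hooks", ["LoggingTensorHook"],
                      "Training hooks to attach.")
```
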
- 04 Sep, 2019 1 commit
Reed Wanderman-Milne authored
--clean, --train_epochs, and --epochs_between_evals are no longer exposed by models that do not use them.
PiperOrigin-RevId: 267065651

- 26 Aug, 2019 1 commit
Reed Wanderman-Milne authored
--synthetic_data, --dtype, --all_reduce_alg, and --num_packs are no longer exposed by models that do not use them.
PiperOrigin-RevId: 265483564

- 23 Aug, 2019 1 commit
Reed Wanderman-Milne authored
--num_parallel_calls, --inter_op_parallelism_threads, and --intra_op_parallelism_threads are no longer exposed by models that do not use them.
PiperOrigin-RevId: 264965788

- 20 Aug, 2019 1 commit
Hongkun Yu authored
PiperOrigin-RevId: 264300408

- 19 Aug, 2019 1 commit
Reed Wanderman-Milne authored
Only the V1 ResNet model uses --max_train_steps, so the flag is no longer exposed in the keras_application_models, MNIST, Keras ResNet, and CTL ResNet models. Before this change, those models accepted the flag but ignored it. The "max_train" argument was also removed from the run_synthetic function, since it was only meaningful for the V1 ResNet model; that model now passes --max_train_steps=1 to run_synthetic directly.
PiperOrigin-RevId: 264269836

- 16 Aug, 2019 1 commit
Ayush Dubey authored
Also add `worker_hosts` and `task_index` flags. These flags enable running the model across multiple hosts by passing the cluster information on the command line. Setting `TF_CONFIG` directly continues to work.
PiperOrigin-RevId: 263825245

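A minimal sketch of how such flags can be turned into a `TF_CONFIG` cluster spec; `configure_cluster` is illustrative rather than the repo's exact helper, and an explicit `TF_CONFIG` still takes precedence:

```python
import json
import os


def configure_cluster(worker_hosts=None, task_index=-1):
  """Builds TF_CONFIG from flags; an existing TF_CONFIG wins."""
  if os.environ.get("TF_CONFIG") or not worker_hosts:
    return
  workers = worker_hosts.split(",")
  os.environ["TF_CONFIG"] = json.dumps({
      "cluster": {"worker": workers},
      "task": {"type": "worker", "index": task_index},
  })


# e.g. on the first of two hosts:
configure_cluster("host0:2222,host1:2222", task_index=0)
```
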
- 01 Aug, 2019 1 commit
Haoyu Zhang authored
* Restructure the ResNet Estimator code under official/r1.
* Continue moving ResNet code.
* Improve README.md.

- 21 Jun, 2019 1 commit
Toby Boyd authored
* XLA FP32 and first test.
* More XLA FP32 benchmarks.
* Add eager to NCF and refactor ResNet.
* Fix v2_0 calls and more flag refactoring.
* Remove extra flag args.
* Default to 90 epochs.
* Add missing return.
* Remove XLA handling not used by Estimator.
* Remove duplicate run_eagerly.
* Fix flag defaults.
* Remove the fp16_implementation flag option.
* Remove early stopping on the MLPerf test.
* Remove unneeded args.
* Load flags from the Keras mains.

- 19 Jun, 2019 1 commit
Toby Boyd authored
* Set default steps to 300K.
* Log flags to PerfZero.
* Add XLA support to Transformer:
  - Moved config logic into keras_utils.
  - Added an enable_xla flag to the _performance flags.
  - Did not refactor the enable_xla flag out of Keras ResNet, because Estimator Keras relies on reading FLAGS directly; that refactor is left for another time.
* Fix g3 lint complaint.
* Refactor set config into keras_utils.
* Move flags out of main.
* Pipe through enable_xla.
* Update official/transformer/v2/misc.py.
Co-authored-by: Reed <reedwm@google.com>

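For context, an enable_xla-style flag typically toggles TensorFlow's JIT compiler. `tf.config.optimizer.set_jit` is a real TF 2.x API; the wrapper around it here is only a sketch of the keras_utils-style refactor, not the repo's exact code:

```python
import tensorflow as tf


def set_session_config(enable_xla=False):
  if enable_xla:
    # Let XLA JIT-compile eligible clusters of ops instead of running
    # them one kernel at a time.
    tf.config.optimizer.set_jit(True)
```
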
- 14 Jun, 2019 1 commit
Toby Boyd authored
* Use tf.compat.v1.train.experimental.enable_mixed_precision_graph_rewrite.
* Remove num_parallel_batches, which is not used.

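The rewrite API named above wraps an optimizer rather than requiring casts in model code; a minimal usage sketch:

```python
import tensorflow as tf

opt = tf.compat.v1.train.MomentumOptimizer(learning_rate=0.1, momentum=0.9)
# Wrap the optimizer; TF then rewrites the graph to run eligible ops in
# fp16 and applies dynamic loss scaling, with no manual casts in the model.
opt = tf.compat.v1.train.experimental.enable_mixed_precision_graph_rewrite(
    opt, loss_scale="dynamic")
```

The wrapped `opt` is then used exactly like the original optimizer.
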
- 06 Jun, 2019 1 commit
Reed authored
Before, there was a single global default loss scale for all models; now each model supplies its own default. Currently only ResNet uses loss scaling, but this will be useful once more models support it.

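A sketch of the per-model default, assuming the convention that an unset --loss_scale falls back to a default the model passes in; the names here are illustrative, not the repo's exact helper:

```python
def get_loss_scale(flags_obj, default_for_fp16=128):
  """Returns the loss scale: user flag first, else a per-model default."""
  if flags_obj.loss_scale is not None:
    return flags_obj.loss_scale        # explicit --loss_scale always wins
  if flags_obj.dtype == "fp32":
    return 1                           # fp32 training needs no scaling
  return default_for_fp16              # model-specific fp16 default
```
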
- 18 May, 2019 1 commit
Ayush Dubey authored

- 15 May, 2019 2 commits
Rachel Lim authored

Rachel Lim authored
* Added a 'tfdata_exp' variant of every benchmark, which sets FLAGS.tf_data_experimental_slack = True. Renamed `data_prefetch_with_slack` to `data_delay_prefetch` (Haoyu's change) to make the names more distinct.
* Added the flag to the ResNet input pipeline and surfaced it through keras_imagenet_main.py.

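What the flag switches on is tf.data's slack option, a real TF option that lets the terminal prefetch relax its timing to reduce host contention; a minimal sketch:

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(1000).batch(32).prefetch(1)

options = tf.data.Options()
options.experimental_slack = True   # what FLAGS.tf_data_experimental_slack sets
dataset = dataset.with_options(options)
```
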
- 11 May, 2019 1 commit
Toby Boyd authored
* Add FP16 and benchmarks.
* Add missing run and report.
* Add loss_scale as an option separate from dtype.
* Move loss_scale validation under the dtype conditional.
* Add loss_scale to the flags tested.

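A sketch of what validating that flag pair can look like; absl's `multi_flags_validator` is real API, while the flag definitions shown are illustrative stand-ins, not the repo's exact flags:

```python
from absl import flags

flags.DEFINE_string("dtype", "fp32", "Compute dtype: fp16 or fp32.")
flags.DEFINE_integer("loss_scale", None,
                     "Loss scale to use for fp16 training.")


@flags.multi_flags_validator(
    ["dtype", "loss_scale"],
    message="--loss_scale is only valid with --dtype=fp16")
def _check_loss_scale(flag_values):
  return flag_values["loss_scale"] is None or flag_values["dtype"] == "fp16"
```
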
- 01 May, 2019 1 commit
Reed authored
This option allows the new tf.train.experimental.enable_mixed_precision_graph_rewrite() function to be used for fp16 instead of manual casts.

- 29 Apr, 2019 1 commit
Igor authored
Replace per_device with per_replica and PerDevice with PerReplica, because the PerDevice concept was renamed and no longer exists. (#6693)

- 26 Apr, 2019 2 commits
Toby Boyd authored
* Combined ImageNet and CIFAR-10 benchmarks.
* Comments and epochs_between_evals.
* Added tuned tests and cleaned up benchmark flags.
* Fix names.
* Return results and add an images/sec hook.
* Updated doc strings for return values.
* 128 to 256 batch for the FP16 test.
* Added more doc strings to fix lint.

Ayush Dubey authored
* Add a num_packs flag for MirroredStrategy's cross-device ops.
* Fix parens.
* Fix lint errors and make all_reduce_alg more robust.
* Set default num_packs to 1.

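A sketch of how a --num_packs flag typically reaches MirroredStrategy: through a cross-device-ops object whose constructor takes num_packs. `NcclAllReduce`, `HierarchicalCopyAllReduce`, and `MirroredStrategy` are real TF APIs; the flag plumbing shown is illustrative:

```python
import tensorflow as tf


def build_mirrored_strategy(all_reduce_alg="nccl", num_packs=1):
  if all_reduce_alg == "nccl":
    cross_ops = tf.distribute.NcclAllReduce(num_packs=num_packs)
  else:
    cross_ops = tf.distribute.HierarchicalCopyAllReduce(num_packs=num_packs)
  # Gradients are packed into `num_packs` buckets before the all-reduce.
  return tf.distribute.MirroredStrategy(cross_device_ops=cross_ops)
```
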
- 18 Apr, 2019 1 commit
Haoyu Zhang authored

- 17 Apr, 2019 1 commit
Yuefeng Zhou authored
* Update resnet_run_loop.py (five incremental updates).

- 11 Apr, 2019 1 commit
rxsang authored
* Revert "Revert "Ensure static shapes when enabling XLA in Resnet Keras model (#6508)" (#6517)". This reverts commit cc9eef76.
* Set `batch_size` on keras.Input in non-eager mode; eager mode currently has an OOM problem.
* Add comments for the enable_eager flag.
* Always set drop_remainder=True.
* Only set drop_remainder=True for XLA.

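The static-shape requirement behind this change can be illustrated with the real tf.data knob involved: XLA compiles for fixed shapes, so the pipeline must never emit a smaller final batch. A minimal sketch:

```python
import tensorflow as tf

batch_size, use_xla = 128, True

dataset = tf.data.Dataset.range(10000)
# With drop_remainder=True every batch has exactly `batch_size` elements,
# giving the fully static shapes XLA needs; the partial last batch is lost.
dataset = dataset.batch(batch_size, drop_remainder=use_xla)
```
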
- 03 Apr, 2019 2 commits
Reed authored

Haoyu Zhang authored
Reason: this broke the 1-GPU nightly test. This reverts commit 371645fc.

- 02 Apr, 2019 1 commit
rxsang authored
* Update resnet_model.py.
* Ensure static shapes when enabling XLA.
* Define `drop_remainder` as a variable.
* Handle per_replica_batch_size in non-XLA mode.
* Remove trailing whitespace.

- 30 Mar, 2019 1 commit
Haoyu Zhang authored
Co-authored-by: Jiri Simsa <jsimsa@google.com>

- 28 Mar, 2019 2 commits
Yuefeng Zhou authored
* Move distribution strategy creation before creating any ops, as required by multi-node collective ops in eager mode.
* Scale up the learning rate according to the number of workers in ResNet50 with Estimator.
* Scale up the LR in CIFAR as well.
* Fix a typo.
* Add num_workers to the run params and make it optional.

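The scaling rule can be sketched as follows; the linear rule shown is the common convention for synchronous training, not necessarily the exact code in this commit:

```python
def scale_learning_rate(base_lr, num_workers):
  # Synchronous training over N workers multiplies the effective global
  # batch size by N, so the base LR is scaled linearly to match.
  return base_lr * num_workers


lr = scale_learning_rate(0.1, num_workers=4)   # 0.1 tuned per worker -> 0.4
```
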
Ayush Dubey authored

- 19 Mar, 2019 1 commit
Ayush Dubey authored
* Shard input for distribution strategy.
* Pass in input_context from the real input_fn.
* Make pipeline id base 1 for better readability.

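A minimal sketch of input sharding with tf.distribute's InputContext; the `num_input_pipelines` and `input_pipeline_id` fields are real API, while the input_fn shape and file path are illustrative:

```python
import tensorflow as tf


def input_fn(input_context=None):
  dataset = tf.data.Dataset.list_files("/data/train-*")  # hypothetical path
  if input_context and input_context.num_input_pipelines > 1:
    # Each worker reads only its own 1/N slice of the files.
    dataset = dataset.shard(input_context.num_input_pipelines,
                            input_context.input_pipeline_id)
  return dataset.interleave(tf.data.TFRecordDataset, cycle_length=4)
```
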
- 12 Mar, 2019 1 commit
Toby Boyd authored
* Move the optimizer back to compat.v1.
* Add a doc string to fix lint.

- 11 Mar, 2019 1 commit
pkanwar23 authored
* Add LARS to ResNet.
* Several rounds of fixes for the LARS patch.

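For reference, the core of LARS (Layer-wise Adaptive Rate Scaling) is a per-layer trust ratio that rescales the learning rate; this is a sketch of the published formula, not the patch itself:

```python
import tensorflow as tf


def lars_trust_ratio(weights, grads, eta=0.001, weight_decay=1e-4):
  """Per-layer LR multiplier: eta * ||w|| / (||g|| + wd * ||w||)."""
  w_norm = tf.norm(weights)
  g_norm = tf.norm(grads)
  return tf.where(tf.logical_and(w_norm > 0.0, g_norm > 0.0),
                  eta * w_norm / (g_norm + weight_decay * w_norm),
                  1.0)
```
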
- 07 Mar, 2019 2 commits
Ayush Dubey authored

Ayush Dubey authored
* Rename CollectiveAllReduceStrategy to MultiWorkerMirroredStrategy.
* More renames from contrib.distribute to distribute.experimental.
* Collective communication options in MultiWorkerMirroredStrategy.
* Minor fixes.
* No checkpointing if multi-worker; turn off checkpointing.
* Fix lint.

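A minimal usage sketch of the renamed strategy plus a collective-communication option; both symbols are the TF 2.x experimental API this commit refers to, and the model is a placeholder:

```python
import tensorflow as tf

strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
    communication=tf.distribute.experimental.CollectiveCommunication.NCCL)

with strategy.scope():
  model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
  model.compile(optimizer="sgd", loss="mse")
```
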
- 22 Feb, 2019 2 commits
Dong Lin authored

guptapriya authored
* Remove the isinstance check for the contrib strategy and replace it with a class-name check, which works regardless of the module.
* Add quotes for string; fix quote type.

- 21 Feb, 2019 1 commit
Ayush Dubey authored
* Update official ResNet for multi-worker training with distribution strategies.
* Fixes for multi-worker training.
* Fix call to `get_distribution_strategy`.
* Undo test change.
* Fix spacing.
* Move cluster configuration to distribution_utils.
* Move train_and_evaluate out of the loop; also update docstrings for multi-worker flags and add a use_train_and_evaluate flag.
* Update the distribution_strategy flag to match the exported name for the collective strategy.

- 19 Feb, 2019 1 commit
Yuefeng Zhou authored