- 17 Mar, 2020 1 commit
ayushmankumar7 authored
- 05 Mar, 2020 1 commit
Hongkun Yu authored
PiperOrigin-RevId: 299160422
- 02 Mar, 2020 1 commit
Will Cromar authored
PiperOrigin-RevId: 298466825
- 25 Feb, 2020 1 commit
Hongkun Yu authored
PiperOrigin-RevId: 297002741
- 27 Nov, 2019 1 commit
Hongkun Yu authored
PiperOrigin-RevId: 282669615
- 28 Oct, 2019 1 commit
Zongwei Zhou authored
PiperOrigin-RevId: 277082247
- 21 Oct, 2019 1 commit
minoring authored
arparse -> argparse
- 16 Oct, 2019 1 commit
Reed Wanderman-Milne authored
To test, I did 50 fp32 runs and 50 fp16 runs. I used the following command:

    python ncf_keras_main.py --dataset=ml-20m --num_gpus=1 --train_epochs=10 --clean \
      --batch_size=99000 --learning_rate=0.00382059 --beta1=0.783529 --beta2=0.909003 \
      --epsilon=1.45439e-7 --layers=256,256,128,64 --num_factors=64 --hr_threshold=0.635 \
      --ml_perf --nouse_synthetic_data --data_dir ~/ncf_data_dir_python3 \
      --model_dir ~/tmp_model_dir --keras_use_ctl

For the fp16 runs, I added --dtype=fp16. The average hit-rate for both fp16 and fp32 was 0.6365. I also did 50 runs with the mixed precision graph rewrite, and the average hit-rate was 0.6363. The difference is likely due to noise.
PiperOrigin-RevId: 275059871
- 07 Oct, 2019 1 commit
A. Unique TensorFlower authored
PiperOrigin-RevId: 273371605
- 09 Sep, 2019 1 commit
Reed Wanderman-Milne authored
--stop_threshold, --num_gpu, --hooks, --export_dir, and --distribution_strategy are no longer exposed in models that do not use them.
PiperOrigin-RevId: 268032080
- 04 Sep, 2019 1 commit
Reed Wanderman-Milne authored
--clean, --train_epochs, and --epochs_between_evals are no longer exposed in models that do not use them.
PiperOrigin-RevId: 267065651
- 30 Aug, 2019 1 commit
Reed Wanderman-Milne authored
PiperOrigin-RevId: 266376708
- 26 Aug, 2019 1 commit
Reed Wanderman-Milne authored
--synthetic_data, --dtype, --all_reduce_alg, and --num_packs are no longer exposed in models that do not use them.
PiperOrigin-RevId: 265483564
- 23 Aug, 2019 1 commit
Reed Wanderman-Milne authored
--num_parallel_calls, --inter_op_parallelism_threads, and --intra_op_parallelism_threads are no longer exposed in models that do not use them.
PiperOrigin-RevId: 264965788
- 20 Aug, 2019 2 commits
Vinh Nguyen authored
Vinh Nguyen authored
- 19 Aug, 2019 1 commit
Reed Wanderman-Milne authored
Only the V1 resnet model uses --max_train_steps. This unexposes the flag in the keras_application_models, mnist, keras resnet, and CTL resnet models. Before this change, such models allowed the flag to be specified but ignored it. I also removed the "max_train" argument from the run_synthetic function, since it only had meaning for the V1 resnet model. Instead, the V1 resnet model now directly passes --max_train_steps=1 to run_synthetic.
PiperOrigin-RevId: 264269836
- 16 Aug, 2019 1 commit
Ayush Dubey authored
Also add `worker_hosts` and `task_index` flags. These flags enable running the model over multiple hosts by passing the cluster information via the command line. Setting `TF_CONFIG` will continue to work.
PiperOrigin-RevId: 263825245
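A minimal sketch of the two equivalent ways to pass cluster information that this commit describes; the host addresses and script name are illustrative, not from the commit:

    import json, os

    # Option 1: the TF_CONFIG environment variable (continues to work).
    os.environ["TF_CONFIG"] = json.dumps({
        "cluster": {"worker": ["10.0.0.1:5000", "10.0.0.2:5000"]},
        "task": {"type": "worker", "index": 0},
    })

    # Option 2: the new flags pass the same information on the command line,
    # e.g. (hypothetical script name):
    #   python resnet_main.py --worker_hosts=10.0.0.1:5000,10.0.0.2:5000 --task_index=0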
- 06 Aug, 2019 1 commit
Toby Boyd authored
* force_v2_in_keras_compile FLAG defaults to None; added separate temp path.
* Switch to force testing the v1 path, not the v2 path.
* Rename function force_v1_path.
- 23 Jul, 2019 1 commit
Toby Boyd authored
* Add force_run_distributed tests.
* Added enable_eager.
* r/force_run_distributed/force_v2_in_keras_compile
* Adding force_v2 tests and FLAGs.
* Rename method to avoid conflict.
* Add cpu force_v2 tests.
* Fix lint, wrap line.
* Change to force_v2_in_keras_compile.
* Update method name.
* Lower mlperf target to 0.736.
- 21 Jun, 2019 2 commits
Neil authored
Toby Boyd authored
* XLA FP32 and first test.
* More XLA benchmarks FP32.
* Add eager to NCF and refactor resnet.
* Fix v2_0 calls and more flag refactor.
* Remove extra flag args.
* 90 epoch default.
* Add return.
* Remove xla not used by estimator.
* Remove duplicate run_eagerly.
* Fix flag defaults.
* Remove fp16_implementation flag option.
* Remove stop early on mlperf test.
* Remove unneeded args.
* Load flags from keras mains.
- 19 Jun, 2019 1 commit
Toby Boyd authored
* Set default steps to 300K.
* Log flags to perfzero.
* Add XLA support to transformer:
  - Moved config logic to keras_utils.
  - Added enable_xla flag to _performance flags.
  - Did not refactor enable_xla flag from keras resnet due to reliance on calling FLAGs in estimator keras; that is a needed refactor for another time.
* Fix g3 lint complaint.
* Refactor set config into keras_utils.
* Move flags out of main.
* Pipe through enable_xla.
* Update official/transformer/v2/misc.py.
Co-Authored-By: Reed <reedwm@google.com>
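Roughly what an enable_xla flag maps to in TF 2.x, sketched under that assumption (this is not necessarily the repo's exact keras_utils code):

    import tensorflow as tf

    # Turn on XLA auto-clustering so eligible graph ops are JIT-compiled.
    tf.config.optimizer.set_jit(True)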
- 06 Jun, 2019 1 commit
Reed authored
Before, there was a global default loss scale for all models. Currently, only resnet uses loss scaling, but this will be useful once more models support it.
- 18 May, 2019 1 commit
Reed authored
This will allow one to easily reproduce a benchmark by running with the flags.
- 15 May, 2019 1 commit
Rachel Lim authored
* Added 'tfdata_exp' version of all benchmarks which set FLAGS.tf_data_experimental_slack = True. Renamed `data_prefetch_with_slack` to `data_delay_prefetch` (haoyu's change) to make the names more distinct.
* Add flag to resnet input pipeline and surface through keras_imagenet_main.py.
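A minimal sketch of what the tf_data_experimental_slack flag turns on, assuming it maps to the standard tf.data option (the pipeline shown is illustrative):

    import tensorflow as tf

    dataset = tf.data.Dataset.range(1000).batch(32).prefetch(1)
    options = tf.data.Options()
    # Introduce "slack" in the final prefetch to reduce CPU contention with
    # host-side accelerator activity at the start of a step.
    options.experimental_slack = True
    dataset = dataset.with_options(options)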
- 11 May, 2019 1 commit
Toby Boyd authored
* Add FP16 and benchmarks.
* Add missing run and report.
* Add loss_scale as an option not included with dtype.
* Move loss_scale validation under dtype conditional.
* Add loss_scale to flags tested.
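The idea behind a loss_scale option, as a hedged sketch (the model, data, and scale value are illustrative, not the repo's code): scale the loss up before computing gradients so small fp16 gradients do not underflow, then scale the gradients back down before applying them.

    import tensorflow as tf

    loss_scale = 128.0  # what a --loss_scale flag would configure
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    opt = tf.keras.optimizers.SGD(0.01)
    x, y = tf.random.normal([8, 4]), tf.random.normal([8, 1])

    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((model(x) - y) ** 2)
        scaled_loss = loss * loss_scale  # guard against fp16 underflow
    grads = tape.gradient(scaled_loss, model.trainable_variables)
    grads = [g / loss_scale for g in grads]  # undo the scaling before the update
    opt.apply_gradients(zip(grads, model.trainable_variables))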
- 01 May, 2019 1 commit
Reed authored
This option allows the new tf.train.experimental.enable_mixed_precision_graph_rewrite() function to be used for fp16, instead of manual casts.
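A minimal sketch of the API this commit names; the optimizer choice is illustrative:

    import tensorflow as tf

    opt = tf.keras.optimizers.SGD(0.01)
    # The rewrite wraps the optimizer and automatically converts parts of the
    # graph to fp16 (with dynamic loss scaling by default), so no manual casts
    # are needed.
    opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)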
- 26 Apr, 2019 2 commits
Ayush Dubey authored
* Add num_packs flag for MirroredStrategy's cross device ops.
* Fix parens.
* Fix lint errors and make all_reduce_alg more robust.
* Set default num_packs to 1.
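Roughly what the num_packs flag configures, sketched under the assumption that it feeds MirroredStrategy's cross-device ops (the value shown is the new default):

    import tensorflow as tf

    # Pack gradients into a single bucket before the NCCL all-reduce.
    strategy = tf.distribute.MirroredStrategy(
        cross_device_ops=tf.distribute.NcclAllReduce(num_packs=1))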
Gaurav Jain authored
tf.test.is_gpu_available() should not be called in flag definitions, since those run before app.main() and the runtime has not yet been initialized.
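The safe pattern, as a sketch: defer the GPU check until main() runs, after absl has parsed the flags (the dtype flag below is illustrative):

    import tensorflow as tf
    from absl import app, flags

    flags.DEFINE_string("dtype", "fp32", "fp16 or fp32")

    def main(_):
        # OK here; at flag-definition time the runtime is not yet initialized.
        if tf.test.is_gpu_available():
            print("GPU found")

    if __name__ == "__main__":
        app.run(main)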
- 03 Apr, 2019 1 commit
Reed authored
- 28 Mar, 2019 1 commit
Shining Sun authored
* Initial commit.
* Bug fix.
* Move build_stats from common to keras main, because it is only applicable in keras.
* Remove trailing blank line.
* Add test for synth data.
* Add kwargs to init.
* Add kwargs to function invocation.
* Correctly pass kwargs.
* Debug.
* Debug.
* Debug.
* Fix super init.
* Bug fix.
* Fix local_flags.
* Fix import.
* Bug fix.
* Fix log_steps flag.
* Bug fix.
* Bug fix: add missing return value.
* Resolve double-defined flags.
* Lint fix.
* Move log_steps flag to benchmark flag.
* Fix lint.
* Lint fix.
* Lint fix.
* Try flag core default values.
* Bug fix.
* Bug fix.
* Bug fix.
* Debug.
* Debug.
* Remove debug prints.
* Rename benchmark methods.
* Flag bug fix for synth benchmark.
- 20 Mar, 2019 1 commit
Haoyu Zhang authored
- 07 Mar, 2019 1 commit
Ayush Dubey authored
* s/CollectiveAllReduceStrategy/MultiWorkerMirroredStrategy
* More s/contrib.distribute/distribute.experimental
* Collective communication options in MultiWorkerMirroredStrategy.
* Minor fixes.
* No checkpointing if multi worker.
* Turn off checkpointing.
* Fix lint.
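A sketch of the renamed strategy with a collective communication option, assuming the tf.distribute.experimental API of that era (the RING choice is illustrative):

    import tensorflow as tf

    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
        communication=tf.distribute.experimental.CollectiveCommunication.RING)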
- 21 Feb, 2019 1 commit
Ayush Dubey authored
* Update official resnet for multi worker training with distribution strategies.
* Fixes for multi worker training.
* Fix call to `get_distribution_strategy`.
* Undo test change.
* Fix spacing.
* Move cluster configuration to distribution_utils.
* Move train_and_evaluate out of loop. Also, update docstrings for multi-worker flags and add use_train_and_evaluate flag.
* Update distribution_strategy flag to match exported name for collective strategy.
- 13 Feb, 2019 1 commit
Yuefeng Zhou authored
* Add a flag to specify distribution strategies.
* Fix a small error.
* Address comments.
* Address comments.
* Fix typos.
- 08 Feb, 2019 1 commit
Goldie Gadde authored
This reverts commit 57e07520.
- 06 Feb, 2019 1 commit
Goldie Gadde authored
This reverts commit d6b2b83c.
- 05 Feb, 2019 1 commit
Goldie Gadde authored
* Add resnet56 short tests. (#6101)
  - Created base benchmark module.
  - Renamed accuracy test class to contain the word Accuracy, which will require updating all the jobs and lose history, but is worth it.
  - Short tests are mostly copied from shining with oss refactor.
  - Address feedback.
  - Move flag_methods to init; address setting default flags repeatedly.
  - Rename accuracy tests.
  - Lint errors resolved.
  - Fix model_dir set to flags.data_dir.
  - Fixed not fully pulling out flag_methods.
* Use core mirrored strategy in official models. (#6126)
* Imagenet short tests (#6132)
  - Add short imagenet tests (taken from seemuch); also rename to match go-forward naming.
  - Fix method name.
  - Update doc strings.
  - Fix gpu number.
* Point default data_dir to child folder. (#6131) The failed test is python2 and was a kokoro failure.
* Imagenet short tests (#6136)
  - Add short imagenet tests (taken from seemuch); also rename to match go-forward naming.
  - Fix method name.
  - Update doc strings.
  - Fix gpu number.
  - Add fill_objects.
  - Fixed calling wrong class in super.
  - Fix lint issue.
* Flag (#6121)
  - Fix the turn_off_ds flag problem.
  - Add param names to all args.
* Export benchmark stats using tf.test.Benchmark.report_benchmark(). (#6103)
  - Fix python style using pyformat.
* Typos. (#6120)
* Log verbosity=2: log every epoch, no progress bars. (#6142)
* tf_upgrade_v2 on resnet and utils folder.
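One item above exports benchmark stats via tf.test.Benchmark.report_benchmark(); a minimal sketch of that API (the class name and numbers are illustrative, not from the commit):

    import tensorflow as tf

    class ResnetBenchmark(tf.test.Benchmark):
        def benchmark_synthetic_1_gpu(self):
            wall_time = 12.3  # a measured wall-clock time would go here
            self.report_benchmark(
                iters=100,
                wall_time=wall_time,
                extras={"examples_per_second": 810.0})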
- 13 Oct, 2018 1 commit
Toby Boyd authored