Commits · 609260cd2db52abc4eabc70521442aa31901bbbc · ModelZoo / ResNet50_tensorflow

23 Jul, 2019 2 commits

Update lint presubmit to be consistent with tensorflow (#7278) · 609260cd
Hongkun Yu authored Jul 22, 2019
```
Only care about errors and output into an error file.
```
609260cd

Merged commit includes the following changes: (#7277) · 1fc839bc

Hongkun Yu authored Jul 22, 2019

259442882  by hongkuny<hongkuny@google.com>:

    Internal

--
259341546  by mrry<mrry@google.com>:

    Remove DEBUG-level logging from the BERT benchmark.

    This triggers graph serialization and other verbose logging in the TensorFlow runtime, which inflates the execution time.

--
259253185  by hongkuny<hongkuny@google.com>:

    Writes a separated checkpoint for the core model in pretraining.
    Clean up export utils to just take a model as argument.

--
258893811  by hongkuny<hongkuny@google.com>:

    Adds summaries for metrics, allowing metrics inside keras.model.

--
258881002  by hongkuny<hongkuny@google.com>:

    Fix lint.

--
258597234  by rxsang<rxsang@google.com>:

    Update all the TPUStrategy examples to use the new v2 APIs, i.e.
    make_dataset_iterator -> experimental_distribute_dataset,
    make_input_fn_iterator -> experimental_distribute_datasets_from_function,
    unwrap -> experimental_local_results,
    experimental_run -> experimental_run_v2

--
258581998  by taylorrobie<taylorrobie@google.com>:

    Update keras v2 optimizers to reuse coefficients which are shared across all updates, which reduces the total number of ops created by between 5% (for simple optimizers such as SGD and Adagrad) and 25% (for complicated optimizers such as Adam and NAdam). Separate copies are made for each device and dtype.

    The effect of this change on run time is fairly minimal since Grappler is expected to consolidate most of these ops; however it does improve graph construction time.

--
258208153  by hongkuny<hongkuny@google.com>:

    Adds run_eagerly option for bert.

--
257883986  by hongkuny<hongkuny@google.com>:

    Adds tf.summary for bert training

--
256204636  by hongkuny<hongkuny@google.com>:

    Internal

--
256079834  by hongkuny<hongkuny@google.com>:

    Clean up: move common flags together for further refactoring
    Enable steps_per_loop option for all applications.

--
255493073  by hongkuny<hongkuny@google.com>:

    BERT initial OSS readme update.

--
255470372  by dmchen<dmchen@google.com>:

    Slightly expand expected range for F1 score in BERT SQuAD accuracy test

--
255109240  by hongkuny<hongkuny@google.com>:

    Update eval/predict batch sizes.

--
255010016  by hongkuny<hongkuny@google.com>:

    Internal

--
254874613  by hongkuny<hongkuny@google.com>:

    Update glue tasks enum to match directory name

--
254866171  by taylorrobie<taylorrobie@google.com>:

    Internal change

254785517  by zongweiz<zongweiz@google.com>:

    Use train_single_step for BERT GPU models to temporarily work around some performance bugs in GPU runs

--
254497647  by hongkuny<hongkuny@google.com>:

    Fix device placement for TPU export model.

--
254134531  by yuefengz<yuefengz@google.com>:

    Fix a typo in bert_benchmark.py

--
254069984  by hongkuny<hongkuny@google.com>:
    Automated rollback of changelist 254060732.

254061429  by hongkuny<hongkuny@google.com>:

    Use host while loop for training steps.

--
254060732  by yifeif<yifeif@google.com>:
    Automated rollback of changelist 254027750.

254027750  by hongkuny<hongkuny@google.com>:

    Internal change

253850824  by hongkuny<hongkuny@google.com>:

    Improve bert training utils.

--
253818191  by hongkuny<hongkuny@google.com>:

    Update savedmodel export to use new model.save() api.

--
253636854  by dmchen<dmchen@google.com>:

    Run only training in BERT SQuAD performance test

--
253118910  by hongkuny<hongkuny@google.com>:

    Internal change

253113801  by zongweiz<zongweiz@google.com>:

    Internal change

252697519  by dmchen<dmchen@google.com>:

    BERT SQuAD accuracy test

--
252663512  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

--
252647871  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Enable multi worker TPU training for BERT pretraining.

--
252522861  by hongkuny<hongkuny@google.com>:

    Remove export using trained model due to implementation error

--
252156812  by yuefengz<yuefengz@google.com>:

    Fix the callback method name in BERT: replaced on_batch_start with on_batch_begin. Without the fix, it won't work with Keras callbacks.

--
251782065  by dmchen<dmchen@google.com>:

    Internal change

251681245  by hongkuny<hongkuny@google.com>:

    Update bert to use the new tf.distribute APIs

--
251575972  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Remove `steps_per_run` when instantiating TPUStrategy.

--
251325964  by hongkuny<hongkuny@google.com>:

    Improve flags

--
250942274  by tobyboyd<tobyboyd@google.com>:

    Internal change

250779087  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Reduce BERT Perfzero benchmark test training steps.

--
250713045  by hongkuny<hongkuny@google.com>:

    TPU util

--
250606180  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix BERT benchmark test errors.

--
250589623  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Change BERT benchmark test pretrained checkpoint url.

--
250587892  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix error in BERT custom training loop checkpoint restoration.

--
250577163  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add logic to inject callback that measures performance in BERT custom training
    loop.

--
250529526  by hongkuny<hongkuny@google.com>:

    Internal clean up

--
250428976  by hongkuny<hongkuny@google.com>:

    Internal change

250415383  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add min/max value to BERT classifier benchmark test.

--
250376246  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add benchmark performance test to run BERT on multiple numbers of GPUs.

--
250347237  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix linting errors in BERT benchmark test.

--
250326131  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

250315593  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

250303528  by haoyuzhang<haoyuzhang@google.com>:

    Add method docstring to fix lint error.

--
250009207  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add feature in BERT to write training metrics to a summary file.

--
249896208  by hongkuny<hongkuny@google.com>:

    Adds __init__.py

--
249883771  by hongkuny<hongkuny@google.com>:

    Creates a benchmark dir

--
249580533  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

249566870  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Set up BERT benchmark test.

--
249500988  by hongkuny<hongkuny@google.com>:

    Lints

--
249377254  by hongkuny<hongkuny@google.com>:

    Internal change

249373328  by hongkuny<hongkuny@google.com>:

    Clean up tf import

--
249333938  by hongkuny<hongkuny@google.com>:

    Fix tf1 import

--
249325089  by hongkuny<hongkuny@google.com>:

    BERT 2.0

--
249173564  by hongkuny<hongkuny@google.com>:

    Internal change

PiperOrigin-RevId: 259442882

1fc839bc

22 Jul, 2019 1 commit

Add a new sanity check script that is able to only check incremental changes. (#7265) · 6a6c3616

Hongkun Yu authored Jul 22, 2019

* Update pylint.rcfile

* Update pylint.rcfile

* Update pylint.rcfile

* add new sanity check script for lint to replace current lint script.

* Revert "Update pylint.rcfile"

This reverts commit f6036cd7e7c4b9e3eeb47bb56a63927a040a2761.

* Revert "Update pylint.rcfile"

This reverts commit e3af497342e26bbbbecfc8c8f79cb0e24a2ef960.

* Revert "Update pylint.rcfile"

This reverts commit 6136636eee6e90fd191ebbb4ccaa9fb89c0290f4.

* update scripts

* disable trailing-newlines

6a6c3616

21 Jul, 2019 1 commit
- Add a simple signal-based Python callstack sampler for debugging · 830a17ec
  Zongwei Zhou authored Jul 19, 2019
  
  830a17ec
20 Jul, 2019 3 commits
- [Transformer] Use float16 input and output for softmax in mixed-precision training · 448c31b6
  Zongwei Zhou authored Jul 12, 2019
  
  448c31b6
- improved v2 check. · 49b90e86
  Toby Boyd authored Jul 19, 2019
  
  49b90e86
- update v2 check and fix ncf v2 check error logic. · 308c7934
  Toby Boyd authored Jul 19, 2019
  
  308c7934
19 Jul, 2019 8 commits

Merged commit includes the following changes: (#7264) · 6f47c378

Igor authored Jul 19, 2019

259030078  by isaprykin<isaprykin@google.com>:

    Clean up the --clone_model_in_keras_dist_strat from Keras Resnet.

    The cloning flag has been removed.  The current rule is that cloning is only done in graph mode.  That resulted in duplicate benchmarks: eager+no-cloning vs eager+cloning.  I removed eager+cloning ones.

--
259026454  by isaprykin<isaprykin@google.com>:

    Internal change

PiperOrigin-RevId: 259030078

6f47c378

Merged commit includes the following changes: (#7263) · c5a4978d

Jing Li authored Jul 19, 2019

* Merged commit includes the following changes:
258867180  by jingli<jingli@google.com>:

    Add new folders for upcoming reorg in model garden.

--
258893811  by hongkuny<hongkuny@google.com>:

    Adds summaries for metrics, allowing metrics inside keras.model.

--
258893048  by isaprykin<isaprykin@google.com>:

    Remove the `cloning` argument to `compile()`.

    Keras models are distributed by cloning in graph mode and without cloning in eager mode as of the change # 258652546.

--
258881002  by hongkuny<hongkuny@google.com>:

    Fix lint.

--
258874998  by hongkuny<hongkuny@google.com>:

    Internal

--
258872662  by hongkuny<hongkuny@google.com>:

    Fix doc

--

PiperOrigin-RevId: 258867180

* Create __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

c5a4978d

Revert "Change how TF 2 is checked" (#7260) · 2569fa9a
Toby Boyd authored Jul 19, 2019
```
This reverts commit 712f473e.
```
2569fa9a
Fix lint error · 283de38b
guptapriya authored Jul 18, 2019

283de38b
Disable ncf tests for 1.x · 8c8779a3
guptapriya authored Jul 18, 2019

8c8779a3

NCF Keras: Fail early with TF 1.x + dist strat · 41d071ee

guptapriya authored Jul 18, 2019

This combination does not yet work. Fail early with an explicit message instead of throwing an error later on.

41d071ee
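
The fail-early idea in this commit can be illustrated with a minimal sketch that validates the configuration before any model is built. The function name `check_ncf_keras_config` and both of its parameters are hypothetical and are not taken from the NCF code.

```python
def check_ncf_keras_config(tf_major_version, distribution_strategy):
    """Raise immediately if the requested combination is unsupported.

    Both parameters are hypothetical stand-ins: the TensorFlow major version
    as an int, and the strategy name as a string ("off" meaning none).
    """
    if tf_major_version < 2 and distribution_strategy != "off":
        raise ValueError(
            "NCF Keras with a distribution strategy is not supported yet on "
            "TF {}.x (strategy: '{}').".format(
                tf_major_version, distribution_strategy))


# Supported combinations pass silently.
check_ncf_keras_config(2, "mirrored")
check_ncf_keras_config(1, "off")
```

Raising at configuration time gives the user one clear message instead of a failure deep inside training.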

Change how TF 2 is checked · 712f473e

guptapriya authored Jul 18, 2019

The current approach checks for the presence of contrib. Sometimes this is not sufficient (e.g., when testing TF 1 + enable_v2_behavior=True, which is what internal tests currently do).

712f473e
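
The limitation described in this commit message can be sketched without TensorFlow installed. The snippet below uses a stand-in object rather than the real `tensorflow` module, and the attribute `v2_enabled` is a hypothetical flag used only for illustration. It is not the check the commit introduced.

```python
from types import SimpleNamespace


def is_v2_by_contrib(tf_module):
    """Heuristic: treat the absence of `contrib` as a sign of TF 2."""
    return not hasattr(tf_module, "contrib")


def is_v2_by_flag(tf_module):
    """Ask an explicit flag instead (hypothetical attribute `v2_enabled`)."""
    return bool(getattr(tf_module, "v2_enabled", False))


# Stand-in for TF 1 with v2 behavior switched on: contrib still exists.
tf1_with_v2_behavior = SimpleNamespace(contrib=object(), v2_enabled=True)

print(is_v2_by_contrib(tf1_with_v2_behavior))  # False: the heuristic misses it
print(is_v2_by_flag(tf1_with_v2_behavior))     # True
```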

Merged commit includes the following changes: (#7255) · 32fadf00

Hongkun Yu authored Jul 18, 2019

258881002  by hongkuny<hongkuny@google.com>:

    Fix lint.

--
258874998  by hongkuny<hongkuny@google.com>:

    Internal

--
258872662  by hongkuny<hongkuny@google.com>:

    Fix doc

--
258871624  by hongkuny<hongkuny@google.com>:

    Internal change

PiperOrigin-RevId: 258881002

32fadf00

18 Jul, 2019 3 commits

Merged commit includes the following changes: (#7252) · 1fb34e76

Hongkun Yu authored Jul 18, 2019

258597234  by rxsang<rxsang@google.com>:

    Update all the TPUStrategy examples to use the new v2 APIs, i.e.
    make_dataset_iterator -> experimental_distribute_dataset,
    make_input_fn_iterator -> experimental_distribute_datasets_from_function,
    unwrap -> experimental_local_results,
    experimental_run -> experimental_run_v2

--
258581998  by taylorrobie<taylorrobie@google.com>:

    Update keras v2 optimizers to reuse coefficients which are shared across all updates, which reduces the total number of ops created by between 5% (for simple optimizers such as SGD and Adagrad) and 25% (for complicated optimizers such as Adam and NAdam). Separate copies are made for each device and dtype.

    The effect of this change on run time is fairly minimal since Grappler is expected to consolidate most of these ops; however it does improve graph construction time.

--

PiperOrigin-RevId: 258597234

1fb34e76
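
The TPUStrategy renames listed in change 258597234 can be captured as a lookup table. The sketch below rewrites method names in a source string. The helper `rename_calls` is hypothetical, and it handles names only: any differences in arguments or return values between the old and new methods still have to be addressed by hand at each call site.

```python
import re

# Old tf.distribute method names mapped to their replacements, as listed above.
V2_RENAMES = {
    "make_dataset_iterator": "experimental_distribute_dataset",
    "make_input_fn_iterator": "experimental_distribute_datasets_from_function",
    "unwrap": "experimental_local_results",
    "experimental_run": "experimental_run_v2",
}

# Match whole method names that follow a dot, longest names first.
_PATTERN = re.compile(
    r"\.(" + "|".join(sorted(V2_RENAMES, key=len, reverse=True)) + r")\b")


def rename_calls(source):
    """Rewrite old method names in a source string (hypothetical helper)."""
    return _PATTERN.sub(lambda m: "." + V2_RENAMES[m.group(1)], source)


print(rename_calls("it = strategy.make_dataset_iterator(ds)"))
# it = strategy.experimental_distribute_dataset(ds)
```

The word-boundary anchor keeps the rewrite idempotent: an existing `experimental_run_v2` call is left unchanged.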

Refactor and add benchmarks as well as accuracy tests for GPU and CPU (#7248) · e0a2b8c3

Toby Boyd authored Jul 18, 2019

* Added benchmarks and common flags.

* Add cpu tests.

* Add tracking epoch times.

* fix transformer.

* Add examples_per_second.

* fix pylint

e0a2b8c3

Improve Keras graph performance for ResNet56 (#7241) · dd5a91d3

Haoyu Zhang authored Jul 18, 2019

* Config threadpool, cuDNN persistent BN, and grappler layout optimizer properly for ResNet56

* Add tweaked tests for Resnet56

* Avoid triggering the last partial batch overhead by explicitly dropping remainder

dd5a91d3
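
The last item in this commit refers to dropping the final partial batch. In `tf.data`, `Dataset.batch` accepts a `drop_remainder` argument for this purpose. The pure-Python sketch below shows only the batching behavior, not the ResNet56 input pipeline itself. The function `batch` is an illustrative stand-in.

```python
def batch(items, batch_size, drop_remainder=False):
    """Group items into lists of batch_size, optionally dropping a short tail."""
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


examples = list(range(10))
print([len(b) for b in batch(examples, 4)])                       # [4, 4, 2]
print([len(b) for b in batch(examples, 4, drop_remainder=True)])  # [4, 4]
```

With the remainder dropped, every batch has the same size, so no step has to handle a differently sized input.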

16 Jul, 2019 2 commits

Merged commit includes the following changes: (#7221) · e21dcdd0

Hongkun Yu authored Jul 16, 2019

258208153  by hongkuny<hongkuny@google.com>:

    Adds run_eagerly option for bert.

--

PiperOrigin-RevId: 258208153

e21dcdd0

Ncf perf optimizations for CTL and multi GPU (#7206) · 492f8c92

nnigania authored Jul 16, 2019

* NCF perf changes: (1) exclude the metric layer from the CTL train step; (2) dataset optimization to fix the size of sample_weights, preventing a costly broadcast during loss calculation in the multi-GPU case

492f8c92

15 Jul, 2019 2 commits
- Initial implementation of Shakespeare character LSTM. (#7218) · 395f6d2d
  Bruce Fontaine authored Jul 15, 2019
```
* Initial implementation of Shakespeare character LSTM.

* Fix import order
```
  395f6d2d
- Merged commit includes the following changes: (#7209) · dc8c6ce1
  Hongkun Yu authored Jul 15, 2019
```
257883986  by hongkuny<hongkuny@google.com>:

    Adds tf.summary for bert training

--

PiperOrigin-RevId: 257883986
```
  dc8c6ce1
11 Jul, 2019 5 commits
- Reduce transformer fp16 test to 12 iterations. (#7183) · 81123ebf
  Toby Boyd authored Jul 11, 2019
  
  81123ebf
- Record highest uncased bleu found. (#7196) · 35620eaf
  Toby Boyd authored Jul 11, 2019
```
* Record highest uncased bleu found.

* change to bleu_best_score_iteration
```
  35620eaf
- Add stdev to the Dense layer. (#7189) · fa28535d
  Toby Boyd authored Jul 10, 2019
  
  fa28535d
- Merged commit includes the following changes: (#7191) · 13feba3c
  saberkun authored Jul 10, 2019
```
257314238  by hongkuny<hongkuny@google.com>:

    Creates transformer v2 README.
    Remove contents that are not implemented.

--

PiperOrigin-RevId: 257314238
```
  13feba3c
- Move Keras Hook to use global step to resolve issues across epochs. (#7186) · f4b02d15
  Toby Boyd authored Jul 10, 2019
```
* Move to global_step.

* Hook to use global_step.

* fix comment start step 1 not step 0.

* remove hack used for testing.

* Add docstring.
```
  f4b02d15
09 Jul, 2019 1 commit
- Improve performance for Cifar ResNet benchmarks (#7178) · 2ed43e66
  Haoyu Zhang authored Jul 09, 2019
```
* Improve performance for Cifar ResNet benchmarks

* Revert batch size changes to benchmarks
```
  2ed43e66
08 Jul, 2019 2 commits
- Reorder and then add CTL XLA tests. (#7169) · 18e477c6
  Toby Boyd authored Jul 08, 2019
  
  18e477c6
- Reduce iterations from 20 to 12 and add FP16 dynamic. (#7168) · cf1a276a
  Toby Boyd authored Jul 08, 2019
```
* reduce iterations from 20 to 12.

* add fp16 dynamic batch accuracy check.

* fix existing lint issue.
```
  cf1a276a
03 Jul, 2019 1 commit

Unit tests pass TF 2.0 GPU and CPU locally. (#7101) · 49097655

Toby Boyd authored Jul 03, 2019

* Fix unit tests failures.

* 96% of TF 2.0 tests on GPU are passing.

* Currently all passing GPU and CPU TF 2.0

* Address code comments.

* use tf 2.0 cast.

* Comment about working on TF 2.0 CPU

* Uses contrib turn off for TF 2.0.

* Fix wide_deep and add keras_common_tests.

* use context to get num_gpus.

* Switch to tf.keras.metrics

49097655

02 Jul, 2019 3 commits

Merged commit includes the following changes: (#7141) · 5175b7e6

saberkun authored Jul 02, 2019

256204636  by hongkuny<hongkuny@google.com>:

    Internal

--
256079834  by hongkuny<hongkuny@google.com>:

    Clean up: move common flags together for further refactoring
    Enable steps_per_loop option for all applications.

--

PiperOrigin-RevId: 256204636

5175b7e6

Add StepCounterHook to hooks_helper.py (#7134) · 8155eb9d
Yuefeng Zhou authored Jul 02, 2019
```
* Add StepCounterHook to hooks_helper.py

* Update symbol.
```
8155eb9d
Allow distribution_utils.py to work with PSStrategy or none strategy (#7135) · 680eb35c
Yuefeng Zhou authored Jul 02, 2019
```
when there are multiple workers.
```
680eb35c

28 Jun, 2019 4 commits

Add FP16 end-to-end tests (#7122) · 58a3de6c
Toby Boyd authored Jun 28, 2019

58a3de6c

NCF CTL Perf optimization to convert gradients from sparse to dense (#7102) · 44ff121d

nnigania authored Jun 28, 2019

* borrowing a tf1.x optimization which converts gradients from sparse to dense for better perf

* cleanup after code review

44ff121d
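
The sparse-to-dense conversion mentioned in this commit can be sketched in plain Python. In TensorFlow, sparse gradients are typically represented as `tf.IndexedSlices`. The function `sparse_to_dense` below is a hypothetical stand-in that shows only the scatter-add semantics, not the NCF implementation.

```python
def sparse_to_dense(indices, values, num_rows):
    """Scatter-add sparse row updates into a dense list of rows."""
    width = len(values[0]) if values else 0
    dense = [[0.0] * width for _ in range(num_rows)]
    for row, update in zip(indices, values):
        for col, v in enumerate(update):
            dense[row][col] += v  # duplicate indices accumulate
    return dense


# Rows 0 and 2 of a 4-row embedding receive gradients; row 2 appears twice.
grad = sparse_to_dense([0, 2, 2], [[1.0, 1.0], [0.5, 0.0], [0.5, 2.0]], 4)
print(grad)  # [[1.0, 1.0], [0.0, 0.0], [1.0, 2.0], [0.0, 0.0]]
```

The dense form uses more memory but has a fixed shape, which is the trade-off the commit accepts for better performance.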

Merged commit includes the following changes: (#7119) · 5afa9569

saberkun authored Jun 27, 2019

* Merged commit includes the following changes:
255493073  by hongkuny<hongkuny@google.com>:

    BERT initial OSS readme update.

--
255470372  by dmchen<dmchen@google.com>:

    Slightly expand expected range for F1 score in BERT SQuAD accuracy test

--
255109240  by hongkuny<hongkuny@google.com>:

    Update eval/predict batch sizes.

--
255010016  by hongkuny<hongkuny@google.com>:

    Internal

--
254874613  by hongkuny<hongkuny@google.com>:

    Update glue tasks enum to match directory name

--
254866171  by taylorrobie<taylorrobie@google.com>:

    Internal change

254785517  by zongweiz<zongweiz@google.com>:

    Use train_single_step for BERT GPU models to temporarily work around some performance bugs in GPU runs

--
254497647  by hongkuny<hongkuny@google.com>:

    Fix device placement for TPU export model.

--

PiperOrigin-RevId: 255493073

* Update README.md

5afa9569

Merged commit includes the following changes: (#7116) · 76995053

David M. Chen authored Jun 27, 2019

255493073  by hongkuny<hongkuny@google.com>:

    BERT initial OSS readme update.

--
255470372  by dmchen<dmchen@google.com>:

    Slightly expand expected range for F1 score in BERT SQuAD accuracy test

--
255109240  by hongkuny<hongkuny@google.com>:

    Update eval/predict batch sizes.

--
255010016  by hongkuny<hongkuny@google.com>:

    Internal

--

PiperOrigin-RevId: 255493073

76995053

25 Jun, 2019 1 commit

Merged commit includes the following changes: (#7100) · a156e203

saberkun authored Jun 25, 2019

254874613  by hongkuny<hongkuny@google.com>:

    Update glue tasks enum to match directory name

--
254866171  by taylorrobie<taylorrobie@google.com>:

    Internal change

PiperOrigin-RevId: 254874613

a156e203

24 Jun, 2019 1 commit

Merged commit includes the following changes: (#7093) · 240623ac

saberkun authored Jun 24, 2019

254785517  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Use train_single_step for BERT GPU models to temporarily work around some performance bugs in GPU runs

--
254497647  by hongkuny<hongkuny@google.com>:

    Fix device placement for TPU export model.

--

PiperOrigin-RevId: 254785517

240623ac