Commits · d09994b26ac8592765be45dc8a8444ead55eb32b · ModelZoo / ResNet50_tensorflow

24 Jul, 2019 8 commits
- fix flags to force_v2_in_keras_compile (#7287) · d09994b2
  Toby Boyd authored Jul 23, 2019
  
  d09994b2
- Lower MLPerf hr@10 target (#7285) · 829190e6
  Toby Boyd authored Jul 23, 2019
  
  829190e6
- Unskip tests with 1.x · 296d0d3f
  guptapriya authored Jul 18, 2019
  
  296d0d3f
- Remove loss layer test · 3a796b5a
  guptapriya authored Jul 18, 2019
  
  3a796b5a
- Remove loss layer · 67f81649
  guptapriya authored Jul 18, 2019
  
  67f81649
- Update synth data pipeline dtype · ffbada72
  guptapriya authored Jul 18, 2019
  
  ffbada72
- Use add_loss in transformer model · 13cc0f70
  guptapriya authored Jul 18, 2019
  
  13cc0f70
- Merged commit includes the following changes: (#7289) · ab8febd4
  Hongkun Yu authored Jul 23, 2019
```
259649972  by hongkuny<hongkuny@google.com>:

    Update docs.

--
259470074  by hongkuny<hongkuny@google.com>:

    Adds a dedup phase for trainable variables.

--

PiperOrigin-RevId: 259649972
```
  ab8febd4
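
The dedup phase from 259470074 is not shown in this log. As a rough illustration only, the sketch below removes repeated objects from a list by identity while keeping first-seen order, assuming the same variable can be reached through more than one layer. `dedup_by_identity` is a hypothetical helper, not the function from the commit.

```python
def dedup_by_identity(variables):
    """Return variables with repeated objects removed, keeping first-seen order.

    Identity (id) is used instead of equality because tensor-like objects
    often overload == for element-wise comparison.
    """
    seen = set()
    unique = []
    for var in variables:
        key = id(var)
        if key not in seen:
            seen.add(key)
            unique.append(var)
    return unique
```

Two distinct objects that happen to compare equal are both kept; only a second reference to the same object is dropped.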
23 Jul, 2019 5 commits

Single execution path tests for ResNet50, ResNet56, NCF, and Shakespeare LSTM. (#7276) · 9d8c9aa4

Toby Boyd authored Jul 23, 2019

* Add force_run_distributed tests.

* Added enable_eager

* r/force_run_distributed/force_v2_in_keras_compile

* Adding force_v2 tests and FLAGs.

* Rename method to avoid conflict.

* Add cpu force_v2 tests.

* fix lint, wrap line.

* change to force_v2_in_keras_compile

* Update method name.

* Lower mlperf target to 0.736.

9d8c9aa4

add log_steps with faster logging for 8xGPU. (#7274) · 8390b362
Toby Boyd authored Jul 23, 2019

8390b362

Merged commit includes the following changes: (#7281) · 64d6c094

Hongjun Choi authored Jul 22, 2019

* Merged commit includes the following changes:
259442882  by hongkuny<hongkuny@google.com>:

    Internal

--
259377621  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix NCF serialization/de-serialization logic in NCF input pipeline to use tf.FixedLenFeature instead of raw string/binary decoding.

--
259373183  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Create binary to generate NCF training/evaluation dataset offline.

--
259026454  by isaprykin<isaprykin@google.com>:

    Internal change

258871624  by hongkuny<hongkuny@google.com>:

    Internal change

257285772  by haoyuzhang<haoyuzhang@google.com>:

    Internal change

256202287  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change.

--
254069984  by hongkuny<hongkuny@google.com>:
    Automated rollback of changelist 254060732.

254060732  by yifeif<yifeif@google.com>:
    Automated rollback of changelist 254027750.

254027750  by hongkuny<hongkuny@google.com>:

    Internal change

253118910  by hongkuny<hongkuny@google.com>:

    Internal change

251906769  by hongkuny<hongkuny@google.com>:

    Internal change

251303452  by haoyuzhang<haoyuzhang@google.com>:

    Internal change

PiperOrigin-RevId: 259442882

* Update ncf_keras_main.py

64d6c094

Update lint presubmit to be consistent with tensorflow (#7278) · 609260cd
Hongkun Yu authored Jul 22, 2019
```
Only care about errors and output into an error file.
```
609260cd

Merged commit includes the following changes: (#7277) · 1fc839bc

Hongkun Yu authored Jul 22, 2019

259442882  by hongkuny<hongkuny@google.com>:

    Internal

--
259341546  by mrry<mrry@google.com>:

    Remove DEBUG-level logging from the BERT benchmark.

    This triggers graph serialization and other verbose logging in the TensorFlow runtime, which inflates the execution time.

--
259253185  by hongkuny<hongkuny@google.com>:

    Writes a separate checkpoint for the core model in pretraining.
    Clean up export utils to just take a model as argument.

--
258893811  by hongkuny<hongkuny@google.com>:

    Adds summaries for metrics, allowing metrics inside keras.model.

--
258881002  by hongkuny<hongkuny@google.com>:

    Fix lint.

--
258597234  by rxsang<rxsang@google.com>:

    Update all the TPUStrategy examples to use the new v2 APIs, i.e.
    make_dataset_iterator -> experimental_distribute_dataset,
    make_input_fn_iterator -> experimental_distribute_datasets_from_function,
    unwrap -> experimental_local_results,
    experimental_run -> experimental_run_v2

--
258581998  by taylorrobie<taylorrobie@google.com>:

    Update keras v2 optimizers to reuse coefficients which are shared across all updates, which reduces the total number of ops created by between 5% (for simple optimizers such as SGD and Adagrad) and 25% (for complicated optimizers such as Adam and NAdam). Separate copies are made for each device and dtype.

    The effect of this change on run time is fairly minimal since Grappler is expected to consolidate most of these ops; however it does improve graph construction time.

--
258208153  by hongkuny<hongkuny@google.com>:

    Adds run_eagerly option for bert.

--
257883986  by hongkuny<hongkuny@google.com>:

    Adds tf.summary for bert training

--
256204636  by hongkuny<hongkuny@google.com>:

    Internal

--
256079834  by hongkuny<hongkuny@google.com>:

    Clean up: move common flags together for further refactoring
    Enable steps_per_loop option for all applications.

--
255493073  by hongkuny<hongkuny@google.com>:

    BERT initial OSS readme update.

--
255470372  by dmchen<dmchen@google.com>:

    Slightly expand expected range for F1 score in BERT SQuAD accuracy test

--
255109240  by hongkuny<hongkuny@google.com>:

    Update eval/predict batch sizes.

--
255010016  by hongkuny<hongkuny@google.com>:

    Internal

--
254874613  by hongkuny<hongkuny@google.com>:

    Update glue tasks enum to match directory name

--
254866171  by taylorrobie<taylorrobie@google.com>:

    Internal change

254785517  by zongweiz<zongweiz@google.com>:

    Use train_single_step for BERT GPU models to temporarily work around some performance bugs in GPU runs

--
254497647  by hongkuny<hongkuny@google.com>:

    Fix device placement for TPU export model.

--
254134531  by yuefengz<yuefengz@google.com>:

    Fix a typo in bert_benchmark.py

--
254069984  by hongkuny<hongkuny@google.com>:
    Automated rollback of changelist 254060732.

254061429  by hongkuny<hongkuny@google.com>:

    Use host while loop for training steps.

--
254060732  by yifeif<yifeif@google.com>:
    Automated rollback of changelist 254027750.

254027750  by hongkuny<hongkuny@google.com>:

    Internal change

253850824  by hongkuny<hongkuny@google.com>:

    Improve bert training utils.

--
253818191  by hongkuny<hongkuny@google.com>:

    Update savedmodel export to use new model.save() api.

--
253636854  by dmchen<dmchen@google.com>:

    Run only training in BERT SQuAD performance test

--
253118910  by hongkuny<hongkuny@google.com>:

    Internal change

253113801  by zongweiz<zongweiz@google.com>:

    Internal change

252697519  by dmchen<dmchen@google.com>:

    BERT SQuAD accuracy test

--
252663512  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

--
252647871  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Enable multi worker TPU training for BERT pretraining.

--
252522861  by hongkuny<hongkuny@google.com>:

    Remove export using trained model due to implementation error

--
252156812  by yuefengz<yuefengz@google.com>:

    Fix the callback method name in BERT: replaced on_batch_start with on_batch_begin. Without the fix, it won't work with Keras callbacks.

--
251782065  by dmchen<dmchen@google.com>:

    Internal change

251681245  by hongkuny<hongkuny@google.com>:

    Update bert to use the new tf.distribute APIs

--
251575972  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Remove `steps_per_run` when instantiating TPUStrategy.

--
251325964  by hongkuny<hongkuny@google.com>:

    Improve flags

--
250942274  by tobyboyd<tobyboyd@google.com>:

    Internal change

250779087  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Reduce BERT Perfzero benchmark test training steps.

--
250713045  by hongkuny<hongkuny@google.com>:

    TPU util

--
250606180  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix BERT benchmark test errors.

--
250589623  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Change BERT benchmark test pretrained checkpoint url.

--
250587892  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix error in BERT custom training loop checkpoint restoration.

--
250577163  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add logic to inject callback that measures performance in BERT custom training
    loop.

--
250529526  by hongkuny<hongkuny@google.com>:

    Internal clean up

--
250428976  by hongkuny<hongkuny@google.com>:

    Internal change

250415383  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add min/max value to BERT classifier benchmark test.

--
250376246  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add benchmark performance test to run BERT on multiple numbers of GPUs.

--
250347237  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix linting errors in BERT benchmark test.

--
250326131  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

250315593  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

250303528  by haoyuzhang<haoyuzhang@google.com>:

    Add method docstring to fix lint error.

--
250009207  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add feature in BERT to write training metrics to a summary file.

--
249896208  by hongkuny<hongkuny@google.com>:

    Adds __init__.py

--
249883771  by hongkuny<hongkuny@google.com>:

    Creates a benchmark dir

--
249580533  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

249566870  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Set up BERT benchmark test.

--
249500988  by hongkuny<hongkuny@google.com>:

    Lints

--
249377254  by hongkuny<hongkuny@google.com>:

    Internal change

249373328  by hongkuny<hongkuny@google.com>:

    Clean up tf import

--
249333938  by hongkuny<hongkuny@google.com>:

    Fix tf1 import

--
249325089  by hongkuny<hongkuny@google.com>:

    BERT 2.0

--
249173564  by hongkuny<hongkuny@google.com>:

    Internal change

PiperOrigin-RevId: 259442882

1fc839bc

22 Jul, 2019 1 commit

Add a new sanity check script that is able to only check incremental changes. (#7265) · 6a6c3616

Hongkun Yu authored Jul 22, 2019

* Update pylint.rcfile

* Update pylint.rcfile

* Update pylint.rcfile

* add new sanity check script for lint to replace current lint script.

* Revert "Update pylint.rcfile"

This reverts commit f6036cd7e7c4b9e3eeb47bb56a63927a040a2761.

* Revert "Update pylint.rcfile"

This reverts commit e3af497342e26bbbbecfc8c8f79cb0e24a2ef960.

* Revert "Update pylint.rcfile"

This reverts commit 6136636eee6e90fd191ebbb4ccaa9fb89c0290f4.

* update scripts

* disable trailing-newlines

6a6c3616
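
The sanity check script itself is not part of this log. A minimal sketch of an incremental lint check could look like the following, assuming the changed files are found with `git diff --name-only` against a base ref. The ref name `origin/master`, the rcfile path, and all function names are assumptions made for illustration.

```python
import subprocess


def changed_files(base_ref="origin/master"):
    """List added, copied, modified, or renamed paths relative to base_ref."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", base_ref],
        check=True, capture_output=True, text=True)
    return result.stdout.splitlines()


def select_python_files(paths):
    """Keep only Python sources; other files are not linted."""
    return [p for p in paths if p.endswith(".py")]


def lint_command(paths, rcfile="pylint.rcfile"):
    """Build a pylint invocation that reports errors only."""
    return ["pylint", "--errors-only", "--rcfile=" + rcfile] + list(paths)
```

A caller would pass `select_python_files(changed_files())` to `lint_command` and run the result with `subprocess.run`, skipping the run when the list is empty.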

21 Jul, 2019 1 commit
- Add a simple signal-based Python callstack sampler for debugging · 830a17ec
  Zongwei Zhou authored Jul 19, 2019
  
  830a17ec
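
The sampler added in 830a17ec is not reproduced here. The sketch below shows the general idea of a signal-based sampler under stated assumptions: a profiling timer delivers `SIGPROF` at a fixed interval, and the handler records the interrupted call stack. The class name, the interval, and the "file:line:function" key format are all hypothetical.

```python
import collections
import signal
import traceback


class CallstackSampler:
    """Counts the call stacks seen each time a timer signal fires."""

    def __init__(self, interval_sec=0.01):
        self.interval_sec = interval_sec
        self.counts = collections.Counter()

    def _on_signal(self, signum, frame):
        # The handler receives the interrupted frame; summarize the stack as
        # "file:line:function" entries from outermost to innermost call.
        stack = traceback.extract_stack(frame)
        key = " > ".join(
            "%s:%d:%s" % (f.filename, f.lineno, f.name) for f in stack)
        self.counts[key] += 1

    def start(self):
        # Unix only; signal handlers must be installed from the main thread.
        signal.signal(signal.SIGPROF, self._on_signal)
        signal.setitimer(
            signal.ITIMER_PROF, self.interval_sec, self.interval_sec)

    def stop(self):
        # Disarm the timer. The handler stays installed, which is harmless
        # once no further signals are generated.
        signal.setitimer(signal.ITIMER_PROF, 0)

    def report(self, top=5):
        """Return the most frequently sampled stacks with their counts."""
        return self.counts.most_common(top)
```

Calling `start()` before a slow section and `report()` after `stop()` gives a rough picture of where CPU time is spent, since `ITIMER_PROF` counts CPU time used by the process rather than wall-clock time.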
20 Jul, 2019 3 commits
- [Transformer] Use float16 input and output for softmax in mixed-precision training · 448c31b6
  Zongwei Zhou authored Jul 12, 2019
  
  448c31b6
- improved v2 check. · 49b90e86
  Toby Boyd authored Jul 19, 2019
  
  49b90e86
- update v2 check and fix ncf v2 check error logic. · 308c7934
  Toby Boyd authored Jul 19, 2019
  
  308c7934
19 Jul, 2019 9 commits

Merged commit includes the following changes: (#7264) · 6f47c378

Igor authored Jul 19, 2019

259030078  by isaprykin<isaprykin@google.com>:

    Clean up the --clone_model_in_keras_dist_strat from Keras Resnet.

    The cloning flag has been removed.  The current rule is that cloning is only done in graph mode.  That resulted in duplicate benchmarks: eager+no-cloning vs eager+cloning.  I removed eager+cloning ones.

--
259026454  by isaprykin<isaprykin@google.com>:

    Internal change

PiperOrigin-RevId: 259030078

6f47c378

Merged commit includes the following changes: (#7263) · c5a4978d

Jing Li authored Jul 19, 2019

* Merged commit includes the following changes:
258867180  by jingli<jingli@google.com>:

    Add new folders for upcoming reorg in model garden.

--
258893811  by hongkuny<hongkuny@google.com>:

    Adds summaries for metrics, allowing metrics inside keras.model.

--
258893048  by isaprykin<isaprykin@google.com>:

    Remove the `cloning` argument to `compile()`.

    Keras models are distributed by cloning in graph mode and without cloning in eager mode as of the change # 258652546.

--
258881002  by hongkuny<hongkuny@google.com>:

    Fix lint.

--
258874998  by hongkuny<hongkuny@google.com>:

    Internal

--
258872662  by hongkuny<hongkuny@google.com>:

    Fix doc

--

PiperOrigin-RevId: 258867180

* Create __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

c5a4978d

Revert "Change how TF 2 is checked" (#7260) · 2569fa9a
Toby Boyd authored Jul 19, 2019
```
This reverts commit 712f473e.
```
2569fa9a
Fix lint error · 283de38b
guptapriya authored Jul 18, 2019

283de38b
Disable ncf tests for 1.x · 8c8779a3
guptapriya authored Jul 18, 2019

8c8779a3

NCF Keras: Fail early with TF 1.x + dist strat · 41d071ee

guptapriya authored Jul 18, 2019

This combination does not yet work. Fail early with an explicit message instead of throwing an error later on.

41d071ee
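
The actual check in 41d071ee is not shown in this log. A minimal sketch of the fail-early pattern, assuming the TensorFlow major version and the strategy name are available as plain values before any model is built, might be written as follows. The function name, its arguments, and the error text are hypothetical.

```python
def validate_run_config(tf_major_version, distribution_strategy):
    """Raise before any model building if the setup is known not to work.

    Both arguments are illustrative: an integer major version and a strategy
    name such as "off" or "mirrored".
    """
    if tf_major_version < 2 and distribution_strategy != "off":
        raise ValueError(
            "NCF Keras does not yet support TF %d.x with a distribution "
            "strategy (got distribution_strategy=%r)." % (
                tf_major_version, distribution_strategy))
```

Running the check at the top of the entry point turns a late, hard-to-read failure into an immediate one that names the unsupported combination.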

Fix for TF-models #7216: CIFAR-10 tutorial for multi-GPU fails because full... · 97a87f9c

Chris Mattmann authored Jul 18, 2019

Fix for TF-models #7216: CIFAR-10 tutorial for multi-GPU fails because full shape isn't passed to prefetch_queue contributed by mattmann. (#7217)

97a87f9c

Change how TF 2 is checked · 712f473e

guptapriya authored Jul 18, 2019

The current approach checks for the presence of contrib. Sometimes this is not sufficient (e.g., when testing TF 1 + enable_v2_behavior=True, which is what internal tests currently do).

712f473e

Merged commit includes the following changes: (#7255) · 32fadf00

Hongkun Yu authored Jul 18, 2019

258881002  by hongkuny<hongkuny@google.com>:

    Fix lint.

--
258874998  by hongkuny<hongkuny@google.com>:

    Internal

--
258872662  by hongkuny<hongkuny@google.com>:

    Fix doc

--
258871624  by hongkuny<hongkuny@google.com>:

    Internal change

PiperOrigin-RevId: 258881002

32fadf00

18 Jul, 2019 7 commits

Merged commit includes the following changes: (#7252) · 1fb34e76

Hongkun Yu authored Jul 18, 2019

258597234  by rxsang<rxsang@google.com>:

    Update all the TPUStrategy examples to use the new v2 APIs, i.e.
    make_dataset_iterator -> experimental_distribute_dataset,
    make_input_fn_iterator -> experimental_distribute_datasets_from_function,
    unwrap -> experimental_local_results,
    experimental_run -> experimental_run_v2

--
258581998  by taylorrobie<taylorrobie@google.com>:

    Update keras v2 optimizers to reuse coefficients which are shared across all updates, which reduces the total number of ops created by between 5% (for simple optimizers such as SGD and Adagrad) and 25% (for complicated optimizers such as Adam and NAdam). Separate copies are made for each device and dtype.

    The effect of this change on run time is fairly minimal since Grappler is expected to consolidate most of these ops; however it does improve graph construction time.

--

PiperOrigin-RevId: 258597234

1fb34e76
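
The optimizer change in 258581998 lives in Keras and is not shown here. As an illustration of the caching idea only, the sketch below builds shared coefficients once per (device, dtype) pair and reuses them for every variable with that pair. The class, its fields, and the use of a Python callable as the "dtype" are assumptions, not the Keras implementation.

```python
class CoefficientCache:
    """Builds per-update coefficients once per (device, dtype) pair."""

    def __init__(self, learning_rate, beta_1=0.9):
        self.learning_rate = learning_rate
        self.beta_1 = beta_1
        self.build_count = 0  # how many times coefficients were constructed
        self._cache = {}

    def _build(self, dtype):
        self.build_count += 1
        # dtype is a Python callable here (e.g. float) standing in for a cast.
        return {
            "lr": dtype(self.learning_rate),
            "one_minus_beta_1": dtype(1.0 - self.beta_1),
        }

    def get(self, device, dtype):
        """Return cached coefficients, building them on first use of the key."""
        key = (device, dtype)
        if key not in self._cache:
            self._cache[key] = self._build(dtype)
        return self._cache[key]
```

With ten variables on one device and one variable on another, the coefficients are built twice instead of eleven times, which is the kind of op-count reduction the change describes.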

Update CODEOWNERS (#7251) · 79b87be6
Jing Li authored Jul 18, 2019

79b87be6

Refactor and add benchmarks as well as accuracy tests for GPU and CPU (#7248) · e0a2b8c3

Toby Boyd authored Jul 18, 2019

* Added benchmarks and common flags.

* Add cpu tests.

* Add tracking epoch times.

* fix transformer.

* Add examples_per_second.

* fix pylint

e0a2b8c3
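
The benchmark code from e0a2b8c3 is not included in this log. A minimal sketch of tracking epoch times and examples per second, assuming a fixed batch size and a logging interval in steps, could look like this. The class name, the hook names, and the injectable `clock` parameter are hypothetical.

```python
import time


class ThroughputTracker:
    """Records examples/sec every `log_steps` batches, plus epoch durations."""

    def __init__(self, batch_size, log_steps, clock=time.time):
        self.batch_size = batch_size
        self.log_steps = log_steps
        self.clock = clock
        self.examples_per_second = []
        self.epoch_times = []
        self._steps = 0
        self._window_start = None
        self._epoch_start = None

    def on_epoch_begin(self):
        self._epoch_start = self.clock()
        if self._window_start is None:
            self._window_start = self._epoch_start

    def on_batch_end(self):
        self._steps += 1
        if self._steps % self.log_steps == 0:
            now = self.clock()
            elapsed = now - self._window_start
            if elapsed > 0:
                self.examples_per_second.append(
                    self.batch_size * self.log_steps / elapsed)
            self._window_start = now

    def on_epoch_end(self):
        self.epoch_times.append(self.clock() - self._epoch_start)
```

Reading the clock only every `log_steps` batches keeps the measurement overhead small, and passing a fake `clock` makes the arithmetic easy to test.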

Fix for #7225: CIFAR-10 eval fails with error TypeError: Input 'predictions'... · 63605b95

Chris Mattmann authored Jul 18, 2019

Fix for #7225: CIFAR-10 eval fails with error TypeError: Input 'predictions' of 'InTopKV2' Op has type float16 that contributed by mattmann. (#7227)

63605b95

Merged commit includes the following changes: (#7250) · 3b9025d5

Yongzhe Wang authored Jul 18, 2019

* Merged commit includes the following changes:
257930561  by yongzhe:

    Mobile LSTD TfLite Client.

--
257928126  by yongzhe:

    Mobile SSD Tflite client.

--
257921181  by menglong:

    Fix discrepancy between pre_bottleneck = {true, false}

--
257561213  by yongzhe:

    File utils.

--
257449226  by yongzhe:

    Mobile SSD Client.

--
257264654  by yongzhe:

    SSD utils.

--
257235648  by yongzhe:

    Proto bazel build rules.

--
256437262  by Menglong Zhu:

    Fix check for FusedBatchNorm op to only verify it as a prefix.

--
256283755  by yongzhe:

    Bazel build and copybara changes.

--
251947295  by yinxiao:

    Add missing interleaved option in checkpoint restore.

--
251513479  by yongzhe:

    Conversion utils.

--
248783193  by yongzhe:

    Branch protos needed for the lstd client.

--
248200507  by menglong:

    Fix proto namespace in example config

--

PiperOrigin-RevId: 257930561

* Delete BUILD

* Merged commit includes the following changes:
258709909  by yongzhe:

    1. Fix a bug where the input wasn't copied.
    2. Change the tensor indexing to support graphs with postprocessing.
    3. Fix a bug where the quantized LSTM states weren't initialized.

--
258398095  by yongzhe:

    Internal change.

--

PiperOrigin-RevId: 258709909

* Adding myself as the code owner

3b9025d5

Improve Keras graph performance for ResNet56 (#7241) · dd5a91d3

Haoyu Zhang authored Jul 18, 2019

* Config threadpool, cuDNN persistent BN, and grappler layout optimizer properly for ResNet56

* Add tweaked tests for Resnet56

* Avoid triggering the last partial batch overhead by explicitly dropping remainder

dd5a91d3
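
The ResNet56 input pipeline is not shown in this log. The effect of dropping the remainder can be sketched in plain Python as below; `tf.data.Dataset.batch` exposes the same behaviour through its `drop_remainder` argument, though whether dd5a91d3 uses exactly that call is an assumption. The helper name `batches` is hypothetical.

```python
def batches(items, batch_size, drop_remainder=False):
    """Yield consecutive batches; optionally skip a short final batch."""
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        if drop_remainder and len(batch) < batch_size:
            return
        yield batch
```

With ten items and a batch size of four, the default yields batches of 4, 4, and 2, while `drop_remainder=True` yields only the two full batches, so every step sees the same batch shape.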

Merged commit includes the following changes: (#7249) · b7221961

Yongzhe Wang authored Jul 18, 2019

* Merged commit includes the following changes:
257930561  by yongzhe:

    Mobile LSTD TfLite Client.

--
257928126  by yongzhe:

    Mobile SSD Tflite client.

--
257921181  by menglong:

    Fix discrepancy between pre_bottleneck = {true, false}

--
257561213  by yongzhe:

    File utils.

--
257449226  by yongzhe:

    Mobile SSD Client.

--
257264654  by yongzhe:

    SSD utils.

--
257235648  by yongzhe:

    Proto bazel build rules.

--
256437262  by Menglong Zhu:

    Fix check for FusedBatchNorm op to only verify it as a prefix.

--
256283755  by yongzhe:

    Bazel build and copybara changes.

--
251947295  by yinxiao:

    Add missing interleaved option in checkpoint restore.

--
251513479  by yongzhe:

    Conversion utils.

--
248783193  by yongzhe:

    Branch protos needed for the lstd client.

--
248200507  by menglong:

    Fix proto namespace in example config

--

P...

b7221961

16 Jul, 2019 3 commits

Merged commit includes the following changes: (#7221) · e21dcdd0

Hongkun Yu authored Jul 16, 2019

258208153  by hongkuny<hongkuny@google.com>:

    Adds run_eagerly option for bert.

--

PiperOrigin-RevId: 258208153

e21dcdd0

Ncf perf optimizations for CTL and multi GPU (#7206) · 492f8c92

nnigania authored Jul 16, 2019

* NCF perf changes: 1) exclude the metric layer from the CTL train step; 2) dataset optimization to fix the size of sample_weights, preventing a costly broadcast during loss calculation in the multi-GPU case

492f8c92

Merged commit includes the following changes: (#7220) · 66d00a87

yongzhe2160 authored Jul 16, 2019

* Merged commit includes the following changes:
257930561  by yongzhe:

    Mobile LSTD TfLite Client.

--
257928126  by yongzhe:

    Mobile SSD Tflite client.

--
257921181  by menglong:

    Fix discrepancy between pre_bottleneck = {true, false}

--
257561213  by yongzhe:

    File utils.

--
257449226  by yongzhe:

    Mobile SSD Client.

--
257264654  by yongzhe:

    SSD utils.

--
257235648  by yongzhe:

    Proto bazel build rules.

--
256437262  by Menglong Zhu:

    Fix check for FusedBatchNorm op to only verify it as a prefix.

--
256283755  by yongzhe:

    Bazel build and copybara changes.

--
251947295  by yinxiao:

    Add missing interleaved option in checkpoint restore.

--
251513479  by yongzhe:

    Conversion utils.

--
248783193  by yongzhe:

    Branch protos needed for the lstd client.

--
248200507  by menglong:

    Fix proto namespace in example config

--

PiperOrigin-RevId: 257930561

* Delete BUILD

66d00a87

15 Jul, 2019 3 commits

Initial implementation of Shakespeare character LSTM. (#7218) · 395f6d2d
Bruce Fontaine authored Jul 15, 2019
```
* Initial implementation of Shakespeare character LSTM.

* Fix import order
```
395f6d2d

Merged commit includes the following changes: (#7209) · dc8c6ce1

Hongkun Yu authored Jul 15, 2019

257883986  by hongkuny<hongkuny@google.com>:

    Adds tf.summary for bert training

--

PiperOrigin-RevId: 257883986

dc8c6ce1

Object detection changes: (#7208) · fe748d4a

pkulzc authored Jul 15, 2019

257914648  by lzc:

    Internal changes

--
257525973  by Zhichao Lu:

    Fixes bug that silently prevents checkpoints from loading when training w/ eager + functions. Also sets up scripts to run training.

--
257296614  by Zhichao Lu:

    Adding detection_features to model outputs

--
257234565  by Zhichao Lu:

    Fix wrong order of `classes_with_max_scores` in class-agnostic NMS caused by
    sorting in partitioned-NMS.

--
257232002  by ronnyvotel:

    Supporting `filter_nonoverlapping` option in np_box_list_ops.clip_to_window().

--
257198282  by Zhichao Lu:

    Adding the focal loss and l1 loss from the Objects as Points paper.

--
257089535  by Zhichao Lu:

    Create Keras based ssd + resnetv1 + fpn.

--
257087407  by Zhichao Lu:

    Make object_detection/data_decoders Python3-compatible.

--
257004582  by Zhichao Lu:

    Updates _decode_raw_data_into_masks_and_boxes to the latest binary masks-to-string encoding fo...

fe748d4a