Commits · aad413408df304ffcda6c5c034afa558acf31824 · ModelZoo / ResNet50_tensorflow

30 Jul, 2019 1 commit

Merged commit includes the following changes: (#7324) · aad41340

Hongkun Yu authored Jul 29, 2019

260601376  by hongkuny<hongkuny@google.com>:

    reorder Q,K to make TPU faster.

--

PiperOrigin-RevId: 260601376

aad41340

29 Jul, 2019 2 commits

Merged commit includes the following changes: (#7323) · d65af7d8

Hongkun Yu authored Jul 29, 2019

260580119  by hongkuny<hongkuny@google.com>:

    Adds expect_partial()

--

PiperOrigin-RevId: 260580119

d65af7d8

Merged commit includes the following changes: (#7322) · 803f833c

Hongjun Choi authored Jul 29, 2019

260228553  by priyag<priyag@google.com>:

    Enable transformer and NCF official model tests. Also fix some minor issues so that all tests pass with TF 1 + enable_v2_behavior.

--
260043210  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add logic to train NCF model using offline generated data.

--
259778607  by priyag<priyag@google.com>:

    Internal change

259656389  by hongkuny<hongkuny@google.com>:

    Internal change

PiperOrigin-RevId: 260228553

803f833c

26 Jul, 2019 2 commits

Merged commit includes the following changes: (#7309) · 8c7a0e75

Hongkun Yu authored Jul 26, 2019

260060237  by zongweiz<zongweiz@google.com>:

    [BERT SQuAD] Enable mixed precision training

    Add mixed precision training support for the BERT SQuAD model, using the experimental Keras mixed precision API. For numeric stability, use fp32 for layer normalization, dense layers with GELU activation, etc.

--

PiperOrigin-RevId: 260060237

8c7a0e75

Merged commit includes the following changes: (#7307) · 745a06a9

Hongkun Yu authored Jul 25, 2019

260052674  by hongkuny<hongkuny@google.com>:

    Add expect_partial()

--

PiperOrigin-RevId: 260052674

745a06a9

25 Jul, 2019 5 commits
- Add Resnet50 CTL benchmark (pure eager w/ distribution strategy) · b59420dd
  Zongwei Zhou authored Jul 25, 2019
  
  b59420dd
- Merged commit includes the following changes: (#7301) · 53e3adb8
  Hongkun Yu authored Jul 24, 2019
```
259889221  by hongkuny<hongkuny@google.com>:

    Add no ds / xla / eager perfzero tests

--

PiperOrigin-RevId: 259889221
```
  53e3adb8
- Merged commit includes the following changes: (#7298) · 3c5330d8
  Hongkun Yu authored Jul 24, 2019
```
259790197  by hongkuny<hongkuny@google.com>:

    Update pretraining model to match tf1 var names.

--

PiperOrigin-RevId: 259790197
```
  3c5330d8
- Add a high max for MLPerf tests so they are green. (#7295) · 2533c697
  Toby Boyd authored Jul 24, 2019
  
  2533c697
- Additional force_v2 tests. (#7296) · 1c509f19
  Toby Boyd authored Jul 24, 2019
  
  1c509f19

24 Jul, 2019 10 commits
- Returning an object causes the program to exit with a non-zero code. (#7294) · 9fb1a1b6
  Soroush Radpour authored Jul 24, 2019
  
  9fb1a1b6
- NCF benchmark: top_1 to hr_at_10_max. (#7291) · c612d8c7
  Toby Boyd authored Jul 24, 2019
```
* top_1 to hr_at_10_max.

* Call self._run_and_report_benchmark not Super
```
  c612d8c7
- fix flags to force_v2_in_keras_compile (#7287) · d09994b2
  Toby Boyd authored Jul 23, 2019
  
  d09994b2
- Lower MLPerf hr@10 target (#7285) · 829190e6
  Toby Boyd authored Jul 23, 2019
  
  829190e6
- Unskip tests with 1.x · 296d0d3f
  guptapriya authored Jul 18, 2019
  
  296d0d3f
- Remove loss layer test · 3a796b5a
  guptapriya authored Jul 18, 2019
  
  3a796b5a
- Remove loss layer · 67f81649
  guptapriya authored Jul 18, 2019
  
  67f81649
- Update synth data pipeline dtype · ffbada72
  guptapriya authored Jul 18, 2019
  
  ffbada72
- Use add_loss in transformer model · 13cc0f70
  guptapriya authored Jul 18, 2019
  
  13cc0f70
- Merged commit includes the following changes: (#7289) · ab8febd4
  Hongkun Yu authored Jul 23, 2019
```
259649972  by hongkuny<hongkuny@google.com>:

    Update docs.

--
259470074  by hongkuny<hongkuny@google.com>:

    Adds a dedup phase for trainable variables.

--

PiperOrigin-RevId: 259649972
```
  ab8febd4
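Change 259470074 above mentions a dedup phase for trainable variables. The repository's code is not shown on this page, so the following is only a minimal sketch of one way such a phase could work: when layers share weights, the same variable object may be collected more than once, and filtering by object identity keeps the first occurrence. `FakeVariable` and `dedup_by_identity` are hypothetical names used for illustration.

```python
class FakeVariable:
    """Hypothetical stand-in for a framework variable (illustration only)."""

    def __init__(self, name):
        self.name = name


def dedup_by_identity(variables):
    """Returns variables with repeated objects removed, keeping first-seen order.

    Identity (id) is used instead of equality because framework variables
    may overload == or be unhashable.
    """
    seen = set()
    unique = []
    for var in variables:
        if id(var) not in seen:
            seen.add(id(var))
            unique.append(var)
    return unique


shared = FakeVariable("embedding")
head = FakeVariable("classifier")
collected = [shared, head, shared]  # the shared weight was collected twice
unique = dedup_by_identity(collected)
print([v.name for v in unique])
```

Keeping first-seen order matters if gradients are later paired with variables by position.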
23 Jul, 2019 5 commits

Single execution path tests for ResNet50, ResNet56, NCF, and Shakespeare LSTM. (#7276) · 9d8c9aa4

Toby Boyd authored Jul 23, 2019

* Add force_run_distributed tests.

* Added enable_eager

* r/force_run_distributed/force_v2_in_keras_compile

* Adding force_v2 tests and FLAGs.

* Rename method to avoid conflict.

* Add cpu force_v2 tests.

* fix lint, wrap line.

* change to force_v2_in_keras_compile

* Update method name.

* Lower mlperf target to 0.736.

9d8c9aa4

add log_steps with faster logging for 8xGPU. (#7274) · 8390b362
Toby Boyd authored Jul 23, 2019

8390b362

Merged commit includes the following changes: (#7281) · 64d6c094

Hongjun Choi authored Jul 22, 2019

* Merged commit includes the following changes:
259442882  by hongkuny<hongkuny@google.com>:

    Internal

--
259377621  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix NCF serialization/de-serialization logic in NCF input pipeline to use tf.FixedLenFeature instead of raw string/binary decoding.

--
259373183  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Create binary to generate NCF training/evaluation dataset offline.

--
259026454  by isaprykin<isaprykin@google.com>:

    Internal change

258871624  by hongkuny<hongkuny@google.com>:

    Internal change

257285772  by haoyuzhang<haoyuzhang@google.com>:

    Internal change

256202287  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change.

--
254069984  by hongkuny<hongkuny@google.com>:
    Automated rollback of changelist 254060732.

254060732  by yifeif<yifeif@google.com>:
    Automated rollback of changelist 254027750.

254027750  by hongkuny<hongkuny@google.com>:

    Internal change

253118910  by hongkuny<hongkuny@google.com>:

    Internal change

251906769  by hongkuny<hongkuny@google.com>:

    Internal change

251303452  by haoyuzhang<haoyuzhang@google.com>:

    Internal change

PiperOrigin-RevId: 259442882

* Update ncf_keras_main.py

64d6c094

Update lint presubmit to be consistent with tensorflow (#7278) · 609260cd
Hongkun Yu authored Jul 22, 2019
```
Only care about errors and output into an error file.
```
609260cd

Merged commit includes the following changes: (#7277) · 1fc839bc

Hongkun Yu authored Jul 22, 2019

259442882  by hongkuny<hongkuny@google.com>:

    Internal

--
259341546  by mrry<mrry@google.com>:

    Remove DEBUG-level logging from the BERT benchmark.

    This triggers graph serialization and other verbose logging in the TensorFlow runtime, which inflates the execution time.

--
259253185  by hongkuny<hongkuny@google.com>:

    Writes a separate checkpoint for the core model in pretraining.
    Clean up export utils to just take a model as argument.

--
258893811  by hongkuny<hongkuny@google.com>:

    Adds summaries for metrics, allowing metrics inside keras.model.

--
258881002  by hongkuny<hongkuny@google.com>:

    Fix lint.

--
258597234  by rxsang<rxsang@google.com>:

    Update all the TPUStrategy examples to use the new v2 APIs, i.e.
    make_dataset_iterator -> experimental_distribute_dataset,
    make_input_fn_iterator -> experimental_distribute_datasets_from_function,
    unwrap -> experimental_local_results,
    experimental_run -> experimental_run_v2

--
258581998  by taylorrobie<taylorrobie@google.com>:

    Update keras v2 optimizers to reuse coefficients which are shared across all updates, which reduces the total number of ops created by between 5% (for simple optimizers such as SGD and Adagrad) and 25% (for complicated optimizers such as Adam and NAdam). Separate copies are made for each device and dtype.

    The effect of this change on run time is fairly minimal since Grappler is expected to consolidate most of these ops; however it does improve graph construction time.

--
258208153  by hongkuny<hongkuny@google.com>:

    Adds run_eagerly option for bert.

--
257883986  by hongkuny<hongkuny@google.com>:

    Adds tf.summary for bert training

--
256204636  by hongkuny<hongkuny@google.com>:

    Internal

--
256079834  by hongkuny<hongkuny@google.com>:

    Clean up: move common flags together for further refactoring
    Enable steps_per_loop option for all applications.

--
255493073  by hongkuny<hongkuny@google.com>:

    BERT initial OSS readme update.

--
255470372  by dmchen<dmchen@google.com>:

    Slightly expand expected range for F1 score in BERT SQuAD accuracy test

--
255109240  by hongkuny<hongkuny@google.com>:

    Update eval/predict batch sizes.

--
255010016  by hongkuny<hongkuny@google.com>:

    Internal

--
254874613  by hongkuny<hongkuny@google.com>:

    Update glue tasks enum to match directory name

--
254866171  by taylorrobie<taylorrobie@google.com>:

    Internal change

254785517  by zongweiz<zongweiz@google.com>:

    Use train_single_step for BERT GPU models to temporarily work around some performance bugs in GPU runs

--
254497647  by hongkuny<hongkuny@google.com>:

    Fix device placement for TPU export model.

--
254134531  by yuefengz<yuefengz@google.com>:

    Fix a typo in bert_benchmark.py

--
254069984  by hongkuny<hongkuny@google.com>:
    Automated rollback of changelist 254060732.

254061429  by hongkuny<hongkuny@google.com>:

    Use host while loop for training steps.

--
254060732  by yifeif<yifeif@google.com>:
    Automated rollback of changelist 254027750.

254027750  by hongkuny<hongkuny@google.com>:

    Internal change

253850824  by hongkuny<hongkuny@google.com>:

    Improve bert training utils.

--
253818191  by hongkuny<hongkuny@google.com>:

    Update savedmodel export to use new model.save() api.

--
253636854  by dmchen<dmchen@google.com>:

    Run only training in BERT SQuAD performance test

--
253118910  by hongkuny<hongkuny@google.com>:

    Internal change

253113801  by zongweiz<zongweiz@google.com>:

    Internal change

252697519  by dmchen<dmchen@google.com>:

    BERT SQuAD accuracy test

--
252663512  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

--
252647871  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Enable multi worker TPU training for BERT pretraining.

--
252522861  by hongkuny<hongkuny@google.com>:

    Remove export using trained model due to implementation error

--
252156812  by yuefengz<yuefengz@google.com>:

    Fix the callback method name in BERT: replaced on_batch_start with on_batch_begin. Without the fix, it won't work with Keras callbacks.

--
251782065  by dmchen<dmchen@google.com>:

    Internal change

251681245  by hongkuny<hongkuny@google.com>:

    Update bert to use the new tf.distribute APIs

--
251575972  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Remove `steps_per_run` when instantiating TPUStrategy.

--
251325964  by hongkuny<hongkuny@google.com>:

    Improve flags

--
250942274  by tobyboyd<tobyboyd@google.com>:

    Internal change

250779087  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Reduce BERT Perfzero benchmark test training steps.

--
250713045  by hongkuny<hongkuny@google.com>:

    TPU util

--
250606180  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix BERT benchmark test errors.

--
250589623  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Change BERT benchmark test pretrained checkpoint url.

--
250587892  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix error in BERT custom training loop checkpoint restoration.

--
250577163  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add logic to inject callback that measures performance in BERT custom training
    loop.

--
250529526  by hongkuny<hongkuny@google.com>:

    Internal clean up

--
250428976  by hongkuny<hongkuny@google.com>:

    Internal change

250415383  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add min/max value to BERT classifier benchmark test.

--
250376246  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add benchmark performance test to run BERT on multiple numbers of GPUs.

--
250347237  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix linting errors in BERT benchmark test.

--
250326131  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

250315593  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

250303528  by haoyuzhang<haoyuzhang@google.com>:

    Add method docstring to fix lint error.

--
250009207  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add feature in BERT to write training metrics to a summary file.

--
249896208  by hongkuny<hongkuny@google.com>:

    Adds __init__.py

--
249883771  by hongkuny<hongkuny@google.com>:

    Creates a benchmark dir

--
249580533  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

249566870  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Set up BERT benchmark test.

--
249500988  by hongkuny<hongkuny@google.com>:

    Lints

--
249377254  by hongkuny<hongkuny@google.com>:

    Internal change

249373328  by hongkuny<hongkuny@google.com>:

    Clean up tf import

--
249333938  by hongkuny<hongkuny@google.com>:

    Fix tf1 import

--
249325089  by hongkuny<hongkuny@google.com>:

    BERT 2.0

--
249173564  by hongkuny<hongkuny@google.com>:

    Internal change

PiperOrigin-RevId: 259442882

1fc839bc

22 Jul, 2019 1 commit

Add a new sanity check script that is able to only check incremental changes. (#7265) · 6a6c3616

Hongkun Yu authored Jul 22, 2019

* Update pylint.rcfile

* Update pylint.rcfile

* Update pylint.rcfile

* add new sanity check script for lint to replace current lint script.

* Revert "Update pylint.rcfile"

This reverts commit f6036cd7e7c4b9e3eeb47bb56a63927a040a2761.

* Revert "Update pylint.rcfile"

This reverts commit e3af497342e26bbbbecfc8c8f79cb0e24a2ef960.

* Revert "Update pylint.rcfile"

This reverts commit 6136636eee6e90fd191ebbb4ccaa9fb89c0290f4.

* update scripts

* disable trailing-newlines

6a6c3616

21 Jul, 2019 1 commit
- Add a simple signal-based Python callstack sampler for debugging · 830a17ec
  Zongwei Zhou authored Jul 19, 2019
  
  830a17ec
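The commit above adds a signal-based call stack sampler. Its code is not shown on this page; as a rough sketch of the general technique, assuming a Unix system and a single main thread, a timer signal can interrupt the program at a fixed interval and the handler can record the interrupted stack. `StackSampler` and `busy_work` are hypothetical names, not the ones used in the repository.

```python
import collections
import signal
import time
import traceback


class StackSampler:
    """Samples the main thread's call stack on a timer signal (Unix only)."""

    def __init__(self, interval_s=0.01):
        self.interval_s = interval_s
        self.counts = collections.Counter()
        self._old_handler = None

    def _on_signal(self, signum, frame):
        # `frame` is the frame that was executing when the signal arrived.
        stack = traceback.extract_stack(frame)
        self.counts[tuple(entry.name for entry in stack)] += 1

    def start(self):
        self._old_handler = signal.signal(signal.SIGALRM, self._on_signal)
        # Fire first after interval_s, then repeatedly every interval_s.
        signal.setitimer(signal.ITIMER_REAL, self.interval_s, self.interval_s)

    def stop(self):
        signal.setitimer(signal.ITIMER_REAL, 0)  # disable the timer
        signal.signal(signal.SIGALRM, self._old_handler or signal.SIG_DFL)


def busy_work(duration_s):
    """Spins for roughly duration_s seconds so the sampler has work to observe."""
    deadline = time.monotonic() + duration_s
    total = 0
    while time.monotonic() < deadline:
        total += 1
    return total


sampler = StackSampler(interval_s=0.01)
sampler.start()
busy_work(0.2)
sampler.stop()

# Print the most frequently observed stacks, innermost function last.
for stack, hits in sampler.counts.most_common(3):
    print(hits, " > ".join(stack))
```

Because Python runs signal handlers only in the main thread, this approach sees only that thread's stack; it is a debugging aid rather than a full profiler.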
20 Jul, 2019 3 commits
- [Transformer] Use float16 input and output for softmax in mixed-precision training · 448c31b6
  Zongwei Zhou authored Jul 12, 2019
  
  448c31b6
- improved v2 check. · 49b90e86
  Toby Boyd authored Jul 19, 2019
  
  49b90e86
- update v2 check and fix ncf v2 check error logic. · 308c7934
  Toby Boyd authored Jul 19, 2019
  
  308c7934

19 Jul, 2019 9 commits

Merged commit includes the following changes: (#7264) · 6f47c378

Igor authored Jul 19, 2019

259030078  by isaprykin<isaprykin@google.com>:

    Clean up the --clone_model_in_keras_dist_strat from Keras Resnet.

    The cloning flag has been removed.  The current rule is that cloning is only done in graph mode.  That resulted in duplicate benchmarks: eager+no-cloning vs eager+cloning.  I removed eager+cloning ones.

--
259026454  by isaprykin<isaprykin@google.com>:

    Internal change

PiperOrigin-RevId: 259030078

6f47c378

Merged commit includes the following changes: (#7263) · c5a4978d

Jing Li authored Jul 19, 2019

* Merged commit includes the following changes:
258867180  by jingli<jingli@google.com>:

    Add new folders for upcoming reorg in model garden.

--
258893811  by hongkuny<hongkuny@google.com>:

    Adds summaries for metrics, allowing metrics inside keras.model.

--
258893048  by isaprykin<isaprykin@google.com>:

    Remove the `cloning` argument to `compile()`.

    Keras models are distributed by cloning in graph mode and without cloning in eager mode as of the change # 258652546.

--
258881002  by hongkuny<hongkuny@google.com>:

    Fix lint.

--
258874998  by hongkuny<hongkuny@google.com>:

    Internal

--
258872662  by hongkuny<hongkuny@google.com>:

    Fix doc

--

PiperOrigin-RevId: 258867180

* Create __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

c5a4978d

Revert "Change how TF 2 is checked" (#7260) · 2569fa9a

Toby Boyd authored Jul 19, 2019

```
This reverts commit 712f473e.
```

2569fa9a

Fix lint error · 283de38b

guptapriya authored Jul 18, 2019

283de38b

Disable ncf tests for 1.x · 8c8779a3

guptapriya authored Jul 18, 2019

8c8779a3

NCF Keras: Fail early with TF 1.x + dist strat · 41d071ee

guptapriya authored Jul 18, 2019

This combination does not yet work. Fail early with an explicit message instead of throwing an error later on.

41d071ee
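The fail-early idea in the commit above can be sketched as an up-front configuration check. This is an illustration only: `validate_run_config` and its parameters are hypothetical and do not reflect the actual NCF code or its flags.

```python
def validate_run_config(tf_major_version, distribution_strategy):
    """Raises at startup if the (hypothetical) config is an unsupported combination.

    Args:
      tf_major_version: int, major version of the installed TensorFlow.
      distribution_strategy: strategy name, or None when no strategy is used.
    """
    if tf_major_version < 2 and distribution_strategy is not None:
        raise ValueError(
            "NCF Keras with a distribution strategy is not supported on "
            "TF 1.x; use TF 2 or run without a strategy.")


validate_run_config(2, "mirrored")  # supported: returns silently
validate_run_config(1, None)        # supported: returns silently

try:
    validate_run_config(1, "mirrored")
except ValueError as err:
    message = str(err)
    print("Rejected early:", message)
```

Checking before any model or input pipeline is built means the user sees one clear message rather than a failure deep inside training.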

Fix for TF-models #7216: CIFAR-10 tutorial for multi-GPU fails because full... · 97a87f9c

Chris Mattmann authored Jul 18, 2019

Fix for TF-models #7216: CIFAR-10 tutorial for multi-GPU fails because full shape isn't passed to prefetch_queue. Contributed by mattmann. (#7217)

97a87f9c

Change how TF 2 is checked · 712f473e

guptapriya authored Jul 18, 2019

The current approach checks for the presence of contrib. Sometimes this is not sufficient (for example, when testing TF 1 + enable_v2_behavior=True, which is what internal tests currently do).

712f473e
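To illustrate why the contrib check described above can misclassify an environment, here is a small sketch that uses stand-in objects rather than real TensorFlow modules. The `v2_behavior_enabled` attribute and both function names are hypothetical; the commit's actual replacement check is not shown on this page.

```python
from types import SimpleNamespace


def is_v2_by_contrib(tf_module):
    """Heuristic from the message: treat a build without `contrib` as TF 2."""
    return not hasattr(tf_module, "contrib")


def is_v2_by_flag(tf_module):
    """Alternative: consult an explicit (hypothetical) v2-behavior flag."""
    return bool(getattr(tf_module, "v2_behavior_enabled", False))


# Hypothetical stand-ins for three environments; not real TensorFlow modules.
tf1 = SimpleNamespace(contrib=object(), v2_behavior_enabled=False)
tf1_with_v2 = SimpleNamespace(contrib=object(), v2_behavior_enabled=True)
tf2 = SimpleNamespace(v2_behavior_enabled=True)

for name, mod in [("tf1", tf1), ("tf1+v2", tf1_with_v2), ("tf2", tf2)]:
    print(name, is_v2_by_contrib(mod), is_v2_by_flag(mod))
```

The two checks agree for plain TF 1 and TF 2 but disagree for the TF 1 build with v2 behavior enabled, which still ships `contrib`.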

Merged commit includes the following changes: (#7255) · 32fadf00

Hongkun Yu authored Jul 18, 2019

258881002  by hongkuny<hongkuny@google.com>:

    Fix lint.

--
258874998  by hongkuny<hongkuny@google.com>:

    Internal

--
258872662  by hongkuny<hongkuny@google.com>:

    Fix doc

--
258871624  by hongkuny<hongkuny@google.com>:

    Internal change

PiperOrigin-RevId: 258881002

32fadf00

18 Jul, 2019 1 commit

Merged commit includes the following changes: (#7252) · 1fb34e76

Hongkun Yu authored Jul 18, 2019

258597234  by rxsang<rxsang@google.com>:

    Update all the TPUStrategy examples to use the new v2 APIs, i.e.
    make_dataset_iterator -> experimental_distribute_dataset,
    make_input_fn_iterator -> experimental_distribute_datasets_from_function,
    unwrap -> experimental_local_results,
    experimental_run -> experimental_run_v2

--
258581998  by taylorrobie<taylorrobie@google.com>:

    Update keras v2 optimizers to reuse coefficients which are shared across all updates, which reduces the total number of ops created by between 5% (for simple optimizers such as SGD and Adagrad) and 25% (for complicated optimizers such as Adam and NAdam). Separate copies are made for each device and dtype.

    The effect of this change on run time is fairly minimal since Grappler is expected to consolidate most of these ops; however it does improve graph construction time.

--

PiperOrigin-RevId: 258597234

1fb34e76
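The TPUStrategy renames listed in change 258597234 can be captured as a lookup table. The sketch below is a hypothetical helper, not part of the repository, that rewrites method names in source text. It only renames identifiers; the old and new methods may differ in arguments and return values, so each rewritten call site would still need manual review.

```python
import re

# Old -> new method names, copied from the change description.
RENAMES = {
    "make_dataset_iterator": "experimental_distribute_dataset",
    "make_input_fn_iterator": "experimental_distribute_datasets_from_function",
    "unwrap": "experimental_local_results",
    "experimental_run": "experimental_run_v2",
}

# Match `.name` only as a whole identifier, so `.experimental_run_v2`
# is not mistaken for `.experimental_run`.
_PATTERN = re.compile(r"\.(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate(source):
    """Returns `source` with old strategy method names replaced by new ones."""
    return _PATTERN.sub(lambda m: "." + RENAMES[m.group(1)], source)


old_code = "results = strategy.unwrap(strategy.experimental_run(step_fn, it))"
print(migrate(old_code))
```

A single regular-expression pass with a dictionary lookup avoids rewriting a name twice, which chained `str.replace` calls would risk.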