Commits · f42fddee89d96d2b0af102ca440f76087e817495 · ModelZoo / ResNet50_tensorflow

31 May, 2019 2 commits

Merged commit includes the following changes: (#6931) · f42fddee

Hongjun Choi authored May 30, 2019

250779087  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Reduce BERT Perfzero benchmark test training steps.

--

PiperOrigin-RevId: 250779087


Support pure eager execution in ResNet50 (#6929) · f6c2d9f8
Haoyu Zhang authored May 30, 2019
```
* Support pure eager execution in ResNet50

* Use smaller batch size
```

30 May, 2019 2 commits

Merged commit includes the following changes: (#6926) · 15db2195
saberkun authored May 30, 2019
```
250713045  by hongkuny<hongkuny@google.com>:

    TPU util

--

PiperOrigin-RevId: 250713045
```

Merged commit includes the following changes: (#6921) · d76e39e7

Hongjun Choi authored May 29, 2019

250606180  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix BERT benchmark test errors.

--
250589623  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Change BERT benchmark test pretrained checkpoint url.

--
250587892  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix error in BERT custom training loop checkpoint restoration.

--
250577163  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add logic to inject callback that measures performance in BERT custom training
    loop.

--
250529526  by hongkuny<hongkuny@google.com>:

    Internal clean up

--
250428976  by hongkuny<hongkuny@google.com>:

    Internal change

250415383  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add min/max value to BERT classifier benchmark test.

--
250376246  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add benchmark performance tests that run BERT on different numbers of GPUs.

--

PiperOrigin-RevId: 250606180


29 May, 2019 7 commits
- Add tweaked cloning tests (#6916) · ab993a21
  Haoyu Zhang authored May 29, 2019
- Put all python dependencies into one line. (#6870) · e0388cfe
  Marvin Teichmann authored May 29, 2019
```
* Put all python dependencies into one line.

This makes it easier to copy, paste, and install all dependencies at once. In addition, many users have custom setups (virtualenv, conda, etc.), and having the dependencies on one line makes them easy to grab.

* Remove 'sudo' from all pip install commands and adjust troubleshooting section.
```
- Make max_length and static_batch configurable (#6893) · ab1c1dfc
  Zhang Xunkai authored May 29, 2019
```
* Make max_length and static_batch configurable.

* Fix line length.

* Fix incorrect parameters in building eval input.

* Improve comments for readability.
```
- update estimator benchmarks too · e80b385a
  guptapriya authored May 29, 2019
- Reduce max_length to 64 in static_batch cases. · 39638d66
  guptapriya authored May 28, 2019
- fix num_gpus in benchmark · 3bb5dd6c
  guptapriya authored May 28, 2019
- Add flag to use custom training loop for keras NCF model. (#6905) · b5a69819
  Bruce Fontaine authored May 28, 2019
```
* Add flag to use custom training loop for keras NCF model.

* Add error check to NCF model for custom training loop + tf1.0.
```
28 May, 2019 13 commits

Add static batch benchmarks to estimator (#6886) · 383c6e30

guptapriya authored May 28, 2019

* Add static batch benchmarks to estimator

This lets us measure how much static vs. dynamic batching matters.

* change max_length for static_batch tests

* Add flag for max length
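As we understand the transformer's `static_batch` option, static batches are padded to a fixed `max_length`, so every batch has the same shape. Dynamic batches are only as long as their longest example. The sketch below illustrates that difference under these assumptions. `pad_batch` is a hypothetical helper, not code from the repo, and the real input pipeline also groups examples by length, which is not shown.

```python
def pad_batch(seqs, max_length=None, pad_id=0):
    """Pad a batch of token-id lists (illustrative helper, not from the repo).

    Static batching (max_length given): every row has exactly max_length
    entries, so every batch has the same sequence length.
    Dynamic batching (max_length=None): rows are padded only up to the
    longest sequence in this batch, so the length varies between batches.
    """
    if max_length is None:
        # Dynamic: pad only up to the longest sequence in this batch.
        target = max(len(s) for s in seqs)
    else:
        # Static: every batch uses the same fixed length.
        target = max_length
    padded = []
    for s in seqs:
        # Truncation only kicks in for the static case; it keeps the sketch short.
        row = list(s[:target])
        row.extend([pad_id] * (target - len(row)))
        padded.append(row)
    return padded


batch_a = [[5, 6], [7, 8, 9]]
batch_b = [[1], [2, 3, 4, 5, 6]]

# Dynamic batching: the padded length differs between the two batches.
print(len(pad_batch(batch_a)[0]), len(pad_batch(batch_b)[0]))  # 3 5

# Static batching with max_length=64: both batches get the same shape.
print(len(pad_batch(batch_a, 64)[0]), len(pad_batch(batch_b, 64)[0]))  # 64 64
```

Fixed shapes trade wasted padding for predictable tensor shapes. That trade-off is what the static batch benchmarks are meant to measure.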


Make 'off' a string literal. · 3928d481
Igor authored May 28, 2019

Turn dist strat off for 1 GPU benchmarks · 2be9ba5b
guptapriya authored May 28, 2019

Remove assert_broadcastable monkey patch (#6901) · 1d16f473
Haoyu Zhang authored May 28, 2019


Add a custom training loop for NCF model with TF2.0 (#6899) · 4c1d95cc

Bruce Fontaine authored May 28, 2019

* Add a custom training loop for NCF model with TF2.0

* Fix long line in ncf_keras_main.py

* Remove dataset repeat when using custom training loop.


undo shuffle change · df523d91

guptapriya authored May 28, 2019

This is not going to help with the current tf.data semantics, so removing it.

Add distribute strategies to transformer. (#6883) · b9c1d1ca

Igor authored May 28, 2019

* Fixes that make transformer run.

* Remove debug print statements.

* Changed the permissions to 644.

* Fix the rest of the permissions.

* enable static batch in all benchmarks

* Restrict dist strat hack to training mode

For now we will do predict/eval without dist strat, so remove that hack in non training cases.

* Use `inputs` instead of `x` as arg name for call

Keras has different behavior based on whether the inputs are called `inputs` or not. Using `inputs` gives expected behaviors.

* Avoid extra map fn on input in dist strat case

* Update how we handle custom metrics

This new approach works both with and without dist strat; the previous one didn't work with dist strat. We still need to fix that, but this is reasonable in the meantime (b/133724664).

* Update benchmarks

* typo in metrics code

* Revert metrics change

Didn't actually work in the distributed case.

Merged commit includes the following changes: (#6898) · 7af3bd91

Hongjun Choi authored May 28, 2019

250347237  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Fix linting errors in BERT benchmark test.

--
250326131  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

250315593  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

250303528  by haoyuzhang<haoyuzhang@google.com>:

    Add method docstring to fix lint error.

--

PiperOrigin-RevId: 250347237


Use more warmup steps for 96 core tests (#6881) · 8b52cd23

Haoyu Zhang authored May 28, 2019

* Run different numbers of steps on different platforms

* Add new tests for delayed performance measurement


Add shuffle to dataset records · 733a752d
guptapriya authored May 28, 2019
```
This shuffling should help ensure the data is reshuffled each epoch.
```

Fix bug in dataset reader. (#6871) · 9b7b64be

Marvin Teichmann authored May 28, 2019

The ".mat" files loaded by the dataset reader are binary files, so Python 3.7 requires opening them in binary mode ("rb").
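The fix comes down to opening the files in binary mode. Here is a minimal sketch of the difference. It uses a hypothetical `read_mat_bytes` helper and a temporary file as a stand-in for the dataset's `.mat` files. The actual reader code, and whatever parser it hands the bytes to, are not shown in this log.

```python
import os
import tempfile


def read_mat_bytes(path):
    """Return the raw contents of a binary file such as a .mat file (hypothetical helper)."""
    # "rb" hands back bytes unchanged; text mode would try to decode them.
    with open(path, "rb") as f:
        return f.read()


# Stand-in for a .mat file: a short ASCII prefix followed by bytes that are
# not valid UTF-8 (0xFF can never start a UTF-8 sequence).
payload = b"MATLAB" + bytes([0xFF, 0xFE, 0x00, 0x80])
with tempfile.NamedTemporaryFile(suffix=".mat", delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    print(read_mat_bytes(path) == payload)  # True
    try:
        # Text mode decodes the contents and fails on arbitrary binary data.
        with open(path, "r", encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError:
        print("text mode could not decode the file")
finally:
    os.remove(path)
```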


Remove extra time_callback · 372ac40a
guptapriya authored May 28, 2019

Fix breakage due to early stopping callback · 719eec7b
guptapriya authored May 28, 2019


26 May, 2019 1 commit

Merged commit includes the following changes: (#6885) · 6fc642d4

Hongjun Choi authored May 25, 2019

250009207  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Add feature in BERT to write training metrics to a summary file.

--

PiperOrigin-RevId: 250009207


24 May, 2019 7 commits

Merged commit includes the following changes: (#6880) · fa10031d

saberkun authored May 24, 2019

249896208  by hongkuny<hongkuny@google.com>:

    Adds __init__.py

--

PiperOrigin-RevId: 249896208


Add early stopping logic to ncf keras when desired threshold is met. Also... · 7033c8a2

Priya Gupta authored May 23, 2019

Add early stopping logic to ncf keras when the desired threshold is met. Also change the default batch size to match the tuned hyperparams.
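The commit does not show how the threshold check is wired in. As a hedged sketch, the loop below stops as soon as a per-epoch metric reaches a target. `run_with_early_stopping` and `evaluate_epoch` are hypothetical names, and the hit-rate values are made-up placeholders, not results. Inside a Keras `fit` loop, the same effect is commonly achieved from a callback that sets `self.model.stop_training = True`.

```python
def run_with_early_stopping(evaluate_epoch, num_epochs, target_hit_rate):
    """Train for up to num_epochs, stopping once the metric reaches the target.

    evaluate_epoch(epoch) is a hypothetical stand-in for "train one epoch,
    then evaluate" and returns the hit rate (HR) for that epoch.
    Returns the per-epoch hit rates that were actually computed.
    """
    history = []
    for epoch in range(num_epochs):
        hit_rate = evaluate_epoch(epoch)
        history.append(hit_rate)
        if hit_rate >= target_hit_rate:
            break  # threshold met: skip the remaining epochs
    return history


# Made-up per-epoch hit rates standing in for real evaluation results.
fake_hr = [0.40, 0.50, 0.56, 0.58]
print(run_with_early_stopping(lambda e: fake_hr[e], num_epochs=4, target_hit_rate=0.55))
# [0.4, 0.5, 0.56]
```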


Merged commit includes the following changes: (#6879) · 7f9db598

saberkun authored May 24, 2019

249883771  by hongkuny<hongkuny@google.com>:

    Creates a benchmark dir

--

PiperOrigin-RevId: 249883771


Transformer v2 benchmark (#6860) · f2ea2f53

Toby Boyd authored May 24, 2019

* Moved common keras code to utils.

* Initial 1 gpu benchmark

- Aligned flags with resnet example
- removed code/features that are not super useful
- eval as part of train if bleu source/ref provided
- add exp_per_second hook

* Rename benchmark classes, pass batch-size and log_steps.

* fix docstring

* Predict done with checkpoints inline

- perfzero baseclass

* steps not epochs with smoother training loop.

* do not initialize history outside loop.

* 5000 steps between evals, not 500

* estimator to keras.

* remove epochs var.

* use range not xrange.

* 200K steps for 1 gpu

* fix global step


Add a graph optional_next Reset benchmark. (#6876) · 49eaaaf2
rxsang authored May 24, 2019
```
* Add a graph optional_next Reset benchmark.

* Fix lint error.
```
Moved common keras code to utils. (#6859) · 3254cabb
Toby Boyd authored May 24, 2019


Merged commit that fixes transformer's predict and eval. (#6874) · b9cab01b

Tian Lin authored May 24, 2019

* Merged commit includes the following changes:
249776315  by tianlin<tianlin@google.com>:

    Internal change

249763206  by tianlin<tianlin@google.com>:

    For TF 2.0 (related to Beam Search), expand cond dims in tf.where(cond, x, y) to make all parameters broadcastable.

--
249392724  by hongkuny<hongkuny@google.com>:

    Internal change

PiperOrigin-RevId: 249776315

* Merged commit includes the following changes:
249823043  by tianlin<tianlin@google.com>:

    Bring back v2 test for predict and eval.

--

PiperOrigin-RevId: 249823043


23 May, 2019 6 commits

Add a test enabling get_next_as_optional behavior. (#6862) · 92bad0d2

rxsang authored May 23, 2019

* Add a test enabling get_next_as_optional behavior.

* Remove repeated flag.

* Remove trailing space.

* Make the name shorter.

* Fix lint error.

* Refine the benchmark name.


Fix non dist strat case. (#6867) · 68650c42
rxsang authored May 23, 2019


NCF Keras: Add validation every epoch · abe9e96a

guptapriya authored May 23, 2019

Adding validation every epoch allows us to view the progress during training instead of having to wait until the last eval. Mostly useful for manual runs.


Change batch size and epochs for NCF benchmarks · e8f97a1d

guptapriya authored May 23, 2019

The current batch size of 160000 does not converge to the desired HR, so we decrease it to 99k, which is known to converge. Tested locally and reached 63.5 at epoch 7. Also decreasing the number of epochs, as I don't see any improvement after epoch 7-8.

Merged commit includes the following changes: (#6863) · f06b5716

Hongjun Choi authored May 22, 2019

249580533  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

249566870  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Set up BERT benchmark test.

--

PiperOrigin-RevId: 249580533


Add enable_get_next_as_optional flag. (#6858) · 272a2baa

rxsang authored May 22, 2019

* Add enable_get_next_as_optional flag.

* Set enable_get_next_as_optional to strategy.

* Add comments to explain the flag.

* Remove trailing whitespace.

* Remove trailing space.


22 May, 2019 2 commits
- fix lint issues. (#6855) · 3a97b68c
  Toby Boyd authored May 22, 2019
- Merged commit includes the following changes: (#6856) · 85bdf764
  saberkun authored May 22, 2019
```
249500988  by hongkuny<hongkuny@google.com>:

    Lints

--

PiperOrigin-RevId: 249500988
```