Commits · 4909765543ff0c96627161ecc75eec6c309dbdce · ModelZoo / ResNet50_tensorflow

03 Jul, 2019 1 commit

Unit tests pass TF 2.0 GPU and CPU locally. (#7101) · 49097655

Toby Boyd authored Jul 03, 2019

* Fix unit test failures.

* 96% of TF 2.0 tests on GPU are passing.

* Currently all passing GPU and CPU TF 2.0

* Address code comments.

* use tf 2.0 cast.

* Comment about working on TF 2.0 CPU

* Uses contrib turn off for TF 2.0.

* Fix wide_deep and add keras_common_tests.

* use context to get num_gpus.

* Switch to tf.keras.metrics

28 Jun, 2019 1 commit

NCF CTL Perf optimization to convert gradients from sparse to dense (#7102) · 44ff121d

nnigania authored Jun 28, 2019

* borrowing a tf1.x optimization which converts gradients from sparse to dense for better perf

* cleanup after code review
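
The log does not show the optimization itself. As a rough illustration only, assume the sparse gradient arrives as parallel lists of row indices and row values, as it would for an embedding table. Converting it to dense is then a scatter-add into a zero-filled array, with repeated indices summed. The names `densify`, `sparse_indices`, and `sparse_values` are hypothetical and are not taken from the NCF code.

```python
def densify(indices, values, num_rows):
    """Scatter-add sparse row gradients into a dense num_rows x dim gradient.

    Hypothetical helper for illustration; not the code from this commit.
    """
    dim = len(values[0]) if values else 0
    dense = [[0.0] * dim for _ in range(num_rows)]
    for row, grad in zip(indices, values):
        for j, g in enumerate(grad):
            # Rows that appear more than once are accumulated, not overwritten.
            dense[row][j] += g
    return dense


# Row 2 appears twice, so its two gradient slices are summed.
sparse_indices = [2, 0, 2]
sparse_values = [[1.0, 1.0], [0.5, 0.0], [2.0, -1.0]]
dense_grad = densify(sparse_indices, sparse_values, num_rows=4)
```

The dense form costs more memory, since untouched rows are stored as zeros, but it lets later steps use ordinary dense operations. Whether that trade-off pays off depends on the model and hardware.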

24 Jun, 2019 1 commit
- adding 8 gpu test for ncf (#7092) · 1157c738
  nnigania authored Jun 24, 2019
  
21 Jun, 2019 1 commit

NCF XLA and Eager tests with a refactor of resnet flags to make this cleaner. (#7067) · a68f65f8

Toby Boyd authored Jun 21, 2019

* XLA FP32 and first test

* More XLA benchmarks FP32.

* Add eager to NCF and refactor resnet.

* fix v2_0 calls and more flag refactor.

* Remove extra flag args.

* 90 epoch default

* add return

* remove xla not used by estimator.

* Remove duplicate run_eagerly.

* fix flag defaults.

* Remove fp16_implementation flag option.

* Remove stop early on mlperf test.

* remove unneeded args.

* load flags from keras mains.

18 Jun, 2019 1 commit

adding a new perf test for ncf, and changing some names (#7038) · 90f8c43b

nnigania authored Jun 18, 2019

* adding a new perf test for ncf, and changing some names

* Added change to make ncf use the data from the gcp bucket, and removed the need to re-download data more than 1 day old. Reorganized the perf-zero tests

13 Jun, 2019 8 commits
- fix ctl case; add check for 2.0 · f6f04066
  guptapriya authored Jun 11, 2019
  
- Add more tests and benchmarks to cover no dist strat and ctl cases · 8f44de85
  guptapriya authored Jun 11, 2019
  
- fix non strategy case; clean up documentation · bed2745d
  guptapriya authored Jun 11, 2019
  
- Fix early stopping metric check · 9214db1d
  guptapriya authored Jun 11, 2019
  
- use tf.keras.losses instead of tf.losses · 8f6e2547
  guptapriya authored Jun 11, 2019
  
- fix loss scaling · 6da769b1
  guptapriya authored Jun 11, 2019
  
- Clean up unused flags etc · 71c6a697
  guptapriya authored Jun 07, 2019
  
- set tf seed · 7b6c8999
  guptapriya authored Jun 06, 2019
  
05 Jun, 2019 5 commits
- Update min/max threshold for NCF · d01ac976
  guptapriya authored Jun 04, 2019
  
- fix lint · 080347bc
  guptapriya authored Jun 05, 2019
  
- Add the per epoch callback back · dcd76d49
  guptapriya authored Jun 05, 2019
  
- make training input handling in keras fit case the same as CTL case · dbdf712e
  guptapriya authored Jun 05, 2019
  
- scale loss by num replicas · 5b81bb59
  guptapriya authored Jun 05, 2019
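
The commit title "scale loss by num replicas" names a common pattern in data-parallel training. A minimal sketch of the arithmetic follows, assuming every replica sees the same number of examples and that per-replica results are summed afterwards. The names `replica_loss` and `replica_batches` are hypothetical and do not come from the NCF code.

```python
def replica_loss(per_example_losses, num_replicas):
    """Mean loss on one replica, divided by the replica count.

    Hypothetical helper for illustration; assumes equal batch sizes per replica.
    """
    local_mean = sum(per_example_losses) / len(per_example_losses)
    return local_mean / num_replicas


# Two replicas, two examples each (made-up loss values).
replica_batches = [[1.0, 3.0], [2.0, 6.0]]
num_replicas = len(replica_batches)

# Summing the scaled per-replica losses recovers the mean over the global batch.
summed = sum(replica_loss(batch, num_replicas) for batch in replica_batches)
all_losses = [loss for batch in replica_batches for loss in batch]
global_mean = sum(all_losses) / len(all_losses)
```

Without the division, the summed loss (and its gradients) would grow with the number of replicas, which effectively changes the learning rate as replicas are added.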
  
03 Jun, 2019 9 commits
- Add NCF custom training loop benchmark (#6943) · e59ad48f
  guptapriya authored Jun 03, 2019

  * Add CTL benchmark
  * Divide train loss by number of train steps
  * increase num epochs to 10
  * add benchmark for early stopping with CTL
  * remove whitespace
- fix lint issues · 25f13fa9
  guptapriya authored Jun 03, 2019
  
- Address code review comments · 3d2a7e7f
  guptapriya authored Jun 03, 2019
  
- cleanup · d0186041
  guptapriya authored Jun 03, 2019
  
- fix model by making inputs a dict · d7aa51b4
  guptapriya authored Jun 03, 2019
  
- refactor metrics code · 95220449
  guptapriya authored Jun 02, 2019
  
- remove cloning flag · 9511801a
  guptapriya authored Jun 03, 2019
  
- try #1 to fix CTL · f0a8be5d
  guptapriya authored Jun 03, 2019
  
- Add custom loss and metrics to NCF compile/fit version · 70704b94
  guptapriya authored Jun 02, 2019
  
31 May, 2019 2 commits
- Fix internal lint errors (#6937) · 7546a9e3
  Haoyu Zhang authored May 31, 2019
  
- Fix various lint errors (#6934) · ba415414
  Haoyu Zhang authored May 31, 2019

  * Fix various lint errors
  * Fix logging format
29 May, 2019 1 commit

Add flag to use custom training loop for keras NCF model. (#6905) · b5a69819

Bruce Fontaine authored May 28, 2019

* Add flag to use custom training loop for keras NCF model.

* Add error check to NCF model for custom training loop + tf1.0.

28 May, 2019 3 commits
- Add a custom training loop for NCF model with TF2.0 (#6899) · 4c1d95cc
  Bruce Fontaine authored May 28, 2019

  * Add a custom training loop for NCF model with TF2.0
  * Fix long line in ncf_keras_main.py
  * Remove dataset repeat when using custom training loop.
- Remove extra time_callback · 372ac40a
  guptapriya authored May 28, 2019
  
- Fix breakage due to early stopping callback · 719eec7b
  guptapriya authored May 28, 2019
  
24 May, 2019 2 commits

Add early stopping logic to ncf keras when desired threshold is met. Also... · 7033c8a2

Priya Gupta authored May 23, 2019

Add early stopping logic to ncf keras when desired threshold is met. Also change the default batch size to match the tuned hyperparams

Merged commit that fixes transformer's predict and eval. (#6874) · b9cab01b

Tian Lin authored May 24, 2019

* Merged commit includes the following changes:
249776315  by tianlin<tianlin@google.com>:

    Internal change

249763206  by tianlin<tianlin@google.com>:

    For TF 2.0 (related to Beam Search), expand cond dims in tf.where(cond, x, y) to make all parameters broadcastable.

--
249392724  by hongkuny<hongkuny@google.com>:

    Internal change

PiperOrigin-RevId: 249776315

* Merged commit includes the following changes:
249823043  by tianlin<tianlin@google.com>:

    Bring back v2 test for predict and eval.

--

PiperOrigin-RevId: 249823043

23 May, 2019 2 commits

NCF Keras: Add validation every epoch · abe9e96a

guptapriya authored May 23, 2019

Adding validation every epoch allows us to view the progress during training instead of having to wait until the last eval. Mostly useful for manual runs.

Change batch size and epochs for NCF benchmarks · e8f97a1d

guptapriya authored May 23, 2019

The current batch size of 160000 does not converge to the desired HR, so we decrease it to 99k, which is known to converge. Tested locally and got to 63.5 at epoch 7. Also decreasing the number of epochs, as I don't see any improvement after epoch 7-8.

15 May, 2019 1 commit

Set the --clone_model_in_keras_dist_strat to None. (#6781) · 2d4cfad0

Igor authored May 15, 2019

* Set the --clone_model_in_keras_dist_strat to None. Remove the separate no_cloning benchmarks and add a couple of cloning ones. Fixes the learning rate schedule to cache its ops per graph.

08 May, 2019 1 commit
- r/tf.random_uniform/tf.random.uniform (#6735) · 9c5253f1
  Toby Boyd authored May 08, 2019
  
29 Apr, 2019 1 commit

Replace per_device with per_replica and PerDevice with PerReplica, because the... · b00783d7

Igor authored Apr 29, 2019

Replace per_device with per_replica and PerDevice with PerReplica, because the PerDevice concept was renamed and doesn't exist anymore. (#6693)

* Replace per_device with per_replica and PerDevice with PerReplica, because the PerDevice concept was renamed and doesn't exist anymore.
