Commits · 9485aa1dcf2c2e1ebaede0af0283796dcfb01911 · ModelZoo / ResNet50_tensorflow

30 Apr, 2021 2 commits
- Disabling Tensorboard profiling for NCF. · eff1856a
  A. Unique TensorFlower authored Apr 29, 2021
```
PiperOrigin-RevId: 371256980
```
  eff1856a
- Disabling Tensorboard profiling for NCF. · 6b695ca6
  A. Unique TensorFlower authored Apr 29, 2021
```
PiperOrigin-RevId: 371256980
```
  6b695ca6
10 Mar, 2021 2 commits
- Internal change · 5df0cd30
  Frederick Liu authored Mar 10, 2021
```
PiperOrigin-RevId: 362075728
```
  5df0cd30
- Internal change · cca677c9
  Frederick Liu authored Mar 10, 2021
```
PiperOrigin-RevId: 362075728
```
  cca677c9
12 Aug, 2020 2 commits
- Internal change · 999fae62
  Hongkun Yu authored Aug 12, 2020
```
PiperOrigin-RevId: 326286926
```
  999fae62
- Internal change · 88253ce5
  Hongkun Yu authored Aug 12, 2020
```
PiperOrigin-RevId: 326286926
```
  88253ce5
19 May, 2020 1 commit
- [Clean up] Remove enable_eager in the session config: Model garden is TF2 only now. · c2666cea
  Hongkun Yu authored May 19, 2020
```
Remove is_v2_0

PiperOrigin-RevId: 312336907
```
  c2666cea
31 Mar, 2020 1 commit
- Move NCF estimator to R1. · 6d7030f2
  Hongkun Yu authored Mar 30, 2020
```
PiperOrigin-RevId: 303897691
```
  6d7030f2
24 Feb, 2020 1 commit
- Use unittest.mock as we are py3 now · 4b8f80c3
  Hongkun Yu authored Feb 24, 2020
```
PiperOrigin-RevId: 296944580
```
  4b8f80c3
16 Oct, 2019 1 commit

Add support for the tf.keras.mixed_precision API in NCF · cb913691

Reed Wanderman-Milne authored Oct 16, 2019

To test, I did 50 fp32 runs and 50 fp16 runs. I used the following command:

python ncf_keras_main.py --dataset=ml-20m --num_gpus=1 --train_epochs=10 --clean --batch_size=99000 --learning_rate=0.00382059 --beta1=0.783529 --beta2=0.909003 --epsilon=1.45439e-7 --layers=256,256,128,64 --num_factors=64 --hr_threshold=0.635 --ml_perf --nouse_synthetic_data --data_dir ~/ncf_data_dir_python3 --model_dir ~/tmp_model_dir --keras_use_ctl

For the fp16 runs, I added --dtype=fp16. The average hit-rate for both fp16 and fp32 was 0.6365. I also did 50 runs with the mixed precision graph rewrite, and the average hit-rate was 0.6363. The difference is likely due to noise.
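
The comparison described above could be scripted roughly as follows. This is only a sketch: the flag values are copied from the command in the commit message, while the helper name `build_ncf_command` and the idea of assembling the runs from Python are assumptions, not part of the commit.

```python
# Flags copied verbatim from the command in the commit message above.
BASE_FLAGS = [
    "--dataset=ml-20m", "--num_gpus=1", "--train_epochs=10", "--clean",
    "--batch_size=99000", "--learning_rate=0.00382059", "--beta1=0.783529",
    "--beta2=0.909003", "--epsilon=1.45439e-7", "--layers=256,256,128,64",
    "--num_factors=64", "--hr_threshold=0.635", "--ml_perf",
    "--nouse_synthetic_data", "--data_dir", "~/ncf_data_dir_python3",
    "--model_dir", "~/tmp_model_dir", "--keras_use_ctl",
]


def build_ncf_command(use_fp16):
    """Hypothetical helper: return the argv list for one training run."""
    cmd = ["python", "ncf_keras_main.py"] + BASE_FLAGS
    if use_fp16:
        cmd.append("--dtype=fp16")  # the only difference for the fp16 runs
    return cmd


fp32_cmd = build_ncf_command(use_fp16=False)
fp16_cmd = build_ncf_command(use_fp16=True)
```

If these lists were passed to `subprocess.run` without a shell, the `~` in the two directory arguments would need to be expanded first, for example with `os.path.expanduser`.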

PiperOrigin-RevId: 275059871

cb913691

19 Aug, 2019 1 commit

Do not expose --max_train_steps in models that do not use it. · 824ff2d6

Reed Wanderman-Milne authored Aug 19, 2019

Only the V1 resnet model uses --max_train_steps. This change stops exposing the flag in the keras_application_models, mnist, keras resnet, and CTL resnet models. Before this change, those models accepted the flag but ignored it.

I also removed the "max_train" argument from the run_synthetic function, since this only had any meaning for the V1 resnet model. Instead, the V1 resnet model now directly passes --max_train_steps=1 to run_synthetic.

PiperOrigin-RevId: 264269836

824ff2d6

12 Aug, 2019 1 commit

Merged commit includes the following changes: (#7430) · 03b4a0af

Hongjun Choi authored Aug 12, 2019

262988559  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Enable NCF TF 2.0 model to run on TPUStrategy.

--
262971756  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

262967691  by hongkuny<hongkuny@google.com>:

    Internal

--

PiperOrigin-RevId: 262988559

03b4a0af

19 Jul, 2019 1 commit
- Disable ncf tests for 1.x · 8c8779a3
  guptapriya authored Jul 18, 2019
  
  8c8779a3
03 Jul, 2019 1 commit

Unit tests pass TF 2.0 GPU and CPU locally. (#7101) · 49097655

Toby Boyd authored Jul 03, 2019

* Fix unit tests failures.

* 96% of TF 2.0 tests on GPU are passing.

* Currently all passing GPU and CPU TF 2.0

* Address code comments.

* use tf 2.0 cast.

* Comment about working on TF 2.0 CPU

* Uses contrib turn off for TF 2.0.

* Fix wide_deep and add keras_common_tests.

* use context to get num_gpus.

* Switch to tf.keras.metrics

49097655

13 Jun, 2019 2 commits
- fix ctl case; add check for 2.0 · f6f04066
  guptapriya authored Jun 11, 2019
  
  f6f04066
- Add more tests and benchmarks to cover no dist strat and ctl cases · 8f44de85
  guptapriya authored Jun 11, 2019
  
  8f44de85
20 Apr, 2019 1 commit

Remove contrib imports, or move them inline (#6591) · 8ff9eb54

Shining Sun authored Apr 19, 2019

* Remove contrib imports, or move them inline

* Use exposed API for FixedLenFeature

* Replace tf.logging with absl logging

* Change GFile to v2 APIs

* replace tf.logging with absl logging in movielens

* Fixing an import bug

* Change gfile to v2 APIs in code

* Swap to keras optimizer v2

* Bug fix for optimizer

* Change tf.log to tf.keras.backend.log

* Change the loss function to keras loss

* convert another loss to keras loss

* Resolve comments and fix lint

* Add a doc string

* Fix existing tests and add new tests for DS

* Added tests for multi-replica

* Fix lint

* resolve comments

* make estimator run in tf2.0

* use compat v1 loss

* fix lint issue

8ff9eb54

28 Mar, 2019 1 commit

Added benchmark test and convergence test for the NCF model (#6318) · 4c11b84b

Shining Sun authored Mar 28, 2019

* initial commit

* bug fix

* Move build_stats from common to keras main, because it is only applicable in keras

* remove trailing blank line

* add test for synth data

* add kwargs to init

* add kwargs to function invocation

* correctly pass kwargs

* debug

* debug

* debug

* fix super init

* bug fix

* fix local_flags

* fix import

* bug fix

* fix log_steps flag

* bug fix

* bug fix: add missing return value

* resolve double-defined flags

* lint fix

* move log_steps flag to benchmark flag

* fix lint

* lint fix

* lint fix

* try flag core default values

* bug fix

* bug fix

* bug fix

* debug

* debug

* remove debug prints

* rename benchmark methods

* flag bug fix for synth benchmark

4c11b84b

13 Mar, 2019 1 commit

Fix ncf test for keras (#6355) · dadc4a62

Shining Sun authored Mar 13, 2019

* Fix ncf test for keras

* add a todo for batch_size and eval_batch_size for ncf keras

* lint fix

* fix typos

* Lint fix

* fix lint

* resolve pr comment

* resolve pr comment

dadc4a62

02 Mar, 2019 1 commit
- fix resnet breakage and add keras end-to-end tests (#6295) · 8367cf6d
  Taylor Robie authored Mar 02, 2019
```
* fix resnet breakage and add keras end-to-end tests

* delint

* address PR comments
```
  8367cf6d
01 Mar, 2019 1 commit

Keras-fy NCF Model (#6092) · 048e5bff

Shining Sun authored Mar 01, 2019

* tmp commit

* tmp commit

* first attempt (without eval)

* Bug fixes

* bug fixes

* training done

* Loss NAN, no eval

* Loss weight problem solved

* resolve the NAN loss problem

* Problem solved. Clean up needed

* Added a todo

* Remove debug prints

* Extract get_optimizer to ncf_common

* Move metrics computation back to neumf; use DS.scope api

* Extract DS.scope code to utils

* lint fixes

* Move obtaining DS above producer.start to avoid race condition

* move pt 1

* move pt 2

* Update the run script

* Wrap keras_model related code into functions

* Update the doc for softmax_logitfy and change the method name

* Resolve PR comments

* working version with: eager, DS, batch and no masks

* Remove git conflict indicator

* move reshape to neumf_model

* working version, not converge

* converged

* fix a test

* more lint fix

* more lint fix

* more lint fixes

* more lint fix

* Removed unused imports

* fix test

* dummy commit for kicking off checks

* fix lint issue

* dummy input to kick off checks

* dummy input to kick off checks

* add collective to dist strat

* addressed review comments

* add a doc string

048e5bff

07 Jan, 2019 2 commits

address PR comments · 1bb074b0

Taylor Robie authored Jan 07, 2019

1bb074b0

rough pass at carving out existing NCF pipeline · c5ff4ec7

Taylor Robie authored Nov 18, 2018

2nd half of rough replacement pass

fix dataset map functions

reduce bias in sample selection

cache pandas work on a daily basis

cleanup and fix batch check for multi gpu

multi device fix

fix treatment of eval data padding

print data producer

replace epoch overlap with padding and masking

move type and shape info into the producer class and update run.sh with larger batch size hyperparams

remove xla for multi GPU

more cleanup

remove model runner altogether

bug fixes

address subtle pipeline hang and improve producer __repr__

fix crash

fix assert

use popen_helper to create pools

add StreamingFilesDataset and abstract data storage to a separate class

bug fix

fix wait bug and add manual stack trace print

more bug fixes and refactor valid point mask to work with TPU sharding

misc bug fixes and adjust dtypes

address crash from decoding bools

fix remaining dtypes and change record writer pattern since it does not append

fix synthetic data

use TPUStrategy instead of TPUEstimator

minor tweaks around moving to TPUStrategy

cleanup some old code

delint and simplify permutation generation

remove low level tf layer definition, use single table with slice for keras, and misc fixes

missed minor point on removing tf layer definition

fix several bugs from recombining layer definitions

delint and add docstrings

Update ncf_test.py. Section for identical inputs and different outputs was removed.

update data test to run against the new producer class

c5ff4ec7

03 Nov, 2018 1 commit

Have async process end when all data is written. (#5652) · 424fe9f6

Reed authored Nov 02, 2018

I've noticed sometimes the async process's pool processes do not die when ncf_main.py ends and kills the async process. This commit fixes the issue.

424fe9f6

01 Nov, 2018 1 commit
- Add --use_while_loop option. (#5653) · 826eea75
  Reed authored Nov 01, 2018
  
  826eea75
29 Oct, 2018 1 commit
- Add option to not use estimator. (#5623) · 0c0860ed
  Reed authored Oct 29, 2018
```
The option is --nouse_estimator
```
  0c0860ed
26 Oct, 2018 1 commit

Split --ml_perf into two flags. (#5615) · 4298c3a3

Reed authored Oct 26, 2018

--ml_perf now just changes the model to make it MLPerf compliant. --output_ml_perf_compliance_logging adds the MLPerf compliance logs.
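
The split can be pictured with a minimal sketch. It uses the standard-library `argparse` rather than the absl flags the repository relies on, and the help strings are paraphrased from the commit message, so treat it as an illustration of two independent boolean flags and not as the actual flag definitions.

```python
import argparse

parser = argparse.ArgumentParser()
# Model behaviour only: make the model MLPerf compliant.
parser.add_argument("--ml_perf", action="store_true",
                    help="Change the model to be MLPerf compliant.")
# Logging only: emit the MLPerf compliance logs.
parser.add_argument("--output_ml_perf_compliance_logging",
                    action="store_true",
                    help="Add the MLPerf compliance logs.")

# The two behaviours can now be switched on separately.
args = parser.parse_args(["--ml_perf"])
model_only = (args.ml_perf, args.output_ml_perf_compliance_logging)
```

Here `model_only` records that the model change is requested while compliance logging stays off.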

4298c3a3

03 Oct, 2018 1 commit

Move evaluation to .evaluate() (#5413) · c494582f

Taylor Robie authored Oct 02, 2018

* move evaluation from numpy to tensorflow

fix syntax error

don't use sigmoid to convert logits. there is too much precision loss.

WIP: add logit metrics

continue refactor of NCF evaluation

fix syntax error

fix bugs in eval loss calculation

fix eval loss reweighting

remove numpy based metric calculations

fix logging hooks

fix sigmoid to softmax bug

fix comment

catch rare PIPE error and address some PR comments

* fix metric test and address PR comments

* delint and fix python2

* fix test and address PR comments

* extend eval to TPUs

c494582f
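
The remark in this commit about sigmoid losing precision can be illustrated with a small sketch. It uses plain Python floats (float64), not TensorFlow tensors, and the logit values are made up for illustration. Two clearly different logits map to the same probability once the sigmoid saturates, so a ranking computed on probabilities can no longer tell the items apart.

```python
import math


def sigmoid(x):
    """Plain logistic function on a Python float (float64)."""
    return 1.0 / (1.0 + math.exp(-x))


# Illustrative logits (assumed values, not taken from the model).
logit_a, logit_b = 40.0, 50.0

# The logits are strictly ordered...
logits_ordered = logit_a < logit_b

# ...but both sigmoids round to exactly 1.0, so the ordering is lost.
probs_tied = sigmoid(logit_a) == sigmoid(logit_b)
```

In float32 the saturation sets in at much smaller logits, which is one plausible reading of why the commit ranks on logits directly.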

22 Aug, 2018 1 commit

Fix convergence issues for MLPerf. (#5161) · 64710c05

Reed authored Aug 22, 2018

* Fix convergence issues for MLPerf.

Thank you to @robieta for helping me find these issues, and for providing an algorithm for the `get_hit_rate_and_ndcg_mlperf` function.

This change causes every forked process to set a new seed, so that forked processes do not generate the same set of random numbers. This improves evaluation hit rates.
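
The seeding problem can be sketched in a single process. This is a simulation under stated assumptions, not the code from the commit: the "workers" are ordinary generator objects from the standard-library `random` module, and the seed scheme (a base seed plus a worker index) is hypothetical.

```python
import random


def draw(rng, n=5):
    """Draw n pseudo-random integers from the given generator."""
    return [rng.randrange(1000) for _ in range(n)]


parent = random.Random(0)

# Simulate two forked workers that inherit the parent's generator state
# unchanged: both produce the same "random" numbers.
inherited = [random.Random(), random.Random()]
for rng in inherited:
    rng.setstate(parent.getstate())
same_draws = [draw(rng) for rng in inherited]

# Simulate the fix: each worker reseeds with a value unique to it
# (here a hypothetical worker index added to a base seed).
reseeded = [random.Random(1234 + worker_id) for worker_id in range(2)]
distinct_draws = [draw(rng) for rng in reseeded]
```

With inherited state the two workers return identical lists, while the reseeded workers return different ones. In a real multi-process pipeline the reseeding would happen inside each forked process.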

Additionally, it adds a flag, --ml_perf, that makes further changes so that the evaluation hit rate can match the MLPerf reference implementation.

I ran 4 times with --ml_perf and 4 times without. Without --ml_perf, the highest hit rates achieved by each run were 0.6278, 0.6287, 0.6289, and 0.6241. With --ml_perf, the highest hit rates were 0.6353, 0.6356, 0.6367, and 0.6353.
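
As a quick check of the figures above, the per-run best hit rates can be averaged. The numbers are taken from this commit message. The variable names are illustrative.

```python
from statistics import mean

# Best hit rate of each run, as reported in the commit message above.
without_ml_perf = [0.6278, 0.6287, 0.6289, 0.6241]
with_ml_perf = [0.6353, 0.6356, 0.6367, 0.6353]

mean_without = mean(without_ml_perf)
mean_with = mean(with_ml_perf)
improvement = mean_with - mean_without

# Every --ml_perf run beats every run without the flag.
no_overlap = min(with_ml_perf) > max(without_ml_perf)

print(f"without --ml_perf: {mean_without:.3f}")
print(f"with --ml_perf:    {mean_with:.3f}")
```

The means come out at roughly 0.627 without the flag and 0.636 with it, a gap of about 0.008, and the two sets of runs do not overlap.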

* fix lint error

* Fix failing test

* Address @robieta's feedback

* Address more feedback

64710c05