official/resnet/keras/keras_common.py · c7f29b2a40e5698fb4053d93e9ddde4135f5ba49 · ModelZoo / ResNet50_tensorflow

Replace pipeline in NCF (#5786) · 56cbd1f2
Taylor Robie authored Jan 08, 2019
* rough pass at carving out existing NCF pipeline

2nd half of rough replacement pass

fix dataset map functions

reduce bias in sample selection

cache pandas work on a daily basis

cleanup and fix batch check for multi gpu

multi device fix

fix treatment of eval data padding

print data producer

replace epoch overlap with padding and masking

move type and shape info into the producer class and update run.sh with larger batch size hyperparams

remove xla for multi GPU

more cleanup

remove model runner altogether

bug fixes

address subtle pipeline hang and improve producer __repr__

fix crash

fix assert

use popen_helper to create pools

add StreamingFilesDataset and abstract data storage to a separate class

bug fix

fix wait bug and add manual stack trace print

more bug fixes and refactor valid point mask to work with TPU sharding

misc bug fixes and adjust dtypes

address crash from decoding bools

fix remaining dtypes and change record writer pattern since it does not append

fix synthetic data

use TPUStrategy instead of TPUEstimator

minor tweaks around moving to TPUStrategy

cleanup some old code

delint and simplify permutation generation

remove low level tf layer definition, use single table with slice for keras, and misc fixes

missed minor point on removing tf layer definition

fix several bugs from recombining layer definitions

delint and add docstrings

Update ncf_test.py. Section for identical inputs and different outputs was removed.

update data test to run against the new producer class

* remove 'deterministic'

* delint

* address PR comments

* change eval_batch_size flag from a string to an int

* Add bisection-based producer for increased scalability, enable fully deterministic data production, and use the materialized and bisection producers to check each other (via expected output MD5s)

* remove references to hash pipeline

* skip bisection when it is not needed

* add unbuffer to run.sh as tee is causing issues

* address PR comments

* address more PR comments

* fix lint errors

* trim lines in resnet keras

* remove mock to debug kokoro failures

* Revert "remove mock to debug kokoro failures"

This reverts commit 63f5827d.

* remove match_mlperf from expected cache keys

* fix test now that cache construction no longer uses match_mlperf

* disable tests to debug test failure

* disable more tests

* completely disable data_test

* restore data test

* add versions to requirements.txt

* update call to TPUStrategy
keras_common.py 7.68 KB