• Replace pipeline in NCF (#5786) · 56cbd1f2
    Taylor Robie authored
    * rough pass at carving out existing NCF pipeline
    
    2nd half of rough replacement pass
    
    fix dataset map functions
    
    reduce bias in sample selection
    
    cache pandas work on a daily basis
    
    cleanup and fix batch check for multi gpu
    
    multi device fix
    
    fix treatment of eval data padding
    
    print data producer
    
    replace epoch overlap with padding and masking
    
    move type and shape info into the producer class and update run.sh with larger batch size hyperparams
    
    remove xla for multi GPU
    
    more cleanup
    
    remove model runner altogether
    
    bug fixes
    
    address subtle pipeline hang and improve producer __repr__
    
    fix crash
    
    fix assert
    
    use popen_helper to create pools
    
    add StreamingFilesDataset and abstract data storage to a separate class
    
    bug fix
    
    fix wait bug and add manual stack trace print
    
    more bug fixes and refactor valid point mask to work with TPU sharding
    
    misc bug fixes and adjust dtypes
    
    address crash from decoding bools
    
    fix remaining dtypes and change record writer pattern since it does not append
    
    fix synthetic data
    
    use TPUStrategy instead of TPUEstimator
    
    minor tweaks around moving to TPUStrategy
    
    cleanup some old code
    
    delint and simplify permutation generation
    
    remove low level tf layer definition, use single table with slice for keras, and misc fixes
    
    missed minor point on removing tf layer definition
    
    fix several bugs from recombining layer definitions
    
    delint and add docstrings
    
    Update ncf_test.py. Section for identical inputs and different outputs was removed.
    
    update data test to run against the new producer class
    
    * remove 'deterministic'
    
    * delint
    
    * address PR comments
    
    * change eval_batch_size flag from a string to an int
    
    * Add bisection-based producer for increased scalability, enable fully deterministic data production, and use the materialized and bisection producers to check each other (via expected output MD5s)
    
    * remove references to hash pipeline
    
    * skip bisection when it is not needed
    
    * add unbuffer to run.sh as tee is causing issues
    
    * address PR comments
    
    * address more PR comments
    
    * fix lint errors
    
    * trim lines in resnet keras
    
    * remove mock to debug kokoro failures
    
    * Revert "remove mock to debug kokoro failures"
    
    This reverts commit 63f5827d.
    
    * remove match_mlperf from expected cache keys
    
    * fix test now that cache construction no longer uses match_mlperf
    
    * disable tests to debug test failure
    
    * disable more tests
    
    * completely disable data_test
    
    * restore data test
    
    * add versions to requirements.txt
    
    * update call to TPUStrategy
keras_common.py 7.68 KB