    NCF pipeline refactor (take 2) and initial TPU port. (#4935) · 6518c1c7
    Taylor Robie authored
    * intermediate commit
    
    * ncf now working
    
    * reorder pipeline
    
    * allow batched decode for file backed dataset
    
    * fix bug
    
    * more tweaks
    
    * parallelize false negative generation
    
    * shared pool hack
    
    * workers ignore sigint
    
    * intermediate commit
    
    * simplify buffer backed dataset creation to fixed length record approach only. (more cleanup needed)
    
    * more tweaks
    
    * simplify pipeline
    
    * fix misplaced cleanup() calls. (validation works!)
    
    * more tweaks
    
    * sixify memoryview usage
    
    * more sixification
    
    * fix bug
    
    * add future imports
    
    * break up training input pipeline
    
    * more pipeline tuning
    
    * first pass at moving negative generation to async
    
    * refactor async pipeline to use files instead of ipc
    
    * refactor async pipeline
    
    * move expansion and concatenation from reduce worker to generation workers
    
    * abandon complete async due to interactions with the tensorflow threadpool
    
    * cleanup
    
    * remove performance_comparison.py
    
    * experiment with rough generator + interleave pipeline
    
    * yet more pipeline tuning
    
    * update on-the-fly pipeline
    
    * refactor preprocessing, and move train generation behind a gRPC server
    
    * fix leftover call
    
    * intermediate commit
    
    * intermediate commit
    
    * fix index error in data pipeline, and add logging to train data server
    
    * make sharding more robust to imbalance
    
    * correctly sample with replacement
    
    * file buffers are no longer needed for this branch
    
    * tweak sampling methods
    
    * add README for data pipeline
    
    * fix eval sampling, and vectorize eval metrics
    
    * add spillover and static training batch sizes
    
    * clean up cruft from earlier iterations
    
    * rough delint
    
    * delint 2 / n
    
    * add type annotations
    
    * update run script
    
    * make run.sh a bit nicer
    
    * change embedding initializer to match reference
    
    * rough pass at pure estimator model_fn
    
    * impose static shape hack (revisit later)
    
    * refinements
    
    * fix dir error in run.sh
    
    * add documentation
    
    * add more docs and fix an assert
    
    * old data test is no longer valid. Keeping it around as reference for the new one
    
    * rough draft of data pipeline validation script
    
    * don't rely on shuffle default
    
    * tweaks and documentation
    
    * add separate eval batch size for performance
    
    * initial commit
    
    * terrible hacking
    
    * mini hacks
    
    * missed a bug
    
    * messing about trying to get TPU running
    
    * TFRecords based TPU attempt
    
    * bug fixes
    
    * don't log remotely
    
    * more bug fixes
    
    * TPU tweaks and bug fixes
    
    * more tweaks
    
    * more adjustments
    
    * rework model definition
    
    * tweak data pipeline
    
    * refactor async TFRecords generation
    
    * temp commit to run.sh
    
    * update log behavior
    
    * fix logging bug
    
    * add check for subprocess start to avoid cryptic hangs
    
    * unify deserialize and make it TPU compliant
    
    * delint
    
    * remove gRPC pipeline code
    
    * fix logging bug
    
    * delint and remove old test files
    
    * add unit tests for NCF pipeline
    
    * delint
    
    * clean up run.sh, and add run_tpu.sh
    
    * forgot the most important line
    
    * fix run.sh bugs
    
    * yet more bash debugging
    
    * small tweak to add keras summaries to model_fn
    
    * Clean up sixification issues
    
    * address PR comments
    
    * delinting is never over
presubmit.sh 2.41 KB