- 07 Jan, 2019 15 commits
-
-
Taylor Robie authored
-
Taylor Robie authored
This reverts commit 63f5827d.
-
Taylor Robie authored
-
Taylor Robie authored
-
Taylor Robie authored
-
Taylor Robie authored
-
Taylor Robie authored
-
Taylor Robie authored
-
Taylor Robie authored
-
Taylor Robie authored
Add a bisection-based producer for increased scalability, enable fully deterministic data production, and use the materialized and bisection producers to check each other (via expected output MD5s).
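A minimal sketch of how two producers could check each other through a digest. It assumes that "bisection" means finding the k-th negative item by binary search over a user's sorted positive items, while the materialized variant builds the whole negative table first. Every name below is hypothetical and none is taken from the commit itself.

```python
import bisect
import hashlib
import random


def materialized_negatives(positives, num_items, picks):
    """Build the full table of negative item ids, then index into it."""
    positive_set = set(positives)
    table = [item for item in range(num_items) if item not in positive_set]
    return [table[k] for k in picks]


def bisection_negatives(positives, picks):
    """Find the k-th negative id by binary search, without building a table.

    `positives` must be sorted and unique. deltas[i] is the number of
    negative ids smaller than positives[i]; it is non-decreasing, so
    bisect can be applied to it directly.
    """
    deltas = [p - i for i, p in enumerate(positives)]
    # bisect_right returns how many positives precede the k-th negative.
    return [k + bisect.bisect_right(deltas, k) for k in picks]


def md5_of(values):
    """Digest of a sequence of ints, used only to compare two outputs."""
    payload = ",".join(str(v) for v in values).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def cross_check(seed=0, num_items=50, num_positives=12, num_picks=200):
    """Run both producers on the same seeded input and return both digests."""
    rng = random.Random(seed)  # fixed seed keeps the run deterministic
    positives = sorted(rng.sample(range(num_items), num_positives))
    num_negatives = num_items - num_positives
    picks = [rng.randrange(num_negatives) for _ in range(num_picks)]
    digest_a = md5_of(materialized_negatives(positives, num_items, picks))
    digest_b = md5_of(bisection_negatives(positives, picks))
    return digest_a, digest_b


if __name__ == "__main__":
    a, b = cross_check()
    print("producers agree" if a == b else "producers disagree")
```

The bisection variant trades the memory of a full negative table for a binary search per lookup, which is one plausible reading of the "increased scalability" mentioned above.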
-
Taylor Robie authored
-
Taylor Robie authored
-
Taylor Robie authored
-
Taylor Robie authored
-
Taylor Robie authored
* 2nd half of rough replacement pass
* fix dataset map functions
* reduce bias in sample selection
* cache pandas work on a daily basis
* cleanup and fix batch check for multi gpu
* multi device fix
* fix treatment of eval data padding
* print data producer
* replace epoch overlap with padding and masking
* move type and shape info into the producer class and update run.sh with larger batch size hyperparams
* remove xla for multi GPU
* more cleanup
* remove model runner altogether
* bug fixes
* address subtle pipeline hang and improve producer __repr__
* fix crash
* fix assert
* use popen_helper to create pools
* add StreamingFilesDataset and abstract data storage to a separate class
* bug fix
* fix wait bug and add manual stack trace print
* more bug fixes and refactor valid point mask to work with TPU sharding
* misc bug fixes and adjust dtypes
* address crash from decoding bools
* fix remaining dtypes and change record writer pattern since it does not append
* fix synthetic data
* use TPUStrategy instead of TPUEstimator
* minor tweaks around moving to TPUStrategy
* cleanup some old code
* delint and simplify permutation generation
* remove low level tf layer definition, use single table with slice for keras, and misc fixes
* missed minor point on removing tf layer definition
* fix several bugs from recombining layer definitions
* delint and add docstrings
* Update ncf_test.py. Section for identical inputs and different outputs was removed.
* update data test to run against the new producer class
-
- 20 Dec, 2018 1 commit
-
-
Alexandre Passos authored
-
- 07 Nov, 2018 1 commit
-
-
Reed authored
This tag should match EVAL_HP_NUM_NEG.
-
- 03 Nov, 2018 1 commit
-
-
Reed authored
I've noticed sometimes the async process's pool processes do not die when ncf_main.py ends and kills the async process. This commit fixes the issue.
-
- 01 Nov, 2018 1 commit
-
-
Reed authored
-
- 30 Oct, 2018 3 commits
-
-
Taylor Robie authored
-
Taylor Robie authored
* Keras-ify TPU embedding lookup
* delint
* pull get_variable() out of keras lambda
* delint
* move get_variable under variable scope
-
Tayo Oguntebi authored
* Merges TPU-TC optimizations into HEAD.
* Split a line that went over 80 from a tab.
* Remove trailing whitespace.
-
- 29 Oct, 2018 1 commit
-
-
Reed authored
The option is --nouse_estimator
-
- 26 Oct, 2018 1 commit
-
-
Reed authored
--ml_perf now just changes the model to make it MLPerf compliant. --output_ml_perf_compliance_logging adds the MLPerf compliance logs.
-
- 25 Oct, 2018 2 commits
-
-
Taylor Robie authored
prevent async process from writing alive file until the main process has created the cache root (#5614)
-
Reed authored
The error message was: absl.flags._exceptions.IllegalFlagValueError: flag --ml_perf=None: ('Non-boolean argument to boolean flag', 'None')
-
- 24 Oct, 2018 1 commit
-
-
Taylor Robie authored
* first pass at __getattr__ abuse logger
* first pass at adding tags to NCF
* minor formatting updates
* fix tag name
* convert metrics to python floats
* getting closer...
* direct mlperf logs to a file
* small tweaks and add stitching
* update tags
* fix tag and add a sudo call
* tweak format of run.sh
* delint
* use distribution strategies for evaluation
* address PR comments
* delint and fix test
* adjust flag validation for xla
* add prefix to distinguish log stitching
* fix index bug
* fix clear cache for root user
* dockerize cache drop
* TIL some regex magic
-
- 20 Oct, 2018 1 commit
-
-
Reed authored
-
- 19 Oct, 2018 1 commit
-
-
Taylor Robie authored
-
- 18 Oct, 2018 2 commits
-
-
Taylor Robie authored
* intermediate commit finish replacing spillover with resampled padding intermediate commit
* resolve merge conflict
* intermediate commit
* further consolidate the data pipeline
* complete first pass at data pipeline refactor
* remove some leftover code
* fix test
* remove resampling, and move train padding logic into neumf.py
* small tweaks
* fix weight bug
* address PR comments
* fix dict zip. (Reed led me astray)
* delint
* make data test deterministic and delint
* Reed didn't lead me astray. I just can't read.
* more delinting
* even more delinting
* use resampling for last batch padding
* pad last batch with unique data
* Revert "pad last batch with unique data" This reverts commit cbdf46efcd5c7907038a24105b88d38e7f1d6da2.
* move padded batch to the beginning
* delint
* fix step check for synthetic data
-
Shawn Wang authored
-
- 17 Oct, 2018 2 commits
-
-
Shawn Wang authored
-
Shawn Wang authored
-
- 14 Oct, 2018 1 commit
-
-
Taylor Robie authored
* move flagfile into the cache_dir
* remove duplicate code
* delint
-
- 13 Oct, 2018 1 commit
-
-
shizhiw authored
* Use data_dir instead of flags.FLAGS.data_dir in data_preprocessing.py.
* Use data_dir instead of flags.FLAGS.data_dir in data_preprocessing.py.
* Replace multiprocess pool with popen_helper.get_pool() in data_preprocessing.
-
- 11 Oct, 2018 5 commits
-
-
shizhiw authored
* Use data_dir instead of flags.FLAGS.data_dir in data_preprocessing.py.
* Use data_dir instead of flags.FLAGS.data_dir in data_preprocessing.py.
-
Shawn Wang authored
Add comments, exit the async process after waiting too long for the flagfile, and create the data_dir directory if it does not exist.
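The behaviour described in this message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: the function names, the polling interval, and the use of `SystemExit` are all hypothetical.

```python
import os
import time


def wait_for_flagfile(flagfile, timeout_seconds, poll_seconds=0.05):
    """Return True once `flagfile` exists, or False if the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while not os.path.exists(flagfile):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


def prepare_and_wait(data_dir, flagfile, timeout_seconds):
    """Create data_dir if needed, then exit if the flagfile never appears."""
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(data_dir, exist_ok=True)
    if not wait_for_flagfile(flagfile, timeout_seconds):
        # Stop rather than spin forever if the main process never writes
        # the flagfile (hypothetical choice of exit mechanism).
        raise SystemExit("Timed out waiting for flagfile: " + flagfile)
```

A monotonic clock is used for the deadline so that system clock adjustments do not shorten or lengthen the wait.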
-
Shawn Wang authored
-
Shawn Wang authored
-
Shawn Wang authored
-