1. 20 Dec, 2018 1 commit
  2. 07 Nov, 2018 1 commit
  3. 03 Nov, 2018 1 commit
  4. 01 Nov, 2018 1 commit
  5. 30 Oct, 2018 3 commits
  6. 29 Oct, 2018 1 commit
  7. 26 Oct, 2018 1 commit
    • Split --ml_perf into two flags. (#5615) · 4298c3a3
      Reed authored
      --ml_perf now just changes the model to make it MLPerf compliant. --output_ml_perf_compliance_logging adds the MLPerf compliance logs.
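A minimal sketch of the two boolean flags the message describes, written with stdlib argparse (the actual repo defines its flags through absl; the flag names come from the commit message, while defaults and help strings are assumptions):

```python
import argparse

# Illustrative only: argparse stand-ins for the two flags named above.
parser = argparse.ArgumentParser()
parser.add_argument("--ml_perf", action="store_true",
                    help="Change the model to be MLPerf compliant.")
parser.add_argument("--output_ml_perf_compliance_logging", action="store_true",
                    help="Additionally emit MLPerf compliance logs.")

args = parser.parse_args(["--ml_perf"])
```

Splitting the flags lets a run be MLPerf compliant without also paying for (or emitting) the compliance logging.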
  8. 25 Oct, 2018 2 commits
  9. 24 Oct, 2018 1 commit
    • Add logging calls to NCF (#5576) · 780f5265
      Taylor Robie authored
      * first pass at __getattr__ abuse logger
      
      * first pass at adding tags to NCF
      
      * minor formatting updates
      
      * fix tag name
      
      * convert metrics to python floats
      
      * getting closer...
      
      * direct mlperf logs to a file
      
      * small tweaks and add stitching
      
      * update tags
      
      * fix tag and add a sudo call
      
      * tweak format of run.sh
      
      * delint
      
      * use distribution strategies for evaluation
      
      * address PR comments
      
      * delint and fix test
      
      * adjust flag validation for xla
      
      * add prefix to distinguish log stitching
      
      * fix index bug
      
      * fix clear cache for root user
      
      * dockerize cache drop
      
      * TIL some regex magic
  10. 20 Oct, 2018 1 commit
  11. 19 Oct, 2018 1 commit
  12. 18 Oct, 2018 2 commits
    • Reorder NCF data pipeline (#5536) · 19d4eaaf
      Taylor Robie authored
      * intermediate commit
      
      finish replacing spillover with resampled padding
      
      intermediate commit
      
      * resolve merge conflict
      
      * intermediate commit
      
      * further consolidate the data pipeline
      
      * complete first pass at data pipeline refactor
      
      * remove some leftover code
      
      * fix test
      
      * remove resampling, and move train padding logic into neumf.py
      
      * small tweaks
      
      * fix weight bug
      
      * address PR comments
      
      * fix dict zip. (Reed led me astray)
      
      * delint
      
      * make data test deterministic and delint
      
      * Reed didn't lead me astray. I just can't read.
      
      * more delinting
      
      * even more delinting
      
      * use resampling for last batch padding
      
      * pad last batch with unique data
      
      * Revert "pad last batch with unique data"
      
      This reverts commit cbdf46efcd5c7907038a24105b88d38e7f1d6da2.
      
      * move padded batch to the beginning
      
      * delint
      
      * fix step check for synthetic data
    • Delint. · 3ec25e5d
      Shawn Wang authored
  13. 17 Oct, 2018 2 commits
  14. 14 Oct, 2018 1 commit
  15. 13 Oct, 2018 1 commit
  16. 11 Oct, 2018 5 commits
  17. 10 Oct, 2018 2 commits
  18. 09 Oct, 2018 2 commits
  19. 05 Oct, 2018 1 commit
  20. 03 Oct, 2018 1 commit
    • Move evaluation to .evaluate() (#5413) · c494582f
      Taylor Robie authored
      * move evaluation from numpy to tensorflow
      
      fix syntax error
      
      don't use sigmoid to convert logits. there is too much precision loss.
      
      WIP: add logit metrics
      
      continue refactor of NCF evaluation
      
      fix syntax error
      
      fix bugs in eval loss calculation
      
      fix eval loss reweighting
      
      remove numpy based metric calculations
      
      fix logging hooks
      
      fix sigmoid to softmax bug
      
      fix comment
      
      catch rare PIPE error and address some PR comments
      
      * fix metric test and address PR comments
      
      * delint and fix python2
      
      * fix test and address PR comments
      
      * extend eval to TPUs
  21. 02 Oct, 2018 1 commit
  22. 20 Sep, 2018 1 commit
  23. 14 Sep, 2018 1 commit
  24. 11 Sep, 2018 1 commit
  25. 05 Sep, 2018 2 commits
    • Fix spurious "did not start correctly" error. (#5252) · 7babedc5
      Reed authored
      * Fix spurious "did not start correctly" error.
      
      The error "Generation subprocess did not start correctly" would occur if the async process started up after the main process checked for the subproc_alive file.
      
      * Add error message
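The fix described above amounts to not treating a single failed check of the alive-marker file as fatal. A hedged sketch of that pattern, polling with a deadline instead of checking once (the file name, timeout, and helper are illustrative, not the actual subproc_alive handling in the repo):

```python
import os
import time

# Illustrative sketch: poll for the subprocess's "alive" marker file with a
# timeout, so a subprocess that starts slightly after the main process's
# first check is not misreported as dead.
def wait_for_subprocess(alive_file, timeout_sec=30.0, poll_sec=0.25):
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        if os.path.exists(alive_file):
            return True
        time.sleep(poll_sec)
    return False  # genuinely failed to start within the timeout
```

Only after the deadline expires should the caller raise the "did not start correctly" error.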
    • Fix crash caused by race in the async process. (#5250) · 5856878d
      Reed authored
      When constructing the evaluation records, data_async_generation.py would copy the records into the final directory. The main process would wait until the eval records existed. However, the main process would sometimes read the eval records before they were fully copied, causing a DataLossError.
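The standard way to close this kind of race is to copy into a temporary name and then rename into place, so the watched path only ever appears fully written. A minimal sketch under that assumption (helper and file names are illustrative, not the actual data_async_generation.py code):

```python
import os
import shutil

# Illustrative sketch: publish a file atomically. The reader polls for
# `dst`; because os.rename() within one filesystem is atomic on POSIX,
# `dst` is either absent or complete, never half-copied.
def atomic_copy(src, dst):
    tmp = dst + ".incomplete"
    shutil.copyfile(src, tmp)  # may be observed mid-copy, but under the temp name
    os.rename(tmp, dst)        # atomically make the finished file visible
```

With a plain `shutil.copyfile(src, dst)`, a reader that wakes up mid-copy sees a truncated record file, which is exactly the DataLossError described above.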
  26. 22 Aug, 2018 1 commit
    • Fix convergence issues for MLPerf. (#5161) · 64710c05
      Reed authored
      * Fix convergence issues for MLPerf.
      
      Thank you to @robieta for helping me find these issues, and for providing an algorithm for the `get_hit_rate_and_ndcg_mlperf` function.
      
      This change causes every forked process to set a new seed, so that forked processes do not generate the same set of random numbers. This improves evaluation hit rates.
      
      Additionally, it adds a flag, --ml_perf, that makes further changes so that the evaluation hit rate can match the MLPerf reference implementation.
      
      I ran 4 times with --ml_perf and 4 times without. Without --ml_perf, the highest hit rates achieved by each run were 0.6278, 0.6287, 0.6289, and 0.6241. With --ml_perf, the highest hit rates were 0.6353, 0.6356, 0.6367, and 0.6353.
      
      * fix lint error
      
      * Fix failing test
      
      * Address @robieta's feedback
      
      * Address more feedback
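The reseeding change above addresses a classic pitfall: forked children inherit the parent's RNG state, so without a fresh seed they all draw the same "random" negatives. A self-contained sketch of the effect (the seeding scheme here is illustrative, not the actual #5161 mechanism):

```python
import random

# Illustrative sketch: two RNGs started from identical state (as forked
# children effectively are) emit identical sequences; giving each worker
# its own seed de-correlates them.
def draws(seed, n=3):
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

same_seed = [draws(0), draws(0)]           # what un-reseeded forks effectively did
per_worker = [draws(i) for i in range(2)]  # after the fix: one seed per worker
```

Identical negative samples across workers reduce the diversity of training data, which is consistent with the lower evaluation hit rates reported before the fix.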
  27. 18 Aug, 2018 1 commit
    • Speed up cache construction. (#5131) · 5aee67b4
      Reed authored
      This is done by using a higher pickle protocol version, which the Python docs describe as being "slightly more efficient". This reduces the file write time at startup from 2.5 minutes to 5 seconds.
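The change above amounts to passing an explicit protocol instead of the old default (protocol 0, an ASCII format). A hedged illustration of the size difference behind the speedup (the data here is made up; the commit does not say which protocol it chose):

```python
import pickle

# Illustrative payload, not the actual NCF cache contents.
data = {"users": list(range(1000)), "items": list(range(1000))}

legacy = pickle.dumps(data, protocol=0)                      # verbose ASCII format
fast = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)  # compact binary format
```

The binary protocols serialize the same object into far fewer bytes, so the cache file is much faster to write (and read back).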
  28. 02 Aug, 2018 1 commit