1. 05 Feb, 2019 1 commit
    • tf_upgrade_v2 on resnet and utils folders. (#6154) · d6b2b83c
      Goldie Gadde authored
      * Add resnet56 short tests. (#6101)
      
      * Add resnet56 short tests.
      - created base benchmark module
      - renamed accuracy test class to contain the word Accuracy
      which will result in a need to update all the jobs
      and a loss of history but is worth it.
      - short tests are mostly copied from shining with oss refactor
      
      * Address feedback.
      
      * Move flag_methods to init
      - Address setting default flags repeatedly.
      
      * Rename accuracy tests.
      
      * Lint errors resolved.
      
      * fix model_dir set to flags.data_dir.
      
      * fixed not fully pulling out flag_methods.
      
      * Use core mirrored strategy in official models (#6126)
      
      * Imagenet short tests (#6132)
      
      * Add short imagenet tests (taken from seemuch)
      - also rename to match go forward naming
      
      * fix method name
      
      * Update doc strings.
      
      * Fix gpu number.
      
      * points default data_dir to child folder. (#6131)
      
      Failed test is python2 and was a kokoro failure
      
      * Imagenet short tests (#6136)
      
      * Add short imagenet tests (taken from seemuch)
      - also rename to match go forward naming
      
      * fix method name
      
      * Update doc strings.
      
      * Fix gpu number.
      
      * Add fill_objects
      
      * fixed calling wrong class in super.
      
      * fix lint issue.
      
      * Flag (#6121)
      
      * Fix the turn_off_ds flag problem
      
      * add param names to all args
      
      * Export benchmark stats using tf.test.Benchmark.report_benchmark() (#6103)
      
      * Export benchmark stats using tf.test.Benchmark.report_benchmark()
      
      * Fix python style using pyformat
      
      * Typos. (#6120)
      
      * log verbosity=2 logs every epoch no progress bars (#6142)
      
      * tf_upgrade_v2 on resnet and utils folder.
  2. 07 Jan, 2019 1 commit
    • rough pass at carving out existing NCF pipeline · c5ff4ec7
      Taylor Robie authored
      2nd half of rough replacement pass
      
      fix dataset map functions
      
      reduce bias in sample selection
      
      cache pandas work on a daily basis
      
      cleanup and fix batch check for multi gpu
      
      multi device fix
      
      fix treatment of eval data padding
      
      print data producer
      
      replace epoch overlap with padding and masking
      
      move type and shape info into the producer class and update run.sh with larger batch size hyperparams
      
      remove xla for multi GPU
      
      more cleanup
      
      remove model runner altogether
      
      bug fixes
      
      address subtle pipeline hang and improve producer __repr__
      
      fix crash
      
      fix assert
      
      use popen_helper to create pools
      
      add StreamingFilesDataset and abstract data storage to a separate class
      
      bug fix
      
      fix wait bug and add manual stack trace print
      
      more bug fixes and refactor valid point mask to work with TPU sharding
      
      misc bug fixes and adjust dtypes
      
      address crash from decoding bools
      
      fix remaining dtypes and change record writer pattern since it does not append
      
      fix synthetic data
      
      use TPUStrategy instead of TPUEstimator
      
      minor tweaks around moving to TPUStrategy
      
      cleanup some old code
      
      delint and simplify permutation generation
      
      remove low level tf layer definition, use single table with slice for keras, and misc fixes
      
      missed minor point on removing tf layer definition
      
      fix several bugs from recombining layer definitions
      
      delint and add docstrings
      
      Update ncf_test.py. Section for identical inputs and different outputs was removed.
      
      update data test to run against the new producer class
  3. 26 Oct, 2018 1 commit
    • Split --ml_perf into two flags. (#5615) · 4298c3a3
      Reed authored
      --ml_perf now just changes the model to make it MLPerf compliant. --output_ml_perf_compliance_logging adds the MLPerf compliance logs.
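A minimal sketch of the two-flag split described above. The flag names come from the commit message; the use of `argparse` and the helper names `build_parser` and `resolve_mlperf_options` are assumptions for illustration only, not the repository's actual flag definitions.

```python
import argparse


def build_parser():
    """Builds a parser with the two independent MLPerf flags (illustrative)."""
    parser = argparse.ArgumentParser()
    # Changes the model so that it is MLPerf compliant; emits no extra logs.
    parser.add_argument("--ml_perf", action="store_true")
    # Adds the MLPerf compliance logs; does not alter the model.
    parser.add_argument("--output_ml_perf_compliance_logging",
                        action="store_true")
    return parser


def resolve_mlperf_options(argv):
    """Returns which of the two behaviours are enabled for the given argv."""
    args = build_parser().parse_args(argv)
    return {
        "compliant_model": args.ml_perf,
        "compliance_logs": args.output_ml_perf_compliance_logging,
    }
```

With this split, passing only `--ml_perf` would enable the compliant model without writing compliance logs, and the two can be combined freely.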
  4. 24 Oct, 2018 2 commits
    • Move version check to a function (#5601) · f175abc3
      Taylor Robie authored
      * move version check to a function
      
      * delint
      
      * tweak pip check
      
      * delint
    • Add logging calls to NCF (#5576) · 780f5265
      Taylor Robie authored
      * first pass at __getattr__ abuse logger
      
      * first pass at adding tags to NCF
      
      * minor formatting updates
      
      * fix tag name
      
      * convert metrics to python floats
      
      * getting closer...
      
      * direct mlperf logs to a file
      
      * small tweaks and add stitching
      
      * update tags
      
      * fix tag and add a sudo call
      
      * tweak format of run.sh
      
      * delint
      
      * use distribution strategies for evaluation
      
      * address PR comments
      
      * delint and fix test
      
      * adjust flag validation for xla
      
      * add prefix to distinguish log stitching
      
      * fix index bug
      
      * fix clear cache for root user
      
      * dockerize cache drop
      
      * TIL some regex magic
  5. 13 Sep, 2018 4 commits
  6. 04 Sep, 2018 1 commit
  7. 13 Jul, 2018 1 commit
  8. 20 Jun, 2018 1 commit
    • Wide Deep refactor and deep movies (#4506) · 20070ca4
      Taylor Robie authored
      * begin branch
      
      * finish download script
      
      * rename download to dataset
      
      * intermediate commit
      
      * intermediate commit
      
      * misc tweaks
      
      * intermediate commit
      
      * intermediate commit
      
      * intermediate commit
      
      * delint and update census test.
      
      * add movie tests
      
      * delint
      
      * fix py2 issue
      
      * address PR comments
      
      * intermediate commit
      
      * intermediate commit
      
      * intermediate commit
      
      * finish wide deep transition to vanilla movielens
      
      * delint
      
      * intermediate commit
      
      * intermediate commit
      
      * intermediate commit
      
      * intermediate commit
      
      * fix import
      
      * add default ncf csv construction
      
      * change default on download_if_missing
      
      * shard and vectorize example serialization
      
      * fix import
      
      * update ncf data unittests
      
      * delint
      
      * delint
      
      * more delinting
      
      * fix wide-deep movielens serialization
      
      * address PR comments
      
      * add file_io tests
      
      * investigate wide-deep test failure
      
      * remove hard coded path and properly use flags.
      
      * address file_io test PR comments
      
      * missed a hash_bucket_size
  9. 12 Jun, 2018 1 commit
  10. 06 Jun, 2018 1 commit
  11. 05 Jun, 2018 1 commit
  12. 04 Jun, 2018 2 commits
    • Fix lint error for cloud_lib (#4446) · 7ba78e94
      Qianli Scott Zhu authored
    • First pass at a TPU loop for Transformer (#4296) · 2eeb85fe
      Taylor Robie authored
      * port changes from previous branch now that transformer util changes are in master
      
      fix incorrect count
      
      correct (hopefully) treatment of batch_size
      
      set eval_metrics to a dummy function for now
      
      add some comments
      
      start bringing metrics to transformer TPU
      
      resolve logits shape
      
      metrics are now working except for tf.py_func metrics
      
      increase batch_size for tpu, and create summary host call
      
      fix host call
      
      reduce tpu default batch size
      
      further tune batch sizes
      
      add minibatch loss to summary
      
      handle case of single_iteration_train_steps > number points in an epoch
      
      begin to incorporate hooks
      
      add sleep workarounds
      
      disable hooks altogether
      
      generalize host call function and move to newly created tpu utils module
      
      remove all traces of params as an object
      
      switch from  to
      
      address some PR comments, and change the number of data points.
      
      minor tweaks
      
      add tpu dry run for testing, and use matmul for TPU embedding
      
      infeed/outfeed queue issue is fixed. Sleeps are no longer necessary
      
      add some documentation.
      
      cleanup and address PR comments
      
      delint
      
      add accelerator __init__
      
      fix embedding
      
      missed PR comment
      
      address PR comments
      
      fix validator bug
      
      rewrite cloud storage validator, and add oauth dependency to requirements.txt
      
      * delint
  13. 01 Jun, 2018 3 commits
    • Fix the hooks test comments (#4427) · 02571056
      Yanhui Liang authored
    • Add new test ID and test env info to the benchmark run. (#4426) · d2d6ab4c
      Qianli Scott Zhu authored
      * Add new test ID and test env info to the benchmark run.
      
      * Fix test.
      
      * Fix lint
      
      * Address review comment.
    • Record the status for a benchmark run. (#4402) · 47c5642e
      Qianli Scott Zhu authored
      * Update benchmark logger to update the run status.
      
      This is important for streaming upload to bigquery so that the
      dashboard can ignore the 'running' benchmark at the moment since
      it's not finished yet.
      
      * Move the run status into a separate table.
      
      Also update the run status in the benchmark uploader and
      BigqueryBenchmarkLogger.
      
      * Insert instead of update for the benchmark status for file logger.
      
      * Address review comments.
      
      Update the logger to have benchmark context, which will update the
      run status accordingly.
      
      * Fix broken tests.
      
      * Move the benchmark logger context to main function.
      
      * Fix tests.
      
      * Update the rest of the models to use the context in main.
      
      * Delint.
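The benchmark context mentioned in this commit can be sketched roughly as below: a context manager wrapped around the main function that records a 'running' status on entry and a final status on exit. The class name `InMemoryStatusLogger`, the function `benchmark_context`, and the status strings other than 'running' are hypothetical stand-ins, not the repository's actual API.

```python
import contextlib


class InMemoryStatusLogger:
    """Hypothetical logger that stores run statuses in a list."""

    def __init__(self):
        self.statuses = []

    def on_start(self):
        # A dashboard could skip runs whose latest status is 'running'.
        self.statuses.append("running")

    def on_finish(self, status):
        self.statuses.append(status)


@contextlib.contextmanager
def benchmark_context(logger):
    """Marks the run as running, then records success or failure on exit."""
    logger.on_start()
    try:
        yield logger
    except Exception:
        logger.on_finish("failure")
        raise  # Re-raise so the caller still sees the original error.
    else:
        logger.on_finish("success")
```

Placing the context in the main function, as the commit describes, means the status is updated even when the model code itself knows nothing about the logger.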
  14. 30 May, 2018 1 commit
  15. 25 May, 2018 1 commit
    • Fix/log ex per sec (#4360) · d626b908
      Karmel Allison authored
      * Using BenchmarkLogger
      
      * Fixing tests
      
      * Linting fixes.
      
      * Adding comments
      
      * Moving mock logger
      
      * Glinting
      
      * Responding to CR
      
      * Reverting assertEmpty
  16. 11 May, 2018 1 commit
    • Add benchmark logger that does stream upload to bigquery. (#4210) · 0270cac7
      Qianli Scott Zhu authored
      * Move the benchmark_uploader to new location.
      
      * Update benchmark logger to streaming upload.
      
      * Fix lint and unit test error.
      
      * delint.
      
      * Update the benchmark uploader test.
      
      Skip the import of benchmark_uploader when bigquery is not installed.
      
      * Merge the 2 classes of benchmark uploader into 1.
      
      * Address review comments.
      
      * delint.
      
      * Execute bigquery upload in a separate thread.
      
      * Change to use python six.moves for importing.
      
      * Address review comments and delint.
      
      * Address review comment.
      
      Adding comment for potential performance impact for model on CPU.
      
      * Fix random failure on py3.
      
      * Fix the order of flag saver to avoid the randomness.
      
      The test is broken when the benchmark_logger_type is set first, and
      validated when the benchmark_log_dir is not set yet.
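The idea of running the upload in a separate thread, so that logging a metric does not block training, might look like the sketch below. It uses only the standard library; `ThreadedUploader` and the injected `upload_fn` are hypothetical names, and the real BigQuery client call is deliberately left out.

```python
import queue
import threading


class ThreadedUploader:
    """Hands records to a background thread that performs the upload."""

    def __init__(self, upload_fn):
        self._upload_fn = upload_fn  # Assumed callable doing the real upload.
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, record):
        # Returns immediately; the worker thread does the slow I/O.
        self._queue.put(record)

    def _drain(self):
        while True:
            record = self._queue.get()
            if record is None:  # Sentinel: stop the worker.
                break
            self._upload_fn(record)

    def close(self):
        """Flushes pending records and stops the worker thread."""
        self._queue.put(None)
        self._worker.join()
```

A single worker with a FIFO queue keeps records in submission order; the trade-off is that the worker still competes with the model for CPU time, which is the performance impact the commit message notes for models running on CPU.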
  17. 03 May, 2018 2 commits
  18. 01 May, 2018 1 commit
  19. 20 Apr, 2018 1 commit
  20. 19 Apr, 2018 1 commit
    • Benchmark update (#4034) · 21ec0e1b
      Qianli Scott Zhu authored
      * Update the benchmark logger to have default logging.
      
      1. Create a global instance of the benchmark logger, which by default logs to
      tf.logging.info
      2. Allow the user to configure the logging location.
      3. Fix nits in code and comment.
      
      * Fix lint and test error.
      
      * Address review comments.
      
      * Remove the duplicated print statement.
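A rough sketch of the pattern this commit describes: one module-level logger that falls back to plain info logging unless the user configures something else. The standard `logging` module stands in for `tf.logging.info`, and all class and function names here are illustrative assumptions rather than the repository's actual ones.

```python
import logging

_benchmark_logger = None  # Module-level (global) logger instance.


class BaseBenchmarkLogger:
    """Default logger: writes each metric through logging.info."""

    def log_metric(self, name, value, unit=None):
        logging.info("Benchmark metric: %s",
                     {"name": name, "value": float(value), "unit": unit})


class ListBenchmarkLogger(BaseBenchmarkLogger):
    """Hypothetical alternative destination that keeps metrics in memory."""

    def __init__(self):
        self.records = []

    def log_metric(self, name, value, unit=None):
        self.records.append((name, float(value), unit))


def config_benchmark_logger(logger=None):
    """Sets the global logger; with no argument, restores the default."""
    global _benchmark_logger
    _benchmark_logger = logger if logger is not None else BaseBenchmarkLogger()
    return _benchmark_logger


def get_benchmark_logger():
    """Returns the global logger, creating the default one on first use."""
    if _benchmark_logger is None:
        config_benchmark_logger()
    return _benchmark_logger
```

Callers only ever use `get_benchmark_logger()`, so switching the logging location is a one-line configuration change at startup.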
  21. 03 Apr, 2018 1 commit
    • Rename logging directory (#3860) · a0e3604f
      Karmel Allison authored
      * Updating name of logging package to avoid overwriting Python builtin logging.