1. 18 Aug, 2018 1 commit
    • Speed up cache construction. (#5131) · 5aee67b4
      Reed authored
      This is done by using a higher pickle protocol version, which the Python docs describe as "slightly more efficient". It reduces the initial cache-file write time from about 2.5 minutes to 5 seconds.
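      The effect of the protocol choice can be sketched independently of the actual cache-writing code in #5131 (the cache object below is a toy stand-in):

      ```python
      import pickle

      # Toy stand-in for the cache object; the real cache in #5131 differs.
      cache = {i: list(range(10)) for i in range(1000)}

      # Protocol 0 is the oldest, ASCII-based protocol; HIGHEST_PROTOCOL
      # selects the most efficient binary protocol the interpreter supports.
      legacy_bytes = pickle.dumps(cache, protocol=0)
      fast_bytes = pickle.dumps(cache, protocol=pickle.HIGHEST_PROTOCOL)
      ```

      The binary encoding is considerably more compact, which is where most of the write-time saving comes from.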
  2. 16 Aug, 2018 2 commits
  3. 15 Aug, 2018 1 commit
  4. 14 Aug, 2018 2 commits
    • Transformer partial fix (#5092) · 6f5967a0
      alope107 authored
      * Fix Transformer TPU crash in Python 2.X.
      
      - TensorFlow raises an error when tf_inspect.getfullargspec is called on
      a functools.partial object in Python 2.X. The issue was hit during the
      eval stage of the Transformer TPU model. This change replaces the call
      to functools.partial with a lambda to work around it.
      
      * Remove unused import from transformer_main.
      
      * Fix lint error.
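      A minimal sketch of the workaround (the model_fn and its arguments below are illustrative, not the actual Transformer code): binding an argument with a lambda instead of functools.partial leaves a plain function that argument inspection handles in both Python versions.

      ```python
      import functools
      import inspect

      def model_fn(features, labels, mode, params):
          """Illustrative stand-in for the real estimator model_fn."""
          return features, labels, mode, params

      # In Python 2, inspect.getargspec raises TypeError on a partial object,
      # which is what tf_inspect hit during Transformer TPU eval.
      bound_with_partial = functools.partial(model_fn, params={"lr": 0.01})

      # A lambda is an ordinary function, so argument inspection succeeds.
      bound_with_lambda = lambda features, labels, mode: model_fn(
          features, labels, mode, params={"lr": 0.01})

      spec = inspect.getfullargspec(bound_with_lambda)
      ```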
    • Resnet transfer learning (#5047) · 7bffd37b
      Zac Wellmer authored
      * warm start a resnet with all but the dense layer, and only update the final layer weights when fine-tuning
      
      * Update README for Transfer Learning
      
      * make lint happy and fix a variable-naming error related to scaled gradients
      
      * edit the test cases for cifar10 and imagenet to reflect the default case of no fine tuning
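      The fine-tuning behavior can be sketched with a hypothetical variable filter (the variable names and helper below are assumed, not the actual ResNet code): when fine-tuning, only the final dense layer's variables are handed to the optimizer.

      ```python
      # Hypothetical variable names; the real ResNet graph has many more.
      ALL_VARIABLES = [
          "resnet_model/conv2d/kernel",
          "resnet_model/batch_normalization/gamma",
          "resnet_model/dense/kernel",
          "resnet_model/dense/bias",
      ]

      def trainable_variables(variables, fine_tune):
          """Return the variables the optimizer should update (sketch)."""
          if not fine_tune:
              return list(variables)
          # Warm-started fine-tuning: freeze everything except the dense head.
          return [name for name in variables if "/dense/" in name]
      ```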
  5. 13 Aug, 2018 1 commit
  6. 10 Aug, 2018 1 commit
  7. 02 Aug, 2018 2 commits
  8. 01 Aug, 2018 1 commit
  9. 31 Jul, 2018 8 commits
  10. 30 Jul, 2018 2 commits
    • NCF pipeline refactor (take 2) and initial TPU port. (#4935) · 6518c1c7
      Taylor Robie authored
      * intermediate commit
      
      * ncf now working
      
      * reorder pipeline
      
      * allow batched decode for file backed dataset
      
      * fix bug
      
      * more tweaks
      
      * parallelize false negative generation
      
      * shared pool hack
      
      * workers ignore sigint
      
      * intermediate commit
      
      * simplify buffer backed dataset creation to fixed length record approach only. (more cleanup needed)
      
      * more tweaks
      
      * simplify pipeline
      
      * fix misplaced cleanup() calls. (validation works!)
      
      * more tweaks
      
      * sixify memoryview usage
      
      * more sixification
      
      * fix bug
      
      * add future imports
      
      * break up training input pipeline
      
      * more pipeline tuning
      
      * first pass at moving negative generation to async
      
      * refactor async pipeline to use files instead of ipc
      
      * refactor async pipeline
      
      * move expansion and concatenation from reduce worker to generation workers
      
      * abandon complete async due to interactions with the tensorflow threadpool
      
      * cleanup
      
      * remove performance_comparison.py
      
      * experiment with rough generator + interleave pipeline
      
      * yet more pipeline tuning
      
      * update on-the-fly pipeline
      
      * refactor preprocessing, and move train generation behind a GRPC server
      
      * fix leftover call
      
      * intermediate commit
      
      * intermediate commit
      
      * fix index error in data pipeline, and add logging to train data server
      
      * make sharding more robust to imbalance
      
      * correctly sample with replacement
      
      * file buffers are no longer needed for this branch
      
      * tweak sampling methods
      
      * add README for data pipeline
      
      * fix eval sampling, and vectorize eval metrics
      
      * add spillover and static training batch sizes
      
      * clean up cruft from earlier iterations
      
      * rough delint
      
      * delint 2 / n
      
      * add type annotations
      
      * update run script
      
      * make run.sh a bit nicer
      
      * change embedding initializer to match reference
      
      * rough pass at pure estimator model_fn
      
      * impose static shape hack (revisit later)
      
      * refinements
      
      * fix dir error in run.sh
      
      * add documentation
      
      * add more docs and fix an assert
      
      * old data test is no longer valid. Keeping it around as reference for the new one
      
      * rough draft of data pipeline validation script
      
      * don't rely on shuffle default
      
      * tweaks and documentation
      
      * add separate eval batch size for performance
      
      * initial commit
      
      * terrible hacking
      
      * mini hacks
      
      * missed a bug
      
      * messing about trying to get TPU running
      
      * TFRecords based TPU attempt
      
      * bug fixes
      
      * don't log remotely
      
      * more bug fixes
      
      * TPU tweaks and bug fixes
      
      * more tweaks
      
      * more adjustments
      
      * rework model definition
      
      * tweak data pipeline
      
      * refactor async TFRecords generation
      
      * temp commit to run.sh
      
      * update log behavior
      
      * fix logging bug
      
      * add check for subprocess start to avoid cryptic hangs
      
      * unify deserialize and make it TPU compliant
      
      * delint
      
      * remove gRPC pipeline code
      
      * fix logging bug
      
      * delint and remove old test files
      
      * add unit tests for NCF pipeline
      
      * delint
      
      * clean up run.sh, and add run_tpu.sh
      
      * forgot the most important line
      
      * fix run.sh bugs
      
      * yet more bash debugging
      
      * small tweak to add keras summaries to model_fn
      
      * Clean up sixification issues
      
      * address PR comments
      
      * delinting is never over
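      One of the smaller fixes above, "sixify memoryview usage", reflects a real Python 2/3 difference. A minimal illustration, independent of the NCF code:

      ```python
      data = b"serialized record"
      view = memoryview(data)

      # bytes(view) returns the buffer contents on Python 3 but the repr
      # string on Python 2; .tobytes() behaves identically on both, so
      # six-compatible code prefers it.
      portable = view.tobytes()
      ```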
    • Compute metrics under distributed strategies. (#4942) · a88b89be
      Sundara Tejaswi Digumarti authored
      Removed the conditional over distributed strategies when computing metrics.
      Metrics are now computed even when distributed strategies are used.
  11. 26 Jul, 2018 1 commit
    • fix batch_size in transformer_main.py (#4897) · 2d7a0d6a
      Jiang Yu authored
      * fix batch_size in transformer_main.py
      
      Fix batch_size in transformer_main.py, which caused a ResourceExhaustedError (OOM) when training Transformer models with models/official/transformer.
      
      * small format change
      
      split one long line into multiple lines in order to pass the lint tests
      
      * remove trailing space and add comment
  12. 21 Jul, 2018 1 commit
  13. 20 Jul, 2018 1 commit
    • Add eager for keras benchmark (#4825) · 2689c9ae
      Yanhui Liang authored
      * Add more arguments
      
      * Add eager mode
      
      * Add notes for eager mode
      
      * Address the comments
      
      * Fix argument typos
      
      * Add warning for eager and multi-gpu
      
      * Fix typo
      
      * Fix notes
      
      * Fix pylint
  14. 19 Jul, 2018 1 commit
  15. 13 Jul, 2018 2 commits
    • Keras model benchmark (#4476) · 937a530a
      Yanhui Liang authored
      * Add callbacks
      
      * Add readme
      
      * update readme
      
      * fix some comments
      
      * Address all comments
      
      * Update docstrings
      
      * Add method docstrings
      
      * Update callbacks
      
      * Add comments on global_step initialization
      
      * Some updates
      
      * Address comments
    • Add shorter timeout for GCP util. (#4762) · c020e502
      Qianli Scott Zhu authored
      * Add shorter timeout for GCP util.
      
      * Add comment for change reason and unit for timeout.
  16. 12 Jul, 2018 1 commit
  17. 11 Jul, 2018 1 commit
  18. 10 Jul, 2018 1 commit
  19. 26 Jun, 2018 1 commit
  20. 25 Jun, 2018 2 commits
  21. 22 Jun, 2018 1 commit
  22. 20 Jun, 2018 1 commit
    • Wide Deep refactor and deep movies (#4506) · 20070ca4
      Taylor Robie authored
      * begin branch
      
      * finish download script
      
      * rename download to dataset
      
      * intermediate commit
      
      * intermediate commit
      
      * misc tweaks
      
      * intermediate commit
      
      * intermediate commit
      
      * intermediate commit
      
      * delint and update census test.
      
      * add movie tests
      
      * delint
      
      * fix py2 issue
      
      * address PR comments
      
      * intermediate commit
      
      * intermediate commit
      
      * intermediate commit
      
      * finish wide deep transition to vanilla movielens
      
      * delint
      
      * intermediate commit
      
      * intermediate commit
      
      * intermediate commit
      
      * intermediate commit
      
      * fix import
      
      * add default ncf csv construction
      
      * change default on download_if_missing
      
      * shard and vectorize example serialization
      
      * fix import
      
      * update ncf data unittests
      
      * delint
      
      * delint
      
      * more delinting
      
      * fix wide-deep movielens serialization
      
      * address PR comments
      
      * add file_io tests
      
      * investigate wide-deep test failure
      
      * remove hard coded path and properly use flags.
      
      * address file_io test PR comments
      
      * missed a hash_bucket_size
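      The "shard and vectorize example serialization" step can be sketched generically (this is not the actual MovieLens code): rows are split into contiguous chunks that workers serialize in parallel.

      ```python
      def make_shards(items, num_shards):
          """Split items into up to num_shards contiguous, near-equal chunks."""
          shard_size = (len(items) + num_shards - 1) // num_shards  # ceil div
          return [items[i:i + shard_size]
                  for i in range(0, len(items), shard_size)]

      # Each shard can then be serialized by a separate worker process.
      shards = make_shards(list(range(10)), 3)
      ```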
  23. 18 Jun, 2018 1 commit
  24. 12 Jun, 2018 2 commits
    • Add checklist for official models. Remove file access from flag validator (fix build) (#4492) · bb62f248
      Katherine Wu authored
      * Add checklist for official models. Remove file access from flag validator (causing issues with BUILD)
      
      * spelling
      
      * address PR comments
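      The validator change can be illustrated with a hypothetical check (the function and acceptance rules below are assumptions, not the actual flag code): a validator that inspects only the flag string, never the filesystem, stays hermetic under sandboxed BUILD environments.

      ```python
      import os

      def data_dir_is_plausible(path):
          """Validate a data-dir flag without touching the filesystem."""
          if not path:
              return False
          # Accept absolute local paths or GCS URLs. Note there is no
          # os.path.exists() call, which is the kind of file access that
          # breaks a sandboxed build.
          return path.startswith("gs://") or os.path.isabs(path)
      ```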
    • Transformer multi gpu, remove multi_gpu flag, distribution helper functions (#4457) · 29c9f985
      Katherine Wu authored
      * Add DistributionStrategy to transformer model
      
      * add num_gpu flag
      
      * Calculate per device batch size for transformer
      
      * remove reference to flags_core
      
      * Add synthetic data option to transformer
      
      * fix typo
      
      * add import back in
      
      * Use hierarchical copy
      
      * address PR comments
      
      * lint
      
      * fix spaces
      
      * group train op together to fix single GPU error
      
      * Fix translate bug (sorted_keys is a dict, not a list)
      
      * Change params to a default dict (translate.py was throwing errors because params didn't have the TPU parameters.)
      
      * Address PR comments. Removed multi gpu flag + more
      
      * fix lint
      
      * fix more lints
      
      * add todo for Synthetic dataset
      
      * Update docs
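      "Calculate per device batch size for transformer" follows the usual pattern in these models (a sketch; the helper name is assumed): the global batch must divide evenly across GPUs, since DistributionStrategy feeds each device a same-sized slice.

      ```python
      def per_device_batch_size(batch_size, num_gpus):
          """Return the batch size each GPU sees (sketch of the contract)."""
          if num_gpus <= 1:
              return batch_size
          remainder = batch_size % num_gpus
          if remainder:
              raise ValueError(
                  "Batch size {} cannot be split evenly across {} GPUs; try "
                  "{} or {} instead.".format(
                      batch_size, num_gpus,
                      batch_size - remainder,
                      batch_size + num_gpus - remainder))
          return batch_size // num_gpus
      ```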
  25. 11 Jun, 2018 1 commit
  26. 08 Jun, 2018 1 commit