"vscode:/vscode.git/clone" did not exist on "5f73c08729e97eb3f760633c6ffba4f34cfe5538"
  1. 05 Sep, 2018 2 commits
  2. 04 Sep, 2018 1 commit
  3. 02 Sep, 2018 2 commits
  4. 01 Sep, 2018 2 commits
  5. 30 Aug, 2018 1 commit
  6. 29 Aug, 2018 1 commit
  7. 28 Aug, 2018 2 commits
  8. 27 Aug, 2018 2 commits
    • ResNet eval_only mode (#5186) · d1c48afc
      Taylor Robie authored
      * Make ResNet robust to the case where epochs_between_evals does not divide train_epochs, and add an --eval_only option (schedule logic sketched after this entry)
      
      * add some comments to make the control flow easier to follow
      
      * address PR comments
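      A minimal sketch of the schedule logic the first bullet describes, with hypothetical names (training_schedule and its arguments are illustrative, not the PR's actual code): run ceil(train_epochs / epochs_between_evals) train/eval cycles, truncating the last one, and let --eval_only skip training entirely.

      ```python
      import math

      def training_schedule(train_epochs, epochs_between_evals, eval_only=False):
          """Epochs to train in each train/eval cycle; [] means evaluate only."""
          if eval_only:
              return []
          n_cycles = math.ceil(train_epochs / epochs_between_evals)
          schedule = [epochs_between_evals] * n_cycles
          # Truncate the final cycle when the division is not exact.
          schedule[-1] = train_epochs - epochs_between_evals * (n_cycles - 1)
          return schedule

      print(training_schedule(10, 4))  # -> [4, 4, 2]
      ```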
    • Add 5 epoch warmup to resnet (#5176) · 9bf586de
      Toby Boyd authored
      * Add 5 epoch warmup (learning-rate sketch after this entry)
      
      * get_lr with warm_up only for imagenet
      
      * Add base_lr, remove fp16 unittest arg validation
      
      * Remove validation check stopping v1 and FP16
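      A hedged sketch of a 5-epoch linear warmup of the kind these bullets describe (the function and its arguments are illustrative; only base_lr corresponds to a flag named above):

      ```python
      import tensorflow as tf

      def learning_rate_with_warmup(global_step, steps_per_epoch, base_lr,
                                    warmup_epochs=5):
          """Scales the LR linearly from 0 to base_lr over the first epochs."""
          warmup_steps = float(warmup_epochs * steps_per_epoch)
          step = tf.cast(global_step, tf.float32)
          warmup_lr = base_lr * step / warmup_steps
          # After warmup, hand off to the usual decayed schedule (elided here).
          return tf.cond(step < warmup_steps,
                         lambda: warmup_lr,
                         lambda: tf.constant(base_lr, tf.float32))
      ```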
  9. 25 Aug, 2018 1 commit
  10. 22 Aug, 2018 1 commit
    • Fix convergence issues for MLPerf. (#5161) · 64710c05
      Reed authored
      * Fix convergence issues for MLPerf.
      
      Thank you to @robieta for helping me find these issues, and for providing an algorithm for the `get_hit_rate_and_ndcg_mlperf` function.
      
      This change causes every forked process to set a new seed, so that forked processes do not generate the same set of random numbers (sketched after this entry). This improves evaluation hit rates.
      
      Additionally, it adds a flag, --ml_perf, that makes further changes so that the evaluation hit rate can match the MLPerf reference implementation.
      
      I ran 4 times with --ml_perf and 4 times without. Without --ml_perf, the highest hit rates achieved by each run were 0.6278, 0.6287, 0.6289, and 0.6241. With --ml_perf, the highest hit rates were 0.6353, 0.6356, 0.6367, and 0.6353.
      
      * fix lint error
      
      * Fix failing test
      
      * Address @robieta's feedback
      
      * Address more feedback
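      A minimal sketch of the per-process reseeding idea, assuming a multiprocessing pool like the one NCF uses for negative generation (the helper names are illustrative): without the initializer, forked workers inherit the parent's NumPy RNG state and draw identical samples.

      ```python
      import multiprocessing

      import numpy as np

      def _init_worker():
          # Give each forked process its own seed so workers do not all
          # produce the same "random" stream.
          np.random.seed(multiprocessing.current_process().pid)

      def _draw(n):
          return np.random.randint(0, 100, size=n).tolist()

      if __name__ == "__main__":
          pool = multiprocessing.Pool(processes=4, initializer=_init_worker)
          print(pool.map(_draw, [3, 3, 3, 3]))  # distinct per-worker samples
      ```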
  11. 20 Aug, 2018 1 commit
  12. 18 Aug, 2018 1 commit
    • Speed up cache construction. (#5131) · 5aee67b4
      Reed authored
      This is done by using a higher pickle protocol version, which the Python docs describe as being "slightly more efficient". It reduces the file write time at the beginning of a run from 2.5 minutes to 5 seconds (see the sketch below).
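      A minimal sketch of the change, with an illustrative payload and path (not the actual NCF cache): passing an explicit, higher protocol to pickle.dump avoids the slow default (protocol 0 on Python 2).

      ```python
      import pickle

      cache = {"user_ids": list(range(10 ** 6))}  # stand-in for the real cache
      with open("/tmp/ncf_cache.pkl", "wb") as f:
          pickle.dump(cache, f, protocol=pickle.HIGHEST_PROTOCOL)
      ```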
  13. 16 Aug, 2018 2 commits
  14. 15 Aug, 2018 1 commit
  15. 14 Aug, 2018 2 commits
    • Transformer partial fix (#5092) · 6f5967a0
      alope107 authored
      * Fix Transformer TPU crash in Python 2.X.
      
      - TensorFlow raises an error when tf_inspect.getfullargspec is called on a functools.partial object in Python 2.x. The issue is hit during the eval stage of the Transformer TPU model. This change replaces the call to functools.partial with a lambda to work around it (see the sketch after this entry).
      
      * Remove unused import from transformer_main.
      
      * Fix lint error.
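      A hedged sketch of the workaround (model_fn and its parameters are illustrative): Python 2's inspect machinery, which tf_inspect falls back to, raises a TypeError when asked for the argspec of a functools.partial object, so the partial is replaced with a lambda that has a real signature.

      ```python
      def model_fn(features, labels, mode, params):
          ...  # stand-in for the Transformer model function

      # import functools
      # wrapped = functools.partial(model_fn, params={"use_tpu": True})
      #   ^ breaks tf_inspect.getfullargspec under Python 2.x
      wrapped = lambda features, labels, mode: model_fn(
          features, labels, mode, params={"use_tpu": True})
      ```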
    • Resnet transfer learning (#5047) · 7bffd37b
      Zac Wellmer authored
      * Warm-start a ResNet with all but the final dense layer, and update only the final layer's weights when fine-tuning (see the warm-start sketch after this entry)
      
      * Update README for Transfer Learning
      
      * Make lint happy and fix a variable-naming error related to scaled gradients
      
      * Edit the test cases for cifar10 and imagenet to reflect the default case of no fine-tuning
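      A hedged sketch of the warm-start setup (the checkpoint path, scope name, and regex are illustrative, not necessarily the PR's exact values): restore every variable except the final dense layer, then let the optimizer update only that layer.

      ```python
      import tensorflow as tf

      def model_fn(features, labels, mode, params):
          ...  # stand-in for the real ResNet model_fn

      warm_start = tf.estimator.WarmStartSettings(
          ckpt_to_initialize_from="/path/to/resnet_checkpoint",
          # Negative lookahead: warm-start everything NOT in the dense layer.
          vars_to_warm_start="^(?!.*dense)")

      estimator = tf.estimator.Estimator(model_fn=model_fn,
                                         warm_start_from=warm_start)

      # Inside model_fn, fine-tuning then restricts updates to the final layer:
      #   train_vars = [v for v in tf.trainable_variables() if "dense" in v.name]
      #   train_op = optimizer.minimize(loss, var_list=train_vars)
      ```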
  16. 13 Aug, 2018 1 commit
  17. 10 Aug, 2018 1 commit
  18. 02 Aug, 2018 2 commits
  19. 01 Aug, 2018 1 commit
  20. 31 Jul, 2018 8 commits
  21. 30 Jul, 2018 2 commits
    • NCF pipeline refactor (take 2) and initial TPU port. (#4935) · 6518c1c7
      Taylor Robie authored
      * intermediate commit
      
      * ncf now working
      
      * reorder pipeline
      
      * allow batched decode for file backed dataset
      
      * fix bug
      
      * more tweaks
      
      * parallelize false negative generation
      
      * shared pool hack
      
      * workers ignore sigint
      
      * intermediate commit
      
      * simplify buffer-backed dataset creation to the fixed-length record approach only (more cleanup needed; see the sketch after this entry)
      
      * more tweaks
      
      * simplify pipeline
      
      * fix misplaced cleanup() calls (validation works!)
      
      * more tweaks
      
      * sixify memoryview usage
      
      * more sixification
      
      * fix bug
      
      * add future imports
      
      * break up training input pipeline
      
      * more pipeline tuning
      
      * first pass at moving negative generation to async
      
      * refactor async pipeline to use files instead of ipc
      
      * refactor async pipeline
      
      * move expansion and concatenation from reduce worker to generation workers
      
      * abandon complete async due to interactions with the tensorflow threadpool
      
      * cleanup
      
      * remove performance_comparison.py
      
      * experiment with rough generator + interleave pipeline
      
      * yet more pipeline tuning
      
      * update on-the-fly pipeline
      
      * refactor preprocessing, and move train generation behind a GRPC server
      
      * fix leftover call
      
      * intermediate commit
      
      * intermediate commit
      
      * fix index error in data pipeline, and add logging to train data server
      
      * make sharding more robust to imbalance
      
      * correctly sample with replacement
      
      * file buffers are no longer needed for this branch
      
      * tweak sampling methods
      
      * add README for data pipeline
      
      * fix eval sampling, and vectorize eval metrics
      
      * add spillover and static training batch sizes
      
      * clean up cruft from earlier iterations
      
      * rough delint
      
      * delint 2 / n
      
      * add type annotations
      
      * update run script
      
      * make run.sh a bit nicer
      
      * change embedding initializer to match reference
      
      * rough pass at pure estimator model_fn
      
      * impose static shape hack (revisit later)
      
      * refinements
      
      * fix dir error in run.sh
      
      * add documentation
      
      * add more docs and fix an assert
      
      * old data test is no longer valid. Keeping it around as reference for the new one
      
      * rough draft of data pipeline validation script
      
      * don't rely on shuffle default
      
      * tweaks and documentation
      
      * add separate eval batch size for performance
      
      * initial commit
      
      * terrible hacking
      
      * mini hacks
      
      * missed a bug
      
      * messing about trying to get TPU running
      
      * TFRecords based TPU attempt
      
      * bug fixes
      
      * don't log remotely
      
      * more bug fixes
      
      * TPU tweaks and bug fixes
      
      * more tweaks
      
      * more adjustments
      
      * rework model definition
      
      * tweak data pipeline
      
      * refactor async TFRecords generation
      
      * temp commit to run.sh
      
      * update log behavior
      
      * fix logging bug
      
      * add check for subprocess start to avoid cryptic hangs
      
      * unify deserialize and make it TPU compliant
      
      * delint
      
      * remove gRPC pipeline code
      
      * fix logging bug
      
      * delint and remove old test files
      
      * add unit tests for NCF pipeline
      
      * delint
      
      * clean up run.sh, and add run_tpu.sh
      
      * forgot the most important line
      
      * fix run.sh bugs
      
      * yet more bash debugging
      
      * small tweak to add keras summaries to model_fn
      
      * Clean up sixification issues
      
      * address PR comments
      
      * delinting is never over
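      A hedged sketch of the fixed-length-record, batched-decode idea several bullets above mention (the file name, record layout, and batch size are illustrative): when every serialized example occupies the same number of bytes, the buffer file can be read back with tf.data and decoded a whole batch at a time.

      ```python
      import tensorflow as tf

      RECORD_BYTES = 8  # e.g. one int32 user id + one int32 item id

      dataset = (
          tf.data.FixedLengthRecordDataset("train_buffer.bin",
                                           record_bytes=RECORD_BYTES)
          .batch(1024)
          # Batched decode: each raw record becomes a row of two int32 values.
          .map(lambda raw: tf.reshape(tf.decode_raw(raw, tf.int32), (-1, 2))))
      ```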
    • Compute metrics under distributed strategies. (#4942) · a88b89be
      Sundara Tejaswi Digumarti authored
      Removed the conditional that skipped metric computation when a distribution strategy is in use; metrics are now computed even under distribution strategies (sketched below).
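      A minimal sketch of the shape of this change, with illustrative names (the real code lives in the model_fn): the metric dict is now built unconditionally instead of being gated on whether a distribution strategy is active.

      ```python
      import tensorflow as tf

      def eval_metric_ops(labels, predictions):
          # Previously returned {} when a distribution strategy was in use;
          # now the metrics are computed in every configuration.
          return {"accuracy": tf.metrics.accuracy(labels, predictions)}
      ```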
  22. 26 Jul, 2018 1 commit
    • fix batch_size in transformer_main.py (#4897) · 2d7a0d6a
      Jiang Yu authored
      * fix batch_size in transformer_main.py
      
      Fix batch_size in transformer_main.py, which caused a ResourceExhaustedError (OOM) when training Transformer models with models/official/transformer.
      
      * small format change
      
      Change the format from one line to multiple lines in order to pass lint tests.
      
      * remove trailing space and add comment
  23. 21 Jul, 2018 1 commit
  24. 20 Jul, 2018 1 commit
    • Add eager for keras benchmark (#4825) · 2689c9ae
      Yanhui Liang authored
      * Add more arguments
      
      * Add eager mode (see the sketch after this entry)
      
      * Add notes for eager mode
      
      * Address the comments
      
      * Fix argument typos
      
      * Add warning for eager and multi-gpu
      
      * Fix typo
      
      * Fix notes
      
      * Fix pylint
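      A minimal sketch of the eager switch these bullets describe (the constant stands in for the benchmark's command-line flag): in TF 1.x, eager execution must be enabled once at startup, before any graph is built, and at the time it did not combine with the multi-GPU path, hence the warning mentioned above.

      ```python
      import tensorflow as tf

      RUN_EAGERLY = True  # stand-in for the benchmark's eager flag

      if RUN_EAGERLY:
          tf.enable_eager_execution()  # must run before any ops are created

      print(tf.executing_eagerly())  # True when eager mode is active
      ```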