  1. 20 Mar, 2019 1 commit
  2. 19 Mar, 2019 3 commits
  3. 18 Mar, 2019 1 commit
  4. 13 Mar, 2019 2 commits
  5. 12 Mar, 2019 2 commits
  6. 11 Mar, 2019 1 commit
  7. 07 Mar, 2019 3 commits
  8. 06 Mar, 2019 1 commit
  9. 02 Mar, 2019 1 commit
  10. 01 Mar, 2019 3 commits
    • Keras-fy NCF Model (#6092) · 048e5bff
      Shining Sun authored
      * tmp commit
      
      * tmp commit
      
      * first attempt (without eval)
      
      * Bug fixes
      
      * bug fixes
      
      * training done
      
      * Loss NAN, no eval
      
      * Loss weight problem solved
      
      * resolve the NAN loss problem
      
      * Problem solved. Clean up needed
      
      * Added a todo
      
      * Remove debug prints
      
      * Extract get_optimizer to ncf_common
      
      * Move metrics computation back to neumf; use DS.scope api
      
      * Extract DS.scope code to utils
      
      * lint fixes
      
      * Move obtaining DS above producer.start to avoid race condition
      
      * move pt 1
      
      * move pt 2
      
      * Update the run script
      
      * Wrap keras_model related code into functions
      
      * Update the doc for softmax_logitfy and change the method name
      
      * Resolve PR comments
      
      * working version with: eager, DS, batch and no masks
      
      * Remove git conflict indicator
      
      * move reshape to neumf_model
      
      * working version, not converge
      
      * converged
      
      * fix a test
      
      * more lint fix
      
      * more lint fix
      
      * more lint fixes
      
      * more lint fix
      
      * Removed unused imports
      
      * fix test
      
      * dummy commit for kicking off checks
      
      * fix lint issue
      
      * dummy input to kick off checks
      
      * dummy input to kick off checks
      
      * add collective to dist strat
      
      * addressed review comments
      
      * add a doc string
    • Add Keras XLA Tests (#6286) · fa9ed456
      Haoyu Zhang authored
      * Added XLA test with a monkey-patched op to avoid OOM
      
      * Added doc strings in Keras benchmarks to avoid Lint error
    • Update imagenet_preprocessing.py (#6291) · a76cd3ac
      Yash Katariya authored
  11. 28 Feb, 2019 3 commits
  12. 25 Feb, 2019 1 commit
  13. 22 Feb, 2019 4 commits
  14. 21 Feb, 2019 2 commits
    • Multi-worker support for Resnet. (#6206) · f2e90945
      Ayush Dubey authored
      * Update official resnet for multi worker training with distribution strategies.
      
      * Fixes for multi worker training.
      
      * Fix call to `get_distribution_strategy`.
      
      * Undo test change.
      
      * Fix spacing.
      
      * Move cluster configuration to distribution_utils.
      
      * Move train_and_evaluate out of loop.  Also, update docstrings for multi-worker flags and add use_train_and_evaluate flag.
      
      * Update distribution_strategy flag to match exported name for collective strategy.
    • Add flag to enable XLA in Keras models (#6240) · 4571d3fa
      Haoyu Zhang authored
      * Add flag to enable XLA in Keras models
      
      * Fix lint errors (some of them are old errors)
  15. 19 Feb, 2019 1 commit
  16. 15 Feb, 2019 1 commit
  17. 14 Feb, 2019 7 commits
  18. 13 Feb, 2019 2 commits
  19. 12 Feb, 2019 1 commit
    • Add model_dir to all tests to avoid "resource not found error". (#6143) · f788046c
      Toby Boyd authored
      * fix test benchmark_graph_1_gpu_no_dist_strat failing
      
      - Failure only occurs when all 1_gpu tests are run
      together with the error:
      tensorflow.python.framework.errors_impl.NotFoundError:
      Resource localhost/logdir:/tmp/cifar10_model/
      N10tensorflow22SummaryWriterInterfaceE does not exist.
      [Op:WriteScalarSummary] name: epoch_loss/
      
      Another fix might be to generate a different model_dir
      in the core code, but that has other drawbacks, such as
      restarting from the checkpoint.
      
      * Model_dir for all tests.
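
The failure described in #6143 arises when several tests write summaries to the same directory (`/tmp/cifar10_model/` in the error text). The sketch below shows one way a test could be given its own `model_dir`. It is an illustration under assumptions, not the change made in the commit: `make_model_dir` is a hypothetical helper, and the actual tests may set `model_dir` differently.

```python
import os
import tempfile


def make_model_dir(test_name, root=None):
    """Create and return a fresh directory for one test's checkpoints and summaries.

    Hypothetical helper, not taken from the repository. Each call creates a
    new directory, so two tests never share a summary writer resource path.
    """
    root = root or tempfile.gettempdir()
    # mkdtemp creates the directory and guarantees a unique name under root.
    return tempfile.mkdtemp(prefix=test_name + "_", dir=root)


if __name__ == "__main__":
    first = make_model_dir("benchmark_graph_1_gpu_no_dist_strat")
    second = make_model_dir("benchmark_1_gpu")
    print(first != second, os.path.isdir(first), os.path.isdir(second))
```

As the commit message notes, choosing a different directory inside the core code has its own cost, since a run pointed at a new directory will not find the earlier checkpoint. Keeping the choice in the tests avoids that trade-off.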