1. 24 Jul, 2019 8 commits
  2. 23 Jul, 2019 5 commits
    • Single execution path tests for ResNet50, ResNet56, NCF, and Shakespeare LSTM. (#7276) · 9d8c9aa4
      Toby Boyd authored
      * Add force_run_distributed tests.
      
      * Added enable_eager
      
      * r/force_run_distributed/force_v2_in_keras_compile
      
      * Adding force_v2 tests and FLAGs.
      
      * Rename method to avoid conflict.
      
      * Add cpu force_v2 tests.
      
      * fix lint, wrap line.
      
      * change to force_v2_in_keras_compile
      
      * Update method name.
      
      * Lower mlperf target to 0.736.
    • Toby Boyd's avatar
      8390b362
    • Merged commit includes the following changes: (#7281) · 64d6c094
      Hongjun Choi authored
      * Merged commit includes the following changes:
      259442882  by hongkuny<hongkuny@google.com>:
      
          Internal
      
      --
      259377621  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Fix NCF serialization/de-serialization logic in NCF input pipeline to use tf.FixedLenFeature instead of raw string/binary decoding.
      
      --
      259373183  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Create binary to generate NCF training/evaluation dataset offline.
      
      --
      259026454  by isaprykin<isaprykin@google.com>:
      
          Internal change
      
      258871624  by hongkuny<hongkuny@google.com>:
      
          Internal change
      
      257285772  by haoyuzhang<haoyuzhang@google.com>:
      
          Internal change
      
      256202287  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Internal change.
      
      --
      254069984  by hongkuny<hongkuny@google.com>:
          Automated rollback of changelist 254060732.
      
      254060732  by yifeif<yifeif@google.com>:
          Automated rollback of changelist 254027750.
      
      254027750  by hongkuny<hongkuny@google.com>:
      
          Internal change
      
      253118910  by hongkuny<hongkuny@google.com>:
      
          Internal change
      
      251906769  by hongkuny<hongkuny@google.com>:
      
          Internal change
      
      251303452  by haoyuzhang<haoyuzhang@google.com>:
      
          Internal change
      
      PiperOrigin-RevId: 259442882
      
      * Update ncf_keras_main.py
    • Update lint presubmit to be consistent with tensorflow (#7278) · 609260cd
      Hongkun Yu authored
      Only care about errors and output into an error file.
    • Merged commit includes the following changes: (#7277) · 1fc839bc
      Hongkun Yu authored
      259442882  by hongkuny<hongkuny@google.com>:
      
          Internal
      
      --
      259341546  by mrry<mrry@google.com>:
      
          Remove DEBUG-level logging from the BERT benchmark.
      
          This triggers graph serialization and other verbose logging in the TensorFlow runtime, which inflates the execution time.
      
      --
      259253185  by hongkuny<hongkuny@google.com>:
      
          Writes a separate checkpoint for the core model in pretraining.
          Clean up export utils to just take a model as argument.
      
      --
      258893811  by hongkuny<hongkuny@google.com>:
      
          Adds summaries for metrics, allowing metrics inside keras.model.
      
      --
      258881002  by hongkuny<hongkuny@google.com>:
      
          Fix lint.
      
      --
      258597234  by rxsang<rxsang@google.com>:
      
          Update all the TPUStrategy examples to use the new v2 APIs, i.e.
          make_dataset_iterator -> experimental_distribute_dataset,
          make_input_fn_iterator -> experimental_distribute_datasets_from_function,
          unwrap -> experimental_local_results,
          experimental_run -> experimental_run_v2
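
Note: the rename table above can be written down as data. This is a minimal sketch only; `RENAMES` is copied from the changelist, while `migrate_calls` is a hypothetical helper that is not part of the repository. Argument lists may also differ between the old and new methods, so a textual rename is at best a first pass.

```python
import re

# Old -> new tf.distribute method names, copied from the changelist above.
RENAMES = {
    "make_dataset_iterator": "experimental_distribute_dataset",
    "make_input_fn_iterator": "experimental_distribute_datasets_from_function",
    "unwrap": "experimental_local_results",
    "experimental_run": "experimental_run_v2",
}

# Match ".name(" only, so an already migrated ".experimental_run_v2(" is left alone.
_PATTERN = re.compile(r"\.(" + "|".join(map(re.escape, RENAMES)) + r")\(")


def migrate_calls(source: str) -> str:
    """Hypothetical helper: rewrite old method calls to the v2 names."""
    return _PATTERN.sub(lambda m: "." + RENAMES[m.group(1)] + "(", source)
```

Running the helper twice on the same text gives the same result as running it once, because the pattern requires an opening parenthesis directly after the old name.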
      
      --
      258581998  by taylorrobie<taylorrobie@google.com>:
      
          Update keras v2 optimizers to reuse coefficients which are shared across all updates, which reduces the total number of ops created by between 5% (for simple optimizers such as SGD and Adagrad) and 25% (for complicated optimizers such as Adam and NAdam). Separate copies are made for each device and dtype.
      
          The effect of this change on run time is fairly minimal since Grappler is expected to consolidate most of these ops; however it does improve graph construction time.
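
Note: a toy sketch of the idea described above, not the Keras implementation. Terms that are identical for every variable are computed once per (device, dtype) pair rather than once per variable. The function names, the dict-based variables, and the Adam-style coefficients are assumptions made for illustration.

```python
import math


def adam_coefficients(step, lr=0.001, beta_1=0.9, beta_2=0.999):
    """Step-dependent Adam terms that are the same for every variable."""
    return {
        "lr_t": lr * math.sqrt(1 - beta_2 ** step) / (1 - beta_1 ** step),
        "one_minus_beta_1": 1 - beta_1,
        "one_minus_beta_2": 1 - beta_2,
    }


def prepare(step, variables):
    """Build one coefficient dict per (device, dtype) instead of one per variable."""
    state = {}
    for var in variables:
        key = (var["device"], var["dtype"])
        if key not in state:
            state[key] = adam_coefficients(step)
    return state
```

With four variables spread over two (device, dtype) pairs, `prepare` computes the coefficients twice instead of four times. The saving grows with the number of variables per pair.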
      
      --
      258208153  by hongkuny<hongkuny@google.com>:
      
          Adds run_eagerly option for bert.
      
      --
      257883986  by hongkuny<hongkuny@google.com>:
      
          Adds tf.summary for bert training
      
      --
      256204636  by hongkuny<hongkuny@google.com>:
      
          Internal
      
      --
      256079834  by hongkuny<hongkuny@google.com>:
      
          Clean up: move common flags together for further refactoring
          Enable steps_per_loop option for all applications.
      
      --
      255493073  by hongkuny<hongkuny@google.com>:
      
          BERT initial OSS readme update.
      
      --
      255470372  by dmchen<dmchen@google.com>:
      
          Slightly expand expected range for F1 score in BERT SQuAD accuracy test
      
      --
      255109240  by hongkuny<hongkuny@google.com>:
      
          Update eval/predict batch sizes.
      
      --
      255010016  by hongkuny<hongkuny@google.com>:
      
          Internal
      
      --
      254874613  by hongkuny<hongkuny@google.com>:
      
          Update glue tasks enum to match directory name
      
      --
      254866171  by taylorrobie<taylorrobie@google.com>:
      
          Internal change
      
      254785517  by zongweiz<zongweiz@google.com>:
      
          Use train_single_step for BERT GPU models to temporarily work around some performance bugs in GPU runs
      
      --
      254497647  by hongkuny<hongkuny@google.com>:
      
          Fix device placement for TPU export model.
      
      --
      254134531  by yuefengz<yuefengz@google.com>:
      
          Fix a typo in bert_benchmark.py
      
      --
      254069984  by hongkuny<hongkuny@google.com>:
          Automated rollback of changelist 254060732.
      
      254061429  by hongkuny<hongkuny@google.com>:
      
          Use host while loop for training steps.
      
      --
      254060732  by yifeif<yifeif@google.com>:
          Automated rollback of changelist 254027750.
      
      254027750  by hongkuny<hongkuny@google.com>:
      
          Internal change
      
      253850824  by hongkuny<hongkuny@google.com>:
      
          Improve bert training utils.
      
      --
      253818191  by hongkuny<hongkuny@google.com>:
      
          Update savedmodel export to use new model.save() api.
      
      --
      253636854  by dmchen<dmchen@google.com>:
      
          Run only training in BERT SQuAD performance test
      
      --
      253118910  by hongkuny<hongkuny@google.com>:
      
          Internal change
      
      253113801  by zongweiz<zongweiz@google.com>:
      
          Internal change
      
      252697519  by dmchen<dmchen@google.com>:
      
          BERT SQuAD accuracy test
      
      --
      252663512  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Internal change
      
      --
      252647871  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Enable multi worker TPU training for BERT pretraining.
      
      --
      252522861  by hongkuny<hongkuny@google.com>:
      
          Remove export using trained model due to implementation error
      
      --
      252156812  by yuefengz<yuefengz@google.com>:
      
          Fix the callback method name in BERT: replaced on_batch_start with on_batch_begin. Without the fix, it won't work with Keras callbacks.
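
Note: a minimal sketch of why the hook name matters. `Callback` below is a simplified stand-in for a Keras-style callback, which defines `on_batch_begin`. `BatchRecorder` and `run_steps` are hypothetical names, and which side of the mismatch the original code had is assumed here.

```python
class Callback:
    """Simplified stand-in for a Keras-style callback base class."""

    def on_batch_begin(self, batch, logs=None):
        pass

    def on_batch_end(self, batch, logs=None):
        pass


class BatchRecorder(Callback):
    """Records the index of every batch it is notified about."""

    def __init__(self):
        self.seen = []

    def on_batch_begin(self, batch, logs=None):
        self.seen.append(batch)


def run_steps(callback, num_steps, hook_name="on_batch_begin"):
    """Hypothetical training loop that notifies the callback before each step."""
    hook = getattr(callback, hook_name)  # AttributeError if the hook does not exist
    for step in range(num_steps):
        hook(step)
```

Passing `hook_name="on_batch_start"` fails with `AttributeError`, because the callback only implements `on_batch_begin`.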
      
      --
      251782065  by dmchen<dmchen@google.com>:
      
          Internal change
      
      251681245  by hongkuny<hongkuny@google.com>:
      
          Update bert to use the new tf.distribute APIs
      
      --
      251575972  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Remove `steps_per_run` when instantiating TPUStrategy.
      
      --
      251325964  by hongkuny<hongkuny@google.com>:
      
          Improve flags
      
      --
      250942274  by tobyboyd<tobyboyd@google.com>:
      
          Internal change
      
      250779087  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Reduce BERT Perfzero benchmark test training steps.
      
      --
      250713045  by hongkuny<hongkuny@google.com>:
      
          TPU util
      
      --
      250606180  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Fix BERT benchmark test errors.
      
      --
      250589623  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Change BERT benchmark test pretrained checkpoint url.
      
      --
      250587892  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Fix error in BERT custom training loop checkpoint restoration.
      
      --
      250577163  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Add logic to inject callback that measures performance in BERT custom training
          loop.
      
      --
      250529526  by hongkuny<hongkuny@google.com>:
      
          Internal clean up
      
      --
      250428976  by hongkuny<hongkuny@google.com>:
      
          Internal change
      
      250415383  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Add min/max value to BERT classifier benchmark test.
      
      --
      250376246  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Add benchmark performance test to run BERT on multiple numbers of GPUs.
      
      --
      250347237  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Fix linting errors in BERT benchmark test.
      
      --
      250326131  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Internal change
      
      250315593  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Internal change
      
      250303528  by haoyuzhang<haoyuzhang@google.com>:
      
          Add method docstring to fix lint error.
      
      --
      250009207  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Add feature in BERT to write training metrics to a summary file.
      
      --
      249896208  by hongkuny<hongkuny@google.com>:
      
          Adds __init__.py
      
      --
      249883771  by hongkuny<hongkuny@google.com>:
      
          Creates a benchmark dir
      
      --
      249580533  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Internal change
      
      249566870  by A. Unique TensorFlower<gardener@tensorflow.org>:
      
          Set up BERT benchmark test.
      
      --
      249500988  by hongkuny<hongkuny@google.com>:
      
          Lints
      
      --
      249377254  by hongkuny<hongkuny@google.com>:
      
          Internal change
      
      249373328  by hongkuny<hongkuny@google.com>:
      
          Clean up tf import
      
      --
      249333938  by hongkuny<hongkuny@google.com>:
      
          Fix tf1 import
      
      --
      249325089  by hongkuny<hongkuny@google.com>:
      
          BERT 2.0
      
      --
      249173564  by hongkuny<hongkuny@google.com>:
      
          Internal change
      
      PiperOrigin-RevId: 259442882
  3. 22 Jul, 2019 1 commit
    • Add a new sanity check script that is able to only check incremental changes. (#7265) · 6a6c3616
      Hongkun Yu authored
      * Update pylint.rcfile
      
      * Update pylint.rcfile
      
      * Update pylint.rcfile
      
      * add new sanity check script for lint to replace current lint script.
      
      * Revert "Update pylint.rcfile"
      
      This reverts commit f6036cd7e7c4b9e3eeb47bb56a63927a040a2761.
      
      * Revert "Update pylint.rcfile"
      
      This reverts commit e3af497342e26bbbbecfc8c8f79cb0e24a2ef960.
      
      * Revert "Update pylint.rcfile"
      
      This reverts commit 6136636eee6e90fd191ebbb4ccaa9fb89c0290f4.
      
      * update scripts
      
      * disable trailing-newlines
  4. 21 Jul, 2019 1 commit
  5. 20 Jul, 2019 3 commits
  6. 19 Jul, 2019 9 commits
    • Merged commit includes the following changes: (#7264) · 6f47c378
      Igor authored
      259030078  by isaprykin<isaprykin@google.com>:
      
          Clean up the --clone_model_in_keras_dist_strat from Keras Resnet.
      
          The cloning flag has been removed.  The current rule is that cloning is only done in graph mode.  That resulted in duplicate benchmarks: eager+no-cloning vs eager+cloning.  I removed eager+cloning ones.
      
      --
      259026454  by isaprykin<isaprykin@google.com>:
      
          Internal change
      
      PiperOrigin-RevId: 259030078
    • Merged commit includes the following changes: (#7263) · c5a4978d
      Jing Li authored
      * Merged commit includes the following changes:
      258867180  by jingli<jingli@google.com>:
      
          Add new folders for upcoming reorg in model garden.
      
      --
      258893811  by hongkuny<hongkuny@google.com>:
      
          Adds summaries for metrics, allowing metrics inside keras.model.
      
      --
      258893048  by isaprykin<isaprykin@google.com>:
      
          Remove the `cloning` argument to `compile()`.
      
          Keras models are distributed by cloning in graph mode and without cloning in eager mode as of the change # 258652546.
      
      --
      258881002  by hongkuny<hongkuny@google.com>:
      
          Fix lint.
      
      --
      258874998  by hongkuny<hongkuny@google.com>:
      
          Internal
      
      --
      258872662  by hongkuny<hongkuny@google.com>:
      
          Fix doc
      
      --
      
      PiperOrigin-RevId: 258867180
      
      * Create __init__.py
      
      * Update __init__.py
      
      * Update __init__.py
      
      * Update __init__.py
    • Revert "Change how TF 2 is checked" (#7260) · 2569fa9a
      Toby Boyd authored
      This reverts commit 712f473e.
    • Fix lint error · 283de38b
      guptapriya authored
    • Disable ncf tests for 1.x · 8c8779a3
      guptapriya authored
    • NCF Keras: Fail early with TF 1.x + dist strat · 41d071ee
      guptapriya authored
      This combination does not yet work. Fail early with an explicit message instead of throwing error later on.
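
Note: a minimal sketch of the fail-early pattern described above. The function name, its parameters, and the `"off"` strategy value are assumptions for illustration and are not taken from the NCF code.

```python
def validate_ncf_config(tf_major_version, distribution_strategy):
    """Hypothetical check: reject an unsupported combination before any work starts."""
    uses_strategy = distribution_strategy not in (None, "off")
    if tf_major_version < 2 and uses_strategy:
        raise ValueError(
            "NCF Keras with a distribution strategy requires TF 2.x; "
            "got TF %d.x with strategy %r." % (tf_major_version, distribution_strategy)
        )
```

Calling the check at the top of the entry point surfaces the problem with a clear message, instead of an unrelated error much later in the run.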
    • Fix for TF-models #7216: CIFAR-10 tutorial for multi-GPU fails because full... · 97a87f9c
      Chris Mattmann authored
      Fix for TF-models #7216: CIFAR-10 tutorial for multi-GPU fails because full shape isn't passed to prefetch_queue, contributed by mattmann. (#7217)
      
    • Change how TF 2 is checked · 712f473e
      guptapriya authored
      The current approach checks for the presence of contrib. Sometimes this is not sufficient (e.g., when testing TF 1 + enable_v2_behavior=True, which is what internal tests currently do).
    • Merged commit includes the following changes: (#7255) · 32fadf00
      Hongkun Yu authored
      258881002  by hongkuny<hongkuny@google.com>:
      
          Fix lint.
      
      --
      258874998  by hongkuny<hongkuny@google.com>:
      
          Internal
      
      --
      258872662  by hongkuny<hongkuny@google.com>:
      
          Fix doc
      
      --
      258871624  by hongkuny<hongkuny@google.com>:
      
          Internal change
      
      PiperOrigin-RevId: 258881002
  7. 18 Jul, 2019 7 commits
    • Merged commit includes the following changes: (#7252) · 1fb34e76
      Hongkun Yu authored
      258597234  by rxsang<rxsang@google.com>:
      
          Update all the TPUStrategy examples to use the new v2 APIs, i.e.
          make_dataset_iterator -> experimental_distribute_dataset,
          make_input_fn_iterator -> experimental_distribute_datasets_from_function,
          unwrap -> experimental_local_results,
          experimental_run -> experimental_run_v2
      
      --
      258581998  by taylorrobie<taylorrobie@google.com>:
      
          Update keras v2 optimizers to reuse coefficients which are shared across all updates, which reduces the total number of ops created by between 5% (for simple optimizers such as SGD and Adagrad) and 25% (for complicated optimizers such as Adam and NAdam). Separate copies are made for each device and dtype.
      
          The effect of this change on run time is fairly minimal since Grappler is expected to consolidate most of these ops; however it does improve graph construction time.
      
      --
      
      PiperOrigin-RevId: 258597234
    • Update CODEOWNERS (#7251) · 79b87be6
      Jing Li authored
    • Refactor and add benchmarks as well as accuracy tests for GPU and CPU (#7248) · e0a2b8c3
      Toby Boyd authored
      * Added benchmarks and common flags.
      
      * Add cpu tests.
      
      * Add tracking epoch times.
      
      * fix transformer.
      
      * Add examples_per_second.
      
      * fix pylint
    • Fix for #7225: CIFAR-10 eval fails with error TypeError: Input 'predictions'... · 63605b95
      Chris Mattmann authored
      Fix for #7225: CIFAR-10 eval fails with error TypeError: Input 'predictions' of 'InTopKV2' Op has type float16 that contributed by mattmann. (#7227)
      
    • Merged commit includes the following changes: (#7250) · 3b9025d5
      Yongzhe Wang authored
      * Merged commit includes the following changes:
      257930561  by yongzhe:
      
          Mobile LSTD TfLite Client.
      
      --
      257928126  by yongzhe:
      
          Mobile SSD Tflite client.
      
      --
      257921181  by menglong:
      
          Fix discrepancy between pre_bottleneck = {true, false}
      
      --
      257561213  by yongzhe:
      
          File utils.
      
      --
      257449226  by yongzhe:
      
          Mobile SSD Client.
      
      --
      257264654  by yongzhe:
      
          SSD utils.
      
      --
      257235648  by yongzhe:
      
          Proto bazel build rules.
      
      --
      256437262  by Menglong Zhu:
      
          Fix check for FusedBatchNorm op to only verify it as a prefix.
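
Note: a minimal sketch of a prefix check versus an exact-match check. The function names are hypothetical; the op type strings are TensorFlow op names, and versioned variants such as `FusedBatchNormV3` are the case an exact match would miss.

```python
def is_fused_batch_norm_exact(op_type):
    """Exact match: misses versioned variants of the op."""
    return op_type == "FusedBatchNorm"


def is_fused_batch_norm(op_type):
    """Prefix match: accepts FusedBatchNorm and its versioned variants."""
    return op_type.startswith("FusedBatchNorm")
```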
      
      --
      256283755  by yongzhe:
      
          Bazel build and copybara changes.
      
      --
      251947295  by yinxiao:
      
          Add missing interleaved option in checkpoint restore.
      
      --
      251513479  by yongzhe:
      
          Conversion utils.
      
      --
      248783193  by yongzhe:
      
          Branch protos needed for the lstd client.
      
      --
      248200507  by menglong:
      
          Fix proto namespace in example config
      
      --
      
      PiperOrigin-RevId: 257930561
      
      * Delete BUILD
      
      * Merged commit includes the following changes:
      258709909  by yongzhe:
      
          1. Fix a bug that input wasn't copied.
          2. Change the tensor indexing to support graph with postprocessing.
          3. Fix a bug that the quantized lstm states weren't initialized.
      
      --
      258398095  by yongzhe:
      
          Internal change.
      
      --
      
      PiperOrigin-RevId: 258709909
      
      * Adding myself as the code owner
    • Improve Keras graph performance for ResNet56 (#7241) · dd5a91d3
      Haoyu Zhang authored
      * Config threadpool, cuDNN persistent BN, and grappler layout optimizer properly for ResNet56
      
      * Add tweaked tests for Resnet56
      
      * Avoid triggering the last partial batch overhead by explicitly dropping remainder
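
Note: a pure-Python sketch of what dropping the remainder means. `batch` is a hypothetical helper; `tf.data.Dataset.batch` exposes a `drop_remainder` argument with the same meaning, though whether this commit uses that exact call is an assumption.

```python
def batch(items, batch_size, drop_remainder=False):
    """Split items into batches; optionally discard a short final batch."""
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()  # every remaining batch now has exactly batch_size items
    return batches
```

With ten items and a batch size of four, the default yields batches of sizes 4, 4, and 2. With `drop_remainder=True` only the two full batches remain, so every step sees the same batch shape.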
    • Merged commit includes the following changes: (#7249) · b7221961
      Yongzhe Wang authored
      * Merged commit includes the following changes:
      257930561  by yongzhe:
      
          Mobile LSTD TfLite Client.
      
      --
      257928126  by yongzhe:
      
          Mobile SSD Tflite client.
      
      --
      257921181  by menglong:
      
          Fix discrepancy between pre_bottleneck = {true, false}
      
      --
      257561213  by yongzhe:
      
          File utils.
      
      --
      257449226  by yongzhe:
      
          Mobile SSD Client.
      
      --
      257264654  by yongzhe:
      
          SSD utils.
      
      --
      257235648  by yongzhe:
      
          Proto bazel build rules.
      
      --
      256437262  by Menglong Zhu:
      
          Fix check for FusedBatchNorm op to only verify it as a prefix.
      
      --
      256283755  by yongzhe:
      
          Bazel build and copybara changes.
      
      --
      251947295  by yinxiao:
      
          Add missing interleaved option in checkpoint restore.
      
      --
      251513479  by yongzhe:
      
          Conversion utils.
      
      --
      248783193  by yongzhe:
      
          Branch protos needed for the lstd client.
      
      --
      248200507  by menglong:
      
          Fix proto namespace in example config
      
      --
      
      P...
  8. 16 Jul, 2019 3 commits
    • Merged commit includes the following changes: (#7221) · e21dcdd0
      Hongkun Yu authored
      258208153  by hongkuny<hongkuny@google.com>:
      
          Adds run_eagerly option for bert.
      
      --
      
      PiperOrigin-RevId: 258208153
    • Ncf perf optimizations for CTL and multi GPU (#7206) · 492f8c92
      nnigania authored
      * NCF perf changes: (1) exclude the metric layer from the CTL train step; (2) dataset optimization to fix the size of sample_weights, preventing a costly broadcast during loss calculation in the multi-GPU case.
    • Merged commit includes the following changes: (#7220) · 66d00a87
      yongzhe2160 authored
      * Merged commit includes the following changes:
      257930561  by yongzhe:
      
          Mobile LSTD TfLite Client.
      
      --
      257928126  by yongzhe:
      
          Mobile SSD Tflite client.
      
      --
      257921181  by menglong:
      
          Fix discrepancy between pre_bottleneck = {true, false}
      
      --
      257561213  by yongzhe:
      
          File utils.
      
      --
      257449226  by yongzhe:
      
          Mobile SSD Client.
      
      --
      257264654  by yongzhe:
      
          SSD utils.
      
      --
      257235648  by yongzhe:
      
          Proto bazel build rules.
      
      --
      256437262  by Menglong Zhu:
      
          Fix check for FusedBatchNorm op to only verify it as a prefix.
      
      --
      256283755  by yongzhe:
      
          Bazel build and copybara changes.
      
      --
      251947295  by yinxiao:
      
          Add missing interleaved option in checkpoint restore.
      
      --
      251513479  by yongzhe:
      
          Conversion utils.
      
      --
      248783193  by yongzhe:
      
          Branch protos needed for the lstd client.
      
      --
      248200507  by menglong:
      
          Fix proto namespace in example config
      
      --
      
      PiperOrigin-RevId: 257930561
      
      * Delete BUILD
  9. 15 Jul, 2019 3 commits
    • Initial implementation of Shakespeare character LSTM. (#7218) · 395f6d2d
      Bruce Fontaine authored
      * Initial implementation of Shakespeare character LSTM.
      
      * Fix import order
    • Merged commit includes the following changes: (#7209) · dc8c6ce1
      Hongkun Yu authored
      257883986  by hongkuny<hongkuny@google.com>:
      
          Adds tf.summary for bert training
      
      --
      
      PiperOrigin-RevId: 257883986
    • Object detection changes: (#7208) · fe748d4a
      pkulzc authored
      257914648  by lzc:
      
          Internal changes
      
      --
      257525973  by Zhichao Lu:
      
          Fixes bug that silently prevents checkpoints from loading when training w/ eager + functions. Also sets up scripts to run training.
      
      --
      257296614  by Zhichao Lu:
      
          Adding detection_features to model outputs
      
      --
      257234565  by Zhichao Lu:
      
          Fix wrong order of `classes_with_max_scores` in class-agnostic NMS caused by
          sorting in partitioned-NMS.
      
      --
      257232002  by ronnyvotel:
      
          Supporting `filter_nonoverlapping` option in np_box_list_ops.clip_to_window().
      
      --
      257198282  by Zhichao Lu:
      
          Adding the focal loss and l1 loss from the Objects as Points paper.
      
      --
      257089535  by Zhichao Lu:
      
          Create Keras based ssd + resnetv1 + fpn.
      
      --
      257087407  by Zhichao Lu:
      
          Make object_detection/data_decoders Python3-compatible.
      
      --
      257004582  by Zhichao Lu:
      
          Updates _decode_raw_data_into_masks_and_boxes to the latest binary masks-to-string encoding fo...