- 19 Aug, 2019 1 commit
Ayush Dubey authored
PiperOrigin-RevId: 264244022

- 16 Aug, 2019 2 commits
Hongkun Yu authored
PiperOrigin-RevId: 263863438
Priya Gupta authored
PiperOrigin-RevId: 263854996

- 12 Aug, 2019 2 commits
Hongjun Choi authored
* 262988559 by A. Unique TensorFlower <gardener@tensorflow.org>: Enable NCF TF 2.0 model to run on TPUStrategy.
* 262971756 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change
* 262967691 by hongkuny <hongkuny@google.com>: Internal change

PiperOrigin-RevId: 262988559
Hongkun Yu authored
* 262962783 by hongkuny <hongkuny@google.com>: Internal change
* 262460803 by hongkuny <hongkuny@google.com>: Add a public method to extract shareable layers with decoder.
* 262315011 by A. Unique TensorFlower <gardener@tensorflow.org>: Refactor TPU initialization logic to common module.
* 262299019 by akuegel <akuegel@google.com>: Internal change
* 262178259 by hongkuny <hongkuny@google.com>: We should call training=True in CTL train step.
* 262081759 by akuegel <akuegel@google.com>: Internal change
* 262021128 by isaprykin <isaprykin@google.com>: Internal change
* 262004398 by taylorrobie <taylorrobie@google.com>: Internal change
* 261786323 by yanhuasun <yanhuasun@google.com>: Replace set/dict with ObjectIdentityDict/Set to prepare for eq implementation.
* 261393597 by hongkuny <hongkuny@google.com>: Add an encoder mode for BertModel which returns all layers.
* 261218818 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change
* 261202754 by hongkuny <hongkuny@google.com>: Use enable_xla flag for classifier and SQuAD, so the XLA option is exposed to users.
* 261171038 by gjn <gjn@google.com>: Remove the weight_decay_rate 0 early-exit check. Removing this code path should be fine since it was not doing what it meant to do: weight_decay_rate is actually a tensor, so the equality check only compared the object's id to 0, which should never be true, and evaluating a tensor is not what we want at this point in the code.
* 261169862 by haoyuzhang <haoyuzhang@google.com>: Internal change
* 261153520 by haoyuzhang <haoyuzhang@google.com>: Internal change
* 261140302 by hongkuny <hongkuny@google.com>: Clean up.
* 260862396 by A. Unique TensorFlower <gardener@tensorflow.org>: Fix BERT pretraining input pipeline to shuffle and shard the dataset properly for multi-worker training.
* 260601376 by hongkuny <hongkuny@google.com>: Reorder Q, K to make TPU faster.
* 260580119 by hongkuny <hongkuny@google.com>: Adds expect_partial().
* 260228553 by priyag <priyag@google.com>: Enable transformer and NCF official model tests. Also fix some minor issues so that all tests pass with TF 1 + enable_v2_behavior.
* 260060237 by zongweiz <zongweiz@google.com>: [BERT SQuAD] Enable mixed precision training using the experimental Keras mixed precision API. For numeric stability, use fp32 for layer normalization, dense layers with GELU activation, etc.
* 260052674 by hongkuny <hongkuny@google.com>: Add expect_partial().
* 259889221 by hongkuny <hongkuny@google.com>: Add no-DS / XLA / eager PerfZero tests.
* 259790197 by hongkuny <hongkuny@google.com>: Update pretraining model to match TF1 variable names.
* 259656389 by hongkuny <hongkuny@google.com>: Internal change
* 259649972 by hongkuny <hongkuny@google.com>: Update docs.
* 259470074 by hongkuny <hongkuny@google.com>: Adds a dedup phase for trainable variables.
* 259442882 by hongkuny <hongkuny@google.com>: Internal change
* 259341546 by mrry <mrry@google.com>: Remove DEBUG-level logging from the BERT benchmark; it triggers graph serialization and other verbose logging in the TensorFlow runtime, which inflates the execution time.
* 259253185 by hongkuny <hongkuny@google.com>: Write a separate checkpoint for the core model in pretraining; clean up export utils to just take a model as argument.
* 258893811 by hongkuny <hongkuny@google.com>: Adds summaries for metrics, allowing metrics inside keras.model.
* 258881002 by hongkuny <hongkuny@google.com>: Fix lint.
* 258871624 by hongkuny <hongkuny@google.com>: Internal change
* 258597234 by rxsang <rxsang@google.com>: Update all the TPUStrategy examples to use the new v2 APIs: make_dataset_iterator -> experimental_distribute_dataset, make_input_fn_iterator -> experimental_distribute_datasets_from_function, unwrap -> experimental_local_results, experimental_run -> experimental_run_v2.
* 258581998 by taylorrobie <taylorrobie@google.com>: Update Keras v2 optimizers to reuse coefficients which are shared across all updates, reducing the total number of ops created by between 5% (for simple optimizers such as SGD and Adagrad) and 25% (for complicated optimizers such as Adam and NAdam). Separate copies are made for each device and dtype. The effect on run time is fairly minimal, since Grappler is expected to consolidate most of these ops; however, it does improve graph construction time.
* 258208153 by hongkuny <hongkuny@google.com>: Adds run_eagerly option for BERT.
* 257883986 by hongkuny <hongkuny@google.com>: Adds tf.summary for BERT training.
* 257285772 by haoyuzhang <haoyuzhang@google.com>: Internal change
* 256242827 by yuefengz <yuefengz@google.com>: Internal change
* 256204636 by hongkuny <hongkuny@google.com>: Internal change
* 256079834 by hongkuny <hongkuny@google.com>: Clean up: move common flags together for further refactoring; enable the steps_per_loop option for all applications.
* 255493073 by hongkuny <hongkuny@google.com>: BERT initial OSS README update.
* 255470372 by dmchen <dmchen@google.com>: Slightly expand the expected range for the F1 score in the BERT SQuAD accuracy test.
* 255109240 by hongkuny <hongkuny@google.com>: Update eval/predict batch sizes.
* 255010016 by hongkuny <hongkuny@google.com>: Internal change
* 254874613 by hongkuny <hongkuny@google.com>: Update GLUE tasks enum to match directory name.
* 254866171 by taylorrobie <taylorrobie@google.com>: Internal change
* 254785517 by zongweiz <zongweiz@google.com>: Use train_single_step for BERT GPU models to temporarily work around some performance bugs in GPU runs.
* 254497647 by hongkuny <hongkuny@google.com>: Fix device placement for TPU export model.
* 254293763 by haoyuzhang <haoyuzhang@google.com>: Internal change
* 254134531 by yuefengz <yuefengz@google.com>: Fix a typo in bert_benchmark.py.
* 254069984 by hongkuny <hongkuny@google.com>: Automated rollback of changelist 254060732.
* 254061429 by hongkuny <hongkuny@google.com>: Use host while loop for training steps.
* 254060732 by yifeif <yifeif@google.com>: Automated rollback of changelist 254027750.
* 254027750 by hongkuny <hongkuny@google.com>: Internal change
* 253850824 by hongkuny <hongkuny@google.com>: Improve BERT training utils.
* 253818191 by hongkuny <hongkuny@google.com>: Update SavedModel export to use the new model.save() API.
* 253636854 by dmchen <dmchen@google.com>: Run only training in the BERT SQuAD performance test.
* 253118910 by hongkuny <hongkuny@google.com>: Internal change
* 253113801 by zongweiz <zongweiz@google.com>: Internal change
* 252697519 by dmchen <dmchen@google.com>: BERT SQuAD accuracy test.
* 252663512 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change
* 252647871 by A. Unique TensorFlower <gardener@tensorflow.org>: Enable multi-worker TPU training for BERT pretraining.
* 252550871 by hongkuny <hongkuny@google.com>: Internal change
* 252522861 by hongkuny <hongkuny@google.com>: Remove export using trained model due to an implementation error.
* 252156812 by yuefengz <yuefengz@google.com>: Fix the callback method name in BERT: replaced on_batch_start with on_batch_begin. Without the fix, it won't work with Keras callbacks.
* 251782065 by dmchen <dmchen@google.com>: Internal change
* 251681245 by hongkuny <hongkuny@google.com>: Update BERT to use the new tf.distribute APIs.
* 251575972 by A. Unique TensorFlower <gardener@tensorflow.org>: Remove `steps_per_run` when instantiating TPUStrategy.
* 251325964 by hongkuny <hongkuny@google.com>: Improve flags.
* 251303452 by haoyuzhang <haoyuzhang@google.com>: Internal change
* 250942274 by tobyboyd <tobyboyd@google.com>: Internal change
* 250779087 by A. Unique TensorFlower <gardener@tensorflow.org>: Reduce BERT PerfZero benchmark test training steps.
* 250713045 by hongkuny <hongkuny@google.com>: TPU util.
* 250606180 by A. Unique TensorFlower <gardener@tensorflow.org>: Fix BERT benchmark test errors.
* 250589623 by A. Unique TensorFlower <gardener@tensorflow.org>: Change BERT benchmark test pretrained checkpoint URL.
* 250587892 by A. Unique TensorFlower <gardener@tensorflow.org>: Fix error in BERT custom training loop checkpoint restoration.
* 250577163 by A. Unique TensorFlower <gardener@tensorflow.org>: Add logic to inject a callback that measures performance in the BERT custom training loop.
* 250529526 by hongkuny <hongkuny@google.com>: Internal clean up.
* 250428976 by hongkuny <hongkuny@google.com>: Internal change
* 250415383 by A. Unique TensorFlower <gardener@tensorflow.org>: Add min/max value to BERT classifier benchmark test.
* 250376246 by A. Unique TensorFlower <gardener@tensorflow.org>: Add benchmark performance test to run BERT on multiple numbers of GPUs.
* 250347237 by A. Unique TensorFlower <gardener@tensorflow.org>: Fix linting errors in BERT benchmark test.
* 250326131 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change
* 250315593 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change
* 250303528 by haoyuzhang <haoyuzhang@google.com>: Add method docstring to fix lint error.
* 250009207 by A. Unique TensorFlower <gardener@tensorflow.org>: Add feature in BERT to write training metrics to a summary file.
* 249896208 by hongkuny <hongkuny@google.com>: Adds __init__.py.
* 249883771 by hongkuny <hongkuny@google.com>: Creates a benchmark dir.
* 249580533 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change
* 249566870 by A. Unique TensorFlower <gardener@tensorflow.org>: Set up BERT benchmark test.
* 249500988 by hongkuny <hongkuny@google.com>: Lints.
* 249377254 by hongkuny <hongkuny@google.com>: Internal change
* 249373328 by hongkuny <hongkuny@google.com>: Clean up TF import.
* 249333938 by hongkuny <hongkuny@google.com>: Fix TF1 import.
* 249325089 by hongkuny <hongkuny@google.com>: BERT 2.0.
* 249195008 by tianlin <tianlin@google.com>: Internal change
* 249173564 by hongkuny <hongkuny@google.com>: Internal change
* Internal changes: 246677582 (haoyuzhang), 245821839 (shiningsun), 245353681 (gjn), 245340898 (haoyuzhang), 245155641 (haoyuzhang), 244019160 (haoyuzhang), 242930998 (shiningsun), 242049350 (haoyuzhang), 241663771 (haoyuzhang), 241054800 (haoyuzhang), 241028555 (yuefengz), 239316550 (haoyuzhang), 238251867 (haoyuzhang), 237876559 (taylorrobie), 236346619 (haoyuzhang), 236182665 (tayo), 234652747 (wangtz), 233837502 (shiningsun), 232033015 (shiningsun), 228564809 (taylorrobie), 227052580 (shiningsun), 225436264 (shiningsun), 222283824 (taylorrobie), 219241224 (taylorrobie), 218774474 (A. Unique TensorFlower), 218610966 (taylorrobie), 218576353 (taylorrobie), 217776707 (A. Unique TensorFlower), 217749789 (A. Unique TensorFlower), 214516790 (A. Unique TensorFlower), 212339556 (A. Unique TensorFlower), 210658133 (A. Unique TensorFlower), 206866123 (taylorrobie), 205252141 (A. Unique TensorFlower), 202519641 (scottzhu), 201299684 (kathywu), 199655516 (karmel), 199209802 (karmel), 198089630 (karmel).
* 198060863 by karmel <karmel@google.com>: Automated rollback of changelist 197920496.
* Internal changes: 197920496 (kathywu), 197841416 (A. Unique TensorFlower), 195867348 (A. Unique TensorFlower), 195725348 (taylorrobie), 195283704 (A. Unique TensorFlower), 194662698 (A. Unique TensorFlower), 194103064 (A. Unique TensorFlower), 193581866 (A. Unique TensorFlower).
* 192783651 by scottzhu <scottzhu@google.com>: Automated rollback of changelist 192714881.
* 192714881 by scottzhu <scottzhu@google.com>: Automated rollback of changelist 192710755.
* Internal changes: 192710755 (A. Unique TensorFlower), 192374551 (A. Unique TensorFlower), 192346754 (A. Unique TensorFlower), 192298443 (karmel), 192220576 (A. Unique TensorFlower), 191514106 (scottzhu), 191327699 (A. Unique TensorFlower), 190938103 (karmel), 190804388 (A. Unique TensorFlower), 190479716 (karmel).
* 189844661 by scottzhu <scottzhu@google.com>: Automated rollback of changelist 189816818.
* Internal changes: 189816818 (A. Unique TensorFlower), 189639056 (A. Unique TensorFlower), 189628781 (karmel), 189267175 (karmel), 189096159 (karmel), 189085341 (karmel), 188949700 (karmel).

PiperOrigin-RevId: 262962783
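A recurring theme in the changelists above (e.g. 258597234 and 251681245) is migrating callers from the TF 1.x DistributionStrategy method names to their TF 2.x equivalents. The rename pairs listed in changelist 258597234 can be captured as a plain lookup table; this sketch is for reference only, and the `migrate_call` helper is hypothetical, not part of the repository:

```python
# Deprecated tf.distribute Strategy methods and their v2 replacements,
# as listed in changelist 258597234.
DISTRIBUTE_API_RENAMES = {
    "make_dataset_iterator": "experimental_distribute_dataset",
    "make_input_fn_iterator": "experimental_distribute_datasets_from_function",
    "unwrap": "experimental_local_results",
    "experimental_run": "experimental_run_v2",
}


def migrate_call(method_name: str) -> str:
    """Return the v2 name for a deprecated v1 method, or the name unchanged."""
    return DISTRIBUTE_API_RENAMES.get(method_name, method_name)


print(migrate_call("unwrap"))  # -> experimental_local_results
```

The table only records the renames named in the commit; whether a given method actually exists depends on the TensorFlow version in use.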

- 10 Aug, 2019 1 commit
Nimit Nigania authored

- 08 Aug, 2019 1 commit
Nimit Nigania authored

- 05 Aug, 2019 1 commit
Toby Boyd authored

- 21 Jul, 2019 1 commit
Zongwei Zhou authored

- 20 Jul, 2019 1 commit
Toby Boyd authored

- 19 Jul, 2019 2 commits
guptapriya authored
The current approach checks for the presence of contrib. Sometimes this is not sufficient (e.g. when testing TF 1 + enable_v2_behavior=True, which is what internal tests currently do).
- 18 Jul, 2019 1 commit
Toby Boyd authored
* Added benchmarks and common flags.
* Add cpu tests.
* Add tracking epoch times.
* Fix transformer.
* Add examples_per_second.
* Fix pylint.

- 11 Jul, 2019 1 commit
Toby Boyd authored
* Move to global_step.
* Hook to use global_step.
* Fix comment: start at step 1, not step 0.
* Remove hack used for testing.
* Add docstring.

- 03 Jul, 2019 1 commit
Toby Boyd authored
* Fix unit test failures.
* 96% of TF 2.0 tests on GPU are passing.
* Currently all passing GPU and CPU TF 2.0.
* Address code comments.
* Use TF 2.0 cast.
* Comment about working on TF 2.0 CPU.
* Use contrib turn-off for TF 2.0.
* Fix wide_deep and add keras_common_tests.
* Use context to get num_gpus.
* Switch to tf.keras.metrics.

- 02 Jul, 2019 1 commit
Yuefeng Zhou authored
when there are multiple workers.

- 19 Jun, 2019 1 commit
Toby Boyd authored
* Set default steps to 300K.
* Log flags to PerfZero.
* Add XLA support to transformer:
  - Moved config logic to keras_utils.
  - Added enable_xla flag to _performance flags.
  - Did not refactor the enable_xla flag from keras resnet due to reliance on calling FLAGS in estimator keras; that is a needed refactor for another time.
* Fix g3 lint complaint.
* Refactor set config into keras_utils.
* Move flags out of main.
* Pipe through enable_xla.
* Update official/transformer/v2/misc.py

Co-Authored-By: Reed <reedwm@google.com>
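The enable_xla flag wiring described above can be sketched with stdlib `argparse`. The real repository defines its flags through its own flags module, so the function name and registration below are illustrative assumptions, not the repository's code:

```python
import argparse


def define_performance_flags(parser: argparse.ArgumentParser) -> None:
    """Register performance-related flags; mirrors the enable_xla addition.

    Hypothetical helper: the actual project uses its own flags module rather
    than argparse.
    """
    parser.add_argument(
        "--enable_xla",
        action="store_true",
        help="If set, compile the model with XLA.")


parser = argparse.ArgumentParser()
define_performance_flags(parser)
args = parser.parse_args(["--enable_xla"])
print(args.enable_xla)  # -> True
```

Downstream code would then branch on `args.enable_xla` to turn JIT compilation on before building the model.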

- 24 May, 2019 1 commit
Toby Boyd authored

- 29 Apr, 2019 1 commit
Igor authored
Replace per_device with per_replica and PerDevice with PerReplica, because the PerDevice concept was renamed and no longer exists. (#6693)
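Since the rename is purely mechanical, the substitution it describes can be sketched in a few lines; the `apply_rename` helper is hypothetical and only illustrates the two spellings the commit rewrites:

```python
def apply_rename(source: str) -> str:
    """Apply the #6693 rename: both snake_case and CamelCase spellings of
    PerDevice become their PerReplica equivalents."""
    return (source.replace("per_device", "per_replica")
                  .replace("PerDevice", "PerReplica"))


print(apply_rename("per_device_batch_size"))  # -> per_replica_batch_size
```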

- 26 Apr, 2019 1 commit
Ayush Dubey authored
* Add num_packs flag for MirroredStrategy's cross device ops.
* Fix parens.
* Fix lint errors and make all_reduce_alg more robust.
* Set default num_packs to 1.

- 25 Apr, 2019 1 commit
Ayush Dubey authored
* Remove contrib AllReduceCrossDeviceOps and update all_reduce_alg options with MirroredStrategy.
* Cleanup.

- 24 Apr, 2019 1 commit
Yuefeng Zhou authored

- 11 Apr, 2019 1 commit
rxsang authored
* Make BatchTimestamp object printable.
* Remove trailing whitespace.
* Make BatchTimestamp repr a string.

- 08 Apr, 2019 1 commit
Shining Sun authored
* Add DS support for NCF.
* Remove comments for in_top_k.
* Avoid expanding the input layers.
* Resolve comments and fix lint.
* Added some comments in code and fix lint.
* Fix lint.
* Add some documentation.
* Add tensorflow imports.

- 01 Apr, 2019 1 commit
Haoyu Zhang authored

- 29 Mar, 2019 1 commit
Shining Sun authored

- 28 Mar, 2019 1 commit
Shining Sun authored
* Initial commit.
* Bug fix.
* Move build_stats from common to keras main, because it is only applicable in keras.
* Remove trailing blank line.
* Add test for synth data.
* Add kwargs to init.
* Add kwargs to function invocation.
* Correctly pass kwargs.
* Debug.
* Debug.
* Debug.
* Fix super init.
* Bug fix.
* Fix local_flags.
* Fix import.
* Bug fix.
* Fix log_steps flag.
* Bug fix.
* Bug fix: add missing return value.
* Resolve double-defined flags.
* Lint fix.
* Move log_steps flag to benchmark flag.
* Fix lint.
* Lint fix.
* Lint fix.
* Try flag core default values.
* Bug fix.
* Bug fix.
* Bug fix.
* Debug.
* Debug.
* Remove debug prints.
* Rename benchmark methods.
* Flag bug fix for synth benchmark.

- 19 Mar, 2019 1 commit
Soroush Radpour authored

- 07 Mar, 2019 1 commit
Ayush Dubey authored
* s/CollectiveAllReduceStrategy/MultiWorkerMirroredStrategy
* More s/contrib.distribute/distribute.experimental
* Collective communication options in MultiWorkerMirroredStrategy.
* Minor fixes.
* No checkpointing if multi worker.
* Turn off checkpointing.
* Fix lint.

- 02 Mar, 2019 1 commit
Taylor Robie authored
* Fix resnet breakage and add keras end-to-end tests.
* Delint.
* Address PR comments.

- 01 Mar, 2019 1 commit
Shining Sun authored
* Tmp commit.
* Tmp commit.
* First attempt (without eval).
* Bug fixes.
* Bug fixes.
* Training done.
* Loss NaN, no eval.
* Loss weight problem solved.
* Resolve the NaN loss problem.
* Problem solved; clean up needed.
* Added a todo.
* Remove debug prints.
* Extract get_optimizer to ncf_common.
* Move metrics computation back to neumf; use DS.scope API.
* Extract DS.scope code to utils.
* Lint fixes.
* Move obtaining DS above producer.start to avoid race condition.
* Move pt 1.
* Move pt 2.
* Update the run script.
* Wrap keras_model related code into functions.
* Update the doc for softmax_logitfy and change the method name.
* Resolve PR comments.
* Working version with: eager, DS, batch and no masks.
* Remove git conflict indicator.
* Move reshape to neumf_model.
* Working version, not converged.
* Converged.
* Fix a test.
* More lint fixes.
* More lint fixes.
* More lint fixes.
* More lint fixes.
* Removed unused imports.
* Fix test.
* Dummy commit for kicking off checks.
* Fix lint issue.
* Dummy input to kick off checks.
* Dummy input to kick off checks.
* Add collective to dist strat.
* Addressed review comments.
* Add a doc string.

- 28 Feb, 2019 2 commits
Ayush Dubey authored
* s/CollectiveAllReduceStrategy/MultiWorkerMirroredStrategy
* More s/contrib.distribute/distribute.experimental
Tayo Oguntebi authored

- 21 Feb, 2019 1 commit
Ayush Dubey authored
* Update official resnet for multi-worker training with distribution strategies.
* Fixes for multi-worker training.
* Fix call to `get_distribution_strategy`.
* Undo test change.
* Fix spacing.
* Move cluster configuration to distribution_utils.
* Move train_and_evaluate out of loop. Also, update docstrings for multi-worker flags and add use_train_and_evaluate flag.
* Update distribution_strategy flag to match exported name for collective strategy.

- 14 Feb, 2019 1 commit
Toby Boyd authored
* One device from contrib to core.
* Remove test code.

- 13 Feb, 2019 1 commit
Yuefeng Zhou authored
* Add a flag to specify distribution strategies.
* Fix a small error.
* Address comments.
* Address comments.
* Fix typos.

- 12 Feb, 2019 1 commit
Toby Boyd authored
* Remove contrib thread pool.
* Remove commented-out contrib import.
* Fix lint issues.
* Move tf.data.options higher; tweak line breaks.
* Do not monkey-patch on or off if dist_strat is off.
* Do not monkey-patch if no_dist_strat.
* Fix file permissions.
* Fix file permissions.
* Revert change to main; add hasattr(tf, 'contrib') to utils.
* compat.v1.logging.
* tf.compat.v1.get_local_variables.

- 11 Feb, 2019 1 commit
Toby Boyd authored
* Remove contrib thread pool.
* Remove commented-out contrib import.
* Fix lint issues.
* Move tf.data.options higher; tweak line breaks.

- 09 Feb, 2019 1 commit
Yuefeng Zhou authored
* Add pure synthetic data to keras resnet mode.
* Add imports.
* Address comments.
* Update comment.
* Undo set up synthetic data for real data path.
* Update comment.
* Address comment.
* Remove trailing whitespaces.
* s/make_data_set_iterator/make_dataset_iterator/
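The "pure synthetic data" mode mentioned above sidesteps input-pipeline cost by fabricating one batch and reusing it forever. Below is a framework-free toy sketch of that idea; the function name, shapes, and defaults are illustrative and not taken from the repository:

```python
import itertools
import random


def synthetic_batches(batch_size=4, num_features=8, num_classes=10):
    """Build one fabricated (features, labels) batch and yield it forever,
    so the input pipeline costs nothing after the first batch is built.

    Hypothetical sketch; the real code builds synthetic tensors for the
    Keras resnet input pipeline instead of Python lists.
    """
    features = [[random.random() for _ in range(num_features)]
                for _ in range(batch_size)]
    labels = [random.randrange(num_classes) for _ in range(batch_size)]
    return itertools.repeat((features, labels))


batches = synthetic_batches()
first = next(batches)
second = next(batches)
print(first is second)  # -> True: the same batch object is reused
```

Because the batch never changes, throughput measured against it isolates model and accelerator speed from input-pipeline speed.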

- 08 Feb, 2019 1 commit
Goldie Gadde authored
This reverts commit 57e07520.