- 19 Aug, 2019 1 commit
Ayush Dubey authored
PiperOrigin-RevId: 264244022

- 16 Aug, 2019 2 commits
Hongkun Yu authored
PiperOrigin-RevId: 263863438
Priya Gupta authored
PiperOrigin-RevId: 263854996

- 12 Aug, 2019 2 commits
Hongjun Choi authored
* 262988559 by A. Unique TensorFlower <gardener@tensorflow.org>: Enable NCF TF 2.0 model to run on TPUStrategy.
* 262971756 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change
* 262967691 by hongkuny <hongkuny@google.com>: Internal change

PiperOrigin-RevId: 262988559
Hongkun Yu authored
* 262962783 by hongkuny <hongkuny@google.com>: Internal change
* 262460803 by hongkuny <hongkuny@google.com>: Add a public method to extract shareable layers with decoder.
* 262315011 by A. Unique TensorFlower <gardener@tensorflow.org>: Refactor TPU initialization logic to common module.
* 262299019 by akuegel <akuegel@google.com>: Internal change
* 262178259 by hongkuny <hongkuny@google.com>: We should call training=True in CTL train step.
* 262081759 by akuegel <akuegel@google.com>: Internal change
* 262021128 by isaprykin <isaprykin@google.com>: Internal change
* 262004398 by taylorrobie <taylorrobie@google.com>: Internal change
* 261786323 by yanhuasun <yanhuasun@google.com>: Replace set/dict with ObjectIdentityDict/Set to prepare for eq implementation.
* 261393597 by hongkuny <hongkuny@google.com>: Add an encoder mode for BertModel which returns all layers.
* 261218818 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change
* 261202754 by hongkuny <hongkuny@google.com>: Use enable_xla flag for classifier and SQuAD, so the XLA option is exposed to users.
* 261171038 by gjn <gjn@google.com>: Remove the weight_decay_rate 0 early-exit check. Removing this code path should be fine since it was not doing what it meant to do: weight_decay_rate is actually a tensor, so the equality check only compared the object's id to 0, which should never be true, and evaluating a tensor is not what we want at this point in the code.
* 261169862 by haoyuzhang <haoyuzhang@google.com>: Internal change
* 261153520 by haoyuzhang <haoyuzhang@google.com>: Internal change
* 261140302 by hongkuny <hongkuny@google.com>: Clean up.
* 260862396 by A. Unique TensorFlower <gardener@tensorflow.org>: Fix BERT pretraining input pipeline to shuffle and shard the dataset properly for multi-worker training.
* 260601376 by hongkuny <hongkuny@google.com>: Reorder Q, K to make TPU faster.
* 260580119 by hongkuny <hongkuny@google.com>: Adds expect_partial().
* 260228553 by priyag <priyag@google.com>: Enable transformer and NCF official model tests. Also fix some minor issues so that all tests pass with TF 1 + enable_v2_behavior.
* 260060237 by zongweiz <zongweiz@google.com>: [BERT SQuAD] Enable mixed precision training using the experimental Keras mixed precision API. For numeric stability, use fp32 for layer normalization, dense layers with GELU activation, etc.
* 260052674 by hongkuny <hongkuny@google.com>: Add expect_partial().
* 259889221 by hongkuny <hongkuny@google.com>: Add no-DS / XLA / eager PerfZero tests.
* 259790197 by hongkuny <hongkuny@google.com>: Update pretraining model to match TF1 variable names.
* 259656389 by hongkuny <hongkuny@google.com>: Internal change
* 259649972 by hongkuny <hongkuny@google.com>: Update docs.
* 259470074 by hongkuny <hongkuny@google.com>: Adds a dedup phase for trainable variables.
* 259442882 by hongkuny <hongkuny@google.com>: Internal change
* 259341546 by mrry <mrry@google.com>: Remove DEBUG-level logging from the BERT benchmark; it triggers graph serialization and other verbose logging in the TensorFlow runtime, which inflates the execution time.
* 259253185 by hongkuny <hongkuny@google.com>: Write a separate checkpoint for the core model in pretraining; clean up export utils to just take a model as argument.
* 258893811 by hongkuny <hongkuny@google.com>: Adds summaries for metrics, allowing metrics inside keras.model.
* 258881002 by hongkuny <hongkuny@google.com>: Fix lint.
* 258871624 by hongkuny <hongkuny@google.com>: Internal change
* 258597234 by rxsang <rxsang@google.com>: Update all the TPUStrategy examples to use the new v2 APIs: make_dataset_iterator -> experimental_distribute_dataset, make_input_fn_iterator -> experimental_distribute_datasets_from_function, unwrap -> experimental_local_results, experimental_run -> experimental_run_v2.
* 258581998 by taylorrobie <taylorrobie@google.com>: Update Keras v2 optimizers to reuse coefficients which are shared across all updates, reducing the total number of ops created by between 5% (for simple optimizers such as SGD and Adagrad) and 25% (for complicated optimizers such as Adam and NAdam). Separate copies are made for each device and dtype. The effect on run time is fairly minimal, since Grappler is expected to consolidate most of these ops; however, it does improve graph construction time.
* 258208153 by hongkuny <hongkuny@google.com>: Adds run_eagerly option for BERT.
* 257883986 by hongkuny <hongkuny@google.com>: Adds tf.summary for BERT training.
* 257285772 by haoyuzhang <haoyuzhang@google.com>: Internal change
* 256242827 by yuefengz <yuefengz@google.com>: Internal change
* 256204636 by hongkuny <hongkuny@google.com>: Internal change
* 256079834 by hongkuny <hongkuny@google.com>: Clean up: move common flags together for further refactoring; enable the steps_per_loop option for all applications.
* 255493073 by hongkuny <hongkuny@google.com>: BERT initial OSS README update.
* 255470372 by dmchen <dmchen@google.com>: Slightly expand the expected range for the F1 score in the BERT SQuAD accuracy test.
* 255109240 by hongkuny <hongkuny@google.com>: Update eval/predict batch sizes.
* 255010016 by hongkuny <hongkuny@google.com>: Internal change
* 254874613 by hongkuny <hongkuny@google.com>: Update GLUE tasks enum to match directory name.
* 254866171 by taylorrobie <taylorrobie@google.com>: Internal change
* 254785517 by zongweiz <zongweiz@google.com>: Use train_single_step for BERT GPU models to temporarily work around some performance bugs in GPU runs.
* 254497647 by hongkuny <hongkuny@google.com>: Fix device placement for TPU export model.
* 254293763 by haoyuzhang <haoyuzhang@google.com>: Internal change
* 254134531 by yuefengz <yuefengz@google.com>: Fix a typo in bert_benchmark.py.
* 254069984 by hongkuny <hongkuny@google.com>: Automated rollback of changelist 254060732.
* 254061429 by hongkuny <hongkuny@google.com>: Use host while loop for training steps.
* 254060732 by yifeif <yifeif@google.com>: Automated rollback of changelist 254027750.
* 254027750 by hongkuny <hongkuny@google.com>: Internal change
* 253850824 by hongkuny <hongkuny@google.com>: Improve BERT training utils.
* 253818191 by hongkuny <hongkuny@google.com>: Update SavedModel export to use the new model.save() API.
* 253636854 by dmchen <dmchen@google.com>: Run only training in the BERT SQuAD performance test.
* 253118910 by hongkuny <hongkuny@google.com>: Internal change
* 253113801 by zongweiz <zongweiz@google.com>: Internal change
* 252697519 by dmchen <dmchen@google.com>: BERT SQuAD accuracy test.
* 252663512 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change
* 252647871 by A. Unique TensorFlower <gardener@tensorflow.org>: Enable multi-worker TPU training for BERT pretraining.
* 252550871 by hongkuny <hongkuny@google.com>: Internal change
* 252522861 by hongkuny <hongkuny@google.com>: Remove export using trained model due to an implementation error.
* 252156812 by yuefengz <yuefengz@google.com>: Fix the callback method name in BERT: replaced on_batch_start with on_batch_begin. Without the fix, it won't work with Keras callbacks.
* 251782065 by dmchen <dmchen@google.com>: Internal change
* 251681245 by hongkuny <hongkuny@google.com>: Update BERT to use the new tf.distribute APIs.
* 251575972 by A. Unique TensorFlower <gardener@tensorflow.org>: Remove `steps_per_run` when instantiating TPUStrategy.
* 251325964 by hongkuny <hongkuny@google.com>: Improve flags.
* 251303452 by haoyuzhang <haoyuzhang@google.com>: Internal change
* 250942274 by tobyboyd <tobyboyd@google.com>: Internal change
* 250779087 by A. Unique TensorFlower <gardener@tensorflow.org>: Reduce BERT PerfZero benchmark test training steps.
* 250713045 by hongkuny <hongkuny@google.com>: TPU util.
* 250606180 by A. Unique TensorFlower <gardener@tensorflow.org>: Fix BERT benchmark test errors.
* 250589623 by A. Unique TensorFlower <gardener@tensorflow.org>: Change BERT benchmark test pretrained checkpoint URL.
* 250587892 by A. Unique TensorFlower <gardener@tensorflow.org>: Fix error in BERT custom training loop checkpoint restoration.
* 250577163 by A. Unique TensorFlower <gardener@tensorflow.org>: Add logic to inject a callback that measures performance in the BERT custom training loop.
* 250529526 by hongkuny <hongkuny@google.com>: Internal clean up.
* 250428976 by hongkuny <hongkuny@google.com>: Internal change
* 250415383 by A. Unique TensorFlower <gardener@tensorflow.org>: Add min/max value to BERT classifier benchmark test.
* 250376246 by A. Unique TensorFlower <gardener@tensorflow.org>: Add benchmark performance test to run BERT on multiple numbers of GPUs.
* 250347237 by A. Unique TensorFlower <gardener@tensorflow.org>: Fix linting errors in BERT benchmark test.
* 250326131 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change
* 250315593 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change
* 250303528 by haoyuzhang <haoyuzhang@google.com>: Add method docstring to fix lint error.
* 250009207 by A. Unique TensorFlower <gardener@tensorflow.org>: Add feature in BERT to write training metrics to a summary file.
* 249896208 by hongkuny <hongkuny@google.com>: Adds __init__.py.
* 249883771 by hongkuny <hongkuny@google.com>: Creates a benchmark dir.
* 249580533 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change
* 249566870 by A. Unique TensorFlower <gardener@tensorflow.org>: Set up BERT benchmark test.
* 249500988 by hongkuny <hongkuny@google.com>: Lints.
* 249377254 by hongkuny <hongkuny@google.com>: Internal change
* 249373328 by hongkuny <hongkuny@google.com>: Clean up TF import.
* 249333938 by hongkuny <hongkuny@google.com>: Fix TF1 import.
* 249325089 by hongkuny <hongkuny@google.com>: BERT 2.0.
* 249195008 by tianlin <tianlin@google.com>: Internal change
* 249173564 by hongkuny <hongkuny@google.com>: Internal change
* Internal changes: 246677582 (haoyuzhang), 245821839 (shiningsun), 245353681 (gjn), 245340898 (haoyuzhang), 245155641 (haoyuzhang), 244019160 (haoyuzhang), 242930998 (shiningsun), 242049350 (haoyuzhang), 241663771 (haoyuzhang), 241054800 (haoyuzhang), 241028555 (yuefengz), 239316550 (haoyuzhang), 238251867 (haoyuzhang), 237876559 (taylorrobie), 236346619 (haoyuzhang), 236182665 (tayo), 234652747 (wangtz), 233837502 (shiningsun), 232033015 (shiningsun), 228564809 (taylorrobie), 227052580 (shiningsun), 225436264 (shiningsun), 222283824 (taylorrobie), 219241224 (taylorrobie), 218774474 (A. Unique TensorFlower), 218610966 (taylorrobie), 218576353 (taylorrobie), 217776707 (A. Unique TensorFlower), 217749789 (A. Unique TensorFlower), 214516790 (A. Unique TensorFlower), 212339556 (A. Unique TensorFlower), 210658133 (A. Unique TensorFlower), 206866123 (taylorrobie), 205252141 (A. Unique TensorFlower), 202519641 (scottzhu), 201299684 (kathywu), 199655516 (karmel), 199209802 (karmel), 198089630 (karmel).
* 198060863 by karmel <karmel@google.com>: Automated rollback of changelist 197920496.
* Internal changes: 197920496 (kathywu), 197841416 (A. Unique TensorFlower), 195867348 (A. Unique TensorFlower), 195725348 (taylorrobie), 195283704 (A. Unique TensorFlower), 194662698 (A. Unique TensorFlower), 194103064 (A. Unique TensorFlower), 193581866 (A. Unique TensorFlower).
* 192783651 by scottzhu <scottzhu@google.com>: Automated rollback of changelist 192714881.
* 192714881 by scottzhu <scottzhu@google.com>: Automated rollback of changelist 192710755.
* Internal changes: 192710755 (A. Unique TensorFlower), 192374551 (A. Unique TensorFlower), 192346754 (A. Unique TensorFlower), 192298443 (karmel), 192220576 (A. Unique TensorFlower), 191514106 (scottzhu), 191327699 (A. Unique TensorFlower), 190938103 (karmel), 190804388 (A. Unique TensorFlower), 190479716 (karmel).
* 189844661 by scottzhu <scottzhu@google.com>: Automated rollback of changelist 189816818.
* Internal changes: 189816818 (A. Unique TensorFlower), 189639056 (A. Unique TensorFlower), 189628781 (karmel), 189267175 (karmel), 189096159 (karmel), 189085341 (karmel), 188949700 (karmel).

PiperOrigin-RevId: 262962783
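A recurring theme in the changelists above (e.g. 258597234 and 251681245) is migrating callers from the TF 1.x DistributionStrategy method names to their TF 2.x equivalents. The rename pairs listed in changelist 258597234 can be captured as a plain lookup table; this sketch is for reference only, and the `migrate_call` helper is hypothetical, not part of the repository:

```python
# Deprecated tf.distribute Strategy methods and their v2 replacements,
# as listed in changelist 258597234.
DISTRIBUTE_API_RENAMES = {
    "make_dataset_iterator": "experimental_distribute_dataset",
    "make_input_fn_iterator": "experimental_distribute_datasets_from_function",
    "unwrap": "experimental_local_results",
    "experimental_run": "experimental_run_v2",
}


def migrate_call(method_name: str) -> str:
    """Return the v2 name for a deprecated v1 method, or the name unchanged."""
    return DISTRIBUTE_API_RENAMES.get(method_name, method_name)


print(migrate_call("unwrap"))  # -> experimental_local_results
```

The table only records the renames named in the commit; whether a given method actually exists depends on the TensorFlow version in use.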

- 10 Aug, 2019 1 commit
Nimit Nigania authored

- 08 Aug, 2019 1 commit
Nimit Nigania authored

- 05 Aug, 2019 1 commit
Toby Boyd authored

- 21 Jul, 2019 1 commit
Zongwei Zhou authored

- 20 Jul, 2019 1 commit
Toby Boyd authored

- 19 Jul, 2019 2 commits
guptapriya authored
The current approach checks for the presence of contrib. Sometimes this is not sufficient (e.g. when testing TF 1 + enable_v2_behavior=True, which is what internal tests currently do).
- 18 Jul, 2019 1 commit
Toby Boyd authored
* Added benchmarks and common flags.
* Add cpu tests.
* Add tracking epoch times.
* Fix transformer.
* Add examples_per_second.
* Fix pylint.

- 11 Jul, 2019 1 commit
Toby Boyd authored
* Move to global_step.
* Hook to use global_step.
* Fix comment: start at step 1, not step 0.
* Remove hack used for testing.
* Add docstring.

- 03 Jul, 2019 1 commit
Toby Boyd authored
* Fix unit test failures.
* 96% of TF 2.0 tests on GPU are passing.
* Currently all passing GPU and CPU TF 2.0.
* Address code comments.
* Use TF 2.0 cast.
* Comment about working on TF 2.0 CPU.
* Use contrib turn-off for TF 2.0.
* Fix wide_deep and add keras_common_tests.
* Use context to get num_gpus.
* Switch to tf.keras.metrics.

- 02 Jul, 2019 1 commit
Yuefeng Zhou authored
when there are multiple workers.

- 19 Jun, 2019 1 commit
Toby Boyd authored
* Set default steps to 300K.
* Log flags to PerfZero.
* Add XLA support to transformer:
  - Moved config logic to keras_utils.
  - Added enable_xla flag to _performance flags.
  - Did not refactor the enable_xla flag from keras resnet due to reliance on calling FLAGS in estimator keras; that is a needed refactor for another time.
* Fix g3 lint complaint.
* Refactor set config into keras_utils.
* Move flags out of main.
* Pipe through enable_xla.
* Update official/transformer/v2/misc.py

Co-Authored-By: Reed <reedwm@google.com>
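The enable_xla flag wiring described above can be sketched with stdlib `argparse`. The real repository defines its flags through its own flags module, so the function name and registration below are illustrative assumptions, not the repository's code:

```python
import argparse


def define_performance_flags(parser: argparse.ArgumentParser) -> None:
    """Register performance-related flags; mirrors the enable_xla addition.

    Hypothetical helper: the actual project uses its own flags module rather
    than argparse.
    """
    parser.add_argument(
        "--enable_xla",
        action="store_true",
        help="If set, compile the model with XLA.")


parser = argparse.ArgumentParser()
define_performance_flags(parser)
args = parser.parse_args(["--enable_xla"])
print(args.enable_xla)  # -> True
```

Downstream code would then branch on `args.enable_xla` to turn JIT compilation on before building the model.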

- 24 May, 2019 1 commit
Toby Boyd authored

- 29 Apr, 2019 1 commit
Igor authored
Replace per_device with per_replica and PerDevice with PerReplica, because the PerDevice concept was renamed and no longer exists. (#6693)
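Since the rename is purely mechanical, the substitution it describes can be sketched in a few lines; the `apply_rename` helper is hypothetical and only illustrates the two spellings the commit rewrites:

```python
def apply_rename(source: str) -> str:
    """Apply the #6693 rename: both snake_case and CamelCase spellings of
    PerDevice become their PerReplica equivalents."""
    return (source.replace("per_device", "per_replica")
                  .replace("PerDevice", "PerReplica"))


print(apply_rename("per_device_batch_size"))  # -> per_replica_batch_size
```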

- 26 Apr, 2019 1 commit
Ayush Dubey authored
* Add num_packs flag for MirroredStrategy's cross device ops.
* Fix parens.
* Fix lint errors and make all_reduce_alg more robust.
* Set default num_packs to 1.

- 25 Apr, 2019 1 commit
Ayush Dubey authored
* Remove contrib AllReduceCrossDeviceOps and update all_reduce_alg options with MirroredStrategy.
* Cleanup.

- 24 Apr, 2019 1 commit
Yuefeng Zhou authored

- 11 Apr, 2019 1 commit
rxsang authored
* Make BatchTimestamp object printable.
* Remove trailing whitespace.
* Make BatchTimestamp repr a string.

- 08 Apr, 2019 1 commit
Shining Sun authored
* Add DS support for NCF.
* Remove comments for in_top_k.
* Avoid expanding the input layers.
* Resolve comments and fix lint.
* Added some comments in code and fix lint.
* Fix lint.
* Add some documentation.
* Add tensorflow imports.

- 01 Apr, 2019 1 commit
Haoyu Zhang authored

- 29 Mar, 2019 1 commit
Shining Sun authored

- 28 Mar, 2019 1 commit
Shining Sun authored
* Initial commit.
* Bug fix.
* Move build_stats from common to keras main, because it is only applicable in keras.
* Remove trailing blank line.
* Add test for synth data.
* Add kwargs to init.
* Add kwargs to function invocation.
* Correctly pass kwargs.
* Debug.
* Debug.
* Debug.
* Fix super init.
* Bug fix.
* Fix local_flags.
* Fix import.
* Bug fix.
* Fix log_steps flag.
* Bug fix.
* Bug fix: add missing return value.
* Resolve double-defined flags.
* Lint fix.
* Move log_steps flag to benchmark flag.
* Fix lint.
* Lint fix.
* Lint fix.
* Try flag core default values.
* Bug fix.
* Bug fix.
* Bug fix.
* Debug.
* Debug.
* Remove debug prints.
* Rename benchmark methods.
* Flag bug fix for synth benchmark.

- 19 Mar, 2019 1 commit
Soroush Radpour authored

- 07 Mar, 2019 1 commit
Ayush Dubey authored
* s/CollectiveAllReduceStrategy/MultiWorkerMirroredStrategy
* More s/contrib.distribute/distribute.experimental
* Collective communication options in MultiWorkerMirroredStrategy.
* Minor fixes.
* No checkpointing if multi worker.
* Turn off checkpointing.
* Fix lint.

- 02 Mar, 2019 1 commit
Taylor Robie authored
* Fix resnet breakage and add keras end-to-end tests.
* Delint.
* Address PR comments.

- 01 Mar, 2019 1 commit
Shining Sun authored
* Tmp commit.
* Tmp commit.
* First attempt (without eval).
* Bug fixes.
* Bug fixes.
* Training done.
* Loss NaN, no eval.
* Loss weight problem solved.
* Resolve the NaN loss problem.
* Problem solved; clean up needed.
* Added a todo.
* Remove debug prints.
* Extract get_optimizer to ncf_common.
* Move metrics computation back to neumf; use DS.scope API.
* Extract DS.scope code to utils.
* Lint fixes.
* Move obtaining DS above producer.start to avoid race condition.
* Move pt 1.
* Move pt 2.
* Update the run script.
* Wrap keras_model related code into functions.
* Update the doc for softmax_logitfy and change the method name.
* Resolve PR comments.
* Working version with: eager, DS, batch and no masks.
* Remove git conflict indicator.
* Move reshape to neumf_model.
* Working version, not converged.
* Converged.
* Fix a test.
* More lint fixes.
* More lint fixes.
* More lint fixes.
* More lint fixes.
* Removed unused imports.
* Fix test.
* Dummy commit for kicking off checks.
* Fix lint issue.
* Dummy input to kick off checks.
* Dummy input to kick off checks.
* Add collective to dist strat.
* Addressed review comments.
* Add a doc string.

- 28 Feb, 2019 2 commits
Ayush Dubey authored
* s/CollectiveAllReduceStrategy/MultiWorkerMirroredStrategy
* More s/contrib.distribute/distribute.experimental
Tayo Oguntebi authored

- 21 Feb, 2019 1 commit
Ayush Dubey authored
* Update official resnet for multi-worker training with distribution strategies.
* Fixes for multi-worker training.
* Fix call to `get_distribution_strategy`.
* Undo test change.
* Fix spacing.
* Move cluster configuration to distribution_utils.
* Move train_and_evaluate out of loop. Also, update docstrings for multi-worker flags and add use_train_and_evaluate flag.
* Update distribution_strategy flag to match exported name for collective strategy.

- 14 Feb, 2019 1 commit
Toby Boyd authored
* One device from contrib to core.
* Remove test code.

- 13 Feb, 2019 1 commit
Yuefeng Zhou authored
* Add a flag to specify distribution strategies.
* Fix a small error.
* Address comments.
* Address comments.
* Fix typos.

- 12 Feb, 2019 1 commit
Toby Boyd authored
* Remove contrib thread pool.
* Remove commented-out contrib import.
* Fix lint issues.
* Move tf.data.options higher; tweak line breaks.
* Do not monkey-patch on or off if dist_strat is off.
* Do not monkey-patch if no_dist_strat.
* Fix file permissions.
* Fix file permissions.
* Revert change to main; add hasattr(tf, 'contrib') to utils.
* compat.v1.logging.
* tf.compat.v1.get_local_variables.

- 11 Feb, 2019 1 commit
Toby Boyd authored
* Remove contrib thread pool.
* Remove commented-out contrib import.
* Fix lint issues.
* Move tf.data.options higher; tweak line breaks.

- 09 Feb, 2019 1 commit
Yuefeng Zhou authored
* Add pure synthetic data to keras resnet mode.
* Add imports.
* Address comments.
* Update comment.
* Undo set up synthetic data for real data path.
* Update comment.
* Address comment.
* Remove trailing whitespaces.
* s/make_data_set_iterator/make_dataset_iterator/
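The "pure synthetic data" mode mentioned above sidesteps input-pipeline cost by fabricating one batch and reusing it forever. Below is a framework-free toy sketch of that idea; the function name, shapes, and defaults are illustrative and not taken from the repository:

```python
import itertools
import random


def synthetic_batches(batch_size=4, num_features=8, num_classes=10):
    """Build one fabricated (features, labels) batch and yield it forever,
    so the input pipeline costs nothing after the first batch is built.

    Hypothetical sketch; the real code builds synthetic tensors for the
    Keras resnet input pipeline instead of Python lists.
    """
    features = [[random.random() for _ in range(num_features)]
                for _ in range(batch_size)]
    labels = [random.randrange(num_classes) for _ in range(batch_size)]
    return itertools.repeat((features, labels))


batches = synthetic_batches()
first = next(batches)
second = next(batches)
print(first is second)  # -> True: the same batch object is reused
```

Because the batch never changes, throughput measured against it isolates model and accelerator speed from input-pipeline speed.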

- 08 Feb, 2019 1 commit
Goldie Gadde authored
This reverts commit 57e07520.