- 07 Mar, 2019 1 commit
Ayush Dubey authored
* s/CollectiveAllReduceStrategy/MultiWorkerMirroredStrategy
* More s/contrib.distribute/distribute.experimental
* Collective communication options in MultiWorkerMirroredStrategy.
* Minor fixes
* No checkpointing if multi worker.
* turn off checkpointing
* fix lint
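The rename above moves the collective strategy under tf.distribute.experimental as MultiWorkerMirroredStrategy, with a knob for the collective communication implementation. A minimal sketch of the renamed API, assuming the TF 2.x experimental namespace of this era (the RING choice and the toy model are illustrative, not taken from the commit):

```python
import tensorflow as tf

# Formerly contrib's CollectiveAllReduceStrategy; the communication option
# (RING here, NCCL also available) is an illustrative choice.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
    communication=tf.distribute.experimental.CollectiveCommunication.RING)

with strategy.scope():
    # Variables created here are managed by the collective strategy.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="sgd", loss="mse")
```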
- 02 Mar, 2019 1 commit
Taylor Robie authored
* fix resnet breakage and add keras end-to-end tests * delint * address PR comments
- 01 Mar, 2019 1 commit
Shining Sun authored
* tmp commit
* tmp commit
* first attempt (without eval)
* Bug fixes
* bug fixes
* training done
* Loss NaN, no eval
* Loss weight problem solved
* resolve the NaN loss problem
* Problem solved. Clean up needed
* Added a todo
* Remove debug prints
* Extract get_optimizer to ncf_common
* Move metrics computation back to neumf; use DS.scope api
* Extract DS.scope code to utils
* lint fixes
* Move obtaining DS above producer.start to avoid race condition
* move pt 1
* move pt 2
* Update the run script
* Wrap keras_model related code into functions
* Update the doc for softmax_logitfy and change the method name
* Resolve PR comments
* working version with: eager, DS, batch and no masks
* Remove git conflict indicator
* move reshape to neumf_model
* working version, not converged
* converged
* fix a test
* more lint fix
* more lint fix
* more lint fixes
* more lint fix
* Removed unused imports
* fix test
* dummy commit for kicking off checks
* fix lint issue
* dummy input to kick off checks
* dummy input to kick off checks
* add collective to dist strat
* addressed review comments
* add a doc string
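The NCF work above builds and compiles the Keras model inside the distribution strategy scope (the "DS.scope api" in the bullets). A hedged sketch of that pattern with a placeholder model and dataset, not the NCF code itself:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Build and compile under the strategy scope so variables are mirrored
# across replicas; the model and data below are toy placeholders.
with strategy.scope():
    inputs = tf.keras.Input(shape=(8,))
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(inputs)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")

features = tf.random.uniform([64, 8])
labels = tf.random.uniform([64, 1])
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(16)
model.fit(dataset, epochs=1)
```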
- 28 Feb, 2019 1 commit
Ayush Dubey authored
* s/CollectiveAllReduceStrategy/MultiWorkerMirroredStrategy * More s/contrib.distribute/distribute.experimental
- 21 Feb, 2019 1 commit
Ayush Dubey authored
* Update official resnet for multi worker training with distribution strategies.
* Fixes for multi worker training.
* Fix call to `get_distribution_strategy`.
* Undo test change.
* Fix spacing.
* Move cluster configuration to distribution_utils.
* Move train_and_evaluate out of loop. Also, update docstrings for multi-worker flags and add use_train_and_evaluate flag.
* Update distribution_strategy flag to match exported name for collective strategy.
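The entry above moves cluster configuration into distribution_utils for multi-worker runs. Collective multi-worker strategies discover their peers through the TF_CONFIG environment variable; a hedged sketch of what such configuration can look like (the helper name and host addresses are illustrative, not the repo's exact code):

```python
import json
import os

def configure_cluster(worker_hosts, task_index):
    """Hypothetical helper: publish the cluster spec via TF_CONFIG so a
    multi-worker collective strategy can find its peers."""
    os.environ["TF_CONFIG"] = json.dumps({
        "cluster": {"worker": worker_hosts},
        "task": {"type": "worker", "index": task_index},
    })

# Example: a two-worker cluster, with this process as worker 0.
configure_cluster(["10.0.0.1:2222", "10.0.0.2:2222"], task_index=0)
```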
- 14 Feb, 2019 1 commit
Toby Boyd authored
* One device from contrib to core. * remove test code.
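Moving the one-device strategy from contrib to core amounts to constructing it from tf.distribute; a minimal sketch (the device string is an illustrative choice):

```python
import tensorflow as tf

# Core replacement for the contrib one-device strategy: all variables and
# computation are placed on the single named device.
strategy = tf.distribute.OneDeviceStrategy(device="/gpu:0")

with strategy.scope():
    layer = tf.keras.layers.Dense(4)  # variables land on the chosen device
```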
- 13 Feb, 2019 1 commit
Yuefeng Zhou authored
* Add a flag to specify distribution strategies.
* Fix a small error.
* Address comments.
* Address comments.
* Fix typos.
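The flag added above selects a distribution strategy by name. A hedged sketch of what such flag plumbing can look like (the flag name, accepted values, and helper are illustrative, not the repo's exact distribution_utils code):

```python
from absl import flags
import tensorflow as tf

flags.DEFINE_string(
    "distribution_strategy", "mirrored",
    "Distribution strategy to use: 'off', 'one_device', or 'mirrored'.")

def get_strategy(name, num_gpus):
    """Hypothetical mapping from the flag value to a strategy instance."""
    if name == "off":
        return None
    if name == "one_device" or num_gpus <= 1:
        device = "/gpu:0" if num_gpus else "/cpu:0"
        return tf.distribute.OneDeviceStrategy(device)
    return tf.distribute.MirroredStrategy()
```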
- 12 Feb, 2019 1 commit
Toby Boyd authored
* Remove contrib thread pool.
* Remove commented out contrib import.
* Fix lint issues.
* move tf.data.options higher. Tweak line breaks.
* do not monkey patch on or off if dist_strat is off
* Do not monkey patch if no_dist_strat.
* Fix file permissions.
* fix file permissions.
* Revert change to main. Add hasattr(tf, 'contrib') to utils
* compat.v1.logging
* tf.compat.v1.get_local_variables.
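The thread-pool cleanup above replaces the contrib thread pool with settings on tf.data.Options attached to the input pipeline. A hedged sketch using the experimental option names of that era (the numeric values are illustrative):

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(1000).batch(32)

# Core replacement for the contrib thread pool: threading knobs live on
# tf.data.Options and are applied with with_options().
options = tf.data.Options()
options.experimental_threading.private_threadpool_size = 8
options.experimental_threading.max_intra_op_parallelism = 1
dataset = dataset.with_options(options)
```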
- 09 Feb, 2019 1 commit
Yuefeng Zhou authored
* Add pure synthetic data to keras resnet model.
* Add imports.
* Address comments.
* update comment
* Undo set up synthetic data for real data path.
* update comment
* Address comment
* Remove trailing whitespaces.
* s/make_data_set_iterator/make_dataset_iterator/
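The synthetic-data path above feeds the Keras ResNet model without touching real input files. A hedged sketch of the usual pattern (the function name, shapes, and dtypes are illustrative placeholders for ImageNet-style inputs):

```python
import tensorflow as tf

def synthetic_dataset(batch_size, height=224, width=224, num_classes=1000):
    """Hypothetical synthetic input pipeline: one random batch repeated
    forever, so no disk I/O or preprocessing is involved."""
    images = tf.random.uniform([batch_size, height, width, 3], dtype=tf.float32)
    labels = tf.random.uniform([batch_size], maxval=num_classes, dtype=tf.int32)
    return tf.data.Dataset.from_tensors((images, labels)).repeat()
```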
- 01 Feb, 2019 1 commit
guptapriya authored
- 27 Dec, 2018 1 commit
Shining Sun authored
- 24 Dec, 2018 1 commit
Toby Boyd authored
- 21 Dec, 2018 1 commit
Shining Sun authored
- 20 Dec, 2018 2 commits
Shining Sun authored
Shining Sun authored
- 21 Nov, 2018 1 commit
josh11b authored
We've deprecated the "tower" terminology in DistributionStrategy, so the "cross_tower_ops" argument is now "cross_device_ops", matching the current name of "AllReduceCrossDeviceOps".
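A minimal sketch of the renamed argument as it appears in the core tf.distribute namespace (the commit itself targeted the contrib strategy of the time; the hierarchical-copy choice is illustrative):

```python
import tensorflow as tf

# "cross_tower_ops" became "cross_device_ops"; HierarchicalCopyAllReduce
# is one concrete cross-device reduction implementation.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
```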
- 25 Oct, 2018 1 commit
josh11b authored
- 24 Oct, 2018 1 commit
josh11b authored
- 12 Oct, 2018 1 commit
Toby Boyd authored
- 12 Jun, 2018 1 commit
Katherine Wu authored
* Add DistributionStrategy to transformer model
* add num_gpu flag
* Calculate per device batch size for transformer
* remove reference to flags_core
* Add synthetic data option to transformer
* fix typo
* add import back in
* Use hierarchical copy
* address PR comments
* lint
* fix spaces
* group train op together to fix single GPU error
* Fix translate bug (sorted_keys is a dict, not a list)
* Change params to a default dict (translate.py was throwing errors because params didn't have the TPU parameters.)
* Address PR comments. Removed multi gpu flag + more
* fix lint
* fix more lints
* add todo for Synthetic dataset
* Update docs
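The per-device batch size mentioned above is the global batch split evenly across GPUs. A hedged sketch of that arithmetic (the helper name and error text are illustrative, not necessarily the repo's exact utility):

```python
def per_device_batch_size(batch_size, num_gpus):
    """Divide the global batch across GPUs; it must divide evenly so every
    replica processes the same number of examples per step."""
    if num_gpus <= 1:
        return batch_size
    if batch_size % num_gpus:
        raise ValueError(
            "Batch size (%d) must be divisible by the number of GPUs (%d)."
            % (batch_size, num_gpus))
    return batch_size // num_gpus
```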