Commits · 4c11b84b1c7360aecff7c4a679d7e05076ffc19d · ModelZoo / ResNet50_tensorflow

28 Mar, 2019 4 commits

Added benchmark test and convergence test for the NCF model (#6318) · 4c11b84b

Shining Sun authored Mar 28, 2019

* initial commit

* bug fix

* Move build_stats from common to keras main, because it is only applicable in keras

* remove trailing blank line

* add test for synth data

* add kwargs to init

* add kwargs to function invocation

* correctly pass kwargs

* debug

* debug

* debug

* fix super init

* bug fix

* fix local_flags

* fix import

* bug fix

* fix log_steps flag

* bug fix

* bug fix: add missing return value

* resolve double-defined flags

* lint fix

* move log_steps flag to benchmark flag

* fix lint

* lint fix

* lint fix

* try flag core default values

* bug fix

* bug fix

* bug fix

* debug

* debug

* remove debug prints

* rename benchmark methods

* flag bug fix for synth benchmark

4c11b84b

Re-enable checkpoints for multi worker GPU strategies. (#6471) · 6d3989eb
Ayush Dubey authored Mar 28, 2019

6d3989eb

Add a README for ResNet keras (#6293) · f5bb2af2

Shining Sun authored Mar 28, 2019

* Initial commit

* Finished

* bug fix

* bug fix

* bug fix

* Resolve review comments

* Typo fix

* resolve comments

* fix number error

* Resolve comments

f5bb2af2

Add trivial Keras model (#6460) · b09685fe
Haoyu Zhang authored Mar 27, 2019

b09685fe

27 Mar, 2019 2 commits

Change function signature (#6459) · 0b2b8997

cclauss authored Mar 27, 2019

* from NCF_input import NCFDataset for line 181

The type __NCFDataset__ is used in the type declaration on line 81 but it is never imported.

[flake8](http://flake8.pycqa.org) testing of https://github.com/tensorflow/models on Python 3.7.1

$ __flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics__
```
./official/recommendation/data_preprocessing.py:180:3: F821 undefined name 'NCFDataset'
  # type: (str, str, dict, typing.Optional[str], bool, typing.Optional[str]) -> (NCFDataset, typing.Callable)
  ^
1    F821 undefined name 'NCFDataset'
1
```
__E901,E999,F821,F822,F823__ are the "_showstopper_" [flake8](http://flake8.pycqa.org) issues that can halt the runtime with a SyntaxError, NameError, etc. These 5 are different from most other flake8 issues, which are merely "style violations" -- useful for readability but they do not affect runtime safety.
* F821: undefined name `name`
* F822: undefined name `name` in `__all__`
* F823: local variable name referenced before assignment
* E901: SyntaxError or IndentationError
* E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree

* int, int, data_pipeline.BaseDataConstructor

0b2b8997
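
The F821 hazard described in the commit above can be illustrated with a minimal sketch. In the commit itself the undefined name sits in a type comment, which Python never evaluates, so this is an illustration of the general case rather than of that exact line. The `build` function and the `SOURCE` string are hypothetical and exist only for this example.

```python
import ast

# Hypothetical module source: NCFDataset is used but never imported or defined.
SOURCE = '''
def build():
    return NCFDataset()
'''

# The module parses and compiles cleanly, because an undefined name is
# not a syntax error.
tree = ast.parse(SOURCE)
namespace = {}
exec(compile(tree, "<sketch>", "exec"), namespace)

# The failure only surfaces when the offending line actually runs.
error_name = None
message = ""
try:
    namespace["build"]()
except NameError as exc:
    error_name = type(exc).__name__
    message = str(exc)

print(error_name, message)
```

A static check such as the flake8 invocation quoted in the commit reports this kind of problem without having to execute the code path.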

Add Keras ResNet model tests in legacy graph mode (#6444) · 7dbbca9d
Haoyu Zhang authored Mar 26, 2019

7dbbca9d

26 Mar, 2019 3 commits

Python typing: Use 'str', not 'string' (#6422) · e2ef6108

cclauss authored Mar 26, 2019

https://mypy.readthedocs.io/en/latest/cheat_sheet.html

[flake8](http://flake8.pycqa.org) testing of https://github.com/tensorflow/models on Python 3.7.1

$ __flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics__
```
./official/recommendation/data_pipeline.py:346:41: F821 undefined name 'string'
               epoch_dir=None           # type: string
                                        ^
```

e2ef6108
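
A short sketch of why `str` resolves and `string` does not. The commit concerns a `# type:` comment, which Python does not evaluate at runtime; the sketch uses quoted annotations instead so that the standard library can attempt to resolve the names. The function names `with_str` and `with_string` are hypothetical.

```python
import typing


def with_str(epoch_dir: "str"):
    return epoch_dir


def with_string(epoch_dir: "string"):
    return epoch_dir


# "str" is a builtin, so the quoted annotation resolves to the real type.
hints = typing.get_type_hints(with_str)

# "string" is not a builtin type. Unless a module or name called `string`
# happens to be in scope, resolving the annotation raises NameError.
bad_error = None
try:
    typing.get_type_hints(with_string)
except NameError as exc:
    bad_error = type(exc).__name__

print(hints, bad_error)
```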

Move distribution strategy creation before creating any ops, which is required by multi-node collective ops in eager mode. (#6435) · b3594a83
Yuefeng Zhou authored Mar 25, 2019

b3594a83

change tf.to_int32 to tf.cast (#6359) · 6765b16d

tranvohuy authored Mar 26, 2019

tf.to_int32 raises a deprecation warning.
Change tf.to_int32(labels) to tf.cast(labels, tf.int32).

6765b16d

25 Mar, 2019 1 commit
- Add/Modify tests to track Tensorboard overhead and improve performance of accuracy test. (#6434) · 94f9c2cc
  Haoyu Zhang authored Mar 25, 2019
  
  94f9c2cc
22 Mar, 2019 1 commit
- Disable Tensorboard callback by default (#6424) · 8d5d36e0
  Haoyu Zhang authored Mar 22, 2019
  
  8d5d36e0
20 Mar, 2019 2 commits
- Add `input_context` to `input_fn` in cifar10_main. (#6414) · 721cd512
  Ayush Dubey authored Mar 20, 2019
```
* Add `input_context` to `input_fn` in cifar10_main.

* Change sharding log message to be consistent with `dataset.shard` params.

* Lint
```
  721cd512
- Added thread tuning and tweaked tests to improve Keras model performance (#6396) · 7b5606a5
  Haoyu Zhang authored Mar 19, 2019
  
  7b5606a5
19 Mar, 2019 3 commits
- Add config to enable XLA in TF 2.0 (#6406) · dba24007
  Haoyu Zhang authored Mar 19, 2019
  
  dba24007
- Shard input for distribution strategy. (#6349) · 04792078
  Ayush Dubey authored Mar 19, 2019
```
* Shard input for distribution strategy.

* Pass in input_context from real input_fn.

* Pass in input_context from real input_fn.

* Make pipeline id base 1 for better readability.
```
  04792078
- Add the option to run Keras resnet model on multiple workers. (#6368) · 3024bde6
  Soroush Radpour authored Mar 19, 2019
  
  3024bde6
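
The input sharding introduced in #6349 above can be pictured with a pure-Python analogy. This is a sketch of the semantics of `tf.data.Dataset.shard(num_shards, index)` (keep every `num_shards`-th element, starting at offset `index`), not the code from the commit. The helper `shard`, the sample `records`, and the printed message format are assumptions made for illustration; the message only shows what a base-1 pipeline id looks like.

```python
from itertools import islice


def shard(items, num_shards, index):
    """Keep every num_shards-th item, starting at 0-based offset index."""
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    return list(islice(items, index, None, num_shards))


records = list(range(10))
num_pipelines = 3

shards = []
for pipeline_id in range(num_pipelines):
    part = shard(records, num_pipelines, pipeline_id)
    shards.append(part)
    # Hypothetical log line: the id is shown base 1 for readability,
    # while the sharding call itself still takes the 0-based index.
    print("pipeline %d of %d: %s" % (pipeline_id + 1, num_pipelines, part))
```

Each input pipeline sees a disjoint slice of the data, and together the slices cover every record exactly once.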
18 Mar, 2019 1 commit
- Add support for TPUEstimator to data processing pipeline and add the … (#6330) · cf304238
  Bruce Fontaine authored Mar 18, 2019
```
* Add support for TPUEstimator to data processing pipeline and add the ability to store epochs in user specified location.
```
  cf304238
13 Mar, 2019 2 commits

Fix ncf test for keras (#6355) · dadc4a62

Shining Sun authored Mar 13, 2019

* Fix ncf test for keras

* add a todo for batch_size and eval_batch_size for ncf keras

* lint fix

* fix typos

* Lint fix

* fix lint

* resolve pr comment

* resolve pr comment

dadc4a62

Add fp16 to 8 gpu fp16 tests. (#6353) · ba0a6f60
Toby Boyd authored Mar 12, 2019

ba0a6f60

12 Mar, 2019 2 commits
- xla to bs=128 for num_gpu=8 (#6351) · 19daade4
  Toby Boyd authored Mar 12, 2019
```
* xla to bs=128 for num_gpu=8

* remove todo
```
  19daade4
- V1 optimizer fix (#6350) · 9bdfb04a
  Toby Boyd authored Mar 12, 2019
```
* optimizer back to compat.v1

* add doc string to fix lint
```
  9bdfb04a
11 Mar, 2019 1 commit

Adding LARS to ResNet (#6327) · 0b0dc7f5

pkanwar23 authored Mar 11, 2019

* Adding LARS to ResNet

* Fixes for the LARS patch

* Fixes for the LARS patch

* more fixes

* 1 more fix

0b0dc7f5

07 Mar, 2019 3 commits
- No checkpointing only if multi worker strategy. (#6322) · a5db4420
  Ayush Dubey authored Mar 07, 2019
  
  a5db4420
- Add command line option for multi worker collective implementations, disable checkpointing. (#6317) · 05a79f5a
  Ayush Dubey authored Mar 07, 2019
```
* s/CollectiveAllReduceStrategy/MultiWorkerMirroredStrategy

* More s/contrib.distribute/distribute.experimental

* Collective communication options in MultiWorkerMirroredStrategy.

* Minor fixes

* No checkpointing if multi worker.

* turn off checkpointing

* fix lint
```
  05a79f5a
- Add fp16 to keras benchmarks (#6314) · 258b77cc
  Reed authored Mar 06, 2019
  
  258b77cc
06 Mar, 2019 1 commit
- Mixed precision support (#6309) · e4a046e7
  Reed authored Mar 06, 2019
```
* Mixed precision support

* Add TODOs
```
  e4a046e7
02 Mar, 2019 1 commit
- fix resnet breakage and add keras end-to-end tests (#6295) · 8367cf6d
  Taylor Robie authored Mar 02, 2019
```
* fix resnet breakage and add keras end-to-end tests

* delint

* address PR comments
```
  8367cf6d
01 Mar, 2019 3 commits

Keras-fy NCF Model (#6092) · 048e5bff

Shining Sun authored Mar 01, 2019

* tmp commit

* tmp commit

* first attempt (without eval)

* Bug fixes

* bug fixes

* training done

* Loss NAN, no eval

* Loss weight problem solved

* resolve the NAN loss problem

* Problem solved. Clean up needed

* Added a todo

* Remove debug prints

* Extract get_optimizer to ncf_common

* Move metrics computation back to neumf; use DS.scope api

* Extract DS.scope code to utils

* lint fixes

* Move obtaining DS above producer.start to avoid race condition

* move pt 1

* move pt 2

* Update the run script

* Wrap keras_model related code into functions

* Update the doc for softmax_logitfy and change the method name

* Resolve PR comments

* working version with: eager, DS, batch and no masks

* Remove git conflict indicator

* move reshape to neumf_model

* working version, not converge

* converged

* fix a test

* more lint fix

* more lint fix

* more lint fixes

* more lint fix

* Removed unused imports

* fix test

* dummy commit for kicking off checks

* fix lint issue

* dummy input to kick off checks

* dummy input to kick off checks

* add collective to dist strat

* addressed review comments

* add a doc string

048e5bff

Add Keras XLA Tests (#6286) · fa9ed456

Haoyu Zhang authored Mar 01, 2019

* Added XLA test with a monkey-patched op to avoid OOM

* Added doc strings in Keras benchmarks to avoid Lint error

fa9ed456

Update imagenet_preprocessing.py (#6291) · a76cd3ac
Yash Katariya authored Mar 01, 2019

a76cd3ac

28 Feb, 2019 3 commits
- Change `CollectiveAllReduceStrategy` to `MultiWorkerMirroredStrategy`. (#6282) · d793ea82
  Ayush Dubey authored Feb 28, 2019
```
* s/CollectiveAllReduceStrategy/MultiWorkerMirroredStrategy

* More s/contrib.distribute/distribute.experimental
```
  d793ea82
- Add benchmarks for thread tuning. (#6283) · 54dffe2e
  Yuefeng Zhou authored Feb 28, 2019
```
* Add benchmarks for thread tuning.

* Address comment.

* Add a comment.
```
  54dffe2e
- Updating stale DistributionStrategy test. (#6281) · 4b566d4e
  Tayo Oguntebi authored Feb 28, 2019
  
  4b566d4e
25 Feb, 2019 1 commit
- Add root_data_dir to constructor of Resnet50KerasBenchmarkSynth and... · 338088df
  Dong Lin authored Feb 25, 2019
```
Add root_data_dir to constructor of Resnet50KerasBenchmarkSynth and Resnet50KerasBenchmarkReal (#6259)
```
  338088df
22 Feb, 2019 4 commits
- Set data_dir to cifar-10-batches-bin in keras_cifar_benchmark.py (#6251) · da1d3e60
  Dong Lin authored Feb 22, 2019
  
  da1d3e60
- Remove isinstance change for contrib strategy (#6250) · 21a4ad75
  guptapriya authored Feb 22, 2019
```
* Remove isinstance change for contrib strategy

Replace it with class name check instead which should work regardless

* Add quotes for string

* fix quote type
```
  21a4ad75
- Add kwargs to make the benchmark class constructor forward compatible. (#6246) · 5f4d34fc
  Dong Lin authored Feb 21, 2019
```
This is needed to avoid breaking benchmark execution if PerfZero provides more
named arguments before the benchmark class constructor is updated.
```
  5f4d34fc
- Allow user to specify root_data_dir in the benchmark class constructor (#6213) · 5c6fa148
  Dong Lin authored Feb 21, 2019
```
* Allow user to specify root_data_dir in the benchmark class constructor

* Address comments
```
  5c6fa148
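
The two constructor changes above (#6213 and #6246) follow a common pattern, sketched here under stated assumptions. `BenchmarkSketch`, `output_dir`, and `future_option` are hypothetical names; only `root_data_dir` and the use of `**kwargs` come from the commit messages. This is not the repository's benchmark class.

```python
class BenchmarkSketch(object):
    """Hypothetical benchmark class with a forward-compatible constructor."""

    def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
        self.output_dir = output_dir
        # Lets the caller choose where the input data lives.
        self.root_data_dir = root_data_dir
        # Named arguments this version does not know about are accepted
        # instead of raising TypeError; here they are only recorded.
        self.ignored_kwargs = sorted(kwargs)


# A newer caller passes an argument this class has never heard of.
bench = BenchmarkSketch(
    output_dir="/tmp/out",
    root_data_dir="/data",
    future_option=True,
)
print(bench.root_data_dir, bench.ignored_kwargs)
```

Without `**kwargs`, the same call would fail with a `TypeError` for the unexpected keyword, which is the breakage the commit message describes.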
21 Feb, 2019 2 commits

Multi-worker support for Resnet. (#6206) · f2e90945

Ayush Dubey authored Feb 21, 2019

* Update official resnet for multi worker training with distribution strategies.

* Fixes for multi worker training.

* Fix call to `get_distribution_strategy`.

* Undo test change.

* Fix spacing.

* Move cluster configuration to distribution_utils.

* Move train_and_evaluate out of loop.  Also, update docstrings for multi-worker flags and add use_train_and_evaluate flag.

* Update distribution_strategy flag to match exported name for collective strategy.

f2e90945

Add flag to enable XLA in Keras models (#6240) · 4571d3fa
Haoyu Zhang authored Feb 21, 2019
```
* Add flag to enable XLA in Keras models

* Fix lint errors (some of them are old errors)
```
4571d3fa