Commits · f2ea2f537967af92fd47eeeb91b0e312aedaddad · ModelZoo / ResNet50_tensorflow

24 May, 2019 2 commits

Transformer v2 benchmark (#6860) · f2ea2f53

Toby Boyd authored May 24, 2019

* Moved common keras code to utils.

* Initial 1 gpu benchmark

- Aligned flags with resnet example
- removed code/features that are not super useful
- eval as part of train if bleu source/ref provided
- add exp_per_second hook

* Rename benchmark classes, pass batch-size and log_steps.

* fix docstring

* Predict done with checkpoints inline

- perfzero baseclass

* steps not epochs with smoother training loop.

* do not initialize history outside loop.

* 5000 between eval not 500

* estimator to keras.

* remove epochs var.

* use range not xrange.

* 200K steps for 1 gpu

* fix global step

f2ea2f53

Merged commit that fixes transformer's predict and eval. (#6874) · b9cab01b

Tian Lin authored May 24, 2019

* Merged commit includes the following changes:
249776315  by tianlin<tianlin@google.com>:

    Internal change

249763206  by tianlin<tianlin@google.com>:

    For TF 2.0 (related to Beam Search), expand cond dims in tf.where(cond, x, y) to make all parameters broadcastable.

--
249392724  by hongkuny<hongkuny@google.com>:

    Internal change

PiperOrigin-RevId: 249776315

* Merged commit includes the following changes:
249823043  by tianlin<tianlin@google.com>:

    Bring back v2 test for predict and eval.

--

PiperOrigin-RevId: 249823043

b9cab01b

22 May, 2019 3 commits

fix lint issues. (#6855) · 3a97b68c
Toby Boyd authored May 22, 2019

3a97b68c

Add Transformer Big Benchmarks + FP16 for other tests. (#6838) · 23f75313

Toby Boyd authored May 22, 2019

* Add big tests.

* fix super

* Add fp16, increase 8xGPU batch-sizes

* Adding the rest of the fp16 tests.

* Big accuracy test batch_perf_gpu

* fix docstrings

* add _run_and_report

* Edited docstrings

23f75313

Merge Transformer V2 to Github (#6846) · c4f34e58

Tian Lin authored May 22, 2019

* Merged commit includes the following changes:
249218656  by tianlin<tianlin@google.com>:

    Deal with imports, fix a typo and make unit tests fast.

--
249198645  by tianlin<tianlin@google.com>:

    Trivial: Remove one empty line before "import tensorflow"

--
249195490  by tianlin<tianlin@google.com>:

    Initialize Transformer TF V2 Model with Keras subclassing implementation. (Compatible with TF V1)

--
249195008  by tianlin<tianlin@google.com>:

    Internal change

249173564  by hongkuny<hongkuny@google.com>:

    Internal change

249079258  by hongkuny<hongkuny@google.com>:

    Internal change

247691534  by haoyuzhang<haoyuzhang@google.com>:

    Internal change

247533725  by haoyuzhang<haoyuzhang@google.com>:

    Internal change

247509295  by haoyuzhang<haoyuzhang@google.com>:

    Internal change

247311355  by wangtz<wangtz@google.com>:

    Internal change

247303127  by wangtz<wangtz@google.com>:

  ...

c4f34e58

11 May, 2019 1 commit

Add FP16 to transformer with benchmark tests. (#6756) · b7e97bec

Toby Boyd authored May 10, 2019

* Add FP16 and benchmarks.

* add missing run and report.

* Add loss_scale as option not included with dtype.

* move loss_scale validation under dtype conditional.

* add loss_scale to flags tested.

b7e97bec

09 May, 2019 1 commit

Transformer instrumented for benchmarking (#6734) · 40543869

Toby Boyd authored May 09, 2019

* Add first benchmark and return stats.

* Remove print statements update training steps.

* Revert print T: in print statement.

* Remove print(stats)

* add 2 gpu accuracy test for base.

* Fixed total_batch_size when using gpu + gFile deprecations.

* 8 GPU test name fix

* Add 4 and 8 GPU tests.

* typo fixes.

* Clean up test names and methods.

* bleu uncased.  docstring format fix.

40543869

07 May, 2019 1 commit
- Move tests_data to gcs and upgrade data_download. (#6722) · 0f76239b
  Toby Boyd authored May 06, 2019
  
  0f76239b
29 Apr, 2019 2 commits

Replace per_device with per_replica and PerDevice with PerReplica, because the... · b00783d7

Igor authored Apr 29, 2019

Replace per_device with per_replica and PerDevice with PerReplica, because the PerDevice concept was renamed and doesn't exist anymore. (#6693)

* Replace per_device with per_replica and PerDevice with PerReplica, because the PerDevice concept was renamed and doesn't exist anymore.

b00783d7

fixed simple typo (#6686) · d087c89b
Songyi Blair Han authored Apr 30, 2019

d087c89b

12 Apr, 2019 1 commit
- Update README.md (#6569) · b4b8c723
  Yash Katariya authored Apr 12, 2019
```
* Update README.md

* Update README.md

* Update README.md
```
  b4b8c723
13 Feb, 2019 1 commit

Add a flag to specify distribution strategies. (#6185) · 79b57a3f

Yuefeng Zhou authored Feb 12, 2019

* Add a flag to specify distribution strategies.

* Fix a small error.

* Address comments.

* Address comments.

* Fix typos.

79b57a3f

02 Feb, 2019 1 commit
- Typos. (#6120) · 51814d49
  Paige Bailey authored Feb 02, 2019
  
  51814d49
15 Jan, 2019 1 commit
- Stop crashing at the end of training phase. (#6049) · 1c99681e
  wangtz authored Jan 16, 2019
```
It currently fails with
TypeError: not all arguments converted during string formatting
```
  1c99681e
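The error quoted above is what Python's `%` operator raises when a format string has fewer placeholders than supplied arguments. A minimal sketch of that failure mode and one possible fix follows; the message text and values are hypothetical and are not taken from the patched file.

```python
def buggy_summary(steps, loss):
    # One placeholder but two arguments: raises
    # TypeError: not all arguments converted during string formatting
    return "Finished %d steps" % (steps, loss)


def fixed_summary(steps, loss):
    # One placeholder per argument, so formatting succeeds.
    return "Finished %d steps, final loss %.3f" % (steps, loss)
```

Matching the number of placeholders to the number of arguments (or switching to `str.format`) avoids the crash.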
20 Dec, 2018 1 commit
- Move references to `tf_record_iterator`. (#5830) · a1eb92b0
  Mark Daoust authored Dec 19, 2018
```
For tf2 this will only be available in `compat.v1`.
```
  a1eb92b0
17 Dec, 2018 1 commit

Explicitly pass values kwarg to tf.name_scope (#5922) · 91a59c78

bananabowl authored Dec 17, 2018

Explicitly pass values kwarg to tf.name_scope as it is currently being treated as the default_name kwarg instead. This causes an exception to be thrown in eager mode.

91a59c78
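In TF 1.x the signature is `tf.name_scope(name, default_name=None, values=None)`, so a second positional argument binds to `default_name`. The sketch below uses a plain-Python stand-in with the same parameter order to show why the keyword is needed; the stand-in and its return value are illustrative only and are not TensorFlow code.

```python
def name_scope(name, default_name=None, values=None):
    """Stand-in mirroring the TF 1.x parameter order; reports how args were bound."""
    return {"name": name, "default_name": default_name, "values": values}


tensors = ["x", "y"]  # placeholder for a list of input tensors

# Positional call: the list lands in default_name, not values.
positional = name_scope("loss", tensors)

# Keyword call: the list is bound to values as intended.
keyword = name_scope("loss", values=tensors)
```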

04 Oct, 2018 1 commit

set strip_default_attrs=True for SavedModel exports (#5439) · cdcd3ec2

Taylor Robie authored Oct 04, 2018

* set strip_default_attrs=True for SavedModel exports

* specify dtype in resnet export

* another dtype fix

* fix another dtype issue, and set --image_bytes_as_serving_input to default to False

cdcd3ec2

30 Aug, 2018 2 commits

Bypassing Export model step, if training on TPU's. As this need inference to... · 23b5b422

Aman Gupta authored Aug 30, 2018

Bypass the export model step when training on TPUs, because export requires inference to be supported on TPUs. Remove this check once inference is supported. (#5209)

23b5b422

Bypassing Export model step, if training on TPU's. As this need inference to... · 5133522f

Aman Gupta authored Aug 30, 2018

Bypass the export model step when training on TPUs, because export requires inference to be supported on TPUs. Remove this check once inference is supported.

5133522f

16 Aug, 2018 1 commit

Deterministic dataset order fix (#5098) · 468d8bb6

Jules Gagnon-Marchand authored Aug 16, 2018

* Deterministic dataset order fix

For the file order to be deterministic, `shuffle` in `tf.data.Dataset.list_files(..., shuffle)` needs to be False; otherwise different iterator inits will yield different file orders.

* removed unnecessary shuffle of filenames

* Removed the `_FILE_SHUFFLE_BUFFER` definition

468d8bb6
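The TensorFlow documentation for `tf.data.Dataset.list_files` notes that filenames come back in a non-deterministic shuffled order by default, and that passing a `seed` or `shuffle=False` gives a deterministic order. The plain-Python sketch below illustrates the same idea without TensorFlow; `list_files` here is a hypothetical stand-in, not the real API.

```python
import random


def list_files(filenames, shuffle, seed=None):
    """Hypothetical stand-in: sorted order, optionally shuffled."""
    files = sorted(filenames)
    if shuffle:
        # An unseeded Random() can differ between calls; a fixed seed does not.
        random.Random(seed).shuffle(files)
    return files


shards = ["train-00002", "train-00000", "train-00001"]

# Without shuffling, every call yields the same order.
first = list_files(shards, shuffle=False)
second = list_files(shards, shuffle=False)
```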

14 Aug, 2018 1 commit

Transformer partial fix (#5092) · 6f5967a0

alope107 authored Aug 14, 2018

* Fix Transformer TPU crash in Python 2.X.

- Tensorflow raises an error when tf_inspect.getfullargspec is called on
a functools.partial in Python 2.X. This issue would be hit during the
eval stage of the Transformer TPU model. This change replaces the call
to functools.partial with a lambda to work around the issue.

* Remove unused import from transformer_main.

* Fix lint error.

6f5967a0
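The workaround described above can be sketched as follows. `input_fn` and its parameters are hypothetical names; the point is only that a lambda binds the same argument as `functools.partial` while remaining an ordinary function, which argspec inspection handles.

```python
import functools
import inspect


def input_fn(params, is_training):
    """Hypothetical input function with one argument to pre-bind."""
    return (params["batch_size"], is_training)


# Original pattern: a functools.partial object.
eval_fn_partial = functools.partial(input_fn, is_training=False)

# Workaround: a lambda that forwards to the same function.
eval_fn_lambda = lambda params: input_fn(params, is_training=False)

# The lambda is a plain function, so its argspec is simply ['params'].
lambda_args = inspect.getfullargspec(eval_fn_lambda).args
```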

26 Jul, 2018 1 commit

fix batch_size in transformer_main.py (#4897) · 2d7a0d6a

Jiang Yu authored Jul 25, 2018

* fix batch_size in transformer_main.py

Fix batch_size in transformer_main.py, which caused ResourceExhaustedError: OOM when training Transformer models using models/official/transformer

* small format change

change format from one line to multiple ones in order to pass lint tests

* remove trailing space and add comment

2d7a0d6a

11 Jul, 2018 1 commit

Use six and feature detection in string conversion (#4740) · df978fdd

cclauss authored Jul 11, 2018

* Use six and feature detection in string conversion

Leverage [__six.ensure_text()__](https://github.com/benjaminp/six/blob/master/six.py#L890) to deliver Unicode text in both Python 2 and Python 3.

Follow Python porting best practice [use feature detection instead of version detection](https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection) in ___unicode_to_native()__.

* Revert the use of six.ensure_text()

Thanks for catching that!  I jumped the gun.  It is I who have brought shame...

df978fdd
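Feature detection, as the porting guide linked above recommends, can be sketched like this. The function name and behaviour are illustrative assumptions and are not copied from the repository.

```python
try:
    # Python 2 defines the unicode builtin; Python 3 does not.
    text_type = unicode  # noqa: F821
except NameError:
    text_type = str


def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding byte strings when needed."""
    if isinstance(value, bytes) and not isinstance(value, text_type):
        return value.decode(encoding)
    return value
```

The code probes for the `unicode` name itself rather than comparing `sys.version_info`, so it keeps working wherever that name exists or is absent.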

26 Jun, 2018 1 commit
- Rename programmers_guide/ to guide/ in tf-models. · 5d747e22
  Billy Lamberta authored Jun 25, 2018
  
  5d747e22
22 Jun, 2018 1 commit
- Fix transformer test (#4606) · b1a704d7
  Katherine Wu authored Jun 22, 2018
  
  b1a704d7
20 Jun, 2018 1 commit

Wide Deep refactor and deep movies (#4506) · 20070ca4

Taylor Robie authored Jun 20, 2018

* begin branch

* finish download script

* rename download to dataset

* intermediate commit

* intermediate commit

* misc tweaks

* intermediate commit

* intermediate commit

* intermediate commit

* delint and update census test.

* add movie tests

* delint

* fix py2 issue

* address PR comments

* intermediate commit

* intermediate commit

* intermediate commit

* finish wide deep transition to vanilla movielens

* delint

* intermediate commit

* intermediate commit

* intermediate commit

* intermediate commit

* fix import

* add default ncf csv construction

* change default on download_if_missing

* shard and vectorize example serialization

* fix import

* update ncf data unittests

* delint

* delint

* more delinting

* fix wide-deep movielens serialization

* address PR comments

* add file_io tests

* investigate wide-deep test failure

* remove hard coded path and properly...

20070ca4

18 Jun, 2018 1 commit
- remove unused imports and lint (#4475) · eef72ed6
  Taylor Robie authored Jun 18, 2018
```
* remove unused imports and lint

* fix schedule.py

* address PR comments
```
  eef72ed6
12 Jun, 2018 2 commits

Add checklist for official models. Remove file access from flag validator (fix build) (#4492) · bb62f248
Katherine Wu authored Jun 12, 2018
```
* Add checklist for official models. Remove file access from flag validator (causing issues with BUILD)

* spelling

* address PR comments
```
bb62f248

Transformer multi gpu, remove multi_gpu flag, distribution helper functions (#4457) · 29c9f985

Katherine Wu authored Jun 12, 2018

* Add DistributionStrategy to transformer model

* add num_gpu flag

* Calculate per device batch size for transformer

* remove reference to flags_core

* Add synthetic data option to transformer

* fix typo

* add import back in

* Use hierarchical copy

* address PR comments

* lint

* fix spaces

* group train op together to fix single GPU error

* Fix translate bug (sorted_keys is a dict, not a list)

* Change params to a default dict (translate.py was throwing errors because params didn't have the TPU parameters.)

* Address PR comments. Removed multi gpu flag + more

* fix lint

* fix more lints

* add todo for Synthetic dataset

* Update docs

29c9f985
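The per-device batch size calculation mentioned in the list above could look roughly like the following. The function name, the divisibility check, and the error message are assumptions for illustration, not the repository's helper.

```python
def per_device_batch_size(batch_size, num_gpus):
    """Split a global batch size evenly across GPUs (hypothetical helper)."""
    if num_gpus <= 1:
        # CPU or single-GPU runs use the global batch size unchanged.
        return batch_size
    remainder = batch_size % num_gpus
    if remainder:
        raise ValueError(
            "batch_size %d is not divisible by num_gpus %d; try %d"
            % (batch_size, num_gpus, batch_size - remainder))
    return batch_size // num_gpus
```

Rejecting a batch size that does not divide evenly keeps every replica's input shape identical.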

07 Jun, 2018 1 commit
- Change unittest tf test (#4485) · 0cb7e02d
  Katherine Wu authored Jun 07, 2018
  
  0cb7e02d
06 Jun, 2018 1 commit

Cleanup TPU-ization of Transformer (#4459) · 441c9bca

Taylor Robie authored Jun 06, 2018

* add tests for matmul embedding and schedule manager, as well as some minor cleanup

* delint

* address PR comments

441c9bca

05 Jun, 2018 1 commit
- Export Transformer saved model, and add vocab file flag. (#4281) · 21f794fd
  Katherine Wu authored Jun 05, 2018
  
  21f794fd
04 Jun, 2018 1 commit

First pass at a TPU loop for Transformer (#4296) · 2eeb85fe

Taylor Robie authored Jun 04, 2018

* port changes from previous branch now that transformer util changes are in master

fix incorrect count

correct (hopefully) treatment of batch_size

set eval_metrics to a dummy function for now

add some comments

start bringing metrics to transformer TPU

resolve logits shape

metrics are now working except for tf.py_func metrics

increase batch_size for tpu, and create summary host call

fix host call

reduce tpu default batch size

further tune batch sizes

add minibatch loss to summary

handle case of single_iteration_train_steps > number points in an epoch

begin to incorporate hooks

add sleep workarounds

disable hooks altogether

generalize host call function and move to newly created tpu utils module

remove all traces of params as an object

switch from  to

address some PR comments, and change the number of data points.

minor tweaks

add tpu dry run for testing, and use matmul for TPU embedding

infeed/outfeed queue issue is fixed. Sleeps are no longer necessary

add some documentation.

cleanup and address PR comments

delint

add accelerator __init__

fix embedding

missed PR comment

address PR comments

fix validator bug

rewrite cloud storage validator, and add oauth dependency to requirements.txt

* delint

2eeb85fe
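Using a matmul for the embedding replaces an index-based gather with a product between one-hot rows and the embedding table. A plain-Python sketch of the equivalence, with a toy table and hypothetical function names:

```python
def gather_embedding(table, ids):
    """Look up rows of table directly by index."""
    return [list(table[i]) for i in ids]


def matmul_embedding(table, ids):
    """Same lookup expressed as one_hot(ids) @ table."""
    vocab_size, dim = len(table), len(table[0])
    one_hot = [[1.0 if v == i else 0.0 for v in range(vocab_size)] for i in ids]
    return [
        [sum(row[v] * table[v][d] for v in range(vocab_size)) for d in range(dim)]
        for row in one_hot
    ]


table = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
ids = [2, 0, 2]
```

Both functions return the same rows for this toy input; the matmul form does more arithmetic but uses only dense operations.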

01 Jun, 2018 2 commits

Add new test ID and test env info to the benchmark run. (#4426) · d2d6ab4c
Qianli Scott Zhu authored Jun 01, 2018
```
* Add new test ID and test env info to the benchmark run.

* Fix test.

* Fix lint

* Address review comment.
```
d2d6ab4c

Record the status for a benchmark run. (#4402) · 47c5642e

Qianli Scott Zhu authored Jun 01, 2018

* Update benchmark logger to update the run status.

This is important for streaming upload to bigquery so that the
dashboard can ignore the 'running' benchmark at the moment since
it's not finished yet.

* Move the run status into a separate table.

Also update the run status in the benchmark uploader and
BigqueryBenchmarkLogger.

* Insert instead of update for the benchmark status for file logger.

* Address review comments.

Update the logger to have benchmark context, which will update the
run status accordingly.

* Fix broken tests.

* Move the benchmark logger context to main function.

* Fix tests.

* Update the rest of the models to use the context in main.

* Delint.

47c5642e
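A benchmark context that records run status might be sketched as a context manager like the one below. The class name, status strings, and in-memory log are hypothetical; they only illustrate how entering and leaving the context could update the status.

```python
import contextlib

RUN_STATUS_RUNNING = "running"
RUN_STATUS_SUCCESS = "success"
RUN_STATUS_FAILURE = "failure"


class InMemoryBenchmarkLogger(object):
    """Hypothetical logger that keeps status updates in a list."""

    def __init__(self):
        self.statuses = []

    def on_finish(self, status):
        self.statuses.append(status)


@contextlib.contextmanager
def benchmark_context(logger):
    # Mark the run as in progress so a dashboard could skip it for now.
    logger.statuses.append(RUN_STATUS_RUNNING)
    try:
        yield logger
    except Exception:
        logger.on_finish(RUN_STATUS_FAILURE)
        raise
    else:
        logger.on_finish(RUN_STATUS_SUCCESS)
```

Wrapping the body of `main` in `with benchmark_context(logger):` would then record success or failure without each model repeating that logic.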

15 May, 2018 1 commit
- Fix transformer loss (#4270) · 0344c550
  Katherine Wu authored May 15, 2018
  
  0344c550
11 May, 2018 2 commits
- Update the wide_deep and transformer code for latest benchmark config. (#4246) · c0a380d2
  Qianli Scott Zhu authored May 11, 2018
```
* Update the wide_deep code for latest benchmark config.

* Also update the transformer benchmark code.
```
  c0a380d2
- Add official flag-parsing and benchmarking logging utils to Transformer (#4163) · a84e1ef9
  Katherine Wu authored May 11, 2018
  
  a84e1ef9
02 May, 2018 1 commit
- Add transformer model (#4148) · 3fca8afe
  Katherine Wu authored May 02, 2018
  
  3fca8afe