Commits · 2eeb85feb7f915b175f677f630abbfdfec5749f2 · ModelZoo / ResNet50_tensorflow

04 Jun, 2018 1 commit

First pass at a TPU loop for Transformer (#4296) · 2eeb85fe

Taylor Robie authored Jun 04, 2018

* port changes from previous branch now that transformer util changes are in master

fix incorrect count

correct (hopefully) treatment of batch_size

set eval_metrics to a dummy function for now

add some comments

start bringing metrics to transformer TPU

resolve logits shape

metrics are now working except for tf.py_func metrics

increase batch_size for tpu, and create summary host call

fix host call

reduce tpu default batch size

further tune batch sizes

add minibatch loss to summary

handle case of single_iteration_train_steps > number points in an epoch

begin to incorporate hooks

add sleep workarounds

disable hooks altogether

generalize host call function and move to newly created tpu utils module

remove all traces of params as an object

switch from  to

address some PR comments, and change the number of data points.

minor tweaks

add tpu dry run for testing, and use matmul for TPU embedding

infeed/outfeed queue issue is fixed. Sleeps are no longer necessary

add some documentation.

cleanup and address PR comments

delint

add accelerator __init__

fix embedding

missed PR comment

address PR comments

fix validator bug

rewrite cloud storage validator, and add oauth dependency to requirements.txt

* delint

2eeb85fe

01 Jun, 2018 3 commits

Fix the hooks test comments (#4427) · 02571056
Yanhui Liang authored Jun 01, 2018

02571056
Add new test ID and test env info to the benchmark run. (#4426) · d2d6ab4c
Qianli Scott Zhu authored Jun 01, 2018
```
* Add new test ID and test env info to the benchmark run.

* Fix test.

* Fix lint

* Address review comment.
```
d2d6ab4c

Record the status for a benchmark run. (#4402) · 47c5642e

Qianli Scott Zhu authored Jun 01, 2018

* Update benchmark logger to update the run status.

This is important for streaming upload to bigquery so that the
dashboard can ignore the 'running' benchmark at the moment since
it's not finished yet.

* Move the run status into a separate table.

Also update the run status in the benchmark uploader and
BigqueryBenchmarkLogger.

* Insert instead of update for the benchmark status for file logger.

* Address review comments.

Update the logger to have benchmark context, which will update the
run status accordingly.

* Fix broken tests.

* Move the benchmark logger context to main function.

* Fix tests.

* Update the rest of the models to use the context in main.

* Delint.

47c5642e
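
The run-status tracking described in this commit can be pictured as a context manager that records `running` on entry and a final status on exit. The following is only a minimal sketch under stated assumptions, not the repository's implementation: `InMemoryStatusLog`, `benchmark_context`, and the `success`/`failure` status strings are hypothetical names chosen for illustration (the commit message itself only mentions `'running'`).

```python
import contextlib


class InMemoryStatusLog:
    """Hypothetical stand-in for a benchmark logger; keeps status rows in a list."""

    def __init__(self):
        self.rows = []

    def on_status(self, run_id, status):
        # Insert a new row per status change rather than updating in place.
        self.rows.append((run_id, status))


@contextlib.contextmanager
def benchmark_context(log, run_id):
    """Record 'running' on entry and a final status on exit (assumed names)."""
    log.on_status(run_id, "running")
    try:
        yield log
    except Exception:
        log.on_status(run_id, "failure")
        raise
    else:
        log.on_status(run_id, "success")


log = InMemoryStatusLog()
with benchmark_context(log, run_id="run-1"):
    pass  # the model's main function would run here
print(log.rows)
```

Wrapping the body of a main function in such a context means a dashboard can tell finished runs from ones still in progress, or from ones that raised an error.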

30 May, 2018 1 commit
- Fix hooks_test for examples/second hook (#4411) · d41ed934
  Yanhui Liang authored May 30, 2018
```
* Fix hooks_test

* Add more comments

* Fix lints
```
  d41ed934
25 May, 2018 1 commit

Fix/log ex per sec (#4360) · d626b908

Karmel Allison authored May 25, 2018

* Using BenchmarkLogger

* Using BenchmarkLogger

* Fixing tests

* Linting fixes.

* Adding comments

* Moving mock logger

* Moving mock logger

* Glinting

* Responding to CR

* Reverting assertEmpty

d626b908

11 May, 2018 2 commits

Add benchmark logger that does stream upload to bigquery. (#4210) · 0270cac7

Qianli Scott Zhu authored May 11, 2018

* Move the benchmark_uploader to new location.

* Update benchmark logger to streaming upload.

* Fix lint and unit test error.

* delint.

* Update the benchmark uploader test.

Skip the import of benchmark_uploader when bigquery is not installed.

* Merge the 2 classes of benchmark uploader into 1.

* Address review comments.

* delint.

* Execute bigquery upload in a separate thread.

* Change to use python six.moves for importing.

* Address review comments and delint.

* Address review comment.

Adding comment for potential performance impact for model on CPU.

* Fix random failure on py3.

* Fix the order of flag saver to avoid the randomness.

The test is broken when the benchmark_logger_type is set first, and
validated when the benchmark_log_dir is not set yet.

0270cac7
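
Running the upload on a separate thread, as one of the bullets above describes, keeps a slow network call from blocking the training loop. A minimal sketch of that pattern using only the standard library follows; `upload_in_background` and `fake_upload` are hypothetical names, and the real uploader's interface is not shown in this log.

```python
import threading


def upload_in_background(upload_fn, rows):
    """Run upload_fn(rows) on a worker thread and return the thread."""
    worker = threading.Thread(target=upload_fn, args=(rows,))
    worker.start()
    return worker


uploaded = []


def fake_upload(rows):
    # Hypothetical placeholder for a streaming insert into BigQuery.
    uploaded.extend(rows)


thread = upload_in_background(fake_upload, [{"name": "accuracy", "value": 0.9}])
thread.join()  # wait for the upload before reading results or exiting
print(len(uploaded))
```

The caller keeps the returned thread so it can `join()` before the process exits; otherwise pending uploads could be lost.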

Add official flag-parsing and benchmarking logging utils to Transformer (#4163) · a84e1ef9
Katherine Wu authored May 11, 2018

a84e1ef9

08 May, 2018 1 commit
- Forbid ResNet v1 from running with fp16 (#4207) · 4b8fe704
  Taylor Robie authored May 08, 2018
```
* forbid resnet v1 fp16

* address PR comments
```
  4b8fe704
03 May, 2018 3 commits

Add dataset info and hyper parameter logging for benchmark. (#4152) · eb0c0dfd

Qianli Scott Zhu authored May 03, 2018

* Add dataset info and hyper parameter logging for benchmark.

* Address review comments.

* Address the review comment for data schema name.

* Fix test cases.

* Lint fix.

eb0c0dfd

Restore ResNet Distribution Strategies (#4134) · 18d05ad3

Taylor Robie authored May 03, 2018

* Revert 823da318. This restores distribution strategies for resnet.

This commit is not a direct revert due to significant merge conflict
resolution.

* fix flags test

* npc is no longer used in resnet

18d05ad3

Move argparsing from builtin argparse to absl (#4099) · 5f9f6b84

Taylor Robie authored May 02, 2018

* squash of modular absl usage commits

* delint

* address PR comments

* change hooks to comma separated list, as absl behavior for space separated lists is not as expected

5f9f6b84

01 May, 2018 1 commit
- catch cpuinfo ImportError (#4138) · 5c78b9d7
  Taylor Robie authored May 01, 2018
```
* catch cpuinfo ImportError

* add psutil import catch

* fix typo
```
  5c78b9d7
26 Apr, 2018 1 commit
- Add export savedmodel to wide_deep (#4041) · db778817
  Katherine Wu authored Apr 26, 2018
  
  db778817
20 Apr, 2018 1 commit

Update the timestamp to default to UTC. (#4040) · 31f7f41b

Qianli Scott Zhu authored Apr 20, 2018

Previously the timestamp we pushed to bigquery was PDT, but the
timezone spec was set to 'Z' (UTC). This provides an incorrect
value to bigquery, and might affect analysis down the road.

31f7f41b
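
The mismatch described above (a local wall-clock time labelled with a `Z` suffix) can be avoided by converting to UTC before formatting. This is a minimal sketch, assuming a timezone-aware input; `utc_timestamp` and the exact format string are illustrative choices, not taken from the repository.

```python
import datetime

_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def utc_timestamp(now=None):
    """Format a time as an ISO-8601 string whose 'Z' suffix is truthful."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # Convert any aware datetime to UTC before attaching the 'Z' suffix.
    return now.astimezone(datetime.timezone.utc).strftime(_FORMAT)


# 10:00 at a fixed UTC-7 offset (PDT) is 17:00 UTC.
pdt = datetime.timezone(datetime.timedelta(hours=-7))
local = datetime.datetime(2018, 4, 20, 10, 0, 0, tzinfo=pdt)
print(utc_timestamp(local))  # 2018-04-20T17:00:00.000000Z
```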

19 Apr, 2018 2 commits

Benchmark update (#4034) · 21ec0e1b

Qianli Scott Zhu authored Apr 19, 2018

* Update the benchmark logger to have default logging.

1. Create a global instance of the benchmark logger, which logs to
tf.logging.info by default.
2. Allow the user to configure the logging location.
3. Fix nits in code and comment.

* Fix lint and test error.

* Address review comments.

* Remove the duplicated print statement.

21ec0e1b

Revert "Resnet distribution strategies (#3887)" (#4033) · 823da318
Taylor Robie authored Apr 19, 2018
```
This reverts commit 32aa6563.
```
823da318

12 Apr, 2018 1 commit

Resnet distribution strategies (#3887) · 32aa6563

Taylor Robie authored Apr 12, 2018

* begin transfer from contrib fork

more changes to resnet_run_loop

use AUTOTUNE in prefetch

first pass at resnet with functional distribution strategies

fix syntax error

delint

aesthetic tweaks

delint and fix typos

rip multi_gpu flag out of resnet entirely. Subject to saved model load verification

update cifar10 and imagenet tests to reflect that the model function no longer need to know about multi_gpu

fix imagenet test

start addressing PR comments

more PR response work

* misc tweaks

* add a comment

* final pr tweaks

* fix parsers

32aa6563

10 Apr, 2018 3 commits
- change reference_data.py to use tf.gfile (#3921) · 2661eb97
  Taylor Robie authored Apr 10, 2018
```
* change reference_data.py to use tf.gfile

* simplify json treatment

* Update reference files to account for a superficial change in batch_norm
```
  2661eb97
- Add copyright header for testing script. (#3943) · c93409cf
  Qianli Scott Zhu authored Apr 10, 2018
  
  c93409cf
- Adding stop threshold logic (#3863) · 310f70d5
  Karmel Allison authored Apr 10, 2018
```
* Adding tests

* Adding tests

* Repackaging

* Adding logging

* Linting
```
  310f70d5
09 Apr, 2018 1 commit

Add fp16 support to official ResNet. (#3687) · fbb27cf3

Taylor Robie authored Apr 09, 2018

* Add fp16 support to resnet.

* address PR comments

* add dtype checking to model definition

* delint

* more PR comments

* few more tweaks

* update resnet checkpoints

fbb27cf3

03 Apr, 2018 2 commits
- Rename logging directory (#3860) · a0e3604f
  Karmel Allison authored Apr 03, 2018
```
* Updating name of logging package to avoid overwriting Python builtin logging.

* Updating name of logging package to avoid overwriting Python builtin logging.
```
  a0e3604f
- Fix the testing script exit code for python test. (#3858) · c3b26603
  Qianli Scott Zhu authored Apr 03, 2018
  
  c3b26603
02 Apr, 2018 1 commit

Add testing script for local lint and python test. (#3797) · 03bf0d38

Qianli Scott Zhu authored Apr 02, 2018

* Add presubmit testing script for local testing.

* Update the test script to be more modularized.

1. Check the script file location and cd into repo root dir.
2. Allow caller to call different tests.

03bf0d38

29 Mar, 2018 1 commit
- Add End-to-end tests for wide deep, and fix "wide" and "deep" configurations. (#3798) · 9cc7eac1
  Taylor Robie authored Mar 29, 2018
```
* add end-to-end tests for wide_deep

delint

* address PR comments
```
  9cc7eac1
28 Mar, 2018 2 commits

Add SavedModel export to Resnet (#3759) · eb73a850

Karmel Allison authored Mar 28, 2018

* Adding export_dir and model saving for Resnet

* Moving to utils for tests

* Adding batch_size

* Adding multi-gpu export warning

* Responding to CR

* Py3 compliance

eb73a850

Add benchmark upload util to Bigquery. (#3776) · 932364b6

Qianli Scott Zhu authored Mar 28, 2018

* Add benchmark upload util to bigquery.

Also update the benchmark logger and bigquery schema for the
errors found during the integration test.

* Fix lint error.

* Update test to clear all the env vars during test.

This was causing error since the Kokoro test has TF_PKG=tf-nightly
injected during test.

* Update lintrc to ignore google related package.

* Another attempt to fix lint import error.

* Address the review comment.

* Fix lint error.

* Another fix for lint.

* Update test comment for env var clean up.

932364b6
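
The environment-variable cleanup mentioned in this commit guards tests against values injected by the CI system. One way to get that isolation with the standard library is `unittest.mock.patch.dict` with `clear=True`; the sketch below is an assumption about the general technique, not the repository's test code, and `EnvIsolationTest` is a hypothetical name.

```python
import os
import unittest
from unittest import mock


class EnvIsolationTest(unittest.TestCase):
    """Runs each test with an empty environment, restored afterwards."""

    def setUp(self):
        super().setUp()
        patcher = mock.patch.dict(os.environ, {}, clear=True)
        patcher.start()
        self.addCleanup(patcher.stop)

    def test_injected_variable_is_hidden(self):
        self.assertNotIn("TF_PKG", os.environ)


# Simulate a CI system injecting a variable, then run the test case.
os.environ["TF_PKG"] = "tf-nightly"
suite = unittest.defaultTestLoader.loadTestsFromTestCase(EnvIsolationTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Because the patch is undone in a cleanup, the original environment (including the injected variable) is restored once each test finishes.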

27 Mar, 2018 3 commits

Update the importing logic for cpuinfo and psutil. (#3781) · 03781c74

Qianli Scott Zhu authored Mar 27, 2018

* Update the importing logic for cpuinfo and psutil.

Those two libs are usually not installed by default, and we should
not force people to install them if they just want to run resnet.

* Add pylint warning suppression.

03781c74
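
Treating a dependency as optional usually comes down to catching `ImportError` and degrading gracefully. A minimal sketch of that idea follows; `optional_import` and `gather_run_info` are hypothetical helpers, and only `psutil` calls that exist in the library (`cpu_count`, `virtual_memory`) are used. The same guard would apply to `cpuinfo`.

```python
import importlib


def optional_import(name):
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def gather_run_info():
    """Collect machine details only when the optional package is present."""
    info = {}
    psutil = optional_import("psutil")
    if psutil is not None:
        info["num_cores"] = psutil.cpu_count()
        info["memory_total"] = psutil.virtual_memory().total
    return info


print(gather_run_info())  # an empty dict when psutil is missing
```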

Add reference data tests to official. (#3723) · 587f5792

Taylor Robie authored Mar 27, 2018

* Add golden test util to streamline symbolic and numerical comparison to reference graphs, and apply golden tests to ResNet.

update tests

use more concise logic for path property

delint

add some comments

delint

address PR comments

make resnet tests more concise, and suppress warning test in py2

change resnet name template

more shuffling of data dirs

address PR comments and add tensorflow version info

Remove subTest due to py2

switch from tf.__version__ to tf.VERSION, and include tf.GIT_VERSION

suppress lint error from json load unpack

* address PR comments

* address PR comments

* delint

587f5792

Add requirements.txt to official. (#3760) · 86cb0aa3

Taylor Robie authored Mar 27, 2018

* add requirements.txt now that there are dependencies beyond tensorflow

* direct pip info to README

86cb0aa3

26 Mar, 2018 1 commit

Benchmark run info logging (#3708) · d3952b2c

Qianli Scott Zhu authored Mar 26, 2018

* Init test for logging benchmark run.

* Fix collect CPU info.

* Update max split for handling GPU information.

* Another fix for parse GPU info.

* Fix GPU and CPU info collector.

* Update logging function to be static.

* Remove the cifar10 logging and fix a lint error.

* Address the review comment.

* Fix lint error.

* Fix lint error for logger and logger_test.

* Another lint fix for the test.

* Simplify the CPU info logging.

We will start in a conservative way, and probably add more info in
the future.

* Remove unused dependencies.

d3952b2c

23 Mar, 2018 2 commits

Fix random order problem in benchmark logging. (#3725) · 08af7775
Qianli Scott Zhu authored Mar 23, 2018

08af7775

Resnet benchmark logging (#3704) · b9b44f7b

Qianli Scott Zhu authored Mar 23, 2018

* Update resnet model for benchmark logging.

To enable benchmark logging, just add "--hooks LoggingMetricHook"

* Benchmark logger fix for resnet.

1. Update default at_end to False for metric logger to avoid
checkpoint error.
2. Update resnet run to log final evaluation result.

* Update log output for final eval_result.

* Typo fix.

* Unset the default value for benchmark_log_dir.

Usually the benchmark should be logged to a different directory for
each run. Having a default value will hide the choice from the user.

* Bug fix for benchmark logger initialization.

* Fix lint error.

* Address the review comment.

1. Update the logger to cover evaluation result.
2. Move the flag to performance parser.

* Undo the change for arg_parser.

b9b44f7b

21 Mar, 2018 2 commits

Add session hook for benchmark metric logging. (#3672) · 4b85dab1

Qianli Scott Zhu authored Mar 21, 2018

* Add session hook for benchmark metric logging.

The current hook is very similar to the LoggingTensorHook. Some of the
functions are directly copied since the original ones were not
exposed for import. We should seek to eventually move this code to
core when it is mature enough.

* Update metric_hook to use LoggingTensorHook as base.

The existing hook is similar enough to LoggingTensorHook, and
we should eliminate duplication as much as possible.

* Address review comment.

1. Update global step tensor handle.
2. Update tests.
3. Update document.

* Update tests for py3.

* Fix lint error

4b85dab1

Fixing linting for kokoro (#3676) · bea947de
Karmel Allison authored Mar 20, 2018

bea947de

20 Mar, 2018 2 commits

Glint everything (#3654) · 7cfb6bbd

Karmel Allison authored Mar 20, 2018

* Glint everything

* Adding rcfile and pylinting

* Extra newline

* Few last lints

7cfb6bbd

Use util functions hooks_helper and parser in mnist and wide_deep, and rename... · adfd5a3a
Katherine Wu authored Mar 20, 2018
```
Use util functions hooks_helper and parser in mnist and wide_deep, and rename epochs_between_eval (from epochs_per_eval) (#3650)
```
adfd5a3a

19 Mar, 2018 2 commits
- Move the official/benchmark files to official/utils/logging. · 0308e7e1
  Scott Zhu authored Mar 19, 2018
  
  0308e7e1
- Improve directory treatment in ResNet end-to-end tests. (#3651) · f85ab4c8
  Taylor Robie authored Mar 19, 2018
```
* use proper temp directory for end to end tests.

* add supers to tearDown
```
  f85ab4c8
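
The two bullets in this last commit (a proper temp directory, and calling `super()` in `tearDown`) correspond to a common `unittest` pattern. The sketch below illustrates it under assumptions: `EndToEndTest` and the file name are hypothetical and not taken from the ResNet tests.

```python
import os
import shutil
import tempfile
import unittest


class EndToEndTest(unittest.TestCase):
    """Gives each test a private scratch directory."""

    def setUp(self):
        super().setUp()
        self.temp_dir = tempfile.mkdtemp()

    def tearDown(self):
        shutil.rmtree(self.temp_dir, ignore_errors=True)
        super().tearDown()  # let base classes clean up as well

    def test_writes_into_temp_dir(self):
        path = os.path.join(self.temp_dir, "model.ckpt")
        with open(path, "w") as f:
            f.write("placeholder")
        self.assertTrue(os.path.exists(path))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(EndToEndTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```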