- 04 Jun, 2018 1 commit
Taylor Robie authored
* port changes from previous branch now that transformer util changes are in master; fix incorrect count; correct (hopefully) treatment of batch_size; set eval_metrics to a dummy function for now; add some comments; start bringing metrics to transformer TPU; resolve logits shape; metrics are now working except for tf.py_func metrics; increase batch_size for tpu, and create summary host call; fix host call; reduce tpu default batch size; further tune batch sizes; add minibatch loss to summary; handle case of single_iteration_train_steps > number of points in an epoch; begin to incorporate hooks; add sleep workarounds; disable hooks altogether; generalize host call function and move to newly created tpu utils module; remove all traces of params as an object; switch from to; address some PR comments, and change the number of data points; minor tweaks; add tpu dry run for testing, and use matmul for TPU embedding; infeed/outfeed queue issue is fixed, so sleeps are no longer necessary; add some documentation; cleanup and address PR comments; delint; add accelerator __init__; fix embedding; missed PR comment; address PR comments; fix validator bug; rewrite cloud storage validator, and add oauth dependency to requirements.txt * delint
-
- 01 Jun, 2018 3 commits
Yanhui Liang authored
-
Qianli Scott Zhu authored
* Add new test ID and test env info to the benchmark run. * Fix test. * Fix lint * Address review comment.
-
Qianli Scott Zhu authored
* Update benchmark logger to update the run status. This is important for streaming upload to bigquery so that the dashboard can ignore benchmarks that are still 'running', since they have not finished yet. * Move the run status into a separate table. Also update the run status in the benchmark uploader and BigqueryBenchmarkLogger. * Insert instead of update for the benchmark status for file logger. * Address review comments. Update the logger to have benchmark context, which will update the run status accordingly. * Fix broken tests. * Move the benchmark logger context to main function. * Fix tests. * Update the rest of the models to use the context in main. * Delint.
-
- 30 May, 2018 1 commit
Yanhui Liang authored
* Fix hooks_test * Add more comments * Fix lints
-
- 25 May, 2018 1 commit
Karmel Allison authored
* Using BenchmarkLogger * Using BenchmarkLogger * Fixing tests * Linting fixes. * Adding comments * Moving mock logger * Moving mock logger * Glinting * Responding to CR * Reverting assertEmpty
-
- 11 May, 2018 2 commits
Qianli Scott Zhu authored
* Move the benchmark_uploader to new location. * Update benchmark logger to streaming upload. * Fix lint and unit test error. * delint. * Update the benchmark uploader test. Skip the import of benchmark_uploader when bigquery is not installed. * Merge the 2 classes of benchmark uploader into 1. * Address review comments. * delint. * Execute bigquery upload in a separate thread. * Change to use python six.moves for importing. * Address review comments and delint. * Address review comment. Adding comment for potential performance impact for model on CPU. * Fix random failure on py3. * Fix the order of flag saver to avoid the randomness. The test is broken when the benchmark_logger_type is set first, and validated when the benchmark_log_dir is not set yet.
-
Katherine Wu authored
-
- 08 May, 2018 1 commit
Taylor Robie authored
* forbid resnet v1 fp16 * address PR comments
-
- 03 May, 2018 3 commits
Qianli Scott Zhu authored
* Add dataset info and hyper parameter logging for benchmark. * Address review comments. * Address the review comment for data schema name. * Fix test cases. * Lint fix.
-
Taylor Robie authored
* Revert 823da318. This restores distribution strategies for resnet. This commit is not a direct revert due to significant merge conflict resolution. * fix flags test * npc is no longer used in resnet
-
Taylor Robie authored
* squash of modular absl usage commits * delint * address PR comments * change hooks to comma separated list, as absl behavior for space separated lists is not as expected
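The last bullet switches `--hooks` to a comma-separated list. A minimal sketch of that parsing in plain Python, assuming the flag value arrives as a single string; `parse_hook_names` is a hypothetical helper (absl's `flags.DEFINE_list` does comparable comma splitting), and the hook names are illustrative only.

```python
def parse_hook_names(flag_value):
    """Split a comma-separated --hooks value into a list of hook names.

    Whitespace around each name is stripped and empty entries are dropped,
    so " A , B ,," yields ["A", "B"].
    """
    if not flag_value:
        return []
    return [name.strip() for name in flag_value.split(",") if name.strip()]


if __name__ == "__main__":
    # Illustrative hook names; the accepted set depends on the model's hooks helper.
    print(parse_hook_names("LoggingTensorHook,ExamplesPerSecondHook"))
    # A space-separated value is NOT split: it stays a single entry.
    print(parse_hook_names("LoggingTensorHook ExamplesPerSecondHook"))
```

Splitting only on commas keeps the behavior predictable: a space-separated value shows up as one unrecognized name instead of being silently reinterpreted.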
-
- 01 May, 2018 1 commit
Taylor Robie authored
* catch cpuinfo ImportError * add psutil import catch * fix typo
-
- 26 Apr, 2018 1 commit
Katherine Wu authored
-
- 20 Apr, 2018 1 commit
Qianli Scott Zhu authored
Previously the timestamp we pushed to bigquery was in PDT, but the timezone spec was set to 'Z' (UTC). This provided an incorrect value to bigquery, and might affect analysis down the road.
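A minimal Python sketch of the fix, assuming the logger serializes timestamps as ISO-8601 strings with a trailing 'Z'; `to_utc_timestamp` and the exact format string are illustrative, not the benchmark logger's actual code.

```python
import datetime


def to_utc_timestamp(dt):
    """Format an aware datetime as an ISO-8601 string ending in 'Z'.

    The 'Z' suffix claims the value is UTC, so the datetime is converted to
    UTC first; appending 'Z' to a local (e.g. PDT) wall-clock time is the bug
    described above.
    """
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


if __name__ == "__main__":
    pdt = datetime.timezone(datetime.timedelta(hours=-7), "PDT")
    local = datetime.datetime(2018, 4, 20, 10, 0, 0, tzinfo=pdt)
    print(to_utc_timestamp(local))  # 2018-04-20T17:00:00.000000Z
```

For the current time, `datetime.datetime.now(datetime.timezone.utc)` yields an aware UTC value directly, which avoids the local-time mix-up altogether.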
-
- 19 Apr, 2018 2 commits
Qianli Scott Zhu authored
* Update the benchmark logger to have default logging. 1. Create a global instance of the benchmark logger, which logs to tf.logging.info by default. 2. Allow the user to configure the logging location. 3. Fix nits in code and comments. * Fix lint and test error. * Address review comments. * Remove the duplicated print statement.
-
Taylor Robie authored
This reverts commit 32aa6563.
-
- 12 Apr, 2018 1 commit
Taylor Robie authored
* begin transfer from contrib fork; more changes to resnet_run_loop; use AUTOTUNE in prefetch; first pass at resnet with functional distribution strategies; fix syntax error; delint; aesthetic tweaks; delint and fix typos; rip multi_gpu flag out of resnet entirely (subject to saved model load verification); update cifar10 and imagenet tests to reflect that the model function no longer needs to know about multi_gpu; fix imagenet test; start addressing PR comments; more PR response work * misc tweaks * add a comment * final pr tweaks * fix parsers
-
- 10 Apr, 2018 3 commits
Taylor Robie authored
* change reference_data.py to use tf.gfile * simplify json treatment * Update reference files to account for a superficial change in batch_norm
-
Qianli Scott Zhu authored
-
Karmel Allison authored
* Adding tests * Adding tests * Repackaging * Adding logging * Linting
-
- 09 Apr, 2018 1 commit
Taylor Robie authored
* Add fp16 support to resnet. * address PR comments * add dtype checking to model definition * delint * more PR comments * few more tweaks * update resnet checkpoints
-
- 03 Apr, 2018 2 commits
Karmel Allison authored
* Updating name of logging package to avoid overwriting Python builtin logging. * Updating name of logging package to avoid overwriting Python builtin logging.
-
Qianli Scott Zhu authored
-
- 02 Apr, 2018 1 commit
Qianli Scott Zhu authored
* Add presubmit testing script for local testing. * Update the test script to be more modularized. 1. Check the script file location and cd into repo root dir. 2. Allow caller to call different tests.
-
- 29 Mar, 2018 1 commit
Taylor Robie authored
* add end-to-end tests for wide_deep; delint * address PR comments
-
- 28 Mar, 2018 2 commits
Karmel Allison authored
* Adding export_dir and model saving for Resnet * Moving to utils for tests * Adding batch_size * Adding multi-gpu export warning * Responding to CR * Py3 compliance
-
Qianli Scott Zhu authored
* Add benchmark upload util to bigquery. Also update the benchmark logger and bigquery schema for the errors found during the integration test. * Fix lint error. * Update test to clear all the env vars during test. This was causing errors since the Kokoro test has TF_PKG=tf-nightly injected during testing. * Update lintrc to ignore google related package. * Another attempt to fix lint import error. * Address the review comment. * Fix lint error. * Another fix for lint. * Update test comment for env var clean up.
-
- 27 Mar, 2018 3 commits
Qianli Scott Zhu authored
* Update the importing logic for cpuinfo and psutil. Those two libs are usually not installed by default, and we should not force people to install them if they just want to run resnet. * Add pylint warning suppression.
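The optional-import idea can be sketched as follows, assuming system info is collected best-effort; `try_import` and `collect_system_info` are hypothetical helpers, not the benchmark logger's actual functions, and only `psutil` is probed to keep the example short.

```python
import importlib
import os


def try_import(module_name):
    """Return the named module if it is installed, otherwise None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def collect_system_info():
    """Collect host info, using psutil only when it happens to be installed."""
    info = {"cpu_count": os.cpu_count()}
    psutil = try_import("psutil")
    if psutil is not None:
        # Optional extra: total physical memory in bytes.
        info["memory_total"] = psutil.virtual_memory().total
    return info


if __name__ == "__main__":
    print(collect_system_info())
```

Catching `ImportError` (which also covers `ModuleNotFoundError`) lets a plain resnet run proceed on machines without these extra packages, at the cost of less detailed benchmark metadata.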
-
Taylor Robie authored
* Add golden test util to streamline symbolic and numerical comparison to reference graphs, and apply golden tests to ResNet; update tests; use more concise logic for path property; delint; add some comments; delint; address PR comments; make resnet tests more concise, and suppress warning test in py2; change resnet name template; more shuffling of data dirs; address PR comments and add tensorflow version info; remove subTest due to py2; switch from tf.__version__ to tf.VERSION, and include tf.GIT_VERSION; suppress lint error from json load unpack * address PR comments * address PR comments * delint
-
Taylor Robie authored
* add requirements.txt now that there are dependencies beyond tensorflow * direct pip info to README
-
- 26 Mar, 2018 1 commit
Qianli Scott Zhu authored
* Init test for logging benchmark run. * Fix collect CPU info. * Update max split for handling GPU information. * Another fix for parsing GPU info. * Fix GPU and CPU info collector. * Update logging function to be static. * Remove the cifar10 logging and fix a lint error. * Address the review comment. * Fix lint error. * Fix lint error for logger and logger_test. * Another lint fix for the test. * Simplify the CPU info logging. We will start in a conservative way, and probably add more info in the future. * Remove unused dependencies.
-
- 23 Mar, 2018 2 commits
Qianli Scott Zhu authored
-
Qianli Scott Zhu authored
* Update resnet model for benchmark logging. To enable benchmark logging, just add "--hooks LoggingMetricHook" * Benchmark logger fix for resnet. 1. Update default at_end to False for metric logger to avoid checkpoint error. 2. Update resnet run to log final evaluation result. * Update log output for final eval_result. * Typo fix. * Unset the default value for benchmark_log_dir. Usually the benchmark should be logged to a different directory for each run. Having a default value will hide the choice from the user. * Bug fix for benchmark logger initialization. * Fix lint error. * Address the review comment. 1. Update the logger to cover evaluation result. 2. Move the flag to performance parser. * Undo the change for arg_parser.
-
- 21 Mar, 2018 2 commits
Qianli Scott Zhu authored
* Add session hook for benchmark metric logging. The current hook is very similar to LoggingTensorHook. Some of the functions are copied directly since the originals were not exposed for import. We should seek to eventually move this code to core when it is mature enough. * Update metric_hook to use LoggingTensorHook as base. The existing hook is similar enough to LoggingTensorHook, and we should eliminate duplication as much as possible. * Address review comment. 1. Update global step tensor handle. 2. Update tests. 3. Update document. * Update tests for py3. * Fix lint error
-
Karmel Allison authored
-
- 20 Mar, 2018 2 commits
Karmel Allison authored
* Glint everything * Adding rcfile and pylinting * Extra newline * Few last lints
-
Katherine Wu authored
Use util functions hooks_helper and parser in mnist and wide_deep, and rename epochs_between_eval (from epochs_per_eval) (#3650)
-
- 19 Mar, 2018 2 commits
Scott Zhu authored
-
Taylor Robie authored
* use proper temp directory for end to end tests. * add supers to tearDown
-