- 02 Sep, 2018 1 commit
-
-
Toby Boyd authored
-
- 30 Aug, 2018 1 commit
-
-
Aman Gupta authored
Bypassing the Export model step if training on TPUs, as this needs inference to be supported on TPUs. Remove this check once inference is supported. (#5209)
-
- 29 Aug, 2018 1 commit
-
-
Yanhui Liang authored
* Add distribution strategy to keras benchmark * Fix comments * Fix lints
-
- 28 Aug, 2018 2 commits
-
-
Jaeman authored
* Fix bug in distributed training on mnist using the MirroredStrategy API * Remove unnecessary code and change the distribution strategy source - Remove multi-GPU - Remove TowerOptimizer - Change from MirroredStrategy to distribution_utils.get_distribution_strategy
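A minimal sketch of what switching to the shared helper looks like, assuming the `distribution_utils` module under `official/utils/misc` and illustrative argument values:

```python
import tensorflow as tf

from official.utils.misc import distribution_utils

# Build the strategy through the shared helper rather than hand-rolling
# MirroredStrategy / TowerOptimizer logic inside the model (num_gpus is an
# illustrative value here).
distribution = distribution_utils.get_distribution_strategy(num_gpus=2)

# Hand the strategy to the Estimator via its RunConfig.
run_config = tf.estimator.RunConfig(train_distribute=distribution)
```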
-
Josh Gordon authored
-
- 27 Aug, 2018 2 commits
-
-
Taylor Robie authored
* Make ResNet robust to the case that epochs_between_evals does not divide train_epochs, and add an --eval_only option * add some comments to make the control flow easier to follow * address PR comments
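One way to make the schedule robust, sketched in plain Python with hypothetical names; the final cycle simply runs the leftover epochs instead of being dropped:

```python
def get_epoch_schedule(train_epochs, epochs_between_evals):
    """Return the number of training epochs to run before each evaluation."""
    schedule = [epochs_between_evals] * (train_epochs // epochs_between_evals)
    remainder = train_epochs % epochs_between_evals
    if remainder:
        schedule.append(remainder)  # shorter final train/eval cycle
    return schedule


assert get_epoch_schedule(10, 4) == [4, 4, 2]
assert get_epoch_schedule(8, 4) == [4, 4]
```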
-
Toby Boyd authored
* Add 5-epoch warmup * get_lr with warm_up only for ImageNet * Add base_lr, remove fp16 unittest arg validation * Remove validation check stopping v1 and FP16
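A rough sketch of the warmup idea with hypothetical names (`base_lr`, `warmup_epochs`, boundary epochs): ramp the learning rate linearly over the first few epochs before the usual step decay applies.

```python
def get_lr(epoch, base_lr, warmup_epochs=5, boundaries=(30, 60, 80)):
    """Linear warmup for the first epochs, then a simple step decay."""
    if epoch < warmup_epochs:
        # Ramp up linearly so large-batch training does not diverge early.
        return base_lr * float(epoch + 1) / warmup_epochs
    return base_lr * 0.1 ** sum(epoch >= b for b in boundaries)


print([round(get_lr(e, base_lr=0.1), 3) for e in range(7)])
# [0.02, 0.04, 0.06, 0.08, 0.1, 0.1, 0.1]
```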
-
- 25 Aug, 2018 1 commit
-
-
Toby Boyd authored
* Add top_5 to eval. * Change labels shape from [?,1] to [?] to match the unittest.
-
- 22 Aug, 2018 1 commit
-
-
Reed authored
* Fix convergence issues for MLPerf. Thank you to @robieta for helping me find these issues, and for providing an algorithm for the `get_hit_rate_and_ndcg_mlperf` function. This change causes every forked process to set a new seed, so that forked processes do not generate the same set of random numbers. This improves evaluation hit rates. Additionally, it adds a flag, --ml_perf, that makes further changes so that the evaluation hit rate can match the MLPerf reference implementation. I ran 4 times with --ml_perf and 4 times without. Without --ml_perf, the highest hit rates achieved by each run were 0.6278, 0.6287, 0.6289, and 0.6241. With --ml_perf, the highest hit rates were 0.6353, 0.6356, 0.6367, and 0.6353. * fix lint error * Fix failing test * Address @robieta's feedback * Address more feedback
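The per-process reseeding idea in miniature, with a hypothetical worker pool; the real NCF pipeline seeds its workers in its own way:

```python
import multiprocessing
import os

import numpy as np


def _init_worker():
    # Give every forked worker its own seed; otherwise forks inherit the
    # parent's RNG state and all draw the same "random" negatives.
    np.random.seed(int.from_bytes(os.urandom(4), "little"))


def _sample_negatives(num_items):
    return np.random.randint(0, num_items, size=4).tolist()


if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4, initializer=_init_worker)
    print(pool.map(_sample_negatives, [1000] * 4))  # differs across workers
```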
-
- 20 Aug, 2018 1 commit
-
-
Taylor Robie authored
* perform a codecs check and remove unicode \ufeff if utf-8 is not present * delint
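A hedged sketch of the check with a hypothetical helper name: look for the UTF-8 BOM in the raw bytes and drop it before decoding.

```python
import codecs


def read_text(path):
    """Read a file as text, stripping a leading UTF-8 BOM (\\ufeff) if present."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8")
```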
-
- 18 Aug, 2018 1 commit
-
-
Reed authored
This is done by using a higher Pickle protocol version, which the Python docs describe as being "slightly more efficient". This reduces the file write time at the beginning from 2 1/2 minutes to 5 seconds.
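The change boils down to passing an explicit protocol to `pickle.dump`; a minimal illustration with a hypothetical payload and file name:

```python
import pickle

data = {"user_ids": list(range(1000000))}

with open("cache.pkl", "wb") as f:
    # Binary protocols (2 and up) serialize large containers far faster than
    # the old ASCII default; HIGHEST_PROTOCOL picks the best available one.
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
```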
-
- 16 Aug, 2018 2 commits
-
-
Jules Gagnon-Marchand authored
* Deterministic dataset order fix: for the file order to be deterministic, shuffle needs to be False in `tf.data.Dataset.list_files(..., shuffle)`; otherwise different iterator initializations will yield different file orders * removed unnecessary shuffle of filenames * Removed the `_FILE_SHUFFLE_BUFFER` definition
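A minimal illustration of the deterministic listing (the file pattern is hypothetical):

```python
import tensorflow as tf

# shuffle=False keeps the file order stable across iterator re-initializations;
# any shuffling that is actually wanted can be applied explicitly afterwards.
files = tf.data.Dataset.list_files("data/train-*-of-*", shuffle=False)
```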
-
Taylor Robie authored
-
- 15 Aug, 2018 1 commit
-
-
Wei Wang authored
-
- 14 Aug, 2018 2 commits
-
-
alope107 authored
* Fix Transformer TPU crash in Python 2.X. - TensorFlow raises an error when tf_inspect.getfullargspec is called on a functools.partial object in Python 2.X. This issue would be hit during the eval stage of the Transformer TPU model. This change replaces the call to functools.partial with a lambda to work around the issue. * Remove unused import from transformer_main. * Fix lint error.
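The workaround in miniature, with hypothetical function and parameter names; the point is only that a lambda survives argspec inspection under Python 2 where a `functools.partial` object does not:

```python
def model_fn(features, labels, mode, params):
    return features, labels, mode, params


params = {"hidden_size": 512}

# A functools.partial(model_fn, params=params) here makes
# tf_inspect.getfullargspec raise under Python 2 during TPU eval,
# so bind the extra argument with a lambda instead.
wrapped_model_fn = lambda features, labels, mode: model_fn(
    features, labels, mode, params)

print(wrapped_model_fn("f", "l", "eval"))
```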
-
Zac Wellmer authored
* warm start a ResNet with all but the dense layer and only update the final layer weights when fine-tuning * Update README for Transfer Learning * make lint happy and fix a variable naming error related to scaled gradients * edit the test cases for cifar10 and imagenet to reflect the default case of no fine-tuning
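A sketch of the warm-start configuration with a hypothetical checkpoint path; the negative-lookahead regex skips every variable whose name contains "dense", so only the final layer trains from scratch:

```python
import tensorflow as tf

# Initialize everything except the final dense layer from a pretrained
# checkpoint (path is hypothetical), then fine-tune only that layer.
warm_start_settings = tf.estimator.WarmStartSettings(
    ckpt_to_initialize_from="/tmp/resnet50_pretrained/model.ckpt",
    vars_to_warm_start="^(?!.*dense)")
```

The settings object is then passed to the Estimator through its `warm_start_from` argument.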
-
- 13 Aug, 2018 1 commit
-
-
kangtop729 authored
Fix a typo.
-
- 10 Aug, 2018 1 commit
-
-
Yanhui Liang authored
-
- 02 Aug, 2018 2 commits
-
-
Reed authored
-
Reed authored
The data_async_generation.py process would print to stderr, but the main process redirected that stderr to a pipe. The main process never read from the pipe, so once the pipe was full, data_async_generation.py would stall on a write to stderr. This change makes data_async_generation.py not write to stdout/stderr.
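The failure mode and one fix, sketched with a hypothetical log-file name: a child whose output goes to a pipe nobody reads blocks once the pipe buffer fills, so send the output somewhere that cannot fill up.

```python
import subprocess

# Before: stdout/stderr went to subprocess.PIPE with no reader, so the child
# eventually blocked on a write. After: route its output to a log file
# (or os.devnull) that can never back up.
with open("async_generation.log", "wb") as log_file:
    proc = subprocess.Popen(
        ["python", "data_async_generation.py"],
        stdout=log_file,
        stderr=subprocess.STDOUT)
```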
-
- 01 Aug, 2018 1 commit
-
-
Reed authored
The output of an embedding layer is already flattened, so the Flatten layers acted as no-ops.
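A quick shape check illustrating the point, assuming one scalar id per example (the sizes are illustrative):

```python
import tensorflow as tf

ids = tf.keras.Input(shape=(), dtype="int32")  # one id per example
emb = tf.keras.layers.Embedding(input_dim=1000, output_dim=16)(ids)
flat = tf.keras.layers.Flatten()(emb)

print(emb.shape)   # (None, 16)
print(flat.shape)  # (None, 16) -- Flatten changes nothing here
```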
-
- 31 Jul, 2018 8 commits
-
-
Taylor Robie authored
-
Reed authored
* Fix crash when Python interpreter not on PATH. * Fix lint error.
-
Reed authored
-
Reed authored
-
Taylor Robie authored
* add indirection file * remove unused imports * fix import
-
Reed authored
-
Reed authored
-
Reed authored
-
- 30 Jul, 2018 2 commits
-
-
Taylor Robie authored
* intermediate commit * ncf now working * reorder pipeline * allow batched decode for file backed dataset * fix bug * more tweaks * parallelize false negative generation * shared pool hack * workers ignore sigint * intermediate commit * simplify buffer backed dataset creation to fixed length record approach only. (more cleanup needed) * more tweaks * simplify pipeline * fix misplaced cleanup() calls. (validation works!) * more tweaks * sixify memoryview usage * more sixification * fix bug * add future imports * break up training input pipeline * more pipeline tuning * first pass at moving negative generation to async * refactor async pipeline to use files instead of ipc * refactor async pipeline * move expansion and concatenation from reduce worker to generation workers * abandon complete async due to interactions with the tensorflow threadpool * cleanup * remove performance_comparison.py * experiment with rough generator + interleave pipeline * yet more pipeline tuning * update on-the-fly pipeline * refactor preprocessing, and move train generation behind a GRPC server * fix leftover call * intermediate commit * intermediate commit * fix index error in data pipeline, and add logging to train data server * make sharding more robust to imbalance * correctly sample with replacement * file buffers are no longer needed for this branch * tweak sampling methods * add README for data pipeline * fix eval sampling, and vectorize eval metrics * add spillover and static training batch sizes * clean up cruft from earlier iterations * rough delint * delint 2 / n * add type annotations * update run script * make run.sh a bit nicer * change embedding initializer to match reference * rough pass at pure estimator model_fn * impose static shape hack (revisit later) * refinements * fix dir error in run.sh * add documentation * add more docs and fix an assert * old data test is no longer valid. Keeping it around as reference for the new one * rough draft of data pipeline validation script * don't rely on shuffle default * tweaks and documentation * add separate eval batch size for performance * initial commit * terrible hacking * mini hacks * missed a bug * messing about trying to get TPU running * TFRecords based TPU attempt * bug fixes * don't log remotely * more bug fixes * TPU tweaks and bug fixes * more tweaks * more adjustments * rework model definition * tweak data pipeline * refactor async TFRecords generation * temp commit to run.sh * update log behavior * fix logging bug * add check for subprocess start to avoid cryptic hangs * unify deserialize and make it TPU compliant * delint * remove gRPC pipeline code * fix logging bug * delint and remove old test files * add unit tests for NCF pipeline * delint * clean up run.sh, and add run_tpu.sh * forgot the most important line * fix run.sh bugs * yet more bash debugging * small tweak to add keras summaries to model_fn * Clean up sixification issues * address PR comments * delinting is never over
-
Sundara Tejaswi Digumarti authored
Removed the conditional over distributed strategies when computing metrics. Metrics are now computed even when distributed strategies are used.
-
- 26 Jul, 2018 1 commit
-
-
Jiang Yu authored
* fix batch_size in transformer_main.py, which causes ResourceExhaustedError: OOM during training of Transformer models using models/official/transformer * small format change: change from one line to multiple lines in order to pass lint tests * remove trailing space and add comment
-
- 21 Jul, 2018 1 commit
-
-
Igor Ganichev authored
float32 should be fine for MNIST loss and accuracy metrics, and float64 is not available on TPUs.
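In practice this just means keeping the metric tensors in float32; a hedged sketch with made-up logits and labels:

```python
import tensorflow as tf

logits = tf.constant([[2.0, 0.5], [0.1, 1.5]])  # float32 by default
labels = tf.constant([0, 1])

# Stay in float32: float64 kernels are not available on TPUs, and the extra
# precision buys nothing for a loss/accuracy readout.
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
predictions = tf.argmax(logits, axis=1, output_type=tf.int32)
accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, labels), tf.float32))
```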
-
- 20 Jul, 2018 1 commit
-
-
Yanhui Liang authored
* Add more arguments * Add eager mode * Add notes for eager mode * Address the comments * Fix argument typos * Add warning for eager and multi-gpu * Fix typo * Fix notes * Fix pylint
-
- 19 Jul, 2018 1 commit
-
-
Asim Shankar authored
-
- 13 Jul, 2018 2 commits
-
-
Yanhui Liang authored
* Add callbacks * Add readme * update readme * fix some comments * Address all comments * Update docstrings * Add method docstrings * Update callbacks * Add comments on global_step initialization * Some updates * Address comments
-
Qianli Scott Zhu authored
* Add shorter timeout for GCP util. * Add comment for change reason and unit for timeout.
-
- 12 Jul, 2018 1 commit
-
-
Taylor Robie authored
-
- 11 Jul, 2018 1 commit
-
-
cclauss authored
* Use six and feature detection in string conversion Leverage [`six.ensure_text()`](https://github.com/benjaminp/six/blob/master/six.py#L890) to deliver Unicode text in both Python 2 and Python 3. Follow the Python porting best practice [use feature detection instead of version detection](https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection) in `_unicode_to_native()`. * Revert the use of six.ensure_text() Thanks for catching that! I jumped the gun. It is I who have brought shame...
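A hedged sketch of what feature detection (rather than a `sys.version_info` check) looks like for this kind of helper; the helper name mirrors the one mentioned above, but the body is illustrative:

```python
# Detect the text type by feature, not by Python version.
try:
    text_type = unicode          # exists on Python 2
except NameError:
    text_type = str              # Python 3


def unicode_to_native(s):
    """Return `s` as the native `str` type of the running interpreter."""
    if text_type is str:         # Python 3: unicode text is already native str
        return s
    return s.encode("utf-8")     # Python 2: encode unicode to the native byte str
```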
-
- 10 Jul, 2018 1 commit
-
-
Eliel Hojman authored
* Rename files in README.md The names of the files to run specified in the README.md file were not updated. * Missed one file name change
-