- 02 Oct, 2018 1 commit
-
-
Reed authored
-
- 01 Oct, 2018 2 commits
-
-
Aman Gupta authored
Some changes specific to prediction: remove traces of expected results, since this is prediction only.
-
netfs authored
Add a serving signature that accepts JPEG image bytes instead of a fixed-size [HxWxC] image tensor. Passing JPEG image bytes is easier for inference/serving use cases; the model internally resizes/crops the JPEG image to the required [HxWxC] tensor before running the actual model inference. This change aligns with the Cloud TPU ResNet-50 model, which offers a similar interface (JPEG bytes) for inference here: https://github.com/tensorflow/tpu/tree/master/models/official/resnet NOTE: This flag is set to `True` by default for ImageNet and is disallowed for CIFAR (as it does not apply to CIFAR).
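A minimal sketch of this kind of JPEG-bytes serving input function, using the TF 1.x Estimator export API (the function name, image size, and preprocessing steps here are illustrative, not the exact code from the commit):

```python
import tensorflow as tf  # TF 1.x API, matching the repo at the time

_HEIGHT, _WIDTH, _CHANNELS = 224, 224, 3  # assumed ImageNet input shape

def image_bytes_serving_input_fn():
  """Serving input_fn that accepts JPEG bytes instead of an [HxWxC] tensor."""
  def _preprocess(image_bytes):
    image = tf.image.decode_jpeg(image_bytes, channels=_CHANNELS)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    # Resize to the fixed [H, W, C] shape the model graph expects.
    return tf.image.resize_images(image, [_HEIGHT, _WIDTH])

  image_bytes_list = tf.placeholder(tf.string, shape=[None], name='input_tensor')
  images = tf.map_fn(_preprocess, image_bytes_list, dtype=tf.float32)
  return tf.estimator.export.ServingInputReceiver(
      images, {'image_bytes': image_bytes_list})
```

The receiver would then be passed to `Estimator.export_savedmodel(...)` in place of a tensor-based serving input function.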
-
- 28 Sep, 2018 1 commit
-
-
Toby Boyd authored
-
- 25 Sep, 2018 2 commits
-
-
Aman Gupta authored
-
Aman Gupta authored
Right now we don't have input data for prediction, so the top 10 entries of the test data are used as input.
-
- 20 Sep, 2018 1 commit
-
-
Taylor Robie authored
* bug fixes and add seed
* more random corrections
* make cleanup more robust
* return cleanup fn
* delint and address PR comments
* delint and fix tests
* delinting is never done
* add pipeline hashing
* delint
-
- 19 Sep, 2018 1 commit
-
-
Naurril authored
-
- 17 Sep, 2018 1 commit
-
-
Tayo Oguntebi authored
-
- 14 Sep, 2018 1 commit
-
-
Reed authored
Sometimes it takes longer than 15 seconds, and occasionally longer than 1 minute, for the subprocess to spawn and create the alive file.
-
- 13 Sep, 2018 4 commits
- 11 Sep, 2018 2 commits
- 05 Sep, 2018 4 commits
-
-
Reed authored
* Fix spurious "did not start correctly" error. The error "Generation subprocess did not start correctly" would occur if the async process started up after the main process checked for the subproc_alive file.
* Add error message
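One common way to avoid a single premature existence check (a sketch of the general pattern, not necessarily the repository's exact fix) is to poll for the alive file with a timeout:

```python
import os
import time

def wait_for_alive_file(alive_path, timeout_sec=60, poll_interval_sec=1):
  """Poll for the subprocess 'alive' file rather than checking it only once."""
  deadline = time.time() + timeout_sec
  while time.time() < deadline:
    if os.path.exists(alive_path):
      return
    time.sleep(poll_interval_sec)
  raise RuntimeError(
      'Generation subprocess did not start correctly within {} seconds.'
      .format(timeout_sec))
```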
-
Toby Boyd authored
-
Toby Boyd authored
-
Reed authored
When constructing the evaluation records, data_async_generation.py would copy the records into the final directory. The main process would wait until the eval records existed. However, the main process would sometimes read the eval records before they were fully copied, causing a DataLossError.
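A standard way to close this kind of race (shown here as a generic sketch, not the commit's exact code) is to copy into a temporary file in the destination directory and rename it into place, since the rename is atomic on POSIX filesystems:

```python
import os
import shutil
import tempfile

def copy_atomically(src_path, dst_path):
  """Copy a file so a concurrent reader never sees a partially written copy."""
  dst_dir = os.path.dirname(dst_path) or '.'
  fd, tmp_path = tempfile.mkstemp(dir=dst_dir)
  os.close(fd)
  shutil.copyfile(src_path, tmp_path)
  os.rename(tmp_path, dst_path)  # atomic within the same filesystem
```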
-
- 04 Sep, 2018 1 commit
-
-
Yanhui Liang authored
-
- 02 Sep, 2018 2 commits
- 01 Sep, 2018 2 commits
- 30 Aug, 2018 2 commits
-
-
Aman Gupta authored
Bypass the model export step when training on TPUs, as the export step needs inference to be supported on TPUs. Remove this check once inference is supported. (#5209)
-
Aman Gupta authored
Bypass the model export step when training on TPUs, as the export step needs inference to be supported on TPUs. Remove this check once inference is supported.
-
- 29 Aug, 2018 1 commit
-
-
Yanhui Liang authored
* Add distribution strategy to keras benchmark
* Fix comments
* Fix lints
-
- 28 Aug, 2018 2 commits
-
-
Jaeman authored
* Fix bug on distributed training in mnist using the MirroredStrategy API
* Remove unnecessary code and change the distribution strategy source
  - Remove multi-gpu
  - Remove TowerOptimizer
  - Change from MirroredStrategy to distribution_utils.get_distribution_strategy
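Roughly, the helper picks a strategy based on the GPU count and the result is handed to the Estimator via RunConfig. The sketch below uses the TF 1.x `tf.contrib.distribute` API with a simplified stand-in for `distribution_utils.get_distribution_strategy`, whose real signature may differ:

```python
import tensorflow as tf  # TF 1.x

def get_distribution_strategy(num_gpus):
  """Simplified stand-in for the repo's distribution_utils helper."""
  if num_gpus == 0:
    return tf.contrib.distribute.OneDeviceStrategy('device:CPU:0')
  elif num_gpus == 1:
    return tf.contrib.distribute.OneDeviceStrategy('device:GPU:0')
  return tf.contrib.distribute.MirroredStrategy(num_gpus=num_gpus)

strategy = get_distribution_strategy(num_gpus=0)  # 0 keeps this sketch CPU-only
run_config = tf.estimator.RunConfig(train_distribute=strategy)
# The config is then passed to tf.estimator.Estimator(model_fn, config=run_config).
```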
-
Josh Gordon authored
-
- 27 Aug, 2018 2 commits
-
-
Taylor Robie authored
* Make ResNet robust to the case that epochs_between_evals does not divide train_epochs, and add an --eval_only option
* add some comments to make the control flow easier to follow
* address PR comments
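The underlying idea can be sketched as computing an epoch schedule whose last chunk absorbs the remainder (illustrative code, not the repository's implementation):

```python
def epoch_schedule(train_epochs, epochs_between_evals, eval_only=False):
  """Return the number of epochs to train before each evaluation."""
  if eval_only:
    return []  # no training loops; the caller just runs a single evaluation
  n_loops = -(-train_epochs // epochs_between_evals)  # ceiling division
  schedule = [epochs_between_evals] * (n_loops - 1)
  schedule.append(train_epochs - sum(schedule))  # remainder chunk
  return schedule

print(epoch_schedule(90, 40))  # [40, 40, 10] -- the last chunk is shorter
```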
-
Toby Boyd authored
* Add 5 epoch warmup
* get_lr with warm_up only for imagenet
* Add base_lr, remove fp16 unittest arg validation
* Remove validation check stopping v1 and FP16
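As a rough illustration of the warmup idea (the base rate, boundaries, and decay factors below are examples, not the repo's ImageNet values):

```python
def learning_rate_with_warmup(epoch, base_lr=0.128, warmup_epochs=5,
                              boundaries=(30, 60, 80),
                              decay_rates=(1.0, 0.1, 0.01, 0.001)):
  """Linear warmup for the first few epochs, then step decay."""
  if epoch < warmup_epochs:
    return base_lr * (epoch + 1) / warmup_epochs  # ramp up toward base_lr
  rate = decay_rates[0]
  for boundary, decay in zip(boundaries, decay_rates[1:]):
    if epoch >= boundary:
      rate = decay
  return base_lr * rate

for epoch in (0, 4, 5, 30, 80):
  print(epoch, learning_rate_with_warmup(epoch))
```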
-
- 25 Aug, 2018 1 commit
-
-
Toby Boyd authored
* Add top_5 to eval.
* labels shape changed from [?, 1] to [?] to match the unittest.
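A sketch of how a top-5 eval metric is typically wired up in TF 1.x, including the [?, 1] -> [?] flattening of the labels that `in_top_k` requires (shapes and class count are illustrative):

```python
import tensorflow as tf  # TF 1.x

logits = tf.random_normal([8, 1001])                             # [batch, num_classes]
labels = tf.random_uniform([8, 1], maxval=1001, dtype=tf.int32)  # [batch, 1]

# tf.nn.in_top_k expects labels of shape [batch], so flatten [?, 1] -> [?].
flat_labels = tf.reshape(labels, [-1])
in_top_5 = tf.nn.in_top_k(predictions=logits, targets=flat_labels, k=5)

# As an Estimator eval metric this would be tf.metrics.mean(in_top_5);
# a plain mean keeps the sketch self-contained.
top_5_accuracy = tf.reduce_mean(tf.cast(in_top_5, tf.float32))
```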
-
- 22 Aug, 2018 1 commit
-
-
Reed authored
* Fix convergence issues for MLPerf. Thank you to @robieta for helping me find these issues, and for providing an algorithm for the `get_hit_rate_and_ndcg_mlperf` function. This change causes every forked process to set a new seed, so that forked processes do not generate the same set of random numbers. This improves evaluation hit rates. Additionally, it adds a flag, --ml_perf, that makes further changes so that the evaluation hit rate can match the MLPerf reference implementation. I ran 4 times with --ml_perf and 4 times without. Without --ml_perf, the highest hit rates achieved by each run were 0.6278, 0.6287, 0.6289, and 0.6241. With --ml_perf, the highest hit rates were 0.6353, 0.6356, 0.6367, and 0.6353.
* fix lint error
* Fix failing test
* Address @robieta's feedback
* Address more feedback
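The per-fork seeding idea, in isolation (a generic multiprocessing sketch, not the NCF pipeline's actual code):

```python
import multiprocessing
import numpy as np

def _reseed_worker():
  # Forked workers inherit the parent's RNG state and would otherwise draw
  # identical "random" numbers; seed(None) reseeds each fork from OS entropy.
  np.random.seed()

def _sample_negatives(count):
  return np.random.randint(0, 1000, size=count).tolist()

if __name__ == '__main__':
  pool = multiprocessing.Pool(4, initializer=_reseed_worker)
  print(pool.map(_sample_negatives, [3, 3, 3, 3]))  # draws differ across forks
  pool.close()
  pool.join()
```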
-
- 20 Aug, 2018 1 commit
-
-
Taylor Robie authored
* perform a codecs check and remove unicode \ufeff if utf-8 is not present
* delint
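The BOM-stripping part of the change can be illustrated like this (the sample CSV header is made up; only the `codecs.BOM_UTF8` handling is the point):

```python
import codecs

def strip_bom(raw_bytes):
  """Drop the UTF-8 encoded byte-order mark (U+FEFF) if present."""
  if raw_bytes.startswith(codecs.BOM_UTF8):
    return raw_bytes[len(codecs.BOM_UTF8):]
  return raw_bytes

raw = codecs.BOM_UTF8 + b'user_id,item_id,rating\n1,31,2.5\n'
print(strip_bom(raw).decode('utf-8').splitlines()[0])  # user_id,item_id,rating
```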
-
- 18 Aug, 2018 1 commit
-
-
Reed authored
This is done by using a higher Pickle protocol version, which the Python docs describe as being "slightly more efficient". This reduces the file write time at the beginning from 2 1/2 minutes to 5 seconds.
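For reference, the technique is just passing an explicit protocol to `pickle.dump` (a self-contained sketch; the payload is a stand-in for the real cached data):

```python
import pickle
import tempfile
import time

payload = {'positive_items': list(range(1000000))}  # stand-in for the real data

start = time.time()
with tempfile.NamedTemporaryFile(delete=False) as f:
  # The default protocol (0 on Python 2) writes slow ASCII output;
  # HIGHEST_PROTOCOL picks the most efficient binary protocol available.
  pickle.dump(payload, f, protocol=pickle.HIGHEST_PROTOCOL)
print('wrote pickle in {:.2f}s'.format(time.time() - start))
```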
-
- 16 Aug, 2018 2 commits
-
-
Jules Gagnon-Marchand authored
* Deterministic dataset order fix: in order for the order of the files to be deterministic, in `tf.data.Dataset.list_files(..., shuffle)`, shuffle needs to be True, otherwise different iterator inits will yield different file orders
* removed unnecessary shuffle of filenames
* Removed the `_FILE_SHUFFLE_BUFFER` definition
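For a reproducible file order across iterator initializations, the shuffle behavior of `list_files` has to be pinned down explicitly, e.g. with a fixed seed. A minimal sketch, assuming a TF 1.x release whose `list_files` accepts `shuffle`/`seed` (the pattern and seed value are illustrative):

```python
import tensorflow as tf  # TF 1.x

file_pattern = '/tmp/translate_ende/*train*'  # illustrative pattern, not the repo's

# An explicit seed makes the shuffled file order identical every time the
# iterator is initialized; relying on the defaults does not.
dataset = tf.data.Dataset.list_files(file_pattern, shuffle=True, seed=42)
```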
-
Taylor Robie authored
-
- 15 Aug, 2018 1 commit
-
-
Wei Wang authored
-
- 14 Aug, 2018 1 commit
-
-
alope107 authored
* Fix Transformer TPU crash in Python 2.X. Tensorflow raises an error when tf_inspect.getfullargspec is called on a functools.partial in Python 2.X. This issue would be hit during the eval stage of the Transformer TPU model. This change replaces the call to functools.partial with a lambda to work around the issue.
* Remove unused import from transformer_main.
* Fix lint error.
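The workaround itself is small: an argument-binding `functools.partial` is replaced by an equivalent lambda, which the Python 2 inspection machinery can handle (the function below is illustrative, not the Transformer code):

```python
import functools

def score_fn(logits, k):
  return sorted(logits, reverse=True)[:k]

# getfullargspec-style introspection fails on partial objects under Python 2 ...
top_5_partial = functools.partial(score_fn, k=5)

# ... so the call site binds the argument with a lambda instead.
top_5_lambda = lambda logits: score_fn(logits, k=5)

print(top_5_partial([0.1, 0.9, 0.4]))  # [0.9, 0.4, 0.1]
print(top_5_lambda([0.1, 0.9, 0.4]))   # [0.9, 0.4, 0.1]
```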
-