- 14 Sep, 2018 1 commit
-
-
Reed authored
Sometimes it takes longer than 15 seconds, and even longer than 1 minute, to spawn and create the alive file.
-
- 11 Sep, 2018 2 commits
- 05 Sep, 2018 4 commits
-
-
Reed authored
* Fix spurious "did not start correctly" error. The error "Generation subprocess did not start correctly" would occur if the async process started up after the main process checked for the subproc_alive file. * Add error message
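The race described above can be avoided by polling for the file until a deadline instead of checking once. The following is a minimal sketch under that assumption, not the code from this commit; the function name, timeout, and poll interval are hypothetical.

```python
import os
import time


def wait_for_alive_file(path, timeout_s=60.0, poll_interval_s=0.1):
    """Poll until `path` exists; return True if it appears before the deadline."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_interval_s)
    # Final check so a file created during the last sleep is not missed.
    return os.path.exists(path)
```

A caller would raise the "did not start correctly" error only when this returns False.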
-
Toby Boyd authored
-
Toby Boyd authored
-
Reed authored
When constructing the evaluation records, data_async_generation.py would copy the records into the final directory. The main process would wait until the eval records existed. However, the main process would sometimes read the eval records before they were fully copied, causing a DataLossError.
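The message does not say how the partial read was prevented. One common approach, shown here as a sketch with a hypothetical helper name and temp-file suffix, is to copy to a temporary name and rename it into place, so a reader sees either no file or a complete one.

```python
import os
import shutil


def publish_atomically(src_path, final_path):
    """Copy src_path to a temp file beside final_path, then rename into place."""
    tmp_path = final_path + ".tmp"  # hypothetical temp-name convention
    shutil.copyfile(src_path, tmp_path)
    # os.replace is atomic when both paths are on the same local filesystem,
    # so a process waiting on final_path never observes a half-written file.
    os.replace(tmp_path, final_path)
```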
-
- 04 Sep, 2018 1 commit
-
-
Yanhui Liang authored
-
- 02 Sep, 2018 2 commits
- 01 Sep, 2018 2 commits
- 30 Aug, 2018 1 commit
-
-
Aman Gupta authored
Bypass the export-model step when training on TPUs, because exporting requires inference to be supported on TPUs. Remove this check once inference is supported. (#5209)
-
- 29 Aug, 2018 1 commit
-
-
Yanhui Liang authored
* Add distribution strategy to keras benchmark * Fix comments * Fix lints
-
- 28 Aug, 2018 2 commits
-
-
Jaeman authored
* Fix bug on distributed training in mnist using MirroredStrategy API * Remove unnecessary code and change distribution strategy source - Remove multi-gpu - Remove TowerOptimizer - Change from MirroredStrategy to distribution_utils.get_distribution_strategy
-
Josh Gordon authored
-
- 27 Aug, 2018 2 commits
-
-
Taylor Robie authored
* Make ResNet robust to the case that epochs_between_evals does not divide train_epochs, and add an --eval_only option * add some comments to make the control flow easier to follow * address PR comments
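One way to handle a train_epochs value that is not a multiple of epochs_between_evals is to shorten the final training cycle. This is a sketch under that assumption; the helper name is hypothetical and it expects both arguments to be positive integers.

```python
import math


def epochs_per_cycle(train_epochs, epochs_between_evals):
    """Return the number of epochs to train before each evaluation."""
    n_cycles = math.ceil(train_epochs / epochs_between_evals)
    schedule = [epochs_between_evals] * n_cycles
    # The last cycle absorbs the remainder so the total equals train_epochs.
    schedule[-1] = train_epochs - sum(schedule[:-1])
    return schedule
```

For example, 10 training epochs with an evaluation every 4 epochs gives cycles of 4, 4, and 2 epochs.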
-
Toby Boyd authored
* Add 5 epoch warmup * get_lr with warm_up only for imagenet * Add base_lr, remove fp16 unittest arg validation * Remove validation check stopping v1 and FP16
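The message does not describe the shape of the warmup. The sketch below assumes a linear ramp from zero to base_lr over the warmup epochs; the function name and parameters are hypothetical, and any decay schedule after warmup is left out.

```python
def warmup_lr(base_lr, step, batches_per_epoch, warmup_epochs=5):
    """Linearly scale the learning rate up to base_lr during warmup."""
    warmup_steps = batches_per_epoch * warmup_epochs
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # After warmup, this sketch simply holds the base rate.
    return base_lr
```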
-
- 25 Aug, 2018 1 commit
-
-
Toby Boyd authored
* Add top_5 to eval. * Change labels shape from [?,1] to [?] to match the unittest.
-
- 22 Aug, 2018 1 commit
-
-
Reed authored
* Fix convergence issues for MLPerf. Thank you to @robieta for helping me find these issues, and for providing an algorithm for the `get_hit_rate_and_ndcg_mlperf` function. This change causes every forked process to set a new seed, so that forked processes do not generate the same set of random numbers. This improves evaluation hit rates. Additionally, it adds a flag, --ml_perf, that makes further changes so that the evaluation hit rate can match the MLPerf reference implementation. I ran 4 times with --ml_perf and 4 times without. Without --ml_perf, the highest hit rates achieved by each run were 0.6278, 0.6287, 0.6289, and 0.6241. With --ml_perf, the highest hit rates were 0.6353, 0.6356, 0.6367, and 0.6353. * fix lint error * Fix failing test * Address @robieta's feedback * Address more feedback
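The per-process seeding idea can be illustrated with the standard library alone. This is a sketch, not the code from this change: the helper name and the way the seed is derived from a base seed and a worker id are assumptions.

```python
import random


def worker_rng(base_seed, worker_id):
    """Return an RNG whose stream depends on both the base seed and the worker."""
    # Without the worker-specific part, every forked worker that inherited the
    # same generator state would draw the same sequence of numbers.
    return random.Random(f"{base_seed}-{worker_id}")
```

Each worker would call this once at startup and draw its samples from the returned generator.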
-
- 20 Aug, 2018 1 commit
-
-
Taylor Robie authored
* perform a codecs check and remove unicode \ufeff if utf-8 is not present * delint
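The message is terse, so the following is only one reading of it: a sketch that drops a leading UTF-8 byte-order mark (the bytes that decode to \ufeff) before decoding. The function name is hypothetical.

```python
import codecs


def decode_without_bom(raw):
    """Decode UTF-8 bytes, dropping a leading byte-order mark if present."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8")
```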
-
- 18 Aug, 2018 1 commit
-
-
Reed authored
This is done by using a higher Pickle protocol version, which the Python docs describe as being "slightly more efficient". This reduces the file write time at the beginning from 2 1/2 minutes to 5 seconds.
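Selecting a newer pickle protocol is a one-argument change. The sketch below assumes pickle.HIGHEST_PROTOCOL; the message does not say which protocol version was chosen, and the helper name is hypothetical.

```python
import pickle


def dump_with_newer_protocol(obj, path):
    """Write obj to path using the newest pickle protocol available."""
    with open(path, "wb") as f:
        # Assumption: HIGHEST_PROTOCOL stands in for whichever version was used.
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
```

Files written this way load with a plain pickle.load, since the protocol is detected automatically on read.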
-
- 16 Aug, 2018 2 commits
-
-
Jules Gagnon-Marchand authored
* Deterministic dataset order fix. In order for the order of the files to be deterministic, in `tf.data.Dataset.list_files(..., shuffle)`, shuffle needs to be True; otherwise, different iterator inits will yield different file orders. * Removed unnecessary shuffle of filenames * Removed the `_FILE_SHUFFLE_BUFFER` definition
-
Taylor Robie authored
-
- 15 Aug, 2018 1 commit
-
-
Wei Wang authored
-
- 14 Aug, 2018 2 commits
-
-
alope107 authored
* Fix Transformer TPU crash in Python 2.X. - Tensorflow raises an error when tf_inspect.getfullargspec is called on a functools.partial in Python 2.X. This issue would be hit during the eval stage of the Transformer TPU model. This change replaces the call to functools.partial with a lambda to work around the issue. * Remove unused import from transformer_main. * Fix lint error.
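The workaround pattern looks roughly like the sketch below. The function and argument names are hypothetical stand-ins, not the identifiers in transformer_main; the point is only that both forms bind the same argument, while the lambda is an ordinary function object.

```python
import functools


def input_fn(params, is_training):
    """Hypothetical stand-in for an input function that takes an extra flag."""
    return (params["batch_size"], is_training)


# Before: a functools.partial object binds the extra argument.
eval_input_fn_partial = functools.partial(input_fn, is_training=False)

# After: a lambda binds the same argument but is a plain function.
eval_input_fn = lambda params: input_fn(params, is_training=False)
```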
-
Zac Wellmer authored
* warm start a ResNet with all but the dense layer and only update the final layer weights when fine tuning * Update README for Transfer Learning * make lint happy and fix variable naming error related to scaled gradients * edit the test cases for cifar10 and imagenet to reflect the default case of no fine tuning
-
- 13 Aug, 2018 1 commit
-
-
kangtop729 authored
There is a typing error.
-
- 10 Aug, 2018 1 commit
-
-
Yanhui Liang authored
-
- 02 Aug, 2018 2 commits
-
-
Reed authored
-
Reed authored
The data_async_generation.py process would print to stderr, but the main process would redirect its stderr to a pipe. The main process never read from the pipe, so when the pipe was full, data_async_generation.py would stall on a write to stderr. This change makes data_async_generation.py not write to stdout/stderr.
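The change above fixes the stall on the child side. A different, parent-side way to avoid the same stall is to discard the child's output instead of piping it; the sketch below shows that alternative with a hypothetical helper name.

```python
import subprocess


def spawn_quiet(args):
    """Start a child process whose stdout and stderr are discarded."""
    # With DEVNULL there is no pipe buffer to fill, so the child cannot block
    # on a write even if the parent never reads its output.
    return subprocess.Popen(
        args,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

The trade-off is that the child's diagnostics are lost unless it logs to a file.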
-
- 01 Aug, 2018 1 commit
-
-
Reed authored
The output of an embedding layer is already flattened, so the Flatten layers acted as no-ops.
-
- 31 Jul, 2018 8 commits
-
-
Taylor Robie authored
-
Reed authored
* Fix crash when Python interpreter not on PATH. * Fix lint error.
-
Reed authored
-
Reed authored
-
Taylor Robie authored
* add indirection file * remove unused imports * fix import
-
Reed authored
-
Reed authored
-
Reed authored
-