- 02 Oct, 2018 1 commit
-
-
Reed authored
-
- 01 Oct, 2018 2 commits
-
-
Aman Gupta authored
Some changes specific to prediction: remove traces of expected results, since this is prediction only.
-
netfs authored
Add a serving signature that accepts JPEG image bytes instead of a fixed-size [HxWxC] image tensor. Passing JPEG image bytes is easier for inference/serving use cases; the model internally resizes/crops the JPEG image to the required [HxWxC] tensor before running the actual model inference. This change aligns with the Cloud TPU ResNet-50 model, which offers a similar interface (JPEG bytes) for inference here: https://github.com/tensorflow/tpu/tree/master/models/official/resnet NOTE: This flag is set to `True` by default for ImageNet and is disallowed for CIFAR (as it does not apply to CIFAR).
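A minimal sketch of this kind of JPEG-bytes serving input function, using the TF 1.x Estimator export API (the function name, image size, and preprocessing steps here are illustrative, not the exact code from the commit):

```python
import tensorflow as tf  # TF 1.x API, matching the repo at the time

_HEIGHT, _WIDTH, _CHANNELS = 224, 224, 3  # assumed ImageNet input shape

def image_bytes_serving_input_fn():
  """Serving input_fn that accepts JPEG bytes instead of an [HxWxC] tensor."""
  def _preprocess(image_bytes):
    image = tf.image.decode_jpeg(image_bytes, channels=_CHANNELS)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    # Resize to the fixed [H, W, C] shape the model graph expects.
    return tf.image.resize_images(image, [_HEIGHT, _WIDTH])

  image_bytes_list = tf.placeholder(tf.string, shape=[None], name='input_tensor')
  images = tf.map_fn(_preprocess, image_bytes_list, dtype=tf.float32)
  return tf.estimator.export.ServingInputReceiver(
      images, {'image_bytes': image_bytes_list})
```

The receiver would then be passed to `Estimator.export_savedmodel(...)` in place of a tensor-based serving input function.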
-
- 28 Sep, 2018 1 commit
-
-
Toby Boyd authored
-
- 25 Sep, 2018 2 commits
-
-
Aman Gupta authored
-
Aman Gupta authored
Right now we don't have input data for prediction, so the top 10 entries of the test data are used as input.
-
- 20 Sep, 2018 1 commit
-
-
Taylor Robie authored
* bug fixes and add seed
* more random corrections
* make cleanup more robust
* return cleanup fn
* delint and address PR comments
* delint and fix tests
* delinting is never done
* add pipeline hashing
* delint
-
- 19 Sep, 2018 1 commit
-
-
Naurril authored
-
- 17 Sep, 2018 1 commit
-
-
Tayo Oguntebi authored
-
- 14 Sep, 2018 1 commit
-
-
Reed authored
Sometimes it takes longer than 15 seconds, and occasionally longer than 1 minute, for the subprocess to spawn and create the alive file.
-
- 13 Sep, 2018 4 commits
- 11 Sep, 2018 2 commits
- 05 Sep, 2018 4 commits
-
-
Reed authored
* Fix spurious "did not start correctly" error. The error "Generation subprocess did not start correctly" would occur if the async process started up after the main process checked for the subproc_alive file.
* Add error message
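One common way to avoid a single premature existence check (a sketch of the general pattern, not necessarily the repository's exact fix) is to poll for the alive file with a timeout:

```python
import os
import time

def wait_for_alive_file(alive_path, timeout_sec=60, poll_interval_sec=1):
  """Poll for the subprocess 'alive' file rather than checking it only once."""
  deadline = time.time() + timeout_sec
  while time.time() < deadline:
    if os.path.exists(alive_path):
      return
    time.sleep(poll_interval_sec)
  raise RuntimeError(
      'Generation subprocess did not start correctly within {} seconds.'
      .format(timeout_sec))
```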
-
Toby Boyd authored
-
Toby Boyd authored
-
Reed authored
When constructing the evaluation records, data_async_generation.py would copy the records into the final directory. The main process would wait until the eval records existed. However, the main process would sometimes read the eval records before they were fully copied, causing a DataLossError.
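A standard way to close this kind of race (shown here as a generic sketch, not the commit's exact code) is to copy into a temporary file in the destination directory and rename it into place, since the rename is atomic on POSIX filesystems:

```python
import os
import shutil
import tempfile

def copy_atomically(src_path, dst_path):
  """Copy a file so a concurrent reader never sees a partially written copy."""
  dst_dir = os.path.dirname(dst_path) or '.'
  fd, tmp_path = tempfile.mkstemp(dir=dst_dir)
  os.close(fd)
  shutil.copyfile(src_path, tmp_path)
  os.rename(tmp_path, dst_path)  # atomic within the same filesystem
```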
-
- 04 Sep, 2018 1 commit
-
-
Yanhui Liang authored
-
- 02 Sep, 2018 2 commits
- 01 Sep, 2018 2 commits
- 30 Aug, 2018 2 commits
-
-
Aman Gupta authored
Bypass the model export step when training on TPUs, as the export step needs inference to be supported on TPUs. Remove this check once inference is supported. (#5209)
-
Aman Gupta authored
Bypass the model export step when training on TPUs, as the export step needs inference to be supported on TPUs. Remove this check once inference is supported.
-
- 29 Aug, 2018 1 commit
-
-
Yanhui Liang authored
* Add distribution strategy to keras benchmark
* Fix comments
* Fix lints
-
- 28 Aug, 2018 2 commits
-
-
Jaeman authored
* Fix bug on distributed training in mnist using the MirroredStrategy API
* Remove unnecessary code and change the distribution strategy source
  - Remove multi-gpu
  - Remove TowerOptimizer
  - Change from MirroredStrategy to distribution_utils.get_distribution_strategy
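Roughly, the helper picks a strategy based on the GPU count and the result is handed to the Estimator via RunConfig. The sketch below uses the TF 1.x `tf.contrib.distribute` API with a simplified stand-in for `distribution_utils.get_distribution_strategy`, whose real signature may differ:

```python
import tensorflow as tf  # TF 1.x

def get_distribution_strategy(num_gpus):
  """Simplified stand-in for the repo's distribution_utils helper."""
  if num_gpus == 0:
    return tf.contrib.distribute.OneDeviceStrategy('device:CPU:0')
  elif num_gpus == 1:
    return tf.contrib.distribute.OneDeviceStrategy('device:GPU:0')
  return tf.contrib.distribute.MirroredStrategy(num_gpus=num_gpus)

strategy = get_distribution_strategy(num_gpus=0)  # 0 keeps this sketch CPU-only
run_config = tf.estimator.RunConfig(train_distribute=strategy)
# The config is then passed to tf.estimator.Estimator(model_fn, config=run_config).
```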
-
Josh Gordon authored
-
- 27 Aug, 2018 2 commits
-
-
Taylor Robie authored
* Make ResNet robust to the case that epochs_between_evals does not divide train_epochs, and add an --eval_only option
* add some comments to make the control flow easier to follow
* address PR comments
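The underlying idea can be sketched as computing an epoch schedule whose last chunk absorbs the remainder (illustrative code, not the repository's implementation):

```python
def epoch_schedule(train_epochs, epochs_between_evals, eval_only=False):
  """Return the number of epochs to train before each evaluation."""
  if eval_only:
    return []  # no training loops; the caller just runs a single evaluation
  n_loops = -(-train_epochs // epochs_between_evals)  # ceiling division
  schedule = [epochs_between_evals] * (n_loops - 1)
  schedule.append(train_epochs - sum(schedule))  # remainder chunk
  return schedule

print(epoch_schedule(90, 40))  # [40, 40, 10] -- the last chunk is shorter
```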
-
Toby Boyd authored
* Add 5 epoch warmup
* get_lr with warm_up only for imagenet
* Add base_lr, remove fp16 unittest arg validation
* Remove validation check stopping v1 and FP16
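As a rough illustration of the warmup idea (the base rate, boundaries, and decay factors below are examples, not the repo's ImageNet values):

```python
def learning_rate_with_warmup(epoch, base_lr=0.128, warmup_epochs=5,
                              boundaries=(30, 60, 80),
                              decay_rates=(1.0, 0.1, 0.01, 0.001)):
  """Linear warmup for the first few epochs, then step decay."""
  if epoch < warmup_epochs:
    return base_lr * (epoch + 1) / warmup_epochs  # ramp up toward base_lr
  rate = decay_rates[0]
  for boundary, decay in zip(boundaries, decay_rates[1:]):
    if epoch >= boundary:
      rate = decay
  return base_lr * rate

for epoch in (0, 4, 5, 30, 80):
  print(epoch, learning_rate_with_warmup(epoch))
```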
-
- 25 Aug, 2018 1 commit
-
-
Toby Boyd authored
* Add top_5 to eval.
* labels shape changed from [?, 1] to [?] to match the unittest.
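A sketch of how a top-5 eval metric is typically wired up in TF 1.x, including the [?, 1] -> [?] flattening of the labels that `in_top_k` requires (shapes and class count are illustrative):

```python
import tensorflow as tf  # TF 1.x

logits = tf.random_normal([8, 1001])                             # [batch, num_classes]
labels = tf.random_uniform([8, 1], maxval=1001, dtype=tf.int32)  # [batch, 1]

# tf.nn.in_top_k expects labels of shape [batch], so flatten [?, 1] -> [?].
flat_labels = tf.reshape(labels, [-1])
in_top_5 = tf.nn.in_top_k(predictions=logits, targets=flat_labels, k=5)

# As an Estimator eval metric this would be tf.metrics.mean(in_top_5);
# a plain mean keeps the sketch self-contained.
top_5_accuracy = tf.reduce_mean(tf.cast(in_top_5, tf.float32))
```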
-
- 22 Aug, 2018 1 commit
-
-
Reed authored
* Fix convergence issues for MLPerf. Thank you to @robieta for helping me find these issues, and for providing an algorithm for the `get_hit_rate_and_ndcg_mlperf` function. This change causes every forked process to set a new seed, so that forked processes do not generate the same set of random numbers. This improves evaluation hit rates. Additionally, it adds a flag, --ml_perf, that makes further changes so that the evaluation hit rate can match the MLPerf reference implementation. I ran 4 times with --ml_perf and 4 times without. Without --ml_perf, the highest hit rates achieved by each run were 0.6278, 0.6287, 0.6289, and 0.6241. With --ml_perf, the highest hit rates were 0.6353, 0.6356, 0.6367, and 0.6353.
* fix lint error
* Fix failing test
* Address @robieta's feedback
* Address more feedback
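The per-fork seeding idea, in isolation (a generic multiprocessing sketch, not the NCF pipeline's actual code):

```python
import multiprocessing
import numpy as np

def _reseed_worker():
  # Forked workers inherit the parent's RNG state and would otherwise draw
  # identical "random" numbers; seed(None) reseeds each fork from OS entropy.
  np.random.seed()

def _sample_negatives(count):
  return np.random.randint(0, 1000, size=count).tolist()

if __name__ == '__main__':
  pool = multiprocessing.Pool(4, initializer=_reseed_worker)
  print(pool.map(_sample_negatives, [3, 3, 3, 3]))  # draws differ across forks
  pool.close()
  pool.join()
```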
-
- 20 Aug, 2018 1 commit
-
-
Taylor Robie authored
* perform a codecs check and remove unicode \ufeff if utf-8 is not present
* delint
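The BOM-stripping part of the change can be illustrated like this (the sample CSV header is made up; only the `codecs.BOM_UTF8` handling is the point):

```python
import codecs

def strip_bom(raw_bytes):
  """Drop the UTF-8 encoded byte-order mark (U+FEFF) if present."""
  if raw_bytes.startswith(codecs.BOM_UTF8):
    return raw_bytes[len(codecs.BOM_UTF8):]
  return raw_bytes

raw = codecs.BOM_UTF8 + b'user_id,item_id,rating\n1,31,2.5\n'
print(strip_bom(raw).decode('utf-8').splitlines()[0])  # user_id,item_id,rating
```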
-
- 18 Aug, 2018 1 commit
-
-
Reed authored
This is done by using a higher Pickle protocol version, which the Python docs describe as being "slightly more efficient". This reduces the file write time at the beginning from 2 1/2 minutes to 5 seconds.
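For reference, the technique is just passing an explicit protocol to `pickle.dump` (a self-contained sketch; the payload is a stand-in for the real cached data):

```python
import pickle
import tempfile
import time

payload = {'positive_items': list(range(1000000))}  # stand-in for the real data

start = time.time()
with tempfile.NamedTemporaryFile(delete=False) as f:
  # The default protocol (0 on Python 2) writes slow ASCII output;
  # HIGHEST_PROTOCOL picks the most efficient binary protocol available.
  pickle.dump(payload, f, protocol=pickle.HIGHEST_PROTOCOL)
print('wrote pickle in {:.2f}s'.format(time.time() - start))
```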
-
- 16 Aug, 2018 2 commits
-
-
Jules Gagnon-Marchand authored
* Deterministic dataset order fix: in order for the order of the files to be deterministic, in `tf.data.Dataset.list_files(..., shuffle)`, shuffle needs to be True, otherwise different iterator inits will yield different file orders
* removed unnecessary shuffle of filenames
* Removed the `_FILE_SHUFFLE_BUFFER` definition
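For a reproducible file order across iterator initializations, the shuffle behavior of `list_files` has to be pinned down explicitly, e.g. with a fixed seed. A minimal sketch, assuming a TF 1.x release whose `list_files` accepts `shuffle`/`seed` (the pattern and seed value are illustrative):

```python
import tensorflow as tf  # TF 1.x

file_pattern = '/tmp/translate_ende/*train*'  # illustrative pattern, not the repo's

# An explicit seed makes the shuffled file order identical every time the
# iterator is initialized; relying on the defaults does not.
dataset = tf.data.Dataset.list_files(file_pattern, shuffle=True, seed=42)
```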
-
Taylor Robie authored
-
- 15 Aug, 2018 1 commit
-
-
Wei Wang authored
-
- 14 Aug, 2018 1 commit
-
-
alope107 authored
* Fix Transformer TPU crash in Python 2.X. Tensorflow raises an error when tf_inspect.getfullargspec is called on a functools.partial in Python 2.X. This issue would be hit during the eval stage of the Transformer TPU model. This change replaces the call to functools.partial with a lambda to work around the issue.
* Remove unused import from transformer_main.
* Fix lint error.
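The workaround itself is small: an argument-binding `functools.partial` is replaced by an equivalent lambda, which the Python 2 inspection machinery can handle (the function below is illustrative, not the Transformer code):

```python
import functools

def score_fn(logits, k):
  return sorted(logits, reverse=True)[:k]

# getfullargspec-style introspection fails on partial objects under Python 2 ...
top_5_partial = functools.partial(score_fn, k=5)

# ... so the call site binds the argument with a lambda instead.
top_5_lambda = lambda logits: score_fn(logits, k=5)

print(top_5_partial([0.1, 0.9, 0.4]))  # [0.9, 0.4, 0.1]
print(top_5_lambda([0.1, 0.9, 0.4]))   # [0.9, 0.4, 0.1]
```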
-