Commits · 2644707cd388f5a791a04dc41fe5fdc77a55a6a4 · ModelZoo / ResNet50_tensorflow

25 Oct, 2018 3 commits
- prevent async process from writing alive file until the main process has... · 2644707c
  Taylor Robie authored Oct 25, 2018
```
prevent async process from writing alive file until the main process has created the cache root (#5614)
```
  2644707c
- Fix crash when --ml_perf flag is not specified. (#5610) · 48a4b443
  Reed authored Oct 25, 2018
```
The error message was:

absl.flags._exceptions.IllegalFlagValueError: flag --ml_perf=None: ('Non-boolean argument to boolean flag', 'None')
```
  48a4b443
- Update distribution_utils.py · c5dbd487
  josh11b authored Oct 24, 2018
  
  c5dbd487
24 Oct, 2018 4 commits
- Move version check to a function (#5601) · f175abc3
  Taylor Robie authored Oct 24, 2018
```
* move version check to a function

* delint

* tweak pip check

* delint
```
  f175abc3
- Rename "num_towers" to "num_replicas" (#5599) · d7676c1c
  josh11b authored Oct 24, 2018
```
To match new terminology in DistributionStrategy.
```
  d7676c1c
- AllReduceCrossTowerOps -> AllReduceCrossDeviceOps · 6c560cb3
  josh11b authored Oct 24, 2018
  
  6c560cb3
- Add logging calls to NCF (#5576) · 780f5265
  Taylor Robie authored Oct 24, 2018

  * first pass at __getattr__ abuse logger

  * first pass at adding tags to NCF

  * minor formatting updates

  * fix tag name

  * convert metrics to python floats

  * getting closer...

  * direct mlperf logs to a file

  * small tweaks and add stitching

  * update tags

  * fix tag and add a sudo call

  * tweak format of run.sh

  * delint

  * use distribution strategies for evaluation

  * address PR comments

  * delint and fix test

  * adjust flag validation for xla

  * add prefix to distinguish log stitching

  * fix index bug

  * fix clear cache for root user

  * dockerize cache drop

  * TIL some regex magic

  780f5265
20 Oct, 2018 1 commit
- Add XLA support to NCF (#5572) · f2b702a0
  Reed authored Oct 19, 2018
  
  f2b702a0
19 Oct, 2018 1 commit
- fix error when last shard is not assigned a batch (#5569) · bf298439
  Taylor Robie authored Oct 18, 2018
  
  bf298439
18 Oct, 2018 3 commits
- Reorder NCF data pipeline (#5536) · 19d4eaaf
  Taylor Robie authored Oct 18, 2018

  * intermediate commit

  finish replacing spillover with resampled padding

  intermediate commit

  * resolve merge conflict

  * intermediate commit

  * further consolidate the data pipeline

  * complete first pass at data pipeline refactor

  * remove some leftover code

  * fix test

  * remove resampling, and move train padding logic into neumf.py

  * small tweaks

  * fix weight bug

  * address PR comments

  * fix dict zip. (Reed led me astray)

  * delint

  * make data test deterministic and delint

  * Reed didn't lead me astray. I just can't read.

  * more delinting

  * even more delinting

  * use resampling for last batch padding

  * pad last batch with unique data

  * Revert "pad last batch with unique data"

  This reverts commit cbdf46efcd5c7907038a24105b88d38e7f1d6da2.

  * move padded batch to the beginning

  * delint

  * fix step check for synthetic data

  19d4eaaf
- Drop references to `is_single_tower`. · b860839f
  josh11b authored Oct 17, 2018

  Since we plan on deleting this method, it is only used in distribution_utils_test.py.

  b860839f
- Delint. · 3ec25e5d
  Shawn Wang authored Oct 17, 2018
  
  3ec25e5d
17 Oct, 2018 2 commits
- Fix a few imports. · f9742f43
  Shawn Wang authored Oct 17, 2018
  
  f9742f43
- Refactor neumf_model.py to support users who just need top_k and ndcg tensors. · 91000bc5
  Shawn Wang authored Oct 17, 2018
  
  91000bc5
14 Oct, 2018 1 commit
- Make flagfile sharing robust to distributed filesystems and multi-worker setups. (#5521) · 91b2debd
  Taylor Robie authored Oct 14, 2018
```
* move flagfile into the cache_dir

* remove duplicate code

* delint
```
  91b2debd
13 Oct, 2018 6 commits
- fix lint import order · b98409cb
  Toby Boyd authored Oct 13, 2018
  
  b98409cb
- comment editing and code cleanup. · ea37b1b5
  Toby Boyd authored Oct 13, 2018
  
  ea37b1b5
- typos and method rename · 15e53f3c
  Toby Boyd authored Oct 13, 2018
  
  15e53f3c
- refactor method and flag names. · 26301e74
  Toby Boyd authored Oct 13, 2018
  
  26301e74
- Replace multiprocess pool with popen_helper.get_pool() in data_preprocessing. (#5512) · 0c5c3a77
  shizhiw authored Oct 12, 2018
```
* Use data_dir instead of flags.FLAGS.data_dir in data_preprocessing.py.

* Use data_dir instead of flags.FLAGS.data_dir in data_preprocessing.py.

* Replace multiprocess pool with popen_helper.get_pool() in data_preprocessing.
```
  0c5c3a77
- fix num_parallel_batches · eff33131
  Toby Boyd authored Oct 12, 2018
  
  eff33131
12 Oct, 2018 3 commits
- forced nccl has same num_packs as default. · 1f21b69e
  Toby Boyd authored Oct 12, 2018
  
  1f21b69e
- perf_args piped in and add back top_1 and top_5 · bd86e960
  Toby Boyd authored Oct 12, 2018
  
  bd86e960
- Add option to run perf tuned args. · 2894bb53
  Toby Boyd authored Oct 12, 2018
  
  2894bb53
11 Oct, 2018 5 commits
- Use data_dir instead of flags.FLAGS.data_dir in data_preprocessing.py. (#5506) · b88da6ee
  shizhiw authored Oct 11, 2018
```
* Use data_dir instead of flags.FLAGS.data_dir in data_preprocessing.py.

* Use data_dir instead of flags.FLAGS.data_dir in data_preprocessing.py.
```
  b88da6ee
- Add comments, exit async process after waiting for flagfile for too long and... · 1980a0da
  Shawn Wang authored Oct 11, 2018
```
Add comments, exit async process after waiting for flagfile for too long and make directory for data_dir in case it does not exist.
```
  1980a0da
- Use flagfile to pass flags to data async generation process: small fix. · 5d497296
  Shawn Wang authored Oct 11, 2018
  
  5d497296
- Use flagfile to pass flags to data async generation process. · c88fcb2b
  Shawn Wang authored Oct 11, 2018
  
  c88fcb2b
- Added option to use_subprocess or not in ncf_main.py. · d4ac494f
  Shawn Wang authored Oct 11, 2018
  
  d4ac494f
10 Oct, 2018 2 commits
- Improve perf by converting sparse grads to dense. (#5470) · ad254209
  Reed authored Oct 10, 2018
  
  ad254209
- Add --use_synthetic_data option to NCF. (#5468) · 75d592e9
  Reed authored Oct 10, 2018
```
* Add --use_synthetic_data option to NCF.

* Add comment to _SYNTHETIC_BATCHES_PER_EPOCH

* Fix test

* Hopefully fix lint issue
```
  75d592e9
09 Oct, 2018 2 commits
- fixed a missing import. · a45cafb3
  Shawn Wang authored Oct 09, 2018
  
  a45cafb3
- Allow data async generation to be run as a separate job rather than as a subprocess. · 9b7e4163
  Shawn Wang authored Oct 09, 2018
  
  9b7e4163
06 Oct, 2018 1 commit
- roll back AUTOTUNE · c4e7318b
  Toby Boyd authored Oct 05, 2018
  
  c4e7318b
05 Oct, 2018 2 commits
- Use AUTOTUNE, remove noop take, and comment fixes · fe3746e6
  Toby Boyd authored Oct 04, 2018
  
  fe3746e6
- Fix/ncf eval default (#5438) · aec1fec6
  Taylor Robie authored Oct 04, 2018
```
* improve default handling for eval_batch_size

* return eval_batch_size default to None

* fix syntax error
```
  aec1fec6
04 Oct, 2018 2 commits
- Update resnet README with new checkpoints and SavedModels (#5440) · 505cad95
  Taylor Robie authored Oct 04, 2018

  * Update resnet README with new checkpoints and SavedModels

  * add more detail on channels_first vs channels_last

  * fix typo

  * add disclaimer about checkpoints

  505cad95
- set strip_default_attrs=True for SavedModel exports (#5439) · cdcd3ec2
  Taylor Robie authored Oct 04, 2018

  * set strip_default_attrs=True for SavedModel exports

  * specify dtype in resnet export

  * another dtype fix

  * fix another dtype issue, and set --image_bytes_as_serving_input to default to False

  cdcd3ec2
03 Oct, 2018 2 commits
- link to non-deprecated imagenet preprocessing script · 0c74ba69
  Toby Boyd authored Oct 03, 2018
  
  0c74ba69
- Move evaluation to .evaluate() (#5413) · c494582f
  Taylor Robie authored Oct 02, 2018

  * move evaluation from numpy to tensorflow

  fix syntax error

  don't use sigmoid to convert logits. there is too much precision loss.

  WIP: add logit metrics

  continue refactor of NCF evaluation

  fix syntax error

  fix bugs in eval loss calculation

  fix eval loss reweighting

  remove numpy based metric calculations

  fix logging hooks

  fix sigmoid to softmax bug

  fix comment

  catch rare PIPE error and address some PR comments

  * fix metric test and address PR comments

  * delint and fix python2

  * fix test and address PR comments

  * extend eval to TPUs

  c494582f
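
Note on c494582f: the message "don't use sigmoid to convert logits. there is too much precision loss." can be illustrated with a small sketch. This is not the NCF evaluation code. It is a minimal, standalone illustration that uses plain Python floats (float64) instead of TensorFlow tensors, and the helper names `sigmoid` and `rank` are hypothetical. The point it is meant to show: for large logits the sigmoid saturates to exactly 1.0, so items that are clearly ordered by logit become indistinguishable once converted, which would matter for ranking metrics such as top-k hit rate or NDCG.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank(values):
    """Indices of `values`, ordered from the largest value to the smallest."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


# Hypothetical logits for three candidate items; all three are distinct.
logits = [40.0, 50.0, 60.0]

# After the sigmoid, every value rounds to the same float.
probs = [sigmoid(v) for v in logits]

distinct_logits = len(set(logits))  # ordering information still present
distinct_probs = len(set(probs))    # ordering information lost

logit_order = rank(logits)          # item 2 scores highest, then 1, then 0
```

In float32, which TensorFlow models commonly use, the same saturation sets in at much smaller logits than in this float64 sketch, so computing metrics directly on logits avoids the problem.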