Merge pull request #5223 from tfboyd/update_resnet_defaults_v1

Update resnet defaults v1

Commit dda23ecf (unverified), authored Sep 04, 2018 by Toby Boyd; committed by GitHub Sep 04, 2018
Showing 3 changed files with 20 additions and 4 deletions

official/resnet/README.md +18 -2
official/resnet/imagenet_main.py +1 -1
official/resnet/resnet_run_loop.py +1 -1

--- a/official/resnet/README.md
+++ b/official/resnet/README.md
@@ -8,7 +8,23 @@ See the following papers for more background:

 [2] [Identity Mappings in Deep Residual Networks](https://arxiv.org/pdf/1603.05027.pdf) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Jul 2016.

-In code v1 refers to the resnet defined in [1], while v2 correspondingly refers to [2]. The principle difference between the two versions is that v1 applies batch normalization and activation after convolution, while v2 applies batch normalization, then activation, and finally convolution. A schematic comparison is presented in Figure 1 (left) of [2].
+In code, v1 refers to the ResNet defined in [1], except that the stride of 2 is
+applied in the 3x3 conv rather than the first 1x1 conv of the bottleneck. This
+change gives higher and more stable accuracy in fewer epochs than the original
+v1 and has been shown to scale to larger batch sizes with minimal loss of
+accuracy. There is no originating paper; the first mention we are aware of is in
+the [torch version of ResNetv1](https://github.com/facebook/fb.resnet.torch).
+Most popular v1 implementations are in fact this variant, which we call
+ResNetv1.5. In testing we found that v1.5 requires ~12% more compute to train
+and has 6% lower inference throughput than ResNetv1. Comparing the v1 model to
+the v1.5 model, as some blog posts have done, is an apples-to-oranges
+comparison, especially with regard to hardware or platform performance. The
+CIFAR-10 ResNet does not use the bottleneck and is not affected by this change.
+
+v2 refers to [2]. The principal difference between the two versions is that v1
+applies batch normalization and activation after convolution, while v2 applies
+batch normalization, then activation, and finally convolution. A schematic
+comparison is presented in Figure 1 (left) of [2].

 Please proceed according to which dataset you would like to train/evaluate on:
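
The new README text says that moving the stride of 2 from the first 1x1 conv to the 3x3 conv makes v1.5 more expensive than v1. A rough sketch of why, counting multiply-accumulates on the main path of a single downsampling bottleneck block, is given below. The helper names (`conv_out_size`, `bottleneck_macs`) and the `kernel // 2` padding rule are assumptions made for illustration and are not taken from `resnet_model.py`; the channel sizes are those of the first conv3_x block of ResNet-50 in [1]. The sketch covers one block only and is not meant to reproduce the ~12% figure quoted in the README.

```python
def conv_out_size(size, kernel, stride):
    """Spatial output size of a square conv, assuming padding of kernel // 2."""
    pad = kernel // 2  # assumption: symmetric padding, not the repo's scheme
    return (size + 2 * pad - kernel) // stride + 1


def bottleneck_macs(in_size, in_ch, mid_ch, out_ch, stride, version):
    """Multiply-accumulates of the 1x1 -> 3x3 -> 1x1 main path of one block.

    version '1'   applies the stride in the first 1x1 conv.
    version '1.5' applies the stride in the 3x3 conv.
    Returns (output spatial size, total multiply-accumulates).
    """
    if version == '1':
        strides = (stride, 1, 1)
    elif version == '1.5':
        strides = (1, stride, 1)
    else:
        raise ValueError("version must be '1' or '1.5'")

    kernels = (1, 3, 1)
    channels = ((in_ch, mid_ch), (mid_ch, mid_ch), (mid_ch, out_ch))

    size, total = in_size, 0
    for kernel, s, (c_in, c_out) in zip(kernels, strides, channels):
        size = conv_out_size(size, kernel, s)
        # One MAC per output position, per kernel tap, per channel pair.
        total += size * size * kernel * kernel * c_in * c_out
    return size, total


# First conv3_x block of ResNet-50: 56x56x256 input, 128 bottleneck channels,
# 512 output channels, stride 2.
v1_size, v1_macs = bottleneck_macs(56, 256, 128, 512, 2, '1')
v15_size, v15_macs = bottleneck_macs(56, 256, 128, 512, 2, '1.5')
print(v1_size, v15_size)            # both variants end at the same resolution
print(round(v15_macs / v1_macs, 2))  # per-block cost ratio, v1.5 over v1
```

Both variants produce the same output resolution. The extra cost in v1.5 comes entirely from the first 1x1 conv, which runs at the full input resolution instead of the downsampled one.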


--- a/official/resnet/imagenet_main.py
+++ b/official/resnet/imagenet_main.py
@@ -323,7 +323,7 @@ def define_imagenet_flags():
   resnet_run_loop.define_resnet_flags(
       resnet_size_choices=['18', '34', '50', '101', '152', '200'])
   flags.adopt_module_key_flags(resnet_run_loop)
-  flags_core.set_defaults(train_epochs=100)
+  flags_core.set_defaults(train_epochs=90)


 def run_imagenet(flags_obj):

--- a/official/resnet/resnet_run_loop.py
+++ b/official/resnet/resnet_run_loop.py
@@ -519,7 +519,7 @@ def define_resnet_flags(resnet_size_choices=None):
   flags.adopt_module_key_flags(flags_core)

   flags.DEFINE_enum(
-      name='resnet_version', short_name='rv', default='2',
+      name='resnet_version', short_name='rv', default='1',
       enum_values=['1', '2'],
       help=flags_core.help_wrap(
           'Version of ResNet. (1 or 2) See README.md for details.'))
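
Taken together, the two one-line changes mean that a run started without explicit flags now trains ResNet v1 (v1.5 as described in the README) for 90 epochs, where it previously trained v2 for 100. The sketch below mimics that behavior with the standard library's `argparse` as a stand-in; it does not use absl or the repository's `flags_core` helpers, and `parse_flags` is a hypothetical name. Only the flag names, the short name `rv`, the enum values, and the new defaults come from the diff.

```python
import argparse


def parse_flags(argv):
    """argparse stand-in for the two defaults changed in this commit."""
    parser = argparse.ArgumentParser()
    # Mirrors flags.DEFINE_enum(name='resnet_version', short_name='rv', ...).
    parser.add_argument('-rv', '--resnet_version', choices=['1', '2'],
                        default='1',  # was '2' before this commit
                        help='Version of ResNet. (1 or 2)')
    # Mirrors flags_core.set_defaults(train_epochs=90) in imagenet_main.py.
    parser.add_argument('--train_epochs', type=int,
                        default=90)  # was 100 before this commit
    return parser.parse_args(argv)


# No flags: picks up the new defaults.
new_defaults = parse_flags([])
print(new_defaults.resnet_version, new_defaults.train_epochs)

# Explicit flags restore the behavior from before the change.
old_behavior = parse_flags(['-rv', '2', '--train_epochs', '100'])
print(old_behavior.resnet_version, old_behavior.train_epochs)
```

Scripts that relied on the old defaults would need to pass both values explicitly to keep producing comparable results.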