Unverified commit dda23ecf, authored by Toby Boyd and committed by GitHub

Merge pull request #5223 from tfboyd/update_resnet_defaults_v1

Update resnet defaults v1
Parents: 481728db, 76c0ac54
@@ -8,7 +8,23 @@ See the following papers for more background:

[2] [Identity Mappings in Deep Residual Networks](https://arxiv.org/pdf/1603.05027.pdf) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Jul 2016.
-In code v1 refers to the resnet defined in [1], while v2 correspondingly refers to [2]. The principle difference between the two versions is that v1 applies batch normalization and activation after convolution, while v2 applies batch normalization, then activation, and finally convolution. A schematic comparison is presented in Figure 1 (left) of [2].
+In code, v1 refers to the ResNet defined in [1], except that the stride 2 is
+placed on the 3x3 conv rather than on the first 1x1 conv of the bottleneck.
+This change yields higher and more stable accuracy in fewer epochs than the
+original v1, and it has been shown to scale to larger batch sizes with minimal
+degradation in accuracy. There is no originating paper; the earliest mention we
+are aware of is in the
+[Torch version of ResNet v1](https://github.com/facebook/fb.resnet.torch). Most
+popular v1 implementations follow this variant, which we call ResNet v1.5. In
+our testing, v1.5 requires ~12% more compute to train and has 6% lower
+inference throughput than the original ResNet v1. Comparing the v1 model to the
+v1.5 model, as some blog posts have done, is an apples-to-oranges comparison,
+especially with regard to hardware or platform performance. The CIFAR-10
+ResNet does not use the bottleneck and is not affected by these nuances.
+
+v2 refers to [2]. The principal difference between the two versions is that v1
+applies batch normalization and activation after convolution, while v2 applies
+batch normalization, then activation, and finally convolution. A schematic
+comparison is presented in Figure 1 (left) of [2].
Please proceed according to which dataset you would like to train/evaluate on:
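For illustration, here is a minimal sketch of the bottleneck differences described in the README text above, written against the TF 1.x `tf.layers` API rather than this repo's actual helpers (the real blocks also include the projection shortcut, the residual addition, and a final activation):

```python
import tensorflow as tf

def bottleneck_v1(inputs, filters, strides, training):
  """Original v1 [1]: the downsampling stride sits on the first 1x1 conv."""
  x = tf.layers.conv2d(inputs, filters, 1, strides=strides, use_bias=False)
  x = tf.nn.relu(tf.layers.batch_normalization(x, training=training))
  x = tf.layers.conv2d(x, filters, 3, strides=1, padding='same', use_bias=False)
  x = tf.nn.relu(tf.layers.batch_normalization(x, training=training))
  x = tf.layers.conv2d(x, 4 * filters, 1, use_bias=False)
  return tf.layers.batch_normalization(x, training=training)

def bottleneck_v1_5(inputs, filters, strides, training):
  """v1.5 (this repo's "v1"): the stride moves to the 3x3 conv."""
  x = tf.layers.conv2d(inputs, filters, 1, use_bias=False)
  x = tf.nn.relu(tf.layers.batch_normalization(x, training=training))
  x = tf.layers.conv2d(x, filters, 3, strides=strides, padding='same',
                       use_bias=False)
  x = tf.nn.relu(tf.layers.batch_normalization(x, training=training))
  x = tf.layers.conv2d(x, 4 * filters, 1, use_bias=False)
  return tf.layers.batch_normalization(x, training=training)

def conv_unit_v2(inputs, filters, kernel_size, strides, training):
  """v2 [2] ordering: batch norm, then activation, then convolution."""
  x = tf.nn.relu(tf.layers.batch_normalization(inputs, training=training))
  return tf.layers.conv2d(x, filters, kernel_size, strides=strides,
                          padding='same', use_bias=False)
```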
@@ -77,7 +93,7 @@ ResNet-50 v1 (Accuracy 75.91%):
* [SavedModel](http://download.tensorflow.org/models/official/20180601_resnet_v1_imagenet_savedmodel.tar.gz)
### Transfer Learning
You can use a pretrained model to initialize a training process. In addition you are able to freeze all but the final fully connected layers to fine tune your model. Transfer Learning is useful when training on your own small datasets. For a brief look at transfer learning in the context of convolutional neural networks, we recommend reading these [short notes](http://cs231n.github.io/transfer-learning/).
To fine tune a pretrained resnet you must make three changes to your training procedure:
...
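As a rough sketch of what fine-tuning from a pretrained ResNet involves, assuming a model graph and a `loss` tensor have already been built (the checkpoint path and variable scope names below are illustrative, not this repo's):

```python
import tensorflow as tf

# Initialize the new graph's weights from a pretrained checkpoint.
# The assignment map assumes matching 'resnet_model/' scopes (illustrative).
tf.train.init_from_checkpoint('/path/to/pretrained_resnet',
                              {'resnet_model/': 'resnet_model/'})

# Freeze everything except the final fully connected layer by letting the
# optimizer update only that layer's variables.
final_dense_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                     scope='resnet_model/final_dense')
optimizer = tf.train.MomentumOptimizer(learning_rate=1e-3, momentum=0.9)
train_op = optimizer.minimize(loss, var_list=final_dense_vars)  # `loss` assumed built above
```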
@@ -323,7 +323,7 @@ def define_imagenet_flags():
  resnet_run_loop.define_resnet_flags(
      resnet_size_choices=['18', '34', '50', '101', '152', '200'])
  flags.adopt_module_key_flags(resnet_run_loop)
-  flags_core.set_defaults(train_epochs=100)
+  flags_core.set_defaults(train_epochs=90)

def run_imagenet(flags_obj):
...
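The mechanism behind this hunk, in miniature: `set_defaults` lowers a flag's default while an explicit command-line value still takes precedence. A standalone absl sketch (not this repo's code):

```python
from absl import app, flags

flags.DEFINE_integer('train_epochs', 100, 'Number of epochs to train for.')

# Lower the default; a user-supplied --train_epochs still wins at parse time.
flags.FLAGS.set_default('train_epochs', 90)

def main(_):
  print('Training for %d epochs' % flags.FLAGS.train_epochs)

if __name__ == '__main__':
  app.run(main)
```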
@@ -519,7 +519,7 @@ def define_resnet_flags(resnet_size_choices=None):
  flags.adopt_module_key_flags(flags_core)
  flags.DEFINE_enum(
-      name='resnet_version', short_name='rv', default='2',
+      name='resnet_version', short_name='rv', default='1',
      enum_values=['1', '2'],
      help=flags_core.help_wrap(
          'Version of ResNet. (1 or 2) See README.md for details.'))
...
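With the default flipped to '1' (which, per the README text above, is really the v1.5 variant), the previous behavior remains one flag away. A standalone sketch of the same enum flag and its command-line override (not this repo's code):

```python
from absl import app, flags

flags.DEFINE_enum('resnet_version', '1', ['1', '2'],
                  'Version of ResNet. (1 or 2)')

def main(_):
  # Pass --resnet_version=2 on the command line to restore the old default.
  print('Building ResNet v%s' % flags.FLAGS.resnet_version)

if __name__ == '__main__':
  app.run(main)
```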