Commit b6215c6f authored by Abdullah Rashwan's avatar Abdullah Rashwan Committed by A. Unique TensorFlower

Internal change

PiperOrigin-RevId: 329754787
parent a5b38a72
@@ -15,5 +15,6 @@
"""All necessary imports for registration."""
# pylint: disable=unused-import
from official.nlp import tasks
from official.nlp import tasks as nlp_task
from official.utils.testing import mock_task
from official.vision import beta
# TF Vision Model Garden
## Introduction
The TF Vision Model Garden provides a large collection of baselines and checkpoints for image classification, object detection, and instance segmentation.
## Image Classification
### Common Settings and Notes
* We provide ImageNet checkpoints for [ResNet](https://arxiv.org/abs/1512.03385) models.
* Training details:
* All models are trained from scratch for 90 epochs with a global batch size of 4096 and an initial learning rate of 1.6 that follows a stepwise decay schedule (see the sketch below).
* Unless noted, all models are trained with l2 weight regularization and ReLU activation.
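The 1.6 figure follows the linear learning-rate scaling rule used by the `resnet_imagenet` config factory later in this commit; a minimal sketch:

```python
# Linear scaling rule: reference LR of 0.1 at batch size 256.
train_batch_size = 4096
initial_lr = 0.1 * train_batch_size / 256                # = 1.6
decay_values = [initial_lr * 0.1**i for i in range(4)]   # [1.6, 0.16, 0.016, 0.0016]
```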
### ImageNet Baselines
| model | resolution | epochs | FLOPs (B) | params (M) | Top-1 | Top-5 | download |
| ------------ |:-------------:| ---------:|-----------:|--------:|--------:|---------:|---------:|
| ResNet-50 | 224x224 | 90 | 4.1 | 25.6 | 76.1 | 92.9 | config |
## Object Detection and Instance Segmentation
### Common Settings and Notes
* We provide models based on two detection frameworks, [RetinaNet](https://arxiv.org/abs/1708.02002) and [Mask R-CNN](https://arxiv.org/abs/1703.06870), and two backbones, [ResNet-FPN](https://arxiv.org/abs/1612.03144) and [SpineNet](https://arxiv.org/abs/1912.05027).
* Models are all trained on COCO train2017 and evaluated on COCO val2017.
* Training details:
* Models fine-tuned from ImageNet-pretrained checkpoints follow a 12- or 36-epoch schedule. Models trained from scratch follow a 350-epoch schedule.
* The default training data augmentation implements horizontal flipping and scale jittering with a random scale between [0.5, 2.0].
* Unless noted, all models are trained with l2 weight regularization and ReLU activation.
* We use a global batch size of 256 and a stepwise learning rate that decays at 30 and 10 epochs before the end of training.
* We use square images as input, resizing the long side of an image to the target size and then padding the short side with zeros (see the sketch after this list).
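The square-input preprocessing in the last bullet can be sketched as follows. This is an illustration of the idea only; the pipeline itself uses `preprocess_ops.resize_and_crop_image`, and `resize_long_side_and_pad` is a hypothetical helper name:

```python
import tensorflow as tf

def resize_long_side_and_pad(image: tf.Tensor, target_size: int) -> tf.Tensor:
  """Resizes the long side to `target_size`, then zero-pads to a square."""
  height_width = tf.cast(tf.shape(image)[0:2], tf.float32)
  scale = tf.cast(target_size, tf.float32) / tf.reduce_max(height_width)
  new_size = tf.cast(tf.round(height_width * scale), tf.int32)
  resized = tf.image.resize(resize_image := image, new_size) if False else tf.image.resize(image, new_size)
  return tf.image.pad_to_bounding_box(resized, 0, 0, target_size, target_size)
```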
### COCO Object Detection Baselines
#### RetinaNet (ImageNet pretrained)
| backbone | resolution | epochs | FLOPs (B) | params (M) | box AP | download |
| ------------ |:-------------:| ---------:|-----------:|--------:|--------:|-----------:|
| R50-FPN | 640x640 | 12 | 97.0 | 34.0 | 34.3 | config |
| R50-FPN | 640x640 | 36 | 97.0 | 34.0 | 37.3 | config |
#### RetinaNet (Trained from scratch)
| backbone | resolution | epochs | FLOPs (B) | params (M) | box AP | download |
| ------------ |:-------------:| ---------:|-----------:|--------:|---------:|-----------:|
| SpineNet-49 | 640x640 | 350 | 85.4 | 28.5 | 42.4 | config |
| SpineNet-96 | 1024x1024 | 350 | 265.4 | 43.0 | 46.0 | config |
| SpineNet-143 | 1280x1280 | 350 | 524.0 | 67.0 | 46.8 | config |
### Instance Segmentation Baselines
#### Mask R-CNN (ImageNet pretrained)
#### Mask R-CNN (Trained from scratch)
| backbone | resolution | epochs | FLOPs (B) | params (M) | box AP | mask AP | download |
| ------------ |:-------------:| ---------:|-----------:|--------:|--------:|-----------:|-----------:|
| SpineNet-49 | 640x640 | 350 | 215.7 | 40.8 | 42.6 | 37.9 | config |
This directory contains the new design of the TF Model Garden vision framework.
Stay tuned.
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""NLP package definition."""
# Lint as: python3
# pylint: disable=unused-import
from official.vision.beta import configs
from official.vision.beta import tasks
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Configs package definition."""
from official.vision.beta.configs import image_classification
from official.vision.beta.configs import maskrcnn
from official.vision.beta.configs import retinanet
from official.vision.beta.configs import video_classification
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Backbones configurations."""
from typing import Optional
# Import libraries
import dataclasses
from official.modeling import hyperparams
@dataclasses.dataclass
class ResNet(hyperparams.Config):
"""ResNet config."""
model_id: int = 50
@dataclasses.dataclass
class EfficientNet(hyperparams.Config):
"""EfficientNet config."""
model_id: str = 'b0'
stochastic_depth_drop_rate: float = 0.0
se_ratio: float = 0.0
@dataclasses.dataclass
class SpineNet(hyperparams.Config):
"""SpineNet config."""
model_id: str = '49'
@dataclasses.dataclass
class RevNet(hyperparams.Config):
"""RevNet config."""
# Specifies the depth of RevNet.
model_id: int = 56
@dataclasses.dataclass
class Backbone(hyperparams.OneOfConfig):
"""Configuration for backbones.
Attributes:
type: 'str', type of backbone to be used, one of the fields below.
resnet: resnet backbone config.
revnet: revnet backbone config.
efficientnet: efficientnet backbone config.
spinenet: spinenet backbone config.
"""
type: Optional[str] = None
resnet: ResNet = ResNet()
revnet: RevNet = RevNet()
efficientnet: EfficientNet = EfficientNet()
spinenet: SpineNet = SpineNet()
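# A minimal usage sketch (illustration only, not part of this commit). It
# assumes hyperparams.OneOfConfig selects the active sub-config from `type`
# via a `get()` accessor; verify against official.modeling.hyperparams.
def _example_backbone_oneof():
  backbone = Backbone(type='spinenet', spinenet=SpineNet(model_id='96'))
  return backbone.get().model_id  # '96', from the active SpineNet sub-config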
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""3D Backbones configurations."""
from typing import Optional, Tuple
# Import libraries
import dataclasses
from official.modeling import hyperparams
@dataclasses.dataclass
class ResNet3DBlock(hyperparams.Config):
"""Configuration of a ResNet 3D block."""
temporal_strides: int = 1
temporal_kernel_sizes: Tuple[int, ...] = ()
use_self_gating: bool = False
@dataclasses.dataclass
class ResNet3D(hyperparams.Config):
"""ResNet config."""
model_id: int = 50
stem_temporal_conv_stride: int = 2
stem_temporal_pool_stride: int = 2
block_specs: Tuple[ResNet3DBlock, ...] = ()
@dataclasses.dataclass
class ResNet3D50(ResNet3D):
"""Block specifications of the Resnet50 (3D) model."""
model_id: int = 50
block_specs: Tuple[
ResNet3DBlock, ResNet3DBlock, ResNet3DBlock, ResNet3DBlock] = (
ResNet3DBlock(temporal_strides=1,
temporal_kernel_sizes=(3, 3, 3),
use_self_gating=True),
ResNet3DBlock(temporal_strides=1,
temporal_kernel_sizes=(3, 1, 3, 1),
use_self_gating=True),
ResNet3DBlock(temporal_strides=1,
temporal_kernel_sizes=(3, 1, 3, 1, 3, 1),
use_self_gating=True),
ResNet3DBlock(temporal_strides=1,
temporal_kernel_sizes=(1, 3, 1),
use_self_gating=True))
@dataclasses.dataclass
class Backbone3D(hyperparams.OneOfConfig):
"""Configuration for backbones.
Attributes:
type: 'str', type of backbone to be used, one of the fields below.
    resnet_3d: resnet3d backbone config.
"""
type: Optional[str] = None
resnet_3d: ResNet3D = ResNet3D50()
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Common configurations."""
# Import libraries
import dataclasses
from official.modeling import hyperparams
@dataclasses.dataclass
class NormActivation(hyperparams.Config):
activation: str = 'relu'
use_sync_bn: bool = False
norm_momentum: float = 0.99
norm_epsilon: float = 0.001
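# A hedged sketch (not part of this commit) of mapping a NormActivation config
# to a Keras normalization layer.
import tensorflow as tf

def build_norm_layer(config: NormActivation):
  """Builds a (Sync)BatchNormalization layer from a NormActivation config."""
  if config.use_sync_bn:
    # Cross-replica batch norm; an experimental Keras layer in TF 2.x.
    return tf.keras.layers.experimental.SyncBatchNormalization(
        momentum=config.norm_momentum, epsilon=config.norm_epsilon)
  return tf.keras.layers.BatchNormalization(
      momentum=config.norm_momentum, epsilon=config.norm_epsilon)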
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Decoders configurations."""
from typing import Optional
# Import libraries
import dataclasses
from official.modeling import hyperparams
@dataclasses.dataclass
class Identity(hyperparams.Config):
"""Identity config."""
pass
@dataclasses.dataclass
class FPN(hyperparams.Config):
"""FPN config."""
num_filters: int = 256
use_separable_conv: bool = False
@dataclasses.dataclass
class Decoder(hyperparams.OneOfConfig):
"""Configuration for decoders.
Attributes:
type: 'str', type of decoder to be used, one of the fields below.
    fpn: fpn config.
    identity: identity config.
"""
type: Optional[str] = None
fpn: FPN = FPN()
identity: Identity = Identity()
runtime:
distribution_strategy: 'mirrored'
mixed_precision_dtype: 'float16'
loss_scale: 'dynamic'
task:
model:
num_classes: 1001
input_size: [224, 224, 3]
backbone:
type: 'resnet'
resnet:
model_id: 50
losses:
l2_weight_decay: 0.0001
one_hot: True
label_smoothing: 0.1
train_data:
input_path: 'imagenet-2012-tfrecord/train*'
is_training: True
global_batch_size: 2048
dtype: 'float16'
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: False
global_batch_size: 2048
dtype: 'float16'
drop_remainder: False
trainer:
train_steps: 56160
validation_steps: 25
validation_interval: 625
steps_per_loop: 625
summary_interval: 625
checkpoint_interval: 625
optimizer_config:
optimizer:
type: 'sgd'
sgd:
momentum: 0.9
learning_rate:
type: 'stepwise'
stepwise:
boundaries: [18750, 37500, 50000]
values: [0.8, 0.08, 0.008, 0.0008]
warmup:
type: 'linear'
linear:
warmup_steps: 3125
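# A hedged sketch of consuming this file (Python shown as comments to keep the
# YAML valid). `override()` is assumed from official.modeling.hyperparams, and
# 'gpu_float16.yaml' is a hypothetical name for this file:
#   import yaml
#   from official.core import exp_factory
#   config = exp_factory.get_exp_config('resnet_imagenet')
#   with open('gpu_float16.yaml') as f:
#     config.override(yaml.safe_load(f), is_strict=False)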
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
model:
num_classes: 1001
input_size: [224, 224, 3]
backbone:
type: 'resnet'
resnet:
model_id: 50
losses:
l2_weight_decay: 0.0001
one_hot: True
label_smoothing: 0.1
train_data:
input_path: 'imagenet-2012-tfrecord/train*'
is_training: True
global_batch_size: 4096
dtype: 'bfloat16'
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: False
global_batch_size: 4096
dtype: 'bfloat16'
drop_remainder: False
trainer:
train_steps: 28080
validation_steps: 13
validation_interval: 312
steps_per_loop: 312
summary_interval: 312
checkpoint_interval: 312
optimizer_config:
optimizer:
type: 'sgd'
sgd:
momentum: 0.9
learning_rate:
type: 'stepwise'
stepwise:
boundaries: [9360, 18720, 24960]
values: [1.6, 0.16, 0.016, 0.0016]
warmup:
type: 'linear'
linear:
warmup_steps: 1560
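# Where the schedule numbers above come from (1281167 ImageNet train examples):
#   steps_per_epoch = 1281167 // 4096 = 312
#   train_steps = 90 * 312 = 28080
#   boundaries = [30, 60, 80] epochs -> [9360, 18720, 24960]
#   warmup_steps = 5 * 312 = 1560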
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Image classification configuration definition."""
import os
from typing import List
import dataclasses
from official.core import exp_factory
from official.modeling import hyperparams
from official.modeling import optimization
from official.modeling.hyperparams import config_definitions as cfg
from official.vision.beta.configs import backbones
from official.vision.beta.configs import common
@dataclasses.dataclass
class DataConfig(cfg.DataConfig):
"""Input config for training."""
input_path: str = ''
global_batch_size: int = 0
is_training: bool = True
dtype: str = 'float32'
shuffle_buffer_size: int = 10000
cycle_length: int = 10
@dataclasses.dataclass
class ImageClassificationModel(hyperparams.Config):
num_classes: int = 0
input_size: List[int] = dataclasses.field(default_factory=list)
backbone: backbones.Backbone = backbones.Backbone(
type='resnet', resnet=backbones.ResNet())
dropout_rate: float = 0.0
norm_activation: common.NormActivation = common.NormActivation()
# Adds a BatchNormalization layer before GlobalAveragePooling in the
  # classification head.
add_head_batch_norm: bool = False
@dataclasses.dataclass
class Losses(hyperparams.Config):
one_hot: bool = True
label_smoothing: float = 0.0
l2_weight_decay: float = 0.0
@dataclasses.dataclass
class ImageClassificationTask(cfg.TaskConfig):
"""The model config."""
model: ImageClassificationModel = ImageClassificationModel()
train_data: DataConfig = DataConfig(is_training=True)
validation_data: DataConfig = DataConfig(is_training=False)
losses: Losses = Losses()
gradient_clip_norm: float = 0.0
@exp_factory.register_config_factory('image_classification')
def image_classification() -> cfg.ExperimentConfig:
"""Image classification general."""
return cfg.ExperimentConfig(
task=ImageClassificationTask(),
trainer=cfg.TrainerConfig(),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
IMAGENET_TRAIN_EXAMPLES = 1281167
IMAGENET_VAL_EXAMPLES = 50000
IMAGENET_INPUT_PATH_BASE = 'imagenet-2012-tfrecord'
@exp_factory.register_config_factory('resnet_imagenet')
def image_classification_imagenet() -> cfg.ExperimentConfig:
"""Image classification on imagenet with resnet."""
train_batch_size = 4096
eval_batch_size = 4096
steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
config = cfg.ExperimentConfig(
task=ImageClassificationTask(
model=ImageClassificationModel(
num_classes=1001,
input_size=[224, 224, 3],
norm_activation=common.NormActivation(
norm_momentum=0.9, norm_epsilon=1e-5)),
losses=Losses(l2_weight_decay=1e-4),
train_data=DataConfig(
input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=train_batch_size),
validation_data=DataConfig(
input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
is_training=False,
global_batch_size=eval_batch_size)),
trainer=cfg.TrainerConfig(
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
train_steps=90 * steps_per_epoch,
validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
validation_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [
30 * steps_per_epoch, 60 * steps_per_epoch,
80 * steps_per_epoch
],
'values': [
0.1 * train_batch_size / 256,
0.01 * train_batch_size / 256,
0.001 * train_batch_size / 256,
0.0001 * train_batch_size / 256,
]
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 5 * steps_per_epoch,
'warmup_learning_rate': 0
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
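# Minimal usage sketch (illustration only): the registered experiment can be
# fetched by name, as the config test in this commit does.
#   config = exp_factory.get_exp_config('resnet_imagenet')
#   assert config.trainer.train_steps == 90 * (1281167 // 4096)  # 28080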
@exp_factory.register_config_factory('revnet_imagenet')
def image_classification_imagenet_revnet() -> cfg.ExperimentConfig:
"""Returns a revnet config for image classification on imagenet."""
train_batch_size = 4096
eval_batch_size = 4096
steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
config = cfg.ExperimentConfig(
task=ImageClassificationTask(
model=ImageClassificationModel(
num_classes=1001,
input_size=[224, 224, 3],
backbone=backbones.Backbone(
type='revnet', revnet=backbones.RevNet(model_id=56)),
norm_activation=common.NormActivation(
norm_momentum=0.9, norm_epsilon=1e-5),
add_head_batch_norm=True),
losses=Losses(l2_weight_decay=1e-4),
train_data=DataConfig(
input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=train_batch_size),
validation_data=DataConfig(
input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
is_training=False,
global_batch_size=eval_batch_size)),
trainer=cfg.TrainerConfig(
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
train_steps=90 * steps_per_epoch,
validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
validation_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [
30 * steps_per_epoch, 60 * steps_per_epoch,
80 * steps_per_epoch
],
'values': [0.8, 0.08, 0.008, 0.0008]
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 5 * steps_per_epoch,
'warmup_learning_rate': 0
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for image_classification."""
# pylint: disable=unused-import
from absl.testing import parameterized
import tensorflow as tf
from official.core import exp_factory
from official.modeling.hyperparams import config_definitions as cfg
from official.vision import beta
from official.vision.beta.configs import image_classification as exp_cfg
class ImageClassificationConfigTest(tf.test.TestCase, parameterized.TestCase):
@parameterized.parameters(('resnet_imagenet',),
('revnet_imagenet',))
def test_image_classification_configs(self, config_name):
config = exp_factory.get_exp_config(config_name)
self.assertIsInstance(config, cfg.ExperimentConfig)
self.assertIsInstance(config.task, exp_cfg.ImageClassificationTask)
self.assertIsInstance(config.task.model,
exp_cfg.ImageClassificationModel)
self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig)
config.task.train_data.is_training = None
with self.assertRaises(KeyError):
config.validate()
if __name__ == '__main__':
tf.test.main()
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Mask R-CNN configuration definition."""
import os
from typing import List, Optional
import dataclasses
from official.core import exp_factory
from official.modeling import hyperparams
from official.modeling import optimization
from official.modeling.hyperparams import config_definitions as cfg
from official.vision.beta.configs import backbones
from official.vision.beta.configs import common
from official.vision.beta.configs import decoders
# pylint: disable=missing-class-docstring
@dataclasses.dataclass
class TfExampleDecoder(hyperparams.Config):
regenerate_source_id: bool = False
@dataclasses.dataclass
class TfExampleDecoderLabelMap(hyperparams.Config):
regenerate_source_id: bool = False
label_map: str = ''
@dataclasses.dataclass
class DataDecoder(hyperparams.OneOfConfig):
type: Optional[str] = 'simple_decoder'
simple_decoder: TfExampleDecoder = TfExampleDecoder()
label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap()
@dataclasses.dataclass
class Parser(hyperparams.Config):
num_channels: int = 3
match_threshold: float = 0.5
unmatched_threshold: float = 0.5
aug_rand_hflip: bool = False
aug_scale_min: float = 1.0
aug_scale_max: float = 1.0
skip_crowd_during_training: bool = True
max_num_instances: int = 100
rpn_match_threshold: float = 0.7
rpn_unmatched_threshold: float = 0.3
rpn_batch_size_per_im: int = 256
rpn_fg_fraction: float = 0.5
mask_crop_size: int = 112
@dataclasses.dataclass
class DataConfig(cfg.DataConfig):
"""Input config for training."""
input_path: str = ''
global_batch_size: int = 0
is_training: bool = False
dtype: str = 'bfloat16'
decoder: DataDecoder = DataDecoder()
parser: Parser = Parser()
shuffle_buffer_size: int = 10000
@dataclasses.dataclass
class Anchor(hyperparams.Config):
num_scales: int = 1
aspect_ratios: List[float] = dataclasses.field(
default_factory=lambda: [0.5, 1.0, 2.0])
anchor_size: float = 8.0
@dataclasses.dataclass
class RPNHead(hyperparams.Config):
num_convs: int = 1
num_filters: int = 256
use_separable_conv: bool = False
@dataclasses.dataclass
class DetectionHead(hyperparams.Config):
num_convs: int = 4
num_filters: int = 256
use_separable_conv: bool = False
num_fcs: int = 1
fc_dims: int = 1024
@dataclasses.dataclass
class ROIGenerator(hyperparams.Config):
pre_nms_top_k: int = 2000
pre_nms_score_threshold: float = 0.0
pre_nms_min_size_threshold: float = 0.0
nms_iou_threshold: float = 0.7
num_proposals: int = 1000
test_pre_nms_top_k: int = 1000
test_pre_nms_score_threshold: float = 0.0
test_pre_nms_min_size_threshold: float = 0.0
test_nms_iou_threshold: float = 0.7
test_num_proposals: int = 1000
use_batched_nms: bool = False
@dataclasses.dataclass
class ROISampler(hyperparams.Config):
mix_gt_boxes: bool = True
num_sampled_rois: int = 512
foreground_fraction: float = 0.25
foreground_iou_threshold: float = 0.5
background_iou_high_threshold: float = 0.5
background_iou_low_threshold: float = 0.0
@dataclasses.dataclass
class ROIAligner(hyperparams.Config):
crop_size: int = 7
sample_offset: float = 0.5
@dataclasses.dataclass
class DetectionGenerator(hyperparams.Config):
pre_nms_top_k: int = 5000
pre_nms_score_threshold: float = 0.05
nms_iou_threshold: float = 0.5
max_num_detections: int = 100
use_batched_nms: bool = False
@dataclasses.dataclass
class MaskHead(hyperparams.Config):
upsample_factor: int = 2
num_convs: int = 4
num_filters: int = 256
use_separable_conv: bool = False
@dataclasses.dataclass
class MaskSampler(hyperparams.Config):
num_sampled_masks: int = 128
@dataclasses.dataclass
class MaskROIAligner(hyperparams.Config):
crop_size: int = 14
sample_offset: float = 0.5
@dataclasses.dataclass
class MaskRCNN(hyperparams.Config):
num_classes: int = 0
input_size: List[int] = dataclasses.field(default_factory=list)
min_level: int = 2
max_level: int = 6
anchor: Anchor = Anchor()
include_mask: bool = True
backbone: backbones.Backbone = backbones.Backbone(
type='resnet', resnet=backbones.ResNet())
decoder: decoders.Decoder = decoders.Decoder(
type='fpn', fpn=decoders.FPN())
rpn_head: RPNHead = RPNHead()
detection_head: DetectionHead = DetectionHead()
roi_generator: ROIGenerator = ROIGenerator()
roi_sampler: ROISampler = ROISampler()
roi_aligner: ROIAligner = ROIAligner()
detection_generator: DetectionGenerator = DetectionGenerator()
mask_head: Optional[MaskHead] = MaskHead()
mask_sampler: Optional[MaskSampler] = MaskSampler()
mask_roi_aligner: Optional[MaskROIAligner] = MaskROIAligner()
norm_activation: common.NormActivation = common.NormActivation(
norm_momentum=0.997,
norm_epsilon=0.0001,
use_sync_bn=True)
@dataclasses.dataclass
class Losses(hyperparams.Config):
rpn_huber_loss_delta: float = 1. / 9.
frcnn_huber_loss_delta: float = 1.
l2_weight_decay: float = 0.0
rpn_score_weight: float = 1.0
rpn_box_weight: float = 1.0
frcnn_class_weight: float = 1.0
frcnn_box_weight: float = 1.0
mask_weight: float = 1.0
@dataclasses.dataclass
class MaskRCNNTask(cfg.TaskConfig):
model: MaskRCNN = MaskRCNN()
train_data: DataConfig = DataConfig(is_training=True)
validation_data: DataConfig = DataConfig(is_training=False)
losses: Losses = Losses()
init_checkpoint: Optional[str] = None
init_checkpoint_modules: str = 'all' # all or backbone
annotation_file: Optional[str] = None
gradient_clip_norm: float = 0.0
COCO_INPUT_PATH_BASE = 'coco'
@exp_factory.register_config_factory('fasterrcnn_resnetfpn_coco')
def fasterrcnn_resnetfpn_coco() -> cfg.ExperimentConfig:
"""COCO object detection with Faster R-CNN."""
steps_per_epoch = 500
coco_val_samples = 5000
config = cfg.ExperimentConfig(
runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'),
task=MaskRCNNTask(
init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080',
init_checkpoint_modules='backbone',
annotation_file=os.path.join(COCO_INPUT_PATH_BASE,
'instances_val2017.json'),
model=MaskRCNN(
num_classes=91,
input_size=[1024, 1024, 3],
include_mask=False,
mask_head=None,
mask_sampler=None,
mask_roi_aligner=None),
losses=Losses(l2_weight_decay=0.00004),
train_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=64,
parser=Parser(
aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.25)),
validation_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'),
is_training=False,
global_batch_size=8)),
trainer=cfg.TrainerConfig(
train_steps=22500,
validation_steps=coco_val_samples // 8,
validation_interval=steps_per_epoch,
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [15000, 20000],
'values': [0.12, 0.012, 0.0012],
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 500,
'warmup_learning_rate': 0.0067
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
@exp_factory.register_config_factory('maskrcnn_resnetfpn_coco')
def maskrcnn_resnetfpn_coco() -> cfg.ExperimentConfig:
"""COCO object detection with Mask R-CNN."""
steps_per_epoch = 500
coco_val_samples = 5000
config = cfg.ExperimentConfig(
runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'),
task=MaskRCNNTask(
init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080',
init_checkpoint_modules='backbone',
annotation_file=os.path.join(COCO_INPUT_PATH_BASE,
'instances_val2017.json'),
model=MaskRCNN(
num_classes=91,
input_size=[1024, 1024, 3],
include_mask=True),
losses=Losses(l2_weight_decay=0.00004),
train_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=64,
parser=Parser(
aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.25)),
validation_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'),
is_training=False,
global_batch_size=8)),
trainer=cfg.TrainerConfig(
train_steps=22500,
validation_steps=coco_val_samples // 8,
validation_interval=steps_per_epoch,
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [15000, 20000],
'values': [0.12, 0.012, 0.0012],
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 500,
'warmup_learning_rate': 0.0067
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
@exp_factory.register_config_factory('maskrcnn_spinenet_coco')
def maskrcnn_spinenet_coco() -> cfg.ExperimentConfig:
"""COCO object detection with Mask R-CNN with SpineNet backbone."""
steps_per_epoch = 463
coco_val_samples = 5000
config = cfg.ExperimentConfig(
runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'),
task=MaskRCNNTask(
annotation_file=os.path.join(COCO_INPUT_PATH_BASE,
'instances_val2017.json'),
model=MaskRCNN(
backbone=backbones.Backbone(
type='spinenet', spinenet=backbones.SpineNet(model_id='49')),
decoder=decoders.Decoder(
type='identity', identity=decoders.Identity()),
anchor=Anchor(anchor_size=3),
norm_activation=common.NormActivation(use_sync_bn=True),
num_classes=91,
input_size=[640, 640, 3],
min_level=3,
max_level=7,
include_mask=True),
losses=Losses(l2_weight_decay=0.00004),
train_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=256,
parser=Parser(
aug_rand_hflip=True, aug_scale_min=0.5, aug_scale_max=2.0)),
validation_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'),
is_training=False,
global_batch_size=8)),
trainer=cfg.TrainerConfig(
train_steps=steps_per_epoch * 350,
validation_steps=coco_val_samples // 8,
validation_interval=steps_per_epoch,
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [
steps_per_epoch * 320, steps_per_epoch * 340
],
'values': [0.28, 0.028, 0.0028],
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 2000,
'warmup_learning_rate': 0.0067
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""RetinaNet configuration definition."""
import os
from typing import List, Optional
import dataclasses
from official.core import exp_factory
from official.modeling import hyperparams
from official.modeling import optimization
from official.modeling.hyperparams import config_definitions as cfg
from official.vision.beta.configs import backbones
from official.vision.beta.configs import common
from official.vision.beta.configs import decoders
# pylint: disable=missing-class-docstring
@dataclasses.dataclass
class TfExampleDecoder(hyperparams.Config):
regenerate_source_id: bool = False
@dataclasses.dataclass
class TfExampleDecoderLabelMap(hyperparams.Config):
regenerate_source_id: bool = False
label_map: str = ''
@dataclasses.dataclass
class DataDecoder(hyperparams.OneOfConfig):
type: Optional[str] = 'simple_decoder'
simple_decoder: TfExampleDecoder = TfExampleDecoder()
label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap()
@dataclasses.dataclass
class Parser(hyperparams.Config):
num_channels: int = 3
match_threshold: float = 0.5
unmatched_threshold: float = 0.5
aug_rand_hflip: bool = False
aug_scale_min: float = 1.0
aug_scale_max: float = 1.0
skip_crowd_during_training: bool = True
max_num_instances: int = 100
@dataclasses.dataclass
class DataConfig(cfg.DataConfig):
"""Input config for training."""
input_path: str = ''
global_batch_size: int = 0
is_training: bool = False
dtype: str = 'bfloat16'
decoder: DataDecoder = DataDecoder()
parser: Parser = Parser()
shuffle_buffer_size: int = 10000
@dataclasses.dataclass
class Anchor(hyperparams.Config):
num_scales: int = 3
aspect_ratios: List[float] = dataclasses.field(
default_factory=lambda: [0.5, 1.0, 2.0])
anchor_size: float = 4.0
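# For intuition, a hedged sketch (not part of this commit) of the anchor edges
# this config implies: base edge anchor_size * 2**level, expanded by
# `num_scales` octave scales per aspect ratio, per the Parser docstrings.
def _example_anchor_edges(level: int = 3):
  anchor_size, num_scales = 4.0, 3
  base = anchor_size * 2**level  # 32.0 pixels at level 3
  # Octave scales at level 3: [32.0, ~40.3, ~50.8].
  return [base * 2**(scale / num_scales) for scale in range(num_scales)]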
@dataclasses.dataclass
class Losses(hyperparams.Config):
focal_loss_alpha: float = 0.25
focal_loss_gamma: float = 1.5
huber_loss_delta: float = 0.1
box_loss_weight: int = 50
l2_weight_decay: float = 0.0
@dataclasses.dataclass
class RetinaNetHead(hyperparams.Config):
num_convs: int = 4
num_filters: int = 256
use_separable_conv: bool = False
@dataclasses.dataclass
class DetectionGenerator(hyperparams.Config):
pre_nms_top_k: int = 5000
pre_nms_score_threshold: float = 0.05
nms_iou_threshold: float = 0.5
max_num_detections: int = 100
use_batched_nms: bool = False
@dataclasses.dataclass
class RetinaNet(hyperparams.Config):
num_classes: int = 0
input_size: List[int] = dataclasses.field(default_factory=list)
min_level: int = 3
max_level: int = 7
anchor: Anchor = Anchor()
backbone: backbones.Backbone = backbones.Backbone(
type='resnet', resnet=backbones.ResNet())
decoder: decoders.Decoder = decoders.Decoder(
type='fpn', fpn=decoders.FPN())
head: RetinaNetHead = RetinaNetHead()
detection_generator: DetectionGenerator = DetectionGenerator()
norm_activation: common.NormActivation = common.NormActivation()
@dataclasses.dataclass
class RetinaNetTask(cfg.TaskConfig):
model: RetinaNet = RetinaNet()
train_data: DataConfig = DataConfig(is_training=True)
validation_data: DataConfig = DataConfig(is_training=False)
losses: Losses = Losses()
init_checkpoint: Optional[str] = None
init_checkpoint_modules: str = 'all' # all or backbone
gradient_clip_norm: float = 0.0
@exp_factory.register_config_factory('retinanet')
def retinanet() -> cfg.ExperimentConfig:
"""RetinaNet general config."""
return cfg.ExperimentConfig(
task=RetinaNetTask(),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
COCO_INPUT_PATH_BASE = 'coco'
COCO_TRAIN_EXAMPLES = 118287
COCO_VAL_EXAMPLES = 5000
@exp_factory.register_config_factory('retinanet_resnetfpn_coco')
def retinanet_resnetfpn_coco() -> cfg.ExperimentConfig:
"""COCO object detection with RetinaNet."""
train_batch_size = 256
eval_batch_size = 8
steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size
config = cfg.ExperimentConfig(
runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'),
task=RetinaNetTask(
init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080',
init_checkpoint_modules='backbone',
model=RetinaNet(
num_classes=91,
input_size=[640, 640, 3],
min_level=3,
max_level=7),
losses=Losses(l2_weight_decay=1e-4),
train_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=train_batch_size,
parser=Parser(
aug_rand_hflip=True, aug_scale_min=0.5, aug_scale_max=2.0)),
validation_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'),
is_training=False,
global_batch_size=eval_batch_size)),
trainer=cfg.TrainerConfig(
train_steps=72 * steps_per_epoch,
validation_steps=COCO_VAL_EXAMPLES // eval_batch_size,
validation_interval=steps_per_epoch,
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [
57 * steps_per_epoch, 67 * steps_per_epoch
],
'values': [
0.28 * train_batch_size / 256.0,
0.028 * train_batch_size / 256.0,
0.0028 * train_batch_size / 256.0
],
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 500,
'warmup_learning_rate': 0.0067
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
@exp_factory.register_config_factory('retinanet_spinenet_coco')
def retinanet_spinenet_coco() -> cfg.ExperimentConfig:
"""COCO object detection with RetinaNet using SpineNet backbone."""
train_batch_size = 256
eval_batch_size = 8
steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size
input_size = 640
config = cfg.ExperimentConfig(
runtime=cfg.RuntimeConfig(mixed_precision_dtype='float32'),
task=RetinaNetTask(
model=RetinaNet(
backbone=backbones.Backbone(
type='spinenet',
spinenet=backbones.SpineNet(model_id='49')),
decoder=decoders.Decoder(
type='identity', identity=decoders.Identity()),
anchor=Anchor(anchor_size=3),
norm_activation=common.NormActivation(use_sync_bn=True),
num_classes=91,
input_size=[input_size, input_size, 3],
min_level=3,
max_level=7),
losses=Losses(l2_weight_decay=4e-5),
train_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=train_batch_size,
parser=Parser(
aug_rand_hflip=True, aug_scale_min=0.5, aug_scale_max=2.0)),
validation_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'),
is_training=False,
global_batch_size=eval_batch_size)),
trainer=cfg.TrainerConfig(
train_steps=350 * steps_per_epoch,
validation_steps=COCO_VAL_EXAMPLES // eval_batch_size,
validation_interval=steps_per_epoch,
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [
320 * steps_per_epoch, 340 * steps_per_epoch
],
'values': [
0.28 * train_batch_size / 256.0,
0.028 * train_batch_size / 256.0,
0.0028 * train_batch_size / 256.0
],
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 2000,
'warmup_learning_rate': 0.0067
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Classification decoder and parser."""
# Import libraries
import tensorflow as tf
from official.vision.beta.dataloaders import decoder
from official.vision.beta.dataloaders import parser
from official.vision.beta.ops import preprocess_ops
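# Per-channel ImageNet mean/stddev (RGB order), scaled from [0, 1] to [0, 255].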
MEAN_RGB = (0.485 * 255, 0.456 * 255, 0.406 * 255)
STDDEV_RGB = (0.229 * 255, 0.224 * 255, 0.225 * 255)
class Decoder(decoder.Decoder):
"""A tf.Example decoder for classification task."""
def __init__(self):
self._keys_to_features = {
'image/encoded': tf.io.FixedLenFeature((), tf.string, default_value=''),
'image/class/label': (
tf.io.FixedLenFeature((), tf.int64, default_value=-1))
}
def decode(self, serialized_example):
return tf.io.parse_single_example(
serialized_example, self._keys_to_features)
class Parser(parser.Parser):
"""Parser to parse an image and its annotations into a dictionary of tensors."""
def __init__(self,
output_size,
num_classes,
aug_rand_hflip=True,
dtype='float32'):
"""Initializes parameters for parsing annotations in the dataset.
Args:
output_size: `Tensor` or `list` for [height, width] of the output image.
      num_classes: `int`, number of classes.
aug_rand_hflip: `bool`, if True, augment training with random
horizontal flip.
dtype: `str`, cast output image in dtype. It can be 'float32', 'float16',
or 'bfloat16'.
"""
self._output_size = output_size
self._aug_rand_hflip = aug_rand_hflip
self._num_classes = num_classes
if dtype == 'float32':
self._dtype = tf.float32
elif dtype == 'float16':
self._dtype = tf.float16
elif dtype == 'bfloat16':
self._dtype = tf.bfloat16
else:
raise ValueError('dtype {!r} is not supported!'.format(dtype))
def _parse_train_data(self, decoded_tensors):
"""Parses data for training."""
label = tf.cast(decoded_tensors['image/class/label'], dtype=tf.int32)
image_bytes = decoded_tensors['image/encoded']
image_shape = tf.image.extract_jpeg_shape(image_bytes)
# Crops image.
# TODO(pengchong): support image format other than JPEG.
cropped_image = preprocess_ops.random_crop_image_v2(
image_bytes, image_shape)
image = tf.cond(
tf.reduce_all(tf.equal(tf.shape(cropped_image), image_shape)),
lambda: preprocess_ops.center_crop_image_v2(image_bytes, image_shape),
lambda: cropped_image)
if self._aug_rand_hflip:
image = tf.image.random_flip_left_right(image)
# Resizes image.
image = tf.image.resize(
image, self._output_size, method=tf.image.ResizeMethod.BILINEAR)
# Normalizes image with mean and std pixel values.
image = preprocess_ops.normalize_image(image,
offset=MEAN_RGB,
scale=STDDEV_RGB)
# Convert image to self._dtype.
image = tf.image.convert_image_dtype(image, self._dtype)
return image, label
def _parse_eval_data(self, decoded_tensors):
"""Parses data for evaluation."""
label = tf.cast(decoded_tensors['image/class/label'], dtype=tf.int32)
image_bytes = decoded_tensors['image/encoded']
image_shape = tf.image.extract_jpeg_shape(image_bytes)
# Center crops and resizes image.
image = preprocess_ops.center_crop_image_v2(image_bytes, image_shape)
image = tf.image.resize(
image, self._output_size, method=tf.image.ResizeMethod.BILINEAR)
image = tf.reshape(image, [self._output_size[0], self._output_size[1], 3])
# Normalizes image with mean and std pixel values.
image = preprocess_ops.normalize_image(image,
offset=MEAN_RGB,
scale=STDDEV_RGB)
# Convert image to self._dtype.
image = tf.image.convert_image_dtype(image, self._dtype)
return image, label
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests classification_input.py."""
import io
# Import libraries
from absl.testing import parameterized
import numpy as np
from PIL import Image
import tensorflow as tf
from official.core import input_reader
from official.modeling.hyperparams import config_definitions as cfg
from official.vision.beta.dataloaders import classification_input
def _encode_image(image_array, fmt):
image = Image.fromarray(image_array)
with io.BytesIO() as output:
image.save(output, format=fmt)
return output.getvalue()
class DecoderTest(tf.test.TestCase, parameterized.TestCase):
@parameterized.parameters(
(100, 100, 0), (100, 100, 1), (100, 100, 2),
)
def test_decoder(self, image_height, image_width, num_instances):
decoder = classification_input.Decoder()
image = _encode_image(
np.uint8(np.random.rand(image_height, image_width, 3) * 255),
fmt='JPEG')
label = 2
serialized_example = tf.train.Example(
features=tf.train.Features(
feature={
'image/encoded': (tf.train.Feature(
bytes_list=tf.train.BytesList(value=[image]))),
'image/class/label': (
tf.train.Feature(
int64_list=tf.train.Int64List(value=[label]))),
})).SerializeToString()
decoded_tensors = decoder.decode(tf.convert_to_tensor(serialized_example))
results = tf.nest.map_structure(lambda x: x.numpy(), decoded_tensors)
self.assertCountEqual(
['image/encoded', 'image/class/label'], results.keys())
self.assertEqual(label, results['image/class/label'])
class ParserTest(parameterized.TestCase, tf.test.TestCase):
@parameterized.parameters(
([224, 224, 3], 'float32', True),
([224, 224, 3], 'float16', True),
([224, 224, 3], 'float32', False),
([224, 224, 3], 'float16', False),
([512, 640, 3], 'float32', True),
([512, 640, 3], 'float16', True),
([512, 640, 3], 'float32', False),
([512, 640, 3], 'float16', False),
([640, 640, 3], 'float32', True),
([640, 640, 3], 'bfloat16', True),
([640, 640, 3], 'float32', False),
([640, 640, 3], 'bfloat16', False),
)
def test_parser(self, output_size, dtype, is_training):
params = cfg.DataConfig(
input_path='/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/train*',
global_batch_size=2,
is_training=True,
examples_consume=4)
decoder = classification_input.Decoder()
parser = classification_input.Parser(
output_size=output_size[:2],
num_classes=1001,
aug_rand_hflip=False,
dtype=dtype)
reader = input_reader.InputReader(
params,
dataset_fn=tf.data.TFRecordDataset,
decoder_fn=decoder.decode,
parser_fn=parser.parse_fn(params.is_training))
dataset = reader.read()
images, labels = next(iter(dataset))
self.assertAllEqual(images.numpy().shape,
[params.global_batch_size] + output_size)
self.assertAllEqual(labels.numpy().shape, [params.global_batch_size])
if dtype == 'float32':
self.assertAllEqual(images.dtype, tf.float32)
elif dtype == 'float16':
self.assertAllEqual(images.dtype, tf.float16)
elif dtype == 'bfloat16':
self.assertAllEqual(images.dtype, tf.bfloat16)
if __name__ == '__main__':
tf.test.main()
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""The generic decoder interface."""
import abc
class Decoder(metaclass=abc.ABCMeta):
  """Decodes the raw data into tensors."""
@abc.abstractmethod
def decode(self, serialized_example):
"""Decodes the serialized example into tensors.
Args:
serialized_example: a serialized string tensor that encodes the data.
Returns:
decoded_tensors: a dict of Tensors.
"""
pass
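# A minimal concrete decoder, as an illustration only (not part of this
# commit); the feature key 'x' is a hypothetical placeholder.
import tensorflow as tf

class FloatFeatureDecoder(Decoder):
  """Example decoder that parses a single float feature from a tf.Example."""

  def decode(self, serialized_example):
    keys_to_features = {
        'x': tf.io.FixedLenFeature((), tf.float32, default_value=0.0),
    }
    return tf.io.parse_single_example(serialized_example, keys_to_features)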
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Data parser and processing for Mask R-CNN."""
# Import libraries
import tensorflow as tf
from official.vision.beta.dataloaders import parser
from official.vision.beta.dataloaders import utils
from official.vision.beta.ops import anchor
from official.vision.beta.ops import box_ops
from official.vision.beta.ops import preprocess_ops
class Parser(parser.Parser):
"""Parser to parse an image and its annotations into a dictionary of tensors."""
def __init__(self,
output_size,
min_level,
max_level,
num_scales,
aspect_ratios,
anchor_size,
rpn_match_threshold=0.7,
rpn_unmatched_threshold=0.3,
rpn_batch_size_per_im=256,
rpn_fg_fraction=0.5,
aug_rand_hflip=False,
aug_scale_min=1.0,
aug_scale_max=1.0,
skip_crowd_during_training=True,
max_num_instances=100,
include_mask=False,
mask_crop_size=112,
dtype='float32'):
"""Initializes parameters for parsing annotations in the dataset.
Args:
output_size: `Tensor` or `list` for [height, width] of output image. The
        output_size should be divisible by the largest feature stride
        2^max_level.
      min_level: `int` number of minimum level of the output feature pyramid.
      max_level: `int` number of maximum level of the output feature pyramid.
      num_scales: `int` number representing intermediate scales added
        on each level. For instance, num_scales=2 adds one additional
        intermediate anchor scale [2^0, 2^0.5] on each level.
      aspect_ratios: `list` of float numbers representing the aspect ratio
        anchors added on each level. The number indicates the ratio of width to
        height. For instance, aspect_ratios=[1.0, 2.0, 0.5] adds three anchors
        on each scale level.
      anchor_size: `float` number representing the scale of size of the base
        anchor to the feature stride 2^level.
      rpn_match_threshold: `float`, IoU threshold at or above which an anchor
        is assigned to a groundtruth box as a positive RPN example.
      rpn_unmatched_threshold: `float`, IoU threshold below which an anchor is
        treated as a negative RPN example.
      rpn_batch_size_per_im: `int`, number of anchors sampled per image for
        the RPN loss.
      rpn_fg_fraction: `float`, desired fraction of positive anchors in the
        sampled RPN batch.
      aug_rand_hflip: `bool`, if True, augment training with random
        horizontal flip.
      aug_scale_min: `float`, the minimum scale applied to `output_size` for
        data augmentation during training.
      aug_scale_max: `float`, the maximum scale applied to `output_size` for
        data augmentation during training.
      skip_crowd_during_training: `bool`, if True, skip annotations labeled
        with `is_crowd` equal to 1.
      max_num_instances: `int`, maximum number of instances in an image. The
        groundtruth data will be padded to `max_num_instances`.
      include_mask: a bool to indicate whether to parse mask groundtruth.
      mask_crop_size: the size to which the groundtruth mask is cropped.
      dtype: `str`, data type. One of {`bfloat16`, `float32`, `float16`}.
"""
self._max_num_instances = max_num_instances
self._skip_crowd_during_training = skip_crowd_during_training
# Anchor.
self._output_size = output_size
self._min_level = min_level
self._max_level = max_level
self._num_scales = num_scales
self._aspect_ratios = aspect_ratios
self._anchor_size = anchor_size
# Target assigning.
self._rpn_match_threshold = rpn_match_threshold
self._rpn_unmatched_threshold = rpn_unmatched_threshold
self._rpn_batch_size_per_im = rpn_batch_size_per_im
self._rpn_fg_fraction = rpn_fg_fraction
# Data augmentation.
self._aug_rand_hflip = aug_rand_hflip
self._aug_scale_min = aug_scale_min
self._aug_scale_max = aug_scale_max
# Mask.
self._include_mask = include_mask
self._mask_crop_size = mask_crop_size
# Image output dtype.
self._dtype = dtype
def _parse_train_data(self, data):
"""Parses data for training.
Args:
data: the decoded tensor dictionary from TfExampleDecoder.
Returns:
image: image tensor that is preprocessed to have normalized value and
        dimension [output_size[0], output_size[1], 3]
      labels: a dictionary of tensors used for training. The following describes
        {key: value} pairs in the dictionary.
        image_info: a 2D `Tensor` that encodes the information of the image and
          the applied preprocessing. It is in the format of
          [[original_height, original_width], [scaled_height, scaled_width],
          [y_scale, x_scale], [y_offset, x_offset]].
        anchor_boxes: ordered dictionary with keys
          [min_level, min_level+1, ..., max_level]. The values are tensors with
          shape [height_l, width_l, 4] representing anchor boxes at each level.
        rpn_score_targets: ordered dictionary with keys
          [min_level, min_level+1, ..., max_level]. The values are tensors with
          shape [height_l, width_l, anchors_per_location]. The height_l and
          width_l represent the dimension of class logits at l-th level.
        rpn_box_targets: ordered dictionary with keys
          [min_level, min_level+1, ..., max_level]. The values are tensors with
          shape [height_l, width_l, anchors_per_location * 4]. The height_l and
          width_l represent the dimension of bounding box regression output at
          l-th level.
        gt_boxes: Groundtruth bounding box annotations. The box is represented
          in [y1, x1, y2, x2] format. The coordinates are w.r.t the scaled
          image that is fed to the network. The tensor is padded with -1 to
          the fixed dimension [self._max_num_instances, 4].
        gt_classes: Groundtruth classes annotations. The tensor is padded
          with -1 to the fixed dimension [self._max_num_instances].
        gt_masks: Groundtruth masks cropped by the bounding box and
          resized to a fixed size determined by mask_crop_size.
"""
classes = data['groundtruth_classes']
boxes = data['groundtruth_boxes']
if self._include_mask:
masks = data['groundtruth_instance_masks']
is_crowds = data['groundtruth_is_crowd']
# Skips annotations with `is_crowd` = True.
if self._skip_crowd_during_training:
num_groundtruths = tf.shape(classes)[0]
with tf.control_dependencies([num_groundtruths, is_crowds]):
indices = tf.cond(
tf.greater(tf.size(is_crowds), 0),
lambda: tf.where(tf.logical_not(is_crowds))[:, 0],
lambda: tf.cast(tf.range(num_groundtruths), tf.int64))
classes = tf.gather(classes, indices)
boxes = tf.gather(boxes, indices)
if self._include_mask:
masks = tf.gather(masks, indices)
# Gets original image and its size.
image = data['image']
image_shape = tf.shape(image)[0:2]
# Normalizes image with mean and std pixel values.
image = preprocess_ops.normalize_image(image)
# Flips image randomly during training.
if self._aug_rand_hflip:
if self._include_mask:
image, boxes, masks = preprocess_ops.random_horizontal_flip(
image, boxes, masks)
else:
image, boxes, _ = preprocess_ops.random_horizontal_flip(
image, boxes)
# Converts boxes from normalized coordinates to pixel coordinates.
# Now the coordinates of boxes are w.r.t. the original image.
boxes = box_ops.denormalize_boxes(boxes, image_shape)
# Resizes and crops image.
image, image_info = preprocess_ops.resize_and_crop_image(
image,
self._output_size,
padded_size=preprocess_ops.compute_padded_size(
self._output_size, 2 ** self._max_level),
aug_scale_min=self._aug_scale_min,
aug_scale_max=self._aug_scale_max)
image_height, image_width, _ = image.get_shape().as_list()
# Resizes and crops boxes.
# Now the coordinates of boxes are w.r.t the scaled image.
image_scale = image_info[2, :]
offset = image_info[3, :]
boxes = preprocess_ops.resize_and_crop_boxes(
boxes, image_scale, image_info[1, :], offset)
# Filters out ground truth boxes that are all zeros.
indices = box_ops.get_non_empty_box_indices(boxes)
boxes = tf.gather(boxes, indices)
classes = tf.gather(classes, indices)
if self._include_mask:
masks = tf.gather(masks, indices)
# Transfers boxes back to the original image space and normalizes them.
cropped_boxes = boxes + tf.tile(tf.expand_dims(offset, axis=0), [1, 2])
cropped_boxes /= tf.tile(tf.expand_dims(image_scale, axis=0), [1, 2])
cropped_boxes = box_ops.normalize_boxes(cropped_boxes, image_shape)
num_masks = tf.shape(masks)[0]
masks = tf.image.crop_and_resize(
tf.expand_dims(masks, axis=-1),
cropped_boxes,
box_indices=tf.range(num_masks, dtype=tf.int32),
crop_size=[self._mask_crop_size, self._mask_crop_size],
method='bilinear')
masks = tf.squeeze(masks, axis=-1)
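# `tf.image.crop_and_resize` expects boxes in normalized coordinates of its
# input, and the instance masks still live in the original image space, so
# the boxes are mapped back (undoing offset and scale) before cropping.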
# Assigns anchor targets.
# Note that after the target assignment, box targets are absolute pixel
# offsets w.r.t. the scaled image.
input_anchor = anchor.build_anchor_generator(
min_level=self._min_level,
max_level=self._max_level,
num_scales=self._num_scales,
aspect_ratios=self._aspect_ratios,
anchor_size=self._anchor_size)
anchor_boxes = input_anchor(image_size=(image_height, image_width))
anchor_labeler = anchor.RpnAnchorLabeler(
self._rpn_match_threshold,
self._rpn_unmatched_threshold,
self._rpn_batch_size_per_im,
self._rpn_fg_fraction)
rpn_score_targets, rpn_box_targets = anchor_labeler.label_anchors(
anchor_boxes, boxes,
tf.cast(tf.expand_dims(classes, axis=-1), dtype=tf.float32))
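# `label_anchors` matches the generated anchors against the scaled
# groundtruth boxes to produce the per-level RPN score and box targets
# described in the docstring above.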
# Casts input image to self._dtype
image = tf.cast(image, dtype=self._dtype)
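# The cast happens after all preprocessing, so the ops above run in float32;
# the test below exercises this path with dtype='bfloat16'.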
# Packs labels for model_fn outputs.
labels = {
'anchor_boxes':
anchor_boxes,
'image_info':
image_info,
'rpn_score_targets':
rpn_score_targets,
'rpn_box_targets':
rpn_box_targets,
'gt_boxes':
preprocess_ops.clip_or_pad_to_fixed_size(boxes,
self._max_num_instances,
-1),
'gt_classes':
preprocess_ops.clip_or_pad_to_fixed_size(classes,
self._max_num_instances,
-1),
}
if self._include_mask:
labels['gt_masks'] = preprocess_ops.clip_or_pad_to_fixed_size(
masks, self._max_num_instances, -1)
return image, labels
def _parse_eval_data(self, data):
"""Parses data for evaluation.
Args:
data: the decoded tensor dictionary from TfExampleDecoder.
Returns:
A dictionary of {'images': image, 'labels': labels} where
image: image tensor that is preprocessed to have normalized value and
dimension [output_size[0], output_size[1], 3]
labels: a dictionary of tensors used for evaluation. The following
describes {key: value} pairs in the dictionary.
source_ids: Source image id. Default value -1 if the source id is
empty in the groundtruth annotation.
image_info: a 2D `Tensor` that encodes the information of the image
and the applied preprocessing. It is in the format of
[[original_height, original_width], [scaled_height, scaled_width],
[y_scale, x_scale], [y_offset, x_offset]].
anchor_boxes: ordered dictionary with keys
[min_level, min_level+1, ..., max_level]. The values are tensors with
shape [height_l, width_l, 4] representing anchor boxes at each
level.
groundtruths: a dictionary of padded groundtruth tensors consumed by
the evaluator (source_id, height, width, num_detections, boxes,
classes, areas, is_crowds).
"""
# Gets original image and its size.
image = data['image']
image_shape = tf.shape(image)[0:2]
# Normalizes image with mean and std pixel values.
image = preprocess_ops.normalize_image(image)
# Resizes and crops image.
image, image_info = preprocess_ops.resize_and_crop_image(
image,
self._output_size,
padded_size=preprocess_ops.compute_padded_size(
self._output_size, 2 ** self._max_level),
aug_scale_min=1.0,
aug_scale_max=1.0)
image_height, image_width, _ = image.get_shape().as_list()
# Casts input image to self._dtype
image = tf.cast(image, dtype=self._dtype)
# Converts boxes from normalized coordinates to pixel coordinates.
boxes = box_ops.denormalize_boxes(data['groundtruth_boxes'], image_shape)
# Computes anchor boxes.
input_anchor = anchor.build_anchor_generator(
min_level=self._min_level,
max_level=self._max_level,
num_scales=self._num_scales,
aspect_ratios=self._aspect_ratios,
anchor_size=self._anchor_size)
anchor_boxes = input_anchor(image_size=(image_height, image_width))
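# Since image_height and image_width come from the statically padded shape,
# the eval anchors are identical for every example at a given output size.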
labels = {
'image_info': image_info,
'anchor_boxes': anchor_boxes,
}
groundtruths = {
'source_id': data['source_id'],
'height': data['height'],
'width': data['width'],
'num_detections': tf.shape(data['groundtruth_classes']),
'boxes': boxes,
'classes': data['groundtruth_classes'],
'areas': data['groundtruth_area'],
'is_crowds': tf.cast(data['groundtruth_is_crowd'], tf.int32),
}
groundtruths['source_id'] = utils.process_source_id(
groundtruths['source_id'])
groundtruths = utils.pad_groundtruths_to_fixed_size(
groundtruths, self._max_num_instances)
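# Pads the variable-length groundtruth fields to max_num_instances so the
# per-image groundtruths can be batched into dense tensors.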
labels['groundtruths'] = groundtruths
return image, labels
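A small worked sketch (illustrative, not part of this change) of the `image_info` bookkeeping when `aug_scale_min == aug_scale_max == 1.0`, assuming the aspect-preserving resize-and-pad behavior of `preprocess_ops.resize_and_crop_image`; the image size and box values below are made up:
# Worked example: a 600x800 image resized into a 1024x1024 padded canvas
# with no scale jitter (all values illustrative).
original_height, original_width = 600, 800
scale = 1024.0 / 800.0                        # long side to target: 1.28
scaled_height = int(original_height * scale)  # 768
scaled_width = int(original_width * scale)    # 1024
offset_y, offset_x = 0.0, 0.0                 # no random crop without jitter
image_info = [[original_height, original_width],
              [scaled_height, scaled_width],
              [scale, scale],
              [offset_y, offset_x]]
# A groundtruth box in original pixel coordinates maps onto the scaled image
# by scaling then shifting, mirroring resize_and_crop_boxes.
y1, x1, y2, x2 = 100.0, 200.0, 300.0, 400.0
scaled_box = [y1 * scale - offset_y, x1 * scale - offset_x,
              y2 * scale - offset_y, x2 * scale - offset_x]  # [128, 256, 384, 512]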
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for maskrcnn_input."""
# Import libraries
from absl.testing import parameterized
import tensorflow as tf
from official.core import input_reader
from official.modeling.hyperparams import config_definitions as cfg
from official.vision.beta.dataloaders import maskrcnn_input
from official.vision.beta.dataloaders import tf_example_decoder
class InputReaderTest(parameterized.TestCase, tf.test.TestCase):
@parameterized.parameters(
([1024, 1024], True, True, True),
([1024, 1024], True, False, True),
([1024, 1024], False, True, True),
([1024, 1024], False, False, True),
([1024, 1024], True, True, False),
([1024, 1024], True, False, False),
([1024, 1024], False, True, False),
([1024, 1024], False, False, False),
)
def testMaskRCNNInputReader(self,
output_size,
skip_crowd_during_training,
include_mask,
is_training):
min_level = 3
max_level = 7
num_scales = 3
aspect_ratios = [1.0, 2.0, 0.5]
max_num_instances = 100
batch_size = 2
mask_crop_size = 112
anchor_size = 4.0
params = cfg.DataConfig(
input_path='/placer/prod/home/snaggletooth/test/data/coco/val*',
global_batch_size=batch_size,
is_training=is_training)
parser = maskrcnn_input.Parser(
output_size=output_size,
min_level=min_level,
max_level=max_level,
num_scales=num_scales,
aspect_ratios=aspect_ratios,
anchor_size=anchor_size,
rpn_match_threshold=0.7,
rpn_unmatched_threshold=0.3,
rpn_batch_size_per_im=256,
rpn_fg_fraction=0.5,
aug_rand_hflip=True,
aug_scale_min=0.8,
aug_scale_max=1.2,
skip_crowd_during_training=skip_crowd_during_training,
max_num_instances=max_num_instances,
include_mask=include_mask,
mask_crop_size=mask_crop_size,
dtype='bfloat16')
decoder = tf_example_decoder.TfExampleDecoder(include_mask=include_mask)
reader = input_reader.InputReader(
params,
dataset_fn=tf.data.TFRecordDataset,
decoder_fn=decoder.decode,
parser_fn=parser.parse_fn(params.is_training))
dataset = reader.read()
iterator = iter(dataset)
images, labels = next(iterator)
np_images = images.numpy()
np_labels = tf.nest.map_structure(lambda x: x.numpy(), labels)
if is_training:
self.assertAllEqual(np_images.shape,
[batch_size, output_size[0], output_size[1], 3])
self.assertAllEqual(np_labels['image_info'].shape, [batch_size, 4, 2])
self.assertAllEqual(np_labels['gt_boxes'].shape,
[batch_size, max_num_instances, 4])
self.assertAllEqual(np_labels['gt_classes'].shape,
[batch_size, max_num_instances])
if include_mask:
self.assertAllEqual(np_labels['gt_masks'].shape,
[batch_size, max_num_instances,
mask_crop_size, mask_crop_size])
for level in range(min_level, max_level + 1):
stride = 2 ** level
output_size_l = [output_size[0] / stride, output_size[1] / stride]
anchors_per_location = num_scales * len(aspect_ratios)
self.assertAllEqual(
np_labels['rpn_score_targets'][level].shape,
[batch_size, output_size_l[0], output_size_l[1],
anchors_per_location])
self.assertAllEqual(
np_labels['rpn_box_targets'][level].shape,
[batch_size, output_size_l[0], output_size_l[1],
4 * anchors_per_location])
self.assertAllEqual(
np_labels['anchor_boxes'][level].shape,
[batch_size, output_size_l[0], output_size_l[1],
4 * anchors_per_location])
else:
self.assertAllEqual(np_images.shape,
[batch_size, output_size[0], output_size[1], 3])
self.assertAllEqual(np_labels['image_info'].shape, [batch_size, 4, 2])
if __name__ == '__main__':
tf.test.main()
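As a quick sanity check on the shape assertions above, the expected per-level target shapes can be derived directly from the test config (a standalone sketch, not part of the test file):
# Derives the per-level shapes asserted in testMaskRCNNInputReader above.
batch_size = 2
output_size = [1024, 1024]
num_scales = 3
aspect_ratios = [1.0, 2.0, 0.5]
anchors_per_location = num_scales * len(aspect_ratios)  # 9
for level in range(3, 8):  # min_level .. max_level
  stride = 2 ** level
  h, w = output_size[0] // stride, output_size[1] // stride
  # rpn_score_targets[level]: [batch, h, w, 9]
  # rpn_box_targets[level] and anchor_boxes[level]: [batch, h, w, 36]
  print(level, [batch_size, h, w, anchors_per_location],
        [batch_size, h, w, 4 * anchors_per_location])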