Commit 1efe98bb authored by Zhichao Lu, committed by lzc5123016

Merged commit includes the following changes:

185215255  by Zhichao Lu:

    Stop populating image/object/class/text field when generating COCO tf record.

--
185213306  by Zhichao Lu:

    Use the params batch size and not the one from train_config in input_fn

--
185209081  by Zhichao Lu:

    Handle the case when there are no ground-truth masks for an image.

--
185195531  by Zhichao Lu:

    Remove unstack and stack operations on features from third_party/object_detection/model.py.

--
185195017  by Zhichao Lu:

    Matrix multiplication based gather op implementation.
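    For reference, a gather expressed as a one-hot matrix multiplication can be
    sketched in plain numpy (hypothetical helper name; the real op works on TF
    tensors inside the object detection codebase):

    ```python
    import numpy as np

    def matmul_gather(params, indices):
        # Build a one-hot selector matrix and multiply: row i of the result is
        # params[indices[i]]. Matmul-based gathers can be friendlier to
        # accelerators such as TPUs than dynamic gather ops.
        one_hot = np.eye(params.shape[0])[indices]  # [num_indices, num_rows]
        return one_hot.dot(params)

    params = np.array([[1., 2.], [3., 4.], [5., 6.]])
    result = matmul_gather(params, np.array([2, 0]))  # rows 2 and 0 of params
    ```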

--
185187744  by Zhichao Lu:

    Fix eval_util minor issue.

--
185098733  by Zhichao Lu:

    Internal change

--
185076656  by Zhichao Lu:

    Increase the number of boxes for coco17.

--
185074199  by Zhichao Lu:

    Add config for SSD Resnet50 v1 with FPN.

--
185060199  by Zhichao Lu:

    Fix a bug in clear_detections.
    This method was setting detection_keys to an empty dictionary instead of an empty set. I've refactored so that this method and the constructor use the same code path.
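    The shape of that refactor can be sketched as follows (hypothetical minimal class; the real evaluator carries more state):

    ```python
    class DetectionEvaluator(object):
        def __init__(self):
            # The constructor and clear_detections() share one code path, so the
            # field cannot drift between an empty set and an empty dict again.
            self.clear_detections()

        def clear_detections(self):
            # The original bug: `{}` creates an empty dict, not an empty set.
            self.detection_keys = set()
    ```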

--
185031359  by Zhichao Lu:

    Eval TPU trained models continuously.

--
185016591  by Zhichao Lu:

    Use TPUEstimatorSpec for TPU

--
185013651  by Zhichao Lu:

    Add PreprocessorCache to record and duplicate augmentations.

--
184921763  by Zhichao Lu:

    Minor fixes for object detection.

--
184920610  by Zhichao Lu:

    Adds a model builder test for "embedded_ssd_mobilenet_v1" feature extractor.

--
184919284  by Zhichao Lu:

    Added unit tests for TPU, with optional training / eval.

--
184915910  by Zhichao Lu:

    Update third_party g3 doc with Mask RCNN detection models.

--
184914085  by Zhichao Lu:

    Slight change to WeightSharedConvolutionalBoxPredictor implementation to make things match more closely with RetinaNet.  Specifically we now construct the box encoding and class predictor towers separately rather than having them share weights until penultimate layer.

--
184913786  by Zhichao Lu:

    Plumbs SSD Resnet V1 with FPN models into model builder.

--
184910030  by Zhichao Lu:

    Add coco metrics to evaluator.

--
184897758  by Zhichao Lu:

    Merge changes from github.

--
184888736  by Zhichao Lu:

    Ensure groundtruth_weights are always 1-D.

--
184887256  by Zhichao Lu:

    Introduce an option to add summaries in the model so it can be turned off when necessary.

--
184865559  by Zhichao Lu:

    Updating inputs so that a dictionary of tensors is returned from input_fn. Moving unbatch/unpad to model.py.
    Also removing source_id key from features dictionary, and replacing with an integer hash.
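    A sketch of such a string-to-integer hash in plain Python (hypothetical helper; the TF input pipeline would use a hash-bucket op on the string tensor instead):

    ```python
    import zlib

    def hash_source_id(source_id, num_buckets=2 ** 31 - 1):
        # Deterministically map the string source_id to an integer so the
        # features dictionary contains only numeric values.
        return zlib.crc32(source_id.encode('utf-8')) % num_buckets
    ```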

--
184859205  by Zhichao Lu:

    This CL tries to hide those differences by making the default settings work with the public code.

--
184769779  by Zhichao Lu:

    Pass groundtruth weights into ssd meta architecture all the way to target assigner.

    This will allow training ssd models with padded groundtruth tensors.

--
184767117  by Zhichao Lu:

    * Add `params` arg to make all input fns work with TPUEstimator
    * Add --master
    * Output eval results

--
184766244  by Zhichao Lu:

    Update create_coco_tf_record to include category indices

--
184752937  by Zhichao Lu:

    Create a third_party version of TPU compatible mobilenet_v2_focal_loss coco config.

--
184750174  by Zhichao Lu:

    A few small fixes for multiscale anchor generator and a test.

--
184746581  by Zhichao Lu:

    Update jupyter notebook to show mask if provided by model.

--
184728646  by Zhichao Lu:

    Adding a few more tests to make sure decoding with/without label maps performs as expected.

--
184624154  by Zhichao Lu:

    Add an object detection binary for TPU.

--
184622118  by Zhichao Lu:

    Batch, transform, and unbatch in the tflearn interface.

--
184595064  by Zhichao Lu:

    Add support for training grayscale models.

--
184532026  by Zhichao Lu:

    Change dataset_builder.build to perform optional batching using tf.data.Dataset API
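    The batching behavior can be illustrated with a pure-Python sketch (the real change uses `tf.data.Dataset.batch` inside `dataset_builder.build`; this generator only mirrors the drop-remainder semantics):

    ```python
    def batched(examples, batch_size):
        # Group an iterable of examples into fixed-size batches, dropping any
        # final partial batch, similar to Dataset.batch with a static shape.
        batch = []
        for example in examples:
            batch.append(example)
            if len(batch) == batch_size:
                yield batch
                batch = []

    batches = list(batched(range(5), 2))  # partial batch [4] is dropped
    ```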

--
184330239  by Zhichao Lu:

    Add augment_input_data and transform_input_data helper functions to third_party/tensorflow_models/object_detection/inputs.py

--
184328681  by Zhichao Lu:

    Use an internal rgb to gray method that can be quantized.

--
184327909  by Zhichao Lu:

    Helper function to return padding shapes to use with Dataset.padded_batch.
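    The idea, with hypothetical feature keys (the real keys come from the API's standard fields module), is to hand `Dataset.padded_batch` static shapes so variable-length per-image tensors pad to a fixed size:

    ```python
    def padded_shapes(height, width, max_num_boxes):
        # Static shapes for Dataset.padded_batch: boxes/classes/weights vary
        # per image, so they are padded up to max_num_boxes.
        return {
            'image': [height, width, 3],
            'groundtruth_boxes': [max_num_boxes, 4],
            'groundtruth_classes': [max_num_boxes],
            'groundtruth_weights': [max_num_boxes],
        }
    ```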

--
184326291  by Zhichao Lu:

    Added decode_func for specialized decoding.

--
184314676  by Zhichao Lu:

    Add unstack_batch method to inputs.py.

    This will enable us to convert batched tensors to lists of tensors. This is compatible with OD API that consumes groundtruth batch as a list of tensors.
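    The unstacking step can be sketched in numpy (hypothetical helper name; the real `unstack_batch` operates on TF tensors and also unpads):

    ```python
    import numpy as np

    def unstack_batch(tensor_dict):
        # Split each [batch, ...] array into a list of per-example arrays,
        # matching the OD API convention of a list of groundtruth tensors.
        batch_size = next(iter(tensor_dict.values())).shape[0]
        return {key: [value[i] for i in range(batch_size)]
                for key, value in tensor_dict.items()}

    batch = {'groundtruth_boxes': np.zeros((2, 3, 4))}
    unstacked = unstack_batch(batch)  # a list of two [3, 4] arrays
    ```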

--
184281269  by Zhichao Lu:

    Internal test target changes.

--
184192851  by Zhichao Lu:

    Adding `Estimator` interface for object detection.

--
184187885  by Zhichao Lu:

    Add config_util functions to help with input pipeline.

    1. function to return expected shapes from the resizer config
    2. function to extract image_resizer_config from model_config.

--
184139892  by Zhichao Lu:

    Adding support for depthwise SSD (ssd-lite) and depthwise box predictions.

--
184089891  by Zhichao Lu:

    Fix third_party faster rcnn resnet101 coco config.

--
184083378  by Zhichao Lu:

    In the case when there is no object/weights field in tf.Example proto, return a default weight of 1.0 for all boxes.
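    A minimal sketch of that fallback in numpy (hypothetical helper; the real decoder does this while parsing the tf.Example):

    ```python
    import numpy as np

    def default_box_weights(boxes, weights=None):
        # If the tf.Example had no object/weights field, every box gets
        # weight 1.0 so downstream losses treat all boxes equally.
        if weights is None:
            weights = np.ones(boxes.shape[0], dtype=np.float32)
        return weights
    ```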

--

PiperOrigin-RevId: 185215255
parent fbc5ba06
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Common utils for tests for object detection tflearn model."""
from __future__ import absolute_import
import os
import tempfile
import tensorflow as tf
from object_detection import model
from object_detection import model_hparams
FLAGS = tf.flags.FLAGS
FASTER_RCNN_MODEL_NAME = 'faster_rcnn_resnet50_pets'
SSD_INCEPTION_MODEL_NAME = 'ssd_inception_v2_pets'
PATH_BASE = 'google3/third_party/tensorflow_models/object_detection/'
def GetPipelineConfigPath(model_name):
  """Returns path to the local pipeline config file."""
  return os.path.join(FLAGS.test_srcdir, PATH_BASE, 'samples', 'configs',
                      model_name + '.config')


def InitializeFlags(model_name_for_test):
  """Sets model_dir and pipeline_config_path flags for a test model."""
  FLAGS.model_dir = tempfile.mkdtemp()
  FLAGS.pipeline_config_path = GetPipelineConfigPath(model_name_for_test)


def BuildExperiment():
  """Builds an Experiment object for testing purposes."""
  run_config = tf.contrib.learn.RunConfig()
  hparams = model_hparams.create_hparams(
      hparams_overrides='load_pretrained=false')
  # pylint: disable=protected-access
  experiment_fn = model._build_experiment_fn(10, 10)
  # pylint: enable=protected-access
  return experiment_fn(run_config, hparams)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Creates and runs `Estimator` for object detection model on TPUs.

This uses the TPUEstimator API to define and run a model in TRAIN/EVAL modes.
"""
# pylint: enable=line-too-long
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import os
import tensorflow as tf
from tensorflow.contrib.tpu.python.tpu import tpu_config
from tensorflow.contrib.tpu.python.tpu import tpu_estimator
from tensorflow.contrib.training.python.training import evaluation
from object_detection import inputs
from object_detection import model
from object_detection import model_hparams
from object_detection.builders import model_builder
from object_detection.utils import config_util
tf.flags.DEFINE_bool('use_tpu', True, 'Use TPUs rather than plain CPUs')
# Cloud TPU Cluster Resolvers
tf.flags.DEFINE_string(
    'gcp_project',
    default=None,
    help='Project name for the Cloud TPU-enabled project. If not specified, we '
    'will attempt to automatically detect the GCE project from metadata.')
tf.flags.DEFINE_string(
    'tpu_zone',
    default=None,
    help='GCE zone where the Cloud TPU is located. If not specified, we will '
    'attempt to automatically detect the GCE project from metadata.')
tf.flags.DEFINE_string(
    'tpu_name',
    default=None,
    help='Name of the Cloud TPU for Cluster Resolvers. You must specify either '
    'this flag or --master.')
tf.flags.DEFINE_string(
    'master', default=None,
    help='GRPC URL of the master (e.g. grpc://ip.address.of.tpu:8470). You '
    'must specify either this flag or --tpu_name.')

tf.flags.DEFINE_integer('num_shards', 8, 'Number of shards (TPU cores).')
tf.flags.DEFINE_integer('iterations_per_loop', 100,
                        'Number of iterations per TPU training loop.')
# For mode=train_and_eval, evaluation occurs after training is finished.
# Note: independently of steps_per_checkpoint, estimator will save the most
# recent checkpoint every 10 minutes by default for train_and_eval.
tf.flags.DEFINE_string('mode', 'train_and_eval',
                       'Mode to run: train, eval, train_and_eval')
tf.flags.DEFINE_integer('train_batch_size', 32 * 8, 'Batch size for training.')

# For EVAL.
tf.flags.DEFINE_integer('min_eval_interval_secs', 180,
                        'Minimum seconds between evaluations.')
tf.flags.DEFINE_integer(
    'eval_timeout_secs', None,
    'Maximum seconds between checkpoints before evaluation terminates.')
FLAGS = tf.flags.FLAGS
def create_estimator(run_config,
                     hparams,
                     pipeline_config_path,
                     train_steps=None,
                     eval_steps=None,
                     train_batch_size=None,
                     model_fn_creator=model.create_model_fn,
                     use_tpu=False,
                     num_shards=1,
                     params=None,
                     **kwargs):
  """Creates an `Estimator` object.

  Args:
    run_config: A `RunConfig`.
    hparams: A `HParams`.
    pipeline_config_path: A path to a pipeline config file.
    train_steps: Number of training steps. If None, the number of training
      steps is set from the `TrainConfig` proto.
    eval_steps: Number of evaluation steps per evaluation cycle. If None, the
      number of evaluation steps is set from the `EvalConfig` proto.
    train_batch_size: Training batch size. If None, use the batch size from
      the `TrainConfig` proto.
    model_fn_creator: A function that creates a `model_fn` for `Estimator`.
      Follows the signature:
      * Args:
        * `detection_model_fn`: Function that returns `DetectionModel`
          instance.
        * `configs`: Dictionary of pipeline config objects.
        * `hparams`: `HParams` object.
      * Returns:
        `model_fn` for `Estimator`.
    use_tpu: Boolean, whether training and evaluation should run on TPU.
    num_shards: Number of shards (TPU cores).
    params: Parameter dictionary passed from the estimator.
    **kwargs: Additional keyword arguments for configuration override.

  Returns:
    estimator: An `Estimator` object used for training and evaluation.
    train_input_fn: Input function for the training loop.
    eval_input_fn: Input function for the evaluation run.
    train_steps: Number of training steps, either from the `train_steps` arg
      or the `TrainConfig` proto.
    eval_steps: Number of evaluation steps, either from the `eval_steps` arg
      or the `EvalConfig` proto.
  """
  configs = config_util.get_configs_from_pipeline_file(pipeline_config_path)
  configs = config_util.merge_external_params_with_configs(
      configs,
      hparams,
      train_steps=train_steps,
      eval_steps=eval_steps,
      batch_size=train_batch_size,
      **kwargs)
  model_config = configs['model']
  train_config = configs['train_config']
  train_input_config = configs['train_input_config']
  eval_config = configs['eval_config']
  eval_input_config = configs['eval_input_config']

  if params is None:
    params = {}
  if train_steps is None:
    train_steps = train_config.num_steps if train_config.num_steps else None
  if eval_steps is None:
    eval_steps = eval_config.num_examples if eval_config.num_examples else None

  detection_model_fn = functools.partial(
      model_builder.build, model_config=model_config)

  # Create the input functions for TRAIN/EVAL.
  train_input_fn = inputs.create_train_input_fn(
      train_config=train_config,
      train_input_config=train_input_config,
      model_config=model_config)
  eval_input_fn = inputs.create_eval_input_fn(
      eval_config=eval_config,
      eval_input_config=eval_input_config,
      model_config=model_config)

  estimator = tpu_estimator.TPUEstimator(
      model_fn=model_fn_creator(detection_model_fn, configs, hparams, use_tpu),
      train_batch_size=train_config.batch_size,
      # For each core, only batch size 1 is supported for eval.
      eval_batch_size=num_shards * 1 if use_tpu else 1,
      use_tpu=use_tpu,
      config=run_config,
      params=params)
  return estimator, train_input_fn, eval_input_fn, train_steps, eval_steps
def main(unused_argv):
  tf.flags.mark_flag_as_required('model_dir')
  tf.flags.mark_flag_as_required('pipeline_config_path')

  if FLAGS.master is None and FLAGS.tpu_name is None:
    raise RuntimeError('You must specify either --master or --tpu_name.')
  if FLAGS.master is not None:
    if FLAGS.tpu_name is not None:
      tf.logging.warn('Both --master and --tpu_name are set. Ignoring '
                      '--tpu_name and using --master.')
    tpu_grpc_url = FLAGS.master
  else:
    tpu_cluster_resolver = (
        tf.contrib.cluster_resolver.python.training.TPUClusterResolver(
            tpu_names=[FLAGS.tpu_name],
            zone=FLAGS.tpu_zone,
            project=FLAGS.gcp_project))
    tpu_grpc_url = tpu_cluster_resolver.get_master()

  config = tpu_config.RunConfig(
      master=tpu_grpc_url,
      evaluation_master=tpu_grpc_url,
      model_dir=FLAGS.model_dir,
      tpu_config=tpu_config.TPUConfig(
          iterations_per_loop=FLAGS.iterations_per_loop,
          num_shards=FLAGS.num_shards))
  params = {}
  estimator, train_input_fn, eval_input_fn, train_steps, eval_steps = (
      create_estimator(
          config,
          model_hparams.create_hparams(),
          FLAGS.pipeline_config_path,
          train_steps=FLAGS.num_train_steps,
          eval_steps=FLAGS.num_eval_steps,
          train_batch_size=FLAGS.train_batch_size,
          use_tpu=FLAGS.use_tpu,
          num_shards=FLAGS.num_shards,
          params=params))

  if FLAGS.mode in ['train', 'train_and_eval']:
    estimator.train(input_fn=train_input_fn, max_steps=train_steps)
  if FLAGS.mode == 'train_and_eval':
    # Eval one time.
    eval_results = estimator.evaluate(input_fn=eval_input_fn, steps=eval_steps)
    tf.logging.info('Eval results: %s' % eval_results)

  # Continuously evaluating.
  if FLAGS.mode == 'eval':

    def terminate_eval():
      tf.logging.info('Terminating eval after %d seconds of no checkpoints' %
                      FLAGS.eval_timeout_secs)
      return True

    # Run evaluation when there's a new checkpoint.
    for ckpt in evaluation.checkpoints_iterator(
        FLAGS.model_dir,
        min_interval_secs=FLAGS.min_eval_interval_secs,
        timeout=FLAGS.eval_timeout_secs,
        timeout_fn=terminate_eval):
      tf.logging.info('Starting to evaluate.')
      try:
        eval_results = estimator.evaluate(
            input_fn=eval_input_fn,
            steps=eval_steps,
            checkpoint_path=ckpt)
        tf.logging.info('Eval results: %s' % eval_results)

        # Terminate eval job when final checkpoint is reached.
        current_step = int(os.path.basename(ckpt).split('-')[1])
        if current_step >= train_steps:
          tf.logging.info(
              'Evaluation finished after training step %d' % current_step)
          break
      except tf.errors.NotFoundError:
        tf.logging.info(
            'Checkpoint %s no longer exists, skipping checkpoint' % ckpt)


if __name__ == '__main__':
  tf.app.run()
@@ -119,6 +119,7 @@ py_library(
 py_test(
     name = "ssd_resnet_v1_fpn_feature_extractor_test",
+    timeout = "long",
     srcs = ["ssd_resnet_v1_fpn_feature_extractor_test.py"],
     deps = [
         ":ssd_resnet_v1_fpn_feature_extractor",
...
@@ -52,7 +52,8 @@ class EmbeddedSSDMobileNetV1FeatureExtractor(
                conv_hyperparams,
                batch_norm_trainable=True,
                reuse_weights=None,
-               use_explicit_padding=False):
+               use_explicit_padding=False,
+               use_depthwise=False):
     """MobileNetV1 Feature Extractor for Embedded-friendly SSD Models.

     Args:
@@ -69,6 +70,7 @@ class EmbeddedSSDMobileNetV1FeatureExtractor(
       reuse_weights: Whether to reuse variables. Default is None.
       use_explicit_padding: Whether to use explicit padding when extracting
         features. Default is False.
+      use_depthwise: Whether to use depthwise convolutions. Default is False.

     Raises:
       ValueError: upon invalid `pad_to_multiple` values.
@@ -80,7 +82,7 @@ class EmbeddedSSDMobileNetV1FeatureExtractor(
     super(EmbeddedSSDMobileNetV1FeatureExtractor, self).__init__(
         is_training, depth_multiplier, min_depth, pad_to_multiple,
         conv_hyperparams, batch_norm_trainable, reuse_weights,
-        use_explicit_padding)
+        use_explicit_padding, use_depthwise)

   def extract_features(self, preprocessed_inputs):
     """Extract features from preprocessed inputs.
@@ -119,6 +121,7 @@ class EmbeddedSSDMobileNetV1FeatureExtractor(
         'layer_depth': [-1, -1, 512, 256, 256],
         'conv_kernel_size': [-1, -1, 3, 3, 2],
         'use_explicit_padding': self._use_explicit_padding,
+        'use_depthwise': self._use_depthwise,
     }

     with slim.arg_scope(self._conv_hyperparams):
...
@@ -36,7 +36,8 @@ class SSDInceptionV2FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
                conv_hyperparams,
                batch_norm_trainable=True,
                reuse_weights=None,
-               use_explicit_padding=False):
+               use_explicit_padding=False,
+               use_depthwise=False):
     """InceptionV2 Feature Extractor for SSD Models.

     Args:
@@ -53,11 +54,12 @@ class SSDInceptionV2FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
       reuse_weights: Whether to reuse variables. Default is None.
       use_explicit_padding: Whether to use explicit padding when extracting
         features. Default is False.
+      use_depthwise: Whether to use depthwise convolutions. Default is False.
     """
     super(SSDInceptionV2FeatureExtractor, self).__init__(
         is_training, depth_multiplier, min_depth, pad_to_multiple,
         conv_hyperparams, batch_norm_trainable, reuse_weights,
-        use_explicit_padding)
+        use_explicit_padding, use_depthwise)

   def preprocess(self, resized_inputs):
     """SSD preprocessing.
@@ -92,6 +94,7 @@ class SSDInceptionV2FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
         'from_layer': ['Mixed_4c', 'Mixed_5c', '', '', '', ''],
         'layer_depth': [-1, -1, 512, 256, 256, 128],
         'use_explicit_padding': self._use_explicit_padding,
+        'use_depthwise': self._use_depthwise,
     }

     with slim.arg_scope(self._conv_hyperparams):
...
@@ -36,7 +36,8 @@ class SSDInceptionV3FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
                conv_hyperparams,
                batch_norm_trainable=True,
                reuse_weights=None,
-               use_explicit_padding=False):
+               use_explicit_padding=False,
+               use_depthwise=False):
     """InceptionV3 Feature Extractor for SSD Models.

     Args:
@@ -53,11 +54,12 @@ class SSDInceptionV3FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
       reuse_weights: Whether to reuse variables. Default is None.
       use_explicit_padding: Whether to use explicit padding when extracting
         features. Default is False.
+      use_depthwise: Whether to use depthwise convolutions. Default is False.
     """
     super(SSDInceptionV3FeatureExtractor, self).__init__(
         is_training, depth_multiplier, min_depth, pad_to_multiple,
         conv_hyperparams, batch_norm_trainable, reuse_weights,
-        use_explicit_padding)
+        use_explicit_padding, use_depthwise)

   def preprocess(self, resized_inputs):
     """SSD preprocessing.
@@ -92,6 +94,7 @@ class SSDInceptionV3FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
         'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
         'layer_depth': [-1, -1, -1, 512, 256, 128],
         'use_explicit_padding': self._use_explicit_padding,
+        'use_depthwise': self._use_depthwise,
     }

     with slim.arg_scope(self._conv_hyperparams):
...
@@ -37,7 +37,8 @@ class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
                conv_hyperparams,
                batch_norm_trainable=True,
                reuse_weights=None,
-               use_explicit_padding=False):
+               use_explicit_padding=False,
+               use_depthwise=False):
     """MobileNetV1 Feature Extractor for SSD Models.

     Args:
@@ -54,11 +55,12 @@ class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
       reuse_weights: Whether to reuse variables. Default is None.
       use_explicit_padding: Whether to use explicit padding when extracting
         features. Default is False.
+      use_depthwise: Whether to use depthwise convolutions. Default is False.
     """
     super(SSDMobileNetV1FeatureExtractor, self).__init__(
         is_training, depth_multiplier, min_depth, pad_to_multiple,
         conv_hyperparams, batch_norm_trainable, reuse_weights,
-        use_explicit_padding)
+        use_explicit_padding, use_depthwise)

   def preprocess(self, resized_inputs):
     """SSD preprocessing.
@@ -94,6 +96,7 @@ class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
                        '', ''],
         'layer_depth': [-1, -1, 512, 256, 256, 128],
         'use_explicit_padding': self._use_explicit_padding,
+        'use_depthwise': self._use_depthwise,
     }

     with slim.arg_scope(self._conv_hyperparams):
...
...@@ -25,9 +25,11 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor): ...@@ -25,9 +25,11 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
conv_hyperparams, conv_hyperparams,
resnet_base_fn, resnet_base_fn,
resnet_scope_name, resnet_scope_name,
fpn_scope_name,
batch_norm_trainable=True, batch_norm_trainable=True,
reuse_weights=None, reuse_weights=None,
use_explicit_padding=False): use_explicit_padding=False,
use_depthwise=False):
"""SSD FPN feature extractor based on Resnet v1 architecture. """SSD FPN feature extractor based on Resnet v1 architecture.
Args: Args:
...@@ -39,7 +41,9 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor): ...@@ -39,7 +41,9 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
width dimensions to. width dimensions to.
conv_hyperparams: tf slim arg_scope for conv2d and separable_conv2d ops. conv_hyperparams: tf slim arg_scope for conv2d and separable_conv2d ops.
resnet_base_fn: base resnet network to use. resnet_base_fn: base resnet network to use.
resnet_scope_name: scope name to construct resnet resnet_scope_name: scope name under which to construct resnet
fpn_scope_name: scope name under which to construct the feature pyramid
network.
batch_norm_trainable: Whether to update batch norm parameters during batch_norm_trainable: Whether to update batch norm parameters during
training or not. When training with a small batch size training or not. When training with a small batch size
(e.g. 1), it is desirable to disable batch norm update and use (e.g. 1), it is desirable to disable batch norm update and use
...@@ -47,6 +51,7 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor): ...@@ -47,6 +51,7 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
reuse_weights: Whether to reuse variables. Default is None. reuse_weights: Whether to reuse variables. Default is None.
use_explicit_padding: Whether to use explicit padding when extracting use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False. UNUSED currently. features. Default is False. UNUSED currently.
use_depthwise: Whether to use depthwise convolutions. UNUSED currently.
Raises: Raises:
ValueError: On supplying invalid arguments for unused arguments. ValueError: On supplying invalid arguments for unused arguments.
...@@ -62,6 +67,7 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor): ...@@ -62,6 +67,7 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
raise ValueError('Explicit padding is not a valid option.') raise ValueError('Explicit padding is not a valid option.')
self._resnet_base_fn = resnet_base_fn self._resnet_base_fn = resnet_base_fn
self._resnet_scope_name = resnet_scope_name self._resnet_scope_name = resnet_scope_name
self._fpn_scope_name = fpn_scope_name
def preprocess(self, resized_inputs): def preprocess(self, resized_inputs):
"""SSD preprocessing. """SSD preprocessing.
...@@ -124,6 +130,7 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor): ...@@ -124,6 +130,7 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
scope=scope) scope=scope)
image_features = self._filter_features(image_features) image_features = self._filter_features(image_features)
last_feature_map = image_features['block4'] last_feature_map = image_features['block4']
with tf.variable_scope(self._fpn_scope_name, reuse=self._reuse_weights):
with slim.arg_scope(self._conv_hyperparams): with slim.arg_scope(self._conv_hyperparams):
for i in range(5, 7): for i in range(5, 7):
last_feature_map = slim.conv2d( last_feature_map = slim.conv2d(
...@@ -154,7 +161,8 @@ class SSDResnet50V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor): ...@@ -154,7 +161,8 @@ class SSDResnet50V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor):
conv_hyperparams, conv_hyperparams,
batch_norm_trainable=True, batch_norm_trainable=True,
reuse_weights=None, reuse_weights=None,
use_explicit_padding=False): use_explicit_padding=False,
use_depthwise=False):
"""Resnet50 v1 FPN Feature Extractor for SSD Models. """Resnet50 v1 FPN Feature Extractor for SSD Models.
Args: Args:
...@@ -170,11 +178,12 @@ class SSDResnet50V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor): ...@@ -170,11 +178,12 @@ class SSDResnet50V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor):
pretrained batch norm params. pretrained batch norm params.
reuse_weights: Whether to reuse variables. Default is None. reuse_weights: Whether to reuse variables. Default is None.
use_explicit_padding: Whether to use explicit padding when extracting use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False. features. Default is False. UNUSED currently.
use_depthwise: Whether to use depthwise convolutions. UNUSED currently.
""" """
super(SSDResnet50V1FpnFeatureExtractor, self).__init__( super(SSDResnet50V1FpnFeatureExtractor, self).__init__(
is_training, depth_multiplier, min_depth, pad_to_multiple, is_training, depth_multiplier, min_depth, pad_to_multiple,
conv_hyperparams, resnet_v1.resnet_v1_50, 'resnet_v1_50_fpn', conv_hyperparams, resnet_v1.resnet_v1_50, 'resnet_v1_50', 'fpn',
batch_norm_trainable, reuse_weights, use_explicit_padding) batch_norm_trainable, reuse_weights, use_explicit_padding)
...@@ -188,7 +197,8 @@ class SSDResnet101V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor): ...@@ -188,7 +197,8 @@ class SSDResnet101V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor):
conv_hyperparams, conv_hyperparams,
batch_norm_trainable=True, batch_norm_trainable=True,
reuse_weights=None, reuse_weights=None,
use_explicit_padding=False): use_explicit_padding=False,
use_depthwise=False):
"""Resnet101 v1 FPN Feature Extractor for SSD Models. """Resnet101 v1 FPN Feature Extractor for SSD Models.
Args: Args:
...@@ -204,11 +214,12 @@ class SSDResnet101V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor):
        pretrained batch norm params.
      reuse_weights: Whether to reuse variables. Default is None.
      use_explicit_padding: Whether to use explicit padding when extracting
        features. Default is False. UNUSED currently.
      use_depthwise: Whether to use depthwise convolutions. UNUSED currently.
    """
    super(SSDResnet101V1FpnFeatureExtractor, self).__init__(
        is_training, depth_multiplier, min_depth, pad_to_multiple,
        conv_hyperparams, resnet_v1.resnet_v1_101, 'resnet_v1_101', 'fpn',
        batch_norm_trainable, reuse_weights, use_explicit_padding)
...@@ -222,7 +233,8 @@ class SSDResnet152V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor):
               conv_hyperparams,
               batch_norm_trainable=True,
               reuse_weights=None,
               use_explicit_padding=False,
               use_depthwise=False):
    """Resnet152 v1 FPN Feature Extractor for SSD Models.

    Args:
...@@ -238,9 +250,10 @@ class SSDResnet152V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor):
        pretrained batch norm params.
      reuse_weights: Whether to reuse variables. Default is None.
      use_explicit_padding: Whether to use explicit padding when extracting
        features. Default is False. UNUSED currently.
      use_depthwise: Whether to use depthwise convolutions. UNUSED currently.
    """
    super(SSDResnet152V1FpnFeatureExtractor, self).__init__(
        is_training, depth_multiplier, min_depth, pad_to_multiple,
        conv_hyperparams, resnet_v1.resnet_v1_152, 'resnet_v1_152', 'fpn',
        batch_norm_trainable, reuse_weights, use_explicit_padding)
...@@ -7,7 +7,7 @@ from object_detection.models import ssd_resnet_v1_fpn_feature_extractor_testbase
class SSDResnet50V1FeatureExtractorTest(
    ssd_resnet_v1_fpn_feature_extractor_testbase.
    SSDResnetFPNFeatureExtractorTestBase):
  """SSDResnet50v1Fpn feature extractor test."""

  def _create_feature_extractor(self, depth_multiplier, pad_to_multiple):
...@@ -19,13 +19,13 @@ class SSDResnet50V1FeatureExtractorTest(
        is_training, depth_multiplier, min_depth, pad_to_multiple,
        conv_hyperparams, batch_norm_trainable)

  def _resnet_scope_name(self):
    return 'resnet_v1_50'
class SSDResnet101V1FeatureExtractorTest(
    ssd_resnet_v1_fpn_feature_extractor_testbase.
    SSDResnetFPNFeatureExtractorTestBase):
  """SSDResnet101v1Fpn feature extractor test."""

  def _create_feature_extractor(self, depth_multiplier, pad_to_multiple):
...@@ -38,13 +38,13 @@ class SSDResnet101V1FeatureExtractorTest(
        is_training, depth_multiplier, min_depth, pad_to_multiple,
        conv_hyperparams, batch_norm_trainable))

  def _resnet_scope_name(self):
    return 'resnet_v1_101'
class SSDResnet152V1FeatureExtractorTest(
    ssd_resnet_v1_fpn_feature_extractor_testbase.
    SSDResnetFPNFeatureExtractorTestBase):
  """SSDResnet152v1Fpn feature extractor test."""

  def _create_feature_extractor(self, depth_multiplier, pad_to_multiple):
...@@ -57,8 +57,8 @@ class SSDResnet152V1FeatureExtractorTest(
        is_training, depth_multiplier, min_depth, pad_to_multiple,
        conv_hyperparams, batch_norm_trainable))

  def _resnet_scope_name(self):
    return 'resnet_v1_152'
if __name__ == '__main__':
...
"""Tests for ssd resnet v1 FPN feature extractors."""
import abc
import numpy as np
import tensorflow as tf

from object_detection.models import ssd_feature_extractor_test


class SSDResnetFPNFeatureExtractorTestBase(
    ssd_feature_extractor_test.SsdFeatureExtractorTestBase):
  """Helper test class for SSD Resnet v1 FPN feature extractors."""

  @abc.abstractmethod
  def _resnet_scope_name(self):
    pass

  @abc.abstractmethod
  def _fpn_scope_name(self):
    return 'fpn'
  def test_extract_features_returns_correct_shapes_256(self):
    image_height = 256
    image_width = 256
...@@ -73,5 +78,16 @@ class SSDResnetFPNFeatureExtractorTestBase(
  def test_variables_only_created_in_scope(self):
    depth_multiplier = 1
    pad_to_multiple = 1
    g = tf.Graph()
    with g.as_default():
      feature_extractor = self._create_feature_extractor(
          depth_multiplier, pad_to_multiple)
      preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
      feature_extractor.extract_features(preprocessed_inputs)
      variables = g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
      for variable in variables:
        self.assertTrue(
            variable.name.startswith(self._resnet_scope_name())
            or variable.name.startswith(self._fpn_scope_name()))
...@@ -35,6 +35,7 @@
    "from io import StringIO\n",
    "from matplotlib import pyplot as plt\n",
    "from PIL import Image\n",
    "from object_detection.utils import ops as utils_ops\n",
    "\n",
    "if tf.__version__ < '1.4.0':\n",
    "  raise ImportError('Please upgrade your tensorflow installation to v1.4.* or later!')\n"
...@@ -223,6 +224,59 @@
    "IMAGE_SIZE = (12, 8)"
   ]
  },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def run_inference_for_single_image(image, graph):\n",
" with graph.as_default():\n",
" with tf.Session() as sess:\n",
" # Get handles to input and output tensors\n",
" ops = tf.get_default_graph().get_operations()\n",
" all_tensor_names = {output.name for op in ops for output in op.outputs}\n",
" tensor_dict = {}\n",
" for key in [\n",
" 'num_detections', 'detection_boxes', 'detection_scores',\n",
" 'detection_classes', 'detection_masks'\n",
" ]:\n",
" tensor_name = key + ':0'\n",
" if tensor_name in all_tensor_names:\n",
" tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(\n",
" tensor_name)\n",
" if 'detection_masks' in tensor_dict:\n",
" # The following processing is only for single image\n",
" detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])\n",
" detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])\n",
" # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.\n",
" real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)\n",
" detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])\n",
" detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])\n",
" detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(\n",
" detection_masks, detection_boxes, image.shape[0], image.shape[1])\n",
" detection_masks_reframed = tf.cast(\n",
" tf.greater(detection_masks_reframed, 0.5), tf.uint8)\n",
" # Follow the convention by adding back the batch dimension\n",
" tensor_dict['detection_masks'] = tf.expand_dims(\n",
" detection_masks_reframed, 0)\n",
" image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')\n",
"\n",
" # Run inference\n",
" output_dict = sess.run(tensor_dict,\n",
" feed_dict={image_tensor: np.expand_dims(image, 0)})\n",
"\n",
" # all outputs are float32 numpy arrays, so convert types as appropriate\n",
" output_dict['num_detections'] = int(output_dict['num_detections'][0])\n",
" output_dict['detection_classes'] = output_dict[\n",
" 'detection_classes'][0].astype(np.uint8)\n",
" output_dict['detection_boxes'] = output_dict['detection_boxes'][0]\n",
" output_dict['detection_scores'] = output_dict['detection_scores'][0]\n",
" if 'detection_masks' in output_dict:\n",
" output_dict['detection_masks'] = output_dict['detection_masks'][0]\n",
" return output_dict"
]
},
  {
   "cell_type": "code",
   "execution_count": null,
...@@ -231,39 +285,27 @@
   },
   "outputs": [],
   "source": [
    "for image_path in TEST_IMAGE_PATHS:\n",
    "  image = Image.open(image_path)\n",
    "  # the array based representation of the image will be used later in order to prepare the\n",
    "  # result image with boxes and labels on it.\n",
    "  image_np = load_image_into_numpy_array(image)\n",
    "  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]\n",
    "  image_np_expanded = np.expand_dims(image_np, axis=0)\n",
    "  # Actual detection.\n",
    "  output_dict = run_inference_for_single_image(image_np, detection_graph)\n",
    "  # Visualization of the results of a detection.\n",
    "  vis_util.visualize_boxes_and_labels_on_image_array(\n",
    "      image_np,\n",
    "      output_dict['detection_boxes'],\n",
    "      output_dict['detection_classes'],\n",
    "      output_dict['detection_scores'],\n",
    "      category_index,\n",
    "      instance_masks=output_dict.get('detection_masks'),\n",
    "      use_normalized_coordinates=True,\n",
    "      line_thickness=8)\n",
    "  plt.figure(figsize=IMAGE_SIZE)\n",
    "  plt.imshow(image_np)"
   ]
  },
  {
...@@ -275,6 +317,9 @@
  }
 ],
 "metadata": {
"colab": {
"version": "0.3.2"
},
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
...
...@@ -22,4 +22,8 @@ message ArgMaxMatcher {
  // Whether to ensure each row is matched to at least one column.
  optional bool force_match_for_each_row = 5 [default = false];
// Force constructed match objects to use matrix multiplication based gather
// instead of standard tf.gather
optional bool use_matmul_gather = 6 [default = false];
}
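The `use_matmul_gather` option refers to the matmul-based gather added in this merge (change 185195017). The core trick can be sketched in NumPy with a hypothetical helper name; expressing gather as a matrix multiply avoids dynamic gather ops, which tends to be friendlier to accelerators such as TPUs:

```python
import numpy as np

def matmul_gather_on_zeroth_axis(params, indices):
    """Gathers rows of `params` via multiplication with a one-hot matrix.

    Illustrative sketch only; the real op is implemented in TensorFlow.
    """
    params = np.asarray(params, dtype=np.float32)
    # one_hot[i, j] == 1 exactly when indices[i] == j, so the matmul
    # selects row indices[i] of params into output row i.
    one_hot = np.eye(params.shape[0], dtype=np.float32)[indices]
    flat = params.reshape(params.shape[0], -1)
    return one_hot.dot(flat).reshape((len(indices),) + params.shape[1:])
```

For example, gathering indices `[2, 0]` from `[[1, 2], [3, 4], [5, 6]]` returns the same rows as `params[indices]` would.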
...@@ -5,4 +5,7 @@ package object_detection.protos;
// Configuration proto for bipartite matcher. See
// matchers/bipartite_matcher.py for details.
message BipartiteMatcher {
// Force constructed match objects to use matrix multiplication based gather
// instead of standard tf.gather
optional bool use_matmul_gather = 6 [default = false];
}
...@@ -52,6 +52,9 @@ message ConvolutionalBoxPredictor {
  optional bool apply_sigmoid_to_scores = 9 [default = false];
  optional float class_prediction_bias_init = 10 [default = 0.0];
// Whether to use depthwise separable convolution for box predictor layers.
optional bool use_depthwise = 11 [default = false];
}
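The `use_depthwise` option trades a dense k×k convolution for a depthwise filter followed by a 1×1 pointwise convolution. A quick parameter-count comparison (an illustrative sketch, not code from this change) shows why this shrinks the predictor:

```python
def conv_param_counts(kernel_size, channels_in, channels_out):
    # Dense conv: every output channel mixes all input channels at every tap.
    dense = kernel_size * kernel_size * channels_in * channels_out
    # Depthwise separable: one k x k spatial filter per input channel,
    # then a 1x1 pointwise conv to mix channels.
    separable = (kernel_size * kernel_size * channels_in
                 + channels_in * channels_out)
    return dense, separable

# Example: a 3x3 conv from 32 to 64 channels.
dense, separable = conv_param_counts(3, 32, 64)  # 18432 vs. 2336 parameters
```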
// Configuration proto for weight shared convolutional box predictor.
...
...@@ -34,6 +34,9 @@ message KeepAspectRatioResizer {
  // [max_dimension, max_dimension]. Note that the zeros are padded to the
  // bottom and the right of the resized image.
  optional bool pad_to_max_dimension = 4 [default = false];
// Whether to also resize the image channels from 3 to 1 (RGB to grayscale).
optional bool convert_to_grayscale = 5 [default = false];
}

// Configuration proto for image resizer that resizes to a fixed shape.
...@@ -46,4 +49,7 @@ message FixedShapeResizer {
  // Desired method when resizing image.
  optional ResizeType resize_method = 3 [default = BILINEAR];
// Whether to also resize the image channels from 3 to 1 (RGB to grayscale).
optional bool convert_to_grayscale = 4 [default = false];
}
...@@ -19,4 +19,5 @@ message MultiscaleAnchorGenerator {
  repeated float aspect_ratios = 4;

  // Number of intermediate scales per scale octave.
optional int32 scales_per_octave = 5 [default = 2];
}
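With `scales_per_octave` intermediate scales, the multiscale anchor generator places anchors at geometrically spaced scales within each octave. A sketch of the scale computation (illustrative, assuming the RetinaNet-style convention; not the generator's actual code):

```python
def octave_scales(scales_per_octave):
    # Scale multipliers 2**(i / scales_per_octave) for i in [0, scales_per_octave),
    # so the anchor scale doubles across one full octave. The default of 2
    # yields multipliers [1.0, sqrt(2)].
    return [2.0 ** (i / float(scales_per_octave))
            for i in range(scales_per_octave)]
```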
...@@ -32,6 +32,7 @@ message PreprocessingStep {
    SSDRandomCropPadFixedAspectRatio ssd_random_crop_pad_fixed_aspect_ratio = 24;
    RandomVerticalFlip random_vertical_flip = 25;
    RandomRotation90 random_rotation90 = 26;
RGBtoGray rgb_to_gray = 27;
  }
}
...@@ -236,6 +237,11 @@ message RandomResizeMethod {
  optional float target_width = 2;
}
// Converts the RGB image to a grayscale image. This also converts the image
// depth from 3 to 1, unlike RandomRGBtoGray which does not change the image
// depth.
message RGBtoGray {}
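The conversion RGBtoGray performs can be sketched in NumPy (the real step delegates to TensorFlow's grayscale conversion; the weights here are the usual ITU-R BT.601 luminance values, shown for illustration):

```python
import numpy as np

def rgb_to_gray(image):
    # Weighted sum over the channel axis; the output keeps a trailing
    # depth-1 axis, so an HxWx3 image becomes HxWx1 (unlike RandomRGBtoGray,
    # which keeps depth 3).
    weights = np.array([0.2989, 0.5870, 0.1140], dtype=np.float32)
    return image.dot(weights)[..., np.newaxis]
```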
// Scales boxes from normalized coordinates to pixel coordinates.
message ScaleBoxesToPixelCoordinates {
}
...
...@@ -86,4 +86,8 @@ message SsdFeatureExtractor {
  // Whether to use explicit padding when extracting SSD multiresolution
  // features. Note that this does not apply to the base feature extractor.
  optional bool use_explicit_padding = 7 [default=false];
  // Whether to use depthwise separable convolutions to extract the additional
  // feature maps added by SSD.
  optional bool use_depthwise = 8 [default=false];
}
...@@ -78,4 +78,14 @@ message TrainConfig {
  // Setting this option to false is very useful while debugging the model and
  // losses.
  optional bool add_regularization_loss = 18 [default=true];
  // Maximum number of boxes used during training.
  // Set this to at least the maximum number of boxes in the input data;
  // otherwise, it may cause "Data loss: Attempted to pad to a smaller size
  // than the input element" errors.
  optional int32 max_number_of_boxes = 20 [default=50];
// Whether to remove padding along `num_boxes` dimension of the groundtruth
// tensors.
optional bool unpad_groundtruth_tensors = 21 [default=true];
}
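The padding behavior that `max_number_of_boxes` controls can be sketched as follows (a hypothetical helper, not the pipeline's actual code): groundtruth boxes are padded with zeros up to the configured maximum, which is why a value smaller than the largest per-image box count in the data triggers the padding error quoted above.

```python
import numpy as np

def pad_groundtruth_boxes(boxes, max_number_of_boxes):
    # Pad [num_boxes, 4] groundtruth up to [max_number_of_boxes, 4] with zeros.
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    if len(boxes) > max_number_of_boxes:
        # Mirrors the failure mode the config comment warns about.
        raise ValueError('Attempted to pad to a smaller size than the input')
    padded = np.zeros((max_number_of_boxes, 4), dtype=np.float32)
    padded[:len(boxes)] = boxes
    return padded
```

Setting `unpad_groundtruth_tensors` strips this zero padding back off along the `num_boxes` dimension before the loss is computed.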
...@@ -7,4 +7,5 @@ licenses(["notice"])
exports_files([
    "faster_rcnn_resnet50_pets.config",
    "ssd_inception_v2_pets.config",
    "ssd_mobilenet_v1_focal_loss_pets.config",
])