Commit 001a2a61 authored by pkulzc, committed by Sergio Guadarrama

Internal changes for object detection. (#3656)

* Force cast of num_classes to integer

PiperOrigin-RevId: 188335318

* Updating config util to allow overwriting of cosine decay learning rates.

PiperOrigin-RevId: 188338852

* Make box_list_ops.py and box_list_ops_test.py work with C API enabled.

The C API has improved shape inference over the original Python
code. This causes some previously-working conds to fail. Switching to smart_cond fixes this.

Another effect of the improved shape inference is that one of the
failures tested gets caught earlier, so I modified the test to reflect
this.

PiperOrigin-RevId: 188409792

* Fix parallel event file writing issue.

Without this change, the event files might get corrupted when multiple evaluations are run in parallel.

PiperOrigin-RevId: 188502560

* Deprecating the boolean flag from_detection_checkpoint.

It is replaced by a string field, fine_tune_checkpoint_type, in train_config to provide extensibility. fine_tune_checkpoint_type can currently take the value `detection`, `classification`, or other values when restore_map is overridden.

PiperOrigin-RevId: 188518685
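
A minimal train_config sketch (field names per object_detection/protos/train.proto as of this change; the checkpoint path is illustrative):

  train_config {
    fine_tune_checkpoint: "/path/to/model.ckpt"
    fine_tune_checkpoint_type: "detection"  # or "classification"
  }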

* Automated g4 rollback of changelist 188502560

PiperOrigin-RevId: 188519969

* Introducing eval metrics specs for Coco Mask metrics. This allows metrics to be computed in tensorflow using the tf.learn Estimator.

PiperOrigin-RevId: 188528485
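
A minimal usage sketch; `coco_evaluator`, the input tensors and `total_loss` are assumed to be built elsewhere (see the new CocoMaskEvaluationPyFuncTest below for concrete shapes/dtypes), and the method signature matches the one added to coco_evaluation.py:

  eval_metric_ops = coco_evaluator.get_estimator_eval_metric_ops(
      image_id, groundtruth_boxes, groundtruth_classes,
      groundtruth_instance_masks, detection_scores, detection_classes,
      detection_masks)
  # All update ops must be run together, and likewise all value ops.
  spec = tf.estimator.EstimatorSpec(
      mode=tf.estimator.ModeKeys.EVAL,
      loss=total_loss,
      eval_metric_ops=eval_metric_ops)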

* Minor fix to make object_detection/metrics/coco_evaluation.py python3 compatible.

PiperOrigin-RevId: 188550683

* Updating eval_util to handle eval_metric_ops from multiple `DetectionEvaluator`s.

PiperOrigin-RevId: 188560474

* Allow tensor input for new_height and new_width for resize_image.

PiperOrigin-RevId: 188561908

* Fix typo in fine_tune_checkpoint_type name in trainer.

PiperOrigin-RevId: 188799033

* Adding mobilenet feature extractor to object detection.

PiperOrigin-RevId: 188916897

* Allow label maps to optionally contain an explicit background class with id zero.

PiperOrigin-RevId: 188951089
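
An illustrative label map with the optional explicit background entry (pbtxt format):

  item {
    name: "background"
    id: 0
  }
  item {
    name: "dog"
    id: 1
  }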

* Fix boundary conditions in random_pad_to_aspect_ratio to ensure that min_scale is always less than max_scale.

PiperOrigin-RevId: 189026868

* Fall back on the from_detection_checkpoint option if fine_tune_checkpoint_type isn't set.

PiperOrigin-RevId: 189052833

* Add proper names for learning rate schedules so we don't see cryptic names on tensorboard.

PiperOrigin-RevId: 189069837

* Enforcing that all datasets are batched (and then unbatched in the model) with batch_size >= 1.

PiperOrigin-RevId: 189117178

* Adding regularization to total loss returned from DetectionModel.loss().

PiperOrigin-RevId: 189189123
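
In the model.py change below, the regularization term is pulled from the standard collection and added alongside the losses returned by the model; condensed from that diff:

  losses = [loss_tensor for loss_tensor in losses_dict.itervalues()]
  if train_config.add_regularization_loss:
    regularization_losses = tf.get_collection(
        tf.GraphKeys.REGULARIZATION_LOSSES)
    if regularization_losses:
      losses.append(tf.add_n(regularization_losses,
                             name='regularization_loss'))
  total_loss = tf.add_n(losses, name='total_loss')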

* Standardize the names of loss scalars (for SSD, Faster R-CNN and R-FCN) in both training and eval so they can be compared on tensorboard.

Log localization and classification losses in evaluation.

PiperOrigin-RevId: 189189940
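
With this change the loss scalars appear under stable names such as Loss/RPNLoss/localization_loss, Loss/RPNLoss/objectness_loss, Loss/BoxClassifierLoss/localization_loss, Loss/BoxClassifierLoss/classification_loss and Loss/BoxClassifierLoss/mask_loss for Faster R-CNN/R-FCN, and Loss/localization_loss and Loss/classification_loss for SSD (see the updated meta-architecture tests below).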

* Remove negative test from box list ops test.

PiperOrigin-RevId: 189229327

* Add an option to warmup learning rate in manual stepping schedule.

PiperOrigin-RevId: 189361039
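
A hedged config sketch of the new option (the warmup flag follows optimizer.proto as of this change; step values are illustrative):

  learning_rate {
    manual_step_learning_rate {
      initial_learning_rate: 0.0003
      schedule {
        step: 900000
        learning_rate: 0.00003
      }
      warmup: true
    }
  }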

* Replace tf.contrib.slim.tfexample_decoder.LookupTensor with object_detection.data_decoders.tf_example_decoder.LookupTensor.

PiperOrigin-RevId: 189388556

* Force regularization summary variables under specific family names.

PiperOrigin-RevId: 189393190

* Automated g4 rollback of changelist 188619139

PiperOrigin-RevId: 189396001

* Remove step 0 schedule since we do a hard check for it after cl/189361039

PiperOrigin-RevId: 189396697

* PiperOrigin-RevId: 189040463

* PiperOrigin-RevId: 189059229

* PiperOrigin-RevId: 189214402


* Make slim python3 compatible.

* Minor fixes.

* Add TargetAssignment summaries in a separate family.

PiperOrigin-RevId: 189407487

* 1. Setting the `family` keyword arg prepends the same name to the summary names twice. Directly adding the family suffix to the name avoids this problem.
2. Make sure the eval losses have the same names.

PiperOrigin-RevId: 189434618

* Minor fixes to make object detection tf 1.4 compatible.

PiperOrigin-RevId: 189437519

* Call the base of the mobilenet_v1 feature extractor under the correct arg scope, and set batch norm is_training based on the value passed to the constructor.

PiperOrigin-RevId: 189460890

* Automated g4 rollback of changelist 188409792

PiperOrigin-RevId: 189463882

* Update object detection syncing.

PiperOrigin-RevId: 189601955

* Add an option to warmup learning rate, hold it constant for a certain number of steps and cosine decay it.

PiperOrigin-RevId: 189606169
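
A hedged config sketch of the extended cosine schedule (field names follow optimizer.proto; values are illustrative):

  learning_rate {
    cosine_decay_learning_rate {
      learning_rate_base: 0.04
      total_steps: 100000
      warmup_learning_rate: 0.013
      warmup_steps: 2000
      hold_base_rate_steps: 1000
    }
  }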

* Let the proposal feature extractor function in faster_rcnn meta architectures return the activations (end_points).

PiperOrigin-RevId: 189619301

* Fixed a bug which caused masks to be mostly zeros (detection_boxes were in absolute coordinates when scale_to_absolute=True).

PiperOrigin-RevId: 189641294

* Open sourcing Mobilenetv2 + SSDLite.

PiperOrigin-RevId: 189654520

* Remove unused files.
parent 2913cb24
......@@ -64,24 +64,24 @@ class InputsTest(tf.test.TestCase):
configs['train_config'], configs['train_input_config'], model_config)
features, labels = train_input_fn()
self.assertAllEqual([None, None, 3],
self.assertAllEqual([1, None, None, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual([],
self.assertAllEqual([1],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[None, 4],
[1, 50, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[None, model_config.faster_rcnn.num_classes],
[1, 50, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[None],
[1, 50],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
......
......@@ -159,6 +159,7 @@ class FasterRCNNFeatureExtractor(object):
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping activation tensor names to tensors.
"""
with tf.variable_scope(scope, values=[preprocessed_inputs]):
return self._extract_proposal_features(preprocessed_inputs, scope)
......@@ -906,7 +907,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
image_shape: A 1-D tensor representing the input image shape.
"""
image_shape = tf.shape(preprocessed_inputs)
rpn_features_to_crop = self._feature_extractor.extract_proposal_features(
rpn_features_to_crop, _ = self._feature_extractor.extract_proposal_features(
preprocessed_inputs, scope=self.first_stage_feature_extractor_scope)
feature_map_shape = tf.shape(rpn_features_to_crop)
......@@ -1649,14 +1650,14 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.reduce_sum(localization_losses, axis=1) / normalizer)
objectness_loss = tf.reduce_mean(
tf.reduce_sum(objectness_losses, axis=1) / normalizer)
loss_dict = {}
with tf.name_scope('localization_loss'):
loss_dict['first_stage_localization_loss'] = (
self._first_stage_loc_loss_weight * localization_loss)
with tf.name_scope('objectness_loss'):
loss_dict['first_stage_objectness_loss'] = (
self._first_stage_obj_loss_weight * objectness_loss)
localization_loss = tf.multiply(self._first_stage_loc_loss_weight,
localization_loss,
name='localization_loss')
objectness_loss = tf.multiply(self._first_stage_obj_loss_weight,
objectness_loss, name='objectness_loss')
loss_dict = {localization_loss.op.name: localization_loss,
objectness_loss.op.name: objectness_loss}
return loss_dict
def _loss_box_classifier(self,
......@@ -1782,15 +1783,16 @@ class FasterRCNNMetaArch(model.DetectionModel):
) = self._unpad_proposals_and_apply_hard_mining(
proposal_boxlists, second_stage_loc_losses,
second_stage_cls_losses, num_proposals)
loss_dict = {}
with tf.name_scope('localization_loss'):
loss_dict['second_stage_localization_loss'] = (
self._second_stage_loc_loss_weight * second_stage_loc_loss)
localization_loss = tf.multiply(self._second_stage_loc_loss_weight,
second_stage_loc_loss,
name='localization_loss')
with tf.name_scope('classification_loss'):
loss_dict['second_stage_classification_loss'] = (
self._second_stage_cls_loss_weight * second_stage_cls_loss)
classification_loss = tf.multiply(self._second_stage_cls_loss_weight,
second_stage_cls_loss,
name='classification_loss')
loss_dict = {localization_loss.op.name: localization_loss,
classification_loss.op.name: classification_loss}
second_stage_mask_loss = None
if prediction_masks is not None:
if groundtruth_masks_list is None:
......@@ -1857,9 +1859,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.boolean_mask(second_stage_mask_losses, paddings_indicator))
if second_stage_mask_loss is not None:
with tf.name_scope('mask_loss'):
loss_dict['second_stage_mask_loss'] = (
self._second_stage_mask_loss_weight * second_stage_mask_loss)
mask_loss = tf.multiply(self._second_stage_mask_loss_weight,
second_stage_mask_loss, name='mask_loss')
loss_dict[mask_loss.op.name] = mask_loss
return loss_dict
def _padded_batched_proposals_indicator(self,
......@@ -1927,26 +1929,32 @@ class FasterRCNNMetaArch(model.DetectionModel):
decoded_boxlist_list=[proposal_boxlist])
def restore_map(self,
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False):
"""Returns a map of variables to load from a foreign checkpoint.
See parent class for details.
Args:
from_detection_checkpoint: whether to restore from a full detection
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training. Default
True.
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
load_all_detection_checkpoint_vars: whether to load all variables (when
`from_detection_checkpoint` is True). If False, only variables within
the feature extractor scopes are included. Default False.
`fine_tune_checkpoint_type` is `detection`). If False, only variables
within the feature extractor scopes are included. Default False.
Returns:
A dict mapping variable names (to load from a checkpoint) to variables in
the model graph.
Raises:
ValueError: if fine_tune_checkpoint_type is neither `classification`
nor `detection`.
"""
if not from_detection_checkpoint:
if fine_tune_checkpoint_type not in ['detection', 'classification']:
raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
fine_tune_checkpoint_type))
if fine_tune_checkpoint_type == 'classification':
return self._feature_extractor.restore_from_classification_checkpoint_fn(
self.first_stage_feature_extractor_scope,
self.second_stage_feature_extractor_scope)
......
......@@ -47,8 +47,9 @@ class FakeFasterRCNNFeatureExtractor(
def _extract_proposal_features(self, preprocessed_inputs, scope):
with tf.variable_scope('mock_model'):
return 0 * slim.conv2d(preprocessed_inputs,
num_outputs=3, kernel_size=1, scope='layer1')
proposal_features = 0 * slim.conv2d(
preprocessed_inputs, num_outputs=3, kernel_size=1, scope='layer1')
return proposal_features, {}
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
with tf.variable_scope('mock_model'):
......@@ -792,10 +793,12 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
loss_dict = model.loss(prediction_dict, true_image_shapes)
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertTrue('second_stage_localization_loss' not in loss_dict_out)
self.assertTrue('second_stage_classification_loss' not in loss_dict_out)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertTrue('Loss/BoxClassifierLoss/localization_loss'
not in loss_dict_out)
self.assertTrue('Loss/BoxClassifierLoss/classification_loss'
not in loss_dict_out)
# TODO(rathodv): Split test into two - with and without masks.
def test_loss_full(self):
......@@ -890,11 +893,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_mask_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_zero_padded_proposals(self):
model = self._build_model(
......@@ -978,11 +983,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_mask_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_multiple_label_groundtruth(self):
model = self._build_model(
......@@ -1074,11 +1081,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_mask_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_zero_padded_proposals_nonzero_loss_with_two_images(self):
model = self._build_model(
......@@ -1173,12 +1182,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'],
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'],
exp_loc_loss)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'],
exp_loc_loss)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], exp_loc_loss)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
def test_loss_with_hard_mining(self):
model = self._build_model(is_training=True,
......@@ -1263,9 +1273,10 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'],
exp_loc_loss)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], exp_loc_loss)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
def test_restore_map_for_classification_ckpt(self):
# Define mock tensorflow classification graph and save variables.
......@@ -1296,7 +1307,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
preprocessed_inputs, true_image_shapes = model.preprocess(inputs)
prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
model.postprocess(prediction_dict, true_image_shapes)
var_map = model.restore_map(from_detection_checkpoint=False)
var_map = model.restore_map(fine_tune_checkpoint_type='classification')
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
with self.test_session(graph=test_graph_classification) as sess:
......@@ -1338,7 +1349,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
prediction_dict2 = model2.predict(preprocessed_inputs2, true_image_shapes)
model2.postprocess(prediction_dict2, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model2.restore_map(from_detection_checkpoint=True)
var_map = model2.restore_map(fine_tune_checkpoint_type='detection')
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
with self.test_session(graph=test_graph_detection2) as sess:
......@@ -1366,7 +1377,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
model.postprocess(prediction_dict, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model.restore_map(
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=True)
self.assertIsInstance(var_map, dict)
self.assertIn('another_variable', var_map)
......
......@@ -503,7 +503,7 @@ class SSDMetaArch(model.DetectionModel):
self.groundtruth_lists(fields.BoxListFields.classes),
keypoints, weights)
if self._add_summaries:
self._summarize_input(
self._summarize_target_assignment(
self.groundtruth_lists(fields.BoxListFields.boxes), match_list)
location_losses = self._localization_loss(
prediction_dict['box_encodings'],
......@@ -538,19 +538,20 @@ class SSDMetaArch(model.DetectionModel):
normalizer = tf.maximum(tf.to_float(tf.reduce_sum(batch_reg_weights)),
1.0)
with tf.name_scope('localization_loss'):
localization_loss_normalizer = normalizer
if self._normalize_loc_loss_by_codesize:
localization_loss_normalizer *= self._box_coder.code_size
localization_loss = ((self._localization_loss_weight / (
localization_loss_normalizer)) * localization_loss)
with tf.name_scope('classification_loss'):
classification_loss = ((self._classification_loss_weight / normalizer) *
classification_loss)
localization_loss_normalizer = normalizer
if self._normalize_loc_loss_by_codesize:
localization_loss_normalizer *= self._box_coder.code_size
localization_loss = tf.multiply((self._localization_loss_weight /
localization_loss_normalizer),
localization_loss,
name='localization_loss')
classification_loss = tf.multiply((self._classification_loss_weight /
normalizer), classification_loss,
name='classification_loss')
loss_dict = {
'localization_loss': localization_loss,
'classification_loss': classification_loss
localization_loss.op.name: localization_loss,
classification_loss.op.name: classification_loss
}
return loss_dict
......@@ -615,7 +616,7 @@ class SSDMetaArch(model.DetectionModel):
self._target_assigner, self.anchors, groundtruth_boxlists,
groundtruth_classes_with_background_list, groundtruth_weights_list)
def _summarize_input(self, groundtruth_boxes_list, match_list):
def _summarize_target_assignment(self, groundtruth_boxes_list, match_list):
"""Creates tensorflow summaries for the input boxes and anchors.
This function creates four summaries corresponding to the average
......@@ -639,14 +640,18 @@ class SSDMetaArch(model.DetectionModel):
[match.num_unmatched_columns() for match in match_list])
ignored_anchors_per_image = tf.stack(
[match.num_ignored_columns() for match in match_list])
tf.summary.scalar('Input/AvgNumGroundtruthBoxesPerImage',
tf.reduce_mean(tf.to_float(num_boxes_per_image)))
tf.summary.scalar('Input/AvgNumPositiveAnchorsPerImage',
tf.reduce_mean(tf.to_float(pos_anchors_per_image)))
tf.summary.scalar('Input/AvgNumNegativeAnchorsPerImage',
tf.reduce_mean(tf.to_float(neg_anchors_per_image)))
tf.summary.scalar('Input/AvgNumIgnoredAnchorsPerImage',
tf.reduce_mean(tf.to_float(ignored_anchors_per_image)))
tf.summary.scalar('AvgNumGroundtruthBoxesPerImage',
tf.reduce_mean(tf.to_float(num_boxes_per_image)),
family='TargetAssignment')
tf.summary.scalar('AvgNumPositiveAnchorsPerImage',
tf.reduce_mean(tf.to_float(pos_anchors_per_image)),
family='TargetAssignment')
tf.summary.scalar('AvgNumNegativeAnchorsPerImage',
tf.reduce_mean(tf.to_float(neg_anchors_per_image)),
family='TargetAssignment')
tf.summary.scalar('AvgNumIgnoredAnchorsPerImage',
tf.reduce_mean(tf.to_float(ignored_anchors_per_image)),
family='TargetAssignment')
def _apply_hard_mining(self, location_losses, cls_losses, prediction_dict,
match_list):
......@@ -731,16 +736,17 @@ class SSDMetaArch(model.DetectionModel):
return decoded_boxes, decoded_keypoints
def restore_map(self,
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False):
"""Returns a map of variables to load from a foreign checkpoint.
See parent class for details.
Args:
from_detection_checkpoint: whether to restore from a full detection
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
load_all_detection_checkpoint_vars: whether to load all variables (when
`from_detection_checkpoint` is True). If False, only variables within
the appropriate scopes are included. Default False.
......@@ -748,15 +754,22 @@ class SSDMetaArch(model.DetectionModel):
Returns:
A dict mapping variable names (to load from a checkpoint) to variables in
the model graph.
Raises:
ValueError: if fine_tune_checkpoint_type is neither `classification`
nor `detection`.
"""
if fine_tune_checkpoint_type not in ['detection', 'classification']:
raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
fine_tune_checkpoint_type))
variables_to_restore = {}
for variable in tf.global_variables():
var_name = variable.op.name
if from_detection_checkpoint and load_all_detection_checkpoint_vars:
if (fine_tune_checkpoint_type == 'detection' and
load_all_detection_checkpoint_vars):
variables_to_restore[var_name] = variable
else:
if var_name.startswith(self._extract_features_scope):
if not from_detection_checkpoint:
if fine_tune_checkpoint_type == 'classification':
var_name = (
re.split('^' + self._extract_features_scope + '/',
var_name)[-1])
......
......@@ -72,6 +72,13 @@ class MockAnchorGenerator2x2(anchor_generator.AnchorGenerator):
return 4
def _get_value_for_matching_key(dictionary, suffix):
for key in dictionary.keys():
if key.endswith(suffix):
return dictionary[key]
raise ValueError('key not found {}'.format(suffix))
class SsdMetaArchTest(test_case.TestCase):
def _create_model(self, apply_hard_mining=True,
......@@ -270,7 +277,9 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['localization_loss'], loss_dict['classification_loss'])
return (
_get_value_for_matching_key(loss_dict, 'Loss/localization_loss'),
_get_value_for_matching_key(loss_dict, 'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -305,7 +314,7 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['localization_loss'],)
return (_get_value_for_matching_key(loss_dict, 'Loss/localization_loss'),)
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -335,7 +344,9 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['localization_loss'], loss_dict['classification_loss'])
return (
_get_value_for_matching_key(loss_dict, 'Loss/localization_loss'),
_get_value_for_matching_key(loss_dict, 'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -366,7 +377,7 @@ class SsdMetaArchTest(test_case.TestCase):
sess.run(init_op)
saved_model_path = saver.save(sess, save_path)
var_map = model.restore_map(
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False)
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
......@@ -402,7 +413,7 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
model.postprocess(prediction_dict, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model.restore_map(from_detection_checkpoint=False)
var_map = model.restore_map(fine_tune_checkpoint_type='classification')
self.assertNotIn('another_variable', var_map)
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
......@@ -423,7 +434,7 @@ class SsdMetaArchTest(test_case.TestCase):
model.postprocess(prediction_dict, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model.restore_map(
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=True)
self.assertIsInstance(var_map, dict)
self.assertIn('another_variable', var_map)
......
......@@ -100,6 +100,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
groundtruth_is_crowd=groundtruth_is_crowd))
self._annotation_id += groundtruth_dict[standard_fields.InputDataFields.
groundtruth_boxes].shape[0]
# Boolean to indicate whether a detection has been added for this image.
self._image_ids[image_id] = False
def add_single_detected_image_info(self,
......@@ -120,9 +121,6 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
[num_boxes] containing detection scores for the boxes.
DetectionResultFields.detection_classes: integer numpy array of shape
[num_boxes] containing 1-indexed detection classes for the boxes.
DetectionResultFields.detection_masks: optional uint8 numpy array of
shape [num_boxes, image_height, image_width] containing instance
masks for the boxes.
Raises:
ValueError: If groundtruth for the image_id is not available.
......@@ -200,7 +198,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
all_metrics_per_category=self._all_metrics_per_category)
box_metrics.update(box_per_category_ap)
box_metrics = {'DetectionBoxes_'+ key: value
for key, value in box_metrics.iteritems()}
for key, value in iter(box_metrics.items())}
return box_metrics
def get_estimator_eval_metric_ops(self, image_id, groundtruth_boxes,
......@@ -282,6 +280,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
return np.float32(self._metrics[metric_name])
return value_func
# Ensure that the metrics are only evaluated once.
first_value_op = tf.py_func(first_value_func, [], tf.float32)
eval_metric_ops = {metric_names[0]: (first_value_op, update_op)}
with tf.control_dependencies([first_value_op]):
......@@ -292,7 +291,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
def _check_mask_type_and_value(array_name, masks):
"""Checks whether mask dtype is uint8 anf the values are either 0 or 1."""
"""Checks whether mask dtype is uint8 and the values are either 0 or 1."""
if masks.dtype != np.uint8:
raise ValueError('{} must be of type np.uint8. Found {}.'.format(
array_name, masks.dtype))
......@@ -334,6 +333,9 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
groundtruth_dict):
"""Adds groundtruth for a single image to be used for evaluation.
If the image has already been added, a warning is logged, and groundtruth is
ignored.
Args:
image_id: A unique string/integer identifier for the image.
groundtruth_dict: A dictionary containing -
......@@ -379,6 +381,9 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
detections_dict):
"""Adds detections for a single image to be used for evaluation.
If a detection has already been added for this image id, a warning is
logged, and the detection is skipped.
Args:
image_id: A unique string/integer identifier for the image.
detections_dict: A dictionary containing -
......@@ -435,25 +440,25 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
A dictionary holding -
1. summary_metrics:
'Precision/mAP': mean average precision over classes averaged over IOU
thresholds ranging from .5 to .95 with .05 increments
'Precision/mAP@.50IOU': mean average precision at 50% IOU
'Precision/mAP@.75IOU': mean average precision at 75% IOU
'Precision/mAP (small)': mean average precision for small objects
(area < 32^2 pixels)
'Precision/mAP (medium)': mean average precision for medium sized
objects (32^2 pixels < area < 96^2 pixels)
'Precision/mAP (large)': mean average precision for large objects
(96^2 pixels < area < 10000^2 pixels)
'Recall/AR@1': average recall with 1 detection
'Recall/AR@10': average recall with 10 detections
'Recall/AR@100': average recall with 100 detections
'Recall/AR@100 (small)': average recall for small objects with 100
detections
'Recall/AR@100 (medium)': average recall for medium objects with 100
detections
'Recall/AR@100 (large)': average recall for large objects with 100
detections
'DetectionMasks_Precision/mAP': mean average precision over classes
averaged over IOU thresholds ranging from .5 to .95 with .05 increments.
'DetectionMasks_Precision/mAP@.50IOU': mean average precision at 50% IOU.
'DetectionMasks_Precision/mAP@.75IOU': mean average precision at 75% IOU.
'DetectionMasks_Precision/mAP (small)': mean average precision for small
objects (area < 32^2 pixels).
'DetectionMasks_Precision/mAP (medium)': mean average precision for medium
sized objects (32^2 pixels < area < 96^2 pixels).
'DetectionMasks_Precision/mAP (large)': mean average precision for large
objects (96^2 pixels < area < 10000^2 pixels).
'DetectionMasks_Recall/AR@1': average recall with 1 detection.
'DetectionMasks_Recall/AR@10': average recall with 10 detections.
'DetectionMasks_Recall/AR@100': average recall with 100 detections.
'DetectionMasks_Recall/AR@100 (small)': average recall for small objects
with 100 detections.
'DetectionMasks_Recall/AR@100 (medium)': average recall for medium objects
with 100 detections.
'DetectionMasks_Recall/AR@100 (large)': average recall for large objects
with 100 detections.
2. per_category_ap: if include_metrics_per_category is True, category
specific results with keys of the form:
......@@ -482,3 +487,101 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
mask_metrics = {'DetectionMasks_'+ key: value
for key, value in mask_metrics.iteritems()}
return mask_metrics
def get_estimator_eval_metric_ops(self, image_id, groundtruth_boxes,
groundtruth_classes,
groundtruth_instance_masks,
detection_scores, detection_classes,
detection_masks):
"""Returns a dictionary of eval metric ops to use with `tf.EstimatorSpec`.
Note that once value_op is called, the detections and groundtruth added via
update_op are cleared.
Args:
image_id: Unique string/integer identifier for the image.
groundtruth_boxes: float32 tensor of shape [num_boxes, 4] containing
`num_boxes` groundtruth boxes of the format
[ymin, xmin, ymax, xmax] in absolute image coordinates.
groundtruth_classes: int32 tensor of shape [num_boxes] containing
1-indexed groundtruth classes for the boxes.
groundtruth_instance_masks: uint8 tensor array of shape
[num_boxes, image_height, image_width] containing groundtruth masks
corresponding to the boxes. The elements of the array must be in {0, 1}.
detection_scores: float32 tensor of shape [num_boxes] containing
detection scores for the boxes.
detection_classes: int32 tensor of shape [num_boxes] containing
1-indexed detection classes for the boxes.
detection_masks: uint8 tensor array of shape
[num_boxes, image_height, image_width] containing instance masks
corresponding to the boxes. The elements of the array must be in {0, 1}.
Returns:
a dictionary of metric names to tuple of value_op and update_op that can
be used as eval metric ops in tf.EstimatorSpec. Note that all update ops
must be run together and similarly all value ops must be run together to
guarantee correct behaviour.
"""
def update_op(
image_id,
groundtruth_boxes,
groundtruth_classes,
groundtruth_instance_masks,
detection_scores,
detection_classes,
detection_masks):
self.add_single_ground_truth_image_info(
image_id,
{'groundtruth_boxes': groundtruth_boxes,
'groundtruth_classes': groundtruth_classes,
'groundtruth_instance_masks': groundtruth_instance_masks})
self.add_single_detected_image_info(
image_id,
{'detection_scores': detection_scores,
'detection_classes': detection_classes,
'detection_masks': detection_masks})
update_op = tf.py_func(update_op, [image_id,
groundtruth_boxes,
groundtruth_classes,
groundtruth_instance_masks,
detection_scores,
detection_classes,
detection_masks], [])
metric_names = ['DetectionMasks_Precision/mAP',
'DetectionMasks_Precision/mAP@.50IOU',
'DetectionMasks_Precision/mAP@.75IOU',
'DetectionMasks_Precision/mAP (large)',
'DetectionMasks_Precision/mAP (medium)',
'DetectionMasks_Precision/mAP (small)',
'DetectionMasks_Recall/AR@1',
'DetectionMasks_Recall/AR@10',
'DetectionMasks_Recall/AR@100',
'DetectionMasks_Recall/AR@100 (large)',
'DetectionMasks_Recall/AR@100 (medium)',
'DetectionMasks_Recall/AR@100 (small)']
if self._include_metrics_per_category:
for category_dict in self._categories:
metric_names.append('DetectionMasks_PerformanceByCategory/mAP/' +
category_dict['name'])
def first_value_func():
self._metrics = self.evaluate()
self.clear()
return np.float32(self._metrics[metric_names[0]])
def value_func_factory(metric_name):
def value_func():
return np.float32(self._metrics[metric_name])
return value_func
# Ensure that the metrics are only evaluated once.
first_value_op = tf.py_func(first_value_func, [], tf.float32)
eval_metric_ops = {metric_names[0]: (first_value_op, update_op)}
with tf.control_dependencies([first_value_op]):
for metric_name in metric_names[1:]:
eval_metric_ops[metric_name] = (tf.py_func(
value_func_factory(metric_name), [], np.float32), update_op)
return eval_metric_ops
......@@ -403,5 +403,101 @@ class CocoMaskEvaluationTest(tf.test.TestCase):
self.assertFalse(coco_evaluator._detection_masks_list)
class CocoMaskEvaluationPyFuncTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetections(self):
category_list = [{'id': 0, 'name': 'person'},
{'id': 1, 'name': 'cat'},
{'id': 2, 'name': 'dog'}]
coco_evaluator = coco_evaluation.CocoMaskEvaluator(category_list)
image_id = tf.placeholder(tf.string, shape=())
groundtruth_boxes = tf.placeholder(tf.float32, shape=(None, 4))
groundtruth_classes = tf.placeholder(tf.float32, shape=(None))
groundtruth_masks = tf.placeholder(tf.uint8, shape=(None, None, None))
detection_scores = tf.placeholder(tf.float32, shape=(None))
detection_classes = tf.placeholder(tf.float32, shape=(None))
detection_masks = tf.placeholder(tf.uint8, shape=(None, None, None))
eval_metric_ops = coco_evaluator.get_estimator_eval_metric_ops(
image_id, groundtruth_boxes,
groundtruth_classes,
groundtruth_masks,
detection_scores,
detection_classes,
detection_masks)
_, update_op = eval_metric_ops['DetectionMasks_Precision/mAP']
with self.test_session() as sess:
sess.run(update_op,
feed_dict={
image_id: 'image1',
groundtruth_boxes: np.array([[100., 100., 200., 200.]]),
groundtruth_classes: np.array([1]),
groundtruth_masks: np.pad(np.ones([1, 100, 100],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant'),
detection_scores: np.array([.8]),
detection_classes: np.array([1]),
detection_masks: np.pad(np.ones([1, 100, 100],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant')
})
sess.run(update_op,
feed_dict={
image_id: 'image2',
groundtruth_boxes: np.array([[50., 50., 100., 100.]]),
groundtruth_classes: np.array([1]),
groundtruth_masks: np.pad(np.ones([1, 50, 50],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant'),
detection_scores: np.array([.8]),
detection_classes: np.array([1]),
detection_masks: np.pad(np.ones([1, 50, 50], dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant')
})
sess.run(update_op,
feed_dict={
image_id: 'image3',
groundtruth_boxes: np.array([[25., 25., 50., 50.]]),
groundtruth_classes: np.array([1]),
groundtruth_masks: np.pad(np.ones([1, 25, 25],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant'),
detection_scores: np.array([.8]),
detection_classes: np.array([1]),
detection_masks: np.pad(np.ones([1, 25, 25],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant')
})
metrics = {}
for key, (value_op, _) in eval_metric_ops.iteritems():
metrics[key] = value_op
metrics = sess.run(metrics)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP@.50IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP@.75IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP (small)'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@1'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@10'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100 (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100 (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100 (small)'], 1.0)
self.assertFalse(coco_evaluator._groundtruth_list)
self.assertFalse(coco_evaluator._image_ids_with_detections)
self.assertFalse(coco_evaluator._image_id_to_mask_shape_map)
self.assertFalse(coco_evaluator._detection_masks_list)
if __name__ == '__main__':
tf.test.main()
......@@ -39,7 +39,6 @@ from object_detection import model_hparams
from object_detection.builders import model_builder
from object_detection.builders import optimizer_builder
from object_detection.core import standard_fields as fields
from object_detection.metrics import coco_evaluation
from object_detection.utils import config_util
from object_detection.utils import label_map_util
from object_detection.utils import shape_utils
......@@ -121,8 +120,8 @@ def unstack_batch(tensor_dict, unpad_groundtruth_tensors=True):
2. [batch_size, height, width, channels]
3. [batch_size, num_boxes, d1, d2, ... dn]
When unpad_tensors is set to true, unstacked tensors of form 3 above are
sliced along the `num_boxes` dimension using the value in tensor
When unpad_groundtruth_tensors is set to true, unstacked tensors of form 3
above are sliced along the `num_boxes` dimension using the value in tensor
field.InputDataFields.num_groundtruth_boxes.
Note that this function has a static list of input data fields and has to be
......@@ -198,6 +197,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
"""
train_config = configs['train_config']
eval_input_config = configs['eval_input_config']
eval_config = configs['eval_config']
def model_fn(features, labels, mode, params=None):
"""Constructs the object detection model.
......@@ -250,9 +250,25 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
prediction_dict, features[fields.InputDataFields.true_image_shape])
if mode == tf.estimator.ModeKeys.TRAIN:
if not train_config.fine_tune_checkpoint_type:
# train_config.from_detection_checkpoint field is deprecated. For
# backward compatibility, sets finetune_checkpoint_type based on
# from_detection_checkpoint.
if train_config.from_detection_checkpoint:
train_config.fine_tune_checkpoint_type = 'detection'
else:
train_config.fine_tune_checkpoint_type = 'classification'
if train_config.fine_tune_checkpoint and hparams.load_pretrained:
if not train_config.fine_tune_checkpoint_type:
# train_config.from_detection_checkpoint field is deprecated. For
# backward compatibility, set train_config.fine_tune_checkpoint_type
# based on train_config.from_detection_checkpoint.
if train_config.from_detection_checkpoint:
train_config.fine_tune_checkpoint_type = 'detection'
else:
train_config.fine_tune_checkpoint_type = 'classification'
asg_map = detection_model.restore_map(
from_detection_checkpoint=train_config.from_detection_checkpoint,
fine_tune_checkpoint_type=train_config.fine_tune_checkpoint_type,
load_all_detection_checkpoint_vars=(
train_config.load_all_detection_checkpoint_vars))
available_var_map = (
......@@ -273,6 +289,15 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
losses_dict = detection_model.loss(
prediction_dict, features[fields.InputDataFields.true_image_shape])
losses = [loss_tensor for loss_tensor in losses_dict.itervalues()]
if train_config.add_regularization_loss:
regularization_losses = tf.get_collection(
tf.GraphKeys.REGULARIZATION_LOSSES)
if regularization_losses:
regularization_loss = tf.add_n(regularization_losses,
name='regularization_loss')
losses.append(regularization_loss)
if not use_tpu:
tf.summary.scalar('regularization_loss', regularization_loss)
total_loss = tf.add_n(losses, name='total_loss')
if mode == tf.estimator.ModeKeys.TRAIN:
......@@ -321,8 +346,12 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
class_agnostic = (fields.DetectionResultFields.detection_classes
not in detections)
groundtruth = _get_groundtruth_data(detection_model, class_agnostic)
use_original_images = fields.InputDataFields.original_image in features
eval_images = (
features[fields.InputDataFields.original_image] if use_original_images
else features[fields.InputDataFields.image])
eval_dict = eval_util.result_dict_for_single_example(
tf.expand_dims(features[fields.InputDataFields.original_image][0], 0),
eval_images[0:1],
features[inputs.HASH_KEY][0],
detections,
groundtruth,
......@@ -334,7 +363,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
else:
category_index = label_map_util.create_category_index_from_labelmap(
eval_input_config.label_map_path)
if not use_tpu:
if not use_tpu and use_original_images:
detection_and_groundtruth = (
vis_utils.draw_side_by_side_evaluation_image(
eval_dict, category_index, max_boxes_to_draw=20,
......@@ -343,17 +372,12 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
detection_and_groundtruth)
# Eval metrics on a single image.
detection_fields = fields.DetectionResultFields()
input_data_fields = fields.InputDataFields()
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
category_index.values())
eval_metric_ops = coco_evaluator.get_estimator_eval_metric_ops(
image_id=eval_dict[input_data_fields.key],
groundtruth_boxes=eval_dict[input_data_fields.groundtruth_boxes],
groundtruth_classes=eval_dict[input_data_fields.groundtruth_classes],
detection_boxes=eval_dict[detection_fields.detection_boxes],
detection_scores=eval_dict[detection_fields.detection_scores],
detection_classes=eval_dict[detection_fields.detection_classes])
eval_metrics = eval_config.metrics_set
if not eval_metrics:
eval_metrics = ['coco_detection_metrics']
eval_metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
eval_metrics, category_index.values(), eval_dict,
include_metrics_per_category=False)
if use_tpu:
return tf.contrib.tpu.TPUEstimatorSpec(
......@@ -376,7 +400,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
return model_fn
def _build_experiment_fn(train_steps, eval_steps):
def build_experiment_fn(train_steps, eval_steps):
"""Returns a function that creates an `Experiment`."""
def build_experiment(run_config, hparams):
......@@ -509,8 +533,8 @@ def main(unused_argv):
tf.flags.mark_flag_as_required('pipeline_config_path')
config = tf.contrib.learn.RunConfig(model_dir=FLAGS.model_dir)
learn_runner.run(
experiment_fn=_build_experiment_fn(FLAGS.num_train_steps,
FLAGS.num_eval_steps),
experiment_fn=build_experiment_fn(FLAGS.num_train_steps,
FLAGS.num_eval_steps),
run_config=config,
hparams=model_hparams.create_hparams())
......
......@@ -49,6 +49,6 @@ def BuildExperiment():
hparams_overrides='load_pretrained=false')
# pylint: disable=protected-access
experiment_fn = model._build_experiment_fn(10, 10)
experiment_fn = model.build_experiment_fn(10, 10)
# pylint: enable=protected-access
return experiment_fn(run_config, hparams)
......@@ -105,12 +105,10 @@ class FasterRCNNInceptionResnetV2FeatureExtractor(
is_training=self._train_batch_norm):
with tf.variable_scope('InceptionResnetV2',
reuse=self._reuse_weights) as scope:
rpn_feature_map, _ = (
inception_resnet_v2.inception_resnet_v2_base(
preprocessed_inputs, final_endpoint='PreAuxLogits',
scope=scope, output_stride=self._first_stage_features_stride,
align_feature_maps=True))
return rpn_feature_map
return inception_resnet_v2.inception_resnet_v2_base(
preprocessed_inputs, final_endpoint='PreAuxLogits',
scope=scope, output_stride=self._first_stage_features_stride,
align_feature_maps=True)
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......
......@@ -35,7 +35,7 @@ class FasterRcnnInceptionResnetV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 299, 299, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -50,7 +50,7 @@ class FasterRcnnInceptionResnetV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[1, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -65,7 +65,7 @@ class FasterRcnnInceptionResnetV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
......@@ -109,6 +109,9 @@ class FasterRCNNInceptionV2FeatureExtractor(
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping feature extractor tensor names to
tensors
Raises:
InvalidArgumentError: If the spatial size of `preprocessed_inputs`
(height or width) is less than 33.
......@@ -134,7 +137,7 @@ class FasterRCNNInceptionV2FeatureExtractor(
depth_multiplier=self._depth_multiplier,
scope=scope)
return activations['Mixed_4e']
return activations['Mixed_4e'], activations
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......
......@@ -36,7 +36,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -51,7 +51,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -66,7 +66,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -84,7 +84,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Mobilenet v1 Faster R-CNN implementation."""
import tensorflow as tf
from object_detection.meta_architectures import faster_rcnn_meta_arch
from nets import mobilenet_v1
slim = tf.contrib.slim
def _batch_norm_arg_scope(list_ops,
use_batch_norm=True,
batch_norm_decay=0.9997,
batch_norm_epsilon=0.001,
batch_norm_scale=False,
train_batch_norm=False):
"""Slim arg scope for Mobilenet V1 batch norm."""
if use_batch_norm:
batch_norm_params = {
'is_training': train_batch_norm,
'scale': batch_norm_scale,
'decay': batch_norm_decay,
'epsilon': batch_norm_epsilon
}
normalizer_fn = slim.batch_norm
else:
normalizer_fn = None
batch_norm_params = None
return slim.arg_scope(list_ops,
normalizer_fn=normalizer_fn,
normalizer_params=batch_norm_params)
class FasterRCNNMobilenetV1FeatureExtractor(
faster_rcnn_meta_arch.FasterRCNNFeatureExtractor):
"""Faster R-CNN Mobilenet V1 feature extractor implementation."""
def __init__(self,
is_training,
first_stage_features_stride,
batch_norm_trainable=False,
reuse_weights=None,
weight_decay=0.0,
depth_multiplier=1.0,
min_depth=16):
"""Constructor.
Args:
is_training: See base class.
first_stage_features_stride: See base class.
batch_norm_trainable: See base class.
reuse_weights: See base class.
weight_decay: See base class.
depth_multiplier: float depth multiplier for feature extractor.
min_depth: minimum feature extractor depth.
Raises:
ValueError: If `first_stage_features_stride` is not 8 or 16.
"""
if first_stage_features_stride != 8 and first_stage_features_stride != 16:
raise ValueError('`first_stage_features_stride` must be 8 or 16.')
self._depth_multiplier = depth_multiplier
self._min_depth = min_depth
super(FasterRCNNMobilenetV1FeatureExtractor, self).__init__(
is_training, first_stage_features_stride, batch_norm_trainable,
reuse_weights, weight_decay)
def preprocess(self, resized_inputs):
"""Faster R-CNN Mobilenet V1 preprocessing.
Maps pixel values to the range [-1, 1].
Args:
resized_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
"""
return (2.0 / 255.0) * resized_inputs - 1.0
def _extract_proposal_features(self, preprocessed_inputs, scope):
"""Extracts first stage RPN features.
Args:
preprocessed_inputs: A [batch, height, width, channels] float32 tensor
representing a batch of images.
scope: A scope name.
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping feature extractor tensor names to
tensors
Raises:
InvalidArgumentError: If the spatial size of `preprocessed_inputs`
(height or width) is less than 33.
ValueError: If the created network is missing the required activation.
"""
preprocessed_inputs.get_shape().assert_has_rank(4)
shape_assert = tf.Assert(
tf.logical_and(tf.greater_equal(tf.shape(preprocessed_inputs)[1], 33),
tf.greater_equal(tf.shape(preprocessed_inputs)[2], 33)),
['image size must at least be 33 in both height and width.'])
with tf.control_dependencies([shape_assert]):
with tf.variable_scope('MobilenetV1',
reuse=self._reuse_weights) as scope:
with _batch_norm_arg_scope([slim.conv2d, slim.separable_conv2d],
batch_norm_scale=True,
train_batch_norm=self._train_batch_norm):
_, activations = mobilenet_v1.mobilenet_v1_base(
preprocessed_inputs,
final_endpoint='Conv2d_13_pointwise',
min_depth=self._min_depth,
depth_multiplier=self._depth_multiplier,
scope=scope)
return activations['Conv2d_13_pointwise'], activations
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
Args:
proposal_feature_maps: A 4-D float tensor with shape
[batch_size * self.max_num_proposals, crop_height, crop_width, depth]
representing the feature map cropped to each proposal.
scope: A scope name (unused).
Returns:
proposal_classifier_features: A 4-D float tensor with shape
[batch_size * self.max_num_proposals, height, width, depth]
representing box classifier features for each proposal.
"""
net = proposal_feature_maps
depth = lambda d: max(int(d * 1.0), 16)
with tf.variable_scope('MobilenetV1', reuse=self._reuse_weights):
with _batch_norm_arg_scope([slim.conv2d, slim.separable_conv2d],
batch_norm_scale=True,
train_batch_norm=self._train_batch_norm):
with slim.arg_scope(
[slim.conv2d, slim.separable_conv2d], padding='SAME'):
net = slim.separable_conv2d(
net,
depth(1024), [3, 3],
depth_multiplier=1,
stride=2,
scope='Conv2d_12_pointwise')
return slim.separable_conv2d(
net,
depth(1024), [3, 3],
depth_multiplier=1,
stride=1,
scope='Conv2d_13_pointwise')
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for faster_rcnn_mobilenet_v1_feature_extractor."""
import numpy as np
import tensorflow as tf
from object_detection.models import faster_rcnn_mobilenet_v1_feature_extractor as faster_rcnn_mobilenet_v1
class FasterRcnnMobilenetV1FeatureExtractorTest(tf.test.TestCase):
def _build_feature_extractor(self, first_stage_features_stride):
return faster_rcnn_mobilenet_v1.FasterRCNNMobilenetV1FeatureExtractor(
is_training=False,
first_stage_features_stride=first_stage_features_stride,
batch_norm_trainable=False,
reuse_weights=None,
weight_decay=0.0)
def test_extract_proposal_features_returns_expected_size(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [4, 7, 7, 1024])
def test_extract_proposal_features_stride_eight(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [4, 7, 7, 1024])
def test_extract_proposal_features_half_size_input(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [1, 4, 4, 1024])
def test_extract_proposal_features_dies_on_invalid_stride(self):
with self.assertRaises(ValueError):
self._build_feature_extractor(first_stage_features_stride=99)
def test_extract_proposal_features_dies_on_very_small_images(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
with self.assertRaises(tf.errors.InvalidArgumentError):
sess.run(
features_shape,
feed_dict={preprocessed_inputs: np.random.rand(4, 32, 32, 3)})
def test_extract_proposal_features_dies_with_incorrect_rank_inputs(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[224, 224, 3], maxval=255, dtype=tf.float32)
with self.assertRaises(ValueError):
feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
def test_extract_box_classifier_features_returns_expected_size(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
proposal_feature_maps = tf.random_uniform(
[3, 14, 14, 576], maxval=255, dtype=tf.float32)
proposal_classifier_features = (
feature_extractor.extract_box_classifier_features(
proposal_feature_maps, scope='TestScope'))
features_shape = tf.shape(proposal_classifier_features)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [3, 7, 7, 1024])
if __name__ == '__main__':
tf.test.main()
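Because the file ends with tf.test.main(), these tests run as a standalone script (e.g. python object_detection/models/faster_rcnn_mobilenet_v1_feature_extractor_test.py, assuming the models/research layout implied by the import above).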
@@ -171,6 +171,8 @@ class FasterRCNNNASFeatureExtractor(
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
end_points: A dictionary mapping feature extractor tensor names to tensors
Raises:
ValueError: If the created network is missing the required activation.
"""
@@ -202,7 +204,7 @@ class FasterRCNNNASFeatureExtractor(
rpn_feature_map_shape = [batch] + shape_without_batch
rpn_feature_map.set_shape(rpn_feature_map_shape)
return rpn_feature_map
return rpn_feature_map, end_points
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
@@ -231,9 +233,11 @@ class FasterRCNNNASFeatureExtractor(
# Note that what follows is largely a copy of build_nasnet_large() within
# nasnet.py. We are copying to minimize code pollution in slim.
# pylint: disable=protected-access
hparams = nasnet._large_imagenet_config(is_training=self._is_training)
# pylint: enable=protected-access
# TODO(shlens,skornblith): Determine the appropriate drop path schedule.
# For now the schedule is the default (1.0->0.7 over 250,000 train steps).
hparams = nasnet.large_imagenet_config()
if not self._is_training:
hparams.set_hparam('drop_path_keep_prob', 1.0)
# Calculate the total number of cells in the network
# -- Add 2 for the reduction cells.
......
@@ -35,7 +35,7 @@ class FasterRcnnNASFeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 299, 299, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
@@ -50,7 +50,7 @@ class FasterRcnnNASFeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
@@ -65,7 +65,7 @@ class FasterRcnnNASFeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
@@ -95,6 +95,9 @@ class FasterRCNNResnetV1FeatureExtractor(
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping feature extractor tensor names to
tensors
Raises:
InvalidArgumentError: If the spatial size of `preprocessed_inputs`
(height or width) is less than 33.
@@ -130,7 +133,7 @@ class FasterRCNNResnetV1FeatureExtractor(
scope=var_scope)
handle = scope + '/%s/block3' % self._architecture
return activations[handle]
return activations[handle], activations
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......
@@ -47,7 +47,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16, architecture=architecture)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
@@ -62,7 +62,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
@@ -77,7 +77,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
@@ -95,7 +95,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
@@ -100,11 +100,13 @@ class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
'use_depthwise': self._use_depthwise,
}
with slim.arg_scope(self._conv_hyperparams):
# TODO(skligys): Enable fused batch norm once quantization supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
with tf.variable_scope('MobilenetV1',
reuse=self._reuse_weights) as scope:
with tf.variable_scope('MobilenetV1',
reuse=self._reuse_weights) as scope:
with slim.arg_scope(
mobilenet_v1.mobilenet_v1_arg_scope(
is_training=(self._batch_norm_trainable and self._is_training))):
# TODO(skligys): Enable fused batch norm once quantization supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
_, image_features = mobilenet_v1.mobilenet_v1_base(
ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
final_endpoint='Conv2d_13_pointwise',
@@ -112,6 +114,9 @@ class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
depth_multiplier=self._depth_multiplier,
use_explicit_padding=self._use_explicit_padding,
scope=scope)
with slim.arg_scope(self._conv_hyperparams):
# TODO(skligys): Enable fused batch norm once quantization supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
feature_maps = feature_map_generators.multi_resolution_feature_maps(
feature_map_layout=feature_map_layout,
depth_multiplier=self._depth_multiplier,
......
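In the SSD MobileNet V1 hunk above, the scopes stack as follows; this is a sketch of the structure visible in the added lines (indentation and the elided call arguments are not shown in the hunk itself):

# Backbone features:
#   tf.variable_scope('MobilenetV1', reuse=...)
#     slim.arg_scope(mobilenet_v1.mobilenet_v1_arg_scope(is_training=...))
#       slim.arg_scope([slim.batch_norm], fused=False)
#         mobilenet_v1.mobilenet_v1_base(...)
#
# Extra SSD feature maps:
#   slim.arg_scope(self._conv_hyperparams)
#     slim.arg_scope([slim.batch_norm], fused=False)
#       feature_map_generators.multi_resolution_feature_maps(...)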