Commit 0ba83cf0 authored by pkulzc's avatar pkulzc Committed by Sergio Guadarrama

Release MobileNet V3 models and SSDLite models with MobileNet V3 backbone. (#7678)

* Merged commit includes the following changes:
275131829  by Sergio Guadarrama:

    Updates mobilenet/README.md to be GitHub compatible, adds a V2+ reference to the mobilenet_v1.md file, and fixes invalid markdown

--
274908068  by Sergio Guadarrama:

    Opensource MobilenetV3 detection models.

--
274697808  by Sergio Guadarrama:

    Fixed cases where tf.TensorShape was constructed with float dimensions

    This is a prerequisite for making TensorShape and Dimension more strict
    about the types of their arguments.

--
273577462  by Sergio Guadarrama:

    Fixing `conv_defs['defaults']` override issue.

--
272801298  by Sergio Guadarrama:

    Adds links to trained models for Mobilenet V3 and adds a minimalistic variant of Mobilenet V3 to the definitions.

--
268928503  by Sergio Guadarrama:

    Mobilenet v2 with group normalization.

--
263492735  by Sergio Guadarrama:

    Internal change

--

260037126  by Sergio Guadarrama:

    Adds an option of using a custom depthwise operation in `expanded_conv`.

--
259997001  by Sergio Guadarrama:

    Explicitly mark Python binaries/tests with python_version = "PY2".

--
252697685  by Sergio Guadarrama:

    Internal change

--

251918746  by Sergio Guadarrama:

    Internal change

--

251909704  by Sergio Guadarrama:

    Mobilenet V3 backbone implementation.

--
247510236  by Sergio Guadarrama:

    Internal change

--

246196802  by Sergio Guadarrama:

    Internal change

--

246014539  by Sergio Guadarrama:

    Internal change

--

245891435  by Sergio Guadarrama:

    Internal change

--

245834925  by Sergio Guadarrama:

    n/a

--

PiperOrigin-RevId: 275131829

* Merged commit includes the following changes:
274959989  by Zhichao Lu:

    Update detection model zoo with MobilenetV3 SSD candidates.

--
274908068  by Zhichao Lu:

    Opensource MobilenetV3 detection models.

--
274695889  by richardmunoz:

    RandomPatchGaussian preprocessing step

    This step can be used during model training to randomly apply Gaussian noise to a random image patch. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_patch_gaussian {
          random_coef: 0.5
          min_patch_size: 1
          max_patch_size: 250
          min_gaussian_stddev: 0.0
          max_gaussian_stddev: 1.0
        }
      }
      ...
    }

--
274257872  by lzc:

    Internal change.

--
274114689  by Zhichao Lu:

    Pass native_resize flag to other FPN variants.

--
274112308  by lzc:

    Internal change.

--
274090763  by richardmunoz:

    Util function for getting a patch mask on an image, for use with the Object Detection API.

--
274069806  by Zhichao Lu:

    Adding functions which will help compute predictions and losses for CenterNet.

--
273860828  by lzc:

    Internal change.

--
273380069  by richardmunoz:

    RandomImageDownscaleToTargetPixels preprocessing step

    This step can be used during model training to randomly downscale an image to a random target number of pixels. If the image does not contain more than the target number of pixels, then downscaling is skipped. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_downscale_to_target_pixels {
          random_coef: 0.5
          min_target_pixels: 300000
          max_target_pixels: 500000
        }
      }
      ...
    }

--
272987602  by Zhichao Lu:

    Avoid -inf when empty box list is passed.

--
272525836  by Zhichao Lu:

    Cleanup repeated resizing code in meta archs.

--
272458667  by richardmunoz:

    RandomJpegQuality preprocessing step

    This step can be used during model training to randomly encode the image into a jpeg with a random quality level. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_jpeg_quality {
          random_coef: 0.5
          min_jpeg_quality: 80
          max_jpeg_quality: 100
        }
      }
      ...
    }

--
271412717  by Zhichao Lu:

    Enables TPU training with the V2 eager + tf.function Object Detection training loops.

--
270744153  by Zhichao Lu:

    Adding the offset and size target assigners for CenterNet.

--
269916081  by Zhichao Lu:

    Include basic installation in Object Detection API tutorial.
    Also:
     - Use TF2.0
     - Use saved_model

--
269376056  by Zhichao Lu:

    Fix variable loading in RetinaNet with custom loops (makes the code rely a little less on the exact name scopes that are generated).

--
269256251  by lzc:

    Add use_partitioned_nms field to the config and update post_processing_builder to honor that flag when building the NMS function.

--
268865295  by Zhichao Lu:

    Adding functionality for importing and merging back internal state of the metric.

--
268640984  by Zhichao Lu:

    Fix computation of the Gaussian sigma value used to create the CenterNet heatmap target.

--
267475576  by Zhichao Lu:

    Fix for exporter trying to export non-existent exponential moving averages.

--
267286768  by Zhichao Lu:

    Update mixed-precision policy.

--
266166879  by Zhichao Lu:

    Internal change

--

265860884  by Zhichao Lu:

    Apply floor function to center coordinates when creating heatmap for CenterNet target.

--
265702749  by Zhichao Lu:

    Internal change

--
264241949  by ronnyvotel:

    Updating Faster R-CNN 'final_anchors' to be in normalized coordinates.

--
264175192  by lzc:

    Update model_fn to only read hparams if it is not None.

--
264159328  by Zhichao Lu:

    Modify nearest neighbor upsampling to eliminate a multiply operation. For quantized models, the multiply operation gets unnecessarily quantized and reduces accuracy; simple stacking, which doesn't require quantization, works in place of the broadcast op. Also removes an unnecessary reshape op.

--
263668306  by Zhichao Lu:

    Add the option to use dynamic map_fn for batch NMS

--
263031163  by Zhichao Lu:

    Mark outside compilation for NMS as optional.

--
263024916  by Zhichao Lu:

    Add an ExperimentalModel meta arch for experimenting with new model types.

--
262655894  by Zhichao Lu:

    Add the center heatmap target assigner for CenterNet

--
262431036  by Zhichao Lu:

    Adding add_eval_dict to allow for evaluation on model_v2

--
262035351  by ronnyvotel:

    Removing any non-Tensor predictions from the third stage of Mask R-CNN.

--
261953416  by Zhichao Lu:

    Internal change.

--
261834966  by Zhichao Lu:

    Fix the NMS OOM issue on TPU by forcing NMS to run outside of TPU.

--
261775941  by Zhichao Lu:

    Make Keras InputLayer compatible with both TF 1.x and TF 2.0.

--
261775633  by Zhichao Lu:

    Visualize additional channels with ground-truth bounding boxes.

--
261768117  by lzc:

    Internal change.

--
261766773  by ronnyvotel:

    Exposing `return_raw_detections_during_predict` in Faster R-CNN Proto.

--
260975089  by ronnyvotel:

    Moving calculation of batched prediction tensor names after all tensors in prediction dictionary are created.

--
259816913  by ronnyvotel:

    Adding raw detection boxes and feature map indices to SSD

--
259791955  by Zhichao Lu:

    Added a flag to control the use of partitioned_non_max_suppression.

--
259580475  by Zhichao Lu:

    Tweak quantization-aware training re-writer to support NasFpn model architecture.

--
259579943  by rathodv:

    Add a meta target assigner proto and builders in OD API.

--
259577741  by Zhichao Lu:

    Internal change.

--
259366315  by lzc:

    Internal change.

--
259344310  by ronnyvotel:

    Updating faster rcnn so that raw_detection_boxes from predict() are in normalized coordinates.

--
259338670  by Zhichao Lu:

    Add support for use_native_resize_op to more feature extractors. Use dynamic shapes when static shapes are not available.

--
259083543  by ronnyvotel:

    Updating/fixing documentation.

--
259078937  by rathodv:

    Add prediction fields for tensors returned from detection_model.predict.

--
259044601  by Zhichao Lu:

    Add protocol buffer and builders for temperature scaling calibration.

--
259036770  by lzc:

    Internal changes.

--
259006223  by ronnyvotel:

    Adding detection anchor indices to Faster R-CNN Config. This is useful when one wishes to associate final detections and the anchors (or pre-nms boxes) from which they originated.

--
258872501  by Zhichao Lu:

    Run the training pipeline of ssd + resnet_v1_50 + fpn with a checkpoint.

--
258840686  by ronnyvotel:

    Adding standard outputs to DetectionModel.predict(). This CL only updates Faster R-CNN. Other meta architectures will be updated in future CLs.

--
258672969  by lzc:

    Internal change.

--
258649494  by lzc:

    Internal changes.

--
258630321  by ronnyvotel:

    Fixing documentation in shape_utils.flatten_dimensions().

--
258468145  by Zhichao Lu:

    Add additional output tensors parameter to Postprocess op.

--
258099219  by Zhichao Lu:

    Internal changes

--

PiperOrigin-RevId: 274959989
parent 9aed0ffb
......@@ -10,5 +10,15 @@ message DetectionModel {
oneof model {
FasterRcnn faster_rcnn = 1;
Ssd ssd = 2;
// This can be used to define experimental models. To define your own
// experimental meta architecture, populate a key in the
// model_builder.EXPERIMENTAL_META_ARCHITECURE_BUILDER_MAP dict and set its
// value to a function that builds your model.
ExperimentalModel experimental_model = 3;
}
}
message ExperimentalModel {
optional string name = 1;
}
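As a purely hypothetical illustration of the registration mechanism described in the comment above (the dict name is taken verbatim from that comment; the builder signature is an assumption, not confirmed by this diff):

from object_detection.builders import model_builder

def build_my_experimental_model(experimental_model_config, is_training):
  # Construct and return a DetectionModel subclass here (sketch only).
  raise NotImplementedError

model_builder.EXPERIMENTAL_META_ARCHITECURE_BUILDER_MAP['my_model'] = (
    build_my_experimental_model)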
......@@ -40,8 +40,11 @@ message BatchNonMaxSuppression {
// Soft NMS sigma parameter (Bodla et al., https://arxiv.org/abs/1704.04503).
optional float soft_nms_sigma = 9 [default = 0.0];
// Whether to use partitioned version of non_max_suppression.
optional bool use_partitioned_nms = 10 [default = false];
// Whether to use tf.image.combined_non_max_suppression.
optional bool use_combined_nms = 11 [default = false];
}
// Configuration proto for post-processing predicted boxes and
......
......@@ -39,6 +39,9 @@ message PreprocessingStep {
AutoAugmentImage autoaugment_image = 31;
DropLabelProbabilistically drop_label_probabilistically = 32;
RemapLabels remap_labels = 33;
RandomJpegQuality random_jpeg_quality = 34;
RandomDownscaleToTargetPixels random_downscale_to_target_pixels = 35;
RandomPatchGaussian random_patch_gaussian = 36;
}
}
......@@ -490,3 +493,43 @@ message RemapLabels {
// Label to map to.
optional int32 new_label = 2;
}
// Applies a jpeg encoding with a random quality factor.
message RandomJpegQuality {
// Probability of keeping the original image.
optional float random_coef = 1 [default = 0.0];
// Minimum jpeg quality to use.
optional int32 min_jpeg_quality = 2 [default = 0];
// Maximum jpeg quality to use.
optional int32 max_jpeg_quality = 3 [default = 100];
}
// Randomly shrinks image (keeping aspect ratio) to a target number of pixels.
// If the image contains less than the chosen target number of pixels, it will
// not be changed.
message RandomDownscaleToTargetPixels {
// Probability of keeping the original image.
optional float random_coef = 1 [default = 0.0];
// The target number of pixels will be chosen to be in the range
// [min_target_pixels, max_target_pixels]
optional int32 min_target_pixels = 2 [default = 300000];
optional int32 max_target_pixels = 3 [default = 500000];
}
message RandomPatchGaussian {
// Probability of keeping the original image.
optional float random_coef = 1 [default = 0.0];
// The patch size will be chosen to be in the range
// [min_patch_size, max_patch_size).
optional int32 min_patch_size = 2 [default = 1];
optional int32 max_patch_size = 3 [default = 250];
// The standard deviation of the gaussian noise applied within the patch will
// be chosen to be in the range [min_gaussian_stddev, max_gaussian_stddev).
optional float min_gaussian_stddev = 4 [default = 0.0];
optional float max_gaussian_stddev = 5 [default = 1.0];
}
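For intuition only: recent TensorFlow releases ship a stock op with semantics similar to the RandomJpegQuality message above (minus the random_coef keep-probability), so the step can be sketched as follows. This is an illustration of the configured behavior, not the API's actual implementation:

import tensorflow as tf

def random_jpeg_quality_sketch(image, min_jpeg_quality=80,
                               max_jpeg_quality=100):
  # image: a uint8 tensor of shape [height, width, 3]. Re-encodes the image
  # at a quality drawn uniformly from [min_jpeg_quality, max_jpeg_quality].
  # Note: this sketch does not model random_coef.
  return tf.image.random_jpeg_quality(image, min_jpeg_quality,
                                      max_jpeg_quality)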
......@@ -13,7 +13,7 @@ import "object_detection/protos/post_processing.proto";
import "object_detection/protos/region_similarity_calculator.proto";
// Configuration for Single Shot Detection (SSD) models.
// Next id: 27
message Ssd {
// Number of classes to predict.
optional int32 num_classes = 1;
......@@ -96,6 +96,8 @@ message Ssd {
optional float implicit_example_weight = 23 [default = 1.0];
optional bool return_raw_detections_during_predict = 26 [default = false];
// Configuration proto for MaskHead.
// Next id: 11
message MaskHead {
......
syntax = "proto2";
package object_detection.protos;
import "object_detection/protos/box_coder.proto";
import "object_detection/protos/matcher.proto";
import "object_detection/protos/region_similarity_calculator.proto";
// Message to configure Target Assigner for object detectors.
message TargetAssigner {
optional Matcher matcher = 1;
optional RegionSimilarityCalculator similarity_calculator = 2;
optional BoxCoder box_coder = 3;
}
# SSDLite with Mobilenet v3 large feature extractor.
# Trained on COCO14, initialized from scratch.
# 3.22M parameters, 1.02B FLOPs
# TPU-compatible.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
ssd {
inplace_batchnorm_update: true
freeze_batchnorm: false
num_classes: 90
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
use_matmul_gather: true
}
}
similarity_calculator {
iou_similarity {
}
}
encode_background_as_zeros: true
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
}
}
image_resizer {
fixed_shape_resizer {
height: 320
width: 320
}
}
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 3
use_depthwise: true
box_code_size: 4
apply_sigmoid_to_scores: false
class_prediction_bias_init: -4.6
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
random_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.97,
epsilon: 0.001,
}
}
}
}
feature_extractor {
type: 'ssd_mobilenet_v3_large'
min_depth: 16
depth_multiplier: 1.0
use_depthwise: true
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.97,
epsilon: 0.001,
}
}
override_base_feature_extractor_hyperparams: true
}
loss {
classification_loss {
weighted_sigmoid_focal {
alpha: 0.75,
gamma: 2.0
}
}
localization_loss {
weighted_smooth_l1 {
delta: 1.0
}
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
normalize_loc_loss_by_codesize: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
use_static_shapes: true
}
score_converter: SIGMOID
}
}
}
train_config: {
batch_size: 512
sync_replicas: true
startup_delay_steps: 0
replicas_to_aggregate: 32
num_steps: 400000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
optimizer {
momentum_optimizer: {
learning_rate: {
cosine_decay_learning_rate {
learning_rate_base: 0.4
total_steps: 400000
warmup_learning_rate: 0.13333
warmup_steps: 2000
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-?????-of-00100"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
eval_config: {
num_examples: 8000
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-?????-of-00010"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
shuffle: false
num_readers: 1
}
# SSDLite with Mobilenet v3 small feature extractor.
# Trained on COCO14, initialized from scratch.
# TPU-compatible.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
ssd {
inplace_batchnorm_update: true
freeze_batchnorm: false
num_classes: 90
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
use_matmul_gather: true
}
}
similarity_calculator {
iou_similarity {
}
}
encode_background_as_zeros: true
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
}
}
image_resizer {
fixed_shape_resizer {
height: 320
width: 320
}
}
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 3
use_depthwise: true
box_code_size: 4
apply_sigmoid_to_scores: false
class_prediction_bias_init: -4.6
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
random_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.97,
epsilon: 0.001,
}
}
}
}
feature_extractor {
type: 'ssd_mobilenet_v3_small'
min_depth: 16
depth_multiplier: 1.0
use_depthwise: true
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.97,
epsilon: 0.001,
}
}
override_base_feature_extractor_hyperparams: true
}
loss {
classification_loss {
weighted_sigmoid_focal {
alpha: 0.75,
gamma: 2.0
}
}
localization_loss {
weighted_smooth_l1 {
delta: 1.0
}
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
normalize_loc_loss_by_codesize: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
use_static_shapes: true
}
score_converter: SIGMOID
}
}
}
train_config: {
batch_size: 512
sync_replicas: true
startup_delay_steps: 0
replicas_to_aggregate: 32
num_steps: 800000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
optimizer {
momentum_optimizer: {
learning_rate: {
cosine_decay_learning_rate {
learning_rate_base: 0.4
total_steps: 800000
warmup_learning_rate: 0.13333
warmup_steps: 2000
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-?????-of-00100"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
eval_config: {
num_examples: 8000
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-?????-of-00010"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
shuffle: false
num_readers: 1
}
......@@ -97,7 +97,12 @@ def get_prediction_tensor_shapes(pipeline_config):
prediction_dict = detection_model.predict(preprocessed_inputs,
true_image_shapes)
shapes_info = {}
for k, v in prediction_dict.items():
if isinstance(v, list):
shapes_info[k] = [item.shape.as_list() for item in v]
else:
shapes_info[k] = v.shape.as_list()
return shapes_info
......@@ -200,7 +205,12 @@ def build_graph(pipeline_config,
}
for k in prediction_dict:
if isinstance(prediction_dict[k], list):
  # set_shape returns None, so mutate the tensors in place rather than
  # assigning the result of a list comprehension.
  for idx in range(len(prediction_dict[k])):
    prediction_dict[k][idx].set_shape(shapes_info[k][idx])
else:
  prediction_dict[k].set_shape(shapes_info[k])
if use_bfloat16:
prediction_dict = utils.bfloat16_to_float32_nested(prediction_dict)
......
......@@ -552,6 +552,9 @@ def _maybe_update_config_with_key_value(configs, key, value):
_update_retain_original_images(configs["eval_config"], value)
elif field_name == "use_bfloat16":
_update_use_bfloat16(configs, value)
elif field_name == "retain_original_image_additional_channels_in_eval":
_update_retain_original_image_additional_channels(configs["eval_config"],
value)
else:
return False
return True
......@@ -935,3 +938,62 @@ def _update_use_bfloat16(configs, use_bfloat16):
use_bfloat16: A bool, indicating whether to use bfloat16 for training.
"""
configs["train_config"].use_bfloat16 = use_bfloat16
def _update_retain_original_image_additional_channels(
eval_config,
retain_original_image_additional_channels):
"""Updates eval config to retain original image additional channels or not.
The eval_config object is updated in place, and hence not returned.
Args:
eval_config: A eval_pb2.EvalConfig.
retain_original_image_additional_channels: Boolean indicating whether to
retain original image additional channels in eval mode.
"""
eval_config.retain_original_image_additional_channels = (
retain_original_image_additional_channels)
def remove_unecessary_ema(variables_to_restore, no_ema_collection=None):
"""Remap and Remove EMA variable that are not created during training.
ExponentialMovingAverage.variables_to_restore() returns a map of EMA names
to tf variables to restore. E.g.:
{
conv/batchnorm/gamma/ExponentialMovingAverage: conv/batchnorm/gamma,
conv_4/conv2d_params/ExponentialMovingAverage: conv_4/conv2d_params,
global_step: global_step
}
This function takes care of the extra ExponentialMovingAverage variables
that get created during eval but aren't available in the checkpoint, by
remapping each such key to the variable itself and removing the entry of
its EMA from the variables to restore. An example resulting
dictionary would look like:
{
conv/batchnorm/gamma: conv/batchnorm/gamma,
conv_4/conv2d_params: conv_4/conv2d_params,
global_step: global_step
}
Args:
variables_to_restore: A dictionary created by ExponentialMovingAverage.
variables_to_restore().
no_ema_collection: A list of namescope substrings to match the variables
to eliminate EMA.
Returns:
A variables_to_restore dictionary excluding the collection of unwanted
EMA mapping.
"""
if no_ema_collection is None:
return variables_to_restore
# Iterate over a copy of the keys since entries are deleted in the loop.
for key in list(variables_to_restore.keys()):
if "ExponentialMovingAverage" in key:
for name in no_ema_collection:
if name in key:
variables_to_restore[key.replace("/ExponentialMovingAverage",
"")] = variables_to_restore[key]
del variables_to_restore[key]
return variables_to_restore
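A plausible usage sketch for the helper above, in the spirit of the EMA exporter fix in this commit (TF 1.x graph mode assumed; the collection substrings mirror the test below):

ema = tf.train.ExponentialMovingAverage(decay=0.9999)
variables_to_restore = ema.variables_to_restore()
# Drop EMA entries for quantization min/max variables, which exist at eval
# time but were never written to the training checkpoint.
variables_to_restore = remove_unecessary_ema(
    variables_to_restore, no_ema_collection=["/min", "/max"])
saver = tf.train.Saver(variables_to_restore)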
......@@ -872,6 +872,62 @@ class ConfigUtilTest(tf.test.TestCase):
field_name="shuffle",
value=False)
def testOverWriteRetainOriginalImageAdditionalChannels(self):
"""Tests that keyword arguments are applied correctly."""
original_retain_original_image_additional_channels = True
desired_retain_original_image_additional_channels = False
pipeline_config_path = os.path.join(self.get_temp_dir(), "pipeline.config")
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
pipeline_config.eval_config.retain_original_image_additional_channels = (
original_retain_original_image_additional_channels)
_write_config(pipeline_config, pipeline_config_path)
configs = config_util.get_configs_from_pipeline_file(pipeline_config_path)
override_dict = {
"retain_original_image_additional_channels_in_eval":
desired_retain_original_image_additional_channels
}
configs = config_util.merge_external_params_with_configs(
configs, kwargs_dict=override_dict)
retain_original_image_additional_channels = configs[
"eval_config"].retain_original_image_additional_channels
self.assertEqual(desired_retain_original_image_additional_channels,
retain_original_image_additional_channels)
def testRemoveUnecessaryEma(self):
input_dict = {
"expanded_conv_10/project/act_quant/min":
1,
"FeatureExtractor/MobilenetV2_2/expanded_conv_5/expand/act_quant/min":
2,
"expanded_conv_10/expand/BatchNorm/gamma/min/ExponentialMovingAverage":
3,
"expanded_conv_3/depthwise/BatchNorm/beta/max/ExponentialMovingAverage":
4,
"BoxPredictor_1/ClassPredictor_depthwise/act_quant":
5
}
no_ema_collection = ["/min", "/max"]
output_dict = {
"expanded_conv_10/project/act_quant/min":
1,
"FeatureExtractor/MobilenetV2_2/expanded_conv_5/expand/act_quant/min":
2,
"expanded_conv_10/expand/BatchNorm/gamma/min":
3,
"expanded_conv_3/depthwise/BatchNorm/beta/max":
4,
"BoxPredictor_1/ClassPredictor_depthwise/act_quant":
5
}
self.assertEqual(
output_dict,
config_util.remove_unecessary_ema(input_dict, no_ema_collection))
if __name__ == "__main__":
tf.test.main()
......@@ -227,6 +227,29 @@ class ObjectDetectionEvaluator(DetectionEvaluator):
])
self._build_metric_names()
def get_internal_state(self):
"""Returns internal state and image ids that lead to the state.
Note that only evaluation results will be returned (e.g. not raw predictions
or groundtruth).
"""
return self._evaluation.get_internal_state(), self._image_ids
def merge_internal_state(self, image_ids, state_tuple):
"""Merges internal state with the existing state of evaluation.
If an image_id has already been seen by the evaluator, a ValueError is raised.
Args:
image_ids: list of images whose state is stored in the tuple.
state_tuple: state to merge, as returned by get_internal_state().
"""
for image_id in image_ids:
if image_id in self._image_ids:
raise ValueError('Image with id {} already added.'.format(image_id))
self._evaluation.merge_internal_state(state_tuple)
def _build_metric_names(self):
"""Builds a list with metric names."""
if self._recall_lower_bound > 0.0 or self._recall_upper_bound < 1.0:
......@@ -434,21 +457,23 @@ class ObjectDetectionEvaluator(DetectionEvaluator):
label_id_offset=self._label_id_offset)
self._image_ids.clear()
def add_eval_dict(self, eval_dict):
"""Observes an evaluation result dict for a single example.
When executing eagerly, once all observations have been observed by this
method you can use `.evaluate()` to get the final metrics.
When using `tf.estimator.Estimator` for evaluation this function is used by
`get_estimator_eval_metric_ops()` to construct the metric update op.
Args:
eval_dict: A dictionary that holds tensors for evaluating an object
detection model, returned from
eval_util.result_dict_for_single_example().
Returns:
None when executing eagerly, or an update_op that can be used to update
the eval metrics in `tf.estimator.EstimatorSpec`.
"""
# remove unexpected fields
eval_dict_filtered = dict()
......@@ -479,7 +504,25 @@ class ObjectDetectionEvaluator(DetectionEvaluator):
args = [eval_dict_filtered[standard_fields.InputDataFields.key]]
args.extend(six.itervalues(eval_dict_filtered))
return tf.py_func(update_op, args, [])
def get_estimator_eval_metric_ops(self, eval_dict):
"""Returns dict of metrics to use with `tf.estimator.EstimatorSpec`.
Note that this must only be implemented if performing evaluation with a
`tf.estimator.Estimator`.
Args:
eval_dict: A dictionary that holds tensors for evaluating an object
detection model, returned from
eval_util.result_dict_for_single_example(). It must contain
standard_fields.InputDataFields.key.
Returns:
A dictionary of metric names to tuple of value_op and update_op that can
be used as eval metric ops in `tf.estimator.EstimatorSpec`.
"""
update_op = self.add_eval_dict(eval_dict)
def first_value_func():
self._metrics = self.evaluate()
......@@ -919,6 +962,16 @@ class OpenImagesInstanceSegmentationChallengeEvaluator(
group_of_weight=0.0)
ObjectDetectionEvaluationState = collections.namedtuple(
'ObjectDetectionEvaluationState', [
'num_gt_instances_per_class',
'scores_per_class',
'tp_fp_labels_per_class',
'num_gt_imgs_per_class',
'num_images_correctly_detected_per_class',
])
class ObjectDetectionEvaluation(object):
"""Internal implementation of Pascal object detection metrics."""
......@@ -996,12 +1049,47 @@ class ObjectDetectionEvaluation(object):
self.average_precision_per_class.fill(np.nan)
self.precisions_per_class = [np.nan] * self.num_class
self.recalls_per_class = [np.nan] * self.num_class
self.sum_tp_class = [np.nan] * self.num_class
self.corloc_per_class = np.ones(self.num_class, dtype=float)
def clear_detections(self):
self._initialize_detections()
def get_internal_state(self):
"""Returns internal state of the evaluation.
NOTE: only evaluation results will be returned
(e.g. no raw predictions or groundtruth).
Returns:
internal state of the evaluation.
"""
return ObjectDetectionEvaluationState(
self.num_gt_instances_per_class, self.scores_per_class,
self.tp_fp_labels_per_class, self.num_gt_imgs_per_class,
self.num_images_correctly_detected_per_class)
def merge_internal_state(self, state_tuple):
"""Merges internal state of the evaluation with the current state.
Args:
state_tuple: state tuple representing evaluation state: should be of type
ObjectDetectionEvaluationState.
"""
(num_gt_instances_per_class, scores_per_class, tp_fp_labels_per_class,
num_gt_imgs_per_class, num_images_correctly_detected_per_class) = (
state_tuple)
assert self.num_class == len(num_gt_instances_per_class)
assert self.num_class == len(scores_per_class)
assert self.num_class == len(tp_fp_labels_per_class)
for i in range(self.num_class):
self.scores_per_class[i].extend(scores_per_class[i])
self.tp_fp_labels_per_class[i].extend(tp_fp_labels_per_class[i])
self.num_gt_instances_per_class[i] += num_gt_instances_per_class[i]
self.num_gt_imgs_per_class[i] += num_gt_imgs_per_class[i]
self.num_images_correctly_detected_per_class[
i] += num_images_correctly_detected_per_class[i]
def add_single_ground_truth_image_info(self,
image_key,
groundtruth_boxes,
......@@ -1162,9 +1250,9 @@ class ObjectDetectionEvaluation(object):
~groundtruth_is_difficult_list
& ~groundtruth_is_group_of_list] == class_index)
num_groupof_gt_instances = self.group_of_weight * np.sum(
groundtruth_class_labels[
groundtruth_is_group_of_list
& ~groundtruth_is_difficult_list] == class_index)
self.num_gt_instances_per_class[
class_index] += num_gt_instances + num_groupof_gt_instances
if np.any(groundtruth_class_labels == class_index):
......@@ -1216,6 +1304,7 @@ class ObjectDetectionEvaluation(object):
self.precisions_per_class[class_index] = precision_within_bound
self.recalls_per_class[class_index] = recall_within_bound
self.sum_tp_class[class_index] = tp_fp_labels.sum()
average_precision = metrics.compute_average_precision(
precision_within_bound, recall_within_bound)
self.average_precision_per_class[class_index] = average_precision
......
......@@ -941,6 +941,34 @@ class ObjectDetectionEvaluationTest(tf.test.TestCase):
self.assertAlmostEqual(expected_mean_ap, mean_ap)
self.assertAlmostEqual(expected_mean_corloc, mean_corloc)
def test_merge_internal_state(self):
# Test that if initial state is merged, the results of the evaluation are
# the same.
od_eval_state = self.od_eval.get_internal_state()
copy_od_eval = object_detection_evaluation.ObjectDetectionEvaluation(
self.od_eval.num_class)
copy_od_eval.merge_internal_state(od_eval_state)
(average_precision_per_class, mean_ap, precisions_per_class,
recalls_per_class, corloc_per_class,
mean_corloc) = self.od_eval.evaluate()
(copy_average_precision_per_class, copy_mean_ap, copy_precisions_per_class,
copy_recalls_per_class, copy_corloc_per_class,
copy_mean_corloc) = copy_od_eval.evaluate()
for i in range(self.od_eval.num_class):
self.assertTrue(
np.allclose(copy_precisions_per_class[i], precisions_per_class[i]))
self.assertTrue(
np.allclose(copy_recalls_per_class[i], recalls_per_class[i]))
self.assertTrue(
np.allclose(copy_average_precision_per_class,
average_precision_per_class))
self.assertTrue(np.allclose(copy_corloc_per_class, corloc_per_class))
self.assertAlmostEqual(copy_mean_ap, mean_ap)
self.assertAlmostEqual(copy_mean_corloc, mean_corloc)
class ObjectDetectionEvaluatorTest(tf.test.TestCase, parameterized.TestCase):
......
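The internal-state export/merge API shown above suggests a sharded-evaluation pattern roughly like the following sketch (illustrative only; per_worker_evaluators and categories are hypothetical names):

# Each worker accumulates detections independently; a coordinator merges
# the per-worker states and computes metrics once.
shards = []
for evaluator in per_worker_evaluators:  # ObjectDetectionEvaluator objects
  state, image_ids = evaluator.get_internal_state()
  shards.append((image_ids, state))

merged = object_detection_evaluation.ObjectDetectionEvaluator(categories)
for image_ids, state in shards:
  merged.merge_internal_state(image_ids, state)
metrics = merged.evaluate()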
......@@ -967,9 +967,8 @@ def nearest_neighbor_upsampling(input_tensor, scale=None, height_scale=None,
w_scale = scale if width_scale is None else width_scale
(batch_size, height, width,
channels) = shape_utils.combined_static_and_dynamic_shape(input_tensor)
output_tensor = tf.stack([input_tensor] * w_scale, axis=3)
output_tensor = tf.stack([output_tensor] * h_scale, axis=2)
return tf.reshape(output_tensor,
[batch_size, height * h_scale, width * w_scale, channels])
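A small numpy check (not part of the diff) showing why the stacking form reproduces the broadcast multiply it replaces, using the same axis layout as the code above:

import numpy as np

x = np.arange(4, dtype=np.float32).reshape(1, 2, 2, 1)
scale = 2

# Stack-based form (new): no multiply, so nothing extra to quantize.
stacked = np.stack([x] * scale, axis=3)        # [1, 2, 2, 2, 1]
stacked = np.stack([stacked] * scale, axis=2)  # [1, 2, 2, 2, 2, 1]
stacked = stacked.reshape(1, 4, 4, 1)

# Broadcast-multiply form (old).
broadcast = x.reshape(1, 2, 1, 2, 1, 1) * np.ones(
    (1, 1, scale, 1, scale, 1), dtype=x.dtype)
broadcast = broadcast.reshape(1, 4, 4, 1)

assert np.array_equal(stacked, broadcast)  # identical 2x upsampling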
......
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Operations for image patches."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
def get_patch_mask(y, x, patch_size, image_shape):
"""Creates a 2D mask array for a square patch of a given size and location.
The mask is created with its center at the y and x coordinates, which must be
within the image. While the mask center must be within the image, the mask
itself can be partially outside of it. If patch_size is an even number, then
the mask is created with lower-valued coordinates first (top and left).
Args:
y: An integer or scalar int32 tensor. The vertical coordinate of the
patch mask center. Must be within the range [0, image_height).
x: An integer or scalar int32 tensor. The horizontal coordinate of the
patch mask center. Must be within the range [0, image_width).
patch_size: An integer or scalar int32 tensor. The square size of the
patch mask. Must be at least 1.
image_shape: A list or 1D int32 tensor representing the shape of the image
to which the mask will correspond, with the first two values being image
height and width. For example, [image_height, image_width] or
[image_height, image_width, image_channels].
Returns:
Boolean mask tensor of shape [image_height, image_width] with True values
for the patch.
Raises:
tf.errors.InvalidArgumentError: if x is not in the range [0, image_width), y
is not in the range [0, image_height), or patch_size is not at least 1.
"""
image_hw = image_shape[:2]
mask_center_yx = tf.stack([y, x])
with tf.control_dependencies([
tf.debugging.assert_greater_equal(
patch_size, 1,
message='Patch size must be >= 1'),
tf.debugging.assert_greater_equal(
mask_center_yx, 0,
message='Patch center (y, x) must be >= (0, 0)'),
tf.debugging.assert_less(
mask_center_yx, image_hw,
message='Patch center (y, x) must be < image (h, w)')
]):
mask_center_yx = tf.identity(mask_center_yx)
half_patch_size = tf.cast(patch_size, dtype=tf.float32) / 2
start_yx = mask_center_yx - tf.cast(tf.floor(half_patch_size), dtype=tf.int32)
end_yx = mask_center_yx + tf.cast(tf.ceil(half_patch_size), dtype=tf.int32)
start_yx = tf.maximum(start_yx, 0)
end_yx = tf.minimum(end_yx, image_hw)
start_y = start_yx[0]
start_x = start_yx[1]
end_y = end_yx[0]
end_x = end_yx[1]
lower_pad = image_hw[0] - end_y
upper_pad = start_y
left_pad = start_x
right_pad = image_hw[1] - end_x
mask = tf.ones([end_y - start_y, end_x - start_x], dtype=tf.bool)
return tf.pad(mask, [[upper_pad, lower_pad], [left_pad, right_pad]])
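An illustrative sketch (hypothetical values, TF 1.x API) of how this mask could drive patch-level Gaussian noise, roughly what the random_patch_gaussian step configures:

import tensorflow as tf
from object_detection.utils import patch_ops

image = tf.random_uniform([64, 64, 3])  # float image in [0, 1)
mask = patch_ops.get_patch_mask(
    32, 32, patch_size=16, image_shape=[64, 64, 3])
noise = tf.random_normal([64, 64, 3], stddev=0.5)
mask_3d = tf.tile(mask[:, :, tf.newaxis], [1, 1, 3])
noisy_image = tf.where(mask_3d, image + noise, image)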
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.utils.patch_ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from object_detection.utils import patch_ops
class GetPatchMaskTest(tf.test.TestCase, parameterized.TestCase):
def testMaskShape(self):
image_shape = [15, 10]
mask = patch_ops.get_patch_mask(
10, 5, patch_size=3, image_shape=image_shape)
self.assertListEqual(mask.shape.as_list(), image_shape)
def testHandleImageShapeWithChannels(self):
image_shape = [15, 10, 3]
mask = patch_ops.get_patch_mask(
10, 5, patch_size=3, image_shape=image_shape)
self.assertListEqual(mask.shape.as_list(), image_shape[:2])
def testMaskDType(self):
mask = patch_ops.get_patch_mask(2, 3, patch_size=2, image_shape=[6, 7])
self.assertDTypeEqual(mask, bool)
def testMaskAreaWithEvenPatchSize(self):
image_shape = [6, 7]
mask = patch_ops.get_patch_mask(2, 3, patch_size=2, image_shape=image_shape)
expected_mask = np.array([
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
]).reshape(image_shape).astype(bool)
self.assertAllEqual(mask, expected_mask)
def testMaskAreaWithEvenPatchSize4(self):
image_shape = [6, 7]
mask = patch_ops.get_patch_mask(2, 3, patch_size=4, image_shape=image_shape)
expected_mask = np.array([
[0, 1, 1, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
]).reshape(image_shape).astype(bool)
self.assertAllEqual(mask, expected_mask)
def testMaskAreaWithOddPatchSize(self):
image_shape = [6, 7]
mask = patch_ops.get_patch_mask(2, 3, patch_size=3, image_shape=image_shape)
expected_mask = np.array([
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
]).reshape(image_shape).astype(bool)
self.assertAllEqual(mask, expected_mask)
def testMaskAreaPartiallyOutsideImage(self):
image_shape = [6, 7]
mask = patch_ops.get_patch_mask(5, 6, patch_size=5, image_shape=image_shape)
expected_mask = np.array([
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 1, 1, 1],
]).reshape(image_shape).astype(bool)
self.assertAllEqual(mask, expected_mask)
@parameterized.parameters(
{'y': 0, 'x': -1},
{'y': -1, 'x': 0},
{'y': 0, 'x': 11},
{'y': 16, 'x': 0},
)
def testStaticCoordinatesOutsideImageRaisesError(self, y, x):
image_shape = [15, 10]
with self.assertRaises(tf.errors.InvalidArgumentError):
patch_ops.get_patch_mask(y, x, patch_size=3, image_shape=image_shape)
def testDynamicCoordinatesOutsideImageRaisesError(self):
image_shape = [15, 10]
x = tf.random_uniform([], minval=-2, maxval=-1, dtype=tf.int32)
y = tf.random_uniform([], minval=0, maxval=1, dtype=tf.int32)
mask = patch_ops.get_patch_mask(
y, x, patch_size=3, image_shape=image_shape)
with self.assertRaises(tf.errors.InvalidArgumentError):
self.evaluate(mask)
@parameterized.parameters(
{'patch_size': 0},
{'patch_size': -1},
)
def testStaticNonPositivePatchSizeRaisesError(self, patch_size):
image_shape = [6, 7]
with self.assertRaises(tf.errors.InvalidArgumentError):
patch_ops.get_patch_mask(
0, 0, patch_size=patch_size, image_shape=image_shape)
def testDynamicNonPositivePatchSizeRaisesError(self):
image_shape = [6, 7]
patch_size = -1 * tf.random_uniform([], minval=0, maxval=3, dtype=tf.int32)
mask = patch_ops.get_patch_mask(
0, 0, patch_size=patch_size, image_shape=image_shape)
with self.assertRaises(tf.errors.InvalidArgumentError):
self.evaluate(mask)
if __name__ == '__main__':
tf.test.main()
......@@ -383,7 +383,7 @@ def flatten_dimensions(inputs, first, last):
Example:
`inputs` is a tensor with initial shape [10, 5, 20, 20, 3].
new_tensor = flatten_dimensions(inputs, first=1, last=3)
new_tensor.shape -> [10, 100, 20, 3].
Args:
......@@ -465,3 +465,34 @@ def expand_first_dimension(inputs, dims):
inputs_reshaped = tf.reshape(inputs, expanded_shape)
return inputs_reshaped
def resize_images_and_return_shapes(inputs, image_resizer_fn):
"""Resizes images using the given function and returns their true shapes.
Args:
inputs: a float32 Tensor representing a batch of inputs of shape
[batch_size, height, width, channels].
image_resizer_fn: a function which takes in a single image and outputs
a resized image and its original shape.
Returns:
resized_inputs: The inputs resized according to image_resizer_fn.
true_image_shapes: An integer tensor of shape [batch_size, 3]
representing the height, width and number of channels in inputs.
"""
if inputs.dtype is not tf.float32:
raise ValueError('`resize_images_and_return_shapes` expects a'
' tf.float32 tensor')
# TODO(jonathanhuang): revisit whether to always use batch size as
# the number of parallel iterations vs allow for dynamic batching.
outputs = static_or_dynamic_map_fn(
image_resizer_fn,
elems=inputs,
dtype=[tf.float32, tf.int32])
resized_inputs = outputs[0]
true_image_shapes = outputs[1]
return resized_inputs, true_image_shapes
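A hypothetical usage sketch with a toy resizer function (real callers build image_resizer_fn from the image_resizer config via the OD API builders; the names below are illustrative):

def toy_image_resizer(image):
  resized = tf.image.resize_images(image, [320, 320])
  return resized, tf.shape(resized)

images = tf.zeros([8, 480, 640, 3], dtype=tf.float32)
resized, true_shapes = resize_images_and_return_shapes(
    images, toy_image_resizer)
# resized: [8, 320, 320, 3] float32; true_shapes: [8, 3] int32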
......@@ -290,10 +290,11 @@ def multilevel_roi_align(features, boxes, box_levels, output_size,
Args:
features: A list of 4D float tensors of shape [batch_size, max_height,
max_width, channels] containing features. Note that each feature map must
have the same number of channels.
boxes: A 3D float tensor of shape [batch_size, num_boxes, 4] containing
boxes of the form [ymin, xmin, ymax, xmax] in normalized coordinates.
box_levels: A 3D int32 tensor of shape [batch_size, num_boxes]
representing the feature level index for each box.
output_size: A list of two integers [size_y, size_x] indicating the output
feature size for each box.
......
......@@ -536,7 +536,7 @@ def draw_side_by_side_evaluation_image(eval_dict,
# Add the batch dimension if the eval_dict is for single example.
if len(eval_dict[detection_fields.detection_classes].shape) == 1:
for key in eval_dict:
if (key != input_data_fields.original_image and
    key != input_data_fields.image_additional_channels):
eval_dict[key] = tf.expand_dims(eval_dict[key], 0)
for indx in range(eval_dict[input_data_fields.original_image].shape[0]):
......@@ -600,8 +600,42 @@ def draw_side_by_side_evaluation_image(eval_dict,
max_boxes_to_draw=None,
min_score_thresh=0.0,
use_normalized_coordinates=use_normalized_coordinates)
images_to_visualize = tf.concat([images_with_detections,
images_with_groundtruth], axis=2)
if input_data_fields.image_additional_channels in eval_dict:
images_with_additional_channels_groundtruth = (
draw_bounding_boxes_on_image_tensors(
tf.expand_dims(
eval_dict[input_data_fields.image_additional_channels][indx],
axis=0),
tf.expand_dims(
eval_dict[input_data_fields.groundtruth_boxes][indx], axis=0),
tf.expand_dims(
eval_dict[input_data_fields.groundtruth_classes][indx],
axis=0),
tf.expand_dims(
tf.ones_like(
eval_dict[input_data_fields.groundtruth_classes][indx],
dtype=tf.float32),
axis=0),
category_index,
original_image_spatial_shape=tf.expand_dims(
eval_dict[input_data_fields.original_image_spatial_shape]
[indx],
axis=0),
true_image_shape=tf.expand_dims(
eval_dict[input_data_fields.true_image_shape][indx], axis=0),
instance_masks=groundtruth_instance_masks,
keypoints=None,
max_boxes_to_draw=None,
min_score_thresh=0.0,
use_normalized_coordinates=use_normalized_coordinates))
images_to_visualize = tf.concat(
[images_to_visualize, images_with_additional_channels_groundtruth],
axis=2)
images_with_detections_list.append(images_to_visualize)
return images_with_detections_list
......
# Description:
# Contains files for loading, training and evaluating TF-Slim-based models.
# load("//devtools/python/blaze:python3.bzl", "py2and3_test")
package(
default_visibility = ["//visibility:public"],
......@@ -91,6 +92,7 @@ sh_binary(
py_binary(
name = "build_visualwakewords_data",
srcs = ["datasets/build_visualwakewords_data.py"],
python_version = "PY2",
deps = [
":build_visualwakewords_data_lib",
# "//tensorflow",
......@@ -512,11 +514,34 @@ py_library(
],
)
py_library(
name = "mobilenet_common",
srcs = [
"nets/mobilenet/conv_blocks.py",
"nets/mobilenet/mobilenet.py",
],
srcs_version = "PY2AND3",
deps = [
# "//tensorflow",
],
)
py_library(
name = "mobilenet_v2",
srcs = ["nets/mobilenet/mobilenet_v2.py"],
srcs_version = "PY2AND3",
deps = [
":mobilenet_common",
# "//tensorflow",
],
)
py_library(
name = "mobilenet_v3",
srcs = ["nets/mobilenet/mobilenet_v3.py"],
srcs_version = "PY2AND3",
deps = [
":mobilenet_common",
# "//tensorflow",
],
)
......@@ -532,11 +557,22 @@ py_test(
],
)
py_test( # py2and3_test
name = "mobilenet_v3_test",
srcs = ["nets/mobilenet/mobilenet_v3_test.py"],
srcs_version = "PY2AND3",
deps = [
":mobilenet",
# "//tensorflow",
],
)
py_library(
name = "mobilenet",
deps = [
":mobilenet_v1",
":mobilenet_v2",
":mobilenet_v3",
],
)
......@@ -709,6 +745,7 @@ py_library(
py_test(
name = "resnet_v1_test",
size = "medium",
timeout = "long",
srcs = ["nets/resnet_v1_test.py"],
python_version = "PY2",
shard_count = 2,
......
File mode changed from 100644 to 100755