Unverified Commit 79ae8004 authored by pkulzc, committed by GitHub

Release Context RCNN code and pre-trained model

Context R-CNN: Long Term Temporal Context for Per Camera Object Detection

http://openaccess.thecvf.com/content_CVPR_2020/html/Beery_Context_R-CNN_Long_Term_Temporal_Context_for_Per-Camera_Object_Detection_CVPR_2020_paper.html
parent c87c3965
@@ -22,6 +22,18 @@ contextual features. We focus on building context from object-centric features
generated with a pre-trained Faster R-CNN model, but you can adapt the provided
code to use alternative feature extractors.
Each of these data processing scripts uses Apache Beam, which can be installed
using
```
pip install apache-beam
```
and can be run locally or on a cluster for efficient processing of large
amounts of data. See the
[Apache Beam documentation](https://beam.apache.org/documentation/runners/dataflow/)
for more information.
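As a minimal sketch of what a locally run Beam pipeline looks like (illustrative only; the paths and transforms below are placeholders, not the flags of the scripts in this repo):
```
# Runs locally on the DirectRunner; pointing the runner option at
# DataflowRunner (plus project/staging options) moves the same pipeline to a
# cluster. 'input.tfrecord*' and 'sizes' are placeholder paths.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(runner='DirectRunner')
with beam.Pipeline(options=options) as pipeline:
    _ = (
        pipeline
        | 'ReadRecords' >> beam.io.ReadFromTFRecord('input.tfrecord*')
        | 'RecordSizes' >> beam.Map(len)
        | 'WriteSizes' >> beam.io.WriteToText('sizes'))
```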
### Generating TfRecords from a set of images and a COCO-CameraTraps-style JSON
If your data is already stored in TfRecords, you can skip this first step.
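For reference, here is a minimal COCO-CameraTraps-style structure, written as a Python dict with placeholder values (field names follow the public COCO-CameraTraps format; only a representative subset is shown):
```
# Placeholder example of COCO-CameraTraps-style metadata. 'location'
# identifies the camera, and 'date_captured' supplies the timestamp that
# context grouping later relies on.
coco_cameratraps = {
    'images': [{
        'id': 'img_0001',
        'file_name': 'site01/img_0001.jpg',
        'location': 'site01',
        'date_captured': '2014-07-01 10:30:00',
    }],
    'categories': [{'id': 1, 'name': 'zebra'}],
    'annotations': [{
        'id': 'ann_0001',
        'image_id': 'img_0001',
        'category_id': 1,
    }],
}
```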
@@ -99,6 +111,10 @@ python object_detection/export_inference_graph.py \
--additional_output_tensor_names detection_features
```
Make sure that you have set `output_final_box_features: true` within
your config file before exporting. This is needed to export the features as an
output, but it does not need to be set during training.
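If you prefer to flip that flag programmatically rather than editing the file by hand, here is a sketch using the protobuf text format; it assumes the flag sits on the `faster_rcnn` message, as in the released Context R-CNN configs, and 'pipeline.config' is a placeholder path:
```
# Enable output_final_box_features in an existing pipeline config before
# exporting; assumes model.faster_rcnn.output_final_box_features exists.
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

config = pipeline_pb2.TrainEvalPipelineConfig()
with open('pipeline.config') as f:
    text_format.Merge(f.read(), config)
config.model.faster_rcnn.output_final_box_features = True
with open('pipeline.config', 'w') as f:
    f.write(text_format.MessageToString(config))
```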
To generate and save contextual features for your data, run
```
@@ -111,7 +127,8 @@ python object_detection/dataset_tools/context_rcnn/generate_embedding_data.py \
### Building up contextual memory banks and storing them for each context group
-To build the context features into memory banks, run
+To build the context features you just added for each image into memory banks,
+run
```
python object_detection/dataset_tools/context_rcnn/add_context_to_examples.py \
@@ -121,6 +138,9 @@ python object_detection/dataset_tools/context_rcnn/add_context_to_examples.py \
--time_horizon month
```
where the `input_tfrecords` for add_context_to_examples.py are the
`output_tfrecords` from generate_embedding_data.py.
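Conceptually, a time horizon of `month` puts images taken by the same camera in the same calendar month into one context group, along the lines of this hypothetical key function (names are illustrative, not the script's API):
```
# Illustrative only: one plausible grouping key for --time_horizon month.
import datetime

def context_group_key(camera_id, date_captured):
    """Same camera + same calendar month => same context group."""
    dt = datetime.datetime.strptime(date_captured, '%Y-%m-%d %H:%M:%S')
    return '{}_{}-{:02d}'.format(camera_id, dt.year, dt.month)

print(context_group_key('site01', '2014-07-01 10:30:00'))  # site01_2014-07
```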
For all options, see add_context_to_examples.py. By default, this code builds
TfSequenceExamples, which are more data-efficient (this allows you to store the
context features once for each context group, as opposed to once per image). If
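The data-efficiency point can be seen in the layout of a tf.train.SequenceExample: shared context lives once in the `context` field, while per-image data is appended to `feature_lists`. A minimal sketch with placeholder values and hypothetical key names:
```
# Context features are stored once for the whole group; each frame then only
# adds its own per-image features. Key names here are illustrative.
import tensorflow as tf

group = tf.train.SequenceExample(
    context=tf.train.Features(feature={
        'context_features': tf.train.Feature(
            float_list=tf.train.FloatList(value=[0.0] * 2057)),
    }),
    feature_lists=tf.train.FeatureLists(feature_list={
        'image/encoded': tf.train.FeatureList(feature=[
            tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'jpg1'])),
            tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'jpg2'])),
        ]),
    }))
```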
@@ -23,9 +23,9 @@ import functools
import os
import tensorflow.compat.v1 as tf
import tensorflow.compat.v2 as tf2
import tf_slim as slim
from object_detection import eval_util
from object_detection import exporter as exporter_lib
from object_detection import inputs
@@ -349,7 +349,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
from tensorflow.python.keras.engine import base_layer_utils # pylint: disable=g-import-not-at-top
# Enable v2 behavior, as `mixed_bfloat16` is only supported in TF 2.0.
base_layer_utils.enable_v2_dtype_behavior()
-tf.compat.v2.keras.mixed_precision.experimental.set_policy(
+tf2.keras.mixed_precision.experimental.set_policy(
'mixed_bfloat16')
detection_model = detection_model_fn(
is_training=is_training, add_summaries=(not use_tpu))
# Context R-CNN configuration for Snapshot Serengeti Dataset, with sequence
# example input data with context_features.
# This model uses attention into contextual features within the Faster R-CNN
# object detection framework to improve object detection performance.
# See https://arxiv.org/abs/1912.03538 for more information.
# Search for "PATH_TO_BE_CONFIGURED" to find the fields that should be
# configured.
# This config is TPU compatible.
model {
  faster_rcnn {
    num_classes: 48
    image_resizer {
      fixed_shape_resizer {
        height: 640
        width: 640
      }
    }
    feature_extractor {
      type: "faster_rcnn_resnet101"
      first_stage_features_stride: 16
      batch_norm_trainable: true
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        height_stride: 16
        width_stride: 16
        scales: 0.25
        scales: 0.5
        scales: 1.0
        scales: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 1.0
        aspect_ratios: 2.0
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.00999999977648
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.699999988079
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        use_dropout: false
        dropout_keep_probability: 1.0
        share_box_across_classes: true
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.600000023842
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    use_matmul_crop_and_resize: true
    clip_anchors_to_image: true
    use_matmul_gather_in_matcher: true
    use_static_balanced_label_sampler: true
    use_static_shapes: true
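    # Context R-CNN additions: the detector attends over a memory bank of up
    # to max_num_context_features embeddings per context group;
    # context_feature_length should match the dimensionality of the features
    # written by generate_embedding_data.py.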
    context_config {
      max_num_context_features: 2000
      context_feature_length: 2057
    }
  }
}
train_config {
  batch_size: 64
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        manual_step_learning_rate {
          initial_learning_rate: 0.0
          schedule {
            step: 2000
            learning_rate: 0.00200000009499
          }
          schedule {
            step: 200000
            learning_rate: 0.000199999994948
          }
          schedule {
            step: 300000
            learning_rate: 1.99999994948e-05
          }
          warmup: true
        }
      }
      momentum_optimizer_value: 0.899999976158
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/faster_rcnn_resnet101_coco_2018_08_14/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 500000
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  use_bfloat16: true
}
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/ss_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/snapshot_serengeti_train-?????-of-?????"
  }
  load_context_features: true
  input_type: TF_SEQUENCE_EXAMPLE
}
eval_config {
  max_evals: 50
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 4
}
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/ss_label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/snapshot_serengeti_val-?????-of-?????"
  }
  load_context_features: true
  input_type: TF_SEQUENCE_EXAMPLE
}
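One way to sanity-check a filled-in copy of this config before launching training is to parse it with the API's config utilities; a short sketch, with 'pipeline.config' as a placeholder path:
```
# Parse the config and spot-check the Context R-CNN fields.
from object_detection.utils import config_util

configs = config_util.get_configs_from_pipeline_file('pipeline.config')
assert configs['model'].faster_rcnn.context_config.context_feature_length == 2057
assert configs['train_input_config'].load_context_features
```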