Unverified Commit 79ae8004 authored by pkulzc, committed by GitHub

Release Context RCNN code and pre-trained model

Context R-CNN: Long Term Temporal Context for Per Camera Object Detection

http://openaccess.thecvf.com/content_CVPR_2020/html/Beery_Context_R-CNN_Long_Term_Temporal_Context_for_Per-Camera_Object_Detection_CVPR_2020_paper.html
parent c87c3965
@@ -22,6 +22,18 @@ contextual features. We focus on building context from object-centric features
generated with a pre-trained Faster R-CNN model, but you can adapt the provided
code to use alternative feature extractors.
Each of these data processing scripts uses Apache Beam, which can be installed
using
```
pip install apache-beam
```
and can be run locally or on a cluster for efficient processing of large
amounts of data. See the
[Apache Beam documentation](https://beam.apache.org/documentation/runners/dataflow/)
for more information.
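As a minimal sketch of what a locally run Beam pipeline looks like (illustrative only; the paths and transforms below are placeholders, not the flags of the scripts in this repo):
```
# Runs locally on the DirectRunner; pointing the runner option at
# DataflowRunner (plus project/staging options) moves the same pipeline to a
# cluster. 'input.tfrecord*' and 'sizes' are placeholder paths.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(runner='DirectRunner')
with beam.Pipeline(options=options) as pipeline:
    _ = (
        pipeline
        | 'ReadRecords' >> beam.io.ReadFromTFRecord('input.tfrecord*')
        | 'RecordSizes' >> beam.Map(len)
        | 'WriteSizes' >> beam.io.WriteToText('sizes'))
```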
### Generating TfRecords from a set of images and a COCO-CameraTraps-style JSON
If your data is already stored in TfRecords, you can skip this first step.
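For reference, here is a minimal COCO-CameraTraps-style structure, written as a Python dict with placeholder values (field names follow the public COCO-CameraTraps format; only a representative subset is shown):
```
# Placeholder example of COCO-CameraTraps-style metadata. 'location'
# identifies the camera, and 'date_captured' supplies the timestamp that
# context grouping later relies on.
coco_cameratraps = {
    'images': [{
        'id': 'img_0001',
        'file_name': 'site01/img_0001.jpg',
        'location': 'site01',
        'date_captured': '2014-07-01 10:30:00',
    }],
    'categories': [{'id': 1, 'name': 'zebra'}],
    'annotations': [{
        'id': 'ann_0001',
        'image_id': 'img_0001',
        'category_id': 1,
    }],
}
```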
@@ -99,6 +111,10 @@ python object_detection/export_inference_graph.py \
--additional_output_tensor_names detection_features
```
Make sure that you have set `output_final_box_features: true` within
your config file before exporting. This is needed to export the features as an
output, but it does not need to be set during training.
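If you prefer to flip that flag programmatically rather than editing the file by hand, here is a sketch using the protobuf text format; it assumes the flag sits on the `faster_rcnn` message, as in the released Context R-CNN configs, and 'pipeline.config' is a placeholder path:
```
# Enable output_final_box_features in an existing pipeline config before
# exporting; assumes model.faster_rcnn.output_final_box_features exists.
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

config = pipeline_pb2.TrainEvalPipelineConfig()
with open('pipeline.config') as f:
    text_format.Merge(f.read(), config)
config.model.faster_rcnn.output_final_box_features = True
with open('pipeline.config', 'w') as f:
    f.write(text_format.MessageToString(config))
```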
To generate and save contextual features for your data, run
```
@@ -111,7 +127,8 @@ python object_detection/dataset_tools/context_rcnn/generate_embedding_data.py \
### Building up contextual memory banks and storing them for each context group
-To build the context features into memory banks, run
+To build the context features you just added for each image into memory banks,
+run
```
python object_detection/dataset_tools/context_rcnn/add_context_to_examples.py \
@@ -121,6 +138,9 @@ python object_detection/dataset_tools/context_rcnn/add_context_to_examples.py \
--time_horizon month
```
where the `input_tfrecords` for add_context_to_examples.py are the
`output_tfrecords` from generate_embedding_data.py.
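Conceptually, a time horizon of `month` puts images taken by the same camera in the same calendar month into one context group, along the lines of this hypothetical key function (names are illustrative, not the script's API):
```
# Illustrative only: one plausible grouping key for --time_horizon month.
import datetime

def context_group_key(camera_id, date_captured):
    """Same camera + same calendar month => same context group."""
    dt = datetime.datetime.strptime(date_captured, '%Y-%m-%d %H:%M:%S')
    return '{}_{}-{:02d}'.format(camera_id, dt.year, dt.month)

print(context_group_key('site01', '2014-07-01 10:30:00'))  # site01_2014-07
```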
For all options, see add_context_to_examples.py. By default, this code builds
TfSequenceExamples, which are more data-efficient (this allows you to store the
context features once for each context group, as opposed to once per image). If
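The data-efficiency point can be seen in the layout of a tf.train.SequenceExample: shared context lives once in the `context` field, while per-image data is appended to `feature_lists`. A minimal sketch with placeholder values and hypothetical key names:
```
# Context features are stored once for the whole group; each frame then only
# adds its own per-image features. Key names here are illustrative.
import tensorflow as tf

group = tf.train.SequenceExample(
    context=tf.train.Features(feature={
        'context_features': tf.train.Feature(
            float_list=tf.train.FloatList(value=[0.0] * 2057)),
    }),
    feature_lists=tf.train.FeatureLists(feature_list={
        'image/encoded': tf.train.FeatureList(feature=[
            tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'jpg1'])),
            tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'jpg2'])),
        ]),
    }))
```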
@@ -23,9 +23,9 @@ import functools
import os
import tensorflow.compat.v1 as tf
import tensorflow.compat.v2 as tf2
import tf_slim as slim
from object_detection import eval_util
from object_detection import exporter as exporter_lib
from object_detection import inputs
@@ -349,7 +349,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
from tensorflow.python.keras.engine import base_layer_utils # pylint: disable=g-import-not-at-top
# Enable v2 behavior, as `mixed_bfloat16` is only supported in TF 2.0.
base_layer_utils.enable_v2_dtype_behavior()
-tf.compat.v2.keras.mixed_precision.experimental.set_policy(
+tf2.keras.mixed_precision.experimental.set_policy(
'mixed_bfloat16')
detection_model = detection_model_fn(
is_training=is_training, add_summaries=(not use_tpu))
# Context R-CNN configuration for Snapshot Serengeti Dataset, with sequence
# example input data with context_features.
# This model uses attention into contextual features within the Faster R-CNN
# object detection framework to improve object detection performance.
# See https://arxiv.org/abs/1912.03538 for more information.
# Search for "PATH_TO_BE_CONFIGURED" to find the fields that should be
# configured.
# This config is TPU compatible.
model {
  faster_rcnn {
    num_classes: 48
    image_resizer {
      fixed_shape_resizer {
        height: 640
        width: 640
      }
    }
    feature_extractor {
      type: "faster_rcnn_resnet101"
      first_stage_features_stride: 16
      batch_norm_trainable: true
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        height_stride: 16
        width_stride: 16
        scales: 0.25
        scales: 0.5
        scales: 1.0
        scales: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 1.0
        aspect_ratios: 2.0
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.00999999977648
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.699999988079
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        use_dropout: false
        dropout_keep_probability: 1.0
        share_box_across_classes: true
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.600000023842
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    use_matmul_crop_and_resize: true
    clip_anchors_to_image: true
    use_matmul_gather_in_matcher: true
    use_static_balanced_label_sampler: true
    use_static_shapes: true
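    # Context R-CNN additions: the detector attends over a memory bank of up
    # to max_num_context_features embeddings per context group;
    # context_feature_length should match the dimensionality of the features
    # written by generate_embedding_data.py.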
    context_config {
      max_num_context_features: 2000
      context_feature_length: 2057
    }
  }
}
train_config {
  batch_size: 64
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        manual_step_learning_rate {
          initial_learning_rate: 0.0
          schedule {
            step: 2000
            learning_rate: 0.00200000009499
          }
          schedule {
            step: 200000
            learning_rate: 0.000199999994948
          }
          schedule {
            step: 300000
            learning_rate: 1.99999994948e-05
          }
          warmup: true
        }
      }
      momentum_optimizer_value: 0.899999976158
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/faster_rcnn_resnet101_coco_2018_08_14/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 500000
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  use_bfloat16: true
}
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/ss_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/snapshot_serengeti_train-?????-of-?????"
  }
  load_context_features: true
  input_type: TF_SEQUENCE_EXAMPLE
}
eval_config {
  max_evals: 50
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 4
}
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/ss_label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/snapshot_serengeti_val-?????-of-?????"
  }
  load_context_features: true
  input_type: TF_SEQUENCE_EXAMPLE
}
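One way to sanity-check a filled-in copy of this config before launching training is to parse it with the API's config utilities; a short sketch, with 'pipeline.config' as a placeholder path:
```
# Parse the config and spot-check the Context R-CNN fields.
from object_detection.utils import config_util

configs = config_util.get_configs_from_pipeline_file('pipeline.config')
assert configs['model'].faster_rcnn.context_config.context_feature_length == 2057
assert configs['train_input_config'].load_context_features
```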