Commit 63054210 authored by Zhichao Lu's avatar Zhichao Lu Committed by pkulzc

Merged commit includes the following changes:

195269567  by Zhichao Lu:

    Removing image summaries during train mode.

--
195147413  by Zhichao Lu:

    SSDLite config for mobilenet v2.

--
194883585  by Zhichao Lu:

    Simplify TPU compatible nearest neighbor upsampling using reshape and broadcasting.
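
A minimal standalone sketch of the reshape-and-broadcast trick this entry describes (the real change is in nearest_neighbor_upsampling, shown in the diff further down; this sketch assumes static shapes for brevity):

```python
import tensorflow as tf

def nearest_neighbor_upsample(x, scale):
  """Upsample [batch, h, w, c] to [batch, h*scale, w*scale, c] using only
  reshape and broadcasting, which lowers cleanly to TPU (no tf.tile)."""
  batch, height, width, channels = x.shape.as_list()  # assumes static shapes
  # Insert singleton axes after height and width, then broadcast against an
  # all-ones tensor so every pixel is copied scale x scale times.
  x = tf.reshape(x, [batch, height, 1, width, 1, channels])
  x = x * tf.ones([1, 1, scale, 1, scale, 1], dtype=x.dtype)
  return tf.reshape(x, [batch, height * scale, width * scale, channels])
```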

--
194851009  by Zhichao Lu:

    Include ava v2.1 detection models in model zoo.

--
194292198  by Zhichao Lu:

    Add option to evaluate any checkpoint (without requiring write access to that directory and overwriting any existing logs there).

--
194122420  by Zhichao Lu:

    Fix incorrect num_gt_boxes_per_image and num_det_boxes_per_image values:
    they should not be taken from the expanded dimension.

--
193974479  by Zhichao Lu:

    Fixing a bug in the coco evaluator.

--
193959861  by Zhichao Lu:

    Read the default batch size from config file.

--
193737238  by Zhichao Lu:

    Fix data augmentation functions.

--
193576336  by Zhichao Lu:

    Add support for training keypoints.

--
193409179  by Zhichao Lu:

    Update protobuf requirements to 3+ in installation docs.

--
193382651  by Zhichao Lu:

    Updating coco evaluation metrics to allow for a batch of image info, rather than a single image.

--
193244778  by Zhichao Lu:

    Remove deprecated batch_norm_trainable field from ssd mobilenet v2 config

--
193228972  by Zhichao Lu:

    Make sure the final layers are also resized proportionally to conv_depth_ratio.

--
193204364  by Zhichao Lu:

    Do not add batch norm parameters to final conv2d ops that predict boxes encodings and class scores in weight shared conv box predictor.

    This allows us to set proper bias and force initial predictions to be background when using focal loss.
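
The bias trick referred to here is the standard focal-loss initialization: with no batch norm on the final conv, its bias can be set so that every anchor starts out scoring as background. A hedged sketch using slim, where features, num_anchors, num_classes are hypothetical placeholders and prior_prob is an assumed hyperparameter:

```python
import math
import tensorflow as tf
slim = tf.contrib.slim

prior_prob = 0.01  # assumed prior probability of foreground at initialization
# With sigmoid scores, a bias of -log((1 - p) / p) makes the initial
# foreground probability of every anchor roughly prior_prob.
bias_init = -math.log((1.0 - prior_prob) / prior_prob)

class_predictions = slim.conv2d(
    features,                       # hypothetical feature map tensor
    num_anchors * num_classes,      # one score per anchor per class
    [3, 3],
    activation_fn=None,
    normalizer_fn=None,             # no batch norm on the final prediction op
    biases_initializer=tf.constant_initializer(bias_init))
```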

--
193137342  by Zhichao Lu:

    Add a util function to visualize value histogram as a tf.summary.image.
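
One way to get this effect, sketched here with matplotlib rendered through tf.py_func; this only illustrates the idea and is not the util function the change adds:

```python
import io
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

def histogram_image_summary(name, values, bins=30):
  """Render a histogram of a 1-D tensor as a PNG and log it as an image."""
  def _render(vals):
    fig, ax = plt.subplots()
    ax.hist(vals, bins=bins)
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    plt.close(fig)
    buf.seek(0)
    image = plt.imread(buf)                      # [h, w, 4] float32 in [0, 1]
    return np.expand_dims(image, 0).astype(np.float32)

  image = tf.py_func(_render, [values], tf.float32)
  return tf.summary.image(name, image)
```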

--
193119411  by Zhichao Lu:

    Adding support for reading in logits as groundtruth labels and applying an optional temperature (scaling) before softmax in support of distillation.
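
Temperature scaling here is the usual distillation formulation: divide the teacher logits by a temperature T before the softmax so the target distribution is softer than a one-hot label. A minimal sketch with illustrative values:

```python
import tensorflow as tf

temperature = 2.0  # assumed value; T > 1 softens the target distribution
# Teacher logits read in as groundtruth labels (one row per box, illustrative).
groundtruth_logits = tf.constant([[6.0, 2.0, -1.0]])
soft_targets = tf.nn.softmax(groundtruth_logits / temperature)  # softer than one-hot
```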

--
193087707  by Zhichao Lu:

    Post-process now works again in train mode.

--
193067658  by Zhichao Lu:

    Fix flakiness in testSSDRandomCropWithMultiClassScores due to randomness.

--
192922089  by Zhichao Lu:

    Add option to set dropout for classification net in weight shared box predictor.

--
192850747  by Zhichao Lu:

    Remove inaccurate caveat from proto file.

--
192837477  by Zhichao Lu:

    Extend to accept different ratios of conv channels.

--
192813444  by Zhichao Lu:

    Adding option for one_box_for_all_classes to the box_predictor

--
192624207  by Zhichao Lu:

    Update the trainer to allow reading multiclass scores.

--
192583425  by Zhichao Lu:

    Add an implementation of the Visual Relations Detection evaluation metric
    (per-image evaluation).

--
192529600  by Zhichao Lu:

    Modify the ssd meta arch to allow the option of not adding an implicit background class.

--
192512429  by Zhichao Lu:

    Refactor model_tpu_main.py files and move continuous eval loop into model_lib.py

--
192494267  by Zhichao Lu:

    Update create_pascal_tf_record.py and create_pet_tf_record.py

--
192485456  by Zhichao Lu:

    Enforcing that all eval metric ops have valid python strings.

--
192472546  by Zhichao Lu:

    Set regularize_depthwise to true in mobilenet_v1_argscope.

--
192421843  by Zhichao Lu:

    Refactoring of Mask-RCNN to put all mask prediction code in third stage.

--
192320460  by Zhichao Lu:

    Returning eval_on_train_input_fn from create_estimator_and_inputs(), rather than using train_input_fn in EVAL mode (which will still have data augmentation).

--
192226678  by Zhichao Lu:

    Access TPUEstimator and CrossShardOptimizer from the tf namespace.

--
192195514  by Zhichao Lu:

    Fix test that was flaky due to randomness

--
192166224  by Zhichao Lu:

    Minor fixes to match git repo.

--
192147130  by Zhichao Lu:

    Use shape utils for assertions in the feature extractor.

--
192132440  by Zhichao Lu:

    Class agnostic masks for mask_rcnn

--
192006190  by Zhichao Lu:

    Add learning rate summary in EVAL mode in model.py

--
192004845  by Zhichao Lu:

    Migrating away from Experiment class, as it is now deprecated. Also, refactoring into a separate model library and binaries.

--
191957195  by Zhichao Lu:

    Add classification_loss and localization_loss metrics for TPU jobs.

--
191932855  by Zhichao Lu:

    Add an option to skip the last striding in mobilenet. The modified network has nominal output stride 16 instead of 32.

--
191787921  by Zhichao Lu:

    Add option to override base feature extractor hyperparams in SSD models. This would allow us to use the same set of hyperparams for the complete feature extractor (base + new layers) if desired.

--
191743097  by Zhichao Lu:

    Adding an attribute to SSD model to indicate which fields in prediction dictionary have a batch dimension. This will be useful for future video models.

--
191668425  by Zhichao Lu:

    Internal change.

--
191649512  by Zhichao Lu:

    Introduce two parameters in ssd.proto - freeze_batchnorm and inplace_batchnorm_update - and set up slim arg_scopes in ssd_meta_arch.py so that they apply to all batchnorm ops in the predict() method.

    This centralizes the control of freezing and doing inplace batchnorm updates.
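
A hedged sketch of the centralizing arg_scope pattern described above (argument names follow slim.batch_norm; this is not the literal code in ssd_meta_arch.py):

```python
import tensorflow as tf
slim = tf.contrib.slim

def batchnorm_scope(is_training, freeze_batchnorm, inplace_batchnorm_update):
  # updates_collections=None folds the moving-average updates into the op
  # itself (in-place updates); otherwise they go to the UPDATE_OPS collection.
  updates_collections = (None if inplace_batchnorm_update
                         else tf.GraphKeys.UPDATE_OPS)
  # Freezing batchnorm means running it in inference mode even while training.
  return slim.arg_scope(
      [slim.batch_norm],
      is_training=(is_training and not freeze_batchnorm),
      updates_collections=updates_collections)
```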

--
191620303  by Zhichao Lu:

    Modifications to the preprocessor to support multiclass scores

--

PiperOrigin-RevId: 195269567
parent 5f9f6b84
......@@ -90,6 +90,15 @@ reporting an issue.
## Release information
### April 30, 2018
We have released a Faster R-CNN detector with ResNet-101 feature extractor trained on [AVA](https://research.google.com/ava/) v2.1.
Compared with other commonly used object detectors, it changes the action classification loss function to per-class Sigmoid loss to handle boxes with multiple labels.
The model is trained on the training split of AVA v2.1 for 1.5M iterations and achieves a mean AP of 11.25% over 60 classes on the validation split of AVA v2.1.
For more details please refer to this [paper](https://arxiv.org/abs/1705.08421).
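
As a brief illustration of why a per-class sigmoid loss handles boxes with multiple labels (illustrative values, not the actual training code):

```python
import tensorflow as tf

# A single AVA box can carry several action labels at once, so labels are
# multi-hot rather than one-hot.
labels = tf.constant([[1.0, 0.0, 1.0, 0.0]])    # e.g. "stand" and "talk to"
logits = tf.constant([[2.3, -1.0, 0.7, -3.1]])  # per-class scores for the box
# Independent sigmoid cross-entropy per class; a softmax would force the
# action classes to compete for a single label.
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
```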
<b>Thanks to contributors</b>: Chen Sun, David Ross
### April 2, 2018
Supercharge your mobile phones with the next generation mobile object detector!
......
item {
name: "bend/bow (at the waist)"
id: 1
}
item {
name: "crouch/kneel"
id: 3
}
item {
name: "dance"
id: 4
}
item {
name: "fall down"
id: 5
}
item {
name: "get up"
id: 6
}
item {
name: "jump/leap"
id: 7
}
item {
name: "lie/sleep"
id: 8
}
item {
name: "martial art"
id: 9
}
item {
name: "run/jog"
id: 10
}
item {
name: "sit"
id: 11
}
item {
name: "stand"
id: 12
}
item {
name: "swim"
id: 13
}
item {
name: "walk"
id: 14
}
item {
name: "answer phone"
id: 15
}
item {
name: "carry/hold (an object)"
id: 17
}
item {
name: "climb (e.g., a mountain)"
id: 20
}
item {
name: "close (e.g., a door, a box)"
id: 22
}
item {
name: "cut"
id: 24
}
item {
name: "dress/put on clothing"
id: 26
}
item {
name: "drink"
id: 27
}
item {
name: "drive (e.g., a car, a truck)"
id: 28
}
item {
name: "eat"
id: 29
}
item {
name: "enter"
id: 30
}
item {
name: "hit (an object)"
id: 34
}
item {
name: "lift/pick up"
id: 36
}
item {
name: "listen (e.g., to music)"
id: 37
}
item {
name: "open (e.g., a window, a car door)"
id: 38
}
item {
name: "play musical instrument"
id: 41
}
item {
name: "point to (an object)"
id: 43
}
item {
name: "pull (an object)"
id: 45
}
item {
name: "push (an object)"
id: 46
}
item {
name: "put down"
id: 47
}
item {
name: "read"
id: 48
}
item {
name: "ride (e.g., a bike, a car, a horse)"
id: 49
}
item {
name: "sail boat"
id: 51
}
item {
name: "shoot"
id: 52
}
item {
name: "smoke"
id: 54
}
item {
name: "take a photo"
id: 56
}
item {
name: "text on/look at a cellphone"
id: 57
}
item {
name: "throw"
id: 58
}
item {
name: "touch (an object)"
id: 59
}
item {
name: "turn (e.g., a screwdriver)"
id: 60
}
item {
name: "watch (e.g., TV)"
id: 61
}
item {
name: "work on a computer"
id: 62
}
item {
name: "write"
id: 63
}
item {
name: "fight/hit (a person)"
id: 64
}
item {
name: "give/serve (an object) to (a person)"
id: 65
}
item {
name: "grab (a person)"
id: 66
}
item {
name: "hand clap"
id: 67
}
item {
name: "hand shake"
id: 68
}
item {
name: "hand wave"
id: 69
}
item {
name: "hug (a person)"
id: 70
}
item {
name: "kiss (a person)"
id: 72
}
item {
name: "lift (a person)"
id: 73
}
item {
name: "listen to (a person)"
id: 74
}
item {
name: "push (another person)"
id: 76
}
item {
name: "sing to (e.g., self, a person, a group)"
id: 77
}
item {
name: "take (an object) from (a person)"
id: 78
}
item {
name: "talk to (e.g., self, a person, a group)"
id: 79
}
item {
name: "watch (a person)"
id: 80
}
......@@ -91,7 +91,7 @@ Some remarks on frozen inference graphs:
## Kitti-trained models {#kitti-models}
Model name | Speed (ms) | Pascal mAP@0.5 (ms) | Outputs
Model name | Speed (ms) | Pascal mAP@0.5 | Outputs
----------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---: | :-------------: | :-----:
[faster_rcnn_resnet101_kitti](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_kitti_2018_01_28.tar.gz) | 79 | 87 | Boxes
......@@ -103,6 +103,13 @@ Model name
[faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2018_01_28.tar.gz) | 347 | | Boxes
## AVA v2.1 trained models {#ava-models}
Model name | Speed (ms) | Pascal mAP@0.5 | Outputs
----------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---: | :-------------: | :-----:
[faster_rcnn_resnet101_ava_v2.1](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_ava_v2.1_2018_04_30.tar.gz) | 93 | 11 | Boxes
[^1]: See [MSCOCO evaluation protocol](http://cocodataset.org/#detections-eval).
[^2]: This is PASCAL mAP with a slightly different way of true positives computation: see [Open Images evaluation protocol](evaluation_protocols.md#open-images).
......@@ -325,16 +325,16 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
}
eval_metric_ops = None
if mode in (tf.estimator.ModeKeys.TRAIN, tf.estimator.ModeKeys.EVAL):
if mode == tf.estimator.ModeKeys.EVAL:
class_agnostic = (fields.DetectionResultFields.detection_classes
not in detections)
groundtruth = _get_groundtruth_data(detection_model, class_agnostic)
use_original_images = fields.InputDataFields.original_image in features
original_images = (
eval_images = (
features[fields.InputDataFields.original_image] if use_original_images
else features[fields.InputDataFields.image])
eval_dict = eval_util.result_dict_for_single_example(
original_images[0:1],
eval_images[0:1],
features[inputs.HASH_KEY][0],
detections,
groundtruth,
......@@ -355,22 +355,21 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
img_summary = tf.summary.image('Detections_Left_Groundtruth_Right',
detection_and_groundtruth)
if mode == tf.estimator.ModeKeys.EVAL:
# Eval metrics on a single example.
eval_metrics = eval_config.metrics_set
if not eval_metrics:
eval_metrics = ['coco_detection_metrics']
eval_metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
eval_metrics, category_index.values(), eval_dict,
include_metrics_per_category=False)
for loss_key, loss_tensor in iter(losses_dict.items()):
eval_metric_ops[loss_key] = tf.metrics.mean(loss_tensor)
for var in optimizer_summary_vars:
eval_metric_ops[var.op.name] = (var, tf.no_op())
if img_summary is not None:
eval_metric_ops['Detections_Left_Groundtruth_Right'] = (
img_summary, tf.no_op())
eval_metric_ops = {str(k): v for k, v in eval_metric_ops.iteritems()}
# Eval metrics on a single example.
eval_metrics = eval_config.metrics_set
if not eval_metrics:
eval_metrics = ['coco_detection_metrics']
eval_metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
eval_metrics, category_index.values(), eval_dict,
include_metrics_per_category=False)
for loss_key, loss_tensor in iter(losses_dict.items()):
eval_metric_ops[loss_key] = tf.metrics.mean(loss_tensor)
for var in optimizer_summary_vars:
eval_metric_ops[var.op.name] = (var, tf.no_op())
if img_summary is not None:
eval_metric_ops['Detections_Left_Groundtruth_Right'] = (
img_summary, tf.no_op())
eval_metric_ops = {str(k): v for k, v in eval_metric_ops.iteritems()}
if use_tpu:
return tf.contrib.tpu.TPUEstimatorSpec(
......
# Faster R-CNN with Resnet-101 (v1), configuration for AVA v2.1.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
faster_rcnn {
num_classes: 80
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
feature_extractor {
type: 'faster_rcnn_resnet101'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SIGMOID
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
second_stage_classification_loss {
weighted_sigmoid {
anchorwise_output: true
}
}
}
}
train_config: {
batch_size: 1
num_steps: 1500000
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 1200000
learning_rate: .00003
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
merge_multiple_label_boxes: true
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
data_augmentation_options {
random_horizontal_flip {
}
}
max_number_of_boxes: 100
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/ava_train.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/ava_label_map_v2.1.pbtxt"
}
eval_config: {
metrics_set: "pascal_voc_detection_metrics"
use_moving_averages: false
num_examples: 57371
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/ava_val.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/ava_label_map_v2.1.pbtxt"
shuffle: false
num_readers: 1
}
......@@ -54,6 +54,7 @@ model {
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 3
use_depthwise: true
box_code_size: 4
apply_sigmoid_to_scores: false
conv_hyperparams {
......
......@@ -774,8 +774,8 @@ def nearest_neighbor_upsampling(input_tensor, scale):
Nearest neighbor upsampling function that maps input tensor with shape
[batch_size, height, width, channels] to [batch_size, height * scale
, width * scale, channels]. This implementation only uses reshape and tile to
make it compatible with certain hardware.
, width * scale, channels]. This implementation only uses reshape and
broadcasting to make it TPU compatible.
Args:
input_tensor: A float32 tensor of size [batch, height_in, width_in,
......@@ -785,13 +785,14 @@ def nearest_neighbor_upsampling(input_tensor, scale):
data_up: A float32 tensor of size
[batch, height_in*scale, width_in*scale, channels].
"""
shape = shape_utils.combined_static_and_dynamic_shape(input_tensor)
shape_before_tile = [shape[0], shape[1], 1, shape[2], 1, shape[3]]
shape_after_tile = [shape[0], shape[1] * scale, shape[2] * scale, shape[3]]
data_reshaped = tf.reshape(input_tensor, shape_before_tile)
resized_tensor = tf.tile(data_reshaped, [1, 1, scale, 1, scale, 1])
resized_tensor = tf.reshape(resized_tensor, shape_after_tile)
return resized_tensor
with tf.name_scope('nearest_neighbor_upsampling'):
(batch_size, height, width,
channels) = shape_utils.combined_static_and_dynamic_shape(input_tensor)
output_tensor = tf.reshape(
input_tensor, [batch_size, height, 1, width, 1, channels]) * tf.ones(
[1, 1, scale, 1, scale, 1], dtype=input_tensor.dtype)
return tf.reshape(output_tensor,
[batch_size, height * scale, width * scale, channels])
def matmul_gather_on_zeroth_axis(params, indices, scope=None):
......