Unverified commit fe748d4a authored by pkulzc and committed by GitHub

Object detection changes: (#7208)

257914648  by lzc:

    Internal changes

--
257525973  by Zhichao Lu:

    Fixes a bug that silently prevents checkpoints from loading when training with eager + functions. Also sets up scripts to run training.

--
257296614  by Zhichao Lu:

    Adding detection_features to model outputs

--
257234565  by Zhichao Lu:

    Fix wrong order of `classes_with_max_scores` in class-agnostic NMS caused by
    sorting in partitioned-NMS.

--
257232002  by ronnyvotel:

    Supporting `filter_nonoverlapping` option in np_box_list_ops.clip_to_window().

--
257198282  by Zhichao Lu:

    Adding the focal loss and l1 loss from the Objects as Points paper.
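    A hedged sketch of the two losses as described in the Objects as Points (CenterNet) paper; this follows the paper's formulation rather than the exact code added here, and the `pred`/`gt` heatmap names are illustrative:

```
import tensorflow as tf

def penalty_reduced_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
  """Focal loss on keypoint heatmaps (Objects as Points, Eq. 1)."""
  pred = tf.clip_by_value(pred, eps, 1.0 - eps)
  pos_mask = tf.cast(tf.equal(gt, 1.0), tf.float32)  # exact object centers
  neg_mask = 1.0 - pos_mask
  pos_loss = -tf.pow(1.0 - pred, alpha) * tf.log(pred) * pos_mask
  neg_loss = (-tf.pow(1.0 - gt, beta) * tf.pow(pred, alpha) *
              tf.log(1.0 - pred) * neg_mask)
  num_pos = tf.maximum(tf.reduce_sum(pos_mask), 1.0)
  return (tf.reduce_sum(pos_loss) + tf.reduce_sum(neg_loss)) / num_pos

def l1_localization_loss(pred, target, weights):
  """Plain L1 loss over offset/size predictions at object locations."""
  return tf.reduce_sum(tf.abs(pred - target) * weights) / tf.maximum(
      tf.reduce_sum(weights), 1.0)
```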

--
257089535  by Zhichao Lu:

    Create Keras based ssd + resnetv1 + fpn.

--
257087407  by Zhichao Lu:

    Make object_detection/data_decoders Python3-compatible.

--
257004582  by Zhichao Lu:

    Updates _decode_raw_data_into_masks_and_boxes to the latest binary masks-to-string encoding format.

--
257002124  by Zhichao Lu:

    Make object_detection/utils Python3-compatible, except json_utils.

    The patching trick used in json_utils is not going to work in Python 3.

--
256795056  by lzc:

    Add a detection_anchor_indices field to detection outputs.

--
256477542  by Zhichao Lu:

    Make object_detection/core Python3-compatible.

--
256387593  by Zhichao Lu:

    Edit class_id_function_approximations builder to skip class ids not present in label map.

--
256259039  by Zhichao Lu:

    Move NMS to TPU for FasterRCNN.

--
256071360  by rathodv:

    When multiclass_scores is empty, add one-hot encoding of groundtruth_classes as multiclass scores so that data_augmentation ops that expect the presence of multiclass_scores don't have to individually handle this case.

    Also copy input tensor_dict to out_tensor_dict first to avoid inplace modification.

--
256023645  by Zhichao Lu:

    Adds the first WIP iterations of TensorFlow v2 eager + functions style custom training & evaluation loops.

--
255980623  by Zhichao Lu:

    Adds a new data augmentation operation "remap_labels" which remaps a set of labels to a new label.
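    A minimal, hypothetical sketch of the remapping semantics (names are illustrative; the actual op lives in preprocessor.py):

```
import tensorflow as tf

def remap_labels(labels, original_labels, new_label):
  """Maps every label in `original_labels` to `new_label`; others unchanged."""
  match = tf.reduce_any(
      tf.equal(tf.expand_dims(labels, -1),
               tf.constant(original_labels, dtype=labels.dtype)),
      axis=-1)
  return tf.where(match,
                  tf.fill(tf.shape(labels), tf.constant(new_label, dtype=labels.dtype)),
                  labels)
```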

--
255753259  by Zhichao Lu:

    Announcement of the released evaluation tutorial for Open Images Challenge
    2019.

--
255698776  by lzc:

    Fix the rewrite_nn_resize_op function, which was broken by the TF forward compatibility change.

--
255623150  by Zhichao Lu:

    Add Keras-based ResnetV1 models.

--
255504992  by Zhichao Lu:

    Fixing the typo in specifying label expansion for ground truth segmentation
    file.

--
255470768  by Zhichao Lu:

    1. Fixing Python bug with parsed arguments.
    2. Adding capability to parse relevant columns from CSV header.
    3. Fixing bug with duplicated labels expansion.

--
255462432  by Zhichao Lu:

    Adds a new data augmentation operation "drop_label_probabilistically" which drops a given label with the given probability. This supports experiments on training in the presence of label noise.
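    A hypothetical sketch of the dropping semantics (illustrative only; whether the actual op samples per box or per image may differ):

```
import tensorflow as tf

def drop_label_probabilistically(boxes, labels, dropped_label,
                                 drop_probability, seed=None):
  """Drops boxes carrying `dropped_label`, each with probability `drop_probability`."""
  random_draw = tf.random_uniform(tf.shape(labels), seed=seed)
  dropped = tf.logical_and(tf.equal(labels, dropped_label),
                           random_draw < drop_probability)
  keep = tf.logical_not(dropped)
  return tf.boolean_mask(boxes, keep), tf.boolean_mask(labels, keep)
```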

--
255441632  by rathodv:

    Fallback on groundtruth classes when multiclass_scores tensor is empty.

--
255434899  by Zhichao Lu:

    Ensuring the evaluation binary can run even with big files by synchronizing the
    processing of ground truth and predictions: this way, ground truth is not stored
    but is immediately used for evaluation. In the case of object-mask ground truth,
    this allows evaluations to run on relatively large sets.

--
255337855  by lzc:

    Internal change.

--
255308908  by Zhichao Lu:

    Add comment to clarify usage of calibration parameters proto.

--
255266371  by Zhichao Lu:

    Ensuring correct processing of the case when no groundtruth masks are provided
    for an image.

--
255236648  by Zhichao Lu:

    Refactor model_builder in faster_rcnn.py to use a util_map, so that it can be overridden.

--
255093285  by Zhichao Lu:

    Updating capability to subsample data during evaluation

--
255081222  by rathodv:

    Convert groundtruth masks to type float32 before they are used in the loss function.

    When using mixed precision training, masks are represented using bfloat16 tensors in the input pipeline for performance reasons. We need to convert them to float32 before using them in the loss function.

--
254788436  by Zhichao Lu:

    Add forward_compatible to non_max_suppression_with_scores to make it
    compatible with older TensorFlow versions.

--
254442362  by Zhichao Lu:

    Add num_layer field to ssd feature extractor proto.

--
253911582  by jonathanhuang:

    Plumbs Soft-NMS options (using the new tf.image.non_max_suppression_with_scores op) into the TF Object Detection API.  It adds a `soft_nms_sigma` field to the postprocessing proto file and plumbs this through to both the multiclass and class_agnostic versions of NMS. Note that there is no effect on behavior of NMS when soft_nms_sigma=0 (which it is set to by default).

    See also "Soft-NMS -- Improving Object Detection With One Line of Code" by Bodla et al (https://arxiv.org/abs/1704.04503)

--
253703949  by Zhichao Lu:

    Internal test fixes.

--
253151266  by Zhichao Lu:

    Fix the op type check for FusedBatchNorm, given that we introduced
    FusedBatchNormV3 in a previous change.

--
252718956  by Zhichao Lu:

    Customize activation function to enable relu6 instead of relu for saliency
    prediction model seastarization

--
252158593  by Zhichao Lu:

    Make object_detection/core Python3-compatible.

--
252150717  by Zhichao Lu:

    Make object_detection/core Python3-compatible.

--
251967048  by Zhichao Lu:

    Make GraphRewriter proto extensible.

--
251950039  by Zhichao Lu:

    Remove experimental_export_device_assignment from TPUEstimator.export_savedmodel(), so as to remove rewrite_for_inference().

    As a replacement, the export_savedmodel() V2 API supports device_assignment, where the user calls tpu.rewrite in model_fn and passes in the device_assignment there.

--
251890697  by rathodv:

    Updated docstring to include new output nodes.

--
251662894  by Zhichao Lu:

    Add an AutoAugment augmentation option to the object detection API codebase. This
    is an available option in preprocessor.py.

    The intended usage is to apply AutoAugment along with random flipping and
    cropping for best results.

--
251532908  by Zhichao Lu:

    Add TrainingDataType enum to track whether class-specific or agnostic data was used to fit the calibration function.

    This is useful, since classes with few observations may require a calibration function fit on all classes.

--
251511339  by Zhichao Lu:

    Add multiclass isotonic regression to the calibration builder.
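    A hedged sketch of per-class isotonic calibration (using scikit-learn purely for illustration; the builder's own implementation and names may differ):

```
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_per_class_calibrators(scores, is_true_positive, class_ids, num_classes):
  """Fits one monotone score -> calibrated-score mapping per class."""
  calibrators = {}
  for class_id in range(num_classes):
    mask = class_ids == class_id
    if mask.sum() < 2:
      # Too few observations; a class-agnostic fit is the usual fallback.
      continue
    calibrators[class_id] = IsotonicRegression(out_of_bounds='clip').fit(
        scores[mask], is_true_positive[mask])
  return calibrators
```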

--
251317769  by pengchong:

    Internal Change.

--
250729989  by Zhichao Lu:

    Fixing a bug in the groundtruth statistics count in the case of mask and box annotations.

--
250729627  by Zhichao Lu:

    Label expansion for segmentation.

--
250724905  by Zhichao Lu:

    Fix use_depthwise in fpn and test it with fpnlite on ssd + mobilenet v2.

--
250670379  by Zhichao Lu:

    Internal change

250630364  by lzc:

    Fix detection_model_zoo footnotes

--
250560654  by Zhichao Lu:

    Fix static shape issue in matmul_crop_and_resize.

--
250534857  by Zhichao Lu:

    Edit class agnostic calibration function docstring to more accurately describe the function's outputs.

--
250533277  by Zhichao Lu:

    Edit the multiclass messages to use class ids instead of labels.

--

PiperOrigin-RevId: 257914648
parent 81123ebf
# Open Images Challenge Evaluation
The Object Detection API is currently supporting several evaluation metrics used in the [Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html).
In addition, several data processing tools are available. Detailed instructions on using the tools for each track are available below.
The Object Detection API is currently supporting several evaluation metrics used
in the
[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html)
and
[Open Images Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html).
In addition, several data processing tools are available. Detailed instructions
on using the tools for each track are available below.
**NOTE**: links to the external website in this tutorial may change after the Open Images Challenge 2018 is finished.
**NOTE:** all data links are updated to the Open Images Challenge 2019.
## Object Detection Track
The [Object Detection metric](https://storage.googleapis.com/openimages/web/object_detection_metric.html) protocol requires a pre-processing of the released data to ensure correct evaluation. The released data contains only leaf-most bounding box annotations and image-level labels.
The evaluation metric implementation is available in the class `OpenImagesDetectionChallengeEvaluator`.
1. Download class hierarchy of Open Images Challenge 2018 in JSON format from [here](https://storage.googleapis.com/openimages/challenge_2018/bbox_labels_500_hierarchy.json).
2. Download ground-truth [bounding boxes](https://storage.googleapis.com/openimages/challenge_2018/train/challenge-2018-train-annotations-bbox.csv) and [image-level labels](https://storage.googleapis.com/openimages/challenge_2018/train/challenge-2018-train-annotations-human-imagelabels.csv).
3. Filter the rows corresponding to the validation set images you want to use and store the results in the same CSV format.
4. Run the following command to create hierarchical expansion of the bounding boxes annotations:
The
[Object Detection metric](https://storage.googleapis.com/openimages/web/evaluation.html#object_detection_eval)
protocol requires a pre-processing of the released data to ensure correct
evaluation. The released data contains only leaf-most bounding box annotations
and image-level labels. The evaluation metric implementation is available in the
class `OpenImagesChallengeEvaluator`.
1. Download
[class hierarchy of Open Images Detection Challenge 2019](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-label500-hierarchy.json)
in JSON format.
2. Download
[ground-truth bounding boxes](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-validation-detection-bbox.csv)
and
[image-level labels](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-validation-detection-human-imagelabels.csv).
3. Run the following command to create hierarchical expansion of the bounding
boxes and image-level label annotations:
```
HIERARCHY_FILE=/path/to/bbox_labels_500_hierarchy.json
BOUNDING_BOXES=/path/to/challenge-2018-train-annotations-bbox
IMAGE_LABELS=/path/to/challenge-2018-train-annotations-human-imagelabels
HIERARCHY_FILE=/path/to/challenge-2019-label500-hierarchy.json
BOUNDING_BOXES=/path/to/challenge-2019-validation-detection-bbox
IMAGE_LABELS=/path/to/challenge-2019-validation-detection-human-imagelabels
python object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \
--json_hierarchy_file=${HIERARCHY_FILE} \
......@@ -33,13 +47,18 @@ python object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \
--annotation_type=2
```
After step 4 you will have produced the ground-truth files suitable for running 'OID Challenge Object Detection Metric 2018' evaluation.
1. If you are not using Tensorflow, you can run evaluation directly using your
algorithm's output and generated ground-truth files. {value=4}
After step 3 you produced the ground-truth files suitable for running 'OID
Challenge Object Detection Metric 2019' evaluation. To run the evaluation, use
the following command:
```
INPUT_PREDICTIONS=/path/to/detection_predictions.csv
OUTPUT_METRICS=/path/to/output/metrics/file
python models/research/object_detection/metrics/oid_od_challenge_evaluation.py \
python models/research/object_detection/metrics/oid_challenge_evaluation.py \
--input_annotations_boxes=${BOUNDING_BOXES}_expanded.csv \
--input_annotations_labels=${IMAGE_LABELS}_expanded.csv \
--input_class_labelmap=object_detection/data/oid_object_detection_challenge_500_label_map.pbtxt \
......@@ -47,66 +66,99 @@ python models/research/object_detection/metrics/oid_od_challenge_evaluation.py \
--output_metrics=${OUTPUT_METRICS} \
```
### Running evaluation on CSV files directly
For the Object Detection Track, the participants will be ranked on:
5. If you are not using Tensorflow, you can run evaluation directly using your algorithm's output and generated ground-truth files. {value=5}
- "OpenImagesDetectionChallenge_Precision/mAP@0.5IOU"
To use evaluation within Tensorflow training, use metric name
`oid_challenge_detection_metrics` in the evaluation config.
### Running evaluation using TF Object Detection API
## Instance Segmentation Track
5. Produce tf.Example files suitable for running inference: {value=5}
The
[Instance Segmentation metric](https://storage.googleapis.com/openimages/web/evaluation.html#instance_segmentation_eval)
can be directly evaluated using the ground-truth data and model predictions. The
evaluation metric implementation is available in the class
`OpenImagesChallengeEvaluator`.
```
RAW_IMAGES_DIR=/path/to/raw_images_location
OUTPUT_DIR=/path/to/output_tfrecords
python object_detection/dataset_tools/create_oid_tf_record.py \
--input_box_annotations_csv ${BOUNDING_BOXES}_expanded.csv \
--input_image_label_annotations_csv ${IMAGE_LABELS}_expanded.csv \
--input_images_directory ${RAW_IMAGES_DIR} \
--input_label_map object_detection/data/oid_object_detection_challenge_500_label_map.pbtxt \
--output_tf_record_path_prefix ${OUTPUT_DIR} \
--num_shards=100
```
1. Download
[class hierarchy of Open Images Instance Segmentation Challenge 2019](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-label300-segmentable-hierarchy.json)
in JSON format.
2. Download
[ground-truth bounding boxes](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-validation-segmentation-bbox.csv)
and
[image-level labels](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-validation-segmentation-labels.csv).
3. Download instance segmentation files for the validation set (see
[Open Images Challenge Downloads page](https://storage.googleapis.com/openimages/web/challenge2019_downloads.html)).
The download consists of a set of .zip archives containing binary .png
masks.
Those should be transformed into a single CSV file in the format:
ImageID,LabelName,ImageWidth,ImageHeight,XMin,YMin,XMax,YMax,GroupOf,Mask
where Mask is the MS COCO RLE encoding of the binary mask stored in the .png file.
6. Run inference of your model and fill corresponding fields in tf.Example: see [this tutorial](object_detection/g3doc/oid_inference_and_evaluation.md) on running the inference with Tensorflow Object Detection API models. {value=6}
NOTE: the util to make the transformation will be released soon.
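In the meantime, here is a minimal sketch of the expected encoding for a single-channel
binary .png mask (assuming pycocotools and PIL are available; it mirrors the
RLE -> zlib -> base64 order described above, and the exact utility may differ):

```
import base64
import zlib

import numpy as np
from PIL import Image
from pycocotools import mask as coco_mask

def encode_png_mask(png_path):
  """Turns a binary .png mask into the base64(zlib(COCO RLE)) CSV cell value."""
  binary_mask = np.squeeze(np.asarray(Image.open(png_path)) > 0).astype(np.uint8)
  binary_mask = np.asfortranarray(binary_mask.reshape(
      binary_mask.shape[0], binary_mask.shape[1], 1))
  rle = coco_mask.encode(binary_mask)[0]['counts']
  return base64.b64encode(zlib.compress(rle, zlib.Z_BEST_COMPRESSION))
```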
7. Finally, run the evaluation script to produce the final evaluation result.
1. Run the following command to create hierarchical expansion of the instance
segmentation, bounding boxes and image-level label annotations: {value=4}
```
INPUT_TFRECORDS_WITH_DETECTIONS=/path/to/tf_records_with_detections
OUTPUT_CONFIG_DIR=/path/to/configs
HIERARCHY_FILE=/path/to/challenge-2019-label300-hierarchy.json
BOUNDING_BOXES=/path/to/challenge-2019-validation-detection-bbox
IMAGE_LABELS=/path/to/challenge-2019-validation-detection-human-imagelabels
echo "
label_map_path: 'object_detection/data/oid_object_detection_challenge_500_label_map.pbtxt'
tf_record_input_reader: { input_path: '${INPUT_TFRECORDS_WITH_DETECTIONS}' }
" > ${OUTPUT_CONFIG_DIR}/input_config.pbtxt
python object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \
--json_hierarchy_file=${HIERARCHY_FILE} \
--input_annotations=${BOUNDING_BOXES}.csv \
--output_annotations=${BOUNDING_BOXES}_expanded.csv \
--annotation_type=1
python object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \
--json_hierarchy_file=${HIERARCHY_FILE} \
--input_annotations=${IMAGE_LABELS}.csv \
--output_annotations=${IMAGE_LABELS}_expanded.csv \
--annotation_type=2
echo "
metrics_set: 'oid_challenge_detection_metrics'
" > ${OUTPUT_CONFIG_DIR}/eval_config.pbtxt
python object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \
--json_hierarchy_file=${HIERARCHY_FILE} \
--input_annotations=${INSTANCE_SEGMENTATIONS}.csv \
--output_annotations=${INSTANCE_SEGMENTATIONS}_expanded.csv \
--annotation_type=1
```
OUTPUT_METRICS_DIR=/path/to/metrics_csv
1. If you are not using Tensorflow, you can run evaluation directly using your
algorithm's output and generated ground-truth files. {value=4}
python object_detection/metrics/offline_eval_map_corloc.py \
--eval_dir=${OUTPUT_METRICS_DIR} \
--eval_config_path=${OUTPUT_CONFIG_DIR}/eval_config.pbtxt \
--input_config_path=${OUTPUT_CONFIG_DIR}/input_config.pbtxt
```
INPUT_PREDICTIONS=/path/to/instance_segmentation_predictions.csv
OUTPUT_METRICS=/path/to/output/metrics/file
The result of the evaluation will be stored in `${OUTPUT_METRICS_DIR}/metrics.csv`
python models/research/object_detection/metrics/oid_challenge_evaluation.py \
--input_annotations_boxes=${BOUNDING_BOXES}_expanded.csv \
--input_annotations_labels=${IMAGE_LABELS}_expanded.csv \
--input_class_labelmap=object_detection/data/oid_object_detection_challenge_500_label_map.pbtxt \
--input_predictions=${INPUT_PREDICTIONS} \
--input_annotations_segm=${INSTANCE_SEGMENTATIONS}_expanded.csv \
--output_metrics=${OUTPUT_METRICS} \
```
For the Object Detection Track, the participants will be ranked on:
For the Instance Segmentation Track, the participants will be ranked on:
- "OpenImagesChallenge2018_Precision/mAP@0.5IOU"
- "OpenImagesInstanceSegmentationChallenge_Precision/mAP@0.5IOU"
## Visual Relationships Detection Track
The [Visual Relationships Detection metrics](https://storage.googleapis.com/openimages/web/vrd_detection_metric.html) can be directly evaluated using the ground-truth data and model predictions. The evaluation metric implementation is available in the class `VRDRelationDetectionEvaluator`,`VRDPhraseDetectionEvaluator`.
1. Download the ground-truth [visual relationships annotations](https://storage.googleapis.com/openimages/challenge_2018/train/challenge-2018-train-vrd.csv) and [image-level labels](https://storage.googleapis.com/openimages/challenge_2018/train/challenge-2018-train-vrd-labels.csv).
2. Filter the rows corresponding to the validation set images you want to use and store the results in the same CSV format.
3. Run the following command to produce final metrics:
The
[Visual Relationships Detection metrics](https://storage.googleapis.com/openimages/web/evaluation.html#visual_relationships_eval)
can be directly evaluated using the ground-truth data and model predictions. The
evaluation metric implementation is available in the class
`VRDRelationDetectionEvaluator`,`VRDPhraseDetectionEvaluator`.
1. Download the ground-truth
[visual relationships annotations](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-validation-vrd.csv)
and
[image-level labels](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-validation-vrd-labels.csv).
2. Run the following command to produce final metrics:
```
INPUT_ANNOTATIONS_BOXES=/path/to/challenge-2018-train-vrd.csv
......
......@@ -138,6 +138,8 @@ Model name
[^2]: This is PASCAL mAP with a slightly different way of computing true positives: see [Open Images evaluation protocols](evaluation_protocols.md), oid_V2_detection_metrics.
[^3]: Non-face boxes are dropped during training and non-face groundtruth boxes are ignored when evaluating.
[^4]: This is Open Images Challenge metric: see [Open Images evaluation protocols](evaluation_protocols.md), oid_challenge_detection_metrics.
......@@ -135,22 +135,29 @@ output bounding-boxes labelled in the same manner.
The old metric name is DEPRECATED.
`EvalConfig.metrics_set='open_images_V2_detection_metrics'`
## OID Challenge Object Detection Metric 2018
## OID Challenge Object Detection Metric
`EvalConfig.metrics_set='oid_challenge_detection_metrics'`
The metric for the OID Challenge Object Detection Metric 2018, Object Detection
track. The description is provided on the [Open Images Challenge
website](https://storage.googleapis.com/openimages/web/challenge.html).
The metric for the OID Challenge Object Detection Metric 2018/2019 Object
Detection track. The description is provided on the
[Open Images Challenge website](https://storage.googleapis.com/openimages/web/evaluation.html#object_detection_eval).
The old metric name is DEPRECATED.
`EvalConfig.metrics_set='oid_challenge_object_detection_metrics'`
## OID Challenge Visual Relationship Detection Metric 2018
## OID Challenge Visual Relationship Detection Metric
The metric for the OID Challenge Visual Relationship Detection Metric 2018, Visual
Relationship Detection track. The description is provided on the [Open Images
Challenge
website](https://storage.googleapis.com/openimages/web/challenge.html). Note:
this is currently a stand-alone metric, that can be used only through the
The metric for the OID Challenge Visual Relationship Detection Metric 2018/2019,
Visual Relationship Detection track. The description is provided on the
[Open Images Challenge website](https://storage.googleapis.com/openimages/web/evaluation.html#visual_relationships_eval).
Note: this is currently a stand-alone metric that can be used only through the
`metrics/oid_vrd_challenge_evaluation.py` util.
## OID Challenge Instance Segmentation Metric
`EvalConfig.metrics_set='oid_challenge_segmentation_metrics'`
The metric for the OID Challenge Instance Segmentation Metric 2019, Instance
Segmentation track. The description is provided on the
[Open Images Challenge website](https://storage.googleapis.com/openimages/web/evaluation.html#instance_segmentation_eval).
......@@ -47,6 +47,22 @@ INPUT_BUILDER_UTIL_MAP = {
}
def _multiclass_scores_or_one_hot_labels(multiclass_scores,
                                         groundtruth_boxes,
                                         groundtruth_classes, num_classes):
  """Returns one-hot encoding of classes when multiclass_scores is empty."""
  # Replace the groundtruth_classes tensor with the multiclass_scores tensor
  # when it is non-empty. If multiclass_scores is empty, fall back on the
  # groundtruth_classes tensor.
  def true_fn():
    return tf.reshape(multiclass_scores,
                      [tf.shape(groundtruth_boxes)[0], num_classes])
  def false_fn():
    return tf.one_hot(groundtruth_classes, num_classes)
  return tf.cond(tf.size(multiclass_scores) > 0, true_fn, false_fn)
def transform_input_data(tensor_dict,
model_preprocess_fn,
image_resizer_fn,
......@@ -89,102 +105,106 @@ def transform_input_data(tensor_dict,
and classes for a given image if the boxes are exactly the same.
retain_original_image: (optional) whether to retain original image in the
output dictionary.
use_multiclass_scores: whether to use multiclass scores as
class targets instead of one-hot encoding of `groundtruth_classes`.
use_multiclass_scores: whether to use multiclass scores as class targets
instead of one-hot encoding of `groundtruth_classes`. When
this is True and multiclass_scores is empty, one-hot encoding of
`groundtruth_classes` is used as a fallback.
use_bfloat16: (optional) a bool, whether to use bfloat16 in training.
Returns:
A dictionary keyed by fields.InputDataFields containing the tensors obtained
after applying all the transformations.
"""
# Reshape flattened multiclass scores tensor into a 2D tensor of shape
# [num_boxes, num_classes].
if fields.InputDataFields.multiclass_scores in tensor_dict:
tensor_dict[fields.InputDataFields.multiclass_scores] = tf.reshape(
tensor_dict[fields.InputDataFields.multiclass_scores], [
tf.shape(tensor_dict[fields.InputDataFields.groundtruth_boxes])[0],
num_classes
])
if fields.InputDataFields.groundtruth_boxes in tensor_dict:
tensor_dict = util_ops.filter_groundtruth_with_nan_box_coordinates(
tensor_dict)
tensor_dict = util_ops.filter_unrecognized_classes(tensor_dict)
out_tensor_dict = tensor_dict.copy()
if fields.InputDataFields.multiclass_scores in out_tensor_dict:
out_tensor_dict[
fields.InputDataFields
.multiclass_scores] = _multiclass_scores_or_one_hot_labels(
out_tensor_dict[fields.InputDataFields.multiclass_scores],
out_tensor_dict[fields.InputDataFields.groundtruth_boxes],
out_tensor_dict[fields.InputDataFields.groundtruth_classes],
num_classes)
if fields.InputDataFields.groundtruth_boxes in out_tensor_dict:
out_tensor_dict = util_ops.filter_groundtruth_with_nan_box_coordinates(
out_tensor_dict)
out_tensor_dict = util_ops.filter_unrecognized_classes(out_tensor_dict)
if retain_original_image:
tensor_dict[fields.InputDataFields.original_image] = tf.cast(
image_resizer_fn(tensor_dict[fields.InputDataFields.image], None)[0],
tf.uint8)
out_tensor_dict[fields.InputDataFields.original_image] = tf.cast(
image_resizer_fn(out_tensor_dict[fields.InputDataFields.image],
None)[0], tf.uint8)
if fields.InputDataFields.image_additional_channels in tensor_dict:
channels = tensor_dict[fields.InputDataFields.image_additional_channels]
tensor_dict[fields.InputDataFields.image] = tf.concat(
[tensor_dict[fields.InputDataFields.image], channels], axis=2)
if fields.InputDataFields.image_additional_channels in out_tensor_dict:
channels = out_tensor_dict[fields.InputDataFields.image_additional_channels]
out_tensor_dict[fields.InputDataFields.image] = tf.concat(
[out_tensor_dict[fields.InputDataFields.image], channels], axis=2)
# Apply data augmentation ops.
if data_augmentation_fn is not None:
tensor_dict = data_augmentation_fn(tensor_dict)
out_tensor_dict = data_augmentation_fn(out_tensor_dict)
# Apply model preprocessing ops and resize instance masks.
image = tensor_dict[fields.InputDataFields.image]
image = out_tensor_dict[fields.InputDataFields.image]
preprocessed_resized_image, true_image_shape = model_preprocess_fn(
tf.expand_dims(tf.cast(image, dtype=tf.float32), axis=0))
if use_bfloat16:
preprocessed_resized_image = tf.cast(
preprocessed_resized_image, tf.bfloat16)
tensor_dict[fields.InputDataFields.image] = tf.squeeze(
out_tensor_dict[fields.InputDataFields.image] = tf.squeeze(
preprocessed_resized_image, axis=0)
tensor_dict[fields.InputDataFields.true_image_shape] = tf.squeeze(
out_tensor_dict[fields.InputDataFields.true_image_shape] = tf.squeeze(
true_image_shape, axis=0)
if fields.InputDataFields.groundtruth_instance_masks in tensor_dict:
masks = tensor_dict[fields.InputDataFields.groundtruth_instance_masks]
if fields.InputDataFields.groundtruth_instance_masks in out_tensor_dict:
masks = out_tensor_dict[fields.InputDataFields.groundtruth_instance_masks]
_, resized_masks, _ = image_resizer_fn(image, masks)
if use_bfloat16:
resized_masks = tf.cast(resized_masks, tf.bfloat16)
tensor_dict[fields.InputDataFields.
groundtruth_instance_masks] = resized_masks
out_tensor_dict[
fields.InputDataFields.groundtruth_instance_masks] = resized_masks
# Transform groundtruth classes to one hot encodings.
label_offset = 1
zero_indexed_groundtruth_classes = tensor_dict[
zero_indexed_groundtruth_classes = out_tensor_dict[
fields.InputDataFields.groundtruth_classes] - label_offset
tensor_dict[fields.InputDataFields.groundtruth_classes] = tf.one_hot(
zero_indexed_groundtruth_classes, num_classes)
if use_multiclass_scores:
tensor_dict[fields.InputDataFields.groundtruth_classes] = tensor_dict[
fields.InputDataFields.multiclass_scores]
tensor_dict.pop(fields.InputDataFields.multiclass_scores, None)
out_tensor_dict[
fields.InputDataFields.groundtruth_classes] = out_tensor_dict[
fields.InputDataFields.multiclass_scores]
else:
out_tensor_dict[fields.InputDataFields.groundtruth_classes] = tf.one_hot(
zero_indexed_groundtruth_classes, num_classes)
out_tensor_dict.pop(fields.InputDataFields.multiclass_scores, None)
if fields.InputDataFields.groundtruth_confidences in tensor_dict:
groundtruth_confidences = tensor_dict[
if fields.InputDataFields.groundtruth_confidences in out_tensor_dict:
groundtruth_confidences = out_tensor_dict[
fields.InputDataFields.groundtruth_confidences]
# Map the confidences to the one-hot encoding of classes
tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
out_tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
tf.reshape(groundtruth_confidences, [-1, 1]) *
tensor_dict[fields.InputDataFields.groundtruth_classes])
out_tensor_dict[fields.InputDataFields.groundtruth_classes])
else:
groundtruth_confidences = tf.ones_like(
zero_indexed_groundtruth_classes, dtype=tf.float32)
tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
tensor_dict[fields.InputDataFields.groundtruth_classes])
out_tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
out_tensor_dict[fields.InputDataFields.groundtruth_classes])
if merge_multiple_boxes:
merged_boxes, merged_classes, merged_confidences, _ = (
util_ops.merge_boxes_with_multiple_labels(
tensor_dict[fields.InputDataFields.groundtruth_boxes],
out_tensor_dict[fields.InputDataFields.groundtruth_boxes],
zero_indexed_groundtruth_classes,
groundtruth_confidences,
num_classes))
merged_classes = tf.cast(merged_classes, tf.float32)
tensor_dict[fields.InputDataFields.groundtruth_boxes] = merged_boxes
tensor_dict[fields.InputDataFields.groundtruth_classes] = merged_classes
tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
out_tensor_dict[fields.InputDataFields.groundtruth_boxes] = merged_boxes
out_tensor_dict[fields.InputDataFields.groundtruth_classes] = merged_classes
out_tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
merged_confidences)
if fields.InputDataFields.groundtruth_boxes in tensor_dict:
tensor_dict[fields.InputDataFields.num_groundtruth_boxes] = tf.shape(
tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]
if fields.InputDataFields.groundtruth_boxes in out_tensor_dict:
out_tensor_dict[fields.InputDataFields.num_groundtruth_boxes] = tf.shape(
out_tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]
return tensor_dict
return out_tensor_dict
def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
......
......@@ -611,6 +611,62 @@ class DataTransformationFnTest(test_case.TestCase):
self.assertAllClose(transformed_inputs[fields.InputDataFields.image],
np.concatenate((image, additional_channels), axis=2))
def test_use_multiclass_scores_when_present(self):
image = np.random.rand(4, 4, 3).astype(np.float32)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(image),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1, 1], [.5, .5, 1, 1]], np.float32)),
fields.InputDataFields.multiclass_scores:
tf.constant(np.array([0.2, 0.3, 0.5, 0.1, 0.6, 0.3], np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([1, 2], np.int32))
}
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=3, use_multiclass_scores=True)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(
np.array([[0.2, 0.3, 0.5], [0.1, 0.6, 0.3]], np.float32),
transformed_inputs[fields.InputDataFields.groundtruth_classes])
def test_use_multiclass_scores_when_not_present(self):
image = np.random.rand(4, 4, 3).astype(np.float32)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(image),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1, 1], [.5, .5, 1, 1]], np.float32)),
fields.InputDataFields.multiclass_scores:
tf.placeholder(tf.float32),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([1, 2], np.int32))
}
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=3, use_multiclass_scores=True)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict),
feed_dict={
tensor_dict[fields.InputDataFields.multiclass_scores]:
np.array([], dtype=np.float32)
})
self.assertAllClose(
np.array([[0, 1, 0], [0, 0, 1]], np.float32),
transformed_inputs[fields.InputDataFields.groundtruth_classes])
def test_returns_correct_class_label_encodings(self):
tensor_dict = {
fields.InputDataFields.image:
......
......@@ -108,6 +108,7 @@ from object_detection.core import standard_fields as fields
from object_detection.core import target_assigner
from object_detection.utils import ops
from object_detection.utils import shape_utils
from object_detection.utils import variables_helper
slim = tf.contrib.slim
......@@ -210,7 +211,7 @@ class FasterRCNNFeatureExtractor(object):
the model graph.
"""
variables_to_restore = {}
for variable in tf.global_variables():
for variable in variables_helper.get_global_variables_safely():
for scope_name in [first_stage_feature_extractor_scope,
second_stage_feature_extractor_scope]:
if variable.op.name.startswith(scope_name):
......@@ -275,7 +276,7 @@ class FasterRCNNKerasFeatureExtractor(object):
the model graph.
"""
variables_to_restore = {}
for variable in tf.global_variables():
for variable in variables_helper.get_global_variables_safely():
for scope_name in [first_stage_feature_extractor_scope,
second_stage_feature_extractor_scope]:
if variable.op.name.startswith(scope_name):
......@@ -1193,6 +1194,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
detection_masks = self._gather_instance_masks(
detection_masks, detection_classes)
detection_masks = tf.cast(detection_masks, tf.float32)
prediction_dict[fields.DetectionResultFields.detection_masks] = (
tf.reshape(tf.sigmoid(detection_masks),
[batch_size, max_detection, mask_height, mask_width]))
......@@ -1461,9 +1463,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
mask_predictions=mask_predictions)
if 'rpn_features_to_crop' in prediction_dict and self._initial_crop_size:
self._add_detection_features_output_node(
detections_dict[fields.DetectionResultFields.detection_boxes],
prediction_dict['rpn_features_to_crop'])
detections_dict[
'detection_features'] = self._add_detection_features_output_node(
detections_dict[fields.DetectionResultFields.detection_boxes],
prediction_dict['rpn_features_to_crop'])
return detections_dict
......@@ -1474,18 +1477,25 @@ class FasterRCNNMetaArch(model.DetectionModel):
def _add_detection_features_output_node(self, detection_boxes,
rpn_features_to_crop):
"""Add the detection features to the output node.
"""Add detection features to outputs.
The detection features are from cropping rpn_features with boxes.
Each bounding box has one feature vector of length depth, which comes from
mean_pooling of the cropped rpn_features.
This function extracts box features for each box in rpn_features_to_crop.
It returns the extracted box features, reshaped to
[batch size, max_detections, height, width, depth], and average pools
the extracted features across the spatial dimensions and adds a graph node
to the pooled features named 'pooled_detection_features'
Args:
detection_boxes: a 3-D float32 tensor of shape
[batch_size, max_detection, 4] which represents the bounding boxes.
[batch_size, max_detections, 4] which represents the bounding boxes.
rpn_features_to_crop: A 4-D float32 tensor with shape
[batch, height, width, depth] representing image features to crop using
the proposals boxes.
Returns:
detection_features: a 4-D float32 tensor of shape
[batch size, max_detections, height, width, depth] representing
cropped image features
"""
with tf.name_scope('SecondStageDetectionFeaturesExtract'):
flattened_detected_feature_maps = (
......@@ -1495,15 +1505,23 @@ class FasterRCNNMetaArch(model.DetectionModel):
flattened_detected_feature_maps)
batch_size = tf.shape(detection_boxes)[0]
max_detection = tf.shape(detection_boxes)[1]
max_detections = tf.shape(detection_boxes)[1]
detection_features_pool = tf.reduce_mean(
detection_features_unpooled, axis=[1, 2])
detection_features = tf.reshape(
reshaped_detection_features_pool = tf.reshape(
detection_features_pool,
[batch_size, max_detection, tf.shape(detection_features_pool)[-1]])
[batch_size, max_detections, tf.shape(detection_features_pool)[-1]])
reshaped_detection_features_pool = tf.identity(
reshaped_detection_features_pool, 'pooled_detection_features')
detection_features = tf.identity(
detection_features, 'detection_features')
reshaped_detection_features = tf.reshape(
detection_features_unpooled,
[batch_size, max_detections,
tf.shape(detection_features_unpooled)[1],
tf.shape(detection_features_unpooled)[2],
tf.shape(detection_features_unpooled)[3]])
return reshaped_detection_features
def _postprocess_rpn(self,
rpn_box_encodings_batch,
......@@ -1749,6 +1767,15 @@ class FasterRCNNMetaArch(model.DetectionModel):
resized_masks_list.append(resized_mask)
groundtruth_masks_list = resized_masks_list
# Masks could be set to bfloat16 in the input pipeline for performance
# reasons. Convert masks back to floating point space here since the rest of
# this module assumes groundtruth to be of float32 type.
float_groundtruth_masks_list = []
if groundtruth_masks_list:
for mask in groundtruth_masks_list:
float_groundtruth_masks_list.append(tf.cast(mask, tf.float32))
groundtruth_masks_list = float_groundtruth_masks_list
if self.groundtruth_has_field(fields.BoxListFields.weights):
groundtruth_weights_list = self.groundtruth_lists(
fields.BoxListFields.weights)
......@@ -2619,7 +2646,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
self.first_stage_feature_extractor_scope,
self.second_stage_feature_extractor_scope)
variables_to_restore = tf.global_variables()
variables_to_restore = variables_helper.get_global_variables_safely()
variables_to_restore.append(slim.get_or_create_global_step())
# Only load feature extractor variables to be consistent with loading from
# a classification checkpoint.
......
......@@ -383,6 +383,11 @@ class FasterRCNNMetaArchTest(
class_predictions_with_background_shapes = [(16, 3), (None, 3)]
proposal_boxes_shapes = [(2, 8, 4), (None, 8, 4)]
batch_size = 2
initial_crop_size = 3
maxpool_stride = 1
height = initial_crop_size/maxpool_stride
width = initial_crop_size/maxpool_stride
depth = 3
image_shape = np.array((2, 36, 48, 3), dtype=np.int32)
for (num_proposals_shape, refined_box_encoding_shape,
class_predictions_with_background_shape,
......@@ -433,6 +438,7 @@ class FasterRCNNMetaArchTest(
'detection_scores': tf.zeros([2, 5]),
'detection_classes': tf.zeros([2, 5]),
'num_detections': tf.zeros([2]),
'detection_features': tf.zeros([2, 5, width, height, depth])
}, true_image_shapes)
with self.test_session(graph=tf_graph) as sess:
detections_out = sess.run(
......@@ -453,6 +459,9 @@ class FasterRCNNMetaArchTest(
self.assertAllClose(detections_out['num_detections'].shape, [2])
self.assertTrue(np.amax(detections_out['detection_masks'] <= 1.0))
self.assertTrue(np.amin(detections_out['detection_masks'] >= 0.0))
self.assertAllEqual(detections_out['detection_features'].shape,
[2, 5, width, height, depth])
self.assertGreaterEqual(np.amax(detections_out['detection_features']), 0)
def _get_box_classifier_features_shape(self,
image_size,
......
......@@ -28,6 +28,7 @@ from object_detection.core import standard_fields as fields
from object_detection.core import target_assigner
from object_detection.utils import ops
from object_detection.utils import shape_utils
from object_detection.utils import variables_helper
from object_detection.utils import visualization_utils
slim = tf.contrib.slim
......@@ -45,6 +46,7 @@ class SSDFeatureExtractor(object):
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
num_layers=6,
override_base_feature_extractor_hyperparams=False):
"""Constructor.
......@@ -61,6 +63,7 @@ class SSDFeatureExtractor(object):
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False.
use_depthwise: Whether to use depthwise convolutions. Default is False.
num_layers: Number of SSD layers.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams_fn`.
......@@ -73,6 +76,7 @@ class SSDFeatureExtractor(object):
self._reuse_weights = reuse_weights
self._use_explicit_padding = use_explicit_padding
self._use_depthwise = use_depthwise
self._num_layers = num_layers
self._override_base_feature_extractor_hyperparams = (
override_base_feature_extractor_hyperparams)
......@@ -126,7 +130,7 @@ class SSDFeatureExtractor(object):
the model graph.
"""
variables_to_restore = {}
for variable in tf.global_variables():
for variable in variables_helper.get_global_variables_safely():
var_name = variable.op.name
if var_name.startswith(feature_extractor_scope + '/'):
var_name = var_name.replace(feature_extractor_scope + '/', '')
......@@ -148,6 +152,7 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
inplace_batchnorm_update,
use_explicit_padding=False,
use_depthwise=False,
num_layers=6,
override_base_feature_extractor_hyperparams=False,
name=None):
"""Constructor.
......@@ -172,6 +177,7 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False.
use_depthwise: Whether to use depthwise convolutions. Default is False.
num_layers: Number of SSD layers.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams_config`.
......@@ -189,6 +195,7 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
self._inplace_batchnorm_update = inplace_batchnorm_update
self._use_explicit_padding = use_explicit_padding
self._use_depthwise = use_depthwise
self._num_layers = num_layers
self._override_base_feature_extractor_hyperparams = (
override_base_feature_extractor_hyperparams)
......@@ -247,11 +254,13 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
the model graph.
"""
variables_to_restore = {}
for variable in tf.global_variables():
var_name = variable.op.name
for variable in self.variables:
# variable.name includes ":0" at the end, but the names in the checkpoint
# do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
if var_name.startswith(feature_extractor_scope + '/'):
var_name = var_name.replace(feature_extractor_scope + '/', '')
variables_to_restore[var_name] = variable
variables_to_restore[var_name] = variable
return variables_to_restore
......@@ -709,6 +718,14 @@ class SSDMetaArch(model.DetectionModel):
additional_fields = {
'multiclass_scores': detection_scores_with_background
}
if self._anchors is not None:
anchor_indices = tf.range(self._anchors.num_boxes_static())
batch_anchor_indices = tf.tile(
tf.expand_dims(anchor_indices, 0), [batch_size, 1])
# All additional fields need to be float.
additional_fields.update({
'anchor_indices': tf.cast(batch_anchor_indices, tf.float32),
})
if detection_keypoints is not None:
detection_keypoints = tf.identity(
detection_keypoints, 'raw_keypoint_locations')
......@@ -737,6 +754,12 @@ class SSDMetaArch(model.DetectionModel):
fields.DetectionResultFields.raw_detection_scores:
detection_scores_with_background
}
if (nmsed_additional_fields is not None and
'anchor_indices' in nmsed_additional_fields):
detection_dict.update({
fields.DetectionResultFields.detection_anchor_indices:
tf.cast(nmsed_additional_fields['anchor_indices'], tf.int32),
})
if (nmsed_additional_fields is not None and
fields.BoxListFields.keypoints in nmsed_additional_fields):
detection_dict[fields.DetectionResultFields.detection_keypoints] = (
......@@ -1218,13 +1241,24 @@ class SSDMetaArch(model.DetectionModel):
if fine_tune_checkpoint_type == 'detection':
variables_to_restore = {}
for variable in tf.global_variables():
var_name = variable.op.name
if load_all_detection_checkpoint_vars:
variables_to_restore[var_name] = variable
else:
if var_name.startswith(self._extract_features_scope):
if tf.executing_eagerly():
for variable in self.variables:
# variable.name includes ":0" at the end, but the names in the
# checkpoint do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
if load_all_detection_checkpoint_vars:
variables_to_restore[var_name] = variable
else:
if var_name.startswith(self._extract_features_scope):
variables_to_restore[var_name] = variable
else:
for variable in variables_helper.get_global_variables_safely():
var_name = variable.op.name
if load_all_detection_checkpoint_vars:
variables_to_restore[var_name] = variable
else:
if var_name.startswith(self._extract_features_scope):
variables_to_restore[var_name] = variable
return variables_to_restore
......
......@@ -188,6 +188,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
[0.5, 0., 1., 0.5], [1., 1., 1.5, 1.5]]]
raw_detection_scores = [[[0, 0], [0, 0], [0, 0], [0, 0]],
[[0, 0], [0, 0], [0, 0], [0, 0]]]
detection_anchor_indices = [[0, 2, 1, 0, 0], [0, 2, 1, 0, 0]]
for input_shape in input_shapes:
tf_graph = tf.Graph()
......@@ -229,6 +230,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
raw_detection_boxes)
self.assertAllEqual(detections_out['raw_detection_scores'],
raw_detection_scores)
self.assertAllEqual(detections_out['detection_anchor_indices'],
detection_anchor_indices)
def test_postprocess_results_are_correct_static(self, use_keras):
with tf.Graph().as_default():
......
......@@ -13,7 +13,12 @@
# limitations under the License.
# ==============================================================================
"""Class for evaluating object detections with COCO metrics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from six.moves import zip
import tensorflow as tf
from object_detection.core import standard_fields
......
......@@ -39,6 +39,10 @@ then evaluation (in multi-class mode) can be invoked as follows:
metrics = evaluator.ComputeMetrics()
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
import copy
import time
......@@ -48,6 +52,8 @@ from pycocotools import coco
from pycocotools import cocoeval
from pycocotools import mask
from six.moves import range
from six.moves import zip
import tensorflow as tf
from object_detection.utils import json_utils
......
......@@ -40,6 +40,8 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
from absl import app
from absl import flags
import pandas as pd
......@@ -120,20 +122,22 @@ def main(unused_argv):
object_detection_evaluation.OpenImagesChallengeEvaluator(
categories, evaluate_masks=is_instance_segmentation_eval))
all_predictions = pd.read_csv(FLAGS.input_predictions)
images_processed = 0
for _, groundtruth in enumerate(all_annotations.groupby('ImageID')):
logging.info('Processing image %d', images_processed)
image_id, image_groundtruth = groundtruth
groundtruth_dictionary = utils.build_groundtruth_dictionary(
image_groundtruth, class_label_map)
challenge_evaluator.add_single_ground_truth_image_info(
image_id, groundtruth_dictionary)
all_predictions = pd.read_csv(FLAGS.input_predictions)
for _, prediction_data in enumerate(all_predictions.groupby('ImageID')):
image_id, image_predictions = prediction_data
prediction_dictionary = utils.build_predictions_dictionary(
image_predictions, class_label_map)
all_predictions.loc[all_predictions['ImageID'] == image_id],
class_label_map)
challenge_evaluator.add_single_detected_image_info(image_id,
prediction_dictionary)
images_processed += 1
metrics = challenge_evaluator.evaluate()
......
......@@ -18,10 +18,13 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import base64
import zlib
import numpy as np
import pandas as pd
from pycocotools import mask as coco_mask
from pycocotools import mask
from object_detection.core import standard_fields
......@@ -53,33 +56,42 @@ def _decode_raw_data_into_masks_and_boxes(segments, image_widths,
"""Decods binary segmentation masks into np.arrays and boxes.
Args:
segments: pandas Series object containing either None entries or strings
with COCO-encoded binary masks. All masks are expected to be the same size.
segments: pandas Series object containing either
None entries, or strings with
base64, zlib compressed, COCO RLE-encoded binary masks.
All masks are expected to be the same size.
image_widths: pandas Series of mask widths.
image_heights: pandas Series of mask heights.
Returns:
a np.ndarray of the size NxWxH, where W and H is determined from the encoded
masks; for the None values, zero arrays of size WxH are created. if input
masks; for the None values, zero arrays of size WxH are created. If input
contains only None values, W=1, H=1.
"""
segment_masks = []
segment_boxes = []
ind = segments.first_valid_index()
if ind is not None:
size = [int(image_heights.iloc[ind]), int(image_widths[ind])]
size = [int(image_heights[ind]), int(image_widths[ind])]
else:
# It does not matter which size we pick since no masks will ever be
# evaluated.
size = [1, 1]
return np.zeros((segments.shape[0], 1, 1), dtype=np.uint8), np.zeros(
(segments.shape[0], 4), dtype=np.float32)
for segment, im_width, im_height in zip(segments, image_widths,
image_heights):
if pd.isnull(segment):
segment_masks.append(np.zeros([1, size[0], size[1]], dtype=np.uint8))
segment_boxes.append(np.expand_dims(np.array([0.0, 0.0, 0.0, 0.0]), 0))
else:
encoding_dict = {'size': [im_height, im_width], 'counts': segment}
mask_tensor = mask.decode(encoding_dict)
compressed_mask = base64.b64decode(segment)
rle_encoded_mask = zlib.decompress(compressed_mask)
decoding_dict = {
'size': [im_height, im_width],
'counts': rle_encoded_mask
}
mask_tensor = coco_mask.decode(decoding_dict)
segment_masks.append(np.expand_dims(mask_tensor, 0))
segment_boxes.append(np.expand_dims(_to_normalized_box(mask_tensor), 0))
......
......@@ -18,15 +18,43 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import base64
import zlib
import numpy as np
import pandas as pd
from pycocotools import mask
from pycocotools import mask as coco_mask
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.metrics import oid_challenge_evaluation_utils as utils
def encode_mask(mask_to_encode):
  """Encodes a binary mask into the Kaggle challenge text format.

  The encoding is done in three stages:
   - COCO RLE-encoding,
   - zlib compression,
   - base64 encoding (to use as entry in csv file).

  Args:
    mask_to_encode: binary np.ndarray of dtype bool and 2d shape.

  Returns:
    A (base64) text string of the encoded mask.
  """
  mask_to_encode = np.squeeze(mask_to_encode)
  mask_to_encode = mask_to_encode.reshape(mask_to_encode.shape[0],
                                          mask_to_encode.shape[1], 1)
  mask_to_encode = mask_to_encode.astype(np.uint8)
  mask_to_encode = np.asfortranarray(mask_to_encode)
  encoded_mask = coco_mask.encode(mask_to_encode)[0]['counts']
  compressed_mask = zlib.compress(encoded_mask, zlib.Z_BEST_COMPRESSION)
  base64_mask = base64.b64encode(compressed_mask)
  return base64_mask
class OidUtilTest(tf.test.TestCase):
def testMaskToNormalizedBox(self):
......@@ -44,10 +72,10 @@ class OidUtilTest(tf.test.TestCase):
mask1 = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]], dtype=np.uint8)
mask2 = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]], dtype=np.uint8)
encoding1 = mask.encode(np.asfortranarray(mask1))
encoding2 = mask.encode(np.asfortranarray(mask2))
encoding1 = encode_mask(mask1)
encoding2 = encode_mask(mask2)
vals = pd.Series([encoding1['counts'], encoding2['counts']])
vals = pd.Series([encoding1, encoding2])
image_widths = pd.Series([mask1.shape[1], mask2.shape[1]])
image_heights = pd.Series([mask1.shape[0], mask2.shape[0]])
......@@ -60,6 +88,15 @@ class OidUtilTest(tf.test.TestCase):
self.assertAllEqual(expected_segm, segm)
self.assertAllEqual(expected_bbox, bbox)
def testDecodeToTensorsNoMasks(self):
vals = pd.Series([None, None])
image_widths = pd.Series([None, None])
image_heights = pd.Series([None, None])
segm, bbox = utils._decode_raw_data_into_masks_and_boxes(
vals, image_widths, image_heights)
self.assertAllEqual(np.zeros((2, 1, 1), dtype=np.uint8), segm)
self.assertAllEqual(np.zeros((2, 4), dtype=np.float32), bbox)
class OidChallengeEvaluationUtilTest(tf.test.TestCase):
......@@ -140,13 +177,13 @@ class OidChallengeEvaluationUtilTest(tf.test.TestCase):
mask2 = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
dtype=np.uint8)
encoding1 = mask.encode(np.asfortranarray(mask1))
encoding2 = mask.encode(np.asfortranarray(mask2))
encoding1 = encode_mask(mask1)
encoding2 = encode_mask(mask2)
np_data = pd.DataFrame(
[[
'fe58ec1b06db2bb7', mask1.shape[1], mask1.shape[0], '/m/04bcr3',
0.0, 0.3, 0.5, 0.6, 0, None, encoding1['counts']
0.0, 0.3, 0.5, 0.6, 0, None, encoding1
],
[
'fe58ec1b06db2bb7', None, None, '/m/02gy9n', 0.1, 0.2, 0.3, 0.4, 1,
......@@ -154,7 +191,7 @@ class OidChallengeEvaluationUtilTest(tf.test.TestCase):
],
[
'fe58ec1b06db2bb7', mask2.shape[1], mask2.shape[0], '/m/02gy9n',
0.5, 0.6, 0.8, 0.9, 0, None, encoding2['counts']
0.5, 0.6, 0.8, 0.9, 0, None, encoding2
],
[
'fe58ec1b06db2bb7', None, None, '/m/04bcr3', None, None, None,
......@@ -218,21 +255,21 @@ class OidChallengeEvaluationUtilTest(tf.test.TestCase):
mask2 = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
dtype=np.uint8)
encoding1 = mask.encode(np.asfortranarray(mask1))
encoding2 = mask.encode(np.asfortranarray(mask2))
encoding1 = encode_mask(mask1)
encoding2 = encode_mask(mask2)
np_data = pd.DataFrame(
[[
'fe58ec1b06db2bb7', mask1.shape[1], mask1.shape[0], '/m/04bcr3',
encoding1['counts'], 0.8
],
[
'fe58ec1b06db2bb7', mask2.shape[1], mask2.shape[0], '/m/02gy9n',
encoding2['counts'], 0.6
]],
columns=[
'ImageID', 'ImageWidth', 'ImageHeight', 'LabelName', 'Mask', 'Score'
])
np_data = pd.DataFrame([[
'fe58ec1b06db2bb7', mask1.shape[1], mask1.shape[0], '/m/04bcr3',
encoding1, 0.8
],
[
'fe58ec1b06db2bb7', mask2.shape[1],
mask2.shape[0], '/m/02gy9n', encoding2, 0.6
]],
columns=[
'ImageID', 'ImageWidth', 'ImageHeight',
'LabelName', 'Mask', 'Score'
])
class_label_map = {'/m/04bcr3': 1, '/m/02gy9n': 3}
prediction_dictionary = utils.build_predictions_dictionary(
np_data, class_label_map)
......
......@@ -24,7 +24,6 @@ import os
import tensorflow as tf
from tensorflow.python.util import function_utils
from object_detection import eval_util
from object_detection import exporter as exporter_lib
from object_detection import inputs
......@@ -187,7 +186,7 @@ def unstack_batch(tensor_dict, unpad_groundtruth_tensors=True):
return unbatched_tensor_dict
def _provide_groundtruth(model, labels):
def provide_groundtruth(model, labels):
"""Provides the labels to a model as groundtruth.
This helper function extracts the corresponding boxes, classes,
......@@ -287,7 +286,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
labels, unpad_groundtruth_tensors=unpad_groundtruth_tensors)
if mode in (tf.estimator.ModeKeys.TRAIN, tf.estimator.ModeKeys.EVAL):
_provide_groundtruth(detection_model, labels)
provide_groundtruth(detection_model, labels)
preprocessed_images = features[fields.InputDataFields.image]
if use_tpu and train_config.use_bfloat16:
......@@ -524,7 +523,7 @@ def create_estimator_and_inputs(run_config,
pipeline_config_path,
config_override=None,
train_steps=None,
sample_1_of_n_eval_examples=1,
sample_1_of_n_eval_examples=None,
sample_1_of_n_eval_on_train_examples=1,
model_fn_creator=create_model_fn,
use_tpu_estimator=False,
......@@ -606,9 +605,12 @@ def create_estimator_and_inputs(run_config,
pipeline_config_path, config_override=config_override)
kwargs.update({
'train_steps': train_steps,
'sample_1_of_n_eval_examples': sample_1_of_n_eval_examples,
'use_bfloat16': configs['train_config'].use_bfloat16 and use_tpu
})
if sample_1_of_n_eval_examples >= 1:
kwargs.update({
'sample_1_of_n_eval_examples': sample_1_of_n_eval_examples
})
if override_eval_num_epochs:
kwargs.update({'eval_num_epochs': 1})
tf.logging.warning(
......@@ -667,11 +669,6 @@ def create_estimator_and_inputs(run_config,
model_fn = model_fn_creator(detection_model_fn, configs, hparams, use_tpu,
postprocess_on_cpu)
if use_tpu_estimator:
# Multicore inference disabled due to b/129367127
tpu_estimator_args = function_utils.fn_args(tf.contrib.tpu.TPUEstimator)
kwargs = {}
if 'experimental_export_device_assignment' in tpu_estimator_args:
kwargs['experimental_export_device_assignment'] = True
estimator = tf.contrib.tpu.TPUEstimator(
model_fn=model_fn,
train_batch_size=train_config.batch_size,
......@@ -681,8 +678,7 @@ def create_estimator_and_inputs(run_config,
config=run_config,
export_to_tpu=export_to_tpu,
eval_on_tpu=False, # Eval runs on CPU, so disable eval on TPU
params=params if params else {},
**kwargs)
params=params if params else {})
else:
estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
......
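Editor's note: with the new `sample_1_of_n_eval_examples=None` default, the sampling rate is only forwarded to the config merge when an explicit value >= 1 is passed. A minimal usage sketch follows; the paths are hypothetical and the return key accessed here follows the function's documented dictionary output.

import tensorflow as tf
from object_detection import model_hparams
from object_detection import model_lib

run_config = tf.estimator.RunConfig(model_dir='/tmp/model_dir')
train_and_eval_dict = model_lib.create_estimator_and_inputs(
    run_config=run_config,
    hparams=model_hparams.create_hparams(None),
    pipeline_config_path='/path/to/pipeline.config',  # hypothetical path
    sample_1_of_n_eval_examples=None)  # fall back to the eval_input config
estimator = train_and_eval_dict['estimator']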
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Constructs model, inputs, and training environment."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import time
import tensorflow as tf
from object_detection import eval_util
from object_detection import inputs
from object_detection import model_lib
from object_detection.builders import model_builder
from object_detection.builders import optimizer_builder
from object_detection.core import standard_fields as fields
from object_detection.utils import config_util
from object_detection.utils import label_map_util
from object_detection.utils import ops
from object_detection.utils import variables_helper
MODEL_BUILD_UTIL_MAP = model_lib.MODEL_BUILD_UTIL_MAP
### NOTE: This file is a work in progress.
### TODO(kaftan): Explore adding unit tests for individual methods
### TODO(kaftan): Add unit test that checks training on a single image w/
#### groundtruth, and verify that loss goes to zero.
#### Possibly have version that takes it as the whole train & eval dataset,
#### & verify the loss output from the eval_loop method.
### TODO(kaftan): Make sure the unit tests run in TAP presubmits or Kokoro
def _compute_losses_and_predictions_dicts(
model, features, labels,
add_regularization_loss=True,
use_tpu=False,
use_bfloat16=False):
"""Computes the losses dict and predictions dict for a model on inputs.
Args:
model: a DetectionModel (based on Keras).
features: Dictionary of feature tensors from the input dataset.
Should be in the format output by `inputs.train_input` and
`inputs.eval_input`.
features[fields.InputDataFields.image] is a [batch_size, H, W, C]
float32 tensor with preprocessed images.
features[HASH_KEY] is a [batch_size] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [batch_size, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] (optional) is a
[batch_size, H, W, C] float32 tensor with original images.
labels: A dictionary of groundtruth tensors post-unstacking. The original
labels are of the form returned by `inputs.train_input` and
`inputs.eval_input`. The shapes may have been modified by unstacking with
`model_lib.unstack_batch`. However, the dictionary includes the following
fields.
labels[fields.InputDataFields.num_groundtruth_boxes] is an
int32 tensor indicating the number of valid groundtruth boxes
per image.
labels[fields.InputDataFields.groundtruth_boxes] is a float32 tensor
containing the corners of the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a float32
one-hot tensor of classes.
labels[fields.InputDataFields.groundtruth_weights] is a float32 tensor
containing groundtruth weights for the boxes.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
float32 tensor containing only binary values, which represent
instance masks for objects.
labels[fields.InputDataFields.groundtruth_keypoints] is a
float32 tensor containing keypoints for each box.
add_regularization_loss: Whether or not to include the model's
regularization loss in the losses dictionary.
use_tpu: Whether computation should happen on a TPU.
use_bfloat16: Whether computation on a TPU should use bfloat16.
Returns:
A tuple containing the losses dictionary (with the total loss under
the key 'Loss/total_loss'), and the predictions dictionary produced by
`model.predict`.
"""
model_lib.provide_groundtruth(model, labels)
preprocessed_images = features[fields.InputDataFields.image]
# TODO(kaftan): Check how we're supposed to do this mixed precision stuff
## in TF2 TPUStrategy + Keras
if use_tpu and use_bfloat16:
with tf.contrib.tpu.bfloat16_scope():
prediction_dict = model.predict(
preprocessed_images,
features[fields.InputDataFields.true_image_shape])
prediction_dict = ops.bfloat16_to_float32_nested(prediction_dict)
else:
prediction_dict = model.predict(
preprocessed_images,
features[fields.InputDataFields.true_image_shape])
losses_dict = model.loss(
prediction_dict, features[fields.InputDataFields.true_image_shape])
losses = [loss_tensor for loss_tensor in losses_dict.values()]
if add_regularization_loss:
# TODO(kaftan): As we figure out mixed precision & bfloat 16, we may
## need to convert these regularization losses from bfloat16 to float32
## as well.
regularization_losses = model.regularization_losses()
if regularization_losses:
regularization_loss = tf.add_n(
regularization_losses, name='regularization_loss')
losses.append(regularization_loss)
losses_dict['Loss/regularization_loss'] = regularization_loss
total_loss = tf.add_n(losses, name='total_loss')
losses_dict['Loss/total_loss'] = total_loss
return losses_dict, prediction_dict
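Editor's note: for orientation, a short sketch of how the two returned dictionaries are typically consumed. The `detection_model`, `features`, and `labels` names stand in for real inputs built elsewhere in this file, and the regularization key only appears when it is requested and present.

losses_dict, prediction_dict = _compute_losses_and_predictions_dicts(
    detection_model, features, labels, add_regularization_loss=True)
total_loss = losses_dict['Loss/total_loss']              # always present
reg_loss = losses_dict.get('Loss/regularization_loss')   # optional
# prediction_dict is later fed to detection_model.postprocess(...) in eval.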
# TODO(kaftan): Explore removing learning_rate from this method & returning
## The full losses dict instead of just total_loss, then doing all summaries
## saving in a utility method called by the outer training loop.
# TODO(kaftan): Explore adding gradient summaries
def eager_train_step(detection_model,
features,
labels,
unpad_groundtruth_tensors,
optimizer,
learning_rate,
add_regularization_loss=True,
clip_gradients_value=None,
use_tpu=False,
use_bfloat16=False,
global_step=None,
num_replicas=1.0):
"""Process a single training batch.
This method computes the loss for the model on a single training batch,
while tracking the gradients with a gradient tape. It then updates the
model variables with the optimizer, clipping the gradients if
clip_gradients_value is present.
This method can run eagerly or inside a tf.function.
Args:
detection_model: A DetectionModel (based on Keras) to train.
features: Dictionary of feature tensors from the input dataset.
Should be in the format output by `inputs.train_input`.
features[fields.InputDataFields.image] is a [batch_size, H, W, C]
float32 tensor with preprocessed images.
features[HASH_KEY] is a [batch_size] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [batch_size, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] (optional, not used
during training) is a
[batch_size, H, W, C] float32 tensor with original images.
labels: A dictionary of groundtruth tensors. This method unstacks
these labels using model_lib.unstack_batch. The stacked labels are of
the form returned by `inputs.train_input` and `inputs.eval_input`.
labels[fields.InputDataFields.num_groundtruth_boxes] is a [batch_size]
int32 tensor indicating the number of valid groundtruth boxes
per image.
labels[fields.InputDataFields.groundtruth_boxes] is a
[batch_size, num_boxes, 4] float32 tensor containing the corners of
the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
[batch_size, num_boxes, num_classes] float32 one-hot tensor of
classes. num_classes includes the background class.
labels[fields.InputDataFields.groundtruth_weights] is a
[batch_size, num_boxes] float32 tensor containing groundtruth weights
for the boxes.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[batch_size, num_boxes, H, W] float32 tensor containing only binary
values, which represent instance masks for objects.
labels[fields.InputDataFields.groundtruth_keypoints] is a
[batch_size, num_boxes, num_keypoints, 2] float32 tensor containing
keypoints for each box.
unpad_groundtruth_tensors: A parameter passed to unstack_batch.
optimizer: The training optimizer that will update the variables.
learning_rate: The learning rate tensor for the current training step.
This is used only for TensorBoard logging purposes; it does not affect
model training.
add_regularization_loss: Whether or not to include the model's
regularization loss in the losses dictionary.
clip_gradients_value: If this is present, clip the gradients global norm
at this value using `tf.clip_by_global_norm`.
use_tpu: Whether computation should happen on a TPU.
use_bfloat16: Whether computation on a TPU should use bfloat16.
global_step: The current training step. Used for TensorBoard logging
purposes. This step is not updated by this function and must be
incremented separately.
num_replicas: The number of replicas in the current distribution strategy.
This is used to scale the total loss so that training in a distribution
strategy works correctly.
Returns:
The total loss observed at this training step.
"""
# """Execute a single training step in the TF v2 style loop."""
is_training = True
detection_model._is_training = is_training # pylint: disable=protected-access
tf.keras.backend.set_learning_phase(is_training)
labels = model_lib.unstack_batch(
labels, unpad_groundtruth_tensors=unpad_groundtruth_tensors)
with tf.GradientTape() as tape:
losses_dict, _ = _compute_losses_and_predictions_dicts(
detection_model, features, labels, add_regularization_loss, use_tpu,
use_bfloat16)
total_loss = losses_dict['Loss/total_loss']
# Normalize loss for num replicas
total_loss = tf.math.divide(total_loss,
tf.constant(num_replicas, dtype=tf.float32))
losses_dict['Loss/normalized_total_loss'] = total_loss
for loss_type in losses_dict:
tf.compat.v2.summary.scalar(
loss_type, losses_dict[loss_type], step=global_step)
trainable_variables = detection_model.trainable_variables
gradients = tape.gradient(total_loss, trainable_variables)
if clip_gradients_value:
gradients, _ = tf.clip_by_global_norm(gradients, clip_gradients_value)
optimizer.apply_gradients(zip(gradients, trainable_variables))
if not use_tpu:
tf.compat.v2.summary.scalar('learning_rate', learning_rate,
step=global_step)
return total_loss
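Editor's note: a small worked example of the `num_replicas` scaling above, in plain Python rather than library code. Dividing each replica's local total loss by the replica count means that the SUM reduction performed by the caller recovers the cross-replica mean.

per_replica_losses = [4.0, 2.0]   # hypothetical local total losses
num_replicas = len(per_replica_losses)
scaled = [loss / num_replicas for loss in per_replica_losses]
# strategy.reduce(tf.distribute.ReduceOp.SUM, ...) over the scaled values
# yields the mean of the unscaled losses:
assert sum(scaled) == sum(per_replica_losses) / num_replicas  # == 3.0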
def load_fine_tune_checkpoint(
model, checkpoint_path, checkpoint_type,
load_all_detection_checkpoint_vars, input_dataset,
unpad_groundtruth_tensors, use_tpu, use_bfloat16):
"""Load a fine tuning classification or detection checkpoint.
To make sure the model variables are all built, this method first executes
the model by computing a dummy loss. (Models might not have built their
variables before their first execution.)
It then loads a variable-name based classification or detection checkpoint
that comes from converted TF 1.x slim model checkpoints.
This method updates the model in-place and does not return a value.
Args:
model: A DetectionModel (based on Keras) to load a fine-tuning
checkpoint for.
checkpoint_path: Directory with checkpoints file or path to checkpoint.
checkpoint_type: Whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`.
load_all_detection_checkpoint_vars: whether to load all variables (when
`fine_tune_checkpoint_type` is `detection`). If False, only variables
within the feature extractor scopes are included. Default False.
input_dataset: The tf.data Dataset the model is being trained on. Needed
to get the shapes for the dummy loss computation.
unpad_groundtruth_tensors: A parameter passed to unstack_batch.
use_tpu: Whether computation should happen on a TPU.
use_bfloat16: Whether computation on a TPU should use bfloat16.
"""
features, labels = iter(input_dataset).next()
def _dummy_computation_fn(features, labels):
model._is_training = False # pylint: disable=protected-access
tf.keras.backend.set_learning_phase(False)
labels = model_lib.unstack_batch(
labels, unpad_groundtruth_tensors=unpad_groundtruth_tensors)
return _compute_losses_and_predictions_dicts(
model,
features,
labels,
use_tpu=use_tpu,
use_bfloat16=use_bfloat16)
strategy = tf.compat.v2.distribute.get_strategy()
strategy.experimental_run_v2(
_dummy_computation_fn, args=(
features,
labels,
))
var_map = model.restore_map(
fine_tune_checkpoint_type=checkpoint_type,
load_all_detection_checkpoint_vars=(
load_all_detection_checkpoint_vars))
available_var_map = (
variables_helper.get_variables_available_in_checkpoint(
var_map,
checkpoint_path,
include_global_step=False))
tf.train.init_from_checkpoint(checkpoint_path,
available_var_map)
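Editor's note: a minimal usage sketch for the function above, assuming `detection_model` and `train_input` were built as in `train_loop` below and that the checkpoint path actually exists; the values shown are placeholders.

load_fine_tune_checkpoint(
    model=detection_model,
    checkpoint_path='/path/to/model.ckpt',    # hypothetical
    checkpoint_type='detection',              # or 'classification'
    load_all_detection_checkpoint_vars=True,
    input_dataset=train_input,
    unpad_groundtruth_tensors=True,
    use_tpu=False,
    use_bfloat16=False)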
def train_loop(
hparams,
pipeline_config_path,
model_dir,
config_override=None,
train_steps=None,
use_tpu=False,
save_final_config=False,
export_to_tpu=None,
checkpoint_every_n=1000, **kwargs):
"""Trains a model using eager + functions.
This method:
1. Processes the pipeline configs
2. (Optionally) saves the as-run config
3. Builds the model & optimizer
4. Gets the training input data
5. Loads a fine-tuning detection or classification checkpoint if requested
6. Loops over the train data, executing distributed training steps inside
tf.functions.
7. Checkpoints the model every `checkpoint_every_n` training steps.
8. Logs the training metrics as TensorBoard summaries.
Args:
hparams: A `HParams`.
pipeline_config_path: A path to a pipeline config file.
model_dir:
The directory to save checkpoints and summaries to.
config_override: A pipeline_pb2.TrainEvalPipelineConfig text proto to
override the config from `pipeline_config_path`.
train_steps: Number of training steps. If None, the number of training steps
is set from the `TrainConfig` proto.
use_tpu: Boolean, whether training and evaluation should run on TPU.
save_final_config: Whether to save final config (obtained after applying
overrides) to `model_dir`.
export_to_tpu: When use_tpu and export_to_tpu are true,
`export_savedmodel()` exports a metagraph for serving on TPU besides the
one on CPU. If export_to_tpu is not provided, we will look for it in
hparams too.
checkpoint_every_n:
Checkpoint every n training steps.
**kwargs: Additional keyword arguments for configuration override.
"""
## Parse the configs
get_configs_from_pipeline_file = MODEL_BUILD_UTIL_MAP[
'get_configs_from_pipeline_file']
merge_external_params_with_configs = MODEL_BUILD_UTIL_MAP[
'merge_external_params_with_configs']
create_pipeline_proto_from_configs = MODEL_BUILD_UTIL_MAP[
'create_pipeline_proto_from_configs']
configs = get_configs_from_pipeline_file(
pipeline_config_path, config_override=config_override)
kwargs.update({
'train_steps': train_steps,
'use_bfloat16': configs['train_config'].use_bfloat16 and use_tpu
})
configs = merge_external_params_with_configs(
configs, hparams, kwargs_dict=kwargs)
model_config = configs['model']
train_config = configs['train_config']
train_input_config = configs['train_input_config']
unpad_groundtruth_tensors = train_config.unpad_groundtruth_tensors
use_bfloat16 = train_config.use_bfloat16
add_regularization_loss = train_config.add_regularization_loss
clip_gradients_value = None
if train_config.gradient_clipping_by_norm > 0:
clip_gradients_value = train_config.gradient_clipping_by_norm
# update train_steps from config but only when non-zero value is provided
if train_steps is None and train_config.num_steps != 0:
train_steps = train_config.num_steps
# Read export_to_tpu from hparams if not passed.
if export_to_tpu is None:
export_to_tpu = hparams.get('export_to_tpu', False)
tf.logging.info(
'train_loop: use_tpu %s, export_to_tpu %s', use_tpu,
export_to_tpu)
# Parse the checkpoint fine tuning configs
if hparams.load_pretrained:
fine_tune_checkpoint_path = train_config.fine_tune_checkpoint
else:
fine_tune_checkpoint_path = None
load_all_detection_checkpoint_vars = (
train_config.load_all_detection_checkpoint_vars)
# TODO(kaftan) (or anyone else): move this piece of config munging to
## utils/config_util.py
if not train_config.fine_tune_checkpoint_type:
# train_config.from_detection_checkpoint field is deprecated. For
# backward compatibility, set train_config.fine_tune_checkpoint_type
# based on train_config.from_detection_checkpoint.
if train_config.from_detection_checkpoint:
train_config.fine_tune_checkpoint_type = 'detection'
else:
train_config.fine_tune_checkpoint_type = 'classification'
fine_tune_checkpoint_type = train_config.fine_tune_checkpoint_type
# Write the as-run pipeline config to disk.
if save_final_config:
pipeline_config_final = create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_config_final, model_dir)
# TODO(kaftan): Either make strategy a parameter of this method, or
## grab it w/ Distribution strategy's get_scope
# Build the model, optimizer, and training input
strategy = tf.compat.v2.distribute.MirroredStrategy()
with strategy.scope():
detection_model = model_builder.build(
model_config=model_config, is_training=True)
# Create the inputs.
train_input = inputs.train_input(
train_config=train_config,
train_input_config=train_input_config,
model_config=model_config,
model=detection_model)
train_input = strategy.experimental_distribute_dataset(
train_input.repeat())
global_step = tf.compat.v2.Variable(
0, trainable=False, dtype=tf.compat.v2.dtypes.int64)
optimizer, (learning_rate,) = optimizer_builder.build(
train_config.optimizer, global_step=global_step)
if callable(learning_rate):
learning_rate_fn = learning_rate
else:
learning_rate_fn = lambda: learning_rate
## Train the model
summary_writer = tf.compat.v2.summary.create_file_writer(model_dir + '/train')
with summary_writer.as_default():
with strategy.scope():
# Load a fine-tuning checkpoint.
if fine_tune_checkpoint_path:
load_fine_tune_checkpoint(detection_model, fine_tune_checkpoint_path,
fine_tune_checkpoint_type,
load_all_detection_checkpoint_vars,
train_input,
unpad_groundtruth_tensors, use_tpu,
use_bfloat16)
ckpt = tf.compat.v2.train.Checkpoint(
step=global_step, model=detection_model)
manager = tf.compat.v2.train.CheckpointManager(
ckpt, model_dir, max_to_keep=7)
## Maybe re-enable checkpoint restoration depending on how it works:
# ckpt.restore(manager.latest_checkpoint)
def train_step_fn(features, labels):
return eager_train_step(
detection_model,
features,
labels,
unpad_groundtruth_tensors,
optimizer,
learning_rate=learning_rate_fn(),
use_bfloat16=use_bfloat16,
add_regularization_loss=add_regularization_loss,
clip_gradients_value=clip_gradients_value,
use_tpu=use_tpu,
global_step=global_step,
num_replicas=strategy.num_replicas_in_sync)
@tf.function
def _dist_train_step(data_iterator):
"""A distributed train step."""
features, labels = data_iterator.next()
per_replica_losses = strategy.experimental_run_v2(
train_step_fn, args=(
features,
labels,
))
# TODO(anjalisridhar): explore if it is safe to remove the
## num_replicas scaling of the loss and switch this to a ReduceOp.Mean
mean_loss = strategy.reduce(
tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)
return mean_loss
train_input_iter = iter(train_input)
for _ in range(train_steps):
start_time = time.time()
loss = _dist_train_step(train_input_iter)
global_step.assign_add(1)
end_time = time.time()
tf.compat.v2.summary.scalar(
'steps_per_sec', 1.0 / (end_time - start_time), step=global_step)
# TODO(kaftan): Remove this print after it is no longer helpful for
## debugging.
tf.print('Finished step', global_step, end_time, loss)
if int(global_step.value().numpy()) % checkpoint_every_n == 0:
manager.save()
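Editor's note: a minimal sketch of driving the loop above, mirroring the unit test later in this diff. The config path is hypothetical and must describe a Keras-based model; any input-path overrides can be passed through **kwargs.

from object_detection import model_hparams
from object_detection import model_lib_v2

model_lib_v2.train_loop(
    hparams=model_hparams.create_hparams(
        hparams_overrides='load_pretrained=false'),
    pipeline_config_path='/path/to/pipeline.config',  # hypothetical
    model_dir='/tmp/train_dir',
    train_steps=1000,
    checkpoint_every_n=100)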
def eager_eval_loop(
detection_model,
configs,
eval_dataset,
use_tpu=False,
postprocess_on_cpu=False,
global_step=None):
"""Evaluate the model eagerly on the evaluation dataset.
This method will compute the evaluation metrics specified in the configs on
the entire evaluation dataset, then return the metrics. It will also log
the metrics to TensorBoard.
Args:
detection_model: A DetectionModel (based on Keras) to evaluate.
configs: Object detection configs that specify the evaluators that should
be used, as well as whether regularization loss should be included and
if bfloat16 should be used on TPUs.
eval_dataset: Dataset containing evaluation data.
use_tpu: Whether a TPU is being used to execute the model for evaluation.
postprocess_on_cpu: Whether model postprocessing should happen on
the CPU when using a TPU to execute the model.
global_step: A variable containing the training step this model was trained
to. Used for logging purposes.
Returns:
A dict of evaluation metrics representing the results of this evaluation.
"""
train_config = configs['train_config']
eval_input_config = configs['eval_input_config']
eval_config = configs['eval_config']
use_bfloat16 = train_config.use_bfloat16
add_regularization_loss = train_config.add_regularization_loss
is_training = False
detection_model._is_training = is_training # pylint: disable=protected-access
tf.keras.backend.set_learning_phase(is_training)
evaluator_options = eval_util.evaluator_options_from_eval_config(
eval_config)
class_agnostic_category_index = (
label_map_util.create_class_agnostic_category_index())
class_agnostic_evaluators = eval_util.get_evaluators(
eval_config,
list(class_agnostic_category_index.values()),
evaluator_options)
class_aware_evaluators = None
if eval_input_config.label_map_path:
class_aware_category_index = (
label_map_util.create_category_index_from_labelmap(
eval_input_config.label_map_path))
class_aware_evaluators = eval_util.get_evaluators(
eval_config,
list(class_aware_category_index.values()),
evaluator_options)
evaluators = None
loss_metrics = {}
@tf.function
def compute_eval_dict(features, labels):
"""Compute the evaluation result on an image."""
# For evaluating on train data, it is necessary to check whether groundtruth
# must be unpadded.
boxes_shape = (
labels[fields.InputDataFields.groundtruth_boxes].get_shape().as_list())
unpad_groundtruth_tensors = boxes_shape[1] is not None and not use_tpu
labels = model_lib.unstack_batch(
labels, unpad_groundtruth_tensors=unpad_groundtruth_tensors)
losses_dict, prediction_dict = _compute_losses_and_predictions_dicts(
detection_model, features, labels, add_regularization_loss, use_tpu,
use_bfloat16)
def postprocess_wrapper(args):
return detection_model.postprocess(args[0], args[1])
# TODO(kaftan): Depending on how postprocessing will work for TPUS w/
## TPUStrategy, may be good to move wrapping to a utility method
if use_tpu and postprocess_on_cpu:
detections = tf.contrib.tpu.outside_compilation(
postprocess_wrapper,
(prediction_dict, features[fields.InputDataFields.true_image_shape]))
else:
detections = postprocess_wrapper(
(prediction_dict, features[fields.InputDataFields.true_image_shape]))
class_agnostic = (
fields.DetectionResultFields.detection_classes not in detections)
# TODO(kaftan) (or anyone): move `_prepare_groundtruth_for_eval to eval_util
## and call this from there.
groundtruth = model_lib._prepare_groundtruth_for_eval( # pylint: disable=protected-access
detection_model, class_agnostic, eval_input_config.max_number_of_boxes)
use_original_images = fields.InputDataFields.original_image in features
if use_original_images:
eval_images = features[fields.InputDataFields.original_image]
true_image_shapes = tf.slice(
features[fields.InputDataFields.true_image_shape], [0, 0], [-1, 3])
original_image_spatial_shapes = features[
fields.InputDataFields.original_image_spatial_shape]
else:
eval_images = features[fields.InputDataFields.image]
true_image_shapes = None
original_image_spatial_shapes = None
eval_dict = eval_util.result_dict_for_batched_example(
eval_images,
features[inputs.HASH_KEY],
detections,
groundtruth,
class_agnostic=class_agnostic,
scale_to_absolute=True,
original_image_spatial_shapes=original_image_spatial_shapes,
true_image_shapes=true_image_shapes)
return eval_dict, losses_dict, class_agnostic
i = 0
for features, labels in eval_dataset:
eval_dict, losses_dict, class_agnostic = compute_eval_dict(features, labels)
end_time = time.time()
# TODO(kaftan): Remove this print after it is no longer helpful for
## debugging.
tf.print('Finished eval dict computation', i, end_time)
i += 1
if evaluators is None:
if class_agnostic:
evaluators = class_agnostic_evaluators
else:
evaluators = class_aware_evaluators
for evaluator in evaluators:
evaluator.add_eval_dict(eval_dict)
for loss_key, loss_tensor in iter(losses_dict.items()):
if loss_key not in loss_metrics:
loss_metrics[loss_key] = tf.keras.metrics.Mean()
loss_metrics[loss_key].update_state(loss_tensor)
eval_metrics = {}
for evaluator in evaluators:
eval_metrics.update(evaluator.evaluate())
for loss_key in loss_metrics:
eval_metrics[loss_key] = loss_metrics[loss_key].result()
eval_metrics = {str(k): v for k, v in eval_metrics.items()}
for k in eval_metrics:
tf.compat.v2.summary.scalar(k, eval_metrics[k], step=global_step)
return eval_metrics
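Editor's note: a one-shot use of the eval loop above, sketched under assumptions: `detection_model`, `configs`, and `eval_input` are built as in `eval_continuously` below, and a training checkpoint has already been restored into the model so the evaluated weights are meaningful.

import tensorflow as tf

writer = tf.compat.v2.summary.create_file_writer('/tmp/eval')  # hypothetical dir
with writer.as_default():
  metrics = eager_eval_loop(
      detection_model,
      configs,
      eval_input,
      global_step=tf.compat.v2.Variable(0, dtype=tf.compat.v2.dtypes.int64))
print(metrics)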
def eval_continuously(
hparams,
pipeline_config_path,
config_override=None,
train_steps=None,
sample_1_of_n_eval_examples=1,
sample_1_of_n_eval_on_train_examples=1,
use_tpu=False,
override_eval_num_epochs=True,
postprocess_on_cpu=False,
export_to_tpu=None,
model_dir=None,
checkpoint_dir=None,
wait_interval=180,
**kwargs):
"""Run continuous evaluation of a detection model eagerly.
This method builds the model, and continuously restores it from the most
recent training checkpoint in the checkpoint directory and evaluates it
on the evaluation data.
Args:
hparams: A `HParams`.
pipeline_config_path: A path to a pipeline config file.
config_override: A pipeline_pb2.TrainEvalPipelineConfig text proto to
override the config from `pipeline_config_path`.
train_steps: Number of training steps. If None, the number of training steps
is set from the `TrainConfig` proto.
sample_1_of_n_eval_examples: Integer representing how often an eval example
should be sampled. If 1, will sample all examples.
sample_1_of_n_eval_on_train_examples: Similar to
`sample_1_of_n_eval_examples`, except controls the sampling of training
data for evaluation.
use_tpu: Boolean, whether training and evaluation should run on TPU.
override_eval_num_epochs: Whether to overwrite the number of epochs to 1 for
eval_input.
postprocess_on_cpu: When use_tpu and postprocess_on_cpu are true,
postprocess is scheduled on the host cpu.
export_to_tpu: When use_tpu and export_to_tpu are true,
`export_savedmodel()` exports a metagraph for serving on TPU besides the
one on CPU. If export_to_tpu is not provided, we will look for it in
hparams too.
model_dir:
Directory to output resulting evaluation summaries to.
checkpoint_dir:
Directory that contains the training checkpoints.
wait_interval:
Terminate evaluation if no new checkpoints arrive within this wait
interval (in seconds).
**kwargs: Additional keyword arguments for configuration override.
"""
get_configs_from_pipeline_file = MODEL_BUILD_UTIL_MAP[
'get_configs_from_pipeline_file']
merge_external_params_with_configs = MODEL_BUILD_UTIL_MAP[
'merge_external_params_with_configs']
configs = get_configs_from_pipeline_file(
pipeline_config_path, config_override=config_override)
kwargs.update({
'sample_1_of_n_eval_examples': sample_1_of_n_eval_examples,
'use_bfloat16': configs['train_config'].use_bfloat16 and use_tpu
})
if train_steps is not None:
kwargs['train_steps'] = train_steps
if override_eval_num_epochs:
kwargs.update({'eval_num_epochs': 1})
tf.logging.warning(
'Forced number of epochs for all eval validations to be 1.')
configs = merge_external_params_with_configs(
configs, hparams, kwargs_dict=kwargs)
model_config = configs['model']
train_input_config = configs['train_input_config']
eval_config = configs['eval_config']
eval_input_configs = configs['eval_input_configs']
eval_on_train_input_config = copy.deepcopy(train_input_config)
eval_on_train_input_config.sample_1_of_n_examples = (
sample_1_of_n_eval_on_train_examples)
if override_eval_num_epochs and eval_on_train_input_config.num_epochs != 1:
tf.logging.warning('Expected number of evaluation epochs is 1, but '
'instead encountered `eval_on_train_input_config'
'.num_epochs` = '
'{}. Overwriting `num_epochs` to 1.'.format(
eval_on_train_input_config.num_epochs))
eval_on_train_input_config.num_epochs = 1
detection_model = model_builder.build(
model_config=model_config, is_training=True)
# Create the inputs.
eval_inputs = []
for eval_input_config in eval_input_configs:
next_eval_input = inputs.eval_input(
eval_config=eval_config,
eval_input_config=eval_input_config,
model_config=model_config,
model=detection_model)
eval_inputs.append((eval_input_config.name, next_eval_input))
# Read export_to_tpu from hparams if not passed.
if export_to_tpu is None:
export_to_tpu = hparams.get('export_to_tpu', False)
tf.logging.info('eval_continuously: use_tpu %s, export_to_tpu %s',
use_tpu, export_to_tpu)
global_step = tf.compat.v2.Variable(
0, trainable=False, dtype=tf.compat.v2.dtypes.int64)
prev_checkpoint = None
waiting = False
while True:
ckpt = tf.compat.v2.train.Checkpoint(
step=global_step, model=detection_model)
manager = tf.compat.v2.train.CheckpointManager(
ckpt, checkpoint_dir, max_to_keep=3)
latest_checkpoint = manager.latest_checkpoint
if prev_checkpoint == latest_checkpoint:
if prev_checkpoint is None:
tf.logging.info('No checkpoints found yet. Trying again in %s seconds.'
% wait_interval)
time.sleep(wait_interval)
else:
if waiting:
tf.logging.info('Terminating eval after %s seconds of no new '
'checkpoints.' % wait_interval)
break
else:
tf.logging.info('No new checkpoint found. Will try again '
'in %s seconds and terminate if no checkpoint '
'appears.' % wait_interval)
waiting = True
time.sleep(wait_interval)
else:
tf.logging.info('New checkpoint found. Starting evaluation.')
waiting = False
prev_checkpoint = latest_checkpoint
ckpt.restore(latest_checkpoint)
for eval_name, eval_input in eval_inputs:
summary_writer = tf.compat.v2.summary.create_file_writer(
model_dir + '/eval' + eval_name)
with summary_writer.as_default():
eager_eval_loop(
detection_model,
configs,
eval_input,
use_tpu=use_tpu,
postprocess_on_cpu=postprocess_on_cpu,
global_step=global_step)
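Editor's note: a minimal usage sketch for running continuous evaluation alongside a separate training job. Directories and the config path are hypothetical; evaluation terminates once no new checkpoint appears within `wait_interval` seconds.

from object_detection import model_hparams
from object_detection import model_lib_v2

model_lib_v2.eval_continuously(
    hparams=model_hparams.create_hparams(None),
    pipeline_config_path='/path/to/pipeline.config',  # hypothetical
    model_dir='/tmp/eval_dir',        # where eval summaries are written
    checkpoint_dir='/tmp/train_dir',  # where the trainer writes checkpoints
    wait_interval=300)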
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object detection model library."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import tensorflow as tf
from object_detection import model_hparams
from object_detection import model_lib_v2
from object_detection.utils import config_util
# Model for test. Current options are:
# 'ssd_mobilenet_v2_pets_keras'
MODEL_NAME_FOR_TEST = 'ssd_mobilenet_v2_pets_keras'
def _get_data_path():
"""Returns an absolute path to TFRecord file."""
return os.path.join(tf.resource_loader.get_data_files_path(), 'test_data',
'pets_examples.record')
def get_pipeline_config_path(model_name):
"""Returns path to the local pipeline config file."""
return os.path.join(tf.resource_loader.get_data_files_path(), 'samples',
'configs', model_name + '.config')
def _get_labelmap_path():
"""Returns an absolute path to label map file."""
return os.path.join(tf.resource_loader.get_data_files_path(), 'data',
'pet_label_map.pbtxt')
def _get_config_kwarg_overrides():
"""Returns overrides to the configs that insert the correct local paths."""
data_path = _get_data_path()
label_map_path = _get_labelmap_path()
return {
'train_input_path': data_path,
'eval_input_path': data_path,
'label_map_path': label_map_path
}
def _get_configs_for_model(model_name):
"""Returns configurations for model."""
filename = get_pipeline_config_path(model_name)
configs = config_util.get_configs_from_pipeline_file(filename)
configs = config_util.merge_external_params_with_configs(
configs, kwargs_dict=_get_config_kwarg_overrides())
return configs
class ModelLibTest(tf.test.TestCase):
@classmethod
def setUpClass(cls):
tf.keras.backend.clear_session()
def test_train_loop_then_eval_loop(self):
"""Tests that Estimator and input function are constructed correctly."""
hparams = model_hparams.create_hparams(
hparams_overrides='load_pretrained=false')
pipeline_config_path = get_pipeline_config_path(MODEL_NAME_FOR_TEST)
config_kwarg_overrides = _get_config_kwarg_overrides()
model_dir = tf.test.get_temp_dir()
train_steps = 2
model_lib_v2.train_loop(
hparams,
pipeline_config_path,
model_dir=model_dir,
train_steps=train_steps,
checkpoint_every_n=1,
**config_kwarg_overrides)
model_lib_v2.eval_continuously(
hparams,
pipeline_config_path,
model_dir=model_dir,
checkpoint_dir=model_dir,
train_steps=train_steps,
wait_interval=10,
**config_kwarg_overrides)
......@@ -25,6 +25,7 @@ Huang et al. (https://arxiv.org/abs/1611.10012)
import tensorflow as tf
from object_detection.meta_architectures import faster_rcnn_meta_arch
from object_detection.utils import variables_helper
from nets import inception_resnet_v2
slim = tf.contrib.slim
......@@ -195,7 +196,7 @@ class FasterRCNNInceptionResnetV2FeatureExtractor(
"""
variables_to_restore = {}
for variable in tf.global_variables():
for variable in variables_helper.get_global_variables_safely():
if variable.op.name.startswith(
first_stage_feature_extractor_scope):
var_name = variable.op.name.replace(
......
......@@ -30,6 +30,7 @@ import tensorflow as tf
from object_detection.meta_architectures import faster_rcnn_meta_arch
from object_detection.models.keras_models import inception_resnet_v2
from object_detection.utils import model_util
from object_detection.utils import variables_helper
class FasterRCNNInceptionResnetV2KerasFeatureExtractor(
......@@ -1070,7 +1071,7 @@ class FasterRCNNInceptionResnetV2KerasFeatureExtractor(
}
variables_to_restore = {}
for variable in tf.global_variables():
for variable in variables_helper.get_global_variables_safely():
var_name = keras_to_slim_name_mapping.get(variable.op.name)
if var_name:
variables_to_restore[var_name] = variable
......
......@@ -23,6 +23,7 @@ https://arxiv.org/abs/1707.07012
import tensorflow as tf
from object_detection.meta_architectures import faster_rcnn_meta_arch
from object_detection.utils import variables_helper
from nets.nasnet import nasnet
from nets.nasnet import nasnet_utils
......@@ -307,7 +308,7 @@ class FasterRCNNNASFeatureExtractor(
# Note that the NAS checkpoint only contains the moving average version of
# the Variables so we need to generate an appropriate dictionary mapping.
variables_to_restore = {}
for variable in tf.global_variables():
for variable in variables_helper.get_global_variables_safely():
if variable.op.name.startswith(
first_stage_feature_extractor_scope):
var_name = variable.op.name.replace(
......
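Editor's note: the three hunks above replace direct `tf.global_variables()` calls with `variables_helper.get_global_variables_safely()`. A plausible sketch of such a guard is shown below; this is an assumption about its behavior, not the actual implementation, which lives in object_detection/utils/variables_helper.py.

import tensorflow as tf

def get_global_variables_safely():
  # Assumed behavior: the global variables collection is not populated under
  # eager execution, so fail loudly instead of silently returning an empty
  # restore map.
  if tf.executing_eagerly():
    raise ValueError('Global variable collections are not available when '
                     'executing eagerly.')
  return tf.global_variables()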