Unverified commit 32e7d660, authored by pkulzc and committed by GitHub

Open Images Challenge 2018 tools, minor fixes and refactors. (#4661)

* Merged commit includes the following changes:
202804536  by Zhichao Lu:

    Return tf.data.Dataset from input_fn that goes into the estimator and use PER_HOST_V2 option for tpu input pipeline config.

    This change shaves off 100ms per step resulting in 25 minutes of total reduced training time for ssd mobilenet v1 (15k steps to convergence).
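    A minimal sketch of the pattern (a standalone, hypothetical example; `parse_fn` and the file path are placeholders, and the API names follow the TF 1.x `tf.contrib.tpu` Estimator):

    ```python
    import tensorflow as tf

    def parse_fn(serialized):  # placeholder parser
      return tf.parse_single_example(
          serialized, {'image': tf.FixedLenFeature([], tf.string)})

    def input_fn(params):
      # TPUEstimator passes the per-host batch size through params.
      dataset = tf.data.TFRecordDataset('/tmp/train.tfrecord')
      dataset = dataset.map(parse_fn).repeat()
      dataset = dataset.batch(params['batch_size'], drop_remainder=True)
      return dataset  # return the Dataset itself; no iterator tensors

    run_config = tf.contrib.tpu.RunConfig(
        tpu_config=tf.contrib.tpu.TPUConfig(
            iterations_per_loop=100,
            per_host_input_for_training=(
                tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2)))
    ```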

--
202769340  by Zhichao Lu:

    Adding as_matrix() transformation for image-level labels.

--
202768721  by Zhichao Lu:

    Challenge evaluation protocol modification: adding labelmaps creation.

--
202750966  by Zhichao Lu:

    Add the explicit names to two output nodes.

--
202732783  by Zhichao Lu:

    Enforcing that batch size is 1 for evaluation, and no original images are retained during evaluation when use_tpu=False (to avoid dynamic shapes).

--
202425430  by Zhichao Lu:

    Refactor input pipeline to improve performance.

--
202406389  by Zhichao Lu:

    Only check the validity of `warmup_learning_rate` if it will be used.
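    A hedged sketch of the guarded check (hypothetical names; the real validation lives in the learning-schedule builder):

    ```python
    def check_warmup(learning_rate_base, warmup_learning_rate, warmup_steps):
      # Only validate the warmup rate when a warmup phase will actually run.
      if warmup_steps and learning_rate_base < warmup_learning_rate:
        raise ValueError('learning_rate_base must be larger or equal to '
                         'warmup_learning_rate.')
    ```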

--
202330450  by Zhichao Lu:

    Adding a description of the input_image_label_annotations_csv flag, which
    adds image-level labels to tf.Example.

--
202029012  by Zhichao Lu:

    Enabling display of the relationship name in the final metrics output.

--
202024010  by Zhichao Lu:

    Update to the public README.

--
201999677  by Zhichao Lu:

    Fixing the way negative labels are handled in VRD evaluation.

--
201962313  by Zhichao Lu:

    Fix a bug in resize_to_range.

--
201808488  by Zhichao Lu:

    Update ssd_inception_v2_pets.config to use the right filenames of the pets dataset tf records.

--
201779225  by Zhichao Lu:

    Update object detection API installation doc

--
201766518  by Zhichao Lu:

    Add shell script to create pycocotools package for CMLE.

--
201722377  by Zhichao Lu:

    Removes verified_labels field and uses groundtruth_image_classes field instead.

--
201616819  by Zhichao Lu:

    Disable eval_on_tpu since eval_metrics is not set up to execute on TPU.
    Do not use run_config.task_type to switch TPU mode for EVAL,
    since that won't work in unit tests.
    Expand the unit test to verify that the same instantiation of the Estimator can independently disable eval on TPU while training remains enabled on TPU.
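    For reference, `TPUEstimator` exposes this switch directly; a sketch with placeholder sizes (`model_fn` and `run_config` are assumed to be defined elsewhere):

    ```python
    estimator = tf.contrib.tpu.TPUEstimator(
        model_fn=model_fn,
        config=run_config,
        use_tpu=True,          # training runs on TPU
        eval_on_tpu=False,     # eval_metrics is not set up for TPU execution
        train_batch_size=64,
        eval_batch_size=8)
    ```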

--
201524716  by Zhichao Lu:

    Disable model export to TPU; inference is not compatible with TPU.
    Add GOOGLE_INTERNAL support in the object detection copy.bara.sky.

--
201453347  by Zhichao Lu:

    Fixing bug when evaluating the quantized model.

--
200795826  by Zhichao Lu:

    Fixing a parsing bug: image-level labels were parsed as tuples instead of a
    numpy array.

--
200746134  by Zhichao Lu:

    Adding image_class_text and image_class_label fields into tf_example_decoder.py

--
200743003  by Zhichao Lu:

    Changes to model_main.py and model_tpu_main.py to enable training and continuous eval.

--
200736324  by Zhichao Lu:

    Replace deprecated squeeze_dims argument.
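    The rename in question is in core TensorFlow: `tf.squeeze`'s deprecated `squeeze_dims` keyword became `axis`. For example:

    ```python
    import tensorflow as tf

    x = tf.zeros([4, 1, 3, 1])
    y = tf.squeeze(x, axis=[1, 3])  # shape [4, 3]; formerly squeeze_dims=[1, 3]
    ```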

--
200730072  by Zhichao Lu:

    Make detections only in predict and eval modes when creating the model function.

--
200729699  by Zhichao Lu:

    Minor correction to internal documentation (definition of Huber loss)

--
200727142  by Zhichao Lu:

    Add command-line parsing as a set of flags using argparse, and add a header
    to the resulting file.

--
200726169  by Zhichao Lu:

    A tutorial on running evaluation for the Open Images Challenge 2018.

--
200665093  by Zhichao Lu:

    Cleanup on variables_helper_test.py.

--
200652145  by Zhichao Lu:

    Add an option to write (non-frozen) graph when exporting inference graph.

--
200573810  by Zhichao Lu:

    Update ssd_mobilenet_v1_coco and ssd_inception_v2_coco download links to point to a newer version.

--
200498014  by Zhichao Lu:

    Add test for groundtruth mask resizing.

--
200453245  by Zhichao Lu:

    Cleaning up exporting_models.md along with exporting scripts

--
200311747  by Zhichao Lu:

    Resize groundtruth mask to match the size of the original image.

--
200287269  by Zhichao Lu:

    Adding an option to use a custom MatMul-based crop_and_resize op as an alternative to the TF op in Faster R-CNN.

--
200127859  by Zhichao Lu:

    Updating the instructions to run locally with new binary. Also updating pets configs since file path naming has changed.

--
200127044  by Zhichao Lu:

    A simpler evaluation util to compute the Open Images Challenge
    2018 metric (object detection track).

--
200124019  by Zhichao Lu:

    Freshening up configuring_jobs.md

--
200086825  by Zhichao Lu:

    Make merge_multiple_label_boxes work for ssd model.

--
199843258  by Zhichao Lu:

    Allows inconsistent feature channels to be compatible with WeightSharedConvolutionalBoxPredictor.

--
199676082  by Zhichao Lu:

    Enable an override for `InputReader.shuffle` for object detection pipelines.

--
199599212  by Zhichao Lu:

    Markdown fixes.

--
199535432  by Zhichao Lu:

    Pass num_additional_channels to the tf.Example decoder in predict_input_fn.

--
199399439  by Zhichao Lu:

    Adding `num_additional_channels` field to specify how many additional channels to use in the model.
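    As a hedged illustration, the field lives on `InputReader`, so a pipeline config could set it roughly like this (values illustrative):

    ```
    eval_input_reader: {
      num_additional_channels: 2
      tf_record_input_reader {
        input_path: "/path/to/eval.record"
      }
    }
    ```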

--

PiperOrigin-RevId: 202804536

* Add original model builder and docs back.
parent 86ac7a47
@@ -13,7 +13,7 @@ file is split into 5 parts:
    model parameters (ie. SGD parameters, input preprocessing and feature extractor
    initialization values).
 3. The `eval_config`, which determines what set of metrics will be reported for
-   evaluation (currently we only support the PASCAL VOC metrics).
+   evaluation.
 4. The `train_input_config`, which defines what dataset the model should be
    trained on.
 5. The `eval_input_config`, which defines what dataset the model will be
@@ -118,6 +118,7 @@ optimizer {
 }
 fine_tune_checkpoint: "/usr/home/username/tmp/model.ckpt-#####"
 from_detection_checkpoint: true
+load_all_detection_checkpoint_vars: true
 gradient_clipping_by_norm: 10.0
 data_augmentation_options {
   random_horizontal_flip {
@@ -130,8 +131,8 @@ data_augmentation_options {
 While optional, it is highly recommended that users utilize other object
 detection checkpoints. Training an object detector from scratch can take days.
 To speed up the training process, it is recommended that users re-use the
-feature extractor parameters from a pre-existing object classification or
-detection checkpoint. `train_config` provides two fields to specify
+feature extractor parameters from a pre-existing image classification or
+object detection checkpoint. `train_config` provides two fields to specify
 pre-existing checkpoints: `fine_tune_checkpoint` and
 `from_detection_checkpoint`. `fine_tune_checkpoint` should provide a path to
 the pre-existing checkpoint
@@ -157,6 +158,8 @@ number of workers, gpu type).
 ## Configuring the Evaluator
-Currently evaluation is fixed to generating metrics as defined by the PASCAL VOC
-challenge. The parameters for `eval_config` are set to reasonable defaults and
-typically do not need to be configured.
+The main components to set in `eval_config` are `num_examples` and
+`metrics_set`. The parameter `num_examples` indicates the number of batches (
+currently of batch size 1) used for an evaluation cycle, and often is the total
+size of the evaluation dataset. The parameter `metrics_set` indicates which
+metrics to run during evaluation (i.e. `"coco_detection_metrics"`).
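For reference, a minimal `eval_config` along these lines might look like the following (the values are illustrative, not defaults):

```
eval_config: {
  # One evaluation cycle over 8000 batch-size-1 examples.
  num_examples: 8000
  metrics_set: "coco_detection_metrics"
}
```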
@@ -69,10 +69,10 @@ Some remarks on frozen inference graphs:
 | Model name | Speed (ms) | COCO mAP[^1] | Outputs |
 | ------------ | :--------------: | :--------------: | :-------------: |
-| [ssd_mobilenet_v1_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2017_11_17.tar.gz) | 30 | 21 | Boxes |
+| [ssd_mobilenet_v1_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz) | 30 | 21 | Boxes |
 | [ssd_mobilenet_v2_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz) | 31 | 22 | Boxes |
 | [ssdlite_mobilenet_v2_coco](http://download.tensorflow.org/models/object_detection/ssdlite_mobilenet_v2_coco_2018_05_09.tar.gz) | 27 | 22 | Boxes |
-| [ssd_inception_v2_coco](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2017_11_17.tar.gz) | 42 | 24 | Boxes |
+| [ssd_inception_v2_coco](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2018_01_28.tar.gz) | 42 | 24 | Boxes |
 | [faster_rcnn_inception_v2_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz) | 58 | 28 | Boxes |
 | [faster_rcnn_resnet50_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_coco_2018_01_28.tar.gz) | 89 | 30 | Boxes |
 | [faster_rcnn_resnet50_lowproposals_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_lowproposals_coco_2018_01_28.tar.gz) | 64 | | Boxes |
@@ -12,16 +12,25 @@ command from tensorflow/models/research:
 ``` bash
 # From tensorflow/models/research/
+INPUT_TYPE=image_tensor
+PIPELINE_CONFIG_PATH={path to pipeline config file}
+TRAINED_CKPT_PREFIX={path to model.ckpt}
+EXPORT_DIR={path to folder that will be used for export}
 python object_detection/export_inference_graph.py \
-    --input_type image_tensor \
-    --pipeline_config_path ${PIPELINE_CONFIG_PATH} \
-    --trained_checkpoint_prefix ${TRAIN_PATH} \
-    --output_directory ${EXPORT_DIR}
+    --input_type=${INPUT_TYPE} \
+    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
+    --trained_checkpoint_prefix=${TRAINED_CKPT_PREFIX} \
+    --output_directory=${EXPORT_DIR}
 ```
-Afterwards, you should see the directory ${EXPORT_DIR} containing the following:
+NOTE: We are configuring our exported model to ingest 4-D image tensors. We can
+also configure the exported model to take encoded images or serialized
+`tf.Example`s.
+
+After export, you should see the directory ${EXPORT_DIR} containing the following:
-* output_inference_graph.pb, the frozen graph format of the exported model
 * saved_model/, a directory containing the saved model format of the exported model
+* frozen_inference_graph.pb, the frozen graph format of the exported model
 * model.ckpt.*, the model checkpoints used for exporting
 * checkpoint, a file specifying to restore included checkpoint files
+* pipeline.config, pipeline config file for the exported model
@@ -4,7 +4,7 @@
 Tensorflow Object Detection API depends on the following libraries:
-* Protobuf 3+
+* Protobuf 3.0.0
 * Python-tk
 * Pillow 1.0
 * lxml
@@ -13,6 +13,7 @@ Tensorflow Object Detection API depends on the following libraries:
 * Matplotlib
 * Tensorflow
 * Cython
+* contextlib2
 * cocoapi
 For detailed steps to install Tensorflow, follow the [Tensorflow installation
@@ -30,21 +31,28 @@ The remaining libraries can be installed on Ubuntu 16.04 using via apt-get:
 ``` bash
 sudo apt-get install protobuf-compiler python-pil python-lxml python-tk
-sudo pip install Cython
-sudo pip install jupyter
-sudo pip install matplotlib
+pip install --user Cython
+pip install --user contextlib2
+pip install --user jupyter
+pip install --user matplotlib
 ```
 Alternatively, users can install dependencies using pip:
 ``` bash
-sudo pip install Cython
-sudo pip install pillow
-sudo pip install lxml
-sudo pip install jupyter
-sudo pip install matplotlib
+pip install --user Cython
+pip install --user contextlib2
+pip install --user pillow
+pip install --user lxml
+pip install --user jupyter
+pip install --user matplotlib
 ```
+Note that sometimes "sudo apt-get install protobuf-compiler" will install
+Protobuf 3+ versions for you and some users have issues when using 3.5.
+If that is your case, you're suggested to download and install Protobuf 3.0.0
+(available [here](https://github.com/google/protobuf/releases/tag/v3.0.0)).
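A minimal sketch of that manual install on a Linux x86_64 host (the release asset name is taken from the v3.0.0 release page; the install location is just one choice):

``` bash
# From tensorflow/models/research/
wget https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protoc-3.0.0-linux-x86_64.zip -d protoc-3.0.0
# Compile the object_detection protos with the pinned binary.
./protoc-3.0.0/bin/protoc object_detection/protos/*.proto --python_out=.
```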
 ## COCO API installation

 Download the
@@ -13,8 +13,8 @@ inferred detections.
 Inferred detections will look like the following:
-![](img/oid_bus_72e19c28aac34ed8.jpg){height="300"}
-![](img/oid_monkey_3b4168c89cecbc5b.jpg){height="300"}
+![](img/oid_bus_72e19c28aac34ed8.jpg)
+![](img/oid_monkey_3b4168c89cecbc5b.jpg)
 On the validation set of Open Images, this tutorial requires 27GB of free disk
 space and the inference step takes approximately 9 hours on a single NVIDIA
@@ -100,6 +100,8 @@ python -m object_detection/dataset_tools/create_oid_tf_record \
   --num_shards=100
 ```
+To add image-level labels, use the `--input_image_label_annotations_csv` flag.
 This results in 100 TFRecord files (shards), written to
 `oid/${SPLIT}_tfrecords`, with filenames matching
 `${SPLIT}.tfrecord-000[0-9][0-9]-of-00100`. Each shard contains approximately
@@ -146,7 +148,7 @@ access to the images, `infer_detections` can optionally discard them with the
 `--discard_image_pixels` flag. Discarding the images drastically reduces the
 size of the output TFRecord.
-### Accelerating inference {#accelerating_inference}
+### Accelerating inference
 Running inference on the whole validation or test set can take a long time to
 complete due to the large number of images present in these sets (41,620 and
@@ -196,7 +198,7 @@ After all `infer_detections` processes finish, `tensorflow/models/research/oid`
 will contain one output TFRecord from each process, with name matching
 `validation_detections.tfrecord-0000[0-3]-of-00004`.
-## Computing evaluation measures {#compute_evaluation_measures}
+## Computing evaluation measures
 To compute evaluation measures on the inferred detections you first need to
 create the appropriate configuration files:
@@ -237,7 +239,7 @@ file contains an `object_detection.protos.EvalConfig` message that describes the
 evaluation metric. For more information about these protos see the corresponding
 source files.
-### Expected mAPs {#expected-maps}
+### Expected mAPs
 The result of running `offline_eval_map_corloc` is a CSV file located at
 `${SPLIT}_eval_metrics/metrics.csv`. With the above configuration, the file will
@@ -33,8 +33,8 @@ from object_detection.protos import input_reader_pb2
 from object_detection.protos import model_pb2
 from object_detection.protos import train_pb2
 from object_detection.utils import config_util
-from object_detection.utils import dataset_util
 from object_detection.utils import ops as util_ops
+from object_detection.utils import shape_utils

 HASH_KEY = 'hash'
 HASH_BINS = 1 << 31
@@ -91,6 +91,9 @@ def transform_input_data(tensor_dict,
     A dictionary keyed by fields.InputDataFields containing the tensors obtained
     after applying all the transformations.
   """
+  if fields.InputDataFields.groundtruth_boxes in tensor_dict:
+    tensor_dict = util_ops.filter_groundtruth_with_nan_box_coordinates(
+        tensor_dict)
   if fields.InputDataFields.image_additional_channels in tensor_dict:
     channels = tensor_dict[fields.InputDataFields.image_additional_channels]
     tensor_dict[fields.InputDataFields.image] = tf.concat(
@@ -135,6 +138,103 @@ def transform_input_data(tensor_dict,
   return tensor_dict
+def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
+                                    spatial_image_shape=None):
+  """Pads input tensors to static shapes.
+
+  Args:
+    tensor_dict: Tensor dictionary of input data
+    max_num_boxes: Max number of groundtruth boxes needed to compute shapes for
+      padding.
+    num_classes: Number of classes in the dataset needed to compute shapes for
+      padding.
+    spatial_image_shape: A list of two integers of the form [height, width]
+      containing expected spatial shape of the image.
+
+  Returns:
+    A dictionary keyed by fields.InputDataFields containing padding shapes for
+    tensors in the dataset.
+
+  Raises:
+    ValueError: If groundtruth classes is neither rank 1 nor rank 2.
+  """
+  if not spatial_image_shape or spatial_image_shape == [-1, -1]:
+    height, width = None, None
+  else:
+    height, width = spatial_image_shape  # pylint: disable=unpacking-non-sequence
+
+  num_additional_channels = 0
+  if fields.InputDataFields.image_additional_channels in tensor_dict:
+    num_additional_channels = tensor_dict[
+        fields.InputDataFields.image_additional_channels].shape[2].value
+  padding_shapes = {
+      # Additional channels are merged before batching.
+      fields.InputDataFields.image: [
+          height, width, 3 + num_additional_channels
+      ],
+      fields.InputDataFields.image_additional_channels: [
+          height, width, num_additional_channels
+      ],
+      fields.InputDataFields.source_id: [],
+      fields.InputDataFields.filename: [],
+      fields.InputDataFields.key: [],
+      fields.InputDataFields.groundtruth_difficult: [max_num_boxes],
+      fields.InputDataFields.groundtruth_boxes: [max_num_boxes, 4],
+      fields.InputDataFields.groundtruth_classes: [max_num_boxes, num_classes],
+      fields.InputDataFields.groundtruth_instance_masks: [
+          max_num_boxes, height, width
+      ],
+      fields.InputDataFields.groundtruth_is_crowd: [max_num_boxes],
+      fields.InputDataFields.groundtruth_group_of: [max_num_boxes],
+      fields.InputDataFields.groundtruth_area: [max_num_boxes],
+      fields.InputDataFields.groundtruth_weights: [max_num_boxes],
+      fields.InputDataFields.num_groundtruth_boxes: [],
+      fields.InputDataFields.groundtruth_label_types: [max_num_boxes],
+      fields.InputDataFields.groundtruth_label_scores: [max_num_boxes],
+      fields.InputDataFields.true_image_shape: [3],
+      fields.InputDataFields.multiclass_scores: [
+          max_num_boxes, num_classes + 1 if num_classes is not None else None
+      ],
+      fields.InputDataFields.groundtruth_image_classes: [num_classes],
+  }
+
+  if fields.InputDataFields.original_image in tensor_dict:
+    padding_shapes[fields.InputDataFields.original_image] = [
+        None, None, 3 + num_additional_channels
+    ]
+  if fields.InputDataFields.groundtruth_keypoints in tensor_dict:
+    tensor_shape = (
+        tensor_dict[fields.InputDataFields.groundtruth_keypoints].shape)
+    padding_shape = [max_num_boxes, tensor_shape[1].value,
+                     tensor_shape[2].value]
+    padding_shapes[fields.InputDataFields.groundtruth_keypoints] = padding_shape
+  if fields.InputDataFields.groundtruth_keypoint_visibilities in tensor_dict:
+    tensor_shape = tensor_dict[fields.InputDataFields.
+                               groundtruth_keypoint_visibilities].shape
+    padding_shape = [max_num_boxes, tensor_shape[1].value]
+    padding_shapes[fields.InputDataFields.
+                   groundtruth_keypoint_visibilities] = padding_shape
+
+  padded_tensor_dict = {}
+  for tensor_name in tensor_dict:
+    expected_shape = padding_shapes[tensor_name]
+    current_shape = shape_utils.combined_static_and_dynamic_shape(
+        tensor_dict[tensor_name])
+    trailing_paddings = [
+        expected_shape_dim - current_shape_dim if expected_shape_dim else 0
+        for expected_shape_dim, current_shape_dim in zip(
+            expected_shape, current_shape)
+    ]
+    paddings = tf.stack([tf.zeros(len(trailing_paddings), dtype=tf.int32),
+                         trailing_paddings],
+                        axis=1)
+    padded_tensor_dict[tensor_name] = tf.pad(
+        tensor_dict[tensor_name], paddings=paddings)
+    padded_tensor_dict[tensor_name].set_shape(expected_shape)
+  return padded_tensor_dict
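An aside (not part of the diff): the trailing-padding scheme above appends zeros only at the end of each dimension. A tiny standalone illustration:

```python
import tensorflow as tf

# Pad a [2, 4] boxes tensor up to max_num_boxes=5: append 3 zero rows, keep
# all 4 columns. This mirrors the tf.pad call in pad_input_data_to_static_shapes.
boxes = tf.constant([[0.1, 0.1, 0.5, 0.5],
                     [0.2, 0.2, 0.6, 0.6]])
padded = tf.pad(boxes, paddings=[[0, 5 - 2], [0, 0]])  # static shape [5, 4]
```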
 def augment_input_data(tensor_dict, data_augmentation_options):
   """Applies data augmentation ops to input tensors.
@@ -231,6 +331,8 @@ def create_train_input_fn(train_config, train_input_config,
     params: Parameter dictionary passed from the estimator.

   Returns:
+    A tf.data.Dataset that holds (features, labels) tuple.
     features: Dictionary of feature tensors.
       features[fields.InputDataFields.image] is a [batch_size, H, W, C]
         float32 tensor with preprocessed images.
@@ -275,33 +377,39 @@ def create_train_input_fn(train_config, train_input_config,
       raise TypeError('The `model_config` must be a '
                       'model_pb2.DetectionModel.')

-    data_augmentation_options = [
-        preprocessor_builder.build(step)
-        for step in train_config.data_augmentation_options
-    ]
-    data_augmentation_fn = functools.partial(
-        augment_input_data, data_augmentation_options=data_augmentation_options)
-
-    model = model_builder.build(model_config, is_training=True)
-    image_resizer_config = config_util.get_image_resizer_config(model_config)
-    image_resizer_fn = image_resizer_builder.build(image_resizer_config)
-
-    transform_data_fn = functools.partial(
-        transform_input_data, model_preprocess_fn=model.preprocess,
-        image_resizer_fn=image_resizer_fn,
-        num_classes=config_util.get_number_of_classes(model_config),
-        data_augmentation_fn=data_augmentation_fn,
-        retain_original_image=train_config.retain_original_images)
+    def transform_and_pad_input_data_fn(tensor_dict):
+      """Combines transform and pad operation."""
+      data_augmentation_options = [
+          preprocessor_builder.build(step)
+          for step in train_config.data_augmentation_options
+      ]
+      data_augmentation_fn = functools.partial(
+          augment_input_data,
+          data_augmentation_options=data_augmentation_options)
+
+      model = model_builder.build(model_config, is_training=True)
+      image_resizer_config = config_util.get_image_resizer_config(model_config)
+      image_resizer_fn = image_resizer_builder.build(image_resizer_config)
+
+      transform_data_fn = functools.partial(
+          transform_input_data, model_preprocess_fn=model.preprocess,
+          image_resizer_fn=image_resizer_fn,
+          num_classes=config_util.get_number_of_classes(model_config),
+          data_augmentation_fn=data_augmentation_fn,
+          merge_multiple_boxes=train_config.merge_multiple_label_boxes,
+          retain_original_image=train_config.retain_original_images)
+      tensor_dict = pad_input_data_to_static_shapes(
+          tensor_dict=transform_data_fn(tensor_dict),
+          max_num_boxes=train_input_config.max_number_of_boxes,
+          num_classes=config_util.get_number_of_classes(model_config),
+          spatial_image_shape=config_util.get_spatial_image_size(
+              image_resizer_config))
+      return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))

     dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
         train_input_config,
-        transform_input_data_fn=transform_data_fn,
-        batch_size=params['batch_size'] if params else train_config.batch_size,
-        max_num_boxes=train_config.max_number_of_boxes,
-        num_classes=config_util.get_number_of_classes(model_config),
-        spatial_image_shape=config_util.get_spatial_image_size(
-            image_resizer_config))
-    input_dict = dataset_util.make_initializable_iterator(dataset).get_next()
-    return (_get_features_dict(input_dict), _get_labels_dict(input_dict))
+        transform_input_data_fn=transform_and_pad_input_data_fn,
+        batch_size=params['batch_size'] if params else train_config.batch_size)
+    return dataset

   return _train_input_fn
@@ -309,6 +417,8 @@ def create_train_input_fn(train_config, train_input_config,
 def create_eval_input_fn(eval_config, eval_input_config, model_config):
   """Creates an eval `input` function for `Estimator`.

+  # TODO(ronnyvotel,rathodv): Allow batch sizes of more than 1 for eval.
   Args:
     eval_config: An eval_pb2.EvalConfig.
     eval_input_config: An input_reader_pb2.InputReader.
@@ -325,6 +435,8 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
     params: Parameter dictionary passed from the estimator.

   Returns:
+    A tf.data.Dataset that holds (features, labels) tuple.
     features: Dictionary of feature tensors.
       features[fields.InputDataFields.image] is a [1, H, W, C] float32 tensor
         with preprocessed images.
@@ -366,36 +478,41 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
       raise TypeError('The `model_config` must be a '
                       'model_pb2.DetectionModel.')

-    num_classes = config_util.get_number_of_classes(model_config)
-    model = model_builder.build(model_config, is_training=False)
-    image_resizer_config = config_util.get_image_resizer_config(model_config)
-    image_resizer_fn = image_resizer_builder.build(image_resizer_config)
-
-    transform_data_fn = functools.partial(
-        transform_input_data, model_preprocess_fn=model.preprocess,
-        image_resizer_fn=image_resizer_fn,
-        num_classes=num_classes,
-        data_augmentation_fn=None,
-        retain_original_image=eval_config.retain_original_images)
+    def transform_and_pad_input_data_fn(tensor_dict):
+      """Combines transform and pad operation."""
+      num_classes = config_util.get_number_of_classes(model_config)
+      model = model_builder.build(model_config, is_training=False)
+      image_resizer_config = config_util.get_image_resizer_config(model_config)
+      image_resizer_fn = image_resizer_builder.build(image_resizer_config)
+
+      transform_data_fn = functools.partial(
+          transform_input_data, model_preprocess_fn=model.preprocess,
+          image_resizer_fn=image_resizer_fn,
+          num_classes=num_classes,
+          data_augmentation_fn=None,
+          retain_original_image=eval_config.retain_original_images)
+      tensor_dict = pad_input_data_to_static_shapes(
+          tensor_dict=transform_data_fn(tensor_dict),
+          max_num_boxes=eval_input_config.max_number_of_boxes,
+          num_classes=config_util.get_number_of_classes(model_config),
+          spatial_image_shape=config_util.get_spatial_image_size(
+              image_resizer_config))
+      return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))

     dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
         eval_input_config,
-        transform_input_data_fn=transform_data_fn,
-        batch_size=params.get('batch_size', 1),
-        num_classes=config_util.get_number_of_classes(model_config),
-        spatial_image_shape=config_util.get_spatial_image_size(
-            image_resizer_config))
-    input_dict = dataset_util.make_initializable_iterator(dataset).get_next()
-    return (_get_features_dict(input_dict), _get_labels_dict(input_dict))
+        batch_size=1,  # Currently only support batch size of 1 for eval.
+        transform_input_data_fn=transform_and_pad_input_data_fn)
+    return dataset

   return _eval_input_fn
-def create_predict_input_fn(model_config):
+def create_predict_input_fn(model_config, predict_input_config):
   """Creates a predict `input` function for `Estimator`.

   Args:
     model_config: A model_pb2.DetectionModel.
+    predict_input_config: An input_reader_pb2.InputReader.

   Returns:
     `input_fn` for `Estimator` in PREDICT mode.
@@ -424,7 +541,9 @@ def create_predict_input_fn(model_config):
         num_classes=num_classes,
         data_augmentation_fn=None)

-    decoder = tf_example_decoder.TfExampleDecoder(load_instance_masks=False)
+    decoder = tf_example_decoder.TfExampleDecoder(
+        load_instance_masks=False,
+        num_additional_channels=predict_input_config.num_additional_channels)
     input_dict = transform_fn(decoder.decode(example))
     images = tf.to_float(input_dict[fields.InputDataFields.image])
     images = tf.expand_dims(images, axis=0)
@@ -48,17 +48,30 @@ def _get_configs_for_model(model_name):
       label_map_path=label_map_path)

+def _make_initializable_iterator(dataset):
+  """Creates an iterator, and initializes tables.
+
+  Args:
+    dataset: A `tf.data.Dataset` object.
+
+  Returns:
+    A `tf.data.Iterator`.
+  """
+  iterator = dataset.make_initializable_iterator()
+  tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, iterator.initializer)
+  return iterator
+
 class InputsTest(tf.test.TestCase):

   def test_faster_rcnn_resnet50_train_input(self):
     """Tests the training input function for FasterRcnnResnet50."""
     configs = _get_configs_for_model('faster_rcnn_resnet50_pets')
+    configs['train_config'].unpad_groundtruth_tensors = True
     model_config = configs['model']
     model_config.faster_rcnn.num_classes = 37
     train_input_fn = inputs.create_train_input_fn(
         configs['train_config'], configs['train_input_config'], model_config)
-    features, labels = train_input_fn()
+    features, labels = _make_initializable_iterator(
+        train_input_fn()).get_next()
     self.assertAllEqual([1, None, None, 3],
                         features[fields.InputDataFields.image].shape.as_list())
@@ -67,17 +80,17 @@ class InputsTest(tf.test.TestCase):
                         features[inputs.HASH_KEY].shape.as_list())
     self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
     self.assertAllEqual(
-        [1, 50, 4],
+        [1, 100, 4],
         labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_boxes].dtype)
     self.assertAllEqual(
-        [1, 50, model_config.faster_rcnn.num_classes],
+        [1, 100, model_config.faster_rcnn.num_classes],
         labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_classes].dtype)
     self.assertAllEqual(
-        [1, 50],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_weights].dtype)
@@ -89,8 +102,7 @@ class InputsTest(tf.test.TestCase):
     model_config.faster_rcnn.num_classes = 37
     eval_input_fn = inputs.create_eval_input_fn(
         configs['eval_config'], configs['eval_input_config'], model_config)
-    features, labels = eval_input_fn()
+    features, labels = _make_initializable_iterator(eval_input_fn()).get_next()
     self.assertAllEqual([1, None, None, 3],
                         features[fields.InputDataFields.image].shape.as_list())
     self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
@@ -102,27 +114,27 @@ class InputsTest(tf.test.TestCase):
     self.assertAllEqual([1], features[inputs.HASH_KEY].shape.as_list())
     self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
     self.assertAllEqual(
-        [1, None, 4],
+        [1, 100, 4],
         labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_boxes].dtype)
     self.assertAllEqual(
-        [1, None, model_config.faster_rcnn.num_classes],
+        [1, 100, model_config.faster_rcnn.num_classes],
         labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_classes].dtype)
     self.assertAllEqual(
-        [1, None],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_area].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_area].dtype)
     self.assertAllEqual(
-        [1, None],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_is_crowd].shape.as_list())
     self.assertEqual(
         tf.bool, labels[fields.InputDataFields.groundtruth_is_crowd].dtype)
     self.assertAllEqual(
-        [1, None],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_difficult].shape.as_list())
     self.assertEqual(
         tf.int32, labels[fields.InputDataFields.groundtruth_difficult].dtype)
@@ -135,7 +147,7 @@ class InputsTest(tf.test.TestCase):
     batch_size = configs['train_config'].batch_size
     train_input_fn = inputs.create_train_input_fn(
         configs['train_config'], configs['train_input_config'], model_config)
-    features, labels = train_input_fn()
+    features, labels = _make_initializable_iterator(
+        train_input_fn()).get_next()
     self.assertAllEqual([batch_size, 300, 300, 3],
                         features[fields.InputDataFields.image].shape.as_list())
@@ -149,17 +161,17 @@ class InputsTest(tf.test.TestCase):
     self.assertEqual(tf.int32,
                      labels[fields.InputDataFields.num_groundtruth_boxes].dtype)
     self.assertAllEqual(
-        [batch_size, 50, 4],
+        [batch_size, 100, 4],
         labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_boxes].dtype)
     self.assertAllEqual(
-        [batch_size, 50, model_config.ssd.num_classes],
+        [batch_size, 100, model_config.ssd.num_classes],
         labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_classes].dtype)
     self.assertAllEqual(
-        [batch_size, 50],
+        [batch_size, 100],
         labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_weights].dtype)
@@ -171,8 +183,7 @@ class InputsTest(tf.test.TestCase):
     model_config.ssd.num_classes = 37
     eval_input_fn = inputs.create_eval_input_fn(
         configs['eval_config'], configs['eval_input_config'], model_config)
-    features, labels = eval_input_fn()
+    features, labels = _make_initializable_iterator(eval_input_fn()).get_next()
     self.assertAllEqual([1, 300, 300, 3],
                         features[fields.InputDataFields.image].shape.as_list())
     self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
@@ -184,27 +195,27 @@ class InputsTest(tf.test.TestCase):
     self.assertAllEqual([1], features[inputs.HASH_KEY].shape.as_list())
     self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
     self.assertAllEqual(
-        [1, None, 4],
+        [1, 100, 4],
         labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_boxes].dtype)
     self.assertAllEqual(
-        [1, None, model_config.ssd.num_classes],
+        [1, 100, model_config.ssd.num_classes],
         labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_classes].dtype)
     self.assertAllEqual(
-        [1, None],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_area].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_area].dtype)
     self.assertAllEqual(
-        [1, None],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_is_crowd].shape.as_list())
     self.assertEqual(
         tf.bool, labels[fields.InputDataFields.groundtruth_is_crowd].dtype)
     self.assertAllEqual(
-        [1, None],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_difficult].shape.as_list())
     self.assertEqual(
         tf.int32, labels[fields.InputDataFields.groundtruth_difficult].dtype)
@@ -213,7 +224,8 @@ class InputsTest(tf.test.TestCase):
     """Tests the predict input function."""
     configs = _get_configs_for_model('ssd_inception_v2_pets')
     predict_input_fn = inputs.create_predict_input_fn(
-        model_config=configs['model'])
+        model_config=configs['model'],
+        predict_input_config=configs['eval_input_config'])
     serving_input_receiver = predict_input_fn()

     image = serving_input_receiver.features[fields.InputDataFields.image]
@@ -223,6 +235,23 @@ class InputsTest(tf.test.TestCase):
     self.assertEqual(tf.float32, image.dtype)
     self.assertEqual(tf.string, receiver_tensors.dtype)
+  def test_predict_input_with_additional_channels(self):
+    """Tests the predict input function with additional channels."""
+    configs = _get_configs_for_model('ssd_inception_v2_pets')
+    configs['eval_input_config'].num_additional_channels = 2
+    predict_input_fn = inputs.create_predict_input_fn(
+        model_config=configs['model'],
+        predict_input_config=configs['eval_input_config'])
+    serving_input_receiver = predict_input_fn()
+
+    image = serving_input_receiver.features[fields.InputDataFields.image]
+    receiver_tensors = serving_input_receiver.receiver_tensors[
+        inputs.SERVING_FED_EXAMPLE_KEY]
+    # RGB + 2 additional channels = 5 channels.
+    self.assertEqual([1, 300, 300, 5], image.shape.as_list())
+    self.assertEqual(tf.float32, image.dtype)
+    self.assertEqual(tf.string, receiver_tensors.dtype)
   def test_error_with_bad_train_config(self):
     """Tests that a TypeError is raised with improper train config."""
     configs = _get_configs_for_model('ssd_inception_v2_pets')
@@ -597,5 +626,93 @@ class DataTransformationFnTest(tf.test.TestCase):
         (np_image + 5) * 2)
+class PadInputDataToStaticShapesFnTest(tf.test.TestCase):
+
+  def test_pad_images_boxes_and_classes(self):
+    input_tensor_dict = {
+        fields.InputDataFields.image:
+            tf.placeholder(tf.float32, [None, None, 3]),
+        fields.InputDataFields.groundtruth_boxes:
+            tf.placeholder(tf.float32, [None, 4]),
+        fields.InputDataFields.groundtruth_classes:
+            tf.placeholder(tf.int32, [None, 3]),
+        fields.InputDataFields.true_image_shape:
+            tf.placeholder(tf.int32, [3]),
+    }
+    padded_tensor_dict = inputs.pad_input_data_to_static_shapes(
+        tensor_dict=input_tensor_dict,
+        max_num_boxes=3,
+        num_classes=3,
+        spatial_image_shape=[5, 6])
+
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.image].shape.as_list(),
+        [5, 6, 3])
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.true_image_shape]
+        .shape.as_list(), [3])
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.groundtruth_boxes]
+        .shape.as_list(), [3, 4])
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.groundtruth_classes]
+        .shape.as_list(), [3, 3])
+
+  def test_do_not_pad_dynamic_images(self):
+    input_tensor_dict = {
+        fields.InputDataFields.image:
+            tf.placeholder(tf.float32, [None, None, 3]),
+    }
+    padded_tensor_dict = inputs.pad_input_data_to_static_shapes(
+        tensor_dict=input_tensor_dict,
+        max_num_boxes=3,
+        num_classes=3,
+        spatial_image_shape=[None, None])
+
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.image].shape.as_list(),
+        [None, None, 3])
+
+  def test_images_and_additional_channels(self):
+    input_tensor_dict = {
+        fields.InputDataFields.image:
+            tf.placeholder(tf.float32, [None, None, 3]),
+        fields.InputDataFields.image_additional_channels:
+            tf.placeholder(tf.float32, [None, None, 2]),
+    }
+    padded_tensor_dict = inputs.pad_input_data_to_static_shapes(
+        tensor_dict=input_tensor_dict,
+        max_num_boxes=3,
+        num_classes=3,
+        spatial_image_shape=[5, 6])
+
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.image].shape.as_list(),
+        [5, 6, 5])
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.image_additional_channels]
+        .shape.as_list(), [5, 6, 2])
+
+  def test_keypoints(self):
+    input_tensor_dict = {
+        fields.InputDataFields.groundtruth_keypoints:
+            tf.placeholder(tf.float32, [None, 16, 4]),
+        fields.InputDataFields.groundtruth_keypoint_visibilities:
+            tf.placeholder(tf.bool, [None, 16]),
+    }
+    padded_tensor_dict = inputs.pad_input_data_to_static_shapes(
+        tensor_dict=input_tensor_dict,
+        max_num_boxes=3,
+        num_classes=3,
+        spatial_image_shape=[5, 6])
+
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.groundtruth_keypoints]
+        .shape.as_list(), [3, 16, 4])
+    self.assertAllEqual(
+        padded_tensor_dict[
+            fields.InputDataFields.groundtruth_keypoint_visibilities]
+        .shape.as_list(), [3, 16])
 if __name__ == '__main__':
   tf.test.main()
@@ -253,7 +253,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
                second_stage_mask_prediction_loss_weight=1.0,
                hard_example_miner=None,
                parallel_iterations=16,
-               add_summaries=True):
+               add_summaries=True,
+               use_matmul_crop_and_resize=False):
     """FasterRCNNMetaArch Constructor.

     Args:
@@ -360,6 +361,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
         in parallel for calls to tf.map_fn.
       add_summaries: boolean (default: True) controlling whether summary ops
         should be added to tensorflow graph.
+      use_matmul_crop_and_resize: Force the use of matrix multiplication based
+        crop and resize instead of standard tf.image.crop_and_resize while
+        computing second stage input feature maps.

     Raises:
       ValueError: If `second_stage_batch_size` > `first_stage_max_proposals` at
@@ -446,6 +450,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
     self._second_stage_cls_loss_weight = second_stage_classification_loss_weight
     self._second_stage_mask_loss_weight = (
         second_stage_mask_prediction_loss_weight)
+    self._use_matmul_crop_and_resize = use_matmul_crop_and_resize
     self._hard_example_miner = hard_example_miner
     self._parallel_iterations = parallel_iterations
@@ -1429,11 +1434,26 @@ class FasterRCNNMetaArch(model.DetectionModel):
           tf.range(start=0, limit=proposals_shape[0]), 1)
       return tf.reshape(ones_mat * multiplier, [-1])

-    cropped_regions = tf.image.crop_and_resize(
-        features_to_crop,
-        self._flatten_first_two_dimensions(proposal_boxes_normalized),
-        get_box_inds(proposal_boxes_normalized),
-        (self._initial_crop_size, self._initial_crop_size))
+    if self._use_matmul_crop_and_resize:
+      def _single_image_crop_and_resize(inputs):
+        single_image_features_to_crop, proposal_boxes_normalized = inputs
+        return ops.matmul_crop_and_resize(
+            tf.expand_dims(single_image_features_to_crop, 0),
+            proposal_boxes_normalized,
+            [self._initial_crop_size, self._initial_crop_size])
+
+      cropped_regions = self._flatten_first_two_dimensions(
+          shape_utils.static_or_dynamic_map_fn(
+              _single_image_crop_and_resize,
+              elems=[features_to_crop, proposal_boxes_normalized],
+              dtype=tf.float32,
+              parallel_iterations=self._parallel_iterations))
+    else:
+      cropped_regions = tf.image.crop_and_resize(
+          features_to_crop,
+          self._flatten_first_two_dimensions(proposal_boxes_normalized),
+          get_box_inds(proposal_boxes_normalized),
+          (self._initial_crop_size, self._initial_crop_size))
     return slim.max_pool2d(
         cropped_regions,
         [self._maxpool_kernel_size, self._maxpool_kernel_size],
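An aside on the MatMul variant: bilinear crop-and-resize can be expressed as two matrix multiplications (one interpolation matrix per spatial axis), which avoids the gather-heavy `tf.image.crop_and_resize` kernel that TPUs handle poorly. For comparison, a standalone sketch of the standard op being replaced (shapes are illustrative only):

```python
import tensorflow as tf

features = tf.random_normal([1, 32, 32, 64])   # [batch, height, width, depth]
boxes = tf.constant([[0.0, 0.0, 0.5, 0.5]])    # normalized [ymin, xmin, ymax, xmax]
crops = tf.image.crop_and_resize(
    features, boxes,
    box_ind=tf.zeros([1], dtype=tf.int32),     # batch index for each box
    crop_size=[14, 14])                        # analogous to _initial_crop_size
```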
@@ -152,7 +152,8 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
                    softmax_second_stage_classification_loss=True,
                    predict_masks=False,
                    pad_to_max_dimension=None,
-                   masks_are_class_agnostic=False):
+                   masks_are_class_agnostic=False,
+                   use_matmul_crop_and_resize=False):

     def image_resizer_fn(image, masks=None):
       """Fake image resizer function."""
@@ -287,7 +288,9 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
             second_stage_classification_loss_weight,
         'second_stage_classification_loss':
            second_stage_classification_loss,
-        'hard_example_miner': hard_example_miner}
+        'hard_example_miner': hard_example_miner,
+        'use_matmul_crop_and_resize': use_matmul_crop_and_resize
+    }

     return self._get_model(
         self._get_second_stage_box_predictor(
@@ -465,14 +468,16 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
       for key in expected_shapes:
         self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])

-  def test_predict_gives_correct_shapes_in_train_mode_both_stages(self):
+  def _test_predict_gives_correct_shapes_in_train_mode_both_stages(
+      self, use_matmul_crop_and_resize=False):
     test_graph = tf.Graph()
     with test_graph.as_default():
       model = self._build_model(
           is_training=True,
           number_of_stages=2,
           second_stage_batch_size=7,
-          predict_masks=False)
+          predict_masks=False,
+          use_matmul_crop_and_resize=use_matmul_crop_and_resize)
       batch_size = 2
       image_size = 10
@@ -535,6 +540,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
           tensor_dict_out['rpn_objectness_predictions_with_background'].shape,
           (2, num_anchors_out, 2))

+  def test_predict_gives_correct_shapes_in_train_mode_both_stages(self):
+    self._test_predict_gives_correct_shapes_in_train_mode_both_stages()
+
+  def test_predict_gives_correct_shapes_in_train_mode_matmul_crop_resize(self):
+    self._test_predict_gives_correct_shapes_in_train_mode_both_stages(
+        use_matmul_crop_and_resize=True)
+
   def _test_postprocess_first_stage_only_inference_mode(
       self, pad_to_max_dimension=None):
     model = self._build_model(
...@@ -76,7 +76,8 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch): ...@@ -76,7 +76,8 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
               second_stage_classification_loss,
               hard_example_miner,
               parallel_iterations=16,
               add_summaries=True,
               use_matmul_crop_and_resize=False):
"""RFCNMetaArch Constructor. """RFCNMetaArch Constructor.
Args: Args:
...@@ -159,14 +160,17 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
        in parallel for calls to tf.map_fn.
      add_summaries: boolean (default: True) controlling whether summary ops
        should be added to the tensorflow graph.
      use_matmul_crop_and_resize: whether to force matrix-multiplication-based
        crop and resize instead of standard tf.image.crop_and_resize while
        computing second stage input feature maps.
    Raises:
      ValueError: If `second_stage_batch_size` > `first_stage_max_proposals`
      ValueError: If first_stage_anchor_generator is not of type
        grid_anchor_generator.GridAnchorGenerator.
    """
    # TODO(rathodv): add_summaries and crop_and_resize_fn are currently
    # unused. Respect that directive in the future.
    super(RFCNMetaArch, self).__init__(
        is_training,
        num_classes,
......
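For intuition on the new use_matmul_crop_and_resize option: bilinear crop-and-resize of a 2-D feature map can be expressed as two matrix multiplications, one interpolating rows and one interpolating columns, which avoids the gather-based tf.image.crop_and_resize kernel. The following is a minimal NumPy sketch of that idea only; the actual implementation in the object detection codebase is batched, multi-channel and differentiable, and the helper names below are made up.

import numpy as np

def interpolation_matrix(start, end, in_size, out_size):
  """Returns [out_size, in_size] bilinear interpolation weights."""
  # Sample positions in pixel coordinates, endpoints included (the same
  # sampling semantics as tf.image.crop_and_resize).
  positions = (start + np.linspace(0.0, 1.0, out_size) *
               (end - start)) * (in_size - 1)
  lower = np.clip(np.floor(positions).astype(int), 0, in_size - 1)
  upper = np.clip(lower + 1, 0, in_size - 1)
  frac = positions - lower
  weights = np.zeros((out_size, in_size))
  weights[np.arange(out_size), lower] += 1.0 - frac
  weights[np.arange(out_size), upper] += frac
  return weights

def matmul_crop_and_resize(feature_map, box, crop_size):
  """Crops `box` ([ymin, xmin, ymax, xmax], normalized) via two matmuls."""
  ymin, xmin, ymax, xmax = box
  rows = interpolation_matrix(ymin, ymax, feature_map.shape[0], crop_size[0])
  cols = interpolation_matrix(xmin, xmax, feature_map.shape[1], crop_size[1])
  return rows.dot(feature_map).dot(cols.T)

feature_map = np.arange(64, dtype=np.float32).reshape(8, 8)
print(matmul_crop_and_resize(feature_map, [0.0, 0.0, 0.5, 0.5], (4, 4)))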
...@@ -480,12 +480,16 @@ class SSDMetaArch(model.DetectionModel):
    with tf.name_scope('Postprocessor'):
      preprocessed_images = prediction_dict['preprocessed_inputs']
      box_encodings = prediction_dict['box_encodings']
      box_encodings = tf.identity(box_encodings, 'raw_box_encodings')
      class_predictions = prediction_dict['class_predictions_with_background']
      detection_boxes, detection_keypoints = self._batch_decode(box_encodings)
      detection_boxes = tf.identity(detection_boxes, 'raw_box_locations')
      detection_boxes = tf.expand_dims(detection_boxes, axis=2)
      detection_scores_with_background = self._score_conversion_fn(
          class_predictions)
      detection_scores_with_background = tf.identity(
          detection_scores_with_background, 'raw_box_scores')
      detection_scores = tf.slice(detection_scores_with_background, [0, 0, 1],
                                  [-1, -1, -1])
      additional_fields = None
......
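The tf.identity calls added above exist to pin stable, explicit names ('raw_box_encodings', 'raw_box_locations', 'raw_box_scores') on the raw output tensors so they can be fetched by name from an exported graph. A minimal TF1-style sketch of the pattern, using a toy tensor:

import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
  with tf.name_scope('Postprocessor'):
    scores = tf.constant([[0.9, 0.1]])
    # The named identity makes the tensor addressable as
    # 'Postprocessor/raw_box_scores:0' without knowing internal op names.
    scores = tf.identity(scores, name='raw_box_scores')

with tf.Session(graph=graph) as sess:
  print(sess.run(graph.get_tensor_by_name('Postprocessor/raw_box_scores:0')))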
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Common IO utils used in offline metric computation.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import csv
def write_csv(fid, metrics):
  """Writes metrics key-value pairs to CSV file.

  Args:
    fid: File identifier of an opened file.
    metrics: A dictionary with metrics to be written.
  """
  metrics_writer = csv.writer(fid, delimiter=',')
  for metric_name, metric_value in metrics.items():
    metrics_writer.writerow([metric_name, str(metric_value)])
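A minimal usage sketch for write_csv; the metric name below is made up, real keys come from the evaluator's evaluate() call:

metrics = {'OpenImagesChallenge2018_Precision/mAP': 0.42}
with open('/tmp/metrics.csv', 'w') as fid:
  write_csv(fid, metrics)
# /tmp/metrics.csv now contains one row:
# OpenImagesChallenge2018_Precision/mAP,0.42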
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Runs evaluation using OpenImages groundtruth and predictions.
Example usage:
python models/research/object_detection/metrics/oid_od_challenge_evaluation.py \
    --input_annotations_boxes=/path/to/input/annotations-human-bbox.csv \
    --input_annotations_labels=/path/to/input/annotations-label.csv \
    --input_class_labelmap=/path/to/input/class_labelmap.pbtxt \
    --input_predictions=/path/to/input/predictions.csv \
    --output_metrics=/path/to/output/metric.csv
CSVs with bounding box annotations and image labels (including the image URLs)
can be downloaded from the Open Images Challenge website:
https://storage.googleapis.com/openimages/web/challenge.html
The format of the input CSVs and the metrics themselves are described on the
challenge website.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import pandas as pd
from google.protobuf import text_format
from object_detection.metrics import io_utils
from object_detection.metrics import oid_od_challenge_evaluation_utils as utils
from object_detection.protos import string_int_label_map_pb2
from object_detection.utils import object_detection_evaluation
def _load_labelmap(labelmap_path):
  """Loads labelmap from the labelmap path.

  Args:
    labelmap_path: Path to the labelmap.

  Returns:
    A dictionary mapping class name to class numerical id.
    A list with dictionaries, one dictionary per category.
  """
  label_map = string_int_label_map_pb2.StringIntLabelMap()
  with open(labelmap_path, 'r') as fid:
    label_map_string = fid.read()
    text_format.Merge(label_map_string, label_map)
  labelmap_dict = {}
  categories = []
  for item in label_map.item:
    labelmap_dict[item.name] = item.id
    categories.append({'id': item.id, 'name': item.name})
  return labelmap_dict, categories
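As an illustration, a hypothetical two-class labelmap in StringIntLabelMap text format and the result of loading it (the MIDs are examples only):

labelmap_text = """
item { name: "/m/04bcr3" id: 1 }
item { name: "/m/083vt" id: 2 }
"""
with open('/tmp/class_labelmap.pbtxt', 'w') as fid:
  fid.write(labelmap_text)
print(_load_labelmap('/tmp/class_labelmap.pbtxt'))
# ({'/m/04bcr3': 1, '/m/083vt': 2},
#  [{'id': 1, 'name': '/m/04bcr3'}, {'id': 2, 'name': '/m/083vt'}])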
def main(parsed_args):
  all_box_annotations = pd.read_csv(parsed_args.input_annotations_boxes)
  all_label_annotations = pd.read_csv(parsed_args.input_annotations_labels)
  all_label_annotations.rename(
      columns={'Confidence': 'ConfidenceImageLabel'}, inplace=True)
  all_annotations = pd.concat([all_box_annotations, all_label_annotations])

  class_label_map, categories = _load_labelmap(parsed_args.input_class_labelmap)
  challenge_evaluator = (
      object_detection_evaluation.OpenImagesDetectionChallengeEvaluator(
          categories))

  for image_id, image_groundtruth in all_annotations.groupby('ImageID'):
    groundtruth_dictionary = utils.build_groundtruth_boxes_dictionary(
        image_groundtruth, class_label_map)
    challenge_evaluator.add_single_ground_truth_image_info(
        image_id, groundtruth_dictionary)

  all_predictions = pd.read_csv(parsed_args.input_predictions)
  for image_id, image_predictions in all_predictions.groupby('ImageID'):
    prediction_dictionary = utils.build_predictions_dictionary(
        image_predictions, class_label_map)
    challenge_evaluator.add_single_detected_image_info(image_id,
                                                       prediction_dictionary)

  metrics = challenge_evaluator.evaluate()

  with open(parsed_args.output_metrics, 'w') as fid:
    io_utils.write_csv(fid, metrics)
if __name__ == '__main__':
  parser = argparse.ArgumentParser(
      description='Evaluate Open Images Object Detection Challenge predictions.'
  )
  parser.add_argument(
      '--input_annotations_boxes',
      required=True,
      help='File with groundtruth box annotations.')
  parser.add_argument(
      '--input_annotations_labels',
      required=True,
      help='File with groundtruth image-level label annotations.')
  parser.add_argument(
      '--input_predictions',
      required=True,
      help="""File with detection predictions; NOTE: no postprocessing is
      applied in the evaluation script.""")
  parser.add_argument(
      '--input_class_labelmap',
      required=True,
      help='Open Images Challenge labelmap.')
  parser.add_argument(
      '--output_metrics', required=True, help='Output file with CSV metrics.')

  args = parser.parse_args()
  main(args)
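For reference, a hypothetical predictions CSV accepted by this script; the evaluation utilities read the ImageID, LabelName, Score, XMin, XMax, YMin and YMax columns, and the values below are made up:

ImageID,LabelName,Score,XMin,XMax,YMin,YMax
fe58ec1b06db2bb7,/m/04bcr3,0.8,0.1,0.4,0.2,0.5
fe58ec1b06db2bb7,/m/02gy9n,0.3,0.5,0.9,0.3,0.6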
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Converts data from CSV to the OpenImagesDetectionChallengeEvaluator format.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from object_detection.core import standard_fields
def build_groundtruth_boxes_dictionary(data, class_label_map):
  """Builds a groundtruth dictionary from groundtruth data in CSV file.

  Args:
    data: Pandas DataFrame with the groundtruth data for a single image.
    class_label_map: Class labelmap from string label name to an integer.

  Returns:
    A dictionary with keys suitable for passing to
    OpenImagesDetectionChallengeEvaluator.add_single_ground_truth_image_info:
      standard_fields.InputDataFields.groundtruth_boxes: float32 numpy array
        of shape [num_boxes, 4] containing `num_boxes` groundtruth boxes of
        the format [ymin, xmin, ymax, xmax] in absolute image coordinates.
      standard_fields.InputDataFields.groundtruth_classes: integer numpy array
        of shape [num_boxes] containing 1-indexed groundtruth classes for the
        boxes.
      standard_fields.InputDataFields.groundtruth_image_classes: integer 1D
        numpy array containing all classes for which labels are verified.
      standard_fields.InputDataFields.groundtruth_group_of: Optional length
        M numpy boolean array denoting whether a groundtruth box contains a
        group of instances.
  """
  data_boxes = data[data.ConfidenceImageLabel.isnull()]
  data_labels = data[data.XMin.isnull()]

  return {
      standard_fields.InputDataFields.groundtruth_boxes:
          data_boxes[['YMin', 'XMin', 'YMax', 'XMax']].as_matrix(),
      standard_fields.InputDataFields.groundtruth_classes:
          data_boxes['LabelName'].map(lambda x: class_label_map[x]).as_matrix(),
      standard_fields.InputDataFields.groundtruth_group_of:
          data_boxes['IsGroupOf'].as_matrix().astype(int),
      standard_fields.InputDataFields.groundtruth_image_classes:
          data_labels['LabelName'].map(lambda x: class_label_map[x])
          .as_matrix(),
  }
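The isnull() split above works because main() concatenates the box CSV with the image-label CSV: box rows carry no ConfidenceImageLabel value and label-only rows carry no coordinates, so pandas fills the missing columns with NaN. A toy illustration with made-up values:

import pandas as pd

boxes = pd.DataFrame({'ImageID': ['a'], 'LabelName': ['/m/04bcr3'],
                      'XMin': [0.1], 'XMax': [0.4],
                      'YMin': [0.2], 'YMax': [0.5], 'IsGroupOf': [0]})
image_labels = pd.DataFrame({'ImageID': ['a'], 'LabelName': ['/m/083vt'],
                             'ConfidenceImageLabel': [1]})
merged = pd.concat([boxes, image_labels])
print(merged[merged.ConfidenceImageLabel.isnull()])  # only the box row
print(merged[merged.XMin.isnull()])                  # only the label row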
def build_predictions_dictionary(data, class_label_map):
  """Builds a predictions dictionary from predictions data in CSV file.

  Args:
    data: Pandas DataFrame with the predictions data for a single image.
    class_label_map: Class labelmap from string label name to an integer.

  Returns:
    Dictionary with keys suitable for passing to
    OpenImagesDetectionChallengeEvaluator.add_single_detected_image_info:
      standard_fields.DetectionResultFields.detection_boxes: float32 numpy
        array of shape [num_boxes, 4] containing `num_boxes` detection boxes
        of the format [ymin, xmin, ymax, xmax] in absolute image coordinates.
      standard_fields.DetectionResultFields.detection_scores: float32 numpy
        array of shape [num_boxes] containing detection scores for the boxes.
      standard_fields.DetectionResultFields.detection_classes: integer numpy
        array of shape [num_boxes] containing 1-indexed detection classes for
        the boxes.
  """
  return {
      standard_fields.DetectionResultFields.detection_boxes:
          data[['YMin', 'XMin', 'YMax', 'XMax']].as_matrix(),
      standard_fields.DetectionResultFields.detection_classes:
          data['LabelName'].map(lambda x: class_label_map[x]).as_matrix(),
      standard_fields.DetectionResultFields.detection_scores:
          data['Score'].as_matrix()
  }
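Note that pandas deprecated DataFrame.as_matrix() in 0.23 and removed it in 1.0, so on newer pandas the drop-in replacement for the calls above is .values (or .to_numpy()), for example:

boxes = data[['YMin', 'XMin', 'YMax', 'XMax']].values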
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for oid_od_challenge_evaluation_util."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import pandas as pd
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.metrics import oid_od_challenge_evaluation_utils as utils
class OidOdChallengeEvaluationUtilTest(tf.test.TestCase):

  def testBuildGroundtruthDictionary(self):
    np_data = pd.DataFrame(
        [['fe58ec1b06db2bb7', '/m/04bcr3', 0.0, 0.3, 0.5, 0.6, 1, None],
         ['fe58ec1b06db2bb7', '/m/02gy9n', 0.1, 0.2, 0.3, 0.4, 0, None],
         ['fe58ec1b06db2bb7', '/m/04bcr3', None, None, None, None, None, 1],
         ['fe58ec1b06db2bb7', '/m/083vt', None, None, None, None, None, 0],
         ['fe58ec1b06db2bb7', '/m/02gy9n', None, None, None, None, None, 1]],
        columns=[
            'ImageID', 'LabelName', 'XMin', 'XMax', 'YMin', 'YMax',
            'IsGroupOf', 'ConfidenceImageLabel'
        ])
    class_label_map = {'/m/04bcr3': 1, '/m/083vt': 2, '/m/02gy9n': 3}
    groundtruth_dictionary = utils.build_groundtruth_boxes_dictionary(
        np_data, class_label_map)

    self.assertTrue(standard_fields.InputDataFields.groundtruth_boxes in
                    groundtruth_dictionary)
    self.assertTrue(standard_fields.InputDataFields.groundtruth_classes in
                    groundtruth_dictionary)
    self.assertTrue(standard_fields.InputDataFields.groundtruth_group_of in
                    groundtruth_dictionary)
    self.assertTrue(standard_fields.InputDataFields.groundtruth_image_classes in
                    groundtruth_dictionary)

    self.assertAllEqual(
        np.array([1, 3]), groundtruth_dictionary[
            standard_fields.InputDataFields.groundtruth_classes])
    self.assertAllEqual(
        np.array([1, 0]), groundtruth_dictionary[
            standard_fields.InputDataFields.groundtruth_group_of])

    expected_boxes_data = np.array([[0.5, 0.0, 0.6, 0.3], [0.3, 0.1, 0.4, 0.2]])
    self.assertNDArrayNear(
        expected_boxes_data, groundtruth_dictionary[
            standard_fields.InputDataFields.groundtruth_boxes], 1e-5)
    self.assertAllEqual(
        np.array([1, 2, 3]), groundtruth_dictionary[
            standard_fields.InputDataFields.groundtruth_image_classes])

  def testBuildPredictionDictionary(self):
    np_data = pd.DataFrame(
        [['fe58ec1b06db2bb7', '/m/04bcr3', 0.0, 0.3, 0.5, 0.6, 0.1],
         ['fe58ec1b06db2bb7', '/m/02gy9n', 0.1, 0.2, 0.3, 0.4, 0.2],
         ['fe58ec1b06db2bb7', '/m/04bcr3', 0.0, 0.1, 0.2, 0.3, 0.3]],
        columns=[
            'ImageID', 'LabelName', 'XMin', 'XMax', 'YMin', 'YMax', 'Score'
        ])
    class_label_map = {'/m/04bcr3': 1, '/m/083vt': 2, '/m/02gy9n': 3}
    prediction_dictionary = utils.build_predictions_dictionary(
        np_data, class_label_map)

    self.assertTrue(standard_fields.DetectionResultFields.detection_boxes in
                    prediction_dictionary)
    self.assertTrue(standard_fields.DetectionResultFields.detection_classes in
                    prediction_dictionary)
    self.assertTrue(standard_fields.DetectionResultFields.detection_scores in
                    prediction_dictionary)

    self.assertAllEqual(
        np.array([1, 3, 1]), prediction_dictionary[
            standard_fields.DetectionResultFields.detection_classes])
    expected_boxes_data = np.array([[0.5, 0.0, 0.6, 0.3], [0.3, 0.1, 0.4, 0.2],
                                    [0.2, 0.0, 0.3, 0.1]])
    self.assertNDArrayNear(
        expected_boxes_data, prediction_dictionary[
            standard_fields.DetectionResultFields.detection_boxes], 1e-5)
    self.assertNDArrayNear(
        np.array([0.1, 0.2, 0.3]), prediction_dictionary[
            standard_fields.DetectionResultFields.detection_scores], 1e-5)


if __name__ == '__main__':
  tf.test.main()
...@@ -15,8 +15,8 @@
r"""Runs evaluation using OpenImages groundtruth and predictions.

Example usage:
python \
models/research/object_detection/metrics/oid_vrd_challenge_evaluation.py \
    --input_annotations_boxes=/path/to/input/annotations-human-bbox.csv \
    --input_annotations_labels=/path/to/input/annotations-label.csv \
    --input_class_labelmap=/path/to/input/class_labelmap.pbtxt \
...@@ -39,6 +39,7 @@ import argparse

import pandas as pd
from google.protobuf import text_format

from object_detection.metrics import io_utils
from object_detection.metrics import oid_vrd_challenge_evaluation_utils as utils
from object_detection.protos import string_int_label_map_pb2
from object_detection.utils import vrd_evaluation
...@@ -109,12 +110,14 @@ def main(parsed_args):
    phrase_evaluator.add_single_detected_image_info(image_id,
                                                    prediction_dictionary)

  relation_metrics = relation_evaluator.evaluate(
      relationships=_swap_labelmap_dict(relationship_label_map))
  phrase_metrics = phrase_evaluator.evaluate(
      relationships=_swap_labelmap_dict(relationship_label_map))
  with open(parsed_args.output_metrics, 'w') as fid:
    io_utils.write_csv(fid, relation_metrics)
    io_utils.write_csv(fid, phrase_metrics)


if __name__ == '__main__':
......
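_swap_labelmap_dict is defined earlier in this script, outside the hunks shown here. A minimal sketch consistent with its use above, inverting a name-to-id labelmap so the final metrics can display human-readable relationship names:

def _swap_labelmap_dict(labelmap_dict):
  """Swaps keys and values in a labelmap, e.g. {'at': 1} -> {1: 'at'}."""
  return dict((v, k) for k, v in labelmap_dict.items())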
...@@ -18,7 +18,6 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

from object_detection.core import standard_fields
from object_detection.utils import vrd_evaluation
...@@ -58,18 +57,21 @@ def build_groundtruth_vrd_dictionary(data, class_label_map,
  boxes['object'] = data_boxes[['YMin2', 'XMin2', 'YMax2', 'XMax2']].as_matrix()

  labels = np.zeros(data_boxes.shape[0], dtype=vrd_evaluation.label_data_type)
  labels['subject'] = data_boxes['LabelName1'].map(
      lambda x: class_label_map[x]).as_matrix()
  labels['object'] = data_boxes['LabelName2'].map(
      lambda x: class_label_map[x]).as_matrix()
  labels['relation'] = data_boxes['RelationshipLabel'].map(
      lambda x: relationship_label_map[x]).as_matrix()

  return {
      standard_fields.InputDataFields.groundtruth_boxes:
          boxes,
      standard_fields.InputDataFields.groundtruth_classes:
          labels,
      standard_fields.InputDataFields.groundtruth_image_classes:
          data_labels['LabelName'].map(lambda x: class_label_map[x])
          .as_matrix(),
  }
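The labels array built above is a NumPy structured array with one record per (subject, object, relation) triplet; vrd_evaluation.label_data_type is, roughly, a three-field integer dtype (the exact definition lives in object_detection/utils/vrd_evaluation.py). A sketch of the idea:

import numpy as np

# Approximation of vrd_evaluation.label_data_type.
label_data_type = np.dtype([('subject', 'i4'), ('object', 'i4'),
                            ('relation', 'i4')])
labels = np.zeros(2, dtype=label_data_type)
labels['subject'] = [1, 2]  # fields are filled column-wise, as above
labels['relation'] = [1, 1]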
...@@ -106,10 +108,12 @@ def build_predictions_vrd_dictionary(data, class_label_map,
  boxes['object'] = data_boxes[['YMin2', 'XMin2', 'YMax2', 'XMax2']].as_matrix()

  labels = np.zeros(data_boxes.shape[0], dtype=vrd_evaluation.label_data_type)
  labels['subject'] = data_boxes['LabelName1'].map(
      lambda x: class_label_map[x]).as_matrix()
  labels['object'] = data_boxes['LabelName2'].map(
      lambda x: class_label_map[x]).as_matrix()
  labels['relation'] = data_boxes['RelationshipLabel'].map(
      lambda x: relationship_label_map[x]).as_matrix()

  return {
      standard_fields.DetectionResultFields.detection_boxes:
...@@ -119,15 +123,3 @@ def build_predictions_vrd_dictionary(data, class_label_map,
      standard_fields.DetectionResultFields.detection_scores:
          data_boxes['Score'].as_matrix()
  }
...@@ -66,7 +66,7 @@ class OidVrdChallengeEvaluationUtilsTest(tf.test.TestCase):
                    groundtruth_dictionary)
    self.assertTrue(standard_fields.InputDataFields.groundtruth_classes in
                    groundtruth_dictionary)
    self.assertTrue(standard_fields.InputDataFields.groundtruth_image_classes in
                    groundtruth_dictionary)
    self.assertAllEqual(
...@@ -87,8 +87,8 @@ class OidVrdChallengeEvaluationUtilsTest(tf.test.TestCase):
        expected_vrd_data[field], groundtruth_dictionary[
            standard_fields.InputDataFields.groundtruth_boxes][field], 1e-5)
    self.assertAllEqual(
        np.array([1, 2, 3]), groundtruth_dictionary[
            standard_fields.InputDataFields.groundtruth_image_classes])
  def testBuildPredictionDictionary(self):
    np_data = pd.DataFrame(
......
...@@ -114,7 +114,7 @@ class TfExampleDetectionAndGTParser(data_parser.DataToNumpyParser):
            Int64Parser(fields.TfExampleFields.object_difficult),
        fields.InputDataFields.groundtruth_group_of:
            Int64Parser(fields.TfExampleFields.object_group_of),
        fields.InputDataFields.groundtruth_image_classes:
            Int64Parser(fields.TfExampleFields.image_class_label),
    }
...@@ -136,6 +136,8 @@ class TfExampleDetectionAndGTParser(data_parser.DataToNumpyParser):
          groundtruth group of flag (optional, None if not specified).
        fields.InputDataFields.groundtruth_difficult - a numpy array containing
          groundtruth difficult flag (optional, None if not specified).
        fields.InputDataFields.groundtruth_image_classes - a numpy array
          containing groundtruth image-level labels.
        fields.DetectionResultFields.detection_boxes - a numpy array containing
          detection boxes.
        fields.DetectionResultFields.detection_classes - a numpy array containing
......
...@@ -125,7 +125,8 @@ class TfExampleDecoderTest(tf.test.TestCase):
    results_dict = parser.parse(example)
    self.assertIsNotNone(results_dict)
    np_testing.assert_equal(
        verified_labels,
        results_dict[fields.InputDataFields.groundtruth_image_classes])

  def testParseString(self):
    string_val = 'abc'
......