"...bert-large_oneflow.git" did not exist on "5988d2cc317ac8cb8e21f84ec17dbd59e805df6c"
Commit 8cf8446b authored by Yukun Zhu, committed by aquariusjay

Adding panoptic evaluation tools and update internal changes. (#6320)

* Internal changes

PiperOrigin-RevId: 237183552

* update readme

PiperOrigin-RevId: 237184584
parent 05a79f5a
......@@ -64,6 +64,21 @@ works:
```
* Auto-DeepLab (also called hnasnet in core/nas_network.py):
```
@inproceedings{autodeeplab2019,
title={Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic
Image Segmentation},
author={Chenxi Liu and Liang-Chieh Chen and Florian Schroff and Hartwig Adam
and Wei Hua and Alan Yuille and Li Fei-Fei},
booktitle={CVPR},
year={2019}
}
```
In the current implementation, we support adopting the following network
backbones:
......@@ -72,6 +87,15 @@ backbones:
2. Xception [9, 10]: A powerful network structure intended for server-side
deployment.
3. ResNet-v1-{50,101} [14]: We provide both the original ResNet-v1 and its
'beta' variant where the 'stem' is modified for semantic segmentation.
4. PNASNet [15]: A powerful network structure found by neural architecture
search.
5. Auto-DeepLab (called HNASNet in the code): A segmentation-specific network
backbone found by neural architecture search.
This directory contains our TensorFlow [11] implementation. We provide codes
allowing users to train the model, evaluate results in terms of mIOU (mean
intersection-over-union), and visualize segmentation results. We use PASCAL VOC
......@@ -91,6 +115,8 @@ Some segmentation results on Flickr images:
* Yukun Zhu, github: [yknzhu](https://github.com/YknZhu)
* George Papandreou, github: [gpapan](https://github.com/gpapan)
* Hui Hui, github: [huihui-personal](https://github.com/huihui-personal)
* Maxwell D. Collins, github: [mcollinswisc](https://github.com/mcollinswisc)
* Ting Liu, github: [tingliu](https://github.com/tingliu)
## Table of Contents
......@@ -131,9 +157,17 @@ under tensorflow/models. Please refer to the LICENSE for details.
## Change Logs
### March 6, 2019
* Released the evaluation code (under the `evaluation` folder) for image
parsing, a.k.a. panoptic segmentation. In particular, the released code supports
evaluating the parsing results in terms of both the parsing covering and
panoptic quality metrics. **Contributors**: Maxwell Collins and Ting Liu.
### February 6, 2019
* Update decoder module to exploit multiple low-level features with different
* Updated decoder module to exploit multiple low-level features with different
output_strides.
### December 3, 2018
......@@ -241,3 +275,11 @@ and Cityscapes.
13. **The Cityscapes Dataset for Semantic Urban Scene Understanding**<br />
Cordts, Marius, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele. <br />
[[link]](https://www.cityscapes-dataset.com/). In CVPR, 2016.
14. **Deep Residual Learning for Image Recognition**<br />
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. <br />
[[link]](https://arxiv.org/abs/1512.03385). In CVPR, 2016.
15. **Progressive Neural Architecture Search**<br />
Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy. <br />
[[link]](https://arxiv.org/abs/1712.00559). In ECCV, 2018.
......@@ -175,3 +175,25 @@ class NASBaseCell(object):
h for h, is_used in zip(net, used_hiddenstates) if not is_used])
net = tf.concat(values=states_to_combine, axis=3)
return net
@tf.contrib.framework.add_arg_scope
def _apply_drop_path(self, net):
"""Apply drop_path regularization."""
drop_path_keep_prob = self._drop_path_keep_prob
if drop_path_keep_prob < 1.0:
# Scale keep prob by layer number.
assert self._cell_num != -1
layer_ratio = (self._cell_num + 1) / float(self._total_num_cells)
drop_path_keep_prob = 1 - layer_ratio * (1 - drop_path_keep_prob)
# Decrease keep prob over time.
current_step = tf.cast(tf.train.get_or_create_global_step(), tf.float32)
current_ratio = tf.minimum(1.0, current_step / self._total_training_steps)
drop_path_keep_prob = (1 - current_ratio * (1 - drop_path_keep_prob))
# Drop path.
noise_shape = [tf.shape(net)[0], 1, 1, 1]
random_tensor = drop_path_keep_prob
random_tensor += tf.random_uniform(noise_shape, dtype=tf.float32)
binary_tensor = tf.cast(tf.floor(random_tensor), net.dtype)
keep_prob_inv = tf.cast(1.0 / drop_path_keep_prob, net.dtype)
net = net * keep_prob_inv * binary_tensor
return net
......@@ -13,7 +13,21 @@
# limitations under the License.
# ==============================================================================
"""Network structure used by NAS."""
"""Network structure used by NAS.
Here we provide a few NAS backbones for semantic segmentation.
Currently, we have
1. pnasnet
"Progressive Neural Architecture Search", Chenxi Liu, Barret Zoph,
Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei,
Alan Yuille, Jonathan Huang, Kevin Murphy. In ECCV, 2018.
2. hnasnet (also called Auto-DeepLab)
"Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic
Image Segmentation", Chenxi Liu, Liang-Chieh Chen, Florian Schroff,
Hartwig Adam, Wei Hua, Alan Yuille, Li Fei-Fei. In CVPR, 2019.
"""
from __future__ import absolute_import
from __future__ import division
......
......@@ -19,7 +19,7 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import google3
import numpy as np
import tensorflow as tf
......
......@@ -59,6 +59,28 @@ def flip_dim(tensor_list, prob=0.5, dim=1):
return outputs
def _image_dimensions(image, rank):
"""Returns the dimensions of an image tensor.
Args:
image: A rank-D Tensor. For 3-D of shape: `[height, width, channels]`.
rank: The expected rank of the image.
Returns:
A list corresponding to the dimensions of the input image. Dimensions
that are statically known are Python integers; otherwise, they are
integer scalar tensors.
"""
if image.get_shape().is_fully_defined():
return image.get_shape().as_list()
else:
static_shape = image.get_shape().with_rank(rank).as_list()
dynamic_shape = tf.unstack(tf.shape(image), rank)
return [
s if s is not None else d for s, d in zip(static_shape, dynamic_shape)
]
def pad_to_bounding_box(image, offset_height, offset_width, target_height,
target_width, pad_value):
"""Pads the given image with the given pad_value.
......@@ -82,39 +104,61 @@ def pad_to_bounding_box(image, offset_height, offset_width, target_height,
ValueError: If the shape of image is incompatible with the offset_* or
target_* arguments.
"""
image_rank = tf.rank(image)
image_rank_assert = tf.Assert(
tf.equal(image_rank, 3),
['Wrong image tensor rank [Expected] [Actual]',
3, image_rank])
with tf.control_dependencies([image_rank_assert]):
image -= pad_value
image_shape = tf.shape(image)
height, width = image_shape[0], image_shape[1]
target_width_assert = tf.Assert(
tf.greater_equal(
target_width, width),
['target_width must be >= width'])
target_height_assert = tf.Assert(
tf.greater_equal(target_height, height),
['target_height must be >= height'])
with tf.control_dependencies([target_width_assert]):
after_padding_width = target_width - offset_width - width
with tf.control_dependencies([target_height_assert]):
after_padding_height = target_height - offset_height - height
offset_assert = tf.Assert(
tf.logical_and(
tf.greater_equal(after_padding_width, 0),
tf.greater_equal(after_padding_height, 0)),
['target size not possible with the given target offsets'])
height_params = tf.stack([offset_height, after_padding_height])
width_params = tf.stack([offset_width, after_padding_width])
channel_params = tf.stack([0, 0])
with tf.control_dependencies([offset_assert]):
paddings = tf.stack([height_params, width_params, channel_params])
padded = tf.pad(image, paddings)
return padded + pad_value
with tf.name_scope(None, 'pad_to_bounding_box', [image]):
image = tf.convert_to_tensor(image, name='image')
original_dtype = image.dtype
if original_dtype != tf.float32 and original_dtype != tf.float64:
# If image dtype is not float, we convert it to int32 to avoid overflow.
image = tf.cast(image, tf.int32)
image_rank_assert = tf.Assert(
tf.logical_or(
tf.equal(tf.rank(image), 3),
tf.equal(tf.rank(image), 4)),
['Wrong image tensor rank.'])
with tf.control_dependencies([image_rank_assert]):
image -= pad_value
image_shape = image.get_shape()
is_batch = True
if image_shape.ndims == 3:
is_batch = False
image = tf.expand_dims(image, 0)
elif image_shape.ndims is None:
is_batch = False
image = tf.expand_dims(image, 0)
image.set_shape([None] * 4)
elif image.get_shape().ndims != 4:
raise ValueError('Input image must have either 3 or 4 dimensions.')
_, height, width, _ = _image_dimensions(image, rank=4)
target_width_assert = tf.Assert(
tf.greater_equal(
target_width, width),
['target_width must be >= width'])
target_height_assert = tf.Assert(
tf.greater_equal(target_height, height),
['target_height must be >= height'])
with tf.control_dependencies([target_width_assert]):
after_padding_width = target_width - offset_width - width
with tf.control_dependencies([target_height_assert]):
after_padding_height = target_height - offset_height - height
offset_assert = tf.Assert(
tf.logical_and(
tf.greater_equal(after_padding_width, 0),
tf.greater_equal(after_padding_height, 0)),
['target size not possible with the given target offsets'])
batch_params = tf.stack([0, 0])
height_params = tf.stack([offset_height, after_padding_height])
width_params = tf.stack([offset_width, after_padding_width])
channel_params = tf.stack([0, 0])
with tf.control_dependencies([offset_assert]):
paddings = tf.stack([batch_params, height_params, width_params,
channel_params])
padded = tf.pad(image, paddings)
if not is_batch:
padded = tf.squeeze(padded, axis=[0])
outputs = padded + pad_value
if outputs.dtype != original_dtype:
outputs = tf.cast(outputs, original_dtype)
return outputs
def _crop(image, offset_height, offset_width, crop_height, crop_width):
......@@ -267,7 +311,7 @@ def get_random_scale(min_scale_factor, max_scale_factor, step_size):
raise ValueError('Unexpected value of min_scale_factor.')
if min_scale_factor == max_scale_factor:
return tf.to_float(min_scale_factor)
return tf.cast(min_scale_factor, tf.float32)
# When step_size = 0, we sample the value uniformly from [min, max).
if step_size == 0:
......@@ -297,7 +341,9 @@ def randomly_scale_image_and_label(image, label=None, scale=1.0):
if scale == 1.0:
return image, label
image_shape = tf.shape(image)
new_dim = tf.to_int32(tf.to_float([image_shape[0], image_shape[1]]) * scale)
new_dim = tf.cast(
tf.cast([image_shape[0], image_shape[1]], tf.float32) * scale,
tf.int32)
# Need squeeze and expand_dims because image interpolation takes
# 4D tensors as input.
......@@ -389,9 +435,9 @@ def resize_to_range(image,
"""
with tf.name_scope(scope, 'resize_to_range', [image]):
new_tensor_list = []
min_size = tf.to_float(min_size)
min_size = tf.cast(min_size, tf.float32)
if max_size is not None:
max_size = tf.to_float(max_size)
max_size = tf.cast(max_size, tf.float32)
# Modify the max_size to be a multiple of factor plus 1 and make sure the
# max dimension after resizing is no larger than max_size.
if factor is not None:
......@@ -399,8 +445,8 @@ def resize_to_range(image,
- factor)
[orig_height, orig_width, _] = resolve_shape(image, rank=3)
orig_height = tf.to_float(orig_height)
orig_width = tf.to_float(orig_width)
orig_height = tf.cast(orig_height, tf.float32)
orig_width = tf.cast(orig_width, tf.float32)
orig_min_size = tf.minimum(orig_height, orig_width)
# Calculate the larger of the possible sizes
......@@ -419,7 +465,7 @@ def resize_to_range(image,
small_width = tf.to_int32(tf.ceil(orig_width * small_scale_factor))
small_size = tf.stack([small_height, small_width])
new_size = tf.cond(
tf.to_float(tf.reduce_max(large_size)) > max_size,
tf.cast(tf.reduce_max(large_size), tf.float32) > max_size,
lambda: small_size,
lambda: large_size)
# Ensure that both output sides are multiples of factor plus one.
......
......@@ -252,25 +252,27 @@ class PreprocessUtilsTest(tf.test.TestCase):
[255, 3, 5, 255, 255],
[255, 255, 255, 255, 255]]]).astype(dtype)
with self.test_session():
image_placeholder = tf.placeholder(tf.float32)
with self.session() as sess:
padded_image = preprocess_utils.pad_to_bounding_box(
image_placeholder, 2, 1, 5, 5, 255)
self.assertAllClose(padded_image.eval(
feed_dict={image_placeholder: image}), expected_image)
image, 2, 1, 5, 5, 255)
padded_image = sess.run(padded_image)
self.assertAllClose(padded_image, expected_image)
# Add batch size = 1 to image.
padded_image = preprocess_utils.pad_to_bounding_box(
np.expand_dims(image, 0), 2, 1, 5, 5, 255)
padded_image = sess.run(padded_image)
self.assertAllClose(padded_image, np.expand_dims(expected_image, 0))
def testReturnOriginalImageWhenTargetSizeIsEqualToImageSize(self):
image = np.dstack([[[5, 6],
[9, 0]],
[[4, 3],
[3, 5]]])
with self.test_session():
image_placeholder = tf.placeholder(tf.float32)
with self.session() as sess:
padded_image = preprocess_utils.pad_to_bounding_box(
image_placeholder, 0, 0, 2, 2, 255)
self.assertAllClose(padded_image.eval(
feed_dict={image_placeholder: image}), image)
image, 0, 0, 2, 2, 255)
padded_image = sess.run(padded_image)
self.assertAllClose(padded_image, image)
def testDieOnTargetSizeGreaterThanImageSize(self):
image = np.dstack([[[5, 6],
......@@ -306,7 +308,7 @@ class PreprocessUtilsTest(tf.test.TestCase):
'target size not possible with the given target offsets'):
padded_image.eval(feed_dict={image_placeholder: image})
def testDieIfImageTensorRankIsNotThree(self):
def testDieIfImageTensorRankIsTwo(self):
image = np.vstack([[5, 6],
[9, 0]])
with self.test_session():
......
......@@ -17,7 +17,7 @@
from __future__ import print_function
import collections
import google3
import tensorflow as tf
from deeplab import common
......@@ -37,7 +37,7 @@ class DatasetTest(tf.test.TestCase):
dataset_name='pascal_voc_seg',
split_name='val',
dataset_dir=
'research/deeplab/testing/pascal_voc_seg',
'deeplab/testing/pascal_voc_seg',
batch_size=1,
crop_size=[3, 3], # Use small size for testing.
min_resize_value=3,
......
......@@ -72,7 +72,7 @@ def main(unused_argv):
'*.' + FLAGS.segmentation_format))
for annotation in annotations:
raw_annotation = _remove_colormap(annotation)
filename = os.path.splitext(os.path.basename(annotation))[0]
filename = os.path.basename(annotation)[:-4]
_save_annotation(raw_annotation,
os.path.join(
FLAGS.output_dir,
......
# Evaluation Metrics for Whole Image Parsing
Whole Image Parsing [1], also known as Panoptic Segmentation [2], generalizes
the tasks of semantic segmentation for "stuff" classes and instance
segmentation for "thing" classes, assigning both semantic and instance labels
to every pixel in an image.
Previous works evaluate the parsing result with separate metrics (e.g., one for
the semantic segmentation result and one for the object detection result).
Recently, Kirillov et al. proposed the unified instance-based Panoptic Quality
(PQ) metric [2], which has since been adopted by several benchmarks [3, 4].
However, we notice that the instance-based PQ metric often places
disproportionate emphasis on small instance parsing, as well as on "thing" over
"stuff" classes. To remedy these effects, we propose an alternative
region-based Parsing Covering (PC) metric [5], which adapts the Covering
metric [6], previously used for class-agnostic segmentation quality
evaluation, to the task of image parsing.
Here, we provide implementations of both PQ and PC for evaluating the parsing
results. We briefly explain both metrics below for reference.
## Panoptic Quality (PQ)
Given a groundtruth segmentation S and a predicted segmentation S', PQ is
defined as follows:
<p align="center">
<img src="g3doc/img/equation_pq.png" width=400>
</p>
where R and R' are groundtruth regions and predicted regions, respectively,
and |TP|, |FP|, and |FN| are the numbers of true positives, false positives,
and false negatives. The matching is determined by a threshold of 0.5
Intersection-Over-Union (IOU).
PQ treats all regions of the same "stuff" class as one instance, and the
size of instances is not considered. For example, instances with 10 × 10
pixels contribute equally to the metric as instances with 1000 × 1000 pixels.
Therefore, PQ is sensitive to false positives with small regions, and some
heuristics, such as removing those small regions, could improve the reported
performance (as also pointed out in the open-sourced evaluation code from [2]).
Thus, we argue that PQ is suitable for applications where one cares equally
about the parsing quality of instances irrespective of their sizes.
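For intuition, here is a minimal NumPy sketch (with made-up per-class counts) of how per-class PQ can be assembled from accumulated statistics, following the SQ x RQ decomposition used by `panoptic_quality.py` in this release; the array names are illustrative, not part of the library API.

```python
import numpy as np

# Hypothetical accumulated statistics for 3 categories: summed IoU over
# matched (IoU > 0.5) segment pairs, and true/false positive/negative counts.
iou_per_class = np.array([4.2, 0.0, 7.5])
tp_per_class = np.array([5., 0., 9.])
fp_per_class = np.array([1., 2., 3.])
fn_per_class = np.array([2., 1., 0.])

def safe_div(x, y):
  """Element-wise x / y, returning 0 where y == 0."""
  out = np.zeros_like(x)
  np.divide(x, y, out=out, where=(y != 0))
  return out

sq = safe_div(iou_per_class, tp_per_class)  # Segmentation quality.
rq = safe_div(tp_per_class,
              tp_per_class + 0.5 * fp_per_class + 0.5 * fn_per_class)
pq = sq * rq  # Per-class PQ.
# Average only over categories that actually have segments.
valid = (tp_per_class + fp_per_class + fn_per_class) > 0
print('PQ = %.4f' % pq[valid].mean())
```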
## Parsing Covering (PC)
We notice that there are applications where one pays more attention to large
objects, e.g., autonomous driving (where nearby objects are more important
than far away ones). Motivated by this, we propose to also evaluate the
quality of image parsing results by extending the existing Covering metric [6],
which accounts for instance sizes. Specifically, our proposed metric, Parsing
Covering (PC), is defined as follows:
<p align="center">
<img src="g3doc/img/equation_pc.png" width=400>
</p>
where S<sub>i</sub> and S<sub>i</sub>' are the groundtruth segmentation and
predicted segmentation for the i-th semantic class respectively, and
N<sub>i</sub> is the total number of pixels of groundtruth regions from
S<sub>i</sub>. The Covering for class i, Cov<sub>i</sub>, is computed in
the same way as the original Covering metric except that only groundtruth
regions from S<sub>i</sub> and predicted regions from S<sub>i</sub>' are
considered. PC is then obtained by computing the average of Cov<sub>i</sub>
over C semantic classes.
A notable difference between PQ and the proposed PC is that there is no
matching involved in PC and hence no matching threshold. In an attempt to
treat "thing" and "stuff" classes equally, the segmentation of "stuff" classes
still receives a partial PC score if it is only partially correct. For
example, if one out of three equally-sized trees is perfectly segmented, the
model gets the same partial PC score regardless of whether "tree" is
considered "stuff" or "thing".
## Tutorial
To evaluate the parsing results with PQ and PC, we provide two options:
1. Python off-line evaluation with results saved in the [COCO format](http://cocodataset.org/#format-results).
2. TensorFlow on-line evaluation.
Below, we explain each option in detail.
#### 1. Python off-line evaluation with results saved in COCO format
The [COCO result format](http://cocodataset.org/#format-results) has been
adopted by several benchmarks [3, 4]. Therefore, we provide a convenient
function, `eval_coco_format`, to evaluate the results saved in COCO format
in terms of PC and re-implemented PQ.
Before using the provided function, users need to download the official COCO
panoptic segmentation task API. Please see [installation](../g3doc/installation.md#add-libraries-to-pythonpath)
for reference.
Once the official COCO panoptic segmentation task API is downloaded, users
should be able to run `eval_coco_format.py` to evaluate the parsing results in
terms of both PC and the re-implemented PQ.
To be concrete, let's take a look at the function, `eval_coco_format` in
`eval_coco_format.py`:
```python
eval_coco_format(gt_json_file,
pred_json_file,
gt_folder=None,
pred_folder=None,
metric='pq',
num_categories=201,
ignored_label=0,
max_instances_per_category=256,
intersection_offset=None,
normalize_by_image_size=True,
num_workers=0,
print_digits=3):
```
where
1. `gt_json_file`: Path to a JSON file giving ground-truth annotations in COCO
format.
2. `pred_json_file`: Path to a JSON file for the predictions to evaluate.
3. `gt_folder`: Folder containing panoptic-format ID images to match
ground-truth annotations to image regions.
4. `pred_folder`: Path to a folder containing ID images for predictions.
5. `metric`: Name of a metric to compute. Set to `pc` or `pq` for evaluation in
PC or PQ, respectively.
6. `num_categories`: The number of segmentation categories (or "classes") in the
dataset.
7. `ignored_label`: A category id that is ignored in evaluation, e.g. the "void"
label in COCO panoptic segmentation dataset.
8. `max_instances_per_category`: The maximum number of instances for each
category to ensure unique instance labels.
9. `intersection_offset`: The maximum number of unique labels.
10. `normalize_by_image_size`: Whether to normalize groundtruth instance region
areas by image size when using PC.
11. `num_workers`: If set to a positive number, will spawn child processes to
compute parts of the metric in parallel by splitting the images between the
workers. If set to -1, will use the value of multiprocessing.cpu_count().
12. `print_digits`: Number of significant digits to print in summary of computed
metrics.
The input arguments have default values set for the COCO panoptic segmentation
dataset. Thus, users only need to provide the `gt_json_file` and the
`pred_json_file` (following the COCO format) to run the evaluation on COCO with
PQ. If users want to evaluate the results on other datasets, they may need
to change the default values.
As an example, interested users can take a look at the provided unit
test, `test_compare_pq_with_reference_eval`, in `eval_coco_format_test.py`.
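For example, with hypothetical file paths, the evaluation can also be invoked from Python roughly as follows (when `gt_folder`/`pred_folder` are omitted, they default to the JSON file names with the `.json` extension stripped):

```python
from deeplab.evaluation import eval_coco_format

# Hypothetical paths to COCO-format groundtruth and prediction files; all
# other arguments keep their COCO panoptic segmentation defaults.
results = eval_coco_format.eval_coco_format(
    gt_json_file='/path/to/panoptic_gt.json',
    pred_json_file='/path/to/panoptic_pred.json',
    metric='pc',      # or 'pq'
    num_workers=-1)   # -1 uses multiprocessing.cpu_count() workers
print(results['All'])  # e.g. {'pc': ..., 'n': ...}
```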
#### 2. TensorFlow on-line evaluation
Users may also want to run the TensorFlow on-line evaluation, similar to the
[tf.contrib.metrics.streaming_mean_iou](https://www.tensorflow.org/api_docs/python/tf/contrib/metrics/streaming_mean_iou).
Below, we provide a code snippet that shows how to use the provided
`streaming_panoptic_quality` and `streaming_parsing_covering`.
```python
metric_map = {}
metric_map['panoptic_quality'] = streaming_metrics.streaming_panoptic_quality(
category_label,
instance_label,
category_prediction,
instance_prediction,
num_classes=201,
max_instances_per_category=256,
ignored_label=0,
offset=256*256)
metric_map['parsing_covering'] = streaming_metrics.streaming_parsing_covering(
category_label,
instance_label,
category_prediction,
instance_prediction,
num_classes=201,
max_instances_per_category=256,
ignored_label=0,
offset=256*256,
normalize_by_image_size=True)
metrics_to_values, metrics_to_updates = slim.metrics.aggregate_metric_map(
metric_map)
```
where `metric_map` is a dictionary storing the streamed results of PQ and PC.
The `category_label` and the `instance_label` are the semantic segmentation and
instance segmentation groundtruth, respectively. That is, in the panoptic
segmentation format:
panoptic_label = category_label * max_instances_per_category + instance_label.
Similarly, the `category_prediction` and the `instance_prediction` are the
predicted semantic segmentation and instance segmentation, respectively.
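As a small illustration of this encoding (a sketch with made-up values, NumPy only), the semantic and instance maps can be recovered from a combined panoptic label map as follows:

```python
import numpy as np

max_instances_per_category = 256

# Made-up 2x2 panoptic label map, encoded as
# category_label * max_instances_per_category + instance_label.
panoptic_label = np.array([[7 * 256 + 1, 7 * 256 + 2],
                           [0 * 256 + 0, 3 * 256 + 1]], dtype=np.int32)

category_label = panoptic_label // max_instances_per_category  # [[7 7] [0 3]]
instance_label = panoptic_label % max_instances_per_category   # [[1 2] [0 1]]
```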
Below, we provide a code snippet showing how to summarize the results in the
context of tf.summary.
```python
summary_ops = []
for metric_name, metric_value in metrics_to_values.iteritems():
if metric_name == 'panoptic_quality':
[pq, sq, rq, total_tp, total_fn, total_fp] = tf.unstack(
metric_value, 6, axis=0)
panoptic_metrics = {
# Panoptic quality.
'pq': pq,
# Segmentation quality.
'sq': sq,
# Recognition quality.
'rq': rq,
# Total true positives.
'total_tp': total_tp,
# Total false negatives.
'total_fn': total_fn,
# Total false positives.
'total_fp': total_fp,
}
# Find the valid classes that will be used for evaluation. We will
# ignore the `ignore_label` class and other classes which have (tp + fn
# + fp) equal to 0.
valid_classes = tf.logical_and(
tf.not_equal(tf.range(0, num_classes), void_label),
tf.not_equal(total_tp + total_fn + total_fp, 0))
for target_metric, target_value in panoptic_metrics.iteritems():
output_metric_name = '{}_{}'.format(metric_name, target_metric)
op = tf.summary.scalar(
output_metric_name,
tf.reduce_mean(tf.boolean_mask(target_value, valid_classes)))
op = tf.Print(op, [target_value], output_metric_name + '_classwise: ',
summarize=num_classes)
op = tf.Print(
op,
[tf.reduce_mean(tf.boolean_mask(target_value, valid_classes))],
output_metric_name + '_mean: ',
summarize=1)
summary_ops.append(op)
elif metric_name == 'parsing_covering':
[per_class_covering,
total_per_class_weighted_ious,
total_per_class_gt_areas] = tf.unstack(metric_value, 3, axis=0)
# Find the valid classes that will be used for evaluation. We will
# ignore the `void_label` class and other classes which have
# total_per_class_weighted_ious + total_per_class_gt_areas equal to 0.
valid_classes = tf.logical_and(
tf.not_equal(tf.range(0, num_classes), void_label),
tf.not_equal(
total_per_class_weighted_ious + total_per_class_gt_areas, 0))
op = tf.summary.scalar(
metric_name,
tf.reduce_mean(tf.boolean_mask(per_class_covering, valid_classes)))
op = tf.Print(op, [per_class_covering], metric_name + '_classwise: ',
summarize=num_classes)
op = tf.Print(
op,
[tf.reduce_mean(
tf.boolean_mask(per_class_covering, valid_classes))],
metric_name + '_mean: ',
summarize=1)
summary_ops.append(op)
else:
raise ValueError('The metric_name "%s" is not supported.' % metric_name)
```
Afterwards, users can use the following code to run the evaluation in
TensorFlow. For reference, `eval.py` provides a simple example that runs the
streaming evaluation of mIOU for semantic segmentation.
```python
metric_values = slim.evaluation.evaluation_loop(
master=FLAGS.master,
checkpoint_dir=FLAGS.checkpoint_dir,
logdir=FLAGS.eval_logdir,
num_evals=num_batches,
eval_op=metrics_to_updates.values(),
final_op=metrics_to_values.values(),
summary_op=tf.summary.merge(summary_ops),
max_number_of_evaluations=FLAGS.max_number_of_evaluations,
eval_interval_secs=FLAGS.eval_interval_secs)
```
### References
1. **Image Parsing: Unifying Segmentation, Detection, and Recognition**<br />
Zhuowen Tu, Xiangrong Chen, Alan L. Yuille, and Song-Chun Zhu<br />
IJCV, 2005.
2. **Panoptic Segmentation**<br />
Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother and Piotr
Dollár<br />
arXiv:1801.00868, 2018.
3. **Microsoft COCO: Common Objects in Context**<br />
Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross
Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick,
Piotr Dollar<br />
In the Proc. of ECCV, 2014.
4. **The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes**<br />
Gerhard Neuhold, Tobias Ollmann, Samuel Rota Bulò, and Peter Kontschieder<br />
In the Proc. of ICCV, 2017.
5. **DeeperLab: Single-Shot Image Parser**<br />
Tien-Ju Yang, Maxwell D. Collins, Yukun Zhu, Jyh-Jing Hwang, Ting Liu,
Xiao Zhang, Vivienne Sze, George Papandreou, Liang-Chieh Chen<br />
arXiv: 1902.05093, 2019.
6. **Contour Detection and Hierarchical Image Segmentation**<br />
Pablo Arbelaez, Michael Maire, Charless Fowlkes, and Jitendra Malik<br />
PAMI, 2011.
# Copyright 2019 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Defines the top-level interface for evaluating segmentations."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import abc
import numpy as np
import six
_EPSILON = 1e-10
def realdiv_maybe_zero(x, y):
"""Element-wise x / y where y may contain zeros, for those returns 0 too."""
return np.where(
np.less(np.abs(y), _EPSILON), np.zeros_like(x), np.divide(x, y))
@six.add_metaclass(abc.ABCMeta)
class SegmentationMetric(object):
"""Abstract base class for computers of segmentation metrics.
Subclasses will implement both:
1. Comparing the predicted segmentation for an image with the groundtruth.
2. Computing the final metric over a set of images.
These are often done as separate steps, due to the need to accumulate
intermediate values other than the metric itself across images, computing the
actual metric value only on these accumulations after all the images have been
compared.
A simple usage would be:
metric = MetricImplementation(...)
for <image>, <groundtruth> in evaluation_set:
<prediction> = run_segmentation(<image>)
metric.compare_and_accumulate(<prediction>, <groundtruth>)
print(metric.result())
"""
def __init__(self, num_categories, ignored_label, max_instances_per_category,
offset):
"""Base initialization for SegmentationMetric.
Args:
num_categories: The number of segmentation categories (or "classes") in the
dataset.
ignored_label: A category id that is ignored in evaluation, e.g. the void
label as defined in COCO panoptic segmentation dataset.
max_instances_per_category: The maximum number of instances for each
category. Used in ensuring unique instance labels.
offset: The maximum number of unique labels. This is used, by multiplying
the ground-truth labels, to generate unique ids for individual regions
of overlap between groundtruth and predicted segments.
"""
self.num_categories = num_categories
self.ignored_label = ignored_label
self.max_instances_per_category = max_instances_per_category
self.offset = offset
self.reset()
def _naively_combine_labels(self, category_array, instance_array):
"""Naively creates a combined label array from categories and instances."""
return (category_array.astype(np.uint32) * self.max_instances_per_category +
instance_array.astype(np.uint32))
@abc.abstractmethod
def compare_and_accumulate(
self, groundtruth_category_array, groundtruth_instance_array,
predicted_category_array, predicted_instance_array):
"""Compares predicted segmentation with groundtruth, accumulates its metric.
It is not assumed that instance ids are unique across different categories.
See for example combine_semantic_and_instance_predictions.py in official
PanopticAPI evaluation code for issues to consider when fusing category
and instance labels.
Instance ids of the ignored category have the meaning that id 0 is "void"
and the remaining ones are crowd instances.
Args:
groundtruth_category_array: A 2D numpy uint16 array of groundtruth
per-pixel category labels.
groundtruth_instance_array: A 2D numpy uint16 array of groundtruth
instance labels.
predicted_category_array: A 2D numpy uint16 array of predicted per-pixel
category labels.
predicted_instance_array: A 2D numpy uint16 array of predicted instance
labels.
Returns:
The value of the metric over all comparisons done so far, including this
one, as a float scalar.
"""
raise NotImplementedError('Must be implemented in subclasses.')
@abc.abstractmethod
def result(self):
"""Computes the metric over all comparisons done so far."""
raise NotImplementedError('Must be implemented in subclasses.')
@abc.abstractmethod
def detailed_results(self, is_thing=None):
"""Computes and returns the detailed final metric results.
Args:
is_thing: A boolean array of length `num_categories`. The entry
`is_thing[category_id]` is True iff that category is a "thing" category
instead of "stuff."
Returns:
A dictionary with a breakdown of metrics and/or metric factors by things,
stuff, and all categories.
"""
raise NotImplementedError('Not implemented in subclasses.')
@abc.abstractmethod
def result_per_category(self):
"""For supported metrics, return individual per-category metric values.
Returns:
A numpy array of shape `[self.num_categories]`, where index `i` is the
metric value over only that category.
"""
raise NotImplementedError('Not implemented in subclass.')
def print_detailed_results(self, is_thing=None, print_digits=3):
"""Prints out a detailed breakdown of metric results.
Args:
is_thing: A boolean array of length num_categories.
`is_thing[category_id]` will say whether that category is a "thing"
rather than "stuff."
print_digits: Number of significant digits to print in computed metrics.
"""
raise NotImplementedError('Not implemented in subclass.')
@abc.abstractmethod
def merge(self, other_instance):
"""Combines the accumulated results of another instance into self.
The following two cases should put `metric_a` into an equivalent state.
Case 1 (with merge):
metric_a = MetricsSubclass(...)
metric_a.compare_and_accumulate(<comparison 1>)
metric_a.compare_and_accumulate(<comparison 2>)
metric_b = MetricsSubclass(...)
metric_b.compare_and_accumulate(<comparison 3>)
metric_b.compare_and_accumulate(<comparison 4>)
metric_a.merge(metric_b)
Case 2 (without merge):
metric_a = MetricsSubclass(...)
metric_a.compare_and_accumulate(<comparison 1>)
metric_a.compare_and_accumulate(<comparison 2>)
metric_a.compare_and_accumulate(<comparison 3>)
metric_a.compare_and_accumulate(<comparison 4>)
Args:
other_instance: Another compatible instance of the same metric subclass.
"""
raise NotImplementedError('Not implemented in subclass.')
@abc.abstractmethod
def reset(self):
"""Resets the accumulation to the metric class's state at initialization.
Note that this function will be called in SegmentationMetric.__init__.
"""
raise NotImplementedError('Must be implemented in subclasses.')
# Copyright 2019 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Computes evaluation metrics on groundtruth and predictions in COCO format.
The Common Objects in Context (COCO) dataset defines a format for specifying
combined semantic and instance segmentations as "panoptic" segmentations. This
is done with the combination of JSON and image files as specified at:
http://cocodataset.org/#format-results
where the JSON file specifies the overall structure of the result,
including the categories for each annotation, and the images specify the image
region for each annotation in that image by its ID.
This script computes additional metrics such as Parsing Covering on datasets and
predictions in this format. An implementation of Panoptic Quality is also
provided for convenience.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import json
import multiprocessing
import os
from absl import app
from absl import flags
from absl import logging
import numpy as np
from PIL import Image
import utils as panopticapi_utils
import six
from deeplab.evaluation import panoptic_quality
from deeplab.evaluation import parsing_covering
FLAGS = flags.FLAGS
flags.DEFINE_string(
'gt_json_file', None,
'Path to a JSON file giving ground-truth annotations in COCO format.')
flags.DEFINE_string('pred_json_file', None,
'Path to a JSON file for the predictions to evaluate.')
flags.DEFINE_string(
'gt_folder', None,
'Folder containing panoptic-format ID images to match ground-truth '
'annotations to image regions.')
flags.DEFINE_string('pred_folder', None,
'Folder containing ID images for predictions.')
flags.DEFINE_enum(
'metric', 'pq', ['pq', 'pc'], 'Shorthand name of a metric to compute. '
'Supported values are:\n'
'Panoptic Quality (pq)\n'
'Parsing Covering (pc)')
flags.DEFINE_integer(
'num_categories', 201,
'The number of segmentation categories (or "classes") in the dataset.')
flags.DEFINE_integer(
'ignored_label', 0,
'A category id that is ignored in evaluation, e.g. the void label as '
'defined in COCO panoptic segmentation dataset.')
flags.DEFINE_integer(
'max_instances_per_category', 256,
'The maximum number of instances for each category. Used in ensuring '
'unique instance labels.')
flags.DEFINE_integer('intersection_offset', None,
'The maximum number of unique labels.')
flags.DEFINE_bool(
'normalize_by_image_size', True,
'Whether to normalize groundtruth instance region areas by image size. If '
'True, groundtruth instance areas and weighted IoUs will be divided by the '
'size of the corresponding image before accumulated across the dataset. '
'Only used for Parsing Covering (pc) evaluation.')
flags.DEFINE_integer(
'num_workers', 0, 'If set to a positive number, will spawn child processes '
'to compute parts of the metric in parallel by splitting '
'the images between the workers. If set to -1, will use '
'the value of multiprocessing.cpu_count().')
flags.DEFINE_integer('print_digits', 3,
'Number of significant digits to print in metrics.')
def _build_metric(metric,
num_categories,
ignored_label,
max_instances_per_category,
intersection_offset=None,
normalize_by_image_size=True):
"""Creates a metric aggregator objet of the given name."""
if metric == 'pq':
logging.warning('One should check Panoptic Quality results against the '
'official COCO API code. Small numerical differences '
'(< 0.1%) can be magnified by rounding.')
return panoptic_quality.PanopticQuality(num_categories, ignored_label,
max_instances_per_category,
intersection_offset)
elif metric == 'pc':
return parsing_covering.ParsingCovering(
num_categories, ignored_label, max_instances_per_category,
intersection_offset, normalize_by_image_size)
else:
raise ValueError('No implementation for metric "%s"' % metric)
def _matched_annotations(gt_json, pred_json):
"""Yields a set of (groundtruth, prediction) image annotation pairs.."""
image_id_to_pred_ann = {
annotation['image_id']: annotation
for annotation in pred_json['annotations']
}
for gt_ann in gt_json['annotations']:
image_id = gt_ann['image_id']
pred_ann = image_id_to_pred_ann[image_id]
yield gt_ann, pred_ann
def _open_panoptic_id_image(image_path):
"""Loads a COCO-format panoptic ID image from file."""
return panopticapi_utils.rgb2id(
np.array(Image.open(image_path), dtype=np.uint32))
def _split_panoptic(ann_json, id_array, ignored_label, allow_crowds):
"""Given the COCO JSON and ID map, splits into categories and instances."""
category = np.zeros(id_array.shape, np.uint16)
instance = np.zeros(id_array.shape, np.uint16)
next_instance_id = collections.defaultdict(int)
# Skip instance label 0 for ignored label. That is reserved for void.
next_instance_id[ignored_label] = 1
for segment_info in ann_json['segments_info']:
if allow_crowds and segment_info['iscrowd']:
category_id = ignored_label
else:
category_id = segment_info['category_id']
mask = np.equal(id_array, segment_info['id'])
category[mask] = category_id
instance[mask] = next_instance_id[category_id]
next_instance_id[category_id] += 1
return category, instance
def _category_and_instance_from_annotation(ann_json, folder, ignored_label,
allow_crowds):
"""Given the COCO JSON annotations, finds maps of categories and instances."""
panoptic_id_image = _open_panoptic_id_image(
os.path.join(folder, ann_json['file_name']))
return _split_panoptic(ann_json, panoptic_id_image, ignored_label,
allow_crowds)
def _compute_metric(metric_aggregator, gt_folder, pred_folder,
annotation_pairs):
"""Iterates over matched annotation pairs and computes a metric over them."""
for gt_ann, pred_ann in annotation_pairs:
# We only expect "iscrowd" to appear in the ground-truth, and not in model
# output. In predicted JSON it is simply ignored, as done in official code.
gt_category, gt_instance = _category_and_instance_from_annotation(
gt_ann, gt_folder, metric_aggregator.ignored_label, True)
pred_category, pred_instance = _category_and_instance_from_annotation(
pred_ann, pred_folder, metric_aggregator.ignored_label, False)
metric_aggregator.compare_and_accumulate(gt_category, gt_instance,
pred_category, pred_instance)
return metric_aggregator
def _iterate_work_queue(work_queue):
"""Creates an iterable that retrieves items from a queue until one is None."""
task = work_queue.get(block=True)
while task is not None:
yield task
task = work_queue.get(block=True)
def _run_metrics_worker(metric_aggregator, gt_folder, pred_folder, work_queue,
result_queue):
result = _compute_metric(metric_aggregator, gt_folder, pred_folder,
_iterate_work_queue(work_queue))
result_queue.put(result, block=True)
def _is_thing_array(categories_json, ignored_label):
"""is_thing[category_id] is a bool on if category is "thing" or "stuff"."""
is_thing_dict = {}
for category_json in categories_json:
is_thing_dict[category_json['id']] = bool(category_json['isthing'])
# Check our assumption that the category ids are consecutive.
# Usually metrics should be able to handle this case, but adding a warning
# here.
max_category_id = max(six.iterkeys(is_thing_dict))
if len(is_thing_dict) != max_category_id + 1:
seen_ids = six.viewkeys(is_thing_dict)
all_ids = set(six.moves.range(max_category_id + 1))
unseen_ids = all_ids.difference(seen_ids)
if unseen_ids != {ignored_label}:
logging.warning(
'Nonconsecutive category ids or no category JSON specified for ids: '
'%s', unseen_ids)
is_thing_array = np.zeros(max_category_id + 1)
for category_id, is_thing in six.iteritems(is_thing_dict):
is_thing_array[category_id] = is_thing
return is_thing_array
def eval_coco_format(gt_json_file,
pred_json_file,
gt_folder=None,
pred_folder=None,
metric='pq',
num_categories=201,
ignored_label=0,
max_instances_per_category=256,
intersection_offset=None,
normalize_by_image_size=True,
num_workers=0,
print_digits=3):
"""Top-level code to compute metrics on a COCO-format result.
Note that the default values are set for COCO panoptic segmentation dataset,
and thus the users may want to change it for their own dataset evaluation.
Args:
gt_json_file: Path to a JSON file giving ground-truth annotations in COCO
format.
pred_json_file: Path to a JSON file for the predictions to evaluate.
gt_folder: Folder containing panoptic-format ID images to match ground-truth
annotations to image regions.
pred_folder: Folder containing ID images for predictions.
metric: Name of a metric to compute.
num_categories: The number of segmentation categories (or "classes") in the
dataset.
ignored_label: A category id that is ignored in evaluation, e.g. the "void"
label as defined in the COCO panoptic segmentation dataset.
max_instances_per_category: The maximum number of instances for each
category. Used in ensuring unique instance labels.
intersection_offset: The maximum number of unique labels.
normalize_by_image_size: Whether to normalize groundtruth instance region
areas by image size. If True, groundtruth instance areas and weighted IoUs
will be divided by the size of the corresponding image before accumulated
across the dataset. Only used for Parsing Covering (pc) evaluation.
num_workers: If set to a positive number, will spawn child processes to
compute parts of the metric in parallel by splitting the images between
the workers. If set to -1, will use the value of
multiprocessing.cpu_count().
print_digits: Number of significant digits to print in summary of computed
metrics.
Returns:
The computed result of the metric as a float scalar.
"""
with open(gt_json_file, 'r') as gt_json_fo:
gt_json = json.load(gt_json_fo)
with open(pred_json_file, 'r') as pred_json_fo:
pred_json = json.load(pred_json_fo)
if gt_folder is None:
gt_folder = gt_json_file.replace('.json', '')
if pred_folder is None:
pred_folder = pred_json_file.replace('.json', '')
if intersection_offset is None:
intersection_offset = (num_categories + 1) * max_instances_per_category
metric_aggregator = _build_metric(
metric, num_categories, ignored_label, max_instances_per_category,
intersection_offset, normalize_by_image_size)
if num_workers == -1:
logging.info('Attempting to get the CPU count to set # workers.')
num_workers = multiprocessing.cpu_count()
if num_workers > 0:
logging.info('Computing metric in parallel with %d workers.', num_workers)
work_queue = multiprocessing.Queue()
result_queue = multiprocessing.Queue()
workers = []
worker_args = (metric_aggregator, gt_folder, pred_folder, work_queue,
result_queue)
for _ in six.moves.range(num_workers):
workers.append(
multiprocessing.Process(target=_run_metrics_worker, args=worker_args))
for worker in workers:
worker.start()
for ann_pair in _matched_annotations(gt_json, pred_json):
work_queue.put(ann_pair, block=True)
# Will cause each worker to return a result and terminate upon receiving a
# None task.
for _ in six.moves.range(num_workers):
work_queue.put(None, block=True)
# Retrieve results.
for _ in six.moves.range(num_workers):
metric_aggregator.merge(result_queue.get(block=True))
for worker in workers:
worker.join()
else:
logging.info('Computing metric in a single process.')
annotation_pairs = _matched_annotations(gt_json, pred_json)
_compute_metric(metric_aggregator, gt_folder, pred_folder, annotation_pairs)
is_thing = _is_thing_array(gt_json['categories'], ignored_label)
metric_aggregator.print_detailed_results(
is_thing=is_thing, print_digits=print_digits)
return metric_aggregator.detailed_results(is_thing=is_thing)
def main(argv):
if len(argv) > 1:
raise app.UsageError('Too many command-line arguments.')
eval_coco_format(FLAGS.gt_json_file, FLAGS.pred_json_file, FLAGS.gt_folder,
FLAGS.pred_folder, FLAGS.metric, FLAGS.num_categories,
FLAGS.ignored_label, FLAGS.max_instances_per_category,
FLAGS.intersection_offset, FLAGS.normalize_by_image_size,
FLAGS.num_workers, FLAGS.print_digits)
if __name__ == '__main__':
flags.mark_flags_as_required(
['gt_json_file', 'gt_folder', 'pred_json_file', 'pred_folder'])
app.run(main)
# Copyright 2019 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for eval_coco_format script."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl import flags
from absl.testing import absltest
import evaluation as panopticapi_eval
from deeplab.evaluation import eval_coco_format
_TEST_DIR = 'deeplab/evaluation/testdata'
FLAGS = flags.FLAGS
class EvalCocoFormatTest(absltest.TestCase):
def test_compare_pq_with_reference_eval(self):
sample_data_dir = os.path.join(_TEST_DIR)
gt_json_file = os.path.join(sample_data_dir, 'coco_gt.json')
gt_folder = os.path.join(sample_data_dir, 'coco_gt')
pred_json_file = os.path.join(sample_data_dir, 'coco_pred.json')
pred_folder = os.path.join(sample_data_dir, 'coco_pred')
panopticapi_results = panopticapi_eval.pq_compute(
gt_json_file, pred_json_file, gt_folder, pred_folder)
deeplab_results = eval_coco_format.eval_coco_format(
gt_json_file,
pred_json_file,
gt_folder,
pred_folder,
metric='pq',
num_categories=7,
ignored_label=0,
max_instances_per_category=256,
intersection_offset=(256 * 256))
self.assertCountEqual(deeplab_results.keys(), ['All', 'Things', 'Stuff'])
for cat_group in ['All', 'Things', 'Stuff']:
self.assertCountEqual(deeplab_results[cat_group], ['pq', 'sq', 'rq', 'n'])
for metric in ['pq', 'sq', 'rq', 'n']:
self.assertAlmostEqual(deeplab_results[cat_group][metric],
panopticapi_results[cat_group][metric])
def test_compare_pc_with_golden_value(self):
sample_data_dir = os.path.join(_TEST_DIR)
gt_json_file = os.path.join(sample_data_dir, 'coco_gt.json')
gt_folder = os.path.join(sample_data_dir, 'coco_gt')
pred_json_file = os.path.join(sample_data_dir, 'coco_pred.json')
pred_folder = os.path.join(sample_data_dir, 'coco_pred')
deeplab_results = eval_coco_format.eval_coco_format(
gt_json_file,
pred_json_file,
gt_folder,
pred_folder,
metric='pc',
num_categories=7,
ignored_label=0,
max_instances_per_category=256,
intersection_offset=(256 * 256),
normalize_by_image_size=False)
self.assertCountEqual(deeplab_results.keys(), ['All', 'Things', 'Stuff'])
for cat_group in ['All', 'Things', 'Stuff']:
self.assertCountEqual(deeplab_results[cat_group], ['pc', 'n'])
self.assertAlmostEqual(deeplab_results['All']['pc'], 0.68210561)
self.assertEqual(deeplab_results['All']['n'], 6)
self.assertAlmostEqual(deeplab_results['Things']['pc'], 0.5890529)
self.assertEqual(deeplab_results['Things']['n'], 4)
self.assertAlmostEqual(deeplab_results['Stuff']['pc'], 0.86821097)
self.assertEqual(deeplab_results['Stuff']['n'], 2)
def test_compare_pc_with_golden_value_normalize_by_size(self):
sample_data_dir = os.path.join(_TEST_DIR)
gt_json_file = os.path.join(sample_data_dir, 'coco_gt.json')
gt_folder = os.path.join(sample_data_dir, 'coco_gt')
pred_json_file = os.path.join(sample_data_dir, 'coco_pred.json')
pred_folder = os.path.join(sample_data_dir, 'coco_pred')
deeplab_results = eval_coco_format.eval_coco_format(
gt_json_file,
pred_json_file,
gt_folder,
pred_folder,
metric='pc',
num_categories=7,
ignored_label=0,
max_instances_per_category=256,
intersection_offset=(256 * 256),
normalize_by_image_size=True)
self.assertCountEqual(deeplab_results.keys(), ['All', 'Things', 'Stuff'])
self.assertAlmostEqual(deeplab_results['All']['pc'], 0.68214908840)
def test_pc_with_multiple_workers(self):
sample_data_dir = os.path.join(_TEST_DIR)
gt_json_file = os.path.join(sample_data_dir, 'coco_gt.json')
gt_folder = os.path.join(sample_data_dir, 'coco_gt')
pred_json_file = os.path.join(sample_data_dir, 'coco_pred.json')
pred_folder = os.path.join(sample_data_dir, 'coco_pred')
deeplab_results = eval_coco_format.eval_coco_format(
gt_json_file,
pred_json_file,
gt_folder,
pred_folder,
metric='pc',
num_categories=7,
ignored_label=0,
max_instances_per_category=256,
intersection_offset=(256 * 256),
num_workers=3,
normalize_by_image_size=False)
self.assertCountEqual(deeplab_results.keys(), ['All', 'Things', 'Stuff'])
self.assertAlmostEqual(deeplab_results['All']['pc'], 0.68210561668)
if __name__ == '__main__':
absltest.main()
# Copyright 2019 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Implementation of the Panoptic Quality metric.
Panoptic Quality is an instance-based metric for evaluating the task of
image parsing, aka panoptic segmentation.
Please see the paper for details:
"Panoptic Segmentation", Alexander Kirillov, Kaiming He, Ross Girshick,
Carsten Rother and Piotr Dollar. arXiv:1801.00868, 2018.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import prettytable
import six
from deeplab.evaluation import base_metric
def _ids_to_counts(id_array):
"""Given a numpy array, a mapping from each unique entry to its count."""
ids, counts = np.unique(id_array, return_counts=True)
return dict(six.moves.zip(ids, counts))
class PanopticQuality(base_metric.SegmentationMetric):
"""Metric class for Panoptic Quality.
"Panoptic Segmentation" by Alexander Kirillov, Kaiming He, Ross Girshick,
Carsten Rother, Piotr Dollar.
https://arxiv.org/abs/1801.00868
"""
def compare_and_accumulate(
self, groundtruth_category_array, groundtruth_instance_array,
predicted_category_array, predicted_instance_array):
"""See base class."""
# First, combine the category and instance labels so that every unique
# value for (category, instance) is assigned a unique integer label.
pred_segment_id = self._naively_combine_labels(predicted_category_array,
predicted_instance_array)
gt_segment_id = self._naively_combine_labels(groundtruth_category_array,
groundtruth_instance_array)
# Pre-calculate areas for all groundtruth and predicted segments.
gt_segment_areas = _ids_to_counts(gt_segment_id)
pred_segment_areas = _ids_to_counts(pred_segment_id)
# We assume there is only one void segment and it has instance id = 0.
void_segment_id = self.ignored_label * self.max_instances_per_category
# There may be other ignored groundtruth segments with instance id > 0; find
# those ids using the unique segment ids extracted with the area computation
# above.
ignored_segment_ids = {
gt_segment_id for gt_segment_id in six.iterkeys(gt_segment_areas)
if (gt_segment_id //
self.max_instances_per_category) == self.ignored_label
}
# Next, combine the groundtruth and predicted labels. Dividing up the pixels
# based on which groundtruth segment and which predicted segment they belong
# to, this will assign a different 32-bit integer label to each choice
# of (groundtruth segment, predicted segment), encoded as
# gt_segment_id * offset + pred_segment_id.
intersection_id_array = (
gt_segment_id.astype(np.uint32) * self.offset +
pred_segment_id.astype(np.uint32))
# For every combination of (groundtruth segment, predicted segment) with a
# non-empty intersection, this counts the number of pixels in that
# intersection.
intersection_areas = _ids_to_counts(intersection_id_array)
# Helper function that computes the area of the overlap between a predicted
# segment and the ground-truth void/ignored segment.
def prediction_void_overlap(pred_segment_id):
void_intersection_id = void_segment_id * self.offset + pred_segment_id
return intersection_areas.get(void_intersection_id, 0)
# Compute overall ignored overlap.
def prediction_ignored_overlap(pred_segment_id):
total_ignored_overlap = 0
for ignored_segment_id in ignored_segment_ids:
intersection_id = ignored_segment_id * self.offset + pred_segment_id
total_ignored_overlap += intersection_areas.get(intersection_id, 0)
return total_ignored_overlap
# Sets populated with the groundtruth/predicted segments that have been
# matched with overlapping predicted/groundtruth segments, respectively.
gt_matched = set()
pred_matched = set()
# Calculate IoU per pair of intersecting segments of the same category.
for intersection_id, intersection_area in six.iteritems(intersection_areas):
gt_segment_id = intersection_id // self.offset
pred_segment_id = intersection_id % self.offset
gt_category = gt_segment_id // self.max_instances_per_category
pred_category = pred_segment_id // self.max_instances_per_category
if gt_category != pred_category:
continue
# Union between the groundtruth and predicted segments being compared does
# not include the portion of the predicted segment that consists of
# groundtruth "void" pixels.
union = (
gt_segment_areas[gt_segment_id] +
pred_segment_areas[pred_segment_id] - intersection_area -
prediction_void_overlap(pred_segment_id))
iou = intersection_area / union
if iou > 0.5:
self.tp_per_class[gt_category] += 1
self.iou_per_class[gt_category] += iou
gt_matched.add(gt_segment_id)
pred_matched.add(pred_segment_id)
# Count false negatives for each category.
for gt_segment_id in six.iterkeys(gt_segment_areas):
if gt_segment_id in gt_matched:
continue
category = gt_segment_id // self.max_instances_per_category
# Failing to detect a void segment is not a false negative.
if category == self.ignored_label:
continue
self.fn_per_class[category] += 1
# Count false positives for each category.
for pred_segment_id in six.iterkeys(pred_segment_areas):
if pred_segment_id in pred_matched:
continue
# A false positive is not penalized if it is mostly ignored in the
# groundtruth.
if (prediction_ignored_overlap(pred_segment_id) /
pred_segment_areas[pred_segment_id]) > 0.5:
continue
category = pred_segment_id // self.max_instances_per_category
self.fp_per_class[category] += 1
return self.result()
def _valid_categories(self):
"""Categories with a "valid" value for the metric, have > 0 instances.
We will ignore the `ignore_label` class and other classes which have
`tp + fn + fp = 0`.
Returns:
Boolean array of shape `[num_categories]`.
"""
valid_categories = np.not_equal(
self.tp_per_class + self.fn_per_class + self.fp_per_class, 0)
if self.ignored_label >= 0 and self.ignored_label < self.num_categories:
valid_categories[self.ignored_label] = False
return valid_categories
def detailed_results(self, is_thing=None):
"""See base class."""
valid_categories = self._valid_categories()
# If known, break down which categories are valid _and_ things/stuff.
category_sets = collections.OrderedDict()
category_sets['All'] = valid_categories
if is_thing is not None:
category_sets['Things'] = np.logical_and(valid_categories, is_thing)
category_sets['Stuff'] = np.logical_and(valid_categories,
np.logical_not(is_thing))
# Compute individual per-class metrics that constitute factors of PQ.
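    # These follow the standard panoptic quality decomposition
    # (Kirillov et al., CVPR 2019):
    #   SQ = (sum of IoUs over matched segments) / TP,
    #   RQ = TP / (TP + 0.5 * FN + 0.5 * FP),
    #   PQ = SQ * RQ.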
sq = base_metric.realdiv_maybe_zero(self.iou_per_class, self.tp_per_class)
rq = base_metric.realdiv_maybe_zero(
self.tp_per_class,
self.tp_per_class + 0.5 * self.fn_per_class + 0.5 * self.fp_per_class)
pq = np.multiply(sq, rq)
# Assemble detailed results dictionary.
results = {}
for category_set_name, in_category_set in six.iteritems(category_sets):
if np.any(in_category_set):
results[category_set_name] = {
'pq': np.mean(pq[in_category_set]),
'sq': np.mean(sq[in_category_set]),
'rq': np.mean(rq[in_category_set]),
# The number of categories in this subset.
'n': np.sum(in_category_set.astype(np.int32)),
}
else:
results[category_set_name] = {'pq': 0, 'sq': 0, 'rq': 0, 'n': 0}
return results
def result_per_category(self):
"""See base class."""
sq = base_metric.realdiv_maybe_zero(self.iou_per_class, self.tp_per_class)
rq = base_metric.realdiv_maybe_zero(
self.tp_per_class,
self.tp_per_class + 0.5 * self.fn_per_class + 0.5 * self.fp_per_class)
return np.multiply(sq, rq)
def print_detailed_results(self, is_thing=None, print_digits=3):
"""See base class."""
results = self.detailed_results(is_thing=is_thing)
tab = prettytable.PrettyTable()
tab.add_column('', [], align='l')
for fieldname in ['PQ', 'SQ', 'RQ', 'N']:
tab.add_column(fieldname, [], align='r')
for category_set, subset_results in six.iteritems(results):
data_cols = [
          round(subset_results[col_key] * 100, print_digits)
for col_key in ['pq', 'sq', 'rq']
]
data_cols += [subset_results['n']]
tab.add_row([category_set] + data_cols)
print(tab)
def result(self):
"""See base class."""
pq_per_class = self.result_per_category()
valid_categories = self._valid_categories()
if not np.any(valid_categories):
return 0.
return np.mean(pq_per_class[valid_categories])
def merge(self, other_instance):
"""See base class."""
self.iou_per_class += other_instance.iou_per_class
self.tp_per_class += other_instance.tp_per_class
self.fn_per_class += other_instance.fn_per_class
self.fp_per_class += other_instance.fp_per_class
def reset(self):
"""See base class."""
self.iou_per_class = np.zeros(self.num_categories, dtype=np.float64)
self.tp_per_class = np.zeros(self.num_categories, dtype=np.float64)
self.fn_per_class = np.zeros(self.num_categories, dtype=np.float64)
self.fp_per_class = np.zeros(self.num_categories, dtype=np.float64)
# Copyright 2019 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for Panoptic Quality metric."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import absltest
import numpy as np
import six
from deeplab.evaluation import panoptic_quality
from deeplab.evaluation import test_utils
# See the definition of the color names at:
# https://en.wikipedia.org/wiki/Web_colors.
_CLASS_COLOR_MAP = {
(0, 0, 0): 0,
(0, 0, 255): 1, # Person (blue).
(255, 0, 0): 2, # Bear (red).
(0, 255, 0): 3, # Tree (lime).
(255, 0, 255): 4, # Bird (fuchsia).
(0, 255, 255): 5, # Sky (aqua).
(255, 255, 0): 6, # Cat (yellow).
}
class PanopticQualityTest(absltest.TestCase):
def test_perfect_match(self):
categories = np.zeros([6, 6], np.uint16)
instances = np.array([
[1, 1, 1, 1, 1, 1],
[1, 2, 2, 2, 2, 1],
[1, 2, 2, 2, 2, 1],
[1, 2, 2, 2, 2, 1],
[1, 2, 2, 1, 1, 1],
[1, 2, 1, 1, 1, 1],
],
dtype=np.uint16)
pq = panoptic_quality.PanopticQuality(
num_categories=1,
ignored_label=2,
max_instances_per_category=16,
offset=16)
pq.compare_and_accumulate(categories, instances, categories, instances)
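    # With identical groundtruth and prediction, both category-0 segments
    # (instances 1 and 2) match themselves with IoU = 1.0 each, so the
    # per-class IoU sum is 2.0 and TP is 2, with no FN or FP.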
np.testing.assert_array_equal(pq.iou_per_class, [2.0])
np.testing.assert_array_equal(pq.tp_per_class, [2])
np.testing.assert_array_equal(pq.fn_per_class, [0])
np.testing.assert_array_equal(pq.fp_per_class, [0])
np.testing.assert_array_equal(pq.result_per_category(), [1.0])
self.assertEqual(pq.result(), 1.0)
def test_totally_wrong(self):
det_categories = np.array([
[0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
],
dtype=np.uint16)
gt_categories = 1 - det_categories
instances = np.zeros([6, 6], np.uint16)
pq = panoptic_quality.PanopticQuality(
num_categories=2,
ignored_label=2,
max_instances_per_category=1,
offset=16)
pq.compare_and_accumulate(gt_categories, instances, det_categories,
instances)
np.testing.assert_array_equal(pq.iou_per_class, [0.0, 0.0])
np.testing.assert_array_equal(pq.tp_per_class, [0, 0])
np.testing.assert_array_equal(pq.fn_per_class, [1, 1])
np.testing.assert_array_equal(pq.fp_per_class, [1, 1])
np.testing.assert_array_equal(pq.result_per_category(), [0.0, 0.0])
self.assertEqual(pq.result(), 0.0)
def test_matches_by_iou(self):
good_det_labels = np.array(
[
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 2, 2, 2, 2, 1],
[1, 2, 2, 2, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
],
dtype=np.uint16)
gt_labels = np.array(
[
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 2, 2, 2, 1],
[1, 2, 2, 2, 2, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
],
dtype=np.uint16)
pq = panoptic_quality.PanopticQuality(
num_categories=1,
ignored_label=2,
max_instances_per_category=16,
offset=16)
pq.compare_and_accumulate(
np.zeros_like(gt_labels), gt_labels, np.zeros_like(good_det_labels),
good_det_labels)
# iou(1, 1) = 28/30
# iou(2, 2) = 6/8
np.testing.assert_array_almost_equal(pq.iou_per_class, [28 / 30 + 6 / 8])
np.testing.assert_array_equal(pq.tp_per_class, [2])
np.testing.assert_array_equal(pq.fn_per_class, [0])
np.testing.assert_array_equal(pq.fp_per_class, [0])
self.assertAlmostEqual(pq.result(), (28 / 30 + 6 / 8) / 2)
bad_det_labels = np.array(
[
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 2, 2, 1],
[1, 1, 1, 2, 2, 1],
[1, 1, 1, 2, 2, 1],
[1, 1, 1, 1, 1, 1],
],
dtype=np.uint16)
pq.reset()
pq.compare_and_accumulate(
np.zeros_like(gt_labels), gt_labels, np.zeros_like(bad_det_labels),
bad_det_labels)
# iou(1, 1) = 27/32
np.testing.assert_array_almost_equal(pq.iou_per_class, [27 / 32])
np.testing.assert_array_equal(pq.tp_per_class, [1])
np.testing.assert_array_equal(pq.fn_per_class, [1])
np.testing.assert_array_equal(pq.fp_per_class, [1])
self.assertAlmostEqual(pq.result(), (27 / 32) * (1 / 2))
def test_wrong_instances(self):
categories = np.array([
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 2, 2, 1, 2, 2],
[1, 2, 2, 1, 2, 2],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
],
dtype=np.uint16)
predicted_instances = np.array([
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
],
dtype=np.uint16)
groundtruth_instances = np.zeros([6, 6], dtype=np.uint16)
pq = panoptic_quality.PanopticQuality(
num_categories=3,
ignored_label=0,
max_instances_per_category=10,
offset=100)
pq.compare_and_accumulate(categories, groundtruth_instances, categories,
predicted_instances)
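    # The category-1 segment matches perfectly (IoU = 1.0). The single
    # groundtruth category-2 segment spans both 2x2 blocks, so each predicted
    # category-2 instance reaches only IoU = 4/8 = 0.5, which does not exceed
    # the 0.5 matching threshold: the groundtruth segment becomes a false
    # negative and both predictions become false positives.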
np.testing.assert_array_equal(pq.iou_per_class, [0.0, 1.0, 0.0])
np.testing.assert_array_equal(pq.tp_per_class, [0, 1, 0])
np.testing.assert_array_equal(pq.fn_per_class, [0, 0, 1])
np.testing.assert_array_equal(pq.fp_per_class, [0, 0, 2])
np.testing.assert_array_equal(pq.result_per_category(), [0, 1, 0])
self.assertAlmostEqual(pq.result(), 0.5)
def test_instance_order_is_arbitrary(self):
categories = np.array([
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 2, 2, 1, 2, 2],
[1, 2, 2, 1, 2, 2],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
],
dtype=np.uint16)
predicted_instances = np.array([
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
],
dtype=np.uint16)
groundtruth_instances = np.array([
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0],
[0, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
],
dtype=np.uint16)
pq = panoptic_quality.PanopticQuality(
num_categories=3,
ignored_label=0,
max_instances_per_category=10,
offset=100)
pq.compare_and_accumulate(categories, groundtruth_instances, categories,
predicted_instances)
np.testing.assert_array_equal(pq.iou_per_class, [0.0, 1.0, 2.0])
np.testing.assert_array_equal(pq.tp_per_class, [0, 1, 2])
np.testing.assert_array_equal(pq.fn_per_class, [0, 0, 0])
np.testing.assert_array_equal(pq.fp_per_class, [0, 0, 0])
np.testing.assert_array_equal(pq.result_per_category(), [0, 1, 1])
self.assertAlmostEqual(pq.result(), 1.0)
def test_matches_expected(self):
pred_classes = test_utils.read_segmentation_with_rgb_color_map(
'team_pred_class.png', _CLASS_COLOR_MAP)
pred_instances = test_utils.read_test_image(
'team_pred_instance.png', mode='L')
instance_class_map = {
0: 0,
47: 1,
97: 1,
133: 1,
150: 1,
174: 1,
198: 2,
215: 1,
244: 1,
255: 1,
}
gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
'team_gt_instance.png', instance_class_map)
pq = panoptic_quality.PanopticQuality(
num_categories=3,
ignored_label=0,
max_instances_per_category=256,
offset=256 * 256)
pq.compare_and_accumulate(gt_classes, gt_instances, pred_classes,
pred_instances)
np.testing.assert_array_almost_equal(
pq.iou_per_class, [2.06104, 5.26827, 0.54069], decimal=4)
np.testing.assert_array_equal(pq.tp_per_class, [1, 7, 1])
np.testing.assert_array_equal(pq.fn_per_class, [0, 1, 0])
np.testing.assert_array_equal(pq.fp_per_class, [0, 0, 0])
np.testing.assert_array_almost_equal(pq.result_per_category(),
[2.061038, 0.702436, 0.54069])
self.assertAlmostEqual(pq.result(), 0.62156287)
def test_merge_accumulates_all_across_instances(self):
categories = np.zeros([6, 6], np.uint16)
good_det_labels = np.array([
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 2, 2, 2, 2, 1],
[1, 2, 2, 2, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
],
dtype=np.uint16)
gt_labels = np.array([
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 2, 2, 2, 1],
[1, 2, 2, 2, 2, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
],
dtype=np.uint16)
good_pq = panoptic_quality.PanopticQuality(
num_categories=1,
ignored_label=2,
max_instances_per_category=16,
offset=16)
for _ in six.moves.range(2):
good_pq.compare_and_accumulate(categories, gt_labels, categories,
good_det_labels)
bad_det_labels = np.array([
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 2, 2, 1],
[1, 1, 1, 2, 2, 1],
[1, 1, 1, 2, 2, 1],
[1, 1, 1, 1, 1, 1],
],
dtype=np.uint16)
bad_pq = panoptic_quality.PanopticQuality(
num_categories=1,
ignored_label=2,
max_instances_per_category=16,
offset=16)
for _ in six.moves.range(2):
bad_pq.compare_and_accumulate(categories, gt_labels, categories,
bad_det_labels)
good_pq.merge(bad_pq)
np.testing.assert_array_almost_equal(
good_pq.iou_per_class, [2 * (28 / 30 + 6 / 8) + 2 * (27 / 32)])
np.testing.assert_array_equal(good_pq.tp_per_class, [2 * 2 + 2])
np.testing.assert_array_equal(good_pq.fn_per_class, [2])
np.testing.assert_array_equal(good_pq.fp_per_class, [2])
self.assertAlmostEqual(good_pq.result(), 0.63177083)
if __name__ == '__main__':
absltest.main()
# Copyright 2019 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Implementation of the Parsing Covering metric.
Parsing Covering is a region-based metric for evaluating the task of
image parsing, aka panoptic segmentation.
Please see the paper for details:
"DeeperLab: Single-Shot Image Parser", Tien-Ju Yang, Maxwell D. Collins,
Yukun Zhu, Jyh-Jing Hwang, Ting Liu, Xiao Zhang, Vivienne Sze,
George Papandreou, Liang-Chieh Chen. arXiv: 1902.05093, 2019.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import prettytable
import six
from deeplab.evaluation import base_metric
class ParsingCovering(base_metric.SegmentationMetric):
r"""Metric class for Parsing Covering.
Computes segmentation covering metric introduced in (Arbelaez, et al., 2010)
with extension to handle multi-class semantic labels (a.k.a. parsing
covering). Specifically, segmentation covering (SC) is defined in Eq. (8) in
(Arbelaez et al., 2010) as:
SC(c) = \sum_{R\in S}(|R| * \max_{R'\in S'}O(R,R')) / \sum_{R\in S}|R|,
where S are the groundtruth instance regions and S' are the predicted
instance regions. The parsing covering is simply:
PC = \sum_{c=1}^{C}SC(c) / C,
where C is the number of classes.
"""
def __init__(self,
num_categories,
ignored_label,
max_instances_per_category,
offset,
normalize_by_image_size=True):
"""Initialization for ParsingCovering.
Args:
      num_categories: The number of segmentation categories (or "classes") in
        the dataset.
ignored_label: A category id that is ignored in evaluation, e.g. the void
label as defined in COCO panoptic segmentation dataset.
max_instances_per_category: The maximum number of instances for each
category. Used in ensuring unique instance labels.
offset: The maximum number of unique labels. This is used, by multiplying
the ground-truth labels, to generate unique ids for individual regions
of overlap between groundtruth and predicted segments.
normalize_by_image_size: Whether to normalize groundtruth instance region
areas by image size. If True, groundtruth instance areas and weighted
IoUs will be divided by the size of the corresponding image before
accumulated across the dataset.
"""
super(ParsingCovering, self).__init__(num_categories, ignored_label,
max_instances_per_category, offset)
self.normalize_by_image_size = normalize_by_image_size
def compare_and_accumulate(
self, groundtruth_category_array, groundtruth_instance_array,
predicted_category_array, predicted_instance_array):
"""See base class."""
# Allocate intermediate data structures.
max_ious = np.zeros([self.num_categories, self.max_instances_per_category],
dtype=np.float64)
gt_areas = np.zeros([self.num_categories, self.max_instances_per_category],
dtype=np.float64)
pred_areas = np.zeros(
[self.num_categories, self.max_instances_per_category],
dtype=np.float64)
# This is a dictionary in the format:
# {(category, gt_instance): [(pred_instance, intersection_area)]}.
intersections = collections.defaultdict(list)
# First, combine the category and instance labels so that every unique
# value for (category, instance) is assigned a unique integer label.
pred_segment_id = self._naively_combine_labels(predicted_category_array,
predicted_instance_array)
gt_segment_id = self._naively_combine_labels(groundtruth_category_array,
groundtruth_instance_array)
# Next, combine the groundtruth and predicted labels. Dividing up the pixels
# based on which groundtruth segment and which predicted segment they belong
# to, this will assign a different 32-bit integer label to each choice
# of (groundtruth segment, predicted segment), encoded as
# gt_segment_id * offset + pred_segment_id.
intersection_id_array = (
gt_segment_id.astype(np.uint32) * self.offset +
pred_segment_id.astype(np.uint32))
# For every combination of (groundtruth segment, predicted segment) with a
# non-empty intersection, this counts the number of pixels in that
# intersection.
intersection_ids, intersection_areas = np.unique(
intersection_id_array, return_counts=True)
# Find areas of all groundtruth and predicted instances, as well as of their
# intersections.
for intersection_id, intersection_area in six.moves.zip(
intersection_ids, intersection_areas):
gt_segment_id = intersection_id // self.offset
gt_category = gt_segment_id // self.max_instances_per_category
if gt_category == self.ignored_label:
continue
gt_instance = gt_segment_id % self.max_instances_per_category
gt_areas[gt_category, gt_instance] += intersection_area
pred_segment_id = intersection_id % self.offset
pred_category = pred_segment_id // self.max_instances_per_category
pred_instance = pred_segment_id % self.max_instances_per_category
pred_areas[pred_category, pred_instance] += intersection_area
if pred_category != gt_category:
continue
intersections[gt_category, gt_instance].append((pred_instance,
intersection_area))
# Find maximum IoU for every groundtruth instance.
for gt_label, instance_intersections in six.iteritems(intersections):
category, gt_instance = gt_label
gt_area = gt_areas[category, gt_instance]
ious = []
for pred_instance, intersection_area in instance_intersections:
pred_area = pred_areas[category, pred_instance]
union = gt_area + pred_area - intersection_area
ious.append(intersection_area / union)
max_ious[category, gt_instance] = max(ious)
# Normalize groundtruth instance areas by image size if necessary.
if self.normalize_by_image_size:
gt_areas /= groundtruth_category_array.size
# Compute per-class weighted IoUs and areas summed over all groundtruth
# instances.
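    # These two accumulators correspond to the numerator and denominator of
    # SC(c) in the class docstring: max_ious * gt_areas accumulates
    # |R| * max_{R' in S'} O(R, R') per groundtruth region, and gt_areas
    # accumulates |R|.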
self.weighted_iou_per_class += np.sum(max_ious * gt_areas, axis=-1)
self.gt_area_per_class += np.sum(gt_areas, axis=-1)
return self.result()
def result_per_category(self):
"""See base class."""
return base_metric.realdiv_maybe_zero(self.weighted_iou_per_class,
self.gt_area_per_class)
def _valid_categories(self):
"""Categories with a "valid" value for the metric, have > 0 instances.
We will ignore the `ignore_label` class and other classes which have
groundtruth area of 0.
Returns:
Boolean array of shape `[num_categories]`.
"""
valid_categories = np.not_equal(self.gt_area_per_class, 0)
if self.ignored_label >= 0 and self.ignored_label < self.num_categories:
valid_categories[self.ignored_label] = False
return valid_categories
def detailed_results(self, is_thing=None):
"""See base class."""
valid_categories = self._valid_categories()
# If known, break down which categories are valid _and_ things/stuff.
category_sets = collections.OrderedDict()
category_sets['All'] = valid_categories
if is_thing is not None:
category_sets['Things'] = np.logical_and(valid_categories, is_thing)
category_sets['Stuff'] = np.logical_and(valid_categories,
np.logical_not(is_thing))
covering_per_class = self.result_per_category()
results = {}
for category_set_name, in_category_set in six.iteritems(category_sets):
if np.any(in_category_set):
results[category_set_name] = {
'pc': np.mean(covering_per_class[in_category_set]),
# The number of valid categories in this subset.
'n': np.sum(in_category_set.astype(np.int32)),
}
else:
results[category_set_name] = {'pc': 0, 'n': 0}
return results
def print_detailed_results(self, is_thing=None, print_digits=3):
"""See base class."""
results = self.detailed_results(is_thing=is_thing)
tab = prettytable.PrettyTable()
tab.add_column('', [], align='l')
for fieldname in ['PC', 'N']:
tab.add_column(fieldname, [], align='r')
for category_set, subset_results in six.iteritems(results):
data_cols = [
          round(subset_results['pc'] * 100, print_digits), subset_results['n']
]
tab.add_row([category_set] + data_cols)
print(tab)
def result(self):
"""See base class."""
covering_per_class = self.result_per_category()
valid_categories = self._valid_categories()
if not np.any(valid_categories):
return 0.
return np.mean(covering_per_class[valid_categories])
def merge(self, other_instance):
"""See base class."""
self.weighted_iou_per_class += other_instance.weighted_iou_per_class
self.gt_area_per_class += other_instance.gt_area_per_class
def reset(self):
"""See base class."""
self.weighted_iou_per_class = np.zeros(
self.num_categories, dtype=np.float64)
self.gt_area_per_class = np.zeros(self.num_categories, dtype=np.float64)
# Copyright 2019 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for Parsing Covering metric."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import absltest
import numpy as np
from deeplab.evaluation import parsing_covering
from deeplab.evaluation import test_utils
# See the definition of the color names at:
# https://en.wikipedia.org/wiki/Web_colors.
_CLASS_COLOR_MAP = {
(0, 0, 0): 0,
(0, 0, 255): 1, # Person (blue).
(255, 0, 0): 2, # Bear (red).
(0, 255, 0): 3, # Tree (lime).
(255, 0, 255): 4, # Bird (fuchsia).
(0, 255, 255): 5, # Sky (aqua).
(255, 255, 0): 6, # Cat (yellow).
}
class ParsingCoveringTest(absltest.TestCase):
def test_perfect_match(self):
categories = np.zeros([6, 6], np.uint16)
instances = np.array([
[2, 2, 2, 2, 2, 2],
[2, 4, 4, 4, 4, 2],
[2, 4, 4, 4, 4, 2],
[2, 4, 4, 4, 4, 2],
[2, 4, 4, 2, 2, 2],
[2, 4, 2, 2, 2, 2],
],
dtype=np.uint16)
pc = parsing_covering.ParsingCovering(
num_categories=3,
ignored_label=2,
max_instances_per_category=2,
offset=16,
normalize_by_image_size=False)
pc.compare_and_accumulate(categories, instances, categories, instances)
np.testing.assert_array_equal(pc.weighted_iou_per_class, [0.0, 21.0, 0.0])
np.testing.assert_array_equal(pc.gt_area_per_class, [0.0, 21.0, 0.0])
np.testing.assert_array_equal(pc.result_per_category(), [0.0, 1.0, 0.0])
self.assertEqual(pc.result(), 1.0)
def test_totally_wrong(self):
categories = np.zeros([6, 6], np.uint16)
gt_instances = np.array([
[0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
],
dtype=np.uint16)
pred_instances = 1 - gt_instances
pc = parsing_covering.ParsingCovering(
num_categories=2,
ignored_label=0,
max_instances_per_category=1,
offset=16,
normalize_by_image_size=False)
pc.compare_and_accumulate(categories, gt_instances, categories,
pred_instances)
np.testing.assert_array_equal(pc.weighted_iou_per_class, [0.0, 0.0])
np.testing.assert_array_equal(pc.gt_area_per_class, [0.0, 10.0])
np.testing.assert_array_equal(pc.result_per_category(), [0.0, 0.0])
self.assertEqual(pc.result(), 0.0)
def test_matches_expected(self):
pred_classes = test_utils.read_segmentation_with_rgb_color_map(
'team_pred_class.png', _CLASS_COLOR_MAP)
pred_instances = test_utils.read_test_image(
'team_pred_instance.png', mode='L')
instance_class_map = {
0: 0,
47: 1,
97: 1,
133: 1,
150: 1,
174: 1,
198: 2,
215: 1,
244: 1,
255: 1,
}
gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
'team_gt_instance.png', instance_class_map)
pc = parsing_covering.ParsingCovering(
num_categories=3,
ignored_label=0,
max_instances_per_category=256,
offset=256 * 256,
normalize_by_image_size=False)
pc.compare_and_accumulate(gt_classes, gt_instances, pred_classes,
pred_instances)
np.testing.assert_array_almost_equal(
pc.weighted_iou_per_class, [0.0, 39864.14634, 3136], decimal=4)
np.testing.assert_array_equal(pc.gt_area_per_class, [0.0, 56870, 5800])
np.testing.assert_array_almost_equal(
pc.result_per_category(), [0.0, 0.70097, 0.54069], decimal=4)
self.assertAlmostEqual(pc.result(), 0.6208296732)
def test_matches_expected_normalize_by_size(self):
pred_classes = test_utils.read_segmentation_with_rgb_color_map(
'team_pred_class.png', _CLASS_COLOR_MAP)
pred_instances = test_utils.read_test_image(
'team_pred_instance.png', mode='L')
instance_class_map = {
0: 0,
47: 1,
97: 1,
133: 1,
150: 1,
174: 1,
198: 2,
215: 1,
244: 1,
255: 1,
}
gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
'team_gt_instance.png', instance_class_map)
pc = parsing_covering.ParsingCovering(
num_categories=3,
ignored_label=0,
max_instances_per_category=256,
offset=256 * 256,
normalize_by_image_size=True)
pc.compare_and_accumulate(gt_classes, gt_instances, pred_classes,
pred_instances)
np.testing.assert_array_almost_equal(
pc.weighted_iou_per_class, [0.0, 0.5002088756, 0.03935002196],
decimal=4)
np.testing.assert_array_almost_equal(
pc.gt_area_per_class, [0.0, 0.7135955832, 0.07277746408], decimal=4)
# Note that the per-category and overall PCs are identical to those without
# normalization in the previous test, because we only have a single image.
np.testing.assert_array_almost_equal(
pc.result_per_category(), [0.0, 0.70097, 0.54069], decimal=4)
self.assertAlmostEqual(pc.result(), 0.6208296732)
if __name__ == '__main__':
absltest.main()
# Copyright 2019 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Code to compute segmentation in a "streaming" pattern in Tensorflow.
These aggregate the metric over examples of the evaluation set. Each example is
assumed to be fed in in a stream, and the metric implementation accumulates
across them.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from deeplab.evaluation import panoptic_quality
from deeplab.evaluation import parsing_covering
_EPSILON = 1e-10
def _realdiv_maybe_zero(x, y):
"""Support tf.realdiv(x, y) where y may contain zeros."""
return tf.where(tf.less(y, _EPSILON), tf.zeros_like(x), tf.realdiv(x, y))
def _running_total(value, shape, name=None):
"""Maintains a running total of tensor `value` between calls."""
with tf.variable_scope(name, 'running_total', [value]):
total_var = tf.get_variable(
'total',
shape,
value.dtype,
initializer=tf.zeros_initializer(),
trainable=False,
collections=[
tf.GraphKeys.LOCAL_VARIABLES, tf.GraphKeys.METRIC_VARIABLES
])
updated_total = tf.assign_add(total_var, value, use_locking=True)
return total_var, updated_total
def _panoptic_quality_helper(
groundtruth_category_array, groundtruth_instance_array,
predicted_category_array, predicted_instance_array, num_classes,
max_instances_per_category, ignored_label, offset):
"""Helper function to compute panoptic quality."""
pq = panoptic_quality.PanopticQuality(num_classes, ignored_label,
max_instances_per_category, offset)
pq.compare_and_accumulate(groundtruth_category_array,
groundtruth_instance_array,
predicted_category_array, predicted_instance_array)
return pq.iou_per_class, pq.tp_per_class, pq.fn_per_class, pq.fp_per_class
def streaming_panoptic_quality(groundtruth_categories,
groundtruth_instances,
predicted_categories,
predicted_instances,
num_classes,
max_instances_per_category,
ignored_label,
offset,
name=None):
"""Aggregates the panoptic metric across calls with different input tensors.
See tf.metrics.* functions for comparable functionality and usage.
Args:
groundtruth_categories: A 2D uint16 tensor of groundtruth category labels.
groundtruth_instances: A 2D uint16 tensor of groundtruth instance labels.
predicted_categories: A 2D uint16 tensor of predicted category labels.
predicted_instances: A 2D uint16 tensor of predicted instance labels.
num_classes: Number of classes in the dataset as an integer.
max_instances_per_category: The maximum number of instances for each class
as an integer or integer tensor.
ignored_label: The class id to be ignored in evaluation as an integer or
integer tensor.
offset: The maximum number of unique labels as an integer or integer tensor.
name: An optional variable_scope name.
Returns:
qualities: A tensor of shape `[6, num_classes]`, where (1) panoptic quality,
(2) segmentation quality, (3) recognition quality, (4) total_tp,
(5) total_fn and (6) total_fp are saved in the respective rows.
update_ops: List of operations that update the running overall panoptic
quality.
Raises:
RuntimeError: If eager execution is enabled.
"""
if tf.executing_eagerly():
raise RuntimeError('Cannot aggregate when eager execution is enabled.')
input_args = [
tf.convert_to_tensor(groundtruth_categories, tf.uint16),
tf.convert_to_tensor(groundtruth_instances, tf.uint16),
tf.convert_to_tensor(predicted_categories, tf.uint16),
tf.convert_to_tensor(predicted_instances, tf.uint16),
tf.convert_to_tensor(num_classes, tf.int32),
tf.convert_to_tensor(max_instances_per_category, tf.int32),
tf.convert_to_tensor(ignored_label, tf.int32),
tf.convert_to_tensor(offset, tf.int32),
]
return_types = [
tf.float64,
tf.float64,
tf.float64,
tf.float64,
]
with tf.variable_scope(name, 'streaming_panoptic_quality', input_args):
panoptic_results = tf.py_func(
_panoptic_quality_helper, input_args, return_types, stateful=False)
iou, tp, fn, fp = tuple(panoptic_results)
total_iou, updated_iou = _running_total(
iou, [num_classes], name='iou_total')
total_tp, updated_tp = _running_total(tp, [num_classes], name='tp_total')
total_fn, updated_fn = _running_total(fn, [num_classes], name='fn_total')
total_fp, updated_fp = _running_total(fp, [num_classes], name='fp_total')
update_ops = [updated_iou, updated_tp, updated_fn, updated_fp]
sq = _realdiv_maybe_zero(total_iou, total_tp)
rq = _realdiv_maybe_zero(total_tp,
total_tp + 0.5 * total_fn + 0.5 * total_fp)
pq = tf.multiply(sq, rq)
qualities = tf.stack([pq, sq, rq, total_tp, total_fn, total_fp], axis=0)
return qualities, update_ops
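# A minimal usage sketch for streaming_panoptic_quality, kept as a comment so
# that it does not alter the module. The placeholder shapes, the feed loop and
# the parameter values below are illustrative assumptions in the usual
# tf.metrics.* style, not requirements of this API:
#
#   gt_classes = tf.placeholder(tf.uint16, shape=[None, None])
#   gt_instances = tf.placeholder(tf.uint16, shape=[None, None])
#   pred_classes = tf.placeholder(tf.uint16, shape=[None, None])
#   pred_instances = tf.placeholder(tf.uint16, shape=[None, None])
#   qualities, update_ops = streaming_panoptic_quality(
#       gt_classes, gt_instances, pred_classes, pred_instances,
#       num_classes=3, max_instances_per_category=256, ignored_label=0,
#       offset=256 * 256)
#   with tf.Session() as sess:
#     sess.run(tf.local_variables_initializer())
#     for example in dataset:  # one set of 2D label maps per example
#       sess.run(update_ops, feed_dict={gt_classes: ..., gt_instances: ...,
#                                       pred_classes: ..., pred_instances: ...})
#     results = sess.run(qualities)  # row 0 of `results` is per-class PQ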
def _parsing_covering_helper(
groundtruth_category_array, groundtruth_instance_array,
predicted_category_array, predicted_instance_array, num_classes,
max_instances_per_category, ignored_label, offset, normalize_by_image_size):
"""Helper function to compute parsing covering."""
pc = parsing_covering.ParsingCovering(num_classes, ignored_label,
max_instances_per_category, offset,
normalize_by_image_size)
pc.compare_and_accumulate(groundtruth_category_array,
groundtruth_instance_array,
predicted_category_array, predicted_instance_array)
return pc.weighted_iou_per_class, pc.gt_area_per_class
def streaming_parsing_covering(groundtruth_categories,
groundtruth_instances,
predicted_categories,
predicted_instances,
num_classes,
max_instances_per_category,
ignored_label,
offset,
normalize_by_image_size=True,
name=None):
"""Aggregates the covering across calls with different input tensors.
See tf.metrics.* functions for comparable functionality and usage.
Args:
groundtruth_categories: A 2D uint16 tensor of groundtruth category labels.
groundtruth_instances: A 2D uint16 tensor of groundtruth instance labels.
predicted_categories: A 2D uint16 tensor of predicted category labels.
predicted_instances: A 2D uint16 tensor of predicted instance labels.
num_classes: Number of classes in the dataset as an integer.
max_instances_per_category: The maximum number of instances for each class
as an integer or integer tensor.
ignored_label: The class id to be ignored in evaluation as an integer or
integer tensor.
offset: The maximum number of unique labels as an integer or integer tensor.
normalize_by_image_size: Whether to normalize groundtruth region areas by
image size. If True, groundtruth instance areas and weighted IoUs will be
divided by the size of the corresponding image before accumulated across
the dataset.
name: An optional variable_scope name.
Returns:
coverings: A tensor of shape `[3, num_classes]`, where (1) per class
coverings, (2) per class sum of weighted IoUs, and (3) per class sum of
      groundtruth region areas are saved in the respective rows.
update_ops: List of operations that update the running overall parsing
covering.
Raises:
RuntimeError: If eager execution is enabled.
"""
if tf.executing_eagerly():
raise RuntimeError('Cannot aggregate when eager execution is enabled.')
input_args = [
tf.convert_to_tensor(groundtruth_categories, tf.uint16),
tf.convert_to_tensor(groundtruth_instances, tf.uint16),
tf.convert_to_tensor(predicted_categories, tf.uint16),
tf.convert_to_tensor(predicted_instances, tf.uint16),
tf.convert_to_tensor(num_classes, tf.int32),
tf.convert_to_tensor(max_instances_per_category, tf.int32),
tf.convert_to_tensor(ignored_label, tf.int32),
tf.convert_to_tensor(offset, tf.int32),
tf.convert_to_tensor(normalize_by_image_size, tf.bool),
]
return_types = [
tf.float64,
tf.float64,
]
with tf.variable_scope(name, 'streaming_parsing_covering', input_args):
covering_results = tf.py_func(
_parsing_covering_helper, input_args, return_types, stateful=False)
weighted_iou_per_class, gt_area_per_class = tuple(covering_results)
total_weighted_iou_per_class, updated_weighted_iou_per_class = (
_running_total(
weighted_iou_per_class, [num_classes],
name='weighted_iou_per_class_total'))
total_gt_area_per_class, updated_gt_area_per_class = _running_total(
gt_area_per_class, [num_classes], name='gt_area_per_class_total')
covering_per_class = _realdiv_maybe_zero(total_weighted_iou_per_class,
total_gt_area_per_class)
coverings = tf.stack([
covering_per_class,
total_weighted_iou_per_class,
total_gt_area_per_class,
],
axis=0)
update_ops = [updated_weighted_iou_per_class, updated_gt_area_per_class]
return coverings, update_ops
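# streaming_parsing_covering follows the same update_ops/session pattern as the
# streaming_panoptic_quality sketch above. A sketch of reducing its output to a
# single Parsing Covering number, mirroring ParsingCovering.result() and
# assuming `coverings` has been evaluated to a [3, num_classes] numpy array
# with ignored_label in [0, num_classes):
#
#   per_class_covering, weighted_iou, gt_area = coverings
#   valid = gt_area > 0
#   valid[ignored_label] = False
#   parsing_covering = per_class_covering[valid].mean()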