Unverified Commit 1f8b5b27 authored by Simon Geisler, committed by GitHub

Merge branch 'master' into master

parents 0eeeaf98 8fcf177e
## Download and preprocess Criteo TB dataset
[Apache Beam](https://beam.apache.org) enables distributed preprocessing of the
dataset and can be run on
[Google Cloud Dataflow](https://cloud.google.com/dataflow/). The preprocessing
scripts can also be run locally via the DirectRunner, provided that the local
host has enough CPU, memory, and storage.

Install the required packages:
```bash
python3 setup.py install
```
Set up the following environment variables, replacing `bucket-name` with the
name of your Cloud Storage bucket and `my-gcp-project` with your GCP project
name.
```bash
export STORAGE_BUCKET=gs://bucket-name
export PROJECT=my-gcp-project
export REGION=us-central1
```
Note: when running locally, the environment variables above are not needed; a
local path can be used instead of `gs://bucket-name`. Also consider passing a
smaller `--max_vocab_size` argument.
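For example, a purely local run can reuse the commands below with a local path
and the DirectRunner (illustrative settings; adjust the paths and vocabulary
size to your machine):
```bash
# Local stand-in for the GCS bucket (illustrative path):
export STORAGE_BUCKET=/data/criteo
# For each command below: use --runner DirectRunner instead of DataflowRunner,
# drop --project/--region, and consider e.g. --max_vocab_size 100000.
```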
1. Download raw
[Criteo TB dataset](https://labs.criteo.com/2013/12/download-terabyte-click-logs/)
to a GCS bucket.
Organize the data in the following way:
* The files `day_0.gz`, `day_1.gz`, ..., `day_22.gz` in
  `${STORAGE_BUCKET}/criteo_raw/train/`
* The file `day_23.gz` in `${STORAGE_BUCKET}/criteo_raw/test/`
2. Shard the raw training/test data into multiple files.
```bash
python3 shard_rebalancer.py \
--input_path "${STORAGE_BUCKET}/criteo_raw/train/*" \
--output_path "${STORAGE_BUCKET}/criteo_raw_sharded/train/train" \
--num_output_files 1024 --filetype csv --runner DataflowRunner \
--project ${PROJECT} --region ${REGION}
```
```bash
python3 shard_rebalancer.py \
--input_path "${STORAGE_BUCKET}/criteo_raw/test/*" \
--output_path "${STORAGE_BUCKET}/criteo_raw_sharded/test/test" \
--num_output_files 64 --filetype csv --runner DataflowRunner \
--project ${PROJECT} --region ${REGION}
```
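A quick way to sanity-check this step is to count the output shards (assuming
the `gsutil` CLI is available):
```bash
gsutil ls "${STORAGE_BUCKET}/criteo_raw_sharded/train/*" | wc -l  # expect 1024
gsutil ls "${STORAGE_BUCKET}/criteo_raw_sharded/test/*" | wc -l   # expect 64
```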
3. Generate vocabulary and preprocess the data.
Generate vocabulary:
```bash
python3 criteo_preprocess.py \
--input_path "${STORAGE_BUCKET}/criteo_raw_sharded/*/*" \
--output_path "${STORAGE_BUCKET}/criteo/" \
--temp_dir "${STORAGE_BUCKET}/criteo_vocab/" \
--vocab_gen_mode --runner DataflowRunner --max_vocab_size 5000000 \
--project ${PROJECT} --region ${REGION}
```
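The vocabulary run writes one vocabulary file per categorical feature under the
temp directory; they can be inspected before the second pass (again assuming
`gsutil` is available):
```bash
gsutil ls "${STORAGE_BUCKET}/criteo_vocab/tftransform_tmp/" | head
```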
Preprocess training and test data:
```bash
python3 criteo_preprocess.py \
--input_path "${STORAGE_BUCKET}/criteo_raw_sharded/train/*" \
--output_path "${STORAGE_BUCKET}/criteo/train/train" \
--temp_dir "${STORAGE_BUCKET}/criteo_vocab/" \
--runner DataflowRunner --max_vocab_size 5000000 \
--project ${PROJECT} --region ${REGION}
```
```bash
python3 criteo_preprocess.py \
--input_path "${STORAGE_BUCKET}/criteo_raw_sharded/test/*" \
--output_path "${STORAGE_BUCKET}/criteo/test/test" \
--temp_dir "${STORAGE_BUCKET}/criteo_vocab/" \
--runner DataflowRunner --max_vocab_size 5000000 \
--project ${PROJECT} --region ${REGION}
```
4. (Optional) Re-balance the dataset.
```bash
python3 shard_rebalancer.py \
--input_path "${STORAGE_BUCKET}/criteo/train/*" \
--output_path "${STORAGE_BUCKET}/criteo_balanced/train/train" \
--num_output_files 8192 --filetype csv --runner DataflowRunner \
--project ${PROJECT} --region ${REGION}
```
```bash
python3 shard_rebalancer.py \
--input_path "${STORAGE_BUCKET}/criteo/test/*" \
--output_path "${STORAGE_BUCKET}/criteo_balanced/test/test" \
--num_output_files 1024 --filetype csv --runner DataflowRunner \
--project ${PROJECT} --region ${REGION}
```
At this point the training and test data are in:
* `${STORAGE_BUCKET}/criteo_balanced/train/`
* `${STORAGE_BUCKET}/criteo_balanced/test/`
All other intermediate directories can be removed.
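For example, once the balanced data is verified, the intermediate outputs might
be removed as follows (destructive; double-check the paths first):
```bash
gsutil -m rm -r "${STORAGE_BUCKET}/criteo_raw_sharded" "${STORAGE_BUCKET}/criteo"
```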
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""TFX beam preprocessing pipeline for Criteo data.
Preprocessing util for criteo data. Transformations:
1. Fill missing features with zeros.
2. Set negative integer features to zeros.
3. Normalize integer features using log(x+1).
4. For categorical features (hex), convert to integer and take value modulus the
max_vocab_size value.
Usage:
For raw Criteo data, this script should be run twice.
First run should set vocab_gen_mode to true. This run is used to generate
vocabulary files in the temp_dir location.
Second run should set vocab_gen_mode to false. It is necessary to point to the
same temp_dir used during the first run.
"""
import argparse
import datetime
import os
from absl import logging
import apache_beam as beam
import numpy as np
import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata
from tensorflow_transform.tf_metadata import schema_utils
from tfx_bsl.public import tfxio
parser = argparse.ArgumentParser()
parser.add_argument(
"--input_path",
default=None,
required=True,
help="Input path. Be sure to set this to cover all data, to ensure "
"that sparse vocabs are complete.")
parser.add_argument(
"--output_path",
default=None,
required=True,
help="Output path.")
parser.add_argument(
"--temp_dir",
default=None,
required=True,
help="Directory to store temporary metadata. Important because vocab "
"dictionaries will be stored here. Co-located with data, ideally.")
parser.add_argument(
"--csv_delimeter",
default="\t",
help="Delimeter string for input and output.")
parser.add_argument(
"--vocab_gen_mode",
action="store_true",
default=False,
help="If it is set, process full dataset and do not write CSV output. In "
"this mode, See temp_dir for vocab files. input_path should cover all "
"data, e.g. train, test, eval.")
parser.add_argument(
"--runner",
help="Runner for Apache Beam, needs to be one of {DirectRunner, "
"DataflowRunner}.",
default="DirectRunner")
parser.add_argument(
"--project",
default=None,
help="ID of your project. Ignored by DirectRunner.")
parser.add_argument(
"--region",
default=None,
help="Region. Ignored by DirectRunner.")
parser.add_argument(
"--max_vocab_size",
type=int,
default=10_000_000,
help="Max index range, categorical features convert to integer and take "
"value modulus the max_vocab_size")
args = parser.parse_args()
NUM_NUMERIC_FEATURES = 13
NUMERIC_FEATURE_KEYS = [
f"int-feature-{x + 1}" for x in range(NUM_NUMERIC_FEATURES)]
CATEGORICAL_FEATURE_KEYS = [
"categorical-feature-%d" % x for x in range(NUM_NUMERIC_FEATURES + 1, 40)]
LABEL_KEY = "clicked"
# Data is first preprocessed in pure Apache Beam using numpy.
# This fills in missing values and converts hexadecimal-encoded values to
# integers, so the schema can declare every feature as a FixedLenFeature
# for TensorFlow Transform.
FEATURE_SPEC = dict([(name, tf.io.FixedLenFeature([], dtype=tf.int64))
for name in CATEGORICAL_FEATURE_KEYS] +
[(name, tf.io.FixedLenFeature([], dtype=tf.float32))
for name in NUMERIC_FEATURE_KEYS] +
[(LABEL_KEY, tf.io.FixedLenFeature([], tf.float32))])
INPUT_METADATA = dataset_metadata.DatasetMetadata(
schema_utils.schema_from_feature_spec(FEATURE_SPEC))
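# Illustrative examples of the resulting feature spec entries:
#   "int-feature-1":          tf.io.FixedLenFeature([], tf.float32)
#   "categorical-feature-14": tf.io.FixedLenFeature([], tf.int64)
#   "clicked":                tf.io.FixedLenFeature([], tf.float32)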
def apply_vocab_fn(inputs):
"""Preprocessing fn for sparse features.
Applies vocab to bucketize sparse features. This function operates using
previously-created vocab files.
Pre-condition: Full vocab has been materialized.
Args:
inputs: Input features to transform.
Returns:
Output dict with transformed features.
"""
outputs = {}
outputs[LABEL_KEY] = inputs[LABEL_KEY]
for key in NUMERIC_FEATURE_KEYS:
outputs[key] = inputs[key]
for idx, key in enumerate(CATEGORICAL_FEATURE_KEYS):
vocab_fn = os.path.join(
args.temp_dir, "tftransform_tmp", "feature_{}_vocab".format(idx))
outputs[key] = tft.apply_vocabulary(inputs[key], vocab_fn)
return outputs
def compute_vocab_fn(inputs):
"""Preprocessing fn for sparse features.
This function computes unique IDs for the sparse features. We rely on implicit
behavior which writes the vocab files to the vocab_filename specified in
tft.compute_and_apply_vocabulary.
Pre-condition: Sparse features have been converted to integer and mod'ed with
args.max_vocab_size.
Args:
inputs: Input features to transform.
Returns:
Output dict with transformed features.
"""
outputs = {}
outputs[LABEL_KEY] = inputs[LABEL_KEY]
for key in NUMERIC_FEATURE_KEYS:
outputs[key] = inputs[key]
for idx, key in enumerate(CATEGORICAL_FEATURE_KEYS):
outputs[key] = tft.compute_and_apply_vocabulary(
x=inputs[key],
vocab_filename="feature_{}_vocab".format(idx))
return outputs
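# Note: tft.compute_and_apply_vocabulary materializes each
# "feature_{idx}_vocab" file under the tft_beam.Context temp_dir (in its
# tftransform_tmp subdirectory), which is where apply_vocab_fn above reads
# them back on the second (non-vocab_gen_mode) run.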
class FillMissing(beam.DoFn):
"""Fills missing elements with zero string value."""
def process(self, element):
elem_list = element.split(args.csv_delimeter)
out_list = []
for val in elem_list:
new_val = "0" if not val else val
out_list.append(new_val)
yield (args.csv_delimeter).join(out_list)
class NegsToZeroLog(beam.DoFn):
"""For int features, sets negative values to zero and takes log(x+1)."""
def process(self, element):
elem_list = element.split(args.csv_delimeter)
out_list = []
for i, val in enumerate(elem_list):
if i > 0 and i <= NUM_NUMERIC_FEATURES:
new_val = "0" if int(val) < 0 else val
new_val = np.log(int(new_val) + 1)
new_val = str(new_val)
else:
new_val = val
out_list.append(new_val)
yield (args.csv_delimeter).join(out_list)
class HexToIntModRange(beam.DoFn):
"""For categorical features, takes decimal value and mods with max value."""
def process(self, element):
elem_list = element.split(args.csv_delimeter)
out_list = []
for i, val in enumerate(elem_list):
if i > NUM_NUMERIC_FEATURES:
new_val = int(val, 16) % args.max_vocab_size
else:
new_val = val
out_list.append(str(new_val))
yield str.encode((args.csv_delimeter).join(out_list))
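# Worked example of the three DoFns on one tab-separated record, assuming
# max_vocab_size=5000000 (illustrative values: label, an empty integer
# feature, a negative integer feature, ..., a hex categorical feature):
#   raw record:       "1\t\t-2\t...\t9a36f2"
#   FillMissing:      ""       -> "0"
#   NegsToZeroLog:    "-2"     -> "0" -> str(np.log(0 + 1)) == "0.0"
#   HexToIntModRange: "9a36f2" -> int("9a36f2", 16) % 5000000 == 106610
# The last DoFn yields UTF-8 bytes because the downstream
# tfxio.BeamRecordCsvTFXIO source consumes encoded records.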
def transform_data(data_path, output_path):
"""Preprocesses Criteo data.
Two processing modes are supported. Raw data will require two passes.
If full vocab files already exist, only one pass is necessary.
Args:
data_path: File(s) to read.
output_path: Path to which output CSVs are written, if necessary.
"""
preprocessing_fn = compute_vocab_fn if args.vocab_gen_mode else apply_vocab_fn
gcp_project = args.project
region = args.region
job_name = (f"criteo-preprocessing-"
f"{datetime.datetime.now().strftime('%y%m%d-%H%M%S')}")
# Set up the Beam pipeline.
pipeline_options = None
if args.runner == "DataflowRunner":
options = {
"staging_location": os.path.join(output_path, "tmp", "staging"),
"temp_location": os.path.join(output_path, "tmp"),
"job_name": job_name,
"project": gcp_project,
"save_main_session": True,
"region": region,
"setup_file": "./setup.py",
}
pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)
elif args.runner == "DirectRunner":
pipeline_options = beam.options.pipeline_options.DirectOptions(
direct_num_workers=os.cpu_count(),
direct_running_mode="multi_threading")
with beam.Pipeline(args.runner, options=pipeline_options) as pipeline:
with tft_beam.Context(temp_dir=args.temp_dir):
processed_lines = (
pipeline
# Read in TSV data.
| beam.io.ReadFromText(data_path, coder=beam.coders.StrUtf8Coder())
# Fill in missing elements with the defaults (zeros).
| "FillMissing" >> beam.ParDo(FillMissing())
# For numerical features, set negatives to zero. Then take log(x+1).
| "NegsToZeroLog" >> beam.ParDo(NegsToZeroLog())
# For categorical features, mod the values with vocab size.
| "HexToIntModRange" >> beam.ParDo(HexToIntModRange()))
# CSV reader: list the columns in order, as the dataset schema is not ordered.
ordered_columns = [LABEL_KEY
] + NUMERIC_FEATURE_KEYS + CATEGORICAL_FEATURE_KEYS
csv_tfxio = tfxio.BeamRecordCsvTFXIO(
physical_format="text",
column_names=ordered_columns,
delimiter=args.csv_delimeter,
schema=INPUT_METADATA.schema)
converted_data = (
processed_lines
| "DecodeData" >> csv_tfxio.BeamSource())
raw_dataset = (converted_data, csv_tfxio.TensorAdapterConfig())
# The TFXIO output format is chosen for improved performance.
transformed_dataset, _ = (
raw_dataset | tft_beam.AnalyzeAndTransformDataset(
preprocessing_fn, output_record_batches=False))
# The transformed metadata schema is used below to encode the CSV output.
transformed_data, transformed_metadata = transformed_dataset
if not args.vocab_gen_mode:
# Write to CSV.
transformed_csv_coder = tft.coders.CsvCoder(
ordered_columns, transformed_metadata.schema,
delimiter=args.csv_delimeter)
_ = (
transformed_data
| "EncodeDataCsv" >> beam.Map(transformed_csv_coder.encode)
| "WriteDataCsv" >> beam.io.WriteToText(output_path))
if __name__ == "__main__":
logging.set_verbosity(logging.INFO)
transform_data(data_path=args.input_path,
output_path=args.output_path)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Setup configuration for criteo dataset preprocessing.
This is used while running Tensorflow transform on Cloud Dataflow.
"""
import setuptools
version = "0.1.0"
if __name__ == "__main__":
setuptools.setup(
name="criteo_preprocessing",
version=version,
install_requires=["tensorflow-transform"],
packages=setuptools.find_packages(),
)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Rebalance a set of CSV/TFRecord shards to a target number of files.
"""
import argparse
import datetime
import os
import apache_beam as beam
import tensorflow as tf
parser = argparse.ArgumentParser()
parser.add_argument(
"--input_path",
default=None,
required=True,
help="Input path.")
parser.add_argument(
"--output_path",
default=None,
required=True,
help="Output path.")
parser.add_argument(
"--num_output_files",
type=int,
default=256,
help="Number of output file shards.")
parser.add_argument(
"--filetype",
default="tfrecord",
help="File type, needs to be one of {tfrecord, csv}.")
parser.add_argument(
"--project",
default=None,
help="ID (not name) of your project. Ignored by DirectRunner")
parser.add_argument(
"--runner",
help="Runner for Apache Beam, needs to be one of "
"{DirectRunner, DataflowRunner}.",
default="DirectRunner")
parser.add_argument(
"--region",
default=None,
help="region")
args = parser.parse_args()
def rebalance_data_shards():
"""Rebalances data shards."""
def csv_pipeline(pipeline: beam.Pipeline):
"""Rebalances CSV dataset.
Args:
pipeline: Beam pipeline object.
"""
_ = (
pipeline
| beam.io.ReadFromText(args.input_path)
| beam.io.WriteToText(args.output_path,
num_shards=args.num_output_files))
def tfrecord_pipeline(pipeline: beam.Pipeline):
"""Rebalances TFRecords dataset.
Args:
pipeline: Beam pipeline object.
"""
example_coder = beam.coders.ProtoCoder(tf.train.Example)
_ = (
pipeline
| beam.io.ReadFromTFRecord(args.input_path, coder=example_coder)
| beam.io.WriteToTFRecord(args.output_path, file_name_suffix=".tfrecord",
coder=example_coder,
num_shards=args.num_output_files))
job_name = (
f"shard-rebalancer-{datetime.datetime.now().strftime('%y%m%d-%H%M%S')}")
# Set up the Beam pipeline.
options = {
"staging_location": os.path.join(args.output_path, "tmp", "staging"),
"temp_location": os.path.join(args.output_path, "tmp"),
"job_name": job_name,
"project": args.project,
"save_main_session": True,
"region": args.region,
}
opts = beam.pipeline.PipelineOptions(flags=[], **options)
with beam.Pipeline(args.runner, options=opts) as pipeline:
if args.filetype == "tfrecord":
tfrecord_pipeline(pipeline)
elif args.filetype == "csv":
csv_pipeline(pipeline)
if __name__ == "__main__":
rebalance_data_shards()
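A minimal local invocation of the rebalancer might look like this (illustrative
paths; per the flag help above, the GCP flags are ignored by the DirectRunner):
```bash
python3 shard_rebalancer.py \
  --input_path "/data/criteo_raw/train/*" \
  --output_path "/data/criteo_raw_sharded/train/train" \
  --num_output_files 64 --filetype csv --runner DirectRunner
```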
@@ -46,16 +46,13 @@ flags.DEFINE_bool('search_hints', True,
flags.DEFINE_string('site_path', '/api_docs/python',
'Path prefix in the _toc.yaml')
flags.DEFINE_bool('gen_report', False,
'Generate an API report containing the health of the '
'docstrings of the public API.')
PROJECT_SHORT_NAME = 'tfnlp'
PROJECT_FULL_NAME = 'TensorFlow Official Models - NLP Modeling Library'
def gen_api_docs(code_url_prefix, site_path, output_dir, gen_report,
project_short_name, project_full_name, search_hints):
def gen_api_docs(code_url_prefix, site_path, output_dir, project_short_name,
project_full_name, search_hints):
"""Generates api docs for the tensorflow docs package."""
build_api_docs_lib.hide_module_model_and_layer_methods()
del tfnlp.layers.MultiHeadAttention
@@ -68,7 +65,6 @@ def gen_api_docs(code_url_prefix, site_path, output_dir, gen_report,
code_url_prefix=code_url_prefix,
search_hints=search_hints,
site_path=site_path,
gen_report=gen_report,
callbacks=[public_api.explicit_package_contents_filter],
)
@@ -84,7 +80,6 @@ def main(argv):
code_url_prefix=FLAGS.code_url_prefix,
site_path=FLAGS.site_path,
output_dir=FLAGS.output_dir,
gen_report=FLAGS.gen_report,
project_short_name=PROJECT_SHORT_NAME,
project_full_name=PROJECT_FULL_NAME,
search_hints=FLAGS.search_hints)
@@ -46,16 +46,12 @@ flags.DEFINE_bool('search_hints', True,
flags.DEFINE_string('site_path', 'tfvision/api_docs/python',
'Path prefix in the _toc.yaml')
flags.DEFINE_bool('gen_report', False,
'Generate an API report containing the health of the '
'docstrings of the public API.')
PROJECT_SHORT_NAME = 'tfvision'
PROJECT_FULL_NAME = 'TensorFlow Official Models - Vision Modeling Library'
def gen_api_docs(code_url_prefix, site_path, output_dir, gen_report,
project_short_name, project_full_name, search_hints):
def gen_api_docs(code_url_prefix, site_path, output_dir, project_short_name,
project_full_name, search_hints):
"""Generates api docs for the tensorflow docs package."""
build_api_docs_lib.hide_module_model_and_layer_methods()
@@ -66,7 +62,6 @@ def gen_api_docs(code_url_prefix, site_path, output_dir, gen_report,
code_url_prefix=code_url_prefix,
search_hints=search_hints,
site_path=site_path,
gen_report=gen_report,
callbacks=[public_api.explicit_package_contents_filter],
)
@@ -82,7 +77,6 @@ def main(argv):
code_url_prefix=FLAGS.code_url_prefix,
site_path=FLAGS.site_path,
output_dir=FLAGS.output_dir,
gen_report=FLAGS.gen_report,
project_short_name=PROJECT_SHORT_NAME,
project_full_name=PROJECT_FULL_NAME,
search_hints=FLAGS.search_hints)
@@ -32,6 +32,7 @@ class ResNet(hyperparams.Config):
stochastic_depth_drop_rate: float = 0.0
resnetd_shortcut: bool = False
replace_stem_max_pool: bool = False
bn_trainable: bool = True
@dataclasses.dataclass
@@ -15,15 +15,44 @@
# Lint as: python3
"""Common configurations."""
from typing import Optional, List
# Import libraries
import dataclasses
from typing import Optional
# Import libraries
from official.core import config_definitions as cfg
from official.modeling import hyperparams
@dataclasses.dataclass
class TfExampleDecoder(hyperparams.Config):
"""A simple TF Example decoder config."""
regenerate_source_id: bool = False
mask_binarize_threshold: Optional[float] = None
@dataclasses.dataclass
class TfExampleDecoderLabelMap(hyperparams.Config):
"""TF Example decoder with label map config."""
regenerate_source_id: bool = False
mask_binarize_threshold: Optional[float] = None
label_map: str = ''
@dataclasses.dataclass
class DataDecoder(hyperparams.OneOfConfig):
"""Data decoder config.
Attributes:
type: 'str', type of data decoder to be used, one of the fields below.
simple_decoder: simple TF Example decoder config.
label_map_decoder: TF Example decoder with label map config.
"""
type: Optional[str] = 'simple_decoder'
simple_decoder: TfExampleDecoder = TfExampleDecoder()
label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap()
@dataclasses.dataclass
class RandAugment(hyperparams.Config):
"""Configuration for RandAugment."""
@@ -14,11 +14,10 @@
# Lint as: python3
"""Image classification configuration definition."""
import dataclasses
import os
from typing import List, Optional
import dataclasses
from official.core import config_definitions as cfg
from official.core import exp_factory
from official.modeling import hyperparams
@@ -47,6 +46,7 @@ class DataConfig(cfg.DataConfig):
label_field_key: str = 'image/class/label'
decode_jpeg_only: bool = True
mixup_and_cutmix: Optional[common.MixupAndCutmix] = None
decoder: Optional[common.DataDecoder] = common.DataDecoder()
# Keep for backward compatibility.
aug_policy: Optional[str] = None # None, 'autoaug', or 'randaug'.
@@ -17,7 +17,7 @@
import dataclasses
import os
from typing import List, Optional
from typing import List, Optional, Union
from official.core import config_definitions as cfg
from official.core import exp_factory
@@ -29,26 +29,6 @@ from official.vision.beta.configs import backbones
# pylint: disable=missing-class-docstring
@dataclasses.dataclass
class TfExampleDecoder(hyperparams.Config):
regenerate_source_id: bool = False
mask_binarize_threshold: Optional[float] = None
@dataclasses.dataclass
class TfExampleDecoderLabelMap(hyperparams.Config):
regenerate_source_id: bool = False
mask_binarize_threshold: Optional[float] = None
label_map: str = ''
@dataclasses.dataclass
class DataDecoder(hyperparams.OneOfConfig):
type: Optional[str] = 'simple_decoder'
simple_decoder: TfExampleDecoder = TfExampleDecoder()
label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap()
@dataclasses.dataclass
class Parser(hyperparams.Config):
num_channels: int = 3
@@ -73,7 +53,7 @@ class DataConfig(cfg.DataConfig):
global_batch_size: int = 0
is_training: bool = False
dtype: str = 'bfloat16'
decoder: DataDecoder = DataDecoder()
decoder: common.DataDecoder = common.DataDecoder()
parser: Parser = Parser()
shuffle_buffer_size: int = 10000
file_type: str = 'tfrecord'
@@ -221,7 +201,8 @@ class MaskRCNNTask(cfg.TaskConfig):
drop_remainder=False)
losses: Losses = Losses()
init_checkpoint: Optional[str] = None
init_checkpoint_modules: str = 'all' # all or backbone
init_checkpoint_modules: Union[
str, List[str]] = 'all' # all, backbone, and/or decoder
annotation_file: Optional[str] = None
per_category_metrics: bool = False
# If set, we only use masks for the specified class IDs.
@@ -15,9 +15,9 @@
# Lint as: python3
"""RetinaNet configuration definition."""
import os
from typing import List, Optional
import dataclasses
import os
from typing import List, Optional, Union
from official.core import config_definitions as cfg
from official.core import exp_factory
@@ -29,22 +29,22 @@ from official.vision.beta.configs import backbones
# pylint: disable=missing-class-docstring
# Keep for backward compatibility.
@dataclasses.dataclass
class TfExampleDecoder(hyperparams.Config):
regenerate_source_id: bool = False
class TfExampleDecoder(common.TfExampleDecoder):
"""A simple TF Example decoder config."""
# Keep for backward compatibility.
@dataclasses.dataclass
class TfExampleDecoderLabelMap(hyperparams.Config):
regenerate_source_id: bool = False
label_map: str = ''
class TfExampleDecoderLabelMap(common.TfExampleDecoderLabelMap):
"""TF Example decoder with label map config."""
# Keep for backward compatibility.
@dataclasses.dataclass
class DataDecoder(hyperparams.OneOfConfig):
type: Optional[str] = 'simple_decoder'
simple_decoder: TfExampleDecoder = TfExampleDecoder()
label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap()
class DataDecoder(common.DataDecoder):
"""Data decoder config."""
@dataclasses.dataclass
@@ -55,6 +55,7 @@ class Parser(hyperparams.Config):
aug_rand_hflip: bool = False
aug_scale_min: float = 1.0
aug_scale_max: float = 1.0
aug_policy: Optional[str] = None
skip_crowd_during_training: bool = True
max_num_instances: int = 100
@@ -66,7 +67,7 @@ class DataConfig(cfg.DataConfig):
global_batch_size: int = 0
is_training: bool = False
dtype: str = 'bfloat16'
decoder: DataDecoder = DataDecoder()
decoder: common.DataDecoder = common.DataDecoder()
parser: Parser = Parser()
shuffle_buffer_size: int = 10000
file_type: str = 'tfrecord'
@@ -144,7 +145,8 @@ class RetinaNetTask(cfg.TaskConfig):
validation_data: DataConfig = DataConfig(is_training=False)
losses: Losses = Losses()
init_checkpoint: Optional[str] = None
init_checkpoint_modules: str = 'all' # all or backbone
init_checkpoint_modules: Union[
str, List[str]] = 'all' # all, backbone, and/or decoder
annotation_file: Optional[str] = None
per_category_metrics: bool = False
export_config: ExportConfig = ExportConfig()
@@ -14,10 +14,10 @@
# Lint as: python3
"""Semantic segmentation configuration definition."""
import dataclasses
import os
from typing import List, Optional, Union
import dataclasses
import numpy as np
from official.core import exp_factory
@@ -50,8 +50,10 @@ class DataConfig(cfg.DataConfig):
aug_scale_min: float = 1.0
aug_scale_max: float = 1.0
aug_rand_hflip: bool = True
aug_policy: Optional[str] = None
drop_remainder: bool = True
file_type: str = 'tfrecord'
decoder: Optional[common.DataDecoder] = common.DataDecoder()
@dataclasses.dataclass
@@ -120,7 +122,7 @@ class SemanticSegmentationTask(cfg.TaskConfig):
def semantic_segmentation() -> cfg.ExperimentConfig:
"""Semantic segmentation general."""
return cfg.ExperimentConfig(
task=SemanticSegmentationModel(),
task=SemanticSegmentationTask(),
trainer=cfg.TrainerConfig(),
restrictions=[
'task.train_data.is_training != None',
@@ -127,6 +127,7 @@ class ResNet(tf.keras.Model):
kernel_initializer: str = 'VarianceScaling',
kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
bn_trainable: bool = True,
**kwargs):
"""Initializes a ResNet model.
@@ -153,6 +154,8 @@ class ResNet(tf.keras.Model):
Conv2D. Default to None.
bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D.
Default to None.
bn_trainable: A `bool` that indicates whether batch norm layers should be
trainable. Default to True.
**kwargs: Additional keyword arguments to be passed.
"""
self._model_id = model_id
@@ -174,6 +177,7 @@ class ResNet(tf.keras.Model):
self._kernel_initializer = kernel_initializer
self._kernel_regularizer = kernel_regularizer
self._bias_regularizer = bias_regularizer
self._bn_trainable = bn_trainable
if tf.keras.backend.image_data_format() == 'channels_last':
bn_axis = -1
@@ -195,7 +199,10 @@
bias_regularizer=self._bias_regularizer)(
inputs)
x = self._norm(
axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)(
axis=bn_axis,
momentum=norm_momentum,
epsilon=norm_epsilon,
trainable=bn_trainable)(
x)
x = tf_utils.get_activation(activation, use_keras_layer=True)(x)
elif stem_type == 'v1':
@@ -210,7 +217,10 @@
bias_regularizer=self._bias_regularizer)(
inputs)
x = self._norm(
axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)(
axis=bn_axis,
momentum=norm_momentum,
epsilon=norm_epsilon,
trainable=bn_trainable)(
x)
x = tf_utils.get_activation(activation, use_keras_layer=True)(x)
x = layers.Conv2D(
@@ -224,7 +234,10 @@
bias_regularizer=self._bias_regularizer)(
x)
x = self._norm(
axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)(
axis=bn_axis,
momentum=norm_momentum,
epsilon=norm_epsilon,
trainable=bn_trainable)(
x)
x = tf_utils.get_activation(activation, use_keras_layer=True)(x)
x = layers.Conv2D(
@@ -238,7 +251,10 @@
bias_regularizer=self._bias_regularizer)(
x)
x = self._norm(
axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)(
axis=bn_axis,
momentum=norm_momentum,
epsilon=norm_epsilon,
trainable=bn_trainable)(
x)
x = tf_utils.get_activation(activation, use_keras_layer=True)(x)
else:
@@ -256,7 +272,10 @@
bias_regularizer=self._bias_regularizer)(
x)
x = self._norm(
axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)(
axis=bn_axis,
momentum=norm_momentum,
epsilon=norm_epsilon,
trainable=bn_trainable)(
x)
x = tf_utils.get_activation(activation, use_keras_layer=True)(x)
else:
@@ -324,7 +343,8 @@
activation=self._activation,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_momentum,
norm_epsilon=self._norm_epsilon)(
norm_epsilon=self._norm_epsilon,
bn_trainable=self._bn_trainable)(
inputs)
for _ in range(1, block_repeats):
@@ -341,7 +361,8 @@
activation=self._activation,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_momentum,
norm_epsilon=self._norm_epsilon)(
norm_epsilon=self._norm_epsilon,
bn_trainable=self._bn_trainable)(
x)
return tf.keras.layers.Activation('linear', name=name)(x)
@@ -362,6 +383,7 @@ class ResNet(tf.keras.Model):
'kernel_initializer': self._kernel_initializer,
'kernel_regularizer': self._kernel_regularizer,
'bias_regularizer': self._bias_regularizer,
'bn_trainable': self._bn_trainable
}
return config_dict
@@ -400,4 +422,5 @@ def build_resnet(
use_sync_bn=norm_activation_config.use_sync_bn,
norm_momentum=norm_activation_config.norm_momentum,
norm_epsilon=norm_activation_config.norm_epsilon,
kernel_regularizer=l2_regularizer)
kernel_regularizer=l2_regularizer,
bn_trainable=backbone_cfg.bn_trainable)
@@ -135,6 +135,7 @@ class ResNetTest(parameterized.TestCase, tf.test.TestCase):
kernel_initializer='VarianceScaling',
kernel_regularizer=None,
bias_regularizer=None,
bn_trainable=True
)
network = resnet.ResNet(**kwargs)
@@ -72,6 +72,7 @@ class ResidualBlock(tf.keras.layers.Layer):
use_sync_bn=False,
norm_momentum=0.99,
norm_epsilon=0.001,
bn_trainable=True,
**kwargs):
"""Initializes a residual block with BN after convolutions.
@@ -99,6 +100,8 @@
use_sync_bn: A `bool`. If True, use synchronized batch normalization.
norm_momentum: A `float` of normalization momentum for the moving average.
norm_epsilon: A `float` added to variance to avoid dividing by zero.
bn_trainable: A `bool` that indicates whether batch norm layers should be
trainable. Default to True.
**kwargs: Additional keyword arguments to be passed.
"""
super(ResidualBlock, self).__init__(**kwargs)
@@ -126,6 +129,7 @@
else:
self._bn_axis = 1
self._activation_fn = tf_utils.get_activation(activation)
self._bn_trainable = bn_trainable
def build(self, input_shape):
if self._use_projection:
@@ -140,7 +144,8 @@
self._norm0 = self._norm(
axis=self._bn_axis,
momentum=self._norm_momentum,
epsilon=self._norm_epsilon)
epsilon=self._norm_epsilon,
trainable=self._bn_trainable)
self._conv1 = tf.keras.layers.Conv2D(
filters=self._filters,
@@ -154,7 +159,8 @@
self._norm1 = self._norm(
axis=self._bn_axis,
momentum=self._norm_momentum,
epsilon=self._norm_epsilon)
epsilon=self._norm_epsilon,
trainable=self._bn_trainable)
self._conv2 = tf.keras.layers.Conv2D(
filters=self._filters,
@@ -168,7 +174,8 @@
self._norm2 = self._norm(
axis=self._bn_axis,
momentum=self._norm_momentum,
epsilon=self._norm_epsilon)
epsilon=self._norm_epsilon,
trainable=self._bn_trainable)
if self._se_ratio and self._se_ratio > 0 and self._se_ratio <= 1:
self._squeeze_excitation = nn_layers.SqueezeExcitation(
@@ -203,7 +210,8 @@
'activation': self._activation,
'use_sync_bn': self._use_sync_bn,
'norm_momentum': self._norm_momentum,
'norm_epsilon': self._norm_epsilon
'norm_epsilon': self._norm_epsilon,
'bn_trainable': self._bn_trainable
}
base_config = super(ResidualBlock, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
@@ -249,6 +257,7 @@ class BottleneckBlock(tf.keras.layers.Layer):
use_sync_bn=False,
norm_momentum=0.99,
norm_epsilon=0.001,
bn_trainable=True,
**kwargs):
"""Initializes a standard bottleneck block with BN after convolutions.
@@ -277,6 +286,8 @@
use_sync_bn: A `bool`. If True, use synchronized batch normalization.
norm_momentum: A `float` of normalization momentum for the moving average.
norm_epsilon: A `float` added to variance to avoid dividing by zero.
bn_trainable: A `bool` that indicates whether batch norm layers should be
trainable. Default to True.
**kwargs: Additional keyword arguments to be passed.
"""
super(BottleneckBlock, self).__init__(**kwargs)
@@ -303,6 +314,7 @@
self._bn_axis = -1
else:
self._bn_axis = 1
self._bn_trainable = bn_trainable
def build(self, input_shape):
if self._use_projection:
@@ -330,7 +342,8 @@
self._norm0 = self._norm(
axis=self._bn_axis,
momentum=self._norm_momentum,
epsilon=self._norm_epsilon)
epsilon=self._norm_epsilon,
trainable=self._bn_trainable)
self._conv1 = tf.keras.layers.Conv2D(
filters=self._filters,
......@@ -343,7 +356,8 @@ class BottleneckBlock(tf.keras.layers.Layer):
self._norm1 = self._norm(
axis=self._bn_axis,
momentum=self._norm_momentum,
epsilon=self._norm_epsilon)
epsilon=self._norm_epsilon,
trainable=self._bn_trainable)
self._activation1 = tf_utils.get_activation(
self._activation, use_keras_layer=True)
@@ -360,7 +374,8 @@
self._norm2 = self._norm(
axis=self._bn_axis,
momentum=self._norm_momentum,
epsilon=self._norm_epsilon)
epsilon=self._norm_epsilon,
trainable=self._bn_trainable)
self._activation2 = tf_utils.get_activation(
self._activation, use_keras_layer=True)
@@ -375,7 +390,8 @@
self._norm3 = self._norm(
axis=self._bn_axis,
momentum=self._norm_momentum,
epsilon=self._norm_epsilon)
epsilon=self._norm_epsilon,
trainable=self._bn_trainable)
self._activation3 = tf_utils.get_activation(
self._activation, use_keras_layer=True)
@@ -414,7 +430,8 @@
'activation': self._activation,
'use_sync_bn': self._use_sync_bn,
'norm_momentum': self._norm_momentum,
'norm_epsilon': self._norm_epsilon
'norm_epsilon': self._norm_epsilon,
'bn_trainable': self._bn_trainable
}
base_config = super(BottleneckBlock, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
@@ -425,7 +425,7 @@ class PositionalEncoding(tf.keras.layers.Layer):
self._rezero = Scale(initializer=initializer, name='rezero')
state_prefix = state_prefix if state_prefix is not None else ''
self._state_prefix = state_prefix
self._frame_count_name = f'{state_prefix}/pos_enc_frame_count'
self._frame_count_name = f'{state_prefix}_pos_enc_frame_count'
def get_config(self):
"""Returns a dictionary containing the config used for initialization."""
@@ -523,7 +523,7 @@ class PositionalEncoding(tf.keras.layers.Layer):
inputs: An input `tf.Tensor`.
states: A `dict` of states such that, if any of the keys match for this
layer, will overwrite the contents of the buffer(s). Expected keys
include `state_prefix + '/pos_enc_frame_count'`.
include `state_prefix + '_pos_enc_frame_count'`.
output_states: A `bool`. If True, returns the output tensor and output
states. Returns just the output tensor otherwise.
@@ -587,8 +587,8 @@ class GlobalAveragePool3D(tf.keras.layers.Layer):
state_prefix = state_prefix if state_prefix is not None else ''
self._state_prefix = state_prefix
self._state_name = f'{state_prefix}/pool_buffer'
self._frame_count_name = f'{state_prefix}/pool_frame_count'
self._state_name = f'{state_prefix}_pool_buffer'
self._frame_count_name = f'{state_prefix}_pool_frame_count'
def get_config(self):
"""Returns a dictionary containing the config used for initialization."""
@@ -611,8 +611,8 @@ class GlobalAveragePool3D(tf.keras.layers.Layer):
inputs: An input `tf.Tensor`.
states: A `dict` of states such that, if any of the keys match for this
layer, will overwrite the contents of the buffer(s).
Expected keys include `state_prefix + '/pool_buffer'` and
`state_prefix + '/pool_frame_count'`.
Expected keys include `state_prefix + '_pool_buffer'` and
`state_prefix + '_pool_frame_count'`.
output_states: A `bool`. If True, returns the output tensor and output
states. Returns just the output tensor otherwise.
@@ -384,7 +384,7 @@ class MaskRCNNModelTest(parameterized.TestCase, tf.test.TestCase):
ckpt.save(os.path.join(save_dir, 'ckpt'))
partial_ckpt = tf.train.Checkpoint(backbone=backbone)
partial_ckpt.restore(tf.train.latest_checkpoint(
partial_ckpt.read(tf.train.latest_checkpoint(
save_dir)).expect_partial().assert_existing_objects_matched()
if include_mask:
@@ -646,3 +646,183 @@ def _saturation(image: tf.Tensor,
return augment.blend(tf.repeat(tf.image.rgb_to_grayscale(image), 3, axis=-1),
image,
saturation)
def random_crop_image_with_boxes_and_labels(img, boxes, labels, min_scale,
aspect_ratio_range,
min_overlap_params, max_retry):
"""Crops a random slice from the input image.
The function will correspondingly recompute the bounding boxes and filter out
outside boxes and their labels.
References:
[1] End-to-End Object Detection with Transformers
https://arxiv.org/abs/2005.12872
The preprocessing steps:
1. Sample a minimum IoU overlap.
2. For each trial, sample the new image width, height, and top-left corner.
3. Compute the IoUs of bounding boxes with the cropped image and retry if
the maximum IoU is below the sampled threshold.
4. Find boxes whose centers are in the cropped image.
5. Compute new bounding boxes in the cropped region and only select those
boxes' labels.
Args:
img: a 'Tensor' of shape [height, width, 3] representing the input image.
boxes: a 'Tensor' of shape [N, 4] representing the ground-truth bounding
boxes with (ymin, xmin, ymax, xmax).
labels: a 'Tensor' of shape [N,] representing the class labels of the boxes.
min_scale: a 'float' in [0.0, 1.0) indicating the lower bound of the random
scale variable.
aspect_ratio_range: a list of two 'float' that specifies the lower and upper
bound of the random aspect ratio.
min_overlap_params: a list of four 'float' representing the min value, max
value, step size, and offset for the minimum overlap sample.
max_retry: an 'int' representing the number of trials for cropping. If it is
exhausted, no cropping will be performed.
Returns:
img: a Tensor representing the random cropped image. Can be the
original image if max_retry is exhausted.
boxes: a Tensor representing the bounding boxes in the cropped image.
labels: a Tensor representing the new bounding boxes' labels.
"""
shape = tf.shape(img)
original_h = shape[0]
original_w = shape[1]
minval, maxval, step, offset = min_overlap_params
min_overlap = tf.math.floordiv(
tf.random.uniform([], minval=minval, maxval=maxval), step) * step - offset
min_overlap = tf.clip_by_value(min_overlap, 0.0, 1.1)
if min_overlap > 1.0:
return img, boxes, labels
aspect_ratio_low = aspect_ratio_range[0]
aspect_ratio_high = aspect_ratio_range[1]
for _ in tf.range(max_retry):
scale_h = tf.random.uniform([], min_scale, 1.0)
scale_w = tf.random.uniform([], min_scale, 1.0)
new_h = tf.cast(
scale_h * tf.cast(original_h, dtype=tf.float32), dtype=tf.int32)
new_w = tf.cast(
scale_w * tf.cast(original_w, dtype=tf.float32), dtype=tf.int32)
# Aspect ratio has to be in the prespecified range
aspect_ratio = new_h / new_w
if aspect_ratio_low > aspect_ratio or aspect_ratio > aspect_ratio_high:
continue
left = tf.random.uniform([], 0, original_w - new_w, dtype=tf.int32)
right = left + new_w
top = tf.random.uniform([], 0, original_h - new_h, dtype=tf.int32)
bottom = top + new_h
normalized_left = tf.cast(
left, dtype=tf.float32) / tf.cast(
original_w, dtype=tf.float32)
normalized_right = tf.cast(
right, dtype=tf.float32) / tf.cast(
original_w, dtype=tf.float32)
normalized_top = tf.cast(
top, dtype=tf.float32) / tf.cast(
original_h, dtype=tf.float32)
normalized_bottom = tf.cast(
bottom, dtype=tf.float32) / tf.cast(
original_h, dtype=tf.float32)
cropped_box = tf.expand_dims(
tf.stack([
normalized_top,
normalized_left,
normalized_bottom,
normalized_right,
]),
axis=0)
iou = box_ops.bbox_overlap(
tf.expand_dims(cropped_box, axis=0),
tf.expand_dims(boxes, axis=0)) # (1, 1, n_ground_truth)
iou = tf.squeeze(iou, axis=[0, 1])
# If not a single bounding box has a Jaccard overlap of greater than
# the minimum, try again
if tf.reduce_max(iou) < min_overlap:
continue
centroids = box_ops.yxyx_to_cycxhw(boxes)
mask = tf.math.logical_and(
tf.math.logical_and(centroids[:, 0] > normalized_top,
centroids[:, 0] < normalized_bottom),
tf.math.logical_and(centroids[:, 1] > normalized_left,
centroids[:, 1] < normalized_right))
# If not a single bounding box has its center in the crop, try again.
if tf.reduce_sum(tf.cast(mask, dtype=tf.int32)) > 0:
indices = tf.squeeze(tf.where(mask), axis=1)
filtered_boxes = tf.gather(boxes, indices)
boxes = tf.clip_by_value(
(filtered_boxes[..., :] * tf.cast(
tf.stack([original_h, original_w, original_h, original_w]),
dtype=tf.float32) -
tf.cast(tf.stack([top, left, top, left]), dtype=tf.float32)) /
tf.cast(tf.stack([new_h, new_w, new_h, new_w]), dtype=tf.float32),
0.0, 1.0)
img = tf.image.crop_to_bounding_box(img, top, left, bottom - top,
right - left)
labels = tf.gather(labels, indices)
break
return img, boxes, labels
def random_crop(image,
boxes,
labels,
min_scale=0.3,
aspect_ratio_range=(0.5, 2.0),
min_overlap_params=(0.0, 1.4, 0.2, 0.1),
max_retry=50,
seed=None):
"""Randomly crop the image and boxes, filtering labels.
Args:
image: a 'Tensor' of shape [height, width, 3] representing the input image.
boxes: a 'Tensor' of shape [N, 4] representing the ground-truth bounding
boxes with (ymin, xmin, ymax, xmax).
labels: a 'Tensor' of shape [N,] representing the class labels of the boxes.
min_scale: a 'float' in [0.0, 1.0) indicating the lower bound of the random
scale variable.
aspect_ratio_range: a list of two 'float' that specifies the lower and upper
bound of the random aspect ratio.
min_overlap_params: a list of four 'float' representing the min value, max
value, step size, and offset for the minimum overlap sample.
max_retry: an 'int' representing the number of trials for cropping. If it is
exhausted, no cropping will be performed.
seed: an optional 'int' random seed; may be None.
Returns:
image: a Tensor representing the random cropped image. Can be the
original image if max_retry is exhausted.
boxes: a Tensor representing the bounding boxes in the cropped image.
labels: a Tensor representing the new bounding boxes' labels.
"""
with tf.name_scope('random_crop'):
do_crop = tf.greater(tf.random.uniform([], seed=seed), 0.5)
if do_crop:
return random_crop_image_with_boxes_and_labels(image, boxes, labels,
min_scale,
aspect_ratio_range,
min_overlap_params,
max_retry)
else:
return image, boxes, labels
@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for preprocess_ops.py."""
import io
@@ -42,7 +41,7 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
([12, 2], 10),
([13, 2, 3], 10),
)
def testPadToFixedSize(self, input_shape, output_size):
def test_pad_to_fixed_size(self, input_shape, output_size):
# Copies input shape to padding shape.
clip_shape = input_shape[:]
clip_shape[0] = min(output_size, clip_shape[0])
@@ -63,16 +62,11 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
(100, 256, 128, 256, 32, 1.0, 1.0, 128, 256),
(200, 512, 200, 128, 32, 0.25, 0.25, 224, 128),
)
def testResizeAndCropImageRectangluarCase(self,
input_height,
input_width,
desired_height,
desired_width,
stride,
scale_y,
scale_x,
output_height,
output_width):
def test_resize_and_crop_image_rectangluar_case(self, input_height,
input_width, desired_height,
desired_width, stride,
scale_y, scale_x,
output_height, output_width):
image = tf.convert_to_tensor(
np.random.rand(input_height, input_width, 3))
@@ -98,16 +92,10 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
(100, 200, 220, 220, 32, 1.1, 1.1, 224, 224),
(512, 512, 1024, 1024, 32, 2.0, 2.0, 1024, 1024),
)
def testResizeAndCropImageSquareCase(self,
input_height,
input_width,
desired_height,
desired_width,
stride,
scale_y,
scale_x,
output_height,
output_width):
def test_resize_and_crop_image_square_case(self, input_height, input_width,
desired_height, desired_width,
stride, scale_y, scale_x,
output_height, output_width):
image = tf.convert_to_tensor(
np.random.rand(input_height, input_width, 3))
@@ -135,18 +123,10 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
(100, 200, 80, 100, 32, 0.5, 0.5, 50, 100, 96, 128),
(200, 100, 80, 100, 32, 0.5, 0.5, 100, 50, 128, 96),
)
def testResizeAndCropImageV2(self,
input_height,
input_width,
short_side,
long_side,
stride,
scale_y,
scale_x,
desired_height,
desired_width,
output_height,
output_width):
def test_resize_and_crop_image_v2(self, input_height, input_width, short_side,
long_side, stride, scale_y, scale_x,
desired_height, desired_width,
output_height, output_width):
image = tf.convert_to_tensor(
np.random.rand(input_height, input_width, 3))
image_shape = tf.shape(image)[0:2]
@@ -176,9 +156,7 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
@parameterized.parameters(
(400, 600), (600, 400),
)
def testCenterCropImage(self,
input_height,
input_width):
def test_center_crop_image(self, input_height, input_width):
image = tf.convert_to_tensor(
np.random.rand(input_height, input_width, 3))
cropped_image = preprocess_ops.center_crop_image(image)
@@ -188,9 +166,7 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
@parameterized.parameters(
(400, 600), (600, 400),
)
def testCenterCropImageV2(self,
input_height,
input_width):
def test_center_crop_image_v2(self, input_height, input_width):
image_bytes = tf.constant(
_encode_image(
np.uint8(np.random.rand(input_height, input_width, 3) * 255),
@@ -204,9 +180,7 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
@parameterized.parameters(
(400, 600), (600, 400),
)
def testRandomCropImage(self,
input_height,
input_width):
def test_random_crop_image(self, input_height, input_width):
image = tf.convert_to_tensor(
np.random.rand(input_height, input_width, 3))
_ = preprocess_ops.random_crop_image(image)
@@ -214,9 +188,7 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
@parameterized.parameters(
(400, 600), (600, 400),
)
def testRandomCropImageV2(self,
input_height,
input_width):
def test_random_crop_image_v2(self, input_height, input_width):
image_bytes = tf.constant(
_encode_image(
np.uint8(np.random.rand(input_height, input_width, 3) * 255),
@@ -244,6 +216,21 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
jittered_image = preprocess_ops._saturation(image, saturation)
assert jittered_image.shape == image.shape
@parameterized.parameters((640, 640, 20), (1280, 1280, 30))
def test_random_crop(self, input_height, input_width, num_boxes):
image = tf.convert_to_tensor(np.random.rand(input_height, input_width, 3))
boxes_height = np.random.randint(0, input_height, size=(num_boxes, 1))
top = np.random.randint(0, high=(input_height - boxes_height))
down = top + boxes_height
boxes_width = np.random.randint(0, input_width, size=(num_boxes, 1))
left = np.random.randint(0, high=(input_width - boxes_width))
right = left + boxes_width
boxes = tf.constant(
np.concatenate([top, left, down, right], axis=-1), tf.float32)
labels = tf.constant(
np.random.randint(low=0, high=num_boxes, size=(num_boxes,)), tf.int64)
_ = preprocess_ops.random_crop(image, boxes, labels)
if __name__ == '__main__':
tf.test.main()
@@ -82,6 +82,8 @@ $ python3 -m official.vision.beta.projects.deepmac_maskrcnn.train \
```
`CONFIG_FILE` can be any file in the `configs/experiments` directory.
When using SpineNet models, please specify
`--experiment=deep_mask_head_rcnn_spinenet_coco`
**Note:** The default eval batch size of 32 discards some samples during
validation. For accurate validation statistics, launch a dedicated eval job on
@@ -93,11 +95,12 @@ In the following table, we report the Mask mAP of our models on the non-VOC
classes when only training with masks for the VOC classes. Performance is
measured on the `coco-val2017` set.
Backbone | Mask head | Config name | Mask mAP
:--------- | :----------- | :--------------------------------------- | -------:
ResNet-50 | Default | `deep_mask_head_rcnn_voc_r50.yaml` | 25.9
ResNet-50 | Hourglass-52 | `deep_mask_head_rcnn_voc_r50_hg52.yaml` | 33.1
ResNet-101 | Hourglass-52 | `deep_mask_head_rcnn_voc_r101_hg52.yaml` | 34.4
Backbone | Mask head | Config name | Mask mAP
:------------| :----------- | :-----------------------------------------------| -------:
ResNet-50 | Default | `deep_mask_head_rcnn_voc_r50.yaml` | 25.9
ResNet-50 | Hourglass-52 | `deep_mask_head_rcnn_voc_r50_hg52.yaml` | 33.1
ResNet-101 | Hourglass-52 | `deep_mask_head_rcnn_voc_r101_hg52.yaml` | 34.4
SpineNet-143 | Hourglass-52 | `deep_mask_head_rcnn_voc_spinenet143_hg52.yaml` | 38.7
## See also