Commit c173234f authored by lzc5123016

Merge remote-tracking branch 'upstream/master'

Sync with master.
parents 87ed703c 20a4313d
......@@ -4,7 +4,7 @@ http://stackoverflow.com/questions/tagged/tensorflow
Also, please understand that many of the models included in this repository are experimental and research-style code. If you open a GitHub issue, here is our policy:
1. It must be a bug or a feature request.
1. It must be a bug, a feature request, or a significant problem with documentation (for small docs fixes please send a PR instead).
2. The form below must be filled out.
**Here's why we have that policy**: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.
......
......@@ -4,7 +4,7 @@ This repository contains a number of different models implemented in [TensorFlow
The [official models](official) are a collection of example models that use TensorFlow's high-level APIs. They are intended to be well-maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here.
The [research models](research) are a large collection of models implemented in TensorFlow by researchers. It is up to the individual researchers to maintain the models and/or provide support on issues and pull requests.
The [research models](https://github.com/tensorflow/models/tree/master/research) are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests.
The [samples folder](samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts.
......
......@@ -2,11 +2,11 @@
The TensorFlow official models are a collection of example models that use TensorFlow's high-level APIs. They are intended to be well-maintained, tested, and kept up to date with the latest TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read.
The master branch of the models are **in development**, and they target the [nightly binaries](https://github.com/tensorflow/tensorflow#installation) built from the [master branch of TensorFlow](https://github.com/tensorflow/tensorflow/tree/master).
The master branch of the models are **in development**, and they target the [nightly binaries](https://github.com/tensorflow/tensorflow#installation) built from the [master branch of TensorFlow](https://github.com/tensorflow/tensorflow/tree/master). We aim to keep them backwards compatible with the latest release when possible (currently TensorFlow 1.5), but we cannot always guarantee compatibility.
**Stable versions** of the official models targeting releases of TensorFlow are available as tagged branches or [downloadable releases](https://github.com/tensorflow/models/releases). Model repository version numbers match the target TensorFlow release, such that [branch r1.4.0](https://github.com/tensorflow/models/tree/r1.4.0) and [release v1.4.0](https://github.com/tensorflow/models/releases/tag/v1.4.0) are compatible with [TensorFlow v1.4.0](https://github.com/tensorflow/tensorflow/releases/tag/v1.4.0).
If you are on a version of TensorFlow earlier than v1.4, please [update your installation](https://www.tensorflow.org/install/).
If you are on a version of TensorFlow earlier than 1.4, please [update your installation](https://www.tensorflow.org/install/).
---
......
......@@ -91,6 +91,7 @@ def dataset(directory, images_file, labels_file):
def decode_label(label):
label = tf.decode_raw(label, tf.uint8) # tf.string -> [tf.uint8]
label = tf.reshape(label, []) # label is a scalar
return tf.to_int32(label)
images = tf.data.FixedLengthRecordDataset(
......
......@@ -142,7 +142,7 @@ def validate_batch_size_for_multi_gpu(batch_size):
if not num_gpus:
raise ValueError('Multi-GPU mode was specified, but no GPUs '
'were found. To use CPU, run without --multi_gpu.')
remainder = batch_size % num_gpus
if remainder:
err = ('When running with multiple GPUs, batch size '
......@@ -184,8 +184,7 @@ def main(unused_argv):
ds = dataset.train(FLAGS.data_dir)
ds = ds.cache().shuffle(buffer_size=50000).batch(FLAGS.batch_size).repeat(
FLAGS.train_epochs)
(images, labels) = ds.make_one_shot_iterator().get_next()
return (images, labels)
return ds
# Set up training hook that logs the training accuracy every 100 steps.
tensors_to_log = {'train_accuracy': 'train_accuracy'}
......
......@@ -27,6 +27,26 @@ import tensorflow as tf
import dataset
import mnist
# Cloud TPU Cluster Resolvers
tf.flags.DEFINE_string(
"gcp_project", default=None,
help="Project name for the Cloud TPU-enabled project. If not specified, we "
"will attempt to automatically detect the GCE project from metadata.")
tf.flags.DEFINE_string(
"tpu_zone", default=None,
help="GCE zone where the Cloud TPU is located. If not specified, we "
"will attempt to automatically detect the GCE zone from metadata.")
tf.flags.DEFINE_string(
"tpu_name", default=None,
help="Name of the Cloud TPU for Cluster Resolvers. You must specify either "
"this flag or --master.")
# Model specific parameters
tf.flags.DEFINE_string(
"master", default=None,
help="GRPC URL of the master (e.g. grpc://ip.address.of.tpu:8470). You "
"must specify either this flag or --tpu_name.")
tf.flags.DEFINE_string("data_dir", "",
"Path to directory containing the MNIST dataset")
tf.flags.DEFINE_string("model_dir", None, "Estimator model_dir")
......@@ -40,7 +60,6 @@ tf.flags.DEFINE_integer("eval_steps", 0,
tf.flags.DEFINE_float("learning_rate", 0.05, "Learning rate.")
tf.flags.DEFINE_bool("use_tpu", True, "Use TPUs rather than plain CPUs")
tf.flags.DEFINE_string("master", "local", "GRPC URL of the Cloud TPU instance.")
tf.flags.DEFINE_integer("iterations", 50,
"Number of iterations per TPU training loop.")
tf.flags.DEFINE_integer("num_shards", 8, "Number of shards (TPU chips).")
......@@ -111,9 +130,25 @@ def main(argv):
del argv # Unused.
tf.logging.set_verbosity(tf.logging.INFO)
if FLAGS.master is None and FLAGS.tpu_name is None:
raise RuntimeError("You must specify either --master or --tpu_name.")
if FLAGS.master is not None:
if FLAGS.tpu_name is not None:
tf.logging.warn("Both --master and --tpu_name are set. Ignoring "
"--tpu_name and using --master.")
tpu_grpc_url = FLAGS.master
else:
tpu_cluster_resolver = (
tf.contrib.cluster_resolver.TPUClusterResolver(
tpu_names=[FLAGS.tpu_name],
zone=FLAGS.tpu_zone,
project=FLAGS.gcp_project))
tpu_grpc_url = tpu_cluster_resolver.get_master()
run_config = tf.contrib.tpu.RunConfig(
master=FLAGS.master,
evaluation_master=FLAGS.master,
master=tpu_grpc_url,
evaluation_master=tpu_grpc_url,
model_dir=FLAGS.model_dir,
session_config=tf.ConfigProto(
allow_soft_placement=True, log_device_placement=True),
......
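The flag handling added to `main` above gives an explicit `--master` URL precedence over `--tpu_name`. The precedence rule can be sketched without TensorFlow as follows. This is only an illustration: `resolve_tpu_master` and the `resolve_by_name` callable are hypothetical names standing in for the cluster-resolver lookup, and the addresses are made up.

```python
def resolve_tpu_master(master, tpu_name, resolve_by_name):
    """Return the gRPC URL to use, preferring an explicit master.

    `resolve_by_name` is a hypothetical callable standing in for the
    cluster-resolver lookup; it is not a TensorFlow API.
    """
    if master is None and tpu_name is None:
        raise RuntimeError("You must specify either --master or --tpu_name.")
    if master is not None:
        # An explicit URL wins, even when a TPU name was also given.
        return master
    return resolve_by_name(tpu_name)


# Usage with a stand-in lookup table instead of a real Cloud TPU query.
fake_lookup = {"demo-tpu": "grpc://10.0.0.2:8470"}.get
print(resolve_tpu_master(None, "demo-tpu", fake_lookup))
print(resolve_tpu_master("grpc://10.0.0.9:8470", "demo-tpu", fake_lookup))
```

Keeping the decision in one small function makes the "either flag, master wins" rule easy to unit-test separately from any network call.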
......@@ -23,13 +23,14 @@ import sys
import tensorflow as tf
import resnet_model
import resnet_shared
import resnet
_HEIGHT = 32
_WIDTH = 32
_NUM_CHANNELS = 3
_DEFAULT_IMAGE_BYTES = _HEIGHT * _WIDTH * _NUM_CHANNELS
# The record is the image plus a one-byte label
_RECORD_BYTES = _DEFAULT_IMAGE_BYTES + 1
_NUM_CLASSES = 10
_NUM_DATA_FILES = 5
......@@ -42,12 +43,6 @@ _NUM_IMAGES = {
###############################################################################
# Data processing
###############################################################################
def record_dataset(filenames):
"""Returns an input pipeline Dataset from `filenames`."""
record_bytes = _DEFAULT_IMAGE_BYTES + 1
return tf.data.FixedLengthRecordDataset(filenames, record_bytes)
def get_filenames(is_training, data_dir):
"""Returns a list of filenames."""
data_dir = os.path.join(data_dir, 'cifar-10-batches-bin')
......@@ -65,13 +60,8 @@ def get_filenames(is_training, data_dir):
return [os.path.join(data_dir, 'test_batch.bin')]
def parse_record(raw_record):
def parse_record(raw_record, is_training):
"""Parse CIFAR-10 image and label from a raw record."""
# Every record consists of a label followed by the image, with a fixed number
# of bytes for each.
label_bytes = 1
record_bytes = label_bytes + _DEFAULT_IMAGE_BYTES
# Convert bytes to a vector of uint8 that is record_bytes long.
record_vector = tf.decode_raw(raw_record, tf.uint8)
......@@ -82,13 +72,15 @@ def parse_record(raw_record):
# The remaining bytes after the label represent the image, which we reshape
# from [depth * height * width] to [depth, height, width].
depth_major = tf.reshape(record_vector[label_bytes:record_bytes],
depth_major = tf.reshape(record_vector[1:_RECORD_BYTES],
[_NUM_CHANNELS, _HEIGHT, _WIDTH])
# Convert from [depth, height, width] to [height, width, depth], and cast as
# float32.
image = tf.cast(tf.transpose(depth_major, [1, 2, 0]), tf.float32)
image = preprocess_image(image, is_training)
return image, label
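The record layout that `parse_record` relies on (one label byte followed by the image stored channel by channel) can be checked with a plain-Python sketch. `parse_record_bytes` is a hypothetical helper written for this illustration; it mirrors the reshape to [depth, height, width] and the transpose to [height, width, depth], but skips the float cast and the preprocessing step.

```python
HEIGHT, WIDTH, NUM_CHANNELS = 32, 32, 3
RECORD_BYTES = HEIGHT * WIDTH * NUM_CHANNELS + 1  # image plus one label byte


def parse_record_bytes(raw_record):
    """Split one CIFAR-10 binary record into (label, image[h][w][c])."""
    if len(raw_record) != RECORD_BYTES:
        raise ValueError(
            "expected %d bytes, got %d" % (RECORD_BYTES, len(raw_record)))
    label = raw_record[0]
    pixels = raw_record[1:RECORD_BYTES]
    plane = HEIGHT * WIDTH  # number of bytes in one colour channel
    # Bytes are stored depth-major; index them as [h][w][c] instead.
    image = [[[pixels[c * plane + h * WIDTH + w] for c in range(NUM_CHANNELS)]
              for w in range(WIDTH)]
             for h in range(HEIGHT)]
    return label, image


# Same fake record the unit test builds: label 7, channel i filled with i.
fake = bytearray([7])
for i in range(NUM_CHANNELS):
    fake.extend([i] * (HEIGHT * WIDTH))
label, image = parse_record_bytes(bytes(fake))
print(label, image[0][0])
```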
......@@ -110,7 +102,8 @@ def preprocess_image(image, is_training):
return image
def input_fn(is_training, data_dir, batch_size, num_epochs=1):
def input_fn(is_training, data_dir, batch_size, num_epochs=1,
num_parallel_calls=1):
"""Input_fn using the tf.data input pipeline for CIFAR-10 dataset.
Args:
......@@ -118,44 +111,34 @@ def input_fn(is_training, data_dir, batch_size, num_epochs=1):
data_dir: The directory containing the input data.
batch_size: The number of samples per batch.
num_epochs: The number of epochs to repeat the dataset.
num_parallel_calls: The number of records that are processed in parallel.
This can be optimized per data set but for generally homogeneous data
sets, should be approximately the number of available CPU cores.
Returns:
A tuple of images and labels.
A dataset that can be used for iteration.
"""
dataset = record_dataset(get_filenames(is_training, data_dir))
if is_training:
# When choosing shuffle buffer sizes, larger sizes result in better
# randomness, while smaller sizes have better performance. Because CIFAR-10
# is a relatively small dataset, we choose to shuffle the full epoch.
dataset = dataset.shuffle(buffer_size=_NUM_IMAGES['train'])
filenames = get_filenames(is_training, data_dir)
dataset = tf.data.FixedLengthRecordDataset(filenames, _RECORD_BYTES)
dataset = dataset.map(parse_record)
dataset = dataset.map(
lambda image, label: (preprocess_image(image, is_training), label))
dataset = dataset.prefetch(2 * batch_size)
# We call repeat after shuffling, rather than before, to prevent separate
# epochs from blending together.
dataset = dataset.repeat(num_epochs)
# Batch results by up to batch_size, and then fetch the tuple from the
# iterator.
dataset = dataset.batch(batch_size)
iterator = dataset.make_one_shot_iterator()
images, labels = iterator.get_next()
return images, labels
return resnet.process_record_dataset(dataset, is_training, batch_size,
_NUM_IMAGES['train'], parse_record, num_epochs, num_parallel_calls)
###############################################################################
# Running the model
###############################################################################
class Cifar10Model(resnet_model.Model):
class Cifar10Model(resnet.Model):
def __init__(self, resnet_size, data_format=None):
def __init__(self, resnet_size, data_format=None, num_classes=_NUM_CLASSES):
"""These are the parameters that work for CIFAR-10 data.
Args:
resnet_size: The number of convolutional layers needed in the model.
data_format: Either 'channels_first' or 'channels_last', specifying which
data format to use when setting up the model.
num_classes: The number of output classes needed from the model. This
enables users to extend the same model to their own datasets.
"""
if resnet_size % 6 != 2:
raise ValueError('resnet_size must be 6n + 2:', resnet_size)
......@@ -164,7 +147,7 @@ class Cifar10Model(resnet_model.Model):
super(Cifar10Model, self).__init__(
resnet_size=resnet_size,
num_classes=_NUM_CLASSES,
num_classes=num_classes,
num_filters=16,
kernel_size=3,
conv_stride=1,
......@@ -172,7 +155,7 @@ class Cifar10Model(resnet_model.Model):
first_pool_stride=None,
second_pool_size=8,
second_pool_stride=1,
block_fn=resnet_model.building_block,
block_fn=resnet.building_block,
block_sizes=[num_blocks] * 3,
block_strides=[1, 2, 2],
final_size=64,
......@@ -183,7 +166,7 @@ def cifar10_model_fn(features, labels, mode, params):
"""Model function for CIFAR-10."""
features = tf.reshape(features, [-1, _HEIGHT, _WIDTH, _NUM_CHANNELS])
learning_rate_fn = resnet_shared.learning_rate_with_decay(
learning_rate_fn = resnet.learning_rate_with_decay(
batch_size=params['batch_size'], batch_denom=128,
num_images=_NUM_IMAGES['train'], boundary_epochs=[100, 150, 200],
decay_rates=[1, 0.1, 0.01, 0.001])
......@@ -200,23 +183,23 @@ def cifar10_model_fn(features, labels, mode, params):
def loss_filter_fn(name):
return True
return resnet_shared.resnet_model_fn(features, labels, mode, Cifar10Model,
resnet_size=params['resnet_size'],
weight_decay=weight_decay,
learning_rate_fn=learning_rate_fn,
momentum=0.9,
data_format=params['data_format'],
loss_filter_fn=loss_filter_fn)
return resnet.resnet_model_fn(features, labels, mode, Cifar10Model,
resnet_size=params['resnet_size'],
weight_decay=weight_decay,
learning_rate_fn=learning_rate_fn,
momentum=0.9,
data_format=params['data_format'],
loss_filter_fn=loss_filter_fn)
def main(unused_argv):
resnet_shared.resnet_main(FLAGS, cifar10_model_fn, input_fn)
resnet.resnet_main(FLAGS, cifar10_model_fn, input_fn)
if __name__ == '__main__':
tf.logging.set_verbosity(tf.logging.INFO)
parser = resnet_shared.ResnetArgParser()
parser = resnet.ResnetArgParser()
# Set defaults that are reasonable for this model.
parser.set_defaults(data_dir='/tmp/cifar10_data',
model_dir='/tmp/cifar10_model',
......
......@@ -27,6 +27,9 @@ import cifar10_main
tf.logging.set_verbosity(tf.logging.ERROR)
_BATCH_SIZE = 128
_HEIGHT = 32
_WIDTH = 32
_NUM_CHANNELS = 3
class BaseTest(tf.test.TestCase):
......@@ -34,8 +37,8 @@ class BaseTest(tf.test.TestCase):
def test_dataset_input_fn(self):
fake_data = bytearray()
fake_data.append(7)
for i in range(3):
for _ in range(1024):
for i in range(_NUM_CHANNELS):
for _ in range(_HEIGHT * _WIDTH):
fake_data.append(i)
_, filename = mkstemp(dir=self.get_temp_dir())
......@@ -43,12 +46,14 @@ class BaseTest(tf.test.TestCase):
data_file.write(fake_data)
data_file.close()
fake_dataset = cifar10_main.record_dataset(filename)
fake_dataset = fake_dataset.map(cifar10_main.parse_record)
fake_dataset = tf.data.FixedLengthRecordDataset(
filename, cifar10_main._RECORD_BYTES)
fake_dataset = fake_dataset.map(
lambda val: cifar10_main.parse_record(val, False))
image, label = fake_dataset.make_one_shot_iterator().get_next()
self.assertEqual(label.get_shape().as_list(), [10])
self.assertEqual(image.get_shape().as_list(), [32, 32, 3])
self.assertAllEqual(label.shape, (10,))
self.assertAllEqual(image.shape, (_HEIGHT, _WIDTH, _NUM_CHANNELS))
with self.test_session() as sess:
image, label = sess.run([image, label])
......@@ -57,10 +62,10 @@ class BaseTest(tf.test.TestCase):
for row in image:
for pixel in row:
self.assertAllEqual(pixel, np.array([0, 1, 2]))
self.assertAllClose(pixel, np.array([-1.225, 0., 1.225]), rtol=1e-3)
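The new expected pixel value `[-1.225, 0., 1.225]` is consistent with the fake image being standardized to zero mean and unit variance. The arithmetic below is a sketch under that assumption; `preprocess_image` itself is not shown in this diff, so the exact normalization it applies in evaluation mode is inferred rather than confirmed.

```python
import math

# The fake image has one channel of 0s, one of 1s, and one of 2s.
values = [0.0] * 1024 + [1.0] * 1024 + [2.0] * 1024
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
stddev = math.sqrt(variance)

# Every pixel is (0, 1, 2) across channels, so one pixel is representative.
standardized_pixel = [(c - mean) / stddev for c in (0.0, 1.0, 2.0)]
print([round(x, 3) for x in standardized_pixel])
```

The mean is 1 and the variance is 2/3, so the outer channels land at about ±1.2247, which is within the test's `rtol=1e-3` of ±1.225.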
def input_fn(self):
features = tf.random_uniform([_BATCH_SIZE, 32, 32, 3])
features = tf.random_uniform([_BATCH_SIZE, _HEIGHT, _WIDTH, _NUM_CHANNELS])
labels = tf.random_uniform(
[_BATCH_SIZE], maxval=9, dtype=tf.int32)
return features, tf.one_hot(labels, 10)
......@@ -102,6 +107,17 @@ class BaseTest(tf.test.TestCase):
def test_cifar10_model_fn_predict_mode(self):
self.cifar10_model_fn_helper(tf.estimator.ModeKeys.PREDICT)
def test_cifar10model_shape(self):
batch_size = 135
num_classes = 246
model = cifar10_main.Cifar10Model(
32, data_format='channels_last', num_classes=num_classes)
fake_input = tf.random_uniform([batch_size, _HEIGHT, _WIDTH, _NUM_CHANNELS])
output = model(fake_input, training=True)
self.assertAllEqual(output.shape, (batch_size, num_classes))
if __name__ == '__main__':
tf.test.main()
......@@ -23,8 +23,7 @@ import sys
import tensorflow as tf
import resnet_model
import resnet_shared
import resnet
import vgg_preprocessing
_DEFAULT_IMAGE_SIZE = 224
......@@ -36,19 +35,19 @@ _NUM_IMAGES = {
'validation': 50000,
}
_FILE_SHUFFLE_BUFFER = 1024
_NUM_TRAIN_FILES = 1024
_SHUFFLE_BUFFER = 1500
###############################################################################
# Data processing
###############################################################################
def filenames(is_training, data_dir):
def get_filenames(is_training, data_dir):
"""Return filenames for dataset."""
if is_training:
return [
os.path.join(data_dir, 'train-%05d-of-01024' % i)
for i in range(1024)]
for i in range(_NUM_TRAIN_FILES)]
else:
return [
os.path.join(data_dir, 'validation-%05d-of-00128' % i)
......@@ -83,6 +82,8 @@ def parse_record(raw_record, is_training):
image = tf.image.decode_image(
tf.reshape(parsed['image/encoded'], shape=[]),
_NUM_CHANNELS)
# Note that tf.image.convert_image_dtype scales the image data to [0, 1).
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
image = vgg_preprocessing.preprocess_image(
......@@ -98,53 +99,62 @@ def parse_record(raw_record, is_training):
return image, tf.one_hot(label, _NUM_CLASSES)
def input_fn(is_training, data_dir, batch_size, num_epochs=1):
"""Input function which provides batches for train or eval."""
dataset = tf.data.Dataset.from_tensor_slices(
filenames(is_training, data_dir))
def input_fn(is_training, data_dir, batch_size, num_epochs=1,
num_parallel_calls=1):
"""Input function which provides batches for train or eval.
Args:
is_training: A boolean denoting whether the input is for training.
data_dir: The directory containing the input data.
batch_size: The number of samples per batch.
num_epochs: The number of epochs to repeat the dataset.
num_parallel_calls: The number of records that are processed in parallel.
This can be optimized per data set but for generally homogeneous data
sets, should be approximately the number of available CPU cores.
Returns:
A dataset that can be used for iteration.
"""
filenames = get_filenames(is_training, data_dir)
dataset = tf.data.Dataset.from_tensor_slices(filenames)
if is_training:
dataset = dataset.shuffle(buffer_size=_FILE_SHUFFLE_BUFFER)
# Shuffle the input files
dataset = dataset.shuffle(buffer_size=_NUM_TRAIN_FILES)
# Convert to individual records
dataset = dataset.flat_map(tf.data.TFRecordDataset)
dataset = dataset.map(lambda value: parse_record(value, is_training),
num_parallel_calls=5)
dataset = dataset.prefetch(batch_size)
if is_training:
# When choosing shuffle buffer sizes, larger sizes result in better
# randomness, while smaller sizes have better performance.
dataset = dataset.shuffle(buffer_size=_SHUFFLE_BUFFER)
# We call repeat after shuffling, rather than before, to prevent separate
# epochs from blending together.
dataset = dataset.repeat(num_epochs)
dataset = dataset.batch(batch_size)
iterator = dataset.make_one_shot_iterator()
images, labels = iterator.get_next()
return images, labels
return resnet.process_record_dataset(dataset, is_training, batch_size,
_SHUFFLE_BUFFER, parse_record, num_epochs, num_parallel_calls)
###############################################################################
# Running the model
###############################################################################
class ImagenetModel(resnet_model.Model):
def __init__(self, resnet_size, data_format=None):
class ImagenetModel(resnet.Model):
def __init__(self, resnet_size, data_format=None, num_classes=_NUM_CLASSES):
"""These are the parameters that work for Imagenet data.
Args:
resnet_size: The number of convolutional layers needed in the model.
data_format: Either 'channels_first' or 'channels_last', specifying which
data format to use when setting up the model.
num_classes: The number of output classes needed from the model. This
enables users to extend the same model to their own datasets.
"""
# For bigger models, we want to use "bottleneck" layers
if resnet_size < 50:
block_fn = resnet_model.building_block
block_fn = resnet.building_block
final_size = 512
else:
block_fn = resnet_model.bottleneck_block
block_fn = resnet.bottleneck_block
final_size = 2048
super(ImagenetModel, self).__init__(
resnet_size=resnet_size,
num_classes=_NUM_CLASSES,
num_classes=num_classes,
num_filters=64,
kernel_size=7,
conv_stride=2,
......@@ -184,28 +194,28 @@ def _get_block_sizes(resnet_size):
def imagenet_model_fn(features, labels, mode, params):
"""Our model_fn for ResNet to be used with our Estimator."""
learning_rate_fn = resnet_shared.learning_rate_with_decay(
learning_rate_fn = resnet.learning_rate_with_decay(
batch_size=params['batch_size'], batch_denom=256,
num_images=_NUM_IMAGES['train'], boundary_epochs=[30, 60, 80, 90],
decay_rates=[1, 0.1, 0.01, 0.001, 1e-4])
return resnet_shared.resnet_model_fn(features, labels, mode, ImagenetModel,
resnet_size=params['resnet_size'],
weight_decay=1e-4,
learning_rate_fn=learning_rate_fn,
momentum=0.9,
data_format=params['data_format'],
loss_filter_fn=None)
return resnet.resnet_model_fn(features, labels, mode, ImagenetModel,
resnet_size=params['resnet_size'],
weight_decay=1e-4,
learning_rate_fn=learning_rate_fn,
momentum=0.9,
data_format=params['data_format'],
loss_filter_fn=None)
def main(unused_argv):
resnet_shared.resnet_main(FLAGS, imagenet_model_fn, input_fn)
resnet.resnet_main(FLAGS, imagenet_model_fn, input_fn)
if __name__ == '__main__':
tf.logging.set_verbosity(tf.logging.INFO)
parser = resnet_shared.ResnetArgParser(
parser = resnet.ResnetArgParser(
resnet_size_choices=[18, 34, 50, 101, 152, 200])
FLAGS, unparsed = parser.parse_known_args()
tf.app.run(argv=[sys.argv[0]] + unparsed)
......@@ -176,6 +176,17 @@ class BaseTest(tf.test.TestCase):
def test_resnet_model_fn_predict_mode(self):
self.resnet_model_fn_helper(tf.estimator.ModeKeys.PREDICT)
def test_imagenetmodel_shape(self):
batch_size = 135
num_classes = 246
model = imagenet_main.ImagenetModel(
50, data_format='channels_last', num_classes=num_classes)
fake_input = tf.random_uniform([batch_size, 224, 224, 3])
output = model(fake_input, training=True)
self.assertAllEqual(output.shape, (batch_size, num_classes))
if __name__ == '__main__':
tf.test.main()
......@@ -12,7 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains definitions for the preactivation form of Residual Networks.
"""Contains definitions for the preactivation form of Residual Networks
(also known as ResNet v2).
Residual networks (ResNets) were originally proposed in:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
......@@ -32,12 +33,69 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import tensorflow as tf
_BATCH_NORM_DECAY = 0.997
_BATCH_NORM_EPSILON = 1e-5
################################################################################
# Functions for input processing.
################################################################################
def process_record_dataset(dataset, is_training, batch_size, shuffle_buffer,
                           parse_record_fn, num_epochs=1, num_parallel_calls=1):
  """Given a Dataset with raw records, parse each record into images and labels,
  and return a Dataset of batched (image, label) pairs.

  Args:
    dataset: A Dataset representing raw records.
    is_training: A boolean denoting whether the input is for training.
    batch_size: The number of samples per batch.
    shuffle_buffer: The buffer size to use when shuffling records. A larger
      value results in better randomness, but smaller values reduce startup
      time and use less memory.
    parse_record_fn: A function that takes a raw record and returns the
      corresponding (image, label) pair.
    num_epochs: The number of epochs to repeat the dataset.
    num_parallel_calls: The number of records that are processed in parallel.
      This can be optimized per data set but for generally homogeneous data
      sets, should be approximately the number of available CPU cores.

  Returns:
    Dataset of (image, label) pairs ready for iteration.
  """
  # We prefetch a batch at a time. This can help smooth out the time taken to
  # load input files as we go through shuffling and processing.
  dataset = dataset.prefetch(buffer_size=batch_size)
  if is_training:
    # Shuffle the records. Note that we shuffle before repeating to ensure
    # that the shuffling respects epoch boundaries.
    dataset = dataset.shuffle(buffer_size=shuffle_buffer)

  # If we are training over multiple epochs before evaluating, repeat the
  # dataset for the appropriate number of epochs.
  dataset = dataset.repeat(num_epochs)

  # Parse the raw records into images and labels.
  dataset = dataset.map(lambda value: parse_record_fn(value, is_training),
                        num_parallel_calls=num_parallel_calls)
  dataset = dataset.batch(batch_size)

  # Operations between the final prefetch and the get_next call to the iterator
  # will happen synchronously during run time. We prefetch here again to
  # background all of the above processing work and keep it out of the
  # critical training path.
  dataset = dataset.prefetch(1)
  return dataset
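The ordering used by `process_record_dataset` (shuffle, then repeat, then map, then batch) can be illustrated with a plain-Python stand-in. This is a rough sketch, not the `tf.data` implementation: `process_records` is a hypothetical name, it shuffles each epoch completely instead of using a bounded shuffle buffer, and it does no prefetching or parallel mapping.

```python
import itertools
import random


def process_records(records, is_training, batch_size, parse_record_fn,
                    num_epochs=1, seed=0):
    """Yield batches of parsed records, shuffling within each epoch."""
    rng = random.Random(seed)

    def epochs():
        for _ in range(num_epochs):
            epoch = list(records)
            if is_training:
                # Shuffling before repeating keeps each epoch a permutation
                # of the full record set; epochs never blend together.
                rng.shuffle(epoch)
            for record in epoch:
                yield record

    parsed = (parse_record_fn(r, is_training) for r in epochs())
    while True:
        # Batching happens after repeat, so a batch may span two epochs.
        batch = list(itertools.islice(parsed, batch_size))
        if not batch:
            return
        yield batch


for batch in process_records(range(5), True, 2, lambda r, training: r * 10,
                             num_epochs=2):
    print(batch)
```

With five records, two epochs, and a batch size of two, the third batch straddles the epoch boundary, which is the same behavior that batching after `repeat` produces in the real pipeline.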
################################################################################
# Functions building the ResNet model.
################################################################################
def batch_norm_relu(inputs, training, data_format):
"""Performs a batch normalization followed by a ReLU."""
# We set fused=True for a significant performance boost. See
......@@ -318,3 +376,235 @@ class Model(object):
inputs = tf.layers.dense(inputs=inputs, units=self.num_classes)
inputs = tf.identity(inputs, 'final_dense')
return inputs
################################################################################
# Functions for running training/eval/validation loops for the model.
################################################################################
def learning_rate_with_decay(
    batch_size, batch_denom, num_images, boundary_epochs, decay_rates):
  """Get a learning rate that decays step-wise as training progresses.

  Args:
    batch_size: the number of examples processed in each training batch.
    batch_denom: this value will be used to scale the base learning rate.
      `0.1 * batch size` is divided by this number, such that when
      batch_denom == batch_size, the initial learning rate will be 0.1.
    num_images: total number of images that will be used for training.
    boundary_epochs: list of ints representing the epochs at which we
      decay the learning rate.
    decay_rates: list of floats representing the decay rates to be used
      for scaling the learning rate. Should have one more element than
      boundary_epochs, since the first rate applies before the first boundary.

  Returns:
    A function that takes a single argument, the number of batches trained
    so far (global_step), and returns the learning rate to be used for
    training the next batch.
  """
  initial_learning_rate = 0.1 * batch_size / batch_denom
  batches_per_epoch = num_images / batch_size

  # Convert the boundary epochs to global steps, and scale the initial
  # learning rate by each decay rate.
  boundaries = [int(batches_per_epoch * epoch) for epoch in boundary_epochs]
  vals = [initial_learning_rate * decay for decay in decay_rates]

  def learning_rate_fn(global_step):
    global_step = tf.cast(global_step, tf.int32)
    return tf.train.piecewise_constant(global_step, boundaries, vals)

  return learning_rate_fn
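To see which learning rate the schedule yields at a given step, the same computation can be reproduced without TensorFlow. The sketch below swaps `tf.train.piecewise_constant` for a `bisect` lookup, which is an assumption made for illustration. It uses the CIFAR-10 settings from this commit and assumes 50,000 training images, a value not shown in the diff.

```python
import bisect


def learning_rate_with_decay(batch_size, batch_denom, num_images,
                             boundary_epochs, decay_rates):
    """Pure-Python stand-in for the step-wise schedule."""
    initial_learning_rate = 0.1 * batch_size / batch_denom
    batches_per_epoch = num_images / batch_size
    boundaries = [int(batches_per_epoch * epoch) for epoch in boundary_epochs]
    vals = [initial_learning_rate * decay for decay in decay_rates]

    def learning_rate_fn(global_step):
        # Steps up to and including a boundary keep the earlier value.
        return vals[bisect.bisect_left(boundaries, global_step)]

    return learning_rate_fn


# Assumed: 50,000 training images; other values are from cifar10_model_fn.
lr = learning_rate_with_decay(
    batch_size=128, batch_denom=128, num_images=50000,
    boundary_epochs=[100, 150, 200], decay_rates=[1, 0.1, 0.01, 0.001])
for step in (0, 39062, 39063, 58594, 78126):
    print(step, lr(step))
```

Three boundaries split training into four intervals, which is why `decay_rates` carries four values.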
def resnet_model_fn(features, labels, mode, model_class,
resnet_size, weight_decay, learning_rate_fn, momentum,
data_format, loss_filter_fn=None):
"""Shared functionality for different resnet model_fns.
Initializes the ResnetModel representing the model layers
and uses that model to build the necessary EstimatorSpecs for
the `mode` in question. For training, this means building losses,
the optimizer, and the train op that get passed into the EstimatorSpec.
For evaluation and prediction, the EstimatorSpec is returned without
a train op, but with the necessary parameters for the given mode.
Args:
features: tensor representing input images
labels: tensor representing class labels for all input images
mode: current estimator mode; should be one of
`tf.estimator.ModeKeys.TRAIN`, `EVAL`, `PREDICT`
model_class: a class representing a TensorFlow model that has a __call__
function. We assume here that this is a subclass of ResnetModel.
resnet_size: A single integer for the size of the ResNet model.
weight_decay: weight decay loss rate used to regularize learned variables.
learning_rate_fn: function that returns the current learning rate given
the current global_step
momentum: momentum term used for optimization
data_format: Input format ('channels_last', 'channels_first', or None).
If set to None, the format is dependent on whether a GPU is available.
loss_filter_fn: function that takes a string variable name and returns
True if the var should be included in loss calculation, and False
otherwise. If None, batch_normalization variables will be excluded
from the loss.
Returns:
EstimatorSpec parameterized according to the input params and the
current mode.
"""
# Generate a summary node for the images
tf.summary.image('images', features, max_outputs=6)
model = model_class(resnet_size, data_format)
logits = model(features, mode == tf.estimator.ModeKeys.TRAIN)
predictions = {
'classes': tf.argmax(logits, axis=1),
'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
}
if mode == tf.estimator.ModeKeys.PREDICT:
return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
# Calculate loss, which includes softmax cross entropy and L2 regularization.
cross_entropy = tf.losses.softmax_cross_entropy(
logits=logits, onehot_labels=labels)
# Create a tensor named cross_entropy for logging purposes.
tf.identity(cross_entropy, name='cross_entropy')
tf.summary.scalar('cross_entropy', cross_entropy)
# If no loss_filter_fn is passed, assume we want the default behavior,
# which is that batch_normalization variables are excluded from loss.
if not loss_filter_fn:
def loss_filter_fn(name):
return 'batch_normalization' not in name
# Add weight decay to the loss.
loss = cross_entropy + weight_decay * tf.add_n(
[tf.nn.l2_loss(v) for v in tf.trainable_variables()
if loss_filter_fn(v.name)])
if mode == tf.estimator.ModeKeys.TRAIN:
global_step = tf.train.get_or_create_global_step()
learning_rate = learning_rate_fn(global_step)
# Create a tensor named learning_rate for logging purposes
tf.identity(learning_rate, name='learning_rate')
tf.summary.scalar('learning_rate', learning_rate)
optimizer = tf.train.MomentumOptimizer(
learning_rate=learning_rate,
momentum=momentum)
# Batch norm requires update ops to be added as a dependency to train_op
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train_op = optimizer.minimize(loss, global_step)
else:
train_op = None
accuracy = tf.metrics.accuracy(
tf.argmax(labels, axis=1), predictions['classes'])
metrics = {'accuracy': accuracy}
# Create a tensor named train_accuracy for logging purposes
tf.identity(accuracy[1], name='train_accuracy')
tf.summary.scalar('train_accuracy', accuracy[1])
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=predictions,
loss=loss,
train_op=train_op,
eval_metric_ops=metrics)
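The weight-decay term built in `resnet_model_fn` adds an L2 penalty over every trainable variable that passes `loss_filter_fn`, and skips batch-normalization variables by default. A minimal numeric sketch follows; `total_loss`, the variable names, and the values are made up for illustration, and `l2_loss` follows the definition of `tf.nn.l2_loss` (half the sum of squares).

```python
def l2_loss(values):
    """Half the sum of squares, as tf.nn.l2_loss is defined."""
    return sum(v * v for v in values) / 2.0


def total_loss(cross_entropy, weight_decay, variables, loss_filter_fn=None):
    """Cross entropy plus weight decay over the filtered variables."""
    if not loss_filter_fn:
        # Default behaviour: leave batch-normalization variables out.
        def loss_filter_fn(name):
            return 'batch_normalization' not in name
    penalty = sum(l2_loss(v) for name, v in variables.items()
                  if loss_filter_fn(name))
    return cross_entropy + weight_decay * penalty


# Hypothetical variables keyed by name.
variables = {
    'conv2d/kernel:0': [1.0, 2.0],
    'batch_normalization/gamma:0': [3.0],
}
print(total_loss(1.0, 0.5, variables))
print(total_loss(1.0, 0.5, variables, loss_filter_fn=lambda name: True))
```

Passing a filter that always returns `True`, as the CIFAR-10 model function does, brings the batch-normalization variables back into the penalty.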
def resnet_main(flags, model_function, input_function):
  # Using the Winograd non-fused algorithms provides a small performance boost.
  os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'
  # Set up a RunConfig to only save checkpoints once per training cycle.
  run_config = tf.estimator.RunConfig().replace(save_checkpoints_secs=1e9)
  classifier = tf.estimator.Estimator(
      model_fn=model_function, model_dir=flags.model_dir, config=run_config,
      params={
          'resnet_size': flags.resnet_size,
          'data_format': flags.data_format,
          'batch_size': flags.batch_size,
      })
  for _ in range(flags.train_epochs // flags.epochs_per_eval):
    tensors_to_log = {
        'learning_rate': 'learning_rate',
        'cross_entropy': 'cross_entropy',
        'train_accuracy': 'train_accuracy'
    }
    logging_hook = tf.train.LoggingTensorHook(
        tensors=tensors_to_log, every_n_iter=100)
    print('Starting a training cycle.')
    def input_fn_train():
      return input_function(True, flags.data_dir, flags.batch_size,
                            flags.epochs_per_eval, flags.num_parallel_calls)
    classifier.train(input_fn=input_fn_train, hooks=[logging_hook])
    print('Starting to evaluate.')
    # Evaluate the model and print results
    def input_fn_eval():
      return input_function(False, flags.data_dir, flags.batch_size,
                            1, flags.num_parallel_calls)
    eval_results = classifier.evaluate(input_fn=input_fn_eval)
    print(eval_results)
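The loop in `resnet_main` runs `flags.train_epochs // flags.epochs_per_eval` train/evaluate cycles, so any remainder epochs are never trained. A minimal sketch of that arithmetic in plain Python; the helper name `plan_cycles` is hypothetical and not part of the repository:

```python
def plan_cycles(train_epochs, epochs_per_eval):
  """Return (number of train/eval cycles, epochs actually trained)."""
  # Integer division mirrors the range() bound used in resnet_main.
  cycles = train_epochs // epochs_per_eval
  return cycles, cycles * epochs_per_eval


# With the parser defaults (100 epochs, evaluate after every epoch) no
# epochs are lost.
print(plan_cycles(100, 1))   # (100, 100)
# With an interval that does not divide train_epochs, the last 10 epochs
# are silently skipped.
print(plan_cycles(100, 30))  # (3, 90)
```

Choosing `--epochs_per_eval` as a divisor of `--train_epochs` avoids the silent truncation.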
class ResnetArgParser(argparse.ArgumentParser):
  """Arguments for configuring and running a Resnet Model.
  """
  def __init__(self, resnet_size_choices=None):
    super(ResnetArgParser, self).__init__()
    self.add_argument(
        '--data_dir', type=str, default='/tmp/resnet_data',
        help='The directory where the input data is stored.')
    self.add_argument(
        '--num_parallel_calls', type=int, default=5,
        help='The number of records that are processed in parallel '
        'during input processing. This can be optimized per data set but '
        'for generally homogeneous data sets, should be approximately the '
        'number of available CPU cores.')
    self.add_argument(
        '--model_dir', type=str, default='/tmp/resnet_model',
        help='The directory where the model will be stored.')
    self.add_argument(
        '--resnet_size', type=int, default=50,
        choices=resnet_size_choices,
        help='The size of the ResNet model to use.')
    self.add_argument(
        '--train_epochs', type=int, default=100,
        help='The number of epochs to use for training.')
    self.add_argument(
        '--epochs_per_eval', type=int, default=1,
        help='The number of training epochs to run between evaluations.')
    self.add_argument(
        '--batch_size', type=int, default=32,
        help='Batch size for training and evaluation.')
    self.add_argument(
        '--data_format', type=str, default=None,
        choices=['channels_first', 'channels_last'],
        help='A flag to override the data format used in the model. '
        'channels_first provides a performance boost on GPU but '
        'is not always compatible with CPU. If left unspecified, '
        'the data format will be chosen automatically based on '
        'whether TensorFlow was built for CPU or GPU.')
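As a rough illustration of how a parser like `ResnetArgParser` is consumed, the sketch below defines a trimmed-down stand-in and parses an explicit argument list. The class name `MiniResnetArgParser` and the size choices `[18, 34, 50]` are assumptions made for the example; only the flag names and defaults come from the code above.

```python
import argparse


class MiniResnetArgParser(argparse.ArgumentParser):
  """Hypothetical, trimmed-down stand-in for ResnetArgParser."""

  def __init__(self, resnet_size_choices=None):
    super(MiniResnetArgParser, self).__init__()
    self.add_argument('--resnet_size', type=int, default=50,
                      choices=resnet_size_choices)
    self.add_argument('--batch_size', type=int, default=32)
    self.add_argument('--num_parallel_calls', type=int, default=5)


parser = MiniResnetArgParser(resnet_size_choices=[18, 34, 50])
# Passing a list keeps the example independent of sys.argv.
flags = parser.parse_args(['--resnet_size', '34', '--batch_size', '64'])
print(flags.resnet_size, flags.batch_size, flags.num_parallel_calls)
```

Flags that are not passed keep their defaults, so `flags.num_parallel_calls` stays at 5 here.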
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Functions for running Resnet that are shared across datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import tensorflow as tf
def learning_rate_with_decay(
    batch_size, batch_denom, num_images, boundary_epochs, decay_rates):
  """Get a learning rate that decays step-wise as training progresses.

  Args:
    batch_size: the number of examples processed in each training batch.
    batch_denom: this value will be used to scale the base learning rate.
      `0.1 * batch size` is divided by this number, such that when
      batch_denom == batch_size, the initial learning rate will be 0.1.
    num_images: total number of images that will be used for training.
    boundary_epochs: list of ints representing the epochs at which we
      decay the learning rate.
    decay_rates: list of floats representing the decay rates to be used
      for scaling the learning rate. Should have one more element than
      boundary_epochs, because tf.train.piecewise_constant expects one
      value per interval.

  Returns:
    Returns a function that takes a single argument - the number of batches
    trained so far (global_step) - and returns the learning rate to be used
    for training the next batch.
  """
  initial_learning_rate = 0.1 * batch_size / batch_denom
  batches_per_epoch = num_images / batch_size
  # Convert the epoch boundaries to global-step boundaries, and scale the
  # initial learning rate by each decay rate.
  boundaries = [int(batches_per_epoch * epoch) for epoch in boundary_epochs]
  vals = [initial_learning_rate * decay for decay in decay_rates]
  def learning_rate_fn(global_step):
    global_step = tf.cast(global_step, tf.int32)
    return tf.train.piecewise_constant(global_step, boundaries, vals)
  return learning_rate_fn
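To make the step-wise schedule concrete, here is a plain-Python sketch that reproduces the boundary and value computation of `learning_rate_with_decay` and looks up the rate for a given step without TensorFlow. The function name `piecewise_learning_rate` and the sample numbers (50000 images, batch size 128, boundaries at epochs 100, 150 and 200) are assumptions chosen for illustration. The lookup follows the documented behaviour of `tf.train.piecewise_constant`, which returns the first value while the step is less than or equal to the first boundary.

```python
import bisect


def piecewise_learning_rate(step, batch_size, batch_denom, num_images,
                            boundary_epochs, decay_rates):
  """Plain-Python stand-in for the schedule built by learning_rate_with_decay."""
  initial_learning_rate = 0.1 * batch_size / batch_denom
  batches_per_epoch = num_images / batch_size
  boundaries = [int(batches_per_epoch * epoch) for epoch in boundary_epochs]
  vals = [initial_learning_rate * decay for decay in decay_rates]
  # One value per interval, so there is one more value than boundaries.
  if len(vals) != len(boundaries) + 1:
    raise ValueError('decay_rates needs one more element than boundary_epochs')
  # bisect_left counts the boundaries strictly below `step`, which selects
  # vals[0] for step <= boundaries[0], vals[1] for the next interval, etc.
  return vals[bisect.bisect_left(boundaries, step)]


# Illustrative settings: 390.625 batches per epoch, so the first boundary
# falls at step int(390.625 * 100) == 39062.
args = (128, 128, 50000, [100, 150, 200], [1, 0.1, 0.01, 0.001])
for step in (0, 39062, 39063, 80000):
  print(step, piecewise_learning_rate(step, *args))
```

The rate stays at the initial 0.1 up to and including step 39062 and drops by a factor of ten on the following step.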
def resnet_model_fn(features, labels, mode, model_class,
                    resnet_size, weight_decay, learning_rate_fn, momentum,
                    data_format, loss_filter_fn=None):
  """Shared functionality for different resnet model_fns.

  Initializes the ResnetModel representing the model layers
  and uses that model to build the necessary EstimatorSpecs for
  the `mode` in question. For training, this means building losses,
  the optimizer, and the train op that get passed into the EstimatorSpec.
  For evaluation and prediction, the EstimatorSpec is returned without
  a train op, but with the necessary parameters for the given mode.

  Args:
    features: tensor representing input images
    labels: tensor representing class labels for all input images
    mode: current estimator mode; should be one of
      `tf.estimator.ModeKeys.TRAIN`, `EVAL`, `PREDICT`
    model_class: a class representing a TensorFlow model that has a __call__
      function. We assume here that this is a subclass of ResnetModel.
    resnet_size: A single integer for the size of the ResNet model.
    weight_decay: weight decay loss rate used to regularize learned variables.
    learning_rate_fn: function that returns the current learning rate given
      the current global_step
    momentum: momentum term used for optimization
    data_format: Input format ('channels_last', 'channels_first', or None).
      If set to None, the format is dependent on whether a GPU is available.
    loss_filter_fn: function that takes a string variable name and returns
      True if the var should be included in loss calculation, and False
      otherwise. If None, batch_normalization variables will be excluded
      from the loss.

  Returns:
    EstimatorSpec parameterized according to the input params and the
    current mode.
  """
  # Generate a summary node for the images
  tf.summary.image('images', features, max_outputs=6)
  model = model_class(resnet_size, data_format)
  logits = model(features, mode == tf.estimator.ModeKeys.TRAIN)
  predictions = {
      'classes': tf.argmax(logits, axis=1),
      'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
  }
  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
  # Calculate loss, which includes softmax cross entropy and L2 regularization.
  cross_entropy = tf.losses.softmax_cross_entropy(
      logits=logits, onehot_labels=labels)
  # Create a tensor named cross_entropy for logging purposes.
  tf.identity(cross_entropy, name='cross_entropy')
  tf.summary.scalar('cross_entropy', cross_entropy)
  # If no loss_filter_fn is passed, assume we want the default behavior,
  # which is that batch_normalization variables are excluded from loss.
  if not loss_filter_fn:
    def loss_filter_fn(name):
      return 'batch_normalization' not in name
  # Add weight decay to the loss.
  loss = cross_entropy + weight_decay * tf.add_n(
      [tf.nn.l2_loss(v) for v in tf.trainable_variables()
       if loss_filter_fn(v.name)])
  if mode == tf.estimator.ModeKeys.TRAIN:
    global_step = tf.train.get_or_create_global_step()
    learning_rate = learning_rate_fn(global_step)
    # Create a tensor named learning_rate for logging purposes
    tf.identity(learning_rate, name='learning_rate')
    tf.summary.scalar('learning_rate', learning_rate)
    optimizer = tf.train.MomentumOptimizer(
        learning_rate=learning_rate,
        momentum=momentum)
    # Batch norm requires update ops to be added as a dependency to train_op
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
      train_op = optimizer.minimize(loss, global_step)
  else:
    train_op = None
  accuracy = tf.metrics.accuracy(
      tf.argmax(labels, axis=1), predictions['classes'])
  metrics = {'accuracy': accuracy}
  # Create a tensor named train_accuracy for logging purposes
  tf.identity(accuracy[1], name='train_accuracy')
  tf.summary.scalar('train_accuracy', accuracy[1])
  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions=predictions,
      loss=loss,
      train_op=train_op,
      eval_metric_ops=metrics)
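The weight-decay term in `resnet_model_fn` sums `tf.nn.l2_loss` (half the sum of squares) over every trainable variable whose name passes `loss_filter_fn`. The following is a minimal plain-Python sketch of that computation, assuming variables are given as a dict of name to list of floats; the helper names and the two variable names are hypothetical.

```python
def l2_loss(values):
  """Half the sum of squares, the quantity tf.nn.l2_loss computes."""
  return sum(v * v for v in values) / 2.0


def weight_decay_term(variables, weight_decay, loss_filter_fn=None):
  """variables: dict mapping a variable name to a flat list of floats."""
  # Same default as resnet_model_fn: skip batch normalization variables.
  if not loss_filter_fn:
    def loss_filter_fn(name):
      return 'batch_normalization' not in name
  return weight_decay * sum(
      l2_loss(values) for name, values in variables.items()
      if loss_filter_fn(name))


variables = {
    'conv2d/kernel:0': [1.0, 2.0],           # l2_loss = (1 + 4) / 2 = 2.5
    'batch_normalization/gamma:0': [10.0],   # excluded by the default filter
}
default_term = weight_decay_term(variables, 2e-4)
# A filter that accepts every name also penalizes the batch norm variable.
full_term = weight_decay_term(variables, 2e-4, lambda name: True)
print(default_term, full_term)
```

With the default filter only the kernel contributes, so the term is `2e-4 * 2.5`; with the permissive filter it becomes `2e-4 * 52.5`.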
def resnet_main(flags, model_function, input_function):
  # Using the Winograd non-fused algorithms provides a small performance boost.
  os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'
  # Set up a RunConfig to only save checkpoints once per training cycle.
  run_config = tf.estimator.RunConfig().replace(save_checkpoints_secs=1e9)
  classifier = tf.estimator.Estimator(
      model_fn=model_function, model_dir=flags.model_dir, config=run_config,
      params={
          'resnet_size': flags.resnet_size,
          'data_format': flags.data_format,
          'batch_size': flags.batch_size,
      })
  for _ in range(flags.train_epochs // flags.epochs_per_eval):
    tensors_to_log = {
        'learning_rate': 'learning_rate',
        'cross_entropy': 'cross_entropy',
        'train_accuracy': 'train_accuracy'
    }
    logging_hook = tf.train.LoggingTensorHook(
        tensors=tensors_to_log, every_n_iter=100)
    print('Starting a training cycle.')
    classifier.train(
        input_fn=lambda: input_function(
            True, flags.data_dir, flags.batch_size, flags.epochs_per_eval),
        hooks=[logging_hook])
    print('Starting to evaluate.')
    # Evaluate the model and print results
    eval_results = classifier.evaluate(input_fn=lambda: input_function(
        False, flags.data_dir, flags.batch_size))
    print(eval_results)
class ResnetArgParser(argparse.ArgumentParser):
  """Arguments for configuring and running a Resnet Model.
  """
  def __init__(self, resnet_size_choices=None):
    super(ResnetArgParser, self).__init__()
    self.add_argument(
        '--data_dir', type=str, default='/tmp/resnet_data',
        help='The directory where the input data is stored.')
    self.add_argument(
        '--model_dir', type=str, default='/tmp/resnet_model',
        help='The directory where the model will be stored.')
    self.add_argument(
        '--resnet_size', type=int, default=50,
        choices=resnet_size_choices,
        help='The size of the ResNet model to use.')
    self.add_argument(
        '--train_epochs', type=int, default=100,
        help='The number of epochs to use for training.')
    self.add_argument(
        '--epochs_per_eval', type=int, default=1,
        help='The number of training epochs to run between evaluations.')
    self.add_argument(
        '--batch_size', type=int, default=32,
        help='Batch size for training and evaluation.')
    self.add_argument(
        '--data_format', type=str, default=None,
        choices=['channels_first', 'channels_last'],
        help='A flag to override the data format used in the model. '
        'channels_first provides a performance boost on GPU but '
        'is not always compatible with CPU. If left unspecified, '
        'the data format will be chosen automatically based on '
        'whether TensorFlow was built for CPU or GPU.')
......@@ -192,10 +192,7 @@ def input_fn(data_file, num_epochs, shuffle, batch_size):
# epochs from blending together.
dataset = dataset.repeat(num_epochs)
dataset = dataset.batch(batch_size)
-  iterator = dataset.make_one_shot_iterator()
-  features, labels = iterator.get_next()
-  return features, labels
+  return dataset
def main(unused_argv):
......
......@@ -54,7 +54,9 @@ class BaseTest(tf.test.TestCase):
temp_csv.write(TEST_INPUT)
def test_input_fn(self):
-    features, labels = wide_deep.input_fn(self.input_csv, 1, False, 1)
+    dataset = wide_deep.input_fn(self.input_csv, 1, False, 1)
+    features, labels = dataset.make_one_shot_iterator().get_next()
with tf.Session() as sess:
features, labels = sess.run((features, labels))
......
......@@ -150,12 +150,17 @@ def spectrogram_to_mel_matrix(num_mel_bins=20,
An np.array with shape (num_spectrogram_bins, num_mel_bins).
Raises:
-    ValueError: if frequency edges are incorrectly ordered.
+    ValueError: if frequency edges are incorrectly ordered or out of range.
   """
   nyquist_hertz = audio_sample_rate / 2.
+  if lower_edge_hertz < 0.0:
+    raise ValueError("lower_edge_hertz %.1f must be >= 0" % lower_edge_hertz)
   if lower_edge_hertz >= upper_edge_hertz:
     raise ValueError("lower_edge_hertz %.1f >= upper_edge_hertz %.1f" %
                      (lower_edge_hertz, upper_edge_hertz))
+  if upper_edge_hertz > nyquist_hertz:
+    raise ValueError("upper_edge_hertz %.1f is greater than Nyquist %.1f" %
+                     (upper_edge_hertz, nyquist_hertz))
spectrogram_bins_hertz = np.linspace(0.0, nyquist_hertz, num_spectrogram_bins)
spectrogram_bins_mel = hertz_to_mel(spectrogram_bins_hertz)
# The i'th mel band (starting from i=1) has center frequency
......
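The three range checks added to `spectrogram_to_mel_matrix` can be exercised in isolation. The sketch below copies them into a standalone helper; the name `check_mel_edges` and the sample frequencies are assumptions made for the example and do not appear in the repository.

```python
def check_mel_edges(lower_edge_hertz, upper_edge_hertz, audio_sample_rate):
  """Validate mel band edges the same way the hunk above does."""
  nyquist_hertz = audio_sample_rate / 2.
  if lower_edge_hertz < 0.0:
    raise ValueError("lower_edge_hertz %.1f must be >= 0" % lower_edge_hertz)
  if lower_edge_hertz >= upper_edge_hertz:
    raise ValueError("lower_edge_hertz %.1f >= upper_edge_hertz %.1f" %
                     (lower_edge_hertz, upper_edge_hertz))
  if upper_edge_hertz > nyquist_hertz:
    raise ValueError("upper_edge_hertz %.1f is greater than Nyquist %.1f" %
                     (upper_edge_hertz, nyquist_hertz))
  return nyquist_hertz


# Edges inside [0, Nyquist] pass; at 16 kHz sampling the Nyquist limit is 8 kHz.
print(check_mel_edges(125.0, 7500.0, 16000))

# An upper edge above Nyquist is now rejected instead of being accepted.
try:
  check_mel_edges(125.0, 9000.0, 16000)
except ValueError as error:
  print(error)
```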
......@@ -124,5 +124,6 @@ def load_vggish_slim_checkpoint(session, checkpoint_path):
vggish_vars = [v for v in tf.global_variables() if v.name in vggish_var_names]
# Use a Saver to restore just the variables selected above.
-  saver = tf.train.Saver(vggish_vars, name='vggish_load_pretrained')
+  saver = tf.train.Saver(vggish_vars, name='vggish_load_pretrained',
+                         write_version=1)
saver.restore(session, checkpoint_path)
......@@ -62,6 +62,6 @@ The image `matched_images.png` is generated and should look similar to this one:
`matplotlib` may complain with a message such as `no display name and no
$DISPLAY environment variable`. To fix this, one option is add the line
-`backend : Agg` to the file `config/matplotlib/matplotlibrc`. On this problem,
+`backend : Agg` to the file `.config/matplotlib/matplotlibrc`. On this problem,
see the discussion
[here](https://stackoverflow.com/questions/37604289/tkinter-tclerror-no-display-name-and-no-display-environment-variable).
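One possible way to apply the `backend : Agg` fix from a script is sketched below. The helper `ensure_agg_backend` is hypothetical, and the assumption that `.config/matplotlib/matplotlibrc` lives under the user's home directory is not confirmed by the README; adjust the path for your installation.

```python
import os


def ensure_agg_backend(rc_path):
  """Append `backend : Agg` to rc_path unless a backend line already exists.

  Returns True if the file was modified, False otherwise.
  """
  rc_dir = os.path.dirname(rc_path)
  if rc_dir and not os.path.isdir(rc_dir):
    os.makedirs(rc_dir)
  content = ''
  if os.path.exists(rc_path):
    with open(rc_path) as f:
      content = f.read()
  for line in content.splitlines():
    # Commented-out lines start with '#', so they do not match here.
    if line.split(':')[0].strip() == 'backend':
      return False  # Leave an existing backend setting untouched.
  with open(rc_path, 'a') as f:
    if content and not content.endswith('\n'):
      f.write('\n')
    f.write('backend : Agg\n')
  return True


# Example call (assumed location, not executed here):
# ensure_agg_backend(os.path.expanduser('~/.config/matplotlib/matplotlibrc'))
```

The function is idempotent: a second call finds the existing `backend` line and leaves the file alone.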
......@@ -72,9 +72,11 @@ feature extraction and matching:
## Dataset
-The Google-Landmarks dataset will be released together with a Kaggle-hosted
-landmark recognition competition. We will include the link to it here once it is
-launched (expect this to be done around mid-January, 2018).
+The Google Landmarks dataset has been released as part of two Kaggle challenges:
+[Landmark Recognition](https://www.kaggle.com/c/landmark-recognition-challenge)
+and [Landmark Retrieval](https://www.kaggle.com/c/landmark-retrieval-challenge).
+If you make use of the dataset in your research, please consider citing the
+paper mentioned above.
## Maintainers
......
......@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Python interface for DatumProto.
DatumProto is protocol buffer used to serialize tensor with arbitrary shape.
......@@ -93,7 +92,7 @@ def ReadFromFile(file_path):
Returns:
data: Numpy array.
"""
-  with tf.gfile.FastGFile(file_path, 'r') as f:
+  with tf.gfile.FastGFile(file_path, 'rb') as f:
return ParseFromString(f.read())
......
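The switch from `'r'` to `'rb'` matters because serialized protocol buffers are arbitrary bytes, not text. The sketch below shows the effect with Python 3's built-in `open` rather than `tf.gfile`; that `tf.gfile.FastGFile` fails in a comparable way in text mode is an assumption, and the payload bytes are arbitrary stand-ins for a serialized proto.

```python
import os
import tempfile


def read_both_modes(payload):
  """Write payload to a temp file, then read it in binary and in text mode.

  Returns (bytes read in 'rb' mode, whether the text-mode read succeeded).
  """
  fd, path = tempfile.mkstemp()
  try:
    with os.fdopen(fd, 'wb') as f:
      f.write(payload)
    with open(path, 'rb') as f:
      binary_data = f.read()
    try:
      with open(path, 'r', encoding='utf-8') as f:
        f.read()
      text_ok = True
    except UnicodeDecodeError:
      # Text mode tries to decode the bytes and rejects invalid UTF-8.
      text_ok = False
  finally:
    os.remove(path)
  return binary_data, text_ok


payload = b'\x08\x96\x01\xff'  # not valid UTF-8
binary_data, text_ok = read_both_modes(payload)
print(binary_data == payload, text_ok)
```

Binary mode returns the bytes unchanged, which is what a parser such as `ParseFromString` needs.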
......@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Python interface for DelfFeatures proto.
Support read and write of DelfFeatures from/to numpy arrays and file.
......@@ -169,7 +168,7 @@ def ReadFromFile(file_path):
attention: [N] float array with attention scores.
orientations: [N] float array with orientations.
"""
-  with tf.gfile.FastGFile(file_path, 'r') as f:
+  with tf.gfile.FastGFile(file_path, 'rb') as f:
return ParseFromString(f.read())
......