Commit 4157e58e authored by Ray Smith

Added STREET model for FSNS dataset

parent 43dad800
# StreetView Tensorflow Recurrent End-to-End Transcription (STREET) Model.
A TensorFlow implementation of the STREET model described in the paper:
"End-to-End Interpretation of the French Street Name Signs Dataset"
Raymond Smith, Chunhui Gu, Dar-Shyang Lee, Huiyi Hu, Ranjith
Unnikrishnan, Julian Ibarz, Sacha Arnoud, Sophia Lin.
*International Workshop on Robust Reading, Amsterdam, 9 October 2016.*
Available at: http://link.springer.com/chapter/10.1007%2F978-3-319-46604-0_30
## Contact
***Author:*** Ray Smith (rays@google.com).
***Pull requests and issues:*** @theraysmith.
## Contents
* [Introduction](#introduction)
* [Installing and setting up the STREET model](#installing-and-setting-up-the-street-model)
* [Downloading the datasets](#downloading-the-datasets)
* [Confidence Tests](#confidence-tests)
* [Training a model](#training-a-model)
* [The Variable Graph Specification Language](#the-variable-graph-specification-language)
## Introduction
The *STREET* model is a deep recurrent neural network that learns how to
identify the name of a street (in France) from an image containing up to four
different views of the street name sign. The model merges information from the
different views and normalizes the text to the correct format. For example:
<center>
![Example image](g3doc/avdessapins.png)
Avenue des Sapins
</center>
## Installing and setting up the STREET model
[Install Tensorflow](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#virtualenv-installation)
Install numpy:
```
sudo pip install numpy
```
Build the LSTM op:
```
cd cc
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
g++ -std=c++11 -shared rnn_ops.cc -o rnn_ops.so -fPIC -I $TF_INC -O3 -mavx
```
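If the build succeeded, the op should be loadable from Python. A minimal
check (a sketch; the relative path assumes you run it from the `python`
directory next to `cc`):
```python
import tensorflow as tf

# Load the shared library built above.
rnn_ops = tf.load_op_library('../cc/rnn_ops.so')

# tf.load_op_library exposes registered ops in snake_case, so the
# VariableLSTM op registered in rnn_ops.cc appears as variable_lstm.
print('variable_lstm' in dir(rnn_ops))
```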
Run the unittests:
```
cd ../python
python decoder_test.py
python errorcounter_test.py
python shapes_test.py
python vgslspecs_test.py
python vgsl_model_test.py
```
## Downloading the datasets
The French Street Name Signs (FSNS) datasets can be downloaded from:
`https://download.tensorflow.org/data/fsns-20160927`
Note that these datasets are very large. The approximate sizes are:
* Train: 512 files of 300MB each.
* Validation: 64 files of 40MB each.
* Test: 64 files of 50MB each.
* Testdata: some smaller data files of a few MB for testing.
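The shard names are not listed here, but the flags later in this README (e.g.
`--train_data=../data/train/train*`) suggest the usual `name-NNNNN-of-NNNNN`
TF naming. A sketch of fetching the training shards under that assumption:
```python
import os
import urllib  # Python 2, matching the rest of this codebase.

base = 'https://download.tensorflow.org/data/fsns-20160927'
out_dir = '../data/train'
if not os.path.isdir(out_dir):
  os.makedirs(out_dir)
# Assumed shard naming; adjust if the actual listing differs.
for i in xrange(512):
  name = 'train-%05d-of-00512' % i
  urllib.urlretrieve(base + '/train/' + name, os.path.join(out_dir, name))
```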
## Confidence Tests
The datasets download includes a directory `testdata` that contains some small
datasets that are big enough to test that models can actually learn something.
Assuming that you have put the downloads in directory `data` alongside
`python` then you can run the following tests:
### Mnist for zero-dimensional data
```
cd python
train_dir=/tmp/mnist
rm -rf $train_dir
python vgsl_train.py --model_str='16,0,0,1[Ct5,5,16 Mp3,3 Lfys32 Lfxs64]O0s12' \
--max_steps=1024 --train_data=../data/testdata/mnist-sample-00000-of-00001 \
--initial_learning_rate=0.001 --final_learning_rate=0.001 \
--num_preprocess_threads=1 --train_dir=$train_dir
python vgsl_eval.py --model_str='16,0,0,1[Ct5,5,16 Mp3,3 Lfys32 Lfxs64]O0s12' \
--num_steps=256 --eval_data=../data/testdata/mnist-sample-00000-of-00001 \
--num_preprocess_threads=1 --decoder=../testdata/numbers.charset_size=12.txt \
--eval_interval_secs=0 --train_dir=$train_dir --eval_dir=$train_dir/eval
```
Depending on your machine, this should run in about 1 minute, and should obtain
error rates below 50%. Actual error rates will vary according to random
initialization.
### Fixed-length targets for number recognition
```
cd python
train_dir=/tmp/fixed
rm -rf $train_dir
python vgsl_train.py --model_str='8,16,0,1[S1(1x16)1,3 Lfx32 Lrx32 Lfx32]O1s12' \
--max_steps=3072 --train_data=../data/testdata/numbers-16-00000-of-00001 \
--initial_learning_rate=0.001 --final_learning_rate=0.001 \
--num_preprocess_threads=1 --train_dir=$train_dir
python vgsl_eval.py --model_str='8,16,0,1[S1(1x16)1,3 Lfx32 Lrx32 Lfx32]O1s12' \
--num_steps=256 --eval_data=../data/testdata/numbers-16-00000-of-00001 \
--num_preprocess_threads=1 --decoder=../testdata/numbers.charset_size=12.txt \
--eval_interval_secs=0 --train_dir=$train_dir --eval_dir=$train_dir/eval
```
Depending on your machine, this should run in about 1-2 minutes, and should
obtain a label error rate between 50 and 80%, with word error rates probably
staying at 100%. Actual error rates will vary according to random
initialization.
### OCR-style data with CTC
```
cd python
train_dir=/tmp/ctc
rm -rf $train_dir
python vgsl_train.py --model_str='1,32,0,1[S1(1x32)1,3 Lbx100]O1c105' \
--max_steps=4096 --train_data=../data/testdata/arial-32-00000-of-00001 \
--initial_learning_rate=0.001 --final_learning_rate=0.001 \
--num_preprocess_threads=1 --train_dir=$train_dir &
python vgsl_eval.py --model_str='1,32,0,1[S1(1x32)1,3 Lbx100]O1c105' \
--num_steps=256 --eval_data=../data/testdata/arial-32-00000-of-00001 \
--num_preprocess_threads=1 --decoder=../testdata/arial.charset_size=105.txt \
--eval_interval_secs=15 --train_dir=$train_dir --eval_dir=$train_dir/eval &
tensorboard --logdir=$train_dir
```
Depending on your machine, the background training should run for about 3-4
minutes, and should obtain a label error rate between 10 and 50%, with
correspondingly higher word error rates and even higher sequence error rate.
Actual error rates will vary according to random initialization.
The background eval will run forever and will have to be terminated by hand.
The tensorboard command runs a visualizer that you can view in a browser;
open the link it prints to follow the training progress. See the
[Tensorboard](https://www.tensorflow.org/versions/r0.10/how_tos/summaries_and_tensorboard/index.html)
introduction for more information.
### Mini FSNS dataset
You can test the actual STREET model on a small FSNS data set. The model will
overfit to this small dataset, but will give some confidence that everything
is working correctly. *Note* that this test runs the training and evaluation
in parallel, which is something that you should do when training any substantial
system, so you can monitor progress.
```
cd python
train_dir=/tmp/fsns
rm -rf $train_dir
python vgsl_train.py --max_steps=10000 --num_preprocess_threads=1 \
--train_data=../data/testdata/fsns-00000-of-00001 \
--initial_learning_rate=0.0001 --final_learning_rate=0.0001 \
--train_dir=$train_dir &
python vgsl_eval.py --num_steps=256 --num_preprocess_threads=1 \
--eval_data=../data/testdata/fsns-00000-of-00001 \
--decoder=../testdata/charset_size=134.txt \
--eval_interval_secs=300 --train_dir=$train_dir --eval_dir=$train_dir/eval &
tensorboard --logdir=$train_dir
```
Depending on your machine, the training should finish in about 1-2 *hours*.
As with the CTC test set above, the eval and tensorboard will have to be
terminated manually.
## Training a full FSNS model
After running the tests above, you are ready to train the real thing!
*Note* that you might want to use a `train_dir` somewhere other than `/tmp`:
you can stop the training, reboot if needed, and continue as long as the
directory is kept intact, but `/tmp` is deleted on a reboot.
```
cd python
train_dir=/tmp/fsns
rm -rf $train_dir
python vgsl_train.py --max_steps=100000000 --train_data=../data/train/train* \
--train_dir=$train_dir &
python vgsl_eval.py --num_steps=1000 \
--eval_data=../data/validation/validation* \
--decoder=../testdata/charset_size=134.txt \
--eval_interval_secs=300 --train_dir=$train_dir --eval_dir=$train_dir/eval &
tensorboard --logdir=$train_dir
```
Training will take a very long time (probably many weeks) to reach minimum
error rate on a single machine, although it will probably take substantially
fewer iterations than parallel training would. Faster wall-clock training can
be obtained with parallel training on a cluster.
Since the setup is likely to be very site-specific, please see the TensorFlow
documentation on
[Distributed TensorFlow](https://www.tensorflow.org/versions/r0.10/how_tos/distributed/index.html)
for more information. Some code changes may be needed in the `Train` function
in `vgsl_model.py`.
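As a starting point, such changes usually amount to wiring up a cluster spec
and a device setter. A minimal sketch (the hostnames are placeholders, and
this is not code from `vgsl_model.py`):
```python
import tensorflow as tf

# Hypothetical cluster: one parameter server and two workers.
cluster = tf.train.ClusterSpec({
    'ps': ['ps0.example.com:2222'],
    'worker': ['worker0.example.com:2222', 'worker1.example.com:2222'],
})
server = tf.train.Server(cluster, job_name='worker', task_index=0)

# Pin variables to the parameter server and ops to this worker.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
  pass  # Build the model here, as Train() does in vgsl_model.py.
```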
With 40 parallel training workers, nearly optimal error rates (about 25%
sequence error on the validation set) are obtained in about 30 million steps,
although the error continues to fall slightly over the next 30 million, to
perhaps as low as 23%.
With a single machine the number of steps could be substantially lower.
Although untested on this problem, on other problems the ratio is typically
5 to 1, so low error rates could be obtained after as few as 6 million
iterations, which could be reached in about 4 weeks.
## The Variable Graph Specification Language
The STREET model makes use of a graph specification language (VGSL) that
enables rapid experimentation with different model architectures. The language
defines a TensorFlow graph that can be used to process images of variable
sizes to output a 1-dimensional sequence, as in a transcription/OCR problem,
or a 0-dimensional label, as in image identification problems. For more
information see [vgslspecs](g3doc/vgslspecs.md).
/* Copyright 2016 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
// OpKernel of LSTM Neural Networks:
//
// LSTM: VariableLSTMOp (VariableLSTMGradOp)
//
// where (.*) are the ops to compute gradients for the corresponding ops.
#define EIGEN_USE_THREADS
#include <vector>
#ifdef GOOGLE_INCLUDES
#include "third_party/eigen3/Eigen/Core"
#include "third_party/tensorflow/core/framework/op.h"
#include "third_party/tensorflow/core/framework/op_kernel.h"
#include "third_party/tensorflow/core/framework/tensor.h"
#else
#include "Eigen/Core"
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor.h"
#endif // GOOGLE_INCLUDES
namespace tensorflow {
using Eigen::array;
using Eigen::DenseIndex;
using IndexPair = Eigen::IndexPair<int>;
Status AreDimsEqual(int dim1, int dim2, const string& message) {
if (dim1 != dim2) {
return errors::InvalidArgument(message, ": ", dim1, " vs. ", dim2);
}
return Status::OK();
}
// ------------------------------- VariableLSTMOp -----------------------------
// Kernel to compute the forward propagation of a Long Short-Term Memory
// network. See the doc of the op below for more detail.
class VariableLSTMOp : public OpKernel {
public:
explicit VariableLSTMOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
OP_REQUIRES_OK(ctx, ctx->GetAttr("clip", &clip_));
OP_REQUIRES(
ctx, clip_ >= 0.0,
errors::InvalidArgument("clip_ needs to be equal or greator than 0"));
}
void Compute(OpKernelContext* ctx) override {
// Inputs.
const auto input = ctx->input(0).tensor<float, 4>();
const auto initial_state = ctx->input(1).tensor<float, 2>();
const auto initial_memory = ctx->input(2).tensor<float, 2>();
const auto w_m_m = ctx->input(3).tensor<float, 3>();
const int batch_size = input.dimension(0);
const int seq_len = input.dimension(1);
const int output_dim = input.dimension(3);
// Sanity checks.
OP_REQUIRES_OK(ctx, AreDimsEqual(4, input.dimension(2), "Input num"));
OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, initial_state.dimension(0),
"State batch"));
OP_REQUIRES_OK(
ctx, AreDimsEqual(output_dim, initial_state.dimension(1), "State dim"));
OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, initial_memory.dimension(0),
"Memory batch"));
OP_REQUIRES_OK(ctx, AreDimsEqual(output_dim, initial_memory.dimension(1),
"Memory dim"));
OP_REQUIRES_OK(
ctx, AreDimsEqual(output_dim, w_m_m.dimension(0), "Weight dim 0"));
OP_REQUIRES_OK(ctx, AreDimsEqual(4, w_m_m.dimension(1), "Weight dim 1"));
OP_REQUIRES_OK(
ctx, AreDimsEqual(output_dim, w_m_m.dimension(2), "Weight dim 2"));
// Outputs.
Tensor* act_tensor = nullptr;
OP_REQUIRES_OK(ctx, ctx->allocate_output(
0, {batch_size, seq_len, output_dim}, &act_tensor));
auto act = act_tensor->tensor<float, 3>();
act.setZero();
Tensor* gate_raw_act_tensor = nullptr;
OP_REQUIRES_OK(ctx,
ctx->allocate_output(1, {batch_size, seq_len, 4, output_dim},
&gate_raw_act_tensor));
auto gate_raw_act = gate_raw_act_tensor->tensor<float, 4>();
gate_raw_act.setZero();
Tensor* memory_tensor = nullptr;
OP_REQUIRES_OK(ctx,
ctx->allocate_output(2, {batch_size, seq_len, output_dim},
&memory_tensor));
auto memory = memory_tensor->tensor<float, 3>();
memory.setZero();
// Const and scratch tensors.
Tensor ones_tensor;
OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, {batch_size, output_dim},
&ones_tensor));
auto ones = ones_tensor.tensor<float, 2>();
ones.setConstant(1.0);
Tensor state_tensor;
OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, {batch_size, output_dim},
&state_tensor));
auto state = state_tensor.tensor<float, 2>();
state = initial_state;
Tensor scratch_tensor;
OP_REQUIRES_OK(ctx,
ctx->allocate_temp(DT_FLOAT, {batch_size, 4, output_dim},
&scratch_tensor));
auto scratch = scratch_tensor.tensor<float, 3>();
scratch.setZero();
// Uses the most efficient order for the contraction depending on the batch
// size.
// This is the code shared by both cases. Implicit capture in lambda
// functions is discouraged, but it should be clear what is done here.
auto Forward = [&](int i) {
// Each pre-activation value is stored in the following order (See the
// comment of the op for the meaning):
//
// i: 0
// j: 1
// f: 2
// o: 3
// Adds one to the pre-activation values of the forget gate. This is a
// heuristic to make the training easier.
scratch.chip(2, 1) += ones;
gate_raw_act.chip(i, 1) = scratch;
// c_t = f_t * c_{t-1} + i_t * j_t
if (i == 0) {
state = initial_memory * scratch.chip(2, 1).sigmoid();
} else {
state = memory.chip(i - 1, 1) * scratch.chip(2, 1).sigmoid();
}
state += scratch.chip(0, 1).sigmoid() * scratch.chip(1, 1).tanh();
if (clip_ > 0.0) {
// Clips the values if required.
state = state.cwiseMax(-clip_).cwiseMin(clip_);
}
memory.chip(i, 1) = state;
// h_t = o_t * tanh(c_t)
state = scratch.chip(3, 1).sigmoid() * state.tanh();
act.chip(i, 1) = state;
};
if (batch_size == 1) {
// Reshapes the weight tensor to pretend as if it is a matrix
// multiplication which is more efficient.
auto w_m_m_r =
w_m_m.reshape(array<DenseIndex, 2>{output_dim, 4 * output_dim});
// Dimensions for the contraction.
const array<IndexPair, 1> m_m_dim = {IndexPair(1, 0)};
for (int i = 0; i < seq_len; ++i) {
// Computes the pre-activation value of the input and each gate.
scratch = input.chip(i, 1) +
state.contract(w_m_m_r, m_m_dim)
.reshape(array<DenseIndex, 3>{batch_size, 4, output_dim});
Forward(i);
}
} else {
// Shuffles the dimensions of the weight tensor to be efficient when used
// in the left-hand side. Allocates memory for the shuffled tensor for
// efficiency.
Tensor w_m_m_s_tensor;
OP_REQUIRES_OK(ctx,
ctx->allocate_temp(DT_FLOAT, {output_dim * 4, output_dim},
&w_m_m_s_tensor));
auto w_m_m_s = w_m_m_s_tensor.tensor<float, 2>();
w_m_m_s = w_m_m.shuffle(array<int, 3>{2, 1, 0})
.reshape(array<DenseIndex, 2>{output_dim * 4, output_dim});
// Dimensions for the contraction.
const array<IndexPair, 1> m_m_dim = {IndexPair(1, 1)};
for (int i = 0; i < seq_len; ++i) {
// Computes the pre-activation value of the input and each gate.
scratch = input.chip(i, 1) +
w_m_m_s.contract(state, m_m_dim)
.reshape(array<DenseIndex, 3>{output_dim, 4, batch_size})
.shuffle(array<int, 3>{2, 1, 0});
Forward(i);
}
}
}
private:
// Threshold to clip the values of memory cells.
float clip_ = 0;
};
REGISTER_KERNEL_BUILDER(Name("VariableLSTM").Device(DEVICE_CPU),
VariableLSTMOp);
REGISTER_OP("VariableLSTM")
.Attr("clip: float = 0.0")
.Input("input: float32")
.Input("initial_state: float32")
.Input("initial_memory: float32")
.Input("w_m_m: float32")
.Output("activation: float32")
.Output("gate_raw_act: float32")
.Output("memory: float32")
.Doc(R"doc(
Computes the forward propagation of a Long Short-Term Memory Network.
It computes the following equation recursively for `0<t<=T`:
i_t = sigmoid(a_{i,t})
j_t = tanh(a_{j,t})
f_t = sigmoid(a_{f,t} + 1.0)
o_t = sigmoid(a_{o,t})
c_t = f_t * c_{t-1} + i_t * j_t
c'_t = min(max(c_t, -clip), clip) if clip > 0 else c_t
h_t = o_t * tanh(c'_t)
where
a_{l,t} = w_{l,m,m} * h_{t-1} + x'_{l,t}
where
x'_{l,t} = w_{l,m,i} * x_{t}.
`input` corresponds to the concatenation of `X'_i`, `X'_j`, `X'_f`, and `X'_o`
where `X'_l = (x'_{l,1}, x'_{l,2}, ..., x'_{l,T})`, `initial_state` corresponds
to `h_{0}`, `initial_memory` corresponds to `c_{0}` and `weight` corresponds to
`w_{l,m,m}`. `X'_l` (the transformed input) is computed outside of the op in
advance, so w_{l,m,i} is not passed in to the op.
`activation` corresponds to `H = (h_1, h_2, ..., h_T)`, `gate_raw_activation`
corresponds to the concatenation of `A_i`, `A_j`, `A_f` and `A_o`, and `memory`
corresponds to `C = (c_1, c_2, ..., c_T)`.
All entries in the batch are propagated to the end, and are assumed to be the
same length.
input: 4-D with shape `[batch_size, seq_len, 4, num_nodes]`
initial_state: 2-D with shape `[batch_size, num_nodes]`
initial_memory: 2-D with shape `[batch_size, num_nodes]`
w_m_m: 3-D with shape `[num_nodes, 4, num_nodes]`
activation: 3-D with shape `[batch_size, seq_len, num_nodes]`
gate_raw_act: 3-D with shape `[batch_size, seq_len, 4, num_nodes]`
memory: 3-D with shape `[batch_size, seq_len, num_nodes]`
)doc");
// ----------------------------- VariableLSTMGradOp ----------------------------
// Kernel to compute the gradient of VariableLSTMOp.
class VariableLSTMGradOp : public OpKernel {
public:
explicit VariableLSTMGradOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
void Compute(OpKernelContext* ctx) override {
// Inputs.
const auto initial_state = ctx->input(0).tensor<float, 2>();
const auto initial_memory = ctx->input(1).tensor<float, 2>();
const auto w_m_m = ctx->input(2).tensor<float, 3>();
const auto act = ctx->input(3).tensor<float, 3>();
const auto gate_raw_act = ctx->input(4).tensor<float, 4>();
const auto memory = ctx->input(5).tensor<float, 3>();
const auto act_grad = ctx->input(6).tensor<float, 3>();
const auto gate_raw_act_grad = ctx->input(7).tensor<float, 4>();
const auto memory_grad = ctx->input(8).tensor<float, 3>();
const int batch_size = act.dimension(0);
const int seq_len = act.dimension(1);
const int output_dim = act.dimension(2);
// Sanity checks.
OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, initial_state.dimension(0),
"State batch"));
OP_REQUIRES_OK(
ctx, AreDimsEqual(output_dim, initial_state.dimension(1), "State dim"));
OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, initial_memory.dimension(0),
"Memory batch"));
OP_REQUIRES_OK(ctx, AreDimsEqual(output_dim, initial_memory.dimension(1),
"Memory dim"));
OP_REQUIRES_OK(
ctx, AreDimsEqual(output_dim, w_m_m.dimension(0), "Weight dim 0"));
OP_REQUIRES_OK(ctx, AreDimsEqual(4, w_m_m.dimension(1), "Weight dim 1"));
OP_REQUIRES_OK(
ctx, AreDimsEqual(output_dim, w_m_m.dimension(2), "Weight dim 2"));
OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, gate_raw_act.dimension(0),
"Gate raw activation batch"));
OP_REQUIRES_OK(ctx, AreDimsEqual(seq_len, gate_raw_act.dimension(1),
"Gate raw activation len"));
OP_REQUIRES_OK(ctx, AreDimsEqual(4, gate_raw_act.dimension(2),
"Gate raw activation num"));
OP_REQUIRES_OK(ctx, AreDimsEqual(output_dim, gate_raw_act.dimension(3),
"Gate raw activation dim"));
OP_REQUIRES_OK(
ctx, AreDimsEqual(batch_size, memory.dimension(0), "Memory batch"));
OP_REQUIRES_OK(ctx,
AreDimsEqual(seq_len, memory.dimension(1), "Memory len"));
OP_REQUIRES_OK(ctx,
AreDimsEqual(output_dim, memory.dimension(2), "Memory dim"));
OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, act_grad.dimension(0),
"Activation gradient batch"));
OP_REQUIRES_OK(ctx, AreDimsEqual(seq_len, act_grad.dimension(1),
"Activation gradient len"));
OP_REQUIRES_OK(ctx, AreDimsEqual(output_dim, act_grad.dimension(2),
"Activation gradient dim"));
OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, gate_raw_act_grad.dimension(0),
"Activation gradient batch"));
OP_REQUIRES_OK(ctx, AreDimsEqual(seq_len, gate_raw_act_grad.dimension(1),
"Activation gradient len"));
OP_REQUIRES_OK(ctx, AreDimsEqual(4, gate_raw_act_grad.dimension(2),
"Activation gradient num"));
OP_REQUIRES_OK(ctx, AreDimsEqual(output_dim, gate_raw_act_grad.dimension(3),
"Activation gradient dim"));
OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, memory_grad.dimension(0),
"Memory gradient batch"));
OP_REQUIRES_OK(ctx, AreDimsEqual(seq_len, memory_grad.dimension(1),
"Memory gradient len"));
OP_REQUIRES_OK(ctx, AreDimsEqual(output_dim, memory_grad.dimension(2),
"Memory gradient dim"));
// Outputs.
std::vector<Tensor*> collections(4, nullptr);
OP_REQUIRES_OK(ctx,
ctx->allocate_output(0, {batch_size, seq_len, 4, output_dim},
&collections[0]));
auto input_grad = collections[0]->tensor<float, 4>();
input_grad.setZero();
OP_REQUIRES_OK(ctx, ctx->allocate_output(1, {batch_size, output_dim},
&collections[1]));
auto init_state_grad = collections[1]->tensor<float, 2>();
init_state_grad.setZero();
OP_REQUIRES_OK(ctx, ctx->allocate_output(2, {batch_size, output_dim},
&collections[2]));
auto init_memory_grad = collections[2]->tensor<float, 2>();
init_memory_grad.setZero();
OP_REQUIRES_OK(ctx, ctx->allocate_output(3, {output_dim, 4, output_dim},
&collections[3]));
auto w_m_m_grad = collections[3]->tensor<float, 3>();
w_m_m_grad.setZero();
// Const and scratch tensors.
Tensor ones_tensor;
OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, {batch_size, output_dim},
&ones_tensor));
auto ones = ones_tensor.tensor<float, 2>();
ones.setConstant(1.0);
Tensor scratch_tensor;
OP_REQUIRES_OK(ctx,
ctx->allocate_temp(DT_FLOAT, {batch_size, 4, output_dim},
&scratch_tensor));
auto scratch = scratch_tensor.tensor<float, 3>();
scratch.setZero();
Tensor tmp1_tensor;
OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, {batch_size, output_dim},
&tmp1_tensor));
auto tmp1 = tmp1_tensor.tensor<float, 2>();
tmp1.setZero();
Tensor tmp2_tensor;
OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, {batch_size, output_dim},
&tmp2_tensor));
auto tmp2 = tmp2_tensor.tensor<float, 2>();
tmp2.setZero();
// Uses the most efficient order for the contraction depending on the batch
// size.
// Shuffles the dimensions of the weight tensor to be efficient when used in
// the left-hand side. Allocates memory for the shuffled tensor for
// efficiency.
Tensor w_m_m_s_tensor;
OP_REQUIRES_OK(ctx,
ctx->allocate_temp(DT_FLOAT, {4, output_dim, output_dim},
&w_m_m_s_tensor));
auto w_m_m_s = w_m_m_s_tensor.tensor<float, 3>();
if (batch_size == 1) {
// Computes the shuffled tensor only when it is used.
w_m_m_s = w_m_m.shuffle(array<int, 3>{1, 2, 0});
}
// Dimensions for the contraction with the weight tensor.
const array<IndexPair, 1> m_m_dim =
batch_size == 1 ? array<IndexPair, 1>{IndexPair(1, 0)}
: array<IndexPair, 1>{IndexPair(1, 1)};
// Dimensions for the contraction of the batch dimensions.
const array<IndexPair, 1> b_b_dim = {IndexPair(0, 0)};
for (int i = seq_len - 1; i >= 0; --i) {
if (i == seq_len - 1) {
init_state_grad = act_grad.chip(i, 1);
} else {
w_m_m_grad +=
act.chip(i, 1)
.contract(scratch.reshape(
array<DenseIndex, 2>{batch_size, 4 * output_dim}),
b_b_dim)
.reshape(array<DenseIndex, 3>{output_dim, 4, output_dim});
if (batch_size == 1) {
init_state_grad.device(ctx->eigen_cpu_device()) =
scratch.chip(0, 1).contract(w_m_m_s.chip(0, 0), m_m_dim) +
scratch.chip(1, 1).contract(w_m_m_s.chip(1, 0), m_m_dim) +
scratch.chip(2, 1).contract(w_m_m_s.chip(2, 0), m_m_dim) +
scratch.chip(3, 1).contract(w_m_m_s.chip(3, 0), m_m_dim);
} else {
init_state_grad.device(ctx->eigen_cpu_device()) =
(w_m_m.chip(0, 1).contract(scratch.chip(0, 1), m_m_dim) +
w_m_m.chip(1, 1).contract(scratch.chip(1, 1), m_m_dim) +
w_m_m.chip(2, 1).contract(scratch.chip(2, 1), m_m_dim) +
w_m_m.chip(3, 1).contract(scratch.chip(3, 1), m_m_dim))
.shuffle(array<int, 2>{1, 0});
}
init_state_grad += act_grad.chip(i, 1);
}
auto gate_raw_act_t = gate_raw_act.chip(i, 1);
auto gate_raw_act_grad_t = gate_raw_act_grad.chip(i, 1);
// Output gate.
tmp1 = memory.chip(i, 1);
tmp1 = tmp1.tanh(); // y_t
tmp2 = gate_raw_act_t.chip(3, 1).sigmoid(); // o_t
scratch.chip(3, 1) = init_state_grad * tmp1 * tmp2 * (ones - tmp2) +
gate_raw_act_grad_t.chip(3, 1);
init_memory_grad += init_state_grad * tmp2 * (ones - tmp1.square()) +
memory_grad.chip(i, 1);
// Input gate.
tmp1 = gate_raw_act_t.chip(0, 1).sigmoid(); // i_t
tmp2 = gate_raw_act_t.chip(1, 1);
tmp2 = tmp2.tanh(); // j_t
scratch.chip(0, 1) = init_memory_grad * tmp2 * tmp1 * (ones - tmp1) +
gate_raw_act_grad_t.chip(0, 1);
// Input.
scratch.chip(1, 1) = init_memory_grad * tmp1 * (ones - tmp2.square()) +
gate_raw_act_grad_t.chip(1, 1);
// Forget gate.
tmp1 = gate_raw_act_t.chip(2, 1).sigmoid(); // f_t
if (i == 0) {
scratch.chip(2, 1) =
init_memory_grad * initial_memory * tmp1 * (ones - tmp1) +
gate_raw_act_grad_t.chip(2, 1);
} else {
scratch.chip(2, 1) =
init_memory_grad * memory.chip(i - 1, 1) * tmp1 * (ones - tmp1) +
gate_raw_act_grad_t.chip(2, 1);
}
// Memory.
init_memory_grad *= tmp1;
input_grad.chip(i, 1) = scratch;
}
w_m_m_grad += initial_state
.contract(scratch.reshape(array<DenseIndex, 2>{
batch_size, 4 * output_dim}),
b_b_dim)
.reshape(array<DenseIndex, 3>{output_dim, 4, output_dim});
if (batch_size == 1) {
init_state_grad.device(ctx->eigen_cpu_device()) =
(scratch.chip(0, 1).contract(w_m_m_s.chip(0, 0), m_m_dim) +
scratch.chip(1, 1).contract(w_m_m_s.chip(1, 0), m_m_dim) +
scratch.chip(2, 1).contract(w_m_m_s.chip(2, 0), m_m_dim) +
scratch.chip(3, 1).contract(w_m_m_s.chip(3, 0), m_m_dim));
} else {
init_state_grad.device(ctx->eigen_cpu_device()) =
(w_m_m.chip(0, 1).contract(scratch.chip(0, 1), m_m_dim) +
w_m_m.chip(1, 1).contract(scratch.chip(1, 1), m_m_dim) +
w_m_m.chip(2, 1).contract(scratch.chip(2, 1), m_m_dim) +
w_m_m.chip(3, 1).contract(scratch.chip(3, 1), m_m_dim))
.shuffle(array<int, 2>{1, 0});
}
}
};
REGISTER_KERNEL_BUILDER(Name("VariableLSTMGrad").Device(DEVICE_CPU),
VariableLSTMGradOp);
REGISTER_OP("VariableLSTMGrad")
.Input("initial_state: float32")
.Input("initial_memory: float32")
.Input("w_m_m: float32")
.Input("activation: float32")
.Input("gate_raw_act: float32")
.Input("memory: float32")
.Input("act_grad: float32")
.Input("gate_raw_act_grad: float32")
.Input("memory_grad: float32")
.Output("input_grad: float32")
.Output("initial_state_grad: float32")
.Output("initial_memory_grad: float32")
.Output("w_m_m_grad: float32")
.Doc(R"doc(
Computes the gradient for VariableLSTM.
This is to be used in conjunction with VariableLSTM. It ignores the clipping used
in the forward pass.
initial_state: 2-D with shape `[batch_size, num_nodes]`
initial_memory: 2-D with shape `[batch_size, num_nodes]`
w_m_m: 3-D with shape `[num_nodes, 4, num_nodes]`
activation: 3-D with shape `[batch_size, seq_len, num_nodes]`
gate_raw_act: 3-D with shape `[batch_size, seq_len, 4, num_nodes]`
memory: 3-D with shape `[batch_size, seq_len, num_nodes]`
act_grad: 3-D with shape `[batch_size, seq_len, num_nodes]`
gate_raw_act_grad: 3-D with shape `[batch_size, seq_len, 4, num_nodes]`
memory_grad: 3-D with shape `[batch_size, seq_len, num_nodes]`
input_grad: 3-D with shape `[batch_size, seq_len, num_nodes]`
initial_state_grad: 2-D with shape `[batch_size, num_nodes]`
initial_memory_grad: 2-D with shape `[batch_size, num_nodes]`
w_m_m_grad: 3-D with shape `[num_nodes, 4, num_nodes]`
)doc");
} // namespace tensorflow
# VGSL Specs - rapid prototyping of mixed conv/LSTM networks for images.
Variable-size Graph Specification Language (VGSL) enables the specification of a
Tensor Flow graph, composed of convolutions and LSTMs, that can process
variable-sized images, from a very short definition string.
## Applications: What are VGSL Specs good for?
VGSL Specs are designed specifically to create TF graphs for:
* Variable-size images as the input. (In one or BOTH dimensions!)
* Output of an image (heat map), a sequence (like text), or a category.
* Convolutions and LSTMs as the main computing components.
* Fixed-size images are OK too!
But wait, aren't there other systems that simplify generating TF graphs? There
are indeed, but something they all have in common is that they are designed for
fixed size images only. If you want to solve a real OCR problem, you either have
to cut the image into arbitrary sized pieces and try to stitch the results back
together, or use VGSL.
## Basic Usage
A full model, including input and the output layers, can be built using
vgsl_model.py. Alternatively you can supply your own tensors and add your own
loss function layer if you wish, using vgslspecs.py directly.
### Building a full model
Provided your problem matches the one addressed by vgsl_model, you are good to
go.
Targeted problems:
* Images for input, either 8 bit greyscale or 24 bit color.
* Output is 0-d (A category, like cat, dog, train, car.)
* Output is 1-d, with either variable length or a fixed length sequence, eg
OCR, transcription problems in general.
Currently only softmax (1 of n) outputs are supported, but it would not be
difficult to extend to logistic.
Use vgsl_train.py to train your model, and vgsl_eval.py to evaluate it. They
just call Train and Eval in vgsl_model.py.
### Model string for a full model
The model string for a full model includes the input spec, the output spec and
the layers spec in between. Example:
```
'1,0,0,3[Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256]O1c105'
```
The first 4 numbers specify the standard TF tensor dimensions: [batch, height,
width, depth], except that height and/or width may be zero, allowing them to be
variable. The batch size applies only to training, and may be a different
value at recognition/inference time. Depth needs to be 1 for greyscale and 3
for color.
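In TF terms, a zero size simply becomes an unknown dimension. For example,
the `1,0,0,3` prefix above corresponds to a placeholder like the following
(a sketch, not the actual input code in `vgsl_model.py`):
```python
import tensorflow as tf

# '1,0,0,3': batch of 1, variable height, variable width, 3-channel color.
images = tf.placeholder(tf.float32, shape=[1, None, None, 3])
```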
The model string in square brackets [] is the main model definition, which is
described [below.](#basic-layers-syntax) The output specification takes the
form:
```
O(2|1|0)(l|s|c)n output layer with n classes.
2 (heatmap) Output is a 2-d vector map of the input (possibly at
different scale). (Not yet supported.)
1 (sequence) Output is a 1-d sequence of vector values.
0 (category) Output is a 0-d single vector value.
l uses a logistic non-linearity on the output, allowing multiple
hot elements in any output vector value. (Not yet supported.)
s uses a softmax non-linearity, with one-hot output in each value.
c uses a softmax with CTC. Can only be used with 1 (sequence).
NOTE Only O0s, O1s and O1c are currently supported.
```
The number of classes must match the encoding of the TF Example data set.
### Layers only - providing your own input and loss layers
You don't have to use the canned input/output modules if you provide your own
code to read TF Examples and your own loss function. First prepare your inputs:
* A TF-conventional batch of: `images = tf.float32[batch, height, width,
depth]`
* A tensor of the width of each image in the batch: `widths = tf.int64[batch]`
* A tensor of the height of each image in the batch: `heights =
tf.int64[batch]`
Note that these can be created from individual images using
`tf.train.batch_join` with `dynamic_pad=True`.
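For example, a self-contained sketch of building such a padded batch (the
`image`, `width` and `height` tensors here are dummies standing in for your
own TF Example decoder):
```python
import tensorflow as tf

# Stand-ins for the per-example tensors from your own reader/decoder.
image = tf.random_uniform([150, 600, 3])  # [height, width, depth]
width = tf.constant(600, dtype=tf.int64)
height = tf.constant(150, dtype=tf.int64)

# dynamic_pad=True pads every image in a batch to the largest one present.
images, widths, heights = tf.train.batch_join(
    [(image, width, height)], batch_size=8, capacity=256, dynamic_pad=True)
```
With those batched tensors in hand, build the graph: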
```python
import vgslspecs
...
spec = '[Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256]'
vgsl = vgslspecs.VGSLSpecs(widths, heights, is_training=True)
last_layer = vgsl.Build(images, spec)
...
AddSomeLossFunction(last_layer)....
```
With some appropriate training data, this would create a world-class OCR engine!
## Basic Layers Syntax
NOTE that *all* ops input and output the standard TF convention of a 4-d tensor:
`[batch, height, width, depth]` *regardless of any collapsing of dimensions.*
This greatly simplifies things, and allows the VGSLSpecs class to track changes
to the values of widths and heights, so they can be correctly passed in to LSTM
operations, and used by any downstream CTC operation.
NOTE: in the descriptions below, `<d>` is a numeric value, and literals are
described using regular expression syntax.
NOTE: Whitespace is allowed between ops.
### Naming
Each op gets a unique name by default, based on its spec string plus its
character position in the overall specification. All the Ops take an optional
name argument in braces after the mnemonic code, but before any numeric
arguments.
### Functional ops
```
C(s|t|r|l|m)[{name}]<y>,<x>,<d> Convolves using a y,x window, with no shrinkage,
SAME infill, d outputs, with s|t|r|l|m non-linear layer.
F(s|t|r|l|m)[{name}]<d> Fully-connected with s|t|r|l|m non-linearity and d
outputs. Reduces height, width to 1. Input height and width must be constant.
L(f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
The LSTM must have one of:
f runs the LSTM forward only.
r runs the LSTM reversed only.
b runs the LSTM bidirectionally.
It will operate on either the x- or y-dimension, treating the other dimension
independently (as if part of the batch).
(Full 2-d and grid are not yet supported).
s (optional) summarizes the output in the requested dimension,
outputting only the final step, collapsing the dimension to a
single element.
Do[{name}] Insert a dropout layer.
```
In the above, `(s|t|r|l|m)` specifies the type of the non-linearity:
```python
s = sigmoid
t = tanh
r = relu
l = linear (i.e., None)
m = softmax
```
Examples:
`Cr5,5,32` Runs a 5x5 Relu convolution with 32 depth/number of filters.
`Lfx{MyLSTM}128` runs a forward-only LSTM, named 'MyLSTM' in the x-dimension
with 128 outputs, treating the y dimension independently.
`Lfys64` runs a forward-only LSTM in the y-dimension with 64 outputs, treating
the x-dimension independently and collapses the y-dimension to 1 element.
### Plumbing ops
The plumbing ops allow the construction of arbitrarily complex graphs. Something
currently missing is the ability to define macros for generating, say, an
inception unit in multiple places.
```
[...] Execute ... networks in series (layers).
(...) Execute ... networks in parallel, with their output concatenated in depth.
S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to another
dimension.
Mp[{name}]<y>,<x> Maxpool the input, reducing the (y,x) rectangle to a single
value.
```
In the `S` op, `<a>, <b>, <d>, <e>, <f>` are numbers.
`S` is a generalized reshape. It splits input dimension `d` into `a` x `b`,
sending the high/most significant part `a` to the high/most significant side of
dimension `e`, and the low part `b` to the high side of dimension `f`.
Exception: if `d=e=f`, then dimension `d` is internally transposed to
`bxa`. *At least one* of `e`, `f` must be equal to `d`, so no dimension can be
totally destroyed. Either `a` or `b` can be zero, meaning whatever is left after
taking out the other, allowing dimensions to be of variable size.
NOTE: Remember the standard TF convention of a 4-d tensor: `[batch, height,
width, depth]`, so `batch=0, height=1, width=2, depth=3.`
Eg. `S3(3x50)2,3` will split the 150-element depth into 3x50, with the 3 going
to the most significant part of the width, and the 50 part staying in depth.
This will rearrange a 3x50 output parallel operation to spread the 3 output sets
over width.
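To make the index arithmetic concrete, here is a NumPy sketch of the
`S3(3x50)2,3` example (an illustration of the semantics, not the library's
implementation):
```python
import numpy as np

# [batch, height, width, depth] with depth = 150 = 3x50.
x = np.arange(1 * 2 * 4 * 150).reshape(1, 2, 4, 150)
# Split depth (d=3) into a=3 (high part) x b=50 (low part)...
y = x.reshape(1, 2, 4, 3, 50)
# ...then move the 3 to the high/most significant side of width (e=2),
# leaving the 50 in depth (f=3): result is [1, 2, 3*4, 50].
y = y.transpose(0, 1, 3, 2, 4).reshape(1, 2, 12, 50)
```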
### Full Examples
Example 1: A graph capable of high quality OCR.
`1,0,0,1[Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256]O1c105`
As layer descriptions: (Input layer is at the bottom, output at the top.)
```
O1c105: Output layer produces 1-d (sequence) output, trained with CTC,
outputting 105 classes.
Lfx256: Forward-only LSTM in x with 256 outputs
Lrx128: Reverse-only LSTM in x with 128 outputs
Lfx128: Forward-only LSTM in x with 128 outputs
Lfys64: Dimension-summarizing LSTM, summarizing the y-dimension with 64 outputs
Mp3,3: 3 x 3 Maxpool
Ct5,5,16: 5 x 5 Convolution with 16 outputs and tanh non-linearity
[]: The body of the graph is always expressed as a series of layers.
1,0,0,1: Input is a batch of 1 image of variable size in greyscale
```
Example 2: The STREET network for reading French street name signs end-to-end.
For a detailed description see the [FSNS dataset
paper](http://link.springer.com/chapter/10.1007%2F978-3-319-46604-0_30)
```
1,150,600,3[S2(4x150)0,2 Ct5,5,16 Mp2,2 Ct5,5,64 Mp3,3
([Lrys64 Lbx128][Lbys64 Lbx128][Lfys64 Lbx128]) S3(3x0)2,3
Lfx128 Lrx128 S0(1x4)0,3 Lfx256]O1c134
```
Since networks are usually illustrated with the input at the bottom, the input
layer is at the bottom, output at the top, with 'headings' *below* the section
they introduce.
```
O1c134: Output is a 1-d sequence, trained with CTC and 134 output softmax.
Lfx256: Forward-only LSTM with 256 outputs
S0(1x4)0,3: Reshape transferring the batch of 4 tiles to the depth dimension.
Lrx128: Reverse-only LSTM with 128 outputs
Lfx128: Forward-only LSTM with 128 outputs
(Final section above)
S3(3x0)2,3: Split the outputs of the 3 parallel summarizers and spread over the
x-dimension
[Lfys64 Lbx128]: Summarizing LSTM downwards on the y-dimension with 64
outputs, followed by a bi-directional LSTM in the x-dimension with 128
outputs
[Lbys64 Lbx128]: Summarizing bi-directional LSTM on the y-dimension with
64 outputs, followed by a bi-directional LSTM in the x-dimension with 128
outputs
[Lrys64 Lbx128]: Summarizing LSTM upwards on the y-dimension with 64 outputs,
followed by a bi-directional LSTM in the x-dimension with 128 outputs
(): In parallel (re-using the inputs and concatenating the outputs):
(Summarizing section above)
Mp3,3: 3 x 3 Maxpool
Ct5,5,64: 5 x 5 Convolution with 64 outputs and tanh non-linearity
Mp2,2: 2 x 2 Maxpool
Ct5,5,16: 5 x 5 Convolution with 16 outputs and tanh non-linearity
S2(4x150)0,2: Split the x-dimension into 4x150, converting each tiled 600x150
image into a batch of 4 150x150 images
(Convolutional input section above)
[]: The body of the graph is always expressed as a series of layers.
1,150,600,3: Input is a batch of one 600x150 image in 24 bit color
```
## Variable size Tensors Under the Hood
Here are some notes about handling variable-sized images since they require some
consideration and a little bit of knowledge about what goes on inside.
A variable-sized image is an input for which the width and/or height are not
known at graph-building time, so the tensor shape contains unknown/None/-1
sizes.
Many standard NN layers, such as convolutions, are designed to cope naturally
with variable-sized images in TF and produce a variable sized image as the
output. For other layers, such as 'Fully connected', variable size is
fundamentally difficult, if not impossible, to deal with, since by definition,
*all* its inputs are connected via a weight to an output. The number of inputs
therefore must be fixed.
It is possible to handle variable sized images by using sparse tensors. Some
implementations make a single variable dimension a list instead of part of the
tensor. Both these solutions suffer from completely segregating the world of
variable size from the world of fixed size, making models and their descriptions
completely non-interchangeable.
In VGSL, we use a standard 4-d Tensor, `[batch, height, width, depth]` and
either use a batch size of 1 or put up with padding of the input images to the
largest size of any element of the batch. The other price paid for this
standardization is that the user must supply a pair of tensors of shape [batch]
specifying the width and height of each input in a batch. This allows the LSTMs
in the graph to know how many iterations to execute and how to correctly
back-propagate the gradients.
The standard TF implementation of CTC also requires a tensor giving the sequence
lengths of its inputs. If the output of VGSL is going into CTC, the lengths can
be obtained using:
```python
import vgslspecs
...
spec = '[Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256]'
vgsl = vgslspecs.VGSLSpecs(widths, heights, is_training=True)
last_layer = vgsl.Build(images, spec)
seq_lengths = vgsl.GetLengths()
```
The above will provide the widths that were given in the constructor, scaled
down by the max-pool operator. The heights may be obtained using
`vgsl.GetLengths(1)`, specifying the index of the y-dimension.
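Those lengths are what the standard CTC loss needs. A sketch against the
TF 0.x API, assuming `last_layer` has already been projected to
`num_classes` (the job of the `O1c` output layer) and `labels` is a
`SparseTensor` of target codes you have built yourself:
```python
import tensorflow as tf

# The y-dimension was collapsed to 1 by Lfys64, so drop it, then go
# time-major: CTC wants logits of shape [max_time, batch, num_classes].
logits = tf.transpose(tf.squeeze(last_layer, squeeze_dims=[1]), [1, 0, 2])
loss = tf.nn.ctc_loss(logits, labels, tf.to_int32(seq_lengths))
```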
NOTE that currently the only way of collapsing a dimension of unknown size to
known size (1) is through the use of a summarizing LSTM. A single summarizing
LSTM will collapse one dimension (x or y), leaving a 1-d sequence. The 1-d
sequence can then be collapsed in the other dimension to make a 0-d categorical
(softmax) or embedding (logistic) output.
Using the (parallel) op it is entirely possible to run multiple [series] of ops
that collapse x first in one and y first in the other, reducing both eventually
to a single categorical value! For example, the following description may do
something useful with ImageNet-like problems:
```python
[Cr5,5,16 Mp2,2 Cr5,5,64 Mp3,3 ([Lfxs64 Lfys256] [Lfys64 Lfxs256]) Fr512 Fr512]
```
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Basic CTC+recoder decoder.
Decodes a sequence of class-ids into UTF-8 text.
For basic information on CTC See:
Alex Graves et al. Connectionist Temporal Classification: Labelling Unsegmented
Sequence Data with Recurrent Neural Networks.
http://www.cs.toronto.edu/~graves/icml_2006.pdf
"""
import collections
import re
import errorcounter as ec
import tensorflow as tf
# Named tuple Part describes a part of a multi (1 or more) part code that
# represents a utf-8 string. For example, Chinese character 'x' might be
# represented by 3 codes of which (utf8='x', index=1, num_codes=3) would be the
# middle part. (The actual code is not stored in the tuple).
Part = collections.namedtuple('Part', 'utf8, index, num_codes')
# Class that decodes a sequence of class-ids into UTF-8 text.
class Decoder(object):
"""Basic CTC+recoder decoder."""
def __init__(self, filename):
r"""Constructs a Decoder.
Reads the text file describing the encoding and build the encoder.
The text file contains lines of the form:
<code>[,<code>]*\t<string>
Each line defines a mapping from a sequence of one or more integer codes to
a corresponding utf-8 string.
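For example, the line "42\ta" maps the single code 42 to the string "a",
and "6,42\txy" maps the code sequence (6, 42) to "xy".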
Args:
filename: Name of file defining the decoding sequences.
"""
# self.decoder is a list of lists of Part(utf8, index, num_codes).
# The index to the top-level list is a code. The list given by the code
# index is a list of the parts represented by that code. E.g. if the code 42
# represents the 2nd (index 1) of the 3 parts of Chinese character 'x', then
# self.decoder[42] = [..., (utf8='x', index=1, num_codes=3), ...] where ...
# means all other uses of the code 42.
self.decoder = []
if filename:
self._InitializeDecoder(filename)
def SoftmaxEval(self, sess, model, num_steps):
"""Evaluate a model in softmax mode.
Adds char, word recall and sequence error rate events to the sw summary
writer, and returns them as well.
TODO(rays) Add LogisticEval.
Args:
sess: A tensor flow Session.
model: The model to run in the session. Requires a VGSLImageModel or any
other class that has a using_ctc attribute and a RunAStep(sess) method
that returns a softmax result with corresponding labels.
num_steps: Number of steps to evaluate for.
Returns:
ErrorRates named tuple.
Raises:
ValueError: If an unsupported number of dimensions is used.
"""
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
# Run the requested number of evaluation steps, gathering the outputs of the
# softmax and the true labels of the evaluation examples.
total_label_counts = ec.ErrorCounts(0, 0, 0, 0)
total_word_counts = ec.ErrorCounts(0, 0, 0, 0)
sequence_errors = 0
for _ in xrange(num_steps):
softmax_result, labels = model.RunAStep(sess)
# Collapse softmax to same shape as labels.
predictions = softmax_result.argmax(axis=-1)
# Exclude batch from num_dims.
num_dims = len(predictions.shape) - 1
batch_size = predictions.shape[0]
null_label = softmax_result.shape[-1] - 1
for b in xrange(batch_size):
if num_dims == 2:
# TODO(rays) Support 2-d data.
raise ValueError('2-d label data not supported yet!')
else:
if num_dims == 1:
pred_batch = predictions[b, :]
labels_batch = labels[b, :]
else:
pred_batch = [predictions[b]]
labels_batch = [labels[b]]
text = self.StringFromCTC(pred_batch, model.using_ctc, null_label)
truth = self.StringFromCTC(labels_batch, False, null_label)
# Note that recall_errs is false negatives (fn) aka drops/deletions.
# Actual recall would be 1-fn/truth_words.
# Likewise precision_errs is false positives (fp) aka adds/insertions.
# Actual precision would be 1-fp/ocr_words.
total_word_counts = ec.AddErrors(total_word_counts,
ec.CountWordErrors(text, truth))
total_label_counts = ec.AddErrors(total_label_counts,
ec.CountErrors(text, truth))
if text != truth:
sequence_errors += 1
coord.request_stop()
coord.join(threads)
return ec.ComputeErrorRates(total_label_counts, total_word_counts,
sequence_errors, num_steps * batch_size)
def StringFromCTC(self, ctc_labels, merge_dups, null_label):
"""Decodes CTC output to a string.
Extracts only sequences of codes that are allowed by self.decoder.
Labels that make illegal code sequences are dropped.
Note that, by its nature of taking only top choices, this is much weaker
than a full-blown beam search that considers all the softmax outputs.
For languages without many multi-code sequences, this doesn't make much
difference, but for complex scripts the accuracy will be much lower.
Args:
ctc_labels: List of class labels including null characters to remove.
merge_dups: If True, duplicate labels will be merged.
null_label: Label value to ignore.
Returns:
Labels decoded to a string.
"""
# Run regular ctc on the labels, extracting a list of codes.
codes = self._CodesFromCTC(ctc_labels, merge_dups, null_label)
length = len(codes)
if length == 0:
return ''
# strings and partials are both indexed by the same index as codes.
# strings[i] is the best completed string up to position i, and
# partials[i] is a list of partial code sequences at position i.
# Warning: memory is squared-order in length.
strings = []
partials = []
for pos in xrange(length):
code = codes[pos]
parts = self.decoder[code]
partials.append([])
strings.append('')
# Iterate over the parts that this code can represent.
for utf8, index, num_codes in parts:
if index > pos:
continue
# We can use code if it is an initial code (index==0) or continues a
# sequence in the partials list at the previous position.
if index == 0 or partials[pos - 1].count(
Part(utf8, index - 1, num_codes)) > 0:
if index < num_codes - 1:
# Save the partial sequence.
partials[-1].append(Part(utf8, index, num_codes))
elif not strings[-1]:
# A code sequence is completed. Append to the best string that we
# had where it started.
if pos >= num_codes:
strings[-1] = strings[pos - num_codes] + utf8
else:
strings[-1] = utf8
if not strings[-1] and pos > 0:
# We didn't get anything here so copy the previous best string, skipping
# the current code, but it may just be a partial anyway.
strings[-1] = strings[-2]
return strings[-1]
def _InitializeDecoder(self, filename):
"""Reads the decoder file and initializes self.decoder from it.
Args:
filename: Name of text file mapping codes to utf8 strings.
Raises:
ValueError: if the input file is not parsed correctly.
"""
line_re = re.compile(r'(?P<codes>\d+(,\d+)*)\t(?P<utf8>.+)')
with tf.gfile.GFile(filename) as f:
for line in f:
m = line_re.match(line)
if m is None:
raise ValueError('Unmatched line:', line)
# codes is the sequence that maps to the string.
str_codes = m.groupdict()['codes'].split(',')
codes = []
for code in str_codes:
codes.append(int(code))
utf8 = m.groupdict()['utf8']
num_codes = len(codes)
for index, code in enumerate(codes):
while code >= len(self.decoder):
self.decoder.append([])
self.decoder[code].append(Part(utf8, index, num_codes))
def _CodesFromCTC(self, ctc_labels, merge_dups, null_label):
"""Collapses CTC output to regular output.
Args:
ctc_labels: List of class labels including null characters to remove.
merge_dups: If True, duplicate labels will be merged.
null_label: Label value to ignore.
All trailing zeros are removed!!
TODO(rays) This may become a problem with non-CTC models.
If using charset, this should not be a problem as zero is always space.
tf.pad can only append zero, so we have to be able to drop them, as a
non-ctc will have learned to output trailing zeros instead of trailing
nulls. This is awkward, as the stock ctc loss function requires that the
null character be num_classes-1.
Returns:
(List of) Labels with null characters removed.
"""
out_labels = []
prev_label = -1
zeros_needed = 0
for label in ctc_labels:
if label == null_label:
prev_label = -1
elif label != prev_label or not merge_dups:
if label == 0:
# Count zeros and only emit them when it is clear there is a non-zero
# after, so as to truncate away all trailing zeros.
zeros_needed += 1
else:
if merge_dups and zeros_needed > 0:
out_labels.append(0)
else:
out_labels += [0] * zeros_needed
zeros_needed = 0
out_labels.append(label)
prev_label = label
return out_labels
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for decoder."""
import os
import tensorflow as tf
import decoder
def _testdata(filename):
return os.path.join('../testdata/', filename)
class DecoderTest(tf.test.TestCase):
def testCodesFromCTC(self):
"""Tests that the simple CTC decoder drops nulls and duplicates.
"""
ctc_labels = [9, 9, 9, 1, 9, 2, 2, 3, 9, 9, 0, 0, 1, 9, 1, 9, 9, 9]
decode = decoder.Decoder(filename=None)
non_null_labels = decode._CodesFromCTC(
ctc_labels, merge_dups=False, null_label=9)
self.assertEqual(non_null_labels, [1, 2, 2, 3, 0, 0, 1, 1])
idempotent_labels = decode._CodesFromCTC(
non_null_labels, merge_dups=False, null_label=9)
self.assertEqual(idempotent_labels, non_null_labels)
collapsed_labels = decode._CodesFromCTC(
ctc_labels, merge_dups=True, null_label=9)
self.assertEqual(collapsed_labels, [1, 2, 3, 0, 1, 1])
non_idempotent_labels = decode._CodesFromCTC(
collapsed_labels, merge_dups=True, null_label=9)
self.assertEqual(non_idempotent_labels, [1, 2, 3, 0, 1])
def testStringFromCTC(self):
"""Tests that the decoder can decode sequences including multi-codes.
"""
# - f - a r - m(1/2)m -junk sp b a r - n -
ctc_labels = [9, 6, 9, 1, 3, 9, 4, 9, 5, 5, 9, 5, 0, 2, 1, 3, 9, 4, 9]
decode = decoder.Decoder(filename=_testdata('charset_size_10.txt'))
text = decode.StringFromCTC(ctc_labels, merge_dups=True, null_label=9)
self.assertEqual(text, 'farm barn')
if __name__ == '__main__':
tf.test.main()
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Some simple tools for error counting.
"""
import collections
# Named tuple ErrorCounts describes the counts needed to accumulate errors
# over multiple trials:
# false negatives (aka drops or deletions),
# false positives (aka adds or insertions),
# truth_count: number of elements in ground truth = denominator for fn,
# test_count: number of elements in test string = denominator for fp,
# Note that recall = 1 - fn/truth_count, precision = 1 - fp/test_count,
# accuracy = 1 - (fn + fp) / (truth_count + test_count).
ErrorCounts = collections.namedtuple('ErrorCounts', ['fn', 'fp', 'truth_count',
'test_count'])
# Named tuple for error rates, as a percentage. Accuracies are just 100-error.
ErrorRates = collections.namedtuple('ErrorRates',
['label_error', 'word_recall_error',
'word_precision_error', 'sequence_error'])
def CountWordErrors(ocr_text, truth_text):
"""Counts the word drop and add errors as a bag of words.
Args:
ocr_text: OCR text string.
truth_text: Truth text string.
Returns:
ErrorCounts named tuple.
"""
# Convert to lists of words.
return CountErrors(ocr_text.split(), truth_text.split())
def CountErrors(ocr_text, truth_text):
"""Counts the drops and adds between 2 bags of iterables.
Simple bag of objects count returns the number of dropped and added
elements, regardless of order, from anything that is iterable, eg
a pair of strings gives character errors, and a pair of word lists give
word errors.
Args:
ocr_text: OCR text iterable (eg string for chars, word list for words).
truth_text: Truth text iterable.
Returns:
ErrorCounts named tuple.
"""
counts = collections.Counter(truth_text)
counts.subtract(ocr_text)
drops = sum(c for c in counts.values() if c > 0)
adds = sum(-c for c in counts.values() if c < 0)
return ErrorCounts(drops, adds, len(truth_text), len(ocr_text))
def AddErrors(counts1, counts2):
"""Adds the counts and returns a new sum tuple.
Args:
counts1: ErrorCounts named tuples to sum.
counts2: ErrorCounts named tuples to sum.
Returns:
Sum of counts1, counts2.
"""
return ErrorCounts(counts1.fn + counts2.fn, counts1.fp + counts2.fp,
counts1.truth_count + counts2.truth_count,
counts1.test_count + counts2.test_count)
def ComputeErrorRates(label_counts, word_counts, seq_errors, num_seqs):
"""Returns an ErrorRates corresponding to the given counts.
Args:
label_counts: ErrorCounts for the character labels
word_counts: ErrorCounts for the words
seq_errors: Number of sequence errors
num_seqs: Total sequences
Returns:
ErrorRates corresponding to the given counts.
"""
label_errors = label_counts.fn + label_counts.fp
num_labels = label_counts.truth_count + label_counts.test_count
return ErrorRates(
ComputeErrorRate(label_errors, num_labels),
ComputeErrorRate(word_counts.fn, word_counts.truth_count),
ComputeErrorRate(word_counts.fp, word_counts.test_count),
ComputeErrorRate(seq_errors, num_seqs))
def ComputeErrorRate(error_count, truth_count):
"""Returns a sanitized percent error rate from the raw counts.
Prevents div by 0 and clips return to 100%.
Args:
error_count: Number of errors.
truth_count: Number to divide by.
Returns:
100.0 * error_count / truth_count clipped to 100.
"""
if truth_count == 0:
truth_count = 1
error_count = 1
elif error_count > truth_count:
error_count = truth_count
return error_count * 100.0 / truth_count
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for errorcounter."""
import tensorflow as tf
import errorcounter as ec
class ErrorcounterTest(tf.test.TestCase):
def testComputeErrorRate(self):
"""Tests that the percent calculation works as expected.
"""
rate = ec.ComputeErrorRate(error_count=0, truth_count=0)
self.assertEqual(rate, 100.0)
rate = ec.ComputeErrorRate(error_count=1, truth_count=0)
self.assertEqual(rate, 100.0)
rate = ec.ComputeErrorRate(error_count=10, truth_count=1)
self.assertEqual(rate, 100.0)
rate = ec.ComputeErrorRate(error_count=0, truth_count=1)
self.assertEqual(rate, 0.0)
rate = ec.ComputeErrorRate(error_count=3, truth_count=12)
self.assertEqual(rate, 25.0)
def testCountErrors(self):
"""Tests that the error counter works as expected.
"""
truth_str = 'farm barn'
counts = ec.CountErrors(ocr_text=truth_str, truth_text=truth_str)
self.assertEqual(
counts, ec.ErrorCounts(
fn=0, fp=0, truth_count=9, test_count=9))
# With a period on the end, we get a char error.
dot_str = 'farm barn.'
counts = ec.CountErrors(ocr_text=dot_str, truth_text=truth_str)
self.assertEqual(
counts, ec.ErrorCounts(
fn=0, fp=1, truth_count=9, test_count=10))
counts = ec.CountErrors(ocr_text=truth_str, truth_text=dot_str)
self.assertEqual(
counts, ec.ErrorCounts(
fn=1, fp=0, truth_count=10, test_count=9))
# Space is just another char.
no_space = 'farmbarn'
counts = ec.CountErrors(ocr_text=no_space, truth_text=truth_str)
self.assertEqual(
counts, ec.ErrorCounts(
fn=1, fp=0, truth_count=9, test_count=8))
counts = ec.CountErrors(ocr_text=truth_str, truth_text=no_space)
self.assertEqual(
counts, ec.ErrorCounts(
fn=0, fp=1, truth_count=8, test_count=9))
# Lose them all.
counts = ec.CountErrors(ocr_text='', truth_text=truth_str)
self.assertEqual(
counts, ec.ErrorCounts(
fn=9, fp=0, truth_count=9, test_count=0))
counts = ec.CountErrors(ocr_text=truth_str, truth_text='')
self.assertEqual(
counts, ec.ErrorCounts(
fn=0, fp=9, truth_count=0, test_count=9))
def testCountWordErrors(self):
"""Tests that the error counter works as expected.
"""
truth_str = 'farm barn'
counts = ec.CountWordErrors(ocr_text=truth_str, truth_text=truth_str)
self.assertEqual(
counts, ec.ErrorCounts(
fn=0, fp=0, truth_count=2, test_count=2))
# With a period on the end, we get a word error.
dot_str = 'farm barn.'
counts = ec.CountWordErrors(ocr_text=dot_str, truth_text=truth_str)
self.assertEqual(
counts, ec.ErrorCounts(
fn=1, fp=1, truth_count=2, test_count=2))
counts = ec.CountWordErrors(ocr_text=truth_str, truth_text=dot_str)
self.assertEqual(
counts, ec.ErrorCounts(
fn=1, fp=1, truth_count=2, test_count=2))
# Space is special.
no_space = 'farmbarn'
counts = ec.CountWordErrors(ocr_text=no_space, truth_text=truth_str)
self.assertEqual(
counts, ec.ErrorCounts(
fn=2, fp=1, truth_count=2, test_count=1))
counts = ec.CountWordErrors(ocr_text=truth_str, truth_text=no_space)
self.assertEqual(
counts, ec.ErrorCounts(
fn=1, fp=2, truth_count=1, test_count=2))
# Lose them all.
counts = ec.CountWordErrors(ocr_text='', truth_text=truth_str)
self.assertEqual(
counts, ec.ErrorCounts(
fn=2, fp=0, truth_count=2, test_count=0))
counts = ec.CountWordErrors(ocr_text=truth_str, truth_text='')
self.assertEqual(
counts, ec.ErrorCounts(
fn=0, fp=2, truth_count=0, test_count=2))
# With a space in ba rn, there is an extra add.
sp_str = 'farm ba rn'
counts = ec.CountWordErrors(ocr_text=sp_str, truth_text=truth_str)
self.assertEqual(
counts, ec.ErrorCounts(
fn=1, fp=2, truth_count=2, test_count=3))
counts = ec.CountWordErrors(ocr_text=truth_str, truth_text=sp_str)
self.assertEqual(
counts, ec.ErrorCounts(
fn=2, fp=1, truth_count=3, test_count=2))
if __name__ == '__main__':
tf.test.main()
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Ops and utilities for neural networks.
For now, just an LSTM layer.
"""
import shapes
import tensorflow as tf
rnn = tf.load_op_library("../cc/rnn_ops.so")
def rnn_helper(inp,
length,
cell_type=None,
direction="forward",
name=None,
*args,
**kwargs):
"""Adds ops for a recurrent neural network layer.
This function calls an actual implementation of a recurrent neural network
based on `cell_type`.
There are three modes depending on the value of `direction`:
forward: Adds a forward RNN.
backward: Adds a backward RNN.
bidirectional: Adds both forward and backward RNNs and creates a
bidirectional RNN.
Args:
inp: A 3-D tensor of shape [`batch_size`, `max_length`, `feature_dim`].
length: A 1-D tensor of shape [`batch_size`] and type int64. Each element
represents the length of the corresponding sequence in `inp`.
cell_type: Cell type of RNN. Currently can only be "lstm".
direction: One of "forward", "backward", "bidirectional".
name: Name of the op.
*args: Other arguments to the layer.
    **kwargs: Keyword arguments to the layer.
Returns:
A 3-D tensor of shape [`batch_size`, `max_length`, `num_nodes`].
"""
assert cell_type is not None
rnn_func = None
if cell_type == "lstm":
rnn_func = lstm_layer
assert rnn_func is not None
assert direction in ["forward", "backward", "bidirectional"]
with tf.variable_scope(name):
if direction in ["forward", "bidirectional"]:
forward = rnn_func(
inp=inp,
length=length,
backward=False,
name="forward",
*args,
**kwargs)
if isinstance(forward, tuple):
# lstm_layer returns a tuple (output, memory). We only need the first
# element.
forward = forward[0]
if direction in ["backward", "bidirectional"]:
backward = rnn_func(
inp=inp,
length=length,
backward=True,
name="backward",
*args,
**kwargs)
if isinstance(backward, tuple):
# lstm_layer returns a tuple (output, memory). We only need the first
# element.
backward = backward[0]
if direction == "forward":
out = forward
elif direction == "backward":
out = backward
else:
out = tf.concat(2, [forward, backward])
return out
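# Illustrative sketch (editor's example, not part of the original module): a
# bidirectional LSTM over a batch of 4 sequences of 20 frames with 8 features;
# the num_nodes=16 keyword is hypothetical and is forwarded to lstm_layer:
#   inp = tf.placeholder(tf.float32, shape=[4, 20, 8])
#   lengths = tf.constant([20] * 4, dtype=tf.int64)
#   out = rnn_helper(inp, lengths, cell_type="lstm",
#                    direction="bidirectional", num_nodes=16, name="brnn")
#   # out: [4, 20, 32] - forward and backward outputs concatenated on depth.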
@tf.RegisterShape("VariableLSTM")
def _variable_lstm_shape(op):
"""Shape function for the VariableLSTM op."""
input_shape = op.inputs[0].get_shape().with_rank(4)
state_shape = op.inputs[1].get_shape().with_rank(2)
memory_shape = op.inputs[2].get_shape().with_rank(2)
w_m_m_shape = op.inputs[3].get_shape().with_rank(3)
batch_size = input_shape[0].merge_with(state_shape[0])
  batch_size = batch_size.merge_with(memory_shape[0])
seq_len = input_shape[1]
gate_num = input_shape[2].merge_with(w_m_m_shape[1])
output_dim = input_shape[3].merge_with(state_shape[1])
output_dim = output_dim.merge_with(memory_shape[1])
output_dim = output_dim.merge_with(w_m_m_shape[0])
output_dim = output_dim.merge_with(w_m_m_shape[2])
return [[batch_size, seq_len, output_dim],
[batch_size, seq_len, gate_num, output_dim],
[batch_size, seq_len, output_dim]]
@tf.RegisterGradient("VariableLSTM")
def _variable_lstm_grad(op, act_grad, gate_grad, mem_grad):
"""Gradient function for the VariableLSTM op."""
initial_state = op.inputs[1]
initial_memory = op.inputs[2]
w_m_m = op.inputs[3]
act = op.outputs[0]
gate_raw_act = op.outputs[1]
memory = op.outputs[2]
return rnn.variable_lstm_grad(initial_state, initial_memory, w_m_m, act,
gate_raw_act, memory, act_grad, gate_grad,
mem_grad)
def lstm_layer(inp,
length=None,
state=None,
memory=None,
num_nodes=None,
backward=False,
clip=50.0,
reg_func=tf.nn.l2_loss,
weight_reg=False,
weight_collection="LSTMWeights",
bias_reg=False,
stddev=None,
seed=None,
decode=False,
use_native_weights=False,
name=None):
"""Adds ops for an LSTM layer.
This adds ops for the following operations:
input => (forward-LSTM|backward-LSTM) => output
The direction of the LSTM is determined by `backward`. If it is false, the
forward LSTM is used, the backward one otherwise.
Args:
inp: A 3-D tensor of shape [`batch_size`, `max_length`, `feature_dim`].
length: A 1-D tensor of shape [`batch_size`] and type int64. Each element
represents the length of the corresponding sequence in `inp`.
state: If specified, uses it as the initial state.
memory: If specified, uses it as the initial memory.
num_nodes: The number of LSTM cells.
backward: If true, reverses the `inp` before adding the ops. The output is
also reversed so that the direction is the same as `inp`.
clip: Value used to clip the cell values.
reg_func: Function used for the weight regularization such as
`tf.nn.l2_loss`.
weight_reg: If true, regularize the filter weights with `reg_func`.
weight_collection: Collection to add the weights to for regularization.
bias_reg: If true, regularize the bias vector with `reg_func`.
stddev: Standard deviation used to initialize the variables.
seed: Seed used to initialize the variables.
decode: If true, does not add ops which are not used for inference.
use_native_weights: If true, uses weights in the same format as the native
implementations.
name: Name of the op.
Returns:
A 3-D tensor of shape [`batch_size`, `max_length`, `num_nodes`].
"""
with tf.variable_scope(name):
if backward:
if length is None:
inp = tf.reverse(inp, [False, True, False])
else:
inp = tf.reverse_sequence(inp, length, 1, 0)
num_prev = inp.get_shape()[2]
if stddev:
initializer = tf.truncated_normal_initializer(stddev=stddev, seed=seed)
else:
initializer = tf.uniform_unit_scaling_initializer(seed=seed)
if use_native_weights:
with tf.variable_scope("LSTMCell"):
w = tf.get_variable(
"W_0",
shape=[num_prev + num_nodes, 4 * num_nodes],
initializer=initializer,
dtype=tf.float32)
w_i_m = tf.slice(w, [0, 0], [num_prev, 4 * num_nodes], name="w_i_m")
w_m_m = tf.reshape(
tf.slice(w, [num_prev, 0], [num_nodes, 4 * num_nodes]),
[num_nodes, 4, num_nodes],
name="w_m_m")
else:
w_i_m = tf.get_variable("w_i_m", [num_prev, 4 * num_nodes],
initializer=initializer)
w_m_m = tf.get_variable("w_m_m", [num_nodes, 4, num_nodes],
initializer=initializer)
if not decode and weight_reg:
tf.add_to_collection(weight_collection, reg_func(w_i_m, name="w_i_m_reg"))
tf.add_to_collection(weight_collection, reg_func(w_m_m, name="w_m_m_reg"))
batch_size = shapes.tensor_dim(inp, dim=0)
num_frames = shapes.tensor_dim(inp, dim=1)
prev = tf.reshape(inp, tf.pack([batch_size * num_frames, num_prev]))
if use_native_weights:
with tf.variable_scope("LSTMCell"):
b = tf.get_variable(
"B",
shape=[4 * num_nodes],
initializer=tf.zeros_initializer,
dtype=tf.float32)
biases = tf.identity(b, name="biases")
else:
biases = tf.get_variable(
"biases", [4 * num_nodes], initializer=tf.constant_initializer(0.0))
if not decode and bias_reg:
tf.add_to_collection(
weight_collection, reg_func(
biases, name="biases_reg"))
prev = tf.nn.xw_plus_b(prev, w_i_m, biases)
prev = tf.reshape(prev, tf.pack([batch_size, num_frames, 4, num_nodes]))
if state is None:
state = tf.fill(tf.pack([batch_size, num_nodes]), 0.0)
if memory is None:
memory = tf.fill(tf.pack([batch_size, num_nodes]), 0.0)
out, _, mem = rnn.variable_lstm(prev, state, memory, w_m_m, clip=clip)
if backward:
if length is None:
out = tf.reverse(out, [False, True, False])
else:
out = tf.reverse_sequence(out, length, 1, 0)
return out, mem
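# Illustrative note (editor's example, not part of the original module):
# unlike rnn_helper, calling lstm_layer directly also exposes the memory:
#   out, mem = lstm_layer(inp, length=lengths, num_nodes=16, name="fwd")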
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Shape manipulation functions.
rotate_dimensions: prepares for a rotating transpose by returning a rotated
list of dimension indices.
transposing_reshape: allows a dimension to be factorized, with one of the pieces
transferred to another dimension, or to transpose factors within a single
dimension.
tensor_dim: gets a shape dimension as a constant integer if known, otherwise a
  runtime-usable tensor value.
tensor_shape: returns the full shape of a tensor as a list of tensor_dim
  values.
"""
import tensorflow as tf
def rotate_dimensions(num_dims, src_dim, dest_dim):
"""Returns a list of dimension indices that will rotate src_dim to dest_dim.
src_dim is moved to dest_dim, with all intervening dimensions shifted towards
the hole left by src_dim. Eg:
num_dims = 4, src_dim=3, dest_dim=1
Returned list=[0, 3, 1, 2]
For a tensor with dims=[5, 4, 3, 2] a transpose would yield [5, 2, 4, 3].
Args:
num_dims: The number of dimensions to handle.
src_dim: The dimension to move.
dest_dim: The dimension to move src_dim to.
Returns:
A list of rotated dimension indices.
"""
# List of dimensions for transpose.
dim_list = range(num_dims)
# Shuffle src_dim to dest_dim by swapping to shuffle up the other dims.
step = 1 if dest_dim > src_dim else -1
for x in xrange(src_dim, dest_dim, step):
dim_list[x], dim_list[x + step] = dim_list[x + step], dim_list[x]
return dim_list
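# Illustrative sketch (editor's example, not part of the original module),
# checking the docstring case:
#   rotate_dimensions(num_dims=4, src_dim=3, dest_dim=1)  # -> [0, 3, 1, 2]
#   # Used as a tf.transpose permutation, this maps a [5, 4, 3, 2] tensor to
#   # shape [5, 2, 4, 3].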
def transposing_reshape(tensor,
src_dim,
part_a,
part_b,
dest_dim_a,
dest_dim_b,
name=None):
"""Splits src_dim and sends one of the pieces to another dim.
Terminology:
A matrix is often described as 'row-major' or 'column-major', which doesn't
help if you can't remember which is the row index and which is the column,
even if you know what 'major' means, so here is a simpler explanation of it:
When TF stores a tensor of size [d0, d1, d2, d3] indexed by [i0, i1, i2, i3],
the memory address of an element is calculated using:
((i0 * d1 + i1) * d2 + i2) * d3 + i3, so, d0 is the MOST SIGNIFICANT dimension
and d3 the LEAST SIGNIFICANT, just like in the decimal number 1234, 1 is the
most significant digit and 4 the least significant. In both cases the most
significant is multiplied by the largest number to determine its 'value'.
Furthermore, if we reshape the tensor to [d0'=d0, d1'=d1 x d2, d2'=d3], then
the MOST SIGNIFICANT part of d1' is d1 and the LEAST SIGNIFICANT part of d1'
is d2.
Action:
transposing_reshape splits src_dim into factors [part_a, part_b], and sends
the most significant part (of size part_a) to be the most significant part of
dest_dim_a*(Exception: see NOTE 2), and the least significant part (of size
part_b) to be the most significant part of dest_dim_b.
This is basically a combination of reshape, rotating transpose, reshape.
NOTE1: At least one of dest_dim_a and dest_dim_b must equal src_dim, ie one of
the parts always stays put, so src_dim is never totally destroyed and the
output number of dimensions is always the same as the input.
NOTE2: If dest_dim_a == dest_dim_b == src_dim, then parts a and b are simply
transposed within src_dim to become part_b x part_a, so the most significant
part becomes the least significant part and vice versa. Thus if you really
  wanted to make one of the parts the least significant side of the destination,
the destination dimension can be internally transposed with a second call to
transposing_reshape.
NOTE3: One of part_a and part_b may be -1 to allow src_dim to be of unknown
size with one known-size factor. Otherwise part_a * part_b must equal the size
of src_dim.
NOTE4: The reshape preserves as many known-at-graph-build-time dimension sizes
as are available.
Example:
Input dims=[5, 2, 6, 2]
tensor=[[[[0, 1][2, 3][4, 5][6, 7][8, 9][10, 11]]
[[12, 13][14, 15][16, 17][18, 19][20, 21][22, 23]]
[[[24, 25]...
src_dim=2, part_a=2, part_b=3, dest_dim_a=3, dest_dim_b=2
output dims =[5, 2, 3, 4]
output tensor=[[[[0, 1, 6, 7][2, 3, 8, 9][4, 5, 10, 11]]
[[12, 13, 18, 19][14, 15, 20, 21][16, 17, 22, 23]]]
                 [[[24, 25, 30, 31]...
Example2:
Input dims=[phrases, words, letters]=[2, 6, x]
tensor=[[[the][cat][sat][on][the][mat]]
[[a][stitch][in][time][saves][nine]]]
We can factorize the 6 words into 3x2 = [[the][cat]][[sat][on]][[the][mat]]
or 2x3=[[the][cat][sat]][[on][the][mat]] and
src_dim=1, part_a=3, part_b=2, dest_dim_a=1, dest_dim_b=1
would yield:
[[[the][sat][the][cat][on][mat]]
[[a][in][saves][stitch][time][nine]]], but
src_dim=1, part_a=2, part_b=3, dest_dim_a=1, dest_dim_b=1
would yield:
[[[the][on][cat][the][sat][mat]]
[[a][time][stitch][saves][in][nine]]], and
src_dim=1, part_a=2, part_b=3, dest_dim_a=0, dest_dim_b=1
would yield:
[[[the][cat][sat]]
[[a][stitch][in]]
[[on][the][mat]]
[[time][saves][nine]]]
Now remember that the words above represent any least-significant subset of
the input dimensions.
Args:
tensor: A tensor to reshape.
src_dim: The dimension to split.
part_a: The first factor of the split.
part_b: The second factor of the split.
dest_dim_a: The dimension to move part_a of src_dim to.
dest_dim_b: The dimension to move part_b of src_dim to.
name: Optional base name for all the ops.
Returns:
Reshaped tensor.
Raises:
ValueError: If the args are invalid.
"""
if dest_dim_a != src_dim and dest_dim_b != src_dim:
raise ValueError(
'At least one of dest_dim_a, dest_dim_b must equal src_dim!')
if part_a == 0 or part_b == 0:
raise ValueError('Zero not allowed for part_a or part_b!')
if part_a < 0 and part_b < 0:
raise ValueError('At least one of part_a and part_b must be positive!')
if not name:
name = 'transposing_reshape'
prev_shape = tensor_shape(tensor)
expanded = tf.reshape(
tensor,
prev_shape[:src_dim] + [part_a, part_b] + prev_shape[src_dim + 1:],
name=name + '_reshape_in')
dest = dest_dim_b
if dest_dim_a != src_dim:
# We are just moving part_a to dest_dim_a.
dest = dest_dim_a
else:
# We are moving part_b to dest_dim_b.
src_dim += 1
dim_list = rotate_dimensions(len(expanded.get_shape()), src_dim, dest)
expanded = tf.transpose(expanded, dim_list, name=name + '_rot_transpose')
# Reshape identity except dest,dest+1, which get merged.
ex_shape = tensor_shape(expanded)
combined = ex_shape[dest] * ex_shape[dest + 1]
return tf.reshape(
expanded,
ex_shape[:dest] + [combined] + ex_shape[dest + 2:],
name=name + '_reshape_out')
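# Illustrative sketch (editor's example, not part of the original module): the
# docstring's first example as code:
#   t = tf.reshape(tf.range(120), [5, 2, 6, 2])
#   r = transposing_reshape(t, src_dim=2, part_a=2, part_b=3,
#                           dest_dim_a=3, dest_dim_b=2)
#   # r: shape [5, 2, 3, 4], with r[0][0] ==
#   # [[0, 1, 6, 7], [2, 3, 8, 9], [4, 5, 10, 11]]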
def tensor_dim(tensor, dim):
"""Returns int dimension if known at a graph build time else a tensor.
If the size of the dim of tensor is known at graph building time, then that
known value is returned, otherwise (instead of None), a Tensor that will give
the size of the dimension when the graph is run. The return value will be
accepted by tf.reshape in multiple (or even all) dimensions, even when the
sizes are not known at graph building time, unlike -1, which can only be used
in one dimension. It is a bad idea to use tf.shape all the time, as some ops
demand a known (at graph build time) size. This function therefore returns
the best available, most useful dimension size.
Args:
tensor: Input tensor.
dim: Dimension to find the size of.
Returns:
An integer if shape is known at build time, otherwise a tensor of int32.
"""
result = tensor.get_shape().as_list()[dim]
if result is None:
result = tf.shape(tensor)[dim]
return result
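# Illustrative sketch (editor's example, not part of the original module):
#   t = tf.placeholder(tf.float32, shape=[8, None, 3])
#   tensor_dim(t, dim=0)  # -> 8, a plain int known at graph build time.
#   tensor_dim(t, dim=1)  # -> an int32 scalar tensor, known only at run time.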
def tensor_shape(tensor):
"""Returns a heterogeneous list of tensor_dim for the tensor.
See tensor_dim for a more detailed explanation.
Args:
tensor: Input tensor.
Returns:
A heterogeneous list of integers and int32 tensors.
"""
result = []
for d in xrange(len(tensor.get_shape())):
result.append(tensor_dim(tensor, d))
return result
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for shapes."""
import numpy as np
import tensorflow as tf
import shapes
def _rand(*size):
return np.random.uniform(size=size).astype('f')
class ShapesTest(tf.test.TestCase):
"""Tests just the shapes from a call to transposing_reshape."""
def __init__(self, other):
super(ShapesTest, self).__init__(other)
self.batch_size = 4
self.im_height = 24
self.im_width = 36
self.depth = 20
def testReshapeTile(self):
"""Tests that a tiled input can be reshaped to the batch dimension."""
fake = tf.placeholder(
tf.float32, shape=(None, None, None, self.depth), name='inputs')
real = _rand(self.batch_size, self.im_height, self.im_width, self.depth)
with self.test_session() as sess:
outputs = shapes.transposing_reshape(
fake, src_dim=2, part_a=3, part_b=-1, dest_dim_a=0, dest_dim_b=2)
res_image = sess.run([outputs], feed_dict={fake: real})
self.assertEqual(
tuple(res_image[0].shape),
(self.batch_size * 3, self.im_height, self.im_width / 3, self.depth))
def testReshapeDepth(self):
"""Tests that depth can be reshaped to the x dimension."""
fake = tf.placeholder(
tf.float32, shape=(None, None, None, self.depth), name='inputs')
real = _rand(self.batch_size, self.im_height, self.im_width, self.depth)
with self.test_session() as sess:
outputs = shapes.transposing_reshape(
fake, src_dim=3, part_a=4, part_b=-1, dest_dim_a=2, dest_dim_b=3)
res_image = sess.run([outputs], feed_dict={fake: real})
self.assertEqual(
tuple(res_image[0].shape),
(self.batch_size, self.im_height, self.im_width * 4, self.depth / 4))
class DataTest(tf.test.TestCase):
"""Tests that the data is moved correctly in a call to transposing_reshape.
"""
def testTransposingReshape_2_2_3_2_1(self):
"""Case: dest_a == src, dest_b < src: Split with Least sig part going left.
"""
with self.test_session() as sess:
fake = tf.placeholder(
tf.float32, shape=(None, None, None, 2), name='inputs')
outputs = shapes.transposing_reshape(
fake, src_dim=2, part_a=2, part_b=3, dest_dim_a=2, dest_dim_b=1)
# Make real inputs. The tensor looks like this:
# tensor=[[[[0, 1][2, 3][4, 5][6, 7][8, 9][10, 11]]
# [[12, 13][14, 15][16, 17][18, 19][20, 21][22, 23]]
# [[[24, 25]...
real = np.arange(120).reshape((5, 2, 6, 2))
np_array = sess.run([outputs], feed_dict={fake: real})[0]
self.assertEqual(tuple(np_array.shape), (5, 6, 2, 2))
self.assertAllEqual(np_array[0, :, :, :],
[[[0, 1], [6, 7]], [[12, 13], [18, 19]],
[[2, 3], [8, 9]], [[14, 15], [20, 21]],
[[4, 5], [10, 11]], [[16, 17], [22, 23]]])
def testTransposingReshape_2_2_3_2_3(self):
"""Case: dest_a == src, dest_b > src: Split with Least sig part going right.
"""
with self.test_session() as sess:
fake = tf.placeholder(
tf.float32, shape=(None, None, None, 2), name='inputs')
outputs = shapes.transposing_reshape(
fake, src_dim=2, part_a=2, part_b=3, dest_dim_a=2, dest_dim_b=3)
# Make real inputs. The tensor looks like this:
# tensor=[[[[0, 1][2, 3][4, 5][6, 7][8, 9][10, 11]]
# [[12, 13][14, 15][16, 17][18, 19][20, 21][22, 23]]
# [[[24, 25]...
real = np.arange(120).reshape((5, 2, 6, 2))
np_array = sess.run([outputs], feed_dict={fake: real})[0]
self.assertEqual(tuple(np_array.shape), (5, 2, 2, 6))
self.assertAllEqual(
np_array[0, :, :, :],
[[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]],
[[12, 13, 14, 15, 16, 17], [18, 19, 20, 21, 22, 23]]])
def testTransposingReshape_2_2_3_2_2(self):
"""Case: dest_a == src, dest_b == src. Transpose within dimension 2.
"""
with self.test_session() as sess:
fake = tf.placeholder(
tf.float32, shape=(None, None, None, 2), name='inputs')
outputs = shapes.transposing_reshape(
fake, src_dim=2, part_a=2, part_b=3, dest_dim_a=2, dest_dim_b=2)
# Make real inputs. The tensor looks like this:
# tensor=[[[[0, 1][2, 3][4, 5][6, 7][8, 9][10, 11]]
# [[12, 13][14, 15][16, 17][18, 19][20, 21][22, 23]]
# [[[24, 25]...
real = np.arange(120).reshape((5, 2, 6, 2))
np_array = sess.run([outputs], feed_dict={fake: real})[0]
self.assertEqual(tuple(np_array.shape), (5, 2, 6, 2))
self.assertAllEqual(
np_array[0, :, :, :],
[[[0, 1], [6, 7], [2, 3], [8, 9], [4, 5], [10, 11]],
[[12, 13], [18, 19], [14, 15], [20, 21], [16, 17], [22, 23]]])
def testTransposingReshape_2_2_3_1_2(self):
"""Case: dest_a < src, dest_b == src. Split with Most sig part going left.
"""
with self.test_session() as sess:
fake = tf.placeholder(
tf.float32, shape=(None, None, None, 2), name='inputs')
outputs = shapes.transposing_reshape(
fake, src_dim=2, part_a=2, part_b=3, dest_dim_a=1, dest_dim_b=2)
# Make real inputs. The tensor looks like this:
# tensor=[[[[0, 1][2, 3][4, 5][6, 7][8, 9][10, 11]]
# [[12, 13][14, 15][16, 17][18, 19][20, 21][22, 23]]
# [[[24, 25]...
real = np.arange(120).reshape((5, 2, 6, 2))
np_array = sess.run([outputs], feed_dict={fake: real})[0]
self.assertEqual(tuple(np_array.shape), (5, 4, 3, 2))
self.assertAllEqual(np_array[0, :, :, :],
[[[0, 1], [2, 3], [4, 5]],
[[12, 13], [14, 15], [16, 17]],
[[6, 7], [8, 9], [10, 11]],
[[18, 19], [20, 21], [22, 23]]])
def testTransposingReshape_2_2_3_3_2(self):
"""Case: dest_a < src, dest_b == src. Split with Most sig part going right.
"""
with self.test_session() as sess:
fake = tf.placeholder(
tf.float32, shape=(None, None, None, 2), name='inputs')
outputs = shapes.transposing_reshape(
fake, src_dim=2, part_a=2, part_b=3, dest_dim_a=3, dest_dim_b=2)
# Make real inputs. The tensor looks like this:
# tensor=[[[[0, 1][2, 3][4, 5][6, 7][8, 9][10, 11]]
# [[12, 13][14, 15][16, 17][18, 19][20, 21][22, 23]]
# [[[24, 25]...
real = np.arange(120).reshape((5, 2, 6, 2))
np_array = sess.run([outputs], feed_dict={fake: real})[0]
self.assertEqual(tuple(np_array.shape), (5, 2, 3, 4))
self.assertAllEqual(
np_array[0, :, :, :],
[[[0, 1, 6, 7], [2, 3, 8, 9], [4, 5, 10, 11]],
[[12, 13, 18, 19], [14, 15, 20, 21], [16, 17, 22, 23]]])
if __name__ == '__main__':
tf.test.main()
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Model eval separate from training."""
from tensorflow import app
from tensorflow.python.platform import flags
import vgsl_model
flags.DEFINE_string('eval_dir', '/tmp/mdir/eval',
'Directory where to write event logs.')
flags.DEFINE_string('graph_def_file', None,
'Output eval graph definition file.')
flags.DEFINE_string('train_dir', '/tmp/mdir',
'Directory where to find training checkpoints.')
flags.DEFINE_string('model_str',
'1,150,600,3[S2(4x150)0,2 Ct5,5,16 Mp2,2 Ct5,5,64 Mp3,3'
'([Lrys64 Lbx128][Lbys64 Lbx128][Lfys64 Lbx128])S3(3x0)2,3'
'Lfx128 Lrx128 S0(1x4)0,3 Do Lfx256]O1c134',
'Network description.')
flags.DEFINE_integer('num_steps', 1000, 'Number of steps to run evaluation.')
flags.DEFINE_integer('eval_interval_secs', 60,
'Time interval between eval runs.')
flags.DEFINE_string('eval_data', None, 'Evaluation data filepattern')
flags.DEFINE_string('decoder', None, 'Charset decoder')
FLAGS = flags.FLAGS
def main(argv):
del argv
vgsl_model.Eval(FLAGS.train_dir, FLAGS.eval_dir, FLAGS.model_str,
FLAGS.eval_data, FLAGS.decoder, FLAGS.num_steps,
FLAGS.graph_def_file, FLAGS.eval_interval_secs)
if __name__ == '__main__':
app.run()
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""String network description language to define network layouts."""
import collections
import tensorflow as tf
from tensorflow.python.ops import parsing_ops
# Named tuple for the standard tf image tensor Shape.
# batch_size: Number of images to batch-up for training.
# height: Fixed height of image or None for variable.
# width: Fixed width of image or None for variable.
# depth: Desired depth in bytes per pixel of input images.
ImageShape = collections.namedtuple('ImageTensorDims',
['batch_size', 'height', 'width', 'depth'])
def ImageInput(input_pattern, num_threads, shape, using_ctc, reader=None):
"""Creates an input image tensor from the input_pattern filenames.
TODO(rays) Expand for 2-d labels, 0-d labels, and logistic targets.
Args:
input_pattern: Filenames of the dataset(s) to read.
num_threads: Number of preprocessing threads.
shape: ImageShape with the desired shape of the input.
using_ctc: Take the unpadded_class labels instead of padded.
reader: Function that returns an actual reader to read Examples from
input files. If None, uses tf.TFRecordReader().
Returns:
images: Float Tensor containing the input image scaled to [-1.28, 1.27].
heights: Tensor int64 containing the heights of the images.
widths: Tensor int64 containing the widths of the images.
labels: Serialized SparseTensor containing the int64 labels.
sparse_labels: Serialized SparseTensor containing the int64 labels.
truths: Tensor string of the utf8 truth texts.
Raises:
    AssertionError: if no input files match input_pattern.
"""
data_files = tf.gfile.Glob(input_pattern)
assert data_files, 'no files found for dataset ' + input_pattern
queue_capacity = shape.batch_size * num_threads * 2
filename_queue = tf.train.string_input_producer(
data_files, capacity=queue_capacity)
# Create a subgraph with its own reader (but sharing the
# filename_queue) for each preprocessing thread.
images_and_label_lists = []
for _ in range(num_threads):
image, height, width, labels, text = _ReadExamples(filename_queue, shape,
using_ctc, reader)
images_and_label_lists.append([image, height, width, labels, text])
# Create a queue that produces the examples in batches.
images, heights, widths, labels, truths = tf.train.batch_join(
images_and_label_lists,
batch_size=shape.batch_size,
capacity=16 * shape.batch_size,
dynamic_pad=True)
# Deserialize back to sparse, because the batcher doesn't do sparse.
labels = tf.deserialize_many_sparse(labels, tf.int64)
sparse_labels = tf.cast(labels, tf.int32)
labels = tf.sparse_tensor_to_dense(labels)
labels = tf.reshape(labels, [shape.batch_size, -1], name='Labels')
# Crush the other shapes to just the batch dimension.
heights = tf.reshape(heights, [-1], name='Heights')
widths = tf.reshape(widths, [-1], name='Widths')
truths = tf.reshape(truths, [-1], name='Truths')
# Give the images a nice name as well.
images = tf.identity(images, name='Images')
tf.image_summary('Images', images)
return images, heights, widths, labels, sparse_labels, truths
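# Illustrative sketch (editor's example, not part of the original module):
# batched input of variable-sized color images for CTC training; the file
# pattern is hypothetical:
#   shape = ImageShape(batch_size=32, height=None, width=None, depth=3)
#   images, heights, widths, labels, sparse_labels, truths = ImageInput(
#       '../data/train/train*', num_threads=4, shape=shape, using_ctc=True)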
def _ReadExamples(filename_queue, shape, using_ctc, reader=None):
"""Builds network input tensor ops for TF Example.
Args:
filename_queue: Queue of filenames, from tf.train.string_input_producer
shape: ImageShape with the desired shape of the input.
using_ctc: Take the unpadded_class labels instead of padded.
reader: Function that returns an actual reader to read Examples from
input files. If None, uses tf.TFRecordReader().
Returns:
image: Float Tensor containing the input image scaled to [-1.28, 1.27].
height: Tensor int64 containing the height of the image.
width: Tensor int64 containing the width of the image.
labels: Serialized SparseTensor containing the int64 labels.
text: Tensor string of the utf8 truth text.
"""
if reader:
reader = reader()
else:
reader = tf.TFRecordReader()
_, example_serialized = reader.read(filename_queue)
example_serialized = tf.reshape(example_serialized, shape=[])
features = tf.parse_single_example(
example_serialized,
{'image/encoded': parsing_ops.FixedLenFeature(
[1], dtype=tf.string, default_value=''),
'image/text': parsing_ops.FixedLenFeature(
[1], dtype=tf.string, default_value=''),
'image/class': parsing_ops.VarLenFeature(dtype=tf.int64),
'image/unpadded_class': parsing_ops.VarLenFeature(dtype=tf.int64),
'image/height': parsing_ops.FixedLenFeature(
[1], dtype=tf.int64, default_value=1),
'image/width': parsing_ops.FixedLenFeature(
[1], dtype=tf.int64, default_value=1)})
if using_ctc:
labels = features['image/unpadded_class']
else:
labels = features['image/class']
labels = tf.serialize_sparse(labels)
image = tf.reshape(features['image/encoded'], shape=[], name='encoded')
image = _ImageProcessing(image, shape)
height = tf.reshape(features['image/height'], [-1])
width = tf.reshape(features['image/width'], [-1])
text = tf.reshape(features['image/text'], shape=[])
return image, height, width, labels, text
def _ImageProcessing(image_buffer, shape):
"""Convert a PNG string into an input tensor.
We allow for fixed and variable sizes.
Does fixed conversion to floats in the range [-1.28, 1.27].
Args:
image_buffer: Tensor containing a PNG encoded image.
shape: ImageShape with the desired shape of the input.
Returns:
image: Decoded, normalized image in the range [-1.28, 1.27].
"""
image = tf.image.decode_png(image_buffer, channels=shape.depth)
image.set_shape([shape.height, shape.width, shape.depth])
image = tf.cast(image, tf.float32)
image = tf.sub(image, 128.0)
image = tf.mul(image, 1 / 100.0)
return image
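# Illustrative note (editor's example, not part of the original module): the
# constants above map a pixel value p to (p - 128) / 100, so 0 -> -1.28 and
# 255 -> 1.27, the [-1.28, 1.27] range quoted in the docstrings above.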
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""String network description language to define network layouts."""
import re
import time
import decoder
import errorcounter as ec
import shapes
import tensorflow as tf
import vgsl_input
import vgslspecs
import tensorflow.contrib.slim as slim
from tensorflow.core.framework import summary_pb2
from tensorflow.python.platform import tf_logging as logging
# Parameters for rate decay.
# We divide the learning_rate_halflife by DECAY_STEPS_FACTOR and use DECAY_RATE
# as the decay factor for the learning rate, ie we apply the
# DECAY_STEPS_FACTORth root of 0.5 as the decay rate every
# halflife/DECAY_STEPS_FACTOR steps to achieve the desired halflife.
DECAY_STEPS_FACTOR = 16
DECAY_RATE = pow(0.5, 1.0 / DECAY_STEPS_FACTOR)
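# Illustrative note (editor's example, not part of the original module):
# DECAY_RATE ** DECAY_STEPS_FACTOR == 0.5, so with e.g. halflife=160000 the
# rate decays by a factor of ~0.9576 every 10000 steps, compounding to one
# halving per 160000 steps.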
def Train(train_dir,
model_str,
train_data,
max_steps,
master='',
task=0,
ps_tasks=0,
initial_learning_rate=0.001,
final_learning_rate=0.001,
learning_rate_halflife=160000,
optimizer_type='Adam',
num_preprocess_threads=1,
reader=None):
"""Testable trainer with no dependence on FLAGS.
Args:
train_dir: Directory to write checkpoints.
model_str: Network specification string.
train_data: Training data file pattern.
max_steps: Number of training steps to run.
master: Name of the TensorFlow master to use.
task: Task id of this replica running the training. (0 will be master).
ps_tasks: Number of tasks in ps job, or 0 if no ps job.
    initial_learning_rate: Learning rate at start of training.
final_learning_rate: Asymptotic minimum learning rate.
learning_rate_halflife: Number of steps over which to halve the difference
between initial and final learning rate.
optimizer_type: One of 'GradientDescent', 'AdaGrad', 'Momentum', 'Adam'.
num_preprocess_threads: Number of input threads.
reader: Function that returns an actual reader to read Examples from input
files. If None, uses tf.TFRecordReader().
"""
if master.startswith('local'):
    device = tf.train.replica_device_setter(ps_tasks)
else:
device = '/cpu:0'
with tf.Graph().as_default():
with tf.device(device):
model = InitNetwork(train_data, model_str, 'train', initial_learning_rate,
final_learning_rate, learning_rate_halflife,
optimizer_type, num_preprocess_threads, reader)
# Create a Supervisor. It will take care of initialization, summaries,
# checkpoints, and recovery.
#
# When multiple replicas of this program are running, the first one,
      # identified by --task=0, is the 'chief' supervisor. It is the only one
      # that takes care of initialization, etc.
sv = tf.train.Supervisor(
logdir=train_dir,
is_chief=(task == 0),
saver=model.saver,
save_summaries_secs=10,
save_model_secs=30,
recovery_wait_secs=5)
step = 0
while step < max_steps:
try:
# Get an initialized, and possibly recovered session. Launch the
# services: Checkpointing, Summaries, step counting.
with sv.managed_session(master) as sess:
while step < max_steps:
_, step = model.TrainAStep(sess)
if sv.coord.should_stop():
break
except tf.errors.AbortedError as e:
logging.error('Received error:%s', e)
continue
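# Illustrative sketch (editor's example, not part of the original module): a
# minimal direct call with hypothetical paths and model string:
#   Train(train_dir='/tmp/mdir',
#         model_str='1,0,0,1[Ct5,5,16 Mp3,3 Lfx64]O1c134',
#         train_data='../data/train/train*', max_steps=1000)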
def Eval(train_dir,
eval_dir,
model_str,
eval_data,
decoder_file,
num_steps,
graph_def_file=None,
eval_interval_secs=0,
reader=None):
"""Restores a model from a checkpoint and evaluates it.
Args:
train_dir: Directory to find checkpoints.
eval_dir: Directory to write summary events.
model_str: Network specification string.
eval_data: Evaluation data file pattern.
decoder_file: File to read to decode the labels.
num_steps: Number of eval steps to run.
graph_def_file: File to write graph definition to for freezing.
eval_interval_secs: How often to run evaluations, or once if 0.
reader: Function that returns an actual reader to read Examples from input
files. If None, uses tf.TFRecordReader().
Returns:
    (label error rate, word recall error rate, word precision error rate,
    sequence error rate) as percent.
Raises:
ValueError: If unimplemented feature is used.
"""
decode = None
if decoder_file:
decode = decoder.Decoder(decoder_file)
# Run eval.
rates = ec.ErrorRates(
label_error=None,
word_recall_error=None,
word_precision_error=None,
sequence_error=None)
with tf.Graph().as_default():
model = InitNetwork(eval_data, model_str, 'eval', reader=reader)
sw = tf.train.SummaryWriter(eval_dir)
while True:
sess = tf.Session('')
if graph_def_file is not None:
# Write the eval version of the graph to a file for freezing.
if not tf.gfile.Exists(graph_def_file):
with tf.gfile.FastGFile(graph_def_file, 'w') as f:
f.write(
sess.graph.as_graph_def(add_shapes=True).SerializeToString())
ckpt = tf.train.get_checkpoint_state(train_dir)
if ckpt and ckpt.model_checkpoint_path:
step = model.Restore(ckpt.model_checkpoint_path, sess)
if decode:
rates = decode.SoftmaxEval(sess, model, num_steps)
_AddRateToSummary('Label error rate', rates.label_error, step, sw)
_AddRateToSummary('Word recall error rate', rates.word_recall_error,
step, sw)
_AddRateToSummary('Word precision error rate',
rates.word_precision_error, step, sw)
_AddRateToSummary('Sequence error rate', rates.sequence_error, step,
sw)
sw.flush()
print 'Error rates=', rates
else:
raise ValueError('Non-softmax decoder evaluation not implemented!')
if eval_interval_secs:
time.sleep(eval_interval_secs)
else:
break
return rates
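# Illustrative sketch (editor's example, not part of the original module),
# with hypothetical paths; eval_interval_secs=0 runs one evaluation and stops:
#   rates = Eval(train_dir='/tmp/mdir', eval_dir='/tmp/mdir/eval',
#                model_str='1,0,0,1[Ct5,5,16 Mp3,3 Lfx64]O1c134',
#                eval_data='../data/validation/validation*',
#                decoder_file='../testdata/charset_size=134.txt',
#                num_steps=256)
#   # rates is an ErrorRates namedtuple of percentages.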
def InitNetwork(input_pattern,
model_spec,
mode='eval',
initial_learning_rate=0.00005,
final_learning_rate=0.00005,
halflife=1600000,
optimizer_type='Adam',
num_preprocess_threads=1,
reader=None):
"""Constructs a python tensor flow model defined by model_spec.
Args:
input_pattern: File pattern of the data in tfrecords of Example.
model_spec: Concatenation of input spec, model spec and output spec.
See Build below for input/output spec. For model spec, see vgslspecs.py
mode: One of 'train', 'eval'
initial_learning_rate: Initial learning rate for the network.
final_learning_rate: Final learning rate for the network.
halflife: Number of steps over which to halve the difference between
initial and final learning rate for the network.
optimizer_type: One of 'GradientDescent', 'AdaGrad', 'Momentum', 'Adam'.
num_preprocess_threads: Number of threads to use for image processing.
reader: Function that returns an actual reader to read Examples from input
files. If None, uses tf.TFRecordReader().
Eval tasks need only specify input_pattern and model_spec.
Returns:
A VGSLImageModel class.
Raises:
ValueError: if the model spec syntax is incorrect.
"""
model = VGSLImageModel(mode, model_spec, initial_learning_rate,
final_learning_rate, halflife)
left_bracket = model_spec.find('[')
right_bracket = model_spec.rfind(']')
if left_bracket < 0 or right_bracket < 0:
raise ValueError('Failed to find [] in model spec! ', model_spec)
input_spec = model_spec[:left_bracket]
layer_spec = model_spec[left_bracket:right_bracket + 1]
output_spec = model_spec[right_bracket + 1:]
model.Build(input_pattern, input_spec, layer_spec, output_spec,
optimizer_type, num_preprocess_threads, reader)
return model
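# Illustrative note (editor's example, not part of the original module): how a
# model_spec is split on the outermost brackets:
#   '1,0,0,1[Ct5,5,16 Mp3,3 Lfx64]O1c134'
#   input_spec  = '1,0,0,1'                 # batch,height,width,depth
#   layer_spec  = '[Ct5,5,16 Mp3,3 Lfx64]'  # see vgslspecs.py
#   output_spec = 'O1c134'                  # 1-d CTC output, 134 classes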
class VGSLImageModel(object):
"""Class that builds a tensor flow model for training or evaluation.
"""
def __init__(self, mode, model_spec, initial_learning_rate,
final_learning_rate, halflife):
"""Constructs a VGSLImageModel.
Args:
mode: One of "train", "eval"
model_spec: Full model specification string, for reference only.
initial_learning_rate: Initial learning rate for the network.
final_learning_rate: Final learning rate for the network.
halflife: Number of steps over which to halve the difference between
initial and final learning rate for the network.
"""
# The string that was used to build this model.
self.model_spec = model_spec
# The layers between input and output.
self.layers = None
# The train/eval mode.
self.mode = mode
# The initial learning rate.
self.initial_learning_rate = initial_learning_rate
self.final_learning_rate = final_learning_rate
self.decay_steps = halflife / DECAY_STEPS_FACTOR
self.decay_rate = DECAY_RATE
# Tensor for the labels.
self.labels = None
self.sparse_labels = None
# Debug data containing the truth text.
self.truths = None
# Tensor for loss
self.loss = None
# Train operation
self.train_op = None
# Tensor for the global step counter
self.global_step = None
# Tensor for the output predictions (usually softmax)
self.output = None
# True if we are using CTC training mode.
self.using_ctc = False
# Saver object to load or restore the variables.
self.saver = None
def Build(self, input_pattern, input_spec, model_spec, output_spec,
optimizer_type, num_preprocess_threads, reader):
"""Builds the model from the separate input/layers/output spec strings.
Args:
input_pattern: File pattern of the data in tfrecords of TF Example format.
input_spec: Specification of the input layer:
batchsize,height,width,depth (4 comma-separated integers)
Training will run with batches of batchsize images, but runtime can
use any batch size.
height and/or width can be 0 or -1, indicating variable size,
otherwise all images must be the given size.
depth must be 1 or 3 to indicate greyscale or color.
NOTE 1-d image input, treating the y image dimension as depth, can
be achieved using S1(1x0)1,3 as the first op in the model_spec, but
the y-size of the input must then be fixed.
model_spec: Model definition. See vgslspecs.py
output_spec: Output layer definition:
O(2|1|0)(l|s|c)n output layer with n classes.
2 (heatmap) Output is a 2-d vector map of the input (possibly at
different scale).
1 (sequence) Output is a 1-d sequence of vector values.
0 (value) Output is a 0-d single vector value.
l uses a logistic non-linearity on the output, allowing multiple
hot elements in any output vector value.
s uses a softmax non-linearity, with one-hot output in each value.
        c uses a softmax with CTC. Can only be used with 1 (sequence).
NOTE Only O1s and O1c are currently supported.
optimizer_type: One of 'GradientDescent', 'AdaGrad', 'Momentum', 'Adam'.
num_preprocess_threads: Number of threads to use for image processing.
reader: Function that returns an actual reader to read Examples from input
files. If None, uses tf.TFRecordReader().
"""
self.global_step = tf.Variable(0, name='global_step', trainable=False)
shape = _ParseInputSpec(input_spec)
out_dims, out_func, num_classes = _ParseOutputSpec(output_spec)
self.using_ctc = out_func == 'c'
images, heights, widths, labels, sparse, _ = vgsl_input.ImageInput(
input_pattern, num_preprocess_threads, shape, self.using_ctc, reader)
self.labels = labels
self.sparse_labels = sparse
self.layers = vgslspecs.VGSLSpecs(widths, heights, self.mode == 'train')
last_layer = self.layers.Build(images, model_spec)
self._AddOutputs(last_layer, out_dims, out_func, num_classes)
if self.mode == 'train':
self._AddOptimizer(optimizer_type)
# For saving the model across training and evaluation
self.saver = tf.train.Saver()
def TrainAStep(self, sess):
"""Runs a training step in the session.
Args:
sess: Session in which to train the model.
Returns:
loss, global_step.
"""
_, loss, step = sess.run([self.train_op, self.loss, self.global_step])
return loss, step
def Restore(self, checkpoint_path, sess):
"""Restores the model from the given checkpoint path into the session.
Args:
checkpoint_path: File pathname of the checkpoint.
sess: Session in which to restore the model.
Returns:
global_step of the model.
"""
self.saver.restore(sess, checkpoint_path)
return tf.train.global_step(sess, self.global_step)
def RunAStep(self, sess):
"""Runs a step for eval in the session.
Args:
sess: Session in which to run the model.
Returns:
output tensor result, labels tensor result.
"""
return sess.run([self.output, self.labels])
def _AddOutputs(self, prev_layer, out_dims, out_func, num_classes):
"""Adds the output layer and loss function.
Args:
prev_layer: Output of last layer of main network.
out_dims: Number of output dimensions, 0, 1 or 2.
out_func: Output non-linearity. 's' or 'c'=softmax, 'l'=logistic.
num_classes: Number of outputs/size of last output dimension.
"""
height_in = shapes.tensor_dim(prev_layer, dim=1)
logits, outputs = self._AddOutputLayer(prev_layer, out_dims, out_func,
num_classes)
if self.mode == 'train':
# Setup loss for training.
self.loss = self._AddLossFunction(logits, height_in, out_dims, out_func)
tf.scalar_summary('loss', self.loss, name='loss')
elif out_dims == 0:
# Be sure the labels match the output, even in eval mode.
self.labels = tf.slice(self.labels, [0, 0], [-1, 1])
self.labels = tf.reshape(self.labels, [-1])
logging.info('Final output=%s', outputs)
logging.info('Labels tensor=%s', self.labels)
self.output = outputs
def _AddOutputLayer(self, prev_layer, out_dims, out_func, num_classes):
"""Add the fully-connected logits and SoftMax/Logistic output Layer.
Args:
prev_layer: Output of last layer of main network.
out_dims: Number of output dimensions, 0, 1 or 2.
out_func: Output non-linearity. 's' or 'c'=softmax, 'l'=logistic.
num_classes: Number of outputs/size of last output dimension.
Returns:
logits: Pre-softmax/logistic fully-connected output shaped to out_dims.
outputs: Post-softmax/logistic shaped to out_dims.
Raises:
ValueError: if syntax is incorrect.
"""
# Reduce dimensionality appropriate to the output dimensions.
batch_in = shapes.tensor_dim(prev_layer, dim=0)
height_in = shapes.tensor_dim(prev_layer, dim=1)
width_in = shapes.tensor_dim(prev_layer, dim=2)
depth_in = shapes.tensor_dim(prev_layer, dim=3)
if out_dims:
# Combine any remaining height and width with batch and unpack after.
shaped = tf.reshape(prev_layer, [-1, depth_in])
else:
# Everything except batch goes to depth, and therefore has to be known.
shaped = tf.reshape(prev_layer, [-1, height_in * width_in * depth_in])
logits = slim.fully_connected(shaped, num_classes, activation_fn=None)
if out_func == 'l':
raise ValueError('Logistic not yet supported!')
else:
output = tf.nn.softmax(logits)
    # Reshape to the desired output.
if out_dims == 2:
output_shape = [batch_in, height_in, width_in, num_classes]
elif out_dims == 1:
output_shape = [batch_in, height_in * width_in, num_classes]
else:
output_shape = [batch_in, num_classes]
output = tf.reshape(output, output_shape, name='Output')
logits = tf.reshape(logits, output_shape)
return logits, output
def _AddLossFunction(self, logits, height_in, out_dims, out_func):
"""Add the appropriate loss function.
Args:
logits: Pre-softmax/logistic fully-connected output shaped to out_dims.
height_in: Height of logits before going into the softmax layer.
out_dims: Number of output dimensions, 0, 1 or 2.
out_func: Output non-linearity. 's' or 'c'=softmax, 'l'=logistic.
Returns:
loss: That which is to be minimized.
Raises:
ValueError: if logistic is used.
"""
if out_func == 'c':
# Transpose batch to the middle.
ctc_input = tf.transpose(logits, [1, 0, 2])
# Compute the widths of each batch element from the input widths.
widths = self.layers.GetLengths(dim=2, factor=height_in)
cross_entropy = tf.nn.ctc_loss(ctc_input, self.sparse_labels, widths)
elif out_func == 's':
if out_dims == 2:
self.labels = _PadLabels3d(logits, self.labels)
elif out_dims == 1:
self.labels = _PadLabels2d(
shapes.tensor_dim(
logits, dim=1), self.labels)
else:
self.labels = tf.slice(self.labels, [0, 0], [-1, 1])
self.labels = tf.reshape(self.labels, [-1])
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
logits, self.labels, name='xent')
else:
# TODO(rays) Labels need an extra dimension for logistic, so different
# padding functions are needed, as well as a different loss function.
raise ValueError('Logistic not yet supported!')
return tf.reduce_sum(cross_entropy)
def _AddOptimizer(self, optimizer_type):
"""Adds an optimizer with learning rate decay to minimize self.loss.
Args:
optimizer_type: One of 'GradientDescent', 'AdaGrad', 'Momentum', 'Adam'.
Raises:
ValueError: if the optimizer type is unrecognized.
"""
learn_rate_delta = self.initial_learning_rate - self.final_learning_rate
learn_rate_dec = tf.add(
tf.train.exponential_decay(learn_rate_delta, self.global_step,
self.decay_steps, self.decay_rate),
self.final_learning_rate)
if optimizer_type == 'GradientDescent':
opt = tf.train.GradientDescentOptimizer(learn_rate_dec)
elif optimizer_type == 'AdaGrad':
opt = tf.train.AdagradOptimizer(learn_rate_dec)
elif optimizer_type == 'Momentum':
opt = tf.train.MomentumOptimizer(learn_rate_dec, momentum=0.9)
elif optimizer_type == 'Adam':
opt = tf.train.AdamOptimizer(learning_rate=learn_rate_dec)
else:
raise ValueError('Invalid optimizer type: ' + optimizer_type)
tf.scalar_summary('learn_rate', learn_rate_dec, name='lr_summ')
self.train_op = opt.minimize(
self.loss, global_step=self.global_step, name='train')
def _PadLabels3d(logits, labels):
"""Pads or slices 3-d labels to match logits.
Covers the case of 2-d softmax output, when labels is [batch, height, width]
and logits is [batch, height, width, onehot]
Args:
logits: 4-d Pre-softmax fully-connected output.
labels: 3-d, but not necessarily matching in size.
Returns:
labels: Resized by padding or clipping to match logits.
"""
logits_shape = shapes.tensor_shape(logits)
labels_shape = shapes.tensor_shape(labels)
labels = tf.reshape(labels, [-1, labels_shape[2]])
labels = _PadLabels2d(logits_shape[2], labels)
labels = tf.reshape(labels, [labels_shape[0], -1])
labels = _PadLabels2d(logits_shape[1] * logits_shape[2], labels)
return tf.reshape(labels, [labels_shape[0], logits_shape[1], logits_shape[2]])
def _PadLabels2d(logits_size, labels):
"""Pads or slices the 2nd dimension of 2-d labels to match logits_size.
Covers the case of 1-d softmax output, when labels is [batch, seq] and
logits is [batch, seq, onehot]
Args:
logits_size: Tensor returned from tf.shape giving the target size.
labels: 2-d, but not necessarily matching in size.
Returns:
labels: Resized by padding or clipping the last dimension to logits_size.
"""
pad = logits_size - tf.shape(labels)[1]
def _PadFn():
return tf.pad(labels, [[0, 0], [0, pad]])
def _SliceFn():
return tf.slice(labels, [0, 0], [-1, logits_size])
return tf.cond(tf.greater(pad, 0), _PadFn, _SliceFn)
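# Illustrative note (editor's example, not part of the original module): with
# logits_size=97, labels of shape [4, 85] are zero-padded to [4, 97], and
# labels of shape [4, 100] are sliced to [4, 97], as exercised by
# testPadLabels2d below.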
def _ParseInputSpec(input_spec):
"""Parses input_spec and returns the numbers obtained therefrom.
Args:
input_spec: Specification of the input layer. See Build.
Returns:
shape: ImageShape with the desired shape of the input.
Raises:
ValueError: if syntax is incorrect.
"""
pattern = re.compile(R'(\d+),(\d+),(\d+),(\d+)')
m = pattern.match(input_spec)
if m is None:
raise ValueError('Failed to parse input spec:' + input_spec)
batch_size = int(m.group(1))
y_size = int(m.group(2)) if int(m.group(2)) > 0 else None
x_size = int(m.group(3)) if int(m.group(3)) > 0 else None
depth = int(m.group(4))
if depth not in [1, 3]:
raise ValueError('Depth must be 1 or 3, had:', depth)
return vgsl_input.ImageShape(batch_size, y_size, x_size, depth)
def _ParseOutputSpec(output_spec):
"""Parses the output spec.
Args:
output_spec: Output layer definition. See Build.
Returns:
out_dims: 2|1|0 for 2-d, 1-d, 0-d.
out_func: l|s|c for logistic, softmax, softmax+CTC
num_classes: Number of classes in output.
Raises:
ValueError: if syntax is incorrect.
"""
pattern = re.compile(R'(O)(0|1|2)(l|s|c)(\d+)')
m = pattern.match(output_spec)
if m is None:
raise ValueError('Failed to parse output spec:' + output_spec)
out_dims = int(m.group(2))
out_func = m.group(3)
if out_func == 'c' and out_dims != 1:
raise ValueError('CTC can only be used with a 1-D sequence!')
num_classes = int(m.group(4))
return out_dims, out_func, num_classes
def _AddRateToSummary(tag, rate, step, sw):
"""Adds the given rate to the summary with the given tag.
Args:
tag: Name for this value.
rate: Value to add to the summary. Perhaps an error rate.
step: Global step of the graph for the x-coordinate of the summary.
sw: Summary writer to which to write the rate value.
"""
sw.add_summary(
summary_pb2.Summary(value=[summary_pb2.Summary.Value(
tag=tag, simple_value=rate)]), step)
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for vgsl_model."""
import os
import numpy as np
import tensorflow as tf
import vgsl_input
import vgsl_model
def _testdata(filename):
return os.path.join('../testdata/', filename)
def _rand(*size):
return np.random.uniform(size=size).astype('f')
class VgslModelTest(tf.test.TestCase):
def testParseInputSpec(self):
"""The parser must return the numbers in the correct order.
"""
shape = vgsl_model._ParseInputSpec(input_spec='32,42,256,3')
self.assertEqual(
shape,
vgsl_input.ImageShape(
batch_size=32, height=42, width=256, depth=3))
# Nones must be inserted for zero sizes.
shape = vgsl_model._ParseInputSpec(input_spec='1,0,0,3')
self.assertEqual(
shape,
vgsl_input.ImageShape(
batch_size=1, height=None, width=None, depth=3))
def testParseOutputSpec(self):
"""The parser must return the correct args in the correct order.
"""
out_dims, out_func, num_classes = vgsl_model._ParseOutputSpec(
output_spec='O1c142')
self.assertEqual(out_dims, 1)
self.assertEqual(out_func, 'c')
self.assertEqual(num_classes, 142)
out_dims, out_func, num_classes = vgsl_model._ParseOutputSpec(
output_spec='O2s99')
self.assertEqual(out_dims, 2)
self.assertEqual(out_func, 's')
self.assertEqual(num_classes, 99)
out_dims, out_func, num_classes = vgsl_model._ParseOutputSpec(
output_spec='O0l12')
self.assertEqual(out_dims, 0)
self.assertEqual(out_func, 'l')
self.assertEqual(num_classes, 12)
def testPadLabels2d(self):
"""Must pad timesteps in labels to match logits.
"""
with self.test_session() as sess:
# Make placeholders for logits and labels.
ph_logits = tf.placeholder(tf.float32, shape=(None, None, 42))
ph_labels = tf.placeholder(tf.int64, shape=(None, None))
padded_labels = vgsl_model._PadLabels2d(tf.shape(ph_logits)[1], ph_labels)
# Make actual inputs.
real_logits = _rand(4, 97, 42)
real_labels = _rand(4, 85)
np_array = sess.run([padded_labels],
feed_dict={ph_logits: real_logits,
ph_labels: real_labels})[0]
self.assertEqual(tuple(np_array.shape), (4, 97))
real_labels = _rand(4, 97)
np_array = sess.run([padded_labels],
feed_dict={ph_logits: real_logits,
ph_labels: real_labels})[0]
self.assertEqual(tuple(np_array.shape), (4, 97))
real_labels = _rand(4, 100)
np_array = sess.run([padded_labels],
feed_dict={ph_logits: real_logits,
ph_labels: real_labels})[0]
self.assertEqual(tuple(np_array.shape), (4, 97))
def testPadLabels3d(self):
"""Must pad height and width in labels to match logits.
The tricky thing with 3-d is that the rows and columns need to remain
intact, so we'll test it with small known data.
"""
with self.test_session() as sess:
# Make placeholders for logits and labels.
ph_logits = tf.placeholder(tf.float32, shape=(None, None, None, 42))
ph_labels = tf.placeholder(tf.int64, shape=(None, None, None))
padded_labels = vgsl_model._PadLabels3d(ph_logits, ph_labels)
# Make actual inputs.
real_logits = _rand(1, 3, 4, 42)
# Test all 9 combinations of height x width in [small, ok, big]
real_labels = np.arange(6).reshape((1, 2, 3)) # Height small, width small
np_array = sess.run([padded_labels],
feed_dict={ph_logits: real_logits,
ph_labels: real_labels})[0]
self.assertEqual(tuple(np_array.shape), (1, 3, 4))
self.assertAllEqual(np_array[0, :, :],
[[0, 1, 2, 0], [3, 4, 5, 0], [0, 0, 0, 0]])
real_labels = np.arange(8).reshape((1, 2, 4)) # Height small, width ok
np_array = sess.run([padded_labels],
feed_dict={ph_logits: real_logits,
ph_labels: real_labels})[0]
self.assertEqual(tuple(np_array.shape), (1, 3, 4))
self.assertAllEqual(np_array[0, :, :],
[[0, 1, 2, 3], [4, 5, 6, 7], [0, 0, 0, 0]])
real_labels = np.arange(10).reshape((1, 2, 5)) # Height small, width big
np_array = sess.run([padded_labels],
feed_dict={ph_logits: real_logits,
ph_labels: real_labels})[0]
self.assertEqual(tuple(np_array.shape), (1, 3, 4))
self.assertAllEqual(np_array[0, :, :],
[[0, 1, 2, 3], [5, 6, 7, 8], [0, 0, 0, 0]])
real_labels = np.arange(9).reshape((1, 3, 3)) # Height ok, width small
np_array = sess.run([padded_labels],
feed_dict={ph_logits: real_logits,
ph_labels: real_labels})[0]
self.assertEqual(tuple(np_array.shape), (1, 3, 4))
self.assertAllEqual(np_array[0, :, :],
[[0, 1, 2, 0], [3, 4, 5, 0], [6, 7, 8, 0]])
real_labels = np.arange(12).reshape((1, 3, 4)) # Height ok, width ok
np_array = sess.run([padded_labels],
feed_dict={ph_logits: real_logits,
ph_labels: real_labels})[0]
self.assertEqual(tuple(np_array.shape), (1, 3, 4))
self.assertAllEqual(np_array[0, :, :],
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
real_labels = np.arange(15).reshape((1, 3, 5)) # Height ok, width big
np_array = sess.run([padded_labels],
feed_dict={ph_logits: real_logits,
ph_labels: real_labels})[0]
self.assertEqual(tuple(np_array.shape), (1, 3, 4))
self.assertAllEqual(np_array[0, :, :],
[[0, 1, 2, 3], [5, 6, 7, 8], [10, 11, 12, 13]])
real_labels = np.arange(12).reshape((1, 4, 3)) # Height big, width small
np_array = sess.run([padded_labels],
feed_dict={ph_logits: real_logits,
ph_labels: real_labels})[0]
self.assertEqual(tuple(np_array.shape), (1, 3, 4))
self.assertAllEqual(np_array[0, :, :],
[[0, 1, 2, 0], [3, 4, 5, 0], [6, 7, 8, 0]])
real_labels = np.arange(16).reshape((1, 4, 4)) # Height big, width ok
np_array = sess.run([padded_labels],
feed_dict={ph_logits: real_logits,
ph_labels: real_labels})[0]
self.assertEqual(tuple(np_array.shape), (1, 3, 4))
self.assertAllEqual(np_array[0, :, :],
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
real_labels = np.arange(20).reshape((1, 4, 5)) # Height big, width big
np_array = sess.run([padded_labels],
feed_dict={ph_logits: real_logits,
ph_labels: real_labels})[0]
self.assertEqual(tuple(np_array.shape), (1, 3, 4))
self.assertAllEqual(np_array[0, :, :],
[[0, 1, 2, 3], [5, 6, 7, 8], [10, 11, 12, 13]])
def testEndToEndSizes0d(self):
"""Tests that the output sizes match when training/running real 0d data.
Uses mnist with dual summarizing LSTMs to reduce to a single value.
"""
filename = _testdata('mnist-tiny')
with self.test_session() as sess:
model = vgsl_model.InitNetwork(
filename,
model_spec='4,0,0,1[Cr5,5,16 Mp3,3 Lfys16 Lfxs16]O0s12',
mode='train')
tf.initialize_all_variables().run(session=sess)
coord = tf.train.Coordinator()
tf.train.start_queue_runners(sess=sess, coord=coord)
_, step = model.TrainAStep(sess)
self.assertEqual(step, 1)
output, labels = model.RunAStep(sess)
self.assertEqual(len(output.shape), 2)
self.assertEqual(len(labels.shape), 1)
self.assertEqual(output.shape[0], labels.shape[0])
self.assertEqual(output.shape[1], 12)
# TODO(rays) Support logistic and test with Imagenet (as 0d, multi-object.)
def testEndToEndSizes1dCTC(self):
"""Tests that the output sizes match when training with CTC.
Basic bidi LSTM on top of convolution and summarizing LSTM with CTC.
"""
filename = _testdata('arial-32-tiny')
with self.test_session() as sess:
model = vgsl_model.InitNetwork(
filename,
model_spec='2,0,0,1[Cr5,5,16 Mp3,3 Lfys16 Lbx100]O1c105',
mode='train')
tf.initialize_all_variables().run(session=sess)
coord = tf.train.Coordinator()
tf.train.start_queue_runners(sess=sess, coord=coord)
_, step = model.TrainAStep(sess)
self.assertEqual(step, 1)
output, labels = model.RunAStep(sess)
self.assertEqual(len(output.shape), 3)
self.assertEqual(len(labels.shape), 2)
self.assertEqual(output.shape[0], labels.shape[0])
# This is ctc - the only cast-iron guarantee is labels <= output.
self.assertLessEqual(labels.shape[1], output.shape[1])
self.assertEqual(output.shape[2], 105)
def testEndToEndSizes1dFixed(self):
"""Tests that the output sizes match when training/running 1 data.
Convolution, summarizing LSTM with fwd rev fwd to allow no CTC.
"""
filename = _testdata('numbers-16-tiny')
with self.test_session() as sess:
model = vgsl_model.InitNetwork(
filename,
model_spec='8,0,0,1[Cr5,5,16 Mp3,3 Lfys16 Lfx64 Lrx64 Lfx64]O1s12',
mode='train')
tf.initialize_all_variables().run(session=sess)
coord = tf.train.Coordinator()
tf.train.start_queue_runners(sess=sess, coord=coord)
_, step = model.TrainAStep(sess)
self.assertEqual(step, 1)
output, labels = model.RunAStep(sess)
self.assertEqual(len(output.shape), 3)
self.assertEqual(len(labels.shape), 2)
self.assertEqual(output.shape[0], labels.shape[0])
# Not CTC, output lengths match.
self.assertEqual(output.shape[1], labels.shape[1])
self.assertEqual(output.shape[2], 12)
# TODO(rays) Get a 2-d dataset and support 2d (heat map) outputs.
if __name__ == '__main__':
tf.test.main()
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Model trainer for single or multi-replica training."""
from tensorflow import app
from tensorflow.python.platform import flags
import vgsl_model
flags.DEFINE_string('master', '', 'Name of the TensorFlow master to use.')
flags.DEFINE_string('train_dir', '/tmp/mdir',
'Directory where to write event logs.')
flags.DEFINE_string('model_str',
'1,150,600,3[S2(4x150)0,2 Ct5,5,16 Mp2,2 Ct5,5,64 Mp3,3'
'([Lrys64 Lbx128][Lbys64 Lbx128][Lfys64 Lbx128])S3(3x0)2,3'
'Lfx128 Lrx128 S0(1x4)0,3 Do Lfx256]O1c134',
'Network description.')
flags.DEFINE_integer('max_steps', 10000, 'Number of steps to train for.')
flags.DEFINE_integer('task', 0, 'Task id of the replica running the training.')
flags.DEFINE_integer('ps_tasks', 0, 'Number of tasks in the ps job. '
'If 0 no ps job is used.')
flags.DEFINE_string('train_data', None, 'Training data filepattern')
flags.DEFINE_float('initial_learning_rate', 0.00002, 'Initial learning rate')
flags.DEFINE_float('final_learning_rate', 0.00002, 'Final learning rate')
flags.DEFINE_integer('learning_rate_halflife', 1600000,
'Halflife of learning rate')
flags.DEFINE_string('optimizer_type', 'Adam',
'Optimizer from: GradientDescent, AdaGrad, Momentum, Adam')
flags.DEFINE_integer('num_preprocess_threads', 4, 'Number of input threads')
FLAGS = flags.FLAGS
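# Illustrative invocation, using the flag defaults above; the train_data
# filepattern is hypothetical and depends on where the FSNS shards live:
#   python vgsl_train.py --train_data=../data/train/train* \
#     --train_dir=/tmp/mdir --max_steps=10000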
def main(argv):
del argv
vgsl_model.Train(FLAGS.train_dir, FLAGS.model_str, FLAGS.train_data,
FLAGS.max_steps, FLAGS.master, FLAGS.task, FLAGS.ps_tasks,
FLAGS.initial_learning_rate, FLAGS.final_learning_rate,
FLAGS.learning_rate_halflife, FLAGS.optimizer_type,
FLAGS.num_preprocess_threads)
if __name__ == '__main__':
app.run()
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""String network description language mapping to TF-Slim calls where possible.
See vgslspecs.md for detailed description.
"""
import re
from string import maketrans
import nn_ops
import shapes
import tensorflow as tf
import tensorflow.contrib.slim as slim
# Class that builds a set of ops to manipulate variable-sized images.
class VGSLSpecs(object):
"""Layers that can be built from a string definition."""
def __init__(self, widths, heights, is_training):
"""Constructs a VGSLSpecs.
Args:
widths: Tensor of size batch_size of the widths of the inputs.
heights: Tensor of size batch_size of the heights of the inputs.
is_training: True if the graph should be built for training.
"""
# The string that was used to build this model.
self.model_str = None
# True if we are training
self.is_training = is_training
# Tensor for the size of the images, of size batch_size.
self.widths = widths
self.heights = heights
# Overall reduction factors of this model so far for each dimension.
# TODO(rays) consider building a graph from widths and heights instead of
# computing a scale factor.
self.reduction_factors = [1.0, 1.0, 1.0, 1.0]
# List of Op parsers.
# TODO(rays) add more Op types as needed.
self.valid_ops = [self.AddSeries, self.AddParallel, self.AddConvLayer,
self.AddMaxPool, self.AddDropout, self.AddReShape,
self.AddFCLayer, self.AddLSTMLayer]
# Translation table to convert characters that may occur in op strings but
# are unacceptable in op names.
self.transtab = maketrans('(,)', '___')
def Build(self, prev_layer, model_str):
"""Builds a network with input prev_layer from a VGSLSpecs description.
Args:
prev_layer: The input tensor.
model_str: Model definition, in a format similar to Tesseract's, as follows:
============ FUNCTIONAL OPS ============
C(s|t|r|l|m)[{name}]<y>,<x>,<d> Convolves using a y,x window, with no
shrinkage, SAME infill, d outputs, with s|t|r|l|m non-linear layer.
(s|t|r|l|m) specifies the type of non-linearity:
s = sigmoid
t = tanh
r = relu
l = linear (i.e., None)
m = softmax
F(s|t|r|l|m)[{name}]<d> Fully-connected with s|t|r|l|m non-linearity and
d outputs. Reduces height, width to 1. Input height and width must be
constant.
L(f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
f runs the LSTM forward only.
r runs the LSTM reversed only.
b runs the LSTM bidirectionally.
x runs the LSTM in the x-dimension (on data with or without the
y-dimension).
y runs the LSTM in the y-dimension (data must have a y dimension).
s (optional) summarizes the output in the requested dimension,
outputting only the final step, collapsing the dimension to a
single element.
Examples:
Lfx128 runs a forward-only LSTM in the x-dimension with 128
outputs, treating any y dimension independently.
Lfys64 runs a forward-only LSTM in the y-dimension with 64 outputs
and collapses the y-dimension to 1 element.
NOTE that Lbxsn is implemented as (LfxsnLrxsn), since the summaries
need to be taken from opposite ends of the output.
Do[{name}] Insert a dropout layer.
============ PLUMBING OPS ============
[...] Execute ... networks in series (layers).
(...) Execute ... networks in parallel, with their output concatenated
in depth.
S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to
another dimension.
Splits input dimension d into a x b, sending the high part (a) to the
high side of dimension e, and the low part (b) to the high side of
dimension f. Exception: if d=e=f, then dimension d is internally
transposed to bxa.
Either a or b can be zero, meaning whatever is left after taking out
the other, allowing dimensions to be of variable size.
Eg. S3(3x50)2,3 will split the 150-element depth into 3x50, with the 3
going to the most significant part of the width, and the 50 part
staying in depth.
This will rearrange a 3x50 output parallel operation to spread the 3
output sets over width.
Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>] Maxpool the input, reducing
each (y,x) rectangle to a single value. The strides default to the
window size.
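Example: the layer portion of the default FSNS model in vgsl_train.py is
[S2(4x150)0,2 Ct5,5,16 Mp2,2 Ct5,5,64 Mp3,3
([Lrys64 Lbx128][Lbys64 Lbx128][Lfys64 Lbx128])S3(3x0)2,3
Lfx128 Lrx128 S0(1x4)0,3 Do Lfx256]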
Returns:
Output tensor
"""
self.model_str = model_str
final_layer, _ = self.BuildFromString(prev_layer, 0)
return final_layer
def GetLengths(self, dim=2, factor=1):
"""Returns the lengths of the batch of elements in the given dimension.
WARNING: The returned sizes may not exactly match TF's calculation.
Args:
dim: Dimension to get the sizes of, in [1,2]. Batch and depth are not allowed.
factor: A scalar value to multiply by.
Returns:
The original heights/widths scaled by the current scaling of the model and
the given factor.
Raises:
ValueError: If the args are invalid.
"""
if dim == 1:
lengths = self.heights
elif dim == 2:
lengths = self.widths
else:
raise ValueError('Invalid dimension given to GetLengths')
lengths = tf.cast(lengths, tf.float32)
if self.reduction_factors[dim] is not None:
lengths = tf.div(lengths, self.reduction_factors[dim])
else:
lengths = tf.ones_like(lengths)
if factor != 1:
lengths = tf.mul(lengths, tf.cast(factor, tf.float32))
return tf.cast(lengths, tf.int32)
def BuildFromString(self, prev_layer, index):
"""Adds the layers defined by model_str[index:] to the model.
Args:
prev_layer: Input tensor.
index: Position in model_str to start parsing
Returns:
Output tensor, next model_str index.
Raises:
ValueError: If the model string is unrecognized.
"""
index = self._SkipWhitespace(index)
for op in self.valid_ops:
output_layer, next_index = op(prev_layer, index)
if output_layer is not None:
return output_layer, next_index
raise ValueError('Unrecognized model string:' + self.model_str[index:])
def AddSeries(self, prev_layer, index):
"""Builds a sequence of layers for a VGSLSpecs model.
Args:
prev_layer: Input tensor.
index: Position in model_str to start parsing
Returns:
Output tensor of the series, end index in model_str.
Raises:
ValueError: If [] are unbalanced.
"""
if self.model_str[index] != '[':
return None, None
index += 1
while index < len(self.model_str) and self.model_str[index] != ']':
prev_layer, index = self.BuildFromString(prev_layer, index)
if index == len(self.model_str):
raise ValueError('Missing ] at end of series!' + self.model_str)
return prev_layer, index + 1
def AddParallel(self, prev_layer, index):
"""tf.concats outputs of layers that run on the same inputs.
Args:
prev_layer: Input tensor.
index: Position in model_str to start parsing
Returns:
Output tensor of the parallel, end index in model_str.
Raises:
ValueError: If () are unbalanced or the elements don't match.
"""
if self.model_str[index] != '(':
return None, None
index += 1
layers = []
num_dims = 0
# Each parallel must output the same, including any reduction factor, in
# all dimensions except depth.
# We save the starting factors so that they are reduced only once, not by
# every element of the parallel.
original_factors = self.reduction_factors
final_factors = None
while index < len(self.model_str) and self.model_str[index] != ')':
self.reduction_factors = original_factors
layer, index = self.BuildFromString(prev_layer, index)
if num_dims == 0:
num_dims = len(layer.get_shape())
elif num_dims != len(layer.get_shape()):
raise ValueError('All elements of parallel must return same num dims')
layers.append(layer)
if final_factors:
if final_factors != self.reduction_factors:
raise ValueError('All elements of parallel must scale the same')
else:
final_factors = self.reduction_factors
if index == len(self.model_str):
raise ValueError('Missing ) at end of parallel!' + self.model_str)
return tf.concat(num_dims - 1, layers), index + 1
def AddConvLayer(self, prev_layer, index):
"""Add a single standard convolutional layer.
Args:
prev_layer: Input tensor.
index: Position in model_str to start parsing
Returns:
Output tensor, end index in model_str.
"""
pattern = re.compile(R'(C)(s|t|r|l|m)({\w+})?(\d+),(\d+),(\d+)')
m = pattern.match(self.model_str, index)
if m is None:
return None, None
name = self._GetLayerName(m.group(0), index, m.group(3))
height = int(m.group(4))
width = int(m.group(5))
depth = int(m.group(6))
fn = self._NonLinearity(m.group(2))
return slim.conv2d(
prev_layer, depth, [height, width], activation_fn=fn,
scope=name), m.end()
def AddMaxPool(self, prev_layer, index):
"""Add a maxpool layer.
Args:
prev_layer: Input tensor.
index: Position in model_str to start parsing
Returns:
Output tensor, end index in model_str.
"""
pattern = re.compile(R'(Mp)({\w+})?(\d+),(\d+)(?:,(\d+),(\d+))?')
m = pattern.match(self.model_str, index)
if m is None:
return None, None
name = self._GetLayerName(m.group(0), index, m.group(2))
height = int(m.group(3))
width = int(m.group(4))
y_stride = height if m.group(5) is None else int(m.group(5))
x_stride = width if m.group(6) is None else int(m.group(6))
self.reduction_factors[1] *= y_stride
self.reduction_factors[2] *= x_stride
return slim.max_pool2d(
prev_layer, [height, width], [y_stride, x_stride],
padding='SAME',
scope=name), m.end()
def AddDropout(self, prev_layer, index):
"""Adds a dropout layer.
Args:
prev_layer: Input tensor.
index: Position in model_str to start parsing
Returns:
Output tensor, end index in model_str.
"""
pattern = re.compile(R'(Do)({\w+})?')
m = pattern.match(self.model_str, index)
if m is None:
return None, None
name = self._GetLayerName(m.group(0), index, m.group(2))
layer = slim.dropout(
prev_layer, 0.5, is_training=self.is_training, scope=name)
return layer, m.end()
def AddReShape(self, prev_layer, index):
"""Reshapes the input tensor by moving each (x_scale,y_scale) rectangle to.
the depth dimension. NOTE that the TF convention is that inputs are
[batch, y, x, depth].
Args:
prev_layer: Input tensor.
index: Position in model_str to start parsing
Returns:
Output tensor, end index in model_str.
"""
pattern = re.compile(R'(S)({\w+})?(\d+)\((\d+)x(\d+)\)(\d+),(\d+)')
m = pattern.match(self.model_str, index)
if m is None:
return None, None
name = self._GetLayerName(m.group(0), index, m.group(2))
src_dim = int(m.group(3))
part_a = int(m.group(4))
part_b = int(m.group(5))
dest_dim_a = int(m.group(6))
dest_dim_b = int(m.group(7))
if part_a == 0:
part_a = -1
if part_b == 0:
part_b = -1
prev_shape = tf.shape(prev_layer)
layer = shapes.transposing_reshape(
prev_layer, src_dim, part_a, part_b, dest_dim_a, dest_dim_b, name=name)
# Compute scale factors.
result_shape = tf.shape(layer)
for i in xrange(len(self.reduction_factors)):
if self.reduction_factors[i] is not None:
factor1 = tf.cast(self.reduction_factors[i], tf.float32)
factor2 = tf.cast(prev_shape[i], tf.float32)
divisor = tf.cast(result_shape[i], tf.float32)
self.reduction_factors[i] = tf.div(tf.mul(factor1, factor2), divisor)
return layer, m.end()
def AddFCLayer(self, prev_layer, index):
"""Parse expression and add Fully Connected Layer.
Args:
prev_layer: Input tensor.
index: Position in model_str to start parsing
Returns:
Output tensor, end index in model_str.
"""
pattern = re.compile(R'(F)(s|t|r|l|m)({\w+})?(\d+)')
m = pattern.match(self.model_str, index)
if m is None:
return None, None
fn = self._NonLinearity(m.group(2))
name = self._GetLayerName(m.group(0), index, m.group(3))
depth = int(m.group(4))
input_depth = shapes.tensor_dim(prev_layer, 1) * shapes.tensor_dim(
prev_layer, 2) * shapes.tensor_dim(prev_layer, 3)
# The slim fully connected is actually a 1x1 conv, so we have to crush the
# dimensions on input.
# Everything except batch goes to depth, and therefore has to be known.
shaped = tf.reshape(
prev_layer, [-1, input_depth], name=name + '_reshape_in')
output = slim.fully_connected(shaped, depth, activation_fn=fn, scope=name)
# Width and height are collapsed to 1.
self.reduction_factors[1] = None
self.reduction_factors[2] = None
return tf.reshape(
output, [shapes.tensor_dim(prev_layer, 0), 1, 1, depth],
name=name + '_reshape_out'), m.end()
def AddLSTMLayer(self, prev_layer, index):
"""Parse expression and add LSTM Layer.
Args:
prev_layer: Input tensor.
index: Position in model_str to start parsing
Returns:
Output tensor, end index in model_str.
"""
pattern = re.compile(R'(L)(f|r|b)(x|y)(s)?({\w+})?(\d+)')
m = pattern.match(self.model_str, index)
if m is None:
return None, None
direction = m.group(2)
dim = m.group(3)
summarize = m.group(4) == 's'
name = self._GetLayerName(m.group(0), index, m.group(5))
depth = int(m.group(6))
if direction == 'b' and summarize:
fwd = self._LSTMLayer(prev_layer, 'forward', dim, True, depth,
name + '_forward')
back = self._LSTMLayer(prev_layer, 'backward', dim, True, depth,
name + '_reverse')
return tf.concat(3, [fwd, back], name=name + '_concat'), m.end()
if direction == 'f':
direction = 'forward'
elif direction == 'r':
direction = 'backward'
else:
direction = 'bidirectional'
outputs = self._LSTMLayer(prev_layer, direction, dim, summarize, depth,
name)
if summarize:
# The x or y dimension is getting collapsed.
if dim == 'x':
self.reduction_factors[2] = None
else:
self.reduction_factors[1] = None
return outputs, m.end()
def _LSTMLayer(self, prev_layer, direction, dim, summarize, depth, name):
"""Adds an LSTM layer with the given pre-parsed attributes.
Always maps 4-D to 4-D regardless of summarize.
Args:
prev_layer: Input tensor.
direction: 'forward' 'backward' or 'bidirectional'
dim: 'x' or 'y', dimension to consider as time.
summarize: True if we are to return only the last timestep.
depth: Output depth.
name: Some string naming the op.
Returns:
Output tensor.
"""
# If the target dimension is y, we need to transpose.
if dim == 'x':
lengths = self.GetLengths(2, 1)
inputs = prev_layer
else:
lengths = self.GetLengths(1, 1)
inputs = tf.transpose(prev_layer, [0, 2, 1, 3], name=name + '_ytrans_in')
input_batch = shapes.tensor_dim(inputs, 0)
num_slices = shapes.tensor_dim(inputs, 1)
num_steps = shapes.tensor_dim(inputs, 2)
input_depth = shapes.tensor_dim(inputs, 3)
# Reshape away the other dimension.
inputs = tf.reshape(
inputs, [-1, num_steps, input_depth], name=name + '_reshape_in')
# We need to replicate the lengths by the size of the other dimension, and
# any changes that have been made to the batch dimension.
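# E.g. if an earlier S2(3x0)0,2 op tiled the batch by 3 (as in
# testReshapeTile in vgslspecs_test), tile_factor evaluates to 3 and
# lengths is tiled to match.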
tile_factor = tf.to_float(input_batch *
num_slices) / tf.to_float(tf.shape(lengths)[0])
lengths = tf.tile(lengths, [tf.cast(tile_factor, tf.int32)])
lengths = tf.cast(lengths, tf.int64)
outputs = nn_ops.rnn_helper(
inputs,
lengths,
cell_type='lstm',
num_nodes=depth,
direction=direction,
name=name,
stddev=0.1)
# Output depth is doubled if bi-directional.
if direction == 'bidirectional':
output_depth = depth * 2
else:
output_depth = depth
# Restore the other dimension.
if summarize:
outputs = tf.slice(
outputs, [0, num_steps - 1, 0], [-1, 1, -1], name=name + '_sum_slice')
outputs = tf.reshape(
outputs, [input_batch, num_slices, 1, output_depth],
name=name + '_reshape_out')
else:
outputs = tf.reshape(
outputs, [input_batch, num_slices, num_steps, output_depth],
name=name + '_reshape_out')
if dim == 'y':
outputs = tf.transpose(outputs, [0, 2, 1, 3], name=name + '_ytrans_out')
return outputs
def _NonLinearity(self, code):
"""Returns the non-linearity function pointer for the given string code.
For forwards compatibility, allows the full names for stand-alone
non-linearities, as well as the single-letter names used in ops like C,F.
Args:
code: String code representing a non-linearity function.
Returns:
non-linearity function represented by the code.
"""
if code in ['s', 'Sig']:
return tf.sigmoid
elif code in ['t', 'Tanh']:
return tf.tanh
elif code in ['r', 'Relu']:
return tf.nn.relu
elif code in ['m', 'Smax']:
return tf.nn.softmax
return None
def _GetLayerName(self, op_str, index, name_str):
"""Generates a name for the op, using a user-supplied name if possible.
Args:
op_str: String representing the parsed op.
index: Position in model_str of the start of the op.
name_str: User-supplied {name} with {} that need removing or None.
Returns:
Selected name.
"""
if name_str:
return name_str[1:-1]
else:
return op_str.translate(self.transtab) + '_' + str(index)
def _SkipWhitespace(self, index):
"""Skips any leading whitespace in the model description.
Args:
index: Position in model_str to start parsing
Returns:
end index in model_str of whitespace.
"""
pattern = re.compile(R'([ \t\n]+)')
m = pattern.match(self.model_str, index)
if m is None:
return index
return m.end()
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for vgslspecs."""
import numpy as np
import tensorflow as tf
import vgslspecs
def _rand(*size):
return np.random.uniform(size=size).astype('f')
class VgslspecsTest(tf.test.TestCase):
def __init__(self, other):
super(VgslspecsTest, self).__init__(other)
self.max_width = 36
self.max_height = 24
self.batch_size = 4
def SetupInputs(self):
# Make placeholders for standard inputs.
# Everything is variable in the input, except the depth.
self.ph_image = tf.placeholder(
tf.float32, shape=(None, None, None, 3), name='inputs')
self.ph_widths = tf.placeholder(tf.int64, shape=(None,), name='w')
self.ph_heights = tf.placeholder(tf.int64, shape=(None,), name='h')
# Make actual inputs.
self.in_image = _rand(self.batch_size, self.max_height, self.max_width, 3)
self.in_widths = [24, 12, self.max_width, 30]
self.in_heights = [self.max_height, 18, 12, 6]
def ExpectScaledSize(self, spec, target_shape, factor=1):
"""Tests that the output of the graph of the given spec has target_shape."""
with tf.Graph().as_default():
with self.test_session() as sess:
self.SetupInputs()
# Only the placeholders are given at construction time.
vgsl = vgslspecs.VGSLSpecs(self.ph_widths, self.ph_heights, True)
outputs = vgsl.Build(self.ph_image, spec)
# Compute the expected output widths from the given scale factor.
target_widths = tf.div(self.in_widths, factor).eval()
target_heights = tf.div(self.in_heights, factor).eval()
# Run with the 'real' data.
tf.initialize_all_variables().run()
res_image, res_widths, res_heights = sess.run(
[outputs, vgsl.GetLengths(2), vgsl.GetLengths(1)],
feed_dict={self.ph_image: self.in_image,
self.ph_widths: self.in_widths,
self.ph_heights: self.in_heights})
self.assertEqual(tuple(res_image.shape), target_shape)
if target_shape[1] > 1:
self.assertEqual(tuple(res_heights), tuple(target_heights))
if target_shape[2] > 1:
self.assertEqual(tuple(res_widths), tuple(target_widths))
def testSameSizeConv(self):
"""Test all types of Conv. There is no scaling."""
self.ExpectScaledSize(
'[Cs{MyConv}5,5,16 Ct3,3,12 Cr4,4,24 Cl5,5,64]',
(self.batch_size, self.max_height, self.max_width, 64))
def testSameSizeLSTM(self):
"""Test all non-reducing LSTMs. Output depth is doubled with BiDi."""
self.ExpectScaledSize('[Lfx16 Lrx8 Do Lbx24 Lfy12 Do{MyDo} Lry7 Lby32]',
(self.batch_size, self.max_height, self.max_width,
64))
def testSameSizeParallel(self):
"""Parallel affects depth, but not scale."""
self.ExpectScaledSize('[Cs5,5,16 (Lfx{MyLSTM}32 Lrx32 Lbx16)]',
(self.batch_size, self.max_height, self.max_width,
96))
def testScalingOps(self):
"""Test a heterogeneous series with scaling."""
self.ExpectScaledSize('[Cs5,5,16 Mp{MyPool}2,2 Ct3,3,32 Mp3,3 Lfx32 Lry64]',
(self.batch_size, self.max_height / 6,
self.max_width / 6, 64), 6)
def testXReduction(self):
"""Test a heterogeneous series with reduction of x-dimension."""
self.ExpectScaledSize('[Cr5,5,16 Mp2,2 Ct3,3,32 Mp3,3 Lfxs32 Lry64]',
(self.batch_size, self.max_height / 6, 1, 64), 6)
def testYReduction(self):
"""Test a heterogeneous series with reduction of y-dimension."""
self.ExpectScaledSize('[Cl5,5,16 Mp2,2 Ct3,3,32 Mp3,3 Lfys32 Lfx64]',
(self.batch_size, 1, self.max_width / 6, 64), 6)
def testXYReduction(self):
"""Test a heterogeneous series with reduction to 0-d."""
self.ExpectScaledSize(
'[Cr5,5,16 Lfys32 Lfxs64 Fr{MyFC}16 Ft20 Fl12 Fs32 Fm40]',
(self.batch_size, 1, 1, 40))
def testReshapeTile(self):
"""Tests that a tiled input can be reshaped to the batch dimension."""
self.ExpectScaledSize('[S2(3x0)0,2 Cr5,5,16 Lfys16]',
(self.batch_size * 3, 1, self.max_width / 3, 16), 3)
def testReshapeDepth(self):
"""Tests that depth can be reshaped to the x dimension."""
self.ExpectScaledSize('[Cl5,5,16 Mp3,3 (Lrys32 Lbys16 Lfys32) S3(3x0)2,3]',
(self.batch_size, 1, self.max_width, 32))
if __name__ == '__main__':
tf.test.main()
0
104 <nul>
1 G
2 r
3 a
4 s
5 l
6 n
7 d
8 .
9 B
10 C
11 O
12 W
13 Y
14 ,
15 (
16 u
17 z
18 i
19 e
20 )
21 1
22 9
23 2
24 -
25 6
26 o
27 L
28 P
29 '
30 t
31 m
32 K
33 c
34 k
35 V
36 S
37 D
38 J
39 h
40 M
41 x
42 E
43 q
44 ;
45 A
46 y
47 f
48 5
49 7
50 b
51 4
52 0
53 3
54 N
55 I
56 T
57 /
58 p
59 w
60 g
61 H
62 “
63 F
62 ”
62 "
29 ’
64 R
24 —
65 8
66 v
67 ?
68 é
69 %
70 :
71 j
72 \
73 {
74 }
75 |
76 U
77 $
78 °
79 *
80 !
81 ]
82 Q
29 ‘
83 Z
84 X
85 [
86 =
87 +
88 §
89 _
90 £
91 &
92 #
93 >
94 <
95 ~
96 €
97 @
98 ¢
99 »
100 «
47,5 fl
47,18 fi
101 ®
102 ©
103 ¥