Commit 32ab5a58 authored by calberti, committed by Martin Wicke

Adding SyntaxNet to tensorflow/models (#63)

parent 148a15fb
autoencoder/MNIST_data/*
*.pyc
[submodule "tensorflow"]
path = syntaxnet/tensorflow
url = https://github.com/tensorflow/tensorflow.git
/bazel-bin
/bazel-genfiles
/bazel-out
/bazel-tensorflow
/bazel-testlogs
/bazel-tf
/bazel-syntaxnet
# SyntaxNet: Neural Models of Syntax.
*A TensorFlow implementation of the models described in
[Andor et al. (2016)](http://arxiv.org/pdf/1603.06042v1.pdf).*
At Google, we spend a lot of time thinking about how computer systems can read
and understand human language in order to process it in intelligent ways. We are
excited to share the fruits of our research with the broader community by
releasing SyntaxNet, an open-source neural network framework for
[TensorFlow](http://www.tensorflow.org) that provides a foundation for Natural Language
Understanding (NLU) systems. Our release includes all the code needed to train
new SyntaxNet models on your own data, as well as *Parsey McParseface*, an
English parser that we have trained for you, and that you can use to analyze
English text.
So, how accurate is Parsey McParseface? For this release, we tried to balance a
model that runs fast enough to be useful on a single machine (e.g. ~600
words/second on a modern desktop) with one that is also the most accurate parser
available. Here's how Parsey McParseface compares to the academic literature on
several different English domains (all numbers are % correct head assignments in
the tree, i.e. unlabeled attachment score):
Model | News | Web | Questions
--------------------------------------------------------------------------------------------------------------- | :---: | :---: | :-------:
[Martins et al. (2013)](http://www.cs.cmu.edu/~ark/TurboParser/) | 93.10 | 88.23 | 94.21
[Zhang and McDonald (2014)](http://research.google.com/pubs/archive/38148.pdf) | 93.32 | 88.65 | 93.37
[Weiss et al. (2015)](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43800.pdf) | 93.91 | 89.29 | 94.17
[Andor et al. (2016)](http://arxiv.org/pdf/1603.06042v1.pdf)* | 94.44 | 90.17 | 95.40
Parsey McParseface | 94.15 | 89.08 | 94.77
We see that Parsey McParseface is state-of-the-art; more importantly, with
SyntaxNet you can train larger networks with more hidden units and bigger beam
sizes if you want to push the accuracy even further:
[Andor et al. (2016)](http://arxiv.org/pdf/1603.06042v1.pdf)* is simply a
SyntaxNet model with a larger beam and network. For further information on the
datasets, see that paper under the section "Treebank Union".
Parsey McParseface is also state-of-the-art for part-of-speech (POS) tagging
(numbers below are per-token accuracy):
Model | News | Web | Questions
-------------------------------------------------------------------------- | :---: | :---: | :-------:
[Ling et al. (2015)](http://www.cs.cmu.edu/~lingwang/papers/emnlp2015.pdf) | 97.78 | 94.03 | 96.18
[Andor et al. (2016)](http://arxiv.org/pdf/1603.06042v1.pdf)* | 97.77 | 94.80 | 96.86
Parsey McParseface | 97.52 | 94.24 | 96.45
The first part of this tutorial describes how to install the necessary tools and
use the already trained models provided in this release. In the second part of
the tutorial we provide more background about the models, as well as
instructions for training models on other datasets.
## Contents
* [Installation](#installation)
* [Getting Started](#getting-started)
  * [Parsing from Standard Input](#parsing-from-standard-input)
  * [Annotating a Corpus](#annotating-a-corpus)
  * [Configuring the Python Scripts](#configuring-the-python-scripts)
  * [Next Steps](#next-steps)
* [Detailed Tutorial: Building an NLP Pipeline with SyntaxNet](#detailed-tutorial-building-an-nlp-pipeline-with-syntaxnet)
  * [Obtaining Data](#obtaining-data)
  * [Part-of-Speech Tagging](#part-of-speech-tagging)
    * [Training the SyntaxNet POS Tagger](#training-the-syntaxnet-pos-tagger)
    * [Preprocessing with the Tagger](#preprocessing-with-the-tagger)
  * [Dependency Parsing: Transition-Based Parsing](#dependency-parsing-transition-based-parsing)
    * [Training a Parser Step 1: Local Pretraining](#training-a-parser-step-1-local-pretraining)
    * [Training a Parser Step 2: Global Training](#training-a-parser-step-2-global-training)
* [Contact](#contact)
* [Credits](#credits)
## Installation
Running and training SyntaxNet models requires building this package from
source. You'll need to install:
* bazel:
  * follow the instructions [here](http://bazel.io/docs/install.html)
  * **Note: You must use bazel version 0.2.2, NOT 0.2.2b, due to a WORKSPACE
    issue**
* swig:
  * `apt-get install swig` on Ubuntu
  * `brew install swig` on OSX
* protocol buffers, with a version supported by TensorFlow:
  * check your protobuf version with `pip freeze | grep protobuf`
  * upgrade to a supported version with `pip install -U protobuf==3.0.0b2`
* asciitree, to draw parse trees on the console for the demo:
  * `pip install asciitree`
Once you have completed the above steps, you can build and test SyntaxNet with the
following commands:
```shell
git clone --recursive https://github.com/tensorflow/models.git
cd models/syntaxnet/tensorflow
./configure
cd ..
bazel test syntaxnet/... util/utf8/...
# On Mac, run the following:
bazel test --linkopt=-headerpad_max_install_names \
  syntaxnet/... util/utf8/...
```
Bazel should finish by reporting that all tests have passed.
## Getting Started
Once you have successfully built SyntaxNet, you can start parsing text right
away with Parsey McParseface, located under `syntaxnet/models`. The easiest
thing is to use or modify the included script `syntaxnet/demo.sh`, which shows a
basic setup to parse English taking plain text as input.
### Parsing from Standard Input
Simply pass one sentence per line of text into the script at
`syntaxnet/demo.sh`. The script will break the text into words, run the POS
tagger, run the parser, and then generate an ASCII version of the parse tree:
```shell
echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh
Input: Bob brought the pizza to Alice .
Parse:
brought VBD ROOT
 +-- Bob NNP nsubj
 +-- pizza NN dobj
 |   +-- the DT det
 +-- to IN prep
 |   +-- Alice NNP pobj
 +-- . . punct
```
The ASCII tree shows the text organized as in the parse, not left-to-right as
visualized in our tutorial graphs. In this example, we see that the verb
"brought" is the root of the sentence, with the subject "Bob", the object
"pizza", and the prepositional phrase "to Alice".
If you want to feed in tokenized, CONLL-formatted text, you can run `demo.sh
--conll`.
### Annotating a Corpus
To change the pipeline to read and write to specific files (as opposed to piping
through stdin and stdout), we have to modify the `demo.sh` to point to the files
we want. The SyntaxNet models are configured via a combination of run-time flags
(which are easy to change) and a text format `TaskSpec` protocol buffer. The
spec file used in the demo is in `syntaxnet/models/treebank_union/context`.
To use corpora instead of stdin/stdout, we have to:
1. Create or modify an `input` field inside the `TaskSpec`, with the
   `file_pattern` specifying the location we want. If the input corpus is in
   CONLL format, make sure to put `record_format: 'conll-sentence'`.
1. Change the `--input` and/or `--output` flag to use the name of the resource
   as the output, instead of `stdin` and `stdout`.
E.g., if we wanted to POS tag the CONLL corpus `./wsj.conll`, we would create
two entries, one for the input and one for the output:
```proto
input {
  name: 'wsj-data'
  record_format: 'conll-sentence'
  Part {
    file_pattern: './wsj.conll'
  }
}
input {
  name: 'wsj-data-tagged'
  record_format: 'conll-sentence'
  Part {
    file_pattern: './wsj-tagged.conll'
  }
}
```
Then we can use `--input=wsj-data --output=wsj-data-tagged` on the command line
to specify reading and writing to these files.
### Configuring the Python Scripts
As mentioned above, the Python scripts are configured in two ways:
1. **Run-time flags** are used to point to the `TaskSpec` file, switch between
   inputs for reading and writing, and set various run-time model parameters.
   At training time, these flags are used to set the learning rate, hidden
   layer sizes, and other key parameters.
1. The **`TaskSpec` proto** stores configuration about the transition system,
   the features, and a set of named static resources required by the parser. It
   is specified via the `--task_context` flag. A few key notes to remember:
   - The `Parameter` settings in the `TaskSpec` have a prefix: either
     `brain_pos` (they apply to the tagger) or `brain_parser` (they apply to
     the parser). The `--arg_prefix` run-time flag switches between reading
     from the two configurations.
   - The resources will be created and/or modified during multiple stages of
     training. As described above, the resources can also be used at
     evaluation time to read or write to specific files. These resources are
     also separate from the model parameters, which are saved separately via
     calls to TensorFlow ops, and loaded via the `--model_path` flag.
   - Because the `TaskSpec` contains file paths, remember that copying this
     file around is not enough to relocate a trained model: you need to move
     the files and update all the paths as well.
Note that some run-time flags need to be consistent between training and testing
(e.g. the number of hidden units).
### Next Steps
There are many ways to extend this framework, e.g. adding new features, changing
the model structure, training on other languages, etc. We suggest reading the
detailed tutorial below to get a handle on the rest of the framework.
## Detailed Tutorial: Building an NLP Pipeline with SyntaxNet
In this tutorial, we'll go over how to train new models, and explain in a bit
more technical detail the NLP side of the models. Our goal here is to explain
the NLP pipeline produced by this package.
### Obtaining Data
The included English parser, Parsey McParseface, was trained on the standard
corpora of the [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) and
[OntoNotes](https://catalog.ldc.upenn.edu/LDC2013T19), as well as the [English
Web Treebank](https://catalog.ldc.upenn.edu/LDC2012T13), but these are
unfortunately not freely available.
However, the [Universal Dependencies](http://universaldependencies.org/) project
provides freely available treebank data in a number of languages. SyntaxNet can
be trained and evaluated on any of these corpora.
### Part-of-Speech Tagging
Consider the following sentence, which exhibits several ambiguities that affect
its interpretation:
> I saw the man with glasses.
This sentence is composed of words: strings of characters that are segmented
into groups (e.g. "I", "saw", etc.). Each word in the sentence has a *grammatical
function* that can be useful for understanding the meaning of language. For
example, "saw" in this example is a past tense of the verb "to see". But any
given word might have different meanings in different contexts: "saw" could just
as well be a noun (e.g., a saw used for cutting) or a present tense verb (using
a saw to cut something).
A logical first step in understanding language is figuring out these roles for
each word in the sentence. This process is called *Part-of-Speech (POS)
Tagging*. The roles are called POS tags. Although a given word might have
multiple possible tags depending on the context, given any one interpretation of
a sentence each word will generally only have one tag.
One interesting challenge of POS tagging is that the problem of defining a
vocabulary of POS tags for a given language is quite involved. While the concept
of nouns and verbs is pretty common, it has been traditionally difficult to
agree on a standard set of roles across all languages. The [Universal
Dependencies](http://www.universaldependencies.org) project aims to solve this
problem.
### Training the SyntaxNet POS Tagger
In general, determining the correct POS tag requires understanding the entire
sentence and the context in which it is uttered. In practice, we can do very
well just by considering a small window of words around the word of interest.
For example, words that follow the word ‘the’ tend to be adjectives or nouns,
rather than verbs.
To predict POS tags, we use a simple setup. We process the sentences
left-to-right. For any given word, we extract features of that word and a window
around it, and use these as inputs to a feed-forward neural network classifier,
which predicts a probability distribution over POS tags. Because we make
decisions in left-to-right order, we also use prior decisions as features in
subsequent ones (e.g. "the previous predicted tag was a noun").
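This window-and-history setup can be sketched in a few lines of Python. This is a toy illustration only: the feature names, window size, and padding token below are ours, not the package's actual feature extraction.

```python
def window_features(words, i, prev_tags, window=1):
    """Collect simple features for tagging words[i], given the tags
    already predicted for words[0..i-1] (left-to-right processing)."""
    feats = {}
    # Words in a small window around the word of interest.
    for offset in range(-window, window + 1):
        j = i + offset
        word = words[j] if 0 <= j < len(words) else "<PAD>"
        feats["word(%+d)" % offset] = word.lower()
    # Prior decisions become features for subsequent ones.
    feats["prev_tag"] = prev_tags[i - 1] if i > 0 else "<START>"
    feats["suffix(2)"] = words[i][-2:].lower()
    return feats

# Tagging "the" in "I saw the man", having already tagged "I" and "saw":
feats = window_features(["I", "saw", "the", "man"], 2, ["PRP", "VBD"])
# feats["prev_tag"] == "VBD", feats["word(+1)"] == "man"
```

In the real model, features like these index into embedding matrices and feed a feed-forward classifier over POS tags.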
All the models in this package use a flexible markup language to define
features. For example, the features in the POS tagger are found in the
`brain_pos_features` parameter in the `TaskSpec`, and look like this (modulo
spacing):
```
stack(3).word stack(2).word stack(1).word stack.word input.word input(1).word input(2).word input(3).word;
input.digit input.hyphen;
stack.suffix(length=2) input.suffix(length=2) input(1).suffix(length=2);
stack.prefix(length=2) input.prefix(length=2) input(1).prefix(length=2)
```
Note that `stack` here means "words we have already tagged." Thus, this feature
spec uses three types of features: words, suffixes, and prefixes. The features
are grouped into blocks that share an embedding matrix, concatenated together,
and fed into a chain of hidden layers. This structure is based upon the model
proposed by [Chen and Manning (2014)](http://cs.stanford.edu/people/danqi/papers/emnlp2014.pdf).
We show this layout in the schematic below: the state of the system (a stack and
a buffer, visualized below for both the POS and the dependency parsing task) is
used to extract sparse features, which are fed into the network in groups. We
show only a small subset of the features to simplify the presentation in the
schematic:
![Schematic](ff_nn_schematic.png "Feed-forward Network Structure")
In the configuration above, each block gets its own embedding matrix, and the
blocks are delineated with a semicolon. The
dimensions of each block are controlled in the `brain_pos_embedding_dims`
parameter. **Important note:** unlike many simple NLP models, this is *not* a
bag of words model. Remember that although certain features share embedding
matrices, the above features will be concatenated, so the interpretation of
`input.word` will be quite different from `input(1).word`. This also means that
adding features increases the dimension of the `concat` layer of the model as
well as the number of parameters for the first hidden layer.
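The difference between *sharing an embedding matrix* and *being interchangeable* can be seen in a toy sketch; the embedding values and the two-dimensional embedding size here are invented for illustration:

```python
# Toy illustration: features in one block share an embedding matrix,
# but each feature occupies its own slice of the concatenated input,
# so input.word and input(1).word are not interchangeable.
word_embeddings = {          # one shared matrix for the "words" block
    "the": [0.1, 0.2],
    "man": [0.3, 0.4],
}

def embed_block(features):
    """Look each feature up in the shared matrix and concatenate."""
    concat = []
    for f in features:
        concat.extend(word_embeddings.get(f, [0.0, 0.0]))
    return concat

# input.word = "the", input(1).word = "man", versus the reverse order:
a = embed_block(["the", "man"])   # [0.1, 0.2, 0.3, 0.4]
b = embed_block(["man", "the"])   # [0.3, 0.4, 0.1, 0.2]
assert a != b                     # same features, different network input

# Adding a feature grows the concat layer (and the first hidden layer):
assert len(embed_block(["the", "man", "the"])) == 6
```

This is why the model is not a bag of words: position in the concatenation carries information even when the embedding matrix is shared.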
To train the model, first edit `syntaxnet/context.pbtxt` so that the inputs
`training-corpus`, `tuning-corpus`, and `dev-corpus` point to the location of
your training data. You can then train a part-of-speech tagger with:
```shell
# --arg_prefix=brain_pos reads from the POS configuration;
# --compute_lexicon is required for the first stage of the pipeline;
# --graph_builder=greedy disables beam search;
# --training_corpus/--tuning_corpus name the training and tuning sets;
# --output_path says where to save new resources;
# --batch_size through --seed are hyper-parameters;
# --params is a name for this hyper-parameter combination.
bazel-bin/syntaxnet/parser_trainer \
  --task_context=syntaxnet/context.pbtxt \
  --arg_prefix=brain_pos \
  --compute_lexicon \
  --graph_builder=greedy \
  --training_corpus=training-corpus \
  --tuning_corpus=tuning-corpus \
  --output_path=models \
  --batch_size=32 \
  --decay_steps=3600 \
  --hidden_layer_sizes=128 \
  --learning_rate=0.08 \
  --momentum=0.9 \
  --seed=0 \
  --params=128-0.08-3600-0.9-0
```
This will read in the data, construct a lexicon, build a TensorFlow graph for
the model with the specific hyperparameters, and train the model. Every so often
the model will be evaluated on the tuning set, and only the checkpoint with the
highest accuracy on this set will be saved. **Note that you should never use a
corpus you intend to test your model on as your tuning set, as you will inflate
your test set results.**
For best results, you should repeat this command with at least 3 different
seeds, and possibly with a few different values for `--learning_rate` and
`--decay_steps`. Good values for `--learning_rate` are usually close to 0.1, and
you usually want `--decay_steps` to correspond to about one tenth of your
corpus. The `--params` flag is only a human-readable identifier for the model
being trained, used to construct the full output path, so that you don't need to
worry about clobbering old models by accident.
The `--arg_prefix` flag controls which parameters should be read from the task
context file `context.pbtxt`. In this case `arg_prefix` is set to `brain_pos`,
so the parameters being used in this training run are
`brain_pos_transition_system`, `brain_pos_embedding_dims`, `brain_pos_features`,
and `brain_pos_embedding_names`. To train the dependency parser later,
`arg_prefix` will be set to `brain_parser`.
### Preprocessing with the Tagger
Now that we have a trained POS tagging model, we want to use the output of this
model as features in the parser. Thus the next step is to run the trained model
over our training, tuning, and dev (evaluation) sets. We can use the
`parser_eval.py` script for this.
For example, the model `128-0.08-3600-0.9-0` trained above can be run over the
training, tuning, and dev sets with the following command:
```shell
PARAMS=128-0.08-3600-0.9-0
for SET in training tuning dev; do
  bazel-bin/syntaxnet/parser_eval \
    --task_context=models/brain_pos/greedy/$PARAMS/context \
    --hidden_layer_sizes=128 \
    --input=$SET-corpus \
    --output=tagged-$SET-corpus \
    --arg_prefix=brain_pos \
    --graph_builder=greedy \
    --model_path=models/brain_pos/greedy/$PARAMS/model
done
```
**Important note:** This command only works because we have created entries for
you in `context.pbtxt` that correspond to `tagged-training-corpus`,
`tagged-dev-corpus`, and `tagged-tuning-corpus`. With these default settings,
the above will write tagged versions of the training, tuning, and dev set to the
directory `models/brain_pos/greedy/$PARAMS/`. This location is chosen because
the `input` entries do not have `file_pattern` set: instead, they have `creator:
brain_pos/greedy`, which means that `parser_trainer.py` will construct *new*
files when called with `--arg_prefix=brain_pos --graph_builder=greedy` using the
`--model_path` flag to determine the location.
For convenience, `parser_eval.py` also logs POS tagging accuracy after the
output tagged datasets have been written.
### Dependency Parsing: Transition-Based Parsing
Now that we have a prediction for the grammatical role of the words, we want to
understand how the words in the sentence relate to each other. This parser is
built around the *head-modifier* construction: for each word, we choose a
*syntactic head* that it modifies according to some grammatical role.
An example for the above sentence is as follows:
![Figure](sawman.png)
Below each word in the sentence we see both a fine-grained part-of-speech
(*PRP*, *VBD*, *DT*, *NN* etc.), and a coarse-grained part-of-speech (*PRON*,
*VERB*, *DET*, *NOUN*, etc.). Coarse-grained POS tags encode basic grammatical
categories, while the fine-grained POS tags make further distinctions: for
example *NN* is a singular noun (as opposed, for example, to *NNS*, which is a
plural noun), and *VBD* is a past-tense verb. For more discussion see [Petrov et
al. (2012)](http://www.lrec-conf.org/proceedings/lrec2012/pdf/274_Paper.pdf).
Crucially, we also see directed arcs signifying grammatical relationships
between different words in the sentence. For example *I* is the subject of
*saw*, as signified by the directed arc labeled *nsubj* between these words;
*man* is the direct object (dobj) of *saw*; the preposition *with* modifies
*man* with a prep relation, signifying modification by a prepositional phrase;
and so on. In addition, the verb *saw* is identified as the *root* of the entire
sentence.
Whenever we have a directed arc between two words, we refer to the word at the
start of the arc as the *head*, and the word at the end of the arc as the
*modifier*. For example we have one arc where the head is *saw* and the modifier
is *I*, another where the head is *saw* and the modifier is *man*, and so on.
The grammatical relationships encoded in dependency structures are directly
related to the underlying meaning of the sentence in question. They allow us to
easily recover the answers to various questions, for example *whom did I see?*,
*who saw the man with glasses?*, and so on.
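As a small illustration of reading answers off the arcs, the parse below is hand-coded to match the figure above; in practice it would come from the parser's output:

```python
# Hand-coded parse of "I saw the man with glasses ." in the style of
# the figure: each token has a head index (-1 = root) and an arc label.
tokens = ["I", "saw", "the", "man", "with", "glasses", "."]
heads  = [  1,    -1,     3,     1,      3,         4,     1]
labels = ["nsubj", "root", "det", "dobj", "prep", "pobj", "punct"]

def modifier_of(head_word, label):
    """Find the word attached to head_word with the given relation."""
    for i, (h, l) in enumerate(zip(heads, labels)):
        if h != -1 and tokens[h] == head_word and l == label:
            return tokens[i]
    return None

# "Whom did I see?" -> the direct object of "saw":
print(modifier_of("saw", "dobj"))   # man
# "Who saw the man with glasses?" -> the subject of "saw":
print(modifier_of("saw", "nsubj"))  # I
```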
SyntaxNet is a **transition-based** dependency parser
[Nivre (2007)](http://www.mitpressjournals.org/doi/pdfplus/10.1162/coli.07-056-R1-07-027) that
constructs a parse incrementally. Like the tagger, it processes words
left-to-right. The words all start as unprocessed input, called the *buffer*. As
words are encountered they are put onto a *stack*. At each step, the parser can
do one of three things:
1. **SHIFT:** Push another word onto the top of the stack, i.e. shifting one
   token from the buffer to the stack.
1. **LEFT_ARC:** Pop the top two words from the stack. Attach the second to the
   first, creating an arc pointing to the **left**. Push the **first** word
   back on the stack.
1. **RIGHT_ARC:** Pop the top two words from the stack. Attach the first to the
   second, creating an arc pointing to the **right**. Push the **second** word
   back on the stack.
At each step, we call the combination of the stack and the buffer the
*configuration* of the parser. For the left and right actions, we also assign a
dependency relation label to that arc. This process is visualized in the
following animation for a short sentence:
![Animation](looping-parser.gif "Parsing in Action")
Note that this parser is following a sequence of actions, called a
**derivation**, to produce a "gold" tree labeled by a linguist. We can use this
sequence of decisions to learn a classifier that takes a configuration and
predicts the next action to take.
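The transition system can be sketched as a minimal arc-standard loop. This is a simplified Python illustration of the scheme, not the package's actual C++ implementation; dependency labels are omitted:

```python
def parse(n_tokens, actions):
    """Run SHIFT / LEFT_ARC / RIGHT_ARC over token indices 0..n-1.
    Returns head[i] for each token (-1 for the root)."""
    buffer = list(range(n_tokens))      # unprocessed input
    stack, heads = [], [-1] * n_tokens
    for action in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT_ARC":
            first, second = stack.pop(), stack.pop()
            heads[second] = first       # second modifies first
            stack.append(first)
        elif action == "RIGHT_ARC":
            first, second = stack.pop(), stack.pop()
            heads[first] = second       # first modifies second
            stack.append(second)
    return heads

# A derivation for "I saw the man":
# I <-nsubj- saw, the <-det- man, man <-dobj- saw, saw = root.
gold = ["SHIFT", "SHIFT", "LEFT_ARC", "SHIFT", "SHIFT",
        "LEFT_ARC", "RIGHT_ARC"]
print(parse(4, gold))  # [1, -1, 3, 1]
```

A classifier trained on gold derivations predicts the next action from features of the current configuration, exactly as the tagger predicts tags.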
### Training a Parser Step 1: Local Pretraining
As described in our [paper](http://arxiv.org/pdf/1603.06042v1.pdf), the first
step in training the model is to *pre-train* using *local* decisions. In this
phase, we use the gold dependency tree to guide the parser, and train a softmax layer
to predict the correct action given these gold dependencies. This can be
performed very efficiently, since the parser's decisions are all independent in
this setting.
Once the tagged datasets are available, a locally normalized dependency parsing
model can be trained with the following command:
```shell
bazel-bin/syntaxnet/parser_trainer \
  --arg_prefix=brain_parser \
  --batch_size=32 \
  --projectivize_training_set \
  --decay_steps=4400 \
  --graph_builder=greedy \
  --hidden_layer_sizes=200,200 \
  --learning_rate=0.08 \
  --momentum=0.85 \
  --output_path=models \
  --task_context=models/brain_pos/greedy/$PARAMS/context \
  --seed=4 \
  --training_corpus=tagged-training-corpus \
  --tuning_corpus=tagged-tuning-corpus \
  --params=200x200-0.08-4400-0.85-4
```
Note that we point the trainer to the context corresponding to the POS tagger
that we picked previously. This allows the parser to reuse the lexicons and the
tagged datasets that were created in the previous steps. Processing data can be
done similarly to how tagging was done above. For example, if we picked the
parameters `200x200-0.08-4400-0.85-4`, the training, tuning and dev sets
can be parsed with the following command:
```shell
PARAMS=200x200-0.08-4400-0.85-4
for SET in training tuning dev; do
  bazel-bin/syntaxnet/parser_eval \
    --task_context=models/brain_parser/greedy/$PARAMS/context \
    --hidden_layer_sizes=200,200 \
    --input=tagged-$SET-corpus \
    --output=parsed-$SET-corpus \
    --arg_prefix=brain_parser \
    --graph_builder=greedy \
    --model_path=models/brain_parser/greedy/$PARAMS/model
done
```
### Training a Parser Step 2: Global Training
As we describe in the paper, there are several problems with the locally
normalized models we just trained. The most important is the *label-bias*
problem: the model doesn't learn what a good parse looks like, only what action
to take given a history of gold decisions. This is because the scores are
normalized *locally* using a softmax for each decision.
In the paper, we show how we can achieve much better results using a *globally*
normalized model: in this model, the softmax scores are summed in log space, and
the scores are not normalized until we reach a final decision. When the parser
stops, the scores of each hypothesis are normalized against a small set of
possible parses (in the case of this model, a beam size of 8). When training, we
stop the parser as soon as the gold derivation falls off the beam (a strategy
known as early updates).
We give a simplified view of how this training works for a [garden path
sentence](https://en.wikipedia.org/wiki/Garden_path_sentence), where it is
important to maintain multiple hypotheses. A single mistake early on in parsing
leads to a completely incorrect parse; after training, the model learns to
prefer the second (correct) parse.
![Beam search training](beam_search_training.png)
Parsey McParseface correctly parses this sentence. Even though the correct parse
is initially ranked 4th out of multiple hypotheses, when the end of the garden
path is reached, Parsey McParseface can recover thanks to the beam; using a
larger beam yields a more accurate model, but it is slower (we used beam 32 for
the models in the paper).
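Numerically, the effect of global normalization can be sketched with toy scores; the per-decision scores and the beam of two hypotheses below are invented for illustration:

```python
import math

def log_softmax(scores):
    """Log-normalize a list of raw scores."""
    z = math.log(sum(math.exp(s) for s in scores))
    return [s - z for s in scores]

# Two surviving hypotheses, each a sequence of raw per-decision scores.
hyp_a = [2.0, 0.5, 1.5]  # e.g. the garden-path reading: strong start
hyp_b = [1.0, 1.0, 3.0]  # the correct reading: weak early, strong late

# Globally normalized: sum scores in log space along each hypothesis,
# and normalize only once, over the final beam.
totals = [sum(hyp_a), sum(hyp_b)]               # [4.0, 5.0]
probs = [math.exp(s) for s in log_softmax(totals)]
# hyp_b wins overall even though hyp_a scored higher on the first
# decision: late evidence can overturn an early preference, which a
# greedy, locally normalized model could never revisit.
```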
Once you have the pre-trained locally normalized model, a globally normalized
parsing model can now be trained with the following command:
```shell
bazel-bin/syntaxnet/parser_trainer \
  --arg_prefix=brain_parser \
  --batch_size=8 \
  --decay_steps=100 \
  --graph_builder=structured \
  --hidden_layer_sizes=200,200 \
  --learning_rate=0.02 \
  --momentum=0.9 \
  --output_path=models \
  --task_context=models/brain_parser/greedy/$PARAMS/context \
  --seed=0 \
  --training_corpus=projectivized-training-corpus \
  --tuning_corpus=tagged-tuning-corpus \
  --params=200x200-0.02-100-0.9-0 \
  --pretrained_params=models/brain_parser/greedy/$PARAMS/model \
  --pretrained_params_names=\
embedding_matrix_0,embedding_matrix_1,embedding_matrix_2,\
bias_0,weights_0,bias_1,weights_1
```
Training a beam model with the structured builder will take a lot longer than
the greedy training runs above, perhaps 3 or 4 times longer. Note once again
that multiple restarts of training will yield the most reliable results.
Evaluation can again be done with `parser_eval.py`. In this case we use
parameters `200x200-0.02-100-0.9-0` to evaluate on the training, tuning and dev
sets with the following command:
```shell
PARAMS=200x200-0.02-100-0.9-0
for SET in training tuning dev; do
  bazel-bin/syntaxnet/parser_eval \
    --task_context=models/brain_parser/structured/$PARAMS/context \
    --hidden_layer_sizes=200,200 \
    --input=tagged-$SET-corpus \
    --output=beam-parsed-$SET-corpus \
    --arg_prefix=brain_parser \
    --graph_builder=structured \
    --model_path=models/brain_parser/structured/$PARAMS/model
done
```
Hooray! You now have your very own cousin of Parsey McParseface, ready to go out
and parse text in the wild.
## Contact
To ask questions or report issues please contact syntaxnet-users@google.com.
## Credits
Original authors of the code in this package include (in alphabetical order):
* apresta@google.com (Alessandro Presta)
* bohnetbd@google.com (Bernd Bohnet)
* chrisalberti@google.com (Chris Alberti)
* credo@google.com (Tim Credo)
* danielandor@google.com (Daniel Andor)
* djweiss@google.com (David Weiss)
* epitler@google.com (Emily Pitler)
* gcoppola@google.com (Greg Coppola)
* golding@google.com (Andy Golding)
* istefan@google.com (Stefan Istrate)
* kbhall@google.com (Keith Hall)
* kuzman@google.com (Kuzman Ganchev)
* mjcollins@google.com (Michael Collins)
* ringgaard@google.com (Michael Ringgaard)
* ryanmcd@google.com (Ryan McDonald)
* severyn@google.com (Aliaksei Severyn)
* slav@google.com (Slav Petrov)
* terrykoo@google.com (Terry Koo)
local_repository(
    name = "tf",
    path = __workspace_dir__ + "/tensorflow",
)

load('//tensorflow/tensorflow:workspace.bzl', 'tf_workspace')
tf_workspace("tensorflow/", "@tf")

# Specify the minimum required Bazel version.
load("@tf//tensorflow:tensorflow.bzl", "check_version")
check_version("0.2.0")

# ===== gRPC dependencies =====
bind(
    name = "libssl",
    actual = "@boringssl_git//:ssl",
)

git_repository(
    name = "boringssl_git",
    commit = "436432d849b83ab90f18773e4ae1c7a8f148f48d",
    init_submodules = True,
    remote = "https://github.com/mdsteele/boringssl-bazel.git",
)

bind(
    name = "zlib",
    actual = "@zlib_archive//:zlib",
)

new_http_archive(
    name = "zlib_archive",
    build_file = "zlib.BUILD",
    sha256 = "879d73d8cd4d155f31c1f04838ecd567d34bebda780156f0e82a20721b3973d5",
    strip_prefix = "zlib-1.2.8",
    url = "http://zlib.net/zlib128.zip",
)
# Description:
# A syntactic parser and part-of-speech tagger in TensorFlow.

package(
    default_visibility = ["//visibility:private"],
    features = ["-layering_check"],
)

licenses(["notice"])  # Apache 2.0

load(
    "syntaxnet",
    "tf_proto_library",
    "tf_proto_library_py",
    "tf_gen_op_libs",
    "tf_gen_op_wrapper_py",
)

# proto libraries

tf_proto_library(
    name = "feature_extractor_proto",
    srcs = ["feature_extractor.proto"],
)

tf_proto_library(
    name = "sentence_proto",
    srcs = ["sentence.proto"],
)

tf_proto_library_py(
    name = "sentence_py_pb2",
    srcs = ["sentence.proto"],
)

tf_proto_library(
    name = "dictionary_proto",
    srcs = ["dictionary.proto"],
)

tf_proto_library_py(
    name = "dictionary_py_pb2",
    srcs = ["dictionary.proto"],
)

tf_proto_library(
    name = "kbest_syntax_proto",
    srcs = ["kbest_syntax.proto"],
    deps = [":sentence_proto"],
)

tf_proto_library(
    name = "task_spec_proto",
    srcs = ["task_spec.proto"],
)

tf_proto_library_py(
    name = "task_spec_py_pb2",
    srcs = ["task_spec.proto"],
)

tf_proto_library(
    name = "sparse_proto",
    srcs = ["sparse.proto"],
)

tf_proto_library_py(
    name = "sparse_py_pb2",
    srcs = ["sparse.proto"],
)

# cc libraries for feature extraction and parsing

cc_library(
    name = "base",
    hdrs = ["base.h"],
    visibility = ["//visibility:public"],
    deps = [
        "@re2//:re2",
        "@tf//google/protobuf",
        "@tf//third_party/eigen3",
    ] + select({
        "//conditions:default": [
            "@tf//tensorflow/core:framework",
            "@tf//tensorflow/core:lib",
        ],
        "@tf//tensorflow:darwin": [
            "@tf//tensorflow/core:framework_headers_lib",
        ],
    }),
)

cc_library(
    name = "utils",
    srcs = ["utils.cc"],
    hdrs = [
        "utils.h",
    ],
    deps = [
        ":base",
        "//util/utf8:unicodetext",
    ],
)

cc_library(
    name = "test_main",
    testonly = 1,
    srcs = ["test_main.cc"],
    linkopts = ["-lm"],
    deps = [
        "@tf//tensorflow/core:lib",
        "@tf//tensorflow/core:testlib",
        "//external:gtest",
    ],
)

cc_library(
    name = "document_format",
    srcs = ["document_format.cc"],
    hdrs = ["document_format.h"],
    deps = [
        ":registry",
        ":sentence_proto",
        ":task_context",
    ],
)

cc_library(
    name = "text_formats",
    srcs = ["text_formats.cc"],
    deps = [
        ":document_format",
    ],
    alwayslink = 1,
)

cc_library(
    name = "fml_parser",
    srcs = ["fml_parser.cc"],
    hdrs = ["fml_parser.h"],
    deps = [
        ":feature_extractor_proto",
        ":utils",
    ],
)

cc_library(
    name = "proto_io",
    hdrs = ["proto_io.h"],
    deps = [
        ":feature_extractor_proto",
        ":fml_parser",
        ":kbest_syntax_proto",
        ":sentence_proto",
        ":task_context",
    ],
)

cc_library(
    name = "feature_extractor",
    srcs = ["feature_extractor.cc"],
    hdrs = [
        "feature_extractor.h",
        "feature_types.h",
    ],
    deps = [
        ":document_format",
        ":feature_extractor_proto",
        ":kbest_syntax_proto",
        ":proto_io",
        ":sentence_proto",
        ":task_context",
        ":utils",
        ":workspace",
    ],
)

cc_library(
    name = "affix",
    srcs = ["affix.cc"],
    hdrs = ["affix.h"],
    deps = [
        ":dictionary_proto",
        ":feature_extractor",
        ":shared_store",
        ":term_frequency_map",
        ":utils",
        ":workspace",
    ],
)

cc_library(
    name = "sentence_features",
    srcs = ["sentence_features.cc"],
    hdrs = ["sentence_features.h"],
    deps = [
        ":affix",
        ":feature_extractor",
        ":registry",
    ],
)

cc_library(
    name = "shared_store",
    srcs = ["shared_store.cc"],
    hdrs = ["shared_store.h"],
    deps = [
        ":utils",
    ],
)

cc_library(
    name = "registry",
    srcs = ["registry.cc"],
    hdrs = ["registry.h"],
    deps = [
        ":utils",
    ],
)

cc_library(
    name = "workspace",
    srcs = ["workspace.cc"],
    hdrs = ["workspace.h"],
    deps = [
        ":utils",
    ],
)

cc_library(
    name = "task_context",
    srcs = ["task_context.cc"],
    hdrs = ["task_context.h"],
    deps = [
        ":task_spec_proto",
        ":utils",
    ],
)

cc_library(
    name = "term_frequency_map",
    srcs = ["term_frequency_map.cc"],
    hdrs = ["term_frequency_map.h"],
    visibility = ["//visibility:public"],
    deps = [
        ":utils",
    ],
    alwayslink = 1,
)

cc_library(
    name = "parser_transitions",
    srcs = [
        "arc_standard_transitions.cc",
        "parser_state.cc",
        "parser_transitions.cc",
        "tagger_transitions.cc",
    ],
    hdrs = [
        "parser_state.h",
        "parser_transitions.h",
    ],
    deps = [
        ":kbest_syntax_proto",
        ":registry",
        ":shared_store",
        ":task_context",
        ":term_frequency_map",
    ],
    alwayslink = 1,
)

cc_library(
    name = "populate_test_inputs",
    testonly = 1,
    srcs = ["populate_test_inputs.cc"],
    hdrs = ["populate_test_inputs.h"],
    deps = [
        ":dictionary_proto",
        ":sentence_proto",
        ":task_context",
        ":term_frequency_map",
        ":test_main",
    ],
)

cc_library(
    name = "parser_features",
    srcs = ["parser_features.cc"],
    hdrs = ["parser_features.h"],
    deps = [
        ":affix",
        ":feature_extractor",
        ":parser_transitions",
        ":registry",
        ":sentence_features",
        ":sentence_proto",
        ":task_context",
        ":term_frequency_map",
        ":workspace",
    ],
    alwayslink = 1,
)

cc_library(
    name = "embedding_feature_extractor",
    srcs = ["embedding_feature_extractor.cc"],
    hdrs = ["embedding_feature_extractor.h"],
    deps = [
        ":feature_extractor",
        ":parser_features",
        ":parser_transitions",
        ":sparse_proto",
        ":task_context",
        ":workspace",
    ],
)

cc_library(
    name = "sentence_batch",
    srcs = ["sentence_batch.cc"],
    hdrs = ["sentence_batch.h"],
    deps = [
        ":embedding_feature_extractor",
        ":feature_extractor",
        ":parser_features",
        ":parser_transitions",
        ":sparse_proto",
        ":task_context",
        ":task_spec_proto",
        ":term_frequency_map",
        ":workspace",
    ],
)

cc_library(
    name = "reader_ops",
    srcs = [
        "beam_reader_ops.cc",
        "reader_ops.cc",
    ],
    deps = [
        ":parser_features",
        ":parser_transitions",
        ":sentence_batch",
        ":sentence_proto",
        ":task_context",
        ":task_spec_proto",
alwayslink = 1,
)
cc_library(
name = "document_filters",
srcs = ["document_filters.cc"],
deps = [
":document_format",
":parser_features",
":parser_transitions",
":sentence_batch",
":sentence_proto",
":task_context",
":task_spec_proto",
":text_formats",
],
alwayslink = 1,
)
cc_library(
name = "lexicon_builder",
srcs = ["lexicon_builder.cc"],
deps = [
":document_format",
":parser_features",
":parser_transitions",
":sentence_batch",
":sentence_proto",
":task_context",
":task_spec_proto",
":text_formats",
],
alwayslink = 1,
)
cc_library(
name = "unpack_sparse_features",
srcs = ["unpack_sparse_features.cc"],
deps = [
":sparse_proto",
":utils",
],
)
cc_library(
name = "parser_ops_cc",
srcs = ["ops/parser_ops.cc"],
deps = [
":base",
":document_filters",
":lexicon_builder",
":reader_ops",
":unpack_sparse_features",
],
alwayslink = 1,
)
cc_binary(
name = "parser_ops.so",
linkopts = select({
"//conditions:default": ["-lm"],
"@tf//tensorflow:darwin": [],
}),
linkshared = 1,
linkstatic = 1,
deps = [
":parser_ops_cc",
],
)
# cc tests
filegroup(
name = "testdata",
srcs = [
"testdata/context.pbtxt",
"testdata/document",
"testdata/mini-training-set",
],
)
cc_test(
name = "shared_store_test",
size = "small",
srcs = ["shared_store_test.cc"],
deps = [
":shared_store",
":test_main",
],
)
cc_test(
name = "sentence_features_test",
size = "medium",
srcs = ["sentence_features_test.cc"],
deps = [
":feature_extractor",
":populate_test_inputs",
":sentence_features",
":sentence_proto",
":task_context",
":task_spec_proto",
":term_frequency_map",
":test_main",
":workspace",
],
)
cc_test(
name = "arc_standard_transitions_test",
size = "small",
srcs = ["arc_standard_transitions_test.cc"],
data = [":testdata"],
deps = [
":parser_transitions",
":populate_test_inputs",
":test_main",
],
)
cc_test(
name = "tagger_transitions_test",
size = "small",
srcs = ["tagger_transitions_test.cc"],
data = [":testdata"],
deps = [
":parser_transitions",
":populate_test_inputs",
":test_main",
],
)
cc_test(
name = "parser_features_test",
size = "small",
srcs = ["parser_features_test.cc"],
deps = [
":feature_extractor",
":parser_features",
":parser_transitions",
":populate_test_inputs",
":sentence_proto",
":task_context",
":task_spec_proto",
":term_frequency_map",
":test_main",
":workspace",
],
)
# py graph builder and trainer
tf_gen_op_libs(
op_lib_names = ["parser_ops"],
)
tf_gen_op_wrapper_py(
name = "parser_ops",
deps = [":parser_ops_op_lib"],
)
py_library(
name = "load_parser_ops_py",
srcs = ["load_parser_ops.py"],
data = [":parser_ops.so"],
)
py_library(
name = "graph_builder",
srcs = ["graph_builder.py"],
deps = [
"@tf//tensorflow:tensorflow_py",
"@tf//tensorflow/core:protos_all_py",
":load_parser_ops_py",
":parser_ops",
],
)
py_library(
name = "structured_graph_builder",
srcs = ["structured_graph_builder.py"],
deps = [
":graph_builder",
],
)
py_binary(
name = "parser_trainer",
srcs = ["parser_trainer.py"],
deps = [
":graph_builder",
":structured_graph_builder",
":task_spec_py_pb2",
],
)
py_binary(
name = "parser_eval",
srcs = ["parser_eval.py"],
deps = [
":graph_builder",
":sentence_py_pb2",
":structured_graph_builder",
],
)
py_binary(
name = "conll2tree",
srcs = ["conll2tree.py"],
deps = [
":graph_builder",
":sentence_py_pb2",
],
)
# py tests
py_test(
name = "lexicon_builder_test",
size = "small",
srcs = ["lexicon_builder_test.py"],
deps = [
":graph_builder",
":sentence_py_pb2",
":task_spec_py_pb2",
],
)
py_test(
name = "text_formats_test",
size = "small",
srcs = ["text_formats_test.py"],
deps = [
":graph_builder",
":sentence_py_pb2",
":task_spec_py_pb2",
],
)
py_test(
name = "reader_ops_test",
size = "medium",
srcs = ["reader_ops_test.py"],
data = [":testdata"],
tags = ["notsan"],
deps = [
":dictionary_py_pb2",
":graph_builder",
":sparse_py_pb2",
],
)
py_test(
name = "beam_reader_ops_test",
size = "medium",
srcs = ["beam_reader_ops_test.py"],
data = [":testdata"],
tags = ["notsan"],
deps = [
":structured_graph_builder",
],
)
py_test(
name = "graph_builder_test",
size = "medium",
srcs = ["graph_builder_test.py"],
data = [
":testdata",
],
tags = ["notsan"],
deps = [
":graph_builder",
":sparse_py_pb2",
],
)
sh_test(
name = "parser_trainer_test",
size = "medium",
srcs = ["parser_trainer_test.sh"],
data = [
":parser_eval",
":parser_trainer",
":testdata",
],
tags = ["notsan"],
)
/* Copyright 2016 Google Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "syntaxnet/affix.h"
#include <ctype.h>
#include <string.h>
#include <functional>
#include <string>
#include "syntaxnet/shared_store.h"
#include "syntaxnet/task_context.h"
#include "syntaxnet/term_frequency_map.h"
#include "syntaxnet/utils.h"
#include "syntaxnet/workspace.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/platform/regexp.h"
#include "util/utf8/unicodetext.h"
namespace syntaxnet {
// Initial number of buckets in term and affix hash maps. This must be a power
// of two.
static const int kInitialBuckets = 1024;
// Fill factor for term and affix hash maps.
static const int kFillFactor = 2;
int TermHash(const string &term) {
return utils::Hash32(term.data(), term.size(), 0xDECAF);
}
// Copies a substring of a Unicode text to a string.
static void UnicodeSubstring(UnicodeText::const_iterator start,
UnicodeText::const_iterator end, string *result) {
result->clear();
result->append(start.utf8_data(), end.utf8_data() - start.utf8_data());
}
AffixTable::AffixTable(Type type, int max_length) {
type_ = type;
max_length_ = max_length;
Resize(0);
}
AffixTable::~AffixTable() { Reset(0); }
void AffixTable::Reset(int max_length) {
// Save new maximum affix length.
max_length_ = max_length;
// Delete all data.
for (size_t i = 0; i < affixes_.size(); ++i) delete affixes_[i];
affixes_.clear();
buckets_.clear();
Resize(0);
}
void AffixTable::Read(const AffixTableEntry &table_entry) {
CHECK_EQ(table_entry.type(), type_ == PREFIX ? "PREFIX" : "SUFFIX");
CHECK_GE(table_entry.max_length(), 0);
Reset(table_entry.max_length());
// First, create all affixes.
for (int affix_id = 0; affix_id < table_entry.affix_size(); ++affix_id) {
const auto &affix_entry = table_entry.affix(affix_id);
CHECK_GE(affix_entry.length(), 0);
CHECK_LE(affix_entry.length(), max_length_);
CHECK(FindAffix(affix_entry.form()) == NULL); // forbid duplicates
Affix *affix = AddNewAffix(affix_entry.form(), affix_entry.length());
CHECK_EQ(affix->id(), affix_id);
}
CHECK_EQ(affixes_.size(), table_entry.affix_size());
// Next, link the shorter affixes.
for (int affix_id = 0; affix_id < table_entry.affix_size(); ++affix_id) {
const auto &affix_entry = table_entry.affix(affix_id);
if (affix_entry.shorter_id() == -1) {
CHECK_EQ(affix_entry.length(), 1);
continue;
}
CHECK_GT(affix_entry.length(), 1);
CHECK_GE(affix_entry.shorter_id(), 0);
CHECK_LT(affix_entry.shorter_id(), affixes_.size());
Affix *affix = affixes_[affix_id];
Affix *shorter = affixes_[affix_entry.shorter_id()];
CHECK_EQ(affix->length(), shorter->length() + 1);
affix->set_shorter(shorter);
}
}
void AffixTable::Read(ProtoRecordReader *reader) {
AffixTableEntry table_entry;
TF_CHECK_OK(reader->Read(&table_entry));
Read(table_entry);
}
void AffixTable::Write(AffixTableEntry *table_entry) const {
table_entry->Clear();
table_entry->set_type(type_ == PREFIX ? "PREFIX" : "SUFFIX");
table_entry->set_max_length(max_length_);
for (const Affix *affix : affixes_) {
auto *affix_entry = table_entry->add_affix();
affix_entry->set_form(affix->form());
affix_entry->set_length(affix->length());
affix_entry->set_shorter_id(
affix->shorter() == NULL ? -1 : affix->shorter()->id());
}
}
void AffixTable::Write(ProtoRecordWriter *writer) const {
AffixTableEntry table_entry;
Write(&table_entry);
writer->Write(table_entry);
}
Affix *AffixTable::AddAffixesForWord(const char *word, size_t size) {
// Affix lengths are measured in Unicode characters rather than bytes, so
// first determine the length of the word in characters.
UnicodeText text;
text.PointToUTF8(word, size);
int length = text.size();
// Determine longest affix.
int affix_len = length;
if (affix_len > max_length_) affix_len = max_length_;
if (affix_len == 0) return NULL;
// Find start and end of longest affix.
UnicodeText::const_iterator start, end;
if (type_ == PREFIX) {
start = end = text.begin();
for (int i = 0; i < affix_len; ++i) ++end;
} else {
start = end = text.end();
for (int i = 0; i < affix_len; ++i) --start;
}
// Try to find successively shorter affixes.
Affix *top = NULL;
Affix *ancestor = NULL;
string s;
while (affix_len > 0) {
// Try to find affix in table.
UnicodeSubstring(start, end, &s);
Affix *affix = FindAffix(s);
if (affix == NULL) {
// Affix not found, add new one to table.
affix = AddNewAffix(s, affix_len);
// Update ancestor chain.
if (ancestor != NULL) ancestor->set_shorter(affix);
ancestor = affix;
if (top == NULL) top = affix;
} else {
// Affix found. Update ancestor if needed and return match.
if (ancestor != NULL) ancestor->set_shorter(affix);
if (top == NULL) top = affix;
break;
}
// Next affix.
if (type_ == PREFIX) {
--end;
} else {
++start;
}
affix_len--;
}
return top;
}
Affix *AffixTable::GetAffix(int id) const {
if (id < 0 || id >= static_cast<int>(affixes_.size())) {
return NULL;
} else {
return affixes_[id];
}
}
string AffixTable::AffixForm(int id) const {
Affix *affix = GetAffix(id);
if (affix == NULL) {
return "";
} else {
return affix->form();
}
}
int AffixTable::AffixId(const string &form) const {
Affix *affix = FindAffix(form);
if (affix == NULL) {
return -1;
} else {
return affix->id();
}
}
Affix *AffixTable::AddNewAffix(const string &form, int length) {
int hash = TermHash(form);
int id = affixes_.size();
if (id > static_cast<int>(buckets_.size()) * kFillFactor) Resize(id);
int b = hash & (buckets_.size() - 1);
// Create new affix object.
Affix *affix = new Affix(id, form.c_str(), length);
affixes_.push_back(affix);
// Insert affix in bucket chain.
affix->next_ = buckets_[b];
buckets_[b] = affix;
return affix;
}
Affix *AffixTable::FindAffix(const string &form) const {
// Compute hash value for word.
int hash = TermHash(form);
// Try to find affix in hash table.
Affix *affix = buckets_[hash & (buckets_.size() - 1)];
while (affix != NULL) {
if (strcmp(affix->form_.c_str(), form.c_str()) == 0) return affix;
affix = affix->next_;
}
return NULL;
}
void AffixTable::Resize(int size_hint) {
// Compute new size for bucket array.
int new_size = kInitialBuckets;
while (new_size < size_hint) new_size *= 2;
int mask = new_size - 1;
// Distribute affixes in new buckets.
buckets_.resize(new_size);
for (size_t i = 0; i < buckets_.size(); ++i) {
buckets_[i] = NULL;
}
for (size_t i = 0; i < affixes_.size(); ++i) {
Affix *affix = affixes_[i];
int b = TermHash(affix->form_) & mask;
affix->next_ = buckets_[b];
buckets_[b] = affix;
}
}
} // namespace syntaxnet
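The shorter-affix chain that `AddAffixesForWord` maintains can be sketched in isolation. This is an illustrative stand-in only: `MiniAffix` and `AddSuffixes` are simplified, ASCII-only versions of the Unicode-aware `Affix`/`AffixTable` machinery above, and always relink the full chain instead of stopping at the first known affix.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Minimal sketch of the shorter-affix chain: every suffix of a word (up to
// a maximum length) is stored once and points at the suffix that is one
// character shorter, e.g. "ars" -> "rs" -> "s" for the word "cars".
struct MiniAffix {
  std::string form;
  MiniAffix *shorter = nullptr;  // next-shorter affix in the chain
};

// Adds all suffixes of `word` up to `max_length` characters and returns the
// longest one (the head of the chain). Uses a std::map so node addresses
// stay stable as the table grows.
MiniAffix *AddSuffixes(const std::string &word, int max_length,
                       std::map<std::string, MiniAffix> *table) {
  int len = std::min(static_cast<int>(word.size()), max_length);
  MiniAffix *top = nullptr;
  MiniAffix *prev = nullptr;
  for (int l = len; l > 0; --l) {
    std::string form = word.substr(word.size() - l);
    MiniAffix &affix = (*table)[form];
    affix.form = form;
    if (prev != nullptr) prev->shorter = &affix;  // link from longer affix
    if (top == nullptr) top = &affix;
    prev = &affix;
  }
  return top;
}
```

Following the `shorter` pointers from the returned affix yields exactly the successively shorter suffixes that the feature extractors above consume.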
/* Copyright 2016 Google Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef SYNTAXNET_AFFIX_H_
#define SYNTAXNET_AFFIX_H_
#include <stddef.h>
#include <string>
#include <vector>
#include "syntaxnet/utils.h"
#include "syntaxnet/dictionary.pb.h"
#include "syntaxnet/feature_extractor.h"
#include "syntaxnet/proto_io.h"
#include "syntaxnet/sentence.pb.h"
#include "syntaxnet/task_context.h"
#include "syntaxnet/term_frequency_map.h"
#include "syntaxnet/workspace.h"
#include "tensorflow/core/lib/strings/strcat.h"
namespace syntaxnet {
// An affix represents a prefix or suffix of a word of a certain length. Each
// affix has a unique id and a textual form. An affix also has a pointer to the
// affix that is one character shorter. This creates a chain of affixes that are
// successively shorter.
class Affix {
private:
friend class AffixTable;
Affix(int id, const char *form, int length)
: id_(id), length_(length), form_(form), shorter_(NULL), next_(NULL) {}
public:
// Returns unique id of affix.
int id() const { return id_; }
// Returns the textual representation of the affix.
string form() const { return form_; }
// Returns the length of the affix.
int length() const { return length_; }
// Gets/sets the affix that is one character shorter.
Affix *shorter() const { return shorter_; }
void set_shorter(Affix *next) { shorter_ = next; }
private:
// Affix id.
int id_;
// Length (in characters) of affix.
int length_;
// Text form of affix.
string form_;
// Pointer to affix that is one character shorter.
Affix *shorter_;
// Next affix in bucket chain.
Affix *next_;
TF_DISALLOW_COPY_AND_ASSIGN(Affix);
};
// An affix table holds all prefixes/suffixes of all the words added to the
// table up to a maximum length. The affixes are chained together to enable
// fast lookup of all affixes for a word.
class AffixTable {
public:
// Affix table type.
enum Type { PREFIX, SUFFIX };
AffixTable(Type type, int max_length);
~AffixTable();
// Resets the affix table and initializes it for affixes of up to the
// specified maximum length.
void Reset(int max_length);
// De-serializes this from the given proto.
void Read(const AffixTableEntry &table_entry);
// De-serializes this from the given records.
void Read(ProtoRecordReader *reader);
// Serializes this to the given proto.
void Write(AffixTableEntry *table_entry) const;
// Serializes this to the given records.
void Write(ProtoRecordWriter *writer) const;
// Adds all prefixes/suffixes of the word up to the maximum length to the
// table. The longest affix is returned. The pointers in the affix can be
// used for getting shorter affixes.
Affix *AddAffixesForWord(const char *word, size_t size);
// Gets the affix information for the affix with a certain id. Returns NULL if
// there is no affix in the table with this id.
Affix *GetAffix(int id) const;
// Gets affix form from id. If the affix does not exist in the table, an empty
// string is returned.
string AffixForm(int id) const;
// Gets affix id for affix. If the affix does not exist in the table, -1 is
// returned.
int AffixId(const string &form) const;
// Returns size of the affix table.
int size() const { return affixes_.size(); }
// Returns the maximum affix length.
int max_length() const { return max_length_; }
private:
// Adds a new affix to table.
Affix *AddNewAffix(const string &form, int length);
// Finds existing affix in table.
Affix *FindAffix(const string &form) const;
// Resizes bucket array.
void Resize(int size_hint);
// Affix type (prefix or suffix).
Type type_;
// Maximum length of affix.
int max_length_;
// Index from affix ids to affix items.
vector<Affix *> affixes_;
// Buckets for word-to-affix hash map.
vector<Affix *> buckets_;
TF_DISALLOW_COPY_AND_ASSIGN(AffixTable);
};
} // namespace syntaxnet
#endif  // SYNTAXNET_AFFIX_H_
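`FindAffix` and `AddNewAffix` select buckets with `hash & (buckets_.size() - 1)`, which is why `kInitialBuckets` must be a power of two and `Resize` only ever doubles: masking with `size - 1` is then equivalent to taking the hash modulo the bucket count, without a division. A minimal illustration (the `Bucket` helper is hypothetical, not part of the library):

```cpp
#include <cassert>

// With a power-of-two bucket count, masking with (num_buckets - 1) selects
// a bucket exactly like hash % num_buckets, but avoids a division.
int Bucket(unsigned int hash, int num_buckets) {
  // The mask trick is only valid when num_buckets is a power of two.
  assert(num_buckets > 0 && (num_buckets & (num_buckets - 1)) == 0);
  return hash & (num_buckets - 1);
}
```

Because 1024 is a power of two, `Bucket(h, 1024)` and `h % 1024` agree for every hash value `h`.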
/* Copyright 2016 Google Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
// Arc-standard transition system.
//
// This transition system has three types of actions:
// - The SHIFT action pushes the next input token to the stack and
// advances to the next input token.
// - The LEFT_ARC action adds a dependency relation from first to second token
// on the stack and removes second one.
// - The RIGHT_ARC action adds a dependency relation from second to first token
// on the stack and removes the first one.
//
// The transition system operates with parser actions encoded as integers:
// - A SHIFT action is encoded as 0.
// - A LEFT_ARC action is encoded as an odd number starting from 1.
// - A RIGHT_ARC action is encoded as an even number starting from 2.
#include <string>
#include "syntaxnet/utils.h"
#include "syntaxnet/parser_state.h"
#include "syntaxnet/parser_transitions.h"
#include "tensorflow/core/lib/strings/strcat.h"
namespace syntaxnet {
class ArcStandardTransitionState : public ParserTransitionState {
public:
// Clones the transition state by returning a new object.
ParserTransitionState *Clone() const override {
return new ArcStandardTransitionState();
}
// Pushes the root on the stack before using the parser state in parsing.
void Init(ParserState *state) override { state->Push(-1); }
// Adds transition state specific annotations to the document.
void AddParseToDocument(const ParserState &state, bool rewrite_root_labels,
Sentence *sentence) const override {
for (int i = 0; i < state.NumTokens(); ++i) {
Token *token = sentence->mutable_token(i);
token->set_label(state.LabelAsString(state.Label(i)));
if (state.Head(i) != -1) {
token->set_head(state.Head(i));
} else {
token->clear_head();
if (rewrite_root_labels) {
token->set_label(state.LabelAsString(state.RootLabel()));
}
}
}
}
// Whether a parsed token should be considered correct for evaluation.
bool IsTokenCorrect(const ParserState &state, int index) const override {
return state.GoldHead(index) == state.Head(index);
}
// Returns a human readable string representation of this state.
string ToString(const ParserState &state) const override {
string str;
str.append("[");
for (int i = state.StackSize() - 1; i >= 0; --i) {
const string &word = state.GetToken(state.Stack(i)).word();
if (i != state.StackSize() - 1) str.append(" ");
if (word == "") {
str.append(ParserState::kRootLabel);
} else {
str.append(word);
}
}
str.append("]");
for (int i = state.Next(); i < state.NumTokens(); ++i) {
tensorflow::strings::StrAppend(&str, " ", state.GetToken(i).word());
}
return str;
}
};
class ArcStandardTransitionSystem : public ParserTransitionSystem {
public:
// Action types for the arc-standard transition system.
enum ParserActionType {
SHIFT = 0,
LEFT_ARC = 1,
RIGHT_ARC = 2,
};
// The SHIFT action uses the same value as the corresponding action type.
static ParserAction ShiftAction() { return SHIFT; }
// The LEFT_ARC action converts the label to an odd number greater than or
// equal to 1.
static ParserAction LeftArcAction(int label) { return 1 + (label << 1); }
// The RIGHT_ARC action converts the label to an even number greater than
// or equal to 2.
static ParserAction RightArcAction(int label) {
return 1 + ((label << 1) | 1);
}
// Extracts the action type from a given parser action.
static ParserActionType ActionType(ParserAction action) {
return static_cast<ParserActionType>(action < 1 ? action
: 1 + (~action & 1));
}
// Extracts the label from a given parser action. If the action is SHIFT,
// returns -1.
static int Label(ParserAction action) {
return action < 1 ? -1 : (action - 1) >> 1;
}
// Returns the number of action types.
int NumActionTypes() const override { return 3; }
// Returns the number of possible actions.
int NumActions(int num_labels) const override { return 1 + 2 * num_labels; }
// Returns the default action for a given state.
ParserAction GetDefaultAction(const ParserState &state) const override {
// If there are further tokens available in the input then Shift.
if (!state.EndOfInput()) return ShiftAction();
// Do a "reduce".
return RightArcAction(2);
}
// Returns the next gold action for a given state according to the
// underlying annotated sentence.
ParserAction GetNextGoldAction(const ParserState &state) const override {
// If the stack contains fewer than two tokens, the only valid parser
// action is shift.
if (state.StackSize() < 2) {
DCHECK(!state.EndOfInput());
return ShiftAction();
}
// If the second token on the stack is the head of the first one,
// return a right arc action.
if (state.GoldHead(state.Stack(0)) == state.Stack(1) &&
DoneChildrenRightOf(state, state.Stack(0))) {
const int gold_label = state.GoldLabel(state.Stack(0));
return RightArcAction(gold_label);
}
// If the first token on the stack is the head of the second one,
// return a left arc action.
if (state.GoldHead(state.Stack(1)) == state.Top()) {
const int gold_label = state.GoldLabel(state.Stack(1));
return LeftArcAction(gold_label);
}
// Otherwise, shift.
return ShiftAction();
}
// Determines if a token has any children to the right in the sentence.
// Arc standard is a bottom-up parsing method and has to finish all sub-trees
// first.
static bool DoneChildrenRightOf(const ParserState &state, int head) {
int index = state.Next();
int num_tokens = state.sentence().token_size();
while (index < num_tokens) {
// Check if the token at index is the child of head.
int actual_head = state.GoldHead(index);
if (actual_head == head) return false;
// If the head of the token at index is to the right of it there cannot be
// any children in-between, so we can skip forward to the head. Note this
// is only true for projective trees.
if (actual_head > index) {
index = actual_head;
} else {
++index;
}
}
return true;
}
// Checks if the action is allowed in a given parser state.
bool IsAllowedAction(ParserAction action,
const ParserState &state) const override {
switch (ActionType(action)) {
case SHIFT:
return IsAllowedShift(state);
case LEFT_ARC:
return IsAllowedLeftArc(state);
case RIGHT_ARC:
return IsAllowedRightArc(state);
}
return false;
}
// Returns true if a shift is allowed in the given parser state.
bool IsAllowedShift(const ParserState &state) const {
// We can shift if there are more input tokens.
return !state.EndOfInput();
}
// Returns true if a left-arc is allowed in the given parser state.
bool IsAllowedLeftArc(const ParserState &state) const {
// Left-arc requires more than two tokens on the stack: the bottom token
// is the root, and we do not want a left arc to the root.
return state.StackSize() > 2;
}
// Returns true if a right-arc is allowed in the given parser state.
bool IsAllowedRightArc(const ParserState &state) const {
// Right-arc requires two or more tokens on the stack.
return state.StackSize() > 1;
}
// Performs the specified action on a given parser state, without adding the
// action to the state's history.
void PerformActionWithoutHistory(ParserAction action,
ParserState *state) const override {
switch (ActionType(action)) {
case SHIFT:
PerformShift(state);
break;
case LEFT_ARC:
PerformLeftArc(state, Label(action));
break;
case RIGHT_ARC:
PerformRightArc(state, Label(action));
break;
}
}
// Makes a shift by pushing the next input token on the stack and moving to
// the next position.
void PerformShift(ParserState *state) const {
DCHECK(IsAllowedShift(*state));
state->Push(state->Next());
state->Advance();
}
// Makes a left-arc between the two top tokens on stack and pops the second
// token on stack.
void PerformLeftArc(ParserState *state, int label) const {
DCHECK(IsAllowedLeftArc(*state));
int s0 = state->Pop();
state->AddArc(state->Pop(), s0, label);
state->Push(s0);
}
// Makes a right-arc between the two top tokens on stack and pops the stack.
void PerformRightArc(ParserState *state, int label) const {
DCHECK(IsAllowedRightArc(*state));
int s0 = state->Pop();
int s1 = state->Pop();
state->AddArc(s0, s1, label);
state->Push(s1);
}
// The state is deterministic when the stack holds fewer than two tokens
// and input remains: shift is then the only valid action.
bool IsDeterministicState(const ParserState &state) const override {
return state.StackSize() < 2 && !state.EndOfInput();
}
// We are in a final state when we have reached the end of the input and
// only the root remains on the stack.
bool IsFinalState(const ParserState &state) const override {
return state.EndOfInput() && state.StackSize() < 2;
}
// Returns a string representation of a parser action.
string ActionAsString(ParserAction action,
const ParserState &state) const override {
switch (ActionType(action)) {
case SHIFT:
return "SHIFT";
case LEFT_ARC:
return "LEFT_ARC(" + state.LabelAsString(Label(action)) + ")";
case RIGHT_ARC:
return "RIGHT_ARC(" + state.LabelAsString(Label(action)) + ")";
}
return "UNKNOWN";
}
// Returns a new transition state to be used to enhance the parser state.
ParserTransitionState *NewTransitionState(bool training_mode) const override {
return new ArcStandardTransitionState();
}
};
REGISTER_TRANSITION_SYSTEM("arc-standard", ArcStandardTransitionSystem);
} // namespace syntaxnet
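The integer action encoding described at the top of this file can be exercised standalone. The sketch below mirrors the bit arithmetic of `ShiftAction`, `LeftArcAction`, `RightArcAction`, `ActionType`, and `Label`; the names here are local stand-ins, not the library's API.

```cpp
#include <cassert>

// Mirrors the arc-standard action encoding: SHIFT is 0, LEFT_ARC(label) is
// odd (1, 3, 5, ...), RIGHT_ARC(label) is even and >= 2 (2, 4, 6, ...).
enum MiniActionType { SHIFT = 0, LEFT_ARC = 1, RIGHT_ARC = 2 };

int LeftArc(int label) { return 1 + (label << 1); }         // odd
int RightArc(int label) { return 1 + ((label << 1) | 1); }  // even

// Recovers the action type: 0 stays SHIFT; otherwise the low bit decides
// between LEFT_ARC (odd) and RIGHT_ARC (even).
MiniActionType TypeOf(int action) {
  return static_cast<MiniActionType>(action < 1 ? action : 1 + (~action & 1));
}

// Recovers the label, or -1 for SHIFT.
int LabelOf(int action) { return action < 1 ? -1 : (action - 1) >> 1; }
```

Encoding and decoding round-trip: for any label `l`, `LabelOf(LeftArc(l))` and `LabelOf(RightArc(l))` both give back `l`, and `TypeOf` distinguishes the two arc directions from the parity of the action value.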
/* Copyright 2016 Google Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include <memory>
#include <string>
#include <gmock/gmock.h>
#include "syntaxnet/utils.h"
#include "syntaxnet/parser_state.h"
#include "syntaxnet/parser_transitions.h"
#include "syntaxnet/populate_test_inputs.h"
#include "syntaxnet/sentence.pb.h"
#include "syntaxnet/task_context.h"
#include "syntaxnet/task_spec.pb.h"
#include "syntaxnet/term_frequency_map.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/platform/test.h"
namespace syntaxnet {
class ArcStandardTransitionTest : public ::testing::Test {
public:
ArcStandardTransitionTest()
: transition_system_(ParserTransitionSystem::Create("arc-standard")) {}
protected:
// Creates a label map and a tag map for testing based on the given
// document and initializes the transition system appropriately.
void SetUpForDocument(const Sentence &document) {
input_label_map_ = context_.GetInput("label-map", "text", "");
transition_system_->Setup(&context_);
PopulateTestInputs::Defaults(document).Populate(&context_);
label_map_.Load(TaskContext::InputFile(*input_label_map_),
0 /* minimum frequency */,
-1 /* maximum number of terms */);
transition_system_->Init(&context_);
}
// Creates a cloned state from a sentence in order to test that cloning
// works correctly for the new parser states.
ParserState *NewClonedState(Sentence *sentence) {
ParserState state(sentence, transition_system_->NewTransitionState(
true /* training mode */),
&label_map_);
return state.Clone();
}
// Performs gold transitions and checks that the labels and heads recorded
// in the parser state match gold heads and labels.
void GoldParse(Sentence *sentence) {
ParserState *state = NewClonedState(sentence);
LOG(INFO) << "Initial parser state: " << state->ToString();
while (!transition_system_->IsFinalState(*state)) {
ParserAction action = transition_system_->GetNextGoldAction(*state);
EXPECT_TRUE(transition_system_->IsAllowedAction(action, *state));
LOG(INFO) << "Performing action: "
<< transition_system_->ActionAsString(action, *state);
transition_system_->PerformActionWithoutHistory(action, state);
LOG(INFO) << "Parser state: " << state->ToString();
}
for (int i = 0; i < sentence->token_size(); ++i) {
EXPECT_EQ(state->GoldLabel(i), state->Label(i));
EXPECT_EQ(state->GoldHead(i), state->Head(i));
}
delete state;
}
// Always takes the default action, and verifies that this leads to
// a final state through a sequence of allowed actions.
void DefaultParse(Sentence *sentence) {
ParserState *state = NewClonedState(sentence);
LOG(INFO) << "Initial parser state: " << state->ToString();
while (!transition_system_->IsFinalState(*state)) {
ParserAction action = transition_system_->GetDefaultAction(*state);
EXPECT_TRUE(transition_system_->IsAllowedAction(action, *state));
LOG(INFO) << "Performing action: "
<< transition_system_->ActionAsString(action, *state);
transition_system_->PerformActionWithoutHistory(action, state);
LOG(INFO) << "Parser state: " << state->ToString();
}
delete state;
}
TaskContext context_;
TaskInput *input_label_map_ = nullptr;
TermFrequencyMap label_map_;
std::unique_ptr<ParserTransitionSystem> transition_system_;
};
TEST_F(ArcStandardTransitionTest, SingleSentenceDocumentTest) {
string document_text;
Sentence document;
TF_CHECK_OK(ReadFileToString(
tensorflow::Env::Default(),
"syntaxnet/testdata/document",
&document_text));
LOG(INFO) << "Document:\n" << document_text;
CHECK(TextFormat::ParseFromString(document_text, &document));
SetUpForDocument(document);
GoldParse(&document);
DefaultParse(&document);
}
} // namespace syntaxnet
/* Copyright 2016 Google Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef $TARGETDIR_BASE_H_
#define $TARGETDIR_BASE_H_
#include <functional>
#include <string>
#include <vector>
#include <unordered_map>
#include <unordered_set>
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/lib/strings/strcat.h"
#include "tensorflow/core/lib/strings/stringprintf.h"
#include "tensorflow/core/platform/default/integral_types.h"
#include "tensorflow/core/platform/mutex.h"
#include "tensorflow/core/platform/protobuf.h"
using tensorflow::int32;
using tensorflow::int64;
using tensorflow::uint64;
using tensorflow::uint32;
using tensorflow::protobuf::TextFormat;
using tensorflow::mutex_lock;
using tensorflow::mutex;
using std::map;
using std::pair;
using std::vector;
using std::unordered_map;
using std::unordered_set;
typedef signed int char32;
using tensorflow::StringPiece;
using std::string;
#endif // $TARGETDIR_BASE_H_
/* Copyright 2016 Google Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include <algorithm>
#include <deque>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>
#include "syntaxnet/base.h"
#include "syntaxnet/parser_state.h"
#include "syntaxnet/parser_transitions.h"
#include "syntaxnet/sentence_batch.h"
#include "syntaxnet/sentence.pb.h"
#include "syntaxnet/shared_store.h"
#include "syntaxnet/sparse.pb.h"
#include "syntaxnet/task_context.h"
#include "syntaxnet/task_spec.pb.h"
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/framework/tensor_shape.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/lib/io/inputbuffer.h"
#include "tensorflow/core/platform/env.h"
using tensorflow::DEVICE_CPU;
using tensorflow::DT_BOOL;
using tensorflow::DT_FLOAT;
using tensorflow::DT_INT32;
using tensorflow::DT_INT64;
using tensorflow::DT_STRING;
using tensorflow::DataType;
using tensorflow::OpKernel;
using tensorflow::OpKernelConstruction;
using tensorflow::OpKernelContext;
using tensorflow::TTypes;
using tensorflow::Tensor;
using tensorflow::TensorShape;
using tensorflow::errors::FailedPrecondition;
using tensorflow::errors::InvalidArgument;
namespace syntaxnet {
// Wraps ParserState so that the history of transitions (the actions
// performed and the beam slot they were performed in) is recorded.
struct ParserStateWithHistory {
public:
// New state with an empty history.
explicit ParserStateWithHistory(const ParserState &s) : state(s.Clone()) {}
// New state obtained by cloning the given state and applying the given
// action. The given beam slot and action are appended to the history.
ParserStateWithHistory(const ParserStateWithHistory &next,
const ParserTransitionSystem &transitions, int32 slot,
int32 action, float score)
: state(next.state->Clone()),
slot_history(next.slot_history),
action_history(next.action_history),
score_history(next.score_history) {
transitions.PerformAction(action, state.get());
slot_history.push_back(slot);
action_history.push_back(action);
score_history.push_back(score);
}
std::unique_ptr<ParserState> state;
std::vector<int32> slot_history;
std::vector<int32> action_history;
std::vector<float> score_history;
private:
TF_DISALLOW_COPY_AND_ASSIGN(ParserStateWithHistory);
};
struct BatchStateOptions {
// Maximum number of parser states in a beam.
int max_beam_size;
// Number of parallel sentences to decode.
int batch_size;
// Argument prefix for context parameters.
string arg_prefix;
// Corpus name to read from the context inputs.
string corpus_name;
// Whether we allow weights in SparseFeatures protos.
bool allow_feature_weights;
// Whether beams should be considered alive until all states are final, or
// until the gold path falls off.
bool continue_until_all_final;
// Whether to skip to a new sentence after each training step.
bool always_start_new_sentences;
// Parameter for deciding which tokens to score.
string scoring_type;
};
// Encapsulates the environment needed to parse with a beam, keeping a
// record of path histories.
class BeamState {
public:
// The agenda is keyed by a tuple that is the score followed by an
// int that is -1 if the path coincides with the gold path and 0
// otherwise. The lexicographic ordering of the keys therefore
// ensures that for all paths sharing the same score, the gold path
// will always be at the bottom. This situation can occur at the
// onset of training when all weights are zero and therefore all
// paths have an identically zero score.
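// Illustrative note (not part of the original source): with two paths
// tied at score 0.0, the gold key {0.0, -1} compares less than the
// non-gold key {0.0, 0}, so the ascending multimap keeps the gold path
// in the bottom slot among ties:
//
//   AgendaType agenda;
//   agenda.emplace(KeyType(0.0, -1), MakeState());  // gold path
//   agenda.emplace(KeyType(0.0, 0), MakeState());   // non-gold path
//   // agenda.begin()->first.second == -1, i.e. gold is at the bottom.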
typedef std::pair<double, int> KeyType;
typedef std::multimap<KeyType, std::unique_ptr<ParserStateWithHistory>>
AgendaType;
typedef std::pair<const KeyType, std::unique_ptr<ParserStateWithHistory>>
AgendaItem;
typedef Eigen::Tensor<float, 2, Eigen::RowMajor, Eigen::DenseIndex>
ScoreMatrixType;
// The beam can be
// - ALIVE: parsing is still active, features are being output for at least
// some slots in the beam.
// - DYING: features should be output for this beam only one more time, then
// the beam will be DEAD. This state is reached when the gold path falls
// out of the beam and features have to be output one last time.
// - DEAD: parsing is not active, features are not being output, and no
//   actions are taken on the states.
enum State { ALIVE = 0, DYING = 1, DEAD = 2 };
explicit BeamState(const BatchStateOptions &options) : options_(options) {}
void Reset() {
if (options_.always_start_new_sentences ||
gold_ == nullptr || transition_system_->IsFinalState(*gold_)) {
AdvanceSentence();
}
slots_.clear();
if (gold_ == nullptr) {
state_ = DEAD; // EOF has been reached.
} else {
gold_->set_is_gold(true);
slots_.emplace(KeyType(0.0, -1), std::unique_ptr<ParserStateWithHistory>(
new ParserStateWithHistory(*gold_)));
state_ = ALIVE;
}
}
void UpdateAllFinal() {
all_final_ = true;
for (const AgendaItem &item : slots_) {
if (!transition_system_->IsFinalState(*item.second->state)) {
all_final_ = false;
break;
}
}
if (all_final_) {
state_ = DEAD;
}
}
// This method updates the beam. For all elements of the beam, all
// allowed transitions are scored and inserted into a new beam. The
// beam size is capped by discarding the lowest scoring slots at any
// given time. There is one exception to this process: the gold path
// is forced to remain in the beam at all times, even if it scores
// low. This is to ensure that the gold path can be used for
// training at the moment it would otherwise fall off (and be absent
// from) the beam.
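// Illustrative sketch (not part of the original source, and assuming
// continue_until_all_final is false): with max_beam_size = 2 and
// candidate scores {gold: 1.0, a: 3.0, b: 2.0}, the gold path sits at
// the bottom of the agenda, so PruneBeam() marks the beam DYING, skips
// the gold slot, and evicts b (the lowest-scoring non-gold slot),
// leaving {gold, a} on the beam.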
void Advance(const ScoreMatrixType &scores) {
// If the beam was in the state of DYING, it is now DEAD.
if (state_ == DYING) state_ = DEAD;
// When to stop advancing depends on the 'continue_until_all_final' arg.
if (!IsAlive() || gold_ == nullptr) return;
AdvanceGold();
const int score_rows = scores.dimension(0);
const int num_actions = scores.dimension(1);
// Advance beam.
AgendaType previous_slots;
previous_slots.swap(slots_);
CHECK_EQ(state_, ALIVE);
int slot = 0;
for (AgendaItem &item : previous_slots) {
{
ParserState *current = item.second->state.get();
VLOG(2) << "Slot: " << slot;
VLOG(2) << "Parser state: " << current->ToString();
VLOG(2) << "Parser state cumulative score: " << item.first.first << " "
<< (item.first.second < 0 ? "golden" : "");
}
if (!transition_system_->IsFinalState(*item.second->state)) {
// Not a final state.
for (int action = 0; action < num_actions; ++action) {
// Is action allowed?
if (!transition_system_->IsAllowedAction(action,
*item.second->state)) {
continue;
}
CHECK_LT(slot, score_rows);
MaybeInsertWithNewAction(item, slot, scores(slot, action), action);
PruneBeam();
}
} else {
// Final state: no need to advance.
MaybeInsert(&item);
PruneBeam();
}
++slot;
}
UpdateAllFinal();
}
void PopulateFeatureOutputs(
std::vector<std::vector<std::vector<SparseFeatures>>> *features) {
for (const AgendaItem &item : slots_) {
VLOG(2) << "State: " << item.second->state->ToString();
std::vector<std::vector<SparseFeatures>> f =
features_->ExtractSparseFeatures(*workspace_, *item.second->state);
for (size_t i = 0; i < f.size(); ++i) (*features)[i].push_back(f[i]);
}
}
int BeamSize() const { return slots_.size(); }
bool IsAlive() const { return state_ == ALIVE; }
bool IsDead() const { return state_ == DEAD; }
bool AllFinal() const { return all_final_; }
// The current contents of the beam.
AgendaType slots_;
// Index of this beam within the batch.
int beam_id_ = 0;
// Sentence batch reader.
SentenceBatch *sentence_batch_ = nullptr;
// Label map.
const TermFrequencyMap *label_map_ = nullptr;
// Transition system.
const ParserTransitionSystem *transition_system_ = nullptr;
// Feature extractor.
const ParserEmbeddingFeatureExtractor *features_ = nullptr;
// Feature workspace set.
WorkspaceSet *workspace_ = nullptr;
// Internal workspace registry for use in feature extraction.
WorkspaceRegistry *workspace_registry_ = nullptr;
// ParserState used to get gold actions.
std::unique_ptr<ParserState> gold_;
private:
// Creates a new ParserState if there's another sentence to be read.
void AdvanceSentence() {
gold_.reset();
if (sentence_batch_->AdvanceSentence(beam_id_)) {
gold_.reset(new ParserState(sentence_batch_->sentence(beam_id_),
transition_system_->NewTransitionState(true),
label_map_));
workspace_->Reset(*workspace_registry_);
features_->Preprocess(workspace_, gold_.get());
}
}
void AdvanceGold() {
gold_action_ = -1;
if (!transition_system_->IsFinalState(*gold_)) {
gold_action_ = transition_system_->GetNextGoldAction(*gold_);
if (transition_system_->IsAllowedAction(gold_action_, *gold_)) {
// In cases where the gold annotation is incompatible with the
// transition system, the action returned as gold might not be allowed.
transition_system_->PerformAction(gold_action_, gold_.get());
}
}
}
// Removes the first non-gold beam element if the beam is larger than
// the maximum beam size. If the gold element was at the bottom of the
// beam, sets the beam state to DYING, otherwise leaves the state alone.
void PruneBeam() {
if (static_cast<int>(slots_.size()) > options_.max_beam_size) {
auto bottom = slots_.begin();
if (!options_.continue_until_all_final &&
bottom->second->state->is_gold()) {
state_ = DYING;
++bottom;
}
slots_.erase(bottom);
}
}
// Inserts an item in the beam if
// - the item is gold,
// - the beam is not full, or
// - the item's new score is greater than the lowest score in the beam after
// the score has been incremented by given delta_score.
// Inserted items have slot, delta_score and action appended to their history.
void MaybeInsertWithNewAction(const AgendaItem &item, const int slot,
const double delta_score, const int action) {
const double score = item.first.first + delta_score;
const bool is_gold =
item.second->state->is_gold() && action == gold_action_;
if (is_gold || static_cast<int>(slots_.size()) < options_.max_beam_size ||
score > slots_.begin()->first.first) {
const KeyType key{score, -static_cast<int>(is_gold)};
slots_.emplace(key, std::unique_ptr<ParserStateWithHistory>(
new ParserStateWithHistory(
*item.second, *transition_system_, slot,
action, delta_score)))
->second->state->set_is_gold(is_gold);
}
}
// Inserts an item in the beam if
// - the item is gold,
// - the beam is not full, or
// - the item's new score is greater than the lowest score in the beam.
// The history of inserted items is left untouched.
void MaybeInsert(AgendaItem *item) {
const bool is_gold = item->second->state->is_gold();
const double score = item->first.first;
if (is_gold || static_cast<int>(slots_.size()) < options_.max_beam_size ||
score > slots_.begin()->first.first) {
slots_.emplace(item->first, std::move(item->second));
}
}
// Options, including the maximum number of slots on the beam.
const BatchStateOptions &options_;
int gold_action_ = -1;
State state_ = ALIVE;
bool all_final_ = false;
TF_DISALLOW_COPY_AND_ASSIGN(BeamState);
};
// Encapsulates the state of a batch of beams. It is an object of this
// type that will persist through repeated Op evaluations as the
// multiple steps are computed in sequence.
class BatchState {
public:
explicit BatchState(const BatchStateOptions &options)
: options_(options), features_(options.arg_prefix) {}
~BatchState() { SharedStore::Release(label_map_); }
void Init(TaskContext *task_context) {
// Create sentence batch.
sentence_batch_.reset(
new SentenceBatch(BatchSize(), options_.corpus_name));
sentence_batch_->Init(task_context);
// Create transition system.
transition_system_.reset(ParserTransitionSystem::Create(task_context->Get(
tensorflow::strings::StrCat(options_.arg_prefix, "_transition_system"),
"arc-standard")));
transition_system_->Setup(task_context);
transition_system_->Init(task_context);
// Create label map.
string label_map_path =
TaskContext::InputFile(*task_context->GetInput("label-map"));
label_map_ = SharedStoreUtils::GetWithDefaultName<TermFrequencyMap>(
label_map_path, 0, 0);
// Setup features.
features_.Setup(task_context);
features_.Init(task_context);
features_.RequestWorkspaces(&workspace_registry_);
// Create workspaces.
workspaces_.resize(BatchSize());
// Create beams.
beams_.clear();
for (int beam_id = 0; beam_id < BatchSize(); ++beam_id) {
beams_.emplace_back(options_);
beams_[beam_id].beam_id_ = beam_id;
beams_[beam_id].sentence_batch_ = sentence_batch_.get();
beams_[beam_id].transition_system_ = transition_system_.get();
beams_[beam_id].label_map_ = label_map_;
beams_[beam_id].features_ = &features_;
beams_[beam_id].workspace_ = &workspaces_[beam_id];
beams_[beam_id].workspace_registry_ = &workspace_registry_;
}
}
void ResetBeams() {
for (BeamState &beam : beams_) {
beam.Reset();
}
// If no sentences remain in the batch, rewind the corpus.
if (sentence_batch_->size() == 0) {
++epoch_;
VLOG(2) << "Starting epoch " << epoch_;
sentence_batch_->Rewind();
}
}
// Resets the offset vectors required for a single run because we're
// starting a new matrix of scores.
void ResetOffsets() {
beam_offsets_.clear();
step_offsets_ = {0};
UpdateOffsets();
}
void AdvanceBeam(const int beam_id,
const TTypes<float>::ConstMatrix &scores) {
const int offset = beam_offsets_.back()[beam_id];
Eigen::array<Eigen::DenseIndex, 2> offsets = {offset, 0};
Eigen::array<Eigen::DenseIndex, 2> extents = {
beam_offsets_.back()[beam_id + 1] - offset, NumActions()};
BeamState::ScoreMatrixType beam_scores = scores.slice(offsets, extents);
beams_[beam_id].Advance(beam_scores);
}
void UpdateOffsets() {
beam_offsets_.emplace_back(BatchSize() + 1, 0);
std::vector<int> &offsets = beam_offsets_.back();
for (int beam_id = 0; beam_id < BatchSize(); ++beam_id) {
// If the beam is ALIVE or DYING (but not DEAD), we want to
// output the activations.
const BeamState &beam = beams_[beam_id];
const int beam_size = beam.IsDead() ? 0 : beam.BeamSize();
offsets[beam_id + 1] = offsets[beam_id] + beam_size;
}
const int output_size = offsets.back();
step_offsets_.push_back(step_offsets_.back() + output_size);
}
tensorflow::Status PopulateFeatureOutputs(OpKernelContext *context) {
const int feature_size = FeatureSize();
std::vector<std::vector<std::vector<SparseFeatures>>> features(
feature_size);
for (int beam_id = 0; beam_id < BatchSize(); ++beam_id) {
if (!beams_[beam_id].IsDead()) {
beams_[beam_id].PopulateFeatureOutputs(&features);
}
}
CHECK_EQ(features.size(), feature_size);
Tensor *output;
const int total_slots = beam_offsets_.back().back();
for (int i = 0; i < feature_size; ++i) {
std::vector<std::vector<SparseFeatures>> &f = features[i];
CHECK_EQ(total_slots, f.size());
if (total_slots == 0) {
TF_RETURN_IF_ERROR(
context->allocate_output(i, TensorShape({0, 0}), &output));
} else {
const int size = f[0].size();
TF_RETURN_IF_ERROR(context->allocate_output(
i, TensorShape({total_slots, size}), &output));
for (int j = 0; j < total_slots; ++j) {
CHECK_EQ(size, f[j].size());
for (int k = 0; k < size; ++k) {
if (!options_.allow_feature_weights && f[j][k].weight_size() > 0) {
return FailedPrecondition(
"Feature weights are not allowed when allow_feature_weights "
"is set to false.");
}
output->matrix<string>()(j, k) = f[j][k].SerializeAsString();
}
}
}
}
return tensorflow::Status::OK();
}
// Returns the offset (i.e. row number) of a particular beam at a
// particular step in the final concatenated score matrix.
int GetOffset(const int step, const int beam_id) const {
return step_offsets_[step] + beam_offsets_[step][beam_id];
}
int FeatureSize() const { return features_.embedding_dims().size(); }
int NumActions() const {
return transition_system_->NumActions(label_map_->Size());
}
int BatchSize() const { return options_.batch_size; }
const BeamState &Beam(const int i) const { return beams_[i]; }
int Epoch() const { return epoch_; }
const string &ScoringType() const { return options_.scoring_type; }
private:
const BatchStateOptions options_;
// How many times the document source has been rewound.
int epoch_ = 0;
// Batch of sentences, and the corresponding parser states.
std::unique_ptr<SentenceBatch> sentence_batch_;
// Transition system.
std::unique_ptr<ParserTransitionSystem> transition_system_;
// Label map for the transition system.
const TermFrequencyMap *label_map_;
// Typed feature extractor for embeddings.
ParserEmbeddingFeatureExtractor features_;
// Batch: WorkspaceSet objects.
std::vector<WorkspaceSet> workspaces_;
// Internal workspace registry for use in feature extraction.
WorkspaceRegistry workspace_registry_;
std::deque<BeamState> beams_;
std::vector<std::vector<int>> beam_offsets_;
// Keeps track of the slot offset of each step.
std::vector<int> step_offsets_;
TF_DISALLOW_COPY_AND_ASSIGN(BatchState);
};
// Creates a BeamState and hooks it up with a parser. This Op needs to
// remain alive for the duration of the parse.
class BeamParseReader : public OpKernel {
public:
explicit BeamParseReader(OpKernelConstruction *context) : OpKernel(context) {
string file_path;
int feature_size;
BatchStateOptions options;
OP_REQUIRES_OK(context, context->GetAttr("task_context", &file_path));
OP_REQUIRES_OK(context, context->GetAttr("feature_size", &feature_size));
OP_REQUIRES_OK(context,
context->GetAttr("beam_size", &options.max_beam_size));
OP_REQUIRES_OK(context,
context->GetAttr("batch_size", &options.batch_size));
OP_REQUIRES_OK(context,
context->GetAttr("arg_prefix", &options.arg_prefix));
OP_REQUIRES_OK(context,
context->GetAttr("corpus_name", &options.corpus_name));
OP_REQUIRES_OK(context, context->GetAttr("allow_feature_weights",
&options.allow_feature_weights));
OP_REQUIRES_OK(context,
context->GetAttr("continue_until_all_final",
&options.continue_until_all_final));
OP_REQUIRES_OK(context,
context->GetAttr("always_start_new_sentences",
&options.always_start_new_sentences));
// Reads task context from file.
string data;
OP_REQUIRES_OK(context, ReadFileToString(tensorflow::Env::Default(),
file_path, &data));
TaskContext task_context;
OP_REQUIRES(context,
TextFormat::ParseFromString(data, task_context.mutable_spec()),
InvalidArgument("Could not parse task context at ", file_path));
OP_REQUIRES(
context, options.batch_size > 0,
InvalidArgument("Batch size ", options.batch_size, " too small."));
options.scoring_type = task_context.Get(
tensorflow::strings::StrCat(options.arg_prefix, "_scoring"), "");
// Create batch state.
batch_state_.reset(new BatchState(options));
batch_state_->Init(&task_context);
// Check number of feature groups matches the task context.
const int required_size = batch_state_->FeatureSize();
OP_REQUIRES(
context, feature_size == required_size,
InvalidArgument("Task context requires feature_size=", required_size));
// Set expected signature.
std::vector<DataType> output_types(feature_size, DT_STRING);
output_types.push_back(DT_INT64);
output_types.push_back(DT_INT32);
OP_REQUIRES_OK(context, context->MatchSignature({}, output_types));
}
void Compute(OpKernelContext *context) override {
mutex_lock lock(mu_);
// Write features.
batch_state_->ResetBeams();
batch_state_->ResetOffsets();
batch_state_->PopulateFeatureOutputs(context);
// Forward the beam state vector.
Tensor *output;
const int feature_size = batch_state_->FeatureSize();
OP_REQUIRES_OK(context, context->allocate_output(feature_size,
TensorShape({}), &output));
output->scalar<int64>()() = reinterpret_cast<int64>(batch_state_.get());
// Output number of epochs.
OP_REQUIRES_OK(context, context->allocate_output(feature_size + 1,
TensorShape({}), &output));
output->scalar<int32>()() = batch_state_->Epoch();
}
private:
// mutex to synchronize access to Compute.
mutex mu_;
// The object whose handle will be passed among the Ops.
std::unique_ptr<BatchState> batch_state_;
TF_DISALLOW_COPY_AND_ASSIGN(BeamParseReader);
};
REGISTER_KERNEL_BUILDER(Name("BeamParseReader").Device(DEVICE_CPU),
BeamParseReader);
// Updates the beam based on incoming scores and outputs new feature vectors
// based on the updated beam.
class BeamParser : public OpKernel {
public:
explicit BeamParser(OpKernelConstruction *context) : OpKernel(context) {
int feature_size;
OP_REQUIRES_OK(context, context->GetAttr("feature_size", &feature_size));
// Set expected signature.
std::vector<DataType> output_types(feature_size, DT_STRING);
output_types.push_back(DT_INT64);
output_types.push_back(DT_BOOL);
OP_REQUIRES_OK(context,
context->MatchSignature({DT_INT64, DT_FLOAT}, output_types));
}
void Compute(OpKernelContext *context) override {
BatchState *batch_state =
reinterpret_cast<BatchState *>(context->input(0).scalar<int64>()());
const TTypes<float>::ConstMatrix scores = context->input(1).matrix<float>();
VLOG(2) << "Scores: " << scores;
CHECK_EQ(scores.dimension(1), batch_state->NumActions());
// In AdvanceBeam we use beam_offsets_[beam_id] to determine the slice of
// scores that should be used for advancing, but beam_offsets_[beam_id] only
// exists for beams that have a sentence loaded.
const int batch_size = batch_state->BatchSize();
for (int beam_id = 0; beam_id < batch_size; ++beam_id) {
batch_state->AdvanceBeam(beam_id, scores);
}
batch_state->UpdateOffsets();
// Forward the beam state unmodified.
Tensor *output;
const int feature_size = batch_state->FeatureSize();
OP_REQUIRES_OK(context, context->allocate_output(feature_size,
TensorShape({}), &output));
output->scalar<int64>()() = context->input(0).scalar<int64>()();
// Output the new features of all the slots in all the beams.
OP_REQUIRES_OK(context, batch_state->PopulateFeatureOutputs(context));
// Output whether the beams are alive.
OP_REQUIRES_OK(
context, context->allocate_output(feature_size + 1,
TensorShape({batch_size}), &output));
for (int beam_id = 0; beam_id < batch_size; ++beam_id) {
output->vec<bool>()(beam_id) = batch_state->Beam(beam_id).IsAlive();
}
}
private:
TF_DISALLOW_COPY_AND_ASSIGN(BeamParser);
};
REGISTER_KERNEL_BUILDER(Name("BeamParser").Device(DEVICE_CPU), BeamParser);
// Extracts the paths for the elements of the current beams and returns
// indices into a scoring matrix that is assumed to have been
// constructed along with the beam search.
class BeamParserOutput : public OpKernel {
public:
explicit BeamParserOutput(OpKernelConstruction *context) : OpKernel(context) {
// Set expected signature.
OP_REQUIRES_OK(context,
context->MatchSignature(
{DT_INT64}, {DT_INT32, DT_INT32, DT_INT32, DT_FLOAT}));
}
void Compute(OpKernelContext *context) override {
BatchState *batch_state =
reinterpret_cast<BatchState *>(context->input(0).scalar<int64>()());
const int num_actions = batch_state->NumActions();
const int batch_size = batch_state->BatchSize();
// Vectors for output.
//
// Each step of each path in the batch gets its index computed and a
// unique path id assigned.
std::vector<int32> indices;
std::vector<int32> path_ids;
// Each unique path gets a batch id and a slot (in the beam)
// id. These are in effect the row and column of the final
// 'logits' matrix going to CrossEntropy.
std::vector<int32> beam_ids;
std::vector<int32> slot_ids;
// To compute the cross entropy we also need the slot id of the
// gold path, one per batch.
std::vector<int32> gold_slot(batch_size, -1);
// For good measure we also output the path scores as computed by
// the beam decoder, so it can be compared in tests with the path
// scores computed via the indices in TF. This has the same length
// as beam_ids and slot_ids.
std::vector<float> path_scores;
// The scores tensor has, conceptually, four dimensions: 1. number
// of steps, 2. batch size, 3. number of paths on the beam at that
// step, and 4. the number of actions scored. However this is not
// a true tensor since the size of the beam at each step may not
// be equal among all steps and among all batches. Only the batch
// size and the number of actions are fixed.
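// Illustrative note (not part of the original source): the flat index
// of the score for action a, taken at step s from slot k of beam b,
// follows the row-major layout built up by UpdateOffsets():
//
//   index = num_actions * (GetOffset(s, b) + k) + a;
//
// which is how `indices` is populated below.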
int path_id = 0;
for (int beam_id = 0; beam_id < batch_size; ++beam_id) {
// This occurs at the end of the corpus, when there aren't enough
// sentences to fill the batch.
if (batch_state->Beam(beam_id).gold_ == nullptr) continue;
// Populate the vectors that will index into the concatenated
// scores tensor.
int slot = 0;
for (const auto &item : batch_state->Beam(beam_id).slots_) {
beam_ids.push_back(beam_id);
slot_ids.push_back(slot);
path_scores.push_back(item.first.first);
VLOG(2) << "PATH SCORE @ beam_id:" << beam_id << " slot:" << slot
<< " : " << item.first.first << " " << item.first.second;
VLOG(2) << "SLOT HISTORY: "
<< utils::Join(item.second->slot_history, " ");
VLOG(2) << "SCORE HISTORY: "
<< utils::Join(item.second->score_history, " ");
VLOG(2) << "ACTION HISTORY: "
<< utils::Join(item.second->action_history, " ");
// Record where the gold path ended up.
if (item.second->state->is_gold()) {
CHECK_EQ(gold_slot[beam_id], -1);
gold_slot[beam_id] = slot;
}
for (size_t step = 0; step < item.second->slot_history.size(); ++step) {
const int step_beam_offset = batch_state->GetOffset(step, beam_id);
const int slot_index = item.second->slot_history[step];
const int action_index = item.second->action_history[step];
indices.push_back(num_actions * (step_beam_offset + slot_index) +
action_index);
path_ids.push_back(path_id);
}
++slot;
++path_id;
}
// One and only one path must be the gold one.
CHECK_GE(gold_slot[beam_id], 0);
}
const int num_ix_elements = indices.size();
Tensor *output;
OP_REQUIRES_OK(context, context->allocate_output(
0, TensorShape({2, num_ix_elements}), &output));
auto indices_and_path_ids = output->matrix<int32>();
for (size_t i = 0; i < indices.size(); ++i) {
indices_and_path_ids(0, i) = indices[i];
indices_and_path_ids(1, i) = path_ids[i];
}
const int num_path_elements = beam_ids.size();
OP_REQUIRES_OK(context,
context->allocate_output(
1, TensorShape({2, num_path_elements}), &output));
auto beam_and_slot_ids = output->matrix<int32>();
for (size_t i = 0; i < beam_ids.size(); ++i) {
beam_and_slot_ids(0, i) = beam_ids[i];
beam_and_slot_ids(1, i) = slot_ids[i];
}
OP_REQUIRES_OK(context, context->allocate_output(
2, TensorShape({batch_size}), &output));
std::copy(gold_slot.begin(), gold_slot.end(), output->vec<int32>().data());
OP_REQUIRES_OK(context, context->allocate_output(
3, TensorShape({num_path_elements}), &output));
std::copy(path_scores.begin(), path_scores.end(),
output->vec<float>().data());
}
private:
TF_DISALLOW_COPY_AND_ASSIGN(BeamParserOutput);
};
REGISTER_KERNEL_BUILDER(Name("BeamParserOutput").Device(DEVICE_CPU),
BeamParserOutput);
// Computes eval metrics for the best path in the input beams.
class BeamEvalOutput : public OpKernel {
public:
explicit BeamEvalOutput(OpKernelConstruction *context) : OpKernel(context) {
// Set expected signature.
OP_REQUIRES_OK(context,
context->MatchSignature({DT_INT64}, {DT_INT32, DT_STRING}));
}
void Compute(OpKernelContext *context) override {
int num_tokens = 0;
int num_correct = 0;
int all_final = 0;
BatchState *batch_state =
reinterpret_cast<BatchState *>(context->input(0).scalar<int64>()());
const int batch_size = batch_state->BatchSize();
vector<Sentence> documents;
for (int beam_id = 0; beam_id < batch_size; ++beam_id) {
if (batch_state->Beam(beam_id).gold_ != nullptr &&
batch_state->Beam(beam_id).AllFinal()) {
++all_final;
const auto &item = *batch_state->Beam(beam_id).slots_.rbegin();
ComputeTokenAccuracy(*item.second->state, batch_state->ScoringType(),
&num_tokens, &num_correct);
documents.push_back(item.second->state->sentence());
item.second->state->AddParseToDocument(&documents.back());
}
}
Tensor *output;
OP_REQUIRES_OK(context,
context->allocate_output(0, TensorShape({2}), &output));
auto eval_metrics = output->vec<int32>();
eval_metrics(0) = num_tokens;
eval_metrics(1) = num_correct;
const int output_size = documents.size();
OP_REQUIRES_OK(context, context->allocate_output(
1, TensorShape({output_size}), &output));
for (int i = 0; i < output_size; ++i) {
output->vec<string>()(i) = documents[i].SerializeAsString();
}
}
private:
// Tallies the number of scored tokens and the number of correct tokens
// for a given ParserState.
void ComputeTokenAccuracy(const ParserState &state,
const string &scoring_type,
int *num_tokens, int *num_correct) {
for (int i = 0; i < state.sentence().token_size(); ++i) {
const Token &token = state.GetToken(i);
if (utils::PunctuationUtil::ScoreToken(token.word(), token.tag(),
scoring_type)) {
++*num_tokens;
if (state.IsTokenCorrect(i)) ++*num_correct;
}
}
}
TF_DISALLOW_COPY_AND_ASSIGN(BeamEvalOutput);
};
REGISTER_KERNEL_BUILDER(Name("BeamEvalOutput").Device(DEVICE_CPU),
BeamEvalOutput);
} // namespace syntaxnet
# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for beam_reader_ops."""
import os.path
import time
import tensorflow as tf
from tensorflow.python.framework import test_util
from tensorflow.python.platform import googletest
from tensorflow.python.platform import logging
from syntaxnet import structured_graph_builder
from syntaxnet.ops import gen_parser_ops
FLAGS = tf.app.flags.FLAGS
if not hasattr(FLAGS, 'test_srcdir'):
FLAGS.test_srcdir = ''
if not hasattr(FLAGS, 'test_tmpdir'):
FLAGS.test_tmpdir = tf.test.get_temp_dir()
class ParsingReaderOpsTest(test_util.TensorFlowTestCase):
def setUp(self):
# Creates a task context with the correct testing paths.
initial_task_context = os.path.join(
FLAGS.test_srcdir,
'syntaxnet/'
'testdata/context.pbtxt')
self._task_context = os.path.join(FLAGS.test_tmpdir, 'context.pbtxt')
with open(initial_task_context, 'r') as fin:
with open(self._task_context, 'w') as fout:
fout.write(fin.read().replace('SRCDIR', FLAGS.test_srcdir)
.replace('OUTPATH', FLAGS.test_tmpdir))
# Creates necessary term maps.
with self.test_session() as sess:
gen_parser_ops.lexicon_builder(task_context=self._task_context,
corpus_name='training-corpus').run()
self._num_features, self._num_feature_ids, _, self._num_actions = (
sess.run(gen_parser_ops.feature_size(task_context=self._task_context,
arg_prefix='brain_parser')))
def MakeGraph(self,
max_steps=10,
beam_size=2,
batch_size=1,
**kwargs):
"""Constructs a structured learning graph."""
assert max_steps > 0, 'Empty network not supported.'
logging.info('MakeGraph + %s', kwargs)
with self.test_session(graph=tf.Graph()) as sess:
feature_sizes, domain_sizes, embedding_dims, num_actions = sess.run(
gen_parser_ops.feature_size(task_context=self._task_context))
embedding_dims = [8, 8, 8]
hidden_layer_sizes = []
learning_rate = 0.01
builder = structured_graph_builder.StructuredGraphBuilder(
num_actions,
feature_sizes,
domain_sizes,
embedding_dims,
hidden_layer_sizes,
seed=1,
max_steps=max_steps,
beam_size=beam_size,
gate_gradients=True,
use_locking=True,
use_averaging=False,
check_parameters=False,
**kwargs)
builder.AddTraining(self._task_context,
batch_size,
learning_rate=learning_rate,
decay_steps=1000,
momentum=0.9,
corpus_name='training-corpus')
builder.AddEvaluation(self._task_context,
batch_size,
evaluation_max_steps=25,
corpus_name=None)
builder.training['inits'] = tf.group(*builder.inits.values(), name='inits')
return builder
def Train(self, **kwargs):
with self.test_session(graph=tf.Graph()) as sess:
max_steps = 3
batch_size = 3
beam_size = 3
builder = (
self.MakeGraph(
max_steps=max_steps, beam_size=beam_size,
batch_size=batch_size, **kwargs))
logging.info('params: %s', builder.params.keys())
logging.info('variables: %s', builder.variables.keys())
t = builder.training
sess.run(t['inits'])
costs = []
gold_slots = []
alive_steps_vector = []
every_n = 5
walltime = time.time()
for step in range(10):
if step > 0 and step % every_n == 0:
new_walltime = time.time()
logging.info(
'Step: %d <cost>: %f <gold_slot>: %f <alive_steps>: %f <iter '
'time>: %f ms',
step, sum(costs[-every_n:]) / float(every_n),
sum(gold_slots[-every_n:]) / float(every_n),
sum(alive_steps_vector[-every_n:]) / float(every_n),
1000 * (new_walltime - walltime) / float(every_n))
walltime = new_walltime
cost, gold_slot, alive_steps, _ = sess.run(
[t['cost'], t['gold_slot'], t['alive_steps'], t['train_op']])
costs.append(cost)
gold_slots.append(gold_slot.mean())
alive_steps_vector.append(alive_steps.mean())
if builder._only_train:
trainable_param_names = [
k for k in builder.params if k in builder._only_train]
else:
trainable_param_names = builder.params.keys()
if builder._use_averaging:
for v in trainable_param_names:
avg = builder.variables['%s_avg_var' % v].eval()
tf.assign(builder.params[v], avg).eval()
# Reset for pseudo eval.
costs = []
gold_slots = []
alive_steps_vector = []
for step in range(10):
cost, gold_slot, alive_steps = sess.run(
[t['cost'], t['gold_slot'], t['alive_steps']])
costs.append(cost)
gold_slots.append(gold_slot.mean())
alive_steps_vector.append(alive_steps.mean())
logging.info(
'Pseudo eval: <cost>: %f <gold_slot>: %f <alive_steps>: %f',
sum(costs[-every_n:]) / float(every_n),
sum(gold_slots[-every_n:]) / float(every_n),
sum(alive_steps_vector[-every_n:]) / float(every_n))
def PathScores(self, iterations, beam_size, max_steps, batch_size):
with self.test_session(graph=tf.Graph()) as sess:
t = self.MakeGraph(beam_size=beam_size, max_steps=max_steps,
batch_size=batch_size).training
sess.run(t['inits'])
all_path_scores = []
beam_path_scores = []
for i in range(iterations):
logging.info('run %d', i)
tensors = (
sess.run(
[t['alive_steps'], t['concat_scores'],
t['all_path_scores'], t['beam_path_scores'],
t['indices'], t['path_ids']]))
logging.info('alive for %s, all_path_scores and beam_path_scores, '
'indices and path_ids:'
'\n%s\n%s\n%s\n%s',
tensors[0], tensors[2], tensors[3], tensors[4], tensors[5])
logging.info('diff:\n%s', tensors[2] - tensors[3])
all_path_scores.append(tensors[2])
beam_path_scores.append(tensors[3])
return all_path_scores, beam_path_scores
def testParseUntilNotAlive(self):
"""Ensures that the 'alive' condition works in the Cond ops."""
with self.test_session(graph=tf.Graph()) as sess:
t = self.MakeGraph(batch_size=3, beam_size=2, max_steps=5).training
sess.run(t['inits'])
for i in range(5):
logging.info('run %d', i)
tf_alive = t['alive'].eval()
self.assertFalse(any(tf_alive))
def testParseMomentum(self):
"""Ensures that Momentum training can be done using the gradients."""
self.Train()
self.Train(model_cost='perceptron_loss')
self.Train(model_cost='perceptron_loss',
only_train='softmax_weight,softmax_bias', softmax_init=0)
self.Train(only_train='softmax_weight,softmax_bias', softmax_init=0)
def testPathScoresAgree(self):
"""Ensures that path scores computed in the beam are same in the net."""
all_path_scores, beam_path_scores = self.PathScores(
iterations=1, beam_size=130, max_steps=5, batch_size=1)
self.assertArrayNear(all_path_scores[0], beam_path_scores[0], 1e-6)
def testBatchPathScoresAgree(self):
"""Ensures that path scores computed in the beam are same in the net."""
all_path_scores, beam_path_scores = self.PathScores(
iterations=1, beam_size=130, max_steps=5, batch_size=22)
self.assertArrayNear(all_path_scores[0], beam_path_scores[0], 1e-6)
def testBatchOneStepPathScoresAgree(self):
"""Ensures that path scores computed in the beam are same in the net."""
all_path_scores, beam_path_scores = self.PathScores(
iterations=1, beam_size=130, max_steps=1, batch_size=22)
self.assertArrayNear(all_path_scores[0], beam_path_scores[0], 1e-6)
if __name__ == '__main__':
googletest.main()
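The training loop in `Train` above logs a trailing mean over the last `every_n` measurements rather than a running total. As a standalone sketch of that bookkeeping (toy numbers, not real costs from the graph):

```python
# Trailing-window mean, as used for the <cost>/<gold_slot>/<alive_steps> logging.
every_n = 5
costs = [4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0]  # hypothetical per-step costs

# Average only the most recent every_n entries.
trailing_mean = sum(costs[-every_n:]) / float(every_n)
print(trailing_mean)  # mean of [3.0, 2.5, 2.0, 1.5, 1.0]
```

Because only the last window is averaged, the reported value tracks recent progress instead of being dominated by early, high-cost steps.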
# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A program to generate ASCII trees from conll files."""
import collections
import asciitree
import tensorflow as tf
import syntaxnet.load_parser_ops
from tensorflow.python.platform import logging
from syntaxnet import sentence_pb2
from syntaxnet.ops import gen_parser_ops
flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_string('task_context',
'syntaxnet/models/parsey_mcparseface/context.pbtxt',
'Path to a task context with inputs and parameters for '
'feature extractors.')
flags.DEFINE_string('corpus_name', 'stdin-conll',
'Name of the corpus input in the task context from '
'which documents are read.')
def to_dict(sentence):
"""Builds a dictionary representing the parse tree of a sentence.
Args:
sentence: Sentence protocol buffer to represent.
Returns:
Dictionary mapping tokens to children.
"""
token_str = ['%s %s %s' % (token.word, token.tag, token.label)
for token in sentence.token]
children = [[] for _ in sentence.token]
root = -1
for i in range(len(sentence.token)):
token = sentence.token[i]
if token.head == -1:
root = i
else:
children[token.head].append(i)
def _get_dict(i):
d = collections.OrderedDict()
for c in children[i]:
d[token_str[c]] = _get_dict(c)
return d
tree = collections.OrderedDict()
tree[token_str[root]] = _get_dict(root)
return tree
def main(unused_argv):
logging.set_verbosity(logging.INFO)
with tf.Session() as sess:
src = gen_parser_ops.document_source(batch_size=32,
corpus_name=FLAGS.corpus_name,
task_context=FLAGS.task_context)
sentence = sentence_pb2.Sentence()
while True:
documents, finished = sess.run(src)
logging.info('Read %d documents', len(documents))
for d in documents:
sentence.ParseFromString(d)
tr = asciitree.LeftAligned()
d = to_dict(sentence)
print 'Input: %s' % sentence.text
print 'Parse:'
print tr(d)
if finished:
break
if __name__ == '__main__':
tf.app.run()
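The tree construction in `to_dict` can be exercised without protocol buffers or the `asciitree` dependency. A minimal sketch, using a hypothetical token list of `(word, tag, label, head)` tuples where `head == -1` marks the root:

```python
import collections

# Toy stand-in for sentence.token; indices play the role of token positions.
tokens = [
    ('Parsey', 'NNP', 'nsubj', 1),
    ('parses', 'VBZ', 'ROOT', -1),
    ('text', 'NN', 'dobj', 1),
]

# Same bookkeeping as to_dict: a label string per token, a child list per head.
token_str = ['%s %s %s' % (w, t, l) for w, t, l, _ in tokens]
children = [[] for _ in tokens]
root = -1
for i, (_, _, _, head) in enumerate(tokens):
    if head == -1:
        root = i
    else:
        children[head].append(i)

def _get_dict(i):
    d = collections.OrderedDict()
    for c in children[i]:
        d[token_str[c]] = _get_dict(c)
    return d

tree = collections.OrderedDict()
tree[token_str[root]] = _get_dict(root)
print(tree)
```

The resulting nested `OrderedDict` maps the root token string to its dependents, which is exactly the shape `asciitree.LeftAligned` expects.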
Parameter {
name: 'brain_parser_embedding_dims'
value: '64;32;32'
}
Parameter {
name: 'brain_parser_features'
value: 'input.word input(1).word input(2).word input(3).word stack.word stack(1).word stack(2).word stack(3).word stack.child(1).word stack.child(1).sibling(-1).word stack.child(-1).word stack.child(-1).sibling(1).word stack(1).child(1).word stack(1).child(1).sibling(-1).word stack(1).child(-1).word stack(1).child(-1).sibling(1).word stack.child(2).word stack.child(-2).word stack(1).child(2).word stack(1).child(-2).word;input.tag input(1).tag input(2).tag input(3).tag stack.tag stack(1).tag stack(2).tag stack(3).tag stack.child(1).tag stack.child(1).sibling(-1).tag stack.child(-1).tag stack.child(-1).sibling(1).tag stack(1).child(1).tag stack(1).child(1).sibling(-1).tag stack(1).child(-1).tag stack(1).child(-1).sibling(1).tag stack.child(2).tag stack.child(-2).tag stack(1).child(2).tag stack(1).child(-2).tag;stack.child(1).label stack.child(1).sibling(-1).label stack.child(-1).label stack.child(-1).sibling(1).label stack(1).child(1).label stack(1).child(1).sibling(-1).label stack(1).child(-1).label stack(1).child(-1).sibling(1).label stack.child(2).label stack.child(-2).label stack(1).child(2).label stack(1).child(-2).label'
}
Parameter {
name: 'brain_parser_embedding_names'
value: 'words;tags;labels'
}
Parameter {
name: 'brain_parser_scoring'
value: 'default'
}
Parameter {
name: 'brain_pos_transition_system'
value: 'tagger'
}
Parameter {
name: 'brain_pos_embedding_dims'
value: '64;4;8;8'
}
Parameter {
name: 'brain_pos_features'
value: 'stack(3).word stack(2).word stack(1).word stack.word input.word input(1).word input(2).word input(3).word;input.digit input.hyphen;stack.suffix(length=2) input.suffix(length=2) input(1).suffix(length=2);stack.prefix(length=2) input.prefix(length=2) input(1).prefix(length=2)'
}
Parameter {
name: 'brain_pos_embedding_names'
value: 'words;other;suffix;prefix'
}
input {
name: 'training-corpus'
record_format: 'conll-sentence'
Part {
file_pattern: '<your-dataset>/treebank-train.trees.conll'
}
}
input {
name: 'tuning-corpus'
record_format: 'conll-sentence'
Part {
file_pattern: '<your-dataset>/dev.conll'
}
}
input {
name: 'dev-corpus'
record_format: 'conll-sentence'
Part {
file_pattern: '<your-dataset>/test.conll'
}
}
input {
name: 'tagged-training-corpus'
creator: 'brain_pos/greedy'
record_format: 'conll-sentence'
}
input {
name: 'tagged-tuning-corpus'
creator: 'brain_pos/greedy'
record_format: 'conll-sentence'
}
input {
name: 'tagged-dev-corpus'
creator: 'brain_pos/greedy'
record_format: 'conll-sentence'
}
input {
name: 'label-map'
creator: 'brain_pos/greedy'
}
input {
name: 'word-map'
creator: 'brain_pos/greedy'
}
input {
name: 'lcword-map'
creator: 'brain_pos/greedy'
}
input {
name: 'tag-map'
creator: 'brain_pos/greedy'
}
input {
name: 'category-map'
creator: 'brain_pos/greedy'
}
input {
name: 'prefix-table'
creator: 'brain_pos/greedy'
}
input {
name: 'suffix-table'
creator: 'brain_pos/greedy'
}
input {
name: 'tag-to-category'
creator: 'brain_pos/greedy'
}
input {
name: 'projectivized-training-corpus'
creator: 'brain_parser/greedy'
record_format: 'conll-sentence'
}
input {
name: 'parsed-training-corpus'
creator: 'brain_parser/greedy'
record_format: 'conll-sentence'
}
input {
name: 'parsed-tuning-corpus'
creator: 'brain_parser/greedy'
record_format: 'conll-sentence'
}
input {
name: 'parsed-dev-corpus'
creator: 'brain_parser/greedy'
record_format: 'conll-sentence'
}
input {
name: 'beam-parsed-training-corpus'
creator: 'brain_parser/structured'
record_format: 'conll-sentence'
}
input {
name: 'beam-parsed-tuning-corpus'
creator: 'brain_parser/structured'
record_format: 'conll-sentence'
}
input {
name: 'beam-parsed-dev-corpus'
creator: 'brain_parser/structured'
record_format: 'conll-sentence'
}
input {
name: 'stdin'
record_format: 'english-text'
Part {
file_pattern: '-'
}
}
input {
name: 'stdin-conll'
record_format: 'conll-sentence'
Part {
file_pattern: '-'
}
}
input {
name: 'stdout-conll'
record_format: 'conll-sentence'
Part {
file_pattern: '-'
}
}
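In the `Parameter` entries above, semicolon-separated values describe parallel embedding channels: `brain_parser_embedding_dims` (`'64;32;32'`) lines up positionally with `brain_parser_embedding_names` (`'words;tags;labels'`). A small sketch of how such values pair up (the helper is illustrative, not part of SyntaxNet):

```python
# Pair per-channel embedding names with their dimensions, as encoded by the
# semicolon-separated Parameter values in context.pbtxt.
dims = '64;32;32'
names = 'words;tags;labels'

channels = dict(zip(names.split(';'), (int(d) for d in dims.split(';'))))
print(channels)  # {'words': 64, 'tags': 32, 'labels': 32}
```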
#!/bin/bash
# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# A script that runs a tokenizer, a part-of-speech tagger and a dependency
# parser on an English text file, with one sentence per line.
#
# Example usage:
# echo "Parsey McParseface is my favorite parser!" | syntaxnet/demo.sh
# To run on a conll formatted file, add the --conll command line argument.
#
PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin
$PARSER_EVAL \
--input=$INPUT_FORMAT \
--output=stdout-conll \
--hidden_layer_sizes=64 \
--arg_prefix=brain_tagger \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/tagger-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
| \
$PARSER_EVAL \
--input=stdin-conll \
--output=stdout-conll \
--hidden_layer_sizes=512,512 \
--arg_prefix=brain_parser \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/parser-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
| \
bazel-bin/syntaxnet/conll2tree \
--task_context=$MODEL_DIR/context.pbtxt \
--alsologtostderr