Commit 51238b1b authored by Livio Soares, committed by calberti

Updates to syntaxnet, including update tensorflow sub-module, bazel requirement and fix trainer crash (#479)

* syntaxnet: Cosmetic fixes recommended by python lint.

* syntaxnet: Fix crash in parser_trainer due to an inconsistency between LexiconBuilder::Compute()
	   and the context.pbtxt definition (the 'char-map' input declaration was missing).

* syntaxnet: reduce flakiness in GraphBuilderTest.

* syntaxnet: Update tensorflow submodule to version > 0.10.

* syntaxnet: Update to latest stable bazel (0.3.1).

This update comes partly to allow the Tensorflow submodule to build
successfully. In this commit, I also update and simplify the WORKSPACE
to avoid declaring dependencies already present in tensorflow.

* syntaxnet: Update bazel version check to require version 0.3.0.

* syntaxnet: Document pip requirement, along with python mock module.
parent 2390974a
......@@ -5,17 +5,17 @@ ENV SYNTAXNETDIR=/opt/tensorflow PATH=$PATH:/root/bin
RUN mkdir -p $SYNTAXNETDIR \
&& cd $SYNTAXNETDIR \
&& apt-get update \
&& apt-get install git zlib1g-dev file swig python2.7 python-dev python-pip -y \
&& apt-get install git zlib1g-dev file swig python2.7 python-dev python-pip python-mock -y \
&& pip install --upgrade pip \
&& pip install -U protobuf==3.0.0b2 \
&& pip install -U protobuf==3.0.0 \
&& pip install asciitree \
&& pip install numpy \
&& wget https://github.com/bazelbuild/bazel/releases/download/0.2.2b/bazel-0.2.2b-installer-linux-x86_64.sh \
&& chmod +x bazel-0.2.2b-installer-linux-x86_64.sh \
&& ./bazel-0.2.2b-installer-linux-x86_64.sh --user \
&& wget https://github.com/bazelbuild/bazel/releases/download/0.3.1/bazel-0.3.1-installer-linux-x86_64.sh \
&& chmod +x bazel-0.3.1-installer-linux-x86_64.sh \
&& ./bazel-0.3.1-installer-linux-x86_64.sh --user \
&& git clone --recursive https://github.com/tensorflow/models.git \
&& cd $SYNTAXNETDIR/models/syntaxnet/tensorflow \
&& echo "\n\n\n" | ./configure \
&& echo "\n\n\n\n" | ./configure \
&& apt-get autoremove -y \
&& apt-get clean
......
......@@ -29,7 +29,7 @@ Model
[Martins et al. (2013)](http://www.cs.cmu.edu/~ark/TurboParser/) | 93.10 | 88.23 | 94.21
[Zhang and McDonald (2014)](http://research.google.com/pubs/archive/38148.pdf) | 93.32 | 88.65 | 93.37
[Weiss et al. (2015)](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43800.pdf) | 93.91 | 89.29 | 94.17
[Andor et al. (2016)](http://arxiv.org/abs/1603.06042)* | 94.44 | 90.17 | 95.40
[Andor et al. (2016)](http://arxiv.org/abs/1603.06042)* | 94.44 | 90.17 | 95.40
Parsey McParseface | 94.15 | 89.08 | 94.77
We see that Parsey McParseface is state-of-the-art; more importantly, with
......@@ -45,7 +45,7 @@ Parsey McParseface is also state-of-the-art for part-of-speech (POS) tagging
Model | News | Web | Questions
-------------------------------------------------------------------------- | :---: | :---: | :-------:
[Ling et al. (2015)](http://www.cs.cmu.edu/~lingwang/papers/emnlp2015.pdf) | 97.78 | 94.03 | 96.18
[Andor et al. (2016)](http://arxiv.org/abs/1603.06042)* | 97.77 | 94.80 | 96.86
[Andor et al. (2016)](http://arxiv.org/abs/1603.06042)* | 97.77 | 94.80 | 96.86
Parsey McParseface | 97.52 | 94.24 | 96.45
The first part of this tutorial describes how to install the necessary tools and
......@@ -78,10 +78,16 @@ source. You'll need to install:
* python 2.7:
* python 3 support is not available yet
* pip (python package manager)
* `apt-get install python-pip` on Ubuntu
* `brew` installs pip along with python on OSX
* bazel:
* **versions 0.2.0 - 0.2.2b, NOT 0.2.3**
* **versions 0.3.0 - 0.3.1**
* follow the instructions [here](http://bazel.io/docs/install.html)
* Alternately, Download bazel (0.2.2-0.2.2b) <.deb> from [here](https://github.com/bazelbuild/bazel/releases) for your system configuration.
* Alternatively, download the bazel <.deb> from
[https://github.com/bazelbuild/bazel/releases]
(https://github.com/bazelbuild/bazel/releases) for your system
configuration.
* Install it using the command: sudo dpkg -i <.deb file>
* Check for the bazel version by typing: bazel version
* swig:
......@@ -94,12 +100,14 @@ source. You'll need to install:
* `pip install asciitree`
* numpy, package for scientific computing:
* `pip install numpy`
* mock, package for unit testing:
* `pip install mock`
Once you completed the above steps, you can build and test SyntaxNet with the
following commands:
```shell
git clone --recursive --recurse-submodules https://github.com/tensorflow/models.git
git clone --recursive https://github.com/tensorflow/models.git
cd models/syntaxnet/tensorflow
./configure
cd ..
......
local_repository(
name = "org_tensorflow",
path = __workspace_dir__ + "/tensorflow",
path = "tensorflow",
)
load('//tensorflow/tensorflow:workspace.bzl', 'tf_workspace')
tf_workspace("tensorflow/", "@org_tensorflow")
load('@org_tensorflow//tensorflow:workspace.bzl', 'tf_workspace')
tf_workspace()
# Specify the minimum required Bazel version.
load("@org_tensorflow//tensorflow:tensorflow.bzl", "check_version")
check_version("0.2.0")
# ===== gRPC dependencies =====
bind(
name = "libssl",
actual = "@ts_boringssl_git//:ssl",
)
git_repository(
name = "ts_boringssl_git",
commit = "436432d849b83ab90f18773e4ae1c7a8f148f48d",
init_submodules = True,
remote = "https://github.com/mdsteele/boringssl-bazel.git",
)
bind(
name = "zlib",
actual = "@ts_zlib_archive//:zlib",
)
new_http_archive(
name = "ts_zlib_archive",
build_file = "zlib.BUILD",
sha256 = "879d73d8cd4d155f31c1f04838ecd567d34bebda780156f0e82a20721b3973d5",
strip_prefix = "zlib-1.2.8",
url = "http://zlib.net/zlib128.zip",
)
check_version("0.3.0")
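The WORKSPACE hunk above is easier to read as the file it produces: the `__workspace_dir__` prefix and the explicit gRPC/boringssl/zlib declarations are deleted (the newer `tf_workspace()` macro supplies those dependencies itself), leaving roughly this minimal file (a sketch assembled from the added lines, not a verbatim copy):

```python
local_repository(
    name = "org_tensorflow",
    path = "tensorflow",
)

load('@org_tensorflow//tensorflow:workspace.bzl', 'tf_workspace')
tf_workspace()

# Specify the minimum required Bazel version.
load("@org_tensorflow//tensorflow:tensorflow.bzl", "check_version")
check_version("0.3.0")
```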
......@@ -78,7 +78,7 @@ cc_library(
hdrs = ["base.h"],
visibility = ["//visibility:public"],
deps = [
"@re2//:re2",
"@com_googlesource_code_re2//:re2",
"@protobuf//:protobuf",
"@org_tensorflow//third_party/eigen3",
] + select({
......
......@@ -35,7 +35,6 @@ limitations under the License.
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/framework/tensor_shape.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/lib/io/inputbuffer.h"
#include "tensorflow/core/platform/env.h"
using tensorflow::DEVICE_CPU;
......
......@@ -18,7 +18,6 @@
import os.path
import time
import tensorflow as tf
from tensorflow.python.framework import test_util
......
......@@ -40,9 +40,11 @@ flags.DEFINE_string('corpus_name', 'stdin-conll',
def to_dict(sentence):
"""Builds a dictionary representing the parse tree of a sentence.
Note that the suffix "@id" (where 'id' is a number) is appended to each element
to handle the sentence that has multiple elements with identical representation.
Those suffix needs to be removed after the asciitree is rendered.
Note that the suffix "@id" (where 'id' is a number) is appended to each
element to handle sentences that have multiple elements with identical
representations. These suffixes need to be removed after the asciitree is
rendered.
Args:
sentence: Sentence protocol buffer to represent.
......@@ -54,7 +56,8 @@ def to_dict(sentence):
root = -1
for i in range(0, len(sentence.token)):
token = sentence.token[i]
token_str.append('%s %s %s @%d' % (token.word, token.tag, token.label, (i+1)))
token_str.append('%s %s %s @%d' %
(token.word, token.tag, token.label, (i+1)))
if token.head == -1:
root = i
else:
......@@ -88,7 +91,7 @@ def main(unused_argv):
print 'Input: %s' % sentence.text
print 'Parse:'
tr_str = tr(d)
pat = re.compile('\s*@\d+$')
pat = re.compile(r'\s*@\d+$')
for tr_ln in tr_str.splitlines():
print pat.sub('', tr_ln)
......
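The raw-string change above is the whole fix: `'\s*@\d+$'` relies on Python passing unrecognized escape sequences through unchanged, which newer Pythons warn about, while `r'\s*@\d+$'` is unambiguous. A small self-contained sketch of the suffix-stripping step (the rendered token strings are made up for illustration):

```python
import re

# Same pattern as in the diff: strip the " @<id>" disambiguation suffix
# that to_dict() appends to each rendered tree element.
pat = re.compile(r'\s*@\d+$')

# Hypothetical rendered lines in "word tag label @id" form:
rendered = ['saw VBD ROOT @2', 'John NNP nsubj @1']
cleaned = [pat.sub('', line) for line in rendered]  # suffixes removed
```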
......@@ -86,6 +86,10 @@ input {
name: 'category-map'
creator: 'brain_pos/greedy'
}
input {
name: 'char-map'
creator: 'brain_pos/greedy'
}
input {
name: 'prefix-table'
creator: 'brain_pos/greedy'
......
......@@ -25,7 +25,7 @@ limitations under the License.
#include "syntaxnet/registry.h"
#include "syntaxnet/sentence.pb.h"
#include "syntaxnet/task_context.h"
#include "tensorflow/core/lib/io/inputbuffer.h"
#include "tensorflow/core/lib/io/buffered_inputstream.h"
namespace syntaxnet {
......@@ -42,7 +42,7 @@ class DocumentFormat : public RegisterableClass<DocumentFormat> {
// Reads a record from the given input buffer with format specific logic.
// Returns false if no record could be read because we reached end of file.
virtual bool ReadRecord(tensorflow::io::InputBuffer *buffer,
virtual bool ReadRecord(tensorflow::io::BufferedInputStream *buffer,
string *record) = 0;
// Converts a key/value pair to one or more documents.
......
......@@ -50,7 +50,6 @@ limitations under the License.
#include "syntaxnet/workspace.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/lib/core/stringpiece.h"
#include "tensorflow/core/lib/io/inputbuffer.h"
#include "tensorflow/core/lib/io/record_reader.h"
#include "tensorflow/core/lib/io/record_writer.h"
#include "tensorflow/core/lib/strings/strcat.h"
......
......@@ -256,7 +256,7 @@ class GreedyParser(object):
self.params[name])
def GetStep(self):
def OnesInitializer(shape, dtype=tf.float32):
def OnesInitializer(shape, dtype=tf.float32, partition_info=None):
return tf.ones(shape, dtype)
return self._AddVariable([], tf.int32, 'step', OnesInitializer)
......@@ -475,7 +475,7 @@ class GreedyParser(object):
def AddPretrainedEmbeddings(self, index, embeddings_path, task_context):
"""Embeddings at the given index will be set to pretrained values."""
def _Initializer(shape, dtype=tf.float32):
def _Initializer(shape, dtype=tf.float32, partition_info=None):
unused_dtype = dtype
t = gen_parser_ops.word_embedding_initializer(
vectors=embeddings_path,
......
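Both initializer edits in this file are the same one-line signature fix: newer TensorFlow passes an extra `partition_info` keyword argument when it invokes an initializer, so custom initializers must accept it, and a default of `None` keeps older call sites working. The pattern in a TensorFlow-free sketch (names are illustrative):

```python
# Plain-Python sketch of the signature change (no TensorFlow required).
def ones_initializer(shape, dtype=float, partition_info=None):
    """Return a flat list of ones for a 1-D shape (stand-in for tf.ones)."""
    return [dtype(1)] * shape[0]

def create_variable(shape, initializer):
    # Mimics the newer framework call site, which passes partition_info;
    # an initializer without that parameter would raise a TypeError here.
    return initializer(shape, partition_info=None)

value = create_variable([3], ones_initializer)
```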
......@@ -18,7 +18,6 @@
# disable=no-name-in-module,unused-import,g-bad-import-order,maybe-no-member
import os.path
import tensorflow as tf
from tensorflow.python.framework import test_util
......@@ -221,7 +220,7 @@ class GraphBuilderTest(test_util.TensorFlowTestCase):
with self.test_session(graph=graph1) as sess:
sess.run(parser.inits.values())
metrics1 = None
for _ in range(500):
for _ in range(50):
cost1, _ = sess.run([parser.training['cost'],
parser.training['train_op']])
em1 = parser.evaluation['eval_metrics'].eval()
......@@ -240,7 +239,7 @@ class GraphBuilderTest(test_util.TensorFlowTestCase):
with self.test_session(graph=graph2) as sess:
sess.run(parser.inits.values())
metrics2 = None
for _ in range(500):
for _ in range(50):
cost2, _ = sess.run([parser.training['cost'],
parser.training['train_op']])
em2 = parser.evaluation['eval_metrics'].eval()
......
......@@ -19,7 +19,6 @@
# disable=no-name-in-module,unused-import,g-bad-import-order,maybe-no-member
import os.path
import tensorflow as tf
import syntaxnet.load_parser_ops
......
......@@ -19,7 +19,6 @@
import os
import os.path
import time
import tempfile
import tensorflow as tf
......
......@@ -20,7 +20,6 @@
import os
import os.path
import time
import tensorflow as tf
from tensorflow.python.platform import gfile
......
......@@ -17,12 +17,9 @@
# This test trains a parser on a small dataset, then runs it in greedy mode and
# in structured mode with beam 1, and checks that the result is identical.
set -eux
BINDIR=$TEST_SRCDIR/syntaxnet
BINDIR=$TEST_SRCDIR/$TEST_WORKSPACE/syntaxnet
CONTEXT=$BINDIR/testdata/context.pbtxt
TMP_DIR=/tmp/syntaxnet-output
......
......@@ -32,7 +32,8 @@ limitations under the License.
#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/lib/core/stringpiece.h"
#include "tensorflow/core/lib/io/inputbuffer.h"
#include "tensorflow/core/lib/io/buffered_inputstream.h"
#include "tensorflow/core/lib/io/random_inputstream.h"
#include "tensorflow/core/lib/io/record_reader.h"
#include "tensorflow/core/lib/io/record_writer.h"
#include "tensorflow/core/lib/strings/strcat.h"
......@@ -181,22 +182,27 @@ class TextReader {
if (filename_ == "-") {
static const int kInputBufferSize = 8 * 1024; /* bytes */
file_.reset(new StdIn());
buffer_.reset(
new tensorflow::io::InputBuffer(file_.get(), kInputBufferSize));
stream_.reset(new tensorflow::io::RandomAccessInputStream(file_.get()));
buffer_.reset(new tensorflow::io::BufferedInputStream(file_.get(),
kInputBufferSize));
} else {
static const int kInputBufferSize = 1 * 1024 * 1024; /* bytes */
TF_CHECK_OK(
tensorflow::Env::Default()->NewRandomAccessFile(filename_, &file_));
buffer_.reset(
new tensorflow::io::InputBuffer(file_.get(), kInputBufferSize));
stream_.reset(new tensorflow::io::RandomAccessInputStream(file_.get()));
buffer_.reset(new tensorflow::io::BufferedInputStream(file_.get(),
kInputBufferSize));
}
}
private:
string filename_;
int sentence_count_ = 0;
std::unique_ptr<tensorflow::RandomAccessFile> file_; // must outlive buffer_
std::unique_ptr<tensorflow::io::InputBuffer> buffer_;
std::unique_ptr<tensorflow::RandomAccessFile>
file_; // must outlive buffer_, stream_
std::unique_ptr<tensorflow::io::RandomAccessInputStream>
stream_; // Must outlive buffer_
std::unique_ptr<tensorflow::io::BufferedInputStream> buffer_;
std::unique_ptr<DocumentFormat> format_;
};
......
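The TextReader change replaces `tensorflow::io::InputBuffer` with a `BufferedInputStream` layered over a `RandomAccessInputStream`, and the member declaration order is chosen so the file outlives both wrappers. The layering idea, in a small Python sketch (class names and sizes are illustrative, not TF's API):

```python
# Illustrative layering: a random-access source wrapped by a buffered
# sequential reader, mirroring RandomAccessInputStream + BufferedInputStream.
class RandomAccessSource:
    def __init__(self, data: bytes):
        self.data = data

    def read_at(self, offset: int, n: int) -> bytes:
        return self.data[offset:offset + n]

class BufferedStream:
    """Reads lines sequentially from a RandomAccessSource in fixed chunks."""
    def __init__(self, source: RandomAccessSource, buffer_size: int = 8):
        self.source, self.buffer_size, self.pos = source, buffer_size, 0

    def read_line(self) -> bytes:
        line = b''
        while True:
            chunk = self.source.read_at(self.pos, self.buffer_size)
            if not chunk:
                return line  # end of input
            nl = chunk.find(b'\n')
            if nl >= 0:
                self.pos += nl + 1  # consume through the newline
                return line + chunk[:nl]
            self.pos += len(chunk)
            line += chunk

stream = BufferedStream(RandomAccessSource(b'hello\nworld\n'))
```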
......@@ -35,7 +35,6 @@ limitations under the License.
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/framework/tensor_shape.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/lib/io/inputbuffer.h"
#include "tensorflow/core/lib/io/table.h"
#include "tensorflow/core/lib/io/table_options.h"
#include "tensorflow/core/lib/strings/stringprintf.h"
......
......@@ -17,12 +17,10 @@
import os.path
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import test_util
from tensorflow.python.ops import control_flow_ops as cf
from tensorflow.python.platform import googletest
from tensorflow.python.platform import tf_logging as logging
......@@ -164,7 +162,9 @@ class ParsingReaderOpsTest(test_util.TensorFlowTestCase):
loop_vars = [epoch, num_actions]
res = sess.run(
cf.While(Condition, Body, loop_vars, parallel_iterations=1))
tf.while_loop(Condition, Body, loop_vars,
shape_invariants=[tf.TensorShape(None)] * 2,
parallel_iterations=1))
logging.info('Result: %s', res)
self.assertEqual(res[0], 2)
......
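The test migrates from the removed `cf.While` to the public `tf.while_loop`; `shape_invariants=[tf.TensorShape(None)] * 2` declares that both loop variables may change shape between iterations, which the old op did not require. The control flow being exercised reduces to this plain-Python sketch (the cond/body functions here are illustrative, not the test's real ones):

```python
# Minimal model of tf.while_loop semantics: apply body while cond holds,
# threading the loop variables through each iteration.
def while_loop(cond, body, loop_vars):
    while cond(*loop_vars):
        loop_vars = body(*loop_vars)
    return loop_vars

# Illustrative loop variables: an epoch counter and a growing action list
# (a shape that changes per iteration is why shape_invariants is needed).
res = while_loop(lambda epoch, actions: epoch < 2,
                 lambda epoch, actions: (epoch + 1, actions + [epoch]),
                 (0, []))
```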
......@@ -18,6 +18,7 @@ limitations under the License.
#include <memory>
#include <string>
#include <utility>
#include <vector>
#include "syntaxnet/embedding_feature_extractor.h"
......@@ -38,7 +39,7 @@ class SentenceBatch {
public:
SentenceBatch(int batch_size, string input_name)
: batch_size_(batch_size),
input_name_(input_name),
input_name_(std::move(input_name)),
sentences_(batch_size) {}
// Initializes all resources and opens the corpus file.
......