Commit 51238b1b authored by Livio Soares, committed by calberti

Updates to syntaxnet, including update tensorflow sub-module, bazel requirement and fix trainer crash (#479)

* syntaxnet: Cosmetic fixes recommended by python lint.

* syntaxnet: Fix crash in parser_trainer due to an inconsistency between LexiconBuilder::Compute()
	   and the context.pbtxt definition (the 'char-map' input declaration was missing).

* syntaxnet: reduce flakiness in GraphBuilderTest.

* syntaxnet: Update tensorflow submodule to version > 0.10.

* syntaxnet: Update to latest stable bazel (0.3.1).

This update comes partly to allow the Tensorflow submodule to build
successfully. In this commit, I also update and simplify the WORKSPACE
to avoid declaring dependencies already present in tensorflow.

* syntaxnet: Update bazel version check to require version 0.3.0.

* syntaxnet: Document pip requirement, along with python mock module.
parent 2390974a
......@@ -5,17 +5,17 @@ ENV SYNTAXNETDIR=/opt/tensorflow PATH=$PATH:/root/bin
RUN mkdir -p $SYNTAXNETDIR \
&& cd $SYNTAXNETDIR \
&& apt-get update \
&& apt-get install git zlib1g-dev file swig python2.7 python-dev python-pip -y \
&& apt-get install git zlib1g-dev file swig python2.7 python-dev python-pip python-mock -y \
&& pip install --upgrade pip \
&& pip install -U protobuf==3.0.0b2 \
&& pip install -U protobuf==3.0.0 \
&& pip install asciitree \
&& pip install numpy \
&& wget https://github.com/bazelbuild/bazel/releases/download/0.2.2b/bazel-0.2.2b-installer-linux-x86_64.sh \
&& chmod +x bazel-0.2.2b-installer-linux-x86_64.sh \
&& ./bazel-0.2.2b-installer-linux-x86_64.sh --user \
&& wget https://github.com/bazelbuild/bazel/releases/download/0.3.1/bazel-0.3.1-installer-linux-x86_64.sh \
&& chmod +x bazel-0.3.1-installer-linux-x86_64.sh \
&& ./bazel-0.3.1-installer-linux-x86_64.sh --user \
&& git clone --recursive https://github.com/tensorflow/models.git \
&& cd $SYNTAXNETDIR/models/syntaxnet/tensorflow \
&& echo "\n\n\n" | ./configure \
&& echo "\n\n\n\n" | ./configure \
&& apt-get autoremove -y \
&& apt-get clean
......
......@@ -29,7 +29,7 @@ Model
[Martins et al. (2013)](http://www.cs.cmu.edu/~ark/TurboParser/) | 93.10 | 88.23 | 94.21
[Zhang and McDonald (2014)](http://research.google.com/pubs/archive/38148.pdf) | 93.32 | 88.65 | 93.37
[Weiss et al. (2015)](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43800.pdf) | 93.91 | 89.29 | 94.17
[Andor et al. (2016)](http://arxiv.org/abs/1603.06042)* | 94.44 | 90.17 | 95.40
[Andor et al. (2016)](http://arxiv.org/abs/1603.06042)* | 94.44 | 90.17 | 95.40
Parsey McParseface | 94.15 | 89.08 | 94.77
We see that Parsey McParseface is state-of-the-art; more importantly, with
......@@ -45,7 +45,7 @@ Parsey McParseface is also state-of-the-art for part-of-speech (POS) tagging
Model | News | Web | Questions
-------------------------------------------------------------------------- | :---: | :---: | :-------:
[Ling et al. (2015)](http://www.cs.cmu.edu/~lingwang/papers/emnlp2015.pdf) | 97.78 | 94.03 | 96.18
[Andor et al. (2016)](http://arxiv.org/abs/1603.06042)* | 97.77 | 94.80 | 96.86
[Andor et al. (2016)](http://arxiv.org/abs/1603.06042)* | 97.77 | 94.80 | 96.86
Parsey McParseface | 97.52 | 94.24 | 96.45
The first part of this tutorial describes how to install the necessary tools and
......@@ -78,10 +78,16 @@ source. You'll need to install:
* python 2.7:
* python 3 support is not available yet
* pip (python package manager)
* `apt-get install python-pip` on Ubuntu
* `brew` installs pip along with python on OSX
* bazel:
* **versions 0.2.0 - 0.2.2b, NOT 0.2.3**
* **versions 0.3.0 - 0.3.1**
* follow the instructions [here](http://bazel.io/docs/install.html)
* Alternately, Download bazel (0.2.2-0.2.2b) <.deb> from [here](https://github.com/bazelbuild/bazel/releases) for your system configuration.
* Alternatively, download the bazel <.deb> from
[https://github.com/bazelbuild/bazel/releases]
(https://github.com/bazelbuild/bazel/releases) for your system
configuration.
* Install it using the command: sudo dpkg -i <.deb file>
* Check for the bazel version by typing: bazel version
* swig:
......@@ -94,12 +100,14 @@ source. You'll need to install:
* `pip install asciitree`
* numpy, package for scientific computing:
* `pip install numpy`
* mock, package for unit testing:
* `pip install mock`
Once you completed the above steps, you can build and test SyntaxNet with the
following commands:
```shell
git clone --recursive --recurse-submodules https://github.com/tensorflow/models.git
git clone --recursive https://github.com/tensorflow/models.git
cd models/syntaxnet/tensorflow
./configure
cd ..
......
local_repository(
name = "org_tensorflow",
path = __workspace_dir__ + "/tensorflow",
path = "tensorflow",
)
load('//tensorflow/tensorflow:workspace.bzl', 'tf_workspace')
tf_workspace("tensorflow/", "@org_tensorflow")
load('@org_tensorflow//tensorflow:workspace.bzl', 'tf_workspace')
tf_workspace()
# Specify the minimum required Bazel version.
load("@org_tensorflow//tensorflow:tensorflow.bzl", "check_version")
check_version("0.2.0")
# ===== gRPC dependencies =====
bind(
name = "libssl",
actual = "@ts_boringssl_git//:ssl",
)
git_repository(
name = "ts_boringssl_git",
commit = "436432d849b83ab90f18773e4ae1c7a8f148f48d",
init_submodules = True,
remote = "https://github.com/mdsteele/boringssl-bazel.git",
)
bind(
name = "zlib",
actual = "@ts_zlib_archive//:zlib",
)
new_http_archive(
name = "ts_zlib_archive",
build_file = "zlib.BUILD",
sha256 = "879d73d8cd4d155f31c1f04838ecd567d34bebda780156f0e82a20721b3973d5",
strip_prefix = "zlib-1.2.8",
url = "http://zlib.net/zlib128.zip",
)
check_version("0.3.0")
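The WORKSPACE hunk above is easier to read as the file it produces: the `__workspace_dir__` prefix and the explicit gRPC/boringssl/zlib declarations are deleted (the newer `tf_workspace()` macro supplies those dependencies itself), leaving roughly this minimal file (a sketch assembled from the added lines, not a verbatim copy):

```python
local_repository(
    name = "org_tensorflow",
    path = "tensorflow",
)

load('@org_tensorflow//tensorflow:workspace.bzl', 'tf_workspace')
tf_workspace()

# Specify the minimum required Bazel version.
load("@org_tensorflow//tensorflow:tensorflow.bzl", "check_version")
check_version("0.3.0")
```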
......@@ -78,7 +78,7 @@ cc_library(
hdrs = ["base.h"],
visibility = ["//visibility:public"],
deps = [
"@re2//:re2",
"@com_googlesource_code_re2//:re2",
"@protobuf//:protobuf",
"@org_tensorflow//third_party/eigen3",
] + select({
......
......@@ -35,7 +35,6 @@ limitations under the License.
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/framework/tensor_shape.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/lib/io/inputbuffer.h"
#include "tensorflow/core/platform/env.h"
using tensorflow::DEVICE_CPU;
......
......@@ -18,7 +18,6 @@
import os.path
import time
import tensorflow as tf
from tensorflow.python.framework import test_util
......
......@@ -40,9 +40,11 @@ flags.DEFINE_string('corpus_name', 'stdin-conll',
def to_dict(sentence):
"""Builds a dictionary representing the parse tree of a sentence.
Note that the suffix "@id" (where 'id' is a number) is appended to each element
to handle the sentence that has multiple elements with identical representation.
Those suffix needs to be removed after the asciitree is rendered.
Note that the suffix "@id" (where 'id' is a number) is appended to each
element to handle sentences that have multiple elements with identical
representations. These suffixes need to be removed after the asciitree is
rendered.
Args:
sentence: Sentence protocol buffer to represent.
......@@ -54,7 +56,8 @@ def to_dict(sentence):
root = -1
for i in range(0, len(sentence.token)):
token = sentence.token[i]
token_str.append('%s %s %s @%d' % (token.word, token.tag, token.label, (i+1)))
token_str.append('%s %s %s @%d' %
(token.word, token.tag, token.label, (i+1)))
if token.head == -1:
root = i
else:
......@@ -88,7 +91,7 @@ def main(unused_argv):
print 'Input: %s' % sentence.text
print 'Parse:'
tr_str = tr(d)
pat = re.compile('\s*@\d+$')
pat = re.compile(r'\s*@\d+$')
for tr_ln in tr_str.splitlines():
print pat.sub('', tr_ln)
......
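The raw-string change above is the whole fix: `'\s*@\d+$'` relies on Python passing unrecognized escape sequences through unchanged, which newer Pythons warn about, while `r'\s*@\d+$'` is unambiguous. A small self-contained sketch of the suffix-stripping step (the rendered token strings are made up for illustration):

```python
import re

# Same pattern as in the diff: strip the " @<id>" disambiguation suffix
# that to_dict() appends to each rendered tree element.
pat = re.compile(r'\s*@\d+$')

# Hypothetical rendered lines in "word tag label @id" form:
rendered = ['saw VBD ROOT @2', 'John NNP nsubj @1']
cleaned = [pat.sub('', line) for line in rendered]  # suffixes removed
```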
......@@ -86,6 +86,10 @@ input {
name: 'category-map'
creator: 'brain_pos/greedy'
}
input {
name: 'char-map'
creator: 'brain_pos/greedy'
}
input {
name: 'prefix-table'
creator: 'brain_pos/greedy'
......
......@@ -25,7 +25,7 @@ limitations under the License.
#include "syntaxnet/registry.h"
#include "syntaxnet/sentence.pb.h"
#include "syntaxnet/task_context.h"
#include "tensorflow/core/lib/io/inputbuffer.h"
#include "tensorflow/core/lib/io/buffered_inputstream.h"
namespace syntaxnet {
......@@ -42,7 +42,7 @@ class DocumentFormat : public RegisterableClass<DocumentFormat> {
// Reads a record from the given input buffer with format specific logic.
// Returns false if no record could be read because we reached end of file.
virtual bool ReadRecord(tensorflow::io::InputBuffer *buffer,
virtual bool ReadRecord(tensorflow::io::BufferedInputStream *buffer,
string *record) = 0;
// Converts a key/value pair to one or more documents.
......
......@@ -50,7 +50,6 @@ limitations under the License.
#include "syntaxnet/workspace.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/lib/core/stringpiece.h"
#include "tensorflow/core/lib/io/inputbuffer.h"
#include "tensorflow/core/lib/io/record_reader.h"
#include "tensorflow/core/lib/io/record_writer.h"
#include "tensorflow/core/lib/strings/strcat.h"
......
......@@ -256,7 +256,7 @@ class GreedyParser(object):
self.params[name])
def GetStep(self):
def OnesInitializer(shape, dtype=tf.float32):
def OnesInitializer(shape, dtype=tf.float32, partition_info=None):
return tf.ones(shape, dtype)
return self._AddVariable([], tf.int32, 'step', OnesInitializer)
......@@ -475,7 +475,7 @@ class GreedyParser(object):
def AddPretrainedEmbeddings(self, index, embeddings_path, task_context):
"""Embeddings at the given index will be set to pretrained values."""
def _Initializer(shape, dtype=tf.float32):
def _Initializer(shape, dtype=tf.float32, partition_info=None):
unused_dtype = dtype
t = gen_parser_ops.word_embedding_initializer(
vectors=embeddings_path,
......
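Both initializer edits in this file are the same one-line signature fix: newer TensorFlow passes an extra `partition_info` keyword argument when it invokes an initializer, so custom initializers must accept it, and a default of `None` keeps older call sites working. The pattern in a TensorFlow-free sketch (names are illustrative):

```python
# Plain-Python sketch of the signature change (no TensorFlow required).
def ones_initializer(shape, dtype=float, partition_info=None):
    """Return a flat list of ones for a 1-D shape (stand-in for tf.ones)."""
    return [dtype(1)] * shape[0]

def create_variable(shape, initializer):
    # Mimics the newer framework call site, which passes partition_info;
    # an initializer without that parameter would raise a TypeError here.
    return initializer(shape, partition_info=None)

value = create_variable([3], ones_initializer)
```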
......@@ -18,7 +18,6 @@
# disable=no-name-in-module,unused-import,g-bad-import-order,maybe-no-member
import os.path
import tensorflow as tf
from tensorflow.python.framework import test_util
......@@ -221,7 +220,7 @@ class GraphBuilderTest(test_util.TensorFlowTestCase):
with self.test_session(graph=graph1) as sess:
sess.run(parser.inits.values())
metrics1 = None
for _ in range(500):
for _ in range(50):
cost1, _ = sess.run([parser.training['cost'],
parser.training['train_op']])
em1 = parser.evaluation['eval_metrics'].eval()
......@@ -240,7 +239,7 @@ class GraphBuilderTest(test_util.TensorFlowTestCase):
with self.test_session(graph=graph2) as sess:
sess.run(parser.inits.values())
metrics2 = None
for _ in range(500):
for _ in range(50):
cost2, _ = sess.run([parser.training['cost'],
parser.training['train_op']])
em2 = parser.evaluation['eval_metrics'].eval()
......
......@@ -19,7 +19,6 @@
# disable=no-name-in-module,unused-import,g-bad-import-order,maybe-no-member
import os.path
import tensorflow as tf
import syntaxnet.load_parser_ops
......
......@@ -19,7 +19,6 @@
import os
import os.path
import time
import tempfile
import tensorflow as tf
......
......@@ -20,7 +20,6 @@
import os
import os.path
import time
import tensorflow as tf
from tensorflow.python.platform import gfile
......
......@@ -17,12 +17,9 @@
# This test trains a parser on a small dataset, then runs it in greedy mode and
# in structured mode with beam 1, and checks that the result is identical.
set -eux
BINDIR=$TEST_SRCDIR/syntaxnet
BINDIR=$TEST_SRCDIR/$TEST_WORKSPACE/syntaxnet
CONTEXT=$BINDIR/testdata/context.pbtxt
TMP_DIR=/tmp/syntaxnet-output
......
......@@ -32,7 +32,8 @@ limitations under the License.
#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/lib/core/stringpiece.h"
#include "tensorflow/core/lib/io/inputbuffer.h"
#include "tensorflow/core/lib/io/buffered_inputstream.h"
#include "tensorflow/core/lib/io/random_inputstream.h"
#include "tensorflow/core/lib/io/record_reader.h"
#include "tensorflow/core/lib/io/record_writer.h"
#include "tensorflow/core/lib/strings/strcat.h"
......@@ -181,22 +182,27 @@ class TextReader {
if (filename_ == "-") {
static const int kInputBufferSize = 8 * 1024; /* bytes */
file_.reset(new StdIn());
buffer_.reset(
new tensorflow::io::InputBuffer(file_.get(), kInputBufferSize));
stream_.reset(new tensorflow::io::RandomAccessInputStream(file_.get()));
buffer_.reset(new tensorflow::io::BufferedInputStream(file_.get(),
kInputBufferSize));
} else {
static const int kInputBufferSize = 1 * 1024 * 1024; /* bytes */
TF_CHECK_OK(
tensorflow::Env::Default()->NewRandomAccessFile(filename_, &file_));
buffer_.reset(
new tensorflow::io::InputBuffer(file_.get(), kInputBufferSize));
stream_.reset(new tensorflow::io::RandomAccessInputStream(file_.get()));
buffer_.reset(new tensorflow::io::BufferedInputStream(file_.get(),
kInputBufferSize));
}
}
private:
string filename_;
int sentence_count_ = 0;
std::unique_ptr<tensorflow::RandomAccessFile> file_; // must outlive buffer_
std::unique_ptr<tensorflow::io::InputBuffer> buffer_;
std::unique_ptr<tensorflow::RandomAccessFile>
file_; // must outlive buffer_, stream_
std::unique_ptr<tensorflow::io::RandomAccessInputStream>
stream_; // Must outlive buffer_
std::unique_ptr<tensorflow::io::BufferedInputStream> buffer_;
std::unique_ptr<DocumentFormat> format_;
};
......
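The TextReader change replaces `tensorflow::io::InputBuffer` with a `BufferedInputStream` layered over a `RandomAccessInputStream`, and the member declaration order is chosen so the file outlives both wrappers. The layering idea, in a small Python sketch (class names and sizes are illustrative, not TF's API):

```python
# Illustrative layering: a random-access source wrapped by a buffered
# sequential reader, mirroring RandomAccessInputStream + BufferedInputStream.
class RandomAccessSource:
    def __init__(self, data: bytes):
        self.data = data

    def read_at(self, offset: int, n: int) -> bytes:
        return self.data[offset:offset + n]

class BufferedStream:
    """Reads lines sequentially from a RandomAccessSource in fixed chunks."""
    def __init__(self, source: RandomAccessSource, buffer_size: int = 8):
        self.source, self.buffer_size, self.pos = source, buffer_size, 0

    def read_line(self) -> bytes:
        line = b''
        while True:
            chunk = self.source.read_at(self.pos, self.buffer_size)
            if not chunk:
                return line  # end of input
            nl = chunk.find(b'\n')
            if nl >= 0:
                self.pos += nl + 1  # consume through the newline
                return line + chunk[:nl]
            self.pos += len(chunk)
            line += chunk

stream = BufferedStream(RandomAccessSource(b'hello\nworld\n'))
```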
......@@ -35,7 +35,6 @@ limitations under the License.
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/framework/tensor_shape.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/lib/io/inputbuffer.h"
#include "tensorflow/core/lib/io/table.h"
#include "tensorflow/core/lib/io/table_options.h"
#include "tensorflow/core/lib/strings/stringprintf.h"
......
......@@ -17,12 +17,10 @@
import os.path
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import test_util
from tensorflow.python.ops import control_flow_ops as cf
from tensorflow.python.platform import googletest
from tensorflow.python.platform import tf_logging as logging
......@@ -164,7 +162,9 @@ class ParsingReaderOpsTest(test_util.TensorFlowTestCase):
loop_vars = [epoch, num_actions]
res = sess.run(
cf.While(Condition, Body, loop_vars, parallel_iterations=1))
tf.while_loop(Condition, Body, loop_vars,
shape_invariants=[tf.TensorShape(None)] * 2,
parallel_iterations=1))
logging.info('Result: %s', res)
self.assertEqual(res[0], 2)
......
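The test migrates from the removed `cf.While` to the public `tf.while_loop`; `shape_invariants=[tf.TensorShape(None)] * 2` declares that both loop variables may change shape between iterations, which the old op did not require. The control flow being exercised reduces to this plain-Python sketch (the cond/body functions here are illustrative, not the test's real ones):

```python
# Minimal model of tf.while_loop semantics: apply body while cond holds,
# threading the loop variables through each iteration.
def while_loop(cond, body, loop_vars):
    while cond(*loop_vars):
        loop_vars = body(*loop_vars)
    return loop_vars

# Illustrative loop variables: an epoch counter and a growing action list
# (a shape that changes per iteration is why shape_invariants is needed).
res = while_loop(lambda epoch, actions: epoch < 2,
                 lambda epoch, actions: (epoch + 1, actions + [epoch]),
                 (0, []))
```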
......@@ -18,6 +18,7 @@ limitations under the License.
#include <memory>
#include <string>
#include <utility>
#include <vector>
#include "syntaxnet/embedding_feature_extractor.h"
......@@ -38,7 +39,7 @@ class SentenceBatch {
public:
SentenceBatch(int batch_size, string input_name)
: batch_size_(batch_size),
input_name_(input_name),
input_name_(std::move(input_name)),
sentences_(batch_size) {}
// Initializes all resources and opens the corpus file.
......