Update CoNLL evaluation table & evaluator.py

24188199 · Ivan Bogatyy · bb9e4418 · 24188199 · 24188199
Commit 24188199 authored Mar 15, 2017 by Ivan Bogatyy
Hide whitespace changes
Inline Side-by-side

Showing with 76 additions and 11 deletions

syntaxnet/dragnn/tools/evaluator.py syntaxnet/dragnn/tools/evaluator.py +1 -1

syntaxnet/g3doc/conll2017/README.md syntaxnet/g3doc/conll2017/README.md +75 -10

No files found.
--- a/syntaxnet/dragnn/tools/evaluator.py
+++ b/syntaxnet/dragnn/tools/evaluator.py
@@ -54,6 +54,7 @@ flags.DEFINE_string('timeline_output_file', '', 'Path to save timeline to. '


 def main(unused_argv):
+  tf.logging.set_verbosity(tf.logging.INFO)

  # Parse the flags containint lists, using regular expressions.
  # This matches and extracts key=value pairs.
@@ -133,7 +134,6 @@ def main(unused_argv):
    tf.logging.info('Processed %d documents in %.2f seconds.',
                    len(input_corpus), time.time() - start_time)
    pos, uas, las = evaluation.calculate_parse_metrics(input_corpus, processed)
-    print 'POS %.2f UAS %.2f LAS %.2f' % (pos, uas, las)

    if FLAGS.output_file:
      with gfile.GFile(FLAGS.output_file, 'w') as f:

--- a/syntaxnet/g3doc/conll2017/README.md
+++ b/syntaxnet/g3doc/conll2017/README.md
@@ -5,25 +5,90 @@ on Dependency Parsing](http://universaldependencies.org/conll17/). Note that we
 are providing detailed tutorials to make it easier to use DRAGNN as a platform
 for improving upon the baselines.

+Please see our [paper](paper.pdf) more technical details about the model.
+
 ## Running the baselines

-*   Install SyntaxNet/DRAGNN following the install instructions in README.md
-*   Download the models here: [link]
+*   Install SyntaxNet/DRAGNN following the install instructions.
+*   Download the models [here](https://drive.google.com/file/d/0BxpbZGYVZsEeSFdrUnBNMUp1YzQ/view?usp=sharing)
 *   Download the contest [data data and
    tools](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1976]).
 *   Run the baseline_eval.py to run the pre-trained tokenizer and evaluate on
    the dev set.

-You should obtain the following results on the dev sets:
-
-NOTE: This will be filled in when the latest model results are available.
+You should obtain the following results on the dev sets with gold
+segmentation. Note: Our segmenter does not split multi-word tokens, which may
+not play nice (yet) the official evaluation script.

-Language | No. tokens | Tokenization F1 | UAS | LAS
-------- | :--------: | :-------------: | :-: | :-:
-Chinese  | XX         | XX              | XX  | XX
+| Language | UAS | LAS |
+| -------- | :--------: | :-------------: |
+| Ancient_Greek-PROIEL | 81.52	| 76.87 |
+| Ancient_Greek	| 70.96	| 65.13 |
+| Arabic	| 84.79	| 78.90 |
+| Basque	| 80.96	| 77.19 |
+| Bulgarian	| 91.33	| 86.77 |
+| Catalan	| 91.32	| 88.76 |
+| Chinese	| 77.56	| 71.96 |
+| Croatian	| 86.62	| 81.84 |
+| Czech-CAC	| 89.99	| 86.09 |
+| Czech-CLTT	| 78.25	| 73.70 |
+| Czech	| 89.55	| 85.23 |
+| Danish	| 84.69	| 81.36 |
+| Dutch-LassySmall | 84.12	| 80.85 |
+| Dutch	| 86.68	| 81.91 |
+| English-LinES	| 82.43	| 78.46 |
+| English-ParTUT	| 83.55	| 79.00 |
+| English	| 87.60	| 84.20 |
+| Estonian	| 75.77	| 67.76 |
+| Finnish-FTB	| 87.54	| 83.70 |
+| Finnish	| 87.05	| 83.33 |
+| French-ParTUT	| 85.12	| 80.79 |
+| French-Sequoia	| 87.90	| 85.74 |
+| French	| 91.05	| 88.48 |
+| Galician-TreeGal | 75.26	| 69.50 |
+| Galician	| 84.64	| 81.58 |
+| German	| 85.53	| 81.27 |
+| Gothic	| 81.79	| 74.99 |
+| Greek	| 86.99	| 84.23 |
+| Hebrew	| 87.79	| 82.18 |
+| Hindi	| 93.73	| 90.10 |
+| Hungarian	| 78.68	| 73.03 |
+| Indonesian	| 83.02	| 76.51 |
+| Irish	| 75.02	| 65.66 |
+| Italian-ParTUT	| 85.09	| 80.90 |
+| Italian	| 90.73	| 87.71 |
+| Japanese	| 95.33	| 93.99 |
+| Kazakh	| 28.09	| 7.87 |
+| Korean	| 81.21	| 76.78 |
+| Latin-ITTB	| 82.86	| 78.43 |
+| Latin-PROIEL	| 79.52	| 73.58 |
+| Latin	| 64.72	| 54.59 |
+| Latvian	| 76.17	| 70.55 |
+| Norwegian-Bokmaal | 91.23	| 88.79 |
+| Norwegian-Nynorsk | 89.32	| 86.67 |
+| Old_Church_Slavonic | 84.96	| 79.65 |
+| Persian	| 87.70	| 83.98 |
+| Polish	| 91.32	| 86.83 |
+| Portuguese-BR	| 92.36	| 90.60 |
+| Portuguese	| 90.60	| 88.12 |
+| Romanian	| 89.41	| 83.00 |
+| Russian-SynTagRus | 91.51	| 89.05 |
+| Russian	| 85.18	| 80.71 |
+| Slovak	| 88.08	| 82.64 |
+| Slovenian-SST	| 66.77	| 59.38 |
+| Slovenian	| 89.85	| 87.62 |
+| Spanish-AnCora | 91.02	| 88.61 |
+| Spanish	| 90.32	| 87.16 |
+| Swedish-LinES	| 83.67	| 78.96 |
+| Swedish	| 82.45	| 78.75 |
+| Turkish	| 68.81	| 60.57 |
+| Ukrainian	| 72.19	| 62.79 |
+| Urdu	| 85.50	| 79.19 |
+| Uyghur	| 69.23	| 43.27 |
+| Vietnamese	| 65.18	| 55.61 |

 ## Using DRAGNN for developing your own models

 We hope that DRAGNN will be useful as a starting point for deep learning parsing
-methods. We've provided a few recipes for alternative baselines in the examples/
-directory; look for more coming soon!
+methods. We've provided a few recipes for alternative baselines sprinkled
+through the tutorials and examples.