# CoNLL2017 Shared Task Instructions

We are pleased to provide a competitive baseline for the [CoNLL2017 Shared Task
on Dependency Parsing](http://universaldependencies.org/conll17/). Note that we
are providing detailed tutorials to make it easier to use DRAGNN as a platform
for improving upon the baselines.

Please see our [paper](paper.pdf) for more technical details about the model.

## Running the baselines

*   Install SyntaxNet/DRAGNN following the install instructions.
*   Download the models [here](https://drive.google.com/file/d/0BxpbZGYVZsEeSFdrUnBNMUp1YzQ/view?usp=sharing)
*   Download the contest [data and tools](http://universaldependencies.org/conll17/)
*   Run `baseline_eval.py` to apply the pre-trained tokenizer and evaluate on
    the dev set.

You should obtain the following results on the dev sets with gold
segmentation. Note: Our segmenter does not split multi-word tokens, which may
not play nice (yet) with the official evaluation script.

| Language | UAS | LAS |
| -------- | :--------: | :-------------: |
| Ancient_Greek-PROIEL | 81.52	| 76.87 |
| Ancient_Greek	| 70.96	| 65.13 |
| Arabic	| 84.79	| 78.90 |
| Basque	| 80.96	| 77.19 |
| Bulgarian	| 91.33	| 86.77 |
| Catalan	| 91.32	| 88.76 |
| Chinese	| 77.56	| 71.96 |
| Croatian	| 86.62	| 81.84 |
| Czech-CAC	| 89.99	| 86.09 |
| Czech-CLTT	| 78.25	| 73.70 |
| Czech	| 89.55	| 85.23 |
| Danish	| 84.69	| 81.36 |
| Dutch-LassySmall | 84.12	| 80.85 |
| Dutch	| 86.68	| 81.91 |
| English-LinES	| 82.43	| 78.46 |
| English-ParTUT	| 83.55	| 79.00 |
| English	| 87.60	| 84.20 |
| Estonian	| 75.77	| 67.76 |
| Finnish-FTB	| 87.54	| 83.70 |
| Finnish	| 87.05	| 83.33 |
| French-ParTUT	| 85.12	| 80.79 |
| French-Sequoia	| 87.90	| 85.74 |
| French	| 91.05	| 88.48 |
| Galician-TreeGal | 75.26	| 69.50 |
| Galician	| 84.64	| 81.58 |
| German	| 85.53	| 81.27 |
| Gothic	| 81.79	| 74.99 |
| Greek	| 86.99	| 84.23 |
| Hebrew	| 87.79	| 82.18 |
| Hindi	| 93.73	| 90.10 |
| Hungarian	| 78.68	| 73.03 |
| Indonesian	| 83.02	| 76.51 |
| Irish	| 75.02	| 65.66 |
| Italian-ParTUT	| 85.09	| 80.90 |
| Italian	| 90.73	| 87.71 |
| Japanese	| 95.33	| 93.99 |
| Kazakh	| 28.09	| 7.87 |
| Korean	| 81.21	| 76.78 |
| Latin-ITTB	| 82.86	| 78.43 |
| Latin-PROIEL	| 79.52	| 73.58 |
| Latin	| 64.72	| 54.59 |
| Latvian	| 76.17	| 70.55 |
| Norwegian-Bokmaal | 91.23	| 88.79 |
| Norwegian-Nynorsk | 89.32	| 86.67 |
| Old_Church_Slavonic | 84.96	| 79.65 |
| Persian	| 87.70	| 83.98 |
| Polish	| 91.32	| 86.83 |
| Portuguese-BR	| 92.36	| 90.60 |
| Portuguese	| 90.60	| 88.12 |
| Romanian	| 89.41	| 83.00 |
| Russian-SynTagRus | 91.51	| 89.05 |
| Russian	| 85.18	| 80.71 |
| Slovak	| 88.08	| 82.64 |
| Slovenian-SST	| 66.77	| 59.38 |
| Slovenian	| 89.85	| 87.62 |
| Spanish-AnCora | 91.02	| 88.61 |
| Spanish	| 90.32	| 87.16 |
| Swedish-LinES	| 83.67	| 78.96 |
| Swedish	| 82.45	| 78.75 |
| Turkish	| 68.81	| 60.57 |
| Ukrainian	| 72.19	| 62.79 |
| Urdu	| 85.50	| 79.19 |
| Uyghur	| 69.23	| 43.27 |
| Vietnamese	| 65.18	| 55.61 |
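
For readers new to the two metrics in the table: UAS (unlabeled attachment score) is the percentage of words whose predicted head is correct, and LAS (labeled attachment score) additionally requires the dependency label to match. The sketch below illustrates that definition on CoNLL-U text. It is not the official evaluation script and not part of `baseline_eval.py`; the function names `read_conllu` and `attachment_scores` are hypothetical, and it assumes gold and predicted files contain exactly the same words in the same order (gold segmentation). It compares the full DEPREL string and skips multi-word token ranges and empty nodes.

```python
def read_conllu(text):
    """Return a list of (head, deprel) pairs, one per syntactic word."""
    words = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines separate sentences; '#' lines are comments
        cols = line.split("\t")
        # Skip multi-word token ranges ("1-2") and empty nodes ("1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        words.append((cols[6], cols[7]))
    return words


def attachment_scores(gold_text, pred_text):
    """Return (UAS, LAS) as percentages, assuming identical segmentation."""
    gold = read_conllu(gold_text)
    pred = read_conllu(pred_text)
    if len(gold) != len(pred):
        raise ValueError("gold and predicted files must contain the same words")
    if not gold:
        raise ValueError("no words to score")
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    las_hits = sum(g == p for g, p in zip(gold, pred))        # head and label
    return 100.0 * uas_hits / len(gold), 100.0 * las_hits / len(gold)


def make_conllu(rows):
    """Build a one-sentence CoNLL-U string from (id, form, head, deprel) rows."""
    return "\n".join(
        "\t".join([i, form, "_", "_", "_", "_", head, rel, "_", "_"])
        for i, form, head, rel in rows
    ) + "\n"


if __name__ == "__main__":
    gold = make_conllu([("1", "The", "2", "det"), ("2", "dog", "3", "nsubj"),
                        ("3", "barks", "0", "root"), ("4", "loudly", "3", "advmod")])
    # One wrong label (word 2) and one wrong head (word 4).
    pred = make_conllu([("1", "The", "2", "det"), ("2", "dog", "3", "obj"),
                        ("3", "barks", "0", "root"), ("4", "loudly", "2", "advmod")])
    uas, las = attachment_scores(gold, pred)
    print(f"UAS {uas:.2f} LAS {las:.2f}")  # prints "UAS 75.00 LAS 50.00"
```

Because the sketch aligns words purely by position, it only makes sense when the predicted segmentation matches the gold one; for submissions to the shared task, use the official evaluation script instead.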

## Using DRAGNN for developing your own models

We hope that DRAGNN will be useful as a starting point for deep learning parsing
methods. We've provided a few recipes for alternative baselines sprinkled
through the tutorials and examples.