@@ -1204,16 +1204,19 @@ sake of readability).
1204
1204
<term >CoNLL-U</term >:
1205
1205
<eg xml : space =" preserve" >1 They they PRON _ _ 2 nsubj _ _
1206
1206
2 buy buy VERB _ _ 0 root _ _
1207
- 3 books book NOUN _ _ 2 obj _ _
1207
+ 3 books book NOUN _ _ 2 obj _ SpaceAfter=No
1208
1208
4 . . PUNCT _ _ 2 punct _ _</eg >
1209
1209
In this example, the first column gives a numerical, one-based, node index
1210
1210
of the tokens in the current sentence, the second column a token of a word
1211
1211
form or a symbol, the third column the corresponding lemma, the fourth
1212
1212
column a UD part-of-speech tag, the seventh column the node index of the
1213
1213
syntactic head of the current token, and the eighth column a label of the
1214
- type of the dependency relation between the token and its head. (Empty
1215
- columns are marked with <code >_</code >.) Note that, by convention, the
1216
- index for the (unrepresented) root node is 0.</p >
1214
+ type of the dependency relation between the token and its head. The very
1215
+ last, tenth, column can be used for miscellaneous information, such as
1216
+ whether or not a space follows the token when the tokens are joined into text.
1217
+ Empty columns are marked with <code >_</code >. Note that, by convention,
1218
+ the index for the unrepresented, <soCalled >virtual</soCalled > root node
1219
+ is 0.</p >
1217
1220
<p >A graphical rendition of this example is given below in terms of an
1218
1221
annotated dependency graph.</p >
1219
1222
<p ><graphic width =" 300px" url =" Images/dependency1.png" /></p >
@@ -1233,7 +1236,7 @@ sake of readability).
1233
1236
<s n =" 0" >
1234
1237
<w n =" 1" head =" 2" deprel =" nsubj" pos =" PRON" lemma =" they" >They</w >
1235
1238
<w n =" 2" head =" 0" deprel =" root" pos =" VERB" lemma =" buy" >buy</w >
1236
- <w n =" 3" head =" 2" deprel =" obj" pos =" NOUN" lemma =" book" >books</w >
1239
+ <w n =" 3" head =" 2" deprel =" obj" pos =" NOUN" lemma =" book" join = " right " >books</w >
1237
1240
<pc n =" 4" head =" 2" deprel =" punct" pos =" PUNCT" lemma =" ." >.</pc >
1238
1241
</s >
1239
1242
</egXML >
@@ -1245,22 +1248,28 @@ sake of readability).
1245
1248
index of their syntactic head. Labels for the types of the dependency
1246
1249
relation are provided as the value of the <att >deprel</att > attributes on
1247
1250
<gi >w</gi > and <gi >pc</gi > elements. Part-of-speech tags and lemmas are
1248
- given as values of <att >pos</att > and <att >lemma</att > attributes.</p >
1251
+ given as values of <att >pos</att > and <att >lemma</att > attributes. Last,
1252
+ but not least, the <att >join</att > attribute records whether a token
1253
+ attaches directly, without intervening space, to the token on its left-hand
1254
+ or right-hand side when the tokens are joined.</p >
1249
1255
1250
1256
<p >A more complex example in the CoNLL-U format is given below:<note
1251
1257
place =" bottom" >The example is taken from the documentation of the CoNLL-U
1252
- format at <ptr target =" https://universaldependencies.org/format.html" />.</note >
1258
+ format at <ptr target =" https://universaldependencies.org/format.html" />,
1259
+ with the addition of <code >SpaceAfter=No</code > <!-- in the fifth row -->
1260
+ in the last column for miscellaneous information.</note >
1253
1261
<eg xml : space =" preserve" >1 They they PRON PRP Case=Nom|Number=Plur 2 nsubj 2:nsubj|4:nsubj _
1254
1262
2 buy buy VERB VBP Number=Plur|Person=3|Tense=Pres 0 root 0:root _
1255
1263
3 and and CCONJ CC _ 4 cc 4:cc _
1256
1264
4 sell sell VERB VBP Number=Plur|Person=3|Tense=Pres 2 conj 0:root|2:conj _
1257
- 5 books book NOUN NNS Number=Plur 2 obj 2:obj|4:obj _
1265
+ 5 books book NOUN NNS Number=Plur 2 obj 2:obj|4:obj SpaceAfter=No
1258
1266
6 . . PUNCT . _ 2 punct 2:punct _</eg >
1259
1267
In this grammatical annotation of a sentence with coordination ellipsis,
1260
- there are additional columns. The fifth column provides a concurrent
1261
- part-of-speech tagging using a non-UD tagset. The sixth column adds a
1262
- pipe-separated list of morphosyntactic features. The ninth column encodes
1263
- an extended dependency structure with additional dependency relations.</p >
1268
+ there are additional non-empty columns. The fifth column provides a
1269
+ concurrent part-of-speech tagging using a non-UD tagset. The sixth column
1270
+ adds a pipe-separated list of morphosyntactic features. And the ninth
1271
+ column encodes an extended dependency structure with additional dependency
1272
+ relations.</p >
1264
1273
<p >As can be seen from the graphical rendition of this example below, there
1265
1274
are dependent nodes with multiple arcs, pointing to more than one head
1266
1275
node.</p >
@@ -1273,7 +1282,7 @@ sake of readability).
1273
1282
<w n =" 2" head =" 0" deprel =" root" pos =" VERB VBP" msd =" Number=Plur|Person=3|Tense=Pres" lemma =" buy" >buy</w >
1274
1283
<w n =" 3" head =" 4" deprel =" cc" pos =" CCONJ CC" lemma =" and" >and</w >
1275
1284
<w n =" 4" head =" 2 0" deprel =" conj root" pos =" VERB VBP" msd =" Number=Plur|Person=3|Tense=Pres" lemma =" sell" >sell</w >
1276
- <w n =" 5" head =" 2 4" deprel =" obj obj" pos =" NOUN NNS" msd =" Number=Plur" lemma =" book" >books</w >
1285
+ <w n =" 5" head =" 2 4" deprel =" obj obj" pos =" NOUN NNS" msd =" Number=Plur" lemma =" book" join = " right " >books</w >
1277
1286
<pc n =" 6" head =" 2" deprel =" punct" pos =" PUNCT ." lemma =" ." >.</pc >
1278
1287
</s >
1279
1288
</egXML >
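The correspondence this change documents, between <code>SpaceAfter=No</code> in the tenth CoNLL-U column and <code>join="right"</code> on the TEI token, can be checked with a small script. The following is only a minimal sketch and not part of the Guidelines: the function names `parse_conllu` and `to_tei` are hypothetical, the sample is assumed to be tab-separated as the CoNLL-U format requires, and choosing `pc` for tokens tagged `PUNCT` is an assumption drawn from the examples above. It covers only the simple first example (one head per token, no `msd` or enhanced dependencies).

```python
from xml.sax.saxutils import escape, quoteattr

# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# The first example from the diff, rebuilt with tab separators (assumption:
# the tabs were lost when the diff was rendered).
SAMPLE = "\n".join([
    "1\tThey\tthey\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tbuy\tbuy\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tbooks\tbook\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])


def parse_conllu(text):
    """Return one dict per token line; skip blank lines and # comments."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        rows.append(dict(zip(FIELDS, cols)))
    return rows


def to_tei(rows, n="0"):
    """Serialize parsed rows as a TEI-style <s> with <w>/<pc> children."""
    out = [f"<s n={quoteattr(n)}>"]
    for row in rows:
        # Assumption: punctuation tokens become <pc>, everything else <w>.
        tag = "pc" if row["upos"] == "PUNCT" else "w"
        attrs = [
            ("n", row["id"]),
            ("head", row["head"]),
            ("deprel", row["deprel"]),
            ("pos", row["upos"]),
            ("lemma", row["lemma"]),
        ]
        # MISC is a pipe-separated list; SpaceAfter=No maps to join="right".
        if "SpaceAfter=No" in row["misc"].split("|"):
            attrs.append(("join", "right"))
        attr_text = " ".join(f"{k}={quoteattr(v)}" for k, v in attrs)
        out.append(f"  <{tag} {attr_text}>{escape(row['form'])}</{tag}>")
    out.append("</s>")
    return "\n".join(out)


if __name__ == "__main__":
    print(to_tei(parse_conllu(SAMPLE)))
```

Running the script on the sample should yield a `<w n="3" …>` element carrying `join="right"`, matching the revised `egXML` example in this change.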