Commit 1ec18fa

author
matteo
committed
update README
1 parent aebd050 commit 1ec18fa

1 file changed

+53
-46
lines changed

README.md

Lines changed: 53 additions & 46 deletions
Original file line number | Diff line number | Diff line change
@@ -29,80 +29,81 @@ The training set is available at `train.json`. Each document contains a number o
2929
{
3030
"fold": 2,
3131
"documentId": "8313",
32-
"documentText": "Gennaro Basile was an Italian painter, born in Naples but active in the German-speaking countries. He settled at Brünn, in Moravia, and lived about 1756. His best picture is the altar-piece in the chapel of the chateau at Seeberg, in Salzburg. Most of his works remained in Moravia.",
32+
"source": "DBpedia Abstract",
33+
"documentText": "Gennaro Basile\n\nGennaro Basile was an Italian painter, born in Naples but active in the German-speaking countries. He settled at Brünn, in Moravia, and lived about 1756. His best picture is the altar-piece in the chapel of the chateau at Seeberg, in Salzburg. Most of his works remained in Moravia.",
3334
"passages": [
3435
{
35-
"passageId": "8313:0:98",
36+
"passageId": "8313:16:114",
37+
"passageStart": 16,
38+
"passageEnd": 114,
39+
"passageText": "Gennaro Basile was an Italian painter, born in Naples but active in the German-speaking countries.",
3640
"exhaustivelyAnnotatedProperties": [
37-
{
41+
{
3842
"propertyId": "12",
3943
"propertyName": "PLACE_OF_BIRTH",
4044
"propertyDescription": "Describes the relationship between a person and the location where she/he was born."
4145
}
4246
],
43-
"passageStart": 0,
44-
"passageEnd": 98,
45-
"passageText": "Gennaro Basile was an Italian painter, born in Naples but active in the German-speaking countries.",
4647
"facts": [
4748
{
48-
"factId": "8313:0:14:47:53:12",
49+
"factId": "8313:16:30:63:69:12",
4950
"propertyId": "12",
5051
"humanReadable": "<Gennaro Basile> <PLACE_OF_BIRTH> <Naples>",
5152
"annotatedPassage": "<Gennaro Basile> was an Italian painter, born in <Naples> but active in the German-speaking countries.",
52-
"subjectStart": 0,
53-
"subjectEnd": 14,
53+
"subjectStart": 16,
54+
"subjectEnd": 30,
5455
"subjectText": "Gennaro Basile",
5556
"subjectUri": "http://www.wikidata.org/entity/Q19517888",
56-
"objectStart": 47,
57-
"objectEnd": 53,
57+
"objectStart": 63,
58+
"objectEnd": 69,
5859
"objectText": "Naples",
5960
"objectUri": "http://www.wikidata.org/entity/Q2634"
6061
}
6162
]
6263
},
6364
{
64-
"passageId": "8313:99:153",
65+
"passageId": "8313:115:169",
66+
"passageStart": 115,
67+
"passageEnd": 169,
68+
"passageText": "He settled at Brünn, in Moravia, and lived about 1756.",
6569
"exhaustivelyAnnotatedProperties": [
66-
{
67-
"propertyId": "12",
68-
"propertyName": "PLACE_OF_BIRTH",
69-
"propertyDescription": "Describes the relationship between a person and the location where she/he was born."
70-
},
7170
{
7271
"propertyId": "11",
7372
"propertyName": "PLACE_OF_RESIDENCE",
7473
"propertyDescription": "Describes the relationship between a person and the location where she/he lives/lived."
74+
},
75+
{
76+
"propertyId": "12",
77+
"propertyName": "PLACE_OF_BIRTH",
78+
"propertyDescription": "Describes the relationship between a person and the location where she/he was born."
7579
}
7680
],
77-
"passageStart": 99,
78-
"passageEnd": 153,
79-
"passageText": "He settled at Brünn, in Moravia, and lived about 1756.",
8081
"facts": [
8182
{
82-
"factId": "8313:99:101:113:118:11",
83+
"factId": "8313:115:117:129:134:11",
8384
"propertyId": "11",
8485
"humanReadable": "<He> <PLACE_OF_RESIDENCE> <Brünn>",
8586
"annotatedPassage": "<He> settled at <Brünn>, in Moravia, and lived about 1756.",
86-
"subjectStart": 99,
87-
"subjectEnd": 101,
87+
"subjectStart": 115,
88+
"subjectEnd": 117,
8889
"subjectText": "He",
8990
"subjectUri": "http://www.wikidata.org/entity/Q19517888",
90-
"objectStart": 113,
91-
"objectEnd": 118,
91+
"objectStart": 129,
92+
"objectEnd": 134,
9293
"objectText": "Brünn",
9394
"objectUri": "http://www.wikidata.org/entity/Q14960"
9495
},
9596
{
96-
"factId": "8313:99:101:123:130:11",
97+
"factId": "8313:115:117:139:146:11",
9798
"propertyId": "11",
9899
"humanReadable": "<He> <PLACE_OF_RESIDENCE> <Moravia>",
99100
"annotatedPassage": "<He> settled at Brünn, in <Moravia>, and lived about 1756.",
100-
"subjectStart": 99,
101-
"subjectEnd": 101,
101+
"subjectStart": 115,
102+
"subjectEnd": 117,
102103
"subjectText": "He",
103104
"subjectUri": "http://www.wikidata.org/entity/Q19517888",
104-
"objectStart": 123,
105-
"objectEnd": 130,
105+
"objectStart": 139,
106+
"objectEnd": 146,
106107
"objectText": "Moravia",
107108
"objectUri": "http://www.wikidata.org/entity/Q43266"
108109
}
@@ -117,40 +118,46 @@ The training set is available at `train.json`. Each document contains a number o
117118
The official evaluation script is also available for download and can be used to evaluate a system using the training set (via cross-validation). The script takes a gold standard file (e.g., `train.json`) and a prediction file (which needs to be produced by the system). The prediction file should look exactly like the gold standard file (same documents and fields), except for the contents of `facts` (which should contain the facts predicted by the system).
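
As a rough illustration of the paragraph above, the following sketch copies each gold document and replaces only the contents of `facts`. It assumes the gold file stores one JSON document per line, which this README does not state, so adapt the reading and writing code if the file is laid out differently. `predict_facts` is a hypothetical stand-in for your own system and is not part of the evaluation script.

```python
import copy
import json


def make_prediction_document(gold_doc, predict_facts):
    """Copy a gold document and replace the facts of every passage.

    `predict_facts` is a hypothetical callable: it receives a passage dict
    and returns the list of fact dicts predicted for that passage.
    """
    predicted = copy.deepcopy(gold_doc)  # keeps identifiers and all other fields
    for passage in predicted["passages"]:
        passage["facts"] = predict_facts(passage)
    return predicted


def write_predictions(gold_path, prediction_path, predict_facts):
    # Assumption: one JSON document per line in the gold file.
    with open(gold_path, encoding="utf-8") as gold, \
            open(prediction_path, "w", encoding="utf-8") as out:
        for line in gold:
            if line.strip():
                doc = make_prediction_document(json.loads(line), predict_facts)
                out.write(json.dumps(doc, ensure_ascii=False) + "\n")
```

Copying the gold document first, rather than building a new one, is a simple way to keep document and passage identifiers unchanged.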
118119

119120
```
120-
usage: evaluator.py [-h] [-e {span_e,span_o,uri}] [-c] [-a ANALYSISPATH] [-f {1,2,3,4,5}]
121+
usage: evaluator.py [-h] [-e {span_exact,span_overlap,uri}] [-c]
122+
[-a ANALYSISPATH] [-f {1,2,3,4,5,-1}]
121123
goldFile predictionFile
122124
123-
mandatory arguments:
124-
goldFile Path of the KnowledgeNet file with the gold data
125-
predictionFile Path of the KnowledgeNet file with the predicted data
126-
-e {span_e,span_o,uri} Choose the evaluation method: span-exact vs span-overlap vs uri
127-
128-
optional arguments:
129-
-h, --help show this help message and exit
130-
-c print raw counts of tp/fn/fp for prec/rec/F1 metrics
131-
-a ANALYSISPATH Folder to store error analysis files (default=no analysis).
132-
-f {1,2,3,4,5} folds to evaluate (useful during cross-validation). Default is 4.
125+
positional arguments:
126+
goldFile path of the KnowledgeNet file with the gold data
127+
predictionFile path of the KnowledgeNet file with the predicted data
133128
129+
optional arguments:
130+
-h, --help show this help message and exit
131+
-e {span_exact,span_overlap,uri} choose the evaluation method: span-exact vs span-overlap vs uri
132+
-c print raw counts of tp/fn/fp for prec/rec/F1 metrics
133+
-a ANALYSISPATH folder to store error analysis and results files
134+
(default=no analysis).
135+
-f {1,2,3,4,5,-1} folds to evaluate. Default is 4. Choose -1 to evaluate on all the folds.
134136
```
135137

136138
The prediction file has to keep the same unique identifiers and attributes for the given documents and passages.
137-
Each new fact has to be described by a `factId` (obtained as explained above) and should contain the following attributes that are needed to run the evaluation script:
139+
Each new fact must contain the following attributes that are needed to run the evaluation script:
138140
* `subjectStart`
139141
* `subjectEnd`
140142
* `objectStart`
141143
* `objectEnd`
142-
* `subjectUri`
143-
* `objectUri`
144+
* `subjectUri` (can be empty)
145+
* `objectUri` (can be empty)
144146
* `propertyId`
145147

148+
A `factId` will be automatically generated from these attributes.
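
The identifiers in the example above (such as `8313:16:30:63:69:12`) appear to join the document identifier, the four character offsets, and the property identifier with colons. The sketch below reproduces that pattern; it is inferred from the example only, and `make_fact_id` is a hypothetical helper, not a function of the evaluation script.

```python
def make_fact_id(document_id, fact):
    """Build an identifier following the pattern seen in the example facts.

    Assumed layout (inferred from the sample data, not from the evaluator):
    documentId:subjectStart:subjectEnd:objectStart:objectEnd:propertyId
    """
    parts = [
        document_id,
        fact["subjectStart"],
        fact["subjectEnd"],
        fact["objectStart"],
        fact["objectEnd"],
        fact["propertyId"],
    ]
    return ":".join(str(part) for part in parts)


example_fact = {
    "subjectStart": 16,
    "subjectEnd": 30,
    "objectStart": 63,
    "objectEnd": 69,
    "subjectUri": "",  # URIs can be empty
    "objectUri": "",
    "propertyId": "12",
}
print(make_fact_id("8313", example_fact))  # 8313:16:30:63:69:12
```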
149+
146150
#### Evaluation Methods
147151
Two facts are considered the same when they have the same property, and there is a match between the values for subject and object.
148152

149153
We consider three different methods to establish if there is a match:
150-
* **Span Overlap** (`span_o`): there is an overlap between the character offsets
151-
* **Span Exact** (`span_e`): the character offsets are exactly the same
154+
* **Span Overlap** (`span_overlap`): there is an overlap between the character offsets (set as default in the evaluation script)
155+
* **Span Exact** (`span_exact`): the character offsets are exactly the same
152156
* **URI** (`uri`): Wikidata URIs are the same (only applies to facts that have URIs for both the subject and the object)
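
The three methods can be sketched as follows. This is an illustration under assumptions, not the evaluator's code: it treats offsets as half-open `[start, end)` ranges (consistent with the example, where `Gennaro Basile` spans 16 to 30) and counts any shared character as an overlap. `facts_match` is a hypothetical helper.

```python
def spans_exact(a, b):
    """True when two (start, end) spans are identical."""
    return a == b


def spans_overlap(a, b):
    """True when two half-open (start, end) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def facts_match(gold, pred, method="span_overlap"):
    """Compare two fact dicts with one of: span_overlap, span_exact, uri."""
    if gold["propertyId"] != pred["propertyId"]:
        return False
    if method == "uri":
        uris = [gold.get("subjectUri"), gold.get("objectUri"),
                pred.get("subjectUri"), pred.get("objectUri")]
        if not all(uris):  # URI matching needs URIs on both subject and object
            return False
        return (gold["subjectUri"] == pred["subjectUri"]
                and gold["objectUri"] == pred["objectUri"])
    compare = spans_exact if method == "span_exact" else spans_overlap
    return (
        compare((gold["subjectStart"], gold["subjectEnd"]),
                (pred["subjectStart"], pred["subjectEnd"]))
        and compare((gold["objectStart"], gold["objectEnd"]),
                    (pred["objectStart"], pred["objectEnd"]))
    )
```

For instance, a prediction that marks only `Basile` (offsets 24 to 30) as the subject would match the gold fact under `span_overlap` but not under `span_exact`.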
153157

158+
#### Error Analysis
159+
In order to facilitate error analysis the script creates a simple html file for browser visualization. It can be enabled using the option `-a`.
160+
154161
## Adding a system to the leaderboard
155162

156163
To preserve the integrity of the results, we have released the test set (fifth fold) without annotations (`test-no-facts.json`). To evaluate the results of your system and (optionally) add your system to the leaderboard, please send an email with your prediction file to filipe[at]diffbot[dot]com.
