# nereval

Evaluation script for named entity recognition (NER) systems based on entity-level F1 score.

## Definition

The metric as implemented here has been described by Nadeau and Sekine (2007) and was widely used as part of the Message Understanding Conferences (Grishman and Sundheim, 1996). It evaluates an NER system along two axes: whether it is able to assign the right type to an entity, and whether it finds the exact entity boundaries. For both axes, the number of correct predictions (COR), the number of actual predictions (ACT), and the number of possible predictions (POS) are computed. From these statistics, precision and recall can be derived:

```
precision = COR/ACT
recall = COR/POS
```

The final score is the micro-averaged F1 measure of precision and recall across both the type and boundary axes.
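
For concreteness, here is a minimal sketch of how these counts combine into the final score. It is an illustration rather than the script's internal API, and it assumes COR, ACT, and POS are each summed over the type and boundary axes first:

```py
def micro_f1(cor, act, pos):
    """Micro-averaged F1 from COR/ACT/POS counts summed over both axes."""
    precision = cor / act if act else 0.0
    recall = cor / pos if pos else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 1 correct prediction out of 4 actual and 2 possible.
print('%.2f' % micro_f1(1, 4, 2))  # 0.33
```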

## Usage

The script can either be used from within Python or from the command line when classification results have been written to a JSON file.

### Usage from Command Line

Assume we have the following classification results in `examples/input.json`:

```json
[
  {
    "text": "CILINDRISCHE PLUG",
    "true": [
      {
        "text": "CILINDRISCHE PLUG",
        "type": "Productname",
        "start": 0
      }
    ],
    "predicted": [
      {
        "text": "CILINDRISCHE",
        "type": "Productname",
        "start": 0
      },
      {
        "text": "PLUG",
        "type": "Productname",
        "start": 13
      }
    ]
  }
]
```

Then the script can be executed as follows:

```sh
python nereval.py examples/input.json
F1-score: 0.33
```
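
For reference, here is one way to reconstruct this score from the counts defined above; this is our reading of the example, with the type axis crediting the prediction that starts at the true entity's position and the boundary axis crediting nothing:

```
COR = 1  (correct type for "CILINDRISCHE"; no exact boundary match)
ACT = 4  (2 predictions x 2 axes)
POS = 2  (1 true entity x 2 axes)

precision = 1/4
recall = 1/2
F1 = 2 * (1/4 * 1/2) / (1/4 + 1/2) = 0.33
```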

### Usage from Python

Alternatively, the evaluation metric can be invoked directly from within Python. Example:

```py
import nereval
from nereval import Entity

# Ground-truth:
# CILINDRISCHE PLUG
y_true = [
    Entity('CILINDRISCHE PLUG', 'Productname', 0)
]

# Prediction:
y_pred = [
    Entity('CILINDRISCHE', 'Productname', 0),
    Entity('PLUG', 'Productname', 13)
]

score = nereval.evaluate([y_true], [y_pred])
print('F1-score: %.2f' % score)
# F1-score: 0.33
```
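
Since `evaluate` takes parallel lists of per-text entity lists (hence the `[y_true]` and `[y_pred]` wrapping above), several texts can be scored in one micro-averaged call. A minimal sketch follows; the second text with its `M10`/`Dimension` entity is made up purely for illustration:

```py
import nereval
from nereval import Entity

# One entity list per text; the second text is hypothetical.
truths = [
    [Entity('CILINDRISCHE PLUG', 'Productname', 0)],
    [Entity('M10', 'Dimension', 0)],
]
predictions = [
    [Entity('CILINDRISCHE', 'Productname', 0), Entity('PLUG', 'Productname', 13)],
    [Entity('M10', 'Dimension', 0)],
]

# The score is micro-averaged over all texts.
print('F1-score: %.2f' % nereval.evaluate(truths, predictions))
```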

## Note on Symmetry

The metric itself is not symmetric due to the inherent problem of word overlaps in NER, so `evaluate(y_true, y_pred) != evaluate(y_pred, y_true)`. This becomes apparent if we consider the following example (the tagger uses a BIO scheme):

```
# Example 1:
Predicted: B_PROD I_PROD B_PROD B_DIM O

Correct Text: 2
Correct Type: 3
```
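
To check this for a concrete pair of entity lists from Python, the two call directions can simply be compared. Below is a sketch reusing the entities from the usage example; whether the two scores actually differ depends on the overlap structure of the entities:

```py
import nereval
from nereval import Entity

y_true = [Entity('CILINDRISCHE PLUG', 'Productname', 0)]
y_pred = [Entity('CILINDRISCHE', 'Productname', 0),
          Entity('PLUG', 'Productname', 13)]

# Swapping the arguments swaps the roles of ACT and POS, which can
# change the score when entities only partially overlap.
print('true vs. pred: %.2f' % nereval.evaluate([y_true], [y_pred]))
print('pred vs. true: %.2f' % nereval.evaluate([y_pred], [y_true]))
```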

## Notes and References

Used in a student research project on natural language processing at the [University of Twente, Netherlands](https://www.utwente.nl).

**References**

* Grishman, R., & Sundheim, B. (1996). [Message Understanding Conference-6: A brief history](http://www.aclweb.org/anthology/C96-1079). In *COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics* (Vol. 1).
* Nadeau, D., & Sekine, S. (2007). [A survey of named entity recognition and classification](http://www.jbe-platform.com/content/journals/10.1075/li.30.1.03nad). *Lingvisticae Investigationes*, 30(1), 3-26.