
Commit 4c5e5f5

rename script to nereval.py and add metric definition

1 parent: eb08175

File tree

- Makefile
- README.md
- input.json
- muceval.py → nereval.py
- setup.py
- test_muceval.py → test_nereval.py

6 files changed, 61 insertions(+), 37 deletions(-)

Makefile (+2 -2)

```diff
@@ -9,7 +9,7 @@ test:
 	pytest
 
 test-coverage:
-	pytest --cov=muceval --cov-report term
+	pytest --cov=nereval --cov-report term
 
 lint:
-	pylint muceval.py || exit 0
+	pylint nereval.py || exit 0
```

README.md (+39 -21)

````diff
@@ -1,39 +1,50 @@
-# muceval
-MUC-like evaluation script for named entity recognition systems as used in the advanced research project for NLP.
+# nereval
+Evaluation script for named entity recognition (NER) systems based on entity-level F1 score.
+
+## Definition
+The metric as implemented here has been described by Nadeau and Sekine (2007) and was widely used as part of the Message Understanding Conferences (Grishman and Sundheim, 1996). It evaluates an NER system according to two axes: whether it is able to assign the right type to an entity, and whether it finds the exact entity boundaries. For both axes, the number of correct predictions (COR), the number of actual predictions (ACT) and the number of possible predictions (POS) are computed. From these statistics, precision and recall can be derived:
+
+```
+precision = COR/ACT
+recall = COR/POS
+```
+
+The final score is the micro-averaged F1 measure of precision and recall of both type and boundary axes.
 
 ## Installation
 ```sh
-git clone https://github.com/jantrienes/twente-arp-nlp-evaluation.git
-cd twente-arp-nlp-evaluation
-# either install this as module via pip
+git clone https://github.com/jantrienes/nereval.git
+cd nereval
 pip install .
-
-# or copy main python file into local project
-cp muceval.py ~/theproject
 ```
 
 ## Usage
 The script can either be used from within Python or from the command line when classification results have been written to a JSON file.
 
 ### Usage from Command Line
-Assume we have the following classification results in `input.json`:
+Assume we have the following classification results in `examples/input.json`:
 
 ```json
 [
   {
-    "text": "a b",
+    "text": "CILINDRISCHE PLUG",
     "true": [
       {
-        "text": "a",
-        "type": "NAME",
+        "text": "CILINDRISCHE PLUG",
+        "type": "Productname",
         "start": 0
       }
     ],
     "predicted": [
      {
-        "text": "a",
-        "type": "LOCATION",
+        "text": "CILINDRISCHE",
+        "type": "Productname",
         "start": 0
+      },
+      {
+        "text": "PLUG",
+        "type": "Productname",
+        "start": 13
       }
     ]
   }
@@ -43,16 +54,16 @@ Assume we have the following classification results in `input.json`:
 Then the script can be executed as follows:
 
 ```sh
-python muceval.py input.json
-F1-score: 0.50
+python nereval.py examples/input.json
+F1-score: 0.33
 ```
 
 ### Usage from Python
 Alternatively, the evaluation metric can be directly invoked from within python. Example:
 
 ```py
-import muceval
-from muceval import Entity
+import nereval
+from nereval import Entity
 
 # Ground-truth:
 # CILINDRISCHE PLUG
@@ -71,13 +82,13 @@ y_pred = [
     Entity('PLUG', 'Productname', 13)
 ]
 
-score = muceval.evaluate([y_true], [y_pred])
+score = nereval.evaluate([y_true], [y_pred])
 print('F1-score: %.2f' % score)
 F1-score: 0.33
 ```
 
-## Important Note on Symmetry
-The metric itself is not symmetric due to the inherent problem of word overlaps in NER. So `evaluate(y_true, y_pred) != evaluate(y_pred, y_true)`. This comes apparent if we consider the following example (tagger uses an IOB scheme):
+## Note on Symmetry
+The metric itself is not symmetric due to the inherent problem of word overlaps in NER. So `evaluate(y_true, y_pred) != evaluate(y_pred, y_true)`. This becomes apparent if we consider the following example (the tagger uses a BIO scheme):
 
 ```
 # Example 1:
@@ -96,3 +107,10 @@ Predicted: B_PROD I_PROD B_PROD B_DIM O
 Correct Text: 2
 Correct Type: 3
 ```
+
+## Notes and References
+Used in a student research project on natural language processing at the [University of Twente, Netherlands](https://www.utwente.nl).
+
+**References**
+* Grishman, R., & Sundheim, B. (1996). [Message Understanding Conference-6: A brief history](http://www.aclweb.org/anthology/C96-1079). In *COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics* (Vol. 1).
+* Nadeau, D., & Sekine, S. (2007). [A survey of named entity recognition and classification](http://www.jbe-platform.com/content/journals/10.1075/li.30.1.03nad). *Lingvisticae Investigationes*, 30(1), 3-26.
````
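To make the Definition added to the README above concrete, here is a minimal sketch of the micro-averaged F1 computation from per-axis counts. It is illustrative only and is not the code in nereval.py; the helper name `micro_f1`, the dictionary layout, and the per-axis counts used for the example are assumptions chosen to be consistent with the README's formulas and its reported score of 0.33.

```py
# Illustrative sketch of the metric in the README's Definition section:
# micro-averaged F1 over the type and boundary axes.
# Hypothetical helper, not the nereval.py implementation.

def micro_f1(counts):
    """counts maps axis name -> (COR, ACT, POS) for that axis."""
    cor = sum(c for c, _, _ in counts.values())
    act = sum(a for _, a, _ in counts.values())
    pos = sum(p for _, _, p in counts.values())
    precision = cor / act if act else 0.0  # precision = COR/ACT
    recall = cor / pos if pos else 0.0     # recall = COR/POS
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Counts assumed for the examples/input.json example: two predictions and
# one true entity on each axis, one type match, no exact boundary match.
print('F1-score: %.2f' % micro_f1({
    'type': (1, 2, 1),      # (COR, ACT, POS)
    'boundary': (0, 2, 1),
}))
# F1-score: 0.33
```

With these assumed counts, precision = 1/4 and recall = 1/2, and their harmonic mean comes out to 0.33, matching the command-line example in the README.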

input.json (+11 -6)

```diff
@@ -1,19 +1,24 @@
 [
   {
-    "text": "a b",
+    "text": "CILINDRISCHE PLUG",
     "true": [
       {
-        "text": "a",
-        "type": "NAME",
+        "text": "CILINDRISCHE PLUG",
+        "type": "Productname",
         "start": 0
       }
     ],
     "predicted": [
       {
-        "text": "a",
-        "type": "LOCATION",
+        "text": "CILINDRISCHE",
+        "type": "Productname",
         "start": 0
+      },
+      {
+        "text": "PLUG",
+        "type": "Productname",
+        "start": 13
       }
     ]
   }
-]
+]
```

muceval.py → nereval.py (+1 -1)

```diff
@@ -105,7 +105,7 @@ def evaluate(y_true, y_pred):
 
     Example
     -------
-    >>> from muceval import Entity, evaluate
+    >>> from nereval import Entity, evaluate
     >>> y_true = [
     ... [Entity('a', 'b', 0), Entity('b', 'b', 2)]
     ... ]
```

setup.py (+3 -3)

```diff
@@ -1,11 +1,11 @@
 from distutils.core import setup
 
 setup(
-    name='muceval',
+    name='nereval',
     version='0.2.2',
-    description='MUC-like evaluation script for named entity recognition systems.',
+    description='Evaluation script for named entity recognition systems based on F1 score.',
     license='MIT',
-    py_modules=['muceval'],
+    py_modules=['nereval'],
     tests_require=[
         'pytest',
         'pytest-cov',
```

test_muceval.py → test_nereval.py (+5 -4)

```diff
@@ -1,6 +1,6 @@
 import os
 import pytest
-from muceval import (
+from nereval import (
     correct_text, correct_type, count_correct, has_overlap, Entity, precision, recall, evaluate,
     _parse_json, evaluate_json, sign_test
 )
@@ -173,9 +173,10 @@ def test_parse_json():
     predictions = _parse_json(file_name)
     assert len(predictions) == 1
     instance = predictions[0]
-    assert instance['text'] == 'a b'
-    assert instance['true'][0] == Entity('a', 'NAME', 0)
-    assert instance['predicted'][0] == Entity('a', 'LOCATION', 0)
+    assert instance['text'] == 'CILINDRISCHE PLUG'
+    assert instance['true'][0] == Entity('CILINDRISCHE PLUG', 'Productname', 0)
+    assert instance['predicted'][0] == Entity('CILINDRISCHE', 'Productname', 0)
+    assert instance['predicted'][1] == Entity('PLUG', 'Productname', 13)
 
 def test_evaluate_json():
     file_name = os.path.join(os.path.dirname(__file__), 'input.json')
```
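Beyond updating the fixture assertions above, a regression test pinning the documented score for the renamed example would be a natural follow-up. The sketch below is hypothetical and not part of this commit; it assumes, as the README's Python usage shows, that `evaluate` returns the micro-averaged F1 as a float.

```py
# Hypothetical regression test mirroring the README example; not part of
# this commit. Assumes evaluate() returns the micro-averaged F1 as a float.
import pytest
from nereval import Entity, evaluate

def test_readme_example_score():
    y_true = [[Entity('CILINDRISCHE PLUG', 'Productname', 0)]]
    y_pred = [[
        Entity('CILINDRISCHE', 'Productname', 0),
        Entity('PLUG', 'Productname', 13),
    ]]
    # The README and command-line example both report F1-score: 0.33.
    assert evaluate(y_true, y_pred) == pytest.approx(0.33, abs=0.01)
```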
