https://github.com/cyk1337/eval4ner

[PyPI] NER MUC evaluation toolkit in Python

Host: GitHub
URL: https://github.com/cyk1337/eval4ner
Owner: cyk1337
License: mit
Created: 2018-11-23T07:08:33.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2025-08-16T06:00:07.000Z (11 months ago)
Last Synced: 2026-03-28T00:58:08.042Z (4 months ago)
Topics: named-entity-recognition, ner, ner-evaluation, nlp, python-implementaion
Language: Python
Homepage:
Size: 172 KB
Stars: 16
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# eval4ner: An All-Round Evaluation for Named Entity Recognition

![Stable version](https://img.shields.io/pypi/v/eval4ner)

![Python3](https://img.shields.io/pypi/pyversions/eval4ner)![wheel:eval4ner](https://img.shields.io/pypi/wheel/eval4ner)

![Download](https://img.shields.io/pypi/dm/eval4ner)

![MIT License](https://img.shields.io/pypi/l/eval4ner)

Table of Contents
=================

- [TL;DR](https://github.com/cyk1337/eval4ner/#tldr)
- [Preliminaries for NER Evaluation](https://github.com/cyk1337/eval4ner/#preliminaries-for-ner-evaluation)
- [User Guide](https://github.com/cyk1337/eval4ner/#user-guide)
    - [Installation](https://github.com/cyk1337/eval4ner/#installation)
    - [Usage](https://github.com/cyk1337/eval4ner/#usage)
- [Citation](https://github.com/cyk1337/eval4ner/#citation)
- [References](https://github.com/cyk1337/eval4ner/#references)

This is a Python toolkit of MUC-5 evaluation metrics for evaluating Named Entity Recognition (NER) results. 

## TL;DR

It supports not only strict matching, *i.e.*, an extracted entity is correct with respect to both boundary and type, but also partial matching, giving the following four modes:

- Strict: exact match (both entity boundary and type are correct)
- Exact boundary matching: the predicted entity boundary is correct, regardless of entity type
- Partial boundary matching: the entity boundaries overlap, regardless of entity type
- Type matching: the entity type is correct, and some overlap between the system-tagged entity and the gold annotation is required

Refer to the blog post [Evaluation Metrics of Name Entity Recognition](https://ychai.uk/notes/2018/11/21/NLP/NER/NER-Evaluation-Metrics/#SemEval%E2%80%9813) for an explanation of the MUC metrics.

## Preliminaries for NER Evaluation

In research and production, the following scenarios occur frequently when evaluating NER systems:

  

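| Scenario | Gold entity type | Gold entity boundary (surface string) | Predicted entity type | Predicted entity boundary (surface string) | Type | Partial | Exact | Strict |
|----------|------------------|---------------------------------------|-----------------------|--------------------------------------------|------|---------|-------|--------|
| III | MUSIC_NAME | 告白气球 | | | MIS | MIS | MIS | MIS |
| II | | | MUSIC_NAME | 年轮 | SPU | SPU | SPU | SPU |
| V | MUSIC_NAME | 告白气球 | MUSIC_NAME | 一首告白气球 | COR | PAR | INC | INC |
| IV | MUSIC_NAME | 告白气球 | SINGER | 告白气球 | INC | COR | COR | INC |
| I | MUSIC_NAME | 告白气球 | MUSIC_NAME | 告白气球 | COR | COR | COR | COR |
| VI | MUSIC_NAME | 告白气球 | SINGER | 一首告白气球 | INC | PAR | INC | INC |

Here 告白气球 and 年轮 are song titles. The predicted string 一首告白气球 is the gold string with two extra leading characters (一首, "a song"), so its boundary overlaps the gold boundary without matching it exactly.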

  

Thus, MUC-5 takes all of these scenarios into account to give an all-round evaluation.
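The labels in the scenario table follow a small set of rules, which can be sketched in Python. This is an illustration only, not the library's implementation: `classify_pair` is a hypothetical helper, it assumes the gold and predicted entities have already been paired up, and it approximates "overlap" by checking whether one surface string contains the other.

```python
from typing import Dict, Optional, Tuple

Entity = Tuple[str, str]  # (entity type, surface string)
MODES = ("type", "partial", "exact", "strict")


def classify_pair(gold: Optional[Entity], pred: Optional[Entity]) -> Dict[str, str]:
    """Label one gold/prediction pairing in each of the four modes."""
    if gold is None and pred is None:
        raise ValueError("need a gold entity, a predicted entity, or both")
    if pred is None:  # gold entity with no prediction
        return {mode: "MIS" for mode in MODES}
    if gold is None:  # prediction with no gold entity
        return {mode: "SPU" for mode in MODES}

    same_type = gold[0] == pred[0]
    same_span = gold[1] == pred[1]
    # Assumption: substring containment stands in for boundary overlap.
    if not (gold[1] in pred[1] or pred[1] in gold[1]):
        raise ValueError("entities do not overlap; score them as MIS and SPU")

    return {
        "type": "COR" if same_type else "INC",
        "partial": "COR" if same_span else "PAR",
        "exact": "COR" if same_span else "INC",
        "strict": "COR" if same_type and same_span else "INC",
    }


# Scenario V from the table: right type, boundary too wide.
print(classify_pair(("MUSIC_NAME", "告白气球"), ("MUSIC_NAME", "一首告白气球")))
```

A real scorer would work with character offsets rather than surface strings, since the same string can occur more than once in a sentence.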

Then we can compute:

**Number of gold-standard entities**:

Possible (POS) = COR + INC + PAR + MIS = TP + FN

**Number of predicted entities**:

Actual (ACT) = COR + INC + PAR + SPU = TP + FP

Precision and recall for exact match and partial match are computed as follows:

### Exact match (i.e. Strict, Exact)

$\text{Precision = COR / ACT = TP / (TP + FP)}$

$\text{Recall = COR / POS = TP / (TP + FN)}$

### Partial match (i.e. Partial, Type)

$\text{Precision = (COR + 0.5 * PAR) / ACT}$

$\text{Recall = (COR + 0.5 * PAR) / POS}$

### F-Measure

$F_\alpha = ((\alpha^2 + 1)* PR) / (\alpha^2 P + R)$

$F_1 = (2PR)/ (P +R)$

Applying these formulas to the six scenarios above gives the following results:

  

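| Measure | Type | Partial | Exact | Strict |
|---------|------|---------|-------|--------|
| Correct | 2 | 2 | 2 | 1 |
| Incorrect | 2 | 0 | 2 | 3 |
| Partial | 0 | 2 | 0 | 0 |
| Missed | 1 | 1 | 1 | 1 |
| Spurious | 1 | 1 | 1 | 1 |
| Precision | 0.4 | 0.6 | 0.4 | 0.2 |
| Recall | 0.4 | 0.6 | 0.4 | 0.2 |
| F1 score | 0.4 | 0.6 | 0.4 | 0.2 |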
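The precision, recall, and F1 figures in the results table can be reproduced from the five counts with a few lines of Python. This is a minimal sketch of the formulas in this section, not the library's implementation; `muc_scores` and its `partial_credit` flag are names invented for illustration. Following the formulas above, half credit for PAR is applied in the Type and Partial modes only.

```python
def muc_scores(cor, inc, par, mis, spu, partial_credit):
    """Return (precision, recall, F1) from MUC counts for one mode."""
    possible = cor + inc + par + mis  # POS: number of gold entities
    actual = cor + inc + par + spu    # ACT: number of predicted entities
    hits = cor + 0.5 * par if partial_credit else cor
    precision = hits / actual if actual else 0.0
    recall = hits / possible if possible else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Counts per mode, copied from the results table.
counts = {
    "type": dict(cor=2, inc=2, par=0, mis=1, spu=1),
    "partial": dict(cor=2, inc=0, par=2, mis=1, spu=1),
    "exact": dict(cor=2, inc=2, par=0, mis=1, spu=1),
    "strict": dict(cor=1, inc=3, par=0, mis=1, spu=1),
}

for mode, c in counts.items():
    p, r, f1 = muc_scores(**c, partial_credit=mode in ("type", "partial"))
    print(f"{mode:>7}: precision={p:.1f} recall={r:.1f} f1={f1:.1f}")
```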

  

## User Guide

### Installation

```bash
# -U upgrades an existing installation; omit it for a first install
pip install -U eval4ner
```

### Usage

#### 1. Evaluate a single prediction

```python
import pprint

import eval4ner.muc as muc

# Gold annotations and predictions are lists of (entity type, surface string) pairs.
ground_truth = [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
prediction = [('PER', 'John Jones and Peter Peters came to York')]
text = 'John Jones and Peter Peters came to York'

# Score this one sentence in all four modes.
one_result = muc.evaluate_one(prediction, ground_truth, text)
pprint.pprint(one_result)
```

Output:

```text
{'exact': {'actual': 1,
           'correct': 0,
           'f1_score': 0,
           'incorrect': 1,
           'missed': 2,
           'partial': 0,
           'possible': 3,
           'precision': 0.0,
           'recall': 0.0,
           'spurius': 0},
 'partial': {'actual': 1,
             'correct': 0,
             'f1_score': 0.25,
             'incorrect': 0,
             'missed': 2,
             'partial': 1,
             'possible': 3,
             'precision': 0.5,
             'recall': 0.16666666666666666,
             'spurius': 0},
 'strict': {'actual': 1,
            'correct': 0,
            'f1_score': 0,
            'incorrect': 1,
            'missed': 2,
            'partial': 0,
            'possible': 3,
            'precision': 0.0,
            'recall': 0.0,
            'spurius': 0},
 'type': {'actual': 1,
          'correct': 1,
          'f1_score': 0.5,
          'incorrect': 0,
          'missed': 2,
          'partial': 0,
          'possible': 3,
          'precision': 1.0,
          'recall': 0.3333333333333333,
          'spurius': 0}}
```
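As a check on the `partial` entry in this output, the partial-match formulas can be applied by hand to its counts (COR = 0, PAR = 1, ACT = 1, POS = 3):

$\text{Precision} = (0 + 0.5 \times 1) / 1 = 0.5$

$\text{Recall} = (0 + 0.5 \times 1) / 3 \approx 0.1667$

$F_1 = (2 \times 0.5 \times 0.1667) / (0.5 + 0.1667) \approx 0.25$

These agree with the reported `precision`, `recall`, and `f1_score` values.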

#### 2. Evaluate all predictions

```python
import eval4ner.muc as muc

# ground truth: one list of (entity type, surface string) pairs per sentence
ground_truths = [
    [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')],
    [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')],
    [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
]

# NER model prediction
predictions = [
    [('PER', 'John Jones and Peter Peters came to York')],
    [('LOC', 'John Jones'), ('PER', 'Peters'), ('LOC', 'York')],
    [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
]

# input texts
texts = [
    'John Jones and Peter Peters came to York',
    'John Jones and Peter Peters came to York',
    'John Jones and Peter Peters came to York'
]

# verbose=True prints the aggregate scores for each mode
muc.evaluate_all(predictions, ground_truths, texts, verbose=True)
```

Output:

```text
 NER evaluation scores:
  strict mode, Precision=0.4444, Recall=0.4444, F1:0.4444
   exact mode, Precision=0.5556, Recall=0.5556, F1:0.5556
 partial mode, Precision=0.7778, Recall=0.6667, F1:0.6944
    type mode, Precision=0.8889, Recall=0.6667, F1:0.7222
```
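The aggregate scores above are consistent with macro-averaging, that is, computing precision, recall, and F1 for each sentence and then taking the mean over sentences. The sketch below checks this for partial mode. It is an illustration of the arithmetic rather than the library's code: `prf` is a hypothetical helper, and the per-sentence counts were worked out by hand from the three examples.

```python
from statistics import mean


def prf(correct, partial, actual, possible):
    """Partial-match precision, recall, and F1 for one sentence."""
    hits = correct + 0.5 * partial
    precision = hits / actual if actual else 0.0
    recall = hits / possible if possible else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# (correct, partial, actual, possible) per sentence in partial mode,
# derived by hand from the example predictions (an assumption, not library output).
per_sentence = [
    (0, 1, 1, 3),  # one long span that only overlaps the gold entities
    (2, 1, 3, 3),  # 'John Jones' and 'York' exact, 'Peters' partial
    (3, 0, 3, 3),  # all three boundaries exact
]

scores = [prf(*counts) for counts in per_sentence]
# Average each metric over sentences (macro-average).
macro = tuple(round(mean(column), 4) for column in zip(*scores))
print(macro)  # (0.7778, 0.6667, 0.6944)
```

Pooling the counts over all sentences before dividing (micro-averaging) would give different numbers, so the two should not be compared directly.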

This repository is maintained for the long term. Contributions and pull requests are welcome.

## Citation

For attribution in academic contexts, please cite this work as:

```
@misc{eval4ner,
  title={Evaluation Metrics of Named Entity Recognition},
  author={Chai, Yekun},
  year={2018},
  howpublished={\url{https://cyk1337.github.io/notes/2018/11/21/NLP/NER/NER-Evaluation-Metrics/}},
}

@misc{chai2018-ner-eval,
  author = {Chai, Yekun},
  title = {eval4ner: An All-Round Evaluation for Named Entity Recognition},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/cyk1337/eval4ner}}
}
```

## References

1. [Evaluation of the SemEval-2013 Task 9.1: Recognition and Classification of pharmacological substances](https://www.cs.york.ac.uk/semeval-2013/task9/data/uploads/semeval_2013-task-9_1-evaluation-metrics.pdf)

2. [MUC-5 Evaluation Metrics](https://www.aclweb.org/anthology/M93-1007.pdf)
