https://github.com/wadaboa/ner-annotator

GUI useful to manually annotate text for Named Entity Recognition purposes

named-entity-recognition ner nlp pyqt5 spacy

Host: GitHub
URL: https://github.com/wadaboa/ner-annotator
Owner: Wadaboa
License: mit
Created: 2020-04-17T13:41:32.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2023-06-22T08:40:48.000Z (almost 2 years ago)
Last Synced: 2025-03-24T01:35:14.571Z (about 1 month ago)
Topics: named-entity-recognition, ner, nlp, pyqt5, spacy
Language: Python
Size: 388 KB
Stars: 15
Watchers: 1
Forks: 9
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

[![Downloads](https://pepy.tech/badge/ner-annotator)](https://pepy.tech/project/ner-annotator)

[![Python](https://img.shields.io/pypi/pyversions/ner-annotator)](https://img.shields.io/pypi/pyversions/ner-annotator)

[![PyPI](https://img.shields.io/pypi/v/ner-annotator)](https://img.shields.io/pypi/v/ner-annotator)

# Named Entity Recognition Annotator

This repository contains a NER utility to annotate text, given some entities.

| Dark GUI | Light GUI |
| :------: | :-------: |
| ![dark-gui](https://raw.githubusercontent.com/Wadaboa/ner-annotator/master/assets/img/gui-dark.png) | ![light-gui](https://raw.githubusercontent.com/Wadaboa/ner-annotator/master/assets/img/gui-light.png) |

## Installation

To install this GUI, make sure that you have `Python 3` on your system.

Then, `cd` into the project's root and run:

```bash
pip install .
```

This will install the `ner_annotator` package and its required dependencies (mainly `PyQt5`).

## Usage

To run this utility, execute the following command:

```bash
ner_annotator <input> -o <output> -e <entities>
```

Here, `<input>` is the path to the input text file, which should contain your training text lines, separated by newlines; `<output>` is the path where you would like to save the `.json` output file (if not given, it defaults to the same directory as the input file); `<entities>` is the list of entities you would like to annotate.

For example, I could run the program like this:

```bash
ner_annotator ~/Desktop/train.txt -e 'BirthDate' 'Name'
```

You can also pass an existing NER model to the annotator. The model is then used to identify entities (via the button between the previous-line and next-line controls in the GUI), which you can modify, add to, or remove as needed. For example:

```bash
ner_annotator ~/Desktop/train.txt -e 'BirthDate' 'Name' -m ~/Desktop/NER
```

Currently, only `spaCy` models are supported, but you can contribute to the project and add compatibility with other NER models by checking the `model.py` file inside the `ner_annotator` package.

The package automatically identifies the correct library for the given model, so you don't have to specify that your model should be loaded with `spaCy` or any other NLP library.

## Config file

To speed up annotation, you can save your models' entity names in a config file and reuse them the next time you run this tool.

To do that, create a `.json` file (see [`config.json`](assets/json/config.json)) with a schema like the following:

```json
{
	"models": [
		{
			"name": "example-1",
			"entities": ["entity-1-1", "entity-1-2", "entity-1-3"]
		},
		{
			"name": "example-2",
			"entities": ["entity-2-1", "entity-2-2"]
		}
	]
}
```

To use the entities of the model `example-1`, for example, you can run:

```bash
ner_annotator ~/Desktop/train.txt -c ~/Desktop/config.json -n 'example-1'
```

Here, `~/Desktop/config.json` is the path to the `.json` file mentioned above.

In this example, the command above is equivalent to:

```bash
ner_annotator ~/Desktop/train.txt -e 'entity-1-1' 'entity-1-2' 'entity-1-3'
```
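
As a rough illustration of what the `-c`/`-n` lookup amounts to, the sketch below reads a config file with the schema shown in this section and returns the entity list for one model name. The helper `entities_for_model` is hypothetical and is not taken from the `ner_annotator` source; the tool's own lookup may behave differently.

```python
import json
from pathlib import Path


def entities_for_model(config_path, model_name):
    """Return the entity names stored under `model_name` in a config file.

    Hypothetical helper for illustration only; not part of ner_annotator.
    """
    # expanduser() resolves a leading "~", which open()/read_text() do not do
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text(encoding="utf-8"))
    for model in config.get("models", []):
        if model.get("name") == model_name:
            return list(model.get("entities", []))
    raise KeyError(f"No model named {model_name!r} in {config_path}")
```

With the example config, `entities_for_model("~/Desktop/config.json", "example-1")` would return the three `entity-1-*` names, which is the list passed to `-e` in the equivalent command.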

## Output

The utility outputs a `.json` file with the following schema:

```json
[
	{
		"content": "text",
		"entities": [[0, 1, "entity"]]
	}
]
```
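
To sanity-check an output file, something like the following can print each annotated span next to its label. It assumes, without confirmation from this README, that each entity triple is `[start, end, label]` with character offsets into `content` and an exclusive `end`. The function name `iter_spans` and the sample record are made up for illustration.

```python
import json


def iter_spans(records):
    """Yield (text, label) pairs for every annotated entity.

    Assumes [start, end, label] triples with an exclusive end offset.
    """
    for record in records:
        content = record["content"]
        for start, end, label in record["entities"]:
            yield content[start:end], label


# Made-up sample following the output schema; a real file would be read with json.load()
sample = json.loads(
    '[{"content": "Ada was born in 1815",'
    ' "entities": [[0, 3, "Name"], [16, 20, "BirthDate"]]}]'
)
for text, label in iter_spans(sample):
    print(f"{label}: {text}")
```

If the printed spans look shifted or truncated, the offset assumption above probably does not match the file.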

You can convert this output into the specific format required by your NER model by passing the `-p` option to the `ner_annotator` tool. Your output folder will then also contain a `pickle` file (with the same name as the `.json` output file, but without an extension), which can be used to load the entities in another program with the requested NLP library. To load the saved pickle file, you can do something along these lines:

```python
import pickle
from pathlib import Path

# Path.expanduser() resolves "~", which open() alone does not do
data = pickle.loads(Path("~/Desktop/output").expanduser().read_bytes())
```

In this example, `ner_annotator` was called either with `-o ~/Desktop/output.json`, or without the `-o` option and with an input file such as `~/Desktop/train.txt`.

Currently, conversion is provided only for `spaCy` models.
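
The exact structure of the pickle file is not described here. For reference, spaCy's classic training-data format is a list of `(text, {"entities": [(start, end, label), ...]})` tuples, and a conversion from the `.json` schema to that shape could be sketched as follows. The function name `to_spacy_format` is hypothetical, and the result may differ from what `-p` actually writes.

```python
def to_spacy_format(records):
    """Convert annotator-style records into spaCy-style training tuples.

    Illustrative sketch only; not the conversion shipped with ner_annotator.
    """
    return [
        (
            record["content"],
            {"entities": [(start, end, label) for start, end, label in record["entities"]]},
        )
        for record in records
    ]


# Record taken from the output schema above
records = [{"content": "text", "entities": [[0, 1, "entity"]]}]
train_data = to_spacy_format(records)
```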

## Distribution

This package is available on `PyPI`, so you can also install it by simply running:

```bash
pip install ner-annotator
```

You can also install extra packages, like `spaCy`:

```bash
pip install 'ner-annotator[spacy]'
```

_Personal note_: To upload a new version of the package to PyPI, execute `scripts/deploy.sh`, then enter `__token__` as the Twine username and the saved API token as the Twine password.

## Thanks to

- GUI icons are provided by [Icons8](https://icons8.it)
