https://github.com/adirthaborgohain/ner-re

A Named Entity Recognition + Entity Linker + Relation Extraction pipeline built using spaCy v3. Given a text, the pipeline extracts the entities it was trained on, disambiguates each entity to its normalized form through an Entity Linker connected to a knowledge base, and assigns a relation between the entities, if any.

Host: GitHub
URL: https://github.com/adirthaborgohain/ner-re
Owner: AdirthaBorgohain
Created: 2021-11-01T13:00:13.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2023-06-02T06:23:13.000Z (over 1 year ago)
Last Synced: 2024-11-02T09:42:16.752Z (3 months ago)
Topics: named-entity-recognition, nlp, relation-extraction, spacy, transformers
Language: Python
Homepage:
Size: 15.1 MB
Stars: 36
Watchers: 3
Forks: 9
Open Issues: 1
Metadata Files:
- Readme: README.md

README

# NER-RE

A Named Entity Recognition + Relation Extraction pipeline built using spaCy v3.0. Given a text, the pipeline will
extract entities from the text as trained and will assign a relation between the entities, if any.

There are three separate models: a Named Entity Recognition model, an Entity Linker model and a Relation Extraction
model. Three separate directories host the code for the three models:

- NER: Named Entity Recognition Model
- EL: Entity Linker Model
- RE: Relation Extraction Model

Information extraction from unstructured text (contracts, financial documents, healthcare records, and so on) is one of
the most useful applications of Natural Language Processing (NLP), as it enables automated querying of data to derive
fresh insights. Named entity recognition (NER) has long been used to recognise entities in a text and store them for
sophisticated querying and filtering. However, NER alone is not enough to understand unstructured text meaningfully,
because it does not tell us how the entities are related to one another. Joint NER and relation extraction (RE) opens
up a new way of retrieving information through knowledge graphs, where we can traverse between nodes to uncover hidden
relationships. Performing these two tasks together can therefore be very useful.

Here, we fine-tune a transformer model for NER using spaCy v3 and, on top of it, add a relation extraction component
to the pipeline, which also fine-tunes a transformer model.

An Entity Linker component is also present. The Entity Linker is a spaCy component that links named entities to a
knowledge base. It performs a string-overlap-based search (character 3-grams) on named entities, comparing them with
the concepts in the knowledge base using an **approximate nearest neighbours (ANN) search**.
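
The character 3-gram matching idea can be illustrated with a minimal, standard-library-only sketch. It is not the
repository's implementation: the toy knowledge base `KB`, the `link` function and the `threshold` value are
hypothetical, and an exact brute-force cosine search stands in for the approximate nearest neighbours index that a
real linker would use on a large knowledge base.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: concept ID -> canonical name (illustration only).
KB = {
    "Q1": "acetaminophen",
    "Q2": "acetylsalicylic acid",
    "Q3": "ibuprofen",
}


def char_ngrams(text, n=3):
    """Return a bag (Counter) of character n-grams for a padded, lower-cased string."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram bags; 0.0 if either bag is empty."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def link(mention, kb=KB, threshold=0.3):
    """Return (concept_id, score) for the closest concept, or (None, score) below the threshold.

    Exact search over every concept; an ANN index would replace this loop at scale.
    """
    mention_vec = char_ngrams(mention)
    best_id, best_score = max(
        ((cid, cosine(mention_vec, char_ngrams(name))) for cid, name in kb.items()),
        key=lambda pair: pair[1],
    )
    return (best_id, best_score) if best_score >= threshold else (None, best_score)
```

Because the comparison works on overlapping character 3-grams rather than whole tokens, a misspelled mention such as
`link("ibuprofin")` still resolves to the `ibuprofen` concept, while an unrelated string falls below the threshold
and is left unlinked.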

At its core, the Entity Linker is a nearest-neighbour search, and the relation extraction model is a classifier that
predicts a relation `r` for a given pair of entities `{e1, e2}`. With transformers, this classifier is added on top
of the output hidden states. The pre-trained model that we fine-tune here is a **BERT** model, but any pre-trained
model available in the Hugging Face library can be used by simply entering its name in the config files.
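
To make the "classifier over an entity pair" idea concrete, here is a rough sketch in plain Python. It rests on
several assumptions that do not come from this repository: entity vectors are tiny hand-written lists rather than
transformer hidden states, a pair is represented by concatenating the two vectors, and each label is scored
independently with a linear layer followed by a sigmoid. The function names, weights and thresholds are all
hypothetical.

```python
import math
from itertools import permutations


def candidate_pairs(entities):
    """All ordered pairs (e1, e2) of distinct entities; order matters for directed relations."""
    return list(permutations(entities, 2))


def pair_features(vec1, vec2):
    """Represent an entity pair by concatenating the two entity vectors (an assumption)."""
    return list(vec1) + list(vec2)


def predict_relations(features, weights, biases, labels, threshold=0.5):
    """Score each label with a linear layer + sigmoid and keep the labels at or above threshold."""
    kept = {}
    for label, w, b in zip(labels, weights, biases):
        logit = sum(wi * xi for wi, xi in zip(w, features)) + b
        score = 1.0 / (1.0 + math.exp(-logit))
        if score >= threshold:
            kept[label] = score
    return kept


# Toy inputs: in a trained model the vectors would come from the transformer's
# output hidden states and the weights would be learned, not written by hand.
vectors = {"aspirin": [1.0, 0.0], "headache": [0.0, 1.0]}
labels = ["TREATS", "CAUSES"]  # hypothetical relation labels
weights = [[2.0, 0.0, 0.0, 2.0], [-2.0, 0.0, 0.0, -2.0]]
biases = [-1.0, -1.0]

for e1, e2 in candidate_pairs(list(vectors)):
    relations = predict_relations(pair_features(vectors[e1], vectors[e2]), weights, biases, labels)
    print(e1, e2, relations)
```

Scoring ordered pairs separately lets the classifier assign a relation to `(aspirin, headache)` without assigning
the same relation in the reverse direction.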

Sample annotated data for training can be found inside the assets directory of both model folders. This repository
contains code for training the NER and RE models and then using them together through the `PipelineModel` class
from the `PipelineModel.py` Python script. The codebase for both models follows spaCy's recommended file structure.

### Screenshot:

![example](example.png)

### Important Commands:

The following commands are common to both models:

- To run the whole flow:
  * `spacy project run all_gpu` (using GPU)
  * `spacy project run all` (using CPU)

  Note: when run on CPU, the model does not use transformers and is trained with tok2vec instead.

- All runnable commands are listed in project.yaml

- For training, three annotated files are needed:
  * `annotations_train.jsonl`
  * `annotations_dev.jsonl`
  * `annotations_test.jsonl`
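
Before starting a training run, it can help to confirm that all three files are present and that every line parses
as JSON. The helper below is a hypothetical pre-flight check, not part of this repository; it makes no assumption
about the annotation schema and only verifies that each non-empty line is valid JSON.

```python
import json
from pathlib import Path

# File names as listed above.
EXPECTED_FILES = (
    "annotations_train.jsonl",
    "annotations_dev.jsonl",
    "annotations_test.jsonl",
)


def check_annotations(directory):
    """Return {filename: record_count}; raise if a file is missing or a line is not valid JSON."""
    counts = {}
    for name in EXPECTED_FILES:
        path = Path(directory) / name
        if not path.is_file():
            raise FileNotFoundError(f"missing annotation file: {path}")
        count = 0
        with path.open(encoding="utf-8") as handle:
            for line_no, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # tolerate blank lines between records
                try:
                    json.loads(line)
                except json.JSONDecodeError as err:
                    raise ValueError(f"{path}:{line_no}: invalid JSON ({err})") from err
                count += 1
        counts[name] = count
    return counts
```

Assuming the files sit in a model's assets directory, `check_annotations("assets")` would return the number of
records in each split, or raise with the offending file and line number.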

### Note:

Ensure all modules and libraries mentioned in requirements.txt are installed before running the project.