https://github.com/shoebjoarder/python-taggers-annotation-to-dkpro-core
Proof of Concept for the integration of Flair NLP NER-tagger in DKPro-Core
dkpro-cassis flair flairnlp named-entity-recognition poc
- Host: GitHub
- URL: https://github.com/shoebjoarder/python-taggers-annotation-to-dkpro-core
- Owner: shoebjoarder
- Created: 2020-05-29T12:36:31.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-12-23T18:30:43.000Z (about 3 years ago)
- Last Synced: 2024-10-27T17:54:55.423Z (2 months ago)
- Topics: dkpro-cassis, flair, flairnlp, named-entity-recognition, poc
- Language: Jupyter Notebook
- Homepage:
- Size: 714 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Proof of Concept of FlairNLP Named Entity Recognition integration with DKPro-Core
DKPro Core is Java-based and integrates third-party taggers such as StanfordNLP and OpenNLP, whereas Flair is a Python-based, state-of-the-art NLP library. This PoC shows how to connect the Java-based DKPro Core and the Python-based Flair, using DKPro Cassis as the middleware between them. Below are short descriptions of the technologies used in this Proof of Concept.
[DKPro Core](https://dkpro.github.io/dkpro-core/info/):
> DKPro Core addresses tasks that are commonly referred to as linguistic pre-processing, e.g. part-of-speech taggers, parsers, etc. Within DKPro Core, a steadily growing set of third-party tools for such tasks have been wrapped into interoperable and interchangeable components for the Apache UIMA framework.
[FlairNLP](https://github.com/flairNLP/flair):
> Flair is a powerful NLP library. Flair allows you to apply our state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS), sense disambiguation and classification.
[DKPro-Cassis](https://github.com/dkpro/dkpro-cassis):
> DKPro-Cassis is a pure-Python implementation of the Common Analysis System (CAS) as defined by the UIMA framework. The CAS is a data structure representing an object to be enriched with annotations (the so-called Subject of Analysis, short SofA).
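The CAS is a stand-off representation: the SofA text is stored once, and each annotation records only its type and a character span into that text. The minimal pure-Python sketch below illustrates the idea. `Annotation` and `MiniCas` are hypothetical stand-ins written for this example and are not the dkpro-cassis API, which provides comparable operations through methods such as `Cas.select` and `Cas.get_covered_text`. The two type names are real DKPro Core type names, used here only as labels.

```python
from typing import List, NamedTuple, Optional

# Real DKPro Core type names, used here only as labels.
TOKEN = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"
NAMED_ENTITY = "de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity"


class Annotation(NamedTuple):
    """Hypothetical stand-in for a UIMA annotation: a type plus a character span."""
    type_name: str
    begin: int                    # inclusive character offset into the SofA
    end: int                      # exclusive character offset into the SofA
    value: Optional[str] = None   # e.g. an NER label such as "PER"


class MiniCas:
    """Hypothetical, simplified CAS: one SofA string plus stand-off annotations."""

    def __init__(self, sofa_string: str) -> None:
        self.sofa_string = sofa_string
        self.annotations: List[Annotation] = []

    def add(self, annotation: Annotation) -> None:
        # Annotations never copy text; they only point into the SofA.
        if not 0 <= annotation.begin <= annotation.end <= len(self.sofa_string):
            raise ValueError("annotation span lies outside the SofA")
        self.annotations.append(annotation)

    def select(self, type_name: str) -> List[Annotation]:
        # All annotations of one type, ordered by their position in the text.
        matches = [a for a in self.annotations if a.type_name == type_name]
        return sorted(matches, key=lambda a: (a.begin, a.end))

    def covered_text(self, annotation: Annotation) -> str:
        return self.sofa_string[annotation.begin:annotation.end]


# Usage: token annotations and a later entity annotation share one SofA.
cas = MiniCas("Angela Merkel visited Paris.")
for begin, end in [(0, 6), (7, 13), (14, 21), (22, 27), (27, 28)]:
    cas.add(Annotation(TOKEN, begin, end))
cas.add(Annotation(NAMED_ENTITY, 0, 13, "PER"))
print([cas.covered_text(t) for t in cas.select(TOKEN)])
print([(cas.covered_text(e), e.value) for e in cas.select(NAMED_ENTITY)])
```

Because new layers only add spans over the same text, a CAS enriched in Python can be handed back to Java components without re-aligning anything.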
## Demo version
An example use case can be found in `Example/dkpro_flairnlp_ner_poc.ipynb`.
## System requirements
- Python >= 3.6
## Installation Guide
A complete installation guide for both Windows and Linux users can be found [here](doc/INSTALL.md).
## Features
- Annotates a CAS with Flair NER tags using DKPro Cassis
- Produces a reusable CAS object for further annotation in the Java-based DKPro Core

## Workflow and Usage
![PoCWorkflow](doc/screenshots/workflow.png)
To complete this PoC, three files were needed:
- 2 Java files
  - Tokenizer (outputs XMI and TypeSystem files)
  - POS tagger (takes the Flair-annotated XMI file as input and outputs an annotated XMI file)
- 1 Jupyter Notebook (takes the CAS object as input, runs Flair NER and annotates the CAS object with NER tags)

A detailed description of the workflow and usage can be found [here](/doc/USAGE.md).
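The central step in the notebook is aligning the tagger's output with the CAS: entities are predicted over tokens, but the CAS stores character offsets. The sketch below assumes two things, both of which are assumptions rather than a description of the notebook: the CAS `Token` texts are passed to the tagger unchanged, and the predictions are available as one BIO tag per token. With Flair, you would more likely read already-grouped spans via `Sentence.get_spans("ner")`. `Entity`, `tokens_for_tagger` and `bio_to_entities` are hypothetical helpers and are not part of this repository.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Entity(NamedTuple):
    begin: int   # character offset where the entity's first token starts
    end: int     # character offset where the entity's last token ends
    label: str   # entity class, e.g. "PER" or "LOC"


def tokens_for_tagger(text: str, token_offsets: Sequence[Tuple[int, int]]) -> List[str]:
    # Give the tagger exactly the CAS tokens so its token indices line up with them.
    return [text[begin:end] for begin, end in token_offsets]


def bio_to_entities(token_offsets: Sequence[Tuple[int, int]], tags: Sequence[str]) -> List[Entity]:
    """Group per-token BIO tags into entities and project them onto character offsets."""
    if len(token_offsets) != len(tags):
        raise ValueError("expected exactly one tag per token")
    entities: List[Entity] = []
    start, label = None, None  # index and label of the currently open entity
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        prefix, _, tag_label = tag.partition("-")
        continues = prefix == "I" and start is not None and tag_label == label
        if start is not None and not continues:
            entities.append(Entity(token_offsets[start][0], token_offsets[i - 1][1], label))
            start, label = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            # A stray I- tag with no matching open entity starts a new one.
            start, label = i, tag_label
    return entities


text = "Angela Merkel visited Paris."
token_offsets = [(0, 6), (7, 13), (14, 21), (22, 27), (27, 28)]
words = tokens_for_tagger(text, token_offsets)
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]  # hypothetical tagger output for `words`
for entity in bio_to_entities(token_offsets, tags):
    print(text[entity.begin:entity.end], entity.label)
```

Each resulting `Entity` would then be written back as a DKPro `NamedEntity` annotation with dkpro-cassis before the XMI is passed to the Java POS tagger. For example, you could create an instance of the type returned by `TypeSystem.get_type` and pass it to `Cas.add` (named `add_annotation` in older cassis releases). Using the CAS token offsets instead of re-tokenizing in Python keeps the NER annotations aligned with the tokens that DKPro Core produced.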
## Acknowledgement
- [Piush Aggarwal](https://www.ltl.uni-due.de/team/piush-aggarwal)
- [Prof. Torsten Zesch](https://www.ltl.uni-due.de/team/torsten-zesch)