Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mpuig/spacy-lookup
Named Entity Recognition based on dictionaries
- Host: GitHub
- URL: https://github.com/mpuig/spacy-lookup
- Owner: mpuig
- License: mit
- Created: 2018-01-15T17:32:39.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2019-03-03T18:17:59.000Z (almost 6 years ago)
- Last Synced: 2024-12-30T15:17:11.089Z (12 days ago)
- Topics: named-entity-recognition, natural-language-processing, ner, nlp, spacy, spacy-extension, spacy-pipeline
- Language: Python
- Size: 3.55 MB
- Stars: 244
- Watchers: 10
- Forks: 38
- Open Issues: 5
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
README
spacy-lookup: Named Entity Recognition based on dictionaries
*************************************************************

`spaCy v2.0 `_ extension and pipeline component
for adding Named Entities metadata to ``Doc`` objects. Detects Named Entities
using dictionaries. The extension sets the custom ``Doc``,
``Token`` and ``Span`` attributes ``._.is_entity``, ``._.entity_type``,
``._.has_entities`` and ``._.entities``.

Named Entities are matched using the Python module ``flashtext`` and looked up
in the data provided by different dictionaries.

Installation
============

``spacy-lookup`` requires ``spacy`` v2.0.16 or higher.

.. code:: bash

    pip install spacy-lookup

Usage
=====

First, you need to download a language model.

.. code:: bash

    python -m spacy download en

Import the component and initialise it with the shared ``nlp`` object (i.e. an
instance of ``Language``), which is used to initialise ``flashtext``
with the shared vocab, and create the match patterns. Then add the component
anywhere in your pipeline.

.. code:: python

    import spacy
    from spacy_lookup import Entity

    nlp = spacy.load('en')
    entity = Entity(keywords_list=['python', 'product manager', 'java platform'])
    nlp.add_pipe(entity, last=True)

    doc = nlp(u"I am a product manager for a java and python.")
    assert doc._.has_entities == True
    assert doc[0]._.is_entity == False
    assert doc[3]._.entity_desc == 'product manager'
    assert doc[3]._.is_entity == True

    print([(token.text, token._.canonical) for token in doc if token._.is_entity])

``spacy-lookup`` only cares about the token text, so you can use it on a blank
``Language`` instance (it should work for all
`available languages `_!), or in
a pipeline with a loaded model. If you're loading a model and your pipeline
includes a tagger, parser and entity recognizer, make sure to add the entity
component as ``last=True``, so the spans are merged at the end of the pipeline.
Available attributes
--------------------

The extension sets attributes on the ``Doc``, ``Span`` and ``Token``. You can
change the attribute names on initialisation of the extension. For more details
on custom components and attributes, see the
`processing pipelines documentation `_.

========================= ======= ===
``Token._.is_entity``     bool    Whether the token is an entity.
``Token._.entity_type``   unicode A human-readable description of the entity.
``Doc._.has_entities``    bool    Whether the document contains entities.
``Doc._.entities``        list    ``(entity, index, description)`` tuples of the document's entities.
``Span._.has_entities``   bool    Whether the span contains entities.
``Span._.entities``       list    ``(entity, index, description)`` tuples of the span's entities.
========================= ======= ===
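
For example, the ``(entity, index, description)`` tuples can be read straight
off the ``Doc``. A small sketch that continues the pipeline built in the Usage
section; the exact values depend on the keywords you registered:

.. code:: python

    doc = nlp(u"I am a product manager for a java and python.")
    # Each tuple holds the matched keyword, its position, and its description.
    for entity, index, description in doc._.entities:
        print(entity, index, description)
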
Settings
--------

On initialisation of ``Entity``, you can define the following settings:

================= ============ ===
``nlp``           ``Language`` The shared ``nlp`` object. Used to initialise the matcher with the shared ``Vocab``, and create ``Doc`` match patterns.
``attrs``         tuple        Attributes to set on the ``._`` property. Defaults to ``('has_entities', 'is_entity', 'entity_type', 'entity')``.
``keywords_list`` list         Optional list of terms to look for.
``keywords_dict`` dict         Optional lookup table (dict) with the terms to look for.
``keywords_file`` string       Optional filename with the list of terms to look for.
================= ============ ===

.. code:: python

    entity = Entity(nlp, keywords_list=['python', 'java platform'], label='ACME')
    nlp.add_pipe(entity)
    doc = nlp(u"I am a product manager for a java platform and python.")
    assert doc[3]._.is_entity
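
If your vocabulary lives in a dictionary rather than a flat list, the
``keywords_dict`` setting from the table above can be used instead. A small
sketch, assuming the ``{canonical term: [surface forms]}`` convention used by
``flashtext``; check the project's tests for the exact format:

.. code:: python

    import spacy
    from spacy_lookup import Entity

    nlp = spacy.load('en')

    # Assumed format: each key is the canonical term, each value lists its surface forms.
    entity = Entity(keywords_dict={'Java': ['java platform', 'jvm']})
    nlp.add_pipe(entity, last=True)

    doc = nlp(u"Everything runs on the jvm these days.")
    print(doc._.entities)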