https://github.com/centre-for-humanities-computing/spacy_polyglot
A library that wraps Polyglot as a spaCy pipeline component. It is intended mainly for validation; we do not recommend using it in applications.
- Host: GitHub
- URL: https://github.com/centre-for-humanities-computing/spacy_polyglot
- Owner: centre-for-humanities-computing
- License: mit
- Created: 2023-05-05T02:26:29.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-05-08T19:54:14.000Z (about 3 years ago)
- Last Synced: 2025-01-03T21:41:13.247Z (over 1 year ago)
- Language: Python
- Size: 12.7 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# spaCy Polyglot
A wrapper for using Polyglot within a spaCy pipeline.
## Installation
```bash
pip install git+https://github.com/centre-for-humanities-computing/spacy_polyglot
```
You might need to install some dependencies first:
```bash
# Install system-level build dependencies (Debian/Ubuntu; run as root or with sudo)
echo -e "[INFO:] Installing DEV tools ..." # user msg
apt-get update
apt-get install -y apt-transport-https python3-venv libicu-dev python3-dev
# Install the Python packages in this order
echo -e "[INFO:] Installing packages ..." # user msg
pip install pycld2
pip install polyglot
pip install --no-binary=:pyicu: pyicu
```
*Note*: This package is only intended to work on Linux.
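Because the package targets Linux only, a script that depends on it may want to check the platform before importing it. The following is a minimal sketch using only the standard library; the helper name `is_supported_platform` is hypothetical and not part of `spacy_polyglot`.

```python
import sys


def is_supported_platform(platform: str = sys.platform) -> bool:
    """Return True when the given platform string identifies Linux."""
    # sys.platform is "linux" on Linux, "darwin" on macOS, "win32" on Windows.
    return platform.startswith("linux")


if not is_supported_platform():
    # Warn rather than fail, since this check is only a convenience.
    print(f"Warning: spacy_polyglot is intended for Linux, found {sys.platform!r}")
```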
## Usage
```python
import spacy
from spacy_polyglot import PolyglotComponent  # imported only to register the "polyglot" component

# Danish: start from a blank pipeline and append the Polyglot component
nlp = spacy.blank("da")
nlp.add_pipe("polyglot", last=True)
doc = nlp("Jeg hedder Anders og bor i Odense.")
print(doc.ents)
# (Anders, Odense)

# English: same setup with a blank English pipeline
nlp = spacy.blank("en")
nlp.add_pipe("polyglot", last=True)
doc = nlp("My name is Anders and I live in Odense.")
print(doc.ents)
# (Anders, Odense)
```
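For validation work it is often more useful to inspect each entity's character offsets and label than to print `doc.ents` directly. The sketch below shows one way to do that with the standard spaCy span attributes `text`, `start_char`, `end_char`, and `label_`. The helper name `entity_rows` is hypothetical, and the block uses a stand-in object instead of a real `Doc` so that it runs without spaCy or Polyglot installed. The labels `"PER"` and `"LOC"` are placeholders, not confirmed output of this package.

```python
from types import SimpleNamespace


def entity_rows(doc):
    """Return (text, start_char, end_char, label) for each entity in doc.ents."""
    return [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]


# Stand-in for a processed Doc (assumption: a real Doc would come from nlp(text)).
# The labels below are placeholders chosen for illustration only.
text = "My name is Anders and I live in Odense."
fake_doc = SimpleNamespace(
    ents=[
        SimpleNamespace(text="Anders", start_char=11, end_char=17, label_="PER"),
        SimpleNamespace(text="Odense", start_char=32, end_char=38, label_="LOC"),
    ]
)

for row in entity_rows(fake_doc):
    print(row)
```

With the package installed, the same helper should accept the `doc` returned by `nlp(...)` in the usage example, since it only reads attributes that spaCy entity spans provide.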