https://github.com/imvladikon/spacy-trankit

💥 Trankit models directly in spaCy💥
nlp spacy spacy-extension spacy-nlp spacy-pipeline trankit

Host: GitHub
URL: https://github.com/imvladikon/spacy-trankit
Owner: imvladikon
License: mit
Created: 2023-12-30T18:12:56.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-02-11T22:08:02.000Z (over 1 year ago)
Last Synced: 2025-03-03T04:02:57.741Z (2 months ago)
Topics: nlp, spacy, spacy-extension, spacy-nlp, spacy-pipeline, trankit
Language: Python
Homepage:
Size: 28.3 KB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# spaCy + Trankit

This package wraps the [Trankit](https://github.com/nlp-uoregon/trankit) library, so you can use trankit models in a
[spaCy](https://spacy.io) pipeline.

[//]: # ([![tests](https://github.com/imvladikon/spacy-trankit/actions/workflows/tests.yml/badge.svg)](https://github.com/imvladikon/spacy-trankit/actions/workflows/tests.yml))

[//]: # ([![PyPi](https://img.shields.io/pypi/v/spacy-trankit.svg?style=flat-square)](https://pypi.python.org/pypi/spacy-trankit))

[![GitHub](https://img.shields.io/github/release/imvladikon/spacy-trankit/all.svg?style=flat-square)](https://github.com/imvladikon/spacy-trankit)

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/ambv/black)

Using this wrapper, you'll be able to use the following annotations, computed by
your pretrained `trankit` pipeline/model:

- Statistical tokenization (reflected in the `Doc` and its tokens)

- Lemmatization (`token.lemma` and `token.lemma_`)

- Part-of-speech tagging (`token.tag`, `token.tag_`, `token.pos`, `token.pos_`)

- Morphological analysis (`token.morph`)

- Dependency parsing (`token.dep`, `token.dep_`, `token.head`)

- Named entity recognition (`doc.ents`, `token.ent_type`, `token.ent_type_`,
  `token.ent_iob`, `token.ent_iob_`)

- Sentence segmentation (`doc.sents`)
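
The attributes listed above are ordinary spaCy `Token` and `Doc` attributes, so they can be read the same way whichever backend filled them in. As a minimal sketch, the helper below collects the main per-token annotations into tuples. The name `annotation_rows` is hypothetical and not part of `spacy-trankit`, and the stand-in tokens only mimic the spaCy attribute names so the snippet runs without downloading a model; their values are illustrative, not real model output.

```python
from types import SimpleNamespace


def annotation_rows(doc):
    """Collect (text, lemma, POS, dependency, head text, entity type) per token.

    Hypothetical helper: works on any iterable of objects that expose the
    spaCy token attribute names used below.
    """
    return [
        (t.text, t.lemma_, t.pos_, t.dep_, t.head.text, t.ent_type_)
        for t in doc
    ]


# Stand-in tokens that mimic spaCy attribute names (illustrative values only).
born = SimpleNamespace(text="born", lemma_="bear", pos_="VERB", dep_="root", ent_type_="")
born.head = born  # in spaCy, the root token is its own head
obama = SimpleNamespace(
    text="Obama", lemma_="Obama", pos_="PROPN", dep_="nsubj:pass",
    ent_type_="PERSON", head=born,
)

for row in annotation_rows([obama, born]):
    print(row)
```

With a real pipeline, the same call would presumably be `annotation_rows(nlp("..."))`, since a spaCy `Doc` iterates over its tokens.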

## ⌛️ Installation

As of v0.1.0, `spacy-trankit` is only compatible with **spaCy v3.x**. To install
the most recent version:

```bash
pip install git+https://github.com/imvladikon/spacy-trankit
```

Or from PyPI:

```bash
pip install spacy-trankit
```
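
Because the package targets spaCy v3.x only, it can help to confirm which spaCy version is installed before loading a pipeline. The following is a rough sketch using the standard library's `importlib.metadata`; the function names `is_spacy_v3` and `installed_spacy_version` are hypothetical helpers for illustration, not part of `spacy-trankit` or spaCy.

```python
from importlib.metadata import PackageNotFoundError, version


def is_spacy_v3(version_string):
    """Return True if a version string such as '3.7.2' has major version 3."""
    major = version_string.split(".", 1)[0]
    return major == "3"


def installed_spacy_version():
    """Return the installed spaCy version string, or None if spaCy is missing."""
    try:
        return version("spacy")
    except PackageNotFoundError:
        return None


found = installed_spacy_version()
if found is None:
    print("spaCy is not installed")
elif is_spacy_v3(found):
    print(f"spaCy {found} is a v3.x release")
else:
    print(f"spaCy {found} is outside the v3.x range")
```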

## 📖 Usage & Examples

Load a pretrained `trankit` model into a spaCy pipeline:

```python
import spacy_trankit

# Initialize the pipeline
nlp = spacy_trankit.load("en")

doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")

# Print the per-token annotations
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)

# Print the named entities found in the document
print(doc.ents)
```

Load a model from a local path:

```python
import spacy_trankit

# Initialize the pipeline from a local path
nlp = spacy_trankit.load_from_path(name="en", path="./cache")

doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")

# Print the per-token annotations
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)

# Print the named entities found in the document
print(doc.ents)
```
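
The examples above print `doc.ents` as a flat tuple of spans. If you want the entities organized by type, a small helper like the one below may be useful. This is a sketch: `entities_by_label` is a hypothetical name, not part of `spacy-trankit`, and it relies only on the standard spaCy `Span` attributes `text` and `label_`. The stand-in spans use illustrative labels; the actual label set depends on the loaded model.

```python
from collections import defaultdict
from types import SimpleNamespace


def entities_by_label(ents):
    """Group entity texts by their label, preserving document order."""
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in spans with illustrative labels; real labels depend on the model.
example_ents = [
    SimpleNamespace(text="Barack Obama", label_="PER"),
    SimpleNamespace(text="Hawaii", label_="LOC"),
]
print(entities_by_label(example_ents))
```

With a real pipeline, the call would be `entities_by_label(doc.ents)`.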
