Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kennethenevoldsen/spacy-wrap
spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to include existing fine-tuned models within your SpaCy workflow.
https://github.com/kennethenevoldsen/spacy-wrap
deep-learning huggingface huggingface-transformers language-model machine-learning natural-language-processing nlp pytorch spacy spacy-extension spacy-extensions spacy-models spacy-nlp spacy-pipeline spacy-transformers text-classification transformers
Last synced: 2 months ago
JSON representation
spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to include existing fine-tuned models within your SpaCy workflow.
- Host: GitHub
- URL: https://github.com/kennethenevoldsen/spacy-wrap
- Owner: KennethEnevoldsen
- License: mit
- Created: 2022-01-30T18:49:52.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-15T16:37:36.000Z (8 months ago)
- Last Synced: 2024-09-29T13:01:17.538Z (3 months ago)
- Topics: deep-learning, huggingface, huggingface-transformers, language-model, machine-learning, natural-language-processing, nlp, pytorch, spacy, spacy-extension, spacy-extensions, spacy-models, spacy-nlp, spacy-pipeline, spacy-transformers, text-classification, transformers
- Language: Python
- Homepage: https://KennethEnevoldsen.github.io/spacy-wrap/
- Size: 2.2 MB
- Stars: 46
- Watchers: 3
- Forks: 3
- Open Issues: 0
-
Metadata Files:
- Readme: readme.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Citation: citation.cff
Awesome Lists containing this project
README
# spaCy-wrap: For Wrapping fine-tuned transformers in spaCy pipelines[![PyPI version](https://badge.fury.io/py/spacy-wrap.svg)](https://pypi.org/project/spacy-wrap/)
[![python version](https://img.shields.io/badge/Python-%3E=3.8-blue)](https://github.com/kennethenevoldsen/spacy-wrap)
[![Code style: black](https://img.shields.io/badge/Code%20Style-Black-black)](https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html)
[![github actions pytest](https://github.com/kennethenevoldsen/spacy-wrap/actions/workflows/tests.yml/badge.svg)](https://github.com/kennethenevoldsen/spacy-wrap/actions)
[![github actions docs](https://github.com/kennethenevoldsen/spacy-wrap/actions/workflows/documentation.yml/badge.svg)](https://kennethenevoldsen.github.io/spacy-wrap/)
![github coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/KennethEnevoldsen/33fb85a2c440013df494c1fce884633c/raw/3813a0369fdd61b39a806b7b91839ff405ef809a/badge-spacy-wrap-coverage.json)spaCy-wrap is a minimal library intended for wrapping fine-tuned transformers from the [Huggingface model hub](https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads) in your spaCy pipeline allowing the inclusion of existing models within [SpaCy](https://spacy.io) workflows.
As far as possible it follows a similar API as [spacy-transformers](https://github.com/explosion/spacy-transformers).
**NOTE**: Since the release of spaCy-wrap, Explosion released the [spacy-huggingface-pipelines](https://github.com/explosion/spacy-huggingface-pipelines) it takes the approach of wrapping the Huggingface pipeline as opposed to the transformer. That means token aggregation and conversion into spans happens at
the Huggingface pipeline, while in spaCy-wrap it happens at the logits of the model which can sometimes lead to unfortunate differences in results.
I generally recommend using the spacy-huggingface-pipelines for most use cases, but if you need to use the transformer output more directly
spaCy-wrap can have its uses.## Installation
Installing spacy-wrap is simple using pip:
```
pip install spacy_wrap
```## Examples
The following shows a simple example of how you can quickly add a fine-tuned transformer model from the Huggingface model hub for either [text classification](https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads), [named entity](https://huggingface.co/models?pipeline_tag=token-classification&sort=downloads) or [token classification](https://huggingface.co/models?pipeline_tag=token-classification&sort=downloads).### Sequence Classification
In this example, we will use a [model](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) fine-tuned for sentiment classification on SST2. This model classifies whether a text is positive or negative. We will add this model to a blank English pipeline:```python
import spacy
import spacy_wrapnlp = spacy.blank("en")
config = {
"doc_extension_trf_data": "clf_trf_data", # document extention for the forward pass
"doc_extension_prediction": "sentiment", # document extention for the prediction
"model": {
# the model name or path of huggingface model
"name": "distilbert-base-uncased-finetuned-sst-2-english",
},
}transformer = nlp.add_pipe("sequence_classification_transformer", config=config)
doc = nlp("spaCy is a wonderful tool")
print(doc.cats)
# {'NEGATIVE': 0.001, 'POSITIVE': 0.999}
print(doc._.sentiment)
# 'POSITIVE'
print(doc._.clf_trf_data)
# TransformerData(wordpieces=...
```
These pipelines can also easily be applied to multiple documents using the `nlp.pipe` as one would expect from a spaCy component:```python
docs = nlp.pipe(
[
"I hate wrapping my own models",
"Isn't there a tool for this?!",
"spacy-wrap is great for wrapping models",
]
)for doc in docs:
print(doc._.sentiment)
# 'NEGATIVE'
# 'NEGATIVE'
# 'POSITIVE'
```
More Examples
It is always nice to have more than one example. Here is another one where we add the Hate speech model for Danish to a blank Danish pipeline:
```python
import spacy
import spacy_wrapnlp = spacy.blank("da")
config = {
"doc_extension_trf_data": "clf_trf_data", # document extention for the forward pass
"doc_extension_prediction": "hate_speech", # document extention for the prediction
# choose custom labels
"labels": ["Not hate Speech", "Hate speech"],
"model": {
"name": "DaNLP/da-bert-hatespeech-detection", # the model name or path of huggingface model
},
}transformer = nlp.add_pipe("classification_transformer", config=config)
doc = nlp("Senile gamle idiot") # old senile idiot
doc._.clf_trf_data
# TransformerData(wordpieces=...
doc._.hate_speech
# "Hate speech"
doc._.hate_speech_prob
# {'prob': array([0.013, 0.987], dtype=float32), 'labels': ['Not hate Speech', 'Hate speech']}
```
### Token Classification
We can also use the model for token classification:```python
import spacy
import spacy_wrap
nlp = spacy.blank("en")config = {"model": {"name": "vblagoje/bert-english-uncased-finetuned-pos"},
# "predictions_to": ["pos"] # optional, can be "pos", "tag" or "ents"
}snlp.add_pipe("token_classification_transformer", config=config)
text = "My name is Wolfgang and I live in Berlin"
doc = nlp(text)
print(doc._.tok_clf_predictions)
# ['PRON', 'NOUN', 'AUX', 'PROPN', 'CCONJ', 'PRON', 'VERB', 'ADP', 'PROPN']
```By default, spacy-wrap will automatically detect it the labels follow the universal POS tags as well. If so it will also assign it to the `token.pos`, similar regular spacy pipelines:
```python
print(doc[0].pos_)
# 'PRON'
```### Named Entity Recognition
In this example, we use a model fine-tuned for named entity recognition. spacy-wrap will in this case infer from the IOB tags that the model is intended for named entity recognition and assign it to `doc.ents`.```python
import spacy
import spacy_wrap
nlp = spacy.blank("en")# specify model from the hub
config = {"model": {"name": "dslim/bert-base-NER"},
"predictions_to": ["ents"]} # forced to be named entity recognition, if left out it will be estimated from the labels# add it to the pipe
nlp.add_pipe("token_classification_transformer", config=config)doc = nlp("My name is Wolfgang and I live in Berlin.")
print(doc.ents)
# (Wolfgang, Berlin)
```# 📖 Documentation
| Documentation | |
| -------------------------- | ------------------------------------------- |
| 🔧 **[Installation]** | Installation instructions for spacy-wrap. |
| 📰 **[News and changelog]** | New additions, changes and version history. |
| 🎛 **[Documentation]** | The reference for spacy-wrap's API. |[Documentation]: https://kennethenevoldsen.github.io/spacy-wrap/index.html
[Installation]: https://kennethenevoldsen.github.io/spacy-wrap/installation.html
[News and changelog]: https://kennethenevoldsen.github.io/spacy-wrap/news.html# 💬 Where to ask questions
| Type | |
| ------------------------------ | ---------------------- |
| 🚨 **FAQ** | [FAQ] |
| 🚨 **Bug Reports** | [GitHub Issue Tracker] |
| 🎁 **Feature Requests & Ideas** | [GitHub Issue Tracker] |
| 👩💻 **Usage Questions** | [GitHub Discussions] |
| 🗯 **General Discussion** | [GitHub Discussions] |[FAQ]: https://kennethenevoldsen.github.io/spacy-wrap/faq.html
[github issue tracker]: https://github.com/kennethenevoldsen/spacy-wrap/issues
[github discussions]: https://github.com/kennethenevoldsen/spacy-wrap/discussions