Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
https://github.com/explosion/spacy-transformers
bert google gpt-2 huggingface language-model machine-learning natural-language-processing natural-language-understanding nlp openai pytorch pytorch-model spacy spacy-extension spacy-pipeline transfer-learning xlnet
- Host: GitHub
- URL: https://github.com/explosion/spacy-transformers
- Owner: explosion
- License: mit
- Created: 2019-07-26T19:12:34.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-06-05T08:48:15.000Z (5 months ago)
- Last Synced: 2024-10-29T14:55:19.294Z (10 days ago)
- Topics: bert, google, gpt-2, huggingface, language-model, machine-learning, natural-language-processing, natural-language-understanding, nlp, openai, pytorch, pytorch-model, spacy, spacy-extension, spacy-pipeline, transfer-learning, xlnet
- Language: Python
- Homepage: https://spacy.io/usage/embeddings-transformers
- Size: 1.14 MB
- Stars: 1,348
- Watchers: 32
- Forks: 165
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-python-machine-learning-resources
- awesome-transformer-nlp - spacy-transformers - a library that wraps Hugging Face's Transformers in order to extract features to power NLP pipelines. It also calculates an alignment so the Transformer features can be related back to actual words instead of just wordpieces. (Transformer Implementations By Communities / PyTorch and TensorFlow)
- awesome-list - spacy-transformers - Use pretrained transformers in spaCy, based on HuggingFace Transformers. (Natural Language Processing / General Purpose NLP)
- bert-in-production - spacy-transformers
- awesome-ChatGPT-repositories - spacy-transformers - 🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy (NLP)
README
# spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
This package provides [spaCy](https://github.com/explosion/spaCy) components and
architectures to use transformer models via
[Hugging Face's `transformers`](https://github.com/huggingface/transformers) in
spaCy. The result is convenient access to state-of-the-art transformer
architectures, such as BERT, GPT-2, XLNet, etc.

> **This release requires [spaCy v3](https://spacy.io/usage/v3).** For the
> previous version of this library, see the
> [`v0.6.x` branch](https://github.com/explosion/spacy-transformers/tree/v0.6.x).

[![tests](https://github.com/explosion/spacy-transformers/actions/workflows/tests.yml/badge.svg)](https://github.com/explosion/spacy-transformers/actions/workflows/tests.yml)
[![PyPi](https://img.shields.io/pypi/v/spacy-transformers.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.python.org/pypi/spacy-transformers)
[![GitHub](https://img.shields.io/github/release/explosion/spacy-transformers/all.svg?style=flat-square&logo=github)](https://github.com/explosion/spacy-transformers/releases)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/ambv/black)

## Features
- Use pretrained transformer models like **BERT**, **RoBERTa** and **XLNet** to
power your spaCy pipeline.
- Easy **multi-task learning**: backprop to one transformer model from several
pipeline components.
- Train using spaCy v3's powerful and extensible config system.
- Automatic alignment of transformer output to spaCy's tokenization.
- Easily customize what transformer data is saved in the `Doc` object.
- Easily customize how long documents are processed.
- Out-of-the-box serialization and model packaging.
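As a minimal sketch of these features in action (assuming the
transformer-based `en_core_web_trf` pipeline, which is built on this package
and must be downloaded separately):

```python
import spacy

# Assumes: python -m spacy download en_core_web_trf
nlp = spacy.load("en_core_web_trf")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion.")

# The transformer output is stored on the Doc, aligned to spaCy's tokens.
print(doc._.trf_data)

# Downstream components (here, NER) are powered by the shared transformer.
print([(ent.text, ent.label_) for ent in doc.ents])
```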
## 🚀 Installation

Installing the package from pip will automatically install all dependencies,
including PyTorch and spaCy. Make sure you install this package **before** you
install the models. Also note that this package requires **Python 3.6+**,
**PyTorch v1.5+** and **spaCy v3.0+**.

```bash
pip install 'spacy[transformers]'
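# Optional (assumption for illustration): if you have CUDA 10.0 installed,
# add the matching extra to pull in GPU support, as described below:
pip install 'spacy[transformers,cuda100]'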
```

For GPU installation, find your CUDA version using `nvcc --version` and add the
[version in brackets](https://spacy.io/usage/#gpu), e.g.
`spacy[transformers,cuda92]` for CUDA 9.2 or `spacy[transformers,cuda100]` for
CUDA 10.0.

If you are having trouble installing PyTorch, follow the
[instructions](https://pytorch.org/get-started/locally/) on the official website
for your specific operating system and requirements.

## 📖 Documentation
> ⚠️ **Important note:** This package has been extensively refactored to take
> advantage of [spaCy v3.0](https://spacy.io). Previous versions that were built
> for [spaCy v2.x](https://v2.spacy.io) worked considerably differently. Please
> see previous tagged versions of this README for documentation on prior
> versions.

- 📘
[Embeddings, Transformers and Transfer Learning](https://spacy.io/usage/embeddings-transformers):
How to use transformers in spaCy
- 📘 [Training Pipelines and Models](https://spacy.io/usage/training): Train and
update components on your own data and integrate custom models
- 📘
[Layers and Model Architectures](https://spacy.io/usage/layers-architectures):
Power spaCy components with custom neural networks
- 📗 [`Transformer`](https://spacy.io/api/transformer): Pipeline component API
reference
- 📗
[Transformer architectures](https://spacy.io/api/architectures#transformers):
Architectures and registered functions

## Applying pretrained text and token classification models
Note that the `transformer` component from `spacy-transformers` does not support
task-specific heads like token or text classification. A task-specific
transformer model can be used as a source of features to train spaCy components
like `ner` or `textcat`, but the `transformer` component does not provide access
to task-specific heads for training or inference.

Alternatively, if you only want to use the **predictions** from an existing
Hugging Face text or token classification model, you can use the wrappers from
[`spacy-huggingface-pipelines`](https://github.com/explosion/spacy-huggingface-pipelines)
to incorporate task-specific transformer models into your spaCy pipelines.
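For example, a hedged sketch of that approach (the `hf_text_pipe` factory name
and the model name are assumptions based on the `spacy-huggingface-pipelines`
documentation, not part of this package):

```python
import spacy

# Assumption: spacy-huggingface-pipelines is installed and registers the
# "hf_text_pipe" factory; the model below is just an illustrative example.
nlp = spacy.blank("en")
nlp.add_pipe("hf_text_pipe", config={
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
})
doc = nlp("I love using pretrained pipelines!")
print(doc.cats)  # text classification scores predicted by the wrapped model
```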
## Bug reports and other issues

Please use [spaCy's issue tracker](https://github.com/explosion/spaCy/issues) to
report a bug, or open a new thread on the
[discussion board](https://github.com/explosion/spaCy/discussions) for any other
issue.