https://github.com/explosion/spacy-transformers

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

Host: GitHub
URL: https://github.com/explosion/spacy-transformers
Owner: explosion
License: mit
Created: 2019-07-26T19:12:34.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2025-02-06T11:15:50.000Z (10 months ago)
Last Synced: 2025-04-24T08:55:13.355Z (7 months ago)
Topics: bert, google, gpt-2, huggingface, language-model, machine-learning, natural-language-processing, natural-language-understanding, nlp, openai, pytorch, pytorch-model, spacy, spacy-extension, spacy-pipeline, transfer-learning, xlnet
Language: Python
Homepage: https://spacy.io/usage/embeddings-transformers
Size: 1.15 MB
Stars: 1,382
Watchers: 30
Forks: 172
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-transformer-nlp - spacy-transformers - a library that wraps Hugging Face's Transformers in order to extract features to power NLP pipelines. It also calculates an alignment so the Transformer features can be related back to actual words instead of just wordpieces. (Transformer Implementations By Communities / PyTorch and TensorFlow)
awesome-list - spacy-transformers - Use pretrained transformers in spaCy, based on HuggingFace Transformers. (Natural Language Processing / General Purpose NLP)
bert-in-production - spacy-transformers
awesome-python-machine-learning-resources - GitHub
awesome-ChatGPT-repositories - spacy-transformers - 🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy (NLP)

README

          

# spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

This package provides [spaCy](https://github.com/explosion/spaCy) components and
architectures to use transformer models via
[Hugging Face's `transformers`](https://github.com/huggingface/transformers) in
spaCy. The result is convenient access to state-of-the-art transformer
architectures, such as BERT, GPT-2 and XLNet.

> **This release requires [spaCy v3](https://spacy.io/usage/v3).** For the
> previous version of this library, see the
> [`v0.6.x` branch](https://github.com/explosion/spacy-transformers/tree/v0.6.x).

[![tests](https://github.com/explosion/spacy-transformers/actions/workflows/tests.yml/badge.svg)](https://github.com/explosion/spacy-transformers/actions/workflows/tests.yml)
[![PyPi](https://img.shields.io/pypi/v/spacy-transformers.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.python.org/pypi/spacy-transformers)
[![GitHub](https://img.shields.io/github/release/explosion/spacy-transformers/all.svg?style=flat-square&logo=github)](https://github.com/explosion/spacy-transformers/releases)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/ambv/black)

## Features

- Use pretrained transformer models like **BERT**, **RoBERTa** and **XLNet** to
  power your spaCy pipeline.
- Easy **multi-task learning**: backprop to one transformer model from several
  pipeline components.
- Train using spaCy v3's powerful and extensible config system.
- Automatic alignment of transformer output to spaCy's tokenization.
- Easily customize what transformer data is saved in the `Doc` object.
- Easily customize how long documents are processed.
- Out-of-the-box serialization and model packaging.
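Two of these features, alignment and long-document handling, can be illustrated with a small, dependency-free sketch. It is not the library's implementation: `spacy-transformers` computes its own alignment and exposes span getters (such as a strided one configured with `window` and `stride`) through the config system. Everything below is an assumption chosen for illustration: the function names, the BERT-style `[CLS]`/`[SEP]`/`##` WordPiece conventions, pieces whose text matches the words exactly, and mean pooling as the way to turn wordpiece vectors into one vector per word.

```python
from typing import List, Sequence, Tuple

# Hypothetical conventions for this sketch: BERT-style special tokens and "##"
# continuation markers, with piece text matching the (cased) words exactly.
SPECIAL_PIECES = {"[CLS]", "[SEP]"}


def align_pieces(words: Sequence[str], pieces: Sequence[str]) -> List[List[int]]:
    """For each word, list the indices of the wordpieces that spell it out."""
    alignment: List[List[int]] = [[] for _ in words]
    word_idx, used = 0, 0  # current word, and how many of its characters are consumed
    for piece_idx, piece in enumerate(pieces):
        if piece in SPECIAL_PIECES:
            continue  # special tokens are not aligned to any word
        text = piece[2:] if piece.startswith("##") else piece
        if word_idx >= len(words) or not words[word_idx].startswith(text, used):
            raise ValueError(f"piece {piece!r} does not match the words")
        alignment[word_idx].append(piece_idx)
        used += len(text)
        if used == len(words[word_idx]):  # word fully covered: move to the next one
            word_idx, used = word_idx + 1, 0
    return alignment


def mean_pool(
    piece_vectors: Sequence[Sequence[float]], alignment: List[List[int]]
) -> List[List[float]]:
    """Average the vectors of each word's wordpieces into one vector per word."""
    pooled = []
    for piece_ids in alignment:
        if not piece_ids:
            raise ValueError("a word has no aligned wordpieces")
        columns = zip(*(piece_vectors[i] for i in piece_ids))
        pooled.append([sum(col) / len(piece_ids) for col in columns])
    return pooled


def strided_windows(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Cover tokens [0, n_tokens) with windows of up to `window` tokens, one every `stride`."""
    if not 0 < stride <= window:
        raise ValueError("this sketch requires 0 < stride <= window so no token is skipped")
    spans: List[Tuple[int, int]] = []
    start = 0
    while start < n_tokens:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += stride
    return spans
```

For example, aligning the words `["The", "unbelievable", "cat"]` with the pieces `["[CLS]", "The", "un", "##believ", "##able", "cat", "[SEP]"]` maps the second word to pieces 2, 3 and 4. With `stride < window`, consecutive windows share `window - stride` tokens, so tokens near a window boundary still get context from both sides.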

## 🚀 Installation

Installing the package from pip will automatically install all dependencies,
including PyTorch and spaCy. Make sure you install this package **before** you
install the models. Also note that this package requires **Python 3.6+**,
**PyTorch v1.5+** and **spaCy v3.0+**.

```bash
pip install 'spacy[transformers]'
```
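To confirm that an environment meets the stated minimums for PyTorch and spaCy, a small check like the following can help. It is a sketch, not part of the package: it assumes the installed distributions are named `torch` and `spacy`, and it uses `importlib.metadata`, which needs Python 3.8+ (so running it at all already implies the Python minimum is met). The version parsing is deliberately simple and compares only the leading numeric parts.

```python
from importlib import metadata  # requires Python 3.8+
from itertools import takewhile
from typing import Dict, List, Tuple

# Minimums taken from the requirements stated above.
MINIMUM_VERSIONS: Dict[str, Tuple[int, ...]] = {"torch": (1, 5), "spacy": (3, 0)}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse leading numeric parts, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    numbers = []
    for part in version.split("+")[0].split("."):
        digits = "".join(takewhile(str.isdigit, part))  # '0rc1' -> '0'
        if not digits:
            break  # stop at parts like 'dev0'
        numbers.append(int(digits))
    return tuple(numbers)


def check_requirements() -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for name, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if version_tuple(installed) < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{name} {installed} is older than {wanted}")
    return problems


if __name__ == "__main__":
    problems = check_requirements()
    print("\n".join(problems) if problems else "All stated requirements are met.")
```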

For GPU installation, find your CUDA version using `nvcc --version` and add the
[version in brackets](https://spacy.io/usage/#gpu), e.g.
`spacy[transformers,cuda92]` for CUDA 9.2 or `spacy[transformers,cuda100]` for
CUDA 10.0.
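Picking the extra can be scripted by reading the release number from `nvcc --version`. This sketch assumes the naming pattern of the two examples above (major and minor version digits appended to `cuda`) and the usual `release X.Y` wording in nvcc's output; the function names are hypothetical. The set of CUDA extras differs between spaCy releases, so check the linked page before installing.

```python
import re
import shutil
import subprocess
from typing import Optional


def cuda_extra(nvcc_output: str) -> str:
    """Map nvcc's 'release 10.0' text to an extra name like 'cuda100' (assumed pattern)."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a CUDA release number in the nvcc output")
    major, minor = match.groups()
    return f"cuda{major}{minor}"


def pip_requirement(nvcc_output: str) -> str:
    """Build the requirement string to pass (quoted) to pip install."""
    return f"spacy[transformers,{cuda_extra(nvcc_output)}]"


def detect_requirement() -> Optional[str]:
    """Run `nvcc --version` if it is on PATH; return None when it is not."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    )
    return pip_requirement(result.stdout)
```

For example, output containing `release 10.0` yields `spacy[transformers,cuda100]`, matching the example above.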

If you are having trouble installing PyTorch, follow the
[instructions](https://pytorch.org/get-started/locally/) on the official website
for your specific operating system and requirements.

## 📖 Documentation

> ⚠️ **Important note:** This package has been extensively refactored to take
> advantage of [spaCy v3.0](https://spacy.io). Previous versions that were built
> for [spaCy v2.x](https://v2.spacy.io) worked considerably differently. Please
> see previous tagged versions of this README for documentation on prior
> versions.

- 📘 [Embeddings, Transformers and Transfer Learning](https://spacy.io/usage/embeddings-transformers):
  How to use transformers in spaCy
- 📘 [Training Pipelines and Models](https://spacy.io/usage/training): Train and
  update components on your own data and integrate custom models
- 📘 [Layers and Model Architectures](https://spacy.io/usage/layers-architectures):
  Power spaCy components with custom neural networks
- 📗 [`Transformer`](https://spacy.io/api/transformer): Pipeline component API
  reference
- 📗 [Transformer architectures](https://spacy.io/api/architectures#transformers):
  Architectures and registered functions

## Applying pretrained text and token classification models

Note that the `transformer` component from `spacy-transformers` does not support
task-specific heads like token or text classification. A task-specific
transformer model can be used as a source of features to train spaCy components
like `ner` or `textcat`, but the `transformer` component does not provide access
to task-specific heads for training or inference.

Alternatively, if you only want to use the **predictions** from an existing
Hugging Face text or token classification model, you can use the wrappers from
[`spacy-huggingface-pipelines`](https://github.com/explosion/spacy-huggingface-pipelines)
to incorporate task-specific transformer models into your spaCy pipelines.

## Bug reports and other issues

Please use [spaCy's issue tracker](https://github.com/explosion/spaCy/issues) to
report a bug, or open a new thread on the
[discussion board](https://github.com/explosion/spaCy/discussions) for any other
issue.
