https://github.com/talmago/spacy-coref
Lightweight cross-lingual coreference resolution with spaCy using ONNX Runtime inference of transformer models.
- Host: GitHub
- URL: https://github.com/talmago/spacy-coref
- Owner: talmago
- Created: 2025-07-31T19:56:34.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-08-04T08:23:30.000Z (2 months ago)
- Last Synced: 2025-08-04T11:58:25.193Z (2 months ago)
- Topics: corefer, cross-lingual, minilm, minilm-l12-h384-uncased, natural-language-processing, natural-language-understanding, onnxruntime, spacy, spacy-nlp, transformers
- Language: Python
- Homepage:
- Size: 63.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
# spacy-coref
Lightweight, fast coreference resolution using a distilled version of AllenNLP's coreference model (exported to ONNX).
## ✨ Features
- 🧠 Cross-lingual coreference resolution
- 🪶 Lightweight model based on AllenNLP's coreference model
- ⚡ Fast inference via ONNX
- Easy integration with spaCy

---
## 📦 Installation
```bash
$ pip install spacy-coref
```

## Quickstart
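The examples in this section print clusters of mentions that refer to the same entity. As a rough mental model of that output, here is a minimal sketch that turns token-span clusters into mention strings. It assumes AllenNLP-style clusters, where each mention is an inclusive `(start, end)` pair of token offsets counted over the whole document. The helper name `decode_spans` and the hand-written spans are hypothetical and only illustrate the idea; this is not the package's own decoding code.

```python
from typing import List, Tuple

# Assumption: a mention is an inclusive (start, end) pair of token offsets.
Span = Tuple[int, int]


def decode_spans(sentences: List[List[str]], clusters: List[List[Span]]) -> List[List[str]]:
    """Turn clusters of token spans into clusters of mention strings."""
    # Assumption: offsets index the document after all sentences are flattened.
    tokens = [tok for sent in sentences for tok in sent]
    return [
        [" ".join(tokens[start:end + 1]) for start, end in cluster]
        for cluster in clusters
    ]


sentences = [
    ["Barack", "Obama", "was", "the", "44th", "President", "of", "the", "United", "States", "."],
    ["He", "was", "born", "in", "Hawaii", "."],
]

# Hand-written spans for illustration, not model output:
# tokens 0-1 are "Barack Obama", token 11 is "He".
clusters = [[(0, 1), (11, 11)]]

mentions = decode_spans(sentences, clusters)
print(mentions)
```

Offsets that run over the flattened document, rather than restarting in each sentence, let a single cluster link mentions across sentence boundaries.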
Usage as a standalone component
```python
from spacy_coref import CoreferenceResolver, decode_clusters

resolver = CoreferenceResolver.from_pretrained("talmago/allennlp-coref-onnx-mMiniLMv2-L12-H384-distilled-from-XLMR-Large")
sentences = [
    ["Barack", "Obama", "was", "the", "44th", "President", "of", "the", "United", "States", "."],
    ["He", "was", "born", "in", "Hawaii", "."]
]

pred = resolver(sentences)
print(decode_clusters(sentences, pred["clusters"][0]))
# Output is:
# [['Barack Obama', 'He']]
```

Usage with spaCy
```python
import spacy
import spacy_coref

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("coref_minilm")

doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
print(doc._.coref_clusters[0])
print(doc._.cluster_heads)
print(doc._.resolved_text)

# Output is:
# [Barack Obama, He]
# {'Barack Obama': Barack Obama}
# Barack Obama was born in Hawaii. Barack Obama was elected president in 2008.
```

## 🛠️ Development
Set up virtualenv
```sh
$ make env
```

Set PYTHONPATH (replace the path with the `src` directory of your local checkout)
```sh
$ export PYTHONPATH=$PYTHONPATH:/Users/talmago/git/spacy-coref/src
```

Code formatting
```sh
$ make format
```

### References
This project builds on the work of the following repositories:
- **[crosslingual-coreference](https://github.com/davidberenstein1957/crosslingual-coreference)**
  David Berenstein's implementation of multilingual coreference resolution, adapted from the original AllenNLP coref model.
  GitHub: [davidberenstein1957/crosslingual-coreference](https://github.com/davidberenstein1957/crosslingual-coreference)
- **[AllenNLP coreference model](https://github.com/allenai/allennlp-models/tree/b1f372248c17ad12684d344955fbcd98e957e77e/allennlp_models/coref)**
  Official AllenNLP implementation of coreference resolution.
  GitHub: [allenai/allennlp-models](https://github.com/allenai/allennlp-models)