https://github.com/explosion/spacy-alignments
💫 A spaCy package for Yohei Tamura's Rust tokenizations library
- Host: GitHub
- URL: https://github.com/explosion/spacy-alignments
- Owner: explosion
- License: mit
- Created: 2020-12-08T08:07:25.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-11-03T15:15:32.000Z (over 2 years ago)
- Last Synced: 2025-01-29T18:38:16.786Z (about 1 year ago)
- Language: Python
- Size: 21.5 KB
- Stars: 27
- Watchers: 7
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# spacy-alignments: Align tokenizations for spaCy + transformers
A spaCy package for Yohei Tamura's Rust
[tokenizations](https://github.com/tamuhey/tokenizations/) library with Python
bindings.
## Installation
```
pip install -U pip setuptools wheel
pip install spacy-alignments
```
If no binary wheel is available for your platform, you will need to [install
Rust](https://www.rust-lang.org/tools/install) in order to build
`spacy-alignments` from source.
## spacy-alignments vs. pytokenizations
The `spacy_alignments` module is a drop-in replacement for `tokenizations`:
```python
import spacy_alignments as tokenizations
a2b, b2a = tokenizations.get_alignments(["å", "BC"], ["abc"])
assert a2b == [[0], [0]]
assert b2a == [[0, 1]]
```
The only difference between this package and the original
[`pytokenizations`](https://pypi.org/project/pytokenizations/) is that it
switches the build system to `setuptools-rust` to make it easier for us at
Explosion to build source and binary packages for a wider range of platforms.
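In the alignment example above, `a2b[i]` lists the indices of the tokens in the second sequence that overlap with token `i` of the first, and `b2a` is the reverse mapping. As a minimal sketch of how such a mapping might be consumed, the snippet below projects per-token labels from one tokenization onto the other. Note that `project_labels` and the label values are hypothetical and not part of `spacy-alignments`; the alignment lists are hard-coded from the example above so the snippet runs without the package installed.

```python
from typing import List, Sequence


def project_labels(
    labels_a: Sequence[str], b2a: Sequence[Sequence[int]]
) -> List[List[str]]:
    """Hypothetical helper: for each token in b, collect the labels of the
    a-tokens aligned to it. Unaligned b-tokens get an empty list."""
    return [[labels_a[i] for i in a_indices] for a_indices in b2a]


# Alignment values copied from the README example:
#   get_alignments(["å", "BC"], ["abc"])
a2b = [[0], [0]]
b2a = [[0, 1]]

# Made-up labels for the two tokens "å" and "BC" (illustration only).
labels_a = ["LOWER", "UPPER"]

projected = project_labels(labels_a, b2a)
print(projected)  # [['LOWER', 'UPPER']]
```

Because a single token on one side can align to several tokens on the other, the helper returns a list of labels per token rather than a single label; how to merge them (first label, majority vote, and so on) is left to the caller.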
## Bug reports and other issues
Please use [spaCy's issue tracker](https://github.com/explosion/spaCy/issues) to report a bug, or open a new thread on the
[discussion board](https://github.com/explosion/spaCy/discussions)
for any other issue.