https://github.com/xwiz/spacy_symspell
Spacy symspell extension
spacy spelling-correction spelling-suggestions symspell
- Host: GitHub
- URL: https://github.com/xwiz/spacy_symspell
- Owner: xwiz
- License: mit
- Created: 2019-04-18T09:16:32.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-08-26T16:12:59.000Z (about 5 years ago)
- Last Synced: 2024-10-08T01:31:18.057Z (29 days ago)
- Topics: spacy, spelling-correction, spelling-suggestions, symspell
- Language: Python
- Size: 3.41 MB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# spaCy Symspell
## Spelling correction implementation in spaCy via Symspell

This package is a [spaCy 2.0 extension](https://spacy.io/usage/processing-pipelines#section-extensions) that adds sentence and spelling corrections via Symspell to spaCy's text processing pipeline.
## Installation
`pip install spacy_symspell`
## Notes
This package is still in alpha, so unforeseen errors are possible. Dictionary loading time is also significant and can take up to 30 seconds on slow machines.

## Usage
Adding the component to the processing pipeline is relatively simple:
```python
import spacy
from spacy_symspell import SpellingCorrector

nlp = spacy.load('en_core_web_sm')
corrector = SpellingCorrector()
nlp.add_pipe(corrector)  # spaCy 2.x accepts the component object directly

doc = nlp('What doyuoknowabout antyhing')

# doc._.suggestions is iterable
for s in doc._.suggestions:
    print(s)  # What doyon about anything

print(doc._.segmentation)
# ::segmented_string - What doyouk now about antyhing
# ::corrected_string - that dook now about anything
```

spacy_symspell operates on spaCy `Doc` and `Span` objects. When called on a `Doc` or `Span`, the object is given two custom attributes, accessed through the `._` namespace: `suggestions` (a list of all spelling suggestions found) and `segmentation` (a corrected sentence for input with omitted spaces).
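The `segmentation` attribute targets input whose spaces were dropped, such as `doyuoknowabout`. As a rough illustration of the general idea only, the sketch below splits a space-less string into its most probable word sequence using dynamic programming over a toy unigram frequency table. It is not the algorithm symspellpy uses and it does not correct typos; `WORD_COUNTS`, `word_log_prob`, `segment`, and the unknown-word penalty are all hypothetical choices made for this example.

```python
from __future__ import annotations

import math
from functools import lru_cache

# Hypothetical toy unigram counts; a real frequency dictionary is far larger.
WORD_COUNTS = {
    "what": 500, "do": 800, "you": 900, "know": 400, "about": 600,
    "anything": 200, "now": 300, "no": 350, "a": 1000, "any": 250,
    "thing": 150, "bout": 5, "out": 400,
}
TOTAL = sum(WORD_COUNTS.values())
MAX_WORD_LEN = max(len(w) for w in WORD_COUNTS)


def word_log_prob(word: str) -> float:
    """Log-probability of a word; unknown strings get a length-based penalty."""
    if word in WORD_COUNTS:
        return math.log(WORD_COUNTS[word] / TOTAL)
    # Hypothetical penalty: each extra character makes an unknown string 10x less likely.
    return math.log(10.0 / (TOTAL * 10 ** len(word)))


def segment(text: str) -> list[str]:
    """Split text with missing spaces into the most probable word sequence."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def best(i: int) -> tuple[float, tuple[str, ...]]:
        # Best (log-probability, words) for the suffix text[i:].
        if i == len(text):
            return 0.0, ()
        candidates = []
        for j in range(i + 1, min(len(text), i + MAX_WORD_LEN) + 1):
            rest_score, rest_words = best(j)
            word = text[i:j]
            candidates.append((word_log_prob(word) + rest_score, (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])
```

With this toy table, `segment('whatdoyouknowaboutanything')` returns `['what', 'do', 'you', 'know', 'about', 'anything']`: splits such as `a bout` or `any thing` lose because the product of their word probabilities is lower than that of the single longer word.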
## Todo
Symspell accuracy can be improved with the help of spaCy by extracting and analyzing the resulting n-grams and cross-referencing them with the n-grams deducible from the character groups in the Symspell result. For example, the correction 'that dook now' leaves a verbless sentence; closer analysis would reveal that the character group 'now' is related to the verb 'know', and that 'know' is associated with the n-gram 'you know'.

## Under the hood
[spacy_symspell](https://github.com/xwiz/spacy_symspell) is currently a wrapper of the [python port](https://github.com/mammothb/symspellpy) for [Symspell](https://github.com/wolfgarbe/SymSpell). For additional details, see the linked project pages.
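SymSpell's central idea is the symmetric delete algorithm: delete-only variants of every dictionary word are precomputed once, so a lookup only has to generate deletes of the input and probe a hash map, instead of generating every insertion, substitution, and transposition. The following is a toy pure-Python sketch of that idea, not symspellpy's code or API; `SymmetricDeleteIndex`, `deletes`, `osa_distance`, and the sample frequencies are hypothetical, and real SymSpell adds optimisations (such as a prefix-length limit) that are omitted here.

```python
from __future__ import annotations

from collections import defaultdict


def deletes(word: str, max_distance: int) -> set[str]:
    """Every string reachable from `word` by deleting up to `max_distance` characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "ty" -> "yt", costs 1.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


class SymmetricDeleteIndex:
    """Hypothetical toy index illustrating symmetric delete; not symspellpy's API."""

    def __init__(self, word_counts: dict[str, int], max_distance: int = 2) -> None:
        self.word_counts = word_counts
        self.max_distance = max_distance
        # Precompute: map each delete-variant back to the dictionary words producing it.
        self.index: dict[str, set[str]] = defaultdict(set)
        for word in word_counts:
            for variant in deletes(word, max_distance):
                self.index[variant].add(word)

    def lookup(self, term: str) -> list[str]:
        """Dictionary words within max_distance of `term`, closest first, then most frequent."""
        # Only the input's deletes are generated, then probed in the hash map.
        candidates: set[str] = set()
        for variant in deletes(term, self.max_distance):
            candidates |= self.index.get(variant, set())
        # Shared deletes can over-generate, so verify with a true edit distance.
        scored = sorted(
            (osa_distance(term, word), -self.word_counts[word], word)
            for word in candidates
        )
        return [word for dist, _, word in scored if dist <= self.max_distance]
```

For example, with `index = SymmetricDeleteIndex({"anything": 200, "about": 600, "know": 400, "now": 300})`, `index.lookup("antyhing")` returns `["anything"]` (one adjacent transposition), and `index.lookup("kow")` returns `["know", "now"]`, where both are one edit away and the tie is broken by frequency.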