https://github.com/snapp-incubator/go-symspell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
damerau-levenshtein fuzzy-matching fuzzy-search levenshtein spell-checker spellcheck spelling symspell
- Host: GitHub
- URL: https://github.com/snapp-incubator/go-symspell
- Owner: snapp-incubator
- License: agpl-3.0
- Created: 2025-01-02T11:47:46.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-01-04T11:26:55.000Z (4 months ago)
- Last Synced: 2025-01-04T12:34:54.193Z (4 months ago)
- Topics: damerau-levenshtein, fuzzy-matching, fuzzy-search, levenshtein, spell-checker, spellcheck, spelling, symspell
- Language: Go
- Homepage:
- Size: 2.99 MB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
- License: LICENSE
# SymSpell Package
[Go Reference](https://pkg.go.dev/github.com/snapp-incubator/go-symspell)
[Go Report Card](https://goreportcard.com/report/github.com/snapp-incubator/go-symspell)

## Overview
The `symspell` package provides a Go implementation of the SymSpell algorithm, a fast and memory-efficient algorithm
for spelling correction, word segmentation, and fuzzy string matching. It supports both unigram and bigram dictionaries
for context-aware correction.

## Features
- Fast lookup for single-word corrections
- Compound word corrections
- Customizable edit distance and prefix length
- Support for unigram and bigram dictionaries
- Configurable thresholds for performance tuning

## Installation
Install the package using `go get`:
```sh
go get github.com/snapp-incubator/go-symspell
```

## Usage
- Import the package: `import "github.com/snapp-incubator/go-symspell"`
- Initialize SymSpell
- Run a lookup
##### Lookup
###### Load a unigram dictionary:
```go
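package main

import symspell "github.com/snapp-incubator/go-symspell"

func main() {
	symSpell := symspell.NewSymSpellWithLoadDictionary("path/to/vocab.txt", 0, 1,
		symspell.WithCountThreshold(10),           // filter out low-frequency entries
		symspell.WithMaxDictionaryEditDistance(3), // maximum edit distance for corrections
		symspell.WithPrefixLength(5),              // prefix length used by the index
	)
	_ = symSpell // call symSpell.Lookup(...) here
}
```

##### Compound Lookup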
###### Load both unigram and bigram dictionaries:
```go
package main

import symspell "github.com/snapp-incubator/go-symspell"

func main() {
	symSpell := symspell.NewSymSpellWithLoadBigramDictionary("path/to/vocab.txt", "path/to/vocab_bigram.txt", 0, 1,
		symspell.WithCountThreshold(1),
		symspell.WithMaxDictionaryEditDistance(3),
		symspell.WithPrefixLength(7),
	)
	_ = symSpell // call symSpell.LookupCompound(...) here
}
```

### Perform Lookup
#### Single Word Lookup
```go
suggestions, err := symSpell.Lookup("حیابان", symspell.Top, 3)
if err != nil {
log.Fatal(err)
}
fmt.Println(suggestions[0].Term) // Output: خیابان
```

#### Compound Word Lookup
```go
suggestion := symSpell.LookupCompound("حیابان ملاصدزا", 3)
fmt.Println(suggestion.Term) // Output: خیابان ملاصدرا
```

## Examples
#### Unit Tests
The repository includes comprehensive unit tests. Run the tests with:
```shell
go test ./...
```

Example test cases include single-word corrections, compound word corrections, and edge cases.
### Configuration Options
- `WithMaxDictionaryEditDistance`: Sets the maximum edit distance for corrections.
- `WithPrefixLength`: Sets the prefix length for index optimization.
- `WithCountThreshold`: Filters out dictionary entries with low frequency.

### Dictionaries
The dictionaries should be formatted as plain text files:
- Unigram file: Each line should contain a term and its frequency, separated by a space (a custom separator can also be used).
- Bigram file: Each line should contain two terms and their frequency, separated by a space.

#### Example:
Unigram (vocab.txt):
```text
خیابان 1000
میدان 800
```

Bigram (vocab_bigram.txt):
```text
خیابان کارگر 500
میدان آزادی 300
```

### Performance
SymSpell is optimized for speed and memory efficiency. For large vocabularies, tune the maximum edit distance, prefix
length, and count threshold (`WithMaxDictionaryEditDistance`, `WithPrefixLength`, and `WithCountThreshold`) to balance
performance and accuracy.