https://github.com/reneklacan/symspell
Spelling correction & Fuzzy search based on Symmetric Delete spelling correction algorithm.
- Host: GitHub
- URL: https://github.com/reneklacan/symspell
- Owner: reneklacan
- License: mit
- Created: 2018-04-08T21:54:09.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-04-10T14:58:19.000Z (over 1 year ago)
- Last Synced: 2024-10-14T16:39:50.639Z (24 days ago)
- Topics: rust, spellcheck, spelling-correction, symspell
- Language: Rust
- Homepage:
- Size: 2.54 MB
- Stars: 131
- Watchers: 8
- Forks: 30
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
[![Documentation](https://docs.rs/symspell/badge.svg)](https://docs.rs/symspell)
# SymSpell
A Rust implementation of the brilliant [SymSpell](https://github.com/wolfgarbe/SymSpell), originally written in C# by [@wolfgarbe](https://github.com/wolfgarbe).
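The Symmetric Delete idea behind SymSpell is that only deletions are generated, both for dictionary terms (ahead of time) and for the query (at lookup time), and a match is found where the two sets of variants meet. The following is a minimal standalone sketch of that idea, not this crate's implementation: `deletes_within` is a hypothetical helper, and the real crate adds prefix handling, frequency counts, and an edit-distance check on top.

```rust
use std::collections::HashSet;

/// Collect every string reachable from `word` by deleting up to
/// `max_distance` characters. The word itself is included.
/// Hypothetical helper for illustration; not part of the symspell crate.
fn deletes_within(word: &str, max_distance: usize) -> HashSet<String> {
    let mut seen: HashSet<String> = HashSet::new();
    seen.insert(word.to_string());
    let mut frontier = vec![word.to_string()];

    for _ in 0..max_distance {
        let mut next = Vec::new();
        for candidate in &frontier {
            let chars: Vec<char> = candidate.chars().collect();
            // Drop one character at a time to build the shorter variants.
            for skip in 0..chars.len() {
                let shorter: String = chars
                    .iter()
                    .enumerate()
                    .filter(|(i, _)| *i != skip)
                    .map(|(_, c)| *c)
                    .collect();
                // Only expand variants that have not been seen before.
                if seen.insert(shorter.clone()) {
                    next.push(shorter);
                }
            }
        }
        frontier = next;
    }
    seen
}

fn main() {
    // Dictionary side: variants of a known term, computed once.
    let dictionary_side = deletes_within("rocket", 2);
    // Query side: variants of the misspelled input.
    let query_side = deletes_within("roket", 2);

    // "roket" is "rocket" minus the "c", so the two sets overlap.
    let shared = dictionary_side.intersection(&query_side).count();
    println!("shared delete variants: {}", shared);
}
```

Because insertions, replacements, and transpositions never have to be enumerated, the number of candidates per word stays small, which is what makes the lookup fast.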
## Usage
```rust
extern crate symspell;

use symspell::{AsciiStringStrategy, SymSpell, Verbosity};

fn main() {
    let mut symspell: SymSpell<AsciiStringStrategy> = SymSpell::default();

    // Arguments: path, term column index, count column index, separator.
    symspell.load_dictionary("data/frequency_dictionary_en_82_765.txt", 0, 1, " ");
    symspell.load_bigram_dictionary(
        "./data/frequency_bigramdictionary_en_243_342.txt",
        0,
        2,
        " "
    );

    // Single-word lookup with a maximum edit distance of 2.
    let suggestions = symspell.lookup("roket", Verbosity::Top, 2);
    println!("{:?}", suggestions);

    // Correction of a whole sentence.
    let sentence = "whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixtgrade and ins pired him";
    let compound_suggestions = symspell.lookup_compound(sentence, 2);
    println!("{:?}", compound_suggestions);

    // Segmentation of a string that is missing its spaces.
    let sentence = "whereisthelove";
    let segmented = symspell.word_segmentation(sentence, 2);
    println!("{:?}", segmented);
}
```

N.B. The dictionary entries have to be lowercase.
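If a dictionary file contains mixed-case terms, it can be normalized before loading. The sketch below is one possible way to do that with the standard library only; `lowercase_term` is a hypothetical helper (not part of this crate), and the sample lines are made up for illustration.

```rust
/// Lowercase the term column of one dictionary line and leave the
/// other columns (such as the count) untouched.
/// Hypothetical helper for illustration; not part of the symspell crate.
fn lowercase_term(line: &str, term_index: usize, separator: &str) -> String {
    line.split(separator)
        .enumerate()
        .map(|(i, field)| {
            if i == term_index {
                field.to_lowercase()
            } else {
                field.to_string()
            }
        })
        .collect::<Vec<String>>()
        .join(separator)
}

fn main() {
    // Made-up sample in the "term count" layout used by the dictionaries above.
    let raw = "The 100\nRocket 7";
    let cleaned: Vec<String> = raw
        .lines()
        .map(|line| lowercase_term(line, 0, " "))
        .collect();
    println!("{}", cleaned.join("\n"));
}
```

The cleaned lines could then be written to a file and passed to `load_dictionary` with the same `term_index`, `count_index`, and separator as before.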
## Advanced Usage
### Using Custom Settings
```rust
let mut symspell: SymSpell<AsciiStringStrategy> = SymSpellBuilder::default()
    .max_dictionary_edit_distance(2)
    .prefix_length(7)
    .count_threshold(1)
    .build()
    .unwrap();
```

### String Strategy
A string strategy is an abstraction for string manipulation, such as preprocessing.
There are two strategies included:
* `UnicodeStringStrategy`
  * Doesn't do any preprocessing and handles strings as they are.
* `AsciiStringStrategy`
  * Transliterates strings into ASCII-only characters.
  * Useful when you are working with accented languages and don't want to care about accents, etc.

To configure the string strategy, pass it as a type parameter:
```rust
let mut ascii_symspell: SymSpell<AsciiStringStrategy> = SymSpell::default();
let mut unicode_symspell: SymSpell<UnicodeStringStrategy> = SymSpell::default();
```

### Javascript Bindings
This crate can be compiled for the `wasm32` target and exposes a `SymSpell` class that can be used from JavaScript as follows.
Only `UnicodeStringStrategy` is exported, so if you want to work with ASCII-only strings, the dictionary and the sentences must be prepared in advance on the JS side.

```javascript
const fs = require('fs');
const rust = require('./pkg');

let dictionary = fs.readFileSync('data/frequency_dictionary_en_82_765.txt');
// Bigram dictionary, using the same file as in the Rust example above.
let bigram_dict = fs.readFileSync('data/frequency_bigramdictionary_en_243_342.txt');
let sentence = "whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixtgrade and ins pired him";

let symspell = new rust.SymSpell({ max_edit_distance: 2, prefix_length: 7, count_threshold: 1});
symspell.load_dictionary(dictionary.buffer, { term_index: 0, count_index: 1, separator: " "});
symspell.load_bigram_dictionary(bigram_dict.buffer, { term_index: 0, count_index: 2, separator: " "});
symspell.lookup_compound(sentence, 1);
```

It can be compiled using `wasm-pack` (e.g. `wasm-pack build --release --target nodejs`).