Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/prinzhorn/nicenshtein
Efficiently index and search a dictionary by Levenshtein distance
Last synced: 24 days ago
- Host: GitHub
- URL: https://github.com/prinzhorn/nicenshtein
- Owner: Prinzhorn
- License: mit
- Created: 2018-04-03T15:08:13.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-04-16T09:01:01.000Z (over 6 years ago)
- Last Synced: 2024-06-20T02:04:03.041Z (5 months ago)
- Language: Go
- Size: 8.79 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Note: this was mostly meant for me to learn some basic Go. Right now it is really slow for distances greater than 2 (slower than the naive approach of comparing the query against every word).
# Nicenshtein
Efficiently index and search a dictionary by Levenshtein distance. This is done by building a trie (prefix tree) as an index and then walking the trie to collect all words within a given distance. While walking, we keep track of the number of edits made so far and follow multiple paths at the same time until all allowed edits are consumed.
It is safe to use with UTF-8 strings because it operates on runes internally, so a multi-byte character counts as a single edit.
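The walk described above can be sketched roughly as follows. This is a minimal illustration written for this README, not the package's actual source: the `node` type and the `add`, `collect`, and `search` names are all hypothetical. It assumes the straightforward strategy of trying a match, a deletion, an insertion, and a substitution at every node while an edit budget remains, and keeping the smallest distance found per word.

```go
package main

import "fmt"

// node is one trie node (hypothetical type, not the library's).
// Children are keyed by rune so a multi-byte character is one edit.
type node struct {
	children map[rune]*node
	isWord   bool
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// add inserts word into the trie rooted at n, one rune per level.
func (n *node) add(word string) {
	cur := n
	for _, r := range word {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.isWord = true
}

// collect explores every path reachable with at most max edits.
// prefix holds the runes on the path to n, rest is the unconsumed
// part of the query, used is the number of edits spent so far.
func (n *node) collect(out map[string]byte, prefix, rest []rune, used, max byte) {
	if len(rest) == 0 && n.isWord {
		w := string(prefix) // copies prefix, so later appends are harmless
		if d, seen := out[w]; !seen || used < d {
			out[w] = used // keep the cheapest edit script
		}
	}
	if len(rest) > 0 {
		// Match: consume one query rune and one trie edge for free.
		if child, ok := n.children[rest[0]]; ok {
			child.collect(out, append(prefix, rest[0]), rest[1:], used, max)
		}
	}
	if used == max {
		return // edit budget consumed
	}
	if len(rest) > 0 {
		// Deletion: skip a query rune, stay on the same node.
		n.collect(out, prefix, rest[1:], used+1, max)
	}
	for r, child := range n.children {
		// Insertion: follow a trie edge without consuming the query.
		child.collect(out, append(prefix, r), rest, used+1, max)
		// Substitution: follow a different edge and consume a query rune.
		if len(rest) > 0 && r != rest[0] {
			child.collect(out, append(prefix, r), rest[1:], used+1, max)
		}
	}
}

// search returns every indexed word within max edits of word.
func search(root *node, word string, max byte) map[string]byte {
	out := map[string]byte{}
	root.collect(out, nil, []rune(word), 0, max)
	return out
}

func main() {
	root := newNode()
	for _, w := range []string{"cat", "cart", "cot", "coat", "dog", "häuser"} {
		root.add(w)
	}
	fmt.Println(search(root, "cat", 1))
	fmt.Println(search(root, "hauser", 1))
}
```

Because every node fans out into insertions and substitutions for each child, the number of explored paths grows quickly with the edit budget, which is consistent with the slowdown the note at the top mentions for larger distances.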
Check out [nicenshtein-server](https://github.com/Prinzhorn/nicenshtein-server) as well; it has a live demo at [https://nicenshtein.now.sh](https://nicenshtein.now.sh).
# API
## NewNicenshtein()
Returns a new instance of a Nicenshtein index with the following methods:
## IndexFile(filePath string): error
Indexes every line of the given file as a word using `AddWord`.
## AddWord(word string)
Adds a `word` to the index.
## ContainsWord(word string): bool
Returns whether or not the index contains the given `word`.
## CollectWords(out \*map[string]byte, word string, maxDistance byte)
Fills `out` (a map from words to their distances) with all indexed words that are within `maxDistance` of `word`.
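To make the shape of the result concrete, here is a sketch of the naive approach the note at the top refers to: compute the Levenshtein distance between the query and every word in a list, and keep those within the limit. It is a baseline for comparison, not part of this package. The `levenshtein`, `min3`, and `naiveCollect` names are hypothetical, and it takes a plain `map[string]byte` rather than the `*map[string]byte` used by `CollectWords`.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between a and b, counted in
// runes, using the standard two-row dynamic programming table.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	cur := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(rb)]
}

// naiveCollect fills out with every word in dict that is within
// maxDistance of word, mapping each match to its distance.
func naiveCollect(out map[string]byte, dict []string, word string, maxDistance byte) {
	for _, w := range dict {
		if d := levenshtein(w, word); d <= int(maxDistance) {
			out[w] = byte(d)
		}
	}
}

func main() {
	dict := []string{"cat", "cart", "cot", "coat", "dog"}
	out := map[string]byte{}
	naiveCollect(out, dict, "cat", 1)
	fmt.Println(out)
}
```

A scan like this costs one distance computation per dictionary word regardless of `maxDistance`, which makes it a useful yardstick when measuring the trie-based index.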