
https://github.com/prinzhorn/nicenshtein

Efficiently index and search a dictionary by Levenshtein distance

Host: GitHub
URL: https://github.com/prinzhorn/nicenshtein
Owner: Prinzhorn
License: mit
Created: 2018-04-03T15:08:13.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2018-04-16T09:01:01.000Z (over 6 years ago)
Last Synced: 2024-06-20T02:04:03.041Z (5 months ago)
Language: Go
Size: 8.79 KB
Stars: 2
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

README

Note: this was mostly meant for me to learn some basic Go. Right now it is really slow for distances > 2 (slower than the naive approach).

# Nicenshtein

Efficiently index and search a dictionary by Levenshtein distance. This is done by building a trie (prefix tree) as an index and then walking the trie to collect all words within a given distance. We keep track of the number of edits made so far and walk multiple paths at the same time until the edit budget is used up.

It is safe to use with UTF-8 strings because it operates on runes internally.
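To see why operating on runes matters, compare the edit distance of two words computed over bytes and over runes. The sketch below is a plain two-row dynamic-programming Levenshtein function (`levenshtein` and `min3` are hypothetical helpers, not part of Nicenshtein); it uses generics, so it assumes Go 1.18 or later.

```go
package main

import "fmt"

// levenshtein computes the edit distance between two sequences with the
// classic two-row dynamic-programming table. The type parameter lets the
// same code run over bytes or runes. Illustrative helper only.
func levenshtein[T comparable](a, b []T) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution (or match)
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

func min3(x, y, z int) int {
	m := x
	if y < m {
		m = y
	}
	if z < m {
		m = z
	}
	return m
}

func main() {
	a, b := "naive", "naïve" // "ï" is two bytes in UTF-8
	fmt.Println(levenshtein([]byte(a), []byte(b))) // 2
	fmt.Println(levenshtein([]rune(a), []rune(b))) // 1
}
```

Over bytes, the two-byte "ï" costs a substitution plus an insertion; over runes it is a single substitution.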

Also check out [nicenshtein-server](https://github.com/Prinzhorn/nicenshtein-server), which has a live demo at [https://nicenshtein.now.sh](https://nicenshtein.now.sh).

# API

## NewNicenshtein()

Returns a new instance of a Nicenshtein index with the following methods:

## IndexFile(filePath string): error

Indexes every single line in the given file using `AddWord`.

## AddWord(word string)

Adds a `word` to the index.

## ContainsWord(word string): bool

Returns whether or not the index contains the given `word`.
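A minimal sketch of how `AddWord` and `ContainsWord` could work on a rune-keyed trie. `Index`, `NewIndex` and `trieNode` are hypothetical stand-ins rather than Nicenshtein's actual types. The end-of-word flag is what lets `ContainsWord` tell a stored word apart from a mere prefix of one.

```go
package main

import "fmt"

// trieNode is a hypothetical rune-keyed trie node.
type trieNode struct {
	children map[rune]*trieNode
	isWord   bool // a stored word ends at this node
}

// Index is an illustrative stand-in for a Nicenshtein index.
type Index struct{ root *trieNode }

func NewIndex() *Index {
	return &Index{root: &trieNode{children: map[rune]*trieNode{}}}
}

// AddWord follows (or creates) one node per rune and marks the last one.
func (ix *Index) AddWord(word string) {
	n := ix.root
	for _, r := range word {
		child, ok := n.children[r]
		if !ok {
			child = &trieNode{children: map[rune]*trieNode{}}
			n.children[r] = child
		}
		n = child
	}
	n.isWord = true
}

// ContainsWord reports whether word itself was added, not just a longer
// word that starts with it.
func (ix *Index) ContainsWord(word string) bool {
	n := ix.root
	for _, r := range word {
		child, ok := n.children[r]
		if !ok {
			return false
		}
		n = child
	}
	return n.isWord
}

func main() {
	ix := NewIndex()
	ix.AddWord("cart")
	ix.AddWord("car")
	fmt.Println(ix.ContainsWord("car"), ix.ContainsWord("ca")) // true false
}
```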

## CollectWords(out \*map[string]byte, word string, maxDistance byte)

Fills `out` (which maps each word to its distance) with all words that are within `maxDistance` of `word`.