https://github.com/universal-automata/liblevenshtein-shared

Various utilities regarding Levenshtein transducers. (Shared files for testing, etc.)

Host: GitHub
URL: https://github.com/universal-automata/liblevenshtein-shared
Owner: universal-automata
License: MIT
Created: 2014-03-29T21:17:32.000Z (about 11 years ago)
Default Branch: master
Last Pushed: 2016-04-02T19:58:41.000Z (about 9 years ago)
Last Synced: 2025-01-21T00:41:56.302Z (5 months ago)
Size: 565 KB
Stars: 2
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# liblevenshtein

### A library for generating Finite State Transducers based on Levenshtein Automata.

This particular module contains files that are shared among the supported languages, such as those used for testing.

Levenshtein transducers accept a query term and return all terms in a dictionary that are within _n_ spelling errors of it. They constitute a highly efficient (in both space _and_ time) class of spelling correctors that work very well when you do not require context while making suggestions. Forget about performing a linear scan over your dictionary with a quadratic implementation of the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) or [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) to find all terms that are sufficiently close to the user's query. These babies find _all_ the terms from your dictionary in linear time _on the length of the query term_ (not on the size of the dictionary, but on the length of the query term).
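To make that contrast concrete, here is a minimal Python sketch. It is not the library's CoffeeScript API, and every name in it is hypothetical.

- `linear_scan` is the naive approach: it runs the quadratic dynamic-programming distance against every dictionary term.
- `automaton_search` walks a trie of the dictionary while simulating a *nondeterministic* Levenshtein automaton for the query. A state `(i, e)` means "consumed `i` query characters using `e` errors". The search abandons a branch as soon as no state is alive, so most of the dictionary is never visited.

This is a plain NFA simulation, not the deterministic construction from the Schulz and Mihov paper, so it does not reproduce that paper's complexity bounds. It also ignores Damerau-Levenshtein transpositions.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

State = Tuple[int, int]  # (query characters consumed, errors used)


def levenshtein(a: str, b: str) -> int:
    """Quadratic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def linear_scan(query: str, dictionary: Iterable[str], n: int) -> List[str]:
    """The naive approach: compute the distance to every single term."""
    return sorted(t for t in dictionary if levenshtein(query, t) <= n)


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def build_trie(words: Iterable[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def closure(states: Set[State], m: int, n: int) -> FrozenSet[State]:
    """Add states reachable by deleting query characters (epsilon moves)."""
    seen = set(states)
    stack = list(states)
    while stack:
        i, e = stack.pop()
        if i < m and e < n and (i + 1, e + 1) not in seen:
            seen.add((i + 1, e + 1))
            stack.append((i + 1, e + 1))
    return frozenset(seen)


def step(states: FrozenSet[State], c: str, query: str, n: int) -> FrozenSet[State]:
    """Read one dictionary character c from every live NFA state."""
    m = len(query)
    out: Set[State] = set()
    for i, e in states:
        if i < m and query[i] == c:
            out.add((i + 1, e))          # match: no error spent
        if e < n:
            out.add((i, e + 1))          # c is an inserted character
            if i < m:
                out.add((i + 1, e + 1))  # c substitutes query[i]
    return closure(out, m, n)


def automaton_search(query: str, root: TrieNode, n: int) -> List[str]:
    """Walk the trie in lockstep with the NFA, pruning dead branches."""
    m = len(query)
    results: List[str] = []

    def dfs(node: TrieNode, prefix: str, states: FrozenSet[State]) -> None:
        # Accept when the whole query has been consumed within n errors.
        if node.is_word and any(i == m for i, _ in states):
            results.append(prefix)
        for ch, child in node.children.items():
            nxt = step(states, ch, query, n)
            if nxt:  # no live state: no word below this node can match
                dfs(child, prefix + ch, nxt)

    dfs(root, "", closure({(0, 0)}, m, n))
    return sorted(results)


words = ["cat", "cart", "coat", "dog", "cast", "at", "scat"]
print(linear_scan("cat", words, 1))                   # ['at', 'cart', 'cast', 'cat', 'coat', 'scat']
print(automaton_search("cat", build_trie(words), 1))  # same list
```

Both functions return the same candidates. The difference is that the trie walk stops descending as soon as a prefix can no longer lead to a match, instead of scoring every term.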

If you need context, take the candidates generated by the transducer as a starting point and plug them into whatever model you're using for context (for example, by selecting the sequence of terms that has the greatest probability of appearing together).
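Here is a minimal sketch of that last suggestion, using a Viterbi search to pick the single most probable sequence of candidates. It assumes you already have two things:

- a list of transducer candidates for each word the user typed;
- a table of bigram log-probabilities.

Both inputs are hypothetical here, as are the function name `best_sequence`, the `<s>` start token, and the `unseen_logp` floor used for pairs missing from the table.

```python
import math
from typing import Dict, List, Tuple

START = "<s>"  # hypothetical start-of-sentence token


def best_sequence(candidates: List[List[str]],
                  bigram_logp: Dict[Tuple[str, str], float],
                  unseen_logp: float = -20.0) -> List[str]:
    """Viterbi search: choose one candidate per position so that the sum of
    bigram log-probabilities along the chosen sequence is maximal."""
    if not candidates:
        return []
    # scores[w]: best log-probability of any sequence ending in w so far.
    scores = {w: bigram_logp.get((START, w), unseen_logp) for w in candidates[0]}
    backpointers: List[Dict[str, str]] = []
    for position in candidates[1:]:
        new_scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for w in position:
            # Best predecessor for w, given the previous position's scores.
            prev, score = max(
                ((p, s + bigram_logp.get((p, w), unseen_logp))
                 for p, s in scores.items()),
                key=lambda pair: pair[1],
            )
            new_scores[w] = score
            pointers[w] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Follow the back-pointers from the best final word.
    path = [max(scores, key=scores.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Usage: the user typed "teh cat"; the candidate lists and probabilities
# below are made up for illustration.
candidates = [["the", "ten", "tea"], ["cat", "cart", "coat"]]
bigram_logp = {
    (START, "the"): math.log(0.5), (START, "ten"): math.log(0.1),
    (START, "tea"): math.log(0.05), ("the", "cat"): math.log(0.3),
    ("the", "cart"): math.log(0.1), ("ten", "cat"): math.log(0.01),
    ("tea", "coat"): math.log(0.2),
}
print(best_sequence(candidates, bigram_logp))  # ['the', 'cat']
```

A real model would use proper smoothing rather than a fixed floor. The constant `unseen_logp` only keeps the example short.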

For a quick demonstration, please visit the [GitHub Page](http://dylon.github.io/liblevenshtein/).

The library is currently written only in CoffeeScript (and JavaScript), but I will be porting it to other languages soon. If you have a specific language you would like to see it in, or a package-management system you would like it deployed to, let me know.

This library is based largely on the work of [Stoyan Mihov](http://www.lml.bas.bg/~stoyan/), [Klaus Schulz](http://www.klaus-schulze.com/), and Petar Nikolaev Mitankin: "[Fast String Correction with Levenshtein-Automata](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.652 "Klaus Schulz and Stoyan Mihov (2002)")". For more details, please see the [wiki](https://github.com/dylon/liblevenshtein/wiki).
