Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/universal-automata/liblevenshtein-coffeescript
Various utilities regarding Levenshtein transducers. (CoffeeScript / JavaScript / Node.js)
https://github.com/universal-automata/liblevenshtein-coffeescript
Last synced: 17 days ago
JSON representation
Various utilities regarding Levenshtein transducers. (CoffeeScript / JavaScript / Node.js)
- Host: GitHub
- URL: https://github.com/universal-automata/liblevenshtein-coffeescript
- Owner: universal-automata
- License: mit
- Created: 2014-03-29T20:38:12.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2016-06-20T01:59:12.000Z (over 8 years ago)
- Last Synced: 2025-01-12T13:11:45.494Z (17 days ago)
- Language: CoffeeScript
- Size: 32.2 KB
- Stars: 12
- Watchers: 4
- Forks: 7
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# liblevenshtein
## CoffeeScript / JavaScript / Node.js
### A library for generating Finite State Transducers based on Levenshtein Automata.
[![npm version][npm-version-badge]][npm-repo]
[![Build Status][travis-ci-badge]][travis-ci]
[![Join the chat at https://gitter.im/universal-automata/liblevenshtein-coffeescript][gitter-badge]][gitter-channel]Levenshtein transducers accept a query term and return all terms in a
dictionary that are within n spelling errors away from it. They constitute a
highly-efficient (space _and_ time) class of spelling correctors that work very
well when you do not require context while making suggestions. Forget about
performing a linear scan over your dictionary to find all terms that are
sufficiently-close to the user's query, using a quadratic implementation of the
[Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) or
[Damerau-Levenshtein
distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance),
these babies find _all_ the terms from your dictionary in linear time _on the
length of the query term_ (not on the size of the dictionary, on the length of
the query term).If you need context, then take the candidates generated by the transducer as a
starting place, and plug them into whatever model you're using for context (such
as by selecting the sequence of terms that have the greatest probability of
appearing together).For a quick demonstration, please visit the [Github Page,
here](http://universal-automata.github.io/liblevenshtein/).The library is currently written in Java, CoffeeScript, and JavaScript, but I
will be porting it to other languages, soon. If you have a specific language
you would like to see it in, or package-management system you would like it
deployed to, let me know.### Basic Usage:
#### Node.js
Install the module via `npm`:
```
% npm install liblevenshtein
info trying registry request attempt 1 at 12:59:16
http GET https://registry.npmjs.org/liblevenshtein
http 304 https://registry.npmjs.org/liblevenshtein
[email protected] node_modules/liblevenshtein
```Then, you may `require` it to do whatever you need:
```javascript
var levenshtein = require('liblevenshtein');// Assume "completion_list" is a list of terms you want to match against in
// fuzzy queries.
var builder = new levenshtein.Builder()
.dictionary(completion_list, false) // generate spelling candidates from unsorted completion_list
.algorithm("transposition") // use Levenshtein distance extended with transposition
.sort_candidates(true) // sort the spelling candidates before returning them
.case_insensitive_sort(true) // ignore character-casing while sorting terms
.include_distance(false) // just return the ordered terms (drop the distances)
.maximum_candidates(10); // only want the top-10 candidates// Maximum number of spelling errors we will allow the spelling candidates to
// have, with regard to the query term.
var MAX_EDIT_DISTANCE = 2;var transducer = builder.build();
// Assume "term" corresponds to some query term. Once invoking
// transducer.transduce(term, MAX_EDIT_DISTANCE), candidates will contain a list
// of all spelling candidates from the completion list that are within
// MAX_EDIT_DISTANCE units of error from the query term.
var candidates = transducer.transduce(term, MAX_EDIT_DISTANCE);
```#### In the Browser
To use the library on your website, reference the desired file from the
`