https://github.com/universal-automata/liblevenshtein-coffeescript

Various utilities regarding Levenshtein transducers. (CoffeeScript / JavaScript / Node.js)

Host: GitHub
URL: https://github.com/universal-automata/liblevenshtein-coffeescript
Owner: universal-automata
License: mit
Created: 2014-03-29T20:38:12.000Z (over 11 years ago)
Default Branch: master
Last Pushed: 2016-06-20T01:59:12.000Z (about 9 years ago)
Last Synced: 2025-01-12T13:11:45.494Z (6 months ago)
Language: CoffeeScript
Size: 32.2 KB
Stars: 12
Watchers: 4
Forks: 7
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# liblevenshtein

## CoffeeScript / JavaScript / Node.js

### A library for generating Finite State Transducers based on Levenshtein Automata.

[![npm version][npm-version-badge]][npm-repo]

[![Build Status][travis-ci-badge]][travis-ci]

[![Join the chat at https://gitter.im/universal-automata/liblevenshtein-coffeescript][gitter-badge]][gitter-channel]

Levenshtein transducers accept a query term and return all terms in a
dictionary that are within n spelling errors of it. They constitute a
highly efficient (space _and_ time) class of spelling correctors that work very
well when you do not require context while making suggestions.  Forget about
performing a linear scan over your dictionary to find all terms that are
sufficiently close to the user's query, using a quadratic implementation of the
[Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) or
[Damerau-Levenshtein
distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance).
These babies find _all_ the matching terms from your dictionary in time linear
_in the length of the query term_ (not in the size of the dictionary).
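For comparison, the baseline that the paragraph above argues against can be sketched as follows. This is only an illustration of the quadratic distance computation plus a linear dictionary scan; it is not how liblevenshtein works internally, and the function names `levenshteinDistance` and `linearScan` are hypothetical helpers, not part of the library's API.

```javascript
// Baseline sketch only: NOT part of liblevenshtein.
// Classic dynamic-programming Levenshtein distance, O(|a| * |b|) time.
function levenshteinDistance(a, b) {
  // prev[j] is the distance between the first i-1 characters of a
  // and the first j characters of b.
  var prev = [];
  for (var j = 0; j <= b.length; j += 1) {
    prev.push(j);
  }
  for (var i = 1; i <= a.length; i += 1) {
    var curr = [i];
    for (var k = 1; k <= b.length; k += 1) {
      var cost = a[i - 1] === b[k - 1] ? 0 : 1;
      curr.push(Math.min(
        prev[k] + 1,        // deletion
        curr[k - 1] + 1,    // insertion
        prev[k - 1] + cost  // substitution (or match)
      ));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Linear scan: computes the distance to every dictionary term, so the
// cost grows with the size of the dictionary.
function linearScan(dictionary, term, maxDistance) {
  return dictionary.filter(function (word) {
    return levenshteinDistance(term, word) <= maxDistance;
  });
}
```

Every query pays for one distance computation per dictionary term, which is the cost a transducer is meant to avoid.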

If you need context, then take the candidates generated by the transducer as a
starting place, and plug them into whatever model you're using for context (such
as by selecting the sequence of terms that have the greatest probability of
appearing together).
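As a rough sketch of that idea, the candidates could be re-ranked by how often each one follows the preceding word. Everything here is an assumption made for illustration: `rerankByContext` is a hypothetical helper, and `bigramCounts` stands in for whatever context model you actually use.

```javascript
// Hypothetical re-ranking step: NOT part of liblevenshtein.
// bigramCounts maps "previous word" strings to observed counts,
// e.g. { "over there": 50 }.
function rerankByContext(candidates, previousWord, bigramCounts) {
  // Copy first so the caller's candidate list is left untouched.
  return candidates.slice().sort(function (a, b) {
    var countA = bigramCounts[previousWord + ' ' + a] || 0;
    var countB = bigramCounts[previousWord + ' ' + b] || 0;
    return countB - countA; // most frequent continuation first
  });
}
```

A real context model would likely score whole sequences rather than a single preceding word, but the shape is the same: the transducer proposes candidates and the model orders them.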

For a quick demonstration, please visit the
[GitHub page](http://universal-automata.github.io/liblevenshtein/).

The library is currently written in Java, CoffeeScript, and JavaScript, but I
will be porting it to other languages soon.  If you have a specific language
you would like to see it in, or package-management system you would like it
deployed to, let me know.

### Basic Usage:

#### Node.js

Install the module via `npm`:

```
% npm install liblevenshtein
info trying registry request attempt 1 at 12:59:16
http GET https://registry.npmjs.org/liblevenshtein
http 304 https://registry.npmjs.org/liblevenshtein
liblevenshtein@<version> node_modules/liblevenshtein
```

Then, you may `require` it to do whatever you need:

```javascript
var levenshtein = require('liblevenshtein');

// Assume "completion_list" is a list of terms you want to match against in
// fuzzy queries.
var builder = new levenshtein.Builder()
  .dictionary(completion_list, false)  // generate spelling candidates from unsorted completion_list
  .algorithm("transposition")          // use Levenshtein distance extended with transposition
  .sort_candidates(true)               // sort the spelling candidates before returning them
  .case_insensitive_sort(true)         // ignore character-casing while sorting terms
  .include_distance(false)             // just return the ordered terms (drop the distances)
  .maximum_candidates(10);             // only want the top-10 candidates

// Maximum number of spelling errors we will allow the spelling candidates to
// have, with regard to the query term.
var MAX_EDIT_DISTANCE = 2;

var transducer = builder.build();

// Assume "term" corresponds to some query term. After invoking
// transducer.transduce(term, MAX_EDIT_DISTANCE), candidates will contain a list
// of all spelling candidates from the completion list that are within
// MAX_EDIT_DISTANCE units of error from the query term.
var candidates = transducer.transduce(term, MAX_EDIT_DISTANCE);
```
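
To illustrate what "Levenshtein distance extended with transposition" means, here is a minimal standalone sketch of one common formulation (the optimal string alignment variant), in which swapping two adjacent characters counts as a single error. Whether the library's `"transposition"` algorithm matches this variant exactly is an assumption, and `transpositionDistance` is a hypothetical helper rather than a library function.

```javascript
// Illustration only: NOT part of liblevenshtein.
// Optimal string alignment distance: insertions, deletions, substitutions,
// and swaps of two adjacent characters each cost 1.
function transpositionDistance(a, b) {
  var d = [];
  for (var i = 0; i <= a.length; i += 1) {
    d.push([i]); // d[i][0] = i deletions
    for (var j = 1; j <= b.length; j += 1) {
      if (i === 0) {
        d[0].push(j); // d[0][j] = j insertions
        continue;
      }
      var cost = a[i - 1] === b[j - 1] ? 0 : 1;
      var best = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost  // substitution (or match)
      );
      // Adjacent transposition, e.g. "eh" versus "he".
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        best = Math.min(best, d[i - 2][j - 2] + 1);
      }
      d[i].push(best);
    }
  }
  return d[a.length][b.length];
}
```

Under this measure "teh" is one error away from "the", whereas plain Levenshtein distance counts it as two substitutions.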

#### In the Browser

To use the library on your website, reference the desired file from the

`
