https://github.com/jhermsmeier/node-sift-distance
SIFT distance algorithm
distance sift sift-algorithm string-distance
- Host: GitHub
- URL: https://github.com/jhermsmeier/node-sift-distance
- Owner: jhermsmeier
- License: mit
- Created: 2015-02-14T13:19:53.000Z (over 10 years ago)
- Default Branch: SIFT4
- Last Pushed: 2015-02-15T11:38:06.000Z (over 10 years ago)
- Last Synced: 2025-01-07T15:10:03.974Z (6 months ago)
- Topics: distance, sift, sift-algorithm, string-distance
- Language: JavaScript
- Size: 160 KB
- Stars: 7
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# SIFT 4
## Install via [npm](https://npmjs.com)
```sh
$ npm install sift-distance
```

**NOTE:** The major version of this module tracks the algorithm's version.
So, if you want to use SIFT 3, for example, you'd install `[email protected]`, for version 3B of the SIFT algorithm `[email protected]`, for version 4 `[email protected]` and so on.

## About
This implements the [SIFT4 extended version](http://siderite.blogspot.com/2014/11/super-fast-and-accurate-string-distance.html).
## API
#### SIFT( *a*, *b*, *[options]* )
- **String|Buffer|Array** `a`
- **String|Buffer|Array** `b`
- **Object** `options`
  - **Number** `maxOffset`
  - **Number** `maxDistance`
  - **Function** `tokenizer`
  - **Function** `tokenMatcher`
  - **Function** `matchEvaluator`
  - **Function** `lengthEvaluator`
  - **Function** `transpositionEvaluator`

### Options
#### Number `maxOffset`
The maximum offset at which tokens of the two inputs are searched for a match when looking for common substrings. Defaults to `5`.
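To illustrate what a bounded search window means, here is a minimal sketch of a lookahead limited by `maxOffset`. It is a simplified illustration, not this module's code; `findResync` and its parameters are hypothetical names.

```js
// Hypothetical helper (not part of sift-distance): after a mismatch at
// positions i (in a) and j (in b), look ahead at most `maxOffset` tokens
// in either input for a position where the two inputs agree again.
function findResync( a, b, i, j, maxOffset ) {
  for( var k = 1; k <= maxOffset; k++ ) {
    // Skip k tokens in `a` and compare against the current token in `b`
    if( i + k < a.length && a[ i + k ] === b[ j ] ) {
      return { i: i + k, j: j }
    }
    // Skip k tokens in `b` and compare against the current token in `a`
    if( j + k < b.length && a[ i ] === b[ j + k ] ) {
      return { i: i, j: j + k }
    }
  }
  // Nothing matched inside the window
  return null
}
```

A larger window can recover from longer insertions or deletions, at the cost of more comparisons per mismatch.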
#### Number `maxDistance`
Distance at which the algorithm should stop computing the value and exit early, since the values are too different anyway.
#### Function `tokenizer( value ) -> String|Array|Buffer`
- **Mixed** `value`

Function to transform strings into vectors of tokens.
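As an illustration, a tokenizer can split the input into words or character pairs, so that the distance is computed over those units rather than over single characters. The sketch below is self-contained; `wordTokenizer` and `bigramTokenizer` are hypothetical names, and the commented-out call assumes the module exports the `SIFT` function described above.

```js
// Hypothetical tokenizer: split on whitespace so each word is one token.
function wordTokenizer( value ) {
  return String( value ).split( /\s+/ ).filter( Boolean )
}

// Hypothetical tokenizer: overlapping character pairs (bigrams).
function bigramTokenizer( value ) {
  var str = String( value )
  var tokens = []
  for( var i = 0; i < str.length - 1; i++ ) {
    tokens.push( str.slice( i, i + 2 ) )
  }
  return tokens
}

// Assumed usage, not verified against the module:
// var SIFT = require( 'sift-distance' )
// SIFT( 'the quick fox', 'the quick brown fox', { tokenizer: wordTokenizer } )
```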
#### Function `tokenMatcher( token1, token2 ) -> Boolean`
- **Mixed** `token1`
- **Mixed** `token2`

Function to determine whether two tokens match (are equal).
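For example, a matcher could treat tokens as equal regardless of letter case. This is a minimal sketch; `caseInsensitiveMatcher` is a hypothetical name, and it assumes tokens are strings or can be converted to strings.

```js
// Hypothetical matcher: two tokens match if they are equal ignoring case.
function caseInsensitiveMatcher( token1, token2 ) {
  return String( token1 ).toLowerCase() === String( token2 ).toLowerCase()
}

// Assumed usage, not verified against the module:
// SIFT( 'Hello', 'hello', { tokenMatcher: caseInsensitiveMatcher } )
```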
#### Function `matchEvaluator( token1, token2 ) -> Number`
- **Mixed** `token1`
- **Mixed** `token2`

Function to determine the way a token match should be added to the `lcs` (largest common substring). For example, a fuzzy match could be implemented.
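A fuzzy match of this kind might be sketched as follows. This assumes the returned number is the amount a match contributes to the `lcs`, with `1` meaning a full match; `caseAwareEvaluator` is a hypothetical name.

```js
// Hypothetical evaluator: an exact match counts fully, a match that only
// differs in letter case counts half, anything else counts nothing.
function caseAwareEvaluator( token1, token2 ) {
  if( token1 === token2 ) return 1
  if( String( token1 ).toLowerCase() === String( token2 ).toLowerCase() ) return 0.5
  return 0
}
```

For the partial score to have any effect, the `tokenMatcher` in use would presumably also need to accept tokens that differ only in case.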
#### Function `lengthEvaluator( lcs ) -> Number`
- **Number** `lcs`: largest common substring length

Function to determine the way the `lcs` value is added to the `lcss`. For example, longer continuous substrings could be rewarded.
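One way to favor longer runs is to add a small bonus for every token beyond the first. This is a sketch under the assumption that the returned number is what gets added to the `lcss`; `rewardLongRuns` and the `0.25` bonus are arbitrary, hypothetical choices.

```js
// Hypothetical evaluator: each token after the first in a continuous run
// earns an extra 0.25, so one long run is worth more than several short ones.
function rewardLongRuns( lcs ) {
  return lcs + Math.max( 0, lcs - 1 ) * 0.25
}
```

Inflating the `lcss` this way lowers the resulting distance, so a large bonus could plausibly push it below zero for near-identical inputs.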
#### Function `transpositionEvaluator( transpositions, lcss ) -> Number`
- **Number** `transpositions`: number of transpositions
- **Number** `lcss`: largest common subsequence length

Function to determine the way the number of transpositions affects the final result.
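As a sketch, transpositions could be made to cost half as much as other differences. This assumes the function returns the transposition-adjusted `lcss`, as in the linked SIFT4 article; that behavior is not confirmed for this module, and `halfCostTranspositions` is a hypothetical name.

```js
// Hypothetical evaluator. Assumption: the return value is the `lcss`
// after accounting for transpositions. Each transposition here removes
// only 0.5 from the `lcss` instead of a full point.
function halfCostTranspositions( transpositions, lcss ) {
  return lcss - transpositions * 0.5
}
```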