https://github.com/thaumant/eddie
- Host: GitHub
- URL: https://github.com/thaumant/eddie
- Owner: thaumant
- Created: 2019-11-08T19:42:01.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-01-19T15:54:01.000Z (over 5 years ago)
- Last Synced: 2025-07-23T13:50:38.487Z (3 months ago)
- Topics: damerau-levenshtein, edit-distance, hamming, jaro, jaro-winkler, levenshtein, string-similarity
- Language: Rust
- Size: 125 KB
- Stars: 20
- Watchers: 1
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
# Eddie
Fast and well-tested implementations of edit distance/string similarity metrics:
- Levenshtein,
- Damerau-Levenshtein,
- Hamming,
- Jaro,
- Jaro-Winkler.
## Documentation
See [API reference][1].
[1]: https://docs.rs/eddie/
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
eddie = "0.4"
```
## Basic usage
Levenshtein:
```rust
use eddie::Levenshtein;
let lev = Levenshtein::new();
let dist = lev.distance("martha", "marhta");
assert_eq!(dist, 2);
```
Damerau-Levenshtein:
```rust
use eddie::DamerauLevenshtein;
let damlev = DamerauLevenshtein::new();
let dist = damlev.distance("martha", "marhta");
assert_eq!(dist, 1);
```
Hamming:
```rust
use eddie::Hamming;
let hamming = Hamming::new();
let dist = hamming.distance("martha", "marhta");
assert_eq!(dist, Some(2));
```
Jaro:
```rust
use eddie::Jaro;
let jaro = Jaro::new();
let sim = jaro.similarity("martha", "marhta");
assert!((sim - 0.94).abs() < 0.01);
```
Jaro-Winkler:
```rust
use eddie::JaroWinkler;
let jarwin = JaroWinkler::new();
let sim = jarwin.similarity("martha", "marhta");
assert!((sim - 0.96).abs() < 0.01);
```
## Strings vs slices
The crate exposes two modules containing two sets of implementations:
- `eddie::str` for comparing UTF-8 encoded `&str` and `&String` values.
  Implementations are reexported in the root module.
- `eddie::slice` for comparing generic slices `&[T]`.
  Implementations in this module are significantly faster than those from `eddie::str`,
  but will produce incorrect results for UTF-8 and other variable-width character encodings.

Usage example:
```rust
use eddie::slice::Levenshtein;
let lev = Levenshtein::new();
let dist = lev.distance(&[1, 2, 3], &[1, 3]);
assert_eq!(dist, 1);
```
[2]: https://doc.rust-lang.org/std/primitive.char.html
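
To make the warning about variable-width encodings concrete, here is a minimal standard-library sketch that does not use Eddie at all. The `levenshtein`, `byte_distance`, and `char_distance` functions are hypothetical helpers written for this illustration. They only show why comparing raw UTF-8 bytes can give a different answer than comparing characters.

```rust
/// Plain dynamic-programming Levenshtein distance over generic slices.
/// Illustrative stand-in only, not Eddie's implementation.
fn levenshtein<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    // prev[j] is the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, x) in a.iter().enumerate() {
        let mut curr = Vec::with_capacity(b.len() + 1);
        curr.push(i + 1);
        for (j, y) in b.iter().enumerate() {
            let cost = if x == y { 0 } else { 1 };
            let val = (prev[j] + cost) // substitution or match
                .min(prev[j + 1] + 1)  // deletion
                .min(curr[j] + 1);     // insertion
            curr.push(val);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Compares the raw UTF-8 bytes of both strings.
fn byte_distance(a: &str, b: &str) -> usize {
    levenshtein(a.as_bytes(), b.as_bytes())
}

/// Compares the strings one `char` at a time.
fn char_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    levenshtein(&a, &b)
}
```

With these helpers, `byte_distance("caf\u{e9}", "cafe")` evaluates to 2, because U+00E9 occupies two bytes in UTF-8, while `char_distance("caf\u{e9}", "cafe")` evaluates to 1. For ASCII-only input such as `"martha"` and `"marhta"`, both give 2.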
## Complementary metrics
The main metric methods are complemented with inverted and/or relative versions.
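
As a rough illustration of what "relative" and "inverted" mean for an edit distance, the following standard-library sketch derives both from a plain Levenshtein distance. The function names mirror the crate's naming convention, but the bodies are hypothetical. In particular, dividing by the length of the longer input is an assumption made for this sketch and may differ from how Eddie normalizes.

```rust
/// Illustrative char-level Levenshtein distance (not Eddie's implementation).
fn distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, x) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, y) in b.iter().enumerate() {
            let cost = if x == y { 0 } else { 1 };
            let val = (prev[j] + cost).min(prev[j + 1] + 1).min(curr[j] + 1);
            curr.push(val);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Distance scaled into the range 0.0..=1.0.
/// Assumption: normalize by the longer input's length in chars.
fn rel_dist(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 0.0; // two empty strings are treated as identical
    }
    distance(a, b) as f64 / max_len as f64
}

/// Similarity as the inversion of relative distance.
fn similarity(a: &str, b: &str) -> f64 {
    1.0 - rel_dist(a, b)
}
```

Under these assumptions, `"martha"` and `"marhta"` have a distance of 2 and a maximum length of 6, so the relative distance is 2/6 and the similarity is 4/6. The two values always sum to 1.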
The naming convention across the crate is as follows:
- `distance`: the number of edits required to transform one string into the other;
- `rel_dist`: the distance between two strings, relative to string length (inversion of similarity);
- `similarity`: the similarity between two strings (inversion of relative distance).
## Performance
At the moment Eddie has the fastest implementations among the alternatives from crates.io that have Unicode support.
For example, when comparing common English words, you can expect at least a 1.5-2x speedup for any given algorithm except Hamming.
For detailed measurement tables, see the [Benchmarks][3] page.
[3]: http://github.com/thaumant/eddie/tree/master/benchmarks.md