https://github.com/pluots/stringmetrics

Rust library for approximate string matching

Host: GitHub
URL: https://github.com/pluots/stringmetrics
Owner: pluots
License: other
Created: 2022-05-10T11:14:04.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-09-03T21:39:37.000Z (9 months ago)
Last Synced: 2025-03-29T17:51:11.815Z (3 months ago)
Language: Rust
Size: 567 KB
Stars: 10
Watchers: 2
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE


# Stringmetrics

This is a Rust library for approximate string matching that implements simple
algorithms such as Hamming distance, Levenshtein distance, Jaccard similarity,
and more.

Here are some useful quick links:

- Crate info: 

- Crate docs: 

- Python library page: 

- Crate source: 

## Algorithms

The main purpose of this library is to provide a variety of string
metric functions. Included algorithms are:

- Levenshtein Distance

- Limited & Weighted Levenshtein Distance

- Jaccard Similarity

- Hamming Distance

See [the documentation](https://docs.rs/stringmetrics/) for full information.

Some examples are below:

```rs
// Basic Levenshtein distance
use stringmetrics::levenshtein;

assert_eq!(levenshtein("kitten", "sitting"), 3);
```

```rs
// Levenshtein distance with a limit to save computation time
use stringmetrics::levenshtein_limit;

assert_eq!(levenshtein_limit("a very long string", "short!", 4), 4);
```
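
A limit helps because the computation can stop as soon as the distance is known to be at least the limit. The following is a minimal sketch of that idea using only the standard library. `lev_limit_sketch` is a hypothetical name chosen for illustration, and the crate's actual implementation may differ.

```rust
/// Levenshtein distance capped at `limit` (illustrative sketch only,
/// not the stringmetrics implementation).
fn lev_limit_sketch(a: &str, b: &str, limit: usize) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // The distance can never be smaller than the difference in lengths.
    if a.len().abs_diff(b.len()) >= limit {
        return limit;
    }

    // Classic two-row dynamic programming table.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];

    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = curr[j] + 1;
            curr[j + 1] = substitute.min(delete).min(insert);
        }
        // The row minimum never decreases from one row to the next,
        // so once it reaches the limit the result is decided.
        if curr.iter().min().map_or(false, |&m| m >= limit) {
            return limit;
        }
        std::mem::swap(&mut prev, &mut curr);
    }

    prev[b.len()].min(limit)
}
```

In this sketch, the second example above returns immediately: the two strings differ in length by 12 characters, which already exceeds the limit of 4.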

```rs
// Set custom weights
use stringmetrics::{levenshtein_weight, LevWeights};

// This struct holds insertion, deletion, and substitution costs
let weights = LevWeights::new(4, 3, 2);

// "kitten" -> "sitting" takes 2 substitutions and 1 insertion: 2 * 2 + 4 = 8
assert_eq!(levenshtein_weight("kitten", "sitting", 100, &weights), 8);
```

```rs
// Basic Hamming distance
use stringmetrics::hamming;

let a = "abcdefg";
let b = "aaadefa";

assert_eq!(hamming(a, b), Ok(3));
```
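
Jaccard similarity appears in the algorithm list but has no example here. As a rough illustration of the metric itself, here is a sketch that compares the character sets of two strings using only the standard library. `jaccard_chars` is a hypothetical helper, not the crate's API; see the crate documentation for the real function signatures.

```rust
use std::collections::HashSet;

/// Jaccard similarity of the character sets of two strings:
/// |A ∩ B| / |A ∪ B|. Illustrative sketch, not part of stringmetrics.
fn jaccard_chars(a: &str, b: &str) -> f64 {
    let set_a: HashSet<char> = a.chars().collect();
    let set_b: HashSet<char> = b.chars().collect();

    let union = set_a.union(&set_b).count();
    if union == 0 {
        // Both inputs are empty. Treating them as identical is a
        // convention chosen for this sketch.
        return 1.0;
    }

    let intersection = set_a.intersection(&set_b).count();
    intersection as f64 / union as f64
}
```

With this sketch, `jaccard_chars("abc", "bcd")` gives 0.5, since the two sets share two of their four distinct characters.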

## Future Algorithms & Direction

Eventually, this library aims to add support for more algorithms. Intended work
includes:

1. Update Levenshtein distance to have a more performant algorithm for short
   (<64 characters) and long (>100 characters) strings

2. Add the Damerau–Levenshtein distance

3. Add the Jaro–Winkler distance

4. Add the Tversky index

5. Add Cosine similarity

6. Add some useful tokenizers with examples

## License

See the LICENSE file for license information. The provided license does allow
for proprietary use and adaptation; that being said, I kindly suggest that if
you come up with an improvement, you submit a pull request and help us all out
:)
