https://github.com/pluots/stringmetrics
Rust library for approximate string matching
- Host: GitHub
- URL: https://github.com/pluots/stringmetrics
- Owner: pluots
- License: other
- Created: 2022-05-10T11:14:04.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-09-03T21:39:37.000Z (9 months ago)
- Last Synced: 2025-03-29T17:51:11.815Z (2 months ago)
- Language: Rust
- Size: 567 KB
- Stars: 10
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# Stringmetrics
This is a Rust library for approximate string matching that implements simple
algorithms such as Hamming distance, Levenshtein distance, Jaccard similarity,
and more. Here are some useful quick links:
- Crate info:
- Crate docs: https://docs.rs/stringmetrics/
- Python library page:
- Crate source: https://github.com/pluots/stringmetrics

## Algorithms
The main purpose of this library is to provide a variety of string
metric functions. Included algorithms are:

- Levenshtein Distance
- Limited & Weighted Levenshtein Distance
- Jaccard Similarity
- Hamming Distance

See [the documentation](https://docs.rs/stringmetrics/) for full information.
Some examples are below:

```rs
// Basic levenshtein distance
use stringmetrics::levenshtein;

assert_eq!(levenshtein("kitten", "sitting"), 3);
```

```rs
// Levenshtein distance with a limit to save computation time
use stringmetrics::levenshtein_limit;

assert_eq!(levenshtein_limit("a very long string", "short!", 4), 4);
```

```rs
// Set custom weights
use stringmetrics::{levenshtein_weight, LevWeights};

// This struct holds insertion, deletion, and substitution costs
let weights = LevWeights::new(4, 3, 2);
// "kitten" -> "sitting": two substitutions (2 each) plus one insertion (4) = 8
assert_eq!(levenshtein_weight("kitten", "sitting", 100, &weights), 8);
```

```rs
// Basic hamming distance
use stringmetrics::hamming;

let a = "abcdefg";
let b = "aaadefa";
assert_eq!(hamming(a, b), Ok(3));
```
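Jaccard similarity appears in the algorithm list above but has no example here. The snippet below is a minimal, standalone sketch of the metric itself (the size of the intersection of two character sets divided by the size of their union), written against the standard library as an illustration of the idea rather than a statement of this crate's exact API:

```rs
use std::collections::HashSet;

// Jaccard similarity over character sets: |A ∩ B| / |A ∪ B|.
// Illustrative sketch only; not the stringmetrics function signature.
fn jaccard_chars(a: &str, b: &str) -> f64 {
    let sa: HashSet<char> = a.chars().collect();
    let sb: HashSet<char> = b.chars().collect();
    let intersection = sa.intersection(&sb).count();
    let union = sa.union(&sb).count();
    if union == 0 {
        1.0 // Both strings are empty; treat them as identical.
    } else {
        intersection as f64 / union as f64
    }
}

fn main() {
    // "duck" and "luck" share {u, c, k} out of {d, u, c, k, l}: 3 / 5 = 0.6
    assert!((jaccard_chars("duck", "luck") - 0.6).abs() < 1e-9);
}
```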
## Future Algorithms & Direction

Eventually, this library aims to add support for more algorithms. Intended work includes:

1. Update Levenshtein distance to use a more performant algorithm for short
   (<64 characters) and long (>100 characters) strings
2. Add the Damerau–Levenshtein distance
3. Add the Jaro–Winkler distance
4. Add the Tversky index
5. Add Cosine similarity (see the sketch after this list)
6. Add some useful tokenizers with examples
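As a conceptual preview of item 5 above, here is a hedged sketch of cosine similarity computed over character-bigram count vectors. It uses only the standard library and is not part of this crate; it only illustrates the metric the roadmap refers to:

```rs
use std::collections::HashMap;

// Count character bigrams in a string, e.g. "night" -> {ni, ig, gh, ht}.
// Bigrams are just one possible tokenization; the tokenizers planned in
// item 6 would presumably generalize this step.
fn bigram_counts(s: &str) -> HashMap<(char, char), f64> {
    let chars: Vec<char> = s.chars().collect();
    let mut counts = HashMap::new();
    for pair in chars.windows(2) {
        *counts.entry((pair[0], pair[1])).or_insert(0.0) += 1.0;
    }
    counts
}

// Cosine similarity between the two bigram count vectors:
// dot(a, b) / (|a| * |b|). Illustrative sketch, not a crate API.
fn cosine_similarity(a: &str, b: &str) -> f64 {
    let va = bigram_counts(a);
    let vb = bigram_counts(b);
    let dot: f64 = va
        .iter()
        .filter_map(|(k, x)| vb.get(k).map(|y| x * y))
        .sum();
    let norm = |v: &HashMap<(char, char), f64>| v.values().map(|x| x * x).sum::<f64>().sqrt();
    let denom = norm(&va) * norm(&vb);
    if denom == 0.0 { 0.0 } else { dot / denom }
}

fn main() {
    // Identical strings score 1.0; strings with no shared bigrams score 0.0.
    assert!((cosine_similarity("night", "night") - 1.0).abs() < 1e-9);
    assert!(cosine_similarity("abc", "xyz") < 1e-9);
}
```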
## License

See the LICENSE file for license information. The provided license does allow
for proprietary use and adaptation; that being said, I kindly suggest that if
you come up with an improvement, you submit a pull request and help us all out
:)