https://github.com/rapidfuzz/rapidfuzz-rs
Rapid fuzzy string matching in Rust using various string metrics
- Host: GitHub
- URL: https://github.com/rapidfuzz/rapidfuzz-rs
- Owner: rapidfuzz
- License: apache-2.0
- Created: 2020-04-10T13:15:44.000Z (about 6 years ago)
- Default Branch: main
- Last Pushed: 2024-06-29T17:54:30.000Z (almost 2 years ago)
- Last Synced: 2025-06-05T08:09:19.106Z (about 1 year ago)
- Language: Rust
- Homepage: https://docs.rs/rapidfuzz/latest/rapidfuzz/
- Size: 583 KB
- Stars: 54
- Watchers: 2
- Forks: 5
- Open Issues: 6
Metadata Files:
- Readme: Readme.md
- Changelog: CHANGELOG.rst
- License: LICENSE-APACHE
- Security: SECURITY.md
README
Rapid fuzzy string matching in Rust using the Levenshtein Distance
Description • Installation • Usage • License
---
## Description
RapidFuzz is a general purpose string matching library with implementations
for Rust, C++ and Python.
### Key Features
- **Diverse String Metrics**: Offers a variety of string metrics
to suit different use cases. These range from the Levenshtein
distance for edit-based comparisons to the Jaro-Winkler similarity for
more nuanced similarity assessments.
- **Optimized for Speed**: Each implementation is tuned for performance,
making the library suitable for analyzing large datasets.
- **Easy to use**: The API is designed to be simple to use, while still giving
the implementation room for optimization.
## Installation
Add the crate to your project with Cargo:
```console
$ cargo add rapidfuzz
```
## Usage
The following examples use the Levenshtein distance. Other metrics
are available in the [fuzz](https://docs.rs/rapidfuzz/latest/rapidfuzz/fuzz/index.html) and [distance](https://docs.rs/rapidfuzz/latest/rapidfuzz/distance/index.html) modules.
```rust
use rapidfuzz::distance::levenshtein;

// Perform a simple comparison using the Levenshtein distance
assert_eq!(
    3,
    levenshtein::distance("kitten".chars(), "sitting".chars())
);

// If you are sure the input strings are ASCII only, it's usually faster to operate on bytes
assert_eq!(
    3,
    levenshtein::distance("kitten".bytes(), "sitting".bytes())
);

// You can provide a score_cutoff value to filter out strings whose distance is worse
// (larger) than the score_cutoff
assert_eq!(
    None,
    levenshtein::distance_with_args(
        "kitten".chars(),
        "sitting".chars(),
        &levenshtein::Args::default().score_cutoff(2)
    )
);

// You can provide a score_hint to tell the implementation about the expected score.
// This can be used to select a more performant implementation internally, but might cause
// a slowdown in cases where the distance is actually worse than the score_hint
assert_eq!(
    3,
    levenshtein::distance_with_args(
        "kitten".chars(),
        "sitting".chars(),
        &levenshtein::Args::default().score_hint(2)
    )
);

// When comparing a single string to multiple strings you can use the
// provided `BatchComparators`. These can cache part of the calculation
// which can provide significant speedups
let scorer = levenshtein::BatchComparator::new("kitten".chars());
assert_eq!(3, scorer.distance("sitting".chars()));
assert_eq!(0, scorer.distance("kitten".chars()));
```
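To make the numbers in the examples above easier to interpret, here is a minimal, standard-library-only sketch of what the Levenshtein distance measures and how a score cutoff could behave. It is not RapidFuzz's implementation, which uses faster algorithms internally. The function names `levenshtein_sketch` and `levenshtein_sketch_with_cutoff` are hypothetical, and the cutoff behavior (return `None` when the distance exceeds the cutoff) is an assumption based on the comments in the example above.

```rust
/// Illustrative two-row dynamic-programming Levenshtein distance.
/// Hypothetical helper, not part of the rapidfuzz crate.
/// Generic over the item type, so it accepts both `.chars()` and `.bytes()`.
fn levenshtein_sketch<I, J, T>(a: I, b: J) -> usize
where
    I: IntoIterator<Item = T>,
    J: IntoIterator<Item = T>,
    T: PartialEq,
{
    let a: Vec<T> = a.into_iter().collect();
    let b: Vec<T> = b.into_iter().collect();

    // prev[j] holds the distance between the items of `a` processed so far
    // and the first j items of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, x) in a.iter().enumerate() {
        let mut curr = Vec::with_capacity(b.len() + 1);
        // Turning the first i + 1 items of `a` into an empty sequence
        // takes i + 1 deletions.
        curr.push(i + 1);
        for (j, y) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(x != y);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Assumed cutoff semantics: keep the distance only if it does not exceed
/// `score_cutoff`, otherwise report `None`.
fn levenshtein_sketch_with_cutoff<I, J, T>(a: I, b: J, score_cutoff: usize) -> Option<usize>
where
    I: IntoIterator<Item = T>,
    J: IntoIterator<Item = T>,
    T: PartialEq,
{
    Some(levenshtein_sketch(a, b)).filter(|&distance| distance <= score_cutoff)
}
```

With this sketch, `levenshtein_sketch("kitten".chars(), "sitting".chars())` evaluates to `3`, and a cutoff of `2` turns that result into `None`. It also shows why bytes are only a safe shortcut for ASCII input: comparing `"café"` with `"cafe"` gives `1` over `.chars()` but `2` over `.bytes()`, because `é` occupies two bytes in UTF-8.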
## License
Licensed under either of [Apache License, Version 2.0](https://github.com/rapidfuzz/rapidfuzz-rs/blob/main/LICENSE-APACHE) or [MIT License](https://github.com/rapidfuzz/rapidfuzz-rs/blob/main/LICENSE-MIT) at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in RapidFuzz by you, as defined in the Apache-2.0 license, shall be
dual licensed as above, without any additional terms or conditions.