https://github.com/rapidfuzz/rapidfuzz-rs

Rapid fuzzy string matching in Rust using various string metrics
# RapidFuzz


Rapid fuzzy string matching in Rust using the Levenshtein Distance




- Description
- Installation
- Usage
- License

---
## Description

RapidFuzz is a general-purpose string matching library with implementations
for Rust, C++ and Python.

### Key Features

- **Diverse String Metrics**: Offers a variety of string metrics
to suit different use cases. These range from the Levenshtein
distance for edit-based comparisons to the Jaro-Winkler similarity for
more nuanced similarity assessments.
- **Optimized for Speed**: Each implementation is tuned for performance,
making the library suitable for analyzing large datasets.
- **Easy to Use**: The API is designed to be simple, while still leaving
the implementation room for optimization.
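
To make the idea of an edit-based metric concrete, the following is a minimal sketch of the textbook dynamic-programming Levenshtein distance, written with the standard library only. It is not how RapidFuzz computes the distance (the library uses optimized implementations), and the function name `naive_levenshtein` is a hypothetical helper introduced for illustration.

```rust
/// Textbook two-row dynamic-programming Levenshtein distance.
/// Illustration only: `naive_levenshtein` is a hypothetical helper,
/// not part of the RapidFuzz API.
fn naive_levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // prev[j] is the distance between the first i chars of `a`
    // and the first j chars of `b` (initially i == 0).
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];

    for (i, &ca) in a.iter().enumerate() {
        // Turning i + 1 chars of `a` into an empty string takes i + 1 deletions.
        curr[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        std::mem::swap(&mut prev, &mut curr);
    }

    // After the final swap, `prev` holds the last computed row.
    prev[b.len()]
}
```

Calling `naive_levenshtein("kitten", "sitting")` yields `3`: two substitutions and one insertion. The sketch runs in time proportional to the product of the two string lengths, which is the kind of cost an optimized library aims to reduce.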

## Installation

Install the crate with Cargo:
```console
$ cargo add rapidfuzz
```

## Usage

The following examples show the usage with the Levenshtein distance. Other metrics
can be found in the [fuzz](https://docs.rs/rapidfuzz/latest/rapidfuzz/fuzz/index.html) and [distance](https://docs.rs/rapidfuzz/latest/rapidfuzz/distance/index.html) modules.

```rust
use rapidfuzz::distance::levenshtein;

// Perform a simple comparison using the Levenshtein distance
assert_eq!(
    3,
    levenshtein::distance("kitten".chars(), "sitting".chars())
);

// If you are sure the input strings are ASCII only, it's usually faster to operate on bytes
assert_eq!(
    3,
    levenshtein::distance("kitten".bytes(), "sitting".bytes())
);

// You can provide a score_cutoff value to filter out strings whose distance is worse than
// the score_cutoff
assert_eq!(
    None,
    levenshtein::distance_with_args(
        "kitten".chars(),
        "sitting".chars(),
        &levenshtein::Args::default().score_cutoff(2)
    )
);

// You can provide a score_hint to tell the implementation about the expected score.
// This can be used to select a more performant implementation internally, but might cause
// a slowdown in cases where the distance is actually worse than the score_hint
assert_eq!(
    3,
    levenshtein::distance_with_args(
        "kitten".chars(),
        "sitting".chars(),
        &levenshtein::Args::default().score_hint(2)
    )
);

// When comparing a single string to multiple strings, you can use the
// provided `BatchComparator`. It can cache part of the calculation,
// which can provide significant speedups
let scorer = levenshtein::BatchComparator::new("kitten".chars());
assert_eq!(3, scorer.distance("sitting".chars()));
assert_eq!(0, scorer.distance("kitten".chars()));
```
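
The `score_cutoff` example returns `None` because the distance (3) is worse than the cutoff (2). One reason a cutoff can help performance is that a computation may stop as soon as the result is known to exceed it. The sketch below illustrates that idea with the standard library only. It is an assumption-laden illustration, not RapidFuzz's algorithm, and `levenshtein_with_cutoff` is a hypothetical name.

```rust
/// Levenshtein distance that gives up once the result must exceed `cutoff`.
/// Hypothetical helper for illustration; not RapidFuzz's API or algorithm.
fn levenshtein_with_cutoff(a: &str, b: &str, cutoff: usize) -> Option<usize> {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // The difference in length is a lower bound on the distance.
    if a.len().abs_diff(b.len()) > cutoff {
        return None;
    }

    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];

    for (i, &ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        let mut row_min = curr[0];
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
            row_min = row_min.min(curr[j + 1]);
        }
        // Costs never decrease from one row to the next, so the final
        // distance is at least the smallest value in the current row.
        if row_min > cutoff {
            return None;
        }
        std::mem::swap(&mut prev, &mut curr);
    }

    let distance = prev[b.len()];
    if distance <= cutoff {
        Some(distance)
    } else {
        None
    }
}
```

With this sketch, `levenshtein_with_cutoff("kitten", "sitting", 2)` returns `None`, while a cutoff of `3` returns `Some(3)`, mirroring the `Option` result of `distance_with_args` in the example above.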

## License
Licensed under either of [Apache License, Version
2.0](https://github.com/rapidfuzz/rapidfuzz-rs/blob/main/LICENSE-APACHE) or [MIT License](https://github.com/rapidfuzz/rapidfuzz-rs/blob/main/LICENSE-MIT) at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in RapidFuzz by you, as defined in the Apache-2.0 license, shall be
dual licensed as above, without any additional terms or conditions.