https://github.com/vborovikov/fuzzy
Fuzzy string comparison library
csharp dotnet fuzzy-string fuzzy-string-comparison fuzzy-string-matching string-comparison string-matching
- Host: GitHub
- URL: https://github.com/vborovikov/fuzzy
- Owner: vborovikov
- License: mit
- Created: 2023-05-07T13:00:13.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-05T07:53:33.000Z (about 1 year ago)
- Last Synced: 2024-05-05T08:36:58.476Z (about 1 year ago)
- Topics: csharp, dotnet, fuzzy-string, fuzzy-string-comparison, fuzzy-string-matching, string-comparison, string-matching
- Language: C#
- Homepage:
- Size: 50.8 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# FuzzyCompare
Fuzzy string comparison library

[NuGet](https://www.nuget.org/packages/FuzzyCompare) | [License](https://github.com/vborovikov/fuzzy/blob/main/LICENSE)

## ComparisonMethods
The `ComparisonMethods` class provides three static methods:
- `JaroSimilarity`: calculates the Jaro similarity between two `ReadOnlySpan<char>` inputs. It returns a `float` from 0 to 1, where 0 means no similarity and 1 means the inputs are identical.
- `JaroWinklerSimilarity`: calculates the Jaro-Winkler similarity between two `ReadOnlySpan<char>` inputs. It returns a `float` from 0 to 1. The optional `p` parameter sets the scaling factor for the common-prefix adjustment.
- `LevenshteinDistance`: calculates the Levenshtein distance between two `ReadOnlySpan<char>` inputs. It returns an `int` giving the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one input into the other.

## FuzzyStringComparer
The `FuzzyStringComparer` class compares strings using a fuzzy matching algorithm. It determines how similar two strings are by calculating a similarity score based on the number of matching characters and their positions within the strings.
Its constructor takes an optional `CultureInfo` parameter that specifies the culture to use for string comparison. By default, the class uses the current culture and compares strings case-insensitively.
The algorithm takes several factors into account, including the length of the strings, the number of matching characters, and the positions of those characters. It is designed to tolerate small differences between the strings, such as typos, misspellings, or minor variations in formatting.
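The factors named above (string length, matching characters, and their positions) are the same inputs used by the Jaro family of metrics listed under ComparisonMethods. The block below is a minimal, self-contained sketch of the textbook Jaro, Jaro-Winkler, and Levenshtein algorithms, included only to make those definitions concrete. It is not the FuzzyCompare source: the class name `SimilaritySketch`, the method names, the default `p = 0.1f`, and the four-character prefix cap are assumptions, and whether `FuzzyStringComparer` combines the metrics this way is not stated in this README.

```csharp
using System;

// Illustrative sketch only; names and defaults are assumptions, not the FuzzyCompare API.
public static class SimilaritySketch
{
    // Jaro similarity: 0 (nothing in common) to 1 (identical).
    public static float Jaro(ReadOnlySpan<char> s, ReadOnlySpan<char> t)
    {
        if (s.Length == 0 && t.Length == 0) return 1f;
        if (s.Length == 0 || t.Length == 0) return 0f;

        // Two equal characters count as a match only within this distance.
        int window = Math.Max(0, Math.Max(s.Length, t.Length) / 2 - 1);
        var sMatched = new bool[s.Length];
        var tMatched = new bool[t.Length];

        int matches = 0;
        for (int i = 0; i < s.Length; i++)
        {
            int lo = Math.Max(0, i - window);
            int hi = Math.Min(t.Length - 1, i + window);
            for (int j = lo; j <= hi; j++)
            {
                if (tMatched[j] || s[i] != t[j]) continue;
                sMatched[i] = tMatched[j] = true;
                matches++;
                break;
            }
        }
        if (matches == 0) return 0f;

        // Count matched characters that appear in a different order.
        int outOfOrder = 0;
        for (int i = 0, k = 0; i < s.Length; i++)
        {
            if (!sMatched[i]) continue;
            while (!tMatched[k]) k++;
            if (s[i] != t[k]) outOfOrder++;
            k++;
        }

        float m = matches;
        float transpositions = outOfOrder / 2f;
        return (m / s.Length + m / t.Length + (m - transpositions) / m) / 3f;
    }

    // Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
    public static float JaroWinkler(ReadOnlySpan<char> s, ReadOnlySpan<char> t, float p = 0.1f)
    {
        float jaro = Jaro(s, t);
        int limit = Math.Min(4, Math.Min(s.Length, t.Length));
        int prefix = 0;
        while (prefix < limit && s[prefix] == t[prefix]) prefix++;
        return jaro + prefix * p * (1f - jaro);
    }

    // Levenshtein distance using two rolling rows of the edit-distance table.
    public static int Levenshtein(ReadOnlySpan<char> s, ReadOnlySpan<char> t)
    {
        var prev = new int[t.Length + 1];
        var curr = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++) prev[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev);
        }
        return prev[t.Length];
    }
}
```

With this sketch, `SimilaritySketch.Jaro("MARTHA", "MARHTA")` evaluates to roughly 0.944, `SimilaritySketch.JaroWinkler("MARTHA", "MARHTA")` to roughly 0.961 (three shared prefix characters), and `SimilaritySketch.Levenshtein("kitten", "sitting")` to 3. The sketch compares characters ordinally and case-sensitively; culture-aware, case-insensitive comparison as described for `FuzzyStringComparer` would need an additional normalization step.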
## Tokenizer
The `Tokenizer` class is a utility class for tokenizing strings and spans of characters. It breaks the input into individual tokens, which are exposed through a token enumerator.
The class provides two main methods: `Tokenize` and `EnumerateTokens`. The `Tokenize` method takes a string and returns a `TokenEnumerator` that iterates through the tokens in the string. The `EnumerateTokens` method takes a read-only span of characters and returns a `TokenRefEnumerator` that iterates through the tokens in the span.
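To illustrate how an enumerator over a read-only span can yield tokens without allocating strings, here is a small sketch of the general pattern. It is not the library's `Tokenizer`: the type names `TokenizerSketch` and `SpanTokenEnumerator` are hypothetical, and splitting on whitespace is an assumption, since this README does not say how FuzzyCompare defines a token.

```csharp
using System;

// Hypothetical sketch; names and the whitespace rule are assumptions, not the FuzzyCompare API.
public static class TokenizerSketch
{
    public static SpanTokenEnumerator EnumerateTokens(ReadOnlySpan<char> text) =>
        new SpanTokenEnumerator(text);
}

// A ref struct can hold a span, so tokens are slices of the input rather than new strings.
public ref struct SpanTokenEnumerator
{
    private ReadOnlySpan<char> remaining;

    public SpanTokenEnumerator(ReadOnlySpan<char> text)
    {
        remaining = text;
        Current = default;
    }

    // The token found by the most recent successful MoveNext call.
    public ReadOnlySpan<char> Current { get; private set; }

    // Lets the enumerator be used directly in a foreach loop.
    public SpanTokenEnumerator GetEnumerator() => this;

    public bool MoveNext()
    {
        // Skip leading separators.
        int start = 0;
        while (start < remaining.Length && char.IsWhiteSpace(remaining[start])) start++;
        if (start == remaining.Length)
        {
            remaining = default;
            return false;
        }

        // Extend the token up to the next separator or the end of the input.
        int end = start;
        while (end < remaining.Length && !char.IsWhiteSpace(remaining[end])) end++;

        Current = remaining.Slice(start, end - start);
        remaining = remaining.Slice(end);
        return true;
    }
}
```

A caller can write `foreach (var token in TokenizerSketch.EnumerateTokens(text))` and pass each `token` span straight to a comparison method, calling `token.ToString()` only when a string is actually needed.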