https://github.com/andrewjsaid/levenshtypo
A fuzzy string dictionary based on Levenshtein automata
dotnet edit-distance fuzzy-string fuzzy-string-matching levenshtein levenshtein-automata levenshtein-string-distance optimal-string-alignment restricted-edit string-distance string-matching
- Host: GitHub
- URL: https://github.com/andrewjsaid/levenshtypo
- Owner: andrewjsaid
- License: mit
- Created: 2024-07-30T22:26:42.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-07-12T15:37:26.000Z (11 months ago)
- Last Synced: 2025-10-25T01:05:14.298Z (7 months ago)
- Topics: dotnet, edit-distance, fuzzy-string, fuzzy-string-matching, levenshtein, levenshtein-automata, levenshtein-string-distance, optimal-string-alignment, restricted-edit, string-distance, string-matching
- Language: C#
- Homepage:
- Size: 1.68 MB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Levenshtypo
> Fast, typo-tolerant string lookup for your .NET apps, powered by Levenshtein Automata + Trie magic.
**Levenshtypo** is a high-performance fuzzy matching library that helps you find strings _even when your users mistype them_. Whether you're building search, suggestions, command matchers, or text correction tools, Levenshtypo lets you query massive datasets with typo tolerance and blazingly fast response times.
---
## Features
- **Fast fuzzy lookup** over large datasets
- Backed by a Trie for fast prefix traversal
- Uses [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) for string matching
- Also supports **restricted edit distance** (insertions, deletions, substitutions + transpositions)
- Fully exposed **Levenshtein Automata** for custom workflows
- Minimal allocations and branchy hot paths tuned for speed
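The difference between the two metrics listed above can be illustrated with a minimal, self-contained sketch. It is a textbook dynamic-programming implementation written for this README, not the library's code; the `EditDistance` class and its method names are hypothetical.

```csharp
using System;

// "docekr" has two adjacent letters swapped relative to "docker".
Console.WriteLine(EditDistance.Levenshtein("docekr", "docker"));    // 2
Console.WriteLine(EditDistance.RestrictedEdit("docekr", "docker")); // 1

// Hypothetical helper for illustration; not part of Levenshtypo.
static class EditDistance
{
    // Classic Levenshtein distance: insertions, deletions, substitutions.
    public static int Levenshtein(string a, string b) =>
        Compute(a, b, allowTranspositions: false);

    // Restricted edit distance (optimal string alignment): additionally
    // counts a swap of two adjacent characters as a single edit.
    public static int RestrictedEdit(string a, string b) =>
        Compute(a, b, allowTranspositions: true);

    private static int Compute(string a, string b, bool allowTranspositions)
    {
        // d[i, j] = distance between the first i chars of a and the first j chars of b.
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int best = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);

                // Adjacent transposition, e.g. "ek" <-> "ke".
                if (allowTranspositions && i > 1 && j > 1
                    && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                {
                    best = Math.Min(best, d[i - 2, j - 2] + 1);
                }

                d[i, j] = best;
            }
        }

        return d[a.Length, b.Length];
    }
}
```

A swapped pair of letters costs two edits under plain Levenshtein distance but only one under restricted edit distance, so the choice of metric decides whether such a typo fits within a small edit budget.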
---
## Why Use Levenshtypo?
Traditional string matching fails when:
- Your users make typos (`"git cmomit"`)
- Input comes from noisy sources (voice input, OCR)
- You want a UX that _feels smart_, not frustrating
Instead of `dictionary["cmomit"]` you can do `levenshtrie.Search("cmomit", maxEditDistance: 1)`.
---
## Basic Usage
```csharp
using Levenshtypo;

// Build a fuzzy-searchable set from a list of strings.
var matcher = Levenshtrie.CreateStrings(["docker", "doctor", "rocket", "locker"]);

// Find every entry within 2 edits of the misspelled query.
foreach (var match in matcher.Search("docer", 2))
{
    Console.WriteLine($"{match.Result} (distance {match.Distance})");
}

// Output:
// docker (distance 1)
// doctor (distance 2)
// locker (distance 2)
```
### Under the Hood
- The dataset is loaded into a [Trie](https://en.wikipedia.org/wiki/Trie).
- A Levenshtein automaton is built on the fly from your query.
- The trie is traversed with the automaton to **prune irrelevant branches early**, yielding matches quickly.
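The pruning idea in the steps above can be sketched in a few dozen lines. This is a simplified stand-in, not the library's implementation: instead of a precompiled Levenshtein automaton it carries one dynamic-programming row down the trie, which prunes the same branches but does more work per node. The `SketchTrie` type and all of its members are hypothetical.

```csharp
using System;
using System.Collections.Generic;

var trie = new SketchTrie();
foreach (var word in new[] { "docker", "doctor", "rocket", "locker" })
{
    trie.Add(word);
}

foreach (var (word, distance) in trie.Search("docer", 2))
{
    Console.WriteLine($"{word} (distance {distance})");
}
// docker (distance 1)
// doctor (distance 2)
// locker (distance 2)

// Hypothetical teaching type; not part of Levenshtypo.
sealed class SketchTrie
{
    private sealed class Node
    {
        public readonly SortedDictionary<char, Node> Children = new SortedDictionary<char, Node>();
        public bool IsWord; // true when a stored word ends at this node
    }

    private readonly Node _root = new Node();

    public void Add(string word)
    {
        var node = _root;
        foreach (char c in word)
        {
            if (!node.Children.TryGetValue(c, out var next))
            {
                next = new Node();
                node.Children.Add(c, next);
            }
            node = next;
        }
        node.IsWord = true;
    }

    public List<(string Word, int Distance)> Search(string query, int maxEditDistance)
    {
        var results = new List<(string Word, int Distance)>();

        // Row for the empty prefix: reaching the first j query chars costs j edits.
        var row = new int[query.Length + 1];
        for (int j = 0; j < row.Length; j++) row[j] = j;

        Visit(_root, "", row, query, maxEditDistance, results);
        return results;
    }

    private static void Visit(Node node, string prefix, int[] row, string query,
        int max, List<(string Word, int Distance)> results)
    {
        // The last cell is the distance between this prefix and the whole query.
        if (node.IsWord && row[query.Length] <= max)
        {
            results.Add((prefix, row[query.Length]));
        }

        foreach (var pair in node.Children)
        {
            // Extend the Levenshtein table by one row for the child's character.
            var next = new int[row.Length];
            next[0] = row[0] + 1;
            int smallest = next[0];
            for (int j = 1; j < next.Length; j++)
            {
                int cost = query[j - 1] == pair.Key ? 0 : 1;
                next[j] = Math.Min(Math.Min(next[j - 1] + 1, row[j] + 1), row[j - 1] + cost);
                smallest = Math.Min(smallest, next[j]);
            }

            // Prune: if every cell exceeds the budget, nothing below can match.
            if (smallest <= max)
            {
                Visit(pair.Value, prefix + pair.Key, next, query, max, results);
            }
        }
    }
}
```

The row minimum never decreases as characters are appended, so a subtree whose row is entirely above the budget can be skipped without missing a match. In this example the `r` branch ("rocket") is abandoned partway down rather than being compared in full.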
---
## Installation
Available on NuGet:
```
Install-Package Levenshtypo
```
Or via CLI:
```bash
dotnet add package Levenshtypo
```
[Tests workflow](https://github.com/andrewjsaid/levenshtypo/actions/workflows/tests.yml)
[AOT workflow](https://github.com/andrewjsaid/levenshtypo/actions/workflows/aot.yml)
---
## Automaton-Only Mode
Need raw speed and full control?
```csharp
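// Compile "docker" into an automaton that accepts strings within 2 restricted edits.
var automaton = LevenshtomatonFactory.Instance.Construct(
    "docker",
    maxEditDistance: 2,
    metric: LevenshtypoMetric.RestrictedEdit);

// `english` stands for your own collection of words (not defined in this snippet).
foreach (var word in english)
{
    if (automaton.Matches(word))
    {
        Console.WriteLine(word);
    }
}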
```
Over 3000x faster than using `if (LevenshteinDistance(word, "docker") <= 2)`.
You can hook into the automaton layer directly for:
- Custom indexing
- Building autocomplete engines
- Approximate dictionary search
---
## Performance
Levenshtypo is written with performance at the forefront of all decisions.
> Practical example: matching a query against **450,000+ words** at edit distance 1 typically takes less than **0.02 ms**, compared to about **73 ms** for a naive loop that computes the distance to every word.
If the following benchmarks don't impress you, nothing will!
Search the full English-language dataset with a fuzzy key:
- **Naive**: Compute Levenshtein Distance against all words.
- **Levenshtypo_All**: This library, with all results buffered into an array.
- **Levenshtypo_Lazy**: This library, with lazy evaluation (`IEnumerable`).
- **Levenshtypo_Any**: This library, with lazy evaluation (`IEnumerable`), stopping at the first result.
- **Dictionary**: .NET `Dictionary`, which only supports exact matches (distance 0).
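To make the three consumption styles in the list above concrete, here is a small sketch of how "All", "Lazy" and "Any" differ when the search is a lazy `IEnumerable`. The matcher is a deliberately trivial stand-in (same-length words differing in at most one position), not the library's search, and `LazyDemo` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] words = { "docker", "locker", "rocket", "rocker" };

// "All": drain the sequence into an array, so every candidate is examined.
string[] all = LazyDemo.Search(words, "docker").ToArray();
Console.WriteLine($"All:  {all.Length} results, {LazyDemo.Examined} words examined");
// All:  3 results, 4 words examined

// "Lazy": consume results one at a time as they are produced.
LazyDemo.Examined = 0;
foreach (var match in LazyDemo.Search(words, "docker"))
{
    Console.WriteLine($"Lazy: {match}");
}

// "Any": stop at the first result; the rest of the data is never examined.
LazyDemo.Examined = 0;
bool any = LazyDemo.Search(words, "docker").Any();
Console.WriteLine($"Any:  {any}, {LazyDemo.Examined} words examined");
// Any:  True, 1 words examined

// Hypothetical stand-in for a lazy fuzzy search; not part of Levenshtypo.
static class LazyDemo
{
    // Counts how many candidate words the iterator has looked at.
    public static int Examined;

    public static IEnumerable<string> Search(IEnumerable<string> words, string query)
    {
        foreach (var word in words)
        {
            Examined++;
            if (word.Length != query.Length) continue;

            int mismatches = 0;
            for (int i = 0; i < word.Length; i++)
            {
                if (word[i] != query[i]) mismatches++;
            }

            // Results are produced on demand; work pauses here until the
            // caller asks for the next item.
            if (mismatches <= 1) yield return word;
        }
    }
}
```

The benchmark figures below follow the same pattern: the "Any" variant can finish early, and the buffered "All" variant allocates more as the number of results grows.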
| Method | Mean | Allocated |
| -------------------------- | ----------------: | --------: |
| Distance0_Levenshtypo_All | 361.444 ns | 240 B |
| Distance0_Levenshtypo_Lazy | 975.169 ns | 480 B |
| Distance0_Levenshtypo_Any | 614.947 ns | 480 B |
| Distance0_Dictionary | 9.128 ns | - |
| Distance0_Naive | 813,419.616 ns | 89 B |
| Distance1_Levenshtypo_All | 19,008.096 ns | 536 B |
| Distance1_Levenshtypo_Lazy | 38,615.868 ns | 480 B |
| Distance1_Levenshtypo_Any | 25,805.258 ns | 480 B |
| Distance1_Naive | 73,459,775.661 ns | 193 B |
| Distance2_Levenshtypo_All | 276,157.020 ns | 2600 B |
| Distance2_Levenshtypo_Lazy | 440,689.397 ns | 480 B |
| Distance2_Levenshtypo_Any | 215,542.244 ns | 480 B |
| Distance2_Naive | 68,999,745.833 ns | 700 B |
| Distance3_Levenshtypo_All | 1,617,282.340 ns | 25985 B |
| Distance3_Levenshtypo_Lazy | 2,452,026.901 ns | 1123 B |
| Distance3_Levenshtypo_Any | 231,972.804 ns | 584 B |
| Distance3_Naive | 71,845,738.624 ns | 4369 B |
Load the full English-language dataset:
- **Levenshtypo**: This library.
- **Dictionary**: .NET Dictionary for comparison.
| Method | Mean | Allocated |
| ------------------- | ------------: | -----------: |
| English_Dictionary | 31,755.45 μs | 35524.19 KB |
| English_Levenshtypo | 142,010.47 μs | 145145.15 KB |
---
## License
MIT: free for personal and commercial use.
---
> Made with ❤️, performance profiling, and typo tolerance by [@andrewjsaid](https://github.com/andrewjsaid)