{"id":50794391,"url":"https://github.com/stella/fuzzy-search","last_synced_at":"2026-06-12T13:31:57.905Z","repository":{"id":351865076,"uuid":"1185655071","full_name":"stella/fuzzy-search","owner":"stella","description":"Approximate substring matching for Node.js and Bun via a Rust Myers engine","archived":false,"fork":false,"pushed_at":"2026-06-11T11:29:21.000Z","size":2551,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-11T13:12:06.713Z","etag":null,"topics":["approximate-string-matching","bun","fuzzy-search","legaltech","napi-rs","nodejs","rust","stella","text-search","typescript"],"latest_commit_sha":null,"homepage":"https://stll.app","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stella.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-03-18T20:10:39.000Z","updated_at":"2026-06-11T11:20:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"c534fe53-e109-4fb5-823c-999ddd4d2964","html_url":"https://github.com/stella/fuzzy-search","commit_stats":null,"previous_names":["stella/fuzzy-search"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/stella/fuzzy-search","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stella%2Ffuzzy-search","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stella%2Ffuzzy-search/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stella%2Ffuzzy-search/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stella%2Ffuzzy-search/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stella","download_url":"https://codeload.github.com/stella/fuzzy-search/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stella%2Ffuzzy-search/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34247461,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["approximate-string-matching","bun","fuzzy-search","legaltech","napi-rs","nodejs","rust","stella","text-search","typescript"],"created_at":"2026-06-12T13:31:57.797Z","updated_at":"2026-06-12T13:31:57.893Z","avatar_url":"https://github.com/stella.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\".github/assets/banner.png\" alt=\"stella\" width=\"100%\" /\u003e\n\u003c/p\u003e\n\n# @stll/fuzzy-search\n\n[NAPI-RS](https://napi.rs/) approximate substring\nmatching for Node.js and Bun. Finds near-matches\nwithin edit distance k with stable UTF-16 offsets,\nreplace-safe match ranges, and optional diacritics\nnormalization.\n\nBuilt on [Myers' bit-parallel algorithm](https://doi.org/10.1145/316542.316550)\n(1999), implemented in Rust and exposed to\nJavaScript via [NAPI-RS](https://github.com/napi-rs/napi-rs).\n\n## Install\n\n```bash\nnpm install @stll/fuzzy-search\n# or\nbun add @stll/fuzzy-search\n```\n\nThe companion `@stll/fuzzy-search-wasm` package is\navailable for browser builds.\n\nIf you use the browser package with Vite, import the\nbundled plugin so the generated WASM loader is not\npre-bundled into broken asset URLs:\n\n```typescript\nimport { defineConfig } from \"vite\";\nimport stllFuzzySearchWasm from \"@stll/fuzzy-search-wasm/vite\";\n\nexport default defineConfig({\n  plugins: [stllFuzzySearchWasm()],\n});\n```\n\nPrebuilts are available for:\n\n| Platform      | Architecture |\n| ------------- | ------------ |\n| macOS         | x64, arm64   |\n| Linux (glibc) | x64, arm64   |\n| WASM          | browser      |\n\n## Usage\n\n```typescript\nimport { FuzzySearch } from \"@stll/fuzzy-search\";\n\nconst fs = new FuzzySearch(\n  [\n    { pattern: \"Gaislerová\", distance: 1 },\n    { pattern: \"Novák\", distance: 1 },\n    { pattern: \"Příbram\", distance: 2 },\n  ],\n  {\n    normalizeDiacritics: true,\n    wholeWords: true,\n  },\n);\n\nfs.findIter(\"Smlouva s Gais1erová v Pribram\");\n// [\n//   { pattern: 0, start: 10, end: 20,\n//     text: \"Gais1erová\", distance: 1 },\n//   { pattern: 2, start: 23, end: 30,\n//     text: \"Pribram\", distance: 0 },\n// ]\n```\n\n### Patterns\n\nPatterns can be strings (default distance 1) or\nobjects with explicit distance and optional name:\n\n```typescript\nconst fs = new FuzzySearch([\n  \"simple\", // distance 1\n  { pattern: \"named\", name: \"entity\" }, // distance 1\n  { pattern: \"precise\", distance: 2 }, // distance 2\n]);\n```\n\nDistance must be less than pattern length.\n\n### Options\n\n```typescript\nconst fs = new FuzzySearch(patterns, {\n  // Strip diacritics before matching (NFD + remove\n  // combining marks). \"Příbram\" matches \"Pribram\"\n  // at distance 0.\n  normalizeDiacritics: true, // default: false\n\n  // Only match whole words. Uses Unicode\n  // is_alphanumeric() for boundary detection.\n  // CJK characters always pass (no inter-word\n  // spaces in CJK).\n  wholeWords: true, // default: true\n\n  // Case-insensitive matching (Unicode-aware).\n  caseInsensitive: true, // default: false\n\n  // Unicode word boundaries (reserved for future\n  // UAX#29 segmentation support).\n  unicodeBoundaries: true, // default: true\n\n  // Drop matches whose score is below threshold.\n  // Score = 1 - distance / pattern.length.\n  // Inclusive (score \u003e= minScore keeps the match).\n  minScore: 0.7,\n\n  // Return only the top k matches by score, across\n  // all patterns. Tie-broken by start, then pattern.\n  kBest: 5,\n});\n```\n\n### Scored output\n\nEvery match carries a normalized score in `[0, 1]`,\ncomputed as `1 - distance / pattern.length` and\nclamped at 0. Pair it with `minScore` and `kBest` for\ntop-N ranking without a follow-up sort:\n\n```typescript\nconst fs = new FuzzySearch(\n  [\n    { pattern: \"Novák\", distance: 2 },\n    { pattern: \"Gaislerová\", distance: 2 },\n  ],\n  { wholeWords: true, minScore: 0.7, kBest: 3 },\n);\n\nfs.findIter(\"Nowák a Gais1erova\");\n// [\n//   { pattern: 0, text: \"Nowák\", distance: 1, score: 0.8, ... },\n//   { pattern: 1, text: \"Gais1erova\", distance: 2, score: 0.8, ... },\n// ]\n```\n\n`replaceAll` always replaces every distance-qualified\nmatch and ignores `minScore` / `kBest`, so the\n`replacements`-by-pattern contract stays\ndeterministic.\n\n### Replace\n\n```typescript\nfs.replaceAll(\"Smlouva s Gais1erová\", [\n  \"[REDACTED]\",\n  \"[REDACTED]\",\n  \"[REDACTED]\",\n]);\n// \"Smlouva s [REDACTED]\"\n```\n\n`replacements[i]` replaces pattern `i`.\n\n### Distance helper\n\n```typescript\nimport { distance } from \"@stll/fuzzy-search\";\n\ndistance(\"kitten\", \"sitting\"); // 3\ndistance(\"abcd\", \"abdc\", \"damerau-levenshtein\"); // 1\n```\n\n## Benchmarks\n\nThe repository includes a checked-in benchmark harness\nfor synthetic and corpus-based searches. The inputs\nare public and the scripts are reproducible from the\nrepo. Run them locally:\n\n```bash\nbun run bench:install\nbun run bench:download\nbun run bench:speed\nbun run bench:correctness\n```\n\nThe speed harness compares practical JS ecosystem\nalternatives, but not every comparator implements the\nsame exact semantics. `@stll/fuzzy-search` is solving\napproximate substring search with offsets and\nreplacement-friendly match ranges; tools like\n`fuse.js` and `fuzzball` are included as reference\npoints, not as exact drop-in equivalents. The\nheadline comparisons in this repo are the\nsubstring-mode rows against sliding-window\nLevenshtein baselines.\n\nRepresentative baseline from the checked-in public\nharness on this machine:\n\n- runtime: Bun `1.3.12`\n- platform: macOS `26.4.1` (`Darwin arm64`)\n\n| Scenario                         | `@stll/fuzzy-search` | Sliding-window JS baseline | Relative |\n| -------------------------------- | -------------------- | -------------------------- | -------- |\n| Czech legal, `64 KB`, `5` names  | `2.41 ms`            | `80.78 ms`                 | `33.5x`  |\n| Bible, `4.0 MB`, `5` names       | `239.91 ms`          | `3903.26 ms`               | `16.3x`  |\n| Czech news, `4.8 MB`, `5` names  | `262.39 ms`          | `4350.52 ms`               | `16.6x`  |\n| German news, `5.5 MB`, `5` names | `405.72 ms`          | `6816.03 ms`               | `16.8x`  |\n\nThese rows are substring mode (`wholeWords: false`)\nwith edit distance `1-2`, which is the core workload\nthis package is designed for.\n\n\u003cdetails\u003e\n\u003csummary\u003eAlternatives tested\u003c/summary\u003e\n\n- [fastest-levenshtein](https://www.npmjs.com/package/fastest-levenshtein) + sliding window — fastest JS Levenshtein distance\n- [fuse.js](https://www.npmjs.com/package/fuse.js) — fuzzy search (scoring, not substring matching)\n- [fuzzball](https://www.npmjs.com/package/fuzzball) — Python rapidfuzz port\n- naive JS — O(nm) Levenshtein per window position\n\n\u003c/details\u003e\n\n## Correctness\n\nCorrectness is covered by example-based tests and\nproperty tests. The property suite verifies distance\nbounds, oracle agreement, whole-word boundaries,\nUTF-16 offset stability, normalization behavior, and\nmixed option combinations over randomized inputs.\n\n## API\n\n| Method                                | Returns        | Description              |\n| ------------------------------------- | -------------- | ------------------------ |\n| `new FuzzySearch(patterns, options?)` | instance       | Build matcher            |\n| `.findIter(haystack)`                 | `FuzzyMatch[]` | Non-overlapping matches  |\n| `.isMatch(haystack)`                  | `boolean`      | Any pattern matches?     |\n| `.replaceAll(haystack, replacements)` | `string`       | Replace matched patterns |\n| `.patternCount`                       | `number`       | Number of patterns       |\n\n### Types\n\n```typescript\ntype PatternEntry =\n  | string\n  | { pattern: string; distance?: number; name?: string };\n\ntype Options = {\n  normalizeDiacritics?: boolean; // default: false\n  wholeWords?: boolean; // default: true\n  caseInsensitive?: boolean; // default: false\n  unicodeBoundaries?: boolean; // default: true\n  minScore?: number; // drop matches below threshold\n  kBest?: number; // top-k by score, ties by start\n};\n\ntype FuzzyMatch = {\n  pattern: number; // index into patterns array\n  start: number; // UTF-16 code unit offset\n  end: number; // exclusive\n  text: string; // matched substring\n  distance: number; // actual Levenshtein distance\n  score: number; // 1 - distance/pattern.length\n  name?: string; // pattern name (if provided)\n};\n```\n\nMatch offsets are UTF-16 code unit indices,\ncompatible with `String.prototype.slice()`.\n\n### Error handling\n\n- Constructor throws if a pattern is empty, longer\n  than 64 characters, or has distance \u003e= pattern\n  length.\n- `replaceAll` throws if `replacements.length`\n  does not equal `patternCount`.\n\n## How it works\n\n1. **Myers' bit-parallel algorithm** scans the text\n   in O(n) per pattern for patterns up to 64\n   characters. No DFA construction, no state\n   explosion at higher distances.\n\n2. **Start position recovery** via small-window\n   Levenshtein: for each match end position from\n   Myers, a window of [m-k, m+k] characters is\n   evaluated to find the exact start and distance.\n\n3. **Diacritics normalization**: NFD decomposition +\n   combining mark stripping (Unicode General\n   Category M via `unicode-normalization` crate).\n   Covers all scripts.\n\n4. **UTF-16 offset translation**: character-level\n   matching with incremental char→UTF-16 mapping\n   for JS string compatibility.\n\n## Limitations\n\n- **Pattern length capped at 64 characters.** Myers\n  uses a single u64 bit-vector per pattern. Longer\n  patterns would need multi-word vectors (not yet\n  implemented).\n- **No streaming API.** The full haystack must be in\n  memory. For chunked processing, use\n  `@stll/aho-corasick`'s `StreamMatcher` for exact\n  prefiltering and fuzzy-search on flagged regions.\n- **WASM requires `SharedArrayBuffer`.** Browser\n  builds need `Cross-Origin-Opener-Policy: same-origin`\n  and `Cross-Origin-Embedder-Policy: require-corp`\n  headers.\n\n## Development\n\n```bash\nbun install\nbun run build           # native module (requires Rust)\nbun test                # 36 unit tests\nbun run test:props      # 36 property tests × 1000 runs\n\nbun run bench:install   # benchmark dependencies\nbun run bench:download  # download corpora\nbun run bench:speed     # speed comparison\nbun run bench:correctness  # oracle verification\n\nbun run lint            # oxlint\nbun run format          # oxfmt + rustfmt\n```\n\n## License\n\n[MIT](./LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstella%2Ffuzzy-search","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstella%2Ffuzzy-search","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstella%2Ffuzzy-search/lists"}