https://github.com/elchemista/fuzler

A tiny, Rust‑powered string‑similarity helper for Elixir.
https://github.com/elchemista/fuzler

elixir full-text-search fuzzy-search rust

Last synced: 4 months ago
JSON representation

A tiny, Rust‑powered string‑similarity helper for Elixir.

Host: GitHub
URL: https://github.com/elchemista/fuzler
Owner: elchemista
License: mit
Created: 2025-04-30T09:55:14.000Z (6 months ago)
Default Branch: master
Last Pushed: 2025-05-10T21:26:22.000Z (5 months ago)
Last Synced: 2025-06-01T12:41:48.098Z (5 months ago)
Topics: elixir, full-text-search, fuzzy-search, rust
Language: Elixir
Homepage: https://hex.pm/packages/fuzler
Size: 15.5 MB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # Fuzler

_A tiny, Rust‑powered string‑similarity helper for Elixir._

`Fuzler` gives you **one public function**:

```elixir

Fuzler.similarity_score(query :: String.t(), target :: String.t()) :: float

```

It returns a **normalised score in $0.0 – 1.0$** that tells you how closely

two pieces of text match—robust to typos, word‑order swaps, case and basic

punctuation.

Behind the scenes it calls a compiled Rust NIF that mixes:

- **Hamming distance** – for very short, nearly equal‑length strings.

- **SIMD Levenshtein** – fast edit distance from the `triple_accel` crate.

- **Token‑bag Jaccard** – ignores word order.

- **Partial‑ratio window** – finds the best‑matching snippet when the target is much longer than the query.

The result is symmetric (`score(a,b) ≈ score(b,a)`), length‑normalised and remains meaningful from single words to multi‑sentence paragraphs.

---

## Installation

Add to your `mix.exs`:

```elixir

def deps do

  [

    {:fuzler, "~> 0.1.2"}

  ]

end

```

You need **Rust ≥ 1.70** installed; `rustler` will compile the NIF automatically.

---

## Quick examples

```elixir

iex> Fuzler.similarity_score("ciao", "ciao")

1.0

iex> Fuzler.similarity_score("bella ciao", "ciao bella")

0.70       # same words, different order

iex> long_text = "bella ciao come va oggi spero che tu stia bene ..."

iex> Fuzler.similarity_score("ciao", long_text)

0.75       # query appears once inside a 40‑token paragraph

iex> Fuzler.similarity_score("bonjour", long_text)

0.12       # word not present

```

---

## When should I use it?

| Use case                                    | Why it works well                                    |

| ------------------------------------------- | ---------------------------------------------------- |

| typo‑tolerant autocomplete / “did‑you‑mean” | Hamming + Levenshtein catch small edits fast         |

| matching short queries inside long blobs    | windowed _partial ratio_ focuses on the best slice   |

| order‑agnostic key comparison               | token‑bag Jaccard treats “ciao bella” = “bella ciao” |

| quick relevance scoring in Elixir           | pure NIF call, no external service needed            |

**Not** a full‑text search engine or a semantic synonym matcher—that’s what

Tantivy / Embeddings are for.

---

## API

```elixir

@doc "Returns a similarity score ∈ [0.0, 1.0]"

@spec similarity_score(String.t(), String.t()) :: float

```

If the NIF failed to load you’ll get:

```elixir

:erlang.nif_error(:nif_not_loaded)

```

so your code can decide to fall back or skip tests.

---

## How good is the score?

| Query / Target                                      | Score ≈     |

| --------------------------------------------------- | ----------- |

| identical strings (any case / punctuation)          | 1.00        |

| same words, swapped order                           | 0.68 – 0.72 |

| one‑word query present once in 45‑token paragraph   | \~0.75      |

| one‑word query absent from paragraph                | ≤ 0.15      |

| 80‑token paragraph vs same with 1 typo              | ≥ 0.90      |

| “ciao bella” with +30 random filler tokens appended | \~0.58      |

---

## Running the test suite

`mix test` runs a handful of ExUnit cases covering:

- case & punctuation variations

- word‑order permutations

- query present / absent in long paragraph (> 40 tokens)

- very long strings with tiny edits

- monotonic drop as filler tokens grow

All similarity tests auto‑skip if the NIF isn’t loaded (e.g. on

CI without Rust).

---

## License

MIT [License](LICENSE)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/elchemista/fuzler

Awesome Lists containing this project

README