https://github.com/bnmoch3/polars-fuzzy-match
  
  
    Polars extension for fzf-style fuzzy matching  
  
polars polars-dataframe python rust
- Host: GitHub
 - URL: https://github.com/bnmoch3/polars-fuzzy-match
 - Owner: bnmoch3
 - License: mit
 - Created: 2024-02-27T13:58:59.000Z (over 1 year ago)
 - Default Branch: master
 - Last Pushed: 2024-08-15T21:37:26.000Z (about 1 year ago)
 - Last Synced: 2025-06-25T07:06:19.795Z (4 months ago)
 - Topics: polars, polars-dataframe, python, rust
 - Language: Python
 - Homepage:
 - Size: 20.5 KB
 - Stars: 28
 - Watchers: 1
 - Forks: 3
 - Open Issues: 5
Metadata Files:
- Readme: README.md
 - License: LICENSE
 
 
Awesome Lists containing this project
- awesome-polars - polars-fuzzy-match - Python package for fuzzy matching with Polars, i.e. matching text elements that are similar but not exactly identical by [@bnm3k](https://github.com/bnm3k). (Libraries/Packages/Scripts / Polars plugins)
 
README
# Polars Fuzzy Matching
## Installation
```
pip install polars
pip install polars-fuzzy-match
```
## Usage
With both the plugin and polars installed, usage is as follows:
```python
import polars as pl
from polars_fuzzy_match import fuzzy_match_score
df = pl.DataFrame(
    {
        'strs': ['foo', 'foo quz BAR', 'baaarfoo', 'quz'],
    }
)
pattern = 'bar'
out = df.with_columns(
    score=fuzzy_match_score(
        pl.col('strs'),
        pattern,
    )
)
print(out)
```
This outputs:
```
shape: (4, 2)
┌─────────────┬───────┐
│ strs        ┆ score │
│ ---         ┆ ---   │
│ str         ┆ u32   │
╞═════════════╪═══════╡
│ foo         ┆ null  │
│ foo quz BAR ┆ 88    │
│ baaarfoo    ┆ 74    │
│ quz         ┆ null  │
└─────────────┴───────┘
```
When the pattern does not match a value, the score is `null`; otherwise it is a
non-null `u32`. A higher score means the value is a closer match to the pattern,
and, as the `foo quz BAR` row shows, a lowercase pattern also matches uppercase
text. We can therefore filter out the non-matching values and sort by score:
```python
pattern = 'bar'
out = (
    df.with_columns(
        score=fuzzy_match_score(
            pl.col('strs'),
            pattern,
        )
    )
    .filter(pl.col('score').is_not_null())
    .sort(by='score', descending=True)
)
print(out)
```
This outputs:
```
shape: (2, 2)
┌─────────────┬───────┐
│ strs        ┆ score │
│ ---         ┆ ---   │
│ str         ┆ u32   │
╞═════════════╪═══════╡
│ foo quz BAR ┆ 88    │
│ baaarfoo    ┆ 74    │
└─────────────┴───────┘
```
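To build intuition for which rows end up with a non-null score, here is a minimal pure-Python sketch. It assumes that a fuzzy match simply means the pattern's characters appear in the value in order, ignoring case. That assumption is consistent with the output above, but it is a simplification: the plugin delegates matching and scoring to Nucleo, and the scoring itself is not reproduced here. The helper name `is_fuzzy_candidate` is hypothetical and not part of the plugin.

```python
def is_fuzzy_candidate(pattern: str, value: str) -> bool:
    """Return True if the characters of `pattern` occur in `value` in order.

    Hypothetical helper for illustration only; case is ignored.
    """
    remaining = iter(value.lower())
    # `ch in remaining` consumes the iterator up to the first hit,
    # so later pattern characters must appear after earlier ones.
    return all(ch in remaining for ch in pattern.lower())


strs = ['foo', 'foo quz BAR', 'baaarfoo', 'quz']
matches = [s for s in strs if is_fuzzy_candidate('bar', s)]
print(matches)  # ['foo quz BAR', 'baaarfoo']
```

The two values that survive are the same two rows that receive a score in the filtered output above.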
### Fzf-style search syntax
The pattern supports fzf-style search syntax. The table below is taken almost
verbatim from the fzf README:
| Pattern   | Match type                 | Description                                 |
| --------- | -------------------------- | ------------------------------------------- |
| `bar`     | fuzzy                      | items that fuzzy match `bar` e.g. 'bXXaXXr' |
| `'foo`    | substring exact match      | items that include `foo` e.g. 'is foo ok'   |
| `^music`  | prefix exact match         | items that start with `music`               |
| `.mp3$`   | suffix exact match         | items that end with `.mp3`                  |
| `!fire`   | inverse exact match        | items that do not include `fire`            |
| `!^music` | inverse prefix exact match | items that do not start with `music`        |
| `!.mp3$`  | inverse suffix exact match | items that do not end with `.mp3`           |
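The rows of the table can be read as a small set of prefix and suffix rules on a single search term. The sketch below is one way to express those rules in plain Python. It is not the plugin's implementation (parsing is handled by Nucleo), and the names `Term`, `classify`, and `term_matches` are hypothetical. For simplicity it assumes case-sensitive comparison and handles only the forms listed in the table.

```python
from typing import NamedTuple


class Term(NamedTuple):
    kind: str      # 'fuzzy', 'exact', 'prefix' or 'suffix'
    text: str      # the term with its marker characters stripped
    inverse: bool  # True when the term started with '!'


def classify(term: str) -> Term:
    """Map one search term onto a row of the table (hypothetical helper)."""
    inverse = term.startswith('!')
    if inverse:
        term = term[1:]
    if term.startswith("'"):
        return Term('exact', term[1:], inverse)
    if term.startswith('^'):
        return Term('prefix', term[1:], inverse)
    if term.endswith('$'):
        return Term('suffix', term[:-1], inverse)
    # Per the table, a bare `!fire` is an inverse *exact* match, not fuzzy.
    return Term('exact' if inverse else 'fuzzy', term, inverse)


def term_matches(term: str, value: str) -> bool:
    """Apply a classified term to a value (case-sensitive, no scoring)."""
    t = classify(term)
    if t.kind == 'prefix':
        hit = value.startswith(t.text)
    elif t.kind == 'suffix':
        hit = value.endswith(t.text)
    elif t.kind == 'exact':
        hit = t.text in value
    else:
        # Fuzzy: pattern characters must appear in order in the value.
        remaining = iter(value)
        hit = all(ch in remaining for ch in t.text)
    # An inverse term matches exactly when the plain term does not.
    return hit != t.inverse


print(classify('!^music'))
print(term_matches('.mp3$', 'song.mp3'))
```

With the real plugin, the same pattern strings are passed unchanged as the second argument to `fuzzy_match_score`.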
## Credits
1. Marco Gorelli's tutorial on writing Polars plugins. See
   [here](https://marcogorelli.github.io/polars-plugins-tutorial/).
2. The Helix Editor team for the
   [Nucleo fuzzy matching library](https://github.com/helix-editor/nucleo).