Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bnm3k/polars-fuzzy-match
Polars extension for fzf-style fuzzy matching
- Host: GitHub
- URL: https://github.com/bnm3k/polars-fuzzy-match
- Owner: bnm3k
- License: mit
- Created: 2024-02-27T13:58:59.000Z (9 months ago)
- Default Branch: master
- Last Pushed: 2024-03-05T08:04:24.000Z (9 months ago)
- Last Synced: 2024-07-12T01:59:16.334Z (4 months ago)
- Topics: polars, polars-dataframe, python, rust
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 16
- Watchers: 1
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-polars - polars-fuzzy-match - Python package for fuzzy matching with Polars, i.e. matching text elements that are similar but not exactly identical by [@bnm3k](https://github.com/bnm3k). (Libraries/Packages/Scripts / Python)
README
# Polars Fuzzy Matching
## Installation
```
pip install polars
pip install polars-fuzzy-match
```

## Usage

With both the plugin and polars installed, usage is as follows:
```python
import polars as pl
from polars_fuzzy_match import fuzzy_match_score

df = pl.DataFrame(
{
'strs': ['foo', 'foo quz BAR', 'baaarfoo', 'quz'],
}
)
pattern = 'bar'
out = df.with_columns(
score=fuzzy_match_score(
pl.col('strs'),
pattern,
)
)
print(out)
```

This outputs:

```
shape: (4, 2)
┌─────────────┬───────┐
│ strs        ┆ score │
│ ---         ┆ ---   │
│ str         ┆ u32   │
╞═════════════╪═══════╡
│ foo         ┆ null  │
│ foo quz BAR ┆ 88    │
│ baaarfoo    ┆ 74    │
│ quz         ┆ null  │
└─────────────┴───────┘
```

When there is no match, the score is `null`. When the pattern matches the value
in the given column, the score is non-null. The higher the score, the closer the
value is to the pattern. We can therefore filter out the values that do not
match and order the rest by score:

```python
pattern = 'bar'
out = (
df.with_columns(
score=fuzzy_match_score(
pl.col('strs'),
pattern,
)
)
.filter(pl.col('score').is_not_null())
.sort(by='score', descending=True)
)
print(out)
```

This outputs:

```
shape: (2, 2)
┌─────────────┬───────┐
│ strs        ┆ score │
│ ---         ┆ ---   │
│ str         ┆ u32   │
╞═════════════╪═══════╡
│ foo quz BAR ┆ 88    │
│ baaarfoo    ┆ 74    │
└─────────────┴───────┘
```

### Fzf-style search syntax

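Each search term encodes its match type in marker characters (`'`, `^`, `$`, `!`), as the table in this section lists. As a rough mental model only, the sketch below classifies a term and tests it against a string in plain Python. It is not how the plugin or Nucleo implements matching: `parse_term`, `is_subsequence`, and `term_matches` are hypothetical names, scores are not modeled, fuzzy matching is approximated by an in-order subsequence test, and comparison is simply case-insensitive.

```python
from typing import NamedTuple


class Term(NamedTuple):
    kind: str      # "fuzzy", "substring", "prefix", or "suffix"
    text: str      # the term with its marker characters removed
    inverse: bool  # True when the term started with "!"


def parse_term(raw: str) -> Term:
    """Classify one search term according to the syntax table."""
    inverse = raw.startswith("!")
    body = raw[1:] if inverse else raw
    if body.startswith("^"):
        return Term("prefix", body[1:], inverse)
    if body.endswith("$"):
        return Term("suffix", body[:-1], inverse)
    if body.startswith("'"):
        return Term("substring", body[1:], inverse)
    # Per the table, a bare "!fire" is an inverse exact (substring) match.
    return Term("substring" if inverse else "fuzzy", body, inverse)


def is_subsequence(needle: str, haystack: str) -> bool:
    """True if the characters of needle appear in haystack in order."""
    chars = iter(haystack)
    return all(ch in chars for ch in needle)


def term_matches(raw: str, item: str) -> bool:
    """Illustrative yes/no match; the real plugin returns a score or null."""
    term = parse_term(raw)
    # Simplification (assumption): compare case-insensitively.
    text, item = term.text.lower(), item.lower()
    if term.kind == "prefix":
        hit = item.startswith(text)
    elif term.kind == "suffix":
        hit = item.endswith(text)
    elif term.kind == "substring":
        hit = text in item
    else:
        hit = is_subsequence(text, item)
    # An inverse term matches exactly when the underlying test fails.
    return hit != term.inverse
```

Under these assumptions, `term_matches('bar', s)` is true for `'foo quz BAR'` and `'baaarfoo'` and false for `'foo'` and `'quz'`, which lines up with the null and non-null scores in the usage example above.
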
The pattern supports fzf-style search syntax. The table below is taken almost
verbatim from the fzf README:

| Pattern | Match type | Description |
| --------- | -------------------------- | ------------------------------------------- |
| `bar` | fuzzy | items that fuzzy match `bar` e.g. 'bXXaXXr' |
| `'foo` | substring exact match | items that include `foo` e.g. 'is foo ok' |
| `^music` | prefix exact match | items that start with `music` |
| `.mp3$` | suffix exact match | items that end with `.mp3` |
| `!fire` | inverse exact match | items that do not include `fire` |
| `!^music` | inverse prefix exact match | items that do not start with `music` |
| `!.mp3$` | inverse suffix exact match | items that do not end with `.mp3` |

## Credits

1. Marco Gorelli's
[tutorial on writing Polars plugins](https://marcogorelli.github.io/polars-plugins-tutorial/).
2. The Helix Editor team for the
[Nucleo fuzzy matching library](https://github.com/helix-editor/nucleo).