https://github.com/seatgeek/thefuzz
Fuzzy String Matching in Python
- Host: GitHub
- URL: https://github.com/seatgeek/thefuzz
- Owner: seatgeek
- License: mit
- Created: 2021-03-05T19:07:19.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2024-02-27T19:08:27.000Z (12 months ago)
- Last Synced: 2025-02-11T15:48:13.164Z (9 days ago)
- Language: Python
- Size: 116 KB
- Stars: 3,041
- Watchers: 25
- Forks: 144
- Open Issues: 41
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.rst
- License: LICENSE.txt
Awesome Lists containing this project
- jimsghstars - seatgeek/thefuzz - Fuzzy String Matching in Python (Python)
README

.. image:: https://github.com/seatgeek/thefuzz/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/seatgeek/thefuzz

TheFuzz
=======

Fuzzy string matching like a boss. It uses `Levenshtein Distance <https://en.wikipedia.org/wiki/Levenshtein_distance>`_ to calculate the differences between sequences in a simple-to-use package.
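
To make the idea of Levenshtein distance concrete, here is a minimal pure-Python sketch of the classic dynamic-programming algorithm. The function name ``levenshtein`` is made up for illustration and is not part of thefuzz. The library delegates its scoring to rapidfuzz and reports normalized scores from 0 to 100, so this shows the underlying concept only, not how the package computes its results.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b.

    Illustrative helper only (hypothetical name, not a thefuzz function).
    """
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))                  # 3
print(levenshtein("this is a test", "this is a test!"))  # 1
```

A distance of 0 means the strings are identical; larger values mean more edits are needed.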

Requirements
============

- Python 3.8 or higher
- `rapidfuzz <https://github.com/rapidfuzz/RapidFuzz>`_

For testing
~~~~~~~~~~~

- pycodestyle
- hypothesis
- pytest

Installation
============

Using pip via PyPI

.. code:: bash

    pip install thefuzz

Using pip via GitHub

.. code:: bash

    pip install git+https://github.com/seatgeek/[email protected]#egg=thefuzz

Adding to your ``requirements.txt`` file (run ``pip install -r requirements.txt`` afterwards)

.. code:: bash

    git+ssh://[email protected]/seatgeek/[email protected]#egg=thefuzz

Manually via Git

.. code:: bash

    git clone https://github.com/seatgeek/thefuzz.git thefuzz
    cd thefuzz
    python setup.py install

Usage
=====

.. code:: python

    >>> from thefuzz import fuzz
    >>> from thefuzz import process

Simple Ratio
~~~~~~~~~~~~

.. code:: python

    >>> fuzz.ratio("this is a test", "this is a test!")
    97

Partial Ratio
~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.partial_ratio("this is a test", "this is a test!")
    100

Token Sort Ratio
~~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
    >>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100

Token Set Ratio
~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
    >>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100

Partial Token Sort Ratio
~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.token_sort_ratio("fuzzy was a bear", "wuzzy fuzzy was a bear")
    84
    >>> fuzz.partial_token_sort_ratio("fuzzy was a bear", "wuzzy fuzzy was a bear")
    100

Process
~~~~~~~

.. code:: python

    >>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    >>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
    >>> process.extractOne("cowboys", choices)
    ('Dallas Cowboys', 90)

You can also pass additional parameters to the ``extractOne`` method to make it use a specific scorer. A typical use case is matching file paths:

.. code:: python

    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
    ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)

.. |Build Status| image:: https://github.com/seatgeek/thefuzz/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/seatgeek/thefuzz
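
The token-sort and ``extractOne`` examples above rest on two ideas: normalize word order before comparing, and pick the candidate that a pluggable scorer rates highest. The sketch below illustrates both using only the standard library. Every name in it (``simple_ratio``, ``token_sort_key``, ``sketch_token_sort_ratio``, ``sketch_extract_one``) is hypothetical, and ``difflib.SequenceMatcher`` is a stand-in scorer, so the numbers it produces will generally differ from thefuzz's own scores.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Stand-in scorer: difflib similarity scaled to an integer from 0 to 100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort_key(text: str) -> str:
    """Lowercase, split on whitespace, and rejoin the tokens in sorted order."""
    return " ".join(sorted(text.lower().split()))


def sketch_token_sort_ratio(a: str, b: str) -> int:
    """Score two strings after normalizing case and word order."""
    return simple_ratio(token_sort_key(a), token_sort_key(b))


def sketch_extract_one(query, choices, scorer=simple_ratio):
    """Return the (choice, score) pair with the highest score."""
    scored = ((choice, scorer(query, choice)) for choice in choices)
    return max(scored, key=lambda pair: pair[1])


# Word order no longer matters once both strings are token-sorted.
print(sketch_token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100

choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(sketch_extract_one("new york jets", choices, scorer=sketch_token_sort_ratio))
# ('New York Jets', 100)
```

Passing the scorer as an argument mirrors the ``scorer=`` keyword shown above: the selection logic stays the same while the notion of similarity changes.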