https://github.com/tufayellus/find-closely-matched-words-in-python
This repository contains template code to find closely matched words from a list of words
compare-data compare-strings match-word python-levenshtein similarity-search
- Host: GitHub
- URL: https://github.com/tufayellus/find-closely-matched-words-in-python
- Owner: TufayelLUS
- Created: 2024-01-10T06:44:59.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-10T07:02:09.000Z (over 2 years ago)
- Last Synced: 2025-01-23T23:44:46.718Z (over 1 year ago)
- Topics: compare-data, compare-strings, match-word, python-levenshtein, similarity-search
- Language: Python
- Homepage:
- Size: 3.91 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Find Closely Matched Words In Python
This repository contains template code for finding closely matched words in a list of words.
# Introduction
Often we need to process words from different sources and treat near-identical ones as the same. A typical case is comparing data obtained from two different websites, where the same item can be written slightly differently on each. This template gives an idea of how to handle those cases using the fuzzywuzzy library and the Levenshtein distance algorithm.
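For intuition, Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a minimal pure-Python sketch of that idea. The function name `levenshtein` is made up for illustration, and this is not the code fuzzywuzzy or python-Levenshtein runs internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (illustrative helper)."""
    # Keep the shorter string in b so each row of the table stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("mercedez", "mercedes"))  # 1
print(levenshtein("kitten", "sitting"))     # 3
```

A small distance relative to the word length suggests the two words are likely variants of the same thing.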
# Example
Let's assume that we have two lists of words:
```python
[
    "mercedez",
    "toyota supra",
    "bmw"
]
```
and
```python
[
    "Mercedez benz",
    "supra",
    "bmw"
]
```
Here, you can see that the words "mercedez" and "Mercedez benz" are similar to one another, and they refer to the same company. This template approach is useful for comparing such entries and treating them as the same data. If you already have two known words instead of a list, you can simply do this:
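```python
from fuzzywuzzy import fuzz

# Minimum similarity score (0-100) required to treat two words as a match
accuracy_ratio = 80
word1 = "mercedez"
word2 = "Mercedez benz"

# partial_ratio scores the shorter string against the best-matching part of the longer one
if fuzz.partial_ratio(word1, word2) > accuracy_ratio:
    print("Matched")
else:
    print("Not matched")
```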
And that will do the trick.
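When both sides are lists rather than two known words, the same check can be run for every word against every candidate, keeping the best score. The following is a rough sketch under a few assumptions: the helper name `match_lists` and the lowercasing step are illustrative choices, not part of the repository, and the `difflib` stand-in is only there so the snippet still runs if fuzzywuzzy is not installed (its scores may differ slightly from the library's).

```python
try:
    from fuzzywuzzy import fuzz
    partial_ratio = fuzz.partial_ratio
except ImportError:
    # Assumption: a rough stdlib stand-in, not fuzzywuzzy's own algorithm.
    from difflib import SequenceMatcher

    def partial_ratio(s1, s2):
        """Best similarity (0-100) of the shorter string against
        every same-length window of the longer string."""
        shorter, longer = sorted((s1, s2), key=len)
        if not shorter:
            return 0
        best = 0.0
        for start in range(len(longer) - len(shorter) + 1):
            window = longer[start:start + len(shorter)]
            best = max(best, SequenceMatcher(None, shorter, window).ratio())
        return int(round(100 * best))


def match_lists(left, right, threshold=80):
    """Map each word in `left` to its best match in `right`,
    keeping only pairs that score above `threshold`."""
    matches = {}
    for word in left:
        # Lowercase both sides so that letter case does not lower the score.
        scored = [(partial_ratio(word.lower(), cand.lower()), cand) for cand in right]
        score, best = max(scored, key=lambda pair: pair[0], default=(0, None))
        if score > threshold:
            matches[word] = best
    return matches


first = ["mercedez", "toyota supra", "bmw"]
second = ["Mercedez benz", "supra", "bmw"]
print(match_lists(first, second))
```

With the two example lists, each word in the first list is paired with its counterpart in the second. Because partial matching rewards any shared substring, short words can match unrelated longer ones, so the threshold may need tuning for real data.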
# Command
```shell
pip install fuzzywuzzy python-Levenshtein
```
# Need a software developer to automate your tasks?
Contact Me on Fiverr | Contact Me on LinkedIn | Contact Me on Facebook