[![tic](https://github.com/mlampros/fuzzywuzzyR/workflows/tic/badge.svg?branch=master)](https://github.com/mlampros/fuzzywuzzyR/actions)
[![codecov.io](https://codecov.io/github/mlampros/fuzzywuzzyR/coverage.svg?branch=master)](https://codecov.io/github/mlampros/fuzzywuzzyR?branch=master)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/fuzzywuzzyR)](http://cran.r-project.org/package=fuzzywuzzyR)
[![Downloads](http://cranlogs.r-pkg.org/badges/grand-total/fuzzywuzzyR?color=blue)](http://www.r-pkg.org/pkg/fuzzywuzzyR)
[![Dependencies](https://tinyverse.netlify.com/badge/fuzzywuzzyR)](https://cran.r-project.org/package=fuzzywuzzyR)

## fuzzywuzzyR

The **fuzzywuzzyR** package is a fuzzy string matching implementation that calls the [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy) python package through *reticulate*. It uses the [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) to calculate the differences between sequences. More details on the functionality of fuzzywuzzyR can be found in the [blog post](http://mlampros.github.io/2017/04/13/fuzzywuzzyR_package/) and in the package vignette.
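To make the underlying metric concrete, the sketch below computes the Levenshtein distance (the minimum number of single-character insertions, deletions and substitutions that turn one string into the other) with the standard dynamic-programming table, and cross-checks it against base R's `utils::adist()`. The helpers `levenshtein()` and `similarity_ratio()` are hypothetical illustrations, not part of fuzzywuzzyR, and the 0-100 score is a simplified normalisation that will not necessarily reproduce the scores fuzzywuzzy itself reports.

```r
# Hypothetical helper (not part of fuzzywuzzyR): Levenshtein distance
# computed with the classic dynamic-programming table.
levenshtein <- function(a, b) {
  if (nchar(a) == 0) return(nchar(b))  # only insertions needed
  if (nchar(b) == 0) return(nchar(a))  # only deletions needed
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] = distance between the first i chars of a and first j chars of b
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion
                             d[i + 1, j] + 1L,  # insertion
                             d[i, j] + cost)    # substitution or match
    }
  }
  d[n + 1, m + 1]
}

# Hypothetical 0-100 similarity score: 100 means identical strings.
# A simplification, NOT fuzzywuzzy's own ratio formula.
similarity_ratio <- function(a, b) {
  len <- max(nchar(a), nchar(b))
  if (len == 0) return(100)
  round(100 * (1 - levenshtein(a, b) / len))
}

levenshtein("kitten", "sitting")          # 3
utils::adist("kitten", "sitting")[1, 1]   # 3, base R agrees
similarity_ratio("kitten", "sitting")     # 57
```

The quadratic table is fine for short strings such as names or titles; for large batches, `utils::adist()` or the python-Levenshtein backend will be considerably faster.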


**UPDATE 26-07-2018**: A [Singularity image file](http://mlampros.github.io/2018/07/26/singularity_containers/) is available for anyone who wants to run *fuzzywuzzyR* on Ubuntu Linux (locally or on a cloud instance) with all package requirements pre-installed, so the package can be used without going through the installation process.


### **System Requirements**


* Python (>= 2.4)

* difflib (part of the Python standard library)

* fuzzywuzzy (>= 0.15.0)

* [python-Levenshtein](https://github.com/ztane/python-Levenshtein/) (>= 0.12.0; optional, provides a 4-10x speedup in string matching, though results may differ in certain cases)


Before installing any Python modules, check which Python configuration R is using:


```R
reticulate::py_config()

```

All modules should be installed in the default Python configuration (the one that the R session reports as default); otherwise, errors will occur during package installation.
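A quick way to confirm, from within R, that the required modules can be imported by the Python configuration that *reticulate* has selected is `reticulate::py_module_available()`. The wrapper `check_python_modules()` below is a hypothetical helper, not part of fuzzywuzzyR; note that python-Levenshtein is imported under the module name `Levenshtein`.

```r
# Hypothetical helper (not part of fuzzywuzzyR): report whether each Python
# module can be imported by the interpreter that reticulate uses.
check_python_modules <- function(modules = c("fuzzywuzzy", "Levenshtein")) {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    stop("the 'reticulate' package is required: install.packages('reticulate')")
  }
  # python-Levenshtein is imported as 'Levenshtein'; difflib ships with Python.
  # vapply() uses the module names as names of the result.
  vapply(modules, reticulate::py_module_available, logical(1))
}

# Returns a named logical vector with one entry per module:
# check_python_modules()
```

Any `FALSE` entry points to a module that is missing from the default configuration shown by `reticulate::py_config()`.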


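#### **Debian/Ubuntu/Fedora**

The commands below use `apt-get`, which is available on Debian and Ubuntu; on Fedora, use `dnf` instead (package names may differ).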


**Python 2**

```bash
sudo apt-get install python-pip
sudo pip install --upgrade pip
pip install fuzzywuzzy
pip install python-Levenshtein
```

**Python 3**

```bash
sudo apt-get install python3-pip
sudo pip3 install --upgrade pip
pip3 install fuzzywuzzy
pip3 install python-Levenshtein
```


#### **macOS**

```bash
sudo easy_install pip
sudo pip install fuzzywuzzy
sudo pip install python-Levenshtein
```

#### **Windows OS**


* Download [get-pip.py](https://bootstrap.pypa.io/get-pip.py) and run it with Python to install pip
* Update the `Path` environment variable (Control Panel >> System and Security >> System >> Advanced system settings >> Environment variables >> System variables >> Path >> Edit) by adding the Python folders, for instance for Python 2.7:
```text
C:\Python27;C:\Python27\Scripts
```

* Install the [Build Tools for Visual Studio](https://visualstudio.microsoft.com/downloads/#build-tools-for-visual-studio-2017)
* Open the *Command Prompt* and run the following commands:
```bat
pip install fuzzywuzzy
pip install python-Levenshtein
```
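R inherits `Path` when it starts, so after editing it you need a fresh R session (and a fresh RStudio, if used). One way to check from R that the executables are now visible, as a sketch, is base R's `Sys.which()`, which returns the full path of each command it finds and an empty string for any it does not. `find_missing_commands()` is a hypothetical helper.

```r
# Hypothetical helper: return the commands that Sys.which() cannot find on PATH.
find_missing_commands <- function(commands = c("python", "pip")) {
  paths <- Sys.which(commands)   # named character vector, "" when not found
  names(paths)[paths == ""]
}

missing <- find_missing_commands()
if (length(missing) > 0) {
  message("Not found on PATH: ", paste(missing, collapse = ", "))
}
```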


### **Installation of the fuzzywuzzyR package**


To install the package from CRAN, use:

```R

install.packages('fuzzywuzzyR')

```

To install the latest version from GitHub, use the *install_github* function of the *devtools* package:


```R

devtools::install_github(repo = 'mlampros/fuzzywuzzyR')

```
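Once the package and its Python modules are installed, a first session might look like the sketch below. The names used here (`check_availability()`, `FuzzMatcher$new()` with its `Ratio()` and `Token_sort_ratio()` methods, and `GetCloseMatches()`) follow the package vignette as we understand it; treat the exact arguments as assumptions and confirm them with `?FuzzMatcher` and `?GetCloseMatches` in the installed version. The wrapper `fuzzy_demo()` and the example strings are hypothetical, and the function returns `NULL` when the package or its Python modules are unavailable.

```r
# Hypothetical demo wrapper; method names and arguments are assumed from the vignette.
fuzzy_demo <- function() {
  if (!requireNamespace("fuzzywuzzyR", quietly = TRUE) ||
      !isTRUE(fuzzywuzzyR::check_availability())) {
    message("fuzzywuzzyR or its Python modules are not available")
    return(invisible(NULL))
  }
  s1 <- "New York Jets"
  s2 <- "jets new york"
  choices <- c("Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys")

  matcher <- fuzzywuzzyR::FuzzMatcher$new()
  list(
    # plain similarity score (0-100), sensitive to word order and case
    ratio = matcher$Ratio(string1 = s1, string2 = s2),
    # tokens are sorted before comparison, so word order matters less
    token_sort = matcher$Token_sort_ratio(string1 = s1, string2 = s2,
                                          force_ascii = TRUE, full_process = TRUE),
    # up to n candidates whose difflib similarity is at least `cutoff`
    close = fuzzywuzzyR::GetCloseMatches(string = "New York Jet",
                                         sequence_strings = choices,
                                         n = 2L, cutoff = 0.6)
  )
}

fuzzy_demo()
```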


Please report bugs and issues at:


[https://github.com/mlampros/fuzzywuzzyR/issues](https://github.com/mlampros/fuzzywuzzyR/issues)


### **Citation:**

If you use the code of this repository in your paper or research, please cite both **fuzzywuzzyR** and the **original software** ([fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy)). The fuzzywuzzyR citation is also available at [https://CRAN.R-project.org/package=fuzzywuzzyR/citation.html](https://CRAN.R-project.org/package=fuzzywuzzyR/citation.html):


```bibtex
@Manual{,
title = {{fuzzywuzzyR}: Fuzzy String Matching in R},
author = {Lampros Mouselimis},
year = {2021},
note = {R package version 1.0.5},
url = {https://CRAN.R-project.org/package=fuzzywuzzyR},
}
```
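The BibTeX entry can also be regenerated from an installed copy with base R's `utils::citation()` and `utils::toBibtex()`. The wrapper `bibtex_entry()` below is a hypothetical convenience; its output reflects whichever version is installed, which may differ from the 1.0.5 entry shown above.

```r
# Hypothetical convenience wrapper around utils::citation() and utils::toBibtex()
bibtex_entry <- function(pkg = "fuzzywuzzyR") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("package '", pkg, "' is not installed")
  }
  utils::toBibtex(utils::citation(pkg))
}

# bibtex_entry()            # entry for the installed fuzzywuzzyR
# bibtex_entry("utils")     # works for any installed package
```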