https://github.com/ColinFay/tidystringdist
String distance calculation the tidy way.
- Host: GitHub
- URL: https://github.com/ColinFay/tidystringdist
- Owner: ColinFay
- License: other
- Created: 2017-09-09T20:40:57.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2019-03-20T20:36:57.000Z (over 5 years ago)
- Last Synced: 2024-05-21T02:13:03.079Z (6 months ago)
- Language: R
- Size: 72.3 KB
- Stars: 40
- Watchers: 6
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - ColinFay/tidystringdist - String distance calculation the tidy way. (R)
README
---
output:
  md_document:
    variant: markdown_github
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

[![Coverage Status](https://img.shields.io/codecov/c/github/ColinFay/tidystringdist/master.svg)](https://codecov.io/github/ColinFay/tidystringdist?branch=master)
[![Travis-CI Build Status](https://travis-ci.org/ColinFay/tidystringdist.svg?branch=master)](https://travis-ci.org/ColinFay/tidystringdist)
# tidystringdist
Compute string distance the tidy way. Built on top of the `stringdist` package.
## Install tidystringdist
Install the development version from GitHub:
```{r eval = FALSE}
devtools::install_github("ColinFay/tidystringdist")
```

The stable version is available from CRAN:
```{r eval = FALSE}
install.packages("tidystringdist")
```

## tidystringdist basic workflow

### tidy_comb
First, create a tibble with the combinations of words you want to compare. You can do this with the `tidy_comb` and `tidy_comb_all` functions. The first takes a base word and combines it with each element of a list or of a data.frame column; the second builds all possible pairs from a list or a column.
If you already have a data.frame with two columns containing the strings to compare, you can skip this part.
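For readers who want to build such a two-column data.frame by hand, here is a minimal sketch in base R. It only illustrates the idea of "all possible pairs"; it is not taken from the package, and the column names `V1` and `V2` are chosen here for illustration.

```r
words <- c("A", "B", "C")

# combn() returns one combination per column, so transpose to get one pair per row
pair_matrix <- t(combn(words, 2))

# Two character columns, one row per pair of strings to compare
pairs <- data.frame(
  V1 = pair_matrix[, 1],
  V2 = pair_matrix[, 2],
  stringsAsFactors = FALSE
)

pairs
#>   V1 V2
#> 1  A  B
#> 2  A  C
#> 3  B  C
```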
```{r}
library(tidystringdist)

tidy_comb_all(LETTERS[1:3])
```

```{r}
tidy_comb_all(iris, Species)
```

```{r}
tidy_comb("Paris", state.name[1:3])
```

### tidy_stringdist

Once you have this data.frame, you can use `tidy_stringdist()` to compute string distances. This function takes a data.frame, the two columns containing the strings, and one or more stringdist methods.
Note that if you've used the `tidy_comb` function to create your data.frame, you won't need to set the column names.
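To make "one or more stringdist methods" concrete, the sketch below calls the underlying `stringdist` package directly on a hand-built two-column data.frame and adds one distance column per method. This is an illustration of the idea, not the package's actual implementation; the data, the column names `V1` and `V2`, and the loop are assumptions made for the example.

```r
library(stringdist)

# Hand-built pairs of strings to compare (illustrative data)
pairs <- data.frame(
  V1 = c("kitten", "ab", "abc"),
  V2 = c("sitting", "ba", "abc"),
  stringsAsFactors = FALSE
)

# Add one column per method; stringdist() compares V1 and V2 element-wise
for (m in c("osa", "lv")) {
  pairs[[m]] <- stringdist(pairs$V1, pairs$V2, method = m)
}

pairs
# "ab" vs "ba": osa counts the swap of adjacent characters as 1 edit,
# while lv (Levenshtein) needs 2 substitutions
```

Each method ends up as its own numeric column, which is what makes the result easy to filter, arrange, or summarise afterwards.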
```{r example, warning = FALSE, error = FALSE, message = FALSE}
library(dplyr)
data(starwars)
tidy_comb_sw <- tidy_comb_all(starwars, name)
tidy_stringdist(tidy_comb_sw)
```

By default, all methods are computed. Use the `method` argument to select specific ones:
```{r}
tidy_stringdist(tidy_comb_sw, method = c("osa","jw"))
```

## Tidyverse workflow
The goal is to provide a convenient interface to work with other tools from the tidyverse.
```{r}
tidy_stringdist(tidy_comb_sw, method = "osa") %>%
  filter(osa > 20) %>%
  arrange(desc(osa))
```

```{r}
starwars %>%
  filter(species == "Droid") %>%
  tidy_comb_all(name) %>%
  tidy_stringdist() %>%
  summarise_if(is.numeric, mean)
```

### Contact

Questions and feedback are [welcome](mailto:[email protected])!