https://github.com/ColinFay/tidystringdist

String distance calculation the tidy way.

Host: GitHub
URL: https://github.com/ColinFay/tidystringdist
Owner: ColinFay
License: other
Created: 2017-09-09T20:40:57.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2019-03-20T20:36:57.000Z (over 6 years ago)
Last Synced: 2024-08-06T03:04:31.976Z (11 months ago)
Language: R
Size: 72.3 KB
Stars: 40
Watchers: 6
Forks: 3
Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE


---
output:
  md_document:
    variant: markdown_github
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

[![Coverage Status](https://img.shields.io/codecov/c/github/ColinFay/tidystringdist/master.svg)](https://codecov.io/github/ColinFay/tidystringdist?branch=master)

[![Travis-CI Build Status](https://travis-ci.org/ColinFay/tidystringdist.svg?branch=master)](https://travis-ci.org/ColinFay/tidystringdist)

# tidystringdist

Compute string distance the tidy way. Built on top of the `stringdist` package.

## Install tidystringdist

You can install the development version from GitHub:

```{r eval = FALSE}

devtools::install_github("ColinFay/tidystringdist")

```

The stable version is available with:

```{r eval = FALSE}

install.packages("tidystringdist")

```

## tidystringdist basic workflow

### tidy_comb

First, create a tibble with the combinations of words you want to compare, using the `tidy_comb()` and `tidy_comb_all()` functions. `tidy_comb()` takes a base word and combines it with each element of a list or of a data.frame column; `tidy_comb_all()` builds all the possible pairs from a list or a column.

If you already have a data.frame with two columns containing the strings to compare, you can skip this part. 

```{r}

library(tidystringdist)

tidy_comb_all(LETTERS[1:3])

```

```{r}

tidy_comb_all(iris, Species)

```

```{r}

tidy_comb("Paris", state.name[1:3])

```
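To make "all the possible pairs" concrete, here is a minimal base R sketch of the same idea. It is an illustration only, not the package's implementation: it assumes unordered pairs with no self-pairs, and the column names `V1` and `V2` are chosen here for illustration.

```{r}
# Base R sketch of "all possible pairs": every unordered pair, no self-pairs.
words <- LETTERS[1:3]

# combn() returns a 2 x n matrix with one combination per column;
# transpose it so that each row holds one pair.
pairs <- t(combn(words, 2))

# V1 and V2 are illustrative column names (an assumption, see above).
all_pairs <- data.frame(
  V1 = pairs[, 1],
  V2 = pairs[, 2],
  stringsAsFactors = FALSE
)

all_pairs
```

Three words give three pairs (A-B, A-C, B-C); in general, `n` words give `n * (n - 1) / 2` rows, so the table grows quickly with long columns.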

### tidy_stringdist

Once you've got this data.frame, you can use `tidy_stringdist()` to compute string distance. This function takes a data.frame, the two columns containing the strings, and one or more stringdist methods.

Note that if you've used the `tidy_comb` function to create your data.frame, you won't need to set the column names. 
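Conversely, when the data.frame comes from somewhere else, the two columns have to be named in the call. The sketch below assumes they can be passed unquoted and by position, right after the data.frame, in the same style as `tidy_comb_all(iris, Species)`. The data and the column names `typed` and `expected` are hypothetical.

```{r}
library(tidystringdist)

# Hypothetical data: a two-column data.frame of strings to compare.
to_compare <- data.frame(
  typed = c("kitten", "flaw"),
  expected = c("sitting", "lawn"),
  stringsAsFactors = FALSE
)

# Data frame first, then the two string columns (assumed to be accepted
# unquoted and by position), then the method(s).
res <- tidy_stringdist(to_compare, typed, expected, method = "osa")
res
```

With a data.frame produced by `tidy_comb_all()`, the column arguments can be dropped: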

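```{r example, warning = FALSE, error = FALSE, message = FALSE}
library(dplyr)
# The starwars dataset ships with dplyr
data(starwars)
# Build all the pairs of character names
tidy_comb_sw <- tidy_comb_all(starwars, name)
# Compute the distances with every available method
tidy_stringdist(tidy_comb_sw)
```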

By default, all the methods are computed. Use the `method` argument to select specific ones:

```{r}

tidy_stringdist(tidy_comb_sw, method = c("osa","jw"))

```
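Since the package is built on top of `stringdist`, the method names are presumably the ones `stringdist::stringdist()` accepts. As a rough sketch of what two of them measure, here they are called directly on the underlying package, with example strings chosen for illustration:

```{r}
library(stringdist)

# "osa" (optimal string alignment) counts edits: substitute k -> s,
# substitute e -> i, insert g.
stringdist("kitten", "sitting", method = "osa")
#> [1] 3

# "jw" is the Jaro distance by default (Winkler prefix factor p = 0);
# it is 0 for identical strings and at most 1.
stringdist("kitten", "sitting", method = "jw")
```

The two methods live on different scales (an edit count versus a value between 0 and 1), which is worth keeping in mind when filtering on several distance columns at once.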

## Tidyverse workflow

The goal is to provide a convenient interface to work with other tools from the tidyverse. 

```{r}
tidy_stringdist(tidy_comb_sw, method = "osa") %>%
  filter(osa > 20) %>%
  arrange(desc(osa))
```

```{r}
starwars %>%
  filter(species == "Droid") %>%
  tidy_comb_all(name) %>%
  tidy_stringdist() %>%
  summarise_if(is.numeric, mean)
```

### Contact

Questions and feedback are [welcome](mailto:[email protected])!
