---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# controller

[![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip)
[![R-CMD-check](https://github.com/joeroe/controller/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/joeroe/controller/actions/workflows/R-CMD-check.yaml)
[![Test coverage](https://codecov.io/gh/joeroe/controller/graph/badge.svg)](https://app.codecov.io/gh/joeroe/controller)

**controller** is a collection of functions for working with controlled vocabularies in R.
It introduces the `control()` verb, which recodes values in a vector using a lookup table of preferred and variant terms (a *thesaurus*).

## Installation

You can install the development version of controller from GitHub using the [remotes](https://remotes.r-lib.org/) package:

``` r
remotes::install_github("joeroe/controller")
```

## Example

A common data-tidying problem is standardising variant terms for the same concept.
Imagine we have a dataset that uses a number of different names for shades of the same colour.
As data analysts, we naturally want to recode the data to eliminate this messy creativity, for example using [dplyr::recode()](https://dplyr.tidyverse.org/reference/recode.html):

```{r eg-dplyr}
library(dplyr, warn.conflicts = FALSE)
shades <- c("daffodil", "purple", "magenta", "azure", "navy", "violet")

recode(shades,
       daffodil = "yellow",
       purple = "purple",
       magenta = "pink",
       azure = "blue",
       navy = "blue",
       violet = "purple")
```

But recoding this way is tedious, especially when there are many terms, because every variant has to be written out in the call itself.
With `control()`, we can instead use a data frame containing a thesaurus to replace the values:

```{r eg-controller}
library(controller)
data("colour_thesaurus")

control(shades, colour_thesaurus)
```
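
To illustrate the idea of recoding from a lookup table, here is a minimal base R sketch. It is not the package's implementation: the `toy_thesaurus` data frame, its `variant` and `preferred` column names, and the `lookup_recode()` helper are all hypothetical, and keeping unmatched values unchanged is an assumption made for this example.

```r
# Hypothetical thesaurus: one row per variant term and its preferred term.
# The column names are assumptions, not the package's documented format.
toy_thesaurus <- data.frame(
  variant   = c("daffodil", "magenta", "azure", "navy", "violet"),
  preferred = c("yellow", "pink", "blue", "blue", "purple"),
  stringsAsFactors = FALSE
)

# Look each value up among the variants and return the preferred term.
# Values with no matching variant are kept unchanged (an assumption).
lookup_recode <- function(x, thesaurus) {
  idx <- match(x, thesaurus$variant)
  ifelse(is.na(idx), x, thesaurus$preferred[idx])
}

shades <- c("daffodil", "purple", "magenta", "azure", "navy", "violet")
recoded <- lookup_recode(shades, toy_thesaurus)
print(recoded)
```

Because the mapping lives in a data frame rather than in function arguments, it can be stored, shared, and extended separately from the code that applies it.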

`control()` also supports fuzzy matching, so variants that differ only in common, predictable ways do not have to be listed exhaustively in the thesaurus.
For example, `control_ci()` performs a case-insensitive match against the thesaurus:

```{r eg-ci}
shades <- toupper(shades)
control_ci(shades, colour_thesaurus)
```
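
One way to picture case-insensitive matching is to normalise both the input and the thesaurus variants to the same case before looking them up. The base R sketch below shows that technique only. It makes no claim about how `control_ci()` works internally, and the `toy_thesaurus` data frame, its column names, and `lookup_recode_ci()` are hypothetical.

```r
# Hypothetical thesaurus with lower-case variants (column names are assumed).
toy_thesaurus <- data.frame(
  variant   = c("daffodil", "purple", "magenta", "azure", "navy", "violet"),
  preferred = c("yellow", "purple", "pink", "blue", "blue", "purple"),
  stringsAsFactors = FALSE
)

# Fold both sides to lower case before matching, so "NAVY" finds "navy".
# Values with no matching variant are kept unchanged (an assumption).
lookup_recode_ci <- function(x, thesaurus) {
  idx <- match(tolower(x), tolower(thesaurus$variant))
  ifelse(is.na(idx), x, thesaurus$preferred[idx])
}

shades_upper <- toupper(c("daffodil", "purple", "magenta", "azure", "navy", "violet"))
recoded_ci <- lookup_recode_ci(shades_upper, toy_thesaurus)
print(recoded_ci)
```

Normalising at lookup time keeps the thesaurus itself small: a single lower-case entry covers every capitalisation of the same term.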