https://github.com/nrennie/messy

R package to make a data frame messy and untidy.
[![R-CMD-check](https://github.com/nrennie/messy/workflows/R-CMD-check/badge.svg)](https://github.com/nrennie/messy/actions)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/messy)](https://cran.r-project.org/package=messy)

# messy

When teaching with R examples, instructors often use *nice* datasets, but these aren't very realistic and aren't what students will later encounter in the real world. Real datasets have typos, missing values encoded in strange ways, and stray whitespace. The {messy} R package takes a *clean* dataset and randomly adds these problems in, giving students the opportunity to practice their data cleaning and wrangling skills without you having to change all of your examples.

## Installation

Install from CRAN using:

```r
install.packages("messy")
```

Install the development version from GitHub (this requires the {remotes} package) using:

```r
remotes::install_github("nrennie/messy")
```

## Usage

For more in-depth usage instructions, see the package documentation at [nrennie.rbind.io/messy](https://nrennie.rbind.io/messy/) which has examples of each function.

The simplest way to use the {messy} package is to apply the `messy()` function:

```r
library(messy)

# Fix the random seed so the messy output is reproducible
set.seed(1234)
messy(ToothGrowth[1:10, ])
```

```r
    len supp dose
1   4.2   VC  0.5
2  11.5
3   7.3   VC  0.5
4   5.8  (VC  0.5
5   6.4   VC
6    10   VC  0.5
7  11.2       0.5
8  11.2   VC  0.5
9   5.2   VC  0.5
10    7   VC  0.5
```
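The core idea behind this kind of messiness is to pick a random subset of cells and overwrite them. The sketch below illustrates that idea in base R using a hypothetical helper, `inject_missing()`, and a hypothetical `prop` argument for the share of rows to change. It is an assumption-based illustration only, not {messy}'s actual implementation or API.

```r
# Hypothetical helper (NOT part of {messy}): overwrite a random proportion
# of values in the chosen columns with a missing-value code.
inject_missing <- function(data, cols, prop = 0.1, code = NA) {
  stopifnot(is.data.frame(data), all(cols %in% names(data)))
  n <- nrow(data)
  for (col in cols) {
    # Draw round(prop * n) distinct row indices (sampling without replacement)
    idx <- sample.int(n, size = round(prop * n))
    data[[col]][idx] <- code
  }
  data
}

set.seed(1234)
df <- inject_missing(ToothGrowth[1:10, ], cols = c("len", "dose"), prop = 0.2)
colSums(is.na(df))
```

Because `sample.int()` samples without replacement, exactly `round(prop * n)` distinct rows are changed in each column, so the amount of mess is predictable even though its location is random. Passing a string such as `" "` as `code` would coerce a numeric column to character, which is itself a realistic source of messiness.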

You can vary the amount of *messiness* for each function, and chain together different functions to create customised messy data:

```r
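set.seed(1234)
ToothGrowth[1:10, ] |>
  # randomly replace some supp values with a single space
  make_missing(cols = "supp", missing = " ") |>
  # randomly replace some len and dose values with NA or 999
  make_missing(cols = c("len", "dose"), missing = c(NA, 999)) |>
  # add stray whitespace to supp; messiness controls how much
  add_whitespace(cols = "supp", messiness = 0.5) |>
  # insert random special characters into supp
  add_special_chars(cols = "supp")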
```

```r
    len supp dose
1   4.2   VC  0.5
2  11.5   VC   NA
3   7.3   VC  0.5
4   5.8  *VC  0.5
5   6.4   VC  0.5
6  10.0   VC  0.5
7  11.2       0.5
8  11.2  V#C   NA
9   5.2  !VC  0.5
10  7.0  VC*  0.5
```
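Students can then be asked to undo the mess. The base-R sketch below shows one possible clean-up using a hypothetical `clean_up()` function. It works on a small hand-made data frame that imitates this kind of output (the values are illustrative, not the exact result above) and assumes that `" "` and `999` are the only missing-value codes and that valid `supp` values contain only letters.

```r
# Small hand-made example imitating messy output (illustrative values only)
messy_df <- data.frame(
  len  = c(4.2, 11.5, 999, 5.8),
  supp = c("VC", " VC ", " ", "V#C"),
  dose = c(0.5, NA, 0.5, 999),
  stringsAsFactors = FALSE
)

# Hypothetical clean-up function (not part of {messy})
clean_up <- function(data) {
  # Character columns: trim whitespace, then strip any non-letter characters
  chr <- vapply(data, is.character, logical(1))
  data[chr] <- lapply(data[chr], function(x) {
    x <- gsub("[^A-Za-z]", "", trimws(x))
    # Values left empty (e.g. the " " missing code) become NA
    x[which(x == "")] <- NA
    x
  })
  # Numeric columns: treat the sentinel value 999 as missing
  num <- vapply(data, is.numeric, logical(1))
  data[num] <- lapply(data[num], function(x) replace(x, which(x == 999), NA))
  data
}

clean_up(messy_df)
```

Using `which()` when subsetting avoids `NA` entries in the logical index. The letters-only rule is deliberately strict: it suits `supp` here, but columns whose valid values contain digits or punctuation would need a different pattern.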