https://github.com/nrennie/messy
R package to make a data frame messy and untidy.
r r-package teaching
- Host: GitHub
- URL: https://github.com/nrennie/messy
- Owner: nrennie
- License: cc-by-4.0
- Created: 2023-08-31T14:05:16.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-12-03T23:03:54.000Z (over 1 year ago)
- Last Synced: 2025-04-04T04:12:04.429Z (about 1 year ago)
- Topics: r, r-package, teaching
- Language: R
- Homepage: https://nrennie.rbind.io/messy/
- Size: 1.61 MB
- Stars: 141
- Watchers: 5
- Forks: 9
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: NEWS.md
- License: LICENSE.md
README
# messy 
When teaching examples using R, instructors often use *nice* datasets - but these aren't very realistic, and aren't what students will later encounter in the real world. Real datasets have typos, missing values encoded in strange ways, and weird spaces. The {messy} R package takes a *clean* dataset and randomly adds these things in - giving students the opportunity to practice their data cleaning and wrangling skills without you having to change all of your examples.
## Installation
Install from CRAN using:
```r
install.packages("messy")
```
Install the development version from GitHub using:
```r
remotes::install_github("nrennie/messy")
```
## Usage
For more in-depth usage instructions, see the package documentation at [nrennie.rbind.io/messy](https://nrennie.rbind.io/messy/), which has examples of each function.
The simplest way to use the {messy} package is to apply the `messy()` function:
```r
library(messy)

# Fix the seed so the randomly added mess is reproducible
set.seed(1234)
messy(ToothGrowth[1:10, ])
```
```r
len supp dose
1 4.2 VC 0.5
2 11.5
3 7.3 VC 0.5
4 5.8 (VC 0.5
5 6.4 VC
6 10 VC 0.5
7 11.2 0.5
8 11.2 VC 0.5
9 5.2 VC 0.5
10 7 VC 0.5
```
You can vary the amount of *messiness* for each function, and chain together different functions to create customised messy data:
```r
set.seed(1234)
ToothGrowth[1:10, ] |>
  make_missing(cols = "supp", missing = " ") |>
  make_missing(cols = c("len", "dose"), missing = c(NA, 999)) |>
  add_whitespace(cols = "supp", messiness = 0.5) |>
  add_special_chars(cols = "supp")
```
```r
len supp dose
1 4.2 VC 0.5
2 11.5 VC NA
3 7.3 VC 0.5
4 5.8 *VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
7 11.2 0.5
8 11.2 V#C NA
9 5.2 !VC 0.5
10 7.0 VC* 0.5
```
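To give a feel for the cleaning exercise this sets up for students, here is a minimal base R sketch that undoes the kinds of mess shown above. It is not part of {messy}: the `messy_teeth` data frame is a hand-typed stand-in rather than real package output, and `clean_teeth()` and its `sentinel` argument are hypothetical names used only for illustration.

```r
# Hand-typed stand-in for messy output; the values are illustrative and were
# not produced by {messy} itself.
messy_teeth <- data.frame(
  len  = c(4.2, 11.5, 999, 5.8),
  supp = c("VC ", "V#C", " ", "!VC"),
  dose = c(0.5, NA, 0.5, 999),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip stray characters and recode placeholder values.
clean_teeth <- function(df, sentinel = 999) {
  # Keep only letters and digits; this removes padding spaces as well as
  # special characters such as "#" or "!"
  df$supp <- gsub("[^[:alnum:]]", "", df$supp)
  # Strings that are now empty were blank placeholders for missing values
  df$supp[df$supp == ""] <- NA
  # Recode the numeric sentinel (assumed here to be 999) as NA
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(
    df[num_cols],
    function(x) replace(x, x %in% sentinel, NA)
  )
  df
}

cleaned <- clean_teeth(messy_teeth)
print(cleaned)
```

Real cleaning code will depend on which functions and `missing` values were used to create the mess, so treat this as one possible model answer rather than a general recipe.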