
[![R-CMD-check](https://github.com/jvcasillas/untidydata/workflows/R-CMD-check/badge.svg)](https://github.com/jvcasillas/untidydata/actions)
[![CodeFactor](https://www.codefactor.io/repository/github/jvcasillas/untidydata/badge)](https://www.codefactor.io/repository/github/jvcasillas/untidydata)

## untidydata

An R package of untidy datasets made for the purpose of teaching the
tidyverse.

Last update: 2021-01-27

### Overview

This package keeps the untidy datasets I have been creating for teaching
under version control. The datasets vary in difficulty and illustrate
different problems that commonly come up when tidying data.

### Installation

You can install the development version from GitHub with:

    install.packages("devtools")
    devtools::install_github("jvcasillas/untidydata")
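
Once the package is installed, a quick way to see what it ships is to list its datasets and inspect one of them. The snippet below is a minimal sketch using only base R; it is wrapped in a `requireNamespace()` check so it does nothing when the package is not installed, and the choice of `pre_post` is arbitrary.

```r
# Only runs if the package has been installed with the commands above
if (requireNamespace("untidydata", quietly = TRUE)) {
  # Names of the datasets shipped with the package
  available <- data(package = "untidydata")$results[, "Item"]
  print(available)

  # Load one dataset into the workspace and inspect its structure
  data("pre_post", package = "untidydata")
  str(pre_post)
}
```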

### Datasets

- [language\_diversity](#language_diversity)
- [pre\_post](#pre_post)
- [spanish\_vowels](#spanish_vowels)
- [spirantization](#spirantization)
- [vot](#vot)

#### `language_diversity`

- Difficulty: easy
- A long format dataset that is most useful in wide format.
- Data taken from Appendix 1 in:
Nettle, D. (1998). Explaining Global Patterns of Language Diversity.
*Journal of Anthropological Archaeology*, 17, 354–374.
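
As a rough sketch of the long-to-wide reshaping this dataset calls for, the example below pivots a small hand-made table with `tidyr::pivot_wider()`. The toy data and its column names (`country`, `measurement`, `value`) are assumptions for illustration and are not taken from the package; check `names(language_diversity)` for the real ones.

```r
library(tidyr)

# Toy long-format data; names and values are made up for illustration
# and are not copied from language_diversity.
diversity_long <- data.frame(
  country     = c("A", "A", "B", "B"),
  measurement = c("langs", "area", "langs", "area"),
  value       = c(10, 200, 25, 50)
)

# One row per country, one column per measurement
diversity_wide <- pivot_wider(
  diversity_long,
  names_from  = measurement,
  values_from = value
)

diversity_wide
```

The same call should carry over to the real data once `names_from` and `values_from` point at the columns that hold the measurement labels and their values.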

#### `pre_post`

- Difficulty: easy
- A typical pre-test, post-test dataset in wide format.
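
Wide pre-test/post-test data is usually tidied by gathering the test columns into a single column. The sketch below does this with `tidyr::pivot_longer()` on a hand-made table; the column names (`id`, `pre`, `post`) and the scores are assumptions for illustration, not the actual contents of `pre_post`.

```r
library(tidyr)

# Toy wide-format data; names and scores are made up for illustration
# and are not copied from pre_post.
scores_wide <- data.frame(
  id   = c("p1", "p2", "p3"),
  pre  = c(10, 12, 9),
  post = c(14, 15, 13)
)

# One row per participant per test
scores_long <- pivot_longer(
  scores_wide,
  cols      = c(pre, post),
  names_to  = "test",
  values_to = "score"
)

scores_long
```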

#### `spanish_vowels`

- Difficulty: easy
- Simulated Spanish vowel formant measurements from male and female
speakers.

#### `spirantization`

- Difficulty: easy
- Simulated intensity measurements of CV sequences in word-initial and
word-medial position from L2 learners and native speakers.

#### `vot`

- Difficulty: medium
- A voice-onset time dataset. Includes coronal stop data from English
and Spanish monolinguals, as well as English/Spanish bilinguals.