https://github.com/tidyverse/tidyr

Tidy Messy Data
https://github.com/tidyverse/tidyr

r tidy-data

Last synced: 25 days ago
JSON representation

Tidy Messy Data

Host: GitHub
URL: https://github.com/tidyverse/tidyr
Owner: tidyverse
License: other
Created: 2014-06-10T14:24:33.000Z (almost 11 years ago)
Default Branch: main
Last Pushed: 2025-03-03T19:50:08.000Z (3 months ago)
Last Synced: 2025-05-03T02:13:39.341Z (about 1 month ago)
Topics: r, tidy-data
Language: R
Homepage: https://tidyr.tidyverse.org/
Size: 19.7 MB
Stars: 1,394
Watchers: 70
Forks: 415
Open Issues: 55
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Support: .github/SUPPORT.md

Awesome Lists containing this project

awesome-starred - tidyr - Easily tidy data with spread and gather functions. (R)
Road2R - tidyr - Easily Tidy Data with 'spread()' and 'gather()' Functions. (Table of Contents / Data manipulation)
jimsghstars - tidyverse/tidyr - Tidy Messy Data (R)

README

        ---

output: github_document

---

```{r, echo = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "README-"

)

```

# tidyr 

[![CRAN status](https://www.r-pkg.org/badges/version/tidyr)](https://cran.r-project.org/package=tidyr)

[![R-CMD-check](https://github.com/tidyverse/tidyr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/tidyr/actions/workflows/R-CMD-check.yaml)

[![Codecov test coverage](https://codecov.io/gh/tidyverse/tidyr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/tidyr?branch=main)

  

## Overview

The goal of tidyr is to help you create __tidy data__. Tidy data is data where:

1. Each variable is a column; each column is a variable.

1. Each observation is a row; each row is an observation.

1. Each value is a cell; each cell is a single value.

Tidy data describes a standard way of storing data that is used wherever possible throughout the [tidyverse](https://www.tidyverse.org/). If you ensure that your data is tidy, you'll spend less time fighting with the tools and more time working on your analysis. Learn more about tidy data in `vignette("tidy-data")`.

## Installation

```{r, eval = FALSE}

# The easiest way to get tidyr is to install the whole tidyverse:

install.packages("tidyverse")

# Alternatively, install just tidyr:

install.packages("tidyr")

# Or the development version from GitHub:

# install.packages("pak")

pak::pak("tidyverse/tidyr")

```

## Cheatsheet

  

## Getting started

```{r}

library(tidyr)

```

tidyr functions fall into five main categories:

* "Pivoting" which converts between long and wide forms. tidyr 1.0.0 

  introduces `pivot_longer()` and `pivot_wider()`, replacing the older 

  `spread()` and `gather()` functions. See `vignette("pivot")` for more 

  details.

  

* "Rectangling", which turns deeply nested lists (as from JSON) into tidy

  tibbles. See `unnest_longer()`, `unnest_wider()`, `hoist()`, and 

  `vignette("rectangle")` for more details.

  

* Nesting converts grouped data to a form where each group becomes a

  single row containing a nested data frame, and unnesting does the opposite.

  See `nest()`, `unnest()`, and  `vignette("nest")` for more details.

* Splitting and combining character columns. Use `separate_wider_delim()`,

  `separate_wider_position()`, and `separate_wider_regex()` to pull a single

  character column into multiple columns; use `unite()` to combine multiple

  columns into a single character column.

* Make implicit missing values explicit with `complete()`; make explicit 

  missing values implicit with `drop_na()`; replace missing values with

  next/previous value with `fill()`, or a known value with `replace_na()`.

## Related work

tidyr [supersedes](https://lifecycle.r-lib.org/articles/stages.html#superseded) reshape2 (2010-2014) and reshape (2005-2010). Somewhat counterintuitively, each iteration of the package has done less. tidyr is designed specifically for tidying data, not general reshaping (reshape2), or the general aggregation (reshape). 

[data.table](https://rdatatable.gitlab.io/data.table) provides high-performance implementations of `melt()` and `dcast()`

If you'd like to read more about data reshaping from a CS perspective, I'd recommend the following three papers:

* [Wrangler: Interactive visual specification of data transformation scripts](http://vis.stanford.edu/papers/wrangler)

* [An interactive framework for data cleaning](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2000/CSD-00-1110.pdf) (Potter's wheel)

* [On efficiently implementing SchemaSQL on a SQL database system](https://www.vldb.org/conf/1999/P45.pdf)

To guide your reading, here's a translation between the terminology used in different places:

| tidyr 1.0.0       | pivot longer | pivot wider |

|-------------------|--------------|-------------|

| tidyr < 1.0.0     | gather       | spread      |

| reshape(2)        | melt         | cast        |

| spreadsheets      | unpivot      | pivot       | 

| databases         | fold         | unfold      |

## Getting help

If you encounter a clear bug, please file a minimal reproducible example on [github](https://github.com/tidyverse/tidyr/issues). For questions and other discussion, please use [forum.posit.co](https://forum.posit.co/).

---

Please note that the tidyr project is released with a [Contributor Code of Conduct](https://tidyr.tidyverse.org/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tidyverse/tidyr

Awesome Lists containing this project

README