https://github.com/nacnudus/unpivotr

Unpivot complex and irregular data layouts in R

excel pivot-tables r spreadsheet

Host: GitHub
URL: https://github.com/nacnudus/unpivotr
Owner: nacnudus
License: other
Created: 2016-08-22T21:04:24.000Z (almost 9 years ago)
Default Branch: main
Last Pushed: 2024-11-30T21:12:46.000Z (6 months ago)
Last Synced: 2024-12-06T20:12:17.791Z (6 months ago)
Topics: excel, pivot-tables, r, spreadsheet
Language: R
Homepage: https://nacnudus.github.io/unpivotr/
Size: 5.85 MB
Stars: 185
Watchers: 8
Forks: 19
Open Issues: 4
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE

---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/"
)
```

# unpivotr

[![Cran Status](http://www.r-pkg.org/badges/version/unpivotr)](https://CRAN.R-project.org/package=unpivotr)

![Cran Downloads](https://cranlogs.r-pkg.org/badges/unpivotr)

[![codecov](https://codecov.io/github/nacnudus/unpivotr/coverage.svg?branch=master)](https://app.codecov.io/gh/nacnudus/unpivotr)

[![R-CMD-check](https://github.com/nacnudus/unpivotr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/nacnudus/unpivotr/actions/workflows/R-CMD-check.yaml)

[unpivotr](https://github.com/nacnudus/unpivotr) deals with non-tabular data,
especially from spreadsheets.  Use unpivotr when your source data has any of
these 'features':

* Multi-headered hydra

* Meaningful formatting

* Headers anywhere but at the top of each column

* Non-text headers e.g. dates

* Other stuff around the table

* Several similar tables in one sheet

* Sentinel values

* Superscript symbols

* Meaningful comments

* Nested HTML tables

If that list makes your blood boil, you'll enjoy the function names.

* `behead()` deals with multi-headered hydra tables one layer of headers at a
  time, working from the edge of the table inwards.  It's a bit like using
  `header = TRUE` in `read.csv()`, but because it's a function, you can apply it
  to as many layers of headers as you need.  You end up with all the headers in
  columns.
* `spatter()` is like `tidyr::spread()` but preserves mixed data types.  You get
  into a mixed-data-type situation by delaying type coercion until *after* the
  table is tidy (rather than before, like `read.csv()` et al.).  And yes, it
  usually follows `behead()`.
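
To make the mixed-type point concrete, here is a minimal sketch using a made-up data frame. The data frame `x`, its values, and the `header` column name are illustrative assumptions, not part of the package. Each cell keeps its own data type until `spatter()` spreads the cells back into columns.

```r
library(unpivotr)
library(dplyr)

# Hypothetical toy data: one character column and one integer column.
x <- data.frame(
  name = c("Matilda", "Nicholas", "Olivia", "Paul"),
  score = c(14L, 10L, 15L, 20L),
  stringsAsFactors = FALSE
)

# One row per cell; the column names become the first row of cells.
cells <- as_cells(x, col_names = TRUE)

tidy <- cells %>%
  behead("up", "header") %>% # move the header row into a `header` column
  select(-col) %>%           # drop `col` so the cells of each row line up
  spatter(header) %>%        # one column per header, each keeping its type
  select(-row)

tidy
```

In this sketch `name` should come back as character and `score` as integer, without either passing through a string conversion.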

More positive, corrective functions:

* `justify()` aligns column headers before `behead()`ing, and has deliberate
  moral overtones.
* `enhead()` attaches a header to the body of the data, *a la* Frankenstein.
  The effect is the same as `behead()`, but is more powerful because you can
  choose exactly which header cells you want, paying attention to formatting
  (which `behead()` doesn't understand).
* `isolate_sentinels()` separates meaningful symbols like `"N/A"` or
  `"confidential"` from the rest of the data, giving them some time alone to
  think about what they've done.
* `partition()` takes a sheet with several tables on it, and slashes it into
  pieces that each contain one table.  You can then unpivot each table in turn
  with `purrr::map()` or similar.
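
As a rough sketch of the `enhead()` approach, the example below picks header cells and data cells by hand from a made-up table and then joins them. The data frame `x` and the column names `subject` and `score` are assumptions for illustration only.

```r
library(unpivotr)
library(dplyr)

# Hypothetical toy table: the first row holds subject headers,
# the first column holds pupil names.
x <- data.frame(
  a = c(NA, "Alice", "Bob"),
  b = c("maths", "1", "3"),
  c = c("art", "2", "4"),
  stringsAsFactors = FALSE
)

cells <- as_cells(x)

# Choose the header cells explicitly (here: non-empty cells in row 1).
header_cells <- cells %>%
  filter(row == 1, !is.na(chr)) %>%
  select(row, col, subject = chr)

# Choose the data cells explicitly (below and to the right of the headers).
data_cells <- cells %>%
  filter(row > 1, col > 1) %>%
  select(row, col, score = chr)

# Attach each header to the data cells directly beneath it.
scores <- enhead(data_cells, header_cells, "up")
scores
```

Because the header cells are chosen with an ordinary `dplyr::filter()`, the same pattern can use any property of the cells that the data frame records.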

## Make cells tidy

unpivotr uses data where each cell is represented by one row in a data frame,
like this.

![Gif of tidyxl converting cells into a tidy representation of one row per cell](./vignettes/tidy_xlsx.gif)

What can you do with tidy cells?  The best places to start are:

* [Spreadsheet Munging Strategies](https://nacnudus.github.io/spreadsheet-munging-strategies/),
  a free, online cookbook using [tidyxl](https://github.com/nacnudus/tidyxl/)
  and [unpivotr](https://github.com/nacnudus/unpivotr)

* [Screencasts](https://www.youtube.com/watch?v=1sinC7wsS5U) on YouTube.

* [Worked examples](https://github.com/nacnudus/ukfarm) on GitHub.

Otherwise the basic idea is:

1. Read the data with a specialist tool.
   * For spreadsheets, use [tidyxl](https://nacnudus.github.io/tidyxl/).
   * For plain text files, you might soon be able to use
     [readr](https://readr.tidyverse.org), but for now you'll have to install a
     pull-request on that package with
     `devtools::install_github("tidyverse/readr#760")`.
   * For tables in HTML pages, use `unpivotr::tidy_html()`.
   * For data frames, use `unpivotr::as_cells()` -- this should be a last
     resort, because by the time the data is in a conventional data frame, it
     is often too late -- formatting has been lost, and most data types have
     been coerced to strings.
1. Either `behead()` straight away, or `dplyr::filter()` separately for the
   header cells and the data cells, and then recombine with `enhead()`.
1. `spatter()` so that each column has one data type.

```{r}
library(unpivotr)
library(tidyverse)

x <- purpose$`up-left left-up`
x # A pivot table in a conventional data frame.  Four levels of headers, in two
  # rows and two columns.

y <- as_cells(x) # 'Tokenize' or 'melt' the data frame into one row per cell
y

rectify(y) # useful for reviewing the melted form as though in a spreadsheet

y %>%
  behead("up-left", "sex") %>%                    # Strip headers
  behead("up", "life-satisfaction") %>%           # one
  behead("left-up", "qualification") %>%          # by
  behead("left", "age-band") %>%                  # one.
  select(-row, -col, -data_type, count = chr) %>% # cleanup
  mutate(count = as.integer(count))
```

Note the compass directions in the code above, which hint to `behead()` where to
find the header cell for each data cell.

* `"up-left"` means the header (`Female`, `Male`) is positioned up and to the
  left of the columns of data cells it describes.
* `"up"` means the header (`0 - 6`, `7 - 10`) is positioned directly above the
  columns of data cells it describes.
* `"left-up"` means the header (`Bachelor's degree`, `Certificate`, etc.) is
  positioned to the left and upwards of the rows of data cells it describes.
* `"left"` means the header (`15 - 24`, `25 - 44`, etc.) is positioned directly to
  the left of the rows of data cells it describes.
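
The directions are easier to follow on a table small enough to read at a glance. The sketch below uses a made-up data frame (the values and the names `sex`, `satisfaction`, `age` and `count` are assumptions for illustration) in which `Female` and `Male` each sit above the left-hand column of the pair of columns they describe.

```r
library(unpivotr)
library(dplyr)

# Hypothetical toy pivot table, laid out as it would appear in a sheet:
#
#           Female          Male
#           0 - 6   7 - 10  0 - 6   7 - 10
#  15 - 24  1       2       3       4
#  25 - 44  5       6       7       8
x <- data.frame(
  a = c(NA, NA, "15 - 24", "25 - 44"),
  b = c("Female", "0 - 6", "1", "5"),
  c = c(NA, "7 - 10", "2", "6"),
  d = c("Male", "0 - 6", "3", "7"),
  e = c(NA, "7 - 10", "4", "8"),
  stringsAsFactors = FALSE
)

tidy <- as_cells(x) %>%
  behead("up-left", "sex") %>%    # header is above and to the left
  behead("up", "satisfaction") %>% # header is directly above
  behead("left", "age") %>%       # header is directly to the left
  select(sex, satisfaction, age, count = chr) %>%
  mutate(count = as.integer(count))

tidy
```

The cell holding `4`, for example, has no header directly above it in the first row, so `"up-left"` is what pairs it with `Male`.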

## Installation

```{r, echo = TRUE, eval = FALSE}

# install.packages("devtools") # If you don't already have devtools

devtools::install_github("nacnudus/unpivotr", build_vignettes = TRUE)

```

The version 0.4.0 release had some breaking changes.  See `NEWS.md` for
details.  The previous version can be installed as follows:

```r

devtools::install_version("unpivotr", version = "0.3.1", repos = "http://cran.us.r-project.org")

```

## Similar projects

[unpivotr](https://github.com/nacnudus/unpivotr) is inspired by
[Databaker](https://github.com/sensiblecodeio/databaker), a collaboration
between the [United Kingdom Office for National Statistics](https://www.ons.gov.uk/)
and [The Sensible Code Company](https://sensiblecode.io/).

[jailbreakr](https://github.com/rsheets/jailbreakr) attempts to extract
non-tabular data from spreadsheets into tabular structures automatically via
some clever algorithms.  [unpivotr](https://github.com/nacnudus/unpivotr)
differs by being less magic, and equipping you to express what you want to do.
