{"id":13401165,"url":"https://github.com/nacnudus/unpivotr","last_synced_at":"2025-05-16T05:06:01.702Z","repository":{"id":10599774,"uuid":"66308149","full_name":"nacnudus/unpivotr","owner":"nacnudus","description":"Unpivot complex and irregular data layouts in R","archived":false,"fork":false,"pushed_at":"2024-11-30T21:12:46.000Z","size":6134,"stargazers_count":185,"open_issues_count":4,"forks_count":19,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-12-06T20:12:17.791Z","etag":null,"topics":["excel","pivot-tables","r","spreadsheet"],"latest_commit_sha":null,"homepage":"https://nacnudus.github.io/unpivotr/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nacnudus.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-08-22T21:04:24.000Z","updated_at":"2024-11-30T21:12:50.000Z","dependencies_parsed_at":"2024-01-18T11:04:04.884Z","dependency_job_id":"e08f131a-ed33-4410-bdd7-f85e5ae63f5d","html_url":"https://github.com/nacnudus/unpivotr","commit_stats":{"total_commits":624,"total_committers":11,"mean_commits":56.72727272727273,"dds":0.05608974358974361,"last_synced_commit":"f7eb82b0e5f9ea67357402f7973a49254312d15c"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nacnudus%2Funpivotr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nacnudus%2Funpivotr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nacnudus%2Funpivotr/releases","mani
fests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nacnudus%2Funpivotr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nacnudus","download_url":"https://codeload.github.com/nacnudus/unpivotr/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254471061,"owners_count":22076585,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["excel","pivot-tables","r","spreadsheet"],"created_at":"2024-07-30T19:00:59.409Z","updated_at":"2025-05-16T05:05:56.691Z","avatar_url":"https://github.com/nacnudus.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, echo = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/\"\n)\n```\n\n# unpivotr\n\n\u003c!-- badges: start --\u003e\n[![Cran Status](http://www.r-pkg.org/badges/version/unpivotr)](https://CRAN.R-project.org/package=unpivotr)\n![Cran Downloads](https://cranlogs.r-pkg.org/badges/unpivotr)\n[![codecov](https://codecov.io/github/nacnudus/unpivotr/coverage.svg?branch=master)](https://app.codecov.io/gh/nacnudus/unpivotr)\n[![R-CMD-check](https://github.com/nacnudus/unpivotr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/nacnudus/unpivotr/actions/workflows/R-CMD-check.yaml)\n\u003c!-- badges: end --\u003e\n\n[unpivotr](https://github.com/nacnudus/unpivotr) deals with non-tabular data,\nespecially from spreadsheets.  Use unpivotr when your source data has any of\nthese 'features':\n\n* Multi-headered hydra\n* Meaningful formatting\n* Headers anywhere but at the top of each column\n* Non-text headers e.g. dates\n* Other stuff around the table\n* Several similar tables in one sheet\n* Sentinel values\n* Superscript symbols\n* Meaningful comments\n* Nested HTML tables\n\nIf that list makes your blood boil, you'll enjoy the function names.\n\n* `behead()` deals with multi-headered hydra tables one layer of headers at a\n  time, working from the edge of the table inwards.  It's a bit like using\n  `header = TRUE` in `read.csv()`, but because it's a function, you can apply it\n  to as many layers of headers as you need.  You end up with all the headers in\n  columns.\n* `spatter()` is like `tidyr::spread()` but preserves mixed data types.  You get\n  into a mixed-data-type situation by delaying type coercion until *after* the\n  table is tidy (rather than before, like `read.csv()` et al).  
And yes, it\n  usually follows `behead()`.\n\nMore positive, corrective functions:\n\n* `justify()` aligns column headers before `behead()`ing, and has deliberate\n  moral overtones.\n* `enhead()` attaches a header to the body of the data, *a la* Frankenstein.\n  The effect is the same as `behead()`, but is more powerful because you can\n  choose exactly which header cells you want, paying attention to formatting\n  (which `behead()` doesn't understand).\n* `isolate_sentinels()` separates meaningful symbols like `\"N/A\"` or\n  `\"confidential\"` from the rest of the data, giving them some time alone to\n  think about what they've done.\n* `partition()` takes a sheet with several tables on it, and slashes it into\n  pieces that each contain one table.  You can then unpivot each table in turn\n  with `purrr::map()` or similar.\n\n## Make cells tidy\n\nUnpivotr uses data where each cell is represented by one row in a data frame.\nLike this.\n\n![Gif of tidyxl converting cells into a tidy representation of one row per cell](./vignettes/tidy_xlsx.gif)\n\nWhat can you do with tidy cells?  The best places to start are:\n\n* [Spreadsheet Munging\n  Strategies](https://nacnudus.github.io/spreadsheet-munging-strategies/), a\n  free, online cookbook using [tidyxl](https://github.com/nacnudus/tidyxl/) and\n  [unpivotr](https://github.com/nacnudus/unpivotr)\n* [Screencasts](https://www.youtube.com/watch?v=1sinC7wsS5U) on YouTube.\n* [Worked examples](https://github.com/nacnudus/ukfarm) on GitHub.\n\nOtherwise the basic idea is:\n\n1. 
Read the data with a specialist tool.\n   * For spreadsheets, use [tidyxl](https://nacnudus.github.io/tidyxl/).\n   * For plain text files, you might soon be able to use\n     [readr](https://readr.tidyverse.org), but for now you'll have to install a\n     pull-request on that package with\n     `devtools::install_github(\"tidyverse/readr#760\")`.\n   * For tables in HTML pages, use `unpivotr::tidy_html()`.\n   * For data frames, use `unpivotr::as_cells()` -- this should be a last\n     resort, because by the time the data is in a conventional data frame, it\n     is often too late -- formatting has been lost, and most data types have\n     been coerced to strings.\n1. Either `behead()` straight away, or `dplyr::filter()` separately for the\n   header cells and the data cells, and then recombine with `enhead()`.\n1. `spatter()` so that each column has one data type.\n\n```{r}\nlibrary(unpivotr)\nlibrary(tidyverse)\nx \u003c- purpose$`up-left left-up`\nx # A pivot table in a conventional data frame.  
Four levels of headers, in two\n  # rows and two columns.\n\ny \u003c- as_cells(x) # 'Tokenize' or 'melt' the data frame into one row per cell\ny\n\nrectify(y) # useful for reviewing the melted form as though in a spreadsheet\n\ny %\u003e%\n  behead(\"up-left\", \"sex\") %\u003e%               # Strip headers\n  behead(\"up\", \"life-satisfaction\") %\u003e%   # one\n  behead(\"left-up\", \"qualification\") %\u003e%     # by\n  behead(\"left\", \"age-band\") %\u003e%            # one.\n  select(-row, -col, -data_type, count = chr) %\u003e% # cleanup\n  mutate(count = as.integer(count))\n```\n\nNote the compass directions in the code above, which hint to `behead()` where to\nfind the header cell for each data cell.\n\n* `\"up-left\"` means the header (`Female`, `Male`) is positioned up and to the\n  left of the columns of data cells it describes.\n* `\"up\"` means the header (`0 - 6`, `7 - 10`) is positioned directly above the\n  columns of data cells it describes.\n* `\"left-up\"` means the header (`Bachelor's degree`, `Certificate`, etc.) is\n  positioned to the left and upwards of the rows of data cells it describes.\n* `\"left\"` means the header (`15 - 24`, `25 - 44`, etc.) is positioned directly to\n  the left of the rows of data cells it describes.\n\n## Installation\n\n```{r, echo = TRUE, eval = FALSE}\n# install.packages(\"devtools\") # If you don't already have devtools\ndevtools::install_github(\"nacnudus/unpivotr\", build_vignettes = TRUE)\n```\n\nThe version 0.4.0 release had some breaking changes.  See `NEWS.md` for\ndetails.  
The previous version can be installed as follows:\n\n```r\ndevtools::install_version(\"unpivotr\", version = \"0.3.1\", repos = \"http://cran.us.r-project.org\")\n```\n\n## Similar projects\n\n[unpivotr](https://github.com/nacnudus/unpivotr) is inspired by\n[Databaker](https://github.com/sensiblecodeio/databaker), a collaboration\nbetween the [United Kingdom Office for National Statistics](https://www.ons.gov.uk/)\nand [The Sensible Code Company](https://sensiblecode.io/).\n\n[jailbreakr](https://github.com/rsheets/jailbreakr) attempts to extract\nnon-tabular data from spreadsheets into tabular structures automatically via\nsome clever algorithms.  [unpivotr](https://github.com/nacnudus/unpivotr)\ndiffers by being less magic, and equipping you to express what you want to do.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnacnudus%2Funpivotr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnacnudus%2Funpivotr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnacnudus%2Funpivotr/lists"}