https://github.com/edgararuiz-zz/pivotable
- Host: GitHub
- URL: https://github.com/edgararuiz-zz/pivotable
- Owner: edgararuiz-zz
- Created: 2019-09-26T21:50:38.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2019-11-18T00:36:54.000Z (almost 5 years ago)
- Last Synced: 2024-08-13T07:14:13.806Z (3 months ago)
- Topics: business-intelligence, database, dimensions, measures, pivot-tables, r
- Language: R
- Homepage: https://edgararuiz.github.io/pivotable/
- Size: 514 KB
- Stars: 9
- Watchers: 2
- Forks: 2
- Open Issues: 10
Metadata Files:
- Readme: README.Rmd
Awesome Lists containing this project
- jimsghstars - edgararuiz-zz/pivotable - (R)
README
---
output: github_document
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
library(pivotable)
library(dplyr)
library(lubridate)
library(DBI)
library(RSQLite)

add_figure <- function(path, width = 400) {
char_html <- paste0("
")
htmltools::HTML(char_html)
}
toc <- function() {
re <- readLines("README.Rmd")
has_title <- as.logical(lapply(re, function(x) substr(x, 1, 2) == "##"))
only_titles <- re[has_title]
titles <- trimws(gsub("#", "", only_titles))
links <- trimws(gsub("`", "", titles))
links <- tolower(links)
links <- trimws(gsub(" ", "-", links))
links <- trimws(gsub("\\.", "-", links))
links <- trimws(gsub("\\,", "", links))
toc_list <- lapply(
seq_along(titles),
function(x) {
pad <- ifelse(substr(only_titles[x], 1, 3) == "###", " - ", " - ")
paste0(pad, "[", titles[x], "](#",links[x], ")")
}
)
toc_full <- paste(toc_list, collapse = "\n")
cat(toc_full)
}
```

# pivotable
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
[![Travis build status](https://travis-ci.org/edgararuiz/pivotable.svg?branch=master)](https://travis-ci.org/edgararuiz/pivotable)
[![Codecov test coverage](https://codecov.io/gh/edgararuiz/pivotable/branch/master/graph/badge.svg)](https://codecov.io/gh/edgararuiz/pivotable?branch=master)
[![CRAN status](https://www.r-pkg.org/badges/version/pivotable)](https://CRAN.R-project.org/package=pivotable)

- [Intro](#intro)
- [Installation](#installation)
- [Pivot table](#pivot-table)
  - [Values](#values)
  - [Rows](#rows)
  - [Columns](#columns)
- [Pivot table operations](#pivot-table-operations)
  - [Switch rows to columns](#switch-rows-to-columns)
  - [Filter](#filter)
  - [Drill](#drill)
  - [Totals](#totals)
  - [Default values](#default-values)
  - [Convert to tibble](#convert-to-tibble)
- [Define dimensions and measures](#define-dimensions-and-measures)
- [Database, Spark and `data.table` connections](#database-spark-and-data-table-connections)
  - [Measures and dimensions](#measures-and-dimensions)
- [pivottabler](#pivottabler)

## Intro

Create pivot tables using commands named after familiar pivot table terms, such as `pivot_rows()`, `pivot_columns()` and `pivot_values()`, and string them together with a pipe (`%>%`).

The idea is that a pivot table is created using code, as opposed to drag-and-drop. This means that actions such as `pivot_flip()` and `pivot_drill()` are also possible, and are performed using R commands.

Another goal of `pivotable` is to provide a framework to easily define the measures and dimensions of your data with `prep_measures()` and `prep_dimensions()`. The resulting R object can then be used as the source of the pivot table. This should make it possible to produce consistent analyses and consistent reporting of the data.

## Installation
``` r
# install.packages("remotes")
remotes::install_github("edgararuiz/pivotable")
```

## Pivot table

### Values
The `pivot_values()` function is used to add an aggregation in the pivot table. When used against a data frame, you will have to provide an aggregation formula of a field, or fields, within the data. If using a pre-defined set of dimensions and measures, then simply call the desired measure field, no need to re-aggregate. For more info see [Define dimensions and measures](#define-dimensions-and-measures).
```{r}
library(dplyr)
library(pivotable)

retail_orders %>%
  pivot_values(sum(sales))
```

Multiple aggregations are supported by `pivot_values()`.

```{r}
retail_orders %>%
  pivot_values(sum(sales), n())
```

The aggregations can also be named inside `pivot_values()`.

```{r}
retail_orders %>%
  pivot_values(total_sales = sum(sales), no_sales = n())
```

### Rows

As its name indicates, `pivot_rows()` adds a data grouping based on the variable or variables passed to the function. The aggregation is split by the variable(s) and each total is displayed by row.
```{r}
retail_orders %>%
  pivot_rows(country) %>%
  pivot_values(sum(sales))
```

### Columns

As its name indicates, `pivot_columns()` adds a data grouping based on the variable or variables passed to the function. The aggregation is split by the variable(s) and each total is displayed by column.

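
The row-and-column grouping described here can be sketched in base R as a cross-tabulation. This is only an illustration of the idea, not how `pivotable` is implemented; `toy_orders` is a hypothetical stand-in for `retail_orders`, and its values are made up.

```r
# Hypothetical toy data (not the package's retail_orders dataset)
toy_orders <- data.frame(
  country = c("USA", "USA", "Japan", "Japan", "UK"),
  status  = c("Shipped", "Cancelled", "Shipped", "Shipped", "Shipped"),
  sales   = c(100, 50, 200, 25, 75)
)

# Rows = country, columns = status, values = sum(sales)
cross <- tapply(
  toy_orders$sales,
  list(country = toy_orders$country, status = toy_orders$status),
  sum
)

# Combinations with no records (for example UK / Cancelled) come back as NA
cross
```

Each cell holds the aggregate for one row value and one column value, which is the same shape a pivot table with one row dimension and one column dimension takes.
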
```{r}
retail_orders %>%
  pivot_rows(country) %>%
  pivot_columns(status) %>%
  pivot_values(sum(sales))
```

## Pivot table operations

### Switch rows to columns
Instead of manually swapping the contents of `pivot_rows()` and `pivot_columns()`, especially during data exploration, simply pipe the code to the `pivot_flip()` command.
```{r}
retail_orders %>%
  pivot_rows(country) %>%
  pivot_columns(status) %>%
  pivot_values(sum(sales)) %>%
  pivot_flip()
```

`pivotable` also includes support for the `t()` method from base R. It will perform the exact same operation as `pivot_flip()`.

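
For intuition, flipping rows and columns is the same idea as transposing a matrix. The sketch below uses base R's `t()` on a plain matrix with made-up numbers. It does not use a `pivotable` object, so it only illustrates the concept.

```r
# Hypothetical cross-tab: rows are countries, columns are order statuses
cross <- matrix(
  c(225, 75, 100,   # Shipped column
    0,   0,  50),   # Cancelled column
  nrow = 3,
  dimnames = list(
    country = c("Japan", "UK", "USA"),
    status  = c("Shipped", "Cancelled")
  )
)

# After transposing, statuses are the rows and countries are the columns
flipped <- t(cross)
flipped
```
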
```{r}
retail_orders %>%
  pivot_rows(country) %>%
  pivot_columns(status) %>%
  pivot_values(sum(sales)) %>%
  t()
```

### Filter

To display only a subset of the pivot table, use `pivot_focus()`.

```{r}
retail_orders %>%
  pivot_rows(country) %>%
  pivot_columns(status) %>%
  pivot_values(total_sales = sum(sales)) %>%
  pivot_focus(
    country %in% c("Japan", "USA", "UK"),
    status == "Shipped",
    total_sales > 200000
  )
```

`pivotable` also supports `dplyr`'s `filter()` command.

```{r}
retail_orders %>%
  pivot_rows(country) %>%
  pivot_columns(status) %>%
  pivot_values(total_sales = sum(sales)) %>%
  filter(
    country %in% c("Japan", "USA", "UK"),
    status == "Shipped",
    total_sales > 200000
  )
```

### Drill

Another powerful feature of pivot tables is the ability to drill down into the data. To do this in `pivotable`, you will need to define a hierarchy dimension using the `dim_hierarchy()` command. That command is made to be called within one of the dimension definition functions, such as `pivot_rows()` or `pivot_columns()`. The order of the hierarchy is defined by the order in which the variables are passed to the function.

```{r}
retail_orders %>%
  pivot_rows(
    order_date = dim_hierarchy(
      year = as.integer(format(orderdate, "%Y")),
      month = as.integer(format(orderdate, "%m"))
    )
  ) %>%
  pivot_values(sum(sales))
```

The `pivot_drill()` command will add the next level of the hierarchy dimension to the pivot table.

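
Conceptually, drilling means re-aggregating with one more level of the hierarchy added to the grouping. The base R sketch below shows that idea on a hypothetical data frame. The `toy` data and the use of `aggregate()` are assumptions for illustration, not the package's internals.

```r
# Hypothetical order data with made-up dates and amounts
toy <- data.frame(
  orderdate = as.Date(c("2003-01-15", "2003-02-20", "2003-02-25", "2004-01-10")),
  sales     = c(10, 20, 30, 40)
)

# The same level formulas used in the hierarchy definition
toy$year  <- as.integer(format(toy$orderdate, "%Y"))
toy$month <- as.integer(format(toy$orderdate, "%m"))

# Top level of the hierarchy: one total per year
by_year <- aggregate(sales ~ year, data = toy, FUN = sum)

# After a drill: one total per year and month
by_year_month <- aggregate(sales ~ year + month, data = toy, FUN = sum)

by_year
by_year_month
```
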
```{r}
retail_orders %>%
  pivot_rows(
    order_date = dim_hierarchy(
      year = as.integer(format(orderdate, "%Y")),
      month = as.integer(format(orderdate, "%m"))
    )
  ) %>%
  pivot_values(sum(sales)) %>%
  pivot_drill(order_date)
```

A helper function called `dim_hierarchy_mqy()` creates a three-level date hierarchy: year, quarter and month. The function creates the formulas to calculate each level, but those formulas are not evaluated until the pivot table is drilled into that level. The formulas are generic enough to work on database back-ends.

```{r}
retail_orders %>%
  pivot_rows(order_date = dim_hierarchy_mqy(orderdate)) %>%
  pivot_values(sum(sales)) %>%
  pivot_drill(order_date)
```

### Totals

The display of the totals can be controlled using `pivot_totals()`. It is possible to control the display of row and column totals individually. **Currently, sub-totals and grand totals are `sum()` aggregates, so if the actual calculations are not record counts or another `sum()`, consider leaving them off. For example, if the calculation is a `mean()`, then the sub-total and grand total will be the `sum()` of the averages, which would be incorrect.**

```{r}
retail_orders %>%
  pivot_rows(status, country) %>%
  pivot_values(sum(sales)) %>%
  pivot_totals(
    include_row_totals = TRUE
  )
```

A default can be set to control whether the column or row totals are displayed. By default, totals are not shown.

```{r}
pivot_default_totals(
  include_column_totals = TRUE,
  include_row_totals = TRUE
)

retail_orders %>%
  pivot_rows(status, country) %>%
  pivot_values(sum(sales))
```

```{r}
pivot_default_totals(FALSE, FALSE)
```

### Default values

By themselves, `pivot_rows()` and `pivot_columns()` will only provide a list of the unique values of the data frame. There is a way to set up a default aggregation by using `pivot_default_values()`.

```{r}
pivot_default_values(n())

retail_orders %>%
  pivot_rows(status)
```

### Convert to tibble

The `as_tibble()` function is supported to convert the resulting pivot table into a rectangular `tibble()`
```{r}
retail_orders %>%
  pivot_rows(country) %>%
  pivot_columns(status) %>%
  pivot_values(sum(sales)) %>%
  as_tibble()
```

## Define dimensions and measures

With `pivotable`, it is possible to pre-define a set of dimensions and measures that can then be easily accessed and re-used by pivot tables. The idea is to provide a way to centralize data definitions, which leads to consistent reporting.

```{r}
orders <- retail_orders %>%
  prep_dimensions(
    order_date = dim_hierarchy(
      year = as.integer(format(orderdate, "%Y")),
      month = as.integer(format(orderdate, "%m"))
    ),
    status,
    country
  ) %>%
  prep_measures(
    orders_qty = n(),
    order_total = sum(sales),
    sales_qty = sum(ifelse(status %in% c("In Process", "Shipped"), 1, 0)),
    sales_total = sum(ifelse(status %in% c("In Process", "Shipped"), sales, 0))
  )

orders
```

```{r}
orders %>%
  pivot_rows(status) %>%
  pivot_columns(order_date) %>%
  pivot_values(sales_total)
```

```{r}
orders %>%
  pivot_rows(status) %>%
  pivot_columns(order_date) %>%
  pivot_values(sales_total) %>%
  pivot_drill(order_date)
```

## Database, Spark and `data.table` connections

Because `pivotable` uses `dplyr` commands to create the aggregations, it can take advantage of the same integrations that `dplyr` has, such as Spark, databases and `data.table`.

```{r}
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
tbl_sales <- copy_to(con, retail_orders)

tbl_sales %>%
  pivot_columns(status) %>%
  pivot_rows(country) %>%
  pivot_values(sum(sales, na.rm = TRUE))
```

### Measures and dimensions

It is also possible to create a data definition against a database connection. The `prep_dimensions()` and `prep_measures()` calculations will not be sent to the database until they are used in the pivot table.

```{r}
orders_db <- tbl_sales %>%
  prep_dimensions(
    status,
    country
  ) %>%
  prep_measures(
    orders_qty = n(),
    order_total = sum(sales, na.rm = TRUE),
    sales_qty = sum(ifelse(status %in% c("In Process", "Shipped"), 1, 0), na.rm = TRUE),
    sales_total = sum(ifelse(status %in% c("In Process", "Shipped"), sales, 0), na.rm = TRUE)
  )
```

```{r}
orders_db %>%
  pivot_columns(status) %>%
  pivot_values(sales_total)
```

```{r}
dbDisconnect(con)
```

## pivottabler

`pivotable` uses `pivottabler` to print the pivot table in the R console. The `to_pivottabler()` function returns the actual `pivottabler` object. This allows you to further customize the pivot table using that package's API.
```{r}
pt <- orders %>%
  pivot_rows(order_date) %>%
  pivot_values(orders_qty) %>%
  to_pivottabler()

pt$asMatrix(repeatHeaders = TRUE, includeHeaders = TRUE)
```