Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/Sayani07/gghdr

Plots of highest density regions (HDR) for ggplot2
https://github.com/Sayani07/gghdr

Last synced: about 2 months ago
JSON representation

Plots of highest density regions (HDR) for ggplot2

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
set.seed(1234)
```

# gghdr

[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![R-CMD-check](https://github.com/Sayani07/gghdr/workflows/R-CMD-check/badge.svg)](https://github.com/Sayani07/gghdr/actions)


Package `gghdr` helps to visualize Highest Density Regions (HDR) in one and two dimensions. HDRs
are useful in displaying multimodality in the distribution. This work draws inspiration from the the package [`hdrcde`](https://pkg.robjhyndman.com/hdrcde/) developed by [Rob Hyndman](https://robjhyndman.com/) and provide a framework for displaying HDRs under `ggplot2` framework.

# Installation

You could install the stable version on CRAN:

```{r install, eval = FALSE}
install.packages("gghdr")
```

You could install the development version from Github using:

```{r install_github, eval = FALSE}
# install.packages("remotes")
remotes::install_github("Sayani07/gghdr")
```

# An overview of gghdr

There are several statistical methods to summarize a distribution by region of the sample space covering certain probability. For example, in a traditional boxplot, the central box bounded by the interquartile range represents 50% coverage and whiskers represent 99% coverage for large samples. The method of summarizing a distribution using highest density regions is useful for analysing multimodal distributions. We illustrate this by exploring the data set `faithful` which contains the waiting time and duration of eruptions for the old faithful geyser in the Yellowstone National Park, USA.

```{r setup, echo=FALSE, message=FALSE}
library(hdrcde)
```

```{r boxplot}
library(ggplot2)
ggplot(faithful, aes(y=eruptions)) + geom_boxplot()
```

We can use `geom_hdr_boxplot` to display the same variable. Along with displaying the 99% and 50% highest density regions, it also shows the local mode in each of the regions. This shows that eruption times are likely to be around 4.5 minutes or 2 minutes, but rarely for around 3 minutes. This insight was not apparent in the above boxplot.

```{r gg_hdr-boxplot, echo=TRUE, eval = T}
library(gghdr)
library(ggplot2)
ggplot(faithful, aes(y = eruptions)) +
geom_hdr_boxplot(prob = c(.5, 0.99), fill = "blue") +
theme_minimal()
```

It can be interesting to supplement a scatterplot with marginal distributions of one or both variables to enhance insights into the relationship between the two variables. This is possible through `geom_hdr_rug`. This shows two clear clusters, one with shorter waiting times and shorter eruptions (around 2 minutes) and another with longer waiting times and longer eruptions (around 4.5 minutes). The `geom_hdr_rug` adds to this information by displaying the highest density region of eruption time covering 50% and 99%.

```{r hdr_rug}
ggplot(faithful) +
geom_point(aes(x = eruptions, y = waiting)) +
geom_hdr_rug(aes(x = eruptions), prob = c(0.99, 0.5), fill = "blue")
```

The previous example can be extended to allow displaying the scatterplot with points coloured according to the bivariate highest density regions using `hdr_bin`.`hdr_bin` can also be mapped to only the x-axis or y-axis to show the marginal distribution of any one variable. This figure enriches the information in the scatterplot by emphasizing the highest bivariate density regions covering 50%, 90%, 99%, and more than 99% coverage.

```{r hdr_bin}
ggplot(data = faithful, aes(x = waiting, y=eruptions)) +
geom_point(aes(colour = hdr_bin(x = waiting, y = eruptions))) +
scale_colour_viridis_d(direction = -1)
```

You can read more about gghdr in the [vignette](https://sayani07.github.io/gghdr/).

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)