https://github.com/Sayani07/gghdr
Plots of highest density regions (HDR) for ggplot2
- Host: GitHub
- URL: https://github.com/Sayani07/gghdr
- Owner: Sayani07
- License: gpl-3.0
- Created: 2019-12-11T05:10:38.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2024-02-07T09:54:49.000Z (12 months ago)
- Last Synced: 2024-05-19T00:29:55.579Z (8 months ago)
- Language: R
- Homepage: https://sayani07.github.io/gghdr/
- Size: 25.8 MB
- Stars: 47
- Watchers: 11
- Forks: 5
- Open Issues: 7
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE.md
README
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
set.seed(1234)
```
# gghdr
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![R-CMD-check](https://github.com/Sayani07/gghdr/workflows/R-CMD-check/badge.svg)](https://github.com/Sayani07/gghdr/actions)
Package `gghdr` helps to visualize Highest Density Regions (HDRs) in one and two dimensions. HDRs are useful for displaying multimodality in a distribution. This work draws inspiration from the package [`hdrcde`](https://pkg.robjhyndman.com/hdrcde/) developed by [Rob Hyndman](https://robjhyndman.com/) and provides a way to display HDRs within the `ggplot2` framework.
# Installation
You can install the stable version from CRAN:
```{r install, eval = FALSE}
install.packages("gghdr")
```
You can install the development version from GitHub with:
```{r install_github, eval = FALSE}
# install.packages("remotes")
remotes::install_github("Sayani07/gghdr")
```
# An overview of gghdr
There are several statistical methods for summarizing a distribution by regions of the sample space that cover a given probability. For example, in a traditional boxplot the central box, bounded by the lower and upper quartiles, covers 50% of the distribution, and for large samples from a normal distribution the whiskers cover about 99%. Summarizing a distribution with highest density regions is especially useful for multimodal distributions, because an HDR can consist of several disjoint intervals, whereas the box and the whiskers each form a single interval. We illustrate this with the data set `faithful`, which contains the waiting time between eruptions and the duration of eruptions for the Old Faithful geyser in Yellowstone National Park, USA.
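The whisker coverage quoted above depends on the shape of the distribution. A short worked check, assuming normally distributed data and the default 1.5 * IQR whisker rule of `geom_boxplot()`, shows where the 50% and roughly 99% figures come from:

```r
# Worked check, assuming N(0, 1) data and whiskers at 1.5 * IQR beyond the
# quartiles (the default rule in geom_boxplot()); for large samples the
# whisker ends approach these fences.
q1 <- qnorm(0.25)                 # lower quartile, about -0.674
q3 <- qnorm(0.75)                 # upper quartile, about  0.674
iqr <- q3 - q1                    # interquartile range, about 1.349
upper_whisker <- q3 + 1.5 * iqr   # about  2.698 standard deviations
lower_whisker <- q1 - 1.5 * iqr   # about -2.698 standard deviations

# Probability inside the box: 0.5 by construction
box_coverage <- pnorm(q3) - pnorm(q1)
# Probability between the whisker ends: about 0.993
whisker_coverage <- pnorm(upper_whisker) - pnorm(lower_whisker)

round(c(box = box_coverage, whiskers = whisker_coverage), 3)
```

For skewed or multimodal data the box still covers about half of the observations, but the whisker coverage no longer follows from this calculation, and neither element can show a gap between modes.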
```{r setup, echo=FALSE, message=FALSE}
library(hdrcde)
```
```{r boxplot}
library(ggplot2)
ggplot(faithful, aes(y = eruptions)) + geom_boxplot()
```
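To make the idea of an HDR concrete before turning to `gghdr`, here is a minimal base-R sketch for the eruption times. It assumes a simple density-threshold construction: estimate the density, find the level that the density exceeds at a fraction `prob` of the observations, and keep every part of the sample space above that level. The helpers `hdr_intervals_sketch()` and `local_modes_sketch()` are hypothetical and are not part of `gghdr` or `hdrcde`; they use `stats::density()` with its default bandwidth, so their endpoints will not exactly match `geom_hdr_boxplot()`.

```r
# Hypothetical helper (not part of gghdr or hdrcde): a density-threshold HDR
# using stats::density() defaults.
hdr_intervals_sketch <- function(x, prob = 0.5, n = 1024) {
  den <- stats::density(x, n = n)
  # density estimate evaluated at each observation
  fx <- stats::approx(den$x, den$y, xout = x)$y
  # level exceeded by the density at roughly a fraction `prob` of observations
  f_alpha <- stats::quantile(fx, 1 - prob, names = FALSE)
  # grid points where the density is at or above that level
  above <- den$y >= f_alpha
  # each run of consecutive TRUE values is one interval of the region
  runs <- rle(above)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  data.frame(lower = den$x[starts[runs$values]],
             upper = den$x[ends[runs$values]])
}

# Hypothetical helper: local modes are grid points where the slope of the
# density estimate changes sign from positive to negative.
local_modes_sketch <- function(x, n = 1024) {
  den <- stats::density(x, n = n)
  peaks <- which(diff(sign(diff(den$y))) == -2) + 1
  den$x[peaks]
}

hdr50 <- hdr_intervals_sketch(faithful$eruptions, prob = 0.5)
hdr99 <- hdr_intervals_sketch(faithful$eruptions, prob = 0.99)
modes <- local_modes_sketch(faithful$eruptions)
hdr50
hdr99
modes
```

Because the region is defined by a density level rather than by quantiles, it can split into disjoint intervals around separate modes, which is how an HDR can expose the gap near 3 minutes that a single box hides.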
We can use `geom_hdr_boxplot` to display the same variable. Along with the 50% and 99% highest density regions, it shows the local mode within each region. The resulting plot shows that eruptions are likely to last around 2 minutes or around 4.5 minutes, but rarely around 3 minutes. This insight is not apparent in the boxplot above.
```{r gg_hdr-boxplot, echo = TRUE, eval = TRUE}
library(gghdr)
library(ggplot2)
ggplot(faithful, aes(y = eruptions)) +
  geom_hdr_boxplot(prob = c(0.5, 0.99), fill = "blue") +
  theme_minimal()
```
It can be useful to supplement a scatterplot with the marginal distribution of one or both variables to gain more insight into the relationship between them. `geom_hdr_rug` makes this possible. The scatterplot below shows two clear clusters: one with shorter waiting times and shorter eruptions (around 2 minutes) and another with longer waiting times and longer eruptions (around 4.5 minutes). `geom_hdr_rug` adds to this by displaying, along the x-axis, the 50% and 99% highest density regions of eruption time.
```{r hdr_rug}
ggplot(faithful) +
  geom_point(aes(x = eruptions, y = waiting)) +
  geom_hdr_rug(aes(x = eruptions), prob = c(0.99, 0.5), fill = "blue")
```
The previous example can be extended by colouring the points of the scatterplot according to the bivariate highest density regions they fall in, using `hdr_bin`. `hdr_bin` can also be mapped to only the x-axis or only the y-axis variable to show the marginal distribution of a single variable. The resulting figure enriches the scatterplot by highlighting the bivariate highest density regions covering 50%, 90% and 99%, along with the points that lie outside the 99% region.
```{r hdr_bin}
ggplot(data = faithful, aes(x = waiting, y = eruptions)) +
  geom_point(aes(colour = hdr_bin(x = waiting, y = eruptions))) +
  scale_colour_viridis_d(direction = -1)
```
You can read more about gghdr in the [vignette](https://sayani07.github.io/gghdr/).
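As a rough illustration of what a bivariate binning like `hdr_bin()` involves, and not `gghdr`'s actual implementation, the sketch below estimates the joint density at every observation with a product Gaussian kernel, then labels each point with the smallest of the 50%, 90% and 99% regions that contains it. The helper `bin_bivariate_hdr()`, its use of `bw.nrd0()` bandwidths and its level labels are assumptions; the estimator and labels in `gghdr` may differ.

```r
# Hypothetical helper (not gghdr's implementation): bin points by the
# smallest bivariate HDR that contains them, using a product Gaussian kernel.
bin_bivariate_hdr <- function(x, y, prob = c(0.5, 0.9, 0.99)) {
  hx <- bw.nrd0(x)
  hy <- bw.nrd0(y)
  # joint density at each observation:
  # f(x_i, y_i) = mean_j dnorm((x_i - x_j) / hx) * dnorm((y_i - y_j) / hy) / (hx * hy)
  kx <- dnorm(outer(x, x, "-") / hx)
  ky <- dnorm(outer(y, y, "-") / hy)
  fxy <- rowMeans(kx * ky) / (hx * hy)
  prob <- sort(prob)
  # density levels exceeded at roughly a fraction `prob` of the observations
  thresholds <- quantile(fxy, 1 - prob, names = FALSE)
  labels <- paste0(100 * prob, "%")
  outside <- paste0(">", labels[length(labels)])
  bin <- rep(outside, length(x))
  # assign from the widest region to the narrowest, so each point keeps
  # the smallest region that contains it
  for (i in rev(seq_along(prob))) {
    bin[fxy >= thresholds[i]] <- labels[i]
  }
  factor(bin, levels = c(labels, outside))
}

bins <- bin_bivariate_hdr(faithful$waiting, faithful$eruptions)
table(bins)
```

Because the thresholds are quantiles of the estimated densities, about half of the points land in the 50% bin by construction, and only a handful fall outside the 99% region.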
[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)