---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/"
)
```

# datasauRus

[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![CRAN status](https://www.r-pkg.org/badges/version/datasauRus)](https://CRAN.R-project.org/package=datasauRus)
[![R-CMD-check](https://github.com/jumpingrivers/datasauRus/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/jumpingrivers/datasauRus/actions/workflows/R-CMD-check.yaml)

This package wraps the awesome Datasaurus Dozen datasets. The Datasaurus Dozen shows why visualisation is important: summary statistics can be the same while the distributions are very different. In short, this package offers a fun alternative to [Anscombe's Quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet), which is available in R as `anscombe`.
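Because `anscombe` ships with base R, the claim is easy to check for yourself. The chunk below is a minimal sketch that computes the means, standard deviations and correlation for each of the four x/y pairs (the `anscombe_stats` name is only illustrative). The four columns should agree to about two decimal places even though their scatter plots look nothing alike.

```{r}
# `anscombe` is a base R data frame with columns x1..x4 and y1..y4,
# one x/y pair per dataset in the quartet.
anscombe_stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    sd_x = sd(x), sd_y = sd(y),
    cor = cor(x, y))
})
colnames(anscombe_stats) <- paste0("set", 1:4)

# One column per dataset, one row per summary statistic
round(anscombe_stats, 2)
```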

The original Datasaurus was created by Alberto Cairo. The other datasets in the Dozen were generated using simulated annealing, a process described in the paper "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing" by Justin Matejka and George Fitzmaurice ([open access materials including manuscript and code](https://www.research.autodesk.com/publications/same-stats-different-graphs/), [official paper](https://doi.org/10.1145/3025453.3025912)).

In the paper, Justin and George simulate a variety of datasets that have the same summary statistics as the Datasaurus but very different distributions.

```{r, out.width="600px", fig.alt="Sequential dinosaur gif", echo = FALSE}
knitr::include_graphics("https://damassets.autodesk.net/content/dam/autodesk/research/publications-assets/gifs/same-stats-different-graphs/DinoSequentialSmaller.gif")
```
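The core idea can be sketched in a few lines of base R. This is a heavily simplified illustration, not the authors' code (their implementation is linked above). It repeatedly nudges one random point, rejects any move that changes the means, standard deviations or correlation at two decimal places, and uses a standard Metropolis-style acceptance rule with a falling temperature to favour moves that bring the points closer to a target shape. The circular target, the step size, the temperature schedule, the random starting data and the helper names `summary_stats()`, `circle_distance()` and `anneal_towards_circle()` are all assumptions made for this example.

```{r}
# Statistics to hold fixed: means, standard deviations and Pearson
# correlation, rounded to two decimal places (hypothetical helper).
summary_stats <- function(x, y) {
  round(c(mean(x), mean(y), sd(x), sd(y), cor(x, y)), 2)
}

# Fitness (lower is better): mean distance of the points from a circle.
# The circular target is an assumption for illustration only.
circle_distance <- function(x, y, cx, cy, r) {
  mean(abs(sqrt((x - cx)^2 + (y - cy)^2) - r))
}

anneal_towards_circle <- function(x, y, n_iter = 10000, step = 0.5,
                                  temp_start = 0.4, temp_end = 0.01) {
  target <- summary_stats(x, y)
  cx <- mean(x); cy <- mean(y); r <- sd(x)
  current <- circle_distance(x, y, cx, cy, r)
  for (i in seq_len(n_iter)) {
    # Linearly decreasing temperature
    temp <- temp_start + (temp_end - temp_start) * (i - 1) / (n_iter - 1)
    # Perturb a single randomly chosen point
    j <- sample.int(length(x), 1)
    new_x <- x; new_y <- y
    new_x[j] <- x[j] + rnorm(1, sd = step)
    new_y[j] <- y[j] + rnorm(1, sd = step)
    # Reject any move that changes the rounded summary statistics
    if (!identical(summary_stats(new_x, new_y), target)) next
    candidate <- circle_distance(new_x, new_y, cx, cy, r)
    # Always accept improvements; accept worse moves with probability
    # exp(-increase / temp), which shrinks as the temperature falls
    if (candidate < current || runif(1) < exp((current - candidate) / temp)) {
      x <- new_x; y <- new_y; current <- candidate
    }
  }
  list(x = x, y = y, distance = current)
}

# Usage on made-up starting data
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)
y <- 0.5 * x + rnorm(100, sd = 8)
result <- anneal_towards_circle(x, y)
identical(summary_stats(result$x, result$y), summary_stats(x, y))
```

Because every accepted move is checked against the rounded statistics of the starting data, the final points always share those statistics; only the shape changes.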

## Install
The latest stable version is available on CRAN:

```{r, eval = FALSE}
install.packages("datasauRus")
```

The latest development version is available on GitHub and can be installed with {devtools}:

```{r, eval = FALSE}
devtools::install_github("jumpingrivers/datasauRus")
```

## Usage

You can use the package to produce Anscombe-style plots and more. For example, to plot each dataset in `datasaurus_dozen` in its own panel:

```{r datasets, fig.height=12, fig.width=9}
library("ggplot2")
library("datasauRus")
ggplot(datasaurus_dozen, aes(x = x, y = y, colour = dataset)) +
  geom_point() +
  theme_void() +
  theme(legend.position = "none") +
  facet_wrap(~dataset, ncol = 3)
```
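The plot makes the differences obvious, while the summary statistics hide them. The chunk below is a base R sketch that computes the means, standard deviations and x/y correlation for each of the 13 datasets in `datasaurus_dozen` (the `by_dataset` and `dozen_stats` names are only illustrative). The rows should come out nearly identical.

```{r}
library("datasauRus")

# Split the long-format data into one data frame per dataset,
# then compute the same summary statistics for each of them.
by_dataset <- split(datasaurus_dozen, datasaurus_dozen$dataset)
dozen_stats <- do.call(rbind, lapply(names(by_dataset), function(name) {
  d <- by_dataset[[name]]
  data.frame(
    dataset = name,
    mean_x = mean(d$x), mean_y = mean(d$y),
    sd_x = sd(d$x), sd_y = sd(d$y),
    cor_xy = cor(d$x, d$y)
  )
}))

# Round the numeric columns for display
dozen_stats[-1] <- round(dozen_stats[-1], 2)
dozen_stats
```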

## Code of Conduct

Please note that the datasauRus project is released with a [Contributor Code of Conduct](https://jumpingrivers.github.io/datasauRus/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.