https://github.com/paulnorthrop/anscombiser

Create datasets with identical summary statistics

---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width = 8,
fig.height = 8,
fig.path = "man/figures/README-"
)
```

# anscombiser

[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/paulnorthrop/anscombiser?branch=main&svg=true)](https://ci.appveyor.com/project/paulnorthrop/anscombiser)
[![R-CMD-check](https://github.com/paulnorthrop/anscombiser/workflows/R-CMD-check/badge.svg)](https://github.com/paulnorthrop/anscombiser/actions)
[![Coverage Status](https://codecov.io/github/paulnorthrop/anscombiser/coverage.svg?branch=main)](https://codecov.io/github/paulnorthrop/anscombiser?branch=master)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/anscombiser)](https://cran.r-project.org/package=anscombiser)
[![Downloads (monthly)](https://cranlogs.r-pkg.org/badges/anscombiser?color=brightgreen)](https://cran.r-project.org/package=anscombiser)
[![Downloads (total)](https://cranlogs.r-pkg.org/badges/grand-total/anscombiser?color=brightgreen)](https://cran.r-project.org/package=anscombiser)

### What does anscombiser do?

Anscombe's quartet is a set of four two-variable datasets that share several summary statistics (essentially means, variances and correlation) but have very different joint distributions. The difference becomes apparent when the data are plotted, which illustrates the importance of graphical displays in Statistics. The `anscombiser` package provides a quick and easy way to create several datasets that have common values for Anscombe's summary statistics but display very different behaviour when plotted. It does this by transforming (shifting, scaling and rotating) an input dataset to achieve target summary statistics.
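
To make the idea of shifting, scaling and rotating concrete, here is a minimal base-R sketch of one linear transformation that gives an input dataset the sample means and covariance matrix of a target dataset. The helper `match_stats()` is hypothetical and is not part of `anscombiser`; it uses Cholesky factors for convenience, and the package's own transformation may differ.

```r
# Hypothetical helper, not part of anscombiser: give `x` the sample means
# and covariance matrix of `target` using a linear transformation.
match_stats <- function(x, target) {
  x <- as.matrix(x)
  target <- as.matrix(target)
  # Shift: centre the input on the origin
  centred <- sweep(x, 2, colMeans(x))
  # Whiten: after this step the sample covariance is the identity matrix
  whitened <- centred %*% solve(chol(cov(x)))
  # Re-colour: impose the target's sample covariance
  recoloured <- whitened %*% chol(cov(target))
  # Shift again so that the sample means match the target's
  sweep(recoloured, 2, colMeans(target), "+")
}

# Make the built-in `faithful` data share the summary statistics of
# the first dataset in Anscombe's quartet
target <- anscombe[, c("x1", "y1")]
new_faithful <- match_stats(faithful, target)
all.equal(colMeans(new_faithful), colMeans(target), check.attributes = FALSE)
all.equal(cov(new_faithful), cov(target), check.attributes = FALSE)
```

Because means, variances and correlation are all determined by the sample mean vector and covariance matrix, matching those two quantities is enough to match the summary statistics described above.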

### An example

The `mimic()` function transforms an input dataset (`dino` below left) so that it has the same values of Anscombe's summary statistics as another dataset (`trump` below right).

```{r, trump, out.width='50%', fig.show='hold'}
library(anscombiser)
library(datasauRus)
# Extract the x and y columns of the dino dataset
dino <- datasaurus_dozen_wide[, c("dino_x", "dino_y")]
# Transform dino so that it shares trump's summary statistics
new_dino <- mimic(dino, trump)
plot(new_dino, legend_args = list(x = "topright"))
plot(new_dino, input = TRUE, legend_args = list(x = "bottomright"), pch = 20)
```

In this example, the two datasets had similar summary statistics from the outset, so the appearance of the `dino` dataset has changed little. When the summary statistics of the two datasets differ more, the input dataset will be deformed, but its general shape will still be recognisable.

The rotation applied to the input dataset is not unique. The function `mimic` (and a function `anscombise` that is specific to Anscombe's quartet) has an argument `idempotent` that controls how the rotation is performed. In the special case where the input dataset already has the desired summary statistics, setting `idempotent = TRUE` ensures that the output dataset is identical to the input dataset.
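
The non-uniqueness can be illustrated with a small base-R sketch. The helper `match_with_rotation()` and its `angle` argument are hypothetical and are not how `anscombiser` implements `idempotent`. The sketch only shows that an extra rotation leaves the summary statistics unchanged while altering the data, so a dataset that already has the target statistics is returned unchanged only for a particular choice of rotation.

```r
# Hypothetical helper, not anscombiser's code: map `x` onto the sample means
# and covariance of `target`, with an extra rotation by `angle` radians.
match_with_rotation <- function(x, target, angle = 0) {
  x <- as.matrix(x)
  target <- as.matrix(target)
  # 2 x 2 rotation matrix; angle = 0 gives the identity matrix
  rot <- matrix(c(cos(angle), sin(angle), -sin(angle), cos(angle)), 2, 2)
  # Centre and whiten the input so that its sample covariance is the identity
  whitened <- sweep(x, 2, colMeans(x)) %*% solve(chol(cov(x)))
  # Rotating whitened data leaves its sample covariance equal to the identity
  out <- whitened %*% rot %*% chol(cov(target))
  sweep(out, 2, colMeans(target), "+")
}

# Use a dataset as its own target, so it already has the desired statistics
x <- as.matrix(anscombe[, c("x1", "y1")])
same <- match_with_rotation(x, x, angle = 0)
turned <- match_with_rotation(x, x, angle = pi / 2)

# Both outputs have the same sample covariance matrix
all.equal(cov(same), cov(turned), check.attributes = FALSE)
# Only the unrotated output reproduces the input itself
isTRUE(all.equal(same, x, check.attributes = FALSE))
isTRUE(all.equal(turned, x, check.attributes = FALSE))
```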

### Installation

To get the current released version from CRAN:

```{r installation, eval = FALSE}
install.packages("anscombiser")
```

### Vignette

See `vignette("intro-to-anscombiser", package = "anscombiser")` for an overview of the package.