Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/eliocamp/metamer
Create data sets with identical statistics.
- Host: GitHub
- URL: https://github.com/eliocamp/metamer
- Owner: eliocamp
- License: gpl-3.0
- Created: 2018-11-28T20:56:36.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2022-06-23T20:46:56.000Z (over 2 years ago)
- Last Synced: 2024-10-01T12:36:26.074Z (about 1 month ago)
- Topics: r, r-package, rstats
- Language: R
- Homepage: https://eliocamp.github.io/metamer
- Size: 9.72 MB
- Stars: 25
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
Awesome Lists containing this project
- jimsghstars - eliocamp/metamer - Create data sets with identical statistics. (R)
README
---
output: github_document
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  cache = TRUE,
  cache.extra = 42
)
```

# metamer

[![DOI](https://zenodo.org/badge/159563811.svg)](https://zenodo.org/badge/latestdoi/159563811) [![Travis build status](https://travis-ci.org/eliocamp/metamer.svg?branch=master)](https://travis-ci.org/eliocamp/metamer) [![Codecov test coverage](https://codecov.io/gh/eliocamp/metamer/branch/master/graph/badge.svg)](https://app.codecov.io/gh/eliocamp/metamer?branch=master)

Implements the algorithm proposed by [Matejka & Fitzmaurice (2017)](https://www.autodesk.com/research/publications/same-stats-different-graphs) to create metamers (datasets with identical statistical properties but very different graphs) with an annealing scheme derived from [de Vicente et al. (2003)](https://www.sciencedirect.com/science/article/abs/pii/S0375960103013653?via%3Dihub).
In colour theory, [metamers](https://en.wikipedia.org/wiki/Metamerism_(color)) are colours that have very different wavelength distributions but are perceived as equal by our visual system. This happens because our eyes essentially summarise a continuous distribution of wavelengths with just 3 numbers: the amount by which each type of cone cell is excited. Colour metamerism is how artists can reproduce so many colours with a few pigments, or how PC monitors use only 3 lights to show colourful pictures.
![](man/figures/lemon.jpg)
(from the excellent [Color: From Hexcodes to Eyeballs](http://jamie-wong.com/post/color/) by [Jamie Wong](https://github.com/jlfwong))
Statistical transformations such as mean, standard deviation and correlation behave very similarly in that they summarise data with just a few numbers for the benefit of our limited cognitive capacity. Thus, statistical metamers are sets of data that share some statistical properties.
[This article](https://eliocamp.github.io/codigo-r/en/2019/01/statistical-metamerism/) explores statistical metamerism in more detail.
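
As a quick illustration of the idea, R's built-in `anscombe` data (Anscombe's quartet, which is not part of metamer) holds four x/y pairs whose means and correlations agree to 2 significant figures even though their scatter plots look very different. A minimal sketch using only base R:

```r
# Anscombe's quartet ships with base R in the `datasets` package.
data(anscombe)

# Summarise each of the four x/y pairs with the same three statistics.
quartet_stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y), cor_xy = cor(x, y))
})
colnames(quartet_stats) <- paste0("set", 1:4)

# Rounded to 2 significant figures, the four columns agree.
signif(quartet_stats, 2)
```

Plotting each pair (for example with `plot(anscombe$x1, anscombe$y1)`) shows how different the underlying data are despite the matching summaries.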
## Installation
You can install metamer with:
```r
install.packages("metamer")
```

or install the development version with:

```r
# install.packages("devtools")
devtools::install_github("eliocamp/metamer")
```

## Example

You can construct metamers from a starting dataset and a vector of statistical properties to remain constant (by default, up to 2 significant figures).
```{r example}
library(metamer)

# Start with the datasaurus
# install.packages("datasauRus")
dino <- subset(datasauRus::datasaurus_dozen, dataset == "dino")
dino$dataset <- NULL

# And we want to preserve means and correlation
mean_cor <- delayed_with(mean(x), mean(y), cor(x, y))

set.seed(42) # To make results reproducible
metamers <- metamerise(dino, preserve = mean_cor,
                       stop_if = n_metamers(300),
                       perturbation = 1,
                       keep = 19)
print(metamers)
```

We found `r length(metamers$metamers)` metamers. Let's see the final one, with the starting dataset as background.

```{r}
library(ggplot2)

ggplot(tail(metamers), aes(x, y)) +
  geom_point(data = dino, color = "red", alpha = 0.5, size = 0.4) +
  geom_point()
```

We can check that the statistical properties have been preserved up to 2 significant figures:

```{r}
cbind(dino = signif(mean_cor(dino), 2),
      last = signif(mean_cor(tail(metamers)), 2))
```

However, a semi-random cloud of points is not that interesting, so we can specify a minimizing function so that the result is similar to another dataset. If the `data` argument is a list of metamers, `metamerise` starts from the last metamer of the previous run and appends the new results.

```{r}
x_shape <- subset(datasauRus::datasaurus_dozen, dataset == "x_shape")
x_shape$dataset <- NULL
```

```{r}
metamers <- metamerise(dino,
                       preserve = mean_cor,
                       minimize = mean_dist_to(x_shape),
                       stop_if = minimize_ratio(0.02),
                       keep = 99)
```

Now the result is a bit more impressive.

```{r}
ggplot(tail(metamers), aes(x, y)) +
  geom_point(data = dino, color = "red", alpha = 0.5, size = 0.4) +
  geom_point()
```

We can animate the whole thing.

```{r, gganimate = list(fps = 30)}
library(gganimate)

ggplot(metamers, aes(x, y)) +
  geom_point() +
  transition_manual(.metamer)
```

You can freehand your own starting or target data with the `draw_data()` utility, which opens a Shiny interface. You might need to install `shiny` and `miniUI` with `install.packages(c("shiny", "miniUI"))`.

Metamerizing operations can be chained while changing the minimizing function.
```{r}
star <- subset(datasauRus::datasaurus_dozen, dataset == "star")
star$dataset <- NULL

set.seed(42)
metamers <- metamerise(dino,
                       preserve = mean_cor,
                       minimize = mean_dist_to(x_shape),
                       stop_if = minimize_ratio(0.05),
                       keep = 29) |>
  set_minimize(mean_dist_to(star)) |>
  metamerise(stop_if = minimize_ratio(0.05),
             keep = 30) |>
  set_minimize(mean_dist_to(dino)) |>
  metamerise(stop_if = minimize_ratio(0.05),
             keep = 30)
```

And the full sequence:

```{r, gganimate = list(nframes = 30*3, fps = 30)}
ggplot(metamers, aes(x, y)) +
  geom_point() +
  transition_manual(.metamer)
```
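
For intuition about what happens inside a run, the search can be sketched in base R. This is a simplified illustration in the spirit of Matejka & Fitzmaurice (2017), not metamer's implementation: `stats_of()`, `same_stats()`, `mean_nearest_dist()` and `sketch_metamerise()` are hypothetical helpers, the linear cooling schedule is an assumption, and the package's own annealing scheme and distance function may differ.

```r
# Hypothetical helpers for illustration only; none of these are metamer functions.

# The statistics we want to keep constant.
stats_of <- function(data) {
  c(mean_x = mean(data$x), mean_y = mean(data$y), cor_xy = cor(data$x, data$y))
}

# TRUE if two datasets share those statistics up to `digits` significant figures.
same_stats <- function(a, b, digits = 2) {
  all(signif(stats_of(a), digits) == signif(stats_of(b), digits))
}

# Mean distance from each point in `data` to its nearest point in `target`.
mean_nearest_dist <- function(data, target) {
  nearest <- vapply(seq_len(nrow(data)), function(i) {
    min(sqrt((target$x - data$x[i])^2 + (target$y - data$y[i])^2))
  }, numeric(1))
  mean(nearest)
}

sketch_metamerise <- function(start, target, n_iter = 2000, perturbation = 0.5,
                              start_temp = 0.4, end_temp = 0.01) {
  current <- start
  loss <- mean_nearest_dist(current, target)
  for (i in seq_len(n_iter)) {
    # Assumed linear cooling from start_temp to end_temp.
    temp <- start_temp + (end_temp - start_temp) * i / n_iter

    # Nudge one randomly chosen point.
    candidate <- current
    j <- sample.int(nrow(candidate), 1)
    candidate$x[j] <- candidate$x[j] + rnorm(1, sd = perturbation)
    candidate$y[j] <- candidate$y[j] + rnorm(1, sd = perturbation)

    # Reject candidates that change the preserved statistics.
    if (!same_stats(candidate, start)) next

    # Always accept improvements; accept worse candidates with probability `temp`.
    new_loss <- mean_nearest_dist(candidate, target)
    if (new_loss < loss || runif(1) < temp) {
      current <- candidate
      loss <- new_loss
    }
  }
  current
}

# Toy data: a random cloud pulled towards a circle.
set.seed(42)
start <- data.frame(x = rnorm(50, mean = 50, sd = 10),
                    y = rnorm(50, mean = 50, sd = 10))
angle <- seq(0, 2 * pi, length.out = 100)
target <- data.frame(x = 50 + 20 * cos(angle), y = 50 + 20 * sin(angle))

result <- sketch_metamerise(start, target)
same_stats(result, start)
```

Because every accepted candidate is checked against the starting data, the final dataset keeps the chosen statistics by construction; the occasional acceptance of a worse candidate is what lets the search escape local minima early on.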