Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/pachadotdev/spuriouscorrelations

Keeping alive the amazing examples from Tyler Vigen
https://github.com/pachadotdev/spuriouscorrelations

correlation r

Last synced: 2 months ago
JSON representation

Keeping alive the amazing examples from Tyler Vigen

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# Spurious correlations

The goal of spuriouscorrelations is to keep alive the amazing examples from
[Tyler Vigen](https://web.archive.org/web/20230607181247/https://tylervigen.com/spurious-correlations).
Unfortunately, as of 2023-10-09, the website is down as my students noticed.
Therefore, I decided to use the snapshot from the Internet Wayback Machine to
save the datasets from 2023-06-07.

## Installation

You can install the development version of spuriouscorrelations like so:

``` r
remotes::install_github("pachadotdev/spuriouscorrelations")
```

## Example

This is a basic example which shows you how to plot a spurious correlation:

```{r example}
library(dplyr)
library(ggplot2)
library(spuriouscorrelations)

unique(spurious_correlations$var2_short)

nicholas_cage <- spurious_correlations %>%
filter(var2_short == "Nicholas Cage")

ggplot(nicholas_cage) +
geom_point(aes(x = var1_value, y = var2_value, color = year), size = 4) +
theme_minimal() +
labs(
x = "Drownings by falling into a pool",
y = "Nicholas Cage movies",
title = "Drownings by Falling into a Pool vs Nicholas Cage Movies"
)
```

The correlation is:
```{r example2}
cor(nicholas_cage$var1_value, nicholas_cage$var2_value)
```

Now let's make it double-axis:

```{r example3}
library(tidyr)

nicholas_cage_long <- nicholas_cage %>%
select(year, var1_value, var2_value) %>%
pivot_longer(
cols = c(var1_value, var2_value),
names_to = "variable",
values_to = "value"
) %>%
# standardize the values
group_by(variable) %>%
mutate(
value = (value - mean(value)) / sd(value),
variable = case_when(
variable == "var1_value" ~ "Drownings by falling into a pool",
variable == "var2_value" ~ "Nicholas Cage movies"
)
)

# make a double y axis plot with year on the x axis
ggplot(nicholas_cage_long, aes(
x = year, y = value, color = variable,
group = variable
)) +
geom_line() +
geom_point() +
theme_minimal() +
theme(legend.position = "top") +
labs(
x = "Year",
y = "Standardized values",
title = "Drownings by Falling into a Pool vs Nicholas Cage Movies"
)
```