Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pachadotdev/spuriouscorrelations
Keeping alive the amazing examples from Tyler Vigen
https://github.com/pachadotdev/spuriouscorrelations
correlation r
Last synced: 2 months ago
JSON representation
Keeping alive the amazing examples from Tyler Vigen
- Host: GitHub
- URL: https://github.com/pachadotdev/spuriouscorrelations
- Owner: pachadotdev
- License: cc0-1.0
- Created: 2023-10-09T16:39:01.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-10T14:50:10.000Z (about 1 year ago)
- Last Synced: 2024-10-11T19:21:13.181Z (3 months ago)
- Topics: correlation, r
- Language: HTML
- Homepage:
- Size: 557 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
Awesome Lists containing this project
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```# Spurious correlations
The goal of spuriouscorrelations is to keep alive the amazing examples from
[Tyler Vigen](https://web.archive.org/web/20230607181247/https://tylervigen.com/spurious-correlations).
Unfortunately, as of 2023-10-09, the website is down as my students noticed.
Therefore, I decided to use the snapshot from the Internet Wayback Machine to
save the datasets from 2023-06-07.## Installation
You can install the development version of spuriouscorrelations like so:
``` r
remotes::install_github("pachadotdev/spuriouscorrelations")
```## Example
This is a basic example which shows you how to plot a spurious correlation:
```{r example}
library(dplyr)
library(ggplot2)
library(spuriouscorrelations)unique(spurious_correlations$var2_short)
nicholas_cage <- spurious_correlations %>%
filter(var2_short == "Nicholas Cage")ggplot(nicholas_cage) +
geom_point(aes(x = var1_value, y = var2_value, color = year), size = 4) +
theme_minimal() +
labs(
x = "Drownings by falling into a pool",
y = "Nicholas Cage movies",
title = "Drownings by Falling into a Pool vs Nicholas Cage Movies"
)
```The correlation is:
```{r example2}
cor(nicholas_cage$var1_value, nicholas_cage$var2_value)
```Now let's make it double-axis:
```{r example3}
library(tidyr)nicholas_cage_long <- nicholas_cage %>%
select(year, var1_value, var2_value) %>%
pivot_longer(
cols = c(var1_value, var2_value),
names_to = "variable",
values_to = "value"
) %>%
# standardize the values
group_by(variable) %>%
mutate(
value = (value - mean(value)) / sd(value),
variable = case_when(
variable == "var1_value" ~ "Drownings by falling into a pool",
variable == "var2_value" ~ "Nicholas Cage movies"
)
)# make a double y axis plot with year on the x axis
ggplot(nicholas_cage_long, aes(
x = year, y = value, color = variable,
group = variable
)) +
geom_line() +
geom_point() +
theme_minimal() +
theme(legend.position = "top") +
labs(
x = "Year",
y = "Standardized values",
title = "Drownings by Falling into a Pool vs Nicholas Cage Movies"
)
```