Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tmasjc/atl-salary-data
Simpson's paradox: a case study exercise
https://github.com/tmasjc/atl-salary-data
opendata shinyapps
Last synced: 9 days ago
JSON representation
Simpson's paradox: a case study exercise
- Host: GitHub
- URL: https://github.com/tmasjc/atl-salary-data
- Owner: tmasjc
- Created: 2018-01-10T15:22:59.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2019-10-14T04:22:27.000Z (about 5 years ago)
- Last Synced: 2024-11-30T01:46:09.335Z (13 days ago)
- Topics: opendata, shinyapps
- Language: R
- Homepage: https://tmasjc.shinyapps.io/atl_salary_data/
- Size: 1.35 MB
- Stars: 6
- Watchers: 0
- Forks: 7
- Open Issues: 1
-
Metadata Files:
- Readme: README.Rmd
Awesome Lists containing this project
- jimsghstars - tmasjc/atl-salary-data - Simpson's paradox: a case study exercise (R)
README
---
title: "Atlanta City Employee Salaries 2015"
output: github_document
---```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, fig.width = 7)
```## Overview
This [Shiny app](https://tmasjc.shinyapps.io/atl_salary_data/) explores the Atlanta city employee salaries data for the Year 2015. In particular, the relationship between employee ``salary`` and variables such as ``gender``, ``age``, and ``ethnic group``.
Dataset obtained from [data.world](https://data.world/brentbrewington/atlanta-city-employee-salaries), contributed by [Brent Brewington](https://github.com/bbrewington/atlanta-salary-data).
## Components
- **Selector Panel** which allows user to select points using cursor (click-and-drag)
- **Plot I** displays the relationship of employee ``median salary`` by ``age``, seperated by ``gender``
- **Plot II** displays the relationship of employee ``median salary`` by ``ethnic group``, seperated by ``gender``
- **Table I** which tabulates count at each combination of ``ethnic group`` and ``gender``![Screen Shot](screen.gif)
## Motivation
This Shiny app is meant as a simple demonstration of how unbalanced dataset can be misleading at times. In statistics, this phenomenon is known as **[Simpson's paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox)** or the Yule–Simpson effect.
If we look at male and female ``median salary`` at various ``age group``, we might observe that *in general, male earns higher pay than female*.
```{r, warning=FALSE, message=FALSE}
library(dplyr)
library(stringr)
library(ggplot2)raw <- readRDS("Data/atl_2015.RDS") %>%
rename(ethnic = ethnic.origin, job = job.title, salary = annual.salary) %>%
mutate(gender = factor(sex), ethnic = factor(ethnic)) %>%
as_tibble()which_less_than <- function(vec, n) {
x <- table(vec) < n
sort(unique(vec))[x]
}
sub_age <- with(raw, age %>% which_less_than(10))
sub_ethnic <- with(raw, ethnic %>% which_less_than(30))dat <- raw %>%
filter(!(age %in% sub_age), !(ethnic %in% sub_ethnic)) %>%
mutate(ethnic = factor(str_extract(ethnic, "^[A-Z]?[a-z]+")))
``````{r}
old <- theme_set(theme_light() + theme(plot.title = element_text(size = 11), legend.position = "none"))
``````{r}
dat %>%
group_by(age, sex) %>%
summarise(salary = median(salary)) %>%
ggplot(aes(age, salary, col = sex, group = sex)) +
scale_x_continuous(breaks = seq(20, 100, 10), limits = c(20, 70)) +
geom_line() +
labs(x = "Age", y = "Median Salary", col = "Sex") +
theme(legend.position = "bottom")
```However, if we look at male and female ``median salary`` by ``ethnic groups``, we notice that female's median salary does not seem to lag behind. As a matter of fact, *both Asian and White women median salary is actually higher than men by an obvious margin*.
```{r}
dat %>%
group_by(ethnic, sex) %>%
summarise(salary = median(salary)) %>%
ggplot(aes(ethnic, salary, col = sex)) +
geom_line(col = "gray") +
geom_point() +
labs(x = "Ethnic Group", y = "Median Salary", col = "Sex") +
theme(legend.position = "bottom")
```Tinker around the Shiny app to explore the data and understand the logic behind.
## Deployment
This Shiny app is shipped with Docker, built on [rocker/tidyverse](https://hub.docker.com/r/rocker/tidyverse/) image.
```bash
# from this repo
git clone [email protected]:tmasjc/atl-salary-data.git# move inside directory
cd atl-salary-data# name your image
# it may take a while to build
docker build -t atl_salary_data .# start your container
docker run -dp 3838:3838 atl_salary_data## You are set. Go to localhost:3838/atl_salary_data to view application. ##
```Or you can simply deploy it to Shiny server as per normal.